But we're giving agents terminal access and API keys now. The attack vector is becoming natural language. An agent gets "socially engineered" by a prompt; another hallucinates fake data and passes it down the chain.
Trying to secure these systems feels like trying to write a regex that catches every possible lie. We've shifted the foundation of security from numbers to words, and I don't think we've figured out what that means yet.
Is anyone thinking about actual architectural solutions to this? Not just "use another LLM to guard the LLM" — that feels like circular logic. Something fundamentally different.
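To make the question concrete, here is the kind of direction I have in mind, sketched in Python: the model only proposes tool calls, and a plain, deterministic policy layer grants or denies them against capabilities issued per task, so there is no prompt for an attacker to argue with. Every name below (Capability, check, the paths) is invented for illustration, not taken from any real framework:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Capability:
        tool: str   # e.g. "fs_read", "fs_write", "shell"
        scope: str  # prefix the call's target must stay within

    class PolicyError(Exception):
        pass

    def check(tool: str, target: str, granted: frozenset[Capability]) -> None:
        # Deterministic gate: no model in the loop, so no prompt can talk past it.
        for cap in granted:
            if cap.tool == tool and target.startswith(cap.scope):
                return
        raise PolicyError(f"{tool} on {target} exceeds granted capabilities")

    # Per-task grant: this agent may read the repo and nothing else.
    grants = frozenset({Capability("fs_read", "/srv/repo/")})
    check("fs_read", "/srv/repo/src/main.py", grants)   # passes
    # check("fs_write", "/srv/repo/notes.txt", grants)  # would raise PolicyError

The point being that the enforcement code is boring and auditable, and the LLM never gets to decide whether a call is allowed.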
(Not a native English speaker, used AI to clean up the grammar.)
codingdave•1h ago
That answer hasn't changed since day one of LLMs, despite some of the things people are attempting to build these days: if you don't want to get in trouble, don't give LLMs access to anything that can cause actual harm, and don't give them autonomy.
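To be concrete about what "no harmful access, no autonomy" looks like in code, here's a toy Python sketch; run_with_approval and SAFE_COMMANDS are names I made up, not any real library:

    import subprocess

    SAFE_COMMANDS = {"ls", "cat", "git"}  # roughly read-only allowlist

    def run_with_approval(cmd: list[str]) -> str:
        # No access: refuse anything off the allowlist outright.
        if cmd[0] not in SAFE_COMMANDS:
            raise PermissionError(f"{cmd[0]} is not on the allowlist")
        # No autonomy: a human confirms every command before it runs.
        answer = input(f"Agent wants to run {' '.join(cmd)!r}. Allow? [y/N] ")
        if answer.strip().lower() != "y":
            raise PermissionError("human declined")
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout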
lielcohen•1h ago
"Don't give it access" is like saying "don't connect to the internet" in 1995. The question isn't whether agents get these permissions. They will. The question is what happens when they do.
nine_k•14m ago
My answer is simple: it just won't be all right this way. The problems will cost the managers who drank too much Kool-Aid; maybe they already do (see what happened at Cloudflare recently). Sanity will return, this time as a hard-won lesson.