The simplest mitigation is also the least popular one: don't give the agent credentials in the first place. Scope it to read-only where possible, and treat every page it visits as untrusted input. But that limits what agents can do, which is why nobody wants to hear it.
I wonder if it'd be possible to train an LLM with such architecture: one input for the instructions/conversation and one "data-only" input. Training would ensure that the latter isn't interpreted as instructions, although I'm not knowledgeable enough to understand if that's even theoretically possible: even if the inputs are initially separate, they eventually mix in the neural network. However, I imagine that training could be done with massive amounts of prompt injections in the "data-only" input to penalize execution of those instructions.
Works great with OpenClaw, Claude Cowork, or anything, really
stavros•1h ago
https://github.com/skorokithakis/stavrobot
indigodaddy•51m ago
stavros•45m ago
indigodaddy•31m ago
I’d guess this is the type of thing that might actually excel in your agent or these claw clones, because they literally can just do whatever bash/tool type actions on whatever VM or sandboxed environment they live on?
stavros•28m ago
indigodaddy•24m ago
amelius•23m ago