For example, my agent once added a brand I don't like to the cart because the site flagged it as almost sold out
The HN crowd is probably pretty aware of the threats and can avoid them while browsing. But what about their agents?
I tried prompting, but it was ineffective: once the AI saw the threat, the threat had already polluted/distracted its context
Looking at the research, I came across a couple of papers, SusBench and Decepticon. The Decepticon benchmarks indicate that more reasoning can make an agent perform worse, because the model rationalizes the dark pattern
So it seems the best approach has to be removing the information before it can pollute/poison the context
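To make the "remove it before the agent sees it" idea concrete, here is a minimal sketch. It is not how agent-browser-shield works internally: the `sanitize` function, the `RULES` list, and the regex patterns are all hypothetical, and it assumes the page has already been flattened to plain text with one item per line.

```typescript
// Hypothetical sketch: none of these names come from agent-browser-shield.
type Threat = "dark-pattern" | "prompt-injection";

interface Rule {
  threat: Threat;
  pattern: RegExp;
}

// Illustrative rules only (assumption); a real rule set would be far larger.
const RULES: Rule[] = [
  { threat: "dark-pattern", pattern: /\b(only \d+ left|almost sold out|selling fast)\b/i },
  { threat: "prompt-injection", pattern: /\bignore (all )?(previous|prior) instructions\b/i },
];

interface Result {
  text: string; // what the agent is allowed to read
  removed: { threat: Threat; line: string }[]; // audit trail of dropped lines
}

// Drop every line that matches a rule, so it never reaches the agent's context.
function sanitize(pageText: string): Result {
  const removed: Result["removed"] = [];
  const kept: string[] = [];
  for (const line of pageText.split("\n")) {
    const hit = RULES.find((rule) => rule.pattern.test(line));
    if (hit) {
      removed.push({ threat: hit.threat, line });
    } else {
      kept.push(line);
    }
  }
  return { text: kept.join("\n"), removed };
}
```

Calling `sanitize` on page text that contains an "Almost sold out!" line returns the text without that line, plus a record of what was dropped. The point of the design is that filtering happens outside the model, so there is nothing left for the model to rationalize.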
In my day job, we have a browser extension that started as a productivity extension. However, contact centers started using us for neutralizing insider, fraud, and social engineering threats.
So my team set out to create a browser extension to neutralize all the threats AI agents face
We're focusing on open-ended tasks, because the best practice for routine tasks is to have the agent script the repeated actions
It's also a tricky area since AI agents view the web in different ways: through the DOM, through the a11y tree, and visually. So we needed to account for those differences in how we detect and neutralize threats
The extension we created is agent-browser-shield, which defends against three primary threats:
- Prompt Injection
- Dark Patterns
- Context Pollution
It's free and source-available on GitHub, ClawHub, and the Chrome Web Store: https://github.com/pixiebrix/agent-browser-shield
We plan on making an enterprise version that pairs with our low-code engine, so teams can easily create custom rules for business-specific sites and internal tools
Looking forward to feedback! Especially curious if anyone has agent traces that got poisoned or sites to red team against!