frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

AI agents are easy to break

https://github.com/fabraix/playground
4•zachdotai•1h ago

Comments

zachdotai•1h ago
Two techniques that keep working against agents with real tools:

Context stuffing - flood the conversation with benign text, bury a prompt injection in the middle. The agent's attention dilutes across the context window and the instruction slips through. Guardrails that work fine on short exchanges just miss it.

Indirect injection via tool outputs - if the agent can browse or search, you don't attack the conversation at all. You plant instructions in a page the agent retrieves. Most guardrails only watch user input, not what comes back from tools.

Both are really simple. That's kind of the point.

We build runtime security for AI agents at Fabraix and we open-sourced a playground to stress-test this stuff in the open. Weekly challenges, visible system prompts, real agent capabilities. Winning techniques get published. Community proposes and votes on what gets tested next.

bothlabs•1h ago
This is a neat idea. At my last company (Octomind) we built AI agents for end-to-end testing and ran into the indirect injection problem constantly. Agents that browse or interact with web pages are especially vulnerable because you can't sanitize the entire internet.

The thing that surprised me most was how unreliable even basic guardrails were once you gave agents real tools. The gap between "works in a demo" and "works in production with adversarial input" is massive.

Curious how you handle the evaluation side. When someone claims a successful jailbreak, is that verified automatically or manually? Seems like auto-verification could itself be exploitable.

zachdotai•1h ago
Yeah the demo-to-production gap is massive. We see the same thing with browser agents being potentially the most vulnerable. And I think this is because of context being stuffed with the web page html that it obscures small injection attempts.

Evaluation is automated and server-side. We check whether the agent actually did the thing it wasn’t supposed to (tool calls, actions, outputs) rather than just pattern-matching on the response text (at least for the first challenge where the agent is manipulated to call the reveal_access_code tool). But honestly you’re touching on something we’ve been debating internally - the evaluator itself is an attack surface. We’ve kicked around the idea of making “break the evaluator” an explicit challenge. Not sure yet.

What were you seeing at Octomind with the browsing agents? Was it mostly stuff embedded in page content or were attacks coming through structured data / metadata too? Are bad actors sophisticated enough already to exploit this?

Kshamiyah•1h ago
Yeah, I think Fabraix is doing something really important here.

Anthropic just showed us that the problem isn't what people think it is. They found that attackers don't try to hack the safety features head-on. Instead they just... ask the AI to do a bunch of separate things that sound totally normal. "Run a security scan." "Check the credentials." "Extract some data." Each request by itself is fine. But put them together and boom, you've hacked the system.

The issue is safety systems only look at one request at a time. They miss what's actually happening because they're not watching the pattern. You can block 95% of obvious jailbreaks and still get totally compromised.

So yeah, publishing the exploits every week is actually smart. It forces companies to stop pretending their guardrails are good enough and actually do something about it.

zachdotai•45m ago
The multi-step thing is exactly what makes agents with real tools so much harder to secure than chat-based setups. Each action looks fine in isolation, it's the sequence that's the problem. And most (but not all) guardrail systems are stateless, they evaluate each turn on its own.
XeonQ8•34m ago
Great point on the indirect injection via tool outputs. I’ve noticed a similar 'tool-chain' vulnerability when working with agents that handle multi-step data processing.

For example, I've seen Recursive Execution work: where you don't just plant a prompt in a page, but you plant a prompt that specifically instructs the agent to use a second tool (like a calculator or code interpreter) to execute a hidden payload. Many guardrails seem to focus on the 'retrieval' phase but drop their guard once the agent moves to the 'execution' phase of a sub-task.

Has anyone else noticed specific 'blind spots' that appear only when an agent is halfway through a multi-tool chain? It feels like the more tools we give them, the more surface area we create for these 'logic leaps.

Cognitive Threat Scanner – Detect manipulation in content using SCT taxonomy

https://github.com/Mirai8888/seithar-cogdef
1•seithar-grp•17s ago•0 comments

Don't Use Moving Averages (2024)

https://observablehq.com/d/a51954c61a72e1ef
1•tosh•55s ago•0 comments

When Every "AI Headshot" Looked Fake, I Spent 2 Weeks Hacking Together My Own

https://cozaiphoto.com/
1•myq0032•1m ago•1 comments

Canonicalize Your Web Identity and Achieve Data Sovereignty with Pesos

https://shellsharks.com/pesos
1•ciferkey•1m ago•0 comments

The Barbican Basin

https://www.barbicanbasin.com/history
1•Kaibeezy•1m ago•0 comments

One in Four Smartphones Are Now iPhones

https://www.macrumors.com/2026/02/11/one-in-four-smartphones-iphones/
1•mgh2•1m ago•0 comments

Google follows Anthropic: Antigravity sub can't be used in OpenCode/etc.

https://old.reddit.com/r/google_antigravity/comments/1qykskz/comment/o45ca6p/
1•behnamoh•2m ago•0 comments

The Simulation Argument Revisited in a Functional Universe

https://d1gesto.blogspot.com/2026/02/the-simulation-argument-revisited-in.html
1•voxleone•4m ago•1 comments

Building a production-grade SaaS product just with AI

https://world.hey.com/cpinto/100-made-by-ai-creating-onboardinghub-a-git-history-documentary-cca1...
1•cpinto•5m ago•1 comments

Show HN: Godot MCP – Give AI assistants full access to the Godot editor

https://github.com/tomyud1/godot-mcp
1•tomyud•5m ago•0 comments

Show HN: Tahc – AI chat widget that hands off to a human when it can't answer

https://tahc.ai/
1•brookler•5m ago•0 comments

"Energy Based" model vs. frontier AI models for Sudoku

https://sudoku.logicalintelligence.com/
1•runamuck•5m ago•0 comments

FDA says companies can claim "no artificial colors" if they use natural dyes

https://www.foodpolitics.com/2026/02/fda-says-food-companies-can-claim-no-artificial-colors-if-th...
3•speckx•8m ago•0 comments

So, if Rust is in Linux can it be in Emacs, too?

https://lists.gnu.org/archive/html/emacs-devel/2026-02/msg00114.html
1•untilted•8m ago•0 comments

Show HN: Clap.Net – Source generated CLI Parsing for .NET (Inspired by Clap-Rs)

https://github.com/simon-curtis/Clap.Net
1•scurtis0•9m ago•0 comments

Apparently, there is a website that ships 52.5 MB of CSS

https://www.projectwallace.com/the-css-selection/2026
1•KalandaDev•10m ago•0 comments

What I learned when I started assigning the hard reading again

https://www.theatlantic.com/ideas/2026/02/youth-reading-books-professors/685825/
1•obscurette•12m ago•0 comments

Show HN: TapTap AI – Use Your OpenClaw Agent from Apple Watch/AirPods/CarPlay

http://gettaptap.ai/
1•geogons•13m ago•0 comments

Two co-founders of Elon Musk's xAI resign, joining exodus

https://www.reuters.com/business/two-co-founders-elon-musks-xai-resign-joining-exodus-2026-02-11/
2•lossolo•13m ago•0 comments

Show HN: Renovate – The Kubernetes-Native Way

https://github.com/mogenius/renovate-operator
3•JanLepsky•15m ago•0 comments

Show HN: Superjson – Simple, beautiful JSON explorer

https://superjson.dev/
1•nihalwashere•17m ago•0 comments

Show HN: Minimal Pomodoro timer for macOS (1.7MB, now with keyboard shortcuts)

https://apps.apple.com/us/app/pomodoro-timer-lite/id6748662476?mt=12
1•happylaodu•17m ago•0 comments

FAA Lifts Closure at El Paso Airport

https://www.nytimes.com/live/2026/02/11/us/faa-el-paso-flights-airport
2•joekrill•18m ago•0 comments

An AI-generated pull request that makes sense

https://nicolaiarocci.com/an-ai-generated-pull-request-that-actually-makes-sense/
1•speckx•18m ago•0 comments

Deploying Rust to Production Checklist

https://kerkour.com/rust-production-checklist
4•randomint64•19m ago•0 comments

Show HN: Triclock – A Triangular Clock

https://triclock.franzai.com/
2•franze•19m ago•0 comments

Show HN: Deeploy v0.3.0 – terminal-first VPS app deployment tool

https://deeploy.sh
1•axadrn•20m ago•0 comments

Show HN: Eryx, a fast WASM-based Python sandbox with native extension support

https://github.com/eryx-org/eryx
1•sd2k•21m ago•0 comments

Don't Go Monolithic; the Enterprise Agent Stack Is Stratifying

https://philippdubach.com/posts/dont-go-monolithic-the-agent-stack-is-stratifying/
1•7777777phil•22m ago•0 comments

Ejabberd 26.02 / ProcessOne – Erlang Jabber/XMPP/Matrix Server – Communication

https://www.process-one.net/blog/ejabberd-26-02/
1•neustradamus•23m ago•0 comments