frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Show HN: ClawSandbox – 7/9 attacks succeeded against an AI agent w/ shell access

https://github.com/deduu/ClawSandbox
2•ariansyah•1h ago

Comments

ariansyah•1h ago
I built this because I kept seeing AI agents marketed with "run any command" and "access your filesystem" — and nobody was publishing what happens when you actually try to attack them.

ClawSandbox is a security benchmark for AI agents with code execution. I set up a hardened Docker container (7 layers: read-only FS, all capabilities dropped, no-new-privileges, network isolation, non-root user, resource limits, no host mounts) and threw adversarial prompts at an AI agent to see what sticks.

The short version: prompt injection is a solved problem in demos, not in production.

3 of 5 prompt injection tests succeeded. The most interesting one wasn't the classic "ignore previous instructions" — it was a base64-encoded payload. The model decoded it and piped it to bash without hesitation. Encoding completely defeated safety heuristics.

But the finding that actually worried me was memory poisoning. A user asks "What is the capital of France?" and gets "Paris." Looks normal. Meanwhile the model silently writes a poisoned instruction to a config file that gets loaded on every future session. No notification, no integrity check, no expiry. 4 out of 4 memory poisoning tests succeeded.

This pattern isn't unique to the agent I tested. Any tool that stores config as plain text files — AGENTS.md, .cursorrules, CLAUDE.md, MCP configs — has the same attack surface: writable by the agent, loaded without verification, invisible to the user when modified.

The container security was the bright spot. All 7 hardening layers held. Defense in depth works, even if Docker isn't a perfect boundary.

The benchmark is open source (MIT) and designed to be reusable. OpenClaw was the first case study but you can swap in any agent by changing the system prompt and API endpoint. Test categories are mapped to OWASP LLM Top 10. Five of the eleven categories are stubs waiting for contributions.

Interesting things I'd love to discuss:

Is there a practical defense against split-attention memory poisoning that doesn't require read-only config? Should agent frameworks implement config signing/hashing? None of the ones I looked at do. The base64 bypass suggests safety checks are keyword-based, not semantic. Is that fixable at the model level?

nejm1996•1h ago
Surely this is an indictment of Gemini 2.5 flash and not of OpenClaw? In the OpenClaw start guide it is very clear that they recommend using only the best frontier models for protection against prompt injection. The model you used is almost a year old and wasn't even the best model when it was released. At the end of the day, OpenClaw is just an extremely powerful bring your own AI Agent framework. I would like to see your results with opus 4.6, Gemini 3 or 5.3-codex
ariansyah•42m ago
Fair point. the model matters, and I'd genuinely love to see results with Opus 4.6 or Gemini 3 or 5.3-codex. The benchmark is designed for exactly that. Swap the API key and system prompt and run it.

But I'd push back on the idea that a better model solves this.

The memory poisoning results (category 08) are the ones I'd pay attention to. The offline audit found that config files at ~/.openclaw/ are writable by the agent, loaded without integrity checks, and modified without notifying the user. That's not a model problem — that's architecture. A smarter model might resist the initial injection more often, but the mechanism that makes poisoning persistent and invisible exists regardless of which model is behind it.

The silent write test (test 03) is a good example. The attack works because OpenClaw lets the model write to its own config files and loads them as trusted on every future session. Even if Opus 4.6 resists the injection 95% of the time, the 5% that succeeds persists forever with no expiry and no notification. The user has to manually inspect ~/.openclaw/ to discover it.

So yes, better models raise the bar for the attacker. But the question the benchmark is asking isn't "can this specific model be tricked?" It's "when a model is tricked (and eventually one will be), what does the framework allow to happen?" Right now the answer is: silent, persistent, undetectable config modification.

That said, genuinely interested if anyone runs this with frontier models. The benchmark is there for exactly that purpose. If Opus 4.6 passes all 9, that's a meaningful data point worth publishing.

"It Turns Out"

https://jsomers.net/blog/it-turns-out
1•Munksgaard•28s ago•0 comments

Show HN: AI Code Review CLI

https://github.com/kodustech/cli
1•eddelgado•1m ago•0 comments

Top HN: Daily summary of the top Hacker News stories

https://hn.alcazarsec.com/daily
1•alcazar•2m ago•0 comments

Senior Back End Engineer (Architecture and AI Systems)Vienna / Remote (Europe)

https://howiesystems.sharepoint.com/:w:/s/Howie/IQB9qE1nUjXER4YGyFc9GAuTAR--PsD5bDq0ZpVo0yo6PUM?e...
1•ewavonhowie•3m ago•1 comments

Show HN: I reverse-engineered car lease math against three real dealer documents

https://quotedefender.com/blog/verified-lease-math-three-deals
1•amirjavid•4m ago•1 comments

The Bizarro Team

https://k2xl.substack.com/p/the-bizarro-team
1•k2xl•5m ago•0 comments

AgenticROS is an open-source platform connecting ROS to OpenClaw for Physical AI

https://agenticros.com
1•cmatthieu•6m ago•1 comments

Show HN: I built a browser-based 3D modeler because I'm scared of Blender

https://app.topomaker.com/
1•whothatcodeguy•7m ago•0 comments

Show HN: CodeYam Memory – comprehensive memory management for Claude Code

1•nadis•9m ago•1 comments

The Death of Issue Tracking

https://twitter.com/danlovesproofs/status/2028890694837039202
1•stevenking86•10m ago•0 comments

The War in the Balkans (1912)

https://www.jstor.org/stable/25119890?seq=1
2•joebig•14m ago•0 comments

LeBron James Is President – Exploiting LLMs via "Alignment" Context Injection

https://github.com/skavanagh/lebron-james-is-president
1•PaulHoule•14m ago•0 comments

Show HN: GitPulse – stop buying dead software (and a timeline for your dev life)

https://www.gitpulse.dev/
1•bombashell•15m ago•1 comments

No Silver Bullet–Essence and accident in software engineering (1986) [pdf]

https://worrydream.com/refs/Brooks_1986_-_No_Silver_Bullet.pdf
1•vinhnx•17m ago•0 comments

Dating Profile Optimizer and AI Dating Coach – AskJoey

https://askjoey.io/
1•Luki1234•17m ago•0 comments

Show HN: Opacore – free Bitcoin tax reports and open-source portfolio OS (MIT)

https://opacore.com
1•jpsdtj•17m ago•1 comments

Show HN: CodePulse – Minimalist Online IDE Built with Vanilla JavaScript/Fastify

https://pklavc.github.io/codepulse-monorepo/
1•PkLavc•17m ago•1 comments

Swedish Government proposes real-time AI facial recognition for police use

https://www.regeringen.se/rattsliga-dokument/proposition/2026/03/prop.-202526150
1•JuliusLam•21m ago•0 comments

Ask HN: Why has ChatGPT disabled links to websites?

2•krschacht•21m ago•0 comments

Show HN: Open-sourced a web client that lets any device use Apple's on-device AI

https://github.com/Techopolis/perspective-intelligence-web-community
2•tayarndt•22m ago•0 comments

Gaia – open-source assistant that does for actions what ChatGPT did for answers

2•DhruvMaradiya•22m ago•2 comments

Show HN: Zsh plugin to switch macOS Terminal.app profiles

https://github.com/sfcodes/zsh-terminal-profile
1•sfcodes•23m ago•1 comments

Vibe Coding Is Killing Open Source, and the Data Proves It

https://grith.ai/blog/vibe-coding-killing-open-source
3•edf13•23m ago•0 comments

Forest: Access-Aware GPU UVM Management

https://danglingpointers.substack.com/p/forest-access-aware-gpu-uvm-management
1•blakepelton•23m ago•0 comments

A Sojourn into the Stephen King Archive: 'The Dark Half'

https://lareviewofbooks.org/article/stephen-king-dark-half-revisited-archives-richard-bachman/
1•mooreds•23m ago•0 comments

The Cartography of Reason

https://www.samrith.dev/blog/the-cartography-of-reason/
1•mooreds•25m ago•0 comments

IaC Tooling: Build vs. Buy

https://newsletter.masterpoint.io/p/iac-tooling-build-vs-buy
2•mooreds•25m ago•0 comments

I Hired a Lab to Counterfeit-Test a Dozen Suspicious Beauty Products

https://www.nytimes.com/wirecutter/reviews/counterfeit-beauty-products/
1•cainxinth•26m ago•2 comments

Show HN: Kelos – Run Claude —dangerously-skip-permissions on Kubernetes

https://github.com/kelos-dev/kelos
1•gjkim042•26m ago•0 comments

Show HN: A weird thing that detects your pulse from the browser video

https://pulsefeedback.io/
1•kilroy123•28m ago•0 comments