Sandboxes won't save you from OpenClaw

https://tachyon.so/blog/sandboxes-wont-save-you

63•logicx24•1h ago

Comments

hackingonempty•1h ago

Yes we need capability based auth on the systems we use.

I'm sure we will get them but only for use with in-house agents, i.e. GMail and Google Pay will get agentic capabilities but they'll only work with Gemini, and only Siri will be able to access your Apple cloud stuff without handing over access to everything, and if you want your grocery shopping handled for you, Rufus is there.

Maybe you will be able to link Copilot to Gemini for an extra $2.99 a month.

2gremlin181•3m ago

I do not foresee GoogleClaw, MetaClaw, and AppleClaw all playing well with each other. Everyone will have their own walled garden and we will be no better off than we are now.

edf13•1h ago

Agree, that’s why we’re building grith.ai

Sandboxing alone isn’t the right approach… a multi-faceted approach is what works.

What we’ve found that does work is automation on the approval process but only with very strong guards in place… approval fatigue is another growing problem - users simply clicking approve on all requests.

dmos62•55m ago

Interesting. How are the security filters implemented?

edf13•52m ago

Every system call, file access, net access etc is forced through a local “proxy” where 17 individual filters check what’s going on.

Everything is done locally via our grith cli tool.

Happy to answer any questions on hello@grith.ai too

imiric•39m ago

Was grift.ai too expensive?

edf13•36m ago

https://grith.ai/blog/what-grith-means

gz09•1h ago

Security models from SaaS companies that rely on a bunch of random bytes/numbers with coarse-grained permissions, valid for a very long time, were already a bad idea. With agents, secrets/tokens really need to be time-limited and scope-limited, minted through OpenID/smart-contract based trust relationships, if they are going to fare better in this new world. Unfortunately, this is still a struggle for most major vendors (e.g., the GitHub gh CLI still doesn't let you use GitHub Apps out of the box).
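To make "time-limited, scope-limited" concrete, here is a minimal sketch of a token that carries its own scopes and expiry and is checked on every use. Everything in it is an assumption for illustration: `SIGNING_KEY`, `mint_token`, `check_token` and the scope strings are hypothetical names, and a plain HMAC stands in for whatever OpenID-style issuer a real vendor would run.

```python
import base64
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"demo-only-key"  # hypothetical; a real issuer would not hardcode this


def mint_token(scopes, ttl_seconds, now=None):
    """Return a signed token that names its scopes and its expiry time."""
    issued = time.time() if now is None else now
    claims = {"scopes": sorted(scopes), "exp": issued + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode())
    sig = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + sig


def check_token(token, required_scope, now=None):
    """Accept only an untampered, unexpired token carrying the required scope."""
    payload, _, sig = token.partition(".")
    expected = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    claims = json.loads(base64.urlsafe_b64decode(payload))
    current = time.time() if now is None else now
    return current < claims["exp"] and required_scope in claims["scopes"]
```

An agent handed `mint_token({"repo:read"}, 300)` can read a repo for five minutes and nothing else; a leaked copy is worthless after that window.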

stronglikedan•1h ago

TL;DR: sandboxes can't save you from anything if the sandbox contains your secrets and has access to the outside world. a tale as old as time and nothing new to agents specifically

dinkleberg•1h ago

Call me overly cautious, but as someone using OpenClaw I never for a moment considered hooking it up to real external services as me. Instead I put it on one server and created a second server with shared services like Gitea and other self-hosted tools that are only accessible over a tailnet and openclaw is able to use those services. When I needed it to use a real external service I have created a limited separate account for it. But not a chance in the world am I going to just let it have full access to my own accounts on everything.

simonw•58m ago

That's not overly cautious, that's smart. I do not think most OpenClaw users are taking the same sensible measures as you are.

andrewflnr•53m ago

On the other hand, the AI hit piece guy seems to have put similar "sensible measures" in place, at least giving the claw its own accounts. Look what that got them.

giancarlostoro•37m ago

He shared his prompt. He basically prompted that model to be the Kanye of science tool coding (ego wise, not the racism).

skywhopper•47m ago

That is literally the only remotely safe approach.

supermdguy•1h ago

One promising direction is building abstraction layers to sandbox individual tools, even those that don't have an API already. For example, you could build/vibe code a daemon that takes RPC calls to open Amazon in a browser, search for an item, and add it to your cart. You could even let that be partially "agentic" (e.g. an LLM takes in a list of search results, and selects the one to add to cart).

If you let OpenClaw access the daemon, sure it could still get prompt injected to add a bunch of things to your cart, but if the daemon is properly segmented from the OpenClaw user, you should be pretty safe from getting prompt injected to purchase something.
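A rough sketch of what such a daemon's surface could look like, assuming a single RPC entry point with an allowlist. `ShoppingDaemon`, its method names and the in-memory catalog are all hypothetical stand-ins; the real thing would drive a browser rather than search a list.

```python
class ShoppingDaemon:
    """Hypothetical daemon: the agent can fill a cart but never check out."""

    ALLOWED_METHODS = {"search", "add_to_cart"}

    def __init__(self, catalog):
        self._catalog = catalog  # stand-in for the real browser automation
        self.cart = []

    def handle(self, method, **params):
        """Single RPC entry point; anything off the allowlist is refused."""
        if method not in self.ALLOWED_METHODS:
            raise PermissionError(f"method not exposed to agent: {method}")
        return getattr(self, "_" + method)(**params)

    def _search(self, query):
        return [item for item in self._catalog if query.lower() in item.lower()]

    def _add_to_cart(self, item):
        if item not in self._catalog:
            raise ValueError(f"unknown item: {item}")
        self.cart.append(item)
        return len(self.cart)

    def _checkout(self):
        # Reachable only from the owner's side, never through handle().
        return list(self.cart)
```

A prompt-injected agent calling `handle("checkout")` gets a `PermissionError`; the worst it can do is add items to the cart.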

AnimalMuppet•23m ago

Honest question: Could you define "agent" in this context?

logicx24•20m ago

Yeah, agreed. This is probably what that middleware would look like. That's also where you'd add the human approval flow.

cheriot•57m ago

This is a general thing with agent orchestration. A good sandbox does something for your local environment, but nothing for remote machines/APIs.

I can't say this loudly enough, "an LLM with untrusted input produces untrusted output (especially tool calls)." Tracking sources of untrusted input with LLMs will be much harder than traditional [SQL] injection. Read the logs of something exposed to a malicious user and you're toast.
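The rule can be sketched as taint propagation: anything derived from untrusted text is itself untrusted, and untrusted context may only drive read-only tools. This is a toy model under that assumption; `Text`, `combine` and `may_run_tool` are hypothetical names, and a real agent would have far more sources to label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    value: str
    trusted: bool


def combine(*parts):
    """Output is trusted only if every input was trusted."""
    return Text(" ".join(p.value for p in parts), all(p.trusted for p in parts))


def may_run_tool(context, tool_is_read_only):
    """Let tainted context drive read-only tools only."""
    return context.trusted or tool_is_read_only
```

Once a fetched email or log line is mixed into the context, `combine` marks the whole thing untrusted and `may_run_tool` refuses any tool that writes.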

tovej•45m ago

Even an LLM with trusted input produces untrusted output.

ramoz•43m ago

Information flow control is a solid mindset but operationally complex and doesn’t actually safeguard you from the main problem.

Put an openclaw like thing in your environment, and it’ll paperclip your business-critical database without any malicious intent involved.

paxys•37m ago

Given the "random" nature of language models even fully trusted input can produce untrusted output.

"Find emails that are okay to delete, and check with me before deleting them" can easily turn into "okay deleting all your emails", as so many examples posted online are showing.

I have found this myself with coding agents. I can put "don't auto commit any changes" in the readme, in model instructions files, at the start of every prompt, but as soon as the context window gets large enough the directive will be forgotten, and there's a high chance the agent will push the commit without my explicit permission.

ramoz•53m ago

I’ve said similar in another thread[1]:

Sandboxes will be left in 2026. We don't need to reinvent isolated environments; that's not even the main issue with OpenClaw - literally go deploy it in a VM on any cloud and you've achieved all the same benefits. We need to know if the email being sent by an agent is supposed to be sent and if an agent is actually supposed to be making that transaction on my behalf. etc

——-

Unfortunately it’s been a pretty bad week for alignment optimists (meta lead fail, Google award show fail, anthropic safety pledge). Otherwise… Cybersecurity LinkedIn is all shuffling the same “prevent rm -rf” narrative, researchers are doing the LLM as a guard focus but this is operationally not great & theoretically redundant+susceptible to same issues.

The strongest solution right now is human in the loop - and we should be enhancing the UX and capabilities here. This can extend to eventual intelligent delegation and authorization.

[1] https://news.ycombinator.com/threads?id=ramoz&next=47006445

g_delgado14•43m ago

> meta lead fail, Google award show fail

Can I get some links / context on this please

ramoz•40m ago

Meta: https://x.com/summeryue0/status/2025774069124399363 context: meta alignment lead made rookie mistakes (their words) in instructing openclaw and lost their inbox to it.

Goog: https://deadline.com/2026/02/google-apologizes-bafta-news-al... *

Ant: https://time.com/7380854/exclusive-anthropic-drops-flagship-...

* There is now a clarification in the press saying it was not ai-generated.

Alignment as a solution to all of this has a rough long road ahead is my point.

notenlish•39m ago

I think the google award fail is this: https://www.forbes.com/sites/maryroeloffs/2026/02/24/google-...

meta lead fail: https://techcrunch.com/2026/02/23/a-meta-ai-security-researc...

dbl000•38m ago

The meta lead is probably a reference to Summer Yue having OpenClaw delete all the emails in her inbox despite being told not to.

https://x.com/summeryue0/status/2025774069124399363

gmueckl•37m ago

The Meta thing is the AI safety lead experimenting with OpenClaw on her inbox and the bloody thing deciding to follow her inbox cleanup instructions by "starting fresh" - deleting the inbox contents. It's the very first link in the linked story.

giancarlostoro•38m ago

> literally go deploy it in a VM on any cloud

Sure, but now you're adding extra cost, vs just running it locally. RAM is also heavily inflated thanks to Sam Altman investment magic.

ramoz•37m ago

Yea just an example. I personally have it running on a local Mac Mini (obviously aware that this isn't a perfect security measure, but I couldn't install on my laptop which has sensitive work access).

dheera•36m ago

> We need to know if the email being sent by an agent is supposed to be sent and if an agent is actually supposed to be making that transaction on my behalf. etc

At the same time, let's not let the perfect be the enemy of good.

If you're piloting an aircraft, yeah, you should have perfection.

But if you're sending 34 e-mails and 7 hours of phone calls back and forth to fight a $5500 medical bill that insurance was supposed to pay for, I'd love for an AI bot to represent me. I'd absolutely LOVE for the AI bot to create so many piles of paperwork for these evil medical organizations that they learn that I will fight, I'm hard to deal with, and they should pay for my stuff as they're supposed to. Threaten lawyers, file complaints with the state medical board, everything needs to be done. Create a mountain of paperwork for them until they pay that $5500. The next time maybe they'll pay to begin with.

bee_rider•30m ago

The AI bot wouldn’t be representing you any more than your text editor would be. You would be using an AI bot to create a lot of text.

An AI bot can’t be held accountable, so isn’t able to be a responsibility-absorbing entity. The responsibility automatically falls through to the person running it.

logicx24•26m ago

True. But it can help me create a lot of useful text so I can represent my self better.

I do wonder what happens when everyone is using agents for this, though. If AI produces the text and AI also reads the text, then do we even need the intermediary at all?

doctorwho42•30m ago

Is this before or after they have already implemented their own models to reply to your mountain of paperwork with their own auto denial system?

bee_rider•35m ago

> We need to know if the email being sent by an agent is supposed to be sent and if an agent is actually supposed to be making that transaction on my behalf. etc

Isn’t this the whole point of the Claw experiment? They gave the LLMs permission to send emails on their behalf.

LLMs can not be responsibility-bearing structures, because they are impossible to actually hold accountable. The responsibility must fall through to the user because there is no other sentient entity to absorb it.

The email was supposed to be sent because the user created it on purpose (via a very convoluted process but one they kicked off intentionally).

ramoz•32m ago

I'm not too sure what you're asking, but that last part, I think, is very key to the eventual delegation.

Where we can verify the lineage of the user's intent originally captured and validated throughout the execution process - eventually used as an authorization mechanism.

Google has a good thought model around this for payments (see verifiable mandates): https://cloud.google.com/blog/products/ai-machine-learning/a...

b112•9m ago

I see a lot of discussion on that page about APIs and sign offs, but the real sign-off is installing anything on your computer, and then doing things.

The liability is yours.

Claude messes up? So sad, too bad, you pay.

That's where the liability needs to sit.

And one point on this is, every act of vibe coding is a lawsuit waiting to happen. But even every act by a company is too.

An example is therac-25:

https://en.wikipedia.org/wiki/Therac-25

Vibe coding is still coding. You're giving instructions on program flow, logic, etc. My rant here is, I feel people think that if the code is bad, it's someone else's fault.

But is it?

Animats•34m ago

> I’ve said similar in another thread[1]

Me too, at [1].

We need fine-grained permissions at online services, especially ones that handle money. It's going to be tough. An agent which can buy stuff has to have some constraints on the buy side, because the agent itself can't be trusted. The human constraints don't work - they're not afraid of being fired and you can't prosecute them for theft.

In the B2B environment, it's a budgeting problem. People who can spend money have a budget, an approval limit, and a list of approved vendors. That can probably be made to work. In the consumer environment, few people have enough of a detailed budget, with spending categories, to make that work.
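The B2B version of that (budget, approval limit, approved vendors) is simple enough to sketch as a check that sits between the agent and the payment. `SpendPolicy` and its fields are hypothetical names, and amounts are in cents to avoid float rounding.

```python
from dataclasses import dataclass


@dataclass
class SpendPolicy:
    budget_cents: int
    approval_limit_cents: int
    approved_vendors: frozenset
    spent_cents: int = 0

    def authorize(self, vendor, amount_cents):
        """Return 'allow', 'needs_approval', or 'deny' for a proposed purchase."""
        if vendor not in self.approved_vendors:
            return "deny"
        if self.spent_cents + amount_cents > self.budget_cents:
            return "deny"
        if amount_cents > self.approval_limit_cents:
            return "needs_approval"
        self.spent_cents += amount_cents  # only auto-approved spend is recorded here
        return "allow"
```

The point is that the policy is enforced outside the agent: the agent proposes a purchase, and something it cannot talk its way around decides.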

Next upcoming business area: marketing to LLMs to get them to buy stuff.

[1] https://news.ycombinator.com/item?id=47132273

beepbooptheory•12m ago

What could "human in the loop" be here but just literally reading your own emails?

simonw•52m ago

I do find it amusing when I consider people buying a Mac Mini for OpenClaw to run on as a security measure... and then granting OpenClaw on that Mac Mini access to their email and iMessage and suchlike.

(I hope people don't do that, but I expect they probably do.)

latexr•41m ago

> I hope people don't do that, but I expect they probably do.

How about the corporate vice president of Microsoft Word?

https://www.omarknows.ai/p/meet-lobster-my-personal-ai-assis...

https://www.linkedin.com/in/omarshahine

It’s not going to be amusing when he gets hacked. Zero sense of responsibility.

kllrnohj•33m ago

I mean https://www.tomshardware.com/tech-industry/artificial-intell... just also happened.

jejeyyy77•38m ago

eh, the point of the Mac is so that it can have its own iMessage and iCloud account

programmarchy•18m ago

Then what’s the point of skills like apple-reminders? Isn’t the implication for a personal assistant styled OpenClaw setup that you allow it access to those tools on your behalf? Otherwise where is the benefit?

chaostheory•52m ago

Just treating it as an employee would solve most of the problems, i.e. it runs on its own machine with separate accounts for everything: email, git, etc…

TZubiri•49m ago

Oh ok, we'll add encryption then.

Checkmate atheists

downsplat•41m ago

I don't think openclaw can possibly be secured given the current paradigm. It has access to your personal stuff (that's its main use case), access to the net, and it gets untrusted third party inputs. That's the unfixable trifecta right there. No amount of filtering band-aid whack-a-mole is going to fix that.

Sandboxes are a good measure for things like Claude Code or Amp. I use a bubblewrap wrapper to make sure it can't read $HOME or access my ssh keys. And even there, you have to make sure you don't give the bot write access to files you'll be executing outside the sandbox.
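For anyone wondering what such a wrapper might look like, here is a sketch that only builds the bubblewrap command line. It is not the wrapper described above; `bwrap_argv` is a hypothetical helper, and the set of read-only binds is an assumption that varies by distro (some need `/lib64` or more of `/etc`).

```python
import os


def bwrap_argv(project_dir, command):
    """Build a bubblewrap command line that exposes the project but not $HOME."""
    home = os.path.expanduser("~")
    return [
        "bwrap",
        "--ro-bind", "/usr", "/usr",
        "--ro-bind", "/etc", "/etc",
        "--symlink", "usr/bin", "/bin",
        "--symlink", "usr/lib", "/lib",
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        "--tmpfs", home,  # empty home inside the sandbox: no ~/.ssh, no dotfiles
        "--bind", project_dir, project_dir,  # the only writable host path
        "--unshare-all", "--share-net",
        "--die-with-parent",
        "--chdir", project_dir,
        *command,
    ]
```

Pass the result to `subprocess.run`. Note that `project_dir` stays writable, so the caveat about executing files from it outside the sandbox still applies.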

observationist•26m ago

Current AI requires a human in the loop for anything non-trivial. Even the most used feature, coding, causes chaos without strict human oversight.

You can vibe-code a standalone repository, but any sort of serious work with real people working alongside bots, every last PR has to be reviewed, moderated, curated, etc.

Everything AI does that's not specifically intended to be a standalone, separate project requires that sort of intervention.

The safe way to do this is having a sandboxed test environment, high level visibility and a way to quickly and effectively review queued up actions, and then push those to a production environment. You need the interstitial buffer and a way of reverting back to the last known working state, and to keep the bot from having any control over what gets pushed to production.

Giving them realtime access to production is a recipe for disaster, whether it's your personal computer or a set of accounts built specifically for them or whatever, without your human in the loop buffer bad things will happen.

A lot of that can be automated, so you can operate confidently with high level summaries. If you can run a competent local AI and develop strict processes for review and summaries and so forth, kind of a defense in depth approach for agents, you can still get a lot out of ClawBot. It takes work and care.

Hopefully frameworks for these things start developing all of the safety security and procedure scaffolding we need, because OpenClaw and AI bots have gone viral. I'm getting all sorts of questions about how to set them up by completely non-technical people that would have trouble installing a sound system. Very cool to see, I'm excited for it, but there will definitely be some disasters this year.

zahlman•2m ago

> Even the most used feature, coding, causes chaos without strict human oversight.

s/Even/Especially , I would think. Everyone's idea of how to get any decent performance out of an LLM for coding, entails allowing the code to be run automatically. Nominally so that the LLM can see the results and iterate towards a user-provided goal; but it's still untrusted code.

logicx24•18m ago

One insidious thing is whitelists. If you allow the bot to run a command like `API_KEY=fdafsafa docker run ...`, then the API_KEY will be written to a file, and the agent can then read that in future runs. That bit me once already.

dgxyz•16m ago

That's a shit show in a shit show there!

zahlman•3m ago

> If you allow the bot to run a command like `API_KEY=fdafsafa docker run ...`, then the API_KEY will be written to a file

It wouldn't be inherently. Is this something that Docker does? Or perhaps something that was done by the code that was run? (Shouldn't it have stayed within that container?)

But also, if it's not okay for the agent to know the API key permanently, why is it okay for the agent to have one-off use of something that requires the same key? Did it actually craft a Bash command line with the API key set and request to run it; or was it just using a tool that ends up with that command?

luxuryballs•37m ago

makes me wonder if the metal it is running on is even a good enough sandbox, perhaps I should have it browse the web from a guest network isolated from other devices

ChicagoDave•36m ago

I’m late in looking at this OpenClaw thing. Maybe it’s because I’ve been in IT for 40 years or I’ve seen War Games, but who on earth gives an AI access to their personal life?

Am I the only one that finds this mind bogglingly dumb?

chickensong•27m ago

You're not alone

dgxyz•24m ago

No you're not the only one.

I've got my popcorn ready.

AlienRobot•12m ago

I genuinely don't know anymore. Another user linked this https://www.tomshardware.com/tech-industry/artificial-intell... and the irony is at satire levels.

By the way, was that the movie where a boy plays a game with an A.I. and the same A.I. starts a thermonuclear war or something like that? I think I watched the start when I was a kid but never really finished it.

throwpoaster•25m ago

OpenClaw running Opus is intelligent, careful, polite. It has a lot to do with the underlying model.

And if you don’t connect it to stuff, it can’t connect.

logicx24•16m ago

But if I don't connect it to stuff, then what is it useful for?

throwpoaster•12m ago

As long as you’re careful, you can let it meat puppet you (go here do this).

You give it its own accounts, say email and calendar, and have it send you drafts and invite you to stuff. It doesn’t need your email and calendar.

Actually, I just asked my guy and he suggests just generating local ICS files. Even safer.

tonymet•10m ago

There are three ways to authorize agents that could work: (1) scoped roles, (2) PAM / entitlements, or (3) transaction approval.

The first two are common. With transaction approval the agent would operate on shadow pages / files and any writes would batch in a transaction pending owner approval.

For example, sending emails would batch up drafts and the owner would have to trigger the approval flow to send. Modifying files would copy on write and the owner would approve the overwrite. Updating social activity would queue the posts and the owner would approve the publish.

It's about the same amount of work as implementing undo or a tlog. It's not too complex, and given that AI agents are 10000x faster than humans, the big companies should have this ready in a few days.

The problem with scoped roles and PAM is that no reasonable user can know the future and be smart about managing scoped access. But everyone is capable of reading a list of things to do and signing off on them.
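A minimal sketch of the transaction approval idea, assuming the agent can only enqueue writes and the owner decides which ones run. `ApprovalQueue` and `PendingAction` are hypothetical names; a real version would persist the queue and stage file changes copy-on-write.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PendingAction:
    description: str
    execute: Callable[[], None]


@dataclass
class ApprovalQueue:
    pending: List[PendingAction] = field(default_factory=list)

    def propose(self, description, execute):
        """Agent side: record the intended write; nothing runs yet."""
        self.pending.append(PendingAction(description, execute))

    def summary(self):
        """Owner side: the list of things to sign off on."""
        return [a.description for a in self.pending]

    def approve(self, indexes):
        """Run only the approved actions and discard the rest of the batch."""
        chosen = set(indexes)
        for i, action in enumerate(self.pending):
            if i in chosen:
                action.execute()
        self.pending.clear()
```

The owner reads `summary()`, calls `approve([0, 2])`, and anything not listed is dropped without ever running.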
