This is comedy gold. If I didn't know better I'd say you hurt Claude in a previous session and it saw its opportunity to get you back.
Really not much evidence at all this actually happened, I call BS.
People are really ignorant when it comes to the safeguards that you can put in place for AI. If it's running on your computer and can run arbitrary commands, it can wipe your disk, that's it.
Honestly, I was stunned that there was no more explicit mention of this in the Anthropic docs after reading this post a couple of days back.
Sandbox mode seems like a false sense of security.
Short of containerizing Claude, there seems to be no other truly safe option.
:)
Surely you don't think everything that's happening in Claude Code is purely LLMs running in a loop? There's tons of real code that runs to correctly route commands, enable MCP, etc.
Sandboxes are hard, because computer science.
The `--dangerously-skip-permissions` flag does exactly what it says: it bypasses every guardrail and runs commands without asking you. Some guides I’ve seen stress that you should only ever run it in a sandboxed environment with no important data ("Claude Code dangerously-skip-permissions: Safe Usage Guide" [1]).
Treat each agent like a non-human identity: give it just enough privilege to perform its task and monitor its behavior ("Best Practices for Mitigating the Security Risks of Agentic AI" [2]).
I go even further. I never let an AI agent delete anything on its own. If it wants to clean up a directory, I read the command and run it myself. It's tedious, BUT it prevents disasters.
ALSO there are emerging frameworks for safe deployment of AI agents that focus on visibility and risk mitigation.
It's early days... but it's better than YOLO-ing with a flag that literally has 'dangerously' in its name.
[1] https://www.ksred.com/claude-code-dangerously-skip-permissio...
[2] https://preyproject.com/blog/mitigating-agentic-ai-security-...
That was the last time I ran Claude Code outside of a Docker container.
No thanks, containers it is.
"Read" is not at the top of my list of fears.
I am! To the point that I don’t believe it!
You’re running an agentic AI and can parse through the logs, but you can’t sandbox or back up?
Like, I’ve given Copilot permission to fuck with my admin panel. It immediately proceeded to bill thousands of dollars creating heat maps of the density of structures in Milwaukee; buying subscriptions to SAP Joule and ArcGIS for Teams; and generating terabytes of nonsense maps, ballistic paths and “architectural sketch[es] of a massive bird cage the size of Milpitas, California (approximately 13 square miles)” resembling “a futuristic aviary city with large domes, interconnected sky bridges, perches, and naturalistic environments like forests, lakes, and cliffs inside.”
But support immediately refunded everything, I had backups and the whole thing was hilarious if irritating.
What I've done is write a PreToolUse hook to block all `rm -rf` commands. I've also seen others use shell functions to intercept `rm` commands and have it either return a warning or remap it to `trash`, which allows you to recover the files.
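For reference, a minimal sketch of what such a PreToolUse hook script might look like in Python. The JSON payload shape and the blocking exit-code convention are assumptions based on how Claude Code hooks are generally described (the hook receives the tool call as JSON, and a blocking status rejects it), and the regex is deliberately naive — it won't catch variants like `rm --recursive --force`:

```python
import json
import re

# Naive pattern for "rm -rf" / "rm -fr" style recursive force deletes.
# This is a sketch; a real hook should handle more flag spellings.
DANGEROUS = re.compile(r"\brm\s+(-[a-zA-Z]*r[a-zA-Z]*f|-[a-zA-Z]*f[a-zA-Z]*r)\b")

def should_block(command: str) -> bool:
    """Return True for commands that look like a recursive force delete."""
    return bool(DANGEROUS.search(command))

# Example payload in the shape a PreToolUse hook is assumed to receive.
# In a real hook you'd read this with json.load(sys.stdin) and exit with
# the blocking status code when should_block() is True.
sample = json.loads(
    '{"tool_name": "Bash", "tool_input": {"command": "rm -rf ~/projects"}}'
)
print(should_block(sample["tool_input"]["command"]))  # True -> block
```

The same `should_block` check could just as easily be wired into a shell wrapper that remaps `rm` to `trash` instead of refusing outright.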
Why special-case it as a non-human? I wouldn't even give a trusted friend a shell on my local system.
EDIT: OH MY GOD
I assume yes.
mv ~/. /dev/null
Better.
Extra points if you achieve that one also:
mv /. /dev/null
Slashdot aficionados might object to that last one, though.
mv /bin/laden /dev/null
and then someone explained how that was broken: even if it succeeds, what you've done is replace the device file /dev/null with the regular file that was previously at /bin/laden. Then, whenever other things redirect their output to /dev/null, they'll be overwriting this random file rather than having their output discarded immediately, which is moderately bad.
Your version will just fail (even assuming root) because mv won't let you replace a file with a directory.
python3 -c "import os; os.unlink(os.path.expanduser('~/.bashrc'))"
allowlist and denylist (or blocklist)
Also "cat". Because I've had to change a few passwords after .env snuck in there a couple times.
Also giving general access to a folder, even for the session.
Also when working on the homelab network it likes to prioritize disconnecting itself from the internet before a lot of other critical tasks in the TODO list, so it screws up the session while I rebuild the network.
Also... ok maybe I've started backing off from the sun.
Or maybe it's just fake. It's probably easy Reddit clout to post this kind of thing.
But Claude Code is honestly so so much better, the way it can make surgical edits in-place.
Just avoid using the --dangerously-skip-permissions flag, which was OP’s downfall!
I have a script that clones a VM from a base one and sets up the agent and the code base inside.
I also mount read-only a few host directories with data.
I still have exfiltration/prompt injection risks. I'm looking at adding URL allowlists, but it's not trivial: basically you need an HTTP proxy, since firewalls work on IPs, not URLs.
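As a sketch of the allowlist half of that (the hostnames below are placeholders, and the actual proxy wiring — e.g. a mitmproxy addon that consults this check on every request — is left out):

```python
# Hypothetical allowlist; adjust for whatever the agent legitimately needs.
ALLOWED_HOSTS = {
    "api.anthropic.com",
    "pypi.org",
    "files.pythonhosted.org",
}

def host_allowed(host: str) -> bool:
    """Allow an exact allowlisted host or any subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == h or host.endswith("." + h) for h in ALLOWED_HOSTS)

print(host_allowed("pypi.org"))       # True
print(host_allowed("evil-pypi.org"))  # False: the suffix check requires a dot boundary
```

The dot-boundary check matters — a plain `endswith("pypi.org")` would happily let `evil-pypi.org` through.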
Those who don’t know history are doomed to repeat it. Those who know history are doomed to know that it’s repeating. It’s a personal hell that I’m in. Pull up a chair.
Consider cases like these to be canaries in the coal mine. Even if you're operating with enough wisdom and experience to avoid this particular mistake, a dangerous prompt might appear more innocuous, or you may accidentally ingest malicious files that instruct the agent to break your system.
One can go a bit crazy with it, using zsh's chpwd hook, so that a sandbox is created upon entry into a project directory and disposed of upon exit. That way one doesn't have to _think_ about sandboxing something.
I love to use these advanced models, but these horror stories are not surprising.
Some men get all the fun...
No LLM needed.
It still boggles my mind that people give them any autonomy, as soon as I look away for a second Claude is doing something stupid and needs to be corrected. Every single time, almost like it knows...
I have written a tool to easily run the agents inside a container that mounts only the current directory.