Even with npm/pip, these may not be available on a base Linux box.
Even then, some complex projects may need other tools that are not part of a base system (command line tools, redis, ...).
Just give it its own machine and let it check out any code.
I PXE boot it from a known image when I feel the need.
With these powers there's a lot less back-and-forth with me running commands, copying the output, pasting it to Claude, etc.
I'm sure you've had the case where you had to instruct someone to do something (e.g. playing tech support with family, helping another engineer, etc). While it helps the other person learn, it feels soooo slow vs just doing it yourself :) And since I don't have to teach the agent, I think this approach makes sense.
Syncthing works well for getting a local copy of a directory from the VM.
One frustrating thing about these solutions is that they’re great for preventing Claude from breaking a machine, but there’s no pervasive sandbox for third-party services.
So it's basically adding "don't delete my files pretty please" to the prompt?
EDIT: I misread, the natural language description of the rule is just a shortcut to generate the actual rule which is based on regexp patterns.
Still, it only protects you against very specific commands. Won't help you if the LLM decides to fill your disk with `cat /dev/urandom > foo` for example.
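To make that limitation concrete, here is a minimal sketch of a regexp-based denylist. The patterns and the `is_blocked` helper are hypothetical, made up for illustration, and are not the rules any real tool ships with. The point is only that a pattern list catches the commands someone thought of in advance and nothing else.

```python
import re

# Hypothetical denylist: illustrative patterns only, not taken from any real tool.
DENY_PATTERNS = [
    re.compile(r"\brm\s+-[a-zA-Z]*[rR]"),   # rm with a recursive flag (-r, -rf, -fr, ...)
    re.compile(r"\bmkfs(\.\w+)?\b"),        # formatting a filesystem
    re.compile(r"\bdd\s+.*\bof=/dev/"),     # dd writing straight to a device
]


def is_blocked(command: str) -> bool:
    """Return True if any denylist pattern matches the command string."""
    return any(pattern.search(command) for pattern in DENY_PATTERNS)


if __name__ == "__main__":
    for cmd in ["rm -rf ~/project", "cat /dev/urandom > foo", "find . -delete"]:
        print(f"{cmd!r}: {'blocked' if is_blocked(cmd) else 'allowed'}")
```

The disk-filling `cat` and the `find . -delete` both pass, because nothing in the list anticipated them. An allowlist or a real sandbox fails in the safer direction.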
https://code.claude.com/docs/en/sandboxing#sandboxing
> Claude Code includes an intentional escape hatch mechanism that allows commands to run outside the sandbox when necessary. When a command fails due to sandbox restrictions (such as network connectivity issues or incompatible tools), Claude is prompted to analyze the failure and may retry the command with the dangerouslyDisableSandbox parameter.
The ability for the agent itself to decide to disable the sandbox seems like a flaw. But do I understand correctly that this would cause a pause to ask for the user's approval?
[0] https://github.com/anthropics/claude-code/issues/14268
Side note: I wish Anthropic would open source Claude Code. Filing an issue is like tossing toilet paper into the wind.
There's a bug in that it can't output smart quotes “like this”.
Sonnet, Opus et al. think they output it, but something in the pipeline is rewriting it.
https://github.com/firasd/vibesbench/blob/main/docs/2026/A/t...
Try it in Claude Code and you'll see what I mean! Very weird
The only access the container has is to the folders that are bind mounted from the host’s filesystem. The container gets network access from a transparent proxy.
https://github.com/dogestreet/dev-container
Much more usable than setting up a VM and you can share the same desktop environment as the host.
I ended up getting a mini-PC dedicated solely to running agents in dangerous mode. It's refreshing to not have to think too much about sandboxing.
This allows you to use Claude Code from your mobile device, in a safe environment (restricted Kubernetes pod)
https://old.reddit.com/r/ClaudeAI/comments/1pgxckk/claude_cl...
as
"Bash(az resource:*)",
is much more permissive than
"Bash(az resource show:*)",
It mostly gets it right, but I instantly fix the file with the "readonly" version when it gets it too open.
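A rough sketch of why the shorter rule is the more permissive one, assuming the rules use a trailing `:*` as a prefix wildcard. The `rule_allows` function is a hypothetical stand-in written for this comment, not Claude Code's actual matcher, and the example `az` commands are made up.

```python
def rule_allows(rule: str, command: str) -> bool:
    """Check a command against a rule such as 'Bash(az resource:*)'.

    Assumed semantics (not the real implementation): text before ':*' is a
    literal prefix; without ':*' the rule must match the command exactly.
    """
    if not (rule.startswith("Bash(") and rule.endswith(")")):
        return False
    pattern = rule[len("Bash("):-1]
    if pattern.endswith(":*"):
        prefix = pattern[:-2]
        # Match the prefix itself, or the prefix followed by further arguments.
        return command == prefix or command.startswith(prefix + " ")
    return command == pattern


if __name__ == "__main__":
    broad = "Bash(az resource:*)"
    narrow = "Bash(az resource show:*)"
    for cmd in ["az resource show --ids X", "az resource delete --ids X"]:
        print(cmd, "| broad:", rule_allows(broad, cmd), "| narrow:", rule_allows(narrow, cmd))
```

Under these assumptions the broad rule lets the `delete` through while the narrow rule only covers `show`, which is the difference between a read-only allowance and a blanket one.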
There is definitely a real-world risk. You should browse the AI coding subreddits; the regularity of `rm -rf` disasters is, sadly, a great source of entertainment for me.
There's not a tonne of tooling for that use case now, although it's not too hard to put together. I vibe-coded something that works for my use case fairly quickly (CC + Opus 4.5 seemed to understand what's needed).
I needed a way to run Claude marketplace agents via Discord. Problem: agents can execute code, hit APIs, touch the filesystem—the dangerous stuff. Can't do that in a Worker's 30s timeout.
Solution: Worker handles Discord protocol (signature verification, deferred response) and queues the task. Cloudflare Sandbox picks it up with a 15min timeout and runs claude --agent plugin:agent in an isolated container. Discord threads store history, so everything stays stateless. Hono for routing.
This was surprisingly little glue. And the Cloudflare MCP made it a breeze to debug (instead of headbanging against the dashboard). Still working on getting E2E latency down.
I have such a love/hate relationship with VirtualBox. It's so useful but so buggy. My current installation has a bug that causes high network latency, but I'm afraid to upgrade in case it introduces new, worse bugs.
VMware is a million times better, but it is also Proprietary™
I do believe in the whole RMS "respects the user's freedoms" spiel, so all things being equal I prefer FOSS, even if it's worse - but there are limits.
There are plenty of ways that a repository owner can get an LLM agent to execute code on users' machines, so it's not a good plan to let them run on your main laptop/desktop.
Personally, my approach has been to put all my agents in a dedicated VM and then give them a scratch test server with nothing on it when they need to do something that requires bare metal.
I'm working on targeting both the curl|bash pattern and coding agents with this (via smart out of the box profiles). Early stages but functional. Feedback and bug reports would be appreciated.
Instead you can just mount the socket and call docker from within docker.
Shannot[0] captures intent before execution. Scripts run in a PyPy sandbox that intercepts all system calls - commands and file writes get logged but don't happen. You review in a TUI, approve what's safe, then it actually executes.
The trade-off vs VMs: VMs let Claude do anything in isolation, Shannot lets Claude propose changes to your real system with human approval. Different use cases - VMs for agentic coding, whereas this is for "fix my server" tasks where you want the changes applied but reviewed first.
There's MCP integration for Claude, remote execution via SSH, checkpoint/rollback for undoing mistakes.
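For anyone trying to picture the record-first, apply-after-approval flow, here is a toy sketch. It is not Shannot's API or implementation (which, per the description above, intercepts system calls in a PyPy sandbox). The `ReviewQueue` and `PendingWrite` names are hypothetical, and this version only queues file writes at the application level.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable


@dataclass
class PendingWrite:
    """A file write that has been proposed but not performed."""
    path: Path
    content: str


@dataclass
class ReviewQueue:
    """Collects proposed writes; nothing touches disk until apply()."""
    pending: list[PendingWrite] = field(default_factory=list)

    def write_file(self, path, content: str) -> None:
        # Record the intent only; the filesystem is left untouched here.
        self.pending.append(PendingWrite(Path(path), content))

    def apply(self, approve: Callable[[PendingWrite], bool]) -> list[PendingWrite]:
        # Perform only the writes the reviewer approves, then clear the queue.
        applied = []
        for op in self.pending:
            if approve(op):
                op.path.write_text(op.content)
                applied.append(op)
        self.pending.clear()
        return applied
```

In a real tool the `approve` callback would be the human sitting in the TUI; here it is just a function that returns True or False per proposed write.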
Feedback greatly appreciated!
ompogUe•6d ago
emilburzo•6d ago
But if you need something more strict, 'config.vm.synced_folder' also supports 'type: "rsync"', which copies the source folder to the VM at startup, but then it's on you to sync it back or whatever.
ompogUe•5d ago
Thanks
gregoriol•1h ago
gcr•1h ago
Version control ain’t a match for a good backup
gregoriol•41m ago