Can we build our own python sandbox using the sandboxfile spec? This is if I want to add my own packages. Would this be just having my own requirements file here - https://github.com/microsandbox/microsandbox/blob/main/MSB_V...
> Can we build our own python sandbox using the sandboxfile spec?
Yes and I plan to make that work with the SDK.
PS: Multi-stage build is WIP.
Cloud Hypervisor and Firecracker both have an excellent reputation for ultra-lightweight VMs. Both are usable in the very popular Kata Containers project (alongside other upstart VMMs, Dragonball & StratoVirt), which is in use by, for example, the CNCF Confidential Containers project. https://github.com/kata-containers/kata-containers/blob/main... https://confidentialcontainers.org/
There are also smaller efforts such as firecracker-containerd or Virtink, both of which bring OCI-powered microVMs into a Docker-like position (easy to slot into Kubernetes), via Firecracker and Cloud Hypervisor respectively. https://github.com/smartxworks/virtink https://github.com/firecracker-microvm/firecracker-container...
Poking around under the hood, microsandbox appears to use krun. There is krunvm for OCI support (includes MacOS/arm64 support!). https://github.com/containers/krunvm https://github.com/slp/krun
The orientation as a safe sandbox for AI / MCP tools makes for a very nicely packaged-looking experience, and it's very well marketed. Congratulations! I'm still not sure why this warrants being its own project.
So much of the solutions to this stuff I see come from a GitHub repo with a few dozen commits and often a README that says "do not rely on this software yet".
Definitely going to play with it a bit though, I love the idea of hooking into Apple's Hypervisor.framework (which absolutely fits my billion-dollar-company requirement.)
https://dev.to/rimelek/using-gvisors-container-runtime-in-do...
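For context, the setup in that article boils down to registering runsc as an extra Docker runtime, roughly like this (assuming runsc is already installed on the host; exact commands and paths may differ slightly by version):

    sudo runsc install              # adds a "runsc" runtime entry to /etc/docker/daemon.json
    sudo systemctl restart docker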
After this is done, it is:
docker run --rm --runtime=runsc hello-world
I've thought about making one of these for other coding agents. It's not quite as trivial as it looks and I know how to do it, also on Windows, although it seems quite a few coding agents just pretend Windows doesn't exist unfortunately.
I'm also disheartened by how the man pages for some of the macOS sandboxing commands have declared them deprecated for at least the last five years: https://7402.org/blog/2020/macos-sandboxing-of-folder.html
Is AI a special case? Maybe! I have some ideas about how to do AI sandboxing in a way that works more with the grain of macOS, though god knows when I'll find the time for it!
I'm working on a wrapper that lets you swap runtimes, and my first implementation is mostly a wrapper around Docker containers.
Planning to add Firecracker next.
Will explore adding microsandbox too. Cool stuff!
However, by looking at it and playing with a few simple examples, I think this is the one that looks the closest so far.
Definitely interested to see the FS support, and also some instruction on how to customize the images to e.g. pre-install common Python packages or Rust crates. As an example, I tried to use the MCP with some very typical use-cases for code execution that OpenAI/Anthropic models would generate for data analysis, and they almost always include using numpy or an Excel library, so you very quickly hit a wall here without the ability to include libraries.
That said, I don't think either Kata Containers or Cloud Hypervisor has first-class support for macOS.
Edit: when I say anything, I'm not talking user programs. I mean as in, before even the first instruction of the firmware -- before even the virtual disk file is zeroed out, in cases where it needs to be. You literally can't pause the VM during this interval because the window hasn't even popped up yet, and even when it has, you still can't for a while because it literally hasn't started running anything. So the kernel and even firmware initialization slowness are entirely irrelevant to my question.
Why is that?
If I let a VM use most of my hardware, it takes a few seconds from start to login prompt, which is the same time it takes for my Arch desktop to boot from pressing the button to seeing the login prompt.
That's not what I'm asking.
I'm saying it takes a long time for it to even execute a single instruction, in the BIOS itself. Even for the window to pop up, before you can even pause the VM (because it hasn't even started yet). What you're describing comes after all that, which I already understand and am not asking about.
I think Task Manager would tell you if there is a blip of memory usage and paging activity at the time. And I'm sure Windows itself has profilers that can tell you what is happening when the VM is started.
Again, it was a few years ago, but we didn't solve the problem or identify an actual root cause. We stopped banging our heads against that particular wall and switched technologies.
I've noticed that Windows can only evict data from the page cache at about 5 GB/s. I do not know if this zeros the memory or if that would need to be done in the allocation path.
A couple years ago I tracked down a long pause while starting qemu on Linux to it zeroing the 100s of GB of RAM given to the VM as 1 GB huge pages.
These may or may not be big contributors to what you are seeing, depending on the VM’s RAM size.
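For reference, the hugepage-backed setup where that zeroing cost shows up looks roughly like this (sizes and paths are illustrative; 1 GiB pages often have to be reserved at boot via the kernel command line instead):

    # reserve 1 GiB hugepages on the host, then back the guest RAM with them
    echo 128 | sudo tee /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
    qemu-system-x86_64 -m 128G -mem-path /dev/hugepages -mem-prealloc ...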
So this meant VMWare, VirtualBox, etc as they were would no longer work on Windows. Microsoft required all of them to switch to using Hyper-V libs behind the scenes to launch Hyper-V VMs and then present them as their own (while hiding them from the Hyper-V UI).
VirtualBox was slow, hot garbage on its own before this happened, but now it's even worse. They didn't optimize their Hyper-V integration as well as VMWare (eventually) did. VMWare is still worse off than it was though since it has to inherit all of Hyper-V's problems behind the scenes.
Hope this brings some clarity.
You also need to start OS services, configure filesystems, prepare caches, configure networking, and so on. If you're not booting UKIs or using similar tools, you'll also be loading a bootloader, then loading an initramfs into memory, then loading the main OS and starting the services you actually need, with each step requiring certain daemons and hardware probes to work correctly.
There are tools to fix this problem. Amazon's Firecracker can start a Linux VM in a time similar to that of a container (milliseconds) by basically storing the initialized state of the VM and loading that into memory instead of actually performing a real boot. https://firecracker-microvm.github.io/
On Windows, I think it depends on the hypervisor you use. Hyper-V has a pretty slow UEFI environment, its hard disk access always seems rather slow to me, and most Linux distros don't seem to package dedicated minimal kernels for it.
> I'm saying it takes a long time for it to even execute a single instruction, in the BIOS itself. Even for the window to pop up, before you can even pause the VM (because it hasn't even started yet). What you're describing comes after all that, which I already understand and am not asking about.
On windows it was almost 10x faster. On the project where this change was released, my morning ritual was to come in, log on, run an svn pull command, lock my screen and go get coffee. I had at least ten minutes to kill after I got coffee, if the pot wasn’t empty when I got there.
Windows is hot garbage about fopen particularly when virus scanning is on.
For example: your VM starts up with the CPU in 16 bit mode because that’s just how things work in x86 and then it waits for the guest OS to set the CPU into 64 bit mode.
This is completely unnecessary if you just want to run x86-64 code in a virtualized environment and you control the guest kernel and can just assume things are in 64bit mode because it’s not the 70s or whatever
The guest OS would also need to probe a few ports to get a bootable disk. If you control the kernel then you can just not do that and boot directly.
There’s a ton of stuff that isn’t needed
A virtual environment doesn’t even really need any BIOS or anything like that.
You can feel free to test with qemu direct kernel booting to see this skips a lot of delay without even having to use a specialized hypervisor like firecracker
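A minimal sketch of that direct kernel boot with stock qemu (file names are placeholders for whatever kernel/initrd/rootfs you have on hand):

    qemu-system-x86_64 -enable-kvm -m 512M -nographic \
      -kernel ./vmlinuz -initrd ./initrd.img \
      -append "console=ttyS0 root=/dev/vda rw" \
      -drive file=./rootfs.img,format=raw,if=virtio

This skips the bootloader and disk probing, so the kernel starts executing almost immediately.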
I'm using Hyper-V and I can connect through XRDP to a GUI Ubuntu 22 in 10 seconds and I can SSH into a Ubuntu 22 server in 3 seconds after start.
In practice virtual machines are trying to emulate a lot of stuff that isn’t really needed but they’re doing it for compatibility.
If one builds a hypervisor which is optimized for startup speed and doesn’t need to support generalized legacy software then you can:
> Unlike traditional VMs that might take several seconds to start, Firecracker VMs can boot up in as little as 125ms.
Windows probably has an equivalent.
I wonder how this compares to Orbstack's [0] tech stack on macOS, specifically the "Linux machines" [1] feature. Seems like Orb might reuse a single VM?
---
Firecracker is no different btw and E2B uses that for agentic AI workloads. Anyway, I don't have any major plans except to fix some issues with the filesystem rn.
That is an ideal use case
> Are there better alternatives?
Created microsandbox because I didn't find any
gVisor has performance problems, though. Their data shows 1/3rd the throughput vs. docker runtime for concurrent network calls--if that's an issue for your use-case.
We're building an IoT Cloud Platform, Fostrom[1] where we're using Javy to power our Actions infrastructure. But instead of compiling each Action's JS code to a Javy WASM module, I figured out a simpler way by creating a single WASM module with our wrapper code (which contains some further isolation and helpful functions), and we provide the user code as an input while executing the single pre-compiled WASM module.
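In case anyone wants to try that pattern, the rough shape with the Javy CLI looks like this (file names and the input format here are just illustrative; newer Javy releases renamed `compile` to `build`):

    javy compile wrapper.js -o wrapper.wasm        # one-time: compile the wrapper to a single WASM module
    echo '{"user_code": "1 + 1"}' | wasmtime wrapper.wasm

The wrapper reads the user code from stdin and writes its result to stdout, so only one precompiled module is ever needed.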
Overlays are always tough because Docker doesn't like you writing to the filesystem in the first place. The weapon of first resort is deflection: tell them not to do it.
I had to put up with an old docker version that leaked overlay data for quite a while before we moved off prem.
I'd like to see a formal container security grade that works like:
1) Curate a list of all known (container) exploits
2) Run each exploit in environments of increasing security like permissions-based, jail, Docker and emulator
3) The percentage of prevented exploits would be the score from 0-100%
Under this scheme, I'd expect naive attempts at containerization with permissions and jails to score around 0%, while Docker might be above 50% and Microsandbox could potentially reach 100%. This might satisfy some of our intuition around questions like "why not just use a jail?". Also the containers could run on a site on the open web as honeypots with cash or crypto prizes for pwning them to "prove" which containers achieve 100%.
We might also need to redefine what "secure" means, since exploits like Rowhammer and Spectre may make nearly all conventional and cloud computing insecure. Or maybe it's a moving target, like how 64 bit encryption might have once been considered secure but now we need 128 bit or higher.
Edit: the motivation behind this would be to find a container that's 100% secure without emulation, for performance and cost-savings benefits, as well as gaining insights into how to secure operating systems by containerizing their various services.
The only way to make Linux containers a meaningful sandbox is to drastically restrict the syscall API surface available to the sandboxee, which quickly reduces its value. It's no longer a "generic platform that you can throw any workload onto" but instead a bespoke thing that needs to be tuned and reconfigured for every usecase.
This is why you need virtualization. Until we have a properly hardened and memory safe OS, it's the only way. And if we do build such an OS it's unclear to me whether it will be faster than running MicroVMs on a Linux host.
The only meaningful difference is that Linux containers target partitioning Linux kernel services which is a shared-by-default/default-allow environment that was never designed for and has never achieved meaningful security. The number of vulnerabilities resulting from, "whoopsie, we forgot to partition shared service 123" would be hilarious if it were not a complete lapse of security engineering in a product people are convinced is adequate for security-critical applications.
Present a vulnerability assessment demonstrating that a team of 10 with 3 years' time (~10-30 M$, comparable to many commercially-motivated single-victim attacks these days) can find no vulnerabilities in your deployment, or a formal proof of security and correctness; otherwise we should stick with the default assumption that software is easily hacked instead of the extraordinary claim that demands extraordinary evidence.
Seccomp, capabilities, SELinux, AppArmor, etc. can help harden containers, but most of the popular containers don't even drop root for services, and I was one of the people who tried to get Docker/Moby etc. to even let you disable the privileged flag... which they refused to do.
While some CRIs make this easier, any agent that can spin up a container should be considered a super user.
With the Docker --privileged flag I could read the host's root volume or even install EFI BIOS files, just using mknod etc. and walking /sys to find the major/minor numbers.
Namespaces are useful in a comprehensive security plan, but as you mentioned, they are not jails.
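For what it's worth, the basic hardening knobs on a plain docker run look something like this (image name is a placeholder; still not a jail, but it shrinks the surface considerably):

    docker run --rm \
      --user 1000:1000 --cap-drop ALL \
      --security-opt no-new-privileges \
      --read-only --pids-limit 256 \
      some-image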
It is true that both VMs and containers have attack surfaces, but the size of the attack surface on containers is much larger.
There are VMMs (e.g. pKVM in upstream Linux) with small SLoC that are isolated by silicon support for nested virtualization. This can be found on recent Google Pixel phones/tablets with strong isolation of untrusted Debian Arm Linux "Terminal" VM.
A similar architecture was shipped a decade ago by Bromium and now on millions of HP business laptops, including hypervisor isolation of firmware, "Hypervisor Security : Lessons Learned — Ian Pratt, Bromium — Platform Security Summit 2018", https://www.youtube.com/watch?v=bNVe2y34dnM
Christian Slater, HP cybersecurity ("Wolf") edutainment on nested virt hypervisor in printers, https://www.youtube.com/watch?v=DjMSq3n3Gqs
Is there any guarantee that this "silicon support" is any safer than the software? Once we break the software abstraction down far enough it's all just configuring hardware. Conversely, once you start baking significant complexity into hardware (such as strong security boundaries) it would seem like hardware would be subject to exactly the same bugs as software would, except it will be hard to update of course.
Safety and security claims are only meaningful in the context of threat models. As described in the Xen/uXen/AX video, pKVM and AWS Nitro security talks, one goal is to reduce the size, function and complexity of open-source code running at the highest processor privilege levels [1], minimizing dependency on closed firmware/SMM/TrustZone. Nitro moved some functions (e.g. I/O virtualization) to separate processors, e.g. SmartNIC/DPU. Apple used an Arm T2 secure enclave processor for encryption and some I/O paths, when their main processor was still x86. OCP Caliptra RoT requires OSS firmware signed by both the OEM and hyperscaler customer. It's a never-ending process of reducing attack surface, prioritized by business context.
> hardware would be subject to exactly the same bugs as software would, except it will be hard to update of course
Some "hardware" functions can be updated via microcode, which has been used to mitigate speculative execution vulnerabilities, at the cost of performance.
[1] https://en.wikipedia.org/wiki/Protection_ring
[2] https://en.wikipedia.org/wiki/Transient_execution_CPU_vulner...
You can create security boundaries around (and even within!) the VMM. You can make it so an escape into the VMM process has only minimal value, by sandboxing the VMM aggressively.
Plus you can absolutely escape the model of C++ emulating devices. Ideally I think VMMs should do almost nothing but manage VF passthroughs. Of course then we shift a lot of the problem onto the inevitably completely broken device firmware but again there are more ways to mitigate that than kernel bugs.
Intuitively there are differences. The Linux kernel is fucking huge, and anything that could bake the "shared resources" down to less than the entire kernel would be easier to verify, but that would also be true for an entirely software based abstraction inside the kernel.
In a way it's the whole micro kernel discussion again.
If you escape into a VMM you can do whatever the VMM can do. You can build a system where it can not do very much more than the VM guest itself. By the time the guest boots the process containing the vCPU threads has already lost all its interesting privileges and has no credentials of value.
Similar with device passthrough. It's not very interesting if the device you're passing through ultimately has unchecked access to PCIe but if you have a proper ioMMU set up it should be possible to have a system where pwning the device firmware is just a small step rather than an immediate escalation to root-equivalent. (I should say, I don't know if this system actually exists today, I just know it's possible).
With a VMM escape your next step is usually to exploit the kernel. But if you sandbox the VMM properly there is very limited kernel attack surface available to it.
So yeah you're right it's similar to the microkernel discussion. You could develop these properties for a shared-kernel container runtime... By making it a microkernel.
It's just that isn't a path with any next steps in the real world. The road from Docker to a secure VM platform is rich with reasonable incremental steps forward (virtualization is an essential step but it's still just one of many). The road from Docker to a microkernel is... Rewrite your entire platform and every workload!
It appears we find ourselves at the Theory/Praxis intersection once again.
> The road from Docker to a secure VM platform is rich with reasonable incremental steps forward
The reason it seems so reasonable is that it's well trodden. There were an infinity of VM platforms before Docker, and they were all discarded for pretty well known engineering reasons, mostly to do with performance, but also for being difficult for developers to reason about. I have no doubt that there's still dialogue worth having between those two approaches, but cgroups isn't a "failed" VM security boundary any more than Linux is a failed microkernel. It never aimed to be a VM-like security boundary.
For example there is Kata containers
This can be used with regular `podman` by just changing the container runtime, so there's not even a need for any extra tooling.
In theory you could shove the container runtime into something like k8s
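Roughly like this, assuming Kata is installed and registered as a runtime for podman (the runtime name/path varies by distro and how it's set up in containers.conf):

    podman run --rm --runtime kata alpine uname -r   # prints the guest VM's kernel, not the host's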
True, by "container" I really meant "shared-kernel container".
> In theory you could shove the container runtime into something like k8s
Yeah this is actually supported by k8s.
Whether that means it's actually reasonable to run completely untrusted workloads on your own cluster is another question. But it definitely seems like a really good defense-in-depth feature.
Depends I guess as Android has had quite a bit of success with seccomp-bpf & Android-specific flavour of SELinux [0]
> Until we have a properly hardened and memory safe OS ... faster than running MicroVMs on a Linux host.
Andy Tanenbaum might say, Micro Kernels would do just as well.
Exactly. Android pulls this off by being extremely constrained. It's dramatically less flexible than an OCI runtime. If you wanna run a random unenlightened workload on it you're probably gonna have a hard time.
> Micro Kernels would do just as well.
Yea this goes in the right direction. In the end a lot of kernel work I look at is basically about trying to retrofit benefits of microkernels onto Linux.
Saying "we should just use an actual microkernel" is a bit like "Russia and Ukraine should just make peace" IMO though.
I think it's generally understood that any sort of kernel LPE can potentially (and therefore is generally considered to) lead to breaking all security boundaries on the local machine, since the kernel contains no internal security boundaries. That includes both containers, but also everything else such as user separation, hardware virtualization controlled by the local kernel, and kernel private secrets.
In some architectures, kernel LPE does not break platform (L0/EL2) virtualization, https://news.ycombinator.com/item?id=44141164
    L0/EL2    L1/EL1
    pKVM      KVM
    AX        Hyper-V / Xen / ESX
There is no inherent advantage to virtualization, the only thing that matters is the security and robustness of the privileged host.
The only reason there is any advantage in common use is that the Linux Kernel is a security abomination designed for default-shared/allow services that people are now trying to kludge into providing multiplexed services. But even that advantage is minor in comparison to modern, commonplace threat actors who can spend millions to tens of millions of dollars finding security vulnerabilities in core functions and services.
You need privileged manager code that a highly skilled team of 10 with 3 years to pound on it can not find any vulnerabilities in to reach the minimum bar to be secure against prevailing threat actors, let alone near-future threat actors.
Basically I'd love to see a giant ablation
I welcome alternatives. It's been tough wrestling with Firecracker and OCI images. Kata Containers is also tough.
Microsandbox does not offer a cloud solution. It is self-hosted, designed to do what E2B does: to make it easier to work with microVM-based sandboxes on your local machine, whether that is Linux, macOS, or Windows (planned), and to seamlessly transition to prod.
> Do you also use Firecracker under the hood?
It uses libkrun.
What I like about containers is how quickly I can run something, e.g. `docker run --rm ...`, without having to specify disk size, number of CPU cores, etc. I can then diff the state of the container with the image (and other things) to see what some program did while it ran.
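That diffing step, for anyone unfamiliar, is just (image and command are placeholders):

    docker run --name scratch alpine sh -c 'echo hi > /etc/demo'
    docker diff scratch        # A = added, C = changed, D = deleted
    docker rm scratch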
So I basically want the same but instead with small vms to have better sandboxing. Sometimes I also use bwrap but it's not really intended to be used on the command line like that.
Then the installation instructions include piping a remote script directly to Bash ... Oh irony ...
That said, the concept itself is intriguing.
    mount

you immediately see what I mean. Stuff that should be hidden is now in plain sight, and it destroys the usefulness of simple system commands. And worse, the user can fiddle with the data structures. It's like giving the user peek and poke commands.

The idea of containers is nice, but they are a hack until kernels are re-architected.
    findmnt --real

It's part of util-linux, so it is generally available wherever you have a shell. The legacy tools you have in mind aren't ever going to be changed as you would wish, for reasons.

PS: microsandbox will likely have its own OCI registry in the future.
I want to run sandboxes based on Docker images that have Nix pre-installed. (Once the VM boots, apply the project-specific Flake, and then run Docker Compose for databases and other supporting services.) In theory, an easy-to-use, fully isolated dev environment that matches how I normally develop, except inside of a VM.
Nix, on the other hand, solves the problem of building reproducible environments... but making said environments safe for running untrusted code is left as an exercise for the reader.
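The in-VM bootstrap I have in mind is nothing fancy, roughly (flake output and compose file are project-specific):

    # inside the freshly booted sandbox VM
    nix develop .#default      # apply the project flake's dev shell
    docker compose up -d       # databases and other supporting services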
appcypher•1d ago
I'm the creator of microsandbox. If there is anything you need to know about the project, let me know.
This project is meant to make creating microvms from your machine as easy as using Docker containers.
Ask me anything.
esafak•1d ago
Edit: A fleshed-out contributor's guide for adding support for a new language would help. https://github.com/microsandbox/microsandbox/blob/main/CONTR...
appcypher•1d ago
0cf8612b2e1e•1d ago
How is it so fast? Is it making any trade offs vs a traditional VM? Is there potential the VM isolation is compromised?
Can I run a GUI inside of it?
Do you think of this as a new Vagrant?
How do I get data in/out?
appcypher•1d ago
It is a lightweight VM and uses the same technology as Firecracker.
> Can I run a GUI inside of it?
It is planned but not yet implemented. But it is absolutely possible.
> Do you think of this as a new Vagrant?
I would consider it Docker for VMs instead. In a similar way, it focuses on dev-ops-type use cases like deploying apps, etc.
> How do I get data in/out?
There is an SDK and server that help with that, and file streaming is planned. But right now, you can execute commands in the VM and get the results back via the server.
westurner•1d ago
Native Containers would probably be a solution here, too.
From https://news.ycombinator.com/item?id=43553198 :
>>> ostree native containers are bootable host images that can also be built and signed with a SLSA provenance attestation; https://coreos.github.io/rpm-ostree/container/
And also from that thread:
> How should a microkernel run (WASI) WASM runtimes?
What is the most minimal microvm for WASM / WASI, and what are the advantages to running WASM workloads with firecracker or microsandbox?
appcypher•1d ago
By setting up an image with wasmtime for example.
> and what are the advantages to running WASM workloads with firecracker or microsandbox?
I can think of stronger isolation or when you have legacy stuff you need to run alongside.
westurner•1d ago
> AWS built [Firecracker (which is built on KVM)] to power Lambda and Fargate [2], where they need to quickly spin up isolated environments for running customer code. Companies like E2B use Firecracker to run AI-generated code securely in the cloud, while Fly.io uses it to run lightweight container-like VMs at the edge [4, 5].
"We replaced Firecracker with QEMU" (2023) https://news.ycombinator.com/item?id=36666782
"Firecracker's Kernel Support Policy" describes compatible kernel configurations; https://github.com/firecracker-microvm/firecracker/blob/main...
/? wasi microvm kernel [github] https://www.google.com/search?q=wasi+microvm+kernel+GitHub :
- "Mewz: Lightweight Execution Environment for WebAssembly with High Isolation and Portability using Unikernels" (2024) https://arxiv.org/abs/2411.01129 similar: https://scholar.google.com/scholar?q=related:b3657VNcyJ0J:sc...
hugs•1d ago
Question: How does networking work? Can I restrict/limit microvms so that they can only access public IP addresses? (or in other words... making sure the microvms can't access any local network IP addresses)
appcypher•1d ago
https://github.com/microsandbox/microsandbox/blob/0c13fc27ab...
hugs•1d ago
(also, this project is really cool. great work!)
appcypher•1d ago
hugs•1d ago
simonw•1d ago
appcypher•1d ago
wolfhumble•1d ago
Congratulations on the launch!
appcypher•1d ago
That said, hosting microVMs requires dedicated hardware or VMs with nested virt support. Containers don't have that problem.
nqzero•1d ago
1. each one should have its own network config, eg so i can use wireguard or a vpn
2. gui pass-through to the host, eg wayland, for trusted tools, eg firefox, zoom or citrix
3. needs to be lightweight. eg gnome-boxes is dead simple to setup and run and it works, but the resource usage was noticeably higher than native
4. optional - more security is better (ie, i might run semi-untrusted software in one of them, eg from a github repo or npm), but i'm not expecting miracles and accept that escape is possible
5. optional - sharing disk with the host via COW would be nice, so i'd only need to install the env-specific packages, not the full OS
i'm currently working on a podman solution, and i believe that it will work (but rebuilding seems to hammer the network - i'm hoping i can tweak the layers to reduce this). does microsandbox offer any advantages for this use case ?
appcypher•1d ago
This is possible right now but the networking is not where I want it to be yet. It uses libkrun's default TSI (Transparent Socket Impersonation) implementation; it's performant and simplifies setup, but it can be inflexible. I plan to implement an alternative user-space networking stack soon.
> 2. gui pass-through to the host, eg wayland, for trusted tools, eg firefox, zoom or citrix
We don't have GUI passthrough. VNC?
> 3. needs to be lightweight. eg gnome-boxes is dead simple to setup and run and it works, but the resource usage was noticeably higher than native
It is lightweight in the sense that it is not a full VM.
> 4. optional - more security is better (ie, i might run semi-untrusted software in one of them, eg from a github repo or npm), but i'm not expecting miracles and accept that escape is possible
The security guarantees are similar to what typical VMs support. It is hardware-virtualized so I would say you should be fine.
> 5. optional - sharing disk with the host via COW would be nice, so i'd only need to install the env-specific packages, not the full OS
Yeah. It uses virtio-fs and has overlayfs on top of that for COW.
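Conceptually, the layering inside the guest looks something like this (illustrative mounts, not the exact commands microsandbox runs):

    mount -t virtiofs shared /mnt/lower            # host directory exposed to the guest
    mount -t overlay overlay \
      -o lowerdir=/mnt/lower,upperdir=/upper,workdir=/work /merged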
simonw•1d ago
The documented code pattern is this:
Due to the way my code works I want to instantiate the sandbox once for a specific class and then have multiple calls to it by class methods, which isn't a clean fit for that "async with" pattern. Any recommendations?
appcypher•22h ago
There is an example of that here:
https://github.com/microsandbox/microsandbox/blob/0c13fc27ab...
gcharbonnier•22h ago
https://docs.python.org/3/library/contextlib.html#contextlib...
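Presumably that's pointing at AsyncExitStack. A sketch of how it maps onto the class-based setup; the sandbox class, factory, and run method names here are assumptions based on the documented example, not verified against the SDK:

    import contextlib
    from microsandbox import PythonSandbox   # import path / class name assumed

    class CodeRunner:
        """Holds one sandbox open across many method calls."""

        async def start(self):
            self._stack = contextlib.AsyncExitStack()
            # enter_async_context keeps the `async with` body "open" until the stack is closed
            self.sandbox = await self._stack.enter_async_context(
                PythonSandbox.create(name="code-runner")
            )

        async def run(self, code: str):
            return await self.sandbox.run(code)

        async def aclose(self):
            await self._stack.aclose()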
codethief•21h ago
Nypro•20h ago
codethief•7h ago
One more question: What syscalls do I need to have access to in order to run a MicroVM? I'm asking because ideally I'd like to run container workloads inside existing containers (self-hosted GitLab CI runners) whose configuration (including AppArmor) I don't control.
Hilift•20h ago
Nypro•20h ago
Networking continues to be a pain but I'm open to suggestions.
catlifeonmars•16h ago
appcypher•9h ago
nulld3v•15h ago
appcypher•11h ago
spicybright•14h ago
appcypher•11h ago
meander_water•13h ago
[0] https://katacontainers.io/
appcypher•11h ago
More important is making sandboxing really accessible to AI devs with `msb server`.
nikolamus•5h ago
appcypher•4h ago