I have heard that gVisor isn't recommended for every production workload, but rather only for some front-facing services and similar. It has some serious performance degradation, which is why most end up using Firecracker.
This is really cool though, does this mean that we could probably have AI models that are snapshotted?
Is the checkpoint/restore state encrypted by default, or how would that even work? What are the privacy aspects of it? I don't think even using something like Modal would be the private LLM that many people on subreddits like r/LocalLLaMA want when they don't have a GPU. Of course nothing beats the privacy of owning your own GPUs, but I'd be curious to know what people's thoughts are.
If Modal's customers' workloads are mainly GPU-bound, then gVisor's performance hit isn't as big as it might be for other workloads. GPU activity does have to go through the fairly heavyweight nvproxy to be executed on the host, but most GPU activity consists of longer-lived async calls like launching kernels, so a bit of overhead in starting those calls and retrieving their results can be tolerated.
So I can agree that Modal might make sense for LLMs, but they position themselves as a sandbox for things like running arbitrary Python code, and some of those workloads may be more syscall-intensive than others, so I just wanted to point that out.
Fly.io uses Firecracker, so I kinda like Firecracker-related applications (I tried to run Firecracker myself; it's way too hard to build your own Firecracker-based provider or anything), and they recently released https://sprites.dev/
E2B is another well-known solution out there. I talked to their developers once and they mentioned that they run it on top of GCP.
I am really interested in Kata Containers as well, because I think Kata runs on top of Firecracker and can hook into Docker rather quickly.
Kata runs atop many hypervisors, but it's a little awkward because it creates a "pod" (a VM) inside which it creates one or more containers (runc/gVisor). Firecracker is also awkward because GPU support is pretty hard, if not impossible.
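For context on how these plug into Docker: both gVisor and Kata register as alternative OCI runtimes in the daemon config. A sketch of /etc/docker/daemon.json (the binary paths are assumptions; adjust for your install):

```json
{
  "runtimes": {
    "runsc": { "path": "/usr/local/bin/runsc" },
    "kata":  { "path": "/usr/bin/kata-runtime" }
  }
}
```

After restarting the daemon you can pick a runtime per container, e.g. `docker run --runtime=runsc ...`. Note that newer Kata releases are typically wired through containerd's shim v2 interface rather than a classic runtime path like this.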
Does anyone know of a more efficient alternative if you’re running a trusted container?
- what is the difference between docker and modal?
- what does modal do that docker doesn't?
- what is the cold start time comparison between both?
- how do both of these differ from something called "Firecracker VM"?
With Intel VMX virtualization, instruction execution is handled by the CPU, but a lot of software still has to deal with HW peripheral emulation.
QEMU uses KVM (Intel VMX, etc.) but implements HW peripherals (display, network, disk, etc.) faithfully matching real HW, and provides a full BIOS (SeaBIOS) or UEFI firmware (EDK II) to deal with the boot process.
Over time, Linux (and Windows) were extended to support novel "peripherals" designed for high emulation performance (e.g. virtio devices, not real HW products).
Firecracker basically skips all the "real" peripheral emulation and the full BIOS/UEFI firmware. Instead, it implements just enough to boot modern Linux directly. It's also written in Rust instead of C. It will never support DOS, Windows 95, or probably anything else.
The "microVM" boot flow allows it to start booting Linux very quickly (sub-second), whereas a traditional QEMU VM might take 2-5 seconds. This has emboldened some people to effectively move back from containers to running applications in a VM…
Instead of the VM being long lived, it is really just for running a single app.
I think Kata Containers has had this idea for much longer, but Firecracker provides a more efficient implementation of it.
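To make the "just enough to boot Linux" point concrete, a whole Firecracker microVM can be described by a small JSON config: an uncompressed kernel, a kernel command line, a virtio block device, and a machine size. The field names follow Firecracker's config-file format; the paths here are placeholders:

```json
{
  "boot-source": {
    "kernel_image_path": "vmlinux",
    "boot_args": "console=ttyS0 reboot=k panic=1"
  },
  "drives": [
    {
      "drive_id": "rootfs",
      "path_on_host": "rootfs.ext4",
      "is_root_device": true,
      "is_read_only": false
    }
  ],
  "machine-config": {
    "vcpu_count": 1,
    "mem_size_mib": 128
  }
}
```

You can then launch it with something like `firecracker --api-sock /tmp/fc.sock --config-file vm_config.json`: no BIOS, no emulated disk controller, no firmware boot stage, which is where the sub-second boot comes from.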