Reduce GVisor Cold Starts with GPU Snapshotting

https://cerebrium.ai/blog/reducing-gpu-cold-starts-with-memory-snapshots-restoring-cuda-workloads-in-second

26•jono_irwin•1h ago

Comments

nixosbestos•1h ago

Started scrolling, immediately closed the page. Something is deeply wrong with a person who chooses to implement this shit on a webpage. Unusable garbage, I'm sorry, literally making me motion sick somehow.

htrp•54m ago

Isn't this exactly what Modal does?

za_mike157•41m ago

Hey! Yes, you are correct! We have both been upstreaming changes to the main gVisor repo. However, to work within our own infrastructure we had to make various changes, which we explain throughout the article (open TCP connections, multiprocessing, Unix sockets, etc.).

Also, in our benchmarks we seem to perform better than Modal by ~20% in 4 of the 6 workloads we tested, with a lower spread of results, so performance is more consistent. However, the same fundamental question still applies: how can you move data from storage into memory as quickly as possible?

gpgn_•39m ago

Interesting work. How does NVIDIA Dynamo Snapshot relate?

za_mike157•12m ago

There are a lot of similarities.

They run their snapshot agent as a Kubernetes DaemonSet, whereas our implementation runs as part of the Cerebrium container runtime path. Under the hood, both approaches rely on cuda-checkpoint, since cuda-checkpoint is currently the main primitive NVIDIA exposes for interacting with GPU memory during checkpoint/restore.

One difference is how KV cache handling is exposed. NVIDIA’s approach appears to automatically handle KV cache allocation/deallocation, whereas today we expose that choice to users (vLLM and SGLang expose primitives to do this). In some cases, users may want to discard the KV cache to reduce checkpoint size and restore time; in others, preserving it may be useful.

Their DaemonSet approach is also nice because it can be more portable across Kubernetes environments and clouds. Our approach is more deeply integrated into the node/runtime layer, which gives us tighter control over the serverless startup path, but also means it depends on custom node VM images, which not every provider supports equally.

The optimizations they mention around parallel memfd restore and Linux native AIO for anonymous memory could also be applied to our architecture if we find them stable and beneficial. That said, our current results are already pretty close. For example, they report restoring Qwen3-8B in 4.7s with those changes, while we currently restore it in 6.49s.

The biggest thing we are excited for is multi-GPU restore, which is not supported yet. That would unlock a much broader set of workloads.

mountainriver•28m ago

How does this compare to the CRIU work? Or does it use that under the hood?

za_mike157•6m ago

No, we don't! CRIU is used for normal checkpoint/restore of Linux processes. Since we run gVisor for container isolation, we use its checkpoint/restore support for the sandboxed process state.

Both approaches still need NVIDIA’s cuda-checkpoint for the GPU side, because CUDA/GPU memory and driver state are not something a normal process checkpointing tool can handle on its own.
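For readers unfamiliar with the GPU-side step mentioned above, here is a minimal sketch of how a wrapper might invoke NVIDIA's cuda-checkpoint utility. It assumes the `cuda-checkpoint` binary is on PATH and uses its `--toggle --pid` form, which flips a process's CUDA state between running and checkpointed. The helper names `cuda_checkpoint_toggle_cmd` and `toggle_cuda_state` are hypothetical, and this is not Cerebrium's or Modal's actual implementation. It defaults to a dry run, so nothing is executed unless asked.

```python
import shutil
import subprocess


def cuda_checkpoint_toggle_cmd(pid: int) -> list[str]:
    """Build the argv for toggling a process's CUDA state (hypothetical helper)."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    # Assumption: the cuda-checkpoint binary is installed and on PATH.
    return ["cuda-checkpoint", "--toggle", "--pid", str(pid)]


def toggle_cuda_state(pid: int, dry_run: bool = True) -> list[str]:
    """Return the command, and run it only when dry_run is False."""
    cmd = cuda_checkpoint_toggle_cmd(pid)
    if not dry_run:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError("cuda-checkpoint not found on PATH")
        # check=True raises CalledProcessError if the tool exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Dry run: print the command that would be issued for PID 1234.
    print(" ".join(toggle_cuda_state(1234)))
```

In a typical flow the toggle would run once before the process-level checkpoint (CRIU, or gVisor's own checkpoint support) to move GPU state into host memory, and once more after restore to move it back onto the device.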

