
The 1979 Design Choice Breaking AI Workloads

https://www.cerebrium.ai/blog/rethinking-container-image-distribution-to-eliminate-cold-starts
22•za_mike157•2h ago

Comments

PaulHoule•2h ago
I remember dealing with this BS back in 2017. It was clear to me that containers were, more than anything else, a system for turning 15MB of I/O into 15GB of I/O.

But it was the new shiny, so if you told people that, they would just plug their ears with their fingers.

pocksuppet•1h ago
This doesn't follow from anything in the article.
PaulHoule•11m ago
I was working with prototypical foundation models and having the exact same problem. My diagnosis wasn't quite the same; I think more radical gains could be had with a "stamp out unnecessary copies everywhere" policy, but it looks like he did get through a bottleneck.
formerly_proven•2h ago
The gzip compression of layers is actually optional in OCI images, but IIRC not in legacy Docker images; the two formats are not the same. On SSDs, the overhead of building an index for a tar is not that high if we're primarily talking about large files (the data/weights/CUDA layers rather than the system layers). The approach from the article is of course still faster, especially for running many minor variations of containers, though I wonder how common it is for only some parts of the weights to change. I would've assumed that most things you do with weights change close to 100% of them when viewed through 1 MB chunks. The lazy pulling probably has some rather dubious/interesting service-latency implications.

The main annoyance imho with gzip here is that it was already slow when the format was new (unless you have Intel QAT and bothered to patch and recompile that into all the go binaries which handle these, which you do not).
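The "index for a tar" idea above is straightforward to sketch: because an uncompressed tar stores each member at a known byte offset, you can record those offsets once and then read any single file with one seek, no full extraction. A minimal illustration using Python's stdlib `tarfile` (the function names here are mine, not from the article):

```python
import tarfile

def build_tar_index(path):
    """Map each file member's name to (data_offset, size) so that
    individual files can later be read with a single seek."""
    index = {}
    with tarfile.open(path, "r:") as tf:  # "r:" = uncompressed tar only
        for member in tf:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index

def read_member(path, index, name):
    """Read one member directly, without scanning the whole archive."""
    offset, size = index[name]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)
```

This is exactly what gzip breaks: once the tar is wrapped in a `.tar.gz`, the offsets are only meaningful after decompressing the stream from the start.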

MontyCarloHall•1h ago
I ran into a similar issue years ago, where the base infrastructure occupied the lion's share of the container size, very similar to the sizes shown in the article:

   Ubuntu base      ~29 MB (compressed)
   PyTorch + CUDA   7–13 GB
   NVIDIA NGC       4.5+ GB (compressed)
The easy solution that worked for us was to bake all of these into a single base container, and force all production containers built within the company to use that base. We then preloaded this base container onto our cloud VM disk images, so that pulling the model container only needed to download comparatively tiny layers for model code/weights/etc. As a benefit, this forced all production containers to be up-to-date, since we regularly updated the base container which caused automatic rebuilding of all derived containers.
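The baked-base pattern described above can be sketched as two build stages (the image tag and package here are illustrative assumptions, not the poster's actual setup):

```dockerfile
# Hypothetical shared base: bake OS + CUDA + PyTorch once, publish it
# internally, and preload it onto the cloud VM disk images.
FROM nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04 AS company-base
RUN pip install torch

# Every production image derives from the preloaded base, so a deploy
# only pulls the comparatively tiny code/weights layers on top.
FROM company-base
COPY app/ /app/
CMD ["python", "/app/serve.py"]
```

In practice the two stages would live in separate Dockerfiles: the base is built and pinned centrally, and application Dockerfiles start with `FROM company-base`.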
jono_irwin•6m ago
That approach works really well when you have a stable shared base image.

Where it starts to get harder is when you have multiple base stacks (different CUDA versions, frameworks, etc.) or when you need to update them frequently. You end up with lots of slightly different multi-GB bases.

Chunked images keep the benefit you mentioned (we still cache heavily on the nodes) but the caching happens at a finer granularity. That makes it much more tolerant to small differences between images and to frequent updates, since unchanged chunks can still be reused.
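The finer-granularity caching described above can be sketched in a few lines: split a blob into fixed-size chunks, address each chunk by its hash, and download only the chunks a node hasn't seen. (Fixed-size chunking and the function names are my simplification; real systems often use content-defined chunking.)

```python
import hashlib

CHUNK_SIZE = 1 << 20  # 1 MiB fixed-size chunks

def chunk_digests(blob: bytes):
    """Split a blob into fixed-size chunks and return their SHA-256 digests."""
    return [
        hashlib.sha256(blob[i:i + CHUNK_SIZE]).hexdigest()
        for i in range(0, len(blob), CHUNK_SIZE)
    ]

def pull_plan(image_chunks, node_cache):
    """Only chunks missing from the node's cache need downloading."""
    return [d for d in image_chunks if d not in node_cache]
```

Two images that differ in one chunk then share everything else: the pull plan for the second image is a single chunk, regardless of total image size.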

andrewvc•1h ago
They say an ideal container system would download portions of layers on demand, but that seems far from ideal for many production workloads. What if your service starts, works fine for an hour, then needs to read one file that is only available over the network, but that endpoint is unreachable? What if it is reachable but very, very slow?

The current system has network issues too, but in a deploy process you can confine them all to the moment of container deployment. Perhaps a new container fails to deploy because the network is slow or broken; rollback is simple there. Spreading network issues out over time makes debugging much harder.

The current system is simple and resilient but clearly not fast. Trading speed for more complex failure modes for such a widely distributed technology is hardly a clear win.

The de-duplication seems like a neat win however.

jono_irwin•13m ago
Good point, network dependency is a valid concern.

In practice these systems typically fetch data over a local, highly available network and aggressively cache anything that gets read. If that network path becomes unavailable, it usually indicates a much larger infrastructure issue since many other parts of the system rely on the same storage or registry endpoints.

So while it does introduce a different failure mode, in most production environments it ends up being a low practical risk compared to the startup latency improvements.

For us and our customers, the trade-off is worth it.

pocksuppet•1h ago
Clickbait title. Summary: their AI Docker containers are slow to start because they are 10 GB layers that have to be gunzipped, and gzip doesn't support random access.
alanfranz•1h ago
Looks like they'd like something like git repositories (maybe with transparent compression on top) rather than .tar.gz files. Just pull the latest head and you're done.
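The git-like model suggested here is content-addressed storage: objects are keyed by a hash of their content, so identical content is stored once and a pull transfers only objects the client lacks. A minimal sketch of the idea (the in-memory `dict` store and function names are mine; real git also has trees, commits, and packfiles):

```python
import hashlib
import zlib

def store_object(db: dict, content: bytes) -> str:
    """Store a blob git-style: the key is the SHA-1 of a typed header
    plus the content, the value is the zlib-compressed payload.
    Storing the same content twice is a no-op."""
    payload = b"blob %d\x00" % len(content) + content
    oid = hashlib.sha1(payload).hexdigest()
    if oid not in db:
        db[oid] = zlib.compress(payload)
    return oid

def load_object(db: dict, oid: str) -> bytes:
    """Decompress a stored object and strip its header."""
    payload = zlib.decompress(db[oid])
    _header, _, content = payload.partition(b"\x00")
    return content
```

Deduplication falls out for free: unchanged layers or weights hash to the same object ID across revisions, so "pull the latest head" fetches only what changed.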
cosmotic•56m ago
Why does the model data need to be stored in the image? Download the model data on container startup using whatever method works best.
jono_irwin•23m ago
Hey cosmotic, we're not really advocating for storing model weights in the container image.

Even the smaller NVIDIA images (like nvidia/cuda:13.1.1-cudnn-runtime-ubuntu24.04) are about 2 GB before adding any Python deps, and that is a problem.

If you split the image into chunks and pull on demand, your container will start much faster.

za_mike157•22m ago
You are correct! From our tests, storing model weights in the image isn't the preferred approach for weights larger than ~1 GB. We run a distributed, multi-layer cache system instead, and we can load roughly 6–7 GB of files with a p99 under 2.5 s.
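The shape of a "distributed, multi-layer cache" can be illustrated with a tiny two-tier sketch: a small in-memory LRU tier in front of a slower backing store (standing in here for a node-local disk or cluster cache). The class and its interface are hypothetical, not the vendor's actual system:

```python
from collections import OrderedDict

class TieredChunkCache:
    """Minimal two-tier cache sketch: fast LRU tier over a slow backing store."""

    def __init__(self, backing, capacity=4):
        self.backing = backing      # slow tier, e.g. chunk_id -> bytes
        self.lru = OrderedDict()    # fast tier with LRU eviction
        self.capacity = capacity
        self.hits = 0
        self.misses = 0

    def get(self, chunk_id):
        if chunk_id in self.lru:
            self.lru.move_to_end(chunk_id)  # refresh recency on a hit
            self.hits += 1
            return self.lru[chunk_id]
        self.misses += 1
        data = self.backing[chunk_id]       # slow path: fetch from backing tier
        self.lru[chunk_id] = data
        if len(self.lru) > self.capacity:
            self.lru.popitem(last=False)    # evict least recently used
        return data
```

The latency numbers quoted above come from repeated reads hitting the fast tiers; only cold chunks pay the slow-path cost.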
dsr_•32m ago
The problem: "containers that take far too long to start".

Somehow, they don't hit upon the solution other organizations use: having software running all the time.

I suppose if you have a lousy economic model where the cost of running your software is a large percentage of your overall costs, that's a problem. I can only advise them to move to a model where they provide more value for their clients.

za_mike157•25m ago
A lot of AI workloads require GPUs, which are expensive, so customers would waste money running idle machines 24/7 at low utilisation, which kills gross margins. Being able to load containers quickly means we can scale up as requests come in, and you only pay for usage.

This model is successful for CPU workloads (e.g. AWS Lambda), but AI models and images are 50x the size.

dsr_•21m ago
As I said, if only you were providing more value rather than being a commodity, you could avoid all this.
notyourbiz•31m ago
Super helpful.
za_mike157•22m ago
Glad you liked it!
