Popping the GPU Bubble

https://moondream.ai/blog/popping-the-gpu-bubble
56•radq•1h ago

Comments

blueblazin•49m ago
I really appreciate this type of article. I feel like a lot of knowledge in LLM training and inference is locked inside the heads of practitioners, much as it was with compiler engineers before.

To work in LLM training/inference you’re expected to know this stuff, but to know this stuff you need to be working in the space.

rjzzleep•30m ago
Gentle reminder that while most money is spent on LLM inference, the vast majority of useful AI use is in fact not LLMs. Also, more and more work is poured into making small models. One thing I like about the whole export controls saga is that people are finding creative ways to squeeze performance out of these devices as witnessed in this post. But, if you then look at solutions like vLLM, vLLM will just fill whatever VRAM is available, no matter the context size, or the model size. So then you have two things to worry about:

First, how do you know exactly what the optimal VRAM assignment per model and per context size is, which currently seems to be based purely on experience? And second, how do you make sure that only that amount is available to your infra/containers, which is being handled by DRA and projects like https://project-hami.io

This is only tangentially related to the blog post, but the title was picked in such a way that I couldn't help but put the shameless plug here. When he wrote "popping the bubble", I thought we were talking about devices and reducing NVIDIA dependency, but this seems very focused on CUDA.

Disclaimer: I work with Dynamia.ai, the founders of which created HAMi.
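
For anyone wondering what a first-pass answer to the "how much VRAM per model, per context size" question looks like, here is a rough back-of-the-envelope sketch. It only counts weights plus KV cache and ignores activations, allocator fragmentation, and runtime overhead, so treat it as a lower bound. The function names and the model shape (32 layers, 8 KV heads, head dimension 128, 8B parameters, fp16) are made up for illustration and are not taken from the post or from any particular serving stack.

```python
GIB = 2**30


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_len,
                   batch_size=1, bytes_per_elem=2):
    """Size of the KV cache for a standard transformer decoder.

    The leading 2 accounts for storing one key and one value tensor per layer.
    bytes_per_elem=2 assumes fp16/bf16 cache entries.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * context_len * batch_size * bytes_per_elem)


def weight_bytes(num_params, bytes_per_param=2):
    """Size of the model weights, assuming a uniform precision."""
    return num_params * bytes_per_param


def estimate_vram_gib(num_params, num_layers, num_kv_heads, head_dim,
                      context_len, batch_size=1):
    """Lower-bound VRAM estimate in GiB: weights plus KV cache only."""
    total = weight_bytes(num_params) + kv_cache_bytes(
        num_layers, num_kv_heads, head_dim, context_len, batch_size)
    return total / GIB


if __name__ == "__main__":
    # Hypothetical 8B-parameter model shape, chosen only for illustration.
    for ctx in (2048, 8192, 32768):
        gib = estimate_vram_gib(8_000_000_000, 32, 8, 128, ctx)
        print(f"context={ctx:>6}  estimated VRAM >= {gib:.2f} GiB")
```

With this hypothetical shape the KV cache costs 128 KiB per token, so an 8192-token context adds exactly 1 GiB on top of the weights. Real numbers depend on the attention variant, cache precision, and batching, which is presumably why this is still tuned by experience.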

radq•23m ago
Thank you for the kind words. We will write and share more of these.
nl•48m ago
> you find that the GPU often sits idle, not for lack of work, but because the CPU hasn't told it what to do next yet. This phenomenon is called a GPU bubble.

This is true, but I've never heard anyone refer to this as a GPU bubble before.

I think most people hear "GPU bubble" and think of a financial bubble of some kind.
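
To make the quoted definition concrete, here is a toy timing model of the bubble. It is a sketch under strong assumptions, not the article's method: each step has a fixed CPU preparation time and a fixed GPU run time, and in the overlapped case the CPU can prepare step i+1 while the GPU runs step i, which assumes the preparation does not depend on the previous step's GPU output. The function name and the numbers are hypothetical.

```python
def gpu_idle_ms(cpu_prep_ms, gpu_run_ms, steps, overlap):
    """Total time (ms) the GPU spends waiting on the CPU in a toy loop.

    overlap=False: the CPU prepares a step, then the GPU runs it, strictly
    in sequence, so the GPU idles for the full preparation time every step.
    overlap=True: the CPU prepares steps back to back while the GPU runs,
    so the GPU waits for the first preparation and afterwards only for
    whatever part of the preparation exceeds the GPU run time.
    """
    if steps <= 0:
        return 0.0
    if overlap:
        per_step_gap = max(0.0, cpu_prep_ms - gpu_run_ms)
        return cpu_prep_ms + (steps - 1) * per_step_gap
    return steps * cpu_prep_ms


if __name__ == "__main__":
    # Hypothetical numbers: 2 ms of CPU work and 5 ms of GPU work per step.
    serial = gpu_idle_ms(2.0, 5.0, steps=100, overlap=False)
    overlapped = gpu_idle_ms(2.0, 5.0, steps=100, overlap=True)
    print(f"serial idle: {serial} ms, overlapped idle: {overlapped} ms")
```

In the serial case the GPU is idle 2 ms out of every 7 ms; that gap is the "bubble" (or stall) under discussion. Overlapping only removes it while the CPU side is faster than the GPU side.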

nnevatie•46m ago
Yes, the title seems off - I also thought I was going to be reading about the AI/pricing bubble.
rusk•44m ago
The term I would use would be “underutilised”
barries11•39m ago
"stall" is the best term I can think of as in "pipeline stall".

Better term, anyone?

cma•37m ago
It's very common to call it a GPU bubble in gamedev, though not strictly for CPU induced bubbles.
SCdF•35m ago
It appears to be a real term? https://docs.vulkan.org/tutorial/latest/Synchronization/Asyn...

Very odd, but perhaps more familiar to graphics programmers? I will say I'd probably call it a stall, which is exactly what the Vulkan docs call it moments later, so :shrug:

gardnr•47m ago
Different bubble than the one I was hoping for.

This appears to be different than the recent "Speculative Pipeline Decoding" paper: https://arxiv.org/abs/2605.30852

kibibu•28m ago
"bubble" used to be used a lot more when talking about very deep pipelines, eg Pentium 4 depth.
radq•26m ago
I feel like "bubble" is what this is commonly called in GPU programming circles (e.g. https://github.com/sgl-project/sglang/issues/5593 or any number of other issues). It didn't occur to me that it would be confusing, to be honest. But yes, "stall" is maybe a better word.
vkazanov•22m ago
I saw it in the literature on CPU pipelines in quotes, never without.

Show HN: PDFMergely – In-browser PDF tools that never upload your files

https://pdfmergely.com
1•pdfmergely•1m ago•0 comments

Show HN: Berth – A native macOS app for managing containers with Apple/container

https://github.com/tofa84/berth
1•tomfal•1m ago•0 comments

Ask for Feedback Before You Need It

1•Semi_hayat•2m ago•0 comments

Estonian camera headed for deep-space mission in 2028

https://news.err.ee/1610059198/estonian-camera-headed-for-deep-space-mission-in-2028
1•marklit•3m ago•0 comments

DevOps

1•Snapymon•6m ago•0 comments

Oil stocks in US Strategic Reserve fall by 5.5M – lowest level since 1983

https://www.reuters.com/business/energy/oil-stocks-us-strategic-petroleum-reserve-fall-by-55-mill...
1•Teever•7m ago•0 comments

The Force Is with Cristal Beer

https://en.wikipedia.org/wiki/The_Force_is_with_Cristal_Beer
1•Michelangelo11•9m ago•0 comments

DeepSeek Open Sources DSpark

https://venturebeat.com/orchestration/deepseek-open-sources-dspark-a-new-framework-to-speed-up-ll...
1•msalsas•9m ago•0 comments

Wellformed: Validation Schemas as JSON for TypeScript and Rust

https://wellformed.net/
1•burnrate•14m ago•1 comments

DeepSeek V4 official release coming in mid-July with 2x peak-hour API pricing

https://technode.com/2026/06/30/deepseek-to-launch-v4-in-mid-july-with-new-peak-time-api-pricing/
3•linzhangrun•16m ago•0 comments

Universal agents require universal memory

https://adapt.com/blog/unified-memory
1•ashumz•17m ago•0 comments

The state of the AI economy from bottom up

https://www.exponentialview.co/p/the-state-of-the-ai-economy
1•damethos•17m ago•0 comments

Open Hardware and Free Software: Teufel Mynd, a Case Study of a BT Loudspeaker

https://fsfe.org/news/2026/news-20260629-01.en.html
1•kirschner•18m ago•0 comments

QDBP: Explicit depth markers as an alternative to indentation and parentheses

https://github.com/tearflake/qdbp
1•tearflake•23m ago•0 comments

AI Policy Update

https://blog.freecad.org/2026/06/29/ai-policy-update/
1•ilreb•30m ago•0 comments

Reward hacking is swamping model intelligence gains

https://cursor.com/blog/reward-hacking-coding-benchmarks
3•matt_d•32m ago•0 comments

Vega: Zero-knowledge proofs for digital identity in the age of AI

https://www.microsoft.com/en-us/research/blog/vega-zero-knowledge-proofs-for-digital-identity-in-...
1•tosh•34m ago•0 comments

Gemma 4 on Cerebras - The Fastest Inference Is Now Multimodal

https://www.cerebras.ai/blog/gemma-4-on-cerebras-the-fastest-inference-is-now-multimodal
3•Tiberium•34m ago•1 comments

Show HN: Bored People Chat – Anonymous global chat room

https://boredpeoplechat.com/
3•syc-bpc•34m ago•3 comments

I built 25 executable skills for my AI agent – all open source

https://github.com/ChrisLamDev/hermes-core-skills
1•ChrisLamDev118•35m ago•0 comments

Another Semiquincentennial

https://sanfranciscan.org/2026/06/29/another-semiquincentennial/
1•chema•38m ago•0 comments

Ask HN: Which is the best local model under 3B parameters today?

1•akarshhegde18•39m ago•0 comments

The op log was peer-to-peer the whole time

https://avelino.run/from-icloud-to-peers/
1•ethanplant•41m ago•0 comments

I built a free invoice generator for freelancers – no login, no subscription

https://quickinvoice-jade.vercel.app
1•Mini_dev•44m ago•0 comments

Operation RYaN

https://en.wikipedia.org/wiki/Operation_RYAN
1•valgaze•47m ago•0 comments

We built a P2P app with no servers. 1M users didn't miss them [Video]

https://www.youtube.com/watch?v=n76zGrt4aRY
1•danboarder•48m ago•0 comments

Tangled CI runs on microVMs

https://blog.tangled.org/spindle-microvm/
2•icy•52m ago•0 comments

Manifest-Driven Development

https://spacedock.md/blog/manifest-driven-development/
1•clkao•56m ago•1 comments

Meshtryoshka: Differentiable Mesh Rendering for Unbounded Scenes

https://danielxu9393.github.io/meshtryoshka-website/
1•E-Reverance•56m ago•0 comments

OGN 3D Viewer – glider flights replayed in 3D in the browser

https://s-celles.github.io/ogn-3d-viewer/
1•scls19fr•1h ago•0 comments