Popping the GPU Bubble

https://moondream.ai/blog/popping-the-gpu-bubble
70•radq•1h ago

Comments

blueblazin•1h ago
I really appreciate this type of article. I feel like a lot of knowledge about LLM training and inference is locked inside the heads of practitioners, much like compiler engineering used to be.

To work in LLM training/inference you’re expected to know this stuff, but to learn this stuff you need to already be working in the space.

rjzzleep•57m ago
Gentle reminder that while most money is spent on LLM inference, most useful AI is in fact not LLMs. Also, more and more work is going into making small models. One thing I like about the whole export-controls saga is that people are finding creative ways to squeeze performance out of these devices, as this post shows. But if you then look at solutions like vLLM, vLLM will just fill whatever VRAM is available, regardless of the context size or the model size. So then you have two things to worry about:

First, how do you know the optimal VRAM allocation per model and per context size? That currently seems to be based purely on experience. Second, how do you make sure that only that amount is available to your infra/containers? That is being handled by DRA (Kubernetes Dynamic Resource Allocation) and projects like https://project-hami.io

This is only tangentially related to the blog post, but the title is phrased in such a way that I couldn't resist putting the shameless plug here. When he wrote "popping the bubble", I thought we were talking about devices and reducing NVIDIA dependency, but this seems very focused on CUDA.

Disclaimer: I work with Dynamia.ai, the founders of which created HAMi.
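For the first question, a back-of-the-envelope estimate is a reasonable starting point before tuning by experience. Below is a minimal sketch, assuming a standard decoder-only transformer with (grouped-query) attention and an unquantized KV cache: weights take roughly params × bytes per weight, and the KV cache takes 2 × layers × KV heads × head dim × bytes per element × tokens (the 2 covers keys plus values). `ModelShape`, the helper functions, the 10% overhead factor and the example model numbers are all hypothetical; the sketch ignores activation peaks, paged-allocation granularity, quantized or compressed KV caches and sliding-window attention. For what it's worth, vLLM does expose a `gpu_memory_utilization` fraction (0.9 by default) and a `max_model_len` limit, so an estimate like this is one way to pick those values instead of accepting the default share.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical summary of a decoder-only transformer, filled in from its config."""
    params: float              # total parameter count
    num_layers: int
    num_kv_heads: int          # KV heads (fewer than query heads under GQA)
    head_dim: int
    weight_bytes: float = 2.0  # bytes per weight: 2 for fp16/bf16
    kv_bytes: float = 2.0      # bytes per cached key/value element


def kv_cache_bytes(m: ModelShape, context_tokens: int, batch: int = 1) -> float:
    # Factor 2 = one key and one value vector per layer, per KV head, per token.
    per_token = 2 * m.num_layers * m.num_kv_heads * m.head_dim * m.kv_bytes
    return per_token * context_tokens * batch


def estimate_vram_bytes(m: ModelShape, context_tokens: int, batch: int = 1,
                        overhead: float = 0.10) -> float:
    weights = m.params * m.weight_bytes
    kv = kv_cache_bytes(m, context_tokens, batch)
    # Flat fudge factor for activations, CUDA context and fragmentation (an assumption).
    return (weights + kv) * (1 + overhead)


def memory_fraction(needed_bytes: float, gpu_bytes: float) -> float:
    """Share of one GPU's memory the estimate needs, e.g. as a starting point for a cap."""
    if needed_bytes > gpu_bytes:
        raise ValueError("estimate does not fit on this GPU")
    return needed_bytes / gpu_bytes


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Illustrative 8B-parameter shape, not taken from any specific model card.
    model = ModelShape(params=8e9, num_layers=32, num_kv_heads=8, head_dim=128)
    for ctx in (4096, 32768):
        need = estimate_vram_bytes(model, ctx)
        frac = memory_fraction(need, 80 * GIB)
        print(f"context={ctx:>6}: ~{need / GIB:.1f} GiB, {frac:.2f} of an 80 GiB card")
```

With these illustrative numbers the KV cache grows linearly at 128 KiB per token, so going from 4K to 32K tokens adds about 3.5 GiB on top of roughly 15 GiB of fp16 weights. That linear term is why a single fixed VRAM assignment per model can't be right across context sizes.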

radq•50m ago
Thank you for the kind words. We will write and share more of these.

nl•1h ago
> you find that the GPU often sits idle, not for lack of work, but because the CPU hasn't told it what to do next yet. This phenomenon is called a GPU bubble.

This is true, but I've never heard anyone refer to this as a GPU bubble before.

I think most people hear "GPU bubble" and think of a financial bubble of some kind.
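The mechanism in the quote above can be reproduced with a toy timeline. This is a minimal sketch, not the method from the post: it assumes one CPU thread issuing kernels to a single in-order GPU stream, with a fixed CPU launch cost and a fixed GPU run time per kernel. The hypothetical `max_in_flight` parameter bounds how far the CPU may run ahead of the GPU (1 means synchronizing after every launch; limiting frames in flight in a graphics API is the same idea). Every gap between one kernel finishing and the next one starting is counted as bubble time.

```python
from typing import List, Tuple


def simulate_launches(n_kernels: int, cpu_launch: float, gpu_time: float,
                      max_in_flight: int) -> Tuple[float, float]:
    """Toy timeline of a CPU feeding kernels to one in-order GPU stream.

    Returns (gpu_idle_time, makespan). Per-kernel costs are fixed, which is a
    simplifying assumption; real launch and kernel times vary.
    """
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be >= 1")
    cpu_clock = 0.0              # when the CPU can start preparing the next launch
    gpu_free_at = 0.0            # when the GPU finishes its current work
    finished: List[float] = []   # completion time of each kernel
    idle = 0.0
    for i in range(n_kernels):
        # The CPU may only run max_in_flight kernels ahead; beyond that it blocks
        # until an earlier kernel completes (like waiting on a fence or event).
        if i >= max_in_flight:
            cpu_clock = max(cpu_clock, finished[i - max_in_flight])
        cpu_clock += cpu_launch              # CPU-side prep and launch cost
        start = max(cpu_clock, gpu_free_at)  # GPU starts once launched and free
        idle += start - gpu_free_at          # any gap here is a bubble
        gpu_free_at = start + gpu_time
        finished.append(gpu_free_at)
    return idle, gpu_free_at


if __name__ == "__main__":
    n = 100
    for label, launch, kernel, depth in [
        ("synchronous", 5.0, 20.0, 1),
        ("queued", 5.0, 20.0, 8),
        ("CPU-bound, queued", 30.0, 20.0, 8),
    ]:
        idle, total = simulate_launches(n, launch, kernel, depth)
        util = n * kernel / total
        print(f"{label:18s} idle={idle:7.1f} total={total:7.1f} util={util:.0%}")
```

With 5 µs launches and 20 µs kernels, synchronizing after each launch leaves the GPU idle for 20% of the run, while letting the CPU queue up to 8 kernels ahead hides every launch except the first. Once the launch cost exceeds the kernel time (the third case), queueing no longer helps and the GPU waits on the CPU for every kernel; cutting the number of launches, for example with kernel fusion or CUDA Graphs, is the usual remedy there.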

nnevatie•1h ago
Yes, the title seems off. I also thought I was going to be reading about the AI/pricing bubble.

rusk•1h ago
The term I would use is “underutilised”.

barries11•1h ago
"Stall" is the best term I can think of, as in "pipeline stall".

Better term, anyone?

cma•1h ago
It's very common to call it a GPU bubble in gamedev, though not strictly for CPU-induced bubbles.

SCdF•1h ago
It appears to be a real term? https://docs.vulkan.org/tutorial/latest/Synchronization/Asyn...

Very odd, but perhaps more familiar to graphics programmers? I will say I'd probably call it a stall, which is exactly what the Vulkan docs call it moments later, so :shrug:

gardnr•1h ago
Different bubble than the one I was hoping for.

This appears to be different from the recent "Speculative Pipeline Decoding" paper: https://arxiv.org/abs/2605.30852

kibibu•55m ago
"Bubble" was used a lot more back when people talked about very deep pipelines, e.g. the Pentium 4's.

radq•54m ago
I feel like "bubble" is what this is commonly called in GPU programming circles (e.g. https://github.com/sgl-project/sglang/issues/5593 or any number of other issues). To be honest, it didn't occur to me that it would be confusing. But yes, "stall" is maybe a better word.

vkazanov•49m ago
I've seen it in the literature on CPU pipelines in quotes, never without.

IshKebab•20m ago
I've never seen it in quotes, but yeah, it is a very common term for pipelined CPUs.
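The earlier point about very deep pipelines (the Pentium 4 example) can be made concrete with a toy cycle count. A minimal sketch, assuming an in-order pipeline that completes one instruction per cycle once full, takes `depth - 1` cycles to fill, and loses `depth - 1` bubble cycles on every flush (as if each mispredicted branch resolved in the final stage). Real cores resolve branches earlier and predict far better, so the numbers are illustrative only; the function names are hypothetical. A depth of 20 is roughly what is commonly cited for early Pentium 4 cores.

```python
def pipeline_cycles(n_instructions: int, depth: int, flushes: int) -> int:
    """Cycles for a toy in-order pipeline completing one instruction per cycle."""
    if depth < 1 or n_instructions < 0 or flushes < 0:
        raise ValueError("invalid pipeline parameters")
    fill = depth - 1                 # cycles before the first instruction completes
    bubbles = flushes * (depth - 1)  # each flush empties the pipeline again
    return fill + n_instructions + bubbles


def ipc(n_instructions: int, depth: int, flushes: int) -> float:
    """Instructions per cycle under the same toy model."""
    return n_instructions / pipeline_cycles(n_instructions, depth, flushes)


if __name__ == "__main__":
    # Hypothetical workload: 1000 instructions, one flush every 20 instructions.
    for depth in (5, 20):
        cycles = pipeline_cycles(1000, depth, 50)
        print(f"depth={depth:2d}: {cycles} cycles, IPC={ipc(1000, depth, 50):.2f}")
```

For the same 1000 instructions and 50 flushes, the 5-stage pipeline needs 1204 cycles and the 20-stage one 1969, because each flush now costs 19 bubble cycles instead of 4. That is why deep pipelines made bubbles such a prominent topic.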
