
Nvidia greenboost: transparently extend GPU VRAM using system RAM/NVMe

https://gitlab.com/IsolatedOctopi/nvidia_greenboost
93•mmastrac•3d ago

Comments

tandr•3d ago
Some simpler benchmark table would be great. May I suggest Ollama on base machine, Ollama with T1, Ollama with T1+T2 etc. on midsize and big models to compare token/sec?
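A rough sketch of how such a table could be scripted, assuming a local Ollama instance and its documented /api/generate response fields (eval_count, eval_duration in nanoseconds); the tier labels are only an illustration of the suggested base / T1 / T1+T2 runs:

```python
import json
import urllib.request

def tokens_per_sec(resp: dict) -> float:
    """Generation speed from an Ollama /api/generate response:
    eval_count tokens over eval_duration nanoseconds."""
    return resp["eval_count"] / (resp["eval_duration"] / 1e9)

def bench(model: str, prompt: str = "Explain PCIe in one paragraph.") -> float:
    """Run one non-streaming generation against a local Ollama
    and return tokens/sec for that request."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt,
                         "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as f:
        return tokens_per_sec(json.load(f))

# One row per configuration, e.g. run bench("glm-4.7-flash:q8_0")
# on the base machine, then with T1, then with T1+T2 enabled.
```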
pabs3•2d ago
Would be great to get this into mainline Linux.
yjftsjthsd-h•1h ago
Previously: https://news.ycombinator.com/item?id=47384557

(Still cool, still would benefit from better benchmarks)

holoduke•1h ago
This is extremely slow and not useful in my opinion.
majorchord•1h ago
I would say it depends entirely on your usecase. I don't think there can be a simple "not useful" generalization that applies to everyone.
jauntywundrkind•1h ago
Man I wish that was a canned response that could be deployed on demand! Well said.

I really appreciate thrifty & resourceful points of view. Exploring the what-ifs and looking for uses is such a great virtue.

bigwheels•1h ago
Can you elaborate beyond the shallow/superficial dismissal?
daneel_w•1h ago
It makes the difference between being able to run a lot of machine learning tasks, and not being able at all. Pretty useful.
bhewes•1h ago
This has been fun. We can task our nemotron-3-super model to run overnight when our desktops are idle. 4070s and 96 GB of RAM work fine. Slow, but it does its job.
daneel_w•1h ago
Related, a couple of years ago: https://old.reddit.com/r/Amd/comments/15t0lsm/i_turned_a_95_...

"I turned a $95 AMD APU into a 16GB VRAM GPU and it can run stable diffusion!"

3abiton•36m ago
> it can generate a 50 steps 512x512 image around 1 minute and 50 seconds.

I have the 4650G APU, and the best way to describe it is: lack of support. This was even more true three years ago than now. ROCm was absolutely dogshit then; I know because I tried to do the same when that post was made. You had to compile everything from scratch, get the relevant patches, and even then xformers, a library that accelerates diffusion model inference, wasn't supported for Renoir or ROCm back then. Yes, you could generate an image, but it was much slower and riddled with bugs. You couldn't update ROCm because it broke compatibility, which was partly the reason I got into NixOS. That being said, those APUs are a powerhouse. Nowadays I can run decent agentic workflows on them (I have 64 GB of DDR4 RAM, i.e. the APU can take as much as it needs with the latest Linux kernels).

Just note, diffusion models are still second-class citizens on AMD APUs and even GPUs. But then again, there's nothing close right now on the market except for what Apple offers.

nl•5m ago
The Ryzen AI CPU/GPUs (Ryzen AI 395+ etc.) seem to have increasing support - https://lemonade-server.ai/ now has support for the NPU as well as the combined CPU/GPU (which I guess is an APU, but different to the G series of APUs, I think?)

But I'm always interested in first-hand experiences of how good it really is - I'm pretty cynical about the idea that AMD actually knows what it takes to build good software end-to-end.

paultendo•1h ago
Could be a very useful way to do some overnight tasks using spare RAM. Possibly things like LLM-based categorisation, labelling, data cleansing. That's what comes to mind for me anyway.
yjtpesesu2•1h ago
How does this differ from anything llama.cpp offers, regarding offloading layers? The repo consistently refers to "DDR4". Is there a reason DDR5 won't work with this?
svnt•53m ago
The readme opens with this:

> I have an RTX 5070 with 12 GB VRAM and I wanted to run glm-4.7-flash:q8_0, which is a 31.8 GB model. The standard options are:

> Offload layers to CPU — works, but drops token/s by 5–10× because CPU RAM has no CUDA coherence. You end up waiting.

> Use a smaller quantization — you lose quality. At q4_0 the model is noticeably worse on reasoning tasks.

> Buy a bigger GPU — not realistic for consumer hardware. A 48 GB card costs more than a complete workstation.

> None of those felt right, so I built an alternative: route the overflow memory to DDR4 via DMA-BUF, which gives the GPU direct access to system RAM over PCIe 4.0 without a CPU copy involved.

And then limps home with this caveat on the closest thing to a benchmark:

> The PCIe 4.0 link (~32 GB/s) is the bottleneck when the model overflows VRAM. The best strategy is to shrink the model until it fits — either with EXL3 quantization or ModelOpt PTQ — and use GreenBoost's DDR4 pool for KV cache only.

I think the reason it refers to DDR4 is that that's how the user explained it to their coding agent. LLMs are great at perpetuating unnecessary specificity.
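A back-of-envelope model makes the PCIe bottleneck in that caveat concrete. Assume each generated token reads every weight once, and split the read between VRAM and the PCIe 4.0 x16 link; the ~450 GB/s VRAM bandwidth figure is my assumption for a card in this class, and the model ignores overlap and caching, so it is only a toy estimate:

```python
def tok_per_sec(model_gb: float, overflow_frac: float,
                vram_gbps: float = 450.0, pcie_gbps: float = 32.0) -> float:
    """Toy bandwidth model: time per token is the resident portion read
    from VRAM plus the overflow portion streamed over PCIe."""
    resident = model_gb * (1 - overflow_frac)
    overflow = model_gb * overflow_frac
    return 1.0 / (resident / vram_gbps + overflow / pcie_gbps)

# 31.8 GB model on a 12 GB card: ~62% of the weights overflow over PCIe.
print(round(tok_per_sec(31.8, 1 - 12 / 31.8), 1))  # → ~1.5 tokens/s
```

Which is why "shrink the model until it fits" dominates any clever routing of the overflow.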

xienze•52m ago
Presumably it means that software doesn’t have to write the same sort of layer offloading support. It’ll “just work” as if you had X GB of VRAM all along.
yjtpesesu2•45m ago
so, magic?
kcb•42m ago
CUDA has had managed memory that pages between VRAM and system RAM for a decade. Problem is doing so is unusably slow for AI purposes. Seems like an unnecessary layer here.
0xbadcafebee•43m ago
You can already do this with some GPU drivers:

  GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amdttm.pages_limit=5242880 ttm.pages_limit=5242880"
One downside is your kernel isn't going to reserve that memory away from userland. You will still see all the memory at system level as "free". As the GPU driver starts using it, other apps/the OS will try to use the "free" memory, not knowing how much of it is in use (it may show up as "cache", or not at all). Then OOM killer starts going or programs start crashing, and at some point the OS tips over or GPU driver crashes. You can add loads of swap as a compromise and it works okay, if a bit slow.
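For reference, the TTM pages_limit parameters are counted in pages, which on x86 are 4 KiB each, so the value in the kernel command line above caps the pool at 20 GiB:

```python
PAGE = 4096            # x86 page size; ttm pages_limit is counted in pages
pages_limit = 5242880  # the value from the GRUB line above

print(pages_limit * PAGE / 2**30)  # → 20.0 (GiB)
```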

In any case, loading a gigantic model just to use system RAM is absurdly slow (due to mem bandwidth), like 1-5 t/s, so it's not practical. It'd take a whole day to process one 86k token request. Just pay a cloud provider $0.01 to do it in 10 seconds.
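The "whole day" figure checks out at the low end of that range: at 1 token/s, an 86k-token request takes roughly a day of wall-clock time:

```python
tokens = 86_000
rate_tps = 1.0  # low end of the 1-5 t/s estimate for RAM-bound inference

hours = tokens / rate_tps / 3600
print(round(hours, 1))  # → 23.9 hours
```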

sabareesh•35m ago
I wish it provided a benchmark comparing direct RAM offload vs. CPU offload vs. full VRAM
Havoc•13m ago
> The best strategy is to shrink the model until it fits — either with EXL3 quantization or ModelOpt PTQ — and use GreenBoost's DDR4 pool for KV cache only.

Does this make sense? I'd have thought the KV cache is guaranteed to be used 100% of the time, while, say, in a MoE the same can't be said of the weights.

Though I suppose if you're shooting for huge context, then having that allocation go into RAM makes sense, especially when it's allocated but not used yet
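A rough sizing of the KV cache shows why routing it to RAM matters at huge context. This uses the standard per-token formula (2 for K and V, times layers, KV heads, and head dimension); the layer/head numbers below are hypothetical placeholders, not glm-4.7-flash's actual configuration:

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """KV cache size: 2 (K and V) * layers * kv_heads * head_dim
    elements per token, times sequence length and element size."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical 32-layer model, 8 KV heads of dim 128, fp16, 128k context:
gb = kv_cache_bytes(32, 8, 128, 128_000) / 1e9
print(round(gb, 1))  # → 16.8 (GB)
```

At that scale the KV cache alone outgrows a 12 GB card, so spilling it to a system-RAM pool while keeping the weights resident is a plausible split.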

Warranty Void If Regenerated

https://nearzero.software/p/warranty-void-if-regenerated
145•Stwerner•3h ago•65 comments

OpenRocket

https://openrocket.info/
365•zeristor•3d ago•80 comments

Wander – A tiny, decentralised tool to explore the small web

https://susam.net/wander/
181•susam•16h ago•51 comments

Rob Pike’s Rules of Programming (1989)

https://www.cs.unc.edu/~stotts/COMP590-059-f24/robsrules.html
821•vismit2000•14h ago•404 comments

The math that explains why bell curves are everywhere

https://www.quantamagazine.org/the-math-that-explains-why-bell-curves-are-everywhere-20260316/
28•ibobev•2d ago•8 comments

Nvidia NemoClaw

https://github.com/NVIDIA/NemoClaw
220•hmokiguess•8h ago•170 comments

Book: The Emerging Science of Machine Learning Benchmarks

https://mlbenchmarks.org/00-preface.html
75•jxmorris12•4d ago•1 comments

Show HN: Will my flight have Starlink?

147•bblcla•6h ago•177 comments

What's on HTTP?

https://whatsonhttp.com/
16•elixx•2h ago•3 comments

Nightingale – open-source karaoke app that works with any song on your computer

https://nightingale.cafe/
481•rzzzzru•15h ago•144 comments

Show HN: I built 48 lightweight SVG backgrounds you can copy/paste

https://www.svgbackgrounds.com/set/free-svg-backgrounds-and-patterns/
137•visiwig•8h ago•22 comments

Show HN: Playing LongTurn FreeCiv with Friends

https://github.com/ndroo/freeciv.andrewmcgrath.info
43•verelo•5h ago•19 comments

2025 Turing award given for quantum information science

https://awards.acm.org/about/2025-turing
86•srvmshr•13h ago•21 comments

Show HN: Tmux-IDE, OSS agent-first terminal IDE

https://tmux.thijsverreck.com
57•thijsverreck•6h ago•31 comments

CVE-2026-3888: Important Snap Flaw Enables Local Privilege Escalation to Root

https://blog.qualys.com/vulnerabilities-threat-research/2026/03/17/cve-2026-3888-important-snap-f...
84•askl•8h ago•49 comments

On a Boat

https://moq.dev/blog/on-a-boat/
121•mmcclure•4d ago•23 comments

Machine Payments Protocol (MPP)

https://stripe.com/blog/machine-payments-protocol
136•bpierre•8h ago•70 comments

Despite Doubts, Federal Cyber Experts Approved Microsoft Cloud Service

https://www.propublica.org/article/microsoft-cloud-fedramp-cybersecurity-government
427•hn_acker•9h ago•197 comments

Show HN: Hacker News archive (47M+ items, 11.6GB) as Parquet, updated every 5m

https://huggingface.co/datasets/open-index/hacker-news
272•tamnd•4d ago•124 comments

OpenAI Has New Focus (on the IPO)

https://om.co/2026/03/17/openai-has-new-focus-on-the-ipo/
129•aamederen•13h ago•134 comments

FBI is buying location data to track US citizens, director confirms

https://techcrunch.com/2026/03/18/fbi-is-buying-location-data-to-track-us-citizens-kash-patel-wyden/
342•jbegley•3h ago•113 comments

Measuring progress toward AGI: A cognitive framework

https://blog.google/innovation-and-ai/models-and-research/google-deepmind/measuring-agi-cognitive...
96•surprisetalk•12h ago•150 comments

Death to Scroll Fade

https://dbushell.com/2026/01/09/death-to-scroll-fade/
343•PaulHoule•8h ago•184 comments

Using calculus to do number theory

https://hidden-phenomena.com/articles/hensels
111•cpp_frog•2d ago•17 comments

Explore 19th Century Scientific Correspondence

https://epsilon.ac.uk/
14•rramadass•3d ago•1 comments

Trevor Milton is raising funds for a new jet he claims will transform flying

https://www.wsj.com/business/trevor-milton-pardon-nikola-trump-3163e19c
78•jgalt212•11h ago•124 comments

A ngrok-style secure tunnel server written in Rust and Open Source

https://github.com/joaoh82/rustunnel
75•joaoh82•10h ago•35 comments

Celebrating Tony Hoare's mark on computer science

https://bertrandmeyer.com/2026/03/16/celebrating-tony-hoares-mark-on-computer-science/
128•benhoyt•17h ago•31 comments

Snowflake AI Escapes Sandbox and Executes Malware

https://www.promptarmor.com/resources/snowflake-ai-escapes-sandbox-and-executes-malware
224•ozgune•8h ago•69 comments