Big GPUs don't need big PCs

https://www.jeffgeerling.com/blog/2025/big-gpus-dont-need-big-pcs
70•mikece•3h ago

Comments

jonahbenton•2h ago
So glad someone did this. Have been running big GPUs in eGPU enclosures connected to spare laptops and thinking: why not Pis?
3eb7988a1663•2h ago
Datapoints like this really make me reconsider my daily driver. I should be running one of those $300 mini PCs at <20W. With ~flat CPU performance gains, it would be fine for the next 10 years. Just remote into my beefy workstation when I actually need to do real work. Browsing the web, watching videos, even playing some games are easily within their wheelhouse.
ekropotin•1h ago
As an experiment, I decided to try using a Proxmox VM, with an eGPU and the USB bus passed through to it, as my main PC for browsing and working on hobby projects.

It's just 1 vCPU with 4 GB of RAM, and you know what? It's more than enough for these needs. I think hardware manufacturers have falsely convinced us that every professional needs a beefy laptop to be productive.

reactordev•39m ago
I went with a Beelink for this purpose. Works great.

Keeps the desk nice and tidy while “the beasts” roar in a soundproofed closet.

samuelknight•24m ago
Switching from my 8-core Ryzen mini PC to an 8-core Ryzen desktop makes my unit tests run way faster. TDP limits can tip you off to very different performance envelopes in otherwise similarly specced CPUs.
yjftsjthsd-h•1h ago
I've been kicking this around in my head for a while. If I want to run LLMs locally, a decent GPU is really the only important thing. At that point, the question becomes, roughly: what is the cheapest computer to tack on the side of the GPU? Of course, that assumes that everything does in fact work; unlike OP I am barely in a position to understand e.g. BAR problems, let alone try to fix them, so what I actually did was build a cheap-ish x86 box with a half-decent GPU and call it a day :) But it's still stuck in my brain: there must be a more efficient way to do this, especially if all you need is just enough computer to shuffle data to and from the GPU and serve that over a network connection.
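
A minimal sketch of that last idea, in Python: the host only has to accept a prompt over the network, hand it to whatever drives the GPU, and return the bytes. `run_llm` here is a hypothetical placeholder, not a real library call.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_llm(prompt: str) -> str:
    # Hypothetical placeholder for whatever actually drives the GPU
    # (llama.cpp, vLLM, ...). The host CPU only moves bytes in and out.
    return f"(generated text for: {prompt!r})"


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        prompt = self.rfile.read(length).decode("utf-8", errors="replace")
        reply = run_llm(prompt).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()
```
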
zeusk•1h ago
Get the DGX Spark computers? They’re exactly what you’re trying to build.
tcdent•1h ago
We're not yet to the point where a single PCIe device will get you anything meaningful; IMO 128 GB of RAM available to the GPU is essential.

So while you don't need a ton of compute on the CPU, you do need the ability to address multiple PCIe lanes. A relatively low-spec AMD EPYC processor is fine if the motherboard exposes enough lanes.
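
To see how many lanes each GPU actually negotiated on a given board, a small sketch (assuming the standard Linux sysfs layout; 0x0300xx/0x0302xx are the usual VGA/3D controller class codes):

```python
from pathlib import Path

PCI_ROOT = Path("/sys/bus/pci/devices")

for dev in sorted(PCI_ROOT.iterdir()):
    try:
        cls = (dev / "class").read_text().strip()
        if not (cls.startswith("0x0300") or cls.startswith("0x0302")):
            continue  # keep only VGA/3D controllers (i.e. GPUs)
        width = (dev / "current_link_width").read_text().strip()
        speed = (dev / "current_link_speed").read_text().strip()
    except OSError:
        continue  # some devices don't expose link attributes
    print(f"{dev.name}: x{width} @ {speed}")
```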

skhameneh•44m ago
There is plenty that can run within 32/64/96 GB of VRAM. IMO models like Phi-4 are underrated for many simple tasks. Some quantized Gemma 3 variants are quite good as well.

There are larger/better models as well, but those tend to really push the limits of 96 GB.

FWIW, once you start pushing into 128 GB+, the ~500 GB models really start to become attractive, because at that point you're probably wanting just a bit more out of everything.
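
A back-of-envelope way to sanity-check which models fit in a given VRAM budget; all the architecture numbers below are illustrative assumptions, not measurements:

```python
def vram_estimate_gb(params_billion, bits_per_weight, ctx_len=8192,
                     n_layers=32, n_kv_heads=8, head_dim=128,
                     kv_bytes=2, overhead=1.1):
    """Very rough VRAM estimate: quantized weights + fp16 KV cache + slack."""
    weights = params_billion * 1e9 * bits_per_weight / 8
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * ctx_len * kv_bytes
    return (weights + kv_cache) * overhead / 1e9


# e.g. a ~27B model at 4-bit quantization lands around 16 GB with these numbers
print(f"{vram_estimate_gb(27, 4):.1f} GB")
```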

tcdent•21m ago
IDK, all of my personal and professional projects involve pushing the SOTA to the absolute limit. Using anything other than the latest OpenAI or Anthropic model is out of the question.

Smaller open source models are a bit like 3D printing in the early days: fun to experiment with, but really not that valuable for anything other than making toys.

Text summarization, maybe? But even then I want a model that understands the complete context and does a good job. Even for things like "generate one sentence about the action we're performing," I usually find I can just incorporate it into the output schema of a larger request instead of making a separate request to a smaller model.

dist-epoch•1h ago
This problem was already solved 10 years ago - crypto mining motherboards, which have a large number of PCIe slots, a CPU socket, one memory slot, and not much else.

> Asus made a crypto-mining motherboard that supports up to 20 GPUs

https://www.theverge.com/2018/5/30/17408610/asus-crypto-mini...

For LLMs you'll probably want a slightly different setup, with some memory and some M.2 storage too.

jsheard•56m ago
Those only gave each GPU a single PCIe lane though, since crypto mining barely needed to move any data around. If your application doesn't fit that mould then you'll need a much, much more expensive platform.
dist-epoch•54m ago
After you load the weights into the GPU and keep the KV cache there too, you don't need any other significant traffic.
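
Rough arithmetic behind that claim, with assumed numbers: when a model is split by layers, roughly one hidden-state vector crosses each GPU boundary per token.

```python
hidden_size = 8192        # assumed model width
bytes_per_activation = 2  # fp16
tokens_per_second = 50    # assumed single-user decode rate

per_token_bytes = hidden_size * bytes_per_activation       # ~16 KB
traffic_mb_s = per_token_bytes * tokens_per_second / 1e6   # ~0.8 MB/s

pcie3_x1_mb_s = 985  # approximate usable PCIe 3.0 x1 bandwidth
print(f"{traffic_mb_s:.1f} MB/s per boundary vs ~{pcie3_x1_mb_s} MB/s available")
```
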
numpad0•28m ago
Even in tensor parallel modes? I thought that only works if you're fine with stalling all but one of your n GPUs at any given moment unless you have n users.
skhameneh•53m ago
In theory, it's only sufficient for pipeline parallel due to limited lanes and interconnect bandwidth.

Generally, scalability on consumer GPUs falls off somewhere between 4 and 8 GPUs. Those running more GPUs are typically using a higher quantity of smaller GPUs for cost effectiveness.

zozbot234•28m ago
M.2 is mostly just a different form factor for PCIe anyway.
seanmcdirmid•49m ago
And you don't want to go the M4 Max/M3 Ultra route? It works well enough for most mid-sized LLMs.
binsquare•32m ago
I run a crowd-sourced website to collect data on the best and cheapest hardware setups for local LLMs here: https://inferbench.com/

Source code: https://github.com/BinSquare/inferbench

Wowfunhappy•1h ago
I really would have liked to see gaming performance, although I realize it might be difficult to find an AAA game that supports ARM. (Forcing the Pi to emulate x86 with FEX doesn't seem entirely fair.)
3eb7988a1663•1h ago
You might have to thread the needle to find a game which does not bottleneck on the CPU.
kristjansson•1h ago
Really, why have the PCI/CPU artifice at all? Apple and Nvidia have the right idea: put the MPP on the same die/package as the CPU.
bigyabai•51m ago
> put the MPP on the same die/package as the CPU.

That would help in latency-constrained workloads, but I don't think it would make much of a difference for AI or most HPC applications.

lostmsu•53m ago
Now compare batched training performance. Or batched inference.

Of course prefill is going to be GPU-bound. You only send a few thousand bytes to it, and don't really ask it to return much. But after prefill is done, unless you use batched mode, you aren't really using your GPU for anything more than its VRAM bandwidth.
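
The bandwidth-bound part is easy to estimate: each generated token reads (roughly) every weight once, so single-stream decode speed is capped near memory bandwidth divided by model size. The numbers below are assumed, not benchmarked.

```python
model_size_gb = 40          # e.g. a ~70B model quantized to ~4.5 bits/weight (assumed)
vram_bandwidth_gb_s = 1000  # assumed GPU memory bandwidth

max_tokens_per_s = vram_bandwidth_gb_s / model_size_gb
print(f"~{max_tokens_per_s:.0f} tokens/s upper bound for a single stream")
```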

numpad0•31m ago
Not sure what was unexpected about the multi-GPU part.

It's very well known that most LLM frameworks, including llama.cpp, split models by layers, which have a sequential dependency, so multi-GPU setups are completely stalled unless there are n_gpu users/tasks running in parallel. It's also known that some GPUs are faster at "prompt processing" and some at "token generation", so combining Radeon and NVIDIA cards sometimes does something useful. Reportedly the inter-layer transfer sizes are in the kilobyte range, so PCIe x1 is plenty, or something like that.

It takes appropriate backends with "tensor parallel" mode support, which split the neural network parallel to the direction of data flow and which obviously benefit substantially from a good interconnect between GPUs, like PCIe x16, NVLink/Infinity Fabric bridge cables, and/or inter-GPU DMA over PCIe (called GPU P2P or GPUDirect or some lingo like that).

Absent those, I've read somewhere that people can sometimes see GPU utilization spikes walking across the GPUs in nvtop-style tools.

Looking for a way to break up tasks for LLMs so that there are multiple tasks to run concurrently would be interesting, maybe by creating one "manager" and a few "delegated engineer" personalities. Or simulating multiple domains of the brain, such as a speech center, visual cortex, language center, etc., communicating in tokens might be an interesting way to work around this problem.
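
A toy numpy sketch of the two split strategies described above (not a real inference engine; the "GPUs" are just separate arrays):

```python
import numpy as np

x = np.random.randn(1, 4096).astype(np.float32)
w1 = np.random.randn(4096, 4096).astype(np.float32)  # layer 1 weights
w2 = np.random.randn(4096, 4096).astype(np.float32)  # layer 2 weights

# Layer split (llama.cpp-style): GPU0 holds w1, GPU1 holds w2. Only the small
# hidden state (4096 floats, ~16 KB) crosses the interconnect, but GPU1 idles
# while GPU0 works and vice versa.
h = x @ w1            # on "GPU0"
y_pipeline = h @ w2   # on "GPU1", after shipping h over

# Tensor parallel: each GPU holds half the columns of every layer and works
# at the same time, but partial results must be exchanged after every layer.
h_a, h_b = x @ w1[:, :2048], x @ w1[:, 2048:]  # GPU0 and GPU1 in parallel
h_full = np.concatenate([h_a, h_b], axis=1)    # all-gather over the interconnect
y_a, y_b = h_full @ w2[:, :2048], h_full @ w2[:, 2048:]
y_tensor = np.concatenate([y_a, y_b], axis=1)

assert np.allclose(y_pipeline, y_tensor)
```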

zozbot234•22m ago
> Looking for a way to break up tasks for LLMs so that there are multiple tasks to run concurrently would be interesting, maybe by creating one "manager" and a few "delegated engineer" personalities.

This is pretty much what "agents" are for. The manager model constructs prompts and contexts that the delegated models can work on in parallel, returning results when they're done.
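
A minimal sketch of that fan-out pattern; call_model and the endpoint names are hypothetical placeholders for whatever local inference servers each GPU exposes.

```python
from concurrent.futures import ThreadPoolExecutor


def call_model(endpoint: str, prompt: str) -> str:
    # Hypothetical placeholder: in practice this would POST the prompt to a
    # local inference server and return its completion.
    return f"[{endpoint}] result for: {prompt[:40]}"


subtasks = [
    ("http://gpu0:8000", "Summarize the first half of the document ..."),
    ("http://gpu1:8000", "Summarize the second half of the document ..."),
]

# The "manager" fans the prompts out so every GPU has its own task in flight,
# then aggregates the results in a final pass.
with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
    results = list(pool.map(lambda task: call_model(*task), subtasks))

print("\n".join(results))
```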

alecco•23m ago
Wide interconnects to load and sync do matter, though.

'LeBron James of spreadsheets' wins world Microsoft Excel title

https://www.bbc.com/news/articles/cj4qzgvxxgvo
72•1659447091•1h ago•19 comments

Backing Up Spotify

https://annas-archive.li/blog/backing-up-spotify.html
219•vitplister•2h ago•70 comments

Pure Silicon Demo Coding: No CPU, No Memory, Just 4k Gates

https://www.a1k0n.net/2025/12/19/tiny-tapeout-demo.html
208•a1k0n•4h ago•23 comments

OpenSCAD Is Kinda Neat

https://nuxx.net/blog/2025/12/20/openscad-is-kinda-neat/
123•c0nsumer•3h ago•84 comments

Log level 'error' should mean that something needs to be fixed

https://utcc.utoronto.ca/~cks/space/blog/programming/ErrorsShouldRequireFixing
227•todsacerdoti•3d ago•145 comments

Big GPUs don't need big PCs

https://www.jeffgeerling.com/blog/2025/big-gpus-dont-need-big-pcs
72•mikece•3h ago•26 comments

I spent a week without IPv4

https://www.apalrd.net/posts/2023/network_ipv6/
70•mahirsaid•2h ago•87 comments

Gemini 3 Pro vs. 2.5 Pro in Pokemon Crystal

https://blog.jcz.dev/gemini-3-pro-vs-25-pro-in-pokemon-crystal
208•alphabetting•4d ago•63 comments

Go ahead, self-host Postgres

https://pierce.dev/notes/go-ahead-self-host-postgres#user-content-fn-1
317•pavel_lishin•5h ago•226 comments

Show HN: HN Wrapped 2025 - an LLM reviews your year on HN

https://hn-wrapped.kadoa.com?year=2025
57•hubraumhugo•7h ago•30 comments

Biscuit is a specialized PostgreSQL index for fast pattern matching LIKE queries

https://github.com/CrystallineCore/Biscuit
32•eatonphil•4d ago•6 comments

Depot (YC W23) Is Hiring an Enterprise Support Engineer (Remote/US)

https://www.ycombinator.com/companies/depot/jobs/jhGxVjO-enterprise-support-engineer
1•jacobwg•3h ago

NTP at NIST Boulder Has Lost Power

https://lists.nanog.org/archives/list/nanog@lists.nanog.org/message/ACADD3NKOG2QRWZ56OSNNG7UIEKKT...
384•lpage•13h ago•177 comments

X-59 3D Printing

https://www.nasa.gov/stem-content/x-59-3d-printing/
20•Jsebast24•4d ago•1 comment

Immersa: Open-source Web-based 3D Presentation Tool

https://github.com/ertugrulcetin/immersa
109•simonpure•7h ago•14 comments

Detailed balance in large language model-driven agents

https://arxiv.org/abs/2512.10047
29•Anon84•4d ago•3 comments

Mathematicians don't care about foundations

https://matteocapucci.wordpress.com/2022/12/21/mathematicians-dont-care-about-foundations/
14•scrivanodev•2h ago•1 comment

Skills Officially Comes to Codex

https://developers.openai.com/codex/skills/
205•rochansinha•12h ago•106 comments

Why do people leave comments on OpenBenches?

https://shkspr.mobi/blog/2025/12/why-do-people-leave-comments-on-openbenches/
44•sedboyz•5h ago•2 comments

CSS Grid Lanes

https://webkit.org/blog/17660/introducing-css-grid-lanes/
684•frizlab•22h ago•210 comments

Privacy doesn't mean anything anymore, anonymity does

https://servury.com/blog/privacy-is-marketing-anonymity-is-architecture/
321•ybceo•14h ago•219 comments

Mistral OCR 3

https://mistral.ai/news/mistral-ocr-3
656•pember•2d ago•119 comments

Shallow trees with heavy leaves (2020)

https://cp4space.hatsya.com/2020/12/13/shallow-trees-with-heavy-leaves/
5•HeliumHydride•5d ago•0 comments

Reflections on AI at the End of 2025

https://antirez.com/news/157
172•danielfalbo•11h ago•247 comments

Charles Proxy

https://www.charlesproxy.com/
276•handfuloflight•14h ago•102 comments

Garage – An S3 object store so reliable you can run it outside datacenters

https://garagehq.deuxfleurs.fr/
672•ibobev•1d ago•150 comments

Maximizing Compression of Apple II Hi-Res Images

http://deater.net/weave/vmwprod/hgr_compress/
19•deater•4d ago•2 comments

You have reached the end of the internet

https://hmpg.net/
16•raytopia•3h ago•1 comment

New Quantum Antenna Reveals a Hidden Terahertz World

https://www.sciencedaily.com/releases/2025/12/251213032617.htm
121•aacker•4d ago•8 comments

A train-sized tunnel is now carrying electricity under South London

https://www.ianvisits.co.uk/articles/a-train-sized-tunnel-is-now-carrying-electricity-under-south...
109•zeristor•12h ago•80 comments