
Mini PC for local LLMs in 2026

https://terminalbytes.com/best-mini-pc-for-local-llm-2026/
27•charlieirish•1h ago

Comments

znpy•1h ago
As somebody that has a vague interest in running local LLMs… the day I decide to burn cash on hardware I might as well go all-in and get either a 128gb mac studio or an nvidia dgx spark (or some other equivalent gb10-based system).

The 64gb mac mini is also interesting, if anything because it is very likely to hold most of its value when reselling.

I’m keeping an eye on the next apple hardware refreshes, particularly for mac minis and mac studios.

amelius•1h ago
The models are good enough now, so I'm waiting for the day they start selling inference ASICs with 100x the token output speed. See Taalas demo.
adityamwagh•58m ago
Taalas is a nice concept, but I don’t want to use the same model forever!
amelius•49m ago
Just buy a new one every few years, just like your phone and laptop. And sell the old one.
2ndorderthought•1h ago
I just use my gaming pc. So I can play games or code with assistance for fun. It's awesome because it's mine and technically I can do whatever I want with it. Having a decent computer around and lower end laptops is pretty budget friendly.
walthamstow•1h ago
The 14inch Macbook Pros with 64GB are really good value considering it's a much more complicated machine than the Mini.
edot•49m ago
I am in a similar boat to you, but I can’t make the money math work. Local LLMs obviously have a privacy benefit, but DeepSeek V4 Flash (which you’ll struggle to get running on any single Mac - you’d need at least 128gb RAM) is $0.14/mtok input and $0.28/mtok output on the API. You’d have to be just absolutely burning tokens to ever make this make sense.

Mac Studio M4 Max with 128gb at $3,699 (if you can find it) would equate to 10 million tokens a day of mixed input-output for over 5 years to break even. At which point that hardware is outdated compared to the SOTA models that will probably still be cheap on hosted platforms.
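edot's break-even estimate can be sanity-checked with a quick sketch. The hardware price, API prices, and 10M tokens/day are from the comment; the even input/output split is an assumption for illustration (a more input-heavy mix stretches the break-even further):

```python
# Break-even: $3,699 Mac Studio vs. hosted pricing of $0.14/Mtok input
# and $0.28/Mtok output, at 10M tokens/day.
# Assumption (not from the comment): a 50/50 input/output token split.

hardware_cost = 3699.0             # USD
price_in, price_out = 0.14, 0.28   # USD per million tokens
tokens_per_day = 10_000_000

blended_price = (price_in + price_out) / 2          # $/Mtok at an even split
cost_per_day = tokens_per_day / 1e6 * blended_price
breakeven_years = hardware_cost / cost_per_day / 365

print(f"API cost/day: ${cost_per_day:.2f}")          # $2.10
print(f"Break-even:   {breakeven_years:.1f} years")  # ~4.8 years
```

With an even split this lands just under five years; skew the mix toward cheaper input tokens and it pushes past five, consistent with the comment's figure.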

pjmlp•1h ago
Currently NVidia's mini PC, or the version licensed to Asus, is one of the few that I can actually buy with a fully OEM-supported version of Linux pre-installed.

One would expect that by now buying desktop-class computers in shops, with a proper Linux experience, would be rather common.

Geekcom devices advertised as Linux ready are actually sold with Windows pre-installed.

I guess they mean WSL ready.

Neywiny•1h ago
I would guess they mean it's ready for you to install Linux on it
pjmlp•54m ago
Yeah, ignoring the whole fragmentation that keeps happening on the desktop stack, The Year of Desktop Linux will never happen if only computer nerds get to build such systems, as it has always been.

Instead normies get The Year of Linux kernel deployed with all kinds of consumer devices, and The Year of Linux VMs on retail.

alexktz•1h ago
Could we post articles that are obviously written by an LLM with a flair?
aalam•1h ago
"Here's the part that nobody talks about"

"Two gotchas before you click buy"

I really think there could be a score for entropy in playfulness that would differentiate LLM output

bachmeier•1h ago
"Local inference is rarely cheaper if you’re being honest with yourself about how much you actually use it."

Sorry, but this is not even close to "being honest", it's bad math. That calculation assumes you do nothing with the computer other than local inference.

hdgvhicv•58m ago
Doesn't that calculation assume you value your privacy and ownership at zero too?
spwa4•31m ago
Huh, you make me curious. Let's actually do that calculation. Let's say you actually do 24/7/365 AI use. Let's say by some miracle you can do 60 t/s on Qwen 3.6 27b, and let's say this PC cost $3000 (you should be able to do this on a DGX spark, or one of the non-Nvidia models, e.g. the Dell one. $3000 would be a good price, but not totally out of the question). And, of course, let's say these prices remain stable.

So that gets you 1_892_160_000 tokens per year at full blast.

If you go the openrouter, eh, route, you'd get charged $2 per million tokens (anywhere from $2 to $3.6 per million tokens). So the value you'd get from your machine at 100% utilization is 1892 * $2 = $3784, up to 1892 * $3.6 = $6812.

So yeah, not counting electricity and your time the machine "is worth it".

[1] https://openrouter.ai/qwen/qwen3.6-27b/providers
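The arithmetic above reproduces directly; the throughput and per-token prices are the comment's own assumptions:

```python
# spwa4's calculation: 60 tok/s sustained around the clock for a year,
# valued at the quoted $2–$3.6 per million tokens.
tok_per_sec = 60
seconds_per_year = 60 * 60 * 24 * 365   # 31,536,000

tokens_per_year = tok_per_sec * seconds_per_year
value_low = tokens_per_year / 1e6 * 2.0
value_high = tokens_per_year / 1e6 * 3.6

print(f"{tokens_per_year:,} tokens/year")  # 1,892,160,000
print(f"${value_low:,.0f} to ${value_high:,.0f} at 100% utilization")
```

Note this is the value of the machine at perfect utilization; any idle time, electricity, and the gap between 60 t/s hoped-for and real-world throughput all cut into it.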

bluechair•1h ago
“What’s the memory bandwidth (GB/s) of the device holding the model weights?”

Isn't the recommended option going to be dog slow at 256 GB/s?

croes•1h ago
> 128GB Ryzen AI MAX+ 395, listed at $2,099.

Wasn't that a discounted price?

cowmix•57m ago
I got mine almost exactly a year ago - $1699 direct from GMKTEK. To think it retails for 2X that, a year later, blows my mind.
dannyw•1h ago
> The 256 GB/s number is real, but for context, an Apple M5 Ultra hits ~800 GB/s on its unified memory

The M5 Ultra has not been even announced.

This article appears to be predominantly or entirely LLM-produced with little to no human review, and contains numerous material, misleading errors.

It also omits serious contenders that are worth at least comparing, like the DGX Spark.

woadwarrior01•55m ago
It appears to be an LLM-generated affiliate link farm.
jcgrillo•59m ago
I got a well used HP Z840 with 256GB ECC DDR4 and twin Xeons ca. 2014. Then I slapped 2 AMD V640 32GB passively cooled GPUs in it with some 3D printed fan shrouds and 2 1U 15k rpm fans each. They just fit! I needed to order a quad 8pin power cable, the standard configuration has 3 6pin cables--but there's unused pins on the GPU power rail, and there are aftermarket suppliers.

72 Xeon cores

256GB ECC DDR4

64GB VRAM

$2200 total

I run it on a 20A 240V outlet to make sure the power supply can deliver enough watts, but so far it's working pretty well. The eWaste LLM rig is probably not as good value for money as a new machine, but it gets the job done cheaper (for now).

EDIT: IIRC this approach gets me more VRAM bandwidth than Strix Halo at the cost of less addressable GBs (but a lot more total system RAM), but I figured with CPU offloading that might make up for it?

ALSO EDIT: Note you can get a 128GB Strix Halo motherboard minus power supply, fans, case, etc from Framework for $2200.. that could work if you have some parts lying around.

lkey•56m ago
This article was authored by AI. It contains hallucinated info from compilations of random reddit threads.
visarga•53m ago
Yes, I too think it's authored by AI, but can you indicate where it is wrong?
mark_l_watson•55m ago
I bought a 32G MacMini over two years ago and it has been great for experimenting with local models, and now is even useful for local coding (at a slow speed!) with models supporting large context sizes.

With the current extreme RAM shortage I deeply regret not buying a 64G MacMini a few months ago.

I bet a zillion people feel the same way.

pjmlp•35m ago
Which is why the Mac Pro was actually relevant.

Those of us on PC land can at least extend them, or exchange the GPU, even if pricey.

Apple has lost the server and workstation market by their own decisions.

visarga•54m ago
Good research, but man do I feel the LLM vibe shining through. That sustained information density...
jcgrillo•43m ago
Look closer, it really isn't good research
jmyeet•53m ago
There's some mention of Apple silicon here but it's worth expanding upon. Macs have a unified memory architecture. So if you have a Mac with 64GB of memory then the GPU can use all of that. This is potentially quite useful but Apple silicon in general is limited by memory bandwidth. For comparison, a 5090 is 1792GB/s. Here are some examples:

- GMKTek EVO-X2: 120GB/s reads, 212GB/s writes

- NVidia DGX Spark 273GB/s

- Mac Mini M4 120GB/s but only $600+

- Mac Mini w/ M4 Pro 273GB/s ($2199 for 64GB)

- Mac Studio M4 Max 410GB/s ($3500 for 128GB)

- Mac Studio M3 Ultra 819GB/s ($5500 for 96GB)

- Macbook Pro 16" with M5 Pro 64GB 307GB/s ($3300)

- Macbook Pro 16" with M5 Max 128GB 460GB/s ($5399)

Sadly, Apple discontinued the 512GB Mac Studio. Mac Studios are a little long in the tooth now and due for an upgrade this year. I suspect that prices will be a lot higher given the RAM prices but we'll see.
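The reason memory bandwidth dominates these comparisons is that dense-model token generation has to stream the full weights from memory for every token, so bandwidth divided by model size gives a rough ceiling on decode speed. A minimal sketch using the bandwidths quoted above (the 40 GB model size is a hypothetical, roughly a 70B-class model at 4-bit quantization; this ignores compute limits, KV-cache traffic, and MoE sparsity, so treat it as an upper bound):

```python
# Rough decode-speed ceiling: tokens/s <= memory_bandwidth / model_bytes,
# since each generated token streams the (dense) weights once.

def max_decode_tps(bandwidth_gbs: float, model_gb: float) -> float:
    """Upper bound on tokens/s for a dense model of `model_gb` gigabytes."""
    return bandwidth_gbs / model_gb

model_gb = 40  # hypothetical ~70B-class model, 4-bit quantized

for name, bw in [("Mac Mini M4", 120), ("DGX Spark", 273),
                 ("M4 Max", 410), ("M3 Ultra", 819), ("RTX 5090", 1792)]:
    print(f"{name:12s} ~{max_decode_tps(bw, model_gb):5.1f} tok/s")
# e.g. DGX Spark at 273 GB/s -> ~6.8 tok/s ceiling for a 40 GB model
```

Which is why the 256 GB/s class of mini PCs feels slow for large dense models: the ceiling is single-digit tokens per second before any other overhead.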

Upcoming Blender Development Fund and AI Policies

https://www.blender.org/news/upcoming-blender-development-fund-and-ai-policies/
1•sensanaty•55s ago•0 comments

The Annoying Usefulness of Emacs [video]

https://www.youtube.com/watch?v=DMbrNhx2zWQ
1•susam•1m ago•0 comments

The Sky Tonight

https://theskylive.com/guide
1•susam•2m ago•0 comments

New US phone network for Christians to block porn and gender-related content

https://www.technologyreview.com/2026/05/01/1136739/a-new-t-mobile-network-for-christians-aims-to...
2•thinkingemote•5m ago•0 comments

Making Your Writing Work Harder for You

https://training.kalzumeus.com/newsletters/archive/content-marketing-strategy
2•eigenBasis•7m ago•0 comments

Show HN: TradingAgents without the API bill – run multi agents in Claude Code

https://github.com/lucemia/trading-agents-plugin
1•lucemia51•12m ago•0 comments

Stop Supplying. Start Owning

https://allensthoughts.com/2026/05/01/stop-supplying-start-owning/
2•herbertl•13m ago•0 comments

Uber wants to turn its drivers into a sensor grid for AV companies

https://techcrunch.com/2026/05/01/uber-wants-to-turn-its-millions-of-drivers-into-a-sensor-grid-f...
2•nickvec•13m ago•0 comments

Zugzwang

https://en.wikipedia.org/wiki/Zugzwang
5•Qem•17m ago•0 comments

If Claude writes the code, what makes me still a developer?

https://betweentheprompts.com/if-claude-writes-the-code/
2•scastiel•20m ago•0 comments

Santa Cruz restaurant changes logo after flurry of negative reviews for AI art

https://www.sfgate.com/food/article/santa-cruz-restaurant-ai-21955920.php
3•randycupertino•21m ago•0 comments

LLMs consistently pick resumes they generate over ones by humans or other models

https://arxiv.org/abs/2509.00462
34•laurex•24m ago•10 comments

Domination: A contrarian view of AI risk (2024)

https://matthewbutterick.com/chron/domination.html
2•vermilingua•33m ago•0 comments

I moved my blog from Jekyll to Emacs Lisp

https://martinsos.com/posts/my-blog-in-elisp
2•Martinsos•35m ago•1 comments

The History of Lipstick

https://www.saturdayeveningpost.com/2026/04/common-threads-the-history-of-lipstick/
2•ohjeez•35m ago•0 comments

Alberta allows windfall oil and gas payments to select ranchers – on public land

https://thenarwhal.ca/alberta-grazing-oil/
3•Teever•37m ago•0 comments

US blockade costs Iran $4.8B, US Navy acting 'sort of like pirates,' Trump says

https://www.jpost.com/middle-east/iran-news/article-894867
2•Levitating•39m ago•3 comments

A preliminary model to establish a digital twin for coffee roasting

https://www.nature.com/articles/s41598-026-43923-9?fromPaywallRec=false
3•bookofjoe•39m ago•0 comments

Show HN: RegularMonk – a web app that helps me use my phone less

https://www.regularmonk.com/hello
1•amit9968•40m ago•0 comments

Apple Faces Lawsuits over AirTag Stalking After Class Action Denied

https://www.macrumors.com/2026/05/01/airtag-stalking-lawsuits-apple/
1•mgh2•40m ago•1 comments

Make Common Sense Common Again

https://nik.art/make-common-sense-common-again/
1•herbertl•42m ago•0 comments

Stackless coroutines for gamedev in ~200 lines of C++

https://vittorioromeo.com/index/blog/sfex_coroutine.html
2•tzury•42m ago•0 comments

Proudly Pathetic

https://craigatallahfrost.com/post/2025/08/17/proudly-pathetic/
1•herbertl•44m ago•0 comments

NASA to increase CLPS contract to support surge of lunar lander missions

https://spacenews.com/nasa-to-increase-value-of-clps-contract-to-support-surge-of-lunar-lander-mi...
3•rbanffy•45m ago•0 comments

America's Expanding Domestic Surveillance

https://www.wsj.com/articles/americas-expanding-domestic-surveillance-08b73187
7•Brajeshwar•50m ago•0 comments

The Fake Hawaii CTO Who Fooled Everyone

https://dallasexpress.com/national/from-vegas-stages-to-official-warnings-the-fake-hawaii-cto-who...
1•greenchair•50m ago•0 comments

Apple Stores Targeted in $16.2M Counterfeit Device Scheme

https://pasadenanow.com/main/pasadena-apple-store-among-locations-targeted-in-16-2-million-counte...
1•kid64•50m ago•0 comments

Docker vs. Podman: Which Containerization Tool Is Right for You – DataCamp

https://www.datacamp.com/blog/docker-vs-podman
1•abdelhousni•54m ago•1 comments

Ask HN: Do people configure Claude Code to use other models

https://openrouter.ai/apps/claude-code
1•ripvanwinkle•54m ago•6 comments

LibreLocal 2026 – Global Meetups Across Six Continents

https://tux.re/forum/viewtopic.php?t=217
1•tuxyz•54m ago•0 comments