Ask HN: Is anyone using a Linux machine for local inference?

2•throwaw12•10h ago

Hey there,

Is anyone here using a Linux machine with 256GB or 512GB of RAM to run the latest models locally?

I am considering buying a new laptop or desktop to run models locally. Most benchmarks I see are for Mac Mx series chips with MLX, and even then, for big models (>300B parameters) people are using quantized versions (3-bit, 4-bit), which causes a drop in quality.
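To put the quantization point in numbers, here is a rough sketch of how much memory the weights alone need at different bit widths. It assumes a flat bits-per-weight figure and ignores the KV cache, activations, and the per-block scale overhead that real quantization formats add, so treat the results as lower bounds. The function name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: every weight is stored at exactly `bits_per_weight` bits.
    Real formats add some overhead, and inference needs extra memory on top.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # A 300B-parameter model at common precisions.
    for bits in (16, 8, 4, 3):
        print(f"300B params at {bits}-bit: ~{weight_memory_gb(300, bits):.1f} GB")
```

By this estimate a 300B model needs roughly 600 GB at 16-bit and about 150 GB at 4-bit, which is why 256GB or 512GB of RAM is the threshold in question.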

If anyone has used Linux with >256GB of RAM and no dedicated GPU, how was your experience?

Comments

compressedgas•8h ago

Running LLMs on CPU only is too slow.

incomingpain•7h ago

I've tried this with DeepSeek R1. I got about 2 tokens/second, and each response took 10-15 minutes.
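As a sanity check on those numbers, a quick sketch of how many tokens fit in a response at that rate. It assumes the whole wait is spent generating at a constant speed (prompt processing is ignored), and `tokens_generated` is just an illustrative helper name.

```python
def tokens_generated(tokens_per_second: float, minutes: float) -> float:
    """Tokens produced at a constant generation rate over the given time.

    Assumption: the full duration is decode time; prompt processing is ignored.
    """
    return tokens_per_second * minutes * 60


if __name__ == "__main__":
    for minutes in (10, 15):
        print(f"{minutes} min at 2 tok/s: ~{tokens_generated(2, minutes):.0f} tokens")
```

So 10-15 minutes at 2 tokens/second is on the order of 1,200-1,800 tokens per response, under that assumption.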

The hardware was free to me, but building this yourself would cost thousands. You might as well just hit up an API: https://openrouter.ai/deepseek/deepseek-r1-0528/providers

Even if you hammer it, it'll only be $10.

>Most benchmarks I see are for Mac Mx series chips with MLX

A Mac mini pro with 64GB of RAM is actually suspiciously good value. Something like $4000... a bit high, but it can be your workstation.

The GPU and system memory are unified, so you can load bigger models. It's not as fast as high-end GPUs, but it doesn't draw the same power either. You'll stay under 200 watts.

Obviously 64GB doesn't let you run full DeepSeek or similar either, but those 32B-70B models are ideal anyway.

For a bit less, there are mini PCs with the AMD Ryzen™ AI Max+ 395. Same idea as the Mac mini, and you can get 64-128GB of RAM. Intel has a similar chip.

You'll get 15-20 tokens/s from a 32B model, which is slow if you're coding.

Now, you could look into high-end GPUs: get a server mobo with 10 PCIe slots and load it up with 16GB cards for 160GB of VRAM. But you'll need special electrical plugs, and it'll idle at around 600 watts, costing $100/month. But man, that thing would be great, so fast.
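For anyone checking the power math, a small sketch of the idle energy use and the electricity price the $100/month figure implies. It assumes a constant 600 W draw around the clock and a 30-day month. The helper names are made up, and your actual rate per kWh will vary.

```python
def monthly_kwh(watts: float, hours_per_day: float = 24, days: int = 30) -> float:
    """Energy used in a month at a constant power draw, in kWh."""
    return watts * hours_per_day * days / 1000


def implied_price_per_kwh(monthly_cost: float, watts: float) -> float:
    """Electricity price (per kWh) that would produce the given monthly bill."""
    return monthly_cost / monthly_kwh(watts)


if __name__ == "__main__":
    kwh = monthly_kwh(600)
    print(f"600 W idle: {kwh:.0f} kWh/month")
    print(f"$100/month implies ~${implied_price_per_kwh(100, 600):.2f}/kWh")
```

Idling at 600 watts works out to 432 kWh a month, so $100/month corresponds to roughly $0.23 per kWh.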
