A 10 year old Xeon is all you need (for 26B-A4B MTP Drafters without GPU)

https://point.free/blog/gemma-4-on-a-2016-xeon/
5•cafkafk•56m ago

Comments

cafkafk•53m ago
Hi HN. I wrote this post after getting frustrated by how few ways there are to run the new Gemma 4 Drafter models, and by mainstream tools not prioritizing this and hiding all the performance levers.

I ended up getting a modern 26B MoE model (Gemma 4) running at reading speed on an old recycled server with a single Xeon E5-2620 v4 and 128GB of DDR3 RAM (and no GPU). It took a lot of work, but it actually worked out somehow.

I've also linked the quants at the end, but they're not gonna run unless you use the ik_llama-cpp fork I mention. See other posts for more details.

I'm not an ML engineer, so I'm by no means an expert, and the server is busy acting as a Nix cache, but if you have any questions, I'll try to answer on a best-effort basis.

fragmede•28m ago
(purple on black is really hard to read)

You say it runs "at reading speed". Have you benchmarked it?

cafkafk•3m ago
> (purple on black is really hard to read)

Noted, and agreed (it also looks like it has already been clicked, which I dislike). I honestly need to redo the themes.

> You say it runs "at reading speed". Have you benchmarked it?

I think I did at some point a few weeks ago, but I didn't write it down for some reason, so I'll have to find a time when the server isn't busy and rerun it on a quiet system. Right now the system is noisy, but that said, running it like this:

  llama-cli --model gemma-4-26B-A4B-it-Q8_0.gguf \
    --model-draft gemma-4-26B-A4B-t-assistant-GGUF/wikitext-2-raw_ik-llama-mtp_drafter-conservative/gemma-4-26B-A4B-it-assistant-Q8_0.gguf \
    --spec-type mtp --draft-max 3 --draft-p-min 0.0 \
    --color -sm graph -smgs -sas -mea 256 --split-mode-f32 \
    --temp 0.7 --cpu-moe -t 8 --flash-attn on --mla-use 3 \
    --merge-up-gate-experts --special --mlock --run-time-repack \
    --spec-autotune --no-kv-offload --parallel 8 --jinja \
    -p "Why is the sky blue?" -n 128

Gives:

  llama_print_timings:        load time =   83911.65 ms
  llama_print_timings:      sample time =      26.99 ms /   128 runs   (    0.21 ms per token,  4742.15 tokens per second)
  llama_print_timings: prompt eval time =     343.41 ms /     7 tokens (   49.06 ms per token,    20.38 tokens per second)
  llama_print_timings:        eval time =   10639.36 ms /   127 runs   (   83.77 ms per token,    11.94 tokens per second)
  llama_print_timings:       total time =   11114.98 ms /   134 tokens

So 11.94 tokens per second while it's also playing binary cache and CI builder.
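
For anyone who wants to sanity-check that number, here is a minimal sketch that recomputes throughput from the two timing lines above, then converts it to a rough words-per-minute figure for the "reading speed" comparison. The helper names and the 0.75 words-per-token ratio are assumptions for illustration, not something measured on this model or part of any llama.cpp tooling.

```python
import re

# Timing lines copied from the run above.
TIMINGS = """\
llama_print_timings: prompt eval time =     343.41 ms /     7 tokens (   49.06 ms per token,    20.38 tokens per second)
llama_print_timings:        eval time =   10639.36 ms /   127 runs   (   83.77 ms per token,    11.94 tokens per second)
"""

# Captures the phase label, total milliseconds, and token/run count.
LINE_RE = re.compile(
    r"llama_print_timings:\s+(.+?) time =\s+([\d.]+) ms /\s+(\d+) (?:tokens|runs)"
)


def tokens_per_second(log: str) -> dict[str, float]:
    """Recompute throughput per phase as count / (total ms / 1000)."""
    rates = {}
    for label, ms, count in LINE_RE.findall(log):
        rates[label] = int(count) / (float(ms) / 1000.0)
    return rates


def words_per_minute(tok_per_s: float, words_per_token: float = 0.75) -> float:
    """Rough conversion; words_per_token is an assumed rule of thumb."""
    return tok_per_s * words_per_token * 60.0


rates = tokens_per_second(TIMINGS)
for label, rate in rates.items():
    print(f"{label}: {rate:.2f} tokens/s")
print(f"generation: about {words_per_minute(rates['eval']):.0f} words/min (assumed ratio)")
```

Under that assumed ratio, generation lands at a bit over 500 words per minute, which is faster than most people read, though the real ratio depends on the tokenizer and the text.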

When I do it properly, I'll add it to the blog as well!

Eonexus•17m ago
I wonder what the tokens per second actually are. Yes, it does say "reading speed", but that varies for everyone, no?

potus_kushner•15m ago
@cafkafk got a recommendation for a good model that fits into 64GB and leaves a couple of GB free for other tasks?

Decache – you might have lost media in your PC's cache folders

https://sindexmon.github.io/decache/
1•notRobot•27s ago•0 comments

Criminal Activities and Migration

https://www.michelecoscia.com/?p=2545
1•mikk14•1m ago•0 comments

A free, open-source library of DESIGN.md files for AI-generated UIs

https://design-md-web.pages.dev/
1•albemala•2m ago•1 comments

Dune's Butlerian Jihad and the Future of AI

https://technology.inquirer.net/147084/dunes-butlerian-jihad-and-the-future-of-ai
1•SVI•4m ago•0 comments

MiniMax M3

https://xcancel.com/MiniMax_AI/status/2061266317815296322
1•44za12•4m ago•0 comments

People are apparently farming citations on ResearchGate – Chuniversiteit

https://chuniversiteit.nl/papers/citation-farming-on-researchgate
1•rhazn•5m ago•0 comments

The DOJ Wants to Know Who on Reddit and X Is Criticizing ICE's Tactics

https://www.bloomberg.com/news/articles/2026-05-28/trump-s-doj-ramps-up-probes-of-anonymous-ice-c...
1•petethomas•7m ago•0 comments

How Elon Musk Killed Hundreds of Thousands of People

https://www.currentaffairs.org/news/how-elon-musk-killed-hundreds-of-thousands-of-people
1•tastyface•11m ago•0 comments

Basketeer – a typed TS SDK for your Tesco account, with nutrition data

https://github.com/tobyandrews1985/basketeer
1•tobyandrews1985•12m ago•0 comments

'Penguin' decays from CERN's Large Hadron Collider experiment hint new physics

https://www.scientificamerican.com/article/these-exotic-particles-could-break-physics/
1•thunderbong•17m ago•0 comments

Emergence World: A Laboratory for Evaluating Long-Horizon Agent Autonomy

https://www.emergence.ai/blog/emergence-world-a-laboratory-for-evaluating-long-horizon-agent-auto...
1•mnky9800n•20m ago•0 comments

Homebrew lead Mike McQuaid: Sandboxes and Worktrees - My Secure Agentic AI Setup

https://mikemcquaid.com/sandboxed-agent-worktrees-my-coding-and-ai-setup-in-2026/
1•benwen•21m ago•0 comments

Lean, Not Backpressure

https://entropicthoughts.com/lean-not-backpressure
1•kqr•24m ago•0 comments

Using Git's rerere feature to escape recurring conflict hell

https://gist.github.com/skipcloud/f1033afb4fa5681d69fa63458cc95928
1•ankitg12•29m ago•0 comments

Malaysia enforces ban on social media accounts for children younger than 16

https://apnews.com/article/malaysia-social-media-ban-16-bfaa7b01163b61b5d53c4ecfa870d133
22•01-_-•30m ago•2 comments

AI Dangers Eclipse Nuclear Weapons at Singapore Defense Forum

https://www.bloomberg.com/news/articles/2026-05-30/ai-dangers-eclipse-nuclear-weapons-at-singapor...
1•01-_-•30m ago•0 comments

Open source analytics that answers backbase

https://www.metabase.com/
1•janandonly•31m ago•0 comments

Turkey Hacked the Hair Transplant Industry

https://www.wired.com/story/how-turkey-hacked-the-hair-transplant-industry/
1•joozio•33m ago•0 comments

How GPT Image 2 Is Transforming Marketing Workflows in 2026

https://gpt-image2ai.net/blog/gpt-image-2-marketing-workflows-2026/
1•wangneo276•34m ago•0 comments

Improve Git monorepo performance with a file system monitor

https://github.blog/engineering/infrastructure/improve-git-monorepo-performance-with-a-file-syste...
1•ankitg12•39m ago•0 comments

Strava for Claude Code

https://straude.com
1•fragmede•41m ago•0 comments

Rift: Better Alternative to Git Worktrees

https://github.com/anomalyco/rift
2•f4n4tiX•42m ago•0 comments

MiniMax M3 on Qubrid AI

1•Qubrid_AI•42m ago•0 comments

There's Something Else We Should Be Worrying About

https://www.nytimes.com/2026/05/31/opinion/artificial-intelligence-public-good.html
4•iancmceachern•52m ago•4 comments

Growth Isn't About Doing Everything

https://arpitbhayani.me/blogs/growth-is-not-about-doing-everything/
1•imakumar98•55m ago•0 comments

Celebrity Profile of an A.I. Actress

https://www.nytimes.com/2026/05/31/magazine/ai-actress-tilly-norwood.html
2•ryan_j_naughton•57m ago•0 comments

What Is Windows K2?

https://www.windowscentral.com/microsoft/windows-11/what-is-windows-k2-everything-you-need-to-kno...
1•tosh•58m ago•0 comments

AI is devoid of meaning and humanity. Its vapid voice suits the political moment

https://www.theguardian.com/commentisfree/2026/jun/01/ai-meaning-humanity-political-moment-trust-...
3•devonnull•1h ago•0 comments

Show HN: Interpreto – Live Translation for Travel

https://www.interpre.to
1•HudZah•1h ago•3 comments