Apple Silicon costs more than OpenRouter

https://twitter.com/rohan_sood15/status/2056585919805714777
20•rohansood15•1h ago

Comments

gavinsyancey•30m ago
You managed to reverse the title somehow.
rohansood15•28m ago
That's some HN shenanigans; I swear I copy-pasted my original title. https://imgur.com/a/UgJqWEh
selcuka•27m ago
Because less is more?
DesaiAshu•16m ago
I think the most correct would be "OpenRouter costs 800% less than Apple Silicon" ;)
consumer451•10m ago
For future reference, after posting you have a couple minutes to undo any of those auto-shenanigans. I always check, and undo whatever appears to be a silly regex.

YC is super AI-forward in its Requests for Startups, so it feels like about time that this became an LLM-based thing. LLMs do have their uses.

note: would be hilarious if this was the result of an LLM fail, using a new system. I am a regex muggle, but could regex have even fumbled like that?

est•22m ago
clickbait.
rohansood15•18m ago
Nope, HN changed the title.

https://imgur.com/a/UgJqWEh

rohansood15•22m ago
The title is "Apple Silicon costs LESS than OpenRouter". Not sure why it got changed to this; maybe because I referenced the original HN post?

Here's the full post:

TL;DR: Once you account for batching, caching, input tokens, and the residual value of a MacBook Pro, running locally is actually ~14% cheaper than OpenRouter. That becomes a whopping ~3x (i.e. ~65%) cheaper with MoE models like Gemma 4 26B.

There was a well-meaning post yesterday by @DataDrivenAngel comparing the cost of self-hosting LLMs vs. using OpenRouter (https://news.ycombinator.com/item?id=48168198). As the HN community pointed out, the analysis had a few flaws, so I ran benchmarks on my M4 Max 128GB to adjust for them.

1. The estimate was based entirely on output tokens instead of a real-world input/output token mix. The numbers look very different with a 4:1 or 5:1 input-to-output token ratio.

2. Batching, concurrency, and caching improve token throughput, and if you're running multiple coding agents or worktrees, the gain can be significant.

3. A MacBook Pro is an asset purchase and retains significant residual value over its life. It's probably not unreasonable to expect ~$1.5k-2.5k in resale value after 3-5 years of use.
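Point 1 boils down to a weighted average: with a 4:1 mix, four parts of the traffic are billed at the input price and one part at the output price. A minimal Python sketch of that blended price; the prices used are hypothetical placeholders, not real OpenRouter rates for any model, and the function name is illustrative:

```python
def blended_price_per_million(input_price: float, output_price: float,
                              input_to_output_ratio: float) -> float:
    """Average USD price per 1M tokens for a given input:output token mix.

    Prices are USD per 1M tokens; a 4:1 mix means input_to_output_ratio=4.0.
    """
    if input_to_output_ratio < 0:
        raise ValueError("input_to_output_ratio must be non-negative")
    # Weight each price by its share of the traffic: `ratio` parts input, 1 part output.
    weighted = input_to_output_ratio * input_price + output_price
    return weighted / (input_to_output_ratio + 1.0)


if __name__ == "__main__":
    # Hypothetical per-1M-token prices, for illustration only (not real OpenRouter rates).
    input_price, output_price = 0.10, 0.50
    for ratio in (0.0, 4.0, 5.0):
        price = blended_price_per_million(input_price, output_price, ratio)
        print(f"{ratio:g}:1 input:output -> ${price:.3f} per 1M tokens")
```

With these placeholder prices, an output-only estimate ($0.50) comes out about 2.8x higher than the 4:1 blend ($0.18), which is the effect point 1 describes.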

I ran vllm bench with a reasonable approximation of a coding-agent workload at concurrency 4 for Gemma 4 31B (the same model as the original post), and got the following results:

-----------------------------------

Serving Benchmark: Gemma 4 31B
Successful requests:                      20
Maximum request concurrency:              4
Benchmark duration (s):                   263.19
Total input tokens:                       35000
Total generated tokens:                   6400
Request throughput (req/s):               0.08
Output token throughput (tok/s):          24.32
Peak output token throughput (tok/s):     36
Peak concurrent requests:                 8
Total token throughput (tok/s):           157.3

Scenario    Local blended cost (per 1M tokens)    vs. OpenRouter
3 years     $0.15                                 Local cheaper (~6%)
5 years     $0.14                                 Local cheaper (~13%)
7 years     $0.13                                 Local cheaper (~19%)

-----------------------------------
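The throughput lines in the benchmark output follow from the totals and the run duration, so they can be rechecked by hand when comparing runs. A small Python sketch using the Gemma 4 31B figures above; `BenchRun` is a hypothetical helper, not part of vLLM:

```python
from dataclasses import dataclass


@dataclass
class BenchRun:
    """Totals copied from a vllm bench serving run (hypothetical helper)."""
    requests: int
    duration_s: float
    input_tokens: int
    output_tokens: int

    def request_throughput(self) -> float:
        return self.requests / self.duration_s

    def output_throughput(self) -> float:
        return self.output_tokens / self.duration_s

    def total_throughput(self) -> float:
        # Prompt (input) tokens count too, so this is far higher than the
        # output rate for an input-heavy coding-agent workload.
        return (self.input_tokens + self.output_tokens) / self.duration_s


gemma_31b = BenchRun(requests=20, duration_s=263.19,
                     input_tokens=35000, output_tokens=6400)
print(f"Request throughput (req/s):      {gemma_31b.request_throughput():.2f}")  # 0.08
print(f"Output token throughput (tok/s): {gemma_31b.output_throughput():.2f}")   # 24.32
print(f"Total token throughput (tok/s):  {gemma_31b.total_throughput():.1f}")    # 157.3
```

The total figure, not the output figure, is the one that matters when costing an input-heavy workload per blended token.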

Once you work out the math (using the original post's power-cost assumptions and a 5-year timeline), you get a blended cost of ~$0.14 per million tokens for local vs. ~$0.16 for OpenRouter. That is not a massive win, but it is close enough to flip the narrative from 'local is more expensive' to 'it depends'.
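The local side of the comparison is an amortization: net hardware cost (purchase price minus resale value) plus electricity, divided by the tokens processed over the period. A minimal Python sketch under stated assumptions: the purchase price, resale value, power draw, electricity rate, and duty cycle are hypothetical placeholders, not the original post's figures, and it assumes the benchmark's total token throughput is sustained whenever the machine is busy:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def local_cost_per_million(purchase_usd: float, resale_usd: float,
                           watts: float, usd_per_kwh: float,
                           total_tok_per_s: float, years: float,
                           duty_cycle: float = 1.0) -> float:
    """Blended local cost in USD per 1M (input + output) tokens.

    Net hardware cost (purchase minus resale) plus electricity used while
    serving, divided by the tokens processed over the period.
    """
    busy_seconds = years * SECONDS_PER_YEAR * duty_cycle
    tokens = total_tok_per_s * busy_seconds
    energy_kwh = watts * busy_seconds / 3600 / 1000  # watt-seconds -> kWh
    total_cost = (purchase_usd - resale_usd) + energy_kwh * usd_per_kwh
    return total_cost / tokens * 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs: price, resale, power draw and electricity rate are
    # placeholders; 157.3 tok/s is the total throughput from the 31B run above.
    for duty in (1.0, 0.5):
        cost = local_cost_per_million(
            purchase_usd=5000, resale_usd=2000, watts=120, usd_per_kwh=0.15,
            total_tok_per_s=157.3, years=5, duty_cycle=duty)
        print(f"duty cycle {duty:.0%}: ${cost:.3f} per 1M tokens")
```

Electricity is only counted while busy, so a lower duty cycle mainly spreads the fixed net hardware cost over fewer tokens and raises the per-token cost; the electricity rate moves the result too.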

But it doesn't end there. With an MoE model like Gemma 4 26B, the blended cost drops to $0.038 per million tokens vs. OpenRouter's $0.10 per million. That is a ~3x difference.

-----------------------------------

Serving Benchmark: Gemma 4 26B (MoE)
Successful requests:                      20
Maximum request concurrency:              4
Benchmark duration (s):                   60.05
Total input tokens:                       30002
Total generated tokens:                   4870
Request throughput (req/s):               0.33
Output token throughput (tok/s):          81.1
Peak output token throughput (tok/s):     128
Peak concurrent requests:                 8
Total token throughput (tok/s):           580.72

Scenario    Local blended cost (per 1M tokens)    vs. OpenRouter
3 years     $0.040                                Local cheaper (~60%)
5 years     $0.038                                Local cheaper (~62%)
7 years     $0.035                                Local cheaper (~65%)

-----------------------------------
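The percentage and the multiplier are two views of the same ratio: savings = 1 - local/API and multiplier = API/local. A short Python check against the Gemma 4 26B table, using the $0.10 per million OpenRouter figure quoted in the post; the `compare` helper is hypothetical:

```python
def compare(local_per_m: float, api_per_m: float) -> tuple[float, float]:
    """Return (fractional savings, cost multiplier) of local vs. API pricing."""
    savings = 1 - local_per_m / api_per_m
    multiplier = api_per_m / local_per_m
    return savings, multiplier


OPENROUTER_26B = 0.10  # USD per 1M tokens, as quoted in the post
local_26b = {"3 years": 0.040, "5 years": 0.038, "7 years": 0.035}

for scenario, local in local_26b.items():
    savings, multiplier = compare(local, OPENROUTER_26B)
    print(f"{scenario}: {savings:.0%} cheaper, {multiplier:.1f}x")
```

This prints 60%, 62% and 65% savings, matching the table, with multipliers of 2.5x, 2.6x and 2.9x respectively.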

This is not meant as an attack on the original analysis. I'm sure the synthetic benchmark I used has a few holes, and purchase price and residual value vary a fair bit. I also don't think anybody will run their MBP for inference 5 years straight. But with worsening GPU supply and the inevitable squeeze on price and access, I think local LLMs have a huge role to play, on top of their privacy benefits. A misperceived price differential shouldn't be what slows down adoption.

jmalicki•9m ago
The tweet doesn't make clear what the power cost assumptions are. Those vary wildly and matter a lot! Local may come out cheaper for some people, but perhaps not for others.
rohansood15•5m ago
I used the same assumptions as the original HN post https://news.ycombinator.com/item?id=48168198
dnnddidiej•13m ago
As you would expect? It is also cheaper than EC2 for general compute.
rohansood15•11m ago
The title auto-corrected, my post was 'less' not 'more'.
seltzered_•9m ago
Similar titled discussion from a day ago: https://news.ycombinator.com/item?id=48168198
