Apple Silicon costs more than OpenRouter

https://twitter.com/rohan_sood15/status/2056585919805714777

20•rohansood15•1h ago

Comments

gavinsyancey•30m ago

You managed to reverse the title somehow.

rohansood15•28m ago

That's some HN shenanigans, I swear I copy pasted my original title. https://imgur.com/a/UgJqWEh

selcuka•27m ago

Because less is more?

DesaiAshu•16m ago

I think the most correct would be "OpenRouter costs 800% less than Apple Silicon" ;)

consumer451•10m ago

For future reference, after posting you have a couple minutes to undo any of those auto-shenanigans. I always check, and undo whatever appears to be a silly regex.

YC is super AI-forward regarding request for startups, so it feels like about time that this became an LLM-based thing. LLMs do have their uses.

note: would be hilarious if this was the result of an LLM fail, using a new system. I am a regex muggle, but could regex have even fumbled like that?

est•22m ago

clickbait.

rohansood15•18m ago

Nope, HN changed the title.

https://imgur.com/a/UgJqWEh

rohansood15•22m ago

The title is Apple Silicon costs LESS than OpenRouter. Not sure why it got updated to this - maybe because I referenced the original HN post?

Here's the full post:

TL;DR: When you consider batching, caching and input tokens, together with the residual value of the machine, a MacBook Pro is actually 14% cheaper than OpenRouter. This becomes a whopping 3x (i.e. 65%) cheaper if you consider MoE models like Gemma 4 26B.

There was a well-meaning post yesterday by @DataDrivenAngel comparing costs of self-hosting LLMs v/s using OpenRouter (HN link). The analysis however had a few flaws as pointed out by the HN community, and I ran benchmarks on my M4 Max 128GB to adjust for those.

1. The estimate was based entirely on output tokens, instead of a real-world input-output token mix. The numbers look very different if you consider a 4:1 or 5:1 input to output token ratio.

2. Batching/concurrency/caching improves token throughput, and if you're running multiple coding agents/work trees the performance gain can be significant.

3. A MacBook Pro is an asset purchase, and retains significant residual value throughout its life. Probably not unreasonable to expect ~1.5-2.5k resale value after 3-5 years of use.
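
To make the first point concrete, here is a minimal sketch of how a blended price per million tokens shifts with the input:output mix. The prices are placeholders chosen for illustration, not OpenRouter's actual rates, and `blended_price_per_million` is a hypothetical helper.

```python
def blended_price_per_million(input_price, output_price, input_to_output_ratio):
    """Weighted average price per million tokens for a given input:output mix."""
    input_share = input_to_output_ratio / (input_to_output_ratio + 1)
    output_share = 1 - input_share
    return input_price * input_share + output_price * output_share


# Placeholder prices in $ per million tokens (assumptions, not real quotes).
INPUT_PRICE = 0.10
OUTPUT_PRICE = 0.50

for ratio in (0, 4, 5):  # 0 means "output tokens only"
    price = blended_price_per_million(INPUT_PRICE, OUTPUT_PRICE, ratio)
    print(f"{ratio}:1 input:output -> ${price:.3f} per million tokens")
```

With cheaper input tokens, the blended price falls as the share of input tokens grows, which is why an output-only estimate can overstate the hosted price for agent-style workloads.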

I ran vllm bench using a reasonable approximation of a coding agent workload at concurrency 4 for Gemma 4 31B (same as the original post), and got the following results:

-----------------------------------

Serving Benchmark: Gemma 4 31B
Successful requests: 20
Maximum request concurrency: 4
Benchmark duration (s): 263.19
Total input tokens: 35000
Total generated tokens: 6400
Request throughput (req/s): 0.08
Output token throughput (tok/s): 24.32
Peak output token throughput (tok/s): 36
Peak concurrent requests: 8
Total token throughput (tok/s): 157.3

Scenario    Local cost ($/M tokens)    Result
3 years     $0.15                      Local cheaper (~6%)
5 years     $0.14                      Local cheaper (~13%)
7 years     $0.13                      Local cheaper (~19%)

-----------------------------------
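
As a sanity check, the throughput figures in the Gemma 4 31B benchmark can be recomputed from the reported totals. This sketch assumes total token throughput counts both input and generated tokens over the wall-clock duration, which is consistent with the reported numbers; `throughput` is a hypothetical helper.

```python
def throughput(count, duration_s):
    """Items (tokens or requests) processed per second of wall-clock time."""
    return count / duration_s


# Figures reported in the Gemma 4 31B benchmark.
DURATION_S = 263.19
INPUT_TOKENS = 35000
GENERATED_TOKENS = 6400
REQUESTS = 20

total_tps = throughput(INPUT_TOKENS + GENERATED_TOKENS, DURATION_S)
output_tps = throughput(GENERATED_TOKENS, DURATION_S)
request_rate = throughput(REQUESTS, DURATION_S)

print(f"total: {total_tps:.2f} tok/s")
print(f"output: {output_tps:.2f} tok/s")
print(f"requests: {request_rate:.2f} req/s")
```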

Once you work out the math (using original assumptions on power costs and 5 year timeline), you get to a blended cost of ~$0.14 per million tokens for local, v/s ~$0.16 for OpenRouter. That is not a massive win. But it is close enough to flip the narrative from local being more expensive to 'it depends'.
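
For anyone who wants to plug in their own numbers, the math can be sketched roughly as below. This is an assumed cost model (net hardware cost plus electricity, divided by tokens served), not necessarily the exact formula used in the post. The purchase price, power draw and electricity price are placeholders; only the 157.3 tok/s throughput and the resale range come from the post.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
HOURS_PER_YEAR = 365 * 24


def local_cost_per_million(purchase_price, residual_value, avg_power_watts,
                           electricity_price_per_kwh, years, tokens_per_second,
                           utilization=1.0):
    """Blended local cost in $ per million tokens over the ownership period."""
    # Net hardware cost: what you paid minus what you expect to resell for.
    hardware_cost = purchase_price - residual_value
    # Electricity used while the machine is actually serving tokens.
    energy_kwh = avg_power_watts / 1000 * HOURS_PER_YEAR * years * utilization
    energy_cost = energy_kwh * electricity_price_per_kwh
    # Total tokens (input + output) served, in millions.
    million_tokens = tokens_per_second * SECONDS_PER_YEAR * years * utilization / 1e6
    return (hardware_cost + energy_cost) / million_tokens


# Placeholder inputs (assumptions); swap in your own purchase price and power costs.
cost = local_cost_per_million(
    purchase_price=5000,             # assumed, not from the post
    residual_value=2000,             # within the ~1.5-2.5k range mentioned above
    avg_power_watts=100,             # assumed
    electricity_price_per_kwh=0.15,  # assumed
    years=5,
    tokens_per_second=157.3,         # total token throughput from the benchmark
)
print(f"${cost:.3f} per million tokens")
```

Note that `utilization=1.0` means the machine serves tokens around the clock. Lower utilization leaves the hardware cost unchanged but spreads it over fewer tokens, so the cost per million tokens goes up.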

But it doesn't end there. If you used an MoE model like Gemma 4 26B, the blended cost drops to $0.038 per million tokens, v/s OpenRouter's $0.1 per million. That is a ~3x difference.

-----------------------------------

Serving Benchmark: Gemma 4 26B (MoE)
Successful requests: 20
Maximum request concurrency: 4
Benchmark duration (s): 60.05
Total input tokens: 30002
Total generated tokens: 4870
Request throughput (req/s): 0.33
Output token throughput (tok/s): 81.1
Peak output token throughput (tok/s): 128
Peak concurrent requests: 8
Total token throughput (tok/s): 580.72

Scenario    Local cost ($/M tokens)    Result
3 years     $0.040                     Local cheaper (~60%)
5 years     $0.038                     Local cheaper (~62%)
7 years     $0.035                     Local cheaper (~65%)

-----------------------------------
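
The "local cheaper" percentages follow from comparing each local cost with the OpenRouter price quoted in the post (~$0.16 per million for Gemma 4 31B, $0.1 for Gemma 4 26B). A small sketch, using the rounded figures from the post, so results can differ from the quoted percentages by about a point; `savings_pct` is a hypothetical helper.

```python
def savings_pct(local_cost, hosted_cost):
    """How much cheaper local is than hosted, as a percentage of the hosted price."""
    return (hosted_cost - local_cost) / hosted_cost * 100


# $ per million tokens, as quoted in the post (rounded).
OPENROUTER_31B = 0.16
OPENROUTER_26B = 0.10
local_31b = {3: 0.15, 5: 0.14, 7: 0.13}
local_26b = {3: 0.040, 5: 0.038, 7: 0.035}

for years, cost in local_31b.items():
    print(f"31B, {years} years: {savings_pct(cost, OPENROUTER_31B):.1f}% cheaper")

for years, cost in local_26b.items():
    ratio = OPENROUTER_26B / cost
    print(f"26B, {years} years: {savings_pct(cost, OPENROUTER_26B):.1f}% cheaper ({ratio:.2f}x)")
```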

This is not meant as an attack on the original analysis - I am sure the synthetic bench I used has a few holes, plus buying price/residual value varies a fair bit. Plus, I don't think anybody will run their MBP for inference for 5 years straight. But with worsening GPU supply and the inevitable price/access squeeze, I think local LLMs have a huge role to play. And this is on top of the privacy benefits. A misperceived price differential should not be the reason that slows down adoption.

jmalicki•9m ago

The tweet does not make clear what the power cost assumptions are. That is wildly variable and important! For some people local may be cheaper, perhaps not for others.

rohansood15•5m ago

I used the same assumptions as the original HN post https://news.ycombinator.com/item?id=48168198

dnnddidiej•13m ago

As you would expect? It is also cheaper than EC2 for general compute.

rohansood15•11m ago

The title auto-corrected, my post was 'less' not 'more'.

seltzered_•9m ago

Similar titled discussion from a day ago: https://news.ycombinator.com/item?id=48168198
