Apple Silicon costs more than OpenRouter

https://www.williamangel.net/blog/2026/05/17/offline-llm-energy-use.html

57•datadrivenangel•1h ago

Comments

synthos•58m ago

How much does your data privacy cost?

datadrivenangel•35m ago

As stated in the analysis, thousands of dollars. That said, the smart thing to do is target smaller models (few billion parameters) and then use larger models for non-privacy tasks.

SecretDreams•57m ago

Will this cost structure always be this way and are there other benefits to not running your LLM on the cloud?

E.g.

Privacy

Uptime

Future cost structure controls

This is a field that has moved very quickly. And it has moved in a direction to try to trap users into certain habits. But these habits might not best align with what best benefits end users today or some time in the future.

SpyCoder77•56m ago

Open router doesn't cost money per say, it depends on the providers pricing

mnahkies•51m ago

They do take a cut of 5.5%, (as they should)

moritzwarhier•41m ago

> OpenRouter has Gemma4 31b at ~38-50 cents per million tokens. This means that on the optimistic side (50 watts, 40 tokens per second, and 10 years) the pro max is as cheap as openrouter. On the pessimistic side (100 watts and 3 years at 10 tokens per second) the pro max is 10x the cost. I think ~3x the cost per million tokens is likely the right number for local inference on the pro max from an accounting perspective.

Apart from that, like detailed in the the article, pricing for local compute also depends on electricity prices.

By the way, I don't want to snark about it, my English is not very good, but it's "per se", not "per say". Just commenting on this petty thing because it seems to be a common misspelling, and it always trips me up a bit. Makes me wonder about another supposed meaning like "from hearsay".

applfanboysbgon•52m ago

Unless I'm misunderstanding, this is counting the entire laptop in the cost of generating tokens. The calculation seems to omit that, in addition to receiving LLM output, you have also received a laptop in exchange for your money. If you intend to put this machine in a dark corner and run it solely as a token-munching server, a laptop would be an exceptionally poor choice of technology for this purpose. But if you intend to use the laptop as a laptop, having a laptop is a pretty big benefit over not having a laptop.

You also get the benefit of privacy, freedom from censorship, and control over the model used (i.e. it will not be rugpulled on you in three months after you've built a workflow around a specific model's idiosyncrasies).

andai•43m ago

Yeah, a better metric might be, the difference in cost between the laptop you need to run local models, and the laptop you would have bought anyway.

dist-epoch•27m ago

> control over the model used

but you lose access to the most capable models, you can run only the small ones

an0malous•51m ago

OpenRouter and other LLM platforms are being subsidized by VC investment to less than it costs them to run inference, the MacBook Pro is not

Kwpolska•35m ago

When the AI bubble inevitably pops, the author will find a new way to skew results in favor of cloud LLMs. Like including the price of a desk and a chair in the local token cost.

datadrivenangel•34m ago

I really wanted the laptop to look better cost-wise, but it doesn't.

maho•50m ago

The author only compared output token costs -- but for typical agentic workloads, input tokens dominate the costs by a large margin. Running inference locally, input tokens are, to first order, free. (They only generate implicit costs through higher time-to-first-token, higher power use, and lower token output speed).

bilekas•50m ago

I don't hear people debating which is cheaper, local or cloud run models. The conversation, at least what I hear, is a lot of the time users are not utilizing an awful lot of tickets all the time, those providers will be paid if you never use them. If 80% - 90% of the work I and my team are doing with Ai is grunt work, write tests for this, implement a FFT here, write the dB query for X. Nothing exhausting. Those who are using AI for whole cloth "vibe coded" applications and services are definitely better suited to cloud. If a work laptop can run my local models and get my works needed performance for development, why wouldn't I as a company prefer that?

Add to that the privacy improvements and data protection and potentially further specific inferance if needed it's a no brainer.

Again, Ai is a tool, and the right tool for the job, I would wager with no evidence looked up, is that the majority of Devs would be happy with 10-30 per second locally.

regexorcist•50m ago

I simply can't go back to cloud AI. Privacy and full control are more important to me than speed and SOTA models.

xyzzy123•4m ago

Also predictability, resilience, sovereignty. I'm not worried about other people's outages, that unexpected demand will impact me at an inconvenient time, that someone's watering down my model, that my costs will change unpredictably or that some unforseen error will lead to a huge bill.

JSR_FDED•49m ago

Wouldn’t a Mac Mini be a better comparison?

sgt•39m ago

Yes, or Mac Studio. Laptops with screens aren't made to run 24/7 heavy workloads.

bastawhiz•45m ago

This isn't a good analysis, and it's because it keeps rounding everything up. He rounds up the cost of electricity by 10%. He has a range of power use, takes the high end (which is 2x the low end) and multiplies it by the inflated electricity cost.

But then they talk about using a newly purchased Mac to do the inference, running at full capacity, 24/7. Why would you do that? Apple silicon is fast but the author points out: you're only getting 10-40 tokens per second. It's not bad, but it's not meant for this!

It's comparing apples to oranges. Yeah, data centers don't pay residential electricity rates. Data centers use chips that are power efficient. Data centers use chips that aren't designed to be a Mac.

Apple silicon works out pretty good if you're not burning tokens 24/7/365 and you're not buying hardware specifically to do it. I use my Mac Studio a few times a week for things that I need it for, but I can run ollama on it over the tailnet "for free". The economics work when I'm not trying to make my Mac Studio behave like a H100 cluster with liquid cooling. Which should come as no surprise to anyone: more tokens per watt on hardware that's multi tenant with cheap electricity will pretty much always win.

datadrivenangel•31m ago

Rounding everything down in the most optimistic setting got me to $0.40 per million tokens, and openrouter has the same model at $.38/mtok.

650REDHAIR•17m ago

I’ll keep my data local over a $.02/mtok difference.

dist-epoch•29m ago

using it 24/7 brings the average cost down, not up.

the less you use local LLM, the less sense it makes since you paid a lot for hardware you don't use

nu11ptr•44m ago

"Accelerated depreciation (if any) from shortening the lifespan of the device will be more expensive than the electricity"

Shortening the lifespan?

Der_Einzige•15m ago

The amount of FUD and notion that hardware depreciates in this manner is widely held. I blame Michael Burry of the Big Short who is perpetuating these lies to the investor community today.

There's a bunch of retro hardware which should make people pause and realize they're stupid to assume hardware slows down on average even 5% 20 years later (it's probably closer to 2% and I'm being generous).

HVAC/power delivery and generation are the major factors, and if you didn't skimp/get defective parts for this and replace failed moving parts (usually fans), your hardware is basically the same 20 years down the line as it was today.

Also using LLMs locally doesn't even induce sustained 100% GPU usage over significant periods of time for most real (agentic coding in OpenCode) use-cases.

michaelbuckbee•41m ago

Slightly different slice into this a very similar situation (local vs OpenRouter AI inference).

But in _every_ metric other than privacy it was better to run via OpenRouter than a local model, and not by a small amount.

Direct link to the comparison charts:

https://sendcheckit.com/blog/ai-powered-subject-line-alterna...

Havoc•40m ago

I like that the numbers were crunched, but the answer to these is always a bit of a foregone conclusion.

* Industrial power pricing

* Wholesale hardware pricing

* Utilization density of shared API

means API always wins a cost shootout.

Privacy & tinkering is cool too though

panny•39m ago

Your laptop AI costs too much? Speculative investors can help!

freakynit•31m ago

So I did the India-specific analysis for a tier-3 city. Here, electricity costs 1/3rd of the US version, and you also get solar subsidy up to a certain amount.

https://shorturl.at/q6gRE

tldr;

Hardware deprecation costs are the major factor.

But, if we assume ZERO hardware deprecation (not realistic), then local inference becomes super cheap.. roughly, 90%+ cheaper.

Third case: the break-even happens only if we can get at the very very very least, 8.7 years of useful hardware life. A more realistic number, however, when working 8 hrs/day and not of 24 hrs/day, is around 25 years.

So, for now, local inference is preferable if you deeply care about privacy. From cost perspective, it's still not there.

newsclues•19m ago

Local isn’t (just) about cost, it’s control and trust.

Der_Einzige•18m ago

OpenRouter doesn't expose all the LLM sampling parameters/research that llamacpp, vllm, sglang, et al expose (so no high temperature/highly diverse outputs). Also OpenRouter doesn't let you use steering vectors or LoRA or other personalization techniques per-request. Also no true guarantees of ZDR/privacy/data sovereignty.

Oh, and the author didn't mention at all anything related to inference optimization, so no idea if they even know about or enabled things like speculative decoding, optimized attention backends, quantization, etc.

At least AI slop would have hit on far more of the things I listed above. This is worse-than-AI.

brisket_bronson•13m ago

> Let's round up to $0.20 per kWh.

Next paragraph

> At ~50-100 watts and $0.18/kWh that's $0.009 or $0.018 per hour. $0.02 per hour. $0.48 cents per day for the electricity to be running inference at 100%.

lol

Jayakumark•4m ago

OP is comparing against Gemma everywhere but concludes paying Anthropic make more sense. Anthropic is $15 per million output token which is 30-35x more expensive even in openrouter .

This is like comparing e-bike at home with e-bike rental and concluding therefore we need to rent Toyota since it can go at similar speeds. Getting tired of bad posts getting much attention .

Native all the way, until you need text

I don't think AI will make your processes go faster

Apple Silicon costs more than OpenRouter

Every AI Subscription Is a Ticking Time Bomb for Enterprise

Zerostack – A Unix-inspired coding agent written in pure Rust

Prolog Basics Explained with Pokémon

Mozilla to UK regulators: VPNs are essential privacy and security tools

A nicer voltmeter clock

Colossus: The Forbin Project

Hosting a website on an 8-bit microcontroller

Moving away from Tailwind, and learning to structure my CSS

OpenAI and Government of Malta partner to roll out ChatGPT Plus to all citizens

How Diamonds Are Made

Playing Atari ST Music on the Amiga with Zero CPU

SANA-WM, a 2.6B open-source world model for 1-minute 720p video

Mado: Fast Markdown linter written in Rust

Illusions of understanding in the sciences

Twilight of the Velocipede: Typesetting Races Before the Age of Linotype

We've made the world too complicated

Roman Letters

The Third Hard Problem

Accelerando (2005)

Frontier AI has broken the open CTF format

Why did Clovis toolmakers choose difficult quartz crystal?

Halt and Catch Fire

MCP Hello Page

Unknowable Math Can Help Hide Secrets

δ-mem: Efficient Online Memory for Large Language Models

A molecule with half-Möbius topology

Self-Distillation Enables Continual Learning [pdf]

Apple Silicon costs more than OpenRouter

Comments

Native all the way, until you need text

I don't think AI will make your processes go faster

Apple Silicon costs more than OpenRouter

Every AI Subscription Is a Ticking Time Bomb for Enterprise

Zerostack – A Unix-inspired coding agent written in pure Rust

Prolog Basics Explained with Pokémon

Mozilla to UK regulators: VPNs are essential privacy and security tools

A nicer voltmeter clock

Colossus: The Forbin Project

Hosting a website on an 8-bit microcontroller

Moving away from Tailwind, and learning to structure my CSS

OpenAI and Government of Malta partner to roll out ChatGPT Plus to all citizens

How Diamonds Are Made

Playing Atari ST Music on the Amiga with Zero CPU

SANA-WM, a 2.6B open-source world model for 1-minute 720p video

Mado: Fast Markdown linter written in Rust

Illusions of understanding in the sciences

Twilight of the Velocipede: Typesetting Races Before the Age of Linotype

We've made the world too complicated

Roman Letters

The Third Hard Problem

Accelerando (2005)

Frontier AI has broken the open CTF format

Why did Clovis toolmakers choose difficult quartz crystal?

Halt and Catch Fire

MCP Hello Page

Unknowable Math Can Help Hide Secrets

δ-mem: Efficient Online Memory for Large Language Models

A molecule with half-Möbius topology

Self-Distillation Enables Continual Learning [pdf]