E.g.:
* Privacy
* Uptime
* Future cost structure controls
This is a field that has moved very quickly. And it has moved in a direction that tries to trap users into certain habits. But these habits might not align with what best benefits end users today or some time in the future.
Apart from that, as detailed in the article, pricing for local compute also depends on electricity prices.
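To make that concrete, here's a back-of-envelope sketch; the wattage, throughput, and $/kWh figures are assumptions for illustration, not measurements:

    # Electricity cost per million tokens for local inference.
    watts = 150          # assumed average draw under inference load
    tok_per_s = 25       # assumed local throughput (tokens/second)
    price_kwh = 0.30     # assumed residential electricity price ($/kWh)

    seconds_per_mtok = 1_000_000 / tok_per_s
    kwh_per_mtok = watts / 1000 * seconds_per_mtok / 3600
    print(f"electricity: ${kwh_per_mtok * price_kwh:.2f} per million tokens")
    # ~1.67 kWh/Mtok -> ~$0.50/Mtok under these assumptions

Double the electricity price and the per-token cost doubles with it, which is why the break-even math shifts with local rates.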
By the way, I don't want to snark, and my English is not very good, but it's "per se", not "per say". I'm only commenting on this petty thing because it seems to be a common misspelling, and it always trips me up a bit. It makes me wonder whether some other meaning is intended, like "from hearsay".
You also get the benefit of privacy, freedom from censorship, and control over the model used (i.e. it will not be rugpulled on you in three months after you've built a workflow around a specific model's idiosyncrasies).
Add to that the privacy improvements and data protection, plus the option of running further task-specific inference if needed, and it's a no-brainer.
Again, AI is a tool, and it's about the right tool for the job. I would wager, with no evidence looked up, that the majority of devs would be happy with 10-30 tokens per second locally.
But then they talk about using a newly purchased Mac to do the inference, running at full capacity, 24/7. Why would you do that? Apple silicon is fast, but as the author points out, you're only getting 10-40 tokens per second. That's not bad, but it's not meant for this!
It's comparing apples to oranges. Yeah, data centers don't pay residential electricity rates. Data centers use chips that are power efficient. Data centers use chips that aren't designed to be a Mac.
Apple silicon works out pretty well if you're not burning tokens 24/7/365 and you didn't buy the hardware specifically to do it. I use my Mac Studio a few times a week for things I need it for, but I can run ollama on it over the tailnet "for free". The economics work when I'm not trying to make my Mac Studio behave like an H100 cluster with liquid cooling. Which should come as no surprise to anyone: more tokens per watt on multi-tenant hardware with cheap electricity will pretty much always win.
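For anyone curious what that setup looks like, a minimal sketch, assuming a hypothetical Tailscale hostname ("mac-studio") and whichever model you've pulled; it hits ollama's standard REST endpoint on port 11434:

    # Query a remote ollama instance over the tailnet.
    import requests

    resp = requests.post(
        "http://mac-studio:11434/api/generate",
        json={"model": "llama3", "prompt": "Summarize this diff.",
              "stream": False},  # stream=False returns one JSON object
        timeout=120,
    )
    print(resp.json()["response"])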
The less you use a local LLM, the less sense it makes, since you paid a lot for hardware you don't use.
Shortening the lifespan?
But in _every_ metric other than privacy it was better to run via OpenRouter than a local model, and not by a small amount.
Direct link to the comparison charts:
https://sendcheckit.com/blog/ai-powered-subject-line-alterna...
* Industrial power pricing
* Wholesale hardware pricing
* Utilization density of shared API
mean the API always wins a cost shootout (toy numbers in the sketch below).
Privacy & tinkering are cool too, though.
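A toy shootout instantiating those three bullets; every input is an illustrative assumption, so treat the outputs as shape, not fact:

    # Amortized cost per million tokens: home box vs shared datacenter gear.
    def cost_per_mtok(hw_cost, life_years, utilization, tok_per_s, watts, price_kwh):
        seconds_used = life_years * 365 * 24 * 3600 * utilization
        mtok = tok_per_s * seconds_used / 1e6
        capex = hw_cost / mtok                              # amortized hardware
        power = watts / 1000 * seconds_used / 3600 * price_kwh / mtok
        return capex + power

    home = cost_per_mtok(5000, 5, 0.05, 25, 150, 0.30)     # idle most of the day
    dc = cost_per_mtok(30000, 5, 0.80, 1500, 700, 0.07)    # dense, shared, wholesale
    print(f"home ${home:.2f}/Mtok vs datacenter ${dc:.2f}/Mtok")

Utilization density does most of the work here: the home box pays for hardware it rarely uses.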
tl;dr:
Hardware depreciation costs are the major factor.
But if we assume ZERO hardware depreciation (not realistic), then local inference becomes super cheap: roughly 90%+ cheaper.
Third case: break-even happens only if we get, at the very least, 8.7 years of useful hardware life. A more realistic number, assuming 8 hrs/day of use rather than 24, is around 25 years (see the sketch below).
So, for now, local inference is preferable if you deeply care about privacy. From a cost perspective, it's still not there.
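A minimal sketch of the break-even arithmetic; every input is an assumed placeholder, so the outputs won't reproduce the exact 8.7/25-year figures, but they show the key property that break-even scales inversely with daily utilization:

    # Break-even years: hardware cost divided by yearly API savings.
    hw_cost = 5000.0          # assumed hardware cost ($)
    tok_per_s = 25            # assumed local throughput
    api_price_mtok = 2.0      # assumed API price ($ per million tokens)
    elec_cost_mtok = 0.5      # assumed electricity cost ($ per million tokens)

    def breakeven_years(hours_per_day):
        mtok_per_year = tok_per_s * 3600 * hours_per_day * 365 / 1e6
        savings_per_year = mtok_per_year * (api_price_mtok - elec_cost_mtok)
        return hw_cost / savings_per_year

    for h in (24, 8):
        print(f"{h:>2} h/day -> break-even in {breakeven_years(h):.1f} years")

Cut usage from 24 h/day to 8 and break-even stretches 3x, which roughly matches the shape of the 8.7-vs-25-year split above.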