AI API Prices are 90% Subsidized

https://tinyml.substack.com/p/the-unsustainable-economics-of-llm

27•csoham•7mo ago

Comments

PaulHoule•7mo ago

When the AI hype train left the station I said "we don't understand how these things work at all and they're going to get much cheaper to run" and that turned out to be... true.

Vendors of legacy models like ChatGPT-4 already have to subsidize inference to keep up with new entrants built on a better foundation. Inference costs can likely be brought down by another factor of ten or so, so of course you have to subsidize these by 90% to price at where the industry will be in 2-3 years.
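The arithmetic in this comment can be made explicit. This is a minimal sketch, assuming "90% subsidized" means price = 10% of cost and that cost falls by the same factor every year; the function names are hypothetical:

```python
def gap_factor(subsidy: float) -> float:
    """How many times cost must fall before today's price covers it.

    A subsidy fraction s means price = (1 - s) * cost, so cost must fall by 1 / (1 - s).
    """
    return 1.0 / (1.0 - subsidy)


def required_annual_factor(subsidy: float, years: float) -> float:
    """Constant yearly cost-reduction factor needed to close the gap within `years`.

    Assumption: cost declines geometrically (same multiplicative factor each year).
    """
    return gap_factor(subsidy) ** (1.0 / years)


for years in (2, 3):
    print(years, round(required_annual_factor(0.9, years), 2))
```

Under these assumptions, closing a 10x gap in 2 to 3 years requires cost to fall roughly 2.2x to 3.2x per year.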

revskill•7mo ago

No lol. The quality is mostly bad. Basically you need to write a prompt as detailed as a novel for the LLM to understand. At that price, we want real AI that has common sense, not just an autocompletion tool.

Stop advertising LLMs as AI; sell them as superior copy & paste engines instead.

What's worst about LLMs is that the more you talk with them, the worse they get, to the point of breaking.

mrtksn•7mo ago

Subsidized is probably not the correct word here; it's probably more like a loss leader in a land-grab race.

It's like the early days of the internet when everything was amazing and all the people who put money into this thing were "losing" their money.

It's going to be like this until monopolization sets in and the moat becomes defensible, and then they will enshittify the crap out of it and make their money back 10x, 100x, etc.

apsec112•7mo ago

This ignores batching (token generation is much more efficient in batches), and I strongly suspect the article was itself written by AI, given the heavy use of bullets.

biophysboy•7mo ago

is it common for adjacent tokens to use the same weights in a memory cache?

twoodfin•7mo ago

The “X—not Y” pattern is also a dead giveaway.

GaggiX•7mo ago

This calculation doesn't account for batches, it makes no sense.

BriggyDwiggs42•7mo ago

On average how much does batching bring costs down?

GaggiX•7mo ago

By a lot. It balances the compute and memory-bandwidth bottlenecks; with continuous batching you can easily see 10x, 20x or more.
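Why batching helps so much can be seen with a simple roofline estimate. This is a rough sketch under stated assumptions, not a model of any particular serving stack: each decode step streams all weights from memory once and that read is shared by every sequence in the batch, compute is about 2 FLOPs per parameter per token, and KV-cache traffic is ignored. The hardware and model numbers below are hypothetical and purely illustrative.

```python
def decode_tokens_per_sec(params: float, bytes_per_param: float,
                          mem_bw: float, peak_flops: float, batch: int) -> float:
    """Roofline estimate of decode throughput for `batch` concurrent sequences.

    Assumptions: weights are read once per step regardless of batch size,
    compute scales linearly with batch (~2 FLOPs per parameter per token),
    and KV-cache reads and scheduling overhead are ignored.
    """
    t_memory = params * bytes_per_param / mem_bw   # seconds to stream the weights
    t_compute = batch * 2 * params / peak_flops    # seconds of arithmetic for the batch
    step_time = max(t_memory, t_compute)           # whichever bottleneck dominates
    return batch / step_time                       # tokens produced per second


# Hypothetical setup: 70B parameters in 16-bit weights (2 bytes each),
# 3 TB/s memory bandwidth, 1 PFLOP/s peak compute.
for b in (1, 8, 64, 512):
    print(b, round(decode_tokens_per_sec(70e9, 2, 3e12, 1e15, b)))
```

In this idealized model throughput grows linearly with batch size until the step becomes compute-bound, after which it plateaus. Real gains are smaller because KV-cache reads grow with batch size and context length.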

BriggyDwiggs42•7mo ago

Wow! Thanks.

impure•7mo ago

I’ve been playing around with Gemma E4B and have gotten really good results. That’s a model you can run on a phone. So although prices have been going up recently, I suspect they will start to fall again soon.

python273•7mo ago

A much better article on token prices: https://www.tensoreconomics.com/p/llm-inference-economics-fr...

There's not much incentive for OpenRouter providers, for example, to subsidize prices, and their prices are much lower than the article's $6.37/M estimate.

https://openrouter.ai/meta-llama/llama-3.3-70b-instruct

avg $0.37/M input tokens, $0.73/M output tokens (21 providers)
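To see what that gap means per request, here is a small sketch that prices one call from separate input and output rates. The 2,000-input / 500-output token request is a hypothetical example, and treating the article's $6.37/M as a single blended rate for all tokens is an assumption:

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Cost of one request given separate per-million-token prices."""
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000


# Averages quoted above for llama-3.3-70b-instruct on OpenRouter.
openrouter = request_cost_usd(2_000, 500, 0.37, 0.73)
# Assumption: the article's $6.37/M applies uniformly to input and output tokens.
article = request_cost_usd(2_000, 500, 6.37, 6.37)
print(f"{openrouter:.6f} {article:.6f} {article / openrouter:.1f}x")
```

Separate input and output rates matter because output tokens are generated one decode step at a time, while input tokens are processed in parallel during prefill.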

Llama is not even a good example, as more recent models are better optimized, using Mixture of Experts (MoE) and KV cache compression.
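A rough sketch of why Mixture of Experts lowers per-token cost: each token is routed to only `top_k` of the experts, so per-token compute scales with the *active* parameters rather than the total. The configuration below is hypothetical and does not describe any specific released model:

```python
def moe_active_params(shared: float, n_experts: int,
                      expert_params: float, top_k: int) -> tuple[float, float]:
    """Return (total, active-per-token) parameter counts for a simple MoE model.

    `shared` covers attention, embeddings and any dense layers every token uses;
    `expert_params` is the size of one expert summed across all MoE layers.
    Assumption: each token is routed to exactly `top_k` experts.
    """
    total = shared + n_experts * expert_params
    active = shared + top_k * expert_params
    return total, active


# Hypothetical configuration for illustration only.
total, active = moe_active_params(shared=5e9, n_experts=16, expert_params=4e9, top_k=2)
print(f"total {total / 1e9:.0f}B, active {active / 1e9:.0f}B, "
      f"{active / total:.0%} touched per token")
```

All experts still have to be resident in memory, and in a large batch different tokens hit different experts, so the memory-bandwidth savings shrink as batch size grows even though the per-token FLOPs stay low.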

daft_pink•7mo ago

Also, it ignores the fact that they will keep optimizing inference and making it more efficient, Moore's-law style, so everyone is basically assuming that the price will come down over time.
