I would think companies would set the usage low and increase it with capacity rather than subsidizing the power users and going into the red. Maybe my strategy wouldn't be aggressive enough to capture the market, which I'm sure the major AI companies are trying to do.
If big AI does crash out, it would be an absolute gold mine for local LLMs. Cheap, efficient Nvidia GPUs and RAM that can run the best local models already available would be a real boon.
PS - And as great as Qwen3.6-27B is, how large you can scale it (i.e. how big of a context/project) is mostly hardware constrained.
In the short term, resource management can affect prices and allocation, especially when it's being figured out on the go.
A permanent position that technology is fixed assumes the technology will not improve.
This means the software won't get more efficient with its use of hardware, or the hardware won't become more power efficient, etc. Open/self-hosted models are a real-world example where efficiency gains are happening.
Thinking technology won't become more efficient is like imagining that cell phones will still run with the poor battery life of the 1990s.
- What are the margins of Anthropic over their API pricing? Without this, all we're saying is the API is more expensive for heavy usage
- How have their price margins changed over time? I imagine built into their commercial model is the expectation for inference to get cheaper over time
- They have tighter usage windows than they used to, now with both 5-hour and weekly limits, and they seem to experiment with usage limits quite often. This probably affects users' average utilisation. Do they have any other levers they can pull here? E.g. how would changing to an 8-hour window affect it, or limiting certain models to API-only usage based on capacity, like Fable?
I know there is a certain level of subsidised usage built into their subscriptions, the VC-funded company playbook, but I don't think anyone from the outside knows for sure how much it is and I imagine it's lower than most people think, and reducing over time.
A model's capability is a function of model size, and you can only push a small overspecialized "idiot savant" model so far before its crippling size starts to bite you.
You can make a model like Composer 2.5. But Mythos 5 will beat it on capability, both at coding and at everything else. And the world is always hungry for more capabilities.
If you're running high on agentic AI and low on human oversight, paying 2x to go from 5% faults to 2% faults is a good deal.
Oh wait.
The good news is that, after 40 years of Democrats not prioritizing opposition to the Reagan/Bork CWS, the issue is back on the ballots. Lina Khan picked up the torch that Louis Brandeis picked up a century ago (the arguments are identical, in this aspect time is a circle). Unfortunately, her team lost the last election, but just remember for the next one: this issue is now on the ballots.
Instead, this dumping is exporting "thinking" to destroy humans' innate thoughts, get them hooked, then rugpull for 3x the cost. Because just over 1 year of LLMs takes a developer who could reverse engineer a thing and leaves them needing help to construct a for loop.
That's why I run my own LLMs. Hard to rugpull what you own and control. And that's also why I focus on questions not of "do this", but "explain this". I seek to use LLMs to learn more effectively, so I end up needing them less and less.
If I lose it, it will not be the end of the world. I'll probably start digging into local models.
I suspect there are many like me. Far more than there are totally dependent users. I also suspect that the AI economy is some sort of "whale economy", where a minority is footing the bill by paying outrageous amounts to Anthropic/OpenAI/Google.
If I were in business, the idea that my employees would lose skills and be dependent on a third party that controls both price and quality with zero feedback would be insane.
We will have better and cheaper intelligence in the future than we have now. This is not Uber. Inference is profitable for these companies. Looking at API pricing and assuming that reflects the cost basis is dumb.
It costs less for OpenAI to serve GPT-5.5 than it did to serve GPT-4. An H100 is more valuable today than it was five years ago because it can serve more intelligence per token.
Jevons paradox and short-term crunches may cause some swings, but the value of a token keeps increasing while the average token price decreases.
Chinese models are already a fraction of the cost, and we will have a Mythos/Fable-level open-source model by the end of the year. There is no “gotcha” where every AI company rugs you in unison.
Stop trying to figure out how this screws you. Start figuring out what cool shit you can build with it.
It's a different story for subscriptions. According to my rough computation (N=1), a Claude Max 20x at $200 gives you access to around $8k worth of tokens per month – but they don't cost Anthropic $8k! – and there I think they'd make a loss on every token maxxer which may or may not be compensated by subscriptions that are not used. But that's not the end of the subscription story.
Once you are "enterprise" you pay for token use and there is no way around it: Anthropic does it and so does OpenAI. The subscription is the gateway drug to token maxxing. When people are hired in an Enterprise job, they'll come with their habit of using AI for all and any task.
All to say that: yes, AI labs are bleeding money but on everything else – datacenters, training models, talent,...
tonymelony•1h ago
dataflow•1h ago
simonw•57m ago
Consider a model that costs $100m to train.
If the vendor then prices it such that each inference token has a margin of 10% over the variable costs to serve (power + server costs), whether or not they cover their costs is based entirely on how many tokens they can sell.
If they sell less than $1bn of tokens, they lose money - the break-even point is 10 x $100m = $1bn.
If they sell $10bn of tokens they make a ton of money.
This also means you can't credibly calculate how much of the fixed training expense is covered by your token spend, because until the model is retired and you can account for how much inference it ran you don't know what percentage of the training cost each sold token was responsible for.
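The break-even arithmetic above can be sketched in a few lines of Python. This is only an illustration, not anyone's real accounting: the function names are made up for this example, and it assumes the 10% margin is a share of token revenue, which is the reading that gives the $1bn figure. If the 10% is instead a markup over variable costs, the break-even revenue comes out slightly higher, at about $1.1bn.

```python
def margin_from_markup(markup_over_cost: float) -> float:
    """Convert a markup over variable cost into a margin as a share of revenue."""
    return markup_over_cost / (1.0 + markup_over_cost)


def break_even_revenue(training_cost: float, margin_on_revenue: float) -> float:
    """Token revenue needed before the per-token margin repays the fixed training cost."""
    if margin_on_revenue <= 0:
        raise ValueError("margin must be positive to ever break even")
    return training_cost / margin_on_revenue


def net_profit(token_revenue: float, training_cost: float, margin_on_revenue: float) -> float:
    """Margin earned on tokens sold, minus the fixed training cost."""
    return token_revenue * margin_on_revenue - training_cost


if __name__ == "__main__":
    training_cost = 100e6  # the $100m training run from the comment
    margin = 0.10          # assumption: 10% of revenue is margin

    print(f"break-even revenue: ${break_even_revenue(training_cost, margin):,.0f}")
    for revenue in (0.5e9, 1e9, 10e9):
        print(f"revenue ${revenue:,.0f} -> net ${net_profit(revenue, training_cost, margin):,.0f}")

    # Alternative reading: 10% markup over variable costs rather than 10% of revenue.
    alt = break_even_revenue(training_cost, margin_from_markup(0.10))
    print(f"break-even with 10% markup over cost: ${alt:,.0f}")
```

The same functions show why the outcome hinges on volume: the training cost is fixed, so every dollar of revenue past the break-even point contributes its full margin to profit.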
frotaur•53m ago
And if capabilities plateau such that training the next one is useless, then the margins will drop fast due to competition.
ACCount37•40m ago
Driven mostly by just how much inference they sell nowadays - but also by things like base model reuse.
vb-8448•41m ago
You also have to include failed training runs and experiments in the math.
There are no official figures, but given how fast new models are rolled out, I wouldn't be surprised if neither Anthropic nor OAI manages to cover the full cost of its models.
KumaBear•1h ago
6stringmerc•1h ago
In short, citation needed or shens bruh.
ACCount37•55m ago
All we have actual evidence of is: some users use enough AI that the subscription is sold at a loss to them (up to degenerate cases: usage maxed out at all times), if billed by API metrics, while some other users are, by the same metrics, profitable (down to degenerate cases: a forgotten subscription with $20 a month and 0 usage).
We don't know how API prices relate to costs - we only have estimates. And we certainly don't know how much inference an average subscription user consumes.
If you have some sort of information that would decisively prove that the aggregate is "AI company N is losing money on subscriptions", then, show it.
Or is it you who's blinded by faith? Like some sort of AI bubble cultist? The bubble is real, you just have to believe in it?
BosunoB•15m ago
I imagine we'll know in a few months when these companies go public.
ktzar•55m ago
The figure mentioned in the video is not far off
moralestapia•53m ago
Grab gpt-oss-120b, run it continuously and see how far 20 dollars worth of that gets you. People definitely use much more than that in a month, not just power users but regular ones, and they're using models that are more expensive to run (plus the "cloud" markup).
anthonypasq•9m ago
Here's some napkin math.
gpt-oss-120b in/out pricing is $0.039/$0.18 per million tokens on OpenRouter. Here are some assumptions:
1. the ratio of input/output is about 25/1 (coding is mostly grep and fairly low output)
2. you are getting 75% prompt cache reads
Case B: 50% prompt caching discount (standard provider rate), at 75% prompt caching
Total tokens obtained: 658,749,010 (approx. 659 million tokens)
Input: ~633 mil tokens
~475 mil cached at 50% input pricing = ~$9.25
~158 mil uncached = ~$6.15
Output: ~25 mil tokens (~$4.50)
This doesn't even account for profit margins on inference providers, or the fact that OpenAI probably has a much more efficient inference stack.
It's really hard to know what these companies are actually paying, but from everything I'm hearing, people are reporting that API inference pricing carries a 50% margin.
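For anyone who wants to poke at the assumptions, the napkin math above can be reproduced roughly like this. It is a sketch, not a pricing tool: the function name is invented for the example, the $20 budget is assumed from the parent comment, and the prices, the 25/1 ratio, the 75% cache-read rate and the 50% cache discount are the figures quoted in this comment, not verified provider rates.

```python
INPUT_PRICE = 0.039   # USD per million input tokens (figure quoted in the comment)
OUTPUT_PRICE = 0.18   # USD per million output tokens (figure quoted in the comment)


def tokens_for_budget(budget_usd: float,
                      in_out_ratio: float = 25.0,
                      cache_hit_rate: float = 0.75,
                      cache_discount: float = 0.50) -> dict:
    """Estimate how many million tokens a budget buys under the stated assumptions."""
    # Blended input price: cached reads are billed at a discount, the rest at full price.
    cached_price = INPUT_PRICE * (1.0 - cache_discount)
    blended_input_price = cache_hit_rate * cached_price + (1.0 - cache_hit_rate) * INPUT_PRICE

    # Each million output tokens drags along in_out_ratio million input tokens.
    cost_per_m_output = in_out_ratio * blended_input_price + OUTPUT_PRICE
    output_m = budget_usd / cost_per_m_output
    input_m = in_out_ratio * output_m
    cached_m = cache_hit_rate * input_m
    uncached_m = input_m - cached_m

    return {
        "input_m": input_m,
        "output_m": output_m,
        "total_m": input_m + output_m,
        "cached_cost": cached_m * cached_price,
        "uncached_cost": uncached_m * INPUT_PRICE,
        "output_cost": output_m * OUTPUT_PRICE,
    }


if __name__ == "__main__":
    for key, value in tokens_for_budget(20.0).items():
        print(f"{key}: {value:,.2f}")
```

Changing `cache_hit_rate` or `in_out_ratio` shows how sensitive the token count is to those two guesses, which is most of the uncertainty in the estimate.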
Someone1234•25m ago
OpenAI, Anthropic, and Microsoft/Meta/Google are all at a net negative on AI (i.e. they're "demonstrably" losing money). So it is objectively true. If everyone is losing money, and nobody is profitable, then it is a demonstrable fact.
As far as I know, the only "AI" venture currently in the green is Nvidia, and they're selling shovels to gold miners.
BosunoB•22m ago