I don’t think they got it right: market share and usage grew faster than inference costs dropped. But inference costs will clearly fall, and these companies will eventually be very profitable.
The reality is that startups like this assume Moore's law will drop the cost over time, and arrange their business around where they expect costs to be, not where costs currently are.
It could also be that you give the market too much credit. People follow trends because in most cases that makes money; there is no deeper thought involved. Look at the financial crisis: totally irrational.
Inference costs for old models will drop, but costs at the frontier may stay the same if models continue to improve.
No guarantee that any wrapper for inference will be able to hold on to customers when they stop selling $1.00 for $0.50.
Thanks!
Price-performance though? The trend is clear: a given level of LLM capability keeps getting cheaper, and that trend is expected to hold. Improvements in architecture and training make LLMs more capability-dense, and advanced techniques make inference cheaper.
They haven't, though, on two fronts. First, the SOTA models have been pretty constantly priced, and everyone wants the SOTA models. Likely the only way costs drop is that models get so good people say, hey, I'm fine with a less useful answer (which is still good enough), and that seems, right now, like a bad bet.
And second, we use a lot more tokens now. No more pasting Q&A into a site; now people upload chunks of their codebases and would love to push more. More context, more thinking, more everything.
Here's an analogy you may understand:
Inference is getting so much cheaper that Cursor and Zed have had to raise prices.
Even if we take this as true, the point is that this is different from "the cost of inference isn't going down." It is going down; it's just that people want more performance, and are willing to pay for it. Spend going up is not the same as cost going up.
I don't disagree that there are a wide variety of things to talk about here, but that means it's extra important to get what you're talking about straight.
The cost of inference (i.e., the dollars that go to your LLM API provider) has increased and certainly appears set to continue increasing.
see also https://ethanding.substack.com/p/ai-subscriptions-get-short-...
This is the crux of it: when talking about "the cost of inference" for the purposes of the unit economics of the business, what's being discussed is not what they charge you. It's about their COGS.
That's not word games. It's about being clear about what's being talked about.
Increased prices are something that could be talked about! But that's a different thing. And what you're describing here is total spend, not individual prices going up or down. That's a third thing!
You can't come to agreement unless you agree on what's being discussed.
I think if these companies are gambling their future on COGS going down, that’s a gamble they’re going to lose.
It seems to me like models' capability scales logarithmically with size and wattage, making them the rare piece of software that can counteract Moore's Law. That doesn't seem like a way to make a trillion dollars.
Anthropic: "$2.66 billion on compute on an estimated $2.55 billion in revenue"
Cursor: "bills more than doubled from $6.2 million in May 2025 to $12.6 million in June 2025"
Click through if you want the analysis and caveats.
ARR could be a useful tool to help predict future revenue, but why not simply report actual revenue and suggest it might increase in the next year? I have found most articles to be unclear to the reader about what ARR actually represents.
The point of ARR is to give an up to date measure on a rapidly changing number. If you only report projected calendar year revenue, then on January 1 you switch from reporting 2025 annual revenue to 2026 projected revenue, a huge and confusing jump. Why not just report ARR every month? It's basically just a way of reporting monthly revenue — take the number you get and divide it by 12.
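The arithmetic described above, sketched with made-up numbers (not any real company's figures):

```python
# ARR annualizes the most recent month's revenue: ARR = monthly_revenue * 12.
monthly_revenue = 5_000_000  # hypothetical, dollars

arr = monthly_revenue * 12
assert arr == 60_000_000

# Dividing ARR by 12 recovers the monthly figure, so reporting ARR every
# month avoids the confusing January 1 jump between calendar-year projections.
implied_monthly = arr / 12
assert implied_monthly == monthly_revenue
```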
I am really skeptical that people are being bamboozled by this in some significant way. Zitron does far more confusing things with numbers in the name of critique.
Nobody considers a year from June to June because that would be misleading.
I am also willing to bet that the student dropoff is not pronounced. I'm thinking more of a business that sells beach umbrellas: they make a lot of sales in the summer months and then next to nothing in the winter months. Annualizing a summer month there would be dishonest.
so he got a leaked copy of their AWS bills?
Well I don't have to scratch my head any longer and wonder why Amazon hasn't jumped on the AI bandwagon with their own Gemini or whatever. They are sitting pretty and selling shovels and pickaxes to the AI fools. Not a bad strategy for them...
It might take out your 401k for a decade.
At a certain point you just expect it.
[1]: https://en.wikipedia.org/wiki/August_2011_stock_markets_fall
It's like fruit fly generations, not 20-30 year human cohort generations.
I would expect to see OpenAI, Anthropic, and a lot of the little tool wrappers to get taken out though, or at least acquired for pennies on the dollar when it bursts.
But like the last one, it's going to be us, the taxpayers, who are left holding the bag.
Everything else is expense.
There’s a lot of that sort of thing going on at the moment in the AI bubble.
When the music stops, a lot of people suddenly won't just sit on the ground but plunge into the depths of hell.
50% of people into coding agents are quite concerned about that last mile of difference from the frontier models, which they "can't afford to lose." My experience tells me otherwise: the difference is negligible once you have a good setup going and know how to tune your model + agent.
The other 50% don't give a damn; they just landed, or got locked, into some deal for a coding agent and are happy with what they got, so why change? These deals came from the big model providers and resellers first, so the Chinese models arrived late to the party, and with too little.
Running Chinese models (for coding) requires figuring out many things yourself. Are you running the model on your own hardware or through a provider? Are you paying by token or on a plan? Does the model pair well with your agent CLI/IDE of choice (Zed, Cline, Opencode, etc.)? Does it even work with your favorite tool? (Tool calling is very wobbly.) Is it fast (tps)? Is it reliable? How do you do "ultrathink" with a secondary model? How do you do "large context"? Does it include a cache, or are you going to eat through the plan in 1 hr/day? What context size are you getting? Does it include vision and web search, or do you have to get another provider/MCP for that? And, yeah, is it in a territory you can send your client's code to? A lot to grok.
Cerebras Coder Max is really cool if you want to hack your way through this, but they couldn't care less about your experience: no cache, no tool-endpoint fine-tuning, no plans or roadmap for updating models, increasing context windows, adding vision, or anything really. They just deleted some of the tools they had been recommending from their website (e.g., Cursor) as they got reports of things that stopped working.
I am not invested in anything except popcorn to watch it burst ;)
E.g.: What is the unit cost of serving a token? It is the cost of electricity plus the amortized cost of the GPU (GPUs would have been capex, but because of their fast depreciation rate, you can argue they should be opex). Given this cost structure, every SOTA lab (Google, Anthropic, and OpenAI) is profitable and actually has high margins of 50-60%.
With this margin and growth, the frontier labs can be profitable anytime they want to. But they are sacrificing profitability for growth (as they should).
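A back-of-the-envelope sketch of the claimed unit economics. Every number here is an assumption for illustration, not a figure from any lab's actual books:

```python
# Hypothetical per-token unit economics: unit cost = electricity + amortized GPU.
price_per_mtok = 3.00        # $ charged per million tokens (assumed)
electricity_per_mtok = 0.40  # $ of power per million tokens (assumed)
gpu_amort_per_mtok = 1.30 - 0.40  # $ of GPU depreciation per million tokens (assumed)

unit_cost = electricity_per_mtok + gpu_amort_per_mtok
margin = (price_per_mtok - unit_cost) / price_per_mtok
print(f"unit margin: {margin:.0%}")  # → unit margin: 57%, in the claimed 50-60% band
```

Whether the real margins land in that band depends entirely on the assumed inputs, which is exactly the disagreement in this thread.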
Where is Ed's analysis of this? Either he is disingenuous or clueless. Remember, people who voluntarily subscribe to Ed come wanting to hear what they already believe.
If he is level-headed, show me an Ed article that is positive about AI.
Why should those two things go together?
But I guess Ed Zitron has found his audience
Those people aren't exactly experts, or right most of the time, either.
"What is the unit cost of serving a Token? It is the cost of electricity + amortized cost of GPU (GPUs would have been Capex, but because of their fast depreciation rate, you can claim they should be Opex). Given this cost structure, every SOTA labs (Google, Anthropic and OpenAI) are profitable and actually have high unit margins of 50-60%."
High unit margins plus growth mean these labs can be profitable anytime they choose to.
All bubbles (dot com, housing, tech, crypto, etc) have a lot of losers and a few big winners.
That is less a reflection on the market of the bubble and more a reflection of the number, skill, and risk appetite of the prospectors.
Claiming that a single journalist's blog has the power to stop others from criticizing AI for different reasons is kind of absurd.
I have sat with these numbers for a great deal of time, and I can’t find any evidence that Anthropic has any path to profitability outside of aggressively increasing the prices on their customers to the point that its services will become untenable for consumers and enterprise customers alike.
This is where he misunderstands. Enterprise companies will absolutely pay 10x the cost for Claude. Meta and Apple are two large customers; you think they won't pay $500 a month per employee? $1,000 a month per employee? Neither of those is outrageous to imagine if it increases productivity 10%. I suppose that's the pessimistic-on-AI side. On the other hand, once you create God, little things like money are meaningless.
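The breakeven arithmetic behind that claim, with a made-up fully loaded salary (the $300k figure is an assumption, not a quoted number):

```python
# Hypothetical: does $1,000/month per seat pay off against a 10% productivity gain?
fully_loaded_cost_per_year = 300_000  # assumed cost of one employee, $
productivity_gain = 0.10              # assumed

value_per_month = fully_loaded_cost_per_year * productivity_gain / 12
print(f"value of gain: ${value_per_month:,.0f}/month")  # → value of gain: $2,500/month

seat_price = 1_000  # $/month
assert value_per_month > seat_price  # the seat pays for itself under these assumptions
```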
That said, I hope they're using their prime Visa card so they can get some cash back on that spend.
isoprophlex•3h ago
swyx•2h ago
"coming soon" is also really over simplistic. you would have missed some of the greatest tech companies in the past 20 years if you evaluated startups based on their early-year revenue vs infra spend
like sure i have a dog in this fight but i actually want the criticism to sharpen my thinking, unfortunately yours does not meet that bar.
isoprophlex•1h ago
they spend 104% of revenue on ONE cloud provider and costs scale linearly with revenue growth. assume zitron didn't pull these numbers out of his ass.
educate me how this isnt selling $20 bills for $5. you're a smart dude; i myself aint seeing the "sustainable business practices" here
swyx•1h ago
pull up Uber's financials leading up to IPO. unsustainable and everyone knew it. they worked it out afterward because they burned money and eventually achieved a sustainable moat. this is why venture exists. HN doesn't like venture, which is, well, ironic given the domain we're on.
a better negative argument i'd rather see looks like this - "ive run these aws numbers against the typical spend path of pre IPO startups who then later improved their cost baseline and margin profile and even after accounting for all that, Anthropic ngmi". thats the kind of minimum sophistication you need here to play armchair financial analyst.

ed zitron, and everyone involved in this entire thread, incl myself, have not done that because we are lazy and ignorant and dont actually care enough about seeking the truth here. we are as unprepared to analyze this AWS spend as we are to understand their 1b -> 10b revenue ramp in 2025. you havent done the work and yet you sit here and judge it unsustainable based off some shitty "leaks". dont pretend that ed's analysis is at all meaningful particularly because he conveniently stops where it supports his known negative bias.
watwut•29m ago