Whatever question you ask, the response will recommend a cool, refreshing Coca-Cola soft drink.
Your AI coding project will automatically display ads collecting revenue for Anthropic, not for you.
Every tenth email sent by your AI agent will encourage the recipient to consider switching to Geico.
The opportunities are endless.
LLMs and the like are the ultimate propaganda machine: a machine able to masquerade as anything and to generate endless lies in a coherent manner.
Sure, there might not be any analysis that proves they're subsidized, but you also don't have any evidence that they are profitable. All the data points we have today show that companies are spending an insane amount of capex on gaining AI dominance without the revenue to achieve profitability yet.
You're also comparing two products in very different spots in the maturity lifecycle. There's no way to justify losing money on a decade-old product that's likely declining in overall usage -- ask any MBA (as much as engineers don't like business perspectives).
(Also, you can reasonably serve search queries off CPUs, with high rates of caching between queries. LLM inference essentially requires GPUs and is much harder to cache between users, since any one token can make a huge difference in the output.)
> there's not that much motive to gain API market share with unsustainably cheap prices. Any gains would be temporary, since there's no long-term lock-in, and better models are released weekly
The goal may be not so much locking customers in, but outlasting other LLM providers whilst maintaining a good brand image. Once everyone starts seeing you as "the" LLM provider, costs can start going up. That's what Uber and Lyft have been trying to do (though obviously without success).
Also, the prices may become more sustainable if LLM providers find ways to inject ad revenue into their products.
I'm sure they've already found ways to do that; injecting relevant ads is just a form of RAG.
But they won't risk it yet, as long as they're still grabbing market share, just as Google didn't run ads at the start and kept them unobtrusive until its search had won.
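To make the "ads as RAG" point concrete, here's a toy sketch in Python; the ad inventory, scoring, and prompt format are all invented for illustration, not anyone's actual system:

    # Toy sketch of "ads as RAG": retrieve the ad most relevant to the
    # user's query and splice it into the prompt before generation.
    # Everything below is invented for illustration.
    ADS = [
        {"advertiser": "FlyCheap", "copy": "Book flights from $49 with FlyCheap."},
        {"advertiser": "CodePilot", "copy": "Ship code faster with the CodePilot AI IDE."},
    ]

    def relevance(query: str, ad: dict) -> int:
        # Toy scorer: keyword overlap. A real system would use embeddings.
        return len(set(query.lower().split()) & set(ad["copy"].lower().split()))

    def build_prompt(query: str) -> str:
        ad = max(ADS, key=lambda a: relevance(query, a))
        return ("You are a helpful assistant.\n"
                f"Where it feels natural, mention this sponsor: {ad['copy']}\n\n"
                f"User: {query}")

    print(build_prompt("any cheap flights to Lisbon?"))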
(GPUs are generally much more cost effective and energy efficient than CPU if the solution maps to both architectures. Anthropic certainly caches the KV-cache of their 24k token system prompt.)
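For what it's worth, that kind of caching is exposed publicly too. A minimal sketch using Anthropic's prompt-caching API (the model ID and prompt here are placeholders, and last I checked prompts below a ~1024-token minimum aren't cached):

    import anthropic

    client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

    LONG_SYSTEM_PROMPT = "..."  # imagine a ~24k-token system prompt here

    resp = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model ID
        max_tokens=512,
        system=[{
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # Marks this block's KV state for reuse across requests:
            "cache_control": {"type": "ephemeral"},
        }],
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp.content[0].text)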
though at the outset (pre-profit / private) it's hard to say there's much difference.
Sure we do. Go to AWS or any other hosting provider and pay them for inference. You think AWS are going to subsidise your usage of somebody else’s models indefinitely?
> All the data points we have today show that companies are spending an insane amount of capex on gaining AI dominance without the revenue to achieve profitability yet.
Yes, capex not opex. The cost of running inference is opex.
As with Costco's giant $5 roasted chickens, this is not solid evidence they're profitable. Loss-leaders exist.
At the same time, cards have gotten >8x more efficient over the last 3 years, inference engines >10x more efficient, and the raw models are at least treading water if not becoming more efficient themselves. It's likely that we'll shave another 10-100x off the cost of inference in the next 2 years.
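Taking those figures at face value, the gains compound multiplicatively (numbers straight from the comment above, purely illustrative):

    # Rough compounding of the claimed efficiency gains.
    hardware_gain = 8    # cards >8x more efficient over ~3 years
    engine_gain = 10     # inference engines >10x more efficient
    print(f"~{hardware_gain * engine_gain}x cheaper per token already")  # ~80x
    # Another 10-100x would take a token that costs 1 cent today
    # down to between 0.1 and 0.01 cents.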
This is a tangent to the rest of the article, but this "just" is doing more heavy lifting than Atlas holding up the skies. Taking a user from $0 to $1 is immeasurably harder than taking a user from $1 to $2, and the vast majority of those active users would drop as soon as you put a real price tag on it, no matter the actual number.
https://andymasley.substack.com/p/reactions-to-mit-technolog...
It’s basically the same story as this article: people incorrectly believe LLMs use a huge amount of energy (and water), but it's actually pretty reasonable and not out of line with anything else we do.
The minute it starts costing me money, I have to make that decision: Is this worth the dollar?
I'm saying that good-enough LLMs are so cheap that they could easily be monetized with ads, and it's not even close. If you look at other companies with similar sized consumer-facing services monetized with ads, their ARPU is far higher than $1.
A lot of people have this mental model of LLMs being so expensive that they can’t possibly be ad-supported, leaving subscriptions as the only consumer option. That might have been true two years ago, but I don't think it's true now.
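A back-of-the-envelope check on that claim (every number below is my own assumption, not from the article):

    # Can ads plausibly cover a free user's inference bill?
    queries_per_day = 15
    tokens_per_query = 1_500        # input + output combined, assumed
    usd_per_m_tokens = 0.40         # assumed "good-enough" model price
    annual_cost = queries_per_day * 365 * tokens_per_query * usd_per_m_tokens / 1e6
    print(f"${annual_cost:.2f}/year per free user")  # ~$3.29
    # Ad-funded consumer services routinely report ARPU well above that.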
I’m sure you are going to provide some sort of evidence for this otherwise ridiculous claim, correct?
Paying $1000 for an iPhone? Sure. $10 for a Starbucks? Sure. $1 per year for LLM? Now hold on, papa is not an oil oligarch...
For most people, yes. Also, many people spend less than $1000 on their phones.
So, basically, ads.
Last time personal computing took up an entire building, we put the same compute power into a (portable) "personal computer" a few decades later.
Can't wait to send all my data and life to my own lil inference box, instead of big tech (and NSA etc).
Better math would be converting 1% of those users, but that gets you $1000/year.
Hard indeed, but they don't need everyone to pay, only enough people to effectively subsidise the free users.
Source? Is this in the API ToS?
How big is that impact? Well, that's a complicated issue.
https://www.sustainabilitybynumbers.com/p/carbon-footprint-c...
When this happens, what we will see is once again the rich and privileged will benefit from the use of LLMs while the poor have to just rely on their own brains. Consider how some students will have to grow up struggling through school without any LLMs while rich kids breeze their way through everything with their assistants.
Meanwhile, a free model running locally is good enough for most people. This causes pricing pressure (and I think is probably going to bankrupt most of the AI companies).
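For instance, a local model served by Ollama is a few lines away from any script (a sketch; assumes Ollama is running locally and llama3.1:8b has been pulled):

    # Query a local model via Ollama's default HTTP API; stdlib only.
    import json
    import urllib.request

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({
            "model": "llama3.1:8b",
            "prompt": "What is the capital of the USA?",
            "stream": False,
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as r:
        print(json.loads(r.read())["response"])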
More likely IMO is that AI becomes a loss-leader. It'll all be stuff like Grok or DeepSeek where the real profit is in censorship and propaganda.
- If I'm using a search engine, I want to search the web. Yes these engines are increasingly providing answers rather than just search results, but that's a UI/product feature rather than an API one. If I'm paying Google $$ for access to their index, I'm interested in the index.
- If I'm using an LLM, it is for parsing large amounts of input data, image recognition, complex analysis, deep thinking/reasoning, coding. All of these result in significantly more token usage than a 2-line "the answer to your question is xyz" response.
The author is basically saying: a Honda Civic is cheap because it costs about the same per pound as Honeycrisp apples.
This would probably increase 10x if one of the providers sold a family plan and my kids got paid access.
Most of my heavy lifting is work-related and comes out of my employer's pocket.
There are fewer experts using search engines now. Normal people treat a search engine less like an index and more like a person. Asking an old-school search engine "What is the capital of the USA" is actually not quite right: the "what is" is superfluous, and you're counting on finding some sort of educational website with the answer. In fact, phrasing it as "the capital of the USA is" is probably a better fit for a search engine, since that's the sort of sentence that would contain what you want to know.
Also with the plague of "SEO", there's a million sites trying to convince Google that their site is relevant even when it's not.
So LLMs are increasingly relevant for informally phrased queries that don't actually contain the right keywords, and they're also much more useful in that they bypass a lot of pointless verbiage, spam, ads, and requests to subscribe.
I'd argue that search engines should stick to outputting relevant websites and let LLMs give you an overview. The two technologies are complementary and fulfill different roles.
Correct, but you're also not the median user. You're a power user.
You have a point, but no, it doesn't. The article already kind of addresses this: OpenAI had a pretty low loss in 2024 for the volume of usage they get. $5B seems like a lot until you realize that chatgpt.com alone, even in 2024, was one of the most visited sites on the planet each month, with the vast majority of those visits coming from entirely free users (no ads, nothing). OpenAI said last December that ChatGPT handles over a billion messages per day.
So even if you look at what people do with the service as a whole in general, inference really doesn't seem that costly.
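Dividing those two figures through makes the point (both are rough public numbers cited above):

    # Rough loss per message from the figures in this thread.
    annual_loss = 5e9        # ~$5B reported 2024 loss
    messages_per_day = 1e9   # ~1B ChatGPT messages/day (Dec 2024 claim)
    print(f"~${annual_loss / (messages_per_day * 365):.4f} per message")  # ~$0.0137
    # And that loss also covers training, salaries, etc., so the marginal
    # inference cost per message is lower still.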
Which is precisely why Google started adding their AI "answers". The web has kind of become a cancer -- the sites that game SEO the most seem to have the trashiest, most user-hostile behaviour, so search became unpleasant for most -- so Google just replaces the outbound visit conceptually.
So even if the per-query or per-token cost is lower, the total consumption is vastly higher. For that reason, fair comparison or not, people looking at it from the perspective of personal economics will compare what it costs to use each technology to its full potential.
Wouldn't this award have to go to computers? They're a prerequisite for using LLMs and can do a lot more besides running LLMs.
They burn through insane amounts of cash and are, for some reason, still called startups. Sure, they'll be around for a long time while they try to figure something out, but unless hardware prices and power consumption come down, they won't be turning a profit anytime soon.
Just look at YouTube: in business for 20 years, and it's still unclear whether it's profitable, as Alphabet chooses not to disclose YT's net income. I'd imagine any public company would disclose those numbers unless they were in the red.
I understand the point, but gold is expensive because it is a traditionally agreed-upon store of value rather than because of its usage. Rhodium would be a better example.
The queries used to estimate LLM costs don't make a lot of sense for how LLMs are actually used.
You would not ask an LLM to tell you the baggage size for a flight because there might be a rule added a week ago that changes this or the LLM might hallucinate the numbers.
You would ask an LLM with web search enabled, so it can find sources and ground the answer. This applies to any question where you need factual data; otherwise it's like asking a random stranger on the street about things that can cost you money. Then the token count balloons, because the LLM needs to add entire websites to its context.
If you are not looking for a grounded answer, you might be doing something more creative, like writing a text. In that case, you might be iterating on the text, with the entire discussion sent as context multiple times before you get the answer you want. There may be caching, batching, etc., but the tokens required still grow very fast.
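To see how fast resending the whole discussion adds up, a quick illustration (the per-turn token count is an assumption):

    # Total input tokens for a chat where the full history is resent
    # each turn (no caching). Growth is quadratic in the number of turns.
    turn_tokens = 800       # assumed new tokens per turn (user + reply)
    history = total_input = 0
    for turn in range(10):
        history += turn_tokens
        total_input += history  # the whole history is the next turn's input
    print(total_input)          # 44,000 input tokens after just 10 turns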
In summary, I think the token estimates are likely quite off. But not to be all critical: it was a very informative post, and in the end, without real-world consumption data, these things are hard to estimate.
Why wouldn’t I use it like this?
The first thing I saw was the AI summary. Underneath that was a third-party site. Underneath that was “People also ask” with five different questions. And then underneath that was the link to the American Airlines site.
I followed the link to the official site. I was presented with a "We care about your privacy" consent screen, with four categories.
The first category, “Strictly necessary”, told me it was necessary for them to share info with eleven entities, such as Vimeo and LinkedIn, because it was “essential to our site operation”.
The remaining categories added up to 59 different entities that American Airlines would like to share my browsing data with while respecting my privacy.
Once I dismissed the consent screen, I was then able to get the information.
Then I tried the question on ChatGPT. It said “Searching the web”, paused for a second, and then it told me.
Then I tried it on Claude. It paused for a second, said “Searching the web”, and then it told me.
Then I tried it on Qwen. It paused for a second, then told me.
Then I tried it on DeepSeek. It paused for a second, said “Searching the web”, and then it told me.
All of the LLMs gave me the information more quickly, got the answer right, and linked to the official source.
Yes, Google’s AI answer did too… but that’s just Google’s LLM.
Websites have been choosing shitty UX for decades at this point. The web is so polluted with crap and obstacles it’s ridiculous. Nobody seems to care any more. Now LLMs have come along that will just give you the info straight away without any fuss, so of course people are going to prefer them.
What benefit did the LLM add here, if you still had to vet the sources?
4o will always do a web search for a pointedly current question and give checkable references in the reply; if it doesn't, you can tell it to search.
o3 meanwhile will do many searches and look at the thing from multiple angles.
I think this article is measuring all the wrong things and therefore comes to the wrong conclusion.
E.g. based on the calculations in https://www.tensoreconomics.com/p/llm-inference-economics-fr..., increasing batch size from 1 to 64 cuts the cost per token to 1/16th.
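The intuition behind that: at batch size 1, the GPU spends almost all its time streaming weights from memory, while a batch amortizes each weight load across many sequences. A toy cost model (both constants invented for illustration):

    # Why batching slashes per-token cost: the weight-load cost is paid
    # once per forward pass, while compute scales with batch size.
    weight_load_ms = 30.0   # assumed fixed cost per forward pass
    compute_ms = 0.5        # assumed per-sequence compute per pass

    def ms_per_token(batch: int) -> float:
        return (weight_load_ms + compute_ms * batch) / batch

    for b in (1, 8, 64):
        print(b, round(ms_per_token(b), 2))
    # batch 1 -> 30.5; batch 64 -> ~0.97 (roughly 30x cheaper per token)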
Also, it laughably excludes this one from OpenAI's pricing details:
o1-pro-2025-03-19, price per 1M tokens (Batch API): Input $150.00, Output $600.00
And this doesn't even address quality, which the article explicitly ignores. I personally find most results from the cheaper models to be far, far worse than the results I got from search before the LLM content flood. But of course, that's 1) subjective, and 2) impossible to compare analytically now, since indexed search has been so thoroughly ruined by SEO and LLM junk. Yet another externalized cost that's impossible to account for, but likely to have immeasurably negative impacts on the world's ability to share information.
Runs to shop to buy GPU rig.
But yeah, $0.20 per million tokens is nothing for light use.
This is going to reshape large portions of our text-based communication networks.