Whatever question you ask, the response will recommend a cool, refreshing Coca-Cola soft drink.
Your AI coding project will automatically display ads collecting revenue for Anthropic, not for you.
Every tenth email sent by your AI agent will encourage the recipient to consider switching to Geico.
The opportunities are endless.
LLMs and the like are the ultimate propaganda machine: a machine able to masquerade as anything and to generate endless lies in a coherent manner.
Sure, there might not be any analysis that proves they're subsidized, but you also don't have any evidence that they are profitable. All the data points we have today show that companies are spending an insane amount of capex on gaining AI dominance without the revenue to achieve profitability yet.
You're also comparing two products in very different spots in the maturity lifecycle. There's no way to justify losing money on a decade-old product that's likely declining in overall usage -- ask any MBA (as much as engineers don't like business perspectives).
(Also, you can reasonably serve search queries off CPUs, with high rates of caching between queries. LLM inference essentially requires GPUs and is much harder to cache between users, since any one token can make a huge difference in the output.)
> there's not that much motive to gain API market share with unsustainably cheap prices. Any gains would be temporary, since there's no long-term lock-in, and better models are released weekly
The goal may be not so much locking customers in, but outlasting other LLM providers whilst maintaining a good brand image. Once everyone starts seeing you as "the" LLM provider, costs can start going up. That's what Uber and Lyft have been trying to do (though obviously without success).
Also, the prices may become more sustainable if LLM providers find ways to inject ad revenue into their products.
I'm sure they've already found ways to do that; injecting relevant ads is just a form of RAG.
But they won't risk it yet, as long as they're still grabbing market share, just as Google didn't run ads at the start and kept them unobtrusive until its search had won.
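To make the "ads as RAG" point concrete, here's a toy sketch in Python; the ad inventory, scoring, and prompt format are all invented for illustration, not anyone's actual system:

    # Toy sketch of "ads as RAG": retrieve the ad most relevant to the
    # user's query and splice it into the prompt before generation.
    # Everything below is invented for illustration.
    ADS = [
        {"advertiser": "FlyCheap", "copy": "Book flights from $49 with FlyCheap."},
        {"advertiser": "CodePilot", "copy": "Ship code faster with the CodePilot AI IDE."},
    ]

    def relevance(query: str, ad: dict) -> int:
        # Toy scorer: keyword overlap. A real system would use embeddings.
        return len(set(query.lower().split()) & set(ad["copy"].lower().split()))

    def build_prompt(query: str) -> str:
        ad = max(ADS, key=lambda a: relevance(query, a))
        return ("You are a helpful assistant.\n"
                f"Where it feels natural, mention this sponsor: {ad['copy']}\n\n"
                f"User: {query}")

    print(build_prompt("any cheap flights to Lisbon?"))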
(GPUs are generally much more cost effective and energy efficient than CPU if the solution maps to both architectures. Anthropic certainly caches the KV-cache of their 24k token system prompt.)
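For what it's worth, that kind of caching is exposed publicly too. A minimal sketch using Anthropic's prompt-caching API (the model ID and prompt here are placeholders, and last I checked prompts below a ~1024-token minimum aren't cached):

    import anthropic

    client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

    LONG_SYSTEM_PROMPT = "..."  # imagine a ~24k-token system prompt here

    resp = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model ID
        max_tokens=512,
        system=[{
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # Marks this block's KV state for reuse across requests:
            "cache_control": {"type": "ephemeral"},
        }],
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp.content[0].text)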
though at the outset (pre-profit / private) it's hard to say there's much difference.
Sure we do. Go to AWS or any other hosting provider and pay them for inference. You think AWS are going to subsidise your usage of somebody else’s models indefinitely?
> All the data points we have today show that companies are spending an insane amount of capex on gaining AI dominance without the revenue to achieve profitability yet.
Yes, capex not opex. The cost of running inference is opex.
As with Costco's giant $5 roasted chickens, this is not solid evidence they're profitable. Loss-leaders exist.
At the same time, cards have gotten >8x more efficient over the last 3 years, inference engines >10x more efficient, and the raw models are at least treading water if not becoming more efficient themselves. It's likely that we'll shave another 10-100x off the cost of inference in the next 2 years.
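Taking those figures at face value, the gains compound multiplicatively (numbers straight from the comment above, purely illustrative):

    # Rough compounding of the claimed efficiency gains.
    hardware_gain = 8    # cards >8x more efficient over ~3 years
    engine_gain = 10     # inference engines >10x more efficient
    print(f"~{hardware_gain * engine_gain}x cheaper per token already")  # ~80x
    # Another 10-100x would take a token that costs 1 cent today
    # down to between 0.1 and 0.01 cents.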
This is a tangent to the rest of the article, but this "just" is doing more heavy lifting than Atlas holding up the skies. Taking a user from $0 to $1 is immeasurably harder than taking a user from $1 to $2, and the vast majority of those active users would drop as soon as you put a real price tag on it, no matter the actual number.
https://andymasley.substack.com/p/reactions-to-mit-technolog...
It’s basically the same story as this article: people incorrectly believe LLMs use a huge amount of energy (and water), but it's actually pretty reasonable and not out of line with anything else we do.
The minute it starts costing me money, I have to make that decision: Is this worth the dollar?
I'm saying that good-enough LLMs are so cheap that they could easily be monetized with ads, and it's not even close. If you look at other companies with similar sized consumer-facing services monetized with ads, their ARPU is far higher than $1.
A lot of people have this mental model of LLMs being so expensive that they can’t possibly be ad-supported, leaving subscriptions as the only consumer option. That might have been true two years ago, but I don't think it's true now.
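A back-of-the-envelope check on that claim (every number below is my own assumption, not from the article):

    # Can ads plausibly cover a free user's inference bill?
    queries_per_day = 15
    tokens_per_query = 1_500        # input + output combined, assumed
    usd_per_m_tokens = 0.40         # assumed "good-enough" model price
    annual_cost = queries_per_day * 365 * tokens_per_query * usd_per_m_tokens / 1e6
    print(f"${annual_cost:.2f}/year per free user")  # ~$3.29
    # Ad-funded consumer services routinely report ARPU well above that.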
I’m sure you are going to provide some sort of evidence for this otherwise ridiculous claim, correct?
Paying $1000 for an iPhone? Sure. $10 for a Starbucks? Sure. $1 per year for LLM? Now hold on, papa is not an oil oligarch...
For most people, yes. Also, many people spend less than $1000 on their phones.
So, basically, ads.
Last time personal computing took up an entire building, we put the same compute power into a (portable) "personal computer" a few decades later.
Can't wait to send all my data and life to my own lil inference box, instead of big tech (and NSA etc).
Better math would be converting 1% of those users, but that gets you $1000/year.
Hard indeed, but they don't need everyone to pay, only enough people to effectively subsidise the free users.
Source? Is this in the API ToS?
How big is that impact? Well, that's a complicated issue.
https://www.sustainabilitybynumbers.com/p/carbon-footprint-c...
When this happens, what we will see is once again the rich and privileged will benefit from the use of LLMs while the poor have to just rely on their own brains. Consider how some students will have to grow up struggling through school without any LLMs while rich kids breeze their way through everything with their assistants.
Meanwhile, a free model running locally is good enough for most people. This causes pricing pressure (and I think is probably going to bankrupt most of the AI companies).
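For instance, a local model served by Ollama is a few lines away from any script (a sketch; assumes Ollama is running locally and llama3.1:8b has been pulled):

    # Query a local model via Ollama's default HTTP API; stdlib only.
    import json
    import urllib.request

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({
            "model": "llama3.1:8b",
            "prompt": "What is the capital of the USA?",
            "stream": False,
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as r:
        print(json.loads(r.read())["response"])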
More likely IMO is that AI becomes a loss-leader. It'll all be stuff like Grok or DeepSeek where the real profit is in censorship and propaganda.
- If I'm using a search engine, I want to search the web. Yes these engines are increasingly providing answers rather than just search results, but that's a UI/product feature rather than an API one. If I'm paying Google $$ for access to their index, I'm interested in the index.
- If I'm using an LLM, it is for parsing large amounts of input data, image recognition, complex analysis, deep thinking/reasoning, coding. All of these result in significantly more token usage than a 2-line "the answer to your question is xyz" response.
The author is basically saying: a Honda Civic is cheap because it costs about the same per pound as Honeycrisp apples.
This would probably increase 10x if one of the providers sold a family plan and my kids got paid access.
Most of my heavy lifting is work-related and comes out of my employer's pocket.
There are fewer experts using search engines now. Normal people treat a search engine less like an index and more like a person. Asking an old-school search engine "What is the capital of the USA" is actually not quite right: the "what is" is superfluous, and you're counting on finding some sort of educational website with the answer. In fact, phrasing it as "the capital of the USA is" is probably a better fit for a search engine, since that's the sort of sentence that would contain what you want to know.
Also with the plague of "SEO", there's a million sites trying to convince Google that their site is relevant even when it's not.
So LLMs are increasingly relevant for informally phrased queries that don't actually contain the right keywords, and they're also much more useful in that they bypass a lot of pointless verbiage, spam, ads, and requests to subscribe.
I'd argue that search engines should stick to outputting relevant websites and let LLMs give you an overview. The two technologies are complementary and fulfill different roles.
Correct, but you're also not the median user. You're a power user.
You have a point, but no, it doesn't. The article already kind of addresses this: OpenAI had a pretty low loss in 2024 for the volume of usage they get. $5B seems like a lot until you realize that chatgpt.com alone, even in 2024, was one of the most visited sites on the planet each month, with the vast majority of those visits coming from entirely free users (no ads, nothing). OpenAI said last December that ChatGPT handles over a billion messages per day.
So even if you look at what people do with the service as a whole in general, inference really doesn't seem that costly.
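Dividing those two figures through makes the point (both are rough public numbers cited above):

    # Rough loss per message from the figures in this thread.
    annual_loss = 5e9        # ~$5B reported 2024 loss
    messages_per_day = 1e9   # ~1B ChatGPT messages/day (Dec 2024 claim)
    print(f"~${annual_loss / (messages_per_day * 365):.4f} per message")  # ~$0.0137
    # And that loss also covers training, salaries, etc., so the marginal
    # inference cost per message is lower still.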
Which is precisely why Google started adding their AI "answers". The web has kind of become a cancer -- the sites that game SEO the most seem to have the trashiest, most user-hostile behaviour, so search became unpleasant for most -- so Google just replaces the outbound visit conceptually.
So even if the per-query or per-token cost is lower, the total consumption is vastly higher. For that reason, fair comparison or not, people looking at it from the perspective of personal economics will compare what it costs to use each technology to its full potential.
Wouldn't this award have to go to computers? They're a prerequisite for using LLMs and can do a lot more besides running LLMs.
They burn through insane amounts of cash and are, for some reason, still called startups. Sure, they'll be around for a long time while they try to figure something out, but unless hardware prices and power consumption come down, they won't be turning a profit anytime soon.
Just look at YouTube: in business for 20 years, and it's still unclear whether it's profitable, as Alphabet chooses not to disclose YT's net income. I'd imagine any public company would disclose those numbers unless they were in the red.
I understand the point, but gold is expensive because it is a traditionally agreed-upon store of value rather than because of its usage. Rhodium would be a better example.
The queries used to estimate LLM costs don't make a lot of sense for how LLMs are actually used.
You would not ask an LLM to tell you the baggage size for a flight because there might be a rule added a week ago that changes this or the LLM might hallucinate the numbers.
You would ask an LLM with web search enabled, so it can find sources and ground the answer. This applies to any question where you need factual data; otherwise it's like asking a random stranger on the street about things that can cost you money. Then the token count balloons, because the LLM needs to add entire websites to its context.
If you are not looking for a grounded answer, you might be doing something more creative, like writing a text. In that case, you might be iterating on the text, with the entire discussion sent as context multiple times before you get the answer you want. There may be caching, batching, etc., but the tokens required still grow very fast.
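To see how fast resending the whole discussion adds up, a quick illustration (the per-turn token count is an assumption):

    # Total input tokens for a chat where the full history is resent
    # each turn (no caching). Growth is quadratic in the number of turns.
    turn_tokens = 800       # assumed new tokens per turn (user + reply)
    history = total_input = 0
    for turn in range(10):
        history += turn_tokens
        total_input += history  # the whole history is the next turn's input
    print(total_input)          # 44,000 input tokens after just 10 turns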
In summary, I think the token estimates are likely quite off. But not to be all critical: it was a very informative post, and in the end, without real-world consumption data, these things are hard to estimate.
Why wouldn’t I use it like this?
The first thing I saw was the AI summary. Underneath that was a third-party site. Underneath that was “People also ask” with five different questions. And then underneath that was the link to the American Airlines site.
I followed the link to the official site. I was presented with a "We care about your privacy" consent screen, with four categories.
The first category, “Strictly necessary”, told me it was necessary for them to share info with eleven entities, such as Vimeo and LinkedIn, because it was “essential to our site operation”.
The remaining categories added up to 59 different entities that American Airlines would like to share my browsing data with while respecting my privacy.
Once I dismissed the consent screen, I was then able to get the information.
Then I tried the question on ChatGPT. It said “Searching the web”, paused for a second, and then it told me.
Then I tried it on Claude. It paused for a second, said “Searching the web”, and then it told me.
Then I tried it on Qwen. It paused for a second, then told me.
Then I tried it on DeepSeek. It paused for a second, said “Searching the web”, and then it told me.
All of the LLMs gave me the information more quickly, got the answer right, and linked to the official source.
Yes, Google’s AI answer did too… but that’s just Google’s LLM.
Websites have been choosing shitty UX for decades at this point. The web is so polluted with crap and obstacles it’s ridiculous. Nobody seems to care any more. Now LLMs have come along that will just give you the info straight away without any fuss, so of course people are going to prefer them.
What benefit did the LLM add here, if you still had to vet the sources?
4o will always do a web search for a pointedly current question and give checkable references in the reply; if it doesn't, you can tell it to search.
o3 meanwhile will do many searches and look at the thing from multiple angles.
I think this article is measuring all the wrong things and therefore comes to the wrong conclusion.
E.g. based on the calculations in https://www.tensoreconomics.com/p/llm-inference-economics-fr..., increasing batch size from 1 to 64 cuts the cost per token to 1/16th.
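The intuition behind that: at batch size 1, the GPU spends almost all its time streaming weights from memory, while a batch amortizes each weight load across many sequences. A toy cost model (both constants invented for illustration):

    # Why batching slashes per-token cost: the weight-load cost is paid
    # once per forward pass, while compute scales with batch size.
    weight_load_ms = 30.0   # assumed fixed cost per forward pass
    compute_ms = 0.5        # assumed per-sequence compute per pass

    def ms_per_token(batch: int) -> float:
        return (weight_load_ms + compute_ms * batch) / batch

    for b in (1, 8, 64):
        print(b, round(ms_per_token(b), 2))
    # batch 1 -> 30.5; batch 64 -> ~0.97 (roughly 30x cheaper per token)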
Also, it laughably excludes this one from OpenAI's pricing details:
o1-pro-2025-03-19, price per 1M tokens (Batch API): Input $150.00, Output $600.00
And this doesn't even address quality, which the article explicitly ignores. I personally find most results from the cheaper models to be far, far worse than the results I got from search before the LLM content flood. But of course, that's 1) subjective, and 2) impossible to compare analytically now, since indexed search has been so thoroughly ruined by SEO and LLM junk. Yet another externalized cost that's impossible to account for, but likely to have immeasurably negative impacts on the world's ability to share information.
Runs to shop to buy GPU rig.
But yeah, $0.20 per million tokens is nothing for light use.
This is going to reshape large portions of our text-based communication networks.