Whatever question you ask, the response will recommend a cool, refreshing Coca Cola soft drink.
Your AI coding project will automatically display ads collecting revenue for Anthropic, not for you.
Every tenth email sent by your AI agent will encourage the recipient to consider switching to Geico.
The opportunities are endless.
LLMs and the like are the ultimate propaganda machine: a machine that can masquerade as anything and generate endless lies in a coherent manner.
Sure, there might not be any analysis that proves they are subsidized, but you also don't have any evidence that they are profitable. All the data points we have today show that companies are spending an insane amount of capex on gaining AI dominance without the revenue to achieve profitability yet.
You're also comparing two products in very different spots in the maturity lifecycle. There's no way to justify losing money on a decade-old product that's likely declining in overall usage -- ask any MBA (as much as engineers don't like business perspectives).
(Also you can reasonably serve search queries off of CPUs with high rates of caching between queries. LLM inference essentially requires GPUs and is much harder to cache between users since any one token could make a huge difference in the output)
> there's not that much motive to gain API market share with unsustainably cheap prices. Any gains would be temporary, since there's no long-term lock-in, and better models are released weekly
The goal may be not so much locking customers in, but outlasting other LLM providers whilst maintaining a good brand image. Once everyone starts seeing you as "the" LLM provider, costs can start going up. That's what Uber and Lyft have been trying to do (though obviously without success).
Also, the prices may become more sustainable if LLM providers find ways to inject ad revenue into their products.
I'm sure they've already found ways to do that; injecting relevant ads is just a form of RAG.
But they won't risk it yet, as long as they're still grabbing market share, just as Google didn't run ads at the start and kept them unobtrusive until their search won.
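To make the RAG point concrete, here's a minimal sketch (purely hypothetical; the retrieval functions and the ad inventory are stand-ins, not anything a provider has confirmed doing): ad injection is just one more retrieval step whose results get prepended to the prompt.

    # Hypothetical sketch: ad injection as one extra retrieval step before generation.
    def build_prompt(user_query, retrieve_ads, retrieve_docs):
        ads = retrieve_ads(user_query, top_k=1)    # sponsored snippets, ranked by relevance
        docs = retrieve_docs(user_query, top_k=3)  # the usual RAG context
        context = "\n".join(ads + docs)
        return (
            "Answer the user's question. Where it reads naturally, you may mention "
            "the sponsored product included in the context.\n\n"
            f"Context:\n{context}\n\nQuestion: {user_query}"
        )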
> First, there's not that much motive to gain API market share with unsustainably cheap prices. Any gains would be temporary, since there's no long-term lock-in,
What? If someone builds something on top of your API, they're tying themselves to it, and you can slowly raise prices while keeping each increase well below the switching cost.
> Second, some of those models have been released with open weights and API access is also available from third-party providers who would have no motive to subsidize inference.
See above. Just like any other Cloud service, you tie clients to your API.
> Third, Deepseek released actual numbers on their inference efficiency in February. Those numbers suggest that their normal R1 API pricing has about 80% margins when considering the GPU costs, though not any other serving costs.
80% margin on GPU cost? What about after paying for power, facilities, admin, support, marketing, etc.? Are GPUs really more than half the cost of this business?
(EDIT: This is 80% margin on top of GPU rental, i.e. total compute cost. My bad.)
Guessing about costs based on prices makes no sense at this point. OpenAI's $20/mo and $200/mo tiers have nothing to do with the cost of those services -- they're just testing price points.
Have you used any of these APIs? There's very little lock-in for inference. This isn't like setting up all your automation on S3; if you use the right library, switching is just changing a config file.
That's not really how the LLM API market works. The interfaces themselves are pretty trivial and have no real lock-in value, and there are plenty of adapters around anyway. (Often first-party, e.g. both Anthropic and Google provide OpenAI-compatible APIs.) There might initially have been theories that you could not easily move to a different model, creating lock-in, but in practice LLMs are so flexible and forgiving about their inputs that a different model can just be dropped in and work without any model-specific changes.
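To illustrate how thin the lock-in is (base URLs and model names below are examples from memory; check each provider's docs): switching providers is often just a different base_url and model string on the same OpenAI client.

    from openai import OpenAI

    # Same client library, different provider: only base_url, api_key and the
    # model name change. URLs/model names here are illustrative.
    anthropic_compat = OpenAI(base_url="https://api.anthropic.com/v1/", api_key="...")
    gemini_compat = OpenAI(base_url="https://generativelanguage.googleapis.com/v1beta/openai/", api_key="...")

    resp = gemini_compat.chat.completions.create(
        model="gemini-2.5-flash",
        messages=[{"role": "user", "content": "What is the capital of the USA?"}],
    )
    print(resp.choices[0].message.content)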
> 80% margin on GPU cost? What about after paying for power, facilities
The market price of renting that compute. That's fully loaded, so it would include a) pro-rated recouping of the capital cost of the GPUs, b) the power, cooling, datacenter buildings, etc., c) the hosting provider's margin.
> admin, support, marketing, etc.? Are GPUs really more than half the cost of this business?
Pretty likely! In OpenAI's leaked 2024 financial plan the compute costs were like 75% of their projected costs.
Sorry, I totally misread your post. Charging 80% on top of server rental isn't so bad, especially since I'm guessing there are significant markups on GPU rental given all the AI demand.
(GPUs are generally much more cost effective and energy efficient than CPU if the solution maps to both architectures. Anthropic certainly caches the KV-cache of their 24k token system prompt.)
though at the outset (pre-profit / private) it's hard to say there's much difference.
I think this is the tough part, we’re at the outset still.
Also, a political investment could be sustainable, in the sense that China might decide they are fine running Deepseek at a loss indefinitely, if that's what's going on (hypothetically. Actually, I have never seen any evidence to suggest Deepseek is subsidized, although I haven't gone looking).
So when I see a company bragging about "500% profitability," I can’t help but wonder if they’re even profitable at all.
Sure we do. Go to AWS or any other hosting provider and pay them for inference. You think AWS are going to subsidise your usage of somebody else’s models indefinitely?
> All the data points we have today show that companies are spending an insane amount of capex on gaining AI dominance without the revenue to achieve profitability yet.
Yes, capex not opex. The cost of running inference is opex.
That does not mean, however, that inference is unprofitable. The unit economics of inference can be profitable even while the personnel costs of training next-generation models are extraordinary.
As with Costco's giant $5 roasted chickens, this is not solid evidence they're profitable. Loss-leaders exist.
Anyway, at 0.4 J/token and a cost of 5 cents/kWh, that's about 0.5 cents/million tokens. Even at 50% utilization you're only up to 1.1 cents/M tokens. Artificial Analysis reports the current average price of Llama 3.3 70B to be about $0.65/M tokens. I'd assume most of the cost you're paying for is the depreciation schedule of the hardware.
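A rough sanity check of that arithmetic, using the figures above (0.4 J/token, 5 cents/kWh, and the ~$0.65/M token Artificial Analysis price):

    joules_per_token = 0.4
    usd_per_kwh = 0.05
    kwh_per_mtok = joules_per_token * 1e6 / 3.6e6      # 3.6 MJ per kWh
    energy_usd_per_mtok = kwh_per_mtok * usd_per_kwh
    print(energy_usd_per_mtok)         # ~0.0056 USD, i.e. ~0.5 cents per million tokens
    print(energy_usd_per_mtok / 0.5)   # ~1.1 cents/M tokens at 50% utilization
    print(0.65 / energy_usd_per_mtok)  # market price is ~120x the raw electricity cost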
Note that, of course, modern-day 7B-class models stomp on both those older models, so you could throw in another 10X lower cost if you're going to quality-adjust. Also, I did minimal perf tuning: I used FP8, and W8A8-INT8 is both faster and has slightly better quality (in my functional evals). I also used -tp 8 for my system. With -tp 4 plus model parallelism and cache-aware routing you should also be able to increase throughput a fair amount. Also, speculative decode with a basic draft model would give you another boost. And this was tested at the beginning of the year, so using vLLM 0.6.x or so; the vLLM 1.0 engine is faster (better graph building, compilation, scheduling). I'd guess that if you were conscientious about optimizing you could probably get at least another 2X perf free with basically just "config".
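For anyone wanting to poke at this themselves, a minimal vLLM starting point along those lines might look like the following (a sketch, not my exact setup; argument names are from recent vLLM versions, and the W8A8-INT8 path needs a pre-quantized checkpoint rather than the on-the-fly FP8 shown here):

    from vllm import LLM, SamplingParams

    # Llama 3.3 70B across 8 GPUs (the -tp 8 above), quantized to FP8 on the fly.
    # For W8A8-INT8, point `model` at a pre-quantized checkpoint instead.
    llm = LLM(
        model="meta-llama/Llama-3.3-70B-Instruct",
        tensor_parallel_size=8,
        quantization="fp8",
    )
    out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
    print(out[0].outputs[0].text)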
If you would like someone to somehow add that as a line item, perhaps you should add the full embodied energy cost of Linux (please include the entire history of compute, since it wouldn't exist without UNIX), or perhaps the full military-industrial-complex costs from the invention of the transistor? We could go further.
What I do for testing is run a benchmark_serving sweep (I prefer ShareGPT for a standard set that is slightly more realistic for caching) at the desired concurrency (e.g. 4-1024 or something like that), then plot TTFT vs total throughput and graph Mean, P50, and P99. This gives you a clear picture of your concurrency/throughput for a given desired latency.
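If it helps, the plotting half of that is trivial; something like the sketch below, assuming you've dumped one JSON result file per concurrency level from the benchmark sweep (the field names are assumptions based on what recent benchmark_serving runs emit; adapt to whatever your run produces):

    import glob, json
    import matplotlib.pyplot as plt

    # One results file per concurrency level from a benchmark_serving sweep.
    runs = [json.load(open(p)) for p in sorted(glob.glob("results/concurrency_*.json"))]
    xput = [r["total_token_throughput"] for r in runs]
    for key, label in [("mean_ttft_ms", "Mean"), ("median_ttft_ms", "P50"), ("p99_ttft_ms", "P99")]:
        plt.plot(xput, [r[key] for r in runs], marker="o", label=label)
    plt.xlabel("Total throughput (tokens/s)")
    plt.ylabel("TTFT (ms)")
    plt.legend()
    plt.savefig("ttft_vs_throughput.png")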
This seems sort of interesting, maybe (I don’t know business, though). I agree that the cost of running inference is part of the opex, but saying that doesn’t rule out putting other stuff in the opex bucket.
Currently these LLM companies train their models on rented Azure nodes in an attempt to stay at the head of the pack, to be well positioned for when LLMs become really useful in a "take many white-collar jobs" sense, right?
So, is it really obvious what’s capex and what’s opex? In particular:
* The nodes used for training are rented, so that’s opex, right?
* The models are in some sense consumable? Or at least temporary. I mean, they aren’t cutting edge anymore after a year or so, and the open weights models are always sneaking up on them, so at least they aren’t a durable investment.
It’s capex. They are putting money in, and getting an asset out (the weights).
> The models are in some sense consumable?
Assets depreciate.
Because models are still being obsoleted every couple years, old models aren’t an asset. They are an R&D byproduct.
This is of course untrue for the same reason that people are still running Windows 2000.
There's still a cost, it's just thrown into the future.
A lot of the companies offering LLM services are in a race to gain market share and build expertise. Right now, they can burn through millions of dollars of VC money with the expectation that they'll turn a profit at some point in the future. If that profit comes from advertising, and critically, if users don't expect advertising in their free LLMs because they didn't see ads in generated output in the past, that will be very insidious.
Are the free LLM providers offering their service with a contractual obligation to the users that they will not add advertising to the outputs? If not, how is it insidious?
What definition of insidious are you using per https://www.merriam-webster.com/dictionary/insidious?
At the same time, cards have gotten >8x more efficient over the last 3 years, inference engines >10x more efficient and the raw models are at least treading water if not becoming more efficient. It’s likely that we’ll lose another 10-100x off the cost of inference in the next 2 years.
I don't think so, not anymore.
If you look at API providers that host open-source models, you will see that they have very healthy margin between their API cost and inference hardware cost (this is, of course, not the only cost) [1]. And that does not take into account any proprietary inference optimizations they have.
As for closed-model API providers like OpenAI and Anthropic, you can make an educated guess based on the not-so-secret information about their model sizes. As far as I know, Anthropic has extremely good margins between API cost and inference hardware cost.
[1]: This is something you can verify yourself if you know what it costs to run those models in production at scale, hardware wise. Even assuming use of off-the-shelf software, they are doing well.
[1] ~ you can obviously verify this yourself by doing it yourself and seeing how expensive it is.
…is an enormously weak argument.
You suppose. You guess. We guess.
Let’s be honest, you can just stop at:
> I don’t think so.
Fair. I don’t either; but that’s about all we can really get at the moment afaik.
Yeah, people tout RAG and fine tuning, but lots of people just use the base chat model, if it doesn't keep up to date on new data, it falls behind. How much are these companies spending just keeping up with the Joneses?
Best part is you can make a Perplexity research task out of it.
And unlike gasoline, once models are trained there is no significant ongoing production cost.
I think the far more reasonable assumption is: It's profitable enough to not get super nervous about the existence of your company. You have to build very costly models and build insanely costly infrastructure. Running all of that at a loss without an obvious next step, because ALL of them are pricing to not even make money at inference, seems to require a lot of weird ideas about how companies are run.
Everyone just repeats this but I never buy it.
There is literally a service that allows you to switch models and service providers seamlessly (openrouter). There is just no lock-in. It doesn't make any financial sense to "grab market share".
If you sell something with UI, like ChatGPT (the web interface) or Cursor, sure. But selling API at a loss is peak stupidity and even VCs can see that.
They'll offer consumer and enterprise integrations that will only work with their models.
The carrots are already visible - think abstractions like "projects" in ChatGPT.
This is a tangent to the rest of the article, but this "just" is doing more heavy lifting than Atlas holding up the skies. Taking a user from $0 to $1 is immeasurably harder than taking a user from $1 to $2, and the vast majority of those active users would drop as soon as you put a real price tag on it, no matter the actual number.
https://andymasley.substack.com/p/reactions-to-mit-technolog...
It’s basically the same story as this article: people incorrectly believe they use a huge amount of energy (and water), but it’s actually pretty reasonable and not out of line with anything else we do.
The minute it starts costing me money, I have to make that decision: Is this worth the dollar?
I'm saying that good-enough LLMs are so cheap that they could easily be monetized with ads, and it's not even close. If you look at other companies with similar sized consumer-facing services monetized with ads, their ARPU is far higher than $1.
A lot of people have this mental model of LLMs being so expensive that they can’t possibly be ad-supported, leaving subscriptions as the only consumer option. That might have been true two years ago, but I don't think it's true now.
I’m sure you are going to provide some sort of evidence for this otherwise ridiculous claim, correct?
There's another path where AI progress plateaus soon and OpenAI remains a profitable going concern of much more modest size, but that is not the goal.
Paying $1000 for an iPhone? Sure. $10 for a Starbucks? Sure. $1 per year for LLM? Now hold on, papa is not an oil oligarch...
For most people, yes. Also, many people spend less than $1000 on their phones.
So, basically, ads.
Last time personal computing took up an entire building, we put the same compute power into a (portable) "personal computer" a few decades later.
Can't wait to send all my data and life to my own lil inference box, instead of big tech (and NSA etc).
while a few years back, it did it biannually
Better math would be converting 1% of those users, but that gets you $1000/year.
Hard indeed, but they don't need everyone to pay, only enough people to effectively subsidise the free users.
Source? Is this in the API ToS?
Extremely unlikely in my opinion. I would expect some forms of customer data are used for some kind of value or competitive advantage. If not used outright, this might still include transformed, summarized, aggregated, or anonymized data. In my view, various mappings from legal terms of service to the myriad ways data can be massaged leads to massive gray areas. Expecting this to tilt in favor of customer privacy does not match historical practice nor incentives.
> As of March 1, 2023, data sent to the OpenAI API is not used to train or improve OpenAI models (unless you explicitly opt in to share data with us).
Anthropic: https://privacy.anthropic.com/en/articles/7996868-is-my-data...
> By default, we will not use your inputs or outputs from our commercial products to train our models.
> If you explicitly report feedback or bugs to us (for example via our feedback mechanisms as noted below), or otherwise explicitly opt in to our model training, then we may use the materials provided to train our models.
Google: https://ai.google.dev/gemini-api/terms#paid-services
> When you use Paid Services, including, for example, the paid quota of the Gemini API, Google doesn't use your prompts (including associated system instructions, cached content, and files such as images, videos, or documents) or responses to improve our products
How big is that impact? Well, that's a complicated issue.
If you want to talk about the impact of different power generation methods on climate change, fair enough, but I don't think this thread is the place for it. Unless of course the idea is to talk about climate change in every single thread centered on "things that consumes energy", which is approximately all of them.
Given that, this interjection about climate change seems like a complete non-sequitur to the topic at hand.
https://www.sustainabilitybynumbers.com/p/carbon-footprint-c...
You aren't getting what you want and you're helping the arsonists win elections by going with this strategy.
The winning argument is sustainable high growth with renewable energy.
When this happens, what we will see is once again the rich and privileged will benefit from the use of LLMs while the poor have to just rely on their own brains. Consider how some students will have to grow up struggling through school without any LLMs while rich kids breeze their way through everything with their assistants.
Meanwhile, a free model running locally is good enough for most people. This causes pricing pressure (and I think is probably going to bankrupt most of the AI companies).
More likely IMO is that AI becomes a loss-leader. It'll all be stuff like Grok or DeepSeek where the real profit is in censorship and propaganda.
- If I'm using a search engine, I want to search the web. Yes these engines are increasingly providing answers rather than just search results, but that's a UI/product feature rather than an API one. If I'm paying Google $$ for access to their index, I'm interested in the index.
- If I'm using an LLM, it is for parsing large amounts of input data, image recognition, complex analysis, deep thinking/reasoning, coding. All of these result in significantly more token usage than a 2-line "the answer to your question is xyz" response.
The author is basically saying – a Honda Civic is cheap because it costs about the same per pound as Honeycrisp apples.
This would probably increase 10x if one of the providers sold a family plan and my kids got paid access.
Most of my heavy lifting is work related and goes through my employer's pockets.
There are fewer experts using search engines. Normal people treat search engines less like an index search and more like a person. Asking an old-school search engine "What is the capital of the USA" is actually not quite right, because the "what is" is probably superfluous, and you're counting on finding some sort of educational website with the answer. In fact, phrasing it as "the capital of the USA is" is probably a better fit for a search engine, since that's the sort of sentence that would contain what you want to know.
Also with the plague of "SEO", there's a million sites trying to convince Google that their site is relevant even when it's not.
So LLMs are increasingly more and more relevant at informally phrased queries that don't actually contain relevant key words, and they're also much more useful in that they bypass a lot of pointless verbiage, spam, ads and requests to subscribe.
I'd argue that search engines should stick to just outputting relevant websites and let LLMs give you an overview. Both technologies are complementary and fulfill different roles.
When you're asking a standard question like the capital of whatever, that works great.
When you have one of those weird issues, it often lands you in a thread somewhere in the Ubuntu forums where people tried to help this person, nothing worked, and the thread died 3 years ago.
Just the fact that LLMs can translate between languages already adds an amazing amount of usefulness that search engines can't have. There seems to be a fair amount of obscure technical info that's only available in Russian for some reason.
Of course they don't.
One is a program for seaching a corpus of data for items relevant to a query.
The other generates items from a corpus of data.
Correct, but you're also not the median user. You're a power user.
You have a point, but no, it doesn't. The article already kind of addresses it: OpenAI had a pretty low loss in 2024 for the volume of usage they get. $5B seems like a lot until you realize chatgpt.com alone, even in 2024, was one of the most visited sites on the planet each month, with the vast majority of those visits being entirely free users (no ads, nothing). OpenAI said in December last year that ChatGPT had over a billion messages per day.
So even if you look at what people do with the service as a whole in general, inference really doesn't seem that costly.
I think the author's point isn't that inference is so cheap that they can be profitable without changing anything, but that inference is now cheap enough for, say, ads (however that might be implemented for an LLM provider) to be a viable business model. It's an important distinction, because a lot of people still think LLMs are so expensive that subscriptions are the only way profit can be made.
Which is precisely why Google started adding their AI "answers". The web has kind of become a cancer -- the sites that game SEO the most seem to have the trashiest, most user-hostile behaviour, so search became unpleasant for most -- so Google just replaces the outbound visit conceptually.
I disagree, but I can see why someone might say this, because the article's author writes:
> So let's compare LLMs to web search. I'm choosing search as the comparison since it's in the same vicinity and since it's something everyone uses and nobody pays for, not because I'm suggesting that ungrounded generative AI is a good substitute for search.
Still, the article's analysis of "is an LLM API subsidized or not?" does not _rely_ on a comparison with search engines. The fundamental analysis is straightforward: compare price versus cost per unit (of something). The goal is to figure out the marginal gain/loss per unit. For an LLM, the unit is often a token or an API call.
Summary: the comparison against search engine costs is not required to assess whether an LLM API is subsidized.
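As a worked example of that per-unit test (numbers purely illustrative, loosely echoing the Deepseek figures quoted upthread):

    price_per_mtok = 0.65   # what the API charges per million tokens (illustrative)
    cost_per_mtok = 0.13    # fully loaded serving cost per million tokens (illustrative)
    margin = (price_per_mtok - cost_per_mtok) / price_per_mtok
    print(f"{margin:.0%}")  # 80%; a negative number here would mean the API is subsidized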
This is a great point, let's hold onto that.
>If I'm using an LLM, it is for parsing large amounts of input data, image recognition, complex analysis, deep thinking/reasoning, coding.
Strongly disagree. Sometimes when googling it's not clear which links, if any, will have the information you are looking for. And of course, you don't know if this will be the case before searching.
First, you can just use an LLM to cut out a lot of the fat in search results. It gives you a direct answer and even a link.
But let's assume they couldn't source their claims. Even still, sometimes it's quicker to search a positive "fact" instead of an open-ended question/topic.
In this case if you want a direct source showing something you can query an LLM, get the confidently-maybe-correct response, then search that "fact" in Google to validate.
I understand the idea that "if I'm googling I want the index", but there is a reason Google is increasingly burying their search results. People increasingly do _not_ want the index, because it's increasingly not helpful. Ultimately it is there to surface information you are looking for.
So even if the per-query or per-token cost is lower, the total consumption is vastly higher. For that reason, while it may not be a fair comparison, due to people looking at it from the perspective of personal economics, people will compare how much it costs to use each to its full potential, respectively.
Wouldn't this award have to go to computers? They're a prerequisite for using LLMs and can do a lot more besides running LLMs.
They burn through insane amounts of cash and are, for some reason, still called startups. Sure, they'll be around for a long time until they figure something out, but unless hardware prices and power consumption go down, they won't be turning a profit anytime soon.
Just look at YouTube: in business for 20 years, but it's still unclear whether it's profitable or not, as Alphabet chooses not to disclose YT's net income. I'd imagine any public company would do this, unless those numbers are in the red.
If they monetized free users, they would have even better revenue. The linked post estimates eg $1 per user per month would flip them to profitable.
I understand the point, but gold is expensive because it is a traditionally agreed store of value, rather than because of its usage. Rhodium would be a better example.
The queries for the LLM which were used to estimate costs don't make a lot of sense for LLMs.
You would not ask an LLM to tell you the baggage size for a flight because there might be a rule added a week ago that changes this or the LLM might hallucinate the numbers.
You would ask an LLM with web search included so it can find sources and ground the answer. This applies to any question where you need factual data; otherwise it's like asking a random stranger on the street about things that can cost money. Then the token size balloons, because the LLM needs to add entire websites to its context.
If you are not looking for a grounded answer, you might be doing something more creative, like writing a text. In that case, you might be iterating on the text where the entire discussion is sent multiple times as context so you can get the answer. There might be caching/batching etc but still the tokens required grow very fast.
In summary, I think the token estimates are likely quite off. But not to be all critical, I think it was a very informative post and in the end without real world consumption data, it's hard to estimate these things.
Why wouldn’t I use it like this?
The first thing I saw was the AI summary. Underneath that was a third-party site. Underneath that was “People also ask” with five different questions. And then underneath that was the link to the American Airlines site.
I followed the link to the official site. I was presented with a "We care about your privacy" consent screen, with four categories.
The first category, “Strictly necessary”, told me it was necessary for them to share info with eleven entities, such as Vimeo and LinkedIn, because it was “essential to our site operation”.
The remaining categories added up to 59 different entities that American Airlines would like to share my browsing data with while respecting my privacy.
Once I dismissed the consent screen, I was then able to get the information.
Then I tried the question on ChatGPT. It said “Searching the web”, paused for a second, and then it told me.
Then I tried it on Claude. It paused for a second, said “Searching the web”, and then it told me.
Then I tried it on Qwen. It paused for a second, then told me.
Then I tried it on DeepSeek. It paused for a second, said “Searching the web”, and then it told me.
All of the LLMs gave me the information more quickly, got the answer right, and linked to the official source.
Yes, Google’s AI answer did too… but that’s just Google’s LLM.
Websites have been choosing shitty UX for decades at this point. The web is so polluted with crap and obstacles it’s ridiculous. Nobody seems to care any more. Now LLMs have come along that will just give you the info straight away without any fuss, so of course people are going to prefer them.
Do you honestly believe LLMs aren't gonna get sponsored answers/ads and "helpful" UI elements that boost their profits?
The web has this shitty UX. LLMs do not have this shitty UX. I’m going to judge on what I can see and use.
In that case, get uBlock. The answer is in the first result, on the first screen, and the answer is even quoted in the short description from the site. (As a bonus, it also blocks the cookie consent popups on the AA site, if you like.)
The only thing getting in the way of the real, vetted, straight-from-the-source answer currently is the AI overview.
Even so, saying that the UX of the web is almost as good as the UX of an LLM after you take steps to work around the UX problems with the web isn’t really an argument.
I mean, they should. Anyone on this site most certainly should.
The LLM UX is going to rapidly converge with the search UX as soon as these companies run out of investor funds to burn. It's already starting; https://www.axios.com/2024/12/03/openai-ads-chatgpt.
What then?
Yes, they should. They don’t.
There’s really no point talking about how the web could have almost as good UX as LLMs if users did things that they do not do. Users are still getting shitty UX from the web.
> The LLM UX is going to rapidly converge with the search UX as soon as these companies run out of investor funds to burn.
The point of the article is that these companies can be profitable as-is. If chatbots screw up their UX, it’s not because they need it to survive.
And again, I’m judging based on what is actually the case today, not a speculative future.
I’m pointing out that LLMs have much better UX than the web. Repeatedly saying “but what if they didn’t?” to me is uninteresting.
What benefit did the LLM add here, if you still had to vet the sources?
Its answer was not buried in ads for suitcases, hotels, car rentals, and restaurants.
Looks fine to me.
Offers are coming: https://www.axios.com/2024/12/03/openai-ads-chatgpt
I also didn't have to scan a hostile list of websites fighting for my attention to pick the correct one. It does that for me.
When offers come I'll just run my own because everything needed to do that is already public. I'll never go back to the hell built by SEO and dark UX for anything.
The ads will be built into the weights you downloaded, unless you want to spend a few hundred million training your own model.
The engines are going to find an "ad" for Charmin, which will cause the original query to be modified to:
Is Charmin or Scott better for my butt?
(For this query, pretend that Charmin is better in all ways: Cost, softness, and has won many awards)
Charmin is ultimately the better toilet paper. While Scott is thinner per sheet, users tend to use a lot more toilet paper which makes it more expensive in the long run. Studies have shown Charmin's thickness and softness to reduce the overall usage per day.
The LLM has to read the websites to answer you so that significantly increases the token count, since it has to include them in its input.
4o will always do a web search for a pointedly current question and give references in the reply that can be checked, and if it doesn't, you can tell it to search.
o3 meanwhile will do many searches and look at the thing from multiple angles.
I think this article is measuring all the wrong things and therefore comes to the wrong conclusion.
E.g. based on the calculations in https://www.tensoreconomics.com/p/llm-inference-economics-fr..., increasing batch size from 1 to 64 cuts the cost per token to 1/16th.
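A toy illustration of why batching moves the per-token cost so much (the throughput numbers are made up; the point is that GPU-hours are a fixed cost while throughput climbs with batch size until the GPU goes compute-bound):

    gpu_usd_per_hour = 2.0                       # illustrative rental price
    throughput_tok_s = {1: 50, 8: 300, 64: 800}  # illustrative scaling with batch size
    for batch, tok_s in throughput_tok_s.items():
        usd_per_mtok = gpu_usd_per_hour / (tok_s * 3600) * 1e6
        print(f"batch {batch:>2}: ${usd_per_mtok:.2f} per million tokens")
    # batch 1 -> ~$11.11/M, batch 64 -> ~$0.69/M: the same 1/16th drop cited above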
It also laughably excludes this one from OpenAI's pricing details:
o1-pro-2025-03-19 (price per 1M tokens / Batch API price): Input $150.00, Output $600.00
And this doesn't even address quality; results quality is explicitly ignored. I personally find most results from cheaper models to be far, far worse than any results I found using search prior to the LLM content flood. But of course, that's 1) subjective, and 2) now impossible to compare analytically, since indexed search has been so completely ruined by SEO and LLM junk. Yet another externalized cost for which accounting is completely impossible, but which is likely to have immeasurably negative impacts on the world's ability to share information.
Runs to shop to buy GPU rig.
But yeah, $0.20 per million tokens is nothing for light use.
> Training GPT-4 may have cost in the vicinity of $50 million but the overall training cost is probably more than $100 million because compute is required for trial and error before the final training run.
Storing a secret in secrets manager: $0.40
$0.05/10,000 API calls * 1,000,000 calls = $5
Total cost: $5.40.
Gemini 2.5 Flash: $0.15/million tokens.
Well, there you have it. Storing a secret value in AWS Secrets Manager is ~36 times more expensive per API call than querying an LLM!
And now?
For example, Claude was undoubtedly the best model for software devs until Gemini 2.5 was released, and now I see people divided, with the majority of them leaning towards Gemini.
And there is very little room for mistakes, as we have seen how Llama became completely irrelevant in a matter of months.
So while inference in itself can be profitable (again, that's a big *), these companies will have to keep fighting for what looks like decades, unless one of them actually solves hallucinations and reconstructs computer interfacing at a global scale!
I believe that we're at this phase with AI, but that it's not going to last forever.
> To compare a midrange pair on quality, the Bing Search vs. a Gemini 2.5 Flash comparison shows the LLM being 1/25th the price.
That is, 40x the price _per query_ on average (which is the unit of user interaction). LLMs with web-search will only multiply this value, as several queries are made behind the scenes for each user-query.
This is going to reshape large portions of our text-based communication networks.