Edit: It says on the JetBrains website:
“The AI Assistant plugin is not bundled and is not enabled in IntelliJ IDEA by default. AI Assistant will not be active and will not have access to your code unless you install the plugin, acquire a JetBrains AI Service license and give your explicit consent to JetBrains AI Terms of Service and JetBrains AI Acceptable Use Policy while installing the plugin.”
They didn’t cancel my existing ‘AI Pro’ subscription though, and have just let it keep running with no refunds.
Thanks, JetBrains. You get worse every day.
Which, of course, is to donate money to Sama so he can create AGI and be less lonely with his robotic girlfriend, I mean...change the world for the better somehow. /s
Then you can think about automated labs. If things pan out, we can have the same thing in chemistry/bio/physics. Having automated labs definitely seems closer now than 2.5 years ago. Is cost relevant when you can have a lab test formulas 24/7/365? Is cost a blocker when you can have a cure to cancer_type_a? And then _b_c...etc?
Also, remember that costs go down within a few generations. There's no reason to think this will stop.
In that bright AGI future, who does my business serve, like who actually are my actual paying clients? Like, the robots are farming, the robots are driving, the robots are "creating" and robots are "thinking", right? In that awesome future, what paid jobs do us humans have, so my clients can afford my amazing entrepreneurial business that I just bootstrapped with the help of 100s of agents? And how did I get the money to hire those 100s of agents in the first place?
> Is cost a blocker when you can have a cure to cancer_type_a? And then _b_c...etc?
Yes, it very much is. The fact that even known and long-discovered solutions like insulin for diabetes management are being sold to people at 9x their actual price should speak volumes to you: while it's great to have cures for X, Y and Z, control over the production and distribution of these cures is equally, if not much more, important for the cure to actually reach people. In this rosy world of yours, do you think Zuck will give you his LLAMAGI-generated cancer cure out of the goodness of his heart? We are talking about the same dude that helped a couple of genocides and added ads to WhatsApp to squeeze the last cent out of people who are trapped with an app that gets progressively worse and more invasive.
https://www.rand.org/news/press/2024/02/01/index1.html
https://systemicjustice.org/article/facebook-and-genocide-ho...
> Also, remember that costs go down within a few generations. There's no reason to think this will stop.
The destruction of the natural world, the fires all around us, the rise of fascism and nationalism, the wars that are spawning all over the place and the fact that white and blue collar jobs are being automated out while soil erosion and PFAS make our land infertile point to a different future. But yeah, I am simply ecstatic at the possibility that the costs of generating a funny picture Ghibli style with a witty caption could go down by 10 to 30%.
On one of the systems I'm developing, I'm using LLMs to compile user intents to a DSL, without ever looking at the real data to be examined. There are ways around this; increased context length is bad for speed, cost and scalability.
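For anyone curious what this pattern looks like, here's a rough sketch (all names are made up, and the LLM call is stubbed out): the model only ever sees the user's request and the DSL grammar; the program it emits is executed locally over the real data.

```python
# Hypothetical intent-to-DSL setup: the LLM sees only the user's
# request and the DSL grammar, never the data being queried.
import json
import operator

OPS = {"eq": operator.eq, "gt": operator.gt, "lt": operator.lt}

def run_filter(program, rows):
    """Execute a tiny JSON filter DSL locally, over the real data."""
    clause = json.loads(program)  # e.g. {"field": "age", "op": "gt", "value": 30}
    op = OPS[clause["op"]]
    return [r for r in rows if op(r[clause["field"]], clause["value"])]

# In the real system an LLM call would produce `program` from a prompt
# like "show me users older than 30"; here its output is hard-coded.
program = '{"field": "age", "op": "gt", "value": 30}'
rows = [{"name": "ana", "age": 25}, {"name": "bo", "age": 41}]
print(run_filter(program, rows))  # → [{'name': 'bo', 'age': 41}]
```

The prompt stays tiny regardless of dataset size, which is the whole point.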
I managed a deep learning team at Capital One and the lock-in thing is real. Replit is an interesting case study for me because after a one-week free agent trial I signed up for a one-year subscription, had fun with their LLM-based coding agent for a few weeks, and almost never used it after that, but I still have fun with Replit as an easy way to spin up Nix-based coding environments. Replit seems to offer something for everyone.
And everything, I mean everything, after the title goes downhill:
> saying "this car is so much cheaper now!" while pointing at a 1995 honda civic misses the point. sure, that specific car is cheaper. but the 2025 toyota camry MSRPs at $30K.
Cars got cheaper. The only reason you don't feel it is the trade barrier that stops BYD from flooding your local dealers.
> charge 10x the price point. $200/month when cursor charges $20. start with more buffer before the bleeding begins.
What does this even mean? The cheapest Cursor plan is $20, just like Claude Code. And the most expensive Cursor plan is $200, just like Claude Code. So clearly they're at the exact same price point.
> switch from opus ($75/m tokens) to sonnet ($15/m) when things get heavy. optimize with haiku for reading. like aws autoscaling, but for brains.
> they almost certainly built this behavior directly into the model weights, which is a paradigm shift we’ll probably see a lot more of
"I don't know how Claude built their models and I have no insider knowledge, but I have very strong opinions."
> 3. offload processing to user machines
What?
> ten. billion. tokens. that's 12,500 copies of war and peace. in a month.
Unironically quoting data from the viberank leaderboard, which is just user-submitted numbers...
> it's that there is no flat subscription price that works in this new world.
The author doesn't know what throttling is...?
I've stopped reading here. I should've just closed the tab when I saw the first letter in each sentence isn't capitalized. This is so far the most glaring signal of slop. More than the overuse of em-dash and lists.
> when I saw the first letter in each sentence isn't capitalized. This is so far the most glaring signal of slop.
How so? It's the exact opposite imho. Lowercase everything with a staccato writing style to differentiate from AI slop, because LLMs usually don't write lowercase.
This comes across as sloppily written, but not sloppily generated.
This has been working great for the occasional use, I'd probably top up my account by $10 every few months. I figured the amount of tokens I use is vastly smaller than the packaged plans so it made sense to go with the cheaper, pay-as-you-go approach.
But since I've started dabbling in tooling like Claude Code, hoo-boy those tokens burn _fast_, like really fast. Yesterday I somehow burned through $5 of tokens in the space of about 15 minutes. I mean, sure, the Code tool is vastly different to asking an LLM about a certain topic, but I wasn't expecting such a huge leap. A lot of the token usage is masked from you, I guess, wrapped up in the ever-increasing context plus back-and-forth tool orchestration, but still.
If I (and billions others) can be bothered to learn your damn language so we can all communicate, do us a service and actually use it properly, FFS.
The article repeats this throughout, but isn't it a straight lie? The plan was named 20x because it's 20x the usage limits; it always had enforced 5-hour session limits, and it always had (unenforced? soft?) 50-sessions-per-month limits.
It was limited, but not enough and very very probably still isn't, judging by my own usage. So I don't think the argument would even suffer from telling the truth.
I can’t believe how many comments and articles I’ve read that assume it was unlimited.
It’s like it has been repeated so many times that it’s assumed to be true.
I am seeing problems with formatting that seemed 'solved' already.
I mean, I have seen "the same" model get better and worse already.
clearly somebody is calibrating the stupidity level relative to energy cost and monetary gain
You definitely want that for some tasks, but for the majority of tasks there is a lot of space for cheap & cheerful (and non-thinking)
> consumers hate metered billing. they'd rather overpay for unlimited than get surprised by a bill.
Yes and no.
Take Amazon. You think your costs are known and WHAMMO surprise bill. Why do you get a surprise bill? Because you cannot say 'Turn shit off at X money per month'. Can't do it. Not an option.
All of these 'Surprise Net 30' offerings are the same. You think you're getting a stable price until GOTCHA.
Now, metered billing can actually be good, when the user knows exactly where they stand on the metering AND can set maximums so their budget doesn't go over.
Taken realistically, as an AI company, you provide a 'used tokens/total tokens' bar graph, tokens per response, and an estimated number of responses before exceeding the limit.
Again, don't surprise the user. But that's anathema to companies who want to hide the tokens-to-dollars conversion, the same way gambling companies obfuscate 'corporate bux' to USD.
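A meter like that is trivial to build once the provider decides to expose the numbers. A toy sketch (all figures illustrative):

```python
# Toy "don't surprise the user" meter: used vs. total tokens plus an
# estimate of responses remaining. All numbers are illustrative.
def usage_meter(used, total, avg_tokens_per_response, width=20):
    filled = round(width * used / total)
    remaining = max(total - used, 0) // avg_tokens_per_response
    bar = "#" * filled + "-" * (width - filled)
    return f"[{bar}] {used}/{total} tokens, ~{remaining} responses left"

print(usage_meter(used=750_000, total=1_000_000, avg_tokens_per_response=5_000))
# → [###############-----] 750000/1000000 tokens, ~50 responses left
```

The hard part isn't the code; it's that the provider has to be willing to show you the numbers.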
It’s nearly impossible to tell what the hell is going on where, and we are mostly surviving on enterprise discounts from negotiations.
The worst thing is they worked out you can blend costs in using AWS marketplace without having to raise due diligence on a new vendor or PO. So up it goes even more.
Not my department or funeral fortunately. Our AWS account is about $15 a month.
Not a bug, a feature.
I am not saying this is desirable, but it is necessary IFF you choose to use these services. They are complex by design, and intended primarily for large-scale users who do have the expertise to handle the complexity.
Pricing schemes like these just make them move back to virtual machines with "unlimited" shared cpu usage and setting up services (db,...) manually.
You could also have potential customers who would be interested in your solution, but don't want it hosted by an American company. Spinning up a few Hetzner VMs is easy. Finding European alternatives to all the different "serverless" services Amazon offers is hard.
Not happened yet. The nearest I have come to it was a requirement that certain medical information stays in the UK, and that is satisfied by using AWS (or other American suppliers) as long as its hosted in the UK.
Most small businesses I have dealt with that use AWS just need a VPS. If they are unwilling to move to a scary unknown supplier I suggest (unknown to them; very often one that would be well known to people on HN), then I suggest AWS Lightsail, which is pretty much a normal VPS with VPS pricing - it's significantly cheaper than an instance plus storage, just from buying them bundled (which, to be fair to Amazon, is common practice).
My own stuff goes on VPSs.
At that point wouldn't it simply be cheaper to do VMs?
I think a lot of people are missing a key part of the wording of my comment, that capitalised for emphasis "IFF" (which means "if and only if").
I am absolutely certain a lot of people would save money using VMs - or at scale bare metal.
IMO a lot of people are using AWS because it is a "safe" choice management buy into that is not expensive in context (its not a big proportion of costs).
The point where you get sticker shock from AWS is often significantly lower than the point where you have enough money to hire in either of those roles. AWS is obviously the infrastructure of choice if you plan to scale. The problem is that scaling on expertise isn’t instant and that’s where you’re more likely to make a careless mistake and deploy something relatively costly.
This:
> The point where you get sticker shock from AWS is often significantly lower than the point where you have enough money to hire in either of those roles
makes me doubt this:
> AWS is obviously the infrastructure of choice if you plan to scale.
What a baffling comment. Is it normal to even consider hiring someone to figure out how you are being billed by a service? You started with one problem and now you have at least two? And what kind of perverse incentive are you creating? Don't you think your "finops" person has a vested interest in preserving their job by ensuring billing complexity will always be there?
Is it, though? At best someone wearing that hat will explain the bill you're getting. What value do you get from that?
To cut costs, either you micro-optimize things, or you redesign systems to shed expenses. The former gets you nothing; the latter is not something a "finops" (whatever that is supposed to mean) brings to the table.
I did say it applies IFF and only IFF you choose to use these services, and if you have chosen to use these services you have presumably decided they are good value for money. If not, why are they using AWS.
Of course the complexity and extra cost of managing the billing is something that someone who has chosen to use AWS has already factored in, right?
The alternative is to not use AWS.
If and only if and only if and only if? :)
(also, while on the topic, I think a simple "if" covers it here, since the relationship is not bidirectional)
Absolutely. This was common for complicated services like telecom/long distance even in the pre-cloud days. Big companies would have a staff or hire a service to review telecom bills and make sure they weren’t overpaying.
I made it clear that you ask the user to choose between 'accept risk of overrun and keep running stuff', 'shut down all stuff on exceeding $ number', or even a 'shut down these services on exceeding number', or other possible ways to limit and control costs.
The cloud companies do not want to permit this because they would lose money over surprise billing.
Up until recently, you could hit somebody else's S3 endpoint, no auth, and get 403s that would charge them tens of thousands of dollars. Couldn't even firewall it. And no way to see it coming, or anything. Number go up every 15-30 minutes in the cost dashboard.
Real responsibility is 'I have $100 a month for cloud compute'. Give me an easy way to view it, and shut down if I exceed that. That's real responsibility, that Scamazon, Azure, Google - none of them 'permit'.
They (and well, you) instead say "you can build some shitty clone of the functionality we should have provided, but we would make less money".
Oh, and your lambda job? That too costs money. It should not cost more money to detect and stop stuff on 'too much cost' report.
This should be a default feature of cloud: uncapped costs, or stop services
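For the record, the policy mechanism people keep asking for is not complicated. A toy sketch of the cap-and-policy idea (the Service type and policy names are made up, not any real cloud API):

```python
# Toy cap-and-policy enforcement; Service and the policy names are
# invented for illustration, not any real cloud provider's API.
from dataclasses import dataclass

@dataclass
class Service:
    name: str
    critical: bool
    running: bool = True

def enforce_cap(spend, cap, policy, services):
    if spend < cap or policy == "accept_overrun":
        return services  # under budget, or user explicitly accepted the risk
    for s in services:
        if policy == "stop_all" or (policy == "stop_noncritical" and not s.critical):
            s.running = False
    return services

svcs = [Service("web", critical=True), Service("batch-jobs", critical=False)]
enforce_cap(spend=120.0, cap=100.0, policy="stop_noncritical", services=svcs)
print([(s.name, s.running) for s in svcs])  # → [('web', True), ('batch-jobs', False)]
```

The user picks the policy up front; no surprise bill, no surprise outage they didn't consent to.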
Perhaps requiring support for bill capping is the right way to go, but honestly I don’t see why providers don’t compete at all here. Customers would flock to any platform with something like “You set a budget and uptime requirements, we’ll figure out what needs to be done”, with some sort of managed auto-adjustment and a guarantee of no overage charges.
Ah well, one can only dream.
Because the types of customers that make them the most money don't care about any of this stuff. They'll happily pay whatever AWS (or other cloud provider) charges them, either because "scale" or because the decision makers don't realize there are better options for them. (And depending on the use case, sometimes there aren't.)
This is the exact same thing that frustrates me with GitHub's AI rollout. I've been trialing the new Copilot agent, and its cost is fully opaque. Multiple references to "premium requests" that don't show up in real time in my dashboard, it's not clear how many I have in total/left, and when these premium requests are referenced in the UI they link to documentation that also doesn't talk about limits (instead of linking to the associated billing dashboard).
* One chat message -> one premium credit (most at 1 credit but some are less and some, like opus, are 10x)
* Edit mode is the same as Ask/chat
* One agent session (meaning you start a new agent chat) is one "request" so you can have multiple messages and they cost the credit cost of one chat message.
Microsoft's Copilot offerings are essentially a masterclass in cost opaqueness. Nothing in any offering is spelled out, and they always seem to be just short of the expectation they are selling. 300 or 1500 premium requests per month depending on plan, and $0.04 per premium request, I believe.
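Taking the figures quoted above at face value (the included-request counts and the $0.04 rate are my recollection, treat them as assumptions), the overage math looks like this:

```python
# Overage cost given a plan's included premium requests; the 300/1500
# included counts and the $0.04 rate are assumed from memory.
def copilot_overage(requests_used, included, rate=0.04):
    return round(max(requests_used - included, 0) * rate, 2)

# A heavy month on the smaller plan: 1000 requests against 300 included.
print(copilot_overage(1000, included=300))  # → 28.0
```

Note that some models reportedly multiply the credit cost (e.g. Opus at 10x per message), which makes the "requests used" number even harder to predict.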
This is the coding agent, the asynchronous Copilot, not the agent chat mode in the Copilot plugins for VS Code etc.
But for users, that fine grained cost is not good, because you’re forcing a user to be accountable with metrics that aren’t tied to their productivity. When I was an intern in the 90s, I was at a company that required approval to make long distance phone calls. Some bureaucrat would assess whether my 20 minute phone call was justified and could charge me if my monthly expense was over some limit. Not fun.
Flat rate is the way to go for user AI, until you understand the value in the business and the providers start looking for margin. If I make a $40/hr analyst 20% more productive, that's worth $16k a year of value - the $200/mo ChatGPT Pro is a steal.
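The back-of-envelope math, spelled out (the 2,000 working hours a year is the assumption doing the heavy lifting):

```python
# Integer percentages keep the arithmetic exact.
hourly_rate = 40            # $/hr analyst
hours_per_year = 2000       # assumption: full-time
productivity_gain_pct = 20  # "20% more productive"

value_per_year = hourly_rate * hours_per_year * productivity_gain_pct // 100
subscription_per_year = 200 * 12  # $200/mo ChatGPT Pro

print(value_per_year, subscription_per_year)  # → 16000 2400
```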
But that means that if you were conned into using infrastructure that actually costs more than the alternative, making your cost structure worse, you're still going to eat the loss because it's not worth taking your devs time to switch back.
But tokens don't quite have this problem -yet. Most of us can still do development the old way, and it's not a project to turn it off. Expect this to change though.
But for AI in the context of point-solutions and on-the-job use cases, metered billing is a death blow.
In this context, metered is a massive incentive to not use the product and requires the huge friction of having to do a cost/benefit analysis before every task. And if you're using it at work you may even need management sign-off before you can use it again.
For a tool that's intended to amplify productivity, very few humans want to make a cost/benefit analysis 250 times a day about whether it's worth $3 to code up some boilerplate or not. On metered billing, they just won't use it.
That's what we're supposed to do, right?
So let's see if we can spend a few tokens to ask the LLM for a cost/benefit analysis of using an LLM to solve the problem. I'd bet we can trust the result...
This extension might make the internet more accessible for you!
If you don't care to trivially make your text readable, then we for sure don't care to spend time struggling through your text to see if there is any useful substance there.
And, I know this seems dramatic, but besides being cognitively distracting, it also makes me feel sad. Chatroom formatting in published writings is clearly a developing trend at this point, and I love my language so much. Not in a linguistic capacity - I'm not an English expert or anything, nor do I follow every rule - I mean in an emotional capacity.
I'm not trying to be condescending. This is a style choice, not "bad writing" in the typical sense. I realize there is often a lot of low-quality bitterness on both sides about this kind of thing.
Edit:
I also fear that this is exactly the kind of thing where any opinion in opposition to this style will feel like the kind of attack that makes a writer want to push back in a "oh yeah? fuck you" kind of way. I.e. even just my writing this opinion may give an author using the style in question the desire to "double down". Though this conundrum is appropriate (ironic?) - the intensely personal nature of language is part of why I love it.
SEARCH FOR “FILM CRIT HULK” FOR SOME EXAMPLES
Descriptive language is how language evolves, and the internet is the first real regional conflict area that Americans have really ever encountered without traveling.
Historically, you would have just been in your linguistic locale, with your own rules, and differences could easily have been attributed to outsiders being outsiders. The internet flattens physical distance.
Thus we have a real parallel to the different regions of Italy, where no one can understand each other, or at least the UK, where different cities have extreme pronunciation differences.
The same exists for written language, and it will continue to diverge culturally. The way I look at it is that language isn’t a thing, trapped in amber, but a river we are all wading through. Different people enter at different times, and we all subtly affect the flow.
I distinctly remember thinking “email” was the dumbest sounding word ever. Now I don’t even hear it.
It’s still fine to nitpick; we’re all battling in the descriptive war for correctness. My own personal hobbyhorse is how stupid American quotation syntax is, having learned at graduate school in the UK that you use single quotes and leave the punctuation outside of the quoted sections, which is entirely sensible!
Does this mean that other languages might offer better information density per token? And does this mean that we could invent a language that’s more efficient for these purposes, and something humans (perhaps only those who want a job as a prompt engineer) could be taught?
Kevin speak good? https://youtu.be/_K-L9uhsBLM?si=t3zuEAmspuvmefwz
https://www.science.org/content/article/human-speech-may-hav...
Granted, English is probably going to have better-quality output based on training data size.
English (and any of the dominant languages that you could use in its place) works significantly better than other languages, purely by having significantly larger bodies of work for the LLM to work from.
Maybe even something anyone can read and maybe write… so… Kevin English.
Job applications will ask for how well one can read and write Kevin.
Regarding cost per token: is a token ideally a composable, atomic unit of information? Since English is often used as an encoding format, efficiency is limited by English's encoding capacity.
Could other languages offer higher information density per token? Could a more efficient language be invented for this purpose, one teachable to humans, especially aspiring prompt engineers?
67 tokens vs 106 for the original.
Many languages don't have articles, you could probably strip them from this and still understand what it's saying.
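A toy version of that idea, just stripping English articles and comparing word counts (real tokenizers behave differently, so treat this as a rough proxy):

```python
# Strip English articles as a crude stand-in for "denser encoding".
import re

def strip_articles(text):
    return re.sub(r"\b(a|an|the)\s+", "", text, flags=re.IGNORECASE)

original = "The quick brown fox jumps over the lazy dog near a riverbank."
stripped = strip_articles(original)
print(stripped)  # → quick brown fox jumps over lazy dog near riverbank.
print(len(original.split()), len(stripped.split()))  # → 12 9
```

A 25% word-count reduction here, and the sentence is still perfectly understandable.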
On top of this Gemini CLI still doesn’t support paying through the Google AI subscription. I assume it’s some sort of bureaucratic reason that’s preventing them from moving quickly.
I don't agree with the Cognition conclusion either. Enterprises are fighting super hard to not have a long term buying contract when they know SOTA (app or model) is different every 6 months. They are keeping their switching costs low and making sure they own the workflow, not the tool. This is even more prominent after Slack restricted API usage for enterprise customers.
Making money on the infra is possible, but that again misunderstands the pricing power of Anthropic. Lovable, Replit etc. work because of Claude. OpenAI had Codex, Google had Jules; neither is as good in terms of taste compared to Claude. It's not the CLI form factor which people love, it's the outcome they like. When Anthropic sees the money being left on the table in the infra play, they will offer the same (at presumably better rates, given Amazon is an investor) and likely repeat this strategy. Abstraction is a good play, only if you abstract it to the maximum possible levels.
> when a new model is released as the SOTA, 99% of the demand immediately shifts over to it
99% is in the wrong ballpark. Lots of users use Sonnet 4 over Opus 4, despite Opus being 'more' SOTA. Lots of users use 4o over o3 or Gemini over Claude. In fact it's never been a closer race on who is the 'best': https://openrouter.ai/rankings
>switch from opus ($75/m tokens) to sonnet ($15/m) when things get heavy. optimize with haiku for reading. like aws autoscaling, but for brains.
> they almost certainly built this behavior directly into the model weights
???
Overall the article seems to argue that companies are running into issues with usage-based pricing because consumers don't accept, or aren't used to, usage-based pricing, and it's difficult to be the first to crack and switch to it.
I don't think it's as big of an issue as the author makes it out to be. We've seen this play out before in cloud hosting.
- Lots of consumers are OK with a flat fee per month and using an inferior model. 4o is objectively inferior to o3 but millions of people use it (or don't know any better). The free ChatGPT is even worse than 4o and the vast majority of chatgpt visitors use it!
- Heavy users or businesses consume via API and usage based pricing (see cloud). This is almost certainly profitable.
- Fundamentally most of these startups are B2B, not B2C
Thank you for pointing out that fact. Sometimes it's very hard to keep perspective.
Sometimes I use Mistral as my main LLM. I know it's not lauded as the top-performing LLM, but the truth of the matter is that its results are just as useful as those of the best models from ChatGPT/Gemini/Claude, and it is way faster.
There are indeed diminishing returns in the current blend of commercial LLMs. DeepSeek already proved that cost can be a major factor and quality can even improve. I think we're very close to seeing competition based on price, which might be the reason there is so much talk about mixture-of-experts approaches and how specialized models can drive down cost while improving targeted output.
It's great if you can leave it unattended, but personally, coding's an active thing for me, and watching it go is really frustrating.
The meaningful frontier isn't scalar on just capability, it's capability for a given cost. The highest-capability models are not where 99% of the demand is. Actually, the opposite.
To get an idea of what point on the frontier people prefer, have a look at the OpenRouter statistics (https://openrouter.ai/rankings). Claude Opus 4 has about 1% of their total usage, not 99%. Claude Sonnet 4 is the single most popular model at about 18%. The runners up in volume are Gemini Flash 2.0 and 2.5, which are in turn significantly cheaper than Sonnet 4.
One of the graphs even lists a "Claude 3.5 Opus", which does not exist. After 3.5 Sonnet was released, 3 Opus largely fell into irrelevance until they decided to finally release another big, expensive model with Opus 4, which still isn't anywhere near as popular as Sonnet 4 with users who pay API prices.
They can deliver pretty much whatever they feel like. Who can tell a trash token from a hallucination? And tracking token usage is a PITA.
Sum it up and it translates to: sell whatever you feel like at whatever price you feel like.
Nice!
Sure I do!
I will consistently pick the fastest and cheapest model that will do the job.
Sonnet > Opus when coding
Haiku > Sonnet when fusing kitchen recipes, or answering questions where search results deliver the bulk of the value, and the LLM part is really just for summarizing.
It is standard practice with some coding agents to have different models for different tasks, like building and planning.
Which then might lead to you using a lot more, because it offsets some other thing that costs even more still, like your time.
With the primary advancement over the past two years being chain of thought, which absolutely obliterates token counts, in what world would the "per token" value of a model be going up...
Second, why are SV people obsessed with fake exponentials? It's very clear that AI progress has only been exponential in the sense that people are throwing a lot more resources at AI than they did a couple of years ago.
Is it done like this just to show it wasn't written by a LLM?
Thou needst to live in the archaic.
Even before AI, it felt like the value of intelligence and knowledge had been dropping over time. This makes sense, as the internet has democratized access to information and promoted intellectual self-improvement. The supply of intelligence increased dramatically but demand for it struggled to keep up (in spite of the tech boom). Now demand for intelligence has plateaued; this is one way to look at the current tech layoffs.
It got to the point where intelligence is almost worthless now. 'Earning' money is mostly about social connections, not intelligence. So all these use cases which people are using AI for are pointless in terms of earning money in the current system. The current system rewards money-acquisition, it does not reward value-creation. You don't need intelligence to acquire money in this system; you need social connections. AI does not give you social connections; if anything, it takes away social connections. The people using AI to build themselves an amazing second internet will have nobody to share it with; no users, no investment.
The oversupply of intelligence means that it cannot find any serious avenues to earn a financial return, so instead it turns to political manipulations because system reform (or manipulation) is the shortest path to earning monetary returns... Though often this manipulation only further decouples value creation from money-acquisition.
This seems like such an obvious idea that I'm sure everyone is already working on it!
Not every problem needs a SOTA generalist model, and as we get systems/services that are more "bundles" of different models with specific purposes I think we will see better usage graphs.
But we're still in the hype phase, people will come to their senses once the large model performance starts to plateau
Like what? People always talk about how amazing it is that they can run models on their own devices, but rarely mention what they actually use them for. For most use cases, small local models will always perform significantly worse than even the most inexpensive cloud models like Gemini Flash.
This shouldn't be that expensive even for large prompts since input is cheaper due to parallel processing.
In the food industry is it more profitable to sell whole cakes or just the sweetener?
The article makes a great point about Replit and legacy ERP systems. The generative in generative AI will not replace storage; storage is where the margins live.
Unless the C in CRUD can eventually replace the R and U, with the D a no-op.
I really don't understand what you are trying to get at. But on that example, cakes have a higher profit margin, and sweeteners have larger scale.
AI companies advertise peak AI performance; users select AI tools on worst-case AI fuckups: hence, only SOTA is ever in demand. TFA illustrates this well.
AI will be judged on its worst performance, just like people are fired for their worst showing, not their best. No one cares about AI performance in ideal (read: carefully contrived) settings. We care how badly it fucks up when we take our eyes off it for 2 seconds.