Tokens are getting more expensive

https://ethanding.substack.com/p/ai-subscriptions-get-short-squeezed

44•admp•2h ago

Comments

michaelbuckbee•37m ago

A major current problem is that we're smashing gnats with sledgehammers via undifferentiated model use.

Not every problem needs a SOTA generalist model, and as we get systems/services that are more "bundles" of different models with specific purposes I think we will see better usage graphs.

mustyoshi•15m ago

Yeah this is the thing people miss a lot. 7B and 32B models work perfectly fine for a lot of things, and run on previously high-end consumer hardware.

But we're still in the hype phase; people will come to their senses once large-model performance starts to plateau.

simonjgreen•14m ago

Completely agree. It’s worth spending time to experiment too. A reasonably simple chat support system I built recently uses 5 different models depending on the function it’s in. Swapping out different models for different things makes a huge difference to cost, user experience, and quality.
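To make the "different models for different functions" idea concrete, here is a minimal sketch of a task-based router. The tier names, prices, and task labels are all hypothetical placeholders, not taken from any real system or provider; the point is only that the routing table can be plain data, with the largest model as the fallback.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    usd_per_million_tokens: float  # hypothetical blended price


# Hypothetical tiers, for illustration only.
SMALL = ModelTier("small-local-7b", 0.10)
MID = ModelTier("mid-32b", 0.60)
LARGE = ModelTier("frontier-generalist", 15.00)

# Hypothetical mapping from the function a request serves to a tier.
ROUTES = {
    "classify_intent": SMALL,
    "extract_fields": SMALL,
    "summarize_ticket": MID,
    "draft_reply": MID,
    "escalation_reasoning": LARGE,
}


def pick_model(task: str) -> ModelTier:
    """Return the tier for a task; unknown tasks fall back to the largest model."""
    return ROUTES.get(task, LARGE)


def estimated_cost(task: str, tokens: int) -> float:
    """Rough cost in USD for running `tokens` tokens through the routed tier."""
    return pick_model(task).usd_per_million_tokens * tokens / 1_000_000
```

Falling back to the largest tier for unknown tasks trades cost for safety; a stricter system might reject unrouted tasks instead.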

alecco•10m ago

If there was an option to have Claude Opus guide Sonnet I'd use it for most interactions. Doing it manually is a hassle and breaks the flow, so I end up using Opus too often.

This shouldn't be that expensive even for large prompts since input is cheaper due to parallel processing.
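A rough way to check that intuition is to price a "large model plans, small model executes" split against running everything on the large model. The sketch below assumes per-million-token prices of $15 in / $75 out for the large model and $3 in / $15 out for the small one; the output figures match the ones quoted elsewhere in this thread, while the input figures are an assumption. Token counts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    usd_in_per_m: float   # USD per million input tokens
    usd_out_per_m: float  # USD per million output tokens

    def cost(self, tokens_in: int, tokens_out: int) -> float:
        total = tokens_in * self.usd_in_per_m + tokens_out * self.usd_out_per_m
        return total / 1_000_000


# Assumed prices (see the note above); swap in current ones before relying on this.
LARGE = Price(usd_in_per_m=15.0, usd_out_per_m=75.0)
SMALL = Price(usd_in_per_m=3.0, usd_out_per_m=15.0)


def all_large(prompt: int, output: int) -> float:
    """The large model reads the prompt and writes the whole answer."""
    return LARGE.cost(prompt, output)


def guided(prompt: int, plan: int, output: int) -> float:
    """The large model reads the prompt and writes a short plan;
    the small model reads prompt + plan and writes the long answer."""
    planning = LARGE.cost(prompt, plan)
    execution = SMALL.cost(prompt + plan, output)
    return planning + execution


if __name__ == "__main__":
    print(all_large(prompt=5_000, output=4_000))
    print(guided(prompt=5_000, plan=500, output=4_000))
```

Under these assumptions the saving comes from moving output tokens to the cheaper model. The large model still pays its input price for the full prompt, so the gap narrows as the prompt grows relative to the output.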

nateburke•6m ago

generalist = fungible?

In the food industry is it more profitable to sell whole cakes or just the sweetener?

The article makes a great point about replit and legacy ERP systems. The generative in generative AI will not replace storage, storage is where the margins live.

Unless the C in CRUD can eventually replace the R and U, with the D a no-op.

comrade1234•35m ago

I'm kind of curious what IntelliJ's deal is with the different providers. I usually just keep it set to Claude but there are others that you can pick. I don't pay extra for the AI assistant - it's part of my regular subscription. I don't think I use the AI features as heavily as many others, but it does feed my code base to whoever I'm set to...

louthy•25m ago

Are you sure you don’t pay extra? I’m on Rider and it’s an additional cost. Unless us C# and F# devs are subsidising everyone else :D

Edit: It says on the JetBrains website:

“The AI Assistant plugin is not bundled and is not enabled in IntelliJ IDEA by default. AI Assistant will not be active and will not have access to your code unless you install the plugin, acquire a JetBrains AI Service license and give your explicit consent to JetBrains AI Terms of Service and JetBrains AI Acceptable Use Policy while installing the plugin.”

comrade1234•22m ago

When they first added the assistant it was $100/yr to enable it. However, it's now part of the subscription and they even reimbursed me a portion of the $100 that I paid.

double051•20m ago

If you pay for the all products subscription, their AI features are now bundled in. I believe that may be a relatively recent change, and I would not have known about it if I hadn't been curious and checked.

ath3nd•32m ago

Mathematics are not relevant when we have hype and vibes. We can't have facts and projections and no path to profitability distract us from our final goal.

Which, of course, is to donate money to Sama so he can create AGI and be less lonely with his robotic girlfriend, I mean...change the world for the better somehow. /s

NitpickLawyer•17m ago

I get your point but I think it's debatable. As long as the capabilities increase (and they have, IMO) cost isn't really relevant. If you can reasonably solve problems of a given difficulty (and we're starting to see that), then suddenly you can do stuff that you simply can't with humans. You can "hire" 100 agents / servers / API bundles, whatever and "solve" all tasks with difficulty x in your business. Then you cancel and your bottom is suddenly raised. You can't do that with humans. You can't suddenly hire 100 entry-level SWEs and fire them after 3 months.

Then you can think about automated labs. If things pan out, we can have the same thing in chemistry/bio/physics. Having automated labs definitely seems closer now than 2.5 years ago. Is cost relevant when you can have a lab test formulas 24/7/365? Is cost a blocker when you can have a cure to cancer_type_a? And then _b_c...etc?

Also, remember that costs go down within a few generations. There's no reason to think this will stop.

flyinglizard•25m ago

The truth is we're brute forcing some problems via a tremendous amount of compute. Especially for apps that use AI backends (rather than chats where you interface with the LLM directly), there needs to be hybridization. I haven't used Claude Code myself but I did a screenshare session with someone who does and I think I saw it running old-fashioned keyword search on the codebase. That's much more effective than just pushing more and more raw data into the chat context.

On one of the systems I'm developing I'm using LLMs to compile user intents to a DSL, without ever looking at the real data to be examined. There are ways; increased context length is bad for speed, cost and scalability.
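As a sketch of what "compile intents to a DSL" could look like (this is not the commenter's system; the grammar, schema, and function names are invented for illustration): the model is shown only the column names and asked to emit a one-line query, and ordinary code validates and runs that query against the data locally. The model call itself is left out; `parse` takes whatever string it would return.

```python
import operator

# Hypothetical schema: only these column names are shown to the model, never the rows.
SCHEMA = {"region": str, "amount": float}
OPS = {"==": operator.eq, ">": operator.gt, "<": operator.lt}
AGGREGATES = ("count", "sum")


def parse(query: str) -> dict:
    """Parse '<agg> <target> where <column> <op> <literal>' into a validated plan."""
    parts = query.split()
    if len(parts) != 6 or parts[2] != "where":
        raise ValueError(f"malformed query: {query!r}")
    agg, target, _, column, op, raw = parts
    if agg not in AGGREGATES:
        raise ValueError(f"unknown aggregate: {agg!r}")
    if target not in SCHEMA or column not in SCHEMA:
        raise ValueError("unknown column")
    if op not in OPS:
        raise ValueError(f"unknown operator: {op!r}")
    if agg == "sum" and SCHEMA[target] is not float:
        raise ValueError("sum needs a numeric target column")
    value = SCHEMA[column](raw)  # coerce the literal to the column's type
    return {"agg": agg, "target": target, "column": column, "op": op, "value": value}


def run(plan: dict, rows: list) -> float:
    """Execute a validated plan locally; the rows never enter a prompt."""
    compare = OPS[plan["op"]]
    matched = [r for r in rows if compare(r[plan["column"]], plan["value"])]
    if plan["agg"] == "count":
        return len(matched)
    return sum(r[plan["target"]] for r in matched)
```

The prompt stays the same size no matter how many rows exist, and anything outside the grammar is rejected before it touches the data. Splitting on whitespace means literals cannot contain spaces; a real grammar would need proper tokenizing.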

mark_l_watson•21m ago

I have already thought a lot about the large packaged inference companies hitting a financial brick wall, but I was surprised by material near the end of the article: the discussions of lock in for companies that can’t switch and about Replit making money on the whole stack. Really interesting.

I managed a deep learning team at Capital One and the lock-in thing is real. Replit is an interesting case study for me because after a one-week free agent trial I signed up for a one-year subscription, had fun with their LLM-based coding agent for a few weeks, and almost never used it after that, but I still have fun with Replit as an easy way to spin up Nix-based coding environments. Replit seems to offer something for everyone.

raincole•11m ago

First of all the title is click-bait. Tokens are getting cheaper and cheaper. People just use more and more tokens.

And everything, I mean everything, after the title is all downhill:

> saying "this car is so much cheaper now!" while pointing at a 1995 honda civic misses the point. sure, that specific car is cheaper. but the 2025 toyota camry MSRPs at $30K.

Cars got cheaper. The only reason you don't feel it is the trade barrier that stops BYD from flooding your local dealers.

> charge 10x the price point > $200/month when cursor charges $20. start with more buffer before the bleeding begins.

What does this even mean? The cheapest Cursor plan is $20, just like Claude Code. And the most expensive Cursor plan is $200, just like Claude Code. So clearly they're at the exact same price point.

> switch from opus ($75/m tokens) to sonnet ($15/m) when things get heavy. optimize with haiku for reading. like aws autoscaling, but for brains.

> they almost certainly built this behavior directly into the model weights, which is a paradigm shift we’ll probably see a lot more of

"I don't know how Claude built their models and I have no insider knowledge, but I have very strong opinions."

> 3. offload processing to user machines

What?

> ten. billion. tokens. that's 12,500 copies of war and peace. in a month.

Unironically quoting data from the viberank leaderboard, which is just user-submitted numbers...

> it's that there is no flat subscription price that works in this new world.

The author doesn't know what throttling is...?

I've stopped reading here. I should've just closed the tab when I saw the first letter in each sentence isn't capitalized. This is so far the most glaring signal of slop. More than the overuse of em-dash and lists.

djhworld•11m ago

Over the past year or two I've just been paying for the API access and using open source frontends like LibreChat to access these models.

This has been working great for the occasional use, I'd probably top up my account by $10 every few months. I figured the amount of tokens I use is vastly smaller than the packaged plans so it made sense to go with the cheaper, pay-as-you-go approach.

But since I've started dabbling in tooling like Claude Code, hoo-boy those tokens burn _fast_, like really fast. Yesterday I somehow burned through $5 of tokens in the space of about 15 minutes. Sure, the Code tool is vastly different from asking an LLM about a certain topic, but I wasn't expecting such a huge leap. A lot of the token usage is masked from you, I guess, wrapped up in the ever-increasing context plus back-and-forth tool orchestration, but still.
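The "ever-increasing context" effect is easy to underestimate. A minimal sketch, assuming every tool call re-sends the full conversation so far, each turn adds a fixed number of tokens, and no prompt caching or context compaction is applied (real tools may do both): billed input tokens then grow roughly with the square of the number of turns.

```python
def cumulative_input_tokens(base: int, added_per_turn: int, turns: int) -> int:
    """Total input tokens billed across `turns` calls when each call
    re-sends the whole context and every turn appends `added_per_turn`."""
    total = 0
    context = base
    for _ in range(turns):
        total += context           # the whole context counts as input on every call
        context += added_per_turn  # tool output and replies get appended
    return total


def closed_form(base: int, added_per_turn: int, turns: int) -> int:
    """Same quantity as an arithmetic series: n*b + t*n*(n-1)/2."""
    return turns * base + added_per_turn * turns * (turns - 1) // 2


if __name__ == "__main__":
    # Hypothetical session: 10k-token starting context, 2k tokens added per turn.
    for turns in (10, 50, 100):
        print(turns, cumulative_input_tokens(10_000, 2_000, turns))
```

Under these assumptions, doubling the number of turns roughly quadruples the input bill once the per-turn additions dominate the starting context.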

TechDebtDevin•4m ago

$20.00 via DeepSeek's API (yes, China can have my code, idc) has lasted me almost a year. It's slow, but gives better quality output than any of the independently hosted DeepSeek models (ime). I don't really use agents or anything tho.

