There are a whole bunch of companies elsewhere in the world that are getting better and cheaper every month, hardware side included, all without the infinite VC money.
For what it's worth, when I provide a Pangram link it's because I can already tell something is AI and I'm attempting to provide objective third-party confirmation so the conversation doesn't just degrade into me asserting that I have superior taste to you.
When Meta announced token leaderboards and others followed, I could see this being the logical conclusion. That whole trend is so dumb because it leads to this.
Company announces they will measure developer performance by how many tokens they burn and constantly talks about how the best developers burn the most tokens. Developers see the message and start burning tokens. And then the company acts surprised when their bills go through the roof.
I personally use my OpenAI subscription pretty heavily, 2-3 agents running practically all day on various tasks but I never even get close to running into limits while I hear about others blowing through limits on multiple accounts in the same time period. I'm convinced that most of those folks and their elaborate workflows aren't really for productivity but for bragging rights about how much they use AI.
Same. But if I were working for an organization that measured token usage, you can bet I would be doing things like creating a cron job that uses Claude to generate a bespoke status report on all my open assigned tickets and message it to me 4 times a day... token burn for zero purpose whatsoever.
1. I'm doing it wrong. Apparently I'm supposed to give it a vague paragraph about what the business does, and I can run off and sip margaritas and wake up to a fully fleshed business.
2. They don't know what they're doing, and they're sending the LLM off on a wild goose chase that it does a reasonable job of working its way out of, so they consider it a success despite the waste.
This is quite the reductive, charged statement. Can I ask what subscription plan you're using?
My personal experience is nothing like this. I work on ever-expanding codebases, so I can easily burn tokens. Not to mention, structured agentic coding with adversarial reviews & task organization is not token-efficient. Additionally, for the problems I'm working on, only xhigh or high reasoning gives me worthwhile results while saving time. There are definitely configurations where default consumption doesn't work.
For reference, I used 15 billion tokens (most of it cached) last month on my day job's enterprise plan. That doesn't include my personal plans' usage.
The fact that somebody established a leaderboard for tokenmaxxing ought to follow you around like a black cloud for the rest of your career once the collective hallucination lifts and people realize just how monumentally stupid it was.
Claiming that there's some small subset of their services (like inference per token) that's "profitable" doesn't mean anything when it relies on everything else that company is still paying for. If you could make money from it at current prices - why aren't they?
Otherwise it's just "how much they're willing to subsidize".
The latest DeepSeek V4 Pro model is 2-5x cheaper than Claude Sonnet 4.6. Cursor's Compose 2.5, which was just released, is 6x cheaper than Sonnet.
The state of the art models are going to get better and more expensive and smaller models are going to get cheaper.
There will be a point where the intelligence of the cheap models and the state of the art models is indistinguishable to humans, the same way I can't tell the difference between Terence Tao and my university math professor.
I don't always need the smartest and most expensive model. I will need it every once in a while and will gladly pay that price if I have to. What I do need is the model that will solve the current problem I have in a reasonable amount of time.
Why do you think this will be true?
Right now I see the major US labs betting on gaining an advantage from having way more compute, and I see Chinese labs competing with one another in a resource-scarce environment, so they place much more emphasis on compute-efficiency.
But the supply chains that feed into the massive data center growth in the US are strained; there are energy, memory, and logistical bottlenecks to name a few.
In the medium-long run, compute capacity will not grow exponentially forever. Somehow it has for decades, but there can be no infinite exponential growth, and that point may be when the planet really starts to cook itself.
Maybe the US labs will become more compute-constrained, and then have to compete on efficiency.
Or maybe things change fundamentally in some other way I'm not thinking of.
Things will normalize, but it will take time.
> This is where open source models are important
Open-weights; the training data isn't public.
E.g., compare GPT-3.5 to the latest DeepSeek: the newer model is both cheaper and more capable.
Here is a recent non-rigorous benchmark I ran against a bunch of models. Qwen3.6 35B A3B fine-tuned with Opus data runs plenty fast on my local machine and produces outstanding results - easily in the top 5, comparable to GPT 5.5 Pro (which is $180/mtok).
https://gistpreview.github.io/?31d66ef69e4aed3efae1aec69d86c...
I've predicted for years now that the industry will head down the path of the virus scanning vendors: selling subscriptions to be able to download the latest versions of models. I simply don't see how any other business model is remotely viable, except at the very highest end of inference or video gen.
You can see price vs. performance on Artificial Analysis, and the Pareto-optimal models are all just 6 months old.
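For anyone unfamiliar with the term, here's a rough sketch of what "Pareto optimal" means for price vs. performance. The model names, prices, and scores are made up for illustration (they are not taken from Artificial Analysis), and `pareto_front` is just a hypothetical helper:

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    price: float  # USD per million tokens (made-up numbers)
    score: float  # benchmark score, higher is better (made-up numbers)


def pareto_front(models):
    """Return the models that no cheaper-or-equal model matches or beats on score."""
    front = []
    best_score = float("-inf")
    # Walk from cheapest to most expensive; on a price tie, try the higher score first.
    for m in sorted(models, key=lambda m: (m.price, -m.score)):
        # A model earns its place only if it beats every cheaper model's score.
        if m.score > best_score:
            front.append(m)
            best_score = m.score
    return front


# Hypothetical data, purely illustrative.
MODELS = [
    Model("A", 1.0, 50.0),
    Model("B", 2.0, 45.0),   # costs more than A and scores lower: dominated
    Model("C", 3.0, 70.0),
    Model("D", 10.0, 68.0),  # costs more than C and scores lower: dominated
    Model("E", 15.0, 80.0),
]

front_names = [m.name for m in pareto_front(MODELS)]
print(front_names)
```

Anything not on that list is strictly worse value than something that is on it, which is why the frontier is the only part of the chart worth looking at.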
> Did we collectively forget second order thinking?
I bought 2x 16GB NVIDIA cards this week because I don’t see hardware getting cheaper anytime soon, and because of that I totally don’t see the point of “waiting until prices go lower for graphics cards” because that might not happen for a long time yet!
In fact, if you factor in world events (and the ones that haven’t happened yet but eventually will, e.g. China’s long-planned 2027 takeover of Taiwan), then there’s no way graphics card prices are going to be accessible to mere mortals until at least 2028.
But my real reasoning is that you’re going to see a flood of OpenAI and Anthropic users leave because of a) increasing prices on their plans, and b) impending business laws on the horizon about protecting sovereign data from AI (i.e., data in the cloud for training is a no-no).
So what happens when people and companies one by one start leaving the SOTA AI cloud for from-good-enough-to-wow models? RAM and graphics cards become the new toilet paper, which is going to double current prices again.
Upgrade now before it’s too late folks!
dtagames•1h ago
An outside small dev shop or internal dev team can pay these prices and spread the cost over several customers or departments, but the era of giving everyone AI and telling them to dev stuff is about to be over.