The only argument we have so far is wild extrapolation and faith. The burden of proof is on the proclaimer.
There could be some scenario where it is advantageous to have humans working with AI. But if that isn't how reality plays out then companies won't be able to afford to pay people.
We talk about AI replacing a workforce, but your observation that it's more about replacing applications is spot on. That's definitely going to be the trend, especially for traditional back-office processing.
LLMs act as interfaces to applications which you are capable of building yourself and running on your own hardware, since you are much more capable.
It's going to be really fun for those of us who love to write Unicode symbols into numeric input boxes and other such fun things.
([^1]: They have been at it for a long while now, a few thousand years?)
An engineer shackled to an LLM has about 80% output.
That is already "related to what people can afford", in attractive places or not.
And that to write the business analysis that the AI can actually turn into working code requires senior developers.
In a year or so, the open source models will become good enough (in both quality and speed) to run locally.
Arguably, OpenAI's gpt-oss-120b is already good enough, in both quality and speed, to run on a Mac Studio.
Then $10k, amortized over 3 years, will be enough to run code LLMs 24/7.
I hope that’s the future.
But I'm happy to pay the subscription vs buying a Mac Studio for now.
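For anyone weighing that trade-off, here's the rough math behind the parent's claim (a sketch only; the $200/mo subscription is an assumed comparison point, and electricity and resale value are ignored):

```python
# Amortized hardware cost vs. a subscription; all figures are back-of-the-envelope.
hardware_cost = 10_000        # e.g. a well-specced Mac Studio
months = 3 * 12               # 3-year amortization window
subscription = 200            # assumed monthly subscription for comparison

monthly_hardware = hardware_cost / months
print(f"hardware ~${monthly_hardware:.0f}/mo vs subscription ${subscription}/mo")
# hardware ~$278/mo vs subscription $200/mo, but the local box runs 24/7
```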
With medium and high reasoning, I see between 60 and 120 tokens per second, which is outrageous compared to the Llama models I was running before (20-40 tps; I'm sure I could have adjusted parameters somewhere in there).
I'm no shill; I'm fairly skeptical about AI, but I've been doing a lot of research and experimenting to see what I'm missing.
I haven't bothered running anything locally as the overwhelming consensus is that it's just not good enough yet. And that's from posts and videos in the last two weeks.
I've not seen something so positive about local LLMs anywhere else.
It's simply not there yet, and it definitely isn't for a 4090.
Generally, 20b MoE will run faster but be less smart than a 20b dense model. In terms of "intelligence" the rule of thumb is the geometric mean between the number of active parameters and the number of total parameters.
So a 20b model with 3.6b active (like the small gpt-oss) should be roughly comparable in terms of output quality to a sqrt(3.6*20) = 8.5b parameter model, but run with the speed of a 3.6b model.
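A quick sketch of that rule of thumb, using the figures from the comment above (~20B total, ~3.6B active for the small gpt-oss):

```python
from math import sqrt

def dense_equivalent(active_b: float, total_b: float) -> float:
    """Rule-of-thumb 'intelligence' of an MoE model, expressed as the size of a
    comparable dense model: the geometric mean of active and total parameters."""
    return sqrt(active_b * total_b)

print(round(dense_equivalent(3.6, 20), 1))  # 8.5 -> quality of roughly an 8.5B
                                            # dense model, speed of a 3.6B model
```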
I'm not saying it is anywhere close to a paid foundation model, but the code it is outputting (albeit simple) has been generally well written and works. I do only get a handful of those high-thought responses before the 50k token window starts to delete stuff, though.
My agentic coding "app" (basically just a tool "server" around dotnet/git/fs commands with a kanban board) seems to be able to spit out quick SPAs with little additional prompting.
More niche, use-case-specific models have to be developed for cheaper, energy-optimized hardware.
Dey well
For a short-term gig, though, I don’t think they would do that.
Also, tooling: you can use Aider, which is OK, but Claude Code and Gemini CLI will always be superior and will only work correctly with their respective models.
For well defined tasks that Claude creates, I'll pass off execution to a locally run model (running in another Claude Code instance) and it works just fine. Not for every task, but more than you might think.
But the second point seems even less likely to be true: why will Claude code and Gemini cli always be superior? Other than advantageous token prices (which the people willing to pay the aforementioned premium shouldn’t even care about), what do they inherently have over third-party tooling?
Maybe to answer my own question, LLM developers have one, potentially two advantages over third-party tooling developers: 1) virtually unlimited tokens, zero rate limiting with which to play around with tooling dev. 2) the opportunity to train the network on their own tooling.
The first advantage is theoretically mitigated by insane VC funding, but will probably always be a problem for OSS.
I’m probably overlooking news that the second advantage is where Anthropic is winning right now; I don’t have intuition for where this advantage will change with time.
If they use a hosted model, they’ll probably pin everything initially to, at best, the second newest model from their chosen provider (the newest being insufficiently proven) and update models to something similarly behind only when the older model goes completely out of support.
There's also a personal "good enough" point for everyone who's hoping to cut the cord and go local. If local models get as good as the current moment's Claude Sonnet, I would actually be totally fine using that locally and riding the local improvements from then on.
And for local stuff like home automation or general conversational tasks, local has been good enough for a while now. I don't need the hypercar of LLMs to help me with cooking a recipe for example.
Local inference speed will become acceptable in 5-10 years thanks to those generations of chips, and then we can finally have good local AI apps.
If you want complete control over your data and don't trust anyone's assurances that they keep it private (and why should you), then you have to self-host. But if all you care about is a good price, then the free market already provides that for open models.
It might be fun to work out how to share, too. A whole new breed of shell hosting.
There's no such thing as models that are "good enough". There are models that are better and models that are worse and OS models will always be worse. Businesses that use better, more expensive models will be more successful.
I don't think we're there yet, but it's reasonable to expect at _some point_ your typical OS model could be 98% of the way to a cutting edge commercial model, and at that point your last sentence probably doesn't hold true.
One of the key concepts in the AI zeitgeist is the possibility of superintelligence. There will always be the possibility of a more productive AI agent.
Better back-of-house tech can differentiate you, but startup history is littered with failed companies that used the best tech and were often beaten by companies taking a worse-is-better approach. Anyone here who has been around long enough has seen this play out a number of times.
Indeed. In my idealistic youth I bought heavily into "if you build it, they will come," but that turned out not to be reality at all. Oftentimes the best product loses because of marketing, network effects, or some other reason that has nothing to do with the tech. I wish it weren't that way, but if wishes were fishes we'd all have a fry.
The business itself will also massively develop in the coming years. For example, there will be dozens of providers for integrating open source models with an in-house AI framework that smoothly works with their stack and deployment solution.
"Good enough" for what is the question. You can already run them locally, the problem is that they aren't really practical for the use-cases we see with SOTA models, which are just now becoming passable as semi-reliable autonomous agents. There is no hope of running anything like today's SOTA models locally in the next decade.
Beyond that, running inference on the equivalent of a 2025 SOTA model with 100GB of VRAM is very unlikely. One consistent property of transformer models has been that smaller and quantized models are fundamentally unreliable, even though high-quality training data and RL can boost the floor of their capabilities.
Do you think these enterprises will begin hosting their own models? I'm not convinced they'll join the capex race to build AI data centers. It would make more sense they just end up consuming existing services.
Then there are the smaller startups that just never had their own data center. Are those going to start self-hosting AI models? And all of the related requirements to allow say a few hundred employees to access a local service at once? network, HA, upgrades, etc. Say you have multiple offices in different countries also, and so on.
1. Protecting their intellectual property, and
2. Unknown "safety" constraints baked in. Imagine an engineer unable to run some security tests because the LLM thinks it's "unsafe". Meanwhile, the VP of Sales is on the line with the customer.
they already are
They're much less strict than they were on cloud, but the security practices are really quite strict. I work in this sector and yes, they'll allow cloud, but strong data isolation + segregation, access controls, networking reqs, etc. etc. etc. are very much a thing in the industry still, particularly where the production process is commercially sensitive in itself.
Also, I've never tried really huge local models and especially not RAG with local models.
Based on what?
And where? On systems < 48GB?
When your provider is dumping at a loss, it's their way of saying that the business plan is to maximize lock-in/monopoly effects followed by the infamous "enshittification".
There is also nothing stopping this silly world from breaking out into a dispute where chips are embargoed. Then we'll have high API prices and high hardware prices (if there's any hardware at all). Even for the individual it's worth having that $2-3k AI machine around, perhaps two.
presumably... capitalism still exists?
Shame anyone is actually _paying_ for commercial inference; it's worse than whatever you can do locally.
Hardware vendors will create efficient inference PCIe chips, and innovations in RAM architecture will make even mid-level devices capable of running local 120B-parameter models efficiently.
Open source models will get good enough that there isn’t a meaningful difference between them and the closed source offerings.
Hardware is relatively cheap, it’s just that vendors haven’t had enough cycles yet on getting local inference capable devices out to the people.
I give it 5 years or so before this is the standard.
I'm not entirely sure that AI companies like Cursor necessarily miscalculated, though. Notably, the actual strategies the blog advertises are things used by tools like Cursor (via auto mode). The important thing for them is that they can successfully push users towards auto mode and use more usage data to improve their routing, and that frontier models don't continue to be so much better AND so expensive that users keep demanding them. I wouldn't hate that bet if I were Cursor, personally.
I've just become comfortable using GH Copilot in agent mode, but I haven't started letting it work in an isolated way in parallel to me. Any advice on getting started?
If we assume 5 tasks, each running $400/mo of tokens, we reach an annual bill of $24,000. We would have to see a 4x increase in token cost to reach the $100,000/yr mark. This seems possible with increased context sizes. Additionally, larger context sizes might lead to longer-running, more complicated tasks, which would increase my number of parallel tasks.
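Spelling out that arithmetic (every input is the assumption stated above):

```python
# Reproducing the comment's own numbers.
parallel_tasks = 5
token_cost_per_task_per_month = 400    # dollars

annual_bill = parallel_tasks * token_cost_per_task_per_month * 12
print(annual_bill)                     # 24000 -> $24k/yr

target = 100_000
print(round(target / annual_bill, 1))  # ~4.2 -> roughly a 4x rise in token cost
                                       # (or task count) hits the $100k/yr mark
```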
OSS models are only ~1 year behind SOTA proprietary, and we're already approaching a point where models are "good enough" for most usage. Where we're seeing advancements is more in tool calling, agentic frameworks, and thinking loops, all of which are independent of the base model. It's very likely that local, continuous thinking on an OSS model is the future.
At $100k/yr/eng inference spend, your options widen greatly is my point.
The irony is that Kilo itself is playing the same game they're criticizing. They're burning cash on free credits (with expiry dates) and paid marketing to grab market share -- essentially subsidizing inference just like Cursor, just with VC money instead of subscription revenue.
The author is right that the "$20 → $200" subscription model is broken. But Kilo's approach of giving away $100+ in credits isn't sustainable either. Eventually, everyone has to face the same reality: frontier model inference is expensive, and someone has to pay for it.
Unless you've got a trove of self-starters with a lot of money, they aren't cost efficient.
I believe it's pretty clear when you use these credits that they're temporary (and that it's a marketing strategy), vs Claude/Cursor, where they have to fit their costs into the subscription price and make things opaque to you.
The $100k/dev/year figure feels like sticker-shock math more than reality. Yes, AI bills are growing fast, but most teams I see are still spending substantially less annually, and that's before applying even basic optimizations like prompt caching, model routing, or splitting work across models.
The real story is the AWS playbook all over again: vendors keep dropping unit costs, customers keep increasing consumption faster than prices fall, and in the end the bills still grow. If you’re not measuring it daily, the "marginal cost is trending down" narrative is meaningless - you’ll still get blindsided by scale.
I'm biased but the winners will be the ones who treat AI like any other cloud resource: ruthlessly measured, budgeted, and tuned.
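As a rough illustration of what "model routing" means in practice, here's a minimal sketch; the model names, prices, and length threshold are made-up placeholders, not any vendor's real offering:

```python
# Toy cost-aware router: cheap requests go to a small model, only the hard
# ones hit the expensive frontier model.
PRICES_PER_MTOK = {            # $ per 1M output tokens (hypothetical)
    "small-fast-model": 0.50,
    "frontier-model": 15.00,
}

def route(prompt: str, needs_deep_reasoning: bool) -> str:
    """Pick a model based on task difficulty and prompt size."""
    if needs_deep_reasoning or len(prompt) > 20_000:
        return "frontier-model"
    return "small-fast-model"

# Most day-to-day requests never need frontier pricing:
print(route("rename this variable across the file", needs_deep_reasoning=False))
```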
A fellow HN user's post I engaged with recently talked about low-hanging fruit.
What that means for me, and where I'm from, is some sort of dev-loan initiative through NGOs and government grants, where devs have access to these models/hardware and repay with some form of value.
What that is, I haven't thought that far. Thoughts?
Dey well
For how many developers? Chip design companies aren't paying Synopsys $250k/year per developer. Even when using formal tools which are ludicrously expensive, developers can share licenses.
In any case, the reason chip design companies pay EDA vendors these enormous sums is because there isn't really an alternative. Verilator exists, but ... there's a reason commercial EDA vendors can basically ignore it.
That isn't true for AI. Why on earth would you pay more than a full-time developer's salary on AI tokens when you could just hire another person instead? I definitely think AI improves productivity, but it's like 10-20% maybe, not 100%.
That actually probably is per developer. You might be able to reassign a seat to another developer, but that's still arguably one seat per user.
They're super opaque about pricing but I don't think it's that expensive. Apparently formal tools are way more expensive than simulation though (which makes sense), so we only had a handful of those licenses.
I managed to find a real price that someone posted:
https://www.reddit.com/r/FPGA/comments/c8z1x9/modelsim_and_q...
> Questa Prime licenses for ~$30000 USD.
That sounds way more realistic, and I guess you get decent volume discounts if you want 200 licenses.
It is based on supply and demand for GPUs. Demand currently outstrips supply, while the 'frontier models' are also much more computationally efficient than last year's models in some ways, using far fewer computational resources to do the same thing.
So now that everyone wants to use frontier models in "agentic mode", with reasoning eating up a ton more tokens before settling on a result, demand is outpacing supply. But it's possible it equalizes yet again, before the cycle begins anew.
Okay, but when did that ever create a comparable effect for any other kind of software dev in history?
Ultimately, this will become a people problem more than a financial problem. People who lack the confidence to code without AI will cost less to hire and dramatically more to employ, no differently than people reliant on large frameworks. All historical data indicates employers will happily eat that extra cost if it means candidates are easier to identify and select, because hiring and firing remain among the most serious considerations in technology selection.
Candidates currently thought of as 10x, who are productive without these helpers, will continue to remain no more or less elusive than they are now. That means employers must choose between higher risks and higher selection costs for the potentially higher return on investment, knowing that ROI is only realized if these high-performance candidates are allowed to execute with high productivity. Employers will gladly eat increased expenses if it lets them lower the risks of candidate selection.
In my experience, a 10x developer that can code without AI becomes a 100x developer because the menial tasks they'd delegate to less-skilled employees while setting technical direction can now be delegated to an AI instead.
If your only skill is writing boilerplate in a framework, you won't be employed to do that with AI. You will not have a job at all and the 100xer will take your salary.
This will reduce demand for devs but it's super likely that after a delay, demand for software development will go even higher.
The only thing I don't know is what that demand for software development will look like. It could be folded into DevOps work or IT project management work or whatever.
I guess we'll see in a few years.
Seriously, I don't see the AI outcome being worth that much yet.
At the current level of AI tools, the attention you need to manage 10+ async tasks is over the limit for most humans.
In 10 years maybe, but $100k will probably be worth much less by then.
It's only going to get cheaper to train and run these models as time goes on. Models running on single consumer-grade PCs today were almost unthinkable four years ago.
I wonder how the economics will play out, especially when you add in all the different geographic locations for remote devs and their cost.
Why are we assuming everyone uses the full $400? Margins aren't calculated based on only the heaviest users.
And where are they pulling the 100k number from?
This doesn't make any sense to me. Why would Cursor et al expect they could pocket the difference if inference costs went down? There's no stickiness to the product; they would compete down to zero margins regardless. If anything, higher total spend is better for them because it's more to skim off of.
The temptation to monetize it in shady ways will be irresistible.
And giving up control will reduce income sources.
If it's for free, it won't happen; as I said, you'd just turn them into a dumb pipe.
If paid, yeah, maybe that could work as they offload the compute, but then they would need to figure out how to push ads to you and probably also sell your data.
Mechanical engineers pay gobs of money for software suites that make them productive. They do automatic FEA, renders, etc. There's no ads in those. The parent companies can spend millions / y to improve that software and sell upgraded seats.
Windows as a software product has _barely_ dipped its toes into ads, and survived forever at about $100/machine.
There are lots of examples. The only reason some software ends up in the cloud is so you can extort subscription fees, and the software is _cheap_ to run for the user.
LLMs are not like that. They are fundamentally expensive to run for users. An obvious end objective here is to slash all the operational budget / API teams / web UI teams / inference infra costs and roll that into just "build better models". It's trivial to wrap a model in a sign-in workflow and ship it to users as a core piece of software that all their other local software can use to gain LLM superpowers. Selling upgrades every year keeps people paying $100/machine/user. The market size here is as big as the PC + phone market. It's enormous.
It just seems like the obvious end game is to focus on making good models and productionizing them, versus running them as a service with all the headaches that entails while also building useful tools and training new models.
I have a graphics card my employer bought; they aren't going to care if it costs $100/yr more if I gain this productivity boost.
This is how it works (ad-free) in almost every other engineering discipline, from MATLAB to AutoCAD to Adobe, etc.
The future of LLMs is basically Windows XP (a software tax on all machines sold) plus iOS (a software tax on all phones sold).
I think there is a case that Claude did not reduce their pricing given that they have the best coding models out there. Their recent fundraise had them disclose gross margins of 60% (and -30% for usage via Bedrock etc.). This way they can offer 2.5x more tokens at the same price as the vibe-coding companies and yet break even. The market movement where the assumption did not work out is that we still only have Claude, which made vibe coding work and is the most tasteful when it comes to what users want. There are probably models better at thinking and logic, especially o3, but this signals the staying power of Claude (the lock-in, its popularity) and challenges the more fundamental assumption that language models are commodities.
(Speculating) Many companies would want to move away from Claude but can't because users love the models.
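For what it's worth, the 2.5x figure falls straight out of the disclosed 60% gross margin:

```python
# Break-even token multiplier implied by a 60% gross margin.
gross_margin = 0.60
serving_cost_fraction = 1 - gross_margin   # serving cost is ~40% of revenue
breakeven_multiplier = 1 / serving_cost_fraction
print(breakeven_multiplier)                # 2.5 -> serve 2.5x the tokens at the
                                           # same price and roughly break even
```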
As far as spend per dev- I can’t even manage to use up the limits on my $100 Claude plan. It gets everything done and I run out of things to ask it. Considering that the models will get better and cheaper over time, I’m personally not seeing a future where I will need to spend that much more than $100 a month.
> This is driven by two developments: more parallel agents and more work done before human feedback is needed.