The only argument we have so far is wild extrapolation and faith. The burden of proof is on the proclaimer.
There could be some scenario where it is advantageous to have humans working with AI. But if that isn't how reality plays out then companies won't be able to afford to pay people.
We talk about AI replacing a workforce, but your observation that it's more about replacing applications is spot on. That's definitely going to be the trend, especially for traditional back-office processing.
LLMs act as interfaces to applications which you are capable of building yourself and running on your own hardware, since you are much more capable.
([^1]: They have been at it for a long while now, a few thousand years?)
An engineer shackled to an LLM has about 80% output.
That is already "related to what people can afford", in attractive places or not.
And that writing the business analysis that the AI can actually turn into working code requires senior developers.
In a year or so, the open source models will become good enough (in both quality and speed) to run locally.
Arguably, OpenAI OSS 120B is already good enough, in both quality and speed, to run on Mac Studio.
Then $10k, amortized over 3 years, will be enough to run code LLMs 24/7.
I hope that’s the future.
But I'm happy to pay the subscription vs buying a Mac Studio for now.
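For what it's worth, the back-of-the-envelope math (a rough sketch: the $10k Mac Studio and the $200/mo top-tier subscription are the figures floated in this thread, not quotes, and electricity is ignored):

    # Rough amortization math for local inference vs. a subscription.
    hardware_cost = 10_000   # USD, one-time (Mac Studio-class box)
    months = 36              # 3-year amortization window
    subscription = 200       # USD/month, top-tier plan mentioned in the thread

    local_monthly = hardware_cost / months
    print(f"local hardware: ~${local_monthly:.0f}/mo")               # ~$278/mo
    print(f"subscription:    ${subscription}/mo")
    print(f"break-even: {hardware_cost / subscription:.0f} months")  # 50 months

The hardware also runs 24/7 with no rate limits, which is the real argument in its favor; the raw monthly numbers alone slightly favor the subscription today.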
With medium and high reasoning, I will see between 60 and 120 tokens per second, which is outrageous compared to the LLaMA models I was running before (20-40 tps; I'm sure I could have adjusted parameters somewhere in there).
I'm no shill, I'm fairly skeptical about AI, but I've been doing a lot of research and playing to see what I'm missing.
I haven't bothered running anything locally as the overwhelming consensus is that it's just not good enough yet. And that's from posts and videos from the last two weeks.
I've not seen something so positive about local LLMs anywhere else.
It's simply not there yet, and it definitely isn't for a 4090.
Generally, 20b MoE will run faster but be less smart than a 20b dense model. In terms of "intelligence" the rule of thumb is the geometric mean between the number of active parameters and the number of total parameters.
So a 20b model with 3.6b active (like the small gpt-oss) should be roughly comparable in terms of output quality to a sqrt(3.6*20) = 8.5b parameter model, but run with the speed of a 3.6b model.
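A quick sanity check of that rule of thumb (just the geometric-mean heuristic described above, nothing official; the 120B figures are approximate):

    import math

    def effective_params(total_b: float, active_b: float) -> float:
        """Geometric-mean rule of thumb: an MoE with total_b total and
        active_b active parameters behaves roughly like a dense model of
        sqrt(active_b * total_b) parameters (all in billions)."""
        return math.sqrt(active_b * total_b)

    # Small gpt-oss: ~20B total, ~3.6B active (figures from the comment above)
    print(f"{effective_params(20, 3.6):.1f}B dense-equivalent")   # ~8.5B
    # Larger gpt-oss, approximate figures: ~120B total, ~5B active
    print(f"{effective_params(120, 5):.1f}B dense-equivalent")    # ~24.5B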
I'm not saying it is anywhere close to a paid foundation model, but the code it is outputting (albeit simple) has been generally well written and works. I do only get a handful of those high-thought responses before the 50k token window starts to delete stuff, though.
More niche, use-case-specific models have to be developed for cheaper, energy-optimized hardware.
└── Dey well
For a short-term gig, though, I don’t think they would do that.
Also, tooling: you can use aider, which is OK. But Claude Code and Gemini CLI will always be superior and will only work correctly with their respective models.
For well defined tasks that Claude creates, I'll pass off execution to a locally run model (running in another Claude Code instance) and it works just fine. Not for every task, but more than you might think.
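A minimal sketch of that hand-off pattern, assuming a local OpenAI-compatible server (llama.cpp's server, Ollama, LM Studio, etc.) on localhost; the port, model name, and task are placeholders for whatever your setup actually serves:

    # Delegate a well-defined, self-contained task to a locally served model.
    from openai import OpenAI

    # Local servers typically don't check the API key.
    local = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

    task = (
        "Write a Python function slugify(title: str) -> str that lowercases, "
        "strips punctuation, and joins words with hyphens. Return only the code."
    )

    resp = local.chat.completions.create(
        model="gpt-oss-20b",   # whatever model the local server is running
        messages=[{"role": "user", "content": task}],
        temperature=0.2,
    )
    print(resp.choices[0].message.content)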
But the second point seems even less likely to be true: why will Claude Code and Gemini CLI always be superior? Other than advantageous token prices (which the people willing to pay the aforementioned premium shouldn’t even care about), what do they inherently have over third-party tooling?
Maybe to answer my own question, LLM developers have one, potentially two advantages over third-party tooling developers: 1) virtually unlimited tokens, zero rate limiting with which to play around with tooling dev. 2) the opportunity to train the network on their own tooling.
The first advantage is theoretically mitigated by insane VC funding, but will probably always be a problem for OSS.
I’m probably overlooking news that the second advantage is where Anthropic is winning right now; I don’t have intuition for where this advantage will change with time.
Local inference speed will be acceptable in 5-10 years thanks to those generations of chips, and finally we can have good local AI apps.
If you want complete control over your data and don't trust anyone's assurances that they keep it private (and why should you), then you have to self-host. But if all you care about is a good price, then the free market already provides that for open models.
It might be fun to work out how to share, too. A whole new breed of shell hosting.
There's no such thing as models that are "good enough". There are models that are better and models that are worse and OS models will always be worse. Businesses that use better, more expensive models will be more successful.
I don't think we're there yet, but it's reasonable to expect at _some point_ your typical OS model could be 98% of the way to a cutting edge commercial model, and at that point your last sentence probably doesn't hold true.
Better back-of-house tech can differentiate you, but startup history is littered with failed companies using the best tech, and they were often beaten by companies taking a worse-is-better approach. Anyone here who has been around long enough has seen this play out a number of times.
Indeed. In my idealistic youth I bought heavily into "if you build it, they will come," but that turned out not to be reality at all. Oftentimes the best product loses because of marketing, network effects, or some other reason that has nothing to do with the tech. I wish it weren't that way, but if wishes were fishes we'd all have a fry.
The business itself will also massively develop in the coming years. For example, there will be dozens of providers for integrating open source models with an in-house AI framework that smoothly works with their stack and deployment solution.
"Good enough" for what is the question. You can already run them locally, the problem is that they aren't really practical for the use-cases we see with SOTA models, which are just now becoming passable as semi-reliable autonomous agents. There is no hope of running anything like today's SOTA models locally in the next decade.
Do you think these enterprises will begin hosting their own models? I'm not convinced they'll join the capex race to build AI data centers. It would make more sense they just end up consuming existing services.
Then there are the smaller startups that just never had their own data center. Are those going to start self-hosting AI models, along with everything needed to let, say, a few hundred employees access a local service at once - networking, HA, upgrades, etc.? Say you also have multiple offices in different countries, and so on.
1. Protecting their intellectual property, and
2. Unknown “safety” constraints baked in. Imagine an engineer unable to run some security tests because the LLM thinks it’s “unsafe”. Meanwhile, the VP of Sales is on the line with the customer.
they already are
They're much less strict than they were on cloud, but the security practices are really quite strict. I work in this sector and yes, they'll allow cloud, but strong data isolation + segregation, access controls, networking reqs, etc. etc. etc. are very much a thing in the industry still, particularly where the production process is commercially sensitive in itself.
Also, I've never tried really huge local models and especially not RAG with local models.
Based on what?
And where? On systems < 48GB?
I’m not entirely sure that AI companies like Cursor necessarily miscalculated, though. The blog itself notes that the strategies it advertises are things tools like Cursor already use (via auto mode). The important thing for them is that they can successfully push users toward auto mode and use the extra usage data to improve their routing, and that frontier models don’t stay so much better AND so much more expensive that users keep demanding them. I wouldn’t hate that bet if I were Cursor, personally.
I’ve just become comfortable using GH Copilot in agent mode, but I haven’t started letting it work in an isolated way in parallel to me. Any advice on getting started?
If we assume 5 tasks, each running $400/mo of tokens, we reach an annual bill of $24,000. We would have to see roughly a 4x increase in token cost to reach the $100,000/yr mark. This seems possible with increased context sizes. Additionally, larger contexts might lead to longer-running, more complicated tasks, which would increase my number of parallel tasks.
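The arithmetic behind those numbers, for anyone who wants to tweak the assumptions (the task count and $/month are the figures from this comment, not measurements):

    # Parallel-agent cost model from the comment above.
    tasks = 5                  # concurrent agent tasks
    monthly_per_task = 400     # USD of tokens per task per month

    annual = tasks * monthly_per_task * 12
    print(f"${annual:,}/yr")                        # $24,000/yr
    print(f"{100_000 / annual:.1f}x to hit $100k")  # ~4.2x token-cost growth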
└── Yarn me
- Dey well: Be well
- Yarn me: Let's talk
└── Dey well (Be well)
Edit: Would have sworn that this was in the guidelines but I don't see it just now.
OSS models are only ~1 year behind SOTA proprietary, and we're already approaching a point where models are "good enough" for most usage. Where we're seeing advancements is more in tool calling, agentic frameworks, and thinking loops, all of which are independent of the base model. It's very likely that local, continuous thinking on an OSS model is the future.
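A toy sketch of what such a continuous thinking loop could look like against a local OpenAI-compatible server; the endpoint, model name, iteration cap, and "DONE" convention are all placeholders for illustration:

    # Toy continuous-thinking loop on a locally served OSS model.
    from openai import OpenAI

    local = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
    history = [{"role": "user",
                "content": "Plan, then solve: summarize the tradeoffs of local vs. hosted inference."}]

    for _ in range(5):  # bound the loop instead of running truly forever
        reply = local.chat.completions.create(model="gpt-oss-20b", messages=history)
        text = reply.choices[0].message.content
        print(text)
        if "DONE" in text:
            break
        # Feed the model its own output and ask it to keep going.
        history.append({"role": "assistant", "content": text})
        history.append({"role": "user", "content": "Continue thinking. Say DONE when finished."})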
My point is that at $100k/yr/eng inference spend, your options widen greatly.
The irony is that Kilo itself is playing the same game they're criticizing. They're burning cash on free credits (with expiry dates) and paid marketing to grab market share -- essentially subsidizing inference just like Cursor, just with VC money instead of subscription revenue.
The author is right that the "$20 → $200" subscription model is broken. But Kilo's approach of giving away $100+ in credits isn't sustainable either. Eventually, everyone has to face the same reality: frontier model inference is expensive, and someone has to pay for it.
Unless you've got a trove of self-starters with a lot of money, they aren't cost-efficient.
The $100k/dev/year figure feels like sticker-shock math more than reality. Yes, AI bills are growing fast - but most teams I see are still spending substantially less annually, and that's before applying even basic optimizations like prompt caching, model routing, or splitting work across models.
The real story is the AWS playbook all over again: vendors keep dropping unit costs, customers keep increasing consumption faster than prices fall, and in the end the bills still grow. If you’re not measuring it daily, the "marginal cost is trending down" narrative is meaningless - you’ll still get blindsided by scale.
I'm biased but the winners will be the ones who treat AI like any other cloud resource: ruthlessly measured, budgeted, and tuned.
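To make "measured, budgeted, and tuned" concrete, here's a toy routing-plus-metering sketch; the tier names, per-token prices, and routing heuristic are all made up for illustration, not real price sheets:

    # Toy model router: send short/simple prompts to a cheap tier, escalate
    # the rest, and track spend along the way.
    from dataclasses import dataclass

    @dataclass
    class Tier:
        name: str
        usd_per_1m_tokens: float

    CHEAP = Tier("small-fast-model", 0.50)      # placeholder pricing
    FRONTIER = Tier("frontier-model", 15.00)    # placeholder pricing

    spend_usd = 0.0

    def route(prompt: str) -> Tier:
        # Crude heuristic: long prompts or "hard" keywords go to the frontier tier.
        hard = any(k in prompt.lower() for k in ("refactor", "architecture", "debug"))
        return FRONTIER if hard or len(prompt) > 2_000 else CHEAP

    def record(tier: Tier, tokens: int) -> None:
        global spend_usd
        spend_usd += tokens / 1_000_000 * tier.usd_per_1m_tokens

    tier = route("Rename this variable across the file.")
    record(tier, tokens=3_000)
    print(tier.name, f"running total: ${spend_usd:.4f}")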
A fellow HN user's post I engaged with recently talked about low-hanging fruit.
What that means for me and where I'm from is some sort of devloan initiative through NGOs and government grants, where devs get access to these models/hardware and repay with some form of value.
What that is, I haven't thought that far. Thoughts?
└── Dey well
For how many developers? Chip design companies aren't paying Synopsys $250k/year per developer. Even when using formal tools which are ludicrously expensive, developers can share licenses.
In any case, the reason chip design companies pay EDA vendors these enormous sums is because there isn't really an alternative. Verilator exists, but ... there's a reason commercial EDA vendors can basically ignore it.
That isn't true for AI. Why on earth would you pay more than a full-time developer's salary on AI tokens when you could just hire another person instead? I definitely think AI improves productivity, but it's like 10-20% maybe, not 100%.
That actually probably is per developer. You might be able to reassign a seat to another developer, but that's still arguably one seat per user.
They're super opaque about pricing but I don't think it's that expensive. Apparently formal tools are way more expensive than simulation though (which makes sense), so we only had a handful of those licenses.
I managed to find a real price that someone posted:
https://www.reddit.com/r/FPGA/comments/c8z1x9/modelsim_and_q...
> Questa Prime licenses for ~$30000 USD.
That sounds way more realistic, and I guess you get decent volume discounts if you want 200 licenses.
It is based on supply and demand for GPUs: demand currently outstrips supply, while the 'frontier models' are also much more computationally efficient than last year's models in some ways, using far fewer computational resources to do the same thing.
So now that everyone wants to use frontier models in "agentic mode", with reasoning eating up a ton more tokens before settling on a result, demand is outpacing supply. But it's possible it equalizes yet again before the cycle begins anew.
Okay, but when did that ever create a comparable effect for any other kind of software dev in history?
Ultimately, this will become a people problem more than a financial problem. People that lack the confidence to code without AI will cost less to hire and dramatically more to employ, no differently than people reliant on large frameworks. All historical data indicates employers will happily eat that extra cost if it means candidates are easier to identify and select because hiring and firing remain among the most serious considerations for technology selection.
Candidates, currently thought of as 10x, that are productive without these helpers will continue to remain no more or less elusive than they are now. That means employers must weigh higher risks and higher selection costs against a potentially higher return on investment, knowing that ROI is only realized if these high-performance candidates are allowed to execute with high productivity. Employers will gladly eat increased expenses if they can qualify lower risks in candidate selection.
In my experience, a 10x developer that can code without AI becomes a 100x developer because the menial tasks they'd delegate to less-skilled employees while setting technical direction can now be delegated to an AI instead.
If your only skill is writing boilerplate in a framework, you won't be employed to do that with AI. You will not have a job at all and the 100xer will take your salary.
This will reduce demand for devs but it's super likely that after a delay, demand for software development will go even higher.
The only thing I don't know is what that demand for software development will look like. It could be included in DevOps work or IT Project Management work or whatever.
I guess we'll see in a few years.
Seriously, I don't see the AI outcome being worth that much yet.
At the current level of AI tools, the attention you need to manage 10+ async tasks is beyond the limit for most humans.
In 10 years, maybe, but $100k will probably be worth much less by then.
It's only going to get cheaper to train and run these models as time goes on. Models running on single consumer-grade PCs today were almost unthinkable four years ago.
I wonder how the economics will play out, especially when you add in all the different geographic locations for remote devs and their cost.
Why are we assuming everyone uses the full $400? Margins aren't calculated based only on the heaviest users.
And where are they pulling the 100k number from?
This doesn't make any sense to me. Why would Cursor et al expect they could pocket the difference if inference costs went down? There's no stickiness to the product; they would compete down to zero margins regardless. If anything, higher total spend is better for them because it's more to skim off of.
I think there is a case that Claude did not reduce their pricing given that they have the best coding models out there. Their recent fundraise had them disclose gross margins of 60% (and -30% on usage via Bedrock etc.). This way they can offer 2.5x more tokens at the same price as the vibe-code companies and still break even. The market movement where the assumption did not work out is that we still only have Claude, which made vibe coding work and is the most tasteful when it comes to what users want. There are probably models better at thinking and logic, especially o3, but this signals Claude's staying power (the lock-in, the popularity) and challenges the more fundamental assumption that language models are commodities.
(Speculating) Many companies would want to move away from Claude but can't because users love the models.
> This is driven by two developments: more parallel agents and more work done before human feedback is needed.