LLMs can produce better code for languages and domains I’m not proficient in, at a much faster rate, but damn it’s rare I look at LLM output and don’t spot something I’d do measurably better.
These things are average text generation machines. Yes, you can improve the output quality by writing a good prompt that activates the right weights. But if you're seeing output that is consistently better than what you produce by hand, you're probably just below average at programming. And yes, it matters sometimes. Look at the number of software bugs we're all subjected to.
And let’s not forget that code is a liability. Utilizing code that was “cheap” to generate has a cost, which I’m sure will be the subject of much conversation in the near future.
The first iteration from Claude Code is usually a big over-coded mess, but it's pretty good at iterating to clean it up, given proper instruction.
"Make the smallest possible change. Do not refactor existing code unless I explicitly ask."
That directive cut down considerably on the amount of extra changes I had to review. When it gets it right, the changes are close to the right size now.
The agent still tries to do too much, typically suggesting three tangents for every interaction.
- it adds superfluous logic based on assumptions that aren't actually necessary
- as a result the code is more complex, verbose, harder to follow
- it doesn’t quite match the domain because it makes a bunch of assumptions that aren’t true in this particular domain
These are things that can easily be missed on a first pass through the code, but they end up adding a lot of accidental complexity that bites you later.
When reading an unfamiliar code base we tend to assume that a certain bit of logic is there for a good reason, and that helps us understand what the system is trying to do. With LLM-generated codebases we can't really assume that anymore unless the code has been thoroughly audited/reviewed/rewritten, at which point I find it's easier to just write the code myself.
Coding aside, LLMs aren't very good at following nice practices in general unless explicitly prompted to. For example, if you ask an LLM to create an error modal box from scratch, will it also implement the ability to select the text, or to ctrl-C to copy the text, or perhaps a copy-message button? Maybe this is a bad example, but they usually don't do things like this unless you explicitly ask them to. I don't personally care too much about this, but I think it's noteworthy in the context of lay people using LLMs to vibe code.
So, in short, LLMs write better code than I do. I'm not alone.
This is absolutely not true lol, as anyone who's worked with a fabled 10X engineer will tell you. It's like saying the best civil engineer is the one that builds the most bridges.
The best code looks real boring.
It's so good that we are genuinely left with crappy options to replace it, and people have died in fires that could have been saved with the right application of asbestos.
Current AI hype is closer to the Radium craze back during the discovery of radioactivity. Yes it's a neat new thing that will have some interesting uses. No don't put it in everything and especially not in your food what are you doing oh my god!
Some seniors love to bikeshed PRs all day because they can do it better but generally that activity has zero actual value. Sometimes it matters, often it doesn't.
Stop with the "I could do this better by hand" and ask "is it worth the extra 4 hours to do this by hand, or is this actually good enough to meet the goals?"
There's "okay for now" and then there's "this is so crap that if we set our bar this low we'll be knee deep in tech debt in a month".
A lot of LLM output in the specific areas _I_ work in is firmly in that latter category and many times just doesn't work.
Also, even for novel domains, tools like deep research, and the ability of these tools to search the internet (including public repos) during the planning phase, are a huge level up. (You should be planning first before implementing, right? You're not just opening a window and asking, in a few sentences, for a vaguely defined final product, I hope.)
If there are repos, papers, articles, etc. on your novel domain out there, there's a viable research -> plan -> implement -> iterate path, imo, especially once you get better at giving the tools ways to evaluate their own results, rather than going back and forth yourself for hours telling them "no, this part is wrong; no, now this part is wrong", etc.
That being said, given the quality of code these things produce, I just don't see that ever stopping being the case. These things require a lot of supervision and at some point you are spending more time asking for revisions than just writing it yourself.
There's a world of difference between an MVP, which in the right domain you can get done much faster now, and a finished product.
And greenfield code is some of the most enjoyable to write, yet apparently we should let robots do the thing we enjoy the most, and reserve the most miserable tasks for humans, since the robots appear to be unable to do those.
I have yet to see an LLM or coding agent that can be prompted with "Please fix subtle bugs" or "Please retire this technical debt as described in issue #6712."
(Of course, depending on the issue, it could be doing anything from suppressing logs so existing tests pass, to making useless-but-passing tests, to brute-forcing special cases, to possibly actually fixing something.)
Just like writing assembly is today.
There's an interesting aspect to the LLM debt being taken on though in that I'm sure some are taking it on now in the bet/hopes that further advancements in LLMs will make it more easily addressable in the future before it is a real problem.
The LLM-generated code is by far the worst technical debt. And a fair bit of that time is spent debugging subtle issues where it doesn't quite do what was prompted.
Having to write all the specs and tests just right so you can regenerate the code until you get the desired output just sounds like an expensive version of the infinite monkey theorem, but with LLMs instead of monkeys.
I use LLMs to generate tests as well, but sometimes the tests are also buggy. As any competent dev knows, writing high-quality tests generally takes more time than writing the original code.
The shape of the problem is super important in considering the results here
On the other hand, a highly skilled worker who just joined the team won't have any of that tribal knowledge. There is a significant lag time getting ramped up, no matter how intelligent they are due to sheer scale (and complexity doesn't help).
A general purpose model is more like the latter than the former. It would be interesting to compare how a model fine tuned on the specific shape of your code base and problem domain performs.
What about all the other, large amounts of cases? Don't you ever face situations in which an LLM can greatly help (and outrace) you?
But well.. when working with coworkers on known projects it's a different story, right?
My stance is these tools are, of course, useful, but humans can most definitely be faster than the current iteration of these tools in a good number of tasks, and some form of debugging tasks are like that for me. The ones I've tried have been too prone to meandering and trying too many "top results on Google"-style fixes.
But hey, maybe I'm just holding it wrong! Just seems like some of my coworkers are, too.
It's let me apply my general knowledge across domains, and do things in tech stacks or languages I don't know well. But that has also cost me hours debugging a solution I don't quite understand.
When working in my core stack though it's a nice force multiplier for routine changes.
what's your core stack?
BUT
An LLM can write a PNG decoder that works in whatever language I choose in one or a few shots. I can do that too, but it will take me longer than a minute!
(and I might learn something about the png format that might be useful later..)
Also, us engineers can talk about code quality all day, but does this really matter to non-engineers? Maybe objectively it does, but can we convince them that it does?
What I got was an absolute mess that did not work at all. Perhaps this was because, in retrospect, BMP is not actually all that simple, a fact that I discovered when I did write a BMP decoder by hand. But I spent equal time vibe coding and real coding. At the end of the real coding session, I understood BMP, which I see as a benefit unto itself. This is perhaps a bit cynical but my hot take on vibe coders is that they place little value on understanding things.
The vibe coded version was a different story. For simplicity, I wanted to stick to an early version of BMP. I don’t remember the version off the top of my head. This was a simplified implementation for students to use and modify in a class setting. Sticking to early version BMPs also made it harder for students to go off-piste since random BMPs found on the internet probably would not work.
The main problem was that the LLM struggled to stick to a specific version of BMP. Some of those newer features (compression, color table, etc, if I recall correctly) have to be used in a coordinated way. The LLM made a real mess here, mixing and matching newer features with older ones. But I did not understand that this was the problem until I gave up and started writing things myself.
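For reference, here's a rough sketch (mine, not the original class code) of one way to pin down a "simple, early BMP" subset in Python: reject everything except an uncompressed 24-bit image with the classic 14-byte file header and 40-byte BITMAPINFOHEADER. The field names and the 24-bit-only restriction are my own simplifications, just to illustrate the kind of coordination between features the LLM kept getting wrong:

```python
import struct

def read_bmp_header(path):
    """Parse the fixed headers of a simple, uncompressed 24-bit BMP."""
    with open(path, "rb") as f:
        file_header = f.read(14)   # BITMAPFILEHEADER is always 14 bytes
        info_header = f.read(40)   # BITMAPINFOHEADER is 40 bytes

    magic, file_size, _, _, pixel_offset = struct.unpack("<2sIHHI", file_header)
    if magic != b"BM":
        raise ValueError("not a BMP file")

    (header_size, width, height, planes, bpp,
     compression, image_size, xppm, yppm,
     colors_used, colors_important) = struct.unpack("<IiiHHIIiiII", info_header)

    if header_size != 40 or compression != 0 or bpp != 24:
        # Anything else (color tables, RLE compression, newer header versions)
        # is exactly the feature mixing described above.
        raise ValueError("only uncompressed 24-bit BITMAPINFOHEADER BMPs supported")

    return {"width": width, "height": height, "bpp": bpp,
            "pixel_offset": pixel_offset}
```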
It sounds like you used an older model, and perhaps copy-pasted code from a chat session. (Just guessing, based on what you described.)
In short: when you produce the PNG decoder, and are satisfied with it, it's because you don't have a good reason to care about the code quality.
> Maybe objectively it does, but can we convince them that it does?
I strongly doubt it, and that's why articles like TFA project quite a bit of concern for the future. If non-engineers end up accepting results from a low-quality, not-quite-correct system, that's on them. If those results compromise credentials, corrupt databases etc., not so much.
how long would you give our current civilisation if quality of software ceased to be important for:
- medical devices
- aircraft
- railway signalling systems
- engine management systems
- the financial system
- electrical grid
- water treatment
- and every other critical system
unless "AI" dies, we're going to find outIn the unlikely event you did, you would be doing something quite special to not be using an off-the-shelf library. Would an LLM be able to do whatever that special thing would be?
It's true that quality doesn't matter for code that doesn't matter. If you're writing code that isn't important, then quality can slip, and it's true an LLM is good candidate for generating that code.
The worst case I remember happened a few months ago when a staff (!) engineer gave a presentation about benchmarks they had done between Java and Kotlin concurrency tools and how to write concurrent code. There was a very large and strange difference in performance favoring Kotlin that didn't make sense. When I dug into their code, it was clear everything had been generated by an LLM (lots of comments with emojis, for example) and the Java code was just wrong.
The competent programmers I've seen there use LLMs to generate some shell scripts, small python automations or to explore ideas. Most of the time they are unimpressed by these tools.
That's hilarious. LLM code is always very bad; its only merit is that it occasionally works.
> LLMs can produce better code for languages and domains I’m not proficient in.
I am sure that's not true.
We should feed the output code back in to get even better code.
Even though this statement does not mathematically/statistically make sense, the vast majority of SWEs are "below average." Therein lies the crux of this debate. I've been coding since the 90s and:
- LLM output is better than mine from the 90’s
- LLM output is better than mine from early 2000’s
- LLM output is worse than any of mine from 2010 onward
- LLM output (in the right hands) is better than 90% of human-written code I have seen (and I’ve seen a lot)
Funny... seems like about half of devs think AI writes good code, and half think it doesn't. When you consider that it is designed to replicate average output, that makes a lot of sense.
So, as insulting as OP's idea is, it would make sense that below-average devs are getting gains by using AI, and above-average devs aren't. In theory, this situation should raise the average output quality, but only if the training corpus isn't poisoned with AI output.
I have an anecdote that doesn't mean much on its own, but supports OP's thesis: there are two former coworkers in my linkedin feed who are heavy AI evangelists, and have drifted over the years from software engineering into senior business development roles at AI startups. Both of them are unquestionably in the top 5 worst coders I have ever worked with in 15 years, one of them having been fired for code quality and testing practices. Their coding ability, transition to less technical roles, and extremely vocal support for the power of vibe coding definitely would align with OP's uncharitable character evaluation.
They are certainly opening more PRs. Being the gate and last safety check on the PRs is certainly driving me in the opposite direction.
The moment you start the prompt with "You are an interactive CLI tool that helps users with software engineering at the level of a veteran expert" you have biased the LLM such that the tokens it produces are from a very non-average part of the distribution it's modeling.
See the examples in https://arxiv.org/abs/2305.14688. They certainly do say things like "You are a physicist specialized in atomic structure ...", but the important point is that the rest of the "expert persona" prompt _calls attention to key details_ that improve the response. The hint about electromagnetic forces in the expert persona prompt is what tipped off the model to mention it in the output.
Bringing attention to key details is what makes this work. A great tip for anyone who wants to micromanage code with an LLM is to include precise details about what they wish to micromanage: say "store it in a hash map keyed by unsigned integers" instead of letting the model decide which data structure to use.
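As a hypothetical illustration of that tip (the prompts and the keyed-by-user-id detail are mine, not from the paper):

```python
# The same request, with and without the micromanaged detail, to show how
# calling attention to specifics narrows what the model is likely to produce.
vague_prompt = "Write a function that caches user records."

specific_prompt = (
    "Write a function that caches user records. "
    "Store them in a dict keyed by the unsigned integer user id, "
    "evict nothing, and do not introduce a class or extra dependencies."
)
# The second prompt pins down the data structure and the constraints, which is
# the "bringing attention to key details" effect described above.
```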
Front end pages like a user settings page? Done. One shottable.
Nuanced data migration problems specific to your stack? You're going to be yelling at the agent.
> LLM evangelists - are you willing to admit that you just might not be that good at programming computers? Maybe you once were. Maybe you never were.
A bit harsh considering that many of us used knowledge bases like SO for so long to figure out new problems that we were confronting.
This is only one-shottable if you are a high-paced startup or you don't care enough. In real-world software, you would need to make it accessible, store data in a compliant way, hook up translations, make sure all inputs are validated, and do some usability testing.
If as a SWE you see the oncoming change and adapt to it, no issue.
If as a SWE you see the enablement of LLMs as an existential threat, then you will find many issues, and you will fail to adapt and have all kind of issues related to it.
I saw a version of this yesterday where a commenter framed LLM-skepticism as a disappointing lack of "hacker" drive and ethos that should be applied to making "AI" toolchains work.
As you might guess, I disagreed: The "hacker" is not driven just by novelty in problems to solve, but in wanting to understand them on more than a surface layer. Messing with kludgy things until they somehow work is always a part of software engineering... but the motive and payoff comes from knowing how things work, and perceiving how they could work better.
What I "fear" from LLMs-in-coding is that they will provide an unlimited flow of "mess around until it works" drudgery tasks with none of the upside. The human role will be hammering at problems which don't really have a "root cause" (except in a stochastic sense) and for which there is never any permanent or clever fix.
Would we say someone is "not really an artist" just because they don't want to spend their days reviewing generated photos for extra-fingers, circling them, and hitting the "redo" button?
We have a hard enough time finding juniors (hell, non-juniors) that know how to program and design effectively.
The industry jerking itself off over Leetcode practice already stunted the growth of many by having them focus on rote memorization and gaming interviews.
With ubiquitous AI and all of these “very smart people” pushing LLMs as an alternative to coding, I fear we’re heading into an era where people don’t understand how anything works and have never been pushed to find out.
Then again, the ability of LLMs to write boilerplate may be the reset that we need to cut out all of the people that never really had an interest in CS that have flocked to the industry over the last decade or so looking for an easy big paycheck.
I had assumed most of them had either filtered out at some stage (an early one being college intro CS classes), ended up employed somewhere that didn't seem to mind their output, or perpetually circle on LinkedIn as "Lemons" for their next prey/employer.
My gut feeling is that messy code-gen will increase their numbers rather than decrease them. LLMs make it easier to generate an illusion of constant progress, and the humans can attribute the good parts of the output to themselves, while blaming bad-parts on the LLM.
Most schools' CS departments have shifted away from letting introductory CS courses perform this function— they go out of their way to court students who are unmotivated or uninterested in computer science fundamentals. Hiring rates for computer science majors are good, so anything to up those enrollment numbers makes the school look better on average.
That's why intro courses (which were often already paced painfully slowly for anyone with talent or interest, even without any prior experience) are being split into more gradual sequences, why Python has gradually replaced Scheme virtually everywhere in schools (access to libs subordinating fundamental understanding even in academia), why the major's math requirements have been relaxed, etc.
Undergraduate computer science classrooms are increasingly full of mercenaries who not only don't give a shit about computer science, but lack basic curiosity about computation.
- computer science
- computer engineering
- software engineering
- mathematics
- some kind(s) of interdisciplinary programs that interweave computing with fine arts, liberal arts, or business, e.g.,
- digital humanities
- information science
- idk what other disciplines
and generously cross-list courses taught in one department but highly relevant in another, under multiple headings, for use as electives in adjacent minors and majors.
IIRC, when I was in school, my university only had programs in "computer science", "electrical and computer engineering", "management information systems", "mathematics", and an experimental interdisciplinary thing they called "information science, technology, and the arts". Since then, they've created a "software engineering" major, which I imagine may have alleviated some of the misalignment I saw in my computer science classes.
I loved the great range of theory classes available to me, and they were my favorite electives. If there had been more (e.g., in programming language design, type theory, or functional programming), I definitely would have taken them. But if we'd had a software engineering program, I likely would have tried to minor in that as well!
To me, it's an old-school liberal art (like geometry and arithmetic) that specialists typically pursue as a formal science (that is, a science of logical structure rather than experimentation, like mathematics or Chomskyan grammar). The engineering elements that I see as vital to computer science per se are not really software engineering in the broadest sense, but mostly about fundamentals of computing that are taught in most computer science programs already (compilers, operating systems, binary operations, basic organization of CPUs, mainframes, etc.).
My computer science program technically had only one course on software engineering per se, and I think schools should really offer more than that. In fact, I think that's not enough even within a "computer science" program. But I think the most beneficial way to provide courses of broader interest is with "clear but porous" boundaries between the various members of this cluster of related disciplines, rather than revising core computer science curricula to court students who aren't really interested in computer science per se.
I feel like it's very true to the hacker spirit to spend more time customizing your text editor than actually programming, so I guess this is just the natural extension.
1. This thing at work broke. Understand why it broke, and fix it in a way which stays and has preventative power. In the rare case where the cause is extremely shallow, like a typo, at least the fix is still reliable.
2. This thing at work broke. The LLM zigged when it should have zagged for no obvious reason. There is plausible-looking code that is wrong in a way that doesn't map to any human (mis-)understanding. Tweak it and hope for the best.
We're well past ad nauseum now. Let's talk about anything else.
Small models don't have as much world knowledge as very large models (proprietary or open source ones), but it's not always needed. They still can do a lot of stuff. OCR and image captioning, tagging, following well-defined instructions, general chat, some coding, are all things local models do pretty well.
Edit: fixed unnecessarily abrasive wording
Let's make it more of a category thing: when AI shows itself responsible for a new category of life-saving technique, like a cure for cancer or Alzheimer's, then I'd have to reconsider.
(And even then, it will be balanced against rising sea levels, extinctions, and other energy use effects.)
We’re way past that
Search through github for commits authored by .edu, .ac.uk etc emails and spend a few days understanding what they’ve been building the past few years. Once you’ve picked your jaw off the floor, take another 10 minutes to appreciate that this is just the public code by some researchers, and is crumbs compared to what is being built right now behind closed doors.
Tenured professors are abdicating their teaching positions to work on startups. Commercial labs are pouring billions into tech that was unreachable just a few years ago. Academic labs are downscaling their interns 20x. Historically hermit companies are opening their doors to build partnerships in industry.
The scale of what is happening is difficult to comprehend.
Steam reached a new peak of 42 million concurrent players today [1]. An average/mid-tier gaming PC uses 0.2 kWh per hour [2]. 42 million * 0.2 gives 8,400,000 kWh per hour, or 8,400 MWh per hour.
By contrast, training GPT3 was estimated to have used 1,300 MWh of energy [3].
This does not account for the training costs of newer models, nor for inference costs. But we know inference is extraordinarily inexpensive and energy efficient [2]. Even by a low estimate, one hour of Steam's peak concurrent player count uses roughly 6.5 times as much energy as all of the energy that went into training GPT3.
[1]: https://www.gamespot.com/articles/steam-has-already-set-a-ne...
[2]: https://jamescunliffe.co.uk/is-gen-ai-bad-for-the-environmen...
[3]: https://www.theverge.com/24066646/ai-electricity-energy-watt...
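For anyone who wants to check the arithmetic, here is the back-of-the-envelope version, using only the figures cited above:

```python
# Back-of-the-envelope check of the Steam vs. GPT-3 comparison above.
players = 42_000_000          # Steam peak concurrent players
kwh_per_player_hour = 0.2     # mid-tier gaming PC draw, per the cited estimate

steam_mwh_per_hour = players * kwh_per_player_hour / 1000   # kWh -> MWh
gpt3_training_mwh = 1_300                                    # cited estimate

print(steam_mwh_per_hour)                      # 8400.0 MWh for one hour
print(steam_mwh_per_hour / gpt3_training_mwh)  # ~6.5x GPT-3's training energy
```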
Who lied to you and told you this was some kind of saving gotcha??
I was skeptical of the LLM energy use claim. I went looking for numbers on energy usage in a domain that most people do not worry about or actively perceive as a net negative. Gaming is a very big industry ($197 billion in 2025 [1], compared to the $252 billion in private AI investment for 2025 [2]) and mostly runs on the same hardware as LLMs. So it's a good gut check.
I have not seen evidence that LLM energy usage is out of control. It appears to be much less than gaming. But please feel free to provide sources that demonstrate this lie.
The question is whether claims about AI energy use have substance, or whether there are other industries that should be more concerning. Either people are truly concerned about the cost of energy, or it's a misplaced excuse to reinforce their negative opinions.
[1]: https://gameworldobserver.com/2025/12/23/the-gaming-industry...
[2]: https://hai.stanford.edu/ai-index/2025-ai-index-report/econo...
Within five years I think the debate will be over, and I think I know what the outcome will be.
It's trivial to share coding sessions, be they horrific or great. Without those, you're hot air on the internet, independent of whatever specific opinions on LLMs you voice.
Same could be said about the anti-AI crowd.
I'm glad the author made the distinction that he's talking about LLMs, though, because far too many people these days like to shout from the rooftops about all AI being bad, totally ignoring (willfully or otherwise) important areas it's being used in like cancer research.
I am curious why the author doesn't think this saves them time (i.e. makes them more productive).
I never had terribly high output as a programmer. I certainly think LLMs have helped increased the amount of code that I can write, net total, in a year. Not to superhuman levels or even super-me levels, just me++.
But, I think the total time spent producing code has gone down to a fraction and has allowed me more time to spend thinking about what my code is meant to solve.
I wonder about two things:
1. Maybe added productivity isn't going to be found in total code produced, because there is a limit on how much useful code can be produced that is based on external factors.
2. Do some devs look at the output of an LLM and "get the ick" because they didn't write it and LLM code is often more verbose and "ugly", even though it may work? (This is a total supposition and not an accusation in any way. I also understand that poorly thought out, overly verbose code comes with problems over time.)
To answer my own question, if you can pump out features faster but turn around and spend more time on bugs than you do previously then your productivity is likely net neutral.
There is a reason LoC as a measure of productivity has been shunned from the industry for many, many years.
To try and give an example, say that you want to make a module that transforms some data and you ask the LLM to do it. It generates a module with tons of single-layer if-else branches with a huge LoC. Maybe one human dev looks at it and says, "great this solves my problem and the LoC and verbosity isn't an issue even though it is ugly". Maybe the second looks at it and says, "there's definitely some abstraction I can find to make this easier to understand and build on top of."
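To make that concrete, here is a hypothetical version of both readings of such a module (the record shapes are invented purely for illustration):

```python
# Version the LLM might produce: flat, verbose, but obvious.
def transform_flat(record):
    if record["type"] == "user":
        return {"id": record["id"], "label": record["name"].title()}
    elif record["type"] == "order":
        return {"id": record["id"], "label": f"Order #{record['number']}"}
    elif record["type"] == "invoice":
        return {"id": record["id"], "label": f"Invoice {record['number']}"}
    else:
        raise ValueError(f"unknown record type: {record['type']}")

# Version the second reviewer might refactor toward: a dispatch table.
LABELERS = {
    "user": lambda r: r["name"].title(),
    "order": lambda r: f"Order #{r['number']}",
    "invoice": lambda r: f"Invoice {r['number']}",
}

def transform_table(record):
    labeler = LABELERS.get(record["type"])
    if labeler is None:
        raise ValueError(f"unknown record type: {record['type']}")
    return {"id": record["id"], "label": labeler(record)}
```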
Depending on the scenario and context, either of them could be correct.
OTOH, for a given developer to implement a given feature in a given system, at the end of the day, some amount of code has to be written.
If a particular developer finds that AI lets him write code comparable to what he would have written, in lieu of the code he would have written, but faster than he can do it alone, then looking at lines written might actually be meaningful, just in that context.
> even though it may work?
The first of those is about taste, and it's real, and engineers with bad taste write unstable buggy systems.
The second of those is about priority. If all you want is functional code, any old thing will do. That's what I do for one-off scripts. But if you plan to support the code at 2am when exposed to production requests on the internet, you need to understand it, which is about legibility and coherence.
I hope you do have taste, and I hope you value more than simple "it works" tests. But it might be worth looking there for why some struggle with LLM output.
For what it's worth, I use coding agents all the time, but almost never accept their output verbatim outside of boilerplate code.
For those who have been around a while, dealing with the "over time" of yesteryear is a daily occurrence. So naturally they are more averse to it. And LLMs seem to dramatically shorten the duration of "over time".
Simon Willison (known for Django) has been doing a lot of LLM evangelism on his blog these days. Antirez (Redis) wrote a blog post recently with the same vibe.
I doubt they are not good programmers. They are probably better than most of us, and I doubt they feel insecure because of the LLMs. Either I'm wrong, or there's something more to this.
edit: to clarify, I'm not saying Simon and Antirez are part of the hostile LLM evangelists the article criticizes. Although the article does generalize to all LLM evangelists at least in some parts and Simon did react to this here. For these reasons, I haven't ruled him out as a target of this article, at least partly.
And yeah, as I laid out in the article (that of course, very few people actually read, even though it was short...), I really don't mind how people make code. It's those that try so hard to convince the rest of us I find very suspect.
By now we at least agree that stochastic parrots can be useful. It would be nice if the debate were now less polarized, so we could focus on what makes them work better for some and worse for others, beyond just expectations.
>Skipping AI is not going to help you or your career. Think about it.
Smart guy, but he can speak for himself.
How could this happen with local models already in service?
The article is against the set of LLM evangelists who are hostile towards the skeptics.
I 100% agree with the part that basically says fuck you to them.
However, explaining the hostility as a feeling of insecurity (which is plausible but would need evidence) is not fully convincing, and it seems dangerous to accept this conclusion and stop looking for the actual reasons this quickly.
And the fact that there are actually good programmers persuaded that LLMs help them weakens the "insecurity" argument quite a bit, at least as the only explanation.
As someone currently pretty much hostile to LLMs, I'm quite interested in what's currently at play but I'm suspicious of claims that initially feel good but are not strongly backed.
Like, if these hostile people were actually shills, we would want to know this and not have closed our eyes too early because of some explanation that felt good, right? Or whatever the actual reason is.
This article just functions as flamebait for people who use LLMs to implement whole features to argue the semantics of "vibe coding". All while everyone is ignoring the writing on the wall. That we will soon have boxes going through billions of tokens every second. At that point slopcoding WILL be productive, but only if you build up the skill to differentiate yourself from the top 10% of prompters.
There is not some new skill that you need to learn if you were already a competent programmer.
Also they didn't adopt the your-career-is-ruined-if-you-don't-get-on-board tone that is sickeningly pervasive on LinkedIn. If you believe that advice and give up on being someone who understands code, you sure aren't gonna write Redis or Django.
most top engineers will have their best work locked up in their employer's private repositories
simonw and antirez have an advantage here, and at least the former is very good at self-promotion
Questions like.. did we really even need to invoke particular influencers to discuss this issue? Why does that come up at all, and why is it the top comment? If names and argument from authority can settle issues on HN now, does it work with all credentialed authorities, or only those vocal few with certain opinions?
I understand your point and how frustrating it is trying to reason with people who think using a tool repeatedly means they understand how it works.
Also their experience is not my experience. I will make my own choices.
Here have some actual skill
https://github.com/PCBox/PCBox
This guy is half his age and keeps to himself. Pure quality.
No.
Someone had to do the implementation, after all. And the C API was (and still is) kind of a big deal.
There's a reason the standard library is full of direct ports of C libraries with unsightly, highly un-Pythonic names and APIs. (Of course, it's also full of direct ports of Java libraries with unsightly, highly un-Pythonic architecture.)
or
no, you, as an LLM evangelist, are not not willing to admit this?
Also curious if you publish your working setup or if it changes as fast as the LLMs? Seems like you may have a more stable setup than most given how you are developing tools in the space.
We've all been through The Daily WTF at least once. That's representative of the average. (Although some examples are more egregious than others.)
I'd say The Daily WTF are the more spectacularly weird and wrong, not representative; I've seen a few things deserving to be on their site in real life*, but the average I've seen has been much better than that.
It's difficult to be sure, but I think these models are roughly like someone with 1-3 years of experience, though the worst codebase(s) I've seen have been from person(s) who somehow lasted a decade or two in the industry.
* 1000 lines inside an always-true if-block, a pantheon of god classes that poorly re-invent the concept of named properties, copy-pasting class files rather than subclassing even after I'd added comments about needing to de-duplicate things in them, and that was all the same project.
Apart from my own personal anecdotal information, the state of web development is absolutely dire [^1]. Software bloat has been a problem for as long as I've been using computers. In fact, the software crisis [^2] predates _me_, and I'm an old man.
I'm sure there are people who disagree that inefficiency in software is a problem at all, or that it's even related to the original question (whether LLMs have reached the level of an average programmer). LLMs can generate an astronomical amount of bloat! Because that's what they have been trained on. Most of the code in the wild is outrageously bad.
And that's my point. LLMs absolutely have reached the level of average programmers because average programmers produce code that most critical thinkers believe is of objectively low quality.
I can't say what the average years of experience is for the developers that create these SPA behemoths. Perhaps an estimate based on the Stack Overflow Annual Developer Survey is in the right ballpark. Most respondents (21.1%) in 2025 have been coding for 6 - 10 years [^3]. This is of course across all domains, not just web development. Without detailed breakdowns, this is the best information I have on the subject.
[^1]: https://tonsky.me/blog/js-bloat/
[^2]: https://en.wikipedia.org/wiki/Software_crisis
[^3]: https://survey.stackoverflow.co/2025/developers#2-years-codi...
My current setup is mainly Claude Code CLI on macOS, plus Claude Code for web driven by the iPhone app and the macOS desktop app. I occasionally use Codex CLI too.
I expect I'll be on a different default combo of tools within a month or two.
Do you mind when people do this about topics you find interesting? If not, why are you even on HN?
In fact I would posit this is the central crux of the post: OP does not believe those LLM evangelists were ever good programmers.
As others have already noted[1], many well-known excellent programmers - including yourself! and now even Linus! - would beg to differ.
Personally I think the rate of improvement will plateau: in my experience software inevitably becomes less about tech and more about the interpersonal human soup of negotiating requirements, needs, contradictions, and feedback loops, a lot of which is not signal accessible to a text-in-text-out engine.
they said this every time too
While this place has always been attractive to people building startups, back in the day (my original account is from 2009) "Hacker" News was much more about Hackers. Most people posting here had read "On Lisp", respected Paul Graham as a programmer and were enthusiastic about programming and solving problems above all else.
I'm honestly curious how many people that visit HN today even know what a "y combinator" is, and I have a pretty reasonable guess as to how many have implemented it for fun (though probably the applicative order version).
For what it's worth, I think most of them are genuine when they say they're seeing 10X gains; they just went from, like, a 0.01X engineer (centi-SWE) to a 0.1X engineer (deci-SWE).
The whole point of the AI coding thing is that it lets inexperienced people create software. What skill moat are you building that a skilled software developer won't be able to pick up in 20 minutes?
Everyone now is driving automatic, LLMs are the manual transmission in a classic car with "character".
Yes, anyone can step into one, start it and maybe get there, but the transmission and engine will make strange noises all the way and most people just stick to the first gear because the second gear needs this weird wiggle and a trick with the gas pedal to engage properly.
Using (agentic) LLMs as coding assistants is a skill that (at the moment) can't really be taught as it's not deterministic and based a lot on feels and getting the hang of different base models (gemini, sonnet/opus, gpt, GLM etc). The only way to really learn is by doing.
Yes, anyone can start up Google Antigravity or whatever and say "build me a todo app" and you'll get one. That's just the first gear.
The attitude you present here has become my litmus test for who has actually given the agents a thorough shake rather than just a cursory glance. These agents are tools, not magic (even though they appear to be magic when they are really humming). They require operator skill to corral them. They need tweaking and iteration, often from people already skilled in the kinds of software they are trying to write. It's only then that you get the true potential, and it's only then you realize just how much more productive you can be _because of them_, not _in spite of them_. The tools are imperfect, and there are a lot of rough edges that a skilled operator can sand down to become huge boons for themselves, rather than getting cut and saying "the tools suck".
It's very much like Google. Anyone can google anything. But at a certain point, you need to understand _how to google_ to get good results, rather than just slapping in any random phrase and hoping the top 3 results will magically answer you. And sometimes you need to combine the information from multiple results to get what you are going for.
Lmao, and just as with google, they’ll eventually optimize it for ad delivery and it will become shit.
And there it is, the insecure evangelism.
"Why are you using Go? Rust is best! You should be using that!" "Don't use AWS CDK, use Terraform! Don't you know anything?"
https://knowyourmeme.com/videos/433740-just-coffee-black
> want to feel normal, to walk around and see that most other people made the same choice they made
In technology, the historical benefits of evangelizing your favorite technology might just be that it becomes more popular and better supported.
Even though LLMs may or may not follow the same path, if you can get your fellow man on-board, then you'll have a shared frame of reference, and someone to talk through different usage scenarios.
The worst thing you can say to a dev is they are wrong. Most will do everything in their power to prove otherwise, even on the dumbest of topics.
You don't hear from all the people who don't feel that others must know their opinion.
Lurkers always outweigh posters.
Don't ever make the mistake of believing that a sample of posts is a sample of people
Demonstrably impossible if you’re actually properly trying to use them in non-esoteric domains. I challenge anyone to very honestly showcase a non-esoteric domain in which opus4.5 does not make even the most experienced developer more productive.
If you say it's demonstrably impossible that someone can't be made more productive with opus4.5, then it should probably be up to you to demonstrate impossibility.
Not enough training data couldn’t be the problem - Bazel is not an esoteric domain. Unless you’re trying to do something esoteric.
lol. is this supposed to be like some sort of "gotcha"! yes? like maybe i am a really shitty programmer and always just wanted to hack things together. what it has allowed me to do is prevent burnout to some extent, outsource the "boring" parts and getting back to building things i like.
also getting tired of these extreme takes but whatever, it's #1 so mission accomplished. llms are neither this nor that. just another tool in the toolbox, one that has been frustrating in some contexts and a godsend in others, and part of the process is figuring out where it excels and where it doesn't.
I use LLMs for things I am not good at. But I also know I am not good at them.
No one is good at everything. 100% fine.
i mean if you have imposter syndrome then this feeling will always be prevalent. how do you know what you are good at or not? i might be competent enough to have progressed this far in my career as in "results", but comparison to people i consider "good" devs always puts in that doubt.
i guess it strikes a chord when someone, in the same breath as claiming to be open minded, makes a backhanded comment that people who like llms might just be shitty programmers or whatever. i get the point, but that line doesn't quite land the way you think it does.
Oh yeah, that's entirely possible. I think I mentioned it several times in the article. One always has to be open to the possibility that one is just ignorant.
> i guess it strikes a chord when someone, in the same breath as claiming to be open minded, makes a backhanded comment that people who like llms might just be shitty programmers or whatever.
It does somewhat depress me that even in a very short article, people can't figure out I am talking about the irritating LLM Evangelists, not everyone who uses LLMs. It's in the title.
> i get the point, but that line doesn't quite land the way you think it does.
Landed well enough to #1 on this hell site :)
Note I also think AI is bad for philosophical/ethical reasons.
If I can build better/faster with reasonably equal quality, I'll trade off the joy of programming for the joy of more building, of more high level problem solving and thinking, etc.
I've also seen the opposite: those that derive more joy from the programming and the cool engineering than from the product. And you see the opposite behavior from them, of course--such as selecting a solution that's cool and novel to build, rather than the simple, boring, but better alternative.
I often find this type of engineer rather frustrating to work with, and coincidentally, they seem to be the most anti-AI type I've encountered.
It's always been the case that engineers come in many flavors, some more and some less business-inclined. The difference with AI, imo, is that it will be (or already is) putting its trillion-dollar finger on the scale, such that there is less patience and space for people like me, and more for people like you.
The author's central complaint is that LLM evangelists dismiss skeptics with psychological speculation ("you're afraid of being irrelevant"). Their response? Psychological speculation ("you're projecting insecurity about your coding skills").
This is tu quoque dressed up as insight. Fighting unfounded psychoanalysis with unfounded psychoanalysis doesn't refute anything. It just levels the playing field of bad arguments.
The author gestures at this with "I am still willing to admit I am wrong" but the bulk of the piece is vibes-based counter-psychoanalysis, not engagement with evidence.
It's a well-written "no u" that mistakes self-awareness ("I know this isn't charitable") for self-correction.
LLMs have also become kind of a political issue, except only the "anti" side even really cares about it. Given that using and prompting them is very much a garbage in/garbage out scenario, people let their social and political biases cloud their usage, and instead of helping it succeed, they try to collect "gotcha" moments, which doesn't reflect the workflow of someone using an LLM productively.
I can't agree with that.
The pro-LLM side is relentlessly pushing everyone into using it, even when very few people want it (e.g. WhatsApp not even allowing people to turn it off). That smacks of insecurity to me.
Anti-LLM people are happy to continue coding/writing/drawing without the new digital tools - that sounds like someone who knows what they're doing and feels secure about their abilities.
> LLMs have also become kind of a political issue, except only the "anti" side even really cares about it
The stock market would disagree, to the tune of a large sum of money. What the "anti" side cares most about is being forced to pay for it (typically through hidden costs, such as a percentage of your pension investments) and often being forced to use "AI" despite knowing that the service is substantially worse than getting people to do the same job (e.g. "AI" support service agents).
If the tech is really all that it is hyped up to be, then let's see the actual results rather than a hundred and one blog posts that are often AI-slop themselves. (Not accusing you of posting AI-slop, just to be clear)
You just couldn't hear them over all the hype.
On the other hand, that hype did help bring a lot of investment, which begat things like desktop resin printers, prosumer-level SLS, and other advanced tech that actually can replace much existing manufacturing tech.
Current "AI"/LLM situation sure seems like a bubble, but in the end transformer-based ML is obviously insanely powerful for many domains and it's going to change and create industries. But just like there isn't a 3d printer in every house, and it doesn't seem like that will be a thing anytime soon, it doesn't seem likely that we're all going to just be prompt engineers.
The people who were the best at something won't necessarily be the best at a new paradigm. Unlearning some principles and learning new ones might be a painful exercise for some masters.
Military history has shown that the masters of the new wave are not necessarily the masters of the previous wave; we see the rise and downfall of several civilizations, from the Romans to the Greeks, for being too sure of their old methods, old military equipment, and strategy.
Just because LLMs don't work for you outside of vibe-coding, doesn't mean it's the same for everyone.
> LLM evangelists - are you willing to admit that you just might not be that good at programming computers?
Productive usage of LLMs in large-scale projects becomes viable with excellent engineering (tests, patterns, documentation, clean code), so perhaps that question should also be asked of yourself.
The article starts from the premise that LLMs are only good for vibe-coding.
It starts from the premise that the author finds LLMs are good for limited, simple tasks with small contexts and clearly defined guidelines, and specifically not good for vibe-coding.
And the author literally mentions that they aren't making universal claims about LLMs, but just speaking from personal experience.
> I genuinely don't mind if other people vibe code. Go for it!
> But that is not enough for the vocal proponents. It's the future!
The author is okay with others voicing their positive opinion about LLMs as long as it is limited to vibe coding.
It starts defining a gatekeeping threshold of what level of positive opinion is acceptable for others to have, according to the author.
Good day.
I work a lot with doctors (writing software for them), and I am very envious of their system of specialisation, e.g. this dude is such-and-such a specialist - he knows about it, listen to him. In IT it seems anyone who talks the loudest has a podium, and separating the wheat from the chaff is difficult. One day we will have a system of qualifications, I hope, but it seems a long way off.
It's a lot like why I've been bullish on Tesla's approach to FSD even as someone who owned an AP1 vehicle that objectively was NOT "self-driving" in any sense of the word: it's less about where the technology is right now, or even the speed the technology is currently improving at, and more about how the technology is now present to enable acceleration in the rate of improvement of performance, paired with the reality of us observing exactly that. Like FSD V12 to V14, the last several years in AI can only be characterized as an unprecedented rate of improvement, very much like scientific advancement throughout human society. It took us millions of years to evolve into humans. Hundreds of thousands to develop language. Tens of thousands to develop writing. Thousands to develop the printing press. Hundreds to develop typewriters. Decades to develop computers. Years to go from the 8086 to the modern workstations of today. The time horizon of tasks AI agents can now reliably perform is now doubling every 4 months, per METR.
Do frontier models know more than human experts in all domains right now? Absolutely not. But they already know far more than any individual human expert outside that human's domain(s) of expertise.
I've been passionate about technology for nearly two decades, working in the technology industry for close to a decade. I'm a security guy, not a dev. I have over half a dozen CVEs and countless private vuln disclosures. I can and do write code myself - I've been writing scripts for various network tasks for a decade before ChatGPT ever came into existence. That said, it absolutely is a better dev than me. But specialized harnesses paired with frontier models are also better security engineers than I am, dollar for dollar versus my cost. They're better pentesters than me, for the relative costs. These statements were not true at all without accounting for cost two years ago. Two years from now, I am fully expecting them to just be outright better at security engineering, pentesting, SCA than I am, without accounting for cost, yet I also expect they will cost less then than they do now.
A year ago, OpenAI's o1 was still almost brand new, test-time compute was this revolutionary new idea. Everyone thought you needed tens of billions to train a model as good as o1, it was still a week before Deepseek released R1.
Now, o1's price/performance seems like a distant bad dream. I had always joked that one quarter in tech saw as much change as like 1 year in "the real world". For AI, it feels more like we're seeing more change every month than we do every year in "the real world", and I'd bet on that accelerating, too.
I don't think experienced devs still preferring to architect and write code themselves are coping at all. I still have to fix bugs in AI-generated code myself. But I do think it's short sighted to not look at the trajectory and see the writing on the wall over the next 5 years.
Stanford's $18/hr pentester that outperforms 9/10 humans should have every pentester figuring out what they're going to be doing when it doubles in performance and halves in cost again over the next year, just like human Uber drivers should be reading Motortrend's (historically a vocal critic of Tesla and FSD) 2026 Best Driver Assistance System and figuring out what they're going to do next. Experienced devs should be looking at how quickly we came from text-davinci-003 to Opus 4.5 and considering what their economic utility will look like in 2030.
I agree with this but I just don't believe that the "promised AI" would be an LLM but some other technology.
> But specialized harnesses paired with frontier models are also better security engineers than I am, dollar for dollar versus my cost. They're better pentesters than me, for the relative costs.
Don't mind me, but some parts of cybersecurity are more brute-force in nature, where AI excels, and using that to claim frontier models are better security engineers than humans is a bit of an oversimplification. You are BETTER because of your X years of expertise in cybersecurity, and I can assure you that there is no frontier model out there with the context capacity to challenge you on that. And that makes you 10x or even 100x more cost effective (dollar-wise). And your (perceived) value will keep increasing with each added year of experience. AI can do things faster than you, but your judgement about which subtree/direction to traverse while pentesting makes you better than an LLM, and the value of this (human) judgement will increase day by day, because humans need other humans to take accountability.
"you'll be left behind if you don't learn crypto" with crypto
or
"you'll be left behind if you don't learn how to drive" with cars
One of those statements is made in good faith, and the other is made out of insecurity. But we'll probably only really be able to tell looking backwards.
Can’t speak for others but that’s not what I’d understand (or do) as vibecoding. If you’re babysitting it every inch of the way then yeah sure I can see how it might not be productive relative to doing it yourself.
If you’re constantly fighting the LLM because you have a very specific notion of what each line should look like it won’t be a good time.
Better to spec out some assumptions, some desired outcomes, tech to be used, maybe the core data structure, ask the llm what else it needs to connect the dots, add that and then let it go
Yeah, I find they are useful for large sweeping changes, introducing new features and such, mostly because they write a lot of the boilerplate, granted with some errors. But for small fiddly changes they suck; you will have a much easier time doing those changes yourself.
If it’s less than that, then it’s more like adding syntax highlighting or moving from Java to Ruby on Rails. Both of those were nice, but people weren’t breathlessly shouting about being left behind.
Of course we need a few people to get wildly overexcited about new possibilities, so they will go make all the early mistakes which show the rest of us what the new thing can and cannot actually do; likewise, we need most of us to feel skeptical and stick to what already works, so we don't all run off a cliff together by mistake.
I think that coding assistants tend to be quite good as long as what you ask is close to the training data.
Anything novel and the quality falls off rapidly.
So, if you are like Antirez and ask for a Linenoise improvement of a kind the LLM has already seen many times at training time, the result will seem magical, but that is largely an illusion, IMO.
spend 60% on A.I, 30% on Humans and 10% on operations but I can bet you my sole penny that's not happening - so we know someone is tryna sell us a polished turd as a diamond
If A.I maximalism gospel was true we would see companies raising absurd seed and A rounds in record numbers. Which is exactly what we’re seeing
It's fear, but of different kind. Those who are most aggressive and pushy about it are those who invested too much [someone else's] money in it and are scared angry investors will come for their hides when reality won't match their expectations.
> You see a lot of accomplished, prominent developers claiming they are more productive without it.
You also see a lot of accomplished, prominent developers claiming they are more productive with it, so I don't know what this is supposed to prove. The inverse argument is just as easy to make and just as spurious.
Yeah, this.
I sucked (still suck?) at it too. I spent countless hours correcting them, threw away hours of "work" they made, and even had them nuke the workspace a couple of times (thankfully, they were siloed). I still feel like I'm wasting too much time way too often, and I'm constantly trying new things.
But I always thought I can learn and improve on this tool and its associated ecosystem as much as the other programming tools and languages and frameworks I learned over the years.
Map-reading evangelists, are you willing to admit that you just might not be that good at driving a car? Maybe you once were. Maybe you never were.
I'll remain skeptical and let the technology speak for itself, if it ever does.
And one major thing is language.
Some languages (Rust, React) are so complex and nuanced that LLMs struggle with them - as do humans. Agentic LLMs will eventually solve the problem you've given them but the solution might be a bit wonky.
Compare that to LLMs writing Python or Go. With Go there's just one way to write a loop, it can't get confused with that. The way to write and format the language has been exactly the same since the beginning.
Same with Python: it's pretty lenient on how you write it (objects vs. functional), but there are well-established standards on how to do things and it's an old language (34 years, btw). Most of Python 2.x is still valid Python 3.
Instead of psychoanalyzing each other, people should share concrete examples
We need to drop this competition paradigm ASAP.
Or maybe the author is bad at programming AND bad at agentic coding.
That’s more likely than the possibility that all llm evangelists are terrible coders.
What I use LLMs for
I don't use Claude Code, Codex or any other hyped-up, VC-approved, buzzword-compliant, productivity-theater SF tool for actual day-to-day coding. I just use ChatGPT outside of my editor in browser.
The only tasks that I feel LLMs right now can do somewhat-reliably without human supervision include:
1. Search: To search and compare different things; that is boring to do manually. As an example, I checked how M1 Pro compares to M4. ChatGPT crawls the web to compare both the processors on similar specs and provides a conclusion but I need to check and verify the claims it reached by briefly reading the sources.
2. Documentation: I have to get things done on time, so I offload reading documentation for syntax, or for how to do something with a particular library, to ChatGPT.
3. Single-task scripts: I let ChatGPT create boring, one-time-use Python scripts for trivial tasks for me (a representative example follows this list).
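For illustration, here is the kind of throwaway, single-task script I mean; the task and file layout are made up:

```python
# One-off script: print the row counts of every CSV in a folder,
# sorted by size, so I can eyeball which exports look broken.
import csv
import sys
from pathlib import Path

folder = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")

counts = []
for path in sorted(folder.glob("*.csv")):
    with path.open(newline="", encoding="utf-8") as f:
        rows = sum(1 for _ in csv.reader(f))
    counts.append((rows, path.name))

for rows, name in sorted(counts, reverse=True):
    print(f"{rows:>8}  {name}")
```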
I don't spend ChatGPT's limited but free GPT-5-(latest-model) tokens for such trivial tasks - I have a special keybinding on my mac that fires up ChatGPT in incognito for me.
But don't get me wrong, I find LLMs pretty useful for PoC-level stuff and I do like instructing my agents to make me a half-baked PoC that makes me feel like a revolutionary SF-founder for an afternoon before I delete the repo and go back to writing real code like a responsible adult. Because nothing says ‘disruptive genius’ like ctrl+z and moving on.
---
Why I find vibe coding in production not useful
Like sean goedecke, I feel that using LLMs in production is not useful and is (in fact) anti-productive, because it gives junior engineers a free pass to push code without actually learning/reading it (humans often tend to choose the easiest path to a given problem). And because of this, it's a pain to review junior PRs these days.
I have two reasons why LLMs are not good at production coding:
1. Context: There are enough discussions out on the internet saying LLMs are limited by context. Humans are better because they can handle and work with multiple contexts in mind: an LLM knows how to solve the user's problem, but the engineer knows what business problem this piece of code solves, in which context the code will run, alongside which other processes on the same service and in which context of other services, which company code idioms to follow while writing it, and what will make your pair programmer think you are smart.
2. Lack of deterministic output: I would not oversimplify by saying it's just a probabilistic system, but you can feel it, right? It will sometimes give correct, decent answers and sometimes wrong answers for the same prompt. Sometimes it will enlighten you with a very interesting insight/perspective, and other times it will be as dumb as a toaster trying to file your taxes (once you've hit context rot) - consistently inconsistent, yet somehow still sold as "intelligent."
---
A brief rant
My manager fear-mongers at me nearly every day, saying LLMs are really good at coding, that there are at most 10 years left to earn as a programmer, and so on. The fact is he hasn't touched coding in the last 8 years.
He thinks his half-baked, trivial ideas translated to code by an overpriced probabilistic system are a breakthrough for his company.
I am a non-native English speaker, and you have probably sensed that by now. I often get tasks to write documentation, and every time my manager doesn't understand something, he takes it as a chance to break my confidence, saying I am bad at using LLMs for language-related tasks because my English is poor.
But (at-least right now) I strongly feel this is just a hype:
- CEOs want to get-rich-quick or they want to increase company profits by "ai-enabling" it and getting salary hikes.
- Companies are hyping this tech to recalibrate engineer salaries that have skyrocketed since covid.
- My manager wants to make me feel replaceable.
Cue Copilot (which is arguably likely not the ultimate toolchain), which can at least read the files it needs, either explicitly or implicitly - it's a whole other ballgame and it works 100x better.
That said, practically speaking, you can just ignore the evangelists and even use it as a signal for people you should ignore.
The unfortunate thing is how it's affecting our careers, since it's apparently so alluring for Management types to buy the hype and start trying to micromanage devs.
It takes practice to figure out which things the LLM handles well and how best to present your problems to the LLM to get a good result.
It takes luck that the specific things you're trying to get results for are things the LLM actually can handle well.
If the slop-o-matic next to you is delivering 5 features a week without tripping up QA and you do one every two weeks - which one will the company pick when layoffs hit again?
So many companies would be just fine with a single VPS, Python and PostgreSQL - but that's boring and doesn't look good in your resume =)