I’ve worked with plenty of developers who are happy to slam null checks everywhere to solve NREs with no thought to why the object is null or whether it should even be null there. There’s just a vibe that the null check works and solves the problem at hand.
I actually think a few folks like this can be valuable around the edges of software but whole systems built like this are a nightmare to work on. IMO AI vibe coding is an accelerant on this style of not knowing why something works but seeing what you want on the screen.
AI just automates that
I had a peer who suddenly started completing more stories for a month or two when our output had been largely equal before. They got promoted over me. I reviewed one of their PRs... what a mess. They were supposed to implement caching. Their first attempt created the cache but never stored anything in it. Their next attempt stored the data in the cache but never read from it, always retrieving from the API. They deleted that PR to hide their incompetence and opened a new one that was finally right. They were just blindly using AI to crank out their stories.
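For what it's worth, the correct version is not complicated. A minimal TypeScript sketch (hypothetical names, not the actual code from that PR) of the cache-aside behaviour the story implies: read from the cache first, fall back to the API, and store the result.

```typescript
// Hypothetical sketch of a cache-aside lookup: the two broken attempts
// either never wrote to the cache or never read from it.
const cache = new Map<string, unknown>();

async function getWithCache(
  key: string,
  fetchFromApi: (key: string) => Promise<unknown>
): Promise<unknown> {
  // 1. Read from the cache first (the second PR skipped this step).
  if (cache.has(key)) {
    return cache.get(key);
  }
  // 2. Fall back to the API on a miss.
  const value = await fetchFromApi(key);
  // 3. Store the result for next time (the first PR skipped this step).
  cache.set(key, value);
  return value;
}
```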
That team had something like 40% of capacity being spent on tech debt, rework, and bug fixes. The leadership wanted speed above all else. They even tried to fire me because they thought I was slow, even though I was doing as much or more work than my peers.
That’s a very low bar. It’s easy to get a program to compile. And if it’s interpreted, you can coast for months with no crashes, just corrupted state.
The issue is not that they can’t code, it’s that they can’t problem solve and can’t design.
It tried to sneak in changing the CI build script to proceed to next step on failure.
It's a bold approach, I'll give it that.
1. if it won't compile you'll give up on the tool in minutes or an hour.
2. if it won't run you'll give up in a few hours or a day.
3. if it sneaks in something you don't find until you're almost - or already - in production it's too late.
Charitable: the model was trained on a lot of weak/lazy code product.
Less charitable: there's a vested interest in the approach you saw.
In a complex model like Claude there is no doubt much more at work, but some version of optimizing for the wrong thing is what’s ultimately at play.
I would correct that: it's not an accelerant of "seeing what you want on the screen," it's an accelerant of "seeing something on the screen."
[Hey guys, that's a non-LLM "it's not X, it's Y"!]
Things like habitual, unthinking null checks are a recipe for subtle data errors that are extremely hard to fix because they only get noticed far away (in time and space) from the actual root cause.
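A contrived TypeScript sketch of that failure mode (invented names, not from any codebase mentioned above): the reflexive null check silently drops bad data here, and the symptom only shows up later, somewhere else.

```typescript
// Hypothetical example: a reflexive null check that hides the root cause.
interface Order { id: string; total: number | null; }

function recordOrder(order: Order, totals: number[]) {
  // Reflexive "defensive" check: the bad total is silently skipped here...
  if (order.total == null) return;
  totals.push(order.total);
}

// ...and the error only surfaces far away, when a month-end report
// doesn't add up and nobody knows which orders were dropped or why
// their totals were null in the first place.
function monthlyRevenue(totals: number[]): number {
  return totals.reduce((sum, t) => sum + t, 0);
}
```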
Using tab complete gives me the chance to generate a few lines of a solution, then stop it, correct the architectural mistakes it is making, and then move on.
To AI's credit, once corrected, it is reasonably good at using the correct approach. I would like to be able to prompt the tab completion better, and the IDEs could stand to feed the tab completion code more information from the LSP about available methods and their arguments and such, but that's a transient feature issue rather than a fundamental problem. Which is also a reason I fight the AI on this matter rather than just sitting back: In the end, AI benefits from well-organized code too. They are not infinite, they will never be infinite, and while code optimized for AI and code optimized for humans will probably never quite be the same, they are at least correlated enough that it's still worth fighting the AI tendency to spew code out that spends code quality without investing in it.
[1]: Which is less trivial than it sounds and violated by programmers on a routine basis: https://jerf.org/iri/post/2025/fp_lessons_half_constructed_o...
[2]: https://jerf.org/iri/post/2025/fp_lessons_types_as_assertion...
I barely ever use AI code gen at the file level.
Other uses I’ve gotten are:
1. It’s a great replacement for search in many cases
2. I have used it to fully generate bash functions and regexes. I think it’s useful here because the languages are dense and esoteric. So most of my time is remembering syntax. I don’t have it generate pipelines of scripts though.
Yea, this is something I've also noticed, but it never frustrated me to the point where I wanted to write about it. Playing around with Claude, I noticed it has been trained to code very defensively. Null checks everywhere. Data validation everywhere (regardless of whether the input was created by the user or is under the tight control of the developer). "If" tests for things that will never happen. It's the kind of corporate "safe" style you train junior programmers in to keep them from wrecking things too badly, but when you know what you're doing, it's just cruft.
For example, it loves to test all my C++ class member variables for null, even though there is no code path that creates an incomplete class instance, and I throw if construction fails. Yet it still happily whistles along, checking everything for null in every method, unless I correct it.
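The example above is C++, but the pattern is language-agnostic. A rough TypeScript analogue (not the author's actual code) of the same cruft: the constructor already guarantees the invariant, so the per-method re-checks the model likes to add are dead weight.

```typescript
// Rough TypeScript analogue: the constructor enforces the invariant,
// so per-method null checks are pure cruft.
class Connection {
  private readonly host: string;

  constructor(host: string) {
    // Fail construction if the invariant can't be established.
    if (!host) throw new Error("host is required");
    this.host = host;
  }

  url(path: string): string {
    // The AI-generated version tends to re-check here anyway:
    //   if (!this.host) throw new Error("host missing");
    // ...but no code path can produce an instance without a host.
    return `https://${this.host}/${path}`;
  }
}
```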
In some cases I feel like I get better quality for slightly more time than usual. My testing situation on the front end is terribly ugly because of the "test framework can't know React is done rendering" problem, but working with Junie I figured out a way to isolate object-based components and run them as real unit tests with mocks. I had some unmaintainable TypeScript which would explode with gobbledygook error messages that neither Junie nor I could understand whenever I changed anything, but after two days of talking about it and working on it, it was an amazing feeling to see the type finally make sense to me and Junie at the same time.
In cases where I would have tried one thing I can now try two or three things and keep the one I like the best. I write better comments (I don't do the Claude.md thing, but I do write "exemplar" classes that have prescriptive AND descriptive comments and say "take a look at...") and more tests than I would on my own for the backend.
Even if you don't want Junie writing a line of code, it shines at understanding code bases. When I couldn't figure out how to use an open source package from reading the docs, I've always opened it in the IDE and inspected the code. Now I do the same but ask Junie questions like "How do I do X?" or "How is feature Y implemented?" and often get answers quicker than by digging into unfamiliar code manually.
On the other hand it is sometimes "lights on and nobody home". For a particular patch I am working on now, it's tried a few things that just didn't work or produced convoluted if-then-else ladders that I hate (even after I told it I didn't like that), but out of all that fighting I got a clear idea of where to put the patch to make it really simple and clean.
But yeah, if you aren't paying attention it can slip something bad past you.
You have to be careful to tell the LLM exactly what to test for and manually check the whole suite of tests. But overall it makes me feel way more confident over increasing amounts of generated code. This of course decreases the productivity gains, but is necessary in my opinion.
And linters help.
I worked on a team where we had someone come in and help us improve our tests a lot.
The default LLM-generated tests are a bit like the ones I wrote before that experience.
No one cares what a compiler or js minifier names its variables in its output.
Yes, if you don't believe we will ever get there, then this is a totally valid complaint. You are also wrong about the future.
I'll take the other side of your bet for the next 10 years but I won't take it for the next 30 years.
In that spirit, I want my fusion reactor and my flying car.
The last 2-3 months of releases have been an unprecedented whirlwind. Code writing will be solved by the end of 2026. Architecture, maybe not, but formatting issues aren't architecture.
In 1960 they were planning nuclear powered cars and nuclear mortars.
I really dislike people comparing GenAI with compilers. Compilers largely do mechanical transformations; they make almost zero logic changes (and when they do, those are bugs).
We are in an industry that's great at throwing (developing) and really bad at catching (QA) and we've just invented the machine gun. For some reason people expect the machine gun to be great at catching, or worse, they expect to just throw things continuously and have things working as before.
There is a lot of software for which bugs (especially data handling bugs) don't meaningfully affect its users. BUT there isn't a lot of software we use daily and rely on for which that's the case.
I know that GenAI can help with QA, but I don't really see a world where using GenAI for both coding and QA gets us to where we want to go, unless, as some people say, we start using formal verification (or other very rigorous and hopefully automatable advanced verification). At that point we'll have invented a new category of programmer, and we'll need to train all of them, since the vast majority of current developers don't know about or use formal verification.
Input costs are lower and velocity is higher. You get a finished product out the door quicker, though maintenance is more expensive. Largely because the product is no longer a collection of individual parts made to be interfaced by a human. It is instead a machine-assembled good that requires a machine to perform "the work". Therefore, because the machine is only designed to assemble the good, your main recourse is to have the machine assemble a full replacement.
With that framing, there seems to be a tradeoff to bear in mind when considering fit for the problem we're meaning to solve. It also explains the widespread success of LLMs generating small scripts and MVPs. Which are largely disposable.
I'd like to say that AI just takes this to an extreme, but I'm not even sure about that. I think it could produce more code and more bugs than I could in the same amount of time, but not significantly more than if I just gave up on caring about anything.
But assume there are no bugs and the code ships. Has there been any study of resource usage creeping up and its impact on a whole system? In the tests I have done trying to build things with AI, it always seems like there is zero attention to efficiency unless you notice it and point it in the right direction.
I have been curious about the impact this will have on general computing as more low quality code makes it into applications we use every day.
This is ... not actually faster.
I think I've seen so much rush-shipped slop (before and after) that I'm really anxiously waiting for this bubble to pop.
I have yet to be convinced that AI tooling can provide more than 20% or so speedup for an expert developer working in a modern stack/language.
To give an example of how to use AI successfully, check the following post:
https://friendlybit.com/python/writing-justhtml-with-coding-...
Many folks would say that if shipping faster allows for faster iteration on an idea, then the silly errors are worth it. I’ve certainly seen a sharp increase in execs calling BS on dev teams saying they need months to develop some basic thing.
I like to think of intentionalists—people who want to understand systems—and vibe coders—people who just want things to work on screen expediently.
I think success requires a balance of both. The current problem I see with AI is that it accelerates the vibe part more than the intentionalist part and throws the system out of balance.
Nobody wants teams to ship crap, but also folks are increasingly questioning why a bit of final polishing takes so long.
Code quality can be poor as long as someone understands the tradeoffs for why it's poor.
This is so true. Software that should be simple can become gnarly because of bad infra. For example, our CI/CD team couldn't get updated versions of Python onto the CI machines, so suddenly we needed Docker for what should be a very simple piece of software. That's just one example, but you get the idea, and it causes problems to compound over the years.
You really want good people with sharp elbows laying the foundations. At one time I resented people like that, but now I have seen what happens when you don't have anyone like that making technical decisions.
Some of the teams I worked with in the years right before AI coding went mainstream had become really terrible about this. They would spend months forming committees, writing documents, getting sign-offs and approvals, creating Gantt charts, and having recurring meetings for the simplest requests.
Before I left, they were 3 months deep into meetings about setting up role based access control on a simple internal CRUD app with a couple thousand users. We needed about 2-3 roles. They were into pros and cons lists for every library and solution they found, with one of the front runners involving a lot of custom development for some reason.
Yet the entire problem could have been solved with 3 Boolean columns in the database for the 3 different roles. Any developer could have done it in an afternoon, but they were stuck in a mindset of making a big production out of the process.
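To make the "afternoon of work" version concrete, here is a hypothetical sketch (invented names) of the three-boolean-column approach described above:

```typescript
// Hypothetical sketch of the "three boolean columns" approach:
// each role is just a flag on the user row.
interface User {
  id: number;
  isAdmin: boolean;
  isEditor: boolean;
  isViewer: boolean;
}

type Role = "admin" | "editor" | "viewer";

function hasRole(user: User, role: Role): boolean {
  if (role === "admin") return user.isAdmin;
  if (role === "editor") return user.isEditor;
  return user.isViewer;
}
```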
I feel like LLMs are good at getting those easy solutions done. If the company really only needs a simple change, having an LLM break free from the molasses of devs who complicate everything is a breath of fresh air.
On the other hand, if the company had an actual complicated need with numerous and changing roles over time, the simple Boolean column approach would have been a bad idea. Having people who know when to use each solution is the real key.
The impact of having 1.7x more bugs is difficult to assess and not so easily solved. The comparison would work if this were about optimisations: code that is 1.7x slower or more memory hungry.
They sometimes can, but this is no longer a guaranteed outcome. Supercompilation optimizers can often put manual assembly to shame.
> Impact of having 1.7x more bugs is difficult to assess and is not solved that easily.
Time will tell. Arguably the number of bugs produced by AI 2 years ago was much higher than 1.7x. In 2 more years it might only be 1.2x bugs. In 4 years time it might be barely measurable. The trend over the next couple of years will judge whether this is a viable way forward.
All websites involved are vibe coded garbage that use 100% CPU in Firefox.
Coding with agents has forced me to generate more tests than we do in most startups, think through more things than we get the time to do in most startups, create more granular tasks and maintain CI/CD (my pipelines are failing and I need to fix them urgently).
These are all good things.
I have started thinking through my patterns for generating unit tests. I was mostly generating integration or end-to-end tests before. I started using helper functions in API handlers and writing unit tests for the helpers, bypassing the API-level arguments (so no API mocking or framework tests to deal with). I started breaking tasks down into smaller units, so I can pass them on to a cheaper model.
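A minimal sketch of that pattern (TypeScript, hypothetical names): the handler stays a thin wrapper and the extracted helper gets tested directly, with no HTTP layer or framework to mock.

```typescript
// Hypothetical sketch: extract the logic from the API handler into a pure
// helper so the unit test never touches the framework or HTTP layer.
export function summarizeTasks(tasks: { done: boolean }[]) {
  const completed = tasks.filter((t) => t.done).length;
  return { total: tasks.length, completed, open: tasks.length - completed };
}

// The handler stays a thin wrapper (framework-specific details omitted):
// app.get("/tasks/summary", async (req, res) => {
//   res.json(summarizeTasks(await loadTasks(req.user.id)));
// });

// The unit test calls the helper directly, no mocks needed:
// expect(summarizeTasks([{ done: true }, { done: false }]))
//   .toEqual({ total: 2, completed: 1, open: 1 });
```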
There are a few patterns in my prompts but nothing that feels out of place. I do not use agent files or MCPs. All sources here: https://github.com/brainless/nocodo (the product is itself going through a pivot, so there is that).
1. "Write a couple lines or a function that is pretty much what four years ago I would have gone to npm to solve" (e.g. "find the md5 hash of this blob")
2. "Write a function that is highly represented and sampleable in the rest of the project" (e.g. "write a function to query all posts in the database by author_id" (which might include app-specific steps like typing it into a data model)).
3. "Make this isolated needle-in-a-haystack change" (e.g. "change the text of such-and-such tooltip to XYZ") (e.g. "there's a bug with uploading files where we aren't writing the size of the file to the database, fix that")
I've found that it can definitely do wider-ranging tasks than that (e.g. implement all API routes for this new data type per this description of the resource type and desired routes); and it can absolutely work. But the problems I run into:
1. Because I don't necessarily have a grokable handle on what it generated, I don't have a sense of what it's missing and what follow-on prompts are needed. E.g.: I tell it to write an endpoint that allows users to upload files. A few days later, we realize we aren't MD5-hashing the files that got uploaded; there was a field in the database & resource type to store this value, but it didn't pick up on that, and I didn't prompt it to do this, so it's not unreasonable. But oftentimes when I'm writing routes by hand, I'm spending so much time in that function body that follow-on requirements naturally occur to me ("Oh that's right, we talked about needing this route available to both of these two permissions, crap, let me implement that"). With AI, it finishes so fast that my brain doesn't have time to remember all the requirements.
2. We've tried to mitigate this by pushing more development into the specs and requirements up-front. This is really hard to get humans to do, first of all. But more critically: None of our data supports the hypothesis that this has shortened cycle times. It mostly just trades writing typescript for reading & writing English (which few engineers I've ever worked with are actually all that good at). The engineers still end up needing long cycle times back-and-forth with the AI to get correct results, and long cycle times in review.
3. The more code you ask it to generate, the more vibeslop you get. Deeply-nested try/catch statements with multiple levels of error handling & throwing. Comments everywhere. Reimplementing the same helper functions five times. These things, we have found, raise the cost and lower the reliability & performance of future prompting, and quickly morph parts of the system into a no-man's-land (literally) where only AIs can really make any change; and every change, even by the AIs, gets harder and harder to ship. Our reported customer issues on these parts of the app are significantly higher than on others, and our ability to triage these issues is also impacted because we no longer have SMEs who can just brain-triage issues in our CS channels; everything now requires a full engineering cycle, with AI involvement, to solve.
Our engineers run the spectrum from "never wanted to touch AI, never did" to "earnestly trying to make it work". Ultimately I think the consensus position is: It's a tool that is nice to have in the toolbox, but any assertion that it's going to fundamentally change the profile of work our engineers do, or even seriously impact hiring over the long term, is outside the realm of foreseeable possibility. The models and surrounding tooling are not improving fast enough.
I suppose the tl;dr is if you're generating bugs in your flow and they make it to prod, it's not a tool problem - it's a cultural one.
But it's well worth it. It has saved me some considerable time. I let it run first, even before my own final self-review (so if others do the same, the article's data might be biased). It's particularly good at identifying dead code and logical issues. If you tune it with its own custom rules (like Claude.md), you can also cut a lot of noise.
For example: it reported missing data validation/sanitization only because the data had already been sanitized/validated elsewhere, and that wasn't visible in the diff.
You can tell CodeRabbit it's wrong about this, though, and the tool then accepts it.
The real kicker is when someone copies AI-generated code without understanding it and then 3 months later nobody can figure out why production keeps having these random issues. Debugging AI slop is its own special hell.
jjmarr•3h ago
fwiw, I agree. LLM-powered code review is a lifesaver. I don't use CodeRabbit, but all of my PRs go through Copilot before another human looks at them. It's almost always right.
bpicolo•3h ago
It’s literally right at the end of their recommendations list in the article
jjmarr•3h ago
> an article that claims AI is oddly not as bad when it comes to generating gobbledegook
Ironically, Coderabbit wants you to believe AI is worse at generating gobbledegook.
GoatInGrey•16m ago
I'm obviously taking the piss here, but the irony is amusing.
naasking•1h ago
Doesn't it seem plausible to you that, whatever the ratio of bugs in AI-generated code today, that bug count is only going to go down? Doesn't it then seem reasonable to say that programmers should start familiarizing themselves with these new tools, learning where the pitfalls are and how to avoid them?
miningape•59m ago
Yes, because:
> They are applied deterministically inside a compiler
Sorry, but an LLM randomly generating the next token isn't even comparable.
Deterministic complexity =/= randomness.
naasking•56m ago
Unless you wrote the compiler, you are 100% full of it. Even as the compiler writer you'd be wrong sometimes.
> Deterministic complexity =/= randomness.
LLMs are also deterministically complex, not random.
miningape•52m ago
You can check the source code? What's hard to understand? If you find it compiled something wrong, you can walk backwards through the code; if you want to find out what it'll do, walk forwards. LLMs have no such capability.
Sure maybe you're limited by your personal knowledge on the compiler chain, but again complexity =/= randomness.
For the same source code, and compiler version (+ flags) you get the exact same output every time. The same cannot be said of LLMs, because they use randomness (temperature).
> LLMs are also deterministically complex, not random
What exactly is the temperature setting in your LLM doing then? If you'd like to argue pseudorandom generators our computers are using aren't random - fine, I agree. But for all practical purposes they're random, especially when you don't control the seed.
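For what it's worth, a simplified sketch of what temperature does during sampling (not any vendor's actual implementation): logits are divided by the temperature before the softmax, so higher temperatures flatten the distribution and inject more randomness into the token choice.

```typescript
// Simplified sketch of temperature sampling over a model's output logits.
function sampleToken(logits: number[], temperature: number): number {
  // Temperature 0 is usually special-cased as plain argmax (greedy decoding).
  if (temperature <= 0) return logits.indexOf(Math.max(...logits));

  // Divide by temperature: T < 1 sharpens the distribution, T > 1 flattens it.
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  const probs = exps.map((e) => e / sum);

  // Sample from the resulting distribution; this is where the randomness enters.
  let r = Math.random();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r <= 0) return i;
  }
  return probs.length - 1;
}
```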
naasking•18m ago
Right, so you agree that optimization outputs are not fully predictable in complex programs, and what you're actually objecting to is that LLMs aren't like compiler optimizations in the specific ways you care about, and somehow this is supposed to invalidate my argument that they are alike in the specific ways that I outlined.
I'm not interested in litigating the minutiae of this point, programmers who treat the compiler as a black box (ie. 99% of them) see probabilistic outputs. The outputs are generally reliable according to certain criteria, but unpredictable.
LLM models are also typically probabilistic black boxes. The outputs are also unpredictable, but also somewhat reliable according to certain criteria that you can learn through use. Where the unreliability is problematic you can often make up for their pitfalls. The need for this is dropping year over year, just as the need for assembly programming to eke out performance dropped year over year of compiler development. Whether LLMs will become as reliable as compiler optimizations remains to be seen.
azemetre•1h ago
Maybe if this LLM craze were being pushed by democratic groups, where citizens are allowed to state their objections to such systems and have those objections taken seriously, it would be different. But what we currently have is business magnates who just want to get richer, with no democratic controls.
naasking•1h ago
This is not correct, plenty of programmers are seeing value in these systems and use them regularly. I'm not really sure what's undemocratic about what's going on, but that seems beside the point, we're presumably mostly programmers here talking about the technical merits and downsides of an emerging tech.
NeutralCrane•52m ago
At my company, there is absolutely no mandate to use AI tooling, but we have a very large number of engineers who are using AI tools enthusiastically simply because they want to. In my anecdotal experience, those who do tend to be much better engineers than the ones who are most skeptical or anti-AI (though it's very hard to separate how much of this is the AI tooling and how much is that naturally curious engineers looking for new ways to improve inevitably become better engineers than those who don't).
The broader point is, I think you are limiting yourself when you immediately reduce AI to snake oil being sold by "business magnates". There is surely a lot of hype that will die out eventually, but there is also a lot of potential there that you guarantee you will miss out on when you dismiss it out of hand.
azemetre•15m ago
Also add in the fact that big tech has been extremely damaging to western society for the last 20 years; there's really little reason to trust them. Especially since we see how they treat those with different opinions than them (trying to force them out of power, ostracize them publicly, or in some cases straight up poisoning people + giving them cancer).
Not really hard to see how people can be against such actions? Well buckle up bro, come post 2028 expect a massive crackdown and regulations against big tech. It's been boiling for quite a while and there's trillions of dollars to plunder for the public's benefit.
gldrk•50m ago
If you only ever target one platform, you might as well do it in assembly, it's just unfashionable. I don't believe you'd lose any 'productivity' compared to e.g. C, assuming equal amounts of experience.
naasking•15m ago
I'm skeptical, but do you think that you'd see no productivity gains for Python, Java or Haskell?
NeutralCrane•1h ago
Ironically, this response contains no critical thinking or nuance.
XenophileJKO•32m ago
If you were pro-AI doing the majority of coding a year ago, you would have been optimistically ahead of what the tech was actually capable of.
If you are strongly against AI doing the majority of coding now, you are likely well behind what the current tech is capable of.
People who were pragmatic and knowledgeable anticipated this rise in capability.