I’ve worked with plenty of developers who are happy to slam null checks everywhere to solve NREs with no thought to why the object is null or whether it should even be null there. There’s just a vibe that the null check works and solves the problem at hand.
I actually think a few folks like this can be valuable around the edges of software but whole systems built like this are a nightmare to work on. IMO AI vibe coding is an accelerant on this style of not knowing why something works but seeing what you want on the screen.
AI just automates that
I had a peer who suddenly started completing more stories for a month or two when our output had been largely equal before. They got promoted over me. I reviewed one of their PRs... what a mess. They were supposed to implement caching. Their first attempt created the cache but never stored anything in it. Their next attempt stored the data in the cache but never read from it, always retrieving from the API. They deleted that PR to hide their incompetence and opened a new one that was finally right. They were just blindly using AI to crank out their stories.
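For what it's worth, the correct version is not complicated. A minimal TypeScript sketch (hypothetical names, not the actual code from that PR) of the cache-aside behaviour the story implies: read from the cache first, fall back to the API, and store the result.

```typescript
// Hypothetical sketch of a cache-aside lookup: the two broken attempts
// either never wrote to the cache or never read from it.
const cache = new Map<string, unknown>();

async function getWithCache(
  key: string,
  fetchFromApi: (key: string) => Promise<unknown>
): Promise<unknown> {
  // 1. Read from the cache first (the second PR skipped this step).
  if (cache.has(key)) {
    return cache.get(key);
  }
  // 2. Fall back to the API on a miss.
  const value = await fetchFromApi(key);
  // 3. Store the result for next time (the first PR skipped this step).
  cache.set(key, value);
  return value;
}
```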
That team had something like 40% of capacity being spent on tech debt, rework, and bug fixes. The leadership wanted speed above all else. They even tried to fire me because they thought I was slow, even though I was doing as much or more work than my peers.
That’s a very low bar. It’s easy to get a program to compile. And if it’s interpreted, you can coast for months with no crashes, just corrupted state.
The issue is not that they can’t code, it’s that they can’t problem solve and can’t design.
It tried to sneak in changing the CI build script to proceed to next step on failure.
It's a bold approach, I'll give it that.
1. if it won't compile you'll give up on the tool in minutes or an hour.
2. if it won't run you'll give up in a few hours or a day.
3. if it sneaks in something you don't find until you're almost - or already - in production it's too late.
Charitable: the model was trained on a lot of weak/lazy code product.
Less charitable: there's a vested interest in the approach you saw.
In a complex model like Claude there is no doubt much more at work, but some version of optimizing for the wrong thing is what’s ultimately at play.
I would correct that: it's not an accelerant of "seeing what you want on the screen," it's an accelerant of "seeing something on the screen."
[Hey guys, that's a non-LLM "it's not X, it's Y"!]
Things like habitual, unthinking null checks are a recipe for subtle data errors that are extremely hard to fix because they only get noticed far away (in time and space) from the actual root cause.
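A contrived TypeScript sketch of that failure mode (invented names, not from any codebase mentioned above): the reflexive null check silently drops bad data here, and the symptom only shows up later, somewhere else.

```typescript
// Hypothetical example: a reflexive null check that hides the root cause.
interface Order { id: string; total: number | null; }

function recordOrder(order: Order, totals: number[]) {
  // Reflexive "defensive" check: the bad total is silently skipped here...
  if (order.total == null) return;
  totals.push(order.total);
}

// ...and the error only surfaces far away, when a month-end report
// doesn't add up and nobody knows which orders were dropped or why
// their totals were null in the first place.
function monthlyRevenue(totals: number[]): number {
  return totals.reduce((sum, t) => sum + t, 0);
}
```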
Using tab complete gives me the chance to generate a few lines of a solution, then stop it, correct the architectural mistakes it is making, and then move on.
To AI's credit, once corrected, it is reasonably good at using the correct approach. I would like to be able to prompt the tab completion better, and the IDEs could stand to feed the tab completion code more information from the LSP about available methods and their arguments and such, but that's a transient feature issue rather than a fundamental problem. Which is also a reason I fight the AI on this matter rather than just sitting back: In the end, AI benefits from well-organized code too. They are not infinite, they will never be infinite, and while code optimized for AI and code optimized for humans will probably never quite be the same, they are at least correlated enough that it's still worth fighting the AI tendency to spew code out that spends code quality without investing in it.
[1]: Which is less trivial than it sounds and violated by programmers on a routine basis: https://jerf.org/iri/post/2025/fp_lessons_half_constructed_o...
[2]: https://jerf.org/iri/post/2025/fp_lessons_types_as_assertion...
I barely ever use AI code gen at the file level.
Other uses I’ve gotten are:
1. It’s a great replacement for search in many cases
2. I have used it to fully generate bash functions and regexes. I think it’s useful here because the languages are dense and esoteric. So most of my time is remembering syntax. I don’t have it generate pipelines of scripts though.
Yea, this is something I've also noticed, but it never frustrated me to the point where I wanted to write about it. Playing around with Claude, I noticed it has been trained to code very defensively. Null checks everywhere. Data validation everywhere (regardless of whether the input was created by the user or is under the tight control of the developer). "If" tests for things that will never happen. It's the kind of corporate "safe" style you train junior programmers in to keep them from wrecking things too badly, but when you know what you're doing, it's just cruft.
For example, it loves to test all my C++ class member variables for null, even though there is no code path that creates an incomplete class instance, and I throw if construction fails. Yet it still happily whistles along, checking everything for null in every method, unless I correct it.
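The example above is C++, but the pattern is language-agnostic. A rough TypeScript analogue (not the author's actual code) of the same cruft: the constructor already guarantees the invariant, so the per-method re-checks the model likes to add are dead weight.

```typescript
// Rough TypeScript analogue: the constructor enforces the invariant,
// so per-method null checks are pure cruft.
class Connection {
  private readonly host: string;

  constructor(host: string) {
    // Fail construction if the invariant can't be established.
    if (!host) throw new Error("host is required");
    this.host = host;
  }

  url(path: string): string {
    // The AI-generated version tends to re-check here anyway:
    //   if (!this.host) throw new Error("host missing");
    // ...but no code path can produce an instance without a host.
    return `https://${this.host}/${path}`;
  }
}
```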
In some cases I feel like I get better quality for slightly more time than usual. My testing situation on the front end is terribly ugly because of the "test framework can't know React is done rendering" problem, but working with Junie I figured out a way to isolate object-based components and run them as real unit tests with mocks. I had some unmaintainable TypeScript which would explode with gobbledygook error messages that neither Junie nor I could understand whenever I changed anything, but after two days of talking about it and working on it, it was an amazing feeling to see the type finally make sense to me and Junie at the same time.
In cases where I would have tried one thing I can now try two or three things and keep the one I like the best. I write better comments (I don't do the Claude.md thing, but I do write "exemplar" classes that have prescriptive AND descriptive comments and say "take a look at...") and more tests than I would on my own for the backend.
Even if you don't want Junie writing a line of code, it shines at understanding code bases. When I couldn't figure out how to use an open source package from reading the docs, I've always opened it in the IDE and inspected the code. Now I do the same but ask Junie questions like "How do I do X?" or "How is feature Y implemented?" and often get answers quicker than by digging into unfamiliar code manually.
On the other hand it is sometimes "lights on and nobody home". For a particular patch I am working on now, it's tried a few things that just didn't work or produced convoluted if-then-else ladders that I hate (even after I told it I didn't like that), but out of all that fighting I got a clear idea of where to put the patch to make it really simple and clean.
But yeah, if you aren't paying attention it can slip something bad past you.
You have to be careful to tell the LLM exactly what to test for and manually check the whole suite of tests. But overall it makes me feel way more confident over increasing amounts of generated code. This of course decreases the productivity gains, but is necessary in my opinion.
And linters help.
I worked on a team where we had someone come in and help us improve our tests a lot.
The default LLM-generated tests are a bit like the ones I wrote before that experience.
No one cares what a compiler or js minifier names its variables in its output.
Yes, if you don't believe we will ever get there, then this is a totally valid complaint. You are also wrong about the future.
I'll take the other side of your bet for the next 10 years but I won't take it for the next 30 years.
In that spirit, I want my fusion reactor and my flying car.
The last 2-3 months of releases have been an unprecedented whirlwind. Code writing will be solved by the end of 2026. Architecture, maybe not, but formatting issues aren't architecture.
In 1960 they were planning nuclear powered cars and nuclear mortars.
I really dislike people comparing GenAI with compilers. Compilers largely do mechanical transformations; they make almost zero logic changes (and when they do, those are bugs).
We are in an industry that's great at throwing (developing) and really bad at catching (QA) and we've just invented the machine gun. For some reason people expect the machine gun to be great at catching, or worse, they expect to just throw things continuously and have things working as before.
There is a lot of software for which bugs (especially data handling bugs) don't meaningfully affect its users. BUT there isn't a lot of software we use daily and rely on for which that's the case.
I know that GenAI can help with QA, but I don't really see a world where using GenAI for both coding and QA gets us to where we want to go, unless, as some people say, we start using formal verification (or other very rigorous and hopefully automatable advanced verification). At that point we'll have invented a new category of programmer, and we'll need to train all of them, since the vast majority of current developers don't know about or use formal verification.
Input costs are lower and velocity is higher. You get a finished product out the door quicker, though maintenance is more expensive. Largely because the product is no longer a collection of individual parts made to be interfaced by a human. It is instead a machine-assembled good that requires a machine to perform "the work". Therefore, because the machine is only designed to assemble the good, your main recourse is to have the machine assemble a full replacement.
With that framing, there seems to be a tradeoff to bear in mind when considering fit for the problem we're meaning to solve. It also explains the widespread success of LLMs generating small scripts and MVPs. Which are largely disposable.
I'd like to say that AI just takes this to an extreme, but I'm not even sure about that. I think it could produce more code and more bugs than I could in the same amount of time, but not significantly more than if I just gave up on caring about anything.
But assume there are no bugs and the code ships. Has there been any study of resource usage creeping up and its impact on a whole system? In the tests I have done trying to build things with AI, it always seems like there is zero attention to efficiency unless you notice it and point it in the right direction.
I have been curious about the impact this will have on general computing as more low quality code makes it into applications we use every day.
This is ... not actually faster.
I think I've seen so much rush-shipped slop (before and after) that I'm really anxiously waiting for this bubble to pop.
I have yet to be convinced that AI tooling can provide more than 20% or so speedup for an expert developer working in a modern stack/language.
To give an example of how to use AI successfully, check the following post:
https://friendlybit.com/python/writing-justhtml-with-coding-...
Many folks would say that if shipping faster allows for faster iteration on an idea, then the silly errors are worth it. I’ve certainly seen a sharp increase in execs calling BS on dev teams saying they need months to develop some basic thing.
I like to think of intentionalists—people who want to understand systems—and vibe coders—people who just want things to work on screen expediently.
I think success requires a balance of both. The current problem I see with AI is that it accelerates the vibe part more than the intentionalist part and throws the system out of balance.
Nobody wants teams to ship crap, but also folks are increasingly questioning why a bit of final polishing takes so long.
Code quality can be poor as long as someone understands the tradeoffs for why it's poor.
This is so true. Software that should be simple can become gnarly because of bad infra. For example, our CI/CD team couldn't get updated versions of Python onto the CI machines, so suddenly we needed Docker for what should be a very simple piece of software. That's just one example, but you get the idea, and it causes problems to compound over the years.
You really want good people with sharp elbows laying the foundations. At one time I resented people like that, but now I have seen what happens when you don't have anyone like that making technical decisions.
Some of the teams I worked with in the years right before AI coding went mainstream had become really terrible about this. They would spend months forming committees, writing documents, getting sign-offs and approvals, creating Gantt charts, and having recurring meetings for the simplest requests.
Before I left, they were 3 months deep into meetings about setting up role based access control on a simple internal CRUD app with a couple thousand users. We needed about 2-3 roles. They were into pros and cons lists for every library and solution they found, with one of the front runners involving a lot of custom development for some reason.
Yet the entire problem could have been solved with 3 Boolean columns in the database for the 3 different roles. Any developer could have done it in an afternoon, but they were stuck in a mindset of making a big production out of the process.
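To make the "afternoon of work" version concrete, here is a hypothetical sketch (invented names) of the three-boolean-column approach described above:

```typescript
// Hypothetical sketch of the "three boolean columns" approach:
// each role is just a flag on the user row.
interface User {
  id: number;
  isAdmin: boolean;
  isEditor: boolean;
  isViewer: boolean;
}

type Role = "admin" | "editor" | "viewer";

function hasRole(user: User, role: Role): boolean {
  if (role === "admin") return user.isAdmin;
  if (role === "editor") return user.isEditor;
  return user.isViewer;
}
```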
I feel like LLMs are good at getting those easy solutions done. If the company really only needs a simple change, having an LLM break free from the molasses of devs who complicate everything is a breath of fresh air.
On the other hand, if the company had an actual complicated need with numerous and changing roles over time, the simple Boolean column approach would have been a bad idea. Having people who know when to use each solution is the real key.
The impact of having 1.7x more bugs is difficult to assess and not so easily solved. The comparison would work if this were about optimisations: code that is 1.7x slower or more memory hungry.
They sometimes can, but this is no longer a guaranteed outcome. Supercompilation optimizers can often put manual assembly to shame.
> Impact of having 1.7x more bugs is difficult to assess and is not solved that easily.
Time will tell. Arguably the number of bugs produced by AI 2 years ago was much higher than 1.7x. In 2 more years it might only be 1.2x bugs. In 4 years time it might be barely measurable. The trend over the next couple of years will judge whether this is a viable way forward.
All websites involved are vibe coded garbage that use 100% CPU in Firefox.
Coding with agents has forced me to generate more tests than we do in most startups, think through more things than we get the time to do in most startups, create more granular tasks and maintain CI/CD (my pipelines are failing and I need to fix them urgently).
These are all good things.
I have started thinking through my patterns for generating unit tests. I was mostly generating integration or end-to-end tests before. I started using helper functions in API handlers and writing unit tests for the helpers, bypassing the API-level arguments (so no API mocking or framework tests to deal with). I started breaking tasks down into smaller units, so I can pass them on to a cheaper model.
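A minimal sketch of that pattern (TypeScript, hypothetical names): the handler stays a thin wrapper and the extracted helper gets tested directly, with no HTTP layer or framework to mock.

```typescript
// Hypothetical sketch: extract the logic from the API handler into a pure
// helper so the unit test never touches the framework or HTTP layer.
export function summarizeTasks(tasks: { done: boolean }[]) {
  const completed = tasks.filter((t) => t.done).length;
  return { total: tasks.length, completed, open: tasks.length - completed };
}

// The handler stays a thin wrapper (framework-specific details omitted):
// app.get("/tasks/summary", async (req, res) => {
//   res.json(summarizeTasks(await loadTasks(req.user.id)));
// });

// The unit test calls the helper directly, no mocks needed:
// expect(summarizeTasks([{ done: true }, { done: false }]))
//   .toEqual({ total: 2, completed: 1, open: 1 });
```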
There are a few patterns in my prompts but nothing that feels out of place. I do not use agent files or MCPs. All sources here: https://github.com/brainless/nocodo (the product is itself going through a pivot, so there is that).
1. "Write a couple lines or a function that is pretty much what four years ago I would have gone to npm to solve" (e.g. "find the md5 hash of this blob")
2. "Write a function that is highly represented and sampleable in the rest of the project" (e.g. "write a function to query all posts in the database by author_id" (which might include app-specific steps like typing it into a data model)).
3. "Make this isolated needle-in-a-haystack change" (e.g. "change the text of such-and-such tooltip to XYZ") (e.g. "there's a bug with uploading files where we aren't writing the size of the file to the database, fix that")
I've found that it can definitely do wider-ranging tasks than that (e.g. implement all API routes for this new data type per this description of the resource type and desired routes); and it can absolutely work. But the problems I run into:
1. Because I don't necessarily have a grokable handle on what it generated, I don't have a sense of what it's missing and what follow-on prompts are needed. E.g.: I tell it to write an endpoint that allows users to upload files. A few days later, we realize we aren't MD5-hashing the files that got uploaded; there was a field in the database & resource type to store this value, but it didn't pick up on that, and I didn't prompt it to do this, so it's not unreasonable. But oftentimes when I'm writing routes by hand, I'm spending so much time in that function body that follow-on requirements naturally occur to me ("Oh that's right, we talked about needing this route available to both of these two permissions, crap, let me implement that"). With AI, it finishes so fast that my brain doesn't have time to remember all the requirements.
2. We've tried to mitigate this by pushing more development into the specs and requirements up-front. This is really hard to get humans to do, first of all. But more critically: None of our data supports the hypothesis that this has shortened cycle times. It mostly just trades writing typescript for reading & writing English (which few engineers I've ever worked with are actually all that good at). The engineers still end up needing long cycle times back-and-forth with the AI to get correct results, and long cycle times in review.
3. The more code you ask it to generate, the more vibeslop you get. Deeply-nested try/catch statements with multiple levels of error handling & throwing. Comments everywhere. Reimplementing the same helper functions five times. These things, we have found, raise the cost and lower the reliability & performance of future prompting, and quickly morph parts of the system into a no-man's-land (literally) where only AIs can really make any change; and every change, even by the AIs, gets harder and harder to ship. Our reported customer issues on these parts of the app are significantly higher than on others, and our ability to triage these issues is also impacted because we no longer have SMEs who can just brain-triage issues in our CS channels; everything now requires a full engineering cycle, with AI involvement, to solve.
Our engineers run the spectrum from "never wanted to touch AI, never did" to "earnestly trying to make it work". Ultimately I think the consensus position is: It's a tool that is nice to have in the toolbox, but any assertion that it's going to fundamentally change the profile of work our engineers do, or even seriously impact hiring over the long term, is outside the realm of foreseeable possibility. The models and surrounding tooling are not improving fast enough.
I suppose the tl;dr is if you're generating bugs in your flow and they make it to prod, it's not a tool problem - it's a cultural one.
But it's well worth it. It has saved me some considerable time. I let it run first, even before my own final self-review (so if others do the same, the article's data might be biased). It's particularly good at identifying dead code and logical issues. If you tune it with its own custom rules (like Claude.md), you can also cut a lot of noise.
For example: it reported missing data validation/sanitization only because the data had already been sanitized/validated elsewhere, and that wasn't visible in the diff.
You can tell CodeRabbit it's wrong about this, though, and the tool then accepts it.
The real kicker is when someone copies AI-generated code without understanding it and then 3 months later nobody can figure out why production keeps having these random issues. Debugging AI slop is its own special hell.
jjmarr•3h ago
fwiw, I agree. LLM-powered code review is a lifesaver. I don't use CodeRabbit, but all of my PRs go through Copilot before another human looks at them. It's almost always right.
bpicolo•3h ago
It’s literally right at the end of their recommendations list in the article
jjmarr•3h ago
> an article that claims AI is oddly not as bad when it comes to generating gobbledegook
Ironically, Coderabbit wants you to believe AI is worse at generating gobbledegook.
GoatInGrey•16m ago
I'm obviously taking the piss here, but the irony is amusing.
naasking•1h ago
Doesn't it seem plausible to you that, whatever the ratio of bugs in AI-generated code today, that bug count is only going to go down? Doesn't it then seem reasonable to say that programmers should start familiarizing themselves with these new tools, learning where the pitfalls are and how to avoid them?
miningape•59m ago
Yes, because:
> They are applied deterministically inside a compiler
Sorry, but an LLM randomly generating the next token isn't even comparable.
Deterministic complexity =/= randomness.
naasking•56m ago
Unless you wrote the compiler, you are 100% full of it. Even as the compiler writer you'd be wrong sometimes.
> Deterministic complexity =/= randomness.
LLMs are also deterministically complex, not random.
miningape•52m ago
You can check the source code? What's hard to understand? If you find it compiled something wrong, you can walk backwards through the code; if you want to find out what it'll do, walk forwards. LLMs have no such capability.
Sure maybe you're limited by your personal knowledge on the compiler chain, but again complexity =/= randomness.
For the same source code, and compiler version (+ flags) you get the exact same output every time. The same cannot be said of LLMs, because they use randomness (temperature).
> LLMs are also deterministically complex, not random
What exactly is the temperature setting in your LLM doing then? If you'd like to argue pseudorandom generators our computers are using aren't random - fine, I agree. But for all practical purposes they're random, especially when you don't control the seed.
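For what it's worth, a simplified sketch of what temperature does during sampling (not any vendor's actual implementation): logits are divided by the temperature before the softmax, so higher temperatures flatten the distribution and inject more randomness into the token choice.

```typescript
// Simplified sketch of temperature sampling over a model's output logits.
function sampleToken(logits: number[], temperature: number): number {
  // Temperature 0 is usually special-cased as plain argmax (greedy decoding).
  if (temperature <= 0) return logits.indexOf(Math.max(...logits));

  // Divide by temperature: T < 1 sharpens the distribution, T > 1 flattens it.
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  const probs = exps.map((e) => e / sum);

  // Sample from the resulting distribution; this is where the randomness enters.
  let r = Math.random();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r <= 0) return i;
  }
  return probs.length - 1;
}
```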
naasking•18m ago
Right, so you agree that optimization outputs are not fully predictable in complex programs, and what you're actually objecting to is that LLMs aren't like compiler optimizations in the specific ways you care about, and somehow this is supposed to invalidate my argument that they are alike in the specific ways that I outlined.
I'm not interested in litigating the minutiae of this point, programmers who treat the compiler as a black box (ie. 99% of them) see probabilistic outputs. The outputs are generally reliable according to certain criteria, but unpredictable.
LLM models are also typically probabilistic black boxes. The outputs are also unpredictable, but also somewhat reliable according to certain criteria that you can learn through use. Where the unreliability is problematic you can often make up for their pitfalls. The need for this is dropping year over year, just as the need for assembly programming to eke out performance dropped year over year of compiler development. Whether LLMs will become as reliable as compiler optimizations remains to be seen.
azemetre•1h ago
Maybe if this LLM craze were being pushed by democratic groups, where citizens are allowed to state their objections to such systems and have those objections taken seriously, it would be different. But what we currently have is business magnates who just want to get richer, with no democratic controls.
naasking•1h ago
This is not correct, plenty of programmers are seeing value in these systems and use them regularly. I'm not really sure what's undemocratic about what's going on, but that seems beside the point, we're presumably mostly programmers here talking about the technical merits and downsides of an emerging tech.
NeutralCrane•52m ago
At my company, there is absolutely no mandate to use AI tooling, but we have a very large number of engineers who are using AI tools enthusiastically simply because they want to. In my anecdotal experience, those who do tend to be much better engineers than the ones who are most skeptical or anti-AI (though it's very hard to separate how much of this is the AI tooling and how much is that naturally curious engineers looking for new ways to improve inevitably become better engineers than those who don't).
The broader point is, I think you are limiting yourself when you immediately reduce AI to snake oil being sold by "business magnates". There is surely a lot of hype that will die out eventually, but there is also a lot of potential there that you guarantee you will miss out on when you dismiss it out of hand.
azemetre•15m ago
Also add in the fact that big tech has been extremely damaging to western society for the last 20 years; there's really little reason to trust them. Especially since we see how they treat those with different opinions than them (trying to force them out of power, ostracize them publicly, or in some cases straight up poisoning people + giving them cancer).
Not really hard to see how people can be against such actions? Well buckle up bro, come post 2028 expect a massive crackdown and regulations against big tech. It's been boiling for quite a while and there's trillions of dollars to plunder for the public's benefit.
gldrk•50m ago
If you only ever target one platform, you might as well do it in assembly, it's just unfashionable. I don't believe you'd lose any 'productivity' compared to e.g. C, assuming equal amounts of experience.
naasking•15m ago
I'm skeptical, but do you think that you'd see no productivity gains for Python, Java or Haskell?
NeutralCrane•1h ago
Ironically, this response contains no critical thinking or nuance.
XenophileJKO•32m ago
If you were pro-AI doing the majority of coding a year ago, you would have been optimistically ahead of what the tech was actually capable of.
If you are strongly against AI doing the majority of coding now, you are likely well behind what the current tech is capable of.
People who were pragmatic and knowledgeable anticipated this rise in capability.