It's basically outsourcing to mediocre programmers - albeit very fast ones with near-infinite patience and little to no ego.
Does the study normalize velocity between the groups by adjusting the timeframes, so we can tell whether complexity and warnings increased at a greater rate per line of code added in the AI group?
I suspect it would, since I've had to simplify AI-generated code on several occasions. But right now the study just seems to say that the larger a codebase grows, the more complex it gets, which is obvious.
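Roughly the normalization I have in mind, as a quick sketch (the numbers are made up, purely illustrative):

    # Compare complexity growth per line of code added, so the two
    # groups are judged at equal code volume rather than equal time.
    def complexity_per_loc(complexity, loc):
        """Average complexity increase per line of code added."""
        d_complexity = complexity[-1] - complexity[0]
        d_loc = loc[-1] - loc[0]
        return d_complexity / d_loc if d_loc else 0.0

    # Hypothetical weekly snapshots for each group:
    ai_rate = complexity_per_loc([120, 180, 300], [10_000, 16_000, 24_000])
    human_rate = complexity_per_loc([120, 150, 170], [10_000, 12_500, 15_000])
    print(f"AI: {ai_rate:.4f}/LoC, human: {human_rate:.4f}/LoC")

If the AI group's per-LoC rate is higher, complexity isn't just tracking size.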
This doesn't equate to faster development speed in my eyes. We know that AI code is incredibly verbose.
More lines of code don't equate to faster development - even less so when you're comparing apples (human-written) to oranges (AI-written).
Then there's the question of whether LoC is a reliable proxy for velocity at all. The common belief among developers is that it's not.
Of course, that doesn't take into account the useful high-level views and other advantages of IDEs that might mitigate slop during review, but overall Cursor was a more natural fit for vibe-coders.
This is said without judgement - I was a cheerleader for Cursor early on until it became uncompetitive in value.
So overall it seems like the pros and cons of "AI vibe coding" just cancel each other out.
Traditional software dev would be build, test, refactor, commit.
Even The Clean Coder recommends starting with messy code, then tidying it up.
We just need to apply traditional methods to AI-assisted coding.
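Concretely, something like this could gate AI-generated changes (a minimal sketch assuming pytest and git; adapt to your stack):

    # Build/test gate: an AI-assisted change only gets committed
    # if the full test suite passes.
    import subprocess
    import sys

    def tests_pass() -> bool:
        """Run the suite; True only on a clean pass."""
        return subprocess.run(["pytest", "-q"]).returncode == 0

    def commit(message: str) -> None:
        subprocess.run(["git", "add", "-A"], check=True)
        subprocess.run(["git", "commit", "-m", message], check=True)

    if __name__ == "__main__":
        # 1. Let the assistant generate or refactor code.
        # 2. Build and test.  3. Tidy.  4. Commit only if green.
        if tests_pass():
            commit(sys.argv[1] if len(sys.argv) > 1 else "AI-assisted change")
        else:
            sys.exit("Tests failed; refactor before committing.")

Same loop as always; the assistant just fills in the "write code" step.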
rfw300•3h ago
If a module becomes unsustainably complex, I can ask Claude questions about it, have it write tests and scripts that empirically demonstrate the code's behavior, and, if worse comes to worst, rip out that code entirely and replace it with something better in a fraction of the time it used to take.
That's not to say complexity isn't bad anymore—the paper's findings on diminishing returns on velocity seem well-grounded and plausible. But while the newest (post-Nov. 2025) models often make inadvisable design decisions, they rarely do things that are outright wrong or hallucinated anymore. That makes them much more useful for cleaning up old messes.
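For example, before a rip-and-replace I'll have it pin down the old behavior with characterization tests, roughly like this (contrived module and function names):

    # Characterization tests: whatever the legacy code does today is,
    # by definition, "correct" for a drop-in replacement. Run these
    # against the old module first, then against the rewrite.
    import pytest
    from legacy import parse_record  # hypothetical module being replaced

    CASES = [
        ("id=1;name=ada", {"id": "1", "name": "ada"}),
        ("", {}),               # edge case: empty input
        ("id=2", {"id": "2"}),  # edge case: single field
    ]

    @pytest.mark.parametrize("raw,expected", CASES)
    def test_parse_record_behavior(raw, expected):
        assert parse_record(raw) == expected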
SR2Z•2h ago
In theory, experienced humans introduce fewer bugs. That sounds reasonable and believable, but anyone who's ever been paid to write software knows that finding reliable humans is not an easy task unless you're at a large established company.
MeetingsBrowser•2h ago
In my experience, they are not even close.
MeetingsBrowser•50m ago
I would frame it differently. There are developers successfully shipping product X. Those developers are, on average, as skilled as necessary to work on project X; otherwise they would have moved on or the project would have failed.
Can LLMs produce the same level of quality as project X developers? The only projects I know of where this is true are toy and hobby projects.
mathgeek•21m ago
Of course not; you have switched “quality” in this statement to modify the developer instead of their work. Regarding the work, each project, as you agreed in your reply, has an average quality for its code. Some developers bring that down on the whole, others bring it up. An LLM would sit somewhere on that spectrum.
MeetingsBrowser•2h ago
It's the same reason one senior + one junior engineer is about as fast as one senior + 100 junior engineers. The senior's review time becomes the bottleneck and does not scale.
And even with the latest models and tooling, the quality of the code is below what I expect from a junior. But you sure can get it fast.
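Back-of-the-envelope version of why that is (made-up rates, but the shape is the point):

    # Throughput is capped by the slowest stage; past the reviewer's
    # capacity, extra producers (junior or LLM) add nothing but queue.
    def merged_per_day(producers: int, prs_each: float,
                       review_capacity: float) -> float:
        return min(producers * prs_each, review_capacity)

    REVIEW_CAPACITY = 8.0  # PRs one senior can properly review per day (assumed)
    for n in (1, 5, 100):
        print(n, "producers ->", merged_per_day(n, 2.0, REVIEW_CAPACITY), "PRs/day")
    # 1 -> 2.0, 5 -> 8.0, 100 -> 8.0: output flatlines once review saturates.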
phillipclapham•26m ago
I've been doing 10-12 hour days paired with Claude for months. The velocity gains are absolutely real; I am shipping things I would never have attempted solo before AI, and shipping them faster than ever. BUT the cognitive cost of reviewing AI output is significantly higher than reviewing human code. It's verbose, plausible-looking, and wrong in ways that require sustained deep attention to catch.
The study found a "transient velocity increase" followed by a "persistent complexity increase." That matches my experience exactly. The speed feels incredible at first, then the review burden compounds and you're spending more time verifying than you saved generating.
The fix isn't "apply traditional methods"; it's recognizing that AI shifts the bottleneck from production to verification, and that verification under sustained cognitive load degrades in ways nobody's measuring yet. I think I've found some fixes that help me personally, and for me velocity is still high, but only time will tell whether that holds up.
i_love_retros•1h ago
Just make sure it hasn't mocked so many things that nothing is actually being tested, which I've witnessed.
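The failure mode in miniature (a contrived Python sketch, but I've seen tests shaped exactly like this):

    # Every collaborator is mocked, so the assertions can only
    # re-state the mock setup. Delete the amount logic entirely
    # and this test still passes.
    from unittest.mock import MagicMock

    def charge_customer(gateway, db, customer_id, amount_cents):
        """Code under test: charge a card, record the receipt."""
        result = gateway.charge(customer_id, amount_cents)
        db.save_receipt(customer_id, result)
        return result

    def test_charge_customer_tests_nothing():
        gateway, db = MagicMock(), MagicMock()
        gateway.charge.return_value = {"status": "ok"}
        assert charge_customer(gateway, db, "cust_1", 999) == {"status": "ok"}
        gateway.charge.assert_called_once()   # passes even if the amount is wrong
        db.save_receipt.assert_called_once()  # no real behavior exercised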
moregrist•49m ago
You have to actually care about quality with these power saws or you end up with poorly-fitting cabinets and might even lose a thumb in the process.
AlexandrB•59m ago
This is the same pattern I observed with IDEs. Autocomplete and being able to jump to a definition mean spaghetti code can be successfully navigated, so there's no "natural" barrier to writing it.