It's basically outsourcing to mediocre programmers - albeit very fast ones with near-infinite patience and little to no ego.
Does the study normalize velocity between the groups by adjusting the timeframes, so we can tell whether complexity and warnings increased at a greater rate per line of code added in the AI group?
I suspect it would, since I've had to simplify AI-generated code on several occasions. But right now the study just seems to say that the larger a codebase grows, the more complex it gets, which is obvious.
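Roughly the normalization I have in mind, as a quick sketch (the numbers are made up, purely illustrative):

    # Compare complexity growth per line of code added, so the two
    # groups are judged at equal code volume rather than equal time.
    def complexity_per_loc(complexity, loc):
        """Average complexity increase per line of code added."""
        d_complexity = complexity[-1] - complexity[0]
        d_loc = loc[-1] - loc[0]
        return d_complexity / d_loc if d_loc else 0.0

    # Hypothetical weekly snapshots for each group:
    ai_rate = complexity_per_loc([120, 180, 300], [10_000, 16_000, 24_000])
    human_rate = complexity_per_loc([120, 150, 170], [10_000, 12_500, 15_000])
    print(f"AI: {ai_rate:.4f}/LoC, human: {human_rate:.4f}/LoC")

If the AI group's per-LoC rate is higher, complexity isn't just tracking size.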
This doesn't equate to faster development speed in my eyes. We know that AI code is incredibly verbose.
More lines of code don't equate to faster development - even less so when you're comparing apples (human-written) to oranges (AI-written).
Then there's the question of whether LoC is a reliable proxy for velocity at all. The common belief among developers is that it's not.
Of course, that doesn't take into account the useful high-level views and other advantages of IDEs that might mitigate slop during review, but overall Cursor was a more natural fit for vibe-coders.
This is said without judgement - I was a cheerleader for Cursor early on until it became uncompetitive in value.
So overall it seems like the pros and cons of "AI vibe coding" just cancel each other out.
Traditional software dev would be build, test, refactor, commit.
Even The Clean Coder recommends starting with messy code, then tidying it up.
We just need to apply traditional methods to AI-assisted coding.
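Concretely, something like this could gate AI-generated changes (a minimal sketch assuming pytest and git; adapt to your stack):

    # Build/test gate: an AI-assisted change only gets committed
    # if the full test suite passes.
    import subprocess
    import sys

    def tests_pass() -> bool:
        """Run the suite; True only on a clean pass."""
        return subprocess.run(["pytest", "-q"]).returncode == 0

    def commit(message: str) -> None:
        subprocess.run(["git", "add", "-A"], check=True)
        subprocess.run(["git", "commit", "-m", message], check=True)

    if __name__ == "__main__":
        # 1. Let the assistant generate or refactor code.
        # 2. Build and test.  3. Tidy.  4. Commit only if green.
        if tests_pass():
            commit(sys.argv[1] if len(sys.argv) > 1 else "AI-assisted change")
        else:
            sys.exit("Tests failed; refactor before committing.")

Same loop as always; the assistant just fills in the "write code" step.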
rfw300•3h ago
If a module becomes unsustainably complex, I can ask Claude questions about it, have it write tests and scripts that empirically demonstrate the code's behavior, and, if worse comes to worst, rip out that code entirely and replace it with something better in a fraction of the time it used to take.
That's not to say complexity isn't bad anymore—the paper's findings on diminishing returns on velocity seem well-grounded and plausible. But while the newest (post-Nov. 2025) models often make inadvisable design decisions, they rarely do things that are outright wrong or hallucinated anymore. That makes them much more useful for cleaning up old messes.
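For example, before a rip-and-replace I'll have it pin down the old behavior with characterization tests, roughly like this (contrived module and function names):

    # Characterization tests: whatever the legacy code does today is,
    # by definition, "correct" for a drop-in replacement. Run these
    # against the old module first, then against the rewrite.
    import pytest
    from legacy import parse_record  # hypothetical module being replaced

    CASES = [
        ("id=1;name=ada", {"id": "1", "name": "ada"}),
        ("", {}),               # edge case: empty input
        ("id=2", {"id": "2"}),  # edge case: single field
    ]

    @pytest.mark.parametrize("raw,expected", CASES)
    def test_parse_record_behavior(raw, expected):
        assert parse_record(raw) == expected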
SR2Z•2h ago
In theory, experienced humans introduce fewer bugs. That sounds reasonable and believable, but anyone who's ever been paid to write software knows that finding reliable humans is not an easy task unless you're at a large established company.
MeetingsBrowser•2h ago
In my experience, they are not even close.
MeetingsBrowser•50m ago
I would frame it differently. There are developers successfully shipping product X. Those developers are, on average, as skilled as necessary to work on project X; otherwise they would have moved on or the project would have failed.
Can LLMs produce the same level of quality as project X developers? The only projects I know of where this is true are toy and hobby projects.
mathgeek•21m ago
Of course not; you have switched “quality” in this statement to modify the developer instead of their work. Regarding the work, each project, as you agreed in your reply, has an average quality for its code. Some developers bring that down on the whole, others bring it up. An LLM would sit somewhere on that spectrum.
MeetingsBrowser•2h ago
It's the same reason one senior + one junior engineer is about as fast as one senior + 100 junior engineers. The senior's review time becomes the bottleneck and does not scale.
And even with the latest models and tooling, the quality of the code is below what I expect from a junior. But you sure can get it fast.
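Back-of-the-envelope version of why that is (made-up rates, but the shape is the point):

    # Throughput is capped by the slowest stage; past the reviewer's
    # capacity, extra producers (junior or LLM) add nothing but queue.
    def merged_per_day(producers: int, prs_each: float,
                       review_capacity: float) -> float:
        return min(producers * prs_each, review_capacity)

    REVIEW_CAPACITY = 8.0  # PRs one senior can properly review per day (assumed)
    for n in (1, 5, 100):
        print(n, "producers ->", merged_per_day(n, 2.0, REVIEW_CAPACITY), "PRs/day")
    # 1 -> 2.0, 5 -> 8.0, 100 -> 8.0: output flatlines once review saturates.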
phillipclapham•26m ago
I've been doing 10-12 hour days paired with Claude for months. The velocity gains are absolutely real; I am shipping things I would never have attempted solo before AI, and shipping them faster than ever. BUT the cognitive cost of reviewing AI output is significantly higher than reviewing human code. It's verbose, plausible-looking, and wrong in ways that require sustained deep attention to catch.
The study found a "transient velocity increase" followed by a "persistent complexity increase." That matches my experience exactly. The speed feels incredible at first, then the review burden compounds and you're spending more time verifying than you saved generating.
The fix isn't "apply traditional methods"; it's recognizing that AI shifts the bottleneck from production to verification, and that verification under sustained cognitive load degrades in ways nobody's measuring yet. I think I've found some fixes that help me personally, and for me velocity is still high, but only time will tell whether that holds up.
i_love_retros•1h ago
Just make sure it hasn't mocked so many things that nothing is actually being tested, which I've witnessed.
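The failure mode in miniature (a contrived Python sketch, but I've seen tests shaped exactly like this):

    # Every collaborator is mocked, so the assertions can only
    # re-state the mock setup. Delete the amount logic entirely
    # and this test still passes.
    from unittest.mock import MagicMock

    def charge_customer(gateway, db, customer_id, amount_cents):
        """Code under test: charge a card, record the receipt."""
        result = gateway.charge(customer_id, amount_cents)
        db.save_receipt(customer_id, result)
        return result

    def test_charge_customer_tests_nothing():
        gateway, db = MagicMock(), MagicMock()
        gateway.charge.return_value = {"status": "ok"}
        assert charge_customer(gateway, db, "cust_1", 999) == {"status": "ok"}
        gateway.charge.assert_called_once()   # passes even if the amount is wrong
        db.save_receipt.assert_called_once()  # no real behavior exercised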
moregrist•49m ago
You have to actually care about quality with these power saws or you end up with poorly-fitting cabinets and might even lose a thumb in the process.
AlexandrB•59m ago
This is the same pattern I observed with IDEs. Autocomplete and being able to jump to a definition mean spaghetti code can be successfully navigated, so there's no "natural" barrier to writing it.