Of course, Stack Overflow has every incentive to push this narrative, which the LLM makers will counter vigorously. The end consumers (developers) will be the jury.
[1] https://www.gartner.com/en/articles/hype-cycle-for-artificia...
The Gartner hype cycle is astrology for MBAs: it cannot fail, it can only be failed by those lacking faith to see its wisdom.
This disillusionment is evidenced by the lower proportion of people seeking such injections now.
In Realityland, the COVID vaccine was an overwhelming success.
The problem with LLMs is they treat all context as significant. Even telling it to ignore previous context is just confusing it more. So Cursor gets hung up on dumb shit like insignificant linter errors or insisting that extracting a block of code to a function will fix a bug (what?). You add more cursor rules to keep it from doing these things, which doesn't focus it, it just pollutes the context more. It doesn't know when to ignore files it's already read. It just gets shittier and shittier.
I appreciate that I can tell someone where the problem is and they'll know what is or isn't relevant.
Edit: Here is something Claude Sonnet 4 told me today. I would expect this if I had been talking to it for a while, not on the second request.
"Used excludeSwitches (plural) instead of excludeSwitches"
And the discourse around all this is also the same: the detractors point out these flaws, and the proponents chime in with "Yeah well humans suck too" and around and around we go...
If you use AI to write the code, did you ever make it to the 1x level to begin with? How can you be 2x smarter to debug it if you didn’t reach the 1x level initially?
Apparently the solution will be 10x prompters.
press X to doubt
"No, the developers do that."
"So you must take the answers to the person with the question?"
"Well, no... my LLM does that!"
"...what would you say you DO here?"
"I'm an answer website! What don't you people understand?!"
https://en.wikipedia.org/wiki/No_Silver_Bullet
Coding is 10% or 20% of the work, so LLMs don't radically improve the throughput of software organizations:
https://www.reddit.com/r/ExperiencedDevs/comments/1lwk503/st...
I see working with an LLM as being like pair programming; if anything, I end up with better quality in the end because they see things in my blind spots, but there is no radical speed-up.
I bet that is surprising, at least if you're astonishingly naive and believe everything a company tells you when they're trying to sell you something.
Survey of current and ex-Stack Overflow users shows that they use Stack Overflow.
belter•20h ago
We don't even need human subjects. GitHub contains a natural experiment: millions of high-quality commits and PRs from before the GenAI explosion (pre-Jan 2021). Compare velocity, complexity, and quality metrics before and after; the data is sitting there, waiting to be analyzed.
Instead, we're drowning in anecdotes while petabytes of empirical evidence remain untouched.
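A minimal sketch of the before/after comparison described above, assuming a local clone of whichever repository you care about and using raw commits per month as a stand-in for "velocity" (the repo path is a placeholder and the metric is my own simplification, not something the comment specifies):

    import subprocess
    from collections import Counter
    from datetime import datetime

    CUTOFF = datetime(2021, 1, 1)        # the pre-GenAI cutoff named above
    REPO = "/path/to/some/local/clone"   # placeholder: any git repository

    # Author dates for every commit, in strict ISO 8601 (git's %aI format).
    log = subprocess.run(
        ["git", "-C", REPO, "log", "--pretty=format:%aI"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()

    dates = [datetime.fromisoformat(s).replace(tzinfo=None) for s in log if s]

    def commits_per_month(ds):
        # Average commits per calendar month that actually contains commits.
        months = Counter((d.year, d.month) for d in ds)
        return sum(months.values()) / len(months) if months else 0.0

    before = [d for d in dates if d < CUTOFF]
    after = [d for d in dates if d >= CUTOFF]

    print(f"commits/month before {CUTOFF:%Y-%m}: {commits_per_month(before):.1f}")
    print(f"commits/month after  {CUTOFF:%Y-%m}: {commits_per_month(after):.1f}")

This only captures raw velocity on one repository; the complexity and quality metrics mentioned above would need their own instrumentation, and attributing any shift to GenAI rather than to everything else that changed after 2021 is the harder part.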
steveBK123•19h ago
Everyone is selling AI or being sold AI.
lelanthran•18h ago
It's much more remarkable, to me, that the AI vendors have been unable to publish a study on the effects of agents on developer velocity.
I mean, if it was really a 10x (or even 2x) boost to productivity, why aren't they seeing it? The returns on each generation have been diminishing compared to the cost. Shouldn't it be the other way around if AI was such a productivity boost?
AIPedant•18h ago
Software developers have had many flame wars over the decades because the lack of data forces the conversation to be anecdotal and ideological:
- should we use dynamic/gradual typing and prioritize developer productivity, or static typing to help enforce correctness?
- agile vs waterfall
- OO vs procedural vs functional
- is Rust's fussiness around memory management more trouble than it's worth for large projects?
- when should you use a 3rd-party library vs doing it yourself?
So this really is nothing new. You are badly underestimating the scientific challenges and instead just hoping big data will plow through them. It won't.