Yeah, listen... I'm glad these types of studies are being conducted. I'll say this though: the difference between pre- and post-Opus 4.5 has been night and day for me.
From August 2025 through November 2025 I led a complex project at work where I used Sonnet 4.5 heavily. It was very helpful, but my total productivity gains were around 10-15%, which is pretty much what the study found. Once Opus came out in November, though, it was like someone flipped a switch. It was much more capable at autonomous work and required way less hand-holding, intervention, or course-correction. 4.6 has been even better.
So I'm much more interested in reading studies like this over the next two years where the start period coincides with Opus 4.5's release.
Seems likely that process is holding things back. Planning has always been a "best-guess". There's lots you can't account for until you start a task.
Code review mostly exists because the cost of getting something wrong used to be high (because human coding is slow). If you can code faster, you can replace bad code faster. In other words, LLMs have lowered the cost of shipping and redeploying.
We can't honestly assess the new way of doing things when we bring along the baggage of the old way of doing things.
If the agent comes back in a few minutes with a tiny fix, it is probably a small task.
If the agent produces a large, convoluted solution that would need careful review, it is at least a medium task.
And if the agent gets stuck, runs into architectural constraints, etc., then it is definitely a hard task.