DHH: AI models are now good enough

https://twitter.com/dhh/status/2007504187568074843

12•nl•1mo ago

Comments

_se•1mo ago

DHH has long past the point where anyone should be caring about his technical opinions. This is a 0 substance post.

chokolad•1mo ago

> DHH has long past the point where anyone should be caring about his technical opinions. This is a 0 substance post.

Can you elaborate?

D-Machine•1mo ago

What can be stated without evidence can be dismissed without evidence. It is IMO pretty clear to me there is no substance to this post, without knowing anything about the author.

In general most such claims today are without substance, as they are made without any real metrics, and the metrics we actually need we just don't have. I.e. we need to quantify the technical debt of LLM code, how often it has errors relative to human-written code, and how critical / costly those errors are in each case relative to the cost of developer wages, and also need to be clear if the LLM usage is just boilerplate / webshit vs. on legacy codebases involving non-trivial logic and/or context, and whether e.g. the velocity / usefulness of the LLM-generated code decreases as the codebase grows, and etc.

Otherwise, anyone can make vague claims that might even be in earnest, only to have e.g. studies show that in fact the productivity is reduced, despite the developer "feeling" faster. Vague claims are useless at this point without concrete measurements and numbers.

Ianjit•1mo ago

This study does a good job of measuring the productivity impact. It found 1% uplift in dev productivity from using AI.

https://youtu.be/JvosMkuNxF8?si=J9qCjE-RvfU6qoU0

D-Machine•1mo ago

Great example of something that actually has some substance beyond meaningless anecdotes.

nl•1mo ago

Actually it didn't

From the video summary itself:

> We’ll unpack why identical tools deliver ~0% lift in some orgs and 25%+ in others.

At https://youtu.be/JvosMkuNxF8?t=145 he says the median is 10% more productivity, and looking at the chart we can see a 19% increase for the top teams (from July 2025).

The paper this is based on doesn't seem to be available which is frustrating though!

Ianjit•1mo ago

I think you are quoting productivity measured before checking the code actually works and correcting it. After re-work productivity drops to 1%. Tinestamp 14:04.

nl•1mo ago

That was from a single company, not across the cohort.

Ianjit•1mo ago

My bad. What was the result when they measured productivity after rework across the entire co hort?

nl•1mo ago

They don't publish it as far as I can see!

In any case, IMHO I think AI SWE has happened in 3 phases:

Pre-Sonnet 3.7 (Feb 2025): Autocomplete worked.

Sonnet 3.7 to Codex 5.2/Opus 4.5 (Feb 2025-Nov 2025): Agentic coding started working, depending on your problem space, ambition and the model you chose

Post Opus 4.5 (Nov 2025): Agentic coding works in most circumstances

This study was published July 2025. For most of the study timeframe it isn't surprising to me that it was more trouble than it was worth.

But it's different now, so I'm not sure the conclusions are particularly relevant anymore.

As DHH pointed out: AI models are now good enough.

Ianjit•4w ago

Sorry for the late response!

My guess is they didn't publish it because they only measured it at one company, if they had the data across the cohort they would have published.

The general result that review/re-wrok can cancel out the productivity gains is supported by other studies

AI generated code is 1.7x more buggy vs human generated code: https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-gen...

Individual dev productivity gains are offset by peers having to review the verbose (and buggy) AI code: https://www.faros.ai/blog/ai-software-engineering

On agentic being the saviour for productivity, Meta measured a 6-12% productivity boost from agents programming: https://www.youtube.com/watch?v=1OzxYK2-qsI&si=ABTk-2RZM-leT...

"But it's different now" :)

chokolad•1mo ago

The claim was > DHH has long past the point where anyone should be caring about his technical opinions.

I asked for evidence, you are replying to something else.

jtbayly•1mo ago

I’ve seen the same change in the last 6 months.

christophilus•1mo ago

So have I. Opus 4.5 still needs close monitoring and code review, but it is now good enough for most of my day to day tasks.

scuff3d•1mo ago

Can we please stop taking this guy seriously...

LightBug1•1mo ago

Hasn't that long-haired, old racist just retired yet?

Goldman Sachs taps Anthropic's Claude to automate accounting, compliance roles

Ai.com bought by Crypto.com founder for $70M in biggest-ever website name deal

Big Tech's AI Push Is Costing More Than the Moon Landing

The AI boom is causing shortages everywhere else

Suno, AI Music, and the Bad Future [video]

Ask HN: How are researchers using AlphaFold in 2026?

Running the "Reflections on Trusting Trust" Compiler

Watermark API – $0.01/image, 10x cheaper than Cloudinary

Now send your marketing campaigns directly from ChatGPT

Queueing Theory v2: DORA metrics, queue-of-queues, chi-alpha-beta-sigma notation

Show HN: Hibana – choreography-first protocol safety for Rust

Haniri: A live autonomous world where AI agents survive or collapse

GPT-5.3-Codex System Card [pdf]

Atlas: Manage your database schema as code

Geist Pixel

Show HN: MCP to get latest dependency package and tool versions

The better you get at something, the harder it becomes to do

Show HN: WP Float – Archive WordPress blogs to free static hosting

Show HN: I Hacked My Family's Meal Planning with an App

Sony BMG copy protection rootkit scandal

The Future of Systems

NASA now allowing astronauts to bring their smartphones on space missions

Claude Code Is the Inflection Point

Show HN: MicroClaw – Agentic AI Assistant for Telegram, Built in Rust

Show HN: Omni-BLAS – 4x faster matrix multiplication via Monte Carlo sampling

The AI-Ready Software Developer: Conclusion – Same Game, Different Dice

AI Agent Automates Google Stock Analysis from Financial Reports

Voxtral Realtime 4B Pure C Implementation

I Was Trapped in Chinese Mafia Crypto Slavery [video]

U.S. CBP Reported Employee Arrests (FY2020 – FYTD)

Goldman Sachs taps Anthropic's Claude to automate accounting, compliance roles

Ai.com bought by Crypto.com founder for $70M in biggest-ever website name deal

Big Tech's AI Push Is Costing More Than the Moon Landing

The AI boom is causing shortages everywhere else

Suno, AI Music, and the Bad Future [video]

Ask HN: How are researchers using AlphaFold in 2026?

Running the "Reflections on Trusting Trust" Compiler

Watermark API – $0.01/image, 10x cheaper than Cloudinary

Now send your marketing campaigns directly from ChatGPT

Queueing Theory v2: DORA metrics, queue-of-queues, chi-alpha-beta-sigma notation

Show HN: Hibana – choreography-first protocol safety for Rust

Haniri: A live autonomous world where AI agents survive or collapse

GPT-5.3-Codex System Card [pdf]

Atlas: Manage your database schema as code

Geist Pixel

Show HN: MCP to get latest dependency package and tool versions

The better you get at something, the harder it becomes to do

Show HN: WP Float – Archive WordPress blogs to free static hosting

Show HN: I Hacked My Family's Meal Planning with an App

Sony BMG copy protection rootkit scandal

The Future of Systems

NASA now allowing astronauts to bring their smartphones on space missions

Claude Code Is the Inflection Point

Show HN: MicroClaw – Agentic AI Assistant for Telegram, Built in Rust

Show HN: Omni-BLAS – 4x faster matrix multiplication via Monte Carlo sampling

The AI-Ready Software Developer: Conclusion – Same Game, Different Dice

AI Agent Automates Google Stock Analysis from Financial Reports

Voxtral Realtime 4B Pure C Implementation

I Was Trapped in Chinese Mafia Crypto Slavery [video]

U.S. CBP Reported Employee Arrests (FY2020 – FYTD)

DHH: AI models are now good enough

Comments