AI makes you more productive. This is no longer up for debate. The energy you spend arguing last year's talking points is better spent knuckling down and learning the tools.
Or do you just produce more code but not more productive value?
It's like saying you're convinced people reporting they feel more productive in a mauve-coloured room are liars, or those that drive automatic vs manual. Maybe they just find mauve a restful colour?
A frontend dev doing tailwind integration for his day job is gonna see very different speedups than someone working in a niche scientific codebase. Taking the average makes about as much sense as taking the average of the speedup from calculators for a mathematician, a farmer, and an elementary school student.
That is, unless you're building a single page app/landing page that is the typical center column with a hero and below that a 3x3 feature grid with those same 3 colors that all the sloppers show off.
I'm not a frontend dev, but these statements are starting to get outright disrespectful to those that are. Do you people understand how much "world", customer and product knowledge is required to design and implement great UX/UI?
I promise you are not going to be able to translate all this internalized understanding to an LLM and have it do your "tailwind integration". It actually sucks at all frontend outside of the 3 types of page layouts it understands: shitty landing pages, generic dashboards and shitty blog layouts.
Y'all yearn for slop though so maybe everything will just become shit anyways.
METR already redid the study at a later date and now finds a likely 18% speedup
"For the subset of the original developers who participated in the later study, we now estimate a speedup of -18% with a confidence interval between -38% and +9%" (note their use of - and + here could be slightly confusing but they do mean 18% faster per the post)
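To make the sign convention concrete, here is a minimal sketch that takes the quoted figures at face value and assumes the commenter's reading (a negative number means tasks took less time). The helper names are made up for illustration and are not from the METR post.

```python
# Figures quoted above from the follow-up; the sign convention
# (negative = tasks took less time) is the commenter's reading, not verified here.
point_estimate = -0.18
ci_low, ci_high = -0.38, 0.09


def crosses_zero(low: float, high: float) -> bool:
    """True when the interval includes 'no change'."""
    return low <= 0.0 <= high


def describe(change: float) -> str:
    """Turn a signed change in completion time into plain words."""
    direction = "less" if change < 0 else "more"
    return f"{abs(change):.0%} {direction} time"


print("point estimate:", describe(point_estimate))
print("interval:", describe(ci_low), "to", describe(ci_high))
print("includes no change:", crosses_zero(ci_low, ci_high))
```

Under that reading the point estimate favours a speedup, but the interval still includes zero, so "likely" is about as strong as the claim can be.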
Which is ancient at this point, and half a year older than the November 2025 inflection point when agentic coding got really good.
The original article is from August 2025, and the overall message to not trust ‘how it feels’ and rather measure outcomes seems right to me despite the outdated figures. On my team at least, we are seeing a noticeable inflection in work shipped with AI according to Weave.
Whenever I tell them about how awesome AI is, they come back with stories about how they used AI and it couldn't even do anything basic and what it did do had errors.
People will always create a world narrative that matches what they already believe.
Anti AI people are always quoting these "facts" about how AI reduces productivity even when developers feel it increases productivity - it reinforces their world view.
Productivity is not a feeling though. Either you can show increased productivity or it doesn't exist.
That proves AI is capable of doing one part of the software engineering process. The 16 devs in the study trusted AI to write the code. Once we trust AI to do the verification as well we'll realise the gains we feel we're getting now. Essentially we're intentionally going slower on the second half because the trust is missing.
Alternatively, rather than trusting AI to do the validation, we could follow the vibe-coder approach by skipping the validation entirely, and trust that the generation stage is good enough not to need it. Historically that's come with some small downsides, like the code being a broken mess of security holes, but with time AI might fix that.
The reality is making good decisions and thinking about approaches take time. AI can absolutely make us faster at it but it's not magic and these speedups come with effort.
I'm British. I've been taught to turn understatement into an art.
Oh, the irony of this post being AI-generated.
The actual study with the data, minus the "I was right all along" commentary
And this is coming from an AI sceptic.
He came up with a fun idea for a racing game renderer: it distorted the perspective transformation a bit, grading depth on a curve, so far away things would linger in the distance a bit longer, then speed up and WHOOSH past you, seeming even faster than they would be photorealistically!
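One way to picture "grading depth on a curve" is to remap depth before the usual perspective divide. This is only a guess at the idea: the power curve, `z_ref` and `power` are hypothetical choices for illustration, not the renderer's actual formula.

```python
def projected_scale(z: float, focal: float = 1.0) -> float:
    """Standard perspective: on-screen scale is inversely proportional to depth."""
    return focal / z


def curved_depth(z: float, z_ref: float = 10.0, power: float = 1.5) -> float:
    """Hypothetical depth curve around a reference distance z_ref.

    With power > 1, depths beyond z_ref are pushed farther away and depths
    inside z_ref are pulled closer; power == 1 leaves depth unchanged.
    """
    return z_ref * (z / z_ref) ** power


def curved_scale(z: float, focal: float = 1.0,
                 z_ref: float = 10.0, power: float = 1.5) -> float:
    """Perspective scale computed from the remapped depth."""
    return projected_scale(curved_depth(z, z_ref, power), focal)


# An object approaching the camera at constant speed.
for z in (40.0, 20.0, 10.0, 5.0, 2.5):
    print(f"z={z:5.1f}  plain={projected_scale(z):.4f}  curved={curved_scale(z):.4f}")
```

With these assumed parameters, distant objects stay smaller than plain perspective would draw them, then grow faster than plain perspective once they pass the reference distance.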
[1] https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
Devs wish that was true but it isn't and it will get better.
For me as a dev, that's not the whole truth. Where I've found actual value in AI (and I think where some of that "perceived speedup" is coming from) is looking up things.
Unless you know the codebase and used libraries extremely well, you will have to do lots of "micro-lookups" during coding, where you have to find the specific APIs or library functions for your problem, then figure out how exactly you have to call them, how to handle the result, etc. That's lots of "research" work interleaved with actually writing the code.
AIs seem to be good enough to have a lot of that knowledge already baked into their weights, at least for popular platforms, so if you prompt it for something, you can skip all that low-level lookup work or at least defer it until code review. Even during review, it's easier, because you don't have to come up with the appropriate library function from scratch, you only have to verify that the ones the AI used make sense and are used correctly.
Before, a backend guy asked to add an intranet page would make an austere page: bare HTML with barely any styling or JavaScript. Today, the same guy given the same task can turn in something with styling, JavaScript, internationalisation, interactive form validation, progress spinner, minification build stage, linting, maybe even automated browser tests.
And I have to code review it. Now the bottleneck of writing the code has been removed, I now find code review is the bottleneck - and a bottleneck facing much higher flow must either let more through, or start applying back pressure.
Sometimes I think an evil genie granted my wish for better tested code by trying to drown me in it.
ROFL sorry
I suspect:
If you know what you are doing it is a power tool.
If you don't know what you are doing it's also a power tool - if you measure a lot of devs then the bad ones (or anyone having a bad day, or the wrong fit for a project) can make work for everyone else at an outrageous pace.
I got very frustrated with LLMs and their inability to apply good taste or maintain consistent design languages, and put the project on ice. But I decided to double down on more tooling and learn as much about frontend as I could because I also realized that frontend itself - the problem domain, the engineering culture (or lack thereof), the historical baggage, the sheer size of the frontend api/language surface was part of the problem. And also there was/is a lack of good LLM and agent-oriented tooling that was a much deeper problem than I expected initially.
I originally thought I would just create skills/workflows and apis for generating sites from templates, but the problem is more so that you need an entirely different kind of harness and development process for frontend, which doesn't really exist yet. Claude Design is probably the most familiar gesture in that direction for most people but I think it's only scratching the surface. Our own "agentic playwright" is https://github.com/accretional/chromerpc/tree/main/chrome-pr... - IMO this kind of tool (both ours and Claude Design) is a major win for removing the largest, most frustrating frontend LLM pain points (having a human doing QA and prodding the model to fix obviously-wrong outputs).
But the bigger problem is that the webdev tooling ecosystem is FUCKING AWFUL, and there are too many different ways to do something even using the actual base browser apis, let alone all the random ass low-quality tools and cargoculting that seeps into the models' way of working and thinking. That's not to say that tools like React are bad, necessarily, but that there is so much pre-LLM slop and churn and low quality/inconsistent work in the frontend ecosystem that you really need to be MUCH more knowledgeable about the way browsers and the web actually work than the median frontend developer (especially the ones participating in the endless hype flavor of the month, generating all the noise that defines the engineering culture) to effectively use them. Or even better, if you know enough you can also NOT reach for them because you're able to just implement it via raw html/css/browser primitives instead of through 2000 node packages.
To be clear, I'm not saying frontend development is slop, but that it has a very high skill ceiling and requires a lot of very particular/thorny knowledge to be good at. I think the reason AI frontend looks so much like slop is that it hasn't been RLed against the actual web-standards in a way that lets it learn how to actually build good sites, it just has the median frontend engineer archetype from its pretraining and then some kind of RLVR to get it to produce workable, not-fucked-up code (the 3x3 grid, the slop hero, the unnecessary blinking green buttons, etc.). And also, for LLMs, maybe engaging with the webdev tool ecosystem beyond the core infrastructure layer and base apis/languages is more trouble than it's worth, because they often optimize for "I want a particular kind of UX and don't know how to implement it directly, but I do know how to find a package and call it, then prod it into working".
LLMs need something more like a browser-harness, a meta-design system, per-design-language component management tooling, and a non-slop build system. They also generally need much better support/more sophisticated UX for hierarchical iframes, CSP, etc. which is a space that is not very well-explored despite its potential, because most frontend devs find it too hard or complicated.
People are already starting to build these and I think we'll get there in the next year. The hardest piece of the puzzle is figuring out how to structure RL training envs to learn frontend directly against web standards, because web standards are very complex and high-surface; but this is also the most promising because it's how you get Mythos-like superhuman performance. We have a project to build some of the base domain modeling/search tooling needed for frontend RL, eg https://github.com/accretional/proto-css, but it's early days. You should definitely try agentic browser tooling if you haven't yet because it makes a huge difference in getting existing LLMs to be more effective at frontend, and automating most of the debugging. It's what allows us to eg fully automate creating gifs of models interacting with our site in the context of a user journey when we run tests: https://github.com/accretional/proto-css/tree/main/chrome-te...
What I'm really wondering is how many extra tasks are being done that wouldn't have been done without AI, and whether those actually have a payoff. That is, 100x faster versus not doing the work at all.
I'm hoping some enterprises that collect metrics on e.g. time-to-market, customer satisfaction, revenue, costs, etc will release an authoritative report some day.
Overall this suggests to them that the current speedup is likely greater than what the study could measure.
Like, what people are saying is, “That old study was wrong! They did a new broken study that overturned it!”
I think there is a simple reason for that. If you automate something, you make the measurable/predictable thing faster. So the hard to measure/predict part of the job will take more share of the time, and overall difficulty to measure/predict goes up.
I think this is what happened with Agile Scrum - as developers became more productive (for unrelated reasons, two main sources of SW developer productivity before AI were compilers and open source), the bureaucracy (amount of meetings) increased, because the ratio of hard to measure vs easy to measure went up. Bureaucracy is hard to measure, so it went up (as a share of work). I expect this to only get worse with more automation, such as AI. So I predict an increase in the share of bureaucracy compared to the pre-AI world.
Either way, IMHO the main point is that automation has the opposite effect on human job predictability: it lowers it. Tasks we can easily automate are those that are easy to predict.
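The share argument above is essentially Amdahl's-law arithmetic, and a small sketch makes it visible. The 60% "predictable" fraction and the speedup factors are made-up numbers for illustration only, not measurements from any study.

```python
def remaining_share(automatable_fraction: float, speedup: float) -> tuple[float, float]:
    """Return (overall_speedup, share_of_time_left_in_the_unautomated_part).

    automatable_fraction: portion of the original time spent on predictable work.
    speedup: how many times faster that portion becomes after automation.
    """
    fast_part = automatable_fraction / speedup   # predictable work after automation
    slow_part = 1.0 - automatable_fraction       # hard-to-measure work, unchanged
    new_total = fast_part + slow_part
    return 1.0 / new_total, slow_part / new_total


# Hypothetical job: 60% predictable work, 40% hard-to-measure work.
for speedup in (2, 5, 10, 100):
    overall, share = remaining_share(0.6, speedup)
    print(f"{speedup:>3}x on the predictable 60%: overall {overall:.2f}x, "
          f"hard-to-measure share {share:.0%}")
```

Under these assumptions the overall gain flattens out quickly while the hard-to-measure part grows to dominate the remaining time, which fits the point that end-to-end feature timelines can move much less than the coding step does.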
I do think AI has been a huge boon to productivity in many ways, but looking at feature timelines, I think it's pretty clear the 'critical shortest path' of key features hasn't been sped up by that much.
I would not, at all, suggest that this second study corrects or debunks the first.
Instead what it shows (if anything, i.e. if you can even put aside the regrettable choice to change the payment level, which affects applicant recruitment) is that the mindset shift has already happened: developers now don’t want to attempt some tasks without AI.
What that tells you is not (with any confidence at least) that they are faster, but perhaps that we are beyond the point that this can be meaningfully measured. AI could still be making developers slower, but developers aren’t going to be willing or perhaps able to help you find out.
Basically the job is different now.
What this does for me, perhaps, is vindicate my feelings. I can do agentic coding; I have learned the principles and some tools and I could learn more. But if this study is really reflective of how other developers feel now, I am done.
If you trust your team to care about quality then PRs aren't necessary, and if you don't then why are you trusting them to catch problems in PR reviews?
jiggawatts•1h ago
This may as well have been written in the stone ages, when we were banging AI rocks together.
I just did a ~6 month project in ~2 weeks using a frontier model.
I wouldn't even have attempted this kind of work a year ago, with or without the AIs available at the time!
suddenlybananas•1h ago
koe123•1h ago
loveparade•1h ago
ImprobableTruth•1h ago
Claims like this are hard for me to take seriously because 'good' models have been available since the start of the year. So, if they really 10x one's productivity, then people should have been able to get 5 years' worth of work done since then, but I've never actually seen anybody show any project like this.
Shitty-kitty•50m ago