> It's common for engineers to end up working on projects which they don't have an accurate mental model of. Projects built by people who have long since left the company for pastures new. It's equally common for developers to work in environments where little value is placed on understanding systems, but a lot of value is placed on quickly delivering changes that mostly work. In this context, I think that AI tools have more of an advantage. They can ingest the unfamiliar codebase faster than any human can, and can often generate changes that will essentially work.
Reason: you cannot evaluate the work accurately if you have no mental model. If there's a bug given the system's unwritten assumptions, you may not catch it.
Having said that, it also depends on how important it is to be writing bug-free code in the given domain, I guess.
I like AI particularly for greenfield stuff and one-off scripts, as it lets you go faster there. Basically you build up the mental model as you're coding with the AI.
Not sure whether this breaks down at a certain codebase size, though.
So
> Reason: you cannot evaluate the work accurately if you have no mental model. If there's a bug given the system's unwritten assumptions, you may not catch it.
This is completely correct. It's a very fair statement. The problem is that a developer coming into a large legacy project is in this spot regardless of the existence of AI.
I've found that asking AI tools to generate a changeset in this case is actually a pretty solid way of starting to learn the mental model.
I want to see where it tries to make changes, what files it wants to touch, what libraries and patterns it uses, etc.
It's a poor man's proxy for having a subject matter expert in the code give you pointers. But it doesn't take anyone else's time, and as long as you're not just trying to dump output into a PR, it can actually be a pretty good resource.
The key is not letting it dump out a lot of code, and instead using it for directional signaling.
e.g., prompts like "Which files should I edit to implement a feature which does [detailed description of feature]?" or "Where is [specific functionality] implemented in this codebase?" have been real timesavers for me.
The actual code generation has probably been a net time loss.
This. Leveraging the AI to start to develop the mental model is an advantage. But, using the AI is a non-trivial skill set that needs to be learned. Skepticism of what it's saying is important. AI can be really useful just like a 747 can be useful, but you don't want someone picked off the street at random flying it.
Is there any evidence that AI helps you build the mental model of an unfamiliar codebase more quickly?
In my experience trying to use AI for this it often leads me into the weeds
Before beginning the study, the average developer expected about a 20% productivity boost.
After the study ended, the average developer (potentially: you) believed they actually were 20% more productive.
In reality, they were 0% more productive at best, and 40% less productive at worst.
Think about what it would be like to be that developer: off by 60% about your own output.
If you can't even gauge your own output without being 40% off on average, 60% off at worst, be cautious about strong opinions on anything in life. Especially politically.
Edit 1: Also consider, quite terrifyingly, if said developers were in an online group, together, like... here. The one developer who said she thought it made everyone slower (the truth in this particular case) would be unanimously considered an idiot, downvoted to the full -4, even with the benefit of hindsight.
Edit 2: I suppose this goes to show, that even on Hacker News, where there are relatively high-IQ and self-aware individuals present... 95% of the crowd can still possibly be wildly delusional. Stick to your gut, regardless of the crowd, and regardless of who is in it.
It's like Tog's study finding that people think the keyboard is faster than the mouse even when they are actually faster with the mouse. Because they are measuring how they feel, not what is actually happening.
But despite the popularity of some of this (planning poker, particularly; PDCA for process improvements is sadly less popular) as ritual, those elements have become part of a cargo cult where almost no one remembers why we do it.
Yeah, this is me at my job right now. Every time I express even the mildest skepticism about the value of our Cursor subscription, I'm getting follow up conversations basically telling me to shut up about it
It's been very demoralizing. You're not allowed to question the Emperor's new clothes
Using it beyond that is just more work. First parse the broken response, remove any useless junk, then have it reprocess with an updated query.
It’s a nice tool to have (just as search engines gave us easy access to multiple sources/forums), but its limitations are well known. Trying to use it 100% as intended is a massive waste of time and resources (energy use…)
I just started working on a 3-month-old codebase written by someone else, in a framework and architecture I had never used before
Within a couple of hours, with the help of Claude Code, I had already created a really nice system to replicate data from staging to local development. Something I had built before in other projects, and I knew that manually it would take me a full day or two, especially without experience in the architecture
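For illustration only, here's a minimal sketch of what such a staging-to-local replication script might look like. The comment doesn't name the stack, so Postgres, pg_dump/pg_restore, and the connection URLs below are assumptions:

```python
# Hypothetical sketch: replicate a staging database into local development.
# The stack isn't named in the original comment; Postgres, pg_dump/pg_restore,
# and these connection URLs are purely illustrative assumptions.
import subprocess

STAGING_URL = "postgresql://readonly@staging-db.example.com:5432/app"
LOCAL_URL = "postgresql://postgres@localhost:5432/app_dev"
DUMP_FILE = "staging.dump"

def replicate() -> None:
    # Dump staging in pg_dump's custom format so pg_restore can re-create
    # objects locally in dependency order.
    subprocess.run(
        ["pg_dump", "--format=custom", "--no-owner",
         f"--dbname={STAGING_URL}", f"--file={DUMP_FILE}"],
        check=True,
    )
    # Restore into the local dev database, dropping existing objects first.
    subprocess.run(
        ["pg_restore", "--clean", "--if-exists", "--no-owner",
         f"--dbname={LOCAL_URL}", DUMP_FILE],
        check=True,
    )

if __name__ == "__main__":
    replicate()
```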
That immediately sped up my development even more, as now I had better data to test things locally
Then a couple of hours later, I had already pushed my first PR. All the code followed the proper coding style and practices of the existing project and the framework. That PR would have taken me at least a couple of days and up to 2 weeks to fully manually write out and test
So sure, AI won’t speed everyone or everything up. But at least in this one case, it gave me a huge boost
As I keep going, I expect things to slow down a bit, as the complexity of the project grows. However, it’s also given me the chance to get an amazing jumpstart
> It's equally common for developers to work in environments where little value is placed on understanding systems, but a lot of value is placed on quickly delivering changes that mostly work. In this context, I think that AI tools have more of an advantage. They can ingest the unfamiliar codebase faster than any human can, and can often generate changes that will essentially work.
Sadly, clickbait headlines like the OP's, "AI slows down open source developers," spread this misinformation, ensuring that a majority of people will have the same misapprehension.
Using Warp terminal (which used Claude), I was able to get past those barriers and achieve results that weren't happening at all before.
“When open source developers working in codebases that they are deeply familiar with use AI tools to complete a task, they take longer to complete that task”
I have anecdotally found this to be true as well, that an LLM greatly accelerates my ramp up time in a new codebase, but then actually leads me astray once I am familiar with the project.
Not always the case, but whenever I read about these strained studies or arguments about how AI is actually making people less productive, I can't help but wonder why nearly every programmer I know, myself included, finds value in these tools. I wonder if the same thing happened with higher-level programming languages, where people argued: you may THINK not managing your own memory will lead to more productivity, but actually...
Even if we weren't more "productive", millions prefer to use these tools, so it has to count for something. And I don't need a "study" to tell me that
16 devs. And they weren't allowed to pick which tasks they used the AI on. Ridiculous. Also using it on "old and >1 million line" codebases and then extrapolating that to software engineering in general.
Writers like this then theorize why AI isn't helpful, then those "theories" get repeated until it feels less like a theory and more like a fact, and it all proliferates into an echo chamber of "AI isn't a useful tool." There have been too many anecdotes, and too much of my own personal experience, to believe that it isn't useful.
It is a tool and you have to learn it to be successful with it.
They were allowed to pick whether or not to use AI on a subset of tasks. They weren't forced to use AI on tasks that don't make sense for AI
"To directly measure the impact of AI tools on developer productivity, we conduct a randomized controlled trial by having 16 developers complete 246 tasks (2.0 hours on average) on well-known open-source repositories (23,000 stars on average) they regularly contribute to. Each task is randomly assigned to allow or disallow AI usage, and we measure how long it takes developers to complete tasks in each condition."
> If AI is allowed, developers can use any AI tools or models they choose, including no AI tooling if they expect it to not be helpful. If AI is not allowed, no generative AI tooling can be used.
AI is allowed, not required.
Can you bring up any specific issues with the METR study? Alternatively, can you cite a journal that critiques it?
They used 16 developers. The confidence intervals are wide, and a few atypical issues per dev could swing the headline figure (see the toy simulation after this list).
Veteran maintainers on projects they know inside-out. This is a bias.
Devs supplied the issue list (then randomized), which still allows subtle self-selection bias. Maintainers may pick tasks they enjoy or that showcase deep repo knowledge, exactly where AI probably has the least marginal value.
Time was not independently logged and was self-reported.
No direct quality metric is possible. Could the AI code actually have been better?
The Hawthorne effect. Knowing they are observed (and paid by the hour) may make devs over-document, over-prompt, or simply take their time.
Many of the devs were new to Cursor.
Bias in forecasting.
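On the first point, here's a toy simulation (invented numbers, not the study's data) of how a couple of atypical developers can swing the mean and widen the confidence interval when n is only 16:

```python
# Toy illustration, NOT the study's data: invented per-developer ratios of
# (time with AI) / (time without AI), where 1.0 means no effect and values
# above 1.0 mean AI slowed the developer down.
import numpy as np

rng = np.random.default_rng(0)

typical = rng.normal(loc=1.0, scale=0.15, size=14)  # 14 "ordinary" devs
outliers = np.array([1.9, 1.7])                     # 2 atypical devs

for label, sample in [
    ("14 typical devs", typical),
    ("same devs plus 2 outliers (n=16)", np.concatenate([typical, outliers])),
]:
    mean = sample.mean()
    # 95% confidence interval for the mean, normal approximation.
    half = 1.96 * sample.std(ddof=1) / np.sqrt(len(sample))
    print(f"{label}: mean ratio = {mean:.2f}, "
          f"95% CI = ({mean - half:.2f}, {mean + half:.2f})")
```

The exact numbers depend on the seed, but the general point holds: at n = 16, a handful of developers or tasks can dominate the estimate.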
To the credit of the paper authors, they were very clear that they were not making a claim against software engineering in general. But everyone wants to reinforce their biases, so...
METR may overall have an OK mission, but their motivation is questionable. They published something like this to get attention. Mission accomplished on that, but they had to have known how this would be twisted.
I really like how the author then brought up the point that for most daily work we don't have the theory built, even a small fraction of it, and that this may or may not change the equation.
Just one problem with that...
(The SPACE [1] framework is a pretty good overview of considerations here; I agree with a lot of it, although I'll note that METR [2] has different motivations for studying developer productivity than Microsoft does.)
It can make some things faster and better than a human with a saw, but you have to learn how to use it right (or you will lose some fingers).
I personally find that agentic AI tools make me more ambitious in my projects; I can tackle things I didn't think about doing before. And I also delegate work that I don't like to them, because they are going to do it better and quicker than me. So my mind is free to think about the real problems, like architecture and the technical debt balance of my code...
The problem is that there is a temptation to let the AI agent do everything and just commit the result without understanding YOUR code (yes, it was generated by an AI, but if you sign the commit, YOU are responsible for that code).
So, as with any tool, take the time to understand how to use it well and see if it works for you.
This is insulting to all pre-2023 open source developers, who produced the entire stack that the "AI" robber barons use in their companies.
It is even more insulting because no actual software of value has been demonstrably produced using "AI".
I think this blog post is an interesting take on one specific factor that is likely contributing to slowdown. We discuss this in the paper [2] in the section "Implicit repository context (C.1.5)" -- check it out if you want to see some developer quotes about this factor.
> This is why AI coding tools, as they exist today, will generally slow someone down if they know what they are doing, and are working on a project that they understand.
I made this point in the other thread discussing the study, but in general, these results being surprising makes it easy to read the paper, find one factor that resonates, and conclude "ah, this one factor probably just explains slowdown." My guess: there is no one factor -- there's a bunch of factors that contribute to this result -- at least 5 seem likely, and at least 9 we can't rule out (see the full factors table on page 11).
> If there are no takers then I might try experimenting on myself.
This sounds super cool! I'd be very excited to see how you set this up + how it turns out... please do shoot me an email (in the paper) if you do this!
> AI slows down open source developers. Peter Naur can teach us why
Nit: I appreciate how hard it is to write short titles summarizing the paper (the graph title is the best I was able to do after a lot of trying) -- but I might have written this "Early-2025 AI slows down experienced open-source developers. Peter Naur can give us more context about one specific factor." It's admittedly less of a catchy title, but I think getting the qualifications right is really important!
Thanks again for the sweet write-up! I'll hang around in the comments today as well.
[1] https://news.ycombinator.com/item?id=44522772
[2] https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf
Even that's too general, because it'll depend on what the task is. It's not as if open source developers in general never work on tasks where AI could save time.
Ehhhh... not so much. It had serious design flaws in both the protocol and the analysis. This blog post is a fairly approachable explanation of what's wrong with it: https://www.argmin.net/p/are-developers-finally-out-of-a-job
A few notes if it's helpful:
1. This post is primarily worried about ordering considerations -- I think this is a valid concern. We explicitly call this out in the paper [1] as a factor we can't rule out -- see "Bias from issue completion order (C.2.4)". We have no evidence this occurred, but we also don't have evidence it didn't.
2. "I mean, rather than boring us with these robustness checks, METR could just release a CSV with three columns (developer ID, task condition, time)." Seconded :) We're planning on open-sourcing pretty much this data (and some core analysis code) later this week here: https://github.com/METR/Measuring-Early-2025-AI-on-Exp-OSS-D... - star if you want to dig in when it comes out.
3. As I said in my comment on the post, the takeaway at the end of the post is that "What we can glean from this study is that even expert developers aren’t great at predicting how long tasks will take. And despite the new coding tools being incredibly useful, people are certainly far too optimistic about the dramatic gains in productivity they will bring." I think this is a reasonable takeaway from the study overall. As we say in the "We do not provide evidence that:" section of the paper (Page 17), we don't provide evidence across all developers (or even most developers) -- and ofc, this is just a point-in-time measurement that could totally be different by now (from tooling and model improvements in the past month alone).
Thanks again for linking, and to the original author for their detailed review. It's greatly appreciated!
[1] https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf
Everyone else was an absolute beginner with barely any Cursor experience. I don't find it surprising that using tools they're unfamiliar with slows software engineers down.
I don't think this study can be used to reach any sort of conclusion on use of AI and development speed.
1. Some prior studies that find speedup do so with developers that have similar (or less!) experience with the tools they use. In other words, the "steep learning curve" theory doesn't differentially explain our results vs. other results.
2. Prior to the study, 90+% of developers had reasonable experience prompting LLMs. Before we found slowdown, the only concern most external reviewers had about experience was about prompting -- as prompting was considered the primary skill. In general, the standard wisdom was/is that Cursor is very easy to pick up if you're used to VSCode, which most developers used prior to the study.
3. Imagine all these developers had a TON of AI experience. One thing this might do is make them worse programmers when not using AI (relatable, at least for me), which in turn would raise the speedup we find (not because AI got better, but because the without-AI baseline got worse). For example, if heavy AI use pushed a dev's no-AI time on a task from 10 hours to 12 while the with-AI time stayed at 9, the measured speedup would jump from 10/9 to 12/9 with no change in the tool. In other words, we're sorta in between a rock and a hard place here -- it's just plain hard to figure out what the right baseline should be!
4. We shared information on developer prior experience with expert forecasters. Even with this information, forecasters were still dramatically over-optimistic about speedup.
5. As you say, it's totally possible that there is a long-tail of skills to using these tools -- things you only pick up and realize after hundreds of hours of usage. Our study doesn't really speak to this. I'd be excited for future literature to explore this more.
In general, these results being surprising makes it easy to read the paper, find one factor that resonates, and conclude "ah, this one factor probably just explains slowdown." My guess: there is no one factor -- there's a bunch of factors that contribute to this result -- at least 5 seem likely, and at least 9 we can't rule out (see the factors table on page 11).
I'll also note that one really important takeaway -- that developer self-reports after using AI are overoptimistic to the point of being on the wrong side of speedup/slowdown -- isn't a function of which tool they use. The need for robust, on-the-ground measurements to accurately judge productivity gains is a key takeaway here for me!
(You can see a lot more detail in section C.2.7 of the paper ("Below-average use of AI tools") -- where we explore the points here in more detail.)
Rather than fit two generally disparate things together, it's probably better to just use VSTO and C# (hammer and nails) rather than some unholy combination no one else has tried or suffered through. When it goes wrong, there's more info to get you unstuck.
* Spec out project goals and relevant context in a README and spec out all components; have the AI build out each component and compose them. I understand the high level but don't necessarily know all of the low-level details. This is particularly helpful when I'm not deeply familiar with some of the underlying technologies/libraries.
* Have the AI write tests for code that I've verified is working. As we all know, testing is tedious, so of course I want to automate it. And well-written tests (for well-written code) can be pretty easy to review.
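On the second bullet, here's a hedged sketch of the kind of tests an AI might draft for an already-verified helper; the slugify() function below is hypothetical, invented purely for illustration:

```python
# Hypothetical example: pytest-style tests an AI might generate for a small,
# already-verified helper. slugify() is invented for illustration only.
import re

def slugify(title: str) -> str:
    """Lowercase the title and replace runs of non-alphanumerics with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def test_basic_title():
    assert slugify("Hello, World!") == "hello-world"

def test_collapses_whitespace_and_punctuation():
    assert slugify("  AI --- slows?? devs  ") == "ai-slows-devs"

def test_empty_string():
    assert slugify("") == ""
```

Tests like these are quick to review precisely because each one states an expected input/output pair for behavior you've already confirmed by hand.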
The serpent is devouring its own tail.
Fully autonomous coding tools like v0, a0, or Aider work well as long as the context is small. But once the context grows (usually due to mistakes made in earlier steps), they just can't keep up. There's no real benefit to the "try again" loop yet.
For now, I think simple VSCode extensions are the most useful. You get focused assistance on small files or snippets you’re working on, and that’s usually all you need.