> To directly measure the real-world impact of AI tools on software development, we recruited 16 experienced developers from large open-source repositories (averaging 22k+ stars and 1M+ lines of code) that they’ve contributed to for multiple years. Developers provide lists of real issues (246 total) that would be valuable to the repository—bug fixes, features, and refactors that would normally be part of their regular work. Then, we randomly assign each issue to either allow or disallow use of AI while working on the issue. When AI is allowed, developers can use any tools they choose (primarily Cursor Pro with Claude 3.5/3.7 Sonnet—frontier models at the time of the study); when disallowed, they work without generative AI assistance. Developers complete these tasks (which average two hours each) while recording their screens, then self-report the total implementation time they needed. We pay developers $150/hr as compensation for their participation in the study.
So it's a small sample size of 16 developers. And it sounds like different tasks were (randomly) assigned to the no-AI and with-AI groups - so the control group doesn't have the same tasks as the experimental group. I think this could lead to some pretty noisy data.
Interestingly, small sample size isn't in the list of objections that the author includes under "Addressing Every Objection You Thought Of, And Some You Didn’t".
I do think it's an interesting study. But I would want to see whether the results can be reproduced before reading into it too much.
I thought it was the model, but then I realised that v0 is carried by the shadcn UI library, not by the intelligence of the model.
Example: using LeafletJS — not hard, but I didn't want to have to search all over to figure out how to use it.
Example: other web page development requiring dropping image files, complicated scrolling, split-views, etc.
In short, there are projects I have put off in the past but eagerly begin now that LLMs are there to guide me. It's difficult to compare times and productivity in cases like that.
Some of the most productive devs don't get paid by the big corps that make use of their open source projects, hence the constant urging of corps and individuals to sponsor the projects they make money from.
Like, can we determine the productivity of doctors, lawyers, journalists, or pastry chefs?
What job out there is so simple that we can meaningfully measure all the positive and negative effects of the worker, as well as account for different conditions between workers?
I could probably get behind the idea that you could measure productivity for professional poker players (given a long enough evaluation period). Hard to think of much else.
Like, what if by focusing on LLMs for productivity we just reinforce old, bad habits and get stuck in a local maximum... And even worse, what if being stuck with current so-so patterns, languages, etc. means we don't innovate in language design, tooling, or other areas that might actually be productivity wins?
I expect it'll balance.
AI isn't very good at being concise, in my experience, to the point of producing worse code. Which is a strange change from humans, who might just have a habit of being too concise, but not to the same degree.
These were maintainers of large open source projects. It's all relative. It's clearly providing massive gains for some and not as much for others. It should follow that its benefit to you depends on who you are and what you are working on.
It isn't black and white.
There are some very good findings though, like how the devs thought they were sped up but they were actually slowed down.
My analogy for this is people spending time figuring out how to change colors and draw shapes in PowerPoint rather than focusing on the content and presentation. So here, we have developers focusing their efforts on correcting the AI output rather than doing the research and improving their ability to deliver code in the future.
Hmm...
When I’m in the “zone”, I wouldn’t go near an LLM, but when I’ve fallen out of the “zone”, they can be useful tools for getting me back into it, or for just finishing that one extra thing before signing off for the day.
I think the right answer to “does LLM use help or hinder developer productivity” is “it depends on how you use them”
I guess the tricky bit is that nobody knows what the future looks like. "The internet is a fad" in 1999 hasn't aged well, but a lot of people also touted 1960s AI, XML, and 3D televisions as things that would be the standard tools within only a few years.
We're all just guessing till then.
They're not great at business logic though, especially if you're doing anything remotely novel. Which is the difficult part of programming anyway.
But yeah, to the average corporate programmer who needs to recreate the same internal business tool that every other company has anyway, it probably saves a lot of time.
How I measure performance is how many features I can implement in a given period of time.
It's nice that people have done studies and have opinions, but for me, it's 10x to 20x better.
latenightcoding•4h ago
edit: should have mentioned the low-level stuff I work on is mature code and a lot of times novel.
sottol•4h ago
I ended up shoehorned into backend dev in Ruby/Py/Java and don't find it improves my day-to-day a lot.
Specifically in C, it can bang out complicated but mostly common data structures without fault, where I would surely make off-by-one errors. I guess since I do C as a hobby, I tend to solve more interesting and complicated problems, like generating a whole array of dynamic C dispatchers from a UI-library spec in JSON that allows parsing and rendering a UI specified in YAML. Gemini Pro even spat out a YAML-dialect parser after a few attempts/fixes.
Maybe it's a function of familiarity and the problems you end up using the AI for.
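To make the off-by-one point concrete, here is a minimal sketch (not the commenter's code) of one such "complicated but mostly common" structure: a fixed-capacity ring buffer in C. The names `RingBuffer`, `rb_init`, `rb_push`, `rb_pop` and the capacity of 4 are all hypothetical. The wraparound index arithmetic is exactly where hand-written versions tend to be wrong by one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed-capacity ring buffer of ints (illustrative only). */
#define RB_CAPACITY 4

typedef struct {
    int data[RB_CAPACITY];
    size_t head;  /* index of the oldest element */
    size_t count; /* number of stored elements; tracking it separately
                     avoids the "head == tail: full or empty?" ambiguity */
} RingBuffer;

static void rb_init(RingBuffer *rb) {
    rb->head = 0;
    rb->count = 0;
}

/* Appends a value; returns false (and stores nothing) when full. */
static bool rb_push(RingBuffer *rb, int value) {
    if (rb->count == RB_CAPACITY) {
        return false;
    }
    /* The write slot is one past the newest element, wrapped with modulo. */
    size_t tail = (rb->head + rb->count) % RB_CAPACITY;
    rb->data[tail] = value;
    rb->count++;
    assert(rb->count <= RB_CAPACITY); /* invariant check in debug builds */
    return true;
}

/* Removes the oldest value into *out; returns false when empty. */
static bool rb_pop(RingBuffer *rb, int *out) {
    if (rb->count == 0) {
        return false;
    }
    *out = rb->data[rb->head];
    rb->head = (rb->head + 1) % RB_CAPACITY;
    rb->count--;
    return true;
}
```

With a capacity of 4, pushing 1 through 4 fills the buffer and a fifth push is rejected. After one pop, a new value wraps around into slot 0, and subsequent pops return 2, 3, 4, then the wrapped value.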
moron4hire•4h ago
Recently, my company has been investigating AI tools for coding. I know this sounds very late to the game, but we're a DoD consultancy, and not one traditionally associated with software development. So most of the people in the company are very impressed with the AI's output.
I, on the other hand, am a fairly recent addition to the company. I was specifically hired to be a "wildcard" in their usual operations. Which is to say, maybe 10 of us in a company of 3000 know what we're doing regarding software (and that's being generous, because I don't really have visibility into half of the company). So that means 99.7% of the company doesn't have the experience necessary to tell what good software development looks like.
The stuff the people using the AI are putting out is... better than what the MilOps analysts pressed into writing Python-scripts-with-delusions-of-grandeur were doing before, but by no means what I'd call quality software. I have pretty deep experience in both back end and front end. It's a step above "code written by smart people completely inexperienced in writing software that has to be maintained over a lifetime", but many steps below, "software that can successfully be maintained over a lifetime".
IX-103•3h ago
You can tweak the prompt a bit to skew the probability distribution with careful prompting (LLMs that are told to claim to be math PhDs are better at math problems, for instance), but in the end all of those weights in the model are spent encoding the most probable outputs.
So, it will be interesting to see how this plays out. If the average person using AI is able to produce above average code, then we could end up in a virtuous cycle where AI continuously improves with human help. On the other hand, if this just allows more low quality code to be written then the opposite happens and AI becomes more and more useless.
jack_h•2h ago
When it comes to software the entire reason maintainability is a goal is because writing and improving software is incredibly time consuming and requires a lot of skill. It requires so much skill and time that during my decades in industry I rarely found code I would consider quality. Furthermore the output from AI tools currently may have various drawbacks, but this technology is going to keep improving year over year for the foreseeable future.
kannanvijayan•4h ago
In both of these cases, I found that just the smart auto-complete is a massive time-saver. In fact, it's more valuable to me than the interactive or agentic features.
Here's a snippet of some code that's in one of my recent buffers:
The only code _I_ actually wrote was the comments. The savings from not having to type out the syntax are pretty big. About 80% of the time in manual coding would have gone to that: little typos, little adjustments to get the formatting right.
The other nice benefit is that I don't have to trust the LLM. I can evaluate each snippet right there, and typically the machine does a good job of picking up the syntactic style and semantics from the rest of the codebase and file and applying them to the completion.
The snippet, if it's not obvious, is from a bit of compiler backend code I'm working on. I would never have even _attempted_ to write a compiler backend in my spare time without this assistance.
For experienced devs, autocomplete is good enough for massive efficiency gains in dev speed.
I still haven't warmed to the agentic interfaces because I inherently don't trust LLMs to produce correct code reliably, so I always end up reviewing it, and reviewing greenfield code is often more work than just writing it (especially now that autocomplete is so much more useful at making that writing faster).
cguess•3h ago
For frontend though? The stuff I really don't specialize in (despite some of my first HTML being written in FrontPage back in 1997), it's a lifesaver. Just gotta be careful with prompts, since so many frontend frameworks are basically backend code at this point.
sysmax•2h ago
Things like "apply this known algorithm to that project-specific data structure" work really well and save plenty of time. Things that require a gut feeling for how things are organized in memory don't work unless you are willing to babysit the model.
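As one concrete reading of "apply this known algorithm to that project-specific data structure", here is a small hypothetical C sketch (not from the thread) that reuses the standard library's `qsort` on a made-up `Task` struct. The struct, its fields, and the `compare_tasks`/`sort_tasks` helpers are illustrative assumptions; the only project-specific work is the comparator, which is the kind of small, well-specified glue the comment describes as working well.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical project-specific record; field names are illustrative. */
typedef struct {
    const char *name;
    int priority;
    unsigned id;
} Task;

/* Comparator adapting the generic qsort to Task: higher priority first,
 * ties broken by ascending id so the order is deterministic even though
 * qsort itself is not guaranteed to be stable. */
static int compare_tasks(const void *a, const void *b) {
    const Task *ta = (const Task *)a;
    const Task *tb = (const Task *)b;
    if (ta->priority != tb->priority) {
        /* (x > y) - (x < y) yields -1/0/1 without the overflow risk of x - y. */
        return (tb->priority > ta->priority) - (tb->priority < ta->priority);
    }
    return (ta->id > tb->id) - (ta->id < tb->id);
}

/* Sorts n tasks in place using the known algorithm (qsort). */
static void sort_tasks(Task *tasks, size_t n) {
    assert(tasks != NULL || n == 0); /* precondition check in debug builds */
    qsort(tasks, n, sizeof *tasks, compare_tasks);
}
```

Sorting `{"b", 1, 2}`, `{"a", 5, 3}`, `{"c", 5, 1}` with `sort_tasks` yields ids 1, 3, 2: the two priority-5 tasks first, ordered by id, then the priority-1 task.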