Before, lines of code was (mis)used to try to measure individual developer productivity. And there was the collective realization that this fails, because good refactoring can reduce LoC, a better design may use less lines, etc.
But LoC never went away, for example, for estimating the overall level of complexity of a project. There's generally a valid distinction between an app that has 1K, 10K, 100K, or 1M lines of code.
Now, the author is describing LoC as a metric for determining the proportion of AI-generated code in a codebase. And just like estimating overall project complexity, there doesn't seem to be anything inherently problematic about this. It seems good to understand whether 5% or 50% of your code is written using AI, because that has gigantic implications for how the project is managed, particularly from a quality perspective.
Yes, as the author explains, if the AI code is more repetitive and needs refactoring, then the AI proportion will seem overly high in terms of how much functionality the AI proportion contributes. But at the same time, it's entirely accurate in terms of how this is possibly a larger surface for bugs, exploits, etc.
And when the author talks about big tech companies bragging about the high percentage of LoC being generated with AI... who cares? It's obviously just for press. I would assume (hope) that code review practices haven't changed inside of Microsoft or Google. The point is, I don't see these numbers as being "targets" in the way that LoC once were for individual developer productivity... there's more just a description of how useful these tools are becoming, and a vanity metric for companies signaling to investors that they're using new tools efficiently.
I'd say you're operating on a higher plane of thought than the majority in this industry right now. Because the majority view roughly appears to be "Need bigger number!", with very little thought, let alone deep thought, employed towards the whys or wherefores thereof.
The overall level of complexity of a project is not an "up means good" kind of measure. If you can achieve the same amount of functionality, obtain the same user experience, and have the same reliability with less complexity, you should.
Accidental complexity, as defined by Brooks in No Silver Bullet, should be minimized.
And if AI tools are writing all of the code, does it even matter anymore?
Google engineer perspective:
I'm actually thinking code reviews are one of the lowest hanging fruits for AI here. We have AI reviewers now in addition to the required human reviews, but it can do anything from be overly defensive at times to finding out variables are inconsistently named (helpful) to sometimes finding a pretty big footgun that might have otherwise been missed.
Even if it's not better than a huamn reviwer, the faster turnaround time for some small % of potential bugs is a big productivity boost.
> AI didn't just repeat the mistake. It broke the mistake open.
Come on bruh
- Is the client happy? - Are the team members growing(as in learning)? - Were we able to make a profit?
Everything else was less relevant. For example: Why do I care that the project took bit longer, if at the end the client was happy with the result, and we can continue the relationship with new projects. It frees you from the cruelty of dates that are often set arbitrary.
So perhaps we should evaluate AI coding tools the same. If we can deliver successful projects in a sustainable way, then we are good.
It also also often fails to clean up after itself. When you remove a feature (one that you may not have even explicitly asked for) it will sometimes just leave behind the unused code. This is really annoying when reviewing and you realize one of the files you read through is referenced nowhere.
You have to keep a close eye out to prevent bloat from these issues.
I like the author's proposed "Comprehension coverage" metric. It aligns well with Naur's Programming as Theory Building.
This off-the-cuff statement buries so much complexity. Sure it catches new code the exactly implements existing code, but IME it is __way__ more common to need to slightly (or not so slightly) change existing code that can now be used by multiple consumers, and then delete the new "duplicate" code. That is not trivial and requires (1) judgement from your AI coder and (2) deep reviewer expertise from your human coder.
kittikitti•1h ago
These metrics for advanced roles are not applicable, no matter what you come up with. But even lines of code are good enough to see progress from a blank slate. Every developer or advanced AI agent must be judged on a case by case basis.
kemotep•1h ago
The OpenBSD project prides itself on producing very secure, bug free software, and they largely trend towards as low of lines of codes as they can possibly get away with while maintaining readability (so no codegolf tricks for the most part). I would rather we write secure bug free software than speed up the ability to output 10kLOC. The typing out code isn’t the difficult part in that scenario.
skydhash•45m ago
But reducing the amount of LoC helps, just like using the correct word helps in writing text. That’s the craft part of software engineering, having the taste to write clear and good code.
And just like writing and any other craft, the best way to acquire such taste is to study others works.