Now I can write one example of a passing case and then get Codex to read the code and write a test for every branch in that section. It saves time, since it can type a lot faster than I can, and it's mostly copying the example I already have, just changing the input to hit all the branches.
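Roughly what that looks like in practice, as a minimal sketch (the function and the test cases here are made up for illustration):

```python
# Hypothetical sketch: one hand-written case, then the agent copies the pattern
# and varies the input so every branch of the function is exercised.
import pytest

def classify(n):  # stand-in for the real function under test
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    return "positive"

@pytest.mark.parametrize("value, expected", [
    (-5, "negative"),   # the one example written by hand
    (0, "zero"),        # added to cover the n == 0 branch
    (7, "positive"),    # added to cover the fall-through branch
])
def test_classify(value, expected):
    assert classify(value) == expected
```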
Lmao. Rofl even.
(Testing is the one thing you would never outsource to AI.)
It could still speculate wrong things, but it won't speculate that the code is supposed to crash on the first line.
That doesn't tell you that the code is correct. It tells you that the branching code can reach all the branches. That isn't very useful.
I would rephrase that as "all LLMs, no matter how many you use, are only as good as one single pair of eyes".
If you're a one-person team and have no capital to spend on a proper test team, set the AI at it. If you're a megacorp with 10k full time QA testers, the AI probably isn't going to catch anything novel that the rest of them didn't, but it's cheap enough you can have it work through everything to make sure you have, actually, worked through everything.
That's not really true.
Making the AI write the code, the test, and the review of itself within the same session is YOLO.
There's a ton of scaffolding in testing that can be easily automated.
When I ask the AI to test, I typically provide a lot of equivalence classes.
And the AI still surprises me with finding more.
On the other hand, it's equally excellent at saying "it tested", and when you look at the tests, they can be extremely shallow. Or there can be a fair number of unit tests for certain parts of the code, but when you run the whole program, it just breaks.
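A concrete (made-up) sketch of what providing those equivalence classes looks like, as opposed to the shallow happy-path tests it sometimes writes on its own:

```python
# Hypothetical sketch: spelling out the input partitions for the agent,
# rather than letting it assert only on the happy path.
import pytest

def parse_age(text):  # stand-in for the real parser under test
    value = int(text)
    if value < 0 or value > 150:
        raise ValueError("age out of range")
    return value

@pytest.mark.parametrize("text, expected", [
    ("0", 0),       # lower boundary
    ("30", 30),     # typical value
    ("150", 150),   # upper boundary
])
def test_parse_age_valid(text, expected):
    assert parse_age(text) == expected

@pytest.mark.parametrize("text", ["-1", "151", "abc", ""])  # invalid classes
def test_parse_age_invalid(text):
    with pytest.raises(ValueError):
        parse_age(text)
```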
The most valuable testing when programming with AI (generated by AI or otherwise) is near-realistic integration tests. That's true for human programmers too, but we take for granted that casual use of the program we make as we develop it serves as a poor man's test. When people who generally don't write tests start using AI, there's just nothing but fingers crossed.
I'd rather say: If there's one thing you would never outsource to AI, it's final QA.
https://en.wikipedia.org/wiki/Amdahl's_law
a 100% increase in coding speed means I then get to spend an extra 30 minutes a week in meetings
while now hating my job, because the only fun bit has been removed
"progress"
And here we are, the central argument for why code agents are not these job killing hype beasts that are so regularly claimed.
Has anyone seen what multi-agent code workflows produce? Take a look at openclaw, the code base is an absolute disaster. 500k LoC for something that can be accomplished in 10k.
In the days of AGI, higher LoC is better. It just means the code is more robust, more adaptable, better suited to real world conditions.
I have. Sometimes the resulting code was much worse than what you get from an LLM, and yet the project itself was still a success despite this.
I've also worked in places with code review, where the project's own code-quality architecture and process made it so late to market that it was an automatic failure.
What matters to a business is ideally identical to the business metrics, which are usually not (but sometimes are) the code metrics.
Mission accomplished: an acquihire worth probably millions and millions.
I agree with you, by the way.
The bottleneck is aligning people on what the right thing to do is, and fitting the change into everyone's mental models. It gets worse the more people are involved.
Getting code written and reviewed is the trivial part of the job in most cases; discovering the product needs, considering/uncovering edge cases, defining business logic that is extensible or easily modifiable when conditions change, etc., are the parts that consume 80% of my time.
We in the engineering org at the company I work for raised this flag many times during the adoption of AI-assistance tools. Now that the rollout is well underway, with most developers using the tools and changing their workflows, it has become the sore thumb sticking out: yes, we can deliver more code if it's needed, but what exactly do you need it for?
So far I haven't seen a speed-up in decision-making; the same chain of approvals, prioritisation, and definitions chugs along as it did, and it is clearly the bottleneck.
If we're very lucky, we'll break even time wise compared to just running a single agent on a tight leash.
The threat of AI for devs, and the way to drastically improve productivity, is there: keep the better devs who can think systemically, who can design solutions, who can solve issues themselves, give them all the AI help available, and cut the rest.
It really does feel like a multiplier on me and I understand things enough to get my hands dirty where Claude struggles.
Lately I’ve been wondering if that role evolves into a more hierarchical review system: senior engineers own independent modules end-to-end, and architects focus on integration, interfaces, and overall coherence. Honestly, the best parts of our product already worked like that even before AI.
Soon, I predict we will see a pretty significant jump in price that will make a 10% productivity gain seem tiny compared to the associated bills.
For now, these companies are trying to reach critical mass so their users become so dependent on their tech that they have to keep paying, at least in the short term.
Regardless, if you’re a dev who is now 2x as productive in terms of work completed per day, and quality remains stable, why should this translate to 2x the output? Most people are paid by the hour and not for outcomes.
And yes, I am suggesting that if you complete in 4 hours that which took you 8 hours in 2019, that you should consider calling it a day.
At a personal level, AI has made non-trivial improvements to my life. I can clearly see the value there.
At an organizational level, it tends to get in the way much more than it helps. I do not yet see the value there.
Summarizing this with AI makes you lose that context.
> The minutiae are often just as important as some of the higher level decisions.
Frankly, a failure to understand this is a tell that someone is not equipped to evaluate code quality.
According to the article, onboarding speed is measured as “time to the 10th Pull Request (PR).”
As we have seen on public GitHub projects, LLMs have made it really easy to submit a large number of low-effort pull requests without having any understanding of a project.
Obviously, that kind of higher onboarding speed is not necessarily good for an organization.
Is that mastery still useful as time goes on, though? It's always felt a bit unhealthy for code to have people with mastery over it; it's a sign of a bad bus factor. Every effort I've ever seen around code quality and documentation improvement has been about making that mastery and full understanding irrelevant.
(It used a clever and rigorous technique for measuring productivity differences, BTW, for anyone as skeptical of productivity measures as I am.)
[0] https://mikelovesrobots.substack.com/p/wheres-the-shovelware...
That info is from mid-2025, talking about models released in Oct 2024 and Feb 2025. It predates tools like Claude Code and Codex, Lovable was at a third of its current ARR, etc.
This might still be true but we desperately need new data.
(Also, Anthropic released Claude Code in February of 2025, which was near the start of the period the study ran).
However, because these threads always go the same way whenever I post this, I'll link to a previous thread in hopes of preempting the same comments and advancing the discussion! https://news.ycombinator.com/item?id=46559254
Also, DX (whose CTO was giving the presentation) actually collects telemetry-based metrics (PRs, etc.) as well: https://getdx.com/uploads/ai-measurement-framework.pdf
It's not clear from TFA if these savings are self-reported or from DX metrics.
I don't think this is purely an AI problem; it's more about the legacy cost of maintaining many minds, which can't be solved just by giving people AI tools, at least until the AI comes for the CTO role (but not the CEO or revenue-generating roles) and for whichever manager is the bottleneck.
I imagine a future where we have Nasdaq-listed companies run by just a dozen people, with AI agents running and talking to each other so fast that text becomes a bottleneck and they need another medium, one that can only be understood by an AI holding the humans' hands.
This shift would also be reflected in new hardware: perhaps photonic chips, or anything that lets AI scale up dramatically without the energy cost.
Exciting times are ahead for AI, but it's also accelerating the move toward digital UBI... which could be good and bad.
Do you have sources for this claim?
* Getting code reviewed
* Making sure it's actually solving the problem
* Communicating to the rest of the team what's happening
* Getting tests to pass
* Getting it deployed
* Verifying that the fix is implemented in production
* Starting it all over when there is a misunderstanding
Slinging more code faster is great, and getting unit testing more-or-less for free is awesome, but the separation between a good and a great engineer is one of communication and management.
AI is causing us to regress to thinking that code velocity is a good metric to use when comparing engineers.
Expecting AI to magically overcome your development culture is like expecting consultants to magically fix your business culture.
Furthermore, by various estimates, engineers only spend 10 - 60% of their time on actual code. So, given that currently AI is largely used only for coding activities, 10% is actually considerable savings.
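To make that concrete, here is a rough inversion of the same Amdahl's law arithmetic (treating the 10% as time saved across the whole job; the coding fractions are assumptions for illustration):

```python
# How much faster would coding alone have to get to save 10% of total time,
# given how much of the job is actually spent coding?
def required_coding_speedup(coding_fraction, overall_saving=0.10):
    # Solve (1 - f) + f/s = 1 - saving for s; impossible when f <= saving.
    if coding_fraction <= overall_saving:
        return float("inf")
    return coding_fraction / (coding_fraction - overall_saving)

for f in (0.15, 0.30, 0.60):
    print(f, round(required_coding_speedup(f), 2))
# 0.15 -> 3.0x, 0.30 -> 1.5x, 0.60 -> 1.2x faster coding needed
```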
Also, this is the result of retrofitting AI into existing workflows; actual "AI-native" workflows would probably look very different, with other parts of software engineering refactored as well. Spotify's "Honk" workflow is probably just a starting point.
I'll be honest: I write piss-poor code; each time I come back to an old project I see where I could have done better. New hires are worse, but before AI (and especially Opus) they didn't produce that much code before spending around six months learning (I'm on a netsec tooling team). Now they start producing code after two weeks or less, and every line has to be checked because they don't understand what they are doing.
I think my personal output increased by 15% on average (maybe 5% on difficult projects), but our team output decreased overall.
And I'm also hearing grumblings about entry level talent that is absolutely clueless without AI, which does not help the junior hiring scene at all.
At this point it seems clear that people wishing to learn a discipline should restrict their usage of AI until they have "built the muscles", but none of our educational, testing, recruitment and upskilling practices are conducive to that.
Are the AI firms capable of retraining their models to understand new features in the technologies we work with? Or are LLMs going to be stuck generating ca. 2022 boilerplate forever?
Right now, front end has tons of boilerplate. It's one of the reasons AI has such a wow factor for FE: trivial tasks require a lot of code.
But even that is much better than it was 10 years ago.
That was a long way of saying I disagree with your no.
The OG data came from sites like Stackoverflow. These sites will stop existing once LLMs become better and easier to use. Game over.