So in that extra time, you can now stack more PRs that still have a 30 hour review time and have more overall throughput (good lord, we better get used to doing more code review)
This doesn’t work if you spend 3 minutes prompting and 27 minutes cleaning up code that would have taken 30 minutes to write anyway, as the article details, but that’s a different failure case imo
Generally if your job is acting as an expensive frontend for senior engineers to interact with Claude Code, well, speaking as a senior engineer I'd rather just use Claude Code directly.
We can use AI these days to add another layer.
Hang on, you think that a queue that drains at a rate of $X/hour can be filled at a rate of 10x$X/hour?
No, it cannot: it doesn't matter how fast you fill a queue if the queue has a constant drain rate, sooner or later you are going to hit the bounds of the queue or the items taken off the queue are too stale to matter.
In this case, filling a queue at a rate of 20 items per hour (every 3 minutes) while it drains at a rate of 1 item every 5 hours means that after a single day, you can expect your last PR to be reviewed in ((8x20) - 1) hours.
IOW, after a single day the time-to-review is 159 hours. Your PRs after the second day are going to take 300+ hours.
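A rough way to check that arithmetic, as a sketch only: it assumes a single FIFO review queue, ignores any reviews completed while the queue is still filling, and measures how long the last PR waits before its review starts. The function name and parameters are made up for illustration.

```python
def hours_until_last_review(prs_per_hour: int, hours_filling: int, hours_per_review: int) -> int:
    """Wait (in hours) before the last submitted PR reaches the front of a FIFO queue.

    Simplifying assumptions: one queue, a constant review time, and no reviews
    completed while the queue is being filled.
    """
    submitted = prs_per_hour * hours_filling
    ahead_of_last = submitted - 1  # every other PR gets reviewed first
    return ahead_of_last * hours_per_review


# One 8-hour day at 20 PRs/hour, one review per hour
print(hours_until_last_review(20, 8, 1))   # 159
# Same day, but each review takes 5 hours
print(hours_until_last_review(20, 8, 5))   # 795
# Two days of filling, one review per hour
print(hours_until_last_review(20, 16, 1))  # 319
```

Under these assumptions the 159-hour figure corresponds to one review per hour; at one review every five hours the same backlog takes longer still. Either way the wait grows linearly for as long as the fill rate exceeds the drain rate.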
There are some strategies that help: a lot of the AI directives need to go towards making the code actually easy to review. A lot of it sits around clarity and granularity (code should be committed primarily in reviewable chunks, units of work that make sense for review) rather than whatever you would have done previously when code production was the bottleneck. Similarly, AI use needs to be weighted not just more towards tests, but towards tests that concretely and clearly answer questions that come up in review (what happens on this boundary condition? or if that variable is null? etc). Finally, changes need to be stratified along lines of risk rather than code modularity or other dimensions. That is, if a change is evidently risk free (in the sense of "even if this IS broken it doesn't matter") it should be able to be rapidly approved / merged. Only things where it actually matters if it is wrong should be blocked.
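One way the risk stratification could look in practice, as a minimal sketch: the path patterns and tier names below are hypothetical, and a real team would define its own. It splits a change into per-risk chunks so the low-risk part can be fast-tracked while the rest waits for full review.

```python
from fnmatch import fnmatch

# Hypothetical path patterns; these are assumptions, not a standard.
RISK_RULES = [
    ("high", ["migrations/*", "*.sql", "payments/*"]),
    ("low", ["*.css", "*.md", "docs/*"]),
]
DEFAULT_RISK = "medium"
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}


def file_risk(path: str) -> str:
    """Return the first matching tier for a path, else the default tier."""
    for tier, patterns in RISK_RULES:
        if any(fnmatch(path, pattern) for pattern in patterns):
            return tier
    return DEFAULT_RISK


def change_risk(paths: list[str]) -> str:
    """A change is treated as being as risky as its riskiest file."""
    return max((file_risk(p) for p in paths),
               key=RISK_ORDER.__getitem__, default=DEFAULT_RISK)


def split_by_risk(paths: list[str]) -> dict[str, list[str]]:
    """Group changed files into separately reviewable chunks, one per tier."""
    groups: dict[str, list[str]] = {}
    for p in paths:
        groups.setdefault(file_risk(p), []).append(p)
    return groups


changed = ["static/site.css", "migrations/0042_add_index.py", "app/views.py"]
print(change_risk(changed))    # high
print(split_by_risk(changed))
```

Bundled together, the whole change inherits the "high" tier from the migration; split up, the CSS chunk could be merged quickly on its own.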
I have a feeling there are whole areas of software engineering where best practices are just operating on inertia and need to be reformulated now that the underlying cost dynamics have fundamentally shifted.
Why don't those other dimensions, and especially the code modularity, already reflect the lines of business risk?
Lemme guess, you cargo culted some "best practices" to offload risk awareness, so now your code is organized in "too big to fail" style and matches your vendor's risk profile instead of yours.
I guess the answer (if you're really asking seriously) is that previously when code production cost so far outweighed everything else, it made sense to structure everything to optimise efficiency in that dimension.
So if a change was implemented, the developer would deliver it as a functional unit that might cut across several lines of risk (low risk changes like updating some CSS sitting alongside higher risk ones like a database migration, all bundled together), because this was what made it fastest for the developer to implement the code.
Now if AI is doing it, screw how easy or fast it is to make the change. Deliver it in review chunks.
Was the original method cargo culted? I think most of what we do is cargo culted regardless. Virtually the entire software industry is built that way. So probably.
I think GP is thinking in terms of being incentivized by their environment to demonstrate an image of high personal throughput.
In a dysfunctional organization one is forced to overpromise and underdeliver, which the AI facilitates.
If I can approve something without review, it’s instant. If it requires only immediate manager, it takes a day. Second level takes at least ten days. Third level trivially takes at least a quarter (at least two if approaching the end of the fiscal year). And the largest proposals I’ve pushed through at large companies, going up through the CEO, take over a year.
You don't need so much code or maintenance work if you get better requirements upfront. I'd much rather implement things at the last minute knowing what I'm doing than cave in to the usual incompetent middle manager demands of "starting now to show progress". There's your actual problem.
In software it's the opposite, in my experience.
> You don't need so much code or maintenance work if you get better requirements upfront.
Sure, and if you could wave a magic wand and get rid of all your bugs that would cut down on maintenance work too. But in the real world, with the requirements we get, what do we do?
That's been my experience as well: ten hours of doing will definitely save you an hour of planning.
If you aren't getting requirements from elsewhere, at least document the set of requirements you think you're working towards, and post them for review. You sometimes get new useful requirements very fast if you post "wrong" ones.
And I think this has become even more so in the age of AI, because there are even more unknown unknowns, which are harder to discover while planning but easy to discover while “doing”, and that “doing” itself is so much more streamlined.
In my experience no amount of planning will de-risk a software engineering effort. What works is making sure that coming back to refactor or switch tech is less expensive, which allows you to rapidly change the approach when you inevitably discover some roadblock.
You can read all the docs during planning phases, but you will stumble on some undocumented behaviour / bug / limitation every single time and then you are back to the drawing board. The faster you can turn that around the faster you can adjust and go forward.
I really like the famous quote from Churchill: “Plans are useless, planning is essential.”
You expect your calculator to always give correct answers, your bank to always transfer your money correctly, and so on.
I agree with him anyway: if every dev felt comfortable hitting a stop button to fix a bug then reviewing might not be needed.
The reality is that any individual dev will get dinged for not meeting a release objective.
Now I work at a company where reviews take minutes. We have 5 lines of technical debt per 3 lines of code written. We spend months to work on complicated bugs that have made it to production.
So we will need to extract the decision making responsibility from people management and let the Decision maker be exclusively focused on reviewing inputs, approving or rejecting. Under an SLA.
My hypothesis is that the future of work in tech will be a series of these input/output queue reviewers. It's going to be really boring I think. Probably like how it's boring being a factory robot monitor.
Actually you can. If you shift the reviews far to the left, and call them code design sessions instead, and you raise problems on dailies, and you pair programme through the gnarly bits, then 90% of what people think a review should find goes away. The expectation that you'll discover bugs and architecture and design problems doesn't exist if you've already agreed with the team what you're going to build. The remaining 10% of things like var naming, whitespace, and patterns can be checked with a linter instead of a person. If you can get the team to that level you can stop doing code reviews.
You also need to build a team that you can trust to write the code you agreed you'd write, but if your reviews are there to check someone has done their job well enough then you have bigger problems.
Agents are getting really good, and if you're used to planning and designing up front you can get a ton of value from them. The main problem with them that I see today is people having that level of trust without giving the agent the context necessary to do a good job. Accepting a zero-shotted service to do something important into your production codebase is still a step too far, but it's an increasingly small step.
I have been doing this too, and I've forgotten half of them. For me the point is that this usage scenario is really good, but it also has no added value to it, really. The moment Claude Code raises its prices 2x this won't be viable anymore, and at the same time to scale this to enterprise software production levels you need to spend on an agent probably as much as hiring two SWEs, given that you need at least one to coordinate the agents.
1. The longer I work in this industry, the more it becomes clear that CxOs aren't great at projecting/planning, and default to copy-cat, herd behaviors when uncertain.
Perhaps kind of a pain to inject fixes in, since you have to rebase the outstanding work. But I kind of like this idea of the org having responsibility to do what review it wants, without making every person have to corral all the cats to get all the check marks. Make it the org's challenge instead.
I tell every hire new and old “Hey do your thing, we trust you. Btw we have your phone number. Thanks”
Works like a charm. People even go out of their way to write tests for things that are hard to verify manually. And they verify manually what’s hard to write tests for.
The other side of this is building safety nets. Takes ~10min to revert a bad deploy.
1. I don't care because the company at large fails to value quality engineering.
2. 90% of PR comments are arguments about variable names.
3. The other 10% are mistakes that have very limited blast radius.
It's just that, unless my coworker is a complete moron, then most likely whatever they came up with is at least in acceptable state, in which case there's no point delaying the project.
Regarding knowledge share, it's complete fiction. Unless you actually make changes to some code, there's zero chance you'll understand how it works.
My trust in my colleagues is gone, I have no reason to believe they wrote the code they asked me to put my approval on, and so I certainly don’t want to be on a postmortem being asked why I approved the change.
Perhaps if I worked in a different industry I would feel like you do, but payments is a scary place to cause downtime.
https://blog.barrack.ai/amazon-ai-agents-deleting-production...
But. The design contract needs review, which takes time.
and it also works for me when working with ai. that produces much better results, too, when I first do a design session really discussing what to build. then a planning session, in which steps to build it ("reviewability" world wonder). and then the instruction to stop when things get gnarly and work with the hooman.
does anyone here have a good system prompt for that self observance "I might be stuck, I'm kinda sorta looping. let's talk with hooman!"?
> Get it code reviewed by the peer next to you 300 minutes → 5 hours → half a day
If it takes 5 hours for a peer to review a simple bugfix your operation is dysfunctional.
We talked a lot about the costs of context switches so it's reasonable to finish your work before switching to the review.
This seems to check out, and it's the reason why I can't reconcile the industry's claims about worker replacement with reality. I still wonder when a reckoning will come, though. Seems long overdue in the current environment.
tptacek•2h ago
abtinf•2h ago
paulmooreparks•2h ago
bsjshshsb•37m ago
Needing full human attention on a complex task from a pro who can only look at your thing has a wait time. It is worse when there are only 2 or 3 such people in the world you can ask!
nixon_why69•2h ago
lelanthran•2h ago
Most devs set aside some time at most twice a day for PRs. That's 5 hours at least.
Some PRs come in at the end of the day and will only get looked at the next day. That's more than 5 hours.
IME it's rare to see a PR get reviewed in under 5 hours.
CBLT•1h ago
riffraff•1h ago
If you work in a team of 5 people, and each one only reviews things twice a day, that's still less than 5 hours any way you slice it.
Aurornis•2h ago
ukuina•23m ago
Most excellent.