Is that how it works? Do managers claim credit for the work of those below them, despite not doing the work?
I hope they also get penalised when a lowly worker does a bad thing, even if the worker is an LLM silently misinterpreting a vague instruction.
When things go south, nobody actually gets penalized. A quick "post-mortem" gets written in Confluence and people list "action items". So, yeah, no need for the manager to take the blame.
It's all very shitty, but it's always been like that.
Yeah, the buck stops with the manager (IMO the direct manager). So I can offer some constructive criticism to my dev if they make a mistake, but in the larger org it's my fault that it happened. Then it's my manager's job to work with me to make sure I create an environment where the same mistake doesn't happen again. Am I training them well? Am I giving them well-scoped work? All that.
Solving new problems is a thing engineers get to do constantly, whereas building an agent infrastructure is mostly a one-ish time thing. Yes, it evolves, but I worry that once the fun of building an agentic engineering system is done, we’re stuck doing arguably the most tedious job in the SDLC, reviewing code. It’s like if you were a principal researcher who stopped doing research and instead only peer reviewed other people’s papers.
The silver lining is if the feeling of faster progress through these AI tools gives enough satisfaction to replace the missing satisfaction of problem-solving. Different people will derive different levels of contentment from this. For me, it has not been an obvious upgrade in satisfaction. I’m definitely spending less time in flow.
but a chart of commits/contribs is such a lousy metric for productivity.
It's about on par with the ridiculousness of LOC implying code quality.
And it's not like I'm blindly committing LLM output. I often write everything myself because I want to understand what I'm doing. Claude often comments that my version is better and cleaner. It's just that the tasks seemed so monumental I felt paralyzed and had difficulty even starting. Claude broke things down into manageable steps that were easy to do. Having a code review partner was also invaluable.
That said, by the time I'm happy with it, all the AI stuff outside very boilerplate ops/config work has been rewritten and refined. I just find it quite helpful to get over that initial hump of "I have nothing but a dream" to the stage of "I have a thing that compiles but is terrible". Once it compiles, I can refine it, which is where my strengths lie.
Every comment I make is a "really perceptive observation" according to Claude and every question I ask is either "brilliant" or at least "good", so...
The most effective engineers on the brownfield projects I've worked on usually deleted more LOC than they added, because they were always looking to simplify the code and replace it with useful (and often shorter) abstractions.
Especially in brownfield settings, if you do use CC, you really should be spending something like a day refactoring the code for every 15 minutes of work it spends implementing new functionality. Otherwise the accumulation of technical debt will make the code base unworkable by both human and claude hands in a fairly short time.
I think overall it can be a force for good, and a source of high quality code, but it requires a significant amount of human intervention.
Claude Code operating unsupervised on Claude-written code fairly rapidly generates a mess not even Claude Code can decode, resulting in a sort of technical-debt Kessler syndrome, where the low quality makes the edits worse, which makes the quality worse, rinse and repeat.
the assumption behind this workflow is that claude code can complete tasks with little or no oversight.
If the flow looks like review->accept, review->accept, it is manageable.
In my personal experience, claude needs heavy guidance and multiple rounds of feedback before arriving at a mergeable solution (if it does at all).
Interleaving many long running tasks with multiple rounds of feedback does not scale well unfortunately.
I can only remember so much, and at some point I spend more time trying to understand what has been done so far to give accurate feedback than actually giving feedback for the next iteration.
I'm so conflicted about this. On the one hand I love the buzz of feeling so productive and working on many different threads. On the other hand my brain gets so fried, and I think this is a big contributor.
I have nothing to back up the idea though.
I also have nothing to back it up, but it fits my mental models. When juggling multiple things as humans, it eats up your context window (working memory). After a long day, your coherence degrades and your context window needs flushing (sleeping) and you need to start a new session (new day, or post-nap afternoon).
I prefer focusing mostly on 1 task at a time (sometimes 2 for a short while, or asking another agent some questions simultaneously) and doing the task in chunks, so it doesn't take long before there's something to review. Then I review it, maybe ask for some refactoring, and let it continue to the next step (maybe letting it continue a bit before finishing the review if I'm feeling confident about the code). It's easier to review smaller self-contained chunks, and easier to point at the code and tell the AI what needs changing, because there are fewer relevant lines.
Turns out we weren't opposed to bad metrics! We were just opposed to being measured! Given the chance to pick our own, we jumped straight to the same nonsense.
This seems like a distinction without a difference, unless there actually are any good metrics (which also requires them to be objectively and reliably quantifiable). I think most developers don't really want to measure themselves, it's just that pro-AI people think measurement is necessary to put forward a convincing argument that they've improved anything.
If you have the tokens for it, having a team of agents checking and improving on the work does help a lot and reduces the slop.
Why do people do this? A PR description is meant to be written by a human so that another human can understand what the first human wanted to do. Why outsource that to AI? It just doesn't make sense.
We have “Cursor Bot” enabled at work. It reviews our PRs (in addition to a human review)
One thing it does is add a PR summary to the PR description. It’s kind of helpful since it outlines a clear list of what changed in code. But it would be very lacking if it was the full PR description. It doesn’t include anything about _why_ the changes were made, what else was tried, what is coming next, etc.
Mentioning LLM usage as a distinction is like bragging about using a modern compiler instead of writing assembly. Yeah it's faster, but so is everyone else's code... Besides, I wouldn't brag about being more productive with LLMs, because it's a double-edged sword: it's very easy to use them, and nobody is reviewing all the lines of code you are pushing to prod (really, when was the last time you reviewed a PR generated by AI that changed 20+ files and added/removed thousands of lines of code?), so you don't know how your changes play out in the long run; they seem to work now, but who knows how it will turn out later?
Outside of work, yeah, everything is fine and there's nothing but the pure pursuit of knowledge and joy.
Is that the end game? Well why can’t the agents orchestrate the agents? Agents all the way down?
The whole agent coding scene seems like people selling their soul for very shiny inflatable balloons. Now you have twelve bespoke apps tailored for you that you don’t even care about.
Unless you don't review every generated line manually, and instead rely on, let's say, UI e2e testing, or perhaps unit testing (that the agents also wrote). I don't know, perhaps we are past the phase of "double check what agents write" and are now in the phase of "ship it. if it breaks, let agents fix it, no manual debugging needed!" ?
I'm sure these larger models are both faster and more cogent, but it's also clear that what matters is managing their side tracks and cutting them short. Then I started seeing the deeper problematic pattern.
Agents aren't there to increase the productivity multiplier; their real purpose is to shorten context to manageable levels. In effect, they're basically trying to reduce the odds of a longer context getting poisoned.
So, if we boil it down to the probability of any given token triggering the wrong subcontext, it's clear that the greater the context, the greater the odds of a poison substitution.
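Rough back-of-the-envelope version of that intuition (p and n here are just made-up symbols, not anything measured): if each token independently has some small probability p of steering the model into the wrong subcontext, then the chance of at least one poisoning event across a context of n tokens is 1 - (1-p)^n, which climbs toward 1 as n grows.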
That's really the core problem every model is going to contend with, because there's zero reality in which a single model is good enough. So now you're onto agents: breaking a problem into more manageable subcontexts and trying to fold the results back into the larger context gracefully, etc.
Then that fails, because there's zero consistent determinism, so you end up at the harness, trying to herd the cats. This is all before you realize that these businesses can't just keep throwing GPUs at everything, because the problem isn't compute-bound; it's a contextual/DAG limit, the same way a brain is limited.
We all have intelligence and use several orders of magnitude less energy doing mostly the same thing.
I've started using git worktrees to parallelize my work. I spend so much time waiting... why not wait less by running 2 things? This is not a solved problem in my setup. I have a hard time managing just two agents and keeping them isolated. But again, I'm the bottleneck. I think I could use 5 agents if my brain were smarter... or if the tools were better.
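In case it's useful to anyone, the core of the setup is just a couple of extra checkouts of the same repo (the paths and branch names below are made up):

    git worktree add ../myrepo-agent-a feature-a     # second checkout on an existing branch
    git worktree add -b feature-b ../myrepo-agent-b  # or create a new branch in one step
    git worktree list                                # see all checkouts
    git worktree remove ../myrepo-agent-a            # clean up once the branch is merged

Each worktree shares the same .git history but has its own files on disk, so one agent per worktree can't clobber another's edits.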
I am also a PM by day and I'm in Claude Code for PM work almost 90% of my day.