let me describe my workflow first - this has been my workflow across hundreds of successful sessions:
1. identify bugs through dogfooding
2. ask claude code to investigate the codebase for three potential root causes
3. paste the root causes and proposed fixes into a claude project where i store all the architecture docs and design decisions, for it to evaluate
4. discuss with claude in the project to write a detailed task spec - the task spec has a specified format with all sorts of tests
5. give it back to claude code to implement the fix
in today's session, the root cause analysis was still great, but the proposed fixes were so bad that i really think this is how most vibe-coded projects lose maintainability in the long run.
here are two of the root causes and proposed fixes:
bug: the agent asks for user approval, but sometimes the approval popup doesn't show up. i tried sending a message to unstick it. the message got silently swallowed. the agent looked dead, and i needed to restart the entire thing.
claude's evaluation: root cause 1: the approval popup is sent once over a live connection. if the user's ui isn't connected at that moment — page refresh, phone backgrounded, flaky connection — they never see it. no retry, no recovery.
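to make root cause 1 concrete, here's a minimal sketch of the fire-and-forget failure mode (all names here are invented for illustration, not the actual codebase): the approval request is pushed once over whatever connection happens to be live, and if nothing is connected at that instant, it's just dropped with no pending record to resend on reconnect.

```python
# hypothetical sketch of root cause 1: one-shot delivery over a live connection.
# `connections` maps user ids to live sockets; names are assumptions, not real code.
connections = {}  # user_id -> live connection object; empty while the ui is away

def request_approval(user_id, tool_call):
    """push an approval popup once; returns whether it was delivered."""
    conn = connections.get(user_id)
    if conn is None:
        # page refresh / phone backgrounded / flaky connection:
        # the popup is simply lost. no pending-approval record exists,
        # so a later reconnect has nothing to resend. no retry, no recovery.
        return False
    conn.send({"type": "approval_request", "tool_call": tool_call})
    return True
```

the point is that delivery success depends entirely on connection state at one instant, which is exactly the window the bug report describes.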
this is actually true.
proposed fix: "let's save approval state to disk so it survives crashes". sounds fine, but the catch is that, by design, if things crash the agent cold-resumes from the session log, and it won't pick up the approval state anyway. the fix just adds schema complexity and is completely useless.
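the uselessness is easy to see in a sketch (again, hypothetical function and file names): if cold-resume rebuilds state purely from the session log, a separate approval-state file is persisted on one path and never read on the path that matters.

```python
import json

def cold_resume(session_log_path):
    """hypothetical resume path: rebuild agent state purely from the
    session log, one json entry per line."""
    # note what's absent: nothing here ever opens an approval_state file.
    # persisting approval state adds a write path and a schema that the
    # recovery path simply ignores - dead weight by design.
    with open(session_log_path) as f:
        return [json.loads(line) for line in f]
```

any state that recovery doesn't read back is pure complexity; that's the author's objection in one function.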
root cause 2: when an approval gets interrupted (daemon crash, user restart), there's an orphan tool_call in the session history with no matching tool_result.
proposed fix: "write a synthetic tool_result to keep the session file structurally valid." sounds clean. but i asked: who actually breaks on this? the LLM API? no, it handles missing results. the session replay? no, it reads what's there. the orphan tool_call accurately represents what happened: the tool was called but never completed. that's the truth. writing a fake result to paper over it introduces a new write-coordination concern (when exactly do you write the fake result? what if the daemon crashes during the write?) to solve a problem that doesn't exist. the session file isn't "broken." it's accurate.
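to make the "who actually breaks on this?" question concrete, here's a toy replay over a session log containing an orphan tool_call (the entry structure is invented for illustration, not the real session format): the reader just walks the entries and pairs results where they exist.

```python
# toy session log with an orphan tool_call (hypothetical format):
# the daemon died before the tool finished, so there is no tool_result -
# which is an accurate record of what happened.
session = [
    {"type": "message", "role": "user", "text": "delete the temp files"},
    {"type": "tool_call", "id": "tc_1", "name": "rm_temp"},
    # no {"type": "tool_result", "call_id": "tc_1"} entry
]

def replay(entries):
    """pair each tool_call with its result if one exists; orphans stay orphans."""
    results = {e["call_id"]: e for e in entries if e["type"] == "tool_result"}
    replayed = []
    for e in entries:
        if e["type"] == "tool_call":
            # None simply means "called but never completed" - no crash needed
            replayed.append((e, results.get(e["id"])))
        elif e["type"] == "message":
            replayed.append((e, None))
    return replayed
```

nothing in this reader needs a synthetic result; the orphan just reads as "called, never completed", which is the truth the author wants preserved.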
claude had full architecture docs, the codebase, and over a hundred sessions of project history in context. it still reached for the complex solution because it LOOKS like good engineering. it never asked "does it even matter after a restart?"
i have personally encountered this preference for seemingly more robust over-engineering multiple times. and i genuinely believe this is where human operators actually should step in, instead of giving a one-sentence requirement and watching agents do all sorts of "robust" engineering.
boesboes•1h ago
That is the whole problem imho. I've found that I can use LLMs to do programming only if I fully understand the problem and solution. Because if I don't, it will just pretend that I'm right and happily spend hours trying to implement a broken idea.
The problem is that it's very hard to know whether my understanding of something is sufficient to have claude propose a solution, and for me to know if it is going to work. If my understanding of the problem is incorrect or incomplete, the plan will look fine to me, but it will be wrong.
If I start working on something from poor understanding, I will notice and improve my understanding. An LLM will just deceive and try to do the impossible anyway.
Also, it overcooks everything; at least 50-60% of the code it generates is pointlessly verbose abstractions. again: imho, ymmv, ianal, not financial advice ;)
10keane•1h ago
that is another reason why i separate product/architecture design and implementation into two agents with isolated context in my workflow. i can always iterate with the product agent to refine my understanding and THEN ask the coding agent to implement it. by that time i already have the ability to make a proper judgement and evaluate the coding agent's output.