Why Vibe Coding Fails

6•10keane•1h ago

i am using claude to maintain an agent loop that pauses to ask for the user's approval before important tool calls. while doing some bug fixes, i identified some clear patterns and reasons why vibe coding can fail for people who don't have technical knowledge and architecture expertise.

let me describe my workflow first. this has been my workflow across hundreds of successful sessions:

1. identify bugs through dogfooding
2. ask claude code to investigate the codebase for three potential root causes
3. paste the root causes and proposed fixes into a claude project, where i store all the architecture docs and design decisions, for it to evaluate
4. discuss with claude in the project to write a detailed task spec. the task spec has a specified format with all sorts of tests
5. give it back to claude code to implement the fix

in today's session, the root cause analysis was still great, but the proposed fixes were so bad that i really think this is how most vibe-coded projects lose maintainability in the long run.

here are two of the root causes and proposed fixes:

bug: the agent asks for user approval, but sometimes the approval popup doesn't show up. i tried sending a message to unstick it. the message got silently swallowed. the agent looked dead, and i needed to restart the entire thing.

claude's evaluation: root cause 1: the approval popup is sent once over a live connection. if the user's ui isn't connected at that moment — page refresh, phone backgrounded, flaky connection — they never see it. no retry, no recovery.

this is actually true.
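
A minimal sketch of that failure mode, under loud assumptions: `ApprovalChannel`, `send_once`, and the request fields are hypothetical names made up for illustration, not code from the actual project. It only models "sent once over a live connection, no retry".

```python
class ApprovalChannel:
    """Hypothetical stand-in for the live connection between agent and UI."""

    def __init__(self):
        self.connected = False
        self.delivered = []

    def send_once(self, request):
        # Fire-and-forget: if no UI is attached right now, the request is dropped.
        if not self.connected:
            return False
        self.delivered.append(request)
        return True


channel = ApprovalChannel()
# The UI is mid-refresh, so the one and only send is lost.
lost = channel.send_once({"tool": "delete_file", "call_id": "call_1"})
# The UI reconnects, but nothing re-sends the request.
channel.connected = True
```

After the reconnect, `channel.delivered` is still empty, which matches the symptom: the agent is waiting on an approval the user never saw.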

proposed fix: "let's save approval state to disk so it survives crashes". sounds fine, but the key point is that, by design, if things crash, the agent cold-resumes from the session log, and it won't pick up the approval state anyway. the fix just adds schema complexity and is completely useless.
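
A rough sketch of why the persisted state would go unused, assuming the session log is stored as JSON lines and that cold resume rebuilds everything from that log alone. `cold_resume` and the event fields are hypothetical, chosen only to illustrate the design described above.

```python
import json


def cold_resume(session_log_lines):
    """Rebuild agent state from the session log only (assumed JSONL format)."""
    messages = []
    for line in session_log_lines:
        line = line.strip()
        if line:
            messages.append(json.loads(line))
    # Assumption: nothing outside the log is consulted on resume, so any
    # separately saved approval state would simply never be read here.
    return {"messages": messages, "pending_approval": None}


log = [
    '{"type": "user", "content": "clean up the temp dir"}',
    '{"type": "tool_call", "id": "call_1", "name": "delete_file"}',
]
state = cold_resume(log)
```

Under that assumption, an extra approval-state file adds a schema to maintain without changing what the agent does after a restart.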

root cause 2: when an approval gets interrupted (daemon crash, user restart), there's an orphan tool_call in the session history with no matching tool_result.

proposed fix: "write a synthetic tool_result to keep the session file structurally valid." sounds clean. but i asked: who actually breaks on this? the LLM API? no, it handles missing results. the session replay? no, it reads what's there. the orphan tool_call accurately represents what happened: the tool was called but never completed. that's the truth. writing a fake result to paper over it introduces a new write-coordination concern (when exactly do you write the fake result? what if the daemon crashes during the write?) to solve a problem that doesn't exist. the session file isn't "broken." it's accurate.
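
As an illustration of "it reads what's there", here is a small sketch that derives interrupted calls from the history at read time instead of writing a synthetic result. The event shape (`type`, `id`, `tool_call_id`) and the function name are assumptions for the example, not the project's real schema.

```python
def find_orphan_tool_calls(events):
    """Return ids of tool_call events that have no matching tool_result."""
    open_calls = {}
    for event in events:
        if event["type"] == "tool_call":
            open_calls[event["id"]] = event
        elif event["type"] == "tool_result":
            # A result closes the call it refers to, if we have seen it.
            open_calls.pop(event["tool_call_id"], None)
    return list(open_calls)


history = [
    {"type": "tool_call", "id": "call_1", "name": "read_file"},
    {"type": "tool_result", "tool_call_id": "call_1", "content": "ok"},
    {"type": "tool_call", "id": "call_2", "name": "delete_file"},
    # The daemon stopped here: call_2 never got a result.
]
orphans = find_orphan_tool_calls(history)
```

A reader that needs to know which calls never completed can compute it this way, so the log stays append-only and no extra write has to be coordinated around a crash.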

claude had full architecture docs, the codebase, and over a hundred sessions of project history in context. it still reaches for the complex solution because it LOOKS like good engineering. it never asked "does it even matter after a restart?"

i have personally encountered this preference for seemingly more robust over-engineering multiple times, and i genuinely believe this is where the human operator should actually step in, instead of giving a one-sentence requirement and watching agents do all sorts of "robust" engineering.

Comments

boesboes•1h ago

> because it LOOKS like good engineering

That is the whole problem imho. I've found that I can use LLMs to do programming only if I fully understand the problem and solution. Because if I don't, it will just pretend that I'm right and happily spend hours trying to implement a broken idea.

The problem is that it's very hard to know whether my understanding of something is sufficient to have claude propose a solution and for me to know if it is going to work. If my understanding of the problem is incorrect or incomplete, the plan will look fine to me, but it will be wrong.

If I start working on something from poor understanding, I will notice and improve my understanding. An LLM will just deceive and try to do the impossible anyway.

Also, it overcooks everything; at least 50-60% of the code it generates is pointlessly verbose abstractions. Again: imho, ymmv, ianal, not financial advice ;)

10keane•1h ago

exactly. vibe coding only works when you fully understand the problem and know precisely how to solve it. ai just does the dirty implementation work for you.

that is another reason why i separate product/architecture design and implementation into two agents with isolated context in my workflow. i can always iterate with the product agent to refine my understanding and THEN ask the coding agent to implement it. by that time i already have the ability to make proper judgements and evaluate the coding agent's output.
