Vibe code isn't meant to be reviewed

https://monadical.com/posts/vibe-code-how-to-stay-in-control.html

17•Firfi•3h ago

Comments

Firfi•3h ago

After some vibe coding frustrations, ups and downs, I found that splitting the code explicitly into well-curated, domain-heavy guidance code and code marked “slop” can solve a lot of frustration and inefficiency.

We can be honest in our PR, “yes, this is slop,” while being technical and picky about code that actually matters.

The “guidance” code is not only great for preserving knowledge and aiding the discovery process, but it is very strong at creating a system of “checks and balances” for your AI slops to conform to, which greatly boosts vibe quality.

Helps me both technically (at least I feel so) with guiding claude code to do exactly what I want (or what we agreed to!) and psychologically because there's no detachment from the knowledge of the system anymore.

mkleczek•3h ago

What is the measured LoC ratio of well-curated to "slop" code?

Firfi•2h ago

Just feeling and experience, really. For me, if I spent time with the vibe code snippet and improved it until I can say "yes I would've written this" it's not slop anymore, even if it was written by Claude initially.

On the contrary, if I glanced over the code and could say "ok it doesn't look terrible, no obvious `rm -rf` and all", even if I changed a couple obvious mistakes, I still consider it vibe.

mkleczek•2h ago

I was more asking to assess the actual gain.

So the question really is: in your experience how much code requires careful review and re-prompting vs leaving it as "not terrible".

Asking because my experience is that in practice LLMs are no better than juniors - ie. it is more effective to just write the thing by myself instead of multiple rounds of reviewing and re-prompting which does not really achieve what I really want.

Firfi•1h ago

That's one of my biggest frustrations - I wasted a lot of time on reprompting. I was making myself stick to 100% LLM approach for a while, in order to learn.

I can't say for everyone, but for me it's hit-and-miss: if LLM starts with "Oh, sorry, you're right" that's a STRONG signal I have to take over right now or rethink the approach, or I get into the doom spiral of reprompting and waste half a day on something I could've done myself by that point, with only difference that after half a day with a coding agent I discovered no important domain or technical knowledge.

So, "how much" to me depends so very much on seemingly random factors, including the time of the day when Antropic decides to serve their quantised version instead of a normal one. On non-random too, like how difficult the domain area is, how well you described it in the prompt, and how well you crafted your system queries. And I hate it very much! At this point, I'm trigger-happy to take over the control and write the stuff that LLM can't in the "controlling package" and tell it to use it as an example / safety check.

mkleczek•1h ago

> how well you described it in the prompt, and how well you crafted your system queries.

This part is the most frustrating in discussions about LLMs. Since there are no criteria to measure the quality of your prompting there is really no way to learn the skill. Assessing prompting skills based on the actual results is wrong as it does not isolate the model capabilities.

Hence the whole thing looks a lot like an ancient shamanism.

PaulHoule•2h ago

Lately I've been thinking "there is no such thing as an application, there are only screens" in the context of HTMX-enhanced web applications.

If your persistence layer and long-term data structures are solid you can accept shoddy coding in screens (e.g. a small bundle of http endpoints.) From that viewpoint you modernize an application a screen at a time and if you don't like a shoddy screen you create a new screen. From that viewpoint you vibe code screens but schemas and updating are carefully handwritten code, though I think deterministic code generation from a schema is the power tool for that.

SoftTalker•2h ago

Problem is that what "actually matters" isn't always obvious, at least not to everyone.

When they built Citicorp Center, the contractor bolted the steel insstead of welding it. It was thought to be an implementation detail. Bolting was cheaper, and nobody thought it actually mattered. Until the actual engineer who designed it looked more carefully and discovered that as a result the building was more vulnerable to wind loads. Expensive rework was required to open up the interior walls and weld all the bolted connections.

Firfi•2h ago

It seems to me we have to find out how to figure out "what matters" to have the benefits that 10x vibe coder bros promise. I think we still have to review (despite my clickbait title), it's just different things that we are looking for in slop, and different type/amount of mental strain required. For more important libs, I guess we can "overshoot" a bit and put more time into vetting vibe code (and making it the guardrail code). While in the "next revolutionary React Todo App" the balance could be much farther towards vibe...

dtagames•3h ago

Great stuff here. I've also found that doing all the architecture and interface work manually, then having Cursor write plans that follow that architecture to implement specific features is an ideal place to separate the human from the vibe coding.

Then, it's easy to revise the plan itself or have Cursor do that, then re-run it to make individual implementation changes or fixes that don't affect your architecture or invent new interfaces.

ufo•2h ago

This reminds me of the old idea that you could get some architecture astronauts to specify a system and then offload the implementation of the individual modules to the cheapest developers one could find. Which didn't always work well in practice...

paxys•2h ago

No you don't understand, UML will make software developers obsolete.

iwontberude•2h ago

This is certainly not engineering. Maybe I’m finally a curmudgeon after years of chasing everything in this industry and getting burned enough times but vibe coding without manual review and testing is antithetical to contemporary study on software engineering and development. Are we really gluttons for punishment like this? Like writing a novel on a Nokia 6630, it’s a fun oddity but not really a scalable way to create value.

Firfi•2h ago

I actually do recommend reviewing manually - it's just very convenient to see when a person wrote something (then much more scrutiny can be applied) vs. when the work was outsourced to AI. I feel like there is another application though, but didn't mention it for it's not that clear to me yet: you can yet again estimate whether a new programmer can actually code or if they 10x YOLO their way slowly bringing codebase maintainability down.

OutOfHere•2h ago

Precisely. Unreviewed vibe coding is never going to scale. In the best case, if it works at all, performance issues will persist. In the worst case it could have bugs that will silently destroy data, also create innumerable security vulnerabilities. Future AIs may resolve these limitations, but current AI is not there yet.

To me, coding is a process of learning and discovery, of perpetually preparing me to develop something even better next time. Just as a developer wouldn't be using libraries if they weren't well written, the same logic extends to applications.

I can definitely see poor programmers rely on unreviewed vibe code, and I guess they have nothing to lose, but I don't imagine anyone actually using their output. It's like trying to resell an AI generated image; there is just no market for it after the initial generation.

deadbabe•2h ago

Be honest: do people vibe code because they simply can’t imagine all the complexity and details of what they’re trying to achieve, or is it simply because it’s faster than typing it all out?

If it’s the latter, perhaps it’s a sign that we are making languages too verbose, and there’s a lot of boilerplate patterns that could be cut down if we give ourselves wider vocabulary (syntax) to express concepts.

In the end, if we can come up with a language that is 1 to 1 with the time and effort spent to write equivalent prompts, there will be no need for vibe coding anymore unless you really don’t know what you’re doing, in which case you should develop your skills or simply not be a software engineer. Some may say this language already exists.

starkparker•2h ago

This sets up a pointless strawman about reviews for the headline that distracts so much from the point that I only caught it on second read:

Restrict agentic workflows to implementation details, hand-write the higher-level logic and critical tests, and only pay attention to whether those human-written tests pass or fail. Then you don't have to worry about reviewing agent-generated code as long as the human-written tests about the functionality pass.

(Still not sure I agree, not least of which for security and performance reasons at existing orgs; this assumes very good test coverage and design exist before any code is written. Interesting for greenfield projects though.)

9rx•2h ago

> this assumes very good test coverage and design exist before any code is written.

Does it? In the olden days when hand-coding everything was the only way, you'd write a single test, implement what is necessary for it to pass, and then repeat until you have the full set of functionality covered. Your design would also emerge out of that process.

Which, conveniently, is also how AI seems to work best in this role. i.e. Give it a minimal task and then keep iteratively expanding upon it with more and more bits of information until finally reaching completion. So, in theory, I'm not sure anything has changed.

But the roundtrip time on the agents today is excruciatingly slow, so the question is: Does the typical developer have enough fortitude to stick with it from start to finish without looking for shortcuts to speed up the process? It may not be practical for that reason.

rich_sasha•2h ago

Did they dogfood their own approach and have AI fill in the boring bits between headlines? I get that "vibe" from this article...

rco8786•2h ago

There's just so much that feels "off" here, it's hard to put my finger on any particular part of it.

Fundamentally, we acknowledge that AI writes crappy code and there's no visible path forward to really change that. I realize that it's getting incrementally better against various benchmarks, but would need a large step function change in code quality/accuracy to be considered "good" at software engineering.

I get the appeal of trying to provide it stricter guardrails. And I'm willing to bet that the overall quality of the system built this way is better than one that is just 100% vibe coded.

But that also implies a spectrum of quality between human code and vibe code..where the closer you get to human code the higher the quality, and vice versa. The author says this as well. But is there really an acceptable quality bar that can be reached with a significant % of the codebase being vibe coded? I'm pretty skeptical (speaking as some who uses AI tools all the time).

> “Does it work? Does it pass tests? Doesn’t it sneak around the overseer package requirements? Does it look safe enough? Ship it.”

If this type of code review were sufficient, we would already be doing it for human code. Wouldn't we?

> The business logic is in the interface packages - here’s exactly how it works. The implementation details are auto-generated, but the core logic is solid.

I don't understand how to separate "business logic" from "implementation details". These things form a venn diagram, but the author seems to treat them as exclusive.

jasonthorsness•2h ago

There's a lot of precedence for excluding "generated" code from review and linting but it's a bit weird when that is a large portion of the application and you can't rely on the correctness (non-LLM generated code is typically very mechanically generated so you just need to ensure the inputs are correct, not outputs).

I think testing and reviewing LLM-generated code remains just as important. Hopefully they will get better, and it will be easier (and hopefully LLMs can also assist with reviews).

jollyllama•2h ago

Indeed, it is bizarre to see the this approach emerging, in which non-deterministically generated code is treated the same as deterministically-generated. They're two categorically different things and it's as though everyone forgot all of those LLM primers from two years ago that explained why LLM output is non-deterministic and what that means.

knome•2h ago

yeah, traditionally 'generated code' is really just 'decompressed code'. it's got some source of information that's either brought in or crafted to be easily read and modified by people, and a transformer to expand it in a regular way that means, assuming the transformer isn't flawed, the code will always be correct.

if there was an `if( randomly() ){ emit("error"); }` in there, we'd be right back to reviewing it and probably wouldn't bother with it in the first place. besides which, any work to the transformer itself necessitates review even for generated code, making sure that the output is actually what you expect it to be.

the idea that you shouldn't care what's in a function because your possibly-insufficient-over-the-interface tests passed is kind of insane.

drivingmenuts•1h ago

How do you trust the generated vibe code? With human coding and review, you know what's in the code and what it's doing behind your back. How does that work with vibe code?

It seems like something that should NEVER be trusted - you don't know the source of the original code inhaled by the AI and the AI doesn't actually understand what it's taking in. Seems like a recipe for disaster.

jillesvangurp•1h ago

It's not that different from managing a few junior developers and getting them to do stuff for you. Sometimes doing it yourself is faster but letting them do it is a good investment because it makes them better prepared for the next time you want something from them. That's how they become senior developers.

With AIs/vibe coding/whatever you want to call it, there is no such benefit. It's more an opportunistic thing. You can delegate or do it yourself. If delegating is overall faster and better, it's an easy choice. Otherwise, it's your own time you are wasting.

Using this stuff (like everybody else) over the last two years has definitely planted the thought that I need to start thinking in terms of having LLM friendly code bases. It seems I get a lot better results when things are modular, well documented, and not too ambiguous. Of course that's what makes code bases nice to work with in general so these are not bad goals to have.

Working with large code bases is hard and expensive (more tokens) and create room for ambiguity. So, break it up. Modularize. Apply those SOLID principles. Or get your agentic coding tool of choice to refactor things for you. No need to do that yourself. All you need to do is nudge things in the right direction. And that would be a good idea without AIs anyway. So, all this stuff does is remove excuses for you to not have better code.

If you only vibe code and don't care, you just create a big mess that then needs cleaning up. Or for somebody else to clean up because what's your added value at that point? Your vibes aren't that valuable. Working software is. The difference between a product that makes money and a vibe coded thing that you look at and than discard is that one pays the bills and the other one is just for your entertainment.

Building an Audit Readiness Platform for Startups – Would Love Your Feedback

A protester tried to interrupt Apple exec Craig Federighi at WWDC

Japan's Recent Rice Price Crisis [video]

Show HN: I got tired of saving and forgetting – built this

Japan telecom giant NTT Docomo to end own emoji after 26 years

NOAA warns staff a militia group thinks its radars are 'weather weapons'

AWS open sources pgactive: active-active replication extension for PostgreSQL

Cartridges: Storing long contexts in tiny caches with self-study

Journal likely coming to macOS 26 and iPadOS 26

Follow the Smoke – China-Nexus Threat Actors Hammer at the Doors of Top Targets

Kubernetes is a never-ending wheel of misery. But it doesn't have to be

Norway Chess 2025 in 7 Graphs

Apple launches iPadOS 26 with a new look and better multitasking

Ask HN: Has Apple lost its way?

Emergent Models: a general modeling framework and alternative to Neural Networks

We should fund the software we use, not just the software we see

DeepSeek R1 0528 scored 71% on the aider polyglot coding benchmark (3rd)

Emergent Models in ML: Cellular Automata and Emergent Intelligence

A bit more on Twitter/X's new encrypted messaging

European Union Official DNS

Recession Watch Metrics

June Changes to AWS and GCP Commitment Sharing

Why did 1+1=2 take Russell and Whitehead 300 pages? (2011)

Lightweight Diagramming for Lightweight Formal Methods

Type-based vs. Value-based Reflection

VisionOS 26 Preview

A toolbox for ablating excitatory and inhibitory synapses

Support pi-thon in Python 3.14 venvs

iPadOS 26 Preview

macOS Tahoe 26 Preview

Vibe code isn't meant to be reviewed

Comments

Building an Audit Readiness Platform for Startups – Would Love Your Feedback

A protester tried to interrupt Apple exec Craig Federighi at WWDC

Japan's Recent Rice Price Crisis [video]

Show HN: I got tired of saving and forgetting – built this

Japan telecom giant NTT Docomo to end own emoji after 26 years

NOAA warns staff a militia group thinks its radars are 'weather weapons'

AWS open sources pgactive: active-active replication extension for PostgreSQL

Cartridges: Storing long contexts in tiny caches with self-study

Journal likely coming to macOS 26 and iPadOS 26

Follow the Smoke – China-Nexus Threat Actors Hammer at the Doors of Top Targets

Kubernetes is a never-ending wheel of misery. But it doesn't have to be

Norway Chess 2025 in 7 Graphs

Apple launches iPadOS 26 with a new look and better multitasking

Ask HN: Has Apple lost its way?

Emergent Models: a general modeling framework and alternative to Neural Networks

We should fund the software we use, not just the software we see

DeepSeek R1 0528 scored 71% on the aider polyglot coding benchmark (3rd)

Emergent Models in ML: Cellular Automata and Emergent Intelligence

A bit more on Twitter/X's new encrypted messaging

European Union Official DNS

Recession Watch Metrics

June Changes to AWS and GCP Commitment Sharing

Why did 1+1=2 take Russell and Whitehead 300 pages? (2011)

Lightweight Diagramming for Lightweight Formal Methods

Type-based vs. Value-based Reflection

VisionOS 26 Preview

A toolbox for ablating excitatory and inhibitory synapses

Support pi-thon in Python 3.14 venvs

iPadOS 26 Preview

macOS Tahoe 26 Preview