Then it's easy to revise the plan itself (or have Cursor do that) and re-run it to make individual implementation changes or fixes that don't affect your architecture or invent new interfaces.
To me, coding is a process of learning and discovery, one that perpetually prepares me to build something even better next time. Just as a developer wouldn't use libraries that weren't well written, the same logic extends to applications.
I can definitely see poor programmers relying on unreviewed vibe code, and I guess they have nothing to lose, but I don't imagine anyone actually using their output. It's like trying to resell an AI-generated image; there is just no market for it after the initial generation.
If it’s the latter, perhaps it’s a sign that we are making languages too verbose, and there are a lot of boilerplate patterns that could be cut down if we gave ourselves a wider vocabulary (syntax) to express concepts.
In the end, if we can come up with a language that is 1 to 1 with the time and effort spent to write equivalent prompts, there will be no need for vibe coding anymore unless you really don’t know what you’re doing, in which case you should develop your skills or simply not be a software engineer. Some may say this language already exists.
Restrict agentic workflows to implementation details, hand-write the higher-level logic and critical tests, and only pay attention to whether those human-written tests pass or fail. Then you don't have to worry about reviewing agent-generated code, as long as the human-written functional tests pass.
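Roughly, a sketch of that split (interface, numbers, and names are invented for illustration): the human owns the interface and the critical checks, and the agent only fills in the implementation behind them.

```ts
import { strict as assert } from "node:assert";

// Hand-written: the interface the rest of the system depends on.
export interface DiscountPolicy {
  /** Returns the final price in cents; never negative. */
  apply(priceCents: number, customerTier: "basic" | "gold"): number;
}

// Hand-written: the critical checks any implementation must satisfy.
export function checkDiscountPolicy(policy: DiscountPolicy): void {
  assert.equal(policy.apply(1000, "basic"), 1000); // no discount for basic
  assert.equal(policy.apply(1000, "gold"), 900);   // 10% off for gold
  assert.ok(policy.apply(1, "gold") >= 0);         // never negative
}

// Agent-written: the only part that goes unreviewed, as long as the checks pass.
export const generatedPolicy: DiscountPolicy = {
  apply(priceCents, customerTier) {
    const discounted = customerTier === "gold" ? priceCents * 0.9 : priceCents;
    return Math.max(0, Math.round(discounted));
  },
};

checkDiscountPolicy(generatedPolicy);
```

The point being that the interface and `checkDiscountPolicy` are the only things a human actually reads.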
(Still not sure I agree, not least for security and performance reasons at existing orgs; this assumes very good test coverage and design exist before any code is written. Interesting for greenfield projects, though.)
Does it? In the olden days when hand-coding everything was the only way, you'd write a single test, implement what is necessary for it to pass, and then repeat until you have the full set of functionality covered. Your design would also emerge out of that process.
Which, conveniently, is also how AI seems to work best in this role: give it a minimal task and then keep iteratively expanding on it with more and more bits of information until you finally reach completion. So, in theory, I'm not sure anything has changed.
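Something like this, as a minimal sketch of that loop (the `slugify` example is made up):

```ts
import { strict as assert } from "node:assert";

// Step 1: one small test, and just enough implementation to make it pass.
function slugify(input: string): string {
  return input.trim().toLowerCase().replace(/\s+/g, "-");
}
assert.equal(slugify("Hello World"), "hello-world");

// Step 2: hand the agent the next failing case and let it extend the function,
// e.g. stripping accents and punctuation:
// assert.equal(slugify("  Déjà vu!  "), "deja-vu");
```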
But the roundtrip time on the agents today is excruciatingly slow, so the question is: Does the typical developer have enough fortitude to stick with it from start to finish without looking for shortcuts to speed up the process? It may not be practical for that reason.
Fundamentally, we acknowledge that AI writes crappy code and there's no visible path forward to really change that. I realize that it's getting incrementally better against various benchmarks, but it would need a large step-function change in code quality/accuracy to be considered "good" at software engineering.
I get the appeal of trying to provide it stricter guardrails. And I'm willing to bet that the overall quality of the system built this way is better than one that is just 100% vibe coded.
But that also implies a spectrum of quality between human code and vibe code, where the closer you get to human code the higher the quality, and vice versa. The author says this as well. But is there really an acceptable quality bar that can be reached with a significant % of the codebase being vibe coded? I'm pretty skeptical (speaking as someone who uses AI tools all the time).
> “Does it work? Does it pass tests? Doesn’t it sneak around the overseer package requirements? Does it look safe enough? Ship it.”
If this type of code review were sufficient, we would already be doing it for human code. Wouldn't we?
> The business logic is in the interface packages - here’s exactly how it works. The implementation details are auto-generated, but the core logic is solid.
I don't understand how to separate "business logic" from "implementation details". These things form a Venn diagram, but the author seems to treat them as mutually exclusive.
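A contrived sketch of the overlap: when to round to whole cents reads like an implementation detail, but it decides what the customer is actually charged, which is business logic.

```ts
// A nominal "implementation detail": round each line item, or round the total?
// It changes the amount the customer pays, i.e. it *is* business logic.
const prices = [0.333, 0.333, 0.333]; // line prices in dollars

const roundTotal = Math.round(prices.reduce((a, b) => a + b, 0) * 100);     // 100 cents
const roundEach = prices.reduce((sum, p) => sum + Math.round(p * 100), 0);  // 99 cents
```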
I think testing and reviewing LLM-generated code remains just as important. Hopefully the models will get better and reviewing will get easier (and hopefully LLMs can also assist with reviews).
If there were an `if( randomly() ){ emit("error"); }` in there, we'd be right back to reviewing it, and probably wouldn't bother with it in the first place. Besides which, any work on the transformer itself necessitates review even for generated code, to make sure the output is actually what you expect it to be.
The idea that you shouldn't care what's in a function because your possibly-insufficient interface-level tests passed is kind of insane.
It seems like something that should NEVER be trusted - you don't know the source of the original code inhaled by the AI and the AI doesn't actually understand what it's taking in. Seems like a recipe for disaster.
With AIs/vibe coding/whatever you want to call it, there is no such benefit. It's more an opportunistic thing. You can delegate or do it yourself. If delegating is overall faster and better, it's an easy choice. Otherwise, it's your own time you are wasting.
Using this stuff (like everybody else) over the last two years has definitely planted the thought that I need to start thinking in terms of having LLM-friendly code bases. It seems I get a lot better results when things are modular, well documented, and not too ambiguous. Of course, that's what makes code bases nice to work with in general, so these are not bad goals to have.
Working with large code bases is hard and expensive (more tokens) and creates room for ambiguity. So, break it up. Modularize. Apply those SOLID principles. Or get your agentic coding tool of choice to refactor things for you; no need to do that yourself. All you need to do is nudge things in the right direction. And that would be a good idea without AIs anyway. So all this stuff does is remove excuses for not having better code.
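As a rough illustration of the kind of boundary that helps (file and names are hypothetical), a small, documented interface an agent can work against without pulling the whole code base into context:

```ts
// orders/OrderRepository.ts -- hypothetical module boundary.
// Small, documented, and self-contained: an agent (or a colleague) can work
// against this file without loading the persistence internals into context.
export interface Order {
  id: string;
  totalCents: number;
}

export interface OrderRepository {
  /** Returns the order, or null if it does not exist. */
  findById(id: string): Promise<Order | null>;
  /** Persists the order; throws if the id already exists. */
  create(order: Order): Promise<void>;
}
```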
If you only vibe code and don't care, you just create a big mess that then needs cleaning up. Or that somebody else has to clean up, because what's your added value at that point? Your vibes aren't that valuable. Working software is. The difference between a product that makes money and a vibe-coded thing that you look at and then discard is that one pays the bills and the other is just for your entertainment.
Firfi•3h ago
We can be honest in our PR, “yes, this is slop,” while being technical and picky about code that actually matters.
The “guidance” code is not only great for preserving knowledge and aiding the discovery process, it is also very strong at creating a system of “checks and balances” for your AI slop to conform to, which greatly boosts vibe quality.
It helps me both technically (at least I feel so), by guiding Claude Code to do exactly what I want (or what we agreed to!), and psychologically, because there's no detachment from the knowledge of the system anymore.
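For a concrete (made-up) flavour of that guidance code: a hand-written guard that the generated code is told to route its inputs through, so violations fail loudly instead of slipping by.

```ts
// Hand-written "guidance" code: the invariant I actually care about.
export function assertValidEmail(value: string): string {
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value)) {
    throw new Error(`invalid email: ${value}`);
  }
  return value;
}

// Generated code is told (via the prompt) to go through the guard,
// so it can't quietly invent its own notion of "valid".
export function registerUser(rawEmail: string) {
  const email = assertValidEmail(rawEmail.trim().toLowerCase());
  // ...generated persistence / side effects below...
  return { email };
}
```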
mkleczek•3h ago
Firfi•2h ago
On the contrary, if I glanced over the code and could say "ok, it doesn't look terrible, no obvious `rm -rf` and all", even if I changed a couple of obvious mistakes, I'd still consider it vibe code.
mkleczek•2h ago
So the question really is: in your experience, how much code requires careful review and re-prompting vs. how much can be left as "not terrible"?
Asking because my experience is that in practice LLMs are no better than juniors, i.e. it is more effective to just write the thing myself than to go through multiple rounds of reviewing and re-prompting that never quite achieve what I really want.
Firfi•1h ago
I can't say for everyone, but for me it's hit-and-miss: if the LLM starts with "Oh, sorry, you're right", that's a STRONG signal I have to take over right now or rethink the approach, or I get into the doom spiral of re-prompting and waste half a day on something I could've done myself by that point, with the only difference being that after half a day with a coding agent I've discovered no important domain or technical knowledge.
So "how much" depends, for me, on seemingly random factors, including the time of day when Anthropic decides to serve their quantised version instead of the normal one. And on non-random ones too, like how difficult the domain area is, how well you described it in the prompt, and how well you crafted your system prompts. And I hate it very much! At this point, I'm trigger-happy to take over control, write the stuff the LLM can't in the "controlling package", and tell it to use that as an example / safety check.
mkleczek•1h ago
This part is the most frustrating in discussions about LLMs. Since there are no criteria for measuring the quality of your prompting, there is really no way to learn the skill. Assessing prompting skill based on the actual results is also wrong, as it does not isolate your contribution from the model's capabilities.
Hence the whole thing looks a lot like ancient shamanism.
PaulHoule•2h ago
If your persistence layer and long-term data structures are solid, you can accept shoddy coding in screens (e.g. a small bundle of HTTP endpoints). From that viewpoint you modernize an application a screen at a time, and if you don't like a shoddy screen you create a new one. You vibe code the screens, but schemas and updates are carefully hand-written code, though I think deterministic code generation from a schema is the power tool for that.
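By deterministic generation I'd imagine something in this spirit (a toy sketch, not any particular tool): the schema is the hand-written source of truth, and the query plumbing is stamped out mechanically rather than vibe coded.

```ts
// Hand-written source of truth: the schema.
const userSchema = {
  table: "users",
  columns: { id: "uuid", email: "text", created_at: "timestamptz" },
} as const;

// Deterministic generator: same schema in, same SQL out, nothing to review twice.
function generateSelectById(schema: { table: string; columns: Record<string, string> }): string {
  const cols = Object.keys(schema.columns).join(", ");
  return `SELECT ${cols} FROM ${schema.table} WHERE id = $1`;
}

console.log(generateSelectById(userSchema));
// SELECT id, email, created_at FROM users WHERE id = $1
```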
SoftTalker•2h ago
When they built Citicorp Center, the contractor bolted the steel instead of welding it. It was thought to be an implementation detail: bolting was cheaper, and nobody thought it actually mattered. Until the engineer who designed it looked more carefully and discovered that, as a result, the building was more vulnerable to wind loads. Expensive rework was required to open up the interior walls and weld all the bolted connections.
Firfi•2h ago