Somebody jump on that. It's yours. :)
Could be a good idea for a non-profit like you said. I know someone who’s exploring something similar but for disabled folks who aren’t tech-savvy (for-profit)
"Vibe Coding" is specifically using the LLM instead of programming anything, barely caring about the output. If something is wrong, don't even open the file, just ask the LLM. Basically "prompting while blindfolded" I guess you could say.
Pair programming with an LLM would be to use it as another tool in the toolbox. You still own your program and your code. Edit away; let the LLM do the parts that are too tricky, or too trite to implement, or anything in between. Prompts are usually more specific, like "Seems X is broken, look into Y and figure out if Z could be the reason".
> There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. (...) I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. (...)
Pair programming still very much deals with code and decisions.
But doesn't the vibe-coding attitude of "we'll just sort out the engineering challenges later" guarantee re-work, and thus lower overall efficiency?
I suppose the modern difference is the degree of human validation before committing or releasing.
> [...] a collection of blog posts written by other senior or staff+ engineers exploring the use of LLM in their work
It seems to be written by senior engineers, if anything; I don't see anything in the linked articles indicating they're for senior engineers. Programmers of all seniority levels could find them useful, if they find LLMs useful at all.
I don't use it much to generate code; I ask it higher-level questions more often, like when I need a math formula.
There is also the problem that none of these workflows have been validated or verified. Everyone is free to go on social media or personal blogs and advertise their snake oil. So where these workflows turn out to be lacking, the perceived staleness might actually be ineffectiveness that was never more than self-promotion.
You don't even need to switch models. Write a prompt to generate some code and immediately after prompt the same model to review the code it just generated. Sometimes it takes 3 or 4 prompts to get the result to converge. But converge to where?
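A minimal sketch of that generate-then-review loop, assuming the OpenAI Node SDK and an API key in the environment; the model name, prompts, and round count are placeholders, not anything prescribed by the comment:

```typescript
import OpenAI from "openai";

const client = new OpenAI();

// Send one prompt to the model and return the text of its reply.
async function ask(prompt: string): Promise<string> {
  const res = await client.chat.completions.create({
    model: "gpt-4o", // placeholder model name
    messages: [{ role: "user", content: prompt }],
  });
  return res.choices[0].message.content ?? "";
}

// Generate code, then ask the same model to review and revise its own output a few times.
async function generateAndSelfReview(task: string, rounds = 3): Promise<string> {
  let code = await ask(`Write code for the following task:\n${task}`);
  for (let i = 0; i < rounds; i++) {
    code = await ask(
      `Review the following code for bugs and style problems, then return an improved version:\n\n${code}`
    );
  }
  return code; // "converged" output after a few self-review passes
}
```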
I could assign the LLM the simple drudgery that I don't really want to do, such as writing tests, without feeling bad about it.
I could tell the LLM "that's the stupidest fucking thing I've ever seen" whereas I would not say that to a real person.
Here's an example of using this pattern with Brokk to solve a real world bug: https://www.youtube.com/watch?v=t_7MqowT638
It rewrote some comments, changed the test name, and added extra assertions to the test. Babysitting something like that seems like an absolute waste of time.
Just because they can't fix most failures doesn't mean they can't fix many.
No one's disputing this was bad. People are merely claiming it can also be good. I've dealt with plenty of humans this bad - it's not an argument that humans can't program.
There are some people who fall into the bucket where we can't trust them to finish a task correctly, or within a time frame or level of effort on our part that makes the task-offloading exercise a net positive.
If we view LLMs in the same light, IMO they currently fall into the "not trust" category: we can't really give them a task and trust them to finish it correctly while being confident we don't need to understand their implementation.
If one day LLMs or some other solution reaches that point, then it definitely won’t look like a bubble, but a real revolution.
1. Find simpler tasks for which the trust in LLMs is high.
2. Give the LLMs tasks that have a very low cost to verify (even when the task is not simple) - particularly one-off scripts.
I once had a colleague who was in the "not trust" bucket for the work we were doing. So we found something he was good at that was a pain for me to do, and re-assigned him to do those things and take that burden off of us.
In the last few months I've had the LLM solve (simple) problems via code that had been in my head for years. At any point I could have done them, but they were a chore. If the LLM failed for one of these tasks - it's not a big deal - not much time was lost. But they tend to succeed fairly often, because they are simple tasks.
I almost never let the LLM write production code, because of the extra burden that you and others allude to. But I do let it write code I rely on in my personal life, because frankly I tend to write pretty poor code for my personal use - I can't justify the time it would take to write things well - life is too busy. I welcome the code quality I get from Sonnet or Gemini 2.5 Pro.
That's my point in this thread. Writing code is a pretty diverse discipline, and many are dismissing it simply because it doesn't do one particular use case (high quality production code) well.
I didn't take LLM coding seriously until I found well-respected, well-known SW engineers speaking positively about them. Then I tried it and ... oh wow. People dismissing them are dismissing not only a lot of average developers' reality, but also a lot of experts' daily reality.
Just look at the other submission:
https://sean.heelan.io/2025/05/22/how-i-used-o3-to-find-cve-...
He used an LLM to find a security vulnerability in the kernel. To quote him:
> Before I get into the technical details, the main takeaway from this post is this: with o3 LLMs have made a leap forward in their ability to reason about code, and if you work in vulnerability research you should start paying close attention. If you’re an expert-level vulnerability researcher or exploit developer the machines aren’t about to replace you. In fact, it is quite the opposite: they are now at a stage where they can make you significantly more efficient and effective. If you have a problem that can be represented in fewer than 10k lines of code there is a reasonable chance o3 can either solve it, or help you solve it.
I've worked with real flesh-and-blood developers who did exactly the same thing. At least with LLMs we don't have to jump into an hour-long call to discuss the changes.
The last time I set Cursor on something without watching it very, very closely, it spun for a while fixing tests, and when it finally stopped and I looked at what it had done, it had coded special cases in to pass the specific failing tests in a way that didn't generalize at all to the actual problem. Another recent time I had to pull the plug on it installing a bunch of brand-new dependencies that it decided would somehow fix the failing tests. It had some kind of complete rewrite planned.
Claude Code is even worse when it gets into this mode because it'll do something totally absurd like that and then at the end you have to `git reset` and you're also on the hook for the $5 of tokens that it managed to spend in 5 minutes.
I still find them useful, but it takes a lot of practice to figure out when they'll be useful and when they'll be a total waste of time.
When I first began programming as a teenager, one of the mental hurdles I had to get over was asking the computer to do "too much"; like, I would feel bad writing a nested loop --- that can't possibly be the right answer! What a chore for the computer! It didn't take me too long to figure out that was the whole point of computers. To me, it's the same thing with LLMs spinning on something. Who gives a shit? It's not me wasting that time.
My bet, and I realize this might just be wishful thinking, is that the high order bit for being an effective software developer in the near future will be skill at using more reliable and non-exploitative automation tools, such as programming languages with powerful macro systems and other high-level abstractions, to stay competitive with developers who sling LLM-generated code. So I'd better get started developing that skill myself.
Whether it ends up getting good enough in the near future to become a net positive both isn't the question being discussed and remains to be seen.
It doesn’t replace the hard tasks (yet) and you do need to think about the tasks and the tooling but it’s a game changer.
I wasn’t kidding in a peer comment (except about the mars cheese castle). I started an agent task before leaving on a trip and gave it feedback from my iPad when I stopped. I have a real business problem solved now.
Not all scripts are “ops”!
It was because writing one-off tools took time, so you needed them to do more for it to be worth the time.
Now a lot more are getting written because it takes a lot less effort. :-)
I've also tried asking the LLM to come up with a proposed solution while I work on my own implementation at the same time.
LLMs can also be much faster if a task requires some repetitive work. When I recognize a task like that, I try coding the first version and then ask the LLM to follow my pattern for the other areas where I need to repeat the work.
That’s why I really like copilot agent and codex right now.
Even more parallel stuff and from my phone when I’m just thinking of ideas.
I bear some responsibility for this, since I was one of the people who basically said, in the 2010s, that we should just give up and use popular languages like JavaScript because they're popular. I regret that now.
Flashback to when I committed a suite of tests in Python that were indented one tab too much, resulting in them not running at all. This passed code review (at a FAANG company) and was discovered months later through an unrelated bug. The point is that even unit tests have a very human element to them.
In the 2010s, the move in the JavaScript scene (among others) was towards more concise languages and programming techniques. CoffeeScript is a prime example of this.
But then came the enterprise software people pushing their Javaisms, and now we have verbose bondage-and-ceremony messes like TypeScript and ES6 modules.
And in a tragicomic turn, after we made expressing programmer intent formally so difficult, we are turning to writing bad novels for LLMs in a crapshoot of trial and error, hoping they write the pointless boilerplate correctly.
I also offer you an old saying we have all heard many times:
There are 2 kinds of languages: ones that everyone complains about and ones that nobody uses.
The answer is always "it depends". There are some drudge work tasks that are brilliantly done by LLMs, such as generating unit tests or documentation. Often the first attempt is not great, but iterating over them is so fast that you can regenerate everything from scratch a dozen times before you spend as much time as you would do if you wrote them yourself.
It also depends on what scope you're working on. Small iterations have better results than grand redesigns.
Context is also critical. If your codebase is neatly organized with squeaky-clean code, then LLMs generate better recommendations. If your codebase is a mess of inconsistent styles and spaghetti code, then your prompts tend to generate more of the same.
It often is, if you pick the right tasks (and more tasks fall into that bucket every few weeks).
You can get a simple but fully-working app out of a single prompt, though quality varies widely unless you’re very specific.
Once you have a codebase, agent output quality comes down to architecture and tests.
If you have a scalable architecture with well-separated concerns, a solid integration test harness with examples, and good documentation (features, stack, procedures, design constraints), then getting the exact change you want is a matter of how well you can articulate what you want.
One more asterisk, the development environment has to support the agent: like a human, agents work well with compiler feedback, and better with testing tools and documentation/internet access (yes my agents have these).
I use CheepCode to work on itself, but I am still building up the test library and preview environments to de-risk merging non-trivial PRs that I haven’t pulled down and run locally. I also use it to work on other apps that I'm building, and since those are far more self-contained / easier to test, I get much better results there.
If you want to put less effort into describing what you want, have a chat with an AI to generate tickets. Then paste those tickets into Linear and let CheepCode agents rip through them. I’ve got tooling in the works that will make that much easier, but I can only be in so many places at once as a bootstrapped founder :-)
The other thing is sometimes just writing out method signatures with all the right types and conversions/casts between types/interfaces/classes when I can't be bothered to do all the lookups myself ("Create a method here called foo that accepts a Bar instance and converts it to Baz. Return type should be Quux - add a conversion from Baz to a new Quux instance before the final return - use the builder pattern and map the constant magic-strings in blah.ts to appropriate enum values in Quux." Etc., and then I write the logic in the middle). Again, not a huge time saving, but it mentally lightens the load and keeps you concentrating on the problem rather than the minutiae.
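For concreteness, here's a hedged sketch of the kind of scaffolding such a prompt might produce; Bar, Baz, Quux, the enum values, and the magic strings (imagined as living in blah.ts) are all placeholder names taken from the comment, not real APIs:

```typescript
interface Bar { id: string; kind: string; }
interface Baz { id: string; kind: string; }

enum QuuxKind { Alpha = "ALPHA", Beta = "BETA" }

class Quux {
  constructor(readonly id: string, readonly kind: QuuxKind) {}
}

// Builder-pattern construction of Quux, as the prompt asked for.
class QuuxBuilder {
  private id = "";
  private kind = QuuxKind.Alpha;
  withId(v: string): QuuxBuilder { this.id = v; return this; }
  withKind(v: QuuxKind): QuuxBuilder { this.kind = v; return this; }
  build(): Quux { return new Quux(this.id, this.kind); }
}

// Map the constant magic strings (imagined as coming from blah.ts) to enum values.
const KIND_MAP: Record<string, QuuxKind> = {
  "magic-alpha": QuuxKind.Alpha,
  "magic-beta": QuuxKind.Beta,
};

function foo(bar: Bar): Quux {
  const baz: Baz = { id: bar.id, kind: bar.kind }; // Bar -> Baz conversion
  // ...the hand-written logic the commenter keeps for themselves goes here...
  return new QuuxBuilder()
    .withId(baz.id)
    .withKind(KIND_MAP[baz.kind] ?? QuuxKind.Alpha) // Baz -> Quux via the builder
    .build();
}
```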
https://blog.scottlogic.com/2025/05/08/new-tools-new-flow-th...
The best structure I've found which leverages this idea is called BMAD, and treats the LLM as though it were a whole development team in an orchestrated way that you have full control over.
https://youtu.be/E_QJ8j74U_0 https://github.com/bmadcode/BMAD-METHOD
I’ve also been experimenting with giving an LLM coins and a budget: “You have 10 coins to spend doing x, you earn coins if you m,n,o and lose coins if you j,k,l.” This has reduced slop and increased succinctness. It will come back, recount what it’s done, and explain the economy and spending. I’ve had it ask, “All done boss I have 2 left how can i earn some more coins?” It’s fun to spy on the thinking model working through the choices: “if I do this it’ll cost me this, so maybe I should do this instead in 1 line of code and I’ll earn 3 coins!”
> AI is much better than strong engineers at writing very short programs: in particular, it can produce ten to thirty lines of straightforward mostly-working code faster than any engineer.
> How can you leverage this? There’s not much demand for this kind of program in the day-to-day of a normal software engineer. Usually code either has to be a modification to a large program, or occasionally a short production-data script (such as a data backfill) where accuracy matters a lot more than speed.
While this may be technically correct — there’s little demand for standalone small programs — it overlooks a crucial reality: the demand for small code segments within larger workflows is enormous.
Software development (in my experience) is built around composing small units — helpers, glue code, input validation, test cases, config wrappers, etc. These aren’t standalone programs, but they’re written constantly. And they’re exactly the kind of 10–30 line tasks where LLMs are most effective.
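To make the scale concrete, here's an illustrative example of the kind of unit meant here: a small input-validation helper of the size these comments are talking about (the field names and rules are invented for the example).

```typescript
interface SignupForm {
  email: string;
  age: number;
}

// Returns a list of human-readable validation errors (empty if the form is valid).
function validateSignup(form: SignupForm): string[] {
  const errors: string[] = [];
  if (!/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(form.email)) {
    errors.push("email is not a valid address");
  }
  if (!Number.isInteger(form.age) || form.age < 13 || form.age > 120) {
    errors.push("age must be an integer between 13 and 120");
  }
  return errors;
}
```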
Engineers who break down large tasks into AI-assisted microtasks can move faster. It’s not about replacing developers — it’s about amplifying them.