Does including an agent at each stage of this cycle mean "context engineering"? Is this then just more text and assets to feed in at each stage of LLM usage, providing the context for the next set of tokens to generate for the next stage of the cycle? Is there something deeper that can be done to encode this level of staged development into the agent's weights/"understanding"? Is there an established process for this yet?
- Specification
- Documentation
- Modular Design
- Test-Driven Development
- Coding Standard
- Monitoring & Introspection
I’ve found LLMs just as useful for the "thankless" layers (e.g. tests, docs, deployment).
The real failure mode is letting AI flood the repo with half-baked abstractions without a playbook. It's helpful to have the model review the existing code and plan out the approach before writing any new code.
The leverage may be in using LLMs more systematically across the lifecycle, including the grunt work the author says remains human-only.
It's also great for things that aren't creative, like 'implement a unit test framework using google test and cmake, but don't actually write the tests yet'. That type of thing saves me hours and hours. It's something I rarely do, so it's not like I can just start editing my CMake and test files; I'd be looking up documentation and writing a lot of code that is necessary but takes a lot of time.
With LLMs, I usually get what I want quickly. If it's not what I want, a bit of time reviewing what it did and where it went wrong usually tells me what I need to give it a better prompt.
So give them some context. I like Cline's memory bank approach https://docs.cline.bot/prompting/cline-memory-bank which includes the architecture, progress, road map etc. Some of my more complex projects use 30k tokens just on this, with the memory bank built from existing docs and stuff I told the model along the way. Too much context can make models worse but overall it's a fair tradeoff - it maintains my coding style and architecture decisions pretty well.
I also recommend in each session using Plan mode to get to a design you are happy with before generating any code.
The plan-build-test-reflect loop is equally important when using an LLM to generate code, as anyone who's seriously used the tech knows: if you yolo your way through a build without thought, it will collapse in on itself quickly. But if you DO apply that loop, you get to spend much more time on the part I personally enjoy, architecting the build and testing the resultant experience.
> While the LLMs get to blast through all the fun, easy work at lightning speed, we are then left with all the thankless tasks
This is, to me, the root of one disagreement I see playing out in every industry where AI has achieved any level of mastery. There's a divide between people who enjoy the physical experience of the work and people who enjoy the mental experience of the work. If the thinking bit is your favorite part, AI allows you to spend nearly all of your time there if you wish, from concept through troubleshooting. But if you like the doing, the typing, fiddling with knobs and configs, etc etc, all AI does is take the good part away.
1) Bad actors using AI at scale to do bad things
2) AI just commodifying everything and making humans into zoo animals
I'm on a small personal project with it intentionally off, and I honestly feel I'm moving through it faster and certainly having a better time. I also have a much better feel for the code.
These are all just vibes, in the parlance of our times, but it's making me question why I'm bothering with LLM assisted coding.
Velocity is rarely the thing in my niche, and I'm not convinced babysitting an agent is all in all faster. It's certainly a lot less enjoyable, and that matters, right?
Using "AI" is just like speed reading a math book without ever doing single exercise. The proponents rarely have any serious public code bases.
That's not to say that LLMs are as good as some of the more outrageous claims suggest. You do still need to do a lot of work to implement code. But if you're not finding value at all, it honestly reflects badly on you and your ability to use tools.
The craziest thing is I see the above type of comment on LinkedIn regularly. Which is jaw-dropping. Prospective hiring managers will read it and think, "Wow, you think advertising a lack of knowledge is helpful to your career?" Big tech companies are literally firing people with attitudes like the above. There's no room for people who refuse to adapt.
I put absolute LLM negativity right up there with comments like "I never use a debugger and just use printf statements". To me it just screams that you never learnt the tool.
lmao, "executioner don't understand why human right activists are against the death penalty" type of beat
To me it just feels different. Learning to use a debugger made me feel more powerful and "in control" (even though I still use a lot of print debugging; every tool has its place). Using AI assisted coding makes me feel like a manager who has to micro-manage a noob - it's exhausting.
To many of us, coding is simply more fun. At the same time, many of us could benefit from that exercise with or without the LLM.
The open source code of these companies is also not that great and definitely not bug free. Perhaps these companies should do more thinking and less tooling politics.
You are in a forum full of people that routinely claim that vibe coding is the future, that LLMs already can fully replace engineers, and if you don't think so you are just a naysayer that is doing it wrong.
Rephrasing your claim: LLMs are just moderately useful, far from being the future-defining technology that the people invested in them want them to be. But you choose to rally against people not interested in marketing it further.
Given the credentials you decided to share, I find it unsurprising.
You described the current AI Bubble.
AI can only recycle the past.
Since we don't know what else might already exist in the world without digging very deep, we fool ourselves into thinking that we do something very original and unique.
This AI will be built from its own excretions, recursively.
That might be a hilarious trap.
The article sort of goes sideways with this idea, but pointing out that AI coding robs you of a deep understanding of the code it produces is a valid and important criticism of AI coding.
A software engineer's primary job isn't producing code, but producing a functional software system. Most important to that is the extremely hard-to-convey "mental model" of how the code works and expertise in the domain it works in. Code is a derived asset of this mental model. And for anything larger than a very small project, you will never know code as well as a reader as you would have as its author.
There are other consequences of not building this mental model of a piece of software. Reasoning at the level of syntax is proving to have limits that LLM-based coding agents are having trouble scaling beyond.
This feels very true - but also consider how much code exists for which many of the current maintainers were not involved in the original writing.
There are many anecdotal rules out there about how much time is spent reading code vs writing. If you consider the industry as a whole, it seems to me that the introduction of generative code-writing tools is actually not moving the needle as far as people are claiming.
We _already_ live in a world where most of us spend much of our time reading and trying to comprehend code written by others from the past.
What's the difference between a messy codebase created by a genAI, and a messy codebase where all the original authors of the code have moved on and aren't available to ask questions?
The difference is the hope of getting out of that situation. If you've inherited a messy and incoherent code base, you recognize that as a problem and work on fixing it. You can build an understanding of the code through first reading and then probably rewriting some of it. This over time improves your ability to reason about that code.
If you're constantly putting yourself back into that situation by delegating the reasoning about code to a coding agent, then you won't develop a mental model. You're constantly back at Day 1 of having to "own" someone else's code.
All code is temporary and should be treated as ephemeral. Even if it lives for a long time, at the end of the day what really matters is data. Data is what helps you develop the type of deep understanding and expertise of the domain that is needed to produce high quality software.
In most problem domains, if you understand the data and how it is modeled, the need to be on top of how every single line of code works and the nitty-gritty of how things are wired together largely disappears. This is the thought behind the idiom “Don’t tell me what the code says—show me the data, and I’ll tell you what the code does.”
It is therefore crucial to start every AI-driven development effort with data modeling, and have lots of long conversations with AI to make sure you learn the domain well and have all your questions answered. In most cases, the rest is mostly just busywork, and handing it off to AI is how people achieve the type of productivity gains you read about.
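As a minimal sketch of what "data first" can look like in practice (the domain, names, and fields below are made up purely for illustration), the point is that agreeing on something like this is where the real understanding lives:

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum

class InvoiceStatus(Enum):
    DRAFT = "draft"
    SENT = "sent"
    PAID = "paid"

@dataclass(frozen=True)
class LineItem:
    description: str
    quantity: int
    unit_price_cents: int  # money kept as integer cents to avoid float rounding

@dataclass
class Invoice:
    customer_id: str
    issued_on: date
    status: InvoiceStatus
    items: list[LineItem]

    def total_cents(self) -> int:
        # Derived value: the code follows directly from how the data is modeled.
        return sum(item.quantity * item.unit_price_cents for item in self.items)
```

Once a model like this is pinned down and understood, most of the surrounding CRUD, validation, and wiring code is exactly the kind of busywork that can be handed off.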
Of course, that's not to say you should blindly accept everything the AI generates. Reading the code and asking the AI questions is still important. But the idea that the only way to develop an understanding of the problem is to write the code yourself is no longer true. In fact, it was never true to begin with.
Why? Code has always been the artifact. Thinking about and understanding the domain clearly and solving problems is where the intrinsic value is at (but I'd suspect that in the future this, too, will go away).
This same error in thinking happens in relation to AI agents too. Even if the agent is perfect (not really possible), if other links in the chain are slower, the overall speed of the loop still does not increase. To increase productivity with AI you need to think about the complete loop and reorganize and optimize every link in the chain. In other words, a business has to redesign itself for AI, not just apply AI on top.
The same is true for coding with AI: you can't just do your old style of manual coding but with AI, you need a new style of work. Maybe you start with constraint design, requirements, and tests, and then you let the agent loose and don't check the code; you automate that part instead, which requires comprehensive automated testing. The LLM is like a blind force; you need to channel it to make it useful. LLM + constraints == accountable LLM, but LLM without constraints == unaccountable.
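A rough sketch of the constraints-first idea, assuming a pytest-style setup and a hypothetical `parse_amount` function in a hypothetical `billing` module that the agent is asked to produce:

```python
# tests/test_parse_amount.py
# Written (or at least reviewed) by a human before the agent touches the code.
# The agent's output only counts as done when this executable spec passes in CI.
import pytest

from billing import parse_amount  # hypothetical module the agent must implement

def test_plain_dollars():
    assert parse_amount("$12.50") == 1250  # amounts returned as integer cents

def test_thousands_separator():
    assert parse_amount("$1,200.00") == 120000

def test_rejects_garbage():
    with pytest.raises(ValueError):
        parse_amount("twelve dollars")
```

The tests are the accountability layer: whether you ever read the generated code matters less once the constraints it has to satisfy are pinned down and checked automatically.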
Here's mine: I use Cline occasionally to help me code, but more and more I find myself just coding by hand. The reason is pretty simple: with these AI tools you, for the most part, replace writing code with writing a prompt.
I look at it like this: if writing the prompt plus the inference time is less than what it would take me to write the code by hand, I usually go the AI route. But this is usually for refactoring tasks, where I consider the main bottleneck to be the speed at which my fingers can type.
For virtually all other problems it goes something like this: I can do task X in 10 minutes if I code it manually, or I can prompt the AI to do it, and by the time I finish crafting the prompt and executing, it takes me about 8 minutes. Yes, that's a savings of 2 minutes on that task, and that's all fine and good assuming the AI didn't make a mistake. If I have to go back and re-prompt or manually fix something, then all of a sudden the time it took me to complete that task is now 10-12 minutes with AI. Here the best-case scenario is I just spent some AI credits for zero time savings, and the worst case is I spent AI credits AND the task was slower in the end.
With all sorts of tasks I now find myself making this calculation and for the most part, I find that doing it by hand is just the "safer" option, both in terms of code output but also in terms of time spent on the task.
I'm convinced I spend more time typing and end up typing more letters and words when AI coding than when not.
My hands are hurting me more from the extra typing I have to do now lol.
I'm actually annoyed they haven't integrated their voice to text models inside their coding agents yet.
That being said, these agents may still just YOLO and ignore your instructions on occasion, which can be a time suck, so sometimes I still get my hands dirty too :)
I think this depends. I prefer the thinking bit, but it's quite difficult to think without the act of coding.
It's like how whiteboarding or writing can help you think. Being in the code helps me think, allows me to experiment, uncover new learnings, and evolve my thinking in the process.
Though maybe we're talking about thinking about different things? Are you thinking in the sense of what a PM thinks about? User features, user behavior, user edge cases, user metrics? Or do you mean thinking about what a developer thinks about: code clarity, code performance, code security, code modularization and ability to evolve, code testability, innovative algorithms, innovative data structures, etc.?
But the result of that thinking would hardly ever align neatly with whatever an LLM is doing. The only time it wouldn’t be working against me would be drafting boilerplate and scaffolding project repos, which I could already automate with more prosaic (and infinitely more efficient) solutions.
Even if it gets most of what I had in mind correct, the context switching between “creative thinking” and “corrective thinking” would be ruinous to my workflow.
I think the best case scenario in this industry will be workers getting empowered to use the tools that they feel work best for their approach, but the current mindset that AI is going to replace entire positions, and that individual devs should be 10x-ing their productivity is both short-sighted and counterproductive in my opinion.
What makes you regard this as an anti-AI take? To my mind, this is a very pro-AI take
Are you genuinely saying you never saw a critique of AI on environmental impact, or how it amplifies biases, or how it widens the economic gap, or how it further concentrates power in the hands of a few, or how it facilitates the dispersion of misinformation and surveillance, directly helping despots erode civil liberties? Or, or, or…
You don’t have to agree with any of those. You don’t even have to understand them. But to imply anti-AI arguments “hinge on the idea that technology forces people to be lazy/careless/thoughtless” is at best misinformed.
Go grab whatever your favourite LLM is and type “critiques of AI”. You’ll get your takes.
The energy-cost argument is nonsensical unless you pin down a value-out vs. value-in ratio, and some would argue the output is highly valuable and the input cost is priced in.
I don't know if it will end up being a concentrated power. It seems like local/open LLMs will still be in the same ballpark. Despite the absurd amounts of money spent so far the moats don't seem that deep.
Baking in bias is a huge problem.
The genie is out of the bottle as far as people using it for bad. Your own usage won't change that.
> You don’t have to agree with any of those. You don’t even have to understand them. But to imply anti-AI arguments “hinge on the idea that technology forces people to be lazy/careless/thoughtless” is at best misinformed.
We can certainly discuss some of those points, but that’s not what is in question here. The OP is suggesting there is only one type of anti-AI argument they are familiar with and that they’d “love” to see something different. But I have to question how true that is considering the myriad of different arguments that exist and how easy they are to find.
At least when doing stuff the old way you learn something if you waste time.
That said AI is useful enough and some poker games are +EV.
So this is more of a caution-AI take than an anti-AI take. It is more an anti-vibe-koolaid take.
I don't know... that seems like a false dichotomy to me. I think I could enjoy both but it depends on what kind of work. I did start using AI for one project recently: I do most of the thinking and planning, and for things that are enjoyable to implement I still write the majority of the code.
But for tests, build system integration, ...? Well that's usually very repetitive, low-entropy code that we've all seen a thousand times before. Usually not intellectually interesting, so why not outsource that to the AI.
And even for the planning part of a project there can be a lot of grunt work too. Haven't you had the frustrating experience of attempting a refactoring and finding out midway that it doesn't work because of some edge case? Sometimes the edge case is interesting and points to some deeper issue in the design, but sometimes not. Either way, it sure would be nice to get a hint beforehand. Although in my experience AIs aren't at a stage to reason about such issues upfront --- no surprise, since it's difficult for humans too --- of course it helps if your software has an oracle for whether the attempted changes are correct, i.e. it is statically typed and/or has thorough tests.
My strong belief after almost twenty years of professional software development is that both us and LLMs should be following the order: build, test, reflect, plan, build.
Writing out the implementation is the process of materializing the requirements, and learning the domain. Once the first version is out, you can understand the limits and boundaries of the problem and then you can plan the production system.
This is very much in line with Fred Brooks' "build one to throw away" (written ~40 years ago in "The Mythical Man-Month"; while often quoted, if you've never read his book, I urge you to do so: it's both entertaining and enlightening about our software industry), startup culture (if you remove the "move fast, break things" mantra), and governmental pilot programs (the original "minimum viable").
This argument is wearing a little thin at this point. I see it multiple times a day, rephrased a little bit.
The response, "How well do you think your thinking will go if you had not spent years doing the 'practice' part?", is always followed by either silence or a non-sequitor.
So, sure, keep focusing on the 'thinking' part, but your thinking will get more and more shallow without sufficient 'doing'
Here are a couple of points which are related to each other:
1) LLMs are statistical models of text (code being text). They can only exist because huge for-profit companies ingested a lot of code under proprietary, permissive and copyleft licenses, most of which at the very least require attribution, some reserve rights of the authors, some give extra rights to users.
LLM training mixes and repurposes the work of human authors in a way which gives them plausible deniability against any single author, yet the output is clearly only possible because of the input. If you trained an LLM on only google's source code, you'd be sued by google and it would almost certainly reproduce snippets which can be tracked down to google's code. But by taking way, way more input data, the blender cuts them into such fine pieces that the source is undetectable, yet the output is clearly still based on the labor of other people who have not been paid.
Hell, GPT-3 still produced verbatim snippets of the inverse square root code and probably other well-known but licensed code. And GitHub has a checkbox which scans for verbatim matches so you don't accidentally infringe copyright by using Copilot in a way which is provable. Which means they take extra care to make it unprovable.
If I "write a book" by taking an existing book but replacing every word with a synonym, it's still plagiarism and copyright infringement. It doesn't matter if the mechanical transformation is way more sophisticated, the same rules should apply.
2) There's no opt out. I stopped writing open source over a year ago when it became clear all my code is unpaid labor for people who are much richer than me and are becoming richer at a pace I can't match through productive work because they own assets which give them passive income. And there's no license I can apply which will stop this. I am not alone. As someone said, "Open-Source has turned into a form of unpaid internship"[0]. It might lead to a complete death of open source because nobody will want to see their work fed into a money printing machine (subscription based LLM services) and get nothing in return for their work.
> But if you like the doing, the typing, fiddling with knobs and configs, etc etc, all AI does is take the good part away.
I see quite the opposite. For me, what makes programming fun is deeply understanding a problem and coming up with a correct, clear-to-understand, elegant solution. But most problems a working programmer has are just variations of what other programmers have had. The remaining work is prompting the LLMs in the right way so that they produce this (describing the problem instead of thinking about its solutions) and debugging the bugs the LLMs generated.
A colleague vibe coded a small utility. It's useful, but it's broken in so many ways: the UI falls apart when some text gets too long, labels are slightly incorrect and misleading, some inputs handle decimal numbers in weird ways, etc. With manually written code, a programmer would get these right the first time. Potential bugs become obvious as you're writing the code because you are thinking about it. But they do not occur to someone prompting an LLM. Now I can either fix them manually, which is time-consuming and boring, or I can try prompting an LLM about every single one, which is less time-consuming but more boring and likely to break something else.
Most importantly, using an LLM does not give me deeper understanding of the problem or the solution, it keeps knowledge locked in a black box.
I’ve found this concept trips CC up: assertions are backwards, confusing comments in the test, etc. Just starting a prompt with “Use TDD to…” really helps.
The article being discussed in this thread isn't intended to be a luddite rejection of AI. It's just about a mistake I see people keep making (and have made myself), with some thoughts on how to avoid it with the tools we have today.
I will raise that LLMs are pretty good at some of the non-coding tasks too.
eg. "I'm currently creating an AI for a turn based board game. Without doing any implementation, create a plan for the steps that need to be done including training off real world game data".
The LLM creates a task list of iterative steps to accomplish the above. It usually needs correction specific to the business/game needs, but it's a great start, and I recommend doing this just so the LLM has a doc with context on what it's trying to achieve in the bigger picture as you have it complete tasks.
I think that this approach can already get us pretty far. One thing I'm missing is tooling to make it easier to build automation on top of, eg, Claude Code, but I'm sure it's going to come (and I'm tempted to try vibe coding it; if only I had the time).
LLMs aren't effective when used this way.
You still have to think.
IMO a vibe coder who is speaking their ideas to an agent which implements them is going to have way more time to think than a hand coder who is spending 80% of their time editing text.
obviously good and experienced engineers aren't going to be vibe coders/mollycoddlers by nature. but many good and experienced engineers will be pressured to make poor decisions by impatient business leaders. and that's the root of most AI anxiety: we all know it's going to be used as irresponsibly and recklessly as possible. it's not about the tech. it's about a system with broken incentives.
This looks like an attempt to give the model better "coding" understanding instead of mere tokens and positioning, and hence improve the coding capabilities of these "brilliant but unpredictable junior engineer" coding agents:
- https://ai.meta.com/research/publications/cwm-an-open-weight...
My one concrete pushback to the article is that it states the inevitable end result of vibe coding is a messy unmaintainable codebase. This is empirically not true. At this point I have many vibecoded projects that are quite complex but work perfectly. Most of these are for my private use but two of them serve in a live production context. It goes without saying that not only do these projects work, but they were accomplished 100x faster than I could have done by hand.
Do I also have vibecoded projects that went off the rails? Of course. I had to build those to learn where the edges of the model’s capabilities are, and what its failure modes are, so I can compensate. Vibecoding a good codebase is a skill. I know how to vibecode a good, maintainable codebase. Perhaps this violates your definition of vibecoding; my definition is that I almost never need to actually look at the code. I am just serving as a very hands-on manager. (Though I can look at the code if I need to - I have 20 years of coding experience. But if I find that I need to look at the code, something has already gone badly wrong.)
Relevant anecdote: A couple of years ago I had a friend who was incredibly skilled at getting image models to do things that serious people asserted image models definitely couldn’t do at the time. At that time there were no image models that could get consistent text to appear in the image, but my friend could always get exactly the text you wanted. His prompts were themselves incredible works of art and engineering, directly grabbing hold of the fundamental control knobs of the model that most users are fumbling at.
Here’s the thing: any one of us can now make an image that is better than anything he was making at the time. Better compositionality, better understanding of intent, better text accuracy. We do this out of the box and without any attention paid to prompting voodoo at all. The models simply got that much better.
In a year or two, my carefully cultivated expertise around vibecoding will be irrelevant. You will get results like mine by just telling the model what you want. I assert this with high confidence. This is not disappointing to me, because I will be taking full advantage of the bleeding edge of capabilities throughout that period of time. Much like my friend, I don’t want to be good at managing AIs, I want to realize my vision.
Other than that, I keep hearing the same arguments - "LLMs free up more time for me to think about the 'important things'." Son, your system is not durable, your tests are misleading, and you can't reason about what's happening because you didn't write it. What important things are left to think about??
I cannot express how tired I am of seeing this beyond stupid take.
If you truly believe that, you have either only ever worked with the most piss-poor junior engineers, or you simply have never worked with junior engineers.
LLMs do not learn, LLMs do not ask clarifications, LLMs do not wonder if they're going in the wrong direction, LLMs do not have taste, LLMs do not have opinions, LLMs write absolutely illogical nonsense, and much more.
tptacek•1h ago
First, skilled engineers using LLMs to code also think and discuss and stare off into space before the source code starts getting laid down. In fact: I do a lot, lot more thinking and balancing different designs and getting a macro sense of where I'm going, because that's usually what it takes to get an LLM agent to build something decent. But now that pondering and planning gets recorded and distilled into a design document, something I definitely didn't have the discipline to deliver dependably before LLM agents.
Most of my initial prompts to agents start with "DO NOT WRITE ANY CODE YET."
Second, this idea that LLMs are like junior developers that can't learn anything. First, no they're not. Early-career developers are human beings. LLMs are tools. But the more general argument here is that there's compounding value to working with an early-career developer and there isn't with an LLM. That seems false: the LLM may not be learning anything, but I am. I use these tools much more effectively now than I did 3 months ago. I think we're in the very early stages of figuring how to get good product out of them. That's obvious compounding value.
dpflan•1h ago
I like asking for the plan of action first: what it thinks it should do before actually doing any edits/file touching.
james_marks•1h ago
But there’s a guiding light that both the LLM and I can reference.
badsectoracula•1h ago
Regardless of that, personally I'd really like it if they could actually learn from my interactions with them. From a user's perspective, what I'd like is to be able to "save" the discussion/session/chat/whatever, with everything the LLM learned so far, to a file. Then later be able to restore it and have the LLM "relearn" whatever is in it. Now, you can already do this with various frontend UIs, but the important part of what I'd want is that a) this "relearn" should not affect the current context window (TBH I'd like that entire concept to be gone, but that is another aspect) and b) it should not be some sort of lossy relearning that loses information.
There are some solutions, but they are all band-aids over fundamental issues. For example, you can occasionally summarize whatever was discussed so far and restart the discussion. But obviously that is just some sort of lossy memory compression (I do not care that humans can do the same; LLMs are software running on computers, not humans). Or you could use some sort of RAG, but AFAIK this works via "prompt triggering" - i.e. only via your "current" interaction - so even if the knowledge is in there, if whatever you are doing now wouldn't trigger its index, the LLM will be oblivious to it.
What I want is, e.g., if I tell the LLM that there is some function `foo` used to barfize moo objects, then go on and tell it other stuff way beyond whatever context length it has, save the discussion or whatever, restore it the next day, go on and tell it other stuff, then ask it about joining splarfers, it should be able to tell me that I can join splarfers by converting them to barfized moo objects even if I haven't mentioned anything about moo objects or barfization since my previous session yesterday.
(Also, as a sidenote, this sort of memory save/load should be explicit, since I'd want to be able to start from a clean slate - but this sort of clean slate should be because I want to, not as a workaround to the technology's limitations.)
didibus•6m ago
Models don't have memory, and they don't have understanding or intelligence beyond what they learned in training.
You give them some text (as context), and they predict what should come after (as the answer).
They’re trained to predict over some context size, and what makes them good is that they learn to model relationships across that context in many dimensions. A word in the middle can affect the probability of a word at the end.
If you insanely scale the training and inference to handle massive contexts, which is currently far too expensive, you run into another problem: the model can’t reliably tell which parts of that huge context are relevant. Irrelevant or weakly related tokens dilute the signal and bias it in the wrong direction; the distribution flattens or just ends up in the wrong place.
That's why you have to make sure you give it relevant, carefully curated context, aka context engineering.
It won't be able to look at a 100kloc code base and figure out what's relevant to the problem at hand, and what is irrelevant. You have to do that part yourself.
Or, as some people do, you can try to automate that part a little as well by using another model to go research and build that context. That's where the research->plan->build loop people talk about comes in. And it's best to stick to small tasks; otherwise the context needed for a big task will be too big.
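As a toy illustration of that research step done by hand, here is roughly what "build the relevant context yourself" can look like (the paths and keyword heuristic are invented for the example; a real setup would rank and trim far more carefully):

```python
from pathlib import Path

def gather_context(repo_root: str, keywords: list[str], max_chars: int = 30_000) -> str:
    """Collect only the source files that mention the task's keywords, so the
    prompt carries focused context instead of the whole 100kloc code base."""
    chunks = []
    for path in sorted(Path(repo_root).rglob("*.py")):
        text = path.read_text(errors="ignore")
        if any(kw.lower() in text.lower() for kw in keywords):
            chunks.append(f"# --- {path} ---\n{text}")
    context = "\n\n".join(chunks)
    return context[:max_chars]  # crude cap; irrelevant tokens dilute the signal

# Usage: paste the result above the actual task description in your prompt.
# print(gather_context("./src", ["invoice", "billing"]))
```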
boredemployee•1h ago
haha I always do that. I think it's a good way to have some control and understand what it is doing before the regurgitation. I don't like to write code but I love the problem solving/logic/integrations part.
AlexCoventry•54m ago
What have you figured out so far, apart from explicit up-front design?
closeparen•47m ago
Yes, and the thinking time is a significant part of overall software delivery, which is why accelerating the coding part doesn't dramatically change overall productivity or labor requirements.
lomase•12m ago
If their job is basically to generate code to close jira tickets I can see the appeal of LLMs.
surgical_fire•35m ago
I really like that on IntelliJ I have to approve all changes, so this prompt is unnecessary.
There's a YOLO mode that just changes shit without approval, that I never use. I wonder if anyone does.
dvratil•12m ago
Even Claude Code lets you approve each change, but it's already writing code according to a plan that you reviewed and approved.