That being said, every few months a new model comes out that is a little less encumbered by the typical flaws of LLMs, a little more "intuitively" smart, a little less in need of hand-holding, a little more reliable. I feel that this is simply the natural course of evolution: as more money is put into LLMs they get better, because they're essentially a giant association machine, and those associations give rise to larger abstractions, more robust conceptions of how to wield the tools for understanding the world, and so on. Over time it seems inevitable that, given any task, an LLM will be able to perform it better than any human programmer, and the same will go for the rest of what humans do.
No, LLMs will not get better. The singularity bullshit has been active since the 2010s. LLMs have consumed the entire fucking Internet and are still useless. Where the fuck is the rest of the data going to come from? All these emails from people wanting high-quality data from PhDs turn out to be scammy. People only want to train these things on easily stolen garbage, not quality input, because quality is expensive. Go figure!
This optimistic horseshit hype is embarrassing.
What makes you so sure of this? They've been getting better like clockwork every few months for the past 5 years.
They hallucinate exactly as much as they did five years ago.
> "benchmarks"
Stop drinking the Kool-Aid and making excuses for LLM limitations, and learn to use the tools properly given their limits instead.
They aren't useless. Otherwise, ChatGPT would have died a long time back
> Where the fuck is the rest of the data going to come from?
Good question. Personally, I think companies will start paying more for high quality data or what is at least perceived as high quality data. I think Reddit and some other social media companies like it are poised to reap the rewards of this.
Whether this will be effective in the long run remains to be seen.
Isn’t the entire industry being fuelled by orders of magnitude more VC funding than revenue?
Because people want to use it, right? And it is a matter of time before they start limiting the ChatGPT "free" or "logged out" accounts, I feel. In the consumer AI chat apps it is still the dominant brand, at least in my anecdotal experience, and they will basically make the Plus version the one version of the app you definitely want to use.
Plus they are planning on selling it to enterprises, and at least a couple of them are signing up for sure.
People use them because they are useful, not because they are VC funded.
Programming and machine languages aim for a precise and unambiguous semantics, such that it's meaningful to talk about things like whether the semantics are actually precise or whether the compiler has a bug in failing to implement the spec.
Natural language is not just a higher level of abstraction on our existing stack. If a new model comes out, or you even run an existing model with a new seed, you can get different code out that behaves differently. This is not how compilers work.
search_engine.get_search_results(query, length, order)
It doesn't "care" about the algorithm that produced that list of results, only that it fits the approximation of how the algorithm works as defined by the schema. There are thousands of ways the engine could have been implemented to produce the schema that returns relevance-based results from a web-crawler-sourced database.
In the same way, if I prompt an LLM "design a schema with [list of requirements] that works in [code context and API calls]", there are thousands of ways it could produce that code, but within a margin of error a high quality LLM should be able to produce the code that fits those requirements.
Of course, the difference is that there is a stochastic element to LLM-generated code. However, it is useful to think of LLMs this way because it lets you leverage their probability of being correct, even if they aren't as precise as calling an API while being explicit about how those abstractions are used.
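As a rough sketch of what "leveraging the probability of being correct" can look like in practice (the llm_generate call and the schema here are hypothetical, not any particular API):

    import json
    from dataclasses import dataclass

    @dataclass
    class SearchResult:
        url: str
        title: str
        relevance: float

    def parse_results(raw: str) -> list[SearchResult]:
        """Accept any output that fits the agreed-upon schema, reject everything else."""
        items = json.loads(raw)  # raises if the output isn't even valid JSON
        return [SearchResult(url=i["url"], title=i["title"], relevance=float(i["relevance"]))
                for i in items]

    def generate_results(prompt: str, attempts: int = 3) -> list[SearchResult]:
        for _ in range(attempts):
            raw = llm_generate(prompt)      # hypothetical LLM call, stands in for any provider
            try:
                return parse_results(raw)   # we don't care *how* it produced this, only that it fits
            except (json.JSONDecodeError, KeyError, ValueError):
                continue                    # stochastic output: just sample again
        raise RuntimeError("model never produced output matching the schema")

The point isn't that this is how anyone should build it, only that the contract lives in the schema check, not in the generator.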
That being said, the context length problem could potentially be solved, but it will take a bit of time. I think Llama 4 had a 10M context length (not sure if anyone tried prompting it with that much data to see how effective it really is).
Like I don't memorize the last 20 commits, but I know generally the direction things are going by reading those commits at some point
And even if you juiced up an LLM's context length to astronomical numbers AND made it somehow better at parsing and understanding its context, it will not always repeat those capabilities in other codebases (see, for example, o3 supposedly topping most benchmarks but still fumbling a simple variation of the mother-is-a-surgeon puzzle).
I am not saying it's impossible for a company to figure this out, but it will be incredibly hard.
I once worked on a massive codebase that had survived multiple acquisitions, renames and mergers over a 20 year period. By the time I left it had finally passed into the hands of a Fortune 500 global company.
You would often find code that matched an API call you required that was last updated in the mid-2000s, but there was a good chance that it was not the most recent code for that task, but still existed as it was needed for some bespoke function a single client used.
There could also be similar API calls with no documentation, and you had to pick the one that returned the data fields that you wanted.
Many didn’t code (much) before.
I've also noticed that the effort to de-slop the shit-code is quite significant, and many times eats the productivity gains of having the LLM generate the code.
The legal department may have a different idea there.
> No one would implement a bunch of utility functions that we already have in a different module.
> No one would change a global configuration when there’s a mechanism to do it on a module level.
> No one would write a class when we’re using a functional approach everywhere.
Boy I'd like to work on whatever teams this guy's worked on. People absolutely do all those things.
I'd be extremely careful about applying this thinking anywhere else. There's enough baseless finger-pointing in academia and arts already.
Humm.
Maybe if we say that this is not an issue with vibe coding, it won't be?
Maybe if we pretend that maybe a naive junior would make these mistakes (true) we should be happy to accept them from senior developers (false)?
LLMs are extraordinarily bad at doing these things.
I’ve seen it.
You've seen it.
The OP has seen it.
You’re in a rush so you wrote some classes in a code base in a language which supports classes but has no classes in it?
Really? Did that get past code review before? Did you deliberately put up a code review that you knew would be rejected and take longer to merge as a result because you were in a hurry?
Of course not.
You did the bare minimum that still met the basic quality standards expected of you.
I get it. We all get it. When you're in a rush you cut corners to move faster.
…but that's not what the OP is talking about, and it's not what I see either:
It's people putting up AI slop and not caring at all what the content was.
Just a quick check that it compiled and the tests pass, if you're lucky.
Too lazy to even put a "don't use classes" in their cursor rules file.
Come on. The OP isn't saying don't use AI.
They're saying care, just a little bit, about your craft, ffs.
Critical solutions, but small(er) projects with 2-4 devs, that's where it's at. I feel like it's because then it's actually possible to build a dev team culture and consensus that has the wanted balance of quality and delivery speed.
to be fair on this one, and while I don't flat out disagree, lots of people reinvent utility functions simply because they don't know they exist elsewhere, especially on huge code bases. This seems to get mostly rectified within the PRs, when a senior dev comments on it - the problem then is, you've only increased the number of people who now know by 1.
New people contributing usually reinvent many things and change global configuration because they don't know they can use something already there.
Ironically, indexing the codebase and asking an LLM questions about specific things is the best thing you can do. Because the only 3 people you could ask have either left the project or are busy and will reply within a week.
Documentation does not help beyond a point. Nobody reads the documentation repeatedly, which would be needed.
When you keep working on a project, and you need a new function, you would need to check or remember every single time that such a function already exists or might exist somewhere. You may have found it when you read the docs months ago, but since you had no need for that function at the time your brain just dismissed it and tossed that knowledge out.
For example, I had a well-documented utils/ folder with just a few useful modules, but they kept getting reimplemented by various programmers. I did not fault them, they would have had to remember every single time they needed some utility to first check that folder. All while keeping up that diligence forever, and while working on a number of projects. It is just too hard. Most of the time you would not find what you need, so most of the time that extra check would be a waste. Even the most diligent person would at some point reimplement something that already exists, no matter how well-documented it is. It's about that extra search step itself.
The closer you want to get to 100% perfection, the more the effort increases exponentially. So we have some duplication; not a big deal. Overall architectural quality is more important than squeezing out those last, not really important, few percent of perfection.
Claude code’s Plan mode kind of does this research before coding - but tbf the Search tool seemingly fails half the time with 0 results and it gets confused and then reimplements too…
Assuming they even have code reviews - in your experience, in a situation where the person writing the code didn't check if it already exists, the reviewer will check that and then tell them to delete their already finished implementation and use that existing thing?
That being said, good documentation is worth its weight in gold and supports the overall health and quality of a codebase/project. Open-source projects that succeed often seem to have unusually strong, disciplined documentation practices. Maybe that's just a by-product of engineering discipline, but I don't think it is -- at least not entirely.
> docs that tells you basically that the code is self documented
Anytime someone tells me the code is self-documented I hear "there's no documentation." The most common programmer's footgun:
I don't have time to document
| ^
v |
Spends lots of time trying to understand code
We constantly say we don't have time to document the code. So instead we spend all our time reading code and trying to figure out what it does, to the minimal amount of understanding needed to implement whatever thing we need to implement. This, of course, itself is naive because you can't know what the minimal necessary information is without knowing something about the whole codebase. Which is also why institutional knowledge is so important and why it is also weird that we'd rather have pay raises through switching companies than through internal raises. That's like trying to fix the damage from the footgun with a footgun.
How does the saying go, something like “show me the incentives and I’ll show you the outcome?”
> That's like trying to fix the damage from the footgun with a footgun.
If you value your money/time/etc, wouldn't the best way to fix the damage from footguns be by preventing the damage to you in the first place by not being there if/when it goes off?
I think your point is well put, I’m just trying to follow your reasoning to a conclusion logical to me, though I don't know if mine is the most helpful framing. I didn’t pick the footgun metaphor, but it is a somewhat useful model here for explaining why people may act the way they do.
So the question becomes: is no documentation better or documentation that can be - potentially - entirely out of date, misleading or subtly wrong, because eg they documented the desired behavior vs actual behavior (or vice versa).
I'm generally pro documentation, I'm just fully aware that internal documentation the devs need to write themselves and for themselves... Very rarely gets treated with enough respect to be trustworthy.
So what it comes down to is one person spearheading the effort for docs while the rest of the team constantly "forgets" it, until they decide it's not worth the effort as soon as the driving force either changes teams or gives up.
Knowledge transfer through technical writing doesn’t always manifest itself if it isn’t part of the work process at the time you have that in your mental context. It’s hard to have that context to write the docs if you’re context switching from working on something else or not involved at that level, so it’s hard to just drop in to add docs if there isn’t some framework for writing ad hoc docs for someone to fix up later.
I don’t have experience at traditional employers though so I can’t speak authoritatively here. I’m used to contracts and individual folks and small business. Having human readable documents is important to me because I’m used to folks having things explained on their level, which requires teaching only what they need and want to know to get their work done. Some folks don’t even know what they need when they ask me for help, so that’s its own process of discovery and of documentation. I’m used to having to go to them where they are and where the issue is, so there was no typical day at the office or out of it. Whatever couldn’t fit through the door, I had to go to myself. I’ve had to preserve evidence of potential criminal wrongdoing and document our process. It taught me to keep notes and to write as I work.
I think most places do have some kind of process for doing this, and I suspect the friction in doing the thing is part of the issue, and the fact that it’s difficult thankless work that doesn’t show up on most tracked metrics is part of the issue.
If docs were mandated they would get done. If someone’s job was to make sure they were done well, that would help. I guess folks could step up and try to make that happen and that might be what it takes to make that happen.
Not if you generate reference docs from code and how-to docs from tests.
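For example, Python's standard doctest module gets you part of the way there: the usage examples in the docstring are both the how-to doc and the test. A minimal sketch (the slugify function and module are made up):

    # utils/slugify.py -- made-up module; the docstring doubles as the how-to doc *and* the test
    def slugify(title: str) -> str:
        """Turn a title into a URL-safe slug.

        >>> slugify("Hello, World!")
        'hello-world'
        >>> slugify("  spaces   everywhere ")
        'spaces-everywhere'
        """
        words = "".join(c if c.isalnum() or c.isspace() else " " for c in title).split()
        return "-".join(w.lower() for w in words)

    if __name__ == "__main__":
        import doctest
        doctest.testmod()   # run the examples as tests: `python slugify.py -v`

If the examples rot, the test run tells you, which is exactly the staleness problem people complain about with prose docs.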
> documentation is that it inevitably goes stale.
Of course! Just like the code itself. There are two helpful and minimal mitigating strategies here:
1) dating/versioning
Provides hints that they might be stale if we can identify if they're old. Easy to diff a function between versions when something seems wrong
2) In code documentation (e.g. docstrings)
While not as good as a manual, it's not too difficult to write docstrings *while* coding.
But the unfortunate thing is that you still need to keep documents up to date; docstrings only go so far. Also, I'm pretty certain we've all experienced a choice between two tools where our choice prioritized documentation. Docker is a great example (it's even Red Hat's business model!). There are many container systems, many that can even do more! But take systemd-nspawn (and vm): very poorly documented stuff and not many examples to learn from.
I wanted to make that reminder because UX is important to the business.
> How does the saying go, something like “show me the incentives and I’ll show you the outcome?”
I think you're trivializing this saying here. The incentives actually suggest you should raise wages of current employees more than new ones. Current ones are more valuable. Of course, the issue is time: what timeframe are we measuring the incentives at?
You should pay current employees less iff either 1) time doesn't exist (or you are finding the instantaneous optimal solution) or 2) employees are fungible (institutional knowledge does not exist)
Otherwise, you should be trying harder to keep current employees because you recognize the value of institutional knowledge. You don't have to train current employees. New employees have to get up to speed (which usually takes a few months and can take years).
It's not a hard equation
Costs:
Existing employee:
+cost of raise
New employee:
+salary of existing employee
+ raise
- onboarding inefficiency * (t_n - t_0)
- existing employee * (time they train)
- costs to interview, hire, etc * (time to hire new person)
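To put rough numbers on it, here's a back-of-the-envelope sketch; every figure below is made up, purely for illustration:

    # Rough back-of-the-envelope comparison; all numbers are invented for illustration.
    months = 24                      # the horizon we care about (t_n - t_0)

    # Option 1: give the existing employee a raise
    raise_per_month = 1_500
    raise_cost = raise_per_month * months            # 36,000

    # Option 2: they leave, you hire a replacement at roughly the same salary
    salary_per_month = 10_000
    hiring_cost = 25_000             # recruiting, interviews, time to hire
    mentoring_cost = 5_000           # existing employees spend time training the new hire
    ramp_up_months = 4               # onboarding inefficiency before full productivity
    lost_productivity = 0.5          # fraction of a salary effectively "lost" while ramping up
    replace_cost = hiring_cost + mentoring_cost + ramp_up_months * lost_productivity * salary_per_month

    print(raise_cost, replace_cost)  # 36000 50000.0

The exact numbers don't matter; the point is that once onboarding time exists, the raise is usually the cheaper path.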
So yeah, if time doesn't exist, you're right, it is the incentives. But since it does, I disagree that they are.

> the best way to fix the damage from footguns be by preventing the damage to you in the first place by not being there if/when it goes off?
This implies that the footgun will inevitably fire. It also implies you can get out of the line of fire. But you can't get out of the way of a footgun. A footgun is something where you, the gun operator, shoot yourself in the foot. My argument is that the best strategy is to never fire the footgun.
Avhception gives a good visual analogy without using the word footgun[0]
You should only do this if you have to in order to get better business outcomes. It may be better for the business not to, because current employees will stay even if you don't pay them more, until they don't. So we have to find out at what point they will leave, by not paying some of them more even when we otherwise ought to.
> This implies that the footgun will inevitably fire. It also implies you can get out of the line of fire. But you can't get out of the way of a footgun. A footgun is something where you, the gun operator, shoot yourself in the foot.
These are Chekhov's footguns. As you mention in this comment, they do fire, and they will hit whoever is in front of them. They don't only fail when pointed at the feet of the operator. Your wording implies that they will go off in the original comment too. I can't be blamed for the shortcomings of your original metaphorical argument, which I responded to in good faith.
> > That's like trying to fix the damage from the footgun with a footgun.
This implies that the footgun going off is seemingly unavoidable, which leads folks to weird anti-footgun (damage) mitigations, even second footguns. I responded to this phrasing specifically. That's why I argue that the damage of footguns is probabilistic, in that iff footguns usually go off, then on a long enough timeline, they will hit someone somewhere, and you don't want that to be you, so you should jump ship before it seems like it's unavoidable. I don't see how that is a misreading of the concept or your words, because that is consistent with how a lot of job hoppers I know relate to their work and switching jobs. Even when they do their best, the footguns eventually go off on someone at job sites that allow the footguns to begin with, so it is fair to say that they will go off, but it's uncertain who management will blame or find fault with, so they need not "go off on" the person holding the footgun or even the person who loaded it or pulled its trigger. Those are all different roles/jobs, even though they may be done by the same person at times.
> My argument is that the best strategy is to ,,never fire'' the footgun.
Surely then the second best strategy is to not be there if/when it goes off? We can't count on them not being fired, in my view.
After all, these are Chekhov's footguns, remember?
It's easy to hear "let's slow down a little" as "don't move fast", but that's the wrong interpretation, because "slow down" is relative. There is such a thing as "too fast". You want to hear "slow down" just as much as you want to hear calls to speed up. When you hear both, you should be riding that line of fast but not too fast. It's also good to make sure you have a clear direction. No use in getting nowhere faster.
I'll use another visual analogy. Let's say you and I have a race around the world. I take off running; you move to the drawing board. I laugh as I'm miles ahead, and you go to your workshop, which is in the other direction. The news picks up our little race and laughs at you, since I have such a tremendous lead. I'm halfway done, but you come out of your workshop having built a jet. I only get a few more miles before you win the race. The news then laughs at my stupidity and your cleverness, as if it were so obvious all along.
Sometimes to move fast you need to slow down. It takes lots of planning and strategizing to move at extraordinary speeds.
Okay but this is the ninth time in a row, and now you’re building on top of all the other half-baked bits, and the confusion from every one of these layering on top of one another is forcing you to do more hacks on hacks on hacks. And then when you finally “get it to work” there will be no way to disentangle this whole mess.
Revisiting code is the best time to add comments, because that's when you find out what is tricky and what is obvious.
Code reviews are also good for adding code comments. If the people reviewing are doing their job and are actually trying to understand the code then it is a good time to get feedback where to add comments.
Your first "docs" are your initial sketch. The pieces of paper, whiteboard, or whatever you used to formulate your design. I then usually write code "3" times. The first is the hack time. If in a scripting language like python, test your functions in the interpreter, isolated. Then "write" 2 is bringing into the code, and it is a good idea to add comments here. You'll usually catch some small things here. Write the docstrings now, which is your 2nd docs and your first "official" ones. While writing those I usually realize some ways I can make my code better. If in a rush, I write these down inside the docstring with a "TODO". When not rushing I'll do my 3rd "write" and make those improvements (realistically this is usually doing some and leaving TODOs).
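To make that concrete, here's roughly what a function looks like for me at the end of "write 2" (a hypothetical example, not any real code of mine): the docstring exists, and the improvements I didn't get to are parked as TODOs:

    def merge_configs(base: dict, override: dict) -> dict:
        """Merge two config dicts, with `override` winning on conflicts.

        Brought in from interpreter experiments ("write 1"); docstring added during "write 2".

        TODO (for "write 3"):
          - recurse into nested dicts instead of replacing them wholesale
          - decide whether unknown keys should raise instead of passing through silently
        """
        merged = dict(base)       # copy so the caller's dict isn't mutated
        merged.update(override)   # shallow merge for now; see TODOs above
        return merged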
This isn't full documentation, but at least what I'd call "developer docs". The reason I do things this way is that it helps me stay in the flow state, but allows me to move relatively fast while minimizing tech debt. It is always best to write docs while everything is fresh in your mind. What's obvious today isn't always obvious tomorrow. Hell, it isn't always obvious after lunch! This method also helps remind me to keep my code flexible and containerize functions.
Then code reviews help you see other viewpoints and things you possibly missed. You can build a culture here where during review TODOs and other similar things can be added to internal docs so even if triaged the knowledge isn't completely lost.
Method isn't immutable though. You have to adapt to the situation at the time, but I think this is a good guideline. It probably sounds more cumbersome than it is, but I promise that second and third write are very cheap[0]. It just sounds like a lot because I'm mentioning every step[1]
[0] Even though I use vim, you can run code that's in the working file, like cells. So "write 2" kinda disappears, but you still have to do the cleanup here so that's "write 2"
[1] Flossing your teeth also sounds like a lot of work if you break it down into all subtasks 1) find floss, 2) reach for floss, 3) open floss container, ...
I’m not arguing about your personal experience but these things are not absolutes.
The key thing is can a new developer jump in and understand the thing. Add enough docs until they facilitate this understanding as well as possible. Then stop documenting and point the reader to the code.
My point is that everyone is different. Documentation isn't just for developers and you never know who's going to contribute. It is also beneficial to have multiple formats just because even with a single person different ways make more sense on one day than the next. Having different vantage points is good to have. It is also good to practice your own mental flexibility[0]
I think the pytorch docs are a good example here. Check out the linalg module[1] (maybe skip the matrix properties section).
[0] This will also help you in the workplace to better communicate with others as well as makes you a better problem solver. Helps you better generalize ideas.
> Is speed the greatest virtue?
If speed is the greatest virtue then yeah, all that stuff will happen. But if it isn't, then that stuff will happen at a much lower frequency. Because all the stuff mentioned is just tech debt. Debt doesn't go away; it accrues interest. If speed is all that matters then you need exponential output, as your output needs to offset the debt. If speed is a factor but isn't the only factor, then you need to weigh it against the other things. Take on debt wisely and pay it off when you can. But it does seem like there's a trend to just take on as much debt as possible and hope for the best. Last I checked, most people aren't really good at handling debt.
Not everything that is not perfect is Tech Debt, some of it is just pragmatism. If you end up with two methods doing the same thing, who cares? As long as they are both correct, they cost nothing, might never need any maintenance attention and will never be paid down before the codebase is replaced in 10 years time.
Same with people writing code in a different style to others. If it is unreadable, that isn't tech debt either, it's just a lack of process or lack of someone following the process. Shouldn't be merged = no tech debt.
Adding some code to check edge cases that are already handled elsewhere. Again, who cares? If the code makes it unreadable, delete it if you know it isn't needed; it only took 10 seconds to generate. If it stays in place and is understandable, it's not tech debt. Again, not going to pay it down, it doesn't cost anything, and the worst case is you change one validation and not the other and a test fails; shouldn't take long to find the problem.
Tech debt is specifically borrowing against the right way to do something in order to speed up delivery but knowing that either the code will need updating later to cope with future requirements or that it is definitely not done in a reliable/performant/safe way and almost certainly will need visiting again.
Something many people do without even realizing they are incurring tech debt. These kinds of developers are the ones that will just generate more tech debt with an LLM in their hands (at least for now).
That said, tech debt isn't paid by developers individually, it's paid by organizations in developers time. Only in rare cases can you make a deliberate decision for it, as it grows organically within any project. For example, most python2 code today that used niche libraries with outdated docs that have been taken offline in the meantime has to be considered expensive tech debt nowadays.
> If you end up with two methods doing the same thing, who cares? As long as they are both correct, they cost nothing
To be clear, tech debt isn't "code that doesn't run". It's, like you later say, "borrowing against the right way to do something in order to speed up delivery", which is what I said the author's thesis was. No need for perfection. Perfection doesn't exist in code. The environment is constantly moving, so all code eventually needs to be maintained.
But I also want to be very very clear here. Just because two functions have the same output doesn't mean that they're the same and no one should care. I'll reference Knuth's premature optimization here. You grab a profiler and find the bottleneck in the code and it's written with a function that's O(n^3) but can be written in O(n log n). Who cares? The customer cares. Or maybe your manager who's budgeting that AWS bill does. You're right that they're both logically "correct" but it's not what you want in your code.
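To put the "both correct, still not interchangeable" point in code, a toy example (not from the thread):

    def has_duplicates_slow(items: list) -> bool:
        # O(n^2): compares every pair; "correct", but the profiler (and the AWS bill) will notice
        return any(items[i] == items[j]
                   for i in range(len(items))
                   for j in range(i + 1, len(items)))

    def has_duplicates_fast(items: list) -> bool:
        # O(n log n): sort first, then only compare neighbours; same answers, very different cost
        ordered = sorted(items)
        return any(a == b for a, b in zip(ordered, ordered[1:]))

Both return the same answers on every input, so "as long as they are both correct" doesn't capture the difference that actually matters here.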
Similarly, code that is held together with spaghetti and duct tape is tech debt. It runs. It gives the correct output. But it is brittle, hard to figure out what it does (in context), and will likely rot. "There's nothing more permanent than a temporary fix that works ", as the saying goes. I guess I'll also include the saying "why is there never time to do things right but there's always time to do things twice?"
Code can be broken in many ways. Both of those situations have real costs. Costs in terms of both time and money. It's naïve to think that the only way code can be broken is by not passing tests. It's naïve to think you've tested everything that needs to be tested. Idk about you, but when I code I learn more about the problem, often with the design changing. Most people I know code this way. Which is why it is always good to write flexible code, because the only thing you can rely on with high confidence is that it's going to change
Linters can also help quite a bit. In the end, you either have your rules enforced programmatically or by a human in review.
I think it’s a very different (and so far, for me, uncomfortable) way of working, but I think there can be benefits especially as tooling improves
Coding agents come with a lot of good behavior built in.
Like "planning mode" where they create a strong picture of what's to be made before touching files. This has honestly improved my workflow at programming from wanting to jump into prototyping before I even have a clear idea, to being very spec-oriented: Of course there needs to be a plan, especially when it will be drafted for me in seconds.
But the amount of preventable dumb things coding agents will do that need to be explicitly stated and meticulously repeated in their contexts reveals how simply training on the world's knowledge does not capture senior software engineer workflows entirely, and captures a lot of human averageness that is frowned upon.
All the models I’ve used (yes, including all the biggest, newest, smartest ones) follow the binary rule about 75% of the time at the very most. Usually closer to 50% on average, with odds significantly decreasing the longer the context increases as it occurs at the end of a task but other than that seems to have no predictable pattern.
The fuzzier rule is slightly better, I'm guessing because it applies earlier in the context window, at around 80% compliance, and uses lots of caps and emphasis. This one has a more predictable failure mode, tied to the ratio of reading code vs thinking/troubleshooting/time the model is "in its own head". When it is mostly reading code or my instructions, compliance is very high; when doing extended troubleshooting or anything that starts to veer away from the project itself into training data, it is much lower.
So it's hit and miss and does help, but definitely not something I'd rely on as a hard guardrail, like not executing commands, which Roo has a non-LLM tool config to control. So over time I hope agentic runners add more deterministic config outside the model itself, because instructions still aren't as reliable as they should be and don't seem to be getting substantially better in real use.
The key is that we all have an intuitive sense that this behavior is wrong - building a project means working within the established patterns of that project, or at least being aware of them! Going off half-cocked and building a solution without considering the context is extremely bad form.
In the case of human developers, this can be fixed on the code review level, encouraging a culture of reading not just writing code. Without proper guardrails, they can create code that's dissonant with the existing project.
In the case of LLMs, the only recourse is context engineering. You need to make everything explicit. You need to teach the LLM all the patterns that matter. Their responses will always be probabilistic token salad, by definition. Without proper guardrails, it will create code that's dissonant with the existing project.
Either way, it's a question of subjective values. The patterns that are important need to be articulated, otherwise you get token salad randomly sampling the solution space.
I think soon enough we'll have a decent LLM that's capable of reviewing ALL changes to ensure they follow the "culture" we expect to see.
The question then is: do the bad developers improve by vibe coding, or are they stuck in a local optimum?
If we want to be more precise, I think the main issue is that the AI-generated code lacks a clear architecture. It has no (or very little) respect for overall information flow or the single-responsibility principle.
The AI wants you to have "safe" code, so it will catch things and return non-results instead. In practice, that means the calling code has to inspect the result to see whether it's a placeholder or not, instead of being confident because you'd have gotten an exception otherwise.
Similarly, to avoid problems the AI might tweak some parameter. If for example you were to design a program to process something with AI, you might go gather_parameters -> call -> process_results. Call should not try to do funky things with parameters, because that should be fixed at the gathering step. But locally the AI is always going to suggest having a bunch of "if this parameter is not good, swap it silently so that it can go through anyway".
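In code, the difference looks roughly like this (the function names are made up; run_job stands in for whatever the call step actually does):

    # What the model tends to write inside the call step: quietly "repair" the parameter
    def call_with_silent_fixup(batch_size: int):
        if batch_size <= 0:
            batch_size = 1              # swapped silently so the call goes through anyway
        return run_job(batch_size)      # hypothetical downstream call

    # What I'm arguing for: reject bad input at the gathering step, loudly
    def gather_parameters(raw: dict) -> dict:
        batch_size = int(raw["batch_size"])
        if batch_size <= 0:
            raise ValueError(f"batch_size must be positive, got {batch_size}")
        return {"batch_size": batch_size}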
Then tests are such a problem it would require an even longer explanation...
The developer can do whatever they want, but at the end, what I review is their code. If that code is bad, it is the developer's responsibility. No amount of "the agent did it" matters to me. If the code written by the agent requires heavy refactoring, then the developer has to do it, period.
However, you'll probably get an angry answer that management, or something of the sort, is to blame (because there isn't enough time). Responsibility would have to be taken earlier, by pushing back if some objectives truly are not reasonable.
If the agent has a clean, relevant context explaining what global functions are available it tends to use them properly.
The biggest challenge is how to construct the right context for each request, and keep it clean until the feature is finished. I expect we will see a lot of improvements in this area the coming months (sub-agents being an obvious example).
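As a rough sketch of what I mean (the module and helper names are hypothetical), building that context can be as simple as dumping the signatures of the existing helpers into the prompt:

    import inspect
    import my_project.utils as utils   # hypothetical module whose helpers the agent should reuse

    def build_context(task: str) -> str:
        """Assemble a small, clean context: the task plus signatures of helpers that already exist."""
        helpers = []
        for name, fn in inspect.getmembers(utils, inspect.isfunction):
            doc = inspect.getdoc(fn)
            summary = doc.splitlines()[0] if doc else ""
            helpers.append(f"- utils.{name}{inspect.signature(fn)}: {summary}")
        return (
            "You are working in an existing codebase. Reuse these helpers instead of rewriting them:\n"
            + "\n".join(helpers)
            + f"\n\nTask: {task}\n"
        )

The hard part is deciding which slice of the codebase belongs in there for a given request, which is exactly where I expect the tooling to improve.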
STOP! The agent does not exist. There are no agents; only mathematical functions that have an input and produce an output.
Stop anthropomorphizing LLMs, they are not human, they don’t do anything.
It might seem like it does not matter; my take is that it's essential. Humans are not machines and vice versa.
What I'd like is for people to stop pretending we have any idea what the hidden layer of an LLM is actually doing. We do not know at all. Yes, words like "statistics" and "mathematical functions" can accurately describe the underlying architecture of LLMs, but the actual mechanism of knowledge processing is not understood at all. It is exactly analogous to how we understand quite a lot about how neurons function at the cellular level (but far from everything, seeing as how complicated and opaque nature tends to be), but that we have no idea whatsoever what exactly is happening when a human being is doing a cognitive task.
It is a fallacy to conflate a surface-level understanding of how a transformer functions with the unknown mechanisms that LLMs employ.
> The main unifying theme is the idea of an intelligent agent. We define AI as the study of agents that receive percepts from the environment and perform actions.
This is from Artificial Intelligence: A Modern Approach by Stuart J. Russell and Peter Norvig.
Also, I make it work the same way I do: I first come up with the data model until it "works" in my head, before writing any "code" to deal with it. Again, clear instructions.
Oh, another thing: one of my "golden rules" is that it needs to keep a block comment at the top of the file describing what's going on in that file. It acts as a second "prompt" when I restart a session.
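The header looks something like this (a made-up file, just to show the shape):

    # inventory_sync.py
    #
    # Purpose: pull stock levels from the supplier API once per hour and reconcile
    #   them with the local `inventory` table.
    # Data model: supplier rows are keyed by SKU; local rows by (warehouse_id, sku).
    # Rules (for the LLM and for future me):
    #   - the supplier API is read-only, never write to it
    #   - all DB access goes through db/queries.py, no raw SQL in this file
    #   - keep this header up to date; it is re-read at the start of every session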
It works pretty well, it doesn't appear as "magic" as the "make it so!" approach people think they can get away with, but it works for me.
But yes, I still also spend maybe 30% of the time cleaning up, renaming stuff and doing more general rework of the code before it becomes "presentable", but it still allows me to work pretty quickly, a lot quicker than if I were to do it all by hand.
Very often it comes down to HR issues in the end, so you end up having to take that code anyway, and either sneakily revert it or secretly rework it...
But I agree completely some juniors are a pleasure to see bloom, it's nice when one day you see their eye shine and "wow this is so cool, never realized you made that like THAT for THAT reason" :-)
Having a junior programmer assistant who never gets better sounds like hell.
Or maybe this is it. Who knows.
We found it mostly starts to abandon instructions when the context gets too polluted. Subagents really help address that by not loading the top context with the content of all your files.
Another tip: give it feedback as PR comments and have it read them with the gh CLI. This is faster than hand editing the code yourself a lot of times. While it cleans up its own work you can be doing something else.
But sometimes I wonder if pushing a +400.000 lines PR to an open-source project in a programming language that I don't understand is more beneficial to my career than being honest and quality-driven. In the same way that YoE takes precedence over actual skill in hiring at most companies.
You might get the same on Stack Overflow too, but more likely, I've found, either no response at all, or someone pretty competent actually does come out of the woodwork.
As for the last part, I've recently been getting close to 50 and my eyes aren't what they used to be. In order to fight off eye-strain I now have to tightly ration whatever I do into 20 minute blocks, before having to take appropriate breaks etc.
As a result, time has become one of the biggest factors for me. An LLM can output code 1000x faster than a human, so if I can wrangle it somehow to do whatever basics for me then it's a huge bonus. At the moment I'm busy generating appropriate struct-of-arrays for SIMD from input AoS structs, and I'm using Unity C# with LINQ to output the text (I need it to be editable by anyone, so I didn't want to go down the Roslyn or T4 route).
The queries are relatively simple, take the list of data elements and select the correct entries, then take whatever fields and construct strings with them. Even so, copying/editing them takes a lot longer than me telling GPT to select this, exclude that and make the string look like ABC.
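The generator itself is trivial; here's the same idea sketched in Python rather than LINQ, with made-up field names and types just to show the shape of it:

    # Toy version of the generator: emit a struct-of-arrays declaration from an AoS field list.
    # Field names and types are invented; the real thing is driven by the input C# structs.
    fields = [("position_x", "float"), ("position_y", "float"), ("health", "int")]

    def emit_soa(struct_name: str, fields: list[tuple[str, str]]) -> str:
        lines = [f"public struct {struct_name}SoA", "{"]
        lines += [f"    public {ftype}[] {fname};" for fname, ftype in fields]
        lines.append("}")
        return "\n".join(lines)

    print(emit_soa("Enemy", fields))   # prints a C#-style SoA struct with one array per field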
I think there was a post yesterday about AI's as HUDs, and that makes a lot of sense to me. We don't need an all-powerful model that can write the whole program, what we need is a super-powered assistant that can write and refactor on a very small and local scale.
But then it's not vibe coding anymore :)
I have ended up thinking about it as a "hunting dog". It can do some things better than me. It can get into tiny crevasses and bushes. It doesn't mind getting wet or dirty. It will smell the prey better than me.
But I should make the kill. And I should be leading the hunt, not the other way around.
More work up front and some work after, but still saves time and brain power vs doing it all myself or letting it vibe out some garbage.
But this is only scratching the surface of what's wrong, as the article elaborates.
The thing is people claim these things are making them faster. I don't believe it. What I believe is they are faster at generating shit. I know that because a baby can coax an LLM into producing shit too.
I do not believe you can spend that much time writing the correct prompt - use this exact function, follow this pattern, add a comment here, don't add one there, no, not like that - and still be quicker than just writing it yourself directly in the language.
It's like if I speak French fluently but only communicate through a translator that I instruct in English but constantly have to correct when they miss the nuance in my speech. I'd just speak French!
So, no, I don't believe it.
What I believe is that many, many software developers have been manually writing boilerplate, repetitive and boring code over and over again up until this point. I believe it because I've seen it. LLMs will obviously speed this up. But some of us already learnt how to use the computer to do that for us.
What I also believe is developers exist who don't understand, or care to understand, what they are doing. They will code using a trial and error approach and find solutions based purely on perceived behaviour of the software. I believe it because I've seen it. Of course LLMs will speed up this process. But some of us actually think about what we're writing, just like we don't just randomly string together words in a restaurant and then just keep trying until we get the dish we want.
Yeah, that's not happening.
LLMs enable masses of non-technical people to create and publish software. They enable scammers and grifters who previously would've used a web site builder to also publish native and mobile apps, all in a fraction of the time and effort previously required. They enable experienced software developers to cut corners and automate parts of the job they never enjoyed to begin with. It's remarkable to me that many people who have been working in this industry for years don't enjoy the process of creating software, and find many tasks to be a "chore".
A small part of this group can't identify quality even if they cared about it. The rest simply doesn't care, and never will. Their priorities are to produce something that works on the surface with the least amount of effort, or, in the case of scammers, to produce whatever can bring them the most revenue as quickly and cheaply as possible. LLMs are a perfect fit for both use cases.
Software developers who care about quality are now even a smaller minority than before. They can also find LLMs to be useful, but not the magical productivity booster that everyone else is so excited about. If anything their work has become more difficult, since they now need to review the mountains of code thrown at them. Producing thousands of lines of code is easy. Ensuring it's high quality is much more difficult.
I also noticed that the time I had to spend on reviews from some of my colleagues increased by 9 times (time tracked). So I don't know how much faster they are being at producing that code, but I think it's taking longer overall to get that ticket closed.
Can't agree more.
Claude Code has /init, Cursor comes with /Generate Cursor Rules, and so on. It's not even context engineering: There are out of the box tools you can use not to have this happen. And even if they do happen: you can make them never happen again, with these same tools, for your entire organization - if you had invested the time to know how to use them.
It is interesting how these tools split up the development community.
They care like they code: not.
> My understanding is that a rule should essentially do the same as if it is put in the prompt directly. Is there a solution to that?
Yes, from my understanding Cursor rule files are essentially an invisible prefix to every prompt. I had some issues in the past with Cursor not picking up rule files until I restarted it (some glitch, probably gone by now). So put something simple like a "version" in your rules file and ask what version of the rules we are following for this conversation, just to validate that the process is working.
For Cursor with larger projects I use a set of larger rule files that always apply. Recently I worked with Spotify's Backstage, for example, and I had it index online documentation on architecture, build instructions, design, development of certain components, and project layout. Easily 500+ lines worth of markdown. I tell Cursor where to look, i.e. online documentation of the libraries you use, reference implementations if you have any, good code examples and why they are good, and then it writes its own rule files - I don't write them manually anymore. That has been working really well for me. If you have a common technology stack or way of working you can also try throwing in some examples from https://github.com/PatrickJS/awesome-cursorrules
For a codebase containing both good and bad code, maybe you can point it to a past change where code was refactored from bad to good, so it can write out why you prefer which style and how to manage the migration from bad to good. That said, the tools are not perfect. Even with rules the bad output can still happen, but larger rule files describing what you'd like to do and what to avoid make the chance significantly smaller and the tool more pleasant to work with. I recently switched to Claude Code because Cursor tended to get "stuck" on the same problem, which I don't really experience with Claude Code, but YMMV.
I don't think it's fair to dismiss this article as a superficial anti-AI knee jerk. The solutions you describe are far from perfect.
> It works, it’s clear, it’s tested, and it’s maintainable.
It would be super funny if he ended his blogpost there.
I can imagine stuff like this happening when copy pasting from/to ai online chat interfaces, but not in a properly initialized project.
The agent will read all the crappy, partly outdated documentation all over the project and also take the reality of the project into consideration.
It's probably a good idea to also let it rewrite the programmer facing docs. Who else is going to maintain that realistically?
The common counter-argument here is that you miss out on training juniors, which is true, but it's not always an option (we are really struggling to hire at my startup, for instance, so I'm experimenting with AI to work on tasks I would otherwise give to a junior as a stop-gap).
Another aspect to consider is that what we used to consider important for software quality may change a lot in light of AI tooling. These things aren't absolutes. I think this is already happening, but it's early days, so I'm not sure what will play out here.
And by definition of this, you should care about the spec.
How the code looks doesn't matter that much, as long as it adheres to the spec.
Welcome to enterprise. It's not shit because people don't care; people don't care because they are incentivised not to, and those that do care burn out.
If I come across a fugly code base, I don't bother reading it, I just ask Claude what it's doing and I ask Claude to fix it. To me this is a huge advantage because my OCD prevented me from producing fugly code by hand but now I wield Claude like an automatic complexity gun.
Producing that kind of complexity when you know there exists a simpler way is demoralizing, but it's not demoralizing when an LLM does it because it's so low effort.
I just hated thinking about all this mind-numbing nonsense.
Since AI he is typing more prose.
I think the author is vastly underestimating what the majority of people actually want. It took me a lot to get this, but for many people, quick/cheap will always trump quality.
But just because it's a powerful way to work, doesn't mean you get to be irresponsible with it! (Quite the opposite: Think table-saw)
[1] English SHell (Claude Code in my case), who says I need to be Bourne Again?
They're not orthogonal. Closures and classes are dual forms of the same thing. There are cases where one is better than the other for a given problem.
Those come from giving a shit about the work itself, not just shipping it.
do people really think functional coding shouldn't involve writing classes?
i can't imagine writing what i think of as code in a "functional programming style" without tons of dataclasses to describe different immutable records that get passed around. and if you're feeling fancy, add some custom constructors to those dataclasses for easy type conversions.