Before AI, we were trying to save money, but through a different technique: Prompting (overseas) humans.
After over a decade of trying that, we learned it had... flaws. So round 2: Prompting (smart) robots.
The job losses? This is just Offshoring 2.0; complete with everyone getting to re-learn the lessons of Offshoring 1.0.
I think this is a US-centric point of view, and seems (though I hope it's not!) slightly condescending to those of us not in the US.
Software engineering is more than what happens to US-based businesses and their leadership commanding hundreds or thousands of overseas humans. Offshoring in software is certainly a US concern (and to a lesser extent, other nations suffer it), but is NOT a universal problem of software engineering. Software engineering happens in multiple countries, and while the big money is in the US, that's not all there is to it.
Software engineering exists "natively" in countries other than the US, so any problems with it should probably (also) be framed without exclusive reference to the US.
The problems are inherent in outsourcing to a third party with little oversight. Oversight is, in both cases, far harder than it appears.
The conclusion I reached was different, though. We learnt how to do outsourcing "properly" pretty quickly after some initial high-profile failures, which is why it has only continued to grow into such a huge industry. This also involved local talent refocusing on higher-value tasks, which is why job losses were limited. Those same lessons and outcomes of outsourcing are very relevant to "bot-sourcing".
However, I do feel concerned that AI is gaining skill-levels much faster than the rate at which people can upskill themselves.
Nice.
[1] https://huggingface.co/docs/smolagents/conceptual_guides/int...
Salience (https://en.wikipedia.org/wiki/Salience_(neuroscience)), "the property by which some thing stands out", is something LLMs have trouble with. Probably because they're trained on human text, which ranges from accurate descriptions of reality to nonsense.
So the only real difference between "perception" and a "hallucination" is whether it is supported by physical reality.
I can recognize my own metacognition there. My model of reality course-corrects my interpretation of the incoming information on the fly. Optical illusions feel very similar, whereby the inner reality model clashes with the observed.
For general AI, it needs a world model that can be tested against, where surprise is noted and the model updated. Looping LLM output through test cases is a crude approximation of that world model.
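That crude approximation can be sketched in a few lines. In this toy sketch the "LLM" is a stub that sometimes proposes a buggy function, and the test cases play the role of the world model; all names here are illustrative, not any real framework's API:

```python
import random

def generate_candidate(prompt, rng):
    """Stand-in for an LLM call: proposes a (possibly wrong) implementation."""
    # Hypothetical failure mode: the model sometimes drops the last element.
    if rng.random() < 0.5:
        return lambda xs: sum(xs)      # correct candidate
    return lambda xs: sum(xs[:-1])     # buggy candidate

def verify(candidate, test_cases):
    """The 'world model': feedback is only whether predictions match reality."""
    return all(candidate(xs) == expected for xs, expected in test_cases)

def generate_until_verified(prompt, test_cases, max_attempts=20, seed=0):
    rng = random.Random(seed)
    for attempt in range(1, max_attempts + 1):
        candidate = generate_candidate(prompt, rng)
        if verify(candidate, test_cases):
            return candidate, attempt  # the surprise is resolved; accept
    return None, max_attempts          # no candidate survived the tests

tests = [([1, 2, 3], 6), ([], 0), ([5], 5)]
fn, attempts = generate_until_verified("sum a list", tests)
```

The loop only "learns" in the weakest sense (reject and retry), which is exactly why it's a crude approximation rather than a real world model.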
This is different from human hallucinations where it makes something up because of something wrong with the mind rather than some underlying issue with the brain's architecture.
> In the field of artificial intelligence (AI), a hallucination or artificial hallucination (also called confabulation,[1] or delusion)[2] is a response generated by AI that contains false or misleading information presented as fact.[3][4]
You say
> This is different from human hallucinations where it makes something up because of something wrong with the mind rather than some underlying issue with the brain's architecture.
For consistency you might as well say everything the human mind does is hallucination. It's the same sort of claim. This claim at least has the virtue of being taken seriously by people like Descartes.
https://en.wikipedia.org/wiki/Hallucination_(artificial_inte...
It's possible LLMs are lying but my guess is that they really just can't tell the difference.
But much more than an arithmetic engine, the current crop of AI needs an epistemic engine, something that would help follow logic and avoid contradictions, to determine what is a well-established fact, and what is a shaky conjecture. Then we might start trusting the AI.
To me this is the most bizarre part. Have we ever had a technology deployed at this scale without a true understanding of its inner workings?
My fear is that the general public perception of AI will be damaged since for most LLMs = AI.
The idea that we don't is tabloid journalism. It's simply that the output is (usually) randomised, which those who lack the technical chops take to mean that programmers "don't know how it works" because the output is non-deterministic.
This is notwithstanding that we absolutely can make the output repeatable by turning the randomisation off (temperature 0).
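The temperature-0 point is easy to demonstrate in miniature. This is a toy sketch of the sampling step only (not any vendor's actual serving stack, which can still leak nondeterminism through batching and floating-point ordering):

```python
import math, random

def sample_token(logits, temperature, rng):
    """Pick the next token index from raw logits.

    temperature == 0 degenerates to argmax (greedy decoding), which is
    repeatable; any temperature > 0 draws from a softmax and is randomised.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # numerically stable softmax
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.0, 0.5]
greedy = [sample_token(logits, 0, random.Random(seed)) for seed in range(5)]
# greedy == [0, 0, 0, 0, 0]: five different seeds, identical output every time
```

With temperature 0 the random generator is never consulted, so the seed is irrelevant and the output is a pure function of the logits.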
It implies that some parts of the output aren’t hallucinations, when the reality is that none of it has any thought behind it.
So, we VALUE creativity, we claim that it helps us solve problems, improves our understanding of the universe, etc.
BUT for people with some mental illnesses, the brain is so creative that they lose track of where reality ends and where their imagination/creativity takes over.
eg. Hearing voices? That's the brain conjuring up a voice - auditory and visual hallucinations are the easy example.
But it goes further: depression is where people's brains create scenarios with no hope and no escape. Anxiety too: the brain conjures up fears of what's to come.
"All (large language) model outputs are hallucinations, but some are useful."
Some astonishingly large proportion of them, actually. Hence the AI boom.
An insight I picked up along the way…
1. Routinely, when some expert claims that some task or domain of work is beyond what LLMs will be able to do, LLMs start being able to reliably perform that task within 6 months to a year, if they can't already
2. Whenever AI gets better, people move the goalposts regarding what counts as "intelligence"
3. Still, LLMs reveal that there is an element of intelligence that is not orthogonal to the ability to do well on tests or benchmarks
By now I’m sure it won’t. Even if you provide the expected code verbatim, LLMs might go on a side quest to “improve” something.
This works on people as well!
Cops do this when interrogating. You tell the same story three times, sometimes backwards. It's hard to keep track of everything if you're lying or don't recall clearly, so they can get a sense of your confidence. It also works in interviews: ask candidates to explain a subject in three different ways to see if they truly understand it.
Only within certain conditions or thresholds that we're still figuring out. There are many cases where the more someone recalls and communicates their memory, the more details get corrupted.
> Cops do this when interrogating.
Sometimes that's not to "get sense of the variation" but to deliberately encourage a contradiction to pounce upon it. Ask me my own birthday enough times in enough ways and formats, and eventually I'll say something incorrect.
Care must also be taken to ensure that the questioner doesn't change the details, such as by encouraging (or sometimes forcing) the witness/suspect to imagine things which didn't happen.
With LLMs you have no such guarantee or expectation.
I just want to point out that this answer implicitly means that, at the very least, the profession's future is questionable and uncertain, which isn't a good sign for people with a long time horizon, such as students.
Students should never have become so laser-focused on a single career path anyway, and even worse is how colleges have become glorified trade schools in our minds. Students should focus on studying for their classes, getting their electives and extracurriculars in, getting into clubs... Then, depending on which circles they end up in, they shape their career that way. The idea that a Computer Science major would guarantee students a spot in the tech industry was always ridiculous, because the industry was never structured that way; it's always been hacker groups, study groups, open source, etc. bringing out the best minds.
The claim in the blog post that all technology leads to speculative asset bubbles I find hard to believe. Where was the electricity bubble? The steel bubble? The pre-war aviation bubble? (The aviation bubble appeared decades later due to changes in government regulation.)
Is this an AI bubble? I genuinely don't know! There is a lot of real uncertainty about future cash flows. Uncertainty is not the foundation of a bubble.
I knew dot-com was a bubble because you could find evidence, even before it popped. (A famous case: a company held equity in a bubble asset, and that company had a market cap below the equity it held, because the bubble did not extend to second-order investments.)
https://en.wikipedia.org/wiki/Public_Utility_Holding_Company...
aviation: likewise, there was the "Lindbergh Boom" https://en.wikipedia.org/wiki/Lindbergh_Boom which led to over-speculation and the crash of many early aviation companies
To me a bubble reflects a market disconnect from fundamentals - wherein prices go up steeply, with no help from the fundamentals (expected growth in base year cash flows and risk).
Indeed there is a subtle difference between a bubble and speculation of what could come of a technology. But the two are connected because the effects of technology are reflected by investors in asset prices.
> Maybe LLMs mark the point where we join our engineering peers in a world of non-determinism.
Those other forms of engineering have no choice due to the nature of what they are engineering.
Software engineers already have a way to introduce determinism into the systems they build! We’re going backwards!
The engineers at TSMC, Intel, Global Foundries, Samsung, and others have done us an amazing service, and we are throwing all that hard work away.
especially if you have some saved in the same project and then you just tell it "look at the scripts in x and use that style"
There’s a beautiful invitation to learn and contribute baked into a world where each command is fully deterministic and spec-ed out.
Yes, there have always been poorly documented black boxes, but I thought the goal was to minimize those.
People don’t understand how much is going to be lost if that goal is abandoned.
The more practical question is though, does that matter? Maybe not.
I think it matters quite a lot.
Specifically for knowledge preservation and education.
Ah but you see, imagine the shareholder value we can generate for a quarter or two in the meanwhile!
Editors note: please read that as absolutely dripping with disdain and sarcasm.
If I wanted to plumb together badly documented black boxes, I'd have become an EE.
For example, web requests are non-deterministic. They depend, among other things, on the state of the network. They also depend on the load of the machine serving the request.
One way to think about this is: how easy is it for you to produce byte-for-byte deterministic builds of the software you're working on? If it's not trivial there's more non-determinism than is obvious.
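A concrete illustration of the hidden nondeterminism: archive formats embed timestamps, so even "the same build" can differ byte-for-byte unless you pin them. A minimal sketch using Python's `zipfile` (file names invented):

```python
import io, zipfile

def build_archive(payload: bytes, timestamp=(1980, 1, 1, 0, 0, 0)) -> bytes:
    """Package payload into a zip with a pinned entry timestamp.

    Zip entries embed a modification time, so letting the library default
    to 'now' makes two otherwise identical builds produce different bytes.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(zipfile.ZipInfo("app.bin", date_time=timestamp), payload)
    return buf.getvalue()

a = build_archive(b"hello")
b = build_archive(b"hello")
assert a == b  # identical only because the timestamp is pinned
```

Real reproducible-build efforts fight a long list of these: timestamps, file ordering, locale, absolute paths, parallelism. The timestamp is just the easiest one to show.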
How can that be extrapolated to LLMs? How does a system independently know whether it has arrived at a correct answer within a timeout? Has the halting problem been solved?
That's the catch-22 with LLMs. You're supposed to be both the asker and the verifier, which in practice doesn't work that well. LLMs will just find snippets of code that somehow match and act on them (it's the "I'm feeling Lucky" button with extra steps).
In traditional programming, coding is a notation more than anything. You're supposed to have a solution before coding, but because of how the human brain works, it's more like a blackboard, i.e. a helper for thinking. You write what you think is correct, verify your assumptions, then store and forget about all of it once it's verified. Once in a while, you revisit the design and make it more elegant (at least you hope you're allowed to).
LLM programming, when it first started, was pitched as direct English-to-finished-code translation. Now hopes have scaled down and it's more about turning precise specs into diff proposals. Frankly, that does not improve productivity: either you could have had a generator that's faster, more precise, and cheaper, or you need to read as much documentation to verify everything as you would have needed to code it in the first place (80% of the time spent coding).
So no determinism with LLMs. The input does not have any formal aspects, and the output is randomly determined. And the domain is very large. It is like trying to find a specific grain of sand on a beach while not fully sure it's there. I suspect most people are doing the equivalent of taking a handful of sand and saying that's what they wanted all along.
My intuition for the problem here is that people are fixated on the nondeterminism of the LLM itself, which is of limited importance to the actual problem domain of code generation. The LLM might spit out ancient Egyptian hieroglyphics! It's true! The LLM is completely nondeterministic. But nothing like that is ever going to get merged into `main`.
It's fine if you want to go on about how bad "vibe coding" is, with LLM-callers that don't bother to read LLM output, because they're not competent. But here we're assuming an otherwise competent developer. You can say the vibe coder is the more important phenomenon, but the viber doesn't implicate the halting problem.
So that's why "it compiles" is worthless in a business setting. Of course it should compile; that's the bare minimum of expectations. And even "it passes the tests" is not that great. That just means you haven't messed things up. So review and quality (and accountability for both) are paramount, so that the proper stuff gets shipped (and fixed swiftly if there was a mistake).
Again: my claim is simply that whatever else is going on, the halting problem doesn't enter into it, because the user in this scenario isn't obligated to prove things about arbitrary programs. Here, I can solve the halting problem right now: "only accept branchless programs with finite numbers of instructions". Where's my Fields Medal? :)
It always feels like the "LLMs are nondeterministic" people are relying on the claim that it's impossible to tell whether an arbitrary program is branchless and finite. Obviously, no, that's not true.
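That joke restriction can literally be coded. A toy sketch (the opcode names are made up): a checker that accepts only straight-line programs, for which halting is trivially decidable because execution runs top to bottom exactly once.

```python
def halts(program):
    """Accept only programs in a trivially decidable fragment: a finite
    list of straight-line instructions with no branches. Every accepted
    program runs top to bottom once, so it halts by construction.
    Returning False means 'rejected', not 'does not halt'.
    """
    BRANCHING = {"jmp", "jnz", "call", "loop"}  # hypothetical opcodes
    return all(op not in BRANCHING for op, *_ in program)

assert halts([("mov", "a", 1), ("add", "a", 2)])      # straight-line: halts
assert not halts([("mov", "a", 1), ("jnz", "a", 0)])  # has a branch: rejected
```

Undecidability only bites for the full class of arbitrary programs; restricted fragments like this are checked syntactically in linear time.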
Pretty sure you've just edited to add that part.
I did add the last paragraph of the comment you just responded to (the one immediately above this) about 5 seconds after I submitted it, though. Doesn't change the thread.
In a business setting, what usually matters is getting something into prod without bug reports thrown back, and maintainable code that isn't riddled with debt.
Compiled code is as basic as steering left and right for an F1 driver. Having the tests pass is like driving the car at slow speed and completing a lap with no other cars around. If you're struggling to do that, then you're still in the beginner phase and not a professional. The real deal is getting a change request from Product and getting it to Production.
This suggests a huge gap in your understanding of LLMs if we are to take this literally.
> LLM programming, when first started, was more about a direct english to finished code translation
There is no direct English-to-finished-code translation. A prompt like "write me a todo app" maps to infinitely many programs with different trade-offs that appeal to different people. Even if LLMs never made any coding mistakes, there is no function that maps a statement like that to one specific piece of code unless you make completely arbitrary choices, axiom-of-choice style.
So we're left with the fact that we have to specify what we want. And at that LLMs do exceptionally well.
It is definitely non-trivial and large organizations spend money to try to make it happen.
It’s non-trivial because you have to go back through decades of tools and fix all of them to remove non-determinism, because they weren’t designed with that in mind.
The hardest part was ensuring build environment uniformity but that’s a lot easier with docker and similar tooling.
We're introducing chaos monkeys, not just variability.
> Process engineers for example have to account for human error rates. [...] Designing systems to detect these errors (which are highly probabilistic!)
> Likewise even for regular mechanical engineers, there are probabilistic variances in manufacturing tolerances.
I read those as relatively bounded, thus probabilistic. When a human pushes the wrong button, an elephant doesn't rain from the sky. In the same way, tolerances are bounded.
When requesting a JSON array from an LLM, it could as well decide this time that JSON is a mythological Greek fighter.
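In that spirit, the defensive move is to never trust the shape of model output. A small sketch of a validator for the JSON-array case (the function name is mine):

```python
import json

def parse_llm_json_array(raw: str):
    """Defensively parse what should be a JSON array from model output.

    Returns the list on success, None otherwise; the caller decides
    whether to retry, repair, or fail loudly. Never assume the shape.
    """
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, list) else None

assert parse_llm_json_array('[1, 2, 3]') == [1, 2, 3]
assert parse_llm_json_array('{"a": 1}') is None           # valid JSON, wrong shape
assert parse_llm_json_array('Jason, son of Aeson') is None  # mythological fighter
```

The point is that the check must live outside the model: it's a bounded, deterministic gate wrapped around an unbounded, probabilistic generator.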
Was it going backwards when the probabilistic nature of quantum mechanics emerged?
More seriously, this is not a fair comparison. Adding LLM output to your source code is not analogous to quantum physics; it is analogous to letting your 5-year-old transcribe the experimentally measured values without checking, and accepting that many of them will be transcribed wrong.
And on the other hand, no transistors without quantum mechanics.
A junior engineer can't write code anywhere near as fast. It's apples vs oranges. I can have the LLM rewrite the code 10 times until it's correct, and it's much cheaper than hiring an obsequious junior engineer.
For any macOS users, I highly recommend an Alfred workflow so you just press command + space then type 'llm <prompt>' and it opens tabs with the prompt in perplexity, (locally running) deepseek, chatgpt, claude and grok, or whatever other LLMs you want to add.
This approach satisfies Fowler's recommendation of cross referencing LLM responses, but is also very efficient and over time gives you a sense of which LLMs perform better for certain tasks.
Are you just opening a browser tab?
http://localhost:3005/?q={query}
https://www.perplexity.ai/?q={query}
https://x.com/i/grok?text={query}
https://chatgpt.com/?q={query}&model=gpt-5
https://claude.ai/new?q={query}
Modify to your taste. Example: https://github.com/stevecondylios/alfred-workflows/tree/main (you should be able to download the .alfredworkflow file and double click on it to import it straight into alfred, but creating your own shouldn't take long, maybe 5-10 mins if it's your first workflow)
I like this article, I generally agree with it. I think the take is good. However, after spending ridiculous amounts of time with LLMs (prompt engineering, writing tokenizers/samplers, context engineering, and... Yes... Vibe coding) for some periods 10 hour days into weekends, I have come to believe that many are a bit off the mark. This article is refreshing, but I disagree that people talking about the future are talking "from another orifice".
I won't dare say I know what the future looks like, but the present very much appears to be an overall upskilling and rework of collaboration. Just like every attempt before, some things are right and some are simply misguided. e.g. Agile for the sake of agile isn't any more efficient than any other process.
We are headed in a direction where written code is no longer a time sink. Juniors can onboard faster and more independently with LLMs, while seniors can shift their focus to a higher level in application stacks. LLMs have the ability to lighten cognitive loads and increase productivity, but just like any other productivity enhancing tool doing more isn't necessarily always better. LLMs make it very easy to create and if all you do is create [code], you'll create your own personal mess.
When I was using LLMs effectively, I found myself focusing more on higher level goals with code being less of a time sink. In the process I found myself spending more time laying out documentation and context than I did on the actual code itself. I spent some days purely on documentation and health systems to keep all content in check.
I know my comment is a bit sparse on specifics, I'm happy to engage and share details for those with questions.
It still is, and should be. It’s highly unlikely that you provided all the required info to the agent at first try. The only way to fix that is to read and understand the code thoroughly and suspiciously, and reshaping it until we’re sure it reflects the requirements as we understand them.
No, written code is no longer a time sink. Vibe coding is >90% building without writing any code.
The written code and actions are literally presented in diffs as they are applied, if one so chooses.
The most efficient way to communicate these plans is in code. English is horrible in comparison.
When you’re using an agent and not reviewing every line of code, you’re offloading thinking to the AI. Which is fine in some scenarios, but often not what people would call high quality software.
Writing code was never the slow part for a competent dev. Agent swarming etc is mostly snake oil by those who profit off LLMs.
With an engineer you can hand off work and trust that it works, whereas I find code reviewing llm output something that I have to treat as hostile. It will comment out auth or delete failing tests.
Which should be always in my opinion
Are people really pushing code to production that they don't understand?
No.
The general assumed definition of vibe coding, hence the vibe word, is that coding becomes an iterative process guided by intuition rather than spec and processes.
What you describe is literally the opposite of vibe coding, it feels the term is being warped into "coding with an LLM".
Leaving out specs and documentation leads to more slop and hallucinations, especially with smaller models.
You can always just wing it, but if you do so and there isn't adequate existing context you're going to struggle with slop and hallucinations more frequently.
Written code has never been a time sink. The actual time that software developers have spent actually writing code has always been a very low percentage of total time.
Figuring out what code to write is a bigger deal. LLMs can help with part of this. Figuring out what's wrong with written code, and figuring out how to change and fix the code, is also a big deal. LLMs can help with a smaller part of this.
> Juniors can onboard faster and more independently with LLMs,
Color me very, very skeptical of this. Juniors previously spent a lot more of their time writing code, and they don't have to do that anymore. On the other hand, that's how they became not-juniors; the feedback loop from writing code and seeing what happened as a result is the point. Skipping part of that breaks the loop. "What the computer wrote didn't work" or "what the computer wrote is too slow" or even to some extent "what the computer wrote was the wrong thing" is so much harder to learn from.
Juniors are screwed.
> LLMs have the ability to lighten cognitive loads and increase productivity,
I'm fascinated to find out where this is true and where it's false. I think it'll be very unevenly distributed. I've seen a lot of silver bullets fired and disintegrate mid-flight, and I'm very doubtful of the latest one in the form of LLMs. I'm guessing LLMs will ratchet forward part of the software world, will remove support for other parts that will fall back, and it'll take us way too long to recognize which part is which and how to build a new system atop the shifted foundation.
I found exactly this is what LLMs are great at assisting with.
But, it also requires context to have guiding points for documentation. The starting context has to contain just enough overview with points to expand context as needed. Many projects lack such documentation refinement, which causes major gaps in LLM tooling (thus reducing efficacy and increasing unwanted hallucinations).
> Juniors are screwed.
Mixed, it's like saying "if you start with Python, you're going to miss lower level fundamentals" which is true in some regards. Juniors don't inherently have to know the inner workings, they get to skip a lot of the steps. It won't inherently make them worse off, but it does change the learning process a lot. I'd refute this by saying I somewhat naively wrote a tokenizer, because the >3MB ONNX tokenizer for Gemma written in JS seemed absurd. I went in not knowing what I didn't know and was able to learn what I didn't know through the process of building with an LLM. In other words, I learned hands on, at a faster pace, with less struggle. This is pretty valuable and will create more paths for juniors to learn.
Sure, we may see many lacking fundamentals, but I suppose that isn't so different from the criticism I heard when I wrote most of my first web software in PHP. I do believe we'll see a lot more Python and linguistic influenced development in the future.
> I'm guessing LLMs will ratchet forward part of the software world, will remove support for other parts that will fall back, and it'll take us way too long to recognize which part is which and how to build a new system atop the shifted foundation.
I entirely agree, in fact I think we're seeing it already. There is so much that's hyped and built around rough ideas that's glaringly inefficient. But FWIW inefficiency has less of an impact than adoption and interest. I could complain all day about the horrible design issues of languages and software that I actually like and use. I'd wager this will be no different. Thankfully, such progress in practice creates more opportunities for improvement and involvement.
It's not just the fundamentals, though you're right that is an easy casualty. I also agree that LLMs can greatly help with some forms of learning -- previously, you kind of had to follow the incremental path, where you couldn't really do anything complex without having the skills it built on, because 90% of your time and brain would be spent on getting the syntax right or whatever, and you'd lose track of the higher-level thing you were exploring. With an LLM, it's nice to be able to (temporarily) skip that learning and explore different areas at will. Especially when that motivates the desire to go back and learn the basics.
But my real fear is about the skill acquisition, or simply the thinking. We are human, we don't want to have to go through the learning stage before we start doing, and we won't if we don't have to. It's difficult, it takes effort, it requires making mistakes and being unhappy about them, unhappy enough to be motivated to learn how to not make them in the future. If we don't have to do it, we won't, even if we logically know that we'd be better off.
Especially if the expectations are raised to the point where the pressure to be "productive" makes it feel like you're wasting other people's time and your paycheck to learn anything that the LLM can do for you. We're reaching the point where it feels irresponsible to learn.
(Sometimes this is ok. I'm fairly bad at long division now, but I don't think it's holding me back. But juniors can't know what they need to know before they know it!)
I've noticed the effects of this first hand from intense LLM engagement.
I relate it more to the effects of portable calculators, navigation systems, and tools like Wikipedia. I'm under the impression this is valid criticism, but we may be overly concerned because it's a new and powerful tool. There are even surveys/studies showing generational differences in how LLMs are perceived with respect to productivity.
I'm more concerned with potential loss of critical thinking skills, more than anything else. And on a related note, there have been concerns of critical thinking skills before this mass adoption of LLMs. I'm also concerned with the impact of LLMs on the quality of information. We're seeing a huge jump in quantity while some quality lacks. It bothers me when I see an LLM confidently presenting incorrect information that's seemingly trivial to validate. I've had web searches give me more incorrect information from LLM tooling at a much greater frequency than I've ever experienced before. It's even more unsettling when the LLM gives the wrong answer and the correct answer is in the description of the top result.
You have finally made an astute observation...
I have already made the assumption that use of LLMs is going to add new mounds of BS atop the mass of crap that already exists on the internet, as part of my startup thesis.
These things are not obvious in the here and now, but I try to take the view of - how would the present day look, 50 years out in the future looking backwards?
This is a completely wrong assumption and negates a bunch of the points of the article...
Do you have any hard numbers or data showing that tab/auto-complete is less popular than agentic coding?
It was the original success of LLMs -- code autocomplete. All of the agentic coding tools have autocomplete under the hood. Autocomplete is almost literally what LLMs do, plus post-processing.
One thing I've done with some success is use a Test Driven Development methodology with Claude Sonnet (or recently GPT-5). Moving forward the feature in discrete steps with initial tests and within the red/green loop. I don't see a lot written or discussed about that approach so far, but then reading Martin's article made me realize that the people most proficient with TDD are not really in the Venn Diagram intersection of those wanting to throw themselves wholeheartedly into using LLMs to agent code. The 'super clippy' autocomplete is not the interesting way to use them, it's with multiple agents and prompt techniques at different abstraction levels - that's where you can really cook with gas. Many TDD experts have great pride in the art of code, communicating like a human and holding the abstractions in their head, so we might not get good guidance from the same set of people who helped us before. I think there's a nice green field of 'how to write software' lessons with these tools coming up, with many caution stories and lessons being learnt right now.
edit: heh, just saw this now, there you go - https://news.ycombinator.com/item?id=45055439
But like you said, it was meant more as TDD in the 'test first' sense - a sort of 'prompt-as-spec' that produces the test/spec code first, and then iterates on that. The code design itself comes out different, influenced by how it is prompted to be testable. So rather than 'prompt -> code', there's an in-between stage of prompting the test initially and then evolving it, making sure the agent plays the game of only writing testable code and automating the 'gate' of passing before expanding anything. 'prompt -> spec -> code', repeat loop until shipped.
As a simple example, a "buildUrl" style function that returned one particular host for prod and a different host for staging (via an "environment" argument) had that argument "tested" by exactly comparing the function's entire return string, encoding all the extra functionality into the test (functionality that was tested earlier anyway).
A better output would be to check startsWith(prodHost) or similar, which is what I changed it into, but I'm still trying to work out how to get coding agents to do that in the first or second attempt.
But that's also not surprising: people write those kinds of too-narrow not-useful tests all the time, the codebase I work on is littered with them!
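To make the difference concrete, here's a hypothetical reconstruction of that example (hosts, names, and the version segment are all invented), with the too-narrow test next to the focused one:

```python
# Hypothetical reconstruction of the buildUrl in question.
HOSTS = {"prod": "https://api.example.com", "staging": "https://staging.example.com"}

def build_url(environment: str, path: str, version: str = "v2") -> str:
    return f"{HOSTS[environment]}/{version}/{path}"

# Too narrow: re-encodes the entire return value, so it breaks whenever
# any unrelated detail (here, the API version) changes.
def test_env_brittle():
    assert build_url("prod", "users") == "https://api.example.com/v2/users"

# Focused: asserts only the behaviour under test, namely host selection.
def test_env_focused():
    assert build_url("prod", "users").startswith(HOSTS["prod"])
    assert build_url("staging", "users").startswith(HOSTS["staging"])

test_env_brittle()
test_env_focused()
```

Both pass today; only the brittle one fails when `version` bumps to "v3", which is exactly the false-alarm noise the comment is describing.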
That sounds like an anti-pattern, not true TDD: getting LLMs to generate tests for you when you don't know what to test for.
It also reduces your confidence that the generated test does what it says. Thus, you might as well write it yourself.
Otherwise you will get these sort of nasty incidents. [0] Even when 'all tests passed'.
[0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...
In short, LLMs often get confused about where the problem lies: the code under test or the test itself. And no amount of context engineering seems to solve that.
Without providing the actual feature requirements to the LLM (or the developer), it is impossible to determine which is wrong.
Which is why I think it is also sort of stupid to have the LLM generate tests by just giving it access to the implementation. That at best tests the implementation as it is, but tests should be based on the requirements.
Before I let an agent touch code, I spell out the issue/feature and have it write two markdown files - strategy.md and progress.md (with the execution order of changes) inside a feat_{id} directory. Once I’m happy with those, I wipe the context and start fresh: feed it the original feature definition + the docs, then tell it to implement by pulling in the right source code context. So by the time any code gets touched, there’s already ~80k tokens in play. And yet, the same confusion frequently happens.
Even if I flat out say “the issue is in the test/logic”, even if I point out _exactly_ what the issue is, it just apologizes and loops.
At that point I stop it, make it record the failure in the markdown doc, reset context, and let it reload the feature plus the previous agent’s failure. Occasionally that works, but usually once it’s in that state, I have to step in and do it myself.
I used to avidly read all his stuff, and I remember that 20-ish years ago he decided to rename Inversion of Control to Dependency Injection. In doing so, and in his accompanying blog post, he showed he didn't actually understand it at a deep level (hence the poor renaming).
This feels similar. I know what he's trying to say, but he's just wrong. He's trying to say the LLM is hallucinating everything, but what Fowler is missing is that "hallucination" in LLM terms refers to a very specific negative behavior.
Positive hallucinations are more likely to happen nowadays, thanks to all the effort going into these systems.
If you disagree then I would ask what exactly is the “specific behaviour” you’re talking about?
What didn't he understand properly about inversion of control, then?
Reminds me of a recent experience when I asked CC to implement a feature. It wrote some code that struck me as potentially problematic. When I said, "why did you do X? couldn't that be problematic?" it responded with "correct; that approach is not recommended because of Y; I'll fix it". So then why did it do it in the first place? A human dev might have made the same mistake, but it wouldn't have made the mistake knowing that it was making a mistake.
No, they're like an extremely experienced and knowledgeable senior colleague – who drinks heavily on the job. Overconfident, forgetful, sloppy, easily distracted. But you can hire so many of them, so cheaply, and they don't get mad when you fire them!
And if you push back on that insanity, they'll smile and nod and agree with you and in the next sentence, go right back to pushing that nonsense.
Ask it for things that many people get wrong or just do badly, or can be mistakenly likened to a popular thing in a way that produces a wrong result, and it'll often err.
The trick is having an awareness of what correct solutions are prevalent in training data and what the bulk of accessible code used for training probably doesn't have many examples of. And this experience is hard to substitute for.
Juniors therefore use LLMs in a bumbling fashion and are productive either by sheer luck, or because they're more likely to ask for common things and so stay in a lane with the model.
A senior developer who develops a good intuition for when the tool is worth using and when not can be really efficient. Some senior developers however either overestimate or underestimate the tool based on wrong expectations and become really inefficient with them.
“libFakeCall doesn’t exist. Use libRealCall instead of libFakeCall.”
“You’re absolutely correct. I apologize for blah blah blah blah. Here’s the updated code with libRealCall instead. :[…]”
“You just replaced the libFakeCall reference with libRealCall but didn’t update the calls themselves. Re-write it and cite the docs. “
“Sorry about the confusion! I’ve found these calls in the libRealCall docs. Here’s the new code and links to the docs.”
“That’s the same code but with links to the libRealCall docs landing page.”
“You’re absolutely correct. It appears that these calls belong to another library with that functionality:” followed by code that looks totally plausible, but with a hallucinated libFakeCall.
For all the hot air I hear about the user having to give the system the proper context to give you good answers - does anyone claim to have a solution for dealing with such a belligerent approach to bullshit?
They are by no means useless, but once they fall into that hole, there's no further value in interrogating them.
But it is nothing like a wetware colleague. It is a machine.
They are extremely shallow, even compared to a junior developer. But extremely broad, even compared to the most experienced developer. They type real fuckin fast compared to anyone on earth, but they need to be told what to do much more carefully than anyone on earth.
LLM did help a lot to get some busy work out of the way, but it's difficult to know when you need to jump out of the LLM loop and go old skool.
Unrelated to software, but recently I wanted to revive an old dumbphone I hadn't used since 2014. Apparently I had it password-protected and had forgotten the password, so I wanted to factory reset it. I found the exact model of the phone, and Google had only content-farm articles that didn't help me at all, but Gemini gave me the perfect solution on the first try. I went to Google first because I had no faith in Gemini, since it seemed like a pretty obscure question to me, but I guess I was wrong.
AI is like a super-advanced senior wearing a blindfold. It knows almost everything, it's super fast, and it gets confused pretty quickly about things you've just told it.
> Most of the juniors
Most senior programmers can't write CUDA kernels either. Even fewer can write ones that are any good. I think the programming profession overvalues experience over skill. However, when I was young I had no appreciation for the benefits of experience... including not writing terrible code.
I figured there's probably a ton more logical issues and deleted it immediately
have you ever managed an offshore team. holy cow
I asked Claude to design me a UI, and it made a lovely one... but I wanted a web UI. It very happily threw away all its work and made a brand-new web UI.
I can't imagine any employee being that quick to just move on after something like that.
They follow directions for maybe an hour and then go off and fix random shit because they forgot what their main task was.
They'll tell you to your face how great your ideas were, and that you're absolutely right about something, then go implement it completely incorrectly.
They add comments to literally everything even when you ask them to stop. But they also ignore said comments sometimes.
Maybe they are kinda like us lol.
No, typically __you__ are mad when you fire them ...
Yes, that's why you should add "... and answer in the style of a drunkard" to every prompt. Makes it easier to not forget what you are dealing with.
Individual sentences and paragraphs may mostly work, but it's an edifice built on sand out of poorly constructed bricks plus mortar with the wrong proportions (or entirely wrong ingredients).
LLM output is "truthy" - it looks like it might be true, and sometimes it will even be accurate (see also, "stopped clock") but depending on it is foolish because what's generating the output doesn't actually understand what it's putting out - it's just generating output that looks like the kind of thing you've requested.
And constantly microdosing, sometimes a bit too much.
The internet build-out left massive amounts of useful infrastructure.
The railroads left us with lots of track that fell into disuse, and eventually a complete joke of a railway system. They made a few people so rich that we started calling them robber barons and talking about the Gilded Age.
Are we going to continue to use the 60 billion dollar data centers in Louisiana when the bubble bursts? Is it valuable infrastructure or just a waste of money that gets written off?
Why not?
Tracks in the middle of nowhere are useless.
Data centers in the middle of nowhere are useful, as long as energy, cooling and uplink are cost-effective.
(Not saying Louisiana is in the middle of nowhere!)
There was once a coding agent which achieved SOTA performance on SWE Bench Verified by "just" running the agent 5 times on each instance, scoring each attempt and picking the attempt with the highest score: https://aide.dev/blog/sota-bitter-lesson
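A rough sketch of that best-of-n approach, where `run_agent` and `score` are stand-ins for a real agent invocation and a real grader (tests passed, critic model, etc.), not any actual API:

```python
import random

# Sketch of best-of-n sampling: run the agent n times on the same task,
# score each attempt, keep the highest-scoring one. run_agent and score
# are hypothetical stand-ins for an agent call and a grader.

def run_agent(task, seed):
    # Stand-in: a real agent would produce a patch; we fake a scored attempt.
    random.seed(seed)
    return {"task": task, "patch": f"attempt-{seed}", "quality": random.random()}

def score(attempt):
    # Stand-in grader, e.g. fraction of tests passed or a critic-model score.
    return attempt["quality"]

def best_of_n(task, n=5):
    attempts = [run_agent(task, seed) for seed in range(n)]
    return max(attempts, key=score)

best = best_of_n("fix issue #123", n=5)
print(best["patch"])
```

The "bitter lesson" flavour is that nothing here is clever: spending more compute on independent samples plus a selector often beats trying to make a single attempt smarter.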
We are producing more code, but quality is definitely taking a hit now that no-one is able to keep up.
So instead of slowly inching towards the result we are getting 90% there in no time, and then spending lots and lots of time on getting to know the code and fixing and fine-tuning everything.
Maybe we ARE faster than before, but it wouldn't surprise me if the two approaches are closer than what one might think.
What bothers me the most is that I much prefer to build stuff rather than fixing code I'm not intimately familiar with.
In any case poor work quality is a failure of tech leadership and culture, it's not AI's fault.
Do you think by the time GPT-9 comes, we'll say "That's it, AI is a failure, we'll just stop using it!"
Or do you speak in metaphorical/bigger picture/"butlerian jihad" terms?
The tools used do not hold responsibilities, they are tools.
But also, blameless culture is IMO important in software development. If a bug ends up in production, whose fault is it? The developer that wrote the code? The LLM that generated it? The reviewer that approved it? The product owner that decided a feature should be built? The tester that missed the bug? The engineering organization that has a gap in their CI?
As with the Therac-25 incident, it's never one cause: https://news.ycombinator.com/item?id=45036294
And it's going to get worse! So please explain to me how, in the net, you are going to be better off? You're not.
I think most people haven't taken a decent economics class and don't deeply understand the notion of trade offs and the fact there is no free lunch.
Yes, there are trade-offs, but at this point, if you haven't found a way to significantly amplify and scale yourself using LLMs, and your plan is instead to pretend that they are somehow not useful, that uphill battle can only last so long. The genie is out of the bottle. Adapt to the times or you will be left behind. That's just what I think.
Also telling someone to "adapt to the times" is a bit silly. If it helped as much as its claimed, there wouldn't be any need to try and convince people they should be using it.
A LOT of parallels with crypto, which is still trying to find its killer app 16 years later.
Technology moves forward and productivity improves for those that move with it.
It does not; technology regresses just as often, and linear, deterministic progress is a myth to begin with. There is no guarantee that technology will move forward and always make things better.
There are plenty of examples to be made where technology has made certain things worse.
That's not all it does but I think it's one of the more important fundamentals.
Would you say the same thing ("If it helped as much as its claimed, there wouldn't be any need to try and convince people they should be using it.") about the internet?
Go ahead, convince me. Please describe clearly and concisely in one or two sentences the clear economic value/advantage of LLMs.
Citation needed. In my circles, senior engineers are not using them a lot, or only in very specific use cases. My company is blocking LLM use apart from a few pilots (which I am part of, and while Claude Code is cool, its effectiveness on a 10-year-old distributed codebase is pretty low).
You can't make sweeping statements like this, software engineering is a large field.
And I use claude code for my personal projects, I think it's really cool. But the code quality is still not there.
People are honestly just drunk on this thing at this point. The sunk cost fallacy has people pushing on (i.e. spending more time) when LLMs aren't getting it right. People are happy to trade convenience for everything else; just look at junk food, where people trade away flavour and their health. And ultimately we are in a time when nobody is building for the future; it's all get-rich-quick schemes: squeeze, then get out before anyone asks why the river ran dry. LLMs are like the perfect drug for our current society.
Just look at how technology has helped us in the past decades. Instead of launching us towards some kind of Star Trek utopia, most people now just work more for less!
Best practice in software development has always been to verify everything: CI, code reviews, unit tests, linters, etc. I'd argue that with LLM-generated code, a software developer's job, and/or that of the organization as a whole, has shifted even more towards reviewing and verification.
If quality is taking a hit you need to stop; how important is quality to you? How do you define quality in your organization? And what steps do you take to ensure and improve quality before merging LLM generated code? Remember that you're still the boss and there is no excuse for merging substandard code.
Boilerplate sucks to review. You just see a big mass of code and can't fully make sense of it when reviewing. Also, Github sucks for reviewing PRs with too many lines.
So junior/mid devs are just churning boilerplate-rich code and don't really learn.
The only outcome here is code quality is gonna go down very very fast.
This is what LLM "reasoning" does. More than "reasoning" in the human sense, it just reduces variance from variations in the prompt and random next token prediction.
Yup, this matches my recommended workflow exactly. Why waste time trying to turn an initially bad answer into a passable one, when you could simply re-generate (possibly with different context)?
I wrote up an example of this workflow here: https://github.com/sutt/agro/blob/master/docs/case-studies/a...
What a great way of framing it. I've been trying to explain this to people, but this is a succinct version of what I was stumbling to convey.
E.g. Programming in JS or Python: good enough
Programming in Rust: I can scrap over 50% of the code because it will
a) not compile at all (I see this while the "AI" types)
b) not meet the requirements at all
It generated a thousand-line file with a robust breakdown of everything that needed to be done, and at my command it did it. We went module by module, and I made sure that each module had comprehensive unit-test coverage and that the repo built cleanly as we went. After a few hours of back and forth we had made 9 modules, 60+ APIs across 10 different tables, and hundreds of unit tests, all passing.
Does that mean that I’m all done and ready to deploy to prod? Unlikely. But it does mean that I got a ton of boilerplate stuff put into place really quickly and that I’m eight hours into a project that would have taken at least a month before.
Once the BE was done I had it generate extensive documentation for the agent that would handle the FE integration as a sort of instruction guide - in case we need it. As issues and bugs arise during integration (they will!) the model has everything it needs to keep on track and finish the job it set out to do.
What a time to be alive!
Even this post by Martin Fowler shows he's an aging dinosaur stuck in denial.
> I’ve often heard, with decent reason, an LLM compared to a junior colleague. But I find LLMs are quite happy to say “all tests green”, yet when I run them, there are failures. If that was a junior engineer’s behavior, how long would it be before H.R. was involved?
I don't know what LLMs he's using, but I just simply don't get hallucinations like that with Cursor or Claude Code.
He ends with this: > LLMs create a huge increase in the attack surface of software systems. Simon Willison described the The Lethal Trifecta for AI agents: an agent that combines access to your private data, exposure to untrusted content, and a way to externally communicate (“exfiltration”). That “untrusted content” can come in all sorts of ways, ask it to read a web page, and an attacker can easily put instructions on the website in 1pt white-on-white font to trick the gullible LLM to obtain that private data.
Not sure why he is re-iterating a well-known prompt-injection vulnerability, passing it off as a general weakness of LLMs that applies to all LLM use, when that's not the reality.
Plus, there is no telling how many bugs you have in that code.
So far AI seems to be a great augmentation, but not a replacement.
Recently some people have compared LLMs to compilers and the resulting source code to object code. This is a false analogy, because compilation is (almost always) a semantics preserving transformation. LLMs are given a natural language spec (prompt) that by definition is underspecified. And so they cannot be semantics preserving, as the semantics of their input is ambiguous.
The programmer is left with two options: (1) understanding the resulting code, repairing and rewriting it, or (2) ignoring the code and performing validation by testing.
Both of these approaches are assistive. At least in its current form, AI can only accelerate a programmer, not replace them. Lovable and similar tools rely on very informal testing, which is why they can be used by non-programmers, but they have very little chance of producing robust software of any complexity. I’ve seen people creating working web apps, but I am confident I could find plenty of strange bugs just by testing edge cases or stressing non functional qualities. The bigger issue is the bugs I can’t find because they’re not bugs a human programmer would create.
Option (1) is problematic because LLMs tend not to produce clean code designed to be human-readable. A lot of the effort coders are making goes into breaking down tasks and trying to guide the LLMs to produce good code. I have yet to see this work for anything novel and complex. For non-trivial systems, reasoning and architecture are required. The hope is that a programmer can write specs well enough that LLMs can “fill in the gaps”. But whether this is a net positive once you consider the work involved is still an open question. I've yet to see any first-hand evidence of a productivity gain here. It's early days.
Option (2) is also difficult because there is a crucial factor missing in AI coding: the “generality of intent” of a human user. This is a problem because the non-trivial bugs an AI produces are unlikely to be similar to those from a human. Those bugs are usually a failure of reasoning, but LLMs don't reason in the same sense that humans do, so testing in the same way may not be possible. Your intuitions for where bugs lie are no longer applicable. The likely result is worse code produced more quickly, and that trade-off needs exploring.
At the moment I think AI is useful for (a) discussions around design, libraries, debugging, (b) autocomplete, (c) agent analysis of existing code where partial answers are ok and false positives acceptable (eg finding some but not all bugs). Agent coding doesn’t seem ready for production to me, not until we have much better tooling to prevent some of these problems, or AI becomes capable of proper reasoning.
krainboltgreene•12h ago
Is this actually correct? I don't see any evidence for an "airflight bubble" or a "car bubble" or a "loom bubble" at those technologies' invention. Also, the "canal bubble" wasn't about the technology; it was about speculation on a series of big canals, and we had been making canals for a long time. More importantly, even if it were correct, there are plenty of bubbles (if not significantly more) around things that didn't have value, or tech that didn't matter.
sfink•12h ago
Does that sound like any human, ever, to you?
(The only time there isn't a bubble is when the thing just isn't that interesting to people and so there's never a big wave of uptake in the first place.)
krainboltgreene•10h ago
That's an absurd framing for a cute quip.
daveguy•7h ago
Edits--
Found one: https://en.wikipedia.org/wiki/Panic_of_1893
Another good one: https://en.wikipedia.org/wiki/Public_Utility_Holding_Company... (from cake_robot here: https://news.ycombinator.com/item?id=45056621)
For reference, apple and spotify links to the Derek Thompson podcast in reply below (thank you!):
https://podcasts.apple.com/us/podcast/plain-english-with-der...
https://open.spotify.com/show/3fQkNGzE1mBF1VrxVTY0oo
tptacek•7h ago
The whole subtext of that podcast was how eerily similar the Transcontinental Railroad was to AI (as an investment/malinvestment/prediction of future trends).