> Go read Peter Naur's "Programming as Theory Building" and then come back and tell me that LLMs can replace human programmers
Which to me gives a very different understanding of what the article is going to be about than the current HN title. This is not a criticism of the submitter, I know HN has a character limit and sometimes it’s hard to condense titles without unintentionally losing meaning.
> In this essay, I will perform the logical fallacy of argument from authority (wikipedia.org) to attack the notion that large language model (LLM)-based generative "AI" systems are capable of doing the work of human programmers.
Is any part of this intended to be valid? It's a very weak argument - is that the purpose?
We do understand how the machine works and how it came to be. What most companies are seeking is a way to make that useful.
It isn't certain that this framing is true. As part of learning to predict the outcome of the work token by token, LLMs very well might be "doing the work" as an intermediate step via some kind of reverse engineering.
They already have the full work available. When you're reading the source code of a program to learn how it works, your objective is not to learn which keywords are close to each other or to extract the common patterns. You're extracting a model, which is an abstraction of some real-world concept (or of other abstractions), together with rules for manipulating that abstraction.
After internalizing that abstraction, you can replicate it with whatever you want, extend it further, and so on. It's an internal model that you can shape as you please in your mind, then create a concrete realization once you're happy with the shape.
I.e. the theory of the program as it exists in the minds of the development team might not be fully available for reconstruction from just the final code and docs, since it includes a lot of activity that does not end up in the code.
I suspect that the question to his final answer is:
> To replace human programmers, LLMs would need to be able to build theories by Ryle’s definition
If you see "a = b + 1" it's obvious that the variable a is taking the value of variable b incremented by one. What LLMs can't do is explaining why we have this and why it needs to change to "a = b - 1" in the new iteration. Writing code is orthogonal to this capability.
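A made-up illustration of the point (hypothetical code, not from any real project): both versions are trivial to read, but the reason for the change isn't in the text at all.

    b = 10

    # iteration 1: slots grow upward, so the next free slot is one past the last used
    a = b + 1

    # iteration 2: the (imaginary) allocator now grows downward, so it is one before
    a = b - 1

The "why" lives in the team's heads, or at best in a commit message.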
I did a search on GitHub for code containing `a=b+1` and found this:
https://github.com/haoxizhong/problem/blob/a2b934ee7bb33bbe9...
It looks to me like ChatGPT, specifically, does a more than OK job at explaining why we have this.
https://chatgpt.com/share/680f877d-b588-8003-bed5-b425e14a53...
While your use of 'theory' is reasonable, Naur uses a specific and more elaborate definition of theory.
Example from the paper:
>Case 1 concerns a compiler. It has been developed by a group A for a Language L and worked very well on computer X. Now another group B has the task to write a compiler for a language L + M, a modest extension of L, for computer Y. Group B decides that the compiler for L developed by group A will be a good starting point for their design, and get a contract with group A that they will get support in the form of full documentation, including annotated program texts and much additional written design discussion, and also personal advice. The arrangement was effective and group B managed to develop the compiler they wanted. In the present context the significant issue is the importance of the personal advice from group A in the matters that concerned how to implement the extensions M to the language. During the design phase group B made suggestions for the manner in which the extensions should be accommodated and submitted them to group A for review. In several major cases it turned out that the solutions suggested by group B were found by group A to make no use of the facilities that were not only inherent in the structure of the existing compiler but were discussed at length in its documentation, and to be based instead on additions to that structure in the form of patches that effectively destroyed its power and simplicity. The members of group A were able to spot these cases instantly and could propose simple and effective solutions, framed entirely within the existing structure. This is an example of how the full program text and additional documentation is insufficient in conveying to even the highly motivated group B the deeper insight into the design, that theory which is immediately present to the members of group A.
There are alternative views on theorizing that reject flat positivistic reductions and attempt to show that theories are metaphysical and force us to make varying degrees of ontological and normative claims, see the work of Marx Wartofsky, for example. This view is far more humanistic and ties in directly to sociological bases in praxis. This view will support the author's claims much better. Furthermore, Wartofsky differentiates between different types of cognitive representations (e.g. there is a difference between full blown theories and simple analogies). A lot of people use the term "theory" way more loosely than a proper analysis and rigorous epistemic examination would necessitate.
(I'm not going to make the argument here but fwiw, it's clear under these notions that LLMs do not form theories, however, they are playing an increasingly important part in our epistemic activity of theory development)
I might one day wake up and find my dog to be more intelligent than me; not very likely, but I can't prove it to be impossible.
It's still useless.
> Second, you cannot effectively work on a large program without a working "theory" of that program...
I find the whole argument, and particularly the above, to be a senseless rejection of bootstrapping. Obviously there was a point in time (for any program, any individual programmer, and humanity as a whole) at which we didn't have a "theory" and hadn't done the work, but now we have both, so a program and its theory can appear "de novo".
So with that in mind, how can we reject the possibility that as an AI Agent (e.g. Aider) works on a program over time, it bootstraps a theory?
Lack of effective memory. That might have worked if you constantly retrained the LLM, incorporating the new wisdom iteratively like a human does, but the current LLM architecture doesn't enable that. The context provided is neither large enough, nor can the model use it effectively enough, for complex problems.
And this isn't easy to solve: you very quickly collapse the LLM if you try to do this in the naive ways. We need some special insight that lets us update an LLM continuously, in a positive direction, as it works, the way humans can.
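To make that concrete, here is a minimal sketch of the naive approach (the functions are stand-ins, not any real API): whatever the agent "learns" lives in a truncated text buffer, while the weights never change at inference time.

    def call_model(context: str, task: str) -> str:
        # stand-in for an LLM API call; the model's parameters are frozen here
        return f"work on '{task}' using notes: {context[-60:]}"

    def summarize(text: str) -> str:
        # stand-in for a lossy summarization step
        return text[:40]

    notes: list[str] = []
    for step in range(5):
        context = "\n".join(notes[-3:])   # fixed window: older insights fall out
        result = call_model(context, f"task {step}")
        notes.append(summarize(result))   # lossy compression of what was "learned"

That is the "memory": a pile of summaries competing for a fixed context window, not an updated model.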
So that isn't just "ask it to play Pokemon"; that is a large program with tons of different prompts and memories that kick in at different times, and even with all that, plus updates to the program when it gets stuck, it still struggles massively and repeats mistakes over and over in ways a human never would.
That's the appropriate level of faith for today's LLMs. They're not good enough to replace programmers. They're good enough that we can't reject the possibility of them one day being good enough to replace programmers.
Possibility means practically nothing.
You can have the models pseudo-"learn" by putting things in something like a system prompt, but this is limited by context, and they will never permanently learn, because we don't train at inference time with today's LLMs.
We can explicitly reject this possibility by looking at the information that goes into the model at train and test time.
However, I have two counters:
- First, the rational argument right now is that one person plus money spent on LLMs can replace three or more programmers in total. This is the argument with a three-year bound: the current technology will improve, and developers will learn how to use it to its potential.
- Second, the optimistic argument is that a combination of the LLM model with larger context windows and other supporting technology around it will allow it to emulate a theory of mind similar to that of the average programmer. Consider Go or Chess: we didn't think computers had the theory of mind to be better than a human, but they found other ways. For humans, Naur's advice stands. We cannot assume it holds for tools with different strengths and weaknesses than humans.
There is no doubt it will improve, but if you look at a car, it is still the same fundamental "shape" as a Model T.
There are niceties and conveniences, efficiency went way up, but we don't have flying cars.
I think we are going to have something, somewhere in the middle, AI features will eventually find their niche, people will continue to leverage whatever tools and products are available to build the best thing they can.
I believe that a future of self-writing code pooping out products, AI doing all the other white collar jobs, and robots doing the rest cannot work. Fundamentally there is no "business" without customers and no customers if no one is earning.
Maybe as a senior dev working on a large, complex, established project I don't benefit from LLMs as much as others, because as I and the project mature... productivity becomes less and less correlated with lines of code, and more about the ability to comprehend the bigger picture and how different components interact... things that even LLMs with bigger context aren't good at.
n4r9•4h ago
Firstly:
> LLMs are capable of appearing to have a theory about a program ... but it’s, charitably, illusion.
To make this point stick, you would also have to show why it's not an illusion when humans "appear" to have a theory.
Secondly:
> Theories are developed by doing the work and LLMs do not do the work
Isn't this a little... anthropocentric? That's the way humans develop theories. In principle, could a theory not be developed by transmitting information into someone's brain patterns as if they had done the work?
IanCal•4h ago
1. Run code 2. Communicate with POs 3. Iteratively write code
n4r9•3h ago
The system you describe appears to treat programmers as mere cogs. Programmers do not simply write and iterate code as dictated by POs. That's a terrible system for all but the simplest of products. We could implement that system, and then lose the ability to make broad architectural improvements, effectively adapt the model to new circumstances, or fix bugs that the model cannot.
IanCal•1h ago
Not at all; it simply addresses the key issues raised: that they cannot have a theory of the program because they are reading it and not actually writing it. So have them write code, fix problems, and iterate. Have them communicate with others to get more understanding of the "why".
> Programmers do not simply write and iterate code as dictated by POs.
Communicating with POs is not the same as writing code directed by POs.
Jensson•2h ago
Human theory building works; we have demonstrated this. Our science, letting us build things on top of things, proves it.
LLM theory building so far doesn't: they always veer in a wrong direction after a few steps. You would need to prove that LLMs can build theories, just like we proved that humans can.
jerf•1h ago
"Building a theory" is something I expect the next generation of AIs to do, something that has some sort of memory that isn't just a bigger and bigger context window. As I often observe, LLMs != AI. The fact that an LLM by its nature can't build a model of a program doesn't mean that some future AI can't.
imtringued•1h ago
What I personally find perplexing is that we are still stuck with a single context window. Everyone knows that Turing machines with two tapes require significantly fewer operations than a single-tape Turing machine that needs to simulate multiple tapes.
The reasoning stuff should be thrown into a separate context window that is not subject to the training loss (only the final answer should be).
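A rough sketch of the "not subject to training loss" half of that idea, read as ordinary loss masking over the reasoning tokens (made-up shapes; this says nothing about the separate-window part):

    import torch
    import torch.nn.functional as F

    vocab = 100
    logits = torch.randn(1, 8, vocab)           # model outputs for 8 positions
    targets = torch.randint(0, vocab, (1, 8))   # next-token targets
    # pretend positions 0-4 are "reasoning" and 5-7 are the final answer
    is_answer = torch.tensor([[0, 0, 0, 0, 0, 1, 1, 1]], dtype=torch.bool)

    loss = F.cross_entropy(logits[is_answer], targets[is_answer])  # answer tokens only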
falcor84•1h ago
Arguably that's the case for humans too in the general case, as per the aphorism "Beware of a guy in a room" [0]. But as for AIs, the thing is that they're exponentially improving at this, such that according to METR, "The length of tasks that AI can do is doubling every 7 months"[1].
[0] https://medium.com/machine-words/a-guy-in-a-room-bbbe058645e...
[1] https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...
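For scale, a back-of-the-envelope on that trend (assuming the doubling rate simply continues, which is the big if):

    print(2 ** (12 / 7))   # ~3.3x per year
    print(2 ** (48 / 7))   # ~116x over four years

So a one-hour task horizon today would be on the order of weeks of work in roughly four years, if the trend held.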
dkarl•23m ago
You can read it as a claim about what LLMs can do now, but that wouldn't be very interesting, because it's obvious that no current LLM can replace a human programmer.
I think the author contradicts themselves. They argue that LLMs cannot build theories because they fundamentally do not work like humans do, and they conclude that LLMs can't replace human programmers because human programmers need to build theories. But if LLMs fundamentally do not work like humans, how do we know that they need to build theories the same way that humans do?
ryandv•2h ago
This idea has already been explored by thought experiments such as John Searle's so-called "Chinese room" [0]; an LLM cannot have a theory about a program, any more than the computer in Searle's "Chinese room" understands "Chinese" by using lookup tables to generate canned responses to an input prompt.
One says the computer lacks "intentionality" regarding the topics that the LLM ostensibly appears to be discussing. Their words aren't "about" anything, they don't represent concepts or ideas or physical phenomena the same way the words and thoughts of a human do. The computer doesn't actually "understand Chinese" the way a human can.
[0] https://en.wikipedia.org/wiki/Chinese_room
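For concreteness, a toy version of the room's lookup tables (illustrative only, and deliberately crude):

    # canned Chinese replies, matched purely on the symbols of the prompt
    CANNED_REPLIES = {
        "你好吗？": "我很好，谢谢。",          # "How are you?" -> "I'm fine, thanks."
        "今天天气怎么样？": "今天天气很好。",   # "How's the weather?" -> "It's nice today."
    }

    def room(prompt: str) -> str:
        # the "operator" just shunts symbols; no meaning is involved anywhere
        return CANNED_REPLIES.get(prompt, "请再说一遍。")  # "Please say that again."

The room produces plausible Chinese without anything in it understanding Chinese.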
CamperBob2•1h ago
The computer understands Chinese better than Searle (or anyone else) understood the nature and functionality of language.
ryandv•1h ago
Wow.
CamperBob2•54m ago
How do you know you're not arguing with an LLM at the moment? You don't... any more than I do.
ryandv•52m ago
I wish I was right now. It would probably provide at least the semblance of greater insight into these topics.
> the Chinese Room was a weak test the day it was proposed
Why?
CamperBob2•34m ago
That's very safe to say. You should try it. Then ask yourself how a real Chinese Room would have responded.
> Why?
My beef with the argument is that simulating intelligence well enough to get a given job done is indistinguishable from intelligence itself, with respect to the job in question.
More specific arguments along the lines of "Humans can do job X but computers cannot" have not held up well lately, but they were never on solid logical ground. Searle set out to construct such a logical ground, but he obviously failed. If you took today's LLMs back to the 1960s when he proposed that argument, either Searle would be laughed out of town, or you would be burned as a witch.
Arguments along the lines of "Machines can never do X, only humans can do that" never belonged in the scientific literature in the first place, and I think the Chinese Room falls into that class. I believe that any such argument needs to begin by explaining what's special about human thought. Right now, the only things you can say about human thought that you can't say about AI are that humans have real-time sensory input and can perform long-term memory consolidation.
Those advantages impose real limitations on what current-generation LLM-based technology can do compared to humans, but they sound like temporary ones to me.
Jensson•13m ago
That isn't the argument though.
> If you took today's LLMs back to the 1960s when he proposed that argument, either Searle would be laughed out of town, or you would be burned as a witch.
Do you think humans were different in the 1960s? No, they would see the same limitations people point out today. The 1960s were when AI optimism was still very high.
ryandv•21m ago
I see. So if I gave you a full set of those lookup tables, a whole library full, and a set of instructions for their usage... you would now understand the world's languages?
TeMPOraL•42m ago
> Their words aren't "about" anything, they don't represent concepts or ideas or physical phenomena the same way the words and thoughts of a human do. The computer doesn't actually "understand Chinese" the way a human can.
That's very much unclear at this point. We don't fully understand how we relate words to concepts and meaning ourselves, but to the extent we do, LLMs are by far the closest implementation of those same ideas in a computer.
ryandv•35m ago
We can hem and haw about whether or not there are others, but the particular conclusion I am drawing from is that computers lack "intentionality" regarding language, and indeed about anything at all. Symbol shunting, pencil pushing, and the mechanics of syntax are insufficient for the production of meaning and understanding.
That is, to oversimplify, the broad distinction drawn in Naur's article regarding the "programming as text manipulation" view vis-a-vis "programming as theory building."
> That's very much unclear at this point.
It's certainly a central point of contention.
psychoslave•10m ago
That burden of proof is on you, since you are presumably human and you are challenging the need of humans to have more than a mere appearance of having a theory when they claim to have one.
Note that even when the only theoretical assumption we go with is that we will have a good laugh watching other people go crazy over random bullshit thrown at them, we still have a theory.