My prediction: in 10 years we'll see LLMs generate machine code directly, just like a normal compiler. The programming language will be the context provided by the context engineer.
The core issue is that you need to be able to iterate on different parts of the application, either without altering unaffected parts or with deterministic translation. Otherwise, this AI-built application will be full of new bugs with every change.
This can be achieved by utilizing tests. So the SWE agent will write up a set of tests as it understands the task. These are the functional requirements, which should/could be easily inspected by the BI (biological intelligence).
Once the functional requirements have been set, the SWE agent can iterate over and over again until the tests pass. At this point it doesn't really matter what the solution code looks like or how it's written, only that the functional requirements as defined via the tests are upheld. New requirements? Additional tests.
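Roughly, that could look like the sketch below: the test is the reviewable artifact, and the implementation module (hypothetical name and API here, just to illustrate the shape of it) is whatever the agent keeps regenerating until the test passes.

    import pytest
    # `shopping_cart` is a hypothetical module the agent would regenerate;
    # the human only needs to review this test.
    from shopping_cart import Cart

    def test_total_applies_quantity_and_discount():
        cart = Cart()
        cart.add(item="widget", unit_price=10.0, quantity=3)
        cart.apply_discount(0.10)  # 10% off the subtotal
        assert cart.total() == pytest.approx(27.0)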
But I always end up in a scenario where, in order to make the LLM spit out code that is as consistent and precise as possible, we end up with a very simple and tight syntax.
For example, we'll be using fewer and fewer complete human sentences, because they leave too much open to interpretation, and end up with keywords like "if", "else" and "foreach". When we eventually do arrive at that utopia, the first person to present this at a conference will be hailed as a revolutionary.
Only for the LLM to have a resolve clash and, while 'hallucinating', flip a boolean check.
> That means we no longer examine the code. Our time as engineers will be spent handling context, testing features, and iterating on them
IOW there would be no human to "work on, collaborate on and maintain the codebase" and so the premise of the article is that it might just as well emit machine code from the "source prompt", hence "LLM as compiler".
Or maybe you mean that this formal language is not for humans to handle but entirely dedicated to LLMs, for the sake of LLMs not having to reverse engineer assembly?
I think that's where the premises differ: the author seems to suggest that the assembly would be generated each time from the "source prompt"
I don't know, these all read like thought experiments built on hypothetical properties that would somehow be bestowed upon these AI tools in some future, not something grounded in any reality. IOW, science fiction.
Congratulations. An LLM is not a 'compiler'.
Which one? Most languages are full of imprecision and change over time. So which one would be best for giving instructions to the machines?
Specifications need to be unambiguous, but natural language is often ambiguous.
Sure, and if you run into a crash, the system is just bricked.
This sort of wishful thinking glosses over decades of hard-earned deterministic behavior in computers.
Now this isn’t to say the current programming languages are good; they generally are not. They typically don’t offer good abstraction powers. You pay for this in extra lines of code.
But having to restate everything in English, then hoping that the LLM will fill in enough details, then iterating until you can close the gaps, well, it doesn’t seem super efficient to me. You either cede control to what the LLM guesses or you spend a lot of words in natural language spelling things out.
Certainly in a language with great abstractions you’d be fine already.
Dijkstra was talking about something completely different.
I have created some pipelines this way, where the LLM generates input files for a molecular dynamics code and writes a Python script for execution on an HPC system.
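In rough outline it is just a thin script around the model. A sketch under assumptions: call_llm stands in for whichever client you actually use, and the LAMMPS/SLURM details are purely illustrative.

    from pathlib import Path

    def call_llm(prompt: str) -> str:
        """Placeholder for the actual LLM client call."""
        raise NotImplementedError

    # 1. Let the model draft an input deck for the MD code.
    md_input = call_llm(
        "Write a LAMMPS input file for an NVT water simulation at 300 K, "
        "10000 steps, 1 fs timestep. Output only the file contents."
    )
    Path("in.water").write_text(md_input)

    # 2. Let it draft the batch script for the scheduler.
    job_script = call_llm(
        "Write a SLURM batch script that runs `lmp -in in.water` on 2 nodes "
        "with 64 MPI ranks. Output only the script."
    )
    Path("run.sh").write_text(job_script)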
The real value of LLMs is their conversational ability. Try something, iterate, try something else, iterate again, have it patch a bug you see, ask if it has recommendations based on where you are headed, flesh things out and fine tune them real time. Understand its misunderstandings and help it grasp the bigger picture.
Then at the end of the session, you'll have working code AND a detailed requirements document as an output. The doc will discuss the alternatives you tried along the way, and why you ended up where you did.
It's much like this in graphics too. Yeah you could spend a ton of time coming up with the single one-shot prompt that gives you something reasonably close to what you need, which is how it worked in the past. But now that approach is silly. It's much easier to work iteratively, change one thing, change another, until you have exactly what you need, in a much faster and more creative session.
So yeah you could use LLMs as a compiler, but it's so much more engaging not to.
It's inspired by the evolution you mentioned: early compilers generating Assembly, now AI tools generating Python or SQL. Mochi leans into that by embedding declarative data queries, AI generation, and streaming logic directly into the language. Here is how it looks:
    type Person {
      name: string
      age: int
      email: string
    }

    let p = generate Person {
      prompt: "Generate a fictional software engineer"
    }

    print(p.name)

    let vec = generate embedding {
      text: "hello world"
      normalize: true
    }

    print(len(vec))
We see this as the natural next step after traditional compilers, more like intent compilers. The old "compiler to Assembly" phase now maps to "LLM prompt scaffolding", and prompt engineering is quickly becoming the new backend pass.

Would love feedback if this resonates with others building around AI + structured languages.
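To make the "prompt scaffolding as backend pass" idea a bit more concrete, here is a rough sketch of what a `generate Person { ... }` expression might lower to, written in Python with a placeholder `call_llm` client and a hand-written schema. This is an illustration of the lowering idea, not Mochi's actual implementation.

    import json

    def call_llm(prompt: str) -> str:
        """Placeholder for a real LLM client; assumed to return raw text."""
        raise NotImplementedError

    PERSON_SCHEMA = {"name": "string", "age": "int", "email": "string"}

    def generate_person(user_prompt: str) -> dict:
        # The "backend pass": wrap the user's intent in scaffolding that pins
        # down the output format, then parse the reply back into a record.
        prompt = (
            f"{user_prompt}\n"
            "Respond with only a JSON object matching this schema: "
            + json.dumps(PERSON_SCHEMA)
        )
        return json.loads(call_llm(prompt))

    p = generate_person("Generate a fictional software engineer")
    print(p["name"])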
You don't need such specialized skill sets to build complex apps; you just need to know how to put context together and iterate.
I feel it is exactly the opposite. AI helps specialists iterate faster, because they know what they are doing. Those who don't know the details will stumble upon problems unsolvable by AI iteration; those who know the details can step in where AIs fail. The problem is that people often can't recognize whether they are getting closer to the solution or not, so iteration breaks down.
The current way of generating source code in existing programming languages is only a transitional step, akin to how early compilers always generated Assembly that was further processed by an existing Assembler.
Eventually most developers stopped even knowing the magic incantations to make their compilers, including JITs, spew out Assembly; it has become a dark art for compiler engineers, game developers and crypto folks.
People who used to joke about COBOL would be surprised how much effort is being spent on prompt engineering.
galaxyLogic•3h ago
So if an LLM uses a seedable pseudo-random-number-generator for its random numbers, then it can be fully deterministic.
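A toy illustration (obviously not a real decoder): if the only randomness in sampling comes from a generator you seed yourself, two runs over the same distribution give identical tokens.

    import random

    def sample_tokens(probs, n, seed):
        # All randomness flows through this explicitly seeded generator.
        rng = random.Random(seed)
        tokens = list(probs)
        weights = list(probs.values())
        return rng.choices(tokens, weights=weights, k=n)

    dist = {"if": 0.5, "else": 0.3, "foreach": 0.2}
    run1 = sample_tokens(dist, n=5, seed=42)
    run2 = sample_tokens(dist, n=5, seed=42)
    assert run1 == run2  # same seed, same "model" output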
mzl•4h ago
Varying the deployment type (chip model, number of chips, batch size, ...) can also change the output due to rounding errors. See https://arxiv.org/abs/2506.09501 for some details on that.
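The rounding point is easy to see even without a GPU: floating-point addition is not associative, so changing the order of a reduction (which different batch sizes or hardware effectively do) perturbs the result.

    import random

    random.seed(0)
    xs = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

    forward = sum(xs)
    backward = sum(reversed(xs))

    # The two sums typically differ in the last few bits, which is enough to
    # flip a sampled token when two candidates are nearly tied.
    print(forward == backward, abs(forward - backward))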
miningape•2h ago
This is to say every output can be understood by understanding the systems that produced it. There are no dice rolls required. I.e. if it builds wrongly every other Tuesday, the reason for that can be determined (there's a line of code describing this logic).
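A contrived sketch of what that means in practice: even a "builds wrongly every other Tuesday" bug is fully traceable to a specific line.

    import datetime

    def runs_flaky_step(today: datetime.date) -> bool:
        # The "every other Tuesday" behaviour lives entirely on this line:
        # weekday() == 1 is Tuesday, and the ISO week number alternates parity.
        return today.weekday() == 1 and today.isocalendar()[1] % 2 == 0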
daxfohl•3h ago
With a coding language, once you know the rules, there's no two ways to understand the instructions. It does what it says. With English, good luck getting everyone and the LLM to agree on what every word means.
Going with LLM as a compiler, I expect by the time you get the English to be precise enough to be "compiled", the document will be many times larger than the resulting code, no longer be a reasonable requirements doc because it reads like code, but also inscrutable to engineers because it's so verbose.
dworks•3h ago
First, the term “accuracy” is somewhat meaningless when it comes to LLMs. Anything that an LLM outputs is by definition “accurate” or “correct” from a technical point of view, because it was produced by the model. The term accuracy, then, is not a technical or perhaps even factual term, but a sociological and cultural one, where what is right or wrong is determined by society, and even we sometimes have a hard time determining what is true or not (see: philosophy).
miningape•2h ago
If you cannot agree on the correct interpretation, nor the output, what stops an LLM from solving the wrong problem? What stops an LLM from "compiling" the incorrect source code? What even makes it possible for us to solve a problem at all? If I ask an LLM to add a column to a table and it drops the table, that's a critical failure - not something to be reinterpreted as a "new truth".
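Concretely (sqlite3 used here purely for illustration): the requested operation and the catastrophic one are both perfectly well-defined programs; the failure is executing the wrong one, and no reinterpretation changes that.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

    # What was asked for: add a column.
    conn.execute("ALTER TABLE users ADD COLUMN email TEXT")

    # What must never be silently substituted for it:
    # conn.execute("DROP TABLE users")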
Philosophical arguments are fine when it comes to loose concepts like human language (interpretive domains). On the other hand computer languages are precise and not open to interpretation (formal domains) - so philosophical arguments cannot be applied to them (only applied to the human interpretation of code).
It's like how mathematical "language" (again a formal domain) describes precise rulesets (axioms) and every "fact" (theorem) is derived from them. You cannot philosophise your way out of the axioms being the base units of expression, you cannot philosophise a theorem into falsehood (instead you must show through precise mathematical language why a theorem breaks the axioms). This is exactly why programming, like mathematics, is a domain where correctness is objective and not something that can be waved away with philosophical reinterpretation. (This is also why the philosophy department is kept far away from the mathematics department)