LLMs as Compilers

https://resync-games.com/blog/engineering/llms-as-compiler
29•kadhirvelm•7h ago

Comments

cschep•3h ago
Since LLMs aren’t deterministic, isn’t it impossible? What would keep it from iterating back and forth between two failing states forever? Is this the halting problem?
gloxkiqcza•3h ago
Correct me if I’m wrong, but LLMs are deterministic; the randomness is added intentionally in the pipeline.
zekica•2h ago
The two parts of your statement don't go together. A list of potential output tokens and their probabilities is generated deterministically, but the actual token returned is then chosen at random, weighted based on the "temperature" parameter and the probability values.
mzl•2h ago
That depends on the sampling strategy. Greedy sampling takes the max token at each step.
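A minimal sketch of the selection step both comments describe (toy logits and an illustrative function name, not any particular library's API):

    import math, random

    def sample_next_token(logits, temperature=1.0):
        # The model produced these scores deterministically; randomness
        # only enters in this final selection step.
        if temperature == 0.0:
            # Greedy sampling: always take the highest-scoring token.
            return max(range(len(logits)), key=lambda i: logits[i])
        # Softmax with temperature, then a weighted random draw.
        scaled = [x / temperature for x in logits]
        m = max(scaled)
        weights = [math.exp(x - m) for x in scaled]
        return random.choices(range(len(logits)), weights=weights, k=1)[0]

    logits = [2.0, 1.9, 0.1]               # deterministic model output
    print(sample_next_token(logits, 0.0))  # always token 0
    print(sample_next_token(logits, 1.0))  # varies from run to run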
galaxyLogic•1h ago
I assume they use software-based pseudo-random-number generators. Those can typically be given a seed-value which determines (deterministically) the sequence of random numbers that will be generated.

So if an LLM uses a seedable pseudo-random-number-generator for its random numbers, then it can be fully deterministic.
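That point is easy to demonstrate with Python's stdlib PRNG, which is exactly such a seedable generator:

    import random

    rng1 = random.Random(1234)  # same seed...
    rng2 = random.Random(1234)
    # ...same "random" sequence, so a sampler built on it is fully deterministic.
    assert [rng1.random() for _ in range(5)] == [rng2.random() for _ in range(5)]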

lou1306•1h ago
There are subtle sources of nondeterminism in concurrent floating point operations, especially on GPU. So even with a fixed seed, if an LLM encounters two tokens with very close likelihoods, it may pick one or the other across different runs. This has been observed even with temperature=0, which in principle does not involve _any_ randomness (see arXiv paper cited earlier in this thread).
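The underlying effect is easy to reproduce even without a GPU, because floating point addition is not associative:

    x = 1e100
    print((1.0 + x) - x)  # 0.0 -- the 1.0 is rounded away when added to x first
    print(1.0 + (x - x))  # 1.0 -- same operands, different grouping
    # A GPU reduction that sums partial results in varying order can therefore
    # flip the argmax between two nearly tied token scores.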
mzl•2h ago
LLMs can be run in a mostly deterministic mode (see https://docs.pytorch.org/docs/stable/notes/randomness.html for some info on running PyTorch programs).

Varying the deployment type (chip model, number of chips, batch size, ...) can also change the output due to rounding errors. See https://arxiv.org/abs/2506.09501 for some details on that.
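For reference, the knobs described on that PyTorch page look roughly like this (the cuBLAS workspace variable is only needed for some CUDA ops):

    import os
    import torch

    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # needed by some CUDA matmuls
    torch.manual_seed(0)                       # fix the RNG streams
    torch.use_deterministic_algorithms(True)   # raise on nondeterministic kernels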

mzl•2h ago
Many compilers are not deterministic (which is why repeatable builds are not a solved problem), and many LLMs can be run in a mostly deterministic way.
miningape•9m ago
Repeatable builds are not a requirement for determinism. If the output is fully determined by the exact system running the code, the process is deterministic, even though the output can vary from one system to another.
daxfohl•1h ago
I'd suggest the problem isn't that LLMs are nondeterministic. It's that English is.

With a coding language, once you know the rules, there are no two ways to understand the instructions. It does what it says. With English, good luck getting everyone and the LLM to agree on what every word means.

Going with an LLM as a compiler, I expect that by the time you get the English precise enough to be "compiled", the document will be many times larger than the resulting code, no longer a reasonable requirements doc because it reads like code, and yet inscrutable to engineers because it's so verbose.

dworks•1h ago
Sure, we cannot agree on the correct interpretation of the instructions. But we also cannot define what correct output is.

First, the term “accuracy” is somewhat meaningless when it comes to LLMs. Anything an LLM outputs is by definition “accurate” or “correct” from a technical point of view, because it was produced by the model. Accuracy is then not a technical or even factual term, but a sociological and cultural one, where what is right or wrong is determined by society, and even we sometimes have a hard time determining what is true or not (see: philosophy).

miningape•17m ago
What? What does philosophy have to do with anything?

If you cannot agree on the correct interpretation, nor on the correct output, what stops an LLM from solving the wrong problem? What stops an LLM from "compiling" the incorrect source code? What even makes it possible for us to solve a problem? If I ask an LLM to add a column to a table and it drops the table, that's a critical failure, not something to be reinterpreted as a "new truth".

Philosophical arguments are fine when it comes to loose concepts like human language (interpretive domains). On the other hand computer languages are precise and not open to interpretation (formal domains) - so philosophical arguments cannot be applied to them (only applied to the human reader/writer of the code).

It's like how mathematical "language" (again, a formal domain) describes precise rulesets (axioms), and every "fact" (theorem) is derived from them. You cannot philosophise your way out of the axioms being the base units of expression, and you cannot philosophise your way into disproving a theorem (instead you must show, in precise mathematical language, why a theorem breaks the axioms). This is why the philosophy department is kept far away from the mathematics department.

pjmlp•4m ago
About as much as the many devs who haven't read the respective ISO standards or the compiler manual back to back, and then get surprised by UB-based optimizations.
baalimago•3h ago
I've had the exact same thought! The reason we've moved to higher and higher-level programming languages is to make it easier for humans to describe to the machine what we want it to do. That's why languages get semantically easier and easier to read: JS > C++ > C > assembly > machine code (subjectively, yes yes, you get the point). It makes perfect sense to believe that natural language interpreted by an LLM is the next step in this evolution.

My prediction: in 10 years we'll see LLMs generate machine code directly, just like a normal compiler. The programming language will be the context provided by the context engineer.

vbezhenar•3h ago
A normal compiler does not generate machine code directly. The compiler generates LLVM IR code, LLVM generates assembly listings, and the assembler generates machine code. You can write a compiler that outputs machine code directly, but this multi-level translation exists for a reason. IMO, an LLM might be utilised to generate some Python code in the far, far away future, if the issue with deterministic generation is ever solved. But generating machine code does not make much sense. Today's LLMs use external tools to compute the sum of two numbers, because they are so bad at deterministic calculation.

The core issue is that you need to be able to iterate on different parts of the application, either without altering unaffected parts or with deterministic translation. Otherwise, the AI application will be full of new bugs after every change.

baalimago•3h ago
>if the issue with deterministic generation is ever solved

This can be achieved by utilizing tests. So the SWE agent will write up a set of tests as it understands the task. These are the functional requirements, which should/could be easily inspected by the BI (biological intelligence).

Once the functional requirements have been set, the SWE agent can iterate over and over again until the tests pass. At this point it doesn't really matter what the solution code looks like or how it's written, only that the functional requirements as defined via the tests are upheld. New requirements? Additional tests.
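Concretely, the functional requirements could just be a plain test file that the BI reviews; `parse_price` here is a hypothetical function the agent is asked to implement:

    import pytest
    from myapp import parse_price  # hypothetical module the agent must produce

    def test_plain_number():
        assert parse_price("19.99") == 1999   # price in cents

    def test_currency_symbol():
        assert parse_price("$19.99") == 1999

    def test_rejects_garbage():
        with pytest.raises(ValueError):
            parse_price("not a price")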

kadhirvelm•3h ago
Totally agree - I'd bet there will be a bigger emphasis on functional testing to prevent degradation of previously added features, and that the scope of tests we'll need to write will also go up. For example, I'd bet we'll need to add latency-based unit tests to make sure that as the LLM compiler iterates, it doesn't make user-perceived performance worse.
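Such a latency-based test could be as simple as a wall-clock budget around a feature (the `search` call is hypothetical):

    import time

    def test_search_stays_fast():
        t0 = time.perf_counter()
        results = search("llm compiler")  # hypothetical feature under test
        elapsed = time.perf_counter() - t0
        assert results                    # the feature still works...
        assert elapsed < 0.050            # ...within a 50 ms budget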
pjmlp•5m ago
C and C++ UB enter the room...
kadhirvelm•3h ago
Interesting. I'd hypothesize something slightly different: we'll see a much more efficient language come out, something humans don't need to read that can be compiled to machine code super efficiently, basically optimizing the ratio of output tokens to machine work done as much as possible.
c048•3h ago
I've thought of this too.

But I always end up in a scenario where, in order to make the LLM spit out code that is consistent and as precise as possible, we end up with a very simple and tight syntax.

For example, we'll use fewer and fewer complete human sentences, because they leave too much open to interpretation, and end up with keywords like "if", "else" and "foreach". When we eventually arrive at that utopia, the first person to present this at a conference will be hailed as a revolutionary.

Only for the LLM to hit a resolve clash and, while 'hallucinating', flip a boolean check.

gloxkiqcza•3h ago
I agree that the level of abstraction will grow and LLMs will be the primary tool to write code *but* I think they will still generate code in a formal language. That formal language might be very close to a natural language, pseudo code if you will, but it will still be a formal language. That will make it much much easier to work on, collaborate on and maintain the codebase. It’s just my prediction though, I might be proven wrong shortly.
lloeki•1h ago
You seem to have missed this part of TFA:

> That means we no longer examine the code. Our time as engineers will be spent handling context, testing features, and iterating on them

IOW there would be no human to "work on, collaborate on and maintain the codebase" and so the premise of the article is that it might just as well emit machine code from the "source prompt", hence "LLM as compiler".

Or maybe you mean that this formal language is not for humans to handle but entirely dedicated to LLMs, for the sake of LLMs not having to reverse engineer assembly?

I think that's where the premises differ: the author seems to suggest that the assembly would be generated each time from the "source prompt".

I don't know, these all read like thought experiments built on hypothetical properties that these AI tools would somehow be bestowed upon in some future and not something grounded in any reality. IOW science fiction.

azaras•2h ago
But this is a waste of resources. The LLM should generate a higher-level language, which is then compiled.
rvz•50m ago
There you go. Then an actual compiler compiles the code into the correct low-level assembly for the actual linker to create an executable.

Congratulations. An LLM is not a 'compiler'.

alaaalawi•2h ago
I concur. Not intermediate code: directly machine code, and even no tests. It will take human specs, internally understand them (maybe with formal methods of reasoning), and keep chatting with the user, asking about any gaps ("you mentioned A and C, what about B?") or asking for clarification on inconsistencies ("in point 16 you mentioned that, and in point 50 you mentioned this; to my limited understanding, doesn't this contradict? If we have X based on point 16 and Y based on point 50, how do you resolve it?"). In short, it will act as a business analyst in the middle, with no (imagined) ego or annoyance for the user: from talk to walk.
arkh•2h ago
> It makes perfect sense to believe that natural language interpreted by an LLM is the next step in this evolution.

Which one? Most languages are full of imprecision and change over time. So which one would be best for giving instructions to the machines?

galaxyLogic•2h ago
In the scheme described in the article, the main input for the AI would be the tests. If we are testing code outputs (and why not?), the input must then be in a programming language.

Specifications need to be unambiguous, but natural language is often ambiguous.

sjrd•2h ago
The level of abstraction of programming languages has been growing, yes. However, new languages have preserved precision and predictability. I would even argue that as we went up the abstraction ladder, we have increasingly improved the precision of the semantics of our languages. LLMs don't do that at all. They completely destroy determinism as a core design property. Because of that, I really don't think LLMs will be the future of programming languages.
ryanobjc•2h ago
So this was already conceived of many decades ago, and there are some substantial issues with it; the illustrious Dijkstra covers them: https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...

Now, this isn’t to say the current programming languages are good; they are generally not. They typically don’t offer good abstraction powers. You pay for this in extra lines of code.

But having to restate everything in English, then hoping that the LLM will fill in enough details, then iterating until you can close the gaps, well it doesn’t seem super efficient to me. You either cede control to what the LLM guesses or you spend a lot of natural language.

Certainly in a language with great abstractions you’d be fine already.

quantumgarbage•2h ago
Ah so I was right to scroll down to find a sane take
nojito•1h ago
It’s no different from translating business requirements into code.

Dijkstra was talking about something completely different.

sponnath•1h ago
True, but this only works well if the natural language "processor" is reliable enough to properly translate business requirements into code. LLMs aren't there yet.
lou1306•59m ago
Exactly, and translating business requirements into code is so frustrating and error-prone that entire philosophies (and consulting firms) have been built around it. LLMs are no silver bullet; they are just faster at coming up with _something_.
fxj•1h ago
This works not only for compilation but also for more general code transformations: parallelizing code with OpenMP in C and Fortran, rewriting array-of-lists as list-of-arrays, or transforming Python code into parallel C code and turning it into a Python module.

I have created some pipelines this way, where the LLM generates input files for a molecular dynamics code and writes a Python script for execution on an HPC system.
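In Python terms, the array-of-lists to list-of-arrays rewrite is the classic row-to-column transform (toy data):

    # Rows: one dict per atom (easy to write, slow to vectorise).
    atoms = [
        {"x": 0.0, "y": 0.0, "mass": 1.0},
        {"x": 1.5, "y": 0.2, "mass": 16.0},
    ]
    # Columns: one array per field (contiguous, ready for numeric kernels).
    columns = {key: [atom[key] for atom in atoms] for key in atoms[0]}
    print(columns["mass"])  # [1.0, 16.0]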

daxfohl•1h ago
Nah, I think that's the opposite of what to do. It requires you to specify all requirements up front, then press go and pray. Even if it worked perfectly, it would take us back to the stone age of waterfall design. With LLMs, missing one requirement that would be obvious to a human (don't randomly delete accounts) often leads to a fun shortcut from the LLM's perspective (hey, if there's a race condition, then I can fix it by deleting the account!).

The real value of LLMs is their conversational ability. Try something, iterate, try something else, iterate again, have it patch a bug you see, ask if it has recommendations based on where you are headed, flesh things out and fine-tune them in real time. Understand its misunderstandings and help it grasp the bigger picture.

Then at the end of the session, you'll have working code AND a detailed requirements document as an output. The doc will discuss the alternatives you tried along the way, and why you ended up where you did.

It's much like this in graphics too. Yeah you could spend a ton of time coming up with the single one-shot prompt that gives you something reasonably close to what you need, which is how it worked in the past. But now that approach is silly. It's much easier to work iteratively, change one thing, change another, until you have exactly what you need, in a much faster and more creative session.

So yeah you could use LLMs as a compiler, but it's so much more engaging not to.

fedeb95•1h ago
> Democratize access to engineering

    You don't need as specialized skillsets to build complex apps, you just need to know how to put context together and iterate
I feel it is exactly the opposite. AI helps specialists iterate faster, knowing what they are doing. Those who don't know the details will stumble on problems that AI iteration cannot solve; those who know the details can step in where the AI fails.
klntsky•1h ago
I don't think there are problems unsolvable in principle. Given a good enough specialist and some amount of time, it's possible to guide an LLM to the solution eventually.

The problem is that people often can't recognize whether they are getting closer to the solution or not, so iteration breaks.

iamgvj•1h ago
There was a literal LLM compiler model that was released last year by Meta.

https://arxiv.org/abs/2407.02524

UltraSane•26m ago
This really only works well if you have a TLA+ style formal model of the algorithm and can use it to generate lots of unit tests.
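For instance, an invariant lifted from such a model can be checked across many generated cases with a property-based test (the `transfer` function is a toy stand-in):

    from hypothesis import given, strategies as st

    def transfer(src, dst, amount):
        # Toy stand-in for the modelled operation; moves at most src's balance.
        moved = min(amount, src)
        return src - moved, dst + moved

    @given(st.integers(0, 10**6), st.integers(0, 10**6), st.integers(0, 10**6))
    def test_total_balance_is_conserved(src, dst, amount):
        new_src, new_dst = transfer(src, dst, amount)
        assert new_src + new_dst == src + dst  # invariant from the formal model
        assert new_src >= 0 and new_dst >= 0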
pjmlp•7m ago
Not necessarily this; however, I am quite convinced that AI-based tooling will be the evolution of compilers.

The current approach of generating source code in existing programming languages is only a transition step, akin to how early compilers always generated Assembly that was further processed by an existing Assembler.

Nowadays most developers don't even know the magic incantations to make their compilers, including JITs, spew Assembly; it has become a dark art for compiler engineers, game developers and crypto folks.

People who used to joke about COBOL would be surprised how much effort is being spent on prompt engineering.

Dutch authorities: Almost every Dutch citizen has too much PFAS in their blood

https://nos.nl/artikel/2573446-rivm-bijna-iedereen-in-nederland-heeft-te-veel-pfas-in-bloed
1•elisaado•52s ago•0 comments

"You can't have privacy without security" – Building with Certifications in Mind

https://www.youtube.com/watch?v=6ZRMweS_uuM
1•emot•4m ago•0 comments

Space Ship, but Fish – Using Blender 3D Modelling first time

https://www.youtube.com/watch?v=otwa4tobTpE
1•Shawn_Something•6m ago•0 comments

The Evasive Evitability of Enshittification

https://apenwarr.ca/log/20250530
1•mrzool•7m ago•0 comments

Jury says Google must pay California Android smartphone users $314.6M

https://www.theguardian.com/us-news/2025/jul/01/google-california-android-smartphone
2•Brajeshwar•9m ago•0 comments

Jakarta EE 11 Delivers 16 Updated Specifications and Modernized TCK

https://www.infoq.com/news/2025/07/jakarta-ee-11-updates/
1•henk53•10m ago•0 comments

Hundreds of Brother printer models have an unpatchable security flaw

https://www.theverge.com/news/694877/brother-printers-security-flaw-password-vulnerability
1•susam•18m ago•0 comments

How to manage configuration settings in Go web applications

https://www.alexedwards.net/blog/how-to-manage-configuration-settings-in-go-web-applications
2•todsacerdoti•20m ago•0 comments

Take Two: Eshell

http://yummymelon.com/devnull/take-two-eshell.html
1•nanna•21m ago•0 comments

Perplexity joins Anthropic and OpenAI in offering a $200 per month subscription

https://www.engadget.com/ai/perplexity-joins-anthropic-and-openai-in-offering-a-200-per-month-subscription-191715149.html
2•Brajeshwar•24m ago•0 comments

Digital Hygiene: Emails

https://herman.bearblog.dev/digital-hygiene-emails/
1•HermanMartinus•26m ago•0 comments

Space Force to fund development of Atomic-6 solar power for satellites

https://spacenews.com/space-force-to-fund-development-of-atomic-6-solar-power-for-satellites/
2•rbanffy•29m ago•0 comments

Trump tries to kill the most indisputable evidence of climate change

https://www.cnn.com/2025/07/01/climate/trump-cuts-mauna-loa-keeling
4•doener•33m ago•0 comments

How a nuclear attack on the U.S. might unfold, step by step

https://www.washingtonpost.com/opinions/interactive/2025/nuclear-attack-washington-scenario/
1•phtrivier•34m ago•1 comments

Laptop Mag is shutting down

https://www.theverge.com/news/695969/laptop-mag-shutdown-future-plc
1•rbanffy•35m ago•0 comments

Show HN: Bookmark and organise your mobile links with ease with this free app

https://about.listee.app
2•MLJV•35m ago•1 comments

Albumentations: Licensing Change and Project Fork

https://albumentations.ai/blog/2025/01-albumentationsx-dual-licensing/
1•ternaus•35m ago•1 comments

Recreating Laravel Cloud's range input with native HTML

https://phare.io/blog/recreating-laravel-clouds-range-input-with-native-html/
1•Bogdanp•36m ago•0 comments

When do pattern match compilation heuristics matter?

https://www.cs.tufts.edu/~nr/pubs/match-abstract.html
1•fanf2•37m ago•0 comments

Guaranteeing post-quantum encryption in the browser: ML-KEM over WebSockets

https://blog.projecteleven.com/posts/guaranteeing-post-quantum-encryption-in-the-browser-ml-kem-over-websockets
1•nuggimane•38m ago•0 comments

Trusting the Boot Process: Inside Bottlerocket's Security Architecture

https://molnett.com/blog/25-06-30-trusting-the-boot-process
1•bittermandel•38m ago•0 comments

2-D Digital Waveguide and Finite Difference Modeling of a Sitar (2015) [pdf]

https://www.ripublication.com/ijaer10/ijaerv10n11_122.pdf
1•brudgers•41m ago•0 comments

Tenkai: AI-powered no-code platform to extract structured web data from any site

https://tenkai.tech
1•nikosep•52m ago•0 comments

Show HN: Open Dog Registry – free, open-source API for 200 dog breeds

2•chase-manning•53m ago•0 comments

All Rocket launches in 2025 so far, chronologically and to scale

https://old.reddit.com/r/SpaceXLounge/comments/1lnapew/all_rocket_launches_in_2025_so_far/
1•nomilk•55m ago•0 comments

ChatGPT creates phisher's paradise by recommending the wrong URLs

https://www.theregister.com/2025/07/03/ai_phishing_websites/
1•chrisjj•55m ago•2 comments

Nintendo locked down the Switch 2's USB-C port and broke third-party docking

https://www.theverge.com/report/695915/switch-2-usb-c-third-party-docks-dont-work-authentication-encryption
1•01-_-•1h ago•0 comments

Takens Embedding Theorem

https://en.wikipedia.org/wiki/Takens%27s_theorem
1•niemandhier•1h ago•1 comments

Apple is reportedly working on a cheaper MacBook with an iPhone processor

https://www.zdnet.com/article/apple-reportedly-working-on-a-cheaper-macbook-with-an-iphone-processor-why-that-makes-sense-to-do/
3•01-_-•1h ago•0 comments

Show HN: Managing VectorDB via Natural Language

https://github.com/zilliztech/zilliz-mcp-server
1•Fendy•1h ago•0 comments