frontpage.
news | newest | ask | show | jobs

Made with ♥ by @iamnishanth

Open Source @Github


Dirtyfrag: Universal Linux LPE

https://www.openwall.com/lists/oss-security/2026/05/07/8
145•flipped•1h ago•66 comments

The Burning Man MOOP Map

https://www.not-ship.com/burning-man-moop/
466•speckx•7h ago•234 comments

Agents need control flow, not more prompts

https://bsuh.bearblog.dev/agents-need-control-flow/
197•bsuh•4h ago•106 comments

Building for the Future

https://blog.cloudflare.com/building-for-the-future/
68•PriorityLeft•45m ago•29 comments

Natural Language Autoencoders: Turning Claude's Thoughts into Text

https://www.anthropic.com/research/natural-language-autoencoders
103•instagraham•3h ago•33 comments

AlphaEvolve: Gemini-powered coding agent scaling impact across fields

https://deepmind.google/blog/alphaevolve-impact/
210•berlianta•6h ago•81 comments

DeepSeek 4 Flash local inference engine for Metal

https://github.com/antirez/ds4
193•tamnd•5h ago•62 comments

AI slop is killing online communities

https://rmoff.net/2026/05/06/ai-slop-is-killing-online-communities/
189•thm•2h ago•177 comments

Colored Shadow Penumbra

https://chosker.github.io/blog/colored-shadow-penumbra
20•ibobev•2h ago•4 comments

I want to live like Costco people

https://tastecooking.com/i-want-to-live-like-costco-people/
133•speckx•5h ago•321 comments

Chrome removes claim of On-device AI not sending data to Google Servers

https://old.reddit.com/r/chrome/comments/1t5qayz/chrome_removes_claim_of_ondevice_al_not_sending/
337•newsoftheday•5h ago•127 comments

Child marriages plunged when girls stayed in school in Nigeria

https://www.nature.com/articles/d41586-026-00720-8
298•surprisetalk•7h ago•216 comments

PySimpleGUI 6

https://github.com/PySimpleGUI/PySimpleGUI
74•geophph•2d ago•30 comments

Principles for agent-native CLIs

https://twitter.com/trevin/status/2051316002730991795
31•blumpy22•3h ago•17 comments

Show HN: Kstack – Skill pack for monitoring/troubleshooting K8s in Claude Code

https://github.com/kubetail-org/kstack
5•andres•15h ago•0 comments

The Self-Cancelling Subscription

https://predr.ag/blog/the-self-cancelling-subscription/
124•surprisetalk•6h ago•55 comments

OpenBSD Stories: The closest thing to cute kittens (OpenBSD/zaurus)

http://miod.online.fr/software/openbsd/stories/zaurus1.html
53•zdw•1d ago•6 comments

RaTeX: KaTeX-compatible LaTeX rendering engine in pure Rust

https://ratex.lites.dev/
142•atilimcetin•3d ago•82 comments

Show HN: Full Python GUI apps in the browser – no JavaScript, no server

https://github.com/pthom/imgui_bundle
9•pstomi•3h ago•4 comments

Motherboard sales 'collapse' amid unprecedented shortages fueled by AI

https://www.tomshardware.com/pc-components/motherboards/motherboard-sales-collapse-by-more-than-2...
199•speckx•5h ago•238 comments

I switched from Mac to a Lenovo Chromebook

https://blog.johnozbay.com/i-left-apples-ecosystem-for-a-lenovo-chromebook-and-you-can-too.html
81•speckx•5h ago•114 comments

OurCar: What I learned making an app for my family

https://mendelgreenberg.com/posts/ourcar/
84•chabad360•1d ago•60 comments

GovernGPT (YC W24) Is Hiring Engineers to Build Thinking Systems in Montreal

https://www.ycombinator.com/companies/governgpt/jobs/hRyltS0-backend-engineer-thinking-systems
1•owalerys•9h ago

Show HN: TRUST – Coding Rust like it's 1989

https://github.com/wojtczyk/trust
96•wojtczyk•15h ago•61 comments

MPEG-2 Transport Stream Packaging for Media over QUIC Transport

https://www.ietf.org/archive/id/draft-gregoire-moq-msfts-00.html
51•mondainx•6h ago•15 comments

Boris Cherny: TI-83 Plus Basic Programming Tutorial (2004)

https://www.ticalc.org/programming/columns/83plus-bas/cherny/
168•suoken•3d ago•73 comments

ProgramBench: Can language models rebuild programs from scratch?

https://arxiv.org/abs/2605.03546
129•jonbaer•17h ago•72 comments

ZAYA1-8B matches DeepSeek-R1 on math with less than 1B active parameters

https://firethering.com/zaya1-8b-open-source-math-coding-model/
74•steveharing1•12h ago•50 comments

Indian matchbox labels as a visual archive

https://www.itsnicethat.com/features/the-view-from-mumbai-matchbook-graphic-design-130426
144•sahar_builds•3d ago•32 comments

Show HN: Stage CLI – An easier way of reading your AI generated changes locally

https://github.com/ReviewStage/stage-cli
25•cpan22•5h ago•24 comments

Nobody Reviews Compiler Output

https://skiplabs.io/blog/codegen_as_compiler
15•rzk•2d ago

Comments

zby•1h ago
""" we need to build:

    Formal specification layers that agents execute against, not just prompts
"""

It is probably easier to just write that program.

bwestergard•1h ago
Right, because to trust that those "formal specifications" are correct, you will have to write them by hand.
zby•30m ago
First you need to write these specifications, and if you say "just tell the LLM to write them" - then how would that be different from just telling the LLM to write the program?

I guess you can argue that these are two independent processes, so you can combine them to get something more reliable than either - this might be a viable path. But from what I've heard, writing formal specifications is just really hard - I haven't seen anything practical in this area.

secos•1h ago
Talked with someone this morning who is using "formal methods" to validate their AI generated code.

They are using the same AI to generate the proofs.

janice1999•1h ago
That's why you should just subscribe to multiple LLM vendors. One model to write specs, one to write code against the specs and another to validate the code. Problem solved. (I have heard this proposed at work.)
perfunctory•1h ago
yep. "Formal specification layers" aka code.
secos•1h ago
ah yes.

Let's indeed treat non-deterministic output exactly like we treat deterministic output.

lokar•1h ago
We could make it deterministic
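A toy sketch of that idea (hypothetical, hard-coded logits - not a real model): decoding with pure argmax (temperature 0) is a deterministic function of the weights and the input.

```python
# Toy sketch: greedy (argmax) decoding is deterministic for fixed
# "weights" (here, hard-coded per-step logits) and a fixed input,
# unlike temperature sampling. The numbers are made up.

def greedy_decode(logits_per_step):
    # Pick the highest-scoring token index at every step.
    return [max(range(len(step)), key=step.__getitem__) for step in logits_per_step]

steps = [[0.1, 2.0, 0.3], [1.5, 0.2, 0.9], [0.0, 0.0, 3.0]]
print(greedy_decode(steps))  # → [1, 0, 2], same on every run
```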
mathisfun123•1h ago
this take is peak dunning-kruger:

https://github.com/llvm/llvm-project/tree/main/llvm/test/Cod...

Pannoniae•1h ago
Just like LLMs, compilers are just another layer of abstraction, and no, they're not deterministic.

Just yesterday I reported a codegen bug in MSVC. (Luckily they fixed it very fast.) Can you realise that it's an optimiser bug without inspecting the assembly? Hardly.

All the arguments people claim against LLMs are similarly applicable to compilers, but compilers are old technology and LLMs are new.

If you're an expert, just about every compiled function contains obvious inefficiencies, and a skilled assembly programmer can speed it up by something in the ballpark of 3x. If we're talking about your average webapp, you can usually get 1000x better resource usage in most ways, including CPU, RAM, storage and so on.

And the output isn't deterministic either - bugs notwithstanding, code generation is highly chaotic, optimisations have non-local impacts and you can't easily predict optimised codegen output from source.

LLMs aren't much worse. They have non-deterministic output, but you can steer it - similarly to a compiler. An expert can use it to gain great speed and efficiency, but in the hands of someone not as capable, you can make something awful just as fast. Both tools are force multipliers.

tsimionescu•1h ago
> Compilers have type systems, formal contracts about what code means before it runs.

This is a complete misunderstanding of what makes compilers trustworthy. Those are all properties of the language, not the compiler. The compiler is trustworthy to the extent that it is well built, internally. It is trustworthy to the extent that the mapping from source code to machine code is well defined, and implemented correctly.

You can have the best type system you want, but if the compiler is badly implemented, it won't be trustworthy. A perfect example is C - a language that barely has a type system, yet has some of the most trustworthy and optimized compilers. And it also has, or at least had, plenty of buggy compilers, typically for small embedded platforms with complicated mappings between C constructs and the limited CPU instruction set.

xyzzy_plugh•1h ago
All other arguments aside... Yes, people do review compiler output, all the time in fact!

When optimizing code it's not unusual to look at the assembly. It's not unusual to look for opportunities for autovectorization or to verify inlining or loop unrolling.

Compilers are, for the most part, deterministic. This means after people have reviewed the output, it's unlikely to change. It also means if they do change, only a few people are required to notice.

None of this applies to LLMs. They are worse than compilers, in regards to the quality and characteristics of their output, in every possible way.

If no one reviewed compiler output then https://godbolt.org/ wouldn't exist.

CamperBob2•1h ago
> All other arguments aside... Yes, people do review compiler output, all the time in fact!

No. In reality, this is almost never done anymore.

We used to do it all the time back when performance mattered, but that was then.

HN readers don't have to like it, and obviously they (we) don't, but shooting the messenger won't help.

xyzzy_plugh•57m ago
In my circles we're actually doing it even more because we can have an LLM take a look at the assembly.
CamperBob2•46m ago
Your circles are... rarefied.
janice1999•54m ago
Same goes for linker output, especially in embedded dev.
pjmlp•47m ago
Not really; the people that argue about assembly on Godbolt tend to be the specimens that the site was created for - those that count CPU cycles per Assembly instruction and are religious about which programming language syntax generates the least amount of Assembly opcodes.

The rest of us use it because it is a cool way to share code snippets.

fuhsnn•1h ago
Some of us do spend hours on godbolt.org tweaking code like it was a game character build.
keybored•1h ago
I only skimmed this. Lots of “not to be read, but to be verified; process, not the artifact; not x but...”.

“AI-checks-AI pipelines as first-class CI infrastructure, not bolt-on curiosity”—what’s the contrast here? Is it serious aspiration, not unserious aspiration?

“Formal specification layers that agents execute against, not just prompts”—Okay.

It just looks like it is stating lots of problems with a x-not-y as if there is progress being made by way of insistence.

I am open to the idea of something like a small verification kernel that can be comprehended by “humans” which can check GenAI output. But right now we can contrast mature (decade+) compilers with GenAI like this.

- Compilers: You get the abstraction you asked for: it might not be “optimal” code, but it is code that works the way you wrote it

- GenAI: Here is 200KLOC, good luck, could be anything

Now you could reduce the space of those 200KLOC with tests and verification. But so far (based on this submission) it looks like this is at the handwaving stage.

Certainly you would need high-value tests if tests are the thing that is supposed to be the verification. Either something simple and expressive enough for “humans” to write or something that is both short and easy to read for “humans” (and generated by GenAI). Not some copy-paste smelling mockfest that looks like it is a pile of junk that has evolved over five years, each author pushing some junk on top while taking care to not make the whole pile tilt and collapse.
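One hedged sketch of what such a small, human-comprehensible verification kernel could look like (the sort example and all the names are my own illustration, not from the thread): a property checker short enough to read in full, applied to untrusted generated code.

```python
# Hypothetical illustration: a tiny, readable "verification kernel"
# that property-checks an untrusted function instead of reviewing its
# implementation line by line.
import random

def untrusted_sort(xs):
    # Stand-in for generated code we don't want to read.
    return sorted(xs)

def verify_sort(fn, trials=200, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        out = fn(list(xs))
        # Property 1: output is in non-decreasing order.
        assert all(out[i] <= out[i + 1] for i in range(len(out) - 1))
        # Property 2: output is a permutation of the input.
        assert sorted(out) == sorted(xs)
    return True

print(verify_sort(untrusted_sort))  # True when all properties hold
```

The kernel stays small and human-written; only the thing under test is generated.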

recursivedoubts•56m ago
Compilers are (mostly) deterministic. LLMs are explicitly stochastic.

This is so dumb.

vzaliva•54m ago
There are compilers (e.g. CompCert https://en.wikipedia.org/wiki/CompCert ) which are formally proven correct. I think eventually all production compilers will be formally verified.
sdevonoes•52m ago
Biggest problem right now is that we have teams pushing dozens of MD files - dozens of thousands of lines of English prose - as if they were specs. There's no way we can validate so much written prose because it's plain English: perhaps one sentence is written in a way LLMs will read between the lines; perhaps line 100 and line 1000 are contradictory in such a subtle way that LLMs may not be aware of it.

In my company we have so much English prose committed to MD files that I'm starting to think it's all just snake oil. I cannot trust an engineer that writes "no bugs, please" and can go on with their lives.

tuveson•16m ago
What we need is some sort of Common, business-Oriented suBset Of the english Language that can be deterministically translated into something that the machine can understand, but also be read and understood by non-technical stakeholders. Such technology is a pipe dream, but one can dream…
Gualdrapo•50m ago
Several days ago I was toying with Rust trying to hack mmtc[0]. Mind you, I have absolutely no idea about Rust, but thanks to the verbosity and helpfulness of its compiler I was able to add a shuffle playlist feature - and am trying to figure out how to add an 'update database' feature too. At some point I even thought learning Rust could be easy...

[0] https://github.com/figsoda/mmtc/

OutOfHere•48m ago
This is why I use LLMs to write directly in Assembly, making it impossible for me to review it. (joke)
jknoepfler•39m ago
This is a very bad comparison. You can see that immediately if you think to yourself "can I see a future in which we compile code with LLMs?". The answer is no. That's a terrible idea, and the fact that it's a terrible idea is obvious.

The formal foundations of compilers are completely different from the formal foundations of LLMs.

The former are deterministic, easy to formally verify, and extremely simple in nature. "Translate a for-loop into x86 instructions using a set of rules."

The latter is intrinsically statistical in nature. "Translate a human language prompt into functional code" has to infer the correct output statistically from similar, observed input->output relationships. There is no guarantee of consistency. Different builds of the model will see different input->output evidence, in different order, and parameter tuning will further change how it responds to those pieces of evidence. Evidence is incomplete. Local minima are inevitable. LLMs are lossy curve-fitters under the hood. Errors aren't an option, they're an inevitability.
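The "set of rules" framing can be made concrete with a toy sketch (my own illustration - pseudo-instructions, nothing like a real compiler): a purely syntactic lowering of a counted loop, which trivially yields the same output every time.

```python
# Toy illustration (not real x86, not a real compiler): lowering a
# counted loop into pseudo-instructions with fixed syntactic rules.

def lower_counted_loop(var, start, stop, body):
    """Lower `for var in range(start, stop): body` deterministically."""
    return [
        f"mov {var}, {start}",
        "loop_top:",
        f"cmp {var}, {stop}",
        "jge loop_end",          # exit once the counter reaches `stop`
        *body,
        f"add {var}, 1",
        "jmp loop_top",
        "loop_end:",
    ]

a = lower_counted_loop("i", 0, 10, ["call do_work"])
b = lower_counted_loop("i", 0, 10, ["call do_work"])
print(a == b)  # True: same source, same output, every time
```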

codewritinfool•18m ago
I work in embedded and I absolutely review it. If nothing else, for learning.
ignoreusernames•11m ago
I think this argument only holds if you believe that LLMs are at a point where they can handle any combination of craziness that you throw at them.

From my own experience working with agents, there's a "snowball of shit" effect: small mistakes that compound on each other. You can either

- review the code and try to prune some of the shit occasionally
- let the LLM handle everything

Given the current state of the industry, it's very hard for me not to see option 2 as extremely irresponsible. Coding agents' limits are not well defined, and unless you're running an open-weight model locally (most people aren't) you have given up all control over your code to a third party. If running local models were the norm, the argument that LLMs are just another layer of abstraction would hold a little better. Reusing the compiler analogy from the post, it's like depending on a compiler where you pay a monthly premium to compile your code. Those did exist a while ago with closed licenses, but I think the majority of deployed code nowadays is built on open-ish platforms. This walled-garden development paradigm already lost once.
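The compounding-mistakes point can be put in rough numbers (the per-step success rate p is made up, purely illustrative):

```python
# Back-of-envelope arithmetic for the "snowball" effect: if each agent
# step is independently correct with probability p, the chance that an
# n-step change is entirely mistake-free decays geometrically.
p = 0.99  # assumed per-step correctness; illustrative only
for n in (10, 100, 500):
    print(n, round(p ** n, 3))
# 10 steps → 0.904, 100 → 0.366, 500 → 0.007
```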

camgunz•11m ago
Lord Satan give me strength. Yes people review compiler output, like compiler engineers. Compilers are deterministic, so reviewing it once is sufficient. Man, it's like some kind of blanket of stupidity over the world. Also clearly AI-written article, because of course it is.
jquaint•5m ago
If the quality of LLMs keeps increasing, will it emulate the abstraction shift that compilers gave us?

i.e. can useful deterministic compiler-like behavior ever be found with a non-deterministic LLM approach?

In my view the answer is yes (for most people). I don't think the technology has to be formally perfect to create a significant shift in how we write (most) software.

There will still be some who review AI code. Probably in the domains where people review compiler output. But not everything actually needs that level of formal verification.
There will still be some who review AI code. Probably in the domains where people review complier code. But not everything actually needs that level of formal verification.