The more I use LLMs, the more I find this true. Haskell made me think for minutes before writing one line of code. Result? I stopped using Haskell and went back to Python because with Py I can "think while I code". The separation of thinking|coding phases in Haskell is what my lazy mind didn't want to tolerate.
Same goes with LLMs. I want the model to "get" what I mean, but oftentimes (esp. with Codex) I must be very specific about the project scope and spec. Codex doesn't let me "think while I vibe", because every change is costly and you'd better have a good recovery plan (git?) for when Codex goes astray.
The obvious has been stated.
LLMs are not designed for that.
If you don't like the results or the process, you have to switch targets or add new intermediates. For example, instead of doing description -> implementation, do description -> spec -> plan -> implementation.
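A minimal sketch of what that staging might look like, where ask_llm() is a hypothetical stand-in for whatever model or agent you actually use:

```python
# Sketch only: insert reviewable intermediates instead of going straight
# from description to implementation. ask_llm() is a hypothetical LLM call.

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model/agent of choice")

def build(description: str) -> str:
    spec = ask_llm(f"Turn this description into a precise spec:\n{description}")
    plan = ask_llm(f"Turn this spec into an implementation plan:\n{spec}")
    code = ask_llm(f"Implement this plan:\n{plan}")
    return code  # each intermediate (spec, plan) is a checkpoint you can review
```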
Using LLMs to do something a compiler can already do is also modelling LLMs as infinite rather than finite. In fact, in this particular situation they are not only finite, they're grotesquely finite; in particular, they are expensive. For example, there is no world where we just replace our entire infrastructure from top to bottom with LLMs. To see that, compare the computational effort of adding 10 8-digit numbers with an LLM versus a CPU. Or, if you prefer something a bit less slanted, the computational cost of serving a single simple HTTP request with modern systems versus an LLM. The numbers run something like LLMs being trillions of times more expensive, as an opening bid, and if the AIs continue to get more expensive it can get even worse than that.
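To make that arithmetic concrete, here's a rough back-of-envelope sketch; the model size, token count, and FLOP figures are assumed round numbers, not measurements, and only the order of magnitude matters:

```python
# Back-of-envelope comparison: adding ten 8-digit numbers on a CPU vs. via an LLM.
# All figures below are assumed round numbers, not measurements.

cpu_ops = 9                      # nine integer additions, roughly one cycle each

params = 70e9                    # assumed model size: 70B parameters
flops_per_token = 2 * params     # ~2 FLOPs per parameter per token processed
tokens = 100                     # assumed prompt + answer length in tokens
llm_flops = flops_per_token * tokens

print(f"CPU ops:   {cpu_ops:.0e}")
print(f"LLM FLOPs: {llm_flops:.0e}")
print(f"Ratio:     {llm_flops / cpu_ops:.1e}x more work for the LLM")
# With these assumptions the ratio lands around 10^12 -- the "trillions of
# times more expensive" ballpark, before any serving overhead.
```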
For similar reasons, using LLMs as a compiler is very unlikely to ever produce anything even remotely resembling a payback versus the cost of doing so. Let the AI improve the compiler instead. (In another couple of years. I suspect today's AIs would find it virtually impossible to significantly improve an already-optimized compiler.)
Moreover, remember, oh, maybe two years back when it was all the rage to have AIs be able to explain why they gave the answer they did? Yeah, I know, in the frenzied greed to be the one to grab the money on the table, this has sort of fallen by the wayside, but code is already the ultimate example of that. We ask the LLM to do things, it produces code we can examine, and the LLM session then dies away, leaving only the code. This is a good thing. It means we can still examine what the resulting system is doing. In a lot of ways we hardly even care what the LLM was "thinking" or "intending"; we end up with a fantastically auditable artifact. Even if you are not convinced of the utility of a human examining it, it is also an artifact that the next AI will spend less of its finite resources simply trying to understand, leaving more left over to actually do the work.
We may find that we want different programming languages for AIs. Personally, I think we should always try to retain the ability for humans to follow the code, even if we build something like that. We've already put the effort into building AIs that produce human-legible code, and I think it's probably not that great a penalty in the long run to retain that. At the moment it is hard to even guess what such a thing would look like, though, as the AIs are advancing far faster than anyone (or any AI) could produce, test, prove out, and deploy such a language, against the advantage of other AIs simply getting better at working with the existing coding systems.
This could be a good way to learn how robust your tests are, and also what accidental complexity could be removed by doing a rewrite. But I doubt that the results would be so good that you could ask a coding agent to regenerate the source code all the time, like we do for compilers and object code.
One current idea of mine is to iteratively make things more and more specific; this is the approach I take with psuedocode-expander [0], and it has proven generally useful. I think there's a lot of value in the LLM building from the top down with human feedback, for instance, instead of one-shot generating something linearly. I give a lot more examples in the repo for this project, and I encourage any feedback or thoughts on LLM-driven code generation done in a way that's more sustainable than vibe-coding.
[0]: https://github.com/explosion-Scratch/psuedocode-expander/
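Roughly, the loop looks something like this; a simplified sketch only, not the repo's actual API, and ask_llm() is a hypothetical stand-in:

```python
# Sketch of top-down, iterative refinement with a human in the loop
# (NOT the actual psuedocode-expander API; ask_llm() is hypothetical).

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model/provider of choice")

def expand(pseudocode: str, rounds: int = 3) -> str:
    """Repeatedly make the artifact more specific, pausing for review."""
    current = pseudocode
    for _ in range(rounds):
        current = ask_llm(
            "Expand the following by one level of detail. Keep the structure; "
            "do not jump straight to final code:\n\n" + current
        )
        print(current)
        if input("Accept this refinement? [y/n] ").lower() != "y":
            # The human steers before the next level of detail is generated.
            current = input("Restate the step you want instead:\n")
    return current
```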
Well, you can always set temperature to 0, but that doesn't remove hallucinations.
This is not really a point about whether LLMs can currently be used as English compilers, but more questioning whether determinism of the final machine code output is a critical property of a build system.
"This gets to my core point. What changes with LLMs isn’t primarily nondeterminism, unpredictability, or hallucination. It’s that the programming interface is functionally underspecified by default."
We have mechanisms for ensuring the quality of output from humans, and those are nothing like the mechanisms for ensuring the output from a compiler. We have checks on people; we have whole industries of people whose entire careers are managing people, who manage other people, who manage other people.
With regard to predictability, LLMs essentially behave like people. The same kinds of checks that we use for people are needed for them, not the same kinds of checks we use for software.
Those checks work for people because humans, and most living beings, respond well to reward/punishment mechanisms. It's the whole basis of society.
> not the same kind of checks we use for software.
We do have systems that are non-deterministic (computer vision, various forecasting models…). We judge those by their accuracy and the likelihood of false positives or false negatives (when it's a classifier). Why not use those metrics?
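For instance, a classifier-style evaluation is only a few lines; the labels and predictions below are made up, just to show the metrics:

```python
# Judging a non-deterministic system the way the comment suggests:
# by its confusion matrix and derived rates, not by exact reproducibility.

def confusion(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # made-up outputs from some fuzzy system

tp, tn, fp, fn = confusion(y_true, y_pred)
print(f"precision={tp / (tp + fp):.2f} "
      f"recall={tp / (tp + fn):.2f} "
      f"false_positive_rate={fp / (fp + tn):.2f}")
```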
I think this is an interesting development, because we (linguists and logicians in particular) have spent a long time developing a highly specified language that leaves no room for ambiguity. One could say that natural language was considered deficient – and now we are moving in the exact opposite direction.
But for reference, we don't (usually) care which register the compiler uses for which variable; we just care that it works, with no bugs. If the non-determinism of LLMs means the variable is called file, filename, fileName, or file_name, breaking with convention, why do we care? At the level Claude lets us work with code now, it's immaterial.
Compilation isn't stable. If you clear caches and recompile, you don't get a bit-for-bit exact copy, especially on today's multi-core processors, without doing extra work to get there.
One of the first things I tried to have an LLM do is transpile. These days that works really well. You find an interesting project in Python, I'm a JS guy, boom: JS version. Very helpful.
You see a business you like, boom competing business.
These are going to turn into business factories.
Anthropic has a business factory. They can make new businesses. Why do they need to sell that at all once it works?
We're focusing on a compiler implementation. Classic engineering mindset. We focus on the neat things that entertain us. But the real story is what these models will actually be doing to create value.
Can you formally verify prose?
> But yeah, they probably don't fit the bill of English based code to machine code
Which is why LLMs cannot be compilers that transform code to machine code.
Stop this. This is such a stupid way of describing mistakes from AI. Please try to use the confusion matrix or some other framing. If you're going to make arguments, it's hard to take them seriously if you keep regurgitating that LLMs hallucinate. It's not a well-defined term, so if you continually make it your core argument, it becomes disingenuous.
You can see it clearly if you just translate the article's expensive vocabulary into plain English. When the author writes, 'When you hand-build, the space of possibilities is explored through design decisions you’re forced to confront,' they are just saying, 'When you write code yourself, you have to choose how to write it.' When they claim, 'contextuality is dominated by functional correctness,' they just mean, 'Usually, we just care if the code works.' When they warn about 'inviting us to outsource functional precision itself,' they really mean, 'LLMs let you be lazy.' And finally, 'strengthening the will to specify' is just a dramatic way of saying, 'We need to write better requirements.' It is obscurantism, plain and simple: using complexity to hide the fact that the insight is trivial.
But that is just an aesthetic problem to me. Worse, the argument collapses entirely when you look at the logical leap between the premises.
The author basically argues that because Natural Language is vague, engineers will inevitably stop caring about the details and just accept whatever reasonable output the AI gives. This is pure armchair psychology. It assumes that just because the tool allows for vagueness, professionals will suddenly abandon the concept of truth or functional requirements. That is a massive, unsubstantiated jump.
We use fuzzy matching to find contacts on our phones all the time. Just because the search algorithm is imprecise doesn't mean we stop caring whether we call the right person. We don't say, 'Well, the fuzzy match gave me Bob instead of Bill, I guess I'll just talk to Bob now.' The hard constraint, the functional requirement of talking to the specific person you need, remains absolute. Similarly, in software, the code either compiles and passes the tests, or it doesn't. The medium of creation might be fuzzy, but the execution environment is binary. We aren't going to drift into accepting broken banking software just because the prompt was in English.
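The shape of the analogy in code: the lookup is fuzzy, but the acceptance check is exact. The contact names are made up for illustration:

```python
# Fuzzy lookup, exact acceptance check: approximate search does not mean
# we accept calling the wrong person.
import difflib

contacts = ["Bob Smith", "Bill Jones", "Barbara Lee"]

query = "Bil"  # sloppy, imprecise input
candidates = difflib.get_close_matches(query, contacts, n=1, cutoff=0.3)
chosen = candidates[0] if candidates else None

intended = "Bill Jones"
# The hard constraint is still binary: either it's the right contact or we bail.
assert chosen == intended, f"fuzzy match gave {chosen!r}; refusing to call them"
```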
This entire essay feels like the work of those social psychology types who have now been thoroughly discredited by the replication crisis in psychology, the ones who were more concerned with dazzling people with verbal skills than with being right. It is unnecessarily complex, relying on projections of dreamt-up concepts and behavior rather than observation. It tries to sound profound by turning a technical discussion into a philosophical crisis, but underneath the word salad it is not just shallow, it is wrong.
This feels like the same debate assembly programmers had about C in the 70s: "You don’t understand what the compiler is doing, therefore it’s dangerous." Eventually we realised the important thing isn’t how the code was authored but whether the behaviour is correct, testable, and maintainable.
If code generated by an LLM:
- passes a real test suite (not toy tests),
- meets performance/security constraints,
- goes through review like any other change,
then the acceptance criteria haven’t changed. The test suite is part of the spec. If the spec is enforced in CI, the authoring tool is secondary. The real risk isn’t "LLMs as compilers", it’s letting changes bypass verification and ownership. We solved that with C, with large dependency trees, with codegen tools. Same playbook applies here.
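A toy illustration of "the test suite is part of the spec": the slugify() below is a hypothetical example standing in for code that might have been written by hand or by an agent, and the tests are the contract CI actually enforces either way.

```python
# The spec lives in the tests. slugify() is a hypothetical example function;
# pretend its body came out of a coding agent -- CI does not care.
import re
import time
import unicodedata

def slugify(text: str) -> str:
    # However this was authored, only the properties below matter.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-") or "n-a"

def test_output_is_url_safe():
    for text in ["Hello, World!", "  spaced  out  ", "Ünïcödé"]:
        assert re.fullmatch(r"[a-z0-9-]+", slugify(text))

def test_idempotent():
    for text in ["Hello, World!", "already-a-slug"]:
        assert slugify(slugify(text)) == slugify(text)

def test_performance_budget():
    start = time.perf_counter()
    for _ in range(10_000):
        slugify("A moderately long headline, for benchmarking purposes")
    assert time.perf_counter() - start < 1.0   # crude performance constraint
```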
If you give expected input and get expected output, why does it matter how the code was written?
codingdave•1h ago
It all feels to me like the guys who make videos of using electric drills to hammer in a nail. Sure, you can do that, but it is the wrong tool for the job. Everyone knows the phrase: "When all you have is a hammer, everything looks like a nail." But we need to also keep in mind the other side of that coin: "When all you have is nails, all you need is a hammer." LLMs are not a replacement for everything that happens to be digital.
skydhash•22m ago
Often it seems like tech maximalists are the most against tech reliability.
snovv_crash•5m ago
I suggest that when they dereference a pointer, it can go a bit forward or backward in memory, as long as it is mostly correct.
blazinglyfast•52m ago
No. LLMs are undefined behavior.
xixixao•47m ago
But most LLM services deliberately introduce randomness, so you don’t get the same result for the same input you control as a user.
SecretDreams•1m ago
Humans, in all their non-deterministic brain glory, long ago realized they don't want their software to behave like their coworkers after a couple of margaritas.
CGMthrowaway•37m ago
"Deterministic" is not the the right constraint to introduce here. Plenty of software is non-deterministic (such as LLMs! But also, consensus protocols, request routing architecture, GPU kernels, etc) so why not compilers?
What a compiler needs is not determinism, but semantic closure. A system is semantically closed if the meanings of its outputs are fully defined within the system, correctness can be evaluated internally and errors are decidable. LLMs are semantically open. A semantically closed compiler will never output nonsense, even if its output is nondeterministic. But two runs of a (semantically closed) nondeterministic compiler may produce two correct programs, one being faster on one CPU and the other faster on another. Or such a compiler can be useful for enhancing security, e.g. programs behave identically, resist fingerprinting.
Nondeterminism simply means the compiler selects any element of an equivalence class. Semantic closure ensures the equivalence class is well‑defined.
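A toy sketch of that distinction: a "compiler" that is free to pick any member of an equivalence class, with a crude, test-based equivalence check standing in for real semantic closure:

```python
# Toy illustration: nondeterministic choice within a well-defined equivalence
# class. The test-based check below is a stand-in for a real decision procedure.
import random

def compile_sum():
    """Nondeterministically pick one of two semantically equivalent programs."""
    forward = lambda xs: sum(xs)
    backward = lambda xs: sum(reversed(xs))
    return random.choice([forward, backward])

def semantically_equivalent(f, g, test_inputs) -> bool:
    """The 'closure' part: equivalence is checked inside the system."""
    return all(f(xs) == g(xs) for xs in test_inputs)

tests = [[1, 2, 3], [], [10, -10, 7]]
a, b = compile_sum(), compile_sum()
assert semantically_equivalent(a, b, tests)   # different choices, same meaning
```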
moregrist•12m ago
I am not. To me that describes a debugging fiasco. I don't want "semantic closure," I want correctness and exact repeatability.
123malware321•26m ago
Or does your binary always come out differently each time you compile the same file?
You can try it: compile the same file 10 times and diff the resultant binaries.
Now try to prompt a bunch of LLMs 10 times and diff the returned rubbish.
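That experiment, scripted; it assumes gcc and a hello.c in the current directory, and hashing stands in for diffing:

```python
# Compile the same file ten times and compare the resulting binaries.
# Assumes `gcc` is installed and hello.c exists in the current directory.
import hashlib
import subprocess

hashes = set()
for i in range(10):
    subprocess.run(["gcc", "-O2", "-o", f"out_{i}", "hello.c"], check=True)
    with open(f"out_{i}", "rb") as f:
        hashes.add(hashlib.sha256(f.read()).hexdigest())

print(f"{len(hashes)} distinct binaries out of 10 builds")
# With a fixed toolchain and no embedded timestamps/paths, this prints 1.
```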
vlovich123•7m ago
There are even efforts to guarantee this for many packages on Linux. It's a core security property, because it lets you verify, by building from scratch, that the compilation process or environment wasn't tampered with illicitly.
Now, actually managing to fix all inputs and get deterministic output can be challenging, but that's less to do with the compiler and more to do with the challenge of completely controlling the entire environment: the profile you are using for PGO, paths on the build machine being injected into the binary, programs whose source or build system is non-deterministic (e.g. incorporating the build time into the binary).
9rx•17m ago
They are designed to be deterministic where temperature=0. Some hardware configurations are known to defy that assumption, but when running on perfect hardware they most definitely are.
What you call compilers are also nondeterministic on 'faulty' hardware, so...
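The temperature=0 point in miniature: greedy decoding is just an argmax over the same logits every time, while sampling at temperature > 0 is where the run-to-run variation comes from. Toy logits below, not a real model:

```python
# Greedy decoding (the temperature -> 0 limit) is deterministic by construction;
# sampling is not. Toy logits, not output from a real model.
import numpy as np

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5, 0.1])      # fixed model output for some prefix

def greedy(logits):
    return int(np.argmax(logits))             # same input -> same token, always

def sample(logits, temperature):
    p = np.exp(logits / temperature)
    p /= p.sum()
    return int(rng.choice(len(logits), p=p))  # same input -> varying tokens

print({greedy(logits) for _ in range(5)})        # always {0}
print({sample(logits, 1.0) for _ in range(5)})   # typically several values
```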