Because I want to be able to review it, and extend it myself.
edit: Pure vibe coding is a joke or thought exercise, not a goal to aspire to. Do you want to depend on a product that has not been vetted by any human? And if it is your product, do you want the risk of selling it?
I can imagine a future where AI coders and AI QA bots do all the work but we are not there yet. Besides, an expressive language with safety features is good for bots too.
I'm getting too old for this shit.
Without checks and feedback, LLMs can easily generate unsafe code. So even if they can generate C or Assembly that works, they're likely to produce code that's riddled with incorrectly handled edge cases, memory leaks, and so on.
Also, abstraction isn’t only for humans; it’s also for LLMs. Sure, they might benefit from different kinds of abstraction - but that doesn’t mean “oh, just write machine code” is the way to go.
https://cacm.acm.org/research/automatically-translating-c-to...
It also has a GC, which makes it better suited for most programs than Rust with its manual memory management.
So...yeah.
Haskell is also nice because of quickcheck.
But none of them really have enough training data for LLMs to be any good at them.
Linux wants all memory management to be explicit.
If LLMs produce code riddled with bugs in one language, they will in other languages as well. Rust isn't going to save you.
Like everything around Rust, this has been discussed ad nauseam.
Preventing memory safety bugs has a meaningful impact in reducing CVEs, even if it has no impact on logic bugs. (Which: I think you could argue the flexible and expressive type system helps with. But for the sake of this argument, let's say it provides no benefits.)
If the only concern is "can an LLM write code in this language without memory errors" then there's plenty of reasons to choose a language other than Rust.
First, Rust has lots of checks that C and assembly don't, and AI benefits from those checks. Then, a post pointing out that those checks relate to memory safety, not logic errors. Then, a post about whether that's a helpful comment. Finally, me pointing out that checks on types and memory errors aren't unique to Rust, and that there are tons of languages that could benefit.
Since you want to bring it back to the original article, here's a quote from the author:
Is C the ideal language for vibe coding? I think I could mount an argument for why it is not, but surely Rust is even less ideal. To say nothing of Haskell, or OCaml, or even Python. All of these languages, after all, are for people to read, and only incidentally for machines to execute.
It would seem that the author fundamentally misunderstands significant reasons for many of the languages he mentions being the way that they are.

Fil-C gets you close in the case of C, but we can ignore it because, of course, F* has significantly more checks than Rust, and AI benefits from those checks. Choosing Rust would be as ridiculous as choosing C if that were your motivation.
But if the need for those checks isn't what leads you to consider Rust, why not C or even assembly instead?
If you don't find importance in those checks, you wouldn't choose Fil-C anyway. But, of course, it remains that if you do find those checks to be important, you're going to use a serious programming language like F* anyway.
There is really no place for Fil-C, Rust, etc. They are in this odd place where they have too many checks to matter when you don't care about checks, but not enough checks when you do care about checks. Well, at least you could make a case for Fil-C if you are inheriting an existing C codebase and need to start concerning yourself with checks in that codebase which previously didn't have concern for them. Then maybe a half-assed solution is better than nothing. But Rust serves no purpose whatsoever.
These trade-offs are wholly unnecessary if the LLM writes the software in Rust, assuming that in principle the LLM is able to do so.
Sure, but it prevents memory safety issues, which C doesn't. As for logic bugs, what does prevent them? That's a bigger question but I'd suggest it's:
1. The ability to model your problem in a way that can be "checked". This is usually done via type systems, and Rust has an arguably good type system for this.
2. Tests that allow you to model your problem in terms of assertions. Rust has decent testing tooling but it's not amazing, and I think this is actually a strike against Rust to a degree. That said, proptest, fuzzing, debug assertions, etc, are all present and available for Rust developers.
There are other options like using external modeling tools like TLA+ but those are decoupled from your language, all you can ever do is prove that your algorithm as specified is correct, not the code you wrote - type systems are a better tool to some degree in that way.
I think that if you were to ask an LLM to write very correct code and gave it two languages, one with a powerful, expressive type system and testing utilities, and one without those, the LLM would be far more likely to produce buggy code in the language without those features.
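To make point 2 concrete, here's a minimal sketch of the kind of property test that gives an LLM (or a human) fast feedback, assuming the proptest crate is available; the function and the invariant are invented for illustration, not taken from any real codebase:

    // Illustrative only: a tiny function plus a proptest property.
    // Assumes `proptest = "1"` in Cargo.toml.

    /// Normalize non-negative scores into the 0.0..=1.0 range.
    fn normalize(scores: &[f64]) -> Vec<f64> {
        let max = scores.iter().cloned().fold(f64::MIN, f64::max);
        if max <= 0.0 {
            return vec![0.0; scores.len()];
        }
        scores.iter().map(|s| s / max).collect()
    }

    #[cfg(test)]
    mod tests {
        use super::*;
        use proptest::prelude::*;

        proptest! {
            // Property: every normalized value stays within 0.0..=1.0
            // for any non-negative input, of any length.
            #[test]
            fn normalized_values_are_bounded(
                scores in proptest::collection::vec(0.0f64..1e6, 0..100)
            ) {
                for v in normalize(&scores) {
                    prop_assert!((0.0..=1.0).contains(&v));
                }
            }
        }
    }

A compile-and-test loop like this is exactly the kind of automated check an agent can run and react to without a human in the loop.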
There are static tools available for C as well. What you mostly get from Rust is that the checks are part of the language's syntax and escaping them is very visible. You get safety, but you give up flexibility and speed.
Really? Never from limitations of the ability to express your mental model in a way that's formally verifiable? What a strong claim to make.
> There are static tools available for C as well.
For checking the semantics of the code itself? And why discount the fact that a tool being native means it's easier to adopt?
Modern sewers sometimes back up, so you might as well just relieve yourself in a bucket and dump it onto the sidewalk.
Modern food preservation doesn't prevent all spoilage so you might as well just go back to hoping that meat hasn't been sitting in the sun for too many days.
You can't get a gutter ball if you put up the rails in a bowling lane. Rust's memory safety is the rails here.
You might get different "bad code" from AI, but if it can self-validate at compile time that some code it spits out has memory management issues, that helps development. Same as with a human.
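To make the "rails" concrete, here's a small illustration of my own (not from the thread) of how ownership turns a whole class of memory bugs into compile-time errors the model is forced to fix before anything ships:

    // Ownership in action: `consume` takes ownership of the vector,
    // which is freed deterministically when the function returns.
    fn consume(v: Vec<i32>) -> usize {
        v.len()
    }

    fn main() {
        let data = vec![1, 2, 3];
        let n = consume(data); // ownership of `data` moves into `consume`

        // Uncommenting the next line is a compile error (E0382: use of
        // moved value), i.e. a use-after-free is caught before the
        // program ever runs.
        // println!("{:?}", data);

        println!("consumed {n} items");
    }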
Sure you can. It's difficult, and takes skill, but it can be done.
Nobody ever claimed that. The claims are:
1. Rust drastically reduces the chance of memory errors. (Or eliminates them if you avoid unsafe code.)
2. Rust reduces the chance of other logic errors.
Rust doesn't have to eliminate logic errors to be a better choice than C or assembly. Significantly reducing their likelihood is enough.
Although we did recently get pretty good evidence of those claims for humans and it would be very surprising if the situation were completely reversed for LLMs (i.e. humans write Rust more reliably but LLMs write C more reliably).
https://security.googleblog.com/2025/11/rust-in-android-move...
I'm not aware of any studies pointing in the opposite direction.
> it would be very surprising if the situation were completely reversed for LLMs
Lifetimes must be well-defined in safe Rust, which requires a deep degree of formal reasoning. That is the kind of complex problem analysis where LLMs are known to produce worse results than humans. Specifically in the context of security vulnerabilities, LLMs produce marginally fewer but significantly more severe issues in memory safe languages[1]. Still though, we might say LLMs will produce safer code with safe Rust, on the basis that 100,000 vibe coded lines will probably never compile.
> LLMs produce worse results than humans
We aren't talking about whether LLMs are better than humans.
Also we're obviously talking about Rust code that compiles. Code that doesn't compile is 100% secure!
> would you say "have you done a scientific study on the average journey times across the year and for different locations to know that it was actually bad"?
In response to a similarly suspiciously faulty inductive claim? Yeah, absolutely.
> We aren't talking about whether LLMs are better than humans.
The point I'm making here is specifically in response to the idea that it would "be surprising" if LLMs produced substantially worse code in Rust than they did in C. The paper I posted is merely a touch point to demonstrate substantial deviation in results in an adjacent context. Rust has lower surface area to make certain classes of vulns under certain conditions, but that's not isomorphic with the kind of behavior LLMs exhibit. We don't have:
- Guarantees LLMs will restrict themselves to operating in safe Rust
- Guarantees these specific vulnerabilities are statistically significant in comparative LLM output
- Guarantees that the vulnerability severity will be lower in Rust
Where I think you might be misunderstanding me is that this isn't a statement of empirical epistemological negativism. I'm underlining that this context is way too complex to be attempting prediction. I think it should be studied, and I hope that it's the case LLMs can write good, high quality safe Rust reliably. But specifically advocating for it on gut assumptions? No. We are advocating for safety here.
Because of how chaotic this context is, we can't reasonably assume anything here without explicit data to back it up. It's no better than trying to predict the weather based on your gut. Hence why I asked for specific data to back the claim up. Even safe Rust isn't safe from security vulnerabilities stemming from architectural inadequacies and panics. It very well may be the case that in reasonably comparable contexts, LLMs produce security vulnerabilities in real Rust codebases at the same rate they create similar vulnerabilities in C. It might also be the case that they produce low severity issues in C at a similar statistical rate as high severity issues in Rust. For instance, buffer overflows manifesting in 30% of sampled C codebases and resulting in unexploitable segfaults, vs architectural deficiencies in a safe Rust codebase, manifesting in 30% of cases, that allow exfiltration of everything in your databases without RCE. Under these conditions, I don't think it's reasonable to say Rust is a better choice.
Again, it's not a critique in some epistemological negativist sense. It's a critique that you are underestimating how chaotic this context actually is, and the knock-on effects of that. Nothing should surprise you.
But most of them don't have a nice strong type system like Rust. I have vibe coded some OCaml and that seems to work pretty well but I wouldn't want to use OCaml for other reasons.
https://security.googleblog.com/2025/11/rust-in-android-move...
That team claims that not having to deal with memory bugs saved them time. That time can be spent on other things (like fixing logic errors)
It makes me imagine a programming language designed for LLMs but not humans, designed for rigorous specification of every function, variable, type, etc., valid inputs and outputs, tightly coupled to unit tests, mandatory explicit handling of every exception, etc.
Maybe it'll look like a lot of boilerplate but make it easy to read as opposed to easy to write.
The idea of a language that is extremely high-effort to write, but massively assists in guaranteeing correctness, could be ideal for LLMs.
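You can approximate that flavor in today's Rust. A rough sketch (the names and rules are invented for illustration): every input constraint is spelled out, every failure mode is an explicit variant, and the contract ships with a colocated test.

    #[derive(Debug, PartialEq)]
    enum WithdrawError {
        ZeroAmount,
        InsufficientFunds { balance: u64, requested: u64 },
    }

    /// Contract: `amount` must be > 0 and <= `balance`.
    /// Returns the new balance or an explicit error; no panics, no silent clamping.
    fn withdraw(balance: u64, amount: u64) -> Result<u64, WithdrawError> {
        if amount == 0 {
            return Err(WithdrawError::ZeroAmount);
        }
        if amount > balance {
            return Err(WithdrawError::InsufficientFunds { balance, requested: amount });
        }
        Ok(balance - amount)
    }

    #[cfg(test)]
    mod tests {
        use super::*;

        #[test]
        fn every_branch_of_the_contract_is_exercised() {
            assert_eq!(withdraw(100, 40), Ok(60));
            assert_eq!(withdraw(100, 0), Err(WithdrawError::ZeroAmount));
            assert_eq!(
                withdraw(10, 40),
                Err(WithdrawError::InsufficientFunds { balance: 10, requested: 40 })
            );
        }
    }

Verbose for a human to type, but trivially cheap for an LLM to emit and easy for a reviewer to read.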
I think the ideal language for LLMs will look more like APL than C.
I'm thinking of generative tests (QuickCheck style), fuzzing, erroring on invariant violations, and contract testing (see the test.contract Clojure library for a very cool contract test setup!).
Really really good test suites can do stuff that even logically verified programs can’t do. They’re just a pain in the ass to write. Seems like a good use of LLMs, and you can keep using the same languages!
We're at the point of diminishing returns from scaling and RL is the only way to see meaningful improvements
Very hard to improve much via RL without some way to tell if the code works without requiring compilation
Logic-based languages like Prolog take this to the logical extreme; I would love to see people revisit that idea.
Look at ShellCheck. It turns a total newbie into a shell master just by iteration.
Checking preconditions and postconditions is much easier to do for a human than checking an implementation
The thing that would really make sense is a proof-oriented language like Coq, or a modeling language like Promela.
You can then really just leave the implementation to the AI.
Bingo. LLMs are language models, not models of software systems. Everything gets translated through natural language! So the quality of the abstraction still matters: code that can be described well in plain language wins.
No, it absolutely doesn't. We've seen so much vibe coded slop that it's very clear that vibe coding produces a hot mess which no self respecting person would call acceptable. No idea how you can say this as it isn't remotely true.
The two recent IT catastrophes [0] from Alaska Airlines will continue elsewhere.
[0] https://www.seattletimes.com/business/alaska-airlines/alaska...
Now they have two problems....
I was really hoping you were going to make this argument, based upon the title of the piece! Still a good read, but if you have the inclination I hope you loop back around to weighing the pros and cons of vibe coding in different languages
That has not been my experience when using Codex, Composer, Claude, or ChatGPT.
Things have just gotten to the point over the last year that the undefined behavior, memory safety, and thread safety violations are subtler and not as blindingly obvious to the person auditing the code.
But I guess that's my problem, because I'm not fully vibing it out.
Why are some of you so resistant to admit that LLMs hallucinate? A normal response would be "Oh yeah, I have issues with that sometimes too, here's how I structure my prompts." Instead you act like you've never experienced this very common thing before, and it makes you sound like a shill.
As to why not use C, or assembly, it’s not just about the code, but the toolchains. These require way more knowledge and experience to get something working than, say, Python - although that has its own rather horrible complexities with packaging and portability on the back end of the code authoring process.
A legitimate point, there are lots of performance and fine grain changes you can make, and it's a simple, common language many people use. Perhaps we could realize some of these benefits from a simple, fast language.
> Or hell, why not do it in x86 assembly?
A terrible take imo. This would be impossible to debug and it's complex enough you likely won't see any performance improvements from writing in assembly. It's also not portable, meaning you'd have to rewrite it for every OS you want to compile on.
I think there's an argument that if machines are writing code, they should write for a machine-optimized language. But even using this logic I don't want to spend a bunch of time and money writing for multiple architectures, or debugging assembly when things go wrong.
Debugging machine code is only bad because of poor tooling. Surely if vibe coding to machine code works we should be able to vibe code better debuggers. Portability is a non issue because the llm would have full semantic knowledge of the problem and would generate optimal, or at least nearly optimal, machine code for any known machine. This would be better, faster and cheaper than having the llm target an intermediate language, like c or rust. Moreover, they would have the ability to self-debug and fix their own bugs with minimal to no human intervention.
I don't think there is widespread understanding of how bloated and inefficient most real world compilers (and build systems) are, burning huge amounts of unnecessary energy to translate high level code, written by humans who have their own energy requirements, to machine code. It seems highly plausible to me that better llms could generate better machine code for less total energy expenditure (and in theory cost) than the human + compiler pair.
Of course I do not believe that any of the existing models are capable of doing this today, but I do not have enough expertise to make any claims for or against the possibility that the models can reach this level.
But code doesn’t only need to be understood for maintenance purposes: code is documentation for business processes. It’s a thing that needs to be understandable and explainable by humans anytime the business process is important.
LLMs can never / should never replace verifiability, liability, or value judgment.
We've not really seen what impact this will have just yet.
It produces hot garbage when it needs to bring together two tokens from the far ends of a large code base.
This comes as no surprise to anyone who understands what the attention mechanism actually is, and as a great surprise to everyone who thinks transformers are AI magic.
> But this leads me to my second point, which I must make as clearly and forcefully as I can. Vibe coding actually works. It creates robust, complex systems that work. You can tell yourself (as I did) that it can’t possibly do that, but you are wrong. You can then tell yourself (as I did) that it’s good as a kind of alternative search engine for coding problems, but not much else. You are also wrong about that. Because when you start giving it little programming problems that you can’t be arsed to work out yourself (as I did), you discover (as I did) that it’s awfully good at those. And then one day you muse out loud (as I did) to an AI model something like, “I have an idea for a program…” And you are astounded. If you aren’t astounded, you either haven’t actually done it or you are at some stage of grief prior to acceptance. Perfect? Hardly. But then neither are human coders. The future? I think the questions answers itself.
This cannot be repeated enough. For all the AI hype, if you think AI isn't the most useful programming tool invented in the last 20 years, you're ignorant of the SOTA or deeply in denial.
As @tptacek recently wrote:
> All progress on LLMs could halt today, and LLMs would remain the 2nd most important thing to happen over the course of my career.
Do you have any examples of these? All the vibe coded systems I've seen so far were very far from robust.
https://news.ycombinator.com/item?id=45549434
† this is a joke; i feel it works on multiple levels
edit: it was a real request; I was genuinely interested, not mocking or anything.
After having understood the context, I still believe that a strongly typed language would be a much better choice of a language, for exactly the same reason why I wouldn't recommend starting a project in C unless there is a strong preference (and even then Rust would probably be better still).
LLMs are not perfect, just like humans, so I would never vibe code in any environment other than one in which many or most logical errors simply won't compile.
Not sure if C is worse than python/js in that respect (I'd argue it is better for some and worse for others, regarding safety) but Java, Swift, C#, Go, Rust, etc. are great languages for vibe coding since you have the compiler giving you almost instant feedback on how well your vibe coding is going.
I wouldn't trust it to reliably write safe C though. It works in Rust because there's meaning embedded into the types that are checked by the compiler that gives it feedback when it makes mistakes. But you don't get that in C.
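A small sketch of my own (not from the thread) of what "meaning embedded into the types" looks like in practice; in C both of these would just be integers, and swapping them would compile fine:

    #[derive(Debug, Clone, Copy)]
    struct UserId(u64);

    #[derive(Debug, Clone, Copy)]
    struct OrderId(u64);

    fn cancel_order(user: UserId, order: OrderId) {
        println!("user {} cancels order {}", user.0, order.0);
    }

    fn main() {
        let user = UserId(7);
        let order = OrderId(42);

        cancel_order(user, order); // fine

        // cancel_order(order, user); // error[E0308]: mismatched types,
        // the swapped arguments are caught at build time.
    }

That compile error is exactly the kind of feedback loop the model can act on automatically.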
I would appreciate a post with examples, not just prose. It helps to put things in a more grounded reality.
A) I barely get to do any coding these days anyways
B) Reading code is harder than writing it (and thus, easier to gloss over), and by the time I'm ready to write code I've already done all the hard work (i.e., even if vibe coding made me 50% faster, it's 50% of 5% of the overall software development life cycle in this more senior role)
C) I've never even copied code from Stack Overflow into my editor (maybe once or twice in a couple decades), I always type things myself because it literally forces you to walk through character by character in a field where changing one character can easily lead to 8 hour bug hunts
D) There's probably not much world where I can't ramp up fairly quickly on how to prompt well
E) It seems to me everyone is spending all their time comparing model X with model Y, creating prompt context files, running multiple agents in parallel... if the purported gains are to occur, eventually we should have tools that require less of all that, and I can just use those later tools when they're around instead of learning a bunch of stuff that will naturally be useless (This is like if you became a Backbone JS expert and were left stunned when people started using React)
F) And if those gains don't occur (and the gains certainly seem to be leveling off quick, the comments today look much like the comments a few years ago, and I've really not seen much one way or the other when comparing a variety of coworkers in terms of productivity beyond POCs, and the starts of small scope green field projects (although, those can be accomplished by non technical people in some instances which is neat)) then... well... I guess I'll just keep doing what I've been doing for the last couple decades, but I won't have wasted a bunch of time learning how to prompt Grok vs Copilot vs ChatGPT or what ever and I'll still have tons of information in my head about how everything works
I think strong static typing probably? Which is, well, not javascript in fact! (And I have bucked the trend on this previously, liking ruby -- but I'm not sure I'd want AI-generated code without it?)
None of our existing programming languages were designed for quite the circumstance in which contemporary programming now finds itself; they all address an ergonomic situation in which there are humans and machines (not humans, machines, and LLMs).
It's possible, I suppose that the only PL that makes sense here is the one the LLMs "knows" best, but I sort of doubt that that makes sense over the long term. And I'm repeating myself, but really, it seems to me that a language that was written entirely for the ergonomic situation of human coders without any consideration of LLMs is not addressing the contemporary situation. This is not a precise analogy, but it seems to me a little like the difference between a language that was designed before vs after multicore -- or before vs after the internet.
So even if you make a better programming language for a LLM, it has nothing to train on. Unless we start to transcode human language code to the LLM code.
Are the vectors/tokens/whatever not already LLM code at this point? Technically, aren't LLMs doing what Haxe (haxe.org) was doing, but in a more advanced form?
Even if we make a more LLM-like programming language, in a sense we are just making another language that needs to be translated into the tokens that an LLM model consists of, no?
Feels like we are starting to hit philosophical debates with that one lol
My experience with LLMs is that they are not good at tracking resources and perform much better with languages that reduce cognitive load for humans.
Clojure generation is also very solid. Gemini Pro 2.5/3 is fantastic at it.
A part of me wonders if that is because these languages primarily have senior devs writing code, so the entire training set is "good" code.
When I would explore Elixir forums with much larger communities there'd be myriad base-level questions with code blocks written as if Elixir and Ruby were interchangeable because the syntax looks similar, thus missing out on many of the benefits of OTP.
But when you'd go to the Erlang community to ask a question, half the time the author of the book or library was one of the like... 20 people online at any given moment, and they'd respond directly. The quality of the discussions was of course much deeper and substantial much more consistently.
I have not tried to generate Elixir vs Erlang code but maybe it'd be a neat experiment to see if the quality seems better with Erlang
The difference is, we forgive humans for needing iteration. We expect them to get it wrong first, improve with feedback, and learn through debugging. But when AI writes imperfect code, you declare the entire approach fraudulent?
We shouldn't care about flawless one-shot generations. The value is in collapsing the time between idea and execution. If a model can give you a working draft in 3 seconds - even if it's 80% right - that's already a 10x shift in how we build software.
Don't confuse the present with the limit. Eventually, in not that many years, you'll vibe in English, and your AI co-dev will do the rest.
In my work, the bigger bottleneck to productivity is that very few people can correctly articulate requirements. I work in backend API development, which is completely different from full-stack development. If you ask PMs about backend requirements, they will dodge you, and if you ask front-end or web developers, they are waiting for you to provide them the API. The hardest part is understanding the requirements. It's not because of illiteracy. It's because software development is a lot more than coding and requires critical thinking to discover the requirements.
this already exists.
This is how most non-technical stakeholders feel when you probe for consistent, thorough requirements and a key professional skill for many more senior developers and consultants is in mastering the soft skills that keep them attentive and sufficiently helpful. Those skills are not generic sycophancy, but involve personal attunement to the stakeholder, patience (exercising and engendering), and cycling the right balance between persistence and de-escalation.
Or do you just mean there will be some PM who acts as a proxy for the stakeholder on the ticket, but still needs to get them onto the phone and into meetings so the answers can be secured?
Because in the real world, the former is outlandish and the latter doesn't gain much.
It can make a vague ticket precise and that can be an easy platform to have discussions with stakeholders.
Thank you for sharing this workflow. I have low tolerance for LLM written text, but this seems like a really good use case.
I find having an LLM create tickets for itself to implement to be an effective tool that I rarely have to provide feedback for at all.
This seems like greybeards complaining about people who don't write assembly by hand.
Stop being outraged about things that are only real in your mind.
Am I outraged?
And yes, there absolutely was a vocal group of a certain type of programmer complaining about high level languages like C and their risks and inefficiency and lack of control insisting that real programmers wrote code in assembly. It's hard to find references because google sucks these days and I'm not really willing to put in the effort.
It's not surprising that Google doesn't turn these up, the golden era of this complaining was pre-WWW.
[0]: https://www.ee.torontomu.ca/~elf/hack/realmen.html [1]: https://melsloop.com/
Mel or Terry Adams are the exception to the rule... Having that image of greybeards only comes if you have never worked with one in real life; sorry, you are biased.
And yes, the shift to higher level languages like C, FORTRAN, etc., was regarded by some as pandering to the new generation that didn't want to actually learn programming.
With some truth, in my opinion. I think higher level languages bring huge benefits, so I'm not bemoaning their existence. But it still weirds me out when there's a professional developer that doesn't have at least a cursory knowledge of assembly. AI programming assistance (which I'm sure will be very different than today's 'vibe coding') does seem like a similar state change. I certainly don't object to it in principle, it will probably be a large productivity improvement.
But I'm sure that with it, there will be the loss of fundamental knowledge for some people. Like digital artists who never learn the properties of real paint.
"Wait until you learn that most people's writing skills are that of below LLMs"
... went askew at "that of below LLMs".
I'm an arse: soz!
As long as you are also paying attention to the content and not just form.
From time to time I have talked over a ticket with an LLM and gotten back what I think is a useful analysis of the problem and put it into the text or comments and I find my peeps tend to think these are TLDR.
An LLM will be just as verbose as you ask it to be. The default response can be very chatty, but you can figure out how to ask it to give results in various lengths.
The guy is also a complete tool. I'd point out that what he described wasn't actually what they needed, and that the functionality was ... strange and didn't actually do anything useful. We'd be told to just do as we were being told, seeing as they were the ones paying the bills. Sometimes we'd read between the lines and just deliver what was actually needed; then we'd be told to just do as we were told next time, and they'd then use the code we wrote anyway. At some point we got tired of the complaining and just did exactly as the tasks described, complete with tests that showed that everything worked as specified. Then we were told that our deliveries didn't work, because that wasn't what they'd asked for, but they couldn't tell us where we'd misunderstood the Jira task. Plus the tests showed that the code functioned as specified.
Even if the Jira tasks are in a state where it seems like you could feed them directly to an LLM, there's no context (or incorrect context) and how is a chatbot to know that the author of the task is a moron?
Maybe for the most mundane, repetitive tasks that's true.
But I'd argue that the code is the full specification, so if you're going to fully specify it you might as well just write the code and then you'll actually have to be confronted with your mistaken assumptions.
Does it matter?
The chatbot could deliver exactly what was asked for (even if it wasn't what was needed) without any angst or interpersonal issues.
Don't get me wrong. I feel you. I've been there, done that.
OTOH, maybe we should leave the morons to their shiny new toys and let them get on with specifying enough rope to hang themselves from the tallest available structure.
Agentic AI can now do 20 rounds of lobbying with all stakeholders, as long as it's over something like Slack.
A) As stated by the parent comment, the ones doing requirements management are doing a poor job of abstracting the requirements, and what could be done as one feature suddenly turns into 25.
B) In a similar manner to A, all solutions imply writing more and more code, never refactoring or abstracting parts away.
When you start getting down into the weeds, there can be tons and tons of little details around state maintenance, accessibility, edge cases, failure modes, alternate operation modes etc.
That all combines to make lots of code that is highly interconnected, so you need to write even more code to test it. Sometimes much more than even the target implementation's code.
Most software products built that way seem to move fast at first but become monstrous abominations over time. If those are the only places you keep finding yourself in, be careful!
As a stupid example, I hate the functionality that YouTube has to maintain playlists. However, I don't have the time to build something by hand. It turns out that the general case is hard, but the "for me" case is vibe codable. (Yes, I could code it myself. No, I'm not going to spend the time to do so.)
Or, using the Jira API to extract the statistics I need instead of spending a Thursday night away from the family or pushing out other work.
Or, any number of tools that are within my capabilities but not within my time budget. And there's more potential software that fits this bill than software that needs to be bridge-stable.
But the person I replied to seemed to be talking about a task agenda for their professional work, not a todo list of bespoke little weekend hobby hacks that might be handy "around the house".
Work is finite, but there can be vastly more available than there are employees to do it for many reasons, not just my personal case.
source: been there, done some of that.
I'd say 25% of my work-hours are just going around to stakeholders and getting them to say what some of their unstated assumptions and requirements are.
I find that LLMs boost my productivity because I've always had a sort of architectural mindset: I love looking up projects that solve specific problems and keeping them in the back of my mind. It turns out I was building myself up for instructing LLMs on how to build me software, and they take several months' worth of effort and spit it out in a few hours.
Speaking of vibe coding in archaic languages, I'm using LLMs to understand old Shockwave Lingo to translate it to a more modern language, so I can rebuild a legacy game in a modern language. Maybe once I spin up my blog again I'll start documenting that fun journey.
- SSL 2.0-TLS 1.1, HTTP/0.9-HTTP/1.1, ftp, WAIS, gopher, finger, telnet, rwho, TinyFugue MUD, UUCP email, SHOUTcast streaming some public domain radio whatever
- <blink>, <marquee>, <object>, XHTML, SGML
- Java <applet>, Java Web Start
- MSJVM/J++, ActiveX, Silverlight
- Flash, Shockwave (of course), Adobe Air
- (Cosmo) VRML
- Joke ActiveX control or toolbar that turns a Win 9x/NT-XP box into a "real" ProgressBar95. ;)
(Gov't mandated PSA: Run vintage {good,bad}ness with care.)
edit: I think i found it https://news.ycombinator.com/item?id=45783640
Well, I think we can say C is archaic when most developers write in something that, for one, isn't C; two, isn't a language itself written in C; and three, isn't running on something written in C :)
C++: JavaScript (V8), Java, C#
C: Python, PHP, Lua, Ruby
Self-hosted: Go, Rust
Far from archaic indeed. We're still living in the C/C++ world.

Then, depending on which JVM implementation we are talking about, the actual JVM runtime can be Java, C, or C++, or a mix of them.
Modern C compilers are written in C++.
Rust uses LLVM, written in C++.
lol
> How do I express this code in Typescript?
it's
> What is the best way to express this idea in a way that won't confuse or anger our users? Where in the library should I put this new idea? Upstream of X? Downstream of Y? How do I make it flexible so they can choose how to integrate this? Or maybe I don't want to make it flexible - maybe I want to force them to use this new format?
> Plus making sure that whatever changes I make are non-breaking, which means that if I update some function with new parameters, they need to be made optional, so now I need to remember, downstream, that this particular argument may or may not be `undefined` because I don't want to break implementations from customers who just upgraded the most recent minor or patch version
The majority of the problems I solve are philosophical, not linguistic
Even if you don't let it author or write a single line of code, from collecting information, inspecting code, reviewing requirements, reviewing PRs, finding bugs, hell, even researching information online, there are so many things it does well and fast that if you're not leveraging it, you're either in denial or have AI skill issues, period.
Even if you limit your AI experience to finding information online through deep research it's such a time saver and productivity booster that makes a lot of difference.
The list of things it can do for you is massive, even if you don't have it write a single line of code.
Yet the counter argument is like "bu..but..my colleague is pushing slop and it's not good at writing code for me". Come on, then use it for the things it's good at, not the things where you don't find it satisfactory.
AI multiplied the amount of code I committed last month by 5x and it's exactly the code I would have written manually. Because I review every line.
model: Claude Sonnet 3.5/4.5 in VSCode GitHub Copilot. (GPT Codex and Gemini are good too)
There is in my case because it's just CRUD code. The pattern looks exactly like the code I wrote the month prior.
And this is where LLMs excel, in my experience. "Given these examples, extrapolate to these other cases."
When I was early in my use of it I would say I sped up 4x, but now, after using it heavily for a long time, some days it's 20%, other days -20%.
It's a technology where it's very difficult to know whether you're in one category or the other.
The real thing to note is when you "feel" lazy and using AI you are almost certainly in the -20% category. I've had days of not thinking and I have to revert all the code from that day because AI jacked it up so much.
To get that speed up you need to be truly focused 100% or risk death by a thousand cuts.
- I think given public available metrics, it's clear that this isn't translating into more products/apps getting shipped. That could be because devs are now running into other bottlenecks, but it could also indicate that there's something wrong with these studies.
- Most devs who say AI speeds them up assert numbers much higher than what those studies have shown. Much of the hype around these tools is built on those higher estimates.
- I won't claim to have read every study, but of the ones I have checked in the past, the more the methodology impressed me the less effect it showed.
- Prior to LLMs, it was near universally accepted wisdom that you couldn't really measure developer productivity directly.
- Review is imperfect, and LLMs produce worse code on average than human developers. That should result in somewhat lowered code quality with LLM usage (although that might be an acceptable trade off for some). The fact that some of these studies didn't find that is another thing that suggests there are shortcomings in said studies.
I am not sure how much is just programmers saying "10x" because that is the meme, but if at all realistic numbers are mentioned, I see people claiming 20 - 50%, which lines up with the studies above. E.g. https://news.ycombinator.com/item?id=45800710 and https://news.ycombinator.com/item?id=46197037
> - Prior to LLMs, it was near universally accepted wisdom that you couldn't really measure developer productivity directly.
Absolutely, and all the largest studies I've looked at mention this clearly and explain how they try to address it.
> Review is imperfect, and LLMs produce worse code on average than human developers.
Wait, I'm not sure that can be asserted at all. Anecdotally not my experience, and the largest study in the link above explicitly discuss it and find that proxies for quality (like approval rates) indicate more improvement than a decline. The Stanford video accounts for code churn (possibly due to fixing AI-created mistakes) and still finds a clear productivity boost.
My current hypothesis, based on the DORA and DX 2025 reports, is that quality is largely a function of your quality control processes (tests, CI/CD etc.)
That said, I would be very interested in studies you found interesting. I'm always looking for more empirical evidence!
Most of those studies either measure productivity using useless metrics like lines of code or number of PRs, or have participants working for organizations that are heavily invested in the future success of AI.
One of my older comments addressing a similar list of studies: https://news.ycombinator.com/item?id=45324157
For instance, when measuring the number of PRs, they ensure that each one goes through the same review process whether AI-assisted or not, ensuring these PRs meet the same quality standards as human-written ones.
Furthermore, they did this as a randomly controlled trial comparing engineers without AI to those with AI (in most cases, the same ones over time!) which does control for a lot of the issues with using PRs in isolation as a holistic view of productivity.
>... whose participants are working for organizations that are heavily invested in future success of AI.
That seems pretty ad hom, unless you want to claim they are faking the data. Along with co-authors who are from premier institutes like NBER, MIT, UPenn, Princeton, etc.
And here's the kicker: they all converge on a similar range of productivity boost, such as the Stanford study:
> https://www.youtube.com/watch?v=tbDDYKRFjhk (from Stanford, not an RCT, but the largest scale with actual commits from 100K developers across 600+ companies, and tries to account for reworking AI output. Same guys behind the "ghost engineers" story.)
The preponderance of evidence paints a very clear picture. The alternative hypothesis is that ALL these institutes and companies are colluding. Occam's razor and all that.
IME most people claim small integer multiples, 2-5x.
> all the largest studies I've looked at mention this clearly and explain how they try to address it.
Yes, but I think pre-AI virtually everyone reading this would have been very skeptical about their ability to do so.
> My current hypothesis, based on the DORA and DX 2025 reports, is that quality is largely a function of your quality control processes (tests, CI/CD etc.)
This is pretty obviously incorrect, IMO. To see why, let's pretend it's 2021 and LLMs haven't come out yet. Someone is suggesting no longer using experienced (and expensive) first world developers to write code. Instead, they suggest hiring several barely trained boot camp devs (from low cost of living parts of the world so they're dirt cheap) for every current dev and having the latter just do review. They claim that this won't impact quality because of the aforementioned review and their QA process. Do you think that's a realistic assessment? If and on the off chance you think it is, why didn't this happen on a larger scale pre-LLM?
The resolution here is that while quality control is clearly important, it's imperfect, ergo the quality of the code before passing through that process still matters. Pass worse code in, and you'll get worse code out. As such, any team using the method described above might produce more code, but it would be worse code.
> the largest study in the link above explicitly discuss it and find that proxies for quality (like approval rates) indicate more improvement than a decline
Right, but my point is that that's a sanity check failure. The fact that shoving worse code at your quality control system will lower the quality of the code coming out the other side is IMO very well established, as is the fact that LLM generated code is still worse than human generated (where the human knows how to write the code in question, which they should if they're going to be responsible for it). It follows that more LLM code generation will result in worse code, and if a study finds the opposite it's very likely that it made some mistake.
As an analogy, when a physics experiment appeared to find that neutrinos travel faster than the speed of light in a vacuum, the correct conclusion was that there had almost certainly been a problem with the experiment, not that neutrinos actually travel faster than the speed of light. That was indeed the explanation. (Note that I'm not claiming that "quality control processes cannot completely eliminate the effect of input code quality" and "LLM generated code is worse than human generated code" are as well established as relativity.)
That's not quite true: while everybody acknowledged it was folly to measure absolute individual productivity, there were aggregate metrics many in the industry were aligning on like DORA or the SPACE framework, not to mention studies like https://dl.acm.org/doi/abs/10.1145/3540250.3558940
Similarly, many of these AI coding studies do not look at productivity on an individual level at a point of time, but in aggregate and over an extended period of time using a randomized controlled trial. It's not saying Alice is more productive than Bob, it's saying Alice and Bob with AI are on average more productive than themselves without AI.
> They claim that this won't impact quality because of the aforementioned review and their QA process. Do you think that's a realistic assessment? If and on the off chance you think it is, why didn't this happen on a larger scale pre-LLM?
Interestingly, I think something similar did happen pre-LLM at industry-scale! My hypothesis (based on observations when personally involved) is that this is exactly what allowed offshoring to boom. The earliest attempts at offshoring were marked by high-profile disasters that led many to scoff at the whole idea. However companies quickly learned and instituted better processes that basically made failures an exception rather than the norm.
I expand a bit more and draw parallels to coding with AI here: https://news.ycombinator.com/item?id=44944717
> ... as is the fact that LLM generated code is still worse than human generated...
I still don't think that can be assumed as a fact. The few studies I've seen find comparable outcomes, with LLMs actually having a slight edge in some cases, e.g.
Feel free to cite said data you've seen supporting this argument.
For some things LLMs are like magic. For other things LLMs are maddeningly useless.
The irony to me is that anyone who says something like "you don't know how to use the LLM" actually hasn't explored the models enough to understand their strengths/weaknesses and how random and arbitrary those strengths and weaknesses are.
Their use cases happen to line up with the strengths of the model, and they think it is something special they are doing themselves when it is not.
Meanwhile, if you grift hard enough, you can become CEO of a trillion dollar company or President of the United States. Young people are being raised today seeing that you can raise billions on the promise building self driving cars in 3 years, not deliver even after 10 years, and nothing bad actually happens. Your business doesn't crater, you don't get sued into oblivion, your reputation doesn't really change. In fact, the bigger the grift, the more people are incentivized to prop it up. Care and professionalism are dead until we go back to an environment that is not so nurturing for grifts.
As a Professor of English who teaches programming to humanities students, the writer has had an extremely interesting and unusual academic career [1]. He sounds awesome, but I think it's fair to suggest he may not have much experience of large scale commercial software development or be particularly well placed to predict what will or will not work in that environment. (Not that he necessarily claims to, but it's implicit in strong predictions about what the "future of programming" will be.)
That said, I think people really under appreciate how diverse programmers actually are. I started in physics and came over when I went to grad school. While I wouldn't expect a physicist to do super well on leetcode problems I've seen those same people write incredible code that's optimized for HPC systems and they're really good at tracing bottlenecks (it's a skill that translates from physics really really well). Hell, the best programmer I've ever met got that way because he was doing his PhD in mechanical engineering. He's practically the leading expert in data streaming for HPC systems and gained this skill because he needed more performance for his other work.
There's a lot of different types of programmers out there but I think it's too easy to think the field is narrow.
sigh
For my part, I'm a lot older than you and don't consider myself old. Indeed, I think prematurely thinking of yourself as old can be a pretty bad mistake, health-wise.
I guess the median age of YCombinator cohorts is <30 ?
I'm 62, and I'm not old yet, you're just a kid. ;-)
Seriously, there are some folks here who started on punch cards and/or paper tape in the 1960s.
Some magic tricks are unimpressive when you know how they are done. But that's not true for all of them. Some of them only become more and more impressive, only truly being able to be appreciated by other masters. The best magic tricks don't just impress an audience, they impress an audience of magicians.
I guess I am reaching Gandalf status then. :)
The 30s is the first decade of life that people experience where there are adults younger than them. This inevitably leads people in their 30s to start saying that they are "old" even though they generally have decades of vigor ahead of them.
That's absolutely not true. It was awkwardly funny to read that.
That is the strangest thing I've heard today.
From the author's about page:
> I discovered digital humanities (“humanities computing,” as it was then called) while I was a graduate student at the University of Virginia in the mid-nineties. I found the whole thing very exciting, but felt that before I could get on to things like computational text analysis and other kinds of humanistic geekery, I needed to work through a set of thorny philosophical problems. Is there such a thing as “algorithmic” literary criticism? Is there a distinct, humanistic form of visualization that differs from its scientific counterpart? What does it mean to “read” a text with a machine? Computational analysis of the human record seems to imply a different conception of hermeneutics, but what is that new conception?
I very much enjoy the act of programming, but I'm also a professional software developer. Incidentally, I've almost always worked in fields where subtly wrong answers could get someone hurt or killed. I just can't imagine either giving up my joy in the former case or abdicating my responsibility to understand my code in the latter.
And this is why the wood working analogy falls down. The scale at which damage can occur due to the decision to use power tools over hand tools is, for most practical purposes, limited to just myself. With computers, we can share our fuck ups with the whole world.
The advantage of hand coded solutions is that the author of the code has some sense of what the code really does and so is a proxy for transparency, vibe coded solutions not so much.
I mean, it is 2025 and customers are still the best detectors of bad software, better than all quality apparatus to date.
we vibe requirements to our ticket tracker with an api key, vibe code ticket effort, and manage the state of the tickets via our commits and pull requests and deployments
just teach the guy the product manager is shielding you from not to micromanage and all the frictions are gone
in this same year I've worked at an organization that didn't allow AI use at all, and by Q2, Co-Pilot was somehow solving their data security concerns (gigglesnort)
in a different organization none of those restrictions are there and the productivity boost is through an order of magnitude greater
I was thinking more that the human would tell the machine what to make. The machine would help flesh out the idea into actual requirements, and make any decisions the humans are too afraid or indecisive to make. Then the coding can start.
I've found the same way. I just published an AI AUP for my company and most of it is teaching folks HOW to use AI.
I'm the last guy to be enthused about any "ritualistic" seeming businessy processes. Just let me code...
However, some things do need actual, well-defined, adhered-to processes where all parties are aware of and agree with the protocol.
Sure, there are the overhypers who talk about software engineers getting entirely replaced, but I get the sense those are not people who've ever done software development in their lives. And I have not seen any credible person claiming that engineering as whole can be done by AI.
On the other hand, the most grounded comments about AI-assisted programming everywhere are about the code, and maybe some architecture and design aspects. I personally, along with many other commenters here and actual large-scale studies, have found that AI does significantly boost coding productivity.
So yes, actual software engineering is much more than coding. But note that even if coding is, say, only 25% of engineering (there are actually studies about this), putting a significant dent in that is still a huge boost to overall productivity.
The converse is that if vibe coding is the future, that means we assume there are things the AI cannot do well (such as come up with requirements), at which point it's also likely it cannot actually vibe code that well.
The general problem is that once we start talking about imagined AI capabilities, both the capabilities and the constraints become arbitrary. If we imagine an AI that does X but not Y, we could just as easily imagine an AI that does both X and Y.
But I think it is certainly possible that we reach a point/plateau where everything is just 'english -> code' compilation but that 'vibe coding' compilation step is really really good.
Ideation at the working PM level, sure. I meant more hard technical ideation - ie. what gets us from 'not working humanoid robot' to 'humanoid robot' or 'what do we need to do to get a detection of a higgs boson', etc. etc. I think it is possible to imagine a world where 'english -> code' (for reasonably specific english) is solved but not that level of ideation. If that level of ideation is solved, then we have ASI.
One: English is terribly non-prescriptive. Explaining an algorithm is incredibly laborious in spoken language and can contain many ambiguous errors. Try reading Euclid’s Elements. Or really any pre-algebra text and reproduce its results.
Fortunately there’s a solution to that. Formal languages.
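For a concrete example of that contrast: Euclid's procedure for finding a greatest common measure takes paragraphs to state unambiguously in prose, while the formal version is a handful of lines (a minimal sketch):

    // Euclid's algorithm, stated formally instead of in prose.
    fn gcd(mut a: u64, mut b: u64) -> u64 {
        while b != 0 {
            let r = a % b; // replace the pair (a, b) with (b, a mod b)
            a = b;
            b = r;
        }
        a
    }

    fn main() {
        assert_eq!(gcd(48, 18), 6);
        println!("gcd(48, 18) = {}", gcd(48, 18));
    }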
Now LLMs can somewhat bridge that gap due to how frequently we write about code. But it’s a non-deterministic process and hallucinations are by design. There’s no escaping the fact that an LLM is making up the code it generates. There’s nothing inside the machine that is understanding what any of the data it’s manipulating means or how it affects the system it’s generating code for.
And it’s not even a tool.
Worse, we can’t actually ship the code that gets generated without a human appendage to the machine to take the fall for it if there are any mistakes in it.
If you’re trying to vibe code an operating system and have no idea what good OS design is or what good code for such a system looks like… you’re going to be a bad appendage for the clanker. If it could ship code on its own the corporate powers that be absolutely would fire all the vibe coders and you’d never work again.
Vibe coding is turning people into indentured corporate servants. The last mile delivery driver of code. Every input surveilled and scrutinized. Output is your responsibility and something you have little control over. You learn nothing when the LLM gives you the answer because you’ll forget it tomorrow. There’s no joy in it either because there is no challenge and no difficulty.
I think what pron is leading to is that there’s no need to imagine what these machines could potentially do. I think we should be looking at what they actually do, who they’re doing it to, and who benefits from it.
But there doesn’t seem to be any off ramp, given the incentives of our current economic system.
Do you have evidence or empirical arguments to the contrary?
They already can brainstorm new features and make roadmaps. If you give them more context about the business strategy/goals then they will make better guesses. If you give them more details about the user personas / feedback / etc they will prioritize better.
We're still just working our way up the ladder of systematizing that context, building better abstractions, workflows, etc.
If you were to start a new company with an AI assistant and feed it every piece of information (which it structures, summarizes, synthesizes, etc., in a systematic way), even with finite context it's going to be damn good. I mean just imagine a system that can continuously read and structure all the data from regular news, market reports, competitor press releases, public user forums, sales call transcripts, etc etc. It's the dream of "big data".
There are plenty of reasons.
Radiologists aren’t being replaced by AI because of liability. Same for e.g. civil engineers. Coders don’t have liability for shipping shit code. That makes switching to an AI that’s equally blameless easier.
Also, data: the web is first and foremost a lot of code. AI is getting good at coding first for good reason.
Finally, as OP says, the hard work in engineering is actually scoping requirements and then executing and iterating on that. Some of that is technical know-how. A lot is also political and social skills. Again, customers are okay with a vibe-coded website in a way most people are not with even support chatbots.
What if you're shipping code for a Therac-25?
Trying to insert myself, or the right backend people, into the process, is more challenging now than it used to be, and a bad API can make or break the user experience as the UI gets tangled in the web of spaghetti.
It hobbles the effectiveness of whatever you could get an LLM to do because you’re already starting on the backfoot, requirements-wise.
Lots of people hide the fact that they struggle with reading and a lot of people hide or try to hide the fact they don’t understand something.
Highest gains are definitely in full-stack frameworks (like Next.js), with a database ORM, and in building large features in one go, not having to go back and forth with stakeholders or colleagues.
This is the new programming. Programming and requirements are both a form of semantics. One conveys meaning to a computer at a lower level, the other conveys it to a human at a higher level. Well now we need to convey it at a higher level to an LLM so it can take care of the lower-level translation.
I wonder if the LLM will eventually skip the programming part and just start moving bits around in response to requirements?
Mythical Man Month had it all--build one to throw away.
The people with the needs and ideas are often so divorced from the "how" that they don't even bother trying to nail down the details. I think in their mind they are delegating that to the specialists.
This question of who writes the requirements is so ubiquitous you would think we'd have better solutions for it. I know some people solve it with processes like BDD but personally I think we'd be better off if we just had clearer role definitions.
For example, in a waterfall project the requirements usually land in the lap of the Business Analyst. Well when you look at Business Analyst roles you see they are expected to do a lot more than documenting requirements, so it's viewed as acceptable when they are somewhat bad at it. They also spend most of their time with the business so they are unaware of the limitations of the team who is expected to implement the changes.
For another example look at Scrum. It talks a lot about good requirements in the form of user stories, but it stops short of assigning this responsibility to any one of the formal roles, presumably making it a team activity or expecting it to be organic.
When we want someone to write code we hire a programmer, and writing code is what they are expected to do. Where is the role that is strictly requirements and nothing else? Considering how often I hear complaints about bad requirements, it seems overdue that we establish one.
Agreed.
In addition, on the other side of the pipeline, code reviews are another bottleneck. We could have more MRs in review thanks to AI, but we can't really move at the speed of LLM's outputs unless we blindly trust it (or trust another AI to do the reviews, at which point what are we doing here at all...)
Except that now it still takes me the same time to understand the requirements ... and then the coding takes half or a third of the time. Coding is also only about a third of the effort, so I leave my job less burned out.
Context: web app development agency.
I really don't understand this "if it does not replace me 100% it's not making me more productive" mentality. Yeah, it's not a perfect replacement for a senior developer ... but it is like putting the senior developer on a bike and pretending that it's not making them go any faster because they are still using their legs.
The observation from Lean is that the faster you can build a prototype, the faster you can validate the real/unspoken/unclear requirements.
This applies for backends too. A lot of the “enterprise-y” patterns like BFFs, hexagonal, and so on, will make it really easy to compose new APIs from your building blocks. We don’t do this now because it’s too expensive to write all the boilerplate involved. But one BFF microservice per customer would be totally feasible for a sales engineer to vibe code, in the right architecture.
One could argue that "vibe coding" forces you (eventually) to think in terms of requirements. There's a range of approaches, from "nitpick over every line written by AI" to "yolo this entire thing", but one thing they have in common is they all accelerate failure if the specs are not there. You very quickly find out you don't know where you're going.
I see this in my work as well, the biggest bottleneck is squeezing coherent, well-defined requirements out of PMs. It's easy to get a vision board, endless stacks of slides about priorities and direction, even great big nests of AWS / Azure thingnames masquerading as architecture diagrams. But actual "this is the functionality we want to implement and here are the key characteristics of it" detail? Absolutely scarce.
Which is what vibe coders are.....
I had to retake it with the same instructor but by some luck I was able to take it online, where I would spend the majority of the time trying to decipher what he was asking me to do.
Ultimately I found that the actual ask was being given as a 3 second aside in a 50 minute lecture. Once I figured out his quirk I was able to isolate the ask and code it up, ended with an A+ in the class on the second take.
I would like to say that I learned a lot about programming from that teacher, but what I actually learned is what you're saying.
Smart, educated, capable people are broken when it comes to clearly communicating their needs to other people just slightly outside of their domain. If you can learn the skill of figuring out what the hell they're asking for and delivering that, that one skill will be more valuable to you in your career than competency itself.
Unless it's an existing project where migration is too costly, choosing C is just entering a time-wasting pact with a lot of other people who like suffering for free.
And having them fight it out against each other, to see where the issues are with each method and what works better. Doing that without vibe coding the hell out of it would take months of work, but with vibing and some cash, you do it in a few days.
But if you want a "citation needed" name of someone shipping vibe coded apps and making money off it: on YouTube, Ed Yonge, or many of the guests on Starter Story.
Whether those are substantial enough to count as shipped projects is a matter of debate
I'll write about the process after I've released a few more things as I have some disagreements with the current discourse.
No it doesn't. Just for the fun of it, because I'm somewhat familiar with the VLC codebase, I tried to fix some bugs with "agentic tooling" and "vibe coding". And it just produces crap. Which suggests one metric for the usefulness of these tools: why aren't they fixing real bugs in the large open source codebases of this world? You'd be a hero; VLC has something like 4000 open issues.
The answer, of course, is that these tools, particularly with manually memory-managed languages like the one the author proposes to use, don't work at all. Maybe they work on a toy project of 500 lines of code, which is all any demo ever shows, but these text-based systems have no actual understanding of the hardware underlying a complex program. That's just not how they work.
Both the author and I agree that yes, it can.
Does it always generate good code?
Here is where the author and I disagree vehemently. The author implies that the AI-generated code is always correct. My personal experience is that it often isn't. And not only for big projects: for small bugfixes it also misunderstands and hallucinates solutions.
So no C or assembly for me, thank you very much.
Going forwards, when LLMs / coding tools are able to learn new languages, then languages designed for machines rather than humans certainly make sense.
Languages designed for robust error detection and checking, etc. Prefer verbosity where it adds information over succinctness. Static typing over dynamic. Contractual specification of function input/output guarantees. Modular/localized design.
It's largely the same considerations that make a language good for large-team, large-codebase projects, the opposite end of the spectrum from scripting languages, except that if the code is machine-generated you can really go to town on adding as much verbosity as is needed to tighten the specification and catch bugs at compile time rather than at runtime.
“Do not fall into the trap of anthropomorphizing Larry Ellison. You need to think of Larry Ellison the way you think of a lawnmower. You don’t anthropomorphize your lawnmower, the lawnmower just mows the lawn, you stick your hand in there and it’ll chop it off, the end. You don’t think ‘oh, the lawnmower hates me’ — lawnmower doesn’t give a shit about you, lawnmower can’t hate you. Don’t anthropomorphize the lawnmower. Don’t fall into that trap about Oracle.” -Bryan Cantrill
“I actually think that it does a dis-service to not go to Nazi allegory because if I don’t use Nazi allegory when referring to Oracle there’s some critical understanding that I have left on the table […] in fact as I have said before I emphatically believe that if you have to explain the Nazis to someone who had never heard of World War 2 but was an Oracle customer there’s a very good chance that you would explain the Nazis in Oracle allegory.” -Bryan Cantrill
https://www.youtube.com/watch?v=-zRN7XLCRhc
Let's please not turn the future of AI and programming languages over to a lawnmower.
Then show us this robust, complex code that was produced by vibe coding and let us judge for ourselves.
Alternatively, use a language like ZL that embeds C/C++ in a macro-supporting, high-level language (e.g. Scheme). Encode higher-level concepts in it, with generation of human-readable, low-level code. F* did this. Now you get C with higher-level features we can train AIs on.
If an LLM is in fact capable of generating code free of memory safety errors, then it's certainly also capable of writing the Rust types that guarantee this and are checkable. We could go even further and have automated generation of proofs, either in C using tools similar to CompCert, or perhaps something like ATS2. The reason we don't do these at scale is that they're tedious and verbose, and that's presumably something AI can solve.
Similar points were also made in Martin Kleppmann's recent blog post [1].
[1]: https://martin.kleppmann.com/2025/12/08/ai-formal-verificati...
Simplicity? I learned Rust years ago (when it was still pre-release), and when I now look at a lot of codebases, I can barely get a sense of what is going on, with all the new stuff that got introduced. It's like looking at something familiar and different at the same time.
I do not feel the same when I see Go code, as so little has changed or been added to it. The biggest thing is probably generics, and that is so rarely used.
For me, this is what I think appeals to C programmers: the fact that the language does not evolve and has stayed static.
If we compare this to C++, which has become a mess over time (and I know I'm getting downvoted for this), Rust feels like it's going way too far down the Rust++ route.
It's like everybody and their dog wants something added to make Rust do more things, but at the same time it feels like it's repeating the C++ history. I have seen the same issue with other languages that started simple and then became monsters of feature sets. D comes to mind.
So when you look at codebases from different developers, the different styles that come from using different feature sets create a disconnect and make it harder for people to read each other's code. With C, because of the language's limits, you're more often led down a single, rather easier-to-read way of writing the same code. If that makes sense?
In order to prove lack of UB, you have to be able to reason about other things. For example, to safely call qsort, you have to prove that the comparison is a total order. That's not easy, especially if comparing larger and more complicated structures with pointers.
And of course, proving the lack of pointer aliasing in C is extremely difficult, even more so if pointer arithmetic is employed.
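To make the qsort point concrete, here's a hedged C sketch (illustrative, not from the thread): the classic subtraction comparator looks fine but isn't a valid total order once the subtraction can overflow, which is exactly the kind of side condition a no-UB proof would force you to discharge.

    #include <limits.h>
    #include <stdlib.h>

    /* Looks harmless, but a - b overflows for e.g. a = INT_MIN, b = 1:
       undefined behaviour, and the induced order need not be total. */
    int cmp_buggy(const void *pa, const void *pb) {
        int a = *(const int *)pa, b = *(const int *)pb;
        return a - b;
    }

    /* A comparison that really is a total order on all int values. */
    int cmp_total(const void *pa, const void *pb) {
        int a = *(const int *)pa, b = *(const int *)pb;
        return (a > b) - (a < b);
    }

    int main(void) {
        int xs[] = { 3, INT_MIN, -1, INT_MAX, 0 };
        qsort(xs, sizeof xs / sizeof xs[0], sizeof xs[0], cmp_total);
        return 0;
    }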
I'm planning to, why bother with react when I can jump straight into WASM?
ya filthy animal!
But then again LLMs in their current form are trained on mountains of human language so maybe having them output human readable code makes sense at least for now
Well, because you can do it in Fortran, of course!
What else do you want? Multidimensional arrays out of the box, fast loops, native cuda support, trivial build and packaging system, zero version churning... all of this just with the bare language. It's the anti-python! The perfect language, you could say! Strings and i/o are a bit cumbersome, agreed, but your llm can take care of these without any trouble, no matter the language.
Vibe-coding a program that segfaults and you don't know why and you keep burning compute on that? Doesn't seem like a great idea.
>Is C the ideal language for vibe coding? I think I could mount an argument for why it is not, but surely Rust is even less ideal.
I've been using Rust with LLMs for a long time (mid-2023?) now; cargo check and the cargo package system make it very easy for LLMs to check their work and produce high quality code that almost never breaks, and always compiles.
The first pass is to learn the fundamentals of language, and then it is refined on curated datasets, so you could refine them on high quality curated C code.
Also, when I use an LLM to fix a bug, I tell it to write a test at the end of the session, after the bug is fixed, to prevent regression of that bug.
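For anyone wondering what that looks like in plain C, a minimal sketch (the function and the original bug are hypothetical): the session ends with an assert-style test that pins the case the fix addressed.

    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical function that, in this story, used to mis-count
       strings made only of spaces. */
    static size_t count_words(const char *s) {
        size_t n = 0;
        int in_word = 0;
        for (; *s; s++) {
            if (*s == ' ') in_word = 0;
            else if (!in_word) { in_word = 1; n++; }
        }
        return n;
    }

    /* Regression test written at the end of the session, after the fix:
       the previously broken cases stay pinned from now on. */
    int main(void) {
        assert(count_words("") == 0);
        assert(count_words("   ") == 0);
        assert(count_words("one two") == 2);
        assert(count_words("  spaced  out ") == 2);
        return 0;
    }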
Oh that’s clever. Thanks
I propose WASM, or an updated version of it
People are still confusing AI putting together scraps of text it has seen that correlates with its understanding of the input, with the idea that AI understands causation, and provides actual answers.
1. I have encountered a problem where the AI will suggest 4 different "solutions", and when I point out a problem with one, it cycles on to the next and stays in that loop, repeating that set of 4 over and over with no recollection of the previous refutation of the solution (this is a mix of context retention and the fact that solution selection is limited to what has already been fully explored on the web - I had a 5th idea in mind which the AI failed to understand, but which worked well).
2. Yesterday I was discussing with AI the fact that I had three options for action, and it misunderstood that as 4 actions, a trivial arithmetic failure.
This demonstrates (clearly) that the AI didn't "understand" the points discussed, and was instead staying with the correlation of text with other text.
I really like where AI is at the moment and use it a lot - it's very helpful for debugging, for example - but as every vibe coder out there will attest, AI fails hard at standalone coding, and I submit that this is a symptom of its inability to understand what it's doing.
It's still correlation is not causation, and it demonstrates why correlation is so attractive, you can get quite far knowing that there is a correlation between ice cream sales and shark attacks, but it takes work to understand that there is no causative link (FTR I suspect that it's because ice cream sales go up in hot weather, more people are in the ocean during those hot weather periods, therefore there's more opportunity for people to interact with sharks)
Edit: Note how I use the word "suspect" when I talk about the cause of the correlation - it's VERY tempting to say that the weather is the cause, but that's still a correlation, and the fact is, as humans have discovered, actual research is required to verify whether that is, indeed, the cause, or not - something AI might miss.
There is probably some point where you can go so wild and crazy with ideas never seen before that it starts to break down, but if it remains within the realm of what the LLM can deal with in most common languages, my experience says it is able to pick up and apply the same ideas in the IL quite well.
On the other hand, I've enjoyed vibe coding Rust more, because I'm interested in Rust and felt like my understanding improved along the way as I saw what code was produced.
A lot of coding "talent" isn't skill with the language, it's learning all the particularities of the dependencies: The details of the Smithay package in Rust, the complex set of GTK modules or the Wayland protocol implementation.
On a good day, AI can help navigate all that "book knowledge" faster.
I highly recommend people learn how to write their own agents. It's really not that hard. You can do it with any LLM, even ones that run locally.
I.e you can automate things like checking for memory freeing.
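For example, a minimal sketch of the "checking for memory freeing" idea (not the poster's actual agent): route allocations through a counting wrapper and have the agent reject any change whose run doesn't end balanced. Real tooling like valgrind or LeakSanitizer does this far more thoroughly; this is just the shape of the check.

    #include <stdio.h>
    #include <stdlib.h>

    /* Counting allocator: an agent can grep the final line and fail its
       loop whenever allocations and frees don't balance. */
    static long live_allocs = 0;

    static void *xmalloc(size_t n) {
        void *p = malloc(n);
        if (p) live_allocs++;
        return p;
    }

    static void xfree(void *p) {
        if (p) live_allocs--;
        free(p);
    }

    int main(void) {
        char *buf = xmalloc(64);
        xfree(buf);
        printf("LEAK_CHECK live_allocs=%ld\n", live_allocs); /* expect 0 */
        return live_allocs != 0;
    }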
Or, if you don't need to use C (e.g. for FFI or platform compatibility reasons), you could use a language with a compiler that does it for you.
Not quite. It's not about being expressive enough to define algorithms; it's about simplification, organization, and avoidance of repetition. We invented languages to automate a lot of the work that programmers had to do in a lower-level language.
C abstracts away handling memory addresses and setting up frame stacks like you would in assembly.
Rust makes handling memory more restrictive so you don't run into issues.
Java abstracts away memory management completely, so you don't need to manage memory, freeing you up to design algorithms without worrying about memory leaks (although apparently you do have to worry about whether your log statements can execute arbitrary code).
Javascript and Python abstract type definition away through dynamic interpretation.
Likewise, OOP/Typing, functional programming, and other styles were included for better organization.
LLMs are right in line with this. There is no difference between you using a compiler to compile a program, vs a sufficiently advanced LLM writing said compiler and using it to compile your program, vs LLM compiling the program directly with agentic loops for accuracy.
Once we get past the hype of big LLMs, the next chapter is gonna be much smaller, specialized LLMs with architecture that is more deterministic than probabilistic that are gonna replace a lot of tools. The future of programming will be you defining code in a high level language like Python, then the LLM will be able to infer a lot of the information (for example, the task of finding how variables relate to each other is right in line with what transformers do) just from the code and do things like auto infer types, write template code, then adapt it to the specific needs.
In fact, CPUs already do this to a certain extent - modern branch predictors are basically miniature neural networks.
I have a custom agent that can take Python code, translate it to C, do a refactoring pass to include a mempool implementation (so that memory is allocated once at the start of the program and chunks are grabbed out of the mempool instead of calling malloc), run cppcheck, upload the result to a container, and run it with valgrind.
Been using it since ChatGPT3 - the only updates I did to it was API changes to call different providers. Doesn't use any agent/mcp/tools thing either, pure chat.
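For readers who haven't met the pattern, here's a minimal arena-style mempool sketch (illustrative names, not the poster's actual refactoring): one allocation up front, bump-pointer grabs afterwards, and a single reset instead of per-object frees.

    #include <stddef.h>
    #include <stdlib.h>

    typedef struct {
        unsigned char *base;
        size_t cap;
        size_t used;
    } mempool;

    /* One big allocation at startup. */
    int pool_init(mempool *p, size_t cap) {
        p->base = malloc(cap);
        p->cap = cap;
        p->used = 0;
        return p->base != NULL;
    }

    /* "Allocation" is just a pointer bump out of the pool. */
    void *pool_alloc(mempool *p, size_t n) {
        n = (n + 15) & ~(size_t)15;            /* keep 16-byte alignment */
        if (n > p->cap - p->used) return NULL; /* pool exhausted */
        void *out = p->base + p->used;
        p->used += n;
        return out;
    }

    /* Teardown is a single reset/free, which is what keeps valgrind quiet. */
    void pool_reset(mempool *p)   { p->used = 0; }
    void pool_destroy(mempool *p) { free(p->base); p->base = NULL; }

As the reply below notes, for general-purpose lifetimes this is basically a homemade malloc; it mostly pays off when allocations share a lifetime (per request, per frame) so the whole pool can be reset at once.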
A mempool seems very much like a DIY implementation of malloc, unless you have fixed size allocations or something else that would make things different, not sure why I'd want that in the general case.
For "non hacker style" production code it just seems like a lot of extra steps.
For example, the last C code I wrote was TCP over Ethernet, bypassing the IP layer, so I can be connected to the VPN while still being able to access local machines on my network.
If I'm writing it in Rust, I have to do a lot of research, think about code structure, and so on. With the LLM, it took me an hour to write, and that is with no memory leaks or any other safety issues.
The problem is I want to run VNC on my home computer to the server on my work Mac so I can just access everything from one screen and m+b combo without having to use a USB switch and a second monitor. With VPN it basically just does not allow any inbound connections.
So I run a localhost tunnel: a generic Ethernet listener that basically takes data, initiates a connection to localhost from localhost, and proxies the data. On my desktop side, it's the same thing, just in reverse.
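A much-simplified sketch of the proxying half in C (plain TCP forwarding to localhost; the raw-Ethernet capture described above is omitted, and the ports are made up): accept a connection, dial 127.0.0.1, and shuttle bytes both ways.

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <poll.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Shuttle bytes between an accepted client and 127.0.0.1:dst_port
       until either side closes. */
    static void proxy_pair(int client, unsigned short dst_port) {
        int upstream = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(dst_port);
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        if (upstream < 0 ||
            connect(upstream, (struct sockaddr *)&addr, sizeof addr) != 0) {
            if (upstream >= 0) close(upstream);
            return;
        }
        struct pollfd fds[2] = { { client, POLLIN, 0 }, { upstream, POLLIN, 0 } };
        char buf[4096];
        for (;;) {
            if (poll(fds, 2, -1) < 0) break;
            for (int i = 0; i < 2; i++) {
                if (!(fds[i].revents & (POLLIN | POLLHUP))) continue;
                ssize_t n = read(fds[i].fd, buf, sizeof buf);
                if (n <= 0) { close(upstream); return; }
                (void)write(fds[i ^ 1].fd, buf, (size_t)n); /* mirror to the other side */
            }
        }
        close(upstream);
    }

    int main(void) {
        int lsock = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in local = {0};
        local.sin_family = AF_INET;
        local.sin_port = htons(15900);              /* made-up listen port */
        local.sin_addr.s_addr = htonl(INADDR_ANY);
        int one = 1;
        setsockopt(lsock, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);
        if (bind(lsock, (struct sockaddr *)&local, sizeof local) != 0) return 1;
        if (listen(lsock, 8) != 0) return 1;
        for (;;) {
            int client = accept(lsock, NULL, NULL);
            if (client < 0) continue;
            proxy_pair(client, 5900);               /* made-up target port */
            close(client);
        }
    }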
Or to quote Rick and Morty, “that’s just rust with extra steps!”
So in essence I have to disagree with the author's suggestion to vibe code in C instead of Python. I think the Python usability features that were made for humans actually help the AI in exactly the same ways.
There are all kinds of other ways that vibe coding should change one's design though. It's way easier now to roll your own version of some UI or utility library instead of importing one to save time. It's way easier now to drop down into C++ for a critical section and have the AI handle the annoying data marshalling. Things like that are the real unlock in my opinion.
I have not found this to be the case. I mean, yeah, they're really good with Python and yeah that's a lot easier, but I had one recently (IIRC it was the pre-release GPT5.1) code me up a simulator for a kind of a microcoded state machine in C++ and it did amazingly well - almost in one-shot. It can single-step through the microcode, examine IOs, allows you to set input values, etc. I was quite impressed. (I had asked it to look at the C code for a compiler that targets this microcoded state machine in addition to some Verilog that implements the machine in order for it to figure out what the simulator should be doing). I didn't have high expectations going in, but was very pleasantly surprised to have a working simulator with single-stepping capabilities within an afternoon all in what seems to be pretty-well written C++.
First one. Most of the C code you can find out there is either one-liners or shit; there are fewer big projects for the LLMs to train on, compared to Python and TypeScript.
And once we get to the embedded space, the LLMs are trained on manufacturer-written/autogenerated code, which is usually full of inaccuracies (mismatched comments), bugs, and bad practices.
With rust, what I see is generally what I get. I'm not worried about heisenbug gotchas lurking in innocent looking changes. If someone is going to be vibe coding, and truly doesn't care about the language the product ends up in, they might as well do it in a language that has rigid guardrails.
The LLM gets stuck in unproductive loops all the time in Python. In Rust, it generally converges to a result that compiles and passes unit tests. Of course the code quality is still variable. My experience is that it works best when prompts are restricted to a very small unit of work. Asking an LLM to write an entire library/module/application from scratch virtually never results in usable code.
otherwise, works great; much easier to un-vibe the code compared to eg python
(gpt 5.* in codex/sonnet 4.5 in cc/glm 4.6)
Not that my own code is good but every single time assembly output from an optimizing compiler beats the AI as it "forgets" about all the little tricks involved. However it may still be about how I prompt it. If I tell it to solve the actual challenge in assembly it does do that, it's just not good or efficient code.
On the other hand, because I take the time to proofread it, I learn from its mistakes just as I would from my own.
Same here. I've been vibe-coding in C for the sake of others in my group who only know C (no C++ or Rust). And I have to say that the agent did do pretty well with memory management. There were some early problems, but it was able to debug them pretty quickly (and certainly if I had had to dig into the intricacies of GDB to do that on my own, it would've taken a lot longer). I'm glad that it takes care of things like memory management and dealing with strings in C (things that I do not find pleasant).
Recently I've been preparing a series that teaches how to use AI to assist with coding, and in preparation for that there's this thing I've coded several times in several different languages. In the process of that, I've observed something that's frankly bizarre: I get a 100% different experience doing it in Python vs C#. In C#, the agent gets tripped up in doing all kinds of infrastructure and overengineering blind alleys. But it doesn't do that when I use Python, Go, or Elixir.
My theory is that there are certain habits and patterns that the agents engage with that are influenced by the ecosystem, and the code that it typically reads in those languages. This can have a big impact on whether you're achieving your goals with the activity, either positive or negative.
I lost a day chasing my tail cycling through those 4 approaches, but the experience was worthwhile (IMO) because I had been becoming lazy and relying on AI too much. After that I switched to a better style of using AI: to help me find those approaches, and as a sounding board for my ideas, while staying in control of the actual code.
(Oh, I should also mention that AI's conviction/confidence did cause me to believe it knew what it was talking about when I should have backed myself, but, again, experience is what you get after you needed it :)
I do vibe code in C; I'm not a C programmer and I certainly couldn't do a security audit of any serious C codebase, but I can read and understand a simple C program, and debug and refactor it (as long as it's still quite simple).
And it's super fun! Being able to compile a little C utility that lives in the Windows tray and has a menu, etc. is exhilarating.
But I couldn't do that in assembly; I would just stare at instructions and not understand anything. So, yes for C, no for assembly.
The author's point is correct IMO. If you have direct mappings between assembly and natural language, there's no functional need for these intermediate abstractions to act as pseudo-LUIs. If you could implement it, you would just need two layers above assembly: an LLM OS [1], and a LUI-GUI combo.
However, I think there's a non-functional, quality need for intermediate abstractions - particularly to make the mappings auditable, maintainable [2], understandable, etc. For most mappings, there won't be a 1:1 representation between a word and an assembly string.
It's already difficult for software devs to balance technical constraints and possibilities with vague user requirements. I wonder how an LLM OS would handle this, and why we would trust that its mappings are correct without wanting to dig deeper.
[1] Coincidentally, just like "vibe coding", this term was apparently also coined by Andrej Karpathy.
[2] For example, good luck trying to version control vectors.
My philosophy regarding AI is that you should never have it do something you couldn't do yourself.
Of course people break this rule, or the concept of vibe coding wouldn't exist. But some of us actually get a lot of value from AI without succumbing to it. It just doesn't make sense to me to trust a machine's hallucinations for something like programming code. It fabricates things with such confidence that I can't even imagine how it would go if I didn't already know the topic I had it work on.
Same here. I can read and understand most of it, but not enough to debug it. And outsourcing that task to Claude is like taking a long winding path through thick, dark woods.
I think that there's a real rift between people who use LLMs to rough out large swathes of functionality vs people who took the "vibe coding" brain fart way, way too literally. I'm kind of horrified that there are people out there who attempt to one-shot multiple copies of the same app in different instances and then pick the best one without ever looking at the code because "vibe coding". That was always supposed to be a silly stupid thing you try once, like drinking Tide pods or whatever the kids do for fun... not something people should be debating a year later.
But I have written C in the past, it was almost 20 years ago, and everything seemed to work fine, until the memory leaks.
Of course today I would ask the AI, why is my program leaking memory. I think you have a point, AI would be sort of like having a mentor help you find bad practices in your C code.
You've inspired me to maybe try my hand at Rust, something I've been wanting to do since I heard of it.
And just in my experience, I feel everyone is slowly learning, all models are better at the common thing, they are better at bash, they are better at Python and JS, and so on. Everyone trying to invent at that layer has failed to beat that truth. That bootstrapping challenge is dismissed much too easily in the article in my opinion.
I admit I can't say for sure until we try it: if someone were to train a model at the same scale on the same amount of raw binary code as we train these models on raw language and code, would it perform better at generating working programs? The thing is, it would now fail to understand human-language prompts.
From what I know and understand though, it seems like it would be more complex to achieve.
My meta point is, you shouldn't think of it as what would a computer most likely understand, because we're not talking about a CPU/GPU. You have to think, what would a transformer architecture deep neural net better learn and infer? Python or binary code? And I think from that lens it seems more likely it's Python.
Who said that creating bits efficiently from English to be computed by CPUs or GPUs must be done with transformer architecture? Maybe it can be, maybe there are other ways of doing it that are better. The AI model architecture is not the focus of the discussion. It is the possibility of what it could look like if we ask for some computation, and that computation appears without all the middle-men layers we have right now: English->Model->Computation, not English->Model->DSL->Compiler->Linker->Computation.
Binary code takes more space, and both training and inference are highly capped by memory and context sizes.
Models tokenize to a limited set of tokens and then learn relations between those. I can't say for sure, but I feel it would be more challenging to find tokenization schemes for binary code and learn their relationships.
The model needs to first learn human language really well, because it has to understand the prompt and map it accurately to the binary code. That means the corpus will need to include a lot of human languages that it learns and also binary code, I wonder if the fact they differ so much would conflict the learning.
I think coming up with a corpus of mapped human language to binary code will be really challenging. Unless we can include the original code's comments at appropriate places around the binary code and so on.
Binary code is machine dependent, so it would result in programs that aren't portable between architecture and operating system and so on. The model would need to learn more than one binary code and be able to accurately generate the same program for different target platforms and OS.
> Who said that creating bits efficiently from English to be computed by CPUs or GPUs must be done with transformer architecture?
We've never had any other method ever do as well and by a magnitude. We may invent a whole new way in the future, but as of now, it's the absolute best method we've ever figured out.
> The AI model architecture is not the focus of the discussion. It is the possibility of what it could look like if we ask for some computation, and that computation appears without all the middle-men layers we have right now: English->Model->Computation, not English->Model->DSL->Compiler->Linker->Computation.
Each layer simplifies the task of the layer above. These aren't like business middlemen that take a cut of the value at each level; software layers remove complexity from the layers above.
I don't know why we wouldn't be talking about AI models? Isn't the topic that it may be more optimal for an AI model to be trained on binary code directly and to generate binary code directly? At least it's what I was talking about.
So if I stick to AI models: with LLMs and image/video diffusion and such, we've already observed that inference through smaller steps and chains of inference works way better. Based on that, I feel that going from human language to binary code in a single hop is also likely to work worse.
- Everything about rust enforcing correctness catches lots of bugs
- Using a high-level API means I can easily hand-check things in a repl
- In addition to tests, I required a full “demo notebook” with any PR — I should be able to read through it and confirm that all the functionality I wanted has actually been implemented
If the philosophy is (and it should be) “loc is free”, it’s worth thinking about how we can make LLMs produce more loc to give us additional comfort with correctness. Language choice is very much a way.
Currently, using Claude to vibe code Rust is _much_ more hit-or-miss than using it for Python... so Python has become the lingua franca or IR I use with it.
Often I'll ask Claude to implement something in Python, validate and correct the implementation, and in a separate session ask it to translate it from Python to Rust (with my requirements). It often helps.
Claude is particularly prone to hallucinating the APIs of crates, something it does a lot less for Python.
My approach is evolving thanks to NixOS and home-manager, with vibe coding doing the lifting. I increasingly lean on vibe coding to handle simple details, to safely write shell scripts (escaping strings, fml) and C/C++ apps. The complexity is minimized, allowing me to almost one-shot small utilities, and Nix handles long-term maintenance.
With NixOS, a simple C/C++ application can often replace a Python one. Nix manages reading the source and pulling dependencies, effectively eliminating the overhead that used to favor scripting languages, while making marginal power savings during everyday use.
Getting it to output a spec lets me correct the spec, reload the browser tab to speed things up, or move to a different AI.
Yes, it’s extremely soul sucking. With the added disadvantage of not teaching me anything.
I went full meta and sketched out a file, then had an expensive LLM go through the codebase and write such a file. I don't know if it's any good though, I only really use coding assistants to write unit tests.
Or... I want to only write the tests. The implementation is... an implementation detail!
I'll admit that I'd like to do a programming challenge, with or without AI, that would be like "Advent of Code" in assembly. But if it were the actual Advent of Code, the direct route is to write something that looks like a language runtime system, so you have the dynamic data structures you need at your fingertips.
Or assembly, or binary
Yes, this is a completely valid take, and it is the ultimate answer to why vibe coding, the way most people define it, is a dead end.
The point is we want the LLM to generate code that is first and foremost readable by humans and structured in such a way that a human can take over control at any time.
If you think this is how LLM should generate code, congratulations we are already in agreement.
If you think programmers should not exist, and that you will help your bottom line by reducing the number of programmers on your payroll, or worse, eliminating programmers from your payroll entirely by paying product managers who will never look at the code (which is what vibe coding requires, the way I understand it), then the question at the top is for you.
Python, though, is very readable; not so much TypeScript, for me.
… well, you are wrong.
I recently gave the "vibe" AI the assignment of "using GTK [4, I think], establish a global shortcut key".
No amount of massaging the prompt, specifying the version of GTK, etc. could prevent it from just outright hallucinating the functions it wanted to call into existence. The entire reason I was asking was because I did not know what function to call, and was having difficulty discerning that from GTK's documentation. (I know how to do this now, and it is effectively undocumented.)
Prior to that, an assignment to determine some information from Alembic. Again, the AI desired to just hallucinate the functions it required into existence.
A script to fetch the merge queue length from GH. It decided to call GH's GraphQL API, which is fine, and doable for the task, but the query was entirely hallucinated.
A bash script to count files changed in git. The code ran, and the output was wrong. The author did not check the LLM's code.
Even non-programming tasks are the same. Image generation is a constant fight of trying to get the AI to understand what you mean, or it just ignoring your prompts, etc. I went about 10 prompts trying to get an image with a stone statue of 4 ASCII characters in a field. The last character was consistently just wrong, and no amount of prompting to fix.
"Generate a character with a speech bubble that says 'Hi'" -> speech bubble has Japanese in it! (And the Japanese is gibberish, but if you ask AI to translate it, it "will".)
I find CUX to be very intuitive for prototyping. But my game is Language and HCI at heart, logic that allows the development process to go smoothly. It is certainly not for everyone or every project.
A language designed for vibe coding could certainly be useful, but what that means is the opposite of what the author thinks that means.
The author thinks that such a language wouldn't need to have lots of high-level features and structure, since those are things that exist for human comprehension.
But actually, the opposite is true. If you're designing a language for LLMs, the language should be extremely strict and wordy and inconvenient and verbose. You should have to organize your code in a certain way, and be forced to check every condition, catch every error, consider every edge case, or the code won't compile.
Such a language would aggravate a human, but a machine wouldn't care. And LLMs would benefit from the rigidness, as it would help prevent any confusion or hallucination from causing bugs in the finished software.
No, that's the problem (same misconception the author has) - it can't. At least not reliably. If you give an LLM free rein with a non-memory safe output format, it will make the exact same mistakes a human would.
The point of a verbose language is to create extensive guardrails. Which the LLM won't be annoyed by, unlike a human developer.
The reason is that you want to have some kind of guidance from a larger perspective in the long run. And that is exactly what types and module systems provide. The LLM has to create code which actually type checks, and it can use type checking as an important part of verification.
If you push this idea further: use Lean, Agda or Rocq. Let the LLM solve the nitty gritty details of proof, but use the higher-level theorem formation as the vessel for doing great things.
If you ask for a Red-black tree, you get a red-black tree. If you ask for a red-black tree where all the important properties are proven, you don't have to trust the LLM anymore. The proof is the witness of correctness. That idea is extremely powerful, because it means you can suddenly lift software quality by an order of magnitude, without having to trust the LLM at all.
We currently don't do this. I think it's because proving software correctness is just 50x more work, and it moves too slow. But if you could get an amplifier (LLM) to help out, it's possible this becomes more in the feasible area for a lot of software.
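A toy Lean sketch of that idea (nothing as ambitious as a red-black tree; `addCounts` and the lemma are made up for illustration): the theorem statement is the spec, the kernel checks the proof, and nothing in the chain requires trusting whoever, or whatever, wrote the definition.

    -- The statement is the specification; the kernel checks the proof,
    -- so the author of `addCounts` (human or LLM) never has to be trusted.
    def addCounts (xs ys : List Nat) : Nat :=
      xs.length + ys.length

    theorem addCounts_comm (xs ys : List Nat) :
        addCounts xs ys = addCounts ys xs :=
      Nat.add_comm xs.length ys.length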
Formal proofs have so much potential in this context
- a lot of C code out there is not safe, so the LLM outputs that
- C encodes way less of the programmer's intention and way more implementation detail. So unless the author is extremely good at naming, encapsulating and commenting, the LLM just has less to work with. Not all C code is SQLite/Redis/FFmpeg quality.
- the feedback loop is slower, so the LLM has less chance to brute force a decent answer
- there is no npm/pypi equivalent for C on which to train the LLM so the pool for training is less diverse
- the training pool is vastly Linux-oriented, with the linux kernel and distro system libs being very prominent in the training data because C programs on Windows are often proprietary. But most vibe coders are not on Linux, nor into system programming.
Sure, you can vibe code in C. Antirez famously states he gets superb ROI out of it.
But it's likely you'll get even better results with other languages.
This is such a bad take. I'm convinced that engineers simply don't understand what the job is. The point was never "does it output code that works", the point is "can it build the right thing in a way that is maintainable and understandable". If you need an LLM to understand the output then you have failed to engineer software.
If all you're doing is spitting out PoCs and pure greenfield development then I'm sure it looks very impressive, as the early language models did when it looked like they were capable of holding a conversation. But 99% of software engineering is not that kind of work.
I prompt my agents to use proper OO-encapsulated idiomatic ruby paradigms. Your goal should be reduced cognitive load.
Even if you never write a line of code, you will still need to understand your problems to solve them.
"Vibe debugging" will get you stuck in loops of hallucinated solutions.
(https://chatgpt.com/share/693891af-d608-8002-8b9b-91e984bb13...)
* boring and straightforward syntax and file structure: no syntactic sugar, aliases, or formatting freedom that humans cherish but that confuses machines; no context-specific syntax.
* explicitness: no hidden global state, shortcuts and UB
* basic static types and constraints
* tests optimized for machine evaluation
etc.
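A hedged C sketch of what code in that spirit might look like (hypothetical function, deliberately tedious): every call checked, every failure path explicit, no hidden state for the machine to get confused by.

    #include <stdio.h>

    /* Deliberately verbose and explicit: every status is checked and every
       exit path is spelled out. Tedious for a human, cheap for a machine
       to emit, and easy for tooling to verify. */
    enum read_status {
        READ_OK = 0,
        READ_BAD_ARGS,
        READ_OPEN_FAILED,
        READ_EMPTY_OR_IO_ERROR,
        READ_CLOSE_FAILED
    };

    enum read_status read_first_line(const char *path, char *out, size_t out_len) {
        if (path == NULL || out == NULL || out_len < 2) return READ_BAD_ARGS;
        FILE *f = fopen(path, "r");
        if (f == NULL) return READ_OPEN_FAILED;
        if (fgets(out, (int)out_len, f) == NULL) {
            fclose(f);
            return READ_EMPTY_OR_IO_ERROR;
        }
        if (fclose(f) != 0) return READ_CLOSE_FAILED;
        return READ_OK;
    }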
No one (other than computer people) wants computers and software, they want results.
This generation of AI will be used to bootstrap the next generation of AI.
Programmers getting excited about vibe coding is like candlemakers getting excited about installing electric lights in their shops, so they can make more candles!
We can't teach AI to code in languages that do not have human ergonomics because, as of now, all AI is based on human example.
Also, like others said, even once you have your formal spec, C is a particularly bad choice (unless you want to specify quite a bit more). You want the program implemented in a language with as many safety constraints on it as possible, not one where you have to mentally track memory.
Because you would not be able to audit the code if you don't (you'll be terribly slow to read and understand the inner flows correctly, and that's if they aren't so bad that they do you some brain damage).
Dang, AI is pushing us all to become managers.
Of course in practice I think the author is actually correct - LLM's struggle more than humans with sophisticated formal constraints and less than humans with remembering to write a bunch of boilerplate. But I think it's a pretty counterintuitive result and I'd love to have seen more discussion of it.
C extensions for SQLite: https://simonwillison.net/2024/Mar/23/building-c-extensions-...
This one is closest to something I might use because it's C compiled to WebAssembly, so the blast radius for any dumb bugs is strictly limited: https://github.com/simonw/research/blob/main/cmarkgfm-in-pyo...
On the topic: I feel like we still need at least a few more innovations in the space before we can rely on them to work in areas where we as humans still have trouble (that pesky training data!). Even when providing documentation, I still find LLMs to often have trouble creating code in newer versions of libraries.
My biggest fear with LLMs is that it will steer a lot of development into a more homogenous space over time (even just with the types and versions of libraries it chooses when vibing).
I think that’s what makes it so common in codebases that have long term maintenance stories.
(I say that because my personal project has me reading great loads of C written by diverse authors and I am surprised at how easy it is to figure out, compared to most other languages)
Or why not just produce a binary directly? It seems we've just invented a compiler.
But anyway. That’s all besides the point. Because the progress apologists[1] come in all shapes and forms (we are lead to believe), now also uber-passionate college professor who aah loves programming as much as the day he met her. But unlike you he’s a hard-prostheticed pragmatist. He both knows and sympathises with your “passion” but is ready to assert, in a tptacek-memetic style, that it is the way it is—and if you think otherwise (pause for effect), you are wrong.
Because don’t you see? Why are you so blind? No, we can’t let the chips fall as they may and just send you a “told you so” letter once everything you know-now is resolutely quaint. No, we must assert it right now. (So you don’t miss out on the wonderful ride.)
Aah the text complains. It saddens me to think of “coding by hand” becoming a kind of quaint Montessori-school... Oh, the twists and turns of the turbulent text, so organic. Just like your mind. But awake.
The room is at this point drenched in a mist of farts. Yes, programming by-hand, I think we ought to call it a quaintism at this point.
And did you know: people used to resist mechanical computers. Hmm? Yes, indeed, favoring people computers. The text prompts for another model to make an image of a person smirking so hard that their eyes become kind of diagonal and their cheeks disappear. But not in an evil cartoon character way. In a human way. That three years ago felt slightly off-putting. Now just looks like, well, you know.
- - -
Ahh. (Again.) These fools. With their hand-coding. Do they really think they will be employable three years from now? Well, no matter. I have a PhD from MIT along with my associate professorship. I only came out here to Iowa Community College because of my disabled son. Needed more time with him. And to get away from dat citation grind. Man. I have many organic hobbies. And a few very, really incredibly specific collections, as is fitting. puffs pipe Mmm yeah what do I care, so what if programming is quaint now—I’m already in my “ivory tower”, baby. People will listen to my takes on AI. They are appropriately detached, informal, just saying it like it is, you know? And if they don’t? Well, there’s an army of texts right behind me. They’ll be convinced to suppress any feelings of alienation eventually. Eventually, there will just be their own vanishing, small-minded, petty, “thoughts” on the matter. That tiny holdout. Against all content they can sense.
[1] Insert scare quotes here. All history is whitewashed. “We” progressed and defeated “them”. It’s all just a linear curve. No critical thinking is supposed to occur here. Those idiots thirty years ago used reusable underwear and had to load detergent into a washing machine and then even bend over to turn on a “button” to make the underwear reusable. Our underwear costs fifty cents, is made from the most comfortable plastic you can get, and dissolves and crumbles when it gets into contact with water; down the bathroom drain it goes.
> Thus, programs must be written for people to read, and only incidentally for machines to execute
But that's... terrible. Humans can barely communicate with each other. And now you wanna take our terrible communication and make a machine try to guess what the hell we want to happen? You want a plane to operate like that?
Now, it is true that vibe coding results in producing a larger quantity of lower-level code than we would stomach on our own. But that has some consequences for the resulting maintenance challenge, since the system as a whole is less structured by its boundaries.
I think a reasonable approach when using the tools is to address problems "one level down" from where you'd ordinarily do it, and to allow yourself to use something older where there is historical source for the machine to sample from. So, if you currently use Python, maybe try generating some Object Pascal. If you use C++, maybe use plain C. If there were large Forth codebases I'd recommend targeting that since it breaks past the C boundary into "you're the operator of the system, not just a developer", but that might be the language that the approach stumbles over the most.
In the game we're building we generate, compile and run code (C#) in real time to let the player "train and command" its monster in creative ways. So, I've thought about this.
You need both a popular language and one that has a ton of built-in verifying tools.
The author correctly highlights the former, but dismisses the latter as being targeted to humans. I think it is even more important for LLMs!
These coding agents are excellent at generating plausible solutions, but they have no guarantees whatsoever. So you need to pair them with a verifying system. This can be unit tests, integration tests, static / type checks, formal methods, etc. The point is that if you don't have these "verifier" systems you are creating an open loop and your code will quickly devolve to nonsense [0].
In my view, the best existing languages for vibe coding are:
- Rust: reasonably popular, very powerful and strict type system, excellent compiler error messages. If it compiles you can be confident that a whole class of errors won't exist in your program. Best for "serious" programs, but probably requires more back and forth with the coding agent.
- TypeScript: extremely popular, powerful type system, ubiquitous. Best for rapid iteration.
- Luau: acceptably popular, but typed and embeddable. Best as a real-time scripting sandbox for LLMs (like our use case).
I think there is space for a "Vibe-Oriented Programming" language (VOP, as the author says), but I think it will require the dust to settle a bit on LLM capabilities to understand how much we can sacrifice in the language's popularity (since it's new!) versus the verifiability we should endow it with. My bet is that something like AssemblyScript would be the way to go, i.e., something very, very similar to an existing, typed, popular language (TS) but with extra features that serve the VOP needs.
Another aspect to consider besides verifiability is being able to incrementally analyze code. For structured outputs, we can generate guaranteed structures thanks to grammar-based sampling. There are papers studying how to use LSPs to guide LLM outputs at the token level [1] . We can imagine analyzers that also provide context as needed based on what the LLM is doing, for example there was this recent project that could trace all upstream and downstream information flow in a program thanks to Rust's ownership features [2].
Finally, the importance of a LLM-coding friendly sandbox will only increase: we already are seeing Anthropic move towards using LLMs to generate script as a way to make tool calls rather than calling tools directly. And we know that verifiable outputs are easier to hillclimb. So coding will get increasingly better and probably mediate everything these agents do. I think this is why Anthropic bought Bun.
[0] very much in the spirit of the LLM-Modulo framework: https://arxiv.org/pdf/2402.01817
[1] https://proceedings.neurips.cc/paper_files/paper/2023/file/6...
[2] https://cel.cs.brown.edu/paper/modular-information-flow-owne...
for all the same reasons I wouldn’t have done it in C a decade ago!
Plus, now: credit limits!
I would absolutely love to teach programming to non-programmers. I have also been offered a job at the technical school where I graduated. But remembering how uninterested the vast majority of my classmates were back then discouraged me from even trying. I guess what I'd want is to teach a room full of people excited to learn about programming.
And these are systems that require a human in the loop to verify the output because you are ultimately responsible for it when it makes a mistake. And it will.
It’s not fun because it’s not fun being an appendage to a machine that doesn’t know or care that you exist. It will generate 1200 lines of code. You have to try and make sure it doesn’t contain the subtle kinds of errors that could cost you your job.
At least if you made those errors you could own them and learn from it. Instead you gain nothing when the machine makes an error except the ability to detect them over time.
I think if you don’t know C extremely well then there’s no point vibe coding it. If you don’t know anything about operating systems you’re not going to find the security bugs or know if the scheduler you chose does the the right thing. You won’t be able to tell the difference between good code and bad.
Doubt. These things have been trained to emulate humans, why wouldn't they make the same mistakes that humans do? (Yes, they don't make spelling errors, but most published essays etc. don't have spelling errors, whereas most published C codebases do have undefined behaviour).
How do you know? I can believe that they didn't show memory errors in a quick test run on a common architecture with a common compiler, much like most human-written code in the training corpus.
Sure, for prototype sized codebases it might be able to handle finding mistakes a fresh grad might easily make, or even that memory bugs aren't a big problem - but in my experience it happily adds memory bugs to large codebases and multithreaded code (that I think an experienced human could easily spot tbh).
Sure, but having access to merely mildly superhuman programming ability still doesn't make using C a good idea.
I think Claude would do much better with tools provided by modern C++ or Zig than C, frankly, anyways. Or even better, like the Rust people have helpfully mentioned, Rust.
It's incorrect to think that because it is trained on buggy human code it will make the same mistakes. It predicts the most likely token. Let's say 100 programmers write a function; most of them (unless it's something very tricky) won't forget to free that particular function. So the most likely tokens are those which do not leak.
In addition, this is not GPT 3. There's a massive amount of reinforcement learning at play, which reinforces good code, particularly verifiably good (which includes no leaks). And also a massive amount of synthetic data which can also be generated in a way that is provably correct.
You don't free a function.
And this would only be true if the function is the same content with minor variations, which is why LLMs are better suited for very small examples. Because bigger examples are less likely to be semantically similar, and so there is less data to determine the "correct" next token.
> There's a massive amount of reinforcement learning at play, which reinforces good code, particularly verifiably good (which includes no leaks)
This is a really dubious claim. Where are you getting this? Do you have some information on how these models are trained on C code specifically? How do you know whether the code they train on has no leaks?
There are huge projects that everyone depends on that have memory bugs in them right now. And these are actual experts missing these bugs, what makes you think the people at OpenAI are creating safer data than the people whose livelihoods actually depend on it?
This thread is full of people sharing how easy it is to make memory bugs with an LLM, and that has been my experience as well.
But I don't see a reason why the LLM shouldn't be writing binary CPU instructions directly. Or programming some FPGA directly. Why have the assembly language/compiler/linked in between? There is really no need.
We humans write some instructions in English. The LLM generates a working executable for us to use repeatedly in the future.
I also think it wouldn't be so hard to train such a model. We have plenty of executables with their source code in some other language available to us. We can annotate the original source code with a model that understands that language, get its descriptions in English, and train another model to use these descriptions for understanding the executable directly. With enough such samples we will be able to write executables by prompting.
Claude hilariously refused to rewrite my Rails codebase in Brainfuck... not that I really expected it to. But it went on a hilarious tirade about how doing so was a horrible idea and how I would probably be fired if I did.
Including shooting yourself in the foot.
/Rust
IF this is true, you have bad PMs.
C is actually pretty good, if you can manage to architect your project cohesively
AI just gives suggestions but you have to make all the hard choices and fix/simplify a lot of the output
In C, without anything like a borrow checker or such, I'd be very worried about there being subtle pointer safety issues...
Now, some of that could be volumes of training data, etc., but Rust is widely discussed these days in the places these models are trained on, so I'm not certain it's a training problem rather than an attention-to-detail-across-files problem. I.e., since LLMs are trained to mimic human language, the programming languages that are most procedural-human-language-like (vs. having other levels of meaning embedded in the syntax) may be exactly the "LLM-friendly" languages.
Many people I've seen have taken existing software and 'ported' it to more performant languages like C, Rust, etc.
LLMs are extremely efficient and good at translation.
The biggest question is maintainability and legibility. If you want it for your own proprietary software, this can be (and generally is) a good pattern, provided you can get the LLM to nail language-specific challenges (e.g. memory allocation in C).
However, fewer people can write C code generally, and even fewer can use it to build things like UIs. So you're by definition moving the software away from a collaborative mechanism.
The abstraction layers were built for human maintenance. LLMs don't need that.
Tons of people make the above claim and tons of other people make the exact opposite claim that it’s just a search engine and it can’t actually code. I’m utterly shocked at how two people can look at ground-truth reality and derive two different factual conclusions. Make no mistake: one of these two people is utterly wrong.
The only thing I can conclude is many people are either in denial or outdated. A year ago none of what this man said was true.
Agreed. It's like replacing a complex and fulfilling journey with drugs.
This is a big issue, personally. I write Python and bash these days, and I love that we're not bottlenecked by IDE-based autocomplete anymore, especially for dynamic languages, and that a huge amount of fixing and incremental feature work can be done in minutes instead of days thanks to AI being able to spot patterns. Simultaneously, I'm frustrated when these agents fail to deliver small changes and I have to jump in and change something I don't have a good mental model of or, worse still, something that's all Greek to me, like JavaScript.
I sincerely hope the author is joking.
If you could invent a language that is somehow tailored for vibe coding _and then_ produce a sufficiently high-quality corpus of it to train the AI on, that would be something.
It's one thing to program as a hobby or to do programming in an institutional environment free of economic pressures like academia (like this educator), it's another thing to exist as a programmer outside that.
My partner was telling me her company is now making all their software engineers use ChatGPT Codex. This isn't a company with a great software engineering culture, but it's probably more representative of the median enterprise/non-SV/non-tech-startup employer than people realise.
The second and more important point is that what makes coding simpler for humans is the ability of the language to facilitate communication so you don't have to translate. LLMs are good at translation, but it's still work. Imagine you have an implementation of a program and you want to see what it does: for any non-trivial program you must scan millions of tokens. Are current LLMs even physically capable of attending to that? No, no, no: we need names.
Besides, how can you even vibecode if you aren't (just) using names?
The problem here is that human languages are terrible programming languages and LLMs are terrible compilers.
I love C. I came up on C. But C does not tell you a story. It tells you about the machine. It tells you how to keep the machine happy. It tells you how to translate problems into machine operations. It is hard to read. It takes serious effort to discern its intent.
I think any time you believe the codebase you're developing will have to be frequently modified by people unfamiliar with it, you should reach for a language which is both limiting and expressive. That is, the language states the code intent plainly in terms of the problem language and it allows a limited number of ways to do that. C#, Java (Kotlin) and maybe Python would be big votes from me.
And FYI, I came up on C. One of the first senior engineers I was tutored by in this biz loved to say, good code will tell you a story.
When you're living with a large, long-lived codebase, essentially…
It was schadenfreude watching the CEO's son (one of the LLM guys) implode a public-facing production server (https://en.wikipedia.org/wiki/Dunning-Kruger_effect).
Slop content about slop code is slop recursive. Languages like C are simply very unforgiving to amateurs, and naive arbitrary code generators. Bad workmanship writes bad code in any language. Typically the "easier" the compiler is to use... the more complex the failure mode. =3
Vibe coders usually offer zero workmanship, and are enamored with statistically salient generated arbitrary content. https://en.wikipedia.org/wiki/The_Power_of_10:_Rules_for_Dev...
While I am not a big fan of Rust, its philosophy is likely relevant here. Perhaps something like it, with a lot of technical validation pushed to the compiler, could actually be really useful.
Getting rid of the garbage collector with no major increase in human cognitive load might actually be a big win.
It is not portable to computers other than x86. That is one of the reasons I do not use x86 assembly much even though I have an x86 computer; I prefer C. It is not about vibe coding.
> I suppose what I’m getting at, here, is that if vibe coding is the future of software development (and it is), then why bother with languages that were designed for people who are not vibe coding? Shouldn’t there be such a thing as a “vibe-oriented programming language?” VOP. You read it here first.
Someone told me that two companies (one of which is Google) were working on such a thing, although I do not know the details (or if they were correct about that), and I do not know whether or not it resembles what is described in that article.
I do not use LLMs myself, although I have seen a few examples of their output. I have not seen very many, so the sample size is too small, but in what I have seen (simple example programs), the programs work although they are not written very well.
This exactly. Programming is art, because it comes from the soul. You can tackle a problem a million ways, but there's only one way that YOU would solve it.
Vibe coding feels like people who aren't creative stealing everyone else's creativity and then morphing it into something they find appealing.
There is no skill.
There is no talent.
You are asking the machine to do ALL THE THINKING. All the little decisions, the quirks, the bugs, the comments to yourself. All the things that make a piece of code unique.
If you don't give a damn about integrity though, then may as well get funky with it. Hell, go hard: Do it in brainfuck and just let it rip.
I used to say that implementation does not matter and tests should be your main focus. Now I treat every bit of code I wrote with my bare hands like candy; agents have sucked the joy out of building things.
I spend all day in Claude Code, and use Codex as a second-line code reviewer.
They do not create robust systems. They’ve been a huge productivity boost for me in certain areas, but as soon as you stop making sure you understand every line it’s writing, or give it free rein by auto-approving everything, the absolute madness sets in.
And then you have to unpick it when it’s trying to read the source of npm because it’s decided that’s where the error in your TypeScript project must lie, and if you weren’t on top of the whole thing from the start, this will be very difficult.
Don’t vibe-code in C unless you are a very strong C developer who can reliably catch subtle bugs in other people’s code. These things have no common sense.
- The C compiler. AI tools work better if their automated feedback loop includes signals about correctness, safety, etc. The C compiler is not great at that; it requires a lot of discipline from the programmer, and there mostly isn't a compile-time safety net.
- Macros add to this mess. C's macros are glorified string replacements (see the sketch after this list).
- Automated tests are another tool that helps improve the quality of vibe-coded code. While you can of course write tests for C code, the test frameworks are a bit immature and it's hard to write testable code in C due to the lack of abstractions.
- Small mistakes can have catastrophic consequences (crashes, buffer overflows, memory corruption).
- A lot of libraries (including the standard library) contain tools with very sharp edges.
- Manual memory management adds a lot of complexity to code bases and the need for more discipline.
- Weak/ambiguous semantics mean that it's harder to reason about code.
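To illustrate the macro point above, a minimal, hypothetical sketch of how plain textual substitution bites (nothing model-specific here; it's just what the preprocessor does):

    #include <stdio.h>

    /* Textual substitution, not a function call: the argument is pasted in verbatim. */
    #define SQUARE(x) x * x

    int main(void)
    {
        int n = 3;
        printf("%d\n", SQUARE(n + 1));  /* expands to n + 1 * n + 1 == 7, not 16 */
        /* SQUARE(n++) would expand to n++ * n++: n modified twice, undefined behaviour */
        return 0;
    }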
There are counter-arguments to each of those things. Compilers have flags. There are static code analyzers. And with some discipline, it gets better. You could express that discipline in additional instructions for your agent. And of course people do test C code. It's just that there are a lot of projects where none of that stuff is actively used. Vibe coding on those projects would probably be a lot harder than on a project that uses more structured languages and tools.
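As a rough sketch of what that looks like in practice (assuming GCC or Clang; the flag names below are the common ones, check your compiler's docs), the difference between a bare cc invocation and one with warnings and sanitizers enabled is most of that missing safety net:

    /* Build with the safety net on, e.g.:
         cc -std=c11 -Wall -Wextra -g -fsanitize=address,undefined demo.c
       The heap overflow below is accepted silently by a plain `cc demo.c`,
       but AddressSanitizer aborts with a report at runtime. */
    #include <stdlib.h>

    int main(void)
    {
        int *a = malloc(4 * sizeof *a);
        a[4] = 1;     /* one element past the end of the allocation */
        free(a);
        return 0;
    }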
All these things make it harder to work with C code, for humans and for AIs alike. But not impossible, of course: AI coding models are getting quite good at coding, including coding in C.
But it makes C a poor default language for AI coding. The ideal vibe-coding language for an AI would be simple and expressive, with great tools and compilers, fast feedback loops, etc. It means the AI has less work to do: shorter, faster feedback loops, fewer iterations, less reasoning, less complex problems to solve, less ambiguity, entire categories of bugs avoided, and so on. Same reasons why it is a poor choice for most human programmers to default to.
If I'm making a C#/WPF app, I can't just decide to make part of it C.
I get it's just a generalised criticism of vibe coding, but "why not use a harder language then" doesn't seem to make any sense.
Yes, it can consistently generate code that works and seems to be on top of it all due to a lot of training data. But wouldn't that use up more tokens and computational resources to produce than, e.g., the same program in Python?
If using more complex languages requires more resources from the LLM, the same principles apply. One-off scripts are better in high-level languages. Hot paths executed millions of times a second are better in lower-level languages with high optimisation potential.
LLMs might slightly shift the correct choice towards lower-level languages. E.g. a small one-off script in C is much more viable with an LLM's help. But the moment one needs to reuse it, and it grows and needs to be modified, one might regret not using a higher-level language.
I've had the most success running Claude iteratively with mypy and pytest. But it regularly wants to just delete the tests or get rid of static typing. A language like Haskell augmented with contracts over tests wouldn't allow that. (Except for diverging into a trivial ad-hoc dynamic solution, of course.)
So, vibecoding in C feels like playing with a loaded gun.
For this it’s the web deployment target and fast compile times rather than the language itself that is useful.
Routinely I am sent code that works, but obviously nobody has looked at. Because they don’t even actually know what the code does.
For prototyping this is quite great. I think it’s the biggest competitor to tools like Figma, because writing an actually functional program with access to real APIs beats mocks. Now, how often will these end up in production and blow everything up…
I was thinking that this week. We are quickly reaching a point where the quality of the code isn't as important as the test suite around it and reducing the number of tokens. High-level languages are for humans to read and write; if most people aren't reading the code, we should just skip this step.
It's an ugly future but it seems inevitable.
We already have visual programming tools with AI agents, with varying degrees of success; see iPaaS products like Boomi, Workato, and similar.
Recently I have had the opportunity to be part of projects using such kind of tools.
If there is any traditional coding, it is a bunch of serverless endpoints exposed as MCP tools.
For reference, here are the two heavy-lifting workers:
- https://github.com/akaalias/bipscan/blob/main/src/c/find_seq...
- https://github.com/akaalias/bipscan/blob/main/src/c/check_se...
and here's a screenshot of the thing running:
- https://x.com/SpringStreetNYC/status/1996951130526425449/pho...
and here's the full story:
LOL, I got 100% nerd-sniped by my friend Sönke this week and wound up building a small spaceship.
On Monday he's like "Hey, what if you found obscure seed phrases embedded in public texts? You'd only need to remember the name of the book and the paragraph and go from there."
I honestly couldn't care less about crypto(currencies) and I'm 100% sure this is like cryptanalysis 101. But, yeah, it seemed like an interesting problem anyways.
First, I downloaded a few hundred books from Gutenberg, wrote a ruby script and found BIP39 word sequences with a tolerable buffer for filler-words.
Then, I was like, okay, gotta now check them against actual addresses. Downloaded a list of funded ETH addresses. Wrote the checker in ruby. Ran it. No hits but this was now definitely weirdly interesting.
Because: And what if I downloaded the whole pg19 text corpus to scan! And what if I'd add BTC addresses! And what if I checked every permutation of the seed phrase!
Everything got really slow once I got to processing 12GB of raw text for finding sequences and then checking a few million candidates with 44,000+ variations per candidate.
So, let's rewrite this into C! And since I've got 16 cores, let's parallelize this puppy! And since it's a MacBook, let's use GCD! Optimize all the things!
Lol, so NOW this thing is so fucking FAST. Takes four minutes to go through the full pg19 corpus and generates 64,205,390 "interesting" seed phrases. The fully parallelized checker (see Terminal screenshot) processes 460 derived addresses per second.
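If anyone's curious what the GCD fan-out looks like, here's a stripped-down sketch (hypothetical names and numbers, not the actual code, which lives in the repos linked above):

    #include <dispatch/dispatch.h>
    #include <stddef.h>

    /* Sketch only: split the candidate list into chunks and let libdispatch
       spread them across the cores. Builds on macOS with: clang demo.c */
    static void check_candidate(size_t i) { (void)i; /* real work goes here */ }

    static void check_all(size_t n, size_t chunk)
    {
        size_t n_chunks = (n + chunk - 1) / chunk;
        dispatch_apply(n_chunks,
                       dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0),
                       ^(size_t c) {
            size_t end = (c + 1) * chunk < n ? (c + 1) * chunk : n;
            for (size_t i = c * chunk; i < end; i++)
                check_candidate(i);
        });
    }

    int main(void) { check_all(1000000, 4096); return 0; }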
I really don't care if I get a match or not. That I started out building a canoe and wound up with a spaceship is in itself just the best thing in the world.
If a project is important enough to require C or x86 assembly, where memory management and undefined behavior have real consequences, then it’s important enough to warrant a real developer who understands every line. It shouldn’t be vibe coded at all.
Python’s “adorable concern for human problems” isn’t a bug here, it’s a feature. The garbage collection, the forgiving syntax, the interpreted nature: these create a sandbox where vibe coded solutions can fail safely. A buggy Python script throws an exception. A buggy C program gives you memory corruption or security holes that show up three deployments later.
The question isn’t what language should AI write. It’s what problems should we trust to vibe coding. The answer: problems where Python’s safety net is enough. The moment you need C’s performance or assembly’s precision, you’ve crossed into territory that demands human accountability.
If not, then I see the argument for everything being done in Python and performance coming from optimizing Python -> C.
I'm more bullish on the Python -> Rust pipeline. The two languages have a lot of overlap in philosophy, have great interop, and have similar levels of guard rails (when it comes to multithreading, Rust even beats Python in terms of safety). And both languages seem well suited to being vibecoded.
It doesn’t feel like we’re very far from that point.
I disagree. I write a lot of one-off numerical simulations where something quick and dirty is useful but performance matters and the results can be easily verified without analyzing every line of code. Python would be a terrible choice.
A poorly written comment by a human wastes time. A vibe comment by an LLM wastes both time and electricity, a cost that only shows up when global warming reaches 3°C.
The question isn't whether the comment is valuable or not. It's whether it is ethical to waste people's time with AI slop.
This is ChatGPT's pattern.
Edit: but I empathize with the paranoia of everything being AI slop! I’m constantly scrutinizing stuff and it’s annoying
If you want a language that protects you from the largest amount of problems, how about Rust? Vulnerabilities will still be possible, but at least data races won't be possible.
Also vibe coding is a mistake, it will undoubtedly turn anything more elaborate than a simple script into a monstrosity. DO NOT TELL AI TO WRITE MORE THAN A FUNCTION OR PART OF A FUNCTION.
Now personally I don't think this ultimate vibe-coding paradigm is just around the corner, but it does seem that it's the direction we're heading and I think this article does a good job of explaining why.
Writing a program means stating unambiguously what you want, but natural language is ambiguous, which is why legalese exists.
So you need an unambiguous source language for what ends up being executed by your machine (which is what programming languages are); otherwise you have no way of knowing whether the machine does what you want it to do.
Of course you can use an LLM to translate natural language to unambiguous language, but at the end of the day you must read the generated language because that's the only way to dispel the fundamental ambiguity of the natural language.
I saw a webcomic ten years ago where a project manager discusses the future automation of programmers' jobs:
PM: “In the future, there will be no developers, just project managers. We will just give the machines very complete and precise specifications and they will make the software on their own. Without needing you.”
Dev: “You know what we call a "very complete and precise specification"? It's called code.”
Perhaps the programming languages of the future will be designed with AI in mind: to properly put guardrails on it.
Could also be that the models just accelerate to the future so fast that they'll simply stop making mistakes. Then we'll be coding in Assembler, because why waste CPU time for anything else?
I must take issue with the central point, however: the machines of LLMs are very different from the machines of CPUs. While it is true that Claude makes fewer memory errors than even expert C programmers (I’ve had to fully accept this only this week), the LLM is still subject to mistakes that the compiler will catch. And I dare say the categories of error coding agents commit are eerily similar to those of human developers.
Which will probably store up a problem for the future, with an outsized share of programs written in languages that were popular on SO when the models started learning...
I suspect the assembly would not be as highly optimized as what a modern C compiler would output.
I don't get how the second follows from the first. One of the main complaints levelled against Rust is that it is not ergonomic for humans, specifically because it forces you to work to a set of restrictions that benefit the machine.
With an LLM coding agent that quickly produces volumes of somewhat-scattershot code, surely we're better off implementing incredibly onerous guardrails (that a human would never be willing to deal with)?
2) use a language that is well supported by the community, with countless examples (Python)
Or greater hell, why not binary?
If we ignore human readability (as the author suggests), the answer is context. The token count explodes as you fall down the abstraction rabbit hole. Context consumption means worse reasoning.
In turn, this means expressiveness matters to LLMs just as much as it matters to us. A functional map reduce can be far simpler to write and edit than an imperative loop. Type safety and borrow checking free an LLM from having to reason about types and race conditions. Interpreted languages allow an LLM to do rapid exploration and iteration. Good luck with live reloading a GUI in C.
And if you were to force the LLM to do all that in C, at some point it might decide to write its own language.
Also I want to understand the code as much as possible - sometimes the agent overcomplicates things and I can make suggestions to simplify. But if it's writing in a language only it can understand, that's going to be harder for me.
C code that survives in the wild tends to be written by experienced devs who care about correctness. The JS corpus includes mountains of tutorial code, Stack Overflow copypasta, and npm packages with 3 downloads. In my experience, generated C is noticeably better—and this might be why.
I’m not claiming the generated C is “safe” or even close. I am sure that in practice it still has plenty of time-bombs, but empirically, for the narrow WASM tasks I tried, the raw C suggestions were dramatically less wrong than the equivalent JavaScript ones — fewer obvious foot-guns, better idioms, etc.
So my original “noticeably better” was really about “fewer glaring mistakes per 100 lines” rather than “actually correct.” I still end up rewriting or heavily massaging almost everything, but it’s a better starting point than the JS ever was.
What frontier models also excel at is writing their own libraries and skipping third-party dependencies. It's very easy for a human to just pick up a bloated 750kb library they're only going to actually use 15kb worth of its code for, BUT that library can serve as a valuable implementation model for someone very patient and willing to "reinvent the wheel" a little bit, which is definitely going to be AI and not me, because I just want to point to a black box and tell it what to do. For big things like web server, I'm still defaulting to Axum, but for things like making a JS soundbank parser/player or a WebGL2 mp4 & webm parser/demuxer & player, these are tasks frontier models are good for with the right prompting.
To an extent, maybe counter-intuitively, I think the big thing we'll see out of AI is an explosion of artisanship -- with humanoid robots in 2040, perhaps, our households may be making their own butter again, for example.
A: Yes.