It's time for people to wake up, stop using Python, and stop forcing me to use Python.
In any case, you can easily get most of the benefits of typed languages by adding a rule that requires the LLM to always output Python code with type annotations and validate its output by running ruff and ty.
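As a minimal sketch of that rule (this toy gate is an assumption, not a real pipeline; in practice you'd shell out to `ruff` and `ty` themselves), you can reject any LLM output that fails to parse or leaves functions unannotated:

```python
import ast

def passes_gate(code: str) -> bool:
    """Toy stand-in for a ruff/ty gate: accept only code that parses and
    whose functions annotate every parameter and the return type."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            if node.returns is None:
                return False
            if any(arg.annotation is None for arg in node.args.args):
                return False
    return True

print(passes_gate("def add(a: int, b: int) -> int:\n    return a + b"))  # True
print(passes_gate("def add(a, b):\n    return a + b"))                   # False
```

The real tools catch far more than this, but the shape of the loop is the same: generate, check, regenerate on failure.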
I've done work reviewing and fine-tuning training data with a couple of providers, and the amount of Python code I saw outnumbered C++ code by well over two orders of magnitude. It could be a heavily biased sample, but I have no problem believing it could also be representative.
In practice, it has been reported that LLM-backed coding agents simply work around type errors by using `any` in a gradually typed language like TypeScript. I have also personally observed such usage multiple times.
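The same escape hatch exists in Python's gradual typing. A minimal sketch (`parse_port` is a hypothetical example, not from any real codebase) of how `Any` lets a bug sail past the checker:

```python
from typing import Any

def parse_port(raw: Any) -> int:
    # `Any` is Python's gradual-typing escape hatch: mypy/ty happily
    # accept any operation on `raw`, so the bug only surfaces at runtime.
    return raw + 1

print(parse_port(8079))  # 8080, fine for an int...
# parse_port("8079") also type-checks, but raises TypeError at runtime.
```

An agent reaching for `Any` (or `any` in TypeScript) makes the error message go away without making the error go away.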
I also tried using LLM agents with stricter languages like Rust. When complex type errors occurred, the agents struggled to fix them and eventually just used `todo!()`.
The experience above may be caused by insufficient training data, but it illustrates the importance of evaluation over ideological speculation.
I have no problem believing they will handle some languages better than others, but I don't think we'll know whether typing makes a significant difference vs. other factors without actual tests.
Anecdotally, the worst and most common failure mode of an agent is when it starts spinning its wheels, unproductively trying and failing to fix some error, iterating wildly, and eventually landing on a bullshit (if any) “solution”.
In my experience, in TypeScript, these “spin out” situations are almost always type-related and often involve a lot of really horrible `any` casts.
(1) Are current LLMs better at vibe coding typed languages, under some assumptions about user workflow?
(2) Are LLMs as a technology more suited to typed languages in principle, and should RL pipelines gravitate that way?
I have had a good time with Rust. It's not nearly as easy to skirt the type system in Rust, and I suspect the culture is also more disciplined when it comes to 'unwrap' and proper error management. I find I don't have to explicitly say "stop using unwrap" nearly as often as I have to say "stop using any".
In this current world of quite imperfect LLMs, I agree with the OP, though. I also wonder whether, even if LLMs improve, we will be able to use type systems not exactly for their original purpose but more as a way of establishing that the generated code is really doing what we want it to, something similar to formal verification.
However, perfect LLMs would just replace compilers and all programming languages above assembly completely.
Depending on who you speak to, it can mean anything from coding purely by describing the general idea of what you want, to just being another term for LLM-assisted programming.
In truth, for LLM-generated code to be maintainable and scalable, it first needs to be specced out thoroughly by the engineer in collaboration with the LLM, and then the generated code must also be reviewed line by line by the engineer.
There is no room for vibe coding in making things that last and don't immediately get hacked.
That just leaves the business logic to sort out. I can only imagine that IDEs will eventually pair directly with the compiler for instant feedback to fix generations.
But Rust also has traits, lifetimes, async, and other type-system features that multiply complexity and cause issues. It's also a language still in progress… I'm about to add a “don't use once_cell; it's part of std now” to my system prompt. So it's not all sunshine, and I'm deeply curious how a pure vibe-coded Rust app would turn out.
I did this not knowing any Rust: https://github.com/KnowSeams/KnowSeams and Rust felt like a very easy-to-use scripting language.
Did the LLM help at all in designing the core, the state machine itself?
Rust's regex crate was perfect because it doesn't allow anything that isn't a DFA. Yes-ish, the LLM facilitated designing the state machine, because it was part of the dev loop I was trying out.
The speed is primarily what enabled finding all of the edge cases I cared about. Given it can split “all” of a local Project Gutenberg mirror in under 10 seconds on my local dev box, I could do things I wouldn't otherwise attempt.
The whole thing is there in the ~100 "completed tasks" directory.
I wonder if LLMs can use the type information more like a human with an IDE.
E.g. it generates "(blah blah...); foo." and at that point it is constrained to only generate tokens corresponding to public members of foo's type.
Just as current-gen LLMs can reliably generate JSON that satisfies a schema, the next gen will be guaranteed to natively generate syntactically and type-correct code.
Just throw more GPUs at the problem and generate N responses in parallel and discard the ones that fail to match the required type signature. It’s like running a linter or type check step, but specific to that one line.
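A minimal sketch of that rejection-sampling loop (the candidate strings and the `parses` checker are illustrative stand-ins; a real setup would run mypy/ty or a compiler instead of a syntax check):

```python
import ast
from typing import Callable, Optional

def first_valid(candidates: list[str], check: Callable[[str], bool]) -> Optional[str]:
    """Rejection sampling: treat `candidates` as N parallel LLM samples
    and keep the first one that passes the checker."""
    for code in candidates:
        if check(code):
            return code
    return None

def parses(code: str) -> bool:
    # Stand-in for a real type check (mypy, ty, or a compiler);
    # here we only verify syntax with the ast module.
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

samples = [
    "def double(x) ->",                        # truncated, fails the check
    "def double(x: int) -> int: return x * 2", # well-formed, kept
]
print(first_valid(samples, parses))
```

Constrained decoding would be stronger still, since invalid tokens never get sampled in the first place, but filter-after-generation needs no changes to the model.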
> It seems that typed, compiled, etc. languages are better suited for vibecoding, because of the safety guarantees.
There are no “safety guarantees” with typed, compiled languages such as C, C++, and the like. Even with Go, Rust, and others, if you don't know the language well enough, you won't find the logic bugs and race conditions that the LLM creates in your own code, despite the claims of “safety guarantees”.
Additionally, the author is slightly confusing the meaning of “safety guarantees”, which refers to memory safety. What they really mean is “reasoning with the language's types”, which is easier to do with Rust, Go, etc., and harder with Python (without types) and JavaScript.
Again, we will see more LLM-written code like this example: [0]
[0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...
Can you explain more why you've arrived at this opinion?
It's the best at Go imho since it has enforced types and a garbage collector.
For example, if you are using Rails, vibe coding is great because there is an MCP, there are published prompts, and there is basically only one way to do things in Rails. You know how files are to be named, where they go, what format they should take, etc.
Try the same thing in Go and you end up with a very different result, despite the fact that Go has stronger typing. Both Claude and Gemini have struggled with one-shotting simple apps in Go but succeed with Rails.
the more constraints you have, the more freedom you have to "vibe" code
and if someone actually built AI for writing tests, catching bugs and iterating 24/7 then you'd have something even cooler
foreach (string enumName in Enum.GetNames(typeof(Pair)))
{
    if (input.Contains($"${enumName}"))
    {
        // ...
    }
}
This framing reminds me of the classic Gell-Mann amnesia problem in media literacy [0]: people know when a journalistic source is poor when they're a subject-matter expert, but tend to assume that the same source is at least passably good when they're less familiar with the subject.
I’ve had the same experience as the author when doing web development with LLMs: it seems to be doing a pretty good job, at least compared to the mess I would make. But I’m not actually qualified to make that determination, and I think a nontrivial amount of AI value is derived from engineers thinking that they are qualified as such.
[0] https://en.m.wikipedia.org/wiki/Gell-Mann_amnesia_effect
But if you want it to generate chunks of usable and eloquent Python from scratch, it’s pretty decent.
And, FWIW, I’m not fluent in Python.
My repos all have pre-commit hooks which run the linters/formatters/type checkers. Both Claude and Gemini will sometimes write code that won't get past mypy, and they'll then struggle to get it typed correctly before eventually bypassing the pre-commit check with `git commit -n`.
I've had to add some fairly specific instructions to CLAUDE.md/GEMINI.md to get them to cut this out.
Claude is better about following the rules. Gemini just flat out ignores instructions. I've also found Gemini is more likely to get stuck in a loop and give up.
That said, I'm saying this after about 100 hours of experience with these LLMs. I'm sure they'll get better with their output and I'll get better with my input.
You can also just ask the LLM: are you sure this is idiomatic?
Of course it may lie to you...
This works so long as you know how to ask the question. But it's been my experience that an LLM directed on a task will do something, and I don't even know how to frame its behavior in language in a way that would make sense to search for.
(My experience here is with frontend in particular: I'm not much of a JS/TS/HTML/CSS person, and LLMs produce outputs that look really good to me. But I don't know how to even begin to verify that they are in fact good or idiomatic, since there's more often than not multiple layers of intermediating abstractions that I'm not already familiar with.)
(BTW the answer is Go, not Rust, because the other thing that makes a language well suited for AI development is fast compile times.)
(I don't have an opinion on one being better than the other for LLM-driven development; I've heard that Go benefits from having a lot more public data available, which makes sense to me and seems like a very strong advantage.)
I'm a relatively old-school Lisp fan, but it's hard to do this job for a long time without eventually realizing that helping your tools is more valuable than helping yourself.
It is easier to write things using a Python dict than to create a struct in Go or use the weird `map[string]interface{}` and then deal with the resulting typecast code.
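For what it's worth, Python can recover most of that struct-like structure gradually. A minimal sketch using `TypedDict` (the `Order` type here is a hypothetical example):

```python
from typing import TypedDict

class Order(TypedDict):
    sku: str
    qty: int

# At runtime this is still a plain dict, so the easy ergonomics stay,
# but a checker (mypy/ty) flags missing or misspelled fields instead
# of leaving you with map[string]interface{}-style typecast code.
order: Order = {"sku": "A-100", "qty": 3}
print(order["qty"] * 2)  # 6
```

You keep dict literals for writing and get struct-style checking for reading, which is roughly the tradeoff the Go version forces you to pay up front.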
After I started using GitHub Copilot (before the Agents), that pain went away. It would auto-create the field names, just by looking at the intent or a couple of fields. It was just a matter of TAB, TAB, TAB... and of course I had to read and verify - the typing headache was done with.
I could refactor the code easily. The autocomplete is very productive. Type conversion was just a TAB. The loops are just a TAB.
With Agents, things have become even better - but also riskier, because I can't keep up with the code review now - it's overwhelming.
Pre-LLMs, this was an up-front cost when writing Go, which often made the cost/benefit tradeoff not worth it. With LLMs, the cost of writing verbose code not only goes down; the verbosity forces the LLM to be strict about what it's writing and keeps it on track. The cost/benefit tradeoff has shifted greatly in Go's favor as a result.
The issue is those who don't use type checkers religiously with Python - they give Python a bad name.
> I am amazed every time how my 3-5k line diffs created in a few hours don’t end up breaking anything, and instead even increase stability.
In my personal opinion, there's no way you're going to get a high quality code base while adding 3,000 - 5,000 lines of code from LLMs on a regular basis. Those are huge diffs.
benreesman•2h ago
I used to yell at Claude Code when it tried to con me with mocks to get the TODO scratched off, now I laugh at the little bastard when it tries to pull a fast one on -Werror.
Nice try Claude Code, but around here we come to work or we call in sick, so what's it going to be?
herrington_d•1h ago
Also, https://arxiv.org/abs/2406.03283 (Enhancing Repository-Level Code Generation with Integrated Contextual Information) uses static analyzers to produce prompts with more context info.
Yet the argument does not directly translate to the conclusion that typed languages are rigorously better for LLMs without external tools. However, typed languages and their static analysis information do seem to help LLMs.
vidarh•1h ago
A system doing type-constrained code-generation can certainly implement its own static type system by tracking a type for variables it uses and ensuring those constraints are maintained without actually emitting the type checks and annotations.
Similarly, static analyzers can be - and have been - applied to dynamically typed languages, though if these projects have been written using typical patterns of dynamic languages the types can get very complex, so this tends to work best with code-bases written for it.
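A toy sketch of that tracking idea (the names and structure here are illustrative, not from any real system): the generator records a type when it binds a variable and rejects later uses that conflict, without ever emitting annotations.

```python
class TypeTracker:
    """Minimal generator-side type table: no annotations appear in the
    emitted code, but constraints are enforced while tokens are produced."""

    def __init__(self) -> None:
        self.env: dict[str, type] = {}

    def bind(self, name: str, tp: type) -> None:
        # Record the type chosen when the variable was first bound.
        self.env[name] = tp

    def check_use(self, name: str, expected: type) -> bool:
        # A use is valid only if the variable was bound with that type.
        return self.env.get(name) is expected

tracker = TypeTracker()
tracker.bind("count", int)
print(tracker.check_use("count", int))  # True
print(tracker.check_use("count", str))  # False
```

A real system would track richer constraints (function signatures, member access, generics), but the principle is the same: the static discipline lives in the generator, not the target language.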