It's time for people to wake up, stop using Python, and stop forcing me to use Python.
In any case, you can easily get most of the benefits of typed languages by adding a rule that requires the LLM to always output Python code with type annotations and validate its output by running ruff and ty.
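A minimal sketch of what that validation step could look like, assuming ruff and ty are installed and on PATH; the `ty check <path>` invocation is an assumption on my part (mypy slots into the same place if not):

import subprocess
import tempfile
from pathlib import Path

def validate_llm_output(code: str) -> list[str]:
    """Lint and type-check a chunk of generated Python; return any diagnostics."""
    errors: list[str] = []
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "generated.py"
        target.write_text(code)
        # Run the linter and the type checker over the generated file.
        for cmd in (["ruff", "check", str(target)], ["ty", "check", str(target)]):
            result = subprocess.run(cmd, capture_output=True, text=True)
            if result.returncode != 0:
                errors.append(result.stdout + result.stderr)
    return errors

Anything that comes back non-empty gets fed to the agent as its next prompt.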
I've done work on reviewing and fine-tuning training data with a couple of providers, and the amount of Python code I got to see outnumbered C++ code by well over two orders of magnitude. It could be a heavily biased sample, but I have no problem believing it could also be representative.
https://en.wikipedia.org/wiki/C%2B%2B#History
In 1985, the first edition of The C++ Programming Language was released, which became the definitive reference for the language, as there was not yet an official standard.[31] The first commercial implementation of C++ was released in October of the same year.[28]
In 1998, C++98 was released, standardizing the language, and a minor update (C++03) was released in 2003.
https://en.wikipedia.org/wiki/History_of_Python
The programming language Python was conceived in the late 1980s,[1] and its implementation was started in December 1989[2] by Guido van Rossum at CWI in the Netherlands as a successor to ABC capable of exception handling and interfacing with the Amoeba operating system.[3]
Python reached version 1.0 in January 1994.
Of course, it's hard to say how much of that is reflected in the code available, and whether any of the old code is still valid input for modern use. It does broadly look like C++ is older, in general.
Sure, C++ is 42 years old, Python is “only” 34. Both are older than the online code hosts (or even the web itself) from which the code for AI training data is sourced, so age probably isn't a key factor in how much code of each is there; popularity, in terms of projects hosted in accessible public code repos, is more relevant.
Yes, mypy is slow, but who cares if it's the agent waiting on it to complete.
My personal experience is that by doing exactly that, productivity, code readability, and correctness go through the roof, at a slight increase in cost due to having to iterate more.
And since that is an actual language-independent comparison, it leads me to believe that yes, static typing does in fact help substantially, and that the current differences between vibe coding languages are, just like you say, due to the relative quantity of training data.
In practice, it has been reported that LLM-backed coding agents simply work around type errors by using `any` in gradually typed languages like TypeScript. I have also personally observed such usage multiple times.
I also tried using LLM agents with stronger languages like Rust. When complex type errors occurred, the agents struggled to fix them and eventually just used `todo!()`.
The experience above may be caused by insufficient training data, but it illustrates the importance of empirical evaluation over ideological speculation.
I have no problem believing they will handle some languages better than others, but I don't think we'll know whether typing makes a significant difference vs. other factors without actual tests.
Build runs linters and tests and actually builds the project, kinda-sorta confirming that nothing major broke.
If the goal is just to output code that does not show any linter errors, then yes, choose a dynamically typed language.
But for code that works at runtime? Types are a huge helper for humans and LLMs alike.
Anecdotally, the worst and most common failure mode is when an agent starts spinning its wheels, unproductively trying to fix some error and failing, iterating wildly, and eventually landing on a bullshit (if any) “solution”.
In my experience, in TypeScript, these “spin out” situations are almost always type-related and often involve a lot of really horrible “any” casts.
The one thing I would really recommend adding to your constraints is Don't Repeat Yourself: always check whether something already exists. LLMs like to duplicate functionality, even if it's included in their context.
Can I ask why you have asked it to avoid abstractions? My experience has been that the old rules, such as avoid premature abstraction or premature optimization, don't apply as cleanly because of how ephemeral and easy to write the actual code is. I now ask the LLM to anticipate the space of future features and design modular abstractions that maximize extensibility.
Some models like to add abstractions regardless of their usefulness (Google's models seem excessively prone to this for some reason), so I ended up having to prompt that away so the model lets me come up with whatever abstractions are needed. The rules in that gist are basically just my own coding guidelines, put in a way that LLMs can understand them; when I program "manually" I program pretty much that way.
I have yet to find any model that can properly plan feature implementations or come up with proper designs, including abstractions, so that's something I do myself, at least for now; the system prompts mostly reflect that workflow too.
> because of how ephemeral and easy to write the actual code is
The code I produce isn't ephemeral by any measure of that word I understand; anything I end up using stays where it is until it gets modified. I'm not doing "vibe coding", which it seems you are, so you might need some different prompts for that.
(1) Are current LLMs better at vibe coding typed languages, under some assumptions about user workflow?
(2) Are LLMs as a technology more suited to typed languages in principle, and should RL pipelines gravitate that way?
I have had a good time with Rust. It's not nearly as easy to skirt the type system in Rust, and I suspect the culture is also more disciplined when it comes to 'unwrap' and proper error management. I find I don't have to explicitly say "stop using unwrap" nearly as often as I have to say "stop using any".
There's a fine line between gradient descent, pedantry, and mocking. I suspect we will learn more about it.
In this current world of quite imperfect LLMs, I agree with the OP, though. I also wonder whether, even if LLMs improve, we will be able to use type systems not exactly for their original purpose but more as a way of establishing that the generated code is really doing what we want it to, something similar to formal verification.
However, perfect LLMs would just replace compilers and programming languages above assembly completely.
It's not just that humans aren't good at thinking in assembly language or binary; the operations are much more granular, so it takes a lot of them to express something as simple as a for-loop or a function call.
I think the perfect AI might actually come up with a language closer to Python or JavaScript.
Depending on who you speak to, it can mean anything from coding only by describing the general idea of what you want, to just being another term for LLM-assisted programming.
In truth, for LLM-generated code to be maintainable and scalable, it first needs to be specced out very well by the engineer in collaboration with the LLM, and then the generated code must also be reviewed line by line by the engineer.
There is no room for vibe coding in making things that last and don't immediately get hacked.
tl;dr: fast throwaway code from an LLM, where the human is just looking at the results and not trying to make maintainable code.
> There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.
That just leaves the business logic to sort out. I can only imagine that IDEs will eventually pair directly with the compiler for instant feedback to fix generations.
But Rust also has traits, lifetimes, async, and other type flavors that multiply complexity and cause issues. It's also a language still in progress… I'm about to add a “don't use once_cell, it's part of std now” instruction to my system prompt. So it's not all sunshine, and I'm deeply curious how a pure vibe-coded Rust app would turn out.
I did this not knowing any Rust: https://github.com/KnowSeams/KnowSeams and Rust felt like a very easy-to-use scripting language.
Did the LLM help at all in designing the core, the state machine itself?
Rust's RegEx was perfect because it doesn't allow anything that isn't a DFA. Yes-ish, the LLM facilitated designing the state machine, because it was part of the dev-loop I was trying out.
The speed is primarily what enabled finding all of the edge cases I cared about. Given that it can split 'all' of a local Project Gutenberg mirror in < 10 seconds on my local dev box, I could do things I wouldn't otherwise attempt.
The whole thing is there in the ~100 "completed tasks" directory.
Although, to be fair this is far from vibecoding. Your setup, at a glance, says a lot about how you use the tools, and it's clear you care about the end result a lot.
You have a PRD file, your tasks are logged, each task defines both why's and how's, your first tasks are about env setup, quality of dev flow, exploration and so on. (as a nice tidbit, the model(s) seem to have caught on to this, and I see some "WHY:" as inline comments throughout the code, with references to the PRD. This feels nice)
It's a really cool example of "HOW" one should approach LLM-assisted coding, and shows that methods and means matter more than your knowledge in langx or langy. You seem to have used systems meant to help you in both speed of dev and ease of testing that what you got is what you need. Kudos!
I might start using your repo as a good example of good LLM-assisted dev flows.
I wonder if LLMs can use the type information more like a human with an IDE.
E.g. it generates "(blah blah...); foo." and at that point it is constrained to only generate tokens corresponding to public members of foo's type.
Just like how current-gen LLMs can reliably generate JSON that satisfies a schema, the next gen will be guaranteed to natively generate syntactically and type-correct code.
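A toy sketch of that kind of constrained decoding, with made-up token ids standing in for the members of foo's type:

import math

def mask_logits(logits: list[float], allowed_token_ids: set[int]) -> list[float]:
    # Tokens outside the allowed set get -inf, i.e. zero probability after
    # softmax, so the sampler can only pick completions naming a real member.
    return [score if i in allowed_token_ids else -math.inf
            for i, score in enumerate(logits)]

logits = [0.2, 1.3, -0.5, 2.1]       # scores for four hypothetical tokens
allowed = {1, 3}                      # e.g. the tokens spelling foo's public members
print(mask_logits(logits, allowed))   # [-inf, 1.3, -inf, 2.1]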
Just throw more GPUs at the problem and generate N responses in parallel and discard the ones that fail to match the required type signature. It’s like running a linter or type check step, but specific to that one line.
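A rough sketch of that filter step; generate_candidates is a hypothetical stand-in for whatever parallel sampling API you use, and mypy plays the role of the per-candidate check:

import subprocess
import tempfile
from pathlib import Path

def generate_candidates(prompt: str, n: int) -> list[str]:
    # Placeholder for the real sampling call (e.g. an API request asking for n completions).
    raise NotImplementedError

def first_type_correct(prompt: str, n: int = 8) -> str | None:
    for code in generate_candidates(prompt, n):
        with tempfile.TemporaryDirectory() as tmp:
            target = Path(tmp) / "candidate.py"
            target.write_text(code)
            # Keep the first candidate that passes the type checker; discard the rest.
            if subprocess.run(["mypy", str(target)], capture_output=True).returncode == 0:
                return code
    return None  # every sample failed; trigger another round or fall back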
> It seems that typed, compiled, etc. languages are better suited for vibecoding, because of the safety guarantees.
There are no "safety guarantees" with typed, compiled languages such as C, C++, and the like. Even with Go, Rust and others, if you don't know the language well enough, you won't find the "logic bugs" and race conditions in your own code that the LLM creates; even with the claims of "safety guarantees".
Additionally, the author is slightly confusing the meaning of "safety guarantees" which refers to memory safety. What they really mean is "reasoning with the language's types" which is easier to do with Rust, Go, etc and harder with Python (without types) and Javascript.
Again we will see more of LLM written code like this example: [0]
[0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...
Can you explain more why you've arrived at this opinion?
It's the best at Go imho since it has enforced types and a garbage collector.
For example, if you are using Rails, vibe coding is great because there is an MCP, there are published prompts, and there is basically only one way to do things in Rails. You know how files are to be named, where they go, what format they should take, etc.
Try the same thing in Go and you end up with a very different result, despite the fact that Go has stronger typing. Both Claude and Gemini have struggled with one-shotting simple apps in Go but succeed with Rails.
The more constraints you have, the more freedom you have to "vibe" code.
And if someone actually built AI for writing tests, catching bugs, and iterating 24/7, then you'd have something even cooler.
This is called a nightly CI/CD pipeline.
Run a build, all tests, and all coverage at midnight; failed/regressed tests and reduced coverage automatically become new tickets for managers to review and assign.
Who does that? We're not in the 90s anymore.
Run all the tests and coverage on every PR, block merge on it passing. If you think that's too slow then you need to fix your tests.
The existing tests aren't optimal, but it's not going to be possible to cut the runtime by 1-2 orders of magnitude by "fixing the tests".
We obviously have smaller pre-merge tests as well.
This. I feel like trying to segregate tests into "unit" and "integration" tests (among other kinds) did a lot of damage in terms of prevalent testing setups.
Tests are either fast or slow. Fast ones should be run as often as possible, with really fast ones every few keystrokes (or on file save in the IDE/editor), normal fast ones on commit, and slow ones once a day (or however often you can afford, etc.). All these kinds of tests have value, so going without covering both fast and slow cases is risky. However, there's no need for the slow tests to interrupt day-to-day development.
I seem to remember seeing something like a `<slowTest>` pragma in GToolkit test suites, so at least a few people seem to have had the same idea. The majority, however, remains fixated on the unit/integration categorization and ends up with (a select few) unit tests taking "1-2 orders of magnitude" too long, which actually diminishes the value of those tests since now they're run less often.
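In pytest terms the fast/slow split can be as simple as a marker; the `slow` name below is just a convention (register it in your config to silence the unknown-marker warning):

import time
import pytest

def test_parse_port_fast() -> None:
    # Fast: no I/O, cheap enough to run on every commit or even every save.
    assert int("8080") == 8080

@pytest.mark.slow
def test_end_to_end_slow() -> None:
    # Slow: stands in for a test that touches the network, a database, etc.
    time.sleep(2)
    assert True

The commit-time run is then `pytest -m "not slow"` and the nightly run is plain `pytest`.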
How else are we going to cover these costs? https://www.youtube.com/watch?v=cwGVa-6DxJM
Iteration speed can never be faster than the testing cycle.
Unless you're building something massive (like Windows or Oracle maybe) nobody is relying on "nightly" integration tests anymore.
But if there are any instances of this, I have not seen them, and seemingly neither has anyone I've posed the question to, or any passersby.
I agree we're not seeing open source projects be entirely automated with LLMs yet. People still have to find issues, generate PRs (even if mostly automatic), open them, respond to comments, etc. It takes time and energy.
There's also a bit of a stigma about vibe coding: career wise, personally I worry that sharing some of this work will diminish how people view me as an engineer. Who'd take the risk if there might be a naysayer on some future interview panel who will see CLAUDE.md in a repo of yours and assume you're incompetent or feckless?
Plus, worries about code: being an author gives you a much higher level of control than being an author-reviewer. To err as a writer is human, to err as a reader has bigger consequences.
There was a second shot, which was to add caching of job names because we have a few hundred now.
(Context: I'm at a company that has only ever done data via hitting a few hand replicated on prem databases at the moment and wanted to give twitchy folks an overview tool that was easy to use and look at)
While a lot of people here on this platform like to tinker and are often jumping to a new thing, most of my colleagues have no such ideas of grandeur and just want something that works. Rails and its acolytes work really well. I'm curious to know what popular frameworks you're referencing that don't fit into this Rails-like mold?
Spring Boot is definitely opinionated (this is taken from their home page). Maybe not as much as RoR, but saying it isn't at all sounds very strange to me, having worked with it for a few years too...
Are you sure? Django is insanely popular. I am not sure on what basis you are saying Django isn't popular. I posit Django is more popular than Ruby on Rails.
I imagine there could be some presets out there that guide the vibe-coding engines to produce a particular structure in other languages for better results.
Same thing with other libraries like HTMX. Using TypeScript with React, plus opinionated tools like TanStack Query, helps LLMs be way more productive, because they can fix errors quickly by looking at type annotations and use common patterns to build out user interactions.
foreach (string enumName in Enum.GetNames(typeof(Pair)))
{
    if (input.Contains($"${enumName}"))

This framing reminds me of the classic problem in media literacy: people know when a journalistic source is poor when they’re a subject matter expert, but tend to assume that the same source is at least passably good when less familiar with the subject.
I’ve had the same experience as the author when doing web development with LLMs: it seems to be doing a pretty good job, at least compared to the mess I would make. But I’m not actually qualified to make that determination, and I think a nontrivial amount of AI value is derived from engineers thinking that they are qualified as such.
[0] https://en.m.wikipedia.org/wiki/Gell-Mann_amnesia_effect
That's lethologica! Or maybe in this specific case lethonomia. [0]
But if you want it to generate chunks of usable and eloquent Python from scratch, it’s pretty decent.
And, FWIW, I’m not fluent in Python.
My repos all have pre-commit hooks which run the linters/formatters/type-checkers. Both Claude and Gemini will sometimes write code that won't get past mypy and then struggle to get it typed correctly before eventually bypassing the pre-commit check with `git commit -n`.
I've had to add some fairly specific instructions to CLAUDE.md/GEMINI.md to get them to cut this out.
Claude is better about following the rules. Gemini just flat out ignores instructions. I've also found Gemini is more likely to get stuck in a loop and give up.
That said, I'm saying this after about 100 hours of experience with these LLMs. I'm sure they'll get better with their output and I'll get better with my input.
I think that's the point of the article.
In a dynamic language or a compiled language, it's going to be hallucinating either way. If you're vibe coding, the errors are caught earlier, so you can vibe code them away before they blow up at run time.
You can say that again.
I was looking through the many comments for this particular one, and you hit the nail on the head.
The irony is that it took the entire GenAI -> LLM -> vibe coding cycle to settle the argument that typed languages are better for human coding and software engineering.
Why not have static analysis tools on the other side of those generations that constrain how the LLM can write the code?
What I'd like to see is the CLI's interaction with VSCode etc extending to understand things which the IDE has given us for free for years.
We do have that: we call those programmers; without such tools you don't get much useful output at all. But beyond that, static analysis tools aren't powerful enough to detect the kinds of problems and issues these language models create.
A few things beyond your question, for anyone curious:
I've also poked around with a custom MCP server that attempts to teach the LLM how to use ast-grep, but that didn't really work as hoped. It helps sometimes but my next shot on that project will be to rely on GritQL. Smaller LLMs stumble over the YAML indentation. GritQL is more like a template language for AST aware code transformations.
Lastly, there are probably a lot of little things in my long term context that help get into a successful flow. I wouldn't be surprised if a key difference between getting good results and getting bad results with these agentic LLM tools is how people are reacting to failures. If a failure makes you immediately throw up your hands and give up, you're not doing it right. If instead you press the little '#' (in claude code) and enter some instructions to the long term context memory, you'll get results. It's about persistence and really learning to understand these things as tools.
Also interesting note on the docs, though, Claude does try to use cargo doc by itself sometimes.
I was actually wondering why GritQL did not have an MCP, this seems like a natural fit. Would be interested to know if this works for you.
I'm always a bit hesitant to add things to the long term context as it feels very finicky to not have it be ignored and having more seems to make it more likely to be ignored. Instead I usually just repeat myself.
Thank you for the answer; it seems there are still lots of things to try.
Largely I think LLMs struggle with Rust because it is one of very few languages that actually does something new. The semantics are just way more different than the difference between, say, Go and TypeScript. I imagine they would struggle just as much with Haskell, Ocaml, Prolog, and other interesting languages.
But that is all independent of how the LLMs are used, especially in an agentic coding environment. Strong/static typed languages with good compiler messages have a very fast feedback loop via parsing and typechecking, and agentic coding systems that are properly guided (with rulesets like Claude.md files) can iterate much quicker because of it.
I find that even with relatively obscure languages (like OCaml and Scala), the time and effort it takes to get good outcomes is dramatically reduced, albeit with a higher cost due to the fact that they don't usually get it right on the first try.
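A bare-bones sketch of that feedback loop; ask_model is a hypothetical placeholder for the LLM call, and mypy's -c flag (type-check a program passed as a string) stands in for the compiler:

import subprocess

def ask_model(prompt: str) -> str:
    # Placeholder for the actual LLM call.
    raise NotImplementedError

def iterate_until_it_typechecks(prompt: str, max_rounds: int = 5) -> str | None:
    code = ask_model(prompt)
    for _ in range(max_rounds):
        result = subprocess.run(["mypy", "-c", code], capture_output=True, text=True)
        if result.returncode == 0:
            return code
        # The checker's diagnostics become the next prompt; this is the fast
        # feedback loop that static types make cheap.
        code = ask_model(f"{prompt}\n\nFix these type errors:\n{result.stdout}")
    return None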
'I have a database table Foo, here is the DDL: <sql>, create CRUD end points at /v0/foo; and use the same coding conventions used for Bar.'
I find it copies existing code style pretty well.
At the end of the day this is a trivial problem. When Claude Code finishes a commit, just spin up another Claude Code instance and say "run a git diff, find and fix inefficient and ugly code, and make sure it still compiles."
You can also just ask the LLM: are you sure this is idiomatic?
Of course it may lie to you...
This works so long as you know how to ask the question. But it's been my experience that an LLM directed on a task will do something, and I don't even know how to frame its behavior in language in a way that would make sense to search for.
(My experience here is with frontend in particular: I'm not much of a JS/TS/HTML/CSS person, and LLMs produce outputs that look really good to me. But I don't know how to even begin to verify that they are in fact good or idiomatic, since there's more often than not multiple layers of intermediating abstractions that I'm not already familiar with.)
Have you tried recursion? Something like: "Using idiomatic terminology from the foo language ecosystem, explain what function x is doing."
If all goes well it will hand you the correct terminology to frame your earlier question. Then you can do what the adjacent comment describes and ask it what the idiomatic way of doing p in q is.
Sure, that approach could fail in the face of it having solidly internalized an absolutely backwards conception of an entire area. But that seems exceedingly unlikely to me.
It will also be incredibly time consuming if you're starting from zero on the topic in question. But then if you're trying to write related code you were already committed to that uphill battle, right?
To your point that you're not sure what to search for, I do the same thing I always do: I start searching for reference documentation, reading it, and augmenting that with whatever prominent code bases/projects I can find.
After just a few weeks in this brave new world my answer is: it depends, and I'm not really sure.
I think over time as both the LLMs get better and I get better at working with them, I'll start trusting them more.
One thing that would help with that would be for them to become a lot less random and less sensitive to their prompts.
I found the reverse flow to be better. Never argue. Start asking questions first. "What is the idiomatic way of doing x in y?" or "Describe idiomatic y when working on x" or similar.
Then gather some stuff out of the "pedantic" generations and add to your constraints, model.md, task.md or whatever your stuff uses.
You can also use this for a feedback loop. "Here's a task and some code, here are some idiomatic concepts in y, please provide feedback on adherence to these standards".
When reviewing LLM code, you should have this readability in the given language yourself - or the code should not be important.
(BTW the answer is Go, not Rust, because the other thing that makes a language well suited for AI development is fast compile times.)
(I don't have an opinion on one being better than the other for LLM-driven development; I've heard that Go benefits from having a lot more public data available, which makes sense to me and seems like a very strong advantage.)
I'm a relatively old-school Lisp fan, but it's hard to do this job for a long time without eventually realizing that helping your tools is more valuable than helping yourself.
It is easier to write things using a Python dict than to create a struct in Go or use the weird `map[string]interface{}` and then deal with the resulting typecast code.
After I started using GitHub Copilot (before the Agents), that pain went away. It would auto-create the field names, just by looking at the intent or a couple of fields. It was just a matter of TAB, TAB, TAB... and of course I had to read and verify - the typing headache was done with.
I could refactor the code easily. The autocomplete is very productive. Type conversion was just a TAB. The loops are just a TAB.
With Agents, things have become even better - but also riskier, because I can't keep up with the code review now - it's overwhelming.
Pre-LLMs, this was an up-front cost when writing Go, which made the cost/benefit tradeoff often not worth it. With LLMs, the cost of writing verbose code not only goes down, it forces the LLM to be strict with what it's writing and keeps it on track. The cost/benefit tradeoff has shifted greatly in Go's favor as a result.
The issue is those who don't use type checkers religiously with Python - they give Python a bad name.
> I am amazed every time how my 3-5k line diffs created in a few hours don’t end up breaking anything, and instead even increase stability.
In my personal opinion, there's no way you're going to get a high quality code base while adding 3,000 - 5,000 lines of code from LLMs on a regular basis. Those are huge diffs.
Of course, there might be some exceptions like if the codebase for some reason has some massive fixed tables or imports upstream files that may get updated occasionally. Those end up as massive patches or sets.
I remember when I started coding (decades ago), it would take me days to debug certain issues. Part of the problem was that it was difficult to find information online at the time, but another part of the problem was that my code was over-engineered. I could churn out thousands of lines of code quickly but I was only trying to produce code which appeared to work, not code which actually worked in all situations. I would be shocked when some of my code turned out to break once in a while but now I understand that this is a natural consequence of over-complicating the solution and churning out those lines as fast as I could without thinking enough.
Good code is succinct; it looks like it was discovered in its minimal form. Bad code looks like it was invented and the author tried to make it extra fancy. After 20 years coding, I can tell bad code within seconds.
Good code is just easy to read; first of all, you already know what each function is going to do before you even started reading it, just by its name. Then when you read it, there's nothing unexpected, it's not doing anything unnecessary.
Not too different from what a college CS student who hasn't learned git yet would do, come to think of it.
Still pretty bad if the author isn't taking the time to at least cull the changes. Though I guess it could just be file renames?
As judged by who? And in what field?
I mean, if I look at the big Python libraries I use regularly none of them have types - Django, DRF, NumPy, SciPy, Scikit-learn. That’s not to say there aren’t externally provided stubs but the library authors themselves are often not the ones writing them
Overall though my point was that the article, and most comments here, were completely misrepresenting the situation regarding Python. It's a statically typed language for those that want it to be. There's no need to attempt to run any code that hasn't passed a type checker. And it's an expressive type system; much more so than Go which has been mentioned in comments.
However the fact that the standard library documentation doesn't have types is embarrassing IMO.
Django's stubbing isn't great; there's a lot of polymorphism in the framework (and in DRF). You actually have to change your code, rather than just sprinkling annotations in some places, to stop getting 'Any' types.
With the numeric stuff it's even worse, though, since with something like:
np.sum(X)
the appropriate type of X can be a Python list, a numpy array of any numeric type and dimension, etc.

With JS, Claude has a very high success rate. The only issue I had with it was that one time it forgot to update one part of the code which was in a different file, but as soon as I told it, it updated it perfectly.
With TypeScript, my experience was that it struggles to find things. Writing tests was a major pain because it kept trying to grep the build output since it had to mock out one of the functions in the test, and it just couldn't figure it out.
Also, the typed code it produces is more complex for solving the same problem, with more separate files, and it struggles to get the right context. TS is also more verbose (this is objectively true and measurable); it requires more tokens, so it literally costs more.
https://jackpal.github.io/2025/03/29/Gemini_2.5_Pro_Advent_o...
If Gemini is equally good at them in spite of that, doesn't that mean it'd be better at Rust than at Python if it had equal training in both?
It's not quite the same principle OP articulates, which is that a compiler provides safety and that certainty lets you move fast when vibe coding. Instead, what I'm claiming is that you can move fast by allowing the LLM to focus on fewer things. (Though, incidentally, the compiler does give you that safety net as well.)
With Scala, I have to give the LLM a super simple job, e.g. creating some mock data for a unit test, and even then it frequently screws up; every now and then it gives me code that doesn't even compile. So much for Scala's strong type system ..
Of course, you have to tell the agent to set up static analysis linters first, and tell the agent to write tests. But then it'll obey them.
The reason why large enterprises could hire armies of juniors in the past, safely, was because they set up all manner of guardrails that juniors could bounce off of. Why would you "hire" a "junior" agent without the same guardrails in place?
I've been vibe-coding for a few days in Haskell, and I don't like the result.
Maybe I am just accustomed to being ok with verbose Rust, while Haskell comes with a great potential for elegance that the LLM does not explore.
Regardless, the argument that types guide the LLM in a very reliable way holds in both cases.
What dynamically typed languages lack in compile-time safety, the programmer must make up for with (automated) testing. With adequate tests, a Python program doesn't break any more than a Rust or Go program. It's just that people often regard testing as an annoying chore, which is the first thing they skip when vibe coding (or "going fast and breaking things", which is then literally what happens).
But it is, though. You can literally just have the LLM check the LSP to analyze things early for you, without writing tests to begin with. The LSP and compiler are just that smart.
Fixed it
I suspect vibe coding will not be a good fit for writing these libraries, because they require knowledge and precision which the typical vibe coding user probably doesn't have, or the willingness to spend time on the topic, which is also typically not what drives people to vibe coding.
So my conclusion would be that vibe coding drives the industry to solidify around already well-established ecosystems, since fewer of the people producing code will have the time, knowledge, and/or will to build that ecosystem in newer languages. Whether that drive is strong enough to be noticeable is another question.
Perhaps there is a future where individuals can translate large numbers of libraries, and instead of manually porting future improvements of the original versions to the copies, just rerun the translation as needed.
[tool.ruff]
line-length = 88
select = ["E", "F", "W", "I", "N", "UP", "B", "C4"] # A good strict baseline
ignore = []
[tool.mypy]
python_version = "3.12"
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = true
disallow_any_unimported = true
no_implicit_optional = true
check_untyped_defs = true
strict = true
I wrote this [1] comment a few weeks ago:
""" ... Claude Code is surprisingly good at writing Nim. I just created a QuickJS + MicroPython wrapper in Nim with it last week, and it worked great!
Don't let "but the Rust/Go/Python/JavaScript/TypeScript community is bigger!" be the default argument. I see the same logic applied to LLM training data: more code means more training data, so you should only use popular languages. That reasoning suggests less mainstream languages are doomed in the AI era.
But the reality is, if a non-mainstream language is well-documented and mature (Nim's been around for nearly 20 years!), go for it. Modern AI code gen can help fill in the gaps. """
I add one more step: add strong linting (ESLint with all recommended rules switched on, Ruff for Python) and ask the agent to run it after each edit.
Usually I also prompt to type things well, and avoid optional types unless strictly necessary (LLMs love to shrink responsibility that way).
For example, see my recent vibe-coding instructions, https://github.com/QuesmaOrg/demo-webr-ggplot/blob/main/CLAU....
From the perspective of a primarily backend dev who knows just enough React/ts to be dangerous, Claude is generating pretty decent frontend code, letting me spend more time on the Rust backend of my current side project.
Better in what sense? I've been using Anthropic models to write in different Lisps - Fennel, Clojure, Emacs Lisp, and they do perform a decent job. I can't always blindly copy-and-paste generated code, but I wouldn't do that with any PL.
> It seems that typed, compiled, etc. languages are better suited for vibecoding, because of the safety guarantees. This is unsurprising in hindsight, but it was counterintuitive because by default I “vibed” projects into existence in Python since forever
[...]
> For example, I refactored large chunks of our TypeScript frontend code at TextCortex. Claude Code runs tsc after finishing each task and ensures that the code compiles before committing. This let me move much faster compared to how I would have done it in Python, which does not provide compile-time guarantees.
While Python doesn't have a required compilation step, it has both a standard type system and typecheckers for it (mypy, etc.) that are ubiquitous in the community and could be run at the same point in the process.
I would say it's not just Rust, TypeScript, and Go that the author has a weak foundation in.
Given Rails's maturity, I would have expected otherwise - there is tons of Ruby/Rails code to train on, but... yeah.
OTOH, I'm doing some side-project stuff in TS, and the difference is a little mind-blowing. I can see the hype behind vibecoding WAY more.
Just compare SeaORM with Ruby + sequel where you just inherit the Sequel::Model class and Sequel reads the table schema without you having to tell it to. It gives you objects with one method for each column and each value has the correct type.
I was happy with Ruby's performance 15 years ago, and now it's about 7-20x faster with a modern Ruby version and CPU, on a single thread.
AI is still helpful to learn but it doesn't need to do the coding when using Ruby. I think the same criteria apply with or without AI for choosing a language. Is it a single-person project? Does it really require highly optimized machine code? etc.
Most definitely not going to happen. Python is the language of the AI age, and a lot of ML/AI libraries do their reference or first release in Python.
I used to yell at Claude Code when it tried to con me with mocks to get the TODO scratched off, now I laugh at the little bastard when it tries to pull a fast one on -Werror.
Nice try Claude Code, but around here we come to work or we call in sick, so what's it going to be?
Also https://arxiv.org/abs/2406.03283, Enhancing Repository-Level Code Generation with Integrated Contextual Information, uses static analyzers to produce prompts with more context info.
So the argument does not directly translate into the conclusion that typed languages are rigorously better for LLMs without external tools. However, typed languages and their static analysis information do seem to help LLMs.
A system doing type-constrained code-generation can certainly implement its own static type system by tracking a type for variables it uses and ensuring those constraints are maintained without actually emitting the type checks and annotations.
Similarly, static analyzers can be - and have been - applied to dynamically typed languages, though if these projects have been written using typical patterns of dynamic languages the types can get very complex, so this tends to work best with code-bases written for it.