So for these reasons alone I would be against using TS as a lingua franca for LLM codegen (as is the GP, I assume). As another commenter mentioned, LLMs have a tendency to throw their hands^Hlogits up when presented with complex TS type errors and just resort to using `any` to get it to compile (probably hiding bugs in the process).
And that doesn't even touch the issues with the JS/TS ecosystem and runtimes more broadly.
But the devil is in the details - some libraries are typed quite crappily, some have unnecessarily complex types, and the code the LLMs were trained on is probably not the best in the world.
However crappy your Java codebase is going to be, it will still use types. And just today Gemini hallucinated an API call that never existed (in a widely available and used library, even); it's just better to have the ability to check that right away.
If a codebase is so unkempt, the issue is not TypeScript - and forgive me for writing such a platitude, but you can write awful code in Java, too.
In fact, the more behaviour we can model at compile time, the better when it comes to LLMs - there are some cool ideas here, like transpiling Rust into languages for formal verification. See https://github.com/formal-land/coq-of-rust as an example.
Formal verification was one of those things that was previously so annoying to do that it rarely made it past academic use cases or extremely important libraries, but I think LLMs take the tedium out of it. Perhaps formal verification will have a "test driven development" type of moment in the sun thanks to this.
There is occasional difficulty with Rust syntax, but an LLM runs into the same sorts of logic errors / getting lost that it would in another codebase -- and the compiler helps catch many of these.
I think so as well. The Rust errors are some of the most "helpful" and easy to understand (once you grok the core concepts around Rust), and it seems that the loop of generate (maybe constrained) - check - fix benefits from this. In my testing it is better than Python (long-ass traces that you have to manually trim for the LLM).
If you mean "can GitHub Copilot author long, syntactically correct, type-safe, and memory-safe Rust code in one shot?" then the answer is "not right now".
Think back to Javascript and untyped Python (without type annotations). It is a lot easier to have bugs in these languages without types. Types help eliminate classes of bugs.
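A minimal illustrative sketch of such a class of bugs: in untyped JS the call below silently coerces and produces garbage, while TypeScript rejects it at compile time.

```typescript
// In plain JS, withTax("100", 5) "works" and returns the string "1005";
// TypeScript rejects the call before the code ever runs.
function withTax(price: number, tax: number): number {
  return price + tax;
}

withTax(100, 5); // OK: 105
// @ts-expect-error -- Argument of type 'string' is not assignable to 'number'
withTax("100", 5);
```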
For example, say you want a program from a number to HTML. If the HTML type is a syntax-tree type of all valid HTML, rather than a wrapper around string, then filtering LLM output by that type will make it more correct than the string-wrapper type would (with the latter, any program the LLM generates that returns a string and wraps it into HTML will pass).
The actual use cases might not be as extreme as the above, but the idea is that the "tighter" your type is, the better it is at pruning invalid programs from the LLM's output.
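A minimal sketch of that contrast (HtmlNode is a made-up toy type, not any real library's): the wrapper accepts any string, while the syntax-tree type only admits structurally valid markup, so constraining generation to it prunes far more invalid programs.

```typescript
// Loose: any string can be wrapped, so the type rejects almost nothing.
type LooseHtml = { html: string };

// Tight: a (toy) syntax tree that can only represent well-formed HTML.
type HtmlNode =
  | { kind: "text"; value: string }
  | { kind: "element"; tag: "div" | "p" | "span"; children: HtmlNode[] };

// A generator constrained to return HtmlNode cannot emit malformed markup.
function numberToHtml(n: number): HtmlNode {
  return {
    kind: "element",
    tag: "p",
    children: [{ kind: "text", value: String(n) }],
  };
}
```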
You can write a sorting algorithm in assembly, and it can be correct. Rewriting in Haskell won’t make it “more” correct.
There's an undercurrent of people espousing strictly typed languages (not accusing you) who believe that somehow programs written in them are better. They're not. They either serve their purpose, or they don't. Strict typing is a tool. Sometimes it's enabling. Sometimes (example: the horrible polymorphism in most strictly typed languages like C++/Java/copycats) it's a hindrance. Strictly typed languages aren't strictly better than non-strictly typed ones.
tsc error messages are so bad that every time my LLM sees one of those "SomeType is not assignable to SomeLongAssTypeDontEvenTryToUnderstandWhatsGoingOnHere<<<<>>>>>>>>>>>>>>>>>>>>" it just gives up and casts to any. goes for python too.
(Please forgive me the extreme disrespect put forth in the above statement! It is not the intention to show disrespect; I… am quite the rutabaga enjoyer in all respects, you know? I certainly include myself within the absurdity and it is with love.)
Also in response to adjacent commenters - many mission-critical TS codebases will disable the use of an explicit "any" with eslint - https://typescript-eslint.io/rules/no-explicit-any/.
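For reference, a minimal config sketch enabling that rule (classic .eslintrc format; real projects usually extend shared configs on top of this):

```typescript
// .eslintrc.cjs -- minimal sketch, project specifics omitted
module.exports = {
  parser: "@typescript-eslint/parser",
  plugins: ["@typescript-eslint"],
  rules: {
    // Hard-fail any explicit `any`, per the rule linked above.
    "@typescript-eslint/no-explicit-any": "error",
  },
};
```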
[1]: https://microsoft.github.io/TypeChat/blog/introducing-typech...
For example, you can write a function that takes an object received from an API that uses snake_cased keys, and returns that same object, but with camelCased keys instead. This is not some "special case" in the Typescript compiler, the ability to do this emerges naturally from Typescript's features. I don't know any other language that can do this.
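For the curious, a sketch of how that looks using mapped types with key remapping and template literal types (the type names here are my own):

```typescript
// Recursively turn "created_at" into "createdAt" at the type level.
type SnakeToCamel<S extends string> =
  S extends `${infer Head}_${infer Tail}`
    ? `${Head}${Capitalize<SnakeToCamel<Tail>>}`
    : S;

// Remap every key of an object type through SnakeToCamel.
type CamelizeKeys<T> = {
  [K in keyof T as K extends string ? SnakeToCamel<K> : K]: T[K];
};

type ApiUser = { user_id: number; created_at: string };
type User = CamelizeKeys<ApiUser>; // { userId: number; createdAt: string }
```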
Most people don't know enough TS to use these things effectively, but I think one could train an LLM to be very good at them. The combination of LLMs placing such advanced constraints on themselves, and then generating code based on those constraints, seems extremely powerful.
More != better.
https://infoscience.epfl.ch/entities/publication/6c6bb09d-a4...
is go faster than rust?
Depends on how you write the Go or Rust code. The most optimal Rust rewrite of the TypeScript compiler would very likely be faster than the most optimal version in Go. However, they didn't want to do a rewrite; they wanted to port the existing compiler codebase written in TS. Go, like TS (ultimately the JS runtime), also has GC, which makes a 1-to-1 port much easier.
The reason for this decision is that they wanted a near 1:1 port of the TypeScript code to Go, keeping design and structure almost identical.
You can't do that in Rust as easily because of all the cyclical references and indirection involved.
A rust port would be a rewrite. This is merely a migration.
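To make the cyclical-references point concrete, here is an assumed (toy, not the actual tsc) AST shape: parent and child pointers form cycles that a GC collects transparently, but that Rust's ownership model would force you to restructure with Rc<RefCell<...>>, arenas, or index-based graphs.

```typescript
// Toy AST node: children point down, parents point back up -- a cycle.
interface AstNode {
  kind: string;
  parent: AstNode | undefined; // back-edge to the owner
  children: AstNode[];         // forward edges
}

const root: AstNode = { kind: "SourceFile", parent: undefined, children: [] };
const fn: AstNode = { kind: "FunctionDeclaration", parent: root, children: [] };
root.children.push(fn); // root -> fn -> root: trivial under GC, painful to own in Rust
```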
No.
They rewrote in Go because Go is similar enough to TypeScript, while being faster than TypeScript.
Source: https://github.com/microsoft/typescript-go/discussions/411
TypeScript has a type system that is complex enough, you can literally implement wasm inside it (and then use that to run e.g. Doom: https://socket.dev/blog/typescript-types-running-doom)
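A tiny taste of why that's possible, well short of the full wasm interpreter: the type checker itself can evaluate recursive computations, e.g. arithmetic on tuple lengths.

```typescript
// Build a tuple of length N, then add two numbers by concatenating tuples.
type BuildTuple<N extends number, T extends unknown[] = []> =
  T["length"] extends N ? T : BuildTuple<N, [...T, unknown]>;

type Add<A extends number, B extends number> =
  [...BuildTuple<A>, ...BuildTuple<B>]["length"];

type Seven = Add<3, 4>; // the *type* Seven is the literal 7, computed at compile time
```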
Need to dig in a bit more on the implementation, but I was surprised that the paper didn't mention hooking into an existing language service/server. There's more than types that an LLM could leverage from existing language tooling. Auto-imports are a good example: they're handy for the human developer to keep a linear writing flow, something an LLM needs even more.
Pulling in more features to help the system is definitely worth looking into!
https://arxiv.org/abs/2407.08983
AST-T5: Structure-Aware Pretraining for Code Generation and Understanding
https://arxiv.org/abs/2401.03003
CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation
The aidocs server keeps track of generated LLM-friendly docs for any GitHub repo.
The aidocs daemon (aidd) is resident, and can watch a repo, find imports in a number of languages, request the docs from aidocs, serve them up in MCP, and/or put them into a directory in your repo. Planning on generating docs for a whole codebase, plus incremental docs creation, later.
I could use a couple beta testers -- lmk if you're interested. macOS for now, although the daemon is written in Go and should be portable.
That this research comes out of universities, and not large AI labs, makes me think those labs believe that larger models are still the way to go.
Used in multiple similar publications, including "Guiding Language Models of Code with Global Context using Monitors" (https://arxiv.org/abs/2306.10763), which uses static analysis beyond the type system to filter out e.g. invalid variable names, invalid control flow etc.
I've never seen it check in uncompilable code; watching the Devin console, I can see it building and using the code to ensure commits are not complete garbage. When it has checked in compilable but slightly wrong code, lint and tests running automatically from CI (it doesn't always run them before checking in) trigger it to push a fix on its own.
Feedback loops are nice, but they can be expensive and time-consuming (oh, look at me complaining that it takes Devin a whopping 15 minutes to complete a task), so I can definitely see the value in type constraints.
They also have a pay-as-you-go tier now.
I pay the full $500 though. This month I'm going to blow past the base allowance and tap into 'gift credits'
speaking of which if anyone wants a referral code (gift creds for me, and for you) hmu
Are there some related works?
An evaluation method that can measure the distance between two sentences is hard to find; the best option is a closed-source LLM API, even if it's not the most ideal option. As a result, we also must use current LLMs to improve our models.
Also, TypeScript error messages can be a pain. When LLMs encounter something like "SomeType is not assignable," instead of handling it properly, they often just cast it to any. This happens way too often.
Google somewhat did this with javascript in their latest Gemini-2.5 Pro release. But what about doing it for a smaller language? Google isn't going to do that, but there is still a lot of demand.
As a developer, I certainly think my programming skills in a specific language were improved by knowing other languages, so I can contrast and compare.
Many people do think this, but I'm not sure many of them are running AI labs.
I dunno tho.
Big AI labs also have their own agendas, and would rather keep scaling and growing than serve a rather smaller real market?
Once you're into real-usage territory, you can no longer use made-up numbers to justify future growth.
If you take some niche language and build an LLM from scratch that's hyperspecialized on that language, will that LLM actually outperform some big LLM that's trained on all the programming resources out there, and all the blogs, forum conversations, stack overflow posts on all those languages, and then learns to generalize that information and apply it to your niche language?
One of the things that LLMs seem to excel at is taking information from one context, transforming it and applying it to another context.
You then get the best of both worlds at the expense of a double round-trip (2x), which for something like coding seems fine; people are OK paying $200 for ChatGPT Pro.
This would also solve the problem of context windows filling up and the model starting to generate nonsense: have the bigger model use its bigger context window to orchestrate and organize the task, calling smaller specialized sub-modules. That seems like it should yield better final code outputs than one big-ass LLM.
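Roughly the shape below (a sketch; all the model-calling functions are hypothetical stand-ins):

```typescript
// Big model plans with its large context; small specialists each get a
// fresh, minimal context; the big model stitches the results together.
declare function bigModelPlan(task: string): Promise<string[]>;
declare function smallModelSolve(subtask: string): Promise<string>;
declare function bigModelAssemble(task: string, parts: string[]): Promise<string>;

async function orchestrate(task: string): Promise<string> {
  const subtasks = await bigModelPlan(task);
  const parts: string[] = [];
  for (const sub of subtasks) {
    parts.push(await smallModelSolve(sub)); // each sub-call starts near-empty
  }
  return bigModelAssemble(task, parts);
}
```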
But we're moving the goalposts from one model to a multi-agent system, I guess, so never mind.
And I agree, it seems all the big corps are betting on bigger models and more data for now.
To be fair, humans have trouble with that as well.
https://arxiv.org/abs/2407.21783
See figure 8.
More recent work is better at using concrete types, and choosing functions from Hackage, like Hoogle+ https://github.com/TyGuS/hoogle_plus and Hectare https://dl.acm.org/doi/10.1145/3547622
There's also "inductive programming" (producing a function from input/output examples), with Haskell implementations like Magic Haskeller http://nautilus.cs.miyazaki-u.ac.jp/~skata/MagicHaskeller.ht...
My prior experience was that LLMs were not much better than reading the docs, and certainly you wouldn't get far vibe-coding in Rust. But Claude Code behaves like I would: writing code that doesn't compile (typical LLM behavior), then reading the errors, correcting the code, and iterating until it compiles.
Its first attempt at a graph-based scheduler in Rust took about $3 and 10 minutes to work correctly. It was ~500 LOC, so definitely faster than what I can write in Rust. (To be fair, I spent a decent amount of time drafting a description of what I wanted in a markdown file to get Claude started.)
Similarly, we have a tool that makes sure the types and syntax are correct, namely a compiler. Building an LLM that can only output syntactically correct code is one approach to stacking them, but in fact it will significantly worsen the LLM's ability to reason and construct code. The winning choice seems to be to emulate the workflow of humans: code, compile, read errors, repeat.
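A sketch of that workflow driven programmatically (llmGenerate/llmRepair are hypothetical model hooks; the type-checking uses the real TypeScript compiler API):

```typescript
import * as ts from "typescript";

// Hypothetical model hooks -- stand-ins for whatever LLM API you use.
declare function llmGenerate(prompt: string): Promise<string>;
declare function llmRepair(code: string, errors: string[]): Promise<string>;

// Type-check an in-memory source string and return formatted diagnostics.
function check(source: string): string[] {
  const host = ts.createCompilerHost({ strict: true });
  const original = host.getSourceFile.bind(host);
  host.getSourceFile = (name, lang) =>
    name === "input.ts" ? ts.createSourceFile(name, source, lang) : original(name, lang);
  const program = ts.createProgram(["input.ts"], { strict: true }, host);
  return ts
    .getPreEmitDiagnostics(program)
    .map(d => ts.flattenDiagnosticMessageText(d.messageText, "\n"));
}

// code -> compile -> read errors -> repair -> repeat.
async function codeCompileFix(prompt: string, maxIters = 5): Promise<string> {
  let code = await llmGenerate(prompt);
  for (let i = 0; i < maxIters; i++) {
    const errors = check(code);
    if (errors.length === 0) break;       // it compiles: done
    code = await llmRepair(code, errors); // feed the errors back to the model
  }
  return code;
}
```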
Several people mentioned the generation - compilation - fixing loop. Just want to remind you that our approach works not only for the generation step but also for the fixing step, because fixing is essentially asking the LLM to generate a new version of the code. The paper actually has a "repair" experiment to demonstrate this, and our approach achieves a significant gain there: a 37% relative improvement in functional correctness.
> To address this challenge, we introduce a type-constrained decoding approach that leverages type systems to guide code generation.
This should not work with type inference, even at the level of C++ "auto x =": "auto" does not constrain "x" at all, and what is to the right of the equals sign is not constrained either. In Haskell, the gap is even wider: a long "where" clause may have dependencies constraining things in the other direction.
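The same gap in TypeScript terms, as a trivial sketch: with plain inference the declared variable constrains nothing (the type flows out of the expression), whereas an explicit annotation gives a type-constrained decoder something to prune against.

```typescript
// Inference: the type flows *out* of the right-hand side; the declaration
// places no constraint on what a generator may emit there.
let inferred = [1, 2, 3].map(n => n * 2); // inferred as number[]

// Annotation: the declared type now constrains the right-hand side, so a
// callback returning the wrong type can be rejected mid-generation.
let annotated: string[] = [1, 2, 3].map(n => n.toFixed(1));
```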
But what I see as important here is the continuation of the reinvention of Cyc, from a different starting point. ;)
Definitely: "every big LLM has in its support code an ad-hoc, bug-ridden, inefficient implementation of half of Cyc." Cyc was written in Lisp, and most LLM support code is C/C++; thus, it is just a corollary of Greenspun's Tenth Rule.
jiggawatts•1mo ago
The real challenge will be to make this detect and switch languages automatically. For example, a snippet of code could include a LaTeX formula in a comment and SQL in a string literal. There are many more examples, such as regex inside a shell script, and so on.
The obvious next step after that is back-tracking. It's possible to emit a token that is valid, but then allows no further completions that are valid. In other words, the model can paint itself into a corner. To my knowledge, no current online LLM service uses any kind of backtracking, they run in append ("forwards") mode only.
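A sketch of what token-level backtracking could look like (topTokens and isValidPrefix are hypothetical interfaces to the model and the checker): a prefix can be locally valid yet admit no valid continuation, and popping the last token recovers from that.

```typescript
declare function topTokens(prefix: string[]): string[];    // ranked candidate tokens
declare function isValidPrefix(prefix: string[]): boolean; // checker accepts prefix so far

function decodeWithBacktracking(maxLen: number): string[] | null {
  // Depth-first search over token sequences, newest frame on top of the stack.
  const stack = [{ prefix: [] as string[], alternatives: topTokens([]) }];
  while (stack.length > 0) {
    const top = stack[stack.length - 1];
    const token = top.alternatives.shift();
    if (token === undefined) { stack.pop(); continue; } // dead end: backtrack
    const prefix = [...top.prefix, token];
    if (!isValidPrefix(prefix)) continue;               // prune this token
    if (prefix.length >= maxLen) return prefix;         // accept
    stack.push({ prefix, alternatives: topTokens(prefix) });
  }
  return null; // every path "painted itself into a corner"
}
```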
https://arxiv.org/abs/2504.00532
IterGen: Iterative Semantic-aware Structured LLM Generation with Backtracking
https://arxiv.org/abs/2410.07295
ROCODE: Integrating Backtracking Mechanism and Program Analysis in Large Language Models for Code Generation
https://arxiv.org/abs/2411.07112v1
pizza•1mo ago
There was also an hn thread: https://news.ycombinator.com/item?id=36425375
nielstron•1mo ago
Re: backtracking: a core part of this paper is ensuring a prefix property, i.e., there is always a legitimate completion and the model cannot "corner" itself!
Research still needs to be done on which languages and language features this prefix property can be ensured for.