https://en.wikipedia.org/wiki/Intermediate_representation#La...
Optimizing a language for LLM consumption and generation (probably) doesn't mean you want a LLM designing it.
in that sense I don't see how this is more succinct than Python.
it is more succinct than TypeScript and C#, of course, but it needs to compete with the laconic languages.
in that sense you will end up with the CISC vs. RISC dilemma from the CPU wars: you will find that the way to compress even further is to add new tokens for repetitive tasks, e.g. making sha256 a single token. I feel that's a way to compress even more.
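For illustration, here is a toy sketch of that idea; the vocabulary and greedy tokenizer below are hypothetical (real tokenizers are learned, not hand-written) and only show how a dedicated macro token shortens the sequence:

    # Hypothetical vocabularies, just to illustrate the CISC-style trade-off.
    BASE_VOCAB = ["sha", "256", "(", ")", "data"]
    EXTENDED_VOCAB = BASE_VOCAB + ["<SHA256>"]  # one new "macro" token

    def toy_tokenize(text, vocab):
        """Greedy longest-match tokenizer over a fixed vocabulary."""
        tokens, i = [], 0
        while i < len(text):
            match = max((v for v in vocab if text.startswith(v, i)),
                        key=len, default=text[i])
            tokens.append(match)
            i += len(match)
        return tokens

    print(toy_tokenize("sha256(data)", BASE_VOCAB))
    # ['sha', '256', '(', 'data', ')']  -> 5 tokens
    print(toy_tokenize("<SHA256>(data)", EXTENDED_VOCAB))
    # ['<SHA256>', '(', 'data', ')']    -> 4 tokens

The win grows with how often the construct repeats, which is exactly the CISC trade-off: a fatter instruction set for denser programs.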
LLVM IR is a better example.
https://en.wikipedia.org/wiki/Intermediate_representation#La...
I've spent a good bit of time exploring this space in the context of web frameworks and templating languages. One technique that's been highly effective is starting with a _very_ minimal language with only the most basic concepts. Describe that to the LLM, ask it to solve a small scale problem (which the language is likely not yet capable of doing), and see what kinds of APIs or syntax it hallucinates. Then add that to your language, and repeat. Obviously there's room for adjustment along the way, but we've found this process is able to cut many many lines from the system prompts that are otherwise needed to explain new syntax styles to the LLM.
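For what it's worth, a minimal sketch of that loop; llm_complete and try_parse are hypothetical stand-ins, not a real API:

    # Sketch of the iterate-on-hallucinations loop described above.
    spec = "The language has only variables, integers, and a print statement."
    tasks = [
        "render a list of user names as an HTML list",
        "sum the even numbers in a list",
    ]

    for task in tasks:
        prompt = f"Language spec:\n{spec}\n\nWrite a program in this language to: {task}"
        attempt = llm_complete(prompt)        # hypothetical LLM call
        ok, unknown = try_parse(attempt)      # hypothetical parser for the toy language
        if not ok:
            # Whatever the model invented is a candidate feature: it is the
            # syntax the model already "expects" the language to have.
            print(task, "->", unknown)
            # A human reviews these and folds the good ones back into `spec`.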
In the fullness of time, you end up having to. Or at least I have. Which is why I always dislike additional layers and transforms at this point.
(e.g. when I think about React Native on Android, I hear "now I'll have to be excellent at React/JavaScript and Android/Java/Kotlin and C++ to be able to debug the bridge", not "I can get away with just JavaScript".)
I'm not necessarily against the approach shown here, reducing tokens for more efficient LLM generation; but if this catches on, humans will read and write it, will write debuggers and tooling for it, etc. It will definitely not be a perfectly hidden layer underneath.
But why not, for programming models, just select tokens that map concisely onto existing programming languages? Would that not be as effective?
There’s a chance this is a joke, but even if it is I don’t wanna give the AI tech bros more terrible ideas, they have enough. ;)
I debug at my abstraction layer because I can trust that my compiler actually works; LLMs are fundamentally different and need to produce human-readable code.
No LLM has seen enough of this language vs. Python, and the context is now going to be mostly wordy, not codey (e.g. docs, specs, etc.).
LLMs aren’t a “superior intelligence” because every abstract concept they “learn” is done so emergently. They understand programming concepts within the scope of languages and tasks that easily map back to those things, and due to finite quantisation they can’t generalise those concepts from first principles. I.e. it can map python to programming concepts, but it can’t map programming concepts to an esoteric language with any amount of reliability. Try doing some prompting and this becomes agonisingly apparent!
It is so annoying to realise mid read that a piece of text was written by an LLM.
It’s the same feeling as bothering to answer a call to hear a spam recording.
I think HN should really ban complaints about LLM-written text. It is annoying at best and a discouraging insinuation at worst, and that insinuation is really offensive when it is false and the author in fact wrote the sentence with their own brain.
I don't know if this sentence was written by an LLM or not, but people will definitely use LLMs to revise and refine posts. No amount of complaining will stop this. It is the new reality, and a trend that will only continue to grow. These incessant complaints about LLM-written text don't help, and they make the comment threads really boring. HN should really introduce a rule to ban such complaints, just like it bans complaints about tangential annoyances like article or website formats, name collisions, or back-button breakage.
Not saying the commenters never get it wrong, but I’ve seen them get it provably right a bunch of times.
I've seen this happen many times here on HN where the one accused comes back and says that they did in fact write it themselves.
Using an LLM to generate a post with the implication it is the author's own thoughts is the quintessential definition of intellectual laziness.
One might as well argue that plagiarism is perfectly fine when writing a paper in school.
You are talking about an entirely different situation that I purposely avoided in my comment.
If its output isn't massaged by a team, then I appreciate the callouts until the stack is mature/proven. Doesn't make it better/worse...just a different level of scrutiny.
I can't speak for the author, but I do often do this. IMO it's a misleading comparison, though: you don't have to debug those things because the compiler rarely outputs code that is incorrect relative to the code you provided; it's not so simple for an LLM.
Imagine saying existing human languages like English are “inefficient” for LLMs so we need to invent a new language. The whole thing LLMs are good at is producing output that resembles their training data, right?
maybe AI should write code that is more readable than what humans write: more consistent naming, clearer structure, better comments. precisely because humans only "skim", optimize for skimmability and debuggability, not keystroke efficiency.
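a toy contrast (purely illustrative) of the same logic written for keystrokes vs. for skimming:

    # keystroke-efficient
    def f(xs, t):
        return [x for x in xs if x.balance > t]

    # skimmable: descriptive names, a docstring, one obvious purpose
    def filter_accounts_over_threshold(accounts, minimum_balance):
        """Return only the accounts whose balance exceeds minimum_balance."""
        return [account for account in accounts
                if account.balance > minimum_balance]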
> "But I need to debug!"
> Do you debug JVM bytecode? V8's internals? No. You debug at your abstraction layer. If that layer is natural language, debugging becomes: "Hey Claude, the login is failing for users with + in their email."
Folks can get away without reading assembly only when the compiler is reliable. English -> code compilation by LLMs is not reliable. It will become more reliable, but (a) it isn't now, so I guess this is a project to "provoke thought"; (b) you're going to need several nines of reliability, which I would bet against in any sane timeframe; (c) English isn't well specified enough to have "correct" compilation, so it's unclear whether "several nines of reliability" is even theoretically possible.
It seems like a short-sighted solution to a problem that is either transient or negligible in the long run. "Make code nearly unreadable to deal with inefficient tokenization and/or a weird cost model for LLMs."
I strongly question the idea that code can be effectively audited by humans if it can't be read by humans.
A programming language for LLMs isn't a bad idea, but this doesn't look like a good one.
Doesn't TypeScript have types? The example seems not to have types.
I’ve run into countless situations where this simply doesn’t work. I once had a simple off-by-one error and the AI could not fix it. I tried explaining the end result of what I was seeing, as implied by this example, with no luck. I then found why it was happening myself and explained the exact problem and where it was, and the AI still couldn’t do it. It was sloshing back and forth between various solutions and compounding complexity that didn’t help the issue. I ended up manually fixing the problem in the code.
The AI needs to be nearly flawless before this is viable. I feel like we are still a long way away from that.
> Do you debug JVM bytecode? V8's internals?
People do debug assembly generated by compilers to look for miscompilations, missed optimization opportunities, and comparison between different approaches.
The insight seems flawed. I think LLMs are just as capable of understanding these symbols as tokens as they are English words. I am not convinced that this is a better idea than writing code with a ton of comments
That sounds like step 2 before step 1. First you get complaints that login doesn’t work, then you find out it’s the + sign while you are debugging.
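For example, one common way that exact symptom shows up (just an illustration, not necessarily what the article had in mind) is form decoding that treats '+' as a space before the address ever reaches the login check:

    from urllib.parse import unquote, unquote_plus

    # Raw application/x-www-form-urlencoded field where the user typed
    # alice+test@example.com but the client never percent-encoded the '+'.
    value = "email=alice+test@example.com".split("=", 1)[1]

    print(unquote_plus(value))  # 'alice test@example.com' -> '+' became a space, lookup fails
    print(unquote(value))       # 'alice+test@example.com' -> '+' preserved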
My take on the timeline (roughly; I think some of them are in between, but it may be best not to be picky about it):
1950s: Machine code
1960s: Assembly
1970s: C
1980s: C++
1990s: Java
2000s: Perl / PHP / Python / Ruby
2010s: JavaScript / Frameworks
2020s: AI writes, humans review
But the idea is quite clear once we have written this out: we are moving to a higher level of abstraction every 10 years. In essence we are moving in a Low Code / No Code direction.
The idea of languages for AI-assisted programming isn't new. I have heard at least a few people say maybe this will help Ruby (or Nim), or a programming language that most closely resembles English.
And considering we are now reading code more than ever and writing it less, since we are mostly reviewing LLM output, I wonder whether this will also change the patterns or preferences for code output.
I think we are in a whole different era now, and a lot of old assumptions we have about PLs may need a rethink. Could procedural programming and Pascal make a comeback, or could we see a resurgence of Smalltalk-style OO programming?
gnanagurusrgs•2h ago
40% of code is now machine-written. That number's only going up. So I spent some weekends asking: what would an intermediate language look like if we stopped pretending humans are the authors?
NERD is the experiment.
Bootstrap compiler works, compiles to native via LLVM. It's rough, probably wrong in interesting ways, but it runs. Could be a terrible idea. Could be onto something. Either way, it was a fun rabbit hole.
Contributors welcome if this seems interesting to you - early stage, lots to figure out: https://github.com/Nerd-Lang/nerd-lang-core
Happy to chat about design decisions or argue about whether this makes any sense at all.
wmoxam•2h ago
How did you arrive at that number?
wilsonnb3•2h ago
How much of the code is read by humans, though? I think using languages that LLMs work well with, like TS or Python, makes a lot of sense but the chosen language still needs to be readable by humans.
sublinear•53m ago
I've never had a good result. Just tons of silent bugs that are obvious to those experienced with Python, JS/TS, etc. and subtle to everyone else.
sublinear•22m ago
A poor craftsman may blame his tools, but some tools really are the wrong ones for the job.
tyre•50m ago
What about something like clojure? It’s already pretty succinct and Claude knows it quite well.
Plus there are heavily documented libraries that it knows how to use and are in its training data.