The article mentions a REPL skill. I don’t do that: letting model+tools run sbcl is sufficient.
I haven't tried integrating it into a REPL or even command-line tools though. The LLM can't experience the benefit of a REPL, so it makes sense that it struggled with it and preferred feeding entire programs into sbcl each time.
If you take a hard look at that workflow, it implies a high degree of incompetence on the part of humans: the reason we generally don’t write thousands of lines without any automated feedback is because our mistake rate is too high.
I learned Common Lisp years ago while working in the AI lab at the University of Toronto, and parts of this article resonated strongly with me.
However, if you abandon the idea of REPL-driven development, then the frontier models from Anthropic and OpenAI are actually very capable of writing Lisp code. They sometimes struggle with editing it (messing up parens), but usually the first pass is pretty good.
I've been on an LLM kick the past few months, and two of my favorite AI-coded (mostly) projects are, interestingly, REPL-focused. icl (https://github.com/atgreen/icl) is a TUI and browser-based front end for your CL REPL designed to make REPL programming for humans more fun, whether you use it stand-alone, or as an Emacs companion. Even more fun is whistler (https://github.com/atgreen/whistler), which allows you to write/compile/load eBPF code in lisp right from your REPL. In this case, the AI wrote the highly optimizing SSA-based compiler from scratch, and it is competitive against (and sometimes beating) clang -O2. I mean... I say the AI wrote it... but I had to tell it what I wanted in some detail. I start every project by generating a PRD, and then having multiple AIs review that until we all agree that it makes sense, is complete enough, and is the right approach to whatever I'm doing.
I proceeded to spend about 45 minutes configuring Emacs. Not because Claude struggled with it, but because Claude was amazing at it and I just kept pushing it well beyond sane default territory. It was weirdly enthralling to have Claude nail customizations that I wouldn't have even bothered trying back in the day due to my poor elisp skills. It was a genuinely fun little exercise. But I went back to VS Code.
E.g. I work on a huge monorepo at this new company, and Emacs TRAMP was super slow to work with. With the help of Claude, I figured out which packages were making it worse, added some optimizations (Magit, Project Find File), hot-loaded caching for some heavyweight operations (e.g. listing all files in the project) without making any changes to the packages themselves, and added keybindings to my minibuffer map so that while listing files I can quickly add filters for the subproject I'm on. I could probably have done all this earlier as well, but it would definitely have taken much longer, as I was never deep into the elisp ecosystem.
This is definitely partly training data, but if you give an LLM a simple language to use on the fly it can usually do ok. I think the real problem is complexity.
Go and Java require very little mental modelling of the problem, everything is written down on the page really quite clearly (moreso with Go, but still with Java).
In GCL, however, the semantics are _weird_ and the scoping is unlike most languages, because it's designed for DSLs. Writing DSL content requires little thought from humans, but authoring DSLs requires a fair amount of mental modelling about the structure of the data that is not present on the page. I'd wager that Lisp is similar: more of a mental model is required.
The problem is of course that LLMs don't have a mental model, or at least what they do have is far from what humans have. This is very apparent when doing non-trivial code, non-CRUD, non-React, anything that requires thinking hard about problems more than it requires monkeys at typewriters.
The main problem is the dynamic scoping (as opposed to lexical scoping like most languages), and the fact that lots of things are untyped and implicitly referenced.
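To make the scoping point concrete, here is a minimal Python sketch that emulates dynamic scoping with an explicit binding stack. This is not GCL or Lisp, and the names `binding`, `lookup`, `greeting`, and `demo` are hypothetical helpers invented for illustration. The point it tries to show is that the value of `user` inside `greeting` is decided by whoever is calling it at runtime, not by anything visible in the function's own text.

```python
from contextlib import contextmanager

# Hypothetical helpers for illustration only; not a GCL or Lisp API.
_frames = [{}]  # stack of dynamic binding frames, innermost last


@contextmanager
def binding(**names):
    """Bind names for the duration of the with-block, including all callees."""
    _frames.append(names)
    try:
        yield
    finally:
        _frames.pop()


def lookup(name):
    """Resolve a name by walking the call-time binding stack, innermost first."""
    for frame in reversed(_frames):
        if name in frame:
            return frame[name]
    raise NameError(name)


def greeting():
    # Nothing in this function says where "user" comes from:
    # it depends entirely on the bindings active in the caller.
    return "hello, " + lookup("user")


def demo():
    with binding(user="alice"):
        outer = greeting()
        with binding(user="bob"):  # shadows alice for everything called inside
            inner = greeting()
        restored = greeting()      # bob's binding is gone again
    return outer, inner, restored
```

With lexical scoping a reader (or a model) can resolve `user` by looking upward in the source text. Here the same function returns different results depending on the call stack, which is the kind of off-the-page state the comment is describing.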
This is a weird moment in time where proprietary technology can hurt more than it can help, even if it's superior to what's available in public in principle.
Damn. And here I have a Gemini Pro subscription sitting unused for a year now.
I created Schematra[1] and also a schematra-starter-kit[2] that can be spun up from Claude to create a project and get you ready in less than 5 minutes. I've created 10+ side projects this way and it's been a great joy. I even added a Scheme reviewer agent that is extremely strict and focuses on Scheme best practices (it's all in the starter kit, btw).
I don't think the lack of training material makes LLMs poor at writing lisp. I think it's the lack of guidelines, and if you add enough of them, the fact that lisp has inherently such a simple pattern & grammar that it makes it a prime candidate (IMO) for code generation.
There are some issues of course. Sometimes Claude Code gets into a "parenthesis counting loop", which is somewhat hilarious, but luckily this doesn't happen too often for me. In the worst case I fix the problematic fragment myself and then let it continue. But overall I'd say Claude Code is not bad at all with Lisps.
However, a large part of OP is about REPLs and on that I've also had a hard time with CC. I was working on it this evening in fact, and while I got something running, it's clunky and slow.
It's tough to steal what doesn't exist.
> but AI can write hundreds of lines in one go so that it just makes sense for the AI to use a language that doesn't use the REPL. It is orders of magnitude easier and cheaper to write in high-internet-volume languages like Go and Python
Python doesn't have a REPL?
Not really in the Lisp sense. If you consider how people typically develop and modify Python code (edit file -> run from beginning -> observe effects -> start over) and how people typically develop Lisp code (rarely do "start over" and "run from beginning" happen) it becomes obvious. Most Python development resembles Go or C++, you just get to skip the explicit "compile" step and go straight to "run". The Python "REPL" is nice for little snippets and little bits of interactive modification but the experience compared to Lisp isn't the same (and I think the experience is actually better/closer to Lisp in Java, with debug mode and JRebel).
That's what you get with every language. So, not much to really be disappointed by in terms of Lisp performance.
You guys are depressing.
We should be using LLMs to translate from (fuzzy) human specifications to formal specifications (potentially resolving contradictions), and then solving the resulting logic problem with a proper reasoning algorithm. That would also guarantee correctness.
LLMs are a "worse is better" kind of solution.
Agreed! This is why having LLMs write assembly or binary, as people suggest, is IMO moving in the wrong direction.
> then solving the resulting logic problem with a proper reasoning algorithm. That would also guarantee correctness.
Yes! I.e. write in a high-level programming language, and have a compiler, the reasoning algorithm, output binary code.
It seems like we're already doing this!
Now is the time to switch to a popular language and let the machines wrangle it for you. With more training data available, you'll be far more productive in JavaScript than you ever were in Lisp.
Yep. Language and libraries too.
TacticalCoder•8h ago
Some are going to nitpick that Clojure isn't as lispy as, say, Common Lisp but I did experiment with Claude Code CLI and my paid Anthropic subscription (Sonnet 4.6 mostly) and Clojure.
It is okay'ish. I got it to write a topological sort and pure (no side effect) functions taking in and returning non-totally-trivial data structures (maps in maps with sets and counters etc.). But apparently it's got problems with...
... drumroll ...
The number of parentheses. It's so bad that the author of figwheel (a successful ClojureScript project) is working on a Clojure MCP that fixes parens in Clojure code spouted by AI (well, the project does more than that, but the description literally says it's "designed to handle Clojure parentheses reliably").
You can't make that up: there's literally an issue with the number of closing parens.
Now... I don't think giving an AI access to a Lisp REPL and telling it: "Do this by bumping on the guardrails left and right until something is working" is the way to go (yet?) for Clojure code.
I'm passing it a codebase (not too big, so no context size issue) and I know what I want: I tell it "Write a function which takes this data structure in and that other parameter, the function must do xxx, the function must return the same data structure out". Before that I told it to also implement tests (relatively easy, since they're pure functions) for each function it writes and to run the tests after each function it implements or modifies.
And it's doing okay.
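For readers who want to picture that workflow, here is a rough sketch in Python rather than Clojure. It is an assumption about the general shape, not the code from the comment: `topo_sort` is a hypothetical pure function (Kahn's algorithm) that takes a map of node to dependency set, returns a new list, and leaves its input untouched, which is what makes it easy to test right after it is written.

```python
from collections import deque


def topo_sort(deps):
    """Return nodes ordered so that every node comes after its dependencies.

    deps maps each node to the set of nodes it depends on.
    The input is not mutated. Raises ValueError if there is a cycle.
    """
    # Collect every node, including ones that only appear as a dependency.
    nodes = set(deps)
    for requirements in deps.values():
        nodes |= set(requirements)

    # Work on private copies so the caller's data structure stays unchanged.
    remaining = {n: set(deps.get(n, ())) for n in nodes}
    dependents = {n: set() for n in nodes}
    for node, requirements in remaining.items():
        for req in requirements:
            dependents[req].add(node)

    # Sorting keeps the output deterministic, which simplifies testing.
    ready = deque(sorted(n for n, reqs in remaining.items() if not reqs))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for dependent in sorted(dependents[node]):
            remaining[dependent].discard(node)
            if not remaining[dependent]:
                ready.append(dependent)

    if len(order) != len(nodes):
        raise ValueError("cycle detected")
    return order


def test_topo_sort():
    deps = {"app": {"lib", "util"}, "lib": {"util"}, "util": set()}
    snapshot = {k: set(v) for k, v in deps.items()}
    assert topo_sort(deps) == ["util", "lib", "app"]
    assert deps == snapshot  # purity check: input unchanged
```

Because the function has no side effects, the test needs no setup or teardown, so running it after every change is cheap.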
tasty_freeze•7h ago
As an example, I asked Claude 3.5, back when that was the latest, to indent all the code in my file by four more spaces. The file was about 700 lines long. I got a busy spinner for two minutes, then it said, "OK, first 50 lines done, now I'll do the rest". I got another busy spinner, and then it said, "this is taking too long. I'm going to write a program to do it", which of course it had no problem doing. The point is that it is superhuman at some things and completely brain-dead about others, and counting parens is one of those things I wouldn't expect it to be good at.
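For reference, the "write a program to do it" route is tiny. This is a sketch of what such a program might look like in Python, not what Claude actually produced; `indent_four` is a made-up name, while `textwrap.indent` is the standard library function.

```python
import textwrap


def indent_four(source: str) -> str:
    """Prefix every non-blank line of source with four spaces."""
    # textwrap.indent skips whitespace-only lines by default,
    # so blank lines do not pick up trailing spaces.
    return textwrap.indent(source, "    ")


sample = "def f():\n    return 1\n\nprint(f())\n"
indented = indent_four(sample)
```

A 700-line file is the same call on the file's contents, which is why the tool-writing path finishes instantly while the token-by-token path stalls.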
nextos•6h ago
smackeyacky•5h ago
Edit: working on a lot of legacy code that needs boring refactoring, which Claude is great at.
lagniappe•6h ago
surround•6h ago
> Are the parentheses in ((((()))))) balanced?
There was a thread about this the other day [1]. It's the same issue as "count the r's in strawberry." Tokenization makes it hard to count characters. If you put that string into OpenAI's tokenizer, [2] this is how they are grouped:
Token 1: ((((
Token 2: ()))
Token 3: )))
Which of course isn't at all how our minds would group them together in order to keep track of them.
[1] https://news.ycombinator.com/item?id=47615876
[2] https://platform.openai.com/tokenizer
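By contrast, a character-level check is trivial for ordinary code, which is one reason handing paren counting to a tool tends to work better than asking the model directly. A minimal sketch in Python (the function name `paren_balance` is hypothetical, and it ignores strings and comments, which a real Lisp reader would have to handle):

```python
def paren_balance(text):
    """Return (balanced, depth) for the parentheses in text.

    depth is the nesting level at the point scanning stopped:
    negative means a ")" appeared with nothing left to close,
    positive means some "(" were never closed.
    """
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # One closing paren too many; stop at the first offender.
                return False, depth
    return depth == 0, depth


result = paren_balance("((((())))))")  # the string from the quoted question
```

For the quoted string, which has five opening and six closing parentheses, this returns `(False, -1)`.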
frwrfwrfeefwf•5h ago
surround•5h ago
Our brains also process text entire words at a time, not letter-by-letter. The difference is that our brains are much more flexible than a tokenizer, and we can easily switch to letter-by-letter reading when needed, such as when we encounter an unfamiliar word.
otterley•5h ago
surround•5h ago
xigoi•4h ago
sph•55m ago
ksaj•2h ago
Try to get your favourite LLM to read the time from a clock face. It'll fail ridiculously most of the time, and come up with all kinds of wonky reasons for the failures.
It can code things that it's seen the logic for before. That's not the same as counting. That's outputting what it's previously seen as proper code (and even then it often fails, probably 'cos there's a lot of crap code out there).
whartung•6h ago
Things, on the whole, were fine, save for the occasional, rogue (or not) parentheses.
The AI would just go off the rails trying to solve the problem. I told it that if it ever encountered the problem to let me know and not try to fix it, I’d do it.
mark_l_watson•1h ago