λ-bench
A benchmark of 120 pure lambda calculus programming problems for AI models.
→ Live results
What is this?
λ-bench evaluates how well AI models can implement algorithms using pure lambda calculus. Each problem asks the model to write a program in Lamb, a minimal lambda calculus language, using λ-encodings of data structures to implement a specific algorithm.
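To make "λ-encodings of data structures" concrete, here is a minimal sketch of Church encodings written with Python lambdas. It is not Lamb syntax, and the encodings a given λ-bench problem specifies may differ; the names `zero`, `succ`, `add`, `mul`, `pair`, `fst`, `snd`, `to_int`, and `from_int` are chosen for illustration only.

```python
# Church numerals: the number n is a function that applies f to x n times.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
mul = lambda m: lambda n: lambda f: m(n(f))

# Church booleans double as selectors for Church pairs.
true = lambda a: lambda b: a
false = lambda a: lambda b: b
pair = lambda a: lambda b: lambda s: s(a)(b)
fst = lambda p: p(true)
snd = lambda p: p(false)


def to_int(n):
    """Decode a Church numeral by counting applications of +1 starting at 0."""
    return n(lambda k: k + 1)(0)


def from_int(k):
    """Encode a non-negative Python int as a Church numeral."""
    n = zero
    for _ in range(k):
        n = succ(n)
    return n
```

For example, `to_int(add(from_int(2))(from_int(3)))` decodes the Church numeral for 2 + 3, and `fst(pair("a")("b"))` selects the first component of a pair.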
The model receives a problem description, a data encoding specification, and test cases. It must return a single .lam program that defines @main. The program is then run against every input/output pair, and the problem counts as solved only if every test passes.
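The all-or-nothing scoring rule described above can be sketched as follows. This is an assumption-laden illustration, not the benchmark's actual harness: `is_solved` and `run_main` are hypothetical names, and `run_main` stands in for whatever evaluates a submitted program's @main on one input.

```python
from typing import Callable, Iterable, Tuple


def is_solved(
    run_main: Callable[[str], str],
    tests: Iterable[Tuple[str, str]],
) -> bool:
    """Return True only if every (input, expected) pair matches.

    run_main is a hypothetical stand-in for evaluating the submitted
    program on a single input and returning its output as a string.
    """
    return all(run_main(inp) == expected for inp, expected in tests)


# Usage with a toy stand-in evaluator (not a real Lamb interpreter):
toy_run = lambda s: s[::-1]  # pretends the program reverses its input
print(is_solved(toy_run, [("abc", "cba"), ("xy", "yx")]))
print(is_solved(toy_run, [("abc", "cba"), ("xy", "xy")]))
```

A single failing pair is enough to mark the problem unsolved, so there is no partial credit in this sketch.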
tromp•1h ago
An example task (writing a lambda calculus evaluator) can be seen at https://github.com/VictorTaelin/lambench/blob/main/tsk/algo_...