I built this because I wanted a very small, readable way to compare AI agent outputs locally. Most agent tools are powerful but heavy. This project intentionally avoids frameworks, configs, and abstractions. An "agent" is just a function, and a "scorer" is just a function. The goal is not to be production-ready, but to make experimentation and learning easy.
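To make the "just a function" idea concrete, here is a rough Python sketch. It is an illustration only, not the project's actual code: the names `Agent`, `Scorer`, `compare`, and the toy agents and scorer are all made up for this example, and the real signatures may differ.

```python
from typing import Callable

# Hypothetical type aliases for illustration; not the project's real API.
Agent = Callable[[str], str]          # prompt -> output
Scorer = Callable[[str, str], float]  # (prompt, output) -> score


def echo_agent(prompt: str) -> str:
    """Toy agent: returns the prompt unchanged."""
    return prompt


def shout_agent(prompt: str) -> str:
    """Toy agent: upper-cases the prompt and appends '!'."""
    return prompt.upper() + "!"


def length_scorer(prompt: str, output: str) -> float:
    """Toy scorer: longer outputs score higher (ignores the prompt)."""
    return float(len(output))


def compare(agents: dict[str, Agent], scorer: Scorer, prompt: str) -> dict[str, float]:
    """Run every agent on the same prompt and score each output."""
    return {name: scorer(prompt, agent(prompt)) for name, agent in agents.items()}


scores = compare({"echo": echo_agent, "shout": shout_agent}, length_scorer, "hello")
print(scores)  # {'echo': 5.0, 'shout': 6.0}
```

Under these assumptions, swapping in a real model call or a different metric only means passing a different function; there is nothing to subclass or configure.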
Feedback welcome, especially on what’s missing or what should stay out.