Zeno is an open-source toolkit for verifiable, deterministic reward functions for RL on LLMs.
While the initial release focuses on Python code generation, the goal is broader: make RL reward design for LLMs transparent, modular, and extensible across domains (math, retrieval, reasoning, tool use, etc.).
What's in Zeno for now?
- Auditable, stateless reward functions for Python code: docstrings, ruff linting, type hints, recursion, and more
- Works directly with Hugging Face's TRL or any RL loop: plug reward functions in as needed
- MIT licensed and minimal
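To make the "stateless reward function" idea concrete, here is a minimal sketch of the pattern, not Zeno's actual API: a deterministic docstring check built on the standard library's `ast` module, wrapped in the list-in, list-out shape that TRL's `GRPOTrainer` accepts for custom reward functions. The names `docstring_reward` and `reward_fn` are illustrative.

```python
import ast


def docstring_reward(completion: str) -> float:
    """Score 1.0 if every function in the completion has a docstring, else 0.0.

    Deterministic and stateless: the same input always yields the same score,
    so the reward is fully auditable.
    """
    try:
        tree = ast.parse(completion)
    except SyntaxError:
        return 0.0  # unparseable code earns no reward
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    if not funcs:
        return 0.0
    return float(all(ast.get_docstring(f) is not None for f in funcs))


def reward_fn(completions: list[str], **kwargs) -> list[float]:
    # TRL's GRPOTrainer calls custom reward functions with a batch of
    # completions and expects one float per completion back.
    return [docstring_reward(c) for c in completions]
```

Because the function is pure, it can be unit-tested in isolation and composed with other checks (linting, type hints) by summing or weighting their scores.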
Roadmap: Python code is just the starting point. Extensions for math problem solving, planning, and agentic behaviors are planned next.
Repo: https://github.com/think-a-tron/zeno
Docs and more details are in the README.
Comments, critiques, and real-world use cases encouraged, especially if you want to push beyond code.