This is NOT a quick, one-off vibe-coded/LLM-coded project.
I started this project out of curiosity: I wanted to build an RL-based game, since there are very few out there (e.g. Sony's GT Sophy), and I wanted to learn the core RL fundamentals in a practical, useful way.
I built on top of Puffer, but the training speed wasn't up to my needs, so I rewrote the core as a ground-up native eval/training loop with multithreaded GPU batching (it's gonna be part of the next Puffer release). [ Unaffiliated plug: Puffer is an excellent OSS library - check out https://puffer.ai ]
I trained the RL agent using curriculum learning + self-play. The demo showcases self-play as well: you can play against yourself, just like an RL agent would!
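For anyone unfamiliar with the combination, here is a minimal Python sketch of how curriculum learning and self-play can fit together. This is not the project's actual code: the class name `SelfPlayCurriculum`, the win-rate threshold, the window size, and the snapshot interval are all hypothetical choices made for illustration, and the policy is treated as an opaque object.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SelfPlayCurriculum:
    """Illustrative only: tracks a curriculum stage and a pool of past policies."""
    stages: list                 # hypothetical difficulty levels, easiest first
    promote_at: float = 0.7      # assumed win rate needed to advance a stage
    window: int = 20             # assumed number of recent episodes to average
    snapshot_every: int = 50     # assumed episodes between policy snapshots
    stage: int = 0
    pool: list = field(default_factory=list)     # frozen past policies
    results: list = field(default_factory=list)  # recent win/loss outcomes
    episodes: int = 0

    def sample_opponent(self, rng):
        # Self-play: the opponent is an earlier snapshot of the same agent.
        # Returns None until the first snapshot exists.
        return rng.choice(self.pool) if self.pool else None

    def record(self, won, current_policy):
        self.episodes += 1
        self.results = (self.results + [bool(won)])[-self.window:]

        # Periodically freeze the current policy into the opponent pool.
        # A real trainer would store a copy of the weights here.
        if self.episodes % self.snapshot_every == 0:
            self.pool.append(current_policy)

        # Curriculum: move to the next stage once the recent win rate is
        # high enough, then reset the window for the new stage.
        window_full = len(self.results) == self.window
        last_stage = self.stage == len(self.stages) - 1
        if window_full and not last_stage:
            if sum(self.results) / self.window >= self.promote_at:
                self.stage += 1
                self.results = []


curriculum = SelfPlayCurriculum(stages=["easy", "medium", "hard"],
                                promote_at=0.75, window=4, snapshot_every=2)
rng = random.Random(0)
for episode in range(4):
    opponent = curriculum.sample_opponent(rng)  # None for the first episodes
    curriculum.record(won=True, current_policy=f"policy@{episode + 1}")
print(curriculum.stages[curriculum.stage], curriculum.pool)
```

Sampling opponents from a pool of older snapshots, rather than always facing the latest policy, is one common way to keep self-play from overfitting to a single opponent.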
Technical details are in my blog post, linked above.