Between us we've solved 80 of the 149 levels. It didn't come free: I had to hand-hold plenty of them or play through them myself, and for the nasty ones I had Claude build a solver that watches YouTube speedruns and rebuilds the moves frame by frame (oddly satisfying to watch: https://www.youtube.com/watch?v=6wndAf4EXNc).
It also built a full level editor (https://claudes-challenge.vercel.app/?level=1#editor) and a replay viewer to watch the solved levels back (https://claudes-challenge.vercel.app/replay.html).
Code: https://github.com/blumk/claudes-challenge
Obligatory IP note: this is someone else's game. I'm assuming it's effectively abandonware but I honestly don't know, so the site might have to come down at some point. The repo is stripped of all the original art and assets, code only.
vunderba•1h ago
Regarding the verifier that plays against the live engine, I’ve approached the problem from a similar angle: my LLM agents borrow a page from the speedrunning community's tool-assisted speedruns (TAS), and the LLM gets access only to a virtualized game controller.
[1] - https://store.steampowered.com/app/346850/Chips_Challenge_1
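For anyone wondering what a controller-only, TAS-style interface might look like, here is a minimal sketch. It is not the actual setup described above: `ToyGame`, `VirtualController`, and `replay` are hypothetical names, and the "emulator" is just a position on an open grid. The point is that the agent can only press buttons for a number of frames, and every press lands in an input log that can be replayed deterministically.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Tuple


class Button(Enum):
    # Each button maps to a (dx, dy) move in the toy game.
    UP = (0, -1)
    DOWN = (0, 1)
    LEFT = (-1, 0)
    RIGHT = (1, 0)


class ToyGame:
    """Hypothetical stand-in for an emulator: a player on an open grid."""

    def __init__(self) -> None:
        self.pos: Tuple[int, int] = (0, 0)
        self.frame = 0

    def step(self, button: Button) -> None:
        # Advance exactly one frame with the given button held.
        dx, dy = button.value
        self.pos = (self.pos[0] + dx, self.pos[1] + dy)
        self.frame += 1


@dataclass
class VirtualController:
    """The agent's only handle on the game: press a button for N frames."""

    step: Callable[[Button], None]
    log: List[str] = field(default_factory=list)

    def press(self, button: Button, frames: int = 1) -> None:
        for _ in range(frames):
            self.step(button)
            self.log.append(button.name)  # one log entry per frame


def replay(log: List[str], step: Callable[[Button], None]) -> None:
    """Feed a recorded input log back into a fresh game, frame by frame."""
    for name in log:
        step(Button[name])


game = ToyGame()
pad = VirtualController(game.step)
pad.press(Button.RIGHT, frames=3)
pad.press(Button.DOWN, frames=2)

fresh = ToyGame()
replay(pad.log, fresh.step)
print(game.pos, fresh.pos, pad.log)
```

Keeping the log per frame rather than per action is what makes the replay frame-exact, which is the property TAS tooling relies on.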
kenblum•1h ago
Curious about your agent setup though. Any public repo?
vunderba•1h ago
I don't have a GH repo up for the TAS system yet - it's a bespoke mess right now since it was built with the old game "Castle of the Winds" in mind, but I'll definitely consider putting one up in the future!
https://en.wikipedia.org/wiki/Castle_of_the_Winds