The initial goal was to get a local agent to solve a small maze using some benchtop hardware. The agent observes the maze through a webcam, decides its next move, and calls a hardware tool to move.
When something goes wrong, it's hard to understand why. You usually end up staring at a huge JSON log of prompts, tool calls, and responses.
So I started building a trace harness and an openclaw-specific shim to capture structured events from the agent runtime.
Instead of just logging everything, the tool emits clean execution boundaries.
observe → reason → act → result
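To make the idea concrete, here is a minimal sketch of what a structured trace event might look like. The names (`TraceEvent`, `emit`, the payload fields) are hypothetical illustrations, not the harness's actual API; the point is that each boundary becomes a typed record rather than a blob in a JSON log.

```python
import json
import time
from dataclasses import dataclass, field, asdict

@dataclass
class TraceEvent:
    phase: str    # one of "observe", "reason", "act", "result"
    step: int     # which loop iteration this event belongs to
    payload: dict # phase-specific data (frame path, plan, tool args, ...)
    ts: float = field(default_factory=time.time)

def emit(events: list, phase: str, step: int, payload: dict) -> TraceEvent:
    """Append one structured event to the trace."""
    ev = TraceEvent(phase, step, payload)
    events.append(ev)
    return ev

# One full loop iteration of the maze agent, as four clean boundaries:
events: list[TraceEvent] = []
emit(events, "observe", 0, {"frame": "maze_000.png"})
emit(events, "reason", 0, {"plan": "move north"})
emit(events, "act", 0, {"tool": "move", "args": {"dir": "N"}})
emit(events, "result", 0, {"ok": True, "pos": [0, 1]})

print(json.dumps([asdict(e) for e in events], indent=2))
```

Because every event carries a step index and a phase, a trace can be sliced, filtered, and compared per iteration instead of being read end to end.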
Once those boundaries exist, a run starts behaving like a program execution, which enables things like replaying from a given step, diffing two runs to see where they diverge, and mutating payloads or branching from a known-good path.
It's still early, but it's been an interesting way to think about debugging agent systems.
The repo is docs-only for now, as I'm still migrating the codebase out of a larger project, but the docs explain usage well enough and include a few example provider shims (openai, langchain, openclaw) to get you started if you want to build your own.
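For a sense of what a provider shim does, here is a minimal sketch: wrap the provider's call function and emit events on either side of it. The class name, method, and event schema are all illustrative assumptions, not the repo's actual interface.

```python
class TracingShim:
    """Hypothetical shim: wraps a provider call and emits trace
    events around it. Names are illustrative, not the repo's API."""

    def __init__(self, call_fn, sink: list):
        self.call_fn = call_fn  # the underlying provider call
        self.sink = sink        # collects structured events
        self.step = 0

    def chat(self, messages: list[dict]) -> dict:
        # What the model sees going in:
        self.sink.append({"phase": "observe", "step": self.step,
                          "payload": {"messages": messages}})
        response = self.call_fn(messages)
        # What the model decided:
        self.sink.append({"phase": "reason", "step": self.step,
                          "payload": {"response": response}})
        self.step += 1
        return response

# Stand-in provider so the sketch is runnable without any SDK:
def fake_provider(messages: list[dict]) -> dict:
    return {"role": "assistant", "content": "move N"}

sink: list[dict] = []
shim = TracingShim(fake_provider, sink)
shim.chat([{"role": "user", "content": "where next?"}])
print([e["phase"] for e in sink])  # → ['observe', 'reason']
```

The agent code calls `shim.chat` exactly as it would call the provider, so tracing stays a wrapper concern rather than leaking into agent logic.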