The workflow I keep coming back to: write a permissive policy, let the agent run for a week, then tighten the rules and replay the old sessions to see what would have been blocked. Much better than guessing at policy up front, and it’s the part of the tool I didn’t expect to use as much as I do.
Every gated decision gets written to a JSONL trace, so you can grep, diff, or feed traces back through a stricter policy without re-running the agent. There’s also a TUI for browsing sessions, inspecting individual gate decisions, and stepping through replays interactively, which makes it easier to spot patterns across runs.
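To make the replay idea concrete, here is a rough sketch of re-evaluating recorded decisions against a stricter rule set. The trace fields (`session`, `tool`, `target`, `decision`) and the deny-rule format are made up for illustration and are not the tool's actual schema or policy language.

```python
import json
from fnmatch import fnmatch

# Hypothetical trace: one JSON object per line, as recorded under the
# original permissive policy. Field names are assumptions for this sketch.
SAMPLE_TRACE = """\
{"session": "s1", "tool": "shell", "target": "git status", "decision": "allow"}
{"session": "s1", "tool": "shell", "target": "rm -rf build", "decision": "allow"}
{"session": "s2", "tool": "write_file", "target": ".env", "decision": "allow"}
{"session": "s2", "tool": "read_file", "target": "README.md", "decision": "allow"}
"""

# Hypothetical stricter policy: (tool pattern, target pattern) pairs to deny.
STRICT_DENY = [
    ("shell", "rm *"),
    ("write_file", ".env*"),
]


def decide(event, deny_rules):
    """Return "deny" if any rule matches the event, otherwise "allow"."""
    for tool_pat, target_pat in deny_rules:
        if fnmatch(event["tool"], tool_pat) and fnmatch(event["target"], target_pat):
            return "deny"
    return "allow"


def replay(lines, deny_rules):
    """Yield recorded events whose decision would change under deny_rules."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the trace
        event = json.loads(line)
        new_decision = decide(event, deny_rules)
        if new_decision != event["decision"]:
            yield {**event, "new_decision": new_decision}


changed = list(replay(SAMPLE_TRACE.splitlines(), STRICT_DENY))
for e in changed:
    print(f'{e["session"]} {e["tool"]} {e["target"]!r}: '
          f'{e["decision"]} -> {e["new_decision"]}')
```

Since the replay only reads the recorded events, the agent never runs again. Tightening a rule and re-checking a week of sessions is just another pass over the file.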
Currently works with Claude Code and MCP-based clients like Codex.
Still a WIP and mostly a project for myself, but figured others experimenting with coding agents might find it interesting.