The problem wasn't capability — it was accountability. An agent would make a choice buried in a 50-file commit, and I'd only find out weeks later when something broke. No trace of which agent did what, when, or based on what context.
So I built a governance layer on top. The core idea: every agent decision gets recorded in an append-only receipt ledger (NDJSON). Each receipt links a specific agent action to a git commit, a dispatch ID, and a quality verdict. The orchestrator (T0) reviews receipts and decides what happens next — approve, hold, or redispatch.
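To make the receipt idea concrete, here's a simplified sketch of an append-only NDJSON ledger. This is not the project's actual code: the field names, the verdict values (`pass`, `fail`, `blocked`), and the verdict-to-action mapping are all assumptions for illustration.

```python
import json
import time
from pathlib import Path


def append_receipt(ledger: Path, agent: str, dispatch_id: str,
                   commit: str, verdict: str) -> dict:
    """Append one receipt as a single JSON line (hypothetical schema)."""
    receipt = {
        "ts": time.time(),
        "agent": agent,
        "dispatch_id": dispatch_id,
        "commit": commit,
        "verdict": verdict,
    }
    # Append mode only ever adds lines; existing receipts are never rewritten.
    with ledger.open("a", encoding="utf-8") as f:
        f.write(json.dumps(receipt) + "\n")
    return receipt


def read_receipts(ledger: Path) -> list[dict]:
    """Load every receipt, one JSON object per non-empty line."""
    with ledger.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def next_action(receipt: dict) -> str:
    """Assumed T0 policy: map a quality verdict to approve/hold/redispatch."""
    mapping = {"pass": "approve", "blocked": "hold", "fail": "redispatch"}
    # Unknown verdicts are held for review rather than silently approved.
    return mapping.get(receipt["verdict"], "hold")
```

One JSON object per line means the ledger can be tailed, grepped, and diffed with ordinary tools, which fits the filesystem-only design.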
Some things I learned:
1. Sub-agents are a black box. I never use them. When a bug surfaces, you can't trace which agent's context was polluted. Instead, I run independent agents in separate terminals with their own context windows, reporting back to T0.
2. Quality gates need to be deterministic, not LLM-based. An automated advisory checks every completion against pre-registered rules (file size limits, test coverage, open blockers). The LLM proposes, the gate validates. No vibes.
3. Context rotation is unsolved by the ecosystem. When an agent fills its context window mid-task, most workflows just fail. I built an automated rotation pipeline using Claude Code hooks: it detects context usage, writes a structured handover, clears the window, and resumes. Zero human intervention.
4. The receipt ledger is the most valuable artifact. After 1100+ entries, patterns emerge: which types of tasks fail, which agents struggle with what, where context pollution happens. That data feeds back into dispatch planning.
5. Terminal locking prevents chaos. Each terminal can only work on one dispatch at a time. Sounds obvious, but without it you get overlapping work, merge conflicts, and agents overwriting each other's changes.
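For point 2, a deterministic gate can be as small as a list of named predicates over the completion's metrics. The sketch below is illustrative only: the `Completion` fields and the thresholds (500 lines, 80% coverage) are assumptions, not the project's real rules.

```python
from dataclasses import dataclass


@dataclass
class Completion:
    max_file_lines: int    # largest file touched, in lines
    test_coverage: float   # fraction between 0.0 and 1.0
    open_blockers: int     # unresolved blockers on the dispatch


# Pre-registered rules: (name, predicate). Thresholds are hypothetical.
RULES = [
    ("file_size", lambda c: c.max_file_lines <= 500),
    ("coverage", lambda c: c.test_coverage >= 0.80),
    ("blockers", lambda c: c.open_blockers == 0),
]


def run_gate(c: Completion) -> tuple[bool, list[str]]:
    """Return (passed, names of failed rules). No model call is involved."""
    failed = [name for name, check in RULES if not check(c)]
    return (not failed, failed)
```

The same input always produces the same verdict, so a failed gate can be replayed and debugged later from the receipt alone.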
The system runs across 4 tmux panes (T0 orchestrator + 3 worker tracks), supports multiple AI providers, and everything is filesystem-based — no database, no cloud dependency. Open-sourced it recently.
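Since everything is filesystem-based, the one-dispatch-per-terminal rule can be enforced with a plain lock file. Here's a minimal sketch, assuming one `<terminal>.lock` file per worker in a shared directory; the function names and file layout are made up for illustration and are not taken from the project.

```python
import os
from pathlib import Path


def acquire_lock(lock_dir: Path, terminal: str, dispatch_id: str) -> bool:
    """Try to claim a terminal for a dispatch. Returns False if it is busy."""
    lock_dir.mkdir(parents=True, exist_ok=True)
    lock_path = lock_dir / f"{terminal}.lock"
    try:
        # O_CREAT | O_EXCL makes creation fail if the file already exists,
        # so only one caller can win the lock.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(dispatch_id)  # record which dispatch holds the terminal
    return True


def release_lock(lock_dir: Path, terminal: str) -> None:
    """Free the terminal; a missing lock file is not an error."""
    (lock_dir / f"{terminal}.lock").unlink(missing_ok=True)
```

A second `acquire_lock` call for the same terminal returns `False` until `release_lock` runs, which is what keeps two dispatches from landing on one worker.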
Happy to answer questions about the architecture or specific failure modes.