Show HN: Glass box governance for multi-agent AI coding workflows

https://github.com/Vinix24/vnx-orchestration

1•vincentvandeth•1h ago

Comments

vincentvandeth•1h ago

Hey HN. I've been running multi-agent AI coding workflows in production for 6 months now, and VNX is the governance system I built to make it actually work. The problem isn't getting AI agents to write code — it's knowing when they went wrong, why, and preventing the same failure next time.

Every multi-agent framework I tried solved the demo but collapsed in production: no audit trail, no way to scope tasks, no quality enforcement, and when something broke three agents deep, no way to trace it.

VNX is a different approach. Four components, all filesystem-based:

1. Dispatch queue — T0 (orchestrator) breaks work into scoped tasks (150-300 lines max) and routes them to worker terminals. Each terminal runs its own AI CLI (Claude Code, Codex CLI, or Gemini CLI) with its own context window. No shared state between agents.

2. Receipt ledger — Every agent completion produces an append-only NDJSON receipt: what was dispatched, what was produced, which git commit, which files changed, duration, cost. After 1100+ entries, patterns emerge that you can't see any other way — which task types fail most, which agents struggle with which skills, where context pollution actually happens.

3. Quality gates — Deterministic, not LLM-based. The agent proposes, the gate validates: file size limits, test coverage thresholds, open blocker counts. Verdicts are APPROVE, HOLD, or ESCALATE. The LLM never decides whether its own work is good enough.

4. Context rotation — When an agent's context window fills up mid-task, a 3-hook pipeline detects it at 65%, has the agent write a structured handover, clears the session via tmux, and resumes with a fresh context window. Zero lost work, zero human intervention.
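To make the receipt ledger and the quality gates concrete, here is a minimal sketch in Python. It is not taken from the VNX repo: the field names (`open_blockers`, `max_file_lines`, `coverage`), the threshold values, and the mapping from failed checks to HOLD or ESCALATE are all assumptions for illustration. Only the append-only NDJSON format and the three verdicts come from the description above.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical thresholds; the post names the gate inputs but not their values.
MAX_FILE_LINES = 300
MIN_COVERAGE = 0.80


def append_receipt(ledger: Path, receipt: dict) -> None:
    """Append one receipt as a single NDJSON line; earlier lines are never rewritten."""
    with ledger.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(receipt, sort_keys=True) + "\n")


def read_receipts(ledger: Path) -> list[dict]:
    """Load every receipt in the order it was appended."""
    with ledger.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def gate(receipt: dict) -> str:
    """Return a verdict from measured values only; no model call is involved."""
    # Assumed policy: open blockers escalate, other failed checks hold.
    if receipt["open_blockers"] > 0:
        return "ESCALATE"
    if receipt["max_file_lines"] > MAX_FILE_LINES or receipt["coverage"] < MIN_COVERAGE:
        return "HOLD"
    return "APPROVE"


if __name__ == "__main__":
    ledger = Path(tempfile.mkdtemp()) / "receipts.ndjson"
    append_receipt(ledger, {"task": "add-parser", "open_blockers": 0,
                            "max_file_lines": 180, "coverage": 0.91})
    append_receipt(ledger, {"task": "refactor-io", "open_blockers": 0,
                            "max_file_lines": 420, "coverage": 0.88})
    for r in read_receipts(ledger):
        print(r["task"], gate(r))
```

Because the gate is a pure function of the receipt, the same ledger line always yields the same verdict, which is what lets you re-audit old entries after changing a threshold.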

The whole thing runs in a 2x2 tmux grid. T0 orchestrates, T1-T3 execute. The terminal layout IS the architecture — each pane is a fully observable, independent agent session. I can read every thought, every tool call, every mistake. That's what "glass box" means: the opposite of agents calling agents inside a shared process where you're debugging the framework's abstractions.

Try it without any LLM:

git clone https://github.com/Vinix24/vnx-orchestration.git

cd vnx-orchestration/demo/dry-run

bash replay.sh --fast

This replays a real 6-PR development session with dispatches, receipts, quality verdicts, and open item resolution.

There's also a context rotation demo in demo/dry-run-context-rotation/.
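For a rough idea of what the rotation decision looks like, here is a sketch in Python. Only the 65% threshold comes from the description above. The `Handover` fields, the token-based usage measure, and the function names are hypothetical, and the tmux clear/resume step is left out.

```python
from dataclasses import dataclass, field

ROTATION_THRESHOLD = 0.65  # from the post: rotation triggers at 65% context usage


@dataclass
class Handover:
    """Hypothetical handover note; the real field names are not given in the post."""
    task_id: str
    done: list[str] = field(default_factory=list)
    remaining: list[str] = field(default_factory=list)


def should_rotate(used_tokens: int, window_tokens: int) -> bool:
    """True once context usage reaches the rotation threshold."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    return used_tokens / window_tokens >= ROTATION_THRESHOLD


def resume_prompt(h: Handover) -> str:
    """Render the handover as the first message for the fresh session."""
    lines = [f"Resuming task {h.task_id}."]
    lines += [f"Done: {item}" for item in h.done]
    lines += [f"Remaining: {item}" for item in h.remaining]
    return "\n".join(lines)


if __name__ == "__main__":
    if should_rotate(used_tokens=140_000, window_tokens=200_000):
        note = Handover("T2-017", done=["parser written"], remaining=["add tests"])
        print(resume_prompt(note))
```

Keeping the handover as structured data rather than free text means the fresh session starts from an explicit list of what is done and what remains.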

What VNX is NOT: not a SaaS, not a framework you import into code, not an agent builder. It's bash + python, local-first, no database, no cloud dependency. MIT licensed.

What I'd love to discuss: governance approaches for AI agents in general. Quality gates, audit trails, scoping strategies — I think this is the actual hard problem in multi-agent systems, not the orchestration itself. Curious what patterns others have found.
