Hi HN, I built Binex because debugging multi-agent pipelines was driving me crazy.
The problem: you chain five agents together, one in the middle breaks or produces weird output,
and you have no idea what happened. Logs are scattered, there's no replay, and swapping out a
model means rewriting code.
Binex lets you define agent pipelines in YAML and records everything per node. After a run you can:
- `binex trace` — full execution timeline with latency per node
- `binex debug` — post-mortem with inputs, outputs, and errors
- `binex replay --agent node=llm://other-model` — re-run swapping one model
- `binex diff run_a run_b` — compare two runs side-by-side
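To give a feel for it, here's a rough sketch of a pipeline definition (simplified; the field names here are illustrative, not the exact schema):

```yaml
# Illustrative sketch of a Binex pipeline — field names simplified.
pipeline: support-triage
nodes:
  - name: classify
    agent: llm://ollama/llama3
  - name: draft_reply
    agent: llm://openai/gpt-4o
    depends_on: [classify]
  - name: approve
    agent: human://reviewer   # human-in-the-loop approval gate
    depends_on: [draft_reply]
```

Each node's inputs, outputs, latency, and errors get recorded, which is what `trace`, `debug`, `replay`, and `diff` operate on.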
It uses LiteLLM under the hood so it works with Ollama, OpenAI, Anthropic, and 6 more providers.
Also supports local Python agents, remote agents via A2A protocol, and human-in-the-loop approval
gates.
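A local Python agent is conceptually just a function that takes the upstream node's output and returns its own (this is a sketch of the idea, not the exact registration API):

```python
# Sketch only — the actual Binex agent interface may differ.
# A local Python agent: plain function, dict in, dict out.

def summarize(payload: dict) -> dict:
    """Hypothetical agent node: condense upstream text to one line."""
    text = payload.get("text", "")
    first_line = text.splitlines()[0] if text else ""
    return {"summary": first_line}
```

Because it's just a function, it gets the same per-node recording as LLM and remote A2A nodes.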
Everything is stored locally (SQLite + filesystem), no cloud dependency.
`pip install binex`
Happy to answer any questions about the architecture or design decisions.
alexli1807•1h ago