Hi HN, I built Binex because debugging multi-agent pipelines was driving me crazy.
The problem: you chain five agents together, one in the middle breaks or produces weird output,
and you have no idea what happened. Logs are scattered, there's no replay, and swapping out a
model means rewriting code.
Binex lets you define agent pipelines in YAML and records everything per node. After a run you can:
- `binex trace` — full execution timeline with latency per node
- `binex debug` — post-mortem with inputs, outputs, and errors
- `binex replay --agent node=llm://other-model` — re-run swapping one model
- `binex diff run_a run_b` — compare two runs side-by-side
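To give a feel for it, here's a rough sketch of a pipeline definition (simplified; the field names here are illustrative, not the exact schema):

```yaml
# Illustrative sketch of a Binex pipeline — field names simplified.
pipeline: support-triage
nodes:
  - name: classify
    agent: llm://ollama/llama3
  - name: draft_reply
    agent: llm://openai/gpt-4o
    depends_on: [classify]
  - name: approve
    agent: human://reviewer   # human-in-the-loop approval gate
    depends_on: [draft_reply]
```

Each node's inputs, outputs, latency, and errors get recorded, which is what `trace`, `debug`, `replay`, and `diff` operate on.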
It uses LiteLLM under the hood so it works with Ollama, OpenAI, Anthropic, and 6 more providers.
Also supports local Python agents, remote agents via A2A protocol, and human-in-the-loop approval
gates.
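A local Python agent is conceptually just a function that takes the upstream node's output and returns its own (this is a sketch of the idea, not the exact registration API):

```python
# Sketch only — the actual Binex agent interface may differ.
# A local Python agent: plain function, dict in, dict out.

def summarize(payload: dict) -> dict:
    """Hypothetical agent node: condense upstream text to one line."""
    text = payload.get("text", "")
    first_line = text.splitlines()[0] if text else ""
    return {"summary": first_line}
```

Because it's just a function, it gets the same per-node recording as LLM and remote A2A nodes.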
Everything is stored locally (SQLite + filesystem), no cloud dependency.
`pip install binex`
Happy to answer any questions about the architecture or design decisions.
alexli1807•1h ago