I was spending way too much time staring at traces, logs and dashboards trying to figure out why my multi-agent setups kept failing.
You just point it at your traces (LangSmith, Langfuse, OpenTelemetry, or a JSON file). It pulls the system prompts directly from the logs, extracts the behavioral rules, and uses an LLM-as-a-judge to replay each conversation step-by-step.
It flags exactly which turn broke things, which agent caused it, and traces cascading failures across routing, handoffs, and retrieval.
It aggregates root causes across all of them: "24 out of 51 failures are missing escalations." You know exactly what to fix first.
Runs locally. Only LLM API calls leave your machine.
npx agent-triage demo — runs on sample data, uses your own API key (~$0.002/conversation with gpt-4o-mini).
https://github.com/converra/agent-triage Demo report: https://demo-report-sigma.vercel.app/