frontpage.

I built agent-triage - a CLI that automates diagnosing AI agent failures in production.

I was spending way too much time staring at traces, logs and dashboards trying to figure out why my multi-agent setups kept failing.

You just point it at your traces (LangSmith, Langfuse, OpenTelemetry, or a JSON file). It pulls the system prompts directly from the logs, extracts the behavioral rules, and uses an LLM-as-a-judge to replay each conversation step-by-step.

It flags exactly which turn broke things, which agent caused it, and traces cascading failures across routing, handoffs, and retrieval.

It aggregates root causes across all of them: "24 out of 51 failures are missing escalations." You know exactly what to fix first.

Runs locally. Only LLM API calls leave your machine.

npx agent-triage demo — runs on sample data, uses your own API key (~$0.002/conversation with gpt-4o-mini).

https://github.com/converra/agent-triage Demo report: https://demo-report-sigma.vercel.app/

fp.

Show HN: Agent-triage – diagnosis of agent failures from production traces