1. Debugging agents is painful - When your agent makes 20 tool calls and fails, good luck figuring out which decision was wrong. WatchLLM gives you a step-by-step timeline showing every decision, tool call, and model response with explanations for why the agent did what it did.
2. Agent costs spiral fast - Agents love getting stuck in loops or calling expensive tools repeatedly. WatchLLM tracks cost per step and flags anomalies like "loop detected - same action repeated 3x, wasted $0.012" or "high cost step - $0.08 exceeds threshold".
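To make that concrete, here's a simplified sketch of the kind of heuristics behind those flags. The step shape, thresholds, and defaults here are illustrative, not the real detector:

```typescript
// Simplified sketch of loop + cost anomaly detection (illustrative only).
interface AgentStep {
  action: string;   // e.g. "tool:web_search" or "llm:gpt-4o"
  args: string;     // serialized arguments, used to match repeated calls
  costUsd: number;
}

function detectAnomalies(
  steps: AgentStep[],
  loopThreshold = 3,        // illustrative default
  costThresholdUsd = 0.05,  // illustrative default
): string[] {
  const flags: string[] = [];
  const seen = new Map<string, { count: number; cost: number }>();

  for (const step of steps) {
    // Loop detection: same action with the same args repeated N times.
    const key = `${step.action}|${step.args}`;
    const entry = seen.get(key) ?? { count: 0, cost: 0 };
    entry.count += 1;
    entry.cost += step.costUsd;
    seen.set(key, entry);
    if (entry.count === loopThreshold) {
      flags.push(
        `loop detected - same action repeated ${entry.count}x, wasted $${entry.cost.toFixed(3)}`,
      );
    }

    // High-cost step detection: one step blew past the budget.
    if (step.costUsd > costThresholdUsd) {
      flags.push(
        `high cost step - $${step.costUsd.toFixed(2)} exceeds threshold`,
      );
    }
  }
  return flags;
}
```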
The core features:
- Timeline view of every agent decision with cost breakdown
- Anomaly detection (loops, repeated tools, high-cost steps)
- Semantic caching that cuts 40-70% off your LLM bill as a bonus
- Works with OpenAI, Anthropic, Groq - just change your baseURL (example below)
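Setup with the OpenAI Node SDK looks roughly like this (the endpoint shown is a placeholder, not the real one):

```typescript
import OpenAI from "openai";

// Placeholder endpoint -- check the docs for the actual proxy URL.
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://proxy.watchllm.dev/v1",
});

async function main() {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Find me three flights to Tokyo." }],
  });
  console.log(res.choices[0].message.content);
}

main();
```

Everything else stays the same; requests flow through the proxy, which is where the telemetry and caching happen.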
It's built on ClickHouse for real-time telemetry, with vector similarity powering the caching layer (simplified sketch at the end of this post). The agent debugger explains each step with an LLM-generated summary of why it happened.

Right now it's free for up to 50K requests/month. I'm looking for early users who are building agents and want better observability into what's actually happening (and what it's costing).

Try it: https://watchllm.dev

Would love feedback on what other debugging features would be useful. What do you wish you had when your agents misbehave?
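P.S. For the curious, the semantic cache boils down to: embed the incoming prompt, compare against embeddings of cached prompts, and serve the stored response on a close-enough match. A stripped-down sketch of the core idea (the similarity cutoff and data layout here are illustrative, not the production values):

```typescript
// Conceptual sketch of a semantic cache lookup (illustrative only).
interface CacheEntry {
  embedding: number[]; // embedding of the cached prompt
  response: string;    // the cached model response
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function lookup(
  cache: CacheEntry[],
  queryEmbedding: number[],
  threshold = 0.95, // illustrative similarity cutoff
): string | null {
  let best: CacheEntry | null = null;
  let bestScore = threshold;
  for (const entry of cache) {
    const score = cosine(entry.embedding, queryEmbedding);
    if (score >= bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best ? best.response : null; // null = cache miss, call the model
}
```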