I’ve been building Agent Ledger, a tool for developers and teams shipping AI agents and LLM apps. Once an agent starts chaining models, tools, and retries, a single request stops looking like “app → result” and starts looking like a mini distributed system. When something goes wrong, normal logs don’t explain why the agent made the sequence of decisions it did, and billing dashboards don’t tell you which step caused the spend.
Agent Ledger turns each run into a structured, replayable timeline (prompt → LLM calls → tool calls → results/errors) and tags every step with tokens, latency, provider/model, and USD cost (plus a total cost for the run). It also flags obviously degenerate behavior like repeated tool calls and loops, and makes it easy to compare runs to spot regressions after prompt or tool changes.
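For concreteness, here’s a rough sketch of the kind of per-step data described above, written in TypeScript. The type names and fields are illustrative assumptions, not the actual SDK API:

    // Illustrative only: hypothetical shapes for a recorded run and its steps.
    type StepKind = "prompt" | "llm_call" | "tool_call" | "result" | "error";

    interface RunStep {
      kind: StepKind;
      provider?: string;    // e.g. "openai", "anthropic"
      model?: string;       // e.g. "gpt-4o-mini"
      inputTokens?: number;
      outputTokens?: number;
      latencyMs: number;
      costUsd?: number;     // per-step cost
      payload: unknown;     // prompt text, tool args, tool result, or error
    }

    interface AgentRun {
      runId: string;
      steps: RunStep[];
      totalCostUsd: number; // sum of per-step costs
    }

Something in this shape is also what makes run-to-run comparison tractable: two arrays of steps can be diffed step by step to see where cost, latency, or behavior changed.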
It’s currently a lightweight Node/TypeScript SDK and a Python SDK, plus a simple web UI for browsing sessions. I’d love feedback from anyone operating agents in production: which signals matter most, what’s missing compared with your current tracing setup, and what you’d want next (prompt diffs, alerts, guardrails, replay, integrations, etc.).