Kalibr is an autonomous routing system for AI agents. It replaces human debugging with an outcome-driven learning loop. On every agent run, it decides which execution path to use based on what is actually working in production.
An execution path is a full strategy, not just a model: model + tools + parameters.
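To make "path = model + tools + parameters" concrete, here is a minimal sketch in plain Python. The `ExecutionPath` class, its field names, and the `path_id` format are hypothetical and only for illustration; they are not taken from the Kalibr SDK.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ExecutionPath:
    """Hypothetical container for one full strategy (not the Kalibr SDK)."""
    model: str
    tools: list[str] = field(default_factory=list)
    params: dict[str, Any] = field(default_factory=dict)

    def path_id(self) -> str:
        # Build a stable identifier so outcomes can be tracked per path.
        tools = "+".join(sorted(self.tools)) or "no-tools"
        params = ",".join(f"{k}={v}" for k, v in sorted(self.params.items())) or "defaults"
        return f"{self.model}|{tools}|{params}"


# Same model and tools, different parameters: these count as two distinct paths.
fast = ExecutionPath("model-a", tools=["web_search"], params={"temperature": 0.2})
careful = ExecutionPath("model-a", tools=["web_search"], params={"temperature": 0.0})
```

The point of the sketch is that changing any one of the three parts produces a different path with its own track record.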
Most agents hardcode one path. When that path degrades or fails, a human has to notice, debug, change configs, and redeploy. Even then, the fix often doesn’t stick because models and tools keep changing.
I got tired of being the reliability layer for my own agents. Kalibr replaces that.
With Kalibr, you register multiple paths for a task. You define what success means. After each run, your code reports the outcome. Kalibr captures telemetry on every run, learns from outcomes, and routes traffic to the path that’s working best while continuously canarying your alternative paths. When one path degrades or fails, traffic shifts immediately. No alerts, no dashboards and no incident response.
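The register, decide, report loop described above can be sketched as a toy router. This is an assumption-laden illustration, not Kalibr's actual algorithm or API: the `OutcomeRouter` class, the fixed `canary_rate`, and the smoothed success estimate are all stand-ins chosen to keep the example short.

```python
import random
from collections import defaultdict


class OutcomeRouter:
    """Toy outcome-driven router (hypothetical; not the Kalibr SDK)."""

    def __init__(self, paths, canary_rate=0.1, seed=None):
        self.paths = list(paths)          # registered paths for one task
        self.canary_rate = canary_rate    # share of runs sent to alternatives
        self.rng = random.Random(seed)
        self.successes = defaultdict(int)
        self.runs = defaultdict(int)

    def success_rate(self, path):
        # Smoothed estimate so an untried path starts at 0.5, not 0 or 1.
        return (self.successes[path] + 1) / (self.runs[path] + 2)

    def decide(self):
        # Route to the best-performing path, but keep canarying the others.
        best = max(self.paths, key=self.success_rate)
        others = [p for p in self.paths if p != best]
        if others and self.rng.random() < self.canary_rate:
            return self.rng.choice(others)
        return best

    def report(self, path, success):
        # Your code calls this after each run with the outcome it observed.
        self.runs[path] += 1
        self.successes[path] += int(success)


# Canarying is switched off here so the demo is deterministic.
router = OutcomeRouter(["primary", "backup"], canary_rate=0.0)
router.report("primary", False)   # primary just failed
router.report("backup", True)     # backup just succeeded
chosen = router.decide()          # traffic now shifts to "backup"
```

A real system would presumably weight recent outcomes more heavily and size the canary share adaptively; the sketch only shows the shape of the loop.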
How is this different from other routers or observability tools?
Most routers choose between models using static rules or offline benchmarks. Observability tools show traces and metrics but still require humans to act. Kalibr is outcome-aware and autonomous. It learns directly from production success and changes runtime behavior automatically. It answers not “what happened?” but “what should my agent do next?”
We’re not a proxy: calls go directly to OpenAI, Anthropic, or Google. We’re not a retry loop either: traffic is routed away from failing paths rather than blindly retried on them. Success rate always dominates; cost and latency only matter when success rates are close.
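One way to read "success rate dominates; cost and latency only break ties" is as a two-stage selection. The sketch below is a guess at that logic, not Kalibr's implementation: `PathStats`, `pick_path`, and especially the `margin` that defines "close" are hypothetical, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class PathStats:
    """Hypothetical per-path summary; field names are illustrative."""
    name: str
    success_rate: float
    cost_usd: float
    latency_ms: float


def pick_path(stats, margin=0.02):
    # Stage 1: keep only paths whose success rate is within `margin` of the best.
    best_rate = max(s.success_rate for s in stats)
    contenders = [s for s in stats if best_rate - s.success_rate <= margin]
    # Stage 2: among those, prefer the cheaper path, then the faster one.
    return min(contenders, key=lambda s: (s.cost_usd, s.latency_ms))


stats = [
    PathStats("A", success_rate=0.95, cost_usd=0.030, latency_ms=900),
    PathStats("B", success_rate=0.94, cost_usd=0.010, latency_ms=700),
    PathStats("C", success_rate=0.80, cost_usd=0.001, latency_ms=200),
]
winner = pick_path(stats)
```

Here C is by far the cheapest and fastest, but it never makes it past stage 1, so the choice comes down to A versus B and B wins on cost.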
SDKs are available for Python and TypeScript, and they work with LangChain, CrewAI, and the OpenAI Agents SDK. Decision latency is ~50ms. If Kalibr is unavailable, the Router falls back to your first path.
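The fallback behavior can be pictured as a thin wrapper around the routing decision. This is a sketch under assumptions: `choose_with_fallback` and the `decide` callable are hypothetical names, and the real Router may detect unavailability differently (for example with a timeout).

```python
def choose_with_fallback(decide, paths):
    """Return the routed path, or the first registered path if routing fails.

    `decide` is a hypothetical callable standing in for the remote decision.
    """
    try:
        return decide()
    except (ConnectionError, TimeoutError):
        # Routing service unreachable: behave like a hardcoded single path.
        return paths[0]


def unavailable():
    # Simulates the routing service being down.
    raise ConnectionError("routing service unreachable")


paths = ["primary", "backup"]
during_outage = choose_with_fallback(unavailable, paths)
normal = choose_with_fallback(lambda: "backup", paths)
```

The practical consequence is that the order in which you register paths matters: the first one is what your agent runs when no decision is available.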
Think of it as if/else logic for agents that rewrites itself based on real production outcomes.
We’ve been running this with design partners and would love feedback. Always curious how others are handling agent reliability in production.
GitHub: https://github.com/kalibr-ai/kalibr-sdk-python
Docs & benchmarks: https://kalibr.systems/docs