Show HN: Kalibr – Autonomous Routing for AI Agents

2•devonkelley•2h ago
Hey HN, we’re Devon and Alex from Kalibr (https://kalibr.systems).

Kalibr is an autonomous routing system for AI agents. It replaces human debugging with an outcome-driven learning loop. On every agent run, it decides which execution path to use based on what is actually working in production.

An execution path is a full strategy, not just a model: model + tools + parameters.

Most agents hardcode one path. When that path degrades or fails, a human has to notice, debug, change configs, and redeploy. Even then, the fix often doesn’t stick because models and tools keep changing.

I got tired of being the reliability layer for my own agents. Kalibr replaces that.

With Kalibr, you register multiple paths for a task. You define what success means. After each run, your code reports the outcome. Kalibr captures telemetry on every run, learns from outcomes, and routes traffic to the path that’s working best while continuously canarying your alternative paths. When one path degrades or fails, traffic shifts immediately. No alerts, no dashboards and no incident response.
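A rough sketch of the register / report / route loop described above. This is not Kalibr's SDK: `ToyRouter`, `PathStats`, `canary_fraction`, and the path names are hypothetical, and the "best path plus a fixed canary share" rule is a simplification chosen for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class PathStats:
    successes: int = 0
    runs: int = 0

    @property
    def success_rate(self) -> float:
        # Paths with no runs yet are treated as 0.0 in this toy version.
        return self.successes / self.runs if self.runs else 0.0


@dataclass
class ToyRouter:
    """Hypothetical in-memory router, for illustration only."""

    paths: list[str]
    canary_fraction: float = 0.1  # assumed share of traffic sent to alternatives
    rng: random.Random = field(default_factory=random.Random)
    stats: dict[str, PathStats] = field(init=False)

    def __post_init__(self) -> None:
        self.stats = {p: PathStats() for p in self.paths}

    def best(self) -> str:
        # Ties resolve to the earliest registered path.
        return max(self.paths, key=lambda p: self.stats[p].success_rate)

    def choose(self) -> str:
        best = self.best()
        others = [p for p in self.paths if p != best]
        # Keep sending a small share of runs to the non-best paths.
        if others and self.rng.random() < self.canary_fraction:
            return self.rng.choice(others)
        return best

    def report(self, path: str, success: bool) -> None:
        # Called by your code after each run with the outcome you defined.
        s = self.stats[path]
        s.runs += 1
        s.successes += int(success)


router = ToyRouter(paths=["gpt-path", "claude-path"])
path = router.choose()
router.report(path, success=True)
```

The point of the sketch is the shape of the loop (choose, run, report), not the selection rule itself.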

How is this different from other routers or observability tools?

Most routers choose between models using static rules or offline benchmarks. Observability tools show traces and metrics but still require humans to act. Kalibr is outcome-aware and autonomous. It learns directly from production success and changes runtime behavior automatically. It answers not “what happened?” but “what should my agent do next?”

We’re not a proxy. Calls go directly to OpenAI, Anthropic, or Google. We’re not a retry loop. Failed paths are routed away from, not retried blindly. Success rate always dominates; cost and latency only matter when success rates are close.
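One way to read "success rate dominates, cost and latency break ties" is sketched below. The `PathSummary` type, the `tolerance` of 0.02, and the ordering of cost before latency are assumptions made for the example, not documented Kalibr behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathSummary:
    name: str
    success_rate: float  # observed fraction of successful runs
    cost_usd: float      # average cost per run
    latency_ms: float    # average latency per run


def pick_path(paths: list[PathSummary], tolerance: float = 0.02) -> PathSummary:
    """Pick by success rate first; use cost, then latency, only among near-ties."""
    top = max(p.success_rate for p in paths)
    # Only paths within `tolerance` of the best success rate stay in contention.
    close = [p for p in paths if top - p.success_rate <= tolerance]
    return min(close, key=lambda p: (p.cost_usd, p.latency_ms))


candidates = [
    PathSummary("a", success_rate=0.95, cost_usd=0.030, latency_ms=900),
    PathSummary("b", success_rate=0.94, cost_usd=0.010, latency_ms=1200),
    PathSummary("c", success_rate=0.80, cost_usd=0.001, latency_ms=300),
]
print(pick_path(candidates).name)
```

Here "c" is the cheapest and fastest path but is never considered, because its success rate is far below the best one.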

Python and TypeScript SDKs. Works with LangChain, CrewAI, and the OpenAI Agents SDK. Decision latency is ~50ms. If Kalibr is unavailable, the Router falls back to your first path.
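The fallback behavior can be pictured as a thin wrapper around the routing decision. This is a sketch under assumptions: `choose_path` and the `decide` callable are hypothetical stand-ins for the real SDK call, and catching every exception is a simplification.

```python
from typing import Callable, Sequence


def choose_path(decide: Callable[[Sequence[str]], str], paths: Sequence[str]) -> str:
    """Ask the router for a path; fall back to the first registered path on failure."""
    if not paths:
        raise ValueError("register at least one path")
    try:
        choice = decide(paths)
    except Exception:
        # Router unreachable or errored: behave like a hardcoded single path.
        return paths[0]
    # Guard against a decision that names a path we never registered.
    return choice if choice in paths else paths[0]


def router_down(paths: Sequence[str]) -> str:
    raise ConnectionError("router unavailable")


print(choose_path(router_down, ["primary", "backup"]))
```

With this shape, the worst case is the same behavior as an agent that hardcodes its first path.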

Think of it as if/else logic for agents that rewrites itself based on real production outcomes.

We’ve been running this with design partners and would love feedback. Always curious how others are handling agent reliability in production.

GitHub: https://github.com/kalibr-ai/kalibr-sdk-python

Docs & benchmarks: https://kalibr.systems/docs

Comments

Antonioromero10•2h ago
Awesome tool, been using it for a month or so now.
devonkelley•2h ago
It's been amazing to build this product around you, thank you Antonio!
neilmagnuson•2h ago
Awesome tool for observation, been using it for a while!
devonkelley•1h ago
So glad to have you as a user, and we love that you're loving the agentic observability as well as routing!
adeebvaliulla•2h ago
This resonates with a pain I see repeatedly in production agent systems: humans acting as the reliability layer.

Most teams I work with hardcode a single “golden path” for agents, then rely on dashboards, alerts, and tribal knowledge to notice when behavior degrades. By the time someone debugs model choice, tool params, or prompt drift, the environment has already changed again. The feedback loop is slow and brittle.

What’s interesting here is the explicit shift from observability to outcome-driven control. Routing based on actual production success rather than static benchmarks or offline evals aligns with how reliability engineering evolved in other domains. We moved from “what happened?” to “what should the system do next?” years ago.

A couple of questions I’m curious about:

- How do you define and normalize “success” across heterogeneous tasks without overfitting to short-term signals?

- How do you prevent oscillation or path thrashing when outcomes are noisy or sparse?

- Is there a notion of confidence or regret baked into the routing decisions over time?

Overall, this feels less like a router and more like an autonomous control plane for agents. If it holds up under real-world variance, this is a meaningful step toward agents that are self-healing rather than constantly babysat.

devonkelley•1h ago
Wow, yes. You nailed the framing. Autonomous control plane is the perfect way to describe Kalibr.

Defining success: We don't normalize it. Teams define their own outcome signals (latency, cost, user ratings, task completion, etc.). You don't need perfect attribution to beat static configs; even noisy signals surface real patterns when aggregated correctly.

Oscillation: Thompson Sampling. Instead of greedily chasing the current best path, we maintain uncertainty estimates and explore proportionally. Sparse or noisy outcomes widen confidence intervals, which naturally dampens switching. Wilson scoring handles the low-sample edge cases without the wild swings you'd get from raw percentages.
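For readers unfamiliar with the two techniques named here, a textbook sketch follows. It is not Kalibr's routing code: the function names, the uniform Beta(1, 1) prior, and the 95% z-value are assumptions for the example.

```python
import math
import random


def thompson_pick(stats: dict[str, tuple[int, int]], rng: random.Random) -> str:
    """stats maps path -> (successes, failures). Returns the path to try next."""
    # Draw one plausible success rate per path from its Beta posterior
    # (uniform prior), then pick the path with the highest draw.
    # Few samples mean a wide posterior, so weak paths still get explored.
    draws = {p: rng.betavariate(s + 1, f + 1) for p, (s, f) in stats.items()}
    return max(draws, key=draws.get)


def wilson_lower_bound(successes: int, n: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a success proportion."""
    if n == 0:
        return 0.0
    p = successes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom


rng = random.Random(0)
stats = {"a": (900, 100), "b": (100, 900)}
print(thompson_pick(stats, rng))

# A 1/1 record scores well below a 90/100 record, unlike the raw percentage.
print(wilson_lower_bound(1, 1) < wilson_lower_bound(90, 100))
```

The Wilson bound is why a path with one lucky success does not outrank a path with a long, slightly imperfect record.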

Confidence/regret: Explicit in the routing math. Every path carries uncertainty that decays with evidence. The system minimizes cumulative regret over time rather than optimizing point-in-time decisions.
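To make "uncertainty that decays with evidence" and "cumulative regret" concrete, here is a small sketch using standard definitions. The function names are hypothetical, the Beta(1, 1) prior is an assumption, and the true success rates used for regret are only knowable in a simulation, not in production.

```python
import math


def posterior_std(successes: int, failures: int) -> float:
    """Standard deviation of a Beta(successes + 1, failures + 1) posterior."""
    a, b = successes + 1, failures + 1
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


def cumulative_regret(true_rates: dict[str, float], choices: list[str]) -> float:
    """Total expected success lost by not always choosing the best path."""
    best = max(true_rates.values())
    return sum(best - true_rates[c] for c in choices)


# Same 90% observed success rate, but 100x more evidence shrinks the uncertainty.
print(posterior_std(9, 1) > posterior_std(900, 100))

# One run on the weaker path costs 0.3 expected successes; the rest cost nothing.
print(cumulative_regret({"a": 0.9, "b": 0.6}, ["b", "a", "a"]))
```

A router that explores sensibly keeps the regret total growing slowly as runs accumulate, instead of paying the same cost on every run.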

The gap we're closing is exactly what you mentioned. Self-correcting instead of babysat.
