frontpage.

Brute Force Colors (2022)

https://arnaud-carre.github.io/2022-12-30-amiga-ham/
1•erickhill•1m ago•0 comments

Google Translate apparently vulnerable to prompt injection

https://www.lesswrong.com/posts/tAh2keDNEEHMXvLvz/prompt-injection-in-google-translate-reveals-ba...
1•julkali•1m ago•0 comments

(Bsky thread) "This turns the maintainer into an unwitting vibe coder"

https://bsky.app/profile/fullmoon.id/post/3meadfaulhk2s
1•todsacerdoti•2m ago•0 comments

Software development is undergoing a Renaissance in front of our eyes

https://twitter.com/gdb/status/2019566641491963946
1•tosh•2m ago•0 comments

Can you beat ensloppification? I made a quiz for Wikipedia's Signs of AI Writing

https://tryward.app/aiquiz
1•bennydog224•3m ago•1 comments

Spec-Driven Design with Kiro: Lessons from Seddle

https://medium.com/@dustin_44710/spec-driven-design-with-kiro-lessons-from-seddle-9320ef18a61f
1•nslog•4m ago•0 comments

Agents need good developer experience too

https://modal.com/blog/agents-devex
1•birdculture•5m ago•0 comments

The Dark Factory

https://twitter.com/i/status/2020161285376082326
1•Ozzie_osman•5m ago•0 comments

Free data transfer out to internet when moving out of AWS (2024)

https://aws.amazon.com/blogs/aws/free-data-transfer-out-to-internet-when-moving-out-of-aws/
1•tosh•6m ago•0 comments

Interop 2025: A Year of Convergence

https://webkit.org/blog/17808/interop-2025-review/
1•alwillis•7m ago•0 comments

Prejudice Against Leprosy

https://text.npr.org/g-s1-108321
1•hi41•8m ago•0 comments

Slint: Cross Platform UI Library

https://slint.dev/
1•Palmik•12m ago•0 comments

AI and Education: Generative AI and the Future of Critical Thinking

https://www.youtube.com/watch?v=k7PvscqGD24
1•nyc111•12m ago•0 comments

Maple Mono: Smooth your coding flow

https://font.subf.dev/en/
1•signa11•13m ago•0 comments

Moltbook isn't real but it can still hurt you

https://12gramsofcarbon.com/p/tech-things-moltbook-isnt-real-but
1•theahura•17m ago•0 comments

Take Back the Em Dash–and Your Voice

https://spin.atomicobject.com/take-back-em-dash/
1•ingve•17m ago•0 comments

Show HN: 289x speedup over MLP using Spectral Graphs

https://zenodo.org/login/?next=%2Fme%2Fuploads%3Fq%3D%26f%3Dshared_with_me%25253Afalse%26l%3Dlist...
1•andrespi•18m ago•0 comments

Teaching Mathematics

https://www.karlin.mff.cuni.cz/~spurny/doc/articles/arnold.htm
2•samuel246•21m ago•0 comments

3D Printed Microfluidic Multiplexing [video]

https://www.youtube.com/watch?v=VZ2ZcOzLnGg
2•downboots•21m ago•0 comments

Abstractions Are in the Eye of the Beholder

https://software.rajivprab.com/2019/08/29/abstractions-are-in-the-eye-of-the-beholder/
2•whack•22m ago•0 comments

Show HN: Routed Attention – 75-99% savings by routing between O(N) and O(N²)

https://zenodo.org/records/18518956
1•MikeBee•22m ago•0 comments

We didn't ask for this internet – Ezra Klein show [video]

https://www.youtube.com/shorts/ve02F0gyfjY
1•softwaredoug•23m ago•0 comments

The Real AI Talent War Is for Plumbers and Electricians

https://www.wired.com/story/why-there-arent-enough-electricians-and-plumbers-to-build-ai-data-cen...
2•geox•25m ago•0 comments

Show HN: MimiClaw, OpenClaw (Clawdbot) on $5 Chips

https://github.com/memovai/mimiclaw
1•ssslvky1•25m ago•0 comments

I Maintain My Blog in the Age of Agents

https://www.jerpint.io/blog/2026-02-07-how-i-maintain-my-blog-in-the-age-of-agents/
3•jerpint•26m ago•0 comments

The Fall of the Nerds

https://www.noahpinion.blog/p/the-fall-of-the-nerds
1•otoolep•28m ago•0 comments

Show HN: I'm 15 and built a free tool for reading ancient texts.

https://the-lexicon-project.netlify.app/
5•breadwithjam•30m ago•1 comments

How close is AI to taking my job?

https://epoch.ai/gradient-updates/how-close-is-ai-to-taking-my-job
1•cjbarber•31m ago•0 comments

You are the reason I am not reviewing this PR

https://github.com/NixOS/nixpkgs/pull/479442
2•midzer•32m ago•1 comments

Show HN: FamilyMemories.video – Turn static old photos into 5s AI videos

https://familymemories.video
1•tareq_•34m ago•0 comments

Show HN: Kalibr – Autonomous Routing for AI Agents

3•devonkelley•1w ago
Hey HN, we’re Devon and Alex from Kalibr (https://kalibr.systems).

Kalibr is an autonomous routing system for AI agents. It replaces human debugging with an outcome-driven learning loop. On every agent run, it decides which execution path to use based on what is actually working in production.

An execution path is a full strategy, not just a model: model + tools + parameters.

Most agents hardcode one path. When that path degrades or fails, a human has to notice, debug, change configs, and redeploy. Even then, the fix often doesn’t stick because models and tools keep changing.

I got tired of being the reliability layer for my own agents. Kalibr replaces that.

With Kalibr, you register multiple paths for a task. You define what success means. After each run, your code reports the outcome. Kalibr captures telemetry on every run, learns from outcomes, and routes traffic to the path that's working best while continuously canarying your alternative paths. When one path degrades or fails, traffic shifts immediately. No alerts, no dashboards, no incident response.
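
Roughly, the register → route → report loop looks like this (a toy Python sketch with placeholder names; this is illustrative, not the actual SDK API):

    # Illustrative only -- these names are hypothetical, not the real Kalibr SDK.
    import random

    # Two execution paths for one task: a full strategy (model + tools + parameters).
    PATHS = [
        {"name": "gpt-path",    "model": "gpt-4o",        "tools": ["web_search"], "temperature": 0.2},
        {"name": "claude-path", "model": "claude-sonnet", "tools": ["web_search"], "temperature": 0.2},
    ]

    def choose_path():
        # A real router chooses based on production outcomes; this stub just picks randomly.
        return random.choice(PATHS)

    def report_outcome(path, success: bool):
        # Your code decides what "success" means (citations verified, meeting booked, ...)
        # and reports it after every run so routing can learn from it.
        print(f"run on {path['name']}: success={success}")

    if __name__ == "__main__":
        path = choose_path()
        # ... call the provider directly using path["model"] and path["tools"] ...
        report_outcome(path, success=True)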

How is this different from other routers or observability tools?

Most routers choose between models using static rules or offline benchmarks. Observability tools show traces and metrics but still require humans to act. Kalibr is outcome-aware and autonomous. It learns directly from production success and changes runtime behavior automatically. It answers not “what happened?” but “what should my agent do next?”

We’re not a proxy. Calls go directly to OpenAI, Anthropic, or Google. We’re not a retry loop. Failed paths are routed away from, not retried blindly. Success rate always dominates; cost and latency only matter when success rates are close.
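
Concretely, the dominance rule can be pictured like this (a toy sketch, not our actual scoring code; the margin value is made up for illustration):

    # Toy tie-break rule: success rate dominates; cost and latency only matter
    # when success rates are within a small margin of the best path.
    def pick(paths, margin=0.02):
        best_success = max(p["success_rate"] for p in paths)
        contenders = [p for p in paths if best_success - p["success_rate"] <= margin]
        return min(contenders, key=lambda p: (p["cost"], p["latency"]))

    paths = [
        {"name": "gpt-path",    "success_rate": 0.93, "cost": 0.010, "latency": 2.1},
        {"name": "claude-path", "success_rate": 0.92, "cost": 0.006, "latency": 1.4},
    ]
    print(pick(paths)["name"])  # claude-path: within margin on success, cheaper and faster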

Python and TypeScript SDKs. Works with LangChain, CrewAI, and the OpenAI Agents SDK. Decision latency is ~50ms. If Kalibr is unavailable, the Router falls back to your first path.

Think of it as if/else logic for agents that rewrites itself based on real production outcomes.

We’ve been running this with design partners and would love feedback. Always curious how others are handling agent reliability in production.

GitHub: https://github.com/kalibr-ai/kalibr-sdk-python

Docs & benchmarks: https://kalibr.systems/docs

Comments

Antonioromero10•1w ago
Awesome tool, been using it for a month or so now.
devonkelley•1w ago
It's been amazing to build this product around you, thank you Antonio!
neilmagnuson•1w ago
Awesome tool for observability; been using it for a while!
devonkelley•1w ago
So glad to have you as a user, and we love that you're loving the agentic observability as well as routing!
adeebvaliulla•1w ago
This resonates with a pain I see repeatedly in production agent systems: humans acting as the reliability layer.

Most teams I work with hardcode a single “golden path” for agents, then rely on dashboards, alerts, and tribal knowledge to notice when behavior degrades. By the time someone debugs model choice, tool params, or prompt drift, the environment has already changed again. The feedback loop is slow and brittle.

What’s interesting here is the explicit shift from observability to outcome-driven control. Routing based on actual production success rather than static benchmarks or offline evals aligns with how reliability engineering evolved in other domains. We moved from “what happened?” to “what should the system do next?” years ago.

A couple of questions I’m curious about:

- How do you define and normalize “success” across heterogeneous tasks without overfitting to short-term signals?

- How do you prevent oscillation or path thrashing when outcomes are noisy or sparse?

- Is there a notion of confidence or regret baked into the routing decisions over time?

Overall, this feels less like a router and more like an autonomous control plane for agents. If it holds up under real-world variance, this is a meaningful step toward agents that are self-healing rather than constantly babysat.

devonkelley•1w ago
Wow, yes. You nailed the framing. Autonomous control plane is the perfect way to describe Kalibr.

Defining success: We don't normalize it. Teams define their own outcome signals (latency, cost, user ratings, task completion, etc). You don't need perfect attribution to beat static configs; even noisy signals surface real patterns when aggregated correctly.

Oscillation: Thompson Sampling. Instead of greedily chasing the current best path, we maintain uncertainty estimates and explore proportionally. Sparse or noisy outcomes widen confidence intervals, which naturally dampens switching. Wilson scoring handles the low-sample edge cases without the wild swings you'd get from raw percentages.

Confidence/regret: Explicit in the routing math. Every path carries uncertainty that decays with evidence. The system minimizes cumulative regret over time rather than optimizing point-in-time decisions.
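
As a rough illustration of the general bandit machinery (a toy sketch of Thompson sampling plus a Wilson lower bound, not Kalibr's production code):

    # Each path keeps a Beta posterior over its success rate; sparse data means a
    # wide posterior, which naturally dampens aggressive switching.
    import math
    import random

    class PathStats:
        def __init__(self, name):
            self.name = name
            self.successes = 0
            self.failures = 0

        def sample(self):
            # Draw from Beta(successes + 1, failures + 1); low evidence -> high variance.
            return random.betavariate(self.successes + 1, self.failures + 1)

        def wilson_lower_bound(self, z=1.96):
            # Conservative success-rate estimate that avoids wild swings at low sample counts.
            n = self.successes + self.failures
            if n == 0:
                return 0.0
            p = self.successes / n
            denom = 1 + z * z / n
            center = p + z * z / (2 * n)
            margin = z * math.sqrt((p * (1 - p) + z * z / (4 * n)) / n)
            return (center - margin) / denom

    def route(paths):
        # Thompson sampling: pick the path with the highest sampled success rate,
        # which explores uncertain paths in proportion to how plausible they are.
        return max(paths, key=lambda p: p.sample())

    paths = [PathStats("gpt-path"), PathStats("claude-path")]
    for _ in range(100):
        chosen = route(paths)
        succeeded = random.random() < (0.9 if chosen.name == "claude-path" else 0.7)  # fake outcomes
        chosen.successes += succeeded
        chosen.failures += (not succeeded)

    for p in paths:
        print(p.name, p.successes + p.failures, round(p.wilson_lower_bound(), 3))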

The gap we're closing is exactly what you mentioned. Self-correcting instead of babysat.

adeebvaliulla•1w ago
That makes a lot of sense, and I like that you’re being explicit about regret minimization rather than chasing local optima.

The Thompson Sampling + Wilson score combo is a pragmatic choice. In practice, most agent systems I see fail not because they lack metrics, but because they overreact to them. Noisy reward signals plus greedy selection is how teams end up whipsawing configs or freezing change altogether. Treating uncertainty as a first-class input instead of something to smooth away is the right move.

I also agree with your point on attribution. Perfect attribution is a trap. In real production environments, partial and imperfect outcome signals still dominate static configs if the system can reason probabilistically over time. This mirrors what we learned in reliability and delivery metrics years ago: trend dominance beats point accuracy.

One area I’d be curious about as this matures is organizational adoption rather than the math:

- How teams reason about defining outcomes without turning it into a governance bottleneck

- How you help users build intuition around uncertainty and regret so they trust the system when it routes “away” from what feels intuitively right

- Where humans still need to intervene, if anywhere, once the control plane is established

If this holds up across long-tail tasks and low-frequency failures, it feels like a real step toward agents that behave more like adaptive systems and less like fragile workflows with LLMs bolted on.

Appreciate the thoughtful reply.

devonkelley•1w ago
These are excellent questions!

Outcome definition: Simpler is better. Teams that start with one binary signal like "did it work?" (call completed, meeting booked, etc.) get value immediately. Governance bottlenecks usually come from overthinking it upfront.

Building trust: When Kalibr routes away from what feels like the "right" model and it works, people are surprised. We capture and show outcome history so teams can see when a path started to degrade and when Kalibr shifted traffic. No LLM decision-making means no black box around routing choices; it's all shown in your dashboard when you use Kalibr.

Human intervention: Defining new paths, adding goals, handling edge cases where signal is genuinely sparse. The goal isn't zero humans anywhere; it's getting them out of the reactive debugging loop so they can focus on strategic decisions instead of repeatedly patching failed agents.

Curious: have you built multi-step agents and run into the challenge of repeated failures?

roan-we•1w ago
I built a side project with AI agents to help me summarize research papers and extract key citations, and I kept hitting the same annoying pattern. I would fine-tune everything with GPT-4 to perfection, and then in a couple of weeks it would start hallucinating references or missing citations. I was wasting my Saturday mornings changing prompts and switching models instead of actually using the thing.

Kalibr pretty much freed me from that loop.

I basically set up GPT-4 and Claude as two different paths, defined success as accurate citations I can verify, and now it just works.

Last week, GPT-4 oddly started being very slow on longer papers, and by the time I noticed, traffic had already been automatically diverted to Claude.

It's the difference between babysitting an agent and actually having a tool that stays functional without constant supervision.

Honestly, I wish I had discovered this a few months ago hehe

devonkelley•1w ago
This made my day. Exactly the use case we had in mind. Really glad it's working for you, and that GPT-4 slowdown story is a perfect example of why canary traffic matters. Thanks for sharing this.
0to1ton•1w ago
Congrats on the launch!!!
devonkelley•1w ago
Thank you so much! If you build agents, try Kalibr for free to check out how well our routing works. I am biased, but it's awesome :)
curranadvani•1w ago
Amazing work! This will change AI
devonkelley•1w ago
Thank you!! That's the goal. We see this as the vital infra layer that enables agents/MAS to reliably scale in prod.
ashishforai•1w ago
Amazing tool, much needed. For the last two years, 80% of the yapping has been about reliability, reproducibility, and observability! Glad this is being addressed here.
devonkelley•1w ago
Right! There's such a push for agent observability right now. It's great to know why your agents are failing, but better if they never fail in the first place :) Are you building agents?
deaux•1w ago
Comment section needs a look from @tomhow.
devonkelley•1w ago
These are real people, dude. I know some of them. Some are users or friends who came to comment on our post. They aren't bots, and neither am I. Just new to HN.
devonkelley•1w ago
@tomhow, emailing to verify that I am real and that the comments on my post are not bots; I can verify whatever you need me to. It's annoying to be flagged when I am just new here and trying to be part of the community.
deaux•1w ago
Never said you're a bot or that they're not real people. They're mostly people you know or your sockpuppets. Most people who do this are real people.

https://news.ycombinator.com/item?id=46719034

https://news.ycombinator.com/item?id=46742799

https://news.ycombinator.com/item?id=46290617

https://news.ycombinator.com/item?id=46487533

None of these are "bots".

https://news.ycombinator.com/threads?id=lighthouse1212

https://news.ycombinator.com/item?id=46720287

https://news.ycombinator.com/threads?id=Phil_BoaM

All "real people", all posting multiple LLM generated comments.

devonkelley•1w ago
The "sock puppets" are my own friends coming to support my post. Idk if they are using LLMs; I can't really control that.

I’m not here to spam HN with AI slop. Is that the point you’re concerned about here?

Totally get it; the community falls to shit if it becomes a bunch of slop. I like HN because people think and make genuinely intelligent comments.

Not here to ruin the community; I'm just genuinely new and recruited friends to support my launch post.

devonkelley•6d ago
I wanted to come back here and say that I realize I am actually the asshole in this situation. I didn't read the rules (I struggle with rules) before posting on HN. I get why you flagged me on this, and I understand why the rules are in place to keep HN 100% genuine.