Show HN: Kelet – Root Cause Analysis agent for your LLM apps

https://kelet.ai/
36•almogbaku•3h ago
I've spent the past few years building 50+ AI agents in prod (some reached 1M+ sessions/day), and the hardest part was never building them — it was figuring out why they fail.

AI agents don't crash. They just quietly give wrong answers. You end up scrolling through traces one by one, trying to find a pattern across hundreds of sessions.

Kelet automates that investigation. Here's how it works:

1. You connect your traces and signals (user feedback, edits, clicks, sentiment, LLM-as-a-judge, etc.)

2. Kelet processes those signals and extracts facts about each session

3. It forms hypotheses about what went wrong in each case

4. It clusters similar hypotheses across sessions and investigates them together

5. It surfaces a root cause with a suggested fix you can review and apply

The key insight: individual session failures look random. But when you cluster the hypotheses, failure patterns emerge.
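
To make the clustering step concrete, here is a minimal sketch. It is not Kelet's implementation: the word-overlap (Jaccard) similarity, the greedy grouping, the 0.5 threshold, and the example hypothesis strings are all assumptions chosen for illustration. A real system would more likely compare embeddings.

```python
import re

def tokens(text: str) -> frozenset:
    """Lowercase word set used as a crude fingerprint of a hypothesis."""
    return frozenset(re.findall(r"[a-z0-9_]+", text.lower()))

def jaccard(a: frozenset, b: frozenset) -> float:
    """Share of words two hypotheses have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def cluster_hypotheses(hypotheses, threshold=0.5):
    """Greedy single-pass clustering: a hypothesis joins the first cluster
    whose representative (its first member) is similar enough, otherwise
    it starts a new cluster."""
    clusters = []  # list of (representative fingerprint, members)
    for hypothesis in hypotheses:
        fingerprint = tokens(hypothesis)
        for representative, members in clusters:
            if jaccard(fingerprint, representative) >= threshold:
                members.append(hypothesis)
                break
        else:
            clusters.append((fingerprint, [hypothesis]))
    # Largest clusters first: these are the recurring failure patterns.
    return sorted((members for _, members in clusters), key=len, reverse=True)

# Hypothetical per-session hypotheses (made up for this example).
hypotheses = [
    "refund tool description omits the currency argument",
    "refund tool description omits currency argument details",
    "retrieval returned an outdated pricing page",
    "the refund tool description omits the currency argument",
]
clusters = cluster_hypotheses(hypotheses)
```

Taken one at a time, the four hypotheses look unrelated. Grouped, three of them point at the same tool description, which is the kind of pattern the post describes.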

The fastest way to integrate is through the Kelet Skill for coding agents — it scans your codebase, discovers where signals should be collected, and sets everything up for you. There are also Python and TypeScript SDKs if you prefer manual setup.

It’s currently free during beta. No credit card required. Docs: https://kelet.ai/docs/

I'd love feedback on the approach, especially from anyone running agents in prod. Does automating the manual error analysis sound right?

Comments

yanovskishai•3h ago
I imagine it's hard to create a very generic tool for this use case. What are the supported frameworks/libs, and what does this tool assume about my implementation?
BlueHotDog2•3h ago
nice. what a crazy space. how is this different vs other telemetry/analysis platforms such as langchain/braintrust etc?
almogbaku•1h ago
Hi @BlueHotDog2, OP here

langsmith/langfuse/braintrust collect traces, and then YOU need to look at them and analyze them (error analysis/RCA).

Kelet does that for you :)

Does that make any sense? If not, please tell me, I'm still trying to figure out how to explain that, lol.

halflife•2h ago
Kelet as in קלט as in input?
almogbaku•1h ago
Hi @halflife, OP here

YEP, Good catch! Kelet as input/prompt in Hebrew :)

hadifrt20•2h ago
In the quickstart, the suggested fixes are called "Prompt Patches". Does that mean Kelet only surfaces root causes that are fixable in the prompt? What happens when the real bug is in tool selection or retrieval ranking, for example?
almogbaku•1h ago
Hi @hadifrt20, OP here

From what we discovered analyzing ~33K+ sessions, most of the time when the agent selects the wrong tool, it's because the tool's description (i.e., prompt) was not good enough, or there's a missing nuance here.

That goes exactly under Kelet's scope :)

dwb•2h ago
> The key insight

I'm so tired

hmokiguess•2h ago
Hahahahahahahahahhaa ngl, your comment killed me, some LLM tells are so funny
whythismatters•1h ago
Sadly, they forgot to mention why this matters
almogbaku•1h ago
hey @dwb, OP here

Yes. I definitely used an LLM to help write it. Yeah, I should have edited it better.

Yet it's f*ing painful to do error analysis and go through thousands of traces. Hope you can live with my human mistakes

system16•1h ago
Also the obligatory “It’s not A. It’s B.”
trannnnun•2h ago
jkfrntgijbntbuijhb8ybu
RoiTabach•2h ago
This looks amazing. Do you have a LiteLLM integration?
almogbaku•1h ago
Hi @RoiTabach, OP here

Yep. We can integrate with every solution that supports OpenTelemetry :) so it's pretty native, just use the integration skill

npx skills add kelet-ai/skills

peter_parker•2h ago
> They just quietly give wrong answers.

It's not only about wrong answers. Sometimes they just get stuck in a loop.
jldugger•2h ago
Every six months or so, someone at work does a hackathon project to automate outage analysis work SRE would likely perform. And every one of them I've seen has been underwhelming and wrong.

There's like three reasons for this disconnect.

1. The agents aren't expert at your proprietary code. They can read logs and traces and make educated guesses, but there's no world model of your code in there.

2. The people building these apps are unqualified to review the output. I used to mock narcissists evaluating ChatGPT quality by asking it for their own biography, but they're at least using a domain they are an expert in. Your average MLE has no profound truths about Kubernetes or the app. At best, they're using some toy "known broken" app to demonstrate under what are basically ideal conditions, but part of the holdout set should be new outages in your app.

3. SREs themselves are not so great at causal analysis. Many junior SREs take the "it worked last time" approach, but this embeds a presumption that whatever went wrong "last time" hasn't been fixed in code. Your typical senior SRE takes a "what changed?" approach, which is depressingly effective (as it indicates most outages are caused by coworkers). At the highest echelons, I've seen research papers examining metastability and Granger causality networks, but I'm pretty sure nobody in SRE or these RCA agents can explain what they mean.

> The key insight: individual session failures look random. But when you cluster the hypotheses, failure patterns emerge.

My own insight is mostly Bayesian. Typical applications have redundancy of some kind, and you can extract useful signals by separating "good" from "bad". A simple Bayesian score of (100+bad)/(100+good) does a relatively good job of removing the "oh that error log always happens" signals. There's also likely a path using ClickHouse-level data and Bayesian causal networks, but the problem is that traditional Bayesian networks are handcrafted by humans.
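
As a rough sketch of that score: the code below reads "bad" and "good" as the number of bad and good sessions in which a given log signature appears, which is my interpretation of the formula rather than something the comment spells out. The function name, the session data, and the log signatures are hypothetical.

```python
from collections import Counter

def smoothed_scores(bad_sessions, good_sessions, prior=100):
    """Score each signal by (prior + bad) / (prior + good), where bad and
    good count the sessions of each kind that contain the signal."""
    bad = Counter(s for session in bad_sessions for s in set(session))
    good = Counter(s for session in good_sessions for s in set(session))
    signals = set(bad) | set(good)
    # Counter returns 0 for missing keys, so one-sided signals are handled.
    return {s: (prior + bad[s]) / (prior + good[s]) for s in signals}

# Hypothetical data: each session is the list of log signatures it emitted.
bad_sessions = [["cache miss", "timeout calling search"]] * 300
good_sessions = (
    [["cache miss"]] * 300
    + [["cache miss", "timeout calling search"]] * 10
)

scores = smoothed_scores(bad_sessions, good_sessions)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Here "cache miss" shows up almost everywhere and scores close to 1, so it drops out as noise, while "timeout calling search" scores well above 1 because it is concentrated in bad sessions. The prior of 100 keeps signals seen only a handful of times from producing extreme ratios.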

So yea, you can ask an LLM for 100 guesses and do some kind of k-means clustering on them, but you can probably do a better job doing dimensional analysis first and passing that on to the agent.

almogbaku•1h ago
Hi @jldugger

Great points, but I think there's a domain confusion here. You're describing infra/code RCA. Kelet does quality RCA for AI agents: the agent returns a 200 OK, but gives the wrong answer.

The signal space is different. We're working with structured LLM traces + explicit quality signals (thumbs down, edits, eval scores), not distributed system logs. Much more tractable.
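
To illustrate what "explicit quality signals" could look like in practice, here is a small sketch that turns them into a good/bad label per session. It does not use the Kelet SDK: the `Session` fields, the "any negative signal marks the session bad" rule, and the 0.5 score cutoff are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Session:
    # Hypothetical record; field names are illustrative only.
    session_id: str
    thumbs_down: bool = False
    user_edited_output: bool = False
    eval_score: Optional[float] = None  # e.g. a judge score in [0, 1]

def is_bad(session: Session, min_eval_score: float = 0.5) -> bool:
    """A session counts as bad if any explicit signal says so."""
    if session.thumbs_down or session.user_edited_output:
        return True
    return session.eval_score is not None and session.eval_score < min_eval_score

def split_sessions(sessions):
    """Partition sessions into (good, bad) lists, preserving order."""
    good, bad = [], []
    for session in sessions:
        (bad if is_bad(session) else good).append(session)
    return good, bad

sessions = [
    Session("s1", eval_score=0.9),
    Session("s2", thumbs_down=True),
    Session("s3", eval_score=0.2),
    Session("s4"),  # no signal at all: treated as good here
]
good, bad = split_sessions(sessions)
```

Treating unsignalled sessions as good is a design choice, not a given. A stricter setup might keep them in a third "unknown" bucket so they don't dilute the comparison.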

Your Bayesian point actually resonates — separating good from bad sessions and looking for structural differences is close to what we do. But the hypotheses aren't "100 LLM guesses + k-means." Each one is grounded in actual session data: what the user asked, what the agent did, what came back, and what the signal was.

Curious about the dimensional analysis point — are you thinking about reducing the feature space before hypothesis generation?
