We open-sourced the core of IncidentFox, an AI SRE / on-call agent.
The main thing we’re working on is handling context for incident investigation. Logs, metrics, traces, runbooks, prior incidents — this data is large, fragmented, and doesn’t fit cleanly into an LLM context window.
For logs, we don’t fetch everything. We start with stats (counts, severity distribution, common patterns) and then sample intentionally (errors-only, around-anomaly, stratified). Most investigations end up touching tens of log lines instead of millions.
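Roughly, the flow is: one cheap aggregate pass first, then targeted samples. A minimal sketch of that idea (the record fields and helper names here are illustrative, not the actual IncidentFox code):

```python
# Sketch of "stats first, then sample intentionally" for log triage.
# Assumes each log record is a dict with "severity", "message", "ts".
from collections import Counter
import random

def summarize(logs):
    """Cheap aggregate view before pulling any raw lines."""
    return {
        "total": len(logs),
        "by_severity": Counter(l["severity"] for l in logs),
        "top_messages": Counter(l["message"] for l in logs).most_common(5),
    }

def sample(logs, anomaly_ts=None, window=60, per_bucket=10):
    """Intentional sampling: errors-only, around-anomaly, stratified by severity."""
    errors = [l for l in logs if l["severity"] in ("ERROR", "FATAL")]
    around = [
        l for l in logs
        if anomaly_ts is not None and abs(l["ts"] - anomaly_ts) <= window
    ]
    by_sev = {}
    for l in logs:
        by_sev.setdefault(l["severity"], []).append(l)
    stratified = []
    for bucket in by_sev.values():
        stratified.extend(random.sample(bucket, min(per_bucket, len(bucket))))
    # Dedupe and keep the result small enough to fit a model context.
    picked = {id(l): l for l in errors[:per_bucket] + around[:per_bucket] + stratified}
    return list(picked.values())
```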
For long documents like runbooks or postmortems, flat chunk-based RAG wasn’t working well, so we implemented a RAPTOR-style hierarchical retrieval to preserve higher-level context while still allowing drill-down.
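In broad strokes, a RAPTOR-style index clusters the leaf chunks, summarizes each cluster, and recurses, so a query can match either a high-level summary or a raw chunk. A minimal sketch, where embed(), cluster(), and summarize_texts() are placeholders for whatever embedding model, clustering step, and LLM call you plug in (not the actual stack):

```python
# RAPTOR-style indexing: build levels of increasingly abstract summaries,
# then retrieve over all levels at once ("collapsed tree" retrieval).
import numpy as np

def build_tree(chunks, embed, summarize_texts, cluster, max_levels=3):
    levels = [chunks]                      # level 0 = raw runbook/postmortem chunks
    for _ in range(max_levels):
        current = levels[-1]
        if len(current) <= 2:
            break
        vectors = embed(current)
        groups = cluster(vectors)          # e.g. k-means/GMM over embeddings -> lists of indices
        summaries = [summarize_texts([current[i] for i in idxs]) for idxs in groups]
        levels.append(summaries)           # each level is more abstract than the last
    return levels

def retrieve(query, levels, embed, top_k=5):
    # Search every level at once, so the answer can come from a
    # high-level summary, a raw chunk, or both.
    nodes = [text for level in levels for text in level]
    q = np.asarray(embed([query])[0])
    vecs = np.asarray(embed(nodes))
    scores = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    return [nodes[i] for i in np.argsort(-scores)[:top_k]]
```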
The open-source core is a tool-based agent runtime with integrations. You can run it locally via CLI (or Slack/GitHub), so it's effectively on-prem on your laptop.
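For a sense of what "tool-based agent runtime" means in practice, here's a toy version: tools register themselves, and a loop lets the model pick a tool, runs it, and feeds the result back. The tool names and dispatch format are hypothetical, not the real interface:

```python
# Toy tool-based agent loop; the registry and action format are made up
# for illustration and do not reflect the actual IncidentFox API.
TOOLS = {}

def tool(name):
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("search_logs")
def search_logs(query: str) -> str:
    return f"(log lines matching {query!r})"

@tool("get_runbook")
def get_runbook(service: str) -> str:
    return f"(runbook for {service})"

def run_agent(ask_model, question, max_steps=5):
    transcript = [question]
    for _ in range(max_steps):
        # ask_model returns either {"tool": ..., "args": {...}} or {"final": "..."}
        action = ask_model(transcript)
        if action.get("final"):
            return action["final"]
        result = TOOLS[action["tool"]](**action["args"])
        transcript.append(result)
    return "no conclusion within step budget"
```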
We’re very early and trying to find our first users / customers. If you’ve been on call before, I’m curious:
- does “AI SRE” feel useful, or mostly hype?
- where would something like this actually help, if at all?
- what would you want it to do before you’d trust it?
If you try it and it’s not useful, that’s still helpful feedback. I’ll be around in the comments!
incidentiq•2w ago
1. "AI SRE" useful or hype? Useful in specific contexts, but the trust barrier is real. Most on-call engineers are skeptical of AI suggestions during incidents because the cost of a wrong recommendation at 3am is high. That said, the pain of digging through logs and finding relevant context is also real.
2. Where it helps: The biggest wins are in "pre-work": surfacing relevant past incidents before you start investigating, correlating alerts that are likely related, and summarizing what changed recently. That reduces the "context gathering" phase, which often eats 30%+ of incident time.
3. Trust requirements: For me to trust it:
- Show confidence levels and your reasoning. "Here's what I found and why" beats "do this."
- Be a copilot that accelerates my investigation, not one that acts on my behalf.
- Get the easy stuff 100% right before attempting the hard stuff. If log correlation is wrong on obvious patterns, I won't trust root cause suggestions.
The RAPTOR approach for runbooks is interesting - the "loss of context in chunked RAG" problem is real for long-form incident docs. How do you handle cases where relevant context spans multiple documents (e.g., a runbook that references an architecture doc)?