We’re building Kubegraf, an AI-powered SRE for Kubernetes that helps engineers quickly identify the root cause of incidents.
In most Kubernetes incidents we’ve seen, engineers spend 1–2 hours manually correlating logs, metrics, events, probes, and scaling behavior just to understand what actually broke.
Kubegraf tries to automate that process.
It connects to your cluster's observability data and analyzes relationships between signals to surface the most probable root cause and a timeline of what happened.
The goal is simple: reduce the time spent debugging infrastructure so engineers can focus on building.
Some things we’re working on:
• Incident timeline reconstruction
• Cross-signal correlation (logs, metrics, events)
• Root cause suggestions
• Lower MTTR for Kubernetes teams
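To make the first two items concrete, here is a minimal sketch of what timeline reconstruction across signals could look like. It is an illustration only, not Kubegraf's actual implementation: the `Signal` type, the `build_timeline` and `group_by_gap` helpers, the 60-second gap, and the sample data are all made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Signal:
    """One observation from a hypothetical signal source."""
    timestamp: datetime
    source: str   # illustrative labels: "log", "metric", "event"
    message: str


def build_timeline(*streams):
    """Merge per-source signal lists into one list ordered by time."""
    merged = [signal for stream in streams for signal in stream]
    return sorted(merged, key=lambda s: s.timestamp)


def group_by_gap(timeline, max_gap=timedelta(seconds=60)):
    """Split a sorted timeline into bursts separated by more than max_gap."""
    groups = []
    for signal in timeline:
        if groups and signal.timestamp - groups[-1][-1].timestamp <= max_gap:
            groups[-1].append(signal)   # close enough to the previous signal
        else:
            groups.append([signal])     # start a new burst
    return groups


# Made-up sample data standing in for real cluster signals.
t0 = datetime(2024, 1, 1, 12, 0, 0)
logs = [Signal(t0 + timedelta(seconds=20), "log", "OOMKilled in container api")]
metrics = [Signal(t0, "metric", "memory usage above 95%")]
events = [
    Signal(t0 + timedelta(seconds=35), "event", "readiness probe failed"),
    Signal(t0 + timedelta(minutes=10), "event", "HPA scaled deployment to 5"),
]

timeline = build_timeline(logs, metrics, events)
bursts = group_by_gap(timeline)

for signal in bursts[0]:
    print(signal.timestamp.time(), signal.source, signal.message)
```

The earliest signal in a burst is only a candidate for the root cause. Ordering by time shows what happened first, not what caused what, so a real system would need more than this.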
We’re preparing for our launch and opening early access.
If you work with Kubernetes, we’d really appreciate feedback from the HN community.
Prajol•2h ago
Early access: https://kubegraf.io
Happy to answer questions or hear how your team currently handles Kubernetes incident debugging.