The problem I kept running into: you get paged at 3 AM, open Grafana, flip between six dashboards, grep through logs, try to figure out which deploy went out, and an hour later you might have a root cause. Even then, you still have to dig through the code to find the culprit. The data is all there, but nothing connects it for you.
RocketLogs tries to fix that:
VS Code / Cursor Extension - Fetches your slow and erroring endpoints from production and lists them, with their latencies, right in the Cursor chat sidebar. It maps each endpoint to your codebase using Pyroscope profiling, so you get metrics -> traces -> profiles -> code without leaving your editor. It also shows a heat map of how much latency each module and function contributes to an endpoint, so you can jump to the offending function and fix it within minutes, without context-switching.
Beyond the extension, it's a full observability stack: log search, distributed tracing with waterfall views, SLOs with error budget tracking, incident management with AI-generated synopses, smart alerts, and Prometheus dashboards.
We're built on the LGTM stack, so if you're already sending data to Loki/Tempo/Prometheus, you can point RocketLogs at your existing infrastructure and use it as another observability layer that sits on top. If you want to use multiple vendors, you can also fan out your telemetry to several backends at once.
Would love feedback. What's the most painful part of your current observability workflow?