exporters:
  otlp/incidentary:
    endpoint: api.incidentary.com:4317
    headers:
      authorization: "Bearer ${INCIDENTARY_API_KEY}"
service:
  pipelines:
    traces:
      exporters: [otlp/incidentary]
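For context, a runnable collector config also needs a receiver wired into the same pipeline; here is a minimal sketch, assuming a standard OTLP gRPC receiver on the default port:

```yaml
# Minimal end-to-end sketch: receive OTLP from your apps,
# forward traces to the incidentary endpoint above.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  otlp/incidentary:
    endpoint: api.incidentary.com:4317
    headers:
      authorization: "Bearer ${INCIDENTARY_API_KEY}"

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/incidentary]
```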
What it does: when an alert fires (PagerDuty, OpsGenie, a Slack command, or a direct webhook), it takes the trace data your services already emitted in the window around the alert and assembles a causal chain like "service-A's HTTP call to service-B returned 503 at 14:22:03; service-B was timing out on Redis; Redis primary was failing over." That artifact lands in the incident channel before the war room fills up.

I built this because every incident I was on opened the same way: a team of engineers opening different tools and coming up with different theories. Datadog had the trace data, Sentry had the errors, Slack had the channel, PagerDuty fired the alert. Nothing stitched them into "what failed first, and what called what." Incidentary does that one thing and nothing else.
Why it isn't a Datadog Watchdog clone:
- Deterministic, not probabilistic. Every edge is proven by an actual parent_ce_id or W3C traceparent in the message envelope. If a service in the path wasn't instrumented, that link appears as a labeled gap, not filled in by a model.
- No LLM in the assembly path. The artifact is identical on a re-run; you can paste it into an RCA without retracting a sentence later.
- Pre-alert capture. The SDKs and the collector processor we ship hold an unsampled rolling window. When error rate, p99, or queue depth increases, the window elevates to full detail before the page fires, so you see the lead-up, not just the aftermath.
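To make the pre-alert capture concrete, here is a sketch of what configuring it might look like; the processor name and every key below are hypothetical illustrations, not the shipped schema:

```yaml
# Hypothetical config, for illustration only -- names are not the real ones.
processors:
  incidentary_window:
    rolling_window: 5m        # unsampled buffer held before any page fires
    elevate_on:               # conditions that flip the window to full detail
      error_rate_increase: 2x
      p99_latency: 500ms
      queue_depth: 1000
```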
- Cluster ground-truth via the K8s operator. OOMs, evictions, and HPA scale events never show up in your application telemetry. The operator joins them onto the same trace by service+time rather than by W3C trace context (which most cluster events don't propagate).
If you are on dd-trace and don't run a separate OTel collector: dd-trace v1+ has built-in OTLP export. One env-var flip and you're dual-shipping to Datadog and to us. Or run our Docker sidecar in front of the dd-agent.
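The sidecar route can also be built from a vanilla OTel collector that dual-ships: receive OTLP from the app, then fan out to both the Datadog exporter (from collector-contrib) and the Incidentary endpoint. A sketch, assuming your app is already pointed at the collector:

```yaml
# Dual-ship sketch: one OTLP receiver, two exporters in the same pipeline.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  datadog:                    # from opentelemetry-collector-contrib
    api:
      key: ${DD_API_KEY}
  otlp/incidentary:
    endpoint: api.incidentary.com:4317
    headers:
      authorization: "Bearer ${INCIDENTARY_API_KEY}"

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [datadog, otlp/incidentary]
```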
Quickstart: https://incidentary.com/docs/quickstart
Free plan: 1,000,000 traces/month, 14-day retention, no credit card. Pricing is per trace, not per span. A 3-service request with two downstream calls is one trace. Most sub-10-service teams stay on free.
Live artifact, no signup: a synthetic incident assembled by the same engine that runs in production, not a video: https://incidentary.com/demo
Pushback I would value:
1. The dd-trace dual-export path. A lot of you run Datadog APM and nothing else. If the env-var flip doesn't survive a real production dd-trace setup, that is the installation path I most need to fix this week. Tell me where it breaks; I would rather hear it from HN than from a user who churns silently.
2. The deterministic-only stance against the AI-Ops wave. I am betting "no hallucinations and you can paste this into an RCA" is worth more than what an LLM can guess from spans. The market is voting differently this year, and I want the strongest case for why I am wrong.
If your collector refuses the exporter, drop the YAML in a reply and I will debug it in the thread. Easier than a support ticket and you get the answer in public.