Show HN: TraceRoot – Open-source agentic debugging for distributed services

https://github.com/traceroot-ai/traceroot

40•xinweihe•6mo ago

Hey, Xinwei and Zecheng here. We are the authors of TraceRoot (https://github.com/traceroot-ai/traceroot).

TraceRoot (https://traceroot.ai) is an open-source debugging platform that helps engineers fix production issues faster by combining structured traces, logs, source code context, and discussions from GitHub PRs, issues, and Slack channels with AI agents.

At the heart are our lightweight Python (https://github.com/traceroot-ai/traceroot-sdk) and TypeScript (https://github.com/traceroot-ai/traceroot-sdk-ts) SDKs. They hook into your app using OpenTelemetry and capture logs and traces. These are sent either to a local Jaeger (https://www.jaegertracing.io/) + SQLite backend or to our cloud backend, where we correlate them into a single view. From there, our custom agent takes over.

The agent builds a heterogeneous execution tree that merges spans, logs, and GitHub context into one internal structure. This allows it to model the control and data flow of a request across services. It then uses LLMs to reason over this tree - pruning irrelevant branches, surfacing anomalous spans, and identifying likely root causes. You can ask questions like “what caused this timeout?” or “summarize the errors in these 3 spans”, and it can trace the failure back to a specific commit, summarize the chain of events, or even propose a fix via a draft PR.
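To make the execution tree idea concrete, here is a minimal Python sketch. It is not TraceRoot's implementation: the `Node` class, the dict field names (`span_id`, `parent_id`, `level`), and the pruning rule are all assumptions for illustration. In particular, the post says the agent uses LLMs to prune branches, while this sketch substitutes a simple rule that keeps only branches containing an error log.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One span in the execution tree, with the logs emitted inside it."""
    span_id: str
    name: str
    logs: list = field(default_factory=list)      # (level, message) tuples
    children: list = field(default_factory=list)  # child Node objects


def build_tree(spans, logs):
    """Link spans by parent_id and attach each log to its span."""
    nodes = {s["span_id"]: Node(s["span_id"], s["name"]) for s in spans}
    roots = []
    for s in spans:
        parent = nodes.get(s.get("parent_id"))
        (parent.children if parent else roots).append(nodes[s["span_id"]])
    for log in logs:
        if log["span_id"] in nodes:
            nodes[log["span_id"]].logs.append((log["level"], log["message"]))
    return roots


def prune(node):
    """Keep a node only if it or one of its descendants logged an error."""
    node.children = [c for c in (prune(c) for c in node.children) if c]
    has_error = any(level == "ERROR" for level, _ in node.logs)
    return node if has_error or node.children else None


# Hypothetical request: one root span with two children, one of which failed.
spans = [
    {"span_id": "a", "parent_id": None, "name": "GET /checkout"},
    {"span_id": "b", "parent_id": "a", "name": "cart.load"},
    {"span_id": "c", "parent_id": "a", "name": "payment.charge"},
]
logs = [
    {"span_id": "b", "level": "INFO", "message": "cart loaded"},
    {"span_id": "c", "level": "ERROR", "message": "gateway timeout"},
]
root = prune(build_tree(spans, logs)[0])
print([child.name for child in root.children])
```

After pruning, only the failing `payment.charge` branch remains under the root, which is the kind of reduced structure one would then hand to a model.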

We also built a debugging UI that ties everything together - you explore traces visually, pick spans of interest, and get AI-assisted insights with full context: logs, timings, metadata, and surrounding code. Unlike most tools, TraceRoot stores long-term debugging history and builds structured context for each company - something we haven’t seen many others do in this space.

What’s live today:

- Python and TypeScript SDKs for structured logs and traces.

- AI summaries, GitHub issue generation, and PR creation.

- Debugging UI that ties everything together

TraceRoot is MIT licensed and easy to self-host (via Docker). We support both local mode (Jaeger + SQLite) and cloud mode. Inspired by OSS projects like PostHog and Supabase, the core is free, while enterprise features like agent mode, multi-tenancy, and Slack integration are paid.

If you find it interesting, you can see a demo video here: https://www.youtube.com/watch?v=nb-D3LM0sJM

We’d love you to try TraceRoot (https://traceroot.ai) and share any feedback. If you're interested, our code is available here: https://github.com/traceroot-ai/traceroot. If we don’t have something, let us know and we’d be happy to build it for you. We look forward to your comments!

Comments

thatrandybrown•6mo ago

I like the idea of this and the use case, but don't love the tight coupling to openai. I'd love to see a framework for allowing BYOM.

zecheng•6mo ago

Yes, there is a roadmap to support more models. For now there is an in-progress PR to support Anthropic models: https://github.com/traceroot-ai/traceroot/pull/21 (contributed by some active open source contributors). Feel free to let us know which (open source) model or framework (vLLM etc.) you want to use :)

44za12•6mo ago

Why not use something like litellm?

zecheng•6mo ago

That's also an option. We will consider adding it later :)

Onawa•6mo ago

It's been 2.5 years since ChatGPT came out, and so many projects still don't allow for easy switching of the OPEN_AI_BASE_URL or affiliated parameters.

There are so many inferencing libraries that serve an OpenAI-compatible API that any new project being locked in to OpenAI only is a large red flag for me.
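A minimal sketch of the kind of switch being asked for, using only the Python standard library. The environment variable name `OPENAI_BASE_URL`, the default URL, and the local address are assumptions for illustration; nothing here is taken from TraceRoot's code.

```python
import os

# Assumed default; any OpenAI-compatible server exposes the same path layout.
DEFAULT_BASE_URL = "https://api.openai.com/v1"


def chat_completions_url(env=os.environ):
    """Build the chat completions endpoint from an overridable base URL."""
    base = env.get("OPENAI_BASE_URL", DEFAULT_BASE_URL).rstrip("/")
    return f"{base}/chat/completions"


# Point the same code at a hypothetical local OpenAI-compatible server.
print(chat_completions_url({"OPENAI_BASE_URL": "http://localhost:8000/v1/"}))
```

Keeping the base URL out of the code means a self-hosted or air-gapped inference server can be swapped in through configuration alone.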

xinweihe•6mo ago

Thanks for the feedback! Totally hear you on the tight OpenAI coupling - we're aware and already working to make BYOM easier. Just to echo what Zecheng said earlier: broader model flexibility is definitely on the roadmap.

Appreciate you calling it out — helps us stay honest about the gaps.

ethan_smith•6mo ago

Adding model provider abstraction would significantly improve adoption, especially for organizations with specific LLM preferences or air-gapped environments that can't use OpenAI.

xinweihe•6mo ago

Yep, you're spot on - and we're hearing this loud and clear across the thread. Model abstraction is on the roadmap, and we're already working on making BYOM smoother.

lmeyerov•6mo ago

I'm curious -- let's say we have Claude Code hooked up to MCPs for Jaeger, Grafana, and the usual git/gh CLIs it can use out-of-the-box, and we let Claude's planner work through investigations with whatever help we give it. Would TraceRoot do anything clever wrt the AI that such a setup wouldn't/couldn't?

(I'm asking b/c we're planning a setup that's basically that, so real question.)

xinweihe•6mo ago

Good question! Your setup already covers a lot — but TraceRoot tries to go a bit further in a few areas:

In TraceRoot, we organize all logs, metrics, etc. around traces and build an execution tree. This structured view makes it much easier for our agent to reason through the large amount of telemetry data using context-aware optimizations. (We plan to support Slack and Notion integration as well.)

It’s not a one-off tool. TraceRoot is a live monitoring platform. It continuously watches what’s happening in prod. So when something breaks, the agent already has full team-visible context, not just a scratchpad session spun up in the moment.

Down the line, we're aiming for automatic bug detection and remediation - not just smarter copiloting, but proactive debugging workflows. The system also retains team-level memory of past bugs, fixes, and infra quirks, so the agent gets smarter over time.

We’ve open sourced a lot of the core. Would love to jam on this if you're up for it. Always down to trade ideas or even hack on something together!

lmeyerov•6mo ago

I don't understand - otel does that unification already. Traces connected to logs etc.. I'm still missing something...

xinweihe•6mo ago

Thanks for the follow-up. Let me try to clarify!

When we say we "organize all logs, metrics, and traces", we mean more than just linking them together (which otel already supports). What we’re doing is:

- Context engineering optimization: We leverage the structure among logs, spans, and metadata to filter and group relevant context before passing it to the LLM. In real production issues, it's common to see 10k+ logs, traces, etc. related to a single incident, but most of it is noise. Throwing all that at agents usually leads to poor performance due to context bloat (see https://arxiv.org/pdf/2307.03172). We're working on addressing that by doing structured filtering and summarization. For more details see https://bit.ly/45Bai1q.

- Human-in-the-Loop UI: For cases where developers want to manually inspect or guide the agent, we provide a UI that makes it easy to zoom in on relevant subtrees, trace paths, or log clusters and directly select spans to be included in the reasoning of agents.

The goal isn't just unification, it's scalable reasoning over noisy telemetry data, both automated and interactive.
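The filter-and-group step described above might look roughly like the following Python sketch. The severity threshold, the grouping key, the line cap, and the record fields (`span`, `level`, `message`) are assumptions chosen for illustration, not TraceRoot's actual pipeline.

```python
from collections import Counter

LEVEL_RANK = {"DEBUG": 0, "INFO": 1, "WARNING": 2, "ERROR": 3}


def build_context(logs, min_level="WARNING", max_lines=50):
    """Drop low-severity logs, collapse repeats, and cap the line count."""
    threshold = LEVEL_RANK[min_level]
    kept = [entry for entry in logs if LEVEL_RANK[entry["level"]] >= threshold]
    # Identical (span, level, message) triples collapse into one counted line.
    counts = Counter((e["span"], e["level"], e["message"]) for e in kept)
    lines = [
        f"[{span}] {level} {message} (x{n})"
        for (span, level, message), n in counts.most_common(max_lines)
    ]
    return "\n".join(lines)


# Hypothetical incident: a repeated error, one warning, and one info line.
logs = (
    [{"span": "payment.charge", "level": "ERROR", "message": "gateway timeout"}] * 3
    + [{"span": "cart.load", "level": "INFO", "message": "cart loaded"}]
    + [{"span": "payment.charge", "level": "WARNING", "message": "retrying"}]
)
print(build_context(logs))
```

Five raw log lines shrink to two summary lines here; the same idea applied to 10k+ lines is what keeps the prompt within a useful size.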

Hope that clears things up a bit! Happy to dive deeper if useful.

lmeyerov•6mo ago

The second link helps

It's interesting to wonder if 80% of the question answering can be achieved as a prompts/otel.md over MCPs connected to Claude Code and let agentic reasoning do the rest

Ex:

* When investigating errors, only query for error-level logs

* When investigating performance, only query spans (skip logs unless required) and keep only name, time. Linearize as ... .

* When querying both logs & traces, inline logs near relevant trace as part of an llm-friendly stored artifact jobs/abc123/context.txt

Are there aspects of the question answering (not ui widgets) you think are too hard there?
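The third rule above, inlining logs next to their spans in one LLM-friendly text artifact, can be sketched in a few lines of Python. The output format and the field names (`start_ms`, `duration_ms`, `span_id`) are hypothetical; the comment leaves the exact linearization unspecified.

```python
def linearize(spans, logs):
    """Render spans in start-time order, with their logs indented below."""
    logs_by_span = {}
    for log in logs:
        logs_by_span.setdefault(log["span_id"], []).append(log)
    lines = []
    for span in sorted(spans, key=lambda s: s["start_ms"]):
        lines.append(f'{span["start_ms"]}ms {span["name"]} ({span["duration_ms"]}ms)')
        for log in logs_by_span.get(span["span_id"], []):
            lines.append(f'    {log["level"]}: {log["message"]}')
    return "\n".join(lines)


# Hypothetical trace with two spans and one error log.
spans = [
    {"span_id": "b", "name": "payment.charge", "start_ms": 12, "duration_ms": 950},
    {"span_id": "a", "name": "GET /checkout", "start_ms": 0, "duration_ms": 1000},
]
logs = [{"span_id": "b", "level": "ERROR", "message": "gateway timeout"}]
print(linearize(spans, logs))
```

The resulting text could be written to a file such as the `jobs/abc123/context.txt` artifact mentioned above and handed to the agent as a single document.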

zecheng•6mo ago

Yes, we can connect, for example, CC with MCPs. But this may not work well if, say, a user wants to check the latency for the previous 10 days of error logs on function A. Using MCP, the agent first needs to fetch 10 days of error logs, then somehow get the latency, correlate the two, and apply filters for function A. IMO it will hallucinate a lot if there are too many tools, logs, and traces. On the TraceRoot platform, we "mix" all the necessary data first and then apply filters on the structured data based on the user's query, which is more accurate, straightforward, and efficient. Here is the README of the general design: https://github.com/traceroot-ai/traceroot/tree/main/rest/age...
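The example query above (latency over the previous 10 days of error logs on function A) becomes a single filter once spans and logs are already joined into one record per span. A minimal Python sketch, assuming a hypothetical record shape with `function`, `timestamp`, `latency_ms`, and `log_levels` fields; TraceRoot's real schema may differ.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean


def error_latencies(records, function, now, days=10):
    """Return latencies of spans for `function` that logged an error recently."""
    cutoff = now - timedelta(days=days)
    return [
        r["latency_ms"]
        for r in records
        if r["function"] == function
        and r["timestamp"] >= cutoff
        and any(level == "ERROR" for level in r["log_levels"])
    ]


# Hypothetical pre-merged records: one per span, with its log levels attached.
now = datetime(2025, 1, 31, tzinfo=timezone.utc)
records = [
    {"function": "function_a", "timestamp": now - timedelta(days=2),
     "latency_ms": 120, "log_levels": ["INFO", "ERROR"]},
    {"function": "function_a", "timestamp": now - timedelta(days=20),
     "latency_ms": 900, "log_levels": ["ERROR"]},
    {"function": "function_a", "timestamp": now - timedelta(days=1),
     "latency_ms": 40, "log_levels": ["INFO"]},
    {"function": "function_b", "timestamp": now - timedelta(days=1),
     "latency_ms": 300, "log_levels": ["ERROR"]},
    {"function": "function_a", "timestamp": now - timedelta(days=5),
     "latency_ms": 300, "log_levels": ["ERROR"]},
]
latencies = error_latencies(records, "function_a", now)
print(latencies, mean(latencies))
```

The filtering is deterministic, so the model only has to interpret the small result rather than orchestrate several tool calls and correlate their outputs itself.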

sand_9999•6mo ago

I can connect MCP for Datadog/NewRelic/Cloudwatch logs. Cursor or ClaudeCode would give me all that I need. Are you doing something new here?

xinweihe•6mo ago

Fair question. Here’s how TraceRoot is different.

- We don’t just stream raw logs/traces into an LLM, we build execution trees and correlate data across services and threads. That gives our agent causal context, not just pattern matching.

- It’s designed to debug real issues in production, where things are messy, not just dev or staging.

- We are aiming for automatic bug detection and remediation soon, not just copiloting, but a debugging agent that can spot regressions and trigger fixes proactively.

- We are working on persisting historical incidents, fixes, and infra quirks, so the agent improves with each investigation and doesn't start from scratch every time.

Happy to dive deeper! Let me know if you have more questions.

sand_9999•6mo ago

Sentry does that. Also most observability platforms have tracing built in. All of this can be fed into LLM using MCP.

I saw your video...and I see that it makes things easy to understand (in right panel) at any node.

zecheng•6mo ago

We provide an easy-to-use solution that connects code context to the corresponding logs and traces, whereas Sentry is quite complex to use. Also, directly using MCP with LLMs may lead to hallucination if there are too many tool candidates or a lot of logs (which is very common). We need some optimizations to improve efficiency and reduce the context fed into the LLMs. An example is shown in this README: https://github.com/traceroot-ai/traceroot/tree/main/rest/age... There is also a Cursor-like UI in TraceRoot to better involve a human in the loop, which is crucial for minimizing context length and which other platforms such as Sentry do not have.

jinusunil•6mo ago

How do you evaluate the output of your trace tool? Are there benchmarks for tracing tools?

xinweihe•6mo ago

Yep, we're working on a golden test set with known root causes to benchmark and track agent performance over time. It's taking a bit of work to get right, but we're on it and definitely open to contributions!

autorinalagist•6mo ago

Very cool! I have a question: how are you evaluating the performance while you develop this? Do you have some golden set of examples that you evaluate against?

xinweihe•6mo ago

Great question! Yes, we're actively building a golden test set of debugging scenarios with known root causes and failure patterns. This allows us to systematically evaluate and improve agent performance with every release. Contributions are very welcome as we expand this effort!

In the meantime, we lean on explainability, i.e. every agent output is grounded in the original logs, traces, and metadata, with inline references. So if the output is off, users can easily verify, debug, and either trust or challenge the agent’s reasoning by reviewing the linked evidence.
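For readers wondering what scoring against a golden set could look like, here is a minimal Python sketch. The scenario format, the exact-match metric, and the `naive_predict` stand-in are all assumptions for illustration; the actual test set is described above as still in progress.

```python
def top1_accuracy(golden, predict):
    """Fraction of scenarios where the predicted root cause matches the label."""
    hits = sum(1 for case in golden if predict(case["trace"]) == case["root_cause"])
    return hits / len(golden)


def naive_predict(trace):
    """Stand-in for the agent: blame the first span that logged an error."""
    for span in trace:
        if span["error"]:
            return span["name"]
    return None


# Two hypothetical scenarios with hand-labelled root causes.
golden = [
    {"trace": [{"name": "cart.load", "error": False},
               {"name": "payment.charge", "error": True}],
     "root_cause": "payment.charge"},
    {"trace": [{"name": "auth.check", "error": True},
               {"name": "db.query", "error": True}],
     "root_cause": "db.query"},
]
print(top1_accuracy(golden, naive_predict))
```

The second scenario shows why a labelled set matters: the first failing span is a symptom, not the cause, so the naive baseline scores only half.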

