A single agent request can fan out across sub-agents, tools, model calls, retries, and external APIs. When something goes wrong, logs aren’t enough, and raw traces only help if you can actually read them.
The hard part isn’t emitting OpenTelemetry spans. That’s easy.
The hard part is turning high-cardinality agent traces into something a human can reason about.
In our agent framework at Inkeep, we use OTEL end-to-end. The linked post is by SigNoz because that’s the OTEL backend we use underneath, but we don’t expect developers to debug agents by staring at raw span trees or waterfall views alone.
On top of raw OTEL traces, we expose an agent-aware timeline view that makes a single request’s execution path across agents, tools, and model calls easy to follow.
The underlying trace data is still standard OTEL. The extra work is in making it understandable.
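To make the idea concrete, here is a minimal sketch of how a flat list of spans could be folded into agent-labelled timeline rows. It is not Inkeep's implementation: the `Span` shape, the `build_timeline` function, and the `agent.name` / `step.kind` attribute keys are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    # Simplified stand-in for an exported OTEL span (hypothetical shape).
    span_id: str
    parent_id: Optional[str]
    name: str
    start_ms: int
    end_ms: int
    attributes: dict = field(default_factory=dict)


def build_timeline(spans):
    """Turn a flat span list into time-ordered rows labelled by owning agent."""
    by_id = {s.span_id: s for s in spans}

    def owning_agent(span):
        # Walk up the parent chain until a span carries an agent name.
        current = span
        while current is not None:
            agent = current.attributes.get("agent.name")  # assumed attribute key
            if agent:
                return agent
            current = by_id.get(current.parent_id)
        return "unknown"

    rows = []
    for span in sorted(spans, key=lambda s: s.start_ms):
        kind = span.attributes.get("step.kind")  # assumed: "agent", "tool", "model"
        if kind is None:
            continue  # hide plumbing spans (HTTP clients, DB drivers, ...)
        rows.append({
            "agent": owning_agent(span),
            "kind": kind,
            "name": span.name,
            "start_ms": span.start_ms,
            "duration_ms": span.end_ms - span.start_ms,
        })
    return rows
```

The point of the sketch is that the view is a projection of standard trace data: it drops spans that carry no agent-level meaning and inherits the agent label from ancestors, rather than requiring a separate data model.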
We ended up using SigNoz largely because:
- It exposes a programmatic trace API, which makes building custom views possible
- It can actually query high-cardinality traces without falling over
- It’s OTEL-native, so nothing is proprietary or locked in
A few opinions we formed along the way:
- Instrument everything by default
- Make observability automatic, not opt-in
- Treat observability as part of the product, not a separate tool
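One way to read "automatic, not opt-in" is that registering a tool is what instruments it, so there is no uninstrumented path to forget. The sketch below illustrates that with the standard library only: `RECORDED_SPANS` is a stand-in for a real OTEL tracer and exporter, and `tool`, `TOOLS`, and the recorded field names are hypothetical, not part of any framework's API.

```python
import functools
import time

RECORDED_SPANS = []  # stand-in for a real tracer/exporter (assumption)
TOOLS = {}           # hypothetical tool registry


def tool(name):
    """Register a tool; every registered tool is timed and recorded."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            status = "ok"
            try:
                return fn(*args, **kwargs)
            except Exception:
                status = "error"
                raise  # record the failure, but never swallow it
            finally:
                RECORDED_SPANS.append({
                    "name": f"tool.{name}",
                    "status": status,
                    "duration_s": time.monotonic() - start,
                })
        TOOLS[name] = wrapper  # only the instrumented version is reachable
        return wrapper
    return decorator
```

In a real setup the body of `wrapper` would start and end a span through the OpenTelemetry SDK instead of appending to a list; the design choice being illustrated is only that instrumentation lives in the registration path.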
If you’re building AI agents and observability isn’t shaping your UX yet, you’re going to feel it in production.
gaurav12342345•8h ago