Also, SigNoz supports rendering a practically unlimited number of spans in the trace detail UI and lets you filter them as well, which has been really useful for analyzing batch processes: https://signoz.io/blog/traces-without-limits/
You can further run aggregation on spans to monitor failures and latency.
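For illustration, here's a minimal sketch of the kind of span aggregation meant here, run in Python over spans exported as JSON lines. The file name and field names (status_code, start_time, end_time in nanoseconds) are assumptions for the example, not SigNoz's actual query API:

    import json
    from statistics import quantiles

    # Spans previously exported as JSON lines (hypothetical file and schema).
    with open("spans.jsonl") as f:
        spans = [json.loads(line) for line in f]

    # Failure rate: fraction of spans whose status marks an error.
    errors = sum(1 for s in spans if s.get("status_code") == "ERROR")
    error_rate = errors / len(spans)

    # Latency: duration in milliseconds derived from the two timestamps (ns).
    durations_ms = [(s["end_time"] - s["start_time"]) / 1e6 for s in spans]
    p99_ms = quantiles(durations_ms, n=100)[98]

    print(f"error rate {error_rate:.2%}, p99 latency {p99_ms:.1f} ms")

In practice you'd run the equivalent query inside the backend rather than pulling spans out, but the shape of the computation is the same.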
PS: I am a SigNoz maintainer
Is that beginning "logged" at a separate point in time from when the span end is logged?
> AIUI, there aren't really start or end messages,
Can you explain this sentence a bit more? How does it have a duration without a start and end?
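For reference, here's a minimal sketch with the opentelemetry-python SDK showing how that works: nothing is emitted when the span starts; one record is exported when it ends, and that record carries both the start and end timestamps, from which the duration is derived.

    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

    # Print finished spans to stdout as they are exported.
    provider = TracerProvider()
    provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)

    tracer = trace.get_tracer("demo")

    with tracer.start_as_current_span("work"):
        pass  # nothing is exported while the span is still open

    # Only when the `with` block exits does the exporter receive a single record,
    # containing both start_time and end_time; there is no separate "start" message.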
The thing is that at scale you’d never be able to guarantee that the start of a span showed up at a collector in chronological order anyway, especially since the queuing intervals are distinct per collection sidecar. But what you could do with two events is discover spans with no orderly ending to them. You could easily truncate traces that go over the span limit instead of just dropping them on the floor (fuck you for this, OTEL, this is the biggest bullshit in the entire spec). And you could reduce the number of trace IDs in your parsing buffer that have no metadata associated with them, both in aggregate and in the number of messages left in a limbo state per thousand events processed.
The other turns out to be our ops team’s problem more than OTEL’s. Well, a little of both. If a trace goes over a limit, OTEL just silently drops the entire thing, and the default size on AWS is useful for toy problems, not for retrofitting onto live systems. It’s the silent-failure defaults of OTEL that are giant footguns. Give me a fucking error log on data destruction, you asshats.
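A sketch of a workaround, not anything OTEL ships: a custom SpanProcessor for the Python SDK that at least logs when a single trace crosses a span budget, so the eventual drop isn't silent. The threshold is a placeholder and the bookkeeping is deliberately naive:

    import logging
    from collections import Counter
    from opentelemetry.sdk.trace import SpanProcessor

    class TraceSizeWarning(SpanProcessor):
        """Warn when one trace accumulates more spans than the backend will keep."""

        def __init__(self, max_spans_per_trace=1000):  # assumed limit; tune per backend
            self._max = max_spans_per_trace
            self._counts = Counter()

        def on_end(self, span):
            trace_id = span.context.trace_id
            self._counts[trace_id] += 1
            if self._counts[trace_id] == self._max:
                logging.warning("trace %032x exceeded %d spans; the backend may drop it",
                                trace_id, self._max)
            # A real implementation would also evict finished trace IDs from _counts.

Register it next to the normal exporter with provider.add_span_processor(TraceSizeWarning()).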
I’ll just use Prometheus next time, which is apparently what our ops team recommended (except for one individual, who happened to be the one I talked to).
And if you’ve ever tried to trace a call tree using correlation IDs and Splunk queries and still say OTEL is ‘just a fancy’ then you’re in dangerous territory, even if it’s just by way of explanation. Don’t feed the masochists. When masochists derail attempts at pain reduction, they become sadists.
I was initially implementing OTel throughout my API, but ran into some minor headaches and a lot of boilerplate. I shopped around a bit and saw that Sentry has a lot of nice integrations everywhere and seems to have all the same features (metrics, traces, error reporting). I'm considering just using Sentry for the backend, the frontend, and other pieces as well.
Curious if anyone has thoughts on this. Assuming Sentry can fulfill our requirements, the only thing that really concerns me is vendor lock-in. But I'm wondering what other people think.
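If it helps with the comparison, the Sentry side really is a small amount of setup in Python; a sketch with a placeholder DSN and an arbitrary sample rate:

    import sentry_sdk

    sentry_sdk.init(
        dsn="https://public_key@o0.ingest.sentry.io/0",  # placeholder DSN
        traces_sample_rate=0.2,  # sample 20% of transactions for performance tracing
    )

    # Unhandled errors in instrumented frameworks are reported automatically;
    # a transaction can also be opened explicitly for a background job:
    with sentry_sdk.start_transaction(op="task", name="nightly-import"):
        pass  # hypothetical workload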
OTeL also has numerous integrations https://opentelemetry.io/ecosystem/registry/. In contrast, Sentry lacks traditional metrics and other capabilities that OTeL offers. IIRC, Sentry experimented with "DDM" (Delightful Developer Metrics), but this feature was deprecated and removed while still in alpha/beta.
Sentry excels at error tracking and provides excellent browser integration. This might be sufficient for your needs, but if you're looking for the comprehensive observability features that OpenTelemetry provides, you'd likely need a full observability platform.
1: https://github.com/getsentry/self-hosted/blob/25.5.1/docker-...
OTel can take a little while to understand because, like many standards, it's designed by committee and the code and documentation reflect that. LLMs can help, but the last time I asked them about OTel they constantly gave me code that was out of date with the latest OTel libraries.
Prometheus is bog easy to run, Grafana understands it, and anything involving alerting/monitoring from logs is a bad idea for future you. I PROMISE YOU, PLEASE DON'T!
Maybe this has changed?
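On the "don't alert from logs" point: exposing the metrics directly is only a few lines with prometheus_client. Names and the port here are placeholders:

    import random
    import time
    from prometheus_client import Counter, Histogram, start_http_server

    REQUESTS = Counter("app_requests_total", "Total requests handled", ["status"])
    LATENCY = Histogram("app_request_seconds", "Request latency in seconds")

    @LATENCY.time()
    def handle_request():
        time.sleep(random.random() / 10)  # stand-in for real work
        REQUESTS.labels(status="200").inc()

    if __name__ == "__main__":
        start_http_server(8000)  # Prometheus scrapes http://localhost:8000/metrics
        while True:
            handle_request()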
reactordev•9h ago
That said, if you own your infrastructure, I’d build out a SigNoz cluster in a heartbeat. OTel is awesome, but once you set down a path for your org, it’s going to be extremely painful to switch. Choose OTel if you’re a hybrid cloud or you have on-premises stuff. If you’re on AWS, CloudWatch is a better option simply because they already have the data. Dead simple tracing.
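For a sense of how little plumbing the CloudWatch route needs on the metrics side, a boto3 sketch publishing one custom metric (the namespace and metric name are made up; the tracing piece on AWS is X-Ray and isn't shown here):

    import boto3

    # Credentials and region come from the usual environment/instance profile.
    cloudwatch = boto3.client("cloudwatch")

    cloudwatch.put_metric_data(
        Namespace="MyApp",  # made-up namespace
        MetricData=[{
            "MetricName": "CheckoutLatency",  # made-up metric
            "Value": 182.0,
            "Unit": "Milliseconds",
        }],
    )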
6r17•7h ago
I wonder if there are any other adapters for trace ingest instead of OTEL?
darkstar_16•7h ago
6r17•7h ago
elza_1111•6h ago
bbkane•3h ago
mdaniel•1h ago
https://github.com/uptrace/uptrace/blob/v1.7.6/LICENSE
https://github.com/openobserve/openobserve/blob/v0.14.7/LICE...
FunnyLookinHat•7h ago
We've frequently seen a slowdown or error at the top of our stack, and the teams are able to immediately pinpoint the problem as a downstream service. Not only that, they can see the specific issue in the downstream service almost immediately!
Once you get to that level of detail, having your infrastructure metrics pulled into your Otel provider does start to make some sense. If you observe a slowdown in a service, being able to see that the DB CPU is pegged at the same time is meaningful, etc.
[Edit - Typo!]
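For anyone wondering how the "pinpoint the downstream service" part works mechanically, it's trace-context propagation. With auto-instrumentation it's roughly the sketch below; it assumes a TracerProvider is already configured, and the downstream URL is made up:

    import requests
    from opentelemetry.instrumentation.requests import RequestsInstrumentor

    # Patch the requests library so every outgoing call carries the current
    # trace context in its headers (W3C traceparent by default).
    RequestsInstrumentor().instrument()

    # The downstream service extracts that header and parents its own spans to it,
    # so the whole call tree shows up as a single trace.
    requests.get("https://orders.internal/api/orders")  # hypothetical downstream service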
makeavish•6h ago
Also SigNoz has native correlation between different signals out of the box.
PS: I am a SigNoz maintainer
elza_1111•6h ago
Check this out, https://signoz.io/blog/6-silent-traps-inside-cloudWatch-that...
mdaniel•59m ago
The demo for https://github.com/draios/sysdig was also just amazing, but I don't have any idea what the storage requirements would be for leaving it running