
OpenTelemetry for Go: Measuring overhead costs

https://coroot.com/blog/opentelemetry-for-go-measuring-the-overhead/
129•openWrangler•7mo ago

Comments

dmoy•7mo ago
Not on original topic, but:

I definitely prefer having graphs put the unit at least on the axis, if not in the individual axis labels directly.

I.e. instead of having a graph titled "latency, seconds" at the top and then way over on the left have an unlabeled axis with "5m, 10m, 15m, 20m" ticks...

I'd rather have title "latency" and either "seconds" on the left, or, given the confusion between "5m = 5 minutes" or "5m = 5 milli[seconds]", just have it explicitly labeled on each tick: 5ms, 10ms, ...

Way, way less likely to confuse someone when the units are right on the number, instead of floating way over in a different section of the graph

Thaxll•7mo ago
Logging, metrics and traces are not free, especially if you turn them on for every request.

Tracing every HTTP 200 at 10k req/sec is not something you should be doing; at that rate you should sample the 200s (1% or so) and trace all the errors.

anonzzzies•7mo ago
A very small % of startups get anywhere near that traffic, so why give them angst? Most people can just do this without any issues and learn from it, and a tiny fraction shouldn't.
williamdclt•7mo ago
10k/s across multiple services is reached quickly even at startup scale.

At my previous company (a startup), we used OTel everywhere and we definitely needed sampling for cost reasons (1/30 IIRC). And that was with a much cheaper provider than Datadog.

cogman10•7mo ago
Having high req/s isn't as big a negative as it once was. Especially if you are using http2 or http3.

Designing APIs which cause a high number of requests and spit out a low amount of data can be quite legitimate. It allows for better scaling and capacity planning vs having single calls that take a large amount of time and return large amounts of data.

In the old HTTP/1 days, it was a bad thing because a single connection could only service one request at a time. Getting any sort of concurrency or high request rate required many connections (which carried a lot of overhead due to the way TCP functions).

We've moved past that.

orochimaaru•7mo ago
Metrics are usually minimal overhead. Traces need to be sampled. Logs need to be sampled at error/critical levels. You also need to be able to dynamically change sampling and log levels.

100% traces are a mess. I didn't see where he set up sampling.
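
A minimal sketch of the dynamic-log-level piece using Go's standard library slog (the admin endpoint shown is just one illustrative way to flip the level at runtime, not something from the article):

    package main

    import (
        "log/slog"
        "net/http"
        "os"
    )

    func main() {
        // logLevel can be swapped atomically at runtime; the handler consults it
        // on every log call, so changes take effect immediately.
        var logLevel slog.LevelVar
        logLevel.Set(slog.LevelError) // start quiet

        slog.SetDefault(slog.New(slog.NewJSONHandler(os.Stderr, &slog.HandlerOptions{Level: &logLevel})))

        // Illustrative admin endpoint: GET /loglevel?level=debug to raise verbosity.
        http.HandleFunc("/loglevel", func(w http.ResponseWriter, r *http.Request) {
            var lvl slog.Level
            if err := lvl.UnmarshalText([]byte(r.URL.Query().Get("level"))); err != nil {
                http.Error(w, err.Error(), http.StatusBadRequest)
                return
            }
            logLevel.Set(lvl)
        })

        http.ListenAndServe("localhost:8080", nil)
    }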

phillipcarter•7mo ago
The post didn't cover sampling, which indeed significantly reduces overhead in OTel: when you head-sample at the SDK level, spans that aren't sampled are never created. Overhead is more of a concern when doing tail-based sampling only, where you want to trace every request and offload export to a sidecar so that export concerns are handled outside your app, which then routes to a sampler elsewhere in your infrastructure.

FWIW at my former employer we had some fairly loose guidelines for folks around sampling: https://docs.honeycomb.io/manage-data-volume/sample/guidelin...

There are outliers, but the general idea is that there's also a high cost to implementing sampling (especially for nontrivial setups), and if your volume isn't terribly high, you'll probably lose more in engineering time than you'd pay for the extra data you may not necessarily need.

nikolay_sivko•7mo ago
As suggested, I measured the overhead at various sampling rates:

No instrumentation (otel is not initialized): CPU=2.0 cores

SAMPLING 0% (otel initialized): CPU=2.2 cores

SAMPLING 10%: CPU=2.5 cores

SAMPLING 50%: CPU=2.6 cores

SAMPLING 100%: CPU=2.9 cores

Even with 0% sampling, OpenTelemetry still adds overhead due to context propagation, span creation, and instrumentation hooks.
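
For reference, a minimal sketch of configuring head sampling with the Go tracing SDK (the 10% ratio and the stdout exporter here are illustrative, not the exact benchmark setup):

    package main

    import (
        "context"
        "log"

        "go.opentelemetry.io/otel"
        "go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
        sdktrace "go.opentelemetry.io/otel/sdk/trace"
    )

    func main() {
        exp, err := stdouttrace.New()
        if err != nil {
            log.Fatal(err)
        }

        tp := sdktrace.NewTracerProvider(
            // Head sampling: keep ~10% of root traces and follow the parent's
            // decision for child spans; unsampled spans are never recorded or exported.
            sdktrace.WithSampler(sdktrace.ParentBased(sdktrace.TraceIDRatioBased(0.10))),
            sdktrace.WithBatcher(exp),
        )
        defer tp.Shutdown(context.Background())
        otel.SetTracerProvider(tp)
    }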

orochimaaru•7mo ago
Thanks!!
jhoechtl•7mo ago
I am relatively new to the topic. In the OP's sample code there is no logging, right? It's metrics and traces but no logging.

How does logging work in OTel?

shanemhansen•7mo ago
To me, traces (or maybe more specifically spans) are essentially structured logs with a unique ID and a reference to a parent ID.

Very open to having someone explain why I'm wrong or why they should be handled separately.

kiitos•7mo ago
Traces have a very specific data model, and corresponding limitations, which don't really accommodate log events/messages of arbitrary size. The access model for traces is also fundamentally different vs. that of logs.
phillipcarter•7mo ago
There are practical limitations mostly with backend analysis tools. OTel does not define a limit on how large a span is. It’s quite common in LLM Observability to capture full prompts and LLM responses as attributes on spans, for example.
kiitos•7mo ago
> There are practical limitations mostly with backend analysis tools

Not just end-of-line analysis tools, but also initiating SDKs, and system agents, and intermediate middle-boxes -- really anything that needs to parse OTel.

Spec > SDK > Trace > Span limits: https://opentelemetry.io/docs/specs/otel/trace/sdk/#span-lim...

Spec > Common > Attribute limits: https://opentelemetry.io/docs/specs/otel/common/#attribute-l...

I know the spec says the default AttributeValueLengthLimit = infinity, but...

> It’s quite common in LLM Observability to capture full prompts and LLM responses as attributes on spans, for example.

...I'd love to learn about any OTel-compatible pipeline/system that supports attribute values of arbitrary size! because I've personally not seen anything that lets you get bigger than O(1MB).
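
For completeness, a rough sketch of bounding attribute sizes in the Go SDK rather than relying on the spec's unlimited default (the limit values are arbitrary, and option names may differ across SDK versions):

    package tracing

    import (
        sdktrace "go.opentelemetry.io/otel/sdk/trace"
    )

    func newTracerProvider() *sdktrace.TracerProvider {
        // Start from the SDK defaults, then cap attribute sizes explicitly.
        limits := sdktrace.NewSpanLimits()
        limits.AttributeValueLengthLimit = 8 * 1024 // truncate values beyond 8 KiB
        limits.AttributeCountLimit = 128            // drop attributes beyond the first 128

        return sdktrace.NewTracerProvider(
            sdktrace.WithRawSpanLimits(limits),
        )
    }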

phillipcarter•7mo ago
Well yeah, there are practical limits imposed by the fact that these have to run on real systems. But in practice, you find that you're limited by your backend observability system because it was designed for a world of many events with narrow data, not fewer events with wider data (so-called "wide events").

OTel and the standard toolkit you get with it don't prevent you from doing wide events.

kiitos•7mo ago
"Wide events" describe a structure/schema for incoming data on the "write path" to a system. That's fine. But that data always needs to be transformed, specialized, for use-case specific "read paths" offered by that same system, in order to be efficient. You can "do wide events" on ingest but you always need to transform them to specific (narrow? idk) events/metrics/summarizations/etc. for the read paths, that's the whole challenge of the space.
phillipcarter•7mo ago
You…don't? This is why tools like ClickHouse and Honeycomb are starting to grow: you just aggregate what you need at query time, and querying is usually not too expensive. The tradeoff is that each event has a higher per-unit cost, but this is often the more favorable tradeoff.
kiitos•7mo ago
> you just aggregate what you need at query time, and the cost to query is not usually too expensive

The entire challenge of observability systems is rooted in the fact that the volume of input data (wide events) on the write path is astronomically larger than what can ever be directly evaluated by any user-facing system on the read path. Data transformation, specialization, and so on are the whole ball game. If you can build something directly on top of raw wide events, and it works for you, that's cool, but it means that you're operating at trivial and non-representative scale.

phillipcarter•7mo ago
It does not.
phillipcarter•7mo ago
Logging in OTel is logging with your logging framework of choice. The SDK just requires you to initialize the wrapper; it'll then wrap your existing logging calls and correlate them with a trace/span in active context, if one exists. There is no separate logging API to learn. Logs are exported in a separate pipeline from traces and metrics.

Implementations for many languages are starting to mature, too.
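
To make that concrete, a minimal sketch of the Go slog bridge (module paths and option names reflect my understanding of the contrib bridge and log SDK, and may vary by version):

    package main

    import (
        "context"
        "log"

        "go.opentelemetry.io/contrib/bridges/otelslog"
        "go.opentelemetry.io/otel/exporters/stdout/stdoutlog"
        sdklog "go.opentelemetry.io/otel/sdk/log"
    )

    func main() {
        exp, err := stdoutlog.New()
        if err != nil {
            log.Fatal(err)
        }

        // Logs are exported on their own pipeline, separate from traces and metrics.
        provider := sdklog.NewLoggerProvider(
            sdklog.WithProcessor(sdklog.NewBatchProcessor(exp)),
        )
        defer provider.Shutdown(context.Background())

        // The bridge returns a regular *slog.Logger; records emitted inside an
        // active span's context are correlated with that trace/span.
        logger := otelslog.NewLogger("my-service", otelslog.WithLoggerProvider(provider))
        logger.InfoContext(context.Background(), "hello from the OTel slog bridge")
    }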

kubectl_h•7mo ago
You have to do the tracing anyway if you are going to tail-sample based on criteria that aren't available at the beginning of the trace (like an error that occurs later in the request). You can head-sample of course, but that's the coarsest sampling you can do, and you can't sample on anything but the initial conditions of the trace.

What we have started doing is still tracing every unit of work, but deciding at the root span the level of instrumentation fidelity we want for the trace based on the initial conditions. Spans are still generated in the lifecycle of the trace, but we discard them at the processor level (before they are batched and sent to the collector) unless they have errors on them or the trace has been marked as "full fidelity".
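
A rough sketch of that processor-level filtering for the Go SDK (the wrapper type and the "full fidelity" attribute key are illustrative, not the commenter's actual code):

    package tracing

    import (
        "context"

        "go.opentelemetry.io/otel/attribute"
        "go.opentelemetry.io/otel/codes"
        sdktrace "go.opentelemetry.io/otel/sdk/trace"
    )

    // fullFidelityKey is a hypothetical marker set on the root span when the
    // initial conditions warrant keeping every span of the trace.
    var fullFidelityKey = attribute.Key("sampling.full_fidelity")

    // filteringProcessor wraps another SpanProcessor (e.g. a batch processor)
    // and forwards a finished span only if it recorded an error or belongs to
    // a trace marked as full fidelity; everything else is dropped before batching.
    type filteringProcessor struct {
        next sdktrace.SpanProcessor
    }

    func (p *filteringProcessor) OnStart(parent context.Context, s sdktrace.ReadWriteSpan) {
        p.next.OnStart(parent, s)
    }

    func (p *filteringProcessor) OnEnd(s sdktrace.ReadOnlySpan) {
        if s.Status().Code == codes.Error {
            p.next.OnEnd(s)
            return
        }
        for _, kv := range s.Attributes() {
            if kv.Key == fullFidelityKey && kv.Value.Type() == attribute.BOOL && kv.Value.AsBool() {
                p.next.OnEnd(s)
                return
            }
        }
        // Otherwise: drop the span here, before it reaches the exporter.
    }

    func (p *filteringProcessor) Shutdown(ctx context.Context) error   { return p.next.Shutdown(ctx) }
    func (p *filteringProcessor) ForceFlush(ctx context.Context) error { return p.next.ForceFlush(ctx) }

Wiring it up would look roughly like sdktrace.WithSpanProcessor(&filteringProcessor{next: sdktrace.NewBatchSpanProcessor(exporter)}).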

kiitos•7mo ago
> Tracing every http 200 at 10k req/sec is not something you should be doing

You don't know if a request is HTTP 200 or HTTP 500 until it ends, so you have to at least collect trace data for every request as it executes. You can decide whether or not to emit trace data for a request based on its ultimate response code, but emission is gonna be out-of-band of the request lifecycle, and (in any reasonable implementation) amortized such that you really shouldn't need to care about sampling based on outcome. That is, the cost of collection is >> the cost of emission.

If your tracing system can't handle 100% of your traffic, that's a problem in that system; it's definitely not any kind of universal truth... !

jeffbee•7mo ago
I feel like this is a lesson that unfortunately did not escape Google, even though a lot of these open systems came from Google or ex-Googlers. The overhead of tracing, logs, and metrics needs to be ultra-low. But the (mis)feature whereby a trace span can be sampled post hoc means that you cannot have a nil tracer that does nothing on unsampled traces, because it could become sampled later. And the idea that if a metric exists it must be centrally collected is totally preposterous, makes everything far too expensive when all a developer wants is a metric that costs nothing in the steady state but can be collected when needed.
mamidon•7mo ago
How would you handle the case where you want to trace 100% of errors? Presumably you don't know a trace is an error until after you've executed the thing and paid the price.
phillipcarter•7mo ago
This is correct. It's a seemingly simple desire -- "always capture whenever there's a request with an error!" -- but the overhead needed to set that up gets complex. And then you start heading down the path of "well THESE business conditions are more important than THOSE business conditions!" and before you know it, you've got a nice little tower of sampling cards assembled. It's still worth it, just a hefty tax at times, and often the right solution is to just pay for more compute and data so that your engineers are spending less time on these meta-level concerns.
jeffbee•7mo ago
I wouldn't. "Trace contains an error" is a hideously bad criterion for sampling. If you have some storage subsystem where you always hedge/race reads to two replicas then cancel the request of the losing replica, then all of your traces will contain an error. It is a genuinely terrible feature.

Local logging of error conditions is the way to go. And I mean local, not to a central, indexed log search engine; that's also way too expensive.

phillipcarter•7mo ago
I disagree that it's a bad criterion. The case you describe is what sounds difficult, treating one error as part of normal operations and another as not. That should be considered its own kind of error or other form of response, and sampling decisions could take that into consideration (or not).
jeffbee•7mo ago
Another reason against inflating sampling rates on errors is: for system stability you never want to do more stuff during errors than you would normally do. Doing something more expensive during an error can cause your whole system, or elements of it, to latch into an unplanned operating point where they only have the capacity to do the expensive error path, and all of the traffic is throwing errors because of the resource starvation.
hamandcheese•7mo ago
It can also be expensive as in money. Especially if you are a Datadog customer.
phillipcarter•7mo ago
I mean, this is why you offload data elsewhere to handle things like sampling and filtering and aggregation.
amir_jak•7mo ago
You can use the OTel Collector for sampling decisions on traces; it can also be used to reduce log cost before data is sent to Datadog. There's a whole category of telemetry pipelines now for fully managing that (full disclosure: I work for https://www.sawmills.ai, a smart telemetry management platform).
vanschelven•7mo ago
The article never really explains what eBPF is -- AFAIU, it’s a kernel feature that lets you trace syscalls and network events without touching your app code. Low overhead, good for metrics, but not exactly transparent.

It's the umpteenth OTEL-critical article on the front page of HN this month alone... I have to say I share the sentiment, but probably for different reasons. My take is quite the opposite: most of the value is precisely at the application (code) level, so you definitely should instrument... and then focus on Errors over "general observability"[0]

[0] https://www.bugsink.com/blog/track-errors-first/

nikolay_sivko•7mo ago
I'm the author. I wouldn’t say the post is critical of OTEL. I just wanted to measure the overhead, that’s all. Benchmarks shouldn’t be seen as critique. Quite the opposite, we can only improve things if we’ve measured them first.
politician•7mo ago
I don't want to take away from your point, and yet... if anyone lacks background knowledge these days the relevant context is just an LLM prompt away.
vanschelven•7mo ago
It was always "a search away" but on the _web_ one might as well use... A hyperlink
sa46•7mo ago
Funny timing—I tried optimizing the Otel Go SDK a few weeks ago (https://github.com/open-telemetry/opentelemetry-go/issues/67...).

I suspect you could make the tracing SDK 2x faster with some cleverness. The main tricks are:

- Use a faster time.Now(). Go does a fair bit of work to convert to the Go epoch.

- Use atomics instead of a mutex. I sent a PR, but the reviewer caught correctness issues. Atomics are subtle and tricky.

- Directly marshal protos (instead of using reflection), either with a hand-rolled library or with https://github.com/VictoriaMetrics/easyproto.

The gold standard is how TiDB implemented tracing (https://www.pingcap.com/blog/how-we-trace-a-kv-database-with...). Since Go purposefully (and reasonably) doesn't currently provide a comparable abstraction for thread-local storage, we can't implement similar tricks like special-casing when a trace is modified on a single thread.

malkia•7mo ago
There is an effort to use the Arrow format for metrics too - https://github.com/open-telemetry/otel-arrow - but there's no client that exports directly to it yet.
rastignack•7mo ago
Would the sync.Pool trick mentioned here: https://hypermode.com/blog/introducing-ristretto-high-perf-g... help? It's lossy but might be a good compromise.
sa46•7mo ago
It might be. I've seen the trick pop up a few times:

1. https://puzpuzpuz.dev/thread-local-state-in-go-huh

2. https://victoriametrics.com/blog/go-sync-pool/

It's probably too complex for the Otel SDK, but I might give it a spin in my experimental tracing repo.
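
For anyone unfamiliar with it, a minimal sketch of the basic sync.Pool pattern those posts build on (the buffer reuse here is illustrative, not actual OTel SDK code, and it omits the lossy per-P batching the Ristretto post describes):

    package scratch

    import (
        "bytes"
        "sync"
    )

    // bufPool hands out reusable buffers so the hot path doesn't allocate a new
    // one per exported span. sync.Pool keeps per-P caches internally, which is
    // why it often stands in for thread-local storage in Go.
    var bufPool = sync.Pool{
        New: func() any { return new(bytes.Buffer) },
    }

    // encodeSpan is a hypothetical stand-in for whatever marshaling an exporter does.
    func encodeSpan(buf *bytes.Buffer, spanName string) {
        buf.WriteString(spanName)
    }

    func exportSpan(spanName string) []byte {
        buf := bufPool.Get().(*bytes.Buffer)
        buf.Reset()
        encodeSpan(buf, spanName)
        out := append([]byte(nil), buf.Bytes()...) // copy before the buffer is reused
        bufPool.Put(buf)
        return out
    }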

otterley•7mo ago
Out of curiosity, does Go's built-in pprof yield different results?

The nice thing about Go is that you don't need an eBPF module to get decent profiling.

Also, CPU and memory instrumentation is built into the Linux kernel already.
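
For anyone who hasn't used it, enabling the built-in profiler over HTTP takes a couple of lines (the port is arbitrary):

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
    )

    func main() {
        // Profiles can then be pulled with, e.g.:
        //   go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }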

coxley•7mo ago
The OTel SDK has always been much worse to use than Prometheus for metrics — including higher overhead. I prefer to only use it for tracing for that reason.
reactordev•7mo ago
Mmmmmmm, the last 8 months of my life wrapped into a blog post, but with an ad at the end. Excellent. Basically the same findings as mine, my team's, and everyone else's in the space.

Not being sarcastic at all, it's tricky. I like that the article called out eBPF and why you would want to disable it for speed, while still recommending caution. I kept hearing "single pane of glass" marketing speak from executives, and I kept my mouth shut about how that isn't feasible across an entire organization. Needless to say, they didn't like that non-answer, and so I was canned. What an engineer cares about is different from organization/business metrics, and the two were often confused.

I wrote a lot of great OTel receivers though: VMware, Veracode, HashiCorp Vault, GitLab, Jenkins, Jira, and the platforms themselves.

phillipcarter•7mo ago
> I kept hearing from executives a “single pane of glass” marketing speak

It's really unfortunate that observability vendors lean into and reinforce this too. What the execs usually care about is consolidating engineering workflows and getting teams to all "speak the same language" in terms of data, analysis workflows, visualizations, runbooks, etc.

This goal is admirable, but nearly impossible to achieve because it's the exact same problem as solving "we are aligned organizationally", which no organization ever is.

That doesn't mean progress can't be made, but it's always far more complicated than they would like.

reactordev•7mo ago
For sure, it’s the ultimate nirvana. Let me know when an organization gets there. :)
jiggawatts•7mo ago
A standard trick is to only turn on detailed telemetry from a subset of identical worker VMs or container instances.

Sampling is almost always sufficient for most issues, and when it’s not, you can turn on telemetry on all nodes for selected error levels or critical sections.

nfrankel•7mo ago
I have a talk on OpenTelemetry that I regularly present at conferences. Afterwards, I often get the question: "But what's the performance overhead?" In general, I answer with another question: "Is it better to go fast blindfolded, or slightly slower with full visibility?" Then I advise the person to do their own performance test in their specific context.

I'm very happy somebody took the time to measure it.

baalimago•7mo ago
What's the performance drop on Prometheus?
valyala•7mo ago
Performance drop for exposing application metrics in Prometheus format is close to zero. These metrics are usually some counters, which are updated atomically in a few nanoseconds. Prometheus scrapes these metrics once per 10-30 seconds, so generating the /metrics response in Prometheus text exposition format doesn't need a lot of CPU, especially if using a library optimized for simplicity and speed like https://github.com/VictoriaMetrics/metrics/
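
As a sketch of how little is involved, a minimal example using the library linked above (the metric name, label, and port are illustrative):

    package main

    import (
        "net/http"

        "github.com/VictoriaMetrics/metrics"
    )

    // requestsTotal is an atomically updated counter; incrementing it on the hot
    // path costs a few nanoseconds.
    var requestsTotal = metrics.NewCounter(`http_requests_total{path="/hello"}`)

    func main() {
        http.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {
            requestsTotal.Inc()
            w.Write([]byte("hello"))
        })

        // Prometheus scrapes this endpoint every 10-30s; serializing the text
        // exposition format on demand is cheap.
        http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
            metrics.WritePrometheus(w, true)
        })

        http.ListenAndServe("localhost:8080", nil)
    }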