Why "top" missed a cron job that was killing our API latency

https://parth21shah.substack.com/p/why-your-dashboard-is-green-but-the
3•parth21shah•1h ago

Comments

parth21shah•1h ago
OP here. I’ve been doing backend work for ~15 years, but this was the first time I really felt why eBPF matters.

We had a latency spike that all the usual polling tools missed: top, CloudWatch, Datadog, everything looked normal. In the end it was a misconfigured cron job spawning ~50 short-lived workers every minute. Each one ran for ~500ms, burned the CPU, and exited before the next poll. So all our “snapshot” tools were basically blind.

I wrote the post to show this exact gap: polling = snapshots, tracing = event stream. For stuff that appears and disappears between polls, only tracing really sees it.

Tools like execsnoop or auditd can catch this, but in our case the overhead felt too high to leave on 24/7 in production. I am currently playing with a small Rust+Aya agent that listens on ring buffers so we can run this continuously with less overhead.

If you just want to try the idea, the post has a few bpftrace one-liners so you can reproduce the detection logic without writing any C or Rust.
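
The snapshot-versus-event-stream gap can be illustrated with a toy simulation. This is only a sketch in plain Python, not the bpftrace one-liners from the post and not real tracing. The ~50 workers per minute and ~500ms lifetime come from the comment above; the 10-second poll interval, the 5-second poll offset, and the assumption that every worker starts exactly at the top of the minute are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    pid: int
    start: float  # seconds since the start of the simulation
    end: float    # time at which the process has exited


def spawn_cron_bursts(minutes, workers_per_burst=50, lifetime=0.5):
    """Model a cron job that forks a burst of short-lived workers every minute."""
    procs = []
    pid = 1000  # arbitrary starting pid for the simulation
    for minute in range(minutes):
        start = minute * 60.0
        for _ in range(workers_per_burst):
            procs.append(Proc(pid, start, start + lifetime))
            pid += 1
    return procs


def poll_snapshots(procs, interval, offset, duration):
    """Pids visible to a poller that samples at offset, offset + interval, ..."""
    seen = set()
    t = offset
    while t < duration:
        # A snapshot only sees processes that are alive at the sampling instant.
        seen.update(p.pid for p in procs if p.start <= t < p.end)
        t += interval
    return seen


def trace_events(procs):
    """Pids visible to an event stream: every fork is recorded, however brief."""
    return {p.pid for p in procs}


procs = spawn_cron_bursts(minutes=5)
polled = poll_snapshots(procs, interval=10.0, offset=5.0, duration=300.0)
traced = trace_events(procs)
print(f"polling saw {len(polled)} workers, tracing saw {len(traced)}")
```

With these assumed numbers no poll lands inside a 500ms window, so the poller sees none of the workers while the event stream sees all of them. A poller whose sampling instant happened to line up with the burst would catch some, which is why the result depends on timing rather than on how much CPU the workers burn.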
danishSuri1994•1h ago
This is a great example of the blind spot between sampling-based observability and event-driven tracing.

Anything that appears + disappears between polls is effectively invisible unless you’re streaming syscalls/process events. It’s surprising how often “short-lived, high-impact” processes cause the worst production spikes.

Curious whether you’re planning to surface this at the scheduler level (run queue latency/involuntary context switches) or stick to process-lifecycle tracing?

parth21shah•52m ago
Right now I’m sticking to process lifecycle (sched_process_fork and sched_process_exit), mostly for correlation. It’s much easier to grab container ID / cgroup metadata at fork time and say “this pod/image is the bad actor” than it is to reconstruct that context off a firehose of sched_switch events.

I agree that run queue latency / scheduler stats are the “better” signals for pure performance debugging. But scheduler switches generate a huge volume of events compared to forks. So I’m starting with fork/exec/exit + container/cgroup mapping.

If you’ve shipped scheduler-level tracing in production I’d love to hear how you handled filtering + aggregation.
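
The fork-time correlation idea can be sketched roughly as follows. This is a plain Python illustration under stated assumptions, not the Rust+Aya agent: the event tuples, the `aggregate_by_cgroup` helper, and the `pod-cron` / `pod-api` labels are all hypothetical stand-ins for whatever a real sched_process_fork / sched_process_exit consumer would emit.

```python
from collections import defaultdict


def aggregate_by_cgroup(events):
    """Attribute process count and total runtime to the cgroup seen at fork time.

    events: iterable of (kind, timestamp_s, pid, cgroup) tuples, where kind is
    "fork" or "exit". Only fork events are assumed to carry a cgroup label.
    """
    cgroup_of = {}  # pid -> cgroup captured when the process was forked
    started = {}    # pid -> fork timestamp
    stats = defaultdict(lambda: {"count": 0, "runtime_s": 0.0})

    for kind, ts, pid, cgroup in events:
        if kind == "fork":
            cgroup_of[pid] = cgroup
            started[pid] = ts
        elif kind == "exit" and pid in started:
            # The exit event needs no metadata of its own: look up what was
            # stored at fork time and charge the runtime to that cgroup.
            entry = stats[cgroup_of.pop(pid)]
            entry["count"] += 1
            entry["runtime_s"] += ts - started.pop(pid)
    return dict(stats)


# Hypothetical event stream: two short cron workers and one longer API process.
events = [
    ("fork", 0.0, 101, "pod-cron"),
    ("fork", 0.0, 102, "pod-cron"),
    ("fork", 0.25, 201, "pod-api"),
    ("exit", 0.5, 101, None),
    ("exit", 0.5, 102, None),
    ("exit", 4.25, 201, None),
]
result = aggregate_by_cgroup(events)
for cgroup, entry in sorted(result.items()):
    print(cgroup, entry["count"], entry["runtime_s"])
```

Keeping the per-pid state small and dropping it on exit is one way to bound memory when forks are frequent; exits for pids that were never seen forking are simply ignored in this sketch.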
zahlman•1h ago
I could already guess the answer and there is just so little actual content here with way too many words to explain a simple idea. Which is what you typically get when you let the LLM write for you.

Reducing MCP token usage by 100x – you don't need code mode

https://www.speakeasy.com/blog/how-we-reduced-token-usage-by-100x-dynamic-toolsets-v2
1•crumbaugh•1m ago•0 comments

Act-1: A Robot Foundation Model Trained on Zero Robot Data [video]

https://www.youtube.com/watch?v=jjOfpsMRhL4
1•lukeinator42•1m ago•0 comments

To Rebuild the Labor Movement, Take on the Giants

https://jacobin.com/2025/10/union-organizing-targets-nlrb-strategy/
1•PaulHoule•2m ago•0 comments

How to Enforce Row-Level Security Using Lua Scripting

https://github.com/exasol/row-level-security-lua/blob/main/doc/user_guide/user_guide.md
1•geab•5m ago•0 comments

Considering a Tech Conference? Do's, Don'ts, and Notes

https://spin.atomicobject.com/tech-conference-dos-donts-notes/
1•philk10•5m ago•0 comments

Failure Is Required

https://theaiunderwriter.substack.com/p/failure-is-required
1•participant26•6m ago•0 comments

Why a foreign language sounds like a blur to non-native ears

https://medicalxpress.com/news/2025-11-foreign-language-blur-native-ears.html
1•pseudolus•6m ago•0 comments

Missing last name check left all Airline Passenger Data Vulnerable

https://alexschapiro.com/blog/security/vulnerability/2025/11/20/avelo-airline-reservation-api-vul...
3•bearsyankees•7m ago•0 comments

AI for bio needs real-time data

https://coherenceneuro.substack.com/p/ai-for-bio-needs-real-time-data
1•pppone•7m ago•0 comments

Key Observability Best Practices You Should Know in 2025

https://spacelift.io/blog/observability-best-practices
1•mariuszm•9m ago•0 comments

Benjamin Graham Formula

https://en.wikipedia.org/wiki/Benjamin_Graham_formula
1•kamaraju•9m ago•0 comments

Reply to Anil Dash, Re: Mozilla's Plan to Add AI to Firefox

https://manualdousuario.net/en/reply-anil-dash-firefox-mozilla-ai/
2•rpgbr•10m ago•0 comments

Nano Banana Pro

https://blog.google/technology/ai/nano-banana-pro/
31•meetpateltech•13m ago•7 comments

Open-weight LLM by a US company: Cogito v2.1 671B

https://twitter.com/drishanarora/status/1991204769642475656
1•newusertoday•14m ago•0 comments

VoiceCheap

https://www.voicecheap.ai/en
1•bellamoon544•14m ago•1 comments

Amazon RDS for PostgreSQL now supports major version 18

https://aws.amazon.com/about-aws/whats-new/2025/11/amazon-rds-postgresql-major-version-18/
2•enz•14m ago•0 comments

Tokyo Court Finds Cloudflare Liable for Manga Piracy in Long-Running Lawsuit

https://torrentfreak.com/tokyo-court-finds-cloudflare-liable-for-manga-piracy-in-long-running-law...
4•t-3•14m ago•0 comments

How to perform adaptive batching for massive remote LLM calls

https://cocoindex.io/blogs/batching
1•badmonster•15m ago•0 comments

Show HN: Built a vibe testing tool to turn screen recording into Playwright test

https://vibe.bug0.com
2•Sandeepg33k•16m ago•0 comments

AI is eating the world

https://www.ben-evans.com/presentations
1•rememberlenny•17m ago•0 comments

A vibecoded HN client with automatic summaries powered by AI

https://hn.nicola.dev/
2•napolux•18m ago•0 comments

Smart device uses AI and bioelectronics to speed up wound healing process

https://news.ucsc.edu/2025/09/smart-device-ai-bioelectronics-speed-up-wound-healing/
1•lxm•19m ago•0 comments

A Month of Chat-Oriented Programming

https://checkeagle.com/checklists/njr/a-month-of-chat-oriented-programming/
1•birdculture•20m ago•0 comments

OpenAI can't beat Google in consumer AI

https://nextword.substack.com/p/openai-cant-beat-google-in-consumer
1•gk1•20m ago•0 comments

Gemini 3 Pro Image

https://deepmind.google/models/gemini-image/pro/
1•meetpateltech•20m ago•1 comments

We avoided side-channels in our new post-quantum Go cryptography libraries

https://blog.trailofbits.com/2025/11/14/how-we-avoided-side-channels-in-our-new-post-quantum-go-c...
1•crescit_eundo•20m ago•0 comments

Freer Monads, More Extensible Effects [pdf]

https://okmij.org/ftp/Haskell/extensible/more.pdf
2•todsacerdoti•21m ago•0 comments

Vulnerabilities in LUKS2 disk encryption for confidential VMs

https://blog.trailofbits.com/2025/10/30/vulnerabilities-in-luks2-disk-encryption-for-confidential...
2•crescit_eundo•21m ago•0 comments

Balancer hack analysis and guidance for the DeFi ecosystem

https://blog.trailofbits.com/2025/11/07/balancer-hack-analysis-and-guidance-for-the-defi-ecosystem/
1•crescit_eundo•21m ago•0 comments

The Spectre of Information Overload

https://medium.com/@thevibethinker/the-spectre-of-information-overload-ce549a3a4714
3•erhuve•22m ago•0 comments