A kernel bug froze my machine: Debugging an async-profiler deadlock

https://questdb.com/blog/async-profiler-kernel-bug/
120•bluestreak•1mo ago

Comments

jerrinot•1mo ago
Author here. I've always been kernel-curious despite never having worked on one myself. Consider this either a collection of impractical party tricks or a hands-on way to get a feel for kernel internals.
SerCe•1mo ago
Great article! Just yesterday I watched a Devoxx talk by Andrei Pangin [1], the creator of async-profiler, where I learned about the new heatmap support. To many folks it might not sound that exciting until you realise that these heatmaps make it much easier to see patterns over time. If you're interested, there's a solid blog post [2] from Netflix that walks through the format and why it can be incredibly useful.

[1]: https://www.youtube.com/watch?v=u7-S-Hn-7Do

[2]: https://netflixtechblog.com/netflix-flamescope-a57ca19d47bb

jerrinot•1mo ago
Thanks for the kind words!

Heatmaps are amazing for pattern spotting. I also use them when hunting irregular hiccups or outliers. More people should know about this feature.

kreelman•1mo ago
That was a neat article.

Great that you had the time to be curious and dig into what was going on. QEMU is quite an amazing tool.

I'm kind of surprised there isn't a fairly robust kernel test around this issue, since it locks the machine down and I think the fix was to prevent a stuck CPU last time as well?

It's also vaguely surprising that this hasn't been encountered more often, particularly given everlier's comment in this thread (https://news.ycombinator.com/user?id=everlier) about running "20-30 containers" simultaneously and occasionally locking up the machine.

If you're still thinking about the bug a little, you could look over how other kernel tests work and implement a failing test around it....?

I imagine the tests have some way of detecting a locked-up kernel. I don't know exactly how they'd do it, but they probably have a technique. Most likely, since the kernel is literally stuck in a loop, it won't respond to anything, so creating any process, even one as simple as printing "Hello World!!", would fail and indicate that the machine is locked.

Perhaps this is one of those cases where something like UserModeLinux would allow a test to be easily put together, rather than spawning complete VMs via some kind of VM software. Again, would be interesting to know what Linux does with this kind of test.

pjmlp•1mo ago
As someone who also has Java in the toolbox, thanks for the links.
ChuckMcM•1mo ago
Question, isn't this a bug?

  static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
  {
  -	if (event->state != PERF_EVENT_STATE_ACTIVE)
  +	if (event->state != PERF_EVENT_STATE_ACTIVE ||
  +	    event->hw.state & PERF_HES_STOPPED)
   		return HRTIMER_NORESTART;

The bug being that the precedence of || is higher than the precedence of !=? Consider writing it as if ((event->state != PERF_EVENT_STATE_ACTIVE) || (event->hw.state & PERF_HES_STOPPED))

I say this as someone with too many scars from not parenthesizing my conditionals to make sure they work the way I meant them to.

jerrinot•1mo ago
Wow, someone is actually reading the article in detail, that's a good feeling! In C, the != operator has higher precedence than the || operator. That said, extra parentheses never hurt readability.
unsnap_biceps•1mo ago
Which language(s) give || higher precedence than != or ==?
anematode•1mo ago
Likely they're confusing it with bitwise OR, since in C, a | b == c parses as a | (b == c), causing widespread pain.
snvzz•1mo ago
Great debugging effort.

Now, with the complexity (MLoCs!) of the Linux kernel, this is definitely not the only bug to be found in there.

This is why Linux is just an interim kernel for those use cases where we still cannot use seL4 [0].

[0]: https://sel4.systems/

themafia•1mo ago
> Linux is just an interim kernel

35 years of "interim" status. Is there a roadmap?

kreelman•1mo ago
:-) LOL !
everlier•1mo ago
I'm glad to hear I'm not alone. Due to the nature of what I do, I often accumulate ~800-900GB of Docker images and volumes on my machine, sometimes running 20-30 containers at once and starting/stopping them concurrently. Rarely, but still too often (once every couple of weeks), this leads to a complete deadlock somewhere inside the kernel, due to some crazy race condition that I have absolutely no way to reproduce reliably.
jerrinot•1mo ago
It's much tougher when it's so hard to reproduce. Perhaps the NMI watchdog could help? https://docs.kernel.org/admin-guide/lockup-watchdogs.html
bluuewhale•1mo ago
Great write-up.

This kind of "debugging journey" post is gold.

broken_broken_•1mo ago
Nice article, thank you. Did you also consider using bpftrace while debugging?

I don't have much experience with it, but I think you can see the kernel call stack with it, and I know you can also see the return value (in eax). That would be less effort than QEMU + gdb + disabling kernel ASLR, etc.

jerrinot•1mo ago
I have no practical experience with bpftrace, so it did not occur to me. I'll give it a try and perhaps there's gonna be a 2nd part of this investigation.
Artoooooor•1mo ago
Ah, this is the bug that froze the system when Minecraft was running with Spark profiler mod!