Unweaving warp specialization on modern tensor core GPUs

https://rohany.github.io/blog/warp-specialization/
34•rohany•4mo ago

Comments

liuliu•4mo ago
My understanding is that you cannot talk about warp specialization without talking about the alternative: multi-stage pipelining. And the final example code given is a multi-stage pipeline with double buffers.

And here is my understanding of where they differ:

1. A multi-stage pipeline requires careful hand-tuning, even at the PTX level, to make sure your async waits are woven properly to maximize overlap.

2. Since register files are now huge, a multi-stage pipeline is difficult to write at the intrinsics level in a way that makes efficient use of them.

3. Warp specialization delegates most of this scheduling to be done dynamically, hence it is better adapted to the hardware (and has more information to make scheduling decisions at runtime). Although this is a bit moot, because we write different code for different hardware anyway.

Anything more I am missing?
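For concreteness, the hand-woven multi-stage pipeline described above might look like the following hedged CUDA sketch, using the `__pipeline_*` intrinsics from `<cuda_pipeline.h>`. The tile size, the one-float-per-thread copy, and `compute_on_tile` are illustrative assumptions, not code from the post; a real MMA kernel would copy wider vectors and run tensor core instructions here.

```cuda
// Sketch: two-stage (double-buffered) pipeline. While the block computes on
// buffer `cur`, cp.async fills buffer `nxt`. Assumes blockDim.x == TILE and
// that each thread copies one float per tile (purely for illustration).
#include <cuda_pipeline.h>

constexpr int TILE = 128;

__device__ void compute_on_tile(const float* tile);  // placeholder for the MMA work

__global__ void pipelined_kernel(const float* __restrict__ gmem, int num_tiles) {
    __shared__ float buf[2][TILE];

    // Prologue: start the async copy of tile 0 into buffer 0.
    __pipeline_memcpy_async(&buf[0][threadIdx.x], &gmem[threadIdx.x], sizeof(float));
    __pipeline_commit();

    for (int t = 0; t < num_tiles; ++t) {
        int cur = t & 1, nxt = (t + 1) & 1;
        if (t + 1 < num_tiles) {
            // Issue the next tile's copy *before* waiting, so it overlaps compute.
            __pipeline_memcpy_async(&buf[nxt][threadIdx.x],
                                    &gmem[(t + 1) * TILE + threadIdx.x],
                                    sizeof(float));
            __pipeline_commit();
        }
        // Wait until at most the most recent commit is still in flight, i.e.
        // tile t has landed. Placing these waits is the fragile hand-tuning
        // (the "weaving") that the comment refers to.
        __pipeline_wait_prior(t + 1 < num_tiles ? 1 : 0);
        __syncthreads();
        compute_on_tile(buf[cur]);
        __syncthreads();  // all threads done with buf[cur] before it is refilled
    }
}
```

Note how correctness depends on the exact ordering of commit/wait/sync calls relative to the compute; moving any one of them changes either correctness or the amount of overlap, which is why this style is hard to tune.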

rohany•4mo ago
Author here! I think that warp specialization is inherently related to multi-stage pipelining; they aren't really alternatives to each other. Warp specialization is a way to realize a multi-stage pipeline in the face of hazards that may cause the pipeline to spill out of the register file, or that don't let parts of the pipeline run concurrently as desired.

The fact that we tend to need different warp specialization strategies for different hardware is a consequence of the capabilities of that hardware (i.e. different asynchronous instruction types), and contributes to the complexity of targeting that new hardware.
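One concrete way to realize "warp specialization as a way to implement the pipeline" is libcudacxx's partitioned producer/consumer pipeline, sketched below: one warp only issues async copies while the remaining warps only compute. This is a hedged illustration using the documented `cuda::pipeline` API, not the author's code; `TILE`, the 4-warp block shape, and `compute_on_tile` are assumptions.

```cuda
// Sketch: warp-specialized double buffer via cuda::pipeline's producer/consumer
// roles. Assumes a 128-thread block (4 warps): warp 0 is the copy ("DMA") warp,
// warps 1-3 are compute warps.
#include <cuda/pipeline>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

constexpr int TILE = 128;
constexpr int NUM_STAGES = 2;

__device__ void compute_on_tile(const float* tile);  // placeholder for the MMA work

__global__ void warp_specialized(const float* __restrict__ gmem, int num_tiles) {
    __shared__ float buf[NUM_STAGES][TILE];
    __shared__ cuda::pipeline_shared_state<cuda::thread_scope_block, NUM_STAGES> state;

    auto block = cg::this_thread_block();
    bool producer = (threadIdx.x / 32 == 0);
    auto pipe = cuda::make_pipeline(
        block, &state,
        producer ? cuda::pipeline_role::producer : cuda::pipeline_role::consumer);

    for (int t = 0; t < num_tiles; ++t) {
        int s = t % NUM_STAGES;
        if (producer) {
            int lane = threadIdx.x % 32;               // 32 lanes x 4 floats = TILE
            pipe.producer_acquire();                   // block until stage s is free
            cuda::memcpy_async(&buf[s][lane * 4],
                               &gmem[t * TILE + lane * 4],
                               sizeof(float) * 4, pipe);
            pipe.producer_commit();                    // copy scheduling is now async
        } else {
            pipe.consumer_wait();                      // block until stage s is full
            compute_on_tile(buf[s]);
            pipe.consumer_release();                   // hand the buffer back
        }
    }
}
```

The scheduling burden moves from statically placed wait instructions into the acquire/release handshake, which is the "delegated dynamically" property discussed above; the cost is the extra synchronization state and the warps given up to copying.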

majke•4mo ago
I always assumed that when one warp waits for results from a long-latency instruction, another warp, potentially from another block, can be scheduled in.

I guess this post assumes the need to use all the GPU resources from within a single block.

rohany•4mo ago
> I always assumed that when one warp waits for results from a long latency instruction, another warp, potentially from another block can be scheduled in.

Yes, that is correct. However, most MMA-style kernels that utilize the Tensor Cores need enough resources per block that only one block fits on each SM.
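The "only one block per SM" situation can be checked for any given kernel with the runtime's occupancy query, as in this hedged host-side sketch; `my_mma_kernel` and the shared-memory and block sizes are hypothetical stand-ins.

```cuda
// Sketch: ask the occupancy calculator how many blocks of a kernel fit per SM
// given its block size and dynamic shared memory usage. Large tensor-core
// kernels (big smem staging buffers, high register counts) typically report 1,
// which is why cross-block latency hiding is unavailable to them.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void my_mma_kernel();  // hypothetical MMA kernel, defined elsewhere

int main() {
    int blocks_per_sm = 0;
    size_t dynamic_smem_bytes = 96 * 1024;  // assumed staging-buffer footprint
    int block_size = 128;                   // assumed threads per block
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocks_per_sm, my_mma_kernel, block_size, dynamic_smem_bytes);
    printf("blocks per SM: %d\n", blocks_per_sm);
    return 0;
}
```

When this reports 1, the hardware scheduler has no other block's warps to swap in, so latency must be hidden within the block, by pipelining or warp specialization as discussed above.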