
Unweaving warp specialization on modern tensor core GPUs

https://rohany.github.io/blog/warp-specialization/
18•rohany•1h ago

Comments

liuliu•1h ago
My understanding is that you cannot talk about warp specialization without talking about the alternative: multi-stage pipelining. And the final example code given is a multi-stage pipeline with double buffering.

And here is my understanding of where they differ:

1. A multi-stage pipeline requires careful hand-tuning, even at the PTX level, to make sure your async waits are woven in properly to maximize overlap.

2. Since register files are now huge, a multi-stage pipeline is difficult to write at the intrinsics level in a way that makes efficient use of them.

3. Warp specialization delegates most of this scheduling to the hardware at runtime, so it adapts better to the hardware (and has more information to make scheduling decisions). This is a bit moot, though, because we write different code for different hardware anyway.

Anything more I am missing?
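A minimal Python sketch of the single-instruction-stream pipeline described above: one loop issues loads `stages - 1` tiles ahead into a ring of buffers and places each wait by hand before the compute step. This is a CPU simulation under stated assumptions, not GPU code: `load_tile` and `pipelined_sum` are hypothetical names, a thread pool stands in for asynchronous copies (e.g. `cp.async`), and `sum` stands in for the MMA.

```python
from concurrent.futures import ThreadPoolExecutor


def load_tile(data, i):
    # Hypothetical stand-in for an asynchronous global -> shared memory copy.
    return data[i]


def pipelined_sum(data, stages=2):
    """One 'warp' keeps up to stages - 1 loads in flight ahead of compute."""
    if stages < 1:
        raise ValueError("stages must be >= 1")
    n = len(data)
    buffers = [None] * stages  # ring of stage slots holding pending loads
    total = 0
    with ThreadPoolExecutor(max_workers=stages) as pool:
        # Prologue: issue the first stages - 1 loads before any compute.
        for i in range(min(stages - 1, n)):
            buffers[i % stages] = pool.submit(load_tile, data, i)
        for i in range(n):
            # Issue the load stages - 1 tiles ahead into the slot consumed
            # on the previous iteration, so it overlaps with compute on tile i.
            nxt = i + stages - 1
            if nxt < n:
                buffers[nxt % stages] = pool.submit(load_tile, data, nxt)
            # The hand-placed "async wait": block until tile i has landed.
            tile = buffers[i % stages].result()
            total += sum(tile)  # stand-in for the MMA on tile i
    return total
```

With `stages=2` this is the double-buffered case: the load of tile `i + 1` is in flight while tile `i` is consumed. Where the wait sits relative to the next issue is the kind of placement that item 1 says must be tuned by hand.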

rohany•16m ago
Author here! I think that warp specialization is inherently related to multi-stage pipelining; they aren't really alternatives to each other. Warp specialization is a way to realize a multi-stage pipeline in the face of hazards that could cause the pipeline to spill out of the register file or prevent parts of the pipeline from running concurrently as desired.

The fact that we tend to need different warp specialization strategies for different hardware is a consequence of that hardware's capabilities (e.g., different asynchronous instruction types), and it adds to the complexity of targeting new hardware.
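To illustrate the point that warp specialization is one way to realize the same pipeline, here is a Python sketch of the producer/consumer structure, assuming two threads stand in for a producer warp and a consumer warp. Each stage slot has a "full" and an "empty" semaphore, loosely mirroring the per-stage barriers (such as PTX `mbarrier` objects) a real kernel would use; `warp_specialized_sum` is a hypothetical name, and the copy and MMA are stand-ins.

```python
import threading


def warp_specialized_sum(tiles, stages=2):
    """A producer 'warp' fills a ring of stage slots; a consumer 'warp' drains it."""
    if stages < 1:
        raise ValueError("stages must be >= 1")
    buffers = [None] * stages
    # Per-stage barriers: 'full' = data ready, 'empty' = slot free to overwrite.
    full = [threading.Semaphore(0) for _ in range(stages)]
    empty = [threading.Semaphore(1) for _ in range(stages)]
    total = 0

    def producer():
        for i, tile in enumerate(tiles):
            s = i % stages
            empty[s].acquire()       # wait until the consumer released slot s
            buffers[s] = list(tile)  # stand-in for a TMA / cp.async copy
            full[s].release()        # signal that slot s now holds tile i

    def consumer():
        nonlocal total
        for i in range(len(tiles)):
            s = i % stages
            full[s].acquire()          # wait for tile i to land in slot s
            total += sum(buffers[s])   # stand-in for the MMA on tile i
            empty[s].release()         # hand slot s back to the producer

    workers = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return total
```

Neither role interleaves the other's instructions: each blocks only on its own barrier, and the scheduler (here the OS, on a GPU the warp scheduler) runs whichever side is ready.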

majke•22m ago
I always assumed that when one warp waits for results from a long-latency instruction, another warp, potentially from another block, can be scheduled in.

I guess this post assumes the need to use all of the GPU's resources from within a single block.
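The latency-hiding mechanism described above can be estimated with Little's law: to issue every cycle, a scheduler needs enough resident warps that their independent in-flight instructions cover the latency. A rough Python sketch, assuming one issue slot every `issue_cycles` cycles and ignoring dual issue, caches, and contention; the function name and parameters are hypothetical.

```python
import math


def warps_to_hide_latency(latency_cycles, independent_instrs_per_warp, issue_cycles=1):
    """Little's-law estimate of resident warps needed to keep one scheduler issuing."""
    # Instructions that must be in flight to cover the latency at this issue rate.
    in_flight_needed = latency_cycles / issue_cycles
    # Each warp can contribute this many independent instructions before it stalls.
    return math.ceil(in_flight_needed / independent_instrs_per_warp)
```

With illustrative (not measured) numbers, `warps_to_hide_latency(400, 4)` returns 100. When few warps are resident, the missing overlap has to come from within the block itself, for example via asynchronous copies issued ahead of use.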

rohany•15m ago
> I always assumed that when one warp waits for results from a long latency instruction, another warp, potentially from another block can be scheduled in.

Yes, that is correct. However, MMA-style kernels that use the Tensor Cores usually need enough resources per block that only one block fits on each SM.
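A back-of-the-envelope check of this point, in Python: the number of resident blocks per SM is the minimum over each per-SM resource limit. The default limits approximate a Hopper-class (H100) SM and are assumptions; check the compute-capability table in the CUDA C++ Programming Guide for your GPU. Register allocation granularity and any per-block shared-memory reservation are ignored, and the tile shape, stage count, register count, and thread count in the usage lines are hypothetical.

```python
def pipeline_smem_bytes(tile_m, tile_n, tile_k, stages, bytes_per_elem=2):
    """Shared memory for a GEMM mainloop: one A and one B tile per stage."""
    # A tile is tile_m x tile_k, B tile is tile_n x tile_k (bf16 = 2 bytes by default).
    return stages * (tile_m + tile_n) * tile_k * bytes_per_elem


def blocks_per_sm(smem_per_block, regs_per_thread, threads_per_block,
                  smem_per_sm=228 * 1024, regs_per_sm=65536,
                  max_threads_per_sm=2048, max_blocks_per_sm=32):
    """Resident blocks per SM = min over each (assumed) per-SM limit."""
    by_smem = smem_per_sm // smem_per_block
    by_regs = regs_per_sm // (regs_per_thread * threads_per_block)
    by_threads = max_threads_per_sm // threads_per_block
    return min(by_smem, by_regs, by_threads, max_blocks_per_sm)


# Hypothetical 4-stage bf16 mainloop with 128x256 output tiles and K-tile 64,
# run by 384 threads (3 warpgroups) at 168 registers per thread.
smem = pipeline_smem_bytes(128, 256, 64, stages=4)
print(smem, blocks_per_sm(smem, regs_per_thread=168, threads_per_block=384))
# prints: 196608 1
```

Under these assumptions both shared memory and registers independently cap the SM at one block, so there is no second block for the scheduler to switch to.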
