Async/Await on the GPU

https://www.vectorware.com/blog/async-await-on-gpu/

64•Philpax•2h ago

Comments

shayonj•1h ago

Very cool to see this. It's something I have been curious about and have been exploring myself. I'd be curious what the parallels and differences are between this and NVIDIA's stdexec (beyond it being in Rust and using Future, which is also cool).

zozbot234•1h ago

I'm not quite seeing the real benefit of this. Is the idea that warps will now be able to do work-stealing and continuation-stealing when running heterogeneous parallel workloads? But that requires keeping the async function's state in GPU-wide shared memory, which is generally a scarce resource.

LegNeato•1h ago

Yes, that's the idea.

GPU-wide memory is not quite as scarce on datacenter cards or systems with unified memory. One could also have local executors with local futures that are `!Send` and placed in a faster address space.
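To make the "local executor with `!Send` futures" idea concrete, here is a minimal sketch in plain CPU Rust using only the standard library. It is not the executor from the post: `LocalExecutor`, `NoopWake`, and `run_demo` are hypothetical names invented for illustration, and nothing here touches GPU address spaces. It only shows that an executor whose `spawn` has no `Send` bound can hold futures that capture an `Rc`, so their state never has to be shareable across execution contexts.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Wake, Waker};

/// Waker that does nothing; enough for an executor that simply re-polls.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical single-context executor. Tasks have no `Send` bound,
/// so they may hold `Rc`, `RefCell`, and other `!Send` state.
struct LocalExecutor {
    tasks: VecDeque<Pin<Box<dyn Future<Output = ()>>>>,
}

impl LocalExecutor {
    fn new() -> Self {
        Self { tasks: VecDeque::new() }
    }

    fn spawn(&mut self, fut: impl Future<Output = ()> + 'static) {
        self.tasks.push_back(Box::pin(fut));
    }

    /// Polls tasks round-robin until every one has completed.
    fn run(&mut self) {
        let waker = Waker::from(Arc::new(NoopWake));
        let mut cx = Context::from_waker(&waker);
        while let Some(mut task) = self.tasks.pop_front() {
            if task.as_mut().poll(&mut cx).is_pending() {
                self.tasks.push_back(task);
            }
        }
    }
}

/// Spawns three tasks that share an `Rc`, which makes each future `!Send`.
fn run_demo() -> Vec<u32> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let mut exec = LocalExecutor::new();
    for id in 0..3u32 {
        let log = Rc::clone(&log);
        exec.spawn(async move {
            log.borrow_mut().push(id);
        });
    }
    exec.run();
    let out = log.borrow().clone();
    out
}

fn main() {
    println!("{:?}", run_demo());
}
```

The design point is the missing `Send` bound on `spawn`: a work-stealing executor would need it, while a local one can drop it and keep task state wherever is cheapest.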

textlapse•59m ago

What's the performance like? What would the benefits be of converting a streaming multiprocessor programming model to this?

LegNeato•52m ago

We aren't focused on performance yet (it is often workload- and executor-dependent, and as the post says we currently do some inefficient polling), but Rust futures compile down to state machines, so they are a zero-cost abstraction.

The anticipated benefits are similar to the benefits of async/await on CPU: better ergonomics for the developer writing concurrent code, better utilization of shared/limited resources, fewer concurrency bugs.
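As a rough illustration of "futures compile down to state machines", the sketch below puts an `async fn` next to a hand-written enum that approximates what the compiler generates for it. This is plain CPU Rust with only the standard library, not code from the post. `YieldOnce`, `AddAfterYield`, `NoopWake`, and `block_on` are hypothetical names, and the real compiler output differs in detail. The busy-polling `block_on` is a stand-in for a real executor.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Leaf future: Pending on the first poll, Ready on the second.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref(); // request another poll
            Poll::Pending
        }
    }
}

/// What the developer writes.
async fn add_after_yield(a: u32, b: u32) -> u32 {
    YieldOnce { yielded: false }.await;
    a + b
}

/// Approximately what it becomes: one variant per suspension point,
/// each holding the locals that are live across that point.
enum AddAfterYield {
    Start { a: u32, b: u32 },
    Waiting { a: u32, b: u32, inner: YieldOnce },
    Done,
}

impl Future for AddAfterYield {
    type Output = u32;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        let this = self.get_mut(); // fine here: every field is Unpin
        loop {
            match &mut *this {
                AddAfterYield::Start { a, b } => {
                    let (a, b) = (*a, *b);
                    let inner = YieldOnce { yielded: false };
                    *this = AddAfterYield::Waiting { a, b, inner };
                }
                AddAfterYield::Waiting { a, b, inner } => {
                    if Pin::new(inner).poll(cx).is_pending() {
                        return Poll::Pending;
                    }
                    let sum = *a + *b;
                    *this = AddAfterYield::Done;
                    return Poll::Ready(sum);
                }
                AddAfterYield::Done => panic!("polled after completion"),
            }
        }
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Busy-polls a future to completion; returns its output and the poll count.
fn block_on<F: Future>(fut: F) -> (F::Output, u32) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (out, polls);
        }
    }
}

fn main() {
    let (compiled, n1) = block_on(add_after_yield(2, 3));
    let (manual, n2) = block_on(AddAfterYield::Start { a: 2, b: 3 });
    assert_eq!((compiled, n1), (manual, n2));
    println!("result = {compiled}, polls = {n1}");
}
```

Both versions suspend once and resume once. The future itself needs no heap allocation or runtime support beyond whatever calls `poll`, which is the sense in which the abstraction is described as zero-cost.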

textlapse•7m ago

warp is expensive - essentially it's running a 'don't run code' to maintain SIMT.

GPUs are still not practically-Turing-complete in the sense that there are strict restrictions on loops/goto/IO/waiting (there are a bunch of band-aids to make it pretend it's not a functional programming model).

So I am not sure retrofitting a Ferrari to cosplay an Amazon delivery van is useful other than for tech showcase?

Good tech showcase though :)

firefly2000•56m ago

Is this Nvidia-only or does it work on other architectures?

LegNeato•52m ago

Currently NVIDIA-only, we're cooking up some Vulkan stuff in rust-gpu though.

monster_truck•30m ago

I don't have anything to offer but my encouragement, but there are _dozens_ of ROCm enjoyers out there.

In years prior I wouldn't have even bothered, but it's 2026 and AMD's drivers actually come with a recent version of torch that 'just works' on Windows. Anything is possible :)

firefly2000•3m ago

Does the lack of forward progress guarantees (ITS) on other architectures pose challenges for async/await?

Arch485•23m ago

Very cool!

Is the goal with this project (generally, not specifically async) to have an equivalent to e.g. CUDA, but in Rust? Or is there another intended use-case that I'm missing?
