GPU-wide memory is not quite as scarce on datacenter cards or systems with unified memory. One could also have local executors with local futures that are `!Send` and place in a faster address space.
The anticipated benefits are similar to the benefits of async/await on CPU: better ergonomics for the developer writing concurrent code, better utilization of shared/limited resources, fewer concurrency bugs.
GPUs are still not practically-Turing-complete in the sense that there are strict restrictions on loops/goto/IO/waiting (there are a bunch of band-aids to make it pretend it's not a functional programming model).
So I am not sure retrofitting a Ferrari to cosplay an Amazon delivery van is useful other than for tech showcase?
Good tech showcase though :)
In years prior I wouldn't have even bothered, but it's 2026 and AMD's drivers actually come with a recent version of torch that 'just works' on windows. Anything is possible :)
Is the goal with this project (generally, not specifically async) to have an equivalent to e.g. CUDA, but in Rust? Or is there another intended use-case that I'm missing?
shayonj•1h ago