doctorpangloss•1h ago
Hmm, but supposing the accelerated NVIDIA-specific inference data types were available in Triton, then you would just use those? Why not contribute to Triton? They accept PRs. So what if you're doing free product-ecosystem development for NVIDIA and other giant corporations by contributing to Triton?
qeternity•41m ago
Second line of the post:
> The main objective is to learn writing attention in CUDA C++, since many features are not available in Triton, such as MXFP8 / NVFP4 MMA for sm120.
steinvakt2•1h ago
I had a 5090 a few months ago but couldn't get flash attention to work. Does it work natively now? What about the 5080?
sigmoid10•16m ago
PyTorch now has native support for the Blackwell architecture.
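If you want to sanity-check that on your own card, here is a minimal sketch (not from the post; the shapes, dtype, and sizes are arbitrary placeholders) that forces PyTorch's flash attention SDPA backend and reports whether it can serve the inputs on the current GPU:

    # Minimal check: can the flash attention SDPA backend run on this GPU
    # (e.g. an RTX 5090, sm_120)? Shapes/dtype below are placeholders.
    import torch
    from torch.nn.attention import SDPBackend, sdpa_kernel

    assert torch.cuda.is_available()
    print(torch.cuda.get_device_name(0), torch.cuda.get_device_capability(0))

    # Half-precision tensors in (batch, heads, seq_len, head_dim) layout.
    q, k, v = (torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
               for _ in range(3))

    try:
        # Restrict dispatch to the flash attention kernel only; this raises
        # if no flash kernel is available for these inputs on this build/GPU.
        with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
            out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
        print("flash attention backend OK:", out.shape)
    except RuntimeError as e:
        print("flash attention backend unavailable:", e)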