They mention promising results on Apple Silicon GPUs and even cite the contributions from Vello, but I don't see a Metal implementation in there and the benchmark only shows results from an RTX 2080. Is it safe to assume that they're referring to the WGPU version when talking about M-series chips?
genpfault•11h ago
almostgotcaught•10h ago
https://developer.nvidia.com/gpugems/gpugems3/part-vi-gpu-co...
merope14•9h ago
WJW•8h ago
otherjason•6h ago
WJW•5h ago
I would love to be enlightened about real-world applications of radix sort that I may have missed, though, since it's a cool algorithm. Hence my question above.
littlestymaar•3h ago
LLMs are made from dense matrices, aren't they?
WJW•3h ago
[1] https://developer.nvidia.com/blog/mastering-llm-techniques-i...
[2] https://arxiv.org/html/2405.15525v1
almostgotcaught•2h ago
woadwarrior01•6h ago
almostgotcaught•6h ago
m-schuetz•6h ago