They mention promising results on Apple Silicon GPUs and even cite the contributions from Vello, but I don't see a Metal implementation in there and the benchmark only shows results from an RTX 2080. Is it safe to assume that they're referring to the WGPU version when talking about M-series chips?
https://github.com/mooman219/fontdue/blob/master/src/platfor...
genpfault•5mo ago
almostgotcaught•5mo ago
https://developer.nvidia.com/gpugems/gpugems3/part-vi-gpu-co...
merope14•5mo ago
WJW•5mo ago
otherjason•5mo ago
WJW•5mo ago
I would love to be enlightened about some real-world applications of radix sort I may have missed though, since it's a cool algorithm. Hence my question above.
littlestymaar•5mo ago
LLMs are made from dense matrices, aren't they?
WJW•5mo ago
[1] https://developer.nvidia.com/blog/mastering-llm-techniques-i...
[2] https://arxiv.org/html/2405.15525v1
almostgotcaught•5mo ago
woadwarrior01•5mo ago
almostgotcaught•5mo ago
m-schuetz•5mo ago
animal531•5mo ago
But then during a break the other day I read up on radix sort, and right afterwards I implemented a prefix sum for spatial partitioning that also incorporates a bit table, CAS operations for multithreaded modifications, etc. After learning the core radix concept I more or less came up with the idea of using it that way myself, which was quite pleasing.
Props to the author, I'll definitely be spending some time scanning the collection to find some alternate options.
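For anyone curious what "prefix sum for spatial partitioning" can look like, here is a minimal single-threaded sketch of the count, exclusive prefix sum, scatter pattern that a radix sort pass is also built on. It is only an illustration, not the implementation described above: the names `build_grid` and `points_in_cell`, the 2D uniform grid, and the clamping of out-of-range points are all assumptions, and the bit table and CAS parts are left out entirely. In a multithreaded or GPU version, the two increments would presumably need to be atomic.

```python
from itertools import accumulate


def build_grid(points, cell_size, grid_w, grid_h):
    """Bin 2D points into a uniform grid: count -> exclusive prefix sum -> scatter.

    Hypothetical helper for illustration. Returns (starts, order), where
    order[starts[c]:starts[c + 1]] holds the indices of the points in cell c.
    """

    def cell_of(p):
        # Clamp to the grid so out-of-range points land in a border cell (an assumption).
        cx = min(max(int(p[0] // cell_size), 0), grid_w - 1)
        cy = min(max(int(p[1] // cell_size), 0), grid_h - 1)
        return cy * grid_w + cx

    n_cells = grid_w * grid_h

    # Pass 1: histogram of points per cell (would be an atomic add when parallel).
    counts = [0] * n_cells
    for p in points:
        counts[cell_of(p)] += 1

    # Exclusive prefix sum: starts[c] is the number of points in all cells before c.
    starts = [0] + list(accumulate(counts))

    # Pass 2: scatter each point index into its cell's slice of the output.
    cursor = starts[:-1]  # slicing copies, so starts itself is not modified
    order = [0] * len(points)
    for i, p in enumerate(points):
        c = cell_of(p)
        order[cursor[c]] = i
        cursor[c] += 1

    return starts, order


def points_in_cell(starts, order, cell):
    """Indices of the points that fell into the given cell."""
    return order[starts[cell]:starts[cell + 1]]
```

Once the grid is built, a neighbour query only has to read the contiguous slices of a handful of adjacent cells, which is the main appeal over sorting the points with a comparison sort.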