Impressive work: anti-diagonal DP on CUDA, clean MCUPS framing, and the multi-language shipping is legit. The “109× faster than NVIDIA on H100” line is accurate for your chosen case (cuDF/nvtext, long strings), but it’s not a blanket “faster than NVIDIA,” and readers will assume it is, so tighten the scope. Bio results are a good baseline, not SOTA; Hopper’s DPX and WFA-style tiling/bucketing would likely move you a tier up. Hashing and 52-bit MinHash are clever, but you need full SMHasher reports and retrieval-quality metrics, not just entropy/collisions. Publish exact versions, params, and end-to-end timings (I/O + marshaling), plus short-string vs long-string batches. If you add those and rename the headline to reflect the setup, the claims will be hard to poke holes in.
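For anyone who wants to sanity-check MCUPS numbers on their own inputs, here is a rough CPU-side sketch of the anti-diagonal Levenshtein recurrence plus the throughput arithmetic. It is not the project's kernel; the function names (`levenshtein_antidiagonal`, `mcups`) are made up for illustration, and MCUPS is assumed to mean millions of DP cell updates per second, i.e. len(a) × len(b) cells per pair divided by wall time.

```python
import time


def levenshtein_antidiagonal(a: str, b: str) -> int:
    """Edit distance computed one anti-diagonal (i + j = d) at a time.

    Cells on diagonal d depend only on diagonals d-1 and d-2, which is
    what makes every cell of a diagonal independent (and GPU-friendly).
    """
    n, m = len(a), len(b)
    prev2 = [0] * (n + 1)  # diagonal d-2, indexed by row i
    prev1 = [0] * (n + 1)  # diagonal d-1, indexed by row i
    cur = [0] * (n + 1)    # diagonal d,   indexed by row i
    for d in range(n + m + 1):
        for i in range(max(0, d - m), min(n, d) + 1):
            j = d - i
            if i == 0:
                cur[i] = j  # first row: j insertions
            elif j == 0:
                cur[i] = i  # first column: i deletions
            else:
                cost = 0 if a[i - 1] == b[j - 1] else 1
                cur[i] = min(
                    prev1[i - 1] + 1,     # cell (i-1, j)
                    prev1[i] + 1,         # cell (i, j-1)
                    prev2[i - 1] + cost,  # cell (i-1, j-1)
                )
        # Rotate buffers: the diagonal just computed becomes "d-1".
        prev2, prev1, cur = prev1, cur, prev2
    return prev1[n]


def mcups(cells: int, seconds: float) -> float:
    """Millions of cell updates per second (assumed definition)."""
    return cells / seconds / 1e6


if __name__ == "__main__":
    a, b = "kitten" * 50, "sitting" * 50
    start = time.perf_counter()
    dist = levenshtein_antidiagonal(a, b)
    elapsed = time.perf_counter() - start
    print(dist, f"{mcups(len(a) * len(b), elapsed):.2f} MCUPS")
```

Timing short and long batches separately with something like this, and with I/O and marshaling included in one run and excluded in another, is the kind of breakdown that makes a headline number hard to argue with.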
ozgrakkurt•4mo ago