We just ran the exact expert FFN slice from Qwen2.5-72B-Instruct (8192×28672, batch 512) on a single NVIDIA B200.
Results (ROLV vs vendor-best cuBLAS):
- Speedup : 50.5× (4953% faster)
- Energy Savings : 91.4%
- Tokens/s : 6.42M vs 127k
- TFLOPS : 3,018 vs 59.7
- Energy : 64 J vs 742 J
- Per-iter : 0.000080 s vs 0.004027 s
A_hash and V_hash are identical for both runs (full reproducibility).
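The reported figures are mutually consistent, which is easy to verify from the GEMM shape alone. A minimal arithmetic check (shapes and timings taken straight from the numbers above; the small residual gap vs the reported 3,018 TFLOPS just reflects rounding in the per-iter time):

```python
# Sanity-check the reported numbers for the 8192x28672 FFN GEMM at batch 512.
M, K, N = 512, 8192, 28672   # batch x hidden -> intermediate (one FFN projection)
flops = 2 * M * K * N        # one multiply-accumulate counted as 2 FLOPs

rolv_iter = 0.000080         # seconds per iteration, as reported
cublas_iter = 0.004027

print(f"GEMM FLOPs    : {flops:.3e}")                     # ~2.4e11 per iteration
print(f"ROLV TFLOPS   : {flops / rolv_iter / 1e12:.0f}")  # ~3000, near the reported 3,018
print(f"cuBLAS TFLOPS : {flops / cublas_iter / 1e12:.1f}")# matches the reported 59.7
print(f"ROLV tokens/s : {M / rolv_iter / 1e6:.2f}M")      # matches the reported 6.42M
print(f"Speedup       : {cublas_iter / rolv_iter:.1f}x")
```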
heggenhougen•1h ago
ROLV_norm_hash: 8dbe5f139fd946d4cd84e8cc612cd9f68cbc87e394457884acc0c5dad56dd8dd
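For readers unfamiliar with the A_hash/V_hash convention: the idea is to hash the raw bytes of the input and output tensors so two runs (or two kernels) can be compared bit-for-bit. A minimal sketch of that kind of check, with tiny illustrative shapes and a hypothetical `tensor_hash` helper (not the actual ROLV tooling):

```python
import hashlib
import numpy as np

def tensor_hash(t: np.ndarray) -> str:
    # SHA-256 over the contiguous byte representation: identical bytes,
    # identical hash, so any numerical divergence between runs is visible.
    return hashlib.sha256(np.ascontiguousarray(t).tobytes()).hexdigest()

rng = np.random.default_rng(0)                        # fixed seed -> identical inputs
A = rng.standard_normal((4, 8)).astype(np.float32)    # real run: 512 x 8192
W = rng.standard_normal((8, 16)).astype(np.float32)   # real run: 8192 x 28672
V = A @ W

print("A_hash:", tensor_hash(A))
print("V_hash:", tensor_hash(V))
# Re-running the same computation reproduces both hashes exactly.
```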
This is the real hot path for MoE inference. No synthetic matrices.
Comments and questions welcome.