arjvik•4mo ago
Seems like the Pallas of old has been completely upgraded
reasonableklout•4mo ago
Pallas has a couple backends, this is the new-ish Mosaic GPU one. AIUI it provides a bunch of low-level APIs for interacting directly with NVIDIA-specific and new Blackwell features like SMEM, TMEM, collective MMA, etc.
What's interesting is that the MGPU team has achieved SOTA Blackwell GEMM performance before Triton (which IIUC is trying to bring up Gluon to reach the same level). All the big players are coming up with their own block-based low-level-ish DSLs for CUDA: OpenAI, NVIDIA, and now Google.
flakiness•4mo ago
So OpenAI has Triton and Google has Pallas. What's the NVIDIA counterpart?