rightnow_ai•2h ago
This is not static analysis. It runs your CUDA kernel in a CPU-backed simulator and predicts how it will behave on real GPUs.
Basically, it uses a tile model tied to L2 size and SM limits.
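To make the idea concrete, here is a minimal roofline-style sketch of what a tile model tied to L2 size and SM throughput could look like. This is only an illustration, not the tool's actual code; GPUSpec, its fields, and every number are assumed placeholders:

    # Illustrative sketch only -- not the real model. All fields and numbers
    # below are assumptions made up for this example.
    from dataclasses import dataclass

    @dataclass
    class GPUSpec:
        l2_bytes: int      # L2 cache size in bytes
        num_sms: int       # number of streaming multiprocessors
        mem_bw_gbs: float  # DRAM bandwidth, GB/s
        sm_gflops: float   # per-SM throughput, GFLOP/s

    def predict_time_ms(spec: GPUSpec, working_set_bytes: int, total_flops: float) -> float:
        # Compute bound: total FLOPs over aggregate SM throughput.
        compute_ms = total_flops / (spec.num_sms * spec.sm_gflops * 1e9) * 1e3
        # Memory bound: only the part of the working set that spills out of L2
        # pays DRAM bandwidth (the tile that fits in L2 is assumed to be reused).
        dram_bytes = max(0, working_set_bytes - spec.l2_bytes)
        memory_ms = dram_bytes / (spec.mem_bw_gbs * 1e9) * 1e3
        # Roofline-style estimate: whichever bound dominates.
        return max(compute_ms, memory_ms)

    # Example with numbers loosely resembling an A100-class GPU.
    gpu = GPUSpec(l2_bytes=40 * 2**20, num_sms=108, mem_bw_gbs=1555, sm_gflops=180)
    print(predict_time_ms(gpu, working_set_bytes=256 * 2**20, total_flops=2e12))

A real model would also have to account for occupancy, launch configuration, and per-architecture quirks; this only shows the general shape of the estimate.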
Right now it covers 80+ NVIDIA architectures, and the mean error on execution time is around 1–2% on the test kernels we wrote (more info in the blog).
It still struggles with dynamic parallelism, but I will figure that out soon.