To our knowledge, it's the first open-source model trained with RL to write CUDA kernels. Our goal was to demonstrate multi-turn RL using GRPO. We used 180 Python->CUDA conversion tasks from the KernelBench dataset.
The results were surprisingly strong! We were able to outperform top reasoning models like o3 and o4-mini.
We're sharing our training setup and learnings in the blog post. The model is also on HuggingFace: https://huggingface.co/cognition-ai/Kevin-32B
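
For anyone unfamiliar with GRPO: its core idea is to sample a group of responses for the same task and score each one relative to the rest of the group, rather than against a learned value function. Below is a minimal sketch of that group-relative advantage step only. It is not the actual Kevin training code; the function name, the `eps` constant, the use of population standard deviation, and the reward values are all assumptions for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples for the same task.

    Each advantage is (reward - group mean) / (group std + eps).
    eps is a hypothetical small constant that avoids division by zero
    when every sample in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled kernels on one Python->CUDA task
# (e.g. 0.0 for a kernel that fails, higher for faster correct kernels).
rewards = [0.0, 0.3, 0.3, 1.0]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f}  advantage={a:+.3f}")
```

Samples that beat the group average get a positive advantage and are reinforced; samples below it get a negative one. When every sample in a group earns the same reward, all advantages are zero and that group contributes no learning signal.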