This is the first step towards fully automated GPU performance optimization. The idea is to automatically generate GPU kernels, then automatically integrate them into vLLM/SGLang/PyTorch.
atallahw•4h ago
This was fun to work on. LLMs for writing kernels still have a long way to go. It's honestly a little surprising how decent they are now. I guess I've been pretty consistently "surprised" by codegen for a while now (meaning the last two years).
essamwisam•4h ago
Quite cool. It's interesting that the LLM is able to optimize code based on the target hardware itself.
mohsaied•4h ago