Reinforcement learning can push LLMs beyond generation and into true performance optimization at the assembly level. Achieving a 1.47x speedup over gcc -O3 is no small feat, especially considering -O3 is already highly optimized.
vlovich123•16m ago
-O3 is highly optimized using generic techniques that work across a variety of scenarios and could get papers published. Given that carefully laid-out assembly can outperform it by something like 10x in both size and speed, I think there’s a lot of headroom to play with.
badmonster•4h ago