The Punchline: I made it go 4,600x faster than the Python reference in pure C, with no dependencies, using a compiler with SIMD auto-vectorisation!!!
Andrej recently released microgpt.py - a brilliant, atomic look at the core of a GPT. As a low-latency developer, I couldn't resist seeing how fast it could go when you get closer to the metal.
So just for funzies, I spent a few hours building microgpt-c, a zero-dependency, pure C99 implementation featuring:
- 4,600x Faster training vs the Python reference (Tested on MacBook Pro M2 Max). On Windows, it is 2,300x faster.
- SIMD Auto-vectorisation for high-speed matrix operations.
- INT8 Quantisation (reducing weight storage by ~8x). Training is slightly slower, but the storage reduction is significant.
- Zero Dependencies - just pure logic.
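
For anyone wondering what the SIMD and INT8 items above look like in plain C, here is a minimal sketch. It is not the code from the repo: the function names (`matvec_f64`, `quantise_i8`, `matvec_i8`) are made up for illustration, and it assumes symmetric per-tensor quantisation and weights that would otherwise be stored as 8-byte doubles (which is where a ~8x figure would come from; against 32-bit floats it would be ~4x).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* y = W x, with W stored row-major as rows x cols doubles.
 * `restrict` promises the buffers do not alias, which makes it easier for
 * the compiler to auto-vectorise the inner loop. Note that a floating-point
 * reduction like `acc` is typically only vectorised when the compiler is
 * allowed to reassociate additions (e.g. -O3 -ffast-math on GCC/Clang). */
void matvec_f64(size_t rows, size_t cols, const double *restrict w,
                const double *restrict x, double *restrict y)
{
    for (size_t r = 0; r < rows; r++) {
        double acc = 0.0;
        for (size_t c = 0; c < cols; c++)
            acc += w[r * cols + c] * x[c];
        y[r] = acc;
    }
}

/* Symmetric per-tensor quantisation (an assumption for this sketch):
 * scale = max|w| / 127 and q[i] = round(w[i] / scale).
 * Returns the scale needed to dequantise. */
double quantise_i8(size_t n, const double *restrict w, int8_t *restrict q)
{
    assert(n > 0);
    double max_abs = 0.0;
    for (size_t i = 0; i < n; i++) {
        double a = w[i] < 0.0 ? -w[i] : w[i];
        if (a > max_abs)
            max_abs = a;
    }
    double scale = max_abs > 0.0 ? max_abs / 127.0 : 1.0;
    for (size_t i = 0; i < n; i++) {
        double v = w[i] / scale;
        /* round half away from zero without needing <math.h> */
        q[i] = (int8_t)(int)(v >= 0.0 ? v + 0.5 : v - 0.5);
    }
    return scale;
}

/* Same matrix-vector product, reading one byte per weight and applying
 * the scale once per output row. */
void matvec_i8(size_t rows, size_t cols, const int8_t *restrict q,
               double scale, const double *restrict x, double *restrict y)
{
    for (size_t r = 0; r < rows; r++) {
        double acc = 0.0;
        for (size_t c = 0; c < cols; c++)
            acc += (double)q[r * cols + c] * x[c];
        y[r] = acc * scale;
    }
}
```

The extra int-to-double conversion in `matvec_i8` is one plausible reason a quantised path could run a little slower than the full-precision one while still cutting weight storage to one byte per parameter.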
The amalgamation image below is just for fun (and to show off the density!), but the GitHub repo contains the fully commented, structured code for anyone who wants to play with on-device AI.
I have started building something useful with it, a simple static analyser for C code, and I will do a follow-up post.
Everything else is just efficiency... but efficiency is where the magic happens.
idiotsecant•11m ago