You can try it in the playground: https://microgpt-ts.vercel.app/playground
There are preset datasets (baby names, Pokemon, company names, movie titles, etc.) or you can paste your own text. The playground shows live loss curves as the model trains, and you can step through generation one token at a time to see the probability distribution at each step.
One difference from Karpathy's original is style. His microgpt is a single Python script optimized for brevity. This version splits the code into a few small files, types everything, and uses named helper functions (dotProduct, transpose, mean) instead of terse one-liners. The tradeoff is a bit more code, but it's easier to read and follow.
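To give a feel for the named-helper style (this is an illustrative sketch, not the repo's actual code, and the real signatures may differ):

```typescript
// Hypothetical versions of the small typed helpers mentioned above.
function dotProduct(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

function transpose(m: number[][]): number[][] {
  // Flip rows and columns: result[j][i] = m[i][j].
  return m[0].map((_, col) => m.map((row) => row[col]));
}

function mean(xs: number[]): number {
  return xs.reduce((acc, x) => acc + x, 0) / xs.length;
}
```

Each one is trivial on its own, but giving the operation a name makes the attention and loss code read closer to the math.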
I built it up following the same progression as the blog post: bigram count table, then MLP with manual gradients, then autograd, single-head attention, multi-head + layer loop, and finally Adam. Each step is a separate PR and tag on GitHub [2] so you can follow along or check out any snapshot.
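The first step in that progression (the bigram count table) is simple enough to sketch in a few lines. This is a minimal illustration of the idea, not code from the repo: count character-pair frequencies, then sample the next character in proportion to the counts.

```typescript
// Build a table of counts[a][b] = number of times character b follows a.
function bigramCounts(text: string): Map<string, Map<string, number>> {
  const counts = new Map<string, Map<string, number>>();
  for (let i = 0; i < text.length - 1; i++) {
    const a = text[i];
    const b = text[i + 1];
    if (!counts.has(a)) counts.set(a, new Map());
    const row = counts.get(a)!;
    row.set(b, (row.get(b) ?? 0) + 1);
  }
  return counts;
}

// Sample the next character for `ch`, weighted by the observed counts.
function sampleNext(
  counts: Map<string, Map<string, number>>,
  ch: string
): string {
  const row = counts.get(ch);
  if (!row) return "";
  const total = [...row.values()].reduce((acc, c) => acc + c, 0);
  let r = Math.random() * total;
  for (const [next, c] of row) {
    r -= c;
    if (r <= 0) return next;
  }
  return "";
}
```

Every later step (MLP, autograd, attention) replaces this count table with a learned model, but the generation loop stays the same shape: produce a distribution over next tokens, sample one, repeat.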