P.S. The course goes far beyond micrograd, to makemore (transformers), minbpe (tokenization), and nanoGPT (LLM training/loading).
It arguably reads cleaner than Karpathy's in some respects, as he occasionally gets a little ahead of his students with his '1337 Python skillz.
Supporting higher-order derivatives was also something I considered, but from what I've seen it's basically never needed in production models.
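For context, "higher-order derivatives" here means differentiating a gradient again (e.g. for Hessian-vector products or some meta-learning setups). A minimal sketch in PyTorch (not micrograd, just for illustration) of what that support looks like: `create_graph=True` keeps the backward graph so the gradient itself can be differentiated.

```python
# Sketch of higher-order derivative support, using PyTorch for illustration.
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 3  # y = x^3

# First derivative: dy/dx = 3x^2 = 27 at x = 3.
# create_graph=True makes the gradient itself differentiable.
(dy_dx,) = torch.autograd.grad(y, x, create_graph=True)

# Second derivative: d2y/dx2 = 6x = 18 at x = 3.
(d2y_dx2,) = torch.autograd.grad(dy_dx, x)

print(dy_dx.item(), d2y_dx2.item())  # 27.0 18.0
```

Supporting this in a scalar engine like micrograd would mean building each node's backward pass out of differentiable ops rather than raw floats, which is the extra machinery the comment is saying usually isn't worth it.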
jjzkkj•1d ago