If you’ve ever been curious about how GPT actually works under the hood, I built a small project you might find interesting. I implemented a GPT-style transformer from scratch in a single notebook—covering tokenization, embeddings, causal self-attention, training, and autoregressive text generation without relying on high-level abstractions. The focus was on mechanistic clarity rather than scale or performance, and the notebook is structured to read more like a technical walkthrough than an experiment log. Feedback from people who’ve built or studied transformers would be very welcome.
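
To give a flavor of the level of abstraction involved, here's a minimal single-head causal self-attention sketch in NumPy. This is not the notebook's exact code; the function name, shapes, and single-head setup are purely illustrative:

```python
import numpy as np

def causal_self_attention(x, W_q, W_k, W_v):
    """Single-head causal self-attention over a sequence of token embeddings.

    x             : (T, d_model) input embeddings for T tokens
    W_q, W_k, W_v : (d_model, d_head) projection matrices (illustrative shapes)
    Returns (T, d_head) outputs, where position t only attends to positions <= t.
    """
    T = x.shape[0]
    q, k, v = x @ W_q, x @ W_k, x @ W_v            # project into query/key/value space
    scores = q @ k.T / np.sqrt(k.shape[-1])        # scaled dot-product similarities, (T, T)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)       # causal mask: block attention to future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ v                             # weighted sum of value vectors

# toy usage
rng = np.random.default_rng(0)
T, d_model, d_head = 5, 16, 8
x = rng.standard_normal((T, d_model))
W_q, W_k, W_v = (rng.standard_normal((d_model, d_head)) for _ in range(3))
print(causal_self_attention(x, W_q, W_k, W_v).shape)  # (5, 8)
```

The causal mask is the piece that makes autoregressive generation work: each position can only look at earlier tokens, so the same forward pass used in training can be reused to sample one token at a time.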