One reason people build one, though, is to learn. Most smart folks are well aware that pre-training a real LLM involves some banging your head against the wall (i.e., things don't go as smoothly as in a "building an llm from scratch" book), and they want to go through that process.
Tumblr speak has a bunch of wacky things, notably "chimkin nuggers."
> Modify one thing at a time
> Change only one variable per ablation while keeping everything else constant. If you change multiple things and performance improves, you won’t know what caused it. Test modifications individually, then combine successful ones and reassess.
This is an unintentional microcosm of what is flawed with the document.
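For reference, the quoted "one thing at a time" rule can be sketched roughly like this. It is a minimal illustration, not anything taken from the document: the config keys, the values, and the helper names `single_variable_ablations` and `changed_keys` are all made up for the example.

```python
# Hypothetical baseline config; keys and values are illustrative only.
BASELINE = {"lr": 3e-4, "batch_size": 256, "warmup_steps": 2000}

# Hypothetical candidate changes, one alternative value per knob.
CANDIDATES = {"lr": 1e-3, "batch_size": 512, "warmup_steps": 500}


def single_variable_ablations(baseline, candidates):
    """Yield (name, config) pairs that differ from the baseline in exactly one key."""
    for key, value in candidates.items():
        if key not in baseline:
            raise KeyError(f"unknown config key: {key}")
        if baseline[key] == value:
            continue  # not a real change, so there is nothing to ablate
        config = dict(baseline)  # shallow copy is enough for flat configs
        config[key] = value
        yield key, config


def changed_keys(baseline, config):
    """Return the keys whose values differ between the two configs."""
    return [k for k in baseline if baseline[k] != config[k]]


if __name__ == "__main__":
    for name, config in single_variable_ablations(BASELINE, CANDIDATES):
        # Each run should report exactly one changed key.
        print(name, changed_keys(BASELINE, config))
```

Each generated config would be trained and compared against the baseline separately; only the changes that help get combined and re-tested together, as the quote says.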
tsenturk•21h ago
abossy•15h ago
donkeyboy•14h ago
pixelmelt•12h ago