Thanks for sharing; you clearly spent a lot of time making this easy to digest. I especially like the tokens-to-embedding visualisation.
I recently had some trouble converting a HF transformer I trained with PyTorch to Core ML. I just couldn’t get the KV cache to work, which made it unusably slow after 50 tokens…
Hopefully you can write the teased next article about how Feedforward and Output layers work. The article was super helpful for me to get better understanding on how LLM GPTs work!
Youden•1d ago
Link seems to be broken: content briefly loads then is replaced with "Something Went Wrong" then "D is not a function". Stays broken with adblock disabled.
est•1h ago
This is a surprising good read of how LLM works in general.
simedw•2d ago
I recently had some trouble converting a HF transformer I trained with PyTorch to Core ML. I just couldn’t get the KV cache to work, which made it unusably slow after 50 tokens…
samwho•1d ago
Yes, I recently wrote https://github.com/samwho/llmwalk and had a similar experience with cache vs no cache. It’s so impactful.
mrgaro•35m ago