I am learning to write LLM pipelines using the Modular MAX inference framework. As a starting point I got GPT-2 working after reading through "The Illustrated GPT-2", Karpathy's nanoGPT codebase, and the existing models in the Modular repo. The MAX framework does require a lot of boilerplate and isn't designed to be very flexible, but you do get awesome performance out of the box.
red2awn•2h ago