Why: I wanted to compare attention vs Mamba vs GQA at different parameter budgets without writing PyTorch for each experiment. Edit a JSON config, hit enter, and see loss numbers; mixlab races the different configs for you. The number one goal is iteration speed.
A JSON config lets you chain together common ML blocks (attention, GQA, Mamba, RetNet, and several more) and optimizers (Muon, AdamW); mixlab compiles the config to MLX IR, which can run on either the Metal or CUDA backend.
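To make the "chain blocks in JSON" idea concrete, here is a hypothetical config sketch. The field names other than "name" are illustrative assumptions, not mixlab's actual schema:

```json
{
  "name": "attn_vs_mamba_10M",
  "blocks": ["attention", "mamba", "attention"],
  "optimizer": "adamw",
  "d_model": 384,
  "n_layers": 6
}
```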
Why Go: 1.6s builds, built-in profiling (mixlab -cpuprofile gives you a flame graph), and import-based extensibility for custom blocks. No C++ extensions, no custom build systems. And personally I prefer strongly typed, compiled languages.
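"Import-based extensibility" usually means the registry pattern: a block package registers a constructor in its init(), so importing the package is all it takes to make the block available to configs. A minimal sketch of that pattern — the names (Block, Register, registry) are illustrative, not mixlab's actual API:

```go
package main

import "fmt"

// Block is a stand-in interface for a model component.
type Block interface {
	Name() string
}

// registry maps the name used in a JSON config to a constructor.
var registry = map[string]func() Block{}

// Register is called by block packages from their init() functions.
func Register(name string, ctor func() Block) {
	registry[name] = ctor
}

// attention is a toy block; a real one would live in its own package.
type attention struct{}

func (attention) Name() string { return "attention" }

func init() {
	// Importing the package runs this, so the block shows up
	// in the registry with no central wiring.
	Register("attention", func() Block { return attention{} })
}

func main() {
	b := registry["attention"]()
	fmt.Println(b.Name())
}
```

The config loader can then look up each string in the "blocks" array against the registry, which is why adding a custom block is just an import.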
On a Shakespeare benchmark matching nanoGPT (6 layers, 6 heads, d=384, 10.8M params): val loss 1.5527 on an M1 Max, 1.5588 on an A40. Numerical parity with PyTorch confirmed to 8 decimal places.
brew install mrothroc/tap/mixlab
mrothroc•1h ago
{ "name": "plain_3L",