- Performance, especially for Mixture-of-Experts (6x-11x speedups)
- No more slow/fast tokenizers: way simpler API, explicit backends, better performance (tokenizer sketch below)
- Dynamic weight loading: way faster, and MoE now works with quantization, tensor parallelism, PEFT, etc. (loading sketch below)
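For the tokenizer point, here's a minimal sketch of what the unified path looks like. It assumes the old v4-era `use_fast=True/False` switch is simply gone and `AutoTokenizer` resolves to a single implementation; check the migration guide for the exact API.

```python
from transformers import AutoTokenizer

# Before: you chose between a "slow" (Python) and "fast" (Rust) tokenizer
# via use_fast=True/False. Assumption based on the release notes: there is
# now one tokenizer path per model, with the backend made explicit instead
# of toggled by a flag.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")

enc = tok("Hello world!", return_tensors="pt")
print(enc["input_ids"])
```

And for the dynamic-loading point, a hedged sketch of the kind of call that should now be much faster. The checkpoint name is just an example MoE model, and `BitsAndBytesConfig` / `device_map` are the pre-existing knobs, not new API from this release.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Example MoE checkpoint; substitute whatever model you actually use.
model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

# 4-bit quantization via bitsandbytes -- this is the existing API surface;
# the release claims loading paths like this are now much faster for MoE.
quant_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_cfg,
    device_map="auto",  # shard/offload across available devices
)
```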
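Per the release notes, the same loading path also composes with tensor parallelism and PEFT adapters, which previously didn't work together for MoE models.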
We have a migration guide on the main branch; please take a look at it if you run into issues. Everything is also documented in the release notes. We appreciate the feedback, so feel free to open issues if you have any!