Over the past few months, I've built a distillation toolkit that supports cross-tokenizer distillation (e.g., distilling a LLaMA teacher into the Qwen vocabulary, among other pairings). The approach has worked well on reasoning datasets like AIME, and we've validated it on models such as Phi and Qwen.
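To give a feel for the core problem, here's a minimal, self-contained sketch (not the toolkit's actual implementation; the vocabularies and probabilities are made up for illustration). When teacher and student use different tokenizers, their output distributions can't be compared index-by-index, so one simple approach is to project both onto shared token strings before computing a KL loss:

```python
import math

# Toy teacher/student next-token distributions over DIFFERENT vocabularies.
# These tokens and probabilities are hypothetical, chosen only to illustrate
# why cross-tokenizer distillation needs an alignment step.
teacher_probs = {"hello": 0.6, "hel": 0.1, "lo": 0.1, "world": 0.2}
student_probs = {"hello": 0.5, "wor": 0.2, "ld": 0.1, "world": 0.2}

def align(p, q):
    """Restrict two distributions to their shared token strings, renormalize."""
    shared = set(p) & set(q)
    ps = {t: p[t] for t in shared}
    qs = {t: q[t] for t in shared}
    zp, zq = sum(ps.values()), sum(qs.values())
    return ({t: v / zp for t, v in ps.items()},
            {t: v / zq for t, v in qs.items()})

def kl(p, q):
    """KL(p || q) over a shared support."""
    return sum(pv * math.log(pv / q[t]) for t, pv in p.items())

p, q = align(teacher_probs, student_probs)
loss = kl(p, q)  # small positive number: distributions mostly agree
print(round(loss, 4))
```

In practice the alignment is subtler than intersecting vocabularies (tokens can split differently across sequences), but the sketch shows why a plain same-index KL loss doesn't apply across tokenizers.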
We've also integrated Modal for quick deployment (it comes with $30/month in credits to try it out).
Would love any feedback!
GitHub: https://github.com/agokrani/distillKitPlus
Docs: https://distillkitplus.mintlify.app/