This is definitely not the best-performing model of its kind out there! But I found it surprising that we were able to get this much out of it: a stone's throw away from a teacher 1000x the size!
We also ran the same thing using the 4B Qwen and matched the teacher's accuracy, though here the size difference is merely 100x :)
I find this pretty cool - obviously our distilled models can only do this one task and don't generalize, but that's often exactly what you want when you're building agentic systems.
Happy to answer any questions!