I also have yet to see any of these at a larger scale. For example, can you try one of these at 100 billion parameters?
(I've been reading the MMLU-Redux questions for electrical engineering. They're very funny. Fifty years ago they might have been relevant. The references to the Intel 8085 date this to the mid-1970s. Moving coil meters were still a big thing back then. Ward-Leonard drives still drove some elevators and naval guns. This is supposed to be the hand-curated version of the questions. Where do they get this stuff? Old exams?)
[1] https://github.com/aryopg/mmlu-redux/blob/main/outputs/multi...
If you got that into a couple gigs--what could you stuff into 20 gigs?
in my results, accuracy-wise Ternary-Bonsai-8B is on par with Qwen3.5-4B. But in accuracy-per-byte, bonsai is the clear winner:
=> Ternary-Bonsai-1.7B achieved 65.1% from 462 MiB, beating Qwen3.5-0.8B by 12 points while being ~5% smaller on disk. => Ternary-Bonsai-4B is the accuracy-per-byte winner above 1 GiB. 83.0% from only 1.1 GiB, within 2 points of Qwen3.5-4B at 40% of the weight size.
they show strong promise on edge devices and where disk space is limited. I think this lab is worth watching.
wmf•1h ago
Dumbledumb•54m ago