lol, it took me 48 hours to do (and re-do, and re-do) this test and write it up, and just when I'd convinced myself to stop changing bits and publish it... Google announced the Gemma 4 QAT models :-D
It wouldn't change the core of my article though, since the bottleneck is still the memory bandwidth of the old 16GB M1.