Is this solution based on what Apple describes in their 2023 paper 'LLM in a flash' [1]?
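If I remember the paper right, the core trick is to leave the weights in flash and page in only the FFN rows a small predictor expects to activate, reusing recently loaded rows across a window of tokens. A minimal sketch of that idea (not Apple's code; the file name, sizes, and the predictor stand-in are all invented for illustration):

    import numpy as np

    D_MODEL, D_FF = 1024, 4096  # toy sizes so the demo actually runs

    # Create a fake on-disk weight file (a stand-in for flash storage).
    w = np.memmap("ffn_up.bin", dtype=np.float16, mode="w+", shape=(D_FF, D_MODEL))
    w[:] = np.random.randn(D_FF, D_MODEL).astype(np.float16)
    w.flush()

    # Reopen read-only: a lazily paged view, so a row only hits
    # storage when it is actually indexed.
    w_up = np.memmap("ffn_up.bin", dtype=np.float16, mode="r", shape=(D_FF, D_MODEL))

    def ffn_up_sparse(x, active_rows):
        # Compute only the rows a (hypothetical) predictor says will
        # survive the activation; the rest are never read from disk.
        rows = w_up[active_rows]  # fancy indexing copies just these rows into RAM
        return rows @ x

    x = np.random.randn(D_MODEL).astype(np.float16)
    active = np.random.choice(D_FF, size=D_FF // 20, replace=False)  # ~5% active
    partial = ffn_up_sparse(x, active)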
Apple is only paying Google $1 billion a year for access to Gemini for Siri.
Apple’s bet is intelligent; the “presumed winners” are staking our economic stability on a miracle, like a shaking gambling addict at a horse race who just withdrew his rent money.
0.6 t/s: wait 30 seconds to see what these billions of calculations get us (at that rate, roughly 18 tokens, barely a sentence):
"That is a profound observation, and you are absolutely right ..."
This is 100% correct!
cogman10•1h ago
They didn't make special-purpose hardware to run a model. They crafted a large model so that it could run on consumer hardware (a phone).
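Mostly quantization arithmetic, as far as anyone can tell from the outside: at 4 bits per weight, a surprising parameter budget fits in phone RAM. A back-of-the-envelope sketch (the 6 GiB usable-RAM figure is an assumption, not any specific phone):

    # How many parameters fit in a phone's RAM at different precisions.
    GIB = 1024**3
    ram_budget = 6 * GIB  # assumed usable RAM on an 8 GiB device

    bytes_per_weight = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}
    for fmt, nbytes in bytes_per_weight.items():
        params = ram_budget / nbytes
        print(f"{fmt}: ~{params / 1e9:.1f}B parameters")
    # fp16: ~3.2B, int8: ~6.4B, int4: ~12.9B (before KV cache and overhead)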
pdpi•56m ago
We haven't had phones running laptop-grade CPUs/GPUs for that long, and that is a very real hardware feat. Likewise, nobody would've said running a 400b LLM on a low-end laptop was feasible, and that is very much a software triumph.
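Worth spelling out why 400B sounds infeasible and what has to close the gap: even aggressively quantized, the full weights dwarf laptop RAM, so disk offload and/or a sparse active set (e.g. MoE routing) has to carry the rest. Rough numbers, all assumptions rather than specs of any particular model:

    # Full-model footprint at different precisions vs. a 16 GB laptop.
    params_total = 400e9
    for bits in (16, 8, 4):
        print(f"{bits}-bit: {params_total * bits / 8 / 1e9:.0f} GB")
    # 16-bit: 800 GB, 8-bit: 400 GB, 4-bit: 200 GB -- none fit in RAM.

    # If only a small fraction of weights is active per token (MoE-style
    # routing; the 5% is an assumed figure), the 4-bit working set shrinks
    # to something a fast SSD plus the OS page cache can actually serve:
    active_fraction = 0.05
    print(f"4-bit active set: {params_total * active_fraction * 0.5 / 1e9:.0f} GB")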
mannyv•22m ago
Remember when people were arguing about whether to use mmap? What a ridiculous argument.
At some point someone will figure out how to tile the weights, and the memory requirements will drop again.
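The mmap point generalizes: map the weight file once and let the OS page tiles in and out, so resident memory tracks the working set rather than the full model. A toy sketch of a tiled, memory-mapped matvec (file name, sizes, and tile size are invented for illustration):

    import numpy as np

    ROWS, COLS, TILE = 8192, 1024, 512  # toy sizes

    # Write a fake weight file once so the example is self-contained.
    w = np.memmap("weights.bin", dtype=np.float32, mode="w+", shape=(ROWS, COLS))
    w[:] = 0.01
    w.flush()

    # Map it read-only: nothing is resident until a tile is touched,
    # and the OS can evict cold tiles under memory pressure.
    W = np.memmap("weights.bin", dtype=np.float32, mode="r", shape=(ROWS, COLS))

    def matvec_tiled(x):
        # One row-tile at a time: peak resident weight memory is a
        # single tile, not the whole matrix.
        out = np.empty(ROWS, dtype=np.float32)
        for r in range(0, ROWS, TILE):
            out[r:r + TILE] = W[r:r + TILE] @ x
        return out

    y = matvec_tiled(np.ones(COLS, dtype=np.float32))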