I have a 16GB GPU as well, but have never run a local model so far. According to the table in the article, 9B and 8-bit -> 13 GB and 27B and 3-bit seem to fit inside the memory. Or is there more space required for context etc?
On my M3 Air w/ 24GB of memory 27B is 2 tok/s but 35B A3B is 14-22 tok/s which is actually usable.
Twirrim•4h ago
ufish235•3h ago
hooch•1h ago
This was a qwen3-coder-next 35B model on M4 Max with 64GB which seems to be 51GB size according to ollama. Have not yet tried the variants from the TFA.
manmal•14m ago
fragmede•3h ago
fy20•1h ago
manmal•18m ago