I’m on an M1 Max with 32GB VRAM, so I’m looking forward to the 27B or 35B-A3B models. Is dropping $5k for an RTX 6000 or a DGX Spark really the best option?
In October/2024 I got my Mac studio M1 ultra with 128G, IIRC it was ~$2500. With recent prices explosion, it has certainly gotten more expensive. https://frame.work/ is selling 128G strix halo mainboard for $2700, but you have to add storage and case.
Unfortunately, the prices rose on these a lot, but unevenly. Beelink GTR 9 Pro is $4400, Framework Desktop is ~$3500, for what is basically the exact same mainboard as a Bosgame M5 for $2800.
Apple's M5 Max is another attractive option. Apple silicon traditionally had great MBW and was good at TG, but struggled with PP, but the new neural engines in those GPU cores have made a big difference in a good way here.
Gorgon Halo is rumored for June announcement with Q4'26 release with basically +100 MHz clocks on Strix Halo, LPDDR5X-8533 instead of LPDDR5X-8000, but more importantly, 192 GB max instead of 128 GB.
I'd say it's better to wait for Gorgon Halo than to grab Strix Halo now. However, Medusa Halo, rumored for H2'27, is slated to have 24c/48t Zen 6, 48 CU of RDNA 5 instead of 40 CU RDNA 3.5, and a 384 bit bus w/ LPDDR6, which should make 256 GB at more like ~490-600 GB/s MBW, which will really make Strix and Gorgon Halo obsolete.
Also worth keeping an eye out for Serpent Lake (intel CPU + nvidia iGPU on a single board with unified memory, rumored for 2028-2029 iirc), and on the 160 GB Crescent Island Intel dGPU.
I've created a 2.54BPW quant that fit on my hardware with 128k context, 20 tps tg and 200tps pp, while maintaining high scores on many benchmarks: https://huggingface.co/tarruda/Qwen3.5-397B-A17B-GGUF/discus...
But I was not super impressed with deepseek 4 flash using it from the official API either, so it doesn't seem quantization fault. It is a good model, but nothing out of the ordinary in the few benchmarks I ran on it (with full awareness that benchmarks are biased).
goyozi•1h ago
hmokiguess•39m ago
htrp•22m ago
if they say it's 4.7 comparable, it anchors that into your head as the model to evaluate against.
maelito•18m ago