What’s wild about Step-3.5-Flash isn’t just the quality; it’s how close it is to fitting on personal hardware.
The int4 weights are ~110GB. That sounds insane, but 128GB unified memory machines already exist, and people are running it today. A few years ago, a 200B-class model was pure datacenter territory. Now it’s “expensive laptop / workstation” territory. That’s a huge shift.
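As a quick sanity check on that figure (a sketch; the 200B parameter count and the ~10% overhead are my assumptions, not published specs):

    # Back-of-envelope: why a 200B-class model at int4 lands near 110GB.
    params_total = 200e9            # assumed "200B-class" parameter count
    bits_per_weight = 4             # int4 quantization
    weights_gb = params_total * bits_per_weight / 8 / 1e9   # 100 GB
    # Embeddings/norms typically stay at 8- or 16-bit, and group-wise
    # quantization adds scales/zero-points; call it ~10% overhead.
    print(f"~{weights_gb * 1.10:.0f} GB")                   # ~110 GB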
The interesting part isn’t that this model is big. It’s that hardware curves and model efficiency are finally intersecting. Sparse MoE + quantization means frontier-ish reasoning is no longer locked to hyperscalers. We’re basically one consumer hardware generation away from this class of model being normal for power users.
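To put rough numbers on the MoE point: decode speed on a local machine is bounded by memory bandwidth, and sparse routing means each token only touches the active experts' weights. A minimal roofline sketch, with the active fraction and bandwidth picked purely for illustration (not this model's real config):

    # Roofline estimate: tokens/sec ~= bandwidth / bytes read per token,
    # and a sparse MoE only reads the weights of the experts it routes to.
    total_params = 200e9        # assumed, as above
    active_fraction = 0.05      # assume top-k routing activates ~5% of weights
    bytes_per_param = 0.5       # int4 = 4 bits = 0.5 bytes
    bandwidth_bps = 400e9       # ~400 GB/s unified-memory machine (assumed)

    bytes_per_token = total_params * active_fraction * bytes_per_param
    print(f"~{bandwidth_bps / bytes_per_token:.0f} tok/s upper bound")  # ~80

A dense 200B model on the same machine would have to read all ~100GB per token, i.e. roughly 4 tok/s at best, which is the difference between a toy and a usable assistant.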
mh3467•1h ago
Indeed. The direction is promising: the democratization of frontier intelligence. Your personal assistant (this model plus that Claw) isn't powered by commercial models via API, but by a model smart enough and small enough to be hosted on your own device.