As a Strix Halo owner, I've been eagerly awaiting Nemotron 3 Super since it was announced for H1'26 when Nemotron 3 Nano dropped. It's humbling to watch the industry move so fast that Qwen 3.5 122B A10B ends up being competitive with this on benchmarks, though, which isn't a dig at Nemotron 3 Super so much as it is a testament to Qwen 3.5's phenomenal achievements.
Still, the NVFP4 benchmark numbers also look fantastic, which is enticing to me as I'm considering supplementing my Strix Halo rig with a GB10 rig as well, not to mention the YaRN-less 1M native context window and that gorgeous hybrid Mamba architecture that scales exceptionally well into the deep context lengths unlocked by that 1M window.
It's fascinating how far Nvidia has been able to push models trained entirely on synthetic data, though it makes me curious to see what the hallucination rate turns out to be - this is exactly what I thought we were not supposed to be doing if we want to avoid model collapse.
anonym29•48m ago