However, this looks like it has great potential for cost-effectiveness. As of today it's free to use over API on OpenRouter, so a bit unclear what it'll cost when it's not free, but free is free!
That's temporary. Cerebras speeds up everything, so if Nemotron is good quality, it's just a matter of time until they add it.
* Hybrid MoE: 2-3x faster than pure MoE transformers
* 1M context length
* Trained on NVFP4
* Open Source! Pretraining, mid-training, SFT and RL dataset released (SFT HF link is 404...)
* Open model training recipe (coming soon)
Really appreciate Nvidia being the most open lab but they really should make sure all the links/data are available on day 0.
Also interesting that the model is trained in NVFP4 but the inference weights are FP8.
As someone else mentioned, the GPT-OSS models are also quite good (though I haven’t found how to make them great yet, though I think they might age well like the Llama 3 models did and get better with time!).
But for a defined task, I’ve found task compliance, understanding, and tool call success rates to be some of the highest on these Nvidia models.
For example, I have a continuous job that evaluates if the data for a startup company on aVenture.vc could have overlapping/conflated two similar but unrelated companies for news articles, research details, investment rounds, etc… which is a token hungry ETL task! And I recently retested this workflow on the top 15 or so models today with <125b parameters, and the Nvidia models were among the best performing for this type of work, particularly around non-hallucination if given adequate grounding.
Also, re: cost - I run local inference on several machines that run continuously, in addition to routing through OpenRouter and the frontier providers, and was pleasantly surprised to find that if I’m a paying customer of OpenRouter otherwise, the free variant there from Nvidia is quite generous for limits, too.
Y_Y•1d ago