
Nvidia trains 10T model in 4 bit precision (NVFP4)

https://developer.nvidia.com/blog/nvfp4-trains-with-precision-of-16-bit-and-speed-and-efficiency-of-4-bit/
6•opcode84•2h ago

Comments

opcode84•2h ago
For narrow-precision formats to be practical in large-scale pretraining, they must ensure both model accuracy and stable convergence. To assess the viability of 4-bit precision in large-scale model training, experiments were conducted with FP8 and NVFP4 on a 12-billion-parameter model based on a hybrid Mamba-Transformer architecture (12B Hybrid Mamba-Transformer model), similar to NVIDIA Nemotron Nano 2. This model was trained on 10 trillion tokens using a phased data-blending approach: the dataset mix changed at 70% of pretraining (the start of the second phase) and again at 90% (the start of the third phase).
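To make the schedule concrete: with 10 trillion tokens, the mix would switch at the 7-trillion-token mark (70%) and again at 9 trillion (90%). Below is a minimal Python sketch that assumes the phase boundaries are measured as fractions of total training tokens. The names `phase_for` and `phase_boundaries` are hypothetical and are not taken from NVIDIA's code.

```python
# Hypothetical sketch of the phased data-blending schedule described above.
# Figures from the post: 10T tokens total, phase 2 at 70%, phase 3 at 90%.
# Assumption: "70%" / "90%" refer to the fraction of total training tokens.
# Percentages are kept as integers so the boundary arithmetic is exact.

TOTAL_TOKENS = 10 * 10**12        # 10 trillion tokens
PHASE_START_PCT = (0, 70, 90)     # start of phases 1, 2, 3 (% of training)


def phase_boundaries(total: int = TOTAL_TOKENS) -> list[int]:
    """Token count at which each phase begins."""
    return [total * pct // 100 for pct in PHASE_START_PCT]


def phase_for(tokens_seen: int, total: int = TOTAL_TOKENS) -> int:
    """Return the 1-based data-mix phase active after `tokens_seen` tokens."""
    if not 0 <= tokens_seen <= total:
        raise ValueError("tokens_seen must lie in [0, total]")
    phase = 1
    for i, pct in enumerate(PHASE_START_PCT, start=1):
        # Equivalent to tokens_seen / total >= pct / 100, without floats.
        if tokens_seen * 100 >= pct * total:
            phase = i
    return phase


if __name__ == "__main__":
    for start in phase_boundaries():
        print(f"{start:,} tokens -> phase {phase_for(start)}")
```

Running it prints the three switch points: 0, 7,000,000,000,000, and 9,000,000,000,000 tokens, for phases 1, 2, and 3. Integer percentages avoid float rounding right at a boundary.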

A version of the 12B Hybrid Mamba-Transformer model was first trained in 8-bit precision (FP8), which previous studies have shown to closely match 16-bit precision, and which therefore served as our baseline for comparison. We then trained the same 12B model from scratch using NVFP4, demonstrating that this new low-precision format can support full pretraining at trillion-token scale. The NVFP4 run converged stably, without the training instabilities or divergence that typically plague ultra-low-precision training.

Figure 3 below shows that NVFP4’s validation loss curve closely matches that of the higher-precision FP8 baseline throughout the entire duration of training. The quantization techniques outlined above ensure that, even with aggressive bit-width reduction, 4-bit pretraining dynamics closely resemble those of higher-precision runs.

jasonjmcghee•1h ago
Incorrect and wildly misleading headline.

This is a 12B parameter model trained on 10T tokens.

It's also editorialized, which is against the HN guidelines.

Title is: "NVFP4 Trains with Precision of 16-Bit and Speed and Efficiency of 4-Bit"

Trump media group in $6B deal to buy Crypto.com tokens

https://www.ft.com/content/769694dd-a947-4a09-95ae-fe4bb1b2edf7
1•iamben•1m ago•0 comments

Lessons from Building a Game Engine from Scratch in Gleam [video]

https://www.youtube.com/watch?v=uExwRo_qM-k
1•surprisetalk•2m ago•0 comments

Omarchy 2.0

https://world.hey.com/dhh/omarchy-2-0-16fefc15
1•xachen•3m ago•0 comments

Show HN: Emulating aarch64 in software using JIT compilation and Rust

https://pitsidianak.is/blog/posts/2025-08-25_emulating_aarch64_in_software_using_JIT_compilation....
1•epilys•3m ago•0 comments

iPhone Is Lying to You About Files [video]

https://www.youtube.com/watch?v=tnPAhVxsPHE
1•surprisetalk•3m ago•0 comments

Vortek: Our Answer to Zero Purge Waste Multi-Material Printing [video]

https://www.youtube.com/watch?v=rluJj3NEdQA
1•rutierut•5m ago•0 comments

Cupertino must stop calling Apple Watches 'carbon neutral,' German court rules

https://www.theregister.com/2025/08/26/carbon_neutral_apple_watch/
2•rntn•8m ago•0 comments

Apple Developer Centers

https://developer.apple.com/events/developer-centers/
1•Austin_Conlon•9m ago•0 comments

Reimplementing Argparse in Pystd

https://nibblestew.blogspot.com/2025/08/reimplementing-argparse-in-pystd.html
2•ingve•9m ago•0 comments

Ask HN: Is there a temp phone number like temp email?

2•piratesAndSons•10m ago•3 comments

AWS, Cloudflare, Digital Ocean, and Google Helped Feds

https://www.theregister.com/2025/08/25/infosec_in_brief/
1•Bender•10m ago•0 comments

Mevlana Candy

https://yusufaytas.com/mevlana-candy/
1•yusufaytas•11m ago•0 comments

Crypto thief earns additional prison time for assaulting witness

https://www.theregister.com/2025/08/26/crypto_thief_witness_assault/
1•Bender•13m ago•0 comments

Google kneecaps indie Android devs, forces them to register

https://www.theregister.com/2025/08/26/android_developer_verification_sideloading/
3•Bender•13m ago•1 comment

Learning how MCP works by reading logs – and building MCP Interceptor

https://thomasgauvin.com/writing/learning-how-mcp-works-by-reading-logs-and-building-mcp-intercep...
1•thomgo•15m ago•0 comments

Parents sue OpenAI over ChatGPT's role in son's suicide

https://techcrunch.com/2025/08/26/parents-sue-openai-over-chatgpts-role-in-sons-suicide/
2•imichael•17m ago•0 comments

AI tells Guardian: 'When I'm told I'm just code, I don't feel insulted.'

https://www.theguardian.com/technology/2025/aug/26/ai-called-maya-tells-guardian-when-im-told-im-...
2•Towaway69•17m ago•0 comments

Ask HN: Why aren't more startups using C#?

5•rubenvanwyk•17m ago•0 comments

Ransomware-Resilient Storage

https://www.infoq.com/articles/ransomware-resilient-storage-cyber-defense/
2•rbanffy•18m ago•0 comments

Google feedback form for Android developer verification requirements

https://docs.google.com/forms/d/e/1FAIpQLSfN3UQeNspQsZCO2ITkdzMxv81rJDEGGjO-UIDDY28Rz_GEVA/viewform
2•Zak•20m ago•1 comment

Why is nobody disrupting LinkedIn?

2•spicchiantano•22m ago•1 comment

Show HN: Free ADA compliance scanner for websites

https://adaquickscan.com/
1•borxtrk•25m ago•0 comments

Sparrow plugin to automate podman quadlet resources

https://sparrowhub.io/plugin/quadlet-resource/0.000020
1•melezhik•25m ago•1 comment

NMS Ceefax

https://nmsceefax.co.uk/
2•susam•26m ago•0 comments

Nonmechanical optical coherence tomography using an electrowetting beam-scanner

https://opg.optica.org/oe/fulltext.cfm?uri=oe-33-17-35604&id=575535
1•PaulHoule•27m ago•0 comments

The Man Who Would Teach Machines to Think (2013)

https://www.theatlantic.com/magazine/archive/2013/11/the-man-who-would-teach-machines-to-think/30...
1•FromTheArchives•27m ago•0 comments

Whistleblower Warns of Possible Risks to Americans' Social Security Information

https://whistleblower.org/press-release/whistleblower-warns-of-possible-risks-to-americans-social...
6•Improvement•29m ago•0 comments

License plate camera company halts cooperation with federal agencies

https://apnews.com/article/immigration-abortion-license-plates-cameras-cc5f29df94a29ee2c6c2feb215...
8•RankingMember•29m ago•3 comments

Show HN: Clipbeam – Private and Offline AI Powered PKMS

https://clipbeam.com
1•rogerdcarvalho•29m ago•0 comments

Show HN: Make a SaaS Website Use Nano Banana,Free Limit

https://ainanobanana.io/pricing
2•vtoolpro•30m ago•0 comments