§ feed · storyline

Nvidia introduces 4-bit pretraining method for large language models

Nvidia introduces a 4-bit pretraining method using NVFP4, validated on a 12B hybrid Mamba-Transformer model in the longest publicly documented 4-bit training run.

May 18 · 02:00:00 · primary fetch · 1 source · updated May 18 · 02:00:00

NVIDIA validated the NVFP4 methodology by training a 12B hybrid Mamba-Transformer to a 10T-token horizon, which the source describes as the longest publicly documented 4-bit training run.


§ sources · 1 publication · timeline below

marktechpost.com · NVIDIA Introduces a 4-Bit Pretraining Methodology Using NVFP4, Validated on a 12B Hybrid Mamba-Transformer at 10T Token Horizon · primary · 02:00:00