Nvidia introduces 4-bit pretraining method for large language models
Nvidia introduces a 4-bit pretraining method based on its NVFP4 format, validated on a 12-billion-parameter hybrid Mamba-Transformer model in the longest publicly documented 4-bit training run.