§ feed · storyline

Nemotron-4-340B: NVIDIA's new large open models, built on syndata, great for syndata

NVIDIA releases Nemotron-4-340B, a dense model trained on 9T tokens and aligned with over 98% synthetic data, with performance comparable to GPT-4 and an open-sourced synthetic data pipeline.

Jun 14 · 23:06:38 · primary fetch · 1 source · updated Jun 14 · 23:06:38

NVIDIA has scaled up its Nemotron-4 model from 15B to a 340B dense model, trained on 9T tokens, with performance comparable to GPT-4. Over 98% of the data used in the alignment process is synthetic, with only about 20K human-annotated samples used for fine-tuning and reward model training. The synthetic data generation pipeline is open-sourced, including synthetic prompt generation and preference data generation. The base and instruct versions outperform Mixtral and Llama 3, while the reward model ranks above Gemini 1.5, Cohere, and GPT-4o.

Other notable models include Mamba-2-Hybrid 8B, which is up to 8x faster than Transformers at inference and excels on long-context tasks; Samba-3.8B-instruct, which offers infinite context length with linear complexity; the Dolphin-2.9.3 tiny models, optimized for low-resource devices; and Faro Yi 9B DPO, which has a 200K context window and runs efficiently on 16GB of VRAM. Separately, the Mixture-of-Agents technique boosts open-source LLMs beyond GPT-4o on AlpacaEval 2.0.


§ sources · 1 publication · timeline below

news.smol.ai · Nemotron-4-340B: NVIDIA's new large open models, built on syndata, great for syndata · primary · 23:06:38