§ feed · storyline

Did Nvidia's Nemotron 70B train on test?

Nvidia's Nemotron-70B faces scrutiny over whether its strong benchmark results on Arena Hard and AlpacaEval reflect test-set contamination rather than genuine capability gains over base Llama-3.1-70B.

Oct 17 · 02:44:43 · primary fetch · 1 source · updated Oct 17 · 02:44:43

NVIDIA's Nemotron-70B model has drawn scrutiny: it posts strong results on Arena Hard, AlpacaEval, and MT-Bench, yet standard benchmarks such as GPQA and MMLU Pro show no improvement over the base Llama-3.1-70B. The new HelpSteer2-Preference dataset improves some benchmarks with minimal losses elsewhere. Meanwhile, Mistral released Ministral 3B and 8B under the Mistral Commercial License; the models feature 128k context length and outperform Llama-3.1 and GPT-4o on various benchmarks.

Trained with RLHF (REINFORCE), NVIDIA's Nemotron-70B also surpasses GPT-4o and Claude-3.5-Sonnet on key benchmarks. Additionally, Zep introduced Graphiti, an open-source temporal knowledge graph memory layer for AI agents, built on Neo4j.

§ sources · 1 publication · timeline below

news.smol.ai · Did Nvidia's Nemotron 70B train on test? · primary · 02:44:43