shipfeed: AI news, curated daily


Did Nvidia's Nemotron 70B train on test?

Nvidia's Nemotron-70B faces scrutiny over whether its strong benchmark results on Arena Hard and AlpacaEval reflect test-set contamination rather than genuine capability gains over base Llama-3.1-70B.

Oct 17 · primary fetch · 1 source · updated Oct 17

NVIDIA's Nemotron-70B model has drawn scrutiny: it posts strong results on Arena Hard, AlpacaEval, and MT-Bench, yet standard benchmarks such as GPQA and MMLU Pro show no improvement over the base Llama-3.1-70B. The new HelpSteer2-Preference dataset improves scores on some benchmarks with minimal losses elsewhere. Meanwhile, Mistral released the Ministral 3B and 8B models under the Mistral Commercial License; both feature a 128k context length and are reported to outperform Llama-3.1 and GPT-4o on various benchmarks.

Nemotron-70B, which was trained with RLHF (REINFORCE), also surpasses GPT-4o and Claude-3.5-Sonnet on key benchmarks. Separately, Zep introduced Graphiti, an open-source temporal knowledge graph memory layer for AI agents, built on Neo4j.

§ sources · 1 publication
  1. news.smol.ai: Did Nvidia's Nemotron 70B train on test? (primary)