§ feed · storyline

Llama 3.1 Leaks: big bumps to 8B, minor bumps to 70b, and SOTA OSS 405b model

Meta's Llama 3.1 leaks reveal a 405B dense model with 128k context, trained on 39.3M GPU hours, with the 70B variant reportedly outperforming GPT-4o on some benchmarks.

Jul 23 · 03:12:50 · primary fetch · 1 source · updated Jul 23 · 03:12:50

Llama 3.1 leaks reveal a 405B dense model with 128k context length, trained on 39.3M GPU hours using H100-80GB GPUs, and fine-tuned with over 25M synthetic examples. The model shows significant benchmark improvements, especially for the 8B and 70B variants, with some evals suggesting the 70B outperforms GPT-4o. GPT-4o Mini launched as a cost-efficient variant with strong performance but some reasoning weaknesses.

Synthetic datasets like NuminaMath enable models such as Alibaba's Qwen 2 to surpass GPT-4o and Claude 3.5 in math competitions. Other discussions cover benchmarks for reasoning tasks and how to build datasets that improve reasoning.


§ sources · 1 publication · timeline below

news.smol.ai · Llama 3.1 Leaks: big bumps to 8B, minor bumps to 70b, and SOTA OSS 405b model · primary · 03:12:50