shipfeed · AI news, curated daily

Llama 3.1 Leaks: big bumps to 8B, minor bumps to 70b, and SOTA OSS 405b model

Meta's Llama 3.1 leaks reveal a 405B dense model with 128k context, trained on 39.3M GPU hours, with the 70B variant reportedly outperforming GPT-4o on some benchmarks.

Jul 23 · primary fetch · 1 source · updated Jul 23

The Llama 3.1 leaks reveal a 405B dense model with a 128k context length. It was trained on 39.3M GPU hours using H100-80GB GPUs and fine-tuned with over 25M synthetic examples. The leaked evals show benchmark gains across model sizes, largest for the 8B variant, and some evals suggest the 70B outperforms GPT-4o. Separately, OpenAI launched GPT-4o mini as a cost-efficient variant with strong performance but some reported reasoning weaknesses.

Synthetic datasets such as NuminaMath have helped models like Alibaba's Qwen 2 surpass GPT-4o and Claude 3.5 on math competition benchmarks. Other discussions covered benchmarks for reasoning tasks and how to build datasets that improve reasoning.

Read the full article on news.smol.ai.
§ sources · 1 publication
  1. news.smol.ai · Llama 3.1 Leaks: big bumps to 8B, minor bumps to 70b, and SOTA OSS 405b model · primary