

Tencent's Hunyuan-Large claims to beat DeepSeek-V2 and Llama3-405B with LESS Data

Tencent releases Hunyuan-Large, an MoE model with more than 300B parameters, trained on 7T tokens, that it claims outperforms DeepSeek-V2 and Llama3-405B. Its custom license bars use in the EU and by firms with more than 100M MAU.

Nov 6 · 1 source · updated Nov 6

Tencent released Hunyuan-Large, an MoE model with more than 300B parameters, pretrained on 7T tokens, of which 1.5T are synthetic data generated via Evol-Instruct. The model introduces techniques such as "recycle routing" and expert-specific learning rates, alongside a compute-efficient scaling law for MoE active parameters. However, its custom license restricts use in the EU and by companies with over 100M MAU, and the model avoids China-sensitive queries.

Meanwhile, Anthropic launched Claude 3.5 Haiku, now available on multiple platforms. It was praised for its intelligence and speed but criticized for a 10x price increase.

Meta opened its Llama models to the U.S. defense sector, and a Llama Impact Hackathon is offering a $15K prize for projects using Llama 3.1 and Llama 3.2 Vision. LlamaIndex released a React chat UI component with Tailwind CSS and LLM backend integrations. MLX LM improved text generation speed and efficiency with KV cache quantization.

read full article on news.smol.ai