§ feed · storyline

Tencent's Hunyuan-Large claims to beat DeepSeek-V2 and Llama3-405B with LESS Data

Tencent releases Hunyuan-Large, a >300B-parameter MoE model pretrained on 7T tokens that it claims outperforms DeepSeek-V2 and Llama3-405B, though its custom license bars use in the EU and by companies with more than 100M MAU.

Nov 6 · 07:22:40 · primary fetch · 1 source · updated Nov 6 · 07:22:40

Tencent released a notable >300B-parameter MoE model pretrained on 7T tokens, including 1.5T tokens of synthetic data generated via Evol-Instruct. The model introduces techniques such as "recycle routing" and expert-specific learning rates, alongside a compute-efficient scaling law for MoE active parameters. However, its custom license restricts use in the EU and by companies with over 100M MAU, and the model avoids queries on topics that are sensitive in China. Meanwhile, Anthropic launched Claude 3.5 Haiku, now available on multiple platforms; it was praised for intelligence and speed but criticized for a 10x price increase.

Meta opened its Llama models to the U.S. defense sector, and a Llama Impact Hackathon is offering a $15K prize for projects built with Llama 3.1 and 3.2 Vision. LlamaIndex released a React chat UI component with Tailwind CSS styling and LLM backend integrations. The MLX LM package improves text generation speed and memory efficiency with KV cache quantization.


§ sources · 1 publication · timeline below

news.smol.ai · Tencent's Hunyuan-Large claims to beat DeepSeek-V2 and Llama3-405B with LESS Data · primary · 07:22:40