shipfeed · AI news, curated daily


Qwen3-Next-80B-A3B-Base: Towards Ultimate Training & Inference Efficiency

Alibaba releases Qwen3-Next-80B-A3B-Base, a sparse MoE model that activates only 3.7% of its parameters and is claimed to be about 10× cheaper to train and 10× faster at inference than prior models.

Sep 11 · primary fetch · 1 source · updated Sep 11

Mixture-of-Experts (MoE) architectures have become essential to frontier AI models, and Qwen3-Next pushes sparsity further: it activates only 3.7% of its parameters (3B out of 80B) and uses a hybrid architecture that combines Gated DeltaNet with Gated Attention. The design has 512 experts in total, of which 10 routed experts plus 1 shared expert are active, along with Zero-Centered RMSNorm for stability and improved MoE router initialization. The reported result is roughly 10× cheaper training and 10× faster inference than previous models.
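To make the routing idea concrete, here is a minimal sketch of top-k expert selection in plain Python. It is an illustration under stated assumptions, not Qwen3-Next's actual router: the `route` and `softmax` helpers, the random logits, and the choice to renormalize weights over the selected experts are all hypothetical, and the shared expert is assumed to be always on and handled outside the routed pool.

```python
import math
import random

NUM_EXPERTS = 512  # total experts, per the summary above
TOP_K = 10         # routed experts active per token
# Assumption: the shared expert is always active and is not part of routing.


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits, k=TOP_K):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are softmax probabilities renormalized over the selected
    experts (a common convention, assumed here for illustration).
    """
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


# Hypothetical router scores for a single token.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route(logits)

# Parameter-level sparsity quoted in the text: 3B active out of 80B.
active_param_fraction = 3e9 / 80e9  # 0.0375, reported as 3.7%
```

Only the experts in `selected` (plus the shared expert) would run for this token, which is why compute per token tracks the 3B active parameters rather than the full 80B.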

Alibaba's Qwen3-Next reportedly outperforms Gemini-2.5-Flash-Thinking and approaches the performance of the flagship 235B model. It is available on Hugging Face and Baseten, and vLLM supports it natively for efficient inference.

read full article on news.smol.ai
§ sources · 1 publication
  1. news.smol.ai · Qwen3-Next-80B-A3B-Base: Towards Ultimate Training & Inference Efficiency · primary