shipfeedAI news, curated daily

23:06:21 CET
20 MAY23:06:21shipfeed
pull to refreshlast sync
Just in — 30 new
§ feed · storyline

Qwen 3: 0.6B to 235B MoE full+base models that beat R1 and o1

Alibaba releases Qwen 3, a family of models ranging from 0.6B to 235B parameters including two MoE variants, under an Apache 2.0 licence with optional chain-of-thought inference mode.

Apr 28 · · primary fetch1 sourceupdated Apr 28 ·

Qwen 3 has been released by Alibaba featuring a range of models including two MoE variants, Qwen3-235B-A22B and Qwen3-30B-A3B, which demonstrate competitive performance against top models like DeepSeek-R1, o1, o3-mini, Grok-3, and Gemini-2.5-Pro. The models introduce an "enable_thinking=True" mode with advanced soft switching for inference scaling. The release is notable for its Apache 2.0 license and broad inference platform support including MCP.

The dataset improvements and multi-stage RL post-training contribute to performance gains. Meanwhile, Gemini 2.5 Pro from Google DeepMind shows strong coding and long-context reasoning capabilities, and DeepSeek R2 is anticipated soon. Twitter discussions highlight Qwen3's finegrained MoE architecture, large context window, and multi-agent system applications.

read full article on news.smol.ai
§ sources1 publication · timeline below
  1. news.smol.aiQwen 3: 0.6B to 235B MoE full+base models that beat R1 and o1primary
Qwen 3: 0.6B to 235B MoE full+base models that beat R1 and o1 · shipfeed