§ feed · storyline

QwQ-32B claims to match DeepSeek R1-671B

Alibaba Qwen releases QwQ-32B, a 32-billion-parameter reasoning model trained with two-stage reinforcement learning that aims to match the performance of DeepSeek R1-671B.

Apr 16 · 21:06:15 · primary fetch · 1 source · updated Apr 16 · 21:06:15

Alibaba Qwen released QwQ-32B, a 32-billion-parameter reasoning model trained with a novel two-stage reinforcement learning approach. The first stage scales RL on math and coding tasks with accuracy verifiers and code execution servers; the second applies RL for general capabilities such as instruction following and alignment. Separately, OpenAI rolled out GPT-4.5 to Plus users, with mixed feedback on coding performance and noted inference cost improvements.
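The first-stage setup described above (an accuracy verifier for math, an execution check for code) can be illustrated with a small sketch. This is not Qwen's implementation, which the source does not describe; the function names `math_reward` and `code_reward`, the binary 0/1 scoring, and the use of a local subprocess in place of a code execution server are all assumptions made for illustration.

```python
import subprocess
import sys
from fractions import Fraction


def math_reward(model_answer: str, reference: str) -> float:
    """Hypothetical accuracy verifier: 1.0 if the final answer matches, else 0.0."""
    try:
        # Compare as exact rationals so "0.5" and "1/2" count as equal.
        return float(Fraction(model_answer.strip()) == Fraction(reference.strip()))
    except (ValueError, ZeroDivisionError):
        # Non-numeric answers fall back to a whitespace-insensitive string match.
        return float(model_answer.strip() == reference.strip())


def code_reward(solution_src: str, test_src: str, timeout_s: float = 5.0) -> float:
    """Hypothetical execution check: 1.0 if the tests pass in a fresh interpreter."""
    program = solution_src + "\n" + test_src
    try:
        # A local subprocess stands in for a remote execution server (assumption).
        # It is NOT a security sandbox; untrusted code needs real isolation.
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return 0.0
    return 1.0 if result.returncode == 0 else 0.0


if __name__ == "__main__":
    print(math_reward("1/2", "0.5"))
    print(code_reward("def add(a, b):\n    return a + b", "assert add(2, 3) == 5"))
```

Outcome-only scoring of this kind checks the final result rather than the reasoning that led to it, which is one plausible reason to pair it with verifiable domains such as math and code before moving to broader capabilities.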

QwQ-32B aims to compete with larger MoE models such as DeepSeek-R1. On GPT-4.5, one notable user critique was "GPT-4.5 is unusable for coding", while other users praised its reasoning improvements, which they attributed to scaled-up pretraining.


§ sources · 1 publication · timeline below

news.smol.ai · QwQ-32B claims to match DeepSeek R1-671B · primary · 21:06:15