§ feed · storyline

Kimi K2 Thinking: 1T-A32B params, SOTA HLE, BrowseComp, TauBench && Soumith leaves Pytorch

Moonshot AI launches Kimi K2 Thinking, a 1T-parameter MoE model with 32B active parameters, a 256K context window, and top benchmark scores on HLE, BrowseComp, and agentic tool-use tasks.

Nov 6 · 06:44:39 · primary fetch · 1 source · updated Nov 6 · 06:44:39

Moonshot AI launched Kimi K2 Thinking, a 1 trillion parameter mixture-of-experts (MoE) model with 32 billion active parameters, a 256K context window, and native INT4 quantization-aware training. It reports state-of-the-art results on HLE (44.9%) and BrowseComp (60.2%), and sustains agentic tool use across 200-300 sequential tool calls. The model ships with vLLM support and OpenAI-compatible APIs, and is available on platforms including Arena, Baseten, and Yupp.

Early user reports note some API instability under launch load. Meanwhile, Google announced the TPU v7 (Ironwood) with a 10× peak performance improvement over TPU v5p, aimed at training and agentic inference for models like Gemini. Apple added support for M5 Neural Accelerators in llama.cpp for inference acceleration.


§ sources · 1 publication · timeline below

news.smol.ai · Kimi K2 Thinking: 1T-A32B params, SOTA HLE, BrowseComp, TauBench && Soumith leaves Pytorch · primary · 06:44:39