
xAI Grok 4.1: #1 in Text Arena, #1 in EQ-bench, and better Creative Writing

xAI launches Grok 4.1, claiming the top position on the LM Arena Text Leaderboard with an Elo score of 1483 alongside improvements in creative writing and reduced hallucination.

Nov 17 · 06:44:39 · primary fetch · 1 source · updated Nov 17 · 06:44:39

xAI launched Grok 4.1, which ranks #1 on the LM Arena Text Leaderboard with an Elo score of 1483 and shows improvements in creative writing and reduced hallucination. OpenAI's GPT-5.1 "Thinking" shows efficiency gains, spending ~60% less "thinking" on easy queries, along with strong ARC-AGI performance. Google DeepMind released WeatherNext 2, an ensemble generative model that produces global weather forecasts 8× faster and with higher accuracy; it is integrated into multiple Google products.

Sakana AI raised ¥20B ($135M) in Series B funding at a $2.63B valuation and will focus on efficient AI for resource-constrained enterprise applications in Japan. New evaluations highlight tradeoffs between hallucination and knowledge accuracy across models, including Claude 4.1 Opus and other Anthropic models.


§ sources · 1 publication

news.smol.ai · xAI Grok 4.1: #1 in Text Arena, #1 in EQ-bench, and better Creative Writing · primary · 06:44:39