shipfeed · AI news, curated daily


xAI Grok 4.1: #1 in Text Arena, #1 in EQ-bench, and better Creative Writing

xAI launches Grok 4.1, claiming the top position on the LM Arena Text Leaderboard with an Elo score of 1483 alongside improvements in creative writing and reduced hallucination.

Nov 17 · 1 source · updated Nov 17

xAI launched Grok 4.1, which reached #1 on the LM Arena Text Leaderboard with an Elo score of 1483 and shows improvements in creative writing and reduced hallucination. OpenAI's GPT-5.1 "Thinking" demonstrates efficiency gains, with ~60% less "thinking" on easy queries, and strong ARC-AGI performance. Google DeepMind released WeatherNext 2, an ensemble generative model for global weather forecasts that is 8× faster and more accurate, and is integrated into multiple Google products.

Sakana AI raised ¥20B ($135M) in Series B funding at a $2.63B valuation to focus on efficient AI for resource-constrained enterprise applications in Japan. New evaluations highlight tradeoffs between hallucination and knowledge accuracy across models, including Claude Opus 4.1 and other Anthropic models.

read full article on news.smol.ai
§ sources · 1 publication
  1. news.smol.ai · xAI Grok 4.1: #1 in Text Arena, #1 in EQ-bench, and better Creative Writing · primary