shipfeedAI news, curated daily

00:39:20 CET
21 MAY00:39:20shipfeed
pull to refreshlast sync
Just in — 30 new
§ feed · storyline

new Gemini 3 Deep Think, Anthropic $30B @ $380B, GPT-5.3-Codex Spark, MiniMax M2.5

Google DeepMind rolls out Gemini 3 Deep Think V2 to AI Ultra subscribers and select API users, scoring 84.6% on ARC-AGI-2 and 48.4% on Humanity's Last Exam without tools.

Feb 12 · · primary fetch1 sourceupdated Feb 12 ·

Google DeepMind is rolling out the upgraded Gemini 3 Deep Think V2 reasoning mode to Google AI Ultra subscribers and opening early access to the Vertex AI / Gemini API for select users. Key benchmark achievements include ARC-AGI-2 at 84.6%, Humanity’s Last Exam (HLE) at 48.4% without tools, and a Codeforces Elo of 3455, showcasing Olympiad-level performance in physics and chemistry. The mode emphasizes practical scientific and engineering applications such as error detection in math papers, physical system modeling, semiconductor optimization, and a sketch to CAD/STL pipeline for 3D printing.

ARC benchmark creator François Chollet highlights the benchmark's role in advancing test-time adaptation and fluid intelligence, projecting human-AI parity around 2030. This rollout is framed as a productized, compute-heavy test-time mode rather than a lab demo, with cost disclosures for ARC tasks provided.

read full article on news.smol.ai
§ sources1 publication · timeline below
  1. news.smol.ainew Gemini 3 Deep Think, Anthropic $30B @ $380B, GPT-5.3-Codex Spark, MiniMax M2.5primary