§ models · storyline

not much happened today

OpenAI releases GPT-5.4 with a 1.05M token context window, tied #1 on the Artificial Analysis Intelligence Index at score 57, with higher per-token pricing and improved agentic coding performance.

Mar 6 · 06:44:39 · primary fetch1 sourceupdated Mar 6 · 06:44:39

OpenAI rolled out GPT-5.4, achieving tied #1 on the Artificial Analysis Intelligence Index with Gemini 3.1 Pro Preview scoring 57 (up from 51 for GPT-5.2 xhigh). GPT-5.4 features a larger ~1.05M token context window and higher per-token prices ($2.50/$15 vs $1.75/$14 for GPT-5.2), with strengths in physics reasoning (CritPt) and agentic coding (TerminalBench Hard) but a higher hallucination rate and ~28% higher benchmark run cost. The GPT-5.4 Pro variant shows a +10 point jump on CritPt reaching 30% but at an extreme output token cost of $180 / 1M tokens. Community benchmarks show GPT-5.4 excels in agentic/coding tasks but mixed feedback on reasoning efficiency and literalness compared to Claude.

OpenAI updated agent prompting guidance for GPT-5.4 API users, emphasizing tool use, structured outputs, and verification loops. Claude Code added local scheduled tasks and loop patterns for agents. The MCP framework is highlighted as a connective tissue for AI evaluation and design-code round-trips, with Truesight MCP enabling AI evaluation like unit testing and Figma MCP server supporting bidirectional design-code integration. Open-source T3 Code launched as an agent orchestration…

read full article on news.smol.ai ↗

§ sources1 publication · timeline below

news.smol.ainot much happened todayprimary06:44:39