§ feed · storyline

o3 achieves major breakthroughs across AI reasoning benchmarks

OpenAI announces o3 and o3-mini models, achieving 25% on FrontierMath and 87.5% on ARC-AGI, with o3-mini offering lower inference costs on coding tasks.

Dec 21 · 02:44:22 · primary fetch · 1 source · updated Dec 21 · 02:44:22

OpenAI announced the o3 and o3-mini models with groundbreaking benchmark results, including a jump from 2% to 25% on the FrontierMath benchmark and 87.5% on the ARC-AGI reasoning benchmark, a gain described as about 11 years of progress on the GPT-3 to GPT-4o scaling curve. The o3-mini model shows better inference efficiency than the full o3 model, promising significant cost reductions on coding tasks.

The announcement was accompanied by community discussions, applications for safety testing, and detailed analyses. Sam Altman (sama) highlighted the unusual cost-performance tradeoff, and Eric Wallace shared insights on the o-series deliberative alignment strategy.

read full article on news.smol.ai ↗

§ sources · 1 publication · timeline below

news.smol.ai · o3 solves AIME, GPQA, Codeforces, makes 11 years of progress in ARC-AGI and 25% in FrontierMath · primary · 02:44:22