PaperBench: Evaluating AI’s Ability to Replicate AI Research

OpenAI releases PaperBench, a benchmark designed to evaluate how well AI agents can replicate state-of-the-art AI research papers.

Apr 2 · 12:15:00 · primary fetch · 1 source · updated Apr 2 · 12:15:00

We introduce PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research.

§ sources · 1 publication · timeline below