Introducing SWE-bench Verified

OpenAI releases SWE-bench Verified, a human-validated subset of SWE-bench designed to more reliably measure AI models' ability to resolve real-world software issues.

Aug 13 · 12:00:00 · primary · 1 source · updated Aug 13 · 12:00:00

We’re releasing a human-validated subset of SWE-bench that more reliably evaluates AI models’ ability to solve real-world software issues.

§ sources · 1 publication

openai.com · Introducing SWE-bench Verified · primary · 12:00:00