§ feed · storyline
Introducing SWE-bench Verified
OpenAI releases SWE-bench Verified, a human-validated subset of SWE-bench designed to more reliably measure AI models' ability to resolve real-world software issues.
We’re releasing a human-validated subset of SWE-bench that more reliably evaluates AI models’ ability to solve real-world software issues.
§ sources · 1 publication · timeline below
- openai.com · Introducing SWE-bench Verified · primary