shipfeed · AI news, curated daily


Introducing SWE-bench Verified

OpenAI releases SWE-bench Verified, a human-validated subset of SWE-bench designed to more reliably measure AI models' ability to resolve real-world software issues.

Aug 13 · 1 source (primary) · updated Aug 13

We’re releasing a human-validated subset of SWE-bench that more reliably evaluates AI models’ ability to solve real-world software issues.

§ sources · 1 publication
  1. openai.com · Introducing SWE-bench Verified · primary