§ feed · storyline
Introducing the SWE-Lancer benchmark
OpenAI introduces SWE-Lancer, a benchmark that tests frontier LLMs on real-world freelance software engineering tasks collectively worth $1 million in payouts.
Can frontier LLMs earn $1 million from real-world freelance software engineering?
§ sources · 1 publication · timeline below