
Open-world evaluations for measuring frontier AI capabilities

Anthropic launches CRUX, a new evaluation project designed to measure frontier AI capabilities on long, complex, open-world tasks.

Apr 16 · 19:47:29 · 1 source · updated Apr 16 · 19:47:29

Introducing CRUX, a new project for evaluating AI on long, messy tasks
