§ feed · storyline
Open-world evaluations for measuring frontier AI capabilities
Anthropic launches CRUX, a new evaluation project designed to measure frontier AI capabilities on long, complex, open-world tasks.
Introducing CRUX, a new project for evaluating AI on long, messy tasks
§ sources · 1 publication · timeline below