shipfeed · AI news, curated daily

Open-world evaluations for measuring frontier AI capabilities

Anthropic launches CRUX, a new evaluation project designed to measure frontier AI capabilities on long, complex, open-world tasks.

Apr 16 · primary fetch · 1 source · updated Apr 16

Introducing CRUX, a new project for evaluating AI on long, messy tasks

read full article on normaltech.ai
§ sources · 1 publication
  1. normaltech.ai · Open-world evaluations for measuring frontier AI capabilities · primary