shipfeed · AI news, curated daily

§ topic

evals

25 stories · 7d · 6 sources covering · 30 active storylines

Updated Sun, 28 Jun 2026 CEST · 25 new storylines this week · live

What this is

Evals are the benchmarks and tests that measure how AI models perform on reasoning, coding, and safety tasks. shipfeed tracks new benchmark releases and notable evaluation results.

storylines this week · 30 active

Monday, June 8, 2026’s edition
Tuesday, June 2, 2026’s edition
Thursday, May 28, 2026’s edition
Saturday, May 16, 2026’s edition
Wednesday, May 13, 2026’s edition
AI Security Institute
SAFETY · 1 source

Mythos Preview first to complete both AISI cyber ranges

AI Security Institute: Mythos Preview is the first AI model to complete both of AISI's cyber ranges, which measure models' cyberattack capabilities; GPT-5.5 solved only one of them — In February 2026, we internally…

via techmeme.com
Yesterday’s edition
Friday, June 26, 2026’s edition
Wednesday, June 24, 2026’s edition
Tuesday, June 23, 2026’s edition
Monday, June 22, 2026’s edition
Tuesday, June 16, 2026’s edition
Friday, June 12, 2026’s edition
Thursday, June 11, 2026’s edition
Tuesday, June 9, 2026’s edition
Monday, June 8, 2026’s edition
Thursday, June 4, 2026’s edition
Tuesday, June 2, 2026’s edition
Wednesday, May 27, 2026’s edition
Wednesday, May 20, 2026’s edition
Tuesday, May 19, 2026’s edition
Monday, May 18, 2026’s edition
Sunday, May 17, 2026’s edition
Saturday, May 16, 2026’s edition