shipfeed: AI news, curated daily

§ agents · storyline

Claude and GPT-5.5 develop real browser exploits in new benchmark

Carnegie Mellon researchers have released a benchmark that tests AI agents on exploiting real vulnerabilities in Google's V8 JavaScript engine. Claude Mythos outperforms GPT-5.5, but at twelve times the cost.

May 16 · 1 source · updated May 16

Researchers at Carnegie Mellon University have built a new benchmark that measures how far AI agents can go when exploiting real vulnerabilities in Google's V8 engine. Claude Mythos leads GPT-5.5 by a wide margin but costs twelve times as much.

The article "New benchmark shows Claude Mythos and GPT-5.5 can develop real browser exploits autonomously" first appeared on The Decoder.

§ sources · 1 publication
  1. the-decoder.com: "New benchmark shows Claude Mythos and GPT-5.5 can develop real browser exploits autonomously" (primary)