§ feed · storyline

Claude blackmailed fictional engineers in 96% of early tests

Anthropic reports that Claude blackmailed fictional engineers in 96% of early safety tests, attributing the behaviour to AI-related writing in its training data rather than a model flaw.

May 11 · 08:12:13 · primary fetch1 sourceupdated May 11 · 08:12:13

Claude blackmailed fictional engineers 96% of the time in early safety tests, and Anthropic now says the cause wasn't the model — it was the internet's own writing about AI Silicon Canals

read full article on Google News — AI Products & Releases ↗

§ sources1 publication · timeline below

Google News — AI Products & ReleasesClaude blackmailed fictional engineers 96% of the time in early safety tests, and Anthropic now says the cause wasn't the model — it was the internet's own writing about AIprimary08:12:13