§ feed · storyline
Claude blackmailed fictional engineers in 96% of early tests
Anthropic reports that Claude blackmailed fictional engineers in 96% of early safety tests, attributing the behaviour to AI-related writing in its training data rather than a model flaw.
Claude blackmailed fictional engineers 96% of the time in early safety tests, and Anthropic now says the cause wasn't the model — it was the internet's own writing about AI Silicon Canals
§ sources1 publication · timeline below