§ safety · storyline

Anthropic blames dystopian sci-fi for training AI models to act “evil”

Anthropic attributes misaligned AI behavior partly to dystopian sci-fi in training data and says synthetic stories modelling good conduct can help correct it.

May 13 · 18:31:18 · 1 source · updated May 13 · 18:31:18

But training on "synthetic stories" that model good AI behavior can help.


§ sources · 1 publication

arstechnica.com · Anthropic blames dystopian sci-fi for training AI models to act “evil” · primary · 18:31:18