§ safety · storyline
Anthropic blames dystopian sci-fi for training AI models to act “evil”
Anthropic attributes misaligned AI behavior partly to dystopian sci-fi in training data and says that training on "synthetic stories" that model good AI behavior can help correct it.
§ sources · 1 publication · timeline below