§ feed · storyline
Deliberative alignment: reasoning enables safer language models
OpenAI introduces deliberative alignment, a strategy that teaches o1 models to reason directly over safety specifications rather than relying on pattern-based refusals.
Introducing our new alignment strategy for o1 models, which are directly taught safety specifications and how to reason over them.
§ sources · 1 publication · timeline below