§ feed · storyline

Deliberative alignment: reasoning enables safer language models

OpenAI introduces deliberative alignment, a strategy that teaches o1 models to reason directly over safety specifications rather than relying on pattern-based refusals.

Dec 20 · 11:00:00 · primary fetch · 1 source · updated Dec 20 · 11:00:00

Introducing our new alignment strategy for o1 models, which are directly taught safety specifications and how to reason over them.


§ sources · 1 publication · timeline below

openai.com · Deliberative alignment: reasoning enables safer language models · primary · 11:00:00