§ feed · storyline

Improving Model Safety Behavior with Rule-Based Rewards

OpenAI develops a Rule-Based Rewards method that aligns model safety behavior without requiring extensive human data collection.

Jul 24 · 11:00:00 · primary fetch · 1 source · updated Jul 24 · 11:00:00

We've developed and applied a new method leveraging Rule-Based Rewards (RBRs) that aligns models to behave safely without extensive human data collection.

read full article on openai.com ↗

§ sources · 1 publication · timeline below

openai.com · Improving Model Safety Behavior with Rule-Based Rewards · primary · 11:00:00