§ feed · storyline

Improving mathematical reasoning with process supervision

OpenAI trains a model to a new state-of-the-art in mathematical problem solving using process supervision, rewarding each correct reasoning step rather than only the final answer.

May 31 · 09:00:00 · primary fetch1 sourceupdated May 31 · 09:00:00

We've trained a model to achieve a new state-of-the-art in mathematical problem solving by rewarding each correct step of reasoning (“process supervision”) instead of simply rewarding the correct final answer (“outcome supervision”).

In addition to boosting performance relative to outcome supervision, process supervision also has an important alignment benefit: it directly trains the model to produce a chain-of-thought that is endorsed by humans.

read full article on openai.com ↗

§ sources1 publication · timeline below

openai.comImproving mathematical reasoning with process supervisionprimary09:00:00