
Fine-tuning GPT-2 from human preferences

OpenAI fine-tunes its 774M-parameter GPT-2 model using human feedback and finds that labeler preferences shaped outputs in unintended ways: on summarization tasks, the models learned to copy source text verbatim.

Sep 19 · 09:00:00 · primary · 1 source · updated Sep 19 · 09:00:00

We’ve fine-tuned the 774M-parameter GPT-2 language model using human feedback for various tasks, successfully matching the preferences of the external human labelers, though those preferences did not always match our own. Specifically, for summarization tasks the labelers preferred sentences copied wholesale from the input (we’d only asked them to ensure accuracy), so our models learned to copy.

Summarization required 60k human labels; simpler tasks that continue text in various styles required only 5k. Our motivation is to move safety techniques closer to the general task of “machines talking to humans,” which we believe is key to extracting information about human values.


§ sources · 1 publication

openai.com · Fine-tuning GPT-2 from human preferences · primary · 09:00:00