§ feed · storyline

How confessions can keep language models honest

OpenAI researchers test a "confessions" training method that prompts models to self-report mistakes, aiming to improve honesty and transparency in AI outputs.

Dec 3 · 11:00:00 · primary fetch1 sourceupdated Dec 3 · 11:00:00

OpenAI researchers are testing “confessions,” a method that trains models to admit when they make mistakes or act undesirably, helping improve AI honesty, transparency, and trust in model outputs.

read full article on openai.com ↗

§ sources1 publication · timeline below

openai.comHow confessions can keep language models honestprimary11:00:00