§ feed · storyline
Fine-tuning open LLM judges to outperform GPT-5.2
Researchers fine-tune GPT-OSS 120B using Direct Preference Optimization on 5,400 preference pairs, outperforming GPT-5.2 as an LLM judge at 15x lower cost and with 14x faster inference.
Fine-tuned open-source LLM judges can outperform GPT-5.2 at evaluating model outputs. Using Direct Preference Optimization on just 5,400 preference pairs, we trained GPT-OSS 120B to beat GPT-5.2 on human preference alignment, at 15x lower cost and with 14x faster inference.
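For readers unfamiliar with Direct Preference Optimization, here is a minimal sketch of the standard per-pair DPO loss. It is not the authors' training code: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions, and the summary above does not state which hyperparameters were used. Each preference pair contributes the summed log-probabilities of the chosen and rejected responses under both the model being trained and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single preference pair.

    All inputs are summed token log-probabilities of a full response.
    beta=0.1 is an assumed illustrative value, not taken from the source.
    """
    # How much more the policy favors each response than the reference does.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp

    # Scaled margin: positive when the policy prefers the chosen response
    # more strongly than the reference model does.
    z = beta * (chosen_shift - rejected_shift)

    # loss = -log(sigmoid(z)) = log(1 + exp(-z)), computed in a stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Hypothetical log-probabilities, for illustration only.
untrained = dpo_loss(-2.0, -2.0, -2.0, -2.0)  # policy identical to reference
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)   # policy now favors the chosen answer
```

When the policy matches the reference, the loss equals log 2; it falls as the policy shifts probability toward the chosen response relative to the reference. In practice the loss is averaged over a batch of pairs and minimized with a gradient-based optimizer.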
§ sources · 1 publication · timeline below