
Fine-tuning open LLM judges to outperform GPT-5.2

Researchers fine-tune GPT-OSS 120B using Direct Preference Optimization on 5,400 preference pairs, outperforming GPT-5.2 as an LLM judge at 15x lower cost and 14x faster inference.

Feb 2 · 01:00:00 · primary fetch · 1 source · updated Feb 2 · 01:00:00

Fine-tuned open-source LLM judges can outperform GPT-5.2 at evaluating model outputs. Using Direct Preference Optimization on just 5,400 preference pairs, we trained GPT-OSS 120B to beat GPT-5.2 on human preference alignment, at 15x lower cost and 14x faster inference.
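For readers unfamiliar with Direct Preference Optimization, here is a minimal sketch of the standard DPO objective for a single preference pair. It is not the training code used in the article. It assumes the summed log-probabilities of the preferred ("chosen") and dispreferred ("rejected") judge outputs have already been computed under both the model being trained and a frozen reference copy. The function names and the `beta=0.1` default are illustrative choices, not values from the source.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair (hypothetical helper; beta is an assumed value)."""
    # Log-ratio of policy vs. frozen reference for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Scaled margin: how much more the policy favors the chosen response
    # than the reference model does.
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)


def mean_dpo_loss(pairs, beta: float = 0.1) -> float:
    """Average DPO loss over (policy_c, policy_r, ref_c, ref_r) log-prob tuples."""
    return sum(dpo_loss(*p, beta=beta) for p in pairs) / len(pairs)


# Identical policy and reference give zero margin, so the loss equals log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Shifting probability toward the chosen response lowers the loss.
print(dpo_loss(-9.0, -13.0, -10.0, -12.0))
```

Minimizing this loss pushes the judge to assign relatively more probability to the human-preferred verdict than the reference model does. `beta` controls how far the fine-tuned model may drift from that reference.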

read full article on together.ai ↗

§ sources · 1 publication · timeline below

together.ai · Fine-tuning open LLM judges to outperform GPT-5.2 · primary · 01:00:00