
shipfeed

§ research · cluster

Sparse-to-dense rewards improve language model post-training

yesterday · primary fetch · 1 source · cluster 47b1a65c · updated yesterday

This cluster groups 2 articles from 1 source. The originating feed didn't include an excerpt; open a link below to read the full papers.

read full article on arxiv.org
§ sources · 2 publications · timeline below
  1. arxiv.org · Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training · primary
  2. arxiv.org · OGLS-SD: On-Policy Self-Distillation with Outcome-Guided Logit Steering for LLM Reasoning

§ how this story moved

  1. primary · arXiv — cs.AI posts the primary paper.
  2. arXiv — cs.AI posts a second, related paper.