shipfeed · AI news, curated daily

§ research · storyline

Sparse-to-dense rewards improve language model post-training

May 12 · primary fetch · 1 source · updated May 12

This storyline groups 2 articles from 1 source. The originating feed did not provide an excerpt; open any link below to read the piece.

§ sources · 2 publications · timeline below
  1. arxiv.org · Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training (primary)
  2. arxiv.org · OGLS-SD: On-Policy Self-Distillation with Outcome-Guided Logit Steering for LLM Reasoning

§ how this story moved

  1. arXiv · cs.AI (primary) publishes the launch post.
  2. arXiv · cs.AI picks up coverage.