§ feed · storyline

Large Reasoning Models Fail to Follow Instructions During Reasoning: A Benchmark Study

The ReasonIF benchmark finds that frontier large reasoning models (LRMs) fail to follow instructions that apply to their reasoning process more than 75% of the time. The benchmark covers language, formatting, and length constraints.

Oct 22 · 02:00:00 · primary fetch · 1 source · updated Oct 22 · 02:00:00

ReasonIF finds frontier LRMs fail to follow reasoning instructions >75% of the time; introduces a benchmark across languages, formatting, and length.


§ sources · 1 publication · timeline below

together.ai · Large Reasoning Models Fail to Follow Instructions During Reasoning: A Benchmark Study · primary · 02:00:00