§ feed · storyline
Large Reasoning Models Fail to Follow Instructions During Reasoning: A Benchmark Study
The ReasonIF benchmark finds that frontier large reasoning models fail to follow instructions during their reasoning process more than 75% of the time, in tests spanning languages, formatting, and length constraints.
§ sources · 1 publication · timeline below