shipfeed · AI news, curated daily

Microsoft AgentInstruct + Orca 3

Microsoft Research releases AgentInstruct, a synthetic data pipeline generating 25.8 million instructions to fine-tune Mistral-7B, yielding gains of up to 54% on GSM8K and a 31% reduction in hallucinations.

Jul 16 · primary fetch · 1 source · updated Jul 16

Microsoft Research released AgentInstruct, the third paper in its Orca series. It introduces a generative teaching pipeline that produces 25.8 million synthetic instructions, which were used to fine-tune Mistral-7B. The reported gains are +40% on AGIEval, +19% on MMLU, +54% on GSM8K, +38% on BBH, and +45% on AlpacaEval, along with a 31.34% reduction in hallucinations. This synthetic data approach follows the success of FineWeb and Apple's Rephrasing research in improving dataset quality.

Additionally, Tencent claims to have generated 1 billion diverse personas for synthetic data. On AI Twitter, notable discussions included a shooting incident at a Trump rally and recent ML research highlights such as FlashAttention-3, RankRAG, and Mixture of A Million Experts.

read full article on news.smol.ai
§ sources · 1 publication
  1. news.smol.ai · Microsoft AgentInstruct + Orca 3 · primary