shipfeed · AI news, curated daily

§ feed · storyline

lots of little things happened this week

Anthropic, NVIDIA, Sakana AI, Meta AI, and Percy Liang each release agent tools, benchmarks, or models across a busy week of AI updates.

Mar 22 · primary fetch · 1 source · updated Mar 22

Anthropic introduced a 'think' tool that improves instruction adherence and multi-step problem solving in agents, with Claude demonstrating combined reasoning and tool use. NVIDIA's Llama-3.3-Nemotron-Super-49B-v1 ranked #14 on LMArena and was noted for strong math reasoning and a 15M post-training dataset. Sakana AI launched a Sudoku-based reasoning benchmark to advance AI problem-solving capabilities.

Meta AI released SWEET-RL, a reinforcement learning algorithm that improves performance on long-horizon multi-turn tasks by 6%, and introduced CollaborativeAgentBench, a benchmark for LLM agents that collaborate with humans on programming and design tasks. Percy Liang relaunched the HELM benchmark with 5 challenging datasets that evaluate 22 top language models.

read full article on news.smol.ai
§ sources · 1 publication
  1. news.smol.ai · lots of little things happened this week · primary