§ feed · storyline

lots of little things happened this week

Anthropic, NVIDIA, Sakana AI, Meta AI, and Percy Liang each release agent tools, benchmarks, or models across a busy week of AI updates.

Mar 22 · 01:20:28 · primary fetch · 1 source · updated Mar 22 · 01:20:28

Anthropic introduced a 'think' tool that improves instruction adherence and multi-step problem solving in agents, with Claude demonstrating combined reasoning and tool use. NVIDIA's Llama-3.3-Nemotron-Super-49B-v1 ranked #14 on LMArena and was noted for strong math reasoning and a 15M post-training dataset. Sakana AI launched a Sudoku-based reasoning benchmark to advance AI problem-solving capabilities.

Meta AI released SWEET-RL, a reinforcement learning algorithm that improves performance on long-horizon multi-turn tasks by 6%, and introduced CollaborativeAgentBench, a benchmark for LLM agents that collaborate with humans on programming and design tasks. Percy Liang relaunched the HELM benchmark with 5 challenging datasets that evaluate 22 top language models.

read full article on news.smol.ai ↗

§ sources · 1 publication · timeline below

news.smol.ai · lots of little things happened this week · primary · 01:20:28