shipfeed: AI news, curated daily

not much happened today

ARC Prize introduces ARC-AGI-3, a benchmark on which humans solve 100% of tasks while current models solve under 1%, testing zero-preparation generalization and learning efficiency.

Mar 24 · primary fetch · 1 source · updated Mar 24

The ARC-AGI-3 benchmark, introduced by @arcprize and François Chollet, resets the frontier for general agentic reasoning: humans solve 100% of its tasks, while current models solve under 1%. It focuses on zero-preparation generalization and human-like learning efficiency. Its scoring protocol sparked debate over an efficiency-based metric that many consider harsh compared with prior ARC versions and other benchmarks such as NetHack. The community nonetheless acknowledges that the benchmark highlights weaknesses of current LLM agents in interactive, sparse-feedback environments.

Meanwhile, agent infrastructure is advancing. LangChain launched Fleet shareable skills for packaging reusable domain knowledge, and Anthropic revealed an auto mode for Claude Code that uses classifier-mediated approval to balance agent autonomy against manual confirmation. Browser and coding agents are also evolving from prompt wrappers into trainable systems, as exemplified by a collaboration between BrowserBase and Prime Intellect.

§ sources · 1 publication
  1. news.smol.ai: "not much happened today" (primary)