§ feed · storyline

Claude Crushes Code - 92% HumanEval and Claude.ai Artifacts

Anthropic releases Claude 3.5 Sonnet with state-of-the-art benchmark scores at twice the speed and one-fifth the cost of Claude 3 Opus, alongside a new Artifacts feature for interactive AI-generated content.

Jun 21 · 09:27:45 · primary fetch · 1 source · updated Jun 21 · 09:27:45

Anthropic's Claude 3.5 Sonnet is positioned as a Pareto improvement over Claude 3 Opus: it runs at twice the speed and costs one-fifth as much. It achieves state-of-the-art results on benchmarks such as GPQA, MMLU, and HumanEval, and outperforms both GPT-4o and Claude 3 Opus on vision tasks. The model also shows a significant advance in coding: in Anthropic's internal agentic coding evaluation it solved 64% of problems, compared to 38% for Claude 3 Opus, and it is described as capable of autonomously fixing pull requests.

Anthropic also introduced Artifacts, a Claude.ai feature that lets users interact with AI-generated content such as code snippets and documents in a dynamic workspace, similar to OpenAI's Code Interpreter. The release combines gains in performance, cost-efficiency, and coding proficiency, signaling a growing role for LLMs in software development.


§ sources · 1 publication · timeline below

news.smol.ai · Claude Crushes Code - 92% HumanEval and Claude.ai Artifacts · primary · 09:27:45