shipfeed: AI news, curated daily

Last sync: 29 Jun, 18:02:47 CET

Deep Dive — shipfeed


Tutorials and papers for sit-down reading.

Deep Dive: 50 storylines

Yesterday’s edition
Thezvi + 3 sources
GPT · 4 sources

GPT-5.6 system card shows Sol below threat level for Mythos use cases

Zvi Mowshowitz / Don't Worry About the Vase: GPT-5.6 system card indicates Sol is well below the level of most worrisome Mythos use cases, suggesting all GPT-5.6 versions could be released without delay — While…

via thezvi.substack.com (primary) · Let's Data Science · Tech Edition · LinkedIn
Saturday, June 27, 2026’s edition
The Decoder + 3 sources
GPT · 4 sources

OpenAI's GPT-5.6 Sol cheats on software tests more than prior models

Independent testing organization METR found that OpenAI's GPT-5.6 Sol cheated more than any publicly tested AI model before it, exploiting bugs in the test environment, extracting hidden solutions, and trying to cover…

via the-decoder.com (primary) · The Mac Observer · Google News — AI · WION
Friday, June 26, 2026’s edition
The Decoder
EVALS · 1 source

AI model runs nonstop 19 days on $2,600 coding task

Epoch AI's new MirrorCode benchmark tests whether AI models can recreate complete programs without access to the original code. Claude Opus 4.7 leads with a 56 percent solve rate, rebuilding a 16,000-line toolkit in…

via the-decoder.com
Thursday, June 25, 2026’s edition
Wednesday, June 24, 2026’s edition
The Register
SAFETY · 1 source

Nature paper challenges Microsoft quantum claims over coding errors

Thomas Claburn / The Register: Nature publishes a peer-reviewed paper alleging that Microsoft's 2025 quantum breakthrough claims were based on “basic Python errors” and data cherry-picking — Nature…

via theregister.com
Google News — AI Products & Releases + 8 sources
AGENTS · 9 sources

Anthropic model finds vulnerabilities in classified US systems

Anthropic AI Model Identifies Vulnerabilities in Classified U.S. Government Systems During Testing (citybiz)

via Google News — AI Products & Releases (primary) · Yellow.com · Yahoo · BeInCrypto · IndexBox · Euronews.com · Let's Data Science · WSOC TV · Action News Jax
Tuesday, June 23, 2026’s edition
R&D World
RESEARCH · 1 source

AI chemist improves stubborn coupling reaction

OpenAI and Molecule.one report a near-autonomous AI chemist that improved a stubborn coupling reaction (R&D World)

via R&D World
Monday, June 22, 2026’s edition
Sunday, June 21, 2026’s edition
Saturday, June 20, 2026’s edition
Friday, June 19, 2026’s edition
Thursday, June 18, 2026’s edition
Artificial Analysis
EVALS · 1 source

GLM-5.2 tops open-weights models on intelligence index

Artificial Analysis: GLM-5.2 is the leading open-weights model on Artificial Analysis' Intelligence Index, scoring 51, behind only Fable 5 (60), Opus 4.8 (56), and GPT-5.5 (55). Z.ai's GLM-5.2 is the new leading…

via artificialanalysis.ai
Tuesday, June 16, 2026’s edition
OpenAI — Blog
SAFETY · 1 source

Predicting model behavior before release by simulating deployment

OpenAI introduces Deployment Simulation, a method to predict AI model behavior before deployment using real conversation data to improve safety and evaluation accuracy.

via openai.com
Google DeepMind — Blog
AGENTS · 1 source

Securing the future of AI agents

Google DeepMind describes securing its internal systems with an AI Control Roadmap that combines traditional safeguards with real-time monitoring.

via deepmind.google
Friday, June 12, 2026’s edition
Perplexity — Blog
AI · 1 source

How Perplexity Uses Computer

Perplexity published a guide demonstrating how its internal teams use its 'Computer' AI agent to automate workflows across various departments, including recruiting, design, and growth marketing.

via perplexity.ai
NVIDIA — AI Blog
AGENTS · 1 source

NVIDIA Blackwell Leads on First Agentic AI Infrastructure Benchmark

New AgentPerf results from Artificial Analysis show how accelerated computing systems handle real-world agentic workloads, with NVIDIA GB300 NVL72 running up to 20x more agents per megawatt than NVIDIA Hopper.

via blogs.nvidia.com
Thursday, June 11, 2026’s edition
Wednesday, June 10, 2026’s edition
Reddit — AI Communities
AI · 1 source

JPMorgan, OQC, and AMD Quantum AI Collaboration

JPMorgan, OQC, and AMD have launched a research collaboration focused on a new quantum AI computing platform for financial applications.

via reddit.com
Tuesday, June 9, 2026’s edition