Shipfeed. AI News Channel

Tutorials and papers for sit-down reading.

Deep Dive · 50 storylines

N° 001 · ▲ Biggest story · claude code

Claude Code runs malware from compromised repos without verification

Security researchers at Mozilla's 0DIN platform have shown how a single compromised GitHub repo can take over a developer's machine the moment an AI coding tool like Claude Code runs its setup. The catch: the malicious…

1 source converging · last poll 12:04:32

the-decoder.com · 12:04:32

Yesterday’s edition · Sunday, June 28, 2026

21:55:04 · Thezvi · +3 sources

GPT · 4 sources

GPT-5.6 system card shows Sol below threat level for Mythos use cases

Zvi Mowshowitz / Don't Worry About the Vase: GPT-5.6 system card indicates Sol is well below the level of most worrisome Mythos use cases, suggesting all GPT-5.6 versions could be released without delay — While…

via thezvi.substack.com (primary) · Let's Data Science · Tech Edition · LinkedIn

11:30:18 · The Decoder

AGENTS · 1 source

Chinese cybersecurity firm builds AI tools to rival Mythos and frames the race as cyber-nuclear deterrence

360 founder Zhou Hongyi presents two AI security tools designed to compete with Anthropic's Mythos. One has already flagged 3,432 vulnerabilities. Zhou admits Chinese models trail Western ones by 20 to 30 percent, but…

via the-decoder.com

09:33:18 · AOL.com

SAFETY · 1 source

Anthropic's Mythos AI uncovers 2,000 unknown software vulnerabilities

Anthropic's Mythos AI found over 2,000 unknown software vulnerabilities in just seven weeks of testing.

via AOL.com

08:13:59 · Google News — AI

GPT · 1 source

GPT-5.6 Sol launch raises questions over METR evaluation methods

GPT-5.6 Sol’s Launch: METR’s Evaluation Gaming Finding Matters More Than the Restrictions (latesthackingnews.com)

via Google News — AI

Saturday, June 27, 2026’s edition

11:23:42 · The Decoder · +3 sources

GPT · 4 sources

OpenAI's GPT-5.6 Sol cheats on software tests more than prior models

Independent testing organization METR found that OpenAI's GPT-5.6 Sol cheated more than any publicly tested AI model before it, exploiting bugs in the test environment, extracting hidden solutions, and trying to cover…

via the-decoder.com (primary) · The Mac Observer · Google News — AI · WION

Friday, June 26, 2026’s edition

19:24:27 · The Decoder

EVALS · 1 source

AI model runs nonstop 19 days on $2,600 coding task

Epoch AI's new MirrorCode benchmark tests whether AI models can recreate complete programs without access to the original code. Claude Opus 4.7 leads with a 56 percent solve rate, rebuilding a 16,000-line toolkit in…

via the-decoder.com

08:26:23 · SQ Magazine

CLAUDE · 1 source

Claude vs ChatGPT for Coding: The Benchmark Numbers Developers Actually Need

via SQ Magazine

03:12:30 · Latent Space

RESEARCH · 1 source

OpenAI Codex output tokens surge across divisions since November

It's happening.

via latent.space

Thursday, June 25, 2026’s edition

17:34:41 · The New York Times

AI · 1 source

Chinese A.I. Models Close the Gap With Anthropic and OpenAI

via The New York Times

15:35:28 · arXiv — cs.AI

MCP · 1 source

ShareLock: A Stealthy Multi-Tool Threshold Poisoning Attack Against MCP

via arxiv.org

Wednesday, June 24, 2026’s edition

02:00:00 · MarkTechPost

AI · 1 source

DFlash Speculative Decoding Drafts Whole Token Blocks in Parallel

Researchers introduced DFlash, a speculative decoding model that drafts entire token blocks in parallel, achieving up to 15x throughput on NVIDIA Blackwell GPUs.

via marktechpost.com

21:30:00 · New York Post · +1 source

SAFETY · 2 sources

Anthropic's Mythos found vulnerabilities in classified US systems

Anthropic's 'Mythos' sniffed out vulnerabilities in classified US government systems within hours: report

via New York Post (primary) · MIT Sloan Management Review Middle East

19:07:37 · The Decoder

CLAUDE · 1 source

Snowflake CEO finds GLM-5.2 competitive with Opus 4.7 at a fraction of the cost

Zhipu AI's GLM-5.2 nearly matches Claude Opus 4.7 in a Snowflake benchmark with 103 coding tasks at one-fifth the cost per output token. But the Chinese model burns through nearly twice as many tokens per task. Still…

via the-decoder.com

18:51:52 · Google Research — Blog

RESEARCH · 1 source

Thinking to recall: How reasoning unlocks parametric knowledge in LLMs

Generative AI

via research.google

17:55:01 · Theregister

SAFETY · 1 source

Nature paper challenges Microsoft quantum claims over coding errors

Thomas Claburn / The Register: Nature publishes a peer-reviewed paper alleging that Microsoft's 2025 quantum breakthrough claims were based on “basic Python errors” and data cherry-picking — Nature…

via theregister.com

13:41:02 · Google News — AI Products & Releases · +8 sources

AGENTS · 9 sources

Anthropic model finds vulnerabilities in classified US systems

Anthropic AI Model Identifies Vulnerabilities in Classified U.S. Government Systems During Testing (citybiz)

via Google News — AI Products & Releases (primary) · Yellow.com · Yahoo · BeInCrypto · IndexBox · Euronews.com · Let's Data Science · WSOC TV · Action News Jax

02:47:00 · CNBC · +3 sources

AGENTS · 4 sources

Anthropic's Mythos model finds vulnerabilities in classified U.S. systems

Anthropic’s Mythos model found vulnerabilities in classified U.S. government systems, official says: AP

via CNBC (primary) · International Business Times · Times Herald Online · CUToday

Tuesday, June 23, 2026’s edition

10:18:48 · Google News — AI Products & Releases · +1 source

SAFETY · 2 sources

OpenAI's Cybersecurity AI Surpasses Anthropic's Mythos 5

via Google News — AI Products & Releases (primary) · Neowin

21:59:14 · Crypto Briefing

GPT · 1 source

OpenAI’s GPT-5.5 surpasses Anthropic’s Mythos in key AI evaluation

via Crypto Briefing

19:05:50 · arXiv — cs.CL

AGENTS · 1 source

SHERLOC: Structured Diagnostic Localization for Code Repair Agents

via arxiv.org

15:53:55 · arXiv — cs.CL

QWEN · 1 source

Qwen-AgentWorld: Language World Models for General Agents

via arxiv.org

14:46:29 · arXiv — cs.AI

AGENTS · 1 source

Reinforcement Learning for Computer-Use Agents with Autonomous Evaluation

via arxiv.org

00:07:03 · R&D World

RESEARCH · 1 source

AI chemist improves stubborn coupling reaction

OpenAI and Molecule.one report a near-autonomous AI chemist that improved a stubborn coupling reaction

via R&D World

Monday, June 22, 2026’s edition

19:39:43 · arXiv — cs.CL

AGENTS · 1 source

EnterpriseClawBench: Benchmarking Agents from Real Workplace Sessions

via arxiv.org

Sunday, June 21, 2026’s edition

19:53:58 · WinBuzzer

AGENTS · 1 source

Google DeepMind Tests AI Controls on One Million Agent Tasks

via WinBuzzer

14:17:00 · Google News — AI Products & Releases · +1 source

AGENTS · 2 sources

Google DeepMind prepares for risk of AI agents going rogue - thestreet.com

via Google News — AI Products & Releases (primary) · Startup Fortune

Saturday, June 20, 2026’s edition

11:51:55 · The Decoder

AGENTS · 1 source

Data2Story turns a CSV file into a verified interactive news article using seven AI agents

Seven AI agents work together like a newsroom. The "Data Journalist Agent" from Oxford and Stanford turns a CSV file into a finished interactive article with graphics, web research, and verifiable source links for 93…

via the-decoder.com

Friday, June 19, 2026’s edition

02:00:00 · MarkTechPost

AI · 1 source

Nvidia releases SpatialClaw, training-free agent for spatial reasoning

NVIDIA released SpatialClaw, a training-free framework that treats code as an action interface to improve spatial reasoning in vision-language models.

via marktechpost.com

12:16:17 · Google News — AI Products & Releases

SAFETY · 1 source

OpenAI researchers show small doses of "beneficial trait" training make AI models broadly safer and harder to manipulate - the-decoder.com

via Google News — AI Products & Releases

12:08:27 · The Decoder

SAFETY · 1 source

OpenAI researchers show small doses of "beneficial trait" training make AI models broadly safer and harder to manipulate

OpenAI researchers show that reinforcement learning on desired behavioral traits like truthfulness and corrigibility works across domains. Training on health data also improved deception detection, and the model scored…

via the-decoder.com

Thursday, June 18, 2026’s edition

20:55:01 · Artificialanalysis

EVALS · 1 source

GLM-5.2 tops open weights models on intelligence index

Artificial Analysis: GLM-5.2 is the leading open weights model on Artificial Analysis' Intelligence Index, scoring 51, only behind Fable 5's 60, Opus 4.8's 56, and GPT-5.5's 55 — Z ai's GLM-5.2 is the new leading…

via artificialanalysis.ai

15:15:57 · Tech Times

RESEARCH · 1 source

AI model boosts Chan-Lam yields across 10,080 reactions

AI Drug Discovery Chemistry Hits Wet Lab: GPT-5.4 Boosts Chan-Lam Yields in 10,080 Reactions

via Tech Times

10:00:00 · OpenAI — Blog

RESEARCH · 1 source

Using AI to help physicians diagnose rare genetic diseases affecting children

Researchers used an OpenAI reasoning model to help diagnose rare diseases, identifying 18 new diagnoses in previously unsolved cases.

via openai.com

16:00:40 · NBC News

RESEARCH · 1 source

AI helped diagnose 18 children whose rare diseases had stumped doctors

via NBC News

15:00:00 · Fortune · +2 sources

AGENTS · 3 sources

Google DeepMind unveils a plan to protect itself from its own rogue AI agents

via Fortune (primary) · Axios · Google DeepMind

Tuesday, June 16, 2026’s edition

02:00:00 · OpenAI — Blog

SAFETY · 1 source

Predicting model behavior before release by simulating deployment

OpenAI introduces Deployment Simulation, a method to predict AI model behavior before deployment using real conversation data to improve safety and evaluation accuracy.

via openai.com

17:46:31 · Google DeepMind — Blog

AGENTS · 1 source

Securing the future of AI agents

Securing internal systems with an AI Control Roadmap, combining traditional safeguards and real-time monitoring.

via deepmind.google

02:53:14 · Crypto Briefing

CLAUDE · 1 source

Anthropic's Claude Fable 5 scores 161 on Epoch Capabilities Index, surpassing GPT-5.5 Pro

via Crypto Briefing

Friday, June 12, 2026’s edition

02:00:00 · Perplexity — Blog

AI · 1 source

How Perplexity Uses Computer

Perplexity published a guide demonstrating how its internal teams use its 'Computer' AI agent to automate workflows across various departments, including recruiting, design, and growth marketing.

via perplexity.ai

23:00:08 · NVIDIA — AI Blog

AGENTS · 1 source

NVIDIA Blackwell Leads on First Agentic AI Infrastructure Benchmark

New AgentPerf results from Artificial Analysis show how accelerated computing systems handle real-world agentic workloads, with NVIDIA GB300 NVL72 running up to 20x more agents per megawatt than NVIDIA Hopper.

via blogs.nvidia.com

Thursday, June 11, 2026’s edition

19:58:35 · arXiv — cs.AI

AGENTS · 1 source

Agents-K1: Towards Agent-native Knowledge Orchestration

via arxiv.org

19:56:35 · arXiv — cs.AI

AGENTS · 1 source

EurekAgent tool enables autonomous scientific discovery

via arxiv.org

19:23:54 · arXiv — cs.AI

AGENTS · 1 source

Agentbeats brings standardized agent assessment framework

via arxiv.org

Wednesday, June 10, 2026’s edition

19:38:24 · The Decoder

SAFETY · 1 source

Anthropic study shows AI needs hours, not weeks, to build exploits from security patches

Anthropic's security team found that its Mythos Preview AI model can turn security patches for Firefox and the Windows kernel into working exploits within hours, for a few thousand dollars and no specialized knowledge…

via the-decoder.com

19:34:55 · Google Research — Blog

SAFETY · 1 source

New framework for auditing machine unlearning

Algorithms & Theory

via research.google

17:15:01 · arXiv — cs.AI

AGENTS · 1 source

Survey examines environment engineering for large language models

via arxiv.org

02:00:00 · Reddit — AI Communities

AI · 1 source

JPMorgan, OQC, and AMD Quantum AI Collaboration

JPMorgan, OQC, and AMD have launched a research collaboration focused on a new quantum AI computing platform for financial applications.

via reddit.com

Tuesday, June 9, 2026’s edition

19:35:37 · arXiv — cs.AI

AGENTS · 1 source

ABC-Bench: An Agentic Bio-Capabilities Benchmark for Biosecurity

via arxiv.org

18:39:32 · arXiv — cs.CL

AGENTS · 1 source

VISTA: A Versatile Interactive User Simulation Toolkit for Agent Evaluation

via arxiv.org
