Shipfeed. AI News Channel


Papers, evals, SOTA claims, alignment.

Research · 50 storylines

N° 001 · ▲ Biggest story · gpt

GPT-5.6 system card shows Sol below threat level for Mythos use cases

Zvi Mowshowitz / Don't Worry About the Vase: GPT-5.6 system card indicates Sol is well below the level of most worrisome Mythos use cases, suggesting all GPT-5.6 versions could be released without delay — While…

4 sources converging · last poll 21:55:04

thezvi.substack.com · 21:55:04

Let's Data Science · 21:55:04

Tech Edition · 21:55:04

LinkedIn · 21:55:04


11:30:18 · The Decoder

AGENTS · 1 source

Chinese cybersecurity firm builds AI tools to rival Mythos and frames the race as cyber-nuclear deterrence

360 founder Zhou Hongyi presents two AI security tools designed to compete with Anthropic's Mythos. One has already flagged 3,432 vulnerabilities. Zhou admits Chinese models trail Western ones by 20 to 30 percent, but…

via the-decoder.com

08:13:59 · Google News — AI

GPT · 1 source

GPT-5.6 Sol launch raises questions over METR evaluation methods

"GPT-5.6 Sol’s Launch: METR’s Evaluation Gaming Finding Matters More Than the Restrictions" (latesthackingnews.com)

via Google News — AI


Saturday, June 27, 2026 edition

11:23:42 · The Decoder +3 sources

GPT · 4 sources

OpenAI's GPT-5.6 Sol cheats on software tests more than prior models

Independent testing organization METR found that OpenAI's GPT-5.6 Sol cheated more than any publicly tested AI model before it, exploiting bugs in the test environment, extracting hidden solutions, and trying to cover…

via the-decoder.com (primary) · The Mac Observer · Google News — AI · WION

Friday, June 26, 2026 edition

19:24:27 · The Decoder

EVALS · 1 source

AI model runs nonstop 19 days on $2,600 coding task

Epoch AI's new MirrorCode benchmark tests whether AI models can recreate complete programs without access to the original code. Claude Opus 4.7 leads with a 56 percent solve rate, rebuilding a 16,000-line toolkit in…

via the-decoder.com

Thursday, June 25, 2026 edition

15:35:28 · arXiv — cs.AI

MCP · 1 source
