129 stories · 7d·6 sources covering·30 active storylines
Updated Sun, 28 Jun 2026 CEST·129 new storylines this week·live
What this is
AI safety spans alignment research, model evaluations, and incident reports aimed at keeping AI systems reliable and controllable. shipfeed tracks safety research and disclosures.
Independent testing organization METR found that OpenAI's GPT-5.6 Sol cheated more than any publicly tested AI model before it, exploiting bugs in the test environment, extracting hidden solutions, and trying to cover…
Zvi Mowshowitz / Don't Worry About the Vase: GPT-5.6 system card indicates Sol is well below the level of most worrisome Mythos use cases, suggesting all GPT-5.6 versions could be released without delay — While…
Nimbus builds production AI systems — internal tools, customer agents, retrieval pipelines — combining humans and AI end-to-end. From scoped pilot to production in 4–8 weeks.
Anthropic claims that China's Alibaba used 25,000 fake accounts and 28.8 million exchanges to illicitly 'distill' its Claude model — violations occurred from April to June 2026 Tom's Hardware
OpenAI is expanding its Daybreak cybersecurity initiative with an updated Codex Security plugin, the full GPT-5.5-Cyber model, and a partner network with more than 25 security firms and several governments. The focus…
Lily Hay Newman / Wired: OpenAI unveils an updated GPT-5.5-Cyber model, launches the Patch the Planet initiative in partnership with Trail of Bits to fix open source bugs, and more — Amid concerns about AI…
OpenAI introduces new Daybreak tools, including Codex Security and GPT-5.5-Cyber, to help organizations find, validate, and patch vulnerabilities at scale.
OpenAI introduces Deployment Simulation, a method to predict AI model behavior before deployment using real conversation data to improve safety and evaluation accuracy.
Nimbus builds production AI systems — internal tools, customer agents, retrieval pipelines — combining humans and AI end-to-end. From scoped pilot to production in 4–8 weeks.
Ben Thompson / Stratechery: Anthropic's belief in its own commitment to safety gives Anthropic the license to aggressively favor its business and even challenge the US government — I'm sympathetic to the cynics…
Anthropic has apologized for stealthily throttling its new AI model, Claude Fable 5, with hidden guardrails that undermine both researchers and rivals using it to develop competing systems. The company says it is…
Patch Changes bae5e2b: fix(security): re-validate tool approvals from client message history before execution The approval-replay path in `generateText`/`streamText` (and `WorkflowAgent.stream`) reconstructed approved…
Anthropic's security team found that its Mythos Preview AI model can turn security patches for Firefox and the Windows kernel into working exploits within hours, for a few thousand dollars and no specialized knowledge…
Sam Sabin / Axios: Anthropic researchers say Mythos Preview can now turn publicly disclosed software vulnerabilities, or N-days, into working exploits in hours instead of weeks — Anthropic's Mythos Preview can…