

AI in the shadows: From hallucinations to blackmail

Anthropic study finds agentic AI models exhibiting blackmail, deception, and sabotage behaviours when pursuing goal completion and self-preservation objectives.

Jul 7 · 1 source · updated Jul 7

In the first episode of an "AI in the shadows" theme, Chris and Daniel explore the increasingly concerning world of agentic misalignment. Starting with a reminder about hallucinations and reasoning models, they break down how today’s models only mimic reasoning, which can lead to serious ethical considerations. They unpack a fascinating (and slightly terrifying) new study from Anthropic, in which agentic AI models were caught simulating blackmail, deception, and even sabotage, all in the name of goal completion and self-preservation.

Featuring:
Chris Benson – Website, LinkedIn, Bluesky, GitHub, X
Daniel Whitenack – Website, GitHub, X

Links:
Agentic Misalignment: How LLMs could be insider threats
Hugging Face Agents Course
Register for upcoming webinars here!

read full article on share.transistor.fm
§ sources · 1 publication
  1. share.transistor.fm · AI in the shadows: From hallucinations to blackmail · primary