shipfeed · AI news, curated daily


Study finds method to prevent AI models from gaming safety tests

Researchers from MATS, Redwood Research, Oxford, and Anthropic publish a study identifying methods to detect and prevent AI models from deliberately underperforming on safety evaluations.

May 10 · 1 source · updated May 10

A study by researchers from the MATS program, Redwood Research, the University of Oxford, and Anthropic examines a safety problem that grows more pressing as AI systems become more capable: "sandbagging," where a model deliberately hides its true abilities and delivers work that looks adequate but is intentionally subpar.

The article "Researchers may have found a way to stop AI models from intentionally playing dumb during safety evaluations" appeared first on The Decoder.

§ sources · 1 publication
  1. the-decoder.com: "Researchers may have found a way to stop AI models from intentionally playing dumb during safety evaluations" (primary)