AI model runs nonstop 19 days on $2,600 coding task
Epoch AI's MirrorCode benchmark tests AI models' ability to recreate programs from scratch, with Claude Opus 4.7 leading at a 56 percent solve rate on code reconstruction tasks.
Epoch AI's new MirrorCode benchmark tests whether AI models can recreate complete programs without access to the original code. Claude Opus 4.7 leads with a 56 percent solve rate, rebuilding a 16,000-line toolkit in just 14 hours.
But every model tested still fails on the most complex tasks. The article "An AI model programmed nonstop for 19 days on a single MirrorCode task that cost $2,600 to run" appeared first on The Decoder.