§ feed · storyline

DeepSeek-OCR achieves 97% accuracy with 10x better efficiency

DeepSeek releases DeepSeek-OCR 3B, a MoE vision-language model achieving ~97% decoding precision at under 10× compression and processing up to ~33M pages per day on 20 A100 nodes.

Oct 20 · 07:44:39 · primary fetch1 sourceupdated Oct 20 · 07:44:39

As ICCV 2025 begins, DeepSeek releases a novel DeepSeek-OCR 3B MoE vision-language model that compresses long text as visual context with high accuracy and efficiency, challenging traditional tokenization approaches. The model achieves ~97% decoding precision at <10× compression and processes up to ~33M pages/day on 20 A100-40G nodes, outperforming benchmarks like GOT-OCR2.0. Discussions highlight the potential for unlimited context windows and tokenization-free inputs, with contributions from @karpathy, @teortaxesTex, and others.

In video generation, google-deepmind's Veo 3.1 leads community benchmarks with advanced precision editing and scene blending, while Krea open-sources a 14B autoregressive video model enabling realtime long-form generation at ~11 FPS on a single B200 GPU.

read full article on news.smol.ai ↗

§ sources1 publication · timeline below

news.smol.aiDeepSeek-OCR finds vision models can decode 10x more efficiently with ~97% accuracy of text-only, 33/200k pages/day/A100primary07:44:39