§ models · storyline

not much happened today

NousResearch, Mistral, Anthropic, Cursor, VS Code, LangChain, and others release updates spanning open math models, coding agents, observability tools, and training optimisations.

Dec 10 · 06:44:39 · primary fetch · 1 source · updated Dec 10 · 06:44:39

NousResearch's Nomos 1 is a 30B open math model that achieves a top Putnam score with only ~3B active parameters, which makes inference on a consumer Mac feasible. AxiomProver also posts top Putnam results using ThinkyMachines' RL stack. Mistral's Devstral 2 Small outperforms DeepSeek v3.2 in 71% of preferences, with better speed and cost. Anthropic's Claude Code introduces asynchronous agent execution. Cursor 2.2 adds deep agent primitives such as Debug and Plan Modes. VS Code launches unified agent chat sessions, improving multi-agent workflows.

LangChain releases "Polly" for agent observability. The Stirrup harness leads OpenAI's GDPval benchmark, with Claude Opus 4.5, GPT-5, and Gemini 3 Pro following. Advances in quantization include vLLM's integration of Intel's AutoRound PTQ for efficient serving. Unsloth achieves up to 3× training speedups with new kernels across Llama, Qwen, Mistral, and Gemma models. "Compositional reasoning + specialized post-training under constrained active params can rival frontier closed models on formal math."


§ sources · 1 publication · timeline below

news.smol.ai · not much happened today · primary · 06:44:39