§ feed · storyline

o1 destroys Lmsys Arena, Qwen 2.5, Kyutai Moshi release

OpenAI's o1-preview tops the LMsys Arena leaderboard as Alibaba releases Qwen 2.5, which surpasses Llama 3.1 at the 70B scale, and Kyutai publishes Moshi, an open-weights real-time voice model.

Sep 18 · 23:51:26 · primary fetch · 1 source · updated Sep 18 · 23:51:26

OpenAI's o1-preview model has reached a milestone by fully matching the top daily AI news stories without human intervention, consistently outperforming models from Anthropic and Google, as well as Llama 3, in vibe-check evaluations. OpenAI models hold the top 4 slots on the LMsys leaderboard, and their rate limits are rising to 500-1000 requests per minute. In open source, Alibaba's Qwen 2.5 suite surpasses Llama 3.1 at the 70B scale. Alibaba also updated its closed Qwen-Plus models, which outperform DeepSeek V2.5 but still lag behind the leading American models.

Kyutai released Moshi, its open-weights real-time voice model, which features a unique streaming neural architecture with an "inner monologue." Weights & Biases introduced Weave, an LLM observability toolkit that enhances experiment tracking and evaluation, turning prompting into a more scientific process. The news also highlights upcoming events such as the WandB LLM-as-judge hackathon in San Francisco. Two quotes from the source: "o1-preview consistently beats out our vibe check evals" and "OpenAI models are gradually raising rate limits by the day."

read full article on news.smol.ai ↗

§ sources · 1 publication · timeline below

news.smol.ai · o1 destroys Lmsys Arena, Qwen 2.5, Kyutai Moshi release · primary · 23:51:26