
Mistral releases Voxtral, a speech model that outperforms Whisper

Mistral releases Voxtral, a speech transcription model in 3B and 24B sizes that supports a 32k-token context, multilingual audio of up to 30-40 minutes, and built-in Q&A, summarization, and function calling.

Jul 15 · 07:44:39 · primary fetch · 1 source · updated Jul 15 · 07:44:39

Mistral surprises with the release of Voxtral, a transcription model that outperforms Whisper large-v3, GPT-4o mini Transcribe, and Gemini 2.5 Flash. The Voxtral models (3B and 24B) support a 32k-token context length and handle audio of up to 30-40 minutes. They are multilingual, offer built-in Q&A and summarization, and enable function calling from voice commands. Voxtral is powered by the Mistral Small 3.1 language model backbone.

Meanwhile, Moonshot AI's Kimi K2, a non-reasoning Mixture of Experts (MoE) model built by a team of around 200 people, is gaining attention for its very fast inference on Groq hardware, its broad platform availability (including Together AI and DeepInfra), and its ability to run locally on an M4 Max Mac with 128GB of memory.

Developer tool integrations include LangChain and Hugging Face support, highlighting Kimi K2's strong tool-use capabilities.


§ sources · 1 publication · timeline below

news.smol.ai · Voxtral - Mistral's SOTA ASR model in 3B (mini) and 24B ("small") sizes beats OpenAI Whisper large-v3 · primary · 07:44:39