Mistral releases Voxtral speech model, outperforms Whisper
Mistral releases Voxtral, a speech transcription model in 3B and 24B sizes that supports a 32k-token context, multilingual audio of up to 40 minutes, and built-in Q&A, summarization, and function calling.
Mistral surprises with the release of Voxtral, a transcription model that outperforms Whisper large-v3, GPT-4o mini Transcribe, and Gemini 2.5 Flash. The Voxtral models (3B and 24B) support a 32k-token context length, handle audio of up to 30 to 40 minutes, offer built-in Q&A and summarization, are multilingual, and enable function calling from voice commands. They are powered by the Mistral Small 3.1 language model backbone.
Meanwhile, Moonshot AI's Kimi K2, a non-reasoning Mixture of Experts (MoE) model built by a team of around 200 people, gains attention for very fast inference on Groq hardware, broad availability on platforms including Together AI and DeepInfra, and the ability to run locally on an M4 Max Mac with 128GB of memory. Developer tool integrations include LangChain and Hugging Face support, and they highlight Kimi K2's strong tool-use capabilities.