§ tools · storyline

Transformers v5.2.0

Transformers v5.2.0 adds VoxtralRealtime, a streaming ASR model from Mistral AI, and GLM-5, a 744B-parameter mixture-of-experts model from zAI with DeepSeek Sparse Attention support.

Feb 16 · 19:55:53 · primary fetch1 sourceupdated Feb 16 · 19:55:53

New Model additions VoxtralRealtime VoxtralRealtime is a streaming speech-to-text model from Mistral AI, designed for real-time automatic speech recognition (ASR). Unlike the offline Voxtral model which processes complete audio files, VoxtralRealtime is architected for low-latency, incremental transcription by processing audio in chunks as they arrive. The model combines an audio encoder with a Mistral-based language model decoder, using time conditioning embeddings and causal convolutions with padding caches to enable efficient streaming inference. Add Voxtral Realtime (#43769) by @eustlb GLM-5 - GlmMoeDsa The zAI team launches GLM-5, and introduces it as such: GLM-5, targeting complex systems engineering and long-horizon agentic tasks.

Scaling is still one of the most important ways to improve the intelligence efficiency of Artificial General Intelligence (AGI). Compared to GLM-4.5, GLM-5 scales from 355B parameters (32B active) to 744B parameters (40B active), and increases pre-training data from 23T to 28.5T tokens. GLM-5 also integrates DeepSeek Sparse Attention (DSA), largely reducing deployment cost while preserving long-context capacity. Reinforcement learning aims to…

read full article on github.com ↗

§ sources1 publication · timeline below

github.comtransformers v5.2.0 — v5.2.0: GLM-5, Qwen3.5, Voxtral Realtime, VibeVoice Acoustic Tokenizerprimary19:55:53