Transformers v5.3.0
Hugging Face releases Transformers v5.3.0, adding EuroBERT, Microsoft's VibeVoice ASR, and TimesFM 2.5 among new model integrations.
New Model additions

EuroBERT

EuroBERT is a multilingual encoder model based on a refreshed transformer architecture, akin to Llama but with bidirectional attention. It supports a mixture of European and widely spoken languages, with sequences of up to 8192 tokens.

Links: Documentation | Paper | Blog Post

Add eurobert (#39455) by @ArthurZucker in #39455

VibeVoice ASR

VibeVoice ASR is an automatic speech recognition model from Microsoft that combines acoustic and semantic audio tokenizers with a causal language model for robust speech-to-text transcription. The model uses VibeVoice's acoustic and semantic tokenizers that process audio at 24 kHz, paired with a Qwen2-based language decoder for generating transcriptions. It can process up to 60 minutes of continuous audio input, supports customized hotwords, performs joint ASR/diarization/timestamping, and handles over 50 languages with code-switching support.

Links: Documentation | Paper

Add VibeVoice ASR (#43625) by @ebezzam in #43625

TimesFM2.5

TimesFM 2.5 is a pretrained time-series foundation model that uses a decoder-only attention architecture with input patching for forecasting. The model is designed to provide accurate…
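
To make the "input patching" idea mentioned for TimesFM 2.5 concrete, here is a minimal, library-free sketch of how a 1-D series could be cut into fixed-length patches before being fed to a decoder. It is an illustration only, not the Transformers or TimesFM preprocessing code: the function name `make_patches`, the `patch_len` parameter, and the choice to left-pad with zeros are all assumptions made for this example.

```python
def make_patches(series, patch_len):
    """Split a 1-D series into non-overlapping patches of length patch_len.

    Hypothetical helper for illustration. The series is left-padded with
    zeros (an assumption) so that its length is a multiple of patch_len.
    """
    if patch_len <= 0:
        raise ValueError("patch_len must be a positive integer")
    # Number of zeros needed to reach the next multiple of patch_len.
    pad = (-len(series)) % patch_len
    padded = [0.0] * pad + list(series)
    # Each patch would become one input token for the decoder.
    return [padded[i:i + patch_len] for i in range(0, len(padded), patch_len)]


patches = make_patches([1.0, 2.0, 3.0, 4.0, 5.0], patch_len=2)
print(patches)
```

With patching, the sequence length seen by the attention layers shrinks by roughly a factor of `patch_len`, which is the usual motivation for the technique in time-series models.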