§ tools · storyline

Transformers v5.3.0

Hugging Face releases Transformers v5.3.0, adding EuroBERT, Microsoft's VibeVoice ASR, and TimesFM 2.5 among new model integrations.

Mar 4 · 18:42:16 · primary fetch · 1 source · updated Mar 4 · 18:42:16

New Model additions

EuroBERT

EuroBERT is a multilingual encoder model based on a refreshed transformer architecture, akin to Llama but with bidirectional attention. It supports a mixture of European and widely spoken languages, with sequences of up to 8192 tokens.

Links: Documentation | Paper | Blog Post

Add eurobert (#39455) by @ArthurZucker in #39455

VibeVoice ASR

VibeVoice ASR is an automatic speech recognition model from Microsoft that combines acoustic and semantic audio tokenizers with a causal language model for robust speech-to-text transcription. The model uses VibeVoice's acoustic and semantic tokenizers that process audio at 24kHz, paired with a Qwen2-based language decoder for generating transcriptions. It can process up to 60 minutes of continuous audio input, supports customized hotwords, performs joint ASR/diarization/timestamping, and handles over 50 languages with code-switching support.

Links: Documentation | Paper

Add VibeVoice ASR (#43625) by @ebezzam in #43625

TimesFM2.5

TimesFM 2.5 is a pretrained time-series foundation model that uses a decoder-only attention architecture with input patching for forecasting. The model is designed to provide accurate…
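As a rough illustration of the "input patching" mentioned for TimesFM 2.5, the sketch below splits a one-dimensional history into fixed-length, non-overlapping patches, each of which a decoder-only model could then treat as a single input token. This is not the TimesFM implementation: the function name make_patches, the parameters patch_len and pad_value, the patch length of 4, and the choice to left-pad with zeros are all assumptions made for the example.

```python
from typing import List


def make_patches(
    series: List[float], patch_len: int, pad_value: float = 0.0
) -> List[List[float]]:
    """Split a 1-D series into non-overlapping patches of length patch_len.

    Hypothetical helper for illustration only. The series is left-padded
    with pad_value so that its length becomes a multiple of patch_len;
    a real model may pad or mask differently.
    """
    if patch_len <= 0:
        raise ValueError("patch_len must be positive")
    remainder = len(series) % patch_len
    pad = (patch_len - remainder) % patch_len  # 0 when already aligned
    padded = [pad_value] * pad + list(series)
    # Walk the padded series in steps of patch_len, slicing out each patch.
    return [padded[i:i + patch_len] for i in range(0, len(padded), patch_len)]


# Ten observations with an (arbitrary) patch length of 4 give three patches,
# the first of which carries two padding values.
history = [float(t) for t in range(10)]
patches = make_patches(history, patch_len=4)
print(len(patches), patches[0], patches[-1])
```

Grouping several time steps into one patch shortens the sequence the attention layers see, which is the usual motivation for patching in time-series transformers.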


§ sources · 1 publication · timeline below

github.com · transformers v5.3.0 · v5.3.0: EuroBERT, VibeVoice ASR, TimesFM2.5, PP-DocLayoutV2, OlmoHybrid, ModernVBert, Higgs Audio V2 · primary · 18:42:16