# vLLM v0.13.0 Release Notes

## Highlights

This release features 442 commits from 207 contributors (61 new contributors)!

**Breaking Changes:** This release includes deprecation removals, `PassConfig` flag renames, and attention configuration changes from environment variables to CLI arguments. Please review the breaking changes section carefully before upgrading.

## Model Support

- **New models:** BAGEL (AR only) (#28439), AudioFlamingo3 (#30539), JAIS 2 (#30188), latent MoE architecture support (#30203).
- **Tool parsers:** DeepSeek-V3.2 (#29848), Gigachat 3 (#29905), Holo2 reasoning (#30048).
- **Model enhancements:** Qwen3-VL embeddings support (#30037), Qwen3-VL EVS (Efficient Video Sampling) (#29752), DeepSeek V3.2 proper `drop_thinking` logic (#30490), DeepSeek V3.2 top-k fix (#27568).
- **Task expansion:** Automatic TokenClassification model conversion (#30666), Ultravox v0.7 transformer projector (#30089).
- **Quantization:** BitsAndBytes support for Qwen3-Omni-MoE (#29896).
- **Speculative decoding:** Eagle/Eagle3 Transformers backend (#30340), Mamba `selective_state_update` spec decode (#29488).

## Engine Core

- **Compilation:** Conditional compilation via `compile_ranges` for selective kernel compilation (#24252).
- **Prefix…**
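The idea behind range-gated compilation can be sketched in plain Python. This is an illustrative sketch only, not vLLM's actual implementation: the names `in_compile_ranges` and `SelectiveCompiler` are hypothetical, and the assumption is that a "compile range" is an inclusive `(lo, hi)` interval over a runtime size (e.g. token count), with compilation triggered lazily only for sizes inside a configured range and eager execution used otherwise.

```python
# Illustrative sketch of selective, range-gated compilation.
# All names here are hypothetical, not vLLM APIs.

def in_compile_ranges(num_tokens: int, compile_ranges: list[tuple[int, int]]) -> bool:
    """Return True if num_tokens falls inside any inclusive (lo, hi) range."""
    return any(lo <= num_tokens <= hi for lo, hi in compile_ranges)


class SelectiveCompiler:
    """Compile lazily per range; fall back to the eager function otherwise."""

    def __init__(self, eager_fn, compile_fn, compile_ranges):
        self.eager_fn = eager_fn
        self.compile_fn = compile_fn        # stand-in for a real compiler pass
        self.compile_ranges = compile_ranges
        self._cache = {}                    # range -> compiled function

    def __call__(self, num_tokens, *args):
        for rng in self.compile_ranges:
            lo, hi = rng
            if lo <= num_tokens <= hi:
                # Compile once per matching range, then reuse.
                if rng not in self._cache:
                    self._cache[rng] = self.compile_fn(self.eager_fn)
                return self._cache[rng](num_tokens, *args)
        # Outside all configured ranges: run eagerly.
        return self.eager_fn(num_tokens, *args)
```

Under this sketch, sizes outside every configured range never pay compilation cost, which is the stated motivation for selective kernel compilation.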