# vLLM v0.13.0 Release Notes

## Highlights

This release features 442 commits from 207 contributors (61 new contributors)!

**Breaking Changes:** This release includes deprecation removals, `PassConfig` flag renames, and attention configuration changes from environment variables to CLI arguments. Please review the breaking changes section carefully before upgrading.

## Model Support

- **New models:** BAGEL (AR only) (#28439), AudioFlamingo3 (#30539), JAIS 2 (#30188), latent MoE architecture support (#30203).
- **Tool parsers:** DeepSeek-V3.2 (#29848), Gigachat 3 (#29905), Holo2 reasoning (#30048).
- **Model enhancements:** Qwen3-VL embeddings support (#30037), Qwen3-VL EVS (Efficient Video Sampling) (#29752), DeepSeek V3.2 proper `drop_thinking` logic (#30490), DeepSeek V3.2 top-k fix (#27568).
- **Task expansion:** Automatic TokenClassification model conversion (#30666), Ultravox v0.7 transformer projector (#30089).
- **Quantization:** BitsAndBytes support for Qwen3-Omni-MoE (#29896).
- **Speculative decoding:** Eagle/Eagle3 Transformers backend (#30340), Mamba `selective_state_update` spec decode (#29488).

## Engine Core

- **Compilation:** Conditional compilation via `compile_ranges` for selective kernel compilation (#24252).
- **Prefix…**
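The idea behind range-gated compilation can be sketched in plain Python. This is an illustrative sketch only, not vLLM's actual implementation: the names `in_compile_ranges` and `SelectiveCompiler` are hypothetical, and the assumption is that a "compile range" is an inclusive `(lo, hi)` interval over a runtime size (e.g. token count), with compilation triggered lazily only for sizes inside a configured range and eager execution used otherwise.

```python
# Illustrative sketch of selective, range-gated compilation.
# All names here are hypothetical, not vLLM APIs.

def in_compile_ranges(num_tokens: int, compile_ranges: list[tuple[int, int]]) -> bool:
    """Return True if num_tokens falls inside any inclusive (lo, hi) range."""
    return any(lo <= num_tokens <= hi for lo, hi in compile_ranges)


class SelectiveCompiler:
    """Compile lazily per range; fall back to the eager function otherwise."""

    def __init__(self, eager_fn, compile_fn, compile_ranges):
        self.eager_fn = eager_fn
        self.compile_fn = compile_fn        # stand-in for a real compiler pass
        self.compile_ranges = compile_ranges
        self._cache = {}                    # range -> compiled function

    def __call__(self, num_tokens, *args):
        for rng in self.compile_ranges:
            lo, hi = rng
            if lo <= num_tokens <= hi:
                # Compile once per matching range, then reuse.
                if rng not in self._cache:
                    self._cache[rng] = self.compile_fn(self.eager_fn)
                return self._cache[rng](num_tokens, *args)
        # Outside all configured ranges: run eagerly.
        return self.eager_fn(num_tokens, *args)
```

Under this sketch, sizes outside every configured range never pay compilation cost, which is the stated motivation for selective kernel compilation.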