§ local-llm · storyline

vLLM v0.13.0

vLLM v0.13.0 ships with 442 commits from 207 contributors. It adds new models including BAGEL and AudioFlamingo3, latent MoE support, and xxHash prefix caching, and it introduces breaking changes to attention configuration and PassConfig flags.

Dec 19 · 04:02:22 · primary fetch · 1 source · updated Dec 19 · 04:02:22

vLLM v0.13.0 Release Notes

Highlights

This release features 442 commits from 207 contributors (61 new contributors)!

Breaking Changes: This release includes deprecation removals, PassConfig flag renames, and attention configuration changes from environment variables to CLI arguments. Please review the breaking changes section carefully before upgrading.

Model Support

New models: BAGEL (AR only) (#28439), AudioFlamingo3 (#30539), JAIS 2 (#30188), latent MoE architecture support (#30203).

Tool parsers: DeepSeek-V3.2 (#29848), Gigachat 3 (#29905), Holo2 reasoning (#30048).

Model enhancements: Qwen3-VL embeddings support (#30037), Qwen3-VL EVS (Efficient Video Sampling) (#29752), DeepSeek V3.2 proper `drop_thinking` logic (#30490), DeepSeek V3.2 top-k fix (#27568).

Task expansion: Automatic TokenClassification model conversion (#30666), Ultravox v0.7 transformer projector (#30089).

Quantization: BitsAndBytes for Qwen3-Omni-MoE (#29896).

Speculative decoding: Eagle/Eagle3 Transformers backend (#30340), Mamba `selective_state_update` spec decode (#29488).

Engine Core

Compilation: Conditional compilation via `compile_ranges` for selective kernel compilation (#24252).

Prefix…

read full article on github.com ↗

§ sources · 1 publication · timeline below

github.com · vLLM v0.13.0 · primary · 04:02:22