§ local-llm · storyline

vLLM v0.20.0

vLLM v0.20.0 ships with DeepSeek V4 support, CUDA 13.0 as the default, a PyTorch 2.11 upgrade, Python 3.14 support, and HuggingFace Transformers v5 compatibility, across 752 commits from 320 contributors.

Apr 27 · 23:20:28 · primary fetch · 1 source · updated Apr 27 · 23:20:28

vLLM v0.20.0 Highlights

This release features 752 commits from 320 contributors (123 new)!

DeepSeek V4: Initial DeepSeek V4 support landed (#40860), with a DSML token-leakage fix in DSV4/3.2 (#40806), a DSA + MTP IMA fix (#40772), and a silu clamp limit on the shared expert (#40950).

CUDA 13.0 default: The default CUDA wheel on PyPI and the `vllm/vllm-openai:v0.20.0` image have switched to CUDA 13.0; architecture lists and build-args were cleaned up (#39878), and CUDA was bumped to 13.0.2 to match PyTorch 2.11.0 (#40669). As a general rule of thumb, our CUDA version policy follows PyTorch's. We highly recommend installing vLLM with `uv` and using `--torch-backend=cu129` if you are on CUDA 12.9.
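As a rough illustration of the install advice above, the sketch below builds a `uv pip install` argument list from a local CUDA version. The function name `vllm_install_command` and the pinned `vllm==0.20.0` requirement are illustrative assumptions, not part of the release notes; the only mapping taken from the notes is `--torch-backend=cu129` for CUDA 12.9, with CUDA 13.0 treated as the default wheel that needs no extra flag. Any other CUDA version is rejected rather than guessed at.

```python
def vllm_install_command(cuda_version: str, vllm_version: str = "0.20.0") -> list[str]:
    """Build a `uv pip install` argument list for a local CUDA version.

    Hypothetical helper: only the two cases mentioned in the release
    notes are handled (CUDA 13.x default wheel, CUDA 12.9 via cu129).
    """
    # Keep only major.minor, e.g. "13.0.2" -> (13, 0).
    major, minor = (int(part) for part in cuda_version.split(".")[:2])

    command = ["uv", "pip", "install", f"vllm=={vllm_version}"]
    if (major, minor) == (12, 9):
        # The notes recommend this backend selector for CUDA 12.9 hosts.
        command.append("--torch-backend=cu129")
    elif major != 13:
        # The excerpt gives no guidance for other CUDA versions.
        raise ValueError(f"no documented install path for CUDA {cuda_version}")
    return command
```

For example, `" ".join(vllm_install_command("12.9"))` yields a command line that can be pasted into a shell, while a CUDA 13.0.2 host gets the plain install with no backend flag.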

PyTorch 2.11 upgrade (#34644): vLLM ships on torch 2.11 for CUDA, and XPU is now also on torch 2.11 (#37947); XPU is no longer pinned to 2.10. This is a breaking change for environment dependencies.

Python 3.14: Added to the supported Python version list (#34770).

Transformers v5: vLLM now runs on HuggingFace `transformers>=5` (#30566), with a vision-encoder torch.compile bypass (#30518) and continued v4/v5 compat fixes including PaddleOCR-VL image processor `max_pixels` (#38629), Mistral YaRN warning (#37292), and Jina…
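Because the torch upgrade is described as a breaking change for environment dependencies, it may help to check what is installed before upgrading. The following is a minimal sketch, not an official vLLM check: the helper names (`version_tuple`, `installed_version`, `describe_environment`) are made up for illustration, and the only facts taken from the notes are that this release targets torch 2.11 and runs on `transformers>=5`. Versions are read from package metadata, so nothing heavy is imported.

```python
from __future__ import annotations

import re
from importlib import metadata


def version_tuple(version: str, parts: int = 2) -> tuple[int, ...]:
    """Parse leading numeric components, e.g. '2.11.0+cu130' -> (2, 11)."""
    numbers = []
    # Drop any local build tag after '+', then read up to `parts` fields.
    for piece in version.split("+")[0].split(".")[:parts]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def installed_version(package: str) -> str | None:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def describe_environment(
    torch_version: str | None, transformers_version: str | None
) -> dict[str, bool]:
    """Compare installed versions with those named in the release notes."""
    return {
        # The notes say vLLM v0.20.0 ships on torch 2.11.
        "torch_is_2_11": torch_version is not None
        and version_tuple(torch_version) == (2, 11),
        # The notes say vLLM now runs on transformers>=5.
        "transformers_is_v5_or_later": transformers_version is not None
        and version_tuple(transformers_version, 1) >= (5,),
    }
```

A typical call would be `describe_environment(installed_version("torch"), installed_version("transformers"))`. Since the notes also mention continued v4/v5 compatibility fixes, a `False` for the transformers entry is informational rather than an error.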


§ sources · 1 publication · timeline below

github.com · vllm v0.20.0 · primary · 23:20:28