08:16 CET · Wednesday · May 13, 2026

shipfeed


vLLM — Releases

https://github.com/vllm-project/vllm · tool · 21 items · last fetched


21 items · latest

ai

vllm v0.20.2

vLLM v0.20.2 Highlights: This release features 6 commits from 6 contributors (0 new)! This is a small patch release with bug fixes for DeepSeek V4, gpt-oss, and Qwen3-VL. Bug Fixes: DeepSeek V4 sparse attention: Re-enable…

ai

vllm v0.20.1

vLLM v0.20.1 This is a patch release on top of `v0.20.0` primarily focused on DeepSeek V4 stabilization and performance improvements, along with several important bug fixes. DeepSeek V4 Base model support (#41006)…

ai

vllm v0.20.0

vLLM v0.20.0 Highlights This release features 752 commits from 320 contributors (123 new)! DeepSeek V4: Initial DeepSeek V4 support landed (#40860), with DSML token-leakage fix in DSV4/3.2 (#40806), DSA + MTP IMA fix…

ai

vllm v0.19.1

This is a patch release on top of `v0.19.0` with Transformers v5.5.3 upgrade and bug fixes for Gemma4: Update to transformers v5 (#30566) [Bugfix] Fix invalid JSON in Gemma 4 streaming tool calls by stripping partial…

ai

vllm v0.19.0

vLLM v0.19.0 Highlights This release features 448 commits from 197 contributors (54 new)! Gemma 4 support: Full Google Gemma 4 architecture support including MoE, multimodal, reasoning, and tool-use capabilities…

ai

vllm v0.18.1

This is a patch release on top of v0.18.0 to address a few issues: Change default SM100 MLA prefill backend back to TRT-LLM (#38562) Fix mock.patch resolution failure for standalone_compile.FakeTensorMode on Python <=…

ai

vllm v0.18.0

vLLM v0.18.0 Known issues: Degraded accuracy when serving Qwen3.5 with FP8 KV cache on B200 (#37618). If you previously ran into `CUBLAS_STATUS_INVALID_VALUE` and had to use a workaround in `v0.17.0`, you can reinstall…

ai

vllm v0.17.1

This is a patch release on top of `v0.17.0` to address a few issues: New Model: Nemotron 3 Super; Fix passing of activation_type to trtllm fused MoE NVFP4 and FP8 (#36017); Fix/resupport nongated fused moe triton…

ai

vllm v0.17.0

vLLM v0.17.0 Known Issue: If you are on CUDA 12.9+ and encounter a `CUBLAS_STATUS_INVALID_VALUE` error, this is caused by a CUDA library mismatch. To resolve, try one of the following: 1. Remove the path to system CUDA…

ai

vllm v0.16.0

vLLM v0.16.0 Please note that this release was branch-cut on Feb 8, so any features added to vLLM after that date are not included. Highlights: This release features 440 commits from 203 contributors (7 new)! Async…

ai

vllm v0.15.1

v0.15.1 is a patch release with security fixes, RTX Blackwell GPU support fixes, and bug fixes. Security: CVE-2025-69223: Updated aiohttp dependency (#33621); CVE-2026-0994: Updated Protobuf dependency (#33619)…

ai

vllm v0.15.0

Highlights This release features 335 commits from 158 contributors (39 new)! Model Support New architectures: Kimi-K2.5 (#33131), Molmo2 (#30997), Step3vl 10B (#32329), Step1 (#32511), GLM-Lite (#31386), Eagle2.5-8B…

ai

vllm v0.14.0

Highlights This release features approximately 660 commits from 251 contributors (86 new contributors). Breaking Changes: Async scheduling is now enabled by default; users who experience issues can disable with…

ai

vllm v0.13.0

vLLM v0.13.0 Release Notes Highlights This release features 442 commits from 207 contributors (61 new contributors)! Breaking Changes: This release includes deprecation removals, PassConfig flag renames, and…

ai

vllm v0.12.0

vLLM v0.12.0 Release Notes Highlights This release features 474 commits from 213 contributors (57 new)! Breaking Changes: This release includes PyTorch 2.9.0 upgrade (CUDA 12.9), V0 deprecations including…

ai

vllm v0.11.2

This release includes 4 bug fixes on top of `v0.11.1`: [BugFix] Ray with multiple nodes (https://github.com/vllm-project/vllm/pull/28873) [BugFix] Fix false assertion with spec-decode=[2,4,..] and TP>2…

ai

vllm v0.11.1

Highlights This release includes 1456 commits from 449 contributors (184 new contributors)! Key changes include: PyTorch 2.9.0 + CUDA 12.9.1: Updated the default CUDA build to `torch==2.9.0+cu129`, enabling Inductor…

ai

vllm v0.11.0

Highlights This release features 538 commits from 207 contributors (65 new contributors)! This release completes the removal of the V0 engine. V0 engine code including AsyncLLMEngine, LLMEngine, MQLLMEngine, all attention…
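Since the V0 engine (and classes like AsyncLLMEngine) disappears entirely at v0.11.0, downstream code that still touches V0 APIs needs a version gate. A minimal sketch, assuming the installed version string follows the usual "major.minor.patch" form (the helper name is hypothetical, not a vLLM API):

```python
# Hypothetical guard: vLLM removed the V0 engine in v0.11.0, so code that
# still imports V0 classes (e.g. AsyncLLMEngine) should check the version
# string first. Assumes a "major.minor.patch"-style version.
def v0_engine_removed(version: str) -> bool:
    major, minor, *_ = (int(part) for part in version.split("."))
    return (major, minor) >= (0, 11)

print(v0_engine_removed("0.10.2"))  # False: V0 engine still present
print(v0_engine_removed("0.11.0"))  # True: V0 engine removed
```

In practice the version string would come from `importlib.metadata.version("vllm")` on the installed package.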

ai

vllm v0.10.2

Highlights This release contains 740 commits from 266 contributors (97 new)! Breaking Changes: This release includes PyTorch 2.8.0 upgrade, V0 deprecations, and API changes - please review the changelog carefully…
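With breaking changes like these landing between releases, pinning the exact version in a requirements file avoids surprise upgrades. A minimal requirements.txt fragment (the version shown is simply this item's release):

```
vllm==0.10.2
```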

ai

vllm v0.10.1.1

This is a critical bugfix and security release: Fix CUTLASS MLA Full CUDAGraph (#23200); Limit HTTP header count and size (#23267): https://github.com/vllm-project/vllm/security/advisories/GHSA-rxc4-3w6r-4v47; Do not use…
