
vLLM v0.15.1

vLLM releases v0.15.1 with security patches for CVE-2025-69223 and CVE-2026-0994, RTX Blackwell GPU fixes, reduced torch.compile cold-start times, and new Step-3.5-Flash model support.

Feb 4 · 21:48:08 · primary fetch · 1 source · updated Feb 4 · 21:48:08

v0.15.1 is a patch release with security fixes, RTX Blackwell GPU support fixes, and bug fixes.

Security
- CVE-2025-69223: Updated aiohttp dependency (#33621)
- CVE-2026-0994: Updated Protobuf dependency (#33619)

Highlights

Bugfix Hardware Support
- RTX Blackwell (SM120): Fixed NVFP4 MoE kernel support for RTX Blackwell workstation GPUs. Previously, NVFP4 MoE models would fail to load on these GPUs (#33417)
- FP8 kernel selection: Fixed FP8 CUTLASS group GEMM to properly fall back to Triton kernels on SM120 GPUs (#33285)

Model Support
- Step-3.5-Flash: New model support (#33523)

Bugfix Model Support
- Qwen3-VL-Reranker: Fixed model loading (#33298)
- Whisper: Fixed FlashAttention2 with full CUDA graphs (#33360)

Performance
- torch.compile cold-start: Fixed a regression that increased cold-start compilation time (Llama3-70B: ~88s → ~22s) (#33441)
- MoE forward pass: Optimized by caching layer name computation (#33184)

Bug Fixes
- Fixed prefix cache hit rate of 0% with GPT-OSS-style hybrid attention models (#33524)
- Enabled Triton MoE backend for FP8 per-tensor dynamic quantization (#33300)
- Disabled unsupported Renormalize routing methods for TRTLLM per-tensor FP8 MoE (#33620)
- Fixed speculative decoding…
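
Because both CVE fixes arrive as dependency updates bundled with this release, one practical check is whether an environment's installed vLLM is at least v0.15.1. The following is a minimal sketch, not tooling from the vLLM project: `PATCH_RELEASE`, `parse_release` and `is_at_least_patch` are hypothetical names, and it assumes a PEP 440-style version string such as `0.15.1` or `0.15.1+cu128`. It reads only vLLM's own version through `importlib.metadata`; it does not inspect the installed aiohttp or Protobuf versions, since the notes quoted here do not say which versions of those libraries carry the fixes.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Hypothetical constant: the release that, per the notes, bundles the CVE fixes.
PATCH_RELEASE = (0, 15, 1)

# Leading numeric release segment, with an optional "v" prefix, then the rest.
_RELEASE_RE = re.compile(r"\s*v?(\d+(?:\.\d+)*)(.*)")
# Common PEP 440 pre-release / dev markers that sort *before* the final release.
_PRE_RE = re.compile(r"[-_.]?(?:a|b|c|rc|alpha|beta|pre|preview|dev)\d*", re.IGNORECASE)


def parse_release(ver: str) -> tuple[tuple[int, ...], bool]:
    """Return (release tuple padded to 3 parts, is_prerelease) for a version string."""
    match = _RELEASE_RE.match(ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    parts = tuple(int(p) for p in match.group(1).split("."))
    parts += (0,) * (3 - len(parts))  # '0.16' -> (0, 16, 0); no-op for 3+ parts
    # Local labels such as '+cu128' and '.post1' suffixes are not pre-releases.
    is_pre = _PRE_RE.match(match.group(2)) is not None
    return parts, is_pre


def is_at_least_patch(ver: str, target: tuple[int, ...] = PATCH_RELEASE) -> bool:
    """True if `ver` is the target final release or anything newer."""
    release, is_pre = parse_release(ver)
    # A pre-release of the target itself (e.g. 0.15.1rc1) predates the final release.
    return release > target or (release == target and not is_pre)


def main() -> None:
    try:
        installed = version("vllm")
    except PackageNotFoundError:
        print("vllm is not installed in this environment")
        return
    status = "includes" if is_at_least_patch(installed) else "predates"
    print(f"vllm {installed} {status} the v0.15.1 patch release")


if __name__ == "__main__":
    main()
```

The hand-rolled parser only separates final releases from common pre-release suffixes; where the third-party `packaging` library is available, `packaging.version.Version` implements the full PEP 440 ordering and is the sturdier choice.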


§ sources · 1 publication

github.com · vllm v0.15.1 · primary · 21:48:08