
vLLM v0.11.2

vLLM has released v0.11.2, a patch release with four bug fixes. They cover multi-node Ray support, a false assertion in speculative decoding with TP>2, async scheduling with FlashAttn MLA, and guarding an SM100 CUTLASS MoE macro so it applies only to SM100 builds.

Nov 20 · 08:29:19 · 1 source

This release includes 4 bug fixes on top of `v0.11.1`:

- [BugFix] Ray with multiple nodes (https://github.com/vllm-project/vllm/pull/28873)
- [BugFix] Fix false assertion with spec-decode=[2,4,..] and TP>2 (https://github.com/vllm-project/vllm/pull/29036)
- [BugFix] Fix async-scheduling + FlashAttn MLA (https://github.com/vllm-project/vllm/pull/28990)
- [NVIDIA] Guard SM100 CUTLASS MoE macro to SM100 builds v2 (https://github.com/vllm-project/vllm/pull/28938)
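The following is a minimal sketch for checking whether a Python environment already has these fixes, meaning it runs vLLM at or above v0.11.2. It assumes vLLM is installed as the `vllm` distribution. The helpers `parse_release`, `includes_v0112_fixes`, and `installed_vllm_version` are hypothetical names for illustration. Only `importlib.metadata.version` is a standard-library call.

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, version

# First release that contains the four fixes listed above.
FIXED_IN = (0, 11, 2)


def parse_release(ver: str) -> tuple[int, int, int]:
    """Extract MAJOR.MINOR.PATCH from a version string (hypothetical helper).

    Simplifying assumption: suffixes such as "+cu129", ".dev0" or "rc1" are
    ignored, so a pre-release of 0.11.2 is treated as 0.11.2 itself.
    """
    m = re.match(r"v?(\d+)\.(\d+)(?:\.(\d+))?", ver)
    if m is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def includes_v0112_fixes(ver: str) -> bool:
    """True if `ver` is at or above the v0.11.2 patch release."""
    return parse_release(ver) >= FIXED_IN


def installed_vllm_version() -> str | None:
    """Return the installed `vllm` distribution version, or None if absent."""
    try:
        return version("vllm")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_vllm_version()
    if v is None:
        print("vllm is not installed")
    else:
        print(f"vllm {v}: includes v0.11.2 fixes = {includes_v0112_fixes(v)}")
```

Comparing integer tuples avoids a dependency on a third-party version parser. The trade-off is that pre-release and local-build suffixes are not distinguished. If that matters, a full PEP 440 parser is the safer choice.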


§ sources · 1 publication

github.com · vllm v0.11.2 · primary · 08:29:19