vLLM v0.11.2
vLLM releases v0.11.2 with four bug fixes addressing multi-node Ray support, speculative decoding assertions, async scheduling with FlashAttn MLA, and SM100 CUTLASS MoE macro guards.
This release includes 4 bug fixes on top of `v0.11.1`:
- [BugFix] Ray with multiple nodes (https://github.com/vllm-project/vllm/pull/28873)
- [BugFix] Fix false assertion with spec-decode=[2,4,..] and TP>2 (https://github.com/vllm-project/vllm/pull/29036)
- [BugFix] Fix async-scheduling + FlashAttn MLA (https://github.com/vllm-project/vllm/pull/28990)
- [NVIDIA] Guard SM100 CUTLASS MoE macro to SM100 builds v2 (https://github.com/vllm-project/vllm/pull/28938)
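
To confirm that an environment already carries these fixes, one option is to compare the installed package version against `0.11.2`. The following is a minimal sketch, not part of vLLM itself: the helper names `release_tuple` and `includes_v0_11_2_fixes` are hypothetical, and it assumes the package was installed under the distribution name `vllm`. It compares only the leading numeric components, so a pre-release such as `0.11.2rc1` would be treated as `0.11.2`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# First release that contains the four fixes listed above.
MIN_FIXED = (0, 11, 2)


def release_tuple(version_string: str) -> tuple:
    """Return the leading numeric components of a version string.

    Suffixes such as ".post1", "rc1" or "+cu128" are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version_string.strip().lstrip("v"))
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def includes_v0_11_2_fixes(version_string: str) -> bool:
    """True if the given version is at least 0.11.2 (numeric part only)."""
    return release_tuple(version_string) >= MIN_FIXED


if __name__ == "__main__":
    try:
        installed = version("vllm")  # assumes the distribution is named "vllm"
    except PackageNotFoundError:
        print("vllm is not installed in this environment")
    else:
        status = "includes" if includes_v0_11_2_fixes(installed) else "predates"
        print(f"vllm {installed} {status} the v0.11.2 fixes")
```

For stricter handling of pre-release and development versions, a full PEP 440 parser would be a better fit than this numeric comparison.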