vLLM v0.17.1
vLLM releases v0.17.1, a patch fixing MoE fusion issues, re-enabling expert parallelism for TRT-LLM FP8, and adding Nemotron 3 Super model support.
This is a patch release on top of `v0.17.0` to address a few issues: New Model: Nemotron 3 Super Fix passing of activation_type to trtllm fused MoE NVFP4 and FP8 (#36017) Fix/resupport nongated fused moe triton (#36412) Re-enable EP for trtllm MoE FP8 backend (#36494) [Mamba][Qwen3.5] Zero freed SSM cache blocks on GPU (#35219) Fix TRTLLM Block FP8 MoE Monolithic (#36296) [DSV3.2][MTP] Optimize Indexer MTP handling (#36723)
- github.comvllm v0.17.1primary