§ local-llm · storyline

vLLM v0.17.1

vLLM v0.17.1 is a patch release that fixes several fused MoE issues, re-enables expert parallelism for the TRT-LLM FP8 MoE backend, and adds support for the Nemotron 3 Super model.

Mar 11 · 11:24:34 · primary fetch · 1 source · updated Mar 11 · 11:24:34

This is a patch release on top of `v0.17.0` to address a few issues:

- New Model: Nemotron 3 Super
- Fix passing of activation_type to trtllm fused MoE NVFP4 and FP8 (#36017)
- Fix/resupport nongated fused moe triton (#36412)
- Re-enable EP for trtllm MoE FP8 backend (#36494)
- [Mamba][Qwen3.5] Zero freed SSM cache blocks on GPU (#35219)
- Fix TRTLLM Block FP8 MoE Monolithic (#36296)
- [DSV3.2][MTP] Optimize Indexer MTP handling (#36723)


§ sources · 1 publication · timeline below

github.com · vllm v0.17.1 · primary · 11:24:34