shipfeedAI news, curated daily

00:36:35 CET
21 MAY00:36:35shipfeed
pull to refreshlast sync
Just in — 30 new
§ tools · storyline

not much happened today

vLLM v0.20.0 releases with TurboQuant 2-bit KV cache, broader hardware support, and day-0 compatibility for new models including Poolside Laguna XS.2 and NVIDIA Nemotron 3 Nano Omni.

Apr 28 · · primary fetch1 sourceupdated Apr 28 ·

vLLM v0.20.0 introduces significant improvements in memory and MoE serving efficiency, including TurboQuant 2-bit KV cache for 4× KV capacity and a 2.1% latency improvement. The update supports multiple hardware platforms like DeepSeek V4 MegaMoE on Blackwell, Jetson Thor, ROCm, Intel XPU, and Grace-Blackwell setups. Early benchmarks show DeepSeek V4 Pro on B300 hardware can be up to 8× faster than H200. The ecosystem is rapidly adopting day-0 support for new open models such as Poolside Laguna XS.2, Ling-2.6-flash, and NVIDIA Nemotron 3 Nano Omni.

Poolside released Laguna XS.2, a 33B total / 3B active MoE coding model under Apache 2.0, capable of running on a single GPU, with hybrid attention and FP8 KV cache, performing near Qwen-3.5. NVIDIA launched Nemotron 3 Nano Omni, a 30B / A3B multimodal MoE with 256K context, supporting text, image, video, audio, and documents, with immediate distribution across multiple platforms. Discussions highlighted tradeoffs in quantization methods and a shift away from CUDA lock-in towards heterogeneous accelerator support.

read full article on news.smol.ai
§ sources1 publication · timeline below
  1. news.smol.ainot much happened todayprimary