DFlash Speculative Decoding Drafts Whole Token Blocks in Parallel
Researchers introduced DFlash, a speculative decoding model that drafts entire blocks of tokens in parallel, with reported throughput gains of up to 15x on NVIDIA Blackwell GPUs.
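For readers new to the idea, here is a minimal sketch of a generic draft-then-verify loop with block drafting. It is not DFlash's implementation: the toy models `target_next` and `draft_block`, the function `speculative_decode`, and the greedy acceptance rule are all assumptions made for illustration, and the sketch does not attempt to reproduce the reported speedup.

```python
from typing import List, Tuple

def target_next(ctx: List[int]) -> int:
    """Hypothetical slow 'target' model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10

def draft_block(ctx: List[int], block_size: int) -> List[int]:
    """Hypothetical fast drafter: proposes a whole block from the prefix alone.

    Each position depends only on ctx, not on the other drafted tokens,
    which is what would allow the block to be produced in parallel.
    The drafter is deliberately imperfect: it emits 0 wherever 7 is correct.
    """
    block = [(ctx[-1] + 1 + i) % 10 for i in range(block_size)]
    return [0 if tok == 7 else tok for tok in block]

def speculative_decode(prefix: List[int], n_new: int,
                       block_size: int) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, verification passes)."""
    out = list(prefix)
    passes = 0
    while len(out) - len(prefix) < n_new:
        block = draft_block(out, block_size)
        passes += 1  # assumption: a real system scores the whole block in one pass
        ctx = list(out)
        for tok in block:
            expected = target_next(ctx)
            if tok != expected:
                ctx.append(expected)  # reject the rest, keep the target's token
                break
            ctx.append(tok)  # drafted token matches the target: accept it
        else:
            ctx.append(target_next(ctx))  # whole block accepted: one bonus token
        out = ctx
    return out[:len(prefix) + n_new], passes

def target_only_decode(prefix: List[int], n_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    out = list(prefix)
    for _ in range(n_new):
        out.append(target_next(out))
    return out

if __name__ == "__main__":
    tokens, passes = speculative_decode([0], n_new=12, block_size=4)
    print(tokens, passes)
```

In this toy setup the output is identical to decoding with the target model alone, because every accepted token is checked against the target's own greedy choice; the saving comes from needing fewer verification passes than generated tokens.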