Sakana AI and NVIDIA introduce TwELL for faster LLM inference

Sakana AI and NVIDIA introduce TwELL, a sparse data format with custom CUDA kernels delivering up to 21.9% training and 20.5% inference speedups in LLM feedforward layers.

May 11 · 10:36:00 · primary fetch · 1 source · updated May 11 · 10:36:00

§ sources · 1 publication · timeline below

marktechpost.com · Sakana AI and NVIDIA Introduce TwELL with CUDA Kernels for 20.5% Inference and 21.9% Training Speedup in LLMs · primary · 10:36:00