Sakana AI and NVIDIA introduce TwELL for faster LLM inference
Sakana AI and NVIDIA introduce TwELL, a sparse data format with custom CUDA kernels that targets the feedforward layers of LLMs, delivering speedups of up to 20.5% for inference and 21.9% for training.