§ evals · storyline

ParallelKernelBench: Frontier LLMs can't write fast multi-GPU kernels (yet)

The ParallelKernelBench benchmark shows that frontier LLMs solve fewer than a third of its multi-GPU CUDA kernel tasks, even though a few of their generated kernels outperform public implementations.

yesterday · 02:00:00 · primary fetch · 1 source · updated yesterday · 02:00:00

ParallelKernelBench tests whether LLMs can write fast multi-GPU CUDA kernels across 87 real workloads.

The best model solves under a third of them, but a few generated kernels beat any public implementation.


§ sources · 1 publication · timeline below

together.ai · ParallelKernelBench: Frontier LLMs can't write fast multi-GPU kernels (yet) · primary · 02:00:00