shipfeed · AI news, curated daily

§ tools · storyline

llama.cpp b9158

llama.cpp b9158 adds RDNA3 tensor core support to the CUDA MMA flash-attention kernel, tunes the kernel parameters for RDNA3, RDNA4, and CDNA1, and makes the kernel work for head sizes up to 256 on CDNA.

May 15 · primary fetch · 1 source · updated May 15

HIP: RDNA3 mma FA, faster AMD transpose, tune AMD (#22880)

Adds RDNA3 support to the CUDA mma FA kernel. For the RDNA3 tensor cores to work with FP16 accumulation for VKQ, the tiles need to be 32 logical units long in the direction of the attention head; for head sizes 80 and 112, which are not evenly divisible by 32, the regular length of 16 with FP32 accumulation is used instead. The longer tiles also enable more efficient transposition for a warp size of 32, which is why they are also used for RDNA4. However, this scrambles the data layout of the accumulators along the attention head dimension.
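The head-size rule described above can be illustrated with a small sketch. This is not code from llama.cpp: `AccType`, `VkqTileConfig`, and `pick_vkq_tile` are hypothetical names made up for illustration, and the sketch only encodes the rule as stated in the release note (32-long tiles with FP16 accumulation when the head size is a multiple of 32, otherwise 16-long tiles with FP32 accumulation).

```cpp
// Hypothetical sketch; none of these names are llama.cpp identifiers.
enum class AccType { FP16, FP32 };

struct VkqTileConfig {
    int     tile_len;  // logical units along the attention head dimension
    AccType acc;       // accumulator precision used for VKQ
};

// Assumed rule, taken from the release note: use the long FP16 tile when the
// head size is a multiple of 32, otherwise fall back to the regular 16-long
// tile with FP32 accumulation.
constexpr VkqTileConfig pick_vkq_tile(int head_size) {
    return head_size % 32 == 0 ? VkqTileConfig{32, AccType::FP16}
                               : VkqTileConfig{16, AccType::FP32};
}

// Compile-time checks for the head sizes named in the note.
static_assert(pick_vkq_tile(128).tile_len == 32, "128 is a multiple of 32");
static_assert(pick_vkq_tile(80).tile_len == 16, "80 falls back to 16");
static_assert(pick_vkq_tile(112).acc == AccType::FP32, "112 uses FP32");
```

Head sizes 80 and 112 are multiples of 16 but not of 32, which is why they take the fallback path in this sketch.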

To prevent accidental misuse I added another entry to ggml_cuda_mma::data_layout. I also tuned the kernel parameters for RDNA3, RDNA4, and CDNA1 in general, during which I discovered that the kernel can be made to work for head sizes up to 256 for CDNA. For RDNA3/4 I was not able to get better performance than the tile kernel for head sizes > 128.

macOS/iOS:
  macOS Apple Silicon (arm64)
  macOS Apple Silicon (arm64, KleidiAI enabled)
  macOS Intel (x64)
  iOS XCFramework

Linux:
  Ubuntu x64 (CPU)
  Ubuntu arm64 (CPU)
  Ubuntu s390x (CPU)
  Ubuntu x64 (Vulkan)
  Ubuntu arm64 (Vulkan)
  Ubuntu x64 (ROCm 7.2)…

§ sources · 6 publications · timeline below
  1. github.com · llama.cpp b9158 (primary)
  2. github.com · llama.cpp b9165
  3. github.com · llama.cpp b9163
  4. github.com · llama.cpp b9161
  5. github.com · llama.cpp b9159
  6. github.com · llama.cpp b9156

§ how this story moved

  1. llama.cpp Releases (primary) publishes the launch post.
  2. llama.cpp Releases picks up coverage.
  3. llama.cpp Releases picks up coverage.
  4. llama.cpp Releases picks up coverage.
  5. llama.cpp Releases picks up coverage.
  6. llama.cpp Releases picks up coverage.