
llama.cpp b9113

llama.cpp b9113 adds Q4_1 MoE support for Adreno GPUs in the OpenCL backend (for certain shapes), fixes the supports_op check for Q4_1 MoE, and removes unnecessary code and asserts.

May 12 · 01:54:54 · primary fetch · 1 source · updated May 12 · 01:54:54

opencl: add q4_1 MoE for Adreno (#22856)

Q4_1 MoE CLC pass sanity check
remove unnecessary code
opencl: remove unnecessary asserts and reformat
opencl: fix supports_op for q4_1 moe (q4_1 moe is supported by Adreno with certain shapes)

Co-authored-by: Li He

macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled); macOS Intel (x64); iOS XCFramework

Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU); Ubuntu s390x (CPU); Ubuntu x64 (Vulkan); Ubuntu arm64 (Vulkan); Ubuntu x64 (ROCm 7.2); Ubuntu x64 (OpenVINO); Ubuntu x64 (SYCL FP32); Ubuntu x64 (SYCL FP16)

Android: Android arm64 (CPU)

Windows: Windows x64 (CPU); Windows arm64 (CPU); Windows x64 (CUDA 12) - CUDA 12.4 DLLs; Windows x64 (CUDA 13) - CUDA 13.1 DLLs; Windows x64 (Vulkan); Windows x64 (SYCL); Windows x64 (HIP)

openEuler: openEuler x86 (310p); openEuler x86 (910b, ACL Graph); openEuler aarch64 (310p); openEuler aarch64 (910b, ACL Graph)


§ sources · 1 publication

github.com · llama.cpp b9113 · primary · 01:54:54