llama.cpp b9113
llama.cpp b9113 adds Q4_1 MoE support for Adreno GPUs via OpenCL, fixing supports_op handling and removing unnecessary code and asserts.
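For readers unfamiliar with the format named above: Q4_1 is one of ggml's 4-bit block quantization types. In the reference layout, each block stores 32 weights as 4-bit codes together with an fp16 scale `d` and an fp16 minimum `m`, and a weight is recovered as `q * d + m`. The Python sketch below only illustrates that layout on the CPU. It is not the Adreno OpenCL kernel added in this release, and the function name `dequantize_q4_1_block` is hypothetical.

```python
import struct

QK4_1 = 32  # number of weights stored in one Q4_1 block


def dequantize_q4_1_block(block: bytes) -> list[float]:
    """Decode one 20-byte Q4_1 block (fp16 d, fp16 m, 16 packed bytes) into 32 floats.

    Illustrative helper only; the name is not part of llama.cpp.
    """
    expected = 4 + QK4_1 // 2  # 2 bytes d + 2 bytes m + 16 bytes of nibbles
    if len(block) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(block)}")

    # 'e' is the struct format character for IEEE 754 half precision.
    d, m = struct.unpack_from("<ee", block, 0)
    qs = block[4:]

    out = [0.0] * QK4_1
    for j, byte in enumerate(qs):
        # Low nibble holds weight j, high nibble holds weight j + 16.
        out[j] = (byte & 0x0F) * d + m
        out[j + QK4_1 // 2] = (byte >> 4) * d + m
    return out
```

A block built with `struct.pack("<ee", 0.5, -1.0) + bytes([0x21] * 16)` decodes to sixteen values of -0.5 followed by sixteen values of 0.0, which shows how the two nibbles of each byte map to the two halves of the block.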
opencl: add q4_1 MoE for Adreno (#22856)

- Q4_1 MoE CLC pass sanity check
- remove unnecessary code
- opencl: remove unnecessary asserts and reformat
- opencl: fix supports_op for q4_1 moe
- q4_1 moe is supported by Adreno with certain shapes

Co-authored-by: Li He

macOS/iOS:
- macOS Apple Silicon (arm64)
- macOS Apple Silicon (arm64, KleidiAI enabled)
- macOS Intel (x64)
- iOS XCFramework

Linux:
- Ubuntu x64 (CPU)
- Ubuntu arm64 (CPU)
- Ubuntu s390x (CPU)
- Ubuntu x64 (Vulkan)
- Ubuntu arm64 (Vulkan)
- Ubuntu x64 (ROCm 7.2)
- Ubuntu x64 (OpenVINO)
- Ubuntu x64 (SYCL FP32)
- Ubuntu x64 (SYCL FP16)

Android:
- Android arm64 (CPU)

Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)

openEuler:
- openEuler x86 (310p)
- openEuler x86 (910b, ACL Graph)
- openEuler aarch64 (310p)
- openEuler aarch64 (910b, ACL Graph)