
llama.cpp b9031

llama.cpp b9031 changes backend loading so that backends are loaded only when required. llama_backend_init() now calls ggml_backend_load_all() directly, and an explicit ggml_backend_load_all() call is added in places where llama_backend_init() is not used.

May 5 · 16:05:29 · primary fetch · 1 source · updated May 5 · 16:05:29

common : only load backends when required (#22290)

common : only load backends when required
Signed-off-by: Adrien Gallouët

llama : call ggml_backend_load_all() directly from llama_backend_init()
Signed-off-by: Adrien Gallouët

Add ggml_backend_load_all() where llama_backend_init() is not used
Signed-off-by: Adrien Gallouët

---------
Signed-off-by: Adrien Gallouët

macOS/iOS:
  macOS Apple Silicon (arm64)
  macOS Apple Silicon (arm64, KleidiAI enabled)
  macOS Intel (x64)
  iOS XCFramework

Linux:
  Ubuntu x64 (CPU)
  Ubuntu arm64 (CPU)
  Ubuntu s390x (CPU)
  Ubuntu x64 (Vulkan)
  Ubuntu arm64 (Vulkan)
  Ubuntu x64 (ROCm 7.2)
  Ubuntu x64 (OpenVINO)
  Ubuntu x64 (SYCL FP32)
  Ubuntu x64 (SYCL FP16)

Android:
  Android arm64 (CPU)

Windows:
  Windows x64 (CPU)
  Windows arm64 (CPU)
  Windows x64 (CUDA 12) - CUDA 12.4 DLLs
  Windows x64 (CUDA 13) - CUDA 13.1 DLLs
  Windows x64 (Vulkan)
  Windows x64 (SYCL)
  Windows x64 (HIP)

openEuler:
  openEuler x86 (310p)
  openEuler x86 (910b, ACL Graph)
  openEuler aarch64 (310p)
  openEuler aarch64 (910b, ACL Graph)
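The "only load backends when required" idea can be illustrated with a small toy model in C. This is a sketch under stated assumptions, not llama.cpp code: every toy_* function and g_* variable is a hypothetical stand-in, and the guard that makes the load step idempotent is an assumption of this sketch rather than a documented property of ggml_backend_load_all(). It shows only the general shape: a cheap code path never triggers backend discovery, while a path that needs a device goes through an init function that performs the load itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model only: all names below are hypothetical, not llama.cpp APIs. */
static bool g_backends_loaded = false;
static int  g_load_calls      = 0;  /* times the expensive scan actually ran */

/* Stand-in for an expensive backend discovery step
 * (loosely analogous to ggml_backend_load_all()). */
static void toy_backend_load_all(void) {
    if (g_backends_loaded) {
        return;  /* assumption of this sketch: repeated calls are no-ops */
    }
    g_backends_loaded = true;
    g_load_calls++;
}

/* Stand-in for library initialisation (loosely analogous to
 * llama_backend_init()): it performs the backend load itself. */
static void toy_backend_init(void) {
    toy_backend_load_all();
}

/* Cheap path, e.g. producing usage text: never touches the backends. */
static const char *toy_usage(void) {
    return "usage: tool [options]";
}

/* Path that needs a device: initialises first, then does its work. */
static int toy_run_inference(void) {
    toy_backend_init();
    assert(g_backends_loaded);  /* backends must be available from here on */
    return 0;
}
```

In this model, calling toy_usage() leaves g_load_calls at zero, and the scan runs only once toy_run_inference() is reached. A caller that skips toy_backend_init() entirely would need to call toy_backend_load_all() itself, which mirrors the third commit's stated intent of adding ggml_backend_load_all() where llama_backend_init() is not used.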


§ sources · 1 publication · timeline below

github.com · llama.cpp b9031 · primary · 16:05:29