§ local-llm · storyline

llama.cpp b9169

llama.cpp build b9169 adds chunk support to mtmd, its multimodal library, and fixes preprocessing for Qwen3A. The change limits mtmd_chunk size to avoid memory blow-up and corrects audio token handling.

May 15 · 23:29:28 · primary fetch · 1 source · updated May 15 · 23:29:28

mtmd: add chunks and fix preproc for qwen3a (#23073)

- mtmd: add chunks and fix preproc for qwen3a
- add attn_mask
- limit mtmd_chunk size (avoid blow up memory)
- correct audio tokens
- re-order the set_input case
- remove attn_mask

macOS/iOS: macOS Apple Silicon (arm64) · macOS Apple Silicon (arm64, KleidiAI enabled) · macOS Intel (x64) · iOS XCFramework

Linux: Ubuntu x64 (CPU) · Ubuntu arm64 (CPU) · Ubuntu s390x (CPU) · Ubuntu x64 (Vulkan) · Ubuntu arm64 (Vulkan) · Ubuntu x64 (ROCm 7.2) · Ubuntu x64 (OpenVINO) · Ubuntu x64 (SYCL FP32) · Ubuntu x64 (SYCL FP16)

Android: Android arm64 (CPU)

Windows: Windows x64 (CPU) · Windows arm64 (CPU) · Windows x64 (CUDA 12) - CUDA 12.4 DLLs · Windows x64 (CUDA 13) - CUDA 13.1 DLLs · Windows x64 (Vulkan) · Windows x64 (SYCL) · Windows x64 (HIP)

openEuler: openEuler x86 (310p) · openEuler x86 (910b, ACL Graph) · openEuler aarch64 (310p) · openEuler aarch64 (910b, ACL Graph)
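The "limit mtmd_chunk size (avoid blow up memory)" item describes capping how much multimodal input is handled in one piece. The Python sketch below illustrates that general idea only. It is not llama.cpp's implementation, and the function name `split_into_chunks`, the `max_chunk` parameter, and the example sizes are all hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_chunks(items: Sequence[T], max_chunk: int) -> List[List[T]]:
    """Split items into consecutive chunks of at most max_chunk elements.

    Hypothetical helper: it stands in for any scheme that bounds the
    amount of input processed per step so peak memory stays limited.
    """
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    # Step through the input in strides of max_chunk; the final chunk
    # may be shorter than the others.
    return [list(items[i:i + max_chunk]) for i in range(0, len(items), max_chunk)]


# Assumed example: 10 audio token positions, at most 4 per chunk.
chunks = split_into_chunks(list(range(10)), max_chunk=4)
print(chunks)
```

Because each chunk holds at most `max_chunk` elements, any per-chunk buffer has a fixed upper bound regardless of how long the input is.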


§ sources · 1 publication · timeline below

github.com · llama.cpp b9169 · primary · 23:29:28