§ local-llm · storyline

llama.cpp b9055

llama.cpp build b9055 adds support for the Mimo v2.5 model, including tensor parallelism fixes, fused QKV handling, and MTP weight inclusion in GGUF conversion.

May 7 · 15:29:23 · primary fetch · 1 source · updated May 7 · 15:29:23

model: Add Mimo v2.5 model support (#22493)

- add mimo-v2.5 support
- mimo-v2.5: fix modify_tensors row split
- mimi-v2.5: forgot `add_attn_value_scale` plumbing
- mimi-v2.5: fix tp dequant to detect tp rows
- mimo-v2.5: fix TP iteration to be descending
- mimo-v2.5: fix comment
- mimo-v2.5: retain fused qkv
- mimo-v2.5: missed the attn_value scale during merge
- mimo-v2.5: fused QKV needs contiguous for scaling attention value
- mimo-v2.5: move `speech_embeddings.` to TextModel filter_tensors
- Update src/llama-hparams.h (Co-authored-by: Sigbjørn Skjæret)
- Update src/models/mimo2.cpp (Co-authored-by: Sigbjørn Skjæret)
- Update src/models/mimo2.cpp (Co-authored-by: Sigbjørn Skjæret)
- Update convert_hf_to_gguf.py (Co-authored-by: Sigbjørn Skjæret)
- Update convert_hf_to_gguf.py (Co-authored-by: Sigbjørn Skjæret)
- Update src/models/mimo2.cpp (Co-authored-by: Sigbjørn Skjæret)
- mimo-v2.5: include MTP weights in gguf

Co-authored-by: Sigbjørn Skjæret

macOS/iOS:

- macOS Apple Silicon (arm64)
- macOS Apple Silicon (arm64, KleidiAI enabled)
- macOS Intel (x64)
- iOS XCFramework

Linux:

- Ubuntu x64 (CPU)
- Ubuntu arm64 (CPU)
- Ubuntu s390x (CPU)
- Ubuntu x64 (Vulkan)
- Ubuntu arm64 (Vulkan)
- Ubuntu x64 (ROCm 7.2)
- Ubuntu x64 (OpenVINO)…
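Several of the commit titles refer to a fused QKV tensor and an attention value scale. As a rough illustration of the idea only, and not llama.cpp's actual code, the sketch below slices one fused row into its Q, K and V segments and applies a scale to V alone. The function name `split_fused_qkv`, the dimensions and the scale value are all hypothetical.

```python
def split_fused_qkv(row, q_dim, k_dim, v_dim, value_scale=1.0):
    """Slice one fused QKV row into Q, K, V and scale only the V segment.

    All names and sizes here are illustrative assumptions, not taken
    from the llama.cpp sources.
    """
    if len(row) != q_dim + k_dim + v_dim:
        raise ValueError("row length does not match q_dim + k_dim + v_dim")
    q = row[:q_dim]
    k = row[q_dim:q_dim + k_dim]
    # Build V as a new list so the scale never touches Q or K.
    v = [x * value_scale for x in row[q_dim + k_dim:]]
    return q, k, v


# Hypothetical sizes: 4 query values, 2 key values, 2 value values.
fused = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
q, k, v = split_fused_qkv(fused, q_dim=4, k_dim=2, v_dim=2, value_scale=0.5)
print(q, k, v)
```

In a tensor library the V segment of a fused buffer is usually a strided view rather than a fresh copy. That is presumably why one commit title notes that scaling the attention value needs a contiguous tensor.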


§ sources · 1 publication · timeline below

github.com · llama.cpp b9055 · primary · 15:29:23