§ local-llm · storyline

llama.cpp b9045

llama.cpp b9045 adds support for IBM's Granite 4.0 1B Speech model, including a Conformer encoder, a QFormer projector, and log-mel spectrogram preprocessing via the mtmd subsystem.

May 6 · 15:33:51 · primary fetch · 1 source · updated May 6 · 15:33:51

mtmd: add granite-speech support (ibm-granite/granite-4.0-1b-speech) (#22101)

Conformer encoder with Shaw relative position encoding, QFormer projector, log-mel spectrogram with frame stacking. Encoder uses GLU gating, folded batch norm, and SSM depthwise conv. QFormer compresses encoder output via windowed cross-attention (window=15, queries=3) into the LLM embedding space.

Audio preprocessing: reflect-padded STFT, 80-bin mel filterbank, dynamic range compression, 2x frame stacking (80->160 mel).

GGUF converter handles batch norm folding at export time, fused K/V split, and Conv1d weight reshaping.
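The 2x frame stacking (80->160 mel) and the windowed QFormer compression (window=15, queries=3) described above can be illustrated with a minimal pure-Python sketch. The function names `stack_frames` and `qformer_tokens` are hypothetical and do not come from llama.cpp. Dropping a trailing odd frame and padding the last window are assumptions made here for illustration; the release note does not say how the real implementation handles those edge cases.

```python
from math import ceil


def stack_frames(mel, factor=2):
    """Concatenate every `factor` consecutive frames along the feature axis.

    `mel` is a list of frames, each a list of mel-bin values.
    Assumption: a trailing frame that does not fill a group is dropped.
    """
    usable = len(mel) - len(mel) % factor
    return [
        [value for frame in mel[i:i + factor] for value in frame]
        for i in range(0, usable, factor)
    ]


def qformer_tokens(n_frames, window=15, queries=3):
    """Count output tokens if each window of encoder frames yields `queries` tokens.

    Assumption: the last, partially filled window is padded rather than dropped.
    """
    return ceil(n_frames / window) * queries


# Seven dummy 80-bin frames; frame t is filled with the value t.
mel = [[float(t)] * 80 for t in range(7)]
stacked = stack_frames(mel)

print(len(stacked), len(stacked[0]))  # number of stacked frames, features per frame
print(qformer_tokens(300))            # tokens produced for 300 encoder frames
```

Under these assumptions, stacking halves the number of frames while doubling the feature width, and the QFormer emits 3 tokens per 15 encoder frames, a 5x reduction in sequence length before the LLM sees the audio.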

Tested against HF transformers reference: token-for-token match on 30s/60s audio clips with greedy decoding.

mtmd: rename gs_ prefixed tensors to generic/architecture names
mtmd: use tensor_mapping.py for all granite_speech tensors
convert: fold GraniteSpeechTextModel into GraniteModel
mtmd: replace n_layer hack with explicit has_standard_layers flag
mtmd: replace hardcoded magic numbers with GGUF hparams for granite speech
mtmd: align KEY_A_ define spacing
convert: register GraniteModel for…
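The token-for-token check against the HF transformers reference mentioned above amounts to comparing two greedy-decoded token sequences position by position. The sketch below is one way such a comparison might look; `first_mismatch` is a hypothetical helper, and the token ids are made up rather than taken from either implementation.

```python
def first_mismatch(reference, candidate):
    """Return the index of the first differing token, or None if identical."""
    for i, (ref_tok, cand_tok) in enumerate(zip(reference, candidate)):
        if ref_tok != cand_tok:
            return i
    # One sequence is a strict prefix of the other: mismatch at the shorter length.
    if len(reference) != len(candidate):
        return min(len(reference), len(candidate))
    return None


# Placeholder token ids standing in for the two decoders' outputs.
reference_ids = [101, 7, 42, 9, 2]
candidate_ids = [101, 7, 42, 9, 2]

assert first_mismatch(reference_ids, candidate_ids) is None
```

Greedy decoding matters for this kind of test because it is deterministic, so any difference in the output points at the model or preprocessing rather than at sampling noise.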


§ sources · 1 publication · timeline below

github.com · llama.cpp b9045 · primary · 15:33:51