
Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention

Recent open-weight LLMs, including Gemma 4 and DeepSeek V4, use techniques such as KV sharing, mHC, and compressed attention to cut long-context inference costs.
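KV sharing targets long-context cost because the KV cache grows linearly with sequence length and with the number of layers that store their own keys and values. The rough estimate below is a minimal sketch that assumes a hypothetical cross-layer scheme in which `share_group` consecutive layers reuse one K/V cache. The function name `kv_cache_bytes` and the config numbers are illustrative assumptions, not taken from Gemma 4 or DeepSeek V4.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch=1, bytes_per_elem=2, share_group=1):
    """Estimate KV-cache size in bytes.

    share_group: number of consecutive layers that reuse one K/V cache
    (1 = no sharing). This is a hypothetical cross-layer sharing scheme
    used only for illustration.
    """
    # Layers that actually store their own K and V tensors (ceiling division).
    kv_layers = -(-n_layers // share_group)
    # Factor 2 accounts for storing both keys and values.
    return 2 * kv_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


# Hypothetical config (bf16 cache, 128K-token context), not from any real model.
cfg = dict(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=128_000)

baseline = kv_cache_bytes(**cfg)
shared = kv_cache_bytes(**cfg, share_group=2)

print(f"baseline: {baseline / 2**30:.3f} GiB")
print(f"shared:   {shared / 2**30:.3f} GiB")
```

For this hypothetical config, sharing each K/V cache across pairs of layers halves the cache, from 15.625 GiB to roughly 7.8 GiB at 128K tokens. The savings scale with context length, which is why such techniques matter most for long-context inference.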

May 16 · 13:33:51 · primary fetch · 1 source · updated May 16 · 13:33:51

From Gemma 4 to DeepSeek V4, How New Open-Weight LLMs Are Reducing Long-Context Costs


§ sources · 1 publication

magazine.sebastianraschka.com · Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention · primary · 13:33:51