Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention
Recent open-weight LLMs, including Gemma 4 and DeepSeek V4, adopt architectural changes such as KV sharing, manifold-constrained hyper-connections (mHC), and compressed attention, largely aimed at cutting long-context inference costs.
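To see why KV sharing matters for long contexts, the rough sketch below estimates KV-cache memory for a single sequence. It assumes a generic cross-layer sharing scheme in which some layers reuse the keys and values of an earlier layer and store none of their own. The exact sharing pattern used in Gemma 4 is not described here. The function name `kv_cache_bytes`, the `shared_layers` parameter, and the model dimensions in the usage example are hypothetical illustrations, not values from any released model.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, shared_layers=0):
    """Estimate KV-cache size in bytes for one sequence.

    Hypothetical helper: `shared_layers` counts layers that reuse K/V
    from an earlier layer instead of caching their own (assumed scheme).
    """
    if not 0 <= shared_layers < num_layers:
        raise ValueError("shared_layers must be in [0, num_layers)")
    # Only layers that compute their own K/V contribute to the cache.
    layers_with_own_cache = num_layers - shared_layers
    # Factor 2: one tensor for keys and one for values per caching layer.
    return (2 * layers_with_own_cache * num_kv_heads * head_dim
            * seq_len * bytes_per_elem)


# Hypothetical config: 32 layers, 8 KV heads, head_dim 128,
# a 128,000-token context, and 2-byte (bf16) cache entries.
baseline = kv_cache_bytes(32, 8, 128, 128_000)
shared = kv_cache_bytes(32, 8, 128, 128_000, shared_layers=16)
print(f"no sharing:      {baseline / 2**30:.4f} GiB")  # 15.6250 GiB
print(f"half the layers: {shared / 2**30:.4f} GiB")    # 7.8125 GiB
```

Because the cache grows linearly with `seq_len`, any layer that stops storing its own K/V saves memory in proportion to the context length. This is why such savings become most visible in long-context inference.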
From Gemma 4 to DeepSeek V4: How New Open-Weight LLMs Are Reducing Long-Context Costs
§ sources · 1 publication · timeline below
- magazine.sebastianraschka.com: Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention (primary)