MiniMax-M3 enables million-token context with efficient inference
MiniMax-M3 supports million-token context. Together serves it efficiently using KV-block-major sparse attention, paged MSA decode, optimized index scoring, and a Rust-based multimodal gateway.
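The summary names these techniques but does not show how they fit together. The sketch below shows one generic way a single decode step could combine a paged KV cache, per-block index scoring, and top-k block-sparse attention. It is illustration only. `PagedKVCache`, the mean-key index score, and `sparse_decode_attention` are hypothetical names and choices. They are not Together's or MiniMax's implementation, and the source does not describe how MSA actually scores or lays out blocks.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class PagedKVCache:
    """Hypothetical paged KV cache: fixed-size physical blocks plus a block table."""

    def __init__(self, block_size: int):
        self.block_size = block_size
        self.blocks_k: List[List[Vector]] = []  # physical blocks of keys
        self.blocks_v: List[List[Vector]] = []  # physical blocks of values
        self.block_table: List[int] = []        # logical block index -> physical block id

    def append(self, k: Vector, v: Vector) -> None:
        # Allocate a new physical block when there is none yet or the last one is full.
        if not self.block_table or len(self.blocks_k[self.block_table[-1]]) == self.block_size:
            self.blocks_k.append([])
            self.blocks_v.append([])
            self.block_table.append(len(self.blocks_k) - 1)
        phys = self.block_table[-1]
        self.blocks_k[phys].append(k)
        self.blocks_v[phys].append(v)


def block_index_scores(q: Sequence[float], cache: PagedKVCache) -> List[float]:
    # Hypothetical cheap index score: query dotted with each block's mean key.
    scores = []
    for phys in cache.block_table:
        keys = cache.blocks_k[phys]
        mean_k = [sum(col) / len(keys) for col in zip(*keys)]
        scores.append(dot(q, mean_k))
    return scores


def sparse_decode_attention(
    q: Sequence[float], cache: PagedKVCache, top_k: int
) -> Tuple[Vector, List[int]]:
    """One decode step attending only to the top_k highest-scoring KV blocks."""
    scores = block_index_scores(q, cache)
    selected = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    selected.sort()  # walk the chosen blocks in logical order, one block at a time
    scale = 1.0 / math.sqrt(len(q))
    logits: List[float] = []
    values: List[Vector] = []
    for logical in selected:
        phys = cache.block_table[logical]  # resolve logical block through the block table
        for k, v in zip(cache.blocks_k[phys], cache.blocks_v[phys]):
            logits.append(dot(q, k) * scale)
            values.append(v)
    # Numerically stable softmax over the tokens of the selected blocks only.
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]
    return out, selected
```

In this sketch, setting `top_k` to the number of blocks reproduces dense attention. Smaller values trade exactness for reading fewer KV blocks per step, which is the general motivation for sparse decode at million-token lengths.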
§ sources · 1 publication · timeline below