
Mamba-2: State Space Duality

Mamba-2 is released as a state space model (SSM) with 8x larger states and 50% faster training than its predecessor. It introduces state space duality (SSD), which connects SSMs and linear attention.

Jun 3 · 23:31:26 · 1 source · updated Jun 3 · 23:31:26

Mamba-2, a new state space model (SSM), outperforms earlier models such as Mamba and Transformer++ in both perplexity and wall-clock time, with 8x larger states and 50% faster training than Mamba. It introduces state space duality (SSD), a framework that connects SSMs and linear attention. Separately, the FineWeb-Edu dataset, a high-quality subset of the 15-trillion-token FineWeb dataset filtered for educational quality using Llama-3-70B, lets LLMs learn better and faster, potentially reducing the number of training tokens needed to surpass GPT-3 performance.
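To make the state space duality concrete, here is a minimal sketch under simplifying assumptions: a single input channel, a scalar decay a_t per time step, and no discretization (the actual Mamba-2 layer derives a_t from Δ_t and A and runs many heads in parallel). The same sequence map is computed two ways: as a linear-time recurrent scan (the SSM view) and as a quadratic, attention-like sum over a causal decay mask (the linear-attention view). The function names `ssd_recurrent` and `ssd_quadratic` are hypothetical and do not come from the Mamba-2 codebase.

```python
import math
import random


def ssd_recurrent(a, B, C, x):
    """SSM view: h_t = a_t * h_{t-1} + B_t * x_t,  y_t = <C_t, h_t>."""
    n = len(B[0])
    h = [0.0] * n  # hidden state of size N
    ys = []
    for t in range(len(x)):
        h = [a[t] * h[k] + B[t][k] * x[t] for k in range(n)]
        ys.append(sum(C[t][k] * h[k] for k in range(n)))
    return ys


def ssd_quadratic(a, B, C, x):
    """Attention view: y_i = sum_{j<=i} L[i][j] * <C_i, B_j> * x_j."""
    ys = []
    for i in range(len(x)):
        y = 0.0
        for j in range(i + 1):
            # Causal decay mask L[i][j] = a_{j+1} * ... * a_i (empty product = 1).
            decay = math.prod(a[j + 1 : i + 1])
            score = sum(ci * bj for ci, bj in zip(C[i], B[j]))  # like q_i . k_j
            y += decay * score * x[j]
        ys.append(y)
    return ys


# Toy check: both views produce the same outputs on random inputs.
random.seed(0)
T, N = 6, 4
a = [random.uniform(0.5, 1.0) for _ in range(T)]
B = [[random.gauss(0, 1) for _ in range(N)] for _ in range(T)]
C = [[random.gauss(0, 1) for _ in range(N)] for _ in range(T)]
x = [random.gauss(0, 1) for _ in range(T)]

y_scan = ssd_recurrent(a, B, C, x)
y_attn = ssd_quadratic(a, B, C, x)
print(all(math.isclose(p, q, rel_tol=1e-9, abs_tol=1e-9) for p, q in zip(y_scan, y_attn)))  # True
```

The scan costs O(T·N) per channel, while the masked form costs O(T²·N) but maps onto matrix multiplications; Mamba-2's SSD algorithm combines the two by working over chunks of the sequence.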

Additionally, perplexity-based data pruning with a 125M-parameter reference model improves downstream performance and reduces the number of pretraining steps by up to 1.45x. Also covered: the Video-MME benchmark, which evaluates multimodal LLMs on video analysis across multiple visual domains and video lengths.
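Perplexity-based pruning scores each pretraining document with a small reference model and keeps only part of the corpus. Below is a minimal sketch, assuming per-token natural-log probabilities from the reference model are already available (the toy numbers stand in for a 125M model's outputs). Which band of the ranking to keep is a tunable choice, exposed here as a hypothetical `mode` parameter; the names `perplexity` and `prune_by_perplexity` are illustrative only.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(-mean natural-log probability of the document's tokens)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def prune_by_perplexity(docs, keep_fraction=0.5, mode="high"):
    """Keep a fraction of docs ranked by reference-model perplexity.

    docs: list of (doc_id, token_logprobs) pairs.
    mode: "low", "high", or "medium" (the middle band of the ranking).
    """
    scored = sorted(docs, key=lambda d: perplexity(d[1]))  # ascending perplexity
    k = max(1, round(len(scored) * keep_fraction))
    if mode == "low":
        kept = scored[:k]
    elif mode == "high":
        kept = scored[-k:]
    elif mode == "medium":
        start = (len(scored) - k) // 2
        kept = scored[start : start + k]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [doc_id for doc_id, _ in kept]


# Toy corpus: hypothetical per-token log-probs from a small reference model.
docs = [
    ("a", [-0.1, -0.2, -0.3]),  # mean -0.2 -> perplexity ~1.22
    ("b", [-2.0, -2.5, -1.5]),  # mean -2.0 -> perplexity ~7.39
    ("c", [-1.0, -1.0, -1.0]),  # mean -1.0 -> perplexity ~2.72
    ("d", [-4.0, -3.0, -5.0]),  # mean -4.0 -> perplexity ~54.6
]
print(prune_by_perplexity(docs, keep_fraction=0.5, mode="high"))  # ['b', 'd']
```

Because scoring uses a model far smaller than the one being pretrained, the filtering pass is cheap relative to the pretraining run it shortens.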

Read the full article on news.smol.ai.

§ sources · 1 publication

news.smol.ai · Mamba-2: State Space Duality · primary · 23:31:26