shipfeed · AI news, curated daily

Mamba-2: State Space Duality

Mamba-2 is a new state space model with 8x larger states and 50% faster training than its predecessor. It introduces state space duality, a framework that connects SSMs and linear attention.

Jun 3 · 1 source · updated Jun 3

Mamba-2, a new state space model (SSM), outperforms earlier models such as Mamba and Transformer++ in both perplexity and wall-clock time, with 8x larger states and 50% faster training. It introduces state space duality (SSD), a framework connecting SSMs and linear attention.

FineWeb-Edu, a high-quality subset of the 15-trillion-token FineWeb dataset filtered for educational quality using Llama-3-70B, lets LLMs learn better and faster, potentially reducing the number of training tokens needed to surpass GPT-3-level performance.
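To make Mamba-2's "state space duality" concrete, here is a minimal NumPy sketch under stated assumptions: a single head, a scalar decay a_t per time step (state matrix A_t = a_t · I), and random inputs. It computes the same SSM output in two ways: as a linear-time recurrence over a hidden state, and as a quadratic, attention-like product in which a causal decay mask L multiplies the "scores" C Bᵀ. The function names are hypothetical illustrations, not Mamba-2's API, and this naive version is not how Mamba-2 is implemented in practice.

```python
import numpy as np

def ssm_recurrent(a, B, C, x):
    """Linear-time form: h_t = a_t * h_{t-1} + B_t x_t^T,  y_t = C_t^T h_t.

    a: (T,) scalar decays; B, C: (T, N); x: (T, P). Returns y: (T, P).
    """
    T, N = B.shape
    P = x.shape[1]
    h = np.zeros((N, P))                      # hidden state, one column per channel
    y = np.zeros((T, P))
    for t in range(T):
        h = a[t] * h + np.outer(B[t], x[t])   # decay old state, write new input
        y[t] = C[t] @ h                       # read out with C_t
    return y

def ssm_quadratic(a, B, C, x):
    """Quadratic 'masked attention' form: y = (L * (C B^T)) x.

    L[t, s] = a_{s+1} * ... * a_t for s <= t (1 on the diagonal), else 0.
    """
    T = len(a)
    L = np.zeros((T, T))
    for t in range(T):
        for s in range(t + 1):
            L[t, s] = np.prod(a[s + 1 : t + 1])  # empty product = 1.0 when s == t
    scores = C @ B.T                              # scores[t, s] = C_t . B_s
    return (L * scores) @ x
```

With every a_t = 1 the mask L becomes the all-ones lower triangle, and the quadratic form reduces to plain causal linear attention, y = tril(C Bᵀ) x; the per-step decays are what the SSM adds on top of linear attention.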

Additionally, perplexity-based data pruning with a 125M-parameter reference model improves downstream performance and reduces the number of pretraining steps by up to 1.45x.

The Video-MME benchmark evaluates multi-modal LLMs on video analysis across multiple visual domains and video lengths.
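The perplexity-based pruning idea can be sketched in a few lines, assuming a small reference model has already produced per-token natural-log probabilities for each document. The function names, the input format, and the choice to keep a middle quantile band are all hypothetical: which perplexity band to keep is a design parameter, and the summary above does not say which selection rule produced the reported gains.

```python
import math
from typing import List, Sequence

def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(-mean natural-log probability per token)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def prune_by_perplexity(
    docs_logprobs: Sequence[Sequence[float]],
    low_q: float = 0.25,
    high_q: float = 0.75,
) -> List[int]:
    """Return indices of documents whose reference-model perplexity lies
    between the low_q and high_q quantiles (hypothetical selection rule)."""
    ppl = [perplexity(lp) for lp in docs_logprobs]
    ranked = sorted(ppl)
    n = len(ranked)
    lo = ranked[int(low_q * (n - 1))]   # simple nearest-rank quantile cutoffs
    hi = ranked[int(high_q * (n - 1))]
    return [i for i, p in enumerate(ppl) if lo <= p <= hi]
```

Swapping the band (for example keeping only the highest-perplexity quantile) changes which data the larger model trains on; in practice that choice would be tuned empirically rather than fixed in advance.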

Read the full article on news.smol.ai.
§ sources · 1 publication
  1. news.smol.ai · Mamba-2: State Space Duality (primary)