shipfeed · AI news, curated daily

§ research · storyline

not much happened today

DeepSeek publishes a paper on manifold-constrained hyper-connections, constraining residual mixing matrices to the Birkhoff polytope to improve stability with roughly 6.7% training overhead.

Jan 2 · primary fetch · 1 source · updated Jan 2

DeepSeek released a new paper on mHC (Manifold-Constrained Hyper-Connections), which treats residual-path design as a key scaling lever for neural networks. The approach constrains residual mixing matrices to the Birkhoff polytope, the set of doubly stochastic matrices, to improve stability and performance at a cost of only about 6.7% training overhead. The paper also describes systems-level optimizations such as fused kernels and activation recomputation, an example of a frontier lab pairing mathematical design with kernel engineering.
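To make the Birkhoff-polytope constraint concrete, here is a minimal sketch in plain Python. The summary above does not say how the constraint is enforced, so the use of Sinkhorn-Knopp normalization (alternating row and column rescaling) is an assumption, chosen because it is a standard way to reach a doubly stochastic matrix. The function names `sinkhorn` and `mix_streams`, the iteration count, and the toy inputs are all hypothetical and are not taken from the paper.

```python
import math


def sinkhorn(logits, iters=100):
    """Map an n x n matrix of real logits to an approximately doubly
    stochastic matrix (an illustrative stand-in for the constraint)."""
    n = len(logits)
    # Exponentiate so that every entry is strictly positive.
    m = [[math.exp(x) for x in row] for row in logits]
    for _ in range(iters):
        # Rescale each row to sum to 1.
        m = [[x / sum(row) for x in row] for row in m]
        # Rescale each column to sum to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


def mix_streams(mixing, streams):
    """Mix n residual streams (lists of floats) with an n x n matrix."""
    n = len(streams)
    d = len(streams[0])
    return [
        [sum(mixing[i][k] * streams[k][t] for k in range(n)) for t in range(d)]
        for i in range(n)
    ]


if __name__ == "__main__":
    logits = [[0.5, -1.0, 2.0], [1.5, 0.3, -0.7], [0.0, 0.8, 1.1]]
    mixing = sinkhorn(logits)
    streams = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
    mixed = mix_streams(mixing, streams)
    print([round(sum(row), 6) for row in mixing])          # row sums
    print([round(sum(col), 6) for col in zip(*mixing)])    # column sums
    print([round(sum(col), 6) for col in zip(*mixed)])     # per-feature totals
```

Because every column of the mixing matrix sums to 1, the per-feature total across streams is unchanged by mixing, and because every row sums to 1 with non-negative entries, each output stream is a weighted average of the inputs. This is one intuition for why such a constraint could help stability, though the sketch makes no claim about the paper's actual parameterization or kernels.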

Separately, discussions of long-horizon agents identify context management as a bottleneck and introduce Recursive Language Models (RLMs), which manage context dynamically rather than relying on larger context windows. Taken together, these items signal a shift in architectural design and efficiency for both base model training and agent development.

read full article on news.smol.ai
§ sources · 1 publication
  1. news.smol.ai · not much happened today · primary