§ feed · storyline

Jamba: Mixture of Architectures dethrones Mixtral

AI21 Labs releases Jamba, a 52B-parameter hybrid transformer-Mamba MoE model with 256K context length and Apache 2.0 open weights, optimised to run on a single A100 GPU.

Mar 29 · 00:43:23 · primary fetch · 1 source · updated Mar 29 · 00:43:23

AI21 Labs released Jamba, a 52B-parameter MoE model with a 256K-token context length and open weights under the Apache 2.0 license, optimized to run on a single A100 GPU. Its blocks-and-layers architecture interleaves Transformer (attention), Mamba and MoE layers, and it competes with models such as Mixtral. Meanwhile, Databricks introduced DBRX, an MoE model with 36B active parameters trained on 12T tokens, which has been billed as a new standard for open LLMs.
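To make the "blocks-and-layers" idea concrete, here is a minimal Python sketch of how such a hybrid stack might be laid out. It assumes numbers that this summary does not give: 32 layers, one attention layer in every 8 (a 1:7 attention-to-Mamba ratio), and an MoE feed-forward in every second layer. These roughly follow AI21's published description of Jamba but should be read as illustrative. `hybrid_layout` and its parameters are hypothetical names, not an AI21 or Hugging Face API.

```python
from collections import Counter


def hybrid_layout(num_layers=32, attn_period=8, attn_offset=4,
                  moe_period=2, moe_offset=1):
    """Build a per-layer schedule for a hypothetical Jamba-style stack.

    Each layer pairs a sequence mixer ("attention" or "mamba") with a
    feed-forward part ("mlp" for dense, "moe" for mixture-of-experts).
    All defaults are illustrative assumptions, not confirmed values.
    """
    layout = []
    for i in range(num_layers):
        # One attention layer per period; every other layer uses Mamba.
        mixer = "attention" if i % attn_period == attn_offset else "mamba"
        # Alternate dense MLP and MoE feed-forward layers.
        ffn = "moe" if i % moe_period == moe_offset else "mlp"
        layout.append((mixer, ffn))
    return layout


if __name__ == "__main__":
    layout = hybrid_layout()
    # Print one 8-layer block, then count mixer types over the whole stack.
    for i, (mixer, ffn) in enumerate(layout[:8]):
        print(f"layer {i}: {mixer:9s} + {ffn}")
    print(Counter(mixer for mixer, _ in layout))
```

Under these assumed defaults the stack has 28 Mamba layers and only 4 attention layers. Keeping attention sparse limits the key-value cache, which is the usual rationale for pairing Mamba with attention when targeting long contexts on a single GPU.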

In image generation, advances include AnimateDiff, which turns text-to-image models into video generators, and FastSD CPU v1.0.0 beta 28, which enables ultra-fast image generation on CPUs. Other work includes style-content separation with B-LoRA and better high-resolution image upscaling with SUPIR.

read full article on news.smol.ai ↗

§ sources · 1 publication · timeline below

news.smol.ai · Jamba: Mixture of Architectures dethrones Mixtral · primary · 00:43:23