Gemma 4

Google DeepMind releases Gemma 4, a family of open-weight multimodal models up to 31B parameters with 256K-token context, Apache 2.0 licensing, and day-one support across llama.cpp, Ollama, and vLLM.

Apr 2 · 07:44:39 · primary fetch · 1 source · updated Apr 2 · 07:44:39

Google DeepMind released Gemma 4, a family of open-weight multimodal models with context windows of up to 256K tokens, under an Apache 2.0 license. The release marks a major shift in both capability and licensing. The lineup comprises a 31B dense model, a 26B mixture-of-experts (MoE) model (A4B), and two edge models (E4B, E2B) optimized for local and edge deployment, with native multimodal support for text, vision, and audio. Early benchmarks rank Gemma-4-31B #3 among open models and show strong scientific reasoning, with a score of 85.7% on GPQA Diamond.

Day-one ecosystem support covers llama.cpp, Ollama, vLLM, and LM Studio, with notable local inference performance reported on hardware such as the M2 Ultra and RTX 4090. The architecture combines hybrid attention with MoE layering, diverging from the standard transformer design. Community and developer engagement is high, with rapid adoption and tooling integration.
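For a rough sense of what local deployment involves, the sketch below estimates weight storage alone for the two larger models at common precisions. It is back-of-envelope arithmetic, not a measurement. It assumes the nominal parameter counts in the model names (31B, 26B), ignores KV cache, activations, and quantization-format overhead, and uses hypothetical helper names (`weight_gigabytes`, `footprint_table`) introduced only for illustration.

```python
# Back-of-envelope weight-memory estimate (assumption: nominal parameter
# counts taken from the model names; excludes KV cache and activations).

def weight_gigabytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# Nominal total parameter counts. For the MoE model, every expert's weights
# must be resident in memory, so the total count is used here.
MODELS = {"31B dense": 31e9, "26B MoE (total)": 26e9}
PRECISIONS = {"16-bit": 16, "8-bit": 8, "4-bit": 4}

def footprint_table() -> dict:
    """Return {model: {precision: GB}} rounded to one decimal place."""
    return {
        name: {
            label: round(weight_gigabytes(n_params, bits), 1)
            for label, bits in PRECISIONS.items()
        }
        for name, n_params in MODELS.items()
    }

if __name__ == "__main__":
    for name, row in footprint_table().items():
        print(name, row)
```

Under these assumptions, the 31B dense model needs about 62 GB for weights at 16-bit precision and about 15.5 GB at 4-bit. Only the 4-bit figure fits within the 24 GB of VRAM on an RTX 4090. Real quantized files are typically somewhat larger than this ideal, and long contexts add substantial KV-cache memory on top.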

Read the full article on news.smol.ai.

§ sources · 1 publication · timeline below

news.smol.ai · Gemma 4 · primary · 07:44:39