46 stories · 7d·6 sources covering·30 active storylines
Updated Fri, 19 Jun 2026 CEST·46 new storylines this week·live
What this is
Local LLMs are open-weight models you can run on your own hardware. shipfeed tracks open-weight releases, quantization, and local inference tools like llama.cpp and Ollama.
Juro Osawa / The Information: Chinese AI developer MiniMax launches M3, a new coding model that it says rivals Opus 4.7, costing $0.12 per 1M input tokens, compared with $5 for Opus 4.7 — Chinese AI developer…
Gemma 4 was launched by Google under an Apache 2.0 license, marking a significant open-model release focused on reasoning, agentic workflows, multimodality, and on-device use. It outperforms models 10x larger and has…
NVIDIA’s Nemotron 3 Super is a 120B parameter / ~12B active open model featuring a hybrid Mamba-Transformer / SSM Latent MoE architecture and 1M context window, delivering up to 2.2x faster inference than GPT-OSS-120B…
Google Deepmind's Gemma 4 12B is an open-source model that processes text, images, and audio natively and runs on laptops with just 16 GB of RAM. It nearly matches the twice-as-large 26B model in benchmarks and ships…
server: real-time reasoning interruption via control endpoint (#23971) server: real-time reasoning interruption via control endpoint Builds on the manual reasoning budget trigger from #23949. Adds a CONTROL task that…
Nimbus builds production AI systems — internal tools, customer agents, retrieval pipelines — combining humans and AI end-to-end. From scoped pilot to production in 4–8 weeks.
Maximilian Schreiner / The Decoder: Nvidia launches Nemotron 3 Ultra, a 550B-parameter MoE open model; Artificial Analysis: it's the smartest open US model but trails the Chinese model Kimi K2.6 — It has roughly…
According to benchmark platform Artificial Analysis, Nvidia's new Nemotron 3 Ultra is the most capable open AI model from the US to date. The article Nvidia's Nemotron 3 Ultra becomes the smartest open US model, but…
Highlights This release features 459 commits from 230 contributors (63 new)! DeepSeek V4 maturity: DeepSeek V4 received a major hardening pass this cycle — the model was reorganized into a dedicated…
The Canadian AI company Cohere is releasing its most powerful language model to date, Command A+, as open source under an Apache 2.0 license. The article Cohere open-sources its strongest model yet appeared first on…
Carl Franzen / VentureBeat: Cohere releases Command A+, a sparse MoE open model built for agentic tasks, with 218B total and 25B active parameters, its first under the Apache 2.0 license — Canadian AI lab Cohere made…
This version of Ollama will change the architecture to directly support llama.cpp instead of building on top of GGML, and allows for compatibility with GGUF file format. MLX is used to accelerate model inference on…
This version of Ollama will change the architecture to directly support llama.cpp instead of building on top of GGML, and allows for compatibility with GGUF file format. MLX is used to accelerate model inference on…
This version of Ollama will change the architecture to directly support llama.cpp instead of building on top of GGML, and allows for compatibility with GGUF file format. MLX is used to accelerate model inference on…
Release v5.8.0 New Model additions DeepSeek-V4 DeepSeek-V4 is the next-generation MoE (Mixture of Experts) language model from DeepSeek that introduces several architectural innovations over DeepSeek-V3. The…
vLLM v0.20.0 Highlights This release features 752 commits from 320 contributors (123 new)! DeepSeek V4: Initial DeepSeek V4 support landed (#40860), with DSML token-leakage fix in DSV4/3.2 (#40806), DSA + MTP IMA fix…
Alibaba released Qwen3.6-27B, a dense, Apache 2.0 open coding model with thinking and non-thinking modes, outperforming the larger Qwen3.5-397B-A17B on multiple coding benchmarks including SWE-bench and Terminal-Bench…
Moonshot's Kimi K2.6 is a major open-weight 1T-parameter MoE model featuring 32B active parameters, 384 experts, MLA attention, 256K context window, native multimodality, and INT4 quantization. It supports day-0…
Nimbus builds production AI systems — internal tools, customer agents, retrieval pipelines — combining humans and AI end-to-end. From scoped pilot to production in 4–8 weeks.
vLLM v0.19.0 Highlights This release features 448 commits from 197 contributors (54 new)! Gemma 4 support: Full Google Gemma 4 architecture support including MoE, multimodal, reasoning, and tool-use capabilities…
vLLM v0.16.0 Please note that this release was branch cut on Feb 8, so any features added to vLLM after that date is not included. Highlights This release features 440 commits from 203 contributors (7 new)! Async…
Highlights This release features approximately 660 commits from 251 contributors (86 new contributors). Breaking Changes: Async scheduling is now enabled by default - Users who experience issues can disable with…
MiniMax M2.1 launches as an open-source agent and coding Mixture-of-Experts (MoE) model with ~10B active / ~230B total parameters, claiming to outperform Gemini 3 Pro and Claude Sonnet 4.5, and supports local inference…
GLM-4.7 and MiniMax M2.1 open-weight model releases highlight day-0 ecosystem support, coding throughput, and agent workflows, with GLM-4.7 achieving a +9.5% improvement over GLM-4.6 and MiniMax M2.1 positioned as an…