shipfeed · AI news, curated daily

§ topic

llama-cpp

25 stories · 7d · 6 sources covering · 30 active storylines

Updated Tue, 16 Jun 2026 00:05:01 CEST · 25 new storylines this week · live

What this is

llama.cpp is an open-source C/C++ engine for running LLMs efficiently on local hardware. shipfeed tracks llama.cpp releases, new model support, and quantization and performance work.

storylines this week · 30 active

22:23:18 llama.cpp — Releases

LLAMA.CPP · b9496

Fixes Gemma 4 unified FPE on mtmd

mtmd: fix Gemma 4 unified FPE (#24088) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu…

Friday, April 3, 2026

07:44:39 Smol AI — Daily

AI · 1 source

not much happened today

Gemma 4 was launched by Google under an Apache 2.0 license, marking a significant open-model release focused on reasoning, agentic workflows, multimodality, and on-device use. It outperforms models 10x larger and has…

via news.smol.ai

Tuesday, June 2, 2026

07:53:36 llama.cpp — Releases

LLAMA.CPP · b9468

Adds real-time reasoning interruption via control endpoint

server: real-time reasoning interruption via control endpoint (#23971) server: real-time reasoning interruption via control endpoint Builds on the manual reasoning budget trigger from #23949. Adds a CONTROL task that…

Monday, June 1, 2026

06:30:11 NVIDIA — AI Blog

AGENTS · 1 source

NVIDIA Levels Up Local AI Agents Across RTX PCs and DGX Spark

Announced at GTC Taipei at COMPUTEX, NVIDIA OpenShell brings secure agents to Windows with 2x inference performance on llama.cpp — plus, Adobe rebuilds its apps with performance and memory enhancements, and Blender…

via blogs.nvidia.com

Wednesday, May 13, 2026

16:32:54 Ollama — Releases

OLLAMA · 1 source

Ollama v0.30.0-rc27

This version of Ollama will change the architecture to directly support llama.cpp instead of building on top of GGML, and allows for compatibility with GGUF file format. MLX is used to accelerate model inference on…

16:32:54 Ollama — Releases

OLLAMA · v0.30.0

Switches to llama.cpp backend for GGUF compatibility and adds MLX acceleration

This version of Ollama will change the architecture to directly support llama.cpp instead of building on top of GGML, and allows for compatibility with GGUF file format. MLX is used to accelerate model inference on…

16:32:54 Ollama — Releases

OLLAMA · 1 source

Ollama v0.30.0-rc17

This version of Ollama will change the architecture to directly support llama.cpp instead of building on top of GGML, and allows for compatibility with GGUF file format. MLX is used to accelerate model inference on…

Wednesday, June 3, 2026

02:37:18 Ollama — Releases

OLLAMA · v0.30.2

Adds Cline CLI auto-install and Qwen code integration to launch

What's Changed feat(launch): show and auto-install Cline CLI by @hoyyeva in https://github.com/ollama/ollama/pull/16402 log template details to aid troubleshooting by @dhiltgen in…

Saturday, May 16, 2026

21:22:23 llama.cpp — Releases

LLAMA.CPP · 1 source

llama.cpp b9186

sync : ggml macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan)…

Wednesday, May 13, 2026

16:32:54 Ollama — Releases

OLLAMA · 1 source

Ollama v0.30.0-rc22

This version of Ollama will change the architecture to directly support llama.cpp instead of building on top of GGML, and allows for compatibility with GGUF file format. MLX is used to accelerate model inference on…

16:32:54 Ollama — Releases

OLLAMA · 1 source

Ollama v0.30.0-rc20

This version of Ollama will change the architecture to directly support llama.cpp instead of building on top of GGML, and allows for compatibility with GGUF file format. MLX is used to accelerate model inference on…

16:32:54 Ollama — Releases

OLLAMA · 1 source

Ollama v0.30.0-rc21

This version of Ollama will change the architecture to directly support llama.cpp instead of building on top of GGML, and allows for compatibility with GGUF file format. MLX is used to accelerate model inference on…

16:32:54 Ollama — Releases

OLLAMA · v0.30.0-rc32

Switches to llama.cpp backend and adds MLX acceleration for Apple

This version of Ollama will change the architecture to directly support llama.cpp instead of building on top of GGML, and allows for compatibility with GGUF file format. MLX is used to accelerate model inference on…

16:32:54 Ollama — Releases

OLLAMA · 1 source

Ollama v0.30.0-rc29

This version of Ollama will change the architecture to directly support llama.cpp instead of building on top of GGML, and allows for compatibility with GGUF file format. MLX is used to accelerate model inference on…

16:32:54 Ollama — Releases

OLLAMA · 1 source

Ollama v0.30.0-rc31

This version of Ollama will change the architecture to directly support llama.cpp instead of building on top of GGML, and allows for compatibility with GGUF file format. MLX is used to accelerate model inference on…

16:32:54 Ollama — Releases

OLLAMA · 1 source

Ollama v0.30.0-rc15

This version of Ollama will change the architecture to directly support llama.cpp instead of building on top of GGML, and allows for compatibility with GGUF file format. MLX is used to accelerate model inference on…

Tuesday, June 16, 2026

00:05:01 llama.cpp — Releases

LLAMA.CPP · b9660

Fixes LFM2 tool-call parsing double-escaping in chat

chat : fix LFM2 tool-call parsing double-escaping (#24667) Add escape test cases chat : fix LFM2 tool-call parsing double-escaping macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled)…

Monday, June 15, 2026

23:32:11 llama.cpp — Releases

LLAMA.CPP · b9659

Fixes miscounting n_tokens in mtmd

mtmd: fix miscounting n_tokens (#24656) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu…

19:46:50 arXiv — cs.AI

AGENTS · 1 source

TokenPilot: Cache-Efficient Context Management for LLM Agents

Wednesday, June 3, 2026

19:35:34 llama.cpp — Releases

LLAMA.CPP · b9494

Enables non-causal vision for Gemma 4 unified

mtmd: enable non-causal vision for gemma 4 unified (#24082) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu…

Tuesday, June 2, 2026

22:44:52 Ollama — Releases

OLLAMA · v0.30.1-rc0

Adds Cline CLI auto-install and Qwen code integration to launch

What's Changed feat(launch): show and auto-install Cline CLI by @hoyyeva in https://github.com/ollama/ollama/pull/16402 log template details to aid troubleshooting by @dhiltgen in…

18:57:07 llama.cpp — Releases

LLAMA.CPP · b9481

Adds support for IBM Granite Embedding multilingual R2 models

model : support granite multilingual embeddings R2 (ibm-granite/granite-embedding-{97,311}m-multilingual-r2) (#22716) Add support for the ibm-granite/granite-embedding-{97m,311m}-multilingual-r2 embedding models: Added…

13:21:34 llama.cpp — Releases

LLAMA.CPP · b9474

Adds thinking mode toggle with reasoning effort levels to chat UI

ui: Add Thinking mode toggle with reasoning effort levels + improvements for Chat Form Add Action UI (#23434) feat: Add "Thinking" toggle and status icon + redesign Chat Form Actions Add panel test: Update test…

Monday, June 1, 2026

21:23:22 llama.cpp — Releases

LLAMA.CPP · b9460

Limits max outputs of llama_context to save VRAM

llama: limit max outputs of `llama_context` (#23861) llama: save more VRAM by reserving n_outputs == n_seqs when possible add n_outputs_per_seq move n_outputs_max to server-context change ubatch to batch everywhere…

15:13:30 llama.cpp — Releases

LLAMA.CPP · b9453

Removes unused Vulkan functions

vulkan: Removed unused functions (#23175) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU)…

Friday, May 22, 2026

02:48:34 llama.cpp — Releases

LLAMA.CPP · b9276

Exposes prompt token counts in /slots endpoint for progress monitoring

server: expose prompt token counts in /slots endpoint (#23454) Add n_prompt_tokens, n_prompt_tokens_processed, and n_prompt_tokens_cache to the /slots JSON response. These fields are already tracked internally but were…

Wednesday, May 20, 2026

23:38:57 llama.cpp — Releases

LLAMA.CPP · b9254

Enables Programmatic Dependent Launch for better performance

Programmatic Dependent Launch (PDL) for more performance on newer NVIDIA GPUs (Hopper+) (#22522) Adds initial PDL setup. Adds PDL barriers based on simple heuristic: place "sync" before first input pointer access, and…

14:09:47 llama.cpp — Releases

LLAMA.CPP · b9247

Optimizes Metal pad and copy operations with better threadgroup row packing

metal : optimize pad + cpy (#23354) metal : optimize pad metal : optimize cpy cont : better row packing in threadgroup macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel…

05:39:52 llama.cpp — Releases

LLAMA.CPP · 1 source

llama.cpp b9244

opencl: add MoE support for q4_k, q5_k, q6_k on Adreno (#23303) opencl: add q4_k moe support opencl: add q5_k moe support opencl: add q6_k moe support opencl: adjust format --------- Co-authored-by: Li He macOS/iOS…

Tuesday, May 19, 2026

02:29:07 llama.cpp — Releases

LLAMA.CPP · 1 source

llama.cpp b9222

hexagon: add support for TRI op (#22822) Hexagon: TRI HVX Kernel addition to ggml hexagon HTP ops and context addressed PR review comments for TRI op hexagon: clang format hex-unary: remove merge conflict markers…

At a glance

storylines tracked: 185

new · 7 days: 25

new · 30 days: 109

last updated: 00:05:01

FAQ

What's the latest in llama-cpp?

The most recent llama-cpp storyline on shipfeed is "Fixes LFM2 tool-call parsing double-escaping in chat". shipfeed grouped 25 new llama-cpp storylines from across the AI press in the past 7 days.

Which sources cover llama-cpp?

The sources most active in shipfeed's llama-cpp feed are llama.cpp — Releases, Ollama — Releases, Hugging Face — Blog, Continue — Releases, and Smol AI — Daily.

How many llama-cpp stories does shipfeed track?

shipfeed is tracking 185 llama-cpp storylines in total — 25 updated in the past 7 days and 109 in the past 30 — each a deduplicated group of articles from its original sources.

How often is this page updated?

Continuously. shipfeed re-checks its llama-cpp sources around the clock and regroups new coverage into deduplicated storylines; the last-updated time is shown at the top of this page.
