shipfeed: AI news, curated daily

§ topic

llama-cpp

25 stories · 7d · 6 sources covering · 30 active storylines

Updated Tue, 16 Jun 2026 CEST · 25 new storylines this week · live

What this is

llama.cpp is an open-source C/C++ engine for running LLMs efficiently on local hardware. shipfeed tracks llama.cpp releases, new model support, and quantization and performance work.

Storylines this week: 30 active

llama.cpp — Releases
LLAMA.CPP · b9496

Fixes Gemma 4 unified FPE in mtmd

mtmd: fix Gemma 4 unified FPE (#24088). macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework. Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU); Ubuntu…

via github.com
Friday, April 3, 2026’s edition
Smol AI — Daily
AI · 1 source

not much happened today

Gemma 4 was launched by Google under an Apache 2.0 license, marking a significant open-model release focused on reasoning, agentic workflows, multimodality, and on-device use. It outperforms models 10x larger and has…

via news.smol.ai
Tuesday, June 2, 2026’s edition
llama.cpp — Releases
LLAMA.CPP · b9468

Adds real-time reasoning interruption via POST

server: real-time reasoning interruption via control endpoint (#23971). Builds on the manual reasoning budget trigger from #23949. Adds a CONTROL task that…

via github.com
Monday, June 1, 2026’s edition
Wednesday, May 13, 2026’s edition
Ollama — Releases
OLLAMA · 1 source

Ollama v0.30.0-rc27

This version of Ollama will change the architecture to directly support llama.cpp instead of building on top of GGML, and allows for compatibility with GGUF file format. MLX is used to accelerate model inference on…

via github.com
Ollama — Releases
OLLAMA · 1 source

Ollama v0.30.0-rc17

This version of Ollama will change the architecture to directly support llama.cpp instead of building on top of GGML, and allows for compatibility with GGUF file format. MLX is used to accelerate model inference on…

via github.com
Wednesday, June 3, 2026’s edition
Saturday, May 16, 2026’s edition
llama.cpp — Releases
LLAMA.CPP · 1 source

llama.cpp b9186

sync : ggml. macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled); macOS Intel (x64); iOS XCFramework. Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU); Ubuntu s390x (CPU); Ubuntu x64 (Vulkan)…

via github.com
Wednesday, May 13, 2026’s edition
Ollama — Releases
OLLAMA · 1 source

Ollama v0.30.0-rc22

This version of Ollama will change the architecture to directly support llama.cpp instead of building on top of GGML, and allows for compatibility with GGUF file format. MLX is used to accelerate model inference on…

via github.com
Ollama — Releases
OLLAMA · 1 source

Ollama v0.30.0-rc20

This version of Ollama will change the architecture to directly support llama.cpp instead of building on top of GGML, and allows for compatibility with GGUF file format. MLX is used to accelerate model inference on…

via github.com
Ollama — Releases
OLLAMA · 1 source

Ollama v0.30.0-rc21

This version of Ollama will change the architecture to directly support llama.cpp instead of building on top of GGML, and allows for compatibility with GGUF file format. MLX is used to accelerate model inference on…

via github.com
Ollama — Releases
OLLAMA · 1 source

Ollama v0.30.0-rc29

This version of Ollama will change the architecture to directly support llama.cpp instead of building on top of GGML, and allows for compatibility with GGUF file format. MLX is used to accelerate model inference on…

via github.com
Ollama — Releases
OLLAMA · 1 source

Ollama v0.30.0-rc31

This version of Ollama will change the architecture to directly support llama.cpp instead of building on top of GGML, and allows for compatibility with GGUF file format. MLX is used to accelerate model inference on…

via github.com
Ollama — Releases
OLLAMA · 1 source

Ollama v0.30.0-rc15

This version of Ollama will change the architecture to directly support llama.cpp instead of building on top of GGML, and allows for compatibility with GGUF file format. MLX is used to accelerate model inference on…

via github.com
Tuesday, June 16, 2026’s edition
llama.cpp — Releases
LLAMA.CPP · b9660

Fixes LFM2 tool-call parsing double-escaping in chat

chat : fix LFM2 tool-call parsing double-escaping (#24667). Add escape test cases. macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled)…

via github.com
Monday, June 15, 2026’s edition
llama.cpp — Releases
LLAMA.CPP · b9659

Fixes miscounting of n_tokens in mtmd (multimodal)

mtmd: fix miscounting n_tokens (#24656). macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework. Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU); Ubuntu…

via github.com
Wednesday, June 3, 2026’s edition
llama.cpp — Releases
LLAMA.CPP · b9494

Enables non-causal vision for Gemma 4 unified

mtmd: enable non-causal vision for gemma 4 unified (#24082). macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework. Linux: Ubuntu x64 (CPU); Ubuntu…

via github.com
Tuesday, June 2, 2026’s edition
Monday, June 1, 2026’s edition
llama.cpp — Releases
LLAMA.CPP · b9460

Limits max outputs of llama_context to save VRAM

llama: limit max outputs of `llama_context` (#23861). llama: save more VRAM by reserving n_outputs == n_seqs when possible; add n_outputs_per_seq; move n_outputs_max to server-context; change ubatch to batch everywhere…

via github.com
llama.cpp — Releases
LLAMA.CPP · b9453

Adds EXAONE 4.5 model support with vision and GQA

vulkan: Removed unused functions (#23175). macOS/iOS: macOS Apple Silicon (arm64); macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED; macOS Intel (x64); iOS XCFramework. Linux: Ubuntu x64 (CPU); Ubuntu arm64 (CPU)…

via github.com
Friday, May 22, 2026’s edition
Wednesday, May 20, 2026’s edition
llama.cpp — Releases
LLAMA.CPP · 1 source

llama.cpp b9244

opencl: add MoE support for q4_k, q5_k, q6_k on Adreno (#23303). opencl: add q4_k moe support; opencl: add q5_k moe support; opencl: add q6_k moe support; opencl: adjust format. Co-authored-by: Li He. macOS/iOS…

via github.com
Tuesday, May 19, 2026’s edition
llama.cpp — Releases
LLAMA.CPP · 1 source

llama.cpp b9222

hexagon: add support for TRI op (#22822). Hexagon: TRI HVX Kernel addition to ggml hexagon HTP ops and context; addressed PR review comments for TRI op; hexagon: clang format; hex-unary: remove merge conflict markers…

via github.com