Shipfeed. AI News Channel

50 latest items

▶ llama.cpp·12:25

llama.cpp b9840

DeepSeek V4 (#24162) convert: add dsv4 conversion add basic setup add llm_graph_input_dsv4 add save-load state add sinkhorn eps - correction by @fairydreaming add rope fix cleanup dead code fix bugs support pro model…

llama.cpp — Releases

▶ llama.cpp·11:45

llama.cpp b9839

tools/ui: restore Tailwind scanning in ignored worktrees (#24879) macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU)…

llama.cpp — Releases

▶ llama.cpp·09:33

llama.cpp b9838

common : remove unused regex-partial (#25118)

llama.cpp — Releases

▶ llama.cpp·02:05

llama.cpp b9837

jinja, chat: add --reasoning-preserve flag (#25105); correct help message

llama.cpp — Releases

▶ llama.cpp·21:37

llama.cpp b9835

ui: fix stop and reasoning skip in single-model mode (#25084)

llama.cpp — Releases

▶ llama.cpp·17:32

llama.cpp b9833

chat : implement minicpm5 parser (#24889): Add minicpm5 tool call parser; Refactor MiniCPM5 PEG parser per review feedback; Fix jinja min/max API to match Jinja2; modify by review; MiniCPM5: use autoparser for XML tool…

llama.cpp — Releases

▶ llama.cpp·16:18

llama.cpp b9832

jinja: add --dump-prog for debugging (#25086); Update common/jinja/runtime.cpp. Co-authored-by: Sigbjørn Skjæret

llama.cpp — Releases

▶ llama.cpp·15:38

llama.cpp b9831

spec : add DFlash support (#22105): spec: add DFlash v2 support; dflash: support sliding window attention per layer_types; docs: add dflash section. Co-authored-by: Kashif Rasul

llama.cpp — Releases

▶ llama.cpp·13:03

llama.cpp b9830

common : allow --offline in llama download (#25091) Expose the existing --offline flag to `llama download` so a script can run it to check whether a model is already cached and ready to be served without touching the…

llama.cpp — Releases

▶ llama.cpp·08:46

llama.cpp b9829

logs : reduce v2 (#25078): server : reduce logs; cont : common; cont : spec; cont : CMN_ -> COM_

llama.cpp — Releases

▶ llama.cpp·01:15

llama.cpp b9828

opencl: flash attention improvement (#25069): opencl: rework FA kernel for f16 and f32; opencl: flash-attention prefill prepass kernels. flash_attn_kv_pad_f16 pads the tail KV tile to a BLOCK_N multiple…

llama.cpp — Releases

▶ llama.cpp·14:49

llama.cpp b9827

[CUDA] Added a cudaMemcpy2DAsync fast path to ggml_cuda_cpy (#25057). Add a CUDA ggml_cpy fast path for same-type, same-shape strided copies that are just 2D…

llama.cpp — Releases

▶ llama.cpp·13:00

llama.cpp b9826

sycl : fix failed ut cases of norm (#25044)

llama.cpp — Releases

▶ llama.cpp·12:31

llama.cpp b9825

vulkan: fix step operator for 0 input (#25036)

llama.cpp — Releases

▶ llama.cpp·12:03

llama.cpp b9824

binaries : Improve rpc-server and export-graph-ops names. (#25045) Tests are generally prefixed with -test, so rename export-graph-ops accordingly. rpc-server is probably too generic a name for /usr/bin. Because it…

llama.cpp — Releases

▶ llama.cpp·11:13

llama.cpp b9823

ci : add windows-openvino to check-release (#25022)

llama.cpp — Releases

▶ llama.cpp·10:38

llama.cpp b9822

tests : fix test-chat-template --no-common option (#25075)

llama.cpp — Releases

▶ llama.cpp·23:55

llama.cpp b9821

app : allow --version, --licenses & --help (#25054). Signed-off-by: Adrien Gallouët

llama.cpp — Releases

▶ llama.cpp·20:35

llama.cpp b9820

sched : reintroduce less synchronizations during split compute (#20793) CUDA: Improve performance via less synchronizations between token (#17795) Adds CPU-to-CUDA copy capability to…

llama.cpp — Releases

▶ llama.cpp·19:27

llama.cpp b9817

openvino: Update to OV 2026.2.1, self-contained release packages, operator improvements (#24974): Update to OV 2026.2.1, Make OV release packages self-contained; Update to OV 2026.2.1, Make OV release packages…

llama.cpp — Releases

▶ llama.cpp·18:48

llama.cpp b9816

sync : ggml

llama.cpp — Releases

▶ llama.cpp·18:14

llama.cpp b9814

vulkan: opt mul_mat_vecq for mi50 (#22933)

llama.cpp — Releases

▶ llama.cpp·17:38

llama.cpp b9813

vulkan: add INTEL_XE1 arch enum and enable coopmat1 on Intel Xe-LPG Plus (#24404) vulkan: add INTEL_PRE_XE2 arch enum and enable coopmat1 on Intel Xe-LPG Plus (1/3, Xe1-ARLH) Co-authored-by: Xia, Jie Co-authored-by…

llama.cpp — Releases

▶ llama.cpp·15:04

llama.cpp b9811

vulkan: Workaround compiler bug in conv2d coopmat2 path (#24924); apply same workaround to CONV_3D; Apply suggestion from @jeffbolznv

llama.cpp — Releases

▶ llama.cpp·13:44

llama.cpp b9810

CUDA: add cublasSgemmBatched mapping for HIP/MUSA vendor headers (#25033)

llama.cpp — Releases

▶ llama.cpp·09:15

llama.cpp b9804

mamba2: remove hardcoded 2x expansion factor and invalid d_inner % d_state check (#23082): mamba2: remove hardcoded 2x expansion factor, support any expand value; mamba2: remove invalid d_inner % d_state check…

llama.cpp — Releases

▶ llama.cpp·04:26

llama.cpp b9803

opencl: flush profiling batch at shutdown for incomplete batches (#25016)

llama.cpp — Releases

▶ llama.cpp·00:49

llama.cpp b9802

macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu…

llama.cpp — Releases

▶ llama.cpp·11:22

llama.cpp b9789

quant : fix quantizing moe with mtp (#24986)

llama.cpp — Releases

▶ llama.cpp·10:45

llama.cpp b9788

sycl : support --split-mode tensor (#24152): Sycl tp stage1 (#1); SYCL: tensor parallelism (--split-mode tensor) for dual-GPU. Adds the comm_init/comm_free/comm_allreduce_tensor trio that the meta-backend queries via…

llama.cpp — Releases

▶ llama.cpp·08:06

llama.cpp b9787

sycl : fix the failed UT cases of conv_3d (#24900)

llama.cpp — Releases

▶ llama.cpp·05:01

llama.cpp b9786

opencl: support non-contig rows in norm (#24965)

llama.cpp — Releases

▶ llama.cpp·03:21

llama.cpp b9785

chat: harden caps check (#24973)

llama.cpp — Releases

▶ llama.cpp·21:55

llama.cpp b9784

hexagon: MUL_MAT and MUL_MAT_ID rework : 32x32 tiled weight repack, kernel-params, cached graphs (#24954): hex-mm: new weight layout and fusion updates; hvx-mm: unroll the new tiled vec_dots to optimize hvx register util…

llama.cpp — Releases

▶ llama.cpp·18:44

llama.cpp b9782

common: remove unused json-partial (#24968)

llama.cpp — Releases

▶ llama.cpp·17:01

llama.cpp b9781

vulkan: allow reducing the graph submission batches to avoid timeouts (#24872)

llama.cpp — Releases

▶ llama.cpp·12:18

llama.cpp b9780

vulkan: fail the build when a shader fails to compile (#24450). vulkan-shaders-gen: fail the build when a shader fails to compile. vulkan-shaders-gen did not detect shader-compile subprocess failures, so a broken…

llama.cpp — Releases

▶ llama.cpp·09:43

llama.cpp b9777

model : Add LFM2.5-ColBERT-350M and LFM2.5-Embedding-350M (#24913); Restore LFM2 models in README.md

llama.cpp — Releases

▶ llama.cpp·05:59

llama.cpp b9776

vulkan: Apply bias before softmax in FA, to avoid overflow (#24909)

llama.cpp — Releases

▶ llama.cpp·18:48

llama.cpp b9775

server : check draft context creation error (#24922)

llama.cpp — Releases

▶ llama.cpp·18:17

llama.cpp b9774

vulkan: support all backend tests for SQR/SQRT/SIN/COS/CLAMP/LEAKY_RELU/NORM (#24582): vulkan: make SQR/SQRT/SIN/COS/CLAMP/LEAKY_RELU use unary.comp; vulkan: make NORM support noncontig; add noncontiguous row test cases…

llama.cpp — Releases

▶ llama.cpp·17:43

llama.cpp b9773

vulkan: Support GET_ROWS_BACK (#24883)

llama.cpp — Releases

▶ llama.cpp·15:27

llama.cpp b9771

vulkan: make mul_mm ALIGNED a spec constant (#24689). This trims down some of the shader variant explosion and reduces binary size.

llama.cpp — Releases

▶ llama.cpp·14:25

llama.cpp b9770

server: fix remote preset handling, add test (#24938): server: add test for remote preset; fix remote preset handling; fix; fix test

llama.cpp — Releases

▶ llama.cpp·13:43

llama.cpp b9769

vulkan: link ggml-cpu when GGML_VULKAN_CHECK_RESULTS / RUN_TESTS are enabled (#24444) The result-checking and test debug paths in ggml-vulkan.cpp call ggml_graph_compute_with_ctx() to compute a CPU reference graph, but…

llama.cpp — Releases

▶ llama.cpp·12:48

llama.cpp b9768

model: Granite Speech Plus (#24818) feat: Add conversion support for Granite Speech Plus Branch: GraniteSpeechPlus AI-usage: full (Bob, OpenCode + Qwen3.6-35b) Signed-off-by: Gabe Goodhart feat: Extend granite_speech…

llama.cpp — Releases

▶ llama.cpp·10:58

llama.cpp b9767

ggml-webgpu: improve MTP inference by using mat-vec path for small batches (#24811): ggml-webgpu: improve small batches decoding; Add barrier to the NUM_COLS loop in mul-mat-vec

llama.cpp — Releases

▶ llama.cpp·23:32

llama.cpp b9763

server : Add id to tool call responses api (#24882)

llama.cpp — Releases

▶ llama.cpp·19:03

llama.cpp b9761

server: (router) move model downloading to dedicated process (#24834): server: real-time model load progress tracking via /models/sse; update docs; server: move model download to child process; rm unused; fix most problems…

llama.cpp — Releases

▶ llama.cpp·17:22

llama.cpp b9760

server: refactor/generalize input file schema (#24299); wire up input_video, accept raw base64; nits; nits (2); fix windows

llama.cpp — Releases