shipfeedAI news, curated daily

00:32:33 CET
21 MAY00:32:33shipfeed
pull to refreshlast sync
Just in — 30 new
§ tools · storyline

llama.cpp b9133

llama.cpp b9133 adds continued generation support for reasoning models in its server and web UI, routing thinking tags correctly around prefilled messages to preserve chain-of-thought across reloads and resumes.

May 13 · · primary fetch1 sourceupdated May 13 ·

server, webui: support continue generation on reasoning models (#22727) server, webui : support continue generation on reasoning models (#22727) Remove the throw blocking assistant prefill on reasoning models and orchestrate thinking tags around the prefilled message so the parser routes the next stream chunks correctly. WebUI drops the reasoning guard on the Continue button, sends reasoning_content with the prefilled message and persists partial reasoning on stop so the CoT survives reload and resume. Scope : templates with a simple thinking_start_tag / thinking_end_tag pair. Channel-based templates like GPT-OSS are out of scope, pending a per-template prefill API in common/chat.

First step toward #21754. chore: update webui build output server: reject reasoning prefill on channel based templates macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm 7.2) Ubuntu x64 (OpenVINO) Ubuntu x64 (SYCL FP32) Ubuntu x64 (SYCL FP16) Android: Android arm64 (CPU) Windows: Windows x64 (CPU) Windows arm64 (CPU)…

read full article on github.com
§ sources2 publications · timeline below
  1. github.comllama.cpp b9133primary
  2. github.comllama.cpp b9131

§ how this story moved

  1. primaryllama.cpp — Releases publishes the launch post.
  2. llama.cpp — Releases picks up coverage.