§ feed · storyline

Ollama v0.18.4-rc0

Ollama releases v0.18.4-rc0 with flash attention disabled for Grok, an MLX KV cache memory leak fix, and periodic prefill snapshots for the MLX runner.

Mar 27 · 00:11:43 · primary fetch · 1 source · updated Mar 27 · 00:11:43

What's Changed

- ggml: force flash attention off for grok by @rick-github in https://github.com/ollama/ollama/pull/15050
- mlx: fix KV cache snapshot memory leak by @jessegross in https://github.com/ollama/ollama/pull/15065
- mlxrunner: schedule periodic snapshots during prefill by @jessegross in https://github.com/ollama/ollama/pull/15058
- doc: update vscode doc by @hoyyeva in https://github.com/ollama/ollama/pull/15064

Full Changelog: https://github.com/ollama/ollama/compare/v0.18.3...v0.18.4-rc0


§ sources · 1 publication · timeline below

github.com · ollama v0.18.4-rc0 - v0.18.4 · primary · 00:11:43