Ollama v0.18.4-rc0
Ollama releases v0.18.4-rc0 with flash attention disabled for Grok, an MLX KV cache memory leak fix, and periodic prefill snapshots for the MLX runner.
What's Changed
- ggml: force flash attention off for grok by @rick-github in https://github.com/ollama/ollama/pull/15050
- mlx: fix KV cache snapshot memory leak by @jessegross in https://github.com/ollama/ollama/pull/15065
- mlxrunner: schedule periodic snapshots during prefill by @jessegross in https://github.com/ollama/ollama/pull/15058
- doc: update vscode doc by @hoyyeva in https://github.com/ollama/ollama/pull/15064

Full Changelog: https://github.com/ollama/ollama/compare/v0.18.3...v0.18.4-rc0