Context Hard Drive Cache
DeepSeek API enables disk-based context caching by default for all users, storing prefix units at request endpoints, common prefixes, and fixed token intervals to reduce recomputation costs.
DeepSeek API context disk caching is enabled by default for all users and requires no code changes. Each user request triggers construction of the disk cache. If a subsequent request shares a prefix with a previous request, the overlapping portion is retrieved from the cache instead of being recomputed, and is counted as a "cache hit".

Cache write-to-disk and hit rules

A cache hit requires that the corresponding prefix has already been "written to disk" (stored in the disk cache). Due to the Sliding Window Attention mechanism, cache prefix storage and retrieval differ from the previous behavior. Each cached prefix is an independent, complete unit.
Subsequent requests can only achieve a cache hit by completely matching the cached prefix unit.

Cache prefix write-to-disk timing:

- Request endpoint write-to-disk: The end position of user input and model output in each request generates two cache prefix units. Subsequent requests that completely match them can achieve a hit.
- Common prefix detection write-to-disk: When the system detects a common prefix across multiple requests, it writes that common prefix as an independent cache prefix unit to disk. Subsequent requests that fully reuse this cache prefix…
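
The unit-matching rule described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not DeepSeek's implementation: the class `ToyPrefixCache`, its methods, the use of plain strings as tokens, and the policy of detecting a common prefix by comparing each new prompt with earlier prompts are all hypothetical. It models only the request-endpoint and common-prefix writes, and ignores everything else (including real tokenization and storage).

```python
from typing import List, Sequence, Set, Tuple

Tokens = Tuple[str, ...]


def common_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Return the number of leading tokens that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class ToyPrefixCache:
    """Hypothetical in-memory stand-in for the disk cache of prefix units."""

    def __init__(self) -> None:
        self.units: Set[Tokens] = set()   # prefix units "written to disk"
        self.prompts: List[Tokens] = []   # earlier prompts, for prefix detection

    def lookup(self, prompt: Sequence[str]) -> int:
        """Return the hit length: the longest stored unit that the prompt
        starts with in full. A unit that only partly overlaps gives no hit."""
        req = tuple(prompt)
        best = 0
        for unit in self.units:
            if len(unit) <= len(req) and req[: len(unit)] == unit:
                best = max(best, len(unit))
        return best

    def record(self, prompt: Sequence[str], output: Sequence[str]) -> None:
        """Write the units produced by one finished request."""
        p = tuple(prompt)
        # Request endpoint writes: end of user input, end of model output.
        self.units.add(p)
        self.units.add(p + tuple(output))
        # Common prefix detection (assumed policy: compare with past prompts).
        for prev in self.prompts:
            n = common_prefix_len(prev, p)
            if n > 0:
                self.units.add(p[:n])
        self.prompts.append(p)


cache = ToyPrefixCache()
system = ["S1", "S2", "S3"]

first = system + ["Q1"]
print(cache.lookup(first))       # nothing stored yet
cache.record(first, ["A1"])

second = system + ["Q2"]
print(cache.lookup(second))      # shares 3 tokens, but no stored unit matches in full
cache.record(second, ["A2"])     # the common prefix is now written as its own unit

print(cache.lookup(system + ["Q3"]))       # hits the common-prefix unit
print(cache.lookup(first + ["A1", "Q4"]))  # multi-turn: hits the output-endpoint unit
```

In this model, a second request that merely shares a system prompt with the first does not hit, because neither endpoint unit of the first request is a full prefix of it. The hit appears only once the shared prefix has been stored as a unit of its own, while a multi-turn follow-up hits immediately through the output-endpoint unit.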