mirror of
https://github.com/ollama/ollama.git
synced 2025-11-10 15:57:58 +01:00
The KV cache can become fragmented through cache shifting or when different sequences are processed. Decode uses a heuristic to decide whether it should defrag. However, this heuristic is not always accurate, so decoding can occasionally fail unexpectedly. In these cases, if decode indicates that there is no KV cache space, we should defrag and then try again.