Files
ollama/integration/context_test.go
Jesse Gross 7121dfa309 runner.go: Retry decoding after defragmentation if needed
Fragmentation of the KV cache can occur due to cache shifting or
different sequences getting processed. Decode uses a heuristic to
decide if it should defrag. However, this heuristic isn't 100%
accurate, so decoding can sometimes fail by surprise.

For these cases, if decode indicates that there is no KV cache space,
we should defrag and then try again.
2024-11-20 12:49:24 -08:00

3.4 KiB