When computing the size of the cache for sliding window attention, we don't need to multiply the batch size by the number of parallel sequences, since the batch size is constant across sequences. This also simplifies the check for whether to allocate the cache size based on capacity or window size, as the batch size is already incorporated into the capacity when handled by the runner.
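
For illustration only, a minimal Go sketch of the sizing logic described above. The names (`swaCacheSize`, `capacity`, `windowSize`, `batchSize`, `numParallel`) are assumptions for this example, not the identifiers used in the actual cache code:

```go
package main

import "fmt"

// swaCacheSize sketches cache sizing for sliding window attention.
// Each parallel sequence needs up to windowSize cached entries, plus a
// single batch of new tokens that is added once rather than multiplied
// by numParallel.
func swaCacheSize(capacity, windowSize, batchSize, numParallel int) int {
	// capacity is assumed to already include the batch contribution
	// (the runner accounts for it), so the comparison against the
	// window size does not need to add batchSize again.
	if windowSize >= capacity {
		return numParallel * capacity
	}
	return numParallel*windowSize + batchSize
}

func main() {
	// Example: 4 parallel sequences, 1024-token window, 512-token batch,
	// 8192-token per-sequence capacity.
	fmt.Println(swaCacheSize(8192, 1024, 512, 4)) // 4*1024 + 512 = 4608
}
```

The key difference from the previous behavior is that the batch term is outside the per-sequence multiplication, and the capacity check compares against the window size alone.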