ollama/ml/backend.go at 7e34f4fbfa192b3a2334d8fc28e24d69b83064d9

mirror of https://github.com/ollama/ollama.git synced 2025-11-13 09:37:56 +01:00

Jesse Gross 4100ed7bdd ml: Add support for quantized KV cache

Similar to the llama engine, quantizing the KV cache requires
flash attention to be enabled through the Ollama server.

2025-03-07 18:43:39 -08:00
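
The dependency described in the commit message above can be sketched in Go. This is a minimal illustration, not the code from `backend.go`: the function name `kvCacheType`, the type strings `f16`, `q8_0`, and `q4_0`, and the choice to fall back to `f16` when flash attention is off are all assumptions made for the example.

```go
package main

import "fmt"

// kvCacheType is a hypothetical helper (not part of Ollama's API).
// It returns the KV cache type that would actually be used, given the
// requested type and whether flash attention is enabled.
// Assumption: quantized types are only honored with flash attention;
// otherwise the unquantized "f16" cache is used.
func kvCacheType(requested string, flashAttention bool) string {
	switch requested {
	case "q8_0", "q4_0":
		// Quantized cache types (illustrative names) need flash attention.
		if flashAttention {
			return requested
		}
		return "f16"
	default:
		// Empty, "f16", or unrecognized values use the default cache type.
		return "f16"
	}
}

func main() {
	fmt.Println(kvCacheType("q8_0", true))  // quantized cache honored
	fmt.Println(kvCacheType("q8_0", false)) // falls back to f16
}
```

Falling back silently is one possible design; a real implementation might instead log a warning or return an error when a quantized type is requested without flash attention.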
