As with the llama engine, quantizing the KV cache requires flash attention to be enabled on the Ollama server.
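For example, both settings can be supplied as environment variables when starting the server. This is a minimal sketch; the exact cache type to use (e.g. `q8_0` or `q4_0`) depends on your memory and quality trade-off:

```shell
# Enable flash attention and set a quantized KV cache type,
# then start the Ollama server with those settings applied.
OLLAMA_FLASH_ATTENTION=1 OLLAMA_KV_CACHE_TYPE=q8_0 ollama serve
```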