As with the llama engine, quantizing the KV cache requires flash attention to be enabled on the Ollama server.
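For example, both settings can be supplied as environment variables when starting the server. This is a minimal sketch; the exact cache type to use (e.g. `q8_0` or `q4_0`) depends on your memory and quality trade-off:

```shell
# Enable flash attention and set a quantized KV cache type,
# then start the Ollama server with those settings applied.
OLLAMA_FLASH_ATTENTION=1 OLLAMA_KV_CACHE_TYPE=q8_0 ollama serve
```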