ollama/llm/memory.go at pdevine/parser-tidy

mirror of https://github.com/ollama/ollama.git synced 2025-11-11 07:07:54 +01:00

Jesse Gross 71cb86af3e llm: Remove unneeded warning with flash attention enabled

If flash attention is enabled without KV cache quantization, we will
currently always get this warning:
level=WARN source=server.go:226 msg="kv cache type not supported by model" type=""

2025-09-10 16:40:45 -07:00

15 KiB
