ollama

mirror of https://github.com/ollama/ollama.git synced 2025-11-10 17:48:11 +01:00

Files

Jesse Gross 29ddfc2cab ggml: Disable flash attention for gemma2

Our new engine implementation of gemma2 doesn't support flash
attention, which means that it also doesn't support KV cache
quantization. Currently, it is possible to turn these two on,
which will result in a crash.

2025-09-10 16:40:45 -07:00
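
The dependency described in this commit message (no flash attention for gemma2, and therefore no KV cache quantization) could be enforced with a guard along the following lines. This is a minimal sketch, not the code from the commit: the names `archsWithoutFlashAttention`, `supportsFlashAttention`, and `resolveKVCacheType` are hypothetical, and the cache type strings `f16` and `q8_0` are assumed for illustration.

```go
package main

import "strings"

// archsWithoutFlashAttention is a hypothetical deny-list of architectures
// whose implementation cannot use flash attention. Only "gemma2" is named
// in the commit message; the map itself is an assumption of this sketch.
var archsWithoutFlashAttention = map[string]bool{
	"gemma2": true,
}

// supportsFlashAttention reports whether flash attention may be enabled
// for the given architecture name (hypothetical helper).
func supportsFlashAttention(arch string) bool {
	return !archsWithoutFlashAttention[strings.ToLower(arch)]
}

// resolveKVCacheType returns the KV cache type that is safe to use.
// A quantized cache type is honored only when flash attention is both
// requested and supported; otherwise the sketch falls back to "f16"
// (assumed here to be the unquantized default) instead of crashing later.
func resolveKVCacheType(arch, requested string, flashAttention bool) string {
	const unquantized = "f16"
	if requested == "" || requested == unquantized {
		return unquantized
	}
	if !flashAttention || !supportsFlashAttention(arch) {
		return unquantized
	}
	return requested
}
```

With this guard, a request such as `resolveKVCacheType("gemma2", "q8_0", true)` falls back to `f16` rather than enabling a combination that the commit message says would crash.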

ggml

ggml: Disable flash attention for gemma2

2025-09-10 16:40:45 -07:00

gguf

Reapply "feat: incremental gguf parser (#10822)" (#11114) (#11119)

2025-06-20 11:11:40 -07:00

util/bufioutil

next ollama runner (#7913)

2025-02-13 16:31:21 -08:00

config.go

add new gemma model (#11204)

2025-06-25 21:47:09 -07:00