ollama/server/quantization.go at 3258a89b6e4c2030ca47dfe51483e768cbd38b33

mirror of https://github.com/ollama/ollama.git synced 2025-11-11 07:37:34 +01:00

Michael Yang d0b32def60 skip quantizing per_layer_token_embd (#11207)

this tensor isn't compatible with CUDA when quantized to q4_K, so skip it

2025-06-26 21:49:35 -07:00
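
To illustrate the kind of check the commit message describes, here is a minimal sketch of a name-based skip rule. It is not the code from quantization.go: the `skipQuantize` list, the `shouldQuantize` helper, and the example tensor names other than `per_layer_token_embd` are hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// skipQuantize holds name fragments of tensors that should keep their
// original type. Only "per_layer_token_embd" comes from the commit message;
// the list and the helper below are hypothetical.
var skipQuantize = []string{"per_layer_token_embd"}

// shouldQuantize reports whether a tensor with the given name is eligible
// for quantization under the skip list above.
func shouldQuantize(name string) bool {
	for _, frag := range skipQuantize {
		if strings.Contains(name, frag) {
			return false
		}
	}
	return true
}

func main() {
	// The second tensor name is an assumed example of an ordinary weight.
	for _, name := range []string{"per_layer_token_embd.weight", "blk.0.attn_q.weight"} {
		fmt.Printf("%s quantize=%t\n", name, shouldQuantize(name))
	}
}
```

Matching on a name fragment keeps the rule independent of the target quantization type, so the tensor is left untouched regardless of which type the rest of the model is converted to.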
