ollama/server/quantization.go at 6d02a43a75cd3364b3c230e9070f1413e60f0d02

mirror of https://github.com/ollama/ollama.git synced 2025-11-11 03:57:46 +01:00

Michael Yang d0b32def60 skip quantizing per_layer_token_embd (#11207)

this tensor isn't compatible with cuda when quantized to q4_K so skip it

2025-06-26 21:49:35 -07:00
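The commit above can be pictured with a minimal Go sketch. It is not the code from `quantization.go`: the helper names `skipQuantize` and `targetType`, the substring match on the tensor name, the example tensor names, and the plain string type labels are all assumptions made for illustration. The idea is only that a tensor whose name matches `per_layer_token_embd` keeps its original type instead of receiving the requested quantized type.

```go
package main

import (
	"fmt"
	"strings"
)

// skipQuantize reports whether a tensor should be left unquantized.
// Hypothetical helper: matching on a name substring is an assumption,
// not necessarily how the real file identifies the tensor.
func skipQuantize(name string) bool {
	return strings.Contains(name, "per_layer_token_embd")
}

// targetType returns the type a tensor would end up with: the original
// type for skipped tensors, the requested quantized type otherwise.
// Types are plain strings here purely for illustration.
func targetType(name, requested, original string) string {
	if skipQuantize(name) {
		return original
	}
	return requested
}

func main() {
	// Illustrative tensor names; only "per_layer_token_embd" comes from the commit.
	names := []string{"per_layer_token_embd.weight", "blk.0.attn_q.weight"}
	for _, name := range names {
		fmt.Printf("%s -> %s\n", name, targetType(name, "Q4_K", "F16"))
	}
}
```

Keeping the skip rule in one small predicate means a further incompatible tensor could be excluded by touching a single function, at the cost of that tensor staying at its larger original size.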
