Mirror of https://github.com/ollama/ollama.git (synced 2025-11-11 02:17:45 +01:00)
Fall back to alternative quantization types when a tensor's dimensions aren't divisible by the block size required by the originally requested quantization type. If the retried quantization types also fail, the system ultimately falls back to F16 (half-precision floating point), which has a block size of 1 and can therefore handle any tensor dimension.
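A minimal sketch of this fallback logic in Go (ollama's implementation language). The type names, block sizes, and the `pickQuantType` helper are illustrative assumptions, not the actual ollama API; the block sizes loosely mirror GGML-style layouts, where F16 has a block size of 1 and so accepts any shape.

```go
package main

import "fmt"

// Hypothetical block sizes for a few quantization types. Real values
// depend on the GGML block layout; F16 packs one element per block.
var blockSize = map[string]int{
	"Q4_K": 256,
	"Q8_0": 32,
	"F16":  1,
}

// pickQuantType returns the first preferred type whose block size evenly
// divides the tensor's innermost dimension, falling back to F16, which
// has block size 1 and accepts any dimension.
func pickQuantType(dim int, preferred []string) string {
	for _, t := range preferred {
		if dim%blockSize[t] == 0 {
			return t
		}
	}
	return "F16"
}

func main() {
	// 4096 is divisible by 256, so the first preference is usable.
	fmt.Println(pickQuantType(4096, []string{"Q4_K", "Q8_0"}))
	// 100 is divisible by neither 256 nor 32, so we fall back to F16.
	fmt.Println(pickQuantType(100, []string{"Q4_K", "Q8_0"}))
}
```

The key property is that the fallback chain always terminates: because F16's block size is 1, the loop's failure case still yields a valid type for any tensor shape.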
8.1 KiB