ollama/server/quantization.go at eda472df1bd420517ca05c59ba0096e8b518fb69

mirror of https://github.com/ollama/ollama.git synced 2025-11-11 06:47:45 +01:00

Bruce MacDonald fbe6ae285a server: improve tensor quantization fallback logic (#10806)

Fall back to alternative quantization types when a tensor's dimensions aren't divisible by the block size required by the originally requested quantization type. If the retried quantization types also fail, the system ultimately falls back to F16 (half-precision floating point), which has a block size of 1 and can handle any tensor dimension.
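
The fallback described above can be sketched in Go roughly as follows. This is a minimal illustration under stated assumptions, not the code in quantization.go: the `quantType` struct, the `pickType` function, and the idea of passing an ordered candidate list are all hypothetical, and the sketch checks only the tensor's row length against each block size.

```go
package main

// quantType is a hypothetical description of a quantization format.
// It is not a type from the ollama code base.
type quantType struct {
	Name      string
	BlockSize uint64 // number of elements packed into one quantized block
}

// pickType returns the first candidate whose block size evenly divides
// rowLen (assumed here to be the tensor dimension that must be divisible).
// If no candidate fits, it falls back to F16, whose block size of 1
// divides any length.
func pickType(rowLen uint64, candidates []quantType) quantType {
	for _, c := range candidates {
		if c.BlockSize > 0 && rowLen%c.BlockSize == 0 {
			return c
		}
	}
	return quantType{Name: "F16", BlockSize: 1}
}
```

With an illustrative candidate list of Q4_K (block size 256) followed by Q8_0 (block size 32), a row length of 4096 keeps Q4_K, a row length of 96 falls through to Q8_0, and a row length of 100 ends at F16. The order of the candidates is an assumption made for the example; the actual fallback chain is defined by the commit itself.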

2025-05-22 10:48:08 -07:00

8.1 KiB