ollama

mirror of https://github.com/ollama/ollama.git synced 2025-11-11 14:39:30 +01:00

Author	SHA1	Message	Date
Michael Yang	a6fbfc880c	gguf: fix write order (#11068) * ggml: test write gguf order * ggml: fix write tensor order	2025-06-16 10:42:32 -07:00
Jesse Gross	94ab428e3f	ggml: Separate tensor load from backend creation Currently, when the backend is created, the tensors are loaded at the same time, which is a slow operation. This separates them into two steps: - Create backend, including enumerating tensors and memory allocation - Load tensor data This allows more flexibility in managing model loading.	2025-05-19 09:54:22 -07:00
Daniel Hiltgen	424810450f	Move quantization to new backend (#10363) * Move quantization logic to GGML via new backend This moves the model-aware logic to Go code and calls GGML's quantization code for model creation. * Remove "add model quantizations" This is no longer needed now that quantization is implemented in Go+GGML code directly.	2025-05-06 11:20:48 -07:00
Michael Yang	a7835c6716	fix: write gguf padding (#10510) * add gguf_test * fix padding: padding was being added to offset but not to the running count	2025-04-30 17:59:31 -07:00
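
The padding fix in a7835c6716 concerns keeping tensor offsets and the running byte count in step when alignment padding is written. The following is only a minimal sketch of that idea, not the code from the commit: `tensorInfo`, `padding`, and `assignOffsets` are hypothetical names, and the 32-byte alignment is an assumption chosen for the example.

```go
package main

import "fmt"

// tensorInfo is a hypothetical stand-in for a tensor entry in a GGUF-style file.
type tensorInfo struct {
	Name   string
	Size   uint64 // size of the tensor data in bytes
	Offset uint64 // offset relative to the start of the data section
}

// padding returns how many bytes are needed to round offset up
// to the next multiple of align (0 if it is already aligned).
func padding(offset, align uint64) uint64 {
	return (align - offset%align) % align
}

// assignOffsets lays tensors out back to back, padding after each one so
// the next tensor starts on an aligned boundary. A single running count
// includes both data and padding, so offsets and total size cannot drift apart.
func assignOffsets(tensors []tensorInfo, align uint64) uint64 {
	var running uint64
	for i := range tensors {
		tensors[i].Offset = running
		running += tensors[i].Size
		running += padding(running, align)
	}
	return running
}

func main() {
	tensors := []tensorInfo{
		{Name: "a", Size: 5},
		{Name: "b", Size: 64},
		{Name: "c", Size: 33},
	}
	total := assignOffsets(tensors, 32) // 32-byte alignment is an assumption
	for _, t := range tensors {
		fmt.Printf("%s: offset=%d size=%d\n", t.Name, t.Offset, t.Size)
	}
	fmt.Println("total bytes:", total)
}
```

If the padding were added to the next tensor's offset but left out of the running count, later offsets would no longer match the bytes actually written, which is the kind of mismatch the commit message describes.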