ollama

mirror of https://github.com/ollama/ollama.git synced 2025-09-14 18:21:05 +02:00

Author	SHA1	Message	Date
Michael Yang	61aeaf7e81	remove support for multiple ggufs in a single file (#10722) * remove support for multiple ggufs in a single file this was an attempt to make it easier to import multimodal models into ollama. this was rarely used and error prone so remove it * fix: create fused model from blob	2025-05-21 13:55:31 -07:00
Daniel Hiltgen	1a0cfd080a	avoid kv truncation during create (#10761)	2025-05-19 13:54:54 -07:00
Jesse Gross	94ab428e3f	ggml: Separate tensor load from backend creation Currently, when the backend is created, the tensors are loaded at the same time, which is a slow operation. This separates them to be two steps: - Create backend, including enumerating tensors and memory allocation - Loading tensor data This allows more flexibility in managing model loading.	2025-05-19 09:54:22 -07:00
Daniel Hiltgen	ff80718e9c	fix crash in old clients with quantization progress (#10710) Older clients assumed the digest was at least 19 characters long so increase the size of the dummy digest to avoid array out of bounds crashes.	2025-05-14 14:54:18 -07:00
Bruce MacDonald	ad035ad595	convert: quantize from safetensors needs kv (#10675) When creating a quantized model from safetensors we need the array KV values to be loaded. Changing this value to -1 loads the KV values on the returned layer to be used and saved during quantization.	2025-05-12 12:04:20 -07:00
Daniel Hiltgen	424810450f	Move quantization to new backend (#10363) * Move quantization logic to GGML via new backend This moves the model-aware logic to Go code and calls GGML's quantization code for model creation. * Remove "add model quantizations" This is no longer needed now that quantization is implemented in Go+GGML code directly.	2025-05-06 11:20:48 -07:00
Michael Yang	340448d2d1	explicitly decode maxarraysize 1024	2025-04-25 16:59:01 -07:00
Michael Yang	88738b357b	create tempdir in models directory the models directory should have plenty of storage and also ensure there's no cross-device copy	2025-04-18 18:13:05 -07:00
Bruce MacDonald	bebb6823c0	server: validate local path on safetensor create (#9379) More validation during the safetensor creation process. Properly handle relative paths (like ./model.safetensors) while rejecting absolute paths. Add comprehensive test coverage for various paths. No functionality changes for valid inputs - existing workflows remain unaffected. Leverages Go 1.24's new os.Root functionality for secure containment.	2025-02-28 16:10:43 -08:00
Michael Yang	58245413f4	next ollama runner (#7913) feat: add new Ollama engine using ggml through cgo This change introduces a new way to run pretrained models. It introduces 3 high level interfaces and a bunch of smaller helper interfaces to facilitate this. - `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go` - `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go` - `ml.Tensor` defines the interface for a tensor and tensor operations. This is the first implementation of the new engine. Follow up PRs will implement more features: - non-greedy sampling (#8410) - integration with Ollama and KV caching (#8301) - more model support (#9080) with more coming soon Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>	2025-02-13 16:31:21 -08:00
Patrick Devine	2539f2dbf9	Fix absolute path names + gguf detection (#8428)	2025-01-14 19:01:24 -08:00
Patrick Devine	8bccae4f92	show a more descriptive error in the client if it is newer than the server (#8351)	2025-01-09 10:12:30 -08:00
Patrick Devine	86a622cbdc	Update the /api/create endpoint to use JSON (#7935) Replaces `POST /api/create` to use JSON instead of a Modelfile. This is a breaking change.	2024-12-31 18:02:30 -08:00

13 Commits