ollama

mirror of https://github.com/ollama/ollama.git synced 2025-06-27 08:11:03 +02:00

Author	SHA1	Message	Date
Jesse Gross	c2f5d6662b	ollamarunner: Re-enable worst case graph preallocation. Worst case graph preallocation was disabled by a27462b "ollamarunner: Temporarily disable worst case graph preallocation" since it caused crashes with large batches when not using the GPU. This backports upstream llama.cpp commit f057808 "ggml: Don't assert fail when tensor data changes (#13222)", which fixes the underlying bug and allows reverting the previous workaround.	2025-05-02 12:22:47 -07:00
Harsh Nevse	57fb759f3c	readme: update link to langchain in community integrations (#10465 )	2025-05-01 23:08:51 -07:00
Jeffrey Morgan	8dd12c873d	llama: update to commit e1e8e099 (#10513 )	2025-05-01 18:24:09 -07:00
frob	e6d2d04121	image: add vision capability for projector-based models (#10509 ) Co-authored-by: Richard Lyons <frob@cloudstaff.com>	2025-05-01 16:50:20 -07:00
Jesse Gross	074bac8447	kvcache: Log batch size if we can't find a slot In some cases, we can't find a cache slot when using sliding window attention. It would be helpful in this (and other cases) to know what the batch size is. Bug #10127	2025-05-01 16:26:36 -07:00
Jesse Gross	8e8f2c6d67	ollamarunner: Fix memory leak when processing images The context (and therefore associated input tensors) was not being properly closed when images were being processed. We were trying to close them but in reality we were closing over an empty list, preventing anything from actually being freed. Fixes #10434	2025-05-01 15:15:24 -07:00
AliAhmedNada	938e8447e8	readme: add Jirapt project to community integrations (#10522 )	2025-05-01 14:49:47 -07:00
aritra saha	d5d5f0c445	readme: change granite3.2 to granite3.3 (#10525 ) Update the list for readme	2025-05-01 14:46:09 -07:00
Michael Yang	a7835c6716	fix: write gguf padding (#10510 ) * add gguf_test * fix padding padding was being added to offset but not to the running count	2025-04-30 17:59:31 -07:00
Devon Rifkin	ad3c7c9bda	strip out thinking tags in message history for qwen3 & r1 (#10490 ) * strip out thinking tags in message history for qwen3 & r1 This is in advance of "proper" support where we'll make reasoning configurable and we'll parse out thinking/reasoning tags and provide them to the caller. These models expect there to be no thinking tags in the message history, so this should improve quality * parse model names instead of hacky prefix check	2025-04-30 13:57:45 -07:00
Daniel Hiltgen	415c8fcc3d	Fix "Stopping..." scheduler hang (#10487 ) * Adjust initial scheduler refCount Ensure we only set the refCount on success * sched: fix lock order inversion deadlock Under certain race conditions, there was a scenario where the scheduler would get into a deadlock while trying to update free space information while a model was trying to unload.	2025-04-30 11:26:52 -07:00
Daniel Hiltgen	718eda1b3e	Narrow set of paths we load GGML from (#10485 ) Users may have other incompatible GGML installs on their systems. This will prevent us from trying to load them from the path.	2025-04-30 11:25:22 -07:00
Shahin R	421b7edeb4	readme: add link to lumina, a lightweight React frontend client (#10378 )	2025-04-30 09:50:47 -07:00
batuhankadioglu	7b68e254c2	all: update several golang.org/x packages (#10436 )	2025-04-29 16:51:09 -07:00
Daniel Hiltgen	7bec2724a5	integration: fix embedding tests error handling (#10478 ) The cleanup routine from InitServerconnection should run in the defer of the test case to properly detect failures and report the server logs	2025-04-29 11:57:54 -07:00
Jesse Gross	a27462b708	ollamarunner: Temporarily disable worst case graph preallocation When we later have a large batch running purely on a CPU, this results the error: GGML_ASSERT(talloc->buffer_id >= 0) Disabling this means that we will incrementally reallocate memory as the graph grows. Fixes #10410	2025-04-29 11:04:58 -07:00
crStiv	6bf0b8193a	readme: fix typos (#10399 )	2025-04-29 10:30:44 -07:00
Devon Rifkin	db428adbb8	Merge pull request #10468 from ollama/drifkin/num-parallel-1	2025-04-29 10:21:36 -07:00
Devon Rifkin	fe5b9bb21b	lower default num parallel to 2 this is in part to "pay" for #10452, which doubled the default context length. The combination isn't fully neutral though, because even though the old 4x2k limit and the new 2x4k limit are memory equivalent, the 1x fallback is larger with 4k	2025-04-29 02:04:14 -07:00
Devon Rifkin	6ec71d8fb6	Merge pull request #10452 from ollama/drifkin/4096-context-length config: update default context length to 4096	2025-04-28 17:13:51 -07:00
Devon Rifkin	44b466eeb2	config: update default context length to 4096	2025-04-28 17:03:27 -07:00
Devon Rifkin	a25f3f8260	Merge pull request #10451 from ollama/revert-10364-drifkin/context-length Revert "increase default context length to 4096"	2025-04-28 17:02:10 -07:00
Devon Rifkin	dd93e1af85	Revert "increase default context length to 4096 (#10364 )" This reverts commit 424f648632c925ce14a75018c4dcab395e035993.	2025-04-28 16:54:11 -07:00
Michael Yang	5cfc1c39f3	model: fix build (#10416 ) v0.6.7-rc0	2025-04-25 19:24:48 -07:00
Michael Yang	f0ad49ea17	memory	2025-04-25 16:59:20 -07:00
Michael Yang	7ba9fa9c7d	fixes for maverick	2025-04-25 16:59:20 -07:00
Michael Yang	8bf11b84c1	chunked attention	2025-04-25 16:59:20 -07:00
Michael Yang	470af8ab89	connect vision to text	2025-04-25 16:59:20 -07:00
Michael Yang	178761aef3	image processing Co-authored-by: Patrick Devine <patrick@infrahq.com>	2025-04-25 16:59:20 -07:00
Michael Yang	f0c66e6dea	llama4	2025-04-25 16:59:20 -07:00
Michael Yang	54055a6dae	fix test	2025-04-25 16:59:01 -07:00
Michael Yang	340448d2d1	explicitly decode maxarraysize 1024	2025-04-25 16:59:01 -07:00
Michael Yang	ced7d0e53d	fix parameter count	2025-04-25 16:59:01 -07:00
Michael Yang	a0dba0f8ae	default slice values	2025-04-25 16:59:01 -07:00
Michael Yang	5e20b170a7	update comment	2025-04-25 16:59:01 -07:00
Michael Yang	d26c18e25c	fix token type	2025-04-25 16:59:01 -07:00
Michael Yang	8d376acc9b	zero means zero use a default of 1024 when asking for zero is confusing since most calls seem to assume 0 means do not ready any data	2025-04-25 16:59:01 -07:00
Michael Yang	dc1e81f027	convert: use -1 for read all	2025-04-25 16:59:01 -07:00
Michael Yang	5d0279164c	generic ggml.array	2025-04-25 16:59:01 -07:00
Michael Yang	214a7678ea	fix superfluous call to WriteHeader the first call to http.ResponseWriter.Write implicitly calls WriteHeader with http.StatusOK if it hasn't already been called. once WriteHeader has been called, subsequent calls has no effect. Write is called when JSON encoding progressUpdateJSON{}. calls to http.ResponseWriter.WriteHeader after the first encode is useless and produces a warning: http: superfluous response.WriteHeader call from github.com/ollama/ollama/server/internal/registry.(*statusCodeRecorder).WriteHeader (server.go:77)	2025-04-25 16:58:49 -07:00
Michael Yang	4892872c18	convert: change to colmajor	2025-04-25 15:27:39 -07:00
Michael Yang	0b9198bf47	ci: silence deprecated gpu targets warning	2025-04-25 13:37:54 -07:00
Jeffrey Morgan	e9e5f61c45	llama: update to commit 2016f07b (#10352 )	2025-04-24 17:26:02 -07:00
Parth Sareen	11dde41824	server: improve spacing for JSON grammar (#10131 )	2025-04-24 16:47:57 -07:00
Parth Sareen	a53d744b01	llama: remove model loading for grammar (#10096 )	2025-04-24 11:51:19 -07:00
Adrien Duermael	40b10eee6d	api: fix ImageData struct comment to expect raw image bytes (#10386 )	2025-04-24 12:13:51 +09:00
Devon Rifkin	424f648632	increase default context length to 4096 (#10364 ) * increase default context length to 4096 We lower the default numParallel from 4 to 2 and use these "savings" to double the default context length from 2048 to 4096. We're memory neutral in cases when we previously would've used numParallel == 4, but we add the following mitigation to handle some cases where we would have previously fallen back to 1x2048 due to low VRAM: we decide between 2048 and 4096 using a runtime check, choosing 2048 if we're on a one GPU system with total VRAM of <= 4 GB. We purposefully don't check the available VRAM because we don't want the context window size to change unexpectedly based on the available VRAM. We plan on making the default even larger, but this is a relatively low-risk change we can make to quickly double it. * fix tests add an explicit context length so they don't get truncated. The code that converts -1 from being a signal for doing a runtime check isn't running as part of these tests. * tweak small gpu message * clarify context length default also make it actually show up in `ollama serve --help`	2025-04-22 16:33:24 -07:00
Richard Shiue	2eb1fb3231	readme: add AppFlowy to community integrations (#10335 )	2025-04-20 15:38:06 -07:00
greengrass821	0806521642	cmd: add support for escaping ~ in filepath (#10339 ) Co-authored-by: tooth paste <tooth_paste91@Poorneshwars-MacBook-Pro.local>	2025-04-20 15:21:48 -07:00
Michael Yang	88738b357b	create tempdir in models directory the models directory should have plenty of storage and also ensure there's no cross-device copy v0.6.6	2025-04-18 18:13:05 -07:00

1 2 3 4 5 ...

4223 Commits