ollama

mirror of https://github.com/ollama/ollama.git synced 2025-08-25 20:51:11 +02:00

Author	SHA1	Message	Date
Michael Yang	ef202789fa	fix pixel values padding (#10718) * panic if trying to pad 4d * fix pixel values padding	2025-05-15 13:44:44 -07:00
Michael Yang	55760195e6	fix mllama conversion (#10716) cross attention Q and K projections need to have their heads swapped, similar to non-cross attention Q and K tensors	2025-05-15 12:15:01 -07:00
Bruce MacDonald	bd68d3ae50	ggml: update qwen25vl vision size estimate (#10711)	2025-05-14 16:42:30 -07:00
Daniel Hiltgen	ff80718e9c	fix crash in old clients with quantization progress (#10710) Older clients assumed the digest was at least 19 characters long so increase the size of the dummy digest to avoid array out of bounds crashes.	2025-05-14 14:54:18 -07:00
Bruce MacDonald	0aa8b371dd	model: add Qwen2.5-VL support (#10385)	2025-05-13 20:58:02 -07:00
Michael Yang	23125648b8	chore: update mllama to use ollama engine (#10637)	2025-05-13 17:36:02 -07:00
tej	0478d440f0	Fixed VRAM over-allocation due to small initial layer sizes. Co-authored-by: Tej Kiran <kiran.tej@amd.com> Co-authored-by: Michael Yang <mxyng@pm.me> Co-authored-by: Tej Kiran <itej89@gmailcom>	2025-05-13 16:42:39 -07:00
Parth Sareen	8cc33f4c2b	llama: fix memory leak for grammar (#10696)	2025-05-13 15:39:27 -07:00
Jeffrey Morgan	f46df4e5d2	llama: fix defrag patch to defragment when no slots are available (#10695)	2025-05-13 14:02:08 -07:00
Daniel Hiltgen	c6bcdc4223	Revert "remove cuda v11 (#10569)" (#10692) Bring back v11 until we can better warn users that their driver is too old. This reverts commit `fa393554b9`.	2025-05-13 13:12:54 -07:00
Jeffrey Morgan	4b903f088a	llama: fix crash on snowflake embedding model (#10690)	2025-05-13 13:11:11 -07:00
Jeffrey Morgan	c7f4ae7b9c	server: add webp image input support (#10653)	2025-05-12 20:41:42 -07:00
Michael Yang	526b2ed102	fix vocabulary (#10679)	2025-05-12 17:29:46 -07:00
Bruce MacDonald	a7240c6d63	models: remove unused qwen2vl processing (#10677)	2025-05-12 16:08:42 -07:00
Daniel Hiltgen	9d6df90805	Follow up to #10363 (#10647) The quantization PR didn't block all unsupported file types, which this PR fixes. It also updates the API docs to reflect the now reduced set of supported types.	2025-05-12 15:23:31 -07:00
Jeffrey Morgan	0cefd46f23	llama: update to commit de4c07f93 (#10655)	2025-05-12 12:17:26 -07:00
Bruce MacDonald	ad035ad595	convert: quantize from safetensors needs kv (#10675) When creating a quantized model from safetensors we need the array KV values to be loaded. Changing this value to -1 loads the KV values on the returned layer to be used and saved during quantization.	2025-05-12 12:04:20 -07:00
Michael Yang	f95a1f2bef	feat: add trace log level (#10650) reduce prompt log to trace level	2025-05-12 11:43:00 -07:00
HardCodeDev	82a9e9462a	readme: add UnityCodeLama to community integrations (#10665)	2025-05-11 13:44:51 -07:00
HardCodeDev	76724e2f29	readme: add OllamaPlusPlus C++ library to community integrations (#10664)	2025-05-11 13:40:41 -07:00
frob	ecf14a220f	llama: allocate grammar buffer based on schema length (#10649)	2025-05-10 11:57:30 -07:00
frob	69ce44b33c	envconfig: Remove no longer supported max vram var (#10623) Co-authored-by: Richard Lyons <frob@cloudstaff.com>	2025-05-10 11:31:04 -07:00
Michael Yang	5969674cf1	feat: add threshold to dump options (#10639) ml.Dump will preserve default values if not specified	2025-05-10 11:27:15 -07:00
AliAhmedNada	867d75b21e	readme: add ojira to community integrations (#10648)	2025-05-10 10:36:40 -07:00
Bruce MacDonald	3fa78598a1	cmd: strip single quotes from image page (#10636)	2025-05-09 18:05:43 -07:00
Michael Yang	0d6e35d3c6	fix: stream accumulator exits early (#10593) the stream accumulator exits as soon as it sees `api.ProgressResponse(status="success")`, which isn't strictly correct since some requests may have multiple successes, e.g. `/api/create` when the source model needs to be pulled.	2025-05-08 13:17:30 -07:00
Devon Rifkin	20c5fd39c8	Merge branch 'main' into drifkin/array-head-count-simple	2025-05-08 11:46:52 -07:00
Michael Yang	6e9a7a2568	lint: enable usetesting, disable tenv (#10594)	2025-05-08 11:42:14 -07:00
Michael Yang	b585a58121	chore: remove unused ZipReader type (#10621)	2025-05-08 11:17:41 -07:00
Jeffrey Morgan	fa9973cd7f	api: remove unused sampling parameters (#10581)	2025-05-08 08:31:08 -07:00
Jesse Gross	3d9498a425	ollamarunner: Use correct constant to remove cache entries The correct constant to remove all entries to the end of the sequence for the Ollama engine is math.MaxInt32. -1 is used by the old engine. The impact of this is currently minimal because it would only occur in situations that are not supported by the implemented models or rarely used options.	2025-05-07 17:26:15 -07:00
Daniel Hiltgen	3098c8b29b	CI: trigger downstream release process (#10508)	2025-05-07 10:35:12 -07:00
Daniel Hiltgen	5e380c3b42	sched: fix race leading to orphaned runners (#10599) If a model is loading, and the request context is canceled during the load by a client closing the connection, and another request is inbound for the same model with a different configuration (context size, etc.) thus requiring a reload, two unload events can be in flight. The first shuts down the original model load, but the second one caused the loss of the new reloading runner reference, thus triggering the leak. The primary fix is detecting the duplicate unload and ignoring the second instance. The load routine is also hardened to ensure we detect clobbering an already present runner and unload it with a warning.	2025-05-07 09:38:17 -07:00
Jeffrey Morgan	392de84031	api: remove unused RetrieveModelResponse type (#10603)	2025-05-06 23:08:03 -07:00
Daniel Hiltgen	af31ccefc0	fix data race in WriteGGUF (#10598) err in the go routine should not be shared with the outer scope	2025-05-06 17:36:38 -07:00
Daniel Hiltgen	fa393554b9	remove cuda v11 (#10569) This reduces the size of our Windows installer payloads by ~256M by dropping support for nvidia drivers older than Feb 2023. Hardware support is unchanged. Linux default bundle sizes are reduced by ~600M to 1G.	2025-05-06 17:33:19 -07:00
Aharon Bensadoun	307e3b3e1d	readme: add Flufy to community integrations (#9719)	2025-05-06 14:47:35 -07:00
Devon Rifkin	4090aca97b	server: send 405 instead of 404 for unallowed methods (#10275) Fixes: #5483	2025-05-06 14:45:37 -07:00
Michael Yang	92ce438de0	server: remove internal cmd (#10595)	2025-05-06 13:05:01 -07:00
Daniel Hiltgen	424810450f	Move quantization to new backend (#10363) * Move quantization logic to GGML via new backend This moves the model-aware logic to Go code and calls GGML's quantization code for model creation. * Remove "add model quantizations" This is no longer needed now that quantization is implemented in Go+GGML code directly.	2025-05-06 11:20:48 -07:00
Michael Yang	95e744beeb	discover: fix compiler warnings (#10572)	2025-05-06 10:49:22 -07:00
Jeffrey Morgan	3b2d2c8326	api: remove unused or unsupported api options (#10574) Some options listed in api/types.go are not supported in newer models, or have been deprecated in the past. This is the first of a series of PRs to clean up the API options	2025-05-05 14:54:40 -07:00
Michael Yang	d931ee8f22	create blobs in parallel (#10135) * default max term height * error on out of tree files	2025-05-05 11:59:26 -07:00
Jesse Gross	7073600797	ggml: Reduce log level of "key not found" Most of the time this is not an error.	2025-05-05 11:17:32 -07:00
Daniel Hiltgen	b1c40138da	win: lint fix (#10571)	2025-05-05 11:08:12 -07:00
Ashok Gelal	17466217e5	Hide empty terminal window (#8668) This hides the LlamaServer blank window when chatting outside of the terminal (say like with an app like Msty). This has no other side effects when invoking it the regular way.	2025-05-05 09:06:46 -07:00
Jeffrey Morgan	1703d1472e	server: fix panic when runner.Options is nil (#10566)	2025-05-05 09:01:33 -07:00
Jeffrey Morgan	913905028b	all: fix cgo compiler warnings on windows (#10563)	2025-05-05 08:02:39 -07:00
湛露先生	7e5c8eee5c	file close check and close. (#10554) Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>	2025-05-04 15:37:59 -07:00
Daniel Hiltgen	6a74bba7e7	win: ensure ollama paths come first (#10549) For all search path env vars make sure our dirs are first to avoid potentially finding other incompatible libraries on the user's system. Also fixes a minor build script glitch for windows rocm	2025-05-03 13:11:48 -07:00
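
Note on af31ccefc0 (fix data race in WriteGGUF): the message says that err in the goroutine should not be shared with the outer scope. The following is a minimal Go sketch of that general pattern, not the actual WriteGGUF code. The names writeChunk and writeAll are hypothetical and exist only for illustration; the sketch assumes Go 1.20 or later for errors.Join.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// writeChunk is a hypothetical stand-in for one unit of concurrent work.
// It fails on empty input so the error path can be exercised.
func writeChunk(chunk string) error {
	if chunk == "" {
		return errors.New("empty chunk")
	}
	return nil
}

// writeAll processes every chunk in its own goroutine.
// Each goroutine declares a local err with := and records the result in
// its own slice slot, so no error variable is written by two goroutines.
func writeAll(chunks []string) error {
	errs := make([]error, len(chunks))
	var wg sync.WaitGroup
	for i, chunk := range chunks {
		wg.Add(1)
		go func(i int, chunk string) {
			defer wg.Done()
			// Assigning to an err declared in the outer function here
			// (err = writeChunk(chunk)) would be a data race.
			if err := writeChunk(chunk); err != nil {
				errs[i] = fmt.Errorf("chunk %d: %w", i, err)
			}
		}(i, chunk)
	}
	wg.Wait()
	// errors.Join returns nil when every element is nil.
	return errors.Join(errs...)
}

func main() {
	fmt.Println(writeAll([]string{"a", "b"}))
	fmt.Println(writeAll([]string{"a", ""}))
}
```

Running a variant that assigns to a shared outer err under `go run -race` is one way to see the kind of report such a fix addresses.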


4428 Commits