ollama

mirror of https://github.com/ollama/ollama.git synced 2025-10-11 05:03:14 +02:00

Author	SHA1	Message	Date
Patrick Devine	8b894933a7	engine: add remote proxy (#12307)	2025-09-17 14:40:53 -07:00
Devon Rifkin	47991940d4	add qwen3-coder tool support The format qwen3-coder uses is relatively unique, both in rendering and in parsing. To implement parsing, I wrote a custom parser in similar style to harmony. For the rendering, I found that the logic would be much more difficult to follow in a template, so I introduced the concept of a built-in renderer that uses go code, rather than a template to generate prompts. I set us up for future built-in parsers and renderers by making it so they can be specified in a Modelfile like so: ``` RENDERER "qwen3-coder" PARSER "qwen3-coder" ``` These need to be provided explicitly because the architecture alone is not enough to understand what format the model expects to receive, and what format we expect it to output (e.g., qwen3-coder is `qwen3moe`, which includes other qwen3-family models as well) I haven't converted harmony to be one of these "built-ins" yet, since some of it is in flux with the changes @ParthSareen has been making to move harmony to the runner. It is likely that many other built-ins will need to move to the runner as well, but I'm able to slightly defer that decision since qwen3-coder doesn't have thinking (and therefore doesn't need to be in the runner to make structured outputs work). I expect to unify harmony with this approach very soon. Whether a particular model supports tools or thinking was previously inferred from templates, but without a template we now also use the parser itself to declare what it supports. If we have future models that re-use the same parsing format, but have different capabilities, we'll want to parameterize them and give them different names to be specified as a `PARSER`. Misc changes: - I worked on the renderer by diffing outputs from the reference implementation and ours. To make it easier to do this, I extended <https://github.com/ollama/ollama/pull/11875> to also support returning the prompt via the openai compat layer	2025-09-15 11:33:47 -07:00
Michael Yang	feb18cd710	feat: add dimensions field to embed requests (#12242) * feat: add field to truncate embeddings * add openai embeddings for dimensions	2025-09-11 10:36:10 -07:00
Michael Yang	1081532430	fix keep alive (#12041)	2025-08-27 11:51:25 -07:00
Jeffrey Morgan	d3450dd52e	api: implement stringer for ToolFunctionParameters (#12038)	2025-08-22 16:26:48 -07:00
Devon Rifkin	8de1da4767	server: add debug option for printing out prompt instead of calling model	2025-08-15 13:52:50 -07:00
Michael Yang	d0cf6c8281	fix(openai): handle reasoning_effort (#11868)	2025-08-12 11:02:01 -07:00
Devon Rifkin	30f8a68c4c	tools: support anyOf types AFAIK gpt-oss is the first model that meaningfully transforms tool function definitions in its template. We found that relatively common definitions that include `anyOf` were not working because the template was assuming that types were always defined via a `type` field. anyOf allows for fully recursive types, so I exposed a `toTypeScriptType()` function to handle this recursive logic in Go and keep the templates cleaner. The gpt-oss templates will need to be updated to use this. We should keep building out our function definition support to more fully support the parts of JSON Schema that make sense for this use case, but in the meantime this will unblock some users (e.g., Zed's Ollama integration with gpt-oss). Probably the most urgent is proper array support.	2025-08-05 16:46:24 -07:00
Michael Yang	fa7776fd24	gpt-oss (#11672) * bf16 * tests * gpt-oss * enable gptoss for engine * rough estimate * convert to mxfp4 * handle safetensors U8 * clamp glu/linear * update tokenizer * MXFP4 support This implements the Open Compute Microscaling (MX) FP4 format as a tensor type with backend implementations focusing on mulmat and mulmatid on CPU, CUDA, and Metal. * Unit tests for MXFP4 support This exercises various operations and shapes on both CPU and GPU (if detected on the system) * cuda graph * unit test adjustments * cuda: optimize memory access Read 4 bytes at a time (8 elements) when performing mul_mat_vec_mxfp4 * mac: fix crash on old macos versions cblas_sgemm is only supported on v13.3 and up; however, bf16 is only supported on v14+, so we were falling back to ggml-blas and crashing on bf16 tensors. Checking for the function being null seems to be the simplest way to conditionally avoid registering the backend. * server: Minimum context length for gptoss This model requires a minimum context length of 8192 to function effectively. Users can set higher values through all normal mechanisms but lower values will be silently reset. * ggml: Multiply by numParallel for gptoss sliding window When computing the graph size estimate, the context size is already multiplied by numParallel so estimates reflect that. However, since sliding window models use a smaller, fixed context size, they need to manually take numParallel into account. * gpt-oss integration includes harmony parser and thinking levels, etc. * fix sync * fix tests * fix lint --------- Co-authored-by: Daniel Hiltgen <daniel@ollama.com> Co-authored-by: Jesse Gross <jesse@ollama.com> Co-authored-by: Devon Rifkin <drifkin@drifkin.net>	2025-08-05 12:21:16 -07:00
Daniel Hiltgen	34088dbcfb	API/CLI context enhancements (#11331) * API: expose context size of loaded models * CLI: add context UX This adds a column in the ps output to show the models' context size.	2025-07-08 11:59:06 -07:00
Parth Sareen	1f91cb0c8c	template: add tool result compatibility (#11294)	2025-07-07 15:53:42 -07:00
Jeffrey Morgan	09d308d6b6	Revert "server: add model capabilities to the list endpoint (#10174)" (#11004) This reverts commit `0943001193`.	2025-06-06 23:29:14 -04:00
JasonHonKL	0943001193	server: add model capabilities to the list endpoint (#10174)	2025-06-04 11:39:48 -07:00
Devon Rifkin	5f57b0ef42	add thinking support to the api and cli (#10584) - Both `/api/generate` and `/api/chat` now accept a `"think"` option that allows specifying whether thinking mode should be on or not - Templates get passed this new option so, e.g., qwen3's template can put `/think` or `/no_think` in the system prompt depending on the value of the setting - Models' thinking support is inferred by inspecting model templates. The prefix and suffix the parser uses to identify thinking support are also automatically inferred from templates - Thinking control & parsing is opt-in via the API to prevent breaking existing API consumers. If the `"think"` option is not specified, the behavior is unchanged from previous versions of ollama - Add parsing for thinking blocks in both streaming/non-streaming mode in both `/generate` and `/chat` - Update the CLI to make use of these changes. Users can pass `--think` or `--think=false` to control thinking, or during an interactive session they can use the commands `/set think` or `/set nothink` - A `--hidethinking` option has also been added to the CLI. This makes it easy to use thinking in scripting scenarios like `ollama run qwen3 --think --hidethinking "my question here"` where you just want to see the answer but still want the benefits of thinking models	2025-05-28 19:38:52 -07:00
Jeffrey Morgan	fa9973cd7f	api: remove unused sampling parameters (#10581)	2025-05-08 08:31:08 -07:00
Jeffrey Morgan	392de84031	api: remove unused RetrieveModelResponse type (#10603)	2025-05-06 23:08:03 -07:00
Jeffrey Morgan	3b2d2c8326	api: remove unused or unsupported api options (#10574) Some options listed in api/types.go are not supported in newer models, or have been deprecated in the past. This is the first of a series of PRs to clean up the API options	2025-05-05 14:54:40 -07:00
Adrien Duermael	40b10eee6d	api: fix ImageData struct comment to expect raw image bytes (#10386)	2025-04-24 12:13:51 +09:00
Tom Sheffler	ef65174df2	types: include the 'items' and '$defs' fields to properly handle "array" types (#10091) --------- Co-authored-by: Parth Sareen <parth.sareen@ollama.com>	2025-04-09 17:45:49 -07:00
Parth Sareen	6747099d71	types: add any type and validation for ToolFunction enum (#10166)	2025-04-08 15:05:38 -07:00
Alex Rozgo	2f723ac2d6	types: allow tool function parameters with a single type or an array of types (#9434)	2025-04-07 14:27:01 -07:00
Bruce MacDonald	9876c9faa4	chore(all): replace instances of interface with any (#10067) Both interface{} and any (which is just an alias for interface{} introduced in Go 1.18) represent the empty interface that all types satisfy.	2025-04-02 09:44:27 -07:00
Bruce MacDonald	e172f095ba	api: return model capabilities from the show endpoint (#10066) With support for multimodal models becoming more varied and common, it is important for clients to be able to easily see what capabilities a model has. Returning these from the show endpoint will allow clients to easily see what a model can do.	2025-04-01 15:21:46 -07:00
Patrick Devine	4bed739259	add verbose mode to the show command (#9640) Add metadata and tensor information to the show command to be able to see more information about a model. This outputs the same data as shown on the model details page on ollama.com	2025-03-13 14:24:27 -07:00
Blake Mizerany	e2252d0fc6	server/internal/registry: take over pulls from server package (#9485) This commit replaces the old pull implementation in the server package with the new, faster, more robust pull implementation in the registry package. The new endpoint, and now the remove endpoint too, are behind the feature gate "client2", enabled only by setting the OLLAMA_EXPERIMENT environment variable to include "client2". Currently, the progress indication is wired to perform the same as the previous implementation to avoid making changes to the CLI, and because the status reports happen at the start of the download and at the end of the write to disk, the progress indication is not as smooth as it could be. This is a known issue and will be addressed in a future change. This implementation may be ~0.5-1.0% slower in rare cases, depending on network and disk speed, but is generally MUCH faster and more robust than its predecessor in all other cases.	2025-03-05 14:48:18 -08:00
Parth Sareen	314573bfe8	config: allow setting context length through env var (#8938) * envconfig: allow setting context length through env var	2025-02-24 13:26:35 -08:00
Jeffrey Morgan	1deafd8254	llama: update vendored code to commit 46e3556 (#8308)	2025-01-08 11:22:01 -08:00
Bruce MacDonald	29a8975c66	api: remove unused create fields These fields are deprecated, but specifying them will not do anything. They are removed because, unlike the other deprecated fields, which still work, these do not, so they don't match our existing pattern.	2025-01-03 12:03:58 -08:00
Patrick Devine	86a622cbdc	Update the /api/create endpoint to use JSON (#7935) Changes `POST /api/create` to accept JSON instead of a Modelfile. This is a breaking change.	2024-12-31 18:02:30 -08:00
Jeffrey Morgan	527cc97899	llama: update vendored code to commit 40c6d79f (#7875)	2024-12-10 19:21:34 -08:00
Parth Sareen	c6c526275d	api: add generate endpoint for structured outputs (#7939)	2024-12-04 17:37:12 -08:00
Parth Sareen	630e7dc6ff	api: structured outputs - chat endpoint (#7900) Adds structured outputs to chat endpoint --------- Co-authored-by: Michael Yang <mxyng@pm.me> Co-authored-by: Hieu Nguyen <hieunguyen1053@outlook.com>	2024-12-04 16:31:19 -08:00
Parth Sareen	5f8051180e	Enable index tracking for tools - openai api support (#7888)	2024-11-29 20:00:09 -08:00
Evan	d48c1c5a44	api: fix typos in Go Doc comments (#7620)	2024-11-11 16:21:58 -08:00
Jesse Gross	a909417602	runner.go: Remove unused arguments Now that server.cpp is gone, we don't need to keep passing arguments that were only ignored and only kept for compatibility.	2024-11-06 13:32:18 -08:00
Michael Yang	8e6da3cbc5	update deprecated warnings	2024-08-28 09:55:11 -07:00
Chua Chee Seng	d4a7216c82	Fixed the problem where providing an invalid option did not display the invalid option's name. (#6202)	2024-08-06 14:37:16 -04:00
Daniel Hiltgen	f457d63400	Implement Linux NUMA detection If the system has multiple NUMA nodes, enable NUMA support in llama.cpp. If we detect numactl in the path, use that; otherwise use the basic "distribute" mode.	2024-08-05 12:56:20 -07:00
royjhan	1b44d873e7	Add Metrics to `/api/embed` response (#5709) * add prompt tokens to embed response * rm slog * metrics * types * prompt n * clean up * reset submodule * update tests * test name * list metrics	2024-07-30 13:12:21 -07:00
Jeffrey Morgan	46e6327e0f	api: add stringifier for `Tool` (#5891)	2024-07-29 13:35:16 -07:00
Tibor Schmidt	f3d7a481b7	feat: add support for min_p (resolve #1142) (#1825)	2024-07-27 14:37:40 -07:00
Jeffrey Morgan	84e5721f3a	always provide content even if empty (#5778)	2024-07-18 11:28:19 -07:00
Michael Yang	b255445557	marshal json automatically for some template values (#5758)	2024-07-17 15:35:11 -07:00
Michael Yang	c279f96371	remove ToolCall from GenerateResponse	2024-07-16 15:22:49 -07:00
Michael Yang	499e87c9ba	Merge pull request #5730 from ollama/mxyng/cleanup remove unneeded tool calls	2024-07-16 14:42:13 -07:00
Michael Yang	d290e87513	add suffix support to generate endpoint this change is triggered by the presence of "suffix", particularly useful for code completion tasks	2024-07-16 14:31:35 -07:00
Michael Yang	5a83f79afd	remove unneeded tool calls	2024-07-16 13:48:45 -07:00
Michael Yang	64039df6d7	Merge pull request #5284 from ollama/mxyng/tools tools	2024-07-15 18:03:37 -07:00
Jeffrey Morgan	7ac6d462ec	server: return empty slice on empty `/api/embed` request (#5713) * server: return empty slice on empty `/api/embed` request * fix tests	2024-07-15 17:39:44 -07:00
Michael Yang	d02bbebb11	tools	2024-07-15 15:26:16 -07:00


171 Commits