ollama

mirror of https://github.com/ollama/ollama.git synced 2025-07-29 05:33:33 +02:00

Author	SHA1	Message	Date
Chua Chee Seng	d4a7216c82	Fixed invalid option provided not displaying the invalid option name problem. (#6202)	2024-08-06 14:37:16 -04:00
Daniel Hiltgen	f457d63400	Implement linux NUMA detection If the system has multiple numa nodes, enable numa support in llama.cpp If we detect numactl in the path, use that, else use the basic "distribute" mode.	2024-08-05 12:56:20 -07:00
Michael Yang	b732beba6a	lint	2024-08-01 17:06:06 -07:00
Michael Yang	5c1912769e	Merge pull request #5473 from ollama/mxyng/environ fix: environ lookup	2024-07-31 10:18:05 -07:00
royjhan	1b44d873e7	Add Metrics to `api\embed` response (#5709) * add prompt tokens to embed response * rm slog * metrics * types * prompt n * clean up * reset submodule * update tests * test name * list metrics	2024-07-30 13:12:21 -07:00
Jeffrey Morgan	46e6327e0f	api: add stringifier for `Tool` (#5891)	2024-07-29 13:35:16 -07:00
Tibor Schmidt	f3d7a481b7	feat: add support for min_p (resolve #1142) (#1825)	2024-07-27 14:37:40 -07:00
Michael Yang	1954ec5917	uint64	2024-07-22 11:49:02 -07:00
Michael Yang	4f1afd575d	host	2024-07-22 11:25:30 -07:00
Jeffrey Morgan	84e5721f3a	always provide content even if empty (#5778)	2024-07-18 11:28:19 -07:00
Michael Yang	b255445557	marshal json automatically for some template values (#5758)	2024-07-17 15:35:11 -07:00
Michael Yang	c279f96371	remove ToolCall from GenerateResponse	2024-07-16 15:22:49 -07:00
Michael Yang	499e87c9ba	Merge pull request #5730 from ollama/mxyng/cleanup remove unneeded tool calls	2024-07-16 14:42:13 -07:00
Michael Yang	d290e87513	add suffix support to generate endpoint this change is triggered by the presence of "suffix", particularly useful for code completion tasks	2024-07-16 14:31:35 -07:00
Michael Yang	5a83f79afd	remove unneeded tool calls	2024-07-16 13:48:45 -07:00
Michael Yang	64039df6d7	Merge pull request #5284 from ollama/mxyng/tools tools	2024-07-15 18:03:37 -07:00
Jeffrey Morgan	7ac6d462ec	server: return empty slice on empty `/api/embed` request (#5713) * server: return empty slice on empty `/api/embed` request * fix tests	2024-07-15 17:39:44 -07:00
Michael Yang	d02bbebb11	tools	2024-07-15 15:26:16 -07:00
Jeffrey Morgan	9e35d9bbee	server: lowercase roles for compatibility with clients (#5695)	2024-07-15 13:55:57 -07:00
royjhan	b9f5e16c80	Introduce `/api/embed` endpoint supporting batch embedding (#5127) * Initial Batch Embedding * Revert "Initial Batch Embedding" This reverts commit `c22d54895a`. * Initial Draft * mock up notes * api/embed draft * add server function * check normalization * clean up * normalization * playing around with truncate stuff * Truncation * Truncation * move normalization to go * Integration Test Template * Truncation Integration Tests * Clean up * use float32 * move normalize * move normalize test * refactoring * integration float32 * input handling and handler testing * Refactoring of legacy and new * clear comments * merge conflicts * touches * embedding type 64 * merge conflicts * fix hanging on single string * refactoring * test values * set context length * clean up * testing clean up * testing clean up * remove function closure * Revert "remove function closure" This reverts commit `55d48c6ed1`. * remove function closure * remove redundant error check * clean up * more clean up * clean up	2024-07-15 12:14:24 -07:00
Patrick Devine	057d31861e	remove template (#5655)	2024-07-13 20:56:24 -07:00
Daniel Hiltgen	ccd7785859	Merge pull request #5243 from dhiltgen/modelfile_use_mmap Fix use_mmap for modefiles	2024-07-03 13:59:42 -07:00
royjhan	996bb1b85e	OpenAI: /v1/models and /v1/models/{model} compatibility (#5007) * OpenAI v1 models * Refactor Writers * Add Test Co-Authored-By: Attila Kerekes * Credit Co-Author Co-Authored-By: Attila Kerekes <439392+keriati@users.noreply.github.com> * Empty List Testing * Use Namespace for Ownedby * Update Test * Add back envconfig * v1/models docs * Use ModelName Parser * Test Names * Remove Docs * Clean Up * Test name Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com> * Add Middleware for Chat and List * Testing Cleanup * Test with Fatal * Add functionality to chat test * OpenAI: /v1/models/{model} compatibility (#5028) * Retrieve Model * OpenAI Delete Model * Retrieve Middleware * Remove Delete from Branch * Update Test * Middleware Test File * Function name * Cleanup * Test Update * Test Update --------- Co-authored-by: Attila Kerekes <439392+keriati@users.noreply.github.com> Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>	2024-07-02 11:50:56 -07:00
Daniel Hiltgen	97c9e11768	Switch use_mmap to a pointer type This uses nil as undefined for a cleaner implementation.	2024-07-01 08:44:59 -07:00
Daniel Hiltgen	7e7749224c	Fix use_mmap parsing for modelfiles Add the new tristate parsing logic for the code path for modelfiles, as well as a unit test.	2024-06-21 12:27:19 -07:00
royjhan	fedf71635e	Extend api/show and ollama show to return more model info (#4881) * API Show Extended * Initial Draft of Information Co-Authored-By: Patrick Devine <pdevine@sonic.net> * Clean Up * Descriptive arg error messages and other fixes * Second Draft of Show with Projectors Included * Remove Chat Template * Touches * Prevent wrapping from files * Verbose functionality * Docs * Address Feedback * Lint * Resolve Conflicts * Function Name * Tests for api/show model info * Show Test File * Add Projector Test * Clean routes * Projector Check * Move Show Test * Touches * Doc update --------- Co-authored-by: Patrick Devine <pdevine@sonic.net>	2024-06-19 14:19:02 -07:00
Daniel Hiltgen	171796791f	Adjust mmap logic for cuda windows for faster model load On Windows, recent llama.cpp changes make mmap slower in most cases, so default to off. This also implements a tri-state for use_mmap so we can detect the difference between a user provided value of true/false, or unspecified.	2024-06-17 16:54:30 -07:00
royjhan	89c79bec8c	Add ModifiedAt Field to /api/show (#5033) * Add Mod Time to Show * Error Handling	2024-06-15 20:53:56 -07:00
Patrick Devine	c69bc19e46	move OLLAMA_HOST to envconfig (#5009)	2024-06-12 18:48:16 -04:00
royjhan	4bf1da4944	Separate ListResponse and ModelResponse for api/tags vs api/ps (#4842) * Remove false time fields * Struct Separation for List and Process * Remove Marshaler	2024-06-06 10:11:45 -07:00
Michael Yang	e40145a39d	lint	2024-06-04 11:13:30 -07:00
Michael Yang	c895a7d13f	some gocritic	2024-06-04 11:13:30 -07:00
Patrick Devine	6845988807	Ollama `ps` command for showing currently loaded models (#4327)	2024-05-13 17:17:36 -07:00
Jeffrey Morgan	6602e793c0	Use `--quantize` flag and `quantize` api parameter (#4321) * rename `--quantization` to `--quantize` * backwards * Update api/types.go Co-authored-by: Michael Yang <mxyng@pm.me> --------- Co-authored-by: Michael Yang <mxyng@pm.me>	2024-05-10 13:06:13 -07:00
Bruce MacDonald	c02db93243	omit empty done reason	2024-05-09 16:45:29 -07:00
Bruce MacDonald	cfa84b8470	add done_reason to the api (#4235)	2024-05-09 13:30:14 -07:00
Jeffrey Morgan	d5eec16d23	use model defaults for `num_gqa`, `rope_frequency_base` and `rope_frequency_scale` (#1983)	2024-05-09 09:06:13 -07:00
Eli Bendersky	d77c1c5f9d	api: fill up API documentation (#3596) * api: fill up API documentation Followup for #2878 Now that the documentation is more complete, mention it in the README. Updates #2840 * fix typo/lint * Update README.md Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com> --------- Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>	2024-05-07 16:27:46 -07:00
Jackie Li	af47413dba	Add MarshalJSON to Duration (#3284) --------- Co-authored-by: Patrick Devine <patrick@infrahq.com>	2024-05-06 15:59:18 -07:00
Patrick Devine	9009bedf13	better checking for OLLAMA_HOST variable (#3661)	2024-04-29 19:14:07 -04:00
Jeffrey Morgan	993cf8bf55	llm: limit generation to 10x context size to avoid run on generations (#3918) * llm: limit generation to 10x context size to avoid run on generations * add comment * simplify condition statement	2024-04-25 19:02:30 -04:00
Daniel Hiltgen	34b9db5afc	Request and model concurrency This change adds support for multiple concurrent requests, as well as loading multiple models by spawning multiple runners. The default settings are currently set at 1 concurrent request per model and only 1 loaded model at a time, but these can be adjusted by setting OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.	2024-04-22 19:29:12 -07:00
Cheng	62be2050dd	chore: use errors.New to replace fmt.Errorf will much better (#3789)	2024-04-20 22:11:06 -04:00
Eli Bendersky	ad90b9ab3d	api: start adding documentation to package api (#2878) * api: start adding documentation to package api Updates #2840 * Fix lint typo report	2024-04-10 13:31:55 -04:00
Michael Yang	01114b4526	fix: rope	2024-04-09 16:15:24 -07:00
Michael Yang	9502e5661f	cgo quantize	2024-04-08 15:31:08 -07:00
Michael Yang	e1c9a2a00f	no blob create if already exists	2024-04-08 15:09:48 -07:00
Michael Yang	be517e491c	no rope parameters	2024-04-05 18:05:27 -07:00
Patrick Devine	1b272d5bcd	change `github.com/jmorganca/ollama` to `github.com/ollama/ollama` (#3347)	2024-03-26 13:04:17 -07:00
Patrick Devine	47cfe58af5	Default Keep Alive environment variable (#3094) --------- Co-authored-by: Chris-AS1 <8493773+Chris-AS1@users.noreply.github.com>	2024-03-13 13:29:40 -07:00

185 Commits