ollama

mirror of https://github.com/ollama/ollama.git synced 2025-08-02 07:02:39 +02:00

Author	SHA1	Message	Date
Daniel Hiltgen	82a02e18d9	build: fix typo in override variable (#8031 ) The "F" was missing.	2024-12-10 10:51:16 -08:00
Daniel Hiltgen	4879a234c4	build: Make target improvements (#7499 ) * llama: wire up builtin runner This adds a new entrypoint into the ollama CLI to run the cgo built runner. On Mac arm64, this will have GPU support, but on all other platforms it will be the lowest common denominator CPU build. After we fully transition to the new Go runners more tech-debt can be removed and we can stop building the "default" runner via make and rely on the builtin always. * build: Make target improvements Add a few new targets and help for building locally. This also adjusts the runner lookup to favor local builds, then runners relative to the executable, and finally payloads. * Support customized CPU flags for runners This implements a simplified custom CPU flags pattern for the runners. When built without overrides, the runner name contains the vector flag we check for (AVX) to ensure we don't try to run on unsupported systems and crash. If the user builds a customized set, we omit the naming scheme and don't check for compatibility. This avoids checking requirements at runtime, so that logic has been removed as well. This can be used to build GPU runners with no vector flags, or CPU/GPU runners with additional flags (e.g. AVX512) enabled. * Use relative paths If the user checks out the repo in a path that contains spaces, make gets really confused so use relative paths for everything in-repo to avoid breakage. * Remove payloads from main binary * install: clean up prior libraries This removes support for v0.3.6 and older versions (before the tar bundle) and ensures we clean up prior libraries before extracting the bundle(s). Without this change, runners and dependent libraries could leak when we update and lead to subtle runtime errors.	2024-12-10 09:47:19 -08:00
frob	63269668c0	Prevent underflow when FreeMemory < overhead (#8014 ) Co-authored-by: Richard Lyons <frob@cloudstaff.com>	2024-12-10 09:10:40 -08:00
Jesse Gross	900f64e6be	prompt: Don't trim whitespace from prompts New lines can be an important part of a user's prompt and trimming them can alter the results. We previously only trimmed prompts with images but refactoring brought this behavior to all prompts, where it became more noticeable. The /generate endpoint adds less whitespace and therefore doesn't need to trim it out - this brings the same behavior to /chat. Thanks to @gabe-l-hart for spotting the issue! Fixes #7795	2024-12-09 11:02:55 -08:00
Yannick Gloster	da09488fbf	docs: remove comment regarding tool streaming in openai.md (#7960 )	2024-12-07 22:16:21 -08:00
湛露先生	7f0ccc8a9d	docs: fix syntax error in openai.md (#7986 )	2024-12-07 22:14:36 -08:00
Parth Sareen	de52b6c2f9	bugfix: "null" value json mode (#7979 ) v0.5.1	2024-12-06 14:13:15 -08:00
Michael	acd7d03266	readme: add llama3.3 to readme (#7975 ) readme: add llama3.3 to readme	2024-12-06 14:05:11 -05:00
Parth Sareen	f6e87fd628	docs: update readmes for structured outputs (#7962 )	2024-12-06 10:35:37 -08:00
Jeffrey Morgan	aed1419c64	ci: skip go build for tests (#7899 ) v0.5.0-rc1 v0.5.0	2024-12-04 21:22:36 -08:00
Parth Sareen	c6c526275d	api: add generate endpoint for structured outputs (#7939 )	2024-12-04 17:37:12 -08:00
Parth Sareen	630e7dc6ff	api: structured outputs - chat endpoint (#7900 ) Adds structured outputs to chat endpoint --------- Co-authored-by: Michael Yang <mxyng@pm.me> Co-authored-by: Hieu Nguyen <hieunguyen1053@outlook.com>	2024-12-04 16:31:19 -08:00
Michael Yang	eb8366d658	Merge pull request #7932 from ollama/mxyng/fix-merges v0.4.8-rc0	2024-12-04 10:04:52 -08:00
Michael Yang	4456012956	fix unmarshaling merges	2024-12-04 09:21:56 -08:00
Sam	539be43640	llm: normalise kvct parameter handling (#7926 )	2024-12-03 16:30:40 -08:00
Sam	1bdab9fdb1	llm: introduce k/v context quantization (vRAM improvements) (#6279 )	2024-12-03 15:57:19 -08:00
owboson	2b82c5a8a1	docs: correct default num_predict value in modelfile.md (#7693 )	2024-12-03 15:00:05 -08:00
Tigran	55c3efa900	docs: remove extra quote in modelfile.md (#7908 )	2024-12-02 09:28:56 -08:00
David Mayboroda	1aedffad93	readme: add minima to community integrations (#7906 )	2024-12-02 01:14:47 -08:00
Jeffrey Morgan	ff6c2d6dc8	cmd: don't rely on reading repo file for test (#7898 )	2024-11-30 14:12:53 -08:00
Jeffrey Morgan	d543b282a7	server: add warning message for deprecated context field (#7878 )	2024-11-30 14:05:50 -08:00
Parth Sareen	5f8051180e	Enable index tracking for tools - openai api support (#7888 ) v0.4.7	2024-11-29 20:00:09 -08:00
Jeffrey Morgan	39e29ae5dd	llama: fix typo and formatting in readme (#7876 )	2024-11-28 17:27:11 -08:00
TheCookingSenpai	30a9f063c9	readme: add SpaceLlama, YouLama, and DualMind to community integrations (#7216 )	2024-11-28 15:16:27 -08:00
Parth Sareen	ce7455a8e1	api: enable tool streaming (#7836 ) v0.4.6	2024-11-27 13:40:57 -08:00
ItzCrazyKns	e3936d4fb3	Support Multiple LoRa Adapters (#7667 ) Closes #7627	2024-11-27 11:00:04 -08:00
Bruce MacDonald	940e62772e	openai: remove unused error code (#7850 ) The writeError takes a code argument which is no longer used. Remove it for clarity.	2024-11-26 16:08:09 -08:00
Jesse Gross	71e6a0d0d1	runner.go: Don't try to extract image tags for text models When processing a prompt, we look for image tags of the form [img-0], which are inserted by the Ollama server process. However, this can cause errors if the original prompt has these tags - typically an image not found error is returned. This changes tag searching behavior to be similar to the 0.3.x series, which will largely avoid these problems. However, they can still happen when input text with these tags is used with image models. The correct solution is to escape the tags but this is a larger issue with special sequences in general so this is an incremental fix that should avoid the problem for the majority of cases.	2024-11-26 13:23:24 -08:00
Jesse Gross	2cd11ae365	runner.go: Add unit tests for context shifting This also makes it easier to truncate long inputs the same as shifting but does not actually implement it. This type of truncation has a trade off between quality and time to first token.	2024-11-26 11:21:35 -08:00
jake83741	52bbad12f9	readme: update description for vnc-lm community integration (#7832 )	2024-11-25 17:56:30 -08:00
frob	30e88d7f31	cmd: don't submit svg files as images for now (#7830 )	2024-11-25 16:43:29 -08:00
Blake Mizerany	2b7ed61ca2	server: fix Transport override (#7834 ) This changes makeRequest to update the http client Transport if and only if testMakeRequestDialContext is set. This is to avoid overriding the default Transport when testMakeRequestDialContext is nil, which broke existing behavior, including proxies, timeouts, and other behaviors. Fixes #7829 Fixes #7788 v0.4.5	2024-11-25 15:08:34 -08:00
Shikhar Bakhda	647513a7d4	readme: add HoneyHive to community integrations (#7831 )	2024-11-25 09:55:33 -08:00
Bruce MacDonald	a210ec74d2	cmd: print location of model after pushing (#7695 ) After a user pushes their model it is not clear what to do next. Add a link to the output of `ollama push` that tells the user where their model can now be found.	2024-11-25 09:40:16 -08:00
Simon Schampijer	cfb1ddd6fc	examples: update langchain-python-simple (#3591 ) - better formatting of input prompt - use invoke instead of predict	2024-11-24 16:06:22 -08:00
reid41	3987acd7ec	readme: add descriptions for QA-Pilot and shell-pilot community integrations (#4303 )	2024-11-24 15:55:09 -08:00
frob	fda1e6b563	llm: bring fileTypes into alignment with llama.cpp (#7819 )	2024-11-24 10:33:33 -08:00
Adarsh Mishra	3440ffb37b	readme: add description for OpenTalkGpt in community integrations (#7818 )	2024-11-24 10:32:23 -08:00
Patcher	a820d2b267	readme: add observability section with OpenLIT to community-integrations	2024-11-23 18:03:12 -08:00
Meng Zhuo	2ebdb54fb3	all: update math32 go mod to v1.11.0 (#6627 )	2024-11-23 15:21:54 -08:00
josc146	bb52abfa55	readme: add ChatGPTBox and RWKV-Runner to community integrations (#4118 )	2024-11-23 13:31:27 -08:00
oza6ut0ne	31cb1ca9e5	openai: accept X-Stainless-Retry-Count header (#6910 )	2024-11-23 12:39:05 -08:00
Rodrigo Ribeiro Gomes	78f779a323	readme: add powershai, a powershell module with ollama support to community integrations (#7438 )	2024-11-23 10:08:59 -08:00
Jesse Gross	3478b2cf14	runner.go: Fix deadlock with many concurrent requests If there are no available slots for new sequences then a request will not be added to the processing queue but will continue on to wait for a response that never comes. Besides never giving a response to the request, this prevents the model from being unloaded due to the outstanding request. To prevent this, there are semaphores that prevent more requests from being processed than there are slots - one in the Ollama server and one in the runner. - The Ollama server one works but it is not designed to protect the runner's internal data structures and the runner can return a final response before clearing its data structures. - The internal runner semaphore has similar behavior where it can release the semaphore when it issues a response. This is wrong - it should only release the semaphore after it has cleared the data structure. In addition, we should return an error if a slot is not found rather than deadlocking in the event we ever get to this spot. Fixes #7779 v0.4.4	2024-11-22 16:14:51 -08:00
Bruce MacDonald	7b5585b9cb	server: remove out of date anonymous access check (#7785 ) In the past the ollama.com server would return a JWT that contained information about the user being authenticated. This was used to return different error messages to the user. This is no longer possible since the token used to authenticate does not contain information about the user anymore. Removing this code that no longer works. Follow up changes will improve the error messages returned here, but good to clean up first.	2024-11-22 11:57:35 -08:00
Daniel Hiltgen	f0a351810c	tests: fix max queue integration test (#7782 ) This had fallen out of sync with the envconfig behavior, where max queue default was not zero.	2024-11-22 08:05:45 -08:00
Daniel Hiltgen	b85520bfb9	logs: explain client aborts better (#7783 ) Users get confused by "Failed to acquire semaphore" error="context canceled" messages in the logs, which are actually clients giving up. While there could be a legitimate hang bug in the system, sometimes this is just short client timeouts with an overloaded system, so this should help users understand what's going on better.	2024-11-22 08:05:32 -08:00
Daniel Hiltgen	d88972ea48	Be quiet when redirecting output (#7360 ) This avoids emitting the progress indicators to stderr, and the interactive prompts to the output file or pipe. Running "ollama run model > out.txt" now exits immediately, and "echo hello | ollama run model > out.txt" produces zero stderr output and a typical response in out.txt	2024-11-22 08:04:54 -08:00
Leon Sander	25c9339e2d	readme: add Local Multimodal AI Chat app to community integrations (#6931 )	2024-11-21 20:39:38 -08:00
Mikel Olasagasti Uranga	597072ef1b	readme: update google/uuid module (#7310 ) update uuid.New().String() to uuid.NewString()	2024-11-21 19:37:04 -08:00

3915 Commits