ollama

mirror of https://github.com/ollama/ollama.git synced 2025-07-21 14:32:48 +02:00

Author	SHA1	Message	Date
Patrick Devine	6d1103048e	fix: show correct bool value for kv in verbose show information (#9928 )	2025-03-21 11:13:54 -07:00
Patrick Devine	2c8b484643	fix: correctly save in interactive mode (#9788 ) This fixes the case where a FROM line in previous modelfile points to a file which may/may not be present in a different ollama instance. We shouldn't be relying on the filename though and instead just check if the FROM line was instead a valid model name and point to that instead.	2025-03-15 12:09:02 -07:00
Patrick Devine	4bed739259	add verbose mode to the show command (#9640 ) Add metadata and tensor information to the show command to be able to see more information about a model. This outputs the same data as shown on the model details page on ollama.com	2025-03-13 14:24:27 -07:00
frob	b3af953a55	cli: don't exit for invalid model during /load. (#9576 ) Co-authored-by: Richard Lyons <frob@cloudstaff.com>	2025-03-11 23:42:53 -07:00
Michael Yang	05a01fdecb	ml/backend/ggml: consolidate system info logging - output backend system info when initializing the backend. this ensures this information is always present without needing to be called explicitly - convert to structured logging - enumerate devices rather than backends since devices are ordered - track device indices grouped by device name	2025-03-04 15:14:31 -08:00
Daniel Hiltgen	1fdb351c37	New engine: vision models and auto-fallback (#9113 ) * Include unified vision layers in memory prediction For newer vision models with a single gguf, include the projection estimates. * Adjust CLI to handle both styles of vision model metadata * Wire up new tokenizers for new engine If we're loading the new engine, utilize the new model text processor instead of calling into cgo wrappers for llama.cpp. This also cleans up some tech debt from the older tokenization flow for the C++ server which was no longer used. This also adjusts the grammar handling logic to pass through to the new engine instead of utilizing the cgo schema to grammar call. * Lay foundation for auto selection of new engine	2025-03-04 09:03:46 -08:00
CYJiang	d25efe3954	cmd: add default err return for stop (#9458 )	2025-03-03 12:13:41 -08:00
yuiseki	d721a02e7d	test: add test cases for ListHandler (#9146 )	2025-02-19 13:24:27 -08:00
Jesse Gross	ed443a0393	Runner for Ollama engine This provides integration with the new Ollama engine (`5824541` next ollama runner (#7913)) and the rest of the Ollama infrastructure such as the runner and Ollama server. In addition, it also builds out the KV cache infrastructure to support requirements of how Ollama runs models such as: - Parallel processing - Memory management for defragmentation and shifting - Multi-modal modals Both old and new engines continue to be supported. By default, only the old engine is used. To enable the new engine: Start the server with the OLLAMA_NEW_ENGINE environment variable set: OLLAMA_NEW_ENGINE=1 ./ollama serve Start a model that is supported by the Ollama engine. This one is Llama 3.1 8b Q4_K_M: ./ollama run jessegross/llama3.1	2025-02-13 17:09:26 -08:00
Patrick Devine	a420a453b4	fix default modelfile for create (#8452 )	2025-01-16 01:14:04 -08:00
Patrick Devine	32bd37adf8	make the modelfile path relative for `ollama create` (#8380 )	2025-01-10 16:14:08 -08:00
Patrick Devine	8bccae4f92	show a more descriptive error in the client if it is newer than the server (#8351 )	2025-01-09 10:12:30 -08:00
Patrick Devine	86a622cbdc	Update the /api/create endpoint to use JSON (#7935 ) Replaces `POST /api/create` to use JSON instead of a Modelfile. This is a breaking change.	2024-12-31 18:02:30 -08:00
Patrick Devine	dd352ab27f	fix crash bug with /save when quotes are used (#8208 )	2024-12-21 22:31:37 -08:00
Blake Mizerany	b1fd7fef86	server: more support for mixed-case model names (#8017 ) Fixes #7944	2024-12-11 15:29:59 -08:00
Daniel Hiltgen	4879a234c4	build: Make target improvements (#7499 ) * llama: wire up builtin runner This adds a new entrypoint into the ollama CLI to run the cgo built runner. On Mac arm64, this will have GPU support, but on all other platforms it will be the lowest common denominator CPU build. After we fully transition to the new Go runners more tech-debt can be removed and we can stop building the "default" runner via make and rely on the builtin always. * build: Make target improvements Add a few new targets and help for building locally. This also adjusts the runner lookup to favor local builds, then runners relative to the executable, and finally payloads. * Support customized CPU flags for runners This implements a simplified custom CPU flags pattern for the runners. When built without overrides, the runner name contains the vector flag we check for (AVX) to ensure we don't try to run on unsupported systems and crash. If the user builds a customized set, we omit the naming scheme and don't check for compatibility. This avoids checking requirements at runtime, so that logic has been removed as well. This can be used to build GPU runners with no vector flags, or CPU/GPU runners with additional flags (e.g. AVX512) enabled. * Use relative paths If the user checks out the repo in a path that contains spaces, make gets really confused so use relative paths for everything in-repo to avoid breakage. * Remove payloads from main binary * install: clean up prior libraries This removes support for v0.3.6 and older versions (before the tar bundle) and ensures we clean up prior libraries before extracting the bundle(s). Without this change, runners and dependent libraries could leak when we update and lead to subtle runtime errors.	2024-12-10 09:47:19 -08:00
Parth Sareen	de52b6c2f9	bugfix: "null" value json mode (#7979 )	2024-12-06 14:13:15 -08:00
Parth Sareen	c6c526275d	api: add generate endpoint for structured outputs (#7939 )	2024-12-04 17:37:12 -08:00
Parth Sareen	630e7dc6ff	api: structured outputs - chat endpoint (#7900 ) Adds structured outputs to chat endpoint --------- Co-authored-by: Michael Yang <mxyng@pm.me> Co-authored-by: Hieu Nguyen <hieunguyen1053@outlook.com>	2024-12-04 16:31:19 -08:00
Sam	1bdab9fdb1	llm: introduce k/v context quantization (vRAM improvements) (#6279 )	2024-12-03 15:57:19 -08:00
Jeffrey Morgan	ff6c2d6dc8	cmd: don't rely on reading repo file for test (#7898 )	2024-11-30 14:12:53 -08:00
frob	30e88d7f31	cmd: don't submit svg files as images for now (#7830 )	2024-11-25 16:43:29 -08:00
Bruce MacDonald	a210ec74d2	cmd: print location of model after pushing (#7695 ) After a user pushes their model it is not clear what to do next. Add a link to the output of `ollama push` that tells the user where their model can now be found.	2024-11-25 09:40:16 -08:00
Bruce MacDonald	7b5585b9cb	server: remove out of date anonymous access check (#7785 ) In the past the ollama.com server would return a JWT that contained information about the user being authenticated. This was used to return different error messages to the user. This is no longer possible since the token used to authenticate does not contain information about the user anymore. Removing this code that no longer works. Follow up changes will improve the error messages returned here, but good to clean up first.	2024-11-22 11:57:35 -08:00
Daniel Hiltgen	d88972ea48	Be quiet when redirecting output (#7360 ) This avoids emitting the progress indicators to stderr, and the interactive prompts to the output file or pipe. Running "ollama run model > out.txt" now exits immediately, and "echo hello \| ollama run model > out.txt" produces zero stderr output and a typical response in out.txt	2024-11-22 08:04:54 -08:00
湛露先生	eaaf5d309d	cmd: delete duplicated call to sb.Reset() (#7308 ) Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>	2024-11-21 11:20:48 -08:00
Blake Mizerany	67691e410d	cmd: preserve exact bytes when displaying template/system layers (#7586 )	2024-11-13 23:53:30 -08:00
Daniel Hiltgen	35ec7f079f	Fix unicode output on windows with redirect to file (#7358 ) If we're not writing out to a terminal, avoid setting the console mode on windows, which corrupts the output file.	2024-10-25 13:43:16 -07:00
Patrick Devine	d78fb62056	default to "FROM ." if a Modelfile isn't present (#7250 )	2024-10-22 13:32:24 -07:00
Patrick Devine	c7cb0f0602	image processing for llama3.2 (#6963 ) Co-authored-by: jmorganca <jmorganca@gmail.com> Co-authored-by: Michael Yang <mxyng@pm.me> Co-authored-by: Jesse Gross <jesse@ollama.com>	2024-10-18 16:12:35 -07:00
Jesse Gross	7fe3902552	cli: Send all images in conversation history Currently the CLI only sends images from the most recent image- containing message. This prevents doing things like sending one message with an image and then a follow message with a second image and asking for comparision based on additional information not present in any text that was output. It's possible that some models have a problem with this but the CLI is not the right place to do this since any adjustments are model-specific and should affect all clients. Both llava:34b and minicpm-v do reasonable things with multiple images in the history.	2024-10-10 11:21:51 -07:00
Alex Mavrogiannis	f40bb398f6	Stop model before deletion if loaded (fixed #6957 ) (#7050 )	2024-10-01 15:45:43 -07:00
Patrick Devine	abed273de3	add "stop" command (#6739 )	2024-09-11 16:36:21 -07:00
Michael Yang	ecab6f1cc5	refactor show ouput fixes line wrapping on long texts	2024-09-11 14:23:09 -07:00
Daniel Hiltgen	6719097649	llm: make load time stall duration configurable via OLLAMA_LOAD_TIMEOUT With the new very large parameter models, some users are willing to wait for a very long time for models to load.	2024-09-05 14:00:08 -07:00
Daniel Hiltgen	b05c9e83d9	Introduce GPU Overhead env var (#5922 ) Provide a mechanism for users to set aside an amount of VRAM on each GPU to make room for other applications they want to start after Ollama, or workaround memory prediction bugs	2024-09-05 13:46:35 -07:00
Vimal Kumar	5f7b4a5e30	fix(cmd): show info may have nil ModelInfo (#6579 )	2024-08-31 21:12:17 -07:00
Patrick Devine	0c819e167b	convert safetensor adapters into GGUF (#6327 )	2024-08-23 11:29:56 -07:00
Michael Yang	beb49eef65	create bert models from cli	2024-08-20 17:27:34 -07:00
longtao	0a8d6ea86d	Fix typo and improve readability (#5964 ) * Fix typo and improve readability Summary: * Rename updatAvailableMenuID to updateAvailableMenuID * Replace unused cmd parameter with _ in RunServer function * Fix typos in comments (cherry picked from commit 5b8715f0b04773369e8eb1f9e6737995a0ab3ba7) * Update api/client.go Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com> --------- Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>	2024-08-13 17:54:19 -07:00
Josh	f7e3b9190f	cmd: spinner progress for transfer model data (#6100 )	2024-08-12 11:46:32 -07:00
Michael Yang	b732beba6a	lint	2024-08-01 17:06:06 -07:00
Michael Yang	c4c84b7a0d	Merge pull request #5196 from ollama/mxyng/messages-2 include modelfile messages	2024-07-31 10:18:17 -07:00
Michael Yang	5c1912769e	Merge pull request #5473 from ollama/mxyng/environ fix: environ lookup	2024-07-31 10:18:05 -07:00
Daniel Hiltgen	1a83581a8e	Merge pull request #5895 from dhiltgen/sched_faq Better explain multi-gpu behavior	2024-07-29 14:25:41 -07:00
Michael Yang	38d9036b59	Merge pull request #5992 from ollama/mxyng/save fix: model save	2024-07-29 09:53:19 -07:00
Tibor Schmidt	f3d7a481b7	feat: add support for min_p (resolve #1142 ) (#1825 )	2024-07-27 14:37:40 -07:00
Michael Yang	a250c2cb13	display messages	2024-07-26 13:39:57 -07:00
Michael Yang	3d9de805b7	fix: model save stop parameter is saved as a slice which is incompatible with modelfile parsing	2024-07-26 13:23:06 -07:00
Michael Yang	15af558423	include modelfile messages	2024-07-26 11:40:11 -07:00

1 2 3 4 5 ...

300 Commits