291 Commits

Author SHA1 Message Date
Patrick Devine
a420a453b4
fix default modelfile for create (#8452) 2025-01-16 01:14:04 -08:00
Patrick Devine
32bd37adf8
make the modelfile path relative for ollama create (#8380) 2025-01-10 16:14:08 -08:00
Patrick Devine
8bccae4f92
show a more descriptive error in the client if it is newer than the server (#8351) 2025-01-09 10:12:30 -08:00
Patrick Devine
86a622cbdc
Update the /api/create endpoint to use JSON (#7935)
Replaces `POST /api/create` to use JSON instead of a Modelfile.

This is a breaking change.
2024-12-31 18:02:30 -08:00
Patrick Devine
dd352ab27f
fix crash bug with /save when quotes are used (#8208) 2024-12-21 22:31:37 -08:00
Blake Mizerany
b1fd7fef86
server: more support for mixed-case model names (#8017)
Fixes #7944
2024-12-11 15:29:59 -08:00
Daniel Hiltgen
4879a234c4
build: Make target improvements (#7499)
* llama: wire up builtin runner

This adds a new entrypoint into the ollama CLI to run the cgo built runner.
On Mac arm64, this will have GPU support, but on all other platforms it will
be the lowest common denominator CPU build.  After we fully transition
to the new Go runners more tech-debt can be removed and we can stop building
the "default" runner via make and rely on the builtin always.

* build: Make target improvements

Add a few new targets and help for building locally.
This also adjusts the runner lookup to favor local builds, then
runners relative to the executable, and finally payloads.

* Support customized CPU flags for runners

This implements a simplified custom CPU flags pattern for the runners.
When built without overrides, the runner name contains the vector flag
we check for (AVX) to ensure we don't try to run on unsupported systems
and crash.  If the user builds a customized set, we omit the naming
scheme and don't check for compatibility.  This avoids checking
requirements at runtime, so that logic has been removed as well.  This
can be used to build GPU runners with no vector flags, or CPU/GPU
runners with additional flags (e.g. AVX512) enabled.

* Use relative paths

If the user checks out the repo in a path that contains spaces, make gets
really confused so use relative paths for everything in-repo to avoid breakage.

* Remove payloads from main binary

* install: clean up prior libraries

This removes support for v0.3.6 and older versions (before the tar bundle)
and ensures we clean up prior libraries before extracting the bundle(s).
Without this change, runners and dependent libraries could leak when we
update and lead to subtle runtime errors.
2024-12-10 09:47:19 -08:00
Parth Sareen
de52b6c2f9
bugfix: "null" value json mode (#7979) 2024-12-06 14:13:15 -08:00
Parth Sareen
c6c526275d
api: add generate endpoint for structured outputs (#7939) 2024-12-04 17:37:12 -08:00
Parth Sareen
630e7dc6ff
api: structured outputs - chat endpoint (#7900)
Adds structured outputs to chat endpoint
---------

Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Hieu Nguyen <hieunguyen1053@outlook.com>
2024-12-04 16:31:19 -08:00
Sam
1bdab9fdb1
llm: introduce k/v context quantization (vRAM improvements) (#6279) 2024-12-03 15:57:19 -08:00
Jeffrey Morgan
ff6c2d6dc8
cmd: don't rely on reading repo file for test (#7898) 2024-11-30 14:12:53 -08:00
frob
30e88d7f31
cmd: don't submit svg files as images for now (#7830) 2024-11-25 16:43:29 -08:00
Bruce MacDonald
a210ec74d2
cmd: print location of model after pushing (#7695)
After a user pushes their model it is not clear what to do next. Add a link
to the output of `ollama push` that tells the user where their model can now
be found.
2024-11-25 09:40:16 -08:00
Bruce MacDonald
7b5585b9cb
server: remove out of date anonymous access check (#7785)
In the past the ollama.com server would return a JWT that contained
information about the user being authenticated. This was used to return
different error messages to the user. This is no longer possible since the
token used to authenticate does not contain information about the user
anymore. Removing this code that no longer works.

Follow up changes will improve the error messages returned here, but good to
clean up first.
2024-11-22 11:57:35 -08:00
Daniel Hiltgen
d88972ea48
Be quiet when redirecting output (#7360)
This avoids emitting the progress indicators to stderr, and the interactive
prompts to the output file or pipe.  Running "ollama run model > out.txt"
now exits immediately, and "echo hello | ollama run model > out.txt"
produces zero stderr output and a typical response in out.txt
2024-11-22 08:04:54 -08:00
湛露先生
eaaf5d309d
cmd: delete duplicated call to sb.Reset() (#7308)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2024-11-21 11:20:48 -08:00
Blake Mizerany
67691e410d
cmd: preserve exact bytes when displaying template/system layers (#7586) 2024-11-13 23:53:30 -08:00
Daniel Hiltgen
35ec7f079f
Fix unicode output on windows with redirect to file (#7358)
If we're not writing out to a terminal, avoid setting the console mode
on windows, which corrupts the output file.
2024-10-25 13:43:16 -07:00
Patrick Devine
d78fb62056
default to "FROM ." if a Modelfile isn't present (#7250) 2024-10-22 13:32:24 -07:00
Patrick Devine
c7cb0f0602
image processing for llama3.2 (#6963)
Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Jesse Gross <jesse@ollama.com>
2024-10-18 16:12:35 -07:00
Jesse Gross
7fe3902552 cli: Send all images in conversation history
Currently the CLI only sends images from the most recent image-
containing message. This prevents doing things like sending
one message with an image and then a follow message with a
second image and asking for comparision based on additional
information not present in any text that was output.

It's possible that some models have a problem with this but the
CLI is not the right place to do this since any adjustments are
model-specific and should affect all clients.

Both llava:34b and minicpm-v do reasonable things with multiple
images in the history.
2024-10-10 11:21:51 -07:00
Alex Mavrogiannis
f40bb398f6
Stop model before deletion if loaded (fixed #6957) (#7050) 2024-10-01 15:45:43 -07:00
Patrick Devine
abed273de3
add "stop" command (#6739) 2024-09-11 16:36:21 -07:00
Michael Yang
ecab6f1cc5 refactor show ouput
fixes line wrapping on long texts
2024-09-11 14:23:09 -07:00
Daniel Hiltgen
6719097649
llm: make load time stall duration configurable via OLLAMA_LOAD_TIMEOUT
With the new very large parameter models, some users are willing to wait for
a very long time for models to load.
2024-09-05 14:00:08 -07:00
Daniel Hiltgen
b05c9e83d9
Introduce GPU Overhead env var (#5922)
Provide a mechanism for users to set aside an amount of VRAM on each GPU
to make room for other applications they want to start after Ollama, or workaround
memory prediction bugs
2024-09-05 13:46:35 -07:00
Vimal Kumar
5f7b4a5e30
fix(cmd): show info may have nil ModelInfo (#6579) 2024-08-31 21:12:17 -07:00
Patrick Devine
0c819e167b
convert safetensor adapters into GGUF (#6327) 2024-08-23 11:29:56 -07:00
Michael Yang
beb49eef65 create bert models from cli 2024-08-20 17:27:34 -07:00
longtao
0a8d6ea86d
Fix typo and improve readability (#5964)
* Fix typo and improve readability

Summary:
* Rename updatAvailableMenuID to updateAvailableMenuID
* Replace unused cmd parameter with _ in RunServer function
* Fix typos in comments

(cherry picked from commit 5b8715f0b04773369e8eb1f9e6737995a0ab3ba7)

* Update api/client.go

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-08-13 17:54:19 -07:00
Josh
f7e3b9190f
cmd: spinner progress for transfer model data (#6100) 2024-08-12 11:46:32 -07:00
Michael Yang
b732beba6a lint 2024-08-01 17:06:06 -07:00
Michael Yang
c4c84b7a0d
Merge pull request #5196 from ollama/mxyng/messages-2
include modelfile messages
2024-07-31 10:18:17 -07:00
Michael Yang
5c1912769e
Merge pull request #5473 from ollama/mxyng/environ
fix: environ lookup
2024-07-31 10:18:05 -07:00
Daniel Hiltgen
1a83581a8e
Merge pull request #5895 from dhiltgen/sched_faq
Better explain multi-gpu behavior
2024-07-29 14:25:41 -07:00
Michael Yang
38d9036b59
Merge pull request #5992 from ollama/mxyng/save
fix: model save
2024-07-29 09:53:19 -07:00
Tibor Schmidt
f3d7a481b7
feat: add support for min_p (resolve #1142) (#1825) 2024-07-27 14:37:40 -07:00
Michael Yang
a250c2cb13 display messages 2024-07-26 13:39:57 -07:00
Michael Yang
3d9de805b7 fix: model save
stop parameter is saved as a slice which is incompatible with modelfile
parsing
2024-07-26 13:23:06 -07:00
Michael Yang
15af558423 include modelfile messages 2024-07-26 11:40:11 -07:00
Daniel Hiltgen
830fdd2715 Better explain multi-gpu behavior 2024-07-23 15:16:38 -07:00
Michael Yang
55cd3ddcca bool 2024-07-22 11:27:21 -07:00
Michael Yang
4f1afd575d host 2024-07-22 11:25:30 -07:00
Daniel Hiltgen
cc269ba094 Remove no longer supported max vram var
The OLLAMA_MAX_VRAM env var was a temporary workaround for OOM
scenarios.  With Concurrency this was no longer wired up, and the simplistic
value doesn't map to multi-GPU setups.  Users can still set `num_gpu`
to limit memory usage to avoid OOM if we get our predictions wrong.
2024-07-22 09:08:11 -07:00
Patrick Devine
057d31861e
remove template (#5655) 2024-07-13 20:56:24 -07:00
Patrick Devine
23ebbaa46e Revert "remove template from tests"
This reverts commit 9ac0a7a50b8d7a0f0627b037c7632181bfbcca97.
2024-07-12 15:47:17 -07:00
Patrick Devine
9ac0a7a50b remove template from tests 2024-07-12 15:41:31 -07:00
royjhan
5f034f5b63
Include Show Info in Interactive (#5342) 2024-06-28 13:15:52 -07:00
royjhan
b910fa9010
Ollama Show: Check for Projector Type (#5307)
* Check exists projtype

* Maintain Ordering
2024-06-28 11:30:16 -07:00