60 Commits

Author SHA1 Message Date
Parth Sareen
0682dae027
sample: improve ollama engine sampler performance (#9374)
This change bring in various interface cleanups along with greatly improving the performance of the sampler.

Tested with llama3.2 on local machine.
Improves performance from ~ 70 tokens/s -> 135 tokens/s with topK(40) enabled.
Without topK performance is ~ 110 tokens/s
2025-03-07 12:37:48 -08:00
Blake Mizerany
e2252d0fc6
server/internal/registry: take over pulls from server package (#9485)
This commit replaces the old pull implementation in the server package
with the new, faster, more robust pull implementation in the registry
package.

The new endpoint, and now the remove endpoint too, are behind the
feature gate "client2" enabled only by setting the OLLAMA_EXPERIMENT
environment variable include "client2".

Currently, the progress indication is wired to perform the same as the
previous implementation to avoid making changes to the CLI, and because
the status reports happen at the start of the download, and the end of
the write to disk, the progress indication is not as smooth as it could
be. This is a known issue and will be addressed in a future change.

This implementation may be ~0.5-1.0% slower in rare cases, depending on
network and disk speed, but is generally MUCH faster and more robust
than the its predecessor in all other cases.
2025-03-05 14:48:18 -08:00
Jesse Gross
e185c08ad9 go.mod: Use full version for go 1.24.0
Otherwise on Linux I get:
go: download go1.24 for linux/amd64: toolchain not available
2025-02-27 13:01:32 -08:00
Blake Mizerany
2412adf42b
server/internal: replace model delete API with new registry handler. (#9347)
This commit introduces a new API implementation for handling
interactions with the registry and the local model cache. The new API is
located in server/internal/registry. The package name is "registry" and
should be considered temporary; it is hidden and not bleeding outside of
the server package. As the commits roll in, we'll start consuming more
of the API and then let reverse osmosis take effect, at which point it
will surface closer to the root level packages as much as needed.
2025-02-27 12:04:53 -08:00
Blake Mizerany
4604b10306
go.mod: bump to go1.24 (#9242) 2025-02-24 13:11:46 -08:00
Michael Yang
dcfb7a105c
next build (#8539)
* add build to .dockerignore

* test: only build one arch

* add build to .gitignore

* fix ccache path

* filter amdgpu targets

* only filter if autodetecting

* Don't clobber gpu list for default runner

This ensures the GPU specific environment variables are set properly

* explicitly set CXX compiler for HIP

* Update build_windows.ps1

This isn't complete, but is close.  Dependencies are missing, and it only builds the "default" preset.

* build: add ollama subdir

* add .git to .dockerignore

* docs: update development.md

* update build_darwin.sh

* remove unused scripts

* llm: add cwd and build/lib/ollama to library paths

* default DYLD_LIBRARY_PATH to LD_LIBRARY_PATH in runner on macOS

* add additional cmake output vars for msvc

* interim edits to make server detection logic work with dll directories like lib/ollama/cuda_v12

* remove unncessary filepath.Dir, cleanup

* add hardware-specific directory to path

* use absolute server path

* build: linux arm

* cmake install targets

* remove unused files

* ml: visit each library path once

* build: skip cpu variants on arm

* build: install cpu targets

* build: fix workflow

* shorter names

* fix rocblas install

* docs: clean up development.md

* consistent build dir removal in development.md

* silence -Wimplicit-function-declaration build warnings in ggml-cpu

* update readme

* update development readme

* llm: update library lookup logic now that there is one runner (#8587)

* tweak development.md

* update docs

* add windows cuda/rocm tests

---------

Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
2025-01-29 15:03:38 -08:00
Michael Yang
cb40d60469 chore: upgrade to gods v2
gods v2 uses go generics rather than interfaces which simplifies the
code considerably
2024-12-21 00:05:16 -08:00
Squishedmac
9ab62eb96f
update golang.org/x dependencies (#8172) 2024-12-20 09:29:30 -08:00
Blake Mizerany
a37f4a86a7
go.mod: go 1.22.8 -> 1.23.4 (#8036) 2024-12-10 18:16:16 -08:00
Meng Zhuo
2ebdb54fb3
all: update math32 go mod to v1.11.0 (#6627) 2024-11-23 15:21:54 -08:00
Mikel Olasagasti Uranga
597072ef1b
readme: update google/uuid module (#7310)
update uuid.New().String() to uuid.NewString()
2024-11-21 19:37:04 -08:00
Bruce MacDonald
0679d491fe
chore(deps): bump golang.org/x dependencies (#7655)
- golang.org/x/sync v0.3.0 -> v0.9.0
- golang.org/x/image v0.14.0 -> v0.22.0
- golang.org/x/text v0.15.0 -> v0.20.0
2024-11-14 13:58:25 -08:00
Daniel Hiltgen
abd5dfd06a
Bump to latest Go 1.22 patch (#7379) 2024-10-26 17:03:37 -07:00
Patrick Devine
c7cb0f0602
image processing for llama3.2 (#6963)
Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Jesse Gross <jesse@ollama.com>
2024-10-18 16:12:35 -07:00
Daniel Hiltgen
feedf49c71 Go back to a pinned Go version
Go version 1.22.6 is triggering AV false positives, so go back to 1.22.5
2024-08-13 11:45:44 -07:00
Michael Yang
fb6cbc02fb update named templates 2024-07-05 16:29:32 -07:00
Michael Yang
9b6c2e6eb6 detect chat template from KV 2024-06-06 16:03:47 -07:00
Michael Yang
171eb040fc simplify safetensors reading 2024-05-21 11:28:22 -07:00
Michael Yang
34d5ef29b3 fix conversion for f16 or f32 inputs 2024-05-21 11:28:22 -07:00
jmorganca
63a453554d go mod tidy 2024-05-19 23:03:57 -07:00
Patrick Devine
1e1634daca
update go deps (#4324) 2024-05-10 21:39:27 -07:00
Patrick Devine
9f8691c6c8
Add llama2 / torch models for ollama create (#3607) 2024-04-15 11:26:42 -07:00
Patrick Devine
5a5efee46b
Add gemma safetensors conversion (#3250)
Co-authored-by: Michael Yang <mxyng@pm.me>
2024-03-28 18:54:01 -07:00
Patrick Devine
1b272d5bcd
change github.com/jmorganca/ollama to github.com/ollama/ollama (#3347) 2024-03-26 13:04:17 -07:00
Patrick Devine
2c017ca441
Convert Safetensors to an Ollama model (#2824) 2024-03-06 21:01:51 -08:00
Michael Yang
fc483274ad clean up go.mod 2024-02-23 16:53:36 -08:00
vinjn
66ef308abd Import "containerd/console" lib to support colorful output in Windows terminal 2024-02-15 05:56:45 +00:00
Daniel Hiltgen
29e90cc13b Implement new Go based Desktop app
This focuses on Windows first, but coudl be used for Mac
and possibly linux in the future.
2024-02-15 05:56:45 +00:00
Daniel Hiltgen
ecbfc0182f Go bump to v1.21 to pick up slog 2024-01-18 14:12:57 -08:00
Daniel Hiltgen
39928a42e8 Always dynamically load the llm server library
This switches darwin to dynamic loading, and refactors the code now that no
static linking of the library is used on any platform
2024-01-11 08:42:47 -08:00
Daniel Hiltgen
d4cd695759 Add cgo implementation for llama.cpp
Run the server.cpp directly inside the Go runtime via cgo
while retaining the LLM Go abstractions.
2023-12-19 09:05:46 -08:00
Patrick Devine
630518f0d9
Add unit test of API routes (#1528) 2023-12-14 16:47:40 -08:00
Michael Yang
7232f1fa41 go mod tidy 2023-12-04 16:59:23 -08:00
Michael Yang
01ea6002c4 replace go-humanize with format.HumanBytes 2023-11-14 14:57:41 -08:00
Michael Yang
341fb7e35f go mod tidy 2023-11-01 11:54:25 -07:00
Patrick Devine
deeac961bb
new readline library (#847) 2023-10-25 16:41:18 -07:00
Ajay Kemparaj
bb8464c0d2
update golang.org/x/net fixes CVE-2023-3978,CVE-2023-39325,CVE-2023-44487 (#855) 2023-10-25 16:17:24 -07:00
Bruce MacDonald
a0c3e989de
deprecate modelfile embed command (#759) 2023-10-16 11:07:37 -04:00
Michael Yang
8544edca21 parallel chunked downloads 2023-10-06 12:56:43 -07:00
Patrick Devine
87d9efb364
switch to forked readline lib which doesn't wreck the repl prompt (#578) 2023-09-22 12:17:45 -07:00
Michael Yang
e9f6df7dca use slices.DeleteFunc 2023-09-05 09:56:59 -07:00
Bruce MacDonald
42998d797d
subprocess llama.cpp server (#401)
* remove c code
* pack llama.cpp
* use request context for llama_cpp
* let llama_cpp decide the number of threads to use
* stop llama runner when app stops
* remove sample count and duration metrics
* use go generate to get libraries
* tmp dir for running llm
2023-08-30 16:35:03 -04:00
Michael Yang
d791df75dd check memory requirements before loading 2023-08-10 09:23:11 -07:00
Bruce MacDonald
a6f6d18f83 embed text document in modelfile 2023-08-08 11:27:17 -04:00
Bruce MacDonald
1c5a8770ee read runner parameter options from map
- read runner options from map to see what was specified explicitly and overwrite zero values
2023-08-01 13:38:19 -04:00
Bruce MacDonald
daa0d1de7a allow specifying zero values in modelfile 2023-08-01 13:37:50 -04:00
Michael Yang
8609db77ea use gin-contrib/cors middleware 2023-07-22 09:39:08 -07:00
Patrick Devine
e4d7f3e287
vendor in progress bar and change to bytes instead of bibytes (#130) 2023-07-19 17:24:03 -07:00
Michael Yang
84200dcde6 use readline 2023-07-19 13:34:56 -07:00
Patrick Devine
5bea29f610
add new list command (#97) 2023-07-18 09:09:45 -07:00