3954 Commits

Author SHA1 Message Date
Michael Yang
58245413f4
next ollama runner (#7913)
feat: add new Ollama engine using ggml through cgo

This change introduces a new way to run pretrained models. It introduces three high-level interfaces and several smaller helper interfaces to support this.

- `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`
- `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`
- `ml.Tensor` defines the interface for a tensor and tensor operations
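
To make the division of responsibilities concrete, here is a minimal sketch of how the three interfaces could fit together. It does not reproduce the code in `model/model.go` or `ml/backend.go`: every method signature below, along with `sliceTensor`, `mapBackend`, and `biasModel`, is a hypothetical stand-in chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Tensor is a hypothetical, heavily reduced stand-in for ml.Tensor.
type Tensor interface {
	Floats() []float32
	Add(other Tensor) Tensor
}

// Backend is a hypothetical stand-in for ml.Backend: it owns the loaded
// weights and lets a model look them up by name.
type Backend interface {
	Get(name string) Tensor
}

// Model is a hypothetical stand-in for model.Model.
type Model interface {
	Forward(b Backend, input Tensor) (Tensor, error)
}

// sliceTensor is a toy CPU tensor backed by a Go slice.
type sliceTensor []float32

func (t sliceTensor) Floats() []float32 { return t }

// Add returns the element-wise sum; both tensors are assumed to have equal length.
func (t sliceTensor) Add(other Tensor) Tensor {
	o := other.Floats()
	out := make(sliceTensor, len(t))
	for i := range t {
		out[i] = t[i] + o[i]
	}
	return out
}

// mapBackend is a toy backend that keeps named tensors in memory.
type mapBackend map[string]Tensor

func (b mapBackend) Get(name string) Tensor { return b[name] }

// biasModel is a toy architecture whose forward pass adds one weight tensor.
type biasModel struct{}

func (biasModel) Forward(b Backend, input Tensor) (Tensor, error) {
	bias := b.Get("bias")
	if bias == nil {
		return nil, errors.New("missing tensor: bias")
	}
	return input.Add(bias), nil
}

func main() {
	var m Model = biasModel{}
	backend := mapBackend{"bias": sliceTensor{1, 2}}
	out, err := m.Forward(backend, sliceTensor{10, 20})
	if err != nil {
		fmt.Println("forward failed:", err)
		return
	}
	fmt.Println(out.Floats())
}
```

In this sketch the model never touches storage directly: it asks the backend for named tensors and composes tensor operations, which is the separation the three interfaces describe.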

This is the first implementation of the new engine. Follow-up PRs will implement more features:

- non-greedy sampling (#8410)
- integration with Ollama and KV caching (#8301)
- more model support (#9080) with more coming soon

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2025-02-13 16:31:21 -08:00
Bùi Đức Nhật
8cf16063a5
docs: add ollamazing to the README.md (#9075) 2025-02-13 10:47:09 -08:00
frob
3a4449e2f1
docs: add H200 as supported device. (#9076)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-02-13 10:44:23 -08:00
Anuraag (Rag) Agrawal
10d59d5f90
openai: finish_reason as tool_calls for streaming with tools (#7963) 2025-02-13 10:20:12 -08:00
Jeffrey Morgan
a4f69a0191
build: add -DGGML_CUDA_NO_PEER_COPY=ON for rocm builds on windows (#9060) v0.5.9 2025-02-13 00:23:17 -08:00
Clinton
82658c3eec
readme: add Homebrew to package managers section (#9052) 2025-02-12 11:17:39 -08:00
bloominstrong
378d6e1e6a
docs: fix nix package link (#9045)
remove the channel tag from the URL so the link always goes to the current stable channel.
2025-02-12 09:16:26 -08:00
Hugues Chocart
afa55bc70c
doc: fix link for Abso (#9043) 2025-02-12 09:15:08 -08:00
Michael Yang
49df03da9a
fix: harden backend loading (#9024)
* wrap ggml_backend_load_best in try/catch
* ignore non-ollama paths
v0.5.9-rc0
2025-02-11 15:36:53 -08:00
Hugues Chocart
0189bdd0b7
readme: add Abso SDK to community integrations (#8973) 2025-02-11 00:14:45 -08:00
Jeffrey Morgan
f4711da7bd
ml/backend/ggml: fix crash on dlopen for non-AVX systems (#8976) v0.5.8-rc13 v0.5.8 2025-02-10 09:52:12 -08:00
Hugues Chocart
38117fba83
readme: add Lunary to observability community integrations (#8975) 2025-02-09 22:08:46 -08:00
Michael Yang
1f766c36fb
ci: use windows-2022 to sign and bundle (#8941)
ollama requires vcruntime140_1.dll which isn't found on 2019. previously
the job used the windows runner (2019) but it explicitly installs
2022 to build the app. since the sign job doesn't actually build
anything, it can use the windows-2022 runner instead.
v0.5.8-rc12
2025-02-08 13:07:00 -08:00
Qusai Ismael
484a99e428
docs: add LocalLLM app to community integrations (#8953) 2025-02-08 12:28:01 -08:00
DravenK
ec6121c331
docs: ollama zig community lib (#8688) 2025-02-08 11:10:47 -08:00
Jeffrey Morgan
b86c0a1500
docs: link directly to latest release page for tdm-gcc (#8939) 2025-02-08 00:21:10 -08:00
Guddu Kumar
7e402ebb8c
readme: add deepseek to supported models v0.5.8-rc11 2025-02-07 11:28:28 -08:00
Azis Alvriyanto
b901a712c6
docs: improve syntax highlighting in code blocks (#8854) 2025-02-07 09:55:07 -08:00
Michael Yang
abb8dd57f8
add gfx instinct gpus (#8933) 2025-02-07 09:51:22 -08:00
Leisure Linux
a400df48c0
docs: include port in faq.md OLLAMA_HOST examples (#8905) 2025-02-06 18:45:09 -08:00
annilq
6ab4ba4c26
readme: add React Native client to community integrations (#8877) 2025-02-06 17:15:48 -08:00
CosmicEventHorizon
e8d4eb3e68
readme: add ChibiChat to community integrations (#8883) v0.5.8-rc10 2025-02-06 16:08:46 -08:00
Michael Yang
ae7e368f75
build(rocm): add numa, elf (#8900) 2025-02-06 15:46:30 -08:00
oslook
31acd1ebf9
readme: add Ollama Chat WebUI for Docker to community integrations (#8084) 2025-02-06 15:41:02 -08:00
Michael Yang
9a4757ae66
build(rocm): add tinfo (#8899) 2025-02-06 15:08:12 -08:00
Abhinav Pant
7814019708
docs: add step for removing libraries in linux.md (#8897) 2025-02-06 14:54:58 -08:00
Michael Yang
b698f9a0d8
build: add missing dependencies (#8896) v0.5.8-rc9 2025-02-06 13:12:16 -08:00
Azis Alvriyanto
32285a6d19
format: rename test file from byte_test.go to bytes_test.go (#8865) 2025-02-06 13:06:15 -08:00
Michael Yang
1c198977ec
ci: fix linux archive (#8862)
the find returns intermediate directories, which pulls in the parent
directories. it also omits files under lib/ollama.

switch back to globbing
v0.5.8-rc8
2025-02-05 19:45:58 -08:00
zyphixor
330b6c50b0
readme: add simple-discord-ai to community integrations (#8659) 2025-02-05 18:35:04 -08:00
Diego Pereira
928911bc68
runner: avoid buffer overwrite when generating multiple embeddings (#8714)
Shield the code processing the embedding result
from subsequent calls that may overwrite the same
buffer to process a second input when retrieving
model embeddings.
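
The hazard described above can be sketched as follows. This is an illustration only, not the runner's code: the `encoder` type, its shared `buf`, and the `embed` and `embedCopy` methods are hypothetical names, and the "embedding" is a dummy value derived from the input length.

```go
package main

import "fmt"

// encoder is a hypothetical stand-in for a runtime that reuses one output buffer.
type encoder struct {
	buf []float32
}

// embed writes the result for input into the shared buffer and returns a
// view of that buffer, so a later call overwrites what the caller holds.
func (e *encoder) embed(input string) []float32 {
	for i := range e.buf {
		e.buf[i] = float32(len(input) + i) // dummy values for illustration
	}
	return e.buf
}

// embedCopy returns an independent copy, shielding the caller from
// subsequent calls that reuse the buffer.
func (e *encoder) embedCopy(input string) []float32 {
	view := e.embed(input)
	out := make([]float32, len(view))
	copy(out, view)
	return out
}

func main() {
	e := &encoder{buf: make([]float32, 2)}

	view := e.embed("a")
	e.embed("abcd") // overwrites the memory behind view
	fmt.Println("shared view after second call:", view)

	kept := e.embedCopy("a")
	e.embed("abcd") // kept is unaffected
	fmt.Println("copy after second call:", kept)
}
```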
2025-02-05 16:53:33 -08:00
Michael Yang
5b446cc815
chore: update gitattributes (#8860)
* chore: update gitattributes
* chore: add build info source
2025-02-05 16:37:18 -08:00
Daniel Lok
451c1596af
readme: add MLflow Tracing as an observability integration (#8811) 2025-02-05 16:04:24 -08:00
Michael Yang
932bded12f chore: add optional field for server logs 2025-02-05 15:55:32 -08:00
Michael Yang
070ad913ac ci: fix linux archive 2025-02-05 15:08:02 -08:00
Azis Alvriyanto
8d8b9f83ae
format: byte formatting test coverage (#8692)
Removed redundant checks and streamlined the switch-case structure.
Added test cases for both HumanBytes and HumanBytes2 to cover a wide range of scenarios.
2025-02-05 12:23:07 -08:00
Jeffrey Morgan
f00d359a67
docs: add section in development.md on library detection (#8855) 2025-02-05 11:16:27 -08:00
Yashwanth A
291def6adb
server: increase timeout in stall detection from 5s to 30s (#8831)
In some cases, downloads slow down due to disk I/O or other factors,
causing the download to restart a part. This makes the download
appear to "reverse" in percent completion. Increasing the timeout to 30s
should make this happen less frequently.
v0.5.8-rc7
2025-02-05 10:00:26 -08:00
Jeffrey Morgan
cd3fbf1c49
llama: use dynamic backend loading for mllama and clip (#8835) 2025-02-05 09:46:56 -08:00
Jeffrey Morgan
c852b8e021
server: always print upload/download part info (#8832) 2025-02-04 19:30:49 -08:00
William
d8932c55e7
server: fix out of bounds exception on model download (#8746) 2025-02-04 18:52:47 -08:00
Michael Yang
63f0269f7f ci: split docker build by platform
this improves build reliability and concurrency
v0.5.8-rc6
2025-02-04 17:04:27 -08:00
Jeffrey Morgan
4759ecae19
ml/backend/ggml: fix library loading on macOS amd64 (#8827) 2025-02-04 15:05:39 -08:00
Michael Yang
65b7ecac7b fix extra quote v0.5.8-rc5 2025-02-04 08:35:30 -08:00
Michael Yang
f9d2d89135 fix linux archive v0.5.8-rc4 2025-02-03 16:12:33 -08:00
Michael Yang
669dc31cf3 fix build v0.5.8-rc3 2025-02-03 15:10:51 -08:00
Tilman Griesel
d4d338c224
readme: add Chipper to community integrations (#8803) 2025-02-03 14:18:19 -08:00
Melroy van den Berg
bfdeffc375
docs: use OLLAMA_VERSION=0.5.7 for install version override (#8802) 2025-02-03 13:54:08 -08:00
Michael Yang
e806184023 fix release workflow v0.5.8-rc2 2025-02-03 13:19:57 -08:00
Jeffrey Morgan
50566113ac
llm: do not error if LibOllamaPath does not exist (#8801) 2025-02-03 12:27:48 -08:00