18 Commits

Author SHA1 Message Date
Michael Yang
58245413f4
next ollama runner (#7913)
feat: add new Ollama engine using ggml through cgo

This change introduces a new way to run pretrained models. It introduces 3 high level interfaces and a bunch of smaller helper interfaces to facilitate this.

- `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`
- `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`
- `ml.Tensor` defines the interface for a tensor and tensor operations

This is the first implementation of the new engine. Follow up PRs will implement more features:

- non-greedy sampling (#8410)
- integration with Ollama and KV caching (#8301)
- more model support (#9080) with more coming soon

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2025-02-13 16:31:21 -08:00
Josh
93a8daf285
convert: import support for command-r models from safetensors (#6063)
---------

Co-authored-by: Patrick Devine <patrick@infrahq.com>
2025-01-15 16:31:22 -08:00
Bruce MacDonald
f6f3713001
convert: qwen2 from safetensors (#8408)
Add native support for converting Qwen2 family models (including Qwen2.5)
from safetensors to gguf format so we can run it.
2025-01-14 10:34:37 -08:00
Patrick Devine
c7cb0f0602
image processing for llama3.2 (#6963)
Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Jesse Gross <jesse@ollama.com>
2024-10-18 16:12:35 -07:00
Patrick Devine
608e87bf87
Fix gemma2 2b conversion (#6645) 2024-09-05 17:02:28 -07:00
Michael Yang
9cfd2dd3e3
Merge pull request #6522 from ollama/mxyng/detect-chat
detect chat template from configs that contain lists
2024-08-28 11:04:18 -07:00
Patrick Devine
6c1c1ad6a9
throw an error when encountering unsupport tensor sizes (#6538) 2024-08-27 17:54:04 -07:00
Michael Yang
eae3af6807 clean up convert tokenizer 2024-08-27 11:11:43 -07:00
Patrick Devine
0c819e167b
convert safetensor adapters into GGUF (#6327) 2024-08-23 11:29:56 -07:00
Michael Yang
77903ab8b4 llama3.1 2024-08-21 11:49:31 -07:00
Michael Yang
3546bbd08c convert gemma2 2024-08-20 17:27:51 -07:00
Michael Yang
5a28b9cf5f bert 2024-08-20 17:27:34 -07:00
Michael Yang
6ffb5cb017 add conversion for microsoft phi 3 mini/medium 4k, 128 2024-08-12 15:13:29 -07:00
Michael Yang
b732beba6a lint 2024-08-01 17:06:06 -07:00
Michael Yang
eafc607abb convert: only extract large files 2024-07-31 15:58:55 -07:00
Michael Yang
5e9db9fb0b refactor convert 2024-07-31 15:58:33 -07:00
Michael Yang
6b252918fb update convert test to check result data 2024-07-31 10:59:38 -07:00
Michael Yang
3591bbe56f add test 2024-05-21 11:28:22 -07:00