Michael Yang
58245413f4
next ollama runner ( #7913 )
feat: add new Ollama engine using ggml through cgo
This change introduces a new way to run pretrained models. It introduces three high-level interfaces, along with a number of smaller helper interfaces, to facilitate this.
- `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`.
- `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc.) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`.
- `ml.Tensor` defines the interface for a tensor and tensor operations.
This is the first implementation of the new engine. Follow up PRs will implement more features:
- non-greedy sampling (#8410 )
- integration with Ollama and KV caching (#8301 )
- more model support (#9080 ) with more coming soon
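A minimal, runnable Go sketch of how these three interfaces might fit together. The method sets and toy types here are illustrative assumptions, not the actual signatures in `model/model.go` and `ml/backend.go`:

```go
package main

import "fmt"

// Tensor is a stand-in for ml.Tensor: a tensor plus its operations.
// (Illustrative method set; the real interface is much larger.)
type Tensor interface {
	Shape() []int
}

// Backend is a stand-in for ml.Backend: it loads a pretrained model onto
// hardware and hands loaded tensors to Models by name.
type Backend interface {
	Get(name string) Tensor
}

// Model is a stand-in for model.Model: an architecture implements its
// forward propagation in Forward, which the engine calls for completions.
type Model interface {
	Forward(input Tensor) (Tensor, error)
}

// Toy concrete types so the sketch compiles and runs.
type dense struct{ shape []int }

func (d dense) Shape() []int { return d.shape }

type toyBackend map[string]Tensor

func (b toyBackend) Get(name string) Tensor { return b[name] }

// echoModel's Forward simply echoes its input; a real architecture would
// compute a graph over tensors fetched from its Backend.
type echoModel struct{ b Backend }

func (m echoModel) Forward(input Tensor) (Tensor, error) { return input, nil }

func main() {
	b := toyBackend{"token_embd.weight": dense{shape: []int{4096, 32000}}}
	var m Model = echoModel{b: b}
	out, err := m.Forward(dense{shape: []int{1, 4096}})
	if err != nil {
		panic(err)
	}
	fmt.Println(out.Shape()) // prints [1 4096]
}
```

In the real engine, `Forward` runs the architecture's computation against tensors the Backend has loaded; the point of the split is that model code never touches `ggml` directly.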
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2025-02-13 16:31:21 -08:00
Josh
93a8daf285
convert: import support for command-r models from safetensors ( #6063 )
Co-authored-by: Patrick Devine <patrick@infrahq.com>
2025-01-15 16:31:22 -08:00
Bruce MacDonald
f6f3713001
convert: qwen2 from safetensors ( #8408 )
Add native support for converting Qwen2 family models (including Qwen2.5)
from safetensors to GGUF format so they can be run.
2025-01-14 10:34:37 -08:00
Stefan Weil
abfdc4710f
all: fix typos in documentation, code, and comments ( #7021 )
2024-12-10 12:58:06 -08:00
Michael Yang
4456012956
fix unmarshaling merges
2024-12-04 09:21:56 -08:00
Patrick Devine
c7cb0f0602
image processing for llama3.2 ( #6963 )
Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Jesse Gross <jesse@ollama.com>
2024-10-18 16:12:35 -07:00
Patrick Devine
84b84ce2db
catch when model vocab size is set correctly ( #6714 )
2024-09-09 17:18:54 -07:00
Patrick Devine
608e87bf87
Fix gemma2 2b conversion ( #6645 )
2024-09-05 17:02:28 -07:00
Michael Yang
9cfd2dd3e3
Merge pull request #6522 from ollama/mxyng/detect-chat
detect chat template from configs that contain lists
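For context, the list form this commit handles appears in Hugging Face `tokenizer_config.json` files, where `chat_template` may be either a single string or a list of named templates. A sketch of the list form (template contents here are made-up placeholders):

```
{
  "chat_template": [
    { "name": "default",  "template": "{{ bos_token }}{% for m in messages %}{{ m.content }}{% endfor %}" },
    { "name": "tool_use", "template": "{{ bos_token }}{{ tools }}" }
  ]
}
```

Detection then means walking the list and picking a template by name rather than reading a single string field.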
2024-08-28 11:04:18 -07:00
Patrick Devine
6c1c1ad6a9
throw an error when encountering unsupported tensor sizes ( #6538 )
2024-08-27 17:54:04 -07:00
Michael Yang
60e47573a6
more tokenizer tests
2024-08-27 14:51:10 -07:00
Michael Yang
eae3af6807
clean up convert tokenizer
2024-08-27 11:11:43 -07:00
Michael Yang
3eb08377f8
detect chat template from configs that contain lists
2024-08-27 10:49:33 -07:00
Patrick Devine
0c819e167b
convert safetensor adapters into GGUF ( #6327 )
2024-08-23 11:29:56 -07:00
Michael Yang
77903ab8b4
llama3.1
2024-08-21 11:49:31 -07:00
Michael Yang
3546bbd08c
convert gemma2
2024-08-20 17:27:51 -07:00
Michael Yang
5a28b9cf5f
bert
2024-08-20 17:27:34 -07:00
Bruce MacDonald
aec77d6a05
support new "longrope" attention factor
2024-08-12 15:13:29 -07:00
Michael Yang
6ffb5cb017
add conversion for microsoft phi 3 mini/medium 4k, 128
2024-08-12 15:13:29 -07:00
Michael Yang
b732beba6a
lint
2024-08-01 17:06:06 -07:00
Michael Yang
d8e2664c33
convert: fix parse functions
2024-07-31 15:58:55 -07:00
Michael Yang
eafc607abb
convert: only extract large files
2024-07-31 15:58:55 -07:00
Michael Yang
781fc2d576
Update convert/reader_safetensors.go
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-07-31 15:58:55 -07:00
Michael Yang
df993fa37b
comments
2024-07-31 15:58:55 -07:00
Michael Yang
5e9db9fb0b
refactor convert
2024-07-31 15:58:33 -07:00
Michael Yang
6b252918fb
update convert test to check result data
2024-07-31 10:59:38 -07:00
Jeffrey Morgan
d835368eb8
convert: capture head_dim for mistral ( #5818 )
2024-07-22 16:16:22 -04:00
Michael Yang
e40145a39d
lint
2024-06-04 11:13:30 -07:00
Michael Yang
c895a7d13f
some gocritic
2024-06-04 11:13:30 -07:00
Michael Yang
55f6eba049
gofmt
2024-06-04 11:13:30 -07:00
Ikko Eltociear Ashimine
955c317cab
chore: update tokenizer.go ( #4571 )
PreTokenziers -> PreTokenizers
2024-05-22 00:25:23 -07:00
Michael Yang
171eb040fc
simplify safetensors reading
2024-05-21 11:28:22 -07:00
Michael Yang
3591bbe56f
add test
2024-05-21 11:28:22 -07:00
Michael Yang
34d5ef29b3
fix conversion for f16 or f32 inputs
2024-05-21 11:28:22 -07:00
Michael Yang
bbbd9f20f3
cleanup
2024-05-20 16:13:57 -07:00
Michael Yang
547132e820
bpe pretokenizer
2024-05-20 16:13:57 -07:00
Patrick Devine
2d315ba9a9
add missing file
2024-05-20 16:13:57 -07:00
Patrick Devine
d355d2020f
add fixes for llama
2024-05-20 16:13:57 -07:00
Patrick Devine
c8cf0d94ed
llama3 conversion
2024-05-20 16:13:57 -07:00
Patrick Devine
4730762e5c
add safetensors version
2024-05-20 16:13:57 -07:00
Patrick Devine
d88582dffd
some changes for llama3
2024-05-20 16:13:57 -07:00
Michael Yang
6694be5e50
convert/llama: use WriteSeeker
2024-05-06 15:24:01 -07:00
Michael Yang
7ffe45734d
rebase
2024-05-06 15:24:01 -07:00
Michael Yang
9685c34509
quantize any fp16/fp32 model
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}
2024-05-06 15:24:01 -07:00
Daniel Hiltgen
42fa9d7f0a
Fix lint warnings
2024-05-03 16:44:19 -07:00
Patrick Devine
ce8ce82567
add mixtral 8x7b model conversion ( #3859 )
2024-04-23 20:17:04 -07:00
Patrick Devine
9f8691c6c8
Add llama2 / torch models for ollama create ( #3607 )
2024-04-15 11:26:42 -07:00
Michael Yang
be517e491c
no rope parameters
2024-04-05 18:05:27 -07:00
Patrick Devine
3b6a9154dd
Simplify model conversion ( #3422 )
2024-04-01 16:14:53 -07:00
Patrick Devine
5a5efee46b
Add gemma safetensors conversion ( #3250 )
Co-authored-by: Michael Yang <mxyng@pm.me>
2024-03-28 18:54:01 -07:00