jmorganca
83f0ec8269
all: address linter errors
2025-03-11 14:49:20 -07:00
Michael Yang
63a394068c
use 2d pooling
2025-03-11 14:49:20 -07:00
Patrick Devine
2e54d72fc3
fix gemma3 1b conversion
2025-03-11 14:49:19 -07:00
Michael Yang
6b32a2d549
compat with upstream gguf
2025-03-11 14:49:19 -07:00
Michael Yang
d368c039f0
skip repacking vision tensors
2025-03-11 14:49:19 -07:00
Patrick Devine
9b54267e69
fix configs
2025-03-11 14:49:19 -07:00
Michael Yang
46bb0169c4
update model
2025-03-11 14:49:19 -07:00
Patrick Devine
c62861f4fa
fix conversion
2025-03-11 14:49:18 -07:00
Michael Yang
0df1800436
set non-causal attention
2025-03-11 14:49:18 -07:00
Patrick Devine
631fecc6d9
temporary work around for converting spm
2025-03-11 14:49:18 -07:00
Michael Yang
4b037a97dc
add gemma vision encoder
2025-03-11 14:49:17 -07:00
Patrick Devine
5f74d1fd47
gemma2 impl
2025-03-11 14:35:08 -07:00
Michael Yang
58245413f4
next ollama runner ( #7913 )
...
feat: add new Ollama engine using ggml through cgo
This change introduces a new way to run pretrained models. It introduces 3 high level interfaces and a bunch of smaller helper interfaces to facilitate this.
- `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`
- `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`
- `ml.Tensor` defines the interface for a tensor and tensor operations
This is the first implementation of the new engine. Follow up PRs will implement more features:
- non-greedy sampling (#8410 )
- integration with Ollama and KV caching (#8301 )
- more model support (#9080 ) with more coming soon
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2025-02-13 16:31:21 -08:00
Josh
93a8daf285
convert: import support for command-r models from safetensors ( #6063 )
...
---------
Co-authored-by: Patrick Devine <patrick@infrahq.com>
2025-01-15 16:31:22 -08:00
Bruce MacDonald
f6f3713001
convert: qwen2 from safetensors ( #8408 )
...
Add native support for converting Qwen2 family models (including Qwen2.5)
from safetensors to gguf format so we can run it.
2025-01-14 10:34:37 -08:00
Stefan Weil
abfdc4710f
all: fix typos in documentation, code, and comments ( #7021 )
2024-12-10 12:58:06 -08:00
Michael Yang
4456012956
fix unmarshaling merges
2024-12-04 09:21:56 -08:00
Patrick Devine
c7cb0f0602
image processing for llama3.2 ( #6963 )
...
Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Jesse Gross <jesse@ollama.com>
2024-10-18 16:12:35 -07:00
Patrick Devine
84b84ce2db
catch when model vocab size is set correctly ( #6714 )
2024-09-09 17:18:54 -07:00
Patrick Devine
608e87bf87
Fix gemma2 2b conversion ( #6645 )
2024-09-05 17:02:28 -07:00
Michael Yang
9cfd2dd3e3
Merge pull request #6522 from ollama/mxyng/detect-chat
...
detect chat template from configs that contain lists
2024-08-28 11:04:18 -07:00
Patrick Devine
6c1c1ad6a9
throw an error when encountering unsupport tensor sizes ( #6538 )
2024-08-27 17:54:04 -07:00
Michael Yang
60e47573a6
more tokenizer tests
2024-08-27 14:51:10 -07:00
Michael Yang
eae3af6807
clean up convert tokenizer
2024-08-27 11:11:43 -07:00
Michael Yang
3eb08377f8
detect chat template from configs that contain lists
2024-08-27 10:49:33 -07:00
Patrick Devine
0c819e167b
convert safetensor adapters into GGUF ( #6327 )
2024-08-23 11:29:56 -07:00
Michael Yang
77903ab8b4
llama3.1
2024-08-21 11:49:31 -07:00
Michael Yang
3546bbd08c
convert gemma2
2024-08-20 17:27:51 -07:00
Michael Yang
5a28b9cf5f
bert
2024-08-20 17:27:34 -07:00
Bruce MacDonald
aec77d6a05
support new "longrope" attention factor
2024-08-12 15:13:29 -07:00
Michael Yang
6ffb5cb017
add conversion for microsoft phi 3 mini/medium 4k, 128
2024-08-12 15:13:29 -07:00
Michael Yang
b732beba6a
lint
2024-08-01 17:06:06 -07:00
Michael Yang
d8e2664c33
convert: fix parse functions
2024-07-31 15:58:55 -07:00
Michael Yang
eafc607abb
convert: only extract large files
2024-07-31 15:58:55 -07:00
Michael Yang
781fc2d576
Update convert/reader_safetensors.go
...
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-07-31 15:58:55 -07:00
Michael Yang
df993fa37b
comments
2024-07-31 15:58:55 -07:00
Michael Yang
5e9db9fb0b
refactor convert
2024-07-31 15:58:33 -07:00
Michael Yang
6b252918fb
update convert test to check result data
2024-07-31 10:59:38 -07:00
Jeffrey Morgan
d835368eb8
convert: capture head_dim
for mistral ( #5818 )
2024-07-22 16:16:22 -04:00
Michael Yang
e40145a39d
lint
2024-06-04 11:13:30 -07:00
Michael Yang
c895a7d13f
some gocritic
2024-06-04 11:13:30 -07:00
Michael Yang
55f6eba049
gofmt
2024-06-04 11:13:30 -07:00
Ikko Eltociear Ashimine
955c317cab
chore: update tokenizer.go ( #4571 )
...
PreTokenziers -> PreTokenizers
2024-05-22 00:25:23 -07:00
Michael Yang
171eb040fc
simplify safetensors reading
2024-05-21 11:28:22 -07:00
Michael Yang
3591bbe56f
add test
2024-05-21 11:28:22 -07:00
Michael Yang
34d5ef29b3
fix conversion for f16 or f32 inputs
2024-05-21 11:28:22 -07:00
Michael Yang
bbbd9f20f3
cleanup
2024-05-20 16:13:57 -07:00
Michael Yang
547132e820
bpe pretokenizer
2024-05-20 16:13:57 -07:00
Patrick Devine
2d315ba9a9
add missing file
2024-05-20 16:13:57 -07:00
Patrick Devine
d355d2020f
add fixes for llama
2024-05-20 16:13:57 -07:00