Michael Yang
4ea4d2b189
Merge pull request #9703 from ollama/mxyng/gemma3-memory
count gemma3 vision tensors
v0.6.1-rc0
2025-03-13 16:56:34 -07:00
Michael Yang
8d76fa23ef
count non-repeating vision layers
2025-03-13 16:53:29 -07:00
Bradley Erickson
74b44fdf8f
docs: Add OLLAMA_ORIGINS for browser extension support (#9643)
2025-03-13 16:35:20 -07:00
Michael Yang
65b88c544f
fix divide by zero
2025-03-13 16:35:00 -07:00
Michael Yang
a422ba39c9
roughly count gemma3 graph
the largest operation is by far (q @ k) so just count that for
simplicity
2025-03-13 16:35:00 -07:00
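The commit above estimates graph memory by counting only the (q @ k) operation. As a rough illustration of why that term dominates, the sketch below computes the size of an attention score tensor. The function name `attnScoreBytes`, its parameters, and the tensor layout are assumptions made for illustration; they are not taken from the ollama source.

```go
package main

import "fmt"

// attnScoreBytes is a hypothetical helper: it returns the number of bytes
// needed to hold the result of q @ k for one layer, assuming one score per
// (batch token, head, cached position) and bytesPerElem bytes per score.
func attnScoreBytes(batch, heads, kvLen, bytesPerElem uint64) uint64 {
	return batch * heads * kvLen * bytesPerElem
}

func main() {
	// Illustrative numbers only: 512 tokens per batch, 8 heads,
	// 2048 cached positions, 4-byte (f32) scores.
	fmt.Println(attnScoreBytes(512, 8, 2048, 4))
}
```

Under these assumptions the score tensor grows with both batch size and context length, which is why it can serve as a simple proxy for the whole graph.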
Michael Yang
d2ec22371e
count all vision tensors
2025-03-13 16:35:00 -07:00
Michael Yang
033cec232a
count gemma3 vision tensors
2025-03-13 16:34:42 -07:00
Michael Yang
543240fb5f
Merge pull request #9741 from ollama/mxyng/visionless
fix: error if image requested without vision model
2025-03-13 15:03:25 -07:00
Patrick Devine
4bed739259
add verbose mode to the show command (#9640)
Add metadata and tensor information to the show command to be able to
see more information about a model. This outputs the same data as
shown on the model details page on ollama.com
2025-03-13 14:24:27 -07:00
Patrick Devine
80c7ce381b
fix: change default context size for gemma3 (#9744)
2025-03-13 13:59:19 -07:00
Michael Yang
ccfd41c4f0
Merge pull request #9742 from ollama/mxyng/engine-error-embeddings
fix: error on models that don't support embeddings
2025-03-13 13:12:33 -07:00
Michael Yang
3e102b7dad
Update model/model.go
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2025-03-13 13:11:52 -07:00
Michael Yang
ec46f3286c
engine: error on embeddings; not currently implemented
2025-03-13 11:40:55 -07:00
Michael Yang
5e2e0b46b1
fix: error if image requested without vision model
2025-03-13 10:52:09 -07:00
Michael Yang
45a13b1dec
Merge pull request #9688 from Shane-XB-Qian/debug_mistype_lld
ollama-debug.c: correct mistype
2025-03-13 10:12:44 -07:00
Parth Sareen
5c0b663969
sample: separate softmax and temperature transforms (#9732)
2025-03-13 09:53:27 -07:00
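A minimal sketch of what keeping temperature scaling and softmax as two separate transforms could look like. The function names and the use of `float64` slices are assumptions for illustration and do not reproduce the ollama sampler.

```go
package main

import (
	"fmt"
	"math"
)

// temperature divides every logit by t in place.
// Assumption: t > 0; the caller handles t == 0 (greedy) separately.
func temperature(logits []float64, t float64) {
	for i := range logits {
		logits[i] /= t
	}
}

// softmax turns logits into probabilities in place. The maximum is
// subtracted first so that math.Exp cannot overflow.
func softmax(logits []float64) {
	if len(logits) == 0 {
		return
	}
	maxLogit := logits[0]
	for _, v := range logits[1:] {
		if v > maxLogit {
			maxLogit = v
		}
	}
	var sum float64
	for i, v := range logits {
		logits[i] = math.Exp(v - maxLogit)
		sum += logits[i]
	}
	for i := range logits {
		logits[i] /= sum
	}
}

func main() {
	logits := []float64{2, 4, 6}
	temperature(logits, 2) // scale first
	softmax(logits)        // then normalize
	fmt.Println(logits)
}
```

Splitting the two steps lets other transforms, such as top-k filtering, run between scaling and normalization.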
shane.xb.qian
30d7a59ba8
ollama-debug.c: change 'ld' to 'PRIi64'
* macOS has different definition per info from @mxyng
2025-03-13 17:10:37 +08:00
ParthSareen
4aeb67ef4c
sample: do all sorting in topK
2025-03-12 11:59:17 -07:00
ParthSareen
3ba91634c1
sample: simplify top_k=0 sorting
2025-03-12 11:59:17 -07:00
ParthSareen
1b7433b71e
sample: use container/heap for top_k
2025-03-12 11:59:17 -07:00
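For readers unfamiliar with the approach named in the commit above, here is a minimal sketch of top-k selection using Go's standard `container/heap`. The `token` type, the `topK` function, and the choice to treat `k <= 0` as "keep everything" are assumptions for illustration, not the ollama implementation.

```go
package main

import (
	"container/heap"
	"fmt"
)

// token is a hypothetical pair of vocabulary id and logit value.
type token struct {
	id    int
	value float32
}

// minHeap keeps the token with the smallest value at index 0.
type minHeap []token

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].value < h[j].value }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(token)) }
func (h *minHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// topK returns the k largest tokens, ordered from largest to smallest.
// Assumption: k <= 0 or k > len(ts) means "keep all tokens".
func topK(ts []token, k int) []token {
	if k <= 0 || k > len(ts) {
		k = len(ts)
	}
	h := make(minHeap, 0, k)
	for _, t := range ts {
		if len(h) < k {
			heap.Push(&h, t)
		} else if t.value > h[0].value {
			// Replace the current smallest and restore heap order.
			h[0] = t
			heap.Fix(&h, 0)
		}
	}
	// Popping yields ascending order, so fill the result back to front.
	out := make([]token, len(h))
	for i := len(out) - 1; i >= 0; i-- {
		out[i] = heap.Pop(&h).(token)
	}
	return out
}

func main() {
	ts := []token{{0, 0.1}, {1, 2.5}, {2, -1.0}, {3, 1.7}}
	fmt.Println(topK(ts, 2))
}
```

A bounded min-heap does the selection in O(n log k) time instead of sorting the full vocabulary.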
Bruce MacDonald
a70820daa0
models/gemma3: remove final logit softcap (#9692)
Softcap isn't in the whitepaper/implementation for the language model so we should remove it. There is no discernible difference in output with it removed.
2025-03-12 10:17:57 -07:00
Shane-XB-Qian
6b45b1d6b4
cli: add support for ctrl-n/p like a general cli (#9136)
Signed-off-by: shane.xb.qian <shane.qian@foxmail.com>
2025-03-12 08:51:56 -07:00
shane.xb.qian
85ab552028
ollama-debug.c: correct mistype
Signed-off-by: shane.xb.qian <shane.qian@foxmail.com>
2025-03-12 22:32:30 +08:00
frob
b3af953a55
cli: don't exit for invalid model during /load. (#9576)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-03-11 23:42:53 -07:00
Michael
ad4e0bf3be
Adding Gemma 3 to readme (#9671)
2025-03-12 07:39:25 +01:00
Michael Yang
aee28501b5
Merge pull request #9661 from ollama/gemma
engine: add gemma support
v0.6.0-rc0
v0.6.0
2025-03-11 15:07:50 -07:00
jmorganca
83f0ec8269
all: address linter errors
2025-03-11 14:49:20 -07:00
jmorganca
c6b6938b3a
kvcache: fix tests by adding AvgPool2D stub
2025-03-11 14:49:20 -07:00
jmorganca
fb4664fcec
model: add more spm tokenizer tests
2025-03-11 14:49:20 -07:00
jmorganca
20e3593863
model: validate left and right pairs before merging them
2025-03-11 14:49:20 -07:00
Michael Yang
63a394068c
use 2d pooling
2025-03-11 14:49:20 -07:00
Daniel Hiltgen
ab39e08eb9
llm: auto detect models that require Ollama Engine (#1)
2025-03-11 14:49:20 -07:00
jmorganca
11bfa62796
add trailing \n\n after <end_of_image> to match reference implementation
2025-03-11 14:49:20 -07:00
jmorganca
f63e62e546
reduce kernel size, add TODO for loading from config
2025-03-11 14:49:20 -07:00
jmorganca
65b0f329d1
Revert "Allow models to force a new batch"
This reverts commit c7eae586b899083acebcd9b3847b89ea78c2850c.
2025-03-11 14:49:20 -07:00
Jesse Gross
06007c0a18
Allow models to force a new batch
This is useful for a few things:
- Work around bugs, such as having 2 images in one batch
- Keep the image in a single batch for fully connected attention
- Improve performance by not evaluating embeddings multiple times
2025-03-11 14:49:20 -07:00
Jesse Gross
a8e83a7654
Disable causal attention based on batch index
Currently we are using positions, which are relative to a
sequence and may not be unique.
2025-03-11 14:49:20 -07:00
Jesse Gross
475005504e
Restrict Gemma to a single image per request
2025-03-11 14:49:20 -07:00
Jesse Gross
2c40c4d35e
Fix follow up images and images split across batches
2025-03-11 14:49:19 -07:00
Michael Yang
e95278932b
use non-causal mask only for image positions
2025-03-11 14:49:19 -07:00
Michael Yang
9d2a20a763
use non-causal mask for inputs with images
2025-03-11 14:49:19 -07:00
Patrick Devine
2e54d72fc3
fix gemma3 1b conversion
2025-03-11 14:49:19 -07:00
Michael Yang
6b32a2d549
compat with upstream gguf
2025-03-11 14:49:19 -07:00
Michael Yang
c5cbe4fc2a
fallback to cpu
2025-03-11 14:49:19 -07:00
Michael Yang
f888912870
fix vision encoder
2025-03-11 14:49:19 -07:00
Michael Yang
9e4642e9b3
ollama debug tensor
2025-03-11 14:49:19 -07:00
Michael Yang
6b0486c216
duplicate token_embd to output
2025-03-11 14:49:19 -07:00
Michael Yang
d368c039f0
skip repacking vision tensors
2025-03-11 14:49:19 -07:00
Patrick Devine
9b54267e69
fix configs
2025-03-11 14:49:19 -07:00
Michael Yang
46bb0169c4
update model
2025-03-11 14:49:19 -07:00