Michael Yang
8d76fa23ef
count non-repeating vision layers
2025-03-13 16:53:29 -07:00
Michael Yang
65b88c544f
fix divide by zero
2025-03-13 16:35:00 -07:00
Michael Yang
a422ba39c9
roughly count gemma3 graph
...
the largest operation is by far (q @ k) so just count that for
simplicity
2025-03-13 16:35:00 -07:00
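A rough illustration of the estimate described in this commit message: if the (q @ k) result dominates the graph, its size can be approximated as one float32 score per head, query token, and key position. The function name, parameters, and float32 element size below are assumptions made for this sketch and are not taken from the commit itself.

```go
package main

import "fmt"

// attentionScoresBytes approximates the size of the (q @ k) result for one
// layer: one float32 score per (head, query token, key position).
// The name and parameters are hypothetical, chosen only for this sketch.
func attentionScoresBytes(numHeads, batchSize, contextLen uint64) uint64 {
	const bytesPerFloat32 = 4 // assumption: scores are stored as float32
	return numHeads * batchSize * contextLen * bytesPerFloat32
}

func main() {
	// Example: 8 heads, a 512-token batch, an 8192-token context.
	fmt.Println(attentionScoresBytes(8, 512, 8192))
}
```

Counting only this one tensor trades accuracy for simplicity, which matches the "roughly count" intent stated in the message.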
Michael Yang
d2ec22371e
count all vision tensors
2025-03-13 16:35:00 -07:00
Michael Yang
033cec232a
count gemma3 vision tensors
2025-03-13 16:34:42 -07:00
Shane-XB-Qian
6b45b1d6b4
cli: adding support ctrl-n/p like general cli (#9136)
...
Signed-off-by: shane.xb.qian <shane.qian@foxmail.com>
2025-03-12 08:51:56 -07:00
frob
b3af953a55
cli: don't exit for invalid model during /load. (#9576)
...
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-03-11 23:42:53 -07:00
Michael
ad4e0bf3be
Adding Gemma 3 to readme (#9671)
2025-03-12 07:39:25 +01:00
Michael Yang
aee28501b5
Merge pull request #9661 from ollama/gemma
...
engine: add gemma support
v0.6.0-rc0
v0.6.0
2025-03-11 15:07:50 -07:00
jmorganca
83f0ec8269
all: address linter errors
2025-03-11 14:49:20 -07:00
jmorganca
c6b6938b3a
kvcache: fix tests by adding AvgPool2D stub
2025-03-11 14:49:20 -07:00
jmorganca
fb4664fcec
model: add more spm tokenizer tests
2025-03-11 14:49:20 -07:00
jmorganca
20e3593863
model: validate left and right pairs before merging them
2025-03-11 14:49:20 -07:00
Michael Yang
63a394068c
use 2d pooling
2025-03-11 14:49:20 -07:00
Daniel Hiltgen
ab39e08eb9
llm: auto detect models that require Ollama Engine (#1)
2025-03-11 14:49:20 -07:00
jmorganca
11bfa62796
add trailing \n\n after <end_of_image> to match reference implementation
2025-03-11 14:49:20 -07:00
jmorganca
f63e62e546
reduce kernel size, add TODO for loading from config
2025-03-11 14:49:20 -07:00
jmorganca
65b0f329d1
Revert "Allow models to force a new batch"
...
This reverts commit c7eae586b899083acebcd9b3847b89ea78c2850c.
2025-03-11 14:49:20 -07:00
Jesse Gross
06007c0a18
Allow models to force a new batch
...
This is useful for a few things:
- Work around bugs, such as having 2 images in one batch
- Keep the image in a single batch for fully connected attention
- Improve performance by not evaluating embeddings multiple times
2025-03-11 14:49:20 -07:00
Jesse Gross
a8e83a7654
Disable causal attention based on batch index
...
Currently we are using positions, which are relative to a
sequence and may not be unique.
2025-03-11 14:49:20 -07:00
Jesse Gross
475005504e
Restrict Gemma to a single image per request
2025-03-11 14:49:20 -07:00
Jesse Gross
2c40c4d35e
Fix follow up images and images split across batches
2025-03-11 14:49:19 -07:00
Michael Yang
e95278932b
use non-causal mask only for image positions
2025-03-11 14:49:19 -07:00
Michael Yang
9d2a20a763
use non-causal mask for inputs with images
2025-03-11 14:49:19 -07:00
Patrick Devine
2e54d72fc3
fix gemma3 1b conversion
2025-03-11 14:49:19 -07:00
Michael Yang
6b32a2d549
compat with upstream gguf
2025-03-11 14:49:19 -07:00
Michael Yang
c5cbe4fc2a
fallback to cpu
2025-03-11 14:49:19 -07:00
Michael Yang
f888912870
fix vision encoder
2025-03-11 14:49:19 -07:00
Michael Yang
9e4642e9b3
ollama debug tensor
2025-03-11 14:49:19 -07:00
Michael Yang
6b0486c216
duplicate token_embd to output
2025-03-11 14:49:19 -07:00
Michael Yang
d368c039f0
skip repacking vision tensors
2025-03-11 14:49:19 -07:00
Patrick Devine
9b54267e69
fix configs
2025-03-11 14:49:19 -07:00
Michael Yang
46bb0169c4
update model
2025-03-11 14:49:19 -07:00
Michael Yang
8934324b72
use fast attention
2025-03-11 14:49:18 -07:00
Jesse Gross
0e886595bf
Fix tests and drift from main
2025-03-11 14:49:18 -07:00
Patrick Devine
c62861f4fa
fix conversion
2025-03-11 14:49:18 -07:00
Michael Yang
0df1800436
set non-causal attention
2025-03-11 14:49:18 -07:00
Patrick Devine
631fecc6d9
temporary work around for converting spm
2025-03-11 14:49:18 -07:00
Jesse Gross
4346c2409d
fix drift from main
2025-03-11 14:49:18 -07:00
Michael Yang
4b037a97dc
add gemma vision encoder
2025-03-11 14:49:17 -07:00
Patrick Devine
5f74d1fd47
gemma2 impl
2025-03-11 14:35:08 -07:00
Daniel Hiltgen
4dcf80167a
Build release for windows with local script (#9636)
2025-03-11 08:34:20 -07:00
Michael Yang
26a26998fb
Merge pull request #9590 from ollama/mxyng/dump-pad
...
fix: pad tensor item if ge zero
2025-03-10 16:34:55 -07:00
Michael Yang
9926eae015
fix: pad tensor item if ge zero
...
this produces a nicer output since both positive and negative values
produce the same width
2025-03-10 16:18:12 -07:00
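A minimal sketch of the padding idea in this commit message, assuming the items are floats printed with a fixed precision: a value that is greater than or equal to zero gets a leading space where a negative value would have its minus sign. The helper name `formatItem` is hypothetical and is not the function changed by the commit.

```go
package main

import "fmt"

// formatItem renders v with a leading space when v >= 0, so that it lines up
// with negative values, which carry a '-' sign in the same column.
// Hypothetical helper for illustration only.
func formatItem(v float32) string {
	if v >= 0 {
		return fmt.Sprintf(" %f", v)
	}
	return fmt.Sprintf("%f", v)
}

func main() {
	for _, v := range []float32{1.5, -1.5} {
		fmt.Printf("[%s]\n", formatItem(v))
	}
}
```

Go's `fmt` package also offers the space flag (`% f`), which reserves the sign column in a single verb.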
Vincent Koc
8585b7b151
docs: add opik to observability integrations (#9626)
2025-03-10 16:15:10 -07:00
Parth Sareen
7e34f4fbfa
sample: add numerical stability to temperature/softmax transform (#9631)
2025-03-10 14:43:53 -07:00
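One common way to make a temperature/softmax transform numerically stable is to subtract the maximum logit before exponentiating. The sketch below shows that general technique. It is not the code from this commit, and the function name and the use of float64 are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// softmaxWithTemperature divides logits by temperature and normalizes them
// into probabilities. Subtracting the maximum logit first keeps every
// exponent <= 0, which avoids overflow without changing the result.
// Assumes len(logits) > 0 and temperature > 0.
func softmaxWithTemperature(logits []float64, temperature float64) []float64 {
	maxLogit := math.Inf(-1)
	for _, l := range logits {
		maxLogit = math.Max(maxLogit, l)
	}

	probs := make([]float64, len(logits))
	var sum float64
	for i, l := range logits {
		probs[i] = math.Exp((l - maxLogit) / temperature)
		sum += probs[i]
	}
	for i := range probs {
		probs[i] /= sum
	}
	return probs
}

func main() {
	// Without the max subtraction, exp(1000) would overflow to +Inf.
	fmt.Println(softmaxWithTemperature([]float64{1000, 1001, 1002}, 0.5))
}
```

The shift is safe because softmax is unchanged when the same constant is added to or subtracted from every logit.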
Michael Yang
fe776293f7
Merge pull request #9569 from dwt/patch-1
...
Better WantedBy declaration
2025-03-10 14:09:37 -07:00
frob
d8a5d96b98
docs: Add OLLAMA_CONTEXT_LENGTH to FAQ. (#9545)
2025-03-10 11:02:54 -07:00
Xiaowei Zhu
757668c42f
docs: add SwiftChat (#9540)
2025-03-10 11:01:09 -07:00
Sam
96ec8afd09
docs(tool): add mcp-llm (#9537)
2025-03-10 09:52:02 -07:00