Blake Mizerany
4e320b8b90
server/internal/chunks: remove chunks package (#9755)
v0.6.1
2025-03-14 08:57:59 -07:00
Blake Mizerany
eb2b22b042
server/internal/client: use chunksums for concurrent blob verification (#9746)
Replace large-chunk blob downloads with parallel small-chunk
verification to solve timeout and performance issues. Registry users
experienced progressively slowing download speeds as large-chunk
transfers aged, often timing out completely.
The previous approach downloaded blobs in a few large chunks but
required a separate, single-threaded pass to read the entire blob back
from disk for verification after download completion.
This change uses the new chunksums API to fetch many smaller
chunk+digest pairs, allowing concurrent downloads and immediate
verification as each chunk arrives. Chunks are written directly to their
final positions, eliminating the entire separate verification pass.
The result is more reliable downloads that maintain speed throughout the
transfer process and significantly faster overall completion, especially
over unstable connections or with large blobs.
2025-03-13 22:18:29 -07:00
Michael Yang
4ea4d2b189
Merge pull request #9703 from ollama/mxyng/gemma3-memory
count gemma3 vision tensors
v0.6.1-rc0
2025-03-13 16:56:34 -07:00
Michael Yang
8d76fa23ef
count non-repeating vision layers
2025-03-13 16:53:29 -07:00
Bradley Erickson
74b44fdf8f
docs: Add OLLAMA_ORIGINS for browser extension support (#9643)
2025-03-13 16:35:20 -07:00
Michael Yang
65b88c544f
fix divide by zero
2025-03-13 16:35:00 -07:00
Michael Yang
a422ba39c9
roughly count gemma3 graph
by far the largest operation is (q @ k), so just count that for
simplicity
2025-03-13 16:35:00 -07:00
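To make the simplification concrete, the following sketch estimates only the q @ k score tensor for one attention layer. The helper `qkScoresBytes` is hypothetical; it assumes f32 scores laid out as heads × batch tokens × context length, and the numbers in `main` are illustrative rather than Gemma 3's real hyperparameters.

```go
package main

import "fmt"

// qkScoresBytes is a HYPOTHETICAL estimate of the attention-score tensor
// produced by q @ k^T in one layer: one f32 score per
// (head, query token in the batch, key position in the context).
func qkScoresBytes(numHeads, batchTokens, contextLen uint64) uint64 {
	const f32Bytes = 4
	return numHeads * batchTokens * contextLen * f32Bytes
}

func main() {
	// Illustrative values only.
	fmt.Println(qkScoresBytes(8, 512, 2048)) // 33554432 (32 MiB)
}
```

This term scales with the product of batch size and context length, whereas most other per-layer intermediates scale with batch size times a hidden or feed-forward dimension, which is why counting it alone can serve as a rough upper-bound proxy.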
Michael Yang
d2ec22371e
count all vision tensors
2025-03-13 16:35:00 -07:00
Michael Yang
033cec232a
count gemma3 vision tensors
2025-03-13 16:34:42 -07:00
Michael Yang
543240fb5f
Merge pull request #9741 from ollama/mxyng/visionless
fix: error if image requested without vision model
2025-03-13 15:03:25 -07:00
Patrick Devine
4bed739259
add verbose mode to the show command (#9640)
Add metadata and tensor information to the show command so that more
information about a model can be viewed. This outputs the same data as
shown on the model details page on ollama.com.
2025-03-13 14:24:27 -07:00
Patrick Devine
80c7ce381b
fix: change default context size for gemma3 (#9744)
2025-03-13 13:59:19 -07:00
Michael Yang
ccfd41c4f0
Merge pull request #9742 from ollama/mxyng/engine-error-embeddings
fix: error on models that don't support embeddings
2025-03-13 13:12:33 -07:00
Michael Yang
3e102b7dad
Update model/model.go
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2025-03-13 13:11:52 -07:00
Michael Yang
ec46f3286c
engine: error on embeddings; not currently implemented
2025-03-13 11:40:55 -07:00
Michael Yang
5e2e0b46b1
fix: error if image requested without vision model
2025-03-13 10:52:09 -07:00
Michael Yang
45a13b1dec
Merge pull request #9688 from Shane-XB-Qian/debug_mistype_lld
ollama-debug.c: correct mistype
2025-03-13 10:12:44 -07:00
Parth Sareen
5c0b663969
sample: separate softmax and temperature transforms (#9732)
2025-03-13 09:53:27 -07:00
shane.xb.qian
30d7a59ba8
ollama-debug.c: change 'ld' to 'PRIi64'
* macOS has a different definition, per info from @mxyng
2025-03-13 17:10:37 +08:00
ParthSareen
4aeb67ef4c
sample: do all sorting in topK
2025-03-12 11:59:17 -07:00
ParthSareen
3ba91634c1
sample: simplify top_k=0 sorting
2025-03-12 11:59:17 -07:00
ParthSareen
1b7433b71e
sample: use container/heap for top_k
2025-03-12 11:59:17 -07:00
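A heap-based top-k keeps only the k best candidates seen so far in a min-heap, so selecting from n logits costs O(n log k) instead of a full O(n log n) sort. The sketch below uses Go's standard `container/heap`; the `token` and `minHeap` types and the `topK` function are hypothetical, and treating k <= 0 as "keep everything, fully sorted" is an assumption about the top_k=0 behavior mentioned in 3ba91634c1.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// token (HYPOTHETICAL) pairs a vocabulary ID with its logit.
type token struct {
	ID    int
	Logit float64
}

// minHeap orders tokens by ascending logit, so the weakest of the current
// top-k candidates sits at the root and can be replaced in O(log k).
type minHeap []token

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].Logit < h[j].Logit }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(token)) }
func (h *minHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// topK returns the k highest-logit tokens in descending order.
// ASSUMPTION: k <= 0 (or k >= len(ts)) keeps every token, just sorted.
func topK(ts []token, k int) []token {
	if k <= 0 || k >= len(ts) {
		out := append([]token(nil), ts...)
		sort.Slice(out, func(i, j int) bool { return out[i].Logit > out[j].Logit })
		return out
	}
	h := make(minHeap, 0, k)
	for _, t := range ts {
		if h.Len() < k {
			heap.Push(&h, t)
		} else if t.Logit > h[0].Logit {
			h[0] = t // evict the current minimum
			heap.Fix(&h, 0)
		}
	}
	// Popping a min-heap yields ascending order; fill from the back.
	out := make([]token, k)
	for i := k - 1; i >= 0; i-- {
		out[i] = heap.Pop(&h).(token)
	}
	return out
}

func main() {
	ts := []token{{0, 1.5}, {1, -2}, {2, 3}, {3, 0.5}, {4, 2}}
	fmt.Println(topK(ts, 3)) // [{2 3} {4 2} {0 1.5}]
}
```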
Bruce MacDonald
a70820daa0
models/gemma3: remove final logit softcap (#9692)
The softcap isn't in the whitepaper or the implementation for the language model, so we should remove it. There is no discernible difference in output with it removed.
2025-03-12 10:17:57 -07:00
Shane-XB-Qian
6b45b1d6b4
cli: add support for Ctrl-N/Ctrl-P like other CLIs (#9136)
Signed-off-by: shane.xb.qian <shane.qian@foxmail.com>
2025-03-12 08:51:56 -07:00
shane.xb.qian
85ab552028
ollama-debug.c: correct mistype
Signed-off-by: shane.xb.qian <shane.qian@foxmail.com>
2025-03-12 22:32:30 +08:00
frob
b3af953a55
cli: don't exit for invalid model during /load (#9576)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-03-11 23:42:53 -07:00
Michael
ad4e0bf3be
Adding Gemma 3 to readme (#9671)
2025-03-12 07:39:25 +01:00
Michael Yang
aee28501b5
Merge pull request #9661 from ollama/gemma
engine: add gemma support
v0.6.0-rc0
v0.6.0
2025-03-11 15:07:50 -07:00
jmorganca
83f0ec8269
all: address linter errors
2025-03-11 14:49:20 -07:00
jmorganca
c6b6938b3a
kvcache: fix tests by adding AvgPool2D stub
2025-03-11 14:49:20 -07:00
jmorganca
fb4664fcec
model: add more spm tokenizer tests
2025-03-11 14:49:20 -07:00
jmorganca
20e3593863
model: validate left and right pairs before merging them
2025-03-11 14:49:20 -07:00
Michael Yang
63a394068c
use 2d pooling
2025-03-11 14:49:20 -07:00
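The log does not say exactly what is pooled. Assuming it refers to average-pooling a square grid of vision patch embeddings (consistent with the AvgPool2D stub added in c6b6938b3a), non-overlapping 2D average pooling can be sketched as below. `avgPool2D` is hypothetical; it assumes a non-empty row-major grid, stride equal to the kernel size, and a side length divisible by the kernel.

```go
package main

import "fmt"

// avgPool2D (HYPOTHETICAL) averages non-overlapping k×k windows of a
// side×side grid of embeddings stored row-major, one []float32 per patch.
// Assumes len(grid) == side*side, grid is non-empty, and side%k == 0.
// Returns a (side/k)×(side/k) grid, also row-major.
func avgPool2D(grid [][]float32, side, k int) [][]float32 {
	out := side / k
	dim := len(grid[0])
	pooled := make([][]float32, out*out)
	for oy := 0; oy < out; oy++ {
		for ox := 0; ox < out; ox++ {
			acc := make([]float32, dim)
			// Sum every patch in this k×k window.
			for y := oy * k; y < (oy+1)*k; y++ {
				for x := ox * k; x < (ox+1)*k; x++ {
					for d, v := range grid[y*side+x] {
						acc[d] += v
					}
				}
			}
			for d := range acc {
				acc[d] /= float32(k * k)
			}
			pooled[oy*out+ox] = acc
		}
	}
	return pooled
}

func main() {
	// 4×4 grid of 1-dimensional "embeddings" holding 0..15.
	grid := make([][]float32, 16)
	for i := range grid {
		grid[i] = []float32{float32(i)}
	}
	fmt.Println(avgPool2D(grid, 4, 2)) // [[2.5] [4.5] [10.5] [12.5]]
}
```

With a kernel of k, the number of output embeddings drops by a factor of k², which is how pooling shortens the token sequence an image contributes.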
Daniel Hiltgen
ab39e08eb9
llm: auto detect models that require Ollama Engine (#1)
2025-03-11 14:49:20 -07:00
jmorganca
11bfa62796
add trailing \n\n after <end_of_image> to match reference implementation
2025-03-11 14:49:20 -07:00
jmorganca
f63e62e546
reduce kernel size, add TODO for loading from config
2025-03-11 14:49:20 -07:00
jmorganca
65b0f329d1
Revert "Allow models to force a new batch"
This reverts commit c7eae586b899083acebcd9b3847b89ea78c2850c.
2025-03-11 14:49:20 -07:00
Jesse Gross
06007c0a18
Allow models to force a new batch
This is useful for a few things:
- Work around bugs, such as having 2 images in one batch
- Keep the image in a single batch for fully connected attention
- Improve performance by not evaluating embeddings multiple times
2025-03-11 14:49:20 -07:00
Jesse Gross
a8e83a7654
Disable causal attention based on batch index
...
Currently we are using positions, which are relative to a
sequence and may not be unique.
2025-03-11 14:49:20 -07:00
Jesse Gross
475005504e
Restrict Gemma to a single image per request
2025-03-11 14:49:20 -07:00
Jesse Gross
2c40c4d35e
Fix follow up images and images split across batches
2025-03-11 14:49:19 -07:00
Michael Yang
e95278932b
use non-causal mask only for image positions
2025-03-11 14:49:19 -07:00
Michael Yang
9d2a20a763
use non-causal mask for inputs with images
2025-03-11 14:49:19 -07:00
Patrick Devine
2e54d72fc3
fix gemma3 1b conversion
2025-03-11 14:49:19 -07:00
Michael Yang
6b32a2d549
compat with upstream gguf
2025-03-11 14:49:19 -07:00
Michael Yang
c5cbe4fc2a
fallback to cpu
2025-03-11 14:49:19 -07:00
Michael Yang
f888912870
fix vision encoder
2025-03-11 14:49:19 -07:00
Michael Yang
9e4642e9b3
ollama debug tensor
2025-03-11 14:49:19 -07:00
Michael Yang
6b0486c216
duplicate token_embd to output
2025-03-11 14:49:19 -07:00
Michael Yang
d368c039f0
skip repacking vision tensors
2025-03-11 14:49:19 -07:00