Michael Yang
5e2e0b46b1
fix: error if image requested without vision model
2025-03-13 10:52:09 -07:00
Bruce MacDonald
a70820daa0
models/gemma3: remove final logit softcap ( #9692 )
...
Softcap isn't in the whitepaper/implementation for the language model so we should remove it. There is no discernible difference in output with it removed.
2025-03-12 10:17:57 -07:00
jmorganca
83f0ec8269
all: address linter errors
2025-03-11 14:49:20 -07:00
Michael Yang
63a394068c
use 2d pooling
2025-03-11 14:49:20 -07:00
jmorganca
11bfa62796
add trailing \n\n after <end_of_image> to match reference implementation
2025-03-11 14:49:20 -07:00
jmorganca
f63e62e546
reduce kernel size, add TODO for loading from config
2025-03-11 14:49:20 -07:00
jmorganca
65b0f329d1
Revert "Allow models to force a new batch"
...
This reverts commit c7eae586b899083acebcd9b3847b89ea78c2850c.
2025-03-11 14:49:20 -07:00
Jesse Gross
06007c0a18
Allow models to force a new batch
...
This is useful for a few things:
- Work around bugs, such as having 2 images in one batch
- Keep the image in a single batch for fully connected attention
- Improve performance by not evaluating embeddings multiple times
2025-03-11 14:49:20 -07:00
Jesse Gross
a8e83a7654
Disable causal attention based on batch index
...
Currently we are using positions, which are relative to a
sequence and may not be unique.
2025-03-11 14:49:20 -07:00
Jesse Gross
2c40c4d35e
Fix follow up images and images split across batches
2025-03-11 14:49:19 -07:00
Michael Yang
e95278932b
use non-causal mask only for image positions
2025-03-11 14:49:19 -07:00
Michael Yang
9d2a20a763
use non-causal mask for inputs with images
2025-03-11 14:49:19 -07:00
Michael Yang
6b32a2d549
compat with upstream gguf
2025-03-11 14:49:19 -07:00
Michael Yang
f888912870
fix vision encoder
2025-03-11 14:49:19 -07:00
Patrick Devine
9b54267e69
fix configs
2025-03-11 14:49:19 -07:00
Michael Yang
46bb0169c4
update model
2025-03-11 14:49:19 -07:00
Michael Yang
8934324b72
use fast attention
2025-03-11 14:49:18 -07:00
Patrick Devine
c62861f4fa
fix conversion
2025-03-11 14:49:18 -07:00
Michael Yang
0df1800436
set non-causal attention
2025-03-11 14:49:18 -07:00
Jesse Gross
4346c2409d
fix drift from main
2025-03-11 14:49:18 -07:00
Michael Yang
4b037a97dc
add gemma vision encoder
2025-03-11 14:49:17 -07:00
Patrick Devine
5f74d1fd47
gemma2 impl
2025-03-11 14:35:08 -07:00