Jesse Gross
45f216a9c7
ggml: Temporarily disable reporting UUIDs
...
This is causing segfaults, so disable it. Currently UUIDs are only
used for debugging purposes, although they are planned to be used in
additional ways in the future.
Bug #11211
2025-06-27 11:27:22 -07:00
Michael Yang
d0b32def60
skip quantizing per_layer_token_embd ( #11207 )
...
this tensor isn't compatible with cuda when quantized to q4_K so skip it
2025-06-26 21:49:35 -07:00
Daniel Hiltgen
11ffc36157
ci: multi-stage release process ( #11001 )
2025-06-26 10:32:48 -07:00
Jeffrey Morgan
ba04902670
fs/ggml: add multiplier in graph estimates ( #11208 )
2025-06-26 00:19:44 -07:00
Jeffrey Morgan
3944602f51
fs/ggml: add missing architecture to OllamaEngineRequired() ( #11206 )
2025-06-26 00:11:23 -07:00
Michael Yang
73b642e6f3
add new gemma model ( #11204 )
...
* update patches
* cherry pick metal mean kernel
* cherry pick cuda mean kernel
* gemma3n
2025-06-25 21:47:09 -07:00
Daniel Hiltgen
ad118d8b13
ci: arm sbsa fixes ( #11194 )
2025-06-24 21:00:15 -07:00
Daniel Hiltgen
f08534137b
ci: include dependencies
2025-06-24 20:27:43 -07:00
Daniel Hiltgen
4b4a90f233
ci: pick up arm sbsa cuda libs ( #11192 )
2025-06-24 18:59:22 -07:00
Daniel Hiltgen
03274a6b2f
ci: recombine linux amd64 binaries ( #11188 )
...
Glue the rocm and archive builds back together.
2025-06-24 18:45:01 -07:00
Devon Rifkin
cc6463ebca
Merge pull request #10238 from ollama/drifkin/array-head-count-simple
...
ggml: fix crash for array head counts
2025-06-24 17:50:02 -07:00
Daniel Hiltgen
405d2f628f
ci: rocm parallel builds on windows ( #11187 )
...
The preset CMAKE_HIP_FLAGS isn't getting used on Windows.
This passes the parallel flag in through the C/CXX flags, along
with suppression for some log spew warnings to quiet down the build.
2025-06-24 15:27:09 -07:00
Devon Rifkin
a3f7dd3e98
Merge branch 'main' into drifkin/array-head-count-simple
2025-06-24 14:20:05 -07:00
Daniel Hiltgen
c85c0ebf89
CI: switch windows to vs 2022 ( #11184 )
...
* CI: switch windows to vs 2022
* ci: fix regex match
2025-06-24 13:26:55 -07:00
Daniel Hiltgen
10a8e04a8d
avoid context overflow ( #11175 )
...
For smaller context models, make sure we do not exceed the training size.
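A minimal sketch of the clamping idea described above, assuming the requested context length and the model's training context length are both available as integers. The function name `clampContext` and its parameters are hypothetical and are not taken from the actual change.

```go
package main

import "fmt"

// clampContext caps the requested context length at the model's training
// context length. A non-positive trained value is treated as "unknown" and
// leaves the request unchanged. Both names are hypothetical.
func clampContext(requested, trained int) int {
	if trained > 0 && requested > trained {
		return trained
	}
	return requested
}

func main() {
	// A request larger than the training size is reduced to the training size.
	fmt.Println(clampContext(8192, 2048))
	// A request within the training size is passed through untouched.
	fmt.Println(clampContext(1024, 2048))
}
```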
2025-06-23 15:52:50 -07:00
Daniel Hiltgen
1c6669e64c
Re-remove cuda v11 ( #10694 )
...
* Re-remove cuda v11
Revert the revert - drop v11 support requiring drivers newer than Feb 23
This reverts commit c6bcdc4223.
* Simplify layout
With only one version of the GPU libraries, we can simplify things down somewhat. (Jetsons still require special handling)
* distinct sbsa variant for linux arm64
This avoids accidentally trying to load the sbsa cuda libraries on
a jetson system which results in crashes.
* temporarily prevent rocm+cuda mixed loading
2025-06-23 14:07:00 -07:00
Devon Rifkin
b2b270ad5d
Merge branch 'main' into drifkin/array-head-count-simple
2025-06-23 10:37:31 -07:00
AJ
2bb69b40c7
readme: add ai-hub to community integrations ( #11169 )
2025-06-23 09:21:12 -07:00
Daniel Hiltgen
65bff664cb
build speedups ( #11142 )
...
Enable parallel building of the GPU architectures.
2025-06-20 12:32:51 -07:00
Michael Yang
c088ac0e79
convert: utility for merging tensors ( #11069 )
2025-06-20 11:12:01 -07:00
Michael Yang
0a066cfd91
Reapply "feat: incremental gguf parser ( #10822 )" ( #11114 ) ( #11119 )
...
* Reapply "feat: incremental gguf parser (#10822)" (#11114)
This reverts commit a6e64fbdf2.
* fix older ggufs
2025-06-20 11:11:40 -07:00
Jesse Gross
87b7af6cee
ggml: Check return status for computation.
...
We don't check the return status after computing the graph, which
can silently lead to bad outputs if we try to keep going and future
computation succeeds. This appears to happen in certain cases on
Apple M2 devices.
Fixes #11070
2025-06-19 17:12:49 -07:00
Daniel Hiltgen
f2527b08fb
int: add coverage for older models ( #11137 )
...
Verified these fail on 0.9.1 and pass on HEAD.
2025-06-19 12:10:19 -07:00
Jeffrey Morgan
8bcb3125c1
benchmark: remove unused benchmark test ( #11120 )
...
Removes a test under benchmark/ that is unused
2025-06-18 12:58:50 -07:00
Jeffrey Morgan
6baf1e31e2
Revert "Revert "ggml: Export GPU UUIDs" ( #11115 )" ( #11117 )
...
Reverts PR #11115. The original change was mistakenly reverted instead of #10822
2025-06-18 07:30:49 -07:00
Jeffrey Morgan
ed567ef43b
Revert "ggml: Export GPU UUIDs" ( #11115 )
...
This reverts commit aaa7818000.
2025-06-18 05:45:00 -07:00
Jeffrey Morgan
a6e64fbdf2
Revert "feat: incremental gguf parser ( #10822 )" ( #11114 )
...
This reverts commit 6b04cad7e8.
2025-06-18 05:42:44 -07:00
曹家巧
60cfa2a203
cache: fix comment function name in cache.go ( #11110 )
2025-06-18 05:21:45 -07:00
Jeffrey Morgan
55bbf3b4a1
tools: return empty arguments object instead of null ( #11113 )
2025-06-18 05:20:43 -07:00
Jeffrey Morgan
6bda1d2479
tools: fix parsing tool calls without any parameters ( #11101 )
...
Fixes issue where tool calls that don't expect any parameters were
not being parsed. This also fixes two additional issues: one where
2+ tool calls would not be correctly parsed, and cases where tool calls
with invalid parameters would still get parsed.
2025-06-17 10:51:43 -07:00
Jeffrey Morgan
9e125d884c
model: treat 'user defined' tokens as special tokens ( #11077 )
2025-06-16 16:03:16 -07:00
Michael Yang
a6fbfc880c
gguf: fix write order ( #11068 )
...
* ggml: test write gguf order
* ggml: fix write tensor order
2025-06-16 10:42:32 -07:00
NGC13009
502028968d
readme: add ollama-launcher to community integrations ( #11080 )
2025-06-15 21:27:49 -07:00
Phil
5a8eb0e151
readme: add GPTranslate to community integrations ( #11071 )
2025-06-14 08:54:03 -07:00
Jeffrey Morgan
9f8a18ec05
tools: loosen tool parsing to allow for more formats ( #11030 )
2025-06-12 14:18:54 -07:00
Michael Yang
6b04cad7e8
feat: incremental gguf parser ( #10822 )
...
* incremental gguf parser
* gguf: update test to not rely on gguf on disc
* re-use existing create gguf
* read capabilities from gguf kv
* kv exists
* update tests
* s/doneFunc/successFunc/g
* new buffered reader
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2025-06-12 11:04:11 -07:00
Michael Yang
45f56355d5
feat: uneven splits ( #11048 )
...
The current splitDim function only operates on tensors that are split evenly, which isn't always the case, e.g. a QKV tensor. This change allows the function to be used for arbitrary splits.
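To illustrate what an uneven split means, here is a small sketch that cuts a flat slice into consecutive chunks of caller-supplied lengths. It is not the project's splitDim implementation: the helper `splitUneven`, its signature, and the 4/2/2 sizes are assumptions chosen only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// splitUneven cuts data into consecutive chunks whose lengths are given by
// sizes. The sizes must be non-negative and sum to len(data). The returned
// chunks share memory with data. This helper is hypothetical.
func splitUneven(data []float32, sizes []int) ([][]float32, error) {
	total := 0
	for _, s := range sizes {
		if s < 0 {
			return nil, errors.New("negative split size")
		}
		total += s
	}
	if total != len(data) {
		return nil, fmt.Errorf("split sizes sum to %d, want %d", total, len(data))
	}

	parts := make([][]float32, 0, len(sizes))
	offset := 0
	for _, s := range sizes {
		parts = append(parts, data[offset:offset+s])
		offset += s
	}
	return parts, nil
}

func main() {
	// A fused row holding 4 query, 2 key, and 2 value elements (made-up sizes).
	qkv := []float32{0, 1, 2, 3, 4, 5, 6, 7}
	parts, err := splitUneven(qkv, []int{4, 2, 2})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(parts[0]), len(parts[1]), len(parts[2]))
}
```

An even split is the special case where every entry in `sizes` is the same.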
2025-06-11 12:10:54 -07:00
Michael Yang
0dabb4ef6a
skip tokenizer.model if possible ( #11050 )
...
if tokenizer.json is already copied, skip tokenizer.model
2025-06-11 12:10:35 -07:00
Michael Yang
2e77aa1ae7
use nn.Linear in place of ml.Tensor ( #11049 )
...
while nn.Linear.Forward isn't applicable for sparse MLP, it's still
a nice container for the tensors
2025-06-11 12:10:15 -07:00
Attogram Project
deaabe292d
readme: add ollama-multirun to community integrations ( #11038 )
2025-06-10 14:14:51 -07:00
Jeffrey Morgan
af21a5ac39
readme: update quickstart link text to Gemma 3
2025-06-10 09:34:23 -07:00
Jeffrey Morgan
f63d7f68eb
readme: update quickstart example to Gemma 3
2025-06-10 09:33:54 -07:00
Daniel Hiltgen
82ad1dbc07
mac: handle "keep" named apps ( #11031 )
...
When a user elects to keep the existing app, the
new Ollama is named `Ollama 2.app`.
This fixes the app startup flow to handle this naming pattern.
2025-06-09 16:29:57 -07:00
Daniel Hiltgen
feeabdadd2
spawn desktop quickly ( #11011 )
...
Give the desktop app a hint to start fast.
2025-06-08 09:34:52 -07:00
Krzysztof Jeziorny
fc0309615e
docs: update link to AMD drivers in linux.md ( #10973 )
2025-06-06 23:30:04 -04:00
Jeffrey Morgan
09d308d6b6
Revert "server: add model capabilities to the list endpoint ( #10174 )" ( #11004 )
...
This reverts commit 0943001193.
2025-06-06 23:29:14 -04:00
Daniel Hiltgen
a8ed68bd93
launch app hidden ( #10962 )
...
When starting the app in the background, start it hidden.
2025-06-06 14:06:29 -07:00
Daniel Hiltgen
2ae65ae471
win: handle more than 2048 processes ( #10997 )
...
Fix an array out of bounds crash
2025-06-06 14:06:09 -07:00
Devon Rifkin
a3b6886b7d
move thinking logic into its own package ( #10990 )
...
move thinking logic into its own package
2025-06-06 12:02:20 -07:00
Hunter Wittenborn
c6a6d7294d
docs: fix typo in development.md ( #10998 )
2025-06-06 12:07:29 -04:00