Commit Graph

4469 Commits

Author SHA1 Message Date
Gabe Goodhart
d724caced3 fix: Remove Gemma3n CUDA Graphs patch
It was implemented upstream:
https://github.com/ggml-org/llama.cpp/pull/14741

Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-30 13:55:21 -04:00
Gabe Goodhart
94912ec7dd fix: Fix Solar and argsort/copy patches after bump
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-30 13:54:38 -04:00
Gabe Goodhart
8fbeb68858 feat: Bump to 41e78c in the makefile
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-30 13:54:03 -04:00
Gabe Goodhart
70d2f70dd3 fix: Re-number patches after merge with main
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-30 13:02:08 -04:00
Gabe Goodhart
c22e9c9bbd Merge remote-tracking branch 'origin/main' into GraniteFour
* origin/main:
Revert "CI: switch back to x86 macos builder" (#11588)
mac: disable bf16 on unsupported OS versions (#11585)
CI: switch back to x86 macos builder (#11572)
Increase performance for Gemma3n models on NVGPUs by enabling CUDA Graph execution (#11525)
kvcache: Don't shift empty batches
docs: fix typos and remove trailing whitespaces (#11554)

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-30 13:01:12 -04:00
Daniel Hiltgen
6dcc5dfb9c Revert "CI: switch back to x86 macos builder" (#11588)
This reverts commit 9d071e6089.
v0.10.0
2025-07-30 08:56:01 -07:00
Daniel Hiltgen
25911a6e6b mac: disable bf16 on unsupported OS versions (#11585)
Support for bf16 was added in macOS 14+, and attempting to enable it
on older versions causes runtime failures.
2025-07-30 08:50:54 -07:00
Daniel Hiltgen
8afa6e83f2 CI: switch back to x86 macos builder (#11572) v0.10.0-rc4 2025-07-29 16:41:25 -07:00
Oliver Simons
ea85e27bbd Increase performance for Gemma3n models on NVGPUs by enabling CUDA Graph execution (#11525)
* Enable CUDA Graphs for gemma3n.

Similar to
https://github.com/ggml-org/llama.cpp/pull/14741,
though ollama has a slightly different model graph
than llama.cpp, which requires different workaround
checks.

* Remove residual check by reshaping differently in gemma3n model

This should make the heuristics more robust
2025-07-29 12:37:06 -07:00
Jesse Gross
c116a7523d kvcache: Don't shift empty batches
When we context shift, we delete half the context and apply RoPE
with an offset to the other half. We used to RoPE across the entire
context in a single pass with a zero offset for the deleted
section. With the change to shifting in batches, we can skip any
batches where all of the offsets would be zero. This typically
reduces the number of operations by half.
2025-07-29 12:32:22 -07:00
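The batch-skipping idea above can be sketched as follows. The function name and types here are hypothetical illustrations, not ollama's actual kvcache API: RoPE with a zero offset is an identity transform, so a batch whose offsets are all zero can be skipped outright.

```go
package main

import "fmt"

// shiftBatch applies a RoPE position offset to one batch of cached
// entries. It returns false when the batch was skipped because every
// offset was zero (the deleted half of the context).
func shiftBatch(offsets []int) bool {
	needed := false
	for _, off := range offsets {
		if off != 0 {
			needed = true
			break
		}
	}
	if !needed {
		// All offsets are zero: RoPE would be an identity
		// transform, so there is nothing to compute.
		return false
	}
	// ... apply RoPE with the per-entry offsets here ...
	return true
}

func main() {
	fmt.Println(shiftBatch([]int{0, 0, 0})) // deleted half: skipped
	fmt.Println(shiftBatch([]int{-4, -4}))  // retained half: applied
}
```

Since the deleted section typically makes up half the context, skipping its batches roughly halves the number of shift operations, matching the commit's claim.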
Yoshi
3515cc377c docs: fix typos and remove trailing whitespaces (#11554) v0.10.0-rc3 2025-07-28 11:19:13 -07:00
Gabe Goodhart
74d1f478e3 fix: Handle multi-chunk image encodings from mtmd
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-28 11:44:35 -04:00
Gabe Goodhart
444c2bf248 Merge remote-tracking branch 'origin/main' into GraniteFour
* origin/main:
readme: add Mayan EDMS to community integrations (#11543)
kvcache: Group shift operations into batches
CONTRIBUTING: fix typo in commit message example (#11528)
2025-07-28 10:33:49 -04:00
Mayan EDMS
bbf66c0b96 readme: add Mayan EDMS to community integrations (#11543) 2025-07-27 15:02:52 -07:00
Jesse Gross
764be7480f kvcache: Group shift operations into batches
Currently, when we need to do a shift on the cache, it is one
RoPE operation on the entire size of the cache (per layer). In
some cases, this can create a compute graph that is larger than
the forward pass since the forward pass is working in batches.
Since we don't consider shifting in our memory estimates, it's
possible for this to cause a crash if we run out of memory.

By limiting the size of the RoPE calls to batch size chunks, we
ensure that the shift will never exceed the size of the forward
pass, since the forward pass will also contain a RoPE of the same
size. This does not have a significant impact on performance since
RoPE is a math operation whose cost is mostly proportional to the
size of its inputs.

In theory, defrag could have the same issue since it also creates a
compute graph outside of the forward pass; however, since it only
performs copies, it does not require any working space.
2025-07-25 16:50:27 -07:00
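The chunking described above can be sketched as a simple range split; the function below is illustrative only and not ollama's actual implementation. Each chunk covers at most one batch of positions, so no shift graph ever exceeds the size of a forward-pass RoPE.

```go
package main

import "fmt"

// shiftChunks splits a cache shift over cacheLen positions into
// [start, end) ranges of at most batchSize positions each, so each
// chunk's RoPE call is no larger than one forward-pass batch.
func shiftChunks(cacheLen, batchSize int) [][2]int {
	var chunks [][2]int
	for start := 0; start < cacheLen; start += batchSize {
		end := start + batchSize
		if end > cacheLen {
			end = cacheLen
		}
		chunks = append(chunks, [2]int{start, end})
	}
	return chunks
}

func main() {
	fmt.Println(shiftChunks(10, 4)) // [[0 4] [4 8] [8 10]]
}
```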
Ruyut
b72e5adb14 CONTRIBUTING: fix typo in commit message example (#11528) v0.10.0-rc2 2025-07-25 14:24:06 -07:00
Gabe Goodhart
11a0d7376c Merge remote-tracking branch 'origin/main' into GraniteFour
* origin/main:
cli: catch upstream errors gracefully (#11512)
tools: loosen tool argument parsing (#11509)
server: use slices.Equal to simplify code (#11502)
s#x/exp/maps#maps# (#11506)
Fix GetModelInfo (#11496)
Update linux.md (#11462)
2025-07-25 09:50:47 -06:00
Patrick Devine
80b538e312 cli: catch upstream errors gracefully (#11512) 2025-07-23 22:16:55 -07:00
Jeffrey Morgan
4f8a0166cc tools: loosen tool argument parsing (#11509) 2025-07-23 21:21:29 -07:00
minxinyi
1e6eab5c33 server: use slices.Equal to simplify code (#11502) 2025-07-23 14:25:39 -07:00
Michael Yang
6c733bf0a6 s#x/exp/maps#maps# (#11506) 2025-07-23 13:23:32 -07:00
Patrick Devine
3bac5cba60 Fix GetModelInfo (#11496)
---------

Co-authored-by: Richard Lyons <frob@cloudstaff.com>
v0.10.0-rc1
2025-07-22 13:40:47 -07:00
ycomiti
4151ef8cf7 Update linux.md (#11462) 2025-07-22 11:17:31 -07:00
Gabe Goodhart
895d5563df Merge remote-tracking branch 'origin/main' into GraniteFour
* origin/main:
readme: add GMAI - Gradle Managed to community integrations (#11461)
tools: fix parsing issue when a tool name is a substring of another (#11456)
readme: update argo description to support deep research (#11455)
ci: switch mac builder to arm64 (#11379)
docs: add the no-Modelfile function of `ollama create` (#9077)
openai: allow openai endpoint to accept webp images (#11412)
readme: update the llama.cpp github link (#11427)
compile bf16 support into ggml-metal (#11430)
cmd: add default assistant role to message construction (#11431)
api: fix unreachable status err (#11423)
docs: fix typo in macos.md (#11425)
2025-07-21 15:04:52 -06:00
Stefan Wärting
82da19c634 readme: add GMAI - Gradle Managed to community integrations (#11461) 2025-07-20 14:55:47 -07:00
Jeffrey Morgan
bdd9d22dfd tools: fix parsing issue when a tool name is a substring of another (#11456)
Co-authored-by: frob <rick+github@frob.com.au>
2025-07-20 14:55:14 -07:00
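One common way to avoid the substring problem above is to try longer tool names before shorter ones, so `get` never shadows `get_weather`. This is a hedged sketch of that approach, not ollama's actual parser:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// matchTool returns which known tool name appears at the start of s.
// Checking longer names first prevents a tool whose name is a prefix
// of another (e.g. "get" vs "get_weather") from matching too early.
func matchTool(s string, tools []string) string {
	sorted := append([]string(nil), tools...)
	sort.Slice(sorted, func(i, j int) bool {
		return len(sorted[i]) > len(sorted[j])
	})
	for _, t := range sorted {
		if strings.HasPrefix(s, t) {
			return t
		}
	}
	return ""
}

func main() {
	tools := []string{"get", "get_weather"}
	fmt.Println(matchTool(`get_weather({"city":"SF"})`, tools)) // get_weather
}
```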
zmldndx
5fc38d042f readme: update argo description to support deep research (#11455) 2025-07-19 13:29:38 -07:00
Daniel Hiltgen
191d94289d ci: switch mac builder to arm64 (#11379)
The macos-13 runner is x86, while macos-13-xlarge is arm64
2025-07-17 07:33:44 -07:00
frob
802ad16ce4 docs: add the no-Modelfile function of ollama create (#9077) 2025-07-16 22:16:10 -07:00
frob
5e67f4f90e openai: allow openai endpoint to accept webp images (#11412)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-07-16 21:31:49 -07:00
Haiyue Wang
e840ccb523 readme: update the llama.cpp github link (#11427) 2025-07-16 21:20:28 -07:00
Michael Yang
b4fe3adc0a compile bf16 support into ggml-metal (#11430) 2025-07-16 17:32:57 -07:00
Parth Sareen
d73f8aa8c3 cmd: add default assistant role to message construction (#11431) v0.10.0-rc0 2025-07-16 11:18:16 -07:00
Bruce MacDonald
92c2e8a56c api: fix unreachable status err (#11423)
StatusError was unreachable: the client always checked for error messages in the response body first, and the server always includes error messages alongside HTTP error status codes.
2025-07-16 11:03:28 -07:00
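The pattern described above can be sketched like this; the type and function names are hypothetical, not ollama's actual client code. Because the server always puts a message in the body, the body check always wins and a separate status-based error path is never reached:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// errorResponse mirrors the common convention of an error message
// embedded in the JSON body alongside an HTTP error status code.
type errorResponse struct {
	Error string `json:"error"`
}

// apiError prefers the message in the response body; only when the
// body carries no message does the HTTP status become the error.
func apiError(status int, body []byte) string {
	var er errorResponse
	if json.Unmarshal(body, &er) == nil && er.Error != "" {
		return er.Error
	}
	return fmt.Sprintf("status %d", status)
}

func main() {
	fmt.Println(apiError(404, []byte(`{"error":"model not found"}`)))
	fmt.Println(apiError(500, []byte(`{}`)))
}
```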
Marcelo Fornet
2e3fd86d48 docs: fix typo in macos.md (#11425) 2025-07-16 10:50:46 -07:00
Gabe Goodhart
e6a22f20d1 Merge remote-tracking branch 'origin/main' into GraniteFour
* origin/main:
docs: update modelfile.md to reflect current default num_ctx (#11189)
ggml: Use assigned layers when reporting loading stats
ggml: Disable unused pipeline parallelism
Only load supported models on new engine (#11362)
2025-07-15 14:50:19 -06:00
Gabe Goodhart
5305e2ad14 feat: Sync llama.cpp
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-15 14:50:01 -06:00
Gabe Goodhart
4f462a9f67 feat: Bump llama.cpp to 4a4f42
This picks up support for Kimi K2 and PLaMO-2

Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-15 14:49:15 -06:00
先知
4261a3b0b2 docs: update modelfile.md to reflect current default num_ctx (#11189)
As of commit 44b466eeb2, the default context length has been increased to 4096.
v0.9.7-rc1
2025-07-11 15:15:00 -07:00
Gabe Goodhart
91e4b10d40 fix: Sync patch changes for ggml-cpu.c
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-11 16:01:15 -06:00
Gabe Goodhart
0beea04b52 fix: Add a patch to avoid power throttling API on non-msvc windows builds
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-11 16:00:49 -06:00
Jesse Gross
acef9b4c1b ggml: Use assigned layers when reporting loading stats
Reporting params.NumGPULayers can be misleading because it is the
requested number of layers, not the actual number that is loaded.
While they are often the same, the two can differ, for example
when the GPU backend is missing.
2025-07-11 14:21:50 -07:00
Jesse Gross
9a43994c45 ggml: Disable unused pipeline parallelism
We're not currently using it, even in cases where we could. Disabling
it improves generation performance by 10-30% with multiple GPUs.
2025-07-11 13:30:05 -07:00
Gabe Goodhart
e8a303a701 build: Add top-level include for GNUInstallDirs in CMakeLists.txt
This is used to populate CMAKE_INSTALL_BINDIR

Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-11 13:44:10 -06:00
Gabe Goodhart
81d821ba9b build: Include cmake/common.cmake in ggml sync
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-11 13:25:01 -06:00
Daniel Hiltgen
f8a6e88819 Only load supported models on new engine (#11362)
* Only load supported models on new engine

Verify the model is supported before trying to load

* int: testcase for all library models
2025-07-11 12:21:54 -07:00
Gabe Goodhart
bf1b261611 feat: Sync all patched code
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-11 11:44:18 -06:00
Gabe Goodhart
3020c462da fix: Add patch for GGML_VERSION and GGML_COMMIT constants
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-11 11:43:14 -06:00
Gabe Goodhart
d7f98e0673 fix: Revert changes to ggml export GPU UUID patch
Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-11 11:42:26 -06:00
Gabe Goodhart
111434ab39 feat: Bump back to the central repo and point at the latest master
This includes granite 4 and a number of other model architectures!

Branch: GraniteFour

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-07-11 10:43:22 -06:00