Gabe Goodhart
81d821ba9b
build: Include cmake/common.cmake in ggml sync
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2025-07-11 13:25:01 -06:00
Gabe Goodhart
bf1b261611
feat: Sync all patched code
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2025-07-11 11:44:18 -06:00
Gabe Goodhart
3020c462da
fix: Add patch for GGML_VERSION and GGML_COMMIT constants
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2025-07-11 11:43:14 -06:00
Gabe Goodhart
d7f98e0673
fix: Revert changes to ggml export GPU UUID patch
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2025-07-11 11:42:26 -06:00
Gabe Goodhart
111434ab39
feat: Bump back to the central repo and point at the latest master
...
This includes Granite 4 and a number of other model architectures!
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2025-07-11 10:43:22 -06:00
Gabe Goodhart
06a5592dc5
fix: Update patches for bump
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2025-07-10 16:01:30 -06:00
Gabe Goodhart
0a7ddc4e17
feat: Bump to the latest tip of the branch
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2025-07-10 16:01:14 -06:00
Gabe Goodhart
152260e9c7
fix: Update patch 0015 for upstream implementation of uuid
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2025-07-10 14:33:12 -06:00
Gabe Goodhart
e61826c180
Merge remote-tracking branch 'origin/main' into GraniteFour
...
* origin/main:
ggml: Report ordinal IDs for AMD GPUs on Windows
doc: add MacOS docs (#11334 )
Reduce default parallelism to 1 (#11330 )
API/CLI context enhancements (#11331 )
add `tool_name` to api.md (#11326 )
template: add tool result compatibility (#11294 )
ci: modularization (#11324 )
Revert "ggml: Temporarily disable reporting UUIDs"
readme: update Ollama icon size
int: add performance integration tests (#11173 )
doc: add NVIDIA blackwell to supported list (#11307 )
Update base image to Ubuntu 24.04 LTS (#9681 )
doc: Update link for mac install (#11288 )
mimic logs for layers on new engine (#11278 )
readme: add NativeMind to community integrations (#11242 )
tools: fix parsing tool calls with empty arguments, missing required fields (#11233 )
readme: add ollama-bash-toolshed to community integrations (#11224 )
2025-07-10 14:01:24 -06:00
Jesse Gross
35fda7b4af
ggml: Report ordinal IDs for AMD GPUs on Windows
...
We don't get valid UUIDs for AMD GPUs on Windows, so the best option
is to use the ordinal IDs. This brings us in line with what we currently
do on the Ollama server - the only exception is AMD GPUs on Linux, which
falls back to using ordinal IDs. The GGML implementation has no such fallback,
but the missing-UUID case doesn't appear to occur for any of the GPUs that we support.
It's also possible that there are collisions between ordinal IDs for
different libraries - however the only places where we use them are
AMD on Windows and Metal on Mac, which can never occur on the same
system.
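A minimal Go sketch of the ID-selection policy described above (illustrative only, not the actual ggml code; the function, the backend names, and the ordinal string format are assumptions): prefer the reported UUID, and use the ordinal index where valid UUIDs aren't available, i.e. AMD GPUs on Windows and Metal on Mac.
package main

import (
	"fmt"
	"strconv"
)

// deviceID picks a stable identifier for a GPU: the UUID when one is
// reported, otherwise the ordinal index (AMD on Windows, Metal on Mac).
func deviceID(uuid string, ordinal int, goos, backend string) string {
	useOrdinal := (goos == "windows" && backend == "rocm") ||
		(goos == "darwin" && backend == "metal")
	if useOrdinal || uuid == "" {
		return strconv.Itoa(ordinal)
	}
	return uuid
}

func main() {
	fmt.Println(deviceID("", 0, "windows", "rocm"))       // "0"
	fmt.Println(deviceID("GPU-1234", 1, "linux", "cuda")) // "GPU-1234"
}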
v0.9.7-rc0
2025-07-09 10:35:31 -07:00
Daniel Hiltgen
66fb8575ce
doc: add MacOS docs ( #11334 )
...
also removes stale model dir instructions for windows
2025-07-08 15:38:04 -07:00
Daniel Hiltgen
20c3266e94
Reduce default parallelism to 1 ( #11330 )
...
The current scheduler algorithm of picking the parallelism based on available
VRAM complicates the upcoming dynamic layer memory allocation algorithm. This
changes the default to 1, with the intent going forward that parallelism is
explicit and will no longer be dynamically determined. Removal of the dynamic
logic will come in a follow-up.
2025-07-08 12:08:37 -07:00
Daniel Hiltgen
34088dbcfb
API/CLI context enhancements ( #11331 )
...
* API: expose context size of loaded models
* CLI: add context UX
This adds a column in the ps output to show the model's context size.
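As a hedged illustration of the API side (not code from this change), the sketch below queries the existing /api/ps endpoint over plain HTTP and prints whatever context-size field the response carries; the JSON key "context_length" is an assumption, so check api.md for the exact name.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// /api/ps lists the models currently loaded into memory.
	resp, err := http.Get("http://localhost:11434/api/ps")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var ps struct {
		Models []map[string]any `json:"models"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&ps); err != nil {
		panic(err)
	}
	for _, m := range ps.Models {
		// "context_length" is an assumed key for the newly exposed value.
		fmt.Printf("%v: context=%v\n", m["name"], m["context_length"])
	}
}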
2025-07-08 11:59:06 -07:00
Parth Sareen
43107b15b9
add `tool_name` to api.md (#11326)
v0.9.6-rc0
v0.9.6
2025-07-07 16:53:13 -07:00
Parth Sareen
1f91cb0c8c
template: add tool result compatibility ( #11294 )
2025-07-07 15:53:42 -07:00
Daniel Hiltgen
12d8ad0d38
ci: modularization ( #11324 )
...
switch a few constants to variables
2025-07-07 14:07:43 -07:00
Jesse Gross
592d21e7db
Revert "ggml: Temporarily disable reporting UUIDs"
...
The root cause was an unclean upgrade - this code is fine.
This reverts commit 45f216a9c7.
2025-07-07 11:31:02 -07:00
Jeffrey Morgan
5a08b01f5b
readme: update Ollama icon size
2025-07-05 17:20:42 -07:00
Daniel Hiltgen
4f473e224c
int: add performance integration tests ( #11173 )
...
usage example:
go test --tags=integration,perf -count 1 ./integration -v -timeout 1h -run TestModelsPerf 2>&1 | tee int.log
cat int.log | grep MODEL_PERF_HEADER | cut -f2- -d: > perf.csv
cat int.log | grep MODEL_PERF_DATA | cut -f2- -d: >> perf.csv
2025-07-05 16:07:09 -07:00
Daniel Hiltgen
9d60bb44cf
doc: add NVIDIA blackwell to supported list ( #11307 )
2025-07-05 16:06:30 -07:00
Vincent RAMPAL
f371260e75
Update base image to Ubuntu 24.04 LTS ( #9681 )
2025-07-05 16:02:33 -07:00
Daniel Hiltgen
c9e6d7719e
doc: Update link for mac install ( #11288 )
...
Favor the dmg now.
2025-07-03 09:48:45 -07:00
Daniel Hiltgen
2c4ce40334
mimic logs for layers on new engine ( #11278 )
...
This adds some extra logs to make the new engine a bit more consistent
with the llama engine.
2025-07-02 16:38:36 -07:00
XuKecheng
5d8c173529
readme: add NativeMind to community integrations ( #11242 )
v0.9.5
2025-07-01 09:46:15 -07:00
Jeffrey Morgan
44b17d2bfa
tools: fix parsing tool calls with empty arguments, missing required fields ( #11233 )
v0.9.4-rc6
v0.9.4-rc3
v0.9.4-rc4
v0.9.4-rc5
v0.9.4
2025-06-30 08:59:03 -07:00
Attogram Project
3b8b692218
readme: add ollama-bash-toolshed to community integrations ( #11224 )
2025-06-29 14:59:54 -07:00
Gabe Goodhart
34ff84df43
fix: Use c++17 and include vendor for go wrapper modules
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2025-06-27 17:23:27 -06:00
Gabe Goodhart
d395132510
fix: Add sync'ed stb vendored header
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2025-06-27 17:17:23 -06:00
Gabe Goodhart
16c116c2b7
fix: Add missing stb to llama.cpp rsync-filter
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2025-06-27 17:16:58 -06:00
Gabe Goodhart
58300273f4
fix: Apply patch for mtmd_text_input
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2025-06-27 17:09:48 -06:00
Gabe Goodhart
f358dd5a1c
fix: Use mtmd_helper to correctly load the bitmap for the image
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2025-06-27 17:09:05 -06:00
Gabe Goodhart
dbd8ee2654
fix: Fix support for arch-specific ggml-cpu source files with new arrangement
...
In https://github.com/ggml-org/llama.cpp/pull/13892 , all arch-specific
implementations were split out into a nested tree structure under
ggml-cpu/arch. This conflicts with standard CGO layout where all
arch-specific source files are expected to live in the same directory as
the parent go module and use suffixes based on GOOS and GOARCH. As such,
there were really two options for getting this to work:
1. Add a patch on top of the GGML sync to rearrange the files to match the
GO layout convention
2. Use CGO directives to conditionally include the nested source files in
the compilation units
This commit does (2) in order to minimize the set of changes needed on top
of the upstream file layout. To get this to work, there are two key things
needed:
1. In cpu.go, #cgo directives are added to explicitly set __${GOARCH}__ in
the preprocessor directives
2. In arch-impls.c|cpp, use an #ifdef | #elif defined | #endif chain to
explicitly include the .c|.cpp files for the given architecture from the
nested directory
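A minimal sketch of option (2), with hypothetical file and symbol names rather than the actual Ollama sources: the #cgo lines define a per-GOARCH preprocessor symbol, and an #ifdef/#elif chain pulls in the nested arch sources (shown here inline in the cgo preamble; in the commit that chain lives in the separate arch-impls.c|cpp files).
// cpu.go (sketch only; paths and symbols are illustrative)
package cpu

/*
// 1. Explicitly define __${GOARCH}__ for the C preprocessor.
#cgo amd64 CFLAGS: -D__amd64__
#cgo arm64 CFLAGS: -D__arm64__

// 2. Conditionally include the arch-specific sources from ggml-cpu/arch.
//    In the real change this chain lives in arch-impls.c / arch-impls.cpp.
#if defined(__amd64__)
#include "ggml-cpu/arch/x86/quants.c"
#elif defined(__arm64__)
#include "ggml-cpu/arch/arm/quants.c"
#endif
*/
import "C"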
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2025-06-27 17:08:56 -06:00
Gabe Goodhart
7334a0ea07
chore: Ignore *.patched in the patch directory
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2025-06-27 17:08:42 -06:00
Gabe Goodhart
1664d52be6
fix: Add patch for mtmd_input_text
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2025-06-27 17:08:29 -06:00
Gabe Goodhart
3d70237fd1
fix: Update llama.go to use mtmd instead of clip/llava
...
It's _very_ possible that this is broken!
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2025-06-27 17:06:47 -06:00
Gabe Goodhart
fa54a3cf3a
fix: Add missing include in sampling_ext.cpp
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2025-06-27 17:06:40 -06:00
Gabe Goodhart
d0fd9e5aa2
fix: Remove mtmd main cpp files
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2025-06-27 17:06:31 -06:00
Gabe Goodhart
1cd9352cc3
fix: Narrow llama.cpp rsync-filter to not include mtmd main tool cpp files
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2025-06-27 17:06:18 -06:00
Gabe Goodhart
85aba511ec
fix: Add ggml files missing from sync
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2025-06-27 17:06:05 -06:00
Gabe Goodhart
62af160d82
fix: Update ggml rsync-filter for new ggml-cpu/arch subdirs
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2025-06-27 17:05:39 -06:00
Gabe Goodhart
414a097372
fix: Add files missing from sync
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2025-06-27 17:05:25 -06:00
Gabe Goodhart
424e05c20e
fix: Update rsync-filter for all moved/new/removed files
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2025-06-27 17:04:51 -06:00
Gabe Goodhart
2613f5da2d
feat: Sync llama.cpp and ggml
...
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2025-06-27 17:01:24 -06:00
Gabe Goodhart
73d089bb90
feat: Update all patches
...
There are a number of patches that are no longer needed at all:
- 0003-embeddings: Embeddings entirely overhauled on master
- 0008-ensure-KV-cache-is-fully-defragmented: KV caching entirely
overhauled on master
- 0019-metal-add-mean-kernel-14267: Merged upstream
- 0020-CUDA-add-mean-operation-14313: Merged upstream
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2025-06-27 16:57:05 -06:00
Gabe Goodhart
a30ae1fa20
TEMPORARY: Update the llama.cpp upstream to my fork's Granite Four branch
...
This will be redone once my branch is merged upstream in llama.cpp
Branch: GraniteFour
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
2025-06-27 16:24:42 -06:00
Michael Yang
4129af9205
chore: cleanup comments + unused vars ( #11225 )
v0.9.4-rc1
v0.9.4-rc2
2025-06-27 11:45:33 -07:00
Jesse Gross
45f216a9c7
ggml: Temporarily disable reporting UUIDs
...
This is causing segfaults, so disable it. Currently UUIDs are only
used for debugging purposes, although there are plans to use them in
additional ways in the future.
Bug #11211
2025-06-27 11:27:22 -07:00
Michael Yang
d0b32def60
skip quantizing per_layer_token_embd ( #11207 )
...
this tensor isn't compatible with cuda when quantized to q4_K so skip it
2025-06-26 21:49:35 -07:00
Daniel Hiltgen
11ffc36157
ci: multi-stage release process ( #11001 )
v0.9.4-rc0
2025-06-26 10:32:48 -07:00
Jeffrey Morgan
ba04902670
fs/ggml: add multiplier in graph estimates ( #11208 )
v0.9.3
2025-06-26 00:19:44 -07:00