Commit Graph

4466 Commits

Author SHA1 Message Date
Jeffrey Morgan
4f8a0166cc tools: loosen tool argument parsing (#11509) 2025-07-23 21:21:29 -07:00
minxinyi
1e6eab5c33 server: use slices.Equal to simplify code (#11502) 2025-07-23 14:25:39 -07:00
Michael Yang
6c733bf0a6 s#x/exp/maps#maps# (#11506) 2025-07-23 13:23:32 -07:00
Patrick Devine
3bac5cba60 Fix GetModelInfo (#11496)
---------

Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-07-22 13:40:47 -07:00
ycomiti
4151ef8cf7 Update linux.md (#11462) 2025-07-22 11:17:31 -07:00
Stefan Wärting
82da19c634 readme: add GMAI - Gradle Managed to community integrations (#11461) 2025-07-20 14:55:47 -07:00
Jeffrey Morgan
bdd9d22dfd tools: fix parsing issue when a tool name is a substring of another (#11456)
Co-authored-by: frob <rick+github@frob.com.au>
2025-07-20 14:55:14 -07:00
zmldndx
5fc38d042f readme: update argo description to support deep research (#11455) 2025-07-19 13:29:38 -07:00
Daniel Hiltgen
191d94289d ci: switch mac builder to arm64 (#11379)
The macos-13 is x86, while macos-13-xlarge is arm64
2025-07-17 07:33:44 -07:00
frob
802ad16ce4 docs: add the no-Modelfile function of ollama create (#9077) 2025-07-16 22:16:10 -07:00
frob
5e67f4f90e openai: allow openai endpoint to accept webp images (#11412)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-07-16 21:31:49 -07:00
Haiyue Wang
e840ccb523 readme: update the llama.cpp github link (#11427) 2025-07-16 21:20:28 -07:00
Michael Yang
b4fe3adc0a compile bf16 support into ggml-metal (#11430) 2025-07-16 17:32:57 -07:00
Parth Sareen
d73f8aa8c3 cmd: add default assistant role to message construction (#11431) 2025-07-16 11:18:16 -07:00
Bruce MacDonald
92c2e8a56c api: fix unreachable status err (#11423)
StatusError was unreachable, the client always checked for error messages in the response body first, and the server always includes error messages with HTTP error status codes.
2025-07-16 11:03:28 -07:00
Marcelo Fornet
2e3fd86d48 docs: fix typo in macos.md (#11425) 2025-07-16 10:50:46 -07:00
先知
4261a3b0b2 docs: update modelfile.md to reflect current default num_ctx (#11189)
As in the commit 44b466eeb2, the default context length has been increased to 4096.
2025-07-11 15:15:00 -07:00
Jesse Gross
acef9b4c1b ggml: Use assigned layers when reporting loading stats
Reporting params.NumGPULayers can be misleading because it is the
requested number of layers, not the actual number that is loaded.
While they are often the same, there are cases where they might mismatch,
such as if the GPU backend is missing.
2025-07-11 14:21:50 -07:00
Jesse Gross
9a43994c45 ggml: Disable unused pipeline parallelism
We're not currently using it, even in cases where we could. Disabling
it improves generation performance by 10-30% with multiple GPUs.
2025-07-11 13:30:05 -07:00
Daniel Hiltgen
f8a6e88819 Only load supported models on new engine (#11362)
* Only load supported models on new engine

Verify the model is supported before trying to load

* int: testcase for all library models
2025-07-11 12:21:54 -07:00
Jesse Gross
35fda7b4af ggml: Report ordinal IDs for AMD GPUs on Windows
We don't get valid UUIDs for AMD GPUs on Windows, so the best option
is to use the ordinal IDs. This brings us in line with what we currently
do on the Ollama server - the only exception is AMD GPUs on Linux, which
falls back to using ordinal IDs. The GGML implementation has no fallback
but it doesn't appear to occur for any of the GPUs that we support.

It's also possible that there are collisions between ordinal IDs for
different libraries - however the only places where we use them are
AMD on Windows and Metal on Mac, which can never occur on the same
system.
2025-07-09 10:35:31 -07:00
Daniel Hiltgen
66fb8575ce doc: add MacOS docs (#11334)
also removes stale model dir instructions for windows
2025-07-08 15:38:04 -07:00
Daniel Hiltgen
20c3266e94 Reduce default parallelism to 1 (#11330)
The current scheduler algorithm of picking the paralellism based on available
VRAM complicates the upcoming dynamic layer memory allocation algorithm.  This
changes the default to 1, with the intent going forward that parallelism is
explicit and will no longer be dynamically determined.  Removal of the dynamic
logic will come in a follow up.
2025-07-08 12:08:37 -07:00
Daniel Hiltgen
34088dbcfb API/CLI context enhancements (#11331)
* API: expose context size of loaded models

* CLI: add context UX

This adds a column in the ps output to show the models context size.
2025-07-08 11:59:06 -07:00
Parth Sareen
43107b15b9 add tool_name to api.md (#11326) 2025-07-07 16:53:13 -07:00
Parth Sareen
1f91cb0c8c template: add tool result compatibility (#11294) 2025-07-07 15:53:42 -07:00
Daniel Hiltgen
12d8ad0d38 ci: modularization (#11324)
switch a few constants to variables
2025-07-07 14:07:43 -07:00
Jesse Gross
592d21e7db Revert "ggml: Temporarily disable reporting UUIDs"
The root cause was an unclean upgrade - this code is fine.

This reverts commit 45f216a9c7.
2025-07-07 11:31:02 -07:00
Jeffrey Morgan
5a08b01f5b readme: update Ollama icon size 2025-07-05 17:20:42 -07:00
Daniel Hiltgen
4f473e224c int: add performance integration tests (#11173)
usage example:
  go test --tags=integration,perf -count 1 ./integration -v -timeout 1h -run TestModelsPerf 2>&1 | tee int.log
  cat int.log | grep MODEL_PERF_HEADER | cut -f2- -d: > perf.csv
  cat int.log | grep MODEL_PERF_DATA | cut -f2- -d: >> perf.csv
2025-07-05 16:07:09 -07:00
Daniel Hiltgen
9d60bb44cf doc: add NVIDIA blackwell to supported list (#11307) 2025-07-05 16:06:30 -07:00
Vincent RAMPAL
f371260e75 Update base image to Ubuntu 24.04 LTS (#9681) 2025-07-05 16:02:33 -07:00
Daniel Hiltgen
c9e6d7719e doc: Update link for mac install (#11288)
Favor the dmg now.
2025-07-03 09:48:45 -07:00
Daniel Hiltgen
2c4ce40334 mimic logs for layers on new engine (#11278)
This adds some extra logs to make the new engine a bit more consistent
with the llama engine.
2025-07-02 16:38:36 -07:00
XuKecheng
5d8c173529 readme: add NativeMind to community integrations (#11242) 2025-07-01 09:46:15 -07:00
Jeffrey Morgan
44b17d2bfa tools: fix parsing tool calls with empty arguments, missing required fields (#11233) 2025-06-30 08:59:03 -07:00
Attogram Project
3b8b692218 readme: add ollama-bash-toolshed to community integrations (#11224) 2025-06-29 14:59:54 -07:00
Michael Yang
4129af9205 chore: cleanup comments + unused vars (#11225) 2025-06-27 11:45:33 -07:00
Jesse Gross
45f216a9c7 ggml: Temporarily disable reporting UUIDs
This is causing segfaults, so disable it. Currently UUIDs are only
used for debugging purposes, although they planned to be used in
additional ways in the future.

Bug #11211
2025-06-27 11:27:22 -07:00
Michael Yang
d0b32def60 skip quantizing per_layer_token_embd (#11207)
this tensor isn't compatible with cuda when quantized to q4_K so skip it
2025-06-26 21:49:35 -07:00
Daniel Hiltgen
11ffc36157 ci: multi-stage release process (#11001) 2025-06-26 10:32:48 -07:00
Jeffrey Morgan
ba04902670 fs/ggml: add multiplier in graph estimates (#11208) 2025-06-26 00:19:44 -07:00
Jeffrey Morgan
3944602f51 fs/ggml: add missing architecture to OllamaEngineRequired() (#11206) 2025-06-26 00:11:23 -07:00
Michael Yang
73b642e6f3 add new gemma model (#11204)
* update patches

* cherry pick metal mean kernel

* cherry pick cuda mean kernel

* gemma3n
2025-06-25 21:47:09 -07:00
Daniel Hiltgen
ad118d8b13 ci: arm sbsa fixes (#11194) 2025-06-24 21:00:15 -07:00
Daniel Hiltgen
f08534137b ci: include dependencies 2025-06-24 20:27:43 -07:00
Daniel Hiltgen
4b4a90f233 ci: pick up arm sbsa cuda libs (#11192) 2025-06-24 18:59:22 -07:00
Daniel Hiltgen
03274a6b2f ci: recombine linux amd64 binaries (#11188)
Glue the rocm and archive builds back together.
2025-06-24 18:45:01 -07:00
Devon Rifkin
cc6463ebca Merge pull request #10238 from ollama/drifkin/array-head-count-simple
ggml: fix crash for array head counts
2025-06-24 17:50:02 -07:00
Daniel Hiltgen
405d2f628f ci: rocm parallel builds on windows (#11187)
The preset CMAKE_HIP_FLAGS isn't getting used on Windows.
This passes the parallel flag in through the C/CXX flags, along
with suppression for some log spew warnings to quiet down the build.
2025-06-24 15:27:09 -07:00