469 Commits

Author SHA1 Message Date
Thomas Pelster
883d031268 docs: added missing comma in 'Ollama's Javascript library'' (#11915) 2025-08-15 14:45:01 -07:00
Daniel Hiltgen
7ccfd97a93 doc: clarify both rocm and main bundle necessary (#11900)
Some users expect the rocm bundles to be self-sufficient, but are designed to be additive.
2025-08-14 12:54:55 -07:00
Patrick Devine
44bc36d063 docs: update the faq (#11760) 2025-08-06 16:55:57 -07:00
Gao feng
8a75e9ee15 Update downloading to pulling in api.md (#11170)
update api.md to make it consist with code.
https://github.com/ollama/ollama/blob/main/server/download.go#L447
2025-08-06 11:33:09 -07:00
Parth Sareen
4742e12c23 docs: update turbo model name (#11707) 2025-08-05 17:29:08 -07:00
Jeffrey Morgan
ee92ca3e1d docs: add docs for Ollama Turbo (#11687) 2025-08-05 13:09:10 -07:00
Yoshi
3515cc377c docs: fix typos and remove trailing whitespaces (#11554) 2025-07-28 11:19:13 -07:00
ycomiti
4151ef8cf7 Update linux.md (#11462) 2025-07-22 11:17:31 -07:00
frob
802ad16ce4 docs: add the no-Modelfile function of ollama create (#9077) 2025-07-16 22:16:10 -07:00
Marcelo Fornet
2e3fd86d48 docs: fix typo in macos.md (#11425) 2025-07-16 10:50:46 -07:00
先知
4261a3b0b2 docs: update modelfile.md to reflect current default num_ctx (#11189)
As in the commit 44b466eeb2, the default context length has been increased to 4096.
2025-07-11 15:15:00 -07:00
Daniel Hiltgen
66fb8575ce doc: add MacOS docs (#11334)
also removes stale model dir instructions for windows
2025-07-08 15:38:04 -07:00
Daniel Hiltgen
20c3266e94 Reduce default parallelism to 1 (#11330)
The current scheduler algorithm of picking the paralellism based on available
VRAM complicates the upcoming dynamic layer memory allocation algorithm.  This
changes the default to 1, with the intent going forward that parallelism is
explicit and will no longer be dynamically determined.  Removal of the dynamic
logic will come in a follow up.
2025-07-08 12:08:37 -07:00
Parth Sareen
43107b15b9 add tool_name to api.md (#11326) 2025-07-07 16:53:13 -07:00
Parth Sareen
1f91cb0c8c template: add tool result compatibility (#11294) 2025-07-07 15:53:42 -07:00
Daniel Hiltgen
9d60bb44cf doc: add NVIDIA blackwell to supported list (#11307) 2025-07-05 16:06:30 -07:00
Daniel Hiltgen
1c6669e64c Re-remove cuda v11 (#10694)
* Re-remove cuda v11

Revert the revert - drop v11 support requiring drivers newer than Feb 23

This reverts commit c6bcdc4223.

* Simplify layout

With only one version of the GPU libraries, we can simplify things down somewhat.  (Jetsons still require special handling)

* distinct sbsa variant for linux arm64

This avoids accidentally trying to load the sbsa cuda libraries on
a jetson system which results in crashes.

* temporary prevent rocm+cuda mixed loading
2025-06-23 14:07:00 -07:00
Jeffrey Morgan
8bcb3125c1 benchmark: remove unused benchmark test (#11120)
Removes a test under benchmark/ that is unused
2025-06-18 12:58:50 -07:00
Krzysztof Jeziorny
fc0309615e docs: update link to AMD drivers in linux.md (#10973) 2025-06-06 23:30:04 -04:00
Jeffrey Morgan
09d308d6b6 Revert "server: add model capabilities to the list endpoint (#10174)" (#11004)
This reverts commit 0943001193.
2025-06-06 23:29:14 -04:00
Hunter Wittenborn
c6a6d7294d docs: fix typo in development.md (#10998) 2025-06-06 12:07:29 -04:00
JasonHonKL
0943001193 server: add model capabilities to the list endpoint (#10174) 2025-06-04 11:39:48 -07:00
Devon Rifkin
5f57b0ef42 add thinking support to the api and cli (#10584)
- Both `/api/generate` and `/api/chat` now accept a `"think"`
  option that allows specifying whether thinking mode should be on or
  not
- Templates get passed this new option so, e.g., qwen3's template can
  put `/think` or `/no_think` in the system prompt depending on the
  value of the setting
- Models' thinking support is inferred by inspecting model templates.
  The prefix and suffix the parser uses to identify thinking support is
  also automatically inferred from templates
- Thinking control & parsing is opt-in via the API to prevent breaking
  existing API consumers. If the `"think"` option is not specified, the
  behavior is unchanged from previous versions of ollama
- Add parsing for thinking blocks in both streaming/non-streaming mode
  in both `/generate` and `/chat`
- Update the CLI to make use of these changes. Users can pass `--think`
  or `--think=false` to control thinking, or during an interactive
  session they can use the commands `/set think` or `/set nothink`
- A `--hidethinking` option has also been added to the CLI. This makes
  it easy to use thinking in scripting scenarios like
  `ollama run qwen3 --think --hidethinking "my question here"` where you
  just want to see the answer but still want the benefits of thinking
  models
2025-05-28 19:38:52 -07:00
frob
6623898198 docs: remove unsupported quantizations (#10842) 2025-05-24 13:17:26 -07:00
Daniel Hiltgen
c6bcdc4223 Revert "remove cuda v11 (#10569)" (#10692)
Bring back v11 until we can better warn users that their driver
is too old.

This reverts commit fa393554b9.
2025-05-13 13:12:54 -07:00
Daniel Hiltgen
9d6df90805 Follow up to #10363 (#10647)
The quantization PR didn't block all unsupported file types,
which this PR fixes.  It also updates the API docs to reflect
the now reduced set of supported types.
2025-05-12 15:23:31 -07:00
Jeffrey Morgan
fa9973cd7f api: remove unused sampling parameters (#10581) 2025-05-08 08:31:08 -07:00
Daniel Hiltgen
fa393554b9 remove cuda v11 (#10569)
This reduces the size of our Windows installer payloads by ~256M by dropping
support for nvidia drivers older than Feb 2023.  Hardware support is unchanged.

Linux default bundle sizes are reduced by ~600M to 1G.
2025-05-06 17:33:19 -07:00
Jeffrey Morgan
3b2d2c8326 api: remove unused or unsupported api options (#10574)
Some options listed in api/types.go are not supported in
newer models, or have been deprecated in the past. This is
the first of a series of PRs to clean up the API options
2025-05-05 14:54:40 -07:00
Devon Rifkin
44b466eeb2 config: update default context length to 4096 2025-04-28 17:03:27 -07:00
Devon Rifkin
dd93e1af85 Revert "increase default context length to 4096 (#10364)"
This reverts commit 424f648632.
2025-04-28 16:54:11 -07:00
Devon Rifkin
424f648632 increase default context length to 4096 (#10364)
* increase default context length to 4096

We lower the default numParallel from 4 to 2 and use these "savings" to
double the default context length from 2048 to 4096.

We're memory neutral in cases when we previously would've used
numParallel == 4, but we add the following mitigation to handle some
cases where we would have previously fallen back to 1x2048 due to low
VRAM: we decide between 2048 and 4096 using a runtime check, choosing
2048 if we're on a one GPU system with total VRAM of <= 4 GB. We
purposefully don't check the available VRAM because we don't want the
context window size to change unexpectedly based on the available VRAM.

We plan on making the default even larger, but this is a relatively
low-risk change we can make to quickly double it.

* fix tests

add an explicit context length so they don't get truncated. The code
that converts -1 from being a signal for doing a runtime check isn't
running as part of these tests.

* tweak small gpu message

* clarify context length default

also make it actually show up in `ollama serve --help`
2025-04-22 16:33:24 -07:00
Devon Rifkin
637fd21230 docs: change more template blocks to have syntax highlighting
In #8215 syntax highlighting was added to most of the blocks, but there were a couple that were still being rendered as plaintext
2025-04-15 12:08:11 -07:00
Devon Rifkin
378d3210dc docs: update some response code blocks to json5
This is to prevent rendering bright red comments indicating invalid JSON when the comments are just supposed to be explanatory
2025-04-14 17:09:06 -07:00
frob
ccc8c6777b cleanup: remove OLLAMA_TMPDIR and references to temporary executables (#10182)
* cleanup: remove OLLAMA_TMPDIR
* cleanup: ollama doesn't use temporary executables anymore

---------

Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-04-08 15:01:39 -07:00
Bruce MacDonald
e172f095ba api: return model capabilities from the show endpoint (#10066)
With support for multimodal models becoming more varied and common it is important for clients to be able to easily see what capabilities a model has. Retuning these from the show endpoint will allow clients to easily see what a model can do.
2025-04-01 15:21:46 -07:00
Parth Sareen
b816ff86c9 docs: make context length faq readable (#10006) 2025-03-26 17:34:18 -07:00
copeland3300
5e0b904e88 docs: add flags to example linux log output command (#9852) 2025-03-25 09:52:23 -07:00
Bruce MacDonald
fb6252d786 benchmark: performance of running ollama server (#8643) 2025-03-21 13:08:20 -07:00
Parth Sareen
d14ce75b95 docs: update final response for /api/chat stream (#9919) 2025-03-21 12:35:47 -07:00
Bradley Erickson
74b44fdf8f docs: Add OLLAMA_ORIGINS for browser extension support (#9643) 2025-03-13 16:35:20 -07:00
Michael Yang
fe776293f7 Merge pull request #9569 from dwt/patch-1
Better WantedBy declaration
2025-03-10 14:09:37 -07:00
frob
d8a5d96b98 docs: Add OLLAMA_CONTEXT_LENGTH to FAQ. (#9545) 2025-03-10 11:02:54 -07:00
‮rekcäH nitraM‮
25248f4bd5 Better WantedBy declaration
The problem with default.target is that it always points to the target that is currently started. So if you boot into single user mode or the rescue mode still Ollama tries to start.

I noticed this because either tried (and failed) to start all the time during a system update, where Ollama definitely is not wanted.
2025-03-07 10:26:31 +01:00
Daniel Hiltgen
cae5d4d4ea Win: doc new rocm zip file (#9367)
To stay under the 2G github artifact limit, we're splitting ROCm
out like we do on linux.
2025-03-05 14:11:21 -08:00
Blake Mizerany
55ab9f371a server/.../backoff,syncs: don't break builds without synctest (#9484)
Previously, developers without the synctest experiment enabled would see
build failures when running tests in some server/internal/internal
packages using the synctest package. This change makes the transition to
use of the package less painful but guards the use of the synctest
package with build tags.

synctest is enabled in CI. If a new change will break a synctest
package, it will break in CI, even if it does not break locally.

The developer docs have been updated to help with any confusion about
why package tests pass locally but fail in CI.
2025-03-03 16:45:40 -08:00
Daniel Hiltgen
688925aca9 Windows ARM build (#9120)
* Windows ARM build

Skip cmake, and note it's unused in the developer docs.

* Win: only check for ninja when we need it

On windows ARM, the cim lookup fails, but we don't need ninja anyway.
2025-02-27 09:02:25 -08:00
Chuanhui Liu
888855675e docs: rocm install link (#9346) 2025-02-25 13:15:47 -08:00
frob
4df98f3eb5 Move cgroups fix out of AMD section. (#9072)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-02-25 08:52:50 -08:00
Jeffrey Morgan
7cfd4aee4d docs: add additional ROCm docs for building (#9066) 2025-02-22 11:22:59 -08:00