---
title: Thinking
---

Thinking-capable models emit a `thinking` field that separates their reasoning trace from the final answer.

Use this capability to audit model steps, animate the model *thinking* in a UI, or hide the trace entirely when you only need the final response.

## Supported models

- [Qwen 3](https://ollama.com/library/qwen3)
- [GPT-OSS](https://ollama.com/library/gpt-oss) *(use `think` levels `low`, `medium`, or `high`; the trace cannot be fully disabled)*
- [DeepSeek-v3.1](https://ollama.com/library/deepseek-v3.1)
- [DeepSeek R1](https://ollama.com/library/deepseek-r1)
- Browse the latest additions under [thinking models](https://ollama.com/search?c=thinking)

## Enable thinking in API calls

Set the `think` field on chat or generate requests. Most models accept booleans (`true`/`false`).

GPT-OSS instead expects one of `low`, `medium`, or `high` to tune the trace length.

The `message.thinking` (chat endpoint) or `thinking` (generate endpoint) field contains the reasoning trace, while `message.content` / `response` holds the final answer.

<Tabs>
<Tab title="cURL">
```shell
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3",
  "messages": [{
    "role": "user",
    "content": "How many letter r are in strawberry?"
  }],
  "think": true,
  "stream": false
}'
```
</Tab>
<Tab title="Python">
```python
from ollama import chat

response = chat(
    model='qwen3',
    messages=[{'role': 'user', 'content': 'How many letter r are in strawberry?'}],
    think=True,
    stream=False,
)

print('Thinking:\n', response.message.thinking)
print('Answer:\n', response.message.content)
```
</Tab>
<Tab title="JavaScript">
```javascript
import ollama from 'ollama'

const response = await ollama.chat({
  model: 'deepseek-r1',
  messages: [{ role: 'user', content: 'How many letter r are in strawberry?' }],
  think: true,
  stream: false,
})

console.log('Thinking:\n', response.message.thinking)
console.log('Answer:\n', response.message.content)
```
</Tab>
</Tabs>

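As noted above, the generate endpoint accepts the same `think` flag and returns the trace in a top-level `thinking` field alongside the answer in `response`. A minimal cURL sketch:

```shell
# Sketch: same request against /api/generate; the trace comes back in
# "thinking" and the final answer in "response"
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3",
  "prompt": "How many letter r are in strawberry?",
  "think": true,
  "stream": false
}'
```
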
<Note>
GPT-OSS requires `think` to be set to `"low"`, `"medium"`, or `"high"`. Passing `true`/`false` is ignored for that model.
</Note>

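The request shape stays the same for GPT-OSS, except `think` carries a level string. A minimal sketch:

```shell
# Sketch: GPT-OSS takes a level string ("low" | "medium" | "high") instead of a boolean
curl http://localhost:11434/api/chat -d '{
  "model": "gpt-oss",
  "messages": [{
    "role": "user",
    "content": "How many letter r are in strawberry?"
  }],
  "think": "high",
  "stream": false
}'
```
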
## Stream the reasoning trace

When streaming, reasoning tokens arrive before answer tokens. Detect the first `thinking` chunk to render a "thinking" section, then switch to the final reply once `message.content` arrives.

<Tabs>
<Tab title="Python">
```python
from ollama import chat

stream = chat(
    model='qwen3',
    messages=[{'role': 'user', 'content': 'What is 17 × 23?'}],
    think=True,
    stream=True,
)

in_thinking = False

for chunk in stream:
    # Print the header once, on the first thinking token
    if chunk.message.thinking and not in_thinking:
        in_thinking = True
        print('Thinking:\n', end='')

    if chunk.message.thinking:
        print(chunk.message.thinking, end='')
    elif chunk.message.content:
        # The first content token marks the switch from trace to answer
        if in_thinking:
            print('\n\nAnswer:\n', end='')
            in_thinking = False
        print(chunk.message.content, end='')
```
</Tab>
<Tab title="JavaScript">
```javascript
import ollama from 'ollama'

async function main() {
  const stream = await ollama.chat({
    model: 'qwen3',
    messages: [{ role: 'user', content: 'What is 17 × 23?' }],
    think: true,
    stream: true,
  })

  let inThinking = false

  for await (const chunk of stream) {
    // Print the header once, on the first thinking token
    if (chunk.message.thinking && !inThinking) {
      inThinking = true
      process.stdout.write('Thinking:\n')
    }

    if (chunk.message.thinking) {
      process.stdout.write(chunk.message.thinking)
    } else if (chunk.message.content) {
      // The first content token marks the switch from trace to answer
      if (inThinking) {
        process.stdout.write('\n\nAnswer:\n')
        inThinking = false
      }
      process.stdout.write(chunk.message.content)
    }
  }
}

main()
```
</Tab>
</Tabs>

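The same stream is available over raw HTTP: with `"stream": true`, each NDJSON chunk carries a `message.thinking` or `message.content` delta, and the final chunk sets `done`. A minimal cURL sketch:

```shell
# Sketch: early chunks populate message.thinking, later ones message.content
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3",
  "messages": [{ "role": "user", "content": "What is 17 × 23?" }],
  "think": true,
  "stream": true
}'
```
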
## CLI quick reference

- Enable thinking for a single run: `ollama run deepseek-r1 --think "Where should I visit in Lisbon?"`
- Disable thinking: `ollama run deepseek-r1 --think=false "Summarize this article"`
- Hide the trace while still using a thinking model: `ollama run deepseek-r1 --hidethinking "Is 9.9 bigger or 9.11?"` (see the scripting sketch after this list)
- Inside interactive sessions, toggle with `/set think` or `/set nothink`.
- GPT-OSS only accepts levels: `ollama run gpt-oss --think=low "Draft a headline"` (replace `low` with `medium` or `high` as needed).

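A minimal scripting sketch for the `--hidethinking` flag above, so only the final answer reaches stdout:

```shell
# The reasoning trace is suppressed; capture just the answer
answer=$(ollama run deepseek-r1 --hidethinking "Is 9.9 bigger or 9.11?")
echo "$answer"
```
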
<Note>Thinking is enabled by default in the CLI and API for supported models.</Note>