chat api endpoint (#1392)
docs/api.md

@@ -24,7 +24,7 @@ All durations are returned in nanoseconds.

### Streaming responses

Certain endpoints stream responses as JSON objects delineated with the newline (`\n`) character.
Certain endpoints stream responses as JSON objects.
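
One way to consume such a newline-delimited stream from the shell is to read it line by line. This is a minimal sketch, assuming the `prompt` field and model name used in the examples later in this document:

```shell
# Illustrative sketch: each line of the streamed response is a standalone JSON object.
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}' | while read -r line; do
  echo "$line"    # handle one JSON object per iteration
done
```
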
## Generate a completion

@@ -32,7 +32,7 @@ Certain endpoints stream responses as JSON objects delineated with the newline (
POST /api/generate
```

Generate a response for a given prompt with a provided model. This is a streaming endpoint, so will be a series of responses. The final response object will include statistics and additional data from the request.
Generate a response for a given prompt with a provided model. This is a streaming endpoint, so there will be a series of responses. The final response object will include statistics and additional data from the request.

### Parameters

@@ -47,7 +47,7 @@ Advanced parameters (optional):
- `template`: the full prompt or prompt template (overrides what is defined in the `Modelfile`)
- `context`: the context parameter returned from a previous request to `/generate`; this can be used to keep a short conversational memory
- `stream`: if `false` the response will be returned as a single response object, rather than a stream of objects
- `raw`: if `true` no formatting will be applied to the prompt and no context will be returned. You may choose to use the `raw` parameter if you are specifying a full templated prompt in your request to the API, and are managing history yourself.
- `raw`: if `true` no formatting will be applied to the prompt. You may choose to use the `raw` parameter if you are specifying a full templated prompt in your request to the API.

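A minimal sketch of how the advanced parameters above can be combined in one request; the `prompt` text and `temperature` value are illustrative placeholders:

```shell
# Illustrative sketch: a non-streaming request that also passes model options.
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "options": {
    "temperature": 0.7
  }
}'
```
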
### JSON mode

@@ -57,7 +57,7 @@ Enable JSON mode by setting the `format` parameter to `json`. This will structur

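A minimal sketch of a JSON mode request, assuming the `format` parameter described above; the prompt wording is illustrative:

```shell
# Illustrative sketch: ask for structured output by setting "format": "json".
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "What color is the sky at different times of the day? Respond using JSON",
  "format": "json",
  "stream": false
}'
```
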
### Examples

#### Request
#### Request (Prompt)

```shell
curl http://localhost:11434/api/generate -d '{
@@ -114,6 +114,8 @@ To calculate how fast the response is generated in tokens per second (token/s),

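Since durations are reported in nanoseconds, tokens per second works out to `eval_count / eval_duration * 10^9`. A quick check with the sample values from the final responses shown later in this document:

```shell
# Quick check using the example eval_count and eval_duration values below.
echo "113 / 1325948000 * 10^9" | bc -l   # ≈ 85 tokens/s
```
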
#### Request (No streaming)

A response can be received in one reply when streaming is off.

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
@@ -144,9 +146,9 @@ If `stream` is set to `false`, the response will be a single JSON object:
}
```

#### Request (Raw mode)
#### Request (Raw Mode)

In some cases you may wish to bypass the templating system and provide a full prompt. In this case, you can use the `raw` parameter to disable formatting and context.
In some cases you may wish to bypass the templating system and provide a full prompt. In this case, you can use the `raw` parameter to disable formatting.

```shell
curl http://localhost:11434/api/generate -d '{
@@ -164,6 +166,7 @@ curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "created_at": "2023-11-03T15:36:02.583064Z",
  "response": " The sky appears blue because of a phenomenon called Rayleigh scattering.",
  "context": [1, 2, 3],
  "done": true,
  "total_duration": 14648695333,
  "load_duration": 3302671417,
@@ -275,7 +278,6 @@ curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "created_at": "2023-08-04T19:22:45.499127Z",
  "response": "The sky is blue because it is the color of the sky.",
  "context": [1, 2, 3],
  "done": true,
  "total_duration": 5589157167,
  "load_duration": 3013701500,
@@ -288,6 +290,133 @@ curl http://localhost:11434/api/generate -d '{
}
```

## Send Chat Messages

```shell
POST /api/chat
```

Generate the next message in a chat with a provided model. This is a streaming endpoint, so there will be a series of responses. The final response object will include statistics and additional data from the request.

### Parameters

- `model`: (required) the [model name](#model-names)
- `messages`: the messages of the chat; this can be used to keep a chat memory

Advanced parameters (optional):

- `format`: the format to return a response in. Currently the only accepted value is `json`
- `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values) such as `temperature`
- `template`: the full prompt or prompt template (overrides what is defined in the `Modelfile`)
- `stream`: if `false` the response will be returned as a single response object, rather than a stream of objects

### Examples

#### Request
Send a chat message with a streaming response.

```shell
curl http://localhost:11434/api/chat -d '{
  "model": "llama2",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    }
  ]
}'
```

#### Response

A stream of JSON objects is returned:

```json
{
  "model": "llama2",
  "created_at": "2023-08-04T08:52:19.385406455-07:00",
  "message": {
    "role": "assistant",
    "content": "The"
  },
  "done": false
}
```

Final response:

```json
{
  "model": "llama2",
  "created_at": "2023-08-04T19:22:45.499127Z",
  "done": true,
  "total_duration": 5589157167,
  "load_duration": 3013701500,
  "sample_count": 114,
  "sample_duration": 81442000,
  "prompt_eval_count": 46,
  "prompt_eval_duration": 1160282000,
  "eval_count": 113,
  "eval_duration": 1325948000
}
```

#### Request (With History)
Send a chat message with a conversation history.

```shell
curl http://localhost:11434/api/chat -d '{
  "model": "llama2",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    },
    {
      "role": "assistant",
      "content": "due to rayleigh scattering."
    },
    {
      "role": "user",
      "content": "how is that different than mie scattering?"
    }
  ]
}'
```

#### Response

A stream of JSON objects is returned:

```json
{
  "model": "llama2",
  "created_at": "2023-08-04T08:52:19.385406455-07:00",
  "message": {
    "role": "assistant",
    "content": "The"
  },
  "done": false
}
```

Final response:

```json
{
  "model": "llama2",
  "created_at": "2023-08-04T19:22:45.499127Z",
  "done": true,
  "total_duration": 5589157167,
  "load_duration": 3013701500,
  "sample_count": 114,
  "sample_duration": 81442000,
  "prompt_eval_count": 46,
  "prompt_eval_duration": 1160282000,
  "eval_count": 113,
  "eval_duration": 1325948000
}
```

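The `stream` parameter listed above suggests a single-object reply is also possible for `/api/chat`; a minimal sketch, assuming it behaves as documented for `/api/generate`:

```shell
# Illustrative sketch: request the chat reply as one JSON object instead of a stream.
curl http://localhost:11434/api/chat -d '{
  "model": "llama2",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ],
  "stream": false
}'
```
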
## Create a Model

```shell