Commit Graph

796 Commits

Author SHA1 Message Date
499e87c9ba Merge pull request #5730 from ollama/mxyng/cleanup
remove unneeded tool calls
2024-07-16 14:42:13 -07:00
d290e87513 add suffix support to generate endpoint
this change is triggered by the presence of "suffix", particularly
useful for code completion tasks
2024-07-16 14:31:35 -07:00
5a83f79afd remove unneeded tool calls 2024-07-16 13:48:45 -07:00
987dbab0b0 OpenAI: /v1/embeddings compatibility (#5285)
* OpenAI v1 models

* Empty List Testing

* Add back envconfig

* v1/models docs

* Remove Docs

* OpenAI batch embed compatibility

* merge conflicts

* integrate with api/embed

* ep

* merge conflicts

* request tests

* rm resp test

* merge conflict

* merge conflict

* test fixes

* test fn renaming

* input validation for empty string

---------

Co-authored-by: jmorganca <jmorganca@gmail.com>
2024-07-16 13:36:08 -07:00
a8388beb94 Merge pull request #5726 from ollama/mxyng/tools-templates
fix unmarshal type errors
2024-07-16 12:12:10 -07:00
5afbb60fc4 fix unmarshal type errors 2024-07-16 11:39:34 -07:00
4cb5d7decc server: omit model system prompt if empty (#5717) 2024-07-16 11:09:00 -07:00
4a565cbf94 add chat and generate tests with mock runner 2024-07-16 09:39:31 -07:00
64039df6d7 Merge pull request #5284 from ollama/mxyng/tools
tools
2024-07-15 18:03:37 -07:00
7ac6d462ec server: return empty slice on empty /api/embed request (#5713)
* server: return empty slice on empty `/api/embed` request

* fix tests
2024-07-15 17:39:44 -07:00
ef5136a745 tools test 2024-07-15 17:18:21 -07:00
d02bbebb11 tools 2024-07-15 15:26:16 -07:00
b9f5e16c80 Introduce /api/embed endpoint supporting batch embedding (#5127)
* Initial Batch Embedding

* Revert "Initial Batch Embedding"

This reverts commit c22d54895a.

* Initial Draft

* mock up notes

* api/embed draft

* add server function

* check normalization

* clean up

* normalization

* playing around with truncate stuff

* Truncation

* Truncation

* move normalization to go

* Integration Test Template

* Truncation Integration Tests

* Clean up

* use float32

* move normalize

* move normalize test

* refactoring

* integration float32

* input handling and handler testing

* Refactoring of legacy and new

* clear comments

* merge conflicts

* touches

* embedding type 64

* merge conflicts

* fix hanging on single string

* refactoring

* test values

* set context length

* clean up

* testing clean up

* testing clean up

* remove function closure

* Revert "remove function closure"

This reverts commit 55d48c6ed1.

* remove function closure

* remove redundant error check

* clean up

* more clean up

* clean up
2024-07-15 12:14:24 -07:00
057d31861e remove template (#5655) 2024-07-13 20:56:24 -07:00
f7ee012300 server: prepend system message in chat handler 2024-07-13 15:08:00 -07:00
1ed0aa8fea server: fix context, load_duration and total_duration fields (#5676)
* server: fix `contet`, `load_duration` and `total_duration` fields

* Update server/routes.go
2024-07-13 09:25:31 -07:00
22c5451fc2 fix system prompt (#5662)
* fix system prompt

* execute template when hitting previous roles

* fix tests

---------

Co-authored-by: jmorganca <jmorganca@gmail.com>
2024-07-12 21:04:44 -07:00
ebc529cbb3 autodetect stop parameters from template 2024-07-12 16:01:23 -07:00
57ec6901eb revert embedded templates to use prompt/response
This reverts commit 19753c18c0.

for compat. messages will be added at a later date
2024-07-11 14:49:35 -07:00
791650ddef sched: only error when over-allocating system memory (#5626) 2024-07-11 00:53:12 -07:00
41be28096a add system prompt to first legacy template 2024-07-10 17:03:08 -07:00
f4408219e9 Refine scheduler unit tests for reliability
This breaks up some of the test scenarios to create a
more reliable set of tests, as well as adding a little more
coverage.
2024-07-09 16:00:08 -07:00
6bbbc50f10 Merge pull request #5440 from ollama/mxyng/messages-templates
update named templates
2024-07-09 09:36:32 -07:00
9bbddc37a7 Merge pull request #5126 from ollama/mxyng/messages
update message processing
2024-07-09 09:20:44 -07:00
e4ff73297d server: fix model reloads when setting OLLAMA_NUM_PARALLEL (#5560)
* server: fix unneeded model reloads when setting `OLLAMA_NUM_PARALLEL`

* remove whitespace change

* undo some changes
2024-07-08 22:32:15 -07:00
0ee87615c7 sched: don't error if paging to disk on Windows and macOS (#5523) 2024-07-06 22:01:52 -04:00
fb6cbc02fb update named templates 2024-07-05 16:29:32 -07:00
ac7a842e55 fix model reloading
ensure runtime model changes (template, system prompt, messages,
options) are captured on model updates without needing to reload the
server
2024-07-05 13:17:25 -07:00
2c3fe1fd97 comments 2024-07-05 13:17:24 -07:00
269ed6e6a2 update message processing 2024-07-05 13:16:58 -07:00
af28b94533 Merge pull request #5469 from dhiltgen/prevent_system_oom
Prevent loading models larger than total memory
2024-07-05 08:22:20 -07:00
0d16eb310e fix: use envconfig.ModelsDir directly (#4821)
* Co-authored-by: Anatoli Babenia <anatoli@rainforce.org>

Co-authored-by: Maas Lalani <maas@lalani.dev>
2024-07-03 15:36:11 -07:00
955f2a4e03 Only set default keep_alive on initial model load
This change fixes the handling of keep_alive so that if client
request omits the setting, we only set this on initial load.  Once
the model is loaded, if new requests leave this unset, we'll keep
whatever keep_alive was there.
2024-07-03 15:29:56 -07:00
3c75113e37 Prevent loading models larger than total memory
Users may not realize the siny new model they're trying to load
fits on their disk, but can't load into system+GPU memory.  Today
we crash, but with this fix, we'll give them a better error message
before even trying to load it.
2024-07-03 14:47:42 -07:00
65a5040e09 fix generate template 2024-07-02 16:42:17 -07:00
d626b99b54 OpenAI: v1/completions compatibility (#5209)
* OpenAI v1 models

* Refactor Writers

* Add Test

Co-Authored-By: Attila Kerekes

* Credit Co-Author

Co-Authored-By: Attila Kerekes <439392+keriati@users.noreply.github.com>

* Empty List Testing

* Use Namespace for Ownedby

* Update Test

* Add back envconfig

* v1/models docs

* Use ModelName Parser

* Test Names

* Remove Docs

* Clean Up

* Test name

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Add Middleware for Chat and List

* Completions Endpoint

* Testing Cleanup

* Test with Fatal

* Add functionality to chat test

* Rename function

* float types

* type cleanup

* cleaning

* more cleaning

* Extra test cases

* merge conflicts

* merge conflicts

* merge conflicts

* merge conflicts

* cleaning

* cleaning

---------

Co-authored-by: Attila Kerekes <439392+keriati@users.noreply.github.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-07-02 16:01:45 -07:00
dddb58a38b Merge pull request #5051 from ollama/mxyng/capabilities
add model capabilities
2024-07-02 14:26:07 -07:00
400056e154 Merge pull request #5420 from ollama/mxyng/insecure-path
err on insecure path
2024-07-02 14:03:23 -07:00
996bb1b85e OpenAI: /v1/models and /v1/models/{model} compatibility (#5007)
* OpenAI v1 models

* Refactor Writers

* Add Test

Co-Authored-By: Attila Kerekes

* Credit Co-Author

Co-Authored-By: Attila Kerekes <439392+keriati@users.noreply.github.com>

* Empty List Testing

* Use Namespace for Ownedby

* Update Test

* Add back envconfig

* v1/models docs

* Use ModelName Parser

* Test Names

* Remove Docs

* Clean Up

* Test name

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Add Middleware for Chat and List

* Testing Cleanup

* Test with Fatal

* Add functionality to chat test

* OpenAI: /v1/models/{model} compatibility (#5028)

* Retrieve Model

* OpenAI Delete Model

* Retrieve Middleware

* Remove Delete from Branch

* Update Test

* Middleware Test File

* Function name

* Cleanup

* Test Update

* Test Update

---------

Co-authored-by: Attila Kerekes <439392+keriati@users.noreply.github.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-07-02 11:50:56 -07:00
88bcd79bb9 err on insecure path 2024-07-01 15:55:59 -07:00
da8e2a0447 use kvs to detect embedding models 2024-07-01 10:47:43 -07:00
a30915bde1 add capabilities 2024-07-01 10:47:43 -07:00
58e3fff311 rename templates to template 2024-07-01 10:40:54 -07:00
3f0b309ad4 remove ManifestV2 2024-07-01 10:40:54 -07:00
cff3f44f4a Fix case for NumCtx 2024-07-01 09:43:59 -07:00
3518aaef33 Merge pull request #4218 from dhiltgen/auto_parallel
Enable concurrency by default
2024-07-01 08:32:29 -07:00
123a722a6f zip: prevent extracting files into parent dirs (#5314) 2024-06-26 21:38:21 -07:00
cb42e607c5 llm: speed up gguf decoding by a lot (#5246)
Previously, some costly things were causing the loading of GGUF files
and their metadata and tensor information to be VERY slow:

  * Too many allocations when decoding strings
  * Hitting disk for each read of each key and value, resulting in a
    not-okay amount of syscalls/disk I/O.

The show API is now down to 33ms from 800ms+ for llama3 on a macbook pro
m3.

This commit also prevents collecting large arrays of values when
decoding GGUFs (if desired). When such keys are encountered, their
values are null, and are encoded as such in JSON.

Also, this fixes a broken test that was not encoding valid GGUF.
2024-06-24 21:47:52 -07:00
642cee1342 Sort the ps output
Provide consistent ordering for the ps command - longest duration listed first
2024-06-21 15:59:41 -07:00
9929751cc8 Disable concurrency for AMD + Windows
Until ROCm v6.2 ships, we wont be able to get accurate free memory
reporting on windows, which makes automatic concurrency too risky.
Users can still opt-in but will need to pay attention to model sizes otherwise they may thrash/page VRAM or cause OOM crashes.
All other platforms and GPUs have accurate VRAM reporting wired
up now, so we can turn on concurrency by default.
2024-06-21 15:45:05 -07:00