Commit Graph

693 Commits

SHA1 Message Date
f3d7a481b7 feat: add support for min_p (resolve #1142) (#1825) 2024-07-27 14:37:40 -07:00
f2a96c7d77 llm: keep patch for llama 3 rope factors (#5987) 2024-07-26 15:20:52 -07:00
e12fff8810 Enable windows error dialog for subprocess startup
Make sure that if something goes wrong spawning the process, the user
gets enough info to self-correct, or at least file a bug with details
so we can fix it. Once the process starts, we immediately change back
to the recommended setting to prevent the blocking dialog. This ensures
that if the model fails to load (OOM, unsupported model type, etc.) the
process exits quickly and we can scan the subprocess's stdout/stderr
for the reason to report via the API.
2024-07-22 14:07:27 -07:00
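
In outline, the mechanism is a temporary error-mode flip around the spawn. A minimal sketch, assuming a direct kernel32 SetErrorMode call (the startRunner helper name is illustrative, not the repo's):

```go
//go:build windows

package llm

import (
	"os/exec"

	"golang.org/x/sys/windows"
)

var (
	kernel32         = windows.NewLazySystemDLL("kernel32.dll")
	procSetErrorMode = kernel32.NewProc("SetErrorMode")
)

// startRunner temporarily re-enables the Windows error dialogs (mode 0)
// so a startup failure surfaces something actionable, then restores the
// previous quiet mode as soon as the subprocess has been spawned; the
// child inherits the error mode in effect at creation time.
func startRunner(cmd *exec.Cmd) error {
	prev, _, _ := procSetErrorMode.Call(0) // 0 = show all error dialogs
	err := cmd.Start()
	procSetErrorMode.Call(prev) // back to the recommended setting
	return err
}
```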
e2c3f6b3e2 string 2024-07-22 11:27:52 -07:00
55cd3ddcca bool 2024-07-22 11:27:21 -07:00
35b89b2eab rfc: dynamic environ lookup 2024-07-22 11:25:30 -07:00
5784c05397 Merge pull request #5854 from dhiltgen/win_exit_status
Refine error reporting for subprocess crash
2024-07-22 10:40:22 -07:00
f8fedbda20 Update llama.cpp submodule commit to d94c6e0c (#5805) 2024-07-22 12:42:00 -04:00
a3c20e3f18 Refine error reporting for subprocess crash
On Windows, the exit status winds up being the term many users search
for, and they end up piling onto unrelated issues. This refines the
reporting so that when we have a more detailed message we suppress the
exit-status portion of the message.
2024-07-22 08:52:16 -07:00
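
In spirit, the change amounts to something like this sketch (the helper and its signature are assumptions, not the repo's):

```go
package llm

import "fmt"

// crashError is a hypothetical helper: when a descriptive failure
// message was scraped from the runner's output, report it alone and
// suppress the exit status that users would otherwise search for.
func crashError(exitErr error, detail string) error {
	if detail != "" {
		return fmt.Errorf("llama runner terminated: %s", detail)
	}
	return fmt.Errorf("llama runner terminated: %v", exitErr)
}
```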
5534f2cc6a llm: consider head_dim in llama arch (#5817) 2024-07-20 21:48:12 -04:00
283948c83b Adjust windows ROCm discovery
The v5 HIP library returns unsupported GPUs which won't enumerate at
inference time in the runner, so this makes sure we align discovery.
The gfx906 cards are no longer supported, so we shouldn't compile with
that GPU type as it won't enumerate at runtime.
2024-07-20 15:17:50 -07:00
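
A sketch of the filtering idea, assuming discovery yields gfx architecture strings (the function and package names are illustrative):

```go
package discover

// unsupportedGfx lists architectures the bundled ROCm can no longer
// run; gfx906 comes from the commit message, the rest of the plumbing
// is assumed.
var unsupportedGfx = map[string]bool{"gfx906": true}

// alignDiscovery drops GPUs that would not enumerate in the runner at
// inference time, so discovery and the runner agree on the device set.
func alignDiscovery(found []string) []string {
	supported := found[:0]
	for _, gfx := range found {
		if !unsupportedGfx[gfx] {
			supported = append(supported, gfx)
		}
	}
	return supported
}
```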
1475eab95f add patch for tekken (#5807) 2024-07-20 13:41:21 -04:00
4a565cbf94 add chat and generate tests with mock runner 2024-07-16 09:39:31 -07:00
b9f5e16c80 Introduce /api/embed endpoint supporting batch embedding (#5127)
* Initial Batch Embedding

* Revert "Initial Batch Embedding"

This reverts commit c22d54895a.

* Initial Draft

* mock up notes

* api/embed draft

* add server function

* check normalization

* clean up

* normalization

* playing around with truncate stuff

* Truncation

* Truncation

* move normalization to go

* Integration Test Template

* Truncation Integration Tests

* Clean up

* use float32

* move normalize

* move normalize test

* refactoring

* integration float32

* input handling and handler testing

* Refactoring of legacy and new

* clear comments

* merge conflicts

* touches

* embedding type 64

* merge conflicts

* fix hanging on single string

* refactoring

* test values

* set context length

* clean up

* testing clean up

* testing clean up

* remove function closure

* Revert "remove function closure"

This reverts commit 55d48c6ed1.

* remove function closure

* remove redundant error check

* clean up

* more clean up

* clean up
2024-07-15 12:14:24 -07:00
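
The endpoint that landed accepts a single string or a batch under `input` and returns one vector per input. A minimal client sketch against a local server (error handling trimmed; the model name is just an example of an embedding-capable model):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type embedRequest struct {
	Model string   `json:"model"`
	Input []string `json:"input"`
}

type embedResponse struct {
	Embeddings [][]float32 `json:"embeddings"`
}

func main() {
	body, _ := json.Marshal(embedRequest{
		Model: "all-minilm",
		Input: []string{"first passage", "second passage"},
	})
	resp, err := http.Post("http://localhost:11434/api/embed",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out embedResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println("vectors:", len(out.Embeddings)) // one per input string
}
```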
ef98803d63 llm: looser checks for minimum memory (#5677) 2024-07-13 09:20:05 -07:00
10e768826c fix: quant err message (#5616) 2024-07-11 17:24:29 -07:00
c4cf8ad559 llm: avoid loading model if system memory is too small (#5637)
* llm: avoid loading model if system memory is too small

* update log

* Instrument swap free space

On Linux and Windows, expose how much swap space is available
so we can take that into consideration when scheduling models

* use `systemSwapFreeMemory` in check

---------

Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
2024-07-11 16:42:57 -07:00
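
On Linux, the figure behind `systemSwapFreeMemory` can be read from /proc/meminfo; a sketch of that instrumentation (parsing simplified, and only the Linux side shown):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// systemSwapFreeMemory returns free swap in bytes by scanning
// /proc/meminfo for the SwapFree line (values there are in kB).
func systemSwapFreeMemory() (uint64, error) {
	f, err := os.Open("/proc/meminfo")
	if err != nil {
		return 0, err
	}
	defer f.Close()

	s := bufio.NewScanner(f)
	for s.Scan() {
		fields := strings.Fields(s.Text())
		if len(fields) >= 2 && fields[0] == "SwapFree:" {
			kb, err := strconv.ParseUint(fields[1], 10, 64)
			return kb * 1024, err
		}
	}
	return 0, fmt.Errorf("SwapFree not found")
}
```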
791650ddef sched: only error when over-allocating system memory (#5626) 2024-07-11 00:53:12 -07:00
efbf41ed81 llm: don't link cuda with compat libs (#5621) 2024-07-10 20:01:52 -07:00
37a570f962 Merge pull request #5612 from ollama/mxyng/mem
chatglm graph
2024-07-10 14:18:33 -07:00
5a739ff4cb chatglm graph 2024-07-10 13:43:47 -07:00
4e262eb2a8 remove GGML_CUDA_FORCE_MMQ=on from build (#5588) 2024-07-10 13:17:13 -07:00
b50c818623 Merge pull request #5607 from dhiltgen/win_rocm_v6
Bump ROCm on windows to 6.1.2
2024-07-10 12:47:10 -07:00
1f50356e8e Bump ROCm on windows to 6.1.2
This also adjusts our algorithm to favor our bundled ROCm.
I've confirmed VRAM reporting still doesn't work properly, so we
can't yet enable concurrency by default.
2024-07-10 11:01:22 -07:00
22c81f62ec Remove duplicate merge glitch 2024-07-10 09:01:33 -07:00
2d1e3c3229 Merge pull request #5503 from dhiltgen/dual_rocm
Workaround broken ROCm p2p copy
2024-07-09 15:44:16 -07:00
b51e3b63ac Statically link c++ and thread lib
This makes sure we statically link the C++ and thread libraries on
Windows to avoid unnecessary runtime dependencies on non-standard DLLs.
2024-07-09 11:34:30 -07:00
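
With cgo, that usually comes down to linker directives along these lines (the exact flag set here is an assumption, not the repo's verbatim directives):

```go
package llm

// Sketch: fold libstdc++/libgcc into the binary and link the pthread
// shim statically so no non-standard DLLs are needed at runtime.

/*
#cgo windows LDFLAGS: -static-libstdc++ -static-libgcc -static
*/
import "C"
```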
9bbddc37a7 Merge pull request #5126 from ollama/mxyng/messages
update message processing
2024-07-09 09:20:44 -07:00
0bacb30007 Workaround broken ROCm p2p copy
Enable the build flag for llama.cpp to use CPU copy for multi-GPU scenarios.
2024-07-08 09:40:52 -07:00
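
Concretely, this kind of workaround is passed through to the llama.cpp build as a cmake define; a sketch, assuming GGML_CUDA_NO_PEER_COPY is the upstream option in play (it forces inter-GPU transfers to be staged through host memory, and the surrounding build plumbing here is illustrative):

```go
package main

import (
	"os"
	"os/exec"
)

func main() {
	// Configure llama.cpp with peer-to-peer GPU copies disabled, so
	// multi-GPU tensor transfers are staged through host memory.
	cmd := exec.Command("cmake",
		"-S", "llama.cpp", "-B", "build",
		"-DGGML_CUDA_NO_PEER_COPY=on")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		os.Exit(1)
	}
}
```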
53da2c6965 llm: remove ambiguous comment when putting upper limit on predictions to avoid infinite generation (#5535) 2024-07-07 14:32:05 -04:00
d8def1ff94 llm: allow gemma 2 to context shift (#5534) 2024-07-07 13:41:51 -04:00
571dc61955 Update llama.cpp submodule to a8db2a9c (#5530) 2024-07-07 13:03:09 -04:00
0e09c380fc llm: print caching notices in debug only (#5533) 2024-07-07 12:38:04 -04:00
4607c70641 llm: add -DBUILD_SHARED_LIBS=off to common cpu cmake flags (#5520) 2024-07-06 18:58:16 -04:00
a08f20d910 release: remove unwanted mingw dll.a files 2024-07-06 15:21:15 -04:00
6cea036027 Revert "llm: only statically link libstdc++"
This reverts commit 5796bfc401.
2024-07-06 15:10:48 -04:00
5796bfc401 llm: only statically link libstdc++ 2024-07-06 14:06:20 -04:00
f1a379aa56 llm: statically link pthread and stdc++ dependencies in windows build 2024-07-06 12:54:02 -04:00
9ae146993e llm: add GGML_STATIC flag to windows static lib 2024-07-06 03:27:05 -04:00
e0348d3fe8 llm: add COMMON_DARWIN_DEFS to arm static build (#5513) 2024-07-05 22:42:42 -04:00
2cc854f8cb llm: fix missing dylibs by restoring old build behavior on Linux and macOS (#5511)
* Revert "fix cmake build (#5505)"

This reverts commit 4fd5f3526a.

* llm: fix missing dylibs by restoring old build behavior

* crlf -> lf
2024-07-05 21:48:31 -04:00
5304b765b2 llm: put back old include dir (#5507)
* llm: put back old include dir

* llm: update link paths for old submodule commits
2024-07-05 19:34:21 -04:00
4fd5f3526a fix cmake build (#5505) 2024-07-05 19:07:01 -04:00
ac7a842e55 fix model reloading
ensure runtime model changes (template, system prompt, messages,
options) are captured on model updates without needing to reload the
server
2024-07-05 13:17:25 -07:00
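
One way to detect such changes without a server reload is to fingerprint the runtime-visible pieces of the model and compare per request; this is purely illustrative, not the commit's actual mechanism:

```go
package server

import (
	"crypto/sha256"
	"fmt"
)

// modelFingerprint hashes the fields that can change at runtime so a
// cached loaded model is refreshed whenever any of them differs.
// (fmt prints map keys in sorted order, so the hash is deterministic.)
func modelFingerprint(template, system string, options map[string]any) string {
	h := sha256.New()
	fmt.Fprintf(h, "%s\x00%s\x00%v", template, system, options)
	return fmt.Sprintf("%x", h.Sum(nil))
}
```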
78fb33dd07 fix typo in cgo directives in llm.go (#5501) 2024-07-05 15:18:36 -04:00
8f8e736b13 update llama.cpp submodule to d7fd29f (#5475) 2024-07-05 13:25:58 -04:00
d89454de80 Use slot with cached prompt instead of least recently used (#5492)
* Use common prefix to select slot

* actually report `longest`
2024-07-05 12:32:47 -04:00
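
The heuristic in that change is easy to state: score every slot by the length of the common token prefix between its cached prompt and the incoming one, and take the best. A sketch (names illustrative; ties to the "actually report `longest`" follow-up):

```go
package main

// commonPrefix returns how many leading tokens two prompts share.
func commonPrefix(a, b []int) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

// pickSlot selects the slot whose cached prompt overlaps the new
// prompt most, and reports the longest match so it can be logged.
func pickSlot(slots [][]int, prompt []int) (best, longest int) {
	longest = -1
	for i, cached := range slots {
		if l := commonPrefix(cached, prompt); l > longest {
			best, longest = i, l
		}
	}
	return best, longest
}
```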
e9188e971a Fix assert on small embedding inputs (#5491)
* Fix assert on small embedding inputs

* Update llm/patches/09-pooling.diff
2024-07-05 11:20:57 -04:00
02c24d3d01 Merge pull request #5466 from dhiltgen/fix_clip_unicode
Fix clip model loading with unicode paths
2024-07-05 08:16:58 -07:00
4d71c559b2 fix error detection by limiting model loading error parsing (#5472) 2024-07-03 20:04:30 -04:00