c9e6f0542d
Merge pull request #5069 from dhiltgen/ci_release
Implement custom github release action
2024-06-17 13:59:37 -07:00
152fc202f5
llm: update llama.cpp commit to 7c26775 (#4896)
* llm: update llama.cpp submodule to `7c26775`
* disable `LLAMA_BLAS` for now
* `-DLLAMA_OPENMP=off`
v0.1.45-rc2
2024-06-17 15:56:16 -04:00
4ad0d4d6d3
Fix a build warning (#5096)
Signed-off-by: Lei Jitang <leijitang@outlook.com>
2024-06-17 14:47:48 -04:00
163cd3e77c
gpu: add env var for detecting Intel oneapi gpus (#5076)
* gpu: add env var for detecting intel oneapi gpus
* fix build error
2024-06-16 20:09:05 -04:00
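A minimal sketch of the env-var gate this adds, assuming the variable is named OLLAMA_INTEL_GPU; the discovery hook is hypothetical, and the real detection code probes the oneAPI/Level Zero libraries.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// intelGPUEnabled reports whether Intel oneAPI GPU discovery should run.
// Discovery stays opt-in: it only proceeds when OLLAMA_INTEL_GPU parses
// as a truthy value ("1", "true", ...).
func intelGPUEnabled() bool {
	v := os.Getenv("OLLAMA_INTEL_GPU")
	if v == "" {
		return false
	}
	enabled, err := strconv.ParseBool(v)
	return err == nil && enabled
}

func main() {
	if intelGPUEnabled() {
		fmt.Println("attempting Intel oneAPI GPU discovery")
		// ... probe oneAPI / Level Zero libraries here (omitted) ...
	} else {
		fmt.Println("Intel oneAPI GPU discovery disabled")
	}
}
```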
4c2c8f93dd
Merge pull request #5080 from dhiltgen/debug_intel_crash
Add some more debugging logs for intel discovery
2024-06-16 14:42:41 -07:00
fd1e6e0590
Add some more debugging logs for intel discovery
Also removes an unused overall count variable
2024-06-16 07:42:52 -07:00
89c79bec8c
Add ModifiedAt Field to /api/show (#5033)
* Add Mod Time to Show
* Error Handling
2024-06-15 20:53:56 -07:00
c7b77004e3
docs: add missing powershell package to windows development instructions (#5075)
* docs: add missing instruction for powershell build
The PowerShell script for building Ollama on Windows now requires the `ThreadJob` module. Add this to the instructions and dependency list.
* Update development.md
2024-06-15 23:08:09 -04:00
07d143f412
Merge pull request #5058 from coolljt0725/fix_build_warning
gpu: Fix build warning
2024-06-15 11:52:36 -07:00
a12283e2ff
Implement custom github release action
This implements the release logic we want via the gh CLI
to support updating releases with rc tags in place while retaining
release notes and other community reactions.
2024-06-15 11:36:56 -07:00
4b0050cf0e
Merge pull request #5037 from dhiltgen/faster_win_build
More parallelism on windows generate
v0.1.45-rc1
2024-06-15 08:03:05 -07:00
0577af98f4
More parallelism on windows generate
Make the build faster
2024-06-15 07:44:55 -07:00
17ce203a26
Merge pull request #4875 from dhiltgen/rocm_gfx900_workaround
Rocm gfx900 workaround
2024-06-15 07:38:58 -07:00
d76555ffb5
Merge pull request #4874 from dhiltgen/rocm_v6_bump
Rocm v6 bump
2024-06-15 07:38:32 -07:00
2786dff5d3
Merge pull request #4264 from dhiltgen/show_gpu_visible_settings
Centralize GPU configuration vars
2024-06-15 07:33:52 -07:00
225f0d1219
gpu: Fix build warning
Signed-off-by: Lei Jitang <leijitang@outlook.com>
2024-06-15 14:26:23 +08:00
532db58311
Merge pull request #4972 from jayson-cloude/main
fix: "Skip searching for network devices"
2024-06-14 17:04:40 -07:00
6be309e1bd
Centralize GPU configuration vars
This should aid in troubleshooting by capturing and reporting the GPU
settings at startup in the logs along with all the other server settings.
2024-06-14 15:59:10 -07:00
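A sketch of that startup reporting, assuming a slog-based logger; the variable list is an illustrative subset, not the exact set envconfig captures.

```go
package main

import (
	"log/slog"
	"os"
)

// gpuEnvVars is an illustrative subset of the GPU settings worth
// reporting at startup for troubleshooting.
var gpuEnvVars = []string{
	"CUDA_VISIBLE_DEVICES",
	"HIP_VISIBLE_DEVICES",
	"ROCR_VISIBLE_DEVICES",
	"GPU_DEVICE_ORDINAL",
	"HSA_OVERRIDE_GFX_VERSION",
}

func main() {
	// One log line at startup captures the GPU environment a bug
	// report came from, alongside the other server settings.
	attrs := make([]any, 0, len(gpuEnvVars)*2)
	for _, k := range gpuEnvVars {
		attrs = append(attrs, k, os.Getenv(k))
	}
	slog.Info("gpu configuration", attrs...)
}
```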
da3bf23354
Workaround gfx900 SDMA bugs
Implement support for GPU env var workarounds, and leverage
this for the Vega RX 56, which needs
HSA_ENABLE_SDMA=0 set to work properly.
2024-06-14 15:38:13 -07:00
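The workaround mechanism boils down to setting a driver env var when a known-bad GPU is detected. A sketch, assuming discovery has already produced a gfx version string; the function name is hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// applyGPUWorkarounds sets driver environment variables for GPUs with
// known bugs. gfx900 parts hit SDMA bugs, so disable SDMA via
// HSA_ENABLE_SDMA=0 unless the user has already chosen a value.
func applyGPUWorkarounds(gfxVersion string) {
	if gfxVersion == "gfx900" {
		if _, set := os.LookupEnv("HSA_ENABLE_SDMA"); !set {
			os.Setenv("HSA_ENABLE_SDMA", "0")
		}
	}
}

func main() {
	applyGPUWorkarounds("gfx900")
	fmt.Println("HSA_ENABLE_SDMA =", os.Getenv("HSA_ENABLE_SDMA"))
}
```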
26ab67732b
Bump ROCm linux to 6.1.1
2024-06-14 15:37:54 -07:00
45cacbaf05
Merge pull request #4517 from dhiltgen/gpu_incremental
Enhanced GPU discovery and multi-gpu support with concurrency
2024-06-14 15:35:00 -07:00
17df6520c8
Remove mmap related output calc logic
2024-06-14 14:55:50 -07:00
6f351bf586
review comments and coverage
2024-06-14 14:55:50 -07:00
ff4f0cbd1d
Prevent multiple concurrent loads on the same gpus
While models are loading, the VRAM metrics are dynamic, so try
to load on a GPU that doesn't have a model actively loading, or wait,
to avoid races that lead to OOMs.
2024-06-14 14:51:40 -07:00
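The scheduling idea in miniature: a GPU with a load in flight has unstable VRAM readings, so claim a quiet GPU or block until one frees up. A toy sketch with a per-GPU loading flag; the real scheduler tracks far more state.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type gpu struct {
	id      int
	loading bool // a model load is in flight; VRAM readings are unstable
}

type scheduler struct {
	mu   sync.Mutex
	cond *sync.Cond
	gpus []*gpu
}

// acquire blocks until some GPU has no load in flight, then claims it.
// Loading onto a GPU mid-load would act on stale free-VRAM numbers
// and risk an OOM, so we wait instead.
func (s *scheduler) acquire() *gpu {
	s.mu.Lock()
	defer s.mu.Unlock()
	for {
		for _, g := range s.gpus {
			if !g.loading {
				g.loading = true
				return g
			}
		}
		s.cond.Wait()
	}
}

func (s *scheduler) release(g *gpu) {
	s.mu.Lock()
	g.loading = false
	s.mu.Unlock()
	s.cond.Broadcast()
}

func main() {
	s := &scheduler{gpus: []*gpu{{id: 0}, {id: 1}}}
	s.cond = sync.NewCond(&s.mu)
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			g := s.acquire()
			fmt.Printf("load %d on gpu %d\n", n, g.id)
			time.Sleep(10 * time.Millisecond) // simulated model load
			s.release(g)
		}(i)
	}
	wg.Wait()
}
```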
fc37c192ae
Refine CPU load behavior with system memory visibility
2024-06-14 14:51:40 -07:00
434dfe30c5
Reintroduce nvidia nvml library for windows
This library will give us the most reliable free VRAM reporting on Windows,
enabling concurrent model scheduling.
2024-06-14 14:51:40 -07:00
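For reference, the free-VRAM query through NVML looks roughly like this. Ollama loads nvml.dll dynamically rather than through bindings, so this sketch leans on the separate github.com/NVIDIA/go-nvml package purely for illustration.

```go
package main

import (
	"fmt"

	"github.com/NVIDIA/go-nvml/pkg/nvml"
)

func main() {
	// NVML reports actual free VRAM, which is what concurrent
	// scheduling needs; device totals alone are not enough.
	if ret := nvml.Init(); ret != nvml.SUCCESS {
		fmt.Println("nvml unavailable:", nvml.ErrorString(ret))
		return
	}
	defer nvml.Shutdown()

	count, ret := nvml.DeviceGetCount()
	if ret != nvml.SUCCESS {
		return
	}
	for i := 0; i < count; i++ {
		dev, ret := nvml.DeviceGetHandleByIndex(i)
		if ret != nvml.SUCCESS {
			continue
		}
		mem, ret := dev.GetMemoryInfo()
		if ret != nvml.SUCCESS {
			continue
		}
		fmt.Printf("gpu %d: %d bytes free of %d\n", i, mem.Free, mem.Total)
	}
}
```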
4e2b7e181d
Refactor intel gpu discovery
2024-06-14 14:51:40 -07:00
48702dd149
Harden unload for empty runners
2024-06-14 14:51:40 -07:00
68dfc6236a
refined test timing
Adjust timing on some tests so they don't time out on small/slow GPUs.
2024-06-14 14:51:40 -07:00
5e8ff556cb
Support forced spreading for multi GPU
Our default behavior today is to try to fit the model into a single GPU if possible.
Some users would prefer the old behavior of always spreading across
multiple GPUs even if the model can fit into one. This exposes that
behavior as a tunable.
2024-06-14 14:51:40 -07:00
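A sketch of the tunable, assuming it is exposed as OLLAMA_SCHED_SPREAD: when set, skip the fits-on-one-GPU fast path and hand back every GPU.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

type gpuInfo struct {
	id       int
	freeVRAM uint64 // bytes
}

// pickGPUs chooses which GPUs to load on. Default: prefer a single GPU
// that can hold the whole model. With spreading forced, always return
// the full set so layers are distributed across every GPU.
func pickGPUs(gpus []gpuInfo, modelSize uint64) []gpuInfo {
	if spread, _ := strconv.ParseBool(os.Getenv("OLLAMA_SCHED_SPREAD")); !spread {
		for _, g := range gpus {
			if g.freeVRAM >= modelSize {
				return []gpuInfo{g} // fits on one GPU
			}
		}
	}
	return gpus // spread across all GPUs
}

func main() {
	gpus := []gpuInfo{{0, 24 << 30}, {1, 24 << 30}}
	fmt.Println(pickGPUs(gpus, 8<<30))
}
```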
6fd04ca922
Improve multi-gpu handling at the limit
Still not complete; our prediction needs refinement to understand each
discrete GPU's available space so we can see how many layers fit in each one.
Since we can't split one layer across multiple GPUs, we can't treat free space
as one logical block.
2024-06-14 14:51:40 -07:00
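The constraint driving this: a layer is indivisible, so each GPU's free space has to be converted to whole layers rather than pooled as bytes. A small sketch, assuming a uniform per-layer size.

```go
package main

import "fmt"

// layersThatFit computes how many whole layers fit across discrete GPUs.
// Free space cannot be pooled: a layer cannot straddle two GPUs, so each
// GPU contributes floor(free/layerSize) layers, not its raw byte count.
func layersThatFit(freeVRAM []uint64, layerSize uint64) int {
	total := 0
	for _, free := range freeVRAM {
		total += int(free / layerSize)
	}
	return total
}

func main() {
	free := []uint64{5 << 30, 5 << 30} // two GPUs, 5 GiB free each
	layer := uint64(2 << 30)           // 2 GiB per layer (assumed uniform)
	// Pooling bytes would suggest 5 layers (10/2), but per-GPU fitting
	// yields only 4: each GPU holds 2 whole layers and strands 1 GiB.
	fmt.Println(layersThatFit(free, layer))
}
```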
206797bda4
Fix concurrency integration test to work locally
This worked remotely but wound up trying to spawn multiple servers
locally, which doesn't work.
2024-06-14 14:51:40 -07:00
43ed358f9a
Refine GPU discovery to bootstrap once
Now that we call the GPU discovery routines many times to
update memory, this splits initial discovery from free memory
updating.
2024-06-14 14:51:40 -07:00
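A sketch of that split using sync.Once: the expensive bootstrap (loading vendor libraries, enumerating devices) runs exactly once, while the cheap free-memory refresh runs on every call. Names here are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

var bootstrapOnce sync.Once

// discoverGPUs is called on every scheduling pass. The expensive part
// (loading vendor libraries, enumerating devices) runs exactly once;
// only the free-VRAM refresh repeats.
func discoverGPUs() []uint64 {
	bootstrapOnce.Do(func() {
		fmt.Println("bootstrapping: loading GPU libraries, enumerating devices")
	})
	return refreshFreeMemory()
}

// refreshFreeMemory re-reads only the volatile free-VRAM values.
func refreshFreeMemory() []uint64 {
	return []uint64{20 << 30, 22 << 30} // placeholder readings
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println(discoverGPUs()) // bootstrap message prints only once
	}
}
```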
b32ebb4f29
Use DRM driver for VRAM info for amd
The amdgpu driver's free VRAM reporting omits some other apps, so leverage the
upstream DRM driver, which keeps better tabs on things.
2024-06-14 14:51:40 -07:00
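On Linux the amdgpu DRM driver exposes these counters in sysfs as mem_info_vram_total and mem_info_vram_used under /sys/class/drm/cardN/device. A sketch of reading them; error handling trimmed to the essentials.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// drmVRAM reads total and used VRAM for a card from the amdgpu DRM
// driver's sysfs counters, which account for usage by all processes.
func drmVRAM(card string) (total, used uint64, err error) {
	base := "/sys/class/drm/" + card + "/device/"
	read := func(name string) (uint64, error) {
		b, err := os.ReadFile(base + name)
		if err != nil {
			return 0, err
		}
		return strconv.ParseUint(strings.TrimSpace(string(b)), 10, 64)
	}
	if total, err = read("mem_info_vram_total"); err != nil {
		return
	}
	used, err = read("mem_info_vram_used")
	return
}

func main() {
	total, used, err := drmVRAM("card0")
	if err != nil {
		fmt.Println("no amdgpu DRM info:", err)
		return
	}
	fmt.Printf("vram: %d free of %d bytes\n", total-used, total)
}
```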
fb9cdfa723
Fix server.cpp for the new cuda build macros
2024-06-14 14:51:40 -07:00
efac488675
Revert "Limit GPU lib search for now (#4777)"
This reverts commit 476fb8e892.
2024-06-14 14:51:40 -07:00
6b800aa7b7
openai: do not set temperature to 0 when setting seed (#5045)
2024-06-14 13:43:56 -07:00
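The behavior in miniature, with hypothetical request and option names: seed and temperature map independently, so pinning a seed no longer silently forces greedy (temperature 0) decoding.

```go
package main

import "fmt"

type openaiRequest struct {
	Seed        *int
	Temperature *float64
}

// toOptions maps OpenAI-style fields to runner options. Setting a seed
// no longer implies temperature 0; the two options are independent.
func toOptions(r openaiRequest) map[string]any {
	opts := map[string]any{}
	if r.Seed != nil {
		opts["seed"] = *r.Seed
	}
	if r.Temperature != nil {
		opts["temperature"] = *r.Temperature
	}
	return opts
}

func main() {
	seed := 42
	fmt.Println(toOptions(openaiRequest{Seed: &seed})) // map[seed:42]; temperature untouched
}
```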
dd7c9ebeaf
server: longer timeout in TestRequests (#5046)
2024-06-14 09:48:25 -07:00
4dc7fb9525
update 40xx gpu compat matrix (#5036)
2024-06-13 17:10:33 -07:00
c39761c552
Merge pull request #5032 from dhiltgen/actually_skip
Actually skip PhysX on windows
v0.1.44
2024-06-13 13:26:09 -07:00
aac367636d
Actually skip PhysX on windows
2024-06-13 13:17:19 -07:00
15a687ae4b
Merge pull request #5031 from ollama/mxyng/fix-multibyte-utf16
fix: multibyte utf16
2024-06-13 13:14:55 -07:00
d528e1af75
fix utf16 for multibyte runes
2024-06-13 13:07:42 -07:00
cd234ce22c
parser: add test for multibyte runes
2024-06-13 13:07:42 -07:00
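What "multibyte" means for UTF-16 here: runes outside the Basic Multilingual Plane are encoded as surrogate pairs, so a decoder that treats every two bytes as a standalone character corrupts them. An illustrative sketch with the standard unicode/utf16 package, not the parser's actual code.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16LE converts little-endian UTF-16 bytes to a Go string,
// combining surrogate pairs (e.g. emoji) into single runes instead of
// leaving replacement characters behind.
func decodeUTF16LE(b []byte) string {
	u16 := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		u16 = append(u16, binary.LittleEndian.Uint16(b[i:]))
	}
	return string(utf16.Decode(u16))
}

func main() {
	// "🦙" (U+1F999) encodes as the surrogate pair D83E DD99 in UTF-16LE.
	b := []byte{0x3E, 0xD8, 0x99, 0xDD}
	fmt.Println(decodeUTF16LE(b)) // 🦙
}
```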
94618b2365
add OLLAMA_MODELS to envconfig (#5029)
2024-06-13 12:52:03 -07:00
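This commit and the OLLAMA_HOST move below follow the same envconfig pattern: read each setting from the environment in one place, with a default. A sketch assuming the conventional ~/.ollama/models default; not the exact envconfig code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// modelsDir returns where models are stored: OLLAMA_MODELS if set,
// otherwise a default under the user's home directory.
func modelsDir() string {
	if dir := os.Getenv("OLLAMA_MODELS"); dir != "" {
		return dir
	}
	home, err := os.UserHomeDir()
	if err != nil {
		return ".ollama/models" // fall back to a relative path
	}
	return filepath.Join(home, ".ollama", "models")
}

func main() {
	fmt.Println("models:", modelsDir())
}
```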
1fd236d177
server: remove jwt decoding error (#5027)
2024-06-13 11:21:15 -07:00
e87fc7200d
Merge pull request #5025 from ollama/mxyng/revert-parser-scan
Revert "proper utf16 support"
2024-06-13 10:31:25 -07:00
20b9f8e6f4
Revert "proper utf16 support"
This reverts commit 66ab48772f.
This change broke UTF-8 scanning of multi-byte runes.
2024-06-13 10:22:16 -07:00
c69bc19e46
move OLLAMA_HOST to envconfig (#5009)
2024-06-12 18:48:16 -04:00
bba5d177aa
Merge pull request #5004 from ollama/mxyng/fix-templates
fix: multiple templates when creating from model
2024-06-12 14:39:29 -07:00