Commit Graph

3014 Commits

d88582dffd some changes for llama3 2024-05-20 16:13:57 -07:00
2f81b3dce2 Merge pull request #4502 from ollama/mxyng/fix-quantize 2024-05-20 16:09:27 -07:00
    fix quantize file types
5cab13739e set llama.cpp submodule commit to 614d3b9 2024-05-20 15:28:17 -07:00
8aadad9c72 updated updateURL 2024-05-20 15:24:32 -07:00
807d092761 fix quantize file types 2024-05-20 15:22:11 -07:00
f36f1d6be9 tidy intermediate blobs 2024-05-20 15:15:06 -07:00
8800c8a59b chore: fix typo in docs (#4536) 2024-05-20 14:19:03 -07:00
b4dce13309 Merge pull request #4330 from ollama/mxyng/cache-intermediate-layers 2024-05-20 13:54:41 -07:00
    cache and reuse intermediate blobs
Sam
e15307fdf4 feat: add support for flash_attn (#4120) 2024-05-20 13:36:03 -07:00
    * feat: enable flash attention if supported
    * feat: add flash_attn support
3520c0e4d5 cache and reuse intermediate blobs 2024-05-20 13:25:10 -07:00
    particularly useful for zipfiles and f16s
ccdf0b2a44 Move the parser back + handle utf16 files (#4533) 2024-05-20 11:26:45 -07:00
63a453554d go mod tidy 2024-05-19 23:03:57 -07:00
105186aa17 add OLLAMA_NOHISTORY to turn off history in interactive mode (#4508) 2024-05-18 11:51:57 -07:00
ba04afc9a4 Merge pull request #4483 from dhiltgen/clean_exit 2024-05-17 11:41:57 -07:00
    Don't return error on signal exit
7e1e0086e7 Merge pull request #4482 from dhiltgen/integration_improvements 2024-05-16 16:43:48 -07:00
    Skip max queue test on remote
02b31c9dc8 Don't return error on signal exit 2024-05-16 16:25:38 -07:00
7f2fbad736 Skip max queue test on remote 2024-05-16 16:24:18 -07:00
    This test needs to adjust the queue size down from our default setting
    to be reliable, so it skips when running in remote test execution mode.
5bece94509 Merge pull request #4463 from ollama/jyan/line-display 2024-05-16 14:15:08 -07:00
    changed line display to be calculated with runewidth
3d90156e99 removed comment 2024-05-16 14:12:03 -07:00
5e46c5c435 Updating software for read me (#4467) 2024-05-16 13:55:14 -07:00
    * Update README.md: added chat/moderation bot to list of software.
    * Update README.md: fixed link error.
583c1f472c update llama.cpp submodule to 614d3b9 (#4414) 2024-05-16 13:53:09 -07:00
26bfc1c443 go fmt'd cmd.go 2024-05-15 17:26:39 -07:00
799aa9883c go fmt'd cmd.go 2024-05-15 17:24:17 -07:00
84ed77cbd8 Merge pull request #4436 from ollama/mxyng/done-part 2024-05-15 17:16:24 -07:00
    return on part done
c9e584fb90 updated double-width display 2024-05-15 16:45:24 -07:00
17b1e81ca1 fixed width and word count for double spacing 2024-05-15 16:29:33 -07:00
7e9a2da097 Merge pull request #4462 from dhiltgen/opt_out_build 2024-05-15 16:27:47 -07:00
    Port cuda/rocm skip build vars to linux
c48c1d7c46 Port cuda/rocm skip build vars to linux 2024-05-15 15:56:43 -07:00
    Windows already implements these; carry them over to linux.
d1692fd3e0 fix the cpu estimatedTotal memory + get the expiry time for loading models (#4461) (tag: v0.1.38) 2024-05-15 15:43:16 -07:00
5fa36a0833 Merge pull request #4459 from dhiltgen/sanitize_env_log 2024-05-15 14:58:55 -07:00
    Sanitize the env var debug log
853ae490e1 Sanitize the env var debug log 2024-05-15 14:42:57 -07:00
    Only dump env vars we care about in the logs.
f2cf97d6f1 fix typo in modelfile generation (#4439) 2024-05-14 15:34:29 -07:00
c344da4c5a fix keepalive for non-interactive mode (#4438) 2024-05-14 15:17:04 -07:00
85a57006d1 check if name exists before create/pull/copy 2024-05-14 14:58:58 -07:00
c5e892cb3e update tests 2024-05-14 14:56:31 -07:00
81fb06f530 more resilient Manifests 2024-05-14 14:08:24 -07:00
a385382ff5 filepath.Join 2024-05-14 14:08:24 -07:00
b8772a353f remove DeleteModel 2024-05-14 14:08:24 -07:00
c2714fcbfd routes: use Manifests for ListHandler 2024-05-14 14:08:24 -07:00
a2fc933fed update delete handler to use model.Name 2024-05-14 14:08:24 -07:00
0e331c7168 Merge pull request #4328 from ollama/mxyng/mem 2024-05-14 13:47:44 -07:00
    count memory up to NumGPU if set by user
ac145f75ca return on part done 2024-05-14 13:04:30 -07:00
a4b8d1f89a re-add system context (#4435) 2024-05-14 11:38:20 -07:00
798b107f19 Fixed the API endpoint /api/tags when the model list is empty. (#4424) 2024-05-14 11:18:10 -07:00
    * Fixed the API endpoint /api/tags to return {models: []} instead of {models: null} when the model list is empty.
    * Update server/routes.go
    Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
6a1b471365 Merge pull request #4430 from dhiltgen/gpu_info 2024-05-14 10:59:06 -07:00
    Remove VRAM convergence check for windows
ec231a7923 Remove VRAM convergence check for windows 2024-05-14 09:53:46 -07:00
    The APIs we query are optimistic about free space, and Windows pages
    VRAM, so we don't have to wait for reported usage to recover on unload.
7ca71a6b0f don't abort when an invalid model name is used in /save (#4416) 2024-05-13 18:48:28 -07:00
7607e6e902 Merge pull request #4379 from WolfTheDeveloper/main 2024-05-13 18:08:32 -07:00
    Update `LlamaScript` to point to new link from Legacy link.
f1548ef62d update the FAQ to be more clear about windows env variables (#4415) 2024-05-13 18:01:13 -07:00
6845988807 Ollama ps command for showing currently loaded models (#4327) 2024-05-13 17:17:36 -07:00