Commit Graph

3014 Commits

Author SHA1 Message Date
26a00a0410 use ffi for tokenizing/detokenizing 2024-05-29 11:26:47 -07:00
646371f56d Merge pull request #3278 from zhewang1-intc/rebase_ollama_main
Enabling ollama to run on Intel GPUs with SYCL backend
2024-05-28 16:30:50 -07:00
1f5008544b Update install.sh 2024-05-28 15:01:22 -07:00
45cbfc5aee fix wsl2 status check for nvidia cards (#4689) 2024-05-28 14:49:46 -07:00
6d423b383b Improve install experience on WSL2 and Linux (#4653) 2024-05-28 14:41:50 -07:00
ad897080a2 working on integration of multi-byte and multi-width runes (#4549)
* integrated runewidth for display management - fixed cursor movement for mutli-width char

* updated input and deletion of multi-byte chars

* fixed line history with some exceptions

* improved insert and add

* fixed issues with moving across lines

* end of line extra space tracking'

* saved changes

* fixed end of line issues with empty spaces

* worked some more

* worked on end of line

* fixed failed test

* fixed minor inserting bug

* fixed movement hotkeys

* adjusted hotkeys

* removed comments

* Update readline/buffer.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update readline/buffer.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update readline/buffer.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update readline/buffer.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update readline/buffer.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update readline/buffer.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update readline/buffer.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update readline/buffer.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update readline/buffer.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update readline/buffer.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update readline/buffer.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update readline/buffer.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* deleted comments and duplicate code

* removed duplicate code

* added comments, refactored add function to use addChar

* added helper to retrieve lineSpacing, renamed lineFlags for clarity

* fixed remove()

---------

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
v0.1.39
2024-05-28 12:04:03 -07:00
b7d316d98d fix nvidia detection in install script (#4683) 2024-05-28 09:59:36 -07:00
d7339fad52 Merge pull request #4682 from dhiltgen/more_time
Give the final model loading more time
2024-05-28 09:36:02 -07:00
92c81e8117 Give the final model loading more time
On some systems, 1 minute isn't sufficient to finish the load after it
hits 100% This creates 2 distinct timers, although they're both set to
the same value for now so we can refine the timeouts further.
2024-05-28 09:08:10 -07:00
Tai
9db0996ed4 Add OllamaSpring Project to Readme (#4672)
* Add OllamaSpring Project to Readme

* Update README.md

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-27 19:58:26 -07:00
6f43898b17 Adds olpaka flutter client (#4647)
* Adds olpaka flutter client

* Update README.md

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-27 17:22:01 -07:00
7487229c34 llm/server.go: Fix 2 minor typos (#4661)
Signed-off-by: Lei Jitang <leijitang@outlook.com>
2024-05-27 17:21:10 -07:00
8a8e7afa96 small fix on examples/python-simplechat/client.py to actually get a streamed response and get tokens printed as we receive it (#4671) 2024-05-27 17:19:20 -07:00
c79f8c9c39 Ensure nvidia and nvidia_uvm kernel modules are loaded in install.sh script and at startup (#4652)
* ensure kernel modules are loaded in `install.sh` script and at startup

* indentation

* use `SUDO` variable

* restart if nouveau is detected

* consistent success message for AMD
2024-05-26 14:57:17 -07:00
485016bfbb Update install.sh 2024-05-26 11:46:00 -07:00
0165ba1651 Merge pull request #4638 from dhiltgen/better_error
Report better warning on client closed abort of load
2024-05-25 14:32:28 -07:00
c4209d6d21 Report better warning on client closed abort of load
If the client closes the connection before we finish loading the model
we abort, so lets make the log message clearer why to help users
understand this failure mode
2024-05-25 09:23:28 -07:00
6adca97f37 Merge pull request #4619 from noxer/patch-1
Fix download retry issue
2024-05-24 17:21:57 -07:00
9a3c8003c8 Merge pull request #4624 from ollama/mxyng/fix-5
fix q5_0, q5_1
2024-05-24 16:11:21 -07:00
d51f15257c Update llm/ggml.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-05-24 16:10:43 -07:00
8f440d579a fix q5_0, q5_1 2024-05-24 16:01:46 -07:00
4cc3be3035 Move envconfig and consolidate env vars (#4608) 2024-05-24 14:57:15 -07:00
db2ffa79f1 Fix download retry issue 2024-05-24 20:30:42 +02:00
afd2b058b4 set codesign timeout to longer (#4605) 2024-05-23 22:46:23 -07:00
fd5971be0b support ollama run on Intel GPUs 2024-05-24 11:18:27 +08:00
89bf98bcf2 Merge pull request #4598 from dhiltgen/docs
Tidy up developer guide a little
2024-05-23 15:14:29 -07:00
1b2d156094 Tidy up developer guide a little 2024-05-23 15:14:05 -07:00
714adb8bd1 bump (#4597) 2024-05-23 14:16:26 -07:00
95b1133d0c Merge pull request #4547 from dhiltgen/load_progress
Wire up load progress
2024-05-23 14:06:02 -07:00
b37b496a12 Wire up load progress
This doesn't expose a UX yet, but wires the initial server portion
of progress reporting during load
2024-05-23 13:36:48 -07:00
d6f692ad1a Add support for IQ1_S, IQ3_S, IQ2_S, IQ4_XS. IQ4_NL (#4322)
Co-authored-by: ManniX-ITA <20623405+mann1x@users.noreply.github.com>
2024-05-23 13:21:49 -07:00
f77713bf1f Add isolated gpu test to troubleshooting 2024-05-23 09:33:25 -07:00
38255d2af1 Use flash attention flag for now (#4580)
* put flash attention behind flag for now

* add test

* remove print

* up timeout for sheduler tests
v0.1.39-rc2
2024-05-22 21:52:09 -07:00
73630a7e85 add phi 3 medium (#4578) 2024-05-22 12:53:45 -04:00
955c317cab chore: update tokenizer.go (#4571)
PreTokenziers -> PreTokenizers
2024-05-22 00:25:23 -07:00
9f18b88a06 Merge pull request #4566 from ollama/jyan/shortcuts
add Ctrl + W shortcut
2024-05-21 22:49:36 -07:00
353f83a9c7 add Ctrl + W shortcut 2024-05-21 16:55:09 -07:00
3bade04e10 doc updates for the faq/troubleshooting (#4565) 2024-05-21 15:30:09 -07:00
a6d0f443eb Merge pull request #4543 from ollama/mxyng/simple-safetensors
simplify safetensors reading
v0.1.39-rc1
2024-05-21 14:43:55 -07:00
96236b7968 Merge pull request #4268 from ollama/pdevine/llama3
Convert directly from llama3
2024-05-21 14:43:37 -07:00
4434d7f447 Correct typo in error message (#4535)
The spelling of the term "request" has been corrected, which was previously mistakenly written as "requeset" in the error log message.
2024-05-21 13:39:01 -07:00
171eb040fc simplify safetensors reading 2024-05-21 11:28:22 -07:00
3591bbe56f add test 2024-05-21 11:28:22 -07:00
34d5ef29b3 fix conversion for f16 or f32 inputs 2024-05-21 11:28:22 -07:00
bbbd9f20f3 cleanup 2024-05-20 16:13:57 -07:00
547132e820 bpe pretokenizer 2024-05-20 16:13:57 -07:00
2d315ba9a9 add missing file 2024-05-20 16:13:57 -07:00
d355d2020f add fixes for llama 2024-05-20 16:13:57 -07:00
c8cf0d94ed llama3 conversion 2024-05-20 16:13:57 -07:00
4730762e5c add safetensors version 2024-05-20 16:13:57 -07:00