Bruce MacDonald
|
d40497b9a2
|
Add support for IQ1_S, IQ3_S, IQ2_S, IQ4_XS, IQ4_NL
Co-authored-by: ManniX-ITA <20623405+mann1x@users.noreply.github.com>
|
2024-05-03 14:51:07 -07:00 |
|
Michael Yang
|
828e4bf101
|
s/DisplayLongest/String/
|
2024-05-03 13:18:28 -07:00 |
|
Michael Yang
|
05105903d8
|
only quantize language models
|
2024-05-03 13:18:28 -07:00 |
|
Michael Yang
|
abf3b1fb34
|
no iterator
|
2024-05-03 13:18:28 -07:00 |
|
Michael Yang
|
82fcc0601d
|
rebase
|
2024-05-03 13:18:28 -07:00 |
|
Michael Yang
|
185a927210
|
comments
|
2024-05-03 13:18:28 -07:00 |
|
Michael Yang
|
096ea2c8c3
|
update tests
|
2024-05-03 13:18:28 -07:00 |
|
Michael Yang
|
06b31e2e24
|
quantize any fp16/fp32 model
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}
|
2024-05-03 13:18:28 -07:00 |
|
Michael Yang
|
b7a87a22b6
|
Merge pull request #4059 from ollama/mxyng/parser-2
rename parser to model/file
|
2024-05-03 13:01:22 -07:00 |
|
Dr Nic Williams
|
e8aaea030e
|
Update 'llama2' -> 'llama3' in most places (#4116)
* Update 'llama2' -> 'llama3' in most places
---------
Co-authored-by: Patrick Devine <patrick@infrahq.com>
|
2024-05-03 15:25:04 -04:00 |
|
Daniel Hiltgen
|
267e25a750
|
Merge pull request #4129 from dhiltgen/unit_tests
Soften timeouts on sched unit tests
|
2024-05-03 11:10:26 -07:00 |
|
Daniel Hiltgen
|
9a32c514cb
|
Soften timeouts on sched unit tests
This gives us more headroom on the scheduler tests to tamp
down some flakes.
|
2024-05-03 09:08:33 -07:00 |
|
Michael Yang
|
e9ae607ece
|
Merge pull request #3892 from ollama/mxyng/parser
refactor modelfile parser
|
2024-05-02 17:04:47 -07:00 |
|
Michael Yang
|
93707fa3f2
|
Merge pull request #4108 from ollama/mxyng/lf
fix line ending
|
2024-05-02 14:55:15 -07:00 |
|
Michael Yang
|
94c369095f
|
fix line ending
replace CRLF with LF
|
2024-05-02 14:53:13 -07:00 |
|
Jeffrey Morgan
|
9164b0161b
|
Update .gitattributes
v0.1.33
|
2024-05-02 14:06:31 -04:00 |
|
Bryce Reitano
|
bf4fc25f7b
|
Add a /clear command (#3947)
* Add a /clear command
* change help messages
---------
Co-authored-by: Patrick Devine <patrick@infrahq.com>
|
2024-05-01 17:44:36 -04:00 |
|
Michael Yang
|
5b806d8d24
|
Merge pull request #4089 from ollama/mxyng/target-invalid
server: destination invalid
v0.1.33-rc7
|
2024-05-01 12:46:35 -07:00 |
|
Michael Yang
|
cb1e072643
|
Merge pull request #4087 from ollama/mxyng/fix-host-port
types/model: fix name for hostport
|
2024-05-01 12:42:07 -07:00 |
|
Michael Yang
|
45b6a12e45
|
server: target invalid
|
2024-05-01 12:40:45 -07:00 |
|
alwqx
|
68755f1f5e
|
chore: fix typo in docs/development.md (#4073)
|
2024-05-01 15:39:11 -04:00 |
|
Michael Yang
|
997a455039
|
want filepath
|
2024-05-01 12:33:41 -07:00 |
|
Michael Yang
|
88775e1ff9
|
strip scheme from name
|
2024-05-01 12:26:19 -07:00 |
|
Michael Yang
|
8867e744ff
|
types/model: fix name for hostport
|
2024-05-01 12:14:53 -07:00 |
|
Daniel Hiltgen
|
4fd064bea6
|
Merge pull request #4031 from MarkWard0110/fix/issue-3736
Fix/issue 3736: When runners are closing or expiring, the scheduler gets dirty VRAM size readings.
|
2024-05-01 12:13:26 -07:00 |
|
Jeffrey Morgan
|
59fbceedcc
|
use lf for line endings (#4085)
|
2024-05-01 15:02:45 -04:00 |
|
Mark Ward
|
321d57e1a0
|
Removing go routine calling .wait from load.
|
2024-05-01 18:51:10 +00:00 |
|
Mark Ward
|
ba26c7aa00
|
it will always return an error due to Kill() discarding Wait() errors
|
2024-05-01 18:51:10 +00:00 |
|
Mark Ward
|
63c763685f
|
log when waiting for the process to stop, to help debug when other tasks execute during this wait.
expire timer: clear the timer reference because it will not be reused.
close will clean up expireTimer if the calling code has not already done so.
|
2024-05-01 18:51:10 +00:00 |
|
Mark Ward
|
34a4a94f13
|
ignore debug bin files
|
2024-05-01 18:51:10 +00:00 |
|
Mark Ward
|
f4a73d57a4
|
fix runner expiring during active use. Clear the expire timer while the runner is in use, and allow finish to assign an expire timer so that the runner expires after it goes unused.
|
2024-05-01 18:51:10 +00:00 |
|
Mark Ward
|
948114e3e3
|
fix sched to wait for the runner to terminate so that the following VRAM check is more accurate
|
2024-05-01 18:51:10 +00:00 |
|
Arpit Jain
|
a3e60d9058
|
README.md: fix typos (#4007)
Co-authored-by: Blake Mizerany <blake.mizerany@gmail.com>
|
2024-05-01 10:39:38 -07:00 |
|
Michael Yang
|
8acb233668
|
use strings.Builder
|
2024-05-01 10:01:09 -07:00 |
|
Michael Yang
|
119589fcb3
|
rename parser to model/file
|
2024-05-01 09:53:50 -07:00 |
|
Michael Yang
|
5ea844964e
|
cmd: import regexp
|
2024-05-01 09:53:45 -07:00 |
|
Michael Yang
|
bd8eed57fc
|
fix parser name
|
2024-05-01 09:52:54 -07:00 |
|
Michael Yang
|
9cf0f2e973
|
use parser.Format instead of templating modelfile
|
2024-05-01 09:52:54 -07:00 |
|
Michael Yang
|
176ad3aa6e
|
parser: add commands format
|
2024-05-01 09:52:54 -07:00 |
|
Michael Yang
|
4d08363580
|
comments
|
2024-05-01 09:52:54 -07:00 |
|
Michael Yang
|
8907bf51d2
|
fix multiline
|
2024-05-01 09:52:54 -07:00 |
|
Michael Yang
|
abe614c705
|
tests
|
2024-05-01 09:52:54 -07:00 |
|
Michael Yang
|
238715037d
|
linting
|
2024-05-01 09:52:54 -07:00 |
|
Michael Yang
|
c0a00f68ae
|
refactor modelfile parser
|
2024-05-01 09:52:54 -07:00 |
|
Jeffrey Morgan
|
f0c454ab57
|
gpu: add 512MiB to darwin minimum, metal doesn't have partial offloading overhead (#4068)
v0.1.33-rc6
|
2024-05-01 11:46:03 -04:00 |
|
Blake Mizerany
|
b9f74ff3d6
|
types/model: reintroduce Digest (#4065)
|
2024-04-30 16:38:03 -07:00 |
|
jmorganca
|
fcf4d60eee
|
llm: add back check for empty token cache
|
2024-04-30 17:38:44 -04:00 |
|
jmorganca
|
e33d5c2dbc
|
update llama.cpp commit to 952d03d
|
2024-04-30 17:31:20 -04:00 |
|
Jeffrey Morgan
|
18d9a7e1f1
|
update llama.cpp submodule to f364eb6 (#4060)
|
2024-04-30 17:25:39 -04:00 |
|
Michael
|
8488388cbd
|
Update README.md
|
2024-04-30 15:45:56 -04:00 |
|