Commit Graph

15 Commits

dddb58a38b Merge pull request #5051 from ollama/mxyng/capabilities
add model capabilities
2024-07-02 14:26:07 -07:00
88bcd79bb9 err on insecure path 2024-07-01 15:55:59 -07:00
58e3fff311 rename templates to template 2024-07-01 10:40:54 -07:00
123a722a6f zip: prevent extracting files into parent dirs (#5314) 2024-06-26 21:38:21 -07:00
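The fix above is the classic "zip slip" defence: resolve each archive entry name against the destination directory and reject anything that escapes it. A minimal Go sketch of that check, assuming a hypothetical `safeJoin` helper (not ollama's exact code):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// safeJoin joins an archive entry name onto the extraction directory
// and rejects names that would escape it (e.g. "../../etc/passwd").
// safeJoin is an invented name for illustration, not ollama's code.
func safeJoin(dest, name string) (string, error) {
	p := filepath.Join(dest, name) // Join also runs filepath.Clean on the result
	if p != dest && !strings.HasPrefix(p, dest+string(filepath.Separator)) {
		return "", fmt.Errorf("%s: entry escapes extraction dir", name)
	}
	return p, nil
}

func main() {
	for _, name := range []string{"model/weights.bin", "../outside.txt"} {
		if p, err := safeJoin("/tmp/extract", name); err != nil {
			fmt.Println("rejected:", name)
		} else {
			fmt.Println("ok:", p)
		}
	}
}
```

Because `filepath.Join` cleans the path, traversal sequences like `..` are resolved before the prefix check, so an entry cannot smuggle itself past the test with redundant separators.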
cb42e607c5 llm: speed up gguf decoding by a lot (#5246)
Previously, several costly operations made loading GGUF files and
decoding their metadata and tensor information very slow:

  * Too many allocations when decoding strings
  * Hitting disk for each read of each key and value, resulting in an
    excessive number of syscalls and disk I/O.

The show API is now down to 33ms from 800ms+ for llama3 on an M3
MacBook Pro.

This commit also allows skipping the collection of large arrays of
values when decoding GGUFs. When such a key is encountered, its value
is null and is encoded as such in JSON.

Also, this fixes a broken test that was not encoding valid GGUF.
2024-06-24 21:47:52 -07:00
c16f8af911 fix: multiple templates when creating from model
Multiple templates may appear in a model if it is created from another
model that 1) has an autodetected template and 2) defines a custom
template.
2024-06-12 13:35:49 -07:00
d61ef8b954 update create handler to use model.Name 2024-06-04 13:28:25 -07:00
e40145a39d lint 2024-06-04 11:13:30 -07:00
f36f1d6be9 tidy intermediate blobs 2024-05-20 15:15:06 -07:00
3520c0e4d5 cache and reuse intermediate blobs
Particularly useful for zip files and f16 blobs.
2024-05-20 13:25:10 -07:00
b2f00aa977 close zip files 2024-05-06 15:27:19 -07:00
f5e8b207fb s/DisplayLongest/String/ 2024-05-06 15:24:01 -07:00
4d0d0fa383 no iterator 2024-05-06 15:24:01 -07:00
01811c176a comments 2024-05-06 15:24:01 -07:00
9685c34509 quantize any fp16/fp32 model
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}
2024-05-06 15:24:01 -07:00