This commit replaces the old pull implementation in the server package
with the new, faster, more robust pull implementation in the registry
package.
The new endpoint, and now the remove endpoint too, are behind the
feature gate "client2", which is enabled only by setting the
OLLAMA_EXPERIMENT environment variable to include "client2".
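As a rough illustration of such a gate, here is a minimal sketch. It
assumes OLLAMA_EXPERIMENT holds a comma-separated list of experiment
names; that format and the helper name experimentEnabled are assumptions
for this example, not the actual implementation.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// experimentEnabled reports whether name appears as a whole entry in value.
// The comma-separated format is an assumption made for this sketch.
func experimentEnabled(value, name string) bool {
	for _, field := range strings.Split(value, ",") {
		if strings.TrimSpace(field) == name {
			return true
		}
	}
	return false
}

func main() {
	v := os.Getenv("OLLAMA_EXPERIMENT")
	fmt.Println("client2 enabled:", experimentEnabled(v, "client2"))
}
```

Matching whole entries rather than substrings keeps a value such as
"client22" from enabling the gate by accident.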
Currently, the progress indication is wired to behave the same as in the
previous implementation to avoid making changes to the CLI. Because
status reports happen only at the start of the download and at the end
of the write to disk, the progress indication is not as smooth as it
could be. This is a known issue and will be addressed in a future change.
This implementation may be ~0.5-1.0% slower in rare cases, depending on
network and disk speed, but is generally MUCH faster and more robust
than its predecessor in all other cases.
This reintroduces aggressive pruning on model deletion as a temporary
measure until a more controlled garbage collection (GC) mechanism is
implemented.
Issues with the current approach:
1. Users may accidentally delete a model (`ollama rm llama3.3` instead
of `ollama rm llama3.2`), requiring a full re-download unless another
model references the same blobs.
2. Users may assume a deleted model is still referenced elsewhere, but
due to prior updates or deletions, the references no longer exist,
leading to unnecessary re-downloads.
Soon, we should implement a structured GC mechanism to retain
unreferenced blobs for a configurable period before removal, which will
run on "ollama rm" and other commands we deem appropriate.
Users that want to immediately remove unreferenced blobs can use a new
prune command that will allow them to specify the age and class of blobs
to remove.
Example usage:
    # Run basic blob GC
    $ ollama prune
    # Remove unreferenced blobs older than 7 days
    $ ollama prune --age 7d
    # Remove all blobs, referenced or not, older than 7 days (and their manifests?)
    $ ollama prune --age 7d --all
    # Remove all unreferenced blobs immediately
    $ ollama prune --age 0
    # Remove all blobs
    $ ollama prune --age 0 --all
This should provide a safer and more predictable cleanup process.
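The selection rule implied by the proposed --age and --all flags could
look roughly like the sketch below. The blob type, the days helper, and
the prunable function are hypothetical names for illustration only; no
such code exists yet.

```go
package main

import (
	"fmt"
	"time"
)

// blob is a hypothetical record describing one cached blob.
type blob struct {
	digest     string
	age        time.Duration // time since the blob was last written
	referenced bool          // true if some manifest still points at it
}

// days converts a whole number of days to a duration.
func days(n int) time.Duration { return time.Duration(n) * 24 * time.Hour }

// prunable returns the digests of blobs at least minAge old. Referenced
// blobs are included only when all is true, mirroring the proposed --all.
func prunable(blobs []blob, minAge time.Duration, all bool) []string {
	var out []string
	for _, b := range blobs {
		if b.referenced && !all {
			continue
		}
		if b.age >= minAge {
			out = append(out, b.digest)
		}
	}
	return out
}

func main() {
	blobs := []blob{
		{"sha256:aaa", days(10), false},
		{"sha256:bbb", days(1), false},
		{"sha256:ccc", days(10), true},
	}
	fmt.Println(prunable(blobs, days(7), false)) // like: prune --age 7d
	fmt.Println(prunable(blobs, days(7), true))  // like: prune --age 7d --all
	fmt.Println(prunable(blobs, 0, false))       // like: prune --age 0
}
```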
Previously, developers without the synctest experiment enabled would see
build failures when running tests in some server/internal/internal
packages using the synctest package. This change makes the transition to
the package less painful by guarding its use with build tags.
synctest is enabled in CI. If a new change will break a synctest
package, it will break in CI, even if it does not break locally.
The developer docs have been updated to help with any confusion about
why package tests pass locally but fail in CI.
Previously, using a Registry required a DiskCache to be passed in for
use in various methods. This was a bit cumbersome, as the DiskCache is
required for most operations, and the DefaultCache is used in most of
those cases. This change makes the DiskCache an optional field on the
Registry struct.
This also changes DefaultCache to initialize on first use. This is to
not burden clients with the cost of creating a new cache per use, or
having to hold onto a cache for the lifetime of the Registry.
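A minimal sketch of the pattern, assuming simplified stand-in types: the
DiskCache and Registry below are not the real ones, and the directory
and the opens counter exist only for the demonstration. sync.OnceValues
requires Go 1.21 or newer.

```go
package main

import (
	"fmt"
	"sync"
)

// DiskCache is a stand-in for the real cache type; only a directory is modeled.
type DiskCache struct{ dir string }

// opens counts how often the default cache is opened (demonstration only).
var opens int

// defaultCache opens the shared cache on the first call and returns the
// same value on every later call.
var defaultCache = sync.OnceValues(func() (*DiskCache, error) {
	opens++
	return &DiskCache{dir: "/hypothetical/models"}, nil
})

// Registry sketches a client whose Cache field is optional.
type Registry struct {
	Cache *DiskCache // if nil, the lazily initialized default is used
}

// cache returns the explicit cache if set, else the shared default.
func (r *Registry) cache() (*DiskCache, error) {
	if r.Cache != nil {
		return r.Cache, nil
	}
	return defaultCache()
}

func main() {
	var a, b Registry // zero values work without any cache setup
	c1, _ := a.cache()
	c2, _ := b.cache()
	fmt.Println("same cache:", c1 == c2, "opens:", opens)
}
```

With this shape, a zero-value Registry is usable as-is, and callers that
need a specific cache can still set the field explicitly.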
Also, slip in some minor docs updates for Trace.
The extended name format is a superset of the name format that only the
client needs to know about, not the server or other dependents of the
name package, so move the split logic into the client package.
Also, take advantage of knowing about the extended name format to allow
the client to use the extended name format when unlinking to verify they
are unlinking the manifest with the content they intend.
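For illustration only, the split might look like the sketch below. It
assumes the extended form is "[scheme://]name[@digest]"; that layout,
the function name splitExtended, and the example values are assumptions
here, not a description of the actual client code.

```go
package main

import (
	"fmt"
	"strings"
)

// splitExtended separates an optional "scheme://" prefix and an optional
// "@digest" suffix from the plain name. The layout is assumed, not confirmed.
func splitExtended(s string) (scheme, name, digest string) {
	if before, after, ok := strings.Cut(s, "://"); ok {
		scheme, s = before, after
	}
	if i := strings.LastIndex(s, "@"); i >= 0 {
		s, digest = s[:i], s[i+1:]
	}
	return scheme, s, digest
}

func main() {
	scheme, name, digest := splitExtended("https://example.com/library/llama3.2:latest@sha256:abc")
	fmt.Printf("scheme=%q name=%q digest=%q\n", scheme, name, digest)
}
```

When a digest is present, an unlink operation can compare it against the
digest of the stored manifest and refuse to proceed on a mismatch.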
This commit is a step towards a goal to make names less ceremonial
outside of the registry client. Clients of the registry package can
treat names as opaque strings, and the registry package will handle
parsing, validating, and normalizing names.
Ideally we end up with the names package tucked away in an internal
package for good. We'll see how things go.
Also, this package name is not permanent. This is another step in the
ongoing process of refactoring the server code, and at some point it
will most likely be renamed/moved.
Also, require the -as flag to be set when importing a model. This
prevents the confusing error message "invalid name".
Also, allow short names to be used when importing a model and
auto-complete the name with the default mask.
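A rough sketch of completing a short name from a mask follows. The
host/namespace/model:tag layout, the default values in defaultMask, and
the parse and complete helpers are all assumptions for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// name holds the parts of a model name. The host/namespace/model:tag
// layout is an assumption made for this sketch.
type name struct{ host, namespace, model, tag string }

// defaultMask supplies values for missing parts (illustrative defaults).
var defaultMask = name{host: "registry.ollama.ai", namespace: "library", tag: "latest"}

// parse splits s into whatever parts are present.
func parse(s string) name {
	var n name
	// A colon after the last slash separates the tag; an earlier colon
	// may belong to a host:port pair.
	if i := strings.LastIndex(s, ":"); i > strings.LastIndex(s, "/") {
		s, n.tag = s[:i], s[i+1:]
	}
	parts := strings.Split(s, "/")
	n.model = parts[len(parts)-1]
	if len(parts) >= 2 {
		n.namespace = parts[len(parts)-2]
	}
	if len(parts) >= 3 {
		n.host = strings.Join(parts[:len(parts)-2], "/")
	}
	return n
}

// complete fills any missing part of s from defaultMask.
func complete(s string) string {
	n := parse(s)
	if n.host == "" {
		n.host = defaultMask.host
	}
	if n.namespace == "" {
		n.namespace = defaultMask.namespace
	}
	if n.tag == "" {
		n.tag = defaultMask.tag
	}
	return n.host + "/" + n.namespace + "/" + n.model + ":" + n.tag
}

func main() {
	fmt.Println(complete("llama3.2"))
	fmt.Println(complete("me/foo:v1"))
}
```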
This fixes panics introduced in 2412adf42b8380748ac79476e273f5b337c3b977
when Gin ungracefully assumes that the http.ResponseWriter implements
http.CloseNotifier and http.Flusher, which our new statusCodeRecorder
does not. This is a temporary fix until we can pour the rest of the Gin
out.
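One way to satisfy those interface assertions is to forward the optional
methods to the wrapped writer, as in the sketch below. The struct layout
and the demo helper are assumptions for illustration and may differ from
the actual fix. Note that http.CloseNotifier is deprecated in the
standard library but still defined.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// statusCodeRecorder wraps a ResponseWriter and remembers the status code.
// The fields here are assumed; only the type name comes from the commit.
type statusCodeRecorder struct {
	http.ResponseWriter
	code int
}

func (r *statusCodeRecorder) WriteHeader(code int) {
	r.code = code
	r.ResponseWriter.WriteHeader(code)
}

// Flush forwards to the wrapped writer when it supports flushing.
func (r *statusCodeRecorder) Flush() {
	if f, ok := r.ResponseWriter.(http.Flusher); ok {
		f.Flush()
	}
}

// CloseNotify forwards to the wrapped writer when possible; otherwise it
// returns a channel that never receives a value.
func (r *statusCodeRecorder) CloseNotify() <-chan bool {
	if cn, ok := r.ResponseWriter.(http.CloseNotifier); ok {
		return cn.CloseNotify()
	}
	return make(chan bool)
}

// Compile-time checks that the optional interfaces are implemented.
var (
	_ http.Flusher       = (*statusCodeRecorder)(nil)
	_ http.CloseNotifier = (*statusCodeRecorder)(nil)
)

// demo exercises the wrapper the way a framework's type assertions would.
func demo() (code int, flushed bool) {
	inner := httptest.NewRecorder()
	var w http.ResponseWriter = &statusCodeRecorder{ResponseWriter: inner}
	w.WriteHeader(http.StatusCreated)
	w.(http.Flusher).Flush() // would panic if Flush were not implemented
	return w.(*statusCodeRecorder).code, inner.Flushed
}

func main() {
	code, flushed := demo()
	fmt.Println("code:", code, "flushed:", flushed)
}
```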
This commit introduces a new API implementation for handling
interactions with the registry and the local model cache. The new API is
located in server/internal/registry. The package name is "registry" and
should be considered temporary; it is hidden and not bleeding outside of
the server package. As the commits roll in, we'll start consuming more
of the API and then let reverse osmosis take effect, at which point it
will surface closer to the root level packages as much as needed.
This commit copies (without history) the bmizerany/ollama-go repository
with the intention of integrating it into ollama as a replacement for
the pushing and pulling of models, and for the management of the cache
they are pushed to and pulled from.
New homes for these packages will be determined as they are integrated
and we have a better understanding of proper package boundaries.