mirror of https://github.com/ollama/ollama.git synced 2025-04-13 06:09:52 +02:00


Bruce MacDonald 66b2539238

runner: clear cache when shift is not possible (#9433)

Clear the KV cache when the model does not support the shift operation.
Adds a KvCacheCanShift() check to handle models that cannot perform cache shifts,
falling back to a full cache clear while preserving the logical token history, so
that behavior stays as expected when the context window fills up.

2025-03-31 12:54:45 -07:00

| Name | Last commit | Date |
| --- | --- | --- |
| common | Runner for Ollama engine | 2025-02-13 17:09:26 -08:00 |
| llamarunner | runner: clear cache when shift is not possible (#9433) | 2025-03-31 12:54:45 -07:00 |
| ollamarunner | runner: clear cache when shift is not possible (#9433) | 2025-03-31 12:54:45 -07:00 |
| README.md | Runner for Ollama engine | 2025-02-13 17:09:26 -08:00 |
| runner.go | Runner for Ollama engine | 2025-02-13 17:09:26 -08:00 |

README.md

`runner`

Note: this is a work in progress

A minimal runner for loading a model and running inference via an HTTP web server.

./runner -model <model binary>

Completion

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion

Embeddings

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embedding
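The embedding request can be sent from Go in the same way. This is a sketch under stated assumptions: `embeddingBody` is a hypothetical helper, and only the `prompt` field, the `/embedding` path, and port 8080 come from the curl command above. Since the response shape is not documented here, the program prints the HTTP status and the raw body instead of decoding it.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
)

// embeddingBody encodes the JSON payload used by the curl example
// (hypothetical helper; only the "prompt" field comes from this README).
func embeddingBody(prompt string) ([]byte, error) {
	return json.Marshal(struct {
		Prompt string `json:"prompt"`
	}{Prompt: prompt})
}

func main() {
	body, err := embeddingBody("turn me into an embedding")
	if err != nil {
		log.Printf("encode request: %v", err)
		return
	}

	// http.Post sets the Content-Type header from its second argument.
	resp, err := http.Post("http://localhost:8080/embedding", "application/json", bytes.NewReader(body))
	if err != nil {
		log.Printf("send request (is the runner listening?): %v", err)
		return
	}
	defer resp.Body.Close()

	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Printf("read response: %v", err)
		return
	}

	// Print the status and the undecoded body; the response schema is not assumed.
	fmt.Printf("status: %s\n%s\n", resp.Status, raw)
}
```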