mirror of https://github.com/ollama/ollama.git synced 2025-04-15 23:21:28 +02:00

Jesse Gross b2a465296d runner: Release semaphore and improve error messages on failures

If we have an error after creating a new sequence but before
finding a slot for it, we return without releasing the semaphore.
This reduces our parallel sequences and eventually leads to deadlock.

In practice this should never happen because once we have acquired
the semaphore, we should always be able to find a slot. However, the
code is clearly not correct.

2025-03-30 19:21:54 -07:00

| Name | Last commit | Date |
| --- | --- | --- |
| common | Runner for Ollama engine | 2025-02-13 17:09:26 -08:00 |
| llamarunner | runner: Release semaphore and improve error messages on failures | 2025-03-30 19:21:54 -07:00 |
| ollamarunner | runner: Release semaphore and improve error messages on failures | 2025-03-30 19:21:54 -07:00 |
| README.md | Runner for Ollama engine | 2025-02-13 17:09:26 -08:00 |
| runner.go | Runner for Ollama engine | 2025-02-13 17:09:26 -08:00 |

README.md

`runner`

Note: this is a work in progress

A minimal runner for loading a model and running inference via an HTTP web server.

./runner -model <model binary>

Completion

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion

Embeddings

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embedding