mirror of https://github.com/ollama/ollama.git synced 2025-04-15 23:21:28 +02:00

Bruce MacDonald e53b3cbd0c

llm: set done reason at server level (#9830)

No functional change. Many different done reasons can be set at the runner
level, so rather than obscuring them we should return them to the server
process and let it choose what to do with the done reason. This separates
the API concerns from the runner.

2025-04-03 10:19:24 -07:00

| Name | Last commit | Date |
| --- | --- | --- |
| common | Runner for Ollama engine | 2025-02-13 17:09:26 -08:00 |
| llamarunner | llm: set done reason at server level (#9830) | 2025-04-03 10:19:24 -07:00 |
| ollamarunner | llm: set done reason at server level (#9830) | 2025-04-03 10:19:24 -07:00 |
| README.md | Runner for Ollama engine | 2025-02-13 17:09:26 -08:00 |
| runner.go | Runner for Ollama engine | 2025-02-13 17:09:26 -08:00 |

README.md

`runner`

Note: this is a work in progress

A minimal runner for loading a model and running inference via an HTTP web server.

./runner -model <model binary>

Completion

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion

Embeddings

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embedding
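
The same two requests can also be issued from Go. The following is a minimal sketch rather than an official client: it assumes the runner is listening on `localhost:8080` as in the curl examples, and it prints the raw response body because the response format is not documented here. The names `promptRequest`, `buildPromptBody`, and `postPrompt` are hypothetical helpers introduced for illustration, and the two-minute timeout is an arbitrary choice.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// promptRequest mirrors the JSON body used in the curl examples: {"prompt": "..."}.
// (Hypothetical type; the runner may accept more fields than shown here.)
type promptRequest struct {
	Prompt string `json:"prompt"`
}

// buildPromptBody encodes a prompt as the JSON request body.
func buildPromptBody(prompt string) ([]byte, error) {
	return json.Marshal(promptRequest{Prompt: prompt})
}

// postPrompt POSTs the prompt to baseURL+path and returns the raw response body.
// The body is read to the end, so a streamed response is returned only once
// the server closes it.
func postPrompt(client *http.Client, baseURL, path, prompt string) ([]byte, error) {
	body, err := buildPromptBody(prompt)
	if err != nil {
		return nil, err
	}
	resp, err := client.Post(baseURL+path, "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("%s returned %s: %s", path, resp.Status, data)
	}
	return data, nil
}

func main() {
	// Assumed address, taken from the curl examples.
	const baseURL = "http://localhost:8080"
	client := &http.Client{Timeout: 2 * time.Minute}

	requests := []struct{ path, prompt string }{
		{"/completion", "hi"},
		{"/embedding", "turn me into an embedding"},
	}
	for _, r := range requests {
		out, err := postPrompt(client, baseURL, r.path, r.prompt)
		if err != nil {
			fmt.Fprintln(os.Stderr, "request failed:", err)
			continue
		}
		fmt.Printf("%s response:\n%s\n", r.path, out)
	}
}
```

Keeping the body encoding in its own function makes it easy to check that the payload matches what the curl commands send, independently of whether a runner is available.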