highperfocused/ollama

mirror of https://github.com/ollama/ollama.git synced 2025-04-01 08:30:16 +02:00


README.md

`runner`

Note: this is a work in progress

A minimal runner that loads a model and runs inference via an HTTP web server.

./runner -model <model binary>

Completion

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion

Embeddings

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embedding