runner
Note: this is a work in progress
A minimal runner for loading a model and running inference via an HTTP web server.
./runner -model <model binary>
Completion
curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion
Embeddings
curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embedding