logs: quiet down context canceled on completion and scheduler noise (#12553)

* logs: quiet down context canceled on completion

If the client closes the connection before Completion finishes, we were
logging at error level implying the runner crashed which was misleading.

time=2025-10-08T22:59:20.566-07:00 level=ERROR source=server.go:1490 msg="post predict" error="Post \"http://127.0.0.1:57736/completion\": context canceled"

* quiet down scheduler log error on expected case

Since we don't hold the lock while performing memory load calculations, other
runners can unload in parallel, so finding no runner to unload is a valid scenario
which we shouldn't log at error level.
This commit is contained in:
Daniel Hiltgen
2025-10-09 10:37:47 -07:00
committed by GitHub
parent 77060d462c
commit 15e3611d3d
2 changed files with 7 additions and 3 deletions

View File

@@ -1486,7 +1486,10 @@ func (s *llmServer) Completion(ctx context.Context, req CompletionRequest, fn fu
serverReq.Header.Set("Content-Type", "application/json")
res, err := http.DefaultClient.Do(serverReq)
if err != nil {
if err != nil && errors.Is(err, context.Canceled) {
// client closed connection
return err
} else if err != nil {
slog.Error("post predict", "error", err)
return errors.New("model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details")
}

View File

@@ -229,8 +229,9 @@ func (s *Scheduler) processPending(ctx context.Context) {
}
if runnerToExpire == nil {
// Shouildn't happen
slog.Error("runner to expire was nil!")
// While we were performing load calculations, the loaded runner(s) unloaded in parallel
// so findRunnerToUnload returned no runners. We'll try again and the loadedCount should be zero
slog.Debug("runner to expire was nil, retrying")
continue
}
// Trigger an expiration to unload once it's done