mirror of
https://github.com/ollama/ollama.git
synced 2025-11-10 21:17:32 +01:00
llm: Change memory allocation backoff from exponential to incremental
If we create a memory layout that should fit based on reported free VRAM but allocation still fails, we start applying a backoff. This reduces the assumed free VRAM by an exponentially growing percentage (1%, 2%, 4%...). However, the points chosen tend to be too dense at the beginning and too sparse at the end. Therefore, this switches to an incremental backoff (10%, 20%, 30%...).
@@ -766,15 +766,12 @@ nextOperation:
 	// Memory allocation failed even though we created a layout that we thought should
 	// fit in available memory. This could happen if either our free memory reports
 	// are incorrect or if available memory is changing between layout and allocation
-	// time. Apply an exponential backoff to try to find the real amount of available
-	// space.
+	// time. Apply a backoff to try to find the real amount of available space.
 	if backoff > 1 {
 		slog.Warn("memory layout cannot be allocated", "memory", resp.Memory)
 		return nil, errors.New("memory layout cannot be allocated")
-	} else if backoff == 0 {
-		backoff = 0.01
-	} else {
-		backoff *= 2
+	} else {
+		backoff += 0.1
 	}
 
 	slog.Info("model layout did not fit, applying backoff", "backoff", fmt.Sprintf("%.2f", backoff))
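To show the shape of the retry loop after the change, here is a minimal, self-contained sketch (not the ollama implementation; `tryAllocate` and `allocateWithBackoff` are hypothetical stand-ins): each failed attempt shrinks the assumed free VRAM by a further 10% until an allocation succeeds or the whole range has been probed.

```go
package main

import (
	"errors"
	"fmt"
)

// tryAllocate is a hypothetical stand-in for building a memory layout and
// attempting the real allocation: it succeeds once the assumed budget fits
// within the VRAM that is actually available.
func tryAllocate(budget, actualFree uint64) bool {
	return budget > 0 && budget <= actualFree
}

// allocateWithBackoff mimics the retry loop's shape: reduce the assumed
// free VRAM by an extra 10% per failure, giving up once the full range
// has been swept.
func allocateWithBackoff(reportedFree, actualFree uint64) (uint64, error) {
	var backoff float64
	for backoff <= 1 {
		budget := uint64(float64(reportedFree) * (1 - backoff))
		if tryAllocate(budget, actualFree) {
			return budget, nil
		}
		backoff += 0.1
		fmt.Printf("layout did not fit, applying backoff %.2f\n", backoff)
	}
	return 0, errors.New("memory layout cannot be allocated")
}

func main() {
	// Reports claim 10 GiB free, but only 7 GiB is really available:
	// the loop converges after a few 10% reductions.
	budget, err := allocateWithBackoff(10<<30, 7<<30)
	fmt.Println(budget, err)
}
```

Because the increment starts from zero, the first retry already probes at a 10% reduction, which is why the patched code no longer needs the special `backoff == 0` case that seeded the old exponential sequence at 1%.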