llm: Change memory allocation backoff from exponential to incremental

If we create a memory layout that should fit based on reported free VRAM
but allocation still fails, we start applying a backoff. This reduces
free VRAM by an exponential percentage (1%, 2%, 4%...). However, the
points chosen tend to be too dense at the beginning and too sparse at
the end. Therefore, this switches to an incremental backoff (10%, 20%,
30%...).
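
To make the density argument concrete, here is a minimal sketch that lists the backoff values each strategy produces before giving up. The names `schedule`, `exponential`, and `incremental` are hypothetical and are not part of this commit; `float64` is an assumption, as the diff does not show the type of `backoff`.

```go
package main

import "fmt"

// schedule returns the successive backoff values produced by repeatedly
// applying next, starting from zero. It stops once the value exceeds 1,
// mirroring the "backoff > 1" give-up check in the diff.
// Hypothetical helper for illustration only.
func schedule(next func(float64) float64) []float64 {
	var steps []float64
	backoff := 0.0
	for backoff <= 1 {
		backoff = next(backoff)
		steps = append(steps, backoff)
	}
	return steps
}

// exponential models the old behavior: start at 1%, then double.
func exponential(b float64) float64 {
	if b == 0 {
		return 0.01
	}
	return b * 2
}

// incremental models the new behavior: add 10% on every failure.
func incremental(b float64) float64 {
	return b + 0.1
}

func main() {
	for _, b := range schedule(exponential) {
		fmt.Printf("%.2f ", b)
	}
	fmt.Println()
	for _, b := range schedule(incremental) {
		fmt.Printf("%.2f ", b)
	}
	fmt.Println()
}
```

Under these assumptions the exponential schedule has eight points, four of them below 10%, and then jumps from 64% straight past 100%. The incremental schedule has eleven evenly spaced points.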
Jesse Gross
2025-10-23 11:31:25 -07:00
committed by Jesse Gross
parent 6723a40be6
commit ad6f6a1d29

@@ -766,15 +766,12 @@ nextOperation:
 // Memory allocation failed even though we created a layout that we thought should
 // fit in available memory. This could happen if either our free memory reports
 // are incorrect or if available memory is changing between layout and allocation
-// time. Apply an exponential backoff to try to find the real amount of available
-// space.
+// time. Apply a backoff to try to find the real amount of available space.
 if backoff > 1 {
 	slog.Warn("memory layout cannot be allocated", "memory", resp.Memory)
 	return nil, errors.New("memory layout cannot be allocated")
-} else if backoff == 0 {
-	backoff = 0.01
 } else {
-	backoff *= 2
+	backoff += 0.1
 }
 slog.Info("model layout did not fit, applying backoff", "backoff", fmt.Sprintf("%.2f", backoff))
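
For context, the retry behavior around this hunk might look roughly like the sketch below. Only the `backoff > 1` check and the `backoff += 0.1` step come from the diff. The function `allocateWithBackoff`, the `tryAlloc` callback, and the way the budget is derived from reported free memory are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// allocateWithBackoff is a hypothetical helper, not code from this commit.
// reportedFree is the free memory the device claims to have; tryAlloc stands
// in for building a layout within the given budget and allocating it.
func allocateWithBackoff(reportedFree uint64, tryAlloc func(budget uint64) bool) (uint64, error) {
	backoff := 0.0
	for {
		// Assumption: the budget is the reported free memory reduced by the
		// backoff fraction, clamped so it never goes negative.
		scale := 1 - backoff
		if scale < 0 {
			scale = 0
		}
		budget := uint64(float64(reportedFree) * scale)
		if tryAlloc(budget) {
			return budget, nil
		}

		// Same shape as the new code in the diff: give up once the backoff
		// exceeds 100%, otherwise take another 10% step.
		if backoff > 1 {
			return 0, errors.New("memory layout cannot be allocated")
		}
		backoff += 0.1
	}
}

func main() {
	// Pretend only 750 of the 1000 reported units can really be allocated.
	budget, err := allocateWithBackoff(1000, func(b uint64) bool { return b <= 750 })
	fmt.Println(budget, err)
}
```

With 10% steps, the search reaches a workable budget within a few retries even when the free memory report is off by a large margin.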