mirror of
https://github.com/ollama/ollama.git
synced 2025-11-10 21:17:32 +01:00
llm: Change memory allocation backoff from exponential to incremental
If we create a memory layout that should fit based on reported free VRAM but allocation still fails, we start applying a backoff. This reduces the assumed free VRAM by an exponentially growing percentage (1%, 2%, 4%...). However, the points chosen tend to be too dense at the beginning and too sparse at the end. Therefore, this switches to an incremental backoff (10%, 20%, 30%...).
@@ -766,15 +766,12 @@ nextOperation:
 	// Memory allocation failed even though we created a layout that we thought should
 	// fit in available memory. This could happen if either our free memory reports
 	// are incorrect or if available memory is changing between layout and allocation
-	// time. Apply an exponential backoff to try to find the real amount of available
-	// space.
+	// time. Apply a backoff to try to find the real amount of available space.
 	if backoff > 1 {
 		slog.Warn("memory layout cannot be allocated", "memory", resp.Memory)
 		return nil, errors.New("memory layout cannot be allocated")
-	} else if backoff == 0 {
-		backoff = 0.01
-	} else {
-		backoff *= 2
+	} else {
+		backoff += 0.1
 	}
 
 	slog.Info("model layout did not fit, applying backoff", "backoff", fmt.Sprintf("%.2f", backoff))
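To show the shape of the retry loop after the change, here is a minimal, self-contained sketch (not the ollama implementation; `tryAllocate` and `allocateWithBackoff` are hypothetical stand-ins): each failed attempt shrinks the assumed free VRAM by a further 10% until an allocation succeeds or the whole range has been probed.

```go
package main

import (
	"errors"
	"fmt"
)

// tryAllocate is a hypothetical stand-in for building a memory layout and
// attempting the real allocation: it succeeds once the assumed budget fits
// within the VRAM that is actually available.
func tryAllocate(budget, actualFree uint64) bool {
	return budget > 0 && budget <= actualFree
}

// allocateWithBackoff mimics the retry loop's shape: reduce the assumed
// free VRAM by an extra 10% per failure, giving up once the full range
// has been swept.
func allocateWithBackoff(reportedFree, actualFree uint64) (uint64, error) {
	var backoff float64
	for backoff <= 1 {
		budget := uint64(float64(reportedFree) * (1 - backoff))
		if tryAllocate(budget, actualFree) {
			return budget, nil
		}
		backoff += 0.1
		fmt.Printf("layout did not fit, applying backoff %.2f\n", backoff)
	}
	return 0, errors.New("memory layout cannot be allocated")
}

func main() {
	// Reports claim 10 GiB free, but only 7 GiB is really available:
	// the loop converges after a few 10% reductions.
	budget, err := allocateWithBackoff(10<<30, 7<<30)
	fmt.Println(budget, err)
}
```

Because the increment starts from zero, the first retry already probes at a 10% reduction, which is why the patched code no longer needs the special `backoff == 0` case that seeded the old exponential sequence at 1%.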