mirror of
https://github.com/ollama/ollama.git
synced 2025-11-10 06:37:17 +01:00
The scheduler updates free VRAM based on current loaded models. This was mutating the persisted list of GPUs, and when coupled with the non-refreshing logic for Metal that lead to stale low VRAM reporting after unload. The fix is to make sure the GPU discovery always returns a copy so the schedulers GPU list is in fact ephemeral and doesn't leak any temporary adjustments back into the persistent list.