ollama/server
Daniel Andersen ea7657b54a sched: Add support for grouping GPUs (#10678)
This patch modifies Ollama to group GPUs so that a requested model is memory-fit onto the smallest suitable group, instead of the former algorithm of using a single GPU or distributing the model across all available GPUs.

Benefits:
 - Less (PCIe) bus communication between GPUs, which matters especially when the links are not very fast
 - Unallocated GPUs can enter power-saving mode
 - Significantly reduced VRAM allocation when using more than 2 GPUs in a system
 - Due to the reduced memory allocation, more models can run simultaneously
2025-08-11 13:59:38 -07:00