Mirror of https://github.com/ollama/ollama.git, synced 2025-11-11 04:17:34 +01:00
This changes the memory allocation strategy from upfront estimation to tracking the actual allocations made by the engine and reacting to them. The goal is to avoid issues caused by both under-estimation (crashing) and over-estimation (low performance due to under-utilized GPUs). It is currently opt-in and can be enabled for models running on the Ollama engine by setting OLLAMA_NEW_ESTIMATES=1. Behavior in other cases is unchanged and continues to use the existing estimates.
23 lines
994 B
Diff
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Gabe Goodhart <ghart@us.ibm.com>
Date: Fri, 11 Jul 2025 15:59:19 -0600
Subject: [PATCH] no power throttling win32 with gnuc

---
 ggml/src/ggml-cpu/ggml-cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ggml/src/ggml-cpu/ggml-cpu.c b/ggml/src/ggml-cpu/ggml-cpu.c
index a5689c18..85af19a3 100644
--- a/ggml/src/ggml-cpu/ggml-cpu.c
+++ b/ggml/src/ggml-cpu/ggml-cpu.c
@@ -2412,7 +2412,7 @@ static bool ggml_thread_apply_priority(int32_t prio) {
  // Newer Windows 11 versions aggresively park (offline) CPU cores and often place
  // all our threads onto the first 4 cores which results in terrible performance with
  // n_threads > 4
- #if _WIN32_WINNT >= 0x0602
+ #if (_WIN32_WINNT >= 0x0602) && !defined(__GNUC__)
  THREAD_POWER_THROTTLING_STATE t;
  ZeroMemory(&t, sizeof(t));
  t.Version = THREAD_POWER_THROTTLING_CURRENT_VERSION;
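The effect of the changed guard can be illustrated outside of ggml with a small, portable sketch. The helper names `compiler_is_gnuc` and `power_throttling_block_enabled` are hypothetical and do not appear in the patch. The explicit `defined(_WIN32_WINNT)` check is also an addition for readability: an undefined macro already evaluates to 0 inside `#if`, so the patched condition behaves the same without it.

```c
#include <stdbool.h>

/* Hypothetical helper: true when the compiler predefines __GNUC__
 * (GCC, MinGW-w64 GCC, and Clang in its default GNU-compatible mode). */
static bool compiler_is_gnuc(void) {
#if defined(__GNUC__)
    return true;
#else
    return false;
#endif
}

/* Hypothetical helper: mirrors the patched condition and reports whether
 * the THREAD_POWER_THROTTLING_STATE block would be compiled in. */
static bool power_throttling_block_enabled(void) {
#if defined(_WIN32_WINNT) && (_WIN32_WINNT >= 0x0602) && !defined(__GNUC__)
    return true;   /* e.g. an MSVC build targeting Windows 8 or newer */
#else
    return false;  /* non-Windows targets, older targets, or any __GNUC__ compiler */
#endif
}
```

With this guard, a build using any compiler that defines `__GNUC__` skips the power-throttling block even when `_WIN32_WINNT` is 0x0602 or higher, which matches the patch subject ("no power throttling win32 with gnuc").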