Files
ollama/llama/patches/0017-no-power-throttling-win32-with-gnuc.patch
Jesse Gross d5a0d8d904 llm: New memory management
This changes the memory allocation strategy from upfront estimation to
tracking actual allocations done by the engine and reacting to that. The
goal is avoid issues caused by both under-estimation (crashing) and
over-estimation (low performance due to under-utilized GPUs).

It is currently opt-in and can be enabled for models running on the
Ollama engine by setting OLLAMA_NEW_ESTIMATES=1. Behavior in other
cases is unchanged and will continue to use the existing estimates.
2025-08-14 15:24:01 -07:00

23 lines
994 B
Diff

From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Gabe Goodhart <ghart@us.ibm.com>
Date: Fri, 11 Jul 2025 15:59:19 -0600
Subject: [PATCH] no power throttling win32 with gnuc
---
ggml/src/ggml-cpu/ggml-cpu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/ggml/src/ggml-cpu/ggml-cpu.c b/ggml/src/ggml-cpu/ggml-cpu.c
index a5689c18..85af19a3 100644
--- a/ggml/src/ggml-cpu/ggml-cpu.c
+++ b/ggml/src/ggml-cpu/ggml-cpu.c
@@ -2412,7 +2412,7 @@ static bool ggml_thread_apply_priority(int32_t prio) {
// Newer Windows 11 versions aggresively park (offline) CPU cores and often place
// all our threads onto the first 4 cores which results in terrible performance with
// n_threads > 4
- #if _WIN32_WINNT >= 0x0602
+ #if (_WIN32_WINNT >= 0x0602) && !defined(__GNUC__)
THREAD_POWER_THROTTLING_STATE t;
ZeroMemory(&t, sizeof(t));
t.Version = THREAD_POWER_THROTTLING_CURRENT_VERSION;