ollama/llama/patches/0016-add-C-API-for-mtmd_input_text.patch
Jesse Gross d5a0d8d904 llm: New memory management
This changes the memory allocation strategy from upfront estimation to
tracking actual allocations done by the engine and reacting to that. The
goal is to avoid issues caused by both under-estimation (crashing) and
over-estimation (low performance due to under-utilized GPUs).

It is currently opt-in and can be enabled for models running on the
Ollama engine by setting OLLAMA_NEW_ESTIMATES=1. Behavior in other
cases is unchanged and will continue to use the existing estimates.
2025-08-14 15:24:01 -07:00

From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Gabe Goodhart <ghart@us.ibm.com>
Date: Tue, 24 Jun 2025 16:55:31 -0600
Subject: [PATCH] add C API for mtmd_input_text
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
---
tools/mtmd/mtmd.cpp | 10 ++++++++++
tools/mtmd/mtmd.h | 3 +++
2 files changed, 13 insertions(+)
diff --git a/tools/mtmd/mtmd.cpp b/tools/mtmd/mtmd.cpp
index a05373d5..6f70f7f4 100644
--- a/tools/mtmd/mtmd.cpp
+++ b/tools/mtmd/mtmd.cpp
@@ -79,6 +79,16 @@ enum mtmd_slice_tmpl {
     // TODO @ngxson : add support for idefics (SmolVLM)
 };
+mtmd_input_text* mtmd_input_text_init(const char * text, bool add_special, bool parse_special) {
+    return new mtmd_input_text{text, add_special, parse_special};
+}
+
+void mtmd_input_text_free(mtmd_input_text* input_text) {
+    if (input_text) {
+        delete input_text;
+    }
+}
+
 const char * mtmd_default_marker() {
     return "<__media__>";
 }
diff --git a/tools/mtmd/mtmd.h b/tools/mtmd/mtmd.h
index f4ea07d3..cf287224 100644
--- a/tools/mtmd/mtmd.h
+++ b/tools/mtmd/mtmd.h
@@ -75,6 +75,9 @@ typedef struct mtmd_input_chunk mtmd_input_chunk;
 typedef struct mtmd_input_chunks mtmd_input_chunks;
 typedef struct mtmd_input_text mtmd_input_text;
+MTMD_API mtmd_input_text* mtmd_input_text_init(const char * text, bool add_special, bool parse_special);
+MTMD_API void mtmd_input_text_free(mtmd_input_text* input_text);
+
 struct mtmd_context_params {
     bool use_gpu;
     bool print_timings;
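
The two functions added by this patch form an allocate/release pair, so a caller has to match every `mtmd_input_text_init` with one `mtmd_input_text_free`. The sketch below is only an illustration of that ownership pattern, not code from the patch. To stay self-contained it declares a local stand-in for `mtmd_input_text` with the three fields implied by the patch's initializer list (a real build would include `mtmd.h` instead) and repeats the two function bodies from the diff. The names `input_text_deleter`, `input_text_ptr`, and `make_input_text` are hypothetical helpers introduced here.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Local stand-in (assumption): mirrors the fields implied by the patch's
// initializer list {text, add_special, parse_special}.
struct mtmd_input_text {
    const char * text;
    bool add_special;
    bool parse_special;
};

// Same bodies as the functions the patch adds to mtmd.cpp.
mtmd_input_text * mtmd_input_text_init(const char * text, bool add_special, bool parse_special) {
    return new mtmd_input_text{text, add_special, parse_special};
}

void mtmd_input_text_free(mtmd_input_text * input_text) {
    if (input_text) {
        delete input_text;
    }
}

// Hypothetical helper: lets std::unique_ptr call the matching free function.
struct input_text_deleter {
    void operator()(mtmd_input_text * p) const { mtmd_input_text_free(p); }
};

using input_text_ptr = std::unique_ptr<mtmd_input_text, input_text_deleter>;

// Hypothetical helper: wraps init so the object is released automatically.
// The struct stores the pointer without copying the string, so `text` must
// outlive the returned object.
input_text_ptr make_input_text(const char * text, bool add_special, bool parse_special) {
    return input_text_ptr(mtmd_input_text_init(text, add_special, parse_special));
}
```

Because `mtmd_input_text_init` stores the `text` pointer as given, the caller owns the string buffer and must keep it alive for as long as the returned object is in use. Passing a null pointer to `mtmd_input_text_free` is a no-op.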