For every forward pass through the model, we need to allocate input tensors: tokens, images, positions, outputs, and masks. These are allocated in system memory. However, when we close the context that the tensors were allocated through, only the metadata is freed; the actual backend memory is not, resulting in a significant memory leak.

This change ensures that all memory allocated through a context is freed when the context is closed.

Fixes #10040
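As a rough illustration of the idea (a minimal sketch, not the actual ollama implementation; the `Buffer`, `Alloc`, and `buffers` names here are hypothetical), the context tracks every backend buffer it allocates so that `Close` can release the backing memory rather than just the tensor metadata:

```go
package ml

// Buffer stands in for a backend allocation (e.g. a ggml backend buffer).
// Free is assumed to release the underlying system/device memory.
type Buffer interface {
	Free()
}

// Context owns the tensors allocated for a single forward pass.
type Context struct {
	// hypothetical field: every backend buffer allocated via this context
	buffers []Buffer
}

// Alloc records a buffer so it can be released when the context is closed.
func (c *Context) Alloc(b Buffer) {
	c.buffers = append(c.buffers, b)
}

// Close frees the backend memory for all tensors allocated through this
// context, in addition to any metadata cleanup. Previously only the
// metadata was freed, leaking the backend allocations on every forward pass.
func (c *Context) Close() {
	for _, b := range c.buffers {
		b.Free()
	}
	c.buffers = nil
}
```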