ollama

Mirror of https://github.com/ollama/ollama.git, synced 2025-11-12 15:57:17 +01:00

Files

Jesse Gross 34c3b68fc8 ggml: Don't allocate CPU buffers as CUDA Host buffers

Allocating (and in particular, freeing) memory from CUDA host buffers
is expensive and can cause a significant performance hit if we do
it for every token. Using normal system memory avoids this issue
and also gives the OS more flexibility to manage it.

There is no performance impact from this patch directly (either
positive or negative) but it makes a difference once we start
freeing memory correctly.

2025-04-11 11:13:22 -07:00
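
The choice described in the commit message above can be sketched roughly as follows. This is an illustrative model only, not the code in ggml.go: the names BufferType, Device, and bufferTypeFor are hypothetical, and the preferPinned flag is an assumed stand-in for the behavior before this commit, where CPU-side buffers could be placed in CUDA host (page-locked) memory when a GPU was present.

```go
package main

import "fmt"

// BufferType is a hypothetical label for where a tensor buffer lives.
type BufferType string

const (
	SystemMemory   BufferType = "system"      // ordinary pageable host memory
	CUDAHostPinned BufferType = "cuda-host"   // page-locked host memory from the CUDA runtime
	CUDADevice     BufferType = "cuda-device" // memory on the GPU itself
)

// Device is a hypothetical stand-in for a compute device.
type Device struct {
	Name  string
	IsGPU bool
}

// bufferTypeFor picks a buffer type for tensors assigned to d.
// With preferPinned set, CPU tensors use pinned host memory whenever a GPU
// is present (assumed model of the old behavior). With it unset, CPU tensors
// always use plain system memory, which is the direction the commit describes.
func bufferTypeFor(d Device, gpuPresent, preferPinned bool) BufferType {
	if d.IsGPU {
		return CUDADevice
	}
	if gpuPresent && preferPinned {
		return CUDAHostPinned
	}
	return SystemMemory
}

func main() {
	devices := []Device{{Name: "CPU"}, {Name: "GPU0", IsGPU: true}}
	for _, d := range devices {
		// preferPinned=false models the policy after the commit.
		fmt.Printf("%s -> %s\n", d.Name, bufferTypeFor(d, true, false))
	}
}
```

Under this sketch, buffers that are created and released for every token never go through the pinned-memory path, so their cost is that of ordinary system allocations.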

ggml

model: support for mistral-small in the ollama runner

2025-04-03 16:57:36 -07:00

ggml.go

ggml: Don't allocate CPU buffers as CUDA Host buffers

2025-04-11 11:13:22 -07:00

threads_debug.go

ollama debug tensor

2025-03-11 14:49:19 -07:00

threads.go

ollama debug tensor

2025-03-11 14:49:19 -07:00