ollama

mirror of https://github.com/ollama/ollama.git synced 2025-11-10 12:07:07 +01:00

Files

Jesse Gross 392a270261 ggml: Avoid cudaMemsetAsync during memory fitting

We pass invalid pointers when we check the size of the required
compute graph before fitting. Some CUDA APIs validate these pointers
but we can just skip them during this phase. cudaMemsetAsync is one
of these that we weren't skipping but never took the code path that
used it. Now that we have enabled op_offload, we can hit it in
memory pressured situations.

2025-10-31 15:23:28 -07:00
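
The commit message above describes skipping a device call while only measuring memory needs, because the pointers passed during that phase are not valid. A minimal sketch of that guard pattern follows. It is an illustration only, not the ggml or Ollama code: the `device` type, the `fitting` flag, and `clearScratch` are hypothetical names, and a plain byte slice stands in for GPU memory.

```go
package main

import "fmt"

// device is a hypothetical stand-in for a GPU backend. None of these names
// come from ggml or Ollama.
type device struct {
	fitting bool // true while only measuring memory needs; buffers are placeholders
	memsets int  // count of memsets actually issued
}

// clearScratch zeroes the first size bytes of buf and reports whether it did.
// During fitting, buf is a placeholder (nil here), so the call is skipped
// instead of touching memory that was never allocated.
func (d *device) clearScratch(buf []byte, size int) bool {
	if d.fitting {
		return false
	}
	for i := range buf[:size] {
		buf[i] = 0
	}
	d.memsets++
	return true
}

func main() {
	// Fitting pass: no real buffer exists yet, so the memset is skipped.
	d := &device{fitting: true}
	fmt.Println(d.clearScratch(nil, 1024))

	// Real pass: the buffer is valid and gets zeroed.
	d.fitting = false
	buf := []byte{1, 2, 3, 4}
	fmt.Println(d.clearScratch(buf, len(buf)), buf)
}
```

Without the `fitting` check, the first call would slice a nil buffer and panic, which loosely mirrors a CUDA API rejecting an invalid pointer.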

backend

ggml: Avoid cudaMemsetAsync during memory fitting

2025-10-31 15:23:28 -07:00

interleaved mrope (#12807)

2025-10-30 11:29:00 -07:00

backend.go

ggml: Enable op_offload to improve partial offload performance

2025-10-30 13:53:10 -07:00

device.go

cpu: always ensure LibOllamaPath included (#12890)

2025-10-31 14:37:29 -07:00

path.go

cpu: always ensure LibOllamaPath included (#12890)

2025-10-31 14:37:29 -07:00