ollama/llm at 08f1e18965c15648504fc5ec367134898e92ec6d

highperfocused/ollama

mirror of https://github.com/ollama/ollama.git synced 2025-03-29 19:22:16 +01:00

Jeffrey Morgan 08f1e18965

Offload layers to GPU based on new model size estimates (#1850)

* select layers based on estimated model memory usage

* always account for scratch vram

* don't load +1 layers

* better estimation for graph alloc

* Update gpu/gpu_darwin.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update llm/llm.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update llm/llm.go

* add overhead for cuda memory

* Update llm/llm.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* fix build error on linux

* address comments

---------

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

2024-01-08 16:42:00 -05:00
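
The commit message above describes choosing how many layers to offload from an estimate of model memory use, after setting aside scratch VRAM and CUDA overhead. A minimal sketch of that idea follows. It is not the code from this commit: the function name `estimateGPULayers`, its parameters, and the assumption that every layer costs an equal share of the model size are all hypothetical and used only for illustration.

```go
package main

import "fmt"

// estimateGPULayers is a hypothetical helper, not ollama's implementation.
// It assumes each layer costs an equal share of modelBytes and that
// scratchBytes and overheadBytes must be reserved before any layer is placed.
func estimateGPULayers(availableVRAM, modelBytes, scratchBytes, overheadBytes int64, totalLayers int) int {
	if totalLayers <= 0 || modelBytes <= 0 {
		return 0
	}

	// Always account for scratch space and runtime overhead first.
	usable := availableVRAM - scratchBytes - overheadBytes
	if usable <= 0 {
		return 0
	}

	bytesPerLayer := modelBytes / int64(totalLayers)
	if bytesPerLayer == 0 {
		// Layers are smaller than one byte each after integer division,
		// so any positive budget fits all of them.
		return totalLayers
	}

	layers := int(usable / bytesPerLayer)
	if layers > totalLayers {
		// Never report more layers than the model has (no "+1").
		layers = totalLayers
	}
	return layers
}

func main() {
	const gib = int64(1) << 30

	// 4 GiB of VRAM, an 8 GiB model with 32 layers,
	// 1 GiB of scratch and 0.5 GiB of overhead (all made-up numbers).
	fmt.Println(estimateGPULayers(4*gib, 8*gib, 1*gib, gib/2, 32))
}
```

With these made-up numbers, 2.5 GiB remains after the reservations and each layer is assumed to cost 256 MiB, so the sketch reports 10 layers. A real estimate would also need per-layer sizes, context-dependent graph allocations, and platform-specific memory queries, none of which are modeled here.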

add -DCMAKE_SYSTEM_NAME=Darwin cmake flag (#1832)

2024-01-07 00:46:17 -05:00

add -DCMAKE_SYSTEM_NAME=Darwin cmake flag (#1832)

2024-01-07 00:46:17 -05:00

llama.cpp @ 328b83de23

Init submodule with new path

2024-01-04 13:00:13 -08:00

dynamic_shim.c

Switch windows build to fully dynamic

2024-01-02 15:36:16 -08:00

dynamic_shim.h

Refactor how we augment llama.cpp

2024-01-02 15:35:55 -08:00

ext_server_common.go

Offload layers to GPU based on new model size estimates (#1850)

2024-01-08 16:42:00 -05:00

ext_server_default.go

Offload layers to GPU based on new model size estimates (#1850)

2024-01-08 16:42:00 -05:00

ext_server_windows.go

Load dynamic cpu lib on windows

2024-01-04 08:41:41 -08:00

ggml.go

Offload layers to GPU based on new model size estimates (#1850)

2024-01-08 16:42:00 -05:00

gguf.go

Offload layers to GPU based on new model size estimates (#1850)

2024-01-08 16:42:00 -05:00

llama.go

Offload layers to GPU based on new model size estimates (#1850)

2024-01-08 16:42:00 -05:00

llm.go

Offload layers to GPU based on new model size estimates (#1850)

2024-01-08 16:42:00 -05:00

shim_darwin.go

Offload layers to GPU based on new model size estimates (#1850)

2024-01-08 16:42:00 -05:00

shim_ext_server_linux.go

Code shuffle to clean up the llm dir

2024-01-04 12:12:05 -08:00

shim_ext_server_windows.go

Code shuffle to clean up the llm dir

2024-01-04 12:12:05 -08:00

shim_ext_server.go

Offload layers to GPU based on new model size estimates (#1850)

2024-01-08 16:42:00 -05:00

utils.go

partial decode ggml bin for more info

2023-08-10 09:23:10 -07:00