ollama

mirror of https://github.com/ollama/ollama.git synced 2025-11-10 22:20:14 +01:00

Files

Jesse Gross aa45f7ce27 discover: Disable flash attention for Jetson Xavier (CC 7.2)

GGML picks the wrong kernel and these systems fail with:
Sep 28 22:25:39 xavier ollama[48999]: //ml/backend/ggml/ggml/src/ggml-cuda/fattn-wmma-f16.cu:437:
ERROR: CUDA kernel flash_attn_ext_f16 has no device code compatible with CUDA arch 720. ggml-cuda.cu was compiled for: __CUDA_ARCH_LIST__

Fixes #12442

2025-10-08 09:56:15 -07:00
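
The commit above describes gating flash attention on compute capability. The following is a minimal sketch of that kind of check, not the code from this commit: the function name flashAttentionAllowed and its parameters are hypothetical, and it models only the single exclusion named in the commit message (CC 7.2). Any other criteria the real code applies are not shown.

```go
package main

import "fmt"

// flashAttentionAllowed is a hypothetical helper, not part of ollama's API.
// It reports whether flash attention should stay enabled for a CUDA device
// with the given compute capability. This sketch encodes only the exclusion
// stated in the commit message: Jetson Xavier, compute capability 7.2.
func flashAttentionAllowed(major, minor int) bool {
	if major == 7 && minor == 2 {
		// Per the commit message, GGML selects a kernel with no device
		// code for CUDA arch 720, so flash attention is disabled here.
		return false
	}
	return true
}

func main() {
	// Compare the excluded capability with two neighboring values.
	for _, cc := range [][2]int{{7, 0}, {7, 2}, {7, 5}} {
		fmt.Printf("CC %d.%d: flash attention allowed = %v\n",
			cc[0], cc[1], flashAttentionAllowed(cc[0], cc[1]))
	}
}
```

Matching on the exact major and minor pair keeps the exclusion as narrow as the commit message states it, rather than assuming anything about other 7.x devices.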

cpu_linux_test.go

…

cpu_linux.go

…

cpu_windows_test.go

…

cpu_windows.go

…

gpu_darwin.go

…

gpu_info_darwin.h

…

gpu_info_darwin.m

…

gpu.go

discover: Disable flash attention for Jetson Xavier (CC 7.2)

2025-10-08 09:56:15 -07:00

path.go

…

runner_test.go

…

runner.go

Bring back escape valve for llm libraries and fix Jetpack6 crash (#12529)

2025-10-07 16:06:14 -07:00

types.go

discover: Disable flash attention for Jetson Xavier (CC 7.2)

2025-10-08 09:56:15 -07:00