Jongwook Choi 12e8c12d2b
Disable CUDA peer access as a workaround for multi-gpu inference bug (#1261)
When CUDA peer access is enabled, multi-GPU inference produces garbage
output. This is a known bug in llama.cpp (or in NVIDIA's driver). Until
the upstream bug is fixed, disable CUDA peer access as a temporary
workaround to ensure correct output.

See #961.
2023-11-24 14:05:57 -05:00