Mirror of https://github.com/ollama/ollama.git (synced 2025-11-12 17:47:39 +01:00)
We currently short-circuit generation of the cache mask and just generate an empty tensor of the correct size. However, in some cases this also skips a cast operation, so the worst-case graph ends up not being fully worst case. We don't actually need the fast path for mask generation, so it's better to just use the normal code path.
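The effect can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual ollama code: the `graph` type, `buildMask`, `opCount`, and the op names are all hypothetical, and the graph is reduced to a list of recorded op names. It shows how a fast path that returns an empty tensor can leave the cast out of the reserved graph, so the reserved graph is smaller than the one built at run time.

```go
package main

import "fmt"

// graph is a hypothetical stand-in for a compute graph.
// It only records the names of the ops added to it.
type graph struct{ ops []string }

func (g *graph) add(op string) { g.ops = append(g.ops, op) }

// buildMask records the ops needed to produce a cache mask.
// reserve models building the worst-case graph for memory reservation,
// fastPath models the short circuit, and needsCast models the cases
// where the mask has to be converted to another data type.
// All names and op labels here are assumptions for illustration.
func buildMask(g *graph, reserve, fastPath, needsCast bool) {
	if reserve && fastPath {
		// Fast path: an empty tensor of the right size, and nothing else.
		// The cast below is never recorded.
		g.add("empty")
		return
	}
	// Normal path: fill the mask, then cast it if required.
	g.add("fill_mask")
	if needsCast {
		g.add("cast")
	}
}

// opCount builds a fresh graph and returns how many ops were recorded.
func opCount(reserve, fastPath, needsCast bool) int {
	g := &graph{}
	buildMask(g, reserve, fastPath, needsCast)
	return len(g.ops)
}

func main() {
	fmt.Println("reserved, fast path:  ", opCount(true, true, true))
	fmt.Println("reserved, normal path:", opCount(true, false, true))
	fmt.Println("actual run:           ", opCount(false, false, true))
}
```

In this model the fast-path reservation records fewer ops than the real run, while the normal path records the same ops in both cases, which is the property a worst-case graph needs.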