There are two bugs here.
1. The check for a layer id is incorrect. It should be >= 0, since layer
0 is valid.
2. If both tensors have a layer identifier, only the layer id is
compared, which returns 0 whenever the tensors are in the same layer.
Instead, it should fall back to comparing the full tensor name.
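The two fixes can be illustrated with a minimal sketch. This is not the actual code from this change: the helper names (layerID, compareTensorNames), the "blk.<n>." naming scheme, and the choice to sort tensors without a layer id first are all assumptions made for the example.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
	"strconv"
	"strings"
)

// layerID returns the layer number encoded in a tensor name of the
// assumed form "blk.<n>.<rest>", or -1 if the name has no layer prefix.
// (Hypothetical helper; the naming scheme is an assumption.)
func layerID(name string) int {
	rest, ok := strings.CutPrefix(name, "blk.")
	if !ok {
		return -1
	}
	num, _, ok := strings.Cut(rest, ".")
	if !ok {
		return -1
	}
	n, err := strconv.Atoi(num)
	if err != nil || n < 0 {
		return -1
	}
	return n
}

// compareTensorNames orders two tensor names.
// Fix 1: a layer id is treated as present when it is >= 0, so layer 0 counts.
// Fix 2: when both layer ids are equal, fall back to the full tensor name
// instead of reporting the tensors as equal.
func compareTensorNames(a, b string) int {
	i, j := layerID(a), layerID(b)
	switch {
	case i >= 0 && j >= 0:
		if c := cmp.Compare(i, j); c != 0 {
			return c
		}
	case i >= 0:
		return 1 // assumption: names without a layer id sort first
	case j >= 0:
		return -1
	}
	return strings.Compare(a, b)
}

func main() {
	names := []string{
		"blk.10.attn_q.weight",
		"blk.0.ffn_up.weight",
		"output.weight",
		"blk.0.attn_q.weight",
		"blk.2.attn_q.weight",
	}
	slices.SortFunc(names, compareTensorNames)
	for _, n := range names {
		fmt.Println(n)
	}
}
```

With the old behavior described above, the two "blk.0." tensors would compare as equal, so their relative order would depend on the input order rather than on their names.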
Currently, when the backend is created, the tensors are loaded at the
same time, which is a slow operation. This separates them into two
steps:
- Create the backend, including enumerating tensors and allocating memory
- Load tensor data
This allows more flexibility in managing model loading.
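A rough sketch of the two-step shape, for illustration only. The types and signatures here (tensorInfo, Backend, NewBackend, Load, the read and progress callbacks) are hypothetical stand-ins and do not reflect the real backend API.

```go
package main

import (
	"errors"
	"fmt"
)

// tensorInfo is a hypothetical description of one tensor in a model file.
type tensorInfo struct {
	Name string
	Size int // size in bytes
}

// Backend is a hypothetical stand-in for the real backend type.
type Backend struct {
	buffers map[string][]byte
	loaded  bool
}

// NewBackend is step one: enumerate tensors and allocate memory.
// No tensor data is read here, so this step stays cheap.
func NewBackend(infos []tensorInfo) *Backend {
	b := &Backend{buffers: make(map[string][]byte, len(infos))}
	for _, ti := range infos {
		b.buffers[ti.Name] = make([]byte, ti.Size)
	}
	return b
}

// Load is step two: fill the preallocated buffers with tensor data.
// read and progress are assumed callbacks; progress may be nil.
func (b *Backend) Load(read func(name string, dst []byte) error, progress func(float32)) error {
	if b.loaded {
		return errors.New("tensor data already loaded")
	}
	done := 0
	for name, buf := range b.buffers {
		if err := read(name, buf); err != nil {
			return fmt.Errorf("load %s: %w", name, err)
		}
		done++
		if progress != nil {
			progress(float32(done) / float32(len(b.buffers)))
		}
	}
	b.loaded = true
	return nil
}

func main() {
	b := NewBackend([]tensorInfo{{"blk.0.attn_q.weight", 8}, {"output.weight", 4}})
	fmt.Println("allocated tensors:", len(b.buffers))

	// The caller decides when (and whether) to pay for the slow step.
	err := b.Load(func(name string, dst []byte) error {
		for i := range dst {
			dst[i] = 1 // placeholder for reading from the model file
		}
		return nil
	}, nil)
	fmt.Println("load error:", err)
}
```

Keeping allocation and data loading apart lets a caller inspect memory requirements, or abandon the load, before any tensor data is read.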
* Move quantization logic to GGML via new backend
This moves the model-aware logic to Go code and calls GGML's quantization code for model creation.
* Remove "add model quantizations"
This is no longer needed now that quantization is implemented in Go+GGML code directly.