* test: harden scheduler tests
This removes reschedDelay, which was stale code, and adds a
configurable timeout for waitForVRAMRecovery. Tests can now
set that timeout very short so the scheduler does not get
stuck and hit a test timeout.
* test: tune tests for partial loads
Give stress tests more time when the model is split between the CPU and GPU.
* tests: add single-threaded history test
Also tidies up some existing tests to handle more variation in model output.
* test: add support for testing specific architectures
* Move quantization logic to GGML via new backend
This moves the model-aware logic to Go code and calls GGML's quantization code for model creation.
* Remove "add model quantizations"
This is no longer needed now that quantization is implemented in Go+GGML code directly.