* DRY out the runner lifecycle code
Now that discovery uses the runners as well, this unifies the runner spawning code
into a single place. This also unifies GPU discovery types with the newer ml.DeviceInfo
* win: make incremental builds better
Place build artifacts in discrete directories so incremental builds don't have to start fresh
* Adjust sort order to consider iGPUs
* handle cpu inference oom scenarios
* review comments
* test: harden scheduler tests
This removes reschedDelay which was stale code, and adds
a new configurable timeout for the waitForVRAMRecovery so
tests can now set the timeout to be very short to avoid the
scheduler getting stuck and hitting a test timeout.
* test: tune tests for partial loads
Give stress tests more time when the model is split between CPU/GPU
Made it so when api/generate builds up a message array and generates the
prompt it now goes through the same function as `api/chat` for
consistency. This is where we hook the optional built-in renderers to
bypass templates, which was missing for `api/generate` before this
change.
Closes: #12578