Commit Graph

3 Commits

Author SHA1 Message Date
Daniel Hiltgen
3258a89b6e DRY out the runner lifecycle code (#12540)
* DRY out the runner lifecycle code

Now that discovery uses the runners as well, this unifies the runner spawning code
into a single place.  This also unifies GPU discovery types with the newer ml.DeviceInfo

* win: make incremental builds better

Place build artifacts in discrete directories so incremental builds don't have to start fresh

* Adjust sort order to consider iGPUs

* handle cpu inference oom scenarios

* review comments
2025-10-23 11:20:02 -07:00
Daniel Hiltgen
68e04c7ff8 test: harden scheduler tests (#12662)
* test: harden scheduler tests

This removes reschedDelay which was stale code, and adds
a new configurable timeout for the waitForVRAMRecovery so
tests can now set the timeout to be very short to avoid the
scheduler getting stuck and hitting a test timeout.

* test: tune tests for partial loads

Give stress tests more time when the model is split between CPU/GPU
2025-10-17 08:56:44 -07:00
Devon Rifkin
6db8da9958 routes: fix built-in renderers for api/generate
Made it so when api/generate builds up a message array and generates the
prompt it now goes through the same function as `api/chat` for
consistency. This is where we hook the optional built-in renderers to
bypass templates, which was missing for `api/generate` before this
change.

Closes: #12578
2025-10-11 13:57:43 -07:00