Files
multica/server
Bohan Jiang 38f777d0ba feat(autopilot): auto-pause autopilots with sustained high failure rate (#2136)
* feat(autopilot): auto-pause autopilots with sustained high failure rate

Adds a background monitor that pauses any active autopilot whose recent
runs are dominated by failures (defaults: ≥100 terminal runs in 7d, ≥90%
failed). The monitor leaves a severity=attention inbox notification for
the autopilot's creator (or the agent's owner if the autopilot was
agent-created) so a human learns about the auto-pause and can fix the
root cause before re-enabling.

Motivated by MUL-1336 §6 #2: a single broken cron autopilot
(`Registro de ls cada 5 min`, 1,475/1,476 failed in 7d) was burning
~1.5k tasks/tokens per week with no human in the loop.

Tunable via AUTOPILOT_FAIL_MONITOR_{INTERVAL,LOOKBACK,MIN_RUNS,FAIL_RATIO,STARTUP_DELAY};
INTERVAL=0 disables the monitor entirely.

Co-authored-by: multica-agent <github@multica.ai>

* chore(autopilot): relax failure monitor defaults to daily / 50 runs

Per review feedback in MUL-1339: 30-min scan was overkill — the 50-run
threshold already provides multi-hour lag, and operational simplicity
matters. Lowering MinRuns from 100 → 50 keeps low-frequency autopilots
in scope (~7 runs/day reaches threshold within 7d window).

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-06 17:59:15 +08:00
..