mirror of
https://github.com/multica-ai/multica.git
synced 2026-07-05 21:39:54 +02:00
Post-deploy of the new scheduled-dispatch scheduler (PR #4444), an
autopilot configured for "weekdays 17:10 Asia/Shanghai" fired at ~12:30
Beijing the day after deploy, ~4h 38m before the next scheduled time the
UI showed. Traced to a cold-start regression in the planner hook.

Old behaviour
-------------
On the first tick after migration the hook found no
`sys_cron_executions` row for the trigger
(`latestPlan(...).Found == false`) and anchored on the trigger's
`created_at`, then applied the 24h replay cap:

    after := cfg.CreatedAt
    if oldest := now.Add(-replayWindow); after.Before(oldest) {
        after = oldest // now - 24h
    }

For a trigger created days/weeks earlier and last fired by the legacy
goroutine at Mon 17:10 Beijing (= Mon 09:10 UTC), this set
`after = Tue 04:13 UTC - 24h ≈ Mon 04:13 UTC`. The half-open enumeration
`(Mon 04:13 UTC, Tue 04:13 UTC]` STILL contained Mon 09:10 UTC, the
occurrence the legacy code had already handled, so the new scheduler
dispatched it again the moment it took over.

The result: a SCHEDULED-source autopilot_run with
planned_at = Mon 17:10 Beijing but a wall-clock dispatch at
Tue ~12:30 Beijing. Timezone math was correct; the bug was purely the
cold-start anchor not respecting prior-fire history.

Fix
---
The `autopilot_trigger.last_fired_at` column is maintained by both the
legacy goroutine and the new scheduler (via
TouchAutopilotTriggerFiredAt), so it is the authoritative "most-recent
successful fire" cursor across the migration boundary. The planner hook
now anchors cold-start enumeration on it:

    switch {
    case latest.Found:
        after = latest.PlanTime
    case lastFiredAt != zero:
        after = lastFiredAt
    default:
        after = cfg.CreatedAt
    }

For the regressed case, `after = Mon 17:10 Beijing`, the next
enumeration window is `(Mon 17:10, Tue 12:30]`, and Tue 17:10 is in the
future: the hook returns nothing and the trigger waits quietly for
Tue 17:10 as the UI promised.

For brand-new triggers (last_fired_at NULL), the original `created_at`
path still applies. For long-dormant triggers the `replayWindow` cap
remains.

Changes
-------
* `ListSchedulableAutopilotTriggers` SQL now returns `last_fired_at`.
* `autopilotTriggerConfig.LastFiredAt` is populated by the scope
  provider on every tick.
* `autopilotPlansForScope` cold-start branch uses the new anchor.

Tests
-----
* TestAutopilotScheduleJobColdStartHonorsLastFiredAt: seeds the exact
  dev-environment shape (created 3 days ago, last_fired_at 5 hours ago,
  no sys_cron_executions row), runs a tick, asserts zero exec rows AND
  zero autopilot_run rows. Without the fix this test produces one of
  each at a historical plan_time.
* TestAutopilotScheduleJobColdStartBrandNewTriggerStillFires: asserts a
  brand-new trigger (last_fired_at NULL) still fires its first due
  occurrence on cold start.

All existing `TestAutopilotScheduleJob*` tests still pass.

Refs MUL-3551

Co-authored-by: multica-agent <github@multica.ai>
Co-authored-by: Eve <eve@multica-ai.local>
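
Illustrative sketch (not the code in this commit): the anchor selection
described under Fix can be replayed in isolation to see why the old
anchor re-dispatched Mon 17:10 and the new one does not. The `trigger`
struct, the `anchor`, `due`, `regressedCase` and `brandNewCase` helpers,
the concrete 2025 dates, and the fixed UTC+8 zone are all assumptions
made for the example. It also assumes the replay cap is applied after
every branch, which the message above does not spell out.

```go
package main

import (
	"fmt"
	"time"
)

// replayWindow mirrors the 24h replay cap named in the commit message.
const replayWindow = 24 * time.Hour

// beijing is a fixed UTC+8 zone (assumption: Asia/Shanghai currently has
// no DST, so this matches it without needing a tzdata lookup).
var beijing = time.FixedZone("UTC+8", 8*60*60)

// trigger is a hypothetical stand-in for the state the planner hook reads.
type trigger struct {
	CreatedAt   time.Time
	LastFiredAt time.Time // zero value models last_fired_at NULL
	LatestPlan  time.Time // zero value models latestPlan(...).Found == false
}

// anchor returns the exclusive lower bound of the enumeration window.
// honorLastFired=false reproduces the old cold-start behaviour.
func anchor(t trigger, now time.Time, honorLastFired bool) time.Time {
	var after time.Time
	switch {
	case !t.LatestPlan.IsZero():
		after = t.LatestPlan
	case honorLastFired && !t.LastFiredAt.IsZero():
		after = t.LastFiredAt
	default:
		after = t.CreatedAt
	}
	// Assumption: the cap applies whichever branch was taken.
	if oldest := now.Add(-replayWindow); after.Before(oldest) {
		after = oldest
	}
	return after
}

// due lists "weekdays 17:10 Beijing" occurrences in the half-open
// window (after, now].
func due(after, now time.Time) []time.Time {
	var out []time.Time
	l := after.In(beijing)
	day := time.Date(l.Year(), l.Month(), l.Day(), 17, 10, 0, 0, beijing)
	for ; !day.After(now); day = day.AddDate(0, 0, 1) {
		if wd := day.Weekday(); wd == time.Saturday || wd == time.Sunday {
			continue
		}
		if day.After(after) {
			out = append(out, day)
		}
	}
	return out
}

// regressedCase models the incident: an old trigger, last fired by the
// legacy goroutine on Mon 17:10 Beijing, first tick at Tue 04:13 UTC.
func regressedCase(honorLastFired bool) []time.Time {
	now := time.Date(2025, 6, 3, 4, 13, 0, 0, time.UTC) // a Tuesday
	t := trigger{
		CreatedAt:   time.Date(2025, 5, 20, 0, 0, 0, 0, time.UTC),
		LastFiredAt: time.Date(2025, 6, 2, 9, 10, 0, 0, time.UTC), // Mon 17:10 Beijing
	}
	return due(anchor(t, now, honorLastFired), now)
}

// brandNewCase models a trigger created earlier the same day that has
// never fired; its first due occurrence should still be returned.
func brandNewCase() []time.Time {
	now := time.Date(2025, 6, 3, 9, 15, 0, 0, time.UTC) // Tue 17:15 Beijing
	t := trigger{CreatedAt: time.Date(2025, 6, 3, 0, 0, 0, 0, time.UTC)}
	return due(anchor(t, now, true), now)
}

func main() {
	fmt.Println("old anchor, regressed case:", len(regressedCase(false)))
	fmt.Println("new anchor, regressed case:", len(regressedCase(true)))
	fmt.Println("new anchor, brand-new trigger:", len(brandNewCase()))
}
```

Under these assumptions the old anchor yields one replayed occurrence
(Mon 17:10 Beijing), the new anchor yields none, and the brand-new
trigger still yields its first due occurrence, which is the same shape
the two tests listed above assert against the real scheduler.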