Files
multica/server/pkg/db/generated/autopilot.sql.go
Multica Eve a2dd80d4f6 feat(autopilot): skip dispatch when assignee runtime is offline (MUL-1899) (#2311)
* feat(autopilot): skip dispatch when assignee runtime is offline (MUL-1899)

Prevents scheduled autopilots from accumulating doomed tasks against
offline / archived / unbound agents. Before this change, a paused laptop
or crashed daemon would let a 5-minute-cron autopilot pile up thousands
of queued agent_task_queue rows that no runtime would ever drain — this
is the dominant source of the 89k stuck-task backlog flagged in MUL-1899.

DispatchAutopilot now performs a pre-flight admission check on the
assignee agent's runtime status. If the runtime is not 'online' (or the
agent is archived / has no runtime bound / has no assignee), the run is
recorded as 'skipped' with a failure_reason and no task is enqueued.
Skipped runs still emit autopilot:run.done so the UI / activity feed
reflect that the trigger fired and was evaluated.

Skipped runs are deliberately NOT counted toward the failure-ratio
auto-pause: a user who closes their laptop overnight should not have
their autopilot paused. Sustained server-side failures keep their
existing pause path via the failure monitor.

Tests: added an integration test that creates an offline runtime and
asserts DispatchAutopilot records a skipped run with no task enqueued.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

* feat(scheduler): expire stale queued tasks via TTL sweeper (MUL-1899)

Companion to the dispatch-time admission gate added in this PR. The
admission gate prevents *new* tasks from being enqueued against an
offline runtime, but it does not drain the historical backlog
(~89k stuck queued rows observed at MUL-1899 baseline) and does not
help when a runtime goes offline *after* a task has already been
queued. This adds a passive TTL sweeper:

- New SQL query `ExpireStaleQueuedTasks` transitions queued tasks
  older than the TTL to status='failed' with
  failure_reason='queued_expired' and a clear error message.
- Sweep is capped per tick (`queuedExpireBatchSize`, default 500) via
  a CTE+LIMIT so that draining a large backlog cannot monopolise the
  DB on a single tick. At 30s ticks the worst case is 60k rows/hour.
- Wired into the existing 30s `runRuntimeSweeper` loop alongside
  `sweepStaleTasks` and reuses `taskSvc.HandleFailedTasks` so the
  expired tasks broadcast `task:failed` events, reconcile agent
  status, and roll back any in-progress issues — same lifecycle as
  any other failed task.
- Default TTL = 2h. Conservatively above any reasonable
  "queued behind a long-running task" window (default agent timeout
  is 2h, sweeper runs every 30s) so legitimate work isn't expired.
- Integration tests cover the happy path (stale → expired, fresh →
  left alone, correct status/reason/error) and the per-tick batch cap.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

* fix(autopilot): address review blockers from PR #2311 (MUL-1899)

GPT-Boy review of the offline-runtime + queued-TTL PR flagged four
blockers; this commit addresses them all.

1. Restore the 'skipped' autopilot_run status in the DB constraint.
   Migration 043 had removed 'skipped' along with the now-defunct
   concurrency_policy feature, so the new admission gate's INSERT of
   status='skipped' violated `autopilot_run_status_check` and broke
   `TestAutopilotDispatchSkipsWhenRuntimeOffline` in CI. New
   migration 079 re-adds 'skipped' to the CHECK list. The down
   migration migrates skipped → failed before re-tightening, mirror-
   ing what 043 did for the original removal.

2. Make `ExpireStaleQueuedTasks` race-safe.
   The CTE-then-UPDATE pattern could clobber a task that the daemon
   claimed between victim selection and the outer update. Two
   guards added:
     - `FOR UPDATE SKIP LOCKED` in the CTE so we never wait on a
       row that's currently being claimed (and never block the
       claim path either).
     - The outer UPDATE now re-checks `t.status = 'queued'` AND the
       TTL predicate so even if a row's lock is released after a
       successful claim, we cannot transition a now-dispatched/
       running task to 'failed'.

3. Add a partial index for the queued-TTL sweeper.
   `idx_agent_task_queue_queued_created_at` on `created_at WHERE
   status = 'queued'` — keeps the 30s sweep query (status=queued
   AND created_at < ... ORDER BY created_at LIMIT 500) cheap even
   when historical terminal rows accumulate (~89k+ at MUL-1899
   baseline). The partial predicate keeps the index tiny because
   only in-flight rows live in 'queued'.

4. Fix the failure-monitor denominator.
   `SelectAutopilotsExceedingFailureThreshold` had been counting
   'skipped' toward total runs, which would have diluted the failure
   ratio: a 100%-failing autopilot could mask itself behind a wall
   of admission skips. With 'skipped' restored as a real status,
   the auto-pause monitor must explicitly exclude it from BOTH
   numerator and denominator — admission skips are neither a
   success nor a failure.

Verified: `go test ./cmd/server/... ./internal/service/...` passes
(including TestAutopilotDispatchSkipsWhenRuntimeOffline,
TestExpireStaleQueuedTasks, TestExpireStaleQueuedTasksRespectsBatch
Limit). `go build ./... && go vet ./...` clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

* fix(migrations): split queued-task TTL index into concurrent migration

Per PR #2311 review: agent_task_queue is a hot table, so building the
new partial index with plain CREATE INDEX inside migration 079 would
hold ACCESS EXCLUSIVE on the queue and block dispatch during deploy.

The migration runner does not allow CONCURRENTLY to share a file with
other statements (documented in 068), so split the index into its own
single-statement file 080 — matching the existing pattern in 035 /
067 / 074 / 075 / 078. Migration 079 keeps the autopilot_run
constraint change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: Eve <eve@multica-ai.local>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-09 15:07:57 +08:00

1082 lines
31 KiB
Go

// Code generated by sqlc. DO NOT EDIT.
// versions:
// sqlc v1.30.0
// source: autopilot.sql
package db
import (
"context"
"github.com/jackc/pgx/v5/pgtype"
)
const advanceTriggerNextRun = `-- name: AdvanceTriggerNextRun :exec
UPDATE autopilot_trigger
SET next_run_at = $2,
last_fired_at = now(),
updated_at = now()
WHERE id = $1
`
type AdvanceTriggerNextRunParams struct {
ID pgtype.UUID `json:"id"`
NextRunAt pgtype.Timestamptz `json:"next_run_at"`
}
func (q *Queries) AdvanceTriggerNextRun(ctx context.Context, arg AdvanceTriggerNextRunParams) error {
_, err := q.db.Exec(ctx, advanceTriggerNextRun, arg.ID, arg.NextRunAt)
return err
}
const claimDueScheduleTriggers = `-- name: ClaimDueScheduleTriggers :many
UPDATE autopilot_trigger t
SET next_run_at = NULL
FROM autopilot a
WHERE t.autopilot_id = a.id
AND t.kind = 'schedule'
AND t.enabled = true
AND t.next_run_at IS NOT NULL
AND t.next_run_at <= now()
AND a.status = 'active'
RETURNING t.id, t.autopilot_id, t.kind, t.enabled, t.cron_expression, t.timezone, t.next_run_at, t.webhook_token, t.label, t.last_fired_at, t.created_at, t.updated_at, a.workspace_id AS autopilot_workspace_id
`
type ClaimDueScheduleTriggersRow struct {
ID pgtype.UUID `json:"id"`
AutopilotID pgtype.UUID `json:"autopilot_id"`
Kind string `json:"kind"`
Enabled bool `json:"enabled"`
CronExpression pgtype.Text `json:"cron_expression"`
Timezone pgtype.Text `json:"timezone"`
NextRunAt pgtype.Timestamptz `json:"next_run_at"`
WebhookToken pgtype.Text `json:"webhook_token"`
Label pgtype.Text `json:"label"`
LastFiredAt pgtype.Timestamptz `json:"last_fired_at"`
CreatedAt pgtype.Timestamptz `json:"created_at"`
UpdatedAt pgtype.Timestamptz `json:"updated_at"`
AutopilotWorkspaceID pgtype.UUID `json:"autopilot_workspace_id"`
}
// =====================
// Scheduler Queries
// =====================
// Atomically claim all due schedule triggers to prevent concurrent execution.
// Joins the autopilot table to ensure only active autopilots are fired.
func (q *Queries) ClaimDueScheduleTriggers(ctx context.Context) ([]ClaimDueScheduleTriggersRow, error) {
rows, err := q.db.Query(ctx, claimDueScheduleTriggers)
if err != nil {
return nil, err
}
defer rows.Close()
items := []ClaimDueScheduleTriggersRow{}
for rows.Next() {
var i ClaimDueScheduleTriggersRow
if err := rows.Scan(
&i.ID,
&i.AutopilotID,
&i.Kind,
&i.Enabled,
&i.CronExpression,
&i.Timezone,
&i.NextRunAt,
&i.WebhookToken,
&i.Label,
&i.LastFiredAt,
&i.CreatedAt,
&i.UpdatedAt,
&i.AutopilotWorkspaceID,
); err != nil {
return nil, err
}
items = append(items, i)
}
if err := rows.Err(); err != nil {
return nil, err
}
return items, nil
}
const createAutopilot = `-- name: CreateAutopilot :one
INSERT INTO autopilot (
workspace_id, title, description, assignee_id,
status, execution_mode, issue_title_template,
created_by_type, created_by_id
) VALUES (
$1, $2, $8, $3,
$4, $5, $9,
$6, $7
) RETURNING id, workspace_id, title, description, assignee_id, status, execution_mode, issue_title_template, created_by_type, created_by_id, last_run_at, created_at, updated_at
`
type CreateAutopilotParams struct {
WorkspaceID pgtype.UUID `json:"workspace_id"`
Title string `json:"title"`
AssigneeID pgtype.UUID `json:"assignee_id"`
Status string `json:"status"`
ExecutionMode string `json:"execution_mode"`
CreatedByType string `json:"created_by_type"`
CreatedByID pgtype.UUID `json:"created_by_id"`
Description pgtype.Text `json:"description"`
IssueTitleTemplate pgtype.Text `json:"issue_title_template"`
}
func (q *Queries) CreateAutopilot(ctx context.Context, arg CreateAutopilotParams) (Autopilot, error) {
row := q.db.QueryRow(ctx, createAutopilot,
arg.WorkspaceID,
arg.Title,
arg.AssigneeID,
arg.Status,
arg.ExecutionMode,
arg.CreatedByType,
arg.CreatedByID,
arg.Description,
arg.IssueTitleTemplate,
)
var i Autopilot
err := row.Scan(
&i.ID,
&i.WorkspaceID,
&i.Title,
&i.Description,
&i.AssigneeID,
&i.Status,
&i.ExecutionMode,
&i.IssueTitleTemplate,
&i.CreatedByType,
&i.CreatedByID,
&i.LastRunAt,
&i.CreatedAt,
&i.UpdatedAt,
)
return i, err
}
const createAutopilotRun = `-- name: CreateAutopilotRun :one
INSERT INTO autopilot_run (
autopilot_id, trigger_id, source, status, trigger_payload
) VALUES (
$1, $4, $2, $3, $5
) RETURNING id, autopilot_id, trigger_id, source, status, issue_id, task_id, triggered_at, completed_at, failure_reason, trigger_payload, result, created_at
`
type CreateAutopilotRunParams struct {
AutopilotID pgtype.UUID `json:"autopilot_id"`
Source string `json:"source"`
Status string `json:"status"`
TriggerID pgtype.UUID `json:"trigger_id"`
TriggerPayload []byte `json:"trigger_payload"`
}
// =====================
// Autopilot Run Management
// =====================
func (q *Queries) CreateAutopilotRun(ctx context.Context, arg CreateAutopilotRunParams) (AutopilotRun, error) {
row := q.db.QueryRow(ctx, createAutopilotRun,
arg.AutopilotID,
arg.Source,
arg.Status,
arg.TriggerID,
arg.TriggerPayload,
)
var i AutopilotRun
err := row.Scan(
&i.ID,
&i.AutopilotID,
&i.TriggerID,
&i.Source,
&i.Status,
&i.IssueID,
&i.TaskID,
&i.TriggeredAt,
&i.CompletedAt,
&i.FailureReason,
&i.TriggerPayload,
&i.Result,
&i.CreatedAt,
)
return i, err
}
const createAutopilotTask = `-- name: CreateAutopilotTask :one
INSERT INTO agent_task_queue (agent_id, runtime_id, issue_id, status, priority, autopilot_run_id, trigger_summary)
VALUES ($1, $2, NULL, 'queued', $3, $4, $5)
RETURNING id, agent_id, issue_id, status, priority, dispatched_at, started_at, completed_at, result, error, created_at, context, runtime_id, session_id, work_dir, trigger_comment_id, chat_session_id, autopilot_run_id, attempt, max_attempts, parent_task_id, failure_reason, trigger_summary, force_fresh_session
`
type CreateAutopilotTaskParams struct {
AgentID pgtype.UUID `json:"agent_id"`
RuntimeID pgtype.UUID `json:"runtime_id"`
Priority int32 `json:"priority"`
AutopilotRunID pgtype.UUID `json:"autopilot_run_id"`
TriggerSummary pgtype.Text `json:"trigger_summary"`
}
// =====================
// Task Queue (run_only mode)
// =====================
func (q *Queries) CreateAutopilotTask(ctx context.Context, arg CreateAutopilotTaskParams) (AgentTaskQueue, error) {
row := q.db.QueryRow(ctx, createAutopilotTask,
arg.AgentID,
arg.RuntimeID,
arg.Priority,
arg.AutopilotRunID,
arg.TriggerSummary,
)
var i AgentTaskQueue
err := row.Scan(
&i.ID,
&i.AgentID,
&i.IssueID,
&i.Status,
&i.Priority,
&i.DispatchedAt,
&i.StartedAt,
&i.CompletedAt,
&i.Result,
&i.Error,
&i.CreatedAt,
&i.Context,
&i.RuntimeID,
&i.SessionID,
&i.WorkDir,
&i.TriggerCommentID,
&i.ChatSessionID,
&i.AutopilotRunID,
&i.Attempt,
&i.MaxAttempts,
&i.ParentTaskID,
&i.FailureReason,
&i.TriggerSummary,
&i.ForceFreshSession,
)
return i, err
}
const createAutopilotTrigger = `-- name: CreateAutopilotTrigger :one
INSERT INTO autopilot_trigger (
autopilot_id, kind, enabled, cron_expression, timezone,
next_run_at, webhook_token, label
) VALUES (
$1, $2, $3, $4, $5,
$6, $7, $8
) RETURNING id, autopilot_id, kind, enabled, cron_expression, timezone, next_run_at, webhook_token, label, last_fired_at, created_at, updated_at
`
type CreateAutopilotTriggerParams struct {
AutopilotID pgtype.UUID `json:"autopilot_id"`
Kind string `json:"kind"`
Enabled bool `json:"enabled"`
CronExpression pgtype.Text `json:"cron_expression"`
Timezone pgtype.Text `json:"timezone"`
NextRunAt pgtype.Timestamptz `json:"next_run_at"`
WebhookToken pgtype.Text `json:"webhook_token"`
Label pgtype.Text `json:"label"`
}
func (q *Queries) CreateAutopilotTrigger(ctx context.Context, arg CreateAutopilotTriggerParams) (AutopilotTrigger, error) {
row := q.db.QueryRow(ctx, createAutopilotTrigger,
arg.AutopilotID,
arg.Kind,
arg.Enabled,
arg.CronExpression,
arg.Timezone,
arg.NextRunAt,
arg.WebhookToken,
arg.Label,
)
var i AutopilotTrigger
err := row.Scan(
&i.ID,
&i.AutopilotID,
&i.Kind,
&i.Enabled,
&i.CronExpression,
&i.Timezone,
&i.NextRunAt,
&i.WebhookToken,
&i.Label,
&i.LastFiredAt,
&i.CreatedAt,
&i.UpdatedAt,
)
return i, err
}
const deleteAutopilot = `-- name: DeleteAutopilot :exec
DELETE FROM autopilot WHERE id = $1
`
func (q *Queries) DeleteAutopilot(ctx context.Context, id pgtype.UUID) error {
_, err := q.db.Exec(ctx, deleteAutopilot, id)
return err
}
const deleteAutopilotTrigger = `-- name: DeleteAutopilotTrigger :exec
DELETE FROM autopilot_trigger WHERE id = $1
`
func (q *Queries) DeleteAutopilotTrigger(ctx context.Context, id pgtype.UUID) error {
_, err := q.db.Exec(ctx, deleteAutopilotTrigger, id)
return err
}
const failAutopilotRunsByIssue = `-- name: FailAutopilotRunsByIssue :exec
UPDATE autopilot_run
SET status = 'failed', completed_at = now(), failure_reason = 'linked issue was deleted'
WHERE issue_id = $1
AND status IN ('issue_created', 'running')
`
// Fails active autopilot runs linked to a given issue.
// Must be called BEFORE issue deletion (ON DELETE SET NULL clears issue_id).
func (q *Queries) FailAutopilotRunsByIssue(ctx context.Context, issueID pgtype.UUID) error {
_, err := q.db.Exec(ctx, failAutopilotRunsByIssue, issueID)
return err
}
const getAutopilot = `-- name: GetAutopilot :one
SELECT id, workspace_id, title, description, assignee_id, status, execution_mode, issue_title_template, created_by_type, created_by_id, last_run_at, created_at, updated_at FROM autopilot
WHERE id = $1
`
func (q *Queries) GetAutopilot(ctx context.Context, id pgtype.UUID) (Autopilot, error) {
row := q.db.QueryRow(ctx, getAutopilot, id)
var i Autopilot
err := row.Scan(
&i.ID,
&i.WorkspaceID,
&i.Title,
&i.Description,
&i.AssigneeID,
&i.Status,
&i.ExecutionMode,
&i.IssueTitleTemplate,
&i.CreatedByType,
&i.CreatedByID,
&i.LastRunAt,
&i.CreatedAt,
&i.UpdatedAt,
)
return i, err
}
const getAutopilotInWorkspace = `-- name: GetAutopilotInWorkspace :one
SELECT id, workspace_id, title, description, assignee_id, status, execution_mode, issue_title_template, created_by_type, created_by_id, last_run_at, created_at, updated_at FROM autopilot
WHERE id = $1 AND workspace_id = $2
`
type GetAutopilotInWorkspaceParams struct {
ID pgtype.UUID `json:"id"`
WorkspaceID pgtype.UUID `json:"workspace_id"`
}
func (q *Queries) GetAutopilotInWorkspace(ctx context.Context, arg GetAutopilotInWorkspaceParams) (Autopilot, error) {
row := q.db.QueryRow(ctx, getAutopilotInWorkspace, arg.ID, arg.WorkspaceID)
var i Autopilot
err := row.Scan(
&i.ID,
&i.WorkspaceID,
&i.Title,
&i.Description,
&i.AssigneeID,
&i.Status,
&i.ExecutionMode,
&i.IssueTitleTemplate,
&i.CreatedByType,
&i.CreatedByID,
&i.LastRunAt,
&i.CreatedAt,
&i.UpdatedAt,
)
return i, err
}
const getAutopilotRun = `-- name: GetAutopilotRun :one
SELECT id, autopilot_id, trigger_id, source, status, issue_id, task_id, triggered_at, completed_at, failure_reason, trigger_payload, result, created_at FROM autopilot_run
WHERE id = $1
`
func (q *Queries) GetAutopilotRun(ctx context.Context, id pgtype.UUID) (AutopilotRun, error) {
row := q.db.QueryRow(ctx, getAutopilotRun, id)
var i AutopilotRun
err := row.Scan(
&i.ID,
&i.AutopilotID,
&i.TriggerID,
&i.Source,
&i.Status,
&i.IssueID,
&i.TaskID,
&i.TriggeredAt,
&i.CompletedAt,
&i.FailureReason,
&i.TriggerPayload,
&i.Result,
&i.CreatedAt,
)
return i, err
}
const getAutopilotRunByIssue = `-- name: GetAutopilotRunByIssue :one
SELECT id, autopilot_id, trigger_id, source, status, issue_id, task_id, triggered_at, completed_at, failure_reason, trigger_payload, result, created_at FROM autopilot_run
WHERE issue_id = $1 AND status IN ('issue_created', 'running')
LIMIT 1
`
// =====================
// Run lookup by linked entities
// =====================
func (q *Queries) GetAutopilotRunByIssue(ctx context.Context, issueID pgtype.UUID) (AutopilotRun, error) {
row := q.db.QueryRow(ctx, getAutopilotRunByIssue, issueID)
var i AutopilotRun
err := row.Scan(
&i.ID,
&i.AutopilotID,
&i.TriggerID,
&i.Source,
&i.Status,
&i.IssueID,
&i.TaskID,
&i.TriggeredAt,
&i.CompletedAt,
&i.FailureReason,
&i.TriggerPayload,
&i.Result,
&i.CreatedAt,
)
return i, err
}
const getAutopilotTrigger = `-- name: GetAutopilotTrigger :one
SELECT id, autopilot_id, kind, enabled, cron_expression, timezone, next_run_at, webhook_token, label, last_fired_at, created_at, updated_at FROM autopilot_trigger
WHERE id = $1
`
func (q *Queries) GetAutopilotTrigger(ctx context.Context, id pgtype.UUID) (AutopilotTrigger, error) {
row := q.db.QueryRow(ctx, getAutopilotTrigger, id)
var i AutopilotTrigger
err := row.Scan(
&i.ID,
&i.AutopilotID,
&i.Kind,
&i.Enabled,
&i.CronExpression,
&i.Timezone,
&i.NextRunAt,
&i.WebhookToken,
&i.Label,
&i.LastFiredAt,
&i.CreatedAt,
&i.UpdatedAt,
)
return i, err
}
const listAutopilotRuns = `-- name: ListAutopilotRuns :many
SELECT id, autopilot_id, trigger_id, source, status, issue_id, task_id, triggered_at, completed_at, failure_reason, trigger_payload, result, created_at FROM autopilot_run
WHERE autopilot_id = $1
ORDER BY created_at DESC
LIMIT $2 OFFSET $3
`
type ListAutopilotRunsParams struct {
AutopilotID pgtype.UUID `json:"autopilot_id"`
Limit int32 `json:"limit"`
Offset int32 `json:"offset"`
}
func (q *Queries) ListAutopilotRuns(ctx context.Context, arg ListAutopilotRunsParams) ([]AutopilotRun, error) {
rows, err := q.db.Query(ctx, listAutopilotRuns, arg.AutopilotID, arg.Limit, arg.Offset)
if err != nil {
return nil, err
}
defer rows.Close()
items := []AutopilotRun{}
for rows.Next() {
var i AutopilotRun
if err := rows.Scan(
&i.ID,
&i.AutopilotID,
&i.TriggerID,
&i.Source,
&i.Status,
&i.IssueID,
&i.TaskID,
&i.TriggeredAt,
&i.CompletedAt,
&i.FailureReason,
&i.TriggerPayload,
&i.Result,
&i.CreatedAt,
); err != nil {
return nil, err
}
items = append(items, i)
}
if err := rows.Err(); err != nil {
return nil, err
}
return items, nil
}
const listAutopilotTriggers = `-- name: ListAutopilotTriggers :many
SELECT id, autopilot_id, kind, enabled, cron_expression, timezone, next_run_at, webhook_token, label, last_fired_at, created_at, updated_at FROM autopilot_trigger
WHERE autopilot_id = $1
ORDER BY created_at ASC
`
// =====================
// Autopilot Trigger CRUD
// =====================
func (q *Queries) ListAutopilotTriggers(ctx context.Context, autopilotID pgtype.UUID) ([]AutopilotTrigger, error) {
rows, err := q.db.Query(ctx, listAutopilotTriggers, autopilotID)
if err != nil {
return nil, err
}
defer rows.Close()
items := []AutopilotTrigger{}
for rows.Next() {
var i AutopilotTrigger
if err := rows.Scan(
&i.ID,
&i.AutopilotID,
&i.Kind,
&i.Enabled,
&i.CronExpression,
&i.Timezone,
&i.NextRunAt,
&i.WebhookToken,
&i.Label,
&i.LastFiredAt,
&i.CreatedAt,
&i.UpdatedAt,
); err != nil {
return nil, err
}
items = append(items, i)
}
if err := rows.Err(); err != nil {
return nil, err
}
return items, nil
}
const listAutopilots = `-- name: ListAutopilots :many
SELECT id, workspace_id, title, description, assignee_id, status, execution_mode, issue_title_template, created_by_type, created_by_id, last_run_at, created_at, updated_at FROM autopilot
WHERE workspace_id = $1
AND ($2::text IS NULL OR status = $2)
ORDER BY created_at DESC
`
type ListAutopilotsParams struct {
WorkspaceID pgtype.UUID `json:"workspace_id"`
Status pgtype.Text `json:"status"`
}
// =====================
// Autopilot CRUD
// =====================
func (q *Queries) ListAutopilots(ctx context.Context, arg ListAutopilotsParams) ([]Autopilot, error) {
rows, err := q.db.Query(ctx, listAutopilots, arg.WorkspaceID, arg.Status)
if err != nil {
return nil, err
}
defer rows.Close()
items := []Autopilot{}
for rows.Next() {
var i Autopilot
if err := rows.Scan(
&i.ID,
&i.WorkspaceID,
&i.Title,
&i.Description,
&i.AssigneeID,
&i.Status,
&i.ExecutionMode,
&i.IssueTitleTemplate,
&i.CreatedByType,
&i.CreatedByID,
&i.LastRunAt,
&i.CreatedAt,
&i.UpdatedAt,
); err != nil {
return nil, err
}
items = append(items, i)
}
if err := rows.Err(); err != nil {
return nil, err
}
return items, nil
}
const recoverLostTriggers = `-- name: RecoverLostTriggers :many
SELECT t.id, t.autopilot_id, t.kind, t.enabled, t.cron_expression, t.timezone, t.next_run_at, t.webhook_token, t.label, t.last_fired_at, t.created_at, t.updated_at, a.workspace_id AS autopilot_workspace_id
FROM autopilot_trigger t
JOIN autopilot a ON t.autopilot_id = a.id
WHERE t.kind = 'schedule'
AND t.enabled = true
AND t.next_run_at IS NULL
AND t.cron_expression IS NOT NULL
AND a.status = 'active'
`
type RecoverLostTriggersRow struct {
ID pgtype.UUID `json:"id"`
AutopilotID pgtype.UUID `json:"autopilot_id"`
Kind string `json:"kind"`
Enabled bool `json:"enabled"`
CronExpression pgtype.Text `json:"cron_expression"`
Timezone pgtype.Text `json:"timezone"`
NextRunAt pgtype.Timestamptz `json:"next_run_at"`
WebhookToken pgtype.Text `json:"webhook_token"`
Label pgtype.Text `json:"label"`
LastFiredAt pgtype.Timestamptz `json:"last_fired_at"`
CreatedAt pgtype.Timestamptz `json:"created_at"`
UpdatedAt pgtype.Timestamptz `json:"updated_at"`
AutopilotWorkspaceID pgtype.UUID `json:"autopilot_workspace_id"`
}
// =====================
// Scheduler Recovery
// =====================
// Finds schedule triggers that were claimed (next_run_at = NULL) but never
// advanced — typically due to a scheduler crash. Returns them so the scheduler
// can recompute next_run_at.
func (q *Queries) RecoverLostTriggers(ctx context.Context) ([]RecoverLostTriggersRow, error) {
rows, err := q.db.Query(ctx, recoverLostTriggers)
if err != nil {
return nil, err
}
defer rows.Close()
items := []RecoverLostTriggersRow{}
for rows.Next() {
var i RecoverLostTriggersRow
if err := rows.Scan(
&i.ID,
&i.AutopilotID,
&i.Kind,
&i.Enabled,
&i.CronExpression,
&i.Timezone,
&i.NextRunAt,
&i.WebhookToken,
&i.Label,
&i.LastFiredAt,
&i.CreatedAt,
&i.UpdatedAt,
&i.AutopilotWorkspaceID,
); err != nil {
return nil, err
}
items = append(items, i)
}
if err := rows.Err(); err != nil {
return nil, err
}
return items, nil
}
const selectAutopilotsExceedingFailureThreshold = `-- name: SelectAutopilotsExceedingFailureThreshold :many
WITH stats AS (
SELECT autopilot_id,
count(*) FILTER (WHERE status IN ('completed', 'failed')) AS total,
count(*) FILTER (WHERE status = 'failed') AS failed
FROM autopilot_run
WHERE created_at >= $3::timestamptz
GROUP BY autopilot_id
)
SELECT a.id, a.workspace_id, a.title, a.assignee_id,
a.created_by_type, a.created_by_id,
s.total::bigint AS total_runs,
s.failed::bigint AS failed_runs
FROM autopilot a
JOIN stats s ON s.autopilot_id = a.id
WHERE a.status = 'active'
AND s.total >= $1::bigint
AND s.failed::float8 / NULLIF(s.total, 0)::float8 >= $2::float8
ORDER BY s.failed DESC, a.id ASC
`
type SelectAutopilotsExceedingFailureThresholdParams struct {
MinRuns int64 `json:"min_runs"`
FailRatioThreshold float64 `json:"fail_ratio_threshold"`
Since pgtype.Timestamptz `json:"since"`
}
type SelectAutopilotsExceedingFailureThresholdRow struct {
ID pgtype.UUID `json:"id"`
WorkspaceID pgtype.UUID `json:"workspace_id"`
Title string `json:"title"`
AssigneeID pgtype.UUID `json:"assignee_id"`
CreatedByType string `json:"created_by_type"`
CreatedByID pgtype.UUID `json:"created_by_id"`
TotalRuns int64 `json:"total_runs"`
FailedRuns int64 `json:"failed_runs"`
}
// =====================
// Failure-rate auto-pause
// =====================
// Find active autopilots whose recent run failure rate exceeds the threshold.
// Counts only "real" terminal runs (completed | failed). 'skipped' is
// excluded from BOTH numerator and denominator: an admission-skipped run
// (e.g. assignee runtime offline at dispatch time, MUL-1899) is neither a
// success nor a failure, so it must not dilute the failure ratio (which
// would let a 100%-failing autopilot mask itself behind a wall of skips)
// nor inflate it. issue_created/running are still excluded so in-flight
// work isn't penalised.
// Used by the failure monitor to auto-pause sustained-failure autopilots
// (the canonical example from MUL-1336 was an autopilot scheduled every 5 min
// that 100% failed for days, burning ~1.5k useless tasks per week).
func (q *Queries) SelectAutopilotsExceedingFailureThreshold(ctx context.Context, arg SelectAutopilotsExceedingFailureThresholdParams) ([]SelectAutopilotsExceedingFailureThresholdRow, error) {
rows, err := q.db.Query(ctx, selectAutopilotsExceedingFailureThreshold, arg.MinRuns, arg.FailRatioThreshold, arg.Since)
if err != nil {
return nil, err
}
defer rows.Close()
items := []SelectAutopilotsExceedingFailureThresholdRow{}
for rows.Next() {
var i SelectAutopilotsExceedingFailureThresholdRow
if err := rows.Scan(
&i.ID,
&i.WorkspaceID,
&i.Title,
&i.AssigneeID,
&i.CreatedByType,
&i.CreatedByID,
&i.TotalRuns,
&i.FailedRuns,
); err != nil {
return nil, err
}
items = append(items, i)
}
if err := rows.Err(); err != nil {
return nil, err
}
return items, nil
}
const systemPauseAutopilot = `-- name: SystemPauseAutopilot :one
UPDATE autopilot
SET status = 'paused', updated_at = now()
WHERE id = $1 AND status = 'active'
RETURNING id, workspace_id, title, description, assignee_id, status, execution_mode, issue_title_template, created_by_type, created_by_id, last_run_at, created_at, updated_at
`
// Atomically pauses an autopilot only if it is currently active. Returns no
// rows when the autopilot was already paused/archived (or another worker
// raced first), letting the caller treat that as a benign no-op rather than
// an error.
func (q *Queries) SystemPauseAutopilot(ctx context.Context, id pgtype.UUID) (Autopilot, error) {
row := q.db.QueryRow(ctx, systemPauseAutopilot, id)
var i Autopilot
err := row.Scan(
&i.ID,
&i.WorkspaceID,
&i.Title,
&i.Description,
&i.AssigneeID,
&i.Status,
&i.ExecutionMode,
&i.IssueTitleTemplate,
&i.CreatedByType,
&i.CreatedByID,
&i.LastRunAt,
&i.CreatedAt,
&i.UpdatedAt,
)
return i, err
}
const updateAutopilot = `-- name: UpdateAutopilot :one
UPDATE autopilot SET
title = COALESCE($2, title),
description = COALESCE($3, description),
assignee_id = COALESCE($4::uuid, assignee_id),
status = COALESCE($5, status),
execution_mode = COALESCE($6, execution_mode),
issue_title_template = $7,
updated_at = now()
WHERE id = $1
RETURNING id, workspace_id, title, description, assignee_id, status, execution_mode, issue_title_template, created_by_type, created_by_id, last_run_at, created_at, updated_at
`
type UpdateAutopilotParams struct {
ID pgtype.UUID `json:"id"`
Title pgtype.Text `json:"title"`
Description pgtype.Text `json:"description"`
AssigneeID pgtype.UUID `json:"assignee_id"`
Status pgtype.Text `json:"status"`
ExecutionMode pgtype.Text `json:"execution_mode"`
IssueTitleTemplate pgtype.Text `json:"issue_title_template"`
}
func (q *Queries) UpdateAutopilot(ctx context.Context, arg UpdateAutopilotParams) (Autopilot, error) {
row := q.db.QueryRow(ctx, updateAutopilot,
arg.ID,
arg.Title,
arg.Description,
arg.AssigneeID,
arg.Status,
arg.ExecutionMode,
arg.IssueTitleTemplate,
)
var i Autopilot
err := row.Scan(
&i.ID,
&i.WorkspaceID,
&i.Title,
&i.Description,
&i.AssigneeID,
&i.Status,
&i.ExecutionMode,
&i.IssueTitleTemplate,
&i.CreatedByType,
&i.CreatedByID,
&i.LastRunAt,
&i.CreatedAt,
&i.UpdatedAt,
)
return i, err
}
const updateAutopilotLastRunAt = `-- name: UpdateAutopilotLastRunAt :exec
UPDATE autopilot SET last_run_at = now(), updated_at = now()
WHERE id = $1
`
func (q *Queries) UpdateAutopilotLastRunAt(ctx context.Context, id pgtype.UUID) error {
_, err := q.db.Exec(ctx, updateAutopilotLastRunAt, id)
return err
}
const updateAutopilotRunCompleted = `-- name: UpdateAutopilotRunCompleted :one
UPDATE autopilot_run
SET status = 'completed', completed_at = now(), result = $2
WHERE id = $1
RETURNING id, autopilot_id, trigger_id, source, status, issue_id, task_id, triggered_at, completed_at, failure_reason, trigger_payload, result, created_at
`
type UpdateAutopilotRunCompletedParams struct {
ID pgtype.UUID `json:"id"`
Result []byte `json:"result"`
}
func (q *Queries) UpdateAutopilotRunCompleted(ctx context.Context, arg UpdateAutopilotRunCompletedParams) (AutopilotRun, error) {
row := q.db.QueryRow(ctx, updateAutopilotRunCompleted, arg.ID, arg.Result)
var i AutopilotRun
err := row.Scan(
&i.ID,
&i.AutopilotID,
&i.TriggerID,
&i.Source,
&i.Status,
&i.IssueID,
&i.TaskID,
&i.TriggeredAt,
&i.CompletedAt,
&i.FailureReason,
&i.TriggerPayload,
&i.Result,
&i.CreatedAt,
)
return i, err
}
const updateAutopilotRunFailed = `-- name: UpdateAutopilotRunFailed :one
UPDATE autopilot_run
SET status = 'failed', completed_at = now(), failure_reason = $2
WHERE id = $1
RETURNING id, autopilot_id, trigger_id, source, status, issue_id, task_id, triggered_at, completed_at, failure_reason, trigger_payload, result, created_at
`
type UpdateAutopilotRunFailedParams struct {
ID pgtype.UUID `json:"id"`
FailureReason pgtype.Text `json:"failure_reason"`
}
func (q *Queries) UpdateAutopilotRunFailed(ctx context.Context, arg UpdateAutopilotRunFailedParams) (AutopilotRun, error) {
row := q.db.QueryRow(ctx, updateAutopilotRunFailed, arg.ID, arg.FailureReason)
var i AutopilotRun
err := row.Scan(
&i.ID,
&i.AutopilotID,
&i.TriggerID,
&i.Source,
&i.Status,
&i.IssueID,
&i.TaskID,
&i.TriggeredAt,
&i.CompletedAt,
&i.FailureReason,
&i.TriggerPayload,
&i.Result,
&i.CreatedAt,
)
return i, err
}
const updateAutopilotRunIssueCreated = `-- name: UpdateAutopilotRunIssueCreated :one
UPDATE autopilot_run
SET status = 'issue_created', issue_id = $2
WHERE id = $1
RETURNING id, autopilot_id, trigger_id, source, status, issue_id, task_id, triggered_at, completed_at, failure_reason, trigger_payload, result, created_at
`
type UpdateAutopilotRunIssueCreatedParams struct {
ID pgtype.UUID `json:"id"`
IssueID pgtype.UUID `json:"issue_id"`
}
func (q *Queries) UpdateAutopilotRunIssueCreated(ctx context.Context, arg UpdateAutopilotRunIssueCreatedParams) (AutopilotRun, error) {
row := q.db.QueryRow(ctx, updateAutopilotRunIssueCreated, arg.ID, arg.IssueID)
var i AutopilotRun
err := row.Scan(
&i.ID,
&i.AutopilotID,
&i.TriggerID,
&i.Source,
&i.Status,
&i.IssueID,
&i.TaskID,
&i.TriggeredAt,
&i.CompletedAt,
&i.FailureReason,
&i.TriggerPayload,
&i.Result,
&i.CreatedAt,
)
return i, err
}
const updateAutopilotRunRunning = `-- name: UpdateAutopilotRunRunning :one
UPDATE autopilot_run
SET status = 'running', task_id = $2
WHERE id = $1
RETURNING id, autopilot_id, trigger_id, source, status, issue_id, task_id, triggered_at, completed_at, failure_reason, trigger_payload, result, created_at
`
type UpdateAutopilotRunRunningParams struct {
ID pgtype.UUID `json:"id"`
TaskID pgtype.UUID `json:"task_id"`
}
func (q *Queries) UpdateAutopilotRunRunning(ctx context.Context, arg UpdateAutopilotRunRunningParams) (AutopilotRun, error) {
row := q.db.QueryRow(ctx, updateAutopilotRunRunning, arg.ID, arg.TaskID)
var i AutopilotRun
err := row.Scan(
&i.ID,
&i.AutopilotID,
&i.TriggerID,
&i.Source,
&i.Status,
&i.IssueID,
&i.TaskID,
&i.TriggeredAt,
&i.CompletedAt,
&i.FailureReason,
&i.TriggerPayload,
&i.Result,
&i.CreatedAt,
)
return i, err
}
const updateAutopilotRunSkipped = `-- name: UpdateAutopilotRunSkipped :one
UPDATE autopilot_run
SET status = 'skipped', completed_at = now(), failure_reason = $2
WHERE id = $1
RETURNING id, autopilot_id, trigger_id, source, status, issue_id, task_id, triggered_at, completed_at, failure_reason, trigger_payload, result, created_at
`
type UpdateAutopilotRunSkippedParams struct {
ID pgtype.UUID `json:"id"`
FailureReason pgtype.Text `json:"failure_reason"`
}
// Marks an autopilot_run as skipped without enqueueing any task. Used by the
// pre-flight admission check when the assignee agent's runtime is offline:
// creating an issue / task in that state would just pile a doomed job onto
// agent_task_queue (the canonical "持续给离线 local agent 入队" symptom from
// MUL-1899). Recording the skip + reason gives the UI / failure monitor / ops
// a paper trail without polluting the failure ratio.
func (q *Queries) UpdateAutopilotRunSkipped(ctx context.Context, arg UpdateAutopilotRunSkippedParams) (AutopilotRun, error) {
row := q.db.QueryRow(ctx, updateAutopilotRunSkipped, arg.ID, arg.FailureReason)
var i AutopilotRun
err := row.Scan(
&i.ID,
&i.AutopilotID,
&i.TriggerID,
&i.Source,
&i.Status,
&i.IssueID,
&i.TaskID,
&i.TriggeredAt,
&i.CompletedAt,
&i.FailureReason,
&i.TriggerPayload,
&i.Result,
&i.CreatedAt,
)
return i, err
}
const updateAutopilotTrigger = `-- name: UpdateAutopilotTrigger :one
UPDATE autopilot_trigger SET
enabled = COALESCE($2::boolean, enabled),
cron_expression = COALESCE($3, cron_expression),
timezone = COALESCE($4, timezone),
next_run_at = $5,
label = COALESCE($6, label),
updated_at = now()
WHERE id = $1
RETURNING id, autopilot_id, kind, enabled, cron_expression, timezone, next_run_at, webhook_token, label, last_fired_at, created_at, updated_at
`
type UpdateAutopilotTriggerParams struct {
ID pgtype.UUID `json:"id"`
Enabled pgtype.Bool `json:"enabled"`
CronExpression pgtype.Text `json:"cron_expression"`
Timezone pgtype.Text `json:"timezone"`
NextRunAt pgtype.Timestamptz `json:"next_run_at"`
Label pgtype.Text `json:"label"`
}
func (q *Queries) UpdateAutopilotTrigger(ctx context.Context, arg UpdateAutopilotTriggerParams) (AutopilotTrigger, error) {
row := q.db.QueryRow(ctx, updateAutopilotTrigger,
arg.ID,
arg.Enabled,
arg.CronExpression,
arg.Timezone,
arg.NextRunAt,
arg.Label,
)
var i AutopilotTrigger
err := row.Scan(
&i.ID,
&i.AutopilotID,
&i.Kind,
&i.Enabled,
&i.CronExpression,
&i.Timezone,
&i.NextRunAt,
&i.WebhookToken,
&i.Label,
&i.LastFiredAt,
&i.CreatedAt,
&i.UpdatedAt,
)
return i, err
}