Files
multica/server/migrations/067_task_queue_claim_candidate_index.up.sql
Bohan Jiang 2dddfaa196 feat(daemon): Redis empty-claim fast path for /tasks/claim polling (#1860)
* feat(daemon): Redis empty-claim fast path for /tasks/claim polling

Daemons poll /tasks/claim every 30s per runtime; the steady-state
warm-empty case currently runs ListPendingTasksByRuntime against
Postgres on every poll. This collapses that path:

- New ListQueuedClaimCandidatesByRuntime query restricts to status =
  'queued' (the old query also returned 'dispatched' rows that can
  never be reclaimed) and is backed by a partial index keyed on
  (runtime_id, priority DESC, created_at ASC).
- New EmptyClaimCache caches the negative verdict in Redis with a
  30s TTL. ClaimTaskForRuntime checks the cache before SELECT and
  populates it on confirmed-empty results.
- notifyTaskAvailable now invalidates the runtime's empty key before
  kicking the daemon WS, so newly enqueued tasks become claimable
  immediately rather than waiting out the TTL.
- AutopilotService.dispatchRunOnly now goes through
  TaskService.NotifyTaskEnqueued so run_only tasks get the same
  invalidate-then-wakeup contract as every other enqueue path.

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): close MarkEmpty/Bump race in empty-claim fast path

GPT-Boy's review on PR #1860 caught a real concurrency bug. Under the
prior implementation it was possible for a slow claim to write an
empty verdict AFTER a concurrent enqueue had already invalidated it:

  T1 claim:   SELECT -> empty
  T2 enqueue: INSERT row, DEL empty key (no-op, key not set yet),
              wakeup
  T1 claim:   SET empty (writes a stale "empty" verdict)
  T3 wakeup:  IsEmpty -> hit -> returns null

The just-queued task would then sit idle until the empty key's TTL
expired (up to 30s).

Replace the DEL-based invalidation with a per-runtime version
counter:

- CurrentVersion(rt) is a Redis INCR counter at
  mul:claim:runtime:version:<rt> with a 24h sliding TTL.
- Claim samples version BEFORE the SELECT and passes it to MarkEmpty,
  which stores the verdict's value as the observed-version string.
- IsEmpty MGETs both keys and trusts the verdict only when the
  empty-key value equals the current version.
- Enqueue Bumps the version (INCR + EXPIRE) before the wakeup,
  causing any verdict written under a prior version to be rejected
  on the next read.

Also bound every Redis call from this cache with a 250ms timeout —
notifyTaskAvailable uses a background context so a wedged Redis
must not block enqueue.

Tests against a real Redis (REDIS_TEST_URL) cover:
- MarkEmpty + IsEmpty under matching version returns hit
- Bump invalidates a prior empty verdict (race-fix pin)
- A MarkEmpty written under a stale pre-Bump version is rejected
- TTL clamping, per-runtime isolation, nil-cache safety
- notifyTaskAvailable Bumps before the wakeup fires

Co-authored-by: multica-agent <github@multica.ai>

* chore(daemon): renumber claim-candidate index migration to 067

Slot 064 was taken on main by 064_notification_preference. The
migration runner tracks per-version in schema_migrations and would
silently skip the second 064_*, leaving the index uncreated.
Rename to 067 (next free slot).

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-04-30 15:50:05 +08:00

12 lines
770 B
SQL

-- Partial index that backs ListQueuedClaimCandidatesByRuntime. Daemons poll
-- /tasks/claim every 30s per runtime; the filter "runtime_id = $1 AND
-- status = 'queued'" runs every poll and is the dominant cost on warm paths.
-- Restricting to status = 'queued' keeps the index tiny — terminal-state
-- rows (completed/failed/cancelled) accumulate forever in the table but are
-- excluded from the index, so it stays bounded by current queue depth.
-- ORDER BY priority DESC, created_at ASC mirrors the SELECT so the planner
-- can serve the query as an index-only scan without an extra sort.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_agent_task_queue_claim_candidates
ON agent_task_queue (runtime_id, priority DESC, created_at ASC)
WHERE status = 'queued';