Files
multica/server/migrations/093_webhook_deliveries.up.sql
Bohan Jiang 2323b72710 feat(autopilots): webhook delivery layer + idempotency/signature/replay (MUL-2334) [PR1] (#2774)
* feat(autopilots): webhook delivery layer + idempotency / signature / replay (MUL-2334)

Splits "inbound webhook receipt" from "autopilot run creation" so we can
record duplicate attempts, signature outcomes, and ignored/skipped
deliveries — and replay a delivery on demand. v1 ingress wrote straight
into autopilot_run.trigger_payload, which collapsed the two concerns and
left run_only autopilots vulnerable to provider retry storms.

Backend only (PR1). UI Deliveries tab follows in PR2.

Schema (migration 093):
  - autopilot_trigger.provider: 'generic' | 'github' (default 'generic').
  - autopilot_trigger.signing_secret: nullable plaintext (HMAC needs it
    cleartext; mirrors how webhook_token is stored).
  - webhook_delivery: one row per inbound POST. Carries raw_body,
    selected_headers, dedupe_key/source, signature_status,
    autopilot_run_id, replayed_from_delivery_id, response_status / body.
  - Partial unique index on (trigger_id, dedupe_key) excludes NULL and
    'rejected' rows, so a wrong-secret 401 does NOT permanently block a
    future retry with the same X-GitHub-Delivery once the operator fixes
    the secret.

Ingress flow (autopilot_webhook.go), persist-first + sync dispatch:
  1. IP rate limit -> 2. token lookup -> 3. token rate limit ->
  4. read raw body -> 5. autopilot/workspace cross-check ->
  6. normalize JSON (400 without persistence on parse failure) ->
  7. compute dedupe key + signature status ->
  8. INSERT delivery (status=queued). On (trigger_id, dedupe_key)
     unique-violation: bump attempt_count on existing row and return
     the original delivery_id + autopilot_run_id with 200 ->
  9. invalid/missing signature: UPDATE -> rejected, return 401 with
     delivery_id (no dispatch, not replayable) ->
 10. trigger disabled / autopilot paused/archived: UPDATE -> ignored,
     return 200 ->
 11. DispatchAutopilot synchronously, UPDATE -> dispatched/skipped/failed
     with autopilot_run_id and the response body we returned ->
 12. TouchAutopilotTriggerFiredAt and return 200.

No new long-running worker. A stale 'queued' row only happens if the
process dies between INSERT and UPDATE; that's a follow-up sweeper, not
this PR.

Authenticated API:
  - GET    /api/autopilots/{id}/deliveries (slim list)
  - GET    /api/autopilots/{id}/deliveries/{deliveryId} (with raw_body)
  - POST   /api/autopilots/{id}/deliveries/{deliveryId}/replay -> creates
    a new delivery row (replayed_from_delivery_id set), dispatches a
    new run, never collapses onto the original via dedupe.
  - PUT    /api/autopilots/{id}/triggers/{triggerId}/signing-secret
    Write-only; trigger response surfaces has_signing_secret +
    signing_secret_hint (last 4 chars), never the secret itself.

Signature verification reuses the GitHub-compatible
X-Hub-Signature-256: sha256=<hex(hmac(body, secret))> scheme; the
HMAC helper is constant-time. Invalid/missing signatures still count
against per-IP and per-token rate limits.

autopilot_run.trigger_payload is intentionally preserved — delivery
records the HTTP receipt; run records the normalized envelope handed
to the agent. They are two different views.

Tests (Postgres-backed):
  - delivery persistence on accept
  - dedupe via Idempotency-Key and X-GitHub-Delivery; run_only retry
    storm pin (3 retries -> 1 run)
  - invalid signature: 401 + rejected row + no run linkage
  - missing signature when secret configured: 401 + 'missing' state
  - valid signature dispatches
  - signing secret never echoed in trigger responses; hint shows last 4
  - min-length and clear-by-empty for signing secret PUT
  - replay creates a NEW delivery + new run; rejected deliveries cannot
    be replayed
  - list omits raw_body; detail includes it; cross-autopilot ID returns
    404 (workspace isolation defense in depth)
  - provider validation: unknown -> 400, github -> 201 round-trips
  - bad-signature stream still counts against per-token rate limit

Co-authored-by: multica-agent <github@multica.ai>

* fix(autopilots): address PR review on webhook delivery layer (MUL-2334)

- Exclude `failed` from the (trigger_id, dedupe_key) partial unique index
  alongside `rejected`, so a transient ingress failure does not strand the
  provider's stable X-GitHub-Delivery / Idempotency-Key retry. Update the
  dedupe lookup to prefer non-terminal rows under the same predicate.
- Tighten delivery status enum: drop `skipped` from the CHECK constraint
  and from the handler. A run that was admission-skipped (e.g. runtime
  offline) is now recorded as delivery=`dispatched` linked to the
  skipped run, with the response payload carrying status=`skipped`.
  Source of truth for skipped-ness is autopilot_run.status, not the
  delivery row — keeps the Deliveries UI enum unambiguous.
- On dispatch error, link the (possibly non-nil) autopilot_run returned
  by DispatchAutopilot to the failed delivery so Deliveries UI can
  navigate to the run row for debugging.
- Slim list projection: ListWebhookDeliveriesByAutopilot no longer pulls
  raw_body / selected_headers / response_body — a 100-row page × 256 KiB
  would otherwise round-trip ~25 MiB from Postgres per Deliveries reload.
  Detail endpoint continues to return the full row.
- Fix backend CI: TestGetDelivery_ReturnsFullPayload now decodes the
  response and asserts on the parsed raw_body instead of substring-
  matching against an escaped JSON string; raise the test-suite default
  webhook rate limits in TestMain so the shared 192.0.2.1 IP bucket
  doesn't fill across the suite and leak 429s into unrelated tests.
- Add regression coverage for the dedupe-after-failure path.

cd server && go test ./... is green locally.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-18 14:59:40 +08:00

98 lines
5.4 KiB
SQL

-- Webhook delivery layer: separates "we received an inbound HTTP webhook" from
-- "we created an autopilot run". Splitting the two lets us record duplicate
-- attempts, signature outcomes, ignored/skipped deliveries, and offers a
-- replay path — all of which the previous shape (writing straight into
-- autopilot_run.trigger_payload) could not express.
--
-- Scope of this migration:
-- 1. Add `provider` + `signing_secret` to autopilot_trigger so a webhook
-- trigger can optionally carry per-provider configuration for signature
-- verification. signing_secret is plaintext at rest, mirroring how
-- webhook_token already lives — HMAC verification needs the cleartext
-- and there is no general secrets-at-rest infrastructure to layer on
-- yet (see issue MUL-2334 for the design rationale).
-- 2. Create webhook_delivery, one row per inbound HTTP request the public
-- ingress endpoint accepted (including rejected / ignored outcomes).
-- Duplicate requests don't get their own row — they bump attempt_count
-- on the existing dedupe target. The autopilot_run table is kept intact
-- and continues to represent the downstream execution — both views are
-- needed.
ALTER TABLE autopilot_trigger
ADD COLUMN provider TEXT NOT NULL DEFAULT 'generic'
CHECK (provider IN ('generic', 'github')),
ADD COLUMN signing_secret TEXT;
CREATE TABLE IF NOT EXISTS webhook_delivery (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
workspace_id UUID NOT NULL REFERENCES workspace(id) ON DELETE CASCADE,
autopilot_id UUID NOT NULL REFERENCES autopilot(id) ON DELETE CASCADE,
trigger_id UUID NOT NULL REFERENCES autopilot_trigger(id) ON DELETE CASCADE,
provider TEXT NOT NULL
CHECK (provider IN ('generic', 'github')),
event TEXT NOT NULL DEFAULT 'webhook.received',
-- dedupe_key is extracted from request headers per-provider:
-- github -> X-GitHub-Delivery
-- generic -> Idempotency-Key
-- NULL means "no provider-supplied dedupe identifier"; the partial unique
-- index below skips these rows so each NULL-keyed delivery is treated as
-- distinct (correct behavior for clients that don't implement idempotency
-- headers — the alternative would collapse all of them onto one row).
dedupe_key TEXT,
dedupe_source TEXT,
signature_status TEXT NOT NULL DEFAULT 'not_required'
CHECK (signature_status IN ('not_required', 'valid', 'invalid', 'missing')),
-- Delivery status tracks the *ingress*, not the autopilot run:
-- queued — INSERTed, dispatch not yet attempted
-- dispatched — handed off to AutopilotService; autopilot_run_id is set.
-- A run that was admission-skipped (e.g. runtime offline)
-- still lives here — the skipped-ness is recorded on
-- autopilot_run.status, not on the delivery, so the
-- Deliveries enum stays unambiguous.
-- rejected — signature verification failed (invalid or missing)
-- ignored — trigger disabled / autopilot paused / archived
-- failed — dispatch attempted and errored
status TEXT NOT NULL DEFAULT 'queued'
CHECK (status IN ('queued', 'dispatched', 'rejected', 'ignored', 'failed')),
attempt_count INTEGER NOT NULL DEFAULT 1,
-- Selected headers we want to keep for debugging (user-agent, event,
-- delivery id, idempotency key, signature presence). NOT the raw header
-- map — never store auth tokens or signature values plaintext.
selected_headers JSONB NOT NULL DEFAULT '{}'::jsonb,
content_type TEXT,
-- raw_body holds the exact bytes we received, capped at the ingress body
-- limit (256 KiB). Required for replay and to debug normalization issues.
-- No TTL/retention enforced here; if delivery volume becomes a storage
-- concern, add a follow-up migration with a retention job rather than
-- baking the policy into the schema.
raw_body BYTEA,
response_status INTEGER,
response_body TEXT,
autopilot_run_id UUID REFERENCES autopilot_run(id) ON DELETE SET NULL,
replayed_from_delivery_id UUID REFERENCES webhook_delivery(id) ON DELETE SET NULL,
error TEXT,
received_at TIMESTAMPTZ NOT NULL DEFAULT now(),
last_attempt_at TIMESTAMPTZ NOT NULL DEFAULT now(),
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Listing deliveries per autopilot, newest first.
CREATE INDEX IF NOT EXISTS idx_webhook_delivery_autopilot
ON webhook_delivery(autopilot_id, created_at DESC);
-- Provider-supplied dedupe identifiers must be unique per trigger. Partial so
-- that NULL keys (no Idempotency-Key / X-GitHub-Delivery header) never
-- collide. Terminal-but-not-successful outcomes (`rejected`, `failed`) are
-- excluded so a transient ingress failure or a misconfigured secret does
-- not strand the operator: providers like GitHub keep X-GitHub-Delivery
-- stable across retries, and a permanently-blocking row would prevent the
-- next retry from ever being dispatched.
CREATE UNIQUE INDEX IF NOT EXISTS idx_webhook_delivery_dedupe
ON webhook_delivery(trigger_id, dedupe_key)
WHERE dedupe_key IS NOT NULL AND status NOT IN ('rejected', 'failed');
-- Lookup by linked run (sync flows, gc check, run-detail "what triggered me").
CREATE INDEX IF NOT EXISTS idx_webhook_delivery_run
ON webhook_delivery(autopilot_run_id)
WHERE autopilot_run_id IS NOT NULL;