Compare commits

1 Commits

Author SHA1 Message Date
dfa6ecd52f chore(channel): remove the one-time MULTICA_LARK_HUB_DISABLED cutover switch
The lark_*->channel_* cutover (MUL-3515) is deployed to prod, and the
MULTICA_LARK_HUB_DISABLED park-switch was a one-time scaffold for that
rollout — the end state intentionally does not use it (prod never set the
env). Remove the env-gated branch from cmd/server/main.go so the channel
supervisor always starts when built; its existing nil-guard and shutdown
join are unchanged. Trim migration 124's now-obsolete switch runbook to a
short historical note (comment-only; 124 is already applied, so this does
not re-run).

Refs MUL-3515

Co-authored-by: multica-agent <github@multica.ai>
2026-06-24 18:14:52 +08:00
2 changed files with 7 additions and 37 deletions

cmd/server/main.go

@@ -381,24 +381,6 @@ func main() {
// rows it simply idles. Lifecycle is bound to sweepCtx so it winds down
// alongside the other long-running workers, AFTER the HTTP server has
// drained.
// Cutover control (MUL-3515): MULTICA_LARK_HUB_DISABLED parks the inbound
// channel supervisor WITHOUT taking down the rest of the API — the process
// still serves HTTP normally, it just never claims a WS lease or opens a
// Feishu connection. The switch is generation-agnostic: in a pre-cutover
// build it parks the OLD (lark_*) hub, in this build it parks the NEW
// (channel_*) supervisor. The lark_*->channel_* rollout uses it twice —
// first to stop the old hub on the current build so migration 124 takes a
// clean snapshot, then to hold this build's new supervisor dormant until
// that migration has run and the old pods have drained, so the two never
// double-process the same bot. Nil-ing it here makes the start below and
// the shutdown join skip it. The operator flips it off last to bring the
// new supervisor up on channel_*. See migration 124's ROLLOUT note for the
// full order. The env var keeps its name for operator/runbook
// compatibility across the cutover.
if h.ChannelSupervisor != nil && os.Getenv("MULTICA_LARK_HUB_DISABLED") == "true" {
slog.Warn("Lark inbound supervisor disabled via MULTICA_LARK_HUB_DISABLED; API serves normally but no Feishu WebSocket is opened")
h.ChannelSupervisor = nil
}
if h.ChannelSupervisor != nil {
go h.ChannelSupervisor.Run(sweepCtx)
}

@@ -39,25 +39,13 @@
-- version upgrade is a clean cutover. Only a self-host re-tuned to
-- multi-replica RollingUpdate needs the prd procedure below.
--
-- PRD (rolling multica-api, maxUnavailable:0) overlaps old and new pods,
-- so use the MULTICA_LARK_HUB_DISABLED switch (cmd/server/main.go) to park
-- a hub while the API stays up. For a clean, drift-free cutover:
-- 1. Pre-release the hub-park switch on the CURRENT build and set
-- MULTICA_LARK_HUB_DISABLED=true. The old hub stops everywhere (no
-- more lease/dedup/binding/thread writes to lark_*); the API stays up.
-- 2. Run this migration — a clean snapshot, no live hub writing lark_*.
-- 3. Deploy the channel build with the switch still ON. channel_* already
-- exists (step 2), so the new HTTP paths never 500, and the new hub
-- stays parked while old pods drain.
-- 4. Flip MULTICA_LARK_HUB_DISABLED off — the new hub comes up on
-- channel_*. Only the Feishu bot is unavailable across steps 1-4; the
-- API stays up throughout.
-- The earlier "ship new code, THEN migrate after pods drain" order is
-- wrong: it serves channel_* HTTP before channel_* exists, violating (a).
-- Rollback to a pre-cutover build is not lossless once the new hub has
-- written Feishu state into channel_*. See the PR "Deployment / rollout"
-- section for the full runbook (incl. a lower-effort single-deploy variant
-- that trades a small transient drift for one fewer release).
-- PRD (rolling multica-api, maxUnavailable:0) overlapped old and new pods.
-- A one-time MULTICA_LARK_HUB_DISABLED park-switch existed during the
-- cutover to hold a hub dormant while the API stayed up, so only one hub
-- was ever live (invariant b). That cutover is complete and the switch has
-- since been removed (MUL-3515); this note is kept as history. Rollback to
-- a pre-cutover build is not lossless once the new hub has written Feishu
-- state into channel_*.
--
-- app_secret_encrypted is BYTEA; it is carried into the JSONB config as a
-- base64 string. PostgreSQL's encode(...,'base64') MIME-wraps the output