chore(channel): remove the one-time MULTICA_LARK_HUB_DISABLED cutover switch

The lark_*->channel_* cutover (MUL-3515) is deployed to prod. The MULTICA_LARK_HUB_DISABLED park-switch was a one-time scaffold for that rollout; the end state intentionally does not use it (prod never set the env var).

Remove the env-gated branch from cmd/server/main.go so the channel supervisor always starts whenever it was constructed (non-nil); its existing nil-guard and shutdown join are unchanged.

Trim migration 124's now-obsolete switch runbook to a short historical note (comment-only; 124 is already applied, so this does not re-run).

Refs MUL-3515
Co-authored-by: multica-agent <github@multica.ai>
2026-06-28 10:02:36 +02:00 · 2026-06-24 18:14:52 +08:00
2 changed files with 7 additions and 37 deletions
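
For readers unfamiliar with the pattern the message refers to, the remaining "nil-guard plus shutdown join" shape can be sketched roughly as below. This is an illustration only, not the code in cmd/server/main.go: the Supervisor type, its fields, and runLifecycle are hypothetical stand-ins, and the WaitGroup-based join is an assumption about how the real shutdown join might work. Only the `!= nil` guard and the `Run(ctx)` call mirror what the diff shows.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Supervisor is a hypothetical stand-in for the channel supervisor.
// The real type lives elsewhere in the repository and is not shown in this diff.
type Supervisor struct {
	started bool
	stopped bool
}

// Run records that it started, blocks until ctx is cancelled, then records
// that it stopped.
func (s *Supervisor) Run(ctx context.Context) {
	s.started = true
	<-ctx.Done()
	s.stopped = true
}

// runLifecycle starts sup only when it is non-nil (the nil-guard), then
// cancels the context and waits for the goroutine to exit (the join).
// It reports whether a supervisor was started. No env var is consulted:
// a non-nil supervisor always runs.
func runLifecycle(sup *Supervisor) bool {
	ctx, cancel := context.WithCancel(context.Background())
	var wg sync.WaitGroup

	ran := false
	if sup != nil {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sup.Run(ctx)
		}()
		ran = true
	}

	cancel()  // stands in for the server's shutdown signal
	wg.Wait() // returns immediately when nothing was started
	return ran
}

func main() {
	sup := &Supervisor{}
	fmt.Println("non-nil supervisor ran:", runLifecycle(sup))
	fmt.Println("nil supervisor ran:", runLifecycle(nil))
}
```

With the env-gated branch gone, the only way to skip the supervisor in a sketch like this is to never construct it, which matches the "always starts when built" wording in the message.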
--- a/server/cmd/server/main.go
+++ b/server/cmd/server/main.go
@@ -381,24 +381,6 @@ func main() {
 	// rows it simply idles. Lifecycle is bound to sweepCtx so it winds down
 	// alongside the other long-running workers, AFTER the HTTP server has
 	// drained.
-	// Cutover control (MUL-3515): MULTICA_LARK_HUB_DISABLED parks the inbound
-	// channel supervisor WITHOUT taking down the rest of the API — the process
-	// still serves HTTP normally, it just never claims a WS lease or opens a
-	// Feishu connection. The switch is generation-agnostic: in a pre-cutover
-	// build it parks the OLD (lark_*) hub, in this build it parks the NEW
-	// (channel_*) supervisor. The lark_*->channel_* rollout uses it twice —
-	// first to stop the old hub on the current build so migration 124 takes a
-	// clean snapshot, then to hold this build's new supervisor dormant until
-	// that migration has run and the old pods have drained, so the two never
-	// double-process the same bot. Nil-ing it here makes the start below and
-	// the shutdown join skip it. The operator flips it off last to bring the
-	// new supervisor up on channel_*. See migration 124's ROLLOUT note for the
-	// full order. The env var keeps its name for operator/runbook
-	// compatibility across the cutover.
-	if h.ChannelSupervisor != nil && os.Getenv("MULTICA_LARK_HUB_DISABLED") == "true" {
-		slog.Warn("Lark inbound supervisor disabled via MULTICA_LARK_HUB_DISABLED; API serves normally but no Feishu WebSocket is opened")
-		h.ChannelSupervisor = nil
-	}
 	if h.ChannelSupervisor != nil {
 		go h.ChannelSupervisor.Run(sweepCtx)
 	}
--- a/server/migrations/124_channel_generalization.up.sql
+++ b/server/migrations/124_channel_generalization.up.sql
@@ -39,25 +39,13 @@
 --     version upgrade is a clean cutover. Only a self-host re-tuned to
 --     multi-replica RollingUpdate needs the prd procedure below.
 --
--     PRD (rolling multica-api, maxUnavailable:0) overlaps old and new pods,
--     so use the MULTICA_LARK_HUB_DISABLED switch (cmd/server/main.go) to park
--     a hub while the API stays up. For a clean, drift-free cutover:
--       1. Pre-release the hub-park switch on the CURRENT build and set
--          MULTICA_LARK_HUB_DISABLED=true. The old hub stops everywhere (no
--          more lease/dedup/binding/thread writes to lark_*); the API stays up.
--       2. Run this migration — a clean snapshot, no live hub writing lark_*.
--       3. Deploy the channel build with the switch still ON. channel_* already
--          exists (step 2), so the new HTTP paths never 500, and the new hub
--          stays parked while old pods drain.
--       4. Flip MULTICA_LARK_HUB_DISABLED off — the new hub comes up on
--          channel_*. Only the Feishu bot is unavailable across steps 1-4; the
--          API stays up throughout.
--     The earlier "ship new code, THEN migrate after pods drain" order is
--     wrong: it serves channel_* HTTP before channel_* exists, violating (a).
--     Rollback to a pre-cutover build is not lossless once the new hub has
--     written Feishu state into channel_*. See the PR "Deployment / rollout"
--     section for the full runbook (incl. a lower-effort single-deploy variant
--     that trades a small transient drift for one fewer release).
+--     PRD (rolling multica-api, maxUnavailable:0) overlapped old and new pods.
+--     A one-time MULTICA_LARK_HUB_DISABLED park-switch existed during the
+--     cutover to hold a hub dormant while the API stayed up, so only one hub
+--     was ever live (invariant b). That cutover is complete and the switch has
+--     since been removed (MUL-3515); this note is kept as history. Rollback to
+--     a pre-cutover build is not lossless once the new hub has written Feishu
+--     state into channel_*.
 --
 -- app_secret_encrypted is BYTEA; it is carried into the JSONB config as a
 -- base64 string. PostgreSQL's encode(...,'base64') MIME-wraps the output