multica/server/internal/integrations/lark/binding_token.go
Bohan Jiang ce28d0aa0e feat(integrations): add platform-agnostic channel foundation (MUL-3515) (#4412)
* feat(integrations): add platform-agnostic channel foundation

Introduce server/internal/integrations/channel — the contract every
inbound IM integration implements, so the core never learns a platform's
event JSON. Four pieces:

- Channel interface (Type/Connect/Disconnect/Send/Capabilities) + Factory
  + Config (channel_type + opaque JSON blob, maps to channel_installation).
- Normalized InboundMessage/OutboundMessage envelopes + Source/MediaRef/
  ReplyCtx/MsgType/ChatType. Envelope holds only cross-platform-true
  fields; platform specifics live in Raw, read only by the adapter.
- Capability bitmask: declaration only, no degrade logic in core.
- Registry: Type->Factory map, last-writer-wins, concurrency-safe.
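
A minimal sketch of the registry and capability bitmask described above. It assumes a pared-down Channel interface; the real contract also has Connect/Disconnect/Send, which are omitted here. The capability names (CapThreads, CapCardPatch, CapTyping), the Factory signature, and the Registry/Register/Lookup names are hypothetical and are not the package's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// Capability is a bitmask of optional platform features. Declaring a bit is
// a statement of fact only; core carries no degrade logic for absent bits.
type Capability uint32

const (
	CapThreads   Capability = 1 << iota // hypothetical: threaded replies
	CapCardPatch                        // hypothetical: in-place card updates
	CapTyping                           // hypothetical: typing indicator
)

// Has reports whether every bit in want is declared.
func (c Capability) Has(want Capability) bool { return c&want == want }

// Channel is a trimmed stand-in for the real interface.
type Channel interface {
	Type() string
	Capabilities() Capability
}

// Factory builds a Channel from its opaque JSON config blob.
type Factory func(config []byte) (Channel, error)

// Registry maps a channel type to its Factory and is safe for concurrent use.
type Registry struct {
	mu        sync.RWMutex
	factories map[string]Factory
}

func NewRegistry() *Registry {
	return &Registry{factories: make(map[string]Factory)}
}

// Register installs f for typ. A later call for the same typ replaces the
// earlier one (last writer wins).
func (r *Registry) Register(typ string, f Factory) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.factories[typ] = f
}

// Lookup returns the factory registered for typ, if any.
func (r *Registry) Lookup(typ string) (Factory, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	f, ok := r.factories[typ]
	return f, ok
}

type fakeChannel struct {
	typ  string
	caps Capability
}

func (f fakeChannel) Type() string             { return f.typ }
func (f fakeChannel) Capabilities() Capability { return f.caps }

func main() {
	r := NewRegistry()
	r.Register("feishu", func([]byte) (Channel, error) { return fakeChannel{"feishu", CapThreads}, nil })
	r.Register("feishu", func([]byte) (Channel, error) { return fakeChannel{"feishu", CapThreads | CapCardPatch}, nil })

	f, _ := r.Lookup("feishu")
	ch, _ := f(nil)
	fmt.Println(ch.Capabilities().Has(CapCardPatch)) // true: the second Register won
	_, ok := r.Lookup("slack")
	fmt.Println(ok) // false
}
```

The RWMutex lets concurrent lookups proceed in parallel, while a Register briefly excludes them.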

Pure package (no DB/network/platform deps). Foundation for MUL-3515; the
lark cutover + lark_*->channel_* generalization land in follow-up PRs.

MUL-3515

Co-authored-by: multica-agent <github@multica.ai>

* feat(channel): generalize lark_* tables into channel_* (DB layer)

Migration 123 creates channel_installation / channel_user_binding /
channel_chat_session_binding / channel_inbound_message_dedup /
channel_inbound_audit / channel_outbound_card_message /
channel_binding_token. Each carries a channel_type discriminator and a
JSONB config for platform-specific identifiers/credentials; cross-platform
columns stay flat. Existing Feishu rows are backfilled (channel_type=
'feishu', app_secret_encrypted via base64). NO foreign keys / cascades
(MUL-3515 §4) — integrity moves to the app layer in the cutover.

queries/channel.sql ports the lark query surface to channel_*, JSONB-aware,
plus DeleteChannelUserBindingsByWorkspaceMember /
DeleteChannelChatSessionBindingBySession for the app-layer cleanup that
replaces the removed cascades.

lark_* tables/queries are left in place here and removed once the Go
cutover lands, so this commit ships green on its own.

Verified: sqlc generate, go build ./..., full migrate chain (1..123) on
Postgres 17, and a real-data backfill spot-check (base64 round-trip,
NULL-strip, functional unique index on (channel_type, app_id)).

MUL-3515

Co-authored-by: multica-agent <github@multica.ai>

* fix(channel): name app_id query param + multi-IM install key + null-safe binding merge

Addresses review on MUL-3515 (PR #4412):

- GetChannelInstallationByAppID: explicitly name params and cast app_id to
  ::text so sqlc emits AppID string. A bare $2 next to `config ->> 'app_id'`
  was mis-attributed to the JSONB config column, generating Config []byte.

- channel_installation uniqueness -> (workspace_id, agent_id, channel_type),
  with the UpsertChannelInstallation conflict key matched. Lets one agent
  hold one installation per IM (feishu + slack + ...) instead of a later
  install clobbering an earlier one. Behaviorally identical in the current
  feishu-only world; "one agent, at most one IM overall" stays an app-layer
  rule per MUL-3515 §4, not a DB constraint.

- CreateChannelUserBinding merges jsonb_strip_nulls(EXCLUDED.config) so a
  re-bind carrying {"union_id": null} no longer erases an already-captured
  union_id, restoring the old COALESCE(EXCLUDED.union_id, ...) semantics.
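
A Go model of the intended merge. It assumes the upsert sets `config = <stored config> || jsonb_strip_nulls(EXCLUDED.config)`; the exact SQL is not reproduced here. JSONB `||` on two objects is a shallow merge in which right-hand keys win. Stripping nulls first means an incoming null can never overwrite a stored value. mergeConfig is a hypothetical helper that handles flat objects only, whereas jsonb_strip_nulls also recurses into nested objects.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergeConfig mimics `stored || jsonb_strip_nulls(incoming)` for flat JSON
// objects: null-valued keys in incoming are dropped, then incoming's
// remaining keys overwrite stored's.
func mergeConfig(stored, incoming []byte) (map[string]any, error) {
	out := map[string]any{}
	if err := json.Unmarshal(stored, &out); err != nil {
		return nil, fmt.Errorf("stored: %w", err)
	}
	var in map[string]any
	if err := json.Unmarshal(incoming, &in); err != nil {
		return nil, fmt.Errorf("incoming: %w", err)
	}
	for k, v := range in {
		if v == nil { // jsonb_strip_nulls: a JSON null never reaches the merge
			continue
		}
		out[k] = v
	}
	return out, nil
}

func main() {
	stored := []byte(`{"union_id":"on_abc"}`)
	kept, _ := mergeConfig(stored, []byte(`{"union_id":null}`))
	fmt.Println(kept["union_id"]) // on_abc
	won, _ := mergeConfig(stored, []byte(`{"union_id":"on_new"}`))
	fmt.Println(won["union_id"]) // on_new
}
```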

Regenerated with sqlc v1.31.1. Verified on PG17: re-install replaces in
place, feishu+slack coexist, null re-bind keeps union_id, real union_id wins.

Co-authored-by: multica-agent <github@multica.ai>

* feat(lark): channel-backed Feishu store + fix base64 backfill wrapping

Cutover step 1 of switching the lark Go code from lark_* onto the channel_*
tables (MUL-3515). Introduces the JSONB config boundary the rest of the
cutover sits on, and fixes a latent backfill bug surfaced while building it.

- migration 123: strip newlines from the app_secret_encrypted base64 backfill.
  PostgreSQL encode(...,'base64') MIME-wraps at 76 chars, and a secretbox-
  sealed ~72-byte secret exceeds that. Go's encoding/json decodes a JSON
  string into []byte with base64.StdEncoding, which rejects embedded newlines,
  so without the strip every migrated installation would fail to decrypt its
  app secret once reads move to channel_installation.config.

- store.go: flat domain types (Installation / UserBinding / ChatSessionBinding)
  with field parity to the retired db.Lark* rows, plus the feishu config codec.
  Row->domain mappers decode the JSONB config; the secret decoder is
  whitespace-tolerant so legacy MIME-wrapped data still round-trips, while the
  encoder emits unwrapped base64. Binding config encodes an absent union_id as
  "{}" so the upsert's jsonb_strip_nulls merge never clobbers a stored union_id.

- store_test.go: 72-byte secret round-trip, MIME-wrapped tolerance, optional
  null-strip, and flat-column preservation. Verified on PG17.

Field parity keeps the upcoming ~190 db.LarkInstallation call sites a
mechanical rename. No call sites switched yet; behavior unchanged.

Co-authored-by: multica-agent <github@multica.ai>

* feat(lark): route inbound integration onto channel_* + explicit membership checks

Cutover step 2 (MUL-3515): switch the Feishu Go code from the lark_* queries to
channel_* via a ChannelStore adapter, and replace the removed member foreign key
with explicit application-layer membership checks. No user-visible behavior change.

- channel_store.go: ChannelStore embeds *db.Queries and SHADOWS the ~24 lark
  query methods with channel_*-backed equivalents, keeping the db.Lark*
  signatures so the dispatcher/hub/services and their ~20k lines of tests stay
  untouched; the feishu JSONB config is (de)coded by store.go. Adds
  IsWorkspaceMember and a tx-aware WithTx. Only production wiring swaps
  *db.Queries for *ChannelStore.

- Membership re-check (§4 removed the lark_user_binding -> member FK, so a
  binding row no longer proves current membership):
  * the dispatcher inbound identity step verifies membership after the binding
    lookup; a former member's stale binding is dropped as non_workspace_member
    + audited and never reaches chat_session (§4.3 safety property).
  * RedeemAndBind and BindInstallerTx replace the now-dead FK (23503) branch
    with an explicit IsWorkspaceMember gate, preserving the existing
    ErrBindingNotWorkspaceMember outcome without burning the token.

- router wires the ChannelStore into the patcher, typing indicator, dispatcher,
  hub, and the union_id/region backfills; constructor-based services wrap
  *db.Queries internally so their signatures and nil-check tests are unchanged.

Verified: go build ./... ; go vet ; gofmt ; go test -race ./internal/integrations/...
(full lark suite green unchanged + new membership drop/error tests). Adapter
field mappings (secret base64, union_id RMW, chat-id/open-id remaps, dedup,
token, card) checked end-to-end against a PG17 channel_* schema.

lark_* tables and queries remain (unused at runtime) until the S3 cleanup-hooks
and S4 drop-tables/rename commits.

Co-authored-by: multica-agent <github@multica.ai>

* fix(channel): renumber generalization migration 123 -> 124

main merged 123_issue_stage after this branch forked, so the branch's 123_channel_generalization now collides on the migration number. The runner keys schema_migrations by full version string and would still apply both, but a duplicate number is a merge hazard and convention violation, so move the channel migration to the next free slot (124).

issue_stage (ALTER issue ADD COLUMN stage) and the channel generalization touch disjoint tables; verified on PG17 that 123_issue_stage applies cleanly on a DB already carrying 124_channel_generalization, so the two are order-independent. sqlc regenerated (v1.31.1): only the migration-number comment changed.

MUL-3515

Co-authored-by: multica-agent <github@multica.ai>

* feat(channel): prune channel bindings on member removal + chat session delete

MUL-3515 §4 dropped every channel_* foreign key, so the old ON DELETE CASCADE that cleared a user's channel_user_binding when they left a workspace, and a chat's channel_chat_session_binding when its chat_session was deleted, no longer fires. Re-establish that integrity in the application layer, inside the existing transactions: revokeAndRemoveMember -> DeleteChannelUserBindingsByWorkspaceMember, DeleteChatSession -> DeleteChannelChatSessionBindingBySession.

Adds real-DB tests for both paths, including a scoping check that a remaining member's binding survives the prune. Verified on PG17: both new tests plus the existing revocation tests and the full handler package pass.

MUL-3515

Co-authored-by: multica-agent <github@multica.ai>

* fix(channel): scope Lark/Feishu store reads to channel_type='feishu'

The S2 cutover routed the Feishu integration onto channel_*, but the Lark-facing ChannelStore wrappers read installation / chat-session-binding / outbound-card rows across ALL channel_type values. Once a second IM exists, that would let the Lark hub supervise a non-Feishu installation, the Lark install list show it, /lark/installations/{id} revoke another channel's row, and the outbound patcher / typing indicator act on a non-Feishu chat binding or card.

Add a channel_type predicate to the six read/list channel queries and pass channelTypeFeishu from every wrapper: GetChannelInstallation, GetChannelInstallationInWorkspace, ListChannelInstallationsByWorkspace, ListActiveChannelInstallations, GetChannelChatSessionBindingBySession, GetChannelOutboundCardByTask.

The S3 cleanup deletes (DeleteChannelUserBindingsByWorkspaceMember / DeleteChannelChatSessionBindingBySession) stay all-channel on purpose: a member leaving or a chat_session being deleted should clear every IM's binding. Adds a real-DB test that seeds a Slack installation/binding/card next to the Feishu ones and asserts the Lark wrappers never return them.
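
An in-memory sketch of the asymmetry described above. Lark-facing reads always pin channelTypeFeishu, while the cleanup delete matches on workspace and member alone, so every IM's binding goes. The row types, the store, and its method names are hypothetical stand-ins for the real queries.

```go
package main

import "fmt"

const channelTypeFeishu = "feishu"

// installation and userBinding are hypothetical, trimmed-down rows.
type installation struct {
	ID, WorkspaceID, ChannelType string
}

type userBinding struct {
	WorkspaceID, UserID, ChannelType string
}

type store struct {
	installations []installation
	bindings      []userBinding
}

// listInstallationsByWorkspace mirrors a read query that takes a
// channel_type predicate.
func (s *store) listInstallationsByWorkspace(ws, channelType string) []installation {
	var out []installation
	for _, in := range s.installations {
		if in.WorkspaceID == ws && in.ChannelType == channelType {
			out = append(out, in)
		}
	}
	return out
}

// larkListInstallations is the Lark-facing wrapper: pinning the channel
// type means it can only ever see Feishu rows.
func (s *store) larkListInstallations(ws string) []installation {
	return s.listInstallationsByWorkspace(ws, channelTypeFeishu)
}

// deleteUserBindingsByWorkspaceMember deliberately has no channel_type
// predicate: a departing member loses their binding on every IM.
func (s *store) deleteUserBindingsByWorkspaceMember(ws, user string) int {
	kept := s.bindings[:0] // filter in place, reusing the backing array
	removed := 0
	for _, b := range s.bindings {
		if b.WorkspaceID == ws && b.UserID == user {
			removed++
			continue
		}
		kept = append(kept, b)
	}
	s.bindings = kept
	return removed
}

func main() {
	s := &store{
		installations: []installation{
			{ID: "i1", WorkspaceID: "w1", ChannelType: "feishu"},
			{ID: "i2", WorkspaceID: "w1", ChannelType: "slack"},
		},
		bindings: []userBinding{
			{WorkspaceID: "w1", UserID: "u1", ChannelType: "feishu"},
			{WorkspaceID: "w1", UserID: "u1", ChannelType: "slack"},
			{WorkspaceID: "w1", UserID: "u2", ChannelType: "feishu"},
		},
	}
	fmt.Println(len(s.larkListInstallations("w1")))                // 1
	fmt.Println(s.deleteUserBindingsByWorkspaceMember("w1", "u1")) // 2
	fmt.Println(len(s.bindings))                                   // 1
}
```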

MUL-3515

Co-authored-by: multica-agent <github@multica.ai>

* refactor(channel): replace db.Lark* translation layer with lark domain types

S2 introduced ChannelStore as a translation layer that read/wrote channel_* but kept the retired db.Lark* struct/param shapes so the dispatcher/hub/services and their ~20k lines of tests did not have to change. This collapses that layer: the store now takes and returns the package's flat domain types (Installation, UserBinding, ChatSessionBinding, InboundMessageDedup, BindingTokenRow, OutboundCardMessage) and the *Params types in params.go, with channel-neutral field names (ChannelUserID / ChannelChatID / ...). All call sites, fakes, and tests move to the domain types.

No behavior change: only channel_* is read/written (as before); db.Lark* is now unused, and the lark_* tables + queries/lark.sql are removed in the next commit. Verified on PG17: go build / vet / gofmt clean, go test -race ./internal/integrations/... green (the ~20k-line fake suite), and the lark + handler suites pass.

MUL-3515

Co-authored-by: multica-agent <github@multica.ai>

* refactor(channel): drop lark_* tables and queries (remove old path)

The Go cutover (previous commit) moved the lark package entirely onto channel_* and the domain types, leaving the lark_* tables, queries/lark.sql, and the generated db.Lark* models unused. Remove them per the design (§5: replace, do not keep both): migration 125 drops the seven lark_* tables (data already lives in channel_* since migration 124), and queries/lark.sql is deleted + sqlc regenerated, removing the db.Lark* models and lark query methods.

The 125 down recreates the authoritative pre-drop schema (bot_union_id, region, per-installation dedup PK, thread-reply columns). Verified on PG17: fresh migrate up ends with lark_* gone + channel_* present; isolated 125 down/up round-trips correctly; go build / vet / gofmt clean; go test -race ./internal/integrations/... and the handler suite pass.

MUL-3515

Co-authored-by: multica-agent <github@multica.ai>

* fix(migrations): remove trailing blank line at EOF of 125 down migration

git diff --check flagged a blank line at EOF of 125_drop_lark_tables.down.sql (a pg_dump-generation artifact). Whitespace only; the recreate SQL is unchanged.

MUL-3515

Co-authored-by: multica-agent <github@multica.ai>

* refactor(channel): defer lark_* table drop to a follow-up migration

Preflight deploy review: dropping lark_* in the same release that cuts over (old migration 125) is not rollback/rolling-safe — the v0.3.27 release still reads lark_*, so a rolling deploy or a post-deploy code rollback would hit "relation does not exist". Remove the drop and keep the old tables for one release (standard expand/contract): migration 124 already backfilled lark_* -> channel_*, the new code reads/writes only channel_*, and the physical drop moves to a separate cleanup migration once this ships and is observed.

The lark_* tables remain in the schema, so sqlc regenerates the (now unused) db.Lark* models; queries/lark.sql stays deleted (the new code uses channel_*). No code path reads lark_* — only the destructive drop is deferred, keeping the design's no-compat-layer / no-dual-write rule while being deploy-safe.

MUL-3515

Co-authored-by: multica-agent <github@multica.ai>

* fix(channel): skip orphaned installations in hub-boot active scan

Preflight deploy review: channel_installation dropped the workspace/agent FK (MUL-3515 §4), so unlike lark_installation it does not cascade away when its workspace is deleted or its agent is hard-deleted (e.g. runtime teardown). The hub-boot query then keeps opening a WebSocket for a bot whose owner is gone.

JOIN ListActiveChannelInstallations to live workspace + agent so an orphaned installation is never connected, uniformly for every deletion path. The JOIN matches the old ON DELETE CASCADE semantics (row existence, not agent archival), so an archived-but-present agent's installation is still listed; the orphaned row's encrypted secret is thereby never decrypted/used.
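
The JOIN's semantics can be modeled in memory as follows, assuming hypothetical trimmed-down row types and an activeInstallations helper. The real query also applies the channel_type predicate, which is omitted here. A row survives only if both its workspace and its agent row still exist. Archival is ignored on purpose, matching the old ON DELETE CASCADE, which fired on deletion only.

```go
package main

import "fmt"

// Hypothetical trimmed-down rows.
type installation struct {
	ID, WorkspaceID, AgentID string
}

type agent struct {
	ID       string
	Archived bool
}

// activeInstallations is the in-memory equivalent of an INNER JOIN to
// workspace and agent: orphaned rows never make it into the result.
func activeInstallations(ins []installation, workspaces map[string]bool, agents map[string]agent) []installation {
	var out []installation
	for _, in := range ins {
		if !workspaces[in.WorkspaceID] {
			continue // workspace deleted: orphan, never connect
		}
		if _, ok := agents[in.AgentID]; !ok {
			continue // agent hard-deleted: orphan, never connect
		}
		out = append(out, in) // archived-but-present agents still pass
	}
	return out
}

func main() {
	workspaces := map[string]bool{"w1": true}
	agents := map[string]agent{
		"a1": {ID: "a1"},
		"a2": {ID: "a2", Archived: true},
	}
	ins := []installation{
		{ID: "live", WorkspaceID: "w1", AgentID: "a1"},
		{ID: "archived-agent", WorkspaceID: "w1", AgentID: "a2"},
		{ID: "gone-workspace", WorkspaceID: "w9", AgentID: "a1"},
		{ID: "gone-agent", WorkspaceID: "w1", AgentID: "a9"},
	}
	for _, in := range activeInstallations(ins, workspaces, agents) {
		fmt.Println(in.ID) // prints "live", then "archived-agent"
	}
}
```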

Tests: a real-DB handler test asserts a deleted-workspace/agent installation and a non-Feishu one are both excluded; the lark scope test's active-list assertion moved there since the JOIN now needs real workspace/agent fixtures. (Physically deleting dormant orphaned channel rows on workspace/agent deletion is a separate app-layer-cleanup follow-up.)

MUL-3515

Co-authored-by: multica-agent <github@multica.ai>

* docs(channel): document non-rolling cutover constraint for the lark->channel migration

Elon deploy review: keeping the lark_* tables (deferred drop) stops old v0.3.27 code from crashing, but is not full expand/contract. Migration 124 is a one-time backfill; afterwards new code runs on channel_* (lease + dedup on channel_*) while pre-cutover code runs on lark_* (lease + dedup on lark_*). If both run concurrently during a rolling deploy, each side claims the same Feishu bot's WS lease on its own table and double-processes inbound events.

This release therefore requires a NON-ROLLING cutover (stop the old hub before applying migration 124 + starting new code; rollback is not lossless once new code writes channel_*). Documented where deployers/reviewers see it: migration 124 header gains a ROLLOUT note; the channel_store.go header is corrected (lark_* tables are retained one release for rollback safety, not "gone"; the store still never touches them). Comment-only — no schema/codegen/behavior change.

MUL-3515

Co-authored-by: multica-agent <github@multica.ai>

* feat(lark): add MULTICA_LARK_HUB_DISABLED switch for the channel cutover

The lark_*->channel_* cutover needs a way to make the Feishu bot briefly unavailable WITHOUT taking down the whole multica-api process — the Lark hub is a goroutine inside it, not a separate Deployment. MULTICA_LARK_HUB_DISABLED=true parks the hub at startup: the API serves HTTP normally but never claims a WS lease or opens a Feishu connection.

Rollout (see migration 124 ROLLOUT note): ship the new release with the flag SET so new pods run API-only while old pods (hub on lark_*) drain during the rolling deploy — the two hubs never overlap. After the old pods are gone and migration 124 has run, flip the flag off; the new hub comes up on channel_*. The old backend does NOT need this switch — its hub stops when k8s terminates the old pods, not via a flag. Nil-ing LarkHub reuses the existing not-configured path so both the startup start and the shutdown join skip it.

MUL-3515

Co-authored-by: multica-agent <github@multica.ai>

* docs(channel): point migration 124 ROLLOUT note at the hub-disable switch

Refine the rollout note to use MULTICA_LARK_HUB_DISABLED for a bot-only cutover (new pods serve API with the hub parked while old pods drain; flip the switch off after the migration), instead of the earlier whole-API recreate. Comment-only.

MUL-3515

Co-authored-by: multica-agent <github@multica.ai>

* docs(channel): fix migration 124 rollout order and document self-host cutover

The previous ROLLOUT note shipped the new (channel_*) build before
running migration 124, so the channel_*-backed HTTP paths (installation
list/install/revoke, chat-session delete, member revoke) would 500 in
the window between new-pod boot and the deferred migration. Restate the
runbook around two explicit invariants — channel_* must exist before the
new build serves those paths, and the old/new hubs must never overlap —
and order the steps so channel_* is created first (park old hub -> snapshot
-> deploy parked new build -> unpark). Document that default self-host
(entrypoint migrate + single-replica Recreate) satisfies both invariants
automatically and needs no manual steps; only prd / multi-replica rolling
self-host needs the switch procedure. Clarify in main.go that the
hub-park switch is generation-agnostic (parks whichever hub the build
carries), which is what enables the preparatory release.

Refs MUL-3515

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-24 12:46:20 +08:00

package lark

import (
	"context"
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"errors"
	"fmt"
	"time"

	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgtype"
	db "github.com/multica-ai/multica/server/pkg/db/generated"
)

// BindingToken is the public shape of a freshly minted token. The raw
// token is returned to the caller exactly once: it is the unguessable
// secret embedded in the binding URL the Bot replies with. After Mint
// returns, only the hash exists server-side; the raw value cannot be
// recovered from the DB.
type BindingToken struct {
	Raw       string
	ExpiresAt time.Time
}

// RedeemedBindingToken is returned to the caller after a successful
// redemption. It carries the identifiers the channel_user_binding row
// was written with.
type RedeemedBindingToken struct {
	WorkspaceID    pgtype.UUID
	InstallationID pgtype.UUID
	LarkOpenID     OpenID
}

// InstallerBinder is the narrow surface RegistrationService needs to
// record the installer's channel_user_binding row in the same business
// step as the installation insert. Without this step the first inbound
// message from the installer would be dropped as `unbound_user` and
// the Bot would reply "you're not bound, click here…" to the person
// who authorized the install seconds earlier.
//
// Implementations MUST be idempotent on (installation_id, open_id):
// a re-install by the same user must not error.
//
// `qtx` is the channel-backed handle to run the bind against. The
// caller opens the transaction so the installation insert and the
// binding write commit together; nil means "use the service's own
// (non-transactional) queries handle".
type InstallerBinder interface {
	BindInstallerTx(ctx context.Context, qtx *ChannelStore, p InstallerBindParams) error
}

// InstallerBindParams carries the inputs InstallerBinder needs. Kept
// as a struct so adding union_id (Phase 2) does not break callers.
type InstallerBindParams struct {
	WorkspaceID    pgtype.UUID
	InstallationID pgtype.UUID
	MulticaUserID  pgtype.UUID // the installer's Multica account
	LarkOpenID     OpenID      // the installer's per-installation open_id
}

// BindingTokenService mints and redeems binding tokens for the
// "you're not bound yet, click here" flow. The TTL is fixed at
// BindingTokenTTL (15 min); the DB CHECK enforces the same cap so a
// misconfigured caller cannot quietly mint a longer-lived token.
//
// Redemption (RedeemAndBind) is transactional: consuming the token
// and inserting the channel_user_binding row commit together, so a
// failed bind never burns a token, and a successful bind never
// leaves a consumed-but-unused token behind.
type BindingTokenService struct {
	queries *ChannelStore
	tx      TxStarter
	now     func() time.Time
}

// NewBindingTokenService constructs the default service, using
// time.Now as the clock. Tests that need deterministic expiry
// behavior should call NewBindingTokenServiceWithClock and pin time.
func NewBindingTokenService(queries *db.Queries, tx TxStarter) *BindingTokenService {
	return NewBindingTokenServiceWithClock(queries, tx, time.Now)
}

// NewBindingTokenServiceWithClock is the seam for tests; production
// callers should use NewBindingTokenService. queries is wrapped in a
// ChannelStore so lark_* calls resolve to channel_* rows (MUL-3515).
func NewBindingTokenServiceWithClock(queries *db.Queries, tx TxStarter, now func() time.Time) *BindingTokenService {
	return &BindingTokenService{queries: NewChannelStore(queries), tx: tx, now: now}
}

// Mint creates a new single-use binding token and returns the raw
// secret + expiry. The raw value MUST be sent over a secure channel
// to the intended recipient (Lark DMs are encrypted in transit by
// the platform) and never logged. Mint is the only function in this
// package that produces a raw token; subsequent reads are by hash.
func (s *BindingTokenService) Mint(ctx context.Context, workspaceID, installationID pgtype.UUID, openID OpenID) (BindingToken, error) {
	raw, err := randomToken(32)
	if err != nil {
		return BindingToken{}, fmt.Errorf("generate token: %w", err)
	}
	hash := hashToken(raw)
	expiresAt := s.now().Add(BindingTokenTTL)
	if _, err := s.queries.CreateLarkBindingToken(ctx, CreateBindingTokenParams{
		TokenHash:      hash,
		WorkspaceID:    workspaceID,
		InstallationID: installationID,
		ChannelUserID:  string(openID),
		ExpiresAt:      pgtype.Timestamptz{Time: expiresAt, Valid: true},
	}); err != nil {
		return BindingToken{}, fmt.Errorf("persist token: %w", err)
	}
	return BindingToken{Raw: raw, ExpiresAt: expiresAt}, nil
}

// RedeemAndBind atomically consumes a raw token and writes the
// channel_user_binding row in a single DB transaction. The redeemer's
// identity is the supplied multicaUserID (taken from the session by
// the handler, never from the token), so a stolen token cannot bind
// a Lark open_id to an attacker's account.
//
// Failure modes are returned as typed errors:
//
//   - ErrBindingTokenInvalid: the token doesn't exist, was already
//     consumed, or has expired. The same opaque error covers all three
//     to avoid a timing oracle for replay races.
//
//   - ErrBindingAlreadyAssigned: a binding already exists for this
//     (installation, open_id) pair, pointing at a DIFFERENT Multica
//     user. The token is NOT consumed in this case: we roll back so
//     the correct holder of the existing binding is not disrupted and
//     ops can still revoke the surplus token explicitly. Account
//     transfer must go through an explicit unbind, not a redemption.
//
//   - ErrBindingNotWorkspaceMember: the redeemer is not a member of
//     the token's workspace, detected by an explicit IsWorkspaceMember
//     check (MUL-3515 §4 removed the composite FK to
//     member(workspace_id, user_id) that used to catch this). Rolled
//     back identically, so the token is not burned.
//
// On the happy path the consume + bind commit together: a successful
// return guarantees both the consumed_at write and the binding row
// landed; a returned error guarantees neither did.
func (s *BindingTokenService) RedeemAndBind(ctx context.Context, raw string, multicaUserID pgtype.UUID) (RedeemedBindingToken, error) {
	if s.tx == nil {
		return RedeemedBindingToken{}, errors.New("lark: BindingTokenService missing TxStarter")
	}
	tx, err := s.tx.Begin(ctx)
	if err != nil {
		return RedeemedBindingToken{}, fmt.Errorf("begin tx: %w", err)
	}
	defer tx.Rollback(ctx)

	qtx := s.queries.WithTx(tx)

	row, err := qtx.ConsumeLarkBindingToken(ctx, hashToken(raw))
	if err != nil {
		if errors.Is(err, pgx.ErrNoRows) {
			return RedeemedBindingToken{}, ErrBindingTokenInvalid
		}
		return RedeemedBindingToken{}, fmt.Errorf("consume token: %w", err)
	}

	// Explicit membership gate. The lark_user_binding -> member FK that
	// used to reject a non-member redeemer is gone (MUL-3515 §4), so we
	// check it here. Returning before Commit rolls the consume back, so
	// a non-member's attempt does not burn the token: the same outcome
	// the FK violation produced.
	isMember, err := qtx.IsWorkspaceMember(ctx, row.WorkspaceID, multicaUserID)
	if err != nil {
		return RedeemedBindingToken{}, fmt.Errorf("check membership: %w", err)
	}
	if !isMember {
		return RedeemedBindingToken{}, ErrBindingNotWorkspaceMember
	}

	_, err = qtx.CreateLarkUserBinding(ctx, CreateUserBindingParams{
		WorkspaceID:    row.WorkspaceID,
		MulticaUserID:  multicaUserID,
		InstallationID: row.InstallationID,
		ChannelUserID:  row.ChannelUserID,
	})
	if err != nil {
		// pgx.ErrNoRows here means the conflict row exists but its
		// multica_user_id differs from ours, so the WHERE clause on
		// the ON CONFLICT DO UPDATE rejected the rebind. See the
		// comment on CreateChannelUserBinding in queries/channel.sql.
		if errors.Is(err, pgx.ErrNoRows) {
			return RedeemedBindingToken{}, ErrBindingAlreadyAssigned
		}
		return RedeemedBindingToken{}, fmt.Errorf("create binding: %w", err)
	}

	if err := tx.Commit(ctx); err != nil {
		return RedeemedBindingToken{}, fmt.Errorf("commit: %w", err)
	}
	return RedeemedBindingToken{
		WorkspaceID:    row.WorkspaceID,
		InstallationID: row.InstallationID,
		LarkOpenID:     OpenID(row.ChannelUserID),
	}, nil
}

// BindInstallerTx is the auto-binding path for the device-flow
// install: the user who just authorized the install is recorded as
// bound to their own open_id, so the first inbound message in the
// bot's DM arrives at a `bound` identity check and the user is NOT
// prompted with a redundant "click here to bind" card.
//
// `qtx` is the RegistrationService's transaction-scoped queries
// handle. The service opens a transaction that wraps the
// channel_installation insert and this binding write so a half-applied
// install (installation row without the installer binding) cannot
// land. When `qtx` is nil the service's own (non-transactional)
// queries handle is used, which is the right behavior for tests that
// don't need atomicity.
//
// Token redemption deliberately does NOT share this code path:
//   - RedeemAndBind consumes a server-minted token in the same tx as
//     the binding insert; that's how anti-replay works.
//   - BindInstallerTx is invoked from the device-flow success hook,
//     where the authoritative proof of identity is the Lark-validated
//     polling response (open_id returned alongside the freshly minted
//     client_id / client_secret). There is no token to consume, and
//     inventing one would only widen the attack surface.
//
// The underlying CreateLarkUserBinding query is idempotent on
// (installation_id, open_id) when multica_user_id matches (the
// ON CONFLICT DO UPDATE gating spelled out in the SQL), so a
// re-install by the same user is a no-op metadata refresh. A
// re-install by a DIFFERENT user surfaces as ErrBindingAlreadyAssigned:
// the registration caller treats that as a hard error and the
// frontend surfaces it as "this Lark account is bound elsewhere",
// preventing one workspace admin from silently rebinding another's
// PersonalAgent install.
func (s *BindingTokenService) BindInstallerTx(ctx context.Context, qtx *ChannelStore, p InstallerBindParams) error {
	q := qtx
	if q == nil {
		q = s.queries
	}

	// Explicit membership gate, replacing the removed member FK
	// (MUL-3515 §4): the installer must be a member of the workspace
	// they are binding into.
	isMember, err := q.IsWorkspaceMember(ctx, p.WorkspaceID, p.MulticaUserID)
	if err != nil {
		return fmt.Errorf("check membership: %w", err)
	}
	if !isMember {
		return ErrBindingNotWorkspaceMember
	}

	_, err = q.CreateLarkUserBinding(ctx, CreateUserBindingParams{
		WorkspaceID:    p.WorkspaceID,
		MulticaUserID:  p.MulticaUserID,
		InstallationID: p.InstallationID,
		ChannelUserID:  string(p.LarkOpenID),
	})
	if err != nil {
		if errors.Is(err, pgx.ErrNoRows) {
			return ErrBindingAlreadyAssigned
		}
		return fmt.Errorf("bind installer: %w", err)
	}
	return nil
}

// ErrBindingTokenInvalid is returned by RedeemAndBind when the token
// hash does not exist, the token has already been consumed, or it
// has expired. The caller must NOT distinguish those sub-cases:
// that distinction enables timing oracles for token replay races and
// adds no product value (the user sees the same "link invalid or
// expired, please request a new one" copy either way).
var ErrBindingTokenInvalid = errors.New("binding token invalid or expired")

// ErrBindingAlreadyAssigned is returned by RedeemAndBind and
// BindInstallerTx when a channel_user_binding row already exists for
// the (installation, open_id) pair and points at a different Multica
// user. Account transfer must go through an explicit unbind flow; a
// binding token cannot be used to grab an already-bound open_id from
// another user.
var ErrBindingAlreadyAssigned = errors.New("lark open_id is already bound to a different user")

// ErrBindingNotWorkspaceMember is returned by RedeemAndBind and
// BindInstallerTx when the user is not (or no longer) a member of the
// target workspace, detected by an explicit IsWorkspaceMember check
// (MUL-3515 §4 removed the member FK that used to enforce this).
// Translated to 403 at the HTTP boundary.
var ErrBindingNotWorkspaceMember = errors.New("redeemer is not a workspace member")
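
// randomToken returns n bytes from crypto/rand, encoded as unpadded
// URL-safe base64 (43 characters for n = 32).
func randomToken(n int) (string, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	// URL-safe so the token embeds cleanly in the binding URL
	// without escaping. RawURLEncoding omits the `=` padding: the
	// token is only ever hashed, never decoded, so padding carries
	// no information and would only clutter user-visible URLs.
	return base64.RawURLEncoding.EncodeToString(buf), nil
}

// hashToken returns the hex-encoded SHA-256 of the raw token. Only
// this digest is persisted, so the DB never holds a redeemable value.
func hashToken(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}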