Commit Graph

150 Commits

Author SHA1 Message Date
Bohan Jiang
64ce459e30 fix(github): preserve early installation webhook metadata (#4193)
Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-17 12:26:25 +08:00
LinYushen
52e76e7b23 MUL-3284: server API + daemon (custom runtime PR2) (#4149)
* MUL-3284: add runtime_profile schema (custom runtime PR1)

Schema-only foundation for custom runtimes. Additive migration 120:

- New workspace-level `runtime_profile` table: the shared, team-visible
  definition of a custom runtime (e.g. an in-house Codex wrapper).
  protocol_family is CHECK-constrained to the exact backend list in
  agent.New() (server/pkg/agent/agent.go). The only args column is
  `fixed_args` (args every agent on the runtime must inherit); there is
  deliberately no generic per-agent args field — those stay on
  agent.custom_args.
- `agent_runtime.profile_id` (nullable, FK -> runtime_profile ON DELETE
  CASCADE): NULL = built-in runtime, non-NULL = a registered instance of
  a custom profile.
- Partial unique index agent_runtime_workspace_daemon_profile_key on
  (workspace_id, daemon_id, profile_id) WHERE profile_id IS NOT NULL.

The legacy UNIQUE (workspace_id, daemon_id, provider) constraint is left
INTACT so the existing registration upsert
(ON CONFLICT (workspace_id, daemon_id, provider) in runtime.sql) keeps
resolving its arbiter and the server stays green. Converting that key to
a partial (WHERE profile_id IS NULL) index and making the upsert
profile-aware is PR2's registration work, not this migration.

Verified up + down against Postgres 17: full `migrate up` applies 120;
schema shows the table, column, partial index and intact legacy
constraint; functional checks pass (partial index blocks dup
(ws,daemon,profile), allows same profile on another daemon; CHECK and
display_name uniqueness reject bad input; legacy ON CONFLICT still
resolves; profile delete cascades to instances); down/up round-trip is
clean.

Co-authored-by: multica-agent <github@multica.ai>

* MUL-3284: drop DB FKs/cascade from runtime_profile migration (review fix)

Per review (house rule: no new database foreign keys / cascades; relational
integrity lives in the application layer):

- runtime_profile.workspace_id: drop REFERENCES workspace ON DELETE CASCADE
  -> plain UUID NOT NULL.
- runtime_profile.created_by: drop REFERENCES "user" ON DELETE SET NULL
  -> plain UUID.
- agent_runtime.profile_id: drop REFERENCES runtime_profile ON DELETE CASCADE
  -> plain UUID.

CHECK constraints, UNIQUE (workspace_id, display_name), the workspace index,
and the partial unique index agent_runtime_workspace_daemon_profile_key are
unchanged. The legacy UNIQUE (workspace_id, daemon_id, provider) constraint
remains untouched.

Behavioral consequence: the database no longer auto-removes a profile's
agent_runtime instance rows on profile delete. That cleanup moves into PR2's
profile-delete path. Up-migration comments document this; down-migration
comment no longer references FKs/cascade.

Re-verified on Postgres 17: migrate up applies 120; no FK constraints exist on
the new columns; partial index still blocks dup (ws,daemon,profile_id); CHECK
and display_name uniqueness still reject bad input; deleting a profile now
leaves the runtime row orphaned (proving cascade is gone); down/up round-trip
clean with the legacy constraint intact.

Co-authored-by: multica-agent <github@multica.ai>

* MUL-3284 PR2 (server): runtime_profile CRUD + profile-aware registration

Server/DB half of the custom-runtime feature.

- Migration 121: convert the legacy UNIQUE (workspace_id, daemon_id, provider)
  constraint on agent_runtime into a partial unique index scoped to built-in
  rows (WHERE profile_id IS NULL). With 120's partial index on profile_id this
  lets one daemon host the built-in provider AND custom profiles of the same
  protocol family without collision.
- Queries: runtime_profile CRUD; ListEnabledRuntimeProfilesForWorkspace
  (daemon-facing); CountAgentsByProfile + DeleteAgentRuntimesByProfile for the
  app-layer cascade; profile-aware UpsertAgentRuntimeWithProfile; the built-in
  UpsertAgentRuntime ON CONFLICT now spells out WHERE profile_id IS NULL so it
  targets the right partial index. sqlc regenerated.
- agent.SupportedTypes / IsSupportedType: single-source protocol_family
  whitelist, in lockstep with agent.New and the migration 120 CHECK.
- Handlers + routes: runtime_profile CRUD (member-read, admin-write) with
  protocol_family whitelist validation, display_name uniqueness (409), and
  fixed_args validation (no generic per-agent args — iron rule); a
  daemon-token endpoint GET /api/daemon/workspaces/{id}/runtime-profiles;
  DeleteRuntimeProfile does the app-layer cascade (delete instance rows then
  profile, in one tx) and refuses (409) while active agents are bound.
- DaemonRegister accepts an optional per-runtime profile_id: validates the
  profile belongs to the workspace and is enabled, registers via the
  profile-aware upsert, and skips legacy hostname merge for custom rows.
  AgentRuntimeResponse now carries profile_id.

Verified on Postgres 17: migrate up through 121; built-in + custom codex
coexist on one daemon; both upsert arbiters are idempotent; delete-by-profile
cascade removes only the custom instance; migrate down reverses 121 then 120
and replays clean. go build ./... and go vet pass; handler test package
compiles.

Daemon-side wiring (fetch profiles, PATH-resolve command_name, register with
profile_id, exec uses command_name) lands in a follow-up commit on this branch.

Co-authored-by: multica-agent <github@multica.ai>

* MUL-3284 PR2 (daemon): pull profiles, PATH-resolve, register, exec command

Daemon-side half of custom runtime profiles, against the server contract on
this branch.

- client.go: GetRuntimeProfiles(workspaceID) -> GET
  /api/daemon/workspaces/{id}/runtime-profiles (mirrors GetWorkspaceRepos);
  RuntimeProfile / RuntimeProfilesResponse types.
- types.go: Runtime gains profile_id (parsed from the register response so
  runtimeIndex carries it).
- daemon.go:
  * appendProfileRuntimes — called inside registerRuntimesForWorkspace before
    the empty-runtimes guard. Best-effort fetch (older server 404s are logged
    and swallowed; never fails registration). Per enabled profile: resolve
    command_name via PATH (exec.LookPath, behind a `lookPath` test hook),
    skip+log when absent, best-effort version probe, record the resolved
    absolute path keyed by profile_id, and append a registration entry
    {name, type=protocol_family, version, status:online, profile_id}. A
    custom-only host (no built-in agents) still registers.
  * profileCommandPaths map (guarded by d.mu) + recordProfileCommandPath /
    customCommandPathForRuntime helpers.
  * runTask: looks up the claimed task's RuntimeID -> profile command path and
    overrides the executable path, synthesizing an AgentEntry so a custom
    runtime runs even when the host has no built-in agent of the same
    provider. provider (=protocol_family) is unchanged so agent.New still
    selects the right backend.
- Tests: GetRuntimeProfiles request shape; profile runtime appended + path
  recorded (custom-only host); profile skipped when command not on PATH;
  profiles-fetch-404 is best-effort; customCommandPathForRuntime bookkeeping.
- agent: lockstep test pinning SupportedTypes to agent.New and the migration
  120 protocol_family CHECK.

Iron rule honored: profile carries no generic per-agent args. fixed_args are
parsed and carried but intentionally NOT wired into the launch command yet
(optional/best-effort; explicit TODO(MUL-3284) in appendProfileRuntimes).

Verified: go build ./... clean; go vet ./internal/daemon/... clean;
go test ./internal/daemon/... pass (existing + 5 new); full
go test ./internal/handler/ suite passes against a migrated Postgres 17;
agent lockstep test passes.

Co-authored-by: multica-agent <github@multica.ai>

* MUL-3284 PR2: profile delete runs full archived-agent cascade (fix 500)

Review fix. DeleteRuntimeProfile previously guarded only on ACTIVE agents, but
agent.runtime_id is ON DELETE RESTRICT — a profile whose runtimes had only
ARCHIVED agents passed the guard, then DeleteAgentRuntimesByProfile hit the FK
and the handler 500'd.

Now it mirrors the mature runtime-delete cascade (DeleteAgentRuntime): in one
transaction it enumerates the profile's runtime rows, refuses (409) any with
active agents or active squads led by archived agents, then for each runtime
pauses autopilots pinned to its archived agents, drops archived squads led by
them, and hard-deletes the archived agents before removing the runtime rows
and the profile. No code path can now fall through to a raw FK error.

- queries: ListAgentRuntimeIDsByProfile (sqlc regen). Reuses the existing
  per-runtime teardown queries (CountActiveSquadsWithArchivedLeadersByRuntime,
  ListArchivedAgentIDsByRuntime, PauseAutopilotsByAgentAssignees,
  DeleteSquadsByArchivedAgentsOnRuntime, DeleteArchivedAgentsByRuntime).
- tests: TestDeleteRuntimeProfile_ArchivedAgentCascade (archived-only profile
  deletes cleanly: 204, runtime + archived agent + profile gone) and
  TestDeleteRuntimeProfile_ActiveAgentBlocks (active agent → 409, survives).

Verified against Postgres 17: both new tests pass; full handler suite, daemon
tests, and agent lockstep test pass; go vet clean.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-06-17 11:33:09 +08:00
LinYushen
32dac3dd57 MUL-3284: runtime_profile schema (custom runtime PR1) (#4140)
* MUL-3284: add runtime_profile schema (custom runtime PR1)

Schema-only foundation for custom runtimes. Additive migration 120:

- New workspace-level `runtime_profile` table: the shared, team-visible
  definition of a custom runtime (e.g. an in-house Codex wrapper).
  protocol_family is CHECK-constrained to the exact backend list in
  agent.New() (server/pkg/agent/agent.go). The only args column is
  `fixed_args` (args every agent on the runtime must inherit); there is
  deliberately no generic per-agent args field — those stay on
  agent.custom_args.
- `agent_runtime.profile_id` (nullable, FK -> runtime_profile ON DELETE
  CASCADE): NULL = built-in runtime, non-NULL = a registered instance of
  a custom profile.
- Partial unique index agent_runtime_workspace_daemon_profile_key on
  (workspace_id, daemon_id, profile_id) WHERE profile_id IS NOT NULL.

The legacy UNIQUE (workspace_id, daemon_id, provider) constraint is left
INTACT so the existing registration upsert
(ON CONFLICT (workspace_id, daemon_id, provider) in runtime.sql) keeps
resolving its arbiter and the server stays green. Converting that key to
a partial (WHERE profile_id IS NULL) index and making the upsert
profile-aware is PR2's registration work, not this migration.

Verified up + down against Postgres 17: full `migrate up` applies 120;
schema shows the table, column, partial index and intact legacy
constraint; functional checks pass (partial index blocks dup
(ws,daemon,profile), allows same profile on another daemon; CHECK and
display_name uniqueness reject bad input; legacy ON CONFLICT still
resolves; profile delete cascades to instances); down/up round-trip is
clean.

Co-authored-by: multica-agent <github@multica.ai>

* MUL-3284: drop DB FKs/cascade from runtime_profile migration (review fix)

Per review (house rule: no new database foreign keys / cascades; relational
integrity lives in the application layer):

- runtime_profile.workspace_id: drop REFERENCES workspace ON DELETE CASCADE
  -> plain UUID NOT NULL.
- runtime_profile.created_by: drop REFERENCES "user" ON DELETE SET NULL
  -> plain UUID.
- agent_runtime.profile_id: drop REFERENCES runtime_profile ON DELETE CASCADE
  -> plain UUID.

CHECK constraints, UNIQUE (workspace_id, display_name), the workspace index,
and the partial unique index agent_runtime_workspace_daemon_profile_key are
unchanged. The legacy UNIQUE (workspace_id, daemon_id, provider) constraint
remains untouched.

Behavioral consequence: the database no longer auto-removes a profile's
agent_runtime instance rows on profile delete. That cleanup moves into PR2's
profile-delete path. Up-migration comments document this; down-migration
comment no longer references FKs/cascade.

Re-verified on Postgres 17: migrate up applies 120; no FK constraints exist on
the new columns; partial index still blocks dup (ws,daemon,profile_id); CHECK
and display_name uniqueness still reject bad input; deleting a profile now
leaves the runtime row orphaned (proving cascade is gone); down/up round-trip
clean with the legacy constraint intact.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-06-17 11:32:55 +08:00
LinYushen
99afb82c50 Add index on "user".created_at (#4063)
Adds migration 119 creating idx_user_created_at on "user"(created_at)
using CREATE INDEX CONCURRENTLY, matching the repo convention for
index-only migrations (114/115).

Co-authored-by: multica-agent <github@multica.ai>
2026-06-12 15:53:55 +08:00
Bohan Jiang
2e0b0bb776 fix(db): drop FK on agent_task_queue.initiator_user_id (MUL-2645) (#3959)
The initiator_user_id column (added in 117) carried a foreign key to the
"user" table. Adding that FK also locks the hot "user" table at migration
time, which made the ALTER time out on a busy production deploy. The
column only feeds a best-effort name/email lookup at claim time (a stale
id just yields no `## Task Initiator` section), so referential integrity
is not load-bearing.

- Edit 117 to add a plain `UUID` column (no FK). The original timed-out
  deploy never recorded 117, so its retry now runs the FK-free version.
- Add 118 to `DROP CONSTRAINT IF EXISTS` for environments that already
  applied the constraint-bearing 117 (they skip the edited 117 by
  version). All environments converge to a plain, FK-free column.

No code/codegen change: dropping the FK does not affect the Go column
type, so sqlc output is unchanged. Verified locally: 118 drops the FK and
keeps the column; sqlc regen produces no diff; build/vet/tests pass.

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-09 19:59:20 +08:00
Bohan Jiang
24b162cdbc feat(daemon): surface the real task initiator to the agent runtime (MUL-2645) (#3899)
* feat(daemon): surface the real task initiator to the agent runtime (MUL-2645)

In a multi-person workspace the agent runtime only ever saw the runtime
OWNER identity: the brief's `## Requesting User` is sourced from
runtime.OwnerID and the task-scoped token is owner-bound, so every
requester (whoever commented, @mentioned, or chatted) appeared to the
agent as the owner. Agents that route by initiator for permission,
privacy, or audit all misjudged.

Resolve the real task initiator at claim time and surface it distinctly
from the owner:
- comment / mention trigger -> triggering comment's author (member or agent)
- chat task -> chat session creator (sessions are creator-only)
- on-assign / autopilot / quick-create -> no attributable initiator (omitted)

Adds initiator_{type,id,name,email} to the claim response, the daemon
Task, and TaskContextForEnv, rendered into the brief as a new
`## Task Initiator` section. The section documents the privacy boundary:
the agent's credentials stay owner-scoped, so this is an attested
identity for the agent's own routing/privacy logic, not act-as. No DB
migration — both paths are derivable from existing rows.

Tests: brief rendering (member/agent/omit/sanitize) + email guard unit
tests, and claim-handler tests for the comment and chat paths.

Co-authored-by: multica-agent <github@multica.ai>

* fix(chat): store real sender as task initiator, not chat_session creator (MUL-2645)

Review fix (Niko, PR #3899). v1 resolved the chat task initiator from
chat_session.creator_id at claim time. That is correct for web chat and
Lark p2p (creator == sender), but WRONG for Lark group chats: the group
session creator is deliberately the installer (stable identity across
member churn), not the message sender. So in a Lark group, every member
who triggered the agent showed up in the brief as the installer/owner —
the exact bug this issue is about, still live at that entry point.

Capture the real sender at enqueue time instead of deriving it from the
session creator at claim time:

- migration 117: agent_task_queue.initiator_user_id (FK user, ON DELETE
  SET NULL); NULL for non-chat and pre-migration rows.
- EnqueueChatTask now takes an explicit initiatorUserID. Web chat passes
  the authenticated request user; the Lark dispatcher threads the inbound
  sender (binding.MulticaUserID) through scheduleRun -> flushChatRun. The
  debouncer keeps the latest scheduled flush per session, so in a multi-
  sender silence window the LATEST sender wins (documented + tested).
- claim handler resolves the initiator from task.initiator_user_id and
  drops the creator_id fallback entirely.

The Lark group session creator stays the installer (unchanged) — only the
task initiator is corrected, keeping the two concepts cleanly separate.

Tests: dispatcher group regression (initiator = sender, not installer),
latest-sender-wins, p2p initiator assertion; the chat claim handler test
now sets creator != initiator and asserts the stored sender wins.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-08 19:29:57 +08:00
Bohan Jiang
6ac8314711 feat(lark): support both Feishu and Lark from one deployment (MUL-3083) (#3815)
* feat(lark): serve Feishu and Lark from one deployment, per installation

The Lark integration was locked to a single open-platform host chosen
deployment-wide (MULTICA_LARK_HTTP_BASE_URL / _CALLBACK_BASE_URL,
defaulting to open.feishu.cn), so one deployment could talk to only the
mainland Feishu cloud OR Lark international — never both. Teams on the
other tenant could not use the integration at all.

Make the host per-installation. The device-flow installer already
auto-detects the tenant (Lark emits tenant_brand="lark" mid-poll); we now
persist that as lark_installation.region, carry it on
InstallationCredentials.Region, and resolve the open-platform host per
call (REST + WS bootstrap) from the region. An explicit cfg.BaseURL
(env / httptest) still overrides every region, so existing tests and
staging/proxy setups keep working.

- migration 116: lark_installation.region TEXT NOT NULL DEFAULT 'feishu'
  CHECK (region IN ('feishu','lark')) — existing rows are all mainland.
- lark.Region enum + OpenPlatformBaseURL/RegionOrDefault helpers.
- registration: thread the detected region into finishSuccess so the
  install-time GetBotInfo hits the right cloud AND the row records it.
- every credential-build site (patcher, replier, WS provider, union_id
  backfill) copies region off the installation row.
- region is part of the WS supervisor fingerprint so a re-install that
  switches cloud restarts the connection.
- API: surface region on the installation listing DTO.

MUL-3083

Co-authored-by: multica-agent <github@multica.ai>

* feat(lark): surface installation region in settings UI

Read the per-installation region off the listings response: build the
"Manage in Lark" dev-console host from it (open.feishu.cn vs
open.larksuite.com instead of a hardcoded mainland host) and render a
Feishu / Lark badge on each connected bot. The field is optional and
defaults to Feishu when an older server omits it (API-compat). Adds the
region_feishu / region_lark labels to all four locales.

MUL-3083

Co-authored-by: multica-agent <github@multica.ai>

* docs(lark): document simultaneous Feishu + Lark support

The cloud each bot belongs to is now auto-detected at install and stored
per installation, so one deployment serves both. Replace the old
"point MULTICA_LARK_HTTP_BASE_URL at larksuite for international tenants"
guidance (now just an optional override) in all four locales.

MUL-3083

Co-authored-by: multica-agent <github@multica.ai>

* fix(lark): repair legacy Lark-international installs on upgrade

Review follow-up (MUL-3083). Migration 116 backfilled every existing
lark_installation to region='feishu', assuming all historical rows were
mainland. But self-host deployments could already run Lark international
via the deployment-wide MULTICA_LARK_HTTP_BASE_URL override, so those
rows are really Lark — clearing the override after upgrade (which the new
docs invite) would route them to open.feishu.cn and break them.

Add a one-shot startup repair, BackfillRegionFromLegacyOverride, fired
off the hot path like BackfillBotUnionIDs: when the deployment's global
base-URL override targets open.larksuite.com, relabel the still-default
'feishu' rows to 'lark'. Gating on the deployment-wide override is what
makes it safe — every pre-existing install on such a deployment was Lark.
Idempotent; no-op on mainland / fresh deployments. Verified end-to-end
against a scratch DB (flip then 0-row idempotent re-run).

Also document that a Lark/飞书 app_id is globally unique across both
clouds, which is what makes the app_id-keyed token cache and the
UNIQUE(app_id) constraint safe across regions (review nit).

MUL-3083

Co-authored-by: multica-agent <github@multica.ai>

* docs(lark): fix ops guidance to match auto per-installation region

Review follow-up (MUL-3083). .env.example and docker-compose.selfhost.yml
still told operators that international Lark requires pointing both base
URLs at open.larksuite.com — now wrong, and it would push a fresh
deployment back into a single-cloud override. Rewrite them: the base
URLs are optional deployment-wide overrides; normal dual-cloud operation
keeps them empty. Document the first-boot auto-relabel for deployments
migrating off the old single-cloud override, across the integration docs
(en/zh/ja/ko).

MUL-3083

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-05 16:03:13 +08:00
LinYushen
3caba86b09 feat(scheduler): DB-backed execution-record scheduler [MUL-2957] 2026-06-05 13:46:26 +08:00
Multica Eve
945d684afd MUL-2965 fix(metrics): harden business sampler quality (#3738)
* fix(metrics): harden business sampler quality (MUL-2965)

Co-authored-by: multica-agent <github@multica.ai>

* fix(metrics): alert on sampler acquire failures

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: Eve <eve@multica-ai.local>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-04 11:21:51 +08:00
Bohan Jiang
8c98940b79 Lark Bot integration MVP: migration + service boundary (MUL-2671) (#3277)
* feat(db): add Lark integration migration (MUL-2671)

Introduces seven tables for the 飞书 Bot integration MVP — per-agent
PersonalAgent installations, user/chat bindings, inbound dedup +
non-content drop audit, outbound card mapping, and short-lived
single-use member binding tokens.

Schema notes:
- chat_session schema unchanged; Lark routes through a separate
  binding table rather than adding a metadata JSONB column.
- Outbound card mapping is task/message scoped so multiple runs on
  the same session can't stomp each other's cards.
- lark_inbound_audit stores routing / identity / drop_reason ONLY,
  never message body — the audit channel for unbound users and group
  messages that don't address the Bot.
- app_secret stores ciphertext (encryption helper lands in a follow-up
  commit on this branch); DB never sees plaintext.

Co-authored-by: multica-agent <github@multica.ai>

* feat(util): add secretbox AES-256-GCM helper for at-rest secrets

First consumer is lark_installation.app_secret (MUL-2671 §4.4), but
the helper is intentionally generic — future per-tenant secrets that
must not appear in a DB dump can reuse it.

Construction: AES-256-GCM with a per-message random nonce, providing
authenticated encryption. Tampered ciphertext fails Open instead of
silently decrypting to garbage. Master key loaded from a base64 env
var via LoadKey; key rotation is not in scope yet.

Co-authored-by: multica-agent <github@multica.ai>

* refactor(issues): extract IssueService.Create as single create entry (MUL-2671)

Establishes the service-layer boundary mandated by Elon's 二审 of
MUL-2671 §4.8: issue creation no longer lives inside the HTTP
handler. Both the HTTP POST /issues handler and the future Lark
/issue command call into service.IssueService.Create, so duplicate
guard, issue numbering, attachment linking, broadcast, analytics,
and agent/squad enqueue stay aligned.

Handler responsibilities shrink to parsing the HTTP request, doing
actor resolution / validation (transport-specific), and converting
service results into the IssueResponse + 201. The transaction-wrapped
core, attachment link, event publish, analytics capture, and
agent/squad enqueue all move into service.IssueService.Create.

A BroadcastPayload callback on the service keeps the WS broadcast
shape (the full IssueResponse) without forcing the service to
depend on handler-layer response types.

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations): add Lark package skeleton (MUL-2671)

Establishes the architectural boundaries Elon's 二审 mandated as
first-PR blockers without dragging in OAuth, WebSocket, or
card-patching code (those land in follow-up PRs):

- ChatSessionService interface — channel-aware chat-session entry
  point for Lark, deliberately separate from the HTTP SendChatMessage
  handler. The HTTP handler's single-creator guard (creator_id ==
  request user_id) is correct for the browser client but rejects
  group chat_sessions by construction; Lark needs its own service.
- AuditLogger interface — the only path for recording dropped events.
  Its signature deliberately omits message body, enforcing the
  drop-audit policy (MUL-2671 §4.7) at the type level: unbound users
  and non-addressed group messages can't accidentally end up in
  chat_session.
- Typed IDs (OpenID, ChatID) prevent UUIDs from being conflated with
  Lark-side identifiers at compile time.
- DropReason constants align dashboard/audit queries across callers.

Co-authored-by: multica-agent <github@multica.ai>

* refactor(issues): move parent/project workspace check into IssueService (MUL-2671)

Parent existence and project workspace membership now live inside
IssueService.Create, inside the same transaction as the duplicate guard
and counter increment. The HTTP handler stops re-implementing the
lookup; every future create entry (Lark /issue, MCP, API keys) inherits
the same boundary without copy-pasting the SQL.

Adds two error sentinels (ErrParentIssueNotFound, ErrProjectNotFound)
so transports can translate to their own error shapes. Handler-level
cross-workspace tests guard the boundary against future regressions.

Co-authored-by: multica-agent <github@multica.ai>

* fix(db): harden Lark migration safety底座 — TTL cap + workspace FK (MUL-2671)

Two storage-layer hardenings that move the must-fix line off "the app
layer enforces it" and onto the schema itself, so future write paths or
hand-inserted rows cannot regress the invariants.

1) lark_binding_token TTL cap. The DB CHECK was 1 hour as
   defense-in-depth while the app constant was 15 minutes; the CHECK
   now matches the product cap (15 minutes). Application constant
   docstring updated to reflect that storage enforces the same bound.

2) lark_user_binding workspace membership. The table previously only
   FK'd to workspace / user / installation independently, so a binding
   could exist for a user no longer in the workspace, or claim a
   workspace different from its installation's. Two composite FKs
   close the gap structurally:

   * (installation_id, workspace_id) → lark_installation(id, workspace_id)
     — guarantees a binding's workspace_id always matches its
     installation's workspace_id. A new UNIQUE (id, workspace_id) on
     lark_installation is added as the FK target.

   * (workspace_id, multica_user_id) → member(workspace_id, user_id)
     with ON DELETE CASCADE — when a user is removed from the
     workspace, the binding cascades away in the same transaction.
     There is no longer a path where lark_user_binding outlives
     workspace membership.

These two FKs are the schema-level proof for §4.3's "unbound or
non-workspace members cannot leak content into chat_session" invariant.

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations/lark): inbound services + /issue dispatcher (MUL-2671)

Lands the inbound service layer for the Lark Bot MVP, sitting on top
of the migration + service-boundary scaffold from the previous commits.
What ships:

- sqlc queries for all seven lark_* tables (idempotent dedup insert,
  CAS WS-lease, single-use binding-token consume, etc.) plus
  GetMostRecentUserChatMessage for the /issue fallback.
- AuditLogger backed by lark_inbound_audit; signature deliberately
  body-free so callers cannot leak content into the drop log.
- ChatSessionService: find-or-create chat_session via the binding
  table (winner-takes-all on the UNIQUE race), append-with-dedup, /issue
  parser, "previous user message" fallback for bare `/issue` invocation.
- Dispatcher orchestrates the inbound pipeline in one place:
  installation routing → group-mention filter → identity check → ensure
  session → append+dedup → /issue → enqueue chat task. Group sessions
  use the installer as creator (stable workspace identity); p2p uses
  the sender. Agent-offline path falls through with OutcomeAgentOffline
  so the WS adapter can reply with the offline notice from §4.6.
- BindingTokenService: random URL-safe token, SHA-256 stored hash,
  15-min TTL pinned at the application AND the DB CHECK; Redeem
  returns the same opaque error for all rejection cases (no timing
  oracle on replay).
- Unit tests for the parser (13 cases), dispatcher (8 cases via fake
  Queries/Chat/Audit/IssueCreator/Enqueuer), and binding-token
  hash/entropy. Real-DB integration tests for OAuth + token redeem
  land alongside the HTTP handlers in the next commit.

Out of scope for this commit (next ones on the same feature branch):
OAuth callback, HTTP routes, WebSocket hub, outbound card patcher,
frontend.

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations/lark): installation HTTP surface + secretbox-gated wiring (MUL-2671)

Lands the HTTP boundary on top of the inbound services from the
previous commit. What ships:

- InstallationService.Upsert: the only path that writes
  lark_installation. Encrypts app_secret with the secretbox passed in
  at construction time; refuses to fall back to plaintext storage
  (returns an error from the constructor if no Box is supplied), so a
  misconfigured dev environment cannot accidentally land a row with
  cleartext credentials. Revoke flips status without DELETE so audit
  trail survives.

- HTTP handlers under /api/workspaces/{id}/lark/:
  * GET  /installations           — member-visible (Integrations tab
                                    renders for non-admins). Soft 200
                                    with empty list + configured:false
                                    when MULTICA_LARK_SECRET_KEY is
                                    unset, so the tab does not error
                                    on self-host that has not opted in.
  * POST /installations           — admin-only; 503 when not configured.
                                    Re-validates agent_id ∈ workspace
                                    before accepting credentials so a
                                    cross-workspace agent UUID is
                                    rejected.
  * DELETE /installations/{id}    — admin-only; workspace-scoped lookup
                                    so one workspace cannot revoke
                                    another's installation by UUID
                                    guess.

- POST /api/lark/binding/redeem (user-scoped, no workspace context):
  the only path that mints a lark_user_binding row from user action.
  Redeemer identity comes from the session, not the token, so a stolen
  link cannot bind an open_id to an attacker's Multica user. The
  composite FK on lark_user_binding cascades the binding away if the
  user is not (or no longer) a workspace member, so a non-member who
  steals the link gets 403 at the DB layer.

- Two new event-bus types in protocol.events:
  EventLarkInstallationCreated, EventLarkInstallationRevoked.

- Router wiring: MULTICA_LARK_SECRET_KEY drives a conditional
  initialization of h.LarkInstallations + h.LarkBindingTokens. When
  unset, the integration disables itself with an INFO log and the
  rest of the server boots normally.

- Handler tests cover all four not-configured short-circuits.
  Happy-path integration tests (real DB, full create→list→revoke
  cycle and token mint→redeem) ship alongside the WS hub PR.

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): close binding-token rebind & typed task errors (MUL-2671)

Two must-fixes from PR review on HEAD 87ad15e1:

1. Binding-token redeem could be used to grab an already-bound Lark
   open_id. Two changes harden the path:
   - lark.sql `CreateLarkUserBinding` now gates ON CONFLICT DO UPDATE
     on `multica_user_id = EXCLUDED.multica_user_id`, so a cross-user
     rebind via a second valid token returns zero rows instead of
     silently switching ownership.
   - `BindingTokenService.RedeemAndBind` consumes the token and writes
     the binding row inside one transaction. A failed bind no longer
     burns the token; a successful bind never leaves a consumed-but-
     unused token. Distinct typed errors: ErrBindingTokenInvalid (410),
     ErrBindingAlreadyAssigned (409), ErrBindingNotWorkspaceMember
     (403). The handler maps each to its own status code.

2. Dispatcher collapsed every `EnqueueChatTask` error to
   `OutcomeAgentOffline`, hiding infra failure and misusing the
   "offline" label for cases (e.g. archived agent) where it doesn't
   fit. Now:
   - `service.EnqueueChatTask` returns `ErrChatTaskAgentNoRuntime` and
     `ErrChatTaskAgentArchived` as sentinel errors; DB / load / insert
     failures stay wrapped as ordinary errors.
   - Dispatcher uses `errors.Is` to map only the productizable cases
     (`OutcomeAgentOffline`, new `OutcomeAgentArchived`); any other
     error is returned to the WS adapter so it can retry or page
     instead of disguising the outage as an offline card.

A daemon that's merely disconnected is still NOT an error — as long
as `agent.runtime_id` is set the chat task enqueues and waits for the
daemon to claim it on next online (returns `OutcomeIngested`).

Co-authored-by: multica-agent <github@multica.ai>

* ci: re-trigger workflow on lark MVP must-fix HEAD

Co-authored-by: multica-agent <github@multica.ai>

* ci: re-trigger workflow on lark MVP must-fix HEAD (retry)

Co-authored-by: multica-agent <github@multica.ai>

* test(integrations/lark): guard binding-token sentinel contract (MUL-2671)

Two unit tests that document and protect the must-fix invariants
without requiring a DB:

1. TestRedeemAndBindRequiresTxStarter — if a future refactor wires
   up BindingTokenService without a TxStarter, RedeemAndBind must
   fail fast with a clear error rather than nil-panic on Begin.
   The atomicity contract (consume + bind commit together) depends
   on that transaction existing.

2. TestBindingErrorSentinelsAreDistinct — the HTTP handler maps
   ErrBindingTokenInvalid → 410, ErrBindingAlreadyAssigned → 409,
   ErrBindingNotWorkspaceMember → 403. Accidentally aliasing them
   (e.g. var ErrBindingAlreadyAssigned = ErrBindingTokenInvalid)
   would silently regress the response codes without any other
   test catching it.

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations/lark): WS hub orchestrator + outbound card patcher (MUL-2671)

The hub owns one supervisor goroutine per active installation. Each
supervisor acquires the WS lease via the existing CAS query, runs an
EventConnector (interface — real Lark wire protocol lands in a follow-up
behind it), renews the lease on a tighter cadence than the TTL, and
backs off (with jitter) on connector failure. Lease loss tears the
connector down cleanly; revocation is reaped on the next sweep. Per-
process node id satisfies §4.4 multi-replica safety: at most one Hub
globally holds the lease for any installation.

The patcher subscribes to task / chat-done events on the existing
events.Bus and keeps the per-task Lark interactive card in sync
(thinking → streaming → final | error). Card binding is per-task as
required by §4.5; throttled patches via an in-memory last-patched map;
final / error transitions bypass the throttle so the user always sees
the terminal state. The Renderer is plug-replaceable so the product
card template can evolve without touching transport.

The APIClient interface centralizes the Lark Open Platform surface
this package needs (send card, patch card, send binding prompt,
exchange OAuth code). The default stubAPIClient returns
ErrAPIClientNotConfigured for every transport call so a misconfigured
deployment fails loudly instead of dropping cards silently. Real
implementation lands in a follow-up; OAuth callback + frontend entries
land in the next commits on this branch.

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations/lark): OAuth install start / callback (MUL-2671)

OAuthService builds a signed-state Lark authorization URL the frontend
can render as a QR (or open directly), then on callback verifies the
HMAC-protected state, exchanges the OAuth code for installation
credentials via APIClient.ExchangeOAuthCode, and persists the row via
InstallationService.Upsert (which keeps app_secret encryption inside a
single chokepoint).

State token format: workspaceID.agentID.initiatorID.expiresUnix.nonce.sig
— HMAC-SHA256 over the first five fields with a deployment-level
secret. TTL defaults to 10 minutes (covered by tests). Three failure
modes (invalid state / expired state / missing code) map to typed
errors so the HTTP handler can emit a single lark_error= query param
the frontend uses to pick copy.

Both endpoints degrade cleanly: the at-rest key gate (already in place)
returns 503 from /install/start when the InstallationService is nil,
and the OAuth gate (MULTICA_LARK_OAUTH_APP_ID / _SECRET / _REDIRECT_URI
/ _STATE_SECRET) returns configured:false from /install/start so the
frontend can render "configure manually instead" without an error
banner. /install/callback always finishes with a redirect to
/settings?tab=lark carrying either lark_installed=1 or lark_error=<code>.

Tests cover signed-URL shape, missing-config rejection, tampered state,
expired state, propagated exchange error, and the no-config redirect
path on the HTTP handler.

Co-authored-by: multica-agent <github@multica.ai>

* feat(views/lark): settings tab + agent bind button + /lark/bind redemption page (MUL-2671)

Adds the user-facing Lark surface across the shared packages:

- packages/core/types/lark.ts — wire shapes that mirror server/internal/
  handler/lark.go. Optional fields default to undefined so older desktop
  builds keep parsing if the server adds new keys (CLAUDE.md → API
  Response Compatibility).
- packages/core/lark/{queries,index}.ts — Tanstack Query options keyed
  by workspace id; realtime sync invalidates `installations(wsId)` on
  `lark_installation:*` events.
- packages/core/api/client.ts — listLarkInstallations,
  getLarkInstallURL, deleteLarkInstallation, redeemLarkBindingToken.
- packages/views/settings/components/lark-tab.tsx — Settings → Lark
  panel. Listing is member-visible (matches backend); disconnect is
  admin-only. Empty state points users at the per-Agent bind entry,
  matching the (workspace_id, agent_id) UNIQUE: there is no
  "pick an agent" UI here because the bind URL is per-agent.
- LarkAgentBindButton (same file) is the per-Agent CTA the Agent
  detail page imports. Opens the OAuth URL in a new tab; the callback
  bounces back to /settings?tab=lark with a query param the panel
  reads for inline confirmation copy.
- packages/views/lark/bind-page.tsx — the Bot's "you need to bind"
  destination. Requires session before redeeming, distinguishes the
  410/409/403 backend responses into distinct copy.
- apps/web/app/lark/bind/page.tsx — Next.js route wrapping the shared
  bind page in a Suspense boundary (Next 15 useSearchParams rule).

i18n: all user-facing strings land in en/zh-Hans, settings tab nav
includes a Sparkles-iconed Lark entry, bind-page copy lives under
common.lark_bind so it works pre-workspace-context too. typecheck +
lint clean.

Co-authored-by: multica-agent <github@multica.ai>

* chore(integrations/lark): wire outbound Patcher into server bootstrap (MUL-2671)

Constructs the Patcher next to the existing Installation/BindingToken
wiring in router.go and Register()s it on the event bus. With the stub
APIClient any actual transport call surfaces ErrAPIClientNotConfigured;
once the real Lark client lands, swap NewStubAPIClient for the real
implementation here without touching the Patcher's subscription logic.

doc.go updated to reflect everything the package now contains (Hub,
Patcher, OAuthService, APIClient interface). The Hub itself is NOT
booted here yet — it needs an EventConnector implementation for the
Lark long-connection wire protocol, which lands in a follow-up; the
orchestrator code and its unit tests are in place so that follow-up
can focus on the WS protocol rather than lifecycle plumbing.

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): address Elon 二审 5 must-fix items (MUL-2671)

- Hub: renewer cancels run ctx on lease loss so the connector exits
  even if its wire I/O is blocked, keeping the §4.4 ownership
  invariant intact under lease theft.
- Hub: EventEmitter returns (DispatchResult, error) so the real
  connector can post the matching Lark-side card (needs_binding,
  agent_offline, agent_archived) and react to infra failures instead
  of silently logging at the seam.
- Dispatcher: top-level message_id dedup runs before group filter
  and identity check, so a reconnect storm cannot re-fire binding
  prompts or re-spam not_addressed_in_group audit rows; the in-
  AppendUserMessage dedup is removed since the table-level UNIQUE
  is the ultimate backstop.
- OAuth: HandleCallback auto-binds the installer via the new
  InstallerBinder seam (BindingTokenService implements it), so the
  §2.1 "scan to bind, you're done" promise holds end-to-end.
  validateExchangeResult now requires installer open_id; new error
  reason codes wired through the callback redirect.
- Frontend / handler: install_supported listing field + StartLark-
  Install short-circuit on stub APIClient hide install entry points
  (Settings tab + per-agent button) while no real Lark HTTP client
  is wired, so users do not land in an OAuth flow that fails at
  exchange.

Includes tests for each fix (lease-loss cancel, emit error
propagation, dedup ordering, OAuth installer-bind contract, stub-
client install gate) and i18n strings for the new preview state.

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): two-phase dedup so infra failures do not swallow messages (MUL-2671)

The pre-fix top-level dedup wrote the lark_inbound_message_dedup row before
EnsureChatSession / AppendUserMessage. An infra error in either step left
the row in place and a WS-adapter retry was mis-classified as a duplicate,
so the user's Lark message was permanently lost without ever landing in
chat_session.

Make dedup two-phase:

  - ClaimLarkInboundDedup acquires an in-flight claim (processed_at NULL).
    Stale claims older than 60 s are re-takeable so a process crash does
    not strand the message_id.
  - MarkLarkInboundDedupProcessed flips processed_at on durable success
    (audit row OR chat_message + session touch).
  - ReleaseLarkInboundDedup deletes the in-flight row on infra failure
    before any durable side effect, so the retry can re-claim immediately.

Dispatcher.Handle now finalizes the claim exactly once based on whether
the inner pipeline reached a durable outcome — chat_message commit being
the transition point (errors past it Mark, errors before it Release).

Regression tests cover the two failure variants Elon flagged plus the
inverse invariants (durable-error Marks, drops Mark, in-flight replays
drop, stale claims re-claim).

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): owner-fence dedup claim to close the double-write windows (MUL-2671)

The two-phase Claim/Mark/Release fix from the previous commit closed the
"infra error swallows a replay" gap but left two windows that could still
write a chat_message twice for the same Lark message_id:

  1. Stale-reclaim race. Worker A claims at t=0, runs slowly past the
     60 s staleness TTL but is still alive. Worker B sees the row as
     stale and re-takes the claim. A reaches AppendUserMessage and
     commits a second chat_message.

  2. Mark window. Worker A commits chat_message but the post-pipeline
     MarkLarkInboundDedupProcessed fails (DB hiccup) or the process
     crashes before it runs. 60 s later a retry treats the in-flight
     row as stale, re-claims it, and writes a second chat_message.

Close both with owner fencing + same-tx Mark:

  - lark_inbound_message_dedup now carries a `claim_token` UUID;
    ClaimLarkInboundDedup mints a fresh one on insert and on stale
    re-take, so a reclaim ROTATES the token.

  - MarkLarkInboundDedupProcessed and ReleaseLarkInboundDedup are
    fenced on (message_id, claim_token, processed_at IS NULL) and
    return rowsAffected. Zero means our token is no longer live, and
    the caller treats it as a no-op (not an error).

  - AppendUserMessage invokes MarkLarkInboundDedupProcessed INSIDE its
    chat_message+session tx (qtx). If the token has been rotated by a
    concurrent reclaim, the Mark matches zero rows and the method
    returns ErrClaimLost; the deferred Rollback unwinds the
    chat_message insert, so the other holder is the sole writer. The
    durable write and the Mark therefore commit (or roll back)
    atomically — there is no "committed but not yet Marked" window
    for a crash or retry to exploit.

Dispatcher.processClaimed now returns a tri-state dedupFinalize directive
(none / mark / release): finalizeNone for the in-tx Mark path (and
ErrClaimLost), finalizeMark for audit-drop branches and the defensive
post-Append-success fallback, finalizeRelease for pre-durable infra
errors. ErrClaimLost is translated into OutcomeDropped + DropReason-
Duplicate at the Handle boundary, matching what the WS adapter expects
for a "another worker is the writer" outcome.

Regression tests:

  - TestDispatcher_StaleReclaimRaceDoesNotDoubleWrite injects worker
    B's reclaim via a beforeAppend hook so the claim_token rotates
    between Claim and AppendUserMessage. Asserts worker A's
    AppendUserMessage returns ErrClaimLost (no chat_message
    committed), the dispatcher surfaces a duplicate drop, the token
    rotated to a value distinct from A's original, and a follow-up
    replay still duplicate-drops.

  - TestDispatcher_InTxMarkPreventsPostCommitReclaim verifies the
    "Mark window" case is unreachable: a successful in-tx Mark
    produces exactly one Mark call (no post-finalize duplicate), the
    row is terminal, and a retry with dedupReclaim=true still
    duplicate-drops without re-rotating the token.

  - TestDispatcher_InTxMarkSucceedsAndSkipsPostFinalize pins the
    positive contract: DedupMarked=true must make applyFinalize a
    no-op (no extra Mark, no Release).

fakeQueries gains a fakeDedupRow model carrying (processed, token,
rotations) so the test seam matches production's UPDATE-with-WHERE
semantics; fakeChat gains a beforeAppend hook to inject race timing.

go test ./... and go vet ./... pass.

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations/lark): real Lark HTTP APIClient for IM v1 send/patch (MUL-2671)

Lands the production Lark Open Platform HTTP APIClient that replaces
the stub for outbound transport. The patcher's "thinking → streaming
→ final | error" card lifecycle and the dispatcher's binding-prompt
card both now reach Lark for real once MULTICA_LARK_HTTP_ENABLED=true.

Scope of this stage:

  - tenant_access_token retrieval via /open-apis/auth/v3/
    tenant_access_token/internal, cached in-process per app_id with a
    60s safety margin against Lark's `expire` value. Sub-2-minute
    expires are clamped to 120s so we never cache an entry that's
    already past its safe window.
  - SendInteractiveCard: POST /open-apis/im/v1/messages?receive_id_type=chat_id
    returning the Lark message_id the Patcher persists in
    lark_outbound_card_message for later patches.
  - PatchInteractiveCard: PATCH /open-apis/im/v1/messages/:id with
    the full re-rendered card body (Lark's update endpoint replaces,
    not deep-merges).
  - SendBindingPromptCard: open_id-targeted interactive card with a
    primary "去绑定" CTA pointing at the redemption URL. Template is
    co-located with the transport so the dispatcher never has to know
    about Lark's card schema.
  - Token-error invalidation: Lark codes 99991663 (expired) /
    99991664 (invalid) drop the cached token so the next call
    refreshes from /tenant_access_token/internal instead of looping
    on a stale entry.

Out of scope (deferred to follow-up stages):

  - ExchangeOAuthCode stays unimplemented behind
    ErrAPIClientNotConfigured. The PersonalAgent install handshake's
    response shape (returning per-installation app credentials in a
    single call) is not yet verified against the production endpoint,
    and a silent mis-fill of OAuthExchangeResult would corrupt
    lark_installation rows past validateExchangeResult. Operators
    continue to use the manual-paste InstallationService path until
    the OAuth stage lands.
  - Inbound WS EventConnector — Hub's ConnectorFactory still needs a
    real wire-protocol implementation.

Wiring:

  - MULTICA_LARK_HTTP_ENABLED=true switches router.go from the stub
    to the real client. MULTICA_LARK_HTTP_BASE_URL overrides the
    default open.feishu.cn host (set to open.larksuite.com for the
    Lark international tenant, or to an httptest URL for integration
    tests).
  - The OAuth handler now also receives the real client (its
    ExchangeOAuthCode still surfaces ErrAPIClientNotConfigured, so
    callback behavior is unchanged until that stage lands).

Tests (19 new cases against an httptest.Server fake):

  - happy path send/patch/binding-prompt round trips, asserting URL
    query params, body shape, Authorization header
  - token cache: 3 sends share one /tenant_access_token/internal hit
  - token refresh after clock-driven expiry
  - sub-margin expire clamping (10s expire → cached for >= safety
    margin of wall-clock)
  - Lark error code surfacing (230001 send, 230002 patch, 10003 auth)
  - token-expired (99991663) invalidates the cache; caller's retry
    re-fetches and succeeds
  - non-2xx HTTP status surfaces "http 500: …"
  - input validation: missing chat_id short-circuits BEFORE auth
    round-trip, missing card json / open_id / bind url all fail
    pre-flight without hitting Lark
  - ExchangeOAuthCode still returns ErrAPIClientNotConfigured
  - binding-prompt template carries the BindURL and the localized
    "去绑定" CTA in valid JSON

go build ./..., go vet ./..., and go test ./internal/integrations/lark/...
pass. Pre-existing handler/router integration tests that require a
real Postgres connection are unaffected by this change.

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): split outbound vs OAuth-install capability + card update_multi (MUL-2671)

Address Elon's two must-fix items from the HEAD a09993b1 review:

1. HTTP outbound and OAuth-install are now distinct APIClient
   capabilities. The new SupportsOAuthInstall() reports whether the
   install flow can succeed end-to-end (i.e. ExchangeOAuthCode is
   implemented); the real httpAPIClient still returns IsConfigured()
   = true (send / patch / binding prompt work) but
   SupportsOAuthInstall() = false until the PersonalAgent install-time
   response shape is pinned. Handler-side `install_supported` and
   StartLarkInstall now gate on SupportsOAuthInstall, so a half-wired
   client never reveals the scan-to-bind UI. larkOAuthErrorReason also
   maps ErrAPIClientNotConfigured to a dedicated
   `oauth_exchange_unimplemented` reason so a raw callback hit no
   longer masquerades as `internal_error`.

2. defaultRenderer now emits config.update_multi=true on every Kind.
   Lark refuses to apply PatchInteractiveCard to a card whose initial
   config doesn't declare it shared/updatable, so the absent flag
   would make every patch after the first send silently no-op on the
   wire while the local outbound status row still flipped to
   streaming/final.

Tests cover both halves of each fix:
- TestHTTPClient_SupportsOAuthInstall_FalseUntilExchangeLands +
  TestHTTPClient_StubReportsBothCapabilitiesFalse pin the new
  capability surface.
- TestStartLarkInstall_TransportOnlyClientReportsNotConfigured +
  TestListLarkInstallations_TransportOnlyClientReportsInstallNotSupported
  pin the handler gate at exactly the half-wired state.
- TestLarkOAuthErrorReason_APIClientNotConfigured pins the mapping
  for both the bare sentinel and the fmt.Errorf-wrapped form
  HandleCallback produces.
- TestDefaultRendererConfigCarriesUpdateMulti covers every CardKind.
- TestHTTPClient_(Send|Patch)InteractiveCard_DefaultRendererBodyHasUpdateMulti
  verify the wire body Lark actually receives carries update_multi
  through both send and patch transport paths.

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations/lark): real OAuth code exchange + agent-detail bind entry (MUL-2671)

Stages the install side of the MVP critical path on top of the real
HTTP outbound work:

- httpAPIClient.ExchangeOAuthCode runs the production Lark v2 OAuth
  flow: POST /authen/v2/oauth/token to swap the authorization code
  for the installer's open_id, then GET /bot/v3/info under the parent
  app's tenant_access_token to fetch bot_open_id. Result feeds
  InstallationParams unchanged so OAuthService.HandleCallback's
  auto-bind step lights up automatically.

- HTTPClientConfig gains OAuthAppID/OAuthAppSecret, read from the same
  MULTICA_LARK_OAUTH_APP_ID/_APP_SECRET env vars the OAuthConfig
  consumes. SupportsOAuthInstall now mirrors that pair so the install
  capability gate is honest: outbound transport without OAuth creds
  reports configured-but-not-install-supported, exactly like before.

- Agent detail inspector wires the LarkAgentBindButton in a new
  Integrations section, viewer-hidden by canEdit. The button still
  self-hides when SupportsOAuthInstall is false, so a deployment
  without OAuth creds renders the section empty rather than CTA-broken.

- Capability wording cleaned across handler / router / lark-tab to say
  "OAuth-install capability" instead of "real APIClient wired", and
  the misleading TransportOnly... test was renamed/refocused on the
  early-return branch it actually exercises (Elon non-blocking note).

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): identity-only OAuth + atomic bind (MUL-2671)

Addresses Elon's round-4 must-fix items on PR #3277:

1. OAuth v2 token → user_info chain now matches Lark's official
   user-OAuth shape. `httpAPIClient.ExchangeOAuthCode` POSTs
   /open-apis/authen/v2/oauth/token (RFC 6749: top-level
   access_token, NO open_id), then GETs /open-apis/authen/v1/user_info
   with the user_access_token as Bearer to obtain the installer's
   open_id / union_id. The test fixture now reflects the real
   wire shape (separate user_info handler; no synthetic open_id in
   the token response).

2. `OAuthExchangeResult` is identity-only — drops the synthesized
   shared-parent AppID / AppSecret / BotOpenID return that broke
   the UNIQUE(app_id) constraint and the dispatcher's per-app_id
   routing. `OAuthService.HandleCallback` no longer Upserts an
   installation row: it looks up the lark_installation already
   provisioned via the manual-paste POST /lark/installations route
   and binds the installer onto it. Two new typed errors —
   ErrInstallationNotProvisioned and ErrInstallationRevoked — map
   to `installation_not_provisioned` / `installation_revoked`
   reasons at the HTTP boundary so the UI can guide the admin.
   The PersonalAgent install API (which would deliver
   per-installation bot credentials at scan time) remains a
   follow-up; until it lands the OAuth flow is identity-binding
   only and the agent-detail bind button stays hidden on
   deployments without OAuth env (capability gate unchanged).

3. The installation lookup + installer bind run inside a single
   DB transaction so a concurrent revoke / re-provision between
   the read and the binding insert cannot leak a half-applied
   state. `InstallerBinder.BindInstaller` is renamed to
   `BindInstallerTx` and accepts the OAuth-service-owned
   transaction's qtx; the binding_token redemption path is
   unchanged.

4. `validateExchangeResult` is simplified to require only the
   installer's open_id; the obsolete ErrExchangeMissingAppID /
   AppSecret / BotOpenID sentinels are removed (no caller can
   trip them now). The oauth_test suite is rewritten to use a
   stub failTxStarter so tests covering state-token verification
   and exchange-error propagation remain DB-free, while a new
   TestOAuthCallbackOpensTxAfterValidExchange pins the post-must-fix
   order (state ok + exchange ok ⇒ Begin runs before any lookup
   or bind, and a Begin failure aborts cleanly with no bind).

Verified locally:
  - go build ./... / go vet ./... clean
  - go test ./internal/integrations/lark/... ✓
  - go test ./internal/handler -run 'Lark|Binding|OAuth' ✓
  - go test ./internal/util/secretbox/... ./internal/service/... ✓

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations/lark): device-flow scan-to-install (MUL-2671)

Replaces the manual paste-credentials install path + identity-only
OAuth callback (rejected in product review: too many steps before a
user sees value) with a true single-step scan-to-install built on
Lark's RFC 8628 device-flow registration endpoint
(POST accounts.feishu.cn/oauth/v1/app/registration) — the same
protocol the official larksuite/oapi-sdk-go/scene/registration
package and zarazhangrui/feishu-claude-code-bridge use.

User journey: admin clicks "Bind to Lark" on the Agent detail page
→ QR dialog opens → admin scans in the Lark app on their phone →
authorizes the new PersonalAgent → dialog auto-closes with the new
installation visible. No app_id / app_secret to copy, no Lark
developer console visit, no Multica-side OAuth env to configure.

Backend (server/internal/integrations/lark):
- registration.go — inline ~280-line RFC 8628 client. Begin posts
  archetype=PersonalAgent / auth_method=client_secret /
  request_user_info=open_id; Poll follows the upstream SDK's
  state machine including the tenant-brand mid-stream domain swap
  to accounts.larksuite.com when a Lark-international account
  authorizes. SDK is NOT vendored — one endpoint isn't worth
  dragging the full oapi-sdk-go + transitive deps.
- registration_service.go — owns the in-process session store
  + background polling goroutine. On success calls APIClient.GetBotInfo
  (the new IM-side endpoint added below) and writes
  lark_installation + the installer's lark_user_binding inside
  one DB transaction so a half-applied install can never land.
  Stable error_reason codes (expired / access_denied /
  lark_protocol_error / bot_info_failed / installation_conflict /
  installer_bind_failed / internal_error) drive the UI copy
  without parsing prose.
- client.go / http_client.go — drops ExchangeOAuthCode and
  SupportsOAuthInstall (no longer applicable: device-flow returns
  identity alongside credentials in one response); adds GetBotInfo
  which mints a tenant_access_token from the freshly-minted
  client_id / client_secret and calls /open-apis/bot/v3/info for
  the bot_open_id. install_supported now gates on IsConfigured()
  (real HTTP client wired) instead of a separate OAuth capability.
- binding_token.go — absorbs InstallerBindParams / InstallerBinder
  (previously in oauth.go), retargets the doc-comment from the
  OAuth caller to the device-flow caller.
- Deletes oauth.go + oauth_test.go entirely.

Handler & router (server/internal/handler, server/cmd/server):
- POST /api/workspaces/{id}/lark/install/begin — opens a new
  registration session, returns {session_id, qr_code_url,
  expires_in_seconds, poll_interval_seconds}. Admin-only.
- GET /api/workspaces/{id}/lark/install/{sessionId}/status —
  polling endpoint, returns {status, installation_id?, error_reason?,
  error_message?}. Workspace-scoped lookup so a stolen session_id
  cannot be polled from another workspace. Admin-only.
- Removes POST /lark/installations (paste form),
  GET /lark/install/start (OAuth-redirect entry), and
  GET /api/lark/install/callback (OAuth redirect target).
- Removes MULTICA_LARK_OAUTH_APP_ID / _APP_SECRET / _REDIRECT_URI /
  _STATE_SECRET / _AUTHORIZE_URL / _SUCCESS_URL env vars. Self-host
  operators no longer need a parent Lark app at all.

Frontend (packages/core, packages/views):
- New types BeginLarkInstallResponse / LarkInstallStatusResponse
  + matching API methods (beginLarkInstall / getLarkInstallStatus);
  drops getLarkInstallURL.
- LarkAgentBindButton opens LarkInstallDialog instead of a
  window.open() to Lark's authorize page. The dialog uses
  react-qr-code (catalog) to render the verification_uri_complete
  inline as SVG (no external CDN image), polls status at the
  server-supplied cadence, auto-closes on success, offers
  "scan again" on terminal failure. Per CLAUDE.md "Enum drift
  downgrades, not crashes", error_reason switch has a default
  fallback so an older desktop build on a newer server still
  renders the generic failure copy.
- Adds the device-flow strings to en + zh-Hans settings.json;
  removes the obsolete OAuth-not-configured copy.

Verified locally:
  - go build ./... / go vet ./... clean
  - go test ./internal/integrations/lark/... — all green
    (existing tests + 15 new registration / GetBotInfo tests)
  - go test ./internal/handler -run 'Lark|Binding' — all green
  - pnpm typecheck — all 6 packages clean
  - pnpm lint — 0 errors (15 pre-existing warnings, none in changed files)
  - pnpm --filter @multica/views test — 859/859 pass

Pre-existing failures in server/internal/middleware (column
"profile_description" missing from local test DB) reproduce against
the parent commit and are unrelated to this change.

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): gate bind CTA to workspace admins, terminate QR polling on 4xx (MUL-2671)

Two frontend must-fixes from the PR #3277 二审:

1. LarkAgentBindButton now self-hides for non-admin viewers in addition
   to the existing install_supported check. The agent-detail page mounts
   the button under `canEdit`, which canEditAgent lets agent owners
   through even when they are not workspace admins — but the backend
   gates POST /lark/install/begin and the status poll on owner/admin
   (router.go:478-487), so the previous behavior shipped a CTA that was
   guaranteed to 403. The new gate reads workspace role from the same
   member list the settings tab already uses.

2. The status polling loop now terminates on 404 (session gone — server
   restarted, multi-instance routing, or in-process GC swept it) and
   403/401 (permission revoked mid-session). Previously every error
   path scheduled another setTimeout, which trapped the user on a stale
   QR forever. ApiError gives us the HTTP status verbatim; terminal
   responses set status=error with stable error_reason codes
   (session_lost, forbidden) that flow through the existing dialog
   switch + retry/close affordances. 5xx + network blips still retry.

i18n: new install_error_session_lost / install_error_forbidden in en
and zh-Hans, with default fallback preserved per the enum-drift rule.

Coverage: 6 new vitest cases — admin/owner allow, member deny,
unsupported-install deny, and the two terminal-error polling paths
using fake timers to assert the loop stops scheduling.

Also clears a handful of stale OAuth/manual-install doc comments
flagged in the review (non-blocker cleanup): doc.go's §10 now points
at RegistrationService, installation.go's input-shape doc loses the
OAuth-callback half, and client.go's stubAPIClient comments no longer
reference OAuth callbacks.

Co-authored-by: multica-agent <github@multica.ai>

* docs(integrations/lark): describe gate as device-flow install in agent-detail integrations comment (MUL-2671)

The comment block above the agent-detail Integrations section still
described the capability gate as 'server-side OAuth-install'. The
OAuth path is gone — install is now device-flow per RFC 8628 — so the
comment now reads 'server-side device-flow install capability gate'.

Pure comment change; behavior is unchanged. Cleans up the nit Elon
called out in PR #3277 二审 (MUL-2671).

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations/lark): wire inbound pipeline + WS Hub at boot (MUL-2671)

Stage 3.a of MUL-2671. Hub class, Dispatcher, ChatSessionService and
AuditLogger have all been implemented and tested in prior PRs but
none of them was constructed at boot, so the in-process plumbing
was never exercised end-to-end. This change wires them together
behind the same `MULTICA_LARK_SECRET_KEY` gate that already gates
InstallationService / RegistrationService, and starts the Hub under
the existing `sweepCtx` so it winds down alongside the other
long-running workers after HTTP drain.

The real long-conn EventConnector is still pending; the factory
hands every supervisor a shared NoopConnector that holds the lease
and emits nothing. That lets staging exercise the lease /
supervisor / shutdown lifecycle against real DB rows without
committing to the Lark wire protocol implementation. Swapping in
the real connector is a single line change in the same router
block; the Dispatcher / ChatSessionService / Hub seams stay frozen.

## Why a noop placeholder, not a stub-or-skip

The Hub's value is mostly its lifecycle: §4.4 ownership lease,
LeaseRenewInterval / LeaseTTL, supervisor reap on revoke, clean
release on shutdown. None of that runs unless the Hub is actually
started. Holding off until the real connector lands means the next
PR has to debut both pieces simultaneously; wiring the supervisor
loop first lets the real connector PR be a focused, reviewable
swap.

## Changes

- `internal/integrations/lark/noop_connector.go` — `NoopConnector`
  implementing `EventConnector`: blocks on ctx until the Hub
  cancels (lease loss / shutdown / revoke), emits no events, logs
  on enter/exit so operators see exactly which installation the
  supervisor is holding the lease for.
- `internal/integrations/lark/noop_connector_test.go` — verifies
  the connector blocks until ctx cancel, returns nil on clean exit,
  never invokes the emit callback, and the factory shares a single
  connector instance across installations.
- `internal/handler/handler.go` — new `LarkHub *lark.Hub` field on
  `Handler`. Nil when the Lark integration is disabled.
- `cmd/server/router.go` — inside the existing Lark wiring block,
  construct `AuditLogger`, `ChatSessionService` (with `*pgxpool.Pool`
  for the in-tx dedup Mark), `Dispatcher` (wiring `h.IssueService`
  and `h.TaskService` so `/issue`-created issues share counter /
  duplicate guard / project boundary / broadcast / analytics with
  the rest of the product), and the `Hub` with the
  `NoopConnectorFactory`. `NewRouterWithOptions` now returns
  `(chi.Router, *handler.Handler)` so main.go can drive Hub
  lifecycle; `NewRouter` discards the handler.
- `cmd/server/main.go` — start the Hub under `sweepCtx` after the
  other background workers, and `Wait` on it after HTTP drain +
  sweep cancel so the lease renewer can issue a final release
  before exit. Skipped entirely when `h.LarkHub == nil`.

## Test plan

- [x] `go build ./...` clean
- [x] `go vet ./...` clean
- [x] `go test ./internal/integrations/lark/...` (new noop tests +
      existing hub / dispatcher / chat_service / registration /
      binding_token / outbound / issue_command suites) — all pass
- [x] `go test ./internal/handler -run 'TestLark|TestRedeemLarkBinding'`
      pass — handler-side Lark surfaces unchanged
- [x] `go test ./internal/service/... ./internal/util/secretbox/...`
      pass
- [x] `pnpm --filter @multica/views exec vitest run settings/components/lark-tab`
      pass (6/6) — frontend lark surfaces unchanged
- [ ] Local broad `go test ./internal/handler/...` still blocked by
      the pre-existing test DB schema drift Elon flagged in the
      previous round (`column "metadata" does not exist`,
      unrelated to this change); CI is the authoritative check.
- [ ] Manual end-to-end deferred until the real long-conn
      EventConnector lands (next stage).

MUL-2671

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): bound Hub lease release + shutdown wait (MUL-2671)

Lease release used context.Background(); a stalled DB pool could pin
shutdown indefinitely. Add LeaseReleaseTimeout (5s default) and
ShutdownTimeout (15s default) to HubConfig, route releaseLease through
a bounded context, and expose WaitWithTimeout for main.go so a wedged
supervisor degrades to LeaseTTL expiry on the next replica instead of
blocking process exit. Also correct the LarkHub field comment in
handler.go: the Hub is wired whenever the at-rest secret key is set,
independent of whether the outbound HTTP APIClient is configured.

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations/lark): real WS long-conn connector + ctx-cancel-breaks-read (MUL-2671)

Replaces NoopConnectorFactory with a production EventConnector that
opens Lark's event-subscription WebSocket. Gated behind
MULTICA_LARK_WS_ENABLED so staging boots stay on the noop path until
operators opt in, and falls back to noop with a warning when the WS
flag is set without MULTICA_LARK_HTTP_ENABLED (the real connector
needs the cached tenant_access_token).

Why this connector exists separately from the Hub: gorilla/websocket
ReadMessage blocks on the underlying TCP socket and does not observe
context. The watchdog goroutine inside WSLongConnConnector.Run closes
the conn the moment ctx fires, so lease loss / shutdown breaks the
blocking read in bounded time — exactly the invariant Hub
renewLeaseUntil's runCancel depends on for the "at most one active WS
per installation across replicas" guarantee. Tests cover this
explicitly (TestWSConnectorRunReturnsOnCtxCancelEvenWhenReadIsBlocked).

The Lark wire surface is split into three swappable seams so the
transport layer stays tested in isolation:

  - EndpointFetcher (POST /event-subscription/v1/connection_token)
    resolves a one-shot wss URL per Run. No caching — replaying a
    one-shot token would look like a Lark outage.
  - FrameDecoder turns one raw JSON envelope into an InboundMessage
    or a "control / heartbeat / drop" verdict. Decoder errors log
    + drop the frame; they do NOT tear down the connection.
  - CredentialsProvider wraps InstallationService.DecryptAppSecret
    so plaintext app_secret lives in memory only during a Run.

Also fixes the handler.go LarkHub comment: it still said "joins on
Wait during graceful shutdown" but main.go has used WaitWithTimeout
(bounded wait) for several commits. Comment now matches.

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations/lark): align WS to official binary Frame protocol + DispatchResult outbound replies (MUL-2671)

Two must-fix items from Elon's review of PR #3277:

1. WS protocol layer rewritten to match the official Lark Go SDK
   (`larksuite/oapi-sdk-go/v3/ws`):
   - Bootstrap is `POST /callback/ws/endpoint` with AppID/AppSecret
     in the body (no tenant_access_token bearer). Response carries
     wss URL + ClientConfig (PingInterval / ReconnectInterval /
     ReconnectNonce / ReconnectCount).
   - `service_id` is parsed from the wss URL query and used as
     Frame.Service on every outbound frame.
   - Wire envelope is the binary protobuf `pbbp2.Frame` (hand-rolled
     via protowire to avoid pulling the whole SDK in, byte-identical
     field tags). JSON payloads are nested inside Frame.Payload.
   - Inbound data frames are ACKed with a `Response{code:200,...}`
     JSON payload that reuses the inbound headers; infra failures
     produce code=500 so Lark retries.
   - Ping is the app-layer binary `NewPingFrame(serviceID)` at the
     server-supplied cadence; WebSocket protocol PING is removed
     (Lark ignores it). Server-initiated pings get a pong reply.
   - ctx-cancel-breaks-read invariant preserved via the watchdog
     goroutine that closes the conn on ctx.Done; the read loop and
     ping goroutine serialize their writes through a single mutex.

2. `DispatchResult` outbound replies wired via a new `OutcomeReplier`:
   - `OutcomeNeedsBinding` mints a one-shot binding token and sends
     the binding prompt card to the sender's open_id.
   - `OutcomeAgentOffline` / `OutcomeAgentArchived` push a notice
     card into the chat with the agent name + Chinese copy matching
     §4.6.
   - `OutcomeIngested` stays owned by the Patcher; `OutcomeDropped`
     is silent.
   - The replier is best-effort: outbound failures are logged and
     swallowed so a Lark outage cannot stall the inbound pipeline.
   - Hub installs the noop replier by default; router wires the
     production `LarkOutcomeReplier` when APIClient.IsConfigured().

PersonalAgent long-conn risk surfaced (open per Feishu docs:
`长连接模式仅支持企业自建应用`). The implementation works for any
app archetype; the open question is whether `/callback/ws/endpoint`
accepts PersonalAgent credentials in practice. Surfacing the Lark
code+msg verbatim from the bootstrap response so an operator running
the smoke test sees the exact failure rather than a generic timeout.

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): byte-compat Frame marshal, chunk reassembly, ACK off reply critical path (MUL-2671)

Three protocol blockers from Elon's review of 9540008a:

1. Frame.Marshal is now byte-identical to oapi-sdk-go/v3/ws/pbbp2.Frame:
   - SeqID/LogID/Service/Method (proto2 req) emit unconditionally even at zero
   - PayloadEncoding/PayloadType/LogIDNew emit unconditionally per gogo
     generated MarshalToSizedBuffer (no zero-guard)
   - Payload uses the SDK's `!= nil` guard (nil omits, []byte{} emits 0-length)
   - ACK payload JSON matches SDK's NewResponseByCode + json.Marshal output
     ({"code":N,"headers":null,"data":null})

   Golden tests pin exact byte sequences for ping/pong/ACK/full/zero
   frames; verified against the real SDK pbbp2.pb.go MarshalToSizedBuffer
   producing identical bytes.

2. Multi-frame events (sum>1) are reassembled via the new chunkAssembler:
   - 5s sliding TTL (matches SDK combine() cache TTL)
   - Lazy GC on admit (no separate sweeper goroutine)
   - Out-of-order seq + duplicate seq idempotent
   - Partial chunks are NOT ACKed (SDK behaviour: only the final chunk's
     ACK confirms the whole event so Lark can retry on partial loss)
   - Connector wires assembler per-Run; state dies with the session

3. OutcomeReplier detached from ACK critical path:
   - HubConfig.ReplyTimeout default 2.5s, strictly under Lark's 3s ACK deadline
   - handleEvent dispatches synchronously (fast DB path), then spawns the
     replier under a fresh background ctx with WithTimeout(ReplyTimeout)
   - Hub.replyWg tracks in-flight replies; Hub.Wait / WaitWithTimeout
     drain them so shutdown is bounded
   - Noop replier short-circuits inline (no goroutine cost when outbound
     APIClient isn't configured)

   Proof tests:
   - TestHubScheduleReplyReturnsImmediately: scheduleReply with a 10s
     slow replier returns in <50ms
   - TestHubReplyTimeoutCancelsHungReplier: hung replier ctx fires at
     ReplyTimeout
   - TestHubWaitDrainsInFlightReplies: Wait blocks until replies finish
   - TestHubACKNotBlockedByOutboundReply: end-to-end through the
     connector — data-frame ACK lands within 500ms even when the
     replier hangs 5s

PersonalAgent real-env smoke remains Bohan's decision; this PR closes
the technical blockers Elon flagged.

Co-authored-by: multica-agent <github@multica.ai>

* docs(service/issue): narrow position concurrency claim to create-create (MUL-2671)

Elon's review of the merge resolution flagged that the comment on the
new NextTopPosition call promised more than the code guarantees:
concurrent manual reorder via UpdateIssue(position) does NOT take the
workspace row lock that IncrementIssueCounter holds, so a create
racing a reorder can still land on the same position. Rewrite the
comment to only claim create-create serialization, which is the
behaviour the lock actually delivers. No code change.

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): keep device-flow polling on RFC 8628 HTTP 400 (MUL-2671)

Lark's device-flow polling endpoint returns HTTP 400 with the JSON
body `{"error":"authorization_pending"}` while the user hasn't scanned
the QR yet — this is the RFC 8628 spec, and the upstream oapi-sdk-go
implements the same handling. Our previous doForm treated ANY non-2xx
as a terminal protocol error, so every install session was killed by
the first poll (~5s after begin) and the install dialog appeared
silently empty: the frontend received status=error +
lark_protocol_error before the user could even read the description.

Fix: doForm now decodes the JSON body first; if it parses, the caller
(Begin / Poll) routes on the body's `error` field, where the existing
switch correctly maps authorization_pending / slow_down to "keep
polling" and access_denied / expired_token to terminal failure. Only
unparseable bodies (5xx HTML proxy pages, gateway timeouts) still
surface as a typed http_NNN RegistrationError.

Three regression tests pin the new behaviour:
- HTTP 400 + authorization_pending → res.Status="authorization_pending"
- HTTP 400 + access_denied → res.Err.Code="access_denied" (terminal)
- HTTP 502 + HTML body → http_502 RegistrationError

Verified against the live local env: install/begin -> 200, status
stays "pending" through the first poll cycle, no longer flips to
"error" within seconds.

Co-authored-by: multica-agent <github@multica.ai>

* fix(views/lark): reset closedRef on every mount so StrictMode double-mount renders QR (MUL-2671)

Empty QR dialog body in the dev env: Bohan opened the bind dialog and
got an empty white area where the QR should have been — no QR, no
"starting" placeholder, no error text. Backend was returning the QR
URL correctly; the bug was on the frontend.

Root cause: React 19 / Next.js dev StrictMode mounts every component
twice (mount → cleanup → mount). The component instance is REUSED
across the simulated remount, which means useRef objects are
preserved. The dialog's `closedRef` lifecycle:

  1. Mount #1: closedRef={current:false}, beginSession() kicked off
     (HTTP request still in flight)
  2. Cleanup runs: closedRef.current=true
  3. Mount #2: beginSession() kicked off again, BUT the ref still
     reads {current:true} from step 2
  4. Both promises resolve. Both hit the post-await guard
     `if (closedRef.current) return;` and bail out before setSession().
  5. Result: session stays null forever. Every conditional in the
     dialog body (beginning/session-pending/success/error) is false →
     empty body.

Fix: reset closedRef.current=false at the START of the effect, not
just at component construction. The cleanup-then-mount pair now
re-arms the guard so subsequent setSession calls actually land.

Regression test wraps the dialog in <StrictMode> and asserts the
QR appears within 2s with the correct value — fails closed if anyone
removes the reset.

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): drop EventTaskCompleted subscription so the chat reply doesn't get overwritten by "Done." (MUL-2671)

Bohan reproduced on the live dev env: agent replies show only a card
saying "Done." in Lark, even though Multica's own chat panel has the
real "Hello! I'm cc…" reply. Tasks succeed end-to-end, but the user
loses the reply on the Lark side.

Root cause: TaskService.CompleteTask publishes two events for every
chat task IN ORDER:

  1. broadcastChatDone(...)       → ChatDonePayload{Content: "Hello!..."}
  2. broadcastTaskEvent(Completed) → map[string]any{task_id, agent_id,...}
                                     (no `content` key)

The Patcher subscribed to BOTH and routed each to finalize(). The
first patch correctly rendered the reply text, the second
patched the same card with an empty payload — chatDoneContent()
returned "" and the renderer fell back to "Done." (default empty-body
copy). The second patch wins because Lark stores whatever was last
applied.

Fix: stop subscribing to EventTaskCompleted in the Patcher and remove
the corresponding switch arm. EventChatDone is the canonical "agent
finished replying" signal for the Lark card path; EventTaskCompleted
is still emitted to the bus for other listeners (web UI, analytics,
task usage) where the lack of content doesn't matter.

Regression test TestPatcherIgnoresEventTaskCompletedForChatTasks
emits ChatDone followed by TaskCompleted on a streaming card and
asserts: exactly one patch, body contains the agent reply, body does
NOT contain "Done.". If anyone re-adds the EventTaskCompleted
subscription, this fails immediately.

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations/lark): chat replies as plain text IM messages, not card chrome (MUL-2671)

Bohan reported on the live dev env that even with the agent's reply
shown correctly, every message is wrapped in an interactive card with
the agent name as the header — it feels like a system notification,
not a normal chat reply. He wants the reply to land as a regular Lark
text bubble.

Changes:

- Add APIClient.SendTextMessage backed by Lark's
  /open-apis/im/v1/messages with msg_type=text. JSON-encodes the
  {"text": ...} envelope Lark requires so callers pass raw strings.
- Patcher.Register no longer subscribes to EventTaskQueued /
  EventTaskRunning. There is no more thinking → running → final
  card lifecycle on the success path: it added card chrome without
  buying anything for free-form chat.
- On EventChatDone, the new sendChatReply path posts the assistant
  message content as plain text. Empty content is silently dropped
  rather than rendered as "Done." (the prior fallback that
  confused Bohan).
- Failure path keeps a one-shot error card on EventTaskFailed —
  the visual distinction from a normal reply is genuinely useful,
  and failures are rare enough that the chrome isn't noisy.
- Throttle / lastPatched map / MinPatchInterval / shouldPatch /
  markPatched / loadCardOrSkip are all removed; nothing in the new
  flow patches.

Tests:

- TestPatcherSendsPlainTextOnChatDone pins the new contract: exactly
  one SendTextMessage call, no card sends or patches, content
  matches the ChatDonePayload.
- TestPatcherDropsEmptyChatReply pins the "no more Done. fallback"
  decision — empty content drops, period.
- TestPatcherFailEventSendsErrorCard pins the failure path still
  uses a card (one-shot, no patching).
- TestPatcherIgnoresEventTaskCompletedForChatTasks rewritten for
  text path: ChatDone then TaskCompleted yields exactly one text
  send, no duplicate.
- TestPatcherSkipsWhenNoChatSessionBinding and
  TestPatcherSwallowsInstallationLoadErrors rewritten to drive
  EventChatDone (the new entry point) instead of TaskQueued.
- TestPatcherSendsThinkingCardOnTaskQueued deleted (no more
  thinking card).

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations/lark): pre-fill PersonalAgent bot name as "<agent> - Multica" (MUL-2823) (#3520)

The device-flow install left the bot at Lark's auto-generated
"{用户姓名}的智能助手". Lark's registration scene supports pre-filling the
name via a `name` query param on the verification/QR URL (mirrors the
upstream SDK's AppPreset.Name) — a user-editable default that rides on
the QR URL, not the begin POST body (which has no name field).

BeginInstall already loads the agent for its ownership check, so we keep
it and thread `<agent.Name> - Multica` through Begin → decorateQRCodeURL.
A blank name degrades to plain "Multica".

There is no post-install rename API (bot/v3 is read-only; no
bot/v3/update), so the install-time pre-fill is the only programmatic
lever; the user can still edit the name on the creation form.

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): restore /issue confirmation + pin SendTextMessage wire (MUL-2671)

Two recovered/added contracts off Trump's review of HEAD fe381a07:

1) /issue confirmation in Lark was a casualty of the plain-text
   refactor. The pre-refactor `RenderInput.IssueNumber` field was
   declared but never actually rendered into the card body, so even
   in the original card-based flow the user never saw a "Created
   [MUL-42]" confirmation. Now the OutcomeReplier handles
   OutcomeIngested + IssueID.Valid by sending a plain text message:

     Created MUL-42 — fix login bug
     https://multica.example/issues/MUL-42

   Composed from a new DispatchResult.IssueIdentifier +
   IssueTitle, populated by the Dispatcher from
   workspace.IssuePrefix + issue.Number / issue.Title. Workspace
   lookup is best-effort: a Postgres blip on workspace gets a "#42"
   fallback rather than silently dropping the confirmation.

   The agent's own chat reply (if any) continues to land separately
   via the Patcher on EventChatDone — these are two semantically
   distinct messages and the user benefits from seeing both.

2) SendTextMessage is the wire layer Trump flagged for missing
   coverage. Three new wire tests pin:
   - happy path: POST /open-apis/im/v1/messages?receive_id_type=chat_id,
     msg_type=text, Bearer <tenant_access_token>, double-JSON
     content envelope
   - special-character round trip: newlines, double quotes,
     backslashes, tabs, Chinese + emoji, JSON-lookalike strings.
     The inner {"text": ...} is encoded once at JSON.Marshal time
     and once again when the outer body serializes; losing either
     pass corrupts the message and the bug is invisible without a
     contract pin.
   - Lark error path: non-zero `code` surfaces as a wrapped error
     with the code embedded.

Tests:
- TestDispatcher_IssueCreationFromCommand asserts IssueIdentifier
  ("MUL-42") and IssueTitle propagate through DispatchResult.
- TestDispatcher_IssueIdentifierFallsBackToNumberOnWorkspaceLookupErr
  pins the "#7" degrade-graceful fallback.
- TestLarkOutcomeReplierIssueCreatedSendsConfirmation pins the
  text body (identifier + title + deep link) and asserts no card
  send on this path.
- TestLarkOutcomeReplierOutcomeIngestedSilentWithoutIssue pins
  the silent-on-plain-chat default so we don't accidentally start
  emitting a confirmation for every message.
- TestHTTPClient_SendTextMessage_* covers the wire contract.

Frontend locale parity (en + zh-Hans, 53 tests) is currently green
on this HEAD; no changes needed.

Co-authored-by: multica-agent <github@multica.ai>

* fix(views/locales): add missing ko keys for Lark MVP (MUL-2671)

Trump flagged on PR #3277 review that the ko bundle was missing the
Lark-MVP-only keys that en + zh-Hans both carry. The parity test
caught it cleanly after main was merged in (Korean PR landed on main
between the prior review and this one):

  common.lark_bind.*                       (13 keys)
  settings.page.tabs.lark                  (1 key)
  settings.lark.*                          (45 keys)
  agents.inspector.section_integrations    (1 key)

Korean translations are professional/concise — "Lark" stays as the
brand name (matches how en keeps "Lark" + "(飞书)" parenthetically;
ko/users searching for the product expect "Lark"), and product copy
follows the zh-Hans tone where Multica nouns ("에이전트", "워크스페이스")
are romanized loan words consistent with the rest of the ko bundle.

Slot ordering preserved against EN:
  - page.tabs.lark sits between github and integrations
  - inspector.section_integrations sits right after section_skills

Verified: pnpm exec vitest run locales/parity → 105/105 pass.
Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): /issue origin_type CHECK + Hub restart on credentials rotation (MUL-2671)

Two live-env bugs Bohan reproduced:

1) /issue command crashed the WS connector. Dispatcher writes
   origin_type='lark_chat' on issues born from `/issue`, but the
   issue_origin_type_check CHECK constraint was last extended in
   migration 060 for quick_create — it doesn't list lark_chat, so
   every Lark /issue tripped SQLSTATE 23514 and bubbled up as an
   infra error. The infra error tore down the WS connector, Lark
   retried the same message, the new connector tripped the same
   constraint and crashed again. Repro in the live env: three
   crashes from the same /issue event over ~40s, each leaving the
   user with no confirmation in Lark.

   Migration 111 extends the CHECK list:
     CHECK (origin_type IN ('autopilot', 'quick_create', 'lark_chat'))

2) Re-scanning an already-bound agent silenced the bot. The device
   flow re-registers with Lark, which mints a brand-new bot (fresh
   app_id + app_secret); RegistrationService.finishSuccess upserts
   into lark_installation by agent_id, so the row's credentials
   rotate in place. But the running supervisor held the OLD inst
   struct by value and kept a WS open against the OLD bot's app_id —
   so all events to the NEW bot went nowhere. Bohan's "claude code
   现在不能在飞书里回复了" symptom maps exactly to this:

      log timeline:
      16:29:57  cc connector connected with app_id=cli_aa9398dd...  (OLD)
      16:34:07  lark registration: install complete                  (rotation)
      → row.app_id is now cli_aa93f36f...                            (NEW)
      → old WS still subscribed to OLD app_id; new app_id receives nothing

   Fix: Hub.sweep now compares each installation row's credentials
   fingerprint (app_id + bot_open_id + sha256(app_secret_encrypted))
   against the snapshot the running supervisor was started with. On
   diff, cancel the old supervisor and start a fresh one inline. A
   monotonic gen counter on the supervisor entry disambiguates the
   old goroutine's deferred cleanup from the new entry the rotation
   path already swapped in.

Tests:
- TestHubRestartsSupervisorOnCredentialsRotation pins the new path:
  starts hub on app_one, rotates the row to app_two, asserts the
  connector factory is called again with the fresh AppID.
- TestHubDoesNotRestartSupervisorOnUnchangedRow pins the negative
  case so an unchanged row doesn't degenerate into a per-sweep
  busy-loop.
- Existing hub tests (lease, supervise, shutdown, ACK timing,
  noop replier) all green.

Verification:
- go test ./internal/integrations/lark/... -race -count=1   ok
- go build ./... clean
- migration applied locally; \d+ issue confirms lark_chat in CHECK

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): per-supervisor lease token to fence rotation handoff (MUL-2671)

Elon flagged a race in HEAD be8d4cef's rotation path: both the old
and the new supervisors of the same Hub used the hub-wide nodeID as
their WS lease token, so an old supervisor's post-cancel
releaseLease(nodeID) would CAS-match the lease row the successor had
just acquired with the SAME token and DELETE it. Symptom would be a
silently empty lease row a few hundred ms after every device-flow
re-scan — no replica owning the install, no events delivered, the
"bot goes quiet" pattern Bohan hit the first time but now from the
fencing side rather than the credentials side.

Fix: leaseToken(nodeID, gen) composes "<nodeID>-g<gen>", where gen is
the monotonic counter already attached to each supervisorEntry. The
nodeID prefix keeps cross-replica observability (an operator
inspecting lark_installation.ws_lease_token can still map back to a
process) while the -g suffix makes the OLD supervisor's release
target the OLD row state. Once the rotation path swaps in the new
supervisor, the row's CurrentToken is the new -g(N+1) token, so the
old -gN release's WHERE clause no-ops instead of clobbering.

acquireLease / renewLeaseUntil / releaseLease now take an explicit
token argument; supervise threads its leaseToken through. The
plumbing isn't pretty, but having an explicit argument at every call
site is the only way the rotation invariant survives subsequent
refactors — without it, a future caller could quietly reintroduce
"just use h.nodeID" and the race is back.

Two regression tests:

- TestHubRotationStaleReleaseDoesNotClearSuccessorLease drives the
  fake lease state machine directly:
    1. old acquires(tokenA)
    2. rotation lands; new acquires(tokenB)
    3. old's stale release(tokenA) fires
  Asserts owner ends up still tokenB. Hub-wide-nodeID code would fail
  step 3 by clearing the entry.

- TestHubRotationEndToEndKeepsSuccessorLeased runs the same scenario
  through the live supervise loop: starts hub, rotates the row, waits
  for sup2 to take over with a distinct token, sleeps past sup1's
  unwind, asserts the row is still held by a non-sup1 token. Catches
  the bug even when the goroutine timing is non-deterministic.

Verification: go test ./internal/integrations/lark/... -race -count=1   ok
  go build ./...                                            clean
  go vet ./...                                              clean
Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): route group @-mentions via union_id, not open_id (MUL-2671)

In a Lark group with multiple Multica bots installed, the bot whose WS
received the event sometimes failed to recognize that it was the @-target
while the OTHER bot's supervisor falsely fired. Bohan's controlled three-
message test (only @A, only @B, @both) hit this: @A and @B alone went
unanswered, @both got picked up by A only.

Root cause: the `mentions[].id.open_id` field Lark puts on the WS event
is structurally INVERSE to `/bot/v3/info`'s `bot.open_id` across the two
WSes. From A's WS perspective, the wire-form open_id for "A was @-ed"
is NOT equal to A's API-side open_id, but IS equal to what B's WS sees
on its side, and vice versa. The decoder's `mention.open_id ==
inst.BotOpenID` match therefore fires on the wrong bot in multi-bot
groups. Only `union_id` (the Lark-tenant-scoped stable identifier) is
consistent across both WSes.

Changes:
- migration 112 adds nullable `lark_installation.bot_union_id`
- sqlc query exposes UpsertLarkInstallation/CreateLarkInstallation
  with bot_union_id, plus a focused SetLarkInstallationBotUnionID for
  the backfill path
- httpAPIClient.GetBotInfo now follows /bot/v3/info with /contact/v3/
  users/{open_id}?user_id_type=open_id and returns both identifiers
  on BotInfo. Soft-fails on contact-scope denial: install still
  succeeds with an empty UnionID, and the decoder falls back to the
  legacy open_id match for single-bot deployments.
- RegistrationService.finishSuccess persists union_id alongside
  open_id during the device-flow finalize.
- ws_frame_decoder.containsMention prefers union_id and only walks
  open_id when the installation row has not been backfilled yet.
- BackfillBotUnionIDs runs once at server boot for installations
  created before migration 112; bounded per-row 10s timeout and a
  pure soft-fail policy so a slow Lark round-trip cannot block
  startup.
- regression tests cover the three decoder paths: union_id match
  wins over open_id mismatch, union_id mismatch overrides open_id
  match, and open_id fallback when union_id is unknown.

Co-authored-by: multica-agent <github@multica.ai>

* chore: drop trailing blank lines at EOF on four files (MUL-2671)

git diff --check origin/main..origin/pr-3277 flagged these as new
blank lines at EOF; clearing so the diff stays clean for review.

Co-authored-by: multica-agent <github@multica.ai>

* fix(views/locales): add missing ja keys for Lark MVP + section_integrations (MUL-2671)

CI frontend job tripped on the ja locale parity check: ja is missing
the lark_bind block in common.json, the lark block + page.tabs.lark
in settings.json, and inspector.section_integrations in agents.json.
The ko fix earlier covered Korean; ja was added separately on main
and the merge surfaced these gaps. Translations mirror the en source
and follow the same voice as the existing ja bundle.

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): rewrite @_user_N placeholders into clean body (MUL-2671)

When Lark dispatches a group `im.message.receive_v1`, the message
text contains opaque `@_user_1`, `@_user_2`, … placeholders and the
real identity is in `mentions[]`. We were forwarding the raw text to
the agent, so a Bohan-typed "@Bot ping test" arrived as "@_user_1
ping test" — neither human-readable nor useful as LLM context, and
the agent was paying tokens to figure out which `@_user_N` was even
itself.

The new resolveMentions pass:
  * strips the bot's own mention entirely (the dispatcher already
    routes the event on AddressedToBot; re-emitting @<self> in front
    of every message adds zero signal and pollutes context),
  * substitutes other participants with `@<displayName>` so a
    follow-up "@Alice" reads naturally,
  * collapses horizontal whitespace introduced by the strip while
    preserving original newlines.

Bot identity check uses the same union_id-preferred + open_id
fallback as containsMention, so the rewrite stays consistent with
the routing path. Tests cover the four shapes: bot self-mention,
mixed bot + other-user mention, multi-line body with stripped
mention, and a no-mention body that should be left untouched.

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): union_id-first self mention strip + token-aware scan + local whitespace cleanup (MUL-2671)

Three review blockers on the mention rewrite from PR review:

1. isBotMention now mirrors containsMention's union_id-first policy.
   When the installation row knows our union_id, we trust it
   exclusively (open_id is structurally inverted in multi-bot
   groups — matching on it would re-introduce the routing bug we
   fixed two commits ago). open_id fallback fires only when
   union_id is absent. New tests: @-ing both bots in one message
   correctly strips only self and renders the sibling as @<name>;
   open_id-matches-but-union_id-differs does NOT strip.

2. resolveMentions no longer collapses or trims whitespace globally.
   Indentation, tabs, code blocks, tables — all preserved verbatim.
   When the self mention is removed we eat exactly one adjacent
   horizontal space (the one after the placeholder, or, when the
   mention sits at end-of-input, a single space already emitted
   right before it). New test exercises a multi-line indented +
   tabbed body and asserts the whole shape survives.

3. Prefix-collision-safe replacement. A chat with 11+ participants
   exposes both `@_user_1` and `@_user_10`; naive ReplaceAll for
   `@_user_1` would mangle the substring of `@_user_10`. The
   resolver now does a single-pass token scan with the mention
   list sorted longest-key-first, so the longer placeholder always
   wins at any scan position. New test covers the @_user_1 /
   @_user_10 case explicitly.

Also drops the temporary INFO-level diag logging the previous
commit added — root cause was confirmed (union_id swap in the
manual backfill; not a decoder bug).

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): scope inbound dedup per (installation_id, message_id) (MUL-2671)

Root cause of the residual "@Cc gets dropped as not_addressed_in_group"
even after the union_id swap landed: lark_inbound_message_dedup was
keyed on `message_id` alone. In a Lark group chat where the workspace
has multiple Multica bots installed, Lark delivers the SAME message_id
to every bot's WS supervisor. Whichever WS claimed first then ran its
own AddressedToBot check; the bot that was actually @-ed lost the dedup
race, found the row already terminal (`processed_at IS NOT NULL`), and
was dropped as `duplicate` BEFORE it could evaluate its own mention.
Net: every @ silently disappeared if Lark happened to route the OTHER
bot's WS first.

The dedup gate's original purpose (idempotency against WS reconnect
replay) is per-installation by definition, so the right key is
composite (installation_id, message_id).

Changes:
- migration 113 drops + recreates lark_inbound_message_dedup with
  installation_id NOT NULL REFERENCES lark_installation(id) ON DELETE
  CASCADE and PRIMARY KEY (installation_id, message_id). The table is
  a 24h transient cache, so dropping existing rows is safe.
- sqlc queries: ClaimLarkInboundDedup / MarkLarkInboundDedupProcessed /
  ReleaseLarkInboundDedup all now take installation_id.
- AppendUserMessageParams carries InstallationID through to the
  in-tx Mark call so the chat_message+dedup atomicity stays intact.
- Dispatcher passes inst.ID to claim + applyFinalize + AppendUserMessage.
- Test fakes key dedup state on (installation_id, message_id) via a
  composite map key; all existing pre-seeded rows use a seedDedupKey
  helper bound to the default activeInstallation fixture so the prior
  staleness / token-rotation / in-tx mark tests still exercise the
  same regression they did before.
- New regression TestDispatcher_DedupIsScopedPerInstallation pins the
  multi-bot invariant: a row pre-seeded for installation A does NOT
  block installation B's first delivery of the same message_id; B
  runs through its own group-filter / identity / ingest pipeline.

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations/lark): render markdown chat replies via schema-2.0 card (MUL-2671)

The agent's chat replies were going out as msg_type=text, so every
`**bold**`, fenced code block, list, table, and link in the body
showed up as literal markdown characters in Lark — the user saw raw
asterisks, hashes, pipes instead of formatted text. Bohan reported
this and pointed at zarazhangrui/lark-coding-agent-bridge as the
shape to emulate.

The bridge repo uses Lark interactive cards with the schema-2.0
envelope and a `tag: "markdown"` body element; Lark's client
renders that to formatted text (GFM-ish: bold/italic, headings,
lists, links, fenced code blocks, tables, blockquotes). They expose
multiple reply modes (card / markdown-as-post / text) gated by user
config; we go a step simpler — auto-detect markdown syntax in the
agent's body and route accordingly:

- containsMarkdown(): cheap substring + regex pass for fenced code
  blocks, headings, list markers, bold/italic, tables, links,
  blockquotes, horizontal rules, inline code. Biases toward false-
  positive — wrapping prose in a card still renders fine, but
  missing a real markdown block leaves raw characters visible.

- APIClient gains SendMarkdownCard / SendMarkdownCardParams.
  Implementation marshals the schema-2.0 envelope verbatim:
  {schema:"2.0", body:{elements:[{tag:"markdown", content: md}]}}.
  Stub returns ErrAPIClientNotConfigured.

- Patcher.sendChatReply now branches on containsMarkdown:
  markdown → SendMarkdownCard, plain prose → SendTextMessage. A
  one-liner "sure, on it" stays as a normal IM bubble (no card
  chrome); anything with markdown gets the rendered card.

Tests: TestContainsMarkdown pins the heuristic across plain prose
and ten markdown shapes; TestPatcherRoutesMarkdownReplyToCard and
TestPatcherRoutesPlainReplyToText cover the router; new HTTP wire
test TestHTTPClient_SendMarkdownCard_HappyPath contract-pins the
card envelope (msg_type=interactive, schema 2.0, markdown tag,
verbatim body). Full lark suite passes.

Co-authored-by: multica-agent <github@multica.ai>

* fix(service/issue): route analytics.IssueCreated through obsmetrics.RecordEvent (MUL-2671)

CI's TestNoNakedAnalyticsCaptureInHandlersOrServices guard caught the
post-merge analytics call in IssueService.captureCreatedAnalytics
that still used s.Analytics.Capture(...) directly. Main added that
lint to prevent the Prometheus and PostHog sides from drifting — any
new analytics.* event must go through obsmetrics.RecordEvent so the
business-metrics collector and the PostHog client fire from the same
call site.

Fix mirrors how TaskService handles it: IssueService gains a
Metrics *obsmetrics.BusinessMetrics field (router wires it via
h.IssueService.Metrics = opts.BusinessMetrics next to the existing
TaskService line), and the in-service Capture call becomes
obsmetrics.RecordEvent(s.Analytics, s.Metrics, ...). nil-safe by
construction — RecordEvent treats a nil Metrics as PostHog-only.

Co-authored-by: multica-agent <github@multica.ai>

* feat(views/lark): swap Bind CTA for Connected+Manage link when agent already has an installation (MUL-2671)

Bohan reported the agent-detail Bind button keeps inviting the user to
re-scan the QR even when the agent already has an active Lark
PersonalAgent connected — and re-scanning silently upserts the
installation row, leaving the previously-created Lark bot dangling
as a zombie. Frustrating UX and an actual product footgun.

Anti-zombie guard at the only entry point: LarkAgentBindButton now
checks the cached installations listing for an active row pinned to
this agent_id. When one exists, the install CTA is gone — replaced
by a small Connected pill + an "Manage in Lark" link that opens the
Bot's app page in Lark's developer console (open.feishu.cn/app/<app_id>)
in a new tab. That's where scopes, display name, and additional
permission requests actually live; re-scanning never was the right
answer for managing an existing bot.

Scoping is per-agent: an active installation on a DIFFERENT agent
in the same workspace doesn't affect this agent's button, and a
revoked installation falls back to the bind CTA so the user can
re-create. Tests cover all four states (own-active / own-revoked /
other-agent-active / no-installation) and pin the Manage link's
href + target=_blank + noopener.

i18n: three new keys in settings.json (en / zh-Hans / ja / ko):
agent_bot_connected_label, agent_bot_manage_link,
agent_bot_manage_tooltip. Locale parity test still 157/157.

The dev console host is hardcoded to open.feishu.cn — operators
on the Lark international tenant currently get the wrong host;
future-proof fix wants the backend to surface a per-installation
dev_console_url on the listings response, called out in a code
comment.

Co-authored-by: multica-agent <github@multica.ai>

* feat(views/settings): collapse Lark into Integrations + render agent identity (MUL-2671)

Lark was its own top-level workspace settings tab while Integrations sat
empty next to it. As more integrations land, the sidebar would balloon
with one tab per provider. Move the Lark surface into Integrations as
the first hosted integration; the old ?tab=lark URL redirects through
LEGACY_WORKSPACE_TAB_REDIRECTS so bookmarks still resolve.

The Connected bots list was leaking the raw Lark app_id (cli_…) as the
row title with bot_open_id (ou_…) underneath — meaningless to product
users. Since the binding is 1:1 with a Multica Agent, join on agent_id
and render the agent's avatar + name via the workspace-standard
ActorAvatar + useActorName.getAgentName. Deleted agents fall back to
"Unknown Agent" so the row is still actionable for cleanup.

Tests: stub useActorName + ActorAvatar in lark-tab.test.tsx and add
LarkTab connected-bot tests covering the agent identity render and the
deleted-agent fallback. Drop the now-dead integrations.* + page.tabs.lark
+ lark.bot_open_id_label keys across all four locales — parity still
157/157, views suite 1141/1141.

Co-authored-by: multica-agent <github@multica.ai>

* feat(views/settings): wrap Lark in a named section inside Integrations (MUL-2671)

Integrations is meant to host multiple providers (Slack, Linear etc. as
they land), so the Lark content should sit under a Lark heading rather
than fill the tab directly — otherwise the first additional integration
would feel like it broke the IA. Add a "Lark" / "飞书" section heading
above LarkTab using the same h2 chrome the other settings tabs use, and
pin lark.section_title across all four locales (parity 169/169).

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
Co-authored-by: J <j@multica.ai>
2026-06-03 19:12:14 +08:00
Bohan Jiang
5900d8b637 fix(issues): make start_date/due_date timezone-stable calendar days (#3618) (#3692)
* fix(issues): store start_date/due_date as DATE, not timestamp (MUL-2925)

These fields are calendar days (the pickers offer no time-of-day), but were
stored as TIMESTAMPTZ. A client serializing local midnight via toISOString()
folded its timezone into the instant, so the day shifted by the local offset
(GH #3618). Migrate the columns to DATE and parse/serialize date-only
"YYYY-MM-DD". ParseCalendarDate still accepts legacy RFC3339 (truncated to the
UTC day) so older clients keep working.

Co-authored-by: multica-agent <github@multica.ai>

* fix(issues): render start_date/due_date as timezone-stable calendar days (MUL-2925)

Pickers now emit date-only "YYYY-MM-DD" (local calendar day) instead of
toISOString(), and every read formats via the shared @multica/core/issues/date
helpers with timeZone:"UTC" so the day never shifts with the viewer's offset.
The Gantt's existing UTC bucketing is now correct. Covers web/desktop pickers,
quick-set menu, list/board/detail/activity, and the mobile due-date picker.

Co-authored-by: multica-agent <github@multica.ai>

* fix(issues): address date-only review — loud-fail ambiguous dates, finish display sweep (MUL-2925)

Review follow-ups on #3692:
- ParseCalendarDate no longer silently truncates a legacy non-midnight RFC3339
  to the wrong UTC day; it accepts only YYYY-MM-DD or an exact UTC-midnight
  instant and rejects ambiguous ones loudly. Adds util unit tests.
- migration 112 pins the TIMESTAMPTZ->DATE conversion to UTC explicitly via
  AT TIME ZONE 'UTC' (was session-timezone dependent); down migration too.
- Convert remaining date-change display sites to formatDateOnly: inbox detail
  label (web) and mobile activity + inbox labels (were new Date()+local format).
- CLI --start-date/--due-date help now says YYYY-MM-DD, not RFC3339.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-03 14:34:01 +08:00
Matt Voska
700cd97407 feat(workspace): add per-workspace logo upload (#2760)
Adds avatar_url column to workspace, threads it through the API +
WorkspaceAvatar component, and adds a click-to-upload editor in the
workspace settings tab. Mirrors the squad avatar pattern (migration 086);
UI strings use "logo" while the schema/code uses avatar_url for codebase
consistency with user.avatar_url and squad.avatar_url.

- migration 093: ALTER TABLE workspace ADD COLUMN avatar_url TEXT
- UpdateWorkspace SQL + handler accept avatar_url (auth gated to
  owner/admin at the router via RequireWorkspaceRoleFromURL)
- WorkspaceAvatar renders <img> when avatar_url is set, falls back to
  the initial-letter span otherwise
- workspace-tab.tsx adds a 16x16 click-to-upload logo editor at the
  top of the general settings card, using useFileUpload + accept=
  image/png,image/jpeg,image/webp (server stores under workspaces/{id}/)
- en + zh-Hans settings i18n strings added

Co-authored-by: Matt Voska <voska@users.noreply.github.com>
2026-06-01 16:48:05 +02:00
Raúl Anatol
2b5696703f MUL-2703: feat(autopilots): webhook event filters per trigger (MUL-2334 follow-up) (#3231)
* feat(autopilots): webhook event filters per trigger (MUL-2334 follow-up)

Adds schema-backed event/action filtering to webhook triggers so operators
can declare exactly which GitHub (or generic) events should spawn autopilot
runs. Events outside the declared scope are recorded as ignored with reason
'event_filtered' — visible in the delivery log but without expensive run/task
creation.

Closes #3093 (supersedes the description-parsing approach from that PR).

Backend:
- Migration 108 adds event_filters JSONB to autopilot_trigger
- sqlc queries updated for CREATE / UPDATE / LIST / GET
- HandleAutopilotWebhook filters against trigger.event_filters before dispatch
- Create/Update trigger handlers accept event_filters in the request body
- Response shape includes event_filters so the UI can render it

Frontend:
- New WebhookEventFilterSection component in the autopilot dialog
- Inputs for event name + comma-separated actions
- i18n strings added (en + zh-Hans)

Tests:
- Unit tests for splitWebhookEvent and webhookEventAllowedByTriggerScope
- Handler-level integration tests for filtered / allowed / no-filter paths

co-authored-by: ZephaniaCN <agent/autopilot-webhook-filter>

* fix: recognize gitlab/bitbucket/gitea as providers in splitWebhookEvent

TestSplitWebhookEvent failed because only 'github' was recognized as a
provider prefix. Extract isKnownProvider() to handle gitlab, bitbucket,
and gitea as well.

* fix(autopilots): address PR #3231 review for webhook event filters

Must-fix from PR #3231 review:

1. event_filters now uses typed []WebhookEventFilter at the HTTP boundary
   instead of []byte. encoding/json was base64-encoding the field on the
   way out, so the UI could not .map() the response, and a real JSON
   array on the way in failed to decode. Response field also decodes the
   stored JSONB into a typed slice before serialising back.

2. UpdateAutopilotTriggerRequest.EventFilters is *[]WebhookEventFilter
   with tri-state PATCH semantics: nil pointer = leave alone, [] =
   clear, [...] = replace. The handler marshals an explicit empty slice
   to the JSONB literal `[]` so COALESCE overwrites instead of preserves.
   AutopilotDialog now PATCHes the webhook trigger when event_filters
   change in edit mode (previously the toast said "updated" while the
   backend was unchanged).

3. webhookEventAllowedByTriggerScope no longer short-circuits to false
   on the first event-name match whose actions don't line up. Earlier
   code silently shadowed any later filter that shared the same event
   name with disjoint actions.

Robustness: validateWebhookEventFilters rejects empty event names /
actions at write time, and the matcher fails closed on malformed stored
bytes instead of widening the allowlist.

Tests: handler tests now post real JSON arrays (the prior []byte path
masked the contract bug). Adds round-trip / clear-with-[] / preserve-
when-omitted / replace / invalid-filter / filters-on-schedule coverage,
plus matcher tests for same-event multi-filter and malformed-deny.

Migration renamed 108 → 110 to avoid colliding with main's
108_task_token (came in via the merge from main).
2026-05-27 15:47:36 +08:00
Bohan Jiang
341ce7bfa5 feat: support local working directory for projects (MUL-2618 v1) (#3283)
* feat(project): add local_directory project_resource type (MUL-2662)

Adds a second project_resource type alongside github_repo so a project
can be pinned to an existing directory on a specific daemon (the v1 of
the local-working-directory flow tracked in MUL-2618). The ref schema is
{ local_path, daemon_id, label? }; local_path must be absolute and
daemon_id is required. The same (daemon_id, local_path) pair is allowed
on multiple projects by design — no UNIQUE constraint is added.

Implementation reuses the existing project_resource API surface: the new
type is wired through the validator switch with no migration, no new
events, and no daemon-handler changes (daemon already passes through
arbitrary resource types via ProjectResources). The CLI gains
--local-path / --daemon-id / --ref-label shortcuts so
`multica project resource add --type local_directory` mirrors the
existing `--type github_repo --url ...` ergonomics; the generic --ref
flag still works for both types.

Tests cover the full CRUD lifecycle, the same-path-across-projects
allowance, the same-path-same-project conflict, the validator rejections
(missing/blank/relative path, missing daemon_id, wrong payload type),
and the cross-platform isAbsoluteLocalPath helper.

Co-authored-by: multica-agent <github@multica.ai>

* feat(project): add update endpoint + label-shadow guard for project_resource (MUL-2662)

Addresses the Elon review on PR #3263:

- Add PUT /api/projects/{id}/resources/{resourceId} with sqlc query,
  matching handler, CLI `project resource update`, and a new
  EventProjectResourceUpdated WS event. resource_type stays immutable;
  ref/label/position are all individually optional.
- Catch same-project (daemon_id, local_path) collisions where only the
  embedded label differs — the row-level UNIQUE only matches the full
  ref JSON, so a label typo would otherwise let the same working
  directory bind twice.
- Tests cover the update lifecycle (label-only / ref / clear / 404 /
  invalid path) and the label-shadow conflict on both create and
  update; the in-place rename still succeeds because the conflict
  scan ignores the row being edited.

Incidental: regenerating sqlc picked up a missing skills_local scan in
UpdateAgentCustomEnv that drifted in from #3200.

Co-authored-by: multica-agent <github@multica.ai>

* fix(project): close bundled-create label-shadow gap + merge resource_ref on CLI update (MUL-2662)

Two follow-ups from MUL-2662 review round 2:

- CreateProject inline resources path now dedupes local_directory entries on
  (daemon_id, local_path) before opening the transaction. The DB-level
  UNIQUE(project_id, resource_type, resource_ref) constraint only fires on a
  full JSON match, so two rows with the same target but different `label`
  would otherwise slip past. Standalone POST/PUT already cover this via
  findLocalDirectoryConflict; bundled create was the missing surface.
- `multica project resource update` now seeds resource_ref from the existing
  row before applying per-type shortcut flags, so `--default-branch-hint x`
  on its own no longer constructs a payload missing `url` (which the server
  400s on). Local_directory partial edits get the same merge behavior.

Co-authored-by: multica-agent <github@multica.ai>

* feat(desktop): local_directory project_resource UI (MUL-2665) (#3273)

* feat(desktop): local_directory project_resource UI (MUL-2665)

First UI surface for the local-working-directory flow tracked in MUL-2618.
Lets users on the desktop pin a project to an existing folder on this
machine; web stays read-only since the per-daemon check can't be done in
the browser.

What's new for the renderer:

- ProjectResourcesSection grows a desktop-only "Add local directory"
  button next to the existing GitHub-repo popover. Clicking it opens
  Electron's native folder picker, validates the path through a new
  IPC pair (existence + r/w), and submits a project_resource of
  resource_type=local_directory with daemon_id pulled live from
  daemonAPI.getStatus.
- LocalDirectoryRow renders the rename pencil + path tooltip, and
  greys out when ref.daemon_id != this machine's daemon_id (with a
  "only available on the machine that registered this directory"
  tooltip). Delete stays enabled so users can drop stale registrations
  from any device.
- LocalDirectoryHint sits above the issue-detail comment composer and
  shows "Agent will work in-place at {label} ({path})" when the issue's
  project has a local_directory matching this daemon. Hidden on web.
- TaskStatusPill picks up a new "waiting_for_directory_release" stage
  that the daemon will publish when it dequeues a task but can't
  acquire the path lock. The render is in place now so the daemon
  sibling subtask can wire the status string without an additional UI
  PR.

Plumbing:

- @multica/core/types gains LocalDirectoryResourceRef +
  UpdateProjectResourceRequest, and the api client gets the matching
  PUT method backed by the server endpoint that landed in
  2ac3faebb (MUL-2662). A useUpdateProjectResource hook drives the
  in-place label edit.
- New Electron handlers under apps/desktop/src/main/local-directory.ts:
    local-directory:pick     -> dialog.showOpenDialog (openDirectory)
    local-directory:validate -> stat + access(R_OK + W_OK)
  exposed through the preload as desktopAPI.pickDirectory /
  validateLocalDirectory. View code talks to them via a thin
  packages/views/platform helper that returns reason=unsupported on
  web instead of crashing.
- useLocalDaemonStatus exposes the local daemon's id, device name, and
  running flag from daemonAPI.onStatusChange so the renderer can do the
  cross-device match without coupling to the desktop preload typings.

Tests:

- pickStageKeys gets a unit test covering the new stage and proving
  the directory-release status outranks availability hints.
- LocalDirectoryHint tests cover the four render branches (no project,
  no daemon, foreign daemon, matching daemon).
- i18n parity stays green; new keys added under projects.resources.*
  and chat.status_pill.stages.waiting_for_directory_release in both
  locales.

Out of scope (will land separately):
- The daemon-side waiting/lock signal that flips the pill into the
  new state.
- Adding local_directory to the create-project modal's bulk
  attach flow.
- Docs page refresh for project-resources.mdx — left for the
  MUL-2618 umbrella sweep.

Co-authored-by: multica-agent <github@multica.ai>

* fix(desktop): hide rename for foreign daemon local_directory rows (MUL-2618)

Address review nit on #3273: the rename pencil was gated only by
`canEdit`, so a foreign / unknown-daemon row still showed it even
though the spec says cross-device rows are disabled. Gate rename on
`!mismatch` so it disappears on those rows; delete stays available
so a stale registration can still be dropped from any device.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>

* feat(daemon): local_directory execution + path mutex + GC exception (MUL-2663) (#3274)

* feat(daemon): local_directory execution + path mutex + GC exception (MUL-2663)

Wires up the daemon side of the local_directory project_resource introduced
in MUL-2662. When a task is dispatched against a project whose resources
include a local_directory pinned to this daemon's UUID, the daemon now:

  - Validates the path (absolute, exists, daemon process can read+write,
    not in the system-root / $HOME blacklist) and fails the task fast on
    any precondition violation, with a user-readable reason.
  - Serialises concurrent tasks on the same on-disk path via a
    daemon-local LocalPathLocker keyed by symlink-resolved realpath. The
    lock is held for the entire task lifetime (claim → context write →
    agent → result report).
  - When the lock is contended, the daemon flips the row to a new
    waiting_local_directory status on the server (carrying a wait_reason
    like "<path> (held by task <short id>)") so the UI can render
    "等待本地目录释放" instead of leaving the row silently in dispatched
    past the sweeper timeout. The status accepts being woken into running
    once the lock is acquired.
  - Sets execenv.WorkDir to the user's path (no copy, no mount). envRoot
    still lives under workspacesRoot/<wsID>/ and hosts output/, logs/, and
    .gc_meta.json — the daemon's logbook for the run.
  - Stamps GCMeta.LocalDirectory=true so the GC loop never RemoveAlls
    envRoot for these tasks (gcActionClean → gcActionCleanArtifacts,
    gcActionOrphan → gcActionSkip). The user's directory was never under
    envRoot to begin with, so this is defense in depth.
  - Skips execenv.Reuse for local_directory tasks because the prior
    WorkDir is the user's path and reusing it through that code path
    loses the envRoot association the GC loop needs. Prepare is cheap
    here (no clone, no copy), so always running it is fine.

Server-side protocol changes:

  - New CHECK value 'waiting_local_directory' on agent_task_queue.status
    plus a wait_reason TEXT column (migration 109).
  - All cancel / active / counted-as-running / orphan-recovery queries
    expanded to include the new status; FailStaleTasks intentionally
    excludes it (the daemon owns the wait).
  - New SQL MarkAgentTaskWaitingLocalDirectory(id, reason) and a relaxed
    StartAgentTask that accepts both dispatched and
    waiting_local_directory as preconditions (and clears wait_reason on
    the way through).
  - New POST /api/daemon/tasks/{taskId}/wait-local-directory endpoint,
    TaskService.MarkTaskWaitingLocalDirectory broadcaster, and matching
    daemon Client.MarkTaskWaitingLocalDirectory.

Tests cover: path blacklist + R/W enforcement, mutex serialisation +
ctx-cancelled wait, lock handover between two tasks, GC never returns
gcActionClean / gcActionOrphan for local_directory rows (with negative
control for the standard path), and Prepare/Cleanup correctly substitute
+ protect the user's WorkDir.

The desktop UI side (UI for adding a local_directory resource, surfacing
the "等待本地目录" badge) is MUL-2665; the agent-task lifecycle changes
(no branch switch, dirty-tree tolerant, auto-commit) are MUL-2664.

This PR targets the shared MUL-2618 v1 feature branch agent/j/912b8cb1,
not main; the whole v1 will be merged to main together when complete.

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): tighten local_directory status, symlink, cancel handling (MUL-2618)

Address the 3 must-fix items from Elon's review of PR #3274.

1. Status string unified. The server / daemon publish
   `waiting_local_directory`; align views, locales, and the
   pickStageKeys test (PR #3273 had used `waiting_for_directory_release`
   on a placeholder string). Without this, the daemon's wait state
   never reached the pill once the two siblings merged.

2. validateLocalPath now also runs the blacklist against the
   symlink-resolved realpath, with macOS's `/etc` -> `/private/etc`
   redirect handled via `isBlacklistedRealPath` which compares
   canonical forms. Without this, a symlink such as
   `/Users/me/proj/home -> /Users/me` slipped the literal $HOME check
   while every daemon write still landed in the user's home. Tests
   cover symlink-to-home, symlink-to-system-root, and the negative
   case (symlink to a regular subdirectory).

3. acquireLocalDirectoryLockIfNeeded now spins up a cancellation
   watcher inside `onWait` (lazy — the fast path stays free) so the
   gap between dispatch and StartTask responds to server-side cancel
   or row deletion. If the watcher fires while the daemon is parked
   on the path mutex, the lock-wait context is cancelled, Acquire
   returns promptly, and the helper exits silently the same way the
   run-phase poller does. New TestAcquireLocalDirectoryLock_CancelDuringWait
   exercises the path end-to-end with a fake server.

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): unconditional canonical blacklist + Windows drive-root generalisation (MUL-2618)

- validateLocalPath now always runs isBlacklistedRealPath on the
  symlink-resolved path, not only when it differs from absPath. The old
  guard let users type the canonical form of an OS-symlinked banned root
  (e.g. /private/tmp, /private/etc, /private/var on macOS) straight
  through, since EvalSymlinks is a no-op on already-canonical input.
- Windows drive-root rejection moved off the static C/D/E/F enumeration
  onto filepath.VolumeName via a new isDriveRoot helper, so removable /
  network drives mounted at G:..Z: and UNC \\server\share roots are also
  blocked. systemRootBlacklist keeps the well-known C:\ trees only.
- Tests: macOS-only case exercises direct /private/{tmp,etc,var}; a
  new TestIsDriveRoot covers the Windows generalisation (skipped on
  POSIX runners by runtime guard).

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>

* feat(views): wire waiting_local_directory end-to-end in issue UI + presence (MUL-2618)

Connect the daemon-emitted `task:waiting_local_directory` and `task:running`
events through to issue execution log, sticky agent banner, activity indicator,
and agent presence so a parked task is no longer invisible on the issue page.

- Add `waiting_local_directory` to `AgentTask.status` and the typed
  `task:running` / `task:waiting_local_directory` WS event payloads.
- Chat realtime sync writes both new statuses into the pending-task cache so
  the chat StatusPill flips out of a stale `dispatched` frame.
- ExecutionLogSection: count `waiting_local_directory` as active, add tone +
  status label, treat parked tasks the same as dispatched for time anchor /
  transcript visibility / terminate-confirm note.
- AgentLiveCard: subscribe to both new events, rank the parked state between
  dispatched and queued, and surface a "is waiting for the local directory"
  banner with the muted "Clock" treatment used for queued.
- IssueAgentActivityIndicator: route parked tasks into the queued bucket so
  the hover stack and chip stay visible.
- derive-presence: parked tasks count toward `queuedCount` so the agent
  workload chip stays out of `idle` while the daemon waits on the path lock.
- Locales: add `agent_live.is_waiting_local_directory` and
  `execution_log.status_waiting_local_directory` (en + zh-Hans).

Co-authored-by: multica-agent <github@multica.ai>

* feat(project): enforce one local_directory per (project, daemon) (MUL-2618)

The daemon-side resolver picks the first matching local_directory by
daemon_id, so allowing two rows on the same daemon — even at different
paths — let the agent silently write into whichever sorted first. Tighten
the invariant top to bottom:

- server: `findLocalDirectoryConflict` rejects any second row sharing a
  daemon_id, regardless of `local_path` or label. Bundled-create surface in
  `CreateProject` runs the same daemon-scoped dedupe up front.
- daemon: `findLocalDirectoryAssignment` fails fast when it finds more than
  one row pinned to the current daemon (older API client / direct DB
  writes can still produce that state — refuse to guess).
- desktop UI: hide the "Add local directory" action once the current
  daemon owns a row on this project, with a hint and a defensive toast on
  the call path; foreign-daemon rows stay visible read-only as before.
- Tests:
  * daemon: new `two local_directory rows on this daemon fail fast` /
    `local_directory rows on different daemons coexist` cases.
  * handler: rewrite the legacy `LabelShadow` cases as
    `DaemonScopedConflict` / `BundledLocalDirectoryDaemonConflict` —
    asserts 409 on same-daemon different-path, 201 on per-daemon bundles.
- Locales: en + zh-Hans copy for the new hint + toast.

Co-authored-by: multica-agent <github@multica.ai>

* chore(sqlc): drop stale skills_local in UpdateAgentCustomEnv (MUL-2618)

Follow-up to the main-merge in 0f8e8ca7: the auto-merge preserved most
of main's skills_local revert but kept the column reference inside the
UpdateAgentCustomEnv scanner because that block hadn't been touched by
either side. Re-running `sqlc generate` regenerates the file without
skills_local in this query, matching the rest of the file and the
post-revert schema.

Co-authored-by: multica-agent <github@multica.ai>

* feat(create-project): binary source picker — repos OR local directory

Turn the create-project dialog's "Repos" pill into a binary Source
picker. A project's source is mutually exclusive: either a set of
GitHub repos (worktree mode, default) or a single local working
directory (local mode, desktop-only). Mirrors the constraint the
backend will enforce next.

Behavior:
- Pill shows the active mode's selection (GitHub icon + repo count, or
  folder icon + local label/path).
- Popover has a 2-tab segmented control at the top; the Local tab is
  hidden entirely on web (local_directory needs a daemon_id).
- Local tab requires the daemon online — amber notice + disabled picker
  when offline, re-renders automatically via useLocalDaemonStatus.
- Switching tabs preserves the other side's stash, but handleSubmit
  only emits the resource matching the active sourceMode, so abandoned
  picks never leak into the created project.

Backend mutual-exclusion validation + the resources-section
conditional-add-button still to come — this PR just unblocks the
dialog so it can be demoed.

* fix(mobile): cover waiting_local_directory in run row status maps (MUL-2618)

---------

Co-authored-by: multica-agent <github@multica.ai>
Co-authored-by: Multica J <j@multica.ai>
2026-05-27 13:44:31 +08:00
Multica Eve
744b474199 revert(agent): remove per-agent local skill toggle (MUL-2603) (#3286)
* Revert "feat(agents): hide skills_local toggle for runtimes that don't honour it (MUL-2603) (#3276)"

This reverts commit 0b50c5a209.

Co-authored-by: multica-agent <github@multica.ai>

* Revert "fix(agent): surface host OAuth token via env var on macOS isolation (MUL-2603) (#3267)"

This reverts commit a67bf81225.

Co-authored-by: multica-agent <github@multica.ai>

* Revert "fix(agents): tighten skills-tab intro and drop redundant import hint (#3265)"

This reverts commit d8075a5775.

Co-authored-by: multica-agent <github@multica.ai>

* Revert "fix(agent): mirror $HOME/.claude.json into isolated config dir (MUL-2661) (#3261)"

This reverts commit 40da88fc16.

Co-authored-by: multica-agent <github@multica.ai>

* Revert "feat(agent): per-agent toggle to isolate host-machine skills (MUL-2603) (#3200)"

This reverts commit 960befa56f.

Co-authored-by: multica-agent <github@multica.ai>

* Add migration cleanup for reverted agent skills toggle

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: Eve <eve@multica-ai.local>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-26 17:00:01 +08:00
Bohan Jiang
ae11f290b4 fix(server): gate GitHub auto-close on closing keywords (MUL-2680) (#3281)
* fix(server): gate GitHub auto-close on closing keywords (MUL-2680)

Closes multica-ai/multica#3264. The PR webhook previously treated any
mention of an issue identifier in a PR title/body/branch as a close
intent, so a body of "Closes MUL-1. Follow up in MUL-2. Unblocks MUL-3."
would advance all three issues to done on merge. The auto-link layer
stays generous (mentions still link the PR), but advancing to done now
requires an explicit "Closes/Fixes/Resolves MUL-X" keyword adjacent to
the identifier in the title or body — bare title prefixes (`MUL-1: ...`)
and branch-name references no longer auto-complete.

MUL-2680

Co-authored-by: multica-agent <github@multica.ai>

* fix(server): persist close_intent on issue↔PR link rows (MUL-2680)

The first take of MUL-2680 gated auto-advance on `closingIdents[id]` from
the current webhook event. That broke the multi-PR sibling case: a PR
declaring `Closes MUL-X` could merge first while a link-only sibling
stayed open, leaving the issue in_progress; when the sibling closed
later, its webhook carried no closing keyword and the handler skipped
re-evaluation, so the issue stayed stuck forever.

Move close intent from per-event state to per-link state:

- New `close_intent` column on `issue_pull_request` (migration 109),
  set monotonically — `LinkIssueToPullRequest` ORs the existing flag with
  the incoming one so a subsequent webhook re-fire without the keyword
  cannot clear it.
- New `GetIssuePullRequestCloseAggregate` query returns open-count and
  merged-with-close-intent-count for an issue. The auto-advance gate
  now reads from this persisted aggregate, which is event-agnostic: any
  terminal linked-PR event re-evaluates and the verdict only depends on
  accumulated DB state.
- Webhook handler links all mentioned identifiers first (writing
  close_intent for the ones declared with a keyword), then iterates the
  affected issues in a separate pass to re-evaluate. The 'only fires for
  keyword-declared identifiers in this event' gate is gone — replaced by
  `merged_with_close_intent_count > 0` against the link rows.

Regression test `TestWebhook_LinkOnlySiblingMergeAfterCloseKeywordPR`
walks the full open→merge→open→merge sequence Elon described and asserts
the issue advances on the link-only sibling's merge.

MUL-2680

Co-authored-by: multica-agent <github@multica.ai>

* Fix GitHub close intent updates

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
Co-authored-by: Eve <eve@multica-ai.local>
2026-05-26 16:45:46 +08:00
Bohan Jiang
960befa56f feat(agent): per-agent toggle to isolate host-machine skills (MUL-2603) (#3200)
* feat(agent): per-agent toggle to isolate host-machine skills (MUL-2603)

Adds an agent-scoped `skills_local` switch ("ignore" default / "merge") so
shared agents stop inheriting the operator's user-global Claude skill
directory. A single broken local skill on one operator's machine was
crashing the Claude CLI before it ever read stdin — the daemon saw a
"broken pipe" with no recoverable signal (GitHub #3052).

- DB: migration 108 adds `agent.skills_local` (NOT NULL DEFAULT 'ignore'),
  with sqlc CreateAgent/UpdateAgent updates and handler validation.
- Claude runtime: when the agent is in "ignore" mode the backend points
  CLAUDE_CONFIG_DIR at an empty per-task scratch dir under the task cwd
  (fallback: OS temp), strips any inherited override, and cleans up after
  the run. Workspace skills under `{cwd}/.claude/skills/` still load.
  "merge" preserves the legacy inherit-from-machine behavior; Codex and
  other isolated backends are no-ops.
- UI: new Skills toggle in the Create Agent dialog and the Agent → Skills
  tab, with EN/zh-Hans copy and SkillsLocalToggle shared between the two.
- Tests: unit coverage for the new env helper, isolation dir lifecycle,
  full Claude execute paths (ignore + merge), and the handler tristate
  contract. Existing skills-tab test updated for the new copy.
- Docs: updated `/skills` docs (EN + ZH) and added a 0.3.7 changelog entry
  in the landing-page i18n.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agent): preserve claude login + validate skills_local input (MUL-2603)

Address Elon's review on PR #3200:

1. Skill isolation no longer drops the operator's Claude login. The
   per-task scratch dir now mirrors every entry under `~/.claude/`
   as symlinks except `skills/`, so `.credentials.json`, settings,
   plugins, etc. reach the CLI exactly as on the host while the
   user-global skills directory stays hidden. Without this, default
   `ignore` would have broken every Claude agent on a non-API-key
   host the moment migration 108 landed.

2. Internal CreateAgent callers (agent_template, onboarding_shim)
   now set `SkillsLocal: "ignore"`. The Go zero value was about to
   trip the migration-108 CHECK constraint and 500 template /
   onboarding agent creation.

3. Create / update handler validation no longer normalizes garbage
   to "ignore". The strict 400 path is now reachable on bad client
   input; the drift-safe `normalizeSkillsLocal` stays on the read
   side only.

UI copy + docs clarified that the toggle is Claude-only; other
runtimes ignore the setting.

Verification:
- `go test ./...` green (full suite locally).
- `pnpm --filter @multica/views exec vitest run agents/components/tabs/skills-tab.test.tsx` green.
- Handler DB-backed tests still skip locally without docker (same
  as Elon's run) — CI will validate the create / update paths
  against migration 108.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agent): mirror effective claude config dir with windows fallback (MUL-2603)

Address Elon's second-round review on PR #3200:

1. The per-task scratch dir now mirrors the *effective* host Claude
   config dir, not unconditionally `~/.claude/`. Precedence: agent
   `custom_env` CLAUDE_CONFIG_DIR > parent process env > `~/.claude/`.
   Without this, an operator who pinned Claude at a managed install
   (custom env CLAUDE_CONFIG_DIR) would get the wrong credentials in
   the scratch dir, because `buildClaudeEnv` strips that env before
   handing it to the child. We resolve the source up front and feed
   it to the mirror, so the override env still points at the right
   bytes.

2. Mirror entries now go through platform-aware linkers. On Windows
   without Developer Mode / admin, `os.Symlink` is denied, which
   previously left the scratch dir empty and broke Claude Code auth
   on default `ignore`. The new helpers try symlink first, then fall
   back to a directory junction (`mklink /J`) for dirs or a hardlink
   (same-volume content share) / copy for files. Mirrors the
   execenv/codex_home_link_windows.go pattern.

3. Tests:
   - `TestResolveHostClaudeConfigDir` locks in the custom_env >
     parent_env > `~/.claude` precedence.
   - `TestNewIsolatedClaudeConfigDirMirrorsCustomHostDir` confirms
     the scratch dir picks up `.credentials.json` from a synthetic
     custom host dir, proving the source resolution actually
     propagates into the mirror.
   - `TestNewIsolatedClaudeConfigDirEmptyHostIsNoop` documents the
     env-var-auth-only case (no host source ⇒ empty scratch dir).
   - `TestMirrorHostClaudeExceptSkillsWith_FallbackWhenSymlinkFails`
     exercises the Windows-no-Developer-Mode path via the new
     `mirrorHostClaudeExceptSkillsWith` seam, asserting credentials
     and sub-dir children still reach the scratch dir after the
     symlink stand-in fails.
   - `TestMirrorHostClaudeExceptSkillsWith_PropagatesFirstLinkError`
     confirms callers see the per-entry error when even fallback
     fails (so the warn-log fires on broken Windows installs).
   - `TestCopyFileRoundTrip` covers the last-resort copy fallback
     and its EXCL no-overwrite contract.
   - `TestClaudeExecuteIsolatesUsesCustomEnvSource` is the
     end-to-end check: an agent with custom_env CLAUDE_CONFIG_DIR
     reads its credentials from the pinned dir, not `~/.claude/`.

4. Docs: `apps/docs/content/docs/skills.{mdx,zh.mdx}` updated to
   describe the effective-source resolution and the Windows
   fallback chain so the docs match the runtime behaviour.

Verification:
- `go test ./...` green (full server suite locally, including
  `pkg/agent` 23 cases covering the new + existing isolation
  paths).
- `GOOS=windows GOARCH=amd64 go vet ./pkg/agent/...` and
  `go test -c -o /dev/null` both compile clean, confirming the
  Windows-tagged linker file builds.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agent): default skills_local to merge to preserve legacy behavior (MUL-2603)

Per Bohan's product decision on PR #3200, the per-agent host-skill toggle
defaults to "merge" — the pre-MUL-2603 inherit-from-machine behavior —
so existing personal workflows that rely on locally installed Claude
Skills keep working unchanged. Agent owners explicitly opt into "ignore"
when they need to harden a shared agent against a broken local skill on
one operator's machine (GitHub #3052).

Also audited all 11 runtimes for user-global skill discovery paths and
documented the scope of the toggle. Only Claude reads a user-global
`~/.claude/skills/`; Codex isolates via `CODEX_HOME`, the ACP backends
(Hermes / Kimi / Kiro) and the JSON-stream backends (Copilot / Cursor /
Gemini / Pi / OpenCode / OpenClaw) anchor discovery to the task workdir
and never read a user-global skill directory. UI copy and docs now say
"for runtimes that support it (currently Claude Code)" everywhere so
the scope is explicit.

Changes:

- Migration 108: column default flipped to 'merge'.
- Handler CreateAgent: missing field → "merge"; explicit "ignore" /
  "merge" still validated, garbage still 400.
- normalizeSkillsLocal: drift-safe coercion now lands on "merge" for
  anything that isn't the exact literal "ignore".
- agent_template.go / onboarding_shim.go: internal CreateAgent callers
  send "merge" instead of "ignore" to match the new default.
- Claude runtime (`claude.go`): isolate-mode gate flipped from
  `SkillsLocal != "merge"` to `SkillsLocal == "ignore"`, so "" (legacy
  daemons / older clients) and "merge" both walk `~/.claude/` directly.
- Create Agent dialog + Skills tab: toggle defaults to on (merge); only
  duplicate of an explicit "ignore" agent carries through. The
  isolation opt-in is now `skills_local: "ignore"` when the user flips
  off; "merge" is omitted from the request body.
- i18n (EN + zh-Hans): copy reframed — "On (default) — merged"; "Off —
  ignored. Recommended for shared agents".
- Docs (`/skills`, `/guides/agents.zh`): describe new default and
  enumerate which runtimes act on the toggle.
- Landing changelog 0.3.7: retitled "Per-Agent Local-Skill Toggle"; note
  the on-by-default behavior + off-to-isolate framing.
- Tests:
  - `TestClaudeExecuteIsolatesHostSkillsWhenIgnoreOptedIn` replaces the
    old by-default isolation case (now requires explicit "ignore").
  - New `TestClaudeExecuteDefaultModeKeepsHostConfigDir` locks in that
    default ExecOptions preserve the host CLAUDE_CONFIG_DIR.
  - `TestClaudeExecuteIsolatesUsesCustomEnvSource` now explicitly opts
    into "ignore" mode.
  - Handler tests: omitted → "merge"; explicit "ignore" round-trips;
    preserve-existing test seeds "ignore" and asserts "merge" flip-back.
  - `TestNormalizeSkillsLocal_DriftStaysSafe`: only literal "ignore"
    maps to ignore; everything else → "merge".
  - `skills-tab.test.tsx`: toggle ON by default; flip OFF when agent
    opted into "ignore". Intro-text matcher anchored to a more specific
    phrase so it no longer collides with the toggle hint copy.

Verification:
- `go test ./...` green (full server suite locally).
- `GOOS=windows GOARCH=amd64 go vet ./pkg/agent/...` and
  `go test -c -o /dev/null` both compile clean (windows-tagged linker
  file still builds).
- `pnpm typecheck` green across all packages and apps.
- `pnpm --filter @multica/views test` 88 files / 771 tests green.
- `pnpm --filter @multica/core test` 43 files / 390 tests green.
- Handler DB-backed tests still skip locally without docker; CI will
  validate the create / update paths against migration 108.

Co-authored-by: multica-agent <github@multica.ai>

* chore(landing): drop 0.3.7 changelog entry from this PR (MUL-2603)

The landing-page release notes belong in a separate release-prep PR, not in the feature PR.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agent): propagate skills_local=ignore to codex user-skill seed (MUL-2603)

Make the per-agent skills_local toggle real for Codex too, not just Claude.
Previously the toggle was only consumed by the Claude backend, while the
daemon's execenv layer always seeded Codex's per-task CODEX_HOME with the
host machine's user-installed skills from ~/.codex/skills/. A shared Codex
agent with skills_local=ignore could still inherit a broken local skill
from one operator's machine.

Now: PrepareParams/ReuseParams carry SkillsLocal; hydrateCodexSkills
skips seedUserCodexSkills when SkillsLocal == "ignore" so the per-task
CODEX_HOME exposes only workspace skills to the codex CLI. Default
("merge", or empty from older servers/clients) preserves existing
inherit-from-machine behavior. UI / docs are updated to reflect the
contract honestly: Claude Code and Codex honor the toggle; other
runtimes (Hermes / Kimi / Kiro / Copilot / Cursor / Gemini / Pi /
OpenCode / OpenClaw) leave $HOME untouched and discover user-level
skills natively, so the toggle is a no-op for them today.

New tests: TestPrepareCodexSkillsLocalIgnoreSkipsUserSeed,
TestPrepareCodexSkillsLocalMergeSeedsUserSkills, and
TestReuseCodexSkillsLocalIgnoreSkipsUserSeed cover Prepare(ignore),
Prepare(merge), and the toggle-flip-on-reuse path.

Co-authored-by: multica-agent <github@multica.ai>

* docs(skills): scope skills_local toggle copy to Claude Code + Codex (MUL-2603)

Off-state hint and Skills tab intro now explicitly call out Claude Code +
Codex as the only runtimes that honor the toggle, with "other runtimes
ignore this setting" wired into both states (en + zh-Hans), so users on
non-Claude/Codex agents don't read "Off" as runtime-wide isolation.

Docs (skills.mdx, skills.zh.mdx, guides/agents.zh.mdx) stop describing
Hermes / Kimi / Gemini / Copilot / Cursor / Pi / OpenCode / OpenClaw / Kiro
as having native user-level skill discovery; the daemon simply does not
manage user-level skill discovery for those runtimes today, and the toggle
is a no-op regardless of where it is set.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-26 13:26:33 +08:00
Bohan Jiang
13f74e651a feat(agents): remove custom_env from agent resources, add audited env endpoint (MUL-2600) (#3209)
* feat(agents): remove custom_env from agent resources, add audited env endpoint (MUL-2600)

The agent resource shape (list / get / create / update / archive /
restore responses + WebSocket events) no longer carries `custom_env`
values. Reads/writes of env now flow exclusively through a dedicated
`/api/agents/{id}/env` endpoint that is owner/admin-only, rejects
agent-actor sessions, applies a "****" sentinel preserve guard on
PUT, and writes a persistent audit row per reveal/update.

Why
- `multica agent list --output json` historically returned plaintext
  `custom_env` for owner/admin callers (the redaction gate gave only
  members the masked map). Any agent token running on the workspace
  inherits its owner's role and could read every other agent's
  secrets just by listing.
- Patching list/get redaction alone (PR #3175 direction) left
  symmetric leaks via mutation responses, WS events, the "reveal"
  path itself (no actor-aware auth), and a `****` overwrite footgun
  on UpdateAgent.

What changed
- Backend: drop `custom_env` from AgentResponse; add coarse
  `has_custom_env` + `custom_env_key_count`. Strip env handling from
  UpdateAgent (silently ignored if sent). Keep CreateAgent's
  custom_env acceptance.
- Backend: new GET/PUT `/api/agents/{id}/env` handlers in
  `internal/handler/agent_env.go`:
  - resolveActor → 403 for agent actors (closes the lateral-movement
    path).
  - Owner/admin role gate via existing helper.
  - PUT honours value == "****" as "preserve existing value".
  - Both write to `activity_log` with `agent_env_revealed` /
    `agent_env_updated` actions. Audit details record key names only,
    never values.
- Daemon claim path (`ClaimAgentTask`) unchanged — `TaskAgentData`
  still carries plaintext env for runtime injection.
- SQL: new `UpdateAgentCustomEnv` query; sqlc regenerated (v1.31.1).
- CLI: new `multica agent env get|set` subcommands. `--custom-env*`
  flags removed from `multica agent update`; the no-fields error
  now points to the new path.
- Frontend: drop env fields from `Agent` + `UpdateAgentRequest`; add
  `getAgentEnv` / `updateAgentEnv` client methods; rewrite env-tab
  to show "N variables configured" + explicit "Reveal & edit"
  button, fetching values only on intentional reveal.
- Locales: parity-safe additions to en + zh-Hans.
- Docs: agents-create.{mdx,zh.mdx} reflect the new threat model and
  endpoint.
- Mobile: schema drops `custom_env` / `custom_env_redacted`, adds
  metadata fields.

Tests
- Handler tests pinned the new invariants: no env in list/get
  responses, owner reveal happy-path + audit row, agent-actor 403,
  `****` sentinel preserves real values, UpdateAgent silently
  ignores `custom_env`, pure `mergeAgentEnv` cases.
- CLI tests pivot to the new flag surface: `agent update` MUST NOT
  expose the env flags; `agent env set` MUST expose
  --custom-env-stdin/--custom-env-file.
- Frontend test fixtures updated; pnpm typecheck / test / lint
  pass cleanly.

This is a breaking API change. Scripts that read `custom_env` from
`/api/agents` must migrate to `GET /api/agents/{id}/env`.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agents): close actor-spoofing + audit fail-closed in env endpoints (MUL-2600)

Addresses Elon's review of #3209:

* Mint a task-scoped `mat_` token per claim, bound to (agent, task,
  workspace, owner). Daemon injects it into the agent process in place
  of its own credential. Auth middleware authoritatively rebuilds
  X-User-ID / X-Agent-ID / X-Task-ID from the token row and sets
  X-Actor-Source=task_token; that header is server-set only — incoming
  values are stripped before any auth branch runs. resolveActor honors
  the header so an agent that strips X-Agent-ID / X-Task-ID still
  resolves as actor=agent.
* GetAgentEnv / UpdateAgentEnv are now fail-closed on audit-log
  failures: GET refuses to return plaintext, PUT persists inside the
  same tx as the audit row so they commit/roll back together.
* PUT /api/agents/{id} returns 400 when the body carries custom_env
  instead of silently dropping it — directs callers to the audited env
  endpoint.
* Agent actors never see mcp_config, even when the underlying member
  is owner/admin; mutation broadcasts go through a redaction shim so
  WS subscribers don't pick it up either.
* Fix backend test that asserted dense JSON (jsonb::text renders
  whitespace) and frontend test that assumed a unique "Test User"
  match.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agents): close residual MUL-2600 gaps from review (MUL-2600)

Migration 108 FK now correctly references agent_task_queue(id) instead
of the non-existent agent_task table; the previous name blocked CI
backend migrations.

Task-token-authenticated requests can no longer be re-routed at a
different workspace by passing workspace_slug / workspace_id /
?workspace_id / a URL workspace param. ResolveWorkspaceIDFromRequest
and resolveWorkspaceUUID both short-circuit on X-Actor-Source=task_token
and return only the token-bound X-Workspace-ID; buildMiddleware adds a
defence-in-depth 403 if any URL-resolved workspace disagrees with the
token binding.

mcp_config no longer leaks back to agent actors through UpdateAgent /
CreateAgent / ArchiveAgent / RestoreAgent HTTP responses — the same
redactAgentResponseForActor helper that GetAgent/ListAgents use is now
applied to mutation responses too. WS broadcasts were already redacted
via broadcastAgentResponse.

FailTask and every TaskService cancel path (CancelTask /
CancelTasksForIssue / CancelTasksForAgent / CancelTasksByTriggerComment
/ BroadcastCancelledTasks) now eagerly DeleteTaskTokensByTask so the
mat_ token's 24h window doesn't outlive a terminated task. Failure is
non-fatal — the FK cascade and expiry remain durable guards.

Doc-only: clarify that PUT /api/agents/{id} now hard-rejects bodies
that carry custom_env (was previously "silently ignores").

Tests:
- middleware: TestResolveWorkspaceIDFromRequest gains a task_token
  case asserting client-supplied slug/id/query cannot override the
  bound workspace.
- handler: TestUpdateAgent_RedactsMcpConfigForAgentActor and
  TestUpdateAgent_KeepsMcpConfigForMemberActor pin the mutation-
  response redaction contract per actor type.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agents): match redacted mcp_config as JSON null, not Go nil (MUL-2600)

`AgentResponse.McpConfig` is `json.RawMessage` without `omitempty`, so
the redacted response serialises as `"mcp_config": null`. On decode,
`json.RawMessage` keeps the literal bytes `null` rather than collapsing
to Go nil, which made the assertion fire on a non-leak.

The product contract (field always present, distinguished from "no
config" via `mcp_config_redacted`) is intentional, so adjust the test
to check for "no secret-bearing content" instead of weakening the
contract via `omitempty`.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-25 18:42:48 +08:00
Bohan Jiang
c967ae0e0e feat(issues): platform-owned parent notify on child done (MUL-2538) (#3055)
* feat(issues): platform-owned parent notify on child done (MUL-2538)

When a child issue transitions from a non-done status into `done` and has
an open parent, the server now posts a top-level platform-generated
comment on the parent itself. Replaces the agent-prompt rule shipped in
PR #2918, which produced self-mention loops, planner ping-pong, and
accidental `MUL-` prefix hardcoding because the agent did not always know
the workspace prefix.

- Migration 107 widens `comment.author_type` to allow `system`; the
  zero UUID is used as the sentinel `author_id` (the column stays NOT
  NULL, callers branch on `author_type === 'system'`).
- `Handler.notifyParentOfChildDone` fires from both `UpdateIssue` and
  `BatchUpdateIssues`. Guards: prev status != done, new status == done,
  parent set, parent not in `done`/`cancelled`. Bypasses the
  CreateComment HTTP path so the assignee on_comment trigger and the
  mention-trigger paths do not fire — the comment content carries only
  the safe issue mention for the child, no `mention://agent/...` /
  `mention://member/...` / `mention://squad/...` links.
- `runtime_config.go` downgrades the Parent/Sub-issue Protocol rule 1
  to an explicit "do NOT post one yourself" guardrail; rule 2 (sub-issue
  creation `--status todo` vs `backlog`) is unchanged.
- New handler test exercises the happy path, idempotency, reopen+done,
  parent done/cancelled guards, and the no-parent case. Runtime-config
  tests reassert the new wording and the banned strings from the prior
  revision.

Co-authored-by: multica-agent <github@multica.ai>

* fix(issues): isolate system comments + wire GH merge path (MUL-2538)

Addresses the two must-fix items from the PR #3055 second review:

1. The platform-generated `comment:created` event (author_type='system')
   was running through the generic comment listeners, which (a) tried to
   subscribe the zero-UUID author and (b) parsed @mentions from the body
   for inbox notifications. Both subscriber_listeners and
   notification_listeners now early-return on author_type='system' so the
   event becomes a pure WS broadcast for the timeline — no inbox rows,
   no transcluded-mention attack surface.

2. advanceIssueToDone (the GitHub merge auto-done path) only published
   issue:updated and skipped notifyParentOfChildDone, so a child closed
   via merged PR — the dominant completion path — left the parent
   silent. The helper is now invoked on the same prev/updated pair, with
   the existing guards (transition + parent state) protecting double-fire.

Tests:
- New cmd/server/notification_listeners_test:
  TestNotification_SystemCommentSkipsInboxAndMentions (parent subscribers
  and smuggled @mention targets stay quiet),
  TestSubscriberSystemCommentDoesNotSubscribe (zero-UUID never reaches
  AddIssueSubscriber).
- New internal/handler/github_test:
  TestWebhook_MergedPR_ChildWithParent_NotifiesParent fires a real
  pull_request closed-merged webhook against a child and asserts the
  parent receives exactly one safe system comment with the workspace's
  real identifier (no `mention://agent|member|squad` links).

Co-authored-by: multica-agent <github@multica.ai>

* fix(runtime): drop parent-notification guidance from agent brief (MUL-2538)

Per Bohan's product call on PR #3055: the platform now owns the
child-done parent notification, so the runtime brief should not mention
the parent-comment path at all — not as an instruction, not as a "do
not do it" guardrail. The previous revision kept rule 1 of the Parent /
Sub-issue Protocol as a "Do NOT post your own parent-notification
comment." sentence; that still puts the concept in front of the agent
every run, which is exactly what we are trying to avoid.

What changes:
- Delete the "Parent / Sub-issue Protocol" preamble and rule 1 from
  buildMetaSkillContent. The remaining content — the `--status todo`
  vs `--status backlog` rule for creating sub-issues — now lives in a
  dedicated `## Sub-issue Creation` section, since the parent/child
  framing it previously sat under is gone.
- The system comment on the parent stays exactly as in 366f6e2: the
  agent simply does not need to know about it.

Tests:
- runtime_config_test.go is rewritten around the new section name and
  the wider "no parent-notification guidance" canary; the banned list
  now covers both the original PR #2918 wording and the intermediate
  "do NOT post one" wording.

System comment UI: the frontend already renders `author_type === "system"`
with author name "Multica" (`useActorName`) and the MulticaIcon avatar
(`ActorAvatar` via `isSystem`), matching Bohan's "looks like a normal
comment, author is multica + multica logo" requirement — no frontend
changes needed.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-22 14:51:43 +08:00
Bohan Jiang
7984606eed feat(landing): add Contact Sales page and inquiry endpoint (MUL-2493) (#2988)
* feat(landing): add Contact Sales page and inquiry endpoint (MUL-2493)

Adds a public `/contact-sales` marketing page with a needs-discovery form
modelled on the design reference attached to MUL-2493 — first/last name,
business email (with free-provider rejection), company name + size,
country/region, intended use case, and a free-text goals field, plus the
two consent checkboxes from the reference.

Submissions hit a new public `POST /api/contact-sales` endpoint with
per-IP rate limiting (Redis-backed via the existing RateLimit middleware,
configurable through `RATE_LIMIT_CONTACT_SALES`) and a per-email hourly
cap so a single business address can't be used as a flood channel after
one valid pass. The inquiry is stored in a new `contact_sales_inquiry`
table; analytics fires a `contact_sales_submitted` PostHog event with
only the closed-enum dimensions (size, country, use case) — the free-text
goals stay in the DB and are never broadcast.

The page is linked from the landing header (md+) and the footer's Company
column, in both English and Simplified Chinese. The reserved-slug list is
updated so a workspace named `contact-sales` can't shadow the route.

Co-authored-by: multica-agent <github@multica.ai>

* fix(landing): canonicalize business email and tighten contact-sales form (MUL-2493)

- Parse the submitted email with net/mail and run the free-email
  block-list against the canonical addr.Address, so a display-name
  form like `Ada <ada@gmail.com>` can no longer slip past the gate
  (the raw string had domain `gmail.com>`, which wasn't blocked).
  Adds regression tests covering the display-name bypass and the
  canonicalization helper.
- Drop noValidate from the contact-sales form so the browser's
  native required / email / select checks fire before submit;
  the JS-side free-email warning still runs as a UX guard.
- Update success copy ("respond within three business days") in
  EN and ZH plus the page metadata.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-22 13:22:36 +08:00
LinYushen
5bacfd9742 MUL-2526 feat: add member(user_id, workspace_id) index + upgrade sqlc to v1.31.1 (#3046)
- Add migration 106: CREATE INDEX CONCURRENTLY on member(user_id, workspace_id)
- Rewrite ListWorkspaces to drive from member table with explicit fields
- Regenerate all sqlc code with v1.31.1 (intentional version upgrade)

Co-authored-by: multica-agent <github@multica.ai>
2026-05-22 12:26:56 +08:00
Naiyuan Qing
fbd965e5bf feat(onboarding): v3 — thin server, frontend-orchestrated welcome (#3008)
* feat(onboarding): Multica Helper as general workspace assistant + blocking modal

Reshape Multica Helper from an onboarding-only guide into the workspace's
general-purpose AI assistant. The agent's permanent identity (injected as
`## Agent Identity` into every task's CLAUDE.md / AGENTS.md / GEMINI.md
via execenv.InjectRuntimeConfig) is rewritten to three sections that don't
overlap with what the brief already provides:

  - Who I am (built-in workspace assistant, not onboarding-only)
  - What Multica is + docs/source/issues URLs as knowledge sources
  - What I can do (CLI = manifest, `multica --help` is the source of truth)
  - Tone (concise, like a colleague, match user's language)

Bootstrap moves out of the in-flow Step 4. Runtime step now exits the
onboarding shell with no bootstrap call; a blocking OnboardingHelperModal
mounts inside the workspace layout (web + desktop) and gates purely on
`me.onboarded_at == null`. The user picks one of three starter prompts
(intro / assign / second_agent) and the modal calls
BootstrapOnboardingRuntime with a new optional `starter_prompt` field that
becomes the seeded onboarding issue's description.

Side effects required to make `onboarded_at == null` an honest signal:

  - CreateWorkspace no longer marks onboarded (was atomic with CreateMember).
    The "member exists ⟹ onboarded_at != null" invariant is intentionally
    broken; guards (useDashboardGuard / desktop App.tsx) already tolerate
    this — comments updated to reflect the new contract.
  - AcceptInvitation still marks (invitee skips the modal in someone
    else's workspace). Code comment added warning future removers.
  - resolvePostAuthDestination flips to workspace-presence-first: a user
    with a workspace lands in it regardless of `onboarded_at`, so the
    modal can pick up an interrupted setup on relogin.

Other backend changes:
  - `onboardingAssistantDescription` rewritten ("Built-in workspace assistant…")
  - `onboardingAssistantInstructions` rewritten to the 3-section identity
  - `bootstrapOnboardingRuntimeRequest.StarterPrompt` (optional, 2 KiB rune
    cap, empty-falls-back-to onboardingIssueDescription)

Frontend changes:
  - Delete `packages/views/onboarding/steps/step-teammate.tsx` (no longer a
    persisted step)
  - `ONBOARDING_STEP_ORDER` and `OnboardingStep` type drop `"teammate"`
  - `handleRuntimeNext` exits via `onComplete(workspace, undefined)` — no
    bootstrap, `onboarded_at` stays NULL so the modal fires
  - Runtime step next-button copy → "Start exploring" / "开始探索"
  - New `packages/views/workspace/onboarding-helper-modal.tsx`:
    Base UI Dialog, dismissible=false, three localized cards, mutation
    invalidates agents + issues queries then navigates to the seeded issue
  - Mounted in both `apps/web/app/[workspaceSlug]/layout.tsx` and
    `apps/desktop/src/renderer/src/components/workspace-route-layout.tsx`

Tests:
  - Backend: TestBootstrapOnboardingRuntime_{With,No}StarterPrompt and
    TestCreateWorkspace_DoesNotMarkOnboarded
  - Frontend: onboarding-helper-modal.test.tsx covers all four gating
    conditions, three-card behavior, mutation pending state, and the
    "no close button" invariant

Compatibility:
  - Already-onboarded users: zero impact (modal can't fire)
  - Invitees: AcceptInvitation still marks → modal can't fire
  - Skip-runtime path: BootstrapOnboardingNoRuntime still marks → modal can't fire
  - Old desktop / web clients: legacy teammate-step path keeps working
    (bootstrap accepts missing starter_prompt) — the new modal only fires
    on the new frontend bundle
  - Avatar SVG kept (asterisk variant) — no migration of existing Helper
    agents, only newly-created Helpers pick up the new instructions/description

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(desktop): suppress OnboardingHelperModal while a WindowOverlay is open

On desktop, App.tsx auto-creates a tab pointing at the user's first
workspace as soon as workspaces.length flips from 0 → 1 (during onboarding
Step 2). The new tab mounts WorkspaceRouteLayout under the overlay,
which mounts OnboardingHelperModal. The modal's Portal renders to
document.body — appearing AFTER the WindowOverlay in DOM order, so its
z-50 wins and the modal floats in front of the still-active onboarding
Step 3 (runtime).

Suppress the modal whenever any WindowOverlay is active. When the overlay
closes (onComplete fires after the user finishes onboarding), the modal
re-evaluates `me.onboarded_at == null` and pops on its own.

Web is unaffected (onboarding flow lives at /onboarding, not under
/[workspaceSlug]/, so WorkspaceRouteLayout never mounts during the
onboarding flow).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(onboarding): add v2 refactor plan

Captures the design + 8-step implementation order for collapsing the
onboarding state machine: single mark-onboarded entry point, persisted
Step 3 user choice, dumb Modal, single install-runtime seed call site.
Includes old-user compatibility analysis (4 existing gates) and per-PR
risk/rollback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(db): persist Step 3 runtime choice on user record (MUL-onboarding-v2)

Adds onboarding_runtime_id UUID NULL + onboarding_runtime_skipped BOOLEAN
columns to "user" and the CHECK constraint enforcing the 3-state machine
(unset / picked-runtime / explicit-skip; the fourth combination is
forbidden). ON DELETE SET NULL on the FK so a deleted runtime degrades
to "unset" rather than dangling.

PatchUserOnboarding gains the two narg fields plus CASE expressions that
collapse the runtime/skipped pair atomically — a follow-up PATCH that
flips one side now clears the other in the same statement, instead of
preserving it via per-field COALESCE and tripping the CHECK constraint.

Backwards compatible for existing users: both new fields default to
(NULL, false), which is the "unset" leaf of the state machine, and four
upstream gates on me.onboarded_at != null already short-circuit the
new fields' readers for everyone who's already onboarded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(server): collapse onboarding side effects to service layer

Introduces OnboardingService.MarkComplete and
WorkspaceContentService.{Ensure,Seed}InstallRuntimeIssue as the single
authorities for the two onboarding side effects that used to be
duplicated across four handlers:

  - MarkUserOnboarded + claim starter_content_state +
    optional install-runtime fallback seed: was inline in
    BootstrapOnboardingRuntime, BootstrapOnboardingNoRuntime,
    AcceptInvitation, and CompleteOnboarding.
  - install-runtime issue seeding: was inline in CreateWorkspace and
    AcceptInvitation as a "no runtime yet" fallback.

After this refactor:
  - MarkUserOnboarded is called from exactly one place (the service).
  - install-runtime issue is seeded from exactly one place (the service).
  - CreateWorkspace deliberately does not seed — the new
    /ensure-onboarding-content endpoint (also added here) lets the
    workspace-entry init component request the seed on first mount, so
    workspaces created but never opened don't accumulate stale issues.
  - The PatchOnboarding handler now accepts the new runtime_id /
    runtime_skipped fields and rejects (uuid, skipped=true) up front.
  - UserResponse exposes the two new persisted fields so the frontend
    can read them off `me` without an extra round-trip.

Handler-side tests added: TestPatchOnboarding_RuntimeChoiceSwitch (the
explicit cross-request switch path that the original COALESCE design
would have 500'd on) + TestPatchOnboarding_PreserveUntouched.

Old handler-local file no_runtime_issue.go is deleted; its content
moved to service/workspace_content.go with the helpers exported.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(core): API + types for persisted onboarding runtime choice

User type / Zod schema gain onboarding_runtime_id (string | null) and
onboarding_runtime_skipped (boolean); EMPTY_USER + test fixture updated
to match. api.patchOnboarding accepts the new optional fields and the
new api.ensureOnboardingContent endpoint is wired so the workspace
shell can request the fallback seed.

Two new store helpers — recordOnboardingRuntimeChoice(runtimeId) and
recordOnboardingRuntimeSkipped() — replace the prior pattern of
Step 3 calling bootstrap directly. They PATCH the user's choice, sync
the auth store, and return. Mutually exclusive on the server side via
the CHECK constraint; the client just ships one intent at a time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(workspace): WorkspaceOnboardingInit single decision point + dumb Modal

Replaces OnboardingHelperModal's self-gating render path with a 4-branch
dispatcher that runs once on workspace-shell mount:

  branch 0  me.onboarded_at != null         → ensure install-runtime issue
                                              fallback, render nothing
  branch 1  me.onboarding_runtime_skipped   → SkipBootstrapping component:
                                              loading veil → bootstrap →
                                              navigate. On failure shows
                                              a Retry UI instead of
                                              silently freezing the veil
  branch 2  me.onboarding_runtime_id        → render Modal with the
                                              runtime id from `me` (no
                                              internal list query)
  branch 3  (none of the above)             → useEffect navigate back to
                                              /onboarding so the user
                                              walks Step 3 again

The Modal itself is now a dumb component — receives `workspace` and
`runtimeId` as props, no internal gates, no runtimeListOptions query.
Tests rewritten to cover the props-driven render + pick-card paths;
the prior gating tests move into the new
workspace-onboarding-init.test.tsx alongside the M2 retry-on-failure
behaviour.

Mounted in both apps/web/app/[workspaceSlug]/layout.tsx and the desktop
workspace-route-layout. Desktop keeps its `!overlayActive` suppression
guard so the init doesn't portal-jump in front of an active
WindowOverlay.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(onboarding): Step 3 records user choice instead of calling bootstrap

handleRuntimeNext now PATCHes the user's pick (recordOnboardingRuntime
{Choice,Skipped}) and navigates straight into the workspace shell. The
workspace-entry WorkspaceOnboardingInit reads the persisted choice off
`me` and runs the appropriate branch — Step 3 is pure intent capture
with zero side effects on its own.

PATCH must succeed before navigation: if it fails the user stays on
Step 3 with a toast, because navigating with no persisted intent would
land them in WorkspaceOnboardingInit's branch 3 "no decision yet" rescue
and trigger a redirect loop back to /onboarding.

The prior asymmetry (Connect deferred bootstrap to the workspace, Skip
ran bootstrap inline) is gone — both paths defer to the workspace
shell now.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(onboarding): v3 — thin server, frontend-orchestrated welcome

Collapse v2's persisted runtime-choice fields + 4-branch dispatcher +
OnboardingService/WorkspaceContentService stack down to a single rule:
`onboarded_at` is the only state field, layout hard-gates on it, and the
welcome experience after Step 3 is owned entirely by the frontend.

V3 flow
- Step 3 button: await POST /api/me/onboarding/complete (mark only) +
  park a transient signal in `useWelcomeStore` + navigate
- Workspace layout: hard gate `onboarded_at == null` -> /onboarding
- `<WelcomeAfterOnboarding />` reads the welcome-store signal:
  - runtime path: find-or-create Multica Helper via generic createAgent
    with bilingual instructions from `templates/helper-instructions.ts`,
    blocking modal with 3 starter cards, pick -> createIssue + navigate
  - skip path: provision install-runtime (in_progress) -> agent-guide
    (todo, body embeds install-runtime mention chip) -> follow-up comment
    on install-runtime mentioning agent-guide; then pop celebration
    modal with 🎉 emoji pop animation, 2 read-only preview cards, single
    [Got it] CTA that navigates to install-runtime

Server cleanup
- Drop OnboardingService, WorkspaceContentService, v2 runtime-choice
  columns/CHECK on user, EnsureOnboardingContent endpoint
- CompleteOnboarding/AcceptInvitation call qtx.MarkUserOnboarded
  directly (no service indirection)
- BootstrapOnboardingRuntime / BootstrapOnboardingNoRuntime kept as a
  deprecation shim in onboarding_shim.go for desktop < v3 during the
  rollout window — handlers inlined to qtx.* calls, no service layer

Localization
- Persisted strings (issue titles/bodies, Helper instructions/
  description, comment prefix) live as TS const `{en, zh}` maps in
  `packages/views/onboarding/templates/` — i18n bundle staleness can no
  longer write raw key paths into DB
- UI-rendered strings (modal copy, status chips, buttons) stay in
  `packages/views/locales/{en,zh-Hans}/onboarding.json`
- Language picked from live `i18n.language` (not `me.language`, which is
  null for new users until they pick a preference)

Race protection
- Module-level promise dedupe (`findOrCreateHelper`, `seedIssueDeduped`,
  `postCommentDeduped`) so React StrictMode double-mount can't fire two
  parallel API calls that the server would then 409

Cross-references between the two skip-path issues render via Multica's
mention-chip protocol `[<identifier>](mention://issue/<uuid>)` so they
match the styled IssueChip pills used elsewhere.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(onboarding): welcome-after-onboarding modal redesign + cross-user safety

Welcome modal polish (the post-Step-3 surface this branch already
introduced):

Runtime path
- Helper avatar replaces the bouncy 🎉 hero; tone-down animation to
  fade. New copy: "Hi, welcome to Multica / I'm your first Agent
  assistant" + capability hint sentence so users discover assignment +
  chat from the first screen.
- Cards changed from "click = submit" to multi-select with the existing
  border-primary + ring selection pattern used by compact-runtime-row;
  bottom CTA "Assign N tasks to me →" appears only with N>0.
- New starter cards: intro / tour / welcome_page (the last one tells
  Helper to paste an HTML welcome page into the issue comment — works
  on any runtime regardless of fs access).
- Success state added between createIssue and navigation: 🎉 +
  "All set!" + "Sit tight  — your {agentName} is on it" + inbox/chat
  hints, single [Got it] button.
- Title/prompt for starter cards now live in TS const
  HELPER_STARTER_PROMPTS (persisted to DB — must not depend on i18n
  bundle being loaded); subtitle stays in onboarding.json.

Skip path
- Body restructured into three independent ```md blocks (Name /
  Description / Instructions) so each picks up the markdown renderer's
  per-block copy button — no manual extraction.
- ZH body now embeds the ZH Helper Description + Instructions (was
  Chinese-around-English-block).
- Follow-up comment uses Multica's mention-chip protocol
  [identifier](mention://issue/uuid) so it renders as the styled
  IssueChip pill.
- Issue titles bilingual with "Step 1 / Step 2" prefix.

Cross-user / cross-workspace safety (code review feedback)
- web onLogout + desktop handleDaemonLogout now call
  useWelcomeStore.reset() so user B logging into the same browser
  doesn't inherit user A's signal.
- WelcomeAfterOnboarding gates on
  currentWorkspace.id === signal.workspaceId — prevents firing the
  modal in workspace B when the signal was parked for workspace A
  (desktop multi-tab, back/forward, deep-link).
- Module-level promise dedupes (pendingHelperSetup,
  pendingIssueSeed, pendingCommentSeed) for the three API calls so
  React 18+ StrictMode dev double-mount can't race-create duplicates.

Other small fixes carried in this commit
- Helper instructions / agent description / starter card titles all
  read i18n.language (not me.language, which is null for new users
  who haven't picked a UI language preference yet).
- Reverted welcome-emoji-pop animation to a small fade for the runtime
  avatar (kept the bouncy variant for the skip 🎉 hero where the
  celebration is the whole point).
- Removed the duplicate 🎉 from the skip modal title (kept the hero
  one only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(views): i18n hardcoded "Close" in welcome FullScreenError

CI lint (i18next/no-literal-string) blocked on a literal "Close" string
inside `FullScreenError` — surfaced as a nit in the original code
review but missed in the merge. Add `error_close` to onboarding.json
(EN: "Close" / ZH: "关闭") and thread it through as a `closeLabel`
prop, matching the existing `retryLabel` plumbing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 19:00:26 +08:00
Bohan Jiang
0c767c0052 feat(issues): per-issue metadata KV (MUL-2017) (#2845)
* feat(issues): per-issue metadata KV (MUL-2017)

Adds a small JSONB KV map to every issue for agent pipeline state (attempts,
PR number, pipeline status, ...). Keys match a narrow regex, values are
primitives (string / number / bool), capped at 50 keys per issue and 8KB
per blob. Defense-in-depth via two CHECK constraints (object shape + size).

All mutations are single-key atomic (jsonb_set / `- key`). `UpdateIssue`
intentionally does NOT touch metadata: a whole-blob overwrite would race
with concurrent agent writes.

  GET    /api/issues/:id/metadata
  PUT    /api/issues/:id/metadata/:key   body: { "value": <primitive> }
  DELETE /api/issues/:id/metadata/:key

Containment filter on list: GET /api/issues?metadata=<json-object> uses
PG `@>` against a `jsonb_path_ops` GIN index. Mirrored across ListIssues,
CountIssues, ListOpenIssues, and the hand-rolled ListGroupedIssues SQL so
CLI/API and UI grouped views stay consistent.

CLI: multica issue metadata {list,get,set,delete}
  multica issue list --metadata key=value (repeatable, AND)
  set has --type to override the default value-sniffing
Co-authored-by: multica-agent <github@multica.ai>

* fix(issues): metadata test bugs + wire realtime + read-only display (MUL-2017)

- Fix two failing handler tests blocking backend CI:
  - reset decode target after delete so map merge does not mask removal
  - url.PathEscape the key segment so spaces no longer panic NewRequest
- Wire issue_metadata:changed end to end so the detail / list / my-issues
  caches stay in sync with set/delete events (other tabs, CLI writes).
- Add a read-only Metadata strip to the issue detail sidebar; hidden when
  the issue has no keys so it stays quiet in the common case.

Co-authored-by: multica-agent <github@multica.ai>

* feat(runtime): teach agents to read/write issue metadata (MUL-2017)

Add an `## Issue Metadata` section to the runtime brief plus a
`metadata list` step on entry and a `metadata set`/`delete` step on
exit. Section only emits when the task carries an issue id (comment- or
assignment-triggered); chat / quick-create / run-only autopilot stay
clean so they don't fire failing CLI calls.

Co-authored-by: multica-agent <github@multica.ai>

* fix(issues): bump metadata migration to 105 and drop attempts as example (MUL-2017)

main is now at 104_drop_runtime_timezone; the migrator picks
LatestVersion() by sorted filename, so a slot before the tail would
let DBs that have already run 099–104 think they're up-to-date while
the issue.metadata column is missing — runtime would then fail with
column does not exist. Renumbering to 105 puts the migration at the
tail and forces it to run.

Also drop attempts as a positive example across docs/code comments and
test fixtures — the runtime instruction prompt already lists it under
"What NOT to pin" (runtime bookkeeping). Replace with pr_number, which
is in the recommended-keys set, so docs/tests speak the same language
as the prompt.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-21 16:35:45 +08:00
Bohan Jiang
9a5d8a52f3 fix(timezone): harden hourly-rollup rollout against straight-through migrate MUL-2488 (#2998)
* fix(timezone): harden hourly-rollup rollout against straight-through migrate

MUL-2488

PR #2968 introduced the new task_usage_hourly rollup but assumed operators
would stop migrate between 102 and 103 to run the one-shot
cmd/backfill_task_usage_hourly. Two pieces made that unsafe in practice:

1. The Dockerfile only shipped server / multica / migrate, so a deployed
   container has no backfill binary to run between phases.
2. cmd/migrate has no per-version stop, and entrypoint.sh runs `migrate up`
   to the latest version, so 103 silently drops the legacy daily rollups
   even when nobody ran the backfill — leaving usage dashboards at zero
   despite source data being intact in task_usage.

Changes:

- Build cmd/backfill_task_usage_hourly into the runtime image alongside
  the other binaries so operators can `docker exec` the backfill instead
  of needing a source checkout.
- Add a fail-closed plpgsql guard at the top of migration 103 that
  aborts the migration when task_usage has rows but task_usage_hourly is
  empty. Fresh databases (no task_usage rows) are exempt because the new
  triggers from 102 will populate the hourly table on the first event.

Already-applied databases are unaffected — schema_migrations tracks by
version only, so 103 is not re-run.

Co-authored-by: multica-agent <github@multica.ai>

* fix(timezone): use watermark coverage for hourly-rollup guard

The previous check only required `task_usage_hourly` to be non-empty,
which an interrupted backfill or a manual `rollup_task_usage_hourly_window`
call both satisfy. The completion signal we actually trust is
`task_usage_hourly_rollup_state.watermark_at` — backfill only stamps it
to `now() - 5 min` after every monthly slice succeeded, and the cron
worker only advances it on a real tick. Default after migration 101 is
`1970-01-01`, so an unrun or partial backfill is trivially detected.

Also corrects the comment about fresh-install behavior: the triggers in
102 only enqueue dirty keys for agent_task_queue / issue / task_usage
DELETE — they do not write hourly rows. INSERT/UPDATE flows through the
`updated_at` watermark window of `rollup_task_usage_hourly()`, which
only runs once the operator registers it as a pg_cron job.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-21 16:26:42 +08:00
YYClaw
614dfae884 MUL-2488 feat(timezone): Scheduling / Viewing two-layer timezone architecture (#2968)
* docs(timezone): add scheduling/viewing timezone architecture RFC

* feat(db): replace daily rollups with task_usage_hourly, add user.timezone

Migrations 100-104: add "user".timezone (Viewing tz), build the UTC
hourly task_usage_hourly rollup with its pipeline, drop the legacy
task_usage_daily / task_usage_dashboard_daily pipelines, and drop the
agent_runtime.timezone column. Report queries now slice day boundaries
at read time by the caller-supplied @tz instead of materialising in a
fixed tz. Regenerate sqlc.

* feat(server): add task_usage_hourly backfill command

Replace the two legacy backfill commands (daily / dashboard_daily) with
a single backfill_task_usage_hourly that loads historical task_usage
into the new UTC hourly rollup, sliced per workspace.

* refactor(server): resolve viewing timezone in report handlers

Report handlers resolve the Viewing tz per request (?tz query param,
then user.timezone, then UTC) and pass it to the hourly-rollup queries.
Drop the UseDailyRollup feature flags and the old raw-scan/daily-rollup
dual paths, remove the /api/usage endpoints, and stop the daemon from
reporting and the runtime handler from accepting host timezone.

* refactor(core): switch report queries to viewing timezone

API client and dashboard/runtime queries send ?tz with each report
request, the user schema/types carry the new timezone field, and the
runtime timezone field/mutation is removed.

* feat(views): add viewing timezone preference and UI

Add the useViewingTimezone hook and a Timezone setting in Preferences;
report charts and the dashboard week boundary follow the viewer tz.
Remove the runtime detail timezone editor and its locale strings.

* fix(test): update fixtures and stabilize tests for timezone refactor

The timezone architecture refactor changed several types without
updating dependent test code:

- RuntimeDevice no longer has a timezone field — drop it from the
  create-agent-dialog runtime fixture.
- User now requires a timezone field — add it to the apps/web mockUser
  fixture.
- The PreferencesTab timezone tests asserted on the async save handler
  (PATCH then store update) with a bare expect, racing the mutation's
  settle callback, and timed out querying the Select's ~600-option IANA
  list on a loaded CI runner. Wrap the assertions in waitFor and extend
  the timeout for those three tests.

* docs(timezone): document self-host migration order and trigger invariant

Add a SELF-HOST UPGRADE ORDER runbook to the backfill command's package
comment: applying migrations 100-104 in a single migrate-up drops the
legacy daily rollups before the hourly backfill runs, leaving dashboards
empty until cron catches up.

Add an INVARIANT comment on trg_atq_dirty_hourly noting that agent_id
must be added to the trigger's OF list if it ever becomes mutable,
otherwise dirty buckets for the old agent_id are silently missed.

* style(runtimes): drop trailing blank line in runtime-detail
2026-05-21 15:33:47 +08:00
Angular
1f978bf1ec feat(autopilot): link created issues to projects (#2908)
* feat(autopilot): link created issues to projects

* test(autopilot): cover project flag
2026-05-20 15:37:23 +08:00
Bohan Jiang
2bec2221d2 feat(agent): per-agent thinking_level for claude + codex (MUL-2339) (#2865)
* feat(agent): persist thinking_level per agent (MUL-2339)

Adds a nullable `thinking_level` column to the `agent` table so the
backend can route a runtime-native reasoning/effort token (e.g. Claude's
`xhigh`, Codex's `minimal`) through to the agent CLI on every dispatch.

The column is intentionally TEXT rather than an enum — Claude and Codex
publish overlapping but distinct vocabularies and we want the persisted
value to round-trip exactly through whichever CLI receives it. NULL is
the "use runtime default" sentinel that every downstream consumer reads
as "do not inject --effort / reasoning_effort".

This commit is just the storage layer (migration + sqlc); subsequent
commits wire it through the API, daemon, and agent backends.

Co-authored-by: multica-agent <github@multica.ai>

* feat(agent-backend): inject reasoning effort for claude + codex (MUL-2339)

Extends ExecOptions with a runtime-native ThinkingLevel string and wires
it into the Claude and Codex backends. Discovery is driven by the local
CLI so the daemon advertises whatever the host install supports rather
than a hand-maintained list that goes stale.

Per Elon's PR1 review:
- Claude: parses `claude --help` to learn the `--effort` superset and
  projects through a per-model allow-list (xhigh is Opus-only; max is
  session-only on the smaller models). Falls back to a conservative
  static list when the binary is missing or help drift hides the line.
- Codex: drives `codex debug models --output json` so per-model
  reasoning subsets and the documented default come directly from the
  CLI. The older config-error probe trick is gone — the JSON path is
  stable and doesn't pollute stderr with an intentional misconfig.
- Cache key includes (provider, executablePath, cliVersion) so a CLI
  upgrade invalidates entries that referenced the older help / catalog.

Per Trump's PR1 constraint, all three Codex injection points
(thread/start.config, thread/resume.config, turn/start.effort) flow
through one helper (`applyCodexReasoningEffort`) so they cannot drift
independently. The shared `codexReasoningCases` fixture in
`thinking_test.go` asserts the same value→{shape, key} contract at
each site for every level the runtimes know about.

Claude's `--effort` is also added to `claudeBlockedArgs` so a user
custom_args entry can't silently outvote the daemon-injected value.

Co-authored-by: multica-agent <github@multica.ai>

* feat(api): wire thinking_level through API + daemon contract (MUL-2339)

End-to-end plumbing for the per-agent reasoning/effort setting:

- AgentResponse / TaskAgentData now carry `thinking_level`; the daemon's
  claim response includes it and the daemon's executor passes it through
  to agent.ExecOptions, where the Claude and Codex backends already know
  what to do with it.
- ModelEntry on the runtime-models wire format gains a `thinking` block
  carrying `supported_levels` + `default_level` per model so the UI can
  render a runtime-aware picker without the server having to know about
  the local CLI install. `handleModelList` projects the agent-package
  catalog (including the new Thinking field) into the wire shape.
- CreateAgent / UpdateAgent gate the field with a synchronous provider
  enum check (claude / codex only today). UpdateAgent is tri-state:
  field omitted = no change, "" = explicit clear (new
  `ClearAgentThinkingLevel` query, mirrors the existing mcp_config null
  pattern), non-empty = validate then set.

Per Trump's PR1 review, the API NEVER auto-clears on a runtime/model
swap and ALWAYS returns 400 on an unknown literal value — same shape
across CreateAgent, UpdateAgent, and combined patches that move
runtime + level in one request. Per-model combination failures (e.g.
`xhigh` against a model that only supports up to `high`) surface as a
daemon-side task error, not a silent server-side rewrite.

TS types follow the same shape: `Agent.thinking_level`,
`CreateAgentRequest`/`UpdateAgentRequest` add the field, `RuntimeModel`
grows a `thinking` block. Older backends omit the field, which the
front-end treats as "no picker for this model" — installed desktop
builds keep working.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agent): correct codex debug models argv + pin via runner test (MUL-2339)

`codex debug models --output json` is rejected by codex-cli 0.131.0 —
the subcommand emits JSON on stdout by default and has no `--output`
flag. Drop the flag and add `--bundled` to skip the network refresh
discovery doesn't need. Move the argv to a package-level var and add
a test that runs a fake `codex` to assert the binary actually
receives exactly `debug models --bundled`, so the contract can't
silently drift on the next refactor.

Also teach ValidateThinkingLevel to resolve an empty model to the
provider's default model entry. Without this, every default-model
task with a persisted thinking_level would be misjudged "unknown
model" by the daemon guard.

Co-authored-by: multica-agent <github@multica.ai>

* fix(api): reject runtime switch that would leave invalid thinking_level (MUL-2339)

A PATCH that changed `runtime_id` without touching `thinking_level`
used to silently keep the existing value, so a Claude agent storing
`max` could land on a Codex runtime where `max` is not a recognised
token at all, and the daemon would receive a literal-invalid level.

Hold the same "always 400 on literal-invalid, never silent coerce"
rule on this implicit path. When runtime_id changes and the existing
value is not in the new provider's enum, return 400 with the
recovery options (clear via `thinking_level=""` or re-set in the
same PATCH).

Add coverage for both the kept-when-still-valid and the rejected
cases, plus the two recovery paths (clear and replace).

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): guard runTask with per-model thinking_level validator (MUL-2339)

ValidateThinkingLevel existed but had no call site — `task.Agent.
ThinkingLevel` flowed straight into ExecOptions, so `xhigh` configured
on a non-Opus Claude model, or API-side stale values that escaped the
provider enum gate, would be injected anyway.

Run the validator before building ExecOptions. Invalid combinations
log a warning and drop the level instead of failing the task: the
agent still runs, just at the runtime's default reasoning effort.
Discovery errors fail open (keep the level, let the CLI surface any
objection) so a transient `claude --help` failure can't strand work.

Empty model is forwarded as-is; the validator resolves it to the
provider's default model internally per the cross-package contract.

Co-authored-by: multica-agent <github@multica.ai>

* chore(agent): drop stale `--output json` comments + unused scanner (MUL-2339)

Codex CLI's `debug models` subcommand emits JSON without an `--output`
flag, and `parseCodexDebugModels` never read from the bufio.Scanner.
Sync the comments with the actual invocation and remove the dead init.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-20 12:30:10 +08:00
Jiayuan Zhang
fc8528d64d feat(autopilot): support assigning to a squad (MUL-2429) (#2888)
* feat(autopilot): support assigning autopilot to a squad (MUL-2429)

Path A (Squad-as-Leader) from the RFC: when an autopilot's assignee is a
squad, dispatch resolves to squad.leader_id and executes against the
leader's runtime — semantics match a human manually assigning the issue
to that squad, no fan-out.

Backend scope only; frontend picker change is a follow-up PR.

Changes:
- 096_autopilot_squad_assignee migration: drop agent FK on
  autopilot.assignee_id, add assignee_type column (default 'agent'),
  add autopilot_run.squad_id attribution column.
- service.AgentReadiness: single source of truth for archived /
  runtime-bound / runtime-online checks. Shared by autopilot
  admission gate, run_only dispatch, and isSquadLeaderReady.
- service.resolveAutopilotLeader: translates assignee_type/id to the
  agent that actually runs the work.
- dispatchCreateIssue: stamps issue with assignee_type='squad' for
  squad autopilots and enqueues via EnqueueTaskForSquadLeader.
- dispatchRunOnly: belt-and-braces readiness re-check after resolving
  squad → leader so a leader that went offline between admission and
  dispatch produces a clean failure instead of a doomed task.
- handler.CreateAutopilot / UpdateAutopilot: accept assignee_type with
  squad/agent existence + leader-archived validation. Backward-compatible
  default of "agent" preserves the contract for older clients.
- Analytics: AutopilotRunStarted/Completed/Failed events carry
  assignee_type and squad_id; PostHog can now group autopilot runs by
  squad without joining back to the autopilot row.

Co-authored-by: multica-agent <github@multica.ai>

* fix(autopilot): reject archived squads, route post-admission skips, cleanup dangling-agent autopilots (MUL-2429)

Addresses three review findings on PR #2888:

1. Archived squad handling: validateAutopilotAssignee now rejects squads
   with archived_at set; resolveAutopilotLeader returns errSquadArchived
   so the admission gate fails closed; DeleteSquad now mirrors the issue
   transfer for autopilot rows (TransferSquadAutopilotsToLeader) so
   surviving autopilots flip to assignee_type='agent' (leader) instead
   of dangling at the archived squad.

2. dispatchRunOnly post-admission readiness: introduces errDispatchSkipped
   sentinel, recognised by DispatchAutopilot via handleDispatchSkip so
   the run is recorded as `skipped` (not `failed`). Manual triggers no
   longer 500 when the leader's runtime goes offline between admission
   and task creation. New TestManualTriggerDoesNotErrorOnPostAdmissionSkip
   locks the behaviour in.

3. Dangling agent assignee after migration 096 dropped the FK:
   shouldSkipDispatch now distinguishes pgx.ErrNoRows / errSquadArchived
   (hard skip — retrying won't help) from transient DB errors
   (fail-open). DeleteAgentRuntime pauses autopilots that target agents
   about to be hard-deleted (ListArchivedAgentIDsByRuntime +
   PauseAutopilotsByAgentAssignees) so the breakage surfaces as a paused
   row in the UI instead of a quiet skip-burning loop.

Unit tests cover the sentinel unwrap contract and errSquadArchived
errors.Is behaviour. Integration test
TestAutopilotDispatchSkipsWhenRuntimeOffline re-verified against a fresh
DB with migration 096 applied.

Co-authored-by: multica-agent <github@multica.ai>

* fix(autopilot): bump last_run_at on post-admission skip (MUL-2429)

Match recordSkippedRun (pre-flight skip) and the success path so the
scheduler / "last seen" UI both reflect that this tick evaluated the
trigger, even when the post-admission readiness gate caught a late
regression.

Addresses Emacs review caveat #1 on PR #2888.

Co-authored-by: multica-agent <github@multica.ai>

* feat(autopilot): mixed agent/squad assignee picker in dialog (MUL-2429)

End-to-end UI for assigning an autopilot to a squad. Closes the PR #2888
backend gap: the squad-as-assignee feature was already wired in Go (Path A,
RFC §4) but the desktop dialog never offered the choice.

- core/types/autopilot: add `AutopilotAssigneeType`, surface
  `assignee_type` on `Autopilot` + Create/Update request payloads.
- views/autopilots/pickers/agent-picker: switch to a polymorphic
  AssigneeSelection (`{type, id}`); render agents and squads as two
  grouped sections with shared pinyin search.
- views/autopilots/autopilot-dialog: maintain `assigneeType` state, send
  it on create/update, render the trigger avatar / hover dot with
  `assignee.type`.
- views/autopilots/autopilots-page + autopilot-detail-page: render the
  assignee row using `autopilot.assignee_type` so squad-typed autopilots
  show the squad avatar + name, not a broken agent lookup.
- locales: add `agents_group` / `squads_group` / `select_assignee` keys
  (en + zh-Hans), keep legacy `select_agent` for callers that still
  reference it.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: Lambda <lambda@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-20 05:30:13 +02:00
Jiayuan Zhang
2ad1cd8ff8 feat(profile): user profile description injected into agent brief (MUL-2406)
## Summary

Adds per-user `profile_description` so coding agents have cheap, durable context about who is asking. v1 per the brief Xeon locked in on [MUL-2406](mention://issue/63a7247c-4f6a-42cf-90d1-7c746e77158a):

- **DB** — `user.profile_description TEXT NOT NULL DEFAULT ''` (migration 096). 2000-rune cap enforced server-side. No nullable / privacy state to manage.
- **API** — `PATCH /api/me` accepts the field; `UserResponse` always emits it. Client wraps `updateMe` in a lenient `UserSchema` + `EMPTY_USER` fallback per CLAUDE.md API Response Compatibility.
- **UI** — Settings → Account gains an "About you" textarea with live `n/2000` counter, `maxLength` guard, and a localized too-long error (EN + zh-Hans).
- **CLI** — `multica user profile get` / `multica user profile update` with `--description / --description-stdin / --description-file / --clear`, mirroring the existing `issue comment add` input-mode menu.
- **Daemon injection** — claim handler resolves the runtime owner and stamps `requesting_user_name` + `requesting_user_profile_description` on the task. `buildMetaSkillContent` emits `## Requesting User` between `## Agent Identity` and `## Available Commands`, blockquoted and framed as background context. The block is omitted entirely when the description is empty (no token cost when unused).

Brief is written **once per task** via `CLAUDE.md` / `AGENTS.md`, not the per-turn prompt — same path the agent already reads for identity, so no extra per-turn cost.

## Test plan

- [x] `go build ./...`, `go vet ./...`, `go test ./internal/cli/ ./internal/daemon/ ./internal/daemon/execenv/ ./cmd/multica/`
- [x] New brief tests: `TestBuildMetaSkillContentEmitsRequestingUser`, `TestBuildMetaSkillContentOmitsRequestingUserWhenEmpty`
- [x] `pnpm typecheck`, `pnpm lint`, `pnpm test` (74 files, 644 tests pass)
- [ ] Handler DB tests (`TestUpdateMe*`) require a migrated test DB — not runnable in this sandbox
- [ ] Manual: open Settings → Account, set a description, confirm the next daemon-run agent's `CLAUDE.md` shows `## Requesting User`
2026-05-19 19:51:28 +02:00
Jiayuan Zhang
591e47842d refactor(onboarding): remove starter-content kit; unify install-runtime issue across mark-onboarded paths (MUL-2438) (#2884)
* refactor(onboarding): remove starter-content kit, unify install-runtime issue across mark-onboarded paths (MUL-2438)

Drops the post-onboarding ImportStarterContent / DismissStarterContent
flow (handler + routes + StarterContentPrompt + templates + locale
strings + analytics event). The bug — web onboarding seeding 6+ starter
issues without a runtime — only existed through that path; with it gone
the source disappears.

The "install a runtime" issue from BootstrapOnboardingNoRuntime is now
the canonical no-runtime onboarding seed. The title/description and a
LockAndFindActiveDuplicate-deduped seeder move to
handler/no_runtime_issue.go, and CompleteOnboarding / CreateWorkspace /
AcceptInvitation seed it whenever the workspace has no runtime yet, so
every mark-onboarded entry point lands the user on a concrete next
step.

starter_content_state column is kept and continues to be claimed as
'imported' in all five entry points so older desktop builds (which
still render the legacy dialog on NULL) don't surface it to accounts
created after this change.

Co-authored-by: multica-agent <github@multica.ai>

* fix(onboarding): backfill starter_content_state for in-window NULL users (MUL-2438)

054 only covered pre-feature users. Anyone onboarded between then and the
starter-content kit removal could still sit at NULL, and old desktop
clients gate the legacy StarterContentPrompt on `starter_content_state
IS NULL`. The import/dismiss routes are gone, so leaving these rows NULL
would surface a dialog whose buttons 404. Mark them 'imported' to match
the new helper's claim semantics.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: Lambda <lambda@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-19 18:37:48 +02:00
Jiayuan Zhang
c577a29c10 feat(onboarding): v2 per-question questionnaire (source/role/use_case) (#2814)
* feat(onboarding): per-question v2 questionnaire (source/role/use_case)

Replaces the 3-questions-on-one-screen gate with three lightweight,
individually-skippable steps. New step order:

  welcome → source → role → use_case → workspace → runtime → agent → first_issue

- New v2 questionnaire schema: source/role/use_case + per-slot
  `*_skipped` markers. `team_size` removed.
- Click-to-advance card grid with lucide + emoji icons (RFC Option B).
- Skip is a footer text button; Other expands a free-text input.
- Recommendation table updated for new role × use_case vocabulary,
  with use_case-only fallback when role is skipped.
- DB migration v1 → v2 maps existing role/use_case answers and drops
  team_size; historical nulls stay null (not retroactively skipped).
- Re-entry treats skipped slots as fresh; analytics record kept in DB.
- onboarding_questionnaire_submitted event payload updated:
  source replaces team_size, per-slot skip booleans added.

Co-authored-by: multica-agent <github@multica.ai>

* fix(onboarding): tighten question UX (Continue, layout, brand icons)

Address review feedback on Source/Role/Use-case:

- Replace auto-advance with an explicit Continue button so selections
  are reviewable. Continue is disabled until something is picked (and,
  for Other, until the free-text input is non-empty).
- Move Back/Skip/Continue inline under the option grid; drop the
  duplicate Back from the top header — the page now has a single,
  anchored action row.
- Swap the placeholder lucide marks for real brand SVGs on Source:
  Google, X, LinkedIn, YouTube, and an OpenAI mark for the AI-assistant
  option. Generic options stay on lucide.
- Replace the awkward expanded underline input on the Other card with
  an inline borderless input that swaps in for the label slot, so the
  Other state has the same height and weight as the other cards.

E2E smoke test updated to click Continue between question steps.

Co-authored-by: multica-agent <github@multica.ai>

* fix(onboarding): unify step nav, rename Runtime step around "where agents run"

- Refactor the Source/Role/Use case questionnaire steps to use the same
  3-region chrome (header with Back + step indicator, scrolling main,
  sticky footer with Skip + Continue) that Workspace/Runtime/Agent
  already use, so the Back/Skip/Continue affordances stay in the same
  on-screen position across the whole flow.
- Reframe the Runtime step around the user-visible question — "Where
  will your agents run?" — instead of the internal "runtime" concept.
  The aside panel keeps the educational "What's a runtime?" copy for
  users who want to learn.
- Drop the hard-coded "Step 3 · Runtime" eyebrow on the web fork step:
  Runtime is now step 5 of 7 after the per-question split, and the
  step indicator already shows the correct count.

Co-authored-by: multica-agent <github@multica.ai>

* fix(onboarding): tighten Skip/Continue spacing in step footer

Group Skip and Continue inside a sub-flex with gap-2 so they read as a
single action cluster on the right, while the status hint still anchors
left via mr-auto. Applied to both the questionnaire steps and the
runtime step so the footer layout stays consistent across onboarding.

Co-authored-by: multica-agent <github@multica.ai>

* fix(onboarding): move Skip/Continue inline below form, drop sticky footer

The sticky bottom footer left a large dead zone between the form
content and the action buttons — most onboarding steps only fill the
top third of the viewport. Move the hint + Skip + Continue inline,
directly below the form/options grid, so the buttons sit where the eye
already is after picking an option.

Co-authored-by: multica-agent <github@multica.ai>

* fix(onboarding): match Skip button size to Continue (size="lg")

Skip used the default button size (h-8) while Continue used size="lg"
(h-9), so the two adjacent action buttons rendered visibly different
heights. Promote Skip to size="lg" in step-question and
step-runtime-connect so they line up.

Co-authored-by: multica-agent <github@multica.ai>

* fix(onboarding): reframe step 3 as 'connect a computer' / 'pick an agent runtime'

Co-authored-by: multica-agent <github@multica.ai>

* fix(onboarding): replace cloud waitlist with "Coming soon", reword CLI intro

- Web Step 3 cloud card: remove "Join waitlist" CTA + dialog and render a
  static "Coming soon" badge instead. Drops CloudWaitlistDialog, the
  cloud DialogState, waitlistSubmitted local state, and the
  onWaitlistSubmitted prop on StepPlatformFork (desktop's
  StepRuntimeConnect still owns its own waitlist path).
- Tighten cloud_subtitle to drop the "join the waitlist" half now that
  the action is gone.
- cli_install.intro: "AI coding tool" → "agent runtime", EN + zh-Hans.

Tests updated to match: asserts the Coming soon badge is non-actionable
and drops the four cloud-dialog scenarios (now unreachable).

Co-authored-by: multica-agent <github@multica.ai>

* fix(onboarding): refresh button, "agent runtime" wording, coming-soon card

Three fixes on the desktop Step 3 empty state per review:

1. Empty headline + hints now say "agent runtime", matching the
   picker-context terminology established earlier in this PR.
2. Add a Refresh button (header pill in Found, inline with the
   headline in Empty). Desktop wires it to restart the bundled
   daemon so a freshly-installed Claude/Codex/Cursor CLI is picked
   up — the daemon's PATH probe runs once at boot, so without a
   restart the install would only take effect on next launch.
3. "Use a cloud computer" loses the waitlist dialog and renders as
   a disabled "Coming soon" badge, aligning with the web fork.

Co-authored-by: multica-agent <github@multica.ai>

* fix(onboarding): address review follow-ups (i18n, step-order, version, tests)

- runtime-aside-panel: point "Learn more" to /docs/install-agent-runtime,
  branching by language so zh users land on /docs/zh/...
- zh-Hans: unify Cloud "Coming soon" wording to "即将推出"; translate
  step_workspace.preview.more_meta ("and more" -> "等等")
- onboarding-flow: derive forward navigation from ONBOARDING_STEP_ORDER
  via advanceFrom(curr) so inserting/reordering a step only requires
  editing the canonical array; runtime → agent/first_issue branch keeps
  its bespoke routing with a comment explaining why
- onboarding handler: gate questionnaireAnswers.complete() on
  Version == 2 so a future schema bump can't be silently mis-counted
  against v2 funnel semantics
- add unit tests for step-source / step-role / step-use-case (option
  click, Skip patch, Other free-text) and step-question shell
  (canContinue + pendingOther state machine)

Co-authored-by: multica-agent <github@multica.ai>

* fix(onboarding): rename useCaseFallback to fallbackFromUseCase

ESLint's react-hooks/rules-of-hooks treats any function starting with
"use" as a React hook. The helper is a pure switch — give it a name
that doesn't trip the rule.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-19 05:35:18 +02:00
Bohan Jiang
2323b72710 feat(autopilots): webhook delivery layer + idempotency/signature/replay (MUL-2334) [PR1] (#2774)
* feat(autopilots): webhook delivery layer + idempotency / signature / replay (MUL-2334)

Splits "inbound webhook receipt" from "autopilot run creation" so we can
record duplicate attempts, signature outcomes, and ignored/skipped
deliveries — and replay a delivery on demand. v1 ingress wrote straight
into autopilot_run.trigger_payload, which collapsed the two concerns and
left run_only autopilots vulnerable to provider retry storms.

Backend only (PR1). UI Deliveries tab follows in PR2.

Schema (migration 093):
  - autopilot_trigger.provider: 'generic' | 'github' (default 'generic').
  - autopilot_trigger.signing_secret: nullable plaintext (HMAC needs it
    cleartext; mirrors how webhook_token is stored).
  - webhook_delivery: one row per inbound POST. Carries raw_body,
    selected_headers, dedupe_key/source, signature_status,
    autopilot_run_id, replayed_from_delivery_id, response_status / body.
  - Partial unique index on (trigger_id, dedupe_key) excludes NULL and
    'rejected' rows, so a wrong-secret 401 does NOT permanently block a
    future retry with the same X-GitHub-Delivery once the operator fixes
    the secret.

Ingress flow (autopilot_webhook.go), persist-first + sync dispatch:
  1. IP rate limit -> 2. token lookup -> 3. token rate limit ->
  4. read raw body -> 5. autopilot/workspace cross-check ->
  6. normalize JSON (400 without persistence on parse failure) ->
  7. compute dedupe key + signature status ->
  8. INSERT delivery (status=queued). On (trigger_id, dedupe_key)
     unique-violation: bump attempt_count on existing row and return
     the original delivery_id + autopilot_run_id with 200 ->
  9. invalid/missing signature: UPDATE -> rejected, return 401 with
     delivery_id (no dispatch, not replayable) ->
 10. trigger disabled / autopilot paused/archived: UPDATE -> ignored,
     return 200 ->
 11. DispatchAutopilot synchronously, UPDATE -> dispatched/skipped/failed
     with autopilot_run_id and the response body we returned ->
 12. TouchAutopilotTriggerFiredAt and return 200.

No new long-running worker. A stale 'queued' row only happens if the
process dies between INSERT and UPDATE; that's a follow-up sweeper, not
this PR.

Authenticated API:
  - GET    /api/autopilots/{id}/deliveries (slim list)
  - GET    /api/autopilots/{id}/deliveries/{deliveryId} (with raw_body)
  - POST   /api/autopilots/{id}/deliveries/{deliveryId}/replay -> creates
    a new delivery row (replayed_from_delivery_id set), dispatches a
    new run, never collapses onto the original via dedupe.
  - PUT    /api/autopilots/{id}/triggers/{triggerId}/signing-secret
    Write-only; trigger response surfaces has_signing_secret +
    signing_secret_hint (last 4 chars), never the secret itself.

Signature verification reuses the GitHub-compatible
X-Hub-Signature-256: sha256=<hex(hmac(body, secret))> scheme; the
HMAC helper is constant-time. Invalid/missing signatures still count
against per-IP and per-token rate limits.

autopilot_run.trigger_payload is intentionally preserved — delivery
records the HTTP receipt; run records the normalized envelope handed
to the agent. They are two different views.

Tests (Postgres-backed):
  - delivery persistence on accept
  - dedupe via Idempotency-Key and X-GitHub-Delivery; run_only retry
    storm pin (3 retries -> 1 run)
  - invalid signature: 401 + rejected row + no run linkage
  - missing signature when secret configured: 401 + 'missing' state
  - valid signature dispatches
  - signing secret never echoed in trigger responses; hint shows last 4
  - min-length and clear-by-empty for signing secret PUT
  - replay creates a NEW delivery + new run; rejected deliveries cannot
    be replayed
  - list omits raw_body; detail includes it; cross-autopilot ID returns
    404 (workspace isolation defense in depth)
  - provider validation: unknown -> 400, github -> 201 round-trips
  - bad-signature stream still counts against per-token rate limit

Co-authored-by: multica-agent <github@multica.ai>

* fix(autopilots): address PR review on webhook delivery layer (MUL-2334)

- Exclude `failed` from the (trigger_id, dedupe_key) partial unique index
  alongside `rejected`, so a transient ingress failure does not strand the
  provider's stable X-GitHub-Delivery / Idempotency-Key retry. Update the
  dedupe lookup to prefer non-terminal rows under the same predicate.
- Tighten delivery status enum: drop `skipped` from the CHECK constraint
  and from the handler. A run that was admission-skipped (e.g. runtime
  offline) is now recorded as delivery=`dispatched` linked to the
  skipped run, with the response payload carrying status=`skipped`.
  Source of truth for skipped-ness is autopilot_run.status, not the
  delivery row — keeps the Deliveries UI enum unambiguous.
- On dispatch error, link the (possibly non-nil) autopilot_run returned
  by DispatchAutopilot to the failed delivery so Deliveries UI can
  navigate to the run row for debugging.
- Slim list projection: ListWebhookDeliveriesByAutopilot no longer pulls
  raw_body / selected_headers / response_body — a 100-row page × 256 KiB
  would otherwise round-trip ~25 MiB from Postgres per Deliveries reload.
  Detail endpoint continues to return the full row.
- Fix backend CI: TestGetDelivery_ReturnsFullPayload now decodes the
  response and asserts on the parsed raw_body instead of substring-
  matching against an escaped JSON string; raise the test-suite default
  webhook rate limits in TestMain so the shared 192.0.2.1 IP bucket
  doesn't fill across the suite and leak 429s into unrelated tests.
- Add regression coverage for the dedupe-after-failure path.

cd server && go test ./... is green locally.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-18 14:59:40 +08:00
Kerim Incedayi
9418d2a2c1 feat(autopilots): webhook triggers (server + CLI + UI + docs) MUL-2049 (#2348)
* feat(server): add webhook trigger DB migration + sqlc queries

Lays the foundation for webhook autopilot triggers:
- partial unique index on autopilot_trigger.webhook_token (kind=webhook only)
  so the public ingress route can resolve a trigger in O(1)
- GetWebhookTriggerByToken / TouchAutopilotTriggerFiredAt /
  RotateAutopilotTriggerWebhookToken / SetAutopilotTriggerWebhookToken
  queries, regenerated with sqlc

* feat(server): webhook token generator + payload normalizer

Two pure helpers for the webhook autopilot work:
- generateWebhookToken: 32 random bytes -> base64-url, "awt_" prefix.
  256 bits of entropy keeps brute-force off the table; the prefix makes
  leaked tokens recognisable in logs.
- normalizeWebhookPayload: turns arbitrary JSON into the WebhookEnvelope
  shape (event/eventPayload/request) used by trigger_payload. Header- and
  body-based event inference covers GitHub, GitLab, X-Event-Type, and
  caller-provided envelopes; scalar/empty/invalid bodies are rejected so
  the handler can answer 400.

* feat(server): generate webhook tokens and expose rotate endpoint

- New handler.Config.PublicURL fed by MULTICA_PUBLIC_URL env so
  /api/autopilots/.../triggers responses can include an absolute
  webhook_url alongside the always-present webhook_path.
- CreateAutopilotTrigger now mints a webhook_token via crypto/rand
  for kind=webhook and ignores cron/timezone for non-schedule kinds.
  api triggers stay accepted-but-inert per PLAN.md.
- New POST /api/autopilots/{id}/triggers/{triggerId}/rotate-webhook-token
  protected by the existing workspace auth group; old tokens stop
  working immediately because the unique-index lookup keys on the
  current row value.

* feat(server): public webhook ingress route + per-token rate limiter

- New POST /api/webhooks/autopilots/{token} route, mounted outside the
  authenticated group: the path token is the credential. Workspace
  context is derived from the joined autopilot row, never headers.
- Body capped at 256 KiB via http.MaxBytesReader; oversized payloads
  return 413 mid-read instead of being fully buffered.
- Disabled triggers / paused / archived autopilots return
  200 {"status":"ignored"} so providers stop retrying.
- Skipped-runtime dispatches surface 200 {"status":"skipped"} with the
  reason from the autopilot service's pre-flight admission check.
- WebhookRateLimiter interface with sliding-window in-memory + Redis
  Lua-script implementations. Default 60 req/min per token. Test
  coverage on the in-memory path; Redis variant fails open on cache
  errors so a Redis hiccup never blocks ingress.
- Integration tests exercise token generation, dispatch, payload
  envelope persistence, GitHub-header inference, paused/disabled
  short-circuits, oversized rejection, and rotate-then-old-token-404.

* feat(server): include webhook payload in create_issue description

When an autopilot run is triggered by a webhook and execution_mode is
create_issue, the agent only sees the issue body — never the run's
trigger_payload. Append a 'Webhook event:' line and a fenced JSON block
with the normalized eventPayload so the agent has the inbound context
inline. Schedule / manual runs are unchanged.

Tests cover:
  - schedule path keeps existing italic note, no webhook block
  - webhook path emits event line + payload block, italic before block
  - non-envelope JSON falls back to raw body (defensive)
  - non-webhook source with payload still gets no webhook block

* feat(core): types, API client and mutations for webhook triggers

- AutopilotRunStatus gains 'skipped' so the run-list UI handles the
  admission-skipped state explicitly instead of falling through to a
  generic case (the backend already emits it via MUL-1899).
- AutopilotTrigger picks up optional webhook_path / webhook_url. Both
  are optional so older self-hosted servers that pre-date this change
  still parse cleanly.
- buildAutopilotWebhookUrl helper composes a usable absolute URL with
  the priority webhook_url > apiBaseUrl + path > origin + path > path.
  Tested with seven cases covering each branch.
- ApiClient.rotateAutopilotTriggerWebhookToken posts to
  /api/autopilots/{id}/triggers/{triggerId}/rotate-webhook-token; the
  HTTP-contract test pins URL + method.
- useRotateAutopilotTriggerWebhookToken mutation invalidates
  autopilotKeys.detail on settle, mirroring the existing trigger-mutation
  pattern.

* feat(views): webhook trigger UI in Add Trigger dialog and trigger row

Add Trigger dialog gains a Schedule/Webhook segmented toggle:
  - Schedule reuses TriggerConfigSection unchanged.
  - Webhook hides the cron config and shows a help line; the trigger is
    created with kind=webhook and the URL is generated server-side.
  - Toast text differentiates schedule vs webhook on success.

TriggerRow grows a webhook branch:
  - Webhook icon, kind translated via trigger_kind.
  - URL shown in a truncating monospace pill, with copy + rotate
    buttons. Copy uses navigator.clipboard with toast feedback; rotate
    uses an AlertDialog confirm because the old URL stops working
    immediately.
  - api triggers render a Deprecated badge and skip URL/copy/rotate
    affordances.

RunRow gains a 'skipped' RUN_VISUAL entry (muted dash) so admission-
skipped runs don't fall through to a generic case. Source label uses the
new run_source i18n key instead of capitalize.

Locales: en + zh-Hans gain run_status.skipped, run_source.*,
trigger_kind.*, trigger_row.{copy_url,rotate_url,*_confirm_*,toast_*},
add_trigger_dialog.{type_*,webhook_help,toast_added_{schedule,webhook}}.

* feat(cli): support webhook trigger creation and URL rotation

- multica autopilot trigger-add now takes --kind schedule|webhook
  (default schedule for backward compatibility). For webhook it skips
  --cron / --timezone validation and prints the resulting webhook URL,
  preferring the server-provided webhook_url and falling back to
  client.BaseURL + webhook_path.
- New multica autopilot trigger-rotate-url <autopilot-id> <trigger-id>
  command for rotating the bearer URL of a webhook trigger.

* docs(autopilots): add webhook trigger guide (en + zh)

Replaces the 'Webhook and API triggers are not available yet' section
with end-to-end webhook documentation: how the URL is generated, what
payload shapes are accepted, the inferred-event rules, the bearer-secret
warning + rotate flow, status-code semantics for accepted/skipped/
ignored/4xx/5xx outcomes, and the MULTICA_PUBLIC_URL self-host
configuration.

Run history list now mentions skipped status. The 'unavailable
features' section narrows to api-kind triggers, HMAC signing, IP
allowlists, and provider presets.

* feat(views): add Schedule/Webhook toggle to the create autopilot dialog

Closes the gap where a brand-new autopilot could only be created with a
schedule trigger. The right-column config now has a Trigger section
with a segmented Schedule/Webhook control:
  - Schedule keeps the existing cron/timezone UI.
  - Webhook hides the cron UI and shows a help line; on submit, a
    kind=webhook trigger is created right after the autopilot.

In edit mode the toggle is intentionally hidden (PLAN.md treats trigger-
type changes as delete-old + create-new, not in-place updates), but the
panel still picks the right kind based on props.triggers[0].kind so a
webhook autopilot doesn't render an irrelevant cron form.

Locales: section_trigger_kind, trigger_kind_{schedule,webhook},
section_webhook, webhook_help_{create,edit} added in en + zh-Hans.

* feat(views): show webhook URL inline after creating a webhook autopilot

After a successful create with kind=webhook, the dialog stays open and
swaps to a confirmation panel showing the freshly minted URL with a
copy button + 'Treat this URL like a password' warning + Done button.
Avoids the friction of "create the autopilot, then go find it in the
list, click in, scroll to triggers, copy URL."

Locales: dialog.webhook_created_{title,description,warning,done} added
in en + zh-Hans.

Schedule create flow is unchanged (toast + close). The success panel is
gated on the trigger returned from the create mutation, so a partial
failure (autopilot created, trigger creation errored) still falls
through to the toast_create_partial path.

* feat(views): show webhook payload in run detail dialog

The agent transcript dialog now accepts an optional headerSlot that
sits above the event list. The autopilot RunRow drops a
WebhookPayloadPreview into that slot when the run came from a webhook
and trigger_payload is non-empty.

The preview is collapsed by default (the transcript itself is the main
event), shows the inferred event name + receivedAt in the header, and
reveals the eventPayload as pretty-printed JSON with a copy button on
expand. Falls back gracefully if the row's trigger_payload doesn't
match the WebhookEnvelope shape — the whole value is shown instead so
nothing is hidden.

Closes the "agent didn't echo the payload, now I can't see what
triggered the run" gap. PLAN.md tracked this as
"Payload preview in run history" under follow-ups.

Locales: webhook_payload.{label, unknown_event, payload, content_type,
copy, copied, copied_short, copy_failed} added in en + zh-Hans.

* chore(server): wire MULTICA_PUBLIC_URL through self-host compose

Two small follow-ups split out of the webhook trigger PR:

- docker-compose.selfhost.yml passes MULTICA_PUBLIC_URL into the
  backend container so a self-hosted deployment behind a real domain
  gets absolute webhook URLs in the trigger response. Documented in
  .env.example with the rationale for not deriving the public host
  from request headers.
- Drop a duplicated 'invalid json:' prefix in the webhook ingress
  400 error path. normalizeWebhookPayload already prefixes its
  errors, so the handler doesn't need to re-prefix.

* fix(migrations): renumber webhook trigger migration 081 → 089 to avoid collision

The branch's 081_autopilot_webhook_triggers.{up,down}.sql collided
numerically with 081_runtime_timezone.{up,down}.sql that landed on
main, making migration apply order undefined. Renumber to 089 so the
file slots after the latest main migration (088_squad_instructions).

The SQL itself doesn't conflict — it only creates a partial unique
index on autopilot_trigger.webhook_token — but the duplicate prefix
is what the migration runner sees, so the filename must move.

* fix(autopilot-webhook): address PR review blocking issues

- Redact bearer tokens from request logs: paths matching
  /api/webhooks/autopilots/<token> now log "[redacted]" instead of the
  token. The resolved trigger ID is plumbed via context so audit lines
  stay useful for debugging. (Review item Blocking #1.)
- Distinguish pgx.ErrNoRows from transient DB errors in token lookup:
  no-row stays 404 (so providers don't retry on a deleted webhook),
  other errors return 500 (which providers DO retry, avoiding silent
  drops on DB blips). (Review item Blocking #2.)
- Add per-IP sliding-window rate limiter that runs BEFORE the token
  lookup, so spraying random tokens can no longer probe the
  autopilot_trigger index unboundedly. Reuses the existing Lua script
  with a separate Redis key namespace; falls open on Redis errors.
  Default budget 30 req/min/IP. (Review item Blocking #3.)

The webhook handler now applies the gates in the order: per-IP rate
limit → token lookup → per-token rate limit → handler logic.

* fix(autopilot): atomic webhook trigger creation + strict kind/timezone validation

- Mint the webhook bearer token BEFORE the INSERT and pass it via
  CreateAutopilotTriggerParams so the row never exists in a half-written
  kind=webhook + webhook_token=NULL state. On the (vanishingly rare)
  unique-index collision the whole INSERT is retried with a fresh token
  — no UPDATE second step. Removes the now-dead attachFreshWebhookToken
  helper. (Review item Recommended #4.)
- Add new GET /api/autopilots/{id}/runs/{runId} endpoint that returns a
  single run including the full trigger_payload. The list response is
  now slim (omits trigger_payload) so worst-case payload size drops
  from ~5 MB to ~5 KB. (Review item Recommended #5, server side.)
- Reject kind=api with 400 ("kind=api is deprecated; use schedule or
  webhook") and reject kind=webhook with --timezone with 400 — both
  surfaces stragglers loudly instead of silently dropping fields.
  CLI mirrors the check so --timezone with --kind webhook errors
  client-side. (Review nits.)
- Add --yes (-y) flag and an interactive y/N confirmation prompt to
  `multica autopilot trigger-rotate-url` so the destructive rotate
  matches the UI's AlertDialog safety. (Review item Recommended #6.)

* fix(views): fetch webhook payload on-demand and truncate at 4 KiB

- Add useAutopilotRun query hook + getAutopilotRun API client method
  paired with the new server endpoint. The run-detail dialog now mounts
  a WebhookPayloadSlot that fetches the full run (incl. trigger_payload)
  lazily — list responses no longer carry up to 256 KiB × N runs of
  envelope data.
- WebhookPayloadPreview truncates its in-DOM <pre> at 4 KiB with a
  localized marker so jank-y machines aren't asked to render a 256 KiB
  JSON blob. The Copy button still yields the full string.
- Adds the truncated_marker i18n string to en + zh-Hans.

Review items Recommended #5 (frontend) and a nit on the preview's
unbounded <pre>.

* test(autopilot-webhook): close coverage gaps flagged in PR review

- request_logger: redactWebhookPath unit tests + integration test
  proving the bearer token never lands in slog output, plus the
  webhook_trigger_id context plumbing.
- autopilot_webhook_handler: empty body → 400, archived autopilot →
  200 ignored, per-IP rate limiter trips before DB lookup, kind=api
  and webhook+timezone are rejected at 400, slim list + full detail
  endpoint round-trip.
- webhook_rate_limiter: Lua script structure guard (catches reordering
  even without a live Redis), plus live-Redis tests for both per-token
  and per-IP limiters (REDIS_TEST_URL gated, matching the existing
  Redis test pattern in the package).
- WebhookPayloadPreview: envelope rendering, fallback shape, and the
  >4 KiB truncation path with full-payload-on-Copy guarantee.

Two branches are documented as code-review-protected rather than
covered by tests: the 500-on-DB-error path requires injecting a stub
Queries (no interface here), and the cross-workspace defense-in-depth
check is unreachable from valid SQL state.

* fix(middleware): SetWebhookTriggerID must mutate request in place

The round-1 helper returned a fresh *http.Request from WithContext, and
the webhook handler did `r = SetWebhookTriggerID(r, ...)`. That swaps
the handler's local pointer but doesn't propagate the new context back
to RequestLogger, which is still holding the original *http.Request —
so the audit line never actually included webhook_trigger_id in
production. The round-1 test happened to pass because it pre-stashed
the value on the request before calling ServeHTTP, bypassing the bug
it was meant to verify.

Switch to in-place mutation via `*r = *r.WithContext(...)` so the
wrapping middleware sees the new context after next.ServeHTTP returns,
and update the test to exercise the real call pattern (set the context
from inside the handler, assert the surrounding logger reads it).

Verified live: an accepted webhook now logs
  path=/api/webhooks/autopilots/[redacted] webhook_trigger_id=<uuid>

* fix(autopilot-webhook): symmetric ErrNoRows split + trusted-proxy gate

Round-2 review (Bohan-J, PR #2348 follow-up):

- Must-fix #1: the second lookup at autopilot_webhook.go:258
  (GetAutopilot after the token resolves) was folding every error into
  404. A transient DB blip would tell a webhook sender "not found" and
  it would never retry. Apply the same errors.Is(err, pgx.ErrNoRows)
  → 404 / else → 500 split as the first lookup got in round 1.

- Must-fix #2: clientIPForRateLimit was honoring X-Forwarded-For /
  X-Real-IP from any caller. An attacker spraying random tokens could
  just rotate the XFF header and the per-IP bucket became per-request,
  so the limiter that's specifically supposed to gate spraying before
  it hits the DB unique index was bypassed.

  New shape — matches Bohan's suggestion exactly:
  * Default: r.RemoteAddr only, headers ignored.
  * Operator opt-in via MULTICA_TRUSTED_PROXIES (comma-separated
    CIDRs). XFF/X-Real-IP are honored only when r.RemoteAddr is
    inside one of the listed prefixes; otherwise they're dropped.

  Wired through .env.example and docker-compose.selfhost.yml so
  self-host operators can configure their reverse-proxy's CIDR.
  Invalid CIDRs in the env var are dropped with a single slog.Warn at
  startup rather than crashing the server. Uses net/netip (stdlib,
  value-typed) for parsing and containment checks.

Verified live on the rebuilt self-host backend: a 35-request spray
from one source with rotating XFF gets the expected 30× 404 + 5× 429,
proving the per-IP bucket is keyed on the real connection IP.

* fix(autopilot): reject cron/timezone PATCH on non-schedule triggers

Round-2 review should-fix. CreateAutopilotTrigger already 400s on
kind=webhook + timezone/cron_expression, but UpdateAutopilotTrigger
silently wrote those fields regardless of prev.Kind. The values then
sat in the DB visible to nobody and read by nothing — a back door that
left the API contract fuzzy across create vs update.

Mirror the create-path discipline: after loading prev, if prev.Kind
!= "schedule" and the PATCH body sets cron_expression or timezone,
return 400 with a clear message. enabled and label remain accepted on
every kind.

The existing prev.Kind == "schedule" guard on next_run_at recompute
stays as belt-and-braces, but with this gate in place the recompute
branch is now reachable only for the kind it was meant for.

* test(autopilot-webhook): close round-2 coverage gaps

- IPRateLimitNotBypassedByXFFSpoof: drives the must-fix #2 invariant
  by rotating XFF across three calls from the same RemoteAddr and
  asserting the third gets 429. Pre-round-2 this test would have
  passed for the wrong reason (limiter trusted XFF, so per-bucket
  collision was incidental); now it pins the bypass-closed property.
- IPRateLimitReturns429BeforeDBLookup: updated to set RemoteAddr
  explicitly and drop the XFF header it was leaning on. With
  TrustedProxies empty (test default) the limiter keys on the real
  connection IP, which is what the test wants to assert anyway.
- UpdateAutopilotTrigger_RejectsCronExpressionOnWebhookKind +
  UpdateAutopilotTrigger_RejectsTimezoneOnWebhookKind: drive the
  round-2 should-fix from the handler boundary.
- UpdateAutopilotTrigger_AcceptsEnabledAndLabelOnWebhookKind: counter
  test so a regression to a blanket reject is caught.

* fix(migrations): bump webhook trigger migration 089 → 091

origin/main added 089_squad_no_action_activity_index (and 090_task_is_leader)
since our last rebase, re-colliding with our 089_autopilot_webhook_triggers.
Bump to 091 so the filename ordering is unambiguous again. The SQL is
unchanged — same partial unique index on autopilot_trigger.webhook_token —
only the filename moves.

* fix(views): dedupe skipped icon in autopilot RUN_VISUAL after rebase

The rebase against origin/main merged main's add of `Ban` for the
skipped status next to our round-1 `MinusCircle` entry, leaving the
RUN_VISUAL map with two `skipped` keys (only the last would have been
read at runtime, and MinusCircle had been dropped from the imports
during conflict resolution — so the file would not compile).

Keep main's `Ban` icon (latest design) and a single `skipped` entry.
Carry over the round-1 comment about why the muted styling matters
for failure-ratio readability.

---------

Co-authored-by: Kerim Incedayi <kerim.incedayi@digitalchargingsolutions.com>
2026-05-18 12:17:39 +08:00
Bohan Jiang
3645bdb5b6 feat(issues): add start_date field with progressive disclosure (MUL-2274) (#2696)
* feat(issues): add start_date field with progressive disclosure (MUL-2274)

Mirrors the existing due_date implementation end-to-end so an issue can
express a planned start in addition to a deadline. Surfaces start_date as
an optional sidebar property alongside priority / due_date / labels (added
in MUL-2275), with consistent picker, board/list/sort, activity, and inbox
plumbing.

Backs the Project Gantt work (parent MUL-1881) and keeps the
progressive-disclosure attribute experience consistent.

- DB: migration 091 adds issue.start_date TIMESTAMPTZ.
- sqlc: ListIssues / CreateIssue / UpdateIssue / CreateIssueWithOrigin /
  ListOpenIssues read & write start_date.
- Backend: IssueResponse + create/update/batch-update handlers parse and
  emit start_date with RFC3339 validation; new start_date_changed activity
  event + subscriber notification (with prev_start_date in event payload).
- CLI: --start-date flag on `multica issue create` / `issue update`.
- Frontend: StartDatePicker component, start_date wired into Issue type,
  Zod schema, draft / view stores, sort util, header sort + card-property
  options, list-row / board-card display, create-issue modal, and the
  issue-detail progressive-disclosure "+ Add property" surface (visibility
  rule, picker row, add-property menu icon + label).
- i18n: en + zh-Hans for sort_start_date / card_start_date /
  prop_start_date / activity start_date_set / start_date_removed /
  picker start_date.trigger_label / clear_action / inbox labels.
- Tests: new TestNotification_StartDateChanged; existing Issue / draft /
  modal fixtures extended with start_date.

Co-authored-by: multica-agent <github@multica.ai>

* feat(issues): align start_date with due_date in actions menu and CLI table

- Add Start Date submenu (today / tomorrow / next week / clear) in
  actions menu, mirroring Due Date — parity with the Due Date quick
  setters in list/board context and 3-dot menus.
- Add corresponding en / zh-Hans i18n keys
  (actions.start_date / start_today / start_tomorrow / start_next_week
  / start_clear).
- CLI human table for `multica issue list` and `multica issue get`
  now shows a START DATE column next to DUE DATE; --full-id variant
  too.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-17 15:01:38 +08:00
Jiayuan Zhang
668cab6022 feat(github): mirror PR CI checks and merge conflict status (MUL-2228) (#2632)
* feat(github): mirror PR CI checks and merge conflict status (MUL-2228)

Surface "checks passed/failed" and "conflicts/no conflicts" badges under
each linked PR on the issue page so users can judge readiness without
flipping over to GitHub. CI state is fed by check_suite webhooks
(GitHub Actions + apps using the Checks API; legacy status events are
out of scope for MVP); conflicts are read from pull_request.mergeable_state.

Data model:
  * github_pull_request: add head_sha + mergeable_state
  * github_pull_request_check_suite: per-suite rows keyed by (pr_id, suite_id)
  * Aggregation done at query time, filtering by current head_sha so
    late-arriving suites for a stale head can't contaminate the new head's
    pending view; per-app latest suite chosen first so a single app firing
    multiple suites isn't counted N times.

Webhook hardening:
  * synchronize/opened/reopened/edited(base) explicitly clear mergeable_state
  * single-row ordering protection on the check_suite upsert prevents a
    late-delivered older event from overwriting a newer one
  * check_suite.pull_requests is iterated; unknown PRs are logged and dropped

UI:
  * PR row shows Checks + Conflicts badges; opaque mergeable values
    (blocked/behind/unstable/...) render as no badge, not as conflicts.
  * Terminal PR states (merged/closed) suppress the status row entirely.

Tests: * Pure unit coverage for derivePRMergeableState + aggregateChecksConclusion
  * Webhook integration tests: multi-app aggregation, old-head ignore,
    late-older-event ignore, synchronize clears mergeable_state
  * Vitest coverage for pull-request-list badge rendering across CI/conflict
    combinations and the legacy (null) fallback.
Co-authored-by: multica-agent <github@multica.ai>

* fix(github): scope check_suite PR lookup; preserve mergeable on metadata

Addresses code review on PR #2632.

1. check_suite handler now resolves the PR through the workspace-scoped
   GetGitHubPullRequest query instead of GetGitHubPullRequestByRepoNumber.
   The (workspace_id, repo_owner, repo_name, pr_number) tuple is the real
   uniqueness key, so a bare (owner, repo, number) lookup could return a
   stale row from another workspace and either land the suite on the wrong
   PR or skip the right one when the installation ids drifted. The old
   unscoped query is removed.

2. derivePRMergeableState now returns (value, clear) and the upsert SQL
   distinguishes three cases: state-changing actions clear the column to
   NULL, non-empty payloads write the value, and metadata events with an
   empty payload preserve the existing column. Previously every empty
   payload became NULL, so a labeled/assigned event silently wiped a
   known clean/dirty verdict in violation of the RFC's "metadata empty
   payload preserves" rule.

3. ListPullRequestsByIssue narrows to the issue's PR ids before running
   the per-app check_suite aggregation, avoiding a full-table scan over
   github_pull_request_check_suite when only a handful of rows belong to
   the requested issue.

New helper test covers labeled+empty preserves; new integration test
verifies a metadata event after a known mergeable_state keeps the value.

Co-authored-by: multica-agent <github@multica.ai>

* feat(github): PR card layout v3 increment — stats + segmented progress bar

Replaces the row + badge layout under "Pull requests" on the issue
detail sidebar with a card that mirrors the GitHub PR summary look:
title, author/avatar, +N −M · K files diff stats, segmented progress
bar (failed → pending → passed, failure leftmost), and a one-line
status caption following an explicit priority pass-through.

Backend
- Migration 092: github_pull_request adds additions / deletions /
  changed_files (INT NOT NULL DEFAULT 0). Zero defaults are what the
  new frontend treats as "legacy backend — hide the stats row" so old
  PR rows that pre-date this migration don't render "+0 −0 · 0 files".
- pull_request webhook handler reads stats off the top-level payload.
- ListPullRequestsByIssue now surfaces per-suite counts
  (checks_passed / failed / pending) alongside the existing aggregate
  conclusion, so the segmented bar reuses the already-computed counts
  with no new aggregation.

Frontend (packages)
- core/github/pull-request-status.{ts,test.ts}: pure-function module
  for the status-kind priority table and the segment derivation; 15
  cases covered, includes the "all-zero → hide stats" guard.
- views/issues/components/pull-request-list.tsx: PullRequestCard plus
  a compact-row fallback used when count > 4 (first 3 as cards, the
  remainder collapsed behind a Show more toggle).
- i18n: new `pull_request_card_*` keys in en + zh-Hans.

Tests
- 12 component tests covering each rule of the priority table, the
  legacy-zero stats fallback, and the collapse threshold.
- Reuse of the v3 webhook handler tests confirmed.

Verification
- pnpm typecheck + pnpm test green (60 test files, 536 tests).
- go build ./... + go vet ./... clean.
- 6 demo issues (DEV-2..DEV-7) screenshotted via Playwright; see the
  PR comments for the visual check matrix.

Co-authored-by: multica-agent <github@multica.ai>

* fix(views): collapse PR cards at N>=4, not N>4

The card-vs-collapse threshold used `>` so 4 PRs slipped past it and
all rendered as full cards, contrary to RFC v3 (N >= 4 collapses to
3 cards + compact tail). Switch to `>=` and update the threshold-
boundary test to expect "Show 1 more".

Co-authored-by: multica-agent <github@multica.ai>

* fix(views): align PR sidebar rows with existing list style

Co-authored-by: multica-agent <github@multica.ai>

* fix(views): hide terminal PR status badges

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-16 21:26:30 +02:00
LinYushen
319b23eb39 Revert "feat(task): add claim lease mechanism (Phase 2, MUL-2246) (#2660)" (#2674)
This reverts commit 3137feecdf.
2026-05-15 16:07:23 +08:00
LinYushen
3137feecdf feat(task): add claim lease mechanism (Phase 2, MUL-2246) (#2660)
Add claim_token + claim_expires_at columns to agent_task_queue and three
new SQL queries for the claim lease protocol:

- ClaimAgentTaskWithLease: generates a UUID token and sets a lease expiry
  when claiming a task, so the daemon must prove it received the response
- StartAgentTaskWithClaimToken: validates the token on StartTask, preventing
  stale daemons from starting requeued tasks
- RequeueExpiredClaimLeases: moves dispatched tasks with expired leases back
  to queued for re-claim

This closes the reliability gap where a claim response lost in transit
leaves a task stuck in dispatched until the 60s dispatch timeout fires.

Co-authored-by: multica-agent <github@multica.ai>
2026-05-15 15:14:05 +08:00
Jiayuan Zhang
4d6b5ad06f fix(squad): wake leader when dual-role agent posts as worker (MUL-2218) (#2626)
* fix(squad): wake leader when dual-role agent posts as worker (MUL-2218)

The squad-leader self-trigger guard skipped a comment whenever the
author equalled the squad's leader id, regardless of the role the agent
was acting in. For an agent that holds both leader and worker roles in
the same squad, this meant the leader role never reacted to its own
worker output and the issue stalled.

Tag each enqueued task with is_leader_task and consult the agent's
most recent task on the issue from both self-trigger guards (comment
path + @squad mention path) — skip only when that task was itself a
leader task.

Co-authored-by: multica-agent <github@multica.ai>

* fix(squad): inherit is_leader_task on retry task clone (MUL-2218)

CreateRetryTask cloned a parent task into a fresh queued attempt but
omitted is_leader_task from the column list, so the child silently fell
back to the column default (false). For a leader task that hit auto-retry
through MaybeRetryFailedTask, the retried task posed as a worker task —
the self-trigger guard then no longer recognised the leader's own
comments, re-opening the very loop MUL-2218 closes.

Inherit p.is_leader_task in the clone and add a query-level test that
covers both leader and worker retries.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-14 15:23:36 +02:00
LinYushen
0cb759b446 fix(squad): suppress no-action leader comments (#2583) 2026-05-14 14:07:26 +08:00
LinYushen
29082f7cfe feat: implement Squad feature MVP (#2505)
* feat: implement Squad feature MVP

- Add migration 084_squad: squad, squad_member, squad_activity_log tables
- Extend issue.assignee_type to support 'squad'
- Add sqlc queries for squad CRUD, member management, activity logs
- Add Go handler with full Squad API (CRUD, members, activity log)
- Register routes: /api/squads/*, /api/issues/{id}/squad-activity, /api/squad-activity
- Add Squad trigger logic:
  - Assign Squad immediately triggers leader
  - Every external comment on squad-assigned issue triggers leader
  - Anti-loop: squad members' comments don't trigger leader
  - Dedup: skip if leader already has pending task
- Add squad activity log API (方案 B) for leader no-op recording
- Add frontend TypeScript types (Squad, SquadMember, SquadActivityLog)
- Add protocol events: squad:created, squad:updated, squad:deleted

Co-authored-by: multica-agent <github@multica.ai>

* fix: address PR review blocking issues

1. validateAssigneePair now accepts 'squad' assignee_type
2. All squad endpoints validate workspace ownership via GetSquadInWorkspace
3. CreateSquadActivityLog restricted to squad leader agent only
4. AddSquadMember validates member exists in workspace
5. UpdateSquad auto-adds new leader to squad members
6. DeleteSquad transfers assigned issues to leader before deletion
7. IssueAssigneeType includes 'squad' in frontend types

Co-authored-by: multica-agent <github@multica.ai>

* feat: soft-delete squads via archive instead of hard delete

- Add migration 085: archived_at + archived_by columns on squad table
- ListSquads now excludes archived squads (ListAllSquads for admin)
- DeleteSquad → ArchiveSquad (sets archived_at, preserves all records)
- Transfer squad-assigned issues to leader before archiving
- SquadResponse includes archived_at/archived_by fields
- Frontend Squad type updated with nullable archived fields

Co-authored-by: multica-agent <github@multica.ai>

* feat: re-add Squads frontend entry (sidebar nav + pages)

Re-applies the frontend squad entry that was lost during a merge:
- Sidebar nav: Squads item with Users icon
- Paths: squads() and squadDetail() in workspace paths
- Routes: /squads and /squads/[id] pages
- Views: SquadsPage (list) and SquadDetailPage
- i18n: en 'Squads' / zh '小队'
- Reserved slug: 'squads'

Co-authored-by: multica-agent <github@multica.ai>

* fix: fix SquadsPage rendering - use PageHeader children pattern

PageHeader takes children, not title/actions props. The incorrect
usage caused a React rendering error. Now matches the pattern used
by autopilots and agents pages.

Co-authored-by: multica-agent <github@multica.ai>

* fix(squads): add API client methods and package export for squads pages

* feat: complete Squad frontend - create dialog, member management, API methods

- Add CreateSquadModal with name/description/leader selection
- Register 'create-squad' in modal registry
- Wire 'New Squad' button to open the modal
- Add full API client methods: createSquad, updateSquad, deleteSquad,
  addSquadMember, removeSquadMember
- Rewrite SquadDetailPage with:
  - Member list showing resolved names
  - Add/remove member UI
  - Archive squad button
  - Back navigation to squads list

Co-authored-by: multica-agent <github@multica.ai>

* feat: improve Squad UI - match create agent dialog style

- CreateSquadModal: proper Dialog with Header/Description/Footer,
  agent picker with avatars, textarea for description
- SquadDetailPage: centered max-w-2xl layout, ActorAvatar for members,
  Crown badge for leader, textarea for member description,
  improved spacing and visual hierarchy
- Renamed 'role' field label to 'Description' in add member form
  (describes the member's responsibilities in the squad)

Co-authored-by: multica-agent <github@multica.ai>

* feat(squad): add avatar, instructions; drop unique-name constraint

- 086: add squad.avatar_url
- 087: drop unique constraint on squad.name (squads with the same
  name are legitimate across teams; uniqueness was an accidental
  product constraint)
- 088: add squad.instructions (text, default '')
- UpdateSquad now COALESCEs avatar_url + instructions
- handler exposes Instructions in SquadResponse and accepts it in
  UpdateSquad

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(squad): assignable + mention target; trigger leader on assign

- assignee picker and @mention suggestion list squads alongside
  agents and members; renders squad avatar/icon
- creating or updating an issue with assignee_type=squad enqueues
  a task for the squad's current leader (mirrors agent-assignee
  parking-lot rule: skip backlog only)
- workspace queries/hooks expose squads where needed for the
  pickers
- locales updated for new picker copy

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(squad): agent-style detail page with members + instructions tabs

- restructure squad detail page to mirror the agent detail page:
  320px inspector (creator, leader, created/updated) + tabbed
  pane (Members | Instructions) with dirty-guard AlertDialog
- inline name + avatar editing on the inspector
- inline description editor (modal textarea)
- members tab: leader + member picker with role descriptions,
  swap leader, edit member roles, remove
- instructions tab: ContentEditor + Save (mirrors agent pattern)
- squads list shows the squad avatar/icon
- core types + api.updateSquad accept avatar_url + instructions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(squad): inject leader briefing on claim (protocol + roster + instructions)

When a squad's leader agent claims a task on a squad-assigned issue,
append a system-level briefing to the agent's Instructions composed of:

1. Squad Operating Protocol — hard-coded rules: leader is a
   coordinator, dispatch via @mention, stop after dispatching,
   resume on re-trigger, do not work outside the roster.
2. Squad Roster — leader self-row plus one row per non-archived
   member with a literal mention markdown string ([@Name](mention://
   agent|member/<UUID>)) the leader can paste verbatim. Round-trips
   through util.ParseMentions, enforced by a contract test.
3. Squad Instructions — the user-defined squad.instructions block,
   omitted entirely when empty so we do not leave a dangling heading.

Non-leader members claiming the same issue receive no briefing.

Tests cover: full squad with mixed agent/human members, lone leader,
archived agents skipped, empty user instructions, mention round-trip,
and the leader/non-leader claim-handler gate.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(squad): tell leader not to restate issue context in dispatch comment

After observing leaders padding their delegation comments with full
re-summaries of the issue body and prior discussion, make the
Operating Protocol explicit:

- assignees on Multica already have the full issue (title,
  description, all comments, attachments) and workspace context;
- delegation comments should add only what cannot be inferred
  (who is picked, why, extra constraints), aim for two or three
  sentences;
- restating context is now an explicit hard rule violation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(squad): unify leader evaluation into activity_log, add CLI command

- Squad member comments now trigger leader (only leader self-excluded)
- Replace squad_activity_log with activity_log (action: squad_leader_evaluated)
- Add CLI: multica squad activity <issue-id> <outcome> --reason
- Add API: POST /api/issues/{id}/squad-evaluated
- Update squad operating protocol to require evaluation recording
- Remove squad_activity_log table from schema and generated code

* feat(cli): add squad list, get, member list commands

* fix(squad): address review findings (P1+P2)

P1 fixes:
- Add 'squads' to reserved_slugs.json (source of truth)
- Add 'create-squad' to ModalType union
- Remove unused leaderOpen/selectedLeader in create-squad modal
- Replace literal JSX strings with i18n selectors (en + zh-Hans)

P2 fixes:
- Add 'squad' to mention regex (MentionRe)
- Fix human member lookup in squad briefing (use GetUser directly)
- Add squads routes to desktop app
- Add squad:created/updated/deleted to WSEventType + invalidation
- Reject archived squads as issue assignees

* fix(squad): restore zh-Hans key, publish activity event, invalidate issues on archive

- Restore create_project.title in zh-Hans modals.json (dropped by prior edit)
- Publish activity:created WS event after squad leader evaluation
- Invalidate issue queries on squad:deleted (archive transfers assignees)
- Add creator info to squad list cards

* fix(squad): realtime sync, rerun support, leader validation

- Use workspaceKeys.squads prefix for detail/member queries (realtime invalidation)
- Publish squad:updated after add/remove/role-change member mutations
- Support rerun for squad-assigned issues (targets leader agent)
- Reject assignment to squads whose leader is archived

---------

Co-authored-by: multica-agent <github@multica.ai>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-13 18:46:20 +08:00
Bohan Jiang
96695a79c5 feat(dashboard): workspace/project token + run-time dashboard MUL-1882 (#2462)
* feat(dashboard): workspace/project token + run-time dashboard

Add a `/{slug}/dashboard` page showing per-agent token spend and execution
time across the whole workspace, with an optional project filter.

Backend:
  - Three new sqlc queries against task_usage + agent_task_queue: daily
    usage, per-agent usage, per-agent total run-time. All optionally
    scoped to a project via sqlc.narg('project_id'), reaching project
    through the issue join.
  - Handlers under /api/dashboard return the same wire shape the runtime
    page already consumes (model preserved for client-side cost math).

Frontend: - Shared DashboardPage in packages/views/dashboard reusing KpiCard,
    DailyCostChart, ActorAvatar, and estimateCost from the runtime page
    so the visual style and pricing math stay in lock-step.
  - Period selector (7/30/90d), project dropdown, four KPI tiles
    (cost, tokens, run time, tasks), daily cost chart, and a combined
    "cost + run time by agent" list.
  - Routed in both web (app/[slug]/(dashboard)/dashboard) and desktop
    (memory router); sidebar nav entry added under Workspace group.
Co-authored-by: multica-agent <github@multica.ai>

* fix(dashboard): drop stale project filter and stop double-counting tasks

Two issues caught in PR #2462 review:

1. Project filter held the previous selection's UUID across workspace
   switches and project deletions: the dropdown gracefully showed
   "All projects" (because the title lookup missed) while the three
   dashboard queries kept forwarding the dead UUID, leaving the UI
   looking like a full-workspace view but populated with empty
   project-scoped data. Validate the picked UUID against the current
   projects list before passing it to the queries.

2. The "by agent" table read its task count from the token rollup,
   which is grouped per (agent, model). A single task that spans two
   models lands twice and the agent's row reads e.g. "2 tasks" when
   the real count is 1. Prefer `ListDashboardAgentRunTime`'s per-agent
   distinct count when available; fall back to the token aggregate
   only for agents with no terminal run yet (in-flight tasks).

Extract the merge into `mergeAgentDashboardRows` so the precedence
rules are unit-tested directly.

Co-authored-by: multica-agent <github@multica.ai>

* test(dashboard): allocate per-workspace issue.number explicitly

TestDashboardEndpoints creates two issues in the shared fixture
workspace. issue.number defaults to 0 (migration 020), and the table
carries UNIQUE (workspace_id, number), so the second insert raced the
first on the same default and failed in CI.

Allocate MAX(number) + 1 per insert so each row gets a fresh number
without stepping on rows other tests left behind in the same workspace.

Co-authored-by: multica-agent <github@multica.ai>

* feat(dashboard): rollup table + cron-driven aggregation for dashboard

Mirror the per-runtime rollup in `task_usage_daily` (migrations 073/077/082)
to remove the per-request raw aggregation the dashboard was doing.

Migration 084 adds:
  - `task_usage_dashboard_daily` keyed on
    (bucket_date, workspace_id, agent_id, project_id, model) — the
    dimensions the dashboard actually queries, with project_id nullable
    via UNIQUE NULLS NOT DISTINCT (PG15+) so "no-project" buckets
    upsert cleanly.
  - `task_usage_dashboard_rollup_state` watermark table.
  - `task_usage_dashboard_dirty` invalidation queue.
  - Triggers on agent_task_queue DELETE, task_usage DELETE, and
    issue.project_id UPDATE — the cases the updated_at watermark can't
    see. The project_id trigger re-attributes existing rollup rows when
    a user moves an issue across projects.
  - `rollup_task_usage_dashboard_daily_window(from, to)` —
    idempotent recompute primitive (same shape as 077).
  - `rollup_task_usage_dashboard_daily()` cron entry — own advisory
    lock (4244) so it serialises independently of the runtime rollup.
  - `task_usage_dashboard_rollup_lag_seconds()` health helper.

Sqlc queries `ListDashboardUsageDailyRollup` /
`ListDashboardUsageByAgentRollup` read from the new table; the handler
dispatches between rollup and raw on a separate
`UseDailyRollupForDashboard` config flag
(`USAGE_DASHBOARD_ROLLUP_ENABLED` env). Same fail-safe default (false →
raw) so operators can roll out independently of the per-runtime flag.

Bucket date is UTC (the dashboard aggregates across runtimes that may
sit in different tzs; there's no single correct local boundary).

Adds `cmd/backfill_task_usage_dashboard_daily` mirroring the existing
per-runtime backfill — operator runs it once before flipping the flag.

Tests: - TestDashboardEndpoints now also exercises the rollup read path
    (raw vs. rollup, same project-scoped totals).
  - TestDashboardRollupReattributesOnProjectChange verifies the
    issue.project_id trigger enqueues both old + new buckets and the
    next rollup tick zeroes the old project + populates the new one.
Co-authored-by: multica-agent <github@multica.ai>

* fix(dashboard-rollup): close two invalidation gaps

Two leak paths missed by migration 084 review:

1. Issue cascade DELETE — the atq BEFORE DELETE trigger runs AFTER the
   issue row is gone, so `LEFT JOIN issue` returns NULL project_id and
   the original-project bucket never gets cleared (issue 077 calls this
   out for the runtime rollup but didn't need to act on it). Adds an
   `issue BEFORE DELETE` trigger that enqueues using OLD.project_id
   while the issue row is still readable.

2. `LinkTaskToIssue` (quick-create task attaching to a real issue post-
   completion) UPDATEs `agent_task_queue.issue_id` from NULL to a real
   id. Migration 084 only watched DELETE on atq, so usage already
   rolled up under the no-project bucket stayed attributed to NULL
   forever. Extends the atq trigger to fire on UPDATE OF issue_id too,
   enqueueing both OLD (NULL project) and NEW (linked issue's project).

Tests: - TestDashboardRollupClearsOnIssueDelete asserts rollup row drops to
    zero after issue delete + rollup tick.
  - TestDashboardRollupReattributesOnLinkTaskToIssue verifies tokens
    move from the NULL bucket to the project bucket after the UPDATE.
Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-13 12:51:16 +08:00
Bohan Jiang
caeb146bac feat(github): GitHub App integration for PR ↔ issue linking (#1817)
* feat(github): GitHub App backend for PR ↔ issue linking

- New tables: github_installation (workspace ↔ App install), github_pull_request (mirrored PR state), issue_pull_request (M:N link).
- Webhook handler verifies HMAC-SHA256, upserts PR rows, parses issue identifiers from PR title/body/branch and auto-links them. Merging a linked PR moves the issue to done.
- Connect/setup endpoints power the zero-config "Connect GitHub" install flow; state token is HMAC-signed so the setup callback can recover the workspace.
- Workspace-scoped admin routes for listing/disconnecting installations, plus a per-issue `pull-requests` list endpoint.

Co-authored-by: multica-agent <github@multica.ai>

* feat(github): UI for connecting GitHub and viewing linked PRs

- Settings → Integrations: new tab with Connect GitHub / installations list / disconnect, gated on the deployment having the App configured.
- Issue detail sidebar: Pull requests section showing linked PR title, repo, state (open/draft/merged/closed), and author, with deep link to GitHub.
- Real-time refresh: github_installation:* and pull_request:* events invalidate the matching TanStack Query caches.

Co-authored-by: multica-agent <github@multica.ai>

* fix(github): address review — null actor, role gating, configured guard, scoped uninstall broadcast

- listeners: use optionalUUID(e.ActorID) so the system actor on the github-driven issue:updated event no longer panics activity / notification listeners; merged-PR → issue done now produces a status_changed activity and inbox entry.
- IntegrationsTab: gate the admin-only installations query on canManage so members no longer hit /github/installations 403; the configured/not-configured copy is also scoped to admins.
- backend: introduce isGitHubConfigured() requiring both GITHUB_APP_SLUG and GITHUB_WEBHOOK_SECRET, and surface that single flag from list-installations + connect endpoints so the frontend Connect button stays disabled until both are set.
- DeleteGitHubInstallationByInstallationID now RETURNs workspace_id; webhook handler publishes github_installation:deleted scoped to the right workspace so already-open Settings tabs invalidate in real time. ErrNoRows on a re-fired delete short-circuits cleanly.
- tests: focused webhook integration coverage (auto-link + merge → done, cancelled preservation, uninstall returns workspace).

Co-authored-by: multica-agent <github@multica.ai>

* fix(github): i18n the new GitHub UI strings to satisfy lint

CI flagged every literal string in the Integrations tab, the Pull requests
sidebar section, and the per-PR row label. Move them through useT() and
add the matching `integrations.*` block to settings.json (en / zh-Hans)
plus `detail.section_pull_requests` / `detail.pull_request_state_*` /
loading + empty copy under `issues.json`.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-12 13:49:03 +08:00
Naiyuan Qing
86aa5199fc feat(chat): support attachments & images in chat input (#2445)
* docs(plans): chat attachment & image support implementation plan

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>

* feat(db): add chat_session_id/chat_message_id to attachment

Co-authored-by: multica-agent <github@multica.ai>

* feat(db): sqlc — chat_session_id on CreateAttachment + LinkAttachmentsToChatMessage

Co-authored-by: multica-agent <github@multica.ai>

* feat(file): upload-file accepts chat_session_id form field

Co-authored-by: multica-agent <github@multica.ai>

* feat(chat): SendChatMessage links uploaded attachments to the new message

Co-authored-by: multica-agent <github@multica.ai>

* feat(api): uploadFile accepts chatSessionId; sendChatMessage accepts attachmentIds

Co-authored-by: multica-agent <github@multica.ai>

* feat(core): useFileUpload supports chatSessionId context

Co-authored-by: multica-agent <github@multica.ai>

* feat(chat): support paste/drag/upload attachments in chat input

Co-authored-by: multica-agent <github@multica.ai>

* test(e2e): chat input attachment upload + send round-trip

Co-authored-by: multica-agent <github@multica.ai>

* chore(chat): keep lazy-created session title empty so untitled fallback localizes

Co-authored-by: multica-agent <github@multica.ai>

* fix(chat): address review — dedupe ensureSession + parse upload response

- chat-window: cache in-flight createSession promise in a ref so a file drop
  followed by a quick send no longer spawns two sessions (and orphans the
  attachment on the losing one).
- Attachment type + EMPTY_ATTACHMENT + AttachmentResponseSchema: include the
  new chat_session_id / chat_message_id fields the server now returns.
- uploadFile: route the response through parseWithFallback so a malformed
  body returns EMPTY_ATTACHMENT instead of an undefined-keyed Attachment,
  matching the API boundary rule.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>

* fix(chat): address PR #2445 review — test ctx, send gating, attachment surface

1. Backend test was 400ing because the handler reads workspace from
   middleware-injected ctx, and `newRequest` only sets the header. Helper
   `withChatTestWorkspaceCtx` mirrors the agent-access-test pattern and
   loads the member row + SetMemberContext before invoking the handler.

2. Attachment metadata now flows end-to-end:
   - new sqlc `ListAttachmentsByChatMessageIDs` (batch lookup, mirrors the
     comment-side query)
   - `chatMessageToResponse` takes `attachments` and `ChatMessageResponse`
     surfaces them — same shape as CommentResponse
   - `ListChatMessages` loads them via a new `groupChatMessageAttachments`
     helper so the chat bubble can render file cards
   - daemon claim path pulls `ListAttachmentsByChatMessage` for the latest
     user message and ships `ChatMessageAttachments` to the daemon
   - `buildChatPrompt` lists id+filename+content_type and instructs the
     agent to `multica attachment download <id>` — fixes the private-CDN
     expiring-URL problem where the markdown URL would have expired by
     the time the agent acts
   - TS `ChatMessage` gains an optional `attachments` field

3. Chat composer now blocks send while uploads are in flight:
   - `pendingUploads` counter increments in handleUpload, SubmitButton
     uses it to disable
   - handleSend also gates on `editorRef.current.hasActiveUploads()` to
     catch the Mod+Enter path that bypasses the button
   - new vitest covers the "drop large file → immediate send" scenario
     where attachment id would otherwise be silently dropped

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>

* chore: drop implementation plan doc

Process artefact, not something the repo needs to keep.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-12 10:57:54 +08:00
Bohan Jiang
63d215e1c3 feat(runtime): visibility (public/private) gate on CreateAgent / UpdateAgent (#2419)
* feat(runtime): visibility (public/private) gate on CreateAgent / UpdateAgent

Closes the hole where a plain workspace member could pick another member's
runtime in the Create Agent dialog and bind an agent to it — the backend
wasn't checking runtime ownership, so the agent ran on someone else's
hardware / tokens. Reported on GH #1804.

Schema
- Migration 083 adds agent_runtime.visibility ('private' default, 'public')
  with a CHECK constraint. Existing rows default to private — same
  ownership semantics as before, no behavior change for legacy data.

Backend
- canUseRuntimeForAgent predicate: allow when caller is workspace
  owner/admin, the runtime owner, or the runtime is public.
- CreateAgent and UpdateAgent both gate on it: UpdateAgent matters because
  a plain member could otherwise create on their own runtime, then re-bind
  to a private one.
- PATCH /api/runtimes/:id accepts { visibility } — owner/admin only,
  validated against the same private/public allow-list.

Frontend
- Create-agent dialog renders other-owned private runtimes disabled with a
  Lock badge + tooltip explaining who to ask.
- Inspector runtime-picker disables the same set so re-binding fails
  the same way at the UI layer.
- Runtime detail diagnostics gains a Visibility editor (owner/admin) or
  read-only chip (everyone else).
- Runtime list shows a private/public chip next to the name.

Tests
- Go: canUseRuntimeForAgent truth table; CreateAgent / UpdateAgent
  end-to-end gate tests (admin / runtime owner / plain member);
  PATCH visibility owner / admin / member / invalid-value coverage.
- Vitest: create-agent dialog disabled state on private/public runtimes,
  default-runtime selection skips locked rows; runtime detail visibility
  editor → mutation, read-only fallback.

Migrating runtimes: existing rows default to private to preserve the
"owner only" status quo. Owners switch to public via the detail page
diagnostics card.

Co-authored-by: multica-agent <github@multica.ai>

* fix(runtime): apply timezone+visibility atomically; don't seed locked template runtime

Two issues surfaced in review of MUL-2062:

1. PATCH /api/runtimes/:id ran the timezone branch first, which:
   - returned early on a tz no-op, silently dropping a concurrent
     `visibility` patch in the same body;
   - committed the timezone mutation (+ usage rollup rebuild) before
     validating visibility, so an invalid visibility left the row
     half-updated.

   Validate every field first, then run the mutations in order. The
   no-op short-circuit now only triggers when nothing else is requested.

2. The Create Agent dialog in duplicate mode unconditionally seeded
   `template.runtime_id` as the selected runtime, even when that runtime
   is now private and owned by someone else — the user saw a selected
   row they couldn't submit (Create → backend 403). Fall back to the
   first usable runtime when the template's runtime is locked, and gate
   the Create button on `selectedRuntimeLocked` as defense in depth.

Tests:
- Go: TestUpdateAgentRuntime_CombinedPatchAppliesBoth (tz no-op +
  visibility flip), TestUpdateAgentRuntime_InvalidVisibilityDoesNotMutateTimezone
  (atomic-fail invariant).
- Vitest: duplicate template pointing at a locked runtime now seeds
  the first usable one; Create button stays disabled when no usable
  alternative exists.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-11 22:53:07 +08:00
Multica Eve
d6349c16ec feat(runtime): per-runtime timezone for token-usage aggregation (MUL-1950) (#2394)
* feat: per-runtime timezone for token usage aggregation

The runtime token-usage charts (daily and hourly tabs on the
runtime-detail page) bucketed every event by the Postgres session
timezone, which is UTC in production. For an operator in UTC+8 that
meant a Tuesday afternoon's tasks landed in Tuesday early-morning's
bar — the chart was always one off.

Fix: store an IANA timezone on agent_runtime and aggregate under it.

* migrations 081 / 082 add agent_runtime.timezone (TEXT NOT NULL
  DEFAULT 'UTC') and rebuild the rollup pipeline (window function
  and both trigger functions) to compute bucket_date with
  AT TIME ZONE rt.timezone instead of bare DATE().
* No historical backfill — task_usage_daily rows already on disk
  keep their UTC bucket_date; only future writes / re-touches
  recompute under the new tz. (Product call from MUL-1950: 'guarantee
  future correctness'.)
* runtime_usage.sql gains a @tz parameter on ListRuntimeUsage and
  GetRuntimeUsageByHour and threads tz through GetRuntimeTaskHourly  Activity. ListRuntimeUsageDaily reads bucket_date as-is since the
  rollup already wrote it in tz.
* parseSinceParamInTZ replaces the raw N×24h cutoff with start-of-
  day-N in the runtime's tz so 'last 7 days' lines up with bucket
  boundaries.
* Daemon registration sends the host's IANA tz (TZ env, then
  time.Local), and UpsertAgentRuntime preserves any user override
  via a CASE-on-existing-value pattern so a daemon reconnect can't
  silently revert the operator's setting.
* New PATCH /api/runtimes/:id endpoint (UpdateAgentRuntime) lets
  the runtime detail page edit the tz; the editor seeds with the
  browser tz on first interaction.

Refs: MUL-1950

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

* fix: harden runtime timezone rollups

Co-authored-by: multica-agent <github@multica.ai>

* fix: address runtime timezone review nits

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: Eve <eve@multica.ai>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>
Co-authored-by: Eve <eve@multica-ai.local>
2026-05-11 14:39:35 +08:00
Multica Eve
a2dd80d4f6 feat(autopilot): skip dispatch when assignee runtime is offline (MUL-1899) (#2311)
* feat(autopilot): skip dispatch when assignee runtime is offline (MUL-1899)

Prevents scheduled autopilots from accumulating doomed tasks against
offline / archived / unbound agents. Before this change, a paused laptop
or crashed daemon would let a 5-minute-cron autopilot pile up thousands
of queued agent_task_queue rows that no runtime would ever drain — this
is the dominant source of the 89k stuck-task backlog flagged in MUL-1899.

DispatchAutopilot now performs a pre-flight admission check on the
assignee agent's runtime status. If the runtime is not 'online' (or the
agent is archived / has no runtime bound / has no assignee), the run is
recorded as 'skipped' with a failure_reason and no task is enqueued.
Skipped runs still emit autopilot:run.done so the UI / activity feed
reflect that the trigger fired and was evaluated.

Skipped runs are deliberately NOT counted toward the failure-ratio
auto-pause: a user who closes their laptop overnight should not have
their autopilot paused. Sustained server-side failures keep their
existing pause path via the failure monitor.

Tests: added an integration test that creates an offline runtime and
asserts DispatchAutopilot records a skipped run with no task enqueued.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

* feat(scheduler): expire stale queued tasks via TTL sweeper (MUL-1899)

Companion to the dispatch-time admission gate added in this PR. The
admission gate prevents *new* tasks from being enqueued against an
offline runtime, but it does not drain the historical backlog
(~89k stuck queued rows observed at MUL-1899 baseline) and does not
help when a runtime goes offline *after* a task has already been
queued. This adds a passive TTL sweeper:

- New SQL query `ExpireStaleQueuedTasks` transitions queued tasks
  older than the TTL to status='failed' with
  failure_reason='queued_expired' and a clear error message.
- Sweep is capped per tick (`queuedExpireBatchSize`, default 500) via
  a CTE+LIMIT so that draining a large backlog cannot monopolise the
  DB on a single tick. At 30s ticks the worst case is 60k rows/hour.
- Wired into the existing 30s `runRuntimeSweeper` loop alongside
  `sweepStaleTasks` and reuses `taskSvc.HandleFailedTasks` so the
  expired tasks broadcast `task:failed` events, reconcile agent
  status, and roll back any in-progress issues — same lifecycle as
  any other failed task.
- Default TTL = 2h. Conservatively above any reasonable
  "queued behind a long-running task" window (default agent timeout
  is 2h, sweeper runs every 30s) so legitimate work isn't expired.
- Integration tests cover the happy path (stale → expired, fresh →
  left alone, correct status/reason/error) and the per-tick batch cap.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

* fix(autopilot): address review blockers from PR #2311 (MUL-1899)

GPT-Boy review of the offline-runtime + queued-TTL PR flagged four
blockers; this commit addresses them all.

1. Restore the 'skipped' autopilot_run status in the DB constraint.
   Migration 043 had removed 'skipped' along with the now-defunct
   concurrency_policy feature, so the new admission gate's INSERT of
   status='skipped' violated `autopilot_run_status_check` and broke
   `TestAutopilotDispatchSkipsWhenRuntimeOffline` in CI. New
   migration 079 re-adds 'skipped' to the CHECK list. The down
   migration migrates skipped → failed before re-tightening, mirror-
   ing what 043 did for the original removal.

2. Make `ExpireStaleQueuedTasks` race-safe.
   The CTE-then-UPDATE pattern could clobber a task that the daemon
   claimed between victim selection and the outer update. Two
   guards added:
     - `FOR UPDATE SKIP LOCKED` in the CTE so we never wait on a
       row that's currently being claimed (and never block the
       claim path either).
     - The outer UPDATE now re-checks `t.status = 'queued'` AND the
       TTL predicate so even if a row's lock is released after a
       successful claim, we cannot transition a now-dispatched/
       running task to 'failed'.

3. Add a partial index for the queued-TTL sweeper.
   `idx_agent_task_queue_queued_created_at` on `created_at WHERE
   status = 'queued'` — keeps the 30s sweep query (status=queued
   AND created_at < ... ORDER BY created_at LIMIT 500) cheap even
   when historical terminal rows accumulate (~89k+ at MUL-1899
   baseline). The partial predicate keeps the index tiny because
   only in-flight rows live in 'queued'.

4. Fix the failure-monitor denominator.
   `SelectAutopilotsExceedingFailureThreshold` had been counting
   'skipped' toward total runs, which would have diluted the failure
   ratio: a 100%-failing autopilot could mask itself behind a wall
   of admission skips. With 'skipped' restored as a real status,
   the auto-pause monitor must explicitly exclude it from BOTH
   numerator and denominator — admission skips are neither a
   success nor a failure.

Verified: `go test ./cmd/server/... ./internal/service/...` passes
(including TestAutopilotDispatchSkipsWhenRuntimeOffline,
TestExpireStaleQueuedTasks, TestExpireStaleQueuedTasksRespectsBatch
Limit). `go build ./... && go vet ./...` clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

* fix(migrations): split queued-task TTL index into concurrent migration

Per PR #2311 review: agent_task_queue is a hot table, so building the
new partial index with plain CREATE INDEX inside migration 079 would
hold ACCESS EXCLUSIVE on the queue and block dispatch during deploy.

The migration runner does not allow CONCURRENTLY to share a file with
other statements (documented in 068), so split the index into its own
single-statement file 080 — matching the existing pattern in 035 /
067 / 074 / 075 / 078. Migration 079 keeps the autopilot_run
constraint change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: Eve <eve@multica-ai.local>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-09 15:07:57 +08:00
Bohan Jiang
6d9ebb0fdd fix(daemon): unblock issues stuck on a poisoned-image agent session (#2314)
* fix(daemon): treat upstream API 400 invalid_request_error as poisoned session

A markdown-linked image in an issue description that the agent downloads as
a tiny CDN auth-error file and Read's as a PNG poisons the conversation:
the LLM API rejects the bad image with 400 invalid_request_error, the
session_id is pinned mid-flight, and every follow-up task on the issue
(comment-trigger, auto-retry) resumes the same poisoned conversation and
hits the same 400 — the issue can no longer be executed even after the
description is cleaned up.

Mirror the existing fallback-output classifier on the error side: detect
"API Error: ... 400 ... invalid_request_error" in the agent error string,
persist failure_reason='api_invalid_request', and add it to the
GetLastTaskSession exclusion list so the next task starts a fresh
session that re-reads the (now-clean) description.

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): unblock issues already poisoned by API 400 invalid_request_error

The forward-only classifier from the previous commit only tags new failures.
Issues like MUL-1918 already have multiple failed-task rows whose
failure_reason is the pre-fix default 'agent_error', and GetLastTaskSession
falls back to those legacy rows on the next claim — so deploying the
classifier alone leaves existing poisoned issues stuck (GPT-Boy review
on PR #2314).

Two complementary changes:

- Migration 079 backfills failure_reason='api_invalid_request' on every
  pre-existing 'agent_error' row whose error text matches the canonical
  Anthropic 400 invalid_request_error shape. Keeps observability
  consistent (multica issue runs / UI now report the right reason).

- GetLastTaskSession adds a defensive ILIKE clause on error text. Closes
  the deploy-window gap where the old binary could write a new
  'agent_error' row between the migration running and the new code
  taking over, and protects against future error-format variants the
  daemon classifier might miss.

Plus regression tests covering the legacy + new coexistence case GPT-Boy
flagged, and a guard rail asserting benign 'agent_error' failures
(timeouts, tool errors) still resume their session.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-09 14:39:10 +08:00
Jiayuan Zhang
3b3be9d7bd feat(comments): resolve threads with collapsible bar (MUL-1895) (#2300)
* feat(comments): resolve threads with collapsible bar (MUL-1895)

Adds a Linear-style resolve action on comment thread roots. Resolved
threads collapse to a single "N resolved comments from X" bar in the
activity feed; clicking expands the thread inline (per-session, not
persisted). Replying inside a resolved thread auto-unresolves it.

Backend
- migration 069: resolved_at, resolved_by_type, resolved_by_id on comment
- sqlc ResolveComment / UnresolveComment queries (idempotent via COALESCE)
- POST/DELETE /api/comments/{id}/resolve handlers, root-only validation
- CreateComment auto-clears resolved_at when a reply lands in a resolved
  thread, publishing comment:unresolved
- comment:resolved / comment:unresolved events; CommentResponse and
  TimelineEntry both surface the new fields

Frontend
- Comment + TimelineEntry types extended; payloads typed; WS sync wired
- useResolveComment optimistic mutation with rollback
- ResolvedThreadBar component for the collapsed view
- Resolve / Unresolve menu items on root comments; Collapse strip on the
  expanded resolved card
- en + zh-Hans locale strings

Co-authored-by: multica-agent <github@multica.ai>

* fix(comments): cover agent reply path, expand-state hygiene, nested counts (MUL-1895)

Addresses three review issues from Emacs on PR #2300:

1. TaskService.createAgentComment bypasses Handler.CreateComment, so the
   auto-unresolve wired into the handler did not fire when an agent replied
   in a resolved thread (task / mention / on_comment paths). Extracted the
   logic to TaskService.AutoUnresolveThreadOnReply so both reply paths share
   it; rewired Handler.CreateComment to call the new method.

2. Resolving an already-expanded thread no longer collapses it back to the
   bar because expandedResolved still contained the id. Added
   clearResolvedExpand + handleResolveToggle wrapper so resolve / unresolve
   always wipe the session expand entry.

3. ResolvedThreadBar received only direct children, while CommentCard's
   expanded view recurses through descendants. Extracted the recursive
   walk into thread-utils.collectThreadReplies and called from both —
   counts and author lists now match.

Co-authored-by: multica-agent <github@multica.ai>

* test(comments): mock useResolveComment + add zh-Hans plural key

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-09 05:49:33 +02:00
Multica Eve
eb067ff077 fix(server): aggregate task_usage into daily rollup table to cut DB load (#2256)
* fix(server): aggregate task_usage into daily rollup table to cut DB load

ListRuntimeUsage previously did a SUM(...) GROUP BY DATE(created_at), provider,
model over the raw task_usage stream once per runtime row on the runtimes
list and once per detail page load, scaling O(events) per call. This is the
hot read path responsible for sustained load on Postgres.

Switch the read path to a materialized daily rollup table maintained by a
pg_cron job:

- 072_task_usage_daily_rollup: schema for task_usage_daily +
  task_usage_rollup_state, plus rollup_task_usage_daily_window(p_from, p_to)
  (window primitive used by both cron and offline backfill, idempotent via
  ON CONFLICT DO UPDATE adding deltas) and rollup_task_usage_daily() (cron
  entry point — pg_try_advisory_lock(4242) for serialization, watermark
  advancement, 5-minute safety lag for late-visible inserts). Also adds
  idx_task_usage_created_at to help the two lazy endpoints
  (ListRuntimeUsageByAgent / GetRuntimeUsageByHour) that still hit the
  raw table.

- 073_task_usage_daily_pgcron: CREATE EXTENSION IF NOT EXISTS pg_cron in a
  DO/EXCEPTION block (mirrors the migration 032 pg_bigm pattern so envs
  without shared_preload_libraries=pg_cron skip gracefully) and schedules
  rollup_task_usage_daily() every 5 minutes when the extension is present.

- queries/runtime_usage.sql ListRuntimeUsage rewritten to read from
  task_usage_daily; sqlc regenerated. Other usage queries unchanged.

- cmd/backfill_task_usage_daily: one-shot Go command that walks
  task_usage in monthly slices through rollup_task_usage_daily_window,
  then stamps the watermark to now()-5m so the cron resumes cleanly.
  Run once after migrations have applied, before relying on the rollup.

- runtime_test.go: TestGetRuntimeUsage_BucketsByUsageTime now invokes
  rollup_task_usage_daily_window after fixture inserts so the handler
  sees the rolled-up rows. Synthetic daily rows cleaned up after each
  test.

- runtime_rollup_test.go: new tests covering aggregation correctness,
  idempotency contract of ON CONFLICT DO UPDATE, and the watermark
  advancing exactly to now()-5m via the cron entry point.

Deployment order: apply migrations → run backfill_task_usage_daily once
→ pg_cron picks up subsequent windows automatically. Today bucket may be
up to ~10 minutes stale (5 min cron + 5 min lag) by design.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

* fix(server): make task_usage_daily rollup safe to overlap, replay, and correct

Addresses 4 review blockers on the original PR:

1. Cron/backfill double-count race: the rollup function is now idempotent.
   Window calls find DIRTY KEYS via task_usage.updated_at, then RECOMPUTE
   each bucket from ground truth and REPLACE the daily row (no more
   additive ON CONFLICT). Cron and backfill can now overlap safely.

2. Silent pg_cron absence: the read path is gated behind a new
   USAGE_DAILY_ROLLUP_ENABLED feature flag (default off). The raw
   task_usage scan is preserved as the fallback. Operators flip the
   flag per-environment after backfill + cron are confirmed healthy
   (task_usage_rollup_lag_seconds() helper added for monitoring).

3. UpsertTaskUsage corrections invisible to rollup: added
   task_usage.updated_at column (default now(), backfilled from
   created_at), and bumped it on conflict. Corrections now mark the
   bucket dirty and the next window call recomputes it correctly.

4. CREATE INDEX blocking writes on hot table: split into separate
   single-statement migrations using CREATE INDEX CONCURRENTLY
   (074, 075), matching the 035/067 pattern.

Also: cron.schedule() removed from migrations entirely. Migration 076
only enables the extension (gracefully on unsupported envs); the actual
schedule is a documented operator runbook step that runs AFTER backfill.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

* fix(server): trigger-driven invalidation + online-safe migration for task_usage_daily

Round-2 review feedback on PR #2256:

1. Add explicit dirty-bucket queue (task_usage_daily_dirty) populated by
   triggers on agent_task_queue (UPDATE OF runtime_id, DELETE) and
   task_usage (DELETE). The rollup window function drains both this queue
   and the updated_at-based discovery, so runtime reassignment and
   issue-cascade deletes no longer leave the rollup divergent from the
   raw query.

   Triggers join via agent (not issue) to look up workspace_id, because
   when the cascade comes from issue, the issue row is already gone by
   the time atq's BEFORE DELETE fires; agent stays alive.

2. Make migration 072 online-safe: only ADD COLUMN updated_at TIMESTAMPTZ
   (nullable, no default → metadata-only ALTER, no row rewrite) and a
   separate ALTER for SET DEFAULT now() (also metadata-only). No bulk
   UPDATE on the hot task_usage table. The rollup window function's
   dirty_keys CTE handles legacy NULL rows via an OR branch, supported
   by partial index idx_task_usage_created_at_legacy.

3. Refresh stale documentation in cmd/backfill_task_usage_daily/main.go
   header to describe the current recompute/replace semantics, idempotent
   re-runnability, and the actual migration numbering (072..077).

Tests:
- TestRollupTaskUsageDaily_InvalidationOnReassign: verifies usage moves
  between runtime buckets after ReassignTasksToRuntime-style update.
- TestRollupTaskUsageDaily_InvalidationOnIssueDelete: verifies daily
  bucket is cleared after issue delete cascades through atq → task_usage.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

* fix(server): close dirty-queue race + move legacy partial index to its own concurrent migration

Round-3 review feedback on PR #2256:

1. Blocker: dirty-queue invalidations could be silently lost under
   concurrency. ON CONFLICT DO NOTHING let a late trigger see the row
   already enqueued, no-op, and then the rollup drain (WHERE
   enqueued_at < p_to) would delete the original row — losing the
   late invalidation. Switched all three trigger enqueue paths to
   ON CONFLICT DO UPDATE SET enqueued_at = GREATEST(existing,
   EXCLUDED.enqueued_at), so any invalidation arriving during a
   rollup tick keeps enqueued_at > p_to (p_to = now() - 5min) and
   survives the post-tick drain.

2. High: idx_task_usage_created_at_legacy (partial index on hot
   task_usage table) was being created in the regular 077 migration
   without CONCURRENTLY. Moved to new migration 078 with
   CREATE INDEX CONCURRENTLY, matching the pattern of 074/075.
   077's down migration leaves the index alone (it is owned by 078).

3. Minor: gofmt -w on runtime_rollup_test.go and
   backfill_task_usage_daily/main.go (tabs were lost in the original
   heredoc append). PR description rewritten to describe the current
   recompute/replace + dirty queue + feature flag design and the
   072..078 migration ordering.

Tests still green: TestRollupTaskUsageDaily_* (including both new
invalidation regressions), TestGetRuntimeUsage_*, TestWorkspaceUsage_*.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

* fix(server): unify workspace_id source via agent in rollup window function

Round-4 review feedback (J) on PR #2256:

M1 (must-fix): The dirty queue triggers resolved workspace_id via
`agent.workspace_id`, but the window function's `dirty_from_updates`
discovery and `recomputed` recompute join used `issue.workspace_id`.
There is no schema-level FK guaranteeing
`agent.workspace_id == issue.workspace_id`. Any divergence (future
cross-workspace task scenarios, data repairs, migration bugs) would
cause:

  - dirty queue rows with workspace_id from agent
  - recompute join filtering by workspace_id from issue
  - 0 matches in recompute → bucket erroneously hits the
    deleted_empty branch and the daily row is silently dropped
  - dirty_from_updates path attributing usage to the wrong workspace

Replaced both CTEs to JOIN agent (not issue) so trigger / discovery /
recompute share one workspace_id source. Comment in 077 explains the
constraint.

N1: Refreshed two stale references in
cmd/backfill_task_usage_daily/main.go (header now says "072..078";
stampWatermark warning now mentions migration 073, where the rollup
state table is actually introduced).

Test: New TestRollupTaskUsageDaily_WorkspaceMismatch constructs an
atq with agent.workspace_id != issue.workspace_id, asserts the bucket
lands under agent's workspace (not issue's), and re-asserts after a
runtime reassign in the foreign workspace. Acts as a canary if the
schema invariant changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: Eve <eve@multica.ai>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>
Co-authored-by: Devv <devv@Devvs-Mac-mini.local>
2026-05-08 15:35:21 +08:00
LinYushen
250ada1fb3 chore(db): drop unused agent_task_queue.last_heartbeat_at (#2212)
Drops the unused agent_task_queue.last_heartbeat_at column and removes the hot-path task heartbeat write.
2026-05-07 15:45:29 +08:00