373 Commits

Author SHA1 Message Date
marovole
0e31a9ca58 fix(agent/runtimes): show Cursor Composer token usage and billing (#4135)
* fix(agent/runtimes): show Cursor Composer token usage and billing

Attribute Cursor stream-json usage to the configured runtime model when
result events omit `model`, and add Composer/Auto pricing so dashboard
cost estimates resolve for composer-2.5 runs.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(views): align Cursor Composer pricing

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Eve <eve@multica-ai.local>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-15 15:52:48 +08:00
Naiyuan Qing
ea4f816ce2 fix(comments): support edit trigger suppression (#4136)
Co-authored-by: multica-agent <github@multica.ai>
2026-06-15 15:12:45 +08:00
Naiyuan Qing
63cf0ed308 feat(lists): rebuild all six list surfaces on a shared Linear-style list grid (#4038)
* fix(issues): render thread replies in chronological order (#3691)

collectThreadReplies walked the parent_id tree depth-first, so an agent
reply forced to nest under its trigger comment rendered before earlier
sibling replies (A-D-B-C instead of A-B-C-D) whenever the agent returned
late. Sort the collected subtree by created_at (id tie-break) so the
thread reads in arrival order — the same order the server already feeds
agents via `comment list --thread` (ListThreadCommentsForIssue).

All other consumers of the array (resolution derivation, fold bars,
counts, deep-link) are order-independent.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(skills): rebuild skills list on shared Linear-style list grid

- new ListGrid primitives (subgrid: single source of truth for column tracks)
- skills list: sortable columns, used-by avatar stack, source/creator columns,
  row kebab + batch toolbar with add-to-agent and delete
- skill view store in core; addAgentSkills client method; HoverCheck extracted
  to views/common (issues header now imports the shared copy)
- locale keys for list actions/filters and the reworked detail page

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(skills): rework detail page into overview/files tabs

- tabs directly under the breadcrumb header: overview (default) and files
- overview: identity block + rendered SKILL.md as the main column, right
  rail with metadata card (source/creator/updated, inline name+description
  edit toggle) and used-by panel with bind/unbind
- files: file tree + viewer/editor unchanged; SKILL.md "edit" jumps here
- header kebab menu (copy skill ID, delete); page-level save bar shared by
  both tabs; tab state persisted in ?tab=
- file tree: ARIA tree roles + roving-tabindex keyboard navigation
- drop the old right sidebar (metadata dl, permissions paragraph)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* revert(skills): restore detail page to main, keep branch list-only

Drop the overview/files tabs rework from this branch so the PR scope is
the list rebuild only. skill-detail-page.tsx and file-tree.tsx are back
to the main versions; the locale detail/file_tree sections are restored
to match. The detail rework is preserved on stash/skills-detail-tabs
for a follow-up PR.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(skills): drop description column from skills list

Description is agent-facing routing metadata, not a scannable list
property — Linear's display options expose no description column for
the same reason. Removes the cell, column key, display toggle, lg grid
track, skeleton cells, and the now-dead table.description /
table.no_description locale keys.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(skills): drive list column hiding by container width, drop by priority

Replace viewport sm:/lg: breakpoints with Tailwind v4 container query
variants (@2xl/@4xl) on the list wrapper, so an open sidebar or split
pane narrows the column set instead of squashing tracks. Remove the
min-w-fit + overflow-x-auto horizontal-scroll fallback: when space runs
out, low-priority columns (created/source/creator, then updated) drop
and return as the container widens; name and usedBy never drop. ListGrid
conventions comment updated — this is the template for all list pages.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(skills): virtualize list rows with @tanstack/react-virtual

Linear-style headless virtualization: the virtualizer computes the
visible index range and offsets; offsets land as padding on the
scrolling ListGridBody so mounted rows stay direct subgrid children and
column alignment is untouched. Fixed 48px rows skip per-row measurement.

Hideable column tracks move from max-content to deterministic widths
(CSS vars) — with only the visible slice mounted, content-driven tracks
would resize during scroll. A user-hidden column zeroes its var so the
track still collapses; per-cell max-w caps move into the tracks.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(skills): list tiers must fit their container trigger width

The @4xl tier's track sum (~1080px with gaps) exceeded its 896px
trigger; with the horizontal-scroll fallback gone, the right-side
columns were clipped unreachably between 896-1080px. Move tier 3 to
@5xl (1024px), trim usedBy/source/creator tracks, and document the
fit invariant with its arithmetic next to the template and in the
ListGrid conventions.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(skills): show description as subtext under the skill name

Lives in the name track as a second truncated line (max-w 36rem,
title attr for the full text) — no track, no header, no slot in the
responsive arithmetic. Both lines fit the fixed 48px row, so the
virtualizer contract is untouched; rows without a description center
the name.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* Revert "feat(skills): show description as subtext under the skill name"

This reverts commit f39721301b.

* fix(skills): anchor batch toolbar to the page, not the viewport

fixed bottom-6 left-1/2 centered the bar on the window; with the
sidebar open the list's visual center sits ~120px right of the window
center, so the bar looked off-center (worse with desktop split panes).
Page root becomes the positioning context (relative) and the bar uses
absolute — same rule applies to future list pages.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(skills): show matching count next to search while list is narrowed

"n / total" appears right of the search box only when search or
filters are active — idle state would duplicate the total already in
the page header.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(autopilots): derive trigger kinds, next run, last run status in list

The list endpoint only selected the autopilot table, so the list UI
could not answer "is this automation working" without N+1 detail
calls. Each list row now carries trigger_kinds + next_run_at (enabled
triggers only — the columns describe how it fires today) and
last_run_status (most recent run). Fields are omitempty and absent
from detail/create/update responses; clients must treat them as
optional per the API compatibility rules.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(autopilots): list schema, parsed client, and view store in core

- listAutopilots now runs through parseWithFallback with a zod schema
  (this endpoint was a bare fetch — overdue per the API compatibility
  rules); malformed bodies degrade to an empty list, old-server rows
  without assignee_type or the new derived fields parse cleanly, and
  enum drift passes through as plain strings
- Autopilot type gains the three optional list-only derived fields
- New autopilots view store (scope/sort/columns/filters, persisted per
  workspace): status is the promoted scope dimension so it does NOT
  appear in filters — one dimension lives in exactly one place

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(autopilots): rebuild list on shared ListGrid with scope buttons

Same skeleton as the skills list (container-query tiers, deterministic
var-width tracks with documented fit arithmetic, virtualized 48px rows,
sortable headers, filter + display toolbar, page-anchored batch
toolbar), plus the autopilots-specific pieces:

- Status is the promoted SCOPE dimension: 全部/运行中/已暂停/已归档
  segmented buttons with full-set counts; "all" = active+paused
  (archived gets its own visible home, Linear archive semantics);
  status is therefore absent from the filter dropdown
- Columns: name (paused marker inline), assignee (agent/squad),
  trigger kind badges, last run (outcome dot + time, enum-drift safe
  default), next run; mode/creator/created opt-in hidden
- Filters: assignee, trigger kind, mode, creator (composite type:id
  values for polymorphic actors); sort name/lastRun/nextRun/created
  with lastRun desc default
- Row kebab (pause/resume/archive/unarchive/delete) and batch toolbar
  share one delete dialog; status changes ride useUpdateAutopilot's
  optimistic cache
- Fix noUncheckedIndexedAccess errors the branch had never typechecked
  (skills virtual rows, UsedByCell, added_toast)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(autopilots): scope buttons follow the issues header pattern

Replace the bespoke segmented-pill control with the existing scope
button convention from the issues page: outline buttons with bg-accent
active state on md+, collapsing to a radio dropdown below md. Counts
stay (stage inventories from the full set).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(skills,autopilots): toolbar small-screen treatment follows issues header

Below md: the search box (and its result count) disappear entirely,
and the filter/display controls collapse to square icon-only buttons
(labels and the clear-X are md+), matching the issues header's
responsive pattern.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(skills,autopilots): two-zone columns — WYSIWYG with scroll escape valve

Static width tiers silently hid user-enabled columns (toggle on,
nothing appears — autopilots' mode/creator/created sat behind a 1280px
container gate no laptop reaches; skills' source/created behind
1024px). Tiers can't know how many columns are enabled, so the
mechanism is replaced, not retuned:

- ≥@2xl container: every enabled column renders; the grid carries
  min-width = Σ(enabled tracks + gaps) (pure constants, no
  measurement) and the wrapper scrolls horizontally only when the
  enabled set outgrows the container
- <@2xl: static core set (skills: name+usedBy; autopilots:
  name+assignee), no scroll, toggles don't apply

Per-tier templates and the hand-maintained fit arithmetic retire;
ListGrid conventions updated accordingly.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(skills,autopilots): widen name column minimums (120px base, 200px wide)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(autopilots): drop the archived scope and the list search box

Archiving never existed as a UI flow (the DB status value is only
reachable via direct API; the detail page disables its switch when
archived), so the list stops inventing it: no archived scope, no
archive/unarchive row or batch actions. API-archived rows are excluded
everywhere; a persisted retired scope value falls back to "all".
The search box goes too — scope buttons already partition the small
set, search is redundant (product call). Skills keeps its search (no
scope there).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(skills,autopilots): quiet outline create buttons in page headers

Page-header chrome shouldn't carry the loudest element on the page:
the create button becomes outline with text on md+ and collapses to a
square plus icon below md (same responsive treatment as the toolbar
controls). Primary stays reserved for empty-state CTAs. Agents follows
when its list migrates to ListGrid.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(agents): rebuild list on shared ListGrid with identity rows

Same skeleton as the skills/autopilots lists (two-zone container
responsiveness, deterministic var tracks + min-width scroll escape
valve, virtualized fixed-height rows, issues-style scope buttons,
page-anchored batch toolbar, quiet outline create button), plus the
agents-specific decisions:

- Identity rows: the documented exception to the single-line rule —
  avatar + name + description two-line cells, 64px rows (agents are
  few, identity-rich entities); the italic "no description"
  placeholder is gone, empty descriptions just center the name
- Scope: Mine (historical default) | All | Archived with full-set
  counts; archived ignores the ownership lens; no search box
- The 7d sparkline column is replaced by a sortable "Last active"
  column derived from the same 30-day activity buckets (zero API
  change) — per-row-normalized mini bars can't be compared across
  rows, and the default sort finally has a visible anchor; the
  detailed histogram stays on the hover card / detail page
- Workload folds into the status cell ("Online · 2 tasks") — a 0-2
  integer doesn't earn a column
- Columns: status, runtime, last active, runs (30d); model/created
  opt-in hidden; filters: availability, runtime
- Operations unchanged: row kebab reuses AgentRowActions
  (cancel-tasks/duplicate/archive/restore with permissions); batch
  archive (confirmed) + restore; no delete — the API has none
- View store extended (scope incl. archived, sort, columns, filters);
  agent-columns.tsx (DataTable columns) deleted

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(agents): trim status track to its real worst case (160 -> 144px)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(runtimes): machine detail's runtime table on the shared ListGrid

The master-detail console keeps its shape (machines are few and
strongly categorized; left list, charts, update section untouched) —
only the right pane's runtimes table moves from TanStack DataTable to
the ListGrid family, taking the paradigm pieces that earn their keep
at 1-5 rows: subgrid template + var tracks, two-zone container
responsiveness (the pane is squeezed by the machine list, so the
core-set collapse below @2xl matters more here than on full-width
pages), min-width scroll escape valve, shared header/row/hover visual
language. Deliberately NOT taken: virtualization, sorting, filters,
column toggles, and batch selection — dead weight at this row count,
and batch-deleting runtimes (a cascade-confirm operation) is unsafe
by design.

Workload folds into the health cell ("Online · Working 2") like the
agents status cell; the owner column keeps its only-when-multiple-
owners rule via a zeroed track var. runtime-columns.tsx is deleted;
the row-menu/CLI tests render the exported cells directly.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(runtimes): collapse the kebab track when no row has actions

On a healthy local machine every row's only action (delete) is hidden
by the self-healing rule, leaving a permanent ~64px dead zone after
the CLI column. The action track now follows the owner column's
conditional-var mechanism: zeroed unless at least one row will show
the menu.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(runtimes): drop doubled header border, align create button with convention

PageHeader already carries border-b; the content wrappers' border-t
stacked a second line right under it (the only list page doing this).
"Add a computer" follows the chrome-button convention: outline with
text on md+, square plus icon below md — primary stays reserved for
the empty state CTA.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(runtimes): health cell load suffix matches the agents status cell

"Healthy · 2 tasks" instead of the old workload vocabulary
("Working 2 +1q") — the count is unit-bearing and both surfaces now
speak one language. The queued-anomaly distinction the old words
hinted at belongs to the health layer if it ever earns surfacing.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(lists): pin overflow-y-hidden on the horizontal-scroll wrappers

CSS coerces overflow-x:auto into overflow:auto on both axes, which
silently armed the list wrappers with a vertical scrollbar they were
never meant to have. Combined with the h-full grid's percentage
resolution across scrollbar-induced reflows, the wrapper's vertical
bar and horizontal bar fed each other in a non-converging layout loop
(visible as two stacked, flickering scrollbars on the agents list —
the same latent loop exists in all four wrappers; agents' wider
min-width and 64px rows just hit the trigger zone first). Vertical
scrolling belongs solely to ListGridBody; declare overflow-y-hidden
explicitly to break the loop.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(agents): single scroll container for the list (trial before rollout)

Both scroll axes move to the outer wrapper; the grid drops h-full and
the rows wrapper drops its own overflow. Kills the percentage-height
bridge between the two scroll elements that fed the flickering double
scrollbars and clipped the last row under the horizontal scrollbar.
Sticky header pins inside the scroller; vertical scrollbar now spans
the full pane (Linear's structure). Skills/autopilots follow after
visual confirmation.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(lists): roll single scroll container out to skills/autopilots, add bottom clearance

ListGridBody retires its own scrolling entirely (the agents trial
confirmed the structure): both axes live on the single outer wrapper,
grids drop the h-full percentage bridge, virtualizers point at the
wrapper. The rows wrapper gains LIST_GRID_BOTTOM_CLEARANCE (64px)
appended to the virtualization padding so the last row scrolls clear
of the chat FAB (~48px at bottom-right) and the batch toolbar (~62px).
Runtimes' machine table is untouched: content-height at the top of a
tall pane, no bridge and no practical FAB overlap.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(squads): rebuild list on shared ListGrid (identity rows, minimal)

The last list joins the family. Squads are the fewest entity (1-5 rows),
so this is the agents identity-row shell on the runtime-list minimal
skeleton: ListGrid subgrid + var tracks + two-zone responsiveness +
single scroll container, but NO virtualization, checkbox, or batch.

- Identity two-line rows (squad avatar + name + description, 64px) like
  agents; columns: name / leader / members (polymorphic ActorAvatar
  stack from member_preview), creator + created opt-in hidden
- Scope Mine/All (creator-based, issues-header styling, <md dropdown);
  no archived scope (list API hard-filters archived + no restore
  endpoint), no search (scope-bearing), no filters (set too small)
- Sort name (default) / members / created
- Row kebab = Archive (= the delete endpoint, which archives + transfers
  issues/autopilots to the leader); workspace owner/admin only, so the
  kebab track collapses for non-admins. Reuses the existing
  archive_dialog copy. No batch.
- View store extended (scope + sort + columns); zero API change — pure
  frontend (member_preview/count already in the list payload)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(agents,squads): owner/created-by columns + owner filter

Surface ownership as a real column on both lists, named by what the
field actually means in each permission model:
- Agents: "Owner" — owner_id is the creator (set at creation, never
  transferred) and carries management rights. Promoted to a default-
  visible column (avatar + name); the half-baked inline owner avatar in
  the name cell is removed ("You" badge stays).
- Squads: "Created by" (NOT Owner) — creator_id holds no rights
  (archiving is workspace-admin only), so Owner would mislead. Now a
  default-visible column with avatar + name.

Agents also gains an Owner filter, kept orthogonal to the Mine scope by
the single-axis rule: "Mine" is the clean no-filter personal view, so
applying any filter (owner or otherwise) leaves Mine for All, and
clicking Mine clears all filters. Owner and Mine therefore never
coexist — no "mine + owner=someone-else = empty" contradiction. Squads
keep the plain Mine/All toggle (too few rows for a creator filter).

Both lists keep a Created (date) column, opt-in hidden.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(agents): backfill new filter dimensions on rehydrate (owners crash)

A view payload persisted before the owners filter existed overwrote the
default filters wholesale on rehydrate, dropping filters.owners to
undefined and crashing the list's filter predicate (.length on
undefined). The store merge now deep-merges filters over
EMPTY_AGENT_FILTERS so newly-added dimensions always get their default.
Regression test added.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(skills,autopilots): deep-merge filters on rehydrate too

Same latent crash the agents store just hit: the copied view-store
merge spread persisted.filters wholesale, so adding a new filter
dimension later would drop it to undefined for users with older
persisted state. Harden skills and autopilots the same way (merge over
their EMPTY_*_FILTERS) before that bug can ship.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(projects): rebuild table view on ListGrid + filters + pin/delete kebab

Projects is the dual-view list: the compact table moves onto the shared
ListGrid (subgrid tracks, two-zone responsiveness, single scroll
container, FAB bottom clearance) while the comfortable card grid stays
as the alternate view, toggled by a restyled view switch (Table/Cards
outline buttons, active = bg-accent). Inline editing is preserved —
rows are NOT whole-row links; the name navigates and status/priority/
lead stay click-to-edit (matching prior behaviour, no navigate-vs-edit
conflict).

- View store extended: viewMode + sort (name/priority/status/progress/
  created) + hidden columns + filters (status/priority/lead); merge
  deep-merges filters (migration-safe). No scope (lead optional/often
  an agent; status is a 5-value lifecycle → filter, not scope).
- Toolbar: search (kept — scopeless list) + result count + Filter
  (status/priority/lead) + Display (sort+columns, table view only).
- Row kebab: Pin/Unpin (any member, reuses the existing project pin
  API — zero new endpoints) + Delete (workspace admin). Pin is the
  flexible per-user favourite the list previously lacked.
- Zero API change; status/priority filtering is client-side like the
  other lists.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(projects): GRID_COLS must be a literal string (Tailwind can't see interpolation)

The table view's grid-cols template interpolated ${STATUS_WIDTH}px, so
Tailwind never generated the arbitrary-value class — the grid collapsed
to one column and every cell stacked vertically. Inline the literal
116px. This is the documented ListGrid rule (keep the class literal so
Tailwind scans it).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(projects): single view-toggle button, decouple Display from view mode

Two fixes from the same principle — view mode is pure presentation and
must not couple to anything:
- The view switch is now ONE button that flips table ⇄ cards (shows the
  current view's icon+label, tooltip names the target), instead of two
  side-by-side buttons.
- The Display (sort/columns) control no longer disappears when you
  switch to cards — it was gated on isCompact, so flipping the view
  made it vanish (the "filter gone after switching" weirdness). It's
  always present now; only the columns *section* inside the popover is
  table-only (cards have no columns). Sort applies to both views.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(projects,squads): projects multi-select + squads FAB clearance/toast

Cross-list consistency audit fixes:
- projects: add multi-select (checkbox column + select-all header +
  page-anchored batch toolbar) — it's a dozens-scale full-page list
  like skills/autopilots/agents but was the only one missing it. Batch
  ops: Pin all (any member) + Delete (workspace admin). Table view
  only (cards have no checkboxes). GRID template + min-width updated
  for the checkbox track.
- squads: add the FAB bottom clearance the other full-page lists have
  (last row/kebab was sliding under the chat FAB).
- squads: archive success toast was showing the dialog's question
  title ("Archive this squad?"); use a proper "Squad archived" key.

Intentional and left as-is (documented): squads/runtimes have no
multi-select/virtualization (1-5 rows); projects table isn't
virtualized yet (dual-view + card grid; tracked as low-risk debt).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(agents,squads): close the filter/column consistency gaps

Apply the principle "every categorical column is filterable" where it
was missing:
- agents: add a Model filter (model was a categorical column with no
  filter). Distinct non-empty models from the in-scope rows.
- squads: add filters entirely (it had leader/creator columns + a
  column-toggle panel but no Filter button — the only such outlier).
  Leader (agent) + Creator (member) filters, with the result count and
  the same Filter dropdown shape as the other lists. Store gains
  SquadListFilters + toggleFilter/clearFilters + migration-safe
  filters deep-merge.

autopilots creator stays default-hidden per product call (not every
"who made it" must be visible). Filter stores' partialize tests
updated.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(autopilots): match list-page root to flex-1 convention

skills/agents/projects roots use `relative flex flex-1 min-h-0 flex-col`;
autopilots used `h-full`. Both anchor the batch toolbar correctly, but
align the flex sizing for consistency across the six list surfaces.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-15 14:12:24 +08:00
Willow Lopez
a4fb84d5ac MUL-3273: fix(agent): parse Cursor token usage fields
Fixes Cursor agent token usage parsing for top-level camelCase, nested camelCase, and legacy nested snake_case result usage shapes. Includes tests for the locally verified nested camelCase stream-json output.
2026-06-15 14:04:05 +08:00
YOMXXX
34d4cd3a28 feat(openclaw): support connecting to existing OpenClaw gateway (#3260) [MUL-3158] (#3664)
* feat(openclaw): support connecting to existing OpenClaw gateway (#3260)

When the daemon host is a lightweight dev machine or CI coordinator, the
heavy agent work (LLM inference, code execution, tool use) often belongs
on a more powerful remote server already running an OpenClaw gateway.
Multica historically hard-coded `openclaw agent --local`, forcing every
turn to execute in-process on the daemon host.

This change adds an opt-in gateway routing mode controlled per-agent via
`runtime_config`:

  {
    "mode": "gateway",
    "gateway": { "host": "...", "port": 18789, "token": "...", "tls": false }
  }

- Backend: ExecOptions gains OpenclawMode + OpenclawGateway; buildOpenclawArgs
  drops `--local` when mode == "gateway". Per-task openclaw-config.json
  wrapper pins gateway.{host,port,auth.{mode,token},tls} so users do not
  need to edit the daemon host's `~/.openclaw/openclaw.json` to point at
  a different endpoint.
- Daemon: AgentData carries the raw runtime_config; decoding is fail-soft
  (malformed JSON falls back to local mode rather than blocking dispatch).
- API: gateway.token is masked to "***" on every GET; PATCH replays the
  sentinel back, and the update handler restores the persisted token so
  the round-trip never destroys the secret. Defense-in-depth masking on
  WS broadcasts, plus String/MarshalJSON masking on the in-memory struct
  to block stray `%+v` / json.Marshal leaks.
- UI: openclaw-only "Routing" tab on the agent detail page with mode
  selector + structured endpoint form. Token uses a "saved — submit a
  new value to rotate" UX and matching backend preserve hook.

Empty `runtime_config` keeps the historical embedded behaviour, so
existing agents are unaffected.

* fix(openclaw): address #3664 review — drop dead gateway field, gate pin on mode

Per Bohan-J's review:

- Remove the dead ExecOptions.OpenclawGateway field (+ its String/MarshalJSON and
  the daemon.go construction block). It carried the plaintext bearer token but was
  never read — buildOpenclawArgs only consumes OpenclawMode and the live gateway
  path runs through execenv.OpenclawGatewayPin — so this narrows the secret's
  footprint.
- Gate the gateway pin on mode=="gateway" in decodeOpenclawRuntimeConfig: a
  {"mode":"local","gateway":{...,"token"}} payload no longer writes the token into
  the 0o600 per-task wrapper that --local makes openclaw ignore.
- Warn on an unrecognized non-empty mode (e.g. "gatway") instead of silently
  falling back to local.
- Run preserveMaskedGatewayToken in CreateAgent too, so a literal "***" at create
  time can't persist as a real bearer token.
- Document the gateway host:port trust boundary (SSRF note for shared daemon hosts).

Adds regression tests for the local-mode pin drop and the unknown-mode warning.
2026-06-13 15:33:28 +08:00
Bohan Jiang
5b7eb9ad20 fix: normalize codex cached input usage (#4083)
Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-13 15:32:01 +08:00
Bohan Jiang
c8ab73d38d MUL-3244: Bind quick-create attachments to created issues (#4062)
* fix: bind quick-create attachments to created issues

Co-authored-by: multica-agent <github@multica.ai>

* test: use real image markdown in quick-create attachment test

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-12 16:45:38 +08:00
Naiyuan Qing
d2a03b8edc Fix chat stop and send recovery (#4060)
* Fix chat stop and send recovery

Co-authored-by: multica-agent <github@multica.ai>

* Fix chat cancel recovery follow-ups

Co-authored-by: multica-agent <github@multica.ai>

* Guard cancelled chat restore on tx failure

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-06-12 15:29:14 +08:00
Liu Guanzhong
4594c776e1 feat(agent): add CodeBuddy as first-class CLI backend (#3186)
* feat(agent): add codebuddyBackend struct and buildCodebuddyArgs

Introduces the codebuddy agent backend skeleton with args builder
that mirrors claudeBackend's protocol flags (stream-json, bypass
permissions, blocked args filtering) for the codebuddy CLI fork.

* feat(agent): implement codebuddyBackend.Execute with stream-json parsing

* feat(agent): wire codebuddy into New() factory and launchHeaders

* feat(agent): add codebuddy dynamic model discovery from --help

* feat(agent): add codebuddy thinking/effort discovery and providerThinkingEnums

* feat(daemon): add codebuddy CLI probe, env vars, and args support

* fix(agent): use len(models)==0 for default model instead of loop index

* fix(agent): increase codebuddy --help timeout to 35s for slow CLI startup

* fix(agent): address codebuddy PR review feedback

- Wire codebuddy into execenv: reuse claude's CLAUDE.md, .claude/skills,
  and ~/.claude/skills paths since CodeBuddy is a Claude Code fork
- Replace hardcoded 20-min timeout with runContext for zero-timeout =
  no-deadline semantics matching all other backends
- Restore runContext regression tests lost in rebase merge
- Mirror claude.go execution model: concurrent stdin write to prevent
  pipe deadlock, sync.Once for stdin closure, keep stdin open for
  control_request auto-approval mid-run
- Add control_request handling with auto-approve behavior
- Add RequestID/Request fields to codebuddySDKMessage
- Add codebuddy to metrics knownRuntimeProviders
- Add codebuddy to provider-logo.tsx (reuses ClaudeLogo)
- Consolidate --help discovery: shared codebuddyHelpOutput cache
  eliminates duplicate cold-start invocations

---------

Co-authored-by: krislliu <krislliu@tencent.com>
2026-06-12 15:22:16 +08:00
Truffle
6acca84c28 fix(agent): clear stale session id when a resumed ACP session is gone [MUL-3216] (#4015)
* fix(agent): clear stale session id when a resumed ACP session is gone

When an agent's stored ACP session no longer exists on the runtime side,
session/resume still succeeds — hermes echoes the requested sessionId
back — so the failure only surfaces when session/prompt returns JSON-RPC
-32603 "Session not found". The backend then reported Status=failed with
the stale SessionID still set, which kept the daemon's resume-failure
fallback (gated on SessionID == "") from ever firing. The failed task
never updates the stored session, so every future mention on the same
(agent, issue) dispatched against the same dead id, forever (#4010).

handleResponse now returns a structured acpRPCError instead of a flat
string (rendered text unchanged), and the hermes/kimi/kiro prompt-error
paths clear the session id when the error is session-not-found class on
a resumed session. The daemon's existing retry then re-executes with a
fresh session and stores the replacement id, healing the mapping.

* fix(agent): clear stale session id when set_model hits a dead resumed session

With a model override, session/set_model runs before session/prompt,
so a resumed session that is gone on the agent side surfaces there
instead of at the prompt — and the error branch returned the stale
SessionID, so the daemon's fresh-session retry (gated on
SessionID == "") never fired. Apply the same clear-the-id fix in the
set_model error branch of all three backends.

Also relax isACPSessionNotFound to accept -32602: kimi-cli raises
RequestError.invalid_params({"session_id": "Session not found"}) for
every unknown-session path (src/kimi_cli/acp/server.py), so pinning
-32603 made the fix dead code for kimi. The wording gate keeps
unrelated invalid_params errors (e.g. "model not available") on the
preserve-the-id path.

Regression tests for all three backends: resumed session + model
override + set_model failing with each runtime's observed
session-not-found shape must yield status=failed with an empty
SessionID.
2026-06-11 14:54:56 +08:00
Naiyuan Qing
34c68e1e4c fix(comments): enforce single resolution per thread (#3984)
A thread could hold multiple resolved comments at once: ResolveComment
was a plain per-row setter that never cleared the prior resolution, and
"replacing" one was a display-only illusion (deriveThreadResolution
picks the max resolved_at). The stale rows stayed resolved in the DB and
the optimistic update flashed the new resolution, then reverted.

Make single-resolution-per-thread a write invariant:

- ClearOtherThreadResolutions: thread-scoped clear via a RECURSIVE CTE
  (root + descendants of the target, id <> target), returns each cleared
  row.
- ResolveComment handler runs the clear + set in one tx so the replace
  is atomic. It emits comment:unresolved per cleared sibling (granular
  realtime consumers patch a single comment in place and would otherwise
  keep showing the stale resolution). Target keeps its COALESCE
  idempotency and the re-resolve event suppression.
- Frontend optimistic update mirrors the invariant: resolving clears
  every other resolution in the same thread, so the cache never shows
  two at once. Unresolve still only clears its own row.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-10 14:20:39 +08:00
Bohan Jiang
b1c8eb5f11 feat: support Claude Fable 5 pricing (#3982)
Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-10 12:33:27 +08:00
Antoine GIRARD
9f21d0b634 feat(transcript): add timestamps to run transcript entries (MUL-3174) (#3951)
Threads the existing task_message.created_at column through the full stack (Go protocol -> REST/WS handlers -> TS types -> transcript dialog) so agent run transcripts show per-entry timestamps, helping users spot stalled runs. Additive, no migration.
2026-06-09 19:53:30 +08:00
shutcode
a2ef95445b MUL-2794 fix(agent): stop Cursor sessions on terminal result (#3165)
Treats Cursor's stream-json terminal `result` event as the protocol completion boundary so a lingering Cursor worker process can no longer hold the daemon task open after the agent has produced its final result.

- Tighten `cmd.WaitDelay` to 500ms (set before `Start()`)
- Set `resultSeen` and `cancel()` on terminal `result`
- Preserve completed/failed status across the cancellation via two `!resultSeen` guards in the post-loop status decision
- Add unix fake-CLI coverage for success and `is_error` terminal results
2026-06-09 16:49:48 +08:00
elrrrrrrr
254ec945f5 fix(agent/codex): shut down gracefully so OTEL telemetry flushes (#3888)
Codex telemetry was never reaching the OTLP collector for tasks run by the
daemon. The per-task config (including the [otel] block) is copied into
CODEX_HOME correctly, but the lifecycle goroutine closed stdin and then
immediately cancelled the run context, which SIGKILLs the app-server. Codex's
OTEL batch exporters only force-flush on a graceful shutdown, so the buffered
spans/metrics/logs were dropped before they could be exported — short tasks
lost everything, long tasks lost the final batch.

Let codex exit on its own after stdin EOF (running its shutdown + flush path)
and only force-cancel after a bounded grace period if it doesn't, so the reader
goroutine still can't block forever. Also set cmd.WaitDelay, matching the other
long-lived backends (claude, copilot, cursor, …).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 14:46:07 +08:00
Multica Eve
13e9485a3b MUL-3130: persist stable /api/attachments/<id>/download URL in comment markdown (#3937)
* MUL-3130: persist a stable attachment download URL in comment markdown

Comment image attachments rendered as broken placeholders ~30 minutes
after upload because the editor was persisting a short-lived
HMAC-signed URL into the comment body. After PR #3903 (MUL-3132)
hardened /uploads/* with auth, `attachmentToResponse` started signing
`attachment.url` as `/uploads/<key>?exp=<unix>&sig=<HMAC>` for
LocalStorage so token-auth clients could keep loading inline images.
The signature has a 30-min TTL by design — but `useFileUpload` was
returning that signed value as `link` and the editor was writing
`![file](signed-url)` straight into the markdown, so the comment
permanently captured a URL that stopped working as soon as the
signature expired.

The fix is to persist a stable per-attachment URL that the server can
re-sign on every request:

* `useFileUpload` now returns `link = /api/attachments/<id>/download`
  (avatar uploads without an id still fall back to `att.url` so the
  pre-attachment-row code paths keep working).
* `DownloadAttachment` self-resolves the workspace from the attachment
  row instead of reading X-Workspace-Slug / X-Workspace-ID headers,
  and the route is registered under the auth-only group so a native
  browser <img>/<video> resource load (which cannot attach those
  headers) succeeds. Membership is checked inside the handler with
  a 404 deny shape so the route does not act as an IDOR oracle.
* A new `GetAttachmentByIDOnly` SQL query supports the workspace-
  derivation step.
* `AttachmentDownloadProvider` now extracts the attachment id from
  the stable URL when matching markdown refs to attachment records,
  with a fallback to the existing url-equality check for legacy
  comments (and S3/CloudFront markdown that points straight at the
  CDN).
* `contentReferencesAttachment` covers both URL shapes for the
  composer / standalone-list dedup paths so an attachment uploaded
  before the fix and one uploaded after both deduplicate cleanly.

Tests:
- New unit tests for the URL helpers (16 tests, packages/core).
- Backend regression test: bare `<img src>`-style request without
  workspace headers now succeeds for a member (200) and 404s for a
  non-member, replacing the previous "400 without workspace context"
  contract.
- Existing TestDownload*, TestServeLocalUpload*, TestAttachmentTo
  Response* and the 1220 frontend views tests all pass.

Refs: MUL-3130, GitHub issue #3891
Co-authored-by: multica-agent <github@multica.ai>

* MUL-3130: address PR review — split markdown link from upload link, swap render src

Two follow-ups from GPT-Boy's review on PR #3937.

(1) Don't reroute every upload consumer through the workspace-gated
    download endpoint.

    The previous change made `useFileUpload`'s `link` field unconditionally
    return `/api/attachments/<id>/download` whenever the upload had an id.
    But `useFileUpload` is also used by avatar / logo pickers
    (account-tab, workspace-tab, agents/avatar-picker, squads/squad-detail-page)
    that persist `result.link` directly into `avatar_url`. Avatars are
    referenced cross-workspace (mention chips, member lists, inbox
    items), so binding their URL to a workspace-membership-gated
    endpoint would silently break cross-workspace avatar visibility.

    The fix splits the URL into two semantically distinct fields:

      - `link`         — same as `att.url` (legacy contract). Avatar /
                          logo callers continue to use this and remain
                          on whatever URL semantics the storage backend
                          dictates.
      - `markdownLink` — the stable per-attachment URL
                          `/api/attachments/<id>/download`. Only the
                          editor's markdown-persisting flow consumes
                          this. Falls back to `link` for the
                          no-workspace upload branch (where there is
                          no attachment-row id to address).

    `editor/extensions/file-upload.ts` switches `image.src` and
    `fileCard.href` to `markdownLink ?? link` so comment markdown gets
    the stable shape while avatar callers stay on `link` unchanged.

(2) Make the render-time img src loadable for token-mode clients.

    Persisting the stable `/api/attachments/<id>/download` URL fixes the
    expiry problem but the path itself sits behind `middleware.Auth`,
    which expects either a `multica_auth` cookie or a Bearer token in
    `Authorization`. Native `<img>`/`<video>` resource loads from
    token-mode clients (Electron's default mode, the mobile app,
    legacy-token web sessions) cannot attach the Authorization header,
    so the bare URL would 401 immediately rather than 30 minutes later.

    `Attachment.normalize` now runs the resolved record through a new
    `pickInlineMediaURL` helper that returns:

      - `record.download_url` when it's an absolute URL with a
         recognised CDN signature query (CloudFront-signed
         `Signature` / `Expires` / `Key-Pair-Id`, or
         `X-Amz-Signature` for raw S3 presigns) — these load as
         native resource src in any client.
      - else `record.url`, which on the LocalStorage backend carries
         a freshly-minted `/uploads/<key>?exp&sig` query whose
         signature IS the auth (token-mode-loadable). On non-CF S3
         backends this is the raw stored URL — same behaviour as
         today.
      - else the original input URL (legacy / unresolved markdown
         keeps its existing path).

    This gives the same effect for both `kind: "record"` and
    `kind: "url"` attachment inputs: once a record is in hand, the
    rendered media src is whichever URL the current backend exposes
    a working signature on.

Tests:

  - New `file-upload.test.ts` regression pinning that `markdownLink`
    is what lands in the markdown body when the upload result returns
    both a short-lived storage URL and a stable download path.
  - Updated `attachment.test.tsx` to reflect the new render-time
    swap (the rendered img src now follows the freshly signed URL,
    not the raw storage URL) and added a record-mode regression
    pinning the LocalStorage default — when `download_url` is the
    bare /api/attachments/<id>/download path, the renderer must fall
    through to the signed `record.url`.
  - Updated `chat-input.test.tsx` makeUpload helper for the new
    `markdownLink` UploadResult field.
  - 1222 frontend views tests + 507 core tests + typecheck across
    @multica/{core,ui,views} all pass.

Refs: MUL-3130, GitHub issue #3891. Builds on a740f7a35.
Co-authored-by: multica-agent <github@multica.ai>

* MUL-3130: chat upload map keys on persisted markdownLink, not the short-lived link

GPT-Boy's second-round review on PR #3937 caught a chat-only blocker
left over from the previous fix.

After the previous commit split `UploadResult.link` into `link`
(legacy avatar/logo URL) and `markdownLink` (stable per-attachment
URL persisted into markdown), the comment editor's image src + file
card href correctly switched to `markdownLink ?? link`. But chat
input still kept the upload-map key on the old `link`:

  uploadMapRef.current.set(result.link, result.id)
  …
  if (content.includes(url)) activeIds.push(id)

In the LocalStorage backend `link` is the short-lived
`/uploads/<key>?exp=&sig=` URL. The editor persists the stable
`/api/attachments/<id>/download` URL into the message body, so
`content.includes(url)` never matches and the send call drops
`attachment_ids`. The attachment ends up bound only to the chat
session, not to the message — agents reading message-level metadata
see no attachments.

Fix: key the upload map on the same value the editor actually wrote
into the markdown body (`markdownLink || link`). The
`content.includes(url)` check then matches and the attachment id is
correctly forwarded on send.

Tests:

- Updated the chat-input mock editor to insert `markdownLink || link`
  into its value, mirroring the real editor's persisted-URL choice
  (uploadAndInsertFile in editor/extensions/file-upload.ts). Without
  this the mock would silently paper over the bug.
- Added a regression test where the upload result returns a
  short-lived `link = /uploads/...?exp&sig` and a stable
  `markdownLink = /api/attachments/<id>/download`. Asserts (a) the
  message body carries the stable URL and never the signed query,
  and (b) the bound `attachment_ids` includes the attachment id.

All 1223 frontend views tests pass (was 1222, +1 new regression).
Typecheck and 507 core tests still green.

Refs: MUL-3130, PR #3937 review by GPT-Boy. Builds on f66a522d0.
Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: Eve <eve@multica-ai.local>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-09 14:26:36 +08:00
Bohan Jiang
24b162cdbc feat(daemon): surface the real task initiator to the agent runtime (MUL-2645) (#3899)
* feat(daemon): surface the real task initiator to the agent runtime (MUL-2645)

In a multi-person workspace the agent runtime only ever saw the runtime
OWNER identity: the brief's `## Requesting User` is sourced from
runtime.OwnerID and the task-scoped token is owner-bound, so every
requester (whoever commented, @mentioned, or chatted) appeared to the
agent as the owner. Agents that route by initiator for permission,
privacy, or audit all misjudged.

Resolve the real task initiator at claim time and surface it distinctly
from the owner:
- comment / mention trigger -> triggering comment's author (member or agent)
- chat task -> chat session creator (sessions are creator-only)
- on-assign / autopilot / quick-create -> no attributable initiator (omitted)

Adds initiator_{type,id,name,email} to the claim response, the daemon
Task, and TaskContextForEnv, rendered into the brief as a new
`## Task Initiator` section. The section documents the privacy boundary:
the agent's credentials stay owner-scoped, so this is an attested
identity for the agent's own routing/privacy logic, not act-as. No DB
migration — both paths are derivable from existing rows.

Tests: brief rendering (member/agent/omit/sanitize) + email guard unit
tests, and claim-handler tests for the comment and chat paths.

Co-authored-by: multica-agent <github@multica.ai>

* fix(chat): store real sender as task initiator, not chat_session creator (MUL-2645)

Review fix (Niko, PR #3899). v1 resolved the chat task initiator from
chat_session.creator_id at claim time. That is correct for web chat and
Lark p2p (creator == sender), but WRONG for Lark group chats: the group
session creator is deliberately the installer (stable identity across
member churn), not the message sender. So in a Lark group, every member
who triggered the agent showed up in the brief as the installer/owner —
the exact bug this issue is about, still live at that entry point.

Capture the real sender at enqueue time instead of deriving it from the
session creator at claim time:

- migration 117: agent_task_queue.initiator_user_id (FK user, ON DELETE
  SET NULL); NULL for non-chat and pre-migration rows.
- EnqueueChatTask now takes an explicit initiatorUserID. Web chat passes
  the authenticated request user; the Lark dispatcher threads the inbound
  sender (binding.MulticaUserID) through scheduleRun -> flushChatRun. The
  debouncer keeps the latest scheduled flush per session, so in a multi-
  sender silence window the LATEST sender wins (documented + tested).
- claim handler resolves the initiator from task.initiator_user_id and
  drops the creator_id fallback entirely.

The Lark group session creator stays the installer (unchanged) — only the
task initiator is corrected, keeping the two concepts cleanly separate.

Tests: dispatcher group regression (initiator = sender, not installer),
latest-sender-wins, p2p initiator assertion; the chat claim handler test
now sets creator != initiator and asserts the stored sender wins.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-08 19:29:57 +08:00
Bohan Jiang
1ddf89a8f2 feat(daemon): enable Antigravity (agy) per-agent model selection (MUL-3125) (#3894)
* feat(daemon): wire agy --model and model discovery for Antigravity

agy 1.0.6 added a --model flag and an `agy models` catalog command, which
were the #1 blocker in the earlier agy-backend review (MUL-3125). The
antigravity backend already shipped but deliberately dropped opts.Model
because agy 1.0.1 had no way to select a model.

- buildAntigravityArgs now passes --model <display name> when opts.Model is
  set; the value is the exact `agy models` display string (spaces + parens),
  passed as a single exec arg so no shell quoting is needed.
- Block --model in custom_args so it can't override the managed value.
- ListModels("antigravity") enumerates via `agy models` (no static fallback:
  agy silently no-ops on unrecognised models, so a stale guess would turn a
  typo into a successful empty run).
- ModelSelectionSupported now returns true for every built-in provider; the
  hook stays for any future model-less runtime.
- Daemon probe reads MULTICA_ANTIGRAVITY_MODEL for the daemon-wide default.

Co-authored-by: multica-agent <github@multica.ai>

* docs(providers): mark Antigravity model selection as supported

Antigravity gained --model in agy 1.0.6 (MUL-3125). Update the provider
matrix + prose (en/zh/ja/ko) from "managed internally / no --model" to
dynamic discovery via `agy models`, and refresh the now-stale picker
comments. Flag the display-string (not slug) shape and agy's silent no-op
on unrecognised values.

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): reject unknown Antigravity model at spawn (MUL-3125)

agy exits 0 with empty output on an unrecognised --model, so a stale/typo'd
value would surface as a 'completed' but empty task. Validate opts.Model
against the `agy models` catalog in Execute before spawning: a non-empty
model the CLI does not advertise fails fast with an actionable error listing
the real choices. opts.Model is the single funnel for agent.model and the
MULTICA_ANTIGRAVITY_MODEL default, so this one check covers every source
(UI free-text, API, persisted value, env) — addressing Elon's review that a
UI-only guard is bypassable.

Validation is fail-OPEN: if the catalog can't be discovered we pass the
value through and let agy resolve it, so a discovery hiccup never blocks a
run. Pure antigravityModelError() is unit-tested (valid / unknown / near-miss
/ empty-model / empty-catalog); verified live against real agy 1.0.6.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-08 15:32:53 +08:00
Bohan Jiang
3808049361 fix(codex): set semantic thread names (#3887)
Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-08 14:53:31 +08:00
Thanh Minh
8abdc77961 MUL-2489 fix(runtime): delete archived squads before runtime teardown (#2955)
* fix(runtime): delete squads referencing archived agents before runtime teardown

The DeleteAgentRuntime handler was failing with 500 'failed to clean up
archived agents' because squad.leader_id has an ON DELETE RESTRICT FK on
agent(id). When an archived agent was still referenced as a squad leader
(even on an archived squad), the DELETE FROM agent query was blocked.

Fix: add DeleteSquadsByArchivedAgentsOnRuntime query that removes squads
whose leader_id points to an archived agent on the target runtime, and
call it before DeleteArchivedAgentsByRuntime in the handler.

Closes TMI-85

* test(runtime): cover squad cleanup before archived-agent deletion

Adds four tests around the DeleteSquadsByArchivedAgentsOnRuntime fix:

* TestDeleteSquadsByArchivedAgentsOnRuntime_Query — query-level: deletes
  squads whose leader is an archived agent on the target runtime, leaves
  squads with active leaders or archived leaders on a different runtime
  alone, and is safe to call when nothing matches. Covers the archived-
  squad case that originally hid the FK blocker from `multica squad list`.
* TestDeleteAgentRuntime_RemovesSquadsLedByArchivedAgents — handler
  end-to-end regression for TMI-85. Reverting the handler change makes
  this fail with the exact 500 'failed to clean up archived agents' the
  user reported.
* TestDeleteAgentRuntime_NoSquadsRegression — happy path for runtimes
  whose archived agents were never squad leaders, ensuring the new step
  is a no-op there.
* TestDeleteAgentRuntime_StillBlockedByActiveAgents — preserves the 409
  CountActiveAgentsByRuntime guard so the active-agent contract isn't
  silently regressed by the new cleanup ordering.

Refs TMI-85

* chore: remove internal issue tracker references from test comments

* fix(runtime): keep active squads during runtime teardown

* fix(runtime): block runtime delete on active archived-leader squads

* fix(runtime): make runtime delete 409 path a no-op

---------

Co-authored-by: Kiro <kiro@multica.ai>
2026-06-08 13:08:38 +08:00
Multica Eve
14f89bc08a Fix Claude control request handling (#3827)
Co-authored-by: Eve <eve@multica-ai.local>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-05 17:14:33 +08:00
Bohan Jiang
6ac8314711 feat(lark): support both Feishu and Lark from one deployment (MUL-3083) (#3815)
* feat(lark): serve Feishu and Lark from one deployment, per installation

The Lark integration was locked to a single open-platform host chosen
deployment-wide (MULTICA_LARK_HTTP_BASE_URL / _CALLBACK_BASE_URL,
defaulting to open.feishu.cn), so one deployment could talk to only the
mainland Feishu cloud OR Lark international — never both. Teams on the
other tenant could not use the integration at all.

Make the host per-installation. The device-flow installer already
auto-detects the tenant (Lark emits tenant_brand="lark" mid-poll); we now
persist that as lark_installation.region, carry it on
InstallationCredentials.Region, and resolve the open-platform host per
call (REST + WS bootstrap) from the region. An explicit cfg.BaseURL
(env / httptest) still overrides every region, so existing tests and
staging/proxy setups keep working.

- migration 116: lark_installation.region TEXT NOT NULL DEFAULT 'feishu'
  CHECK (region IN ('feishu','lark')) — existing rows are all mainland.
- lark.Region enum + OpenPlatformBaseURL/RegionOrDefault helpers.
- registration: thread the detected region into finishSuccess so the
  install-time GetBotInfo hits the right cloud AND the row records it.
- every credential-build site (patcher, replier, WS provider, union_id
  backfill) copies region off the installation row.
- region is part of the WS supervisor fingerprint so a re-install that
  switches cloud restarts the connection.
- API: surface region on the installation listing DTO.

MUL-3083

Co-authored-by: multica-agent <github@multica.ai>

* feat(lark): surface installation region in settings UI

Read the per-installation region off the listings response: build the
"Manage in Lark" dev-console host from it (open.feishu.cn vs
open.larksuite.com instead of a hardcoded mainland host) and render a
Feishu / Lark badge on each connected bot. The field is optional and
defaults to Feishu when an older server omits it (API-compat). Adds the
region_feishu / region_lark labels to all four locales.

MUL-3083

Co-authored-by: multica-agent <github@multica.ai>

* docs(lark): document simultaneous Feishu + Lark support

The cloud each bot belongs to is now auto-detected at install and stored
per installation, so one deployment serves both. Replace the old
"point MULTICA_LARK_HTTP_BASE_URL at larksuite for international tenants"
guidance (now just an optional override) in all four locales.

MUL-3083

Co-authored-by: multica-agent <github@multica.ai>

* fix(lark): repair legacy Lark-international installs on upgrade

Review follow-up (MUL-3083). Migration 116 backfilled every existing
lark_installation to region='feishu', assuming all historical rows were
mainland. But self-host deployments could already run Lark international
via the deployment-wide MULTICA_LARK_HTTP_BASE_URL override, so those
rows are really Lark — clearing the override after upgrade (which the new
docs invite) would route them to open.feishu.cn and break them.

Add a one-shot startup repair, BackfillRegionFromLegacyOverride, fired
off the hot path like BackfillBotUnionIDs: when the deployment's global
base-URL override targets open.larksuite.com, relabel the still-default
'feishu' rows to 'lark'. Gating on the deployment-wide override is what
makes it safe — every pre-existing install on such a deployment was Lark.
Idempotent; no-op on mainland / fresh deployments. Verified end-to-end
against a scratch DB (flip then 0-row idempotent re-run).

Also document that a Lark/飞书 app_id is globally unique across both
clouds, which is what makes the app_id-keyed token cache and the
UNIQUE(app_id) constraint safe across regions (review nit).

MUL-3083

Co-authored-by: multica-agent <github@multica.ai>

* docs(lark): fix ops guidance to match auto per-installation region

Review follow-up (MUL-3083). .env.example and docker-compose.selfhost.yml
still told operators that international Lark requires pointing both base
URLs at open.larksuite.com — now wrong, and it would push a fresh
deployment back into a single-cloud override. Rewrite them: the base
URLs are optional deployment-wide overrides; normal dual-cloud operation
keeps them empty. Document the first-boot auto-relabel for deployments
migrating off the old single-cloud override, across the integration docs
(en/zh/ja/ko).

MUL-3083

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-05 16:03:13 +08:00
Bohan Jiang
3708fb0f07 fix(daemon): inactivity-based agent run timeout, no wall-clock guillotine (MUL-3064)
Active long-running sessions are no longer killed by a fixed wall-clock deadline. Liveness is delegated to the idle watchdog (MULTICA_AGENT_IDLE_WATCHDOG, default 30m) with a larger in-flight-tool budget (MULTICA_AGENT_TOOL_WATCHDOG, default 2h). MULTICA_AGENT_TIMEOUT is an opt-in absolute cap (default 0 = no cap). The server-side 2.5h sweeper is unchanged as a coarse backstop.

Fixes #3745.
2026-06-05 15:06:07 +08:00
Bohan Jiang
76dbb87762 fix(agent): standardize model-discovery timeouts to 15s, stop caching empty results
Raise pi and cursor model-list discovery timeouts 5s->15s to match opencode/ACP; openclaw stays 30s (sequential multi-spawn). Stop caching empty discovery results so a transient timeout doesn't keep the picker blank for the full TTL. Fixes #3729. MUL-2977.
2026-06-05 14:47:59 +08:00
Bohan Jiang
c9ceaee4d9 fix(agent): stop stripping user-facing CLAUDE_CODE_* config from child env (#3690)
* fix(agent): stop stripping user-facing CLAUDE_CODE_* config from child env

isFilteredChildEnvKey blanket-removed every CLAUDE_CODE_* var from the
spawned Claude Code child's environment. The intent was only to keep the
daemon's internal session markers from leaking, but CLAUDE_CODE_* is also
Anthropic's user-facing config namespace. On Windows this stripped the
user-set CLAUDE_CODE_GIT_BASH_PATH, so Claude Code could not locate
bash.exe, exited immediately, and every task failed with
"write claude input: write |1: The pipe has been ended."

Switch from prefixing the whole CLAUDE_CODE_ namespace to an exact-name
denylist of the internal runtime/session markers (CLAUDECODE,
CLAUDE_CODE_ENTRYPOINT, CLAUDE_CODE_EXECPATH, CLAUDE_CODE_SESSION_ID,
CLAUDE_CODE_TMPDIR, CLAUDE_CODE_SSE_PORT), still blanket-stripping the
wholly-internal CLAUDECODE_* namespace. Every other CLAUDE_CODE_* var
(GIT_BASH_PATH, USE_BEDROCK, USE_VERTEX, MAX_OUTPUT_TOKENS, ...) now
reaches the child. The internal-marker set was confirmed against the live
runtime, not guessed.

Fixes the whole class, not just git-bash: Bedrock/Vertex/etc. were
silently dropped the same way.

MUL-2940

Co-authored-by: multica-agent <github@multica.ai>

* fix(agent): keep CLAUDE_CODE_TMPDIR in child env

CLAUDE_CODE_TMPDIR is a documented, user-configurable temp-dir override
(public env-vars reference), not an internal per-session marker. Claude
Code creates its own per-session subdir under it, so inheriting it is
harmless — and stripping it would silently break a user's temp-dir
override the same way the broad prefix filter broke CLAUDE_CODE_GIT_BASH_PATH.

Drop it from the internal denylist (which now holds only the undocumented
per-process runtime markers: CLAUDECODE, CLAUDE_CODE_ENTRYPOINT,
CLAUDE_CODE_EXECPATH, CLAUDE_CODE_SESSION_ID, CLAUDE_CODE_SSE_PORT) and
assert it reaches the child.

MUL-2940

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-05 12:29:55 +08:00
Bohan Jiang
c99c2493ae fix(agent): keep resolvable models when CLI discovery exits non-zero
Parse the discovered catalog even when the model-discovery CLI exits non-zero (pi/opencode/cursor/openclaw) instead of discarding it and returning an empty model picker. Filter pi diagnostic lines so stale-pattern warnings don't coin bogus models. Fixes #3729. MUL-2977.
2026-06-04 14:57:30 +08:00
Bohan Jiang
8c98940b79 Lark Bot integration MVP: migration + service boundary (MUL-2671) (#3277)
* feat(db): add Lark integration migration (MUL-2671)

Introduces seven tables for the 飞书 Bot integration MVP — per-agent
PersonalAgent installations, user/chat bindings, inbound dedup +
non-content drop audit, outbound card mapping, and short-lived
single-use member binding tokens.

Schema notes:
- chat_session schema unchanged; Lark routes through a separate
  binding table rather than adding a metadata JSONB column.
- Outbound card mapping is task/message scoped so multiple runs on
  the same session can't stomp each other's cards.
- lark_inbound_audit stores routing / identity / drop_reason ONLY,
  never message body — the audit channel for unbound users and group
  messages that don't address the Bot.
- app_secret stores ciphertext (encryption helper lands in a follow-up
  commit on this branch); DB never sees plaintext.

Co-authored-by: multica-agent <github@multica.ai>

* feat(util): add secretbox AES-256-GCM helper for at-rest secrets

First consumer is lark_installation.app_secret (MUL-2671 §4.4), but
the helper is intentionally generic — future per-tenant secrets that
must not appear in a DB dump can reuse it.

Construction: AES-256-GCM with a per-message random nonce, providing
authenticated encryption. Tampered ciphertext fails Open instead of
silently decrypting to garbage. Master key loaded from a base64 env
var via LoadKey; key rotation is not in scope yet.

Co-authored-by: multica-agent <github@multica.ai>

* refactor(issues): extract IssueService.Create as single create entry (MUL-2671)

Establishes the service-layer boundary mandated by Elon's 二审 of
MUL-2671 §4.8: issue creation no longer lives inside the HTTP
handler. Both the HTTP POST /issues handler and the future Lark
/issue command call into service.IssueService.Create, so duplicate
guard, issue numbering, attachment linking, broadcast, analytics,
and agent/squad enqueue stay aligned.

Handler responsibilities shrink to parsing the HTTP request, doing
actor resolution / validation (transport-specific), and converting
service results into the IssueResponse + 201. The transaction-wrapped
core, attachment link, event publish, analytics capture, and
agent/squad enqueue all move into service.IssueService.Create.

A BroadcastPayload callback on the service keeps the WS broadcast
shape (the full IssueResponse) without forcing the service to
depend on handler-layer response types.

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations): add Lark package skeleton (MUL-2671)

Establishes the architectural boundaries Elon's 二审 mandated as
first-PR blockers without dragging in OAuth, WebSocket, or
card-patching code (those land in follow-up PRs):

- ChatSessionService interface — channel-aware chat-session entry
  point for Lark, deliberately separate from the HTTP SendChatMessage
  handler. The HTTP handler's single-creator guard (creator_id ==
  request user_id) is correct for the browser client but rejects
  group chat_sessions by construction; Lark needs its own service.
- AuditLogger interface — the only path for recording dropped events.
  Its signature deliberately omits message body, enforcing the
  drop-audit policy (MUL-2671 §4.7) at the type level: unbound users
  and non-addressed group messages can't accidentally end up in
  chat_session.
- Typed IDs (OpenID, ChatID) prevent UUIDs from being conflated with
  Lark-side identifiers at compile time.
- DropReason constants align dashboard/audit queries across callers.

Co-authored-by: multica-agent <github@multica.ai>

* refactor(issues): move parent/project workspace check into IssueService (MUL-2671)

Parent existence and project workspace membership now live inside
IssueService.Create, inside the same transaction as the duplicate guard
and counter increment. The HTTP handler stops re-implementing the
lookup; every future create entry (Lark /issue, MCP, API keys) inherits
the same boundary without copy-pasting the SQL.

Adds two error sentinels (ErrParentIssueNotFound, ErrProjectNotFound)
so transports can translate to their own error shapes. Handler-level
cross-workspace tests guard the boundary against future regressions.

Co-authored-by: multica-agent <github@multica.ai>

* fix(db): harden Lark migration safety底座 — TTL cap + workspace FK (MUL-2671)

Two storage-layer hardenings that move the must-fix line off "the app
layer enforces it" and onto the schema itself, so future write paths or
hand-inserted rows cannot regress the invariants.

1) lark_binding_token TTL cap. The DB CHECK was 1 hour as
   defense-in-depth while the app constant was 15 minutes; the CHECK
   now matches the product cap (15 minutes). Application constant
   docstring updated to reflect that storage enforces the same bound.

2) lark_user_binding workspace membership. The table previously only
   FK'd to workspace / user / installation independently, so a binding
   could exist for a user no longer in the workspace, or claim a
   workspace different from its installation's. Two composite FKs
   close the gap structurally:

   * (installation_id, workspace_id) → lark_installation(id, workspace_id)
     — guarantees a binding's workspace_id always matches its
     installation's workspace_id. A new UNIQUE (id, workspace_id) on
     lark_installation is added as the FK target.

   * (workspace_id, multica_user_id) → member(workspace_id, user_id)
     with ON DELETE CASCADE — when a user is removed from the
     workspace, the binding cascades away in the same transaction.
     There is no longer a path where lark_user_binding outlives
     workspace membership.

These two FKs are the schema-level proof for §4.3's "unbound or
non-workspace members cannot leak content into chat_session" invariant.

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations/lark): inbound services + /issue dispatcher (MUL-2671)

Lands the inbound service layer for the Lark Bot MVP, sitting on top
of the migration + service-boundary scaffold from the previous commits.
What ships:

- sqlc queries for all seven lark_* tables (idempotent dedup insert,
  CAS WS-lease, single-use binding-token consume, etc.) plus
  GetMostRecentUserChatMessage for the /issue fallback.
- AuditLogger backed by lark_inbound_audit; signature deliberately
  body-free so callers cannot leak content into the drop log.
- ChatSessionService: find-or-create chat_session via the binding
  table (winner-takes-all on the UNIQUE race), append-with-dedup, /issue
  parser, "previous user message" fallback for bare `/issue` invocation.
- Dispatcher orchestrates the inbound pipeline in one place:
  installation routing → group-mention filter → identity check → ensure
  session → append+dedup → /issue → enqueue chat task. Group sessions
  use the installer as creator (stable workspace identity); p2p uses
  the sender. Agent-offline path falls through with OutcomeAgentOffline
  so the WS adapter can reply with the offline notice from §4.6.
- BindingTokenService: random URL-safe token, SHA-256 stored hash,
  15-min TTL pinned at the application AND the DB CHECK; Redeem
  returns the same opaque error for all rejection cases (no timing
  oracle on replay).
- Unit tests for the parser (13 cases), dispatcher (8 cases via fake
  Queries/Chat/Audit/IssueCreator/Enqueuer), and binding-token
  hash/entropy. Real-DB integration tests for OAuth + token redeem
  land alongside the HTTP handlers in the next commit.

Out of scope for this commit (next ones on the same feature branch):
OAuth callback, HTTP routes, WebSocket hub, outbound card patcher,
frontend.

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations/lark): installation HTTP surface + secretbox-gated wiring (MUL-2671)

Lands the HTTP boundary on top of the inbound services from the
previous commit. What ships:

- InstallationService.Upsert: the only path that writes
  lark_installation. Encrypts app_secret with the secretbox passed in
  at construction time; refuses to fall back to plaintext storage
  (returns an error from the constructor if no Box is supplied), so a
  misconfigured dev environment cannot accidentally land a row with
  cleartext credentials. Revoke flips status without DELETE so audit
  trail survives.

- HTTP handlers under /api/workspaces/{id}/lark/:
  * GET  /installations           — member-visible (Integrations tab
                                    renders for non-admins). Soft 200
                                    with empty list + configured:false
                                    when MULTICA_LARK_SECRET_KEY is
                                    unset, so the tab does not error
                                    on self-host that has not opted in.
  * POST /installations           — admin-only; 503 when not configured.
                                    Re-validates agent_id ∈ workspace
                                    before accepting credentials so a
                                    cross-workspace agent UUID is
                                    rejected.
  * DELETE /installations/{id}    — admin-only; workspace-scoped lookup
                                    so one workspace cannot revoke
                                    another's installation by UUID
                                    guess.

- POST /api/lark/binding/redeem (user-scoped, no workspace context):
  the only path that mints a lark_user_binding row from user action.
  Redeemer identity comes from the session, not the token, so a stolen
  link cannot bind an open_id to an attacker's Multica user. The
  composite FK on lark_user_binding cascades the binding away if the
  user is not (or no longer) a workspace member, so a non-member who
  steals the link gets 403 at the DB layer.

- Two new event-bus types in protocol.events:
  EventLarkInstallationCreated, EventLarkInstallationRevoked.

- Router wiring: MULTICA_LARK_SECRET_KEY drives a conditional
  initialization of h.LarkInstallations + h.LarkBindingTokens. When
  unset, the integration disables itself with an INFO log and the
  rest of the server boots normally.

- Handler tests cover all four not-configured short-circuits.
  Happy-path integration tests (real DB, full create→list→revoke
  cycle and token mint→redeem) ship alongside the WS hub PR.

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): close binding-token rebind & typed task errors (MUL-2671)

Two must-fixes from PR review on HEAD 87ad15e1:

1. Binding-token redeem could be used to grab an already-bound Lark
   open_id. Two changes harden the path:
   - lark.sql `CreateLarkUserBinding` now gates ON CONFLICT DO UPDATE
     on `multica_user_id = EXCLUDED.multica_user_id`, so a cross-user
     rebind via a second valid token returns zero rows instead of
     silently switching ownership.
   - `BindingTokenService.RedeemAndBind` consumes the token and writes
     the binding row inside one transaction. A failed bind no longer
     burns the token; a successful bind never leaves a consumed-but-
     unused token. Distinct typed errors: ErrBindingTokenInvalid (410),
     ErrBindingAlreadyAssigned (409), ErrBindingNotWorkspaceMember
     (403). The handler maps each to its own status code.

2. Dispatcher collapsed every `EnqueueChatTask` error to
   `OutcomeAgentOffline`, hiding infra failure and misusing the
   "offline" label for cases (e.g. archived agent) where it doesn't
   fit. Now:
   - `service.EnqueueChatTask` returns `ErrChatTaskAgentNoRuntime` and
     `ErrChatTaskAgentArchived` as sentinel errors; DB / load / insert
     failures stay wrapped as ordinary errors.
   - Dispatcher uses `errors.Is` to map only the productizable cases
     (`OutcomeAgentOffline`, new `OutcomeAgentArchived`); any other
     error is returned to the WS adapter so it can retry or page
     instead of disguising the outage as an offline card.

A daemon that's merely disconnected is still NOT an error — as long
as `agent.runtime_id` is set the chat task enqueues and waits for the
daemon to claim it on next online (returns `OutcomeIngested`).

Co-authored-by: multica-agent <github@multica.ai>

* ci: re-trigger workflow on lark MVP must-fix HEAD

Co-authored-by: multica-agent <github@multica.ai>

* ci: re-trigger workflow on lark MVP must-fix HEAD (retry)

Co-authored-by: multica-agent <github@multica.ai>

* test(integrations/lark): guard binding-token sentinel contract (MUL-2671)

Two unit tests that document and protect the must-fix invariants
without requiring a DB:

1. TestRedeemAndBindRequiresTxStarter — if a future refactor wires
   up BindingTokenService without a TxStarter, RedeemAndBind must
   fail fast with a clear error rather than nil-panic on Begin.
   The atomicity contract (consume + bind commit together) depends
   on that transaction existing.

2. TestBindingErrorSentinelsAreDistinct — the HTTP handler maps
   ErrBindingTokenInvalid → 410, ErrBindingAlreadyAssigned → 409,
   ErrBindingNotWorkspaceMember → 403. Accidentally aliasing them
   (e.g. var ErrBindingAlreadyAssigned = ErrBindingTokenInvalid)
   would silently regress the response codes without any other
   test catching it.

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations/lark): WS hub orchestrator + outbound card patcher (MUL-2671)

The hub owns one supervisor goroutine per active installation. Each
supervisor acquires the WS lease via the existing CAS query, runs an
EventConnector (interface — real Lark wire protocol lands in a follow-up
behind it), renews the lease on a tighter cadence than the TTL, and
backs off (with jitter) on connector failure. Lease loss tears the
connector down cleanly; revocation is reaped on the next sweep. Per-
process node id satisfies §4.4 multi-replica safety: at most one Hub
globally holds the lease for any installation.

The patcher subscribes to task / chat-done events on the existing
events.Bus and keeps the per-task Lark interactive card in sync
(thinking → streaming → final | error). Card binding is per-task as
required by §4.5; throttled patches via an in-memory last-patched map;
final / error transitions bypass the throttle so the user always sees
the terminal state. The Renderer is plug-replaceable so the product
card template can evolve without touching transport.

The APIClient interface centralizes the Lark Open Platform surface
this package needs (send card, patch card, send binding prompt,
exchange OAuth code). The default stubAPIClient returns
ErrAPIClientNotConfigured for every transport call so a misconfigured
deployment fails loudly instead of dropping cards silently. Real
implementation lands in a follow-up; OAuth callback + frontend entries
land in the next commits on this branch.

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations/lark): OAuth install start / callback (MUL-2671)

OAuthService builds a signed-state Lark authorization URL the frontend
can render as a QR (or open directly), then on callback verifies the
HMAC-protected state, exchanges the OAuth code for installation
credentials via APIClient.ExchangeOAuthCode, and persists the row via
InstallationService.Upsert (which keeps app_secret encryption inside a
single chokepoint).

State token format: workspaceID.agentID.initiatorID.expiresUnix.nonce.sig
— HMAC-SHA256 over the first five fields with a deployment-level
secret. TTL defaults to 10 minutes (covered by tests). Three failure
modes (invalid state / expired state / missing code) map to typed
errors so the HTTP handler can emit a single lark_error= query param
the frontend uses to pick copy.

Both endpoints degrade cleanly: the at-rest key gate (already in place)
returns 503 from /install/start when the InstallationService is nil,
and the OAuth gate (MULTICA_LARK_OAUTH_APP_ID / _SECRET / _REDIRECT_URI
/ _STATE_SECRET) returns configured:false from /install/start so the
frontend can render "configure manually instead" without an error
banner. /install/callback always finishes with a redirect to
/settings?tab=lark carrying either lark_installed=1 or lark_error=<code>.

Tests cover signed-URL shape, missing-config rejection, tampered state,
expired state, propagated exchange error, and the no-config redirect
path on the HTTP handler.

Co-authored-by: multica-agent <github@multica.ai>

* feat(views/lark): settings tab + agent bind button + /lark/bind redemption page (MUL-2671)

Adds the user-facing Lark surface across the shared packages:

- packages/core/types/lark.ts — wire shapes that mirror server/internal/
  handler/lark.go. Optional fields default to undefined so older desktop
  builds keep parsing if the server adds new keys (CLAUDE.md → API
  Response Compatibility).
- packages/core/lark/{queries,index}.ts — Tanstack Query options keyed
  by workspace id; realtime sync invalidates `installations(wsId)` on
  `lark_installation:*` events.
- packages/core/api/client.ts — listLarkInstallations,
  getLarkInstallURL, deleteLarkInstallation, redeemLarkBindingToken.
- packages/views/settings/components/lark-tab.tsx — Settings → Lark
  panel. Listing is member-visible (matches backend); disconnect is
  admin-only. Empty state points users at the per-Agent bind entry,
  matching the (workspace_id, agent_id) UNIQUE: there is no
  "pick an agent" UI here because the bind URL is per-agent.
- LarkAgentBindButton (same file) is the per-Agent CTA the Agent
  detail page imports. Opens the OAuth URL in a new tab; the callback
  bounces back to /settings?tab=lark with a query param the panel
  reads for inline confirmation copy.
- packages/views/lark/bind-page.tsx — the Bot's "you need to bind"
  destination. Requires session before redeeming, distinguishes the
  410/409/403 backend responses into distinct copy.
- apps/web/app/lark/bind/page.tsx — Next.js route wrapping the shared
  bind page in a Suspense boundary (Next 15 useSearchParams rule).

i18n: all user-facing strings land in en/zh-Hans, settings tab nav
includes a Sparkles-iconed Lark entry, bind-page copy lives under
common.lark_bind so it works pre-workspace-context too. typecheck +
lint clean.

Co-authored-by: multica-agent <github@multica.ai>

* chore(integrations/lark): wire outbound Patcher into server bootstrap (MUL-2671)

Constructs the Patcher next to the existing Installation/BindingToken
wiring in router.go and Register()s it on the event bus. With the stub
APIClient any actual transport call surfaces ErrAPIClientNotConfigured;
once the real Lark client lands, swap NewStubAPIClient for the real
implementation here without touching the Patcher's subscription logic.

doc.go updated to reflect everything the package now contains (Hub,
Patcher, OAuthService, APIClient interface). The Hub itself is NOT
booted here yet — it needs an EventConnector implementation for the
Lark long-connection wire protocol, which lands in a follow-up; the
orchestrator code and its unit tests are in place so that follow-up
can focus on the WS protocol rather than lifecycle plumbing.

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): address Elon 二审 5 must-fix items (MUL-2671)

- Hub: renewer cancels run ctx on lease loss so the connector exits
  even if its wire I/O is blocked, keeping the §4.4 ownership
  invariant intact under lease theft.
- Hub: EventEmitter returns (DispatchResult, error) so the real
  connector can post the matching Lark-side card (needs_binding,
  agent_offline, agent_archived) and react to infra failures instead
  of silently logging at the seam.
- Dispatcher: top-level message_id dedup runs before group filter
  and identity check, so a reconnect storm cannot re-fire binding
  prompts or re-spam not_addressed_in_group audit rows; the in-
  AppendUserMessage dedup is removed since the table-level UNIQUE
  is the ultimate backstop.
- OAuth: HandleCallback auto-binds the installer via the new
  InstallerBinder seam (BindingTokenService implements it), so the
  §2.1 "scan to bind, you're done" promise holds end-to-end.
  validateExchangeResult now requires installer open_id; new error
  reason codes wired through the callback redirect.
- Frontend / handler: install_supported listing field + StartLark-
  Install short-circuit on stub APIClient hide install entry points
  (Settings tab + per-agent button) while no real Lark HTTP client
  is wired, so users do not land in an OAuth flow that fails at
  exchange.

Includes tests for each fix (lease-loss cancel, emit error
propagation, dedup ordering, OAuth installer-bind contract, stub-
client install gate) and i18n strings for the new preview state.

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): two-phase dedup so infra failures do not swallow messages (MUL-2671)

The pre-fix top-level dedup wrote the lark_inbound_message_dedup row before
EnsureChatSession / AppendUserMessage. An infra error in either step left
the row in place and a WS-adapter retry was mis-classified as a duplicate,
so the user's Lark message was permanently lost without ever landing in
chat_session.

Make dedup two-phase:

  - ClaimLarkInboundDedup acquires an in-flight claim (processed_at NULL).
    Stale claims older than 60 s are re-takeable so a process crash does
    not strand the message_id.
  - MarkLarkInboundDedupProcessed flips processed_at on durable success
    (audit row OR chat_message + session touch).
  - ReleaseLarkInboundDedup deletes the in-flight row on infra failure
    before any durable side effect, so the retry can re-claim immediately.

Dispatcher.Handle now finalizes the claim exactly once based on whether
the inner pipeline reached a durable outcome — chat_message commit being
the transition point (errors past it Mark, errors before it Release).

Regression tests cover the two failure variants Elon flagged plus the
inverse invariants (durable-error Marks, drops Mark, in-flight replays
drop, stale claims re-claim).

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): owner-fence dedup claim to close the double-write windows (MUL-2671)

The two-phase Claim/Mark/Release fix from the previous commit closed the
"infra error swallows a replay" gap but left two windows that could still
write a chat_message twice for the same Lark message_id:

  1. Stale-reclaim race. Worker A claims at t=0, runs slowly past the
     60 s staleness TTL but is still alive. Worker B sees the row as
     stale and re-takes the claim. A reaches AppendUserMessage and
     commits a second chat_message.

  2. Mark window. Worker A commits chat_message but the post-pipeline
     MarkLarkInboundDedupProcessed fails (DB hiccup) or the process
     crashes before it runs. 60 s later a retry treats the in-flight
     row as stale, re-claims it, and writes a second chat_message.

Close both with owner fencing + same-tx Mark:

  - lark_inbound_message_dedup now carries a `claim_token` UUID;
    ClaimLarkInboundDedup mints a fresh one on insert and on stale
    re-take, so a reclaim ROTATES the token.

  - MarkLarkInboundDedupProcessed and ReleaseLarkInboundDedup are
    fenced on (message_id, claim_token, processed_at IS NULL) and
    return rowsAffected. Zero means our token is no longer live, and
    the caller treats it as a no-op (not an error).

  - AppendUserMessage invokes MarkLarkInboundDedupProcessed INSIDE its
    chat_message+session tx (qtx). If the token has been rotated by a
    concurrent reclaim, the Mark matches zero rows and the method
    returns ErrClaimLost; the deferred Rollback unwinds the
    chat_message insert, so the other holder is the sole writer. The
    durable write and the Mark therefore commit (or roll back)
    atomically — there is no "committed but not yet Marked" window
    for a crash or retry to exploit.

Dispatcher.processClaimed now returns a tri-state dedupFinalize directive
(none / mark / release): finalizeNone for the in-tx Mark path (and
ErrClaimLost), finalizeMark for audit-drop branches and the defensive
post-Append-success fallback, finalizeRelease for pre-durable infra
errors. ErrClaimLost is translated into OutcomeDropped + DropReason-
Duplicate at the Handle boundary, matching what the WS adapter expects
for a "another worker is the writer" outcome.

Regression tests:

  - TestDispatcher_StaleReclaimRaceDoesNotDoubleWrite injects worker
    B's reclaim via a beforeAppend hook so the claim_token rotates
    between Claim and AppendUserMessage. Asserts worker A's
    AppendUserMessage returns ErrClaimLost (no chat_message
    committed), the dispatcher surfaces a duplicate drop, the token
    rotated to a value distinct from A's original, and a follow-up
    replay still duplicate-drops.

  - TestDispatcher_InTxMarkPreventsPostCommitReclaim verifies the
    "Mark window" case is unreachable: a successful in-tx Mark
    produces exactly one Mark call (no post-finalize duplicate), the
    row is terminal, and a retry with dedupReclaim=true still
    duplicate-drops without re-rotating the token.

  - TestDispatcher_InTxMarkSucceedsAndSkipsPostFinalize pins the
    positive contract: DedupMarked=true must make applyFinalize a
    no-op (no extra Mark, no Release).

fakeQueries gains a fakeDedupRow model carrying (processed, token,
rotations) so the test seam matches production's UPDATE-with-WHERE
semantics; fakeChat gains a beforeAppend hook to inject race timing.

go test ./... and go vet ./... pass.

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations/lark): real Lark HTTP APIClient for IM v1 send/patch (MUL-2671)

Lands the production Lark Open Platform HTTP APIClient that replaces
the stub for outbound transport. The patcher's "thinking → streaming
→ final | error" card lifecycle and the dispatcher's binding-prompt
card both now reach Lark for real once MULTICA_LARK_HTTP_ENABLED=true.

Scope of this stage:

  - tenant_access_token retrieval via /open-apis/auth/v3/
    tenant_access_token/internal, cached in-process per app_id with a
    60s safety margin against Lark's `expire` value. Sub-2-minute
    expires are clamped to 120s so we never cache an entry that's
    already past its safe window.
  - SendInteractiveCard: POST /open-apis/im/v1/messages?receive_id_type=chat_id
    returning the Lark message_id the Patcher persists in
    lark_outbound_card_message for later patches.
  - PatchInteractiveCard: PATCH /open-apis/im/v1/messages/:id with
    the full re-rendered card body (Lark's update endpoint replaces,
    not deep-merges).
  - SendBindingPromptCard: open_id-targeted interactive card with a
    primary "去绑定" CTA pointing at the redemption URL. Template is
    co-located with the transport so the dispatcher never has to know
    about Lark's card schema.
  - Token-error invalidation: Lark codes 99991663 (expired) /
    99991664 (invalid) drop the cached token so the next call
    refreshes from /tenant_access_token/internal instead of looping
    on a stale entry.

Out of scope (deferred to follow-up stages):

  - ExchangeOAuthCode stays unimplemented behind
    ErrAPIClientNotConfigured. The PersonalAgent install handshake's
    response shape (returning per-installation app credentials in a
    single call) is not yet verified against the production endpoint,
    and a silent mis-fill of OAuthExchangeResult would corrupt
    lark_installation rows past validateExchangeResult. Operators
    continue to use the manual-paste InstallationService path until
    the OAuth stage lands.
  - Inbound WS EventConnector — Hub's ConnectorFactory still needs a
    real wire-protocol implementation.

Wiring:

  - MULTICA_LARK_HTTP_ENABLED=true switches router.go from the stub
    to the real client. MULTICA_LARK_HTTP_BASE_URL overrides the
    default open.feishu.cn host (set to open.larksuite.com for the
    Lark international tenant, or to an httptest URL for integration
    tests).
  - The OAuth handler now also receives the real client (its
    ExchangeOAuthCode still surfaces ErrAPIClientNotConfigured, so
    callback behavior is unchanged until that stage lands).

Tests (19 new cases against an httptest.Server fake):

  - happy path send/patch/binding-prompt round trips, asserting URL
    query params, body shape, Authorization header
  - token cache: 3 sends share one /tenant_access_token/internal hit
  - token refresh after clock-driven expiry
  - sub-margin expire clamping (10s expire → cached for >= safety
    margin of wall-clock)
  - Lark error code surfacing (230001 send, 230002 patch, 10003 auth)
  - token-expired (99991663) invalidates the cache; caller's retry
    re-fetches and succeeds
  - non-2xx HTTP status surfaces "http 500: …"
  - input validation: missing chat_id short-circuits BEFORE auth
    round-trip, missing card json / open_id / bind url all fail
    pre-flight without hitting Lark
  - ExchangeOAuthCode still returns ErrAPIClientNotConfigured
  - binding-prompt template carries the BindURL and the localized
    "去绑定" CTA in valid JSON

go build ./..., go vet ./..., and go test ./internal/integrations/lark/...
pass. Pre-existing handler/router integration tests that require a
real Postgres connection are unaffected by this change.

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): split outbound vs OAuth-install capability + card update_multi (MUL-2671)

Address Elon's two must-fix items from the HEAD a09993b1 review:

1. HTTP outbound and OAuth-install are now distinct APIClient
   capabilities. The new SupportsOAuthInstall() reports whether the
   install flow can succeed end-to-end (i.e. ExchangeOAuthCode is
   implemented); the real httpAPIClient still returns IsConfigured()
   = true (send / patch / binding prompt work) but
   SupportsOAuthInstall() = false until the PersonalAgent install-time
   response shape is pinned. Handler-side `install_supported` and
   StartLarkInstall now gate on SupportsOAuthInstall, so a half-wired
   client never reveals the scan-to-bind UI. larkOAuthErrorReason also
   maps ErrAPIClientNotConfigured to a dedicated
   `oauth_exchange_unimplemented` reason so a raw callback hit no
   longer masquerades as `internal_error`.

2. defaultRenderer now emits config.update_multi=true on every Kind.
   Lark refuses to apply PatchInteractiveCard to a card whose initial
   config doesn't declare it shared/updatable, so the absent flag
   would make every patch after the first send silently no-op on the
   wire while the local outbound status row still flipped to
   streaming/final.

Tests cover both halves of each fix:
- TestHTTPClient_SupportsOAuthInstall_FalseUntilExchangeLands +
  TestHTTPClient_StubReportsBothCapabilitiesFalse pin the new
  capability surface.
- TestStartLarkInstall_TransportOnlyClientReportsNotConfigured +
  TestListLarkInstallations_TransportOnlyClientReportsInstallNotSupported
  pin the handler gate at exactly the half-wired state.
- TestLarkOAuthErrorReason_APIClientNotConfigured pins the mapping
  for both the bare sentinel and the fmt.Errorf-wrapped form
  HandleCallback produces.
- TestDefaultRendererConfigCarriesUpdateMulti covers every CardKind.
- TestHTTPClient_(Send|Patch)InteractiveCard_DefaultRendererBodyHasUpdateMulti
  verify the wire body Lark actually receives carries update_multi
  through both send and patch transport paths.

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations/lark): real OAuth code exchange + agent-detail bind entry (MUL-2671)

Stages the install side of the MVP critical path on top of the real
HTTP outbound work:

- httpAPIClient.ExchangeOAuthCode runs the production Lark v2 OAuth
  flow: POST /authen/v2/oauth/token to swap the authorization code
  for the installer's open_id, then GET /bot/v3/info under the parent
  app's tenant_access_token to fetch bot_open_id. Result feeds
  InstallationParams unchanged so OAuthService.HandleCallback's
  auto-bind step lights up automatically.

- HTTPClientConfig gains OAuthAppID/OAuthAppSecret, read from the same
  MULTICA_LARK_OAUTH_APP_ID/_APP_SECRET env vars the OAuthConfig
  consumes. SupportsOAuthInstall now mirrors that pair so the install
  capability gate is honest: outbound transport without OAuth creds
  reports configured-but-not-install-supported, exactly like before.

- Agent detail inspector wires the LarkAgentBindButton in a new
  Integrations section, viewer-hidden by canEdit. The button still
  self-hides when SupportsOAuthInstall is false, so a deployment
  without OAuth creds renders the section empty rather than CTA-broken.

- Capability wording cleaned across handler / router / lark-tab to say
  "OAuth-install capability" instead of "real APIClient wired", and
  the misleading TransportOnly... test was renamed/refocused on the
  early-return branch it actually exercises (Elon non-blocking note).

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): identity-only OAuth + atomic bind (MUL-2671)

Addresses Elon's round-4 must-fix items on PR #3277:

1. OAuth v2 token → user_info chain now matches Lark's official
   user-OAuth shape. `httpAPIClient.ExchangeOAuthCode` POSTs
   /open-apis/authen/v2/oauth/token (RFC 6749: top-level
   access_token, NO open_id), then GETs /open-apis/authen/v1/user_info
   with the user_access_token as Bearer to obtain the installer's
   open_id / union_id. The test fixture now reflects the real
   wire shape (separate user_info handler; no synthetic open_id in
   the token response).

2. `OAuthExchangeResult` is identity-only — drops the synthesized
   shared-parent AppID / AppSecret / BotOpenID return that broke
   the UNIQUE(app_id) constraint and the dispatcher's per-app_id
   routing. `OAuthService.HandleCallback` no longer Upserts an
   installation row: it looks up the lark_installation already
   provisioned via the manual-paste POST /lark/installations route
   and binds the installer onto it. Two new typed errors —
   ErrInstallationNotProvisioned and ErrInstallationRevoked — map
   to `installation_not_provisioned` / `installation_revoked`
   reasons at the HTTP boundary so the UI can guide the admin.
   The PersonalAgent install API (which would deliver
   per-installation bot credentials at scan time) remains a
   follow-up; until it lands the OAuth flow is identity-binding
   only and the agent-detail bind button stays hidden on
   deployments without OAuth env (capability gate unchanged).

3. The installation lookup + installer bind run inside a single
   DB transaction so a concurrent revoke / re-provision between
   the read and the binding insert cannot leak a half-applied
   state. `InstallerBinder.BindInstaller` is renamed to
   `BindInstallerTx` and accepts the OAuth-service-owned
   transaction's qtx; the binding_token redemption path is
   unchanged.

4. `validateExchangeResult` is simplified to require only the
   installer's open_id; the obsolete ErrExchangeMissingAppID /
   AppSecret / BotOpenID sentinels are removed (no caller can
   trip them now). The oauth_test suite is rewritten to use a
   stub failTxStarter so tests covering state-token verification
   and exchange-error propagation remain DB-free, while a new
   TestOAuthCallbackOpensTxAfterValidExchange pins the post-must-fix
   order (state ok + exchange ok ⇒ Begin runs before any lookup
   or bind, and a Begin failure aborts cleanly with no bind).

Verified locally:
  - go build ./... / go vet ./... clean
  - go test ./internal/integrations/lark/... ✓
  - go test ./internal/handler -run 'Lark|Binding|OAuth' ✓
  - go test ./internal/util/secretbox/... ./internal/service/... ✓

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations/lark): device-flow scan-to-install (MUL-2671)

Replaces the manual paste-credentials install path + identity-only
OAuth callback (rejected in product review: too many steps before a
user sees value) with a true single-step scan-to-install built on
Lark's RFC 8628 device-flow registration endpoint
(POST accounts.feishu.cn/oauth/v1/app/registration) — the same
protocol the official larksuite/oapi-sdk-go/scene/registration
package and zarazhangrui/feishu-claude-code-bridge use.

User journey: admin clicks "Bind to Lark" on the Agent detail page
→ QR dialog opens → admin scans in the Lark app on their phone →
authorizes the new PersonalAgent → dialog auto-closes with the new
installation visible. No app_id / app_secret to copy, no Lark
developer console visit, no Multica-side OAuth env to configure.

Backend (server/internal/integrations/lark):
- registration.go — inline ~280-line RFC 8628 client. Begin posts
  archetype=PersonalAgent / auth_method=client_secret /
  request_user_info=open_id; Poll follows the upstream SDK's
  state machine including the tenant-brand mid-stream domain swap
  to accounts.larksuite.com when a Lark-international account
  authorizes. SDK is NOT vendored — one endpoint isn't worth
  dragging the full oapi-sdk-go + transitive deps.
- registration_service.go — owns the in-process session store
  + background polling goroutine. On success calls APIClient.GetBotInfo
  (the new IM-side endpoint added below) and writes
  lark_installation + the installer's lark_user_binding inside
  one DB transaction so a half-applied install can never land.
  Stable error_reason codes (expired / access_denied /
  lark_protocol_error / bot_info_failed / installation_conflict /
  installer_bind_failed / internal_error) drive the UI copy
  without parsing prose.
- client.go / http_client.go — drops ExchangeOAuthCode and
  SupportsOAuthInstall (no longer applicable: device-flow returns
  identity alongside credentials in one response); adds GetBotInfo
  which mints a tenant_access_token from the freshly-minted
  client_id / client_secret and calls /open-apis/bot/v3/info for
  the bot_open_id. install_supported now gates on IsConfigured()
  (real HTTP client wired) instead of a separate OAuth capability.
- binding_token.go — absorbs InstallerBindParams / InstallerBinder
  (previously in oauth.go), retargets the doc-comment from the
  OAuth caller to the device-flow caller.
- Deletes oauth.go + oauth_test.go entirely.

Handler & router (server/internal/handler, server/cmd/server):
- POST /api/workspaces/{id}/lark/install/begin — opens a new
  registration session, returns {session_id, qr_code_url,
  expires_in_seconds, poll_interval_seconds}. Admin-only.
- GET /api/workspaces/{id}/lark/install/{sessionId}/status —
  polling endpoint, returns {status, installation_id?, error_reason?,
  error_message?}. Workspace-scoped lookup so a stolen session_id
  cannot be polled from another workspace. Admin-only.
- Removes POST /lark/installations (paste form),
  GET /lark/install/start (OAuth-redirect entry), and
  GET /api/lark/install/callback (OAuth redirect target).
- Removes MULTICA_LARK_OAUTH_APP_ID / _APP_SECRET / _REDIRECT_URI /
  _STATE_SECRET / _AUTHORIZE_URL / _SUCCESS_URL env vars. Self-host
  operators no longer need a parent Lark app at all.

Frontend (packages/core, packages/views):
- New types BeginLarkInstallResponse / LarkInstallStatusResponse
  + matching API methods (beginLarkInstall / getLarkInstallStatus);
  drops getLarkInstallURL.
- LarkAgentBindButton opens LarkInstallDialog instead of a
  window.open() to Lark's authorize page. The dialog uses
  react-qr-code (catalog) to render the verification_uri_complete
  inline as SVG (no external CDN image), polls status at the
  server-supplied cadence, auto-closes on success, offers
  "scan again" on terminal failure. Per CLAUDE.md "Enum drift
  downgrades, not crashes", error_reason switch has a default
  fallback so an older desktop build on a newer server still
  renders the generic failure copy.
- Adds the device-flow strings to en + zh-Hans settings.json;
  removes the obsolete OAuth-not-configured copy.

Verified locally:
  - go build ./... / go vet ./... clean
  - go test ./internal/integrations/lark/... — all green
    (existing tests + 15 new registration / GetBotInfo tests)
  - go test ./internal/handler -run 'Lark|Binding' — all green
  - pnpm typecheck — all 6 packages clean
  - pnpm lint — 0 errors (15 pre-existing warnings, none in changed files)
  - pnpm --filter @multica/views test — 859/859 pass

Pre-existing failures in server/internal/middleware (column
"profile_description" missing from local test DB) reproduce against
the parent commit and are unrelated to this change.

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): gate bind CTA to workspace admins, terminate QR polling on 4xx (MUL-2671)

Two frontend must-fixes from the PR #3277 二审:

1. LarkAgentBindButton now self-hides for non-admin viewers in addition
   to the existing install_supported check. The agent-detail page mounts
   the button under `canEdit`, which canEditAgent lets agent owners
   through even when they are not workspace admins — but the backend
   gates POST /lark/install/begin and the status poll on owner/admin
   (router.go:478-487), so the previous behavior shipped a CTA that was
   guaranteed to 403. The new gate reads workspace role from the same
   member list the settings tab already uses.

2. The status polling loop now terminates on 404 (session gone — server
   restarted, multi-instance routing, or in-process GC swept it) and
   403/401 (permission revoked mid-session). Previously every error
   path scheduled another setTimeout, which trapped the user on a stale
   QR forever. ApiError gives us the HTTP status verbatim; terminal
   responses set status=error with stable error_reason codes
   (session_lost, forbidden) that flow through the existing dialog
   switch + retry/close affordances. 5xx + network blips still retry.

i18n: new install_error_session_lost / install_error_forbidden in en
and zh-Hans, with default fallback preserved per the enum-drift rule.

Coverage: 6 new vitest cases — admin/owner allow, member deny,
unsupported-install deny, and the two terminal-error polling paths
using fake timers to assert the loop stops scheduling.

Also clears a handful of stale OAuth/manual-install doc comments
flagged in the review (non-blocker cleanup): doc.go's §10 now points
at RegistrationService, installation.go's input-shape doc loses the
OAuth-callback half, and client.go's stubAPIClient comments no longer
reference OAuth callbacks.

Co-authored-by: multica-agent <github@multica.ai>

* docs(integrations/lark): describe gate as device-flow install in agent-detail integrations comment (MUL-2671)

The comment block above the agent-detail Integrations section still
described the capability gate as 'server-side OAuth-install'. The
OAuth path is gone — install is now device-flow per RFC 8628 — so the
comment now reads 'server-side device-flow install capability gate'.

Pure comment change; behavior is unchanged. Cleans up the nit Elon
called out in PR #3277 二审 (MUL-2671).

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations/lark): wire inbound pipeline + WS Hub at boot (MUL-2671)

Stage 3.a of MUL-2671. Hub class, Dispatcher, ChatSessionService and
AuditLogger have all been implemented and tested in prior PRs but
none of them was constructed at boot, so the in-process plumbing
was never exercised end-to-end. This change wires them together
behind the same `MULTICA_LARK_SECRET_KEY` gate that already gates
InstallationService / RegistrationService, and starts the Hub under
the existing `sweepCtx` so it winds down alongside the other
long-running workers after HTTP drain.

The real long-conn EventConnector is still pending; the factory
hands every supervisor a shared NoopConnector that holds the lease
and emits nothing. That lets staging exercise the lease /
supervisor / shutdown lifecycle against real DB rows without
committing to the Lark wire protocol implementation. Swapping in
the real connector is a single line change in the same router
block; the Dispatcher / ChatSessionService / Hub seams stay frozen.

## Why a noop placeholder, not a stub-or-skip

The Hub's value is mostly its lifecycle: §4.4 ownership lease,
LeaseRenewInterval / LeaseTTL, supervisor reap on revoke, clean
release on shutdown. None of that runs unless the Hub is actually
started. Holding off until the real connector lands means the next
PR has to debut both pieces simultaneously; wiring the supervisor
loop first lets the real connector PR be a focused, reviewable
swap.

## Changes

- `internal/integrations/lark/noop_connector.go` — `NoopConnector`
  implementing `EventConnector`: blocks on ctx until the Hub
  cancels (lease loss / shutdown / revoke), emits no events, logs
  on enter/exit so operators see exactly which installation the
  supervisor is holding the lease for.
- `internal/integrations/lark/noop_connector_test.go` — verifies
  the connector blocks until ctx cancel, returns nil on clean exit,
  never invokes the emit callback, and the factory shares a single
  connector instance across installations.
- `internal/handler/handler.go` — new `LarkHub *lark.Hub` field on
  `Handler`. Nil when the Lark integration is disabled.
- `cmd/server/router.go` — inside the existing Lark wiring block,
  construct `AuditLogger`, `ChatSessionService` (with `*pgxpool.Pool`
  for the in-tx dedup Mark), `Dispatcher` (wiring `h.IssueService`
  and `h.TaskService` so `/issue`-created issues share counter /
  duplicate guard / project boundary / broadcast / analytics with
  the rest of the product), and the `Hub` with the
  `NoopConnectorFactory`. `NewRouterWithOptions` now returns
  `(chi.Router, *handler.Handler)` so main.go can drive Hub
  lifecycle; `NewRouter` discards the handler.
- `cmd/server/main.go` — start the Hub under `sweepCtx` after the
  other background workers, and `Wait` on it after HTTP drain +
  sweep cancel so the lease renewer can issue a final release
  before exit. Skipped entirely when `h.LarkHub == nil`.

## Test plan

- [x] `go build ./...` clean
- [x] `go vet ./...` clean
- [x] `go test ./internal/integrations/lark/...` (new noop tests +
      existing hub / dispatcher / chat_service / registration /
      binding_token / outbound / issue_command suites) — all pass
- [x] `go test ./internal/handler -run 'TestLark|TestRedeemLarkBinding'`
      pass — handler-side Lark surfaces unchanged
- [x] `go test ./internal/service/... ./internal/util/secretbox/...`
      pass
- [x] `pnpm --filter @multica/views exec vitest run settings/components/lark-tab`
      pass (6/6) — frontend lark surfaces unchanged
- [ ] Local broad `go test ./internal/handler/...` still blocked by
      the pre-existing test DB schema drift Elon flagged in the
      previous round (`column "metadata" does not exist`,
      unrelated to this change); CI is the authoritative check.
- [ ] Manual end-to-end deferred until the real long-conn
      EventConnector lands (next stage).

MUL-2671

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): bound Hub lease release + shutdown wait (MUL-2671)

Lease release used context.Background(); a stalled DB pool could pin
shutdown indefinitely. Add LeaseReleaseTimeout (5s default) and
ShutdownTimeout (15s default) to HubConfig, route releaseLease through
a bounded context, and expose WaitWithTimeout for main.go so a wedged
supervisor degrades to LeaseTTL expiry on the next replica instead of
blocking process exit. Also correct the LarkHub field comment in
handler.go: the Hub is wired whenever the at-rest secret key is set,
independent of whether the outbound HTTP APIClient is configured.

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations/lark): real WS long-conn connector + ctx-cancel-breaks-read (MUL-2671)

Replaces NoopConnectorFactory with a production EventConnector that
opens Lark's event-subscription WebSocket. Gated behind
MULTICA_LARK_WS_ENABLED so staging boots stay on the noop path until
operators opt in, and falls back to noop with a warning when the WS
flag is set without MULTICA_LARK_HTTP_ENABLED (the real connector
needs the cached tenant_access_token).

Why this connector exists separately from the Hub: gorilla/websocket
ReadMessage blocks on the underlying TCP socket and does not observe
context. The watchdog goroutine inside WSLongConnConnector.Run closes
the conn the moment ctx fires, so lease loss / shutdown breaks the
blocking read in bounded time — exactly the invariant Hub
renewLeaseUntil's runCancel depends on for the "at most one active WS
per installation across replicas" guarantee. Tests cover this
explicitly (TestWSConnectorRunReturnsOnCtxCancelEvenWhenReadIsBlocked).

The Lark wire surface is split into three swappable seams so the
transport layer stays tested in isolation:

  - EndpointFetcher (POST /event-subscription/v1/connection_token)
    resolves a one-shot wss URL per Run. No caching — replaying a
    one-shot token would look like a Lark outage.
  - FrameDecoder turns one raw JSON envelope into an InboundMessage
    or a "control / heartbeat / drop" verdict. Decoder errors log
    + drop the frame; they do NOT tear down the connection.
  - CredentialsProvider wraps InstallationService.DecryptAppSecret
    so plaintext app_secret lives in memory only during a Run.

Also fixes the handler.go LarkHub comment: it still said "joins on
Wait during graceful shutdown" but main.go has used WaitWithTimeout
(bounded wait) for several commits. Comment now matches.

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations/lark): align WS to official binary Frame protocol + DispatchResult outbound replies (MUL-2671)

Two must-fix items from Elon's review of PR #3277:

1. WS protocol layer rewritten to match the official Lark Go SDK
   (`larksuite/oapi-sdk-go/v3/ws`):
   - Bootstrap is `POST /callback/ws/endpoint` with AppID/AppSecret
     in the body (no tenant_access_token bearer). Response carries
     wss URL + ClientConfig (PingInterval / ReconnectInterval /
     ReconnectNonce / ReconnectCount).
   - `service_id` is parsed from the wss URL query and used as
     Frame.Service on every outbound frame.
   - Wire envelope is the binary protobuf `pbbp2.Frame` (hand-rolled
     via protowire to avoid pulling the whole SDK in, byte-identical
     field tags). JSON payloads are nested inside Frame.Payload.
   - Inbound data frames are ACKed with a `Response{code:200,...}`
     JSON payload that reuses the inbound headers; infra failures
     produce code=500 so Lark retries.
   - Ping is the app-layer binary `NewPingFrame(serviceID)` at the
     server-supplied cadence; WebSocket protocol PING is removed
     (Lark ignores it). Server-initiated pings get a pong reply.
   - ctx-cancel-breaks-read invariant preserved via the watchdog
     goroutine that closes the conn on ctx.Done; the read loop and
     ping goroutine serialize their writes through a single mutex.

2. `DispatchResult` outbound replies wired via a new `OutcomeReplier`:
   - `OutcomeNeedsBinding` mints a one-shot binding token and sends
     the binding prompt card to the sender's open_id.
   - `OutcomeAgentOffline` / `OutcomeAgentArchived` push a notice
     card into the chat with the agent name + Chinese copy matching
     §4.6.
   - `OutcomeIngested` stays owned by the Patcher; `OutcomeDropped`
     is silent.
   - The replier is best-effort: outbound failures are logged and
     swallowed so a Lark outage cannot stall the inbound pipeline.
   - Hub installs the noop replier by default; router wires the
     production `LarkOutcomeReplier` when APIClient.IsConfigured().

PersonalAgent long-conn risk surfaced (open per Feishu docs:
`长连接模式仅支持企业自建应用`). The implementation works for any
app archetype; the open question is whether `/callback/ws/endpoint`
accepts PersonalAgent credentials in practice. Surfacing the Lark
code+msg verbatim from the bootstrap response so an operator running
the smoke test sees the exact failure rather than a generic timeout.

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): byte-compat Frame marshal, chunk reassembly, ACK off reply critical path (MUL-2671)

Three protocol blockers from Elon's review of 9540008a:

1. Frame.Marshal is now byte-identical to oapi-sdk-go/v3/ws/pbbp2.Frame:
   - SeqID/LogID/Service/Method (proto2 req) emit unconditionally even at zero
   - PayloadEncoding/PayloadType/LogIDNew emit unconditionally per gogo
     generated MarshalToSizedBuffer (no zero-guard)
   - Payload uses the SDK's `!= nil` guard (nil omits, []byte{} emits 0-length)
   - ACK payload JSON matches SDK's NewResponseByCode + json.Marshal output
     ({"code":N,"headers":null,"data":null})

   Golden tests pin exact byte sequences for ping/pong/ACK/full/zero
   frames; verified against the real SDK pbbp2.pb.go MarshalToSizedBuffer
   producing identical bytes.

2. Multi-frame events (sum>1) are reassembled via the new chunkAssembler:
   - 5s sliding TTL (matches SDK combine() cache TTL)
   - Lazy GC on admit (no separate sweeper goroutine)
   - Out-of-order seq + duplicate seq idempotent
   - Partial chunks are NOT ACKed (SDK behaviour: only the final chunk's
     ACK confirms the whole event so Lark can retry on partial loss)
   - Connector wires assembler per-Run; state dies with the session

3. OutcomeReplier detached from ACK critical path:
   - HubConfig.ReplyTimeout default 2.5s, strictly under Lark's 3s ACK deadline
   - handleEvent dispatches synchronously (fast DB path), then spawns the
     replier under a fresh background ctx with WithTimeout(ReplyTimeout)
   - Hub.replyWg tracks in-flight replies; Hub.Wait / WaitWithTimeout
     drain them so shutdown is bounded
   - Noop replier short-circuits inline (no goroutine cost when outbound
     APIClient isn't configured)

   Proof tests:
   - TestHubScheduleReplyReturnsImmediately: scheduleReply with a 10s
     slow replier returns in <50ms
   - TestHubReplyTimeoutCancelsHungReplier: hung replier ctx fires at
     ReplyTimeout
   - TestHubWaitDrainsInFlightReplies: Wait blocks until replies finish
   - TestHubACKNotBlockedByOutboundReply: end-to-end through the
     connector — data-frame ACK lands within 500ms even when the
     replier hangs 5s

PersonalAgent real-env smoke remains Bohan's decision; this PR closes
the technical blockers Elon flagged.

Co-authored-by: multica-agent <github@multica.ai>

* docs(service/issue): narrow position concurrency claim to create-create (MUL-2671)

Elon's review of the merge resolution flagged that the comment on the
new NextTopPosition call promised more than the code guarantees:
concurrent manual reorder via UpdateIssue(position) does NOT take the
workspace row lock that IncrementIssueCounter holds, so a create
racing a reorder can still land on the same position. Rewrite the
comment to only claim create-create serialization, which is the
behaviour the lock actually delivers. No code change.

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): keep device-flow polling on RFC 8628 HTTP 400 (MUL-2671)

Lark's device-flow polling endpoint returns HTTP 400 with the JSON
body `{"error":"authorization_pending"}` while the user hasn't scanned
the QR yet — this is the RFC 8628 spec, and the upstream oapi-sdk-go
implements the same handling. Our previous doForm treated ANY non-2xx
as a terminal protocol error, so every install session was killed by
the first poll (~5s after begin) and the install dialog appeared
silently empty: the frontend received status=error +
lark_protocol_error before the user could even read the description.

Fix: doForm now decodes the JSON body first; if it parses, the caller
(Begin / Poll) routes on the body's `error` field, where the existing
switch correctly maps authorization_pending / slow_down to "keep
polling" and access_denied / expired_token to terminal failure. Only
unparseable bodies (5xx HTML proxy pages, gateway timeouts) still
surface as a typed http_NNN RegistrationError.

Three regression tests pin the new behaviour:
- HTTP 400 + authorization_pending → res.Status="authorization_pending"
- HTTP 400 + access_denied → res.Err.Code="access_denied" (terminal)
- HTTP 502 + HTML body → http_502 RegistrationError

Verified against the live local env: install/begin -> 200, status
stays "pending" through the first poll cycle, no longer flips to
"error" within seconds.

Co-authored-by: multica-agent <github@multica.ai>

* fix(views/lark): reset closedRef on every mount so StrictMode double-mount renders QR (MUL-2671)

Empty QR dialog body in the dev env: Bohan opened the bind dialog and
got an empty white area where the QR should have been — no QR, no
"starting" placeholder, no error text. Backend was returning the QR
URL correctly; the bug was on the frontend.

Root cause: React 19 / Next.js dev StrictMode mounts every component
twice (mount → cleanup → mount). The component instance is REUSED
across the simulated remount, which means useRef objects are
preserved. The dialog's `closedRef` lifecycle:

  1. Mount #1: closedRef={current:false}, beginSession() kicked off
     (HTTP request still in flight)
  2. Cleanup runs: closedRef.current=true
  3. Mount #2: beginSession() kicked off again, BUT the ref still
     reads {current:true} from step 2
  4. Both promises resolve. Both hit the post-await guard
     `if (closedRef.current) return;` and bail out before setSession().
  5. Result: session stays null forever. Every conditional in the
     dialog body (beginning/session-pending/success/error) is false →
     empty body.

Fix: reset closedRef.current=false at the START of the effect, not
just at component construction. The cleanup-then-mount pair now
re-arms the guard so subsequent setSession calls actually land.

Regression test wraps the dialog in <StrictMode> and asserts the
QR appears within 2s with the correct value — fails closed if anyone
removes the reset.

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): drop EventTaskCompleted subscription so the chat reply doesn't get overwritten by "Done." (MUL-2671)

Bohan reproduced on the live dev env: agent replies show only a card
saying "Done." in Lark, even though Multica's own chat panel has the
real "Hello! I'm cc…" reply. Tasks succeed end-to-end, but the user
loses the reply on the Lark side.

Root cause: TaskService.CompleteTask publishes two events for every
chat task IN ORDER:

  1. broadcastChatDone(...)       → ChatDonePayload{Content: "Hello!..."}
  2. broadcastTaskEvent(Completed) → map[string]any{task_id, agent_id,...}
                                     (no `content` key)

The Patcher subscribed to BOTH and routed each to finalize(). The
first patch correctly rendered the reply text, the second
patched the same card with an empty payload — chatDoneContent()
returned "" and the renderer fell back to "Done." (default empty-body
copy). The second patch wins because Lark stores whatever was last
applied.

Fix: stop subscribing to EventTaskCompleted in the Patcher and remove
the corresponding switch arm. EventChatDone is the canonical "agent
finished replying" signal for the Lark card path; EventTaskCompleted
is still emitted to the bus for other listeners (web UI, analytics,
task usage) where the lack of content doesn't matter.

Regression test TestPatcherIgnoresEventTaskCompletedForChatTasks
emits ChatDone followed by TaskCompleted on a streaming card and
asserts: exactly one patch, body contains the agent reply, body does
NOT contain "Done.". If anyone re-adds the EventTaskCompleted
subscription, this fails immediately.

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations/lark): chat replies as plain text IM messages, not card chrome (MUL-2671)

Bohan reported on the live dev env that even with the agent's reply
shown correctly, every message is wrapped in an interactive card with
the agent name as the header — it feels like a system notification,
not a normal chat reply. He wants the reply to land as a regular Lark
text bubble.

Changes:

- Add APIClient.SendTextMessage backed by Lark's
  /open-apis/im/v1/messages with msg_type=text. JSON-encodes the
  {"text": ...} envelope Lark requires so callers pass raw strings.
- Patcher.Register no longer subscribes to EventTaskQueued /
  EventTaskRunning. There is no more thinking → running → final
  card lifecycle on the success path: it added card chrome without
  buying anything for free-form chat.
- On EventChatDone, the new sendChatReply path posts the assistant
  message content as plain text. Empty content is silently dropped
  rather than rendered as "Done." (the prior fallback that
  confused Bohan).
- Failure path keeps a one-shot error card on EventTaskFailed —
  the visual distinction from a normal reply is genuinely useful,
  and failures are rare enough that the chrome isn't noisy.
- Throttle / lastPatched map / MinPatchInterval / shouldPatch /
  markPatched / loadCardOrSkip are all removed; nothing in the new
  flow patches.

Tests:

- TestPatcherSendsPlainTextOnChatDone pins the new contract: exactly
  one SendTextMessage call, no card sends or patches, content
  matches the ChatDonePayload.
- TestPatcherDropsEmptyChatReply pins the "no more Done. fallback"
  decision — empty content drops, period.
- TestPatcherFailEventSendsErrorCard pins the failure path still
  uses a card (one-shot, no patching).
- TestPatcherIgnoresEventTaskCompletedForChatTasks rewritten for
  text path: ChatDone then TaskCompleted yields exactly one text
  send, no duplicate.
- TestPatcherSkipsWhenNoChatSessionBinding and
  TestPatcherSwallowsInstallationLoadErrors rewritten to drive
  EventChatDone (the new entry point) instead of TaskQueued.
- TestPatcherSendsThinkingCardOnTaskQueued deleted (no more
  thinking card).

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations/lark): pre-fill PersonalAgent bot name as "<agent> - Multica" (MUL-2823) (#3520)

The device-flow install left the bot at Lark's auto-generated
"{用户姓名}的智能助手". Lark's registration scene supports pre-filling the
name via a `name` query param on the verification/QR URL (mirrors the
upstream SDK's AppPreset.Name) — a user-editable default that rides on
the QR URL, not the begin POST body (which has no name field).

BeginInstall already loads the agent for its ownership check, so we keep
it and thread `<agent.Name> - Multica` through Begin → decorateQRCodeURL.
A blank name degrades to plain "Multica".

There is no post-install rename API (bot/v3 is read-only; no
bot/v3/update), so the install-time pre-fill is the only programmatic
lever; the user can still edit the name on the creation form.

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): restore /issue confirmation + pin SendTextMessage wire (MUL-2671)

Two recovered/added contracts off Trump's review of HEAD fe381a07:

1) /issue confirmation in Lark was a casualty of the plain-text
   refactor. The pre-refactor `RenderInput.IssueNumber` field was
   declared but never actually rendered into the card body, so even
   in the original card-based flow the user never saw a "Created
   [MUL-42]" confirmation. Now the OutcomeReplier handles
   OutcomeIngested + IssueID.Valid by sending a plain text message:

     Created MUL-42 — fix login bug
     https://multica.example/issues/MUL-42

   Composed from a new DispatchResult.IssueIdentifier +
   IssueTitle, populated by the Dispatcher from
   workspace.IssuePrefix + issue.Number / issue.Title. Workspace
   lookup is best-effort: a Postgres blip on workspace gets a "#42"
   fallback rather than silently dropping the confirmation.

   The agent's own chat reply (if any) continues to land separately
   via the Patcher on EventChatDone — these are two semantically
   distinct messages and the user benefits from seeing both.

2) SendTextMessage is the wire layer Trump flagged for missing
   coverage. Three new wire tests pin:
   - happy path: POST /open-apis/im/v1/messages?receive_id_type=chat_id,
     msg_type=text, Bearer <tenant_access_token>, double-JSON
     content envelope
   - special-character round trip: newlines, double quotes,
     backslashes, tabs, Chinese + emoji, JSON-lookalike strings.
     The inner {"text": ...} is encoded once at JSON.Marshal time
     and once again when the outer body serializes; losing either
     pass corrupts the message and the bug is invisible without a
     contract pin.
   - Lark error path: non-zero `code` surfaces as a wrapped error
     with the code embedded.

Tests:
- TestDispatcher_IssueCreationFromCommand asserts IssueIdentifier
  ("MUL-42") and IssueTitle propagate through DispatchResult.
- TestDispatcher_IssueIdentifierFallsBackToNumberOnWorkspaceLookupErr
  pins the "#7" degrade-graceful fallback.
- TestLarkOutcomeReplierIssueCreatedSendsConfirmation pins the
  text body (identifier + title + deep link) and asserts no card
  send on this path.
- TestLarkOutcomeReplierOutcomeIngestedSilentWithoutIssue pins
  the silent-on-plain-chat default so we don't accidentally start
  emitting a confirmation for every message.
- TestHTTPClient_SendTextMessage_* covers the wire contract.

Frontend locale parity (en + zh-Hans, 53 tests) is currently green
on this HEAD; no changes needed.

Co-authored-by: multica-agent <github@multica.ai>

* fix(views/locales): add missing ko keys for Lark MVP (MUL-2671)

Trump flagged on PR #3277 review that the ko bundle was missing the
Lark-MVP-only keys that en + zh-Hans both carry. The parity test
caught it cleanly after main was merged in (Korean PR landed on main
between the prior review and this one):

  common.lark_bind.*                       (13 keys)
  settings.page.tabs.lark                  (1 key)
  settings.lark.*                          (45 keys)
  agents.inspector.section_integrations    (1 key)

Korean translations are professional/concise — "Lark" stays as the
brand name (matches how en keeps "Lark" + "(飞书)" parenthetically;
ko/users searching for the product expect "Lark"), and product copy
follows the zh-Hans tone where Multica nouns ("에이전트", "워크스페이스")
are romanized loan words consistent with the rest of the ko bundle.

Slot ordering preserved against EN:
  - page.tabs.lark sits between github and integrations
  - inspector.section_integrations sits right after section_skills

Verified: pnpm exec vitest run locales/parity → 105/105 pass.
Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): /issue origin_type CHECK + Hub restart on credentials rotation (MUL-2671)

Two live-env bugs Bohan reproduced:

1) /issue command crashed the WS connector. Dispatcher writes
   origin_type='lark_chat' on issues born from `/issue`, but the
   issue_origin_type_check CHECK constraint was last extended in
   migration 060 for quick_create — it doesn't list lark_chat, so
   every Lark /issue tripped SQLSTATE 23514 and bubbled up as an
   infra error. The infra error tore down the WS connector, Lark
   retried the same message, the new connector tripped the same
   constraint and crashed again. Repro in the live env: three
   crashes from the same /issue event over ~40s, each leaving the
   user with no confirmation in Lark.

   Migration 111 extends the CHECK list:
     CHECK (origin_type IN ('autopilot', 'quick_create', 'lark_chat'))

2) Re-scanning an already-bound agent silenced the bot. The device
   flow re-registers with Lark, which mints a brand-new bot (fresh
   app_id + app_secret); RegistrationService.finishSuccess upserts
   into lark_installation by agent_id, so the row's credentials
   rotate in place. But the running supervisor held the OLD inst
   struct by value and kept a WS open against the OLD bot's app_id —
   so all events to the NEW bot went nowhere. Bohan's "claude code
   现在不能在飞书里回复了" symptom maps exactly to this:

      log timeline:
      16:29:57  cc connector connected with app_id=cli_aa9398dd...  (OLD)
      16:34:07  lark registration: install complete                  (rotation)
      → row.app_id is now cli_aa93f36f...                            (NEW)
      → old WS still subscribed to OLD app_id; new app_id receives nothing

   Fix: Hub.sweep now compares each installation row's credentials
   fingerprint (app_id + bot_open_id + sha256(app_secret_encrypted))
   against the snapshot the running supervisor was started with. On
   diff, cancel the old supervisor and start a fresh one inline. A
   monotonic gen counter on the supervisor entry disambiguates the
   old goroutine's deferred cleanup from the new entry the rotation
   path already swapped in.

Tests:
- TestHubRestartsSupervisorOnCredentialsRotation pins the new path:
  starts hub on app_one, rotates the row to app_two, asserts the
  connector factory is called again with the fresh AppID.
- TestHubDoesNotRestartSupervisorOnUnchangedRow pins the negative
  case so an unchanged row doesn't degenerate into a per-sweep
  busy-loop.
- Existing hub tests (lease, supervise, shutdown, ACK timing,
  noop replier) all green.

Verification:
- go test ./internal/integrations/lark/... -race -count=1   ok
- go build ./... clean
- migration applied locally; \d+ issue confirms lark_chat in CHECK

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): per-supervisor lease token to fence rotation handoff (MUL-2671)

Elon flagged a race in HEAD be8d4cef's rotation path: both the old
and the new supervisors of the same Hub used the hub-wide nodeID as
their WS lease token, so an old supervisor's post-cancel
releaseLease(nodeID) would CAS-match the lease row the successor had
just acquired with the SAME token and DELETE it. Symptom would be a
silently empty lease row a few hundred ms after every device-flow
re-scan — no replica owning the install, no events delivered, the
"bot goes quiet" pattern Bohan hit the first time but now from the
fencing side rather than the credentials side.

Fix: leaseToken(nodeID, gen) composes "<nodeID>-g<gen>", where gen is
the monotonic counter already attached to each supervisorEntry. The
nodeID prefix keeps cross-replica observability (an operator
inspecting lark_installation.ws_lease_token can still map back to a
process) while the -g suffix makes the OLD supervisor's release
target the OLD row state. Once the rotation path swaps in the new
supervisor, the row's CurrentToken is the new -g(N+1) token, so the
old -gN release's WHERE clause no-ops instead of clobbering.

acquireLease / renewLeaseUntil / releaseLease now take an explicit
token argument; supervise threads its leaseToken through. The
plumbing isn't pretty, but having an explicit argument at every call
site is the only way the rotation invariant survives subsequent
refactors — without it, a future caller could quietly reintroduce
"just use h.nodeID" and the race is back.

Two regression tests:

- TestHubRotationStaleReleaseDoesNotClearSuccessorLease drives the
  fake lease state machine directly:
    1. old acquires(tokenA)
    2. rotation lands; new acquires(tokenB)
    3. old's stale release(tokenA) fires
  Asserts owner ends up still tokenB. Hub-wide-nodeID code would fail
  step 3 by clearing the entry.

- TestHubRotationEndToEndKeepsSuccessorLeased runs the same scenario
  through the live supervise loop: starts hub, rotates the row, waits
  for sup2 to take over with a distinct token, sleeps past sup1's
  unwind, asserts the row is still held by a non-sup1 token. Catches
  the bug even when the goroutine timing is non-deterministic.

Verification: go test ./internal/integrations/lark/... -race -count=1   ok
  go build ./...                                            clean
  go vet ./...                                              clean
Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): route group @-mentions via union_id, not open_id (MUL-2671)

In a Lark group with multiple Multica bots installed, the bot whose WS
received the event sometimes failed to recognize that it was the @-target
while the OTHER bot's supervisor falsely fired. Bohan's controlled three-
message test (only @A, only @B, @both) hit this: @A and @B alone went
unanswered, @both got picked up by A only.

Root cause: the `mentions[].id.open_id` field Lark puts on the WS event
is structurally INVERSE to `/bot/v3/info`'s `bot.open_id` across the two
WSes. From A's WS perspective, the wire-form open_id for "A was @-ed"
is NOT equal to A's API-side open_id, but IS equal to what B's WS sees
on its side, and vice versa. The decoder's `mention.open_id ==
inst.BotOpenID` match therefore fires on the wrong bot in multi-bot
groups. Only `union_id` (the Lark-tenant-scoped stable identifier) is
consistent across both WSes.

Changes:
- migration 112 adds nullable `lark_installation.bot_union_id`
- sqlc query exposes UpsertLarkInstallation/CreateLarkInstallation
  with bot_union_id, plus a focused SetLarkInstallationBotUnionID for
  the backfill path
- httpAPIClient.GetBotInfo now follows /bot/v3/info with /contact/v3/
  users/{open_id}?user_id_type=open_id and returns both identifiers
  on BotInfo. Soft-fails on contact-scope denial: install still
  succeeds with an empty UnionID, and the decoder falls back to the
  legacy open_id match for single-bot deployments.
- RegistrationService.finishSuccess persists union_id alongside
  open_id during the device-flow finalize.
- ws_frame_decoder.containsMention prefers union_id and only walks
  open_id when the installation row has not been backfilled yet.
- BackfillBotUnionIDs runs once at server boot for installations
  created before migration 112; bounded per-row 10s timeout and a
  pure soft-fail policy so a slow Lark round-trip cannot block
  startup.
- regression tests cover the three decoder paths: union_id match
  wins over open_id mismatch, union_id mismatch overrides open_id
  match, and open_id fallback when union_id is unknown.

Co-authored-by: multica-agent <github@multica.ai>

* chore: drop trailing blank lines at EOF on four files (MUL-2671)

git diff --check origin/main..origin/pr-3277 flagged these as new
blank lines at EOF; clearing so the diff stays clean for review.

Co-authored-by: multica-agent <github@multica.ai>

* fix(views/locales): add missing ja keys for Lark MVP + section_integrations (MUL-2671)

CI frontend job tripped on the ja locale parity check: ja is missing
the lark_bind block in common.json, the lark block + page.tabs.lark
in settings.json, and inspector.section_integrations in agents.json.
The ko fix earlier covered Korean; ja was added separately on main
and the merge surfaced these gaps. Translations mirror the en source
and follow the same voice as the existing ja bundle.

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): rewrite @_user_N placeholders into clean body (MUL-2671)

When Lark dispatches a group `im.message.receive_v1`, the message
text contains opaque `@_user_1`, `@_user_2`, … placeholders and the
real identity is in `mentions[]`. We were forwarding the raw text to
the agent, so a Bohan-typed "@Bot ping test" arrived as "@_user_1
ping test" — neither human-readable nor useful as LLM context, and
the agent was paying tokens to figure out which `@_user_N` was even
itself.

The new resolveMentions pass:
  * strips the bot's own mention entirely (the dispatcher already
    routes the event on AddressedToBot; re-emitting @<self> in front
    of every message adds zero signal and pollutes context),
  * substitutes other participants with `@<displayName>` so a
    follow-up "@Alice" reads naturally,
  * collapses horizontal whitespace introduced by the strip while
    preserving original newlines.

Bot identity check uses the same union_id-preferred + open_id
fallback as containsMention, so the rewrite stays consistent with
the routing path. Tests cover the four shapes: bot self-mention,
mixed bot + other-user mention, multi-line body with stripped
mention, and a no-mention body that should be left untouched.

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): union_id-first self mention strip + token-aware scan + local whitespace cleanup (MUL-2671)

Three review blockers on the mention rewrite from PR review:

1. isBotMention now mirrors containsMention's union_id-first policy.
   When the installation row knows our union_id, we trust it
   exclusively (open_id is structurally inverted in multi-bot
   groups — matching on it would re-introduce the routing bug we
   fixed two commits ago). open_id fallback fires only when
   union_id is absent. New tests: @-ing both bots in one message
   correctly strips only self and renders the sibling as @<name>;
   open_id-matches-but-union_id-differs does NOT strip.

2. resolveMentions no longer collapses or trims whitespace globally.
   Indentation, tabs, code blocks, tables — all preserved verbatim.
   When the self mention is removed we eat exactly one adjacent
   horizontal space (the one after the placeholder, or, when the
   mention sits at end-of-input, a single space already emitted
   right before it). New test exercises a multi-line indented +
   tabbed body and asserts the whole shape survives.

3. Prefix-collision-safe replacement. A chat with 11+ participants
   exposes both `@_user_1` and `@_user_10`; naive ReplaceAll for
   `@_user_1` would mangle the substring of `@_user_10`. The
   resolver now does a single-pass token scan with the mention
   list sorted longest-key-first, so the longer placeholder always
   wins at any scan position. New test covers the @_user_1 /
   @_user_10 case explicitly.

Also drops the temporary INFO-level diag logging the previous
commit added — root cause was confirmed (union_id swap in the
manual backfill; not a decoder bug).

Co-authored-by: multica-agent <github@multica.ai>

* fix(integrations/lark): scope inbound dedup per (installation_id, message_id) (MUL-2671)

Root cause of the residual "@Cc gets dropped as not_addressed_in_group"
even after the union_id swap landed: lark_inbound_message_dedup was
keyed on `message_id` alone. In a Lark group chat where the workspace
has multiple Multica bots installed, Lark delivers the SAME message_id
to every bot's WS supervisor. Whichever WS claimed first then ran its
own AddressedToBot check; the bot that was actually @-ed lost the dedup
race, found the row already terminal (`processed_at IS NOT NULL`), and
was dropped as `duplicate` BEFORE it could evaluate its own mention.
Net: every @ silently disappeared if Lark happened to route the OTHER
bot's WS first.

The dedup gate's original purpose (idempotency against WS reconnect
replay) is per-installation by definition, so the right key is
composite (installation_id, message_id).

Changes:
- migration 113 drops + recreates lark_inbound_message_dedup with
  installation_id NOT NULL REFERENCES lark_installation(id) ON DELETE
  CASCADE and PRIMARY KEY (installation_id, message_id). The table is
  a 24h transient cache, so dropping existing rows is safe.
- sqlc queries: ClaimLarkInboundDedup / MarkLarkInboundDedupProcessed /
  ReleaseLarkInboundDedup all now take installation_id.
- AppendUserMessageParams carries InstallationID through to the
  in-tx Mark call so the chat_message+dedup atomicity stays intact.
- Dispatcher passes inst.ID to claim + applyFinalize + AppendUserMessage.
- Test fakes key dedup state on (installation_id, message_id) via a
  composite map key; all existing pre-seeded rows use a seedDedupKey
  helper bound to the default activeInstallation fixture so the prior
  staleness / token-rotation / in-tx mark tests still exercise the
  same regression they did before.
- New regression TestDispatcher_DedupIsScopedPerInstallation pins the
  multi-bot invariant: a row pre-seeded for installation A does NOT
  block installation B's first delivery of the same message_id; B
  runs through its own group-filter / identity / ingest pipeline.

Co-authored-by: multica-agent <github@multica.ai>

* feat(integrations/lark): render markdown chat replies via schema-2.0 card (MUL-2671)

The agent's chat replies were going out as msg_type=text, so every
`**bold**`, fenced code block, list, table, and link in the body
showed up as literal markdown characters in Lark — the user saw raw
asterisks, hashes, pipes instead of formatted text. Bohan reported
this and pointed at zarazhangrui/lark-coding-agent-bridge as the
shape to emulate.

The bridge repo uses Lark interactive cards with the schema-2.0
envelope and a `tag: "markdown"` body element; Lark's client
renders that to formatted text (GFM-ish: bold/italic, headings,
lists, links, fenced code blocks, tables, blockquotes). They expose
multiple reply modes (card / markdown-as-post / text) gated by user
config; we go a step simpler — auto-detect markdown syntax in the
agent's body and route accordingly:

- containsMarkdown(): cheap substring + regex pass for fenced code
  blocks, headings, list markers, bold/italic, tables, links,
  blockquotes, horizontal rules, inline code. Biases toward false-
  positive — wrapping prose in a card still renders fine, but
  missing a real markdown block leaves raw characters visible.

- APIClient gains SendMarkdownCard / SendMarkdownCardParams.
  Implementation marshals the schema-2.0 envelope verbatim:
  {schema:"2.0", body:{elements:[{tag:"markdown", content: md}]}}.
  Stub returns ErrAPIClientNotConfigured.

- Patcher.sendChatReply now branches on containsMarkdown:
  markdown → SendMarkdownCard, plain prose → SendTextMessage. A
  one-liner "sure, on it" stays as a normal IM bubble (no card
  chrome); anything with markdown gets the rendered card.

Tests: TestContainsMarkdown pins the heuristic across plain prose
and ten markdown shapes; TestPatcherRoutesMarkdownReplyToCard and
TestPatcherRoutesPlainReplyToText cover the router; new HTTP wire
test TestHTTPClient_SendMarkdownCard_HappyPath contract-pins the
card envelope (msg_type=interactive, schema 2.0, markdown tag,
verbatim body). Full lark suite passes.

Co-authored-by: multica-agent <github@multica.ai>

* fix(service/issue): route analytics.IssueCreated through obsmetrics.RecordEvent (MUL-2671)

CI's TestNoNakedAnalyticsCaptureInHandlersOrServices guard caught the
post-merge analytics call in IssueService.captureCreatedAnalytics
that still used s.Analytics.Capture(...) directly. Main added that
lint to prevent the Prometheus and PostHog sides from drifting — any
new analytics.* event must go through obsmetrics.RecordEvent so the
business-metrics collector and the PostHog client fire from the same
call site.

Fix mirrors how TaskService handles it: IssueService gains a
Metrics *obsmetrics.BusinessMetrics field (router wires it via
h.IssueService.Metrics = opts.BusinessMetrics next to the existing
TaskService line), and the in-service Capture call becomes
obsmetrics.RecordEvent(s.Analytics, s.Metrics, ...). nil-safe by
construction — RecordEvent treats a nil Metrics as PostHog-only.

Co-authored-by: multica-agent <github@multica.ai>

* feat(views/lark): swap Bind CTA for Connected+Manage link when agent already has an installation (MUL-2671)

Bohan reported the agent-detail Bind button keeps inviting the user to
re-scan the QR even when the agent already has an active Lark
PersonalAgent connected — and re-scanning silently upserts the
installation row, leaving the previously-created Lark bot dangling
as a zombie. Frustrating UX and an actual product footgun.

Anti-zombie guard at the only entry point: LarkAgentBindButton now
checks the cached installations listing for an active row pinned to
this agent_id. When one exists, the install CTA is gone — replaced
by a small Connected pill + an "Manage in Lark" link that opens the
Bot's app page in Lark's developer console (open.feishu.cn/app/<app_id>)
in a new tab. That's where scopes, display name, and additional
permission requests actually live; re-scanning never was the right
answer for managing an existing bot.

Scoping is per-agent: an active installation on a DIFFERENT agent
in the same workspace doesn't affect this agent's button, and a
revoked installation falls back to the bind CTA so the user can
re-create. Tests cover all four states (own-active / own-revoked /
other-agent-active / no-installation) and pin the Manage link's
href + target=_blank + noopener.

i18n: three new keys in settings.json (en / zh-Hans / ja / ko):
agent_bot_connected_label, agent_bot_manage_link,
agent_bot_manage_tooltip. Locale parity test still 157/157.

The dev console host is hardcoded to open.feishu.cn — operators
on the Lark international tenant currently get the wrong host;
future-proof fix wants the backend to surface a per-installation
dev_console_url on the listings response, called out in a code
comment.

Co-authored-by: multica-agent <github@multica.ai>

* feat(views/settings): collapse Lark into Integrations + render agent identity (MUL-2671)

Lark was its own top-level workspace settings tab while Integrations sat
empty next to it. As more integrations land, the sidebar would balloon
with one tab per provider. Move the Lark surface into Integrations as
the first hosted integration; the old ?tab=lark URL redirects through
LEGACY_WORKSPACE_TAB_REDIRECTS so bookmarks still resolve.

The Connected bots list was leaking the raw Lark app_id (cli_…) as the
row title with bot_open_id (ou_…) underneath — meaningless to product
users. Since the binding is 1:1 with a Multica Agent, join on agent_id
and render the agent's avatar + name via the workspace-standard
ActorAvatar + useActorName.getAgentName. Deleted agents fall back to
"Unknown Agent" so the row is still actionable for cleanup.

Tests: stub useActorName + ActorAvatar in lark-tab.test.tsx and add
LarkTab connected-bot tests covering the agent identity render and the
deleted-agent fallback. Drop the now-dead integrations.* + page.tabs.lark
+ lark.bot_open_id_label keys across all four locales — parity still
157/157, views suite 1141/1141.

Co-authored-by: multica-agent <github@multica.ai>

* feat(views/settings): wrap Lark in a named section inside Integrations (MUL-2671)

Integrations is meant to host multiple providers (Slack, Linear etc. as
they land), so the Lark content should sit under a Lark heading rather
than fill the tab directly — otherwise the first additional integration
would feel like it broke the IA. Add a "Lark" / "飞书" section heading
above LarkTab using the same h2 chrome the other settings tabs use, and
pin lark.section_title across all four locales (parity 169/169).

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
Co-authored-by: J <j@multica.ai>
2026-06-03 19:12:14 +08:00
Bohan Jiang
5900d8b637 fix(issues): make start_date/due_date timezone-stable calendar days (#3618) (#3692)
* fix(issues): store start_date/due_date as DATE, not timestamp (MUL-2925)

These fields are calendar days (the pickers offer no time-of-day), but were
stored as TIMESTAMPTZ. A client serializing local midnight via toISOString()
folded its timezone into the instant, so the day shifted by the local offset
(GH #3618). Migrate the columns to DATE and parse/serialize date-only
"YYYY-MM-DD". ParseCalendarDate still accepts legacy RFC3339 (truncated to the
UTC day) so older clients keep working.

Co-authored-by: multica-agent <github@multica.ai>

* fix(issues): render start_date/due_date as timezone-stable calendar days (MUL-2925)

Pickers now emit date-only "YYYY-MM-DD" (local calendar day) instead of
toISOString(), and every read formats via the shared @multica/core/issues/date
helpers with timeZone:"UTC" so the day never shifts with the viewer's offset.
The Gantt's existing UTC bucketing is now correct. Covers web/desktop pickers,
quick-set menu, list/board/detail/activity, and the mobile due-date picker.

Co-authored-by: multica-agent <github@multica.ai>

* fix(issues): address date-only review — loud-fail ambiguous dates, finish display sweep (MUL-2925)

Review follow-ups on #3692:
- ParseCalendarDate no longer silently truncates a legacy non-midnight RFC3339
  to the wrong UTC day; it accepts only YYYY-MM-DD or an exact UTC-midnight
  instant and rejects ambiguous ones loudly. Adds util unit tests.
- migration 112 pins the TIMESTAMPTZ->DATE conversion to UTC explicitly via
  AT TIME ZONE 'UTC' (was session-timezone dependent); down migration too.
- Convert remaining date-change display sites to formatDateOnly: inbox detail
  label (web) and mobile activity + inbox labels (were new Date()+local format).
- CLI --start-date/--due-date help now says YYYY-MM-DD, not RFC3339.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-03 14:34:01 +08:00
Multica Eve
10afd1af1b feat(server): introduce pkg/taskfailure classifier and switch in-flight failure_reason writes (MUL-2946) (#3693)
Lift MUL-1949's offline backfill failure_reason taxonomy into a shared
in-flight classifier so the agent_task_queue.failure_reason column is
written with refined values (provider_auth_or_access, context_overflow,
provider_capacity_or_rate_limit, …) at write time rather than waiting on
SQL backfill to re-classify after the fact. PR1 of the Grafana board
plan in MUL-2328 — the upcoming PR2 reuses pkg/taskfailure.AllReasons()
to pre-warm the Prometheus failure_reason label set.

* server/pkg/taskfailure: new package with the canonical 21 Reason
  constants (7 platform-side + 14 agent_error.* sub-reasons),
  AllReasons() returning a defensive copy, IsAgentError() prefix check,
  and Classify(rawError) Reason mirroring the SQL CASE rules from
  MUL-1949 (db-boy's analysis). 100% statement coverage.
* server/internal/daemon/daemon.go: route the 'agent_error' coarse
  fallback paths (StartTask error, runTask early-return error, CompleteTask
  permanent rejection, reportTaskResult default branch) and the
  executeAndDrain default error case (chained after classifyPoisonedError)
  through taskfailure.Classify so blocked / timeout / unknown-status
  results all carry a refined reason on the wire.
* server/internal/service/task.go: FailTask classifies errMsg when the
  daemon-supplied failureReason is empty, eliminating the legacy
  COALESCE(.., 'agent_error') landing.
* server/internal/daemon/poisoned.go: alias FailureReasonIterationLimit
  and FailureReasonAPIInvalidRequest to the canonical taskfailure
  constants. agent_fallback_message and codex_semantic_inactivity are
  pre-existing operational reasons not in the canonical 21 — kept as
  literals for now and revisited in a follow-up PR.

Backfill SQL from MUL-1949 stays as the authoritative offline source of
truth; this PR keeps the in-flight classifier in lock-step with the SQL
CASE expression so historical and future rows share the same taxonomy.
No behavior change for the platform-side reasons (queued_expired,
runtime_offline, runtime_recovery, timeout, etc.) which already align
with the canonical set.

Co-authored-by: Eve <eve@multica-ai.local>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-03 13:52:56 +08:00
Naiyuan Qing
f2f17e3355 Optimize chat message loading (#3685)
* Optimize chat message loading

Co-authored-by: multica-agent <github@multica.ai>

* Fix chat history cursor pagination

Co-authored-by: multica-agent <github@multica.ai>

* Fix chat session list remount key

Co-authored-by: multica-agent <github@multica.ai>

* fix(chat): fall back to legacy /messages when paged endpoint 404s

Deployment-order compatibility: a backend deployed before the
/messages/page endpoint existed returns 404 for the unknown route.
The cursorless initial page now falls back to the legacy full-list
/messages endpoint and wraps it in a single has_more:false page, so
chat never white-screens regardless of which side deploys first. A 404
on a cursor request still propagates to avoid duplicating the full list.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 13:47:30 +08:00
Bohan Jiang
fcb5099ec5 fix(agent): raise opencode model-discovery timeout to 15s (MUL-2888) (#3689)
Newer opencode (1.15+) syncs its hosted free-model catalog over the
network on `opencode models`, which can take ~6s. The previous 5s cap
killed the command, discoverOpenCodeModels returned an empty list, and
the daemon reported it as a successful empty result — so the runtime
showed online but the model picker was empty ("暂无可用模型").

Fixes #3627

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-03 12:18:43 +08:00
Multica Eve
e2720f7d33 feat: add opencode thinking variants
Adds OpenCode model variant discovery for thinking controls, passes saved thinking_level through opencode run --variant, and hardens verbose model parsing with fallback coverage.
2026-06-02 13:15:14 +08:00
Jan De Dobbeleer
1e1a4f7845 fix(daemon): fix Copilot CLI invocation on Windows and strip shell quotes from custom args (MUL-2876)
Bug 1: detect copilot.cmd/.bat on Windows and invoke the sibling .ps1 directly via powershell -File, bypassing cmd.exe %* re-tokenisation that mangled the multi-line -p prompt. Shared rewriteCmdToPS1() now serves cursor, pi, and copilot.

Bug 2: filterCustomArgs (shared by all agent backends) strips one outer layer of shell quotes via unshellQuoteArg() before processing, so shell-style custom args like --deny-tool='write' no longer reach the CLI with literal quotes.
2026-06-01 23:28:51 +08:00
Matt Voska
700cd97407 feat(workspace): add per-workspace logo upload (#2760)
Adds avatar_url column to workspace, threads it through the API +
WorkspaceAvatar component, and adds a click-to-upload editor in the
workspace settings tab. Mirrors the squad avatar pattern (migration 086);
UI strings use "logo" while the schema/code uses avatar_url for codebase
consistency with user.avatar_url and squad.avatar_url.

- migration 093: ALTER TABLE workspace ADD COLUMN avatar_url TEXT
- UpdateWorkspace SQL + handler accept avatar_url (auth gated to
  owner/admin at the router via RequireWorkspaceRoleFromURL)
- WorkspaceAvatar renders <img> when avatar_url is set, falls back to
  the initial-letter span otherwise
- workspace-tab.tsx adds a 16x16 click-to-upload logo editor at the
  top of the general settings card, using useFileUpload + accept=
  image/png,image/jpeg,image/webp (server stores under workspaces/{id}/)
- en + zh-Hans settings i18n strings added

Co-authored-by: Matt Voska <voska@users.noreply.github.com>
2026-06-01 16:48:05 +02:00
Bohan Jiang
674be86add fix(tasks): cancel autopilot run_only & quick_create tasks (MUL-2827) (#3615)
CancelTaskByUser (POST /api/tasks/{taskId}/cancel) keyed cancellation off
issue_id / chat_session_id alone, so any task whose only source link was
autopilot_run_id (run_only autopilots) or quick_create context fell into the
dead else branch and 404'd with "task not found" — even though the task was
visible (and showed a cancel X) on the agent Activity tab.

Enforce tenancy uniformly through the task's owning agent instead: agent_id is
NOT NULL on every task row (ON DELETE CASCADE), and agents are workspace-scoped,
so GetAgentTaskInWorkspace (task JOIN agent ON workspace) is a single tenant
guard that works regardless of which optional source FK is set — including
orphan tasks whose autopilot_run_id was SET NULL after the autopilot was
deleted. Privacy layers on top: chat tasks stay creator-only, and every other
task mirrors the agent Activity / snapshot private-agent visibility gate via
canAccessPrivateAgent so the id-only endpoint is never more permissive than the
surface that exposes the task.

Tests cover run_only (same-ws success, cross-ws 404 no-mutation), quick_create,
retry clones, issue-task regression, chat non-creator 403, and private-agent
plain-member 403.

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-01 22:11:27 +08:00
Mohammed Helaiwa
d4b97dc44a fix(agent): drain claude stdout while writing prompt to stdin (#3490)
The claude backend wrote the full prompt to the child's stdin and closed
it before starting the stdout reader goroutine. With
--verbose --output-format stream-json the CLI emits a startup banner
before reading its first stdin frame; with no reader draining stdout, the
child blocks on its stdout write, never reads stdin, and our stdin Write
blocks until the per-task context fires. The field symptom is tasks
failing exactly at the 2 h per-task timeout with
"write |1: The pipe has been ended."

Move writeClaudeInput into its own goroutine so the prompt write and the
stdout drain proceed concurrently. Guard stdin close with sync.Once (it
can now be called from both the writer goroutine and, previously, the
result handler). Join the write result at cmd.Wait() and surface a write
failure as a "failed" status only when no result event arrived and no
session was established, so a genuine startup death still reports the
stderr tail.

Add a regression test that re-execs the test binary as a fake claude
which bursts 256 KiB to stdout before reading stdin, with a 128 KiB
prompt pushed at stdin — both past any plausible OS pipe buffer — so a
regression hangs until the test deadline instead of passing.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 13:52:41 +08:00
Naiyuan Qing
4ae4722ef0 fix(comments): preserve direct parent on replies (#3579)
Co-authored-by: multica-agent <github@multica.ai>
2026-06-01 08:28:15 +08:00
feifeigood
382cdd6a0b feat(agent): consume OpenCode mcp_config via OPENCODE_CONFIG_CONTENT (#3098)
Closes the runtime-side gap of #2106: previously `agent.mcp_config` was
honored only by Claude Code (via `--mcp-config <file>`); for OpenCode the
field was accepted by the API but silently ignored at execution time.

## Approach

OpenCode has no `--mcp-config` flag. Project the agent's `mcp_config`
into OpenCode via OPENCODE_CONFIG_CONTENT — OpenCode's general
inline-config injection environment variable, which accepts any subset
of OpenCode's config schema (model / agent / mode / plugin / mcp / …)
and merges at "local" scope after the project-config loop. MCP is the
only field this PR projects through that channel; if a future Multica
field needs the same channel it would assemble a combined config slice
before the env append.

The env-var route was deliberate. An earlier draft of this PR wrote
the translated MCP servers into <workdir>/opencode.json and removed
the file on cleanup; review (#3098) flagged that the task workdir is
reused across turns for the same (agent, issue), and any agent- or
user-written model / tools / permission settings in opencode.json
must survive across runs. OPENCODE_CONFIG_CONTENT avoids the workdir
entirely — nothing is written to disk, no cleanup is needed, and the
env entry dies with the spawned process.

OPENCODE_CONFIG_CONTENT was added to OpenCode in v1.4.10 (2025-09); the
official @opencode-ai/sdk uses the same env var to inject runtime
config, so the surface is stable. Verified empirically against
OpenCode 1.15.6 in our K8s runtime: `opencode debug config` returns
the injected mcp slice deep-merged with the user's global config,
and <workdir>/opencode.json is observably untouched.

## Translation surface

`agent.mcp_config` accepts two shapes for portability:

- Claude-style `{"mcpServers": {name: {url|command, ...}}}` is
  translated into OpenCode's native form: `type: "local"|"remote"`,
  `command` coerced to a string array, `env` renamed to `environment`.
- Native OpenCode `{"mcp": {name: ...}}` accepts the three shapes
  OpenCode's schema permits and is strict-decoded against each:
    - McpLocalConfig:  `{type:"local", command:[…], environment?, enabled?, timeout?}`
    - McpRemoteConfig: `{type:"remote", url:"…", headers?, oauth?, enabled?, timeout?}`
    - bare override:   `{enabled: bool}` (toggle a server inherited
                        from global / project config without redefining it)
  Decoding uses `json.DisallowUnknownFields` so any field outside the
  matching schema is rejected — matching OpenCode's
  `additionalProperties: false`. Without this, a malformed payload
  (e.g. `command: "node"` instead of `command: ["node"]`) would reach
  OpenCode verbatim and either silently disable the server or crash
  the CLI at startup.

Field-level checks the strict decoder doesn't catch:
  - `timeout` must be a positive integer (rejects 0, negative, fractional)
  - `oauth` must be either an object (validated against McpOAuthConfig)
    or the literal `false`; primitives and `true` are rejected as ambiguous
  - `oauth.callbackPort` must be in 1..65535 when set

## Precedence

Go's os/exec dedups `cmd.Env` by key keeping the LAST occurrence
(Go 1.9+). Appending OPENCODE_CONFIG_CONTENT after `buildEnv(b.cfg.Env)`
guarantees the daemon's value wins over any value the user happened
to put in `agent.custom_env` — which matches the intended semantics
(`mcp_config` is the authoritative daemon-managed field; `custom_env`
is the escape hatch). When that override happens we surface a warning
log so accidental clobbers are debuggable.

## Limitation (out of scope, accepted in review)

OpenCode also deep-merges its **global** config
(`~/.config/opencode/opencode.json`) into every session and exposes no
flag to disable that. Operators who want strict per-agent isolation
from the global layer can set:

```jsonc
// agent.custom_env on the platform
{ "XDG_CONFIG_HOME": "/tmp/opencode-isolated" }
```

…pointing at any directory without an `opencode/` subdir. OpenCode then
reads no global config and only honors what the daemon injects via
OPENCODE_CONFIG_CONTENT. Verified with `opencode debug config`.

## Changes

server/pkg/agent/opencode_mcp.go (new):
  - buildOpenCodeMCPConfigContent — translates raw mcp_config into the
    JSON string OpenCode accepts via OPENCODE_CONFIG_CONTENT, returns
    "" when there's nothing to inject so the caller can skip the env
    entry (avoids clobbering anything the user put in
    agent.custom_env.OPENCODE_CONFIG_CONTENT)
  - translateMCPConfigForOpenCode + helpers — Claude-style → OpenCode
    native shape
  - validateOpenCodeNativeMCPEntry + opencodeMCPLocal /
    opencodeMCPRemote / opencodeMCPEnabledOnly / opencodeMCPOAuth
    typed structs — strict-decode native-shape entries against the
    schema (DisallowUnknownFields), plus targeted post-decode
    assertions for timeout / oauth / callbackPort

server/pkg/agent/opencode.go:
  - 12 lines of env injection in Execute(), placed AFTER buildEnv so
    the daemon's value wins via os/exec dedup
  - warning log when agent.custom_env duplicates the same key
  - no on-disk state, no rollback closure, no post-run cleanup —
    OPENCODE_CONFIG_CONTENT lives only in the spawned process env

server/pkg/agent/opencode_mcp_test.go (new):
  - TestBuildOpenCodeMCPConfigContent_{Empty,Remote,Local,Native}
  - TestBuildOpenCodeMCPConfigContent_NativeAcceptsAllSchemaFields —
    covers each native variant round-tripping every optional field
    (local with env+timeout+enabled; remote with headers+oauth-object+
    timeout+enabled; remote with oauth: false; bare {enabled} override)
  - TestBuildOpenCodeMCPConfigContent_RejectsMalformedNative — 31-case
    table covering every constraint on Bohan-J's review: command must
    be a string array, environment / headers values must be strings,
    oauth must be an object or false, timeout must be a positive
    integer, additionalProperties: false (per-shape allow-list checked
    via DisallowUnknownFields)
  - TestOpencodeBackendInjectsMCPConfigViaEnv — E2E happy path; fake
    opencode binary captures $OPENCODE_CONFIG_CONTENT, asserts the
    translated mcp slice is present AND <workdir>/opencode.json was
    NOT written
  - TestOpencodeBackendOmitsMCPEnvWhenEmpty — empty mcp_config does
    NOT inject the env, preserving any value the user set in
    agent.custom_env
  - TestOpencodeBackendOverridesUserOpenCodeConfigContent — daemon
    value wins via os/exec dedup keep-last

apps/docs/content/docs/providers.{en,zh}.mdx:
  - flip OpenCode's MCP cell from  to 
  - reword the "MCP configuration: only Claude Code actually reads it"
    section so OpenCode is included; describe each tool's mechanism
    (Claude → `--mcp-config`, OpenCode → OPENCODE_CONFIG_CONTENT)

apps/docs/content/docs/install-agent-runtime.{en,zh}.mdx:
  - update the Claude Code blurb (no longer "the only one")
  - expand the OpenCode blurb to mention mcp_config support
  - fix the now-broken /providers anchor

Refs #2106 (TS types and per-agent UI for mcp_config are separate
follow-ups, not in this PR).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 18:08:21 +08:00
Naiyuan Qing
973a43923f fix(comments): revert since-delta to issue-wide, steer to parent thread first (#3535)
#3509/#3523 scoped the comment-trigger since-delta count to the triggering
thread, so an agent resuming a busy issue only saw "+N in this thread" and
lost visibility of new comments in other threads. Revert the count to
issue-wide (every thread), keeping the trigger-comment + agent-own
exclusions, and reshape the warm-path hint to:

  - report the issue-wide new-comment volume,
  - steer the agent to read the triggering (parent) thread FIRST
    (`--thread <trigger> --since`, or `--tail 30` for full context),
  - demote the issue-wide `--since` catch-up to an only-if-needed fallback
    ("don't read them all blindly").

Also fixes the now-stale "scoped to the triggering thread" wording in the
resumed-session no-delta hint (it's issue-wide zero now).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-29 20:13:23 +08:00
Multica Eve
9616d78e47 MUL-2785: optimize resumed comment reads (#3509)
* feat(comments): skip default thread read on resumed comment sessions

Co-authored-by: multica-agent <github@multica.ai>

* fix(comments): scope since delta to trigger thread

Co-authored-by: multica-agent <github@multica.ai>

* chore(comments): address thread delta review nits

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: Eve <eve@multica-ai.local>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-29 14:57:14 +08:00
Bohan Jiang
75b5be3f8e feat(comments): roots-only thread stats + summary projection for comment list (MUL-2809) (#3505)
* feat(comments): roots-only thread stats + summary projection for comment list

Enrich the roots_only read so each root carries reply_count (recursive
descendant count) and last_activity_at (MAX created_at over the subtree),
letting an agent triage which thread to open without fetching any replies.

Add an orthogonal summary=true projection (--summary) that clips each
returned comment's content to a fixed budget and sets content_truncated,
so an agent can scan a list cheaply before pulling a full body. It composes
with every read mode (default, since, thread, recent, roots_only).

New response fields are optional (omitempty) and only populated for the
agent-facing query params, so the default response shape is unchanged for
the desktop/web and existing CLI callers.

Co-authored-by: multica-agent <github@multica.ai>

* test(comments): cover roots_only + summary composition end-to-end

The summary projection composing with roots_only is the spec's headline
"table of contents" read, but it was only exercised at the CLI param-
forwarding level — no handler test asserted that a roots_only response
both clips content AND keeps reply_count / last_activity_at. A refactor
moving the clip into a per-mode branch would silently break that
composition with no failing test.

Add TestListComments_RootsOnlySummaryComposes: a long root + a reply,
read via roots_only=true&summary=true, asserting the root is clipped
(content_truncated=true) while its subtree stats still surface.

Co-authored-by: multica-agent <github@multica.ai>

* refactor(comments): address review nits on roots stats + summary

- ListRootComments[Since]ForIssue: scope the recursive membership walk to a
  selected_roots CTE (the @row_limit page, with the @since cut applied up front)
  so stats are only computed over the subtrees of the roots actually returned,
  instead of every thread in the issue.
- summarizeContent: scan by rune and stop at the budget+1th rune instead of
  allocating a full []rune for the whole body, so a pathologically long comment
  costs only the budget under summary mode. Add a multi-byte (CJK) test to lock
  rune-boundary clipping.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-29 12:59:53 +08:00
Fangfei
c730e906b9 feat(cli): add roots-only issue comment listing (MUL-2805) (#3288) 2026-05-29 12:03:38 +08:00
Naiyuan Qing
3187bbf90c feat(comments): re-add since-delta + cold-start thread read + parent-root write normalization (#3494)
* feat(comments): since-delta new-comment hint + default-on comment session resume (#3432)

* feat(db): add unresolved comment count + list filter queries

Add CountUnresolvedComments (excludes the agent's own comments) and
ListUnresolvedCommentsForIssue. Both are additive — existing callers stay
on the unfiltered queries — so old clients are unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(handler): support unresolved-only comment listing

Wire an additive `unresolved` query param into ListComments. Defaults off
so an old CLI that never sends it gets unchanged behavior; only true/1
enable it. Rejects combining unresolved with thread/recent (whole-issue
filter vs navigation models). Includes filter + count query tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(handler): plumb unresolved count + thread root into claim, gate comment resume

Populate trigger_parent_id (thread root of the trigger comment) and
unresolved_count (excludes the agent's own comments) on comment-triggered
claim responses. Both fields are omitempty so old daemons ignore them.

Gate comment-triggered session resume behind MULTICA_RESUME_COMMENT_SESSION
(default off): resumed comment turns can inherit the prior turn's "Done."
final message, so this stays an explicit rollout switch. The runtime-match
and poisoned-session guards still apply regardless of the flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(daemon): inject unresolved-comments hint + resolve step into agent brief

Add a shared BuildUnresolvedCommentsHint helper rendered on both the
per-turn prompt and the CLAUDE.md workflow (kept in sync per PR #2816). It
ships only the count and the relevant CLI call — never comment bodies — so
the server stays cheap. Thread case points at --thread <root>; issue case
points at --unresolved. Suppressed when the count is 0.

Also add a workflow step telling the agent to `multica comment resolve
<thread-root>` once a thread is fully handled, so the unresolved set
converges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cli): add comment list --unresolved and comment resolve command

Add an --unresolved filter to `issue comment list` (wired to the server's
unresolved param, rejected when combined with --thread/--recent) and a
top-level `comment resolve <id>` command that POSTs to the existing
/api/comments/{id}/resolve endpoint, letting an agent close threads it has
fully handled.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(comments): since-delta new-comment hint + default-on comment resume

Simplifies the comment-triggered agent flow down to what's actually needed:

- New-comment awareness is now a pure time delta: the claim response carries
  new_comment_count + new_comments_since (anchored on the prior run's
  started_at, never completed_at so a long run can't miss comments). The
  per-turn prompt and CLAUDE.md workflow render one line — "N new comment(s)
  since your last run, --since <ts>" — via a shared BuildNewCommentsHint so the
  two surfaces can't drift. Cold start (no prior run) falls back to a plain read.
- Comment-triggered tasks resume the prior session by default (same runtime),
  dropping the MULTICA_RESUME_COMMENT_SESSION rollout gate. The "Focus on THIS
  comment" prompt guard defends against inheriting the prior turn's "Done."
  marker; GetLastTaskSession still excludes poisoned sessions.
- Drops the resolved-based machinery from the first draft: CountUnresolvedComments
  / ListUnresolvedCommentsForIssue queries, the `comment list --unresolved`
  flag, the `multica comment resolve` command, and the resolve workflow step.
- Removes the verbose cursor-pagination paragraph from the comment prompt; the
  --thread/--recent/--since flags stay in the CLI/API, just no longer explained
  inline every turn.

Compatibility: new claim fields are omitempty (old daemons ignore them).
Comment resume is default-on and affects even old daemons, which already
consume prior_session_id.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(comments): collapse reply parent_id to thread root on write

Comment threads are a 2-level model (root + flat replies, like Linear/Slack),
enforced today only by the UI and the agent path — the CreateComment handler
stored whatever parent_id it was handed, and the agent-side flatten walked just
one level, so a reply-to-a-reply could land at depth 3+. Add GetThreadRoot (a
recursive walk to the parent_id=NULL root) and run both write paths
(handler.CreateComment, service.createAgentComment) through it, so every stored
reply's parent_id IS its thread root. Readers can now treat parent_id as the
thread root without re-walking. The agent-drift guard still compares the raw
parent_id to the trigger comment before normalization.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(comments): cold-start reads triggering thread, warm keeps --thread pointer

The since-delta rework dropped the thread-first read on the COLD path: a
first-time agent fell back to the flat `comment list` dump (oldest-first, cap
2000), burying the trigger's context in ancient chatter. Point cold start at the
triggering conversation instead via a shared BuildColdCommentsHint
(`--thread <trigger> --tail 30` + a --recent pointer for cross-thread
background). On the WARM path, --since is a pure time delta and can miss the
triggering thread's pre-anchor history, so BuildNewCommentsHint now also emits a
--thread pointer. Both surfaces (per-turn prompt + CLAUDE.md workflow) render via
the shared helpers so they cannot drift (PR #2816 rule).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 10:38:37 +08:00
Bohan Jiang
e1745d09ea MUL-2797 feat(agent): add Claude Opus 4.8 to model catalog & pricing (#3492)
Claude Code now ships Opus 4.8 (claude-opus-4-8). Add it to the three
places that enumerate Claude models so the picker, thinking-level
catalog, and usage cost estimates all recognize it:

- claudeStaticModels(): list Claude Opus 4.8 (Sonnet 4.6 stays default)
- claudeModelEffortAllow: Opus supports the full low..max set incl. xhigh
- MODEL_PRICING: $5/$25 in, $0.50 cache read, $6.25 5m cache write —
  same current-gen Opus tier as 4.5/4.6/4.7, confirmed against
  platform.claude.com/docs/en/about-claude/pricing

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-29 10:28:30 +08:00
Naiyuan Qing
d90732750f Revert "feat(comments): since-delta new-comment hint + default-on comment ses…" (#3455)
This reverts commit 5e78e5100a.
2026-05-28 17:52:59 +08:00
Bohan Jiang
09f9c7e2ce MUL-2764 feat(agent): wire mcp_config through ACP runtimes (Hermes / Kimi / Kiro) (#3439)
* MUL-2764 feat(agent): wire mcp_config through ACP runtimes (Hermes / Kimi / Kiro)

The MCP config Tab (#3419) already lets admins save mcp_config on an
agent, and the daemon plumbs it through to `agent.ExecOptions.McpConfig`
for every runtime. Claude and Codex consume it; the three ACP runtimes
(Hermes / Kimi / Kiro) ignored the field and hardcoded an empty
`mcpServers: []` in their `session/new` requests.

Add `buildACPMcpServers` to translate the Claude-style `{"mcpServers":
{"<name>": {...}}}` object-of-objects into the array shape ACP requires
(`[{name, command, args, env: [{name,value}, ...]}, ...]` for stdio;
`[{type, name, url, headers: [...]}, ...]` for http/sse), then pass the
translated array on `session/new` (all three) and `session/load` (kiro
resume). Malformed JSON fails the launch closed — same contract Codex's
`renderCodexMcpServersBlock` uses — so users see a real error instead of
silently running with no MCP servers. Individual unclassifiable entries
(no command, no url) are skipped with a warning so one bad row can't
take MCP down for the rest of the agent.

Co-authored-by: multica-agent <github@multica.ai>

* MUL-2764 fix(agent): wire mcp_config through ACP resume + gate http/sse on capability

Addresses the two blockers Elon raised on #3439:

1. session/resume now carries mcpServers for Hermes and Kimi (Kiro's
   session/load already did). Per the ACP Session Setup spec the resume
   path re-attaches MCP servers, and without this a resumed task lost
   access to MCP tools that a fresh task on the same agent would have
   had. Pinned with new TestHermesResumeIncludesMcpServers and
   TestKimiResumeIncludesMcpServers integration tests that inspect the
   recorded wire request.

2. Added extractACPMcpCapabilities + filterACPMcpServersByCapability so
   http/sse MCP entries get dropped (with a daemon-log warning naming
   the entry) when the runtime's initialize response doesn't advertise
   mcpCapabilities.http / .sse. Sending those entries to a stdio-only
   runtime is a spec violation and reliably tanks session/new; now they
   get filtered and the rest of the session still starts. Stdio entries
   pass through unconditionally. Both backends wire the filter in right
   after initialize so session/new and session/resume see the same
   filtered list.

Also added TestKiroLoadIncludesMcpServersFromConfig — Elon flagged that
no test pinned "non-empty mcp_config actually reaches the wire" for
Kimi/Kiro, so the wire assertions go in for all three runtimes.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-28 16:29:49 +08:00
Naiyuan Qing
5e78e5100a feat(comments): since-delta new-comment hint + default-on comment session resume (#3432)
* feat(db): add unresolved comment count + list filter queries

Add CountUnresolvedComments (excludes the agent's own comments) and
ListUnresolvedCommentsForIssue. Both are additive — existing callers stay
on the unfiltered queries — so old clients are unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(handler): support unresolved-only comment listing

Wire an additive `unresolved` query param into ListComments. Defaults off
so an old CLI that never sends it gets unchanged behavior; only true/1
enable it. Rejects combining unresolved with thread/recent (whole-issue
filter vs navigation models). Includes filter + count query tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(handler): plumb unresolved count + thread root into claim, gate comment resume

Populate trigger_parent_id (thread root of the trigger comment) and
unresolved_count (excludes the agent's own comments) on comment-triggered
claim responses. Both fields are omitempty so old daemons ignore them.

Gate comment-triggered session resume behind MULTICA_RESUME_COMMENT_SESSION
(default off): resumed comment turns can inherit the prior turn's "Done."
final message, so this stays an explicit rollout switch. The runtime-match
and poisoned-session guards still apply regardless of the flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(daemon): inject unresolved-comments hint + resolve step into agent brief

Add a shared BuildUnresolvedCommentsHint helper rendered on both the
per-turn prompt and the CLAUDE.md workflow (kept in sync per PR #2816). It
ships only the count and the relevant CLI call — never comment bodies — so
the server stays cheap. Thread case points at --thread <root>; issue case
points at --unresolved. Suppressed when the count is 0.

Also add a workflow step telling the agent to `multica comment resolve
<thread-root>` once a thread is fully handled, so the unresolved set
converges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cli): add comment list --unresolved and comment resolve command

Add an --unresolved filter to `issue comment list` (wired to the server's
unresolved param, rejected when combined with --thread/--recent) and a
top-level `comment resolve <id>` command that POSTs to the existing
/api/comments/{id}/resolve endpoint, letting an agent close threads it has
fully handled.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(comments): since-delta new-comment hint + default-on comment resume

Simplifies the comment-triggered agent flow down to what's actually needed:

- New-comment awareness is now a pure time delta: the claim response carries
  new_comment_count + new_comments_since (anchored on the prior run's
  started_at, never completed_at so a long run can't miss comments). The
  per-turn prompt and CLAUDE.md workflow render one line — "N new comment(s)
  since your last run, --since <ts>" — via a shared BuildNewCommentsHint so the
  two surfaces can't drift. Cold start (no prior run) falls back to a plain read.
- Comment-triggered tasks resume the prior session by default (same runtime),
  dropping the MULTICA_RESUME_COMMENT_SESSION rollout gate. The "Focus on THIS
  comment" prompt guard defends against inheriting the prior turn's "Done."
  marker; GetLastTaskSession still excludes poisoned sessions.
- Drops the resolved-based machinery from the first draft: CountUnresolvedComments
  / ListUnresolvedCommentsForIssue queries, the `comment list --unresolved`
  flag, the `multica comment resolve` command, and the resolve workflow step.
- Removes the verbose cursor-pagination paragraph from the comment prompt; the
  --thread/--recent/--since flags stay in the CLI/API, just no longer explained
  inline every turn.

Compatibility: new claim fields are omitempty (old daemons ignore them).
Comment resume is default-on and affects even old daemons, which already
consume prior_session_id.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 15:58:42 +08:00
Bohan Jiang
bae8a84abd MUL-2767 feat(agent): add Antigravity runtime backend (#3427)
* feat(agent): add Antigravity runtime backend

Adds Google's Antigravity CLI (`agy`) as the 12th supported coding-tool
runtime, alongside Claude / Codex / Cursor / Copilot / Gemini / Hermes /
Kimi / Kiro / OpenCode / OpenClaw / Pi.

The CLI emits plain assistant text on stdout (no structured event
stream), so the backend streams stdout line-by-line as `MessageText`
events and accumulates the same text as the final `Result.Output`.
Session resumption uses `--conversation <id>`; because the conversation
UUID is not echoed on stdout, the daemon routes `--log-file` to a temp
file and recovers the id from the glog-formatted log lines.

MUL-2767

Co-authored-by: multica-agent <github@multica.ai>

* fix(agent): correct Antigravity capability contract from Elon review

- ModelSelectionSupported now returns false for antigravity. `agy` has no
  --model flag and antigravityBackend deliberately drops opts.Model, so
  the UI must render a disabled "Managed by runtime" picker instead of
  an empty dropdown plus a silently-ignored manual-entry field. Also
  stop seeding AgentEntry.Model from MULTICA_ANTIGRAVITY_MODEL — the
  backend would silently ignore it.

- Antigravity skills now write to {workDir}/.agents/skills/, the CLI's
  native workspace path (inherits Gemini CLI's layout per
  https://antigravity.google/docs/gcli-migration). Previously they went
  to the .agent_context/skills/ fallback that the CLI doesn't scan.
  Runtime brief moves antigravity into the native-discovery branch and
  local_skills.go points the user-level skill root at
  ~/.gemini/antigravity-cli/skills for Runtime → local skill import.

- Doc + UI comment sync: providers matrix / install-agent-runtime /
  cloud-quickstart / agents-create / tasks (session-resume support) /
  skills / README all now list Antigravity in the right buckets, and
  the model-picker / model-dropdown comments cite antigravity (not the
  stale hermes reference) as the supported=false example.

New tests: TestAntigravityModelSelectionUnsupported,
TestInjectRuntimeConfigAntigravity (native discovery wording),
TestWriteContextFilesAntigravityNativeSkills (.agents/skills/ landing,
.agent_context/skills/ NOT written).

Co-authored-by: multica-agent <github@multica.ai>

* feat(provider-logo): swap inline placeholder for real Antigravity PNG

Replaces the hand-drawn planet+arc placeholder with the official asset
shipped from Downloads. Stored next to the component; bundlers
(Next.js / electron-vite) resolve the PNG import to a URL string at
build time. Added a small assets.d.ts so packages/views' tsc accepts
PNG / SVG module imports — there was no prior asset usage in this
package to register the declaration.

---------

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-28 15:40:05 +08:00
Bohan Jiang
d39da9f7f0 MUL-2764: feat(agents): add MCP config tab to agent detail page (#3419)
* MUL-2764: feat(agents): add MCP config tab to agent detail page

Backend already stores `mcp_config` and the daemon forwards it to the
runtime CLI via `--mcp-config`; this only adds the UI entry point.

The new tab presents a JSON editor that pretty-prints the existing
config, validates the buffer on every keystroke, and saves through the
existing `PUT /api/agents/{id}` path. Clearing the editor sends
`mcp_config: null`, which the handler reads as "wipe the column" and
the daemon falls back to the CLI's own default.

When the caller can't see secrets (agent actor, or a non-owner
non-admin member), the server already returns `mcp_config: null` with
`mcp_config_redacted: true`; the tab renders a read-only "configured
but hidden" state in that case so a non-privileged member cannot
silently overwrite an admin-owned config by saving an empty editor.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agents): MCP tab — preserve in-flight edits + warn non-Claude runtimes

- Fix stale-editor sync: compare the local draft against the *previous*
  original via a ref, so a background agent refetch updates an untouched
  editor instead of being silently ignored. Without this, a draft equal to
  the OLD original was treated as user-edited after the prop changed, and
  the next Save would write the old config back over a concurrent admin
  edit.
- Surface a notice inside the tab when the agent's runtime provider is not
  Claude — today's daemon only forwards mcp_config via Claude's
  --mcp-config, so saving on e.g. a Codex agent was silent but ineffective.
- Tests for both: rerender resyncs an untouched editor, rerender preserves
  an in-flight edit, warning renders on non-Claude / hides on Claude.

MUL-2764

Co-authored-by: multica-agent <github@multica.ai>

* MUL-2764: feat(agents): codex MCP support + hide MCP tab on unsupported runtimes

- Backend: codex.go now translates agent.mcp_config (Claude-style
  `{"mcpServers": {...}}`) into `-c mcp_servers.<name>=<inline-toml>`
  flags for `codex app-server`, so MCP servers configured in the UI
  reach Codex's per-task config layer. Bad mcp_config JSON downgrades
  to a warn-and-skip so it can't break the agent launch.
- Frontend: AgentOverviewPane hides the MCP tab when the agent's
  runtime provider doesn't read mcp_config — only `claude` and `codex`
  are supported today, every other provider sees no MCP tab. The
  previous in-tab warning is removed (no longer reachable).
- New shared helper `providerSupportsMcpConfig` lives in
  `@multica/core/agents` so views and any future caller share one list
  of MCP-aware providers.
- Tests: new go-side coverage for stdio + url + multi-server inputs,
  TOML string escaping, malformed-input fallback, and arg ordering vs
  custom_args; new views-side coverage for which providers surface the
  MCP tab. En + zh-Hans copy and parity test refreshed.

Co-authored-by: multica-agent <github@multica.ai>

* MUL-2764: fix(agents): keep codex mcp_config secrets out of argv/logs

Move the agent's mcp_config from a `-c mcp_servers.<id>=<inline-toml>`
argv flag into a daemon-managed `[mcp_servers.*]` block inside the
per-task `$CODEX_HOME/config.toml`. mcp_servers.<id>.env is a documented
Codex config field and the UI already treats mcp_config as redacted for
non-admins; argv would have leaked those values into `ps aux` and the
`agent command` log line. The file is forced to 0600 to keep secrets in
the daemon owner's lane regardless of the seed file's mode.

Also drop user-supplied `-c/--config mcp_servers.*` entries from
custom_args. Codex `-c` is last-wins (verified against codex-cli 0.132.0),
so without filtering, a custom_args entry could silently shadow whatever
the MCP Tab saved.

Strip inherited `[mcp_servers.*]` tables from the per-task config.toml
when the agent has its own mcp_config, mirroring Claude's
`--strict-mcp-config`: avoids TOML "table already exists" errors on
name collisions and matches admin expectations that the MCP Tab is the
authoritative source for that task.

Co-authored-by: multica-agent <github@multica.ai>

* MUL-2764: fix(agents): codex mcp_config three-state semantics + custom_args compat

Address the third review pass:

1. Distinguish nil vs present-but-empty mcp_config. `{}` and
   `{"mcpServers":{}}` now count as "admin saved an explicit (empty)
   managed set" — strip inherited user `[mcp_servers.*]` and pin an
   empty managed marker block. Only SQL NULL / JSON `null` map to
   "absent" and fall back to the user's global `~/.codex/config.toml`.
   This aligns Codex with the API's three-state contract (omit / null
   / object) and with Claude's `--strict-mcp-config` semantics.

2. Fail closed on `ensureCodexMcpConfig` errors and on managed
   mcp_config without CODEX_HOME. Previous warn-and-launch would
   silently inherit the user's global MCP servers and look identical
   to a successful apply — exactly the surprise the MCP Tab is meant
   to remove.

3. Only filter `-c mcp_servers.*` from `custom_args`/`extra_args`
   when the agent has a managed mcp_config. Pre-MUL-2764 agents that
   configured MCP via custom_args keep working; once an admin opts
   in via the MCP Tab the daemon owns the `mcp_servers` namespace
   and overrides are dropped (last-wins safety).

4. Update mcp_config locale intro to mention $CODEX_HOME/config.toml
   instead of the now-removed `-c mcp_servers.*` argv path.

Tests:
- Split `TestEnsureCodexMcpConfigEmptyInputsAreNoop` into
  `TestEnsureCodexMcpConfigAbsentLeavesUserTablesAlone` (nil/null)
  and `TestEnsureCodexMcpConfigEmptyManagedSetStripsUserMcp` (`{}`,
  `{"mcpServers":{}}`).
- Add `TestEnsureCodexMcpConfigEmptyManagedSetIdempotent` to pin
  byte-identical reruns on the empty managed marker block.
- Add `TestHasManagedCodexMcpConfig` covering the eight relevant
  inputs.
- Add `TestBuildCodexArgsPreservesCustomMcpOverridesWhenUnmanaged`
  and `TestBuildCodexArgsDropsCustomMcpOverridesWhenManaged` to
  pin the new gating.
- Add `TestCodexExecuteFailsClosedWhenMcpConfigInvalid` and
  `TestCodexExecuteFailsClosedWhenManagedMcpButNoCodexHome` for the
  Execute paths.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-28 15:11:28 +08:00
Bohan Jiang
2bda4065d0 MUL-2708: fix(agent): preserve multi-line Pi prompt on Windows by bypassing the .cmd shim (#3417)
Pi is installed on Windows via npm, which lays down `pi.cmd` → `pi.ps1`
→ `node_modules/@mariozechner/pi-coding-agent/dist/cli.js`. The daemon
spawns Pi with `exec.Command("pi", ...)`; PATHEXT resolves that to
`pi.cmd`, and cmd.exe expands `%*` in the shim by re-tokenising the
original command line, which truncates any argv containing newlines.

buildPiArgs passes the full prompt as the last positional argv, so the
multi-line system+user prompt is silently cut at the first newline
before it reaches the JS entrypoint. The session JSONL then records
only the first line ("You are running as a chat assistant for a Multica
workspace.") and Pi replies as if the user message were missing
(GitHub multica-ai/multica#3306).

Mirror the existing cursor-agent fix: when LookPath resolves Pi to a
.cmd/.bat launcher and a sibling pi.ps1 exists, invoke PowerShell with
`-File <ps1>` directly and forward each arg as a discrete token. This
keeps us on the official launch path while skipping the cmd.exe %*
re-expansion. Falls back to the original launcher when pi.ps1 or
PowerShell can't be located.

The Windows test asserts the rewrite produces the expected argv and
that the multi-line positional prompt survives unchanged.

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-28 12:36:16 +08:00