Commit Graph

62 Commits

Author SHA1 Message Date
Bohan Jiang
674be86add fix(tasks): cancel autopilot run_only & quick_create tasks (MUL-2827) (#3615)
CancelTaskByUser (POST /api/tasks/{taskId}/cancel) keyed cancellation off
issue_id / chat_session_id alone, so any task whose only source link was
autopilot_run_id (run_only autopilots) or quick_create context fell into the
dead else branch and 404'd with "task not found" — even though the task was
visible (and showed a cancel X) on the agent Activity tab.

Enforce tenancy uniformly through the task's owning agent instead: agent_id is
NOT NULL on every task row (ON DELETE CASCADE), and agents are workspace-scoped,
so GetAgentTaskInWorkspace (task JOIN agent ON workspace) is a single tenant
guard that works regardless of which optional source FK is set — including
orphan tasks whose autopilot_run_id was SET NULL after the autopilot was
deleted. Privacy layers on top: chat tasks stay creator-only, and every other
task mirrors the agent Activity / snapshot private-agent visibility gate via
canAccessPrivateAgent so the id-only endpoint is never more permissive than the
surface that exposes the task.

Tests cover run_only (same-ws success, cross-ws 404 no-mutation), quick_create,
retry clones, issue-task regression, chat non-creator 403, and private-agent
plain-member 403.

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-01 22:11:27 +08:00
Naiyuan Qing
3187bbf90c feat(comments): re-add since-delta + cold-start thread read + parent-root write normalization (#3494)
* feat(comments): since-delta new-comment hint + default-on comment session resume (#3432)

* feat(db): add unresolved comment count + list filter queries

Add CountUnresolvedComments (excludes the agent's own comments) and
ListUnresolvedCommentsForIssue. Both are additive — existing callers stay
on the unfiltered queries — so old clients are unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(handler): support unresolved-only comment listing

Wire an additive `unresolved` query param into ListComments. Defaults off
so an old CLI that never sends it gets unchanged behavior; only true/1
enable it. Rejects combining unresolved with thread/recent (whole-issue
filter vs navigation models). Includes filter + count query tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(handler): plumb unresolved count + thread root into claim, gate comment resume

Populate trigger_parent_id (thread root of the trigger comment) and
unresolved_count (excludes the agent's own comments) on comment-triggered
claim responses. Both fields are omitempty so old daemons ignore them.

Gate comment-triggered session resume behind MULTICA_RESUME_COMMENT_SESSION
(default off): resumed comment turns can inherit the prior turn's "Done."
final message, so this stays an explicit rollout switch. The runtime-match
and poisoned-session guards still apply regardless of the flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(daemon): inject unresolved-comments hint + resolve step into agent brief

Add a shared BuildUnresolvedCommentsHint helper rendered on both the
per-turn prompt and the CLAUDE.md workflow (kept in sync per PR #2816). It
ships only the count and the relevant CLI call — never comment bodies — so
the server stays cheap. Thread case points at --thread <root>; issue case
points at --unresolved. Suppressed when the count is 0.

Also add a workflow step telling the agent to `multica comment resolve
<thread-root>` once a thread is fully handled, so the unresolved set
converges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cli): add comment list --unresolved and comment resolve command

Add an --unresolved filter to `issue comment list` (wired to the server's
unresolved param, rejected when combined with --thread/--recent) and a
top-level `comment resolve <id>` command that POSTs to the existing
/api/comments/{id}/resolve endpoint, letting an agent close threads it has
fully handled.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(comments): since-delta new-comment hint + default-on comment resume

Simplifies the comment-triggered agent flow down to what's actually needed:

- New-comment awareness is now a pure time delta: the claim response carries
  new_comment_count + new_comments_since (anchored on the prior run's
  started_at, never completed_at so a long run can't miss comments). The
  per-turn prompt and CLAUDE.md workflow render one line — "N new comment(s)
  since your last run, --since <ts>" — via a shared BuildNewCommentsHint so the
  two surfaces can't drift. Cold start (no prior run) falls back to a plain read.
- Comment-triggered tasks resume the prior session by default (same runtime),
  dropping the MULTICA_RESUME_COMMENT_SESSION rollout gate. The "Focus on THIS
  comment" prompt guard defends against inheriting the prior turn's "Done."
  marker; GetLastTaskSession still excludes poisoned sessions.
- Drops the resolved-based machinery from the first draft: CountUnresolvedComments
  / ListUnresolvedCommentsForIssue queries, the `comment list --unresolved`
  flag, the `multica comment resolve` command, and the resolve workflow step.
- Removes the verbose cursor-pagination paragraph from the comment prompt; the
  --thread/--recent/--since flags stay in the CLI/API, just no longer explained
  inline every turn.

Compatibility: new claim fields are omitempty (old daemons ignore them).
Comment resume is default-on and affects even old daemons, which already
consume prior_session_id.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(comments): collapse reply parent_id to thread root on write

Comment threads are a 2-level model (root + flat replies, like Linear/Slack),
enforced today only by the UI and the agent path — the CreateComment handler
stored whatever parent_id it was handed, and the agent-side flatten walked just
one level, so a reply-to-a-reply could land at depth 3+. Add GetThreadRoot (a
recursive walk to the parent_id=NULL root) and run both write paths
(handler.CreateComment, service.createAgentComment) through it, so every stored
reply's parent_id IS its thread root. Readers can now treat parent_id as the
thread root without re-walking. The agent-drift guard still compares the raw
parent_id to the trigger comment before normalization.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(comments): cold-start reads triggering thread, warm keeps --thread pointer

The since-delta rework dropped the thread-first read on the COLD path: a
first-time agent fell back to the flat `comment list` dump (oldest-first, cap
2000), burying the trigger's context in ancient chatter. Point cold start at the
triggering conversation instead via a shared BuildColdCommentsHint
(`--thread <trigger> --tail 30` + a --recent pointer for cross-thread
background). On the WARM path, --since is a pure time delta and can miss the
triggering thread's pre-anchor history, so BuildNewCommentsHint now also emits a
--thread pointer. Both surfaces (per-turn prompt + CLAUDE.md workflow) render via
the shared helpers so they cannot drift (PR #2816 rule).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 10:38:37 +08:00
Naiyuan Qing
d90732750f Revert "feat(comments): since-delta new-comment hint + default-on comment ses…" (#3455)
This reverts commit 5e78e5100a.
2026-05-28 17:52:59 +08:00
Naiyuan Qing
5e78e5100a feat(comments): since-delta new-comment hint + default-on comment session resume (#3432)
* feat(db): add unresolved comment count + list filter queries

Add CountUnresolvedComments (excludes the agent's own comments) and
ListUnresolvedCommentsForIssue. Both are additive — existing callers stay
on the unfiltered queries — so old clients are unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(handler): support unresolved-only comment listing

Wire an additive `unresolved` query param into ListComments. Defaults off
so an old CLI that never sends it gets unchanged behavior; only true/1
enable it. Rejects combining unresolved with thread/recent (whole-issue
filter vs navigation models). Includes filter + count query tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(handler): plumb unresolved count + thread root into claim, gate comment resume

Populate trigger_parent_id (thread root of the trigger comment) and
unresolved_count (excludes the agent's own comments) on comment-triggered
claim responses. Both fields are omitempty so old daemons ignore them.

Gate comment-triggered session resume behind MULTICA_RESUME_COMMENT_SESSION
(default off): resumed comment turns can inherit the prior turn's "Done."
final message, so this stays an explicit rollout switch. The runtime-match
and poisoned-session guards still apply regardless of the flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(daemon): inject unresolved-comments hint + resolve step into agent brief

Add a shared BuildUnresolvedCommentsHint helper rendered on both the
per-turn prompt and the CLAUDE.md workflow (kept in sync per PR #2816). It
ships only the count and the relevant CLI call — never comment bodies — so
the server stays cheap. Thread case points at --thread <root>; issue case
points at --unresolved. Suppressed when the count is 0.

Also add a workflow step telling the agent to `multica comment resolve
<thread-root>` once a thread is fully handled, so the unresolved set
converges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cli): add comment list --unresolved and comment resolve command

Add an --unresolved filter to `issue comment list` (wired to the server's
unresolved param, rejected when combined with --thread/--recent) and a
top-level `comment resolve <id>` command that POSTs to the existing
/api/comments/{id}/resolve endpoint, letting an agent close threads it has
fully handled.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(comments): since-delta new-comment hint + default-on comment resume

Simplifies the comment-triggered agent flow down to what's actually needed:

- New-comment awareness is now a pure time delta: the claim response carries
  new_comment_count + new_comments_since (anchored on the prior run's
  started_at, never completed_at so a long run can't miss comments). The
  per-turn prompt and CLAUDE.md workflow render one line — "N new comment(s)
  since your last run, --since <ts>" — via a shared BuildNewCommentsHint so the
  two surfaces can't drift. Cold start (no prior run) falls back to a plain read.
- Comment-triggered tasks resume the prior session by default (same runtime),
  dropping the MULTICA_RESUME_COMMENT_SESSION rollout gate. The "Focus on THIS
  comment" prompt guard defends against inheriting the prior turn's "Done."
  marker; GetLastTaskSession still excludes poisoned sessions.
- Drops the resolved-based machinery from the first draft: CountUnresolvedComments
  / ListUnresolvedCommentsForIssue queries, the `comment list --unresolved`
  flag, the `multica comment resolve` command, and the resolve workflow step.
- Removes the verbose cursor-pagination paragraph from the comment prompt; the
  --thread/--recent/--since flags stay in the CLI/API, just no longer explained
  inline every turn.

Compatibility: new claim fields are omitempty (old daemons ignore them).
Comment resume is default-on and affects even old daemons, which already
consume prior_session_id.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 15:58:42 +08:00
Bohan Jiang
341ce7bfa5 feat: support local working directory for projects (MUL-2618 v1) (#3283)
* feat(project): add local_directory project_resource type (MUL-2662)

Adds a second project_resource type alongside github_repo so a project
can be pinned to an existing directory on a specific daemon (the v1 of
the local-working-directory flow tracked in MUL-2618). The ref schema is
{ local_path, daemon_id, label? }; local_path must be absolute and
daemon_id is required. The same (daemon_id, local_path) pair is allowed
on multiple projects by design — no UNIQUE constraint is added.

Implementation reuses the existing project_resource API surface: the new
type is wired through the validator switch with no migration, no new
events, and no daemon-handler changes (daemon already passes through
arbitrary resource types via ProjectResources). The CLI gains
--local-path / --daemon-id / --ref-label shortcuts so
`multica project resource add --type local_directory` mirrors the
existing `--type github_repo --url ...` ergonomics; the generic --ref
flag still works for both types.

Tests cover the full CRUD lifecycle, the same-path-across-projects
allowance, the same-path-same-project conflict, the validator rejections
(missing/blank/relative path, missing daemon_id, wrong payload type),
and the cross-platform isAbsoluteLocalPath helper.

Co-authored-by: multica-agent <github@multica.ai>

* feat(project): add update endpoint + label-shadow guard for project_resource (MUL-2662)

Addresses the Elon review on PR #3263:

- Add PUT /api/projects/{id}/resources/{resourceId} with sqlc query,
  matching handler, CLI `project resource update`, and a new
  EventProjectResourceUpdated WS event. resource_type stays immutable;
  ref/label/position are all individually optional.
- Catch same-project (daemon_id, local_path) collisions where only the
  embedded label differs — the row-level UNIQUE only matches the full
  ref JSON, so a label typo would otherwise let the same working
  directory bind twice.
- Tests cover the update lifecycle (label-only / ref / clear / 404 /
  invalid path) and the label-shadow conflict on both create and
  update; the in-place rename still succeeds because the conflict
  scan ignores the row being edited.

Incidental: regenerating sqlc picked up a missing skills_local scan in
UpdateAgentCustomEnv that drifted in from #3200.

Co-authored-by: multica-agent <github@multica.ai>

* fix(project): close bundled-create label-shadow gap + merge resource_ref on CLI update (MUL-2662)

Two follow-ups from MUL-2662 review round 2:

- CreateProject inline resources path now dedupes local_directory entries on
  (daemon_id, local_path) before opening the transaction. The DB-level
  UNIQUE(project_id, resource_type, resource_ref) constraint only fires on a
  full JSON match, so two rows with the same target but different `label`
  would otherwise slip past. Standalone POST/PUT already cover this via
  findLocalDirectoryConflict; bundled create was the missing surface.
- `multica project resource update` now seeds resource_ref from the existing
  row before applying per-type shortcut flags, so `--default-branch-hint x`
  on its own no longer constructs a payload missing `url` (which the server
  400s on). Local_directory partial edits get the same merge behavior.

Co-authored-by: multica-agent <github@multica.ai>

* feat(desktop): local_directory project_resource UI (MUL-2665) (#3273)

* feat(desktop): local_directory project_resource UI (MUL-2665)

First UI surface for the local-working-directory flow tracked in MUL-2618.
Lets users on the desktop pin a project to an existing folder on this
machine; web stays read-only since the per-daemon check can't be done in
the browser.

What's new for the renderer:

- ProjectResourcesSection grows a desktop-only "Add local directory"
  button next to the existing GitHub-repo popover. Clicking it opens
  Electron's native folder picker, validates the path through a new
  IPC pair (existence + r/w), and submits a project_resource of
  resource_type=local_directory with daemon_id pulled live from
  daemonAPI.getStatus.
- LocalDirectoryRow renders the rename pencil + path tooltip, and
  greys out when ref.daemon_id != this machine's daemon_id (with a
  "only available on the machine that registered this directory"
  tooltip). Delete stays enabled so users can drop stale registrations
  from any device.
- LocalDirectoryHint sits above the issue-detail comment composer and
  shows "Agent will work in-place at {label} ({path})" when the issue's
  project has a local_directory matching this daemon. Hidden on web.
- TaskStatusPill picks up a new "waiting_for_directory_release" stage
  that the daemon will publish when it dequeues a task but can't
  acquire the path lock. The render is in place now so the daemon
  sibling subtask can wire the status string without an additional UI
  PR.

Plumbing:

- @multica/core/types gains LocalDirectoryResourceRef +
  UpdateProjectResourceRequest, and the api client gets the matching
  PUT method backed by the server endpoint that landed in
  2ac3faebb (MUL-2662). A useUpdateProjectResource hook drives the
  in-place label edit.
- New Electron handlers under apps/desktop/src/main/local-directory.ts:
    local-directory:pick     -> dialog.showOpenDialog (openDirectory)
    local-directory:validate -> stat + access(R_OK + W_OK)
  exposed through the preload as desktopAPI.pickDirectory /
  validateLocalDirectory. View code talks to them via a thin
  packages/views/platform helper that returns reason=unsupported on
  web instead of crashing.
- useLocalDaemonStatus exposes the local daemon's id, device name, and
  running flag from daemonAPI.onStatusChange so the renderer can do the
  cross-device match without coupling to the desktop preload typings.

Tests:

- pickStageKeys gets a unit test covering the new stage and proving
  the directory-release status outranks availability hints.
- LocalDirectoryHint tests cover the four render branches (no project,
  no daemon, foreign daemon, matching daemon).
- i18n parity stays green; new keys added under projects.resources.*
  and chat.status_pill.stages.waiting_for_directory_release in both
  locales.

Out of scope (will land separately):
- The daemon-side waiting/lock signal that flips the pill into the
  new state.
- Adding local_directory to the create-project modal's bulk
  attach flow.
- Docs page refresh for project-resources.mdx — left for the
  MUL-2618 umbrella sweep.

Co-authored-by: multica-agent <github@multica.ai>

* fix(desktop): hide rename for foreign daemon local_directory rows (MUL-2618)

Address review nit on #3273: the rename pencil was gated only by
`canEdit`, so a foreign / unknown-daemon row still showed it even
though the spec says cross-device rows are disabled. Gate rename on
`!mismatch` so it disappears on those rows; delete stays available
so a stale registration can still be dropped from any device.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>

* feat(daemon): local_directory execution + path mutex + GC exception (MUL-2663) (#3274)

* feat(daemon): local_directory execution + path mutex + GC exception (MUL-2663)

Wires up the daemon side of the local_directory project_resource introduced
in MUL-2662. When a task is dispatched against a project whose resources
include a local_directory pinned to this daemon's UUID, the daemon now:

  - Validates the path (absolute, exists, daemon process can read+write,
    not in the system-root / $HOME blacklist) and fails the task fast on
    any precondition violation, with a user-readable reason.
  - Serialises concurrent tasks on the same on-disk path via a
    daemon-local LocalPathLocker keyed by symlink-resolved realpath. The
    lock is held for the entire task lifetime (claim → context write →
    agent → result report).
  - When the lock is contended, the daemon flips the row to a new
    waiting_local_directory status on the server (carrying a wait_reason
    like "<path> (held by task <short id>)") so the UI can render
    "等待本地目录释放" instead of leaving the row silently in dispatched
    past the sweeper timeout. The status accepts being woken into running
    once the lock is acquired.
  - Sets execenv.WorkDir to the user's path (no copy, no mount). envRoot
    still lives under workspacesRoot/<wsID>/ and hosts output/, logs/, and
    .gc_meta.json — the daemon's logbook for the run.
  - Stamps GCMeta.LocalDirectory=true so the GC loop never RemoveAlls
    envRoot for these tasks (gcActionClean → gcActionCleanArtifacts,
    gcActionOrphan → gcActionSkip). The user's directory was never under
    envRoot to begin with, so this is defense in depth.
  - Skips execenv.Reuse for local_directory tasks because the prior
    WorkDir is the user's path and reusing it through that code path
    loses the envRoot association the GC loop needs. Prepare is cheap
    here (no clone, no copy), so always running it is fine.

Server-side protocol changes:

  - New CHECK value 'waiting_local_directory' on agent_task_queue.status
    plus a wait_reason TEXT column (migration 109).
  - All cancel / active / counted-as-running / orphan-recovery queries
    expanded to include the new status; FailStaleTasks intentionally
    excludes it (the daemon owns the wait).
  - New SQL MarkAgentTaskWaitingLocalDirectory(id, reason) and a relaxed
    StartAgentTask that accepts both dispatched and
    waiting_local_directory as preconditions (and clears wait_reason on
    the way through).
  - New POST /api/daemon/tasks/{taskId}/wait-local-directory endpoint,
    TaskService.MarkTaskWaitingLocalDirectory broadcaster, and matching
    daemon Client.MarkTaskWaitingLocalDirectory.

Tests cover: path blacklist + R/W enforcement, mutex serialisation +
ctx-cancelled wait, lock handover between two tasks, GC never returns
gcActionClean / gcActionOrphan for local_directory rows (with negative
control for the standard path), and Prepare/Cleanup correctly substitute
+ protect the user's WorkDir.

The desktop UI side (UI for adding a local_directory resource, surfacing
the "等待本地目录" badge) is MUL-2665; the agent-task lifecycle changes
(no branch switch, dirty-tree tolerant, auto-commit) are MUL-2664.

This PR targets the shared MUL-2618 v1 feature branch agent/j/912b8cb1,
not main; the whole v1 will be merged to main together when complete.

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): tighten local_directory status, symlink, cancel handling (MUL-2618)

Address the 3 must-fix items from Elon's review of PR #3274.

1. Status string unified. The server / daemon publish
   `waiting_local_directory`; align views, locales, and the
   pickStageKeys test (PR #3273 had used `waiting_for_directory_release`
   on a placeholder string). Without this, the daemon's wait state
   never reached the pill once the two siblings merged.

2. validateLocalPath now also runs the blacklist against the
   symlink-resolved realpath, with macOS's `/etc` -> `/private/etc`
   redirect handled via `isBlacklistedRealPath` which compares
   canonical forms. Without this, a symlink such as
   `/Users/me/proj/home -> /Users/me` slipped the literal $HOME check
   while every daemon write still landed in the user's home. Tests
   cover symlink-to-home, symlink-to-system-root, and the negative
   case (symlink to a regular subdirectory).

3. acquireLocalDirectoryLockIfNeeded now spins up a cancellation
   watcher inside `onWait` (lazy — the fast path stays free) so the
   gap between dispatch and StartTask responds to server-side cancel
   or row deletion. If the watcher fires while the daemon is parked
   on the path mutex, the lock-wait context is cancelled, Acquire
   returns promptly, and the helper exits silently the same way the
   run-phase poller does. New TestAcquireLocalDirectoryLock_CancelDuringWait
   exercises the path end-to-end with a fake server.

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): unconditional canonical blacklist + Windows drive-root generalisation (MUL-2618)

- validateLocalPath now always runs isBlacklistedRealPath on the
  symlink-resolved path, not only when it differs from absPath. The old
  guard let users type the canonical form of an OS-symlinked banned root
  (e.g. /private/tmp, /private/etc, /private/var on macOS) straight
  through, since EvalSymlinks is a no-op on already-canonical input.
- Windows drive-root rejection moved off the static C/D/E/F enumeration
  onto filepath.VolumeName via a new isDriveRoot helper, so removable /
  network drives mounted at G:..Z: and UNC \\server\share roots are also
  blocked. systemRootBlacklist keeps the well-known C:\ trees only.
- Tests: macOS-only case exercises direct /private/{tmp,etc,var}; a
  new TestIsDriveRoot covers the Windows generalisation (skipped on
  POSIX runners by runtime guard).

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>

* feat(views): wire waiting_local_directory end-to-end in issue UI + presence (MUL-2618)

Connect the daemon-emitted `task:waiting_local_directory` and `task:running`
events through to issue execution log, sticky agent banner, activity indicator,
and agent presence so a parked task is no longer invisible on the issue page.

- Add `waiting_local_directory` to `AgentTask.status` and the typed
  `task:running` / `task:waiting_local_directory` WS event payloads.
- Chat realtime sync writes both new statuses into the pending-task cache so
  the chat StatusPill flips out of a stale `dispatched` frame.
- ExecutionLogSection: count `waiting_local_directory` as active, add tone +
  status label, treat parked tasks the same as dispatched for time anchor /
  transcript visibility / terminate-confirm note.
- AgentLiveCard: subscribe to both new events, rank the parked state between
  dispatched and queued, and surface a "is waiting for the local directory"
  banner with the muted "Clock" treatment used for queued.
- IssueAgentActivityIndicator: route parked tasks into the queued bucket so
  the hover stack and chip stay visible.
- derive-presence: parked tasks count toward `queuedCount` so the agent
  workload chip stays out of `idle` while the daemon waits on the path lock.
- Locales: add `agent_live.is_waiting_local_directory` and
  `execution_log.status_waiting_local_directory` (en + zh-Hans).

Co-authored-by: multica-agent <github@multica.ai>

* feat(project): enforce one local_directory per (project, daemon) (MUL-2618)

The daemon-side resolver picks the first matching local_directory by
daemon_id, so allowing two rows on the same daemon — even at different
paths — let the agent silently write into whichever sorted first. Tighten
the invariant top to bottom:

- server: `findLocalDirectoryConflict` rejects any second row sharing a
  daemon_id, regardless of `local_path` or label. Bundled-create surface in
  `CreateProject` runs the same daemon-scoped dedupe up front.
- daemon: `findLocalDirectoryAssignment` fails fast when it finds more than
  one row pinned to the current daemon (older API client / direct DB
  writes can still produce that state — refuse to guess).
- desktop UI: hide the "Add local directory" action once the current
  daemon owns a row on this project, with a hint and a defensive toast on
  the call path; foreign-daemon rows stay visible read-only as before.
- Tests:
  * daemon: new `two local_directory rows on this daemon fail fast` /
    `local_directory rows on different daemons coexist` cases.
  * handler: rewrite the legacy `LabelShadow` cases as
    `DaemonScopedConflict` / `BundledLocalDirectoryDaemonConflict` —
    asserts 409 on same-daemon different-path, 201 on per-daemon bundles.
- Locales: en + zh-Hans copy for the new hint + toast.

Co-authored-by: multica-agent <github@multica.ai>

* chore(sqlc): drop stale skills_local in UpdateAgentCustomEnv (MUL-2618)

Follow-up to the main-merge in 0f8e8ca7: the auto-merge preserved most
of main's skills_local revert but kept the column reference inside the
UpdateAgentCustomEnv scanner because that block hadn't been touched by
either side. Re-running `sqlc generate` regenerates the file without
skills_local in this query, matching the rest of the file and the
post-revert schema.

Co-authored-by: multica-agent <github@multica.ai>

* feat(create-project): binary source picker — repos OR local directory

Turn the create-project dialog's "Repos" pill into a binary Source
picker. A project's source is mutually exclusive: either a set of
GitHub repos (worktree mode, default) or a single local working
directory (local mode, desktop-only). Mirrors the constraint the
backend will enforce next.

Behavior:
- Pill shows the active mode's selection (GitHub icon + repo count, or
  folder icon + local label/path).
- Popover has a 2-tab segmented control at the top; the Local tab is
  hidden entirely on web (local_directory needs a daemon_id).
- Local tab requires the daemon online — amber notice + disabled picker
  when offline, re-renders automatically via useLocalDaemonStatus.
- Switching tabs preserves the other side's stash, but handleSubmit
  only emits the resource matching the active sourceMode, so abandoned
  picks never leak into the created project.

Backend mutual-exclusion validation + the resources-section
conditional-add-button still to come — this PR just unblocks the
dialog so it can be demoed.

* fix(mobile): cover waiting_local_directory in run row status maps (MUL-2618)

---------

Co-authored-by: multica-agent <github@multica.ai>
Co-authored-by: Multica J <j@multica.ai>
2026-05-27 13:44:31 +08:00
Multica Eve
744b474199 revert(agent): remove per-agent local skill toggle (MUL-2603) (#3286)
* Revert "feat(agents): hide skills_local toggle for runtimes that don't honour it (MUL-2603) (#3276)"

This reverts commit 0b50c5a209.

Co-authored-by: multica-agent <github@multica.ai>

* Revert "fix(agent): surface host OAuth token via env var on macOS isolation (MUL-2603) (#3267)"

This reverts commit a67bf81225.

Co-authored-by: multica-agent <github@multica.ai>

* Revert "fix(agents): tighten skills-tab intro and drop redundant import hint (#3265)"

This reverts commit d8075a5775.

Co-authored-by: multica-agent <github@multica.ai>

* Revert "fix(agent): mirror $HOME/.claude.json into isolated config dir (MUL-2661) (#3261)"

This reverts commit 40da88fc16.

Co-authored-by: multica-agent <github@multica.ai>

* Revert "feat(agent): per-agent toggle to isolate host-machine skills (MUL-2603) (#3200)"

This reverts commit 960befa56f.

Co-authored-by: multica-agent <github@multica.ai>

* Add migration cleanup for reverted agent skills toggle

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: Eve <eve@multica-ai.local>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-26 17:00:01 +08:00
LinYushen
bf8a346cf0 feat(runtimes): cascade-archive agents on runtime delete (MUL-2667) (#3266)
* feat(runtimes): cascade-archive agents on runtime delete (MUL-2667)

Replace the bare 409 "cannot delete runtime: it has active agents" with a structured response carrying the blocking agent list, and wire a cascade endpoint that archives those agents, cancels their tasks, pauses dangling autopilots and deletes the runtime in a single transaction. The unified DeleteRuntimeDialog opens directly in cascade mode when the runtime has bound agents, pivots from light to cascade if the strict DELETE refuses with runtime_has_active_agents, and re-prompts when the cascade refuses with runtime_delete_plan_changed (live agent set drifted while the dialog was open). The online-local self-healing rule is preserved at the affordance level (kebab hidden, Diagnostics button disabled with tooltip) and re-checked at confirm time as defence in depth.

Co-authored-by: multica-agent <github@multica.ai>

* fix(runtimes): close cascade race + i18n delete dialog (PR #3266 review)

- Acquire FOR UPDATE on the runtime row at the top of the cascade tx so
  FK-validated agent INSERTs/UPDATEs that would point at this runtime
  block until commit, and lock each currently-active agent row via
  ListActiveAgentsByRuntimeForUpdate so a concurrent archive/move of
  an existing active row also blocks.
- Switch the bulk archive from runtime-keyed (ArchiveAgentsByRuntime)
  to ID-keyed (ArchiveAgentsByIDs), narrowed to the user-confirmed
  expected_active_agent_ids set. Combined with the runtime row lock,
  this guarantees no agent outside the confirmed plan can be silently
  archived between plan-compare and archive even at read-committed.
- Wire delete-runtime-dialog.tsx to runtimes locale via useT(); add
  detail.delete_dialog.{light,cascade} keys (EN with _one/_other
  plurals, zh-Hans _other) covering titles, descriptions, warning,
  notices, checkbox, buttons, table headers, presence labels, and
  toasts. Resolves the i18next/no-literal-string CI failure.
- Locale parity test passes (51 tests). All 4 dialog test cases pass
  unmodified (EN copy preserves original wording). Full views vitest:
  91 files / 792 tests green; full server go test: green.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-26 14:59:38 +08:00
Bohan Jiang
960befa56f feat(agent): per-agent toggle to isolate host-machine skills (MUL-2603) (#3200)
* feat(agent): per-agent toggle to isolate host-machine skills (MUL-2603)

Adds an agent-scoped `skills_local` switch ("ignore" default / "merge") so
shared agents stop inheriting the operator's user-global Claude skill
directory. A single broken local skill on one operator's machine was
crashing the Claude CLI before it ever read stdin — the daemon saw a
"broken pipe" with no recoverable signal (GitHub #3052).

- DB: migration 108 adds `agent.skills_local` (NOT NULL DEFAULT 'ignore'),
  with sqlc CreateAgent/UpdateAgent updates and handler validation.
- Claude runtime: when the agent is in "ignore" mode the backend points
  CLAUDE_CONFIG_DIR at an empty per-task scratch dir under the task cwd
  (fallback: OS temp), strips any inherited override, and cleans up after
  the run. Workspace skills under `{cwd}/.claude/skills/` still load.
  "merge" preserves the legacy inherit-from-machine behavior; Codex and
  other isolated backends are no-ops.
- UI: new Skills toggle in the Create Agent dialog and the Agent → Skills
  tab, with EN/zh-Hans copy and SkillsLocalToggle shared between the two.
- Tests: unit coverage for the new env helper, isolation dir lifecycle,
  full Claude execute paths (ignore + merge), and the handler tristate
  contract. Existing skills-tab test updated for the new copy.
- Docs: updated `/skills` docs (EN + ZH) and added a 0.3.7 changelog entry
  in the landing-page i18n.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agent): preserve claude login + validate skills_local input (MUL-2603)

Address Elon's review on PR #3200:

1. Skill isolation no longer drops the operator's Claude login. The
   per-task scratch dir now mirrors every entry under `~/.claude/`
   as symlinks except `skills/`, so `.credentials.json`, settings,
   plugins, etc. reach the CLI exactly as on the host while the
   user-global skills directory stays hidden. Without this, default
   `ignore` would have broken every Claude agent on a non-API-key
   host the moment migration 108 landed.

2. Internal CreateAgent callers (agent_template, onboarding_shim)
   now set `SkillsLocal: "ignore"`. The Go zero value was about to
   trip the migration-108 CHECK constraint and 500 template /
   onboarding agent creation.

3. Create / update handler validation no longer normalizes garbage
   to "ignore". The strict 400 path is now reachable on bad client
   input; the drift-safe `normalizeSkillsLocal` stays on the read
   side only.

UI copy + docs clarified that the toggle is Claude-only; other
runtimes ignore the setting.

Verification:
- `go test ./...` green (full suite locally).
- `pnpm --filter @multica/views exec vitest run agents/components/tabs/skills-tab.test.tsx` green.
- Handler DB-backed tests still skip locally without docker (same
  as Elon's run) — CI will validate the create / update paths
  against migration 108.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agent): mirror effective claude config dir with windows fallback (MUL-2603)

Address Elon's second-round review on PR #3200:

1. The per-task scratch dir now mirrors the *effective* host Claude
   config dir, not unconditionally `~/.claude/`. Precedence: agent
   `custom_env` CLAUDE_CONFIG_DIR > parent process env > `~/.claude/`.
   Without this, an operator who pinned Claude at a managed install
   (custom env CLAUDE_CONFIG_DIR) would get the wrong credentials in
   the scratch dir, because `buildClaudeEnv` strips that env before
   handing it to the child. We resolve the source up front and feed
   it to the mirror, so the override env still points at the right
   bytes.

2. Mirror entries now go through platform-aware linkers. On Windows
   without Developer Mode / admin, `os.Symlink` is denied, which
   previously left the scratch dir empty and broke Claude Code auth
   on default `ignore`. The new helpers try symlink first, then fall
   back to a directory junction (`mklink /J`) for dirs or a hardlink
   (same-volume content share) / copy for files. Mirrors the
   execenv/codex_home_link_windows.go pattern.

3. Tests:
   - `TestResolveHostClaudeConfigDir` locks in the custom_env >
     parent_env > `~/.claude` precedence.
   - `TestNewIsolatedClaudeConfigDirMirrorsCustomHostDir` confirms
     the scratch dir picks up `.credentials.json` from a synthetic
     custom host dir, proving the source resolution actually
     propagates into the mirror.
   - `TestNewIsolatedClaudeConfigDirEmptyHostIsNoop` documents the
     env-var-auth-only case (no host source ⇒ empty scratch dir).
   - `TestMirrorHostClaudeExceptSkillsWith_FallbackWhenSymlinkFails`
     exercises the Windows-no-Developer-Mode path via the new
     `mirrorHostClaudeExceptSkillsWith` seam, asserting credentials
     and sub-dir children still reach the scratch dir after the
     symlink stand-in fails.
   - `TestMirrorHostClaudeExceptSkillsWith_PropagatesFirstLinkError`
     confirms callers see the per-entry error when even fallback
     fails (so the warn-log fires on broken Windows installs).
   - `TestCopyFileRoundTrip` covers the last-resort copy fallback
     and its EXCL no-overwrite contract.
   - `TestClaudeExecuteIsolatesUsesCustomEnvSource` is the
     end-to-end check: an agent with custom_env CLAUDE_CONFIG_DIR
     reads its credentials from the pinned dir, not `~/.claude/`.

4. Docs: `apps/docs/content/docs/skills.{mdx,zh.mdx}` updated to
   describe the effective-source resolution and the Windows
   fallback chain so the docs match the runtime behaviour.

Verification:
- `go test ./...` green (full server suite locally, including
  `pkg/agent` 23 cases covering the new + existing isolation
  paths).
- `GOOS=windows GOARCH=amd64 go vet ./pkg/agent/...` and
  `go test -c -o /dev/null` both compile clean, confirming the
  Windows-tagged linker file builds.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agent): default skills_local to merge to preserve legacy behavior (MUL-2603)

Per Bohan's product decision on PR #3200, the per-agent host-skill toggle
defaults to "merge" — the pre-MUL-2603 inherit-from-machine behavior —
so existing personal workflows that rely on locally installed Claude
Skills keep working unchanged. Agent owners explicitly opt into "ignore"
when they need to harden a shared agent against a broken local skill on
one operator's machine (GitHub #3052).

Also audited all 11 runtimes for user-global skill discovery paths and
documented the scope of the toggle. Only Claude reads a user-global
`~/.claude/skills/`; Codex isolates via `CODEX_HOME`, the ACP backends
(Hermes / Kimi / Kiro) and the JSON-stream backends (Copilot / Cursor /
Gemini / Pi / OpenCode / OpenClaw) anchor discovery to the task workdir
and never read a user-global skill directory. UI copy and docs now say
"for runtimes that support it (currently Claude Code)" everywhere so
the scope is explicit.

Changes:

- Migration 108: column default flipped to 'merge'.
- Handler CreateAgent: missing field → "merge"; explicit "ignore" /
  "merge" still validated, garbage still 400.
- normalizeSkillsLocal: drift-safe coercion now lands on "merge" for
  anything that isn't the exact literal "ignore".
- agent_template.go / onboarding_shim.go: internal CreateAgent callers
  send "merge" instead of "ignore" to match the new default.
- Claude runtime (`claude.go`): isolate-mode gate flipped from
  `SkillsLocal != "merge"` to `SkillsLocal == "ignore"`, so "" (legacy
  daemons / older clients) and "merge" both walk `~/.claude/` directly.
- Create Agent dialog + Skills tab: toggle defaults to on (merge); only
  duplicate of an explicit "ignore" agent carries through. The
  isolation opt-in is now `skills_local: "ignore"` when the user flips
  off; "merge" is omitted from the request body.
- i18n (EN + zh-Hans): copy reframed — "On (default) — merged"; "Off —
  ignored. Recommended for shared agents".
- Docs (`/skills`, `/guides/agents.zh`): describe new default and
  enumerate which runtimes act on the toggle.
- Landing changelog 0.3.7: retitled "Per-Agent Local-Skill Toggle"; note
  the on-by-default behavior + off-to-isolate framing.
- Tests:
  - `TestClaudeExecuteIsolatesHostSkillsWhenIgnoreOptedIn` replaces the
    old by-default isolation case (now requires explicit "ignore").
  - New `TestClaudeExecuteDefaultModeKeepsHostConfigDir` locks in that
    default ExecOptions preserve the host CLAUDE_CONFIG_DIR.
  - `TestClaudeExecuteIsolatesUsesCustomEnvSource` now explicitly opts
    into "ignore" mode.
  - Handler tests: omitted → "merge"; explicit "ignore" round-trips;
    preserve-existing test seeds "ignore" and asserts "merge" flip-back.
  - `TestNormalizeSkillsLocal_DriftStaysSafe`: only literal "ignore"
    maps to ignore; everything else → "merge".
  - `skills-tab.test.tsx`: toggle ON by default; flip OFF when agent
    opted into "ignore". Intro-text matcher anchored to a more specific
    phrase so it no longer collides with the toggle hint copy.

Verification:
- `go test ./...` green (full server suite locally).
- `GOOS=windows GOARCH=amd64 go vet ./pkg/agent/...` and
  `go test -c -o /dev/null` both compile clean (windows-tagged linker
  file still builds).
- `pnpm typecheck` green across all packages and apps.
- `pnpm --filter @multica/views test` 88 files / 771 tests green.
- `pnpm --filter @multica/core test` 43 files / 390 tests green.
- Handler DB-backed tests still skip locally without docker; CI will
  validate the create / update paths against migration 108.

Co-authored-by: multica-agent <github@multica.ai>

* chore(landing): drop 0.3.7 changelog entry from this PR (MUL-2603)

The landing-page release notes belong in a separate release-prep PR, not in the feature PR.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agent): propagate skills_local=ignore to codex user-skill seed (MUL-2603)

Make the per-agent skills_local toggle real for Codex too, not just Claude.
Previously the toggle was only consumed by the Claude backend, while the
daemon's execenv layer always seeded Codex's per-task CODEX_HOME with the
host machine's user-installed skills from ~/.codex/skills/. A shared Codex
agent with skills_local=ignore could still inherit a broken local skill
from one operator's machine.

Now: PrepareParams/ReuseParams carry SkillsLocal; hydrateCodexSkills
skips seedUserCodexSkills when SkillsLocal == "ignore" so the per-task
CODEX_HOME exposes only workspace skills to the codex CLI. Default
("merge", or empty from older servers/clients) preserves existing
inherit-from-machine behavior. UI / docs are updated to reflect the
contract honestly: Claude Code and Codex honor the toggle; other
runtimes (Hermes / Kimi / Kiro / Copilot / Cursor / Gemini / Pi /
OpenCode / OpenClaw) leave $HOME untouched and discover user-level
skills natively, so the toggle is a no-op for them today.

New tests: TestPrepareCodexSkillsLocalIgnoreSkipsUserSeed,
TestPrepareCodexSkillsLocalMergeSeedsUserSkills, and
TestReuseCodexSkillsLocalIgnoreSkipsUserSeed cover Prepare(ignore),
Prepare(merge), and the toggle-flip-on-reuse path.

Co-authored-by: multica-agent <github@multica.ai>

* docs(skills): scope skills_local toggle copy to Claude Code + Codex (MUL-2603)

Off-state hint and Skills tab intro now explicitly call out Claude Code +
Codex as the only runtimes that honor the toggle, with "other runtimes
ignore this setting" wired into both states (en + zh-Hans), so users on
non-Claude/Codex agents don't read "Off" as runtime-wide isolation.

Docs (skills.mdx, skills.zh.mdx, guides/agents.zh.mdx) stop describing
Hermes / Kimi / Gemini / Copilot / Cursor / Pi / OpenCode / OpenClaw / Kiro
as having native user-level skill discovery; the daemon simply does not
manage user-level skill discovery for those runtimes today, and the toggle
is a no-op regardless of where it is set.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-26 13:26:33 +08:00
Bohan Jiang
13f74e651a feat(agents): remove custom_env from agent resources, add audited env endpoint (MUL-2600) (#3209)
* feat(agents): remove custom_env from agent resources, add audited env endpoint (MUL-2600)

The agent resource shape (list / get / create / update / archive /
restore responses + WebSocket events) no longer carries `custom_env`
values. Reads/writes of env now flow exclusively through a dedicated
`/api/agents/{id}/env` endpoint that is owner/admin-only, rejects
agent-actor sessions, applies a "****" sentinel preserve guard on
PUT, and writes a persistent audit row per reveal/update.

Why
- `multica agent list --output json` historically returned plaintext
  `custom_env` for owner/admin callers (the redaction gate gave only
  members the masked map). Any agent token running on the workspace
  inherits its owner's role and could read every other agent's
  secrets just by listing.
- Patching list/get redaction alone (PR #3175 direction) left
  symmetric leaks via mutation responses, WS events, the "reveal"
  path itself (no actor-aware auth), and a `****` overwrite footgun
  on UpdateAgent.

What changed
- Backend: drop `custom_env` from AgentResponse; add coarse
  `has_custom_env` + `custom_env_key_count`. Strip env handling from
  UpdateAgent (silently ignored if sent). Keep CreateAgent's
  custom_env acceptance.
- Backend: new GET/PUT `/api/agents/{id}/env` handlers in
  `internal/handler/agent_env.go`:
  - resolveActor → 403 for agent actors (closes the lateral-movement
    path).
  - Owner/admin role gate via existing helper.
  - PUT honours value == "****" as "preserve existing value".
  - Both write to `activity_log` with `agent_env_revealed` /
    `agent_env_updated` actions. Audit details record key names only,
    never values.
- Daemon claim path (`ClaimAgentTask`) unchanged — `TaskAgentData`
  still carries plaintext env for runtime injection.
- SQL: new `UpdateAgentCustomEnv` query; sqlc regenerated (v1.31.1).
- CLI: new `multica agent env get|set` subcommands. `--custom-env*`
  flags removed from `multica agent update`; the no-fields error
  now points to the new path.
- Frontend: drop env fields from `Agent` + `UpdateAgentRequest`; add
  `getAgentEnv` / `updateAgentEnv` client methods; rewrite env-tab
  to show "N variables configured" + explicit "Reveal & edit"
  button, fetching values only on intentional reveal.
- Locales: parity-safe additions to en + zh-Hans.
- Docs: agents-create.{mdx,zh.mdx} reflect the new threat model and
  endpoint.
- Mobile: schema drops `custom_env` / `custom_env_redacted`, adds
  metadata fields.

Tests
- Handler tests pinned the new invariants: no env in list/get
  responses, owner reveal happy-path + audit row, agent-actor 403,
  `****` sentinel preserves real values, UpdateAgent silently
  ignores `custom_env`, pure `mergeAgentEnv` cases.
- CLI tests pivot to the new flag surface: `agent update` MUST NOT
  expose the env flags; `agent env set` MUST expose
  --custom-env-stdin/--custom-env-file.
- Frontend test fixtures updated; pnpm typecheck / test / lint
  pass cleanly.

This is a breaking API change. Scripts that read `custom_env` from
`/api/agents` must migrate to `GET /api/agents/{id}/env`.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agents): close actor-spoofing + audit fail-closed in env endpoints (MUL-2600)

Addresses Elon's review of #3209:

* Mint a task-scoped `mat_` token per claim, bound to (agent, task,
  workspace, owner). Daemon injects it into the agent process in place
  of its own credential. Auth middleware authoritatively rebuilds
  X-User-ID / X-Agent-ID / X-Task-ID from the token row and sets
  X-Actor-Source=task_token; that header is server-set only — incoming
  values are stripped before any auth branch runs. resolveActor honors
  the header so an agent that strips X-Agent-ID / X-Task-ID still
  resolves as actor=agent.
* GetAgentEnv / UpdateAgentEnv are now fail-closed on audit-log
  failures: GET refuses to return plaintext, PUT persists inside the
  same tx as the audit row so they commit/roll back together.
* PUT /api/agents/{id} returns 400 when the body carries custom_env
  instead of silently dropping it — directs callers to the audited env
  endpoint.
* Agent actors never see mcp_config, even when the underlying member
  is owner/admin; mutation broadcasts go through a redaction shim so
  WS subscribers don't pick it up either.
* Fix backend test that asserted dense JSON (jsonb::text renders
  whitespace) and frontend test that assumed a unique "Test User"
  match.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agents): close residual MUL-2600 gaps from review (MUL-2600)

Migration 108 FK now correctly references agent_task_queue(id) instead
of the non-existent agent_task table; the previous name blocked CI
backend migrations.

Task-token-authenticated requests can no longer be re-routed at a
different workspace by passing workspace_slug / workspace_id /
?workspace_id / a URL workspace param. ResolveWorkspaceIDFromRequest
and resolveWorkspaceUUID both short-circuit on X-Actor-Source=task_token
and return only the token-bound X-Workspace-ID; buildMiddleware adds a
defence-in-depth 403 if any URL-resolved workspace disagrees with the
token binding.

mcp_config no longer leaks back to agent actors through UpdateAgent /
CreateAgent / ArchiveAgent / RestoreAgent HTTP responses — the same
redactAgentResponseForActor helper that GetAgent/ListAgents use is now
applied to mutation responses too. WS broadcasts were already redacted
via broadcastAgentResponse.

FailTask and every TaskService cancel path (CancelTask /
CancelTasksForIssue / CancelTasksForAgent / CancelTasksByTriggerComment
/ BroadcastCancelledTasks) now eagerly DeleteTaskTokensByTask so the
mat_ token's 24h window doesn't outlive a terminated task. Failure is
non-fatal — the FK cascade and expiry remain durable guards.

Doc-only: clarify that PUT /api/agents/{id} now hard-rejects bodies
that carry custom_env (was previously "silently ignores").

Tests:
- middleware: TestResolveWorkspaceIDFromRequest gains a task_token
  case asserting client-supplied slug/id/query cannot override the
  bound workspace.
- handler: TestUpdateAgent_RedactsMcpConfigForAgentActor and
  TestUpdateAgent_KeepsMcpConfigForMemberActor pin the mutation-
  response redaction contract per actor type.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agents): match redacted mcp_config as JSON null, not Go nil (MUL-2600)

`AgentResponse.McpConfig` is `json.RawMessage` without `omitempty`, so
the redacted response serialises as `"mcp_config": null`. On decode,
`json.RawMessage` keeps the literal bytes `null` rather than collapsing
to Go nil, which made the assertion fire on a non-leak.

The product contract (field always present, distinguished from "no
config" via `mcp_config_redacted`) is intentional, so adjust the test
to check for "no secret-bearing content" instead of weakening the
contract via `omitempty`.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-25 18:42:48 +08:00
YOMXXX
29c2a5d18f fix(daemon): reclaim stale dispatched claims (MUL-2485) (#2872)
* fix(daemon): reclaim stale dispatched claims

* fix(daemon): widen stale claim reclaim window
2026-05-21 17:06:55 +08:00
iYuan
2f1f90c11a fix(agent): retry codex semantic inactivity fresh (#2593) 2026-05-20 20:03:39 +08:00
Bohan Jiang
2bec2221d2 feat(agent): per-agent thinking_level for claude + codex (MUL-2339) (#2865)
* feat(agent): persist thinking_level per agent (MUL-2339)

Adds a nullable `thinking_level` column to the `agent` table so the
backend can route a runtime-native reasoning/effort token (e.g. Claude's
`xhigh`, Codex's `minimal`) through to the agent CLI on every dispatch.

The column is intentionally TEXT rather than an enum — Claude and Codex
publish overlapping but distinct vocabularies and we want the persisted
value to round-trip exactly through whichever CLI receives it. NULL is
the "use runtime default" sentinel that every downstream consumer reads
as "do not inject --effort / reasoning_effort".

This commit is just the storage layer (migration + sqlc); subsequent
commits wire it through the API, daemon, and agent backends.

Co-authored-by: multica-agent <github@multica.ai>

* feat(agent-backend): inject reasoning effort for claude + codex (MUL-2339)

Extends ExecOptions with a runtime-native ThinkingLevel string and wires
it into the Claude and Codex backends. Discovery is driven by the local
CLI so the daemon advertises whatever the host install supports rather
than a hand-maintained list that goes stale.

Per Elon's PR1 review:
- Claude: parses `claude --help` to learn the `--effort` superset and
  projects through a per-model allow-list (xhigh is Opus-only; max is
  session-only on the smaller models). Falls back to a conservative
  static list when the binary is missing or help drift hides the line.
- Codex: drives `codex debug models --output json` so per-model
  reasoning subsets and the documented default come directly from the
  CLI. The older config-error probe trick is gone — the JSON path is
  stable and doesn't pollute stderr with an intentional misconfig.
- Cache key includes (provider, executablePath, cliVersion) so a CLI
  upgrade invalidates entries that referenced the older help / catalog.

Per Trump's PR1 constraint, all three Codex injection points
(thread/start.config, thread/resume.config, turn/start.effort) flow
through one helper (`applyCodexReasoningEffort`) so they cannot drift
independently. The shared `codexReasoningCases` fixture in
`thinking_test.go` asserts the same value→{shape, key} contract at
each site for every level the runtimes know about.

Claude's `--effort` is also added to `claudeBlockedArgs` so a user
custom_args entry can't silently outvote the daemon-injected value.

Co-authored-by: multica-agent <github@multica.ai>

* feat(api): wire thinking_level through API + daemon contract (MUL-2339)

End-to-end plumbing for the per-agent reasoning/effort setting:

- AgentResponse / TaskAgentData now carry `thinking_level`; the daemon's
  claim response includes it and the daemon's executor passes it through
  to agent.ExecOptions, where the Claude and Codex backends already know
  what to do with it.
- ModelEntry on the runtime-models wire format gains a `thinking` block
  carrying `supported_levels` + `default_level` per model so the UI can
  render a runtime-aware picker without the server having to know about
  the local CLI install. `handleModelList` projects the agent-package
  catalog (including the new Thinking field) into the wire shape.
- CreateAgent / UpdateAgent gate the field with a synchronous provider
  enum check (claude / codex only today). UpdateAgent is tri-state:
  field omitted = no change, "" = explicit clear (new
  `ClearAgentThinkingLevel` query, mirrors the existing mcp_config null
  pattern), non-empty = validate then set.

Per Trump's PR1 review, the API NEVER auto-clears on a runtime/model
swap and ALWAYS returns 400 on an unknown literal value — same shape
across CreateAgent, UpdateAgent, and combined patches that move
runtime + level in one request. Per-model combination failures (e.g.
`xhigh` against a model that only supports up to `high`) surface as a
daemon-side task error, not a silent server-side rewrite.

TS types follow the same shape: `Agent.thinking_level`,
`CreateAgentRequest`/`UpdateAgentRequest` add the field, `RuntimeModel`
grows a `thinking` block. Older backends omit the field, which the
front-end treats as "no picker for this model" — installed desktop
builds keep working.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agent): correct codex debug models argv + pin via runner test (MUL-2339)

`codex debug models --output json` is rejected by codex-cli 0.131.0 —
the subcommand emits JSON on stdout by default and has no `--output`
flag. Drop the flag and add `--bundled` to skip the network refresh
discovery doesn't need. Move the argv to a package-level var and add
a test that runs a fake `codex` to assert the binary actually
receives exactly `debug models --bundled`, so the contract can't
silently drift on the next refactor.

Also teach ValidateThinkingLevel to resolve an empty model to the
provider's default model entry. Without this, every default-model
task with a persisted thinking_level would be misjudged "unknown
model" by the daemon guard.

Co-authored-by: multica-agent <github@multica.ai>

* fix(api): reject runtime switch that would leave invalid thinking_level (MUL-2339)

A PATCH that changed `runtime_id` without touching `thinking_level`
used to silently keep the existing value, so a Claude agent storing
`max` could land on a Codex runtime where `max` is not a recognised
token at all, and the daemon would receive a literal-invalid level.

Hold the same "always 400 on literal-invalid, never silent coerce"
rule on this implicit path. When runtime_id changes and the existing
value is not in the new provider's enum, return 400 with the
recovery options (clear via `thinking_level=""` or re-set in the
same PATCH).

Add coverage for both the kept-when-still-valid and the rejected
cases, plus the two recovery paths (clear and replace).

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): guard runTask with per-model thinking_level validator (MUL-2339)

ValidateThinkingLevel existed but had no call site — `task.Agent.
ThinkingLevel` flowed straight into ExecOptions, so `xhigh` configured
on a non-Opus Claude model, or API-side stale values that escaped the
provider enum gate, would be injected anyway.

Run the validator before building ExecOptions. Invalid combinations
log a warning and drop the level instead of failing the task: the
agent still runs, just at the runtime's default reasoning effort.
Discovery errors fail open (keep the level, let the CLI surface any
objection) so a transient `claude --help` failure can't strand work.

Empty model is forwarded as-is; the validator resolves it to the
provider's default model internally per the cross-package contract.

Co-authored-by: multica-agent <github@multica.ai>

* chore(agent): drop stale `--output json` comments + unused scanner (MUL-2339)

Codex CLI's `debug models` subcommand emits JSON without an `--output`
flag, and `parseCodexDebugModels` never read from the bufio.Scanner.
Sync the comments with the actual invocation and remove the dead init.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-20 12:30:10 +08:00
LinYushen
319b23eb39 Revert "feat(task): add claim lease mechanism (Phase 2, MUL-2246) (#2660)" (#2674)
This reverts commit 3137feecdf.
2026-05-15 16:07:23 +08:00
LinYushen
b7a58c06ac Revert "feat(task): wire claim lease into TaskService and sweeper (MUL-2246) …" (#2673)
This reverts commit bb32be0e50.
2026-05-15 16:06:58 +08:00
LinYushen
bb32be0e50 feat(task): wire claim lease into TaskService and sweeper (MUL-2246) (#2662)
* feat(task): wire claim lease queries into TaskService and sweeper (MUL-2246)

- ClaimTask now uses ClaimAgentTaskWithLease (generates claim_token + lease)
- StartTask accepts optional claim_token for token-verified start
- AgentTaskResponse includes claim_token for daemon to use
- Daemon client sends claim_token in StartTask body
- Sweeper calls RequeueExpiredClaimLeases each tick
- Legacy daemons without claim_token still work (graceful fallback)

Co-authored-by: multica-agent <github@multica.ai>

* fix(task): address PR #2662 review blockers (MUL-2246)

1. ClaimAgentTaskForRuntime: push runtime_id into atomic SQL WHERE clause
   so runtime A cannot claim tasks queued for runtime B under the same agent.

2. Legacy StartAgentTask: add claim_token IS NULL guard so leased rows
   cannot be started without token verification. Handler rejects malformed
   tokens with 400 instead of silently degrading to legacy path.

3. StartAgentTaskWithClaimToken: validate claim_expires_at >= now(),
   preserve claim_token until terminal state (only clear claim_expires_at),
   use CTE + UNION ALL for idempotent retry when daemon resends after a
   lost StartTask response. Return 409 Conflict on token mismatch/expiry.

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): StartTask 409 handling, transport retry, claim_token on FailTask (MUL-2246)

- StartTask 409 (claim superseded): release slot, don't call FailTask
- StartTask transport timeout/5xx: retry once with same token, then
  check task status before failing
- FailTask now sends claim_token; server-side FailAgentTask SQL adds
  AND (claim_token IS NULL OR claim_token = @claim_token) guard so
  stale daemons cannot fail tasks that have been re-claimed

Co-authored-by: multica-agent <github@multica.ai>

* fix(task): close FailTask token bypass and RequeueExpiredClaimLeases liveness gap (MUL-2246)

Blocker 1 - FailTask token validation:
- SQL: change (param IS NULL OR claim_token = param) to
  (param IS NULL AND claim_token IS NULL) OR claim_token = param
  so tokenless requests can only fail legacy (tokenless) rows.
- task.go: malformed claim_token now returns ErrInvalidClaimToken (400)
  instead of being silently dropped to NULL.
- Handler: maps ErrInvalidClaimToken→400, ErrClaimTokenInvalid→409.
- Service: when UPDATE returns no rows but task is still active,
  return ErrClaimTokenInvalid (token mismatch) instead of silent success.

Blocker 2 - RequeueExpiredClaimLeases runtime liveness:
- SQL: JOIN agent_runtime, only requeue tasks where runtime is 'online'.
  Dead/offline runtime tasks stay dispatched for FailTasksForOfflineRuntimes.
- FOR UPDATE → FOR UPDATE OF atq (required with JOIN).

Regression tests:
- task_claim_token_test.go: malformed, tokenless-on-tokened, wrong-token
- requeue_lease_test.go: SQL must JOIN agent_runtime with online filter

Co-authored-by: multica-agent <github@multica.ai>

* fix(task): move expired lease requeue to ClaimTaskForRuntime preflight, add heartbeat freshness backstop (MUL-2246)

- Add RequeueExpiredClaimLeasesForRuntime: per-runtime preflight self-requeue
  in ClaimTaskForRuntime. Runtime proves liveness by actively claiming, so no
  heartbeat check needed.
- Update global RequeueExpiredClaimLeases to require ar.last_seen_at freshness
  (stale_threshold_secs param). Prevents requeuing to a dead runtime in the
  90s gap between lease expiry (60s) and offline detection (150s).
- Add regression tests verifying the heartbeat freshness check and that the
  preflight query does not join agent_runtime.

Co-authored-by: multica-agent <github@multica.ai>

* fix(task): use LivenessStore for global requeue, move preflight before empty-cache (MUL-2246)

Blocker 1: Global RequeueExpiredClaimLeases now uses LivenessStore.IsAliveBatch
to verify runtimes are truly alive before requeuing expired leases. When
LivenessStore is unavailable (no Redis), global requeue is skipped entirely —
the preflight self-requeue in ClaimTaskForRuntime handles live runtimes. This
closes the 60-150s gap where a dead runtime still appears online in DB.

Blocker 2: Moved RequeueExpiredClaimLeasesForRuntime BEFORE EmptyClaim.IsEmpty
fast-path in ClaimTaskForRuntime. Expired leases are now requeued (which bumps
the empty cache via notifyTaskAvailable) before the empty check can
short-circuit the claim path.

Also adds ListRuntimesWithExpiredClaimLeases SQL query and LivenessChecker
interface on TaskService.

Co-authored-by: multica-agent <github@multica.ai>

* fix(task): wire EmptyClaimCache into backend taskSvc for backstop requeue (MUL-2246)

The backend taskSvc used by the sweeper only had Liveness wired but not
EmptyClaim. When global backstop requeue called notifyTaskAvailable,
s.EmptyClaim.Bump() was a nil no-op — the handler's empty-cache was never
invalidated, so the daemon's next claim hit a stale empty verdict.

Fix: wire the same Redis-backed EmptyClaimCache into the backend taskSvc
in main.go (same Redis keys as router.go:139 handler instance).

Add regression test verifying backstop requeue invalidates the handler's
empty-cache.

Co-authored-by: multica-agent <github@multica.ai>

* fix(task): global backstop must not requeue — alive runtimes use preflight, dead stay dispatched (MUL-2246)

- RequeueExpiredClaimLeases is now a no-op (returns 0 always)
- Alive runtimes self-requeue via ClaimTaskForRuntime preflight
- Dead runtimes stay dispatched for FailTasksForOfflineRuntimes
- Rewriting to queued on dead runtime creates 2h blackhole (offline
  sweeper only handles dispatched/running)
- Test actually calls RequeueExpiredClaimLeases and asserts 0 in all cases

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): remove duplicate usage reporting block after merge conflict (MUL-2246)

The merge resolution introduced a second ReportTaskUsage call after the
status check, duplicating the usage-before-early-return block that already
runs right after runner.run. Remove the duplicate and add a regression test
asserting /usage is called exactly once on the normal completion path.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-15 15:15:31 +08:00
LinYushen
3137feecdf feat(task): add claim lease mechanism (Phase 2, MUL-2246) (#2660)
Add claim_token + claim_expires_at columns to agent_task_queue and three
new SQL queries for the claim lease protocol:

- ClaimAgentTaskWithLease: generates a UUID token and sets a lease expiry
  when claiming a task, so the daemon must prove it received the response
- StartAgentTaskWithClaimToken: validates the token on StartTask, preventing
  stale daemons from starting requeued tasks
- RequeueExpiredClaimLeases: moves dispatched tasks with expired leases back
  to queued for re-claim

This closes the reliability gap where a claim response lost in transit
leaves a task stuck in dispatched until the 60s dispatch timeout fires.

Co-authored-by: multica-agent <github@multica.ai>
2026-05-15 15:14:05 +08:00
Jiayuan Zhang
4d6b5ad06f fix(squad): wake leader when dual-role agent posts as worker (MUL-2218) (#2626)
* fix(squad): wake leader when dual-role agent posts as worker (MUL-2218)

The squad-leader self-trigger guard skipped a comment whenever the
author equalled the squad's leader id, regardless of the role the agent
was acting in. For an agent that holds both leader and worker roles in
the same squad, this meant the leader role never reacted to its own
worker output and the issue stalled.

Tag each enqueued task with is_leader_task and consult the agent's
most recent task on the issue from both self-trigger guards (comment
path + @squad mention path) — skip only when that task was itself a
leader task.

Co-authored-by: multica-agent <github@multica.ai>

* fix(squad): inherit is_leader_task on retry task clone (MUL-2218)

CreateRetryTask cloned a parent task into a fresh queued attempt but
omitted is_leader_task from the column list, so the child silently fell
back to the column default (false). For a leader task that hit auto-retry
through MaybeRetryFailedTask, the retried task posed as a worker task —
the self-trigger guard then no longer recognised the leader's own
comments, re-opening the very loop MUL-2218 closes.

Inherit p.is_leader_task in the clone and add a query-level test that
covers both leader and worker retries.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-14 15:23:36 +02:00
Bohan Jiang
f5c2994aed feat(workspace): revoke a member's runtimes when they leave or are removed (#2401)
* feat(workspace): revoke a member's runtimes when they leave or are removed

Previously, leaving or being removed from a workspace only deleted the
member row — every runtime the departed user owned in that workspace
remained in the DB, kept its daemon_token valid, and stayed reachable to
the workspace's other members. The departed user lost access but their
machine kept doing work.

This change converges the runtime state in the same transaction as the
member-row deletion: agents pinned to those runtimes are archived,
in-flight tasks are cancelled (so the daemon's per-task status poller
interrupts the running agent gracefully), the runtimes are forced
offline, and the daemon_token rows are deleted. After commit the
DaemonTokenCache is invalidated and agent:archived / daemon:register
events fire so connected clients reconcile immediately.

Server-side state convergence is the production safety net; the
daemon_token revoke takes effect once the mdt_ flow is live (today most
daemons fall back to PAT/JWT, and the member-row deletion is what stops
those requests via requireWorkspaceMember).

Daemon-side handling (recognising the resulting 401/404 and tearing down
the local pairing for that workspace) lands in a follow-up.

Co-authored-by: multica-agent <github@multica.ai>

* fix(workspace): also cancel tasks for archived agents on member revoke

CancelAgentTasksByRuntime only matched tasks whose runtime_id was in the
revoked set, missing a real path: agent.runtime_id can be reassigned via
UpdateAgent, but agent_task_queue.runtime_id keeps the value from when
the task was queued. So an agent currently bound to the leaving member's
runtime gets archived correctly, but its older tasks still pinned to a
prior runtime stay 'queued' — and ClaimAgentTask does not gate on
agent.archived_at, so those orphaned tasks remain claimable by the
prior runtime.

Replace CancelAgentTasksByRuntime with CancelAgentTasksByRuntimeOrAgent,
which OR-matches runtime_ids and the archived agent IDs in one UPDATE.
Pass the archived agent IDs through from revokeAndRemoveMember.

Adds TestDeleteMember_CancelsTasksFromAgentReassignment as a regression
guard: same agent, two runtimes, the older task on the surviving runtime
must end up cancelled while the surviving runtime stays online.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-11 15:06:50 +08:00
Multica Eve
a2dd80d4f6 feat(autopilot): skip dispatch when assignee runtime is offline (MUL-1899) (#2311)
* feat(autopilot): skip dispatch when assignee runtime is offline (MUL-1899)

Prevents scheduled autopilots from accumulating doomed tasks against
offline / archived / unbound agents. Before this change, a paused laptop
or crashed daemon would let a 5-minute-cron autopilot pile up thousands
of queued agent_task_queue rows that no runtime would ever drain — this
is the dominant source of the 89k stuck-task backlog flagged in MUL-1899.

DispatchAutopilot now performs a pre-flight admission check on the
assignee agent's runtime status. If the runtime is not 'online' (or the
agent is archived / has no runtime bound / has no assignee), the run is
recorded as 'skipped' with a failure_reason and no task is enqueued.
Skipped runs still emit autopilot:run.done so the UI / activity feed
reflect that the trigger fired and was evaluated.

Skipped runs are deliberately NOT counted toward the failure-ratio
auto-pause: a user who closes their laptop overnight should not have
their autopilot paused. Sustained server-side failures keep their
existing pause path via the failure monitor.

Tests: added an integration test that creates an offline runtime and
asserts DispatchAutopilot records a skipped run with no task enqueued.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

* feat(scheduler): expire stale queued tasks via TTL sweeper (MUL-1899)

Companion to the dispatch-time admission gate added in this PR. The
admission gate prevents *new* tasks from being enqueued against an
offline runtime, but it does not drain the historical backlog
(~89k stuck queued rows observed at MUL-1899 baseline) and does not
help when a runtime goes offline *after* a task has already been
queued. This adds a passive TTL sweeper:

- New SQL query `ExpireStaleQueuedTasks` transitions queued tasks
  older than the TTL to status='failed' with
  failure_reason='queued_expired' and a clear error message.
- Sweep is capped per tick (`queuedExpireBatchSize`, default 500) via
  a CTE+LIMIT so that draining a large backlog cannot monopolise the
  DB on a single tick. At 30s ticks the worst case is 60k rows/hour.
- Wired into the existing 30s `runRuntimeSweeper` loop alongside
  `sweepStaleTasks` and reuses `taskSvc.HandleFailedTasks` so the
  expired tasks broadcast `task:failed` events, reconcile agent
  status, and roll back any in-progress issues — same lifecycle as
  any other failed task.
- Default TTL = 2h. Conservatively above any reasonable
  "queued behind a long-running task" window (default agent timeout
  is 2h, sweeper runs every 30s) so legitimate work isn't expired.
- Integration tests cover the happy path (stale → expired, fresh →
  left alone, correct status/reason/error) and the per-tick batch cap.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

* fix(autopilot): address review blockers from PR #2311 (MUL-1899)

GPT-Boy review of the offline-runtime + queued-TTL PR flagged four
blockers; this commit addresses them all.

1. Restore the 'skipped' autopilot_run status in the DB constraint.
   Migration 043 had removed 'skipped' along with the now-defunct
   concurrency_policy feature, so the new admission gate's INSERT of
   status='skipped' violated `autopilot_run_status_check` and broke
   `TestAutopilotDispatchSkipsWhenRuntimeOffline` in CI. New
   migration 079 re-adds 'skipped' to the CHECK list. The down
   migration migrates skipped → failed before re-tightening, mirror-
   ing what 043 did for the original removal.

2. Make `ExpireStaleQueuedTasks` race-safe.
   The CTE-then-UPDATE pattern could clobber a task that the daemon
   claimed between victim selection and the outer update. Two
   guards added:
     - `FOR UPDATE SKIP LOCKED` in the CTE so we never wait on a
       row that's currently being claimed (and never block the
       claim path either).
     - The outer UPDATE now re-checks `t.status = 'queued'` AND the
       TTL predicate so even if a row's lock is released after a
       successful claim, we cannot transition a now-dispatched/
       running task to 'failed'.

3. Add a partial index for the queued-TTL sweeper.
   `idx_agent_task_queue_queued_created_at` on `created_at WHERE
   status = 'queued'` — keeps the 30s sweep query (status=queued
   AND created_at < ... ORDER BY created_at LIMIT 500) cheap even
   when historical terminal rows accumulate (~89k+ at MUL-1899
   baseline). The partial predicate keeps the index tiny because
   only in-flight rows live in 'queued'.

4. Fix the failure-monitor denominator.
   `SelectAutopilotsExceedingFailureThreshold` had been counting
   'skipped' toward total runs, which would have diluted the failure
   ratio: a 100%-failing autopilot could mask itself behind a wall
   of admission skips. With 'skipped' restored as a real status,
   the auto-pause monitor must explicitly exclude it from BOTH
   numerator and denominator — admission skips are neither a
   success nor a failure.

Verified: `go test ./cmd/server/... ./internal/service/...` passes
(including TestAutopilotDispatchSkipsWhenRuntimeOffline,
TestExpireStaleQueuedTasks, TestExpireStaleQueuedTasksRespectsBatch
Limit). `go build ./... && go vet ./...` clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

* fix(migrations): split queued-task TTL index into concurrent migration

Per PR #2311 review: agent_task_queue is a hot table, so building the
new partial index with plain CREATE INDEX inside migration 079 would
hold ACCESS EXCLUSIVE on the queue and block dispatch during deploy.

The migration runner does not allow CONCURRENTLY to share a file with
other statements (documented in 068), so split the index into its own
single-statement file 080 — matching the existing pattern in 035 /
067 / 074 / 075 / 078. Migration 079 keeps the autopilot_run
constraint change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: Eve <eve@multica-ai.local>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-09 15:07:57 +08:00
Bohan Jiang
6d9ebb0fdd fix(daemon): unblock issues stuck on a poisoned-image agent session (#2314)
* fix(daemon): treat upstream API 400 invalid_request_error as poisoned session

A markdown-linked image in an issue description that the agent downloads as
a tiny CDN auth-error file and Read's as a PNG poisons the conversation:
the LLM API rejects the bad image with 400 invalid_request_error, the
session_id is pinned mid-flight, and every follow-up task on the issue
(comment-trigger, auto-retry) resumes the same poisoned conversation and
hits the same 400 — the issue can no longer be executed even after the
description is cleaned up.

Mirror the existing fallback-output classifier on the error side: detect
"API Error: ... 400 ... invalid_request_error" in the agent error string,
persist failure_reason='api_invalid_request', and add it to the
GetLastTaskSession exclusion list so the next task starts a fresh
session that re-reads the (now-clean) description.

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): unblock issues already poisoned by API 400 invalid_request_error

The forward-only classifier from the previous commit only tags new failures.
Issues like MUL-1918 already have multiple failed-task rows whose
failure_reason is the pre-fix default 'agent_error', and GetLastTaskSession
falls back to those legacy rows on the next claim — so deploying the
classifier alone leaves existing poisoned issues stuck (GPT-Boy review
on PR #2314).

Two complementary changes:

- Migration 079 backfills failure_reason='api_invalid_request' on every
  pre-existing 'agent_error' row whose error text matches the canonical
  Anthropic 400 invalid_request_error shape. Keeps observability
  consistent (multica issue runs / UI now report the right reason).

- GetLastTaskSession adds a defensive ILIKE clause on error text. Closes
  the deploy-window gap where the old binary could write a new
  'agent_error' row between the migration running and the new code
  taking over, and protects against future error-format variants the
  daemon classifier might miss.

Plus regression tests covering the legacy + new coexistence case GPT-Boy
flagged, and a guard rail asserting benign 'agent_error' failures
(timeouts, tool errors) still resume their session.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-09 14:39:10 +08:00
Jiayuan Zhang
0cd50e14eb feat(agent-live-card): show queued tasks in issue live banner (MUL-1897) (#2307)
The issue-detail "agent live" banner only showed dispatched/running tasks.
A task that was queued — runtime offline, busy on a prior task, or held
behind a coalesced sibling — left the issue silent until claim, which
reads as "the trigger never landed".

Include 'queued' in `ListActiveTasksByIssue`, then branch the renderer:
queued banners use a non-spinning Clock, "{name} 排队中 / is queued"
copy, "queued for Ns" elapsed anchored on `created_at`, and hide the
transcript button (no execution log yet). Cancel still works because
`CancelAgentTask` already accepts queued.

Client-side re-sort by lifecycle (running → dispatched → queued) so the
sticky slot stays on the most-active task even when a queued sibling
was created more recently.

Co-authored-by: multica-agent <github@multica.ai>
2026-05-09 07:33:12 +02:00
Bohan Jiang
b17f975a17 docs(cli): clarify issue rerun semantics (current assignee, fresh session) (#2304)
* docs(cli): clarify `issue rerun` semantics

The CLI table described `multica issue rerun <id>` as "Rerun the most
recent agent task", which led users to expect it would re-run whichever
agent ran last. The actual behavior is to enqueue a fresh task for the
issue's **current** agent assignee, regardless of who ran most
recently — see `TaskService.RerunIssue` in
`server/internal/service/task.go`.

Also fix a stale claim in `tasks.mdx`: the "Manual rerun" section
described session inheritance as "Yes", but commit b1345685 made manual
rerun pass `force_fresh_session=true` precisely to avoid replaying a
poisoned session. Only **automatic retry** still inherits the session.

Updates EN + ZH mirrors of `cli.mdx` and `tasks.mdx`.

Co-authored-by: multica-agent <github@multica.ai>

* docs(tasks): tighten rerun trigger surface; clean stale Go comments

Apply review feedback on PR #2304:

- `tasks.mdx` / `tasks.zh.mdx`: rerun is triggered via CLI or the
  `/api/issues/{id}/rerun` endpoint, not "UI or CLI" — there's no rerun
  affordance in web/desktop today.
- `tasks.mdx` / `tasks.zh.mdx`: comparison table — manual rerun applies
  to "Issues with an agent assignee", not "All sources". The handler
  rejects with `issue is not assigned to an agent` for anything else,
  and there's no rerun path for chat or autopilot tasks.
- `task_lifecycle.go`: `RerunIssue` doc comment claimed the new task
  "carries the most recent session_id/work_dir so the agent can resume".
  That has been false since b1345685 — rewrite to reflect the actual
  `force_fresh_session=true` contract.
- `agent.sql` (regenerated `agent.sql.go`): `GetLastTaskSession` doc
  said it serves "auto-retry / manual rerun"; manual rerun is now
  routed around it via `force_fresh_session=true`. Note both the
  auto-retry path it does serve and the rerun escape hatch.

No logic change.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-09 12:46:37 +08:00
LinYushen
250ada1fb3 chore(db): drop unused agent_task_queue.last_heartbeat_at (#2212)
Drops the unused agent_task_queue.last_heartbeat_at column and removes the hot-path task heartbeat write.
2026-05-07 15:45:29 +08:00
Bohan Jiang
60b215f44f feat(chat): support deleting chat sessions (#2115)
* feat(chat): support deleting chat sessions

Replaces the unreachable archive endpoint with a real hard delete and
exposes it from the chat history panel.

- DELETE /api/chat/sessions/{id} now hard-deletes the session and its
  messages (CASCADE), cancels any in-flight tasks before removal so the
  daemon doesn't keep running work whose result has nowhere to land,
  and broadcasts chat:session_deleted.
- Frontend adds a per-row delete button with a confirmation dialog,
  optimistically drops the session from both list caches, and clears the
  active session pointer locally + on other tabs via the WS handler.

Co-authored-by: multica-agent <github@multica.ai>

* fix(chat): make session delete atomic and keep archived sessions read-only

Address review feedback on #2115.

- DeleteChatSession now runs lock + cancel + delete in a single tx and
  only broadcasts events post-commit. The new LockChatSessionForDelete
  query takes FOR UPDATE on chat_session, which blocks the FK validation
  of any concurrent SendChatMessage trying to enqueue a task for this
  session — that insert fails after we commit, so it can no longer
  produce an orphaned task whose chat_session_id is nulled by
  ON DELETE SET NULL. Cancel failure now aborts the delete instead of
  warn-and-continue.
- SendChatMessage refuses non-active sessions again. The archive code
  path is gone, but legacy rows with status='archived' may still exist
  in the DB; keep the guard until we explicitly migrate them.
- Frontend re-reads allChatSessionsOptions to disable ChatInput on
  legacy archived sessions so the UX matches the server-side guard.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-06 13:22:53 +08:00
Bright Zheng
cf47d9b702 fix: guard session resume by runtime (#1905) 2026-05-03 10:51:31 +08:00
Bohan Jiang
2dddfaa196 feat(daemon): Redis empty-claim fast path for /tasks/claim polling (#1860)
* feat(daemon): Redis empty-claim fast path for /tasks/claim polling

Daemons poll /tasks/claim every 30s per runtime; the steady-state
warm-empty case currently runs ListPendingTasksByRuntime against
Postgres on every poll. This collapses that path:

- New ListQueuedClaimCandidatesByRuntime query restricts to status =
  'queued' (the old query also returned 'dispatched' rows that can
  never be reclaimed) and is backed by a partial index keyed on
  (runtime_id, priority DESC, created_at ASC).
- New EmptyClaimCache caches the negative verdict in Redis with a
  30s TTL. ClaimTaskForRuntime checks the cache before SELECT and
  populates it on confirmed-empty results.
- notifyTaskAvailable now invalidates the runtime's empty key before
  kicking the daemon WS, so newly enqueued tasks become claimable
  immediately rather than waiting out the TTL.
- AutopilotService.dispatchRunOnly now goes through
  TaskService.NotifyTaskEnqueued so run_only tasks get the same
  invalidate-then-wakeup contract as every other enqueue path.

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): close MarkEmpty/Bump race in empty-claim fast path

GPT-Boy's review on PR #1860 caught a real concurrency bug. Under the
prior implementation it was possible for a slow claim to write an
empty verdict AFTER a concurrent enqueue had already invalidated it:

  T1 claim:   SELECT -> empty
  T2 enqueue: INSERT row, DEL empty key (no-op, key not set yet),
              wakeup
  T1 claim:   SET empty (writes a stale "empty" verdict)
  T3 wakeup:  IsEmpty -> hit -> returns null

The just-queued task would then sit idle until the empty key's TTL
expired (up to 30s).

Replace the DEL-based invalidation with a per-runtime version
counter:

- CurrentVersion(rt) is a Redis INCR counter at
  mul:claim:runtime:version:<rt> with a 24h sliding TTL.
- Claim samples version BEFORE the SELECT and passes it to MarkEmpty,
  which stores the verdict's value as the observed-version string.
- IsEmpty MGETs both keys and trusts the verdict only when the
  empty-key value equals the current version.
- Enqueue Bumps the version (INCR + EXPIRE) before the wakeup,
  causing any verdict written under a prior version to be rejected
  on the next read.

Also bound every Redis call from this cache with a 250ms timeout —
notifyTaskAvailable uses a background context so a wedged Redis
must not block enqueue.

Tests against a real Redis (REDIS_TEST_URL) cover:
- MarkEmpty + IsEmpty under matching version returns hit
- Bump invalidates a prior empty verdict (race-fix pin)
- A MarkEmpty written under a stale pre-Bump version is rejected
- TTL clamping, per-runtime isolation, nil-cache safety
- notifyTaskAvailable Bumps before the wakeup fires

Co-authored-by: multica-agent <github@multica.ai>

* chore(daemon): renumber claim-candidate index migration to 067

Slot 064 was taken on main by 064_notification_preference. The
migration runner tracks per-version in schema_migrations and would
silently skip the second 064_*, leaving the index uncreated.
Rename to 067 (next free slot).

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-04-30 15:50:05 +08:00
Bohan Jiang
b1345685a3 fix(task): rerun starts a fresh session, skip poisoned resume (#1928)
* fix(task): rerun starts a fresh session, skip poisoned resume

When a task ended in a known agent fallback ("I reached the iteration
limit and couldn't generate a summary.", "Put your final update inside
the content string. Keep it concise.") the (agent_id, issue_id) resume
lookup would still pick that session, so a manual rerun inherited the
poisoned state and reproduced the same bad output.

Two complementary guards:

1. Daemon classifies poisoned terminal output and routes it through the
   blocked path with failure_reason set ('iteration_limit' /
   'agent_fallback_message'). GetLastTaskSession excludes failed tasks
   with those reasons, so even comment-triggered tasks no longer resume
   them. Tasks that failed mid-flight (timeout, runtime_recovery, etc.)
   are still resumable, preserving MUL-1128's auto-retry contract.

2. Manual rerun marks the new task force_fresh_session=true. The daemon
   claim handler skips the resume lookup entirely when the flag is set,
   capturing the user-intent signal that "the prior output was bad" even
   when poisoned classification misses a future fallback wording.

Auto-retry of orphaned mid-flight failures (MaybeRetryFailedTask →
CreateRetryTask) does not take this path, so it keeps resuming.

Tests: classifyPoisonedOutput unit test; integration tests assert the
SQL filter excludes poisoned classifiers, RerunIssue flips the flag,
and the normal enqueue path leaves it false.

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): cap poisoned-output matcher to short trimmed text

GPT-Boy review on MUL-1630: the previous strings.Contains match would
classify any output that quoted the marker substring — including a
review/analysis that simply discussed the marker itself. Real fallback
messages are short single-sentence affairs, so cap the candidate at
~one paragraph and trim whitespace before matching. Adds regression
tests covering a long quoting review and a marker buried in a long
real conclusion; both must stay classified as completed.

Co-authored-by: multica-agent <github@multica.ai>

* fix(migrations): rename 065 force_fresh_session → 066 to clear collision

main introduced 065_project_resources after this branch was cut, so
both files shared the 065_ prefix. The readiness check
(server/cmd/server/health.go → migrations.LatestVersion) takes the
last entry by lexical order, which is 065_project_resources, leaving
this branch's 065_force_fresh_session unguarded — a deploy that
applied project_resources but not force_fresh_session would still
report ready, and the next enqueue / rerun / claim would crash on
"column force_fresh_session does not exist".

Renaming to 066_force_fresh_session puts it strictly after
project_resources so readiness blocks until it's applied.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-04-30 14:17:53 +08:00
Bohan Jiang
9baa72cc68 fix: polish quick-create UX (kind labeling, dark toast, placeholder) (#1831)
* fix: polish quick-create UX (kind labeling, dark toast, placeholder)

Three small fixes shaken out from using the agent-create flow:

- AgentTaskResponse now carries a `kind` discriminator
  ("comment" | "autopilot" | "chat" | "quick_create" | "direct"), computed
  from the existing FK shape with no extra DB access. The Activity row
  uses it to label quick-create tasks as "Creating issue" instead of
  falling through to the generic "Untracked" — once the agent finishes
  and the new issue is linked, the row transitions to the normal
  identifier+title display.

- Sonner Toaster reads `resolvedTheme` instead of `theme`, so toasts
  follow the actual dark/light state. Forwarding "system" let sonner
  pick its own answer from `prefers-color-scheme`, which in the Electron
  renderer can disagree with next-themes' `html.dark` class — the toast
  rendered light on a dark UI.

- Agent-create placeholder rephrased to a more conversational example
  with a project reference: "let Bohan fix the inbox loading slowness
  in the Web project". Drops the priority hint (priority isn't widely
  used) and matches how people actually instruct the agent.

* fix(quick-create): link new issue back to task on completion

Addresses the review on PR #1831: completed quick-create tasks were
left with issue_id=NULL forever, so the activity row stayed on
"Creating issue" instead of transitioning to the normal MUL-XXX +
title rendering once the agent finished.

- Server: notifyQuickCreateCompleted now writes the resolved issue id
  back to agent_task_queue.issue_id via a new LinkTaskToIssue query
  (guarded by `issue_id IS NULL` so it only ever fills the unset
  quick-create case). Best-effort: a write failure logs but doesn't
  block the inbox notification.
- Frontend: defensive wording fallback — kind=quick_create rows in
  terminal status (completed/failed/cancelled) now render as
  "Quick create" instead of the active "Creating issue" label,
  covering rows whose link write failed or whose agent never
  produced an issue at all.
2026-04-29 15:40:59 +08:00
Naiyuan Qing
f745a3bbbe feat(agent): presence v3 + execution log + trigger summary (#1823)
* refactor(views): migrate agent/runtime/skill lists to TanStack DataTable

Replace the per-page CSS Grid + minmax(min, fr) + sticky-first-col + truncate
implementation with a TanStack Table backend rendered through a Dice UI-style
DataTable shell. Column widths are now px-based via column.size, so cells
no longer shrink or auto-truncate as the viewport narrows; when the sum of
columns exceeds the viewport, the container scrolls horizontally instead.

- Add @tanstack/react-table to the catalog (8.21.3) and wire it into
  packages/ui (dep) and packages/views (peerDep).
- packages/ui: new DataTable + DataTableColumnHeader + lib/data-table.ts
  (getColumnPinningStyle), adapted from Dice UI's registry. The shell
  renders <table> directly (skipping shadcn's <Table> wrapper) so its own
  outer overflow controls both axes — no nested overflow conflicts.
- packages/views: each list now declares ColumnDef[] with explicit
  cell renderers. Row click navigates to detail via onRowClick (instead of
  wrapping <tr> in <a>, which is invalid HTML); kebab dropdowns
  stopPropagation so they don't trigger the row navigation.
- Drop the previous AGENT_LIST_GRID / GRID_WITH_OWNER / ROW_GRID
  templates and the sticky-first-col / subgrid mechanics that came with
  them. agent-list-item.tsx is removed; runtime-list.tsx and
  skills-page.tsx are trimmed to thin wrappers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(agent): cap description at 255 chars (db + api + ui)

Symmetric enforcement across DB, server, and UI:

- Migration 060: pre-flight truncate of any oversize rows, then ADD
  CONSTRAINT NOT VALID + VALIDATE CONSTRAINT so the new check doesn't
  block writes during validation.
- Server handler validates utf8.RuneCountInString on Create/Update and
  rejects over-limit input with 400.
- Front-end gets AGENT_DESCRIPTION_MAX_LENGTH in core/agents/constants
  (single source of truth shared by the create dialog + edit modal +
  test suite) and a CharCounter component that warns at 90% and errors
  past the cap.
- Description editor moves from a 288px popover to a roomy modal.
  Editor body is mounted only while the dialog is open, so the local
  draft state is locked in at mount time and never reset by an external
  WS update — the React-recommended replacement for the
  useEffect(reset, [value]) anti-pattern.

Counted in code points everywhere (rune count / spread length /
char_length) so multibyte input agrees across all three layers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(views): data-table polish across runtime + skill lists

Builds on the DataTable migration in 2be0f287:

- Add ColumnMeta.grow flag — declared via TanStack module augmentation
  in ui/lib/data-table.ts. Columns marked meta.grow skip their inline
  width so fixed table-layout assigns them the leftover container space
  (no spacer column). The Title-grows / others-fixed pattern from
  Linear / GitHub PR rows.
- Authoritative table min-width = sum of column.size, applied to the
  <table> itself (fixed-layout ignores cell-level min-width per spec,
  so the floor has to live on the table).
- Header tightens to h-8 + uppercase + tracking-wider; pinned cells
  switch to opaque bg + group-hover so they cover content scrolling
  beneath them and follow row hover state.
- Toolbar slot removed from DataTable (callers wrap the toolbar
  themselves now — keeps DataTable single-purpose).

Also: hover-card popup stops contextmenu / auxclick / dblclick from
bubbling out (in addition to click). Stops the popup from triggering
ancestor handlers (e.g. issue list rows) on right-click / middle-click
without breaking Base UI's outside-click dismiss, which listens to
pointerdown — pointerdown is deliberately NOT stopped.

Runtime + skill list pages updated to use the new sizing model.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent): drop LastTaskState, introduce 3-state Workload

Continues the presence-model rework started in #1794 / #1798.

The previous LastTaskState union (running / completed / failed /
cancelled / idle) carried historical outcome at the list level — a
runtime-healthy agent whose last task failed showed a sticky red dot
indistinguishable from a daemon-dead agent.

New model: presence is two orthogonal "right-now" dimensions:

  AgentAvailability — runtime reachability only (online / unstable /
                      offline). Drives the dot colour everywhere.
  Workload          — current load (working / queued / idle). Three
                      states, never historical. Failure / completion /
                      cancellation are surfaced via Recent Work + Inbox,
                      not list-level state.

`queued` (= nothing running, ≥1 queued) is an honest "stuck on offline
runtime" signal. To avoid amber flashes during the brief enqueue→claim
race on healthy runtimes, the queued chip composes with availability:
muted on online, warning amber otherwise.

Activity tab cleanup that follows from the new model:
  - failureReasonLabel relocated from agents/presence.ts to
    tabs/task-failure.ts (presence no longer owns historical state).
  - Recent Work paginates (5 initial, +20 per "Show more"); chat-session
    tasks are filtered out of every Agent-scoped surface to keep
    "team work" separate from private chat.
  - Agents page drops the lastTaskFilter chip group; users find broken
    agents via Inbox / Recent Work, not a list-level filter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(task): trigger summary snapshot + task:queued lifecycle event

Two task-lifecycle improvements that ship together because they share
the same enqueue/retry hot paths and changes interleave inside task.go:

1. trigger_summary snapshot (migration 061)

   New nullable column on agent_task_queue. Comment-triggered tasks
   snapshot the comment content; autopilot tasks snapshot the run title.
   Truncated to 200 runes via strings.Builder so multibyte input counts
   correctly without O(N²) concatenation. Snapshot survives source
   edits/deletes — every task row self-describes across surfaces (issue
   detail Execution log, agent activity tooltip, inbox) without joining
   back to the originating row.

   Retry rows inherit the parent's snapshot (CreateRetryTask SELECT) so
   the description stays meaningful across attempts. The UI is
   responsible for stacking "Retry #N" context on top.

2. task:queued WS event

   New protocol event covering the ∅ → queued transition. Front-end
   types/events.ts registers it; use-realtime-sync's task: prefix path
   already invalidates task caches via onAny, so old clients without
   this exact-match subscription still refresh correctly. Specific
   subscribers (sticky banner) get sub-second updates instead of
   waiting for daemon claim.

   Retry path now broadcasts task:queued (not task:dispatch) — same
   status transition shape as enqueue, so all "new task created" paths
   agree on one event type.

   Ordering: broadcastTaskEvent runs *before* notifyTaskAvailable so
   the queued event is published into the WS bus before the daemon is
   poked. Without this, a fast daemon could claim and emit task:dispatch
   over the wire before the in-process queued broadcast fan-out reached
   clients — race window is tiny but unsafe-by-construction.

   Per-agent task list (agentTasksKeys.all) and per-issue task list
   (["issues","tasks"]) added to the task: invalidation set so Activity
   tab Recent Work and the Execution log section stay fresh.

Type contracts: AgentTask gains parent_task_id / attempt /
trigger_comment_id (already returned by the API, just missing from TS)
plus the new trigger_summary field.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(issue): ExecutionLogSection — unified active+past runs panel

Replaces two pieces:
  - the click-to-expand timeline that lived inside AgentLiveCard
  - the standalone TaskRunHistory below the main content

with a single right-panel section that lists every agent run for the
issue. Active runs sit at the top (always visible when present); past
runs collapse behind a "Show past runs (N)" toggle, sorted failed →
cancelled → completed within group.

Active rows show the trigger summary, status + relative time, and
Cancel / Transcript actions on hover (gradient backdrop fades the
status text rather than hard-clipping). Past rows show the same
shape minus Cancel.

Retry tasks prepend "Retry #N · " to the inherited summary so they're
distinguishable from their parent (which would otherwise share the
exact same trigger text).

Cache key registered as issueKeys.tasks(issueId); the global
useRealtimeSync task: prefix path already invalidates ["issues","tasks"]
on every task lifecycle event, so the section stays fresh without
local WS subscriptions.

AgentLiveCard slims down to a header-only "agent is working" sticky
banner — keeps the at-a-glance "is anyone working on this right now"
signal and the Stop / Transcript actions, drops the inline timeline
that ExecutionLogSection now owns. Subscribes to both task:queued and
task:dispatch so retries (which only emit queued) land in the banner
without waiting for daemon claim.

issue-detail mounts ExecutionLogSection in the right panel and removes
the now-defunct TaskRunHistory call site.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 14:50:58 +08:00
Bohan Jiang
2d9c153695 feat: quick-create issue (async agent + inbox completion) (#1786)
* feat(server): add quick-create issue async task path

Adds POST /api/issues/quick-create which validates the picked agent's
reachability up front (not archived, has runtime, runtime online) then
queues an issue-less agent task whose context JSONB carries the user's
natural-language prompt + requester + workspace. Daemon claim resolves
the workspace from the context, and the prompt builder switches to a
quick-create template instructing the agent to translate the prompt
into a single multica issue create call.

Task completion writes a success inbox item to the requester pointing at
the newly-created issue (located by querying the agent's most recent
issue in the workspace since task start, so we don't depend on agent
stdout shape). Failures write an action_required inbox item carrying the
original prompt + agent id so the frontend can offer "Edit as advanced
form" without losing input.

* feat(views): quick-create issue modal + inbox failure CTA

Adds a streamlined create-issue UI bound to the c shortcut: pick an
agent, type one line, submit. The modal closes immediately and the
agent translates the prompt into a multica issue create call in the
background. Shift+c keeps the legacy advanced form for users who want
every field. The "Advanced" button inside the new modal seeds the
shared issue-draft store with the prompt + picked agent so switching
mid-flow doesn't lose input.

Last-used agent persists per (user, workspace) via a workspace-aware
zustand store so frequent users skip the picker on every open.

Inbox renders quick_create_done items with a status pin to the new
issue and quick_create_failed items with an "Edit as advanced form"
CTA that re-seeds the legacy modal with the original prompt.

ApiError now carries the parsed JSON body so the modal can branch on
the structured agent_unavailable code without parsing the error
message.

* fix(quick-create): execenv injection, claim race, private-agent permission

Addresses GPT-Boy review on #1786:

1. execenv was rendering the assignment-task issue_context.md / runtime
   workflow even for quick-create, telling the agent to call
   `multica issue get/status/comment add` against an empty IssueID.
   Adds QuickCreatePrompt to TaskContextForEnv, plus a quick-create
   branch in renderIssueContext + the runtime_config workflow that
   instructs the agent to run a single `multica issue create` and
   exit, with explicit "do NOT call issue get/status/comment add"
   guards.

2. ClaimAgentTask serialized only on issue_id / chat_session_id, so
   concurrent quick-creates on the same agent (both NULL on those
   columns) ran in parallel — making the success-inbox lookup race
   over "most recent issue by this agent". Adds a third OR clause
   that treats "all four FKs NULL" as a serialization key for the
   same agent, so quick-create tasks on a given agent run one at a
   time.

3. QuickCreateIssue handler bypassed the private-agent ownership rule
   that validateAssigneePair enforces elsewhere — a user could POST a
   private agent_id they didn't own and trigger it. Now routes the
   picked agent through validateAssigneePair before the runtime
   liveness check.

4. Clarifies the quick-create-store namespacing comment to match the
   actual workspace-aware StateStorage convention used by the other
   issue stores (per-user is browser-profile-local).

* fix(quick-create): branch Output section + deterministic origin lookup

Addresses GPT-Boy's second-pass review on #1786:

1. The runtime_config.go Output section forced "Final results MUST be
   delivered via multica issue comment add" for every non-autopilot
   task — quick-create still got this conflicting instruction even
   though there's no issue to comment on. Switched the Output block
   to a three-way switch so quick-create gets a tailored "stdout is
   captured automatically; do NOT call comment add" branch matching
   the autopilot variant.

2. Completion lookup was "most recent issue created by this agent
   since task.started_at", which races against concurrent issue
   creates by the same agent (assignment task running alongside
   quick-create when max_concurrent_tasks > 1). Replaced with a
   deterministic origin link:

   - Migration 060 extends issue.origin_type CHECK to allow
     'quick_create'.
   - Daemon sets MULTICA_QUICK_CREATE_TASK_ID env var when running a
     quick-create task.
   - multica issue create CLI reads the env var and stamps the new
     issue with origin_type=quick_create + origin_id=<task_id>.
   - Server CreateIssue handler accepts (origin_type, origin_id)
     from trusted callers (only "quick_create" is allowed; the pair
     is rejected unless both fields are provided together).
   - notifyQuickCreateCompleted now calls GetIssueByOrigin keyed on
     (workspace_id, "quick_create", task.ID) — no more time-window
     racing against parallel agent activity.

The old GetRecentIssueByCreatorSince query is removed.
2026-04-29 14:05:26 +08:00
Naiyuan Qing
21e3cfaa01 Agent runtime status redesign: split presence into availability + last-task (#1794)
* feat(agent-status): add workspace live-tasks endpoint and TaskFailureReason type

Lays the API + type contract for the front-end agent presence cache:

- New `GET /api/active-tasks` returns active (queued/dispatched/running)
  tasks plus failed tasks within the last 2 minutes for the current
  workspace. The 2-minute window powers a UI-side auto-clearing "Failed"
  agent state without back-end pollers.
- `agent_task_queue` has no workspace_id column, so the query JOINs agent;
  `SELECT atq.*` keeps `failure_reason` (migration 055) on the wire.
- Adds `TaskFailureReason` to `AgentTask` so the UI can map the 5 backend
  classifiers (agent_error / timeout / runtime_offline / runtime_recovery
  / manual) to copy without parsing free-text errors.
- New `api.getActiveTasksForWorkspace()` client method; workspace is
  resolved server-side from the X-Workspace-Slug header (no path param,
  matching /api/agents and /api/runtimes conventions).

Includes the joint engineering plan and designer brief that scope the
broader Agent / Runtime status redesign — Phase 0 is this contract plus
the front-end derivation layer landing in the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(agent-status): derive presence/health states with WS sync and desktop IPC bridge

Adds the front-end derivation layer that turns raw server data into the
user-facing 5-state agent / 4-state runtime enums. UI files are
deliberately untouched in this commit — derivation lives behind hooks
(useAgentPresence, useRuntimeHealth) that any component can call with
zero additional network traffic.

Architecture:
- Derivation is pure functions in packages/core/{agents,runtimes}; the
  back-end stays free of UI translation. Agents algorithm: runtime
  offline > recent failed (2-min window) > running > queued > available.
  Runtimes algorithm: status + last_seen_at -> online / recently_lost /
  offline / about_to_gc.
- A single workspace-wide active-tasks query backs all per-agent
  presence reads, eliminating N+1 across hover cards, list rows, and
  pickers. 30-second tick re-renders the hooks so the failed window
  expires even when no underlying data changes.
- WS task lifecycle events (dispatch / completed / failed / cancelled)
  invalidate active-tasks via the prefix dispatcher. completed/failed
  were removed from specificEvents so they go through both the prefix
  invalidate and the existing chat ws.on() handlers. Reconnect refetch
  picks up active-tasks too.
- Desktop bridges window.daemonAPI.onStatusChange directly into the
  runtimes cache via setQueryData, giving the local daemon sub-second
  feedback (vs. 75s server sweep). Bridge is wsId-bound so workspace
  switches automatically rebind the subscription; daemon_id matching
  covers the same-daemon-multiple-providers case.

24 derivation unit tests cover all branches plus null/empty/boundary
inputs (FAILED_WINDOW_MS edges, null last_seen_at, missing
completed_at). Full core suite: 112 tests passing. Typecheck green
across all 8 workspace packages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(agent-status): redesign agent runtime status as two orthogonal dimensions

Splits the conflated 5-state agent presence into two independent axes:

- AgentAvailability (3-state): online / unstable / offline — drives the
  dot indicator everywhere a dot appears. Pure runtime reachability;
  never sticky-red because of a past task outcome.

- LastTaskState (5-state): running / completed / failed / cancelled /
  idle — surfaced as text + icon on focused surfaces (hover card,
  agent detail page, agents list, runtime detail). Never colours the dot.

Major changes:

* Domain layer: AgentPresence union → AgentAvailability + LastTaskState.
  derive-presence split into deriveAgentAvailability + deriveLastTaskState
  + deriveAgentPresenceDetail orchestrator. Tests reorganised into three
  groups (availability invariants, last-task invariants, composition).

* Visual config: presenceConfig (5 entries) → availabilityConfig (3) +
  taskStateConfig (5). availabilityOrder + lastTaskOrder for filter chips.

* Workspace-level presence prefetch: new useWorkspacePresencePrefetch
  hook + WorkspacePresencePrefetch mount component, wired into
  DashboardLayout (web) and WorkspaceRouteLayout (desktop). Hover cards
  render synchronously with no skeleton flash on first hover.

* ActorAvatar hover: flipped default — disableHoverCard removed,
  enableHoverCard added (default false). Opt-in at ~14 decision-moment
  surfaces; pickers / decoration sub-chips stay plain. Status dot
  decoupled (showStatusDot prop) so picker rows can show presence
  without nesting popovers.

* Hover cards: AgentProfileCard simplified — availability dot only,
  Detail link top-right (logs live on the detail page). New
  MemberProfileCard mirrors the structure: name + role + email +
  top-2 owned agents (sorted by 30d run count) with click-through to
  agent detail.

* Agents list: split Status into two columns — availability (3-color
  dot + label) and Last run (task icon + label, optional running
  counts). Two independent filter chip groups (Status + Last run);
  combination acts as intersection ("online + failed" finds broken-
  but-alive agents).

* Other UI surfaces (issue list/board/detail, comments, autopilots,
  projects, runtimes, mention autocomplete, subscribers picker)
  updated to the new dot semantics; status dot now strictly 3-color.

Server changes accompany the client redesign — workspace-wide
agent-task-snapshot endpoint, runtime usage queries, etc. — to feed
the derive layer with the data it needs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent-detail): drop last-task chip from detail header + inspector

The Recent work section on the agent detail page already shows the same
data (with task titles, timestamps, error context) — surfacing
"Completed" / "Failed" / etc. up in the header was redundant chrome.
Detail surfaces now show only the 3-state availability dot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(tables): handle narrow viewports across agents / skills / runtimes

Three table layouts were squeezing content into adjacent cells at
intermediate widths. Each fix is small and targeted:

* runtime-list: the Runtime cell's base name had `shrink-0`, so it
  refused to truncate when its grid column was narrowed under width
  pressure — the name visually overflowed into the Health column
  ("ClaudeOnline" etc). Removed shrink-0, added truncate. The Health
  column was also a fixed 9.5rem reservation for the worst-case
  "Recently lost · 2m 14s ago" copy; switched to minmax(0,1fr) so it
  competes fairly with Runtime.

* skills-page: had a single grid template with no responsive
  breakpoints — all 6 columns were rendered at any width and got
  visually jammed below md. Added a <md template that drops Source +
  Updated; the row markup hides those cells via `hidden md:block` /
  `md:contents`.

* agent-list-item: the new Last run column was reserved at minmax(8rem,
  max-content); on narrow md viewports the 8rem floor pushed the row
  past available width. Changed to minmax(0,max-content) so the cell
  shrinks under pressure (its content already truncates).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent-card): hover-only Detail + add Runtime row + breathing room

Three small polish tweaks to the agent hover card:

- Detail link gets `mr-1` + fades in only on card hover (group-hover).
  It was visually flush against the popover edge and competing for
  attention; now it stays out of the way during a quick glance and
  surfaces only when the user is dwelling on the card.

- Runtime row is back, in the meta block (cloud/local icon + runtime
  name). The earlier removal was over-aggressive — knowing where an
  agent runs is part of "who is this agent". The wifi badge stays
  dropped because the availability dot in the header already conveys
  reachability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(runtime): wifi-style health icon (4-state) for runtime list + agent card

Replaces the 6px coloured dot with a wifi-shape icon that carries both
state (Wifi vs WifiOff) and severity (success/warning/muted/destructive).

Mapping:
- online        → Wifi (success)
- recently_lost → WifiHigh (warning) — transient hiccup, fewer bars
- offline       → WifiOff (muted)    — long unreachable
- about_to_gc   → WifiOff (destructive) — sweeper coming soon

Used in two places:

- Runtime list: replaces HealthDot in the dedicated leading-icon column.
  Bumped the column from 0.5rem (dot-sized) to 0.875rem (icon-sized).

- Agent profile card RuntimeRow: derives runtime health from runtime +
  clock (matching the 4-state semantics) and renders HealthIcon next
  to the runtime name. Cloud runtimes always read as online. The
  duplicate signal with the header availability dot is intentional —
  it confirms WHICH runtime is the one currently in the dot's state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 19:21:13 +08:00
Bohan Jiang
b77acdf642 fix(comments): cancel triggered tasks when comment is deleted (#1747)
When a user deletes a comment that triggered an agent task, the agent
would still run with the now-deleted content baked into its prompt
(fetched at task claim time) — manifesting as "the agent still sees the
deleted comment". The FK ON DELETE SET NULL only nullified
trigger_comment_id; the queued task itself was never cancelled.

DeleteComment now cancels any queued/dispatched/running task whose
trigger is the deleted comment, before the comment row is removed.
2026-04-27 18:24:07 +08:00
devv-eve
ba2f19d631 fix: refresh agent status from active tasks (#1733)
Co-authored-by: Eve <eve@multica.ai>
2026-04-27 13:34:24 +08:00
jmoney8896
3c3e3bd330 fix(task): reconcile agent status when cancelling tasks by issue (#1587) (#1648)
CancelTasksForIssue silently dropped the list of affected tasks, so
whenever an issue transitioned to "cancelled" or "done" while a task was
still active (6 call sites in issue.go), the underlying agent was left
stuck at status="working" indefinitely and required a manual
`multica agent update <id> --status idle` to self-correct. This matches
the symptom reported in #1587: task rows move to "cancelled" via a
non-user-initiated path, agent status never reconciles.

Change CancelAgentTasksByIssue from :exec to :many (also tack on
completed_at = now() for consistency with CancelAgentTasksByIssueAndAgent),
then update CancelTasksForIssue to iterate the returned rows and call
ReconcileAgentStatus + broadcast task:cancelled per affected task —
mirroring the pattern already used by CancelTask and RerunIssue.

No test added; the change is small and mirrors well-covered paths.
Happy to add a mock-backed test in a follow-up if reviewers prefer.

Refs #1587
Refs #1149
2026-04-26 10:58:42 +08:00
LinYushen
0b1333fb00 feat(server): orphan-task recovery + auto-retry + manual rerun (MUL-1128) (#1476)
* feat(server): orphan-task recovery + auto-retry + manual rerun (MUL-1128)

When the daemon process crashed mid-task the issue was stuck at
in_progress for up to 2.5h: the in-flight task timeout was the only
mechanism that ever moved the row, and the runtime heartbeat sweeper
only fires after the runtime stays offline for 45s — a quick restart
beats both windows.

This change implements the A+B plan from the issue thread:

A. lifecycle hygiene
- migration 055 adds attempt / max_attempts / parent_task_id /
  failure_reason / last_heartbeat_at to agent_task_queue
- new daemon-auth endpoint POST /runtimes/{id}/recover-orphans:
  daemon calls it on every register so the server fails any
  dispatched/running tasks the previous process left behind
- new daemon-auth endpoint POST /tasks/{id}/session: persists the
  agent's session_id + work_dir mid-flight so a crash doesn't
  lose the resume pointer (claude+codex emit MessageStatus with
  SessionID; daemon forwards on the first one it sees)
- FailAgentTask / FailStaleTasks / FailTasksForOfflineRuntimes
  now set failure_reason ('agent_error' / 'timeout' /
  'runtime_offline')

B. auto-retry with resume context
- TaskService.MaybeRetryFailedTask spawns a fresh queued attempt
  carrying parent's session_id/work_dir when the failure reason
  is infrastructure-shaped (timeout, runtime_offline,
  runtime_recovery) and attempt < max_attempts; skips autopilot
- wired into the runtime sweeper paths and TaskService.FailTask
  so the user transparently sees a new in_progress run instead of
  a stuck row
- new user-auth POST /api/issues/{id}/rerun + multica issue rerun
  CLI for the manual escape hatch

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(server): address PR review for orphan-task recovery (MUL-1128)

Three review-must-fix items on top of the A+B implementation:

1. recover-orphans now funnels through TaskService.HandleFailedTasks,
   the same shared post-failure pipeline used by the runtime sweeper.
   This guarantees task:failed events are emitted, agent status is
   reconciled, and issues stuck in_progress with no remaining active
   task are reset to todo even when no auto-retry is created
   (max_attempts exhausted, autopilot, non-retryable reason).

2. RerunIssue now uses CancelAgentTasksByIssueAndAgent, scoped to the
   issue's current assignee. The previous implementation called
   CancelAgentTasksByIssue, which would collateral-cancel parallel
   @-mention agents on the same issue.

3. GetLastTaskSession now considers both completed and failed tasks
   (mirroring GetLastChatTaskSession), ordering by the most recent
   timestamp. With UpdateAgentTaskSession pinning session_id/work_dir
   mid-flight, an auto-retry or manual rerun of a daemon-crash failure
   now actually resumes the prior conversation context instead of
   starting fresh — matching the stated B-branch behaviour.

go build / go vet pass; the existing service and agent test suites pass.
runtime_sweeper / handler integration tests require a local DB with the
055 migration (and the pre-existing 050 first_executed_at column).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-22 13:08:37 +08:00
Jiayuan Zhang
b291db11c2 feat(agents): add per-agent model field with provider-aware dropdown (#1399)
Adds a first-class `model` field on agents so users can pick the LLM model from the create / settings UI instead of editing `custom_env` / `custom_args`. Each provider's dropdown is populated from the live CLI when possible (`opencode models`, `pi --list-models`, `openclaw agents list --json`, `cursor-agent --list-models`, hermes ACP `session/new` → `SessionModelState`), with a static catalog for providers that don't enumerate.

Daemon resolves the runtime model as `agent.model → MULTICA_<PROVIDER>_MODEL → ""` — empty passes through so each backend's CLI picks its own default, avoiding static-guess drift.

Per-provider honouring:
- Claude / Codex / OpenCode / Cursor / Gemini / Pi / Copilot — CLI `--model` / thread payload.
- OpenClaw — `opts.Model` is mapped to `--agent <name>` (the CLI rejects `--model`).
- Hermes — `session/set_model` ACP RPC; stderr is sniffed for provider-level errors so HTTP 4xx from the configured LLM surfaces instead of "empty output"; explicit-model failures mark the task `failed`.

Supporting changes: migration 050 adds `agent.model`; daemon ↔ server heartbeat piggyback carries a model-discovery request; new REST endpoints under `/api/runtimes/{id}/models`; `multica agent create --model` / `update --model`; shared `ModelDropdown` in `packages/views/agents` (searchable, creatable, provider-grouped, default-badge, runtime-supported gate).
2026-04-21 00:06:34 +08:00
devv-eve
5fa1da448f fix(chat): preserve chat session resume pointer across failures (#1360)
* fix(chat): preserve chat session resume pointer across failures

The chat 'forgets earlier messages' bug came from PriorSessionID being
silently lost in several edge cases:

- UpdateChatSessionSession unconditionally overwrote chat_session.session_id,
  so any task that completed without a session_id (early agent crash,
  missing result) wiped the resume pointer to NULL.
- CompleteAgentTask + UpdateChatSessionSession ran in separate calls. A
  follow-up chat message claimed in between resumed against a stale (or
  NULL) session and started over.
- FailAgentTask never wrote session_id back, so a task that established
  a real session before failing lost its resume pointer.
- ClaimTaskByRuntime only trusted chat_session.session_id and never
  fell back to the existing GetLastChatTaskSession query, so a single
  bad turn could permanently drop the conversation memory.

This change:

- Use COALESCE in UpdateChatSessionSession so empty inputs preserve the
  existing pointer; surface DB errors instead of swallowing them.
- Run CompleteAgentTask/FailAgentTask + UpdateChatSessionSession inside
  the same transaction (TaskService now takes a TxStarter).
- Extend FailAgentTask + the daemon FailTask path (client, handler,
  service) to forward session_id/work_dir, so failed/blocked tasks that
  built a real session still record it.
- Fall back to GetLastChatTaskSession in ClaimTaskByRuntime when the
  chat_session pointer is missing, and include failed tasks in that
  lookup so a single failure can't lose the conversation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(daemon): forward session_id/work_dir on blocked + timeout paths

runTask previously dropped result.SessionID and env.WorkDir on the
non-completed return paths:

- timeout returned a naked error, so handleTask called FailTask with
  empty session info and the chat resume pointer was either left stale
  or eventually overwritten with NULL.
- blocked / failed (default branch) returned a TaskResult without
  SessionID / WorkDir, so even though FailTask now COALESCEs into
  chat_session, there was no value to write through.
- the empty-output completion path was the same: it raised an error
  even when a real session_id had been built.

All three paths now return a TaskResult that carries the SessionID /
WorkDir the backend produced. Combined with the COALESCE-based update
in UpdateChatSessionSession and the FailTask plumbing introduced in
PR #1360, the next chat turn can always resume from the latest agent
session — even when the previous turn timed out, was rate-limited, or
returned an empty completion — instead of starting over with no memory
of the conversation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(copilot): capture session id from session.start as fallback

The Copilot backend only read sessionId from the synthetic 'result'
event, ignoring the one already present on session.start. When the CLI
was killed before result arrived (timeout, cancel, crash, or a
session.error mid-turn), the daemon reported SessionID="" and the
chat-session resume pointer could not advance — causing the chat to
silently drop conversation memory on the next turn.

Capture session.start.sessionId into state up front, and only let
'result' overwrite it when it actually carries one. result still wins
when present (it is the authoritative end-of-turn record).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(copilot): parse premiumRequests as float to preserve session id

Copilot CLI v1.0.32 serializes premiumRequests as a float (e.g. 7.5),
not an integer. Our copilotResultUsage struct typed it as int, which
made the entire 'result' line fail json.Unmarshal — silently dropping
sessionId on every turn.

This was the real cause of chat memory loss: the daemon reported
SessionID="" to the server, chat_session.session_id stayed NULL, and
the next chat turn never received --resume <id>, so each turn started
a fresh Copilot session with no prior context.

Add a regression test using the real JSON line from CLI v1.0.32 that
asserts sessionId is preserved when premiumRequests is fractional.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Devv <devv@Devvs-Mac-mini.local>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Eve <eve@multica.ai>
Co-authored-by: yushen <ldnvnbl@gmail.com>
2026-04-19 22:50:33 -07:00
Korkyzer
63800f05ff fix(agent): add per-agent mcp_config field to restore MCP access (#1168)
* fix(agent): add per-agent mcp_config field to restore MCP access

Closes #1111

The --strict-mcp-config flag was added defensively in #592 to prevent
Claude agents from inheriting MCP state from the outer Claude Code session.
It was meant to be paired with --mcp-config <path> to inject a controlled
set of MCPs, but that path was never implemented, which silently stripped
all user-scope MCPs from spawned agents.

This PR completes the original design by:

- Adding a nullable mcp_config jsonb column to the agents table
- Wiring mcp_config through AgentResponse, Create/Update requests
- Piping it into ExecOptions.McpConfig in the daemon
- Serializing to a temp file and passing --mcp-config <path> in buildClaudeArgs
- Blocklisting --mcp-config in claudeBlockedArgs to prevent override
  via custom_args

Does not touch Codex provider (tracked separately in #674).
Does not implement Multica MCP auto-injection (out of scope).

* fix: disambiguate JSON null vs absent for mcp_config
2026-04-18 01:35:22 +08:00
Bohan Jiang
ce447c7f06 feat(agent): add custom CLI arguments support (#986)
* feat(agent): add custom CLI arguments support

Allow users to configure custom CLI arguments per agent that get
appended to the agent subprocess command at launch time. This enables
use cases like specifying different models (--model o3), max turns,
or other provider-specific flags without needing separate runtimes.

Changes:
- Add custom_args JSONB column to agent table (migration 041)
- Update API handler to accept/return custom_args in create/update
- Pass custom_args through claim endpoint to daemon
- Append custom_args to CLI commands for all agent backends
- Add ExecOptions.CustomArgs field in agent package
- Add Custom Args tab in agent detail UI
- Add --custom-args flag to CLI agent create/update commands

Closes MUL-802

* fix(agent): filter protocol-critical flags from custom_args

Add per-backend filtering of custom_args to prevent users from
accidentally overriding flags that the daemon hardcodes for its
communication protocol (e.g. --output-format, --input-format,
--permission-mode for Claude).

This follows the same pattern as custom_env's isBlockedEnvKey: we
only block the small, stable set of flags that would break the
daemon↔agent protocol — not every possible dangerous flag. Workspace
members are trusted for everything else.

Each backend defines its own blocked set:
- Claude: -p, --output-format, --input-format, --permission-mode
- Gemini: -p, --yolo, -o
- Codex: --listen
- OpenCode: --format
- OpenClaw: --local, --json, --session-id, --message
- Hermes: none (ACP is positional)

Includes unit tests for the filtering logic.

* fix(agent): address code review nits for custom_args

- Replace module-level `nextArgId` counter with `crypto.randomUUID()`
  in custom-args-tab.tsx to avoid SSR ID conflicts
- Add unit tests for custom args passthrough and blocked-arg filtering
  in both Claude and Gemini arg builders
2026-04-15 14:58:53 +08:00
Jiang Bohan
4165401d16 feat(agent): support custom environment variables for router/proxy mode
Add per-agent custom_env configuration that gets injected into the agent
subprocess at launch time. This enables users to configure custom API
endpoints (ANTHROPIC_BASE_URL), API keys (ANTHROPIC_API_KEY), and cloud
provider modes (CLAUDE_CODE_USE_BEDROCK, CLAUDE_CODE_USE_VERTEX) without
requiring code changes.

Changes:
- Migration 040: add custom_env JSONB column to agent table
- Backend: custom_env in agent CRUD API + claim endpoint
- Daemon: merge custom_env into subprocess environment variables
- Frontend: env var editor in agent settings (key-value pairs with
  visibility toggle for sensitive values)

Closes #816
Related: #807, #809
2026-04-13 16:47:56 +08:00
yushen
50f9e673e8 feat(chat): add agent chat feature (full stack)
Implement the Master Agent chat feature allowing users to chat with agents
directly from a floating window, separate from the issue-based workflow.

Backend:
- New chat_session and chat_message tables (migration 033)
- Make issue_id nullable on agent_task_queue for chat tasks
- REST API: create/list/get/archive sessions, send/list messages
- EnqueueChatTask in TaskService with session_id persistence
- WS events: chat:message, chat:done
- Daemon: chat task type with separate prompt builder
- ClaimTaskByRuntime populates chat context (session, message, repos)

Frontend:
- ChatSession/ChatMessage types + API client methods
- core/chat: TanStack Query options, mutations with optimistic updates, WS updaters
- features/chat: Zustand store, ChatFab (floating button), ChatWindow with
  real-time streaming via task:message events
- Mounted in dashboard layout (bottom-right corner)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 14:19:46 +08:00
Bohan Jiang
98af9f442c Merge pull request #471 from multica-ai/agent/j/959392dd
feat: support multiple agents running on same issue
2026-04-08 12:45:56 +08:00
devv-eve
7c79611309 refactor: remove agent triggers config field (#469)
* refactor: remove agent triggers config field

Remove the triggers field from agent configuration. The on_assign,
on_comment, and on_mention behaviors are now always enabled (hardcoded),
as decided in the Agentflow design discussion (MUL-372).

Changes:
- Database: migration 032 drops triggers column from agent table
- Backend: remove triggers from create/update agent APIs and response
- Backend: simplify trigger-checking logic to always-enabled
- Frontend: remove TriggersTab UI and AgentTrigger types
- Tests: remove trigger config unit tests (no longer configurable)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: also remove agent tools config field

Remove the tools field from agent configuration alongside triggers.
The tools field was a placeholder — stored in the DB and shown in the
UI but never passed to the daemon or used at runtime.

- Database: migration 032 now also drops tools column
- Backend: remove tools from create/update agent APIs and response
- Frontend: remove ToolsTab UI, AgentTool type, and tools tab
- Update landing page copy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(test): remove tools/triggers columns from test fixtures

The test fixtures still referenced the dropped tools and triggers
columns when inserting agent rows, causing CI failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Devv <devv@Devvs-Mac-mini.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 22:02:28 +08:00
Jiang Bohan
b94108768e feat: support multiple agents running concurrently on the same issue
- Relax ClaimAgentTask SQL constraint from per-issue to per-(issue, agent)
  serialization, allowing different agents to run in parallel on the same issue
- Update GetActiveTaskForIssue API to return all active tasks (array) instead of
  just the first one
- Refactor AgentLiveCard to render one card per active task, routing WebSocket
  messages by task_id for independent timelines
- Fix shouldEnqueueOnComment to use per-agent dedup so a mentioned agent's
  pending task doesn't block the assigned agent's on_comment trigger

Closes MUL-160
2026-04-07 18:19:57 +08:00
LinYushen
09764c5f51 feat(agent): replace hard delete with archive/restore (#346)
* feat(agent): replace hard delete with archive/restore

Replace agent deletion with soft archive pattern. Archived agents
are preserved in the database with all historical references intact
but cannot be assigned, mentioned, or trigger tasks.

Backend:
- Add archived_at/archived_by columns to agent table (migration 031)
- Replace DELETE /api/agents/{id} with POST /api/agents/{id}/archive
- Add POST /api/agents/{id}/restore endpoint
- ListAgents excludes archived by default (?include_archived=true to include)
- Skip archived agents in task triggers (on_assign, on_comment, on_mention)
- Block assignment to archived agents
- Cancel pending tasks on archive
- New events: agent:archived, agent:restored (replacing agent:deleted)

Frontend:
- Agent type includes archived_at/archived_by fields
- Mention autocomplete and assignee picker filter out archived agents
- Agent list shows archived agents with muted styling
- Agent detail shows archive banner with restore button
- Delete button replaced with Archive button and updated confirmation dialog
- API client: archiveAgent/restoreAgent replace deleteAgent

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(agent): self-review fixes for archive feature

- Fix: workspace store now fetches agents with include_archived=true
  so archived agents are actually visible in the frontend (the archived
  UI was dead code before — ListAgents excludes archived by default)
- Fix: add error logging for CancelAgentTasksByAgent in ArchiveAgent
- Fix: add idempotency guards — return 409 Conflict when archiving
  an already-archived agent or restoring a non-archived agent
- Fix: revert unnecessary extra GetAgent query in ReconcileAgentStatus
  (archived agents won't have running tasks after CancelAgentTasksByAgent)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 17:33:52 +08:00
LinYushen
a80d61f8e1 fix(task): enforce per-issue serial execution in task claiming (#330)
Add NOT EXISTS check to ClaimAgentTask SQL to prevent claiming a queued
task when the same issue already has a dispatched/running task. This
ensures serial execution within an issue while preserving parallel
execution across different issues (concurrency group pattern).

Also add defensive guard in the frontend task:dispatch handler to avoid
replacing an active task's LiveLog timeline mid-execution.

Closes MUL-183

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 14:12:00 +08:00
Jiang Bohan
4780540bd2 merge: resolve conflicts with main 2026-04-01 13:12:23 +08:00
Jiang Bohan
365b5b5f0e feat(agent): add task cancellation with stop button
Users can now interrupt running agents via a "Stop" button on the live
card. The daemon polls task status every 5 seconds and kills the agent
process when cancellation is detected.

Changes:
- New CancelAgentTask SQL query and CancelTask service method
- POST /api/issues/{id}/tasks/{taskId}/cancel endpoint
- Daemon polls GetTaskStatus during execution, cancels context on match
- Frontend: Stop button on AgentLiveCard, task:cancelled WS event
2026-03-31 18:43:37 +08:00
Jiayuan
37881adbed feat(server): trigger agents via @mention in comments
When a user @mentions an agent in any issue's comment, the system now
enqueues a task for that agent. The agent reads the issue context and
replies to the triggering comment thread.

Changes:
- Add shared util.ParseMentions for mention parsing (used by both
  comment handler and notification listeners)
- Add EnqueueTaskForMention to TaskService for explicit agent targeting
- Add on_mention trigger type support in agent trigger config
- Add HasPendingTaskForIssueAndAgent SQL query for per-agent dedup
- Add enqueueMentionedAgentTasks in CreateComment handler

Safety: prevents self-trigger (agent mentioning itself), dedup with
assignee on_comment trigger, terminal issue status check, and per-agent
pending task dedup.
2026-03-31 15:30:24 +08:00
LinYushen
961de18c97 feat(agents): reply as thread instead of top-level comment (#205)
* feat(agents): reply as thread instead of top-level comment

When an agent responds to a user comment, the reply is now nested under
the triggering comment (parent_id) instead of appearing as a separate
top-level comment. Also enables on_comment trigger by default for newly
created agents.

- Add trigger_comment_id column to agent_task_queue (migration 028)
- Pass triggering comment ID through EnqueueTaskForIssue → task → createAgentComment
- Include parent_id in WebSocket broadcast for agent comments
- Default agent creation includes both on_assign and on_comment triggers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(cli): add --parent flag to comment add for threaded replies

The agent posts comments via the CLI, so the correct fix is giving it a
--parent flag rather than wiring trigger_comment_id through the task
infrastructure. The agent reads the comment list, decides which comment
to reply to, and passes --parent <comment-id>.

- Add --parent flag to `multica issue comment add`
- Update agent runtime instructions to explain --parent usage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(daemon): pass trigger_comment_id to agent execution context

The agent now knows which comment triggered its task and gets an explicit
instruction to reply to it using --parent. The trigger_comment_id flows
from the DB through the claim response, daemon Task struct, and into
issue_context.md where the agent sees it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(comments): agent replies to thread root, matching frontend behavior

When the triggering comment is itself a reply (has parent_id), resolve
to the thread root so the agent's reply stays in the same flat thread.
This matches the frontend where all replies share the top-level parent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(cli): show parent_id and full IDs in comment list

The table output now includes a PARENT column and shows full comment IDs
(not truncated) so agents can see thread structure and use --parent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(daemon): instruct agents to always use --output json

Agents now see explicit guidance to use --output json for all read
commands, ensuring they get structured data with full IDs and parent_id
for proper threading.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(daemon): differentiate comment-trigger vs assign-trigger context

When triggered by a comment, the agent now gets clear instructions:
- Primary goal is to read and respond to the comment
- Do NOT change issue status just because you replied
- Only change status if explicitly requested

This prevents the agent from seeing "In Review" and stopping, since
it now understands the task is to reply, not to re-evaluate the issue.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(daemon): split workflow by trigger type in CLAUDE.md/AGENTS.md

The Workflow section in the agent's runtime config now shows a
comment-reply workflow when triggered by a comment (read comments,
find trigger, reply, don't change status) vs the full assignment
workflow (set in_progress, do work, set in_review).

Previously the agent always saw the assignment workflow, causing it
to check the issue status, see "In Review", and stop without reading
or replying to the triggering comment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(daemon): remove duplicate workflow from issue_context.md

Workflow instructions now live only in CLAUDE.md/AGENTS.md (runtime_config.go).
issue_context.md keeps just the task data: issue ID, trigger type, and
triggering comment ID.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(task): skip duplicate comment on completion for comment-triggered tasks

When triggered by a comment, the agent posts its own reply via CLI
with --parent. The task completion path was also creating a comment
from the agent's stdout output, resulting in duplicates. Now only
assignment-triggered tasks auto-post output as a comment. Error
messages from FailTask are still posted regardless of trigger type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 13:48:39 +08:00