multica

mirror of https://github.com/multica-ai/multica.git synced 2026-06-17 03:38:32 +02:00

Author	SHA1	Message	Date
YOMXXX	34f16e2c7a	fix(opencode): deny interactive questions in daemon mode (#2878) * fix(opencode): deny interactive questions in daemon mode * fix(opencode): avoid permission env ordering bypass	2026-05-20 17:17:31 +08:00
Angular	1f978bf1ec	feat(autopilot): link created issues to projects (#2908) * feat(autopilot): link created issues to projects * test(autopilot): cover project flag	2026-05-20 15:37:23 +08:00
Bohan Jiang	2bec2221d2	feat(agent): per-agent thinking_level for claude + codex (MUL-2339) (#2865 ) * feat(agent): persist thinking_level per agent (MUL-2339) Adds a nullable `thinking_level` column to the `agent` table so the backend can route a runtime-native reasoning/effort token (e.g. Claude's `xhigh`, Codex's `minimal`) through to the agent CLI on every dispatch. The column is intentionally TEXT rather than an enum — Claude and Codex publish overlapping but distinct vocabularies and we want the persisted value to round-trip exactly through whichever CLI receives it. NULL is the "use runtime default" sentinel that every downstream consumer reads as "do not inject --effort / reasoning_effort". This commit is just the storage layer (migration + sqlc); subsequent commits wire it through the API, daemon, and agent backends. Co-authored-by: multica-agent <github@multica.ai> * feat(agent-backend): inject reasoning effort for claude + codex (MUL-2339) Extends ExecOptions with a runtime-native ThinkingLevel string and wires it into the Claude and Codex backends. Discovery is driven by the local CLI so the daemon advertises whatever the host install supports rather than a hand-maintained list that goes stale. Per Elon's PR1 review: - Claude: parses `claude --help` to learn the `--effort` superset and projects through a per-model allow-list (xhigh is Opus-only; max is session-only on the smaller models). Falls back to a conservative static list when the binary is missing or help drift hides the line. - Codex: drives `codex debug models --output json` so per-model reasoning subsets and the documented default come directly from the CLI. The older config-error probe trick is gone — the JSON path is stable and doesn't pollute stderr with an intentional misconfig. - Cache key includes (provider, executablePath, cliVersion) so a CLI upgrade invalidates entries that referenced the older help / catalog. 
Per Trump's PR1 constraint, all three Codex injection points (thread/start.config, thread/resume.config, turn/start.effort) flow through one helper (`applyCodexReasoningEffort`) so they cannot drift independently. The shared `codexReasoningCases` fixture in `thinking_test.go` asserts the same value→{shape, key} contract at each site for every level the runtimes know about. Claude's `--effort` is also added to `claudeBlockedArgs` so a user custom_args entry can't silently outvote the daemon-injected value. Co-authored-by: multica-agent <github@multica.ai> * feat(api): wire thinking_level through API + daemon contract (MUL-2339) End-to-end plumbing for the per-agent reasoning/effort setting: - AgentResponse / TaskAgentData now carry `thinking_level`; the daemon's claim response includes it and the daemon's executor passes it through to agent.ExecOptions, where the Claude and Codex backends already know what to do with it. - ModelEntry on the runtime-models wire format gains a `thinking` block carrying `supported_levels` + `default_level` per model so the UI can render a runtime-aware picker without the server having to know about the local CLI install. `handleModelList` projects the agent-package catalog (including the new Thinking field) into the wire shape. - CreateAgent / UpdateAgent gate the field with a synchronous provider enum check (claude / codex only today). UpdateAgent is tri-state: field omitted = no change, "" = explicit clear (new `ClearAgentThinkingLevel` query, mirrors the existing mcp_config null pattern), non-empty = validate then set. Per Trump's PR1 review, the API NEVER auto-clears on a runtime/model swap and ALWAYS returns 400 on an unknown literal value — same shape across CreateAgent, UpdateAgent, and combined patches that move runtime + level in one request. Per-model combination failures (e.g. `xhigh` against a model that only supports up to `high`) surface as a daemon-side task error, not a silent server-side rewrite. 
TS types follow the same shape: `Agent.thinking_level`, `CreateAgentRequest`/`UpdateAgentRequest` add the field, `RuntimeModel` grows a `thinking` block. Older backends omit the field, which the front-end treats as "no picker for this model" — installed desktop builds keep working. Co-authored-by: multica-agent <github@multica.ai> * fix(agent): correct codex debug models argv + pin via runner test (MUL-2339) `codex debug models --output json` is rejected by codex-cli 0.131.0 — the subcommand emits JSON on stdout by default and has no `--output` flag. Drop the flag and add `--bundled` to skip the network refresh discovery doesn't need. Move the argv to a package-level var and add a test that runs a fake `codex` to assert the binary actually receives exactly `debug models --bundled`, so the contract can't silently drift on the next refactor. Also teach ValidateThinkingLevel to resolve an empty model to the provider's default model entry. Without this, every default-model task with a persisted thinking_level would be misjudged "unknown model" by the daemon guard. Co-authored-by: multica-agent <github@multica.ai> * fix(api): reject runtime switch that would leave invalid thinking_level (MUL-2339) A PATCH that changed `runtime_id` without touching `thinking_level` used to silently keep the existing value, so a Claude agent storing `max` could land on a Codex runtime where `max` is not a recognised token at all, and the daemon would receive a literal-invalid level. Hold the same "always 400 on literal-invalid, never silent coerce" rule on this implicit path. When runtime_id changes and the existing value is not in the new provider's enum, return 400 with the recovery options (clear via `thinking_level=""` or re-set in the same PATCH). Add coverage for both the kept-when-still-valid and the rejected cases, plus the two recovery paths (clear and replace). 
Co-authored-by: multica-agent <github@multica.ai> * fix(daemon): guard runTask with per-model thinking_level validator (MUL-2339) ValidateThinkingLevel existed but had no call site — `task.Agent. ThinkingLevel` flowed straight into ExecOptions, so `xhigh` configured on a non-Opus Claude model, or API-side stale values that escaped the provider enum gate, would be injected anyway. Run the validator before building ExecOptions. Invalid combinations log a warning and drop the level instead of failing the task: the agent still runs, just at the runtime's default reasoning effort. Discovery errors fail open (keep the level, let the CLI surface any objection) so a transient `claude --help` failure can't strand work. Empty model is forwarded as-is; the validator resolves it to the provider's default model internally per the cross-package contract. Co-authored-by: multica-agent <github@multica.ai> * chore(agent): drop stale `--output json` comments + unused scanner (MUL-2339) Codex CLI's `debug models` subcommand emits JSON without an `--output` flag, and `parseCodexDebugModels` never read from the bufio.Scanner. Sync the comments with the actual invocation and remove the dead init. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-20 12:30:10 +08:00
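The tri-state `UpdateAgent` rule described in the commit above (field omitted = no change, `""` = explicit clear, non-empty = validate then set) can be sketched roughly as follows. This is only an illustration under stated assumptions: `resolveThinkingLevel`, `errUnknownLevel`, and the contents of the `allowed` map are hypothetical names, not code from the repository, and the error stands in for the HTTP 400 the commit mentions.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnknownLevel is a hypothetical stand-in for the 400 response.
var errUnknownLevel = errors.New("unknown thinking_level")

// resolveThinkingLevel applies a tri-state patch to the stored value:
//   patch == nil  -> field omitted, keep the current value
//   *patch == ""  -> explicit clear (nil means "use runtime default")
//   otherwise     -> validate against the provider vocabulary, then set
func resolveThinkingLevel(current, patch *string, allowed map[string]bool) (*string, error) {
	if patch == nil {
		return current, nil
	}
	if *patch == "" {
		return nil, nil
	}
	if !allowed[*patch] {
		// Never coerce silently: reject and leave the stored value alone.
		return current, fmt.Errorf("%w: %q", errUnknownLevel, *patch)
	}
	v := *patch
	return &v, nil
}

func main() {
	// Illustrative vocabulary only; the commit says real levels are
	// discovered from the locally installed CLI.
	allowed := map[string]bool{"low": true, "high": true}
	cur := "low"
	high, empty, bogus := "high", "", "bogus"

	kept, _ := resolveThinkingLevel(&cur, nil, allowed)
	set, _ := resolveThinkingLevel(&cur, &high, allowed)
	cleared, _ := resolveThinkingLevel(&cur, &empty, allowed)
	_, err := resolveThinkingLevel(&cur, &bogus, allowed)

	fmt.Println(*kept, *set, cleared == nil, errors.Is(err, errUnknownLevel))
}
```

Using a pointer for the patch field is one common way to tell "omitted" apart from "empty string" after JSON decoding; whether the handler does it this way is an assumption.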
Jiayuan Zhang	fc8528d64d	feat(autopilot): support assigning to a squad (MUL-2429) (#2888 ) * feat(autopilot): support assigning autopilot to a squad (MUL-2429) Path A (Squad-as-Leader) from the RFC: when an autopilot's assignee is a squad, dispatch resolves to squad.leader_id and executes against the leader's runtime — semantics match a human manually assigning the issue to that squad, no fan-out. Backend scope only; frontend picker change is a follow-up PR. Changes: - 096_autopilot_squad_assignee migration: drop agent FK on autopilot.assignee_id, add assignee_type column (default 'agent'), add autopilot_run.squad_id attribution column. - service.AgentReadiness: single source of truth for archived / runtime-bound / runtime-online checks. Shared by autopilot admission gate, run_only dispatch, and isSquadLeaderReady. - service.resolveAutopilotLeader: translates assignee_type/id to the agent that actually runs the work. - dispatchCreateIssue: stamps issue with assignee_type='squad' for squad autopilots and enqueues via EnqueueTaskForSquadLeader. - dispatchRunOnly: belt-and-braces readiness re-check after resolving squad → leader so a leader that went offline between admission and dispatch produces a clean failure instead of a doomed task. - handler.CreateAutopilot / UpdateAutopilot: accept assignee_type with squad/agent existence + leader-archived validation. Backward-compatible default of "agent" preserves the contract for older clients. - Analytics: AutopilotRunStarted/Completed/Failed events carry assignee_type and squad_id; PostHog can now group autopilot runs by squad without joining back to the autopilot row. Co-authored-by: multica-agent <github@multica.ai> * fix(autopilot): reject archived squads, route post-admission skips, cleanup dangling-agent autopilots (MUL-2429) Addresses three review findings on PR #2888: 1. 
Archived squad handling: validateAutopilotAssignee now rejects squads with archived_at set; resolveAutopilotLeader returns errSquadArchived so the admission gate fails closed; DeleteSquad now mirrors the issue transfer for autopilot rows (TransferSquadAutopilotsToLeader) so surviving autopilots flip to assignee_type='agent' (leader) instead of dangling at the archived squad. 2. dispatchRunOnly post-admission readiness: introduces errDispatchSkipped sentinel, recognised by DispatchAutopilot via handleDispatchSkip so the run is recorded as `skipped` (not `failed`). Manual triggers no longer 500 when the leader's runtime goes offline between admission and task creation. New TestManualTriggerDoesNotErrorOnPostAdmissionSkip locks the behaviour in. 3. Dangling agent assignee after migration 096 dropped the FK: shouldSkipDispatch now distinguishes pgx.ErrNoRows / errSquadArchived (hard skip — retrying won't help) from transient DB errors (fail-open). DeleteAgentRuntime pauses autopilots that target agents about to be hard-deleted (ListArchivedAgentIDsByRuntime + PauseAutopilotsByAgentAssignees) so the breakage surfaces as a paused row in the UI instead of a quiet skip-burning loop. Unit tests cover the sentinel unwrap contract and errSquadArchived errors.Is behaviour. Integration test TestAutopilotDispatchSkipsWhenRuntimeOffline re-verified against a fresh DB with migration 096 applied. Co-authored-by: multica-agent <github@multica.ai> * fix(autopilot): bump last_run_at on post-admission skip (MUL-2429) Match recordSkippedRun (pre-flight skip) and the success path so the scheduler / "last seen" UI both reflect that this tick evaluated the trigger, even when the post-admission readiness gate caught a late regression. Addresses Emacs review caveat #1 on PR #2888. Co-authored-by: multica-agent <github@multica.ai> * feat(autopilot): mixed agent/squad assignee picker in dialog (MUL-2429) End-to-end UI for assigning an autopilot to a squad. 
Closes the PR #2888 backend gap: the squad-as-assignee feature was already wired in Go (Path A, RFC §4) but the desktop dialog never offered the choice. - core/types/autopilot: add `AutopilotAssigneeType`, surface `assignee_type` on `Autopilot` + Create/Update request payloads. - views/autopilots/pickers/agent-picker: switch to a polymorphic AssigneeSelection (`{type, id}`); render agents and squads as two grouped sections with shared pinyin search. - views/autopilots/autopilot-dialog: maintain `assigneeType` state, send it on create/update, render the trigger avatar / hover dot with `assignee.type`. - views/autopilots/autopilots-page + autopilot-detail-page: render the assignee row using `autopilot.assignee_type` so squad-typed autopilots show the squad avatar + name, not a broken agent lookup. - locales: add `agents_group` / `squads_group` / `select_assignee` keys (en + zh-Hans), keep legacy `select_agent` for callers that still reference it. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: Lambda <lambda@multica.ai> Co-authored-by: multica-agent <github@multica.ai>	2026-05-20 05:30:13 +02:00
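The `errDispatchSkipped` sentinel mentioned in the commit above lets the caller record a run as `skipped` rather than `failed`. A minimal sketch of that pattern, assuming a wrapped sentinel checked with `errors.Is`: the function bodies, the `runStatus` helper, and the `"completed"` status string are hypothetical; only the identifiers `errDispatchSkipped` and `dispatchRunOnly` and the `skipped` / `failed` statuses appear in the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// errDispatchSkipped marks a dispatch that was abandoned on purpose,
// for example because the leader's runtime went offline after admission.
var errDispatchSkipped = errors.New("dispatch skipped")

// dispatchRunOnly is a simplified stand-in: it re-checks readiness and
// wraps the sentinel so callers can still see the reason.
func dispatchRunOnly(leaderOnline bool) error {
	if !leaderOnline {
		return fmt.Errorf("%w: leader runtime offline", errDispatchSkipped)
	}
	return nil
}

// runStatus maps a dispatch error onto a run status (hypothetical helper).
func runStatus(err error) string {
	switch {
	case err == nil:
		return "completed" // assumed label for the success path
	case errors.Is(err, errDispatchSkipped):
		return "skipped"
	default:
		return "failed"
	}
}

func main() {
	fmt.Println(runStatus(dispatchRunOnly(true)))
	fmt.Println(runStatus(dispatchRunOnly(false)))
	fmt.Println(runStatus(errors.New("db unavailable")))
}
```

Because the sentinel is wrapped with `%w`, the check survives additional layers of wrapping, which is the "sentinel unwrap contract" the unit tests in the commit are said to cover.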
Jiayuan Zhang	2ad1cd8ff8	feat(profile): user profile description injected into agent brief (MUL-2406) ## Summary Adds per-user `profile_description` so coding agents have cheap, durable context about who is asking. v1 per the brief Xeon locked in on [MUL-2406](mention://issue/63a7247c-4f6a-42cf-90d1-7c746e77158a): - DB — `user.profile_description TEXT NOT NULL DEFAULT ''` (migration 096). 2000-rune cap enforced server-side. No nullable / privacy state to manage. - API — `PATCH /api/me` accepts the field; `UserResponse` always emits it. Client wraps `updateMe` in a lenient `UserSchema` + `EMPTY_USER` fallback per CLAUDE.md API Response Compatibility. - UI — Settings → Account gains an "About you" textarea with live `n/2000` counter, `maxLength` guard, and a localized too-long error (EN + zh-Hans). - CLI — `multica user profile get` / `multica user profile update` with `--description / --description-stdin / --description-file / --clear`, mirroring the existing `issue comment add` input-mode menu. - Daemon injection — claim handler resolves the runtime owner and stamps `requesting_user_name` + `requesting_user_profile_description` on the task. `buildMetaSkillContent` emits `## Requesting User` between `## Agent Identity` and `## Available Commands`, blockquoted and framed as background context. The block is omitted entirely when the description is empty (no token cost when unused). Brief is written once per task via `CLAUDE.md` / `AGENTS.md`, not the per-turn prompt — same path the agent already reads for identity, so no extra per-turn cost. 
## Test plan - [x] `go build ./...`, `go vet ./...`, `go test ./internal/cli/ ./internal/daemon/ ./internal/daemon/execenv/ ./cmd/multica/` - [x] New brief tests: `TestBuildMetaSkillContentEmitsRequestingUser`, `TestBuildMetaSkillContentOmitsRequestingUserWhenEmpty` - [x] `pnpm typecheck`, `pnpm lint`, `pnpm test` (74 files, 644 tests pass) - [ ] Handler DB tests (`TestUpdateMe*`) require a migrated test DB — not runnable in this sandbox - [ ] Manual: open Settings → Account, set a description, confirm the next daemon-run agent's `CLAUDE.md` shows `## Requesting User`	2026-05-19 19:51:28 +02:00
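The 2000-rune server-side cap on `profile_description` counts Unicode code points, not bytes, which matters for the zh-Hans locale the commit mentions. A small sketch of such a check, assuming it is done with `utf8.RuneCountInString`: the function and constant names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode/utf8"
)

// maxProfileDescriptionRunes mirrors the 2000-rune cap from the commit.
const maxProfileDescriptionRunes = 2000

var errProfileTooLong = errors.New("profile_description exceeds 2000 runes")

// validateProfileDescription rejects input longer than the cap,
// measured in runes so multi-byte characters count once each.
func validateProfileDescription(s string) error {
	if utf8.RuneCountInString(s) > maxProfileDescriptionRunes {
		return errProfileTooLong
	}
	return nil
}

func main() {
	atCap := strings.Repeat("你", 2000) // 6000 bytes, but exactly 2000 runes
	fmt.Println(len(atCap), validateProfileDescription(atCap) == nil)
	fmt.Println(validateProfileDescription(atCap+"!") != nil)
}
```

A byte-length check (`len(s)`) would reject the first string even though it is within the documented limit.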
Joey Frasier (Boothe)	76cd8275ff	fix(openclaw): parse whole buffer instead of line-by-line scanner (MUL-1908) (#2292 ) * fix(openclaw): parse whole buffer instead of line-by-line scanner Follow-up to `c87d7676` (WOR-10). The stdout/stderr swap fixed the dominant case but `processOutput` still scanned line-by-line and only attempted a whole-buffer parse from a fragile fallback path. Pretty-printed JSON (openclaw 2026.5.x emits the result blob indented across many lines) made every individual line unparseable on its own — `{`, ` "payloads": [`, ` {`, etc. — so the success path hinged entirely on the fallback joining `rawLines` and re-trying. Under load (daemon restarts racing the close-on-cancel goroutine, partial chunked reads when stdout closes mid-flight) the line scanner could see truncated input that never reassembled into valid JSON, surfacing "openclaw returned no parseable output" against runs where the agent had in fact completed the work and posted comments. Roughly 30–40% of recent runs in v0.2.27 logs hit this path; multica still wrote a `task_failed` inbox row for each one even though the underlying issue had moved to `in_review` or `done`. The fix: - processOutput now reads the full stdout buffer with `io.ReadAll` first. - A new `parseWholeBufferOpenclawResult` helper attempts a single `json.Unmarshal` against the entire buffer (after trimming, and after optionally stripping leading non-JSON log lines). When it matches, we build the result and return — the line scanner never runs. - If the whole-buffer parse fails, we fall through to the existing NDJSON line-by-line scanner. This preserves streaming-event support (kept for forward compatibility and other backends) without leaving openclaw's dominant pretty-printed shape at the mercy of timing. - The failure path now emits a `(got N bytes; preview: ...)` suffix on the canonical "no parseable output" error so future debugging isn't blind. 
The exact canonical phrase is preserved for empty buffers so existing dashboards / log-grep tooling keep matching. Tests: - TestOpenclawProcessOutputWholeBufferPrettyJSON: feeds a hand-crafted multi-line indented blob (multiple payloads, nested agentMeta, usage map) and asserts every field round-trips through the whole-buffer fast path. - TestOpenclawProcessOutputDeeplyIndentedFixture: re-runs the recorded openclaw 2026.5.5 stdout fixture (1070 lines) directly through parseWholeBufferOpenclawResult, asserting the bug-shape parses cleanly on the first attempt without falling through to NDJSON scanning. - TestOpenclawProcessOutputEmptyBufferErrorIncludesByteCount: tightens the empty-buffer failure path, asserts the canonical phrase survives so observability tooling keeps working. All existing tests in the openclaw + buildOpenclawArgs suites stay green (streaming NDJSON event tests, lifecycle tests, structured-error tests, usage-field-variant tests). The two pre-existing flaky timeout-tight codex tests (TestCodexExecuteSemanticInactivityAllowsContinuous) fail on both this branch and on `c87d7676` baseline; they are unrelated and out of scope here. Co-authored-by: multica-agent <github@multica.ai> fix(openclaw): drop dead preview branch, document streaming regression Rebase + review-fix follow-up on top of f27df2d9b. processOutput's preview branch was unreachable: openclawNoParseableOutputError was only called from the `!gotEvents && trimmed == ""` path, which by construction means the entire scanned buffer collapsed to whitespace, so the `(got N bytes; preview: ...)` formatter could never fire on a non-empty buffer. Replace the helper with a single canonical-string constant (callsite is now inline) and update the test name to match what it actually asserts (the canonical empty-buffer error string is preserved for external log-grep / dashboard consumers). 
Also document on processOutput that the line-scanner path is no longer truly streaming after the io.ReadAll switch: events accumulate until stdout closes. OpenClaw 2026.5.x does not emit streaming events so this regression is invisible today, but flag it for the next backend that might. Misc: switch the scanner's input source from `strings.NewReader(string(buf))` to `bytes.NewReader(buf)` to drop one unnecessary byte/string round-trip. MUL-1908 Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai> Co-authored-by: J (Multica agent) <j@multica.local>	2026-05-19 17:42:41 +08:00
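The parse order described in the commit above (whole-buffer `json.Unmarshal` first, NDJSON line scanning only as a fallback) can be illustrated with the sketch below. The `result` struct is a deliberately tiny, hypothetical stand-in for the openclaw result blob, `parseOutput` is not the repository's `processOutput`, and keeping the last parseable line is an assumption; only the error string is taken from the commit message.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// result is a hypothetical, minimal shape for the result blob.
type result struct {
	Payloads []struct {
		Text string `json:"text"`
	} `json:"payloads"`
}

var errNoParseableOutput = errors.New("openclaw returned no parseable output")

// parseOutput tries the whole buffer first, then falls back to scanning
// line by line. It also reports which path matched.
func parseOutput(buf []byte) (result, string, error) {
	trimmed := bytes.TrimSpace(buf)
	if len(trimmed) == 0 {
		return result{}, "", errNoParseableOutput
	}
	// Fast path: pretty-printed JSON spans many lines but is one value.
	var whole result
	if err := json.Unmarshal(trimmed, &whole); err == nil {
		return whole, "whole-buffer", nil
	}
	// Fallback: NDJSON, one JSON value per line; keep the last good one.
	// Note: bufio.Scanner's default limit rejects lines over 64 KiB.
	var last result
	found := false
	sc := bufio.NewScanner(bytes.NewReader(buf))
	for sc.Scan() {
		var ev result
		if err := json.Unmarshal(bytes.TrimSpace(sc.Bytes()), &ev); err == nil {
			last, found = ev, true
		}
	}
	if !found {
		return result{}, "", errNoParseableOutput
	}
	return last, "ndjson", nil
}

func main() {
	pretty := []byte("{\n  \"payloads\": [\n    {\"text\": \"done\"}\n  ]\n}\n")
	r, path, err := parseOutput(pretty)
	fmt.Println(path, len(r.Payloads), err)
}
```

Every individual line of the pretty-printed input (`{`, `"payloads": [`, and so on) is invalid JSON on its own, which is why a line-first strategy depends on a fallback to succeed at all.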
Bohan Jiang	54368fd826	feat(projects): scheduled-only Gantt data source + WS reactivity (MUL-1881) (#2856 ) * feat(projects): scheduled-only Gantt data source + WS reactivity (MUL-1881) Project Gantt now fetches its own scheduled-only data instead of riding the Board/List pagination cache. The Unscheduled drawer and pagination warning banner are gone, and any WS-driven issue change (create / update / delete) invalidates the new cache so the timeline stays live. - Backend: `GET /api/issues?scheduled=true` adds an `(i.start_date IS NOT NULL OR i.due_date IS NOT NULL)` predicate on both ListIssues and CountIssues. New SQL filter is plumbed through sqlc + handler. - Frontend: new `projectGanttIssuesOptions(wsId, projectId)` issues a single fetch and lives under its own cache key. WS handlers and mutations invalidate the prefix on create/update/delete so the bar reacts to start_date / due_date changes from other tabs and from this tab without waiting on the WS round-trip. - GanttView: drops the Unscheduled section, the pagination warning banner, and the load-all button; renders only scheduled rows. - Removes now-dead `useLoadAllRemaining`, `myIssueListPaginationOptions`, `summarizeIssueListPagination`, and the gantt locale strings that supported the old plumbing. Co-authored-by: multica-agent <github@multica.ai> * fix(projects): page through Gantt fetch and isolate per-view data sources - Walk paginated `scheduled=true` issues until total is reached so projects with more than 500 scheduled bars no longer silently truncate. - Gantt mode disables the bucketed Board/List query and reads its own scheduled cache for the project empty-state check, so the page never short-circuits Gantt with a Board-derived "no issues" CTA. - `onIssueLabelsChanged` patches matching rows in the Project Gantt cache in-place, keeping label filters consistent after attach/detach from other tabs or agents. 
MUL-1881 Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-19 17:04:16 +08:00
Bohan Jiang	9a577f3e11	fix(runtimes): anchor OpenCode skill + AGENTS.md discovery to task workdir (MUL-2416) (#2849 ) * fix(runtimes): anchor OpenCode skill + AGENTS.md discovery to task workdir OpenCode resolves its project discovery root from `--dir` and `PWD` before falling back to `process.cwd()`. The daemon set `cmd.Dir = workDir` but never overrode the inherited `PWD`, so OpenCode walked from the daemon's shell directory and silently bypassed the per-task workdir — agents lost visibility into `.opencode/skills/` and `AGENTS.md`, falling back to whatever global skills the host had installed (MUL-2416). - Pass `opencode run --dir <workDir>` and override `PWD=<workDir>` in the child env so AGENTS.md walk-up + `.opencode/skills` project config scan both anchor on the task workdir. - Block `--dir` from custom args so user overrides cannot re-introduce the regression. - Plumb skill `description` from DB through service / daemon / execenv. `writeSkillFiles` synthesizes a YAML frontmatter block (`name`, optional `description`) when the stored content lacks one, since runtimes like OpenCode silently drop SKILL.md files without a parseable `name`. Existing frontmatter is preserved unchanged so upstream-imported skills (GitHub / ClawHub / Skills.sh) keep their hand-shaped metadata. Tests: - New fake-CLI test confirms argv carries `--dir <workDir>` and the child sees `PWD=<workDir>`. - New test confirms a user-supplied `--dir` in custom_args is dropped. - New execenv tests cover synthesized frontmatter and preservation of pre-existing frontmatter. Co-authored-by: multica-agent <github@multica.ai> * fix(runtimes): inject SKILL.md `name` when upstream frontmatter omits it Skills imported with frontmatter that sets `description` but leaves `name` implicit (relying on the directory slug, as common in GitHub/Skills.sh imports) still hit OpenCode's "no parseable name → drop" path because the DB Name fallback never made it into the SKILL.md body. 
ensureSkillFrontmatter now scans the existing block and, when name is missing or empty, prepends `name: <slug>` while preserving description, body, and any runtime-specific keys verbatim. Also tighten yamlEscapeInline to always double-quote so descriptions that look like YAML keywords (`null`, `true`, `[foo]`, `{x: y}`, `2024-01-01`) parse as strings rather than getting reinterpreted and rejected. Adds regression test for the nameless-frontmatter case and updates the existing OpenCode skill test for the always-quoted description format. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-19 16:21:02 +08:00
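To illustrate the frontmatter synthesis and the always-double-quote rule from the commit above, here is a rough sketch. It covers only the case where the stored content has no frontmatter at all; the name-injection path for existing frontmatter is not shown. `yamlQuote` and `withFrontmatter` are hypothetical names, the escaping handles only backslash, double quote, and newline, and quoting `name` as well as `description` is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// yamlQuote always emits a double-quoted YAML scalar so values such as
// null, true, or 2024-01-01 are read back as plain strings.
func yamlQuote(s string) string {
	r := strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)
	return `"` + r.Replace(s) + `"`
}

// withFrontmatter prepends a synthesized block when content lacks one
// and returns content unchanged when a block is already present.
func withFrontmatter(name, description, content string) string {
	if strings.HasPrefix(content, "---\n") {
		return content // preserve hand-written metadata verbatim
	}
	var b strings.Builder
	b.WriteString("---\n")
	b.WriteString("name: " + yamlQuote(name) + "\n")
	if description != "" {
		b.WriteString("description: " + yamlQuote(description) + "\n")
	}
	b.WriteString("---\n\n")
	b.WriteString(content)
	return b.String()
}

func main() {
	fmt.Print(withFrontmatter("demo-skill", "null", "# Demo\n"))
}
```

Without the quoting, a description of `null` would be parsed by a YAML reader as a null value rather than the four-character string, which is the reinterpretation the commit guards against.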
Naiyuan Qing	93153d08b7	feat(my-issues): cover squad assignees via involves_user_id (MUL-2397) (#2829 ) Re-introduces the `involves_user_id` filter on the issues list / open-list / count / grouped paths, but with the semantics nailed down for the second time around: tab 3 surfaces issues whose assignee is an indirect extension of the user (owned agent, or a squad they're a human member of / lead via owned agent / have an owned agent inside) — and explicitly NOT direct member assignment, which is tab 1's meaning. - server/pkg/db/queries/issue.sql: 4-branch filter on ListIssues / ListOpenIssues / CountIssues. Each subquery clamps workspace_id because issue.assignee_id is polymorphic with no FK. Leader resolution reads squad.leader_id directly, not the squad_member copy row (squad.go ignores errors when seeding that copy, so it can be missing). FindActiveDuplicateIssue switched from positional $2/$3/$4 to named sqlc.arg() — pure hygiene so the generated struct field names don't drift when new nargs are added. - server/internal/handler/issue.go: parse involves_user_id and plumb it into the three sqlc params; ListGroupedIssues (hand-written dynamic SQL) gets a mirrored 4-branch fragment, no shortcut. - packages/core: ListIssuesParams / ListGroupedIssuesParams / MyIssuesFilter / api.listIssues / api.listGroupedIssues all carry the new param through. - packages/views/my-issues: tab 3 switches from client-side agent-fanout to involves_user_id=user.id. agentListOptions import and the myAgentIds memo go away. - server/internal/handler/issue_involves_test.go: 13 integration tests cover every branch (positive + cross-workspace negatives) plus the critical ExcludesDirectMemberAssignee negative on BOTH the sqlc and the grouped paths, locking tab 3 ∩ tab 1 = ∅. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: multica-agent <github@multica.ai>	2026-05-19 10:37:38 +08:00
Naiyuan Qing	5476e7678d	Revert "feat(my-issues): cover squad assignees via involves_user_id (MUL-2364…" (#2828) This reverts commit `3c510c31ed`.	2026-05-19 09:31:43 +08:00
Naiyuan Qing	3c510c31ed	feat(my-issues): cover squad assignees via involves_user_id (MUL-2364) (#2801 ) * feat(my-issues): cover squad assignees via involves_user_id (MUL-2364) The "My Agents" tab on /my-issues only resolved agents owned by the caller, so issues assigned to squads (member, leader, or agent-member of mine) never surfaced. This added a UNION-based involves_user_id filter that the backend expands to "me + agents I own + squads I relate to" in a single query. - SQL: ListIssues / ListOpenIssues / CountIssues accept narg involves_user_id and OR a workspace-scoped 3-branch UNION on the squad assignee subquery. Leader is sourced from canonical squad.leader_id (not the best-effort squad_member copy row whose AddSquadMember error is dropped in squad.go:177-188 and :259-263). - Handler: parses involves_user_id via parseUUIDOrBadRequest, plumbs into all three list params, and mirrors the same UNION fragment into the grouped dynamic SQL path. - Frontend: ListIssuesParams / ListGroupedIssuesParams / MyIssuesFilter gain involves_user_id; api client forwards it to the querystring. - My Issues page: "agents" scope now passes involves_user_id instead of fanning out owned-agent IDs client-side. Tab label widens to "我的智能体 / 小队" / "My Agents / Squads". - Tests: Go suite covers all three squad relations including the canonical-leader-without-squad_member-copy variant, cross-workspace isolation for agent / leader / squad_member branches, combination with creator_id, and the malformed-UUID 400 path. Client test pins the involves_user_id querystring wiring for both list endpoints. The FindActiveDuplicateIssue query gets explicit sqlc.arg() names so sqlc regeneration keeps the existing struct field names regardless of the local sqlc version (no behavior change). 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: multica-agent <github@multica.ai> * test(my-issues): tighten cross-workspace negatives for involves_user_id UNION Cross-workspace negative tests previously put both the foreign actor and the foreign issue in the foreign workspace, so the outer i.workspace_id = $1 already excluded the row before the UNION branches were exercised. Stripping a.workspace_id = $1 / s.workspace_id = $1 from any of the UNION subqueries would not have failed the tests. Rewrite the three existing negative cases to seed the issue in testWorkspaceID with a polymorphic assignee_id pointing at a foreign-workspace agent or squad (issue.assignee_id has no FK per migrations/001_init.up.sql:61). Now each UNION branch must enforce its own workspace scoping for the issue to stay out of the result. Also add ExcludesOtherWorkspaceSquadAgentMember: the squad_member.agent UNION branch had only positive coverage; this test pins that s.workspace_id = $1 and a.workspace_id = $1 must both hold there too. Verified by mutation: stripping the workspace clause from each branch makes the corresponding test fail. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: multica-agent <github@multica.ai>	2026-05-19 09:01:51 +08:00
Bohan Jiang	6f5fbb7813	feat(comments): thread-aware list with composite cursor (MUL-2340) (#2787 ) * feat(comments): thread-aware list with composite cursor (MUL-2340) Adds three optional query params to GET /api/issues/{id}/comments and the matching `multica issue comment list` flags: - `thread=<comment-uuid>` resolves the anchor to the thread root via a recursive CTE (defends against any future nested replies) and returns root + all descendants chronologically. Anchor can be any comment in the thread, root or reply. - `recent=<N>` returns the newest N comments for the issue, ordered chronologically in the response. - `before=<RFC3339>` + `before-id=<uuid>` form a composite cursor for stable pagination of `recent`. Both must be set together; a timestamp-only cursor is rejected because ties on `created_at` would let the existing `(created_at ASC, id ASC)` total order skip or duplicate rows across pages. Flag combination rules: `thread` is exclusive with `recent` and the cursor; both may combine with `since`. Server and CLI enforce the same matrix; the CLI fails fast locally so callers don't pay for a 400 round-trip. Default behaviour (no params) is unchanged — full chronological dump capped at commentHardCap — so the desktop UI and existing `--since` polling are untouched. Agent prompt updates land in a follow-up PR so the new CLI capabilities ship and bake first. Co-authored-by: multica-agent <github@multica.ai> * fix(comments): reject cursor without recent and align CLI/server on invalid --recent (MUL-2340) Elon's PR #2787 second review flagged two gaps in the flag combination matrix: - server: GET /comments?before=...&before_id=... without `recent` was silently dropped by fetchCommentsForList (RecentN=0 fell through to the default / since path), so callers got the full timeline instead of the documented "before X" semantics. Now returns 400. 
- CLI: --recent 0 / --recent -3 were collapsed with "flag not passed" by `recent > 0`, so an explicit invalid value silently fell back to the default list. Switched to Flags().Changed("recent") so explicit non-positive values fail loudly. Also enforces that --before / --before-id only appear with explicit --recent (mirrors the new server-side rule). Tests: - server flag matrix gains `before + before_id without recent → 400`. - CLI gains TestRunIssueCommentListFlagGuards covering `--recent 0`, `--recent -3`, cursor-without-recent, and the thread/recent exclusivity path under the new Changed()-based check. The mock server fatals if a request reaches /comments, proving the guards fire before any HTTP round-trip. Co-authored-by: multica-agent <github@multica.ai> * feat(comments): make `recent` thread-grouped with a thread cursor (MUL-2340) Bohan pushed back on the row-based `recent=N` shape: comments form a tree, not a list, and the newest N rows can come from N unrelated threads, giving the agent N disjoint conversational tails. Replace the row-based query with a thread-grouped one before #2787 merges so we never ship the wrong shape: - `recent=N` now returns the N most recently active threads (root + every descendant per thread). A thread's recency is MAX(created_at) across its whole subtree, so a stale-but-recently-replied thread outranks an old quiet one — exactly the property row-recent loses. - The cursor is now a thread cursor: `before` = a thread's last_activity_at, `before_id` = its root comment id. The pair walks threads strictly less recent than the page's oldest-active thread. The cursor surfaces via `X-Multica-Next-Before` / `X-Multica-Next-Before-Id` response headers (empty when there are no older threads); the CLI forwards the same pair to stderr after listing. - Row-based `recent` is gone — there is no internal caller and the prompt update has not shipped yet, so there is no compat surface to preserve. 
- Response body shape unchanged (flat JSON array, chronological). Default and `--since` paths untouched. Desktop UI keeps working. Tests: - recent=1 returns the freshest-active thread fully; recent=2 returns both with the older-active thread first (oldest-active → freshest tail). - Stale-but-fresh: a thread whose root is older but has a fresh reply outranks a thread whose root is newer but quiet. - Cursor headers emitted only on full pages; empty on the final page. - Pagination walks threads root2 → root1 → empty, no skips/duplicates. - Tie-break: three threads sharing last_activity_at paginate one-at-a-time using (last_activity_at, root_id) ordering — verifies the timestamp-only cursor failure mode is fixed for the thread case too. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-18 19:28:26 +08:00
Bohan Jiang	e8d4b9a0a2	revert: drop exec_command watchdog (#2779, #2786) (MUL-2337) (#2803) * Revert "fix(codex): bump default exec_command stuck timeout to 3 minutes (#2786)" This reverts commit `433cd1aaf5`. Co-authored-by: multica-agent <github@multica.ai> * Revert "feat(codex): add per-exec_command watchdog to escape dropped function_call_output (MUL-2337) (#2779)" This reverts commit `60bae62622`. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-18 18:08:07 +08:00
AdamQQQ	fab0671332	feat(skills): support multi-select bulk import in Copy from runtime (#2686 ) - Multi-select UI for batch importing skills from a local runtime - Server batch-dispatches up to 10 import requests per heartbeat cycle - WS heartbeat now reads supports_batch_import from daemon payload instead of hardcoding true, so old daemons correctly fall back to one-at-a-time dispatch - Raised server pending timeout to 3min and client poll timeout to 4min to accommodate daemons that pop only one import per 15s heartbeat Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-18 16:56:27 +08:00
Jiayuan Zhang	46c1e2c889	feat(squads): show member working status on squad detail page (#2768 ) * feat(squads): show member working status on squad detail page Add a new GET /api/squads/{id}/members/status endpoint that returns each member's derived working/idle/offline/unstable status, the issues each agent is currently running, and the last observed activity timestamp. The Squad detail page's Members tab consumes this snapshot to render a status pill and an active-issue link next to each agent, with live refresh wired through the existing task/agent/daemon WS events. Human members are returned with status=null so the UI can keep them in the same list without implying a presence signal. Archived agents stay in the response and surface as offline rather than being filtered out. Co-authored-by: multica-agent <github@multica.ai> * fix(squads): address review feedback on member status endpoint - i18n the "blocked" issue-status pill in squad members tab (was a bare literal that failed `i18next/no-literal-string` lint). - Treat any dispatched/running task as working, even when its `agent_task_queue.issue_id` is NULL (chat / quick-create tasks). The agent slot is occupied regardless of whether we can render an issue link. - Force `offline` for archived agents so they appear in the list but never look like they're still on duty, matching the RFC decision in MUL-2319. - Include `workspaceKeys.squads` in the post-reconnect / workspace-switch bulk invalidation so members-status recovers after a disconnect during which task/runtime events were missed. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-18 10:35:18 +02:00
Bohan Jiang	433cd1aaf5	fix(codex): bump default exec_command stuck timeout to 3 minutes (#2786) The watchdog fires on a "no progress" window, so the default mainly matters for commands that go fully silent (no outputDelta). Bumping from 2m → 3m leaves more headroom for legitimately slow silent commands before treating them as a dropped function_call_output, at a modest cost to recovery latency. MUL-2337 Co-authored-by: multica-agent <github@multica.ai>	2026-05-18 15:30:05 +08:00
Bohan Jiang	60bae62622	feat(codex): add per-exec_command watchdog to escape dropped function_call_output (MUL-2337) (#2779 ) * feat(codex): add per-exec_command watchdog to escape dropped function_call_output (MUL-2337) Codex app-server can drop the second function_call_output when two exec_command calls fan out in the same turn and both async-yield through the yield_time_ms boundary (observed 2026-05-18, MUL-2334 — Trump Agent wedged for 6+ min with no semantic activity events to drive any existing timer). The model then waits forever for the missing output; only the 10-minute semantic inactivity timeout would eventually rescue the run. Add a per-call watchdog in the codex client that tracks open exec_command / commandExecution items by call_id and fails the turn quickly (default 2 min, configurable via ExecOptions.ExecCommandStuckTimeout) when one stays open without progress. outputDelta events reset the per-call progress timestamp so long-running streaming commands aren't flagged. This is a daemon-side mitigation only — codex itself still has the upstream race, but the daemon no longer burns the full inactivity budget before the run is marked failed and a new run can recover. Co-authored-by: multica-agent <github@multica.ai> * feat(codex): track legacy exec_command_output_delta in watchdog (MUL-2337) Mirrors the raw v2 item/commandExecution/outputDelta refresh on the legacy codex/event protocol so a long-running streaming exec doesn't get falsely flagged as stuck after begin + 2 min. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-18 15:14:45 +08:00
Bohan Jiang	2323b72710	feat(autopilots): webhook delivery layer + idempotency/signature/replay (MUL-2334) [PR1] (#2774 ) * feat(autopilots): webhook delivery layer + idempotency / signature / replay (MUL-2334) Splits "inbound webhook receipt" from "autopilot run creation" so we can record duplicate attempts, signature outcomes, and ignored/skipped deliveries — and replay a delivery on demand. v1 ingress wrote straight into autopilot_run.trigger_payload, which collapsed the two concerns and left run_only autopilots vulnerable to provider retry storms. Backend only (PR1). UI Deliveries tab follows in PR2. Schema (migration 093): - autopilot_trigger.provider: 'generic' \| 'github' (default 'generic'). - autopilot_trigger.signing_secret: nullable plaintext (HMAC needs it cleartext; mirrors how webhook_token is stored). - webhook_delivery: one row per inbound POST. Carries raw_body, selected_headers, dedupe_key/source, signature_status, autopilot_run_id, replayed_from_delivery_id, response_status / body. - Partial unique index on (trigger_id, dedupe_key) excludes NULL and 'rejected' rows, so a wrong-secret 401 does NOT permanently block a future retry with the same X-GitHub-Delivery once the operator fixes the secret. Ingress flow (autopilot_webhook.go), persist-first + sync dispatch: 1. IP rate limit -> 2. token lookup -> 3. token rate limit -> 4. read raw body -> 5. autopilot/workspace cross-check -> 6. normalize JSON (400 without persistence on parse failure) -> 7. compute dedupe key + signature status -> 8. INSERT delivery (status=queued). On (trigger_id, dedupe_key) unique-violation: bump attempt_count on existing row and return the original delivery_id + autopilot_run_id with 200 -> 9. invalid/missing signature: UPDATE -> rejected, return 401 with delivery_id (no dispatch, not replayable) -> 10. trigger disabled / autopilot paused/archived: UPDATE -> ignored, return 200 -> 11. 
DispatchAutopilot synchronously, UPDATE -> dispatched/skipped/failed with autopilot_run_id and the response body we returned -> 12. TouchAutopilotTriggerFiredAt and return 200. No new long-running worker. A stale 'queued' row only happens if the process dies between INSERT and UPDATE; that's a follow-up sweeper, not this PR. Authenticated API: - GET /api/autopilots/{id}/deliveries (slim list) - GET /api/autopilots/{id}/deliveries/{deliveryId} (with raw_body) - POST /api/autopilots/{id}/deliveries/{deliveryId}/replay -> creates a new delivery row (replayed_from_delivery_id set), dispatches a new run, never collapses onto the original via dedupe. - PUT /api/autopilots/{id}/triggers/{triggerId}/signing-secret Write-only; trigger response surfaces has_signing_secret + signing_secret_hint (last 4 chars), never the secret itself. Signature verification reuses the GitHub-compatible X-Hub-Signature-256: sha256=<hex(hmac(body, secret))> scheme; the HMAC helper is constant-time. Invalid/missing signatures still count against per-IP and per-token rate limits. autopilot_run.trigger_payload is intentionally preserved — delivery records the HTTP receipt; run records the normalized envelope handed to the agent. They are two different views. 
Tests (Postgres-backed): - delivery persistence on accept - dedupe via Idempotency-Key and X-GitHub-Delivery; run_only retry storm pin (3 retries -> 1 run) - invalid signature: 401 + rejected row + no run linkage - missing signature when secret configured: 401 + 'missing' state - valid signature dispatches - signing secret never echoed in trigger responses; hint shows last 4 - min-length and clear-by-empty for signing secret PUT - replay creates a NEW delivery + new run; rejected deliveries cannot be replayed - list omits raw_body; detail includes it; cross-autopilot ID returns 404 (workspace isolation defense in depth) - provider validation: unknown -> 400, github -> 201 round-trips - bad-signature stream still counts against per-token rate limit Co-authored-by: multica-agent <github@multica.ai> * fix(autopilots): address PR review on webhook delivery layer (MUL-2334) - Exclude `failed` from the (trigger_id, dedupe_key) partial unique index alongside `rejected`, so a transient ingress failure does not strand the provider's stable X-GitHub-Delivery / Idempotency-Key retry. Update the dedupe lookup to prefer non-terminal rows under the same predicate. - Tighten delivery status enum: drop `skipped` from the CHECK constraint and from the handler. A run that was admission-skipped (e.g. runtime offline) is now recorded as delivery=`dispatched` linked to the skipped run, with the response payload carrying status=`skipped`. Source of truth for skipped-ness is autopilot_run.status, not the delivery row — keeps the Deliveries UI enum unambiguous. - On dispatch error, link the (possibly non-nil) autopilot_run returned by DispatchAutopilot to the failed delivery so Deliveries UI can navigate to the run row for debugging. - Slim list projection: ListWebhookDeliveriesByAutopilot no longer pulls raw_body / selected_headers / response_body — a 100-row page × 256 KiB would otherwise round-trip ~25 MiB from Postgres per Deliveries reload. 
Detail endpoint continues to return the full row. - Fix backend CI: TestGetDelivery_ReturnsFullPayload now decodes the response and asserts on the parsed raw_body instead of substring- matching against an escaped JSON string; raise the test-suite default webhook rate limits in TestMain so the shared 192.0.2.1 IP bucket doesn't fill across the suite and leak 429s into unrelated tests. - Add regression coverage for the dedupe-after-failure path. cd server && go test ./... is green locally. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-18 14:59:40 +08:00
Kerim Incedayi	9418d2a2c1	feat(autopilots): webhook triggers (server + CLI + UI + docs) MUL-2049 (#2348 ) * feat(server): add webhook trigger DB migration + sqlc queries Lays the foundation for webhook autopilot triggers: - partial unique index on autopilot_trigger.webhook_token (kind=webhook only) so the public ingress route can resolve a trigger in O(1) - GetWebhookTriggerByToken / TouchAutopilotTriggerFiredAt / RotateAutopilotTriggerWebhookToken / SetAutopilotTriggerWebhookToken queries, regenerated with sqlc * feat(server): webhook token generator + payload normalizer Two pure helpers for the webhook autopilot work: - generateWebhookToken: 32 random bytes -> base64-url, "awt_" prefix. 256 bits of entropy keeps brute-force off the table; the prefix makes leaked tokens recognisable in logs. - normalizeWebhookPayload: turns arbitrary JSON into the WebhookEnvelope shape (event/eventPayload/request) used by trigger_payload. Header- and body-based event inference covers GitHub, GitLab, X-Event-Type, and caller-provided envelopes; scalar/empty/invalid bodies are rejected so the handler can answer 400. * feat(server): generate webhook tokens and expose rotate endpoint - New handler.Config.PublicURL fed by MULTICA_PUBLIC_URL env so /api/autopilots/.../triggers responses can include an absolute webhook_url alongside the always-present webhook_path. - CreateAutopilotTrigger now mints a webhook_token via crypto/rand for kind=webhook and ignores cron/timezone for non-schedule kinds. api triggers stay accepted-but-inert per PLAN.md. - New POST /api/autopilots/{id}/triggers/{triggerId}/rotate-webhook-token protected by the existing workspace auth group; old tokens stop working immediately because the unique-index lookup keys on the current row value. * feat(server): public webhook ingress route + per-token rate limiter - New POST /api/webhooks/autopilots/{token} route, mounted outside the authenticated group: the path token is the credential. 
Workspace context is derived from the joined autopilot row, never headers. - Body capped at 256 KiB via http.MaxBytesReader; oversized payloads return 413 mid-read instead of being fully buffered. - Disabled triggers / paused / archived autopilots return 200 {"status":"ignored"} so providers stop retrying. - Skipped-runtime dispatches surface 200 {"status":"skipped"} with the reason from the autopilot service's pre-flight admission check. - WebhookRateLimiter interface with sliding-window in-memory + Redis Lua-script implementations. Default 60 req/min per token. Test coverage on the in-memory path; Redis variant fails open on cache errors so a Redis hiccup never blocks ingress. - Integration tests exercise token generation, dispatch, payload envelope persistence, GitHub-header inference, paused/disabled short-circuits, oversized rejection, and rotate-then-old-token-404. * feat(server): include webhook payload in create_issue description When an autopilot run is triggered by a webhook and execution_mode is create_issue, the agent only sees the issue body — never the run's trigger_payload. Append a 'Webhook event:' line and a fenced JSON block with the normalized eventPayload so the agent has the inbound context inline. Schedule / manual runs are unchanged. Tests cover: - schedule path keeps existing italic note, no webhook block - webhook path emits event line + payload block, italic before block - non-envelope JSON falls back to raw body (defensive) - non-webhook source with payload still gets no webhook block * feat(core): types, API client and mutations for webhook triggers - AutopilotRunStatus gains 'skipped' so the run-list UI handles the admission-skipped state explicitly instead of falling through to a generic case (the backend already emits it via MUL-1899). - AutopilotTrigger picks up optional webhook_path / webhook_url. Both are optional so older self-hosted servers that pre-date this change still parse cleanly. 
- buildAutopilotWebhookUrl helper composes a usable absolute URL with the priority webhook_url > apiBaseUrl + path > origin + path > path. Tested with seven cases covering each branch. - ApiClient.rotateAutopilotTriggerWebhookToken posts to /api/autopilots/{id}/triggers/{triggerId}/rotate-webhook-token; the HTTP-contract test pins URL + method. - useRotateAutopilotTriggerWebhookToken mutation invalidates autopilotKeys.detail on settle, mirroring the existing trigger-mutation pattern. * feat(views): webhook trigger UI in Add Trigger dialog and trigger row Add Trigger dialog gains a Schedule/Webhook segmented toggle: - Schedule reuses TriggerConfigSection unchanged. - Webhook hides the cron config and shows a help line; the trigger is created with kind=webhook and the URL is generated server-side. - Toast text differentiates schedule vs webhook on success. TriggerRow grows a webhook branch: - Webhook icon, kind translated via trigger_kind. - URL shown in a truncating monospace pill, with copy + rotate buttons. Copy uses navigator.clipboard with toast feedback; rotate uses an AlertDialog confirm because the old URL stops working immediately. - api triggers render a Deprecated badge and skip URL/copy/rotate affordances. RunRow gains a 'skipped' RUN_VISUAL entry (muted dash) so admission- skipped runs don't fall through to a generic case. Source label uses the new run_source i18n key instead of capitalize. Locales: en + zh-Hans gain run_status.skipped, run_source., trigger_kind., trigger_row.{copy_url,rotate_url,_confirm_,toast_}, add_trigger_dialog.{type_,webhook_help,toast_added_{schedule,webhook}}. * feat(cli): support webhook trigger creation and URL rotation - multica autopilot trigger-add now takes --kind schedule\|webhook (default schedule for backward compatibility). For webhook it skips --cron / --timezone validation and prints the resulting webhook URL, preferring the server-provided webhook_url and falling back to client.BaseURL + webhook_path. 
- New multica autopilot trigger-rotate-url <autopilot-id> <trigger-id> command for rotating the bearer URL of a webhook trigger. * docs(autopilots): add webhook trigger guide (en + zh) Replaces the 'Webhook and API triggers are not available yet' section with end-to-end webhook documentation: how the URL is generated, what payload shapes are accepted, the inferred-event rules, the bearer-secret warning + rotate flow, status-code semantics for accepted/skipped/ ignored/4xx/5xx outcomes, and the MULTICA_PUBLIC_URL self-host configuration. Run history list now mentions skipped status. The 'unavailable features' section narrows to api-kind triggers, HMAC signing, IP allowlists, and provider presets. * feat(views): add Schedule/Webhook toggle to the create autopilot dialog Closes the gap where a brand-new autopilot could only be created with a schedule trigger. The right-column config now has a Trigger section with a segmented Schedule/Webhook control: - Schedule keeps the existing cron/timezone UI. - Webhook hides the cron UI and shows a help line; on submit, a kind=webhook trigger is created right after the autopilot. In edit mode the toggle is intentionally hidden (PLAN.md treats trigger- type changes as delete-old + create-new, not in-place updates), but the panel still picks the right kind based on props.triggers[0].kind so a webhook autopilot doesn't render an irrelevant cron form. Locales: section_trigger_kind, trigger_kind_{schedule,webhook}, section_webhook, webhook_help_{create,edit} added in en + zh-Hans. * feat(views): show webhook URL inline after creating a webhook autopilot After a successful create with kind=webhook, the dialog stays open and swaps to a confirmation panel showing the freshly minted URL with a copy button + 'Treat this URL like a password' warning + Done button. Avoids the friction of "create the autopilot, then go find it in the list, click in, scroll to triggers, copy URL." 
Locales: dialog.webhook_created_{title,description,warning,done} added in en + zh-Hans. Schedule create flow is unchanged (toast + close). The success panel is gated on the trigger returned from the create mutation, so a partial failure (autopilot created, trigger creation errored) still falls through to the toast_create_partial path. * feat(views): show webhook payload in run detail dialog The agent transcript dialog now accepts an optional headerSlot that sits above the event list. The autopilot RunRow drops a WebhookPayloadPreview into that slot when the run came from a webhook and trigger_payload is non-empty. The preview is collapsed by default (the transcript itself is the main event), shows the inferred event name + receivedAt in the header, and reveals the eventPayload as pretty-printed JSON with a copy button on expand. Falls back gracefully if the row's trigger_payload doesn't match the WebhookEnvelope shape — the whole value is shown instead so nothing is hidden. Closes the "agent didn't echo the payload, now I can't see what triggered the run" gap. PLAN.md tracked this as "Payload preview in run history" under follow-ups. Locales: webhook_payload.{label, unknown_event, payload, content_type, copy, copied, copied_short, copy_failed} added in en + zh-Hans. * chore(server): wire MULTICA_PUBLIC_URL through self-host compose Two small follow-ups split out of the webhook trigger PR: - docker-compose.selfhost.yml passes MULTICA_PUBLIC_URL into the backend container so a self-hosted deployment behind a real domain gets absolute webhook URLs in the trigger response. Documented in .env.example with the rationale for not deriving the public host from request headers. - Drop a duplicated 'invalid json:' prefix in the webhook ingress 400 error path. normalizeWebhookPayload already prefixes its errors, so the handler doesn't need to re-prefix. 
* fix(migrations): renumber webhook trigger migration 081 → 089 to avoid collision The branch's 081_autopilot_webhook_triggers.{up,down}.sql collided numerically with 081_runtime_timezone.{up,down}.sql that landed on main, making migration apply order undefined. Renumber to 089 so the file slots after the latest main migration (088_squad_instructions). The SQL itself doesn't conflict — it only creates a partial unique index on autopilot_trigger.webhook_token — but the duplicate prefix is what the migration runner sees, so the filename must move. * fix(autopilot-webhook): address PR review blocking issues - Redact bearer tokens from request logs: paths matching /api/webhooks/autopilots/<token> now log "[redacted]" instead of the token. The resolved trigger ID is plumbed via context so audit lines stay useful for debugging. (Review item Blocking #1.) - Distinguish pgx.ErrNoRows from transient DB errors in token lookup: no-row stays 404 (so providers don't retry on a deleted webhook), other errors return 500 (which providers DO retry, avoiding silent drops on DB blips). (Review item Blocking #2.) - Add per-IP sliding-window rate limiter that runs BEFORE the token lookup, so spraying random tokens can no longer probe the autopilot_trigger index unboundedly. Reuses the existing Lua script with a separate Redis key namespace; falls open on Redis errors. Default budget 30 req/min/IP. (Review item Blocking #3.) The webhook handler now applies the gates in the order: per-IP rate limit → token lookup → per-token rate limit → handler logic. * fix(autopilot): atomic webhook trigger creation + strict kind/timezone validation - Mint the webhook bearer token BEFORE the INSERT and pass it via CreateAutopilotTriggerParams so the row never exists in a half-written kind=webhook + webhook_token=NULL state. On the (vanishingly rare) unique-index collision the whole INSERT is retried with a fresh token — no UPDATE second step. Removes the now-dead attachFreshWebhookToken helper. 
(Review item Recommended #4.) - Add new GET /api/autopilots/{id}/runs/{runId} endpoint that returns a single run including the full trigger_payload. The list response is now slim (omits trigger_payload) so worst-case payload size drops from ~5 MB to ~5 KB. (Review item Recommended #5, server side.) - Reject kind=api with 400 ("kind=api is deprecated; use schedule or webhook") and reject kind=webhook with --timezone with 400; both surface stragglers loudly instead of silently dropping fields. CLI mirrors the check so --timezone with --kind webhook errors client-side. (Review nits.) - Add --yes (-y) flag and an interactive y/N confirmation prompt to `multica autopilot trigger-rotate-url` so the destructive rotate matches the UI's AlertDialog safety. (Review item Recommended #6.) * fix(views): fetch webhook payload on-demand and truncate at 4 KiB - Add useAutopilotRun query hook + getAutopilotRun API client method paired with the new server endpoint. The run-detail dialog now mounts a WebhookPayloadSlot that fetches the full run (incl. trigger_payload) lazily; list responses no longer carry up to 256 KiB × N runs of envelope data. - WebhookPayloadPreview truncates its in-DOM <pre> at 4 KiB with a localized marker so janky machines aren't asked to render a 256 KiB JSON blob. The Copy button still yields the full string. - Adds the truncated_marker i18n string to en + zh-Hans. Review items Recommended #5 (frontend) and a nit on the preview's unbounded <pre>. * test(autopilot-webhook): close coverage gaps flagged in PR review - request_logger: redactWebhookPath unit tests + integration test proving the bearer token never lands in slog output, plus the webhook_trigger_id context plumbing. - autopilot_webhook_handler: empty body → 400, archived autopilot → 200 ignored, per-IP rate limiter trips before DB lookup, kind=api and webhook+timezone are rejected at 400, slim list + full detail endpoint round-trip.
- webhook_rate_limiter: Lua script structure guard (catches reordering even without a live Redis), plus live-Redis tests for both per-token and per-IP limiters (REDIS_TEST_URL gated, matching the existing Redis test pattern in the package). - WebhookPayloadPreview: envelope rendering, fallback shape, and the >4 KiB truncation path with full-payload-on-Copy guarantee. Two branches are documented as code-review-protected rather than covered by tests: the 500-on-DB-error path requires injecting a stub Queries (no interface here), and the cross-workspace defense-in-depth check is unreachable from valid SQL state. * fix(middleware): SetWebhookTriggerID must mutate request in place The round-1 helper returned a fresh http.Request from WithContext, and the webhook handler did `r = SetWebhookTriggerID(r, ...)`. That swaps the handler's local pointer but doesn't propagate the new context back to RequestLogger, which is still holding the original http.Request, so the audit line never actually included webhook_trigger_id in production. The round-1 test happened to pass because it pre-stashed the value on the request before calling ServeHTTP, bypassing the bug it was meant to verify. Switch to in-place mutation via `*r = *r.WithContext(...)` so the wrapping middleware sees the new context after next.ServeHTTP returns, and update the test to exercise the real call pattern (set the context from inside the handler, assert the surrounding logger reads it). Verified live: an accepted webhook now logs path=/api/webhooks/autopilots/[redacted] webhook_trigger_id=<uuid> * fix(autopilot-webhook): symmetric ErrNoRows split + trusted-proxy gate Round-2 review (Bohan-J, PR #2348 follow-up): - Must-fix #1: the second lookup at autopilot_webhook.go:258 (GetAutopilot after the token resolves) was folding every error into 404. A transient DB blip would tell a webhook sender "not found" and it would never retry.
Apply the same errors.Is(err, pgx.ErrNoRows) → 404 / else → 500 split as the first lookup got in round 1. - Must-fix #2: clientIPForRateLimit was honoring X-Forwarded-For / X-Real-IP from any caller. An attacker spraying random tokens could just rotate the XFF header and the per-IP bucket became per-request, so the limiter that's specifically supposed to gate spraying before it hits the DB unique index was bypassed. New shape — matches Bohan's suggestion exactly: * Default: r.RemoteAddr only, headers ignored. * Operator opt-in via MULTICA_TRUSTED_PROXIES (comma-separated CIDRs). XFF/X-Real-IP are honored only when r.RemoteAddr is inside one of the listed prefixes; otherwise they're dropped. Wired through .env.example and docker-compose.selfhost.yml so self-host operators can configure their reverse-proxy's CIDR. Invalid CIDRs in the env var are dropped with a single slog.Warn at startup rather than crashing the server. Uses net/netip (stdlib, value-typed) for parsing and containment checks. Verified live on the rebuilt self-host backend: a 35-request spray from one source with rotating XFF gets the expected 30× 404 + 5× 429, proving the per-IP bucket is keyed on the real connection IP. * fix(autopilot): reject cron/timezone PATCH on non-schedule triggers Round-2 review should-fix. CreateAutopilotTrigger already 400s on kind=webhook + timezone/cron_expression, but UpdateAutopilotTrigger silently wrote those fields regardless of prev.Kind. The values then sat in the DB visible to nobody and read by nothing — a back door that left the API contract fuzzy across create vs update. Mirror the create-path discipline: after loading prev, if prev.Kind != "schedule" and the PATCH body sets cron_expression or timezone, return 400 with a clear message. enabled and label remain accepted on every kind. 
The existing prev.Kind == "schedule" guard on next_run_at recompute stays as belt-and-braces, but with this gate in place the recompute branch is now reachable only for the kind it was meant for. * test(autopilot-webhook): close round-2 coverage gaps - IPRateLimitNotBypassedByXFFSpoof: drives the must-fix #2 invariant by rotating XFF across three calls from the same RemoteAddr and asserting the third gets 429. Pre-round-2 this test would have passed for the wrong reason (limiter trusted XFF, so per-bucket collision was incidental); now it pins the bypass-closed property. - IPRateLimitReturns429BeforeDBLookup: updated to set RemoteAddr explicitly and drop the XFF header it was leaning on. With TrustedProxies empty (test default) the limiter keys on the real connection IP, which is what the test wants to assert anyway. - UpdateAutopilotTrigger_RejectsCronExpressionOnWebhookKind + UpdateAutopilotTrigger_RejectsTimezoneOnWebhookKind: drive the round-2 should-fix from the handler boundary. - UpdateAutopilotTrigger_AcceptsEnabledAndLabelOnWebhookKind: counter test so a regression to a blanket reject is caught. * fix(migrations): bump webhook trigger migration 089 → 091 origin/main added 089_squad_no_action_activity_index (and 090_task_is_leader) since our last rebase, re-colliding with our 089_autopilot_webhook_triggers. Bump to 091 so the filename ordering is unambiguous again. The SQL is unchanged — same partial unique index on autopilot_trigger.webhook_token — only the filename moves. * fix(views): dedupe skipped icon in autopilot RUN_VISUAL after rebase The rebase against origin/main merged main's add of `Ban` for the skipped status next to our round-1 `MinusCircle` entry, leaving the RUN_VISUAL map with two `skipped` keys (only the last would have been read at runtime, and MinusCircle had been dropped from the imports during conflict resolution — so the file would not compile). Keep main's `Ban` icon (latest design) and a single `skipped` entry. 
Carry over the round-1 comment about why the muted styling matters for failure-ratio readability. --------- Co-authored-by: Kerim Incedayi <kerim.incedayi@digitalchargingsolutions.com>	2026-05-18 12:17:39 +08:00
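The trusted-proxy gate from the round-2 fix in the commit above (default to `r.RemoteAddr`, honor forwarded headers only when the peer is inside a configured CIDR, drop invalid CIDRs) can be sketched with `net/netip`. The function names are hypothetical, and taking the first `X-Forwarded-For` entry is an assumption; the commit message does not say which entry the real `clientIPForRateLimit` uses.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// ParseTrustedProxies parses a comma-separated CIDR list such as the value of
// MULTICA_TRUSTED_PROXIES. Invalid entries are dropped rather than failing;
// the commit message says the real server also logs one warning at startup.
func ParseTrustedProxies(env string) []netip.Prefix {
	var out []netip.Prefix
	for _, part := range strings.Split(env, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		p, err := netip.ParsePrefix(part)
		if err != nil {
			continue
		}
		out = append(out, p)
	}
	return out
}

// ClientIP returns the address to key the per-IP rate limiter on.
// remoteAddr is the "host:port" form found in http.Request.RemoteAddr.
func ClientIP(remoteAddr, xff string, trusted []netip.Prefix) string {
	ap, err := netip.ParseAddrPort(remoteAddr)
	if err != nil {
		return remoteAddr // unparseable peer: fall back to the raw string
	}
	peer := ap.Addr().Unmap()
	for _, p := range trusted {
		if !p.Contains(peer) {
			continue
		}
		// Peer is a trusted proxy, so the forwarded header may be believed.
		// Assumption: the left-most entry is the original client.
		first := strings.TrimSpace(strings.Split(xff, ",")[0])
		if fwd, err := netip.ParseAddr(first); err == nil {
			return fwd.Unmap().String()
		}
		break
	}
	return peer.String()
}

func main() {
	trusted := ParseTrustedProxies("10.0.0.0/8, not-a-cidr")
	fmt.Println(ClientIP("203.0.113.7:4444", "1.2.3.4", trusted))
	fmt.Println(ClientIP("10.1.2.3:80", "198.51.100.9, 10.1.2.3", trusted))
}
```

With an empty trusted list, rotating the forwarded header has no effect on the bucket key, which is the bypass-closed property the round-2 tests pin.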
Bohan Jiang	113c4f4e90	docs(agent): clarify openclaw agent id vs name semantics (#2744) Follow-up to #2716. Updates two stale comments that still described openclaw's `name` and `id` as interchangeable. The actual contract: `id` is the routing key passed to `openclaw agent --agent <id>`; `name` is a human display label and is not safe to pass to the CLI. No behavior change. Co-authored-by: multica-agent <github@multica.ai>	2026-05-17 17:20:41 +08:00
Kagura	44d2fc1946	fix(agent): use openclaw agent id instead of name for --agent flag (#2716 ) openclawEntriesToModels() used the agent Name (which may contain spaces, e.g. "Sub2API OPS") as Model.ID. This ID is passed to openclaw via --agent, where normalizeAgentId mangles spaces into hyphens ("sub2api-ops"), causing a lookup miss against the registered id ("sub2api") and a "no parseable output" error. Fix: prefer agent ID for Model.ID; use Name only for display Label. When ID is empty, fall back to Name for backward compatibility. Fixes #2714	2026-05-17 17:08:00 +08:00
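The id-over-name rule from the fix above can be shown in a few lines. This is a sketch only: the commit names `openclawEntriesToModels`, `Model.ID`, and a display label, but the struct layouts and the `entriesToModels` function below are hypothetical stand-ins, and the label fallback for an empty name is an assumption.

```go
package main

import "fmt"

// openclawEntry is a hypothetical shape for one agent reported by openclaw.
type openclawEntry struct {
	ID   string // routing key, safe to pass to --agent
	Name string // human display label, may contain spaces
}

// Model is a hypothetical shape for the selectable model/agent option.
type Model struct {
	ID    string
	Label string
}

// entriesToModels prefers the agent id for Model.ID and uses the name only
// for display. When the id is empty it falls back to the name, which the
// commit keeps for backward compatibility.
func entriesToModels(entries []openclawEntry) []Model {
	models := make([]Model, 0, len(entries))
	for _, e := range entries {
		id := e.ID
		if id == "" {
			id = e.Name
		}
		label := e.Name
		if label == "" {
			label = id // assumption: show the id when no name is set
		}
		models = append(models, Model{ID: id, Label: label})
	}
	return models
}

func main() {
	models := entriesToModels([]openclawEntry{
		{ID: "sub2api", Name: "Sub2API OPS"},
		{Name: "legacy"},
	})
	for _, m := range models {
		fmt.Printf("--agent %s (%s)\n", m.ID, m.Label)
	}
}
```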
Bohan Jiang	3645bdb5b6	feat(issues): add start_date field with progressive disclosure (MUL-2274) (#2696 ) * feat(issues): add start_date field with progressive disclosure (MUL-2274) Mirrors the existing due_date implementation end-to-end so an issue can express a planned start in addition to a deadline. Surfaces start_date as an optional sidebar property alongside priority / due_date / labels (added in MUL-2275), with consistent picker, board/list/sort, activity, and inbox plumbing. Backs the Project Gantt work (parent MUL-1881) and keeps the progressive-disclosure attribute experience consistent. - DB: migration 091 adds issue.start_date TIMESTAMPTZ. - sqlc: ListIssues / CreateIssue / UpdateIssue / CreateIssueWithOrigin / ListOpenIssues read & write start_date. - Backend: IssueResponse + create/update/batch-update handlers parse and emit start_date with RFC3339 validation; new start_date_changed activity event + subscriber notification (with prev_start_date in event payload). - CLI: --start-date flag on `multica issue create` / `issue update`. - Frontend: StartDatePicker component, start_date wired into Issue type, Zod schema, draft / view stores, sort util, header sort + card-property options, list-row / board-card display, create-issue modal, and the issue-detail progressive-disclosure "+ Add property" surface (visibility rule, picker row, add-property menu icon + label). - i18n: en + zh-Hans for sort_start_date / card_start_date / prop_start_date / activity start_date_set / start_date_removed / picker start_date.trigger_label / clear_action / inbox labels. - Tests: new TestNotification_StartDateChanged; existing Issue / draft / modal fixtures extended with start_date. 
Co-authored-by: multica-agent <github@multica.ai> * feat(issues): align start_date with due_date in actions menu and CLI table - Add Start Date submenu (today / tomorrow / next week / clear) in actions menu, mirroring Due Date — parity with the Due Date quick setters in list/board context and 3-dot menus. - Add corresponding en / zh-Hans i18n keys (actions.start_date / start_today / start_tomorrow / start_next_week / start_clear). - CLI human table for `multica issue list` and `multica issue get` now shows a START DATE column next to DUE DATE; --full-id variant too. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-17 15:01:38 +08:00
Jiayuan Zhang	668cab6022	feat(github): mirror PR CI checks and merge conflict status (MUL-2228) (#2632 ) * feat(github): mirror PR CI checks and merge conflict status (MUL-2228) Surface "checks passed/failed" and "conflicts/no conflicts" badges under each linked PR on the issue page so users can judge readiness without flipping over to GitHub. CI state is fed by check_suite webhooks (GitHub Actions + apps using the Checks API; legacy status events are out of scope for MVP); conflicts are read from pull_request.mergeable_state. Data model: * github_pull_request: add head_sha + mergeable_state * github_pull_request_check_suite: per-suite rows keyed by (pr_id, suite_id) * Aggregation done at query time, filtering by current head_sha so late-arriving suites for a stale head can't contaminate the new head's pending view; per-app latest suite chosen first so a single app firing multiple suites isn't counted N times. Webhook hardening: * synchronize/opened/reopened/edited(base) explicitly clear mergeable_state * single-row ordering protection on the check_suite upsert prevents a late-delivered older event from overwriting a newer one * check_suite.pull_requests is iterated; unknown PRs are logged and dropped UI: * PR row shows Checks + Conflicts badges; opaque mergeable values (blocked/behind/unstable/...) render as no badge, not as conflicts. * Terminal PR states (merged/closed) suppress the status row entirely. Tests: * Pure unit coverage for derivePRMergeableState + aggregateChecksConclusion * Webhook integration tests: multi-app aggregation, old-head ignore, late-older-event ignore, synchronize clears mergeable_state * Vitest coverage for pull-request-list badge rendering across CI/conflict combinations and the legacy (null) fallback. Co-authored-by: multica-agent <github@multica.ai> * fix(github): scope check_suite PR lookup; preserve mergeable on metadata Addresses code review on PR #2632. 1. 
check_suite handler now resolves the PR through the workspace-scoped GetGitHubPullRequest query instead of GetGitHubPullRequestByRepoNumber. The (workspace_id, repo_owner, repo_name, pr_number) tuple is the real uniqueness key, so a bare (owner, repo, number) lookup could return a stale row from another workspace and either land the suite on the wrong PR or skip the right one when the installation ids drifted. The old unscoped query is removed. 2. derivePRMergeableState now returns (value, clear) and the upsert SQL distinguishes three cases: state-changing actions clear the column to NULL, non-empty payloads write the value, and metadata events with an empty payload preserve the existing column. Previously every empty payload became NULL, so a labeled/assigned event silently wiped a known clean/dirty verdict in violation of the RFC's "metadata empty payload preserves" rule. 3. ListPullRequestsByIssue narrows to the issue's PR ids before running the per-app check_suite aggregation, avoiding a full-table scan over github_pull_request_check_suite when only a handful of rows belong to the requested issue. New helper test covers labeled+empty preserves; new integration test verifies a metadata event after a known mergeable_state keeps the value. Co-authored-by: multica-agent <github@multica.ai> * feat(github): PR card layout v3 increment — stats + segmented progress bar Replaces the row + badge layout under "Pull requests" on the issue detail sidebar with a card that mirrors the GitHub PR summary look: title, author/avatar, +N −M · K files diff stats, segmented progress bar (failed → pending → passed, failure leftmost), and a one-line status caption following an explicit priority pass-through. Backend - Migration 092: github_pull_request adds additions / deletions / changed_files (INT NOT NULL DEFAULT 0). Zero defaults are what the new frontend treats as "legacy backend — hide the stats row" so old PR rows that pre-date this migration don't render "+0 −0 · 0 files". 
- pull_request webhook handler reads stats off the top-level payload. - ListPullRequestsByIssue now surfaces per-suite counts (checks_passed / failed / pending) alongside the existing aggregate conclusion, so the segmented bar reuses the already-computed counts with no new aggregation. Frontend (packages) - core/github/pull-request-status.{ts,test.ts}: pure-function module for the status-kind priority table and the segment derivation; 15 cases covered, includes the "all-zero → hide stats" guard. - views/issues/components/pull-request-list.tsx: PullRequestCard plus a compact-row fallback used when count > 4 (first 3 as cards, the remainder collapsed behind a Show more toggle). - i18n: new `pull_request_card_` keys in en + zh-Hans. Tests - 12 component tests covering each rule of the priority table, the legacy-zero stats fallback, and the collapse threshold. - Reuse of the v3 webhook handler tests confirmed. Verification - pnpm typecheck + pnpm test green (60 test files, 536 tests). - go build ./... + go vet ./... clean. - 6 demo issues (DEV-2..DEV-7) screenshotted via Playwright; see the PR comments for the visual check matrix. Co-authored-by: multica-agent <github@multica.ai> fix(views): collapse PR cards at N>=4, not N>4 The card-vs-collapse threshold used `>` so 4 PRs slipped past it and all rendered as full cards, contrary to RFC v3 (N >= 4 collapses to 3 cards + compact tail). Switch to `>=` and update the threshold- boundary test to expect "Show 1 more". Co-authored-by: multica-agent <github@multica.ai> * fix(views): align PR sidebar rows with existing list style Co-authored-by: multica-agent <github@multica.ai> * fix(views): hide terminal PR status badges Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-16 21:26:30 +02:00
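The threshold fix at the end of the commit above (collapse at N >= 4, so four PRs show three cards plus "Show 1 more") can be written as a tiny pure function. The sketch below is in Go for consistency with the backend, although the actual component is a TypeScript view; `splitPullRequests` and both constants are hypothetical names.

```go
package main

import "fmt"

const (
	collapseThreshold = 4 // from the commit text: N >= 4 collapses
	visibleCards      = 3 // from the commit text: first 3 stay as full cards
)

// splitPullRequests returns how many PRs render as full cards and how many
// are folded into the compact tail behind the "Show more" toggle.
func splitPullRequests(n int) (cards, compact int) {
	if n >= collapseThreshold {
		return visibleCards, n - visibleCards
	}
	return n, 0
}

func main() {
	for _, n := range []int{3, 4, 7} {
		cards, compact := splitPullRequests(n)
		fmt.Printf("n=%d cards=%d compact=%d\n", n, cards, compact)
	}
}
```

The boundary case is the one the original `>` comparison got wrong: with `n = 4` the function must return one compact row, not four full cards.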
Jiayuan Zhang	380c6b5122	feat(usage): add Time and Tasks to daily-trend toggle (MUL-2283) (#2709) Extends the workspace /usage page Daily tokens chart toggle from Tokens | Cost to Tokens | Cost | Time | Tasks, so users see daily run-time and task-count trends alongside spend without leaving the page. - New SQL `ListDashboardRunTimeDaily`: per-date totals from agent_task_queue (terminal tasks only), scoped to workspace and optionally project. Same time anchor as ListDashboardAgentRunTime so day boundaries line up. - New handler GET /api/dashboard/runtime/daily + TanStack Query option. - New DailyTimeChart (single-series, smart h/m/s unit) and DailyTasksChart (completed + failed stacked). - Empty-state is per-metric so a workspace with tokens but no terminal runs (or vice-versa) doesn't get a false "no data". - i18n: en + zh-Hans daily.metric_time / metric_tasks + titles. Co-authored-by: multica-agent <github@multica.ai>	2026-05-15 18:51:02 +02:00
iYuan	d8635ad580	fix(issues): prevent duplicate active issue creation (MUL-2225) (#2602) * fix: prevent duplicate active issue creation * fix(issues): address duplicate guard review * fix(autopilot): skip duplicate issue admissions * fix(issueguard): tighten duplicate lookup edge cases * test(issues): cover duplicate guard autopilot skips * feat(autopilots): group skipped runs in history	2026-05-15 18:27:56 +08:00
Naiyuan Qing	5ad1641b72	Revert "Squad archive dialog + role editor + transactional DeleteSquad (#2680)" (#2687) This reverts commit `2980ead4c7`.	2026-05-15 17:44:59 +08:00
Naiyuan Qing	2980ead4c7	Squad archive dialog + role editor + transactional DeleteSquad (#2680 ) * docs(squad): address plan-review feedback for archive + role plan Resolve the 4 items the reviewer raised on MUL-2265: 1. TS schema: declare `active_issue_count` as optional (`number \| null \| undefined`) so list/create/update Squad responses don't lie about their shape; only `getSquad` parses through SquadSchema. 2. Archive semantics: restrict TransferSquadAssignees to active issues (status NOT IN done, cancelled) so dialog count and SQL operate on one set and terminal-state issues keep their historical assignee. 3. Index assumption: corrected — `idx_issue_assignee (assignee_type, assignee_id)` exists and is sufficient at realistic squad cardinality; no new index needed. 4. Fixed `int64` test comparison and added `.loose()` to SquadSchema per the local schemas.ts convention. Co-authored-by: multica-agent <github@multica.ai> docs(squad): plan v3 — revert to count-all/transfer-all on archive Reviewer round 2 surfaced two structural problems with plan v2's active-only carve-out: 1. useActorName resolves squad names via ListSquads, which filters archived_at IS NULL. A closed issue with an archived-squad assignee would render as "Unknown Squad". 2. The status-only update path in UpdateIssue skips validateAssigneePair, so a done/cancelled issue with an archived-squad assignee could be reopened to in_progress, violating the "no active issue on an archived squad" invariant enforced elsewhere. Both problems disappear by reverting to count-all + transfer-all: after ArchiveSquad runs, no issue points at the archived squad, so neither case can occur. The product trade-off is that closed historical issues now show the leader agent instead of the archived squad in their "Assigned to" badge — consistent with existing agent-level reassignment behavior elsewhere in the product. Field rename: active_issue_count -> issue_count. 
TransferSquadAssignees SQL is unchanged (already transfers all). Co-authored-by: multica-agent <github@multica.ai> * docs(squad): add Task 2b — wrap DeleteSquad transfer + archive in one tx Reviewer round-3 flagged that the v3 invariant ("after archive no issue points to the squad") was asserted on the happy path only. DeleteSquad's current best-effort impl breaks it two ways: - transfer failure → slog.Warn but archive proceeds (Unknown Squad, reopen-into-archived-squad bugs reappear) - archive failure after a committed transfer → 500 with squad still active but emptied Task 2b rewrites DeleteSquad to run TransferSquadAssignees + ArchiveSquad inside one pgx tx, mirroring the project.go:266-314 pattern. Publish moves below Commit. Adds two regression tests that lock both partial-write failure modes. Co-authored-by: multica-agent <github@multica.ai> * feat(squad): replace native confirm() with AlertDialog and rewrite role editor as combobox Backend: - Add CountIssuesForSquad sqlc query (counts every issue assigned to a squad, no status filter — matches the existing transfer-all archive semantics). - Extend SquadResponse with optional `issue_count` (`int64` + omitempty, populated only by GetSquad to avoid an N+1 in the list endpoint). - Wrap DeleteSquad's transfer + archive in a single pgx transaction so the v3 invariant ("after archive, no issue points to the squad") is durable rather than best-effort. Promote slog.Warn to slog.Error and check the parseUUIDOrBadRequest ok flag (silent zero-UUID was a #1661-class latent bug). Publish only after Commit so realtime never sees rolled-back state. - Tests cover happy path (count, transfer-all including terminal statuses) and both rollback directions (transfer fail / archive fail) via a fault-injecting tx wrapper. Frontend: - Extend Squad TS type with `issue_count?: number \| null` (optional — list/create/update legitimately omit it). 
Add SquadSchema with `.loose()` and wrap getSquad with parseWithFallback so older servers and count-error responses degrade to the dialog's "no count" copy variant. - Replace `window.confirm()` with shadcn `ArchiveSquadConfirmDialog` (destructive variant, leader name + count + closed-issue caveat in the copy, Loader2 while pending). i18n keys added under squads.archive_dialog. - Rewrite RoleEditor as a Popover + Command combobox: Pencil affordance is always visible, suggestions aggregate other members' roles, commit only on Enter or selecting a suggestion (blur discards), per-member savingId drives Loader2 so the spinner only renders on the row being saved. Co-authored-by: multica-agent <github@multica.ai> fix(squad): discard RoleEditor draft on close and no-op blank Enter Two reviewer findings on `e0d754bf`: 1. Closing the Popover (outside click, Esc, trigger re-click) left `query` in state, so reopening + Enter would commit the stale draft. Clear `query` on every non-saving close path. 2. With an existing role, opening the editor and pressing Enter on an empty input committed "" — `commit` only no-op'd when trimmed matched value. Treat blank Enter as a no-op; clearing a role would need an explicit clear action that doesn't exist yet. Add two regression tests: - close (via outside click) → reopen surfaces a clean input; Enter does not commit the stale draft - blank Enter on an existing role does not call onSave Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: multica-agent <github@multica.ai> * fix(squad): add explicit Clear button to RoleEditor Role is optional, but the previous fix turned blank Enter into a no-op without exposing any other way to clear an existing role — that broke a valid terminal state. Keep blank Enter as no-op; add a "Clear role" button at the bottom of the popover that only renders when value is non-empty and routes through onSave(""). 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-15 17:29:37 +08:00
LinYushen	319b23eb39	Revert "feat(task): add claim lease mechanism (Phase 2, MUL-2246) (#2660)" (#2674) This reverts commit `3137feecdf`.	2026-05-15 16:07:23 +08:00
LinYushen	b7a58c06ac	Revert "feat(task): wire claim lease into TaskService and sweeper (MUL-2246) …" (#2673) This reverts commit `bb32be0e50`.	2026-05-15 16:06:58 +08:00
LinYushen	bb32be0e50	feat(task): wire claim lease into TaskService and sweeper (MUL-2246) (#2662 ) * feat(task): wire claim lease queries into TaskService and sweeper (MUL-2246) - ClaimTask now uses ClaimAgentTaskWithLease (generates claim_token + lease) - StartTask accepts optional claim_token for token-verified start - AgentTaskResponse includes claim_token for daemon to use - Daemon client sends claim_token in StartTask body - Sweeper calls RequeueExpiredClaimLeases each tick - Legacy daemons without claim_token still work (graceful fallback) Co-authored-by: multica-agent <github@multica.ai> * fix(task): address PR #2662 review blockers (MUL-2246) 1. ClaimAgentTaskForRuntime: push runtime_id into atomic SQL WHERE clause so runtime A cannot claim tasks queued for runtime B under the same agent. 2. Legacy StartAgentTask: add claim_token IS NULL guard so leased rows cannot be started without token verification. Handler rejects malformed tokens with 400 instead of silently degrading to legacy path. 3. StartAgentTaskWithClaimToken: validate claim_expires_at >= now(), preserve claim_token until terminal state (only clear claim_expires_at), use CTE + UNION ALL for idempotent retry when daemon resends after a lost StartTask response. Return 409 Conflict on token mismatch/expiry. 
Co-authored-by: multica-agent <github@multica.ai> * fix(daemon): StartTask 409 handling, transport retry, claim_token on FailTask (MUL-2246) - StartTask 409 (claim superseded): release slot, don't call FailTask - StartTask transport timeout/5xx: retry once with same token, then check task status before failing - FailTask now sends claim_token; server-side FailAgentTask SQL adds AND (claim_token IS NULL OR claim_token = @claim_token) guard so stale daemons cannot fail tasks that have been re-claimed Co-authored-by: multica-agent <github@multica.ai> * fix(task): close FailTask token bypass and RequeueExpiredClaimLeases liveness gap (MUL-2246) Blocker 1 - FailTask token validation: - SQL: change (param IS NULL OR claim_token = param) to (param IS NULL AND claim_token IS NULL) OR claim_token = param so tokenless requests can only fail legacy (tokenless) rows. - task.go: malformed claim_token now returns ErrInvalidClaimToken (400) instead of being silently dropped to NULL. - Handler: maps ErrInvalidClaimToken→400, ErrClaimTokenInvalid→409. - Service: when UPDATE returns no rows but task is still active, return ErrClaimTokenInvalid (token mismatch) instead of silent success. Blocker 2 - RequeueExpiredClaimLeases runtime liveness: - SQL: JOIN agent_runtime, only requeue tasks where runtime is 'online'. Dead/offline runtime tasks stay dispatched for FailTasksForOfflineRuntimes. - FOR UPDATE → FOR UPDATE OF atq (required with JOIN). Regression tests: - task_claim_token_test.go: malformed, tokenless-on-tokened, wrong-token - requeue_lease_test.go: SQL must JOIN agent_runtime with online filter Co-authored-by: multica-agent <github@multica.ai> * fix(task): move expired lease requeue to ClaimTaskForRuntime preflight, add heartbeat freshness backstop (MUL-2246) - Add RequeueExpiredClaimLeasesForRuntime: per-runtime preflight self-requeue in ClaimTaskForRuntime. Runtime proves liveness by actively claiming, so no heartbeat check needed. 
- Update global RequeueExpiredClaimLeases to require ar.last_seen_at freshness (stale_threshold_secs param). Prevents requeuing to a dead runtime in the 90s gap between lease expiry (60s) and offline detection (150s). - Add regression tests verifying the heartbeat freshness check and that the preflight query does not join agent_runtime. Co-authored-by: multica-agent <github@multica.ai> * fix(task): use LivenessStore for global requeue, move preflight before empty-cache (MUL-2246) Blocker 1: Global RequeueExpiredClaimLeases now uses LivenessStore.IsAliveBatch to verify runtimes are truly alive before requeuing expired leases. When LivenessStore is unavailable (no Redis), global requeue is skipped entirely — the preflight self-requeue in ClaimTaskForRuntime handles live runtimes. This closes the 60-150s gap where a dead runtime still appears online in DB. Blocker 2: Moved RequeueExpiredClaimLeasesForRuntime BEFORE EmptyClaim.IsEmpty fast-path in ClaimTaskForRuntime. Expired leases are now requeued (which bumps the empty cache via notifyTaskAvailable) before the empty check can short-circuit the claim path. Also adds ListRuntimesWithExpiredClaimLeases SQL query and LivenessChecker interface on TaskService. Co-authored-by: multica-agent <github@multica.ai> * fix(task): wire EmptyClaimCache into backend taskSvc for backstop requeue (MUL-2246) The backend taskSvc used by the sweeper only had Liveness wired but not EmptyClaim. When global backstop requeue called notifyTaskAvailable, s.EmptyClaim.Bump() was a nil no-op — the handler's empty-cache was never invalidated, so the daemon's next claim hit a stale empty verdict. Fix: wire the same Redis-backed EmptyClaimCache into the backend taskSvc in main.go (same Redis keys as router.go:139 handler instance). Add regression test verifying backstop requeue invalidates the handler's empty-cache. 
Co-authored-by: multica-agent <github@multica.ai> * fix(task): global backstop must not requeue — alive runtimes use preflight, dead stay dispatched (MUL-2246) - RequeueExpiredClaimLeases is now a no-op (returns 0 always) - Alive runtimes self-requeue via ClaimTaskForRuntime preflight - Dead runtimes stay dispatched for FailTasksForOfflineRuntimes - Rewriting to queued on dead runtime creates 2h blackhole (offline sweeper only handles dispatched/running) - Test actually calls RequeueExpiredClaimLeases and asserts 0 in all cases Co-authored-by: multica-agent <github@multica.ai> * fix(daemon): remove duplicate usage reporting block after merge conflict (MUL-2246) The merge resolution introduced a second ReportTaskUsage call after the status check, duplicating the usage-before-early-return block that already runs right after runner.run. Remove the duplicate and add a regression test asserting /usage is called exactly once on the normal completion path. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-15 15:15:31 +08:00
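The FailTask guard described in the commit above changed from `(param IS NULL OR claim_token = param)` to `(param IS NULL AND claim_token IS NULL) OR claim_token = param`. The following sketch restates that SQL predicate in Go, using `nil` pointers for SQL NULL, to make the closed bypass visible. `mayFail` and `strPtr` are hypothetical names used only here.

```go
package main

import "fmt"

// strPtr is a small helper for building nullable strings in examples.
func strPtr(s string) *string { return &s }

// mayFail mirrors the tightened guard:
//   (param IS NULL AND claim_token IS NULL) OR claim_token = param
// rowToken is the task row's claim_token; reqToken is the token sent by the daemon.
func mayFail(rowToken, reqToken *string) bool {
	if reqToken == nil {
		// Tokenless requests may only fail legacy (tokenless) rows.
		return rowToken == nil
	}
	// In SQL, NULL = param is never true, so a tokenless row does not match.
	return rowToken != nil && *rowToken == *reqToken
}

func main() {
	fmt.Println(mayFail(nil, nil))                 // true: legacy row, legacy daemon
	fmt.Println(mayFail(strPtr("a"), nil))         // false: the bypass the fix closes
	fmt.Println(mayFail(strPtr("a"), strPtr("a"))) // true: matching token
	fmt.Println(mayFail(strPtr("a"), strPtr("b"))) // false: stale daemon
}
```

Under the earlier predicate the second case evaluated to true, which is why a tokenless request could fail a task that had since been re-claimed.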
LinYushen	3137feecdf	feat(task): add claim lease mechanism (Phase 2, MUL-2246) (#2660) Add claim_token + claim_expires_at columns to agent_task_queue and three new SQL queries for the claim lease protocol: - ClaimAgentTaskWithLease: generates a UUID token and sets a lease expiry when claiming a task, so the daemon must prove it received the response - StartAgentTaskWithClaimToken: validates the token on StartTask, preventing stale daemons from starting requeued tasks - RequeueExpiredClaimLeases: moves dispatched tasks with expired leases back to queued for re-claim This closes the reliability gap where a claim response lost in transit leaves a task stuck in dispatched until the 60s dispatch timeout fires. Co-authored-by: multica-agent <github@multica.ai>	2026-05-15 15:14:05 +08:00
Naiyuan Qing	f29bd93444	feat(squads): rework Create Squad modal (MUL-2233) (#2645 ) * feat(squad): accept avatar_url on CreateSquad Threads avatar_url through the SQL query, sqlc-generated code, and the Go handler so the create-squad flow can persist an avatar at creation time instead of forcing a follow-up PATCH. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: multica-agent <github@multica.ai> * feat(squad): add avatar_url to CreateSquadRequest Extends the TS contract for the new backend field so the frontend can pass an uploaded avatar URL through api.createSquad. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: multica-agent <github@multica.ai> * feat(squads): rework Create Squad modal to match CreateAgentDialog (MUL-2233) Replaces the cramped small-dialog flow with the same large-dialog shape used by Create Agent: identity row (AvatarPicker + name + description with char counter), grouped Leader picker (My Agents first, then Workspace Agents), and a new multi-select Additional Members picker covering agents and workspace members. The members trigger collapses to "+N" once more than three are selected; promoting an agent to leader auto-drops it from the additional-members list. After createSquad, additional members are attached via Promise.allSettled so a single failure surfaces a warning toast without blocking navigation — the squad still exists and the user can retry from the Members tab. Adds packages/views/modals/create-squad.test.tsx covering identity binding, leader-group ordering, leader/member conflict sanitization, the empty- and partial-failure success paths, and the create-failure recovery path. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: multica-agent <github@multica.ai> * fix(squads): valid trigger HTML + drop conflicted leader from members Two issues from PR #2645 review: 1. 
AdditionalMembersPicker's PopoverTrigger was a <button> containing MemberChip's remove <button>, which React/HTML flags as nested interactive content (hydration + a11y warning). Render the trigger as a <div role="combobox"> via Base UI's render prop so the chip's remove button is valid. 2. sanitizedMembers only hid the leader from rendered/submitted output, so promoting an additional member to leader then switching leader away resurrected the hidden pick. Drop it from selectedMembers at the moment of promotion via handleLeaderChange; sanitizedMembers is no longer needed. Adds a test that promotes → switches leader and asserts the member is not resubmitted. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: multica-agent <github@multica.ai>	2026-05-15 13:11:08 +08:00
Bohan Jiang	8d872b7521	fix(daemon): disable Claude AskUserQuestion in non-interactive mode (MUL-2244) (#2656 ) * fix(daemon): disable Claude AskUserQuestion in non-interactive mode (MUL-2244) GitHub #2588: when Claude Code calls its built-in AskUserQuestion tool inside the daemon's stream-json runtime, the question never reaches the user — there's no UI to render it — so the SDK returns an empty answer and the agent silently "infers" and continues. From the issue's perspective, execution looks stuck while the agent is actually charging ahead on its own guess. Two-part fix: - `buildClaudeArgs` now passes `--disallowedTools AskUserQuestion` so the tool is not exposed to the model at all. - The Claude-specific runtime brief tells the agent to use a `blocked` issue comment for genuine clarification, or to state an explicit assumption and proceed. Adds a regression test that pins both: AskUserQuestion is forbidden in CLAUDE.md and is NOT mentioned in the AGENTS.md emitted for non-Claude providers (the tool is Claude-specific). Co-authored-by: multica-agent <github@multica.ai> * refactor(daemon): drop CLAUDE.md AskUserQuestion guidance, rely on --disallowedTools The --disallowedTools flag already prevents Claude from invoking AskUserQuestion, so duplicating the rule in the runtime brief just bloats the prompt without changing behavior. Removes the section and its regression test; the argv-level test in pkg/agent already pins the flag. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-15 12:42:23 +08:00
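The argv-level part of the fix above amounts to appending one flag pair so the tool is never offered to the model. A minimal sketch is shown below. `withAskUserQuestionDisabled` is a hypothetical helper and is not the real `buildClaudeArgs`; the base arguments in `main` are only illustrative.

```go
package main

import "fmt"

// withAskUserQuestionDisabled returns a copy of args with the flag pair from
// the commit appended, so the caller's slice is never mutated.
func withAskUserQuestionDisabled(args []string) []string {
	out := make([]string, 0, len(args)+2)
	out = append(out, args...)
	return append(out, "--disallowedTools", "AskUserQuestion")
}

func main() {
	base := []string{"--output-format", "stream-json"} // illustrative base args
	fmt.Println(withAskUserQuestionDisabled(base))
}
```

Copying into a fresh slice avoids aliasing the caller's backing array, which matters if the same base arguments are reused across dispatches.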
Jiayuan Zhang	4d6b5ad06f	fix(squad): wake leader when dual-role agent posts as worker (MUL-2218) (#2626 ) * fix(squad): wake leader when dual-role agent posts as worker (MUL-2218) The squad-leader self-trigger guard skipped a comment whenever the author equalled the squad's leader id, regardless of the role the agent was acting in. For an agent that holds both leader and worker roles in the same squad, this meant the leader role never reacted to its own worker output and the issue stalled. Tag each enqueued task with is_leader_task and consult the agent's most recent task on the issue from both self-trigger guards (comment path + @squad mention path) — skip only when that task was itself a leader task. Co-authored-by: multica-agent <github@multica.ai> * fix(squad): inherit is_leader_task on retry task clone (MUL-2218) CreateRetryTask cloned a parent task into a fresh queued attempt but omitted is_leader_task from the column list, so the child silently fell back to the column default (false). For a leader task that hit auto-retry through MaybeRetryFailedTask, the retried task posed as a worker task — the self-trigger guard then no longer recognised the leader's own comments, re-opening the very loop MUL-2218 closes. Inherit p.is_leader_task in the clone and add a query-level test that covers both leader and worker retries. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-14 15:23:36 +02:00
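The guard change in the commit above can be read as a two-condition predicate: skipping requires both that the comment author is the squad leader and that the author's most recent task on the issue was itself a leader task. The sketch below contrasts the old and new rule. Function and parameter names are hypothetical, and treating "no prior task" as `false` is an assumption.

```go
package main

import "fmt"

// oldShouldSkip is the pre-fix rule: any comment by the leader agent is skipped.
func oldShouldSkip(authorID, leaderID string) bool {
	return authorID == leaderID
}

// shouldSkip is the post-fix rule: skip only when the leader agent's most
// recent task on the issue was a leader task (is_leader_task = true).
// lastTaskWasLeader is assumed false when the agent has no task on the issue.
func shouldSkip(authorID, leaderID string, lastTaskWasLeader bool) bool {
	return authorID == leaderID && lastTaskWasLeader
}

func main() {
	// A dual-role agent posts a comment while acting as a worker.
	fmt.Println(oldShouldSkip("agent-1", "agent-1"))     // true: leader never wakes
	fmt.Println(shouldSkip("agent-1", "agent-1", false)) // false: leader is triggered
	fmt.Println(shouldSkip("agent-1", "agent-1", true))  // true: real self-trigger is still blocked
}
```

This also shows why the retry clone had to inherit `is_leader_task`: a retried leader task that defaulted to `false` would make the predicate return `false` for the leader's own comments and reopen the loop.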
LinYushen	0cb759b446	fix(squad): suppress no-action leader comments (#2583)	2026-05-14 14:07:26 +08:00
LinYushen	add3135a42	feat(cli): add squad create/update/delete and member add/remove (#2574 ) * feat(cli): add squad create/update/delete and member add/remove commands Implement missing squad management commands in the CLI: - squad create --name --leader [--description] - squad update <id> [--name] [--description] [--instructions] [--leader] [--avatar-url] - squad delete <id> - squad member add <squad-id> --member-id --type [--role] - squad member remove <squad-id> --member-id --type Also adds DeleteJSONWithBody to the API client for the member remove endpoint which uses DELETE with a JSON body. All commands support --output json for structured output. Co-authored-by: multica-agent <github@multica.ai> * fix(squad): add --output json to delete/member remove, return 404 on 0-row delete - squad delete: add --output json flag, emit {id, deleted} on success - squad member remove: add --output json flag, emit {squad_id, member_id, removed} - Backend RemoveSquadMember: change query to :execrows, check RowsAffected and return 404 'squad member not found' when 0 rows deleted Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-14 12:51:44 +08:00
fr00st	cc9fbd3db0	Fix stale Done replies on comment follow-ups (#2495 ) * fix: avoid stale done replies on comment follow-ups * fix: avoid inlining runtime brief for Hermes ACP * fix: address comment follow-up review feedback	2026-05-14 12:00:04 +08:00
LinYushen	29082f7cfe	feat: implement Squad feature MVP (#2505) * feat: implement Squad feature MVP - Add migration 084_squad: squad, squad_member, squad_activity_log tables - Extend issue.assignee_type to support 'squad' - Add sqlc queries for squad CRUD, member management, activity logs - Add Go handler with full Squad API (CRUD, members, activity log) - Register routes: /api/squads/, /api/issues/{id}/squad-activity, /api/squad-activity - Add Squad trigger logic: - Assign Squad immediately triggers leader - Every external comment on squad-assigned issue triggers leader - Anti-loop: squad members' comments don't trigger leader - Dedup: skip if leader already has pending task - Add squad activity log API (Option B) for leader no-op recording - Add frontend TypeScript types (Squad, SquadMember, SquadActivityLog) - Add protocol events: squad:created, squad:updated, squad:deleted Co-authored-by: multica-agent <github@multica.ai> fix: address PR review blocking issues 1. validateAssigneePair now accepts 'squad' assignee_type 2. All squad endpoints validate workspace ownership via GetSquadInWorkspace 3. CreateSquadActivityLog restricted to squad leader agent only 4. AddSquadMember validates member exists in workspace 5. UpdateSquad auto-adds new leader to squad members 6. DeleteSquad transfers assigned issues to leader before deletion 7.
IssueAssigneeType includes 'squad' in frontend types Co-authored-by: multica-agent <github@multica.ai> * feat: soft-delete squads via archive instead of hard delete - Add migration 085: archived_at + archived_by columns on squad table - ListSquads now excludes archived squads (ListAllSquads for admin) - DeleteSquad → ArchiveSquad (sets archived_at, preserves all records) - Transfer squad-assigned issues to leader before archiving - SquadResponse includes archived_at/archived_by fields - Frontend Squad type updated with nullable archived fields Co-authored-by: multica-agent <github@multica.ai> * feat: re-add Squads frontend entry (sidebar nav + pages) Re-applies the frontend squad entry that was lost during a merge: - Sidebar nav: Squads item with Users icon - Paths: squads() and squadDetail() in workspace paths - Routes: /squads and /squads/[id] pages - Views: SquadsPage (list) and SquadDetailPage - i18n: en 'Squads' / zh '小队' - Reserved slug: 'squads' Co-authored-by: multica-agent <github@multica.ai> * fix: fix SquadsPage rendering - use PageHeader children pattern PageHeader takes children, not title/actions props. The incorrect usage caused a React rendering error. Now matches the pattern used by autopilots and agents pages. 
Co-authored-by: multica-agent <github@multica.ai> * fix(squads): add API client methods and package export for squads pages * feat: complete Squad frontend - create dialog, member management, API methods - Add CreateSquadModal with name/description/leader selection - Register 'create-squad' in modal registry - Wire 'New Squad' button to open the modal - Add full API client methods: createSquad, updateSquad, deleteSquad, addSquadMember, removeSquadMember - Rewrite SquadDetailPage with: - Member list showing resolved names - Add/remove member UI - Archive squad button - Back navigation to squads list Co-authored-by: multica-agent <github@multica.ai> * feat: improve Squad UI - match create agent dialog style - CreateSquadModal: proper Dialog with Header/Description/Footer, agent picker with avatars, textarea for description - SquadDetailPage: centered max-w-2xl layout, ActorAvatar for members, Crown badge for leader, textarea for member description, improved spacing and visual hierarchy - Renamed 'role' field label to 'Description' in add member form (describes the member's responsibilities in the squad) Co-authored-by: multica-agent <github@multica.ai> * feat(squad): add avatar, instructions; drop unique-name constraint - 086: add squad.avatar_url - 087: drop unique constraint on squad.name (squads with the same name are legitimate across teams; uniqueness was an accidental product constraint) - 088: add squad.instructions (text, default '') - UpdateSquad now COALESCEs avatar_url + instructions - handler exposes Instructions in SquadResponse and accepts it in UpdateSquad Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * feat(squad): assignable + mention target; trigger leader on assign - assignee picker and @mention suggestion list squads alongside agents and members; renders squad avatar/icon - creating or updating an issue with assignee_type=squad enqueues a task for the squad's current leader (mirrors agent-assignee parking-lot rule: skip 
backlog only) - workspace queries/hooks expose squads where needed for the pickers - locales updated for new picker copy Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * feat(squad): agent-style detail page with members + instructions tabs - restructure squad detail page to mirror the agent detail page: 320px inspector (creator, leader, created/updated) + tabbed pane (Members \| Instructions) with dirty-guard AlertDialog - inline name + avatar editing on the inspector - inline description editor (modal textarea) - members tab: leader + member picker with role descriptions, swap leader, edit member roles, remove - instructions tab: ContentEditor + Save (mirrors agent pattern) - squads list shows the squad avatar/icon - core types + api.updateSquad accept avatar_url + instructions Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * feat(squad): inject leader briefing on claim (protocol + roster + instructions) When a squad's leader agent claims a task on a squad-assigned issue, append a system-level briefing to the agent's Instructions composed of: 1. Squad Operating Protocol — hard-coded rules: leader is a coordinator, dispatch via @mention, stop after dispatching, resume on re-trigger, do not work outside the roster. 2. Squad Roster — leader self-row plus one row per non-archived member with a literal mention markdown string ([@Name](mention:// agent\|member/<UUID>)) the leader can paste verbatim. Round-trips through util.ParseMentions, enforced by a contract test. 3. Squad Instructions — the user-defined squad.instructions block, omitted entirely when empty so we do not leave a dangling heading. Non-leader members claiming the same issue receive no briefing. Tests cover: full squad with mixed agent/human members, lone leader, archived agents skipped, empty user instructions, mention round-trip, and the leader/non-leader claim-handler gate. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(squad): tell leader not to restate issue context in dispatch comment After observing leaders padding their delegation comments with full re-summaries of the issue body and prior discussion, make the Operating Protocol explicit: - assignees on Multica already have the full issue (title, description, all comments, attachments) and workspace context; - delegation comments should add only what cannot be inferred (who is picked, why, extra constraints), aim for two or three sentences; - restating context is now an explicit hard rule violation. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * feat(squad): unify leader evaluation into activity_log, add CLI command - Squad member comments now trigger leader (only leader self-excluded) - Replace squad_activity_log with activity_log (action: squad_leader_evaluated) - Add CLI: multica squad activity <issue-id> <outcome> --reason - Add API: POST /api/issues/{id}/squad-evaluated - Update squad operating protocol to require evaluation recording - Remove squad_activity_log table from schema and generated code * feat(cli): add squad list, get, member list commands * fix(squad): address review findings (P1+P2) P1 fixes: - Add 'squads' to reserved_slugs.json (source of truth) - Add 'create-squad' to ModalType union - Remove unused leaderOpen/selectedLeader in create-squad modal - Replace literal JSX strings with i18n selectors (en + zh-Hans) P2 fixes: - Add 'squad' to mention regex (MentionRe) - Fix human member lookup in squad briefing (use GetUser directly) - Add squads routes to desktop app - Add squad:created/updated/deleted to WSEventType + invalidation - Reject archived squads as issue assignees * fix(squad): restore zh-Hans key, publish activity event, invalidate issues on archive - Restore create_project.title in zh-Hans modals.json (dropped by prior edit) - Publish activity:created WS event after squad leader evaluation - 
Invalidate issue queries on squad:deleted (archive transfers assignees) - Add creator info to squad list cards * fix(squad): realtime sync, rerun support, leader validation - Use workspaceKeys.squads prefix for detail/member queries (realtime invalidation) - Publish squad:updated after add/remove/role-change member mutations - Support rerun for squad-assigned issues (targets leader agent) - Reject assignment to squads whose leader is archived --------- Co-authored-by: multica-agent <github@multica.ai> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-13 18:46:20 +08:00
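The leader-trigger rules described in this commit (after the later change that lets squad member comments trigger the leader, with only the leader's own comments excluded, plus the pending-task dedup) can be sketched roughly as below. This is a minimal illustration, not the repository's code: `commentEvent` and `shouldTriggerLeader` are hypothetical names, and the real handler presumably reads these facts from the database.

```go
package main

import "fmt"

// commentEvent is a hypothetical, simplified view of a new comment
// on a squad-assigned issue.
type commentEvent struct {
	AuthorID         string // who wrote the comment
	LeaderID         string // current squad leader
	LeaderHasPending bool   // leader already has a pending task for this issue
}

// shouldTriggerLeader applies the two guards named in the commit message:
// the leader's own comments never re-trigger the leader (anti-loop), and a
// leader with a pending task is not enqueued again (dedup).
func shouldTriggerLeader(e commentEvent) bool {
	if e.AuthorID == e.LeaderID {
		return false // anti-loop: leader self-excluded
	}
	if e.LeaderHasPending {
		return false // dedup: a task is already queued
	}
	return true
}

func main() {
	fmt.Println(shouldTriggerLeader(commentEvent{AuthorID: "member-1", LeaderID: "leader-1"}))
	fmt.Println(shouldTriggerLeader(commentEvent{AuthorID: "leader-1", LeaderID: "leader-1"}))
	fmt.Println(shouldTriggerLeader(commentEvent{AuthorID: "member-1", LeaderID: "leader-1", LeaderHasPending: true}))
}
```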
Naiyuan Qing	623d29f276	feat(agents): one-click create from curated templates (Phase 1) (#2520 ) * docs(agents): three-phase agent quick-create plan Captures the full design for moving agent creation from manual form + one-by-one skill attachment to a tiered experience: - Phase 1 (this PR): one-click curated templates, AI-free. - Phase 2 (next): AI-recommended skills via the existing quick-create task mechanism — no new server-side LLM dependency. - Phase 3 (later): AI creates the whole agent end-to-end, composing Phase 2 with a new `multica agent create` CLI driver. Documents the architectural decisions that keep all three phases on existing infrastructure (no SSE, no server-side LLM SDK, no new WS channels), the two soft blockers Phase 1 unlocks for later phases (createSkillWithFiles TX composability + skill same-name dedupe), and the scope decisions we explicitly opted out of (Anthropic plugin marketplace, ClawHub UI affordances). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(skills): harden import against invalid UTF-8 and binary files PG rejects two byte patterns in a TEXT column. Both crashed real skill imports we hit while assembling the template catalog: - Embedded NUL (0x00) -> SQLSTATE 22021. Already stripped by sanitizeNullBytes, kept as-is. - Other invalid UTF-8 (e.g. 0x91 — Windows-1252 smart quote in a skill whose author saved prose from Word). sanitizeNullBytes now also runs strings.ToValidUTF8 over the content so the second class no longer takes the whole import down. For non-text payloads (images, fonts, archives, compiled binaries), sanitization isn't the right fix — agents never read those as text, and the bytes can't survive a TEXT column at all. addFile now skips them by extension before the per-bundle cap counters tick, logging the skip so an unexpected drop leaves a breadcrumb. Function name kept for compatibility with the many call sites; both behaviours are strict supersets of the original. 
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(skills): split createSkillWithFiles for tx composition + add workspace find-or-create query Two soft blockers cleared so create-from-template (next commit) can fold N skill creates and the agent + binding writes into one outer transaction: 1. createSkillWithFiles used to Begin/Commit its own tx. Caller composition was impossible — N invocations meant N separate transactions and no atomicity over the whole materialise step. Pull the body into createSkillWithFilesInTx(ctx, qtx, input); the original function becomes a thin wrapper that manages its own tx for standalone callers. Existing call sites: zero behaviour change. 2. Add GetSkillByWorkspaceAndName sqlc query — workspace skill lookup by name, anchored to UNIQUE(workspace_id, name) from migration 008. Lets the template materialiser implement find-or-create: reuse the workspace's existing skill row when a template references the same name, rather than crashing on the unique constraint or polluting the workspace with `<name>-2` clones. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(agents): agent template catalog + create-from-template endpoint Server-side foundation for Phase 1 of the quick-create roadmap (see docs/agent-quick-create-plan.md). Adds: - server/internal/agenttmpl/ — embed-loaded catalog of curated agent templates. Each template ships pre-written instructions plus a list of skill URLs that get materialised into the workspace at create time. Validation runs at startup (init() panics on a malformed template) so a bad JSON ships as a deploy-time defect, not a runtime 500. Slug must equal the filename basename so the URL router is mirror-symmetric with the file layout. - 11 starter templates covering Engineering / Writing / Building / Testing (code-reviewer, frontend-builder, planner, docs-writer, one-pager, html-slides, full-stack-engineer, …). 
- Three new endpoints, all behind RequireWorkspaceMember: GET /api/agent-templates — picker list (no instructions) GET /api/agent-templates/:slug — detail with instructions POST /api/agents/from-template — materialise + create Create flow: 1. Auth + runtime authorization happen BEFORE the GitHub fan-out so a 403 never wastes 20s of upstream fetches. 2. Pre-flight dedupe by cached_name reuses workspace skills without an HTTP fetch — second create-from-the-same-template drops from 20s to <100ms. 3. Parallel fetch (30s per-URL timeout) for the remaining skills. 4. Single transaction: every skill insert, the agent insert, and the agent_skill bindings. On any upstream fetch failure the TX rolls back and the API returns 422 with `failed_urls` so the UI can name the bad source(s). 5. extra_skill_ids (user-supplied additions) are verified through GetSkillInWorkspace per id before attach, so a malicious client can't graft a skill from another workspace via UUID guessing. - multica agent create --from-template <slug> CLI flag dispatches to the new endpoint with a 60s ceiling, matching `multica skill import`. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(agents): one-click create-from-template UI Frontend half of Phase 1. CreateAgentDialog becomes a state machine spanning four steps: chooser → Start blank / From template cards blank-form → existing manual form (post-chooser) duplicate-form → existing form pre-filled from a duplicated agent template-picker → grid of templates, click navigates to detail template-detail → instructions + skill list preview + one-click Use Picking a template never lands on the form: name auto-deduped against existingAgentNames, runtime = first usable one, visibility = private. Refinement happens on the agent detail page if needed. Same rationale the doc spells out — templates exist precisely to skip configuration. 
New components, all collapsible-by-default so quick-create stays fast: - template-picker.tsx — categorised grid, lucide icons + semantic accent tokens resolved through static maps so Tailwind's JIT picks up every variant (dynamic class strings would silently miss). - template-detail.tsx — instructions preview, skill list with cached descriptions, Use CTA. Renders the failedURLs banner when a 422 fires — the only step that can trigger that response. - instructions-editor.tsx — collapsed preview-card / expanded full ContentEditor. - skill-multi-select.tsx + skill-picker-list.tsx — shared multi- select surface, also adopted by the existing skill-add-dialog. - avatar-picker.tsx — agent avatar upload, mirrors the inspector's visual language. Schema-defended client (CLAUDE.md → API Response Compatibility): the three new endpoints are wired through parseWithFallback with lenient zod schemas. Desktop builds outlive any given server — a future field rename / wrapping must not white-screen older installs. listAgentTemplates accepts both the current bare array and a future {templates: [...]} envelope. Coverage: 7 new schema-test cases in schema.test.ts (null body, missing skills/instructions, malformed create response, envelope migration). Catalog + detail go through TanStack Query with staleTime: Infinity — workspace-independent static data, no per-mount refetch. Other: - skill-add-dialog becomes a true multi-select (Confirm button + checkbox list); attached skills are filtered out of the list. - agents-page hands the freshly-created Agent back to the dialog so a follow-up setAgentSkills can attach the form-selected skills. - agent-overview-pane drops the mx-auto/max-w-2xl frame on config- tab content; the wider dialog visual language reads better with tabs filling the column. - Every new UI string lives in both en/agents.json and zh-Hans/agents.json under create_dialog.* / tab_body.skills.* — locales/parity.test.ts blocks drift in CI. 
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): align skill import test + drop next-only lint suppression - TestFetchFromSkillsSh_ResolvesRootLevelSkillMd now expects assets/logo.png to be skipped; matches the new addFile binary-extension guard (`6fafd86e`). The .png is intentionally dropped so PG TEXT inserts don't hit SQLSTATE 22021. - packages/views shares zero next/* deps, so the @next/next/no-img-element eslint plugin isn't loaded there. The eslint-disable directive referencing it produced a hard "rule not found" error in CI lint. Raw <img> is the right primitive in views; remove the disable comment. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: multica-agent <github@multica.ai> * test(agents): wrap CreateAgentDialog tests in workspace/navigation providers The dialog now calls useNavigation() and useWorkspacePaths(), both of which throw outside their providers. The existing tests rendered the dialog bare and tripped both new requirements: - NavigationProvider — supply a stub adapter so push() works for the agent-detail redirect. - WorkspaceSlugProvider — useWorkspacePaths() requires a slug. The blank-vs-template chooser is now the default first step; the existing tests target the runtime picker on the manual form, so the helper auto-clicks "Start blank" when no template is passed (duplicate-mode tests skip the chooser). Manual afterEach(cleanup) + document.body wipe. Base UI's Dialog portal renders into document.body and leaves focus-guard/inert wrapper divs behind across tests, so the second test in the suite saw two "All" / "My Runtime" matches and getByText failed. The wipe is local to this file rather than the shared setup because it isn't a global issue — only suites that open Base UI dialogs hit it. 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: multica-agent <github@multica.ai>	2026-05-13 18:26:04 +08:00
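The import hardening in this commit (strip embedded NUL bytes, then run `strings.ToValidUTF8` so other invalid bytes cannot break a TEXT insert) might look roughly like the sketch below. The function name `sanitizeForText` is hypothetical (the commit keeps the name `sanitizeNullBytes`), and dropping invalid bytes rather than substituting a replacement character is an assumption; the commit message does not say which replacement the real code uses.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeForText removes byte patterns that PostgreSQL rejects in a TEXT
// column: embedded NUL bytes and any other invalid UTF-8 sequences.
// Assumption: invalid bytes are dropped rather than replaced.
func sanitizeForText(s string) string {
	s = strings.ReplaceAll(s, "\x00", "") // NUL is valid UTF-8 but rejected by PG
	return strings.ToValidUTF8(s, "")     // drop remaining invalid byte runs
}

func main() {
	fmt.Printf("%q\n", sanitizeForText("a\x00b"))
	fmt.Printf("%q\n", sanitizeForText("smart\x91quote")) // 0x91: Windows-1252 smart quote
	fmt.Printf("%q\n", sanitizeForText("héllo"))
}
```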
Bohan Jiang	5db96b4007	fix(daemon): bypass Gemini folder-trust gate in headless mode (#2516) (#2523) Gemini CLI's folder-trust feature throws FatalUntrustedWorkspaceError (exit code 55) when the current workspace isn't in `~/.gemini/trustedFolders.json` and the process is headless, so no interactive trust prompt is available. The daemon spawns gemini with `-p` + `--yolo` in a freshly checked-out worktree that the user has never trusted interactively, so every run with `security.folderTrust` enabled fails after ~10s with exit status 55 and no useful output. Default `GEMINI_CLI_TRUST_WORKSPACE=true` on the child env to short-circuit `checkPathTrust` in gemini-core. This mirrors gemini-cli's documented `--skip-trust` flag; the env var has been gemini's documented headless escape hatch for the entire folder-trust feature lifetime so the fix works on every gemini version that can produce the crash. Callers that explicitly set the same key in cfg.Env win, preserving the ability to opt back into the gate. Co-authored-by: multica-agent <github@multica.ai>	2026-05-13 17:05:12 +08:00
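The "default the variable, but let an explicit caller value win" behaviour described above can be sketched as a small env merge. This is an illustrative sketch only: `withTrustDefault` is a hypothetical helper, and representing `cfg.Env` as a `map[string]string` is an assumption about its shape.

```go
package main

import "fmt"

// trustKey is the environment variable named in the commit message.
const trustKey = "GEMINI_CLI_TRUST_WORKSPACE"

// withTrustDefault returns a copy of the caller-supplied env with trustKey
// defaulted to "true". Any value the caller set explicitly overwrites the
// default, so callers can still opt back into the folder-trust gate.
func withTrustDefault(callerEnv map[string]string) map[string]string {
	out := make(map[string]string, len(callerEnv)+1)
	out[trustKey] = "true" // default first
	for k, v := range callerEnv {
		out[k] = v // explicit caller values win
	}
	return out
}

func main() {
	fmt.Println(withTrustDefault(nil)[trustKey])
	fmt.Println(withTrustDefault(map[string]string{trustKey: "false"})[trustKey])
}
```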
Bohan Jiang	178cfb5008	fix(daemon): strip Windows chcp noise from runtime version (#2516) (#2521) The gemini CLI's Windows shim emits `Active code page: 65001` (from `chcp`) to stdout before the real version reaches `--version` output. The daemon stored the raw concatenation as the runtime version, so the runtime detail page rendered `Active code page: 65001 0.42.0` instead of `0.42.0`. Scan `<cli> --version` line by line and return the first line carrying a semver-shaped token. Full strings like `2.1.5 (Claude Code)` or `codex-cli 0.118.0` survive unchanged; unparseable output falls back to the trimmed raw value. Co-authored-by: multica-agent <github@multica.ai>	2026-05-13 16:58:14 +08:00
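The line-by-line scan described above could be sketched as follows. The name `pickVersionLine` and the exact regular expression are assumptions made for illustration; the commit only says the token is "semver-shaped".

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// semverToken matches a MAJOR.MINOR.PATCH-shaped token anywhere in a line.
// The exact pattern is an assumption.
var semverToken = regexp.MustCompile(`\d+\.\d+\.\d+`)

// pickVersionLine returns the first line of raw that carries a semver-shaped
// token, or the trimmed raw output when no line qualifies.
func pickVersionLine(raw string) string {
	sc := bufio.NewScanner(strings.NewReader(raw))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if semverToken.MatchString(line) {
			return line // full line is kept, e.g. "2.1.5 (Claude Code)"
		}
	}
	return strings.TrimSpace(raw) // unparseable output: fall back to raw
}

func main() {
	fmt.Println(pickVersionLine("Active code page: 65001\r\n0.42.0\r\n"))
	fmt.Println(pickVersionLine("2.1.5 (Claude Code)\n"))
	fmt.Println(pickVersionLine("codex-cli 0.118.0\n"))
}
```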
Bohan Jiang	51aa924124	feat(chat): support renaming chat sessions inline (#2522) Adds a pencil icon next to the trash icon on each session row in the chat dropdown. Clicking it turns the title into an inline editable input: Enter / blur saves, Escape cancels. Server: new PATCH /api/chat/sessions/{id} handler that updates the title via the existing `UpdateChatSessionTitle` sqlc query, broadcasts a new `chat:session_updated` WS event so other tabs / devices stay in sync, and rejects blank titles. Frontend mutation is optimistic with rollback, matching the existing delete-session pattern. MUL-2110 Co-authored-by: multica-agent <github@multica.ai>	2026-05-13 16:57:34 +08:00
Naiyuan Qing	e8c2855746	fix(chat): collapse chat-done flicker via inline cache write (#2509 ) * fix(chat): collapse chat-done flicker via inline cache write The chat panel flickered at end-of-turn: live TimelineView unmounted → short blank + scroll jump → persistent AssistantMessage finally appeared. Root cause: chat:done's WS handler called setQueryData(pendingTask, {}) synchronously while invalidateQueries(messages) was an async refetch. The render guard pendingAlreadyPersisted (chat-message-list.tsx:62-68) expected the persisted message to already be in the messages cache before pending cleared, but the sync/async ordering broke that guard. Fix follows TkDodo's "combine setQueryData (active query) + invalidate (others)" pattern. ChatDonePayload now carries the freshly-persisted ChatMessage (id, content, elapsed_ms, created_at); the WS handler writes it into chatKeys.messages BEFORE clearing pending. Same render tick → AssistantMessage mounts before TimelineView unmounts → no flicker. invalidate(messages) stays as a fallback for clients that took the older code path or for content drift (redaction, etc.). Also slim task:completed's chat branch — chat:done already wrote the message and cleared pending; task:completed only refreshes the cross-session pending aggregate that drives the FAB. Field additions are all `omitempty` / TS `?:` so older clients ignore them and older servers (no fields populated) fall back to invalidate- only, preserving prior behavior. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: multica-agent <github@multica.ai> * test(chat): cover chat done cache handoff Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: multica-agent <github@multica.ai> Co-authored-by: Eve <eve@multica-ai.local>	2026-05-13 15:27:44 +08:00
Bohan Jiang	96695a79c5	feat(dashboard): workspace/project token + run-time dashboard MUL-1882 (#2462 ) * feat(dashboard): workspace/project token + run-time dashboard Add a `/{slug}/dashboard` page showing per-agent token spend and execution time across the whole workspace, with an optional project filter. Backend: - Three new sqlc queries against task_usage + agent_task_queue: daily usage, per-agent usage, per-agent total run-time. All optionally scoped to a project via sqlc.narg('project_id'), reaching project through the issue join. - Handlers under /api/dashboard return the same wire shape the runtime page already consumes (model preserved for client-side cost math). Frontend: - Shared DashboardPage in packages/views/dashboard reusing KpiCard, DailyCostChart, ActorAvatar, and estimateCost from the runtime page so the visual style and pricing math stay in lock-step. - Period selector (7/30/90d), project dropdown, four KPI tiles (cost, tokens, run time, tasks), daily cost chart, and a combined "cost + run time by agent" list. - Routed in both web (app/[slug]/(dashboard)/dashboard) and desktop (memory router); sidebar nav entry added under Workspace group. Co-authored-by: multica-agent <github@multica.ai> * fix(dashboard): drop stale project filter and stop double-counting tasks Two issues caught in PR #2462 review: 1. Project filter held the previous selection's UUID across workspace switches and project deletions: the dropdown gracefully showed "All projects" (because the title lookup missed) while the three dashboard queries kept forwarding the dead UUID, leaving the UI looking like a full-workspace view but populated with empty project-scoped data. Validate the picked UUID against the current projects list before passing it to the queries. 2. The "by agent" table read its task count from the token rollup, which is grouped per (agent, model). A single task that spans two models lands twice and the agent's row reads e.g. "2 tasks" when the real count is 1. 
Prefer `ListDashboardAgentRunTime`'s per-agent distinct count when available; fall back to the token aggregate only for agents with no terminal run yet (in-flight tasks). Extract the merge into `mergeAgentDashboardRows` so the precedence rules are unit-tested directly. Co-authored-by: multica-agent <github@multica.ai> * test(dashboard): allocate per-workspace issue.number explicitly TestDashboardEndpoints creates two issues in the shared fixture workspace. issue.number defaults to 0 (migration 020), and the table carries UNIQUE (workspace_id, number), so the second insert raced the first on the same default and failed in CI. Allocate MAX(number) + 1 per insert so each row gets a fresh number without stepping on rows other tests left behind in the same workspace. Co-authored-by: multica-agent <github@multica.ai> * feat(dashboard): rollup table + cron-driven aggregation for dashboard Mirror the per-runtime rollup in `task_usage_daily` (migrations 073/077/082) to remove the per-request raw aggregation the dashboard was doing. Migration 084 adds: - `task_usage_dashboard_daily` keyed on (bucket_date, workspace_id, agent_id, project_id, model) — the dimensions the dashboard actually queries, with project_id nullable via UNIQUE NULLS NOT DISTINCT (PG15+) so "no-project" buckets upsert cleanly. - `task_usage_dashboard_rollup_state` watermark table. - `task_usage_dashboard_dirty` invalidation queue. - Triggers on agent_task_queue DELETE, task_usage DELETE, and issue.project_id UPDATE — the cases the updated_at watermark can't see. The project_id trigger re-attributes existing rollup rows when a user moves an issue across projects. - `rollup_task_usage_dashboard_daily_window(from, to)` — idempotent recompute primitive (same shape as 077). - `rollup_task_usage_dashboard_daily()` cron entry — own advisory lock (4244) so it serialises independently of the runtime rollup. - `task_usage_dashboard_rollup_lag_seconds()` health helper. 
Sqlc queries `ListDashboardUsageDailyRollup` / `ListDashboardUsageByAgentRollup` read from the new table; the handler dispatches between rollup and raw on a separate `UseDailyRollupForDashboard` config flag (`USAGE_DASHBOARD_ROLLUP_ENABLED` env). Same fail-safe default (false → raw) so operators can roll out independently of the per-runtime flag. Bucket date is UTC (the dashboard aggregates across runtimes that may sit in different tzs; there's no single correct local boundary). Adds `cmd/backfill_task_usage_dashboard_daily` mirroring the existing per-runtime backfill — operator runs it once before flipping the flag. Tests: - TestDashboardEndpoints now also exercises the rollup read path (raw vs. rollup, same project-scoped totals). - TestDashboardRollupReattributesOnProjectChange verifies the issue.project_id trigger enqueues both old + new buckets and the next rollup tick zeroes the old project + populates the new one. Co-authored-by: multica-agent <github@multica.ai> * fix(dashboard-rollup): close two invalidation gaps Two leak paths missed by migration 084 review: 1. Issue cascade DELETE — the atq BEFORE DELETE trigger runs AFTER the issue row is gone, so `LEFT JOIN issue` returns NULL project_id and the original-project bucket never gets cleared (issue 077 calls this out for the runtime rollup but didn't need to act on it). Adds an `issue BEFORE DELETE` trigger that enqueues using OLD.project_id while the issue row is still readable. 2. `LinkTaskToIssue` (quick-create task attaching to a real issue post- completion) UPDATEs `agent_task_queue.issue_id` from NULL to a real id. Migration 084 only watched DELETE on atq, so usage already rolled up under the no-project bucket stayed attributed to NULL forever. Extends the atq trigger to fire on UPDATE OF issue_id too, enqueueing both OLD (NULL project) and NEW (linked issue's project). Tests: - TestDashboardRollupClearsOnIssueDelete asserts rollup row drops to zero after issue delete + rollup tick. 
- TestDashboardRollupReattributesOnLinkTaskToIssue verifies tokens move from the NULL bucket to the project bucket after the UPDATE. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-13 12:51:16 +08:00
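The task-count precedence described for `mergeAgentDashboardRows` (prefer the per-agent distinct count from the run-time query, fall back to the token aggregate for agents with no terminal run) can be sketched as below. The real helper most likely lives in the frontend; this Go version, with its hypothetical `tokenRow`, `runTimeRow`, and `agentRow` types, only illustrates the merge rule.

```go
package main

import "fmt"

// tokenRow is one row of the token rollup, grouped per (agent, model).
type tokenRow struct {
	AgentID, Model string
	Tokens         int64
	TaskCount      int
}

// runTimeRow is one per-agent row with a distinct task count.
type runTimeRow struct {
	AgentID   string
	TaskCount int
	Seconds   int64
}

type agentRow struct {
	AgentID   string
	Tokens    int64
	TaskCount int
	Seconds   int64
}

// mergeAgentRows sums tokens across models, then overwrites the task count
// with the run-time row's distinct count whenever such a row exists.
func mergeAgentRows(tokens []tokenRow, runs []runTimeRow) map[string]agentRow {
	out := map[string]agentRow{}
	for _, t := range tokens {
		r := out[t.AgentID]
		r.AgentID = t.AgentID
		r.Tokens += t.Tokens
		r.TaskCount += t.TaskCount // may double-count multi-model tasks
		out[t.AgentID] = r
	}
	for _, rt := range runs {
		r := out[rt.AgentID]
		r.AgentID = rt.AgentID
		r.TaskCount = rt.TaskCount // distinct count takes precedence
		r.Seconds = rt.Seconds
		out[rt.AgentID] = r
	}
	return out
}

func main() {
	rows := mergeAgentRows(
		[]tokenRow{{"a", "m1", 100, 1}, {"a", "m2", 50, 1}, {"b", "m1", 10, 1}},
		[]runTimeRow{{"a", 1, 30}},
	)
	fmt.Printf("%+v\n%+v\n", rows["a"], rows["b"])
}
```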
Bohan Jiang	a02e58b488	fix(github): only auto-close issue after all linked PRs resolve (#2470 ) * fix(github): only auto-close issue when all linked PRs have resolved Previously, the webhook handler unconditionally moved an issue to `done` as soon as a single linked PR was merged. If a second PR was also linked to the same issue and still open / draft, the issue would close before the work was actually finished. Add `CountOpenSiblingPullRequestsForIssue` and gate the auto-status transition on it: a merged PR advances its linked issues only when no sibling PR linked to the same issue is still in flight. Issues stay put while siblings are open or draft, and the merge that resolves the last in-flight PR is the one that closes the issue. Adds an integration test that opens two PRs against the same issue, merges the first, asserts the issue stays in_progress, then merges the second and asserts the issue advances to done. Co-authored-by: multica-agent <github@multica.ai> * fix(github): re-evaluate auto-close on closed-without-merge events too GPT-Boy review on #2470: gating only the `state == "merged"` branch left one ordering hole. PR-A merges first → issue stays in_progress because PR-B is open; PR-B later closes WITHOUT merging → no event ever re-runs the auto-close check, so the issue is stuck in_progress. Generalise the trigger to every terminal PR event (`merged` or `closed`) and advance the issue only when: - the issue is not already terminal (done / cancelled); - no sibling PR is still in flight (open / draft); - at least one linked PR — current or sibling — actually merged. Rule (3) preserves "user closed every PR without merging → leave the issue alone": if no work was delivered, the user decides what to do. Replace `CountOpenSiblingPullRequestsForIssue` with `GetSiblingPullRequestStateCountsForIssue`, which returns both the in-flight count and the merged count in a single roundtrip. 
Adds `TestWebhook_ClosedSiblingAfterMerge` (the regression GPT-Boy flagged) and `TestWebhook_AllClosedWithoutMerge` (the negative case guarding rule 3). Refactors the multi-PR webhook helper out of the existing two-merge test so all three multi-PR scenarios share it. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-12 15:39:55 +08:00
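The three-part auto-close rule from this commit can be written as a single predicate. The sketch below is illustrative: `prCounts` and `shouldAdvanceToDone` are hypothetical names, with `prCounts` standing in for whatever `GetSiblingPullRequestStateCountsForIssue` returns.

```go
package main

import "fmt"

// prCounts summarises the sibling PRs linked to the same issue.
type prCounts struct {
	InFlight int // siblings still open or draft
	Merged   int // siblings that merged
}

// shouldAdvanceToDone is evaluated on every terminal PR event (merged or
// closed). currentMerged reports whether the PR in this event merged.
func shouldAdvanceToDone(issueStatus string, currentMerged bool, siblings prCounts) bool {
	if issueStatus == "done" || issueStatus == "cancelled" {
		return false // rule 1: issue already terminal
	}
	if siblings.InFlight > 0 {
		return false // rule 2: work still in flight
	}
	// rule 3: at least one linked PR (current or sibling) actually merged
	return currentMerged || siblings.Merged > 0
}

func main() {
	// PR-A merges while PR-B is open: the issue stays put.
	fmt.Println(shouldAdvanceToDone("in_progress", true, prCounts{InFlight: 1}))
	// PR-B later closes without merging; PR-A had merged: the issue advances.
	fmt.Println(shouldAdvanceToDone("in_progress", false, prCounts{Merged: 1}))
	// Every PR closed without merging: the issue is left alone.
	fmt.Println(shouldAdvanceToDone("in_progress", false, prCounts{}))
}
```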
Bohan Jiang	caeb146bac	feat(github): GitHub App integration for PR ↔ issue linking (#1817 ) * feat(github): GitHub App backend for PR ↔ issue linking - New tables: github_installation (workspace ↔ App install), github_pull_request (mirrored PR state), issue_pull_request (M:N link). - Webhook handler verifies HMAC-SHA256, upserts PR rows, parses issue identifiers from PR title/body/branch and auto-links them. Merging a linked PR moves the issue to done. - Connect/setup endpoints power the zero-config "Connect GitHub" install flow; state token is HMAC-signed so the setup callback can recover the workspace. - Workspace-scoped admin routes for listing/disconnecting installations, plus a per-issue `pull-requests` list endpoint. Co-authored-by: multica-agent <github@multica.ai> * feat(github): UI for connecting GitHub and viewing linked PRs - Settings → Integrations: new tab with Connect GitHub / installations list / disconnect, gated on the deployment having the App configured. - Issue detail sidebar: Pull requests section showing linked PR title, repo, state (open/draft/merged/closed), and author, with deep link to GitHub. - Real-time refresh: github_installation:* and pull_request:* events invalidate the matching TanStack Query caches. Co-authored-by: multica-agent <github@multica.ai> * fix(github): address review — null actor, role gating, configured guard, scoped uninstall broadcast - listeners: use optionalUUID(e.ActorID) so the system actor on the github-driven issue:updated event no longer panics activity / notification listeners; merged-PR → issue done now produces a status_changed activity and inbox entry. - IntegrationsTab: gate the admin-only installations query on canManage so members no longer hit /github/installations 403; the configured/not-configured copy is also scoped to admins. 
- backend: introduce isGitHubConfigured() requiring both GITHUB_APP_SLUG and GITHUB_WEBHOOK_SECRET, and surface that single flag from list-installations + connect endpoints so the frontend Connect button stays disabled until both are set. - DeleteGitHubInstallationByInstallationID now RETURNs workspace_id; webhook handler publishes github_installation:deleted scoped to the right workspace so already-open Settings tabs invalidate in real time. ErrNoRows on a re-fired delete short-circuits cleanly. - tests: focused webhook integration coverage (auto-link + merge → done, cancelled preservation, uninstall returns workspace). Co-authored-by: multica-agent <github@multica.ai> * fix(github): i18n the new GitHub UI strings to satisfy lint CI flagged every literal string in the Integrations tab, the Pull requests sidebar section, and the per-PR row label. Move them through useT() and add the matching `integrations.` block to settings.json (en / zh-Hans) plus `detail.section_pull_requests` / `detail.pull_request_state_` / loading + empty copy under `issues.json`. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-12 13:49:03 +08:00
Naiyuan Qing	86aa5199fc	feat(chat): support attachments & images in chat input (#2445 ) * docs(plans): chat attachment & image support implementation plan Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: multica-agent <github@multica.ai> * feat(db): add chat_session_id/chat_message_id to attachment Co-authored-by: multica-agent <github@multica.ai> * feat(db): sqlc — chat_session_id on CreateAttachment + LinkAttachmentsToChatMessage Co-authored-by: multica-agent <github@multica.ai> * feat(file): upload-file accepts chat_session_id form field Co-authored-by: multica-agent <github@multica.ai> * feat(chat): SendChatMessage links uploaded attachments to the new message Co-authored-by: multica-agent <github@multica.ai> * feat(api): uploadFile accepts chatSessionId; sendChatMessage accepts attachmentIds Co-authored-by: multica-agent <github@multica.ai> * feat(core): useFileUpload supports chatSessionId context Co-authored-by: multica-agent <github@multica.ai> * feat(chat): support paste/drag/upload attachments in chat input Co-authored-by: multica-agent <github@multica.ai> * test(e2e): chat input attachment upload + send round-trip Co-authored-by: multica-agent <github@multica.ai> * chore(chat): keep lazy-created session title empty so untitled fallback localizes Co-authored-by: multica-agent <github@multica.ai> * fix(chat): address review — dedupe ensureSession + parse upload response - chat-window: cache in-flight createSession promise in a ref so a file drop followed by a quick send no longer spawns two sessions (and orphans the attachment on the losing one). - Attachment type + EMPTY_ATTACHMENT + AttachmentResponseSchema: include the new chat_session_id / chat_message_id fields the server now returns. - uploadFile: route the response through parseWithFallback so a malformed body returns EMPTY_ATTACHMENT instead of an undefined-keyed Attachment, matching the API boundary rule. 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: multica-agent <github@multica.ai> * fix(chat): address PR #2445 review — test ctx, send gating, attachment surface 1. Backend test was 400ing because the handler reads workspace from middleware-injected ctx, and `newRequest` only sets the header. Helper `withChatTestWorkspaceCtx` mirrors the agent-access-test pattern and loads the member row + SetMemberContext before invoking the handler. 2. Attachment metadata now flows end-to-end: - new sqlc `ListAttachmentsByChatMessageIDs` (batch lookup, mirrors the comment-side query) - `chatMessageToResponse` takes `attachments` and `ChatMessageResponse` surfaces them — same shape as CommentResponse - `ListChatMessages` loads them via a new `groupChatMessageAttachments` helper so the chat bubble can render file cards - daemon claim path pulls `ListAttachmentsByChatMessage` for the latest user message and ships `ChatMessageAttachments` to the daemon - `buildChatPrompt` lists id+filename+content_type and instructs the agent to `multica attachment download <id>` — fixes the private-CDN expiring-URL problem where the markdown URL would have expired by the time the agent acts - TS `ChatMessage` gains an optional `attachments` field 3. Chat composer now blocks send while uploads are in flight: - `pendingUploads` counter increments in handleUpload, SubmitButton uses it to disable - handleSend also gates on `editorRef.current.hasActiveUploads()` to catch the Mod+Enter path that bypasses the button - new vitest covers the "drop large file → immediate send" scenario where attachment id would otherwise be silently dropped Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: multica-agent <github@multica.ai> * chore: drop implementation plan doc Process artefact, not something the repo needs to keep. 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: multica-agent <github@multica.ai>	2026-05-12 10:57:54 +08:00
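Note on 86aa5199fc: the commit mentions a batch lookup of attachments by chat message IDs followed by a `groupChatMessageAttachments` helper. The grouping step might look roughly like the sketch below. The `attachment` type, its fields, and the function name `groupAttachmentsByMessage` are assumptions for illustration and do not reflect the actual sqlc-generated types.

```go
package main

// attachment is a hypothetical, trimmed-down row shape; the real
// sqlc-generated struct is not shown in the commit message.
type attachment struct {
	ID            string
	ChatMessageID string
	Filename      string
}

// groupAttachmentsByMessage buckets the rows returned by one batch query
// by their chat message ID, preserving the input order inside each bucket.
// This lets a list endpoint attach files to every message with a single
// query instead of one query per message.
func groupAttachmentsByMessage(rows []attachment) map[string][]attachment {
	grouped := make(map[string][]attachment)
	for _, a := range rows {
		grouped[a.ChatMessageID] = append(grouped[a.ChatMessageID], a)
	}
	return grouped
}
```

Messages without attachments simply have no key in the map, so indexing it yields a nil slice that callers can range over safely.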
Bohan Jiang	63d215e1c3	feat(runtime): visibility (public/private) gate on CreateAgent / UpdateAgent (#2419 ) * feat(runtime): visibility (public/private) gate on CreateAgent / UpdateAgent Closes the hole where a plain workspace member could pick another member's runtime in the Create Agent dialog and bind an agent to it — the backend wasn't checking runtime ownership, so the agent ran on someone else's hardware / tokens. Reported on GH #1804. Schema - Migration 083 adds agent_runtime.visibility ('private' default, 'public') with a CHECK constraint. Existing rows default to private — same ownership semantics as before, no behavior change for legacy data. Backend - canUseRuntimeForAgent predicate: allow when caller is workspace owner/admin, the runtime owner, or the runtime is public. - CreateAgent and UpdateAgent both gate on it: UpdateAgent matters because a plain member could otherwise create on their own runtime, then re-bind to a private one. - PATCH /api/runtimes/:id accepts { visibility } — owner/admin only, validated against the same private/public allow-list. Frontend - Create-agent dialog renders other-owned private runtimes disabled with a Lock badge + tooltip explaining who to ask. - Inspector runtime-picker disables the same set so re-binding fails the same way at the UI layer. - Runtime detail diagnostics gains a Visibility editor (owner/admin) or read-only chip (everyone else). - Runtime list shows a private/public chip next to the name. Tests - Go: canUseRuntimeForAgent truth table; CreateAgent / UpdateAgent end-to-end gate tests (admin / runtime owner / plain member); PATCH visibility owner / admin / member / invalid-value coverage. - Vitest: create-agent dialog disabled state on private/public runtimes, default-runtime selection skips locked rows; runtime detail visibility editor → mutation, read-only fallback. Migrating runtimes: existing rows default to private to preserve the "owner only" status quo. 
Owners switch to public via the detail page diagnostics card. Co-authored-by: multica-agent <github@multica.ai> * fix(runtime): apply timezone+visibility atomically; don't seed locked template runtime Two issues surfaced in review of MUL-2062: 1. PATCH /api/runtimes/:id ran the timezone branch first, which: - returned early on a tz no-op, silently dropping a concurrent `visibility` patch in the same body; - committed the timezone mutation (+ usage rollup rebuild) before validating visibility, so an invalid visibility left the row half-updated. Validate every field first, then run the mutations in order. The no-op short-circuit now only triggers when nothing else is requested. 2. The Create Agent dialog in duplicate mode unconditionally seeded `template.runtime_id` as the selected runtime, even when that runtime is now private and owned by someone else — the user saw a selected row they couldn't submit (Create → backend 403). Fall back to the first usable runtime when the template's runtime is locked, and gate the Create button on `selectedRuntimeLocked` as defense in depth. Tests: - Go: TestUpdateAgentRuntime_CombinedPatchAppliesBoth (tz no-op + visibility flip), TestUpdateAgentRuntime_InvalidVisibilityDoesNotMutateTimezone (atomic-fail invariant). - Vitest: duplicate template pointing at a locked runtime now seeds the first usable one; Create button stays disabled when no usable alternative exists. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-11 22:53:07 +08:00
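Note on 63d215e1c3: the rule described for `canUseRuntimeForAgent` (allow workspace owner/admin, the runtime owner, or any caller when the runtime is public) can be sketched as a pure predicate. The parameter list, the `agentRuntime` type, and the role strings below are assumptions; only the function name and the rule itself come from the commit message.

```go
package main

// agentRuntime is a hypothetical, minimal view of an agent_runtime row.
type agentRuntime struct {
	OwnerID    string
	Visibility string // "private" (default) or "public", per the migration
}

// canUseRuntimeForAgent mirrors the rule stated in the commit message.
// The signature and the role values ("owner", "admin", "member") are
// assumptions made for this sketch.
func canUseRuntimeForAgent(callerID, callerRole string, rt agentRuntime) bool {
	switch {
	case callerRole == "owner" || callerRole == "admin":
		return true // workspace owner/admin may bind to any runtime
	case rt.OwnerID == callerID:
		return true // the runtime's owner may always use it
	default:
		return rt.Visibility == "public" // everyone else needs a public runtime
	}
}
```

Because the predicate is side-effect free, the same function can gate both CreateAgent and UpdateAgent, which is how the commit closes the re-bind loophole.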
Kagura	702c48209b	fix(agent): stop filtering Pi extension tools via hardcoded --tools allowlist (#2379) (#2381) The Pi backend hardcoded `--tools read,bash,edit,write,grep,find,ls` in buildPiArgs. Pi's SDK treats --tools as a restrictive allowlist: only the listed tools pass through `_refreshToolRegistry()`, silently filtering out any user-installed extension tools registered via `pi.registerTool()`. Omitting --tools makes Pi's `allowedToolNames` undefined, so the `isAllowedTool()` filter becomes a no-op and all tools, built-in and extension, are available. This matches Pi's standalone behavior. Users who want to restrict tools can still pass --tools via custom_args (it is not in piBlockedArgs). Closes #2379	2026-05-11 16:11:32 +08:00
Bohan Jiang	fae8558263	fix(daemon): self-heal when a runtime is deleted server-side (#2404) Closes #2391.	2026-05-11 16:09:40 +08:00


292 Commits