multica

mirror of https://github.com/multica-ai/multica.git synced 2026-07-05 21:39:54 +02:00

Author	SHA1	Message	Date
AdamQQQ	fab0671332	feat(skills): support multi-select bulk import in Copy from runtime (#2686) - Multi-select UI for batch importing skills from a local runtime - Server batch-dispatches up to 10 import requests per heartbeat cycle - WS heartbeat now reads supports_batch_import from daemon payload instead of hardcoding true, so old daemons correctly fall back to one-at-a-time dispatch - Raised server pending timeout to 3min and client poll timeout to 4min to accommodate daemons that pop only one import per 15s heartbeat Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-18 16:56:27 +08:00
Jiayuan Zhang	46c1e2c889	feat(squads): show member working status on squad detail page (#2768 ) * feat(squads): show member working status on squad detail page Add a new GET /api/squads/{id}/members/status endpoint that returns each member's derived working/idle/offline/unstable status, the issues each agent is currently running, and the last observed activity timestamp. The Squad detail page's Members tab consumes this snapshot to render a status pill and an active-issue link next to each agent, with live refresh wired through the existing task/agent/daemon WS events. Human members are returned with status=null so the UI can keep them in the same list without implying a presence signal. Archived agents stay in the response and surface as offline rather than being filtered out. Co-authored-by: multica-agent <github@multica.ai> * fix(squads): address review feedback on member status endpoint - i18n the "blocked" issue-status pill in squad members tab (was a bare literal that failed `i18next/no-literal-string` lint). - Treat any dispatched/running task as working, even when its `agent_task_queue.issue_id` is NULL (chat / quick-create tasks). The agent slot is occupied regardless of whether we can render an issue link. - Force `offline` for archived agents so they appear in the list but never look like they're still on duty, matching the RFC decision in MUL-2319. - Include `workspaceKeys.squads` in the post-reconnect / workspace-switch bulk invalidation so members-status recovers after a disconnect during which task/runtime events were missed. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-18 10:35:18 +02:00
Multica Eve	dfe2a57361	fix(autopilots): allow duplicate create_issue runs (#2789) Co-authored-by: Eve <eve@multica-ai.local> Co-authored-by: multica-agent <github@multica.ai>	2026-05-18 16:05:54 +08:00
LinYushen	6621231237	fix: improve search ranking and snippet support (MUL-2329) Fixes MUL-2329	2026-05-18 15:45:06 +08:00
Bohan Jiang	433cd1aaf5	fix(codex): bump default exec_command stuck timeout to 3 minutes (#2786) The watchdog fires on a "no progress" window, so the default mainly matters for commands that go fully silent (no outputDelta). Bumping from 2m → 3m leaves more headroom for legitimately slow silent commands before treating them as a dropped function_call_output, at a modest cost to recovery latency. MUL-2337 Co-authored-by: multica-agent <github@multica.ai>	2026-05-18 15:30:05 +08:00
Bohan Jiang	60bae62622	feat(codex): add per-exec_command watchdog to escape dropped function_call_output (MUL-2337) (#2779 ) * feat(codex): add per-exec_command watchdog to escape dropped function_call_output (MUL-2337) Codex app-server can drop the second function_call_output when two exec_command calls fan out in the same turn and both async-yield through the yield_time_ms boundary (observed 2026-05-18, MUL-2334 — Trump Agent wedged for 6+ min with no semantic activity events to drive any existing timer). The model then waits forever for the missing output; only the 10-minute semantic inactivity timeout would eventually rescue the run. Add a per-call watchdog in the codex client that tracks open exec_command / commandExecution items by call_id and fails the turn quickly (default 2 min, configurable via ExecOptions.ExecCommandStuckTimeout) when one stays open without progress. outputDelta events reset the per-call progress timestamp so long-running streaming commands aren't flagged. This is a daemon-side mitigation only — codex itself still has the upstream race, but the daemon no longer burns the full inactivity budget before the run is marked failed and a new run can recover. Co-authored-by: multica-agent <github@multica.ai> * feat(codex): track legacy exec_command_output_delta in watchdog (MUL-2337) Mirrors the raw v2 item/commandExecution/outputDelta refresh on the legacy codex/event protocol so a long-running streaming exec doesn't get falsely flagged as stuck after begin + 2 min. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-18 15:14:45 +08:00
Bohan Jiang	2323b72710	feat(autopilots): webhook delivery layer + idempotency/signature/replay (MUL-2334) [PR1] (#2774 ) * feat(autopilots): webhook delivery layer + idempotency / signature / replay (MUL-2334) Splits "inbound webhook receipt" from "autopilot run creation" so we can record duplicate attempts, signature outcomes, and ignored/skipped deliveries — and replay a delivery on demand. v1 ingress wrote straight into autopilot_run.trigger_payload, which collapsed the two concerns and left run_only autopilots vulnerable to provider retry storms. Backend only (PR1). UI Deliveries tab follows in PR2. Schema (migration 093): - autopilot_trigger.provider: 'generic' \| 'github' (default 'generic'). - autopilot_trigger.signing_secret: nullable plaintext (HMAC needs it cleartext; mirrors how webhook_token is stored). - webhook_delivery: one row per inbound POST. Carries raw_body, selected_headers, dedupe_key/source, signature_status, autopilot_run_id, replayed_from_delivery_id, response_status / body. - Partial unique index on (trigger_id, dedupe_key) excludes NULL and 'rejected' rows, so a wrong-secret 401 does NOT permanently block a future retry with the same X-GitHub-Delivery once the operator fixes the secret. Ingress flow (autopilot_webhook.go), persist-first + sync dispatch: 1. IP rate limit -> 2. token lookup -> 3. token rate limit -> 4. read raw body -> 5. autopilot/workspace cross-check -> 6. normalize JSON (400 without persistence on parse failure) -> 7. compute dedupe key + signature status -> 8. INSERT delivery (status=queued). On (trigger_id, dedupe_key) unique-violation: bump attempt_count on existing row and return the original delivery_id + autopilot_run_id with 200 -> 9. invalid/missing signature: UPDATE -> rejected, return 401 with delivery_id (no dispatch, not replayable) -> 10. trigger disabled / autopilot paused/archived: UPDATE -> ignored, return 200 -> 11. 
DispatchAutopilot synchronously, UPDATE -> dispatched/skipped/failed with autopilot_run_id and the response body we returned -> 12. TouchAutopilotTriggerFiredAt and return 200. No new long-running worker. A stale 'queued' row only happens if the process dies between INSERT and UPDATE; that's a follow-up sweeper, not this PR. Authenticated API: - GET /api/autopilots/{id}/deliveries (slim list) - GET /api/autopilots/{id}/deliveries/{deliveryId} (with raw_body) - POST /api/autopilots/{id}/deliveries/{deliveryId}/replay -> creates a new delivery row (replayed_from_delivery_id set), dispatches a new run, never collapses onto the original via dedupe. - PUT /api/autopilots/{id}/triggers/{triggerId}/signing-secret Write-only; trigger response surfaces has_signing_secret + signing_secret_hint (last 4 chars), never the secret itself. Signature verification reuses the GitHub-compatible X-Hub-Signature-256: sha256=<hex(hmac(body, secret))> scheme; the HMAC helper is constant-time. Invalid/missing signatures still count against per-IP and per-token rate limits. autopilot_run.trigger_payload is intentionally preserved — delivery records the HTTP receipt; run records the normalized envelope handed to the agent. They are two different views. 
Tests (Postgres-backed): - delivery persistence on accept - dedupe via Idempotency-Key and X-GitHub-Delivery; run_only retry storm pin (3 retries -> 1 run) - invalid signature: 401 + rejected row + no run linkage - missing signature when secret configured: 401 + 'missing' state - valid signature dispatches - signing secret never echoed in trigger responses; hint shows last 4 - min-length and clear-by-empty for signing secret PUT - replay creates a NEW delivery + new run; rejected deliveries cannot be replayed - list omits raw_body; detail includes it; cross-autopilot ID returns 404 (workspace isolation defense in depth) - provider validation: unknown -> 400, github -> 201 round-trips - bad-signature stream still counts against per-token rate limit Co-authored-by: multica-agent <github@multica.ai> * fix(autopilots): address PR review on webhook delivery layer (MUL-2334) - Exclude `failed` from the (trigger_id, dedupe_key) partial unique index alongside `rejected`, so a transient ingress failure does not strand the provider's stable X-GitHub-Delivery / Idempotency-Key retry. Update the dedupe lookup to prefer non-terminal rows under the same predicate. - Tighten delivery status enum: drop `skipped` from the CHECK constraint and from the handler. A run that was admission-skipped (e.g. runtime offline) is now recorded as delivery=`dispatched` linked to the skipped run, with the response payload carrying status=`skipped`. Source of truth for skipped-ness is autopilot_run.status, not the delivery row — keeps the Deliveries UI enum unambiguous. - On dispatch error, link the (possibly non-nil) autopilot_run returned by DispatchAutopilot to the failed delivery so Deliveries UI can navigate to the run row for debugging. - Slim list projection: ListWebhookDeliveriesByAutopilot no longer pulls raw_body / selected_headers / response_body — a 100-row page × 256 KiB would otherwise round-trip ~25 MiB from Postgres per Deliveries reload. 
Detail endpoint continues to return the full row. - Fix backend CI: TestGetDelivery_ReturnsFullPayload now decodes the response and asserts on the parsed raw_body instead of substring- matching against an escaped JSON string; raise the test-suite default webhook rate limits in TestMain so the shared 192.0.2.1 IP bucket doesn't fill across the suite and leak 429s into unrelated tests. - Add regression coverage for the dedupe-after-failure path. cd server && go test ./... is green locally. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-18 14:59:40 +08:00
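The signature check named in the commit above (`X-Hub-Signature-256: sha256=<hex(hmac(body, secret))>`, compared in constant time) could look roughly like the sketch below. The function names and the `"valid"` / `"invalid"` status strings are assumptions for illustration; the commit message only confirms the header scheme, the `missing` state, and the constant-time comparison.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

const sigPrefix = "sha256="

// signBody builds the header value "sha256=<hex(hmac(body, secret))>".
func signBody(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return sigPrefix + hex.EncodeToString(mac.Sum(nil))
}

// verifySignature classifies a header against the raw request body.
// The returned status strings are hypothetical labels, except "missing",
// which the commit message mentions.
func verifySignature(secret, body []byte, header string) string {
	if header == "" {
		return "missing"
	}
	if !strings.HasPrefix(header, sigPrefix) {
		return "invalid"
	}
	got, err := hex.DecodeString(strings.TrimPrefix(header, sigPrefix))
	if err != nil {
		return "invalid"
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	// hmac.Equal compares in constant time to avoid leaking timing information.
	if !hmac.Equal(got, mac.Sum(nil)) {
		return "invalid"
	}
	return "valid"
}

func main() {
	secret := []byte("example-signing-secret")
	body := []byte(`{"event":"ping"}`)
	header := signBody(secret, body)
	fmt.Println(verifySignature(secret, body, header))
	fmt.Println(verifySignature([]byte("wrong-secret"), body, header))
	fmt.Println(verifySignature(secret, body, ""))
}
```

The check has to run over the raw bytes as received, which is presumably why the ingress flow reads the raw body before normalizing the JSON.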
Zohar Babin	15152c6ccd	feat(auth): cache workspace membership for daemon heartbeat path (MUL-2247) (#2638 ) * feat(auth): cache workspace membership for daemon heartbeat path Cache workspace membership existence (not role) in Redis to eliminate a DB round-trip on every PAT-authenticated daemon heartbeat. Follows the existing PATCache nil-safe pattern. Key design decisions per reviewer feedback: - Cache existence only (sentinel "1"), not role string. Authorization decisions that depend on role always hit the DB directly. This eliminates the cache-aside race where a stale elevated role could persist after a downgrade. - Proactive invalidation on UpdateMember, DeleteMember, LeaveWorkspace, and DeleteWorkspace (iterates members before cascade delete). - 5 min TTL. Combined with PATCache (10 min), worst-case revocation delay is max(10m, 5m) = 10 min — consistent with original PATCache design decision. Limitations: - Non-members still hit DB on every request (negative caching not implemented — the scenario is rare for daemon endpoints which require valid workspace-scoped tokens). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: multica-agent <github@multica.ai> * test(auth): drive membership cache invalidation through real handlers - TestRequireDaemonWorkspaceAccess_CacheHit now uses a ghost user with no member row, so the only path to a granted access is the cache short-circuit. Without priming the cache the access check must fail; with priming it must succeed. A future change that bypasses the cache would fail the second assertion. - Replaces the cache-only InvalidatedOnMemberRemoval test (which only re-exercised the auth-package primitive) with four handler-driven tests that exercise DeleteMember, UpdateMember, LeaveWorkspace and DeleteWorkspace via their real HTTP handlers. Each test prepares a real member, primes the cache, calls the handler, and asserts the cache entry is gone — so a refactor that drops one of the Invalidate(...) 
calls in workspace.go will fail CI. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: multica-agent <github@multica.ai> Co-authored-by: Jiang Bohan <bhjiang@outlook.com>	2026-05-18 13:30:35 +08:00
Zohar Babin	e50bfc88da	fix(auth): add per-IP rate limiting on public auth endpoints (#2636 ) Adds a Redis-backed fixed-window rate limiter middleware on /auth/send-code, /auth/verify-code, and /auth/google. Prevents brute-force enumeration, verification_code table flooding, and connection pool exhaustion from rapid-fire unauthenticated requests. Key design decisions per reviewer feedback: - X-Forwarded-For trust model: XFF is NEVER trusted by default. Only honored when RemoteAddr is from a CIDR in RATE_LIMIT_TRUSTED_PROXIES. Uses rightmost-untrusted algorithm (walks XFF right-to-left, returns first non-trusted IP). Matches the project's conservative model in health_realtime.go. - Atomic INCR+EXPIRE via Lua script: prevents a stuck key (permanent ban) if EXPIRE fails independently. Follows existing Lua script pattern in runtime_local_skills_redis_store.go. - Fixed-window counter (not sliding-window): simple, adequate for auth rate limiting where precision at window boundaries is acceptable. - Fail-open with startup warning: nil Redis disables rate limiting (same as PATCache), but logs a warning at startup so ops can see. - IPv6 normalization: net.ParseIP().String() produces canonical form. - Configurable via env vars: RATE_LIMIT_AUTH (default 5/min), RATE_LIMIT_AUTH_VERIFY (default 20/min), RATE_LIMIT_TRUSTED_PROXIES. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-05-18 12:59:28 +08:00
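The rightmost-untrusted X-Forwarded-For rule from the commit above can be sketched as below. This is an illustration under assumptions: `clientIP`, `isTrusted`, and `parsePrefixes` are hypothetical helpers, and the sketch uses `net/netip` for parsing, whereas the commit message mentions `net.ParseIP().String()` for canonicalization. Only the trust model itself (XFF ignored unless RemoteAddr is inside a trusted CIDR, then walk XFF right to left and return the first non-trusted IP) is taken from the source.

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
	"strings"
)

// parsePrefixes parses a comma-separated CIDR list, skipping invalid entries.
func parsePrefixes(csv string) []netip.Prefix {
	var out []netip.Prefix
	for _, s := range strings.Split(csv, ",") {
		if p, err := netip.ParsePrefix(strings.TrimSpace(s)); err == nil {
			out = append(out, p)
		}
	}
	return out
}

func isTrusted(addr netip.Addr, trusted []netip.Prefix) bool {
	for _, p := range trusted {
		if p.Contains(addr) {
			return true
		}
	}
	return false
}

// clientIP returns the address to key the rate limiter on.
func clientIP(remoteAddr, xff string, trusted []netip.Prefix) string {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		host = remoteAddr // no port present
	}
	peer, err := netip.ParseAddr(host)
	if err != nil {
		return host
	}
	peer = peer.Unmap() // treat ::ffff:a.b.c.d as plain IPv4
	if xff == "" || !isTrusted(peer, trusted) {
		return peer.String() // headers are ignored for untrusted peers
	}
	parts := strings.Split(xff, ",")
	for i := len(parts) - 1; i >= 0; i-- {
		addr, err := netip.ParseAddr(strings.TrimSpace(parts[i]))
		if err != nil {
			break // malformed entry: stop and fall back to the peer address
		}
		addr = addr.Unmap()
		if !isTrusted(addr, trusted) {
			return addr.String()
		}
	}
	return peer.String()
}

func main() {
	trusted := parsePrefixes("10.0.0.0/8")
	// Direct caller with a forged header: the header is ignored.
	fmt.Println(clientIP("203.0.113.7:4242", "1.2.3.4", trusted))
	// Behind a trusted proxy: the forged leftmost entry is skipped.
	fmt.Println(clientIP("10.0.0.5:4242", "6.6.6.6, 198.51.100.9, 10.0.0.6", trusted))
}
```

Walking from the right matters because entries to the left of the first untrusted hop are client-controlled and can be forged freely.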
Multica Eve	e8fb0efe3d	MUL-2324 conditionally inject non-core rule blocks (#2771) * feat(runtime): conditionally inject non-core rule blocks Co-authored-by: multica-agent <github@multica.ai> * fix(runtime): tighten mention rule triggers Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: Eve <eve@multica-ai.local> Co-authored-by: multica-agent <github@multica.ai>	2026-05-18 12:52:54 +08:00
Multica Eve	58a76f6d96	fix(execenv): trim default runtime brief command list (MUL-2322) (#2769) Trim the default runtime brief Available Commands to the agreed core set, including issue create/update, while keeping non-core commands discoverable through help. CI passed for backend and frontend.	2026-05-18 12:25:37 +08:00
Kerim Incedayi	9418d2a2c1	feat(autopilots): webhook triggers (server + CLI + UI + docs) MUL-2049 (#2348 ) * feat(server): add webhook trigger DB migration + sqlc queries Lays the foundation for webhook autopilot triggers: - partial unique index on autopilot_trigger.webhook_token (kind=webhook only) so the public ingress route can resolve a trigger in O(1) - GetWebhookTriggerByToken / TouchAutopilotTriggerFiredAt / RotateAutopilotTriggerWebhookToken / SetAutopilotTriggerWebhookToken queries, regenerated with sqlc * feat(server): webhook token generator + payload normalizer Two pure helpers for the webhook autopilot work: - generateWebhookToken: 32 random bytes -> base64-url, "awt_" prefix. 256 bits of entropy keeps brute-force off the table; the prefix makes leaked tokens recognisable in logs. - normalizeWebhookPayload: turns arbitrary JSON into the WebhookEnvelope shape (event/eventPayload/request) used by trigger_payload. Header- and body-based event inference covers GitHub, GitLab, X-Event-Type, and caller-provided envelopes; scalar/empty/invalid bodies are rejected so the handler can answer 400. * feat(server): generate webhook tokens and expose rotate endpoint - New handler.Config.PublicURL fed by MULTICA_PUBLIC_URL env so /api/autopilots/.../triggers responses can include an absolute webhook_url alongside the always-present webhook_path. - CreateAutopilotTrigger now mints a webhook_token via crypto/rand for kind=webhook and ignores cron/timezone for non-schedule kinds. api triggers stay accepted-but-inert per PLAN.md. - New POST /api/autopilots/{id}/triggers/{triggerId}/rotate-webhook-token protected by the existing workspace auth group; old tokens stop working immediately because the unique-index lookup keys on the current row value. * feat(server): public webhook ingress route + per-token rate limiter - New POST /api/webhooks/autopilots/{token} route, mounted outside the authenticated group: the path token is the credential. 
Workspace context is derived from the joined autopilot row, never headers. - Body capped at 256 KiB via http.MaxBytesReader; oversized payloads return 413 mid-read instead of being fully buffered. - Disabled triggers / paused / archived autopilots return 200 {"status":"ignored"} so providers stop retrying. - Skipped-runtime dispatches surface 200 {"status":"skipped"} with the reason from the autopilot service's pre-flight admission check. - WebhookRateLimiter interface with sliding-window in-memory + Redis Lua-script implementations. Default 60 req/min per token. Test coverage on the in-memory path; Redis variant fails open on cache errors so a Redis hiccup never blocks ingress. - Integration tests exercise token generation, dispatch, payload envelope persistence, GitHub-header inference, paused/disabled short-circuits, oversized rejection, and rotate-then-old-token-404. * feat(server): include webhook payload in create_issue description When an autopilot run is triggered by a webhook and execution_mode is create_issue, the agent only sees the issue body — never the run's trigger_payload. Append a 'Webhook event:' line and a fenced JSON block with the normalized eventPayload so the agent has the inbound context inline. Schedule / manual runs are unchanged. Tests cover: - schedule path keeps existing italic note, no webhook block - webhook path emits event line + payload block, italic before block - non-envelope JSON falls back to raw body (defensive) - non-webhook source with payload still gets no webhook block * feat(core): types, API client and mutations for webhook triggers - AutopilotRunStatus gains 'skipped' so the run-list UI handles the admission-skipped state explicitly instead of falling through to a generic case (the backend already emits it via MUL-1899). - AutopilotTrigger picks up optional webhook_path / webhook_url. Both are optional so older self-hosted servers that pre-date this change still parse cleanly. 
- buildAutopilotWebhookUrl helper composes a usable absolute URL with the priority webhook_url > apiBaseUrl + path > origin + path > path. Tested with seven cases covering each branch. - ApiClient.rotateAutopilotTriggerWebhookToken posts to /api/autopilots/{id}/triggers/{triggerId}/rotate-webhook-token; the HTTP-contract test pins URL + method. - useRotateAutopilotTriggerWebhookToken mutation invalidates autopilotKeys.detail on settle, mirroring the existing trigger-mutation pattern. * feat(views): webhook trigger UI in Add Trigger dialog and trigger row Add Trigger dialog gains a Schedule/Webhook segmented toggle: - Schedule reuses TriggerConfigSection unchanged. - Webhook hides the cron config and shows a help line; the trigger is created with kind=webhook and the URL is generated server-side. - Toast text differentiates schedule vs webhook on success. TriggerRow grows a webhook branch: - Webhook icon, kind translated via trigger_kind. - URL shown in a truncating monospace pill, with copy + rotate buttons. Copy uses navigator.clipboard with toast feedback; rotate uses an AlertDialog confirm because the old URL stops working immediately. - api triggers render a Deprecated badge and skip URL/copy/rotate affordances. RunRow gains a 'skipped' RUN_VISUAL entry (muted dash) so admission- skipped runs don't fall through to a generic case. Source label uses the new run_source i18n key instead of capitalize. Locales: en + zh-Hans gain run_status.skipped, run_source., trigger_kind., trigger_row.{copy_url,rotate_url,_confirm_,toast_}, add_trigger_dialog.{type_,webhook_help,toast_added_{schedule,webhook}}. * feat(cli): support webhook trigger creation and URL rotation - multica autopilot trigger-add now takes --kind schedule\|webhook (default schedule for backward compatibility). For webhook it skips --cron / --timezone validation and prints the resulting webhook URL, preferring the server-provided webhook_url and falling back to client.BaseURL + webhook_path. 
- New multica autopilot trigger-rotate-url <autopilot-id> <trigger-id> command for rotating the bearer URL of a webhook trigger. * docs(autopilots): add webhook trigger guide (en + zh) Replaces the 'Webhook and API triggers are not available yet' section with end-to-end webhook documentation: how the URL is generated, what payload shapes are accepted, the inferred-event rules, the bearer-secret warning + rotate flow, status-code semantics for accepted/skipped/ ignored/4xx/5xx outcomes, and the MULTICA_PUBLIC_URL self-host configuration. Run history list now mentions skipped status. The 'unavailable features' section narrows to api-kind triggers, HMAC signing, IP allowlists, and provider presets. * feat(views): add Schedule/Webhook toggle to the create autopilot dialog Closes the gap where a brand-new autopilot could only be created with a schedule trigger. The right-column config now has a Trigger section with a segmented Schedule/Webhook control: - Schedule keeps the existing cron/timezone UI. - Webhook hides the cron UI and shows a help line; on submit, a kind=webhook trigger is created right after the autopilot. In edit mode the toggle is intentionally hidden (PLAN.md treats trigger- type changes as delete-old + create-new, not in-place updates), but the panel still picks the right kind based on props.triggers[0].kind so a webhook autopilot doesn't render an irrelevant cron form. Locales: section_trigger_kind, trigger_kind_{schedule,webhook}, section_webhook, webhook_help_{create,edit} added in en + zh-Hans. * feat(views): show webhook URL inline after creating a webhook autopilot After a successful create with kind=webhook, the dialog stays open and swaps to a confirmation panel showing the freshly minted URL with a copy button + 'Treat this URL like a password' warning + Done button. Avoids the friction of "create the autopilot, then go find it in the list, click in, scroll to triggers, copy URL." 
Locales: dialog.webhook_created_{title,description,warning,done} added in en + zh-Hans. Schedule create flow is unchanged (toast + close). The success panel is gated on the trigger returned from the create mutation, so a partial failure (autopilot created, trigger creation errored) still falls through to the toast_create_partial path. * feat(views): show webhook payload in run detail dialog The agent transcript dialog now accepts an optional headerSlot that sits above the event list. The autopilot RunRow drops a WebhookPayloadPreview into that slot when the run came from a webhook and trigger_payload is non-empty. The preview is collapsed by default (the transcript itself is the main event), shows the inferred event name + receivedAt in the header, and reveals the eventPayload as pretty-printed JSON with a copy button on expand. Falls back gracefully if the row's trigger_payload doesn't match the WebhookEnvelope shape — the whole value is shown instead so nothing is hidden. Closes the "agent didn't echo the payload, now I can't see what triggered the run" gap. PLAN.md tracked this as "Payload preview in run history" under follow-ups. Locales: webhook_payload.{label, unknown_event, payload, content_type, copy, copied, copied_short, copy_failed} added in en + zh-Hans. * chore(server): wire MULTICA_PUBLIC_URL through self-host compose Two small follow-ups split out of the webhook trigger PR: - docker-compose.selfhost.yml passes MULTICA_PUBLIC_URL into the backend container so a self-hosted deployment behind a real domain gets absolute webhook URLs in the trigger response. Documented in .env.example with the rationale for not deriving the public host from request headers. - Drop a duplicated 'invalid json:' prefix in the webhook ingress 400 error path. normalizeWebhookPayload already prefixes its errors, so the handler doesn't need to re-prefix. 
* fix(migrations): renumber webhook trigger migration 081 → 089 to avoid collision The branch's 081_autopilot_webhook_triggers.{up,down}.sql collided numerically with 081_runtime_timezone.{up,down}.sql that landed on main, making migration apply order undefined. Renumber to 089 so the file slots after the latest main migration (088_squad_instructions). The SQL itself doesn't conflict — it only creates a partial unique index on autopilot_trigger.webhook_token — but the duplicate prefix is what the migration runner sees, so the filename must move. * fix(autopilot-webhook): address PR review blocking issues - Redact bearer tokens from request logs: paths matching /api/webhooks/autopilots/<token> now log "[redacted]" instead of the token. The resolved trigger ID is plumbed via context so audit lines stay useful for debugging. (Review item Blocking #1.) - Distinguish pgx.ErrNoRows from transient DB errors in token lookup: no-row stays 404 (so providers don't retry on a deleted webhook), other errors return 500 (which providers DO retry, avoiding silent drops on DB blips). (Review item Blocking #2.) - Add per-IP sliding-window rate limiter that runs BEFORE the token lookup, so spraying random tokens can no longer probe the autopilot_trigger index unboundedly. Reuses the existing Lua script with a separate Redis key namespace; falls open on Redis errors. Default budget 30 req/min/IP. (Review item Blocking #3.) The webhook handler now applies the gates in the order: per-IP rate limit → token lookup → per-token rate limit → handler logic. * fix(autopilot): atomic webhook trigger creation + strict kind/timezone validation - Mint the webhook bearer token BEFORE the INSERT and pass it via CreateAutopilotTriggerParams so the row never exists in a half-written kind=webhook + webhook_token=NULL state. On the (vanishingly rare) unique-index collision the whole INSERT is retried with a fresh token — no UPDATE second step. Removes the now-dead attachFreshWebhookToken helper. 
(Review item Recommended #4.) - Add new GET /api/autopilots/{id}/runs/{runId} endpoint that returns a single run including the full trigger_payload. The list response is now slim (omits trigger_payload) so worst-case payload size drops from ~5 MB to ~5 KB. (Review item Recommended #5, server side.) - Reject kind=api with 400 ("kind=api is deprecated; use schedule or webhook") and reject kind=webhook with --timezone with 400 — both surfaces stragglers loudly instead of silently dropping fields. CLI mirrors the check so --timezone with --kind webhook errors client-side. (Review nits.) - Add --yes (-y) flag and an interactive y/N confirmation prompt to `multica autopilot trigger-rotate-url` so the destructive rotate matches the UI's AlertDialog safety. (Review item Recommended #6.) * fix(views): fetch webhook payload on-demand and truncate at 4 KiB - Add useAutopilotRun query hook + getAutopilotRun API client method paired with the new server endpoint. The run-detail dialog now mounts a WebhookPayloadSlot that fetches the full run (incl. trigger_payload) lazily — list responses no longer carry up to 256 KiB × N runs of envelope data. - WebhookPayloadPreview truncates its in-DOM <pre> at 4 KiB with a localized marker so jank-y machines aren't asked to render a 256 KiB JSON blob. The Copy button still yields the full string. - Adds the truncated_marker i18n string to en + zh-Hans. Review items Recommended #5 (frontend) and a nit on the preview's unbounded <pre>. * test(autopilot-webhook): close coverage gaps flagged in PR review - request_logger: redactWebhookPath unit tests + integration test proving the bearer token never lands in slog output, plus the webhook_trigger_id context plumbing. - autopilot_webhook_handler: empty body → 400, archived autopilot → 200 ignored, per-IP rate limiter trips before DB lookup, kind=api and webhook+timezone are rejected at 400, slim list + full detail endpoint round-trip. 
- webhook_rate_limiter: Lua script structure guard (catches reordering even without a live Redis), plus live-Redis tests for both per-token and per-IP limiters (REDIS_TEST_URL gated, matching the existing Redis test pattern in the package). - WebhookPayloadPreview: envelope rendering, fallback shape, and the >4 KiB truncation path with full-payload-on-Copy guarantee. Two branches are documented as code-review-protected rather than covered by tests: the 500-on-DB-error path requires injecting a stub Queries (no interface here), and the cross-workspace defense-in-depth check is unreachable from valid SQL state. * fix(middleware): SetWebhookTriggerID must mutate request in place The round-1 helper returned a fresh http.Request from WithContext, and the webhook handler did `r = SetWebhookTriggerID(r, ...)`. That swaps the handler's local pointer but doesn't propagate the new context back to RequestLogger, which is still holding the original http.Request — so the audit line never actually included webhook_trigger_id in production. The round-1 test happened to pass because it pre-stashed the value on the request before calling ServeHTTP, bypassing the bug it was meant to verify. Switch to in-place mutation via `r = r.WithContext(...)` so the wrapping middleware sees the new context after next.ServeHTTP returns, and update the test to exercise the real call pattern (set the context from inside the handler, assert the surrounding logger reads it). Verified live: an accepted webhook now logs path=/api/webhooks/autopilots/[redacted] webhook_trigger_id=<uuid> * fix(autopilot-webhook): symmetric ErrNoRows split + trusted-proxy gate Round-2 review (Bohan-J, PR #2348 follow-up): - Must-fix #1: the second lookup at autopilot_webhook.go:258 (GetAutopilot after the token resolves) was folding every error into 404. A transient DB blip would tell a webhook sender "not found" and it would never retry. 
Apply the same errors.Is(err, pgx.ErrNoRows) → 404 / else → 500 split as the first lookup got in round 1. - Must-fix #2: clientIPForRateLimit was honoring X-Forwarded-For / X-Real-IP from any caller. An attacker spraying random tokens could just rotate the XFF header and the per-IP bucket became per-request, so the limiter that's specifically supposed to gate spraying before it hits the DB unique index was bypassed. New shape — matches Bohan's suggestion exactly: * Default: r.RemoteAddr only, headers ignored. * Operator opt-in via MULTICA_TRUSTED_PROXIES (comma-separated CIDRs). XFF/X-Real-IP are honored only when r.RemoteAddr is inside one of the listed prefixes; otherwise they're dropped. Wired through .env.example and docker-compose.selfhost.yml so self-host operators can configure their reverse-proxy's CIDR. Invalid CIDRs in the env var are dropped with a single slog.Warn at startup rather than crashing the server. Uses net/netip (stdlib, value-typed) for parsing and containment checks. Verified live on the rebuilt self-host backend: a 35-request spray from one source with rotating XFF gets the expected 30× 404 + 5× 429, proving the per-IP bucket is keyed on the real connection IP. * fix(autopilot): reject cron/timezone PATCH on non-schedule triggers Round-2 review should-fix. CreateAutopilotTrigger already 400s on kind=webhook + timezone/cron_expression, but UpdateAutopilotTrigger silently wrote those fields regardless of prev.Kind. The values then sat in the DB visible to nobody and read by nothing — a back door that left the API contract fuzzy across create vs update. Mirror the create-path discipline: after loading prev, if prev.Kind != "schedule" and the PATCH body sets cron_expression or timezone, return 400 with a clear message. enabled and label remain accepted on every kind. 
The existing prev.Kind == "schedule" guard on next_run_at recompute stays as belt-and-braces, but with this gate in place the recompute branch is now reachable only for the kind it was meant for. * test(autopilot-webhook): close round-2 coverage gaps - IPRateLimitNotBypassedByXFFSpoof: drives the must-fix #2 invariant by rotating XFF across three calls from the same RemoteAddr and asserting the third gets 429. Pre-round-2 this test would have passed for the wrong reason (limiter trusted XFF, so per-bucket collision was incidental); now it pins the bypass-closed property. - IPRateLimitReturns429BeforeDBLookup: updated to set RemoteAddr explicitly and drop the XFF header it was leaning on. With TrustedProxies empty (test default) the limiter keys on the real connection IP, which is what the test wants to assert anyway. - UpdateAutopilotTrigger_RejectsCronExpressionOnWebhookKind + UpdateAutopilotTrigger_RejectsTimezoneOnWebhookKind: drive the round-2 should-fix from the handler boundary. - UpdateAutopilotTrigger_AcceptsEnabledAndLabelOnWebhookKind: counter test so a regression to a blanket reject is caught. * fix(migrations): bump webhook trigger migration 089 → 091 origin/main added 089_squad_no_action_activity_index (and 090_task_is_leader) since our last rebase, re-colliding with our 089_autopilot_webhook_triggers. Bump to 091 so the filename ordering is unambiguous again. The SQL is unchanged — same partial unique index on autopilot_trigger.webhook_token — only the filename moves. * fix(views): dedupe skipped icon in autopilot RUN_VISUAL after rebase The rebase against origin/main merged main's add of `Ban` for the skipped status next to our round-1 `MinusCircle` entry, leaving the RUN_VISUAL map with two `skipped` keys (only the last would have been read at runtime, and MinusCircle had been dropped from the imports during conflict resolution — so the file would not compile). Keep main's `Ban` icon (latest design) and a single `skipped` entry. 
Carry over the round-1 comment about why the muted styling matters for failure-ratio readability. --------- Co-authored-by: Kerim Incedayi <kerim.incedayi@digitalchargingsolutions.com>	2026-05-18 12:17:39 +08:00
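The trusted-proxy gate described in the round-2 webhook fix can be sketched roughly as below. This is an illustration under assumptions, not the repository's code: `parseTrustedProxies` and `rateLimitKey` are hypothetical names, the function takes plain strings instead of an `*http.Request`, and reading only the first `X-Forwarded-For` entry is a simplification chosen for the sketch.

```go
package main

import (
	"net/netip"
	"strings"
)

// parseTrustedProxies parses a comma-separated CIDR list, such as the value
// of MULTICA_TRUSTED_PROXIES. Invalid entries are skipped; the commit says
// the server logs a warning for them, which this sketch omits.
func parseTrustedProxies(raw string) []netip.Prefix {
	var prefixes []netip.Prefix
	for _, part := range strings.Split(raw, ",") {
		p, err := netip.ParsePrefix(strings.TrimSpace(part))
		if err != nil {
			continue
		}
		prefixes = append(prefixes, p.Masked())
	}
	return prefixes
}

// rateLimitKey (hypothetical) returns the IP used to key the per-IP bucket.
// The forwarded header is honored only when the direct peer is inside one of
// the trusted prefixes; otherwise the connection address wins.
func rateLimitKey(remoteAddr, forwardedFor string, trusted []netip.Prefix) string {
	ap, err := netip.ParseAddrPort(remoteAddr)
	if err != nil {
		return remoteAddr // unparseable peer: fall back to the raw value
	}
	peer := ap.Addr().Unmap()
	for _, p := range trusted {
		if !p.Contains(peer) {
			continue
		}
		first, _, _ := strings.Cut(forwardedFor, ",")
		if fwd, err := netip.ParseAddr(strings.TrimSpace(first)); err == nil {
			return fwd.Unmap().String()
		}
		break // trusted peer but no usable header: use the peer address
	}
	return peer.String()
}
```

With an empty trusted list the header is never consulted, which matches the "default: r.RemoteAddr only" behavior the commit describes.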
Bohan Jiang	113c4f4e90	docs(agent): clarify openclaw agent id vs name semantics (#2744) Follow-up to #2716. Updates two stale comments that still described openclaw's `name` and `id` as interchangeable. The actual contract: `id` is the routing key passed to `openclaw agent --agent <id>`; `name` is a human display label and is not safe to pass to the CLI. No behavior change. Co-authored-by: multica-agent <github@multica.ai>	2026-05-17 17:20:41 +08:00
Kagura	44d2fc1946	fix(agent): use openclaw agent id instead of name for --agent flag (#2716) openclawEntriesToModels() used the agent Name (which may contain spaces, e.g. "Sub2API OPS") as Model.ID. This ID is passed to openclaw via --agent, where normalizeAgentId mangles spaces into hyphens ("sub2api-ops"), causing a lookup miss against the registered id ("sub2api") and a "no parseable output" error. Fix: prefer agent ID for Model.ID; use Name only for display Label. When ID is empty, fall back to Name for backward compatibility. Fixes #2714	2026-05-17 17:08:00 +08:00
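The id-versus-name rule from the fix above might look like the following minimal sketch. The `agentEntry` and `model` types and the `entryToModel` function are hypothetical stand-ins; the real `openclawEntriesToModels()` signature is not shown in the log.

```go
package main

// agentEntry is a hypothetical stand-in for one openclaw agent listing entry.
type agentEntry struct {
	ID   string // routing key, safe to pass to --agent
	Name string // human display label, may contain spaces
}

// model is a hypothetical stand-in for the daemon's Model type.
type model struct {
	ID    string
	Label string
}

// entryToModel prefers the agent ID for model.ID and uses Name only as the
// label. When ID is empty it falls back to Name, as the commit describes.
func entryToModel(e agentEntry) model {
	id := e.ID
	if id == "" {
		id = e.Name
	}
	return model{ID: id, Label: e.Name}
}
```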
Bohan Jiang	3645bdb5b6	feat(issues): add start_date field with progressive disclosure (MUL-2274) (#2696 ) * feat(issues): add start_date field with progressive disclosure (MUL-2274) Mirrors the existing due_date implementation end-to-end so an issue can express a planned start in addition to a deadline. Surfaces start_date as an optional sidebar property alongside priority / due_date / labels (added in MUL-2275), with consistent picker, board/list/sort, activity, and inbox plumbing. Backs the Project Gantt work (parent MUL-1881) and keeps the progressive-disclosure attribute experience consistent. - DB: migration 091 adds issue.start_date TIMESTAMPTZ. - sqlc: ListIssues / CreateIssue / UpdateIssue / CreateIssueWithOrigin / ListOpenIssues read & write start_date. - Backend: IssueResponse + create/update/batch-update handlers parse and emit start_date with RFC3339 validation; new start_date_changed activity event + subscriber notification (with prev_start_date in event payload). - CLI: --start-date flag on `multica issue create` / `issue update`. - Frontend: StartDatePicker component, start_date wired into Issue type, Zod schema, draft / view stores, sort util, header sort + card-property options, list-row / board-card display, create-issue modal, and the issue-detail progressive-disclosure "+ Add property" surface (visibility rule, picker row, add-property menu icon + label). - i18n: en + zh-Hans for sort_start_date / card_start_date / prop_start_date / activity start_date_set / start_date_removed / picker start_date.trigger_label / clear_action / inbox labels. - Tests: new TestNotification_StartDateChanged; existing Issue / draft / modal fixtures extended with start_date. 
Co-authored-by: multica-agent <github@multica.ai> * feat(issues): align start_date with due_date in actions menu and CLI table - Add Start Date submenu (today / tomorrow / next week / clear) in actions menu, mirroring Due Date — parity with the Due Date quick setters in list/board context and 3-dot menus. - Add corresponding en / zh-Hans i18n keys (actions.start_date / start_today / start_tomorrow / start_next_week / start_clear). - CLI human table for `multica issue list` and `multica issue get` now shows a START DATE column next to DUE DATE; --full-id variant too. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-17 15:01:38 +08:00
Jiayuan Zhang	668cab6022	feat(github): mirror PR CI checks and merge conflict status (MUL-2228) (#2632 ) * feat(github): mirror PR CI checks and merge conflict status (MUL-2228) Surface "checks passed/failed" and "conflicts/no conflicts" badges under each linked PR on the issue page so users can judge readiness without flipping over to GitHub. CI state is fed by check_suite webhooks (GitHub Actions + apps using the Checks API; legacy status events are out of scope for MVP); conflicts are read from pull_request.mergeable_state. Data model: * github_pull_request: add head_sha + mergeable_state * github_pull_request_check_suite: per-suite rows keyed by (pr_id, suite_id) * Aggregation done at query time, filtering by current head_sha so late-arriving suites for a stale head can't contaminate the new head's pending view; per-app latest suite chosen first so a single app firing multiple suites isn't counted N times. Webhook hardening: * synchronize/opened/reopened/edited(base) explicitly clear mergeable_state * single-row ordering protection on the check_suite upsert prevents a late-delivered older event from overwriting a newer one * check_suite.pull_requests is iterated; unknown PRs are logged and dropped UI: * PR row shows Checks + Conflicts badges; opaque mergeable values (blocked/behind/unstable/...) render as no badge, not as conflicts. * Terminal PR states (merged/closed) suppress the status row entirely. Tests: * Pure unit coverage for derivePRMergeableState + aggregateChecksConclusion * Webhook integration tests: multi-app aggregation, old-head ignore, late-older-event ignore, synchronize clears mergeable_state * Vitest coverage for pull-request-list badge rendering across CI/conflict combinations and the legacy (null) fallback. Co-authored-by: multica-agent <github@multica.ai> * fix(github): scope check_suite PR lookup; preserve mergeable on metadata Addresses code review on PR #2632. 1. 
check_suite handler now resolves the PR through the workspace-scoped GetGitHubPullRequest query instead of GetGitHubPullRequestByRepoNumber. The (workspace_id, repo_owner, repo_name, pr_number) tuple is the real uniqueness key, so a bare (owner, repo, number) lookup could return a stale row from another workspace and either land the suite on the wrong PR or skip the right one when the installation ids drifted. The old unscoped query is removed. 2. derivePRMergeableState now returns (value, clear) and the upsert SQL distinguishes three cases: state-changing actions clear the column to NULL, non-empty payloads write the value, and metadata events with an empty payload preserve the existing column. Previously every empty payload became NULL, so a labeled/assigned event silently wiped a known clean/dirty verdict in violation of the RFC's "metadata empty payload preserves" rule. 3. ListPullRequestsByIssue narrows to the issue's PR ids before running the per-app check_suite aggregation, avoiding a full-table scan over github_pull_request_check_suite when only a handful of rows belong to the requested issue. New helper test covers labeled+empty preserves; new integration test verifies a metadata event after a known mergeable_state keeps the value. Co-authored-by: multica-agent <github@multica.ai> * feat(github): PR card layout v3 increment — stats + segmented progress bar Replaces the row + badge layout under "Pull requests" on the issue detail sidebar with a card that mirrors the GitHub PR summary look: title, author/avatar, +N −M · K files diff stats, segmented progress bar (failed → pending → passed, failure leftmost), and a one-line status caption following an explicit priority pass-through. Backend - Migration 092: github_pull_request adds additions / deletions / changed_files (INT NOT NULL DEFAULT 0). Zero defaults are what the new frontend treats as "legacy backend — hide the stats row" so old PR rows that pre-date this migration don't render "+0 −0 · 0 files". 
- pull_request webhook handler reads stats off the top-level payload. - ListPullRequestsByIssue now surfaces per-suite counts (checks_passed / failed / pending) alongside the existing aggregate conclusion, so the segmented bar reuses the already-computed counts with no new aggregation. Frontend (packages) - core/github/pull-request-status.{ts,test.ts}: pure-function module for the status-kind priority table and the segment derivation; 15 cases covered, includes the "all-zero → hide stats" guard. - views/issues/components/pull-request-list.tsx: PullRequestCard plus a compact-row fallback used when count > 4 (first 3 as cards, the remainder collapsed behind a Show more toggle). - i18n: new `pull_request_card_` keys in en + zh-Hans. Tests - 12 component tests covering each rule of the priority table, the legacy-zero stats fallback, and the collapse threshold. - Reuse of the v3 webhook handler tests confirmed. Verification - pnpm typecheck + pnpm test green (60 test files, 536 tests). - go build ./... + go vet ./... clean. - 6 demo issues (DEV-2..DEV-7) screenshotted via Playwright; see the PR comments for the visual check matrix. Co-authored-by: multica-agent <github@multica.ai> fix(views): collapse PR cards at N>=4, not N>4 The card-vs-collapse threshold used `>` so 4 PRs slipped past it and all rendered as full cards, contrary to RFC v3 (N >= 4 collapses to 3 cards + compact tail). Switch to `>=` and update the threshold- boundary test to expect "Show 1 more". Co-authored-by: multica-agent <github@multica.ai> * fix(views): align PR sidebar rows with existing list style Co-authored-by: multica-agent <github@multica.ai> * fix(views): hide terminal PR status badges Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-16 21:26:30 +02:00
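The three-way mergeable_state rule from the review fix above (clear, write, preserve) can be illustrated as follows. This is a sketch under assumptions: the signature, the `baseChanged` parameter, and the precedence of state-changing actions over a non-empty payload are guesses for illustration, not the repository's `derivePRMergeableState`.

```go
package main

// deriveMergeableState (hypothetical signature) returns the value to store
// and whether the column should be cleared to NULL.
//
//   - state-changing actions: clear the column        -> ("", true)
//   - non-empty payload on other events: write it     -> (payload, false)
//   - empty payload on metadata events: preserve      -> ("", false)
func deriveMergeableState(action string, baseChanged bool, payloadState string) (value string, clear bool) {
	switch action {
	case "synchronize", "opened", "reopened":
		return "", true
	case "edited":
		// Assumption: only an edit that changes the base branch invalidates
		// the previous verdict.
		if baseChanged {
			return "", true
		}
	}
	return payloadState, false
}
```

A caller would treat `value == "" && !clear` as "leave the existing column alone", which is the case a labeled or assigned event falls into.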
Jiayuan Zhang	431006e7d6	feat(daemon): add debug-level logs at key debug-path nodes (MUL-2304) (#2733) Local daemon previously logged mostly at Info, leaving startup/exit, config resolution, registration, heartbeat ticks, agent invocation, and result classification undiagnosable without code-reading. Add Debug logs at those checkpoints so LOG_LEVEL=debug (the default) produces enough detail to follow a run end-to-end without changing normal Info output. Co-authored-by: multica-agent <github@multica.ai>	2026-05-16 18:02:12 +02:00
Jiayuan Zhang	9bd17058f8	fix(daemon): bump idle watchdog default 5m → 30m (MUL-2300) (#2728) * fix(daemon): bump idle watchdog default 5m → 30m (MUL-2300) The previous 5 min default killed legitimate long assistant outputs (e.g. RFC-length writeups) where the model streams a single message for many minutes without any daemon-visible activity. 30 min keeps the safety net for truly stuck runs (dockerd hang) while leaving headroom for long writes. runIdleWatchdog tick interval is window/2, with a 30 s floor that only applies when interval < 30 s; at window=30 min the natural tick is 15 min, so no sync needed. Co-authored-by: multica-agent <github@multica.ai> * docs(daemon): drop stale 5-minute mention from idle watchdog comment Refers to DefaultAgentIdleWatchdog so the comment stays in sync if the default shifts again. Follow-up to Emacs review on PR #2728. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-16 17:20:10 +02:00
Jiayuan Zhang	4c7a990a25	fix(autopilot): attribute autopilot-created issue to assignee agent (MUL-2293) (#2719) Before: dispatchCreateIssue copied autopilot.created_by_type/id onto the new issue's creator_type/creator_id, and the same fields were used as the ActorType/ActorID of the issue:created event. Result: any issue spawned by an autopilot was reported as created by the human who first configured the autopilot, not by the agent that actually owns the work. Downstream subscriber/activity/notification listeners inherited the same wrong actor. After: creator and actor are both the autopilot's assignee agent (creator_type=agent, creator_id=ap.assignee_id). The human owner is still recoverable via origin_type=autopilot + origin_id. Audited the other ap.created_by_* usages: analytics attribution (autopilotActorID, task.go user-id), and the private-agent visibility gate in shouldSkipDispatch; all correctly read the autopilot's owner, not the executor, so they stay as-is. Co-authored-by: multica-agent <github@multica.ai>	2026-05-16 09:32:15 +02:00
Jiayuan Zhang	380c6b5122	feat(usage): add Time and Tasks to daily-trend toggle (MUL-2283) (#2709) Extends the workspace /usage page Daily tokens chart toggle from Tokens | Cost to Tokens | Cost | Time | Tasks, so users see daily run-time and task-count trends alongside spend without leaving the page. - New SQL `ListDashboardRunTimeDaily`: per-date totals from agent_task_queue (terminal tasks only), scoped to workspace and optionally project. Same time anchor as ListDashboardAgentRunTime so day boundaries line up. - New handler GET /api/dashboard/runtime/daily + TanStack Query option. - New DailyTimeChart (single-series, smart h/m/s unit) and DailyTasksChart (completed + failed stacked). - Empty-state is per-metric so a workspace with tokens but no terminal runs (or vice-versa) doesn't get a false "no data". - i18n: en + zh-Hans daily.metric_time / metric_tasks + titles. Co-authored-by: multica-agent <github@multica.ai>	2026-05-15 18:51:02 +02:00
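The tick arithmetic in the watchdog commit above (window/2 with a 30 s floor) can be checked with a small sketch. `watchdogTick`, `defaultIdleWindow`, and `minTick` are hypothetical names; only the arithmetic comes from the commit message.

```go
package main

import "time"

const (
	defaultIdleWindow = 30 * time.Minute // new default per the commit
	minTick           = 30 * time.Second // floor for very small windows
)

// watchdogTick returns how often the watchdog wakes up for a given idle
// window: half the window, but never more often than every 30 seconds.
func watchdogTick(window time.Duration) time.Duration {
	tick := window / 2
	if tick < minTick {
		tick = minTick
	}
	return tick
}
```

At the 30 min default the natural tick is 15 min, so the floor never engages; it only matters for windows under one minute.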
Bohan Jiang	bfe9bf3eea	feat(daemon): force-stop hung agent runs via idle watchdog (MUL-2281) (#2691 ) * feat(daemon): force-stop hung agent runs via idle watchdog (MUL-2281) A backend whose subprocess hangs on a stuck child process (e.g. claude blocked on `docker ps` against a frozen dockerd) keeps the daemon's run record at status="running" until the full DefaultAgentTimeout (2 h) expires, because cmd.Wait() never returns and Session.Result is never written. MUL-2225 spent 17+ minutes in this state in the wild. Add a per-task idle watchdog around executeAndDrain: - Wrap the caller's ctx so a single cancel propagates to the agent subprocess (via the ctx passed to backend.Execute) AND the drain loop. - Stamp lastActivityAt every time the drain loop receives a message. - Tick at window/2; when idle_for >= window AND session.Messages buffer is empty, set a fired flag and call cancel. - Tag the resulting Result.Status as "idle_watchdog" so runTask routes it through a dedicated failure_reason instead of "agent_error". Default window is 5 min, configurable via MULTICA_AGENT_IDLE_WATCHDOG; set to 0 to disable. Tests cover the activity-then-silence case, the zero-message case, the disabled case, and the happy path. Co-authored-by: multica-agent <github@multica.ai> * fix(daemon): skip idle watchdog while a tool call is in flight A legitimate long-running tool call (npm install, docker build, test suite) can sit silent between tool_use and tool_result for many minutes. Without this gate, the watchdog would yank the agent mid-build. Track unmatched tool_use messages in an atomic counter; only let the watchdog fire when the counter is zero. tool_result clamps non-negative so a stray result with no matching use can't re-arm the watchdog one call too early. 
Adds two regression tests: - DoesNotFireDuringInFlightToolCall: tool_use -> silence past window -> tool_result -> completed (must NOT fire) - FiresAfterToolResultIfBackendStaysSilent: tool_use -> tool_result -> silence past window (MUST fire — backend really is stuck) Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-15 19:48:39 +08:00
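The in-flight tool-call gate from the watchdog commits above can be sketched like this. The `toolTracker` type and `shouldFire` function are hypothetical; the sketch only mirrors the stated rules (count unmatched tool_use messages, clamp at zero on tool_result, fire only when idle past the window with an empty buffer and no tool call in flight).

```go
package main

import (
	"sync/atomic"
	"time"
)

// toolTracker counts tool_use messages that have not yet seen a tool_result.
type toolTracker struct {
	inFlight atomic.Int64
}

// observe updates the counter for one drained message type.
func (t *toolTracker) observe(msgType string) {
	switch msgType {
	case "tool_use":
		t.inFlight.Add(1)
	case "tool_result":
		// Clamp at zero so a stray result with no matching use cannot
		// push the counter negative.
		for {
			cur := t.inFlight.Load()
			if cur <= 0 || t.inFlight.CompareAndSwap(cur, cur-1) {
				return
			}
		}
	}
}

// shouldFire reports whether the watchdog may cancel the run. A window of
// zero disables the watchdog, matching the documented opt-out.
func shouldFire(idleFor, window time.Duration, buffered int, t *toolTracker) bool {
	return window > 0 && idleFor >= window && buffered == 0 && t.inFlight.Load() == 0
}
```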
Jiayuan Zhang	8e88156356	Add assignee grouping for issue boards (#2693)	2026-05-15 18:44:08 +08:00
iYuan	d8635ad580	fix(issues): prevent duplicate active issue creation (MUL-2225) (#2602) * fix: prevent duplicate active issue creation * fix(issues): address duplicate guard review * fix(autopilot): skip duplicate issue admissions * fix(issueguard): tighten duplicate lookup edge cases * test(issues): cover duplicate guard autopilot skips * feat(autopilots): group skipped runs in history	2026-05-15 18:27:56 +08:00
Bohan Jiang	fcd13aece9	feat(daemon): auto-update CLI when idle (MUL-2100) (#2679 ) * feat(daemon): auto-update CLI when idle (MUL-2100) Add a periodic poller that checks GitHub for a newer multica release every hour and self-updates when the daemon is idle, reusing the same brew-or-download upgrade path the Runtimes-page "Update" button already runs. - Refactor handleUpdate to call a shared runUpdate(target) helper so both server-triggered and auto-triggered upgrades go through the same brew detection + atomic replace + restart. - New autoUpdateLoop gates each tick on: opt-out flag, Desktop launch source, dev-build version, an in-flight update, and active tasks. The idle gate guarantees we never interrupt a running agent — busy ticks silently retry at the next interval. - Config: MULTICA_DAEMON_AUTO_UPDATE=false to disable (also via --no-auto-update), MULTICA_DAEMON_AUTO_UPDATE_INTERVAL to retune the poll period. - IsNewerVersion / IsReleaseVersion helpers in the cli package, with tests covering patch/minor/major bumps, dev-describe strings, and malformed input. - Daemon-side tests cover every skip path (updating, active tasks, fetch failure, no-newer) plus the success path that fires triggerRestart while keeping the updating flag held to the end. Co-authored-by: multica-agent <github@multica.ai> * fix(daemon): close idle race + verify checksum in auto-update (MUL-2100) Two issues raised in PR #2679 review: 1. The first idle check in tryAutoUpdate only ran before the release-metadata fetch, so a poller that won the claim race during the fetch could end up handing handleTask a task that triggerRestart was about to cancel via root- ctx cancellation. Add a strict claim barrier: runRuntimePoller now tryEnterClaim()s before ClaimTask, and tryAutoUpdate flips pauseClaims under claimMu only after observing claimsInFlight + activeTasks == 0. Pollers that were already mid-claim hold claimsInFlight > 0, so the barrier refuses to engage and the update defers to the next tick. 
2. The direct-download path replaced the running binary with whatever bytes GitHub returned, without checking checksums.txt. Pull the manifest first, buffer the archive, and reject on SHA-256 mismatch before extraction. The GoReleaser config already publishes checksums.txt; we just consume it. Also tighten parseReleaseVersion so it stops accepting dev-describe shapes like "v0.1.13-5-gabcdef0" through the patch trim, matching its docstring. The auto-update loop already guards on IsReleaseVersion, but the lenient parser was a footgun and the existing test name even said "not newer" while asserting the opposite. Tests: - TestTryAutoUpdate_DefersWhenClaimInFlightAtBarrier (new race coverage) - TestTryAutoUpdate_HoldsBarrierAcrossRestart / ReleasesBarrierOnUpgradeFailure - TestTryEnterClaim_RespectsBarrier - TestFindChecksumManifestAsset / TestParseChecksumManifest / TestVerifyAssetSHA256 - TestIsNewerVersion: dev-describe cases now expect false (matches docstring) Co-authored-by: multica-agent <github@multica.ai> * chore(daemon): default auto-update poll interval to 6h (MUL-2100) 1h was overly chatty for a release that lands at most a few times a week. Operators who want a different cadence can still set MULTICA_DAEMON_AUTO_UPDATE_INTERVAL or --auto-update-interval. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-15 18:10:22 +08:00
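The checksum step added to the direct-download path above (read checksums.txt, buffer the archive, reject on SHA-256 mismatch before extraction) might be sketched as follows. `sha256Hex`, `parseManifest`, and `verifyArchive` are hypothetical names, and the two-column "digest, file name" manifest layout is an assumption based on the common checksums.txt convention.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// sha256Hex returns the lowercase hex SHA-256 digest of data.
func sha256Hex(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// parseManifest reads "<hex digest>  <file name>" lines into a map keyed by
// file name. Lines that do not have exactly two fields are ignored.
func parseManifest(manifest string) map[string]string {
	sums := make(map[string]string)
	for _, line := range strings.Split(manifest, "\n") {
		fields := strings.Fields(line)
		if len(fields) != 2 {
			continue
		}
		sums[strings.TrimPrefix(fields[1], "*")] = strings.ToLower(fields[0])
	}
	return sums
}

// verifyArchive rejects the buffered archive unless its digest matches the
// manifest entry for name. Call it before extracting anything.
func verifyArchive(manifest, name string, archive []byte) error {
	want, ok := parseManifest(manifest)[name]
	if !ok {
		return fmt.Errorf("no checksum listed for %q", name)
	}
	if got := sha256Hex(archive); got != want {
		return fmt.Errorf("checksum mismatch for %q: got %s, want %s", name, got, want)
	}
	return nil
}
```

Treating a missing manifest entry as an error, rather than skipping verification, keeps the failure mode closed.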
Naiyuan Qing	5ad1641b72	Revert "Squad archive dialog + role editor + transactional DeleteSquad (#2680)" (#2687) This reverts commit `2980ead4c7`.	2026-05-15 17:44:59 +08:00
Naiyuan Qing	2980ead4c7	Squad archive dialog + role editor + transactional DeleteSquad (#2680 ) * docs(squad): address plan-review feedback for archive + role plan Resolve the 4 items the reviewer raised on MUL-2265: 1. TS schema: declare `active_issue_count` as optional (`number \| null \| undefined`) so list/create/update Squad responses don't lie about their shape; only `getSquad` parses through SquadSchema. 2. Archive semantics: restrict TransferSquadAssignees to active issues (status NOT IN done, cancelled) so dialog count and SQL operate on one set and terminal-state issues keep their historical assignee. 3. Index assumption: corrected — `idx_issue_assignee (assignee_type, assignee_id)` exists and is sufficient at realistic squad cardinality; no new index needed. 4. Fixed `int64` test comparison and added `.loose()` to SquadSchema per the local schemas.ts convention. Co-authored-by: multica-agent <github@multica.ai> docs(squad): plan v3 — revert to count-all/transfer-all on archive Reviewer round 2 surfaced two structural problems with plan v2's active-only carve-out: 1. useActorName resolves squad names via ListSquads, which filters archived_at IS NULL. A closed issue with an archived-squad assignee would render as "Unknown Squad". 2. The status-only update path in UpdateIssue skips validateAssigneePair, so a done/cancelled issue with an archived-squad assignee could be reopened to in_progress, violating the "no active issue on an archived squad" invariant enforced elsewhere. Both problems disappear by reverting to count-all + transfer-all: after ArchiveSquad runs, no issue points at the archived squad, so neither case can occur. The product trade-off is that closed historical issues now show the leader agent instead of the archived squad in their "Assigned to" badge — consistent with existing agent-level reassignment behavior elsewhere in the product. Field rename: active_issue_count -> issue_count. 
TransferSquadAssignees SQL is unchanged (already transfers all). Co-authored-by: multica-agent <github@multica.ai> * docs(squad): add Task 2b — wrap DeleteSquad transfer + archive in one tx Reviewer round-3 flagged that the v3 invariant ("after archive no issue points to the squad") was asserted on the happy path only. DeleteSquad's current best-effort impl breaks it two ways: - transfer failure → slog.Warn but archive proceeds (Unknown Squad, reopen-into-archived-squad bugs reappear) - archive failure after a committed transfer → 500 with squad still active but emptied Task 2b rewrites DeleteSquad to run TransferSquadAssignees + ArchiveSquad inside one pgx tx, mirroring the project.go:266-314 pattern. Publish moves below Commit. Adds two regression tests that lock both partial-write failure modes. Co-authored-by: multica-agent <github@multica.ai> * feat(squad): replace native confirm() with AlertDialog and rewrite role editor as combobox Backend: - Add CountIssuesForSquad sqlc query (counts every issue assigned to a squad, no status filter — matches the existing transfer-all archive semantics). - Extend SquadResponse with optional `issue_count` (`int64` + omitempty, populated only by GetSquad to avoid an N+1 in the list endpoint). - Wrap DeleteSquad's transfer + archive in a single pgx transaction so the v3 invariant ("after archive, no issue points to the squad") is durable rather than best-effort. Promote slog.Warn to slog.Error and check the parseUUIDOrBadRequest ok flag (silent zero-UUID was a #1661-class latent bug). Publish only after Commit so realtime never sees rolled-back state. - Tests cover happy path (count, transfer-all including terminal statuses) and both rollback directions (transfer fail / archive fail) via a fault-injecting tx wrapper. Frontend: - Extend Squad TS type with `issue_count?: number \| null` (optional — list/create/update legitimately omit it). 
Add SquadSchema with `.loose()` and wrap getSquad with parseWithFallback so older servers and count-error responses degrade to the dialog's "no count" copy variant. - Replace `window.confirm()` with shadcn `ArchiveSquadConfirmDialog` (destructive variant, leader name + count + closed-issue caveat in the copy, Loader2 while pending). i18n keys added under squads.archive_dialog. - Rewrite RoleEditor as a Popover + Command combobox: Pencil affordance is always visible, suggestions aggregate other members' roles, commit only on Enter or selecting a suggestion (blur discards), per-member savingId drives Loader2 so the spinner only renders on the row being saved. Co-authored-by: multica-agent <github@multica.ai> fix(squad): discard RoleEditor draft on close and no-op blank Enter Two reviewer findings on `e0d754bf`: 1. Closing the Popover (outside click, Esc, trigger re-click) left `query` in state, so reopening + Enter would commit the stale draft. Clear `query` on every non-saving close path. 2. With an existing role, opening the editor and pressing Enter on an empty input committed "" — `commit` only no-op'd when trimmed matched value. Treat blank Enter as a no-op; clearing a role would need an explicit clear action that doesn't exist yet. Add two regression tests: - close (via outside click) → reopen surfaces a clean input; Enter does not commit the stale draft - blank Enter on an existing role does not call onSave Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: multica-agent <github@multica.ai> * fix(squad): add explicit Clear button to RoleEditor Role is optional, but the previous fix turned blank Enter into a no-op without exposing any other way to clear an existing role — that broke a valid terminal state. Keep blank Enter as no-op; add a "Clear role" button at the bottom of the popover that only renders when value is non-empty and routes through onSave(""). 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-15 17:29:37 +08:00
LinYushen	319b23eb39	Revert "feat(task): add claim lease mechanism (Phase 2, MUL-2246) (#2660)" (#2674) This reverts commit `3137feecdf`.	2026-05-15 16:07:23 +08:00
LinYushen	b7a58c06ac	Revert "feat(task): wire claim lease into TaskService and sweeper (MUL-2246) …" (#2673) This reverts commit `bb32be0e50`.	2026-05-15 16:06:58 +08:00
LinYushen	bb32be0e50	feat(task): wire claim lease into TaskService and sweeper (MUL-2246) (#2662 ) * feat(task): wire claim lease queries into TaskService and sweeper (MUL-2246) - ClaimTask now uses ClaimAgentTaskWithLease (generates claim_token + lease) - StartTask accepts optional claim_token for token-verified start - AgentTaskResponse includes claim_token for daemon to use - Daemon client sends claim_token in StartTask body - Sweeper calls RequeueExpiredClaimLeases each tick - Legacy daemons without claim_token still work (graceful fallback) Co-authored-by: multica-agent <github@multica.ai> * fix(task): address PR #2662 review blockers (MUL-2246) 1. ClaimAgentTaskForRuntime: push runtime_id into atomic SQL WHERE clause so runtime A cannot claim tasks queued for runtime B under the same agent. 2. Legacy StartAgentTask: add claim_token IS NULL guard so leased rows cannot be started without token verification. Handler rejects malformed tokens with 400 instead of silently degrading to legacy path. 3. StartAgentTaskWithClaimToken: validate claim_expires_at >= now(), preserve claim_token until terminal state (only clear claim_expires_at), use CTE + UNION ALL for idempotent retry when daemon resends after a lost StartTask response. Return 409 Conflict on token mismatch/expiry. 
Co-authored-by: multica-agent <github@multica.ai> * fix(daemon): StartTask 409 handling, transport retry, claim_token on FailTask (MUL-2246) - StartTask 409 (claim superseded): release slot, don't call FailTask - StartTask transport timeout/5xx: retry once with same token, then check task status before failing - FailTask now sends claim_token; server-side FailAgentTask SQL adds AND (claim_token IS NULL OR claim_token = @claim_token) guard so stale daemons cannot fail tasks that have been re-claimed Co-authored-by: multica-agent <github@multica.ai> * fix(task): close FailTask token bypass and RequeueExpiredClaimLeases liveness gap (MUL-2246) Blocker 1 - FailTask token validation: - SQL: change (param IS NULL OR claim_token = param) to (param IS NULL AND claim_token IS NULL) OR claim_token = param so tokenless requests can only fail legacy (tokenless) rows. - task.go: malformed claim_token now returns ErrInvalidClaimToken (400) instead of being silently dropped to NULL. - Handler: maps ErrInvalidClaimToken→400, ErrClaimTokenInvalid→409. - Service: when UPDATE returns no rows but task is still active, return ErrClaimTokenInvalid (token mismatch) instead of silent success. Blocker 2 - RequeueExpiredClaimLeases runtime liveness: - SQL: JOIN agent_runtime, only requeue tasks where runtime is 'online'. Dead/offline runtime tasks stay dispatched for FailTasksForOfflineRuntimes. - FOR UPDATE → FOR UPDATE OF atq (required with JOIN). Regression tests: - task_claim_token_test.go: malformed, tokenless-on-tokened, wrong-token - requeue_lease_test.go: SQL must JOIN agent_runtime with online filter Co-authored-by: multica-agent <github@multica.ai> * fix(task): move expired lease requeue to ClaimTaskForRuntime preflight, add heartbeat freshness backstop (MUL-2246) - Add RequeueExpiredClaimLeasesForRuntime: per-runtime preflight self-requeue in ClaimTaskForRuntime. Runtime proves liveness by actively claiming, so no heartbeat check needed. 
- Update global RequeueExpiredClaimLeases to require ar.last_seen_at freshness (stale_threshold_secs param). Prevents requeuing to a dead runtime in the 90s gap between lease expiry (60s) and offline detection (150s). - Add regression tests verifying the heartbeat freshness check and that the preflight query does not join agent_runtime. Co-authored-by: multica-agent <github@multica.ai> * fix(task): use LivenessStore for global requeue, move preflight before empty-cache (MUL-2246) Blocker 1: Global RequeueExpiredClaimLeases now uses LivenessStore.IsAliveBatch to verify runtimes are truly alive before requeuing expired leases. When LivenessStore is unavailable (no Redis), global requeue is skipped entirely — the preflight self-requeue in ClaimTaskForRuntime handles live runtimes. This closes the 60-150s gap where a dead runtime still appears online in DB. Blocker 2: Moved RequeueExpiredClaimLeasesForRuntime BEFORE EmptyClaim.IsEmpty fast-path in ClaimTaskForRuntime. Expired leases are now requeued (which bumps the empty cache via notifyTaskAvailable) before the empty check can short-circuit the claim path. Also adds ListRuntimesWithExpiredClaimLeases SQL query and LivenessChecker interface on TaskService. Co-authored-by: multica-agent <github@multica.ai> * fix(task): wire EmptyClaimCache into backend taskSvc for backstop requeue (MUL-2246) The backend taskSvc used by the sweeper only had Liveness wired but not EmptyClaim. When global backstop requeue called notifyTaskAvailable, s.EmptyClaim.Bump() was a nil no-op — the handler's empty-cache was never invalidated, so the daemon's next claim hit a stale empty verdict. Fix: wire the same Redis-backed EmptyClaimCache into the backend taskSvc in main.go (same Redis keys as router.go:139 handler instance). Add regression test verifying backstop requeue invalidates the handler's empty-cache. 
Co-authored-by: multica-agent <github@multica.ai> * fix(task): global backstop must not requeue — alive runtimes use preflight, dead stay dispatched (MUL-2246) - RequeueExpiredClaimLeases is now a no-op (returns 0 always) - Alive runtimes self-requeue via ClaimTaskForRuntime preflight - Dead runtimes stay dispatched for FailTasksForOfflineRuntimes - Rewriting to queued on dead runtime creates 2h blackhole (offline sweeper only handles dispatched/running) - Test actually calls RequeueExpiredClaimLeases and asserts 0 in all cases Co-authored-by: multica-agent <github@multica.ai> * fix(daemon): remove duplicate usage reporting block after merge conflict (MUL-2246) The merge resolution introduced a second ReportTaskUsage call after the status check, duplicating the usage-before-early-return block that already runs right after runner.run. Remove the duplicate and add a regression test asserting /usage is called exactly once on the normal completion path. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-15 15:15:31 +08:00
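The tightened FailTask guard described in the commit above can be read as a two-branch predicate. The following is a minimal Go sketch of that predicate only, assuming a nil pointer stands in for SQL NULL; `canFailTask` is a hypothetical name and is not taken from the repository.

```go
package main

import "fmt"

// canFailTask mirrors the SQL guard quoted in the commit message:
//
//	(param IS NULL AND claim_token IS NULL) OR claim_token = param
//
// A nil pointer stands in for SQL NULL. The function name is hypothetical.
func canFailTask(param, rowToken *string) bool {
	if param == nil {
		// Tokenless requests may only fail legacy (tokenless) rows.
		return rowToken == nil
	}
	// A request carrying a token must match the row's current token.
	return rowToken != nil && *rowToken == *param
}

func main() {
	current, stale := "token-a", "token-b"
	fmt.Println(canFailTask(nil, nil))            // legacy row, tokenless request
	fmt.Println(canFailTask(nil, &current))       // tokenless request on a tokened row
	fmt.Println(canFailTask(&current, &current))  // matching token
	fmt.Println(canFailTask(&stale, &current))    // stale daemon holding an old token
}
```

Under the earlier form `(param IS NULL OR claim_token = param)`, the second call would have been allowed, which is the bypass the commit closes.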
LinYushen	3137feecdf	feat(task): add claim lease mechanism (Phase 2, MUL-2246) (#2660) Add claim_token + claim_expires_at columns to agent_task_queue and three new SQL queries for the claim lease protocol: - ClaimAgentTaskWithLease: generates a UUID token and sets a lease expiry when claiming a task, so the daemon must prove it received the response - StartAgentTaskWithClaimToken: validates the token on StartTask, preventing stale daemons from starting requeued tasks - RequeueExpiredClaimLeases: moves dispatched tasks with expired leases back to queued for re-claim This closes the reliability gap where a claim response lost in transit leaves a task stuck in dispatched until the 60s dispatch timeout fires. Co-authored-by: multica-agent <github@multica.ai>	2026-05-15 15:14:05 +08:00
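The three lease operations in the commit above (claim with a token, start with token validation, requeue on expiry) can be illustrated with an in-memory model. This is a sketch under stated assumptions, not the repository's SQL: the `task` type and the function names are hypothetical, timestamps are plain Unix seconds, and a random hex string stands in for the UUID token so that only the standard library is needed.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// task is a hypothetical in-memory stand-in for an agent_task_queue row.
type task struct {
	status       string // "queued", "dispatched", or "running"
	claimToken   string
	claimExpires int64 // unix seconds; 0 means no lease
}

// newToken returns a random hex token. The commit describes a UUID; random
// hex is used here only to stay within the standard library.
func newToken() string {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		panic(err)
	}
	return hex.EncodeToString(buf)
}

// claim moves a queued task to dispatched and attaches a lease.
func claim(t *task, now, leaseSecs int64) (string, bool) {
	if t.status != "queued" {
		return "", false
	}
	t.status = "dispatched"
	t.claimToken = newToken()
	t.claimExpires = now + leaseSecs
	return t.claimToken, true
}

// start succeeds only for the holder of the current token.
func start(t *task, token string) bool {
	if t.status != "dispatched" || token == "" || t.claimToken != token {
		return false
	}
	t.status = "running"
	return true
}

// requeueIfExpired returns a dispatched task with an expired lease to queued.
func requeueIfExpired(t *task, now int64) bool {
	if t.status != "dispatched" || t.claimExpires == 0 || now < t.claimExpires {
		return false
	}
	t.status, t.claimToken, t.claimExpires = "queued", "", 0
	return true
}

func main() {
	tk := &task{status: "queued"}
	stale, _ := claim(tk, 100, 60)         // claim response is lost in transit
	fmt.Println(requeueIfExpired(tk, 160)) // lease expired, task goes back to queued
	fresh, _ := claim(tk, 161, 60)         // a second claim issues a new token
	fmt.Println(start(tk, stale), start(tk, fresh))
}
```

The point of the token is visible in the last line: the first claimant's token no longer matches after the requeue, so only the second claimant can start the task.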
Bohan Jiang	a23856bae3	MUL-1624 docs(email): clarify 888888 is opt-in; document SMTP option (#2666) * docs(email): clarify 888888 is opt-in via MULTICA_DEV_VERIFICATION_CODE; document SMTP option in self-host docs The startup log line, .env.example, and SELF_HOSTING_ADVANCED.md still implied that the dev master code 888888 is auto-active whenever APP_ENV != "production". That has not been true since the master code was gated behind MULTICA_DEV_VERIFICATION_CODE; the fixed code is disabled by default and must be opted in explicitly. Also extend the docs site with the SMTP relay backend added in #1877: auth-setup, environment-variables, and self-host-quickstart now cover both Resend and SMTP options in EN and ZH. Co-authored-by: multica-agent <github@multica.ai> * docs(email): treat SMTP as an email backend in self-host docs and startup warning Address review feedback on #2666: - server: startup warning now fires only when both RESEND_API_KEY and SMTP_HOST are empty, since either one is a valid email backend. Otherwise the log mis-tells SMTP-only operators that verification codes go to stdout. - self-host-quickstart (EN/ZH): tell readers to fetch the verification code from whichever backend they configured (Resend or SMTP); fall back to stdout only when neither is configured. - auth-setup (EN/ZH): "without Resend" → "without any email backend configured" so the wording stays correct now that SMTP is a first-class option. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-15 14:18:46 +08:00
LinYushen	75dc70686b	fix(realtime): include actor_type in WS broadcast messages (#2668 ) * fix(realtime): include actor_type in WebSocket broadcast messages The WS broadcast message format was {type, payload, actor_id} but missing actor_type. This meant the web UI could not distinguish agent from human operations in real-time events at the top level. While payload data for comments (author_type) and activities (entry.actor_type) already included the type, the top-level message did not — causing the web UI to display agent CLI operations as human operations when relying on the broadcast actor identity. Changes: - server/cmd/server/listeners.go: add actor_type to all broadcast messages - packages/core/types/events.ts: add actor_type to WSMessage interface - packages/core/api/ws-client.ts: pass actor_type to event handlers - packages/core/realtime/hooks.ts: update EventHandler type signature - packages/core/realtime/provider.tsx: update EventHandler type signature Fixes MUL-2260 Co-authored-by: multica-agent <github@multica.ai> * test: add frame-shape unit test asserting actor_type in WS frames Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-15 14:10:24 +08:00
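The broadcast frame shape named in the commit above, {type, payload, actor_id} extended with actor_type, might look like this when marshalled from Go. This is a hedged sketch: the struct, the `encodeFrame` helper, and the event name in `main` are illustrative assumptions, not code from server/cmd/server/listeners.go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// wsFrame follows the {type, payload, actor_id, actor_type} shape described
// in the commit message. The Go type itself is a hypothetical illustration.
type wsFrame struct {
	Type      string          `json:"type"`
	Payload   json.RawMessage `json:"payload"`
	ActorID   string          `json:"actor_id"`
	ActorType string          `json:"actor_type"`
}

// encodeFrame marshals one broadcast frame to a JSON string.
func encodeFrame(eventType, actorID, actorType string, payload []byte) (string, error) {
	out, err := json.Marshal(wsFrame{
		Type:      eventType,
		Payload:   json.RawMessage(payload),
		ActorID:   actorID,
		ActorType: actorType,
	})
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// "comment:created" is a made-up event name used only for the demo.
	frame, err := encodeFrame("comment:created", "a1", "agent", []byte(`{}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(frame)
}
```

With actor_type at the top level, a client can tell agent and human operations apart without inspecting each payload variant.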
apollion69	35e9a7f0f6	feat(email): add SMTP relay as alternative to Resend for self-hosted deployments (#1877 ) * feat(email): add SMTP relay as alternative to Resend Self-hosted deployments often run behind a corporate firewall with an existing SMTP relay (Exchange, Postfix, sendmail) and no access to external SaaS APIs. Resend requires a public domain, an API key, and outbound HTTPS to api.resend.com — all unavailable in air-gapped or private-network setups. This adds a second email delivery path using Go's stdlib net/smtp, activated when SMTP_HOST is set. Priority order: 1. SMTP relay (SMTP_HOST set) 2. Resend API (RESEND_API_KEY set) 3. DEV stdout (neither set) New env vars (all optional, no breaking change): SMTP_HOST — SMTP server hostname SMTP_PORT — port, default 25 SMTP_USERNAME — for authenticated SMTP; empty = unauthenticated relay SMTP_PASSWORD — used only when SMTP_USERNAME is set SMTP_TLS_INSECURE — set to "true" to skip TLS cert verification (for private CA / self-signed certs) The implementation: - Dials TCP, creates smtp.Client manually (avoids smtp.SendMail which does not expose TLS config) - Tries STARTTLS if advertised; uses InsecureSkipVerify only when SMTP_TLS_INSECURE=true (opt-in, nolint:gosec annotated) - Applies PlainAuth only when SMTP_USERNAME is non-empty - Wraps all errors with context for easier debugging - Reuses existing HTML templates from buildInvitationParams for invitation emails (no template duplication) Also updates .env.example and docker-compose.selfhost.yml with the new variables and inline documentation. * fix(email): add dial timeout, session deadline, RFC headers for SMTP path Address review blockers from multica-eve and Bohan-J (PR #1877): - net.Dial → net.DialTimeout(10s) + conn.SetDeadline(30s) so a blackholed SMTP relay cannot hang SendVerificationCode (called synchronously from the auth handler) or leak goroutines in the invitation path. - Add Date, Message-ID, and proper Content-Transfer-Encoding headers. 
Date is required by RFC 5322; many strict relays reject messages without it. Message-ID aids deliverability and threading. - MIME-encode Subject via mime.QEncoding so non-ASCII workspace/inviter names (CJK, emoji) survive without corruption across any RFC 2047-conformant relay. - Probe 8BITMIME after (possible) STARTTLS: use Content-Transfer-Encoding 8bit when the relay advertises 8BITMIME, quoted-printable otherwise — safe for all relay configurations without forcing base64 overhead. - Update SELF_HOSTING_ADVANCED.md to document Option B (SMTP relay) alongside the existing Resend section, including all five env vars and a note that port 465/SMTPS is not yet supported. * fix(email): correct has8Bit assignment order (bool is first return of Extension)	2026-05-15 13:35:01 +08:00
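The backend priority order (SMTP relay, then Resend, then stdout) and the RFC 2047 Subject encoding from the commit above can be sketched as two small functions. `pickEmailBackend` and `encodeSubject` are hypothetical names chosen for this example; `mime.QEncoding` is the standard-library encoder the commit message refers to.

```go
package main

import (
	"fmt"
	"mime"
)

// pickEmailBackend applies the priority order from the commit message:
// SMTP relay when SMTP_HOST is set, then Resend when RESEND_API_KEY is set,
// otherwise the development stdout path. The function name is hypothetical.
func pickEmailBackend(smtpHost, resendKey string) string {
	switch {
	case smtpHost != "":
		return "smtp"
	case resendKey != "":
		return "resend"
	default:
		return "stdout"
	}
}

// encodeSubject MIME-encodes a Subject header value so that non-ASCII names
// survive transport. Plain ASCII input is returned unchanged.
func encodeSubject(subject string) string {
	return mime.QEncoding.Encode("utf-8", subject)
}

func main() {
	fmt.Println(pickEmailBackend("relay.internal", "re_key")) // both set: SMTP takes priority
	fmt.Println(encodeSubject("Invitation to 研发团队"))
}
```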
joyanup	4c1fd60215	fix(daemon): report task usage before cancel check (#1180 ) handleTask had two early-return paths that ran before ReportTaskUsage: the cancelledByPoll select and the post-run GetTaskStatus check. Both silently discarded any usage accumulated by the agent — and both claude.go and codex.go populate Result.Usage even when runCtx is cancelled mid-run, so cancelled tasks consistently under-reported tokens. Hoist ReportTaskUsage to run immediately after the runner returns, before any early-return path. Add a taskRunner interface seam and a cancelPollInterval field so tests can inject a fake runner and trigger the poll-cancellation path on a 10ms ticker without spawning real agents. Two regression tests cover both leak windows: - TestHandleTask_ReportsUsageBeforeCancel: post-run /status returns "cancelled"; usage must be reported before the status check. - TestHandleTask_ReportsUsageWhenCancelledByPoll: poll goroutine fires first and cancels runCtx; runner returns usage on Done; assert poll-status precedes usage (proving the cancelledByPoll branch was the one exercised, not the post-run path). Sanity-checked: reverting only the ReportTaskUsage hoist fails both tests with the original "tokens lost" message. MUL-2258 Co-authored-by: Jiang Bohan <bhjiang@outlook.com> Co-authored-by: multica-agent <github@multica.ai>	2026-05-15 13:33:17 +08:00
LinYushen	e6e9a9f77d	squad_briefing: add hard rule requiring mention link for every delegation (#2663) Without the full [@Name](mention://<type>/<UUID>) syntax, the platform does not trigger the target agent. Add an explicit, strongly-worded hard rule at the top of the list so the leader model never forgets. Co-authored-by: multica-agent <github@multica.ai>	2026-05-15 13:17:08 +08:00
Naiyuan Qing	f29bd93444	feat(squads): rework Create Squad modal (MUL-2233) (#2645 ) * feat(squad): accept avatar_url on CreateSquad Threads avatar_url through the SQL query, sqlc-generated code, and the Go handler so the create-squad flow can persist an avatar at creation time instead of forcing a follow-up PATCH. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: multica-agent <github@multica.ai> * feat(squad): add avatar_url to CreateSquadRequest Extends the TS contract for the new backend field so the frontend can pass an uploaded avatar URL through api.createSquad. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: multica-agent <github@multica.ai> * feat(squads): rework Create Squad modal to match CreateAgentDialog (MUL-2233) Replaces the cramped small-dialog flow with the same large-dialog shape used by Create Agent: identity row (AvatarPicker + name + description with char counter), grouped Leader picker (My Agents first, then Workspace Agents), and a new multi-select Additional Members picker covering agents and workspace members. The members trigger collapses to "+N" once more than three are selected; promoting an agent to leader auto-drops it from the additional-members list. After createSquad, additional members are attached via Promise.allSettled so a single failure surfaces a warning toast without blocking navigation — the squad still exists and the user can retry from the Members tab. Adds packages/views/modals/create-squad.test.tsx covering identity binding, leader-group ordering, leader/member conflict sanitization, the empty- and partial-failure success paths, and the create-failure recovery path. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: multica-agent <github@multica.ai> * fix(squads): valid trigger HTML + drop conflicted leader from members Two issues from PR #2645 review: 1. 
AdditionalMembersPicker's PopoverTrigger was a <button> containing MemberChip's remove <button>, which React/HTML flags as nested interactive content (hydration + a11y warning). Render the trigger as a <div role="combobox"> via Base UI's render prop so the chip's remove button is valid. 2. sanitizedMembers only hid the leader from rendered/submitted output, so promoting an additional member to leader then switching leader away resurrected the hidden pick. Drop it from selectedMembers at the moment of promotion via handleLeaderChange; sanitizedMembers is no longer needed. Adds a test that promotes → switches leader and asserts the member is not resubmitted. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: multica-agent <github@multica.ai>	2026-05-15 13:11:08 +08:00
Jiayuan Zhang	25182995c6	fix(projects): accept SSH repo URLs for github_repo resources MUL-2112 (#2492 ) * fix(projects): accept SSH repo URLs for github_repo resources (#2484) The project resource validator rejected anything that wasn't http(s), so workspace repos configured with an SSH remote (ssh:// or the scp-like `git@host:owner/repo.git` shorthand) could not be attached to a project. Both forms are valid git remotes and the daemon hands the URL straight to `git clone`, so the API has no reason to require https specifically. Relax the validator to accept http/https/ssh/git schemes and the scp-like shorthand, while still rejecting pasted garbage (no scheme, missing host, missing path, ftp://, file://, etc.). Co-authored-by: multica-agent <github@multica.ai> * fix(projects): reject scp-like URLs with '@' after ':' to avoid panic isValidGitRepoURL indexed '@' and ':' independently, then sliced s[at+1 : colon]. For inputs without '://' where '@' appears after the first ':' (e.g. `host:org/repo@branch`), `at+1 > colon` triggered a slice-bounds panic instead of a 400. Guard the slice and treat such inputs as malformed. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-15 12:47:38 +08:00
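A validator along the lines of the commit above might be structured as follows. This is a sketch, not the repository's isValidGitRepoURL: the name `looksLikeGitRemote` is hypothetical, and requiring a `user@` part in the scp-like form is an assumption made for the example. The guard on the relative positions of '@' and ':' is the part that turns the slice-bounds panic into a plain rejection.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// looksLikeGitRemote accepts http/https/ssh/git URLs and the scp-like
// user@host:path shorthand. The name and the exact acceptance rules are
// assumptions made for this sketch.
func looksLikeGitRemote(s string) bool {
	s = strings.TrimSpace(s)
	if s == "" {
		return false
	}
	if strings.Contains(s, "://") {
		u, err := url.Parse(s)
		if err != nil {
			return false
		}
		switch u.Scheme {
		case "http", "https", "ssh", "git":
		default:
			return false
		}
		// Require both a host and a non-empty path.
		return u.Host != "" && strings.Trim(u.Path, "/") != ""
	}
	// scp-like shorthand: the '@' must come before the first ':'. Checking
	// the order first means s[at+1:colon] can never have start > end.
	at := strings.Index(s, "@")
	colon := strings.Index(s, ":")
	if at < 1 || colon < 0 || at+1 >= colon {
		return false
	}
	host := s[at+1 : colon]
	path := s[colon+1:]
	return host != "" && path != ""
}

func main() {
	for _, in := range []string{
		"https://github.com/org/repo.git",
		"git@github.com:org/repo.git",
		"host:org/repo@branch",
		"ftp://example.com/repo",
	} {
		fmt.Printf("%-35s %v\n", in, looksLikeGitRemote(in))
	}
}
```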
Bohan Jiang	8d872b7521	fix(daemon): disable Claude AskUserQuestion in non-interactive mode (MUL-2244) (#2656 ) * fix(daemon): disable Claude AskUserQuestion in non-interactive mode (MUL-2244) GitHub #2588: when Claude Code calls its built-in AskUserQuestion tool inside the daemon's stream-json runtime, the question never reaches the user — there's no UI to render it — so the SDK returns an empty answer and the agent silently "infers" and continues. From the issue's perspective, execution looks stuck while the agent is actually charging ahead on its own guess. Two-part fix: - `buildClaudeArgs` now passes `--disallowedTools AskUserQuestion` so the tool is not exposed to the model at all. - The Claude-specific runtime brief tells the agent to use a `blocked` issue comment for genuine clarification, or to state an explicit assumption and proceed. Adds a regression test that pins both: AskUserQuestion is forbidden in CLAUDE.md and is NOT mentioned in the AGENTS.md emitted for non-Claude providers (the tool is Claude-specific). Co-authored-by: multica-agent <github@multica.ai> * refactor(daemon): drop CLAUDE.md AskUserQuestion guidance, rely on --disallowedTools The --disallowedTools flag already prevents Claude from invoking AskUserQuestion, so duplicating the rule in the runtime brief just bloats the prompt without changing behavior. Removes the section and its regression test; the argv-level test in pkg/agent already pins the flag. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-15 12:42:23 +08:00
Valentin Mihov	da7b33561e	fix: make quick-create output prefix agnostic (#2604) * fix: make quick-create output prefix agnostic * fix: remove quick-create prefix assumption from runtime config	2026-05-15 12:20:53 +08:00
Bohan Jiang	464201ba0d	feat(execenv): native OpenClaw skill discovery via per-task config (MUL-2219) (#2628 ) * feat(execenv): native OpenClaw skill discovery via per-task config MUL-2213 stopped lying about native discovery and routed openclaw skills to .agent_context/skills/ — a path openclaw's scanner never reads. Multica skills attached to openclaw-backed agents were still invisible to the runtime; the AGENTS.md fallback was only a documentation patch. OpenClaw's skill scanner walks <workspaceDir>/skills/ (plus a few other roots), and workspaceDir is resolved from the openclaw config file — specifically agents.list[id].workspace → agents.defaults.workspace → ~/.openclaw/workspace. There is no CLI flag or env var override on the agent runtime; the only knob is the config file. This change wires a per-task synthesized config: 1. execenv.prepareOpenclawConfig deep-copies the user's existing openclaw.json (priority: $OPENCLAW_CONFIG_PATH, else ~/.openclaw/openclaw.json), rewrites agents.defaults.workspace AND every agents.list[].workspace to the task workdir, and writes the result to {envRoot}/openclaw-config.json. Provider sections, registered agents, model providers, gateway settings — everything openclaw needs to actually start — are preserved as-is. 2. resolveSkillsDir for "openclaw" now points at {workDir}/skills/, which is the first path openclaw scans under workspaceDir. Skills written here are picked up natively. 3. daemon.go exports OPENCLAW_CONFIG_PATH={env.OpenclawConfigPath} on the openclaw subprocess and adds OPENCLAW_CONFIG_PATH to the custom_env blocklist so users cannot accidentally override it. 4. buildMetaSkillContent now lists openclaw alongside the "discovered automatically" providers; the .agent_context/skills/ fallback line stays for gemini/hermes. 
The new regression test TestPrepareOpenclawSkillWriteMatchesScanPath is the one MUL-2219's DoD calls out: it resolves the workspaceDir the way openclaw does (reading agents.defaults.workspace out of the synthesized config) and proves {workspaceDir}/skills/<name>/SKILL.md is what Multica actually wrote. The pre-MUL-2219 fix asserted "we wrote a file" without checking the scanner would ever see it — which is how the dead drop into .openclaw/skills/ landed in #2621's first commit. Verified locally: minimum-viable synthesized config validates via `openclaw config validate`, and `OPENCLAW_CONFIG_PATH=<path> openclaw config get agents.defaults.workspace` returns the task workdir as expected. MUL-2219 Co-authored-by: multica-agent <github@multica.ai> * fix(execenv): delegate openclaw config parsing to CLI and fail closed Address Elon's must-fix on PR #2628: the previous implementation parsed ~/.openclaw/openclaw.json with encoding/json, which cannot read JSON5 or follow $include — the OpenClaw spec's actual format. When parsing failed, prepareOpenclawConfig silently emitted a minimal config, which could boot OpenClaw without the user's registered agents, model providers, or API keys. Two changes: 1. Delegate active-config-path resolution and config reading to the openclaw CLI itself. `openclaw config file` locates the active config (covering OPENCLAW_CONFIG_PATH / OPENCLAW_STATE_DIR / OPENCLAW_HOME / default and the legacy chain), and the wrapper we write uses $include to point at it so OpenClaw's own loader handles JSON5, $include nesting, env-substitution, and secret refs. We read only agents.list via `openclaw config get --json` to rewrite each entry's workspace — secrets, comments, and includes in the user config are never touched. 2. Remove the silent minimal-config fallback. Any CLI failure, malformed output, or write error now surfaces as a hard error from Prepare / Reuse. 
The only "synthesize minimal" path left is a fresh install (CLI reports a path but the file doesn't exist), where there is no user data to lose. The per-task override still rewrites every agents.list[].workspace, not just agents.defaults.workspace — this is intentional task isolation, documented in prepareOpenclawConfig and the PR body. A host-scope per-agent workspace would otherwise silently route the scanner back to the user's shared workspace. Cleanups Elon flagged in the same review: - daemon.go inline-system-prompt comment no longer claims openclaw ignores the task workdir; it does load it now, and the inline brief is a belt-and-suspenders carryover for older releases. - execenv.go openclaw block no longer references "skill file paths in the inline brief" — the brief uses "discovered automatically". Reuse() switches to a ReuseParams struct so the openclaw binary path threads through alongside CodexVersion without a 6th positional arg. MUL-2219 Co-authored-by: multica-agent <github@multica.ai> * fix(execenv): grant OpenClaw $include cross-dir confinement for per-task wrapper The per-task wrapper at envRoot/openclaw-config.json $includes the user's active config (typically ~/.openclaw/openclaw.json), but OpenClaw confines $include resolution to the wrapper file's directory unless the target's parent is granted via OPENCLAW_INCLUDE_ROOTS. Without this, OpenClaw refuses to follow the link at runtime and the wrapper boots with no user-registered agents. prepareOpenclawConfig now returns dirname(activePath) as IncludeRoot, and the daemon prepends it to whatever the user already has in OPENCLAW_INCLUDE_ROOTS via the new composeOpenclawIncludeRoots helper (dedupes, drops empty segments, preserves user-configured roots). Fresh install emits no $include and leaves the env var untouched. Adds OPENCLAW_INCLUDE_ROOTS to the custom_env blocklist so a per-agent override cannot strip the granted root. 
Regression tests: - TestPrepareOpenclawConfigWrapperLoadableUnderIncludeConfinement asserts every $include target's dirname is covered by the IncludeRoot we surface. - TestPrepareEnvironmentOpenclawWiresIncludeRoot covers the non-fresh-install Environment wiring. - TestComposeOpenclawIncludeRoots covers the daemon-side env composition (preserve, dedupe, drop empties). Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-14 22:35:31 +08:00
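The env composition that TestComposeOpenclawIncludeRoots covers (prepend the granted root, preserve user roots, dedupe, drop empty segments) could be implemented along these lines. This is a sketch only: the real composeOpenclawIncludeRoots signature is not shown in the log, so `composeIncludeRoots` and its explicit separator parameter are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// composeIncludeRoots prepends the granted root to the user's existing list,
// dropping empty segments and duplicates while preserving order. The
// separator is passed in explicitly because the list format of
// OPENCLAW_INCLUDE_ROOTS is not stated in the commit; ":" is an assumption.
func composeIncludeRoots(granted, existing, sep string) string {
	seen := make(map[string]bool)
	var out []string
	add := func(p string) {
		p = strings.TrimSpace(p)
		if p == "" || seen[p] {
			return
		}
		seen[p] = true
		out = append(out, p)
	}
	add(granted)
	for _, p := range strings.Split(existing, sep) {
		add(p)
	}
	return strings.Join(out, sep)
}

func main() {
	fmt.Println(composeIncludeRoots("/home/u/.openclaw", "/a::/home/u/.openclaw:/b", ":"))
}
```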
Jiayuan Zhang	4d6b5ad06f	fix(squad): wake leader when dual-role agent posts as worker (MUL-2218) (#2626 ) * fix(squad): wake leader when dual-role agent posts as worker (MUL-2218) The squad-leader self-trigger guard skipped a comment whenever the author equalled the squad's leader id, regardless of the role the agent was acting in. For an agent that holds both leader and worker roles in the same squad, this meant the leader role never reacted to its own worker output and the issue stalled. Tag each enqueued task with is_leader_task and consult the agent's most recent task on the issue from both self-trigger guards (comment path + @squad mention path) — skip only when that task was itself a leader task. Co-authored-by: multica-agent <github@multica.ai> * fix(squad): inherit is_leader_task on retry task clone (MUL-2218) CreateRetryTask cloned a parent task into a fresh queued attempt but omitted is_leader_task from the column list, so the child silently fell back to the column default (false). For a leader task that hit auto-retry through MaybeRetryFailedTask, the retried task posed as a worker task — the self-trigger guard then no longer recognised the leader's own comments, re-opening the very loop MUL-2218 closes. Inherit p.is_leader_task in the clone and add a query-level test that covers both leader and worker retries. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-14 15:23:36 +02:00
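The revised self-trigger guard from the commit above reduces to a small decision: a comment authored by the squad leader is skipped only when that agent's most recent task on the issue was itself a leader task. The sketch below illustrates that rule; the `lastTask` type and `shouldSkipLeaderTrigger` are hypothetical names, and treating a missing prior task as "do not skip" is an assumption.

```go
package main

import "fmt"

// lastTask describes the author's most recent task on the issue, if any.
// The type is hypothetical and only carries the is_leader_task flag.
type lastTask struct {
	found        bool
	isLeaderTask bool
}

// shouldSkipLeaderTrigger reports whether enqueueing a leader task should be
// skipped for a comment. Before the fix, author == leader alone was enough
// to skip; here the role of the most recent task decides.
func shouldSkipLeaderTrigger(authorID, leaderID string, last lastTask) bool {
	if authorID != leaderID {
		return false // someone else wrote the comment, so the leader should react
	}
	return last.found && last.isLeaderTask
}

func main() {
	leader := "agent-1"
	fmt.Println(shouldSkipLeaderTrigger(leader, leader, lastTask{found: true, isLeaderTask: true}))  // leader output: skip
	fmt.Println(shouldSkipLeaderTrigger(leader, leader, lastTask{found: true, isLeaderTask: false})) // worker output: wake leader
}
```

The retry fix in the same commit matters for this rule: if a cloned retry task loses its is_leader_task flag, the first case above would be misread as the second.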
Bohan Jiang	8572a79950	MUL-2215: fix(daemon): close handleRuntimeGone success/straggler race (#2623 ) * MUL-2215: fix(daemon): close handleRuntimeGone success/straggler race handleRuntimeGone coalesced concurrent recoveries with a per-workspace `reregisterNextAttempt` slot that was deleted immediately on success. A late-arriving goroutine whose `removeStaleRuntime` was delayed by mutex contention could reach the coalesce gate after the winner cleared the slot, observe no slot, re-claim, and double-register — the source of the intermittent `register endpoint called 2 times under stampede, want 1` failure on PR #2348. The slot delete on success is intentional (a genuinely later distinct deletion in the same workspace must register again, validated by TestHandleRuntimeGone_DistinctDeletionsWithinCoalesceWindowBothRecover), so we can't just extend the slot's lifetime. Add a second per-workspace gate: `reregisterLastCompletedAt`. Every call captures `entryAt` at the top of handleRuntimeGone; at the coalesce gate a caller bails if `lastCompletedAt >= entryAt`, i.e. a peer's register completed AFTER we entered the function. Same-wave stragglers bail deterministically; distinct later events have `entryAt > lastCompletedAt` and proceed. Extracted the gate into `tryClaimRegisterSlot` / `recordRegisterCompletion` so the race can be exercised deterministically with synthetic timestamps instead of relying on `-count=N` to win the scheduling lottery. - TestHandleRuntimeGone_CoalescesConcurrentCallers: -count=500 -race clean (previously intermittent). - New unit tests cover the straggler bail, the distinct-later-event claim, failure backoff suppression, and peer-holds-slot coalescing. Co-authored-by: multica-agent <github@multica.ai> * MUL-2215: narrow completion stamp to success path Second review caught that recordRegisterCompletion stamped lastCompletedAt on both success and failure. 
A failed register has not covered any workspace state, so a same-wave straggler whose entryAt predates the failure must be allowed to retry once the failure backoff expires — the previous behavior would let the failure-time stamp also hide that straggler. workspaceSyncLoop only retries when a workspace's runtimeIDs fully drain, so partial-deletion recovery has to come from the straggler path. Failure path now only updates reregisterNextAttempt; success path keeps its existing stamp + slot clear. Add a regression test covering the entryAt-before-failed-completion / arrival-past-backoff edge. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-14 21:01:55 +08:00
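The two-gate coalescing described in the commit above can be exercised with synthetic timestamps, which is the testing approach the message mentions. The following is a simplified sketch and not the daemon's code: `registerGate` and its methods are hypothetical, timestamps are plain integers (milliseconds), and the slot hold duration is an assumed parameter.

```go
package main

import (
	"fmt"
	"sync"
)

// registerGate is a hypothetical model of the two per-workspace gates:
// nextAttempt (slot held by a peer, or failure backoff) and lastCompleted
// (time of the last successful register).
type registerGate struct {
	mu            sync.Mutex
	nextAttempt   map[string]int64
	lastCompleted map[string]int64
}

func newRegisterGate() *registerGate {
	return &registerGate{
		nextAttempt:   make(map[string]int64),
		lastCompleted: make(map[string]int64),
	}
}

// tryClaim is called at the coalesce gate. entryAt is when the caller entered
// the recovery function; now is when it reached the gate.
func (g *registerGate) tryClaim(ws string, entryAt, now, holdMs int64) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	// Straggler check: a peer's register completed after we entered.
	if done, ok := g.lastCompleted[ws]; ok && done >= entryAt {
		return false
	}
	// Slot check: a peer holds the slot, or a failure backoff is active.
	if next, ok := g.nextAttempt[ws]; ok && now < next {
		return false
	}
	g.nextAttempt[ws] = now + holdMs
	return true
}

// recordSuccess stamps the completion time and clears the slot.
func (g *registerGate) recordSuccess(ws string, at int64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.lastCompleted[ws] = at
	delete(g.nextAttempt, ws)
}

// recordFailure only sets the backoff; it does not stamp a completion.
func (g *registerGate) recordFailure(ws string, now, backoffMs int64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.nextAttempt[ws] = now + backoffMs
}

func main() {
	g := newRegisterGate()
	fmt.Println(g.tryClaim("ws", 0, 0, 100))   // winner claims the slot
	g.recordSuccess("ws", 10)                  // winner finishes at t=10
	fmt.Println(g.tryClaim("ws", 5, 11, 100))  // straggler entered at t=5: bails
	fmt.Println(g.tryClaim("ws", 20, 20, 100)) // distinct later event: proceeds
}
```

Keeping the failure path from stamping `lastCompleted` is what lets a same-wave straggler retry once the backoff has expired, as the second commit in this row describes.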
Bohan Jiang	f82a6adde9	fix(execenv): fall back OpenClaw skills to .agent_context/skills/ and stop claiming native auto-discovery (#2621 ) * fix(execenv): write OpenClaw skills to .openclaw/skills/ for native discovery The OpenClaw provider was missing a case in resolveSkillsDir, so workspace skills attached to OpenClaw-backed agents fell through to .agent_context/ skills/ — a path the openclaw CLI never inspects. The result: agents created against the OpenClaw runtime saw zero of their loaded Skills in chat or task runs, even though the meta AGENTS.md content advertised them as auto-discovered. Mirrors the same per-provider mapping already in place for OpenCode, Copilot, Pi, Cursor, Kimi, Kiro. Also adds .openclaw to the repocache git-exclude list so the per-task skills directory does not pollute checked-out repos. MUL-2213 Co-authored-by: multica-agent <github@multica.ai> * fix(execenv): drop .openclaw/skills dead-drop write; flag openclaw as non-auto-discovery Reviewer (Elon) pointed out that {workDir}/.openclaw/skills/ is not in any OpenClaw skill discovery path. Confirmed by reading openclaw upstream (src/agents/skills/refresh.ts, src/agents/agent-scope-config.ts, src/cli/program/register.agent.ts): - OpenClaw scans <workspaceDir>/skills, <workspaceDir>/.agents/skills, ~/.openclaw/skills, ~/.agents/skills, bundled, and config skills.load.extraDirs. - workspaceDir is resolved from the openclaw config (per-agent workspace -> agents.defaults.workspace -> ~/.openclaw/workspace). It is NOT the cwd of the openclaw process. - There is no --workspace CLI flag on 'openclaw agent', and no OPENCLAW_WORKSPACE env var consumed at runtime. The only knob is the config file. So {workDir}/.openclaw/skills/ written by Multica is never seen by the openclaw runtime, and the meta AGENTS.md was lying to the agent by claiming auto-discovery. Reverts: - resolveSkillsDir: drop the openclaw case; falls back to .agent_context/skills/ (same path as hermes). 
- agentGitExcludePatterns: drop .openclaw; nothing is written there now. Also updates the openclaw branch in buildMetaSkillContent to point the agent at .agent_context/skills/ explicitly (alongside gemini/hermes), so loaded skills are at least referenced by path in the AGENTS.md context. The openclaw native loader still won't see them as installed skills. Native auto-discovery for openclaw needs per-task workspace integration (e.g. synthesized per-task config via OPENCLAW_CONFIG_PATH that overrides agents.defaults.workspace, or resolving the agent's actual configured workspace at exec time) — tracked as follow-up. MUL-2213 Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-14 20:24:45 +08:00
Bohan Jiang	08e355be0b	MUL-2167: fix(daemon): resolve agent CLIs via login shell when daemon PATH misses them (#2620 ) * fix(daemon): resolve agent CLIs via login shell when daemon PATH misses them GUI-launched daemons on macOS/Linux do not inherit the user's interactive shell PATH, so fnm/nvm/volta multishells and the Anthropic native installer silently disappear during onboarding even though `claude --version` works in Terminal. Fall back to `$SHELL -ilc` to ask the login shell for the canonical absolute path, then verify it with exec.LookPath before trusting it. Symlinks (fnm/nvm prefix dirs) are resolved while the helper shell is still alive so per-session paths get canonicalised before they vanish. Refs MUL-2167, multica-ai/multica#2512. Co-authored-by: multica-agent <github@multica.ai> * fix(daemon): strip alias shadowing, harden timeout, lazy-resolve via login shell Three follow-ups from the PR #2620 review (Elon): 1. Alias shadowing — `command -v claude` in zsh/bash returns the alias definition, not the binary, and the absolute-path filter then rejects it. The script now `unalias`/`unset -f` the name before lookup so `command -v` falls through to the real PATH binary. This is the exact case behind #2512. 2. Hard timeout — `CommandContext` kills only the shell process. Rc files that background processes inheriting stdout (`direnv hook`, `nvm` shims, plain `&`) keep the pipe open and `cmd.Output()` would block for as long as the survivors live. `Cmd.WaitDelay` forcibly closes the pipes once the cap elapses, so total startup penalty is bounded by `timeout + waitDelay` regardless of rc-file content. 3. Lazy fallback — the resolver no longer runs on every daemon start. `getShellResolved` is `sync.Once`-guarded and only fires when a bare command name actually misses `exec.LookPath`. Users whose PATH already contains every agent never pay the rc-file load cost. 
Tests: - `TestResolveAgentsViaLoginShell_StripsAliasShadowing` — rc declares `alias fakeclaude=...`, real binary lives on PATH, resolver must return the binary, not the alias text. - `TestResolveAgentsViaLoginShell_HardTimeoutOnBackgroundedStdout` — rc backgrounds a 60s sleeper holding stdout; resolver must return inside `timeout + waitDelay + slack`, not 60s. - `TestLoadConfig_SkipsLoginShellWhenLookPathSucceeds` — when exec.LookPath finds every agent, SHELL (a marker-writing sentinel) must not be invoked. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-14 19:27:57 +08:00
Bohan Jiang	fdf19cac8f	fix(quick-create): default squad-picked issues to the squad, not the leader (#2611 ) When the user opens quick-create with a squad selected, the task is enqueued against the squad's leader agent — but the squad, not the leader, is the expected owner. The prompt previously instructed the leader to "default to YOURSELF" using its own agent UUID, hiding new issues from the squad's delegation flow. Surface the squad's id + name on the claim response and branch the default-assignee instruction in buildQuickCreatePrompt: when SquadID is present, point --assignee-id at the squad UUID and explicitly forbid self-assignment. MUL-2203 Co-authored-by: multica-agent <github@multica.ai>	2026-05-14 17:48:02 +08:00
yujiawei	a8ce0a8998	feat(cli): add 'multica issue cancel-task <task-id>' command (#2560 ) Exposes the existing /api/tasks/{id}/cancel backend endpoint as a CLI command. Combined with upstream #2107 (cancel running agent on server-side task delete), this gives operators a way to interrupt a runaway agent push-storm without resorting to admin-bypass on the downstream PR. Use cases: - Titan / DevBot iterating beyond its boundary (e.g. push-skip loops) - Codex turn that locked in tool-call spam - Manual recovery when a long-running task needs to stop NOW Symmetric with 'issue rerun': accepts the short ID prefix shown by 'issue runs', supports --issue scoping, and reuses resolveTaskRunID for ambiguity handling. Refs: PR#19 octo-server post-mortem (2026-05-13) Co-authored-by: yujiawei <yujiawei@mininglamp.com>	2026-05-14 17:02:58 +08:00
Naiyuan Qing	2c7738b03a	feat(issues): close composer attachment preview loop end-to-end (#2594) Text/code attachments (markdown, JSON, .ts, .log, …) need an attachment id to render through `/api/attachments/{id}/content`. The composer pipeline was dropping that id at the upload-hook boundary, so the Eye preview gate only fired for media (PDF / video / audio via filename fallback). - `useFileUpload` now returns the full `Attachment` (with `link` kept as a `url` alias) so editor providers can resolve content-type and id. - New-comment and reply composers hold a `pendingAttachments` state and feed it to `ContentEditor`; the active subset (those still referenced in the markdown) is sent on submit as before. - Comment edit modes (CommentRow + CommentCardImpl) merge pending uploads with `entry.attachments` for the editor and pipe `attachment_ids` into `onEdit` so newly uploaded files actually bind to the comment. - Issue description editor pushes pending `attachment_ids` on every debounced save and invalidates `issueKeys.attachments` so the preview Eye survives a refresh. - `UpdateComment` and `UpdateIssue` handlers accept `attachment_ids` and call the existing `linkAttachmentsByIDs` / `linkAttachmentsByIssueIDs` helpers; the bind is idempotent so re-sending an existing id is safe. Closes MUL-2153. Co-authored-by: multica-agent <github@multica.ai>	2026-05-14 15:06:21 +08:00
LinYushen	e492d989d1	fix: trigger squad leader agent run on squad @mention in comment (#2592) * fix: trigger squad leader agent run when squad is @mentioned in comment Previously, enqueueMentionedAgentTasks only processed m.Type == "agent" mentions, skipping squad mentions entirely. The shouldEnqueueSquadLeaderOnComment path only fires when the issue is already assigned to a squad. This adds handling for m.Type == "squad" in enqueueMentionedAgentTasks: when a squad is @mentioned, look up the squad's leader agent and enqueue a task for them (with the same dedup/self-trigger/archived guards as direct agent mentions). Co-authored-by: multica-agent <github@multica.ai> * fix: add canAccessPrivateAgent gate to squad mention branch Closes the P1 permission vulnerability where a plain workspace member could trigger a private squad leader by @mentioning the squad, bypassing the private-agent access check that the direct @agent mention path enforces. Adds regression test TestCreateComment_SquadMentionPrivateLeaderBlocksPlainMember. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-14 14:33:27 +08:00
Naiyuan Qing	0c4133ef5b	feat(agents): rewrite template catalog as 25 lightweight starters (#2587) * feat(agents): rewrite template catalog as 25 lightweight starters Replaces every Phase-1 template with a curated set built around the "persona + intake + scaffold + hard negatives" instruction shape. Cross-platform survey (Cursor / Cline / Roo / Continue / Custom GPTs) showed the industry baseline for starter agents is "few but sharp" — single intent, no methodology buy-in, mostly prompt-only. The original catalog went the opposite direction (avg 2.5 skills, six-skill Full-stack methodology stack) and felt heavy for first-time use. Catalog shape: - 25 templates across 7 categories: Engineering (8), Product (4), Writing (5), Design (3), Communication (2), Team (1), Productivity (2). New Product / Design / Communication / Team domains fill gaps the old Eng-heavy catalog ignored. - 16 / 25 are prompt-only (no skill fan-out). Avg 0.56 skill per template vs. 2.5 prior. Heaviest is 2 skills, only for templates whose intent cannot be expressed in instructions alone (Playwright runner, single-file HTML bundlers, design + UX-guidelines pair). - Universal top-frequency intents that the old catalog missed are now covered: Code Explainer (intent #1 across every platform surveyed), Translator (中英), Summarizer, Writing Critic, PRD Drafter/Critic, RCA Writer, ADR Writer, PR Description Writer, Commit Message Writer. Loader allows 0-skill templates: - server/internal/agenttmpl/loader.go drops the "must declare at least one skill" validation; comment explains the picker's "Prompt only" rendering path. - loader_test.go: removed the corresponding negative case, added TestLoadFromFS_PromptOnlyTemplate as a regression guard. - agent_template.go handler is unchanged — every len(tmpl.Skills) call site was already 0-safe (with an empty fan-out, the fetch phase and the in-tx loop both skip cleanly).
Frontend: - template-picker.tsx: 18 new lucide icons (BookOpen, Bug, GitPullRequest, GitCommit, AlertTriangle, Scale, ClipboardList, Microscope, UserRound, Target, Highlighter, Languages, AlignLeft, GraduationCap, Lightbulb, Type, MessageSquare, Briefcase). Card renders a "Prompt only" badge when skills.length === 0 instead of "0 skills". - template-detail.tsx: skill list section is hidden entirely for prompt-only templates — a header reading "Includes 0 skills" above an empty list was just visual noise. Instructions section below carries the agent's identity for these. - locales/en + zh-Hans agents.json: new create_dialog.template_card.prompt_only key ("Prompt only" / "纯指令"). Verification: - go test ./internal/agenttmpl/ — 9/9 pass, including TestLoad_RealTemplates which fails closed if any new JSON is malformed. - pnpm typecheck — all 6 packages clean. - pnpm --filter @multica/views test — 482/482 pass. - pnpm lint — 0 errors. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(agents): add category filter pills to template picker 25 templates across 7 categories made the picker scroll-heavy on first open. Add a single-select category filter row above the grid so a PM can isolate Product templates in one click, an engineer can jump straight to Engineering, etc. Visual reuses the IssuesHeader scope-toggle pattern verbatim — Button variant="outline" + active class swap (bg-accent / text-muted-foreground) — so the affordance reads the same as the existing filter pills in issues / squads / runtimes / my-issues. flex-wrap keeps the 8 pills (All + 7 categories) honest on narrow widths. Counts are inlined into the label ("Engineering (8)") rather than shown as a separate badge — single-line-tall pills look right next to the picker grid, and surfacing the per-category density up front doubles as a hint at the catalog's "less but sharper" intent.
When a specific category is active, the grid renders flat (no section headers) — the active pill already names what's on screen, and a header reading "Engineering" above an only-Engineering grid is visual duplication. "All" falls back to the prior grouped layout. State is component-local (no URL sync, no persistence) since the picker is dialog-internal transient state — closing the dialog naturally resets the filter, which is the expected behaviour for a "choose from a catalog" surface. i18n: new `create_dialog.template_picker.filter_all` key in en + zh. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 14:12:18 +08:00
LinYushen	0cb759b446	fix(squad): suppress no-action leader comments (#2583)	2026-05-14 14:07:26 +08:00


806 Commits