multica

mirror of https://github.com/multica-ai/multica.git synced 2026-07-05 13:29:44 +02:00

Author	SHA1	Message	Date
J	034feea8f7	fix(daemon): honor task-deleted signal in post-runTask completion guard The final pre-completion check in handleTask only looked for status == "cancelled" and ignored errors. After PR #2107 added a 404 task-deleted cancellation path to the in-flight watcher, this trailing guard fell out of sync — if the task was deleted between the watcher's last poll and runTask returning, handleTask would still try to call CompleteTask and only learn about the deletion via the 404 from that callback. Reuse shouldInterruptAgent so the same truth table (cancelled OR 404 task-not-found, but NOT transient errors) drives both polling and the final guard. Co-authored-by: multica-agent <github@multica.ai>	2026-05-06 15:50:07 +08:00
J	e3b4bc07f9	fix(server): return 500 for transient DB errors in daemon task lookup requireDaemonTaskAccess used to turn any GetAgentTask error into 404 "task not found", including transient DB connection / pool errors. Combined with PR #2107 — which added 404+"task not found" as a daemon cancellation trigger — that means a single DB hiccup could kill an in-flight agent run. Distinguish pgx.ErrNoRows (real "task gone", 404) from other errors (transient, 500 + warn log) using the existing isNotFound helper. Tests cover both paths via the mockDB pattern already used by TestFindOrCreateUserGating. Co-authored-by: multica-agent <github@multica.ai>	2026-05-06 15:49:59 +08:00
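A rough Go sketch of the status mapping described in commit e3b4bc07f9. It assumes `sql.ErrNoRows` from the standard library as a stand-in for `pgx.ErrNoRows`, and `statusForTaskLookup` is a hypothetical name; the real `requireDaemonTaskAccess` handler is not shown in the log.

```go
package main

import (
	"database/sql"
	"errors"
	"fmt"
	"net/http"
)

// Example errors for the demo; both are illustrative, not from the repository.
var (
	errWrappedNoRows = fmt.Errorf("get agent task: %w", sql.ErrNoRows)
	errPoolExhausted = errors.New("connection pool exhausted")
)

// isNotFound reports whether err means "row does not exist".
// Assumption: sql.ErrNoRows stands in for pgx.ErrNoRows here.
func isNotFound(err error) bool {
	return errors.Is(err, sql.ErrNoRows)
}

// statusForTaskLookup (hypothetical) maps a lookup error to an HTTP status:
// only a genuinely missing row becomes 404; anything else is treated as
// transient and becomes 500, so the daemon does not read it as "task deleted".
func statusForTaskLookup(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case isNotFound(err):
		return http.StatusNotFound
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	fmt.Println(statusForTaskLookup(nil))
	fmt.Println(statusForTaskLookup(errWrappedNoRows))
	fmt.Println(statusForTaskLookup(errPoolExhausted))
}
```

Using `errors.Is` keeps the check working when the driver error has been wrapped with extra context.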
DimaS	b1be9ed27f	fix(daemon): cancel running agent when task is deleted server-side (#2107 ) When the server deletes a task while the daemon's agent is still running (issue removed, agent reassigned, workspace cleanup), GetTaskStatus starts returning 404 "task not found". The previous polling loop only checked for status == "cancelled" and silently swallowed the error, so the local agent kept emitting tool calls against a dead task until its own timeout fired — minutes of wasted model spend and patch_apply operations against a workdir nobody would consume. Changes: - Add isTaskNotFoundError next to isWorkspaceNotFoundError so the daemon can distinguish "task gone" 404 from "workspace gone" 404 (already handled separately) and from generic network errors. - Extract the cancellation polling goroutine in handleTask into watchTaskCancellation, plus a pure shouldInterruptAgent decision helper. The pure helper makes both signals (cancelled status and 404 task) easy to unit-test without spinning up a real backend. - Trigger interruption on the new 404 path. Transient errors (5xx, network) intentionally still don't cancel — the next poll will retry and a flaky link should not kill an in-flight agent. Tests cover the helper truth table, the existing "status cancelled" path, the new "task deleted (404)" path, and a negative case ensuring a running task is not interrupted. Co-authored-by: “646826” <“646826@gmail.com”>	2026-05-06 15:45:03 +08:00
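The truth table from commit b1be9ed27f (interrupt on a cancelled status or a task-deleted 404, never on transient errors) might be sketched as follows. The helper names come from the commit message, but their signatures and the `errTaskNotFound` sentinel are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinels: the daemon's real error types are not shown in the log.
var (
	errTaskNotFound = errors.New("task not found (404)")
	errTransient    = errors.New("502 bad gateway")
)

// isTaskNotFoundError reports whether err is the "task gone" 404.
func isTaskNotFoundError(err error) bool {
	return errors.Is(err, errTaskNotFound)
}

// shouldInterruptAgent is a pure decision helper (assumed signature).
// With an error, only a task-deleted 404 interrupts; without one,
// only the "cancelled" status does.
func shouldInterruptAgent(status string, err error) bool {
	if err != nil {
		return isTaskNotFoundError(err)
	}
	return status == "cancelled"
}

func main() {
	fmt.Println(shouldInterruptAgent("cancelled", nil))
	fmt.Println(shouldInterruptAgent("running", nil))
	fmt.Println(shouldInterruptAgent("", errTaskNotFound))
	fmt.Println(shouldInterruptAgent("", errTransient))
}
```

Because the helper has no I/O, both the polling loop and a final pre-completion guard could call it with the same inputs and get the same answer.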
Bohan Jiang	144661e68f	fix(daemon/execenv): refresh stale Codex auth.json across env reuse (#2126 ) `ensureSymlink` previously short-circuited whenever `dst` already existed as a regular file ("Regular file exists — don't overwrite"). On Windows that branch is reachable via the createFileLink copy fallback that fires when `os.Symlink` is unavailable, so once a per-task `codex-home/auth.json` was written as a copy it would never be refreshed by subsequent Prepare/Reuse calls. If the shared `~/.codex/auth.json` rotated (e.g. Codex Desktop refreshed the token in the background), the daemon kept handing Codex a now-revoked refresh_token, which the OAuth server rejected with `refresh_token_reused` / `token_expired`. Renaming the workspace directory was the only recovery path. Treat any non-matching dst — wrong-target symlink, broken symlink, or stale regular file — as something to delete and re-create via createFileLink, so each Prepare/Reuse mirrors the current shared source. Add a `logCodexAuthState` info log (file kind, link target, size, mtime — never contents) so operators chasing the same symptom can see at a glance whether the per-task home is tracking the shared auth or has drifted. Tests cover: stale regular-file dst is replaced, copy-fallback dst is refreshed when the shared source rotates, and a high-level prepareCodexHome regression simulating the Windows + token-rotation scenario from issue #2081. Co-authored-by: multica-agent <github@multica.ai>	2026-05-06 15:18:04 +08:00
Matt Van Horn	0dbfbfed2e	fix(daemon/execenv): refuse to write .gc_meta.json when issue_id is empty (#2077 ) A non-trivial fraction of completed task workdirs (~28% in field reports) end up with .gc_meta.json files containing issue_id: "". Empty issue_id defeats the daemon's own GC loop (gc.go:139 calls GetIssueGCCheck(meta.IssueID)) and external retention scripts that cross-reference issue status before deleting orphaned workdirs. Refuse to write the file when issueID is empty, logging a Warn so operators have a starting point for debugging the upstream race condition. Skip is preferred over a sentinel-marker file: it keeps the data invariant clean (a .gc_meta.json file always carries a valid issue_id) and matches the repo CLAUDE.md preference for not preserving dual-state behavior. WriteGCMeta now takes a *slog.Logger so it can emit the warning. The package already uses log/slog (Prepare/reuseEnv), and daemon.go:884 has taskLog in scope at the only call site. Closes #1913 Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>	2026-05-06 15:02:16 +08:00
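A minimal sketch of the guard described in commit 0dbfbfed2e, assuming a simplified `writeGCMeta(dir, issueID, logger)` signature and a one-field metadata struct. Only the `*slog.Logger` parameter and the `issue_id` key are taken from the commit message; everything else is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log/slog"
	"os"
	"path/filepath"
)

// gcMeta is a hypothetical, reduced shape of the metadata file.
type gcMeta struct {
	IssueID string `json:"issue_id"`
}

// quietLogger discards output; it exists only to keep the demo silent.
var quietLogger = slog.New(slog.NewTextHandler(io.Discard, nil))

// writeGCMeta (assumed signature) refuses to write .gc_meta.json when
// issueID is empty and logs a warning instead. It reports whether a
// file was written.
func writeGCMeta(dir, issueID string, logger *slog.Logger) (bool, error) {
	if issueID == "" {
		logger.Warn("skipping .gc_meta.json: empty issue_id", "dir", dir)
		return false, nil
	}
	data, err := json.Marshal(gcMeta{IssueID: issueID})
	if err != nil {
		return false, err
	}
	if err := os.WriteFile(filepath.Join(dir, ".gc_meta.json"), data, 0o644); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	dir, err := os.MkdirTemp("", "gcmeta-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	wrote, err := writeGCMeta(dir, "", quietLogger)
	fmt.Println(wrote, err)
	wrote, err = writeGCMeta(dir, "issue-123", quietLogger)
	fmt.Println(wrote, err)
}
```

Skipping the write keeps the invariant the commit argues for: if the file exists, it carries a non-empty `issue_id`.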
furtherref	1b3c78e4b5	fix(pins): unpin missing sidebar rows (#2062) * fix(pins): unpin missing sidebar rows * fix(pins): guard missing pin auto-unpin	2026-05-06 14:43:47 +08:00
Bohan Jiang	09f04847d3	feat(server): redis-backed runtime liveness with DB fallback (#2121)	2026-05-06 14:31:33 +08:00
prellr	ee10c508fb	fix(daemon): trust the agent's session id from session/resume across ACP backends (#2070) When the local state.db of an ACP backend (hermes, kimi, kiro) is wiped — crash, config change, manual kill, container reset — the backend's session/resume (or session/load, in kiro's case) silently creates a brand-new session rather than failing, and returns the new id in the response. Today the daemon ignores the response and stamps sessionID = opts.ResumeSessionID across all three backends, so every subsequent session/prompt is addressed to a session id the backend has no record of. The task fails with JSON-RPC -32603 (Internal error) on the very first turn, with no operator-visible signal that the problem is a session-id mismatch one layer down. The behavior is invisible: agent shows "started", then "failed" with a generic Internal error. Reproducing in production took repeated runs because nothing in the logs pointed at the silent reset. Fix: route all three ACP backends through a small `resolveResumedSessionID` helper that: - prefers the id the backend returned in its response (the canonical id; the one the backend will accept on the next call) - falls back to the requested id when the response is malformed, empty, or omits sessionId — defensive fallback so older / non-conforming backends (notably kiro's current session/load shape) behave identically to today - signals (via a bool) when the id changed, so the caller logs a Warn with `backend=<hermes|kimi|kiro>` and operators can grep for silent state resets to correlate them with task failures Why this is at the backend layer rather than the daemon's existing session-resume fallback: server/internal/daemon/daemon.go:1554-1566 already retries with a fresh session when resume fails, but it gates on `result.Status == "failed" && result.SessionID == ""`. 
The backend WILL hand back a result.SessionID — just the new one it silently committed to — so the daemon-level fallback never fires for this failure mode. The helper is also what session/new already uses (extractACPSessionID, documented in code as "Shared by all ACP backends"). session/new extracts the canonical id from the response; session/resume just didn't, until now. Coverage: - hermes.go: confirmed bug, root cause of -32603 in production - kimi.go: same code shape, same protocol method, same response schema as hermes (per extractACPSessionID's comment) — same bug - kiro.go: same code shape, different method (session/load). Current observed response doesn't include sessionId, so the defensive fallback means today's behavior is preserved. Routing through the same helper means a future kiro release that DOES return a sessionId on silent reset works the same way as hermes/kimi without another diff. Tests (server/pkg/agent/hermes_test.go — helper covers all three backends, no per-backend duplication): - TestResolveResumedSessionIDMatching — backend confirms requested id - TestResolveResumedSessionIDDifferent — backend returned a new id; caller is told to switch - TestResolveResumedSessionIDEmptyResponse — older / malformed body; defensive fallback to requested id (covers kiro's current shape) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 14:15:40 +08:00
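The three rules listed for `resolveResumedSessionID` in commit ee10c508fb (prefer the returned id, fall back to the requested id, report whether it changed) could look roughly like this. The function name is from the commit; the signature taking a raw JSON response body and the `sessionId` field layout are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resolveResumedSessionID (assumed signature) picks the session id to use
// after a resume call. It returns the id and whether it differs from the
// one that was requested.
func resolveResumedSessionID(requested string, response []byte) (string, bool) {
	var body struct {
		SessionID string `json:"sessionId"`
	}
	// Malformed, empty, or id-less responses fall back to the requested id.
	if err := json.Unmarshal(response, &body); err != nil || body.SessionID == "" {
		return requested, false
	}
	// Otherwise trust the backend's id; flag the change so the caller can warn.
	return body.SessionID, body.SessionID != requested
}

func main() {
	fmt.Println(resolveResumedSessionID("sess-a", []byte(`{"sessionId":"sess-a"}`)))
	fmt.Println(resolveResumedSessionID("sess-a", []byte(`{"sessionId":"sess-b"}`)))
	fmt.Println(resolveResumedSessionID("sess-a", []byte(`{}`)))
}
```

The boolean is what lets the caller emit a warning only when a backend has silently reset its state.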
Naiyuan Qing	140678c4b3	fix(web): redesign 404 + break NoAccessPage redirect loop (#2122 ) * refactor(web): rewrite 404 page using design tokens Replace editorial-style 404 (hardcoded cream/ink/terracotta colors, Instrument Serif font, fluid clamp() typography) with a minimal version using semantic tokens and the project's buttonVariants helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(workspace): break NoAccessPage redirect loop by clearing stale cookie The web proxy redirects / to /<lastSlug>/issues based on the last_workspace_slug cookie alone, with no access check. When a user gets evicted from a workspace, the cookie still points at it; clicking "Go to my workspaces" then loops: NoAccessPage -> / -> proxy -> same bad slug -> NoAccessPage. Clear the cookie on mount so the proxy falls through to the landing page, which resolves the correct destination via the workspace list. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(web): mark not-found as client to allow buttonVariants import buttonVariants is exported from a "use client" module, so calling it from a server component is rejected by Next 16's directive checks. Production build of /workspaces/new prerender failed because of this. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 14:15:13 +08:00
Bohan Jiang	b08594f2f6	fix(daemon): isolate runtime poll & heartbeat schedules per runtime (#2116) * fix(daemon): isolate runtime poll & heartbeat schedules per runtime A daemon serving multiple workspaces ran a single round-robin poll loop and a single HTTP heartbeat loop across every registered runtime. A 30s HTTP timeout for any one runtime serialized that delay across all the others — observed in production as one workspace's runtimes wedging every other workspace's runtimes on the same daemon. This change: - Replaces the shared runtime-set channel with a multi-subscriber watcher so taskWakeupLoop, heartbeatLoop, and pollLoop can each react to runtime-set changes independently. - Splits heartbeatLoop and pollLoop into supervisor + per-runtime worker goroutines. Each runtime owns its claim cadence and its heartbeat ticker, so a slow request on one runtime no longer blocks any other. - Staggers the per-runtime heartbeat first tick by a jittered delay up to one full interval to avoid a thundering herd at startup. - Sizes the WS writer channel to scale with the runtime count (max(16, 2N)) so a full per-runtime heartbeat batch always fits; the previous fixed 8-slot buffer dropped heartbeats whenever a daemon watched more than ~8 runtimes. Co-authored-by: multica-agent <github@multica.ai> fix(daemon): acquire execution slot only after ClaimTask, drain pollers before taskWG Two issues from review on the previous commit: 1. Acquiring the shared task slot before ClaimTask reintroduced the very head-of-line blocking the refactor was meant to remove. With MaxConcurrentTasks=1, a slow claim on one runtime parked the only slot for the duration of the HTTP timeout (up to 30s), starving every other runtime's claim attempts. Slots are now acquired after the claim returns a task; other runtimes' pollers stay free to claim. The already-dispatched task waits for a slot under MaxConcurrentTasks bounds, which is the same backpressure shape we had before. 2. 
pollLoop's shutdown path called taskWG.Wait immediately after cancelling pollers, but a poller could still be between ClaimTask returning a task and taskWG.Add(1). When taskWG's counter is zero that races with Wait — undefined sync.WaitGroup misuse, sometimes panic. Added a pollerWG so the supervisor blocks until every poller goroutine has actually returned before reaching taskWG.Wait. Tests: - TestRunRuntimePollerIsolatesSlowRuntime now uses MaxConcurrentTasks=1 (was 4) so it would have failed under the old slot-before-claim path. - New TestPollLoopShutdownWaitsForPollersBeforeTaskWG drives the exact race window — claim returns a task at the same moment shutdown fires — under -race. Co-authored-by: multica-agent <github@multica.ai> * fix(daemon): acquire slot before ClaimTask so capacity-waiters never enter dispatched The previous commit moved slot acquisition AFTER ClaimTask to address a review concern about head-of-line blocking with MaxConcurrentTasks=1. That introduced a strictly worse failure mode: server-side ClaimTask flips the task to `dispatched` immediately (agent.sql:174-176), and the runtime sweeper fails any task in `dispatched` for >300s with `failed/timeout` (runtime_sweeper.go:25-28). When local execution capacity is full and the next claimed task can't acquire a slot within 5 minutes, the user sees the exact failure this issue is fixing — `dispatched_at` set, `started_at` NULL, `failure_reason=timeout`. Reverted to slot-before-claim. The trade-off is the original review concern: with MaxConcurrentTasks=1 and a slow ClaimTask, other runtimes' claims are delayed by up to client.Timeout=30s. That's a 30s polling delay, not a failure — server-side those tasks remain `queued` (no timeout in that state) until a slot frees. 30s ≪ 300s, so other runtimes' tasks cannot get sweeper-failed because of this. The pollerWG fix from the previous commit (avoiding sync.WaitGroup misuse on shutdown) is preserved. 
Tests: - TestRunRuntimePollerIsolatesSlowRuntime: MaxConcurrentTasks back to 4 (the pre-issue baseline) — the headroom case where slot-before-claim still gives full per-runtime isolation. - New TestRunRuntimePollerSkipsClaimWhenAtCapacity: holds the only slot and verifies the poller never calls ClaimTask while sem is empty. The previous "claim first" path would have failed this. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-06 14:13:27 +08:00
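The slot-before-claim ordering that commit b08594f2f6 finally settles on can be illustrated with a token channel as the semaphore. This is a non-blocking simplification with hypothetical names (`pollOnce`, a `claim` callback); the daemon's real poller is not shown in the log and presumably waits for capacity rather than returning immediately.

```go
package main

import "fmt"

// pollOnce (hypothetical) takes an execution slot from sem before calling
// claim. If no slot is free, claim is never called, so no task is flipped
// to "dispatched" on the server while the daemon lacks capacity to run it.
func pollOnce(sem chan struct{}, claim func() (string, bool)) (string, bool) {
	select {
	case <-sem: // slot acquired
	default:
		return "", false // at capacity: skip the claim entirely
	}
	task, ok := claim()
	if !ok {
		sem <- struct{}{} // nothing to run: hand the slot back
		return "", false
	}
	return task, true // the caller returns the slot when the task finishes
}

func main() {
	sem := make(chan struct{}, 1) // one slot in total, currently taken
	claim := func() (string, bool) { return "task-1", true }

	_, ok := pollOnce(sem, claim)
	fmt.Println(ok)

	sem <- struct{}{} // a running task finishes and frees the slot
	task, ok := pollOnce(sem, claim)
	fmt.Println(task, ok)
}
```

The trade-off named in the commit still applies: a slow claim holds the slot for the duration of the request, which delays other pollers but cannot strand a claimed task.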
Bohan Jiang	a4fac51cf5	fix(projects): add resource_count breadcrumb instead of inlining resources (#2118 ) * fix(projects): add resource_count breadcrumb to project responses Closes #2087 `multica project get` previously returned project metadata with no signal that resources existed. Agents that fetched a project this way had no way to discover its attached resources without already knowing about `/api/projects/{id}/resources` or the on-disk `.multica/project/resources.json`. Rather than inline the full resource list into the parent payload (which conflates parent metadata with a child sub-collection and locks the resource_ref shape into the project endpoint's contract), this adds a scalar `resource_count` breadcrumb to ProjectResponse. The actual list stays at the dedicated sub-collection endpoint. Changes: - GetProjectResourceCounts :many — new batched sqlc query - ProjectResponse.ResourceCount populated in GetProject, ListProjects, SearchProjects, and the with-resources CreateProject echo - multica project get prints a stderr hint pointing at multica project resource list <id> when count > 0; the JSON on stdout stays parseable - Meta-skill (runtime_config.go) lists multica project get and multica project resource list in Available Commands so agents that read CLAUDE.md / AGENTS.md know about both paths Co-authored-by: multica-agent <github@multica.ai> * fix(projects): wire ResourceCount through Update + Create event payload Review feedback on #2118. - UpdateProject now reloads ResourceCount before responding/publishing. Previously a title- or status-only PUT served (and broadcast over WS) resource_count: 0 even when resources existed. - The with-resources CreateProject path sets resp.ResourceCount before the project:created publish, so the WS event payload matches the HTTP echo. The hand-rolled response map collapses to an embedded ProjectResponse + resources array — one source of truth for the serialized shape. 
- packages/core/types/project.ts: Project gains resource_count: number to keep the TS contract aligned with the server response. Tests: - TestProjectResourceCountBreadcrumb extends to assert UpdateProject preserves the breadcrumb. - TestCreateProjectWithResourcesEchoesCount asserts the create echo carries resource_count matching the attached resources. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-06 14:09:35 +08:00
Multica Eve	2b967338a8	fix(runtimes): narrow CostCell usage window from 180d to 14d (#2119 ) The runtimes list page renders a CostCell per row that only displays a 7d cost total plus a 7d-vs-prior-7d delta. Until now each cell still fetched a 180d usage window so the cache key matched the runtime-detail page (clicking a row would pre-warm detail). The side effect was N parallel 180d in-line aggregations against task_usage on every list visit, one per runtime, which dominated DB load for this view. Switch the cell to a 14d window — exactly the data it actually needs for cost7d + costPrev7d. Detail still owns its own 180d query; the worst case after this change is one extra request on first navigation into detail, in exchange for a large steady-state reduction on the list page (down to 14d × N instead of 180d × N, ~13× fewer rows scanned per request). This is the frontend half of the runtime-usage perf work tracked in MUL-1748. The backend index + daily rollup changes will land separately. Co-authored-by: Eve <eve@multica.ai> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: multica-agent <github@multica.ai>	2026-05-06 13:56:58 +08:00
Bohan Jiang	6ef9be10d6	fix(chat): expose History panel + delete affordance from chat header (#2117 ) ChatSessionHistory was already implemented but unreachable: nothing in the app rendered it and there was no UI to toggle showHistory. The trash icon on each session row was therefore invisible. Adds a History icon button to the chat-window header that toggles the panel; when on, it renders ChatSessionHistory in place of the message list and input. Per-row delete (hover trash + AlertDialog) works as designed. Co-authored-by: multica-agent <github@multica.ai>	2026-05-06 13:36:06 +08:00
Bohan Jiang	60b215f44f	feat(chat): support deleting chat sessions (#2115 ) * feat(chat): support deleting chat sessions Replaces the unreachable archive endpoint with a real hard delete and exposes it from the chat history panel. - DELETE /api/chat/sessions/{id} now hard-deletes the session and its messages (CASCADE), cancels any in-flight tasks before removal so the daemon doesn't keep running work whose result has nowhere to land, and broadcasts chat:session_deleted. - Frontend adds a per-row delete button with a confirmation dialog, optimistically drops the session from both list caches, and clears the active session pointer locally + on other tabs via the WS handler. Co-authored-by: multica-agent <github@multica.ai> * fix(chat): make session delete atomic and keep archived sessions read-only Address review feedback on #2115. - DeleteChatSession now runs lock + cancel + delete in a single tx and only broadcasts events post-commit. The new LockChatSessionForDelete query takes FOR UPDATE on chat_session, which blocks the FK validation of any concurrent SendChatMessage trying to enqueue a task for this session — that insert fails after we commit, so it can no longer produce an orphaned task whose chat_session_id is nulled by ON DELETE SET NULL. Cancel failure now aborts the delete instead of warn-and-continue. - SendChatMessage refuses non-active sessions again. The archive code path is gone, but legacy rows with status='archived' may still exist in the DB; keep the guard until we explicitly migrate them. - Frontend re-reads allChatSessionsOptions to disable ChatInput on legacy archived sessions so the UX matches the server-side guard. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-06 13:22:53 +08:00
Bohan Jiang	f1082b10a4	feat(cli): add --assignee-id / --to-id / --user-id for unambiguous targeting (#2114 ) * feat(cli): add --assignee-id / --to-id / --user-id for unambiguous targeting `multica issue {create,update,list}`, `issue assign`, and `issue subscriber {add,remove}` accepted only fuzzy name matching, which fails in workspaces where one user's name is a substring of another (e.g. agent "J" vs "Cursor - J" / member "Jiayuan"). #1642 added UUID acceptance through the existing flags, but there was still no explicit path that signals "this is a UUID, not a name" — important for scripts that read IDs from `multica workspace members --output json`. Adds an `-id`-suffixed counterpart for every assignee-taking flag: - `issue list` : --assignee-id - `issue create` : --assignee-id - `issue update` : --assignee-id - `issue assign` : --to-id - `issue subscriber {add,remove}` : --user-id The new flags route through `resolveAssigneeByID`, a strict resolver that requires a canonical UUID and fails with a clear error when the entity is not in the workspace (no name fallback). A shared `pickAssigneeFromFlags` helper enforces mutual exclusion between the name and id flags so a script that accidentally sets both never silently applies one over the other. Refs MUL-1254. Co-authored-by: multica-agent <github@multica.ai> * fix(cli): detect assignee flag presence via Changed, not value-emptiness `pickAssigneeFromFlags` previously branched on `flag value != ""`, so explicitly passing an empty UUID silently routed through the "no flag set" path: multica issue list --assignee-id "" # listed every issue multica issue create --assignee-id "" # created an unassigned issue multica issue subscriber add --user-id "" # subscribed the caller This is exactly the failure mode the strict-UUID flag was added to prevent — a script interpolating `--assignee-id "$MAYBE_UUID"` against a missing env var should fail loudly, not silently degrade to a different operation. 
Switch the picker (and the assign-command top-level guard) to use `Flags().Changed`, so an explicit empty value reaches `resolveAssigneeByID` / `resolveAssignee` and surfaces a clear "expected a canonical UUID" / "no member or agent found matching" error. Co-authored-by: multica-agent <github@multica.ai> * docs(cli): cover --assignee-id / --to-id in user docs and quick-create prompt Follow-up to the --*-id flag rollout: surface the new flags everywhere the old ones are documented so users (and agents) can discover them. - assigning-issues.{mdx,zh.mdx}: the page explicitly calls out the duplicate-name footgun ("first one listed wins, so rename before assigning") — replace that workaround with a --to-id <uuid> example - cloud-quickstart.{mdx,zh.mdx}: add a --to-id hint after the substring-match callout so first-time users learn about the strict path - internal/daemon/prompt.go (quick-create injected prompt): - default-to-self: pass --assignee-id <task.Agent.ID> instead of --assignee <name>; the picker agent's UUID is already in scope and UUID matching is unambiguous in workspaces with overlapping agent names (J / Cursor - J / Pi - J etc.) - user-named: tell the agent to prefer --assignee-id <uuid> using the user_id/id from the JSON it already fetched; --assignee <name> stays a fallback for unambiguous workspaces Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-06 13:13:36 +08:00
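The difference between "flag was passed" and "flag value is non-empty" from commit f1082b10a4 can be shown with the standard library alone. This sketch uses `flag.FlagSet.Visit`, which only visits flags that were explicitly set, as a stand-in for the `Flags().Changed` check named in the commit; `pickAssignee` and its return shape are hypothetical.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// pickAssignee (hypothetical) decides which assignee flag applies.
// It branches on whether a flag was explicitly set, not on whether its
// value is empty, so `--assignee-id ""` still reaches the strict id path.
func pickAssignee(args []string) (kind, value string, err error) {
	fs := flag.NewFlagSet("issue", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of the demo output
	name := fs.String("assignee", "", "fuzzy name match")
	id := fs.String("assignee-id", "", "canonical UUID")
	if err := fs.Parse(args); err != nil {
		return "", "", err
	}

	set := map[string]bool{}
	fs.Visit(func(f *flag.Flag) { set[f.Name] = true })

	switch {
	case set["assignee"] && set["assignee-id"]:
		return "", "", errors.New("--assignee and --assignee-id are mutually exclusive")
	case set["assignee-id"]:
		return "id", *id, nil // may be "", the strict resolver should reject it
	case set["assignee"]:
		return "name", *name, nil
	}
	return "none", "", nil
}

func main() {
	fmt.Println(pickAssignee(nil))
	fmt.Println(pickAssignee([]string{"--assignee-id", ""}))
	fmt.Println(pickAssignee([]string{"--assignee", "J", "--assignee-id", "x"}))
}
```

With a value-emptiness check, the second call would be indistinguishable from the first, which is the silent degradation the commit describes.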
LinYushen	44a0ced558	fix(runtime): persist CLI update requests in Redis (#2113) Co-authored-by: multica-agent <github@multica.ai>	2026-05-06 13:00:11 +08:00
Bohan Jiang	89b939b07c	fix(storage): build region-qualified S3 public URLs (#2051 ) (#2065 ) * fix(storage): build region-qualified S3 public URLs (#2051) The uploadedURL fallback (no CloudFront, no custom endpoint) wrote "https://<bucket>/<key>" — missing the ".s3.<region>.amazonaws.com" suffix — so any deployment that pointed S3_BUCKET at a real AWS bucket without a CDN got broken image URLs back to the client. Avatar URLs were persisted in this broken form on the user/agent rows, so profile pictures uploaded via the SDK never rendered. - Track S3_REGION on S3Storage and emit https://<bucket>.s3.<region>.amazonaws.com/<key> by default; fall back to path-style https://s3.<region>.amazonaws.com/<bucket>/<key> when the bucket name contains dots, since the AWS wildcard cert can't validate dotted virtual-hosted hosts. - Teach KeyFromURL to recognise the new region-qualified hosts (both styles) and keep recognising the legacy bucket-only host so historical records can still be deleted/migrated. - Document that S3_BUCKET is the bucket name only, not a hostname, in env-vars docs (en+zh), self-hosting guides, and .env.example. Co-authored-by: multica-agent <github@multica.ai> * feat(storage): warn at startup when S3_BUCKET looks like a hostname Catches the most common misconfiguration shape (S3_BUCKET set to "<bucket>.s3.<region>.amazonaws.com") with a startup log line so operators don't silently end up with a config that signs uploads against an invalid bucket name. A real bucket name can never legitimately contain "amazonaws.com", so the check is a single substring match — no false positives worth carving out. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-06 12:45:55 +08:00
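The URL shapes described in commit 89b939b07c, written out as a small Go sketch. The two URL formats and the "amazonaws.com" substring check come from the commit message; the function names `publicURL` and `bucketLooksLikeHostname` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// publicURL (hypothetical) builds a region-qualified S3 object URL.
// Bucket names containing dots use path-style addressing, because the
// AWS wildcard certificate cannot validate dotted virtual-hosted hosts.
func publicURL(bucket, region, key string) string {
	if strings.Contains(bucket, ".") {
		return fmt.Sprintf("https://s3.%s.amazonaws.com/%s/%s", region, bucket, key)
	}
	return fmt.Sprintf("https://%s.s3.%s.amazonaws.com/%s", bucket, region, key)
}

// bucketLooksLikeHostname (hypothetical) mirrors the startup warning:
// a real bucket name never contains "amazonaws.com".
func bucketLooksLikeHostname(bucket string) bool {
	return strings.Contains(bucket, "amazonaws.com")
}

func main() {
	fmt.Println(publicURL("my-bucket", "us-east-1", "avatars/a.png"))
	fmt.Println(publicURL("assets.example", "eu-west-1", "avatars/a.png"))
	fmt.Println(bucketLooksLikeHostname("my-bucket.s3.us-east-1.amazonaws.com"))
}
```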
Bohan Jiang	8b0eeb0615	fix(projects): show URL tooltip on already-attached repos in Add Resource list (#2111 ) The repo button in the Add Resource popover used the native `disabled` attribute when a repo was already attached. Browsers suppress pointer events on disabled form controls, so the tooltip on the URL text never fired for attached rows — the issue spec calls out "hovering over any URL should also show the complete URL in a tooltip". Switch to `aria-disabled` plus a click guard so the row still announces as disabled to assistive tech, looks the same visually, and is no longer click-able, but hover still reaches the tooltip trigger. Co-authored-by: multica-agent <github@multica.ai>	2026-05-06 12:00:27 +08:00
Ákos Seres	64c605e227	fix(execenv): write OpenCode skills to .opencode/skills/ for native discovery (#2016) * fix(execenv): write OpenCode skills to .opencode/skills/ for native discovery * fix(repocache): exclude OpenCode skill directory	2026-05-06 11:48:06 +08:00
Cong Vu Chi	820d57535e	feat(desktop): load runtime self-host config (#2012 ) * feat(desktop): load runtime self-host config Co-authored-by: multica-agent <github@multica.ai> * docs: document desktop runtime self-host config Co-authored-by: multica-agent <github@multica.ai> * fix(desktop): address runtime config review feedback Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: Cheese <congvc@congvc-c00.taila6fa8a.ts.net> Co-authored-by: multica-agent <github@multica.ai> Co-authored-by: congvc <congvc-dev@gmail.com>	2026-05-06 11:39:36 +08:00
Bohan Jiang	a7299bf857	refactor(projects): pass projectId prop to ProjectIssuesContent (#2110) Replace `scope.replace("project:", "")` with the `projectId` already held by `ProjectDetail`, so the create-issue handler in the empty state no longer depends on the `project:<id>` scope-string format. Co-authored-by: multica-agent <github@multica.ai>	2026-05-06 11:36:45 +08:00
Yash Soni	baac4080e9	fix(installer): correct Windows version parsing and checksum decode (#2093) Closes #2092	2026-05-06 11:36:25 +08:00
Kagura	99f6cb8130	fix(projects): add New Issue button to empty project state and URL tooltips to resources (#2080 ) When a project has no issues, show a [+ New Issue] button that opens the create-issue dialog with the project pre-selected. Previously users had to navigate to the issues page and manually assign the project. Also add tooltips to repository URLs in the Resources section so truncated URLs can be read in full on hover. Fixes #2078	2026-05-06 11:33:26 +08:00
ASDFGHoney	b5f1e506e5	fix(views): split desktop/mobile sidebar state in project-detail (#2067 ) Mobile project-detail mounted its <Sheet> with open=true for one render — useIsMobile() reports false on first render and flips to true on the next, so the mobile branch briefly mounted Base UI Dialog open, painted its fixed inset-0 z-50 backdrop and locked scroll. The follow-up useEffect toggled it closed within the same animation cycle, leaving Dialog's pointer-events/inert/scroll-lock state stuck on mobile. Mirror packages/views/issues/components/issue-detail.tsx by keeping desktopSidebarOpen (default true) and mobileSidebarOpen (default false) as separate states, binding the mobile <Sheet> to mobileSidebarOpen only. The single-state pattern dates back to #1087, where issue-detail and project-detail received mobile-Sheet support together but only issue-detail used split state.	2026-05-06 11:27:45 +08:00
Thanh Minh	00cde21724	fix(views): hide archived agents from runtime detail (#2097)	2026-05-06 11:23:56 +08:00
Jiayuan Zhang	1476c268dd	refactor(quick-create): exempt git-describe daemons from CLI gate (#2108 ) * refactor(quick-create): remove daemon CLI version gate Local-source daemons report dev-suffixed versions (e.g. v0.2.15-235-gdaf0e935) that the picker pre-check and server gate both treat as too old, blocking quick-create during local testing. Drops the gate end-to-end: removes MinQuickCreateCLIVersion + CheckMinCLIVersion in pkg/agent, the checkQuickCreateDaemonVersion handler and readRuntimeCLIVersion helper in handler/issue.go, and the mirrored cli-version.ts plus the modal's pre-check, blocked-state UI, and daemon_version_unsupported error branch. Co-authored-by: multica-agent <github@multica.ai> * refactor(quick-create): skip daemon CLI version gate in dev Restores the gate (reverts the full-removal commit) and bypasses it in non-production environments instead. The motivation for the original removal — local source-built daemons report a `git describe` version like v0.2.15-N-gHASH that parses below 0.2.20 and blocks dev testing — is now handled by checking APP_ENV on the server and NODE_ENV on the client. Production keeps the original "needs upgrade" UX. Co-authored-by: multica-agent <github@multica.ai> * refactor(quick-create): exempt git-describe daemons instead of env bypass Replaces the per-environment bypass added in the previous commit with a shared daemon-version signal. CheckMinCLIVersion / checkQuickCreateCliVersion now treat any daemon whose CLI version matches the `vX.Y.Z-N-gHASH[-dirty]` git-describe shape as OK; tagged releases keep going through the normal min-version comparison. Why: Emacs flagged that (a) NODE_ENV !== "production" also disables the gate on staging and other non-prod deployments, undoing the protection for the case the gate was originally written for, and (b) NODE_ENV (web client) and APP_ENV (server) are not equivalent, so the modal pre-check and server gate could disagree on the same request. 
Both go away when the signal is intrinsic to the daemon's version string. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-06 09:00:11 +08:00
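The `vX.Y.Z-N-gHASH[-dirty]` shape that commit 1476c268dd exempts from the version gate could be recognized with a regular expression along these lines. The exact pattern and the name `isGitDescribeVersion` are assumptions; the repository's real `CheckMinCLIVersion` logic is not shown in the log.

```go
package main

import (
	"fmt"
	"regexp"
)

// gitDescribeRe is an assumed pattern for `git describe` output:
// a tag, a commit count, "g" plus an abbreviated hex hash, and an
// optional "-dirty" suffix.
var gitDescribeRe = regexp.MustCompile(`^v\d+\.\d+\.\d+-\d+-g[0-9a-f]+(-dirty)?$`)

// isGitDescribeVersion (hypothetical) reports whether v looks like a
// source-built daemon version rather than a tagged release.
func isGitDescribeVersion(v string) bool {
	return gitDescribeRe.MatchString(v)
}

func main() {
	for _, v := range []string{"v0.2.15-235-gdaf0e935", "v0.2.15-235-gdaf0e935-dirty", "v0.2.25"} {
		fmt.Println(v, isGitDescribeVersion(v))
	}
}
```

Tagged releases such as `v0.2.25` do not match and would continue through the normal minimum-version comparison.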
Jiayuan Zhang	9a5f5ca498	fix(views): coalesce repeated task_completed/task_failed activity entries (#2044 ) Consecutive "completed the task" entries from the same agent now merge into a single line showing the count (e.g. "completed the task (7 times)") regardless of time gap. Other activity types keep the existing 2-minute coalescing window. Closes MUL-1709	2026-05-06 02:11:43 +02:00
Bohan Jiang	daf0e935f6	fix(views): show Ctrl+K / Ctrl+Enter on non-Mac platforms (#2060 ) The sidebar search trigger, quick-create-issue modal, and feedback modal hardcoded the Mac glyphs (⌘, ↵) for their keyboard hints, so Windows and Linux users always saw Mac shortcuts even though the underlying handlers already accept metaKey || ctrlKey. Extract a small platform helper (isMac, modKey, enterKey, formatShortcut) in packages/core/platform/keyboard.ts and route all four affected sites (plus the editor bubble menu, which had the same logic inlined) through it, so non-Mac users see Ctrl+K, Ctrl+Enter, etc. Closes multica-ai/multica#2056 v0.2.25	2026-05-04 21:26:00 +08:00
Bohan Jiang	5c42ed1649	fix(server): allow re-inviting after invitation expires (#2059 ) The uniqueness check on workspace invitations only filtered by status='pending', not by expires_at. Combined with the partial unique index idx_invitation_unique_pending (also keyed only on status), a past-due pending row permanently blocked re-inviting the same email. Now, before creating a new invitation, the handler flips any past-due pending row for the same (workspace_id, invitee_email) to 'expired', freeing the unique slot. Also tightens GetPendingInvitationByEmail to require expires_at > now(), matching the existing list queries. Closes multica-ai/multica#2055.	2026-05-04 21:24:56 +08:00
Dingyj3178	a57dd76faf	fix(views): improve mobile responsiveness for agents and settings (#2036 ) * feat(agents): make agent detail page mobile responsive (#1) Stack the inspector + overview pane vertically below md, switch the shell to page-level scroll so the inspector flows naturally, give the overview pane a min-h-[60vh] floor so tabs stay usable, and let the 5-tab nav scroll horizontally on narrow viewports. * fix(settings): make Repositories tab and Settings shell mobile-responsive (#2) The Settings shell used a fixed w-52 sidebar with no responsive behavior, leaving almost no room for tab content on phone-width viewports. Stack the nav above the content on mobile, scale inner padding, and let the Repositories tab's input/button rows wrap rather than overflow.	2026-05-04 21:24:07 +08:00
Bohan Jiang	c24191a884	fix(editor): keep blank-line paste inside the code block (#2058 ) Pasting `line1\n\nline2` while the caret was inside a code block ran the text through the Markdown parser, which split on the blank line and tore the code block open, dropping the trailing content into a sibling paragraph. Detect the codeBlock parent on `handlePaste` and insert the clipboard text verbatim instead. Code blocks have `code: true`, so newlines stay literal — exactly what users expect when pasting code or logs. Closes #1982	2026-05-04 21:12:14 +08:00
Kagura	629f4136ac	fix(codex): handle MCP elicitation server requests correctly (#1944 ) * fix(codex): handle MCP elicitation server requests correctly Fixes #1942. handleServerRequest responded with {} to unrecognized Codex server requests including mcpServer/elicitation/request. Codex 0.125+ expects {action, content, _meta} for elicitation — the empty object causes a deserialization error and the MCP tool call is reported as user-rejected. Changes: - Add mcpServer/elicitation/request case with correct response schema - Add respondError helper for JSON-RPC error responses - Return proper JSON-RPC method-not-found error for unknown server requests instead of silent empty object - Add tests for MCP elicitation and unknown method handling * fix: use cfg.Logger instead of global slog in codex handleServerRequest Switch the unhandled-server-request warning from global slog.Warn to c.cfg.Logger.Warn for consistency with all other log calls in codex.go. This ensures the warning appears in daemon run-logs and per-task pipelines where operators look during triage.	2026-05-04 21:05:37 +08:00
ASDFGHoney	cb078c0f36	fix(core): patch byIssue label cache on WS label change (#2048 ) `onIssueLabelsChanged` patched the embedded `labels` field in the issue list and detail caches but never touched `labelKeys.byIssue`, the cache backing the issue-detail Properties LabelPicker. Mutations already covered all three caches; WS-driven changes (agents, other tabs) left the picker stale until remount, since `staleTime: Infinity` plus `refetchOnWindowFocus: false` prevent recovery on focus.	2026-05-04 20:51:02 +08:00
ayakabot	e13e5edc8e	fix(issues): trimEnd comparison on blur to avoid unnecessary updates (#2054 ) Fixed: #2053	2026-05-04 20:50:39 +08:00
Manu	fee393df1f	fix(views): show full repo URLs in project creation (#2045 )	2026-05-04 20:50:17 +08:00
ayakabot	1ff4e27e77	feat(quick-create): cache agent prompt draft across navigation (#2039 ) When creating an issue with agent, the input content was lost when navigating away (e.g., to view a ticket) and returning. Manual create already persisted its draft - now agent create does too. Changes: - Add prompt field to useQuickCreateStore (persisted with workspace) - AgentCreatePanel reads initial prompt from draft store if no transient data.prompt is provided - onUpdate now saves prompt to draft store (not just hasContent) - clearPrompt() called after successful submit Fixes: #1957	2026-05-04 00:03:27 +02:00
Jiayuan Zhang	fbf9460d5e	feat(chat): support fullscreen expand mode (#2043 ) * feat(chat): support fullscreen mode similar to Linear When the expand button is clicked, the chat window now fills the entire content area (inset-0) instead of scaling to 90% of parent. Resize handles are hidden in fullscreen mode. * fix(chat): use stacked card layout for fullscreen mode Fullscreen chat now uses inset-3 with rounded corners, ring, and shadow to create a stacked card effect on top of the content area — matching the Linear design — instead of a flush inset-0 fill. * feat(chat): add motion.dev spring animations for expand/collapse - Install `motion` in @multica/views - Replace CSS transitions with motion.div layout animation for expand/collapse (spring-based FLIP), giving a natural bouncy feel - Open/close uses spring scale + smooth opacity fade - Layout animations are disabled during drag-to-resize (instant updates) * fix(chat): remove spring bounce from expand/collapse animation Use critically damped springs (bounce: 0) so the animation settles directly at its target without overshooting. * fix(chat): fix text distortion during expand/collapse animation Use layout="position" instead of layout (full FLIP). Full FLIP uses scale transforms to animate size changes, which distorts text and child content. Position-only layout animates translate only — size changes are instant, text stays crisp. * fix: regenerate lockfile with pnpm@10.28.2 The lockfile was previously generated with pnpm 10.12.4, causing unrelated churn (lost libc constraints, deprecated metadata). Reset to main and regenerated with the repo's pinned pnpm@10.28.2 so the diff is scoped to the new motion dependency only.	2026-05-03 22:56:22 +02:00
Jiayuan Zhang	d492b9d7a6	Revert "feat(quick-create): add preset issue fields (#2002 )" (#2042 ) This reverts commit `a039c4d803`.	2026-05-03 20:02:40 +02:00
Bohan Jiang	3dc3e49a47	fix(daemon): remove Co-authored-by hook when workspace setting is off (#2035 ) * fix(daemon): remove Co-authored-by hook when workspace setting is off The prepare-commit-msg hook is installed in the bare repo's shared hooks dir, so once installed it persists across worktrees. CreateWorktree only installed the hook when the setting was enabled, but never removed it — so disabling the workspace toggle had no effect on subsequent commits. Add removeCoAuthoredByHook and call it in both CreateWorktree branches when the setting is disabled. Use a marker comment in the hook script so removal only deletes hooks the daemon owns; user-installed hooks at the same path are left alone. Co-authored-by: multica-agent <github@multica.ai> * fix(daemon): recognize legacy Multica prepare-commit-msg hook on removal The first cut of removeCoAuthoredByHook only recognized hooks installed by the new code (containing the multicaHookMarker sentinel). Bare clones already on disk from previous daemon releases carry the older script without that line, so toggling the workspace setting off would have treated them as user hooks and left the trailer in place — exactly the state reported in MUL-1704. Match against a list of known daemon signatures (current marker + the legacy "Installed by the Multica daemon." comment), and add a test that seeds the verbatim legacy hook before CreateWorktree(... disabled) to keep recognition aligned with what production hosts actually have on disk. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-03 21:09:16 +08:00
Bohan Jiang	ae9098637d	feat(analytics): suppress PostHog $pageview on desktop tab/workspace switches (#2033 ) * feat(analytics): suppress PostHog $pageview on desktop tab/workspace switches Desktop tab switches were emitting a $pageview every time the user clicked between already-open tabs (or workspaces), since the tracker fired on any change to the resolved active path. Real-data audit showed this was the single largest source of PostHog quota burn — desktop accounted for 51% of all $pageviews at ~34 pv/user/30d vs web's ~10 — and the re-emitted paths add no signal because the original navigation already fired. Detect "tab switch" as `(workspace, tabId)` identity changing while the surface stays `tab`, and skip the capture in that case while still updating the ref so the next in-tab navigation compares against the right baseline. Login transitions, overlay open/close, and intra-tab navigation continue to fire as before. Co-authored-by: multica-agent <github@multica.ai> * fix(analytics): only suppress $pageview for re-activations of known tabs Prior commit suppressed every (workspace, tabId) change while the surface stayed `tab`, which also swallowed the first $pageview for newly opened tabs (`openInNewTab` / `addTab`) and for cross-workspace `switchWorkspace` into a not-yet-seen tab. Track an observed `(workspace, tabId) → path` map seeded from the persisted tab store on mount. Suppress only when the active key is already in the map AND its recorded path matches the current path — i.e. genuine re-activation of an already-known tab. New tabs and cross-workspace navigation to a fresh tab now correctly emit one pageview. Adds a vitest covering the three behaviors GPT-Boy flagged plus the intra-tab navigation, overlay/login transitions, and persistence-restored mount paths. Wires the `@/` alias into `vitest.config.ts` so component tests can resolve renderer-relative imports. 
Co-authored-by: multica-agent <github@multica.ai> * refactor(analytics): reuse tab-store helpers and inline observed-tabs seed Replace the two ad-hoc tab selectors with the existing `useActiveTabIdentity()` + `getActiveTab()` helpers from tab-store, which already provide the (slug, tabId) primitive pair and the active tab lookup with the same stability guarantees. Move the observed-tabs Map seeding from a useEffect into a synchronous first-render initializer. The seed runs once per mount before any state-driven effect, so the previous useEffect-then-defensive-fallback pattern in the second effect was unreachable. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-03 20:54:29 +08:00
Kagura	cc94fbd305	fix: handle square brackets in agent names for mention parsing (#1992 ) * fix: handle square brackets in agent names for mention parsing (#1991) The mention regex used [^\]]* to match labels, which broke when agent names contained square brackets (e.g. David[TF]). The ] inside the name caused the regex to stop matching prematurely, silently dropping the mention. Changes: - Backend (mention.go): Switch to .+? (non-greedy) anchored on ](mention:// to correctly match labels with brackets - Frontend (mention-extension.ts): Same regex fix in tokenizer, plus escape [ and ] in renderMarkdown to prevent creating ambiguous markdown syntax - Add comprehensive tests for ParseMentions covering bracket names Fixes #1991 * fix: add optional chaining for match group access Fixes TS2532: Object is possibly 'undefined' on match[1] when calling .replace() in the mention tokenizer. * fix: tighten mention tokenizer to reject ordinary Markdown links - Replace .+? with (?:\\.|[^\]])+ in start() and tokenize() regexes so the label cannot cross a ]( Markdown link boundary - Escaped brackets (\[ \]) from renderMarkdown() are still accepted - Add frontend tokenizer/serializer round-trip tests: - Plain mention - Escaped brackets (David[TF]) round-trip - Normal Markdown link + mention on same line (regression) - Multiple links before mention - Nested brackets (Bot[v2][beta]) - Issue mentions without @ prefix Addresses review feedback on #1992. * fix: add type assertions for tiptap MarkdownTokenizer interface in tests The tiptap MarkdownTokenizer type allows start to be string | function and tokenize to accept 3 arguments. Our extension always provides single-arg functions, so cast them for TypeScript satisfaction. Fixes CI typecheck failure in @multica/views package. * fix: cast renderMarkdown to single-arg shape and reset file modes to 0644	2026-05-03 19:39:26 +08:00
ayakabot	a039c4d803	feat(quick-create): add preset issue fields (#2002 ) Fixed: #2001	2026-05-03 19:37:12 +08:00
Bohan Jiang	cf0d58ab50	docs(changelog): add 0.2.24 entry covering 0.2.22 → 0.2.23 → today (#2028 ) Folds together everything that landed since the last public changelog entry (0.2.21) into one 0.2.24 release note: repo checkout --ref, agent avatar CLI, Hermes per-turn gate, multi-replica model picker on Redis, Inbox long-timeline perf, and the rest of the smaller fixes queued for tonight's release. en.ts and zh.ts both updated. Co-authored-by: multica-agent <github@multica.ai> v0.2.24	2026-05-03 12:39:00 +08:00
furtherref	3fe3b84981	fix: hydrate agent cache after create (#2027 ) (cherry picked from commit `0ea425c6e4`)	2026-05-03 12:25:05 +08:00
Bohan Jiang	c4352da126	fix(daemon): drain background repo syncs before test teardown (#2026 ) TestRegisterTaskReposSurvivesWorkspaceRefresh started flaking on CI after #1988 (`feat: support repo checkout ref selection`) extended the bare-clone path to run an extra `git fetch` to backfill refs/remotes/origin/* under the new refspec layout. The race was already latent: registerTaskRepos kicks off `go syncWorkspaceRepos(...)` to clone a repo into the cache root, which in tests is `t.TempDir()`. Once the test waited on `repoCache.Lookup` to return a path it would proceed and return — but the bg goroutine was still inside `ensureRemoteTrackingLayout` running git operations on the clone dir. `t.TempDir`'s cleanup then races with those git commands and surfaces either as "directory not empty" or "fatal: cannot change to ... No such file or directory", with no hint that the failure is unrelated to the test's actual assertion. Track the background goroutine on the Daemon via a sync.WaitGroup and expose `waitBackgroundSyncs()` for tests. `newRepoReadyTestDaemon` registers a t.Cleanup that calls it, so every test that uses the helper now drains in-flight syncs before t.TempDir cleanup runs. No production-behavior change — registerTaskRepos still fires-and-forgets from the caller's perspective. Verified with `go test ./internal/daemon -run TestRegisterTaskReposSurvivesWorkspaceRefresh -count=30` (was failing within ~10 iterations before, 30 green after) and the full `go test ./internal/daemon/...` suite. Co-authored-by: multica-agent <github@multica.ai>	2026-05-03 12:24:56 +08:00
Bohan Jiang	d0c66f3173	perf(issue-detail): memoize timeline render to mitigate Inbox long-timeline freeze (#2025 ) * perf(issue-detail): memoize timeline render to fix Inbox long-timeline freeze On long-timeline issues (thousands of comments), opening from Inbox hard-freezes the browser tab because every WS-driven parent re-render re-runs the full react-markdown + rehype-* + lowlight pipeline for every comment. This is the S3 mitigation for multica#1968: - Wrap ReadonlyContent in React.memo so equal-content re-renders skip the markdown pipeline entirely (the dominant cost per comment). - Wrap CommentCard in React.memo so unrelated parent state updates don't re-render every card. - useMemo the timeline grouping in IssueDetail so the allReplies Map and groups array references are stable across re-renders that don't change timeline. - Stabilize toggleReaction via a timelineRef so its identity doesn't change on every WS event, which previously defeated CommentCard memoization. Virtualization (S2) is the root fix for first-paint cost and lands separately. Co-authored-by: multica-agent <github@multica.ai> * fix(issue-detail): destructure mutate/mutateAsync so CommentCard memo holds Per review on PR #2025: TanStack Query v5 returns a fresh result wrapper from useMutation on every render, with only the inner mutate / mutateAsync functions guaranteed stable. The previous useCallback dependencies listed the whole mutation object, so on every parent re-render the callbacks flipped identity — defeating React.memo on CommentCard and leaving the long-timeline mitigation only half-effective. Pull just the stable handles into deps. Add a renderHook-based regression test that re-renders useIssueTimeline twice and asserts the four callbacks passed to CommentCard keep the same identity. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-03 11:57:30 +08:00
Bohan Jiang	170fa2102b	fix(agent/hermes): wire streamingCurrentTurn gate to drop history replay (#2024 ) Hermes ACP can flush queued session updates from the previous turn before the current turn actually starts — both as session/resume history replay and as chunks queued before our session/prompt response streams. Without a gate those updates were appended to output and re-emitted to the UI, so the previous answer appeared duplicated next to the new one. Closes #1997. PR #1789 added the acceptNotification hook field to hermesClient and the call site in handleNotification, but never assigned it for Hermes, so the guard short-circuited and every notification was processed. This change mirrors the working Kiro pattern (kiro.go:87/97/240): - declare a streamingCurrentTurn atomic.Bool in the backend. - assign acceptNotification, onMessage, onPromptDone gates that all return early when the flag is false. - flip the flag to true immediately before c.request("session/prompt"). Adds TestHermesClientAcceptNotificationGate as a regression test that exercises the gate directly on hermesClient. Verified with `go test ./pkg/agent`. Co-authored-by: multica-agent <github@multica.ai>	2026-05-03 11:43:36 +08:00
Bohan Jiang	a414a00b4a	refactor(repocache): clarify resolveBaseRef comment and cover tag refs (#2023 ) Follow-up nits from PR #1988 review: - Move the comment that documents getRemoteDefaultBranch's resolution walk into the resolveBaseRef call site description, and rephrase the "" branch so it's clear that path only fires for the default-branch case (the requested-ref path returns an explicit error before reaching it). - Add TestCreateWorktreeWithRequestedTagRef to lock in the refs/tags/<ref> candidate. The test tags the initial commit, advances the default branch past it, then asserts the worktree HEAD matches the tagged commit (so the tag must have been resolved, not the default branch). Co-authored-by: multica-agent <github@multica.ai>	2026-05-03 11:30:25 +08:00
Prince Pal	862b0509df	feat: support repo checkout ref selection (#1988 )	2026-05-03 11:27:16 +08:00
Bohan Jiang	ba5b7db78e	fix(server): persist ModelListStore across replicas via Redis (#2022 ) * fix(server): persist ModelListStore across replicas via Redis The model picker uses a pending-request pattern: the frontend POSTs to create a request, the daemon pops it on its next heartbeat, runs agent.ListModels locally, and reports back. Until now the store was a plain in-memory map per Handler instance. That works for self-hosted single-instance deploys but fails in any multi-replica environment (Multica Cloud). Each replica has its own map, so: POST /runtimes/:id/models → request stored in replica A GET /runtimes/:id/models/<requestId> → polls land on B/C → 404 daemon heartbeat → only A sees PendingModelList POST .../<requestId>/result → daemon's report has to land on A Success probability ~1/N². The visible symptom is "No models available" in the picker for every provider, even those (Claude/Codex) whose catalog is statically populated end-to-end. Same shape of bug, same Redis-backed fix as multica-ai/multica#1557 did for LocalSkillListStore / LocalSkillImportStore. Reuse the operational playbook (namespaced keys, ZSET-backed pending queue, atomic ZREM+SET-running via the shared Lua script) so we don't introduce a second concurrency model for the same primitive. Changes: - Convert ModelListStore from struct to interface with context-aware methods. Add HasPending for cheap heartbeat-side probing. - InMemoryModelListStore — single-node fallback, used when REDIS_URL is unset (self-hosted dev / tests). - RedisModelListStore — multi-node implementation using the same key layout and Lua atomic claim as RedisLocalSkillListStore. - Use RunStartedAt (not UpdatedAt) as the running-timeout reference point, matching the local-skill stores so subsequent UpdatedAt bumps don't reset the running clock. - Heartbeat now uses the probe-then-pop pattern for the model queue (matching local-skills) so a slow Redis can't stall every connected daemon. 
Extends heartbeatMetrics + slow-log with probe_model_ms / pop_model_ms / probe_model_timed_out for parity. - Wire the Redis backend in NewRouterWithOptions when rdb != nil. - Tests for both backends. Redis tests gate on REDIS_TEST_URL so laptop runs without Redis still pass; CI provides it. Co-authored-by: multica-agent <github@multica.ai> * fix(server): persist RunStartedAt + retry model report on transient failures Two follow-ups from PR #2022 review: 1. RedisModelListStore was dropping ModelListRequest.RunStartedAt on persistence — the field is tagged json:"-" so it doesn't leak into the HTTP response, which made plain json.Marshal(req) silently discard it. Across-node readers saw RunStartedAt=nil and applyModelListTimeout's running branch became a no-op, so the 60s running-timeout escape hatch never fired. CI's TestRedisModelListStore_RunningTimeout was failing on this exact case. Fix mirrors RedisLocalSkillImportStore's envelope pattern — wrap in an internal struct that re-promotes the field. HTTP shape stays clean. Adds a no-Redis unit test that pins the round trip. 2. Daemon's handleModelList called d.client.ReportModelListResult directly and swallowed any 5xx, leaving the pending request stranded in "running" until its 60s server-side timeout — exactly the failure mode the multi-node store fix was meant to eliminate. Generalize the existing local-skill retry helper into reportRuntimeResultWithRetry (kind: model_list / local_skill_list / local_skill_import) and wire handleModelList through a new reportModelListResult helper. Renames the test-overridable var localSkillReportBackoffs → runtimeReportBackoffs to match. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-03 11:13:34 +08:00


2815 Commits