181 Commits

Author SHA1 Message Date
Bohan Jiang
aca74293dd fix(agent/claude): surface stderr tail on writeClaudeInput failure + lock with e2e test (#1698)
#1674 wired claude's post-handshake error path through withAgentStderr but
left the writeClaudeInput failure branch returning a bare "broken pipe"
error. That branch fires precisely when claude crashes during startup —
exactly when the stderr tail is most useful for root-causing V8 aborts,
Bun panics, or missing native modules. cmd.Wait() before sampling Tail()
flushes os/exec's internal stderr copy goroutine, matching the
Wait→Tail synchronization contract spelled out in stderr_tail.go.

Adds TestClaudeExecuteSurfacesStderrWhenChildExitsEarly mirroring the
codex test: a fake claude binary drains stdin, writes a V8-abort line to
stderr, and exits 3. Locks in the contract that Result.Error carries the
stderr tail in the post-handshake failure path on the claude backend too.
2026-04-26 11:09:38 +08:00
jmoney8896
3c3e3bd330 fix(task): reconcile agent status when cancelling tasks by issue (#1587) (#1648)
CancelTasksForIssue silently dropped the list of affected tasks, so
whenever an issue transitioned to "cancelled" or "done" while a task was
still active (6 call sites in issue.go), the underlying agent was left
stuck at status="working" indefinitely and required a manual
`multica agent update <id> --status idle` to self-correct. This matches
the symptom reported in #1587: task rows move to "cancelled" via a
non-user-initiated path, agent status never reconciles.

Change CancelAgentTasksByIssue from :exec to :many (also tack on
completed_at = now() for consistency with CancelAgentTasksByIssueAndAgent),
then update CancelTasksForIssue to iterate the returned rows and call
ReconcileAgentStatus + broadcast task:cancelled per affected task —
mirroring the pattern already used by CancelTask and RerunIssue.

No test added; the change is small and mirrors well-covered paths.
Happy to add a mock-backed test in a follow-up if reviewers prefer.

Refs #1587
Refs #1149
2026-04-26 10:58:42 +08:00
songlei
6f04a6d26b feat(agent): surface agent CLI stderr tail in failure messages (#1674)
Hoist the existing stderrTail ring-buffer (previously codex-only) into
a shared pkg/agent helper so every Backend that supervises a child CLI
can include the last ~2 KB of that CLI's stderr in Result.Error. Wire
the claude backend through the same path.

Motivation: claude on Windows occasionally exits with a non-zero status
after ~5–8 minutes of a single long-running tool_use, and right now the
daemon only reports "claude exited with error: exit status 3" /
"exit status 0x80000003" — useless for root-causing V8 aborts, Bun
panics, native-module OOMs, or any other CLI-side crash. With the tail
attached, the failure message carries the real signal (panic line, V8
assertion, stderr-printed HTTP error) all the way into the task row's
error field that users see in the API.

Renames withCodexStderr to withAgentStderr(msg, label, tail) so the
helper is self-documenting across providers.
2026-04-26 10:55:21 +08:00
Bohan Jiang
95912243bb test(daemon): cover cancelled classification in executeAndDrain (#1692)
Follow-up to #1686. Locks in two nits flagged during review:

1. agent.Result.Status doc comment now lists "cancelled" alongside the
   existing values, so the enum surface matches actual usage.
2. New TestExecuteAndDrain_ContextCancelled_ReportsCancelled exercises
   the path added in #1686: when the parent context is cancelled before
   the backend produces a Result, executeAndDrain must return
   Status="cancelled" (not "timeout"). A regression here would silently
   restore the misleading log line we just fixed.
2026-04-26 09:27:13 +08:00
Bohan Jiang
74593fdb88 fix(daemon): use CREATE_NEW_CONSOLE to stop grandchild console popups on Windows (#1521) (#1643)
* fix(daemon): use CREATE_NEW_CONSOLE to stop grandchild console popups on Windows (#1521)

CREATE_NO_WINDOW strips the console entirely. When the agent CLI then
spawns a console-subsystem grandchild (bash, cmd, netstat, findstr,
timeout) without itself passing CREATE_NO_WINDOW, Windows allocates a
brand-new visible console window per invocation — trading one popup per
agent run for N popups per tool call.

Switch to CREATE_NEW_CONSOLE + HideWindow=true so the agent gets a
hidden console that grandchildren inherit. Stdio pipes still work via
STARTF_USESTDHANDLES; no changes needed at the 17 hideAgentWindow call
sites.

Add a Windows-only regression test asserting CREATE_NEW_CONSOLE is set
and CREATE_NO_WINDOW is not, per the #1474 Windows-test follow-up.

Root-cause diagnosis by @matrenitski (verified against the shipped
multica.exe and the Claude Code CLI it spawns) in issue #1521.

* test(agent): use CREATE_NEW_CONSOLE-compatible flag in preservation test

CREATE_NEW_PROCESS_GROUP is silently ignored by Windows when combined
with CREATE_NEW_CONSOLE, so asserting it 'survives' was only bitwise-true,
not semantically meaningful. Switch the example to
CREATE_UNICODE_ENVIRONMENT (documented compatible) and also assert a
non-flag field (NoInheritHandles) survives to exercise full struct
preservation.
2026-04-25 01:40:15 +08:00
Naiyuan Qing
40cea8454d feat(autopilot): redesign modal — simpler schema, consistent schedule UI (#1595)
Drop priority and project_id from autopilot. project_id was never exposed
in the UI and priority duplicated the agent's own task queue priority.

Redesign the create/edit modal as a Runbook (left) + Configuration (right)
layout. Rework the Schedule section around a single visual shell so every
picker aligns pixel-for-pixel on the same row:

- TimeInput (new): segmented HH:MM control adapted from openstatusHQ/time-picker,
  driven by keyboard (ArrowUp/Down to step, ArrowLeft/Right to jump segment,
  digit typing with a 2s two-digit window). Replaces <input type="time">,
  whose native UI broke the design system. Supports a minuteOnly variant
  for hourly schedules.
- TimezonePicker (new): searchable Popover with a fixed-width left check
  slot so rows stay aligned and GMT offsets never collide with the selected
  indicator.
- Runbook editor now lives in a bordered card, giving the placeholder an
  input surface instead of bare document flow.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 11:05:33 +08:00
Kagura
6d9ca9de93 fix(daemon): suppress agent terminal windows on Windows (#1474)
* fix(daemon): suppress agent terminal windows on Windows (#1471)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add hideAgentWindow to detectCLIVersion and avoid SysProcAttr overwrite

- Add missing hideAgentWindow(cmd) call in detectCLIVersion (claude.go:554)
  so --version checks don't flash console windows on Windows.
- Refactor hideAgentWindow to preserve existing SysProcAttr fields
  instead of overwriting the entire struct.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-23 17:23:00 +08:00
LinYushen
d97aec83d7 fix: pass model to Hermes ACP and add hermes to InjectRuntimeConfig (#1203)
* fix: pass model to Hermes ACP session/new and add hermes to InjectRuntimeConfig

- hermes.go: include opts.Model in session/new params so Hermes uses
  the configured model instead of its default (fixes local LLM failures)
- runtime_config.go: add "hermes" to the AGENTS.md provider list so
  Hermes receives the Multica runtime instructions and skill discovery

Fixes: https://github.com/multica-ai/multica/issues/1195

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(hermes): drop false native-skill claim and add regression tests

The previous change added 'hermes' to the 'skills discovered automatically'
branch of buildMetaSkillContent, but resolveSkillsDir has no Hermes case so
skills still land in the .agent_context/skills/ fallback. AGENTS.md ended up
claiming native discovery while the files were somewhere else, which would
mislead Hermes (and future debuggers).

- Move 'hermes' to the fallback branch alongside 'gemini' so AGENTS.md points
  Hermes at .agent_context/skills/ — matching where writeContextFiles actually
  writes them.
- Extract buildHermesSessionParams so the session/new payload is unit-testable.
- Add regression tests covering:
  * buildHermesSessionParams includes/omits 'model' correctly
  * InjectRuntimeConfig('hermes') writes AGENTS.md with the fallback hint
  * writeContextFiles('hermes') writes skills to .agent_context/skills/

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: CC-Girl <cc-girl@multica.ai>
2026-04-23 12:43:30 +08:00
Naiyuan Qing
d6e7824ff1 feat(feedback): in-app feedback flow + Help launcher (#1546)
* feat(feedback): add in-app feedback flow and Help launcher

Replaces the duplicated bottom-sidebar user popover and "What's new" links
with a single Help menu (Docs / Feedback / Change log) pinned to the
sidebar footer. Feedback opens a rich-text modal that POSTs to a new
/api/feedback endpoint; submissions land in a dedicated feedback table
with per-user hourly rate limiting (10/hr) to deter spam without adding
middleware infrastructure. User identity (avatar + name + email) moves
into the workspace dropdown header so the sidebar is no longer visually
redundant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(feedback): harden submit path and cap request body

- Read editor markdown via ref at submit time instead of debounced state,
  so ⌘+Enter immediately after typing doesn't drop the last keystrokes.
- Block submission while images are still uploading; toast prompts the
  user to wait instead of silently sending markdown with blob: URLs
  that get stripped.
- Cap /api/feedback request body at 64 KiB via MaxBytesReader so an
  authenticated client can't bloat the metadata JSONB column with an
  oversized url field.
- Add Go handler tests covering happy path, empty-message rejection,
  and the hourly rate limit boundary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(analytics): instrument feedback funnel

Adds two events pairing frontend intent with backend conversion so we
can compute a completion rate for the in-app Feedback modal:

- `feedback_opened` (frontend) — fires once on FeedbackModal mount.
  Source is currently always "help_menu" but the type is a union so
  future entry points have to extend it explicitly. Workspace id is
  attached when present.
- `feedback_submitted` (backend) — fires from CreateFeedback after the
  DB insert succeeds and the hourly rate-limit check has passed.
  Message content itself is never sent to PostHog; the event carries
  a coarse length bucket (0-100 / 100-500 / 500-2000 / 2000+), an
  image-presence flag, and the client platform / version pulled from
  X-Client-* headers via middleware.ClientMetadataFromContext.

Affects no existing funnel; seeds a new Feedback funnel for product
triage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 10:35:55 +08:00
Bohan Jiang
7375bda9b5 fix(landing): scope landing route to always-light palette (MUL-1277) (#1537)
* fix(landing): scope landing route to always-light palette

The landing page sections use hardcoded light colors (bg-white / #0a0d12),
but shared components rendered inside — notably CloudWaitlistExpand on
/download — use semantic tokens that flip to dark values under next-themes'
`.dark` class, producing a mismatched dark card on an otherwise light page
when the user's OS is in dark mode.

Add a `.landing-light` class on the landing layout wrapper that re-declares
all color tokens to their light values for the subtree, so nested
token-driven components stay in lockstep with the hardcoded palette.

* test(agent): serialize fake-executable writes to avoid ETXTBSY on CI

TestKimiBackendInvokesACPSubcommand (and its Kimi/Codex siblings) write a
shell script to a per-test TempDir and then fork/exec it. With t.Parallel()
enabled across the package, a concurrent goroutine's fork can inherit the
still-open write fd to another test's new executable; Linux then rejects
the subsequent exec with ETXTBSY (seen as
  fork/exec /tmp/.../kimi: text file busy
on GitHub Actions).

Introduce writeTestExecutable, which holds syscall.ForkLock.RLock across
OpenFile→Write→Close. Fork (which takes ForkLock.Lock) cannot run while we
hold RLock, so no sibling fork inherits our write fd. Ran the three callers
with -count=10 under -p=1 and the full package with no failures.
2026-04-23 01:52:46 +08:00
Dhruv-89
2a248b8548 fix(openclaw): raise agent discovery timeout to 30s (#1495)
'discoverOpenclawAgents' runs several 'openclaw' subprocesses under one
context; 5s was too short on cold starts or under load, causing empty
listings in the model picker. Increase the per-discovery cap to 30s.
2026-04-22 19:24:57 +08:00
Bohan Jiang
dc8096fb6e fix(agent): expose Gemini 3 + CLI aliases in Gemini runtime model list (#1508)
Gemini CLI has no `models list` subcommand, so Multica can't do real
dynamic discovery. Instead, swap the static catalog from fixed version
names (2.0/2.5 only) to the CLI's own aliases (`auto`, `pro`, `flash`,
`flash-lite`, `auto-gemini-2.5`) plus explicit pins for Gemini 3
preview and 2.5 variants. Aliases are resolved inside the Gemini CLI
per user entitlement + quota, so new model releases light up without
a Multica redeploy. Default is `auto`, matching Google's recommended
selection.

Fixes multica-ai/multica#1503.
2026-04-22 19:02:07 +08:00
Naiyuan Qing
3036c6418e fix(onboarding): pin sync, welcome layout, runtime bootstrap state (#1482)
Follow-ups on the onboarding flow shipped in #1411.

Pin state synchronization:
- ImportStarterContent now publishes pin:created after commit so the
  sidebar refreshes without a hard reload (previously the pins landed
  in the DB but no event was fired).
- ReorderPins publishes pin:reordered, keeping order in sync across
  web + desktop sessions.
- StarterContentPrompt.onImport invalidates queries locally, mirroring
  the useCreatePin / useDeletePin / useReorderPins onSettled pattern,
  so the originating session's refresh doesn't depend on the WS
  round-trip (WS is the signal for OTHER sessions).
- ImportStarterContent rejects malformed workspace_id up front with
  400 instead of falling through to a misleading 403.

Welcome step layout:
- Switch the two-column hero from CSS Grid to a flex row. Both
  columns share the container's full height via items-stretch +
  justify-center, so the bg-muted/40 backdrop fills edge-to-edge on
  tall viewports and left/right content stays vertically centred.

Desktop runtime bootstrap state:
- New DesktopRuntimesPage wrapper subscribes to window.daemonAPI and
  forwards a `bootstrapping` prop to RuntimeList. While the bundled
  daemon is booting, the empty state renders "Starting local
  runtime…" instead of the misleading "Run multica daemon start"
  hint. Web leaves the prop undefined — behaviour unchanged.

Small polish:
- CLI install dialog caps at 85vh with an internal scroll so the
  Connect button stays reachable when multiple runtimes are
  registered.
- Drop the env-aware CLI setup command; onboarding always targets
  cloud, so `multica setup` is enough — no need to thread apiUrl /
  appUrl through the dialog.

Developer tooling:
- pnpm dev:desktop:staging — parallel dev command that loads
  .env.staging (copilothub backend) via `electron-vite --mode
  staging`, so switching between local and staging no longer
  requires hand-editing env files.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 14:47:00 +08:00
LinYushen
0b1333fb00 feat(server): orphan-task recovery + auto-retry + manual rerun (MUL-1128) (#1476)
* feat(server): orphan-task recovery + auto-retry + manual rerun (MUL-1128)

When the daemon process crashed mid-task the issue was stuck at
in_progress for up to 2.5h: the in-flight task timeout was the only
mechanism that ever moved the row, and the runtime heartbeat sweeper
only fires after the runtime stays offline for 45s — a quick restart
beats both windows.

This change implements the A+B plan from the issue thread:

A. lifecycle hygiene
- migration 055 adds attempt / max_attempts / parent_task_id /
  failure_reason / last_heartbeat_at to agent_task_queue
- new daemon-auth endpoint POST /runtimes/{id}/recover-orphans:
  daemon calls it on every register so the server fails any
  dispatched/running tasks the previous process left behind
- new daemon-auth endpoint POST /tasks/{id}/session: persists the
  agent's session_id + work_dir mid-flight so a crash doesn't
  lose the resume pointer (claude+codex emit MessageStatus with
  SessionID; daemon forwards on the first one it sees)
- FailAgentTask / FailStaleTasks / FailTasksForOfflineRuntimes
  now set failure_reason ('agent_error' / 'timeout' /
  'runtime_offline')

B. auto-retry with resume context
- TaskService.MaybeRetryFailedTask spawns a fresh queued attempt
  carrying parent's session_id/work_dir when the failure reason
  is infrastructure-shaped (timeout, runtime_offline,
  runtime_recovery) and attempt < max_attempts; skips autopilot
- wired into the runtime sweeper paths and TaskService.FailTask
  so the user transparently sees a new in_progress run instead of
  a stuck row
- new user-auth POST /api/issues/{id}/rerun + multica issue rerun
  CLI for the manual escape hatch

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(server): address PR review for orphan-task recovery (MUL-1128)

Three review-must-fix items on top of the A+B implementation:

1. recover-orphans now funnels through TaskService.HandleFailedTasks,
   the same shared post-failure pipeline used by the runtime sweeper.
   This guarantees task:failed events are emitted, agent status is
   reconciled, and issues stuck in_progress with no remaining active
   task are reset to todo even when no auto-retry is created
   (max_attempts exhausted, autopilot, non-retryable reason).

2. RerunIssue now uses CancelAgentTasksByIssueAndAgent, scoped to the
   issue's current assignee. The previous implementation called
   CancelAgentTasksByIssue, which would collateral-cancel parallel
   @-mention agents on the same issue.

3. GetLastTaskSession now considers both completed and failed tasks
   (mirroring GetLastChatTaskSession), ordering by the most recent
   timestamp. With UpdateAgentTaskSession pinning session_id/work_dir
   mid-flight, an auto-retry or manual rerun of a daemon-crash failure
   now actually resumes the prior conversation context instead of
   starting fresh — matching the stated B-branch behaviour.

go build / go vet pass; the existing service and agent test suites pass.
runtime_sweeper / handler integration tests require a local DB with the
055 migration (and the pre-existing 050 first_executed_at column).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-22 13:08:37 +08:00
Naiyuan Qing
3fd2fb2ae3 feat(onboarding): redesigned flow + post-landing starter content opt-in (#1411)
* docs(onboarding): add redesign proposal

Captures motivation (two activation funnels), research-backed principles,
final 5-step flow (welcome+questionnaire → workspace → runtime → agent →
first-issue), Q1/Q2/Q3 personalization matrix, backend user_onboarding
schema, API design, resume policy, and development ordering
(frontend-first with Zustand stub, backend-last, server swap).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(onboarding): scaffold redesigned flow and state foundation

Work-in-progress scaffold toward the redesign documented in
docs/onboarding-redesign-proposal.md. This commit is intentionally
broad — subsequent commits will replace step content and wire real
personalization. Not ready for merge.

Included:
- packages/views/onboarding/: flow orchestrator + 5 step components
  (welcome/workspace/runtime/agent/complete) and the CLI install card.
  Step content is the placeholder version; Step 1 (questionnaire) and
  Step 5 (first issue) are the next changes.
- packages/core/onboarding/: dev-phase Zustand store + types. Not
  persisted — every page refresh starts at Step 1 so each step can be
  iterated in isolation. Will swap to TanStack Query + PATCH
  /api/me/onboarding once the backend user_onboarding table ships
  (keeps the exported hook surface stable).
- packages/core/paths/resolve.ts + .test.ts: centralized
  resolvePostAuthDestination. Priority is flipped so !hasOnboarded
  wins over workspace presence — during frontend development every
  login re-enters /onboarding. useHasOnboarded() reads from the store
  so the real onboarded_at semantic lands automatically once the
  backend ships.
- Post-auth wiring: callback page, login page, landing redirect,
  dashboard guard, realtime workspace-loss handler, settings leave/
  delete, invite acceptance, and desktop app shell all delegate to
  the shared resolver instead of inline logic.
- Desktop overlay: 'onboarding' added as a WindowOverlay type
  alongside new-workspace / invite, with a navigation-adapter
  interception so push('/onboarding') opens the overlay.
- packages/core/package.json / packages/views/package.json: add new
  subpath exports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(onboarding): revise questionnaire to role-driven 3-question form

Aligns the proposal with the corrected product positioning: Multica is an
AI agent orchestration platform for diverse users (developers, product
leads, writers, founders), not a coding-focused tool.

Key changes:
- Drop Q1 "which agents do you already use?" — daemon auto-detects
  installed CLIs on PATH; asking is both redundant and less accurate
- Add Q2 "what best describes you?" (role) to drive Step 4 template
  default and Onboarding Project sub-issue filtering
- Keep Q1 team_size, refine Q3 use_case (recover writing/research
  option); all three now have "Other" with an 80-char text field
- Q3 use_case_other is embedded into Step 5 first issue prompt so
  Other users get maximally personalized aha moments, not generic ones
- Agent templates: 3 → 4 (Coding / Planning / Writing / Assistant),
  matrix driven by Q2 × Q3
- Onboarding Project sub-issues: surface Autopilot and Workspace
  Context (product differentiators), replace "orchestration" wording
- Schema JSONB example and §5/§9 execution plan updated to match

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(onboarding): align questionnaire shape with role-driven redesign

Prepares the core state layer for the Step 1 questionnaire rewrite.
Type-only and initial-value changes; no behavior changes (nothing was
reading the removed `existing_agents` field, since no questionnaire UI
exists yet).

- Add `Role` type (Q2: developer / product_lead / writer / founder / other)
- Add `*_other` sibling fields for team_size / role / use_case so each
  question's "Other" selection can carry 80-char free text
- Drop `existing_agents` — daemon auto-detects CLIs on PATH at Step 3,
  so the signal no longer belongs in the questionnaire
- Extend `TeamSize` / `UseCase` unions with `"other"` member
- Refine `UseCase` option label (`writing` → `writing_research`) so
  it matches the widened Q3 scope in the proposal

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(onboarding): implement Step 1 questionnaire

Replaces the placeholder welcome step with the 3-question questionnaire
defined in docs/onboarding-redesign-proposal.md §3.4. Answers land in
the core onboarding store for later use by Steps 4 and 5.

Added:
- packages/views/onboarding/components/option-card.tsx — OptionCard +
  OtherOptionCard. Radio-group ARIA semantics; Enter/Space select;
  Other variant reveals an 80-char input that auto-focuses on mount.
- packages/views/onboarding/steps/step-questionnaire.tsx — merges
  welcome + Q1/Q2/Q3 into one screen. Local draft state for
  responsiveness; writes to the core store only on submit. Skip/
  Continue CTA swap driven by "any answered?"; the only disabled
  case is "picked Other but the text box is blank".
- Test coverage for the CTA rules, Other-clear-on-switch behavior,
  initial-answers pre-fill, and full payload shape.

Modified:
- packages/views/onboarding/onboarding-flow.tsx — render
  questionnaire as the first step; persist answers and advance the
  stored current_step on submit. Other steps still run off local
  useState for now; full store-driven orchestration follows when
  Step 5 lands.

Removed:
- packages/views/onboarding/steps/step-welcome.tsx — superseded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(onboarding): split welcome + questionnaire, unblock scroll, drop Q1 evaluating

Three fixes prompted by first real browser testing of the Step 1
questionnaire. All three are about making the flow usable before
pursuing visual polish.

1. Split Welcome and Questionnaire into two screens
   The previous merge-welcome-into-questionnaire decision dropped
   Multica's product introduction entirely. For a product with no
   established mental model (AI agents as first-class teammates in a
   task platform), first-time users need 5 seconds of framing before
   the questionnaire makes sense. StepWelcome carries that framing;
   it's UI-only (not a persisted step), shown only on first entry
   (pristine store), and skipped automatically on resume.

2. Remove `my-auto` vertical centering from both platform shells
   Long questionnaire content pushed the centered block's top above
   the scroll origin, making Continue/Skip unreachable. Top-alignment
   + natural body/overlay scroll is the boring-but-correct baseline
   for content of variable height.

3. Drop Q1 "Just exploring for now" option
   Q1 asks about team structure, not attitude. "Evaluating" was a
   category error. Low-commitment users already have a zero-friction
   path (skip all questions). Removing the option simplifies the
   question and the downstream mapping table.

Types, store initial value, proposal doc (§3.1 flow diagram, §3.4
options, §3.5 sub-issue sorting, §3.6 conditionals, §4.1 JSONB
schema, §5.2 file list, §7 decisions row, §9.2 execution order)
all synced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(onboarding): center short steps, scroll long ones — correctly this time

Previous attempt removed `my-auto` thinking it was responsible for
blocked scrolling. That diagnosis was wrong: the real blocker was
the root layout's \`body { overflow: hidden }\` (an app-shell
convention so sidebar/topbar stay put while the inner content
region scrolls). Removing `my-auto` broke vertical centering of
short steps (Welcome) without fixing the scroll issue.

Correct fix:
- Web: page now owns its own scroll container — `h-full
  overflow-y-auto` on the outermost div decouples from the body's
  overflow-hidden.
- Desktop: the overlay's existing `flex-1 overflow-auto` container
  already provided scroll; just restoring `my-auto` was sufficient.
- Both platforms: inner `flex min-h-full flex-col items-center` +
  content `my-auto` gives the "short centers, long top-aligns and
  overflows down" behavior. Per the flex spec, auto margins are
  ignored on overflowing boxes (they overflow in the end direction),
  so Continue/Skip remain reachable via scroll even on long steps
  like the questionnaire.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(onboarding): add progress indicator + stable header anchor

Adds a consistent visual anchor at the top of every step (except
Welcome), so transitioning between steps of different content heights
no longer shifts the vertical baseline.

- packages/core/onboarding/step-order.ts — single source of truth for
  step order; indicator math reads from here so adding/reordering a
  step touches only one line
- packages/views/onboarding/components/step-header.tsx — dot row +
  "Step N of M" counter; three dot states (done/current/pending);
  accessible progressbar semantics
- onboarding-flow.tsx — non-welcome steps now render under a shared
  `<div flex flex-col gap-8>` wrapper with StepHeader on top. Maps
  the local `complete` render step to the store's `first_issue`
  until Step 5 lands (one-line function, self-deleting).
- step-welcome.tsx — keeps its own min-h-[60vh] + justify-center so
  the short intro still feels centered once the shell drops my-auto
- apps/web + apps/desktop shells — removed `my-auto`. Every
  non-welcome step now anchors to the same top position, so only the
  content below the header changes during transitions. Welcome's own
  internal centering handles its "short content, no header" case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(onboarding): add web Step 3 platform fork (Desktop / CLI / waitlist)

Web users now see a three-way choice at the runtime step instead of
being dropped directly into CLI install instructions:
- Primary CTA: Download Multica Desktop (bundled runtime)
- Alternate: install the CLI (reveals existing StepRuntimeConnect)
- Alternate: join the cloud waitlist (captures email, completes
  onboarding early with cloud_waitlist_email set)

Desktop unchanged — its platform shell doesn't pass cliInstructions,
so OnboardingFlow routes it straight to StepRuntimeConnect for the
bundled-daemon auto-connect path.

Rename step-runtime.tsx → step-runtime-connect.tsx to reflect its
new single responsibility (connect UI only; platform choice lives
in StepPlatformFork).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(onboarding): capture optional use-case on cloud waitlist

Adds a textarea to the waitlist form asking what the user wants to
use Multica for. Optional (submit still works with email alone) but
surfaces a clear prompt + placeholder example so most users will fill
it in. Stored as cloud_waitlist_description alongside the email.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(onboarding): make !hasOnboarded a first-class gate on both platforms

Triggering condition was wrong on both sides. Web's dashboard-guard
only checked hasOnboarded when the URL slug failed to resolve; desktop's
App.tsx effect returned early when wsCount > 0 before even looking at
hasOnboarded. Users with existing workspaces never got routed into
onboarding regardless of their flag state.

Also wire store.complete() into the happy-path finish — previously only
the waitlist branch wrote onboarded_at, so every normal completion
left the flag false and (now that triggers work) would loop users back
into onboarding on refresh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(onboarding): Step 5 auto-bootstrap — welcome issue + Getting Started project

After agent creation, the flow transitions to a loader screen that
runs the bootstrap in the background:
- Creates a welcome issue with a Q3-driven prompt, assigned to the
  new agent (so it starts working immediately)
- Creates a "Getting Started" project with tutorial sub-issues
  filtered by Q1/Q2/Q3
- Stores first_issue_id + onboarding_project_id via store.complete()
- Navigates the user straight into the welcome issue detail page,
  where they see the agent already responding

Degraded path: if welcome issue fails, shows error with Retry /
Continue anyway. If project or sub-issues fail, logs and proceeds
with just the welcome issue — the aha moment still happens.

No-agent paths (runtime skip, agent skip) short-circuit to onComplete
without bootstrap.

Local flow step union now aligns with the store enum; removed the
mapLocalToStoreStep bridge and deleted the old step-complete.tsx
placeholder.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(onboarding): converge all no-agent paths to a single bootstrap step

Before: skip-runtime, skip-agent, and waitlist each finished onboarding
independently, bypassing Step 5 entirely. Users without an agent landed
in an empty workspace with no tutorial project — the "self-serve" case
had no bootstrap at all.

Now: all three paths converge on the first_issue step with agent=null.
Bootstrap branches on agent presence:
- agent ✓ → welcome issue (assigned to agent) + project + agent-guided
  sub-issues ("watch your agent do X"). Lands on the welcome issue.
- agent ✗ → project only + self-serve sub-issues ("try X yourself" —
  configure runtime, create agent, write first issue, etc.). Lands on
  the workspace issues list with the Getting Started project in the
  sidebar.

Both web and desktop shells already handle firstIssueId=undefined →
fall back to /<slug>/issues, so no shell-side change was needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(onboarding): pin starter project + assign sub-issues to the user

Bootstrap now also:
- Pins the Getting Started project so users see it in the sidebar
  immediately (both paths)
- Pins the welcome issue too (path A only) so the first conversation
  with the agent stays one click away
- Assigns every sub-issue to the current user (via their workspace
  member record). Only the welcome issue stays assigned to the agent —
  that's the aha-moment hand-off; everything else is for the user to
  work through

Pin calls are fire-and-forget (failure logged but non-blocking).
Member lookup is defensive — if listMembers fails or the user isn't
found, sub-issues gracefully fall back to unassigned rather than
breaking the bootstrap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(onboarding): remove cloud waitlist option

Cloud runtime is not on the immediate roadmap and there's no backend
table to persist emails. Keeping the UI around would silently drop
user submissions — small trust leak. Revisit once cloud product lands
alongside a proper waitlist table + notification pipeline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(onboarding): persist onboarded_at end-to-end

Phase 1 of bringing onboarding from dev stub to production. A single
persisted column drives every trigger — no separate user_onboarding
table yet (that's a later phase for questionnaire persistence, cloud
waitlist, analytics).

Backend
- Migration 050: ALTER TABLE "user" ADD COLUMN onboarded_at TIMESTAMPTZ
  (no backfill — existing users see onboarding next login, Skip
  affordance lands later)
- sqlc: MarkUserOnboarded with COALESCE for idempotency
- UserResponse DTO + userToResponse now emit onboarded_at via
  existing util.TimestampToPtr helper — single edit covers GetMe,
  VerifyCode, GoogleLogin, LoginWithToken
- New handler POST /api/me/onboarding/complete
- Route registered in the authenticated user-scoped group

Frontend
- User type gets onboarded_at: string | null
- api.markOnboardingComplete()
- Auth store adds refreshMe() — lightweight getMe + setUser,
  complements existing initialize()
- useHasOnboarded switches source from onboarding-store (dev stub)
  to auth-store (user.onboarded_at). Every call site — dashboard
  guard, desktop App.tsx, invite page fallback, realtime
  workspace-loss handler, settings leave/delete — picks up the
  real signal without any direct change
- onboarding-store.complete() now hits the server: POST + refreshMe
  before local state update, so the next router effect sees the
  non-null timestamp and won't bounce the user back

Triggers + route guards
- StepWorkspace drops the Skip button — every onboarding user
  must create their own workspace even if invited into one
- /onboarding page redirects already-onboarded users away (guards
  against manual URL access)
- login page + auth callback: onboarding wins over ?next= for
  unonboarded users; invite links are revisitable after onboarding

Tests
- apps/web callback tests updated: mocks now return User objects
  so onboarded_at is readable; new "onboarded user honors next"
  scenario added, "unonboarded ignores next" scenario kept
- test/helpers mockUser gets onboarded_at field
- questionnaire already-existing strict-required tests bundled in
  from a prior uncommitted change

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(onboarding): review findings — dead state, error recovery, cache races

From independent review of the prior onboarded_at commit.

- Remove the dead OnboardingState.onboarded_at field, its INITIAL_STATE
  entry, and its write in store.complete(). useHasOnboarded now reads
  auth-store exclusively; leaving a parallel field here violates the
  "don't duplicate server data in Zustand" rule and risks drifting into
  a second source of truth.
- Wrap handleBootstrapDone/handleBootstrapSkip in try/catch with toast
  recovery. complete() is idempotent server-side (COALESCE), so a
  retry after a failed POST/refreshMe is free — letting the error
  bubble into the React error boundary trapped the user with no way
  forward.
- RedirectIfAuthenticated: swap `!list` for `isFetched`-gated check,
  matching the pattern added on the /onboarding page. Same one-tick
  race where a stale cache [] could fire a premature replace before
  the fresh list settles.
- (Self-review fixups picked up along the way) /onboarding page now
  waits for workspacesFetched before redirecting already-onboarded
  users, and login handleSuccess reads useAuthStore.getState() so the
  hasOnboarded value is fresh after setUser (the closure captured a
  stale pre-login value otherwise).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(onboarding): shrink store surface + firm up flow invariants

Post-review cleanup. End-to-end flow is already complete (user.onboarded_at
is the single source of truth); these are quality-of-life fixes on top.

Store surface
- Drop six dead fields from OnboardingState (workspace_id, runtime_id,
  agent_id, first_issue_id, onboarding_project_id, platform_preference)
  and the PlatformPreference type. None had readers — they were stub
  placeholders for a future user_onboarding table that isn't coming
  this phase. CLAUDE.md "don't design for hypothetical future".
- store.complete() signature simplifies to () — no more patch arg,
  since the only patch fields were the ones just deleted.

Welcome as a first-class step
- Add "welcome" to OnboardingStep enum and make it INITIAL_STATE's
  current_step. Removes the pristine-heuristic "did user see welcome?"
  check, which could misfire on remount.
- pickInitialStep() collapses to `state.current_step ?? "welcome"`.
- ONBOARDING_STEP_ORDER stays unchanged (welcome isn't a progress point).

advance() chain
- Every transition handler now persists the new current_step to the
  store (handleWorkspaceCreated, handleRuntimeNext, handleAgentCreated,
  handleAgentSkip). Refresh lands on the right step instead of
  jumping back to Step 2.

Invariants
- OnboardingFlow throws on null user instead of spreading defensive
  `?? ""` and `if (userId)` that silently degraded to unassigned
  sub-issues. Shell guards already ensure user is present.
- Desktop WindowOverlay's onComplete gains a paths.root() fallback
  when workspace is undefined — matches web's symmetry.

docs/product-overview.md: committed from untracked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(onboarding): persist questionnaire + current_step; resume + Back

End-to-end questionnaire persistence + resume capability. User answers
are now server-side (analytics-ready); refreshing or revisiting lands
on the furthest reached step with previous answers pre-filled; a Back
button on each step lets users edit earlier answers without losing
progress.

Backend
- Migration 051: ALTER TABLE "user" ADD onboarding_current_step TEXT,
  onboarding_questionnaire JSONB NOT NULL DEFAULT '{}'::jsonb
- sqlc: new PatchUserOnboarding with sqlc.narg for optional fields
  (COALESCE preserves unspecified columns). MarkUserOnboarded also
  clears current_step — once complete, the step pointer has no meaning
- Handler PATCH /api/me/onboarding accepting partial {current_step,
  questionnaire}. Questionnaire passthrough via json.RawMessage, no
  server-side validation of inner shape (keeps schema evolution free)
- UserResponse DTO emits both new fields; userToResponse coalesces
  JSONB to '{}' defensively

Frontend
- User type gains onboarding_current_step + onboarding_questionnaire
- api.patchOnboarding(payload)
- Delete Zustand onboarding store — replaced with plain async
  advanceOnboarding() / completeOnboarding() that call the API and
  sync auth store. Source of truth is the user object, no client-side
  shadow state that could drift
- pickInitialStep reads user.onboarding_current_step; StepQuestionnaire
  initial pre-fills from user.onboarding_questionnaire
- Monotonic furthestStepRef: Back edits don't regress server-side
  progress, and re-submit returns the user to where they were
- Back buttons on Steps 2/3/4. Back is local-only — just changes the
  rendered step, no PATCH
- Loading indicator on Welcome + Questionnaire submit buttons while
  PATCH is in flight
- CreateWorkspaceForm.onSuccess accepts Promise<void> so the flow can
  await advance() from its onCreated handler

Test mocks (helpers + callback test) updated with new User fields.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(onboarding): resume to Step 3+ needs workspace/runtime fallback

Self-review caught: resume lands the user on their saved step, but
React state (workspace, runtime, agent) is empty on fresh mount. The
render conditions gate on those — without fallbacks the page stays
blank.

- workspaceListOptions() query fills runtimeWorkspace from cache when
  stepping past Step 2. Only one workspace exists during onboarding
  (StepWorkspace always creates one), so [0] is unambiguous.
- StepWorkspace accepts an `existing` prop. On resume / Back to Step 2
  with a pre-existing workspace, render a "Continue with <name>"
  confirmation instead of the create form, which would otherwise hit a
  slug conflict the moment the user clicks Create.
- runtimeListOptions(wsId, "me") similarly seeds Step 4's runtime —
  prefer first online, fall back to first.

Step 5 resume path unchanged: if `agent` React state is null on
re-entry, bootstrap runs the self-serve branch. Not ideal (user may
have actually created an agent), but bootstrap's list-check approach
(future work) will handle orphan detection symmetrically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(onboarding): delete all skip/resume jump logic

Flow always starts from Welcome. Questionnaire answers still pre-fill
from user.onboarding_questionnaire. current_step is still PATCHed for
future analytics but no UI code reads it for navigation.

Removed from onboarding-flow.tsx:
- pickInitialStep + isOnboardingStep (no server-driven entry point)
- furthestStepRef + resolveNextStep (no edit-vs-first-pass branching)
- runtimes useQuery + stepRuntime fallback (user walks through Step 3
  linearly, so runtime React state is always populated by Step 4)
- workspace resume fallback in runtimeWorkspace (same reasoning)

Kept:
- advanceOnboarding({ current_step, questionnaire? }) — server
  persistence, analytics-ready
- StepQuestionnaire's initial prop from stored answers
- workspaces useQuery (gated to step === "workspace" only) for
  existing-workspace detection on Step 2 to prevent slug conflicts
  when a previous onboarding was abandoned
- Back buttons + handleBack (local-only navigation)
- Error recovery on completeOnboarding via try/catch + toast

Every transition handler is now a straight advance + setStep line.
Users who close mid-flow and return walk the full flow from Welcome
again — slight extra clicks, but each step shows meaningful confirm
UI (existing workspace, connected runtimes, etc.) so it doesn't feel
like repeated work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(onboarding): grandfather existing users in the onboarded_at migration

Folded the backfill into 050 itself (branch has not shipped to prod,
so editing the migration in place is clean). Without this, once this
branch deploys, every pre-existing user would be walled off into
onboarding on their next login — a real production incident.

Uses created_at rather than NOW() so analytics like "signup →
onboarded interval" read correctly for pre-launch users.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(onboarding): Step 1 questionnaire — two-column editorial layout

Matches the onboarding(3) design spec: full-bleed two-column on lg+
(main + "Why we ask" side rail), collapses to single column below.

- StepQuestionnaire rewritten with:
  - Mono 01/02/03 markers per question
  - Serif question headings (22px)
  - Editorial serif title ("Three answers. We'll handle the rest.")
  - Right-side rationale panel explaining what each answer unlocks
  - Sticky footer with hint + Continue CTA
  - Embeds StepHeader on the left column so it escapes the flow's
    narrow max-w-xl wrapper, same pattern Welcome uses
- OptionCard redesigned: radio-dot marker + inset ring on select,
  matches design's .opt pattern
- OtherOptionCard: text input appears below the row (not inside the
  card) with bottom-border-only styling, aligned under the label
- onboarding-flow: questionnaire now early-returns full-bleed,
  joining Welcome as a hero-layout step

Placeholder copy updated to match design examples; tests adjusted.

* fix(onboarding): questionnaire uses 3-region app-shell layout

Previous version had everything in a single scroll container with a
sticky footer. As the user scrolled into the questions, the Back
button and StepHeader progress indicator scrolled out of view, and
sticky-bottom had edge cases with width-constrained flex nesting.

Classic 3-region shell now:
- Fixed header row: Back button (left) + StepHeader progress
  indicator — persistently visible regardless of scroll position
- Scrollable middle: eyebrow / serif title / lede / 3 question
  blocks. Uses `flex-1 overflow-y-auto min-h-0` — the min-h-0 is
  the critical bit that lets a flex-1 child shrink below content
  height inside a flex column
- Fixed footer row: hint (hidden < sm) + Continue CTA — always
  reachable, never scrolled off

Right "Why we ask" panel is now an independent grid column with its
own overflow, so the two columns scroll independently instead of the
whole page having one shared scrollbar.

Side panel width reduced 520 → 480 to give the question column more
room on 1280/1366 screens where 1fr_520 left ~760px for content;
1fr_480 gives ~800-900px which comfortably fits the 620px max-w
content column plus breathing room.

* fix(onboarding): questionnaire needs DragStrip like every full-window view

Traffic lights were overlapping the StepHeader progress dots because
Step 1 escaped onboarding-flow's non-welcome wrapper (which renders
<DragStrip />) without rendering its own. The codebase convention per
packages/views/platform/drag-strip.tsx is: every full-window view
places a DragStrip as the first flex child of each visible column.

Adds DragStrip at the top of both the left (shell) and right
("Why we ask") columns, matching step-welcome.tsx which already did
this. Traffic lights now land in the 48px transparent strip with no
content collision; dragging from any top edge moves the window on
Electron; border-l between columns runs edge-to-edge.

Also made the right column's scroll container use
`min-h-0 flex-1 overflow-y-auto` so its internal scroll activates
independently of the left column.

(Separately investigated: useImmersiveMode is no longer called
anywhere in production code — the codebase has fully committed to
the DragStrip pattern. No action needed on the hook itself.)

* style(onboarding): drop top/bottom borders on questionnaire shell

* style(onboarding): use chat-style scroll fade mask instead of border

The questionnaire's scroll area now fades softly at top/bottom edges
via `useScrollFade` (already used by chat-message-list.tsx) — the
same mask-image linear-gradient pattern that fades content under the
header/footer based on scroll position:

- At top: only bottom fades (hint: more content below)
- At bottom: only top fades (hint: content above)
- In middle: both fade
- Fits entirely: no mask

This replaces the removed border-b/border-t on the header/footer with
a softer, more editorial visual separation while giving an actual
scroll-position affordance the border can't.

* feat(onboarding): show "n of 3 answered" progress next to Continue

Gives the user a glance-able progress signal as they fill the
questionnaire. Static text, no extra UI primitives, no dynamic
state variants — just `{n} of 3 answered` updating in place,
left of the Continue button.

Replaces the static "Your answers shape the next screens..." hint,
which was always there regardless of progress and added noise.

Same canContinue gate as before (all 3 answered), just derived
from the new per-question check so we don't compute validity twice.

* style(onboarding): drop redundant lede under questionnaire title

The title already conveys the "we'll handle the rest for you"
promise — the lede just rephrased it at length. Removed; bumped the
question-list top margin (mt-8 → mt-10) to keep breathing room.

* feat(onboarding): land redesigned flow + post-landing starter content opt-in

This commit bundles the final onboarding-redesign work that sat in the
working tree with today's architectural reshape of how starter content
is handled. Splitting across sqlc-regenerated files would be fragile,
so it ships as one logical unit — "onboarding is ready for production".

Flow redesign (Steps 1–5)
-------------------------
- Editorial two-column shells on Steps 1/2/3/4 (DragStrip + hero column
  + aside panel) — Welcome, Questionnaire, Workspace, Runtime, Agent
- Web-only Step 3 fork (Download desktop / Install CLI / Cloud waitlist)
  lives alongside desktop's direct runtime picker; cloud path is
  interest-capture only, doesn't advance the flow
- DragStrip extracted to packages/views/platform as a cross-platform
  component — 48px transparent drag row, no-op on web
- recommend-template.ts + test: Q1–Q3 → AgentTemplate mapping

Cloud waitlist
--------------
- Migration 052: cloud_waitlist_email VARCHAR(254) + cloud_waitlist_reason TEXT
- Handler: net/mail.ParseAddress + length bounds + reason trim
- Frontend: CloudWaitlistExpand component + api.joinCloudWaitlist

Drop persisted onboarding_current_step
--------------------------------------
- The interim implementation persisted the user's furthest-reached step;
  the final design starts every entry at Welcome, so the column is dead
- Migration 051 no longer adds it; migration 053 drops it IF EXISTS on
  any environment that ran the interim 051 — schema converges cleanly
- UserResponse / User type / patchOnboarding signature all drop the field

Post-landing starter content (new architecture)
-----------------------------------------------
Why: the old design ran bootstrap inside Step 5 (welcome issue + Getting
Started project + sub-issues, all in one try block). That had three
defects — (1) non-idempotent: Retry after partial failure created
duplicates; (2) sub-issue assignee raced listMembers → showed as
"Unknown"; (3) skipped users (paths A/C/D) never got any starter
content. All three are structural, not patchable.

New design: onboarding ends at completeOnboarding() as before (gate is
unchanged for useDashboardGuard). The 4 completion paths (Welcome skip
/ full flow / Runtime skip / Error recover) all just call
completeOnboarding() and navigate to workspace. On landing, a
StarterContentPrompt dialog renders exactly once per user
(starter_content_state == null) with Import / No thanks. The dialog is
mandatory — no X, no ESC, no outside-click — so state always ends in a
terminal value.

- Migration 054: starter_content_state TEXT, backfill 'skipped_legacy'
  for pre-feature onboarded users so they're never prompted
- Server POST /api/me/starter-content/import: transactional claim
  (NULL → 'imported') + bulk create project + optional welcome issue +
  sub-issues + pins, all in one tx. 409 Conflict on second call
- Server POST /api/me/starter-content/dismiss: transactional NULL → 'dismissed'
- Import decides agent-guided vs self-serve by inspecting the workspace's
  agent list at dialog time — fixes path A (Welcome skip + existing
  agent) which was previously excluded from starter content
- starter-content-templates.ts replaces bootstrap.ts: pure template
  builders, no API calls. Copy is reviewed as UI; server owns atomicity
- StepFirstIssue is now just completeOnboarding() + navigate; error
  surface collapses to a Retry button (no more "Continue anyway" branch)
- OnboardingCelebration + just-completed.ts removed (replaced by
  StarterContentPrompt which reads server state, not sessionStorage)

Handler hardening
-----------------
- PatchOnboarding: MaxBytesReader 16KB so the JSONB column can't be
  weaponized as bulk storage (every /api/me read returns the payload)
- JoinCloudWaitlist: net/mail format check + explicit 254-char cap
- ImportStarterContent: MaxBytesReader 64KB (templates are markdown-heavy
  but still bounded); welcome issue's agent_id verified in-workspace

Tests
-----
- Existing onboarding_test.go (waitlist) passes
- step-platform-fork.test.tsx + recommend-template.test.ts (new)
- apps/web test helpers updated for User.starter_content_state

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(onboarding): resolve Unknown assignee/creator + tighten prompt copy

Two surface issues on the post-landing starter content dialog:

1. Unknown assignee & Created by
-------------------------------
ImportStarterContent stored `member.id` (the membership row UUID) in
`assignee_id` and `creator_id` for sub-issues. That mismatched the rest
of the codebase — AssigneePicker and resolveActor in issue.go both
store `user_id` for type="member", and `useActorName.getMemberName`
looks members up by `user_id`. The mismatch meant the lookup never
matched any member and fell through to the "Unknown" fallback.

Fix: use `parseUUID(userID)` for both fields. The existing membership
check stays for the 403 signal; we just no longer need the returned
`member.ID`.

2. Dialog copy too long, button labels unclear
----------------------------------------------
Old copy was 3–4 paragraphs of instruction; users need to read less
than that to make a binary choice. Buttons "Import starter tasks" and
"No thanks" also didn't make it clear what "No thanks" actually does —
it starts a blank workspace, so say so.

New:
  - Title: "Welcome — add starter tasks?"
  - Body: one sentence describing the seeded content
  - Left button: "Start blank workspace"
  - Right button: "Add starter tasks"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(onboarding): server decides starter content branch

Problem: the old ImportStarterContent gated the agent-guided vs
self-serve branch on a client-supplied `welcome_issue.agent_id` or
null `welcome_issue`. The client made that decision by reading its
React Query cache of the workspace's agent list — any timing quirk
(cache not populated, stale, race with WS event) could lie to the
server, and there was no way for the server to disagree. Users with
an agent in the DB could still end up on the self-serve branch.

Fix: the server is now authoritative. The client always sends both
template arrays (agent_guided_sub_issues, self_serve_sub_issues) and
a welcome_issue_template (title + description + priority, NO agent_id).
Inside the import transaction the server runs ListAgents on the
workspace — if there's at least one agent, it picks agents[0] (same
ordering the client used: created_at ASC), uses agent_guided_sub_issues,
and creates the welcome issue assigned to that agent. Otherwise it
uses self_serve_sub_issues and skips the welcome issue.

Side effect: the Unknown assignee/creator bug is structurally gone —
no client-supplied id flows into assignee_id/creator_id for type=
"member". The server uses actorID = parseUUID(userID) everywhere,
matching resolveActor in issue.go.

Client surface also simplifies: StarterContentPrompt drops
useQuery(agentListOptions), the hasAgent check, the agentsFetched
button gate, and the branch-specific copy. Dialog description is a
single generic line ("If you already have an agent, we'll also seed
a welcome issue it replies to right away"). buildImportPayload no
longer takes an agentId parameter — one unconditional return shape.

Payload grows ~15 KB (both sub-issue arrays always present); still
well under the 64 KB MaxBytesReader cap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(onboarding): clarify runtime prerequisite, revert dialog agent list

Step 3 runtime (desktop step-runtime-connect.tsx) — scanning and empty
subtitles now name the local AI coding tools Multica drives (Claude
Code, Codex, Cursor, and others), so users understand a runtime alone
isn't enough: they also need one of those tools installed on the
machine. Uses "and others" rather than a closed list so we don't lock
the copy to exactly three integrations.

StarterContentPrompt dialog — reverted the short-lived "try Coding,
Planning, Writing agents and more" rewrite. That was a misread of
feedback meant for the Step 3 prerequisite, not the dialog. The
dialog's current single-sentence "how agents, issues, and context
work in Multica" is enough.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 20:32:33 +08:00
devv-eve
637bdc8eb3 feat(analytics): full PostHog pipeline + 6 funnel events (MUL-1122) (#1367)
* feat(analytics): add PostHog client with async batch shipping

Introduces server/internal/analytics, the shipping layer for the product
funnel defined in docs/analytics.md. Capture is non-blocking — events are
enqueued into a bounded channel and a background worker batches them to
PostHog's /batch/ endpoint. A broken backend drops events rather than
blocking request handlers.

Local dev and self-hosted instances run a noop client until the operator
sets POSTHOG_API_KEY. This is PR 1 of MUL-1122; signup and workspace_created
emission land in the follow-up commit so this change is independently
reviewable.

* feat(server): emit signup and workspace_created analytics events

Wires analytics.Client through handler.New and main, then emits the first
two funnel events:

- signup fires from findOrCreateUser (which now reports isNew), covering
  both the verification-code and Google OAuth entry points — a single
  emission site guarantees Google signups aren't missed.
- workspace_created fires after the CreateWorkspace transaction commits,
  with is_first_workspace computed from a post-commit ListWorkspaces count
  so we can distinguish fresh-user activation from returning-user
  expansion.

Tests use analytics.NoopClient so nothing ships from test runs. PR 1 of
MUL-1122; runtime_registered and issue_executed follow in later PRs per
the plan.

* refactor(analytics): drop is_first_workspace from workspace_created

Stamping "is this the user's first workspace?" at emit time races under
concurrent CreateWorkspace requests: two transactions committing close
together can both read a post-commit count greater than one and both emit
false. Fixing it at the SQL layer requires a schema change we don't want in
PR 1.

PostHog answers the same question exactly from the event stream (funnel on
"first time user does X" / cohort on $initial_event), so removing the
property loses no information and makes the emit side race-free.

* docs(analytics): document self-host safety defaults

Spell out why self-hosted instances never ship events upstream by default
(empty POSTHOG_API_KEY → noop client) and explain how operators can point
at their own PostHog project without any code change.

* feat(analytics): emit runtime_registered, issue_executed, team_invite_*

Three server-side funnel events, all gated on first-time state transitions
so retries and re-runs don't inflate the WAW buckets:

- runtime_registered fires from DaemonRegister when UpsertAgentRuntime
  reports (xmax = 0) — i.e. the row was inserted, not updated. Heartbeats
  and re-registrations stay silent.
- issue_executed fires from CompleteTask after an atomic
  UPDATE issue SET first_executed_at = now() WHERE id = $1 AND
  first_executed_at IS NULL flips the column for the first time. Retries,
  re-assignments, and comment-triggered follow-up tasks hit the WHERE
  clause and no-op. Carries nth_issue_for_workspace so the ≥1/≥2/≥5/≥10
  buckets filter without extra queries.
- team_invite_sent fires from CreateInvitation and team_invite_accepted
  from AcceptInvitation, closing the expansion funnel.

Adds a 050 migration for issue.first_executed_at plus a partial index so
the workspace-scoped executed-count query doesn't scan the never-executed
tail.

* feat(config): surface PostHog key via /api/config

Extends AppConfig with posthog_key / posthog_host sourced from env on
every request (so operators can rotate the key via secret refresh without
a restart). Reading the key off the server — rather than baking it into
the frontend bundle via NEXT_PUBLIC_* — means self-hosted instances
inherit the blank key automatically and never ship events upstream.

* feat(analytics): wire posthog-js identify + UTM capture on the client

Adds @multica/core/analytics — a thin wrapper around posthog-js that owns
attribution capture and identity merge. Posthog-js config comes from
/api/config (not NEXT_PUBLIC_*), so self-hosted instances whose server
returns an empty key automatically run the SDK inert.

captureSignupSource stamps a multica_signup_source cookie with UTM params
and the referrer's origin (never the full referrer — that can leak OAuth
code/state in the callback URL). The backend signup event reads this
cookie on new-user creation.

Identity flows:
- auth-initializer fires identify() right after getMe() resolves, on both
  cookie and token paths. A getConfig/getMe race is handled by buffering
  a pending identify inside the analytics module and flushing it once
  initAnalytics finishes.
- auth store calls identify() on verifyCode / loginWithGoogle /
  loginWithToken and resetAnalytics() on logout so the next login merges
  cleanly without bleeding events.

* docs(analytics): describe runtime_registered, issue_executed, invite events

Fills in the schema for the remaining funnel events. Captures the
design commentary that belongs next to the contract rather than in a PR
description — in particular why issue_executed uses the atomic
first_executed_at flip instead of counting task-terminal events, and why
runtime_registered relies on xmax = 0 rather than a query-then-write.

* fix(analytics): drop non-atomic nth_issue_for_workspace from issue_executed

Computing the workspace's Nth-issue ordinal at emit time is not atomic
under concurrent first-completions — two transactions can both run
MarkIssueFirstExecuted, then both run CountExecutedIssuesInWorkspace, and
both observe count=1 before either has committed, so both events go out
stamped as n=1. Serialising it would mean a per-workspace advisory lock
or a SERIALIZABLE-isolated tx; PostHog answers the same question exactly
at query time via row_number() partitioned by workspace_id, so the
emit-time property adds risk without adding information.

Removes the property from analytics.IssueExecuted, deletes the unused
CountExecutedIssuesInWorkspace query, and regenerates sqlc. The partial
index stays — any future workspace-scoped executed-issue query will want
it.

* fix(analytics): wire $pageview and harden signup_source cookie payload

Two frontend fixes from the PR review:

- PageviewTracker, mounted under WebProviders, fires capturePageview on
  every Next.js App Router path / query-string change. Without this the
  capturePageview helper in @multica/core/analytics was never called and
  the acquisition funnel's / → signup step was empty.
- captureSignupSource now caps each UTM / referrer value at 96 chars
  *before* JSON.stringify, and drops the whole cookie when the serialised
  payload still exceeds 512 chars. Previously the overall slice(0, 256)
  could leave a half-JSON string on the wire that neither the backend nor
  PostHog could parse.

Both capturePageview and identify now buffer a single pending call when
fired before initAnalytics resolves — otherwise the initial "/" pageview
and same-turn login identify race the /api/config fetch and get dropped.
resetAnalytics clears both buffers so a logout→login cycle stays clean.

* fix(analytics): URL-decode signup_source cookie on read

Go does not URL-decode Cookie.Value automatically, so the frontend's
JSON-then-encodeURIComponent payload was landing in PostHog as
percent-encoded garbage (%7B%22utm_source...). Unescape on read so the
backend receives the original JSON string the frontend intended, and
drop values that fail to decode or exceed the server-side cap — sending
truncated garbage is worse than sending nothing. Oversized-cookie guard
matches the frontend's SIGNUP_SOURCE_MAX_LEN.

* docs(analytics): reflect nth-issue drop, $pageview wiring, cookie encoding

Pulls the schema doc back in line with the code: issue_executed no longer
advertises nth_issue_for_workspace (with a note about why PostHog derives
it at query time instead), the frontend $pageview section names the
actual PageviewTracker component that fires it, and the signup_source
section documents the per-value cap / overall drop rule and the
encode-on-write / decode-on-read contract.

---------

Co-authored-by: Jiang Bohan <bhjiang@outlook.com>
2026-04-21 14:42:52 +08:00
Bohan Jiang
c5a00d8b8c fix(agent/openclaw): extract real model from meta.agentMeta.model (#1426)
OpenClaw's `--json` result blob carries the actual LLM identifier in
`meta.agentMeta.model` (e.g. `deepseek-chat`, `claude-sonnet-4`),
alongside `provider` and the usage breakdown. The backend was reading
the surrounding `agentMeta.usage` and `agentMeta.sessionId` but skipping
the `model` field entirely, then attributing every run's tokens to
`opts.Model` — which for openclaw is the *agent name* passed via
`--agent`, not a real model identifier — falling all the way through to
"unknown" when no agent.model was configured.

Surface the runtime-reported model:

- `openclawEventResult` gains a `model` string.
- `buildOpenclawEventResult` reads `agentMeta.model` (trimmed; empty
  string when absent for forward-compat with older runtimes / partial
  outputs).
- `processOutput` propagates it through the result-blob branch.
- `Execute`'s usage map prefers `scanResult.model`, falling back to
  `opts.Model` then `"unknown"` — preserving the prior behavior path
  for any runtime that doesn't surface its own model yet.

Two unit tests cover both the populated and missing cases.

Refs: #1395
2026-04-21 14:32:31 +08:00
Bohan Jiang
4ac43e9e49 feat(daemon): log agent invocation at info level (#1428)
Surface the actual exec path + argv for every agent backend at INFO
so operators can see the exact command without flipping to debug.
Also add the missing log line in pi.go for consistency with the
other nine backends.
2026-04-21 14:30:07 +08:00
Kagura
0db7d2fb64 fix(issues): include description in list queries for board card display (#1375) (#1377)
The ListIssues and ListOpenIssues SQL queries omitted the description
column, so the API response never included description data. Board cards
checked issue.description (always null) and never rendered it, even when
the Description card property was enabled.

Add description to both SQL queries, the generated Go structs/scan calls,
and the response mapping functions.
2026-04-21 12:20:10 +08:00
devv-eve
9e47b83f02 feat(agent): add Kimi CLI as agent runtime (#1400)
* feat(agent): add Kimi CLI as agent runtime

Adds support for Moonshot AI's Kimi Code CLI (https://github.com/MoonshotAI/kimi-cli)
as a new agent runtime, alongside Claude, Codex, OpenCode, OpenClaw, Hermes,
Gemini, Pi, Cursor and Copilot.

Kimi Code CLI implements the standard Agent Client Protocol (ACP) via the
`kimi acp` subcommand, so the new `kimiBackend` reuses the existing
hermesClient JSON-RPC transport in the agent package — only the binary,
client identity, log prefix, and tool-name extraction differ.

Wiring:
- server/pkg/agent: new kimiBackend + kimi_test.go; registered in New(),
  LaunchHeader map, and the supported-types coverage test.
- server/internal/daemon/config.go: probes `kimi` (overridable via
  MULTICA_KIMI_PATH / MULTICA_KIMI_MODEL).
- server/internal/daemon/execenv: writes AGENTS.md as the runtime context
  file (Kimi reads AGENTS.md natively via /init), and writes skills under
  `.kimi/skills/` so they are auto-discovered by the project-level skill
  loader.
- packages/views/runtimes: ProviderLogo gains a Kimi mark.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(agent/kimi): support per-agent model selection via ACP set_model

Wire Kimi into the model dropdown introduced in #1399:

- ListModels gets a 'kimi' case that drives the same ACP
  initialize + session/new handshake as Hermes; both share a new
  discoverACPModels helper and parseACPSessionNewModels parser
  so future ACP backends only need a small provider entry.
- kimiBackend now issues session/set_model after session/new when
  opts.Model is non-empty, mirroring the Hermes flow. Failures
  fail the task instead of silently falling back to Kimi's
  default model — silent fallback would hide that the dropdown
  pick wasn't honoured.

Verified: go build ./..., go test ./pkg/agent/... ./internal/daemon/... ./internal/handler/..., pnpm typecheck and pnpm test (138 passed).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor(agent): address code review feedback on Kimi runtime

- Share ACP provider-error sniffer between hermes and kimi. Previously
  only hermes promoted stderr-observed 4xx/5xx into a failed task;
  kimi would report "completed + empty output" when the Moonshot
  upstream rejected a request (expired token, rate limit, …). Rename
  hermesProviderErrorSniffer → acpProviderErrorSniffer and parameterise
  the provider name; wire it into kimiBackend.Execute the same way.
- Rename extractHermesSessionID → extractACPSessionID (shared by all
  ACP backends) so the name matches parseACPSessionNewModels.
- Drop the redundant second argument to kimiToolNameFromTitle; the
  Message struct has only one relevant field (Tool), so passing it
  twice was a dead fallback. Document that the function normalises
  residual capitalised kimi titles not caught by hermesToolNameFromTitle.
- Remove kimi-only cmd.WaitDelay override; the hermes baseline is
  fine for both and divergence adds noise.
- Add TestKimiBackendSetModelFailureFailsTask: fake `kimi acp` binary
  that returns a JSON-RPC error for session/set_model, asserts that
  the task result surfaces status=failed with the model name + upstream
  message and preserves the session id.
- Fix stale agent listings in agent.go / daemon/config.go doc comments
  (missing cursor, gemini, copilot).

All: `go build ./...`, `go vet ./...`, `go test ./pkg/agent/...
./internal/daemon/... ./internal/handler/...` green.

* fix(agent/kimi): pass --yolo so Shell tools don't hang on approval

Kimi's default config has `default_yolo = false`. Every Shell/file-mutating
tool call causes kimi acp to send a `session/request_permission` request
and block (up to 300s) waiting for a response. The daemon's hermesClient
only handles `session/update` notifications — permission requests go
unanswered, the tool call times out, and the UI loop eventually dies
("UI loop timed out"). Observed with the first real kimi task: agent sat
as Live for ~7 minutes before the daemon killed it.

The fix mirrors hermes' HERMES_YOLO_MODE=1 override: pass `--yolo` to
`kimi` so it auto-approves everything. `--yolo` is a top-level flag on
the `kimi` CLI (not a flag on `kimi acp`), so it must come before the
`acp` subcommand in argv. Added to kimiBlockedArgs so user custom_args
can't strip it.

While here, fix a related bug that made kimi tool names show up empty
in the daemon log ("tool #1: "): hermesToolNameFromTitle's fallback
returned `kind` when neither title-with-colon nor kind matched a known
tool. Kimi's ACP `tool_call` emits bare titles like "Shell" or "Read
file" with no `kind` at all, so we'd drop the title on the floor before
kimiToolNameFromTitle ever got a chance to map it. Now: preserve the
title when kind is unclassified; hermes titles always carry a colon so
this branch never fires for hermes.

Tests:
- TestKimiBackendPassesYoloFlag — fake binary that records its argv,
  asserts --yolo comes before acp.
- TestHermesToolNameFromTitle rows for bare kimi-style titles.
- Existing suite green: go build, go vet, full pkg/agent + daemon +
  handler test packages.

* fix(agent/acp): auto-approve session/request_permission from agent

The previous attempt (`kimi --yolo acp`) was a no-op. Inspected the
kimi-cli source: the `acp` Typer subcommand takes no parameters, so
flags on the root `kimi` command are dropped before `acp_main()` runs
— it's impossible to opt into YOLO mode through CLI flags for ACP.

The real fix is on our side: respond to session/request_permission.

ACP is bidirectional. When kimi runs a Shell or file-write tool, it
sends `session/request_permission` (agent → client, JSON-RPC request
with id + method) and waits up to 300s for a response. Our existing
hermesClient.handleLine only dispatched: (id + result/error) →
handleResponse, and (no id + method) → handleNotification. A request
with BOTH id and method fell through and got silently dropped — kimi
timed out, UI loop died, task sat stuck for 7 minutes.

Add handleAgentRequest: for session/request_permission, echo the id
and respond with outcome=selected, optionId=approve_for_session. The
daemon is headless; there's no user to prompt. `approve_for_session`
lets the agent remember the action so subsequent identical calls
(every Shell, every file write) skip the round-trip entirely. For any
other agent → client method, reply with standard -32601 method-not-
found so the agent doesn't block.

Also:
- Add writeMu so request() (main goroutine) and handleAgentRequest
  (reader goroutine) don't interleave JSON frames on stdin.
- Revert the `--yolo acp` flag — it's a no-op, and carrying it in
  kimiBlockedArgs gives the wrong impression that it does something.
  Comment in kimi.go now points at handleAgentRequest as the real fix.

Tests:
- TestHermesClientAutoApprovesPermissionRequest: inject a
  session/request_permission, assert the reply echoes the id and
  carries {outcome: selected, optionId: approve_for_session}.
- TestHermesClientReplesMethodNotFoundForUnknownAgentRequest: confirm
  unknown agent → client methods get JSON-RPC -32601 instead of silence.
- TestKimiBackendInvokesACPSubcommand replaces the yolo-flag assertion
  with a negative assertion: no dead --yolo / --auto-approve / -y on
  argv, since they'd pretend to do something they can't.

All: go build ./..., go vet ./..., go test ./pkg/agent/... green.

* fix(agent/acp): surface kimi tool input/output via content blocks

Kimi-cli emits tool_call and tool_call_update ACP frames with the
input/output inside a `content` array of ContentToolCallContent
blocks (shape: {type:"content", content:{type:"text", text:"..."}}),
not in the hermes-style `rawInput` map / `rawOutput` string. Our
parser only looked at rawInput/rawOutput, so the daemon recorded
empty Input and Output for every kimi tool — the execution-history
UI showed blank terminal panels even for commands that ran fine.

Add extractACPToolCallText() and a fallback in handleToolCallStart /
handleToolCallUpdate: when rawInput is nil / rawOutput is empty, pull
the text out of the content blocks. rawInput / rawOutput still take
precedence so hermes' behaviour is untouched. Terminal /
FileEditToolCallContent blocks are skipped (we have nothing to render
them as — kimi only emits TerminalToolCallContent when the client
advertises terminal capability, which we don't).

Tests:
- TestHermesClientHandleToolCallStartKimiContent — content array →
  Input.text populated.
- TestHermesClientHandleToolCallCompleteKimiContent — multi-block
  content → Output concatenated with newline separator.
- TestHermesClientHandleToolCallRawOutputTakesPrecedence — hermes
  rawOutput still wins when both are present.
- TestExtractACPToolCallText — unit coverage for the helper
  (single/multiple text blocks, terminal-block skip, empty input).

* fix(agent/acp): buffer streaming tool args so Input isn't empty in UI

kimi-cli streams tool args token-by-token via tool_call_update frames
— the initial tool_call carries an empty content block and each
subsequent in_progress update carries the cumulative JSON so far
(`{`, `{"comma`, `{"command": "echo`, …). The final completed update
then carries the tool's stdout, not the args. Observed per kimi-cli
acp/session.py::_send_tool_call{,_part,_result} and confirmed by
driving a real Shell call end-to-end: 10 in_progress frames, last
with `{"command": "echo hello world"}`, then completed with `hello
world\n`.

Our previous handleToolCallStart emitted MessageToolUse on the first
tool_call frame, capturing the empty content — so every kimi tool
appeared in the execution-history UI with a blank input. Output was
correct (fix 4335c198) but command was missing.

Changes:
- hermesClient now tracks pending tool calls per toolCallId. Hermes
  path is unchanged — rawInput is present at tool_call time, so
  emit-immediately-then-flag-emitted still fires on the initial frame.
- kimi path defers MessageToolUse until status=completed / failed.
  tool_call_update in_progress frames update the buffered argsText
  (cumulative, so overwrite); on completion we parse the accumulated
  JSON into Message.Input. Malformed JSON falls back to `{"text": …}`
  so non-JSON tool args still render.
- Orphan completion frames (no matching tool_call seen — e.g. daemon
  restarted mid-task) synthesise ToolUse from the update's own
  title/kind/rawInput so the UI still gets a header.
- extractACPToolCallText now also renders FileEditToolCallContent
  blocks as a compact header ("--- path / +++ path / (edited: N → M
  bytes)"). kimi emits these for Write / StrReplaceFile / Patch when
  the tool's display block is a DiffDisplayBlock.

Tests:
- TestHermesClientKimiStreamingToolCall: empty tool_call + 5 streaming
  in_progress + completed. Asserts no emission until complete, then
  [ToolUse(Input.command="echo hi"), ToolResult(Output="hi\n")].
- TestHermesClientKimiMalformedArgsFallback: non-JSON argsText → falls
  back to Input.text.
- TestHermesClientHandleToolCallCompleteOrphan: completed frame
  without a start → ToolUse synthesised from update's rawInput.
- TestExtractACPToolCallText: diff + new-file-diff cases.

All agent / daemon / handler test packages green.

---------

Co-authored-by: Eve <8b0578a3-cf72-4394-9e38-b328eca92463@users.noreply.multica.ai>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Eve <eve@multica.ai>
Co-authored-by: Lambda <f252c2c5-7d1d-4f3c-b394-a61abfe673fc@users.noreply.multica.ai>
2026-04-21 02:18:30 +08:00
Jiayuan Zhang
b291db11c2 feat(agents): add per-agent model field with provider-aware dropdown (#1399)
Adds a first-class `model` field on agents so users can pick the LLM model from the create / settings UI instead of editing `custom_env` / `custom_args`. Each provider's dropdown is populated from the live CLI when possible (`opencode models`, `pi --list-models`, `openclaw agents list --json`, `cursor-agent --list-models`, hermes ACP `session/new` → `SessionModelState`), with a static catalog for providers that don't enumerate.

Daemon resolves the runtime model as `agent.model → MULTICA_<PROVIDER>_MODEL → ""` — empty passes through so each backend's CLI picks its own default, avoiding static-guess drift.

Per-provider honouring:
- Claude / Codex / OpenCode / Cursor / Gemini / Pi / Copilot — CLI `--model` / thread payload.
- OpenClaw — `opts.Model` is mapped to `--agent <name>` (the CLI rejects `--model`).
- Hermes — `session/set_model` ACP RPC; stderr is sniffed for provider-level errors so HTTP 4xx from the configured LLM surfaces instead of "empty output"; explicit-model failures mark the task `failed`.

Supporting changes: migration 050 adds `agent.model`; daemon ↔ server heartbeat piggyback carries a model-discovery request; new REST endpoints under `/api/runtimes/{id}/models`; `multica agent create --model` / `update --model`; shared `ModelDropdown` in `packages/views/agents` (searchable, creatable, provider-grouped, default-badge, runtime-supported gate).
2026-04-21 00:06:34 +08:00
Bohan Jiang
ec73710dd2 fix(agent/codex): surface stderr tail in initialize / turn startup errors (#1314)
* fix(agent/codex): surface stderr tail in initialize / turn startup errors

When codex app-server exits before the JSON-RPC handshake completes —
e.g. because the user put a flag in custom_args that the subcommand
rejects — the Result.Error users see is `codex initialize failed:
codex process exited`, while codex's actual complaint (typically
something like `error: unexpected argument '-m' found`) only lives in
daemon logs.

Wrap the stderr writer with a bounded stderrTail that still forwards
to the slog logWriter but also retains the last 2 KiB of bytes
written. Include that tail on the three startup failure paths
(initialize, startOrResumeThread, turn/start). Runtime cancellation
paths are left untouched — they're our own abort and the stderr
context isn't a clear signal there.

Refs #1308. Complement to #1310 / #1312 — lets "bad custom_args fail
loudly" actually be workable by giving the failure a real message.

* fix(agent/codex): join cmd.Wait() before sampling stderr tail

Addressing review of #1314: reading stderrBuf.Tail() right after
c.request returns "codex process exited" was racy. Nothing in that
path synchronizes with os/exec's internal stderr copy goroutine —
cmd.Wait() is the only documented join point. The original defer ran
cmd.Wait() later, but by then we had already built Result.Error from
a potentially-empty Tail().

Replace the ad-hoc deferred stdin.Close()/cmd.Wait() with a
sync.Once-wrapped drainAndWait closure. Call it explicitly on the
three startup failure paths before sampling the tail; keep it as the
cleanup defer so the success path behaves identically.

Also add TestCodexExecuteSurfacesStderrWhenChildExitsEarly: spawns a
real subprocess that prints to stderr and exits before responding to
initialize, runs it through Execute, and asserts Result.Error
contains the stderr hint. This covers the full timing path the
reviewer flagged, which the helper-level tests in this PR did not.
2026-04-20 14:38:32 +08:00
Bohan Jiang
bd445782d5 fix(openclaw): stop passing unsupported flags and actually deliver AgentInstructions (#1362)
Fixes #1332.

Two regressions introduced in #910 (2026-04-14, "OpenClaw backend P0+P1
improvements") that together block all openclaw users:

1. `openclaw agent` does not accept `--model` or `--system-prompt`, so
   any agent configured with a Model field crashed in ~700ms with
   `exit status 1`. Remove both forwards, and add them to
   openclawBlockedArgs so custom_args can't reintroduce the crash.
   Model is bound at registration time via `openclaw agents
   add/update --model`.

2. AgentInstructions were written to `{workDir}/AGENTS.md` by
   execenv.InjectRuntimeConfig, but openclaw loads bootstrap files
   from its own workspace dir — the file was never read, so every
   agent's Instructions field was silently discarded. Populate
   opts.SystemPrompt for the openclaw provider in runTask and
   prepend it to the `--message` payload in the backend so the
   model actually receives the instructions.

Other providers surface instructions through their native runtime
config file (CLAUDE.md / AGENTS.md / GEMINI.md) and are intentionally
left unchanged to avoid double injection.

Extract buildOpenclawArgs so arg construction is directly testable;
add unit tests covering the removed flags, the SystemPrompt prepend,
and custom_args filtering.
2026-04-20 14:01:41 +08:00
devv-eve
5fa1da448f fix(chat): preserve chat session resume pointer across failures (#1360)
* fix(chat): preserve chat session resume pointer across failures

The chat 'forgets earlier messages' bug came from PriorSessionID being
silently lost in several edge cases:

- UpdateChatSessionSession unconditionally overwrote chat_session.session_id,
  so any task that completed without a session_id (early agent crash,
  missing result) wiped the resume pointer to NULL.
- CompleteAgentTask + UpdateChatSessionSession ran in separate calls. A
  follow-up chat message claimed in between resumed against a stale (or
  NULL) session and started over.
- FailAgentTask never wrote session_id back, so a task that established
  a real session before failing lost its resume pointer.
- ClaimTaskByRuntime only trusted chat_session.session_id and never
  fell back to the existing GetLastChatTaskSession query, so a single
  bad turn could permanently drop the conversation memory.

This change:

- Use COALESCE in UpdateChatSessionSession so empty inputs preserve the
  existing pointer; surface DB errors instead of swallowing them.
- Run CompleteAgentTask/FailAgentTask + UpdateChatSessionSession inside
  the same transaction (TaskService now takes a TxStarter).
- Extend FailAgentTask + the daemon FailTask path (client, handler,
  service) to forward session_id/work_dir, so failed/blocked tasks that
  built a real session still record it.
- Fall back to GetLastChatTaskSession in ClaimTaskByRuntime when the
  chat_session pointer is missing, and include failed tasks in that
  lookup so a single failure can't lose the conversation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(daemon): forward session_id/work_dir on blocked + timeout paths

runTask previously dropped result.SessionID and env.WorkDir on the
non-completed return paths:

- timeout returned a naked error, so handleTask called FailTask with
  empty session info and the chat resume pointer was either left stale
  or eventually overwritten with NULL.
- blocked / failed (default branch) returned a TaskResult without
  SessionID / WorkDir, so even though FailTask now COALESCEs into
  chat_session, there was no value to write through.
- the empty-output completion path was the same: it raised an error
  even when a real session_id had been built.

All three paths now return a TaskResult that carries the SessionID /
WorkDir the backend produced. Combined with the COALESCE-based update
in UpdateChatSessionSession and the FailTask plumbing introduced in
PR #1360, the next chat turn can always resume from the latest agent
session — even when the previous turn timed out, was rate-limited, or
returned an empty completion — instead of starting over with no memory
of the conversation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(copilot): capture session id from session.start as fallback

The Copilot backend only read sessionId from the synthetic 'result'
event, ignoring the one already present on session.start. When the CLI
was killed before result arrived (timeout, cancel, crash, or a
session.error mid-turn), the daemon reported SessionID="" and the
chat-session resume pointer could not advance — causing the chat to
silently drop conversation memory on the next turn.

Capture session.start.sessionId into state up front, and only let
'result' overwrite it when it actually carries one. result still wins
when present (it is the authoritative end-of-turn record).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(copilot): parse premiumRequests as float to preserve session id

Copilot CLI v1.0.32 serializes premiumRequests as a float (e.g. 7.5),
not an integer. Our copilotResultUsage struct typed it as int, which
made the entire 'result' line fail json.Unmarshal — silently dropping
sessionId on every turn.

This was the real cause of chat memory loss: the daemon reported
SessionID="" to the server, chat_session.session_id stayed NULL, and
the next chat turn never received --resume <id>, so each turn started
a fresh Copilot session with no prior context.

Add a regression test using the real JSON line from CLI v1.0.32 that
asserts sessionId is preserved when premiumRequests is fractional.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Devv <devv@Devvs-Mac-mini.local>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Eve <eve@multica.ai>
Co-authored-by: yushen <ldnvnbl@gmail.com>
2026-04-19 22:50:33 -07:00
Bohan Jiang
163f34f918 feat(agents): show launch mode preview in custom args tab (#1312)
* feat(agent): add LaunchHeader per agent type

Each backend in server/pkg/agent/ hardcodes a stable command skeleton
(e.g. `codex app-server --listen stdio://`, `hermes acp`) before
appending opts.CustomArgs. Surfacing that skeleton lets the UI tell
users which command their custom_args are being appended to, so a
Codex user doesn't mistakenly add `-m gpt-5.4-mini` expecting it to
reach the CLI when the subcommand is actually `app-server`.

Expose only the minimum that aids judgment — binary + subcommand, or a
short mode label when there is no subcommand — and deliberately omit
transport values, internal flags, and env to keep the surface small
and renaming-safe.

Refs #1308.

* feat(handler/runtime): surface launch_header on runtime response

runtimeToResponse now derives launch_header from agent.LaunchHeader,
piggybacking on the runtime's existing provider field so the
frontend's RuntimeDevice gains the skeleton without a new endpoint or
DB query. Client gets the header for free whenever it lists agents'
runtimes — which the custom-args tab already does.

Refs #1308.

* feat(ui/agents): show launch mode preview in custom args tab

Thread the resolved RuntimeDevice from AgentDetail into CustomArgsTab
and render its launch_header as a one-line preview above the args
list, so users see `codex app-server <your args>` (or equivalent per
provider) and can tell whether a CLI-style flag like `--model` will
actually reach the invoked subcommand. Source of truth stays in the
Go backend; the TS type just carries the string.

Refs #1308.
2026-04-18 14:18:42 +08:00
niceSprite
746f33a38b fix(claude): clear fresh session_id on resume failure so daemon fallback fires (#1285)
When --resume targets a dead session, claude prints
"No conversation found with session ID: ..." to stderr, emits a stream-json
system init with a fresh session_id, then exits with code 1. The backend
was treating that fresh id as the authoritative session, so
daemon.go's retry-with-fresh-session fallback (SessionID == "" guard)
never triggered. Every subsequent task for the same (issue, agent) pair
stayed permanently broken until the server-side session_id was cleared by
hand.

Fix: when --resume was requested but the emitted session_id differs AND
the run failed, drop the fresh id from Result so the daemon's existing
fallback can do its job. Factored into a pure helper and unit-tested.

Fixes #1284

Co-authored-by: fuxiao <fuxiao@zyql.com>
2026-04-18 12:59:30 +08:00
Korkyzer
63800f05ff fix(agent): add per-agent mcp_config field to restore MCP access (#1168)
* fix(agent): add per-agent mcp_config field to restore MCP access

Closes #1111

The --strict-mcp-config flag was added defensively in #592 to prevent
Claude agents from inheriting MCP state from the outer Claude Code session.
It was meant to be paired with --mcp-config <path> to inject a controlled
set of MCPs, but that path was never implemented, which silently stripped
all user-scope MCPs from spawned agents.

This PR completes the original design by:

- Adding a nullable mcp_config jsonb column to the agents table
- Wiring mcp_config through AgentResponse, Create/Update requests
- Piping it into ExecOptions.McpConfig in the daemon
- Serializing to a temp file and passing --mcp-config <path> in buildClaudeArgs
- Blocklisting --mcp-config in claudeBlockedArgs to prevent override
  via custom_args

Does not touch Codex provider (tracked separately in #674).
Does not implement Multica MCP auto-injection (out of scope).

* fix: disambiguate JSON null vs absent for mcp_config
2026-04-18 01:35:22 +08:00
Bohan Jiang
a73336dcf8 feat(daemon): persistent UUID identity + legacy-id merge at register-time (#1220)
* feat(daemon): persistent UUID identity + legacy-id merge at register-time

daemon_id is now a stable UUID persisted to `<profile-dir>/daemon.id` on
first start, replacing the hostname-derived id that drifted whenever
`.local` appeared/disappeared, a system was renamed, or a profile
switched — each of which used to mint a fresh `agent_runtime` row and
strand agents on the old one.

To migrate existing installs without operator intervention, the daemon
reports every legacy id it may have registered under previously
(`host`, `host` with `.local` stripped, and `host[-profile]` variants
for both). At register-time the server looks up each candidate row
scoped to (workspace, provider), re-points its agents and tasks onto
the new UUID-keyed row, records which legacy id was subsumed in the
new `legacy_daemon_id` column for audit, and deletes the stale row.
Result: users running `xxx.local`-keyed runtimes today transparently
land on the new UUID row on next daemon restart.

The hostname-prefix `MigrateAgentsToRuntime` / `daemon_id LIKE '...-%'`
compatibility shim is no longer needed and has been removed along with
the handler call that invoked it.

* fix(daemon): handle bidirectional .local drift and case drift in legacy merge

Review on #1220 flagged two gaps in the legacy-id migration candidate set:

1. Reverse .local: LegacyDaemonIDs only added the stripped variant when the
   current hostname ended in `.local`. The opposite direction — DB has
   `foo.local`, current host is `foo` — was missed, so runtimes registered
   under the `.local` variant stayed orphaned after upgrade. Now both
   variants (`foo` and `foo.local`) are always emitted, regardless of what
   `os.Hostname()` currently returns, plus their `-<profile>` suffix forms.

2. Case drift: os.Hostname() has been observed returning different casings
   on the same machine across mDNS/reboot state. A case-sensitive `=`
   comparison stranded rows like `Jiayuans-MacBook-Pro.local` when the
   daemon later reported `jiayuans-macbook-pro.local`. FindLegacyRuntimeByDaemonID
   now uses `LOWER(daemon_id) = LOWER(@daemon_id)` on both sides, so casing
   differences merge rather than orphan. The (workspace_id, provider) prefix
   still bounds the scan to a tiny set of rows so the non-indexed LOWER()
   comparison has negligible cost.

Tests: TestLegacyDaemonIDs gets the mixed-case + reverse-direction cases;
daemon_test.go adds TestDaemonRegister_MergesLegacyDaemonIDRuntime_ReverseDotLocal
and TestDaemonRegister_MergesLegacyDaemonIDRuntime_CaseDrift.

* fix(daemon): consolidate every case-duplicate legacy runtime, not just the first

Follow-up review on #1220: after switching to `LOWER(daemon_id) =
LOWER(@daemon_id)`, the single-row lookup still only merged one legacy
row per candidate. If a machine already had two rows in the DB that
differed only in casing (e.g. `Jiayuans-MacBook-Pro.local` AND
`jiayuans-macbook-pro.local` coexisting because earlier hostname drift
already minted a duplicate), only one of them got consolidated and the
other stayed orphaned — violating the "no duplicate runtime per machine
after backfill" acceptance.

- FindLegacyRuntimeByDaemonID → FindLegacyRuntimesByDaemonID (:many)
- mergeLegacyRuntimes iterates every returned row and dedupes across
  overlapping legacy candidates so `foo` and `foo.local` both resolving
  to the same stored row don't double-process

Test: TestDaemonRegister_MergesAllCaseDuplicateLegacyRuntimes seeds two
case-duplicate rows with one agent each and confirms both rows are
deleted and both agents end up on the new UUID-keyed row.
2026-04-17 15:10:38 +08:00
Bohan Jiang
423ceaf8f4 test(agent): regression tests for codex subagent threadId filter (#1257)
Follow-up to #1192. Document the v2 protocol contract that the
dispatch-level threadId guard relies on, and lock down the two leakage
paths the guard closes:

- turn/completed from a subagent thread must not call onTurnDone
- item/completed (agentMessage, final_answer) from a subagent thread
  must neither leak text into the output builder nor terminate the turn

Without these tests a future refactor that drops or relocates the guard
would not be caught by CI, since existing notification tests omit the
top-level threadId field and pass through unfiltered.
2026-04-17 14:49:38 +08:00
niceSprite
462ff88df5 fix(codex): dispatch-level threadId filter for subagent notifications (#1192)
* fix(daemon): filter thread/status/changed by threadId to prevent subagent interference

When Codex CLI has memories enabled, the app-server spawns a memory
consolidation subagent as a separate thread within the same stdio
connection. When that subagent thread finishes and transitions to idle,
the daemon's codex backend mistakenly interprets the idle signal as the
main turn completing, causing it to close stdin and cancel the context
before the real turn produces any output.

Add a threadId check to the thread/status/changed handler so only
status changes from the tracked thread trigger turn completion. Signals
from subagent threads (threadId != c.threadID) are now ignored.

Fixes #1181

* fix(codex): dispatch-level threadId filter for subagent notifications

Codex multiplexes subagent threads (e.g. memory consolidation) on
the same stdio pipe. Previously only thread/status/changed had a
threadId guard, but item/completed (agentMessage + final_answer),
turn/completed, and turn/started from subagent threads could still
trigger onTurnDone or contaminate output.

Move the threadId check to the top of handleRawNotification so all
notification handlers are protected. Remove the now-redundant
per-handler check on thread/status/changed.

Fixes multica-ai/multica#1181

---------

Co-authored-by: fuxiao <fuxiao@zyql.com>
2026-04-17 14:45:09 +08:00
Bohan Jiang
d12d690c38 fix(usage): bucket workspace usage by task_usage.created_at, not enqueue time (#1176)
GetWorkspaceUsageByDay and GetWorkspaceUsageSummary had the same date
attribution bug as the runtime endpoint fixed in #1167: they bucketed
and filtered on agent_task_queue.created_at (enqueue time), so a task
that queued at 23:58 and reported usage at 00:05 was attributed to the
prior day, and ?days=N became a rolling now()-N window that clipped the
morning of the earliest day returned.

Switch both queries to task_usage.created_at (~= task completion time)
and snap the since cutoff to start-of-day via DATE_TRUNC, mirroring
ListRuntimeUsage.

These endpoints have no frontend caller today, but per offline
discussion they will back the upcoming workspace-level usage dashboard.
Fix preemptively so the dashboard inherits correct numbers.

Add a regression test covering both endpoints with the same
cross-midnight + earliest-day-cutoff scenarios used for runtime usage.
2026-04-16 19:06:49 +08:00
Bohan Jiang
a36252ca99 refactor(runtime): derive runtime usage from task_usage only (#1167)
* refactor(runtime): derive runtime usage from task_usage only

The daemon used to scan each runtime's local CLI log directory every 5
minutes (Claude Code, Codex, OpenCode, OpenClaw, Hermes) and post daily
aggregates to /api/daemon/runtimes/{id}/usage. Those directories are
shared with the user's own local CLI sessions, so the user's personal
usage was being counted as Daemon-executed usage. Cursor and Gemini had
no scanner at all, so their runtime-level aggregates were always zero.

Switch GetRuntimeUsage to aggregate task_usage (already scoped to
Daemon-executed tasks) via agent_task_queue.runtime_id. Single source of
truth; Cursor/Gemini/Copilot get runtime usage for free; no reliance on
external CLI log formats.

Removes:
- server/internal/daemon/usage/ (all scanners)
- Daemon.usageScanLoop + providerToRuntimeMap
- Client.ReportUsage
- ReportRuntimeUsage handler + POST /api/daemon/runtimes/{id}/usage
- UpsertRuntimeUsage / GetRuntimeUsageSummary queries
- runtime_usage table (migration 046)

Refs: MUL-786

* fix(runtime): bucket daily usage by task_usage.created_at, not enqueue time

ListRuntimeUsage was aggregating by DATE(atq.created_at) and filtering
on atq.created_at. agent_task_queue.created_at is the enqueue timestamp,
which drifts from actual token-production time: a task queued at 23:58
and executed at 00:05 was attributed to yesterday; a task sitting in
the queue overnight was counted on the queue day.

The ?days=N cutoff also became a rolling window (now() - N) instead of
a calendar-day boundary, silently clipping the morning of the earliest
day returned.

Switch bucket + filter to task_usage.created_at (~= task completion /
usage-report time) and snap the since cutoff to start-of-day via
DATE_TRUNC.

Add a regression test covering both scenarios: cross-midnight task
attributes to the day tokens were reported, and the earliest day's
pre-cutoff rows are still included.
2026-04-16 18:54:12 +08:00
Bohan Jiang
9a97ee1f4c fix(agent): resume codex thread across tasks on the same issue (#1166)
Every other backend (Claude, Gemini, OpenCode, OpenClaw, Hermes) honors
ExecOptions.ResumeSessionID — only Codex didn't. That's why users on
the Codex runtime saw each new comment on an issue start a fresh Codex
conversation: the daemon persists Result.SessionID per (agent, issue)
and passes it back as PriorSessionID, but codex.go always called
thread/start and never populated SessionID, so the value round-tripped
as empty.

Wire the missing half:

- Extract startOrResumeThread on codexClient. When ResumeSessionID is
  set, call thread/resume (per the Codex app-server protocol), passing
  only cwd / model / developerInstructions overrides so the thread
  keeps its persisted model and reasoning effort. If resume fails
  (unknown thread, schema drift, transport error) fall back to
  thread/start so the task still runs on a fresh thread.
- Surface the live threadID as Result.SessionID on the final emit so
  the daemon stores it and feeds it back into ResumeSessionID on the
  next claim.

Tests drive the new helper through the fake stdin harness, covering:
fresh start, successful resume, fallback on resume error, fallback
when resume returns no thread ID, and surfacing of thread/start
failures.
2026-04-16 18:06:11 +08:00
LinYushen
cd50c31201 feat(agent): add GitHub Copilot CLI backend (#1157)
* feat(agent): add GitHub Copilot CLI backend

Integrate Copilot CLI as a new agent backend using the stable
`-p` JSONL mode (`--output-format json`), following the same
spawn-CLI-scan-JSONL pattern established by claude.go.

Backend (server/pkg/agent/copilot.go):
- Spawn `copilot -p <prompt> --output-format json --allow-all-tools --no-ask-user`
- Parse streaming JSONL events (system/assistant/user/result/log)
- Extract session ID for resume support (`--resume <id>`)
- Accumulate per-model token usage for billing
- Filter blocked args to prevent protocol-critical flag overrides

Daemon config:
- Probe MULTICA_COPILOT_PATH / MULTICA_COPILOT_MODEL env vars
- Copilot uses AGENTS.md (native discovery) and default skills path

Frontend:
- Add Copilot logo SVG and provider switch case

Tests: 14 unit tests covering arg building, event parsing, usage
accumulation, and edge cases. All Go + TS checks pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(daemon): add restart subcommand, make daemon uses it

- `daemon start` keeps original behavior: errors if already running
- `daemon restart` stops existing daemon then starts fresh
- `make daemon` now runs `daemon restart --profile local`

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(copilot): address review nits 1-5

- Nit 1: Add MinVersions["copilot"] = "1.0.0"
- Nit 2: Seed activeModel from session.start.data.selectedModel (falls
  back to opts.Model, then "copilot"). First-turn tokens now get correct
  model attribution.
- Nit 3: Handle assistant.reasoning/reasoning_delta → MessageThinking,
  reasoningText in assistant.message → MessageThinking,
  session.warning → MessageLog{warn}
- Nit 4: Extract handleCopilotEvent() method shared by production and
  tests — no more duplicated switch body that can drift
- Nit 5: Deltas write to output buffer as defense-in-depth; if process
  dies before assistant.message, output is non-empty

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-16 17:14:56 +08:00
Bohan Jiang
ac8b08e540 fix(agent): surface codex turn errors instead of reporting empty output (#1156)
When codex emits `turn/completed` with `status="failed"` or a terminal
top-level `error` notification, the daemon previously treated the turn
as successfully completed, saw no accumulated text, and surfaced the
generic "codex returned empty output" — hiding the real reason (auth,
sandbox, API error, etc.).

Capture `turn.error.message` on failed turns and the `error.message`
from non-retrying top-level error notifications, then propagate them
through `Result.Error` with `finalStatus="failed"` so the daemon's
default branch reports the actual cause.
2026-04-16 16:53:08 +08:00
devv-eve
c0b4e7e8b8 feat(agent): add Cursor Agent CLI runtime support (#1057)
* feat(agent): add Cursor Agent CLI runtime support

Add cursor-agent as a new agent backend, following the same pattern as
existing providers. The implementation spawns cursor-agent CLI with
stream-json output, parses JSONL events into the unified Message type,
and supports session resume, usage tracking, and auto-approval (--yolo).

Changes:
- server/pkg/agent/cursor.go: cursorBackend implementation
- server/pkg/agent/cursor_test.go: unit tests for args, parsing, errors
- server/pkg/agent/agent.go: register "cursor" in New() factory
- server/internal/daemon/config.go: probe cursor-agent in PATH
- server/internal/daemon/execenv/context.go: cursor skill discovery path
- server/internal/daemon/execenv/runtime_config.go: AGENTS.md injection
- packages/views/.../provider-logo.tsx: cursor logo in UI

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(agent): address PR review for cursor backend

1. Fix token usage double-counting: usage is now taken exclusively from
   "result" events (session totals). Per-message usage in "assistant"
   events is intentionally ignored. "step_finish" usage is only used as
   fallback when no "result" usage is available.

2. Remove dead code: isCursorUnknownSessionError() and its regex were
   defined but never called. Removed along with corresponding test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(agent): add missing CustomArgs, SystemPrompt, MaxTurns, and debug logging to cursor backend

- Add cursorBlockedArgs and filterCustomArgs support for safe custom arg passthrough
- Add --system-prompt and --max-turns flag support to buildCursorArgs
- Add debug logging of command args before execution (consistent with all other backends)
- Move stdout-close goroutine inside main goroutine (consistent with claude.go pattern)
- Add tests for SystemPrompt/MaxTurns and CustomArgs filtering

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: make daemon uses local profile & update Cursor logo to official brand

- Makefile: make daemon now runs 'daemon start --profile local' for local dev
- Replace Cursor runtime logo with official brand SVG (removed background rect)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(agent): remove unsupported --system-prompt and --max-turns from cursor-agent

cursor-agent CLI does not support these flags. Instructions are already
injected via AGENTS.md and .cursor/skills/ files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(agent): prevent step_finish + result usage double-counting in cursor

Split usage accumulation into separate stepUsage and resultUsage maps.
After stream ends, use resultUsage if available (session totals from
result event), otherwise fall back to stepUsage (sum of step_finish).
This prevents 2x counting when result.usage already includes totals.

Added table-driven test covering: result-only, step_finish-only,
step_finish+result (no double count), and multi-model scenarios.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs(agent): fix misleading comment on cursor -p flag

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Devv <devv@Devvs-Mac-mini.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: yushen <ldnvnbl@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-16 15:54:21 +08:00
Bohan Jiang
8c518c350a feat(agent): add Pi agent runtime support (#1064)
* feat(agent): add Pi agent runtime support

Add Pi as a new agent runtime provider, following the established adapter
pattern. Pi CLI outputs JSONL events which are parsed for messages, tool
calls, and usage tracking.

Backend:
- New piBackend implementing the Backend interface (pi.go)
- Pi CLI discovery via MULTICA_PI_PATH env var or PATH lookup
- JSONL event stream parsing (agent_start, message_update, thinking_update,
  tool_execution_start/end, agent_end)
- Usage scanner for ~/.pi/sessions/*.jsonl files
- Runtime config injection via AGENTS.md
- Skill injection to .pi/agent/skills/

Frontend:
- Pi provider logo (teal π icon)
- Pi label in transcript dialog

Docs:
- Updated all provider lists in README, CLI_INSTALL, and docs

* fix(agent): filter Pi usage scanner to agent_end events only

Address review feedback: restrict usage parsing to agent_end events
which contain cumulative totals, preventing potential inaccuracy if
Pi adds usage fields to other event types in the future.

* fix(agent): align Pi runtime with real CLI flags, event schema, and custom_args

- Flags: Pi's CLI uses `--mode json` (not `--output-format jsonl`), has no
  `--yolo` (explicit `--tools` allowlist instead), takes the prompt as a
  positional argument (not `-p <prompt>`), splits model as
  `--provider <name> --model <id>`, and treats `--session` as a file path
  that must exist before spawn.
- Event parsing: rewrite the stream event struct to match Pi's actual
  JSON event schema (`message_update.assistantMessageEvent.delta`,
  `turn_end.message.usage.{input,output,cacheRead,cacheWrite}`, etc.).
- Sessions: generate/persist session files under ~/.multica/pi-sessions/
  and use the file path as the opaque SessionID returned to the daemon.
- Usage scanner: read assistant `message` events from the same session
  files (Pi's session-file schema, distinct from the stdout stream).
- Custom args: consume `ExecOptions.CustomArgs` via `filterCustomArgs`
  with a Pi-specific blocked set (`-p`, `--print`, `--mode`, `--session`)
  so Pi matches the pattern shared by every other agent backend.
2026-04-16 15:42:40 +08:00
Junghwan
7395b51aee fix(agent): apply filterCustomArgs to hermes backend for parity (#1122)
Every other backend (claude, codex, opencode, openclaw, gemini) filters
opts.CustomArgs through a per-backend blocked map so protocol-critical
flags can't be overridden via the Create Agent UI. The hermes backend
appended CustomArgs directly to argv, so any future flag we add to the
map would be silently bypassed here.

Add hermesBlockedArgs (with 'acp' as the pinned subcommand) and route
CustomArgs through filterCustomArgs. Behaviour is identical for today's
use cases; the change prevents accidental protocol-flag overrides and
brings hermes in line with the other five backends.

Closes #1113

Co-authored-by: shaun0927 <shaun0927@users.noreply.github.com>
2026-04-16 13:53:53 +08:00
Bohan Jiang
b6d30c0e00 feat(agent): log full command line at debug level when spawning agents (#1071)
Add a debug-level log line in every agent backend (claude, codex,
opencode, openclaw, gemini, hermes) that prints the executable path
and full argument list when spawning the agent process. Helps diagnose
custom args, model overrides, and other CLI flag issues.
2026-04-15 16:21:55 +08:00
Bohan Jiang
ce447c7f06 feat(agent): add custom CLI arguments support (#986)
* feat(agent): add custom CLI arguments support

Allow users to configure custom CLI arguments per agent that get
appended to the agent subprocess command at launch time. This enables
use cases like specifying different models (--model o3), max turns,
or other provider-specific flags without needing separate runtimes.

Changes:
- Add custom_args JSONB column to agent table (migration 041)
- Update API handler to accept/return custom_args in create/update
- Pass custom_args through claim endpoint to daemon
- Append custom_args to CLI commands for all agent backends
- Add ExecOptions.CustomArgs field in agent package
- Add Custom Args tab in agent detail UI
- Add --custom-args flag to CLI agent create/update commands

Closes MUL-802

* fix(agent): filter protocol-critical flags from custom_args

Add per-backend filtering of custom_args to prevent users from
accidentally overriding flags that the daemon hardcodes for its
communication protocol (e.g. --output-format, --input-format,
--permission-mode for Claude).

This follows the same pattern as custom_env's isBlockedEnvKey: we
only block the small, stable set of flags that would break the
daemon↔agent protocol — not every possible dangerous flag. Workspace
members are trusted for everything else.

Each backend defines its own blocked set:
- Claude: -p, --output-format, --input-format, --permission-mode
- Gemini: -p, --yolo, -o
- Codex: --listen
- OpenCode: --format
- OpenClaw: --local, --json, --session-id, --message
- Hermes: none (ACP is positional)

Includes unit tests for the filtering logic.

* fix(agent): address code review nits for custom_args

- Replace module-level `nextArgId` counter with `crypto.randomUUID()`
  in custom-args-tab.tsx to avoid SSR ID conflicts
- Add unit tests for custom args passthrough and blocked-arg filtering
  in both Claude and Gemini arg builders
2026-04-15 14:58:53 +08:00
Jiayuan Zhang
f94b0100cd refactor(autopilot): remove broken concurrency policies and fix multiple bugs (#1048)
Remove the concurrency_policy system (skip/queue/replace) — skip had an
orphan bug that permanently blocked triggers, queue didn't actually queue,
and replace didn't cancel running tasks. Every trigger now simply executes.

Bug fixes:
- Listener now handles in_review status (was silently ignored)
- Issue deletion fails linked autopilot runs before DELETE (prevents orphans)
- ComputeNextRun rejects invalid timezones instead of silent UTC fallback
- dispatchCreateIssue post-commit failures now properly fail the run

Reliability:
- Scheduler recovers lost triggers on startup (crash recovery)
- New index on autopilot_run(issue_id) for deletion lookups
- Migration 043 cleans up historical orphaned/skipped/pending runs
2026-04-15 13:48:21 +08:00
Jiayuan Zhang
d88fe2608e feat(autopilot): scheduled/triggered automations for AI agents (#1028)
* feat(autopilot): add scheduled/triggered automation for AI agents

Introduce the Autopilot feature — recurring automations that assign work
to AI agents on a schedule or manual trigger. Supports two execution
modes: create_issue (creates an issue for the agent to work on) and
run_only (directly enqueues an agent task without issue pollution).

Backend: migration (3 tables + 2 columns), sqlc queries, AutopilotService
with concurrency policies (skip/queue/replace), HTTP CRUD + trigger
endpoints, background cron scheduler (30s tick), event listeners for
issue→run and task→run status sync.

Frontend: types, API client methods, TanStack Query hooks with optimistic
mutations, realtime cache invalidation, list page with create dialog,
detail page with trigger management and run history, sidebar nav + routes
for both web and desktop apps.

* feat(autopilot): improve UX — trigger config, edit dialog, template gallery

- Replace raw cron input with friendly frequency tabs (Hourly/Daily/Weekdays/Weekly/Custom), time picker, and timezone dropdown defaulting to user's local timezone
- Fix Select components showing UUIDs instead of names (Base UI render function pattern)
- Add Edit button on detail page opening a unified edit dialog
- Remove project/concurrency/issue-title-template from create/edit (simplify for users)
- Add trigger configuration inline during autopilot creation
- Add template gallery on empty state (6 step-by-step workflow templates)
- Rename "Description" to "Prompt" throughout UI
- Inject autopilot run timestamp into issue description for agent date awareness
- Treat issue status "in_review" as run completion (fixes skip on next trigger)
- Make migration idempotent with IF NOT EXISTS clauses
2026-04-15 04:54:37 +08:00
Bohan Jiang
ff1d348274 feat(security): invitation acceptance flow for workspace members (#1019)
* feat(security): replace instant member-add with invitation acceptance flow

Users invited to a workspace must now explicitly accept the invitation
before becoming a member. This fixes the security vulnerability where
knowing someone's email was enough to auto-register their runtime to
your workspace.

Changes:
- Add workspace_invitation table with pending/accepted/declined/expired states
- Replace CreateMember with CreateInvitation (same endpoint, new behavior)
- Add accept/decline/revoke/list invitation API endpoints
- Add invitation WS events for real-time notification
- Frontend: invitation accept/decline UI in workspace switcher
- Frontend: pending invitations section in members settings tab

* fix(invitation): address PR review nits

- Fix invitation:revoked listener to send event to invitee user (was no-op)
- Remove duplicate queryClient2 in app-sidebar.tsx, reuse existing queryClient
- Add expires_at > now() filter to ListPendingInvitationsByWorkspace query
2026-04-15 00:01:18 +08:00
Naiyuan Qing
a744cd4f45 feat(chat): redesign state, header, and unread tracking
State management
- Pending task / live timeline are now Query-cache single source;
  Zustand mirror removed (fixes duplicate assistant render caused by
  the invalidate→refetch race window)
- WS subscriptions moved from ChatWindow to global useRealtimeSync so
  pending state survives minimize and refresh
- New GET /chat/sessions/:id/pending-task to recover live state on mount
- Drafts persisted per-session (was per-workspace)

Unread tracking
- Migration 040: chat_session.unread_since (event-driven; old chats
  stay clean — no mass backfill)
- POST /chat/sessions/:id/read clears unread; broadcasts
  chat:session_read so other devices sync
- New GET /chat/pending-tasks aggregate for the FAB
- ChatFab: brand-color impulse animation while running, brand-dot
  badge of unread session count
- ChatWindow auto-marks read when user is viewing the session

Header redesign
- Two independent dropdowns: agent (avatar + name + My/Others
  grouping) at the input bottom-left; session (title + agent avatar)
  in the header
- ⊕ new-chat button replaces the old + and history buttons
- Session dropdown lists all sessions across agents with avatars
- Empty state: 3 clickable starter prompts that send immediately
- Mention link renderer falls through to default span on null —
  fixes @member/@agent/@all silently disappearing app-wide
- User messages render through Markdown
- Enter submits in chat input only (with IME guard + codeBlock skip);
  bubble menu hidden in chat

Misc
- Partial index on agent_task_queue for fast pending-task lookup
- 2 new storage keys added to clearWorkspaceStorage
- useMarkChatSessionRead has onError rollback
- chat.* namespace logs across store, mutations, components, realtime

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 18:21:11 +08:00
Bohan Jiang
bf71802451 fix(server): trigger agent on reply in thread where agent already participated (#981)
When a member replies in a member-started thread without @mentioning the
assigned agent, the on_comment trigger was suppressed — even if the agent
had already replied in that thread. This meant the common flow of
"member posts → agent replies → member follows up" would not re-trigger
the agent on the follow-up.

Add HasAgentRepliedInThread SQL query and check it in isReplyToMemberThread
so that agent participation in a thread is treated as an ongoing conversation.
2026-04-14 18:00:29 +08:00
Bohan Jiang
0a998d1cef Merge pull request #846 from multica-ai/agent/j/feb218fd
feat(agent): support custom environment variables for router/proxy mode
2026-04-14 15:34:21 +08:00
Jiang Bohan
bb4944bae2 fix(openclaw): handle pretty-printed multi-line JSON output
OpenClaw outputs its --json result as pretty-printed multi-line JSON to
stderr. The line-by-line scanner never found a valid JSON object on any
single line, causing the raw JSON to be returned as the chat response.

After exhausting line-by-line parsing, try parsing the accumulated
output as a whole before falling back to raw text.

Closes MUL-725
2026-04-14 14:39:32 +08:00
Jiang Bohan
f3355049bc feat(agent): add live log support for Gemini CLI via stream-json
Switch Gemini backend from `-o text` (batch output) to `-o stream-json`
(NDJSON streaming) so tool calls, text, and errors are forwarded to the
UI in real time instead of collected at the end.

Parses all Gemini stream-json event types: init, message, tool_use,
tool_result, error, and result — including per-model token usage from
the result stats.
2026-04-14 14:17:13 +08:00
Bohan Jiang
c71525e198 Merge pull request #910 from multica-ai/agent/j/openclaw-p0-p1
feat(agent): OpenClaw backend P0+P1 improvements
2026-04-14 14:02:38 +08:00
devv-eve
977dc6479d fix(daemon): prevent task stall when agent process hangs on stdout (#947)
When an agent CLI process hangs (e.g. a tool call blocks on unreachable
I/O), the daemon's scanner blocks indefinitely on stdout, preventing the
Result from ever being sent. This causes tasks to stay in "running"
state permanently with no further events.

Three-layer fix:

1. Agent backends (claude, opencode, openclaw, gemini): add a watchdog
   goroutine that closes the stdout/stderr pipe when the context is
   cancelled, forcing the scanner to unblock. Also set cmd.WaitDelay
   so Go force-closes pipes after 10s if the process doesn't exit.

2. daemon executeAndDrain: add an independent drain timeout (backend
   timeout + 30s buffer) with context-aware select on both the message
   channel and the result channel, so the daemon never blocks forever.

3. daemon ping path: add context-aware select so pings don't deadlock
   if the agent backend stalls.

Closes #925

Co-authored-by: Devv <devv@Devvs-Mac-mini.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 23:00:27 -07:00