* fix(agent/codex): surface stderr tail in initialize / turn startup errors
When codex app-server exits before the JSON-RPC handshake completes —
e.g. because the user put a flag in custom_args that the subcommand
rejects — the Result.Error users see is `codex initialize failed:
codex process exited`, while codex's actual complaint (typically
something like `error: unexpected argument '-m' found`) only lives in
daemon logs.
Wrap the stderr writer with a bounded stderrTail that still forwards
to the slog logWriter but also retains the last 2 KiB of bytes
written. Include that tail on the three startup failure paths
(initialize, startOrResumeThread, turn/start). Runtime cancellation
paths are left untouched — they're our own abort and the stderr
context isn't a clear signal there.
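A minimal sketch of the bounded-tail idea described above, not the actual stderrTail implementation. The names `tailWriter`, `newTailWriter`, and `Tail` are hypothetical, and allowing a nil `next` writer is an assumption added to keep the example standalone:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"sync"
)

// tailWriter forwards every write to next (when non-nil) and retains only the
// last max bytes written. Hypothetical stand-in for the stderrTail wrapper.
type tailWriter struct {
	mu   sync.Mutex
	next io.Writer
	max  int
	buf  []byte
}

func newTailWriter(next io.Writer, max int) *tailWriter {
	return &tailWriter{next: next, max: max}
}

func (w *tailWriter) Write(p []byte) (int, error) {
	w.mu.Lock()
	w.buf = append(w.buf, p...)
	if len(w.buf) > w.max {
		// Keep only the newest max bytes; copy so the old backing array is freed.
		w.buf = append([]byte(nil), w.buf[len(w.buf)-w.max:]...)
	}
	w.mu.Unlock()
	if w.next == nil {
		return len(p), nil
	}
	return w.next.Write(p)
}

// Tail returns the retained bytes with surrounding whitespace trimmed.
func (w *tailWriter) Tail() string {
	w.mu.Lock()
	defer w.mu.Unlock()
	return string(bytes.TrimSpace(w.buf))
}

func main() {
	var forwarded bytes.Buffer // stands in for the slog-backed log writer
	w := newTailWriter(&forwarded, 2048)
	fmt.Fprintln(w, "error: unexpected argument '-m' found")
	fmt.Printf("codex initialize failed: codex process exited: %s\n", w.Tail())
}
```

The tail is bounded so that a chatty child cannot inflate the error string, while the forwarded stream still receives everything.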
Refs #1308. Complement to #1310 / #1312 — lets "bad custom_args fail
loudly" actually be workable by giving the failure a real message.
* fix(agent/codex): join cmd.Wait() before sampling stderr tail
Addressing review of #1314: reading stderrBuf.Tail() right after
c.request returns "codex process exited" was racy. Nothing in that
path synchronizes with os/exec's internal stderr copy goroutine —
cmd.Wait() is the only documented join point. The original defer ran
cmd.Wait() later, but by then we had already built Result.Error from
a potentially-empty Tail().
Replace the ad-hoc deferred stdin.Close()/cmd.Wait() with a
sync.Once-wrapped drainAndWait closure. Call it explicitly on the
three startup failure paths before sampling the tail; keep it as the
cleanup defer so the success path behaves identically.
Also add TestCodexExecuteSurfacesStderrWhenChildExitsEarly: spawns a
real subprocess that prints to stderr and exits before responding to
initialize, runs it through Execute, and asserts Result.Error
contains the stderr hint. This covers the full timing path the
reviewer flagged, which the helper-level tests in this PR did not.
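The sync.Once pattern above can be sketched as follows. This is an illustration under assumptions, not the code from the change: `newDrainAndWait` is a hypothetical constructor, and its two parameters stand in for stdin.Close and cmd.Wait.

```go
package main

import (
	"fmt"
	"sync"
)

// newDrainAndWait wraps closeStdin and wait so the pair runs at most once,
// however many call sites (explicit failure paths, deferred cleanup) invoke
// the returned closure.
func newDrainAndWait(closeStdin, wait func() error) func() {
	var once sync.Once
	return func() {
		once.Do(func() {
			_ = closeStdin()
			_ = wait() // for a real exec.Cmd, Wait also joins the stderr copy
		})
	}
}

func main() {
	waits := 0
	drainAndWait := newDrainAndWait(
		func() error { return nil },
		func() error { waits++; return nil },
	)
	defer drainAndWait() // cleanup path: a no-op if a failure path already ran it

	drainAndWait() // failure path: join before sampling the stderr tail
	fmt.Println("wait calls so far:", waits)
}
```

Because the closure is idempotent, the failure paths can call it eagerly without changing what the deferred cleanup does on the success path.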
Fixes #1332.
Two regressions introduced in #910 (2026-04-14, "OpenClaw backend P0+P1
improvements") that together block all openclaw users:
1. `openclaw agent` does not accept `--model` or `--system-prompt`, so
any agent configured with a Model field crashed in ~700ms with
`exit status 1`. Remove both forwards, and add them to
openclawBlockedArgs so custom_args can't reintroduce the crash.
Model is bound at registration time via `openclaw agents
add/update --model`.
2. AgentInstructions were written to `{workDir}/AGENTS.md` by
execenv.InjectRuntimeConfig, but openclaw loads bootstrap files
from its own workspace dir — the file was never read, so every
agent's Instructions field was silently discarded. Populate
opts.SystemPrompt for the openclaw provider in runTask and
prepend it to the `--message` payload in the backend so the
model actually receives the instructions.
Other providers surface instructions through their native runtime
config file (CLAUDE.md / AGENTS.md / GEMINI.md) and are intentionally
left unchanged to avoid double injection.
Extract buildOpenclawArgs so arg construction is directly testable;
add unit tests covering the removed flags, the SystemPrompt prepend,
and custom_args filtering.
* fix(chat): preserve chat session resume pointer across failures
The chat 'forgets earlier messages' bug came from PriorSessionID being
silently lost in several edge cases:
- UpdateChatSessionSession unconditionally overwrote chat_session.session_id,
so any task that completed without a session_id (early agent crash,
missing result) wiped the resume pointer to NULL.
- CompleteAgentTask + UpdateChatSessionSession ran in separate calls. A
follow-up chat message claimed in between resumed against a stale (or
NULL) session and started over.
- FailAgentTask never wrote session_id back, so a task that established
a real session before failing lost its resume pointer.
- ClaimTaskByRuntime only trusted chat_session.session_id and never
fell back to the existing GetLastChatTaskSession query, so a single
bad turn could permanently drop the conversation memory.
This change:
- Use COALESCE in UpdateChatSessionSession so empty inputs preserve the
existing pointer; surface DB errors instead of swallowing them.
- Run CompleteAgentTask/FailAgentTask + UpdateChatSessionSession inside
the same transaction (TaskService now takes a TxStarter).
- Extend FailAgentTask + the daemon FailTask path (client, handler,
service) to forward session_id/work_dir, so failed/blocked tasks that
built a real session still record it.
- Fall back to GetLastChatTaskSession in ClaimTaskByRuntime when the
chat_session pointer is missing, and include failed tasks in that
lookup so a single failure can't lose the conversation.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(daemon): forward session_id/work_dir on blocked + timeout paths
runTask previously dropped result.SessionID and env.WorkDir on the
non-completed return paths:
- timeout returned a naked error, so handleTask called FailTask with
empty session info and the chat resume pointer was either left stale
or eventually overwritten with NULL.
- blocked / failed (default branch) returned a TaskResult without
SessionID / WorkDir, so even though FailTask now COALESCEs into
chat_session, there was no value to write through.
- the empty-output completion path was the same: it raised an error
even when a real session_id had been built.
All three paths now return a TaskResult that carries the SessionID /
WorkDir the backend produced. Combined with the COALESCE-based update
in UpdateChatSessionSession and the FailTask plumbing introduced in
PR #1360, the next chat turn can always resume from the latest agent
session — even when the previous turn timed out, was rate-limited, or
returned an empty completion — instead of starting over with no memory
of the conversation.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(copilot): capture session id from session.start as fallback
The Copilot backend only read sessionId from the synthetic 'result'
event, ignoring the one already present on session.start. When the CLI
was killed before result arrived (timeout, cancel, crash, or a
session.error mid-turn), the daemon reported SessionID="" and the
chat-session resume pointer could not advance — causing the chat to
silently drop conversation memory on the next turn.
Capture session.start.sessionId into state up front, and only let
'result' overwrite it when it actually carries one. result still wins
when present (it is the authoritative end-of-turn record).
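A small sketch of that precedence rule, with a hypothetical `sessionTracker` type standing in for the backend's per-run state:

```go
package main

import "fmt"

// sessionTracker is a hypothetical stand-in for the backend's per-run state.
type sessionTracker struct {
	sessionID string
}

// onSessionStart records the id as soon as the CLI announces it.
func (s *sessionTracker) onSessionStart(id string) {
	if id != "" {
		s.sessionID = id
	}
}

// onResult lets the end-of-turn record win, but only when it carries an id.
func (s *sessionTracker) onResult(id string) {
	if id != "" {
		s.sessionID = id
	}
}

func main() {
	var s sessionTracker
	s.onSessionStart("sess-1")
	// If the process is killed here, no result event ever arrives,
	// yet the tracker still holds the id from session.start.
	fmt.Println(s.sessionID)
}
```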
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(copilot): parse premiumRequests as float to preserve session id
Copilot CLI v1.0.32 serializes premiumRequests as a float (e.g. 7.5),
not an integer. Our copilotResultUsage struct typed it as int, which
made the entire 'result' line fail json.Unmarshal — silently dropping
sessionId on every turn.
This was the real cause of chat memory loss: the daemon reported
SessionID="" to the server, chat_session.session_id stayed NULL, and
the next chat turn never received --resume <id>, so each turn started
a fresh Copilot session with no prior context.
Add a regression test using the real JSON line from CLI v1.0.32 that
asserts sessionId is preserved when premiumRequests is fractional.
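The failure mode can be reproduced in isolation. The struct and field layout below are assumptions made for illustration (the real copilotResultUsage struct and the CLI's JSON shape may differ); only the encoding/json behaviour is standard:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical result shapes: identical except for the premiumRequests type.
type resultWithIntUsage struct {
	SessionID string `json:"sessionId"`
	Usage     struct {
		PremiumRequests int `json:"premiumRequests"`
	} `json:"usage"`
}

type resultWithFloatUsage struct {
	SessionID string `json:"sessionId"`
	Usage     struct {
		PremiumRequests float64 `json:"premiumRequests"`
	} `json:"usage"`
}

// sessionIDWithIntUsage mimics a parser that gives up on any decode error.
func sessionIDWithIntUsage(line string) (string, error) {
	var r resultWithIntUsage
	if err := json.Unmarshal([]byte(line), &r); err != nil {
		return "", err // the caller never sees the session id
	}
	return r.SessionID, nil
}

// sessionIDWithFloatUsage accepts fractional premiumRequests values.
func sessionIDWithFloatUsage(line string) (string, error) {
	var r resultWithFloatUsage
	if err := json.Unmarshal([]byte(line), &r); err != nil {
		return "", err
	}
	return r.SessionID, nil
}

func main() {
	line := `{"sessionId":"abc","usage":{"premiumRequests":7.5}}`
	_, err := sessionIDWithIntUsage(line)
	fmt.Println("int field error:", err != nil)
	id, _ := sessionIDWithFloatUsage(line)
	fmt.Println("float field session id:", id)
}
```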
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Devv <devv@Devvs-Mac-mini.local>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Eve <eve@multica.ai>
Co-authored-by: yushen <ldnvnbl@gmail.com>
* feat(agent): add LaunchHeader per agent type
Each backend in server/pkg/agent/ hardcodes a stable command skeleton
(e.g. `codex app-server --listen stdio://`, `hermes acp`) before
appending opts.CustomArgs. Surfacing that skeleton lets the UI tell
users which command their custom_args are being appended to, so a
Codex user doesn't mistakenly add `-m gpt-5.4-mini` expecting it to
reach the CLI when the subcommand is actually `app-server`.
Expose only the minimum that aids judgment — binary + subcommand, or a
short mode label when there is no subcommand — and deliberately omit
transport values, internal flags, and env to keep the surface small
and renaming-safe.
Refs #1308.
* feat(handler/runtime): surface launch_header on runtime response
runtimeToResponse now derives launch_header from agent.LaunchHeader,
piggybacking on the runtime's existing provider field so the
frontend's RuntimeDevice gains the skeleton without a new endpoint or
DB query. Client gets the header for free whenever it lists agents'
runtimes — which the custom-args tab already does.
Refs #1308.
* feat(ui/agents): show launch mode preview in custom args tab
Thread the resolved RuntimeDevice from AgentDetail into CustomArgsTab
and render its launch_header as a one-line preview above the args
list, so users see `codex app-server <your args>` (or equivalent per
provider) and can tell whether a CLI-style flag like `--model` will
actually reach the invoked subcommand. Source of truth stays in the
Go backend; the TS type just carries the string.
Refs #1308.
When --resume targets a dead session, claude prints
"No conversation found with session ID: ..." to stderr, emits a stream-json
system init with a fresh session_id, then exits with code 1. The backend
was treating that fresh id as the authoritative session, so
daemon.go's retry-with-fresh-session fallback (SessionID == "" guard)
never triggered. Every subsequent task for the same (issue, agent) pair
stayed permanently broken until the server-side session_id was cleared by
hand.
Fix: when --resume was requested but the emitted session_id differs AND
the run failed, drop the fresh id from Result so the daemon's existing
fallback can do its job. Factored into a pure helper and unit-tested.
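The rule can be expressed as a pure function. The name `sessionIDForResult` and its signature are hypothetical; this is a sketch of the condition described above, not the helper from the change:

```go
package main

import "fmt"

// sessionIDForResult decides which session id to report to the daemon.
// If a resume was requested, the CLI emitted a different id, and the run
// failed, the fresh id is dropped so the daemon's empty-SessionID fallback
// can retry with a fresh session.
func sessionIDForResult(resumeID, emittedID string, failed bool) string {
	if resumeID != "" && emittedID != resumeID && failed {
		return ""
	}
	return emittedID
}

func main() {
	fmt.Printf("%q\n", sessionIDForResult("old", "fresh", true))  // dead resume target
	fmt.Printf("%q\n", sessionIDForResult("old", "old", true))    // resumed, then failed
	fmt.Printf("%q\n", sessionIDForResult("", "fresh", false))    // normal fresh start
}
```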
Fixes #1284
Co-authored-by: fuxiao <fuxiao@zyql.com>
* fix(agent): add per-agent mcp_config field to restore MCP access
Closes #1111
The --strict-mcp-config flag was added defensively in #592 to prevent
Claude agents from inheriting MCP state from the outer Claude Code session.
It was meant to be paired with --mcp-config <path> to inject a controlled
set of MCPs, but that path was never implemented, which silently stripped
all user-scope MCPs from spawned agents.
This PR completes the original design by:
- Adding a nullable mcp_config jsonb column to the agents table
- Wiring mcp_config through AgentResponse, Create/Update requests
- Piping it into ExecOptions.McpConfig in the daemon
- Serializing to a temp file and passing --mcp-config <path> in buildClaudeArgs
- Blocklisting --mcp-config in claudeBlockedArgs to prevent override
via custom_args
Does not touch Codex provider (tracked separately in #674).
Does not implement Multica MCP auto-injection (out of scope).
* fix: disambiguate JSON null vs absent for mcp_config
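One common way to tell an explicit JSON null from an absent key in Go is a non-pointer json.RawMessage field. The sketch below shows that technique only; the request type and `classifyMcpConfig` are hypothetical, and the commit may have solved it differently:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// updateAgentRequest is a hypothetical request body.
type updateAgentRequest struct {
	McpConfig json.RawMessage `json:"mcp_config"`
}

// classifyMcpConfig reports whether mcp_config was absent, null, or a value.
func classifyMcpConfig(body string) (string, error) {
	var req updateAgentRequest
	if err := json.Unmarshal([]byte(body), &req); err != nil {
		return "", err
	}
	switch {
	case req.McpConfig == nil:
		return "absent", nil // key missing: leave the stored value alone
	case string(req.McpConfig) == "null":
		return "null", nil // explicit null: clear the stored value
	default:
		return "value", nil // anything else: store it
	}
}

func main() {
	for _, body := range []string{`{}`, `{"mcp_config":null}`, `{"mcp_config":{"a":1}}`} {
		kind, _ := classifyMcpConfig(body)
		fmt.Println(body, "->", kind)
	}
}
```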
* feat(daemon): persistent UUID identity + legacy-id merge at register-time
daemon_id is now a stable UUID persisted to `<profile-dir>/daemon.id` on
first start, replacing the hostname-derived id that drifted whenever
`.local` appeared/disappeared, a system was renamed, or a profile
switched — each of which used to mint a fresh `agent_runtime` row and
strand agents on the old one.
To migrate existing installs without operator intervention, the daemon
reports every legacy id it may have registered under previously
(`host`, `host` with `.local` stripped, and `host[-profile]` variants
for both). At register-time the server looks up each candidate row
scoped to (workspace, provider), re-points its agents and tasks onto
the new UUID-keyed row, records which legacy id was subsumed in the
new `legacy_daemon_id` column for audit, and deletes the stale row.
Result: users running `xxx.local`-keyed runtimes today transparently
land on the new UUID row on next daemon restart.
The hostname-prefix `MigrateAgentsToRuntime` / `daemon_id LIKE '...-%'`
compatibility shim is no longer needed and has been removed along with
the handler call that invoked it.
* fix(daemon): handle bidirectional .local drift and case drift in legacy merge
Review on #1220 flagged two gaps in the legacy-id migration candidate set:
1. Reverse .local: LegacyDaemonIDs only added the stripped variant when the
current hostname ended in `.local`. The opposite direction — DB has
`foo.local`, current host is `foo` — was missed, so runtimes registered
under the `.local` variant stayed orphaned after upgrade. Now both
variants (`foo` and `foo.local`) are always emitted, regardless of what
`os.Hostname()` currently returns, plus their `-<profile>` suffix forms.
2. Case drift: os.Hostname() has been observed returning different casings
on the same machine across mDNS/reboot state. A case-sensitive `=`
comparison stranded rows like `Jiayuans-MacBook-Pro.local` when the
daemon later reported `jiayuans-macbook-pro.local`. FindLegacyRuntimeByDaemonID
now uses `LOWER(daemon_id) = LOWER(@daemon_id)` on both sides, so casing
differences merge rather than orphan. The (workspace_id, provider) prefix
still bounds the scan to a tiny set of rows so the non-indexed LOWER()
comparison has negligible cost.
Tests: TestLegacyDaemonIDs gets the mixed-case + reverse-direction cases;
daemon_test.go adds TestDaemonRegister_MergesLegacyDaemonIDRuntime_ReverseDotLocal
and TestDaemonRegister_MergesLegacyDaemonIDRuntime_CaseDrift.
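A sketch of the candidate set and the case-insensitive match, under assumptions: `legacyDaemonIDs` and `sameDaemonID` are hypothetical names, the `-<profile>` suffix format is taken from the description above, and strings.EqualFold only approximates the SQL `LOWER(...) = LOWER(...)` comparison.

```go
package main

import (
	"fmt"
	"strings"
)

// legacyDaemonIDs always emits both the bare and the .local form of the
// hostname, plus their -<profile> variants when a profile is set.
func legacyDaemonIDs(hostname, profile string) []string {
	base := hostname
	if strings.HasSuffix(strings.ToLower(base), ".local") {
		base = base[:len(base)-len(".local")]
	}
	ids := []string{base, base + ".local"}
	if profile != "" {
		ids = append(ids, base+"-"+profile, base+".local-"+profile)
	}
	return ids
}

// sameDaemonID mirrors the case-insensitive comparison done server-side.
func sameDaemonID(stored, candidate string) bool {
	return strings.EqualFold(stored, candidate)
}

func main() {
	// Current host lost its .local suffix and changed casing.
	for _, id := range legacyDaemonIDs("jiayuans-macbook-pro", "") {
		fmt.Println(id, sameDaemonID("Jiayuans-MacBook-Pro.local", id))
	}
}
```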
* fix(daemon): consolidate every case-duplicate legacy runtime, not just the first
Follow-up review on #1220: after switching to `LOWER(daemon_id) =
LOWER(@daemon_id)`, the single-row lookup still only merged one legacy
row per candidate. If a machine already had two rows in the DB that
differed only in casing (e.g. `Jiayuans-MacBook-Pro.local` AND
`jiayuans-macbook-pro.local` coexisting because earlier hostname drift
already minted a duplicate), only one of them got consolidated and the
other stayed orphaned — violating the "no duplicate runtime per machine
after backfill" acceptance.
- FindLegacyRuntimeByDaemonID → FindLegacyRuntimesByDaemonID (:many)
- mergeLegacyRuntimes iterates every returned row and dedupes across
overlapping legacy candidates so `foo` and `foo.local` both resolving
to the same stored row don't double-process
Test: TestDaemonRegister_MergesAllCaseDuplicateLegacyRuntimes seeds two
case-duplicate rows with one agent each and confirms both rows are
deleted and both agents end up on the new UUID-keyed row.
Follow-up to #1192. Document the v2 protocol contract that the
dispatch-level threadId guard relies on, and lock down the two leakage
paths the guard closes:
- turn/completed from a subagent thread must not call onTurnDone
- item/completed (agentMessage, final_answer) from a subagent thread
must neither leak text into the output builder nor terminate the turn
Without these tests a future refactor that drops or relocates the guard
would not be caught by CI, since existing notification tests omit the
top-level threadId field and pass through unfiltered.
* fix(daemon): filter thread/status/changed by threadId to prevent subagent interference
When Codex CLI has memories enabled, the app-server spawns a memory
consolidation subagent as a separate thread within the same stdio
connection. When that subagent thread finishes and transitions to idle,
the daemon's codex backend mistakenly interprets the idle signal as the
main turn completing, causing it to close stdin and cancel the context
before the real turn produces any output.
Add a threadId check to the thread/status/changed handler so only
status changes from the tracked thread trigger turn completion. Signals
from subagent threads (threadId != c.threadID) are now ignored.
Fixes #1181
* fix(codex): dispatch-level threadId filter for subagent notifications
Codex multiplexes subagent threads (e.g. memory consolidation) on
the same stdio pipe. Previously only thread/status/changed had a
threadId guard, but item/completed (agentMessage + final_answer),
turn/completed, and turn/started from subagent threads could still
trigger onTurnDone or contaminate output.
Move the threadId check to the top of handleRawNotification so all
notification handlers are protected. Remove the now-redundant
per-handler check on thread/status/changed.
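A dispatch-level guard of this kind might look like the sketch below. `shouldDispatch` is a hypothetical helper, and placing `threadId` under `params` is an assumption about the wire shape rather than a statement about the Codex protocol:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// shouldDispatch reports whether a raw notification belongs to the tracked
// thread and may be handed to the per-method handlers.
func shouldDispatch(raw string, trackedThreadID string) bool {
	var n struct {
		Params struct {
			ThreadID string `json:"threadId"`
		} `json:"params"`
	}
	if err := json.Unmarshal([]byte(raw), &n); err != nil {
		return false // unparseable frames are dropped in this sketch
	}
	// Notifications that carry no threadId pass through unfiltered.
	return n.Params.ThreadID == "" || n.Params.ThreadID == trackedThreadID
}

func main() {
	main1 := `{"method":"turn/completed","params":{"threadId":"t-main"}}`
	sub := `{"method":"turn/completed","params":{"threadId":"t-memory"}}`
	fmt.Println(shouldDispatch(main1, "t-main"))
	fmt.Println(shouldDispatch(sub, "t-main"))
}
```

Checking once at the top of the dispatcher means a newly added handler is covered without needing its own guard.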
Fixes multica-ai/multica#1181
---------
Co-authored-by: fuxiao <fuxiao@zyql.com>
GetWorkspaceUsageByDay and GetWorkspaceUsageSummary had the same date
attribution bug as the runtime endpoint fixed in #1167: they bucketed
and filtered on agent_task_queue.created_at (enqueue time), so a task
that queued at 23:58 and reported usage at 00:05 was attributed to the
prior day, and ?days=N became a rolling now()-N window that clipped the
morning of the earliest day returned.
Switch both queries to task_usage.created_at (~= task completion time)
and snap the since cutoff to start-of-day via DATE_TRUNC, mirroring
ListRuntimeUsage.
These endpoints have no frontend caller today, but per offline
discussion they will back the upcoming workspace-level usage dashboard.
Fix preemptively so the dashboard inherits correct numbers.
Add a regression test covering both endpoints with the same
cross-midnight + earliest-day-cutoff scenarios used for runtime usage.
* refactor(runtime): derive runtime usage from task_usage only
The daemon used to scan each runtime's local CLI log directory every 5
minutes (Claude Code, Codex, OpenCode, OpenClaw, Hermes) and post daily
aggregates to /api/daemon/runtimes/{id}/usage. Those directories are
shared with the user's own local CLI sessions, so the user's personal
usage was being counted as Daemon-executed usage. Cursor and Gemini had
no scanner at all, so their runtime-level aggregates were always zero.
Switch GetRuntimeUsage to aggregate task_usage (already scoped to
Daemon-executed tasks) via agent_task_queue.runtime_id. Single source of
truth; Cursor/Gemini/Copilot get runtime usage for free; no reliance on
external CLI log formats.
Removes:
- server/internal/daemon/usage/ (all scanners)
- Daemon.usageScanLoop + providerToRuntimeMap
- Client.ReportUsage
- ReportRuntimeUsage handler + POST /api/daemon/runtimes/{id}/usage
- UpsertRuntimeUsage / GetRuntimeUsageSummary queries
- runtime_usage table (migration 046)
Refs: MUL-786
* fix(runtime): bucket daily usage by task_usage.created_at, not enqueue time
ListRuntimeUsage was aggregating by DATE(atq.created_at) and filtering
on atq.created_at. agent_task_queue.created_at is the enqueue timestamp,
which drifts from actual token-production time: a task queued at 23:58
and executed at 00:05 was attributed to yesterday; a task sitting in
the queue overnight was counted on the queue day.
The ?days=N cutoff also became a rolling window (now() - N) instead of
a calendar-day boundary, silently clipping the morning of the earliest
day returned.
Switch bucket + filter to task_usage.created_at (~= task completion /
usage-report time) and snap the since cutoff to start-of-day via
DATE_TRUNC.
Add a regression test covering both scenarios: cross-midnight task
attributes to the day tokens were reported, and the earliest day's
pre-cutoff rows are still included.
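The difference between the two cutoffs, sketched in Go rather than SQL. The function names and the fixed example timestamp are hypothetical, and the exact day arithmetic of the real query is an assumption:

```go
package main

import (
	"fmt"
	"time"
)

// exampleNow returns a fixed mid-afternoon instant for the demonstration.
func exampleNow() time.Time {
	return time.Date(2026, time.April, 14, 15, 30, 0, 0, time.UTC)
}

// rollingSince is the old behaviour: now() minus N days.
func rollingSince(now time.Time, days int) time.Time {
	return now.AddDate(0, 0, -days)
}

// calendarSince snaps to the start of the day first, like DATE_TRUNC('day', ...).
func calendarSince(now time.Time, days int) time.Time {
	y, m, d := now.Date()
	startOfToday := time.Date(y, m, d, 0, 0, 0, 0, now.Location())
	return startOfToday.AddDate(0, 0, -days)
}

func main() {
	now := exampleNow()
	// Usage reported on the morning of the earliest day in the window.
	morningRow := time.Date(2026, time.April, 7, 9, 0, 0, 0, time.UTC)
	fmt.Println("rolling window keeps row: ", !morningRow.Before(rollingSince(now, 7)))
	fmt.Println("calendar cutoff keeps row:", !morningRow.Before(calendarSince(now, 7)))
}
```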
Every other backend (Claude, Gemini, OpenCode, OpenClaw, Hermes) honors
ExecOptions.ResumeSessionID — only Codex didn't. That's why users on
the Codex runtime saw each new comment on an issue start a fresh Codex
conversation: the daemon persists Result.SessionID per (agent, issue)
and passes it back as PriorSessionID, but codex.go always called
thread/start and never populated SessionID, so the value round-tripped
as empty.
Wire the missing half:
- Extract startOrResumeThread on codexClient. When ResumeSessionID is
set, call thread/resume (per the Codex app-server protocol), passing
only cwd / model / developerInstructions overrides so the thread
keeps its persisted model and reasoning effort. If resume fails
(unknown thread, schema drift, transport error) fall back to
thread/start so the task still runs on a fresh thread.
- Surface the live threadID as Result.SessionID on the final emit so
the daemon stores it and feeds it back into ResumeSessionID on the
next claim.
Tests drive the new helper through the fake stdin harness, covering:
fresh start, successful resume, fallback on resume error, fallback
when resume returns no thread ID, and surfacing of thread/start
failures.
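The resume-then-fallback flow can be sketched as below. The `requester` type is a hypothetical simplification of the JSON-RPC client (it takes only a method name and returns a thread ID), so this shows the control flow rather than the real helper's signature:

```go
package main

import (
	"errors"
	"fmt"
)

// errUnknownThread is a sample resume failure used by the demonstration.
var errUnknownThread = errors.New("unknown thread")

// requester is a hypothetical stand-in for a JSON-RPC call that returns a thread ID.
type requester func(method string) (threadID string, err error)

// startOrResumeThread tries thread/resume when resumeID is set and falls back
// to thread/start if the resume fails or returns no thread ID.
func startOrResumeThread(request requester, resumeID string) (string, error) {
	if resumeID != "" {
		if id, err := request("thread/resume"); err == nil && id != "" {
			return id, nil
		}
		// Resume failed: fall through so the task still runs on a fresh thread.
	}
	id, err := request("thread/start")
	if err != nil {
		return "", fmt.Errorf("thread/start: %w", err)
	}
	if id == "" {
		return "", errors.New("thread/start returned no thread ID")
	}
	return id, nil
}

func main() {
	request := func(method string) (string, error) {
		if method == "thread/resume" {
			return "", errUnknownThread
		}
		return "fresh-thread", nil
	}
	id, err := startOrResumeThread(request, "stale-thread")
	fmt.Println(id, err)
}
```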
* feat(agent): add GitHub Copilot CLI backend
Integrate Copilot CLI as a new agent backend using the stable
`-p` JSONL mode (`--output-format json`), following the same
spawn-CLI-scan-JSONL pattern established by claude.go.
Backend (server/pkg/agent/copilot.go):
- Spawn `copilot -p <prompt> --output-format json --allow-all-tools --no-ask-user`
- Parse streaming JSONL events (system/assistant/user/result/log)
- Extract session ID for resume support (`--resume <id>`)
- Accumulate per-model token usage for billing
- Filter blocked args to prevent protocol-critical flag overrides
Daemon config:
- Probe MULTICA_COPILOT_PATH / MULTICA_COPILOT_MODEL env vars
- Copilot uses AGENTS.md (native discovery) and default skills path
Frontend:
- Add Copilot logo SVG and provider switch case
Tests: 14 unit tests covering arg building, event parsing, usage
accumulation, and edge cases. All Go + TS checks pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(daemon): add restart subcommand and use it in `make daemon`
- `daemon start` keeps original behavior: errors if already running
- `daemon restart` stops existing daemon then starts fresh
- `make daemon` now runs `daemon restart --profile local`
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(copilot): address review nits 1-5
- Nit 1: Add MinVersions["copilot"] = "1.0.0"
- Nit 2: Seed activeModel from session.start.data.selectedModel (falls
back to opts.Model, then "copilot"). First-turn tokens now get correct
model attribution.
- Nit 3: Handle assistant.reasoning/reasoning_delta → MessageThinking,
reasoningText in assistant.message → MessageThinking,
session.warning → MessageLog{warn}
- Nit 4: Extract handleCopilotEvent() method shared by production and
tests — no more duplicated switch body that can drift
- Nit 5: Deltas write to output buffer as defense-in-depth; if process
dies before assistant.message, output is non-empty
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When codex emits `turn/completed` with `status="failed"` or a terminal
top-level `error` notification, the daemon previously treated the turn
as successfully completed, saw no accumulated text, and surfaced the
generic "codex returned empty output" — hiding the real reason (auth,
sandbox, API error, etc.).
Capture `turn.error.message` on failed turns and the `error.message`
from non-retrying top-level error notifications, then propagate them
through `Result.Error` with `finalStatus="failed"` so the daemon's
default branch reports the actual cause.
* feat(agent): add Cursor Agent CLI runtime support
Add cursor-agent as a new agent backend, following the same pattern as
existing providers. The implementation spawns cursor-agent CLI with
stream-json output, parses JSONL events into the unified Message type,
and supports session resume, usage tracking, and auto-approval (--yolo).
Changes:
- server/pkg/agent/cursor.go: cursorBackend implementation
- server/pkg/agent/cursor_test.go: unit tests for args, parsing, errors
- server/pkg/agent/agent.go: register "cursor" in New() factory
- server/internal/daemon/config.go: probe cursor-agent in PATH
- server/internal/daemon/execenv/context.go: cursor skill discovery path
- server/internal/daemon/execenv/runtime_config.go: AGENTS.md injection
- packages/views/.../provider-logo.tsx: cursor logo in UI
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(agent): address PR review for cursor backend
1. Fix token usage double-counting: usage is now taken exclusively from
"result" events (session totals). Per-message usage in "assistant"
events is intentionally ignored. "step_finish" usage is only used as
fallback when no "result" usage is available.
2. Remove dead code: isCursorUnknownSessionError() and its regex were
defined but never called. Removed along with corresponding test.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(agent): add missing CustomArgs, SystemPrompt, MaxTurns, and debug logging to cursor backend
- Add cursorBlockedArgs and filterCustomArgs support for safe custom arg passthrough
- Add --system-prompt and --max-turns flag support to buildCursorArgs
- Add debug logging of command args before execution (consistent with all other backends)
- Move stdout-close goroutine inside main goroutine (consistent with claude.go pattern)
- Add tests for SystemPrompt/MaxTurns and CustomArgs filtering
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* chore: use local profile in `make daemon` & update Cursor logo to official brand
- Makefile: make daemon now runs 'daemon start --profile local' for local dev
- Replace Cursor runtime logo with official brand SVG (removed background rect)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(agent): remove unsupported --system-prompt and --max-turns from cursor-agent
cursor-agent CLI does not support these flags. Instructions are already
injected via AGENTS.md and .cursor/skills/ files.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(agent): prevent step_finish + result usage double-counting in cursor
Split usage accumulation into separate stepUsage and resultUsage maps.
After stream ends, use resultUsage if available (session totals from
result event), otherwise fall back to stepUsage (sum of step_finish).
This prevents 2x counting when result.usage already includes totals.
Added table-driven test covering: result-only, step_finish-only,
step_finish+result (no double count), and multi-model scenarios.
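A sketch of the split-map selection, with hypothetical names (`tokenUsage`, `addUsage`, `finalUsage`) and a simplified usage record:

```go
package main

import "fmt"

// tokenUsage is a simplified per-model usage record.
type tokenUsage struct {
	Input  int64
	Output int64
}

// addUsage accumulates u into m under the given model name.
func addUsage(m map[string]tokenUsage, model string, u tokenUsage) {
	cur := m[model]
	cur.Input += u.Input
	cur.Output += u.Output
	m[model] = cur
}

// finalUsage prefers result totals and only falls back to the step_finish
// sums when no result usage was seen, so the two are never added together.
func finalUsage(stepUsage, resultUsage map[string]tokenUsage) map[string]tokenUsage {
	if len(resultUsage) > 0 {
		return resultUsage
	}
	return stepUsage
}

func main() {
	step := map[string]tokenUsage{}
	result := map[string]tokenUsage{}
	addUsage(step, "model-a", tokenUsage{Input: 100, Output: 20})
	addUsage(step, "model-a", tokenUsage{Input: 50, Output: 10})
	addUsage(result, "model-a", tokenUsage{Input: 150, Output: 30}) // session totals
	fmt.Println(finalUsage(step, result)["model-a"])
}
```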
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* docs(agent): fix misleading comment on cursor -p flag
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Devv <devv@Devvs-Mac-mini.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: yushen <ldnvnbl@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* feat(agent): add Pi agent runtime support
Add Pi as a new agent runtime provider, following the established adapter
pattern. Pi CLI outputs JSONL events which are parsed for messages, tool
calls, and usage tracking.
Backend:
- New piBackend implementing the Backend interface (pi.go)
- Pi CLI discovery via MULTICA_PI_PATH env var or PATH lookup
- JSONL event stream parsing (agent_start, message_update, thinking_update,
tool_execution_start/end, agent_end)
- Usage scanner for ~/.pi/sessions/*.jsonl files
- Runtime config injection via AGENTS.md
- Skill injection to .pi/agent/skills/
Frontend:
- Pi provider logo (teal π icon)
- Pi label in transcript dialog
Docs:
- Updated all provider lists in README, CLI_INSTALL, and docs
* fix(agent): filter Pi usage scanner to agent_end events only
Address review feedback: restrict usage parsing to agent_end events
which contain cumulative totals, preventing potential inaccuracy if
Pi adds usage fields to other event types in the future.
* fix(agent): align Pi runtime with real CLI flags, event schema, and custom_args
- Flags: Pi's CLI uses `--mode json` (not `--output-format jsonl`), has no
`--yolo` (explicit `--tools` allowlist instead), takes the prompt as a
positional argument (not `-p <prompt>`), splits model as
`--provider <name> --model <id>`, and treats `--session` as a file path
that must exist before spawn.
- Event parsing: rewrite the stream event struct to match Pi's actual
JSON event schema (`message_update.assistantMessageEvent.delta`,
`turn_end.message.usage.{input,output,cacheRead,cacheWrite}`, etc.).
- Sessions: generate/persist session files under ~/.multica/pi-sessions/
and use the file path as the opaque SessionID returned to the daemon.
- Usage scanner: read assistant `message` events from the same session
files (Pi's session-file schema, distinct from the stdout stream).
- Custom args: consume `ExecOptions.CustomArgs` via `filterCustomArgs`
with a Pi-specific blocked set (`-p`, `--print`, `--mode`, `--session`)
so Pi matches the pattern shared by every other agent backend.
Every other backend (claude, codex, opencode, openclaw, gemini) filters
opts.CustomArgs through a per-backend blocked map so protocol-critical
flags can't be overridden via the Create Agent UI. The hermes backend
appended CustomArgs directly to argv, so any future flag we add to the
map would be silently bypassed here.
Add hermesBlockedArgs (with 'acp' as the pinned subcommand) and route
CustomArgs through filterCustomArgs. Behaviour is identical for today's
use cases; the change prevents accidental protocol-flag overrides and
brings hermes in line with the other five backends.
Closes #1113
Co-authored-by: shaun0927 <shaun0927@users.noreply.github.com>
Add a debug-level log line in every agent backend (claude, codex,
opencode, openclaw, gemini, hermes) that prints the executable path
and full argument list when spawning the agent process. Helps diagnose
custom args, model overrides, and other CLI flag issues.
* feat(agent): add custom CLI arguments support
Allow users to configure custom CLI arguments per agent that get
appended to the agent subprocess command at launch time. This enables
use cases like specifying different models (--model o3), max turns,
or other provider-specific flags without needing separate runtimes.
Changes:
- Add custom_args JSONB column to agent table (migration 041)
- Update API handler to accept/return custom_args in create/update
- Pass custom_args through claim endpoint to daemon
- Append custom_args to CLI commands for all agent backends
- Add ExecOptions.CustomArgs field in agent package
- Add Custom Args tab in agent detail UI
- Add --custom-args flag to CLI agent create/update commands
Closes MUL-802
* fix(agent): filter protocol-critical flags from custom_args
Add per-backend filtering of custom_args to prevent users from
accidentally overriding flags that the daemon hardcodes for its
communication protocol (e.g. --output-format, --input-format,
--permission-mode for Claude).
This follows the same pattern as custom_env's isBlockedEnvKey: we
only block the small, stable set of flags that would break the
daemon↔agent protocol — not every possible dangerous flag. Workspace
members are trusted for everything else.
Each backend defines its own blocked set:
- Claude: -p, --output-format, --input-format, --permission-mode
- Gemini: -p, --yolo, -o
- Codex: --listen
- OpenCode: --format
- OpenClaw: --local, --json, --session-id, --message
- Hermes: none (ACP is positional)
Includes unit tests for the filtering logic.
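A sketch of per-backend filtering. `filterBlockedArgs` is a hypothetical name, and encoding "takes a value" as the map's bool is an assumption of this example; the real filterCustomArgs may handle flag values differently:

```go
package main

import (
	"fmt"
	"strings"
)

// filterBlockedArgs drops blocked flags from args. The map value says whether
// the flag consumes a following value token, so that value is dropped too.
func filterBlockedArgs(args []string, blocked map[string]bool) []string {
	out := make([]string, 0, len(args))
	for i := 0; i < len(args); i++ {
		name, _, hasInline := strings.Cut(args[i], "=")
		takesValue, isBlocked := blocked[name]
		if !isBlocked {
			out = append(out, args[i])
			continue
		}
		if takesValue && !hasInline && i+1 < len(args) {
			i++ // also skip the separate value token
		}
	}
	return out
}

func main() {
	// Illustrative blocked set; the value semantics per flag are assumed.
	blocked := map[string]bool{
		"--output-format": true,
		"--input-format":  true,
		"-p":              false,
	}
	args := []string{"--model", "o3", "--output-format", "text", "-p", "--verbose", "--input-format=json"}
	fmt.Println(filterBlockedArgs(args, blocked))
}
```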
* fix(agent): address code review nits for custom_args
- Replace module-level `nextArgId` counter with `crypto.randomUUID()`
in custom-args-tab.tsx to avoid SSR ID conflicts
- Add unit tests for custom args passthrough and blocked-arg filtering
in both Claude and Gemini arg builders
Remove the concurrency_policy system (skip/queue/replace) — skip had an
orphan bug that permanently blocked triggers, queue didn't actually queue,
and replace didn't cancel running tasks. Every trigger now simply executes.
Bug fixes:
- Listener now handles in_review status (was silently ignored)
- Issue deletion fails linked autopilot runs before DELETE (prevents orphans)
- ComputeNextRun rejects invalid timezones instead of silent UTC fallback
- dispatchCreateIssue post-commit failures now properly fail the run
Reliability:
- Scheduler recovers lost triggers on startup (crash recovery)
- New index on autopilot_run(issue_id) for deletion lookups
- Migration 043 cleans up historical orphaned/skipped/pending runs
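The timezone rejection can be sketched with the standard library. `loadScheduleLocation` is a hypothetical helper, and treating an empty string as an error is an assumption (time.LoadLocation itself maps "" to UTC):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// loadScheduleLocation resolves a trigger's timezone and returns an error
// instead of silently falling back to UTC.
func loadScheduleLocation(tz string) (*time.Location, error) {
	if tz == "" {
		// Assumption: an unset timezone is a configuration error here.
		return nil, errors.New("timezone is required")
	}
	loc, err := time.LoadLocation(tz)
	if err != nil {
		return nil, fmt.Errorf("invalid timezone %q: %w", tz, err)
	}
	return loc, nil
}

func main() {
	if _, err := loadScheduleLocation("Not/AZone"); err != nil {
		fmt.Println("rejected:", err)
	}
	loc, err := loadScheduleLocation("UTC")
	fmt.Println(loc, err)
}
```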
* feat(autopilot): add scheduled/triggered automation for AI agents
Introduce the Autopilot feature — recurring automations that assign work
to AI agents on a schedule or manual trigger. Supports two execution
modes: create_issue (creates an issue for the agent to work on) and
run_only (directly enqueues an agent task without issue pollution).
Backend: migration (3 tables + 2 columns), sqlc queries, AutopilotService
with concurrency policies (skip/queue/replace), HTTP CRUD + trigger
endpoints, background cron scheduler (30s tick), event listeners for
issue→run and task→run status sync.
Frontend: types, API client methods, TanStack Query hooks with optimistic
mutations, realtime cache invalidation, list page with create dialog,
detail page with trigger management and run history, sidebar nav + routes
for both web and desktop apps.
* feat(autopilot): improve UX — trigger config, edit dialog, template gallery
- Replace raw cron input with friendly frequency tabs (Hourly/Daily/Weekdays/Weekly/Custom), time picker, and timezone dropdown defaulting to user's local timezone
- Fix Select components showing UUIDs instead of names (Base UI render function pattern)
- Add Edit button on detail page opening a unified edit dialog
- Remove project/concurrency/issue-title-template from create/edit (simplify for users)
- Add trigger configuration inline during autopilot creation
- Add template gallery on empty state (6 step-by-step workflow templates)
- Rename "Description" to "Prompt" throughout UI
- Inject autopilot run timestamp into issue description for agent date awareness
- Treat issue status "in_review" as run completion (fixes skip on next trigger)
- Make migration idempotent with IF NOT EXISTS clauses
* feat(security): replace instant member-add with invitation acceptance flow
Users invited to a workspace must now explicitly accept the invitation
before becoming a member. This fixes the security vulnerability where
knowing someone's email was enough to auto-register their runtime to
your workspace.
Changes:
- Add workspace_invitation table with pending/accepted/declined/expired states
- Replace CreateMember with CreateInvitation (same endpoint, new behavior)
- Add accept/decline/revoke/list invitation API endpoints
- Add invitation WS events for real-time notification
- Frontend: invitation accept/decline UI in workspace switcher
- Frontend: pending invitations section in members settings tab
* fix(invitation): address PR review nits
- Fix invitation:revoked listener to send event to invitee user (was no-op)
- Remove duplicate queryClient2 in app-sidebar.tsx, reuse existing queryClient
- Add expires_at > now() filter to ListPendingInvitationsByWorkspace query
State management
- Pending task / live timeline are now Query-cache single source;
Zustand mirror removed (fixes duplicate assistant render caused by
the invalidate→refetch race window)
- WS subscriptions moved from ChatWindow to global useRealtimeSync so
pending state survives minimize and refresh
- New GET /chat/sessions/:id/pending-task to recover live state on mount
- Drafts persisted per-session (was per-workspace)
Unread tracking
- Migration 040: chat_session.unread_since (event-driven; old chats
stay clean — no mass backfill)
- POST /chat/sessions/:id/read clears unread; broadcasts
chat:session_read so other devices sync
- New GET /chat/pending-tasks aggregate for the FAB
- ChatFab: brand-color impulse animation while running, brand-dot
badge of unread session count
- ChatWindow auto-marks read when user is viewing the session
Header redesign
- Two independent dropdowns: agent (avatar + name + My/Others
grouping) at the input bottom-left; session (title + agent avatar)
in the header
- ⊕ new-chat button replaces the old + and history buttons
- Session dropdown lists all sessions across agents with avatars
- Empty state: 3 clickable starter prompts that send immediately
- Mention link renderer falls through to default span on null —
fixes @member/@agent/@all silently disappearing app-wide
- User messages render through Markdown
- Enter submits in chat input only (with IME guard + codeBlock skip);
bubble menu hidden in chat
Misc
- Partial index on agent_task_queue for fast pending-task lookup
- 2 new storage keys added to clearWorkspaceStorage
- useMarkChatSessionRead has onError rollback
- chat.* namespace logs across store, mutations, components, realtime
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a member replies in a member-started thread without @mentioning the
assigned agent, the on_comment trigger was suppressed — even if the agent
had already replied in that thread. This meant the common flow of
"member posts → agent replies → member follows up" would not re-trigger
the agent on the follow-up.
Add HasAgentRepliedInThread SQL query and check it in isReplyToMemberThread
so that agent participation in a thread is treated as an ongoing conversation.
OpenClaw outputs its --json result as pretty-printed multi-line JSON to
stderr. The line-by-line scanner never found a valid JSON object on any
single line, causing the raw JSON to be returned as the chat response.
After exhausting line-by-line parsing, try parsing the accumulated
output as a whole before falling back to raw text.
Closes MUL-725
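A rough Go sketch of the fallback described above: try each line
first, then the accumulated output as a whole. The struct fields, the
validity rule (payloads present), and the function names are
assumptions for illustration; only the two-stage order comes from the
commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// openclawResult keeps payloads as raw JSON because their inner shape is
// not described here (assumption).
type openclawResult struct {
	Payloads []json.RawMessage `json:"payloads"`
}

// tryParse accepts s only if it is a JSON object with a payloads field.
func tryParse(s string) (openclawResult, bool) {
	s = strings.TrimSpace(s)
	if !strings.HasPrefix(s, "{") {
		return openclawResult{}, false
	}
	var r openclawResult
	if err := json.Unmarshal([]byte(s), &r); err != nil || r.Payloads == nil {
		return openclawResult{}, false
	}
	return r, true
}

// parseOutput tries single lines first (compact JSON), then the whole
// accumulated output (pretty-printed JSON).
func parseOutput(raw string) (openclawResult, bool) {
	for _, line := range strings.Split(raw, "\n") {
		if r, ok := tryParse(line); ok {
			return r, true
		}
	}
	return tryParse(raw)
}

func main() {
	pretty := "{\n  \"payloads\": [\n    {\"text\": \"hi\"}\n  ],\n  \"meta\": {}\n}"
	r, ok := parseOutput(pretty)
	fmt.Println(ok, len(r.Payloads)) // prints true 1
}
```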
Switch Gemini backend from `-o text` (batch output) to `-o stream-json`
(NDJSON streaming) so tool calls, text, and errors are forwarded to the
UI in real time instead of collected at the end.
Parses all Gemini stream-json event types: init, message, tool_use,
tool_result, error, and result — including per-model token usage from
the result stats.
When an agent CLI process hangs (e.g. a tool call blocks on unreachable
I/O), the daemon's scanner blocks indefinitely on stdout, preventing the
Result from ever being sent. This causes tasks to stay in "running"
state permanently with no further events.
Three-layer fix:
1. Agent backends (claude, opencode, openclaw, gemini): add a watchdog
goroutine that closes the stdout/stderr pipe when the context is
cancelled, forcing the scanner to unblock. Also set cmd.WaitDelay
so Go force-closes pipes after 10s if the process doesn't exit.
2. daemon executeAndDrain: add an independent drain timeout (backend
timeout + 30s buffer) with context-aware select on both the message
channel and the result channel, so the daemon never blocks forever.
3. daemon ping path: add context-aware select so pings don't deadlock
if the agent backend stalls.
Closes #925
Co-authored-by: Devv <devv@Devvs-Mac-mini.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
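Layer 1 of the fix above might look roughly like the Go sketch below.
It is a simplified illustration, not the backend code: it covers
stdout only, runWithWatchdog and demo are hypothetical names, and the
demo assumes an `echo` binary on PATH. cmd.WaitDelay requires Go 1.20
or later.

```go
package main

import (
	"bufio"
	"context"
	"fmt"
	"os/exec"
	"time"
)

// runWithWatchdog (hypothetical helper) runs a command and collects its
// stdout lines, making sure the scanner cannot block past ctx cancellation.
func runWithWatchdog(ctx context.Context, name string, args ...string) ([]string, error) {
	cmd := exec.CommandContext(ctx, name, args...)
	// After ctx is cancelled, Wait force-closes pipes once this delay elapses.
	cmd.WaitDelay = 10 * time.Second

	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return nil, err
	}
	if err := cmd.Start(); err != nil {
		return nil, err
	}

	// Watchdog: close our end of the pipe on cancellation so a blocked
	// Scan returns even if something still holds the write end open.
	done := make(chan struct{})
	go func() {
		select {
		case <-ctx.Done():
			stdout.Close()
		case <-done:
		}
	}()

	var lines []string
	sc := bufio.NewScanner(stdout)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	close(done)

	waitErr := cmd.Wait() // always reap the child
	if ctx.Err() != nil {
		return lines, ctx.Err()
	}
	return lines, waitErr
}

// demo runs a short command with a generous timeout.
func demo() ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return runWithWatchdog(ctx, "echo", "hello")
}

func main() {
	lines, err := demo()
	fmt.Println(lines, err)
}
```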
Check runCtx.Err() before readErr/waitErr so that context-driven
process kills (timeout, user cancellation) report the correct status
("timeout" or "aborted") instead of "failed".
When exec.CommandContext kills the gemini process, io.ReadAll can
return a non-nil error as a side-effect of the closed pipe. The
previous code checked readErr first, masking the real cause. This
aligns gemini.go with the ordering already used in claude.go and
hermes.go.
Fixes #914
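The ordering described above, sketched in Go. resultStatus and the
"completed" status are assumptions for illustration; "timeout",
"aborted" and "failed" are the statuses named in the commit message.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"time"
)

// resultStatus (hypothetical helper) checks the context first, so a read or
// wait error caused by killing the process does not mask the real cause.
func resultStatus(ctx context.Context, readErr, waitErr error) string {
	switch {
	case errors.Is(ctx.Err(), context.DeadlineExceeded):
		return "timeout"
	case ctx.Err() != nil:
		return "aborted"
	case readErr != nil || waitErr != nil:
		return "failed"
	default:
		return "completed" // assumed name for the success status
	}
}

// demoStatuses simulates a cancelled run and a timed-out run, each with a
// pipe error as the side effect of the kill.
func demoStatuses() (aborted, timedOut string) {
	cancelled, cancel := context.WithCancel(context.Background())
	cancel()

	// A deadline in the past expires immediately.
	expired, cancelExpired := context.WithDeadline(context.Background(), time.Time{})
	defer cancelExpired()

	return resultStatus(cancelled, io.ErrClosedPipe, nil),
		resultStatus(expired, io.ErrClosedPipe, nil)
}

func main() {
	a, t := demoStatuses()
	fmt.Println(a, t) // prints: aborted timeout
}
```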
* fix(daemon): prevent duplicate runtime registration on profile switch
The daemon_id included a profile name suffix (e.g. "hostname-staging"),
so switching profiles created a new daemon_id that bypassed the UPSERT
dedup constraint, leaving orphaned runtime records in the database.
Three changes:
- Remove profile suffix from daemon_id — use stable hostname only.
The unique constraint (workspace_id, daemon_id, provider) already
prevents collisions within the same workspace.
- Auto-migrate agents from old offline runtimes to the newly registered
runtime during DaemonRegister (same workspace/provider/owner).
- Add TTL-based GC in the runtime sweeper to delete offline runtimes
with no active agents after 7 days.
Closes MUL-695
* fix(daemon): address code review issues on PR #906
1. Move gcRuntimes() to the main sweep loop — previously it was inside
sweepStaleRuntimes() after an early return, so it only ran when new
runtimes were marked stale. Now it runs every sweep cycle independently.
2. Fix DeleteStaleOfflineRuntimes to exclude runtimes with ANY agent
reference (not just active ones). The FK agent.runtime_id is ON DELETE
RESTRICT, so archived agents also block deletion.
3. Scope MigrateAgentsToRuntime to the same machine by matching
daemon_id LIKE '<current_daemon_id>-%'. This prevents cross-machine
agent migration when the same user has multiple devices.
Combined P0 and P1 improvements to the OpenClaw agent backend, informed
by PaperClip's adapter architecture:
P0 — User experience:
- Streaming output — emit MessageText as NDJSON events arrive in real
time, instead of waiting for the final result blob
- Tool use support — parse and emit MessageToolUse/MessageToolResult
from streaming events, matching Claude and OpenCode backends
- Model & system prompt — pass --model and --system-prompt to the
OpenClaw CLI when configured
P1 — Robustness:
- Hardened JSON parsing — tryParseOpenclawResult requires lines to
start with '{', eliminating fragile brace-scanning that could
false-match JSON fragments in log lines
- Lifecycle event handling — new "lifecycle" event type with phase
tracking (error/failed/cancelled), plus structured error objects
(error.name, error.data.message) matching PaperClip's pattern
- Usage field name variants — parseOpenclawUsage supports multiple
naming conventions (input/inputTokens/input_tokens, cacheRead/
cachedInputTokens/cache_read_input_tokens, etc.) with incremental
accumulation across step_finish events
Backwards compatible with the legacy single JSON blob format.
31 tests covering all new functionality.
Closes MUL-726
Claude's stream-json flow can emit the terminal result event while the
child process still waits on open stdin. Closing stdin as soon as the
final result arrives lets the CLI exit cleanly instead of idling until
the daemon timeout fires.
Constraint: Must preserve the existing Claude stream-json protocol and child-process lifecycle
Rejected: Increase ping timeout only | masks the hang without fixing process exit
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep Claude stdin handling aligned with the stream-json terminal result semantics; do not defer closure until goroutine teardown
Tested: Reproduced self-hosted runtime ping timeout locally; verified ping succeeds after closing stdin on result; cd server && go test ./pkg/agent
Not-tested: Full make check; Bedrock/Vertex-specific Claude auth flows
Some OpenClaw JSON outputs contain durationMs but lack a payloads field.
The original condition rejected these results, causing the agent to
return "openclaw returned no parseable output" instead of the actual
execution result.
Fix by accepting results that have either payloads OR durationMs > 0.
Fixes #830
Co-authored-by: leaderlemon <leaderlemon@users.noreply.github.com>
The sub-issue progress indicator (e.g. "0/2") was undercounting because
it was computed from the client-side issue list, which only loads the
first 50 done issues. Sub-issues marked as done beyond that page were
excluded from both the total and done counts.
Added a dedicated backend endpoint (GET /api/issues/child-progress) that
aggregates child issue counts directly from the database, ensuring
accurate totals regardless of client-side pagination or filtering.
Fixes MUL-702
Add per-agent custom_env configuration that gets injected into the agent
subprocess at launch time. This enables users to configure custom API
endpoints (ANTHROPIC_BASE_URL), API keys (ANTHROPIC_API_KEY), and cloud
provider modes (CLAUDE_CODE_USE_BEDROCK, CLAUDE_CODE_USE_VERTEX) without
requiring code changes.
Changes:
- Migration 040: add custom_env JSONB column to agent table
- Backend: custom_env in agent CRUD API + claim endpoint
- Daemon: merge custom_env into subprocess environment variables
- Frontend: env var editor in agent settings (key-value pairs with
visibility toggle for sensitive values)
Closes #816
Related: #807, #809
* feat: add online status indicator dot on agent & member avatars
Backend:
- Track member presence via WebSocket connections in the Hub
- Broadcast member:online/offline events when users connect/disconnect
- Add GET /api/workspaces/{id}/members/online endpoint
- Add member:online and member:offline event type constants
Frontend:
- Add isOnline prop to ActorAvatar with a status dot at top-right corner
- Green dot = online, gray dot = offline, no dot = status unknown
- Fetch online member list via new query, update optimistically on WS events
- Derive agent online status from existing agent.status field
- Wire online status through ActorAvatar views wrapper (enabled by default)
* fix: address code review — fix hub tests and avatar rounding
1. Hub tests: consume the member:online presence event from the first
connection before asserting on broadcast messages.
2. ActorAvatar: use rounded-[inherit] on the inner wrapper so callers
can override rounding (e.g. rounded-lg for agent list items).
* fix: consume member:online presence event in integration test
Same fix as the hub unit tests — read and discard the member:online
event before asserting on issue:created in TestWebSocketIntegration.
Codex tasks running in workspace-write sandbox mode could not resolve
api.multica.ai because the hardcoded sandbox parameter in thread/start
overrode any config.toml settings, and the default sandbox policy blocks
network access.
Changes:
- Remove hardcoded `sandbox: "workspace-write"` from thread/start RPC —
let Codex read sandbox config from its own config.toml instead
- Auto-generate config.toml in per-task CODEX_HOME with
`sandbox_mode = "workspace-write"` and `network_access = true`,
preserving any existing user settings
- Fix Reuse() to restore CodexHome for Codex provider on workdir reuse
Closes #368
Replace LIMIT $2 with AND date >= $2 in ListRuntimeUsage query. When a
runtime uses multiple models, each day has multiple rows, so a row LIMIT
silently returns fewer days than requested.
Also fixes displayName warnings in issue-detail test mocks and adds
missing setOpen to useCallback deps in search-command.
Co-authored-by: jayavibhavnk <jaya11vibhav@gmail.com>
Closes #731
Registers `gemini` as a sixth supported agent provider alongside claude,
codex, opencode, openclaw, and hermes.
- Daemon config probes for `gemini` on PATH (MULTICA_GEMINI_PATH /
MULTICA_GEMINI_MODEL env overrides mirror the other providers).
- New agent.geminiBackend in pkg/agent/gemini.go: spawns
`gemini -p <prompt> --yolo -o text [-m <model>] [-r <session>]`,
reads stdout to completion, and returns a single MessageText plus
the standard Result struct (Status / Output / DurationMs).
- Execution environment writes a GEMINI.md file into the task workdir
(mirroring the existing CLAUDE.md / AGENTS.md injection for other
providers) so Gemini discovers the Multica runtime meta-skill
through its native mechanism.
Tests:
- pkg/agent/gemini_test.go — unit coverage for buildGeminiArgs
(baseline, model override, resume session, omit-when-empty).
- internal/daemon/execenv/TestInjectRuntimeConfigGemini — verifies
GEMINI.md is written and that CLAUDE.md/AGENTS.md are NOT.
Scope (intentional for v1):
- Text output only (`-o text`). Streaming tool events via
`--output-format stream-json` is a follow-up once we have a
reliable reproduction of Gemini's event schema.
- No MCP config plumbing. Gemini's `--allowed-mcp-server-names`
filter pairs well with the per-agent MCP work on feat/per-agent-mcp;
stacking the two can land as a follow-up.
- No token usage scraping (Gemini's accounting lives on the Google
Cloud side, not a local JSONL log like claude/codex).
- No session resume wiring beyond accepting the ExecOptions field —
the daemon does not yet persist Gemini session IDs because the text
output mode does not expose them.
Migration / env changes:
- New optional environment variables MULTICA_GEMINI_PATH and
MULTICA_GEMINI_MODEL. Default path is the string "gemini" (resolved
via PATH at daemon startup). If no Gemini install is detected, the
provider is simply absent from the runtime — no behavior change for
existing deployments.
processOutput() used strings.Index(raw, "{") to find the JSON start,
but error lines like `raw_params={"command":"..."}` contain braces that
get matched first, causing JSON parsing to fail and the entire raw
stderr (including internal metadata) to be returned as the agent comment.
Now tries each '{' position until one successfully unmarshals as a valid
openclawResult, skipping braces embedded in log/error lines.
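A hedged Go sketch of the "try each '{' position" approach above. The
openclawResult fields, the validity rule (payloads present), and the
name findResult are assumptions; the real processOutput may differ.
json.Decoder is used here so that text after the JSON object is
ignored.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// openclawResult: only the payloads field is modelled (assumption).
type openclawResult struct {
	Payloads []json.RawMessage `json:"payloads"`
}

// findResult (hypothetical name) tries every '{' in raw as a possible start
// of the result object and returns the first candidate that decodes to a
// result with payloads. Braces inside log lines decode to an object without
// payloads (or fail to decode) and are skipped.
func findResult(raw string) (openclawResult, bool) {
	for i := 0; i < len(raw); i++ {
		if raw[i] != '{' {
			continue
		}
		var r openclawResult
		dec := json.NewDecoder(strings.NewReader(raw[i:]))
		if err := dec.Decode(&r); err != nil || r.Payloads == nil {
			continue
		}
		return r, true
	}
	return openclawResult{}, false
}

func main() {
	raw := "level=error raw_params={\"command\":\"ls\"}\n" +
		"{\"payloads\":[{\"text\":\"done\"}],\"meta\":{}}\n"
	r, ok := findResult(raw)
	fmt.Println(ok, len(r.Payloads)) // prints true 1
}
```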
* fix(cli): poll health endpoint instead of fixed sleep in daemon start
The daemon start command waited a fixed 2 seconds then checked the
health endpoint once. If the daemon took longer to initialize (auth,
workspace loading), the check failed and printed a misleading error
even though the daemon started successfully.
Replace the single check with a polling loop (500ms interval, 15s
timeout) so the CLI waits for the daemon to actually be ready.
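A minimal Go sketch of such a polling loop. The function names, the
"HTTP 200 means ready" rule, and the test server are assumptions for
illustration; the CLI change uses a 500ms interval and a 15s timeout,
while the demo uses shorter values so it finishes quickly.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// probe reports whether the health URL answered 200 (assumed readiness rule).
func probe(ctx context.Context, url string) bool {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return false
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return false
	}
	resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

// waitHealthy polls until the endpoint is healthy or ctx expires.
func waitHealthy(ctx context.Context, url string, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if probe(ctx, url) {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// demoPoll stands in for a slow-starting daemon: the first two health
// checks fail, the third succeeds.
func demoPoll() error {
	var calls atomic.Int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if calls.Add(1) < 3 {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	return waitHealthy(ctx, srv.URL, 10*time.Millisecond)
}

func main() {
	fmt.Println("poll error:", demoPoll())
}
```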
* fix(agent): rewrite openclaw tests to match new backend API
The openclaw backend was rewritten in #715 to parse a single JSON blob
instead of streaming NDJSON events. The tests still referenced the old
types (openclawEvent) and methods (handleOCTextEvent, etc.), causing a
build failure in CI.
Rewrite all tests to exercise the new processOutput method and
openclawInt64 helper.
* fix(agent): use --message flag for OpenClaw CLI invocation
OpenClaw CLI changed its prompt flag from `-p` to `--message`. The old
flag caused tasks to fail immediately with "required option '-m,
--message <text>' not specified".
Fixes#713, relates to #703.
* fix(agent): rewrite openclaw backend to match actual CLI interface
- Replace unsupported flags (-p, --output-format, --yes) with correct
ones (--message, --json, --local, --session-id)
- Read JSON result from stderr (where openclaw writes it)
- Parse openclaw's actual output format ({payloads, meta})
- Auto-generate session ID for each task execution
- Show "live log not available" hint in agent live card when timeline
is empty (openclaw doesn't support streaming)
* fix(server): skip auto-comment when agent already posted during task
In CompleteTask(), check if the agent already posted a comment on the
issue since the task started. If so, skip the automatic output comment
to avoid duplicates. This preserves the fallback for agents that don't
post comments via CLI.
Closes MUL-609
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(server): use StartedAt instead of CreatedAt for duplicate check
CreatedAt is the enqueue time, not execution start. If a previous task
posted a comment between enqueue and start of the next task, it would
incorrectly suppress the auto-comment for the later task.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>