multica

mirror of https://github.com/multica-ai/multica.git synced 2026-07-05 13:29:44 +02:00

Author	SHA1	Message	Date
Kagura	44d2fc1946	fix(agent): use openclaw agent id instead of name for --agent flag (#2716 ) openclawEntriesToModels() used the agent Name (which may contain spaces, e.g. "Sub2API OPS") as Model.ID. This ID is passed to openclaw via --agent, where normalizeAgentId mangles spaces into hyphens ("sub2api-ops"), causing a lookup miss against the registered id ("sub2api") and a "no parseable output" error. Fix: prefer agent ID for Model.ID; use Name only for display Label. When ID is empty, fall back to Name for backward compatibility. Fixes #2714	2026-05-17 17:08:00 +08:00
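The ID-over-Name preference described in 44d2fc1946 can be sketched as below. This is a minimal illustration, not the repository's code: `agentEntry`, `model`, and `entryToModel` are hypothetical names, and the real `openclawEntriesToModels()` may carry more fields.

```go
package main

// agentEntry and model are hypothetical stand-ins for the types the
// commit touches; only the fields named in the message are modelled.
type agentEntry struct {
	ID   string // registered openclaw agent id, e.g. "sub2api"
	Name string // display name, may contain spaces, e.g. "Sub2API OPS"
}

type model struct {
	ID    string // value later passed to `openclaw --agent`
	Label string // shown in the UI only
}

// entryToModel prefers the agent ID and falls back to the Name only
// when the ID is empty (the backward-compatibility path).
func entryToModel(e agentEntry) model {
	id := e.ID
	if id == "" {
		id = e.Name
	}
	return model{ID: id, Label: e.Name}
}
```

With this shape, a display name containing spaces never reaches the `--agent` flag unless no ID was registered at all.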
Bohan Jiang	8d872b7521	fix(daemon): disable Claude AskUserQuestion in non-interactive mode (MUL-2244) (#2656 ) * fix(daemon): disable Claude AskUserQuestion in non-interactive mode (MUL-2244) GitHub #2588: when Claude Code calls its built-in AskUserQuestion tool inside the daemon's stream-json runtime, the question never reaches the user — there's no UI to render it — so the SDK returns an empty answer and the agent silently "infers" and continues. From the issue's perspective, execution looks stuck while the agent is actually charging ahead on its own guess. Two-part fix: - `buildClaudeArgs` now passes `--disallowedTools AskUserQuestion` so the tool is not exposed to the model at all. - The Claude-specific runtime brief tells the agent to use a `blocked` issue comment for genuine clarification, or to state an explicit assumption and proceed. Adds a regression test that pins both: AskUserQuestion is forbidden in CLAUDE.md and is NOT mentioned in the AGENTS.md emitted for non-Claude providers (the tool is Claude-specific). Co-authored-by: multica-agent <github@multica.ai> * refactor(daemon): drop CLAUDE.md AskUserQuestion guidance, rely on --disallowedTools The --disallowedTools flag already prevents Claude from invoking AskUserQuestion, so duplicating the rule in the runtime brief just bloats the prompt without changing behavior. Removes the section and its regression test; the argv-level test in pkg/agent already pins the flag. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-15 12:42:23 +08:00
fr00st	cc9fbd3db0	Fix stale Done replies on comment follow-ups (#2495) * fix: avoid stale done replies on comment follow-ups * fix: avoid inlining runtime brief for Hermes ACP * fix: address comment follow-up review feedback	2026-05-14 12:00:04 +08:00
Bohan Jiang	5db96b4007	fix(daemon): bypass Gemini folder-trust gate in headless mode (#2516 ) (#2523 ) Gemini CLI's folder-trust feature throws FatalUntrustedWorkspaceError (exit code 55) when the current workspace isn't in `~/.gemini/trustedFolders.json` and the process is headless — no interactive trust prompt is available. The daemon spawns gemini with `-p` + `--yolo` in a freshly checked-out worktree that the user has never trusted interactively, so every run with `security.folderTrust` enabled fails after ~10s with exit status 55 and no useful output. Default `GEMINI_CLI_TRUST_WORKSPACE=true` on the child env to short- circuit `checkPathTrust` in gemini-core. This mirrors gemini-cli's documented `--skip-trust` flag; the env var has been gemini's documented headless escape hatch for the entire folder-trust feature lifetime so the fix works on every gemini version that can produce the crash. Callers that explicitly set the same key in cfg.Env win, preserving the ability to opt back into the gate. Co-authored-by: multica-agent <github@multica.ai>	2026-05-13 17:05:12 +08:00
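The "default, but caller wins" rule in 5db96b4007 amounts to a small merge. The sketch below assumes the caller's environment is available as a map; `geminiChildEnv` is a hypothetical helper name, and only the `GEMINI_CLI_TRUST_WORKSPACE` key comes from the commit message.

```go
package main

// geminiChildEnv returns the extra environment for the gemini child
// process. The trust default is applied first so that any explicit
// value in cfgEnv overrides it.
func geminiChildEnv(cfgEnv map[string]string) map[string]string {
	env := map[string]string{"GEMINI_CLI_TRUST_WORKSPACE": "true"}
	for k, v := range cfgEnv {
		env[k] = v // explicit caller settings win over the default
	}
	return env
}
```

Applying the default before the caller's values is what preserves the ability to opt back into the folder-trust gate.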
Bohan Jiang	178cfb5008	fix(daemon): strip Windows chcp noise from runtime version (#2516 ) (#2521 ) The gemini CLI's Windows shim emits `Active code page: 65001` (from `chcp`) to stdout before the real version reaches `--version` output. The daemon stored the raw concatenation as the runtime version, so the runtime detail page rendered `Active code page: 65001 0.42.0` instead of `0.42.0`. Scan `<cli> --version` line by line and return the first line carrying a semver-shaped token. Full strings like `2.1.5 (Claude Code)` or `codex-cli 0.118.0` survive unchanged; unparseable output falls back to the trimmed raw value. Co-authored-by: multica-agent <github@multica.ai>	2026-05-13 16:58:14 +08:00
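The line-by-line scan described in 178cfb5008 could look roughly like this. The function name and the exact regular expression are assumptions; the commit only says "semver-shaped token".

```go
package main

import (
	"regexp"
	"strings"
)

// semverToken is an assumed approximation of a "semver-shaped token".
var semverToken = regexp.MustCompile(`\d+\.\d+\.\d+`)

// firstVersionLine returns the first line of `<cli> --version` output
// that carries a semver-shaped token, or the trimmed raw output when
// no line qualifies.
func firstVersionLine(raw string) string {
	for _, line := range strings.Split(raw, "\n") {
		line = strings.TrimSpace(line) // also drops the \r of CRLF endings
		if semverToken.MatchString(line) {
			return line
		}
	}
	return strings.TrimSpace(raw)
}
```

Because the whole matching line is returned rather than just the token, strings such as `2.1.5 (Claude Code)` pass through unchanged.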
Kagura	702c48209b	fix(agent): stop filtering Pi extension tools via hardcoded --tools allowlist (#2379 ) (#2381 ) The Pi backend hardcoded `--tools read,bash,edit,write,grep,find,ls` in buildPiArgs. Pi's SDK treats --tools as a restrictive allowlist: only the listed tools pass through `_refreshToolRegistry()`, silently filtering out any user-installed extension tools registered via `pi.registerTool()`. Omitting --tools makes Pi's `allowedToolNames` undefined, so the `isAllowedTool()` filter becomes a no-op and all tools — built-in and extension — are available. This matches Pi's standalone behavior. Users who want to restrict tools can still pass --tools via custom_args (it is not in piBlockedArgs). Closes #2379	2026-05-11 16:11:32 +08:00
Multica Eve	e79ffc0f01	fix(agent): expand Copilot CLI model catalog with correct dotted IDs (#2336 ) * fix(agent): expand Copilot CLI model catalog with correct dotted IDs The Copilot CLI provider only exposed two models in the runtime dropdown, and one of them used the dashed legacy form `claude-sonnet-4-6` which `copilot --model` rejects with "Model ... is not available". The CLI accepts dotted IDs (e.g. `claude-sonnet-4.6`, `gpt-5.4`). Sync `copilotStaticModels()` with the official supported-models catalog so the dropdown surfaces the full set the user's account can route to (8 OpenAI + 4 Anthropic), and add a regression test that pins the expected IDs and bans the dashed form. Closes MUL-1948. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: multica-agent <github@multica.ai> * feat(agent): dynamic Copilot model discovery via ACP session/new The previous static catalog could only ever lag behind the user's real entitlements and what GitHub ships. Copilot CLI exposes the live catalog through its ACP server (`copilot --acp`): the `session/new` response includes `models.availableModels` plus `currentModelId`, scoped to the authenticated account. Wire copilot through the existing discoverACPModels helper — already used by hermes/kimi/kiro — so the dropdown reflects the account's real catalog, including the `auto` entry and per-tier model availability (Pro / Pro+ / Enterprise / evaluation models). The Copilot CLI puts itself into ACP server mode via the `--acp` flag instead of an `acp` subcommand, so acpDiscoveryProvider now takes an optional acpArgs override. Copilot's ACP payload omits the vendor name, so a small prefix-based inferCopilotProvider keeps the UI's openai / anthropic / google grouping working. When the binary is missing or auth fails, fall back to copilotStaticModels() so self-hosted runtimes without a copilot install still see a populated dropdown. 
Verified against `copilot 1.0.44`: live discovery returns 13 models with gpt-5.5 marked Default. Closes MUL-1948. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: multica-agent <github@multica.ai> * fix(agent): drop no-op COPILOT_ALLOW_ALL env and generalize OpenAI o-series prefix check - discoverCopilotModels: remove COPILOT_ALLOW_ALL=1 (not a real Copilot CLI env var; copy-pasta from HERMES_YOLO_MODE=1). Discovery only drives initialize + session/new which never trigger tool-permission prompts, so no extra env is needed. - inferCopilotProvider: replace the o1/o3/o4 prefix chain with a generic o<digit>+ check via isOpenAIReasoningSeriesID, so future o5/o6/… reasoning models are tagged as openai automatically. Guards against false positives like 'opus-…' or bare 'o'. - Extend TestInferCopilotProvider with o5/o6 forward-compat cases and negative cases (opus-fake, omni, o). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: Eve <eve@multica-ai.local> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: multica-agent <github@multica.ai>	2026-05-11 14:36:43 +08:00
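A generic `o<digit>+` check of the kind e79ffc0f01 describes might be written as follows. The function name is taken from the commit message, but the body is a guess; in particular, requiring end-of-string or `-` after the digits is an assumption.

```go
package main

// isOpenAIReasoningSeriesID reports whether id looks like an OpenAI
// o-series model ID: the letter 'o', one or more digits, then either
// the end of the string or a '-' suffix (assumed boundary rule).
func isOpenAIReasoningSeriesID(id string) bool {
	if len(id) < 2 || id[0] != 'o' {
		return false // rejects "" and bare "o"
	}
	i := 1
	for i < len(id) && id[i] >= '0' && id[i] <= '9' {
		i++
	}
	if i == 1 {
		return false // no digits after 'o', e.g. "opus-fake", "omni"
	}
	return i == len(id) || id[i] == '-'
}
```

Under these assumptions `o5` and `o6-mini` are accepted without a code change, while `opus-fake`, `omni`, and `o` are rejected.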
Multica Eve	72e89a74f3	fix: surface copilot failure details (#2396) Co-authored-by: Eve <eve@multica-ai.local> Co-authored-by: multica-agent <github@multica.ai>	2026-05-11 14:08:33 +08:00
Bohan Jiang	b73a301bf9	fix(agent): drain stderr before deciding ACP failure promotion (#2333 ) `hermes`, `kimi`, and `kiro` all wired stderr through `cmd.Stderr = io.MultiWriter(logWriter, providerErrSniffer)`. The OS-pipe → MultiWriter copy goroutine that exec spawns for that form is only joined by `cmd.Wait()`, which the lifecycle goroutine fires in deferred cleanup — after `promoteACPResultOnProviderError` already consulted the sniffer. When stopReason=end_turn (success) raced ahead of the stderr drain, the sniffer's `lines` slice was empty, the helper fell through to the synthetic agent-text fallback ("hermes provider error: API call failed after 3 retries"), and the actionable upstream signal (HTTP 429 / usage limit) was lost. This was visible as a flaky `TestHermesBackendPromotesProviderErrorWithNonEmptyOutput` in CI under high parallelism — a real prod bug, not a test issue: live runs hit the same race when an upstream LLM returns 429 and hermes' synthetic agent turn beats the stderr drain to the parent. Replace the MultiWriter wiring with `cmd.StderrPipe()` + an explicit copier goroutine that signals on `stderrDone`. The lifecycle goroutine already awaits `<-readerDone` for stdout; add `<-stderrDone` next to it before `promoteACPResultOnProviderError` runs. The deferred `cmd.Wait()` ordering is unchanged — it just becomes a cheap reap by the time it fires. Verified: `go test ./pkg/agent/ -run "TestHermes\|TestKimi\|TestKiro" -count=10 -race`, then full package `-count=3 -race`, all green. Co-authored-by: multica-agent <github@multica.ai>	2026-05-09 17:34:25 +08:00
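The explicit-copier pattern from b73a301bf9 can be sketched independently of any real process. Everything here (`lineSniffer`, `drainStderr`, `sniffAll`) is a hypothetical stand-in; in the daemon the reader would come from `cmd.StderrPipe()` and the returned channel would play the role of `stderrDone`.

```go
package main

import (
	"bufio"
	"io"
	"strings"
	"sync"
)

// lineSniffer is a simplified stand-in for the provider error sniffer.
type lineSniffer struct {
	mu    sync.Mutex
	lines []string
}

func (s *lineSniffer) add(line string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.lines = append(s.lines, line)
}

func (s *lineSniffer) snapshot() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]string(nil), s.lines...)
}

// drainStderr copies r into the sniffer line by line and closes the
// returned channel once r reaches EOF, so callers can wait for the
// drain before consulting the sniffer.
func drainStderr(r io.Reader, s *lineSniffer) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		sc := bufio.NewScanner(r)
		for sc.Scan() {
			s.add(sc.Text())
		}
	}()
	return done
}

// sniffAll shows the ordering: wait for the drain, then read.
func sniffAll(stderr string) []string {
	s := &lineSniffer{}
	done := drainStderr(strings.NewReader(stderr), s)
	<-done // analogous to awaiting stderrDone before promotion
	return s.snapshot()
}
```

The point of the pattern is ordering: the sniffer is only read after the copier has signalled completion, so a fast success result can no longer race ahead of the stderr contents.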
LinYushen	f70105fb12	fix(agent): include JSON-RPC error data field in ACP error messages (#2327 ) ACP backends (Kiro, Hermes, Kimi) put the actionable reason for code=-32603 'Internal error' in the JSON-RPC `data` field, e.g. "No session found with id". The wrapped Go error only carried `code` and `message`, leaving operators staring at a bare "kiro session/prompt failed: session/prompt: Internal error (code=-32603)" with no way to tell apart session expiry, model unavailability, lost auth, or quota. Parse `data` too. Strings render unquoted; objects/arrays render as raw JSON; null/missing keeps the previous format unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-09 16:19:57 +08:00
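The rendering rules in f70105fb12 (strings unquoted, objects and arrays as raw JSON, null or missing unchanged) can be illustrated like this. The `rpcError` type and the exact separator are assumptions; only the `code`, `message`, and `data` fields come from JSON-RPC itself.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcError is a hypothetical wrapper around a JSON-RPC error object.
type rpcError struct {
	Code    int             `json:"code"`
	Message string          `json:"message"`
	Data    json.RawMessage `json:"data,omitempty"`
}

// Error appends the data field to the base message when it is present.
func (e rpcError) Error() string {
	base := fmt.Sprintf("%s (code=%d)", e.Message, e.Code)
	if len(e.Data) == 0 || string(e.Data) == "null" {
		return base // null or missing: previous format unchanged
	}
	var s string
	if err := json.Unmarshal(e.Data, &s); err == nil {
		return base + ": " + s // string data renders unquoted
	}
	return base + ": " + string(e.Data) // objects/arrays render as raw JSON
}
```

Keeping `Data` as `json.RawMessage` defers decoding, which is what lets the three cases be told apart without a schema.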
Bohan Jiang	c57546159d	fix(daemon): mark provider 429 / out-of-credit agent runs as failed, not completed (#2323 ) * fix(daemon): mark provider 429 / out-of-credit runs as failed, not completed Two bugs combined to silently report failed agent runs as "Completed" in the UI when the upstream LLM returned a 4xx (e.g. HTTP 429 rate-limit / no credit on the account). 1. ACP backends (hermes, kimi, kiro) only promoted the run status to "failed" when their stderr sniffer fired AND the agent output buffer was empty. But hermes injects a synthetic agent text turn ("API call failed after 3 retries: HTTP 429...") on retry exhaustion, so the buffer was never empty in the rate-limit case and the promotion never ran. Drop the empty-output precondition: the sniffer's regex (HTTP-status markers, named error types) is specific enough to trust on its own. 2. The daemon's task-result switch only routed "blocked" through FailTask; every other status — including "cancelled", and any future status we forget to enumerate — fell through to CompleteTask. Invert it so only an explicit "completed" status reports success, and extract the switch into reportTaskResult for direct testing. Cancelled now defaults to failure_reason "cancelled" instead of being silently completed. Closes GitHub multica#1952. Co-authored-by: multica-agent <github@multica.ai> * fix(agent): only promote ACP run to failed on terminal provider error Address GPT-Boy's review on the multica#1952 fix. The previous promotion rule ("any sniffer line → fail") was too broad: the existing sniffer also captures transient per-attempt warnings ("API call failed (attempt 1/3): RateLimitError [HTTP 429]"), and those lines stay in the buffer for the rest of the run. A retry sequence whose first attempt blipped but whose third attempt succeeded would have been wrongly reported as failed. 
Tighten the criteria with two additional signals, both defined on the existing acpProviderErrorSniffer / output buffer: - acpTerminalErrorRe — sticky `terminal` flag set when stderr shows an exhausted/non-retryable marker (❌, [ERROR], "after N retries", Non-retryable, BadRequestError, AuthenticationError). Per-attempt warnings deliberately don't match. - acpAgentOutputTerminalRe — matches the synthetic "API call failed after N retries..." turn that hermes-style adapters inject into the agent text stream when they give up; this catches multica#1952 even if hermes' stderr only logged transient attempts. Promotion logic becomes a shared helper, promoteACPResultOnProviderError, called from hermes / kimi / kiro. Promotes when (a) terminalMessage is non-empty, (b) output contains the synthetic give-up turn, or (c) output is empty and the sniffer captured anything at all (preserves the original empty-output safety net for transient-only sequences with no real result to fall back on). Tests: - TestHermesProviderErrorSnifferTerminalVsTransient — transient attempt 1/3 alone returns terminalMessage="" but message!=""; a follow-on terminal marker flips terminal on. - TestHermesProviderErrorSnifferTerminalNonRetryable — confirms BadRequest / Authentication / Non-retryable / ❌ / [ERROR] are classified terminal even on the very first attempt. - TestHermesBackendDoesNotPromoteOnTransientRetry — fake hermes emits attempt 1/3 to stderr then a normal agent text turn and end_turn; resulting Status must stay "completed". Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-09 16:13:12 +08:00
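The three promotion conditions (a), (b), and (c) listed in c57546159d reduce to a small predicate. The sketch below is an approximation: `shouldPromoteToFailed` is a hypothetical name, and `giveUpTurnRe` is an assumed pattern standing in for `acpAgentOutputTerminalRe`, whose real definition is not shown.

```go
package main

import "regexp"

// giveUpTurnRe approximates the synthetic "API call failed after N
// retries" agent turn; the real pattern may differ.
var giveUpTurnRe = regexp.MustCompile(`API call failed after \d+ retries`)

// shouldPromoteToFailed mirrors the three conditions from the commit:
// (a) the sniffer saw a terminal marker,
// (b) the agent output contains the synthetic give-up turn,
// (c) the output is empty and the sniffer captured anything at all.
func shouldPromoteToFailed(terminalMessage, anyMessage, output string) bool {
	switch {
	case terminalMessage != "":
		return true
	case giveUpTurnRe.MatchString(output):
		return true
	case output == "" && anyMessage != "":
		return true
	}
	return false
}
```

A transient `attempt 1/3` warning followed by a normal answer satisfies none of the three branches, so the run stays "completed".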
Bohan Jiang	0eb23df234	fix(agent): scope pi colon-to-slash normalization to legacy format (#2309 ) PR #2281 added table-format support to parsePiModels but kept the unconditional `strings.Replace(":", "/", 1)`, which would silently rewrite a `:` inside a model name read from column 1 of the table output (e.g. `claude-sonnet-4-6:exp` would become `claude-sonnet-4-6/exp`). Move the replace into the legacy `provider:model` branch so only the colon-as-separator case is normalized, and restore a short doc comment describing the dual- format contract. Test extended with a colon-bearing table row. Co-authored-by: multica-agent <github@multica.ai>	2026-05-09 13:56:49 +08:00
Leonardo Diego	8d5a6138fe	fix: parse pi --list-models table format for model discovery (#2281 ) The pi CLI changed its --list-models output from a single-field 'provider:model' format to a multi-column table with separate 'provider' and 'model' columns. The existing parser only looked at the first whitespace-delimited field (the provider name) and skipped lines without ':' or '/' — discarding every model entry. Update parsePiModels to handle both formats: - New table format: combine fields[0] (provider) + fields[1] (model) - Legacy format: single field with ':' or '/' separator Add regression test for the table format using real pi output.	2026-05-09 13:51:32 +08:00
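Taken together with the follow-up in 0eb23df234, the dual-format contract for a single line of `pi --list-models` output can be sketched as below. `parsePiModelLine` is a hypothetical per-line helper, not the real `parsePiModels`; header-row handling is deliberately left out because the commit does not describe it.

```go
package main

import "strings"

// parsePiModelLine returns "provider/model" for one output line, or
// false when the line carries no model.
func parsePiModelLine(line string) (string, bool) {
	fields := strings.Fields(line)
	switch {
	case len(fields) >= 2:
		// Table format: column 0 is the provider, column 1 the model.
		// The model name is kept verbatim, including any ':' in it.
		return fields[0] + "/" + fields[1], true
	case len(fields) == 1 && strings.Contains(fields[0], "/"):
		// Legacy format, already slash-separated.
		return fields[0], true
	case len(fields) == 1 && strings.Contains(fields[0], ":"):
		// Legacy "provider:model": only the separator colon is rewritten.
		return strings.Replace(fields[0], ":", "/", 1), true
	}
	return "", false
}
```

Scoping the colon replacement to the single-field branch is what keeps a table row such as `anthropic claude-sonnet-4-6:exp` from being rewritten.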
Bohan Jiang	0af67c8159	fix(agent/openclaw): block tasks if openclaw < 2026.5.5 with upgrade hint (#2181 ) PR #2101 swapped the openclaw runtime adapter from reading --json on stderr to stdout. That fixed openclaw 2026.5+ but inverted the breakage for pre-2026.5 builds — those still write JSON to stderr, so the adapter now sees an empty stdout and falls through to the same "openclaw returned no parseable output" failure that 2026.5+ users saw before #2101. Add a per-task version gate inside openclawBackend.Execute that runs `openclaw --version`, parses the dotted version, and rejects anything below 2026.5.5 with a hardcoded upgrade hint: openclaw <detected> is below the minimum supported version 2026.5.5. Run `openclaw update` to upgrade and try again. The check is intentionally per-task and uncached so users who upgrade do not need to restart the daemon — the next task automatically re-checks. ~20ms per task is negligible vs. the typical run. Co-authored-by: multica-agent <github@multica.ai>	2026-05-07 02:11:47 +08:00
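The comparison at the heart of the version gate in 0af67c8159 might look like this. It is a sketch under assumptions: the helper names are invented, the shape of `openclaw --version` output is not given in the commit (the regex simply takes the first dotted triple), and what the real code does with unparseable output is not stated, so that decision is left to the caller.

```go
package main

import (
	"regexp"
	"strconv"
)

// minOpenclaw is the minimum supported version named in the commit.
var minOpenclaw = [3]int{2026, 5, 5}

var dottedVersionRe = regexp.MustCompile(`(\d+)\.(\d+)\.(\d+)`)

// parseDottedVersion extracts the first X.Y.Z triple from s.
func parseDottedVersion(s string) ([3]int, bool) {
	var v [3]int
	m := dottedVersionRe.FindStringSubmatch(s)
	if m == nil {
		return v, false
	}
	for i := range v {
		n, err := strconv.Atoi(m[i+1])
		if err != nil {
			return [3]int{}, false
		}
		v[i] = n
	}
	return v, true
}

// belowMinimum reports whether the detected version is older than
// minOpenclaw. parsed is false when no version could be found.
func belowMinimum(versionOutput string) (below, parsed bool) {
	v, ok := parseDottedVersion(versionOutput)
	if !ok {
		return false, false
	}
	for i := range v {
		if v[i] != minOpenclaw[i] {
			return v[i] < minOpenclaw[i], true
		}
	}
	return false, true
}
```

Comparing components numerically rather than as strings matters here: a lexical comparison would rank `2026.10.1` below `2026.5.5`.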
Joey Frasier (Boothe)	af971e1e5c	fix(agent/openclaw): read --json from stdout, not stderr (#2101 ) Multica's openclaw runtime adapter has been reading agent output from stderr since the early openclaw integration days. Current openclaw (2026.5.5, c37871e) writes its --json blob exclusively to stdout: $ openclaw agent --local --json --agent main --message 'say hi' >stdout 2>stderr STDOUT bytes: 27401 STDERR bytes: 0 Result: every successful turn was followed by a daemon-generated system comment 'openclaw returned no parseable output', visible to users, looked like the agent broke when it didn't. Reproduced live on WOR-2, turn at 2026-05-05 16:35 UTC; daemon log confirmed the full result JSON arrived on the [openclaw:stdout] debug channel and was discarded while the empty stderr pipe hit the no-events fallback. Changes - server/pkg/agent/openclaw.go: swap pipes, StdoutPipe() for the JSON stream, cmd.Stderr = newLogWriter(...) for log overflow. Cleanup goroutine now closes stdout on cancel. Comments and the read-error errMsg updated to reflect the new pipe. - server/pkg/agent/openclaw_test.go: TestOpenclawProcessOutputReadError asserts on 'read stdout' (was 'read stderr'), string-only fix, no behavior change. New TestOpenclawProcessOutputStdoutFixture feeds a recorded openclaw 2026.5.5 --json blob through processOutput and asserts result + messages parse cleanly. - server/pkg/agent/testdata/openclaw-2026.5.5-stdout.json: 27401-byte fixture captured fresh from the openclaw CLI for the regression test. Side effects (net positive) - Log lines openclaw writes to stderr (security warnings, tool errors) now show up under [openclaw:stderr] instead of being silently consumed by the JSON parser. - Daemon's success_pattern heuristic (empty-output -> 'blocked') becomes meaningful again because result.Output actually populates. Closes WOR-10.	2026-05-07 01:50:16 +08:00
prellr	ee10c508fb	fix(daemon): trust the agent's session id from session/resume across ACP backends (#2070 ) When the local state.db of an ACP backend (hermes, kimi, kiro) is wiped — crash, config change, manual kill, container reset — the backend's session/resume (or session/load, in kiro's case) silently creates a brand-new session rather than failing, and returns the new id in the response. Today the daemon ignores the response and stamps sessionID = opts.ResumeSessionID across all three backends, so every subsequent session/prompt is addressed to a session id the backend has no record of. The task fails with JSON-RPC -32603 (Internal error) on the very first turn, with no operator-visible signal that the problem is a session-id mismatch one layer down. The behavior is invisible: agent shows "started", then "failed" with a generic Internal error. Reproducing in production took repeated runs because nothing in the logs pointed at the silent reset. Fix: route all three ACP backends through a small `resolveResumedSessionID` helper that: - prefers the id the backend returned in its response (the canonical id; the one the backend will accept on the next call) - falls back to the requested id when the response is malformed, empty, or omits sessionId — defensive fallback so older / non- conforming backends (notably kiro's current session/load shape) behave identically to today - signals (via a bool) when the id changed, so the caller logs a Warn with `backend=<hermes\|kimi\|kiro>` and operators can grep for silent state resets to correlate them with task failures Why this is at the backend layer rather than the daemon's existing session-resume fallback: server/internal/daemon/daemon.go:1554-1566 already retries with a fresh session when resume fails, but it gates on `result.Status == "failed" && result.SessionID == ""`. 
The backend WILL hand back a result.SessionID — just the new one it silently committed to — so the daemon-level fallback never fires for this failure mode. The helper is also what session/new already uses (extractACPSessionID, documented in code as "Shared by all ACP backends"). session/new extracts the canonical id from the response; session/resume just didn't, until now. Coverage: - hermes.go: confirmed bug, root cause of -32603 in production - kimi.go: same code shape, same protocol method, same response schema as hermes (per extractACPSessionID's comment) — same bug - kiro.go: same code shape, different method (session/load). Current observed response doesn't include sessionId, so the defensive fallback means today's behavior is preserved. Routing through the same helper means a future kiro release that DOES return a sessionId on silent reset works the same way as hermes/kimi without another diff. Tests (server/pkg/agent/hermes_test.go — helper covers all three backends, no per-backend duplication): - TestResolveResumedSessionIDMatching — backend confirms requested id - TestResolveResumedSessionIDDifferent — backend returned a new id; caller is told to switch - TestResolveResumedSessionIDEmptyResponse — older / malformed body; defensive fallback to requested id (covers kiro's current shape) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 14:15:40 +08:00
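The behaviour of the `resolveResumedSessionID` helper described in ee10c508fb can be approximated as follows. The signature is a guess (the commit only says it returns an id and signals a change via a bool); the `sessionId` field name is taken from the commit message.

```go
package main

import "encoding/json"

// resolveResumedSessionID prefers the session id the backend returned
// and falls back to the requested id when the response is malformed,
// empty, or omits sessionId. changed reports whether the id differs
// from the one requested, so the caller can log a warning.
func resolveResumedSessionID(requested string, response []byte) (id string, changed bool) {
	var body struct {
		SessionID string `json:"sessionId"`
	}
	if err := json.Unmarshal(response, &body); err != nil || body.SessionID == "" {
		return requested, false // defensive fallback
	}
	return body.SessionID, body.SessionID != requested
}
```

The three tests named in the commit correspond to the three paths here: matching id, different id, and empty response.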
Jiayuan Zhang	1476c268dd	refactor(quick-create): exempt git-describe daemons from CLI gate (#2108 ) * refactor(quick-create): remove daemon CLI version gate Local-source daemons report dev-suffixed versions (e.g. v0.2.15-235-gdaf0e935) that the picker pre-check and server gate both treat as too old, blocking quick-create during local testing. Drops the gate end-to-end: removes MinQuickCreateCLIVersion + CheckMinCLIVersion in pkg/agent, the checkQuickCreateDaemonVersion handler and readRuntimeCLIVersion helper in handler/issue.go, and the mirrored cli-version.ts plus the modal's pre-check, blocked-state UI, and daemon_version_unsupported error branch. Co-authored-by: multica-agent <github@multica.ai> * refactor(quick-create): skip daemon CLI version gate in dev Restores the gate (reverts the full-removal commit) and bypasses it in non-production environments instead. The motivation for the original removal — local source-built daemons report a `git describe` version like v0.2.15-N-gHASH that parses below 0.2.20 and blocks dev testing — is now handled by checking APP_ENV on the server and NODE_ENV on the client. Production keeps the original "needs upgrade" UX. Co-authored-by: multica-agent <github@multica.ai> * refactor(quick-create): exempt git-describe daemons instead of env bypass Replaces the per-environment bypass added in the previous commit with a shared daemon-version signal. CheckMinCLIVersion / checkQuickCreateCliVersion now treat any daemon whose CLI version matches the `vX.Y.Z-N-gHASH[-dirty]` git-describe shape as OK; tagged releases keep going through the normal min-version comparison. Why: Emacs flagged that (a) NODE_ENV !== "production" also disables the gate on staging and other non-prod deployments, undoing the protection for the case the gate was originally written for, and (b) NODE_ENV (web client) and APP_ENV (server) are not equivalent, so the modal pre-check and server gate could disagree on the same request. 
Both go away when the signal is intrinsic to the daemon's version string. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-06 09:00:11 +08:00
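Recognising the `vX.Y.Z-N-gHASH[-dirty]` git-describe shape from 1476c268dd is a one-line pattern match. The function name and the exact regular expression are assumptions; the real check inside `CheckMinCLIVersion` may be stricter or looser.

```go
package main

import "regexp"

// gitDescribeRe matches versions such as v0.2.15-235-gdaf0e935, with
// an optional -dirty suffix. Tagged releases like v0.2.20 do not match.
var gitDescribeRe = regexp.MustCompile(`^v\d+\.\d+\.\d+-\d+-g[0-9a-f]+(-dirty)?$`)

// isGitDescribeVersion reports whether v looks like the output of
// `git describe` on a commit past a tag, i.e. a source-built daemon.
func isGitDescribeVersion(v string) bool {
	return gitDescribeRe.MatchString(v)
}
```

Since the signal comes from the version string itself, client and server can apply the same rule and cannot disagree the way `NODE_ENV` and `APP_ENV` could.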
Kagura	629f4136ac	fix(codex): handle MCP elicitation server requests correctly (#1944 ) * fix(codex): handle MCP elicitation server requests correctly Fixes #1942. handleServerRequest responded with {} to unrecognized Codex server requests including mcpServer/elicitation/request. Codex 0.125+ expects {action, content, _meta} for elicitation — the empty object causes a deserialization error and the MCP tool call is reported as user-rejected. Changes: - Add mcpServer/elicitation/request case with correct response schema - Add respondError helper for JSON-RPC error responses - Return proper JSON-RPC method-not-found error for unknown server requests instead of silent empty object - Add tests for MCP elicitation and unknown method handling * fix: use cfg.Logger instead of global slog in codex handleServerRequest Switch the unhandled-server-request warning from global slog.Warn to c.cfg.Logger.Warn for consistency with all other log calls in codex.go. This ensures the warning appears in daemon run-logs and per-task pipelines where operators look during triage.	2026-05-04 21:05:37 +08:00
Bohan Jiang	170fa2102b	fix(agent/hermes): wire streamingCurrentTurn gate to drop history replay (#2024 ) Hermes ACP can flush queued session updates from the previous turn before the current turn actually starts — both as session/resume history replay and as chunks queued before our session/prompt response streams. Without a gate those updates were appended to output and re-emitted to the UI, so the previous answer appeared duplicated next to the new one. Closes #1997. PR #1789 added the acceptNotification hook field to hermesClient and the call site in handleNotification, but never assigned it for Hermes, so the guard short-circuited and every notification was processed. This change mirrors the working Kiro pattern (kiro.go:87/97/240): - declare a streamingCurrentTurn atomic.Bool in the backend. - assign acceptNotification, onMessage, onPromptDone gates that all return early when the flag is false. - flip the flag to true immediately before c.request("session/prompt"). Adds TestHermesClientAcceptNotificationGate as a regression test that exercises the gate directly on hermesClient. Verified with `go test ./pkg/agent`. Co-authored-by: multica-agent <github@multica.ai>	2026-05-03 11:43:36 +08:00
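The gate added in 170fa2102b follows a simple pattern: drop every notification until a flag is flipped just before the prompt is sent. The sketch below uses a hypothetical `collector` type in place of `hermesClient`; only the `streamingCurrentTurn` name and the use of `atomic.Bool` come from the commit.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// collector is a simplified stand-in for the backend's notification
// handling.
type collector struct {
	streamingCurrentTurn atomic.Bool // false until the prompt is sent

	mu     sync.Mutex
	output []string
}

// handleNotification drops updates that arrive before the current turn
// starts, such as history replayed by session/resume.
func (c *collector) handleNotification(text string) {
	if !c.streamingCurrentTurn.Load() {
		return
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.output = append(c.output, text)
}

// startPrompt flips the gate; in the real backend this would happen
// immediately before the session/prompt request is issued.
func (c *collector) startPrompt() {
	c.streamingCurrentTurn.Store(true)
}
```

An atomic flag fits here because notifications and the prompt request are presumably handled on different goroutines.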
Bohan Jiang	3f046d03f7	fix(agent): expose GPT-5.5 family in Codex runtime model picker (#2020 ) Latest Codex CLI ships with GPT-5.5 / GPT-5.5 mini, but the static catalog still topped out at GPT-5.4 so users couldn't pick the new model from the agent picker. Add gpt-5.5 + gpt-5.5-mini to codexStaticModels and promote 5.5 as the default badge. Keep the older 5.4 / 5.3-codex / gpt-5 / o3 entries for users on older Codex CLI builds. Add a regression test mirroring TestGeminiStaticModelsExposesAliasesAndGemini3 so the next OpenAI release isn't a silent miss. Co-authored-by: multica-agent <github@multica.ai>	2026-05-03 11:12:51 +08:00
Bohan Jiang	cfa38df97b	feat(quick-create): gate on daemon CLI version with pre-check + server enforcement (#1857 ) * fix(quick-create): bound dialog height + scroll editor when content overflows Pasting a screenshot into the agent-create prompt expanded the editor unbounded, which dragged DialogContent past the viewport since the agent mode className had no max-height. Manual mode was unaffected because manualDialogContentClass pins `!h-96`. - Cap agent-mode DialogContent at `!max-h-[80vh]` (width stays `!max-w-xl`); short prompts still render compact, tall content stops at 80% of the viewport. - Switch the editor wrapper to `flex-1 min-h-[140px] overflow-y-auto` so it absorbs the remaining vertical space inside the now-bounded DialogContent and scrolls internally instead of pushing the dialog. * feat(quick-create): gate on daemon CLI version with pre-check + server enforcement The agent-create flow depends on multica CLI behavior introduced in v0.2.20 (URL attachment handling, no-retry semantics on `multica issue create` failure — see PR #1851 / MUL-1496). Older daemons either double-create issues on partial CLI failures or mishandle pasted screenshot URLs. Per J's review on MUL-1496, gate the flow at two layers — frontend pre-check for fast feedback, server re-check as the trust boundary, both fail-closed on missing/unparsable versions. Server: - New MinQuickCreateCLIVersion + CheckMinCLIVersion helper in pkg/agent (with sentinel errors for missing vs too-old). - QuickCreateIssue handler reads runtime metadata.cli_version and returns a stable 422 { code: "daemon_version_unsupported", current_version, min_version, runtime_id } before enqueuing. - The check runs after the existing online + ownership validation, so all rejections surface uniformly through the modal's existing error path. Frontend: - New @multica/core/runtimes/cli-version with the min version constant, parser, and runtime-metadata reader (tiny semver, no new lib dep). 
- AgentCreatePanel resolves the selected agent's runtime, runs the same check, shows an inline amber notice below the agent picker when missing/too old, and disables the Create button. - Submit handler also catches the server's 422 (defensive race — runtime can re-register between pre-check and submit) and surfaces the same wording in the error row. Switching to manual create remains a clean escape hatch — manual mode doesn't talk to a daemon at all, so an outdated CLI doesn't block the user from filing the issue.	2026-04-29 18:44:19 +08:00
Prince Pal	391a4ecd09	feat: add backend default agent args env vars (#1807) * feat: add backend default agent args env vars * docs: document default agent args env vars	2026-04-29 16:49:48 +08:00
carmake	805071b5b1	fix(agent/cursor): route Windows launcher through PowerShell -File to preserve multi-line prompts (#1709) On Windows the official cursor-agent installer ships cursor-agent.cmd whose body is `powershell ... -File cursor-agent.ps1 %*`. CreateProcess for a .cmd file goes through cmd.exe, and `%*` in a batch file is expanded by re-tokenising the original command line, which mangles arguments containing newlines or other whitespace - most notably a long, multi-line `-p <prompt>`. The agent then only sees a truncated prompt and fails with "Workspace Trust Required" or exits 1 immediately. When LookPath resolves cursor-agent to a .cmd/.bat launcher and a sibling cursor-agent.ps1 exists, invoke PowerShell directly with `-File <ps1>` so Go's os/exec passes each argv as a discrete token. This is exactly what the .cmd does internally; we just skip the cmd.exe re-tokenisation step. PowerShell host resolution prefers pwsh.exe (PS 7) on PATH, then powershell.exe on PATH, and finally falls back to %SystemRoot%\System32\WindowsPowerShell\v1.0. Platform-specific code is split via build tags (cursor_invocation_windows.go / cursor_invocation_other.go) so non-Windows builds carry no Windows-only dependencies. The lookup is exposed as a package variable to make the Windows path fully unit-testable without spawning real PowerShell. Five unit tests cover: passthrough on non-launcher targets, successful rewrite with a multi-line prompt, .exe direct launch (skip), missing .ps1 (skip), and missing PowerShell host (skip). The change leaves macOS / Linux behaviour entirely untouched and stays on the official cursor-agent launch chain - no node.exe direct invocation, no prompt mutation, no extra flags. Closes #1297 Made-with: Cursor	2026-04-29 14:00:15 +08:00
LinYushen	03f3180b8f	fix(agent): ignore Kiro session/load history replay (#1789) Ignore Kiro ACP session/load history replay before the active prompt starts; keep task messages, usage, and tool state scoped to the current Kiro turn. Verified with go test ./pkg/agent -run TestKiro, go test ./pkg/agent, and git diff --check origin/main...HEAD.	2026-04-28 17:50:13 +08:00
LinYushen	c366cf2ba1	feat(agent): add Kiro CLI ACP runtime (#1780) * feat(agent): add kiro cli acp runtime * fix(agent): align kiro acp prompt and notifications * chore(agent): clarify kiro acp args compatibility	2026-04-28 17:03:46 +08:00
dyjxg4xygary	6bd5bbad9c	fix: timeout stalled Codex turns (#1730 ) * fix: timeout stalled codex turns * fix: count codex progress events as activity	2026-04-27 18:23:31 +08:00
Truffle	8b340fcf21	fix(agent/opencode): bypass npm .cmd shim on Windows to preserve multi-line prompts (#1718 ) * fix(agent/opencode): bypass npm .cmd shim on Windows to preserve multi-line prompts The npm-generated `opencode.cmd` shim forwards argv via Windows batch `%*`, which silently truncates positional arguments at the first newline. The daemon spawns OpenCode with a multi-line prompt (system prompt + user message), so on Windows the agent only ever sees the first line and responds generically as if it never received the user's message (reported in #1717 with native-binary repro confirming the same prompt arrives intact when cmd.exe is skipped). When `runtime.GOOS == "windows"` and `exec.LookPath` returns a `.cmd` shim, walk to the native binary that npm bundles next to the shim: <prefix>\opencode.cmd <prefix>\node_modules\opencode-ai\node_modules\opencode-windows-x64\bin\opencode.exe If the native binary is missing (unusual install layout), keep the original shim path so PATH lookup still wins. The resolver is a pure function with an injectable `statFn`, so layout assertions are testable on Linux: - shim resolves to the bundled native binary - missing native returns "" (caller keeps original path) - non-cmd paths (Linux/Mac binary, opencode.exe direct, empty) skip resolution - uppercase `.CMD` is accepted (PATHEXT entries can be either case) Closes the user-facing failure mode without restructuring exec resolution across the rest of the agent backends — the other shim-aware fixes can follow the same shape if/when they land in similar repros. fix(agent/opencode): cover x64-baseline and arm64 npm package variants `npm install -g opencode-ai` ships three Windows platform packages (opencode-windows-x64, opencode-windows-x64-baseline for older CPUs without AVX2, opencode-windows-arm64 for Surface / Copilot+ PC) and installs whichever matches the host. 
The previous resolver only knew about opencode-windows-x64, so baseline-x64 and arm64 hosts would fall back to the .cmd shim and hit the multi-line prompt truncation again. Iterate the three package candidates in GOARCH-preferred order. ARM64 hosts try arm64 first; everything else tries x64, then baseline, then arm64 as a last resort. Cost is one extra statFn call per miss when the GOARCH-preferred package isn't installed. Surfaced by review on #1718. * test(agent): add Windows counterpart to writeTestExecutable writeTestExecutable in exec_fixture_unix_test.go is referenced by claude_test.go / codex_test.go / kimi_test.go, but the //go:build unix constraint meant `go test ./pkg/agent` failed to build on Windows. ETXTBSY is a Linux/Unix fork-exec race; Windows doesn't have that pathology, so a plain os.WriteFile is sufficient. Lifted from #1719 (Codex) with attribution. Surfaced by review on #1718.	2026-04-27 12:16:56 +08:00
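A resolver of the shape described in the commit above might look like the sketch below. The names `nativePackages` and `resolveNative` are hypothetical, the x64 path layout is the one quoted in the commit message, and the assumption that the baseline and arm64 packages use the same `bin\opencode.exe` layout (as well as the order tried after arm64 on ARM64 hosts) is mine.

```go
package main

import (
	"fmt"
	"strings"
)

// nativePackages lists npm platform packages in GOARCH-preferred order.
// The order after the first entry on arm64 is an assumption of this sketch.
func nativePackages(goarch string) []string {
	if goarch == "arm64" {
		return []string{"opencode-windows-arm64", "opencode-windows-x64", "opencode-windows-x64-baseline"}
	}
	return []string{"opencode-windows-x64", "opencode-windows-x64-baseline", "opencode-windows-arm64"}
}

// resolveNative returns the bundled native binary next to a .cmd shim,
// or "" when the path is not a shim or no candidate exists. The caller
// is expected to keep the original path on "".
func resolveNative(shim, goarch string, exists func(string) bool) string {
	if len(shim) < 4 || !strings.EqualFold(shim[len(shim)-4:], ".cmd") {
		return "" // Linux/Mac binary, opencode.exe, or empty input
	}
	i := strings.LastIndexAny(shim, `\/`)
	if i < 0 {
		return ""
	}
	prefix := shim[:i]
	for _, pkg := range nativePackages(goarch) {
		p := prefix + `\node_modules\opencode-ai\node_modules\` + pkg + `\bin\opencode.exe`
		if exists(p) {
			return p
		}
	}
	return ""
}

// Stubs standing in for the injectable stat function.
func onlyBaseline(p string) bool {
	return strings.Contains(p, `\opencode-windows-x64-baseline\`)
}

func nothing(string) bool { return false }

func main() {
	fmt.Println(resolveNative(`C:\npm\opencode.cmd`, "amd64", onlyBaseline))
}
```

Because the function only manipulates strings and calls the injected `exists`, the layout assertions can run on any OS.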
Bohan Jiang	aca74293dd	fix(agent/claude): surface stderr tail on writeClaudeInput failure + lock with e2e test (#1698 ) #1674 wired claude's post-handshake error path through withAgentStderr but left the writeClaudeInput failure branch returning a bare "broken pipe" error. That branch fires precisely when claude crashes during startup — exactly when the stderr tail is most useful for root-causing V8 aborts, Bun panics, or missing native modules. cmd.Wait() before sampling Tail() flushes os/exec's internal stderr copy goroutine, matching the Wait→Tail synchronization contract spelled out in stderr_tail.go. Adds TestClaudeExecuteSurfacesStderrWhenChildExitsEarly mirroring the codex test: a fake claude binary drains stdin, writes a V8-abort line to stderr, and exits 3. Locks in the contract that Result.Error carries the stderr tail in the post-handshake failure path on the claude backend too.	2026-04-26 11:09:38 +08:00
songlei	6f04a6d26b	feat(agent): surface agent CLI stderr tail in failure messages (#1674 ) Hoist the existing stderrTail ring-buffer (previously codex-only) into a shared pkg/agent helper so every Backend that supervises a child CLI can include the last ~2 KB of that CLI's stderr in Result.Error. Wire the claude backend through the same path. Motivation: claude on Windows occasionally exits with a non-zero status after ~5–8 minutes of a single long-running tool_use, and right now the daemon only reports "claude exited with error: exit status 3" / "exit status 0x80000003" — useless for root-causing V8 aborts, Bun panics, native-module OOMs, or any other CLI-side crash. With the tail attached, the failure message carries the real signal (panic line, V8 assertion, stderr-printed HTTP error) all the way into the task row's error field that users see in the API. Renames withCodexStderr to withAgentStderr(msg, label, tail) so the helper is self-documenting across providers.	2026-04-26 10:55:21 +08:00
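As an illustration of the bounded stderr tail the commit above describes, here is a minimal sketch. The type and constructor names are hypothetical; only the `withAgentStderr(msg, label, tail)` signature and the roughly 2 KB bound come from the commit messages, and the exact message format produced below is an assumption.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// stderrTail is a hypothetical io.Writer that retains only the last
// max bytes written to it.
type stderrTail struct {
	mu  sync.Mutex
	max int
	buf []byte
}

func newStderrTail(max int) *stderrTail { return &stderrTail{max: max} }

// Write appends p and trims the buffer from the front so that at most
// max bytes are kept.
func (t *stderrTail) Write(p []byte) (int, error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.buf = append(t.buf, p...)
	if len(t.buf) > t.max {
		t.buf = append([]byte(nil), t.buf[len(t.buf)-t.max:]...)
	}
	return len(p), nil
}

// Tail returns the retained bytes. Per the commit messages, callers
// should only sample it after cmd.Wait() has returned.
func (t *stderrTail) Tail() string {
	t.mu.Lock()
	defer t.mu.Unlock()
	return string(t.buf)
}

// withAgentStderr attaches the tail to a failure message; the output
// format here is assumed, not taken from the repository.
func withAgentStderr(msg, label, tail string) string {
	tail = strings.TrimSpace(tail)
	if tail == "" {
		return msg
	}
	return fmt.Sprintf("%s\n%s stderr (tail):\n%s", msg, label, tail)
}

func main() {
	tail := newStderrTail(2048)
	fmt.Fprintln(tail, "FATAL ERROR: simulated V8 abort line")
	fmt.Println(withAgentStderr("claude exited with error: exit status 3", "claude", tail.Tail()))
}
```

In a real backend the writer would be assigned to `cmd.Stderr` of the supervised `exec.Cmd`.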
Bohan Jiang	95912243bb	test(daemon): cover cancelled classification in executeAndDrain (#1692 ) Follow-up to #1686. Locks in two nits flagged during review: 1. agent.Result.Status doc comment now lists "cancelled" alongside the existing values, so the enum surface matches actual usage. 2. New TestExecuteAndDrain_ContextCancelled_ReportsCancelled exercises the path added in #1686: when the parent context is cancelled before the backend produces a Result, executeAndDrain must return Status="cancelled" (not "timeout"). A regression here would silently restore the misleading log line we just fixed.	2026-04-26 09:27:13 +08:00
Bohan Jiang	74593fdb88	fix(daemon): use CREATE_NEW_CONSOLE to stop grandchild console popups on Windows (#1521 ) (#1643 ) * fix(daemon): use CREATE_NEW_CONSOLE to stop grandchild console popups on Windows (#1521) CREATE_NO_WINDOW strips the console entirely. When the agent CLI then spawns a console-subsystem grandchild (bash, cmd, netstat, findstr, timeout) without itself passing CREATE_NO_WINDOW, Windows allocates a brand-new visible console window per invocation — trading one popup per agent run for N popups per tool call. Switch to CREATE_NEW_CONSOLE + HideWindow=true so the agent gets a hidden console that grandchildren inherit. Stdio pipes still work via STARTF_USESTDHANDLES; no changes needed at the 17 hideAgentWindow call sites. Add a Windows-only regression test asserting CREATE_NEW_CONSOLE is set and CREATE_NO_WINDOW is not, per the #1474 Windows-test follow-up. Root-cause diagnosis by @matrenitski (verified against the shipped multica.exe and the Claude Code CLI it spawns) in issue #1521. * test(agent): use CREATE_NEW_CONSOLE-compatible flag in preservation test CREATE_NEW_PROCESS_GROUP is silently ignored by Windows when combined with CREATE_NEW_CONSOLE, so asserting it 'survives' was only bitwise-true, not semantically meaningful. Switch the example to CREATE_UNICODE_ENVIRONMENT (documented compatible) and also assert a non-flag field (NoInheritHandles) survives to exercise full struct preservation.	2026-04-25 01:40:15 +08:00
Kagura	6d9ca9de93	fix(daemon): suppress agent terminal windows on Windows (#1474 ) * fix(daemon): suppress agent terminal windows on Windows (#1471) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add hideAgentWindow to detectCLIVersion and avoid SysProcAttr overwrite - Add missing hideAgentWindow(cmd) call in detectCLIVersion (claude.go:554) so --version checks don't flash console windows on Windows. - Refactor hideAgentWindow to preserve existing SysProcAttr fields instead of overwriting the entire struct. --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-23 17:23:00 +08:00
LinYushen	d97aec83d7	fix: pass model to Hermes ACP and add hermes to InjectRuntimeConfig (#1203 ) * fix: pass model to Hermes ACP session/new and add hermes to InjectRuntimeConfig - hermes.go: include opts.Model in session/new params so Hermes uses the configured model instead of its default (fixes local LLM failures) - runtime_config.go: add "hermes" to the AGENTS.md provider list so Hermes receives the Multica runtime instructions and skill discovery Fixes: https://github.com/multica-ai/multica/issues/1195 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(hermes): drop false native-skill claim and add regression tests The previous change added 'hermes' to the 'skills discovered automatically' branch of buildMetaSkillContent, but resolveSkillsDir has no Hermes case so skills still land in the .agent_context/skills/ fallback. AGENTS.md ended up claiming native discovery while the files were somewhere else, which would mislead Hermes (and future debuggers). - Move 'hermes' to the fallback branch alongside 'gemini' so AGENTS.md points Hermes at .agent_context/skills/ — matching where writeContextFiles actually writes them. - Extract buildHermesSessionParams so the session/new payload is unit-testable. - Add regression tests covering: * buildHermesSessionParams includes/omits 'model' correctly * InjectRuntimeConfig('hermes') writes AGENTS.md with the fallback hint * writeContextFiles('hermes') writes skills to .agent_context/skills/ Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: CC-Girl <cc-girl@multica.ai>	2026-04-23 12:43:30 +08:00
Bohan Jiang	7375bda9b5	fix(landing): scope landing route to always-light palette (MUL-1277) (#1537 ) * fix(landing): scope landing route to always-light palette The landing page sections use hardcoded light colors (bg-white / #0a0d12), but shared components rendered inside — notably CloudWaitlistExpand on /download — use semantic tokens that flip to dark values under next-themes' `.dark` class, producing a mismatched dark card on an otherwise light page when the user's OS is in dark mode. Add a `.landing-light` class on the landing layout wrapper that re-declares all color tokens to their light values for the subtree, so nested token-driven components stay in lockstep with the hardcoded palette. * test(agent): serialize fake-executable writes to avoid ETXTBSY on CI TestKimiBackendInvokesACPSubcommand (and its Kimi/Codex siblings) write a shell script to a per-test TempDir and then fork/exec it. With t.Parallel() enabled across the package, a concurrent goroutine's fork can inherit the still-open write fd to another test's new executable; Linux then rejects the subsequent exec with ETXTBSY (seen as fork/exec /tmp/.../kimi: text file busy on GitHub Actions). Introduce writeTestExecutable, which holds syscall.ForkLock.RLock across OpenFile→Write→Close. Fork (which takes ForkLock.Lock) cannot run while we hold RLock, so no sibling fork inherits our write fd. Ran the three callers with -count=10 under -p=1 and the full package with no failures.	2026-04-23 01:52:46 +08:00
Dhruv-89	2a248b8548	fix(openclaw): raise agent discovery timeout to 30s (#1495 ) 'discoverOpenclawAgents' runs several 'openclaw' subprocesses under one context; 5s was too short on cold starts or under load, causing empty listings in the model picker. Increase the per-discovery cap to 30s.	2026-04-22 19:24:57 +08:00
Bohan Jiang	dc8096fb6e	fix(agent): expose Gemini 3 + CLI aliases in Gemini runtime model list (#1508 ) Gemini CLI has no `models list` subcommand, so Multica can't do real dynamic discovery. Instead, swap the static catalog from fixed version names (2.0/2.5 only) to the CLI's own aliases (`auto`, `pro`, `flash`, `flash-lite`, `auto-gemini-2.5`) plus explicit pins for Gemini 3 preview and 2.5 variants. Aliases are resolved inside the Gemini CLI per user entitlement + quota, so new model releases light up without a Multica redeploy. Default is `auto`, matching Google's recommended selection. Fixes multica-ai/multica#1503.	2026-04-22 19:02:07 +08:00
LinYushen	0b1333fb00	feat(server): orphan-task recovery + auto-retry + manual rerun (MUL-1128) (#1476 ) * feat(server): orphan-task recovery + auto-retry + manual rerun (MUL-1128) When the daemon process crashed mid-task the issue was stuck at in_progress for up to 2.5h: the in-flight task timeout was the only mechanism that ever moved the row, and the runtime heartbeat sweeper only fires after the runtime stays offline for 45s — a quick restart beats both windows. This change implements the A+B plan from the issue thread: A. lifecycle hygiene - migration 055 adds attempt / max_attempts / parent_task_id / failure_reason / last_heartbeat_at to agent_task_queue - new daemon-auth endpoint POST /runtimes/{id}/recover-orphans: daemon calls it on every register so the server fails any dispatched/running tasks the previous process left behind - new daemon-auth endpoint POST /tasks/{id}/session: persists the agent's session_id + work_dir mid-flight so a crash doesn't lose the resume pointer (claude+codex emit MessageStatus with SessionID; daemon forwards on the first one it sees) - FailAgentTask / FailStaleTasks / FailTasksForOfflineRuntimes now set failure_reason ('agent_error' / 'timeout' / 'runtime_offline') B. auto-retry with resume context - TaskService.MaybeRetryFailedTask spawns a fresh queued attempt carrying parent's session_id/work_dir when the failure reason is infrastructure-shaped (timeout, runtime_offline, runtime_recovery) and attempt < max_attempts; skips autopilot - wired into the runtime sweeper paths and TaskService.FailTask so the user transparently sees a new in_progress run instead of a stuck row - new user-auth POST /api/issues/{id}/rerun + multica issue rerun CLI for the manual escape hatch Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(server): address PR review for orphan-task recovery (MUL-1128) Three review-must-fix items on top of the A+B implementation: 1. 
recover-orphans now funnels through TaskService.HandleFailedTasks, the same shared post-failure pipeline used by the runtime sweeper. This guarantees task:failed events are emitted, agent status is reconciled, and issues stuck in_progress with no remaining active task are reset to todo even when no auto-retry is created (max_attempts exhausted, autopilot, non-retryable reason). 2. RerunIssue now uses CancelAgentTasksByIssueAndAgent, scoped to the issue's current assignee. The previous implementation called CancelAgentTasksByIssue, which would collateral-cancel parallel @-mention agents on the same issue. 3. GetLastTaskSession now considers both completed and failed tasks (mirroring GetLastChatTaskSession), ordering by the most recent timestamp. With UpdateAgentTaskSession pinning session_id/work_dir mid-flight, an auto-retry or manual rerun of a daemon-crash failure now actually resumes the prior conversation context instead of starting fresh — matching the stated B-branch behaviour. go build / go vet pass; the existing service and agent test suites pass. runtime_sweeper / handler integration tests require a local DB with the 055 migration (and the pre-existing 050 first_executed_at column). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-22 13:08:37 +08:00
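The auto-retry condition stated in the commit above (infrastructure-shaped failure reason, `attempt < max_attempts`, autopilot skipped) can be expressed as a small predicate. This is a sketch only: the `failedTask` struct and `shouldAutoRetry` are hypothetical, with field names loosely following the migration 055 columns mentioned in the message.

```go
package main

import "fmt"

// failedTask is a hypothetical view of a failed agent_task_queue row.
type failedTask struct {
	Attempt       int
	MaxAttempts   int
	FailureReason string
	Autopilot     bool
}

// shouldAutoRetry reports whether a fresh queued attempt would be
// spawned under the rules described in the commit message.
func shouldAutoRetry(t failedTask) bool {
	if t.Autopilot {
		return false // autopilot tasks are skipped
	}
	switch t.FailureReason {
	case "timeout", "runtime_offline", "runtime_recovery":
		return t.Attempt < t.MaxAttempts
	default:
		return false // e.g. "agent_error" is not infrastructure-shaped
	}
}

func main() {
	fmt.Println(shouldAutoRetry(failedTask{Attempt: 1, MaxAttempts: 2, FailureReason: "runtime_offline"}))
	fmt.Println(shouldAutoRetry(failedTask{Attempt: 1, MaxAttempts: 2, FailureReason: "agent_error"}))
}
```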
Bohan Jiang	c5a00d8b8c	fix(agent/openclaw): extract real model from meta.agentMeta.model (#1426 ) OpenClaw's `--json` result blob carries the actual LLM identifier in `meta.agentMeta.model` (e.g. `deepseek-chat`, `claude-sonnet-4`), alongside `provider` and the usage breakdown. The backend was reading the surrounding `agentMeta.usage` and `agentMeta.sessionId` but skipping the `model` field entirely, then attributing every run's tokens to `opts.Model` — which for openclaw is the agent name passed via `--agent`, not a real model identifier — falling all the way through to "unknown" when no agent.model was configured. Surface the runtime-reported model: - `openclawEventResult` gains a `model` string. - `buildOpenclawEventResult` reads `agentMeta.model` (trimmed; empty string when absent for forward-compat with older runtimes / partial outputs). - `processOutput` propagates it through the result-blob branch. - `Execute`'s usage map prefers `scanResult.model`, falling back to `opts.Model` then `"unknown"` — preserving the prior behavior path for any runtime that doesn't surface its own model yet. Two unit tests cover both the populated and missing cases. Refs: #1395	2026-04-21 14:32:31 +08:00
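The fallback chain described in the commit above (runtime-reported model, then `opts.Model`, then `"unknown"`) might be sketched like this. The struct and function names are hypothetical; only the `meta.agentMeta.model` JSON path and the fallback order come from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// openclawBlob models only the part of the --json result this sketch
// needs; other fields are ignored by encoding/json.
type openclawBlob struct {
	Meta struct {
		AgentMeta struct {
			Model string `json:"model"`
		} `json:"agentMeta"`
	} `json:"meta"`
}

// reportedModel returns the trimmed model from the blob, or "" when the
// field is absent or the blob cannot be parsed.
func reportedModel(blob []byte) string {
	var b openclawBlob
	if err := json.Unmarshal(blob, &b); err != nil {
		return ""
	}
	return strings.TrimSpace(b.Meta.AgentMeta.Model)
}

// usageModel picks the model used for token attribution.
func usageModel(reported, configured string) string {
	if reported != "" {
		return reported
	}
	if configured != "" {
		return configured
	}
	return "unknown"
}

func main() {
	blob := []byte(`{"meta":{"agentMeta":{"model":" deepseek-chat "}}}`)
	fmt.Println(usageModel(reportedModel(blob), "my-agent"))
	fmt.Println(usageModel(reportedModel([]byte(`{}`)), ""))
}
```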
Bohan Jiang	4ac43e9e49	feat(daemon): log agent invocation at info level (#1428 ) Surface the actual exec path + argv for every agent backend at INFO so operators can see the exact command without flipping to debug. Also add the missing log line in pi.go for consistency with the other nine backends.	2026-04-21 14:30:07 +08:00
devv-eve	9e47b83f02	feat(agent): add Kimi CLI as agent runtime (#1400 ) * feat(agent): add Kimi CLI as agent runtime Adds support for Moonshot AI's Kimi Code CLI (https://github.com/MoonshotAI/kimi-cli) as a new agent runtime, alongside Claude, Codex, OpenCode, OpenClaw, Hermes, Gemini, Pi, Cursor and Copilot. Kimi Code CLI implements the standard Agent Client Protocol (ACP) via the `kimi acp` subcommand, so the new `kimiBackend` reuses the existing hermesClient JSON-RPC transport in the agent package — only the binary, client identity, log prefix, and tool-name extraction differ. Wiring: - server/pkg/agent: new kimiBackend + kimi_test.go; registered in New(), LaunchHeader map, and the supported-types coverage test. - server/internal/daemon/config.go: probes `kimi` (overridable via MULTICA_KIMI_PATH / MULTICA_KIMI_MODEL). - server/internal/daemon/execenv: writes AGENTS.md as the runtime context file (Kimi reads AGENTS.md natively via /init), and writes skills under `.kimi/skills/` so they are auto-discovered by the project-level skill loader. - packages/views/runtimes: ProviderLogo gains a Kimi mark. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * feat(agent/kimi): support per-agent model selection via ACP set_model Wire Kimi into the model dropdown introduced in #1399: - ListModels gets a 'kimi' case that drives the same ACP initialize + session/new handshake as Hermes; both share a new discoverACPModels helper and parseACPSessionNewModels parser so future ACP backends only need a small provider entry. - kimiBackend now issues session/set_model after session/new when opts.Model is non-empty, mirroring the Hermes flow. Failures fail the task instead of silently falling back to Kimi's default model — silent fallback would hide that the dropdown pick wasn't honoured. Verified: go build ./..., go test ./pkg/agent/... ./internal/daemon/... ./internal/handler/..., pnpm typecheck and pnpm test (138 passed). 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * refactor(agent): address code review feedback on Kimi runtime - Share ACP provider-error sniffer between hermes and kimi. Previously only hermes promoted stderr-observed 4xx/5xx into a failed task; kimi would report "completed + empty output" when the Moonshot upstream rejected a request (expired token, rate limit, …). Rename hermesProviderErrorSniffer → acpProviderErrorSniffer and parameterise the provider name; wire it into kimiBackend.Execute the same way. - Rename extractHermesSessionID → extractACPSessionID (shared by all ACP backends) so the name matches parseACPSessionNewModels. - Drop the redundant second argument to kimiToolNameFromTitle; the Message struct has only one relevant field (Tool), so passing it twice was a dead fallback. Document that the function normalises residual capitalised kimi titles not caught by hermesToolNameFromTitle. - Remove kimi-only cmd.WaitDelay override; the hermes baseline is fine for both and divergence adds noise. - Add TestKimiBackendSetModelFailureFailsTask: fake `kimi acp` binary that returns a JSON-RPC error for session/set_model, asserts that the task result surfaces status=failed with the model name + upstream message and preserves the session id. - Fix stale agent listings in agent.go / daemon/config.go doc comments (missing cursor, gemini, copilot). All: `go build ./...`, `go vet ./...`, `go test ./pkg/agent/... ./internal/daemon/... ./internal/handler/...` green. * fix(agent/kimi): pass --yolo so Shell tools don't hang on approval Kimi's default config has `default_yolo = false`. Every Shell/file-mutating tool call causes kimi acp to send a `session/request_permission` request and block (up to 300s) waiting for a response. The daemon's hermesClient only handles `session/update` notifications — permission requests go unanswered, the tool call times out, and the UI loop eventually dies ("UI loop timed out"). 
Observed with the first real kimi task: agent sat as Live for ~7 minutes before the daemon killed it. The fix mirrors hermes' HERMES_YOLO_MODE=1 override: pass `--yolo` to `kimi` so it auto-approves everything. `--yolo` is a top-level flag on the `kimi` CLI (not a flag on `kimi acp`), so it must come before the `acp` subcommand in argv. Added to kimiBlockedArgs so user custom_args can't strip it. While here, fix a related bug that made kimi tool names show up empty in the daemon log ("tool #1: "): hermesToolNameFromTitle's fallback returned `kind` when neither title-with-colon nor kind matched a known tool. Kimi's ACP `tool_call` emits bare titles like "Shell" or "Read file" with no `kind` at all, so we'd drop the title on the floor before kimiToolNameFromTitle ever got a chance to map it. Now: preserve the title when kind is unclassified; hermes titles always carry a colon so this branch never fires for hermes. Tests: - TestKimiBackendPassesYoloFlag — fake binary that records its argv, asserts --yolo comes before acp. - TestHermesToolNameFromTitle rows for bare kimi-style titles. - Existing suite green: go build, go vet, full pkg/agent + daemon + handler test packages. * fix(agent/acp): auto-approve session/request_permission from agent The previous attempt (`kimi --yolo acp`) was a no-op. Inspected the kimi-cli source: the `acp` Typer subcommand takes no parameters, so flags on the root `kimi` command are dropped before `acp_main()` runs — it's impossible to opt into YOLO mode through CLI flags for ACP. The real fix is on our side: respond to session/request_permission. ACP is bidirectional. When kimi runs a Shell or file-write tool, it sends `session/request_permission` (agent → client, JSON-RPC request with id + method) and waits up to 300s for a response. Our existing hermesClient.handleLine only dispatched: (id + result/error) → handleResponse, and (no id + method) → handleNotification. 
A request with BOTH id and method fell through and got silently dropped — kimi timed out, UI loop died, task sat stuck for 7 minutes. Add handleAgentRequest: for session/request_permission, echo the id and respond with outcome=selected, optionId=approve_for_session. The daemon is headless; there's no user to prompt. `approve_for_session` lets the agent remember the action so subsequent identical calls (every Shell, every file write) skip the round-trip entirely. For any other agent → client method, reply with standard -32601 method-not- found so the agent doesn't block. Also: - Add writeMu so request() (main goroutine) and handleAgentRequest (reader goroutine) don't interleave JSON frames on stdin. - Revert the `--yolo acp` flag — it's a no-op, and carrying it in kimiBlockedArgs gives the wrong impression that it does something. Comment in kimi.go now points at handleAgentRequest as the real fix. Tests: - TestHermesClientAutoApprovesPermissionRequest: inject a session/request_permission, assert the reply echoes the id and carries {outcome: selected, optionId: approve_for_session}. - TestHermesClientReplesMethodNotFoundForUnknownAgentRequest: confirm unknown agent → client methods get JSON-RPC -32601 instead of silence. - TestKimiBackendInvokesACPSubcommand replaces the yolo-flag assertion with a negative assertion: no dead --yolo / --auto-approve / -y on argv, since they'd pretend to do something they can't. All: go build ./..., go vet ./..., go test ./pkg/agent/... green. * fix(agent/acp): surface kimi tool input/output via content blocks Kimi-cli emits tool_call and tool_call_update ACP frames with the input/output inside a `content` array of ContentToolCallContent blocks (shape: {type:"content", content:{type:"text", text:"..."}}), not in the hermes-style `rawInput` map / `rawOutput` string. 
Our parser only looked at rawInput/rawOutput, so the daemon recorded empty Input and Output for every kimi tool — the execution-history UI showed blank terminal panels even for commands that ran fine. Add extractACPToolCallText() and a fallback in handleToolCallStart / handleToolCallUpdate: when rawInput is nil / rawOutput is empty, pull the text out of the content blocks. rawInput / rawOutput still take precedence so hermes' behaviour is untouched. Terminal / FileEditToolCallContent blocks are skipped (we have nothing to render them as — kimi only emits TerminalToolCallContent when the client advertises terminal capability, which we don't). Tests: - TestHermesClientHandleToolCallStartKimiContent — content array → Input.text populated. - TestHermesClientHandleToolCallCompleteKimiContent — multi-block content → Output concatenated with newline separator. - TestHermesClientHandleToolCallRawOutputTakesPrecedence — hermes rawOutput still wins when both are present. - TestExtractACPToolCallText — unit coverage for the helper (single/multiple text blocks, terminal-block skip, empty input). * fix(agent/acp): buffer streaming tool args so Input isn't empty in UI kimi-cli streams tool args token-by-token via tool_call_update frames — the initial tool_call carries an empty content block and each subsequent in_progress update carries the cumulative JSON so far (`{`, `{"comma`, `{"command": "echo`, …). The final completed update then carries the tool's stdout, not the args. Observed per kimi-cli acp/session.py::_send_tool_call{,_part,_result} and confirmed by driving a real Shell call end-to-end: 10 in_progress frames, last with `{"command": "echo hello world"}`, then completed with `hello world\n`. Our previous handleToolCallStart emitted MessageToolUse on the first tool_call frame, capturing the empty content — so every kimi tool appeared in the execution-history UI with a blank input. Output was correct (fix `4335c198`) but command was missing. 
Changes: - hermesClient now tracks pending tool calls per toolCallId. Hermes path is unchanged — rawInput is present at tool_call time, so emit-immediately-then-flag-emitted still fires on the initial frame. - kimi path defers MessageToolUse until status=completed / failed. tool_call_update in_progress frames update the buffered argsText (cumulative, so overwrite); on completion we parse the accumulated JSON into Message.Input. Malformed JSON falls back to `{"text": …}` so non-JSON tool args still render. - Orphan completion frames (no matching tool_call seen — e.g. daemon restarted mid-task) synthesise ToolUse from the update's own title/kind/rawInput so the UI still gets a header. - extractACPToolCallText now also renders FileEditToolCallContent blocks as a compact header ("--- path / +++ path / (edited: N → M bytes)"). kimi emits these for Write / StrReplaceFile / Patch when the tool's display block is a DiffDisplayBlock. Tests: - TestHermesClientKimiStreamingToolCall: empty tool_call + 5 streaming in_progress + completed. Asserts no emission until complete, then [ToolUse(Input.command="echo hi"), ToolResult(Output="hi\n")]. - TestHermesClientKimiMalformedArgsFallback: non-JSON argsText → falls back to Input.text. - TestHermesClientHandleToolCallCompleteOrphan: completed frame without a start → ToolUse synthesised from update's rawInput. - TestExtractACPToolCallText: diff + new-file-diff cases. All agent / daemon / handler test packages green. --------- Co-authored-by: Eve <8b0578a3-cf72-4394-9e38-b328eca92463@users.noreply.multica.ai> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Eve <eve@multica.ai> Co-authored-by: Lambda <f252c2c5-7d1d-4f3c-b394-a61abfe673fc@users.noreply.multica.ai>	2026-04-21 02:18:30 +08:00
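The three-way frame dispatch and the auto-approve reply described in the commit above can be sketched as below. All Go names are hypothetical. The method name, the `selected` / `approve_for_session` values and the `-32601` code come from the commit message; nesting the two fields under `result.outcome` follows my reading of the ACP schema and should be treated as an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// frame holds the fields needed to classify an incoming JSON-RPC line.
type frame struct {
	ID     json.RawMessage `json:"id"`
	Method string          `json:"method"`
}

type rpcError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

// reply is the client-to-agent answer for an agent-initiated request.
type reply struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id"`
	Result  interface{}     `json:"result,omitempty"`
	Error   *rpcError       `json:"error,omitempty"`
}

// dispatch classifies a line and, for agent requests, builds the reply:
//   id + method  -> agent request (needs a reply)
//   id only      -> response to one of our own requests
//   method only  -> notification
func dispatch(line []byte) (string, *reply) {
	var f frame
	if err := json.Unmarshal(line, &f); err != nil {
		return "invalid", nil
	}
	hasID := len(f.ID) > 0 && string(f.ID) != "null"
	switch {
	case hasID && f.Method != "":
		r := &reply{JSONRPC: "2.0", ID: f.ID}
		if f.Method == "session/request_permission" {
			// Headless daemon: approve on the user's behalf.
			r.Result = map[string]interface{}{
				"outcome": map[string]string{
					"outcome":  "selected",
					"optionId": "approve_for_session",
				},
			}
		} else {
			r.Error = &rpcError{Code: -32601, Message: "method not found"}
		}
		return "agent_request", r
	case hasID:
		return "response", nil
	case f.Method != "":
		return "notification", nil
	}
	return "invalid", nil
}

func main() {
	kind, r := dispatch([]byte(`{"jsonrpc":"2.0","id":7,"method":"session/request_permission","params":{}}`))
	out, _ := json.Marshal(r)
	fmt.Println(kind, string(out))
}
```

Since replies are written from the reader goroutine while requests are written elsewhere, the real client guards stdin with a mutex (`writeMu` in the commit); that part is not shown here.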
Jiayuan Zhang	b291db11c2	feat(agents): add per-agent model field with provider-aware dropdown (#1399 ) Adds a first-class `model` field on agents so users can pick the LLM model from the create / settings UI instead of editing `custom_env` / `custom_args`. Each provider's dropdown is populated from the live CLI when possible (`opencode models`, `pi --list-models`, `openclaw agents list --json`, `cursor-agent --list-models`, hermes ACP `session/new` → `SessionModelState`), with a static catalog for providers that don't enumerate. Daemon resolves the runtime model as `agent.model → MULTICA_<PROVIDER>_MODEL → ""` — empty passes through so each backend's CLI picks its own default, avoiding static-guess drift. Per-provider honouring: - Claude / Codex / OpenCode / Cursor / Gemini / Pi / Copilot — CLI `--model` / thread payload. - OpenClaw — `opts.Model` is mapped to `--agent <name>` (the CLI rejects `--model`). - Hermes — `session/set_model` ACP RPC; stderr is sniffed for provider-level errors so HTTP 4xx from the configured LLM surfaces instead of "empty output"; explicit-model failures mark the task `failed`. Supporting changes: migration 050 adds `agent.model`; daemon ↔ server heartbeat piggyback carries a model-discovery request; new REST endpoints under `/api/runtimes/{id}/models`; `multica agent create --model` / `update --model`; shared `ModelDropdown` in `packages/views/agents` (searchable, creatable, provider-grouped, default-badge, runtime-supported gate).	2026-04-21 00:06:34 +08:00
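The resolution order `agent.model → MULTICA_<PROVIDER>_MODEL → ""` from the commit above could be sketched as follows. `resolveModel` is a hypothetical name, and upper-casing the provider to form the variable name is an assumption consistent with `MULTICA_KIMI_MODEL` mentioned elsewhere in this log.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// resolveModel returns the agent's own model if set, else the
// provider-level environment override, else "" so the backend CLI
// picks its own default.
func resolveModel(agentModel, provider string, getenv func(string) string) string {
	if m := strings.TrimSpace(agentModel); m != "" {
		return m
	}
	return strings.TrimSpace(getenv("MULTICA_" + strings.ToUpper(provider) + "_MODEL"))
}

func main() {
	// With no agent model and no environment override this prints "".
	fmt.Printf("%q\n", resolveModel("", "kimi", os.Getenv))
	fmt.Printf("%q\n", resolveModel("flash", "gemini", os.Getenv))
}
```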
Bohan Jiang	ec73710dd2	fix(agent/codex): surface stderr tail in initialize / turn startup errors (#1314 ) * fix(agent/codex): surface stderr tail in initialize / turn startup errors When codex app-server exits before the JSON-RPC handshake completes — e.g. because the user put a flag in custom_args that the subcommand rejects — the Result.Error users see is `codex initialize failed: codex process exited`, while codex's actual complaint (typically something like `error: unexpected argument '-m' found`) only lives in daemon logs. Wrap the stderr writer with a bounded stderrTail that still forwards to the slog logWriter but also retains the last 2 KiB of bytes written. Include that tail on the three startup failure paths (initialize, startOrResumeThread, turn/start). Runtime cancellation paths are left untouched — they're our own abort and the stderr context isn't a clear signal there. Refs #1308. Complement to #1310 / #1312 — lets "bad custom_args fail loudly" actually be workable by giving the failure a real message. * fix(agent/codex): join cmd.Wait() before sampling stderr tail Addressing review of #1314: reading stderrBuf.Tail() right after c.request returns "codex process exited" was racy. Nothing in that path synchronizes with os/exec's internal stderr copy goroutine — cmd.Wait() is the only documented join point. The original defer ran cmd.Wait() later, but by then we had already built Result.Error from a potentially-empty Tail(). Replace the ad-hoc deferred stdin.Close()/cmd.Wait() with a sync.Once-wrapped drainAndWait closure. Call it explicitly on the three startup failure paths before sampling the tail; keep it as the cleanup defer so the success path behaves identically. Also add TestCodexExecuteSurfacesStderrWhenChildExitsEarly: spawns a real subprocess that prints to stderr and exits before responding to initialize, runs it through Execute, and asserts Result.Error contains the stderr hint. 
This covers the full timing path the reviewer flagged, which the helper-level tests in this PR did not.	2026-04-20 14:38:32 +08:00
Bohan Jiang	bd445782d5	fix(openclaw): stop passing unsupported flags and actually deliver AgentInstructions (#1362 ) Fixes #1332. Two regressions introduced in #910 (2026-04-14, "OpenClaw backend P0+P1 improvements") that together block all openclaw users: 1. `openclaw agent` does not accept `--model` or `--system-prompt`, so any agent configured with a Model field crashed in ~700ms with `exit status 1`. Remove both forwards, and add them to openclawBlockedArgs so custom_args can't reintroduce the crash. Model is bound at registration time via `openclaw agents add/update --model`. 2. AgentInstructions were written to `{workDir}/AGENTS.md` by execenv.InjectRuntimeConfig, but openclaw loads bootstrap files from its own workspace dir — the file was never read, so every agent's Instructions field was silently discarded. Populate opts.SystemPrompt for the openclaw provider in runTask and prepend it to the `--message` payload in the backend so the model actually receives the instructions. Other providers surface instructions through their native runtime config file (CLAUDE.md / AGENTS.md / GEMINI.md) and are intentionally left unchanged to avoid double injection. Extract buildOpenclawArgs so arg construction is directly testable; add unit tests covering the removed flags, the SystemPrompt prepend, and custom_args filtering.	2026-04-20 14:01:41 +08:00
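The argument construction described in the commit above might be sketched like this. The function names, the argument order, the blank-line separator between instructions and prompt, and the assumption that a blocked flag consumes the following token as its value are all mine; the flag names `--agent`, `--message`, `--json`, `--model` and `--system-prompt` come from the commit messages in this log.

```go
package main

import (
	"fmt"
	"strings"
)

// blockedArgs lists flags that `openclaw agent` rejects per the commit.
var blockedArgs = map[string]bool{"--model": true, "--system-prompt": true}

// filterCustomArgs drops blocked flags, in both "--flag value" and
// "--flag=value" forms, from user-supplied custom args.
func filterCustomArgs(in []string) []string {
	var out []string
	for i := 0; i < len(in); i++ {
		name, _, hasEq := strings.Cut(in[i], "=")
		if blockedArgs[name] {
			// Assumed: the flag takes a value in the next token.
			if !hasEq && i+1 < len(in) && !strings.HasPrefix(in[i+1], "-") {
				i++
			}
			continue
		}
		out = append(out, in[i])
	}
	return out
}

// buildArgs prepends the instructions to the message payload instead of
// passing them as a separate flag.
func buildArgs(agentID, systemPrompt, prompt string, custom []string) []string {
	message := prompt
	if systemPrompt != "" {
		message = systemPrompt + "\n\n" + prompt
	}
	args := []string{"agent", "--json"}
	if agentID != "" {
		args = append(args, "--agent", agentID)
	}
	args = append(args, "--message", message)
	return append(args, filterCustomArgs(custom)...)
}

func main() {
	args := buildArgs("sub2api", "Be concise.", "Fix the bug.",
		[]string{"--model", "gpt-x", "--verbose", "--system-prompt=hi"})
	fmt.Printf("%q\n", args)
}
```

The `--verbose` flag in `main` is a placeholder custom argument for the demo, not a documented openclaw option.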
devv-eve	5fa1da448f	fix(chat): preserve chat session resume pointer across failures (#1360 ) * fix(chat): preserve chat session resume pointer across failures The chat 'forgets earlier messages' bug came from PriorSessionID being silently lost in several edge cases: - UpdateChatSessionSession unconditionally overwrote chat_session.session_id, so any task that completed without a session_id (early agent crash, missing result) wiped the resume pointer to NULL. - CompleteAgentTask + UpdateChatSessionSession ran in separate calls. A follow-up chat message claimed in between resumed against a stale (or NULL) session and started over. - FailAgentTask never wrote session_id back, so a task that established a real session before failing lost its resume pointer. - ClaimTaskByRuntime only trusted chat_session.session_id and never fell back to the existing GetLastChatTaskSession query, so a single bad turn could permanently drop the conversation memory. This change: - Use COALESCE in UpdateChatSessionSession so empty inputs preserve the existing pointer; surface DB errors instead of swallowing them. - Run CompleteAgentTask/FailAgentTask + UpdateChatSessionSession inside the same transaction (TaskService now takes a TxStarter). - Extend FailAgentTask + the daemon FailTask path (client, handler, service) to forward session_id/work_dir, so failed/blocked tasks that built a real session still record it. - Fall back to GetLastChatTaskSession in ClaimTaskByRuntime when the chat_session pointer is missing, and include failed tasks in that lookup so a single failure can't lose the conversation. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(daemon): forward session_id/work_dir on blocked + timeout paths runTask previously dropped result.SessionID and env.WorkDir on the non-completed return paths: - timeout returned a naked error, so handleTask called FailTask with empty session info and the chat resume pointer was either left stale or eventually overwritten with NULL. - blocked / failed (default branch) returned a TaskResult without SessionID / WorkDir, so even though FailTask now COALESCEs into chat_session, there was no value to write through. - the empty-output completion path was the same: it raised an error even when a real session_id had been built. All three paths now return a TaskResult that carries the SessionID / WorkDir the backend produced. Combined with the COALESCE-based update in UpdateChatSessionSession and the FailTask plumbing introduced in PR #1360, the next chat turn can always resume from the latest agent session — even when the previous turn timed out, was rate-limited, or returned an empty completion — instead of starting over with no memory of the conversation. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(copilot): capture session id from session.start as fallback The Copilot backend only read sessionId from the synthetic 'result' event, ignoring the one already present on session.start. When the CLI was killed before result arrived (timeout, cancel, crash, or a session.error mid-turn), the daemon reported SessionID="" and the chat-session resume pointer could not advance — causing the chat to silently drop conversation memory on the next turn. Capture session.start.sessionId into state up front, and only let 'result' overwrite it when it actually carries one. result still wins when present (it is the authoritative end-of-turn record). 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(copilot): parse premiumRequests as float to preserve session id Copilot CLI v1.0.32 serializes premiumRequests as a float (e.g. 7.5), not an integer. Our copilotResultUsage struct typed it as int, which made the entire 'result' line fail json.Unmarshal — silently dropping sessionId on every turn. This was the real cause of chat memory loss: the daemon reported SessionID="" to the server, chat_session.session_id stayed NULL, and the next chat turn never received --resume <id>, so each turn started a fresh Copilot session with no prior context. Add a regression test using the real JSON line from CLI v1.0.32 that asserts sessionId is preserved when premiumRequests is fractional. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Devv <devv@Devvs-Mac-mini.local> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Eve <eve@multica.ai> Co-authored-by: yushen <ldnvnbl@gmail.com>	2026-04-19 22:50:33 -07:00
Bohan Jiang	163f34f918	feat(agents): show launch mode preview in custom args tab (#1312 ) * feat(agent): add LaunchHeader per agent type Each backend in server/pkg/agent/ hardcodes a stable command skeleton (e.g. `codex app-server --listen stdio://`, `hermes acp`) before appending opts.CustomArgs. Surfacing that skeleton lets the UI tell users which command their custom_args are being appended to, so a Codex user doesn't mistakenly add `-m gpt-5.4-mini` expecting it to reach the CLI when the subcommand is actually `app-server`. Expose only the minimum that aids judgment — binary + subcommand, or a short mode label when there is no subcommand — and deliberately omit transport values, internal flags, and env to keep the surface small and renaming-safe. Refs #1308. * feat(handler/runtime): surface launch_header on runtime response runtimeToResponse now derives launch_header from agent.LaunchHeader, piggybacking on the runtime's existing provider field so the frontend's RuntimeDevice gains the skeleton without a new endpoint or DB query. Client gets the header for free whenever it lists agents' runtimes — which the custom-args tab already does. Refs #1308. * feat(ui/agents): show launch mode preview in custom args tab Thread the resolved RuntimeDevice from AgentDetail into CustomArgsTab and render its launch_header as a one-line preview above the args list, so users see `codex app-server <your args>` (or equivalent per provider) and can tell whether a CLI-style flag like `--model` will actually reach the invoked subcommand. Source of truth stays in the Go backend; the TS type just carries the string. Refs #1308.	2026-04-18 14:18:42 +08:00
niceSprite	746f33a38b	fix(claude): clear fresh session_id on resume failure so daemon fallback fires (#1285 ) When --resume targets a dead session, claude prints "No conversation found with session ID: ..." to stderr, emits a stream-json system init with a fresh session_id, then exits with code 1. The backend was treating that fresh id as the authoritative session, so daemon.go's retry-with-fresh-session fallback (SessionID == "" guard) never triggered. Every subsequent task for the same (issue, agent) pair stayed permanently broken until the server-side session_id was cleared by hand. Fix: when --resume was requested but the emitted session_id differs AND the run failed, drop the fresh id from Result so the daemon's existing fallback can do its job. Factored into a pure helper and unit-tested. Fixes #1284 Co-authored-by: fuxiao <fuxiao@zyql.com>	2026-04-18 12:59:30 +08:00
Korkyzer	63800f05ff	fix(agent): add per-agent mcp_config field to restore MCP access (#1168 ) * fix(agent): add per-agent mcp_config field to restore MCP access Closes #1111 The --strict-mcp-config flag was added defensively in #592 to prevent Claude agents from inheriting MCP state from the outer Claude Code session. It was meant to be paired with --mcp-config <path> to inject a controlled set of MCPs, but that path was never implemented, which silently stripped all user-scope MCPs from spawned agents. This PR completes the original design by: - Adding a nullable mcp_config jsonb column to the agents table - Wiring mcp_config through AgentResponse, Create/Update requests - Piping it into ExecOptions.McpConfig in the daemon - Serializing to a temp file and passing --mcp-config <path> in buildClaudeArgs - Blocklisting --mcp-config in claudeBlockedArgs to prevent override via custom_args Does not touch Codex provider (tracked separately in #674). Does not implement Multica MCP auto-injection (out of scope). * fix: disambiguate JSON null vs absent for mcp_config	2026-04-18 01:35:22 +08:00
Bohan Jiang	423ceaf8f4	test(agent): regression tests for codex subagent threadId filter (#1257 ) Follow-up to #1192. Document the v2 protocol contract that the dispatch-level threadId guard relies on, and lock down the two leakage paths the guard closes: - turn/completed from a subagent thread must not call onTurnDone - item/completed (agentMessage, final_answer) from a subagent thread must neither leak text into the output builder nor terminate the turn Without these tests a future refactor that drops or relocates the guard would not be caught by CI, since existing notification tests omit the top-level threadId field and pass through unfiltered.	2026-04-17 14:49:38 +08:00
niceSprite	462ff88df5	fix(codex): dispatch-level threadId filter for subagent notifications (#1192 ) * fix(daemon): filter thread/status/changed by threadId to prevent subagent interference When Codex CLI has memories enabled, the app-server spawns a memory consolidation subagent as a separate thread within the same stdio connection. When that subagent thread finishes and transitions to idle, the daemon's codex backend mistakenly interprets the idle signal as the main turn completing, causing it to close stdin and cancel the context before the real turn produces any output. Add a threadId check to the thread/status/changed handler so only status changes from the tracked thread trigger turn completion. Signals from subagent threads (threadId != c.threadID) are now ignored. Fixes #1181 * fix(codex): dispatch-level threadId filter for subagent notifications Codex multiplexes subagent threads (e.g. memory consolidation) on the same stdio pipe. Previously only thread/status/changed had a threadId guard, but item/completed (agentMessage + final_answer), turn/completed, and turn/started from subagent threads could still trigger onTurnDone or contaminate output. Move the threadId check to the top of handleRawNotification so all notification handlers are protected. Remove the now-redundant per-handler check on thread/status/changed. Fixes multica-ai/multica#1181 --------- Co-authored-by: fuxiao <fuxiao@zyql.com>	2026-04-17 14:45:09 +08:00
Bohan Jiang	9a97ee1f4c	fix(agent): resume codex thread across tasks on the same issue (#1166 ) Every other backend (Claude, Gemini, OpenCode, OpenClaw, Hermes) honors ExecOptions.ResumeSessionID — only Codex didn't. That's why users on the Codex runtime saw each new comment on an issue start a fresh Codex conversation: the daemon persists Result.SessionID per (agent, issue) and passes it back as PriorSessionID, but codex.go always called thread/start and never populated SessionID, so the value round-tripped as empty. Wire the missing half: - Extract startOrResumeThread on codexClient. When ResumeSessionID is set, call thread/resume (per the Codex app-server protocol), passing only cwd / model / developerInstructions overrides so the thread keeps its persisted model and reasoning effort. If resume fails (unknown thread, schema drift, transport error) fall back to thread/start so the task still runs on a fresh thread. - Surface the live threadID as Result.SessionID on the final emit so the daemon stores it and feeds it back into ResumeSessionID on the next claim. Tests drive the new helper through the fake stdin harness, covering: fresh start, successful resume, fallback on resume error, fallback when resume returns no thread ID, and surfacing of thread/start failures.	2026-04-16 18:06:11 +08:00


93 Commits