multica

mirror of https://github.com/multica-ai/multica.git synced 2026-07-05 21:39:54 +02:00

Author	SHA1	Message	Date
krislliu	d2aef41fbf	feat(agent): implement codebuddyBackend.Execute with stream-json parsing Co-authored-by: multica-agent <github@multica.ai>	2026-05-30 14:53:32 +08:00
krislliu	160a62ca9f	feat(agent): add codebuddyBackend struct and buildCodebuddyArgs Introduces the codebuddy agent backend skeleton with args builder that mirrors claudeBackend's protocol flags (stream-json, bypass permissions, blocked args filtering) for the codebuddy CLI fork. Co-authored-by: multica-agent <github@multica.ai>	2026-05-30 14:53:32 +08:00
Bohan Jiang	e1745d09ea	MUL-2797 feat(agent): add Claude Opus 4.8 to model catalog & pricing (#3492) Claude Code now ships Opus 4.8 (claude-opus-4-8). Add it to the three places that enumerate Claude models so the picker, thinking-level catalog, and usage cost estimates all recognize it: - claudeStaticModels(): list Claude Opus 4.8 (Sonnet 4.6 stays default) - claudeModelEffortAllow: Opus supports the full low..max set incl. xhigh - MODEL_PRICING: $5/$25 in, $0.50 cache read, $6.25 5m cache write — same current-gen Opus tier as 4.5/4.6/4.7, confirmed against platform.claude.com/docs/en/about-claude/pricing Co-authored-by: J <j@multica.ai> Co-authored-by: multica-agent <github@multica.ai>	2026-05-29 10:28:30 +08:00
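The pricing entry reads as per-million-token rates. Assuming $5 is input, $25 is output, $0.50 is cache read and $6.25 is the 5-minute cache write (1.25× the input rate), a cost estimate is a weighted sum of token counts. The type and function names below are hypothetical stand-ins for whatever consumes `MODEL_PRICING`.

```go
package main

import "fmt"

// modelPricing holds USD per million tokens (hypothetical field names).
type modelPricing struct {
	Input, Output, CacheRead, CacheWrite5m float64
}

// Values are the Opus 4.8 figures from the commit message.
var pricing = map[string]modelPricing{
	"claude-opus-4-8": {Input: 5, Output: 25, CacheRead: 0.50, CacheWrite5m: 6.25},
}

type usage struct {
	InputTokens, OutputTokens, CacheReadTokens, CacheWriteTokens int64
}

// estimateCost returns the estimated USD cost, or false for an unknown model.
func estimateCost(model string, u usage) (float64, bool) {
	p, ok := pricing[model]
	if !ok {
		return 0, false
	}
	const perMillion = 1_000_000.0
	cost := float64(u.InputTokens)*p.Input/perMillion +
		float64(u.OutputTokens)*p.Output/perMillion +
		float64(u.CacheReadTokens)*p.CacheRead/perMillion +
		float64(u.CacheWriteTokens)*p.CacheWrite5m/perMillion
	return cost, true
}

func main() {
	c, _ := estimateCost("claude-opus-4-8", usage{InputTokens: 200_000, OutputTokens: 40_000, CacheReadTokens: 1_000_000})
	fmt.Printf("$%.2f\n", c) // $1.00 input + $1.00 output + $0.50 cache read
}
```

Returning `false` for unknown models lets the caller show "no estimate" instead of silently billing a new model at $0.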
Bohan Jiang	09f9c7e2ce	MUL-2764 feat(agent): wire mcp_config through ACP runtimes (Hermes / Kimi / Kiro) (#3439) * MUL-2764 feat(agent): wire mcp_config through ACP runtimes (Hermes / Kimi / Kiro) The MCP config Tab (#3419) already lets admins save mcp_config on an agent, and the daemon plumbs it through to `agent.ExecOptions.McpConfig` for every runtime. Claude and Codex consume it; the three ACP runtimes (Hermes / Kimi / Kiro) ignored the field and hardcoded an empty `mcpServers: []` in their `session/new` requests. Add `buildACPMcpServers` to translate the Claude-style `{"mcpServers": {"<name>": {...}}}` object-of-objects into the array shape ACP requires (`[{name, command, args, env: [{name,value}, ...]}, ...]` for stdio; `[{type, name, url, headers: [...]}, ...]` for http/sse), then pass the translated array on `session/new` (all three) and `session/load` (kiro resume). Malformed JSON fails the launch closed — same contract Codex's `renderCodexMcpServersBlock` uses — so users see a real error instead of silently running with no MCP servers. Individual unclassifiable entries (no command, no url) are skipped with a warning so one bad row can't take MCP down for the rest of the agent. Co-authored-by: multica-agent <github@multica.ai> * MUL-2764 fix(agent): wire mcp_config through ACP resume + gate http/sse on capability Addresses the two blockers Elon raised on #3439: 1. session/resume now carries mcpServers for Hermes and Kimi (Kiro's session/load already did). Per the ACP Session Setup spec the resume path re-attaches MCP servers, and without this a resumed task lost access to MCP tools that a fresh task on the same agent would have had. Pinned with new TestHermesResumeIncludesMcpServers and TestKimiResumeIncludesMcpServers integration tests that inspect the recorded wire request. 2.
Added extractACPMcpCapabilities + filterACPMcpServersByCapability so http/sse MCP entries get dropped (with a daemon-log warning naming the entry) when the runtime's initialize response doesn't advertise mcpCapabilities.http / .sse. Sending those entries to a stdio-only runtime is a spec violation and reliably tanks session/new; now they get filtered and the rest of the session still starts. Stdio entries pass through unconditionally. Both backends wire the filter in right after initialize so session/new and session/resume see the same filtered list. Also added TestKiroLoadIncludesMcpServersFromConfig — Elon flagged that no test pinned "non-empty mcp_config actually reaches the wire" for Kimi/Kiro, so the wire assertions go in for all three runtimes. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: J <j@multica.ai> Co-authored-by: multica-agent <github@multica.ai>	2026-05-28 16:29:49 +08:00
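The two halves of this change can be sketched in Go: first translate the Claude-style object-of-objects into ACP's array shape, then drop http/sse entries the runtime did not advertise in `mcpCapabilities`. `buildMcpServers`, `filterByCapability` and the struct names are hypothetical. The `{name, value}` header shape and sorting entries by name for stable output are assumptions. `omitempty` is used only so one struct can cover both transports; check the ACP schema for which fields are required.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"sort"
)

type nameValue struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

// acpServer is one ACP-style mcpServers entry: Type is "" for stdio and
// "http" or "sse" for remote transports.
type acpServer struct {
	Type    string      `json:"type,omitempty"`
	Name    string      `json:"name"`
	Command string      `json:"command,omitempty"`
	Args    []string    `json:"args,omitempty"`
	Env     []nameValue `json:"env,omitempty"`
	URL     string      `json:"url,omitempty"`
	Headers []nameValue `json:"headers,omitempty"`
}

// claudeServer is one value of the Claude-style {"mcpServers": {...}} object.
// encoding/json matches the lowercase keys to these fields case-insensitively.
type claudeServer struct {
	Type, Command, URL string
	Args               []string
	Env, Headers       map[string]string
}

// sortedKeys returns the map's keys in lexical order for stable output.
func sortedKeys[V any](m map[string]V) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func toPairs(m map[string]string) []nameValue {
	pairs := make([]nameValue, 0, len(m))
	for _, k := range sortedKeys(m) {
		pairs = append(pairs, nameValue{Name: k, Value: m[k]})
	}
	return pairs
}

// buildMcpServers fails closed on malformed JSON but skips (with a warning)
// any entry that has neither a command nor a url.
func buildMcpServers(raw []byte) ([]acpServer, error) {
	var cfg struct {
		McpServers map[string]claudeServer `json:"mcpServers"`
	}
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, fmt.Errorf("invalid mcp_config: %w", err)
	}
	out := []acpServer{}
	for _, n := range sortedKeys(cfg.McpServers) {
		s := cfg.McpServers[n]
		switch {
		case s.Command != "":
			out = append(out, acpServer{Name: n, Command: s.Command, Args: s.Args, Env: toPairs(s.Env)})
		case s.URL != "":
			t := "http"
			if s.Type == "sse" {
				t = "sse"
			}
			out = append(out, acpServer{Type: t, Name: n, URL: s.URL, Headers: toPairs(s.Headers)})
		default:
			log.Printf("skipping MCP server %q: no command or url", n)
		}
	}
	return out, nil
}

// filterByCapability drops http/sse entries the runtime's initialize response
// did not advertise; stdio entries always pass.
func filterByCapability(servers []acpServer, httpOK, sseOK bool) []acpServer {
	kept := make([]acpServer, 0, len(servers))
	for _, s := range servers {
		if (s.Type == "http" && !httpOK) || (s.Type == "sse" && !sseOK) {
			log.Printf("dropping MCP server %q: runtime lacks %s support", s.Name, s.Type)
			continue
		}
		kept = append(kept, s)
	}
	return kept
}

func main() {
	servers, err := buildMcpServers([]byte(`{"mcpServers":{"fs":{"command":"npx","env":{"ROOT":"/tmp"}},"docs":{"type":"http","url":"https://example.com/mcp"},"bad":{}}}`))
	b, _ := json.Marshal(filterByCapability(servers, false, false)) // stdio-only runtime
	fmt.Println(string(b), err)
}
```

Filtering after translation means `session/new` and `session/resume` can share one already-filtered slice, which matches the commit's "same filtered list" intent.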
Bohan Jiang	bae8a84abd	MUL-2767 feat(agent): add Antigravity runtime backend (#3427) * feat(agent): add Antigravity runtime backend Adds Google's Antigravity CLI (`agy`) as the 12th supported coding-tool runtime, alongside Claude / Codex / Cursor / Copilot / Gemini / Hermes / Kimi / Kiro / OpenCode / OpenClaw / Pi. The CLI emits plain assistant text on stdout (no structured event stream), so the backend streams stdout line-by-line as `MessageText` events and accumulates the same text as the final `Result.Output`. Session resumption uses `--conversation <id>`; because the conversation UUID is not echoed on stdout, the daemon routes `--log-file` to a temp file and recovers the id from the glog-formatted log lines. MUL-2767 Co-authored-by: multica-agent <github@multica.ai> * fix(agent): correct Antigravity capability contract from Elon review - ModelSelectionSupported now returns false for antigravity. `agy` has no --model flag and antigravityBackend deliberately drops opts.Model, so the UI must render a disabled "Managed by runtime" picker instead of an empty dropdown plus a silently-ignored manual-entry field. Also stop seeding AgentEntry.Model from MULTICA_ANTIGRAVITY_MODEL — the backend would silently ignore it. - Antigravity skills now write to {workDir}/.agents/skills/, the CLI's native workspace path (inherits Gemini CLI's layout per https://antigravity.google/docs/gcli-migration). Previously they went to the .agent_context/skills/ fallback that the CLI doesn't scan. Runtime brief moves antigravity into the native-discovery branch and local_skills.go points the user-level skill root at ~/.gemini/antigravity-cli/skills for Runtime → local skill import.
- Doc + UI comment sync: providers matrix / install-agent-runtime / cloud-quickstart / agents-create / tasks (session-resume support) / skills / README all now list Antigravity in the right buckets, and the model-picker / model-dropdown comments cite antigravity (not the stale hermes reference) as the supported=false example. New tests: TestAntigravityModelSelectionUnsupported, TestInjectRuntimeConfigAntigravity (native discovery wording), TestWriteContextFilesAntigravityNativeSkills (.agents/skills/ landing, .agent_context/skills/ NOT written). Co-authored-by: multica-agent <github@multica.ai> * feat(provider-logo): swap inline placeholder for real Antigravity PNG Replaces the hand-drawn planet+arc placeholder with the official asset shipped from Downloads. Stored next to the component; bundlers (Next.js / electron-vite) resolve the PNG import to a URL string at build time. Added a small assets.d.ts so packages/views' tsc accepts PNG / SVG module imports — there was no prior asset usage in this package to register the declaration. --------- Co-authored-by: J <j@multica.ai> Co-authored-by: multica-agent <github@multica.ai>	2026-05-28 15:40:05 +08:00
Bohan Jiang	d39da9f7f0	MUL-2764: feat(agents): add MCP config tab to agent detail page (#3419) * MUL-2764: feat(agents): add MCP config tab to agent detail page Backend already stores `mcp_config` and the daemon forwards it to the runtime CLI via `--mcp-config`; this only adds the UI entry point. The new tab presents a JSON editor that pretty-prints the existing config, validates the buffer on every keystroke, and saves through the existing `PUT /api/agents/{id}` path. Clearing the editor sends `mcp_config: null`, which the handler reads as "wipe the column" and the daemon falls back to the CLI's own default. When the caller can't see secrets (agent actor, or a non-owner non-admin member), the server already returns `mcp_config: null` with `mcp_config_redacted: true`; the tab renders a read-only "configured but hidden" state in that case so a non-privileged member cannot silently overwrite an admin-owned config by saving an empty editor. Co-authored-by: multica-agent <github@multica.ai> * fix(agents): MCP tab — preserve in-flight edits + warn non-Claude runtimes - Fix stale-editor sync: compare the local draft against the previous original via a ref, so a background agent refetch updates an untouched editor instead of being silently ignored. Without this, a draft equal to the OLD original was treated as user-edited after the prop changed, and the next Save would write the old config back over a concurrent admin edit. - Surface a notice inside the tab when the agent's runtime provider is not Claude — today's daemon only forwards mcp_config via Claude's --mcp-config, so saving on e.g. a Codex agent was silent but ineffective. - Tests for both: rerender resyncs an untouched editor, rerender preserves an in-flight edit, warning renders on non-Claude / hides on Claude.
MUL-2764 Co-authored-by: multica-agent <github@multica.ai> * MUL-2764: feat(agents): codex MCP support + hide MCP tab on unsupported runtimes - Backend: codex.go now translates agent.mcp_config (Claude-style `{"mcpServers": {...}}`) into `-c mcp_servers.<name>=<inline-toml>` flags for `codex app-server`, so MCP servers configured in the UI reach Codex's per-task config layer. Bad mcp_config JSON downgrades to a warn-and-skip so it can't break the agent launch. - Frontend: AgentOverviewPane hides the MCP tab when the agent's runtime provider doesn't read mcp_config — only `claude` and `codex` are supported today, every other provider sees no MCP tab. The previous in-tab warning is removed (no longer reachable). - New shared helper `providerSupportsMcpConfig` lives in `@multica/core/agents` so views and any future caller share one list of MCP-aware providers. - Tests: new go-side coverage for stdio + url + multi-server inputs, TOML string escaping, malformed-input fallback, and arg ordering vs custom_args; new views-side coverage for which providers surface the MCP tab. En + zh-Hans copy and parity test refreshed. Co-authored-by: multica-agent <github@multica.ai> * MUL-2764: fix(agents): keep codex mcp_config secrets out of argv/logs Move the agent's mcp_config from a `-c mcp_servers.<id>=<inline-toml>` argv flag into a daemon-managed `[mcp_servers.]` block inside the per-task `$CODEX_HOME/config.toml`. mcp_servers.<id>.env is a documented Codex config field and the UI already treats mcp_config as redacted for non-admins; argv would have leaked those values into `ps aux` and the `agent command` log line. The file is forced to 0600 to keep secrets in the daemon owner's lane regardless of the seed file's mode. Also drop user-supplied `-c/--config mcp_servers.` entries from custom_args. Codex `-c` is last-wins (verified against codex-cli 0.132.0), so without filtering, a custom_args entry could silently shadow whatever the MCP Tab saved. 
Strip inherited `[mcp_servers.]` tables from the per-task config.toml when the agent has its own mcp_config, mirroring Claude's `--strict-mcp-config`: avoids TOML "table already exists" errors on name collisions and matches admin expectations that the MCP Tab is the authoritative source for that task. Co-authored-by: multica-agent <github@multica.ai> MUL-2764: fix(agents): codex mcp_config three-state semantics + custom_args compat Address the third review pass: 1. Distinguish nil vs present-but-empty mcp_config. `{}` and `{"mcpServers":{}}` now count as "admin saved an explicit (empty) managed set" — strip inherited user `[mcp_servers.]` and pin an empty managed marker block. Only SQL NULL / JSON `null` map to "absent" and fall back to the user's global `~/.codex/config.toml`. This aligns Codex with the API's three-state contract (omit / null / object) and with Claude's `--strict-mcp-config` semantics. 2. Fail closed on `ensureCodexMcpConfig` errors and on managed mcp_config without CODEX_HOME. Previous warn-and-launch would silently inherit the user's global MCP servers and look identical to a successful apply — exactly the surprise the MCP Tab is meant to remove. 3. Only filter `-c mcp_servers.` from `custom_args`/`extra_args` when the agent has a managed mcp_config. Pre-MUL-2764 agents that configured MCP via custom_args keep working; once an admin opts in via the MCP Tab the daemon owns the `mcp_servers` namespace and overrides are dropped (last-wins safety). 4. Update mcp_config locale intro to mention $CODEX_HOME/config.toml instead of the now-removed `-c mcp_servers.*` argv path. Tests: - Split `TestEnsureCodexMcpConfigEmptyInputsAreNoop` into `TestEnsureCodexMcpConfigAbsentLeavesUserTablesAlone` (nil/null) and `TestEnsureCodexMcpConfigEmptyManagedSetStripsUserMcp` (`{}`, `{"mcpServers":{}}`). - Add `TestEnsureCodexMcpConfigEmptyManagedSetIdempotent` to pin byte-identical reruns on the empty managed marker block. 
- Add `TestHasManagedCodexMcpConfig` covering the eight relevant inputs. - Add `TestBuildCodexArgsPreservesCustomMcpOverridesWhenUnmanaged` and `TestBuildCodexArgsDropsCustomMcpOverridesWhenManaged` to pin the new gating. - Add `TestCodexExecuteFailsClosedWhenMcpConfigInvalid` and `TestCodexExecuteFailsClosedWhenManagedMcpButNoCodexHome` for the Execute paths. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: J <j@multica.ai> Co-authored-by: multica-agent <github@multica.ai>	2026-05-28 15:11:28 +08:00
Bohan Jiang	2bda4065d0	MUL-2708: fix(agent): preserve multi-line Pi prompt on Windows by bypassing the .cmd shim (#3417) Pi is installed on Windows via npm, which lays down `pi.cmd` → `pi.ps1` → `node_modules/@mariozechner/pi-coding-agent/dist/cli.js`. The daemon spawns Pi with `exec.Command("pi", ...)`; PATHEXT resolves that to `pi.cmd`, and cmd.exe expands `%` in the shim by re-tokenising the original command line, which truncates any argv containing newlines. buildPiArgs passes the full prompt as the last positional argv, so the multi-line system+user prompt is silently cut at the first newline before it reaches the JS entrypoint. The session JSONL then records only the first line ("You are running as a chat assistant for a Multica workspace.") and Pi replies as if the user message were missing (GitHub multica-ai/multica#3306). Mirror the existing cursor-agent fix: when LookPath resolves Pi to a .cmd/.bat launcher and a sibling pi.ps1 exists, invoke PowerShell with `-File <ps1>` directly and forward each arg as a discrete token. This keeps us on the official launch path while skipping the cmd.exe % re-expansion. Falls back to the original launcher when pi.ps1 or PowerShell can't be located. The Windows test asserts the rewrite produces the expected argv and that the multi-line positional prompt survives unchanged. Co-authored-by: J <j@multica.ai> Co-authored-by: multica-agent <github@multica.ai>	2026-05-28 12:36:16 +08:00
Kagura	f02bc56e70	fix(agent/cursor): remove obsolete 'chat' subcommand from argv (#3077) (#3092) The current cursor-agent CLI no longer has a 'chat' subcommand. The positional 'chat' argument was silently treated as prompt text, leaking into the user message (e.g. 'chat <actual prompt>'). Remove 'chat' from buildCursorArgs so the generated argv matches the current cursor-agent CLI interface. Fixes #3077	2026-05-27 16:40:29 +08:00
Multica Eve	311cf4d998	fix(agent): surface Codex app-server no-progress diagnostics (MUL-2688) Refs #3262.	2026-05-26 18:42:47 +08:00
Multica Eve	26ff52385b	fix: attribute Hermes usage to current model (MUL-2696) Fix Hermes ACP usage attribution to current model when agent.model is unset. Also preserves cache-read token accounting and makes ACP model-list parsing more tolerant of snake_case payloads and Unknown display names.	2026-05-26 18:13:28 +08:00
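For the Hermes fix, a tolerant model-list parser and the usage-attribution fallback might look like the sketch below. The payload keys (`availableModels` / `available_models`, `modelId` / `model_id`, `name`) are assumptions about the ACP payload. `parseModels` and `usageModel` are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// modelInfo accepts both camelCase and snake_case ids; key names are assumed.
type modelInfo struct {
	ModelID      string `json:"modelId"`
	ModelIDSnake string `json:"model_id"`
	Name         string `json:"name"`
}

type model struct{ ID, Display string }

// parseModels reads either key spelling and replaces an empty or "Unknown"
// display name with the model id.
func parseModels(raw []byte) ([]model, error) {
	var payload struct {
		Available      []modelInfo `json:"availableModels"`
		AvailableSnake []modelInfo `json:"available_models"`
	}
	if err := json.Unmarshal(raw, &payload); err != nil {
		return nil, err
	}
	entries := payload.Available
	if len(entries) == 0 {
		entries = payload.AvailableSnake
	}
	var out []model
	for _, e := range entries {
		id := e.ModelID
		if id == "" {
			id = e.ModelIDSnake
		}
		if id == "" {
			continue // nothing to select or bill against
		}
		display := strings.TrimSpace(e.Name)
		if display == "" || strings.EqualFold(display, "unknown") {
			display = id
		}
		out = append(out, model{ID: id, Display: display})
	}
	return out, nil
}

// usageModel bills usage to the configured model when set, else to the model
// the runtime reports as current.
func usageModel(configured, current string) string {
	if configured != "" {
		return configured
	}
	return current
}

func main() {
	ms, err := parseModels([]byte(`{"available_models":[{"model_id":"m1","name":"Unknown"}]}`))
	fmt.Println(ms, err, usageModel("", "m1"))
}
```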
Multica Eve	744b474199	revert(agent): remove per-agent local skill toggle (MUL-2603) (#3286) * Revert "feat(agents): hide skills_local toggle for runtimes that don't honour it (MUL-2603) (#3276)" This reverts commit `0b50c5a209`. Co-authored-by: multica-agent <github@multica.ai> * Revert "fix(agent): surface host OAuth token via env var on macOS isolation (MUL-2603) (#3267)" This reverts commit `a67bf81225`. Co-authored-by: multica-agent <github@multica.ai> * Revert "fix(agents): tighten skills-tab intro and drop redundant import hint (#3265)" This reverts commit `d8075a5775`. Co-authored-by: multica-agent <github@multica.ai> * Revert "fix(agent): mirror $HOME/.claude.json into isolated config dir (MUL-2661) (#3261)" This reverts commit `40da88fc16`. Co-authored-by: multica-agent <github@multica.ai> * Revert "feat(agent): per-agent toggle to isolate host-machine skills (MUL-2603) (#3200)" This reverts commit `960befa56f`. Co-authored-by: multica-agent <github@multica.ai> * Add migration cleanup for reverted agent skills toggle Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: Eve <eve@multica-ai.local> Co-authored-by: multica-agent <github@multica.ai>	2026-05-26 17:00:01 +08:00
Bohan Jiang	a67bf81225	fix(agent): surface host OAuth token via env var on macOS isolation (MUL-2603) (#3267) * fix(agent): surface host OAuth token via env var on macOS isolation (MUL-2603) Claude Code 2.x scopes the macOS keychain credentials entry by sha256(CLAUDE_CONFIG_DIR)[:8], so the MUL-2603 isolation path strands the child at "Not logged in" even after #3261 mirrored .claude.json: the child looks up `Claude Code-credentials-<scratch-hash>`, the host token is sitting in the no-suffix `Claude Code-credentials` entry. Read the host OAuth token from the keychain via /usr/bin/security and inject it as CLAUDE_CODE_OAUTH_TOKEN, which bypasses keychain lookup entirely. Linux/Windows continue to use the .credentials.json mirror (no-op there). Operator-pinned tokens and ANTHROPIC_API_KEY both take precedence over the keychain reader. Co-authored-by: multica-agent <github@multica.ai> * fix(agent): tighten empty-value auth gate, pin Claude CLI env-scrub assumption (MUL-2603) Empty-value gate - `ANTHROPIC_API_KEY=` inherited from a login shell that conditionally exports auth previously posed as an "operator pinned API-key auth" choice and disabled the keychain reader, stranding the isolated child at "Not logged in" even though no auth was actually selected. - Custom_env `CLAUDE_CODE_OAUTH_TOKEN=""` (stale agent config) had the same effect, plus would have shadowed a keychain-injected token in libc env lookups that pick the first match. - Both are now treated as noise: the empty entry is dropped from the child env and the keychain reader runs unchanged. Two new unit tests cover the os.Environ side (`...TreatsEmptyAnthropicAPIKeyAsUnpinned`, `...HonorsNonEmptyAnthropicAPIKey`) and the custom_env side (`...EmptyOAuthTokenInCustomEnvAsUnpinned`).
Env-scrub boundary - Surfacing `CLAUDE_CODE_OAUTH_TOKEN` to the isolated child is only safe because Claude Code itself drops that variable from the env it hands to Bash / hook subprocesses, so a model-driven `printenv` can never echo the secret into the agent transcript. - Empirically verified against `claude` 2.1.121: `printf '...test -n "$CLAUDE_CODE_OAUTH_TOKEN" && echo SET || echo UNSET...' | CLAUDE_CODE_OAUTH_TOKEN=sk-canary-XYZ MUL2603_CONTROL=control-value claude --print --output-format text --allow-dangerously-skip-permissions --allowedTools Bash` returned `UNSET` for the OAuth token while the non-sensitive `MUL2603_CONTROL` control returned `CONTROL-SET`, proving the CLI scrubs only the auth env, not the env in general. - Pinned this assumption in a new skip-gated regression test (`TestClaudeCLIScrubsOAuthTokenFromBashSubprocess`) that boots the real CLI with a canary token; failing the test means upstream Claude Code stopped scrubbing and the passthrough must move off env vars before MUL-2603 can ship. Co-authored-by: multica-agent <github@multica.ai> * fix(agent): gate keychain passthrough on default host dir, harden scrub test (MUL-2603) Two follow-ups from the round-2 review on #3267: 1. Custom CLAUDE_CONFIG_DIR no longer pulls the default OAuth token. Claude Code 2.x maps each config dir to its own suffixed `Claude Code-credentials-<hash>` keychain entry, so an operator that pins a managed/custom CLAUDE_CONFIG_DIR via custom_env or the daemon-host env was getting the daemon user's default unsuffixed entry injected into the isolated child — silently crossing accounts, exactly the boundary mirrorHostClaudeJSONIfMissing already protects for `.claude.json`. buildClaudeEnvWith now threads the effective hostConfigDir through and only calls the reader when that dir is the default `$HOME/.claude`.
The new gate has a unit-level truth table (TestIsDefaultHostClaudeConfigDir) plus a regression (TestBuildClaudeEnvIsolatedSkipsKeychainForCustomHostConfigDir) that makes a t.Fatal-armed reader prove the gate keeps the read off for custom dirs. 2. Scrub e2e now asserts the control prong and the proof-of-execution marker, not just "canary absent". The previous assertion would false-pass on a model refusal, paraphrase, or "Bash gets no env at all" upstream change. The strengthened version sets a non-secret MUL2603_CONTROL alongside the canary OAuth token and asserts (a) canary is NOT in the transcript, (b) CONTROL-SET IS in the transcript (env propagation works for non-secrets — proves a targeted scrub), (c) UNSET IS in the transcript (the Bash tool actually ran AND saw the OAuth var as empty/unset). Code comment in buildClaudeEnvWith and the test docstring now narrow the security contract to the Bash tool subprocess only; hook subprocess env-scrub is no longer claimed because it has not been verified. Co-authored-by: multica-agent <github@multica.ai> * test(agent): use per-run nonces in Claude scrub e2e to kill false-pass (MUL-2603) Elon's round-3 review flagged that TestClaudeCLIScrubsOAuthTokenFromBashSubprocess still false-passed: the proof markers "UNSET" / "CONTROL-SET" were literal strings in the prompt, so strings.Contains matched them even when the model only paraphrased the prompt without spawning Bash. Replace the hard-coded markers with two per-run random hex nonces passed only via env vars (MUL2603_UNSET_NONCE, MUL2603_CONTROL_NONCE). The prompt now references the variable names, not the values, so the nonces can land in the transcript only if a real Bash subprocess inherits the env vars and echoes them. A paraphrasing or refusing model cannot fake nonces it never saw. Also update the security-boundary comment in buildClaudeEnvWith to describe the nonce-based proof. 
Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-26 15:29:58 +08:00
Bohan Jiang	40da88fc16	fix(agent): mirror $HOME/.claude.json into isolated config dir (MUL-2661) (#3261) PR #3200 introduced per-agent `skills_local=ignore` isolation that mirrors the host's Claude config dir into a per-task scratch dir, omitting `skills/` to keep broken local skills out of the CLI's discovery path. The mirror walks entries inside `hostConfigDir` (default: `$HOME/.claude/`), but Claude Code's default layout stores its main config — login state, project history — at `$HOME/.claude.json`, a sibling of `~/.claude/` rather than inside it. Once `CLAUDE_CONFIG_DIR=$ISOLATED` is set, the CLI looks for `$ISOLATED/.claude.json`, finds only `backups/.claude.json.backup.*` (those live inside `~/.claude/` and DO get mirrored), and exits with: Claude configuration file not found at: …/.claude.json Not logged in · Please run /login — so every agent with `skills_local=ignore` on a host using the default Claude layout dies on the first turn. Flipping the toggle back to "merge" restores the host CLAUDE_CONFIG_DIR and recovers the agent; that's the workaround Bohan flagged in MUL-2661. Fix: after the existing `mirrorHostClaudeExceptSkills`, run a new `mirrorHostClaudeJSONIfMissing` that pulls `$HOME/.claude.json` into the scratch dir as `.claude.json` when (a) the dest doesn't already have one and (b) the host source dir is the default `$HOME/.claude/`. The custom-CLAUDE_CONFIG_DIR path is left alone because a pinned custom dir is expected to be self-contained — silently borrowing `$HOME/.claude.json` from a different account would mask credential drift. The helper goes through `createFileLink`, so it inherits the same symlink → junction → hardlink → copy fallback chain the rest of the mirror uses on Windows-without-Developer-Mode hosts. Tests: - `TestMirrorHostClaudeJSONIfMissing_DefaultLayoutMirrorsParentFile` covers the happy path with an injected `homeDir`/`fileLink`.
- `TestMirrorHostClaudeJSONIfMissing_AlreadyPresentNoop` asserts a pre-existing dest `.claude.json` (from a custom CLAUDE_CONFIG_DIR mirror) is not overwritten. - `TestMirrorHostClaudeJSONIfMissing_CustomHostDirSkipped` locks in the custom-host-dir gate. - `TestMirrorHostClaudeJSONIfMissing_MissingSourceNoop` documents the env-var-auth-only / fresh-install case. - `TestClaudeExecuteIsolatesProvidesClaudeJSONFromHome` is the end-to-end MUL-2661 regression: a fake `$HOME` with the default split layout, `skills_local=ignore`, fake claude binary that prints whatever `.claude.json` reaches the scratch dir. Asserts the file rides through. Verified the test fails (with the documented MUL-2661 error message) when the new mirror call is removed. Verification: - `go test ./pkg/agent/...` green (full agent suite). - `GOOS=windows GOARCH=amd64 go vet ./pkg/agent/...` clean. Co-authored-by: multica-agent <github@multica.ai>	2026-05-26 13:50:35 +08:00
Bohan Jiang	960befa56f	feat(agent): per-agent toggle to isolate host-machine skills (MUL-2603) (#3200) * feat(agent): per-agent toggle to isolate host-machine skills (MUL-2603) Adds an agent-scoped `skills_local` switch ("ignore" default / "merge") so shared agents stop inheriting the operator's user-global Claude skill directory. A single broken local skill on one operator's machine was crashing the Claude CLI before it ever read stdin — the daemon saw a "broken pipe" with no recoverable signal (GitHub #3052). - DB: migration 108 adds `agent.skills_local` (NOT NULL DEFAULT 'ignore'), with sqlc CreateAgent/UpdateAgent updates and handler validation. - Claude runtime: when the agent is in "ignore" mode the backend points CLAUDE_CONFIG_DIR at an empty per-task scratch dir under the task cwd (fallback: OS temp), strips any inherited override, and cleans up after the run. Workspace skills under `{cwd}/.claude/skills/` still load. "merge" preserves the legacy inherit-from-machine behavior; Codex and other isolated backends are no-ops. - UI: new Skills toggle in the Create Agent dialog and the Agent → Skills tab, with EN/zh-Hans copy and SkillsLocalToggle shared between the two. - Tests: unit coverage for the new env helper, isolation dir lifecycle, full Claude execute paths (ignore + merge), and the handler tristate contract. Existing skills-tab test updated for the new copy. - Docs: updated `/skills` docs (EN + ZH) and added a 0.3.7 changelog entry in the landing-page i18n. Co-authored-by: multica-agent <github@multica.ai> * fix(agent): preserve claude login + validate skills_local input (MUL-2603) Address Elon's review on PR #3200: 1. Skill isolation no longer drops the operator's Claude login. The per-task scratch dir now mirrors every entry under `~/.claude/` as symlinks except `skills/`, so `.credentials.json`, settings, plugins, etc. reach the CLI exactly as on the host while the user-global skills directory stays hidden.
Without this, default `ignore` would have broken every Claude agent on a non-API-key host the moment migration 108 landed. 2. Internal CreateAgent callers (agent_template, onboarding_shim) now set `SkillsLocal: "ignore"`. The Go zero value was about to trip the migration-108 CHECK constraint and 500 template / onboarding agent creation. 3. Create / update handler validation no longer normalizes garbage to "ignore". The strict 400 path is now reachable on bad client input; the drift-safe `normalizeSkillsLocal` stays on the read side only. UI copy + docs clarified that the toggle is Claude-only; other runtimes ignore the setting. Verification: - `go test ./...` green (full suite locally). - `pnpm --filter @multica/views exec vitest run agents/components/tabs/skills-tab.test.tsx` green. - Handler DB-backed tests still skip locally without docker (same as Elon's run) — CI will validate the create / update paths against migration 108. Co-authored-by: multica-agent <github@multica.ai> * fix(agent): mirror effective claude config dir with windows fallback (MUL-2603) Address Elon's second-round review on PR #3200: 1. The per-task scratch dir now mirrors the effective host Claude config dir, not unconditionally `~/.claude/`. Precedence: agent `custom_env` CLAUDE_CONFIG_DIR > parent process env > `~/.claude/`. Without this, an operator who pinned Claude at a managed install (custom env CLAUDE_CONFIG_DIR) would get the wrong credentials in the scratch dir, because `buildClaudeEnv` strips that env before handing it to the child. We resolve the source up front and feed it to the mirror, so the override env still points at the right bytes. 2. Mirror entries now go through platform-aware linkers. On Windows without Developer Mode / admin, `os.Symlink` is denied, which previously left the scratch dir empty and broke Claude Code auth on default `ignore`. 
The new helpers try symlink first, then fall back to a directory junction (`mklink /J`) for dirs or a hardlink (same-volume content share) / copy for files. Mirrors the execenv/codex_home_link_windows.go pattern. 3. Tests: - `TestResolveHostClaudeConfigDir` locks in the custom_env > parent_env > `~/.claude` precedence. - `TestNewIsolatedClaudeConfigDirMirrorsCustomHostDir` confirms the scratch dir picks up `.credentials.json` from a synthetic custom host dir, proving the source resolution actually propagates into the mirror. - `TestNewIsolatedClaudeConfigDirEmptyHostIsNoop` documents the env-var-auth-only case (no host source ⇒ empty scratch dir). - `TestMirrorHostClaudeExceptSkillsWith_FallbackWhenSymlinkFails` exercises the Windows-no-Developer-Mode path via the new `mirrorHostClaudeExceptSkillsWith` seam, asserting credentials and sub-dir children still reach the scratch dir after the symlink stand-in fails. - `TestMirrorHostClaudeExceptSkillsWith_PropagatesFirstLinkError` confirms callers see the per-entry error when even fallback fails (so the warn-log fires on broken Windows installs). - `TestCopyFileRoundTrip` covers the last-resort copy fallback and its EXCL no-overwrite contract. - `TestClaudeExecuteIsolatesUsesCustomEnvSource` is the end-to-end check: an agent with custom_env CLAUDE_CONFIG_DIR reads its credentials from the pinned dir, not `~/.claude/`. 4. Docs: `apps/docs/content/docs/skills.{mdx,zh.mdx}` updated to describe the effective-source resolution and the Windows fallback chain so the docs match the runtime behaviour. Verification: - `go test ./...` green (full server suite locally, including `pkg/agent` 23 cases covering the new + existing isolation paths). - `GOOS=windows GOARCH=amd64 go vet ./pkg/agent/...` and `go test -c -o /dev/null` both compile clean, confirming the Windows-tagged linker file builds. 
Co-authored-by: multica-agent <github@multica.ai> * fix(agent): default skills_local to merge to preserve legacy behavior (MUL-2603) Per Bohan's product decision on PR #3200, the per-agent host-skill toggle defaults to "merge" — the pre-MUL-2603 inherit-from-machine behavior — so existing personal workflows that rely on locally installed Claude Skills keep working unchanged. Agent owners explicitly opt into "ignore" when they need to harden a shared agent against a broken local skill on one operator's machine (GitHub #3052). Also audited all 11 runtimes for user-global skill discovery paths and documented the scope of the toggle. Only Claude reads a user-global `~/.claude/skills/`; Codex isolates via `CODEX_HOME`, the ACP backends (Hermes / Kimi / Kiro) and the JSON-stream backends (Copilot / Cursor / Gemini / Pi / OpenCode / OpenClaw) anchor discovery to the task workdir and never read a user-global skill directory. UI copy and docs now say "for runtimes that support it (currently Claude Code)" everywhere so the scope is explicit. Changes: - Migration 108: column default flipped to 'merge'. - Handler CreateAgent: missing field → "merge"; explicit "ignore" / "merge" still validated, garbage still 400. - normalizeSkillsLocal: drift-safe coercion now lands on "merge" for anything that isn't the exact literal "ignore". - agent_template.go / onboarding_shim.go: internal CreateAgent callers send "merge" instead of "ignore" to match the new default. - Claude runtime (`claude.go`): isolate-mode gate flipped from `SkillsLocal != "merge"` to `SkillsLocal == "ignore"`, so "" (legacy daemons / older clients) and "merge" both walk `~/.claude/` directly. - Create Agent dialog + Skills tab: toggle defaults to on (merge); only duplicate of an explicit "ignore" agent carries through. The isolation opt-in is now `skills_local: "ignore"` when the user flips off; "merge" is omitted from the request body. - i18n (EN + zh-Hans): copy reframed — "On (default) — merged"; "Off — ignored. 
Recommended for shared agents". - Docs (`/skills`, `/guides/agents.zh`): describe new default and enumerate which runtimes act on the toggle. - Landing changelog 0.3.7: retitled "Per-Agent Local-Skill Toggle"; note the on-by-default behavior + off-to-isolate framing. - Tests: - `TestClaudeExecuteIsolatesHostSkillsWhenIgnoreOptedIn` replaces the old by-default isolation case (now requires explicit "ignore"). - New `TestClaudeExecuteDefaultModeKeepsHostConfigDir` locks in that default ExecOptions preserve the host CLAUDE_CONFIG_DIR. - `TestClaudeExecuteIsolatesUsesCustomEnvSource` now explicitly opts into "ignore" mode. - Handler tests: omitted → "merge"; explicit "ignore" round-trips; preserve-existing test seeds "ignore" and asserts "merge" flip-back. - `TestNormalizeSkillsLocal_DriftStaysSafe`: only literal "ignore" maps to ignore; everything else → "merge". - `skills-tab.test.tsx`: toggle ON by default; flip OFF when agent opted into "ignore". Intro-text matcher anchored to a more specific phrase so it no longer collides with the toggle hint copy. Verification: - `go test ./...` green (full server suite locally). - `GOOS=windows GOARCH=amd64 go vet ./pkg/agent/...` and `go test -c -o /dev/null` both compile clean (windows-tagged linker file still builds). - `pnpm typecheck` green across all packages and apps. - `pnpm --filter @multica/views test` 88 files / 771 tests green. - `pnpm --filter @multica/core test` 43 files / 390 tests green. - Handler DB-backed tests still skip locally without docker; CI will validate the create / update paths against migration 108. Co-authored-by: multica-agent <github@multica.ai> * chore(landing): drop 0.3.7 changelog entry from this PR (MUL-2603) The landing-page release notes belong in a separate release-prep PR, not in the feature PR. 
Co-authored-by: multica-agent <github@multica.ai> * fix(agent): propagate skills_local=ignore to codex user-skill seed (MUL-2603) Make the per-agent skills_local toggle real for Codex too, not just Claude. Previously the toggle was only consumed by the Claude backend, while the daemon's execenv layer always seeded Codex's per-task CODEX_HOME with the host machine's user-installed skills from ~/.codex/skills/. A shared Codex agent with skills_local=ignore could still inherit a broken local skill from one operator's machine. Now: PrepareParams/ReuseParams carry SkillsLocal; hydrateCodexSkills skips seedUserCodexSkills when SkillsLocal == "ignore" so the per-task CODEX_HOME exposes only workspace skills to the codex CLI. Default ("merge", or empty from older servers/clients) preserves existing inherit-from-machine behavior. UI / docs are updated to reflect the contract honestly: Claude Code and Codex honor the toggle; other runtimes (Hermes / Kimi / Kiro / Copilot / Cursor / Gemini / Pi / OpenCode / OpenClaw) leave $HOME untouched and discover user-level skills natively, so the toggle is a no-op for them today. New tests: TestPrepareCodexSkillsLocalIgnoreSkipsUserSeed, TestPrepareCodexSkillsLocalMergeSeedsUserSkills, and TestReuseCodexSkillsLocalIgnoreSkipsUserSeed cover Prepare(ignore), Prepare(merge), and the toggle-flip-on-reuse path. Co-authored-by: multica-agent <github@multica.ai> * docs(skills): scope skills_local toggle copy to Claude Code + Codex (MUL-2603) Off-state hint and Skills tab intro now explicitly call out Claude Code + Codex as the only runtimes that honor the toggle, with "other runtimes ignore this setting" wired into both states (en + zh-Hans), so users on non-Claude/Codex agents don't read "Off" as runtime-wide isolation. 
Docs (skills.mdx, skills.zh.mdx, guides/agents.zh.mdx) stop describing Hermes / Kimi / Gemini / Copilot / Cursor / Pi / OpenCode / OpenClaw / Kiro as having native user-level skill discovery; the daemon simply does not manage user-level skill discovery for those runtimes today, and the toggle is a no-op regardless of where it is set. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-26 13:26:33 +08:00
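A minimal Go sketch of the drift-safe `skills_local` coercion and the Codex seeding gate described above. It assumes both reduce to plain string comparisons. The constant names and `shouldSeedUserSkills` are hypothetical. Only the `"merge"` / `"ignore"` literals and the rule that "only the exact literal ignore opts out" come from the commits.

```go
package main

// Literals taken from the commit; the constant names are hypothetical.
const (
	skillsLocalMerge  = "merge"
	skillsLocalIgnore = "ignore"
)

// normalizeSkillsLocal is drift-safe: only the exact literal "ignore" opts
// out, while "", "merge" and any unexpected value (e.g. "IGNORE") land on
// the legacy inherit-from-machine default.
func normalizeSkillsLocal(v string) string {
	if v == skillsLocalIgnore {
		return skillsLocalIgnore
	}
	return skillsLocalMerge
}

// shouldSeedUserSkills reports whether host-installed skills (for example
// ~/.codex/skills/ copied into a per-task CODEX_HOME) should be exposed.
// Empty values from older servers/clients behave like "merge".
func shouldSeedUserSkills(skillsLocal string) bool {
	return normalizeSkillsLocal(skillsLocal) != skillsLocalIgnore
}
```

Routing every consumer (Claude's isolate gate, Codex's seed step) through one normalizer keeps the default identical everywhere.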
Wes	cfc652aa5f	fix(daemon): close stdin pipe in Pi adapter to deliver EOF (#2188) (#3118) Pi reads its prompt from argv (positional, see buildPiArgs) and never expects interactive input, so the Pi backend previously left cmd.Stdin nil. Under systemd, the resulting /dev/null character device has been observed not to satisfy Pi's readable-side wait, leaving runs stuck in "working" forever (#2188). Attach an explicit StdinPipe and close it immediately after Start so the child sees an EOF on a FIFO, matching the pattern already used by the Claude, Codex, Hermes, Kiro, and Kimi backends. The fix is defensive on the daemon side because Pi is mid-refactor and is not accepting issues upstream; once Pi itself stops blocking on stdin, this close is still correct (a closed pipe is a no-op for a process that does not read it). Test asserts the structural invariant: a shell-stub `pi` inspects /proc/self/fd/0 and only emits a valid event stream when stdin is a FIFO. If a future change drops the StdinPipe and stdin reverts to /dev/null (char device), the stub exits non-zero and the test fails.	2026-05-25 15:29:09 +08:00
Dmitry	5bc77f2953	fix(pi): strip leaked tool markup safely (#2956)	2026-05-22 15:46:10 +08:00
Bohan Jiang	a6f19380b2	test(agent): use ForkLock helper to fix ETXTBSY flake in thinking tests (#3062) Two thinking tests wrote fake CLI scripts via os.WriteFile and immediately execed them. Under t.Parallel() with the rest of pkg/agent, a sibling test's concurrent fork can inherit our still-open write fd, so Linux returns ETXTBSY at exec time (Go #22315). CI hit this on main as "TestRunCodexDebugModels_ArgvSeenByBinary: fork/exec ...: text file busy". Switch both call sites to the existing writeTestExecutable helper, which holds syscall.ForkLock across OpenFile→Write→Close so no concurrent fork can inherit the write fd. Same pattern the rest of the package already uses (kimi, kiro, codex, claude tests).	2026-05-22 14:53:56 +08:00
YOMXXX	ed2957ddf8	fix(claude): record result model usage (#2899)	2026-05-21 13:00:12 +08:00
iYuan	2f1f90c11a	fix(agent): retry codex semantic inactivity fresh (#2593)	2026-05-20 20:03:39 +08:00
YOMXXX	34f16e2c7a	fix(opencode): deny interactive questions in daemon mode (#2878) * fix(opencode): deny interactive questions in daemon mode * fix(opencode): avoid permission env ordering bypass	2026-05-20 17:17:31 +08:00
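The stdin change can be sketched in Go as follows. `runWithClosedStdin` is a hypothetical helper, not the Pi backend's code. It only shows the `StdinPipe` + close-after-`Start` pattern that gives the child an EOF on a FIFO instead of a `/dev/null` character device.

```go
package main

import (
	"bytes"
	"os/exec"
)

// runWithClosedStdin (hypothetical) attaches an explicit stdin pipe and
// closes it right after Start, so the child reads EOF from a FIFO rather
// than inheriting /dev/null via a nil cmd.Stdin.
func runWithClosedStdin(name string, args ...string) (string, error) {
	cmd := exec.Command(name, args...)
	stdin, err := cmd.StdinPipe()
	if err != nil {
		return "", err
	}
	var out bytes.Buffer
	cmd.Stdout = &out
	if err := cmd.Start(); err != nil {
		return "", err
	}
	// Closing immediately is harmless for a child that never reads stdin
	// and delivers EOF promptly to one that does. Wait would close the pipe
	// too, but only after the child exits, which a blocked reader never does.
	_ = stdin.Close()
	err = cmd.Wait()
	return out.String(), err
}
```

For example, `sh -c 'cat; echo done'` returns `"done\n"` right away because `cat` sees EOF as soon as it starts reading.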
Bohan Jiang	2bec2221d2	feat(agent): per-agent thinking_level for claude + codex (MUL-2339) (#2865 ) * feat(agent): persist thinking_level per agent (MUL-2339) Adds a nullable `thinking_level` column to the `agent` table so the backend can route a runtime-native reasoning/effort token (e.g. Claude's `xhigh`, Codex's `minimal`) through to the agent CLI on every dispatch. The column is intentionally TEXT rather than an enum — Claude and Codex publish overlapping but distinct vocabularies and we want the persisted value to round-trip exactly through whichever CLI receives it. NULL is the "use runtime default" sentinel that every downstream consumer reads as "do not inject --effort / reasoning_effort". This commit is just the storage layer (migration + sqlc); subsequent commits wire it through the API, daemon, and agent backends. Co-authored-by: multica-agent <github@multica.ai> * feat(agent-backend): inject reasoning effort for claude + codex (MUL-2339) Extends ExecOptions with a runtime-native ThinkingLevel string and wires it into the Claude and Codex backends. Discovery is driven by the local CLI so the daemon advertises whatever the host install supports rather than a hand-maintained list that goes stale. Per Elon's PR1 review: - Claude: parses `claude --help` to learn the `--effort` superset and projects through a per-model allow-list (xhigh is Opus-only; max is session-only on the smaller models). Falls back to a conservative static list when the binary is missing or help drift hides the line. - Codex: drives `codex debug models --output json` so per-model reasoning subsets and the documented default come directly from the CLI. The older config-error probe trick is gone — the JSON path is stable and doesn't pollute stderr with an intentional misconfig. - Cache key includes (provider, executablePath, cliVersion) so a CLI upgrade invalidates entries that referenced the older help / catalog. 
Per Trump's PR1 constraint, all three Codex injection points (thread/start.config, thread/resume.config, turn/start.effort) flow through one helper (`applyCodexReasoningEffort`) so they cannot drift independently. The shared `codexReasoningCases` fixture in `thinking_test.go` asserts the same value→{shape, key} contract at each site for every level the runtimes know about. Claude's `--effort` is also added to `claudeBlockedArgs` so a user custom_args entry can't silently outvote the daemon-injected value. Co-authored-by: multica-agent <github@multica.ai> * feat(api): wire thinking_level through API + daemon contract (MUL-2339) End-to-end plumbing for the per-agent reasoning/effort setting: - AgentResponse / TaskAgentData now carry `thinking_level`; the daemon's claim response includes it and the daemon's executor passes it through to agent.ExecOptions, where the Claude and Codex backends already know what to do with it. - ModelEntry on the runtime-models wire format gains a `thinking` block carrying `supported_levels` + `default_level` per model so the UI can render a runtime-aware picker without the server having to know about the local CLI install. `handleModelList` projects the agent-package catalog (including the new Thinking field) into the wire shape. - CreateAgent / UpdateAgent gate the field with a synchronous provider enum check (claude / codex only today). UpdateAgent is tri-state: field omitted = no change, "" = explicit clear (new `ClearAgentThinkingLevel` query, mirrors the existing mcp_config null pattern), non-empty = validate then set. Per Trump's PR1 review, the API NEVER auto-clears on a runtime/model swap and ALWAYS returns 400 on an unknown literal value — same shape across CreateAgent, UpdateAgent, and combined patches that move runtime + level in one request. Per-model combination failures (e.g. `xhigh` against a model that only supports up to `high`) surface as a daemon-side task error, not a silent server-side rewrite. 
TS types follow the same shape: `Agent.thinking_level`, `CreateAgentRequest`/`UpdateAgentRequest` add the field, `RuntimeModel` grows a `thinking` block. Older backends omit the field, which the front-end treats as "no picker for this model" — installed desktop builds keep working. Co-authored-by: multica-agent <github@multica.ai> * fix(agent): correct codex debug models argv + pin via runner test (MUL-2339) `codex debug models --output json` is rejected by codex-cli 0.131.0 — the subcommand emits JSON on stdout by default and has no `--output` flag. Drop the flag and add `--bundled` to skip the network refresh discovery doesn't need. Move the argv to a package-level var and add a test that runs a fake `codex` to assert the binary actually receives exactly `debug models --bundled`, so the contract can't silently drift on the next refactor. Also teach ValidateThinkingLevel to resolve an empty model to the provider's default model entry. Without this, every default-model task with a persisted thinking_level would be misjudged "unknown model" by the daemon guard. Co-authored-by: multica-agent <github@multica.ai> * fix(api): reject runtime switch that would leave invalid thinking_level (MUL-2339) A PATCH that changed `runtime_id` without touching `thinking_level` used to silently keep the existing value, so a Claude agent storing `max` could land on a Codex runtime where `max` is not a recognised token at all, and the daemon would receive a literal-invalid level. Hold the same "always 400 on literal-invalid, never silent coerce" rule on this implicit path. When runtime_id changes and the existing value is not in the new provider's enum, return 400 with the recovery options (clear via `thinking_level=""` or re-set in the same PATCH). Add coverage for both the kept-when-still-valid and the rejected cases, plus the two recovery paths (clear and replace). 
Co-authored-by: multica-agent <github@multica.ai> * fix(daemon): guard runTask with per-model thinking_level validator (MUL-2339) ValidateThinkingLevel existed but had no call site — `task.Agent. ThinkingLevel` flowed straight into ExecOptions, so `xhigh` configured on a non-Opus Claude model, or API-side stale values that escaped the provider enum gate, would be injected anyway. Run the validator before building ExecOptions. Invalid combinations log a warning and drop the level instead of failing the task: the agent still runs, just at the runtime's default reasoning effort. Discovery errors fail open (keep the level, let the CLI surface any objection) so a transient `claude --help` failure can't strand work. Empty model is forwarded as-is; the validator resolves it to the provider's default model internally per the cross-package contract. Co-authored-by: multica-agent <github@multica.ai> * chore(agent): drop stale `--output json` comments + unused scanner (MUL-2339) Codex CLI's `debug models` subcommand emits JSON without an `--output` flag, and `parseCodexDebugModels` never read from the bufio.Scanner. Sync the comments with the actual invocation and remove the dead init. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-20 12:30:10 +08:00
Joey Frasier (Boothe)	76cd8275ff	fix(openclaw): parse whole buffer instead of line-by-line scanner (MUL-1908) (#2292 ) * fix(openclaw): parse whole buffer instead of line-by-line scanner Follow-up to `c87d7676` (WOR-10). The stdout/stderr swap fixed the dominant case but `processOutput` still scanned line-by-line and only attempted a whole-buffer parse from a fragile fallback path. Pretty-printed JSON (openclaw 2026.5.x emits the result blob indented across many lines) made every individual line unparseable on its own — `{`, ` "payloads": [`, ` {`, etc. — so the success path hinged entirely on the fallback joining `rawLines` and re-trying. Under load (daemon restarts racing the close-on-cancel goroutine, partial chunked reads when stdout closes mid-flight) the line scanner could see truncated input that never reassembled into valid JSON, surfacing "openclaw returned no parseable output" against runs where the agent had in fact completed the work and posted comments. Roughly 30–40% of recent runs in v0.2.27 logs hit this path; multica still wrote a `task_failed` inbox row for each one even though the underlying issue had moved to `in_review` or `done`. The fix: - processOutput now reads the full stdout buffer with `io.ReadAll` first. - A new `parseWholeBufferOpenclawResult` helper attempts a single `json.Unmarshal` against the entire buffer (after trimming, and after optionally stripping leading non-JSON log lines). When it matches, we build the result and return — the line scanner never runs. - If the whole-buffer parse fails, we fall through to the existing NDJSON line-by-line scanner. This preserves streaming-event support (kept for forward compatibility and other backends) without leaving openclaw's dominant pretty-printed shape at the mercy of timing. - The failure path now emits a `(got N bytes; preview: ...)` suffix on the canonical "no parseable output" error so future debugging isn't blind. 
The exact canonical phrase is preserved for empty buffers so existing dashboards / log-grep tooling keep matching. Tests: - TestOpenclawProcessOutputWholeBufferPrettyJSON: feeds a hand-crafted multi-line indented blob (multiple payloads, nested agentMeta, usage map) and asserts every field round-trips through the whole-buffer fast path. - TestOpenclawProcessOutputDeeplyIndentedFixture: re-runs the recorded openclaw 2026.5.5 stdout fixture (1070 lines) directly through parseWholeBufferOpenclawResult, asserting the bug-shape parses cleanly on the first attempt without falling through to NDJSON scanning. - TestOpenclawProcessOutputEmptyBufferErrorIncludesByteCount: tightens the empty-buffer failure path, asserts the canonical phrase survives so observability tooling keeps working. All existing tests in the openclaw + buildOpenclawArgs suites stay green (streaming NDJSON event tests, lifecycle tests, structured-error tests, usage-field-variant tests). The two pre-existing flaky timeout-tight codex tests (TestCodexExecuteSemanticInactivityAllowsContinuous) fail on both this branch and on `c87d7676` baseline; they are unrelated and out of scope here. Co-authored-by: multica-agent <github@multica.ai> fix(openclaw): drop dead preview branch, document streaming regression Rebase + review-fix follow-up on top of f27df2d9b. processOutput's preview branch was unreachable: openclawNoParseableOutputError was only called from the `!gotEvents && trimmed == ""` path, which by construction means the entire scanned buffer collapsed to whitespace, so the `(got N bytes; preview: ...)` formatter could never fire on a non-empty buffer. Replace the helper with a single canonical-string constant (callsite is now inline) and update the test name to match what it actually asserts (the canonical empty-buffer error string is preserved for external log-grep / dashboard consumers). 
Also document on processOutput that the line-scanner path is no longer truly streaming after the io.ReadAll switch: events accumulate until stdout closes. OpenClaw 2026.5.x does not emit streaming events so this regression is invisible today, but flag it for the next backend that might. Misc: switch the scanner's input source from `strings.NewReader(string(buf))` to `bytes.NewReader(buf)` to drop one unnecessary byte/string round-trip. MUL-1908 Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai> Co-authored-by: J (Multica agent) <j@multica.local>	2026-05-19 17:42:41 +08:00
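A simplified Go sketch of the "whole buffer first, NDJSON second" strategy described above. The `openclawResult` shape is trimmed down to a single hypothetical `payloads` field. The NDJSON fallback returns the first line that carries payloads rather than accumulating events as the real scanner does. Only the canonical error string is taken from the commit.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"errors"
)

// openclawResult is a trimmed-down, hypothetical result shape.
type openclawResult struct {
	Payloads []json.RawMessage `json:"payloads"`
}

const noParseableOutput = "openclaw returned no parseable output"

// parseWholeBuffer tries the full stdout as one JSON document, first as-is
// and then after dropping leading non-JSON log lines before the first "{".
func parseWholeBuffer(buf []byte) (openclawResult, bool) {
	var res openclawResult
	trimmed := bytes.TrimSpace(buf)
	if json.Unmarshal(trimmed, &res) == nil {
		return res, true
	}
	if i := bytes.Index(trimmed, []byte("\n{")); i >= 0 {
		if json.Unmarshal(trimmed[i+1:], &res) == nil {
			return res, true
		}
	}
	return openclawResult{}, false
}

// processOutput prefers the whole-buffer fast path, so pretty-printed JSON
// no longer depends on line boundaries, and only then scans line by line.
func processOutput(buf []byte) (openclawResult, error) {
	if res, ok := parseWholeBuffer(buf); ok {
		return res, nil
	}
	sc := bufio.NewScanner(bytes.NewReader(buf))
	for sc.Scan() {
		var res openclawResult
		line := bytes.TrimSpace(sc.Bytes())
		if len(line) > 0 && json.Unmarshal(line, &res) == nil && res.Payloads != nil {
			return res, nil
		}
	}
	return openclawResult{}, errors.New(noParseableOutput)
}
```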
Bohan Jiang	9a577f3e11	fix(runtimes): anchor OpenCode skill + AGENTS.md discovery to task workdir (MUL-2416) (#2849 ) * fix(runtimes): anchor OpenCode skill + AGENTS.md discovery to task workdir OpenCode resolves its project discovery root from `--dir` and `PWD` before falling back to `process.cwd()`. The daemon set `cmd.Dir = workDir` but never overrode the inherited `PWD`, so OpenCode walked from the daemon's shell directory and silently bypassed the per-task workdir — agents lost visibility into `.opencode/skills/` and `AGENTS.md`, falling back to whatever global skills the host had installed (MUL-2416). - Pass `opencode run --dir <workDir>` and override `PWD=<workDir>` in the child env so AGENTS.md walk-up + `.opencode/skills` project config scan both anchor on the task workdir. - Block `--dir` from custom args so user overrides cannot re-introduce the regression. - Plumb skill `description` from DB through service / daemon / execenv. `writeSkillFiles` synthesizes a YAML frontmatter block (`name`, optional `description`) when the stored content lacks one, since runtimes like OpenCode silently drop SKILL.md files without a parseable `name`. Existing frontmatter is preserved unchanged so upstream-imported skills (GitHub / ClawHub / Skills.sh) keep their hand-shaped metadata. Tests: - New fake-CLI test confirms argv carries `--dir <workDir>` and the child sees `PWD=<workDir>`. - New test confirms a user-supplied `--dir` in custom_args is dropped. - New execenv tests cover synthesized frontmatter and preservation of pre-existing frontmatter. Co-authored-by: multica-agent <github@multica.ai> * fix(runtimes): inject SKILL.md `name` when upstream frontmatter omits it Skills imported with frontmatter that sets `description` but leaves `name` implicit (relying on the directory slug, as common in GitHub/Skills.sh imports) still hit OpenCode's "no parseable name → drop" path because the DB Name fallback never made it into the SKILL.md body. 
ensureSkillFrontmatter now scans the existing block and, when name is missing or empty, prepends `name: <slug>` while preserving description, body, and any runtime-specific keys verbatim. Also tighten yamlEscapeInline to always double-quote so descriptions that look like YAML keywords (`null`, `true`, `[foo]`, `{x: y}`, `2024-01-01`) parse as strings rather than getting reinterpreted and rejected. Adds regression test for the nameless-frontmatter case and updates the existing OpenCode skill test for the always-quoted description format. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-19 16:21:02 +08:00
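A rough Go sketch of the frontmatter handling described in this commit. The signatures are hypothetical. `yamlEscapeInline` is approximated with `strconv.Quote`, whose escapes are valid inside a YAML double-quoted scalar for ordinary text. One deliberate deviation: an existing empty `name:` line is dropped rather than left beside the injected key, to avoid a duplicate YAML key.

```go
package main

import (
	"strconv"
	"strings"
)

const fmFence = "---\n"

// yamlEscapeInline always double-quotes, so values like null, true, [foo]
// or 2024-01-01 are read back as strings. Approximation, not the real escaper.
func yamlEscapeInline(s string) string {
	return strconv.Quote(s)
}

// ensureSkillFrontmatter synthesizes a block when none exists, injects
// `name: <slug>` when an existing block lacks a usable name, and otherwise
// returns the content unchanged.
func ensureSkillFrontmatter(content, slug, description string) string {
	if !strings.HasPrefix(content, fmFence) {
		var b strings.Builder
		b.WriteString(fmFence)
		b.WriteString("name: " + yamlEscapeInline(slug) + "\n")
		if description != "" {
			b.WriteString("description: " + yamlEscapeInline(description) + "\n")
		}
		b.WriteString(fmFence)
		b.WriteString(content)
		return b.String()
	}
	rest := content[len(fmFence):]
	end := strings.Index(rest, "\n"+fmFence)
	if end < 0 {
		return content // no closing fence found: leave the file alone
	}
	kept := []string{}
	for _, line := range strings.Split(rest[:end], "\n") {
		if v, ok := strings.CutPrefix(line, "name:"); ok {
			if strings.Trim(strings.TrimSpace(v), `"'`) != "" {
				return content // a usable name exists: preserve verbatim
			}
			continue // drop the empty name so the injected key is unique
		}
		kept = append(kept, line) // description and runtime-specific keys
	}
	return fmFence + "name: " + yamlEscapeInline(slug) + "\n" + strings.Join(kept, "\n") + rest[end:]
}
```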
Bohan Jiang	e8d4b9a0a2	revert: drop exec_command watchdog (#2779, #2786) (MUL-2337) (#2803) * Revert "fix(codex): bump default exec_command stuck timeout to 3 minutes (#2786)" This reverts commit `433cd1aaf5`. Co-authored-by: multica-agent <github@multica.ai> * Revert "feat(codex): add per-exec_command watchdog to escape dropped function_call_output (MUL-2337) (#2779)" This reverts commit `60bae62622`. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-18 18:08:07 +08:00
Bohan Jiang	433cd1aaf5	fix(codex): bump default exec_command stuck timeout to 3 minutes (#2786) The watchdog fires on a "no progress" window, so the default mainly matters for commands that go fully silent (no outputDelta). Bumping from 2m → 3m leaves more headroom for legitimately slow silent commands before treating them as a dropped function_call_output, at a modest cost to recovery latency. MUL-2337 Co-authored-by: multica-agent <github@multica.ai>	2026-05-18 15:30:05 +08:00
Bohan Jiang	60bae62622	feat(codex): add per-exec_command watchdog to escape dropped function_call_output (MUL-2337) (#2779 ) * feat(codex): add per-exec_command watchdog to escape dropped function_call_output (MUL-2337) Codex app-server can drop the second function_call_output when two exec_command calls fan out in the same turn and both async-yield through the yield_time_ms boundary (observed 2026-05-18, MUL-2334 — Trump Agent wedged for 6+ min with no semantic activity events to drive any existing timer). The model then waits forever for the missing output; only the 10-minute semantic inactivity timeout would eventually rescue the run. Add a per-call watchdog in the codex client that tracks open exec_command / commandExecution items by call_id and fails the turn quickly (default 2 min, configurable via ExecOptions.ExecCommandStuckTimeout) when one stays open without progress. outputDelta events reset the per-call progress timestamp so long-running streaming commands aren't flagged. This is a daemon-side mitigation only — codex itself still has the upstream race, but the daemon no longer burns the full inactivity budget before the run is marked failed and a new run can recover. Co-authored-by: multica-agent <github@multica.ai> * feat(codex): track legacy exec_command_output_delta in watchdog (MUL-2337) Mirrors the raw v2 item/commandExecution/outputDelta refresh on the legacy codex/event protocol so a long-running streaming exec doesn't get falsely flagged as stuck after begin + 2 min. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-18 15:14:45 +08:00
Bohan Jiang	113c4f4e90	docs(agent): clarify openclaw agent id vs name semantics (#2744) Follow-up to #2716. Updates two stale comments that still described openclaw's `name` and `id` as interchangeable. The actual contract: `id` is the routing key passed to `openclaw agent --agent <id>`; `name` is a human display label and is not safe to pass to the CLI. No behavior change. Co-authored-by: multica-agent <github@multica.ai>	2026-05-17 17:20:41 +08:00
Kagura	44d2fc1946	fix(agent): use openclaw agent id instead of name for --agent flag (#2716) openclawEntriesToModels() used the agent Name (which may contain spaces, e.g. "Sub2API OPS") as Model.ID. This ID is passed to openclaw via --agent, where normalizeAgentId mangles spaces into hyphens ("sub2api-ops"), causing a lookup miss against the registered id ("sub2api") and a "no parseable output" error. Fix: prefer agent ID for Model.ID; use Name only for display Label. When ID is empty, fall back to Name for backward compatibility. Fixes #2714	2026-05-17 17:08:00 +08:00
Bohan Jiang	8d872b7521	fix(daemon): disable Claude AskUserQuestion in non-interactive mode (MUL-2244) (#2656 ) * fix(daemon): disable Claude AskUserQuestion in non-interactive mode (MUL-2244) GitHub #2588: when Claude Code calls its built-in AskUserQuestion tool inside the daemon's stream-json runtime, the question never reaches the user — there's no UI to render it — so the SDK returns an empty answer and the agent silently "infers" and continues. From the issue's perspective, execution looks stuck while the agent is actually charging ahead on its own guess. Two-part fix: - `buildClaudeArgs` now passes `--disallowedTools AskUserQuestion` so the tool is not exposed to the model at all. - The Claude-specific runtime brief tells the agent to use a `blocked` issue comment for genuine clarification, or to state an explicit assumption and proceed. Adds a regression test that pins both: AskUserQuestion is forbidden in CLAUDE.md and is NOT mentioned in the AGENTS.md emitted for non-Claude providers (the tool is Claude-specific). Co-authored-by: multica-agent <github@multica.ai> * refactor(daemon): drop CLAUDE.md AskUserQuestion guidance, rely on --disallowedTools The --disallowedTools flag already prevents Claude from invoking AskUserQuestion, so duplicating the rule in the runtime brief just bloats the prompt without changing behavior. Removes the section and its regression test; the argv-level test in pkg/agent already pins the flag. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-15 12:42:23 +08:00
fr00st	cc9fbd3db0	Fix stale Done replies on comment follow-ups (#2495) * fix: avoid stale done replies on comment follow-ups * fix: avoid inlining runtime brief for Hermes ACP * fix: address comment follow-up review feedback	2026-05-14 12:00:04 +08:00
Bohan Jiang	5db96b4007	fix(daemon): bypass Gemini folder-trust gate in headless mode (#2516) (#2523) Gemini CLI's folder-trust feature throws FatalUntrustedWorkspaceError (exit code 55) when the current workspace isn't in `~/.gemini/trustedFolders.json` and the process is headless — no interactive trust prompt is available. The daemon spawns gemini with `-p` + `--yolo` in a freshly checked-out worktree that the user has never trusted interactively, so every run with `security.folderTrust` enabled fails after ~10s with exit status 55 and no useful output. Default `GEMINI_CLI_TRUST_WORKSPACE=true` on the child env to short-circuit `checkPathTrust` in gemini-core. This mirrors gemini-cli's documented `--skip-trust` flag; the env var has been gemini's documented headless escape hatch for the entire folder-trust feature lifetime, so the fix works on every gemini version that can produce the crash. Callers that explicitly set the same key in cfg.Env win, preserving the ability to opt back into the gate. Co-authored-by: multica-agent <github@multica.ai>	2026-05-13 17:05:12 +08:00
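The "default on, caller wins" environment rule can be sketched as a simple overlay. `geminiChildEnv` is a hypothetical helper. The inherited base environment that a real spawn would also include is omitted here.

```go
package main

import "sort"

// geminiChildEnv overlays caller-supplied variables on a default of
// GEMINI_CLI_TRUST_WORKSPACE=true, so an explicit caller value wins.
func geminiChildEnv(callerEnv map[string]string) []string {
	env := map[string]string{"GEMINI_CLI_TRUST_WORKSPACE": "true"}
	for k, v := range callerEnv {
		env[k] = v // caller-set keys override the default
	}
	keys := make([]string, 0, len(env))
	for k := range env {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic KEY=VALUE order for exec.Cmd.Env
	out := make([]string, 0, len(keys))
	for _, k := range keys {
		out = append(out, k+"="+env[k])
	}
	return out
}
```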
Bohan Jiang	178cfb5008	fix(daemon): strip Windows chcp noise from runtime version (#2516) (#2521) The gemini CLI's Windows shim emits `Active code page: 65001` (from `chcp`) on stdout ahead of the real version in its `--version` output. The daemon stored the raw concatenation as the runtime version, so the runtime detail page rendered `Active code page: 65001 0.42.0` instead of `0.42.0`. Scan `<cli> --version` line by line and return the first line carrying a semver-shaped token. Full strings like `2.1.5 (Claude Code)` or `codex-cli 0.118.0` survive unchanged; unparseable output falls back to the trimmed raw value. Co-authored-by: multica-agent <github@multica.ai>	2026-05-13 16:58:14 +08:00
Kagura	702c48209b	fix(agent): stop filtering Pi extension tools via hardcoded --tools allowlist (#2379) (#2381) The Pi backend hardcoded `--tools read,bash,edit,write,grep,find,ls` in buildPiArgs. Pi's SDK treats --tools as a restrictive allowlist: only the listed tools pass through `_refreshToolRegistry()`, silently filtering out any user-installed extension tools registered via `pi.registerTool()`. Omitting --tools makes Pi's `allowedToolNames` undefined, so the `isAllowedTool()` filter becomes a no-op and all tools — built-in and extension — are available. This matches Pi's standalone behavior. Users who want to restrict tools can still pass --tools via custom_args (it is not in piBlockedArgs). Closes #2379	2026-05-11 16:11:32 +08:00
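A minimal sketch of the version-line scan. The exact "semver-shaped" pattern the daemon uses is not stated, so the regular expression below (`MAJOR.MINOR.PATCH`, optional `v`) is an assumption, and `parseVersionOutput` is a hypothetical name.

```go
package main

import (
	"regexp"
	"strings"
)

// semverish is an assumed approximation of a "semver-shaped token".
var semverish = regexp.MustCompile(`\bv?\d+\.\d+\.\d+\b`)

// parseVersionOutput returns the first line of `<cli> --version` output
// that carries a semver-like token, falling back to the trimmed raw value.
func parseVersionOutput(raw string) string {
	for _, line := range strings.Split(raw, "\n") {
		line = strings.TrimSpace(line) // also strips the \r of Windows line endings
		if semverish.MatchString(line) {
			return line
		}
	}
	return strings.TrimSpace(raw)
}
```

Returning the whole matching line, not just the token, is what lets strings like `2.1.5 (Claude Code)` survive unchanged.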
Multica Eve	e79ffc0f01	fix(agent): expand Copilot CLI model catalog with correct dotted IDs (#2336 ) * fix(agent): expand Copilot CLI model catalog with correct dotted IDs The Copilot CLI provider only exposed two models in the runtime dropdown, and one of them used the dashed legacy form `claude-sonnet-4-6` which `copilot --model` rejects with "Model ... is not available". The CLI accepts dotted IDs (e.g. `claude-sonnet-4.6`, `gpt-5.4`). Sync `copilotStaticModels()` with the official supported-models catalog so the dropdown surfaces the full set the user's account can route to (8 OpenAI + 4 Anthropic), and add a regression test that pins the expected IDs and bans the dashed form. Closes MUL-1948. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: multica-agent <github@multica.ai> * feat(agent): dynamic Copilot model discovery via ACP session/new The previous static catalog could only ever lag behind the user's real entitlements and what GitHub ships. Copilot CLI exposes the live catalog through its ACP server (`copilot --acp`): the `session/new` response includes `models.availableModels` plus `currentModelId`, scoped to the authenticated account. Wire copilot through the existing discoverACPModels helper — already used by hermes/kimi/kiro — so the dropdown reflects the account's real catalog, including the `auto` entry and per-tier model availability (Pro / Pro+ / Enterprise / evaluation models). The Copilot CLI puts itself into ACP server mode via the `--acp` flag instead of an `acp` subcommand, so acpDiscoveryProvider now takes an optional acpArgs override. Copilot's ACP payload omits the vendor name, so a small prefix-based inferCopilotProvider keeps the UI's openai / anthropic / google grouping working. When the binary is missing or auth fails, fall back to copilotStaticModels() so self-hosted runtimes without a copilot install still see a populated dropdown. 
Verified against `copilot 1.0.44`: live discovery returns 13 models with gpt-5.5 marked Default. Closes MUL-1948. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: multica-agent <github@multica.ai> * fix(agent): drop no-op COPILOT_ALLOW_ALL env and generalize OpenAI o-series prefix check - discoverCopilotModels: remove COPILOT_ALLOW_ALL=1 (not a real Copilot CLI env var; copy-pasta from HERMES_YOLO_MODE=1). Discovery only drives initialize + session/new which never trigger tool-permission prompts, so no extra env is needed. - inferCopilotProvider: replace the o1/o3/o4 prefix chain with a generic o<digit>+ check via isOpenAIReasoningSeriesID, so future o5/o6/… reasoning models are tagged as openai automatically. Guards against false positives like 'opus-…' or bare 'o'. - Extend TestInferCopilotProvider with o5/o6 forward-compat cases and negative cases (opus-fake, omni, o). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: Eve <eve@multica-ai.local> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: multica-agent <github@multica.ai>	2026-05-11 14:36:43 +08:00
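A sketch of the generic `o<digit>+` check and a prefix-based provider guess. Only the o-series rule and its negative cases (`opus-…`, `omni`, bare `o`) come from the commit. Requiring end-of-string or `-` after the digits, and the `gpt-` / `claude-` / `gemini-` prefixes, are assumptions for illustration.

```go
package main

import "strings"

// isOpenAIReasoningSeriesID reports whether id looks like o1, o3-mini, o5...
// i.e. "o" followed by one or more digits, then end of string or "-".
func isOpenAIReasoningSeriesID(id string) bool {
	id = strings.ToLower(id)
	if len(id) < 2 || id[0] != 'o' {
		return false
	}
	i := 1
	for i < len(id) && id[i] >= '0' && id[i] <= '9' {
		i++
	}
	if i == 1 {
		return false // "o" followed by a non-digit, e.g. "omni" or "opus-..."
	}
	return i == len(id) || id[i] == '-'
}

// inferCopilotProvider groups model IDs for the UI; prefixes are assumed.
func inferCopilotProvider(modelID string) string {
	id := strings.ToLower(modelID)
	switch {
	case strings.HasPrefix(id, "gpt-"), isOpenAIReasoningSeriesID(id):
		return "openai"
	case strings.HasPrefix(id, "claude-"):
		return "anthropic"
	case strings.HasPrefix(id, "gemini-"):
		return "google"
	default:
		return ""
	}
}
```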
Multica Eve	72e89a74f3	fix: surface copilot failure details (#2396) Co-authored-by: Eve <eve@multica-ai.local> Co-authored-by: multica-agent <github@multica.ai>	2026-05-11 14:08:33 +08:00
Bohan Jiang	b73a301bf9	fix(agent): drain stderr before deciding ACP failure promotion (#2333 ) `hermes`, `kimi`, and `kiro` all wired stderr through `cmd.Stderr = io.MultiWriter(logWriter, providerErrSniffer)`. The OS-pipe → MultiWriter copy goroutine that exec spawns for that form is only joined by `cmd.Wait()`, which the lifecycle goroutine fires in deferred cleanup — after `promoteACPResultOnProviderError` already consulted the sniffer. When stopReason=end_turn (success) raced ahead of the stderr drain, the sniffer's `lines` slice was empty, the helper fell through to the synthetic agent-text fallback ("hermes provider error: API call failed after 3 retries"), and the actionable upstream signal (HTTP 429 / usage limit) was lost. This was visible as a flaky `TestHermesBackendPromotesProviderErrorWithNonEmptyOutput` in CI under high parallelism — a real prod bug, not a test issue: live runs hit the same race when an upstream LLM returns 429 and hermes' synthetic agent turn beats the stderr drain to the parent. Replace the MultiWriter wiring with `cmd.StderrPipe()` + an explicit copier goroutine that signals on `stderrDone`. The lifecycle goroutine already awaits `<-readerDone` for stdout; add `<-stderrDone` next to it before `promoteACPResultOnProviderError` runs. The deferred `cmd.Wait()` ordering is unchanged — it just becomes a cheap reap by the time it fires. Verified: `go test ./pkg/agent/ -run "TestHermes|TestKimi|TestKiro" -count=10 -race`, then full package `-count=3 -race`, all green. Co-authored-by: multica-agent <github@multica.ai>	2026-05-09 17:34:25 +08:00
LinYushen	f70105fb12	fix(agent): include JSON-RPC error data field in ACP error messages (#2327 ) ACP backends (Kiro, Hermes, Kimi) put the actionable reason for code=-32603 'Internal error' in the JSON-RPC `data` field, e.g. "No session found with id". The wrapped Go error only carried `code` and `message`, leaving operators staring at a bare "kiro session/prompt failed: session/prompt: Internal error (code=-32603)" with no way to tell apart session expiry, model unavailability, lost auth, or quota. Parse `data` too. Strings render unquoted; objects/arrays render as raw JSON; null/missing keeps the previous format unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-09 16:19:57 +08:00
Bohan Jiang	c57546159d	fix(daemon): mark provider 429 / out-of-credit agent runs as failed, not completed (#2323 ) * fix(daemon): mark provider 429 / out-of-credit runs as failed, not completed Two bugs combined to silently report failed agent runs as "Completed" in the UI when the upstream LLM returned a 4xx (e.g. HTTP 429 rate-limit / no credit on the account). 1. ACP backends (hermes, kimi, kiro) only promoted the run status to "failed" when their stderr sniffer fired AND the agent output buffer was empty. But hermes injects a synthetic agent text turn ("API call failed after 3 retries: HTTP 429...") on retry exhaustion, so the buffer was never empty in the rate-limit case and the promotion never ran. Drop the empty-output precondition: the sniffer's regex (HTTP-status markers, named error types) is specific enough to trust on its own. 2. The daemon's task-result switch only routed "blocked" through FailTask; every other status — including "cancelled", and any future status we forget to enumerate — fell through to CompleteTask. Invert it so only an explicit "completed" status reports success, and extract the switch into reportTaskResult for direct testing. Cancelled now defaults to failure_reason "cancelled" instead of being silently completed. Closes GitHub multica#1952. Co-authored-by: multica-agent <github@multica.ai> * fix(agent): only promote ACP run to failed on terminal provider error Address GPT-Boy's review on the multica#1952 fix. The previous promotion rule ("any sniffer line → fail") was too broad: the existing sniffer also captures transient per-attempt warnings ("API call failed (attempt 1/3): RateLimitError [HTTP 429]"), and those lines stay in the buffer for the rest of the run. A retry sequence whose first attempt blipped but whose third attempt succeeded would have been wrongly reported as failed. 
Tighten the criteria with two additional signals, both defined on the existing acpProviderErrorSniffer / output buffer: - acpTerminalErrorRe — sticky `terminal` flag set when stderr shows an exhausted/non-retryable marker (❌, [ERROR], "after N retries", Non-retryable, BadRequestError, AuthenticationError). Per-attempt warnings deliberately don't match. - acpAgentOutputTerminalRe — matches the synthetic "API call failed after N retries..." turn that hermes-style adapters inject into the agent text stream when they give up; this catches multica#1952 even if hermes' stderr only logged transient attempts. Promotion logic becomes a shared helper, promoteACPResultOnProviderError, called from hermes / kimi / kiro. Promotes when (a) terminalMessage is non-empty, (b) output contains the synthetic give-up turn, or (c) output is empty and the sniffer captured anything at all (preserves the original empty-output safety net for transient-only sequences with no real result to fall back on). Tests: - TestHermesProviderErrorSnifferTerminalVsTransient — transient attempt 1/3 alone returns terminalMessage="" but message!=""; a follow-on terminal marker flips terminal on. - TestHermesProviderErrorSnifferTerminalNonRetryable — confirms BadRequest / Authentication / Non-retryable / ❌ / [ERROR] are classified terminal even on the very first attempt. - TestHermesBackendDoesNotPromoteOnTransientRetry — fake hermes emits attempt 1/3 to stderr then a normal agent text turn and end_turn; resulting Status must stay "completed". Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-09 16:13:12 +08:00
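The three promotion conditions and the inverted daemon switch reduce to two small predicates. This is a hedged sketch: `giveUpTurnRe` only approximates `acpAgentOutputTerminalRe`, the parameter names stand in for fields on the real sniffer and output buffer, and `classifyTaskResult` is a hypothetical stand-in for `reportTaskResult`.

```go
package main

import "regexp"

// giveUpTurnRe approximates the synthetic "API call failed after N
// retries" turn; the real pattern may differ.
var giveUpTurnRe = regexp.MustCompile(`API call failed after \d+ retries`)

// shouldPromoteToFailed applies rules (a)-(c) from the commit message.
func shouldPromoteToFailed(terminalMessage string, snifferCapturedAny bool, output string) bool {
	switch {
	case terminalMessage != "":
		return true // (a) exhausted or non-retryable marker seen on stderr
	case giveUpTurnRe.MatchString(output):
		return true // (b) adapter injected its give-up turn into agent text
	case output == "" && snifferCapturedAny:
		return true // (c) no real result to fall back on
	default:
		return false // e.g. a transient attempt 1/3 followed by a real answer
	}
}

// classifyTaskResult applies the inverted switch: only an explicit
// "completed" is success; "cancelled" and any unenumerated status fail.
// Defaulting the failure reason to the status string is an assumption.
func classifyTaskResult(status, reason string) (ok bool, failureReason string) {
	if status == "completed" {
		return true, ""
	}
	if reason == "" {
		reason = status
	}
	return false, reason
}
```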
Bohan Jiang	0eb23df234	fix(agent): scope pi colon-to-slash normalization to legacy format (#2309 ) PR #2281 added table-format support to parsePiModels but kept the unconditional `strings.Replace(":", "/", 1)`, which would silently rewrite a `:` inside a model name read from column 1 of the table output (e.g. `claude-sonnet-4-6:exp` would become `claude-sonnet-4-6/exp`). Move the replace into the legacy `provider:model` branch so only the colon-as-separator case is normalized, and restore a short doc comment describing the dual- format contract. Test extended with a colon-bearing table row. Co-authored-by: multica-agent <github@multica.ai>	2026-05-09 13:56:49 +08:00
Leonardo Diego	8d5a6138fe	fix: parse pi --list-models table format for model discovery (#2281 ) The pi CLI changed its --list-models output from a single-field 'provider:model' format to a multi-column table with separate 'provider' and 'model' columns. The existing parser only looked at the first whitespace-delimited field (the provider name) and skipped lines without ':' or '/' — discarding every model entry. Update parsePiModels to handle both formats: - New table format: combine fields[0] (provider) + fields[1] (model) - Legacy format: single field with ':' or '/' separator Add regression test for the table format using real pi output.	2026-05-09 13:51:32 +08:00
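Here is a sketch of the dual-format parser after both fixes. It assumes that any line with two or more whitespace-separated fields is a table row and that the table header starts with `provider`; the real header text is not shown in these commits. Only the legacy branch rewrites the colon to a slash, so a `:` inside a table model name survives.

```go
package main

import "strings"

// parsePiModels accepts both `pi --list-models` output shapes:
//   table:  "<provider>  <model>  ..."      -> "provider/model"
//   legacy: "provider:model" or "provider/model" -> "provider/model"
func parsePiModels(out string) []string {
	var models []string
	for _, line := range strings.Split(out, "\n") {
		fields := strings.Fields(line)
		switch {
		case len(fields) == 0:
			continue
		case len(fields) >= 2:
			if strings.EqualFold(fields[0], "provider") {
				continue // assumed header row
			}
			// Table format: keep the model column verbatim, colons included.
			models = append(models, fields[0]+"/"+fields[1])
		case strings.ContainsAny(fields[0], ":/"):
			// Legacy format: only here is ':' a provider separator.
			models = append(models, strings.Replace(fields[0], ":", "/", 1))
		}
	}
	return models
}
```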
Bohan Jiang	0af67c8159	fix(agent/openclaw): block tasks if openclaw < 2026.5.5 with upgrade hint (#2181 ) PR #2101 swapped the openclaw runtime adapter from reading --json on stderr to stdout. That fixed openclaw 2026.5+ but inverted the breakage for pre-2026.5 builds — those still write JSON to stderr, so the adapter now sees an empty stdout and falls through to the same "openclaw returned no parseable output" failure that 2026.5+ users saw before #2101. Add a per-task version gate inside openclawBackend.Execute that runs `openclaw --version`, parses the dotted version, and rejects anything below 2026.5.5 with a hardcoded upgrade hint: openclaw <detected> is below the minimum supported version 2026.5.5. Run `openclaw update` to upgrade and try again. The check is intentionally per-task and uncached so users who upgrade do not need to restart the daemon — the next task automatically re-checks. ~20ms per task is negligible vs. the typical run. Co-authored-by: multica-agent <github@multica.ai>	2026-05-07 02:11:47 +08:00
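Here is a sketch of the per-task gate. It assumes that `openclaw --version` prints a dotted numeric version somewhere in its output. The regex, the helper names, and the zero-padding of missing components are illustrative choices. Because the check runs on every task with no cache, an upgraded binary passes on the next run.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// minOpenclawVersion is the floor named in the commit message.
var minOpenclawVersion = []int{2026, 5, 5}

// dottedVersionRe finds the first dotted numeric token, such as "2026.5.5".
var dottedVersionRe = regexp.MustCompile(`\d+(?:\.\d+)+`)

// parseDottedVersion returns the version token and its numeric parts.
func parseDottedVersion(out string) (string, []int, error) {
	tok := dottedVersionRe.FindString(out)
	if tok == "" {
		return "", nil, fmt.Errorf("no version found in %q", out)
	}
	var parts []int
	for _, p := range strings.Split(tok, ".") {
		n, err := strconv.Atoi(p)
		if err != nil {
			return "", nil, err
		}
		parts = append(parts, n)
	}
	return tok, parts, nil
}

// versionLess reports a < b, treating missing trailing parts as 0.
func versionLess(a, b []int) bool {
	for i := 0; i < len(a) || i < len(b); i++ {
		var x, y int
		if i < len(a) {
			x = a[i]
		}
		if i < len(b) {
			y = b[i]
		}
		if x != y {
			return x < y
		}
	}
	return false
}

// checkOpenclawVersion gates a task on the `openclaw --version` output.
func checkOpenclawVersion(versionOutput string) error {
	tok, parts, err := parseDottedVersion(versionOutput)
	if err != nil {
		return err
	}
	if versionLess(parts, minOpenclawVersion) {
		return fmt.Errorf("openclaw %s is below the minimum supported version 2026.5.5. Run `openclaw update` to upgrade and try again.", tok)
	}
	return nil
}
```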
Joey Frasier (Boothe)	af971e1e5c	fix(agent/openclaw): read --json from stdout, not stderr (#2101 ) Multica's openclaw runtime adapter has been reading agent output from stderr since the early openclaw integration days. Current openclaw (2026.5.5, c37871e) writes its --json blob exclusively to stdout: $ openclaw agent --local --json --agent main --message 'say hi' >stdout 2>stderr STDOUT bytes: 27401 STDERR bytes: 0 Result: every successful turn was followed by a daemon-generated system comment 'openclaw returned no parseable output', visible to users, looked like the agent broke when it didn't. Reproduced live on WOR-2, turn at 2026-05-05 16:35 UTC; daemon log confirmed the full result JSON arrived on the [openclaw:stdout] debug channel and was discarded while the empty stderr pipe hit the no-events fallback. Changes - server/pkg/agent/openclaw.go: swap pipes, StdoutPipe() for the JSON stream, cmd.Stderr = newLogWriter(...) for log overflow. Cleanup goroutine now closes stdout on cancel. Comments and the read-error errMsg updated to reflect the new pipe. - server/pkg/agent/openclaw_test.go: TestOpenclawProcessOutputReadError asserts on 'read stdout' (was 'read stderr'), string-only fix, no behavior change. New TestOpenclawProcessOutputStdoutFixture feeds a recorded openclaw 2026.5.5 --json blob through processOutput and asserts result + messages parse cleanly. - server/pkg/agent/testdata/openclaw-2026.5.5-stdout.json: 27401-byte fixture captured fresh from the openclaw CLI for the regression test. Side effects (net positive) - Log lines openclaw writes to stderr (security warnings, tool errors) now show up under [openclaw:stderr] instead of being silently consumed by the JSON parser. - Daemon's success_pattern heuristic (empty-output -> 'blocked') becomes meaningful again because result.Output actually populates. Closes WOR-10.	2026-05-07 01:50:16 +08:00
prellr	ee10c508fb	fix(daemon): trust the agent's session id from session/resume across ACP backends (#2070 ) When the local state.db of an ACP backend (hermes, kimi, kiro) is wiped — crash, config change, manual kill, container reset — the backend's session/resume (or session/load, in kiro's case) silently creates a brand-new session rather than failing, and returns the new id in the response. Today the daemon ignores the response and stamps sessionID = opts.ResumeSessionID across all three backends, so every subsequent session/prompt is addressed to a session id the backend has no record of. The task fails with JSON-RPC -32603 (Internal error) on the very first turn, with no operator-visible signal that the problem is a session-id mismatch one layer down. The behavior is invisible: agent shows "started", then "failed" with a generic Internal error. Reproducing in production took repeated runs because nothing in the logs pointed at the silent reset. Fix: route all three ACP backends through a small `resolveResumedSessionID` helper that: - prefers the id the backend returned in its response (the canonical id; the one the backend will accept on the next call) - falls back to the requested id when the response is malformed, empty, or omits sessionId — defensive fallback so older / non-conforming backends (notably kiro's current session/load shape) behave identically to today - signals (via a bool) when the id changed, so the caller logs a Warn with `backend=<hermes|kimi|kiro>` and operators can grep for silent state resets to correlate them with task failures Why this is at the backend layer rather than the daemon's existing session-resume fallback: server/internal/daemon/daemon.go:1554-1566 already retries with a fresh session when resume fails, but it gates on `result.Status == "failed" && result.SessionID == ""`.
The backend WILL hand back a result.SessionID — just the new one it silently committed to — so the daemon-level fallback never fires for this failure mode. The helper is also what session/new already uses (extractACPSessionID, documented in code as "Shared by all ACP backends"). session/new extracts the canonical id from the response; session/resume just didn't, until now. Coverage: - hermes.go: confirmed bug, root cause of -32603 in production - kimi.go: same code shape, same protocol method, same response schema as hermes (per extractACPSessionID's comment) — same bug - kiro.go: same code shape, different method (session/load). Current observed response doesn't include sessionId, so the defensive fallback means today's behavior is preserved. Routing through the same helper means a future kiro release that DOES return a sessionId on silent reset works the same way as hermes/kimi without another diff. Tests (server/pkg/agent/hermes_test.go — helper covers all three backends, no per-backend duplication): - TestResolveResumedSessionIDMatching — backend confirms requested id - TestResolveResumedSessionIDDifferent — backend returned a new id; caller is told to switch - TestResolveResumedSessionIDEmptyResponse — older / malformed body; defensive fallback to requested id (covers kiro's current shape) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 14:15:40 +08:00
Jiayuan Zhang	1476c268dd	refactor(quick-create): exempt git-describe daemons from CLI gate (#2108 ) * refactor(quick-create): remove daemon CLI version gate Local-source daemons report dev-suffixed versions (e.g. v0.2.15-235-gdaf0e935) that the picker pre-check and server gate both treat as too old, blocking quick-create during local testing. Drops the gate end-to-end: removes MinQuickCreateCLIVersion + CheckMinCLIVersion in pkg/agent, the checkQuickCreateDaemonVersion handler and readRuntimeCLIVersion helper in handler/issue.go, and the mirrored cli-version.ts plus the modal's pre-check, blocked-state UI, and daemon_version_unsupported error branch. Co-authored-by: multica-agent <github@multica.ai> * refactor(quick-create): skip daemon CLI version gate in dev Restores the gate (reverts the full-removal commit) and bypasses it in non-production environments instead. The motivation for the original removal — local source-built daemons report a `git describe` version like v0.2.15-N-gHASH that parses below 0.2.20 and blocks dev testing — is now handled by checking APP_ENV on the server and NODE_ENV on the client. Production keeps the original "needs upgrade" UX. Co-authored-by: multica-agent <github@multica.ai> * refactor(quick-create): exempt git-describe daemons instead of env bypass Replaces the per-environment bypass added in the previous commit with a shared daemon-version signal. CheckMinCLIVersion / checkQuickCreateCliVersion now treat any daemon whose CLI version matches the `vX.Y.Z-N-gHASH[-dirty]` git-describe shape as OK; tagged releases keep going through the normal min-version comparison. Why: Emacs flagged that (a) NODE_ENV !== "production" also disables the gate on staging and other non-prod deployments, undoing the protection for the case the gate was originally written for, and (b) NODE_ENV (web client) and APP_ENV (server) are not equivalent, so the modal pre-check and server gate could disagree on the same request. 
Both go away when the signal is intrinsic to the daemon's version string. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-06 09:00:11 +08:00
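The git-describe shape `vX.Y.Z-N-gHASH[-dirty]` maps naturally onto an anchored regular expression. The pattern and function name below are assumptions. In particular, the pattern accepts only lowercase hex in the hash, which is what `git describe` emits by default.

```go
package main

import "regexp"

// gitDescribeRe matches versions such as v0.2.15-235-gdaf0e935 and
// v0.2.15-235-gdaf0e935-dirty; tagged releases like v0.2.20 do not match.
var gitDescribeRe = regexp.MustCompile(`^v\d+\.\d+\.\d+-\d+-g[0-9a-f]+(-dirty)?$`)

// isGitDescribeVersion reports whether a daemon's CLI version came from a
// source build and should therefore skip the min-version comparison.
func isGitDescribeVersion(v string) bool {
	return gitDescribeRe.MatchString(v)
}
```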
Kagura	629f4136ac	fix(codex): handle MCP elicitation server requests correctly (#1944 ) * fix(codex): handle MCP elicitation server requests correctly Fixes #1942. handleServerRequest responded with {} to unrecognized Codex server requests including mcpServer/elicitation/request. Codex 0.125+ expects {action, content, _meta} for elicitation — the empty object causes a deserialization error and the MCP tool call is reported as user-rejected. Changes: - Add mcpServer/elicitation/request case with correct response schema - Add respondError helper for JSON-RPC error responses - Return proper JSON-RPC method-not-found error for unknown server requests instead of silent empty object - Add tests for MCP elicitation and unknown method handling * fix: use cfg.Logger instead of global slog in codex handleServerRequest Switch the unhandled-server-request warning from global slog.Warn to c.cfg.Logger.Warn for consistency with all other log calls in codex.go. This ensures the warning appears in daemon run-logs and per-task pipelines where operators look during triage.	2026-05-04 21:05:37 +08:00
Bohan Jiang	170fa2102b	fix(agent/hermes): wire streamingCurrentTurn gate to drop history replay (#2024 ) Hermes ACP can flush queued session updates from the previous turn before the current turn actually starts — both as session/resume history replay and as chunks queued before our session/prompt response streams. Without a gate those updates were appended to output and re-emitted to the UI, so the previous answer appeared duplicated next to the new one. Closes #1997. PR #1789 added the acceptNotification hook field to hermesClient and the call site in handleNotification, but never assigned it for Hermes, so the guard short-circuited and every notification was processed. This change mirrors the working Kiro pattern (kiro.go:87/97/240): - declare a streamingCurrentTurn atomic.Bool in the backend. - assign acceptNotification, onMessage, onPromptDone gates that all return early when the flag is false. - flip the flag to true immediately before c.request("session/prompt"). Adds TestHermesClientAcceptNotificationGate as a regression test that exercises the gate directly on hermesClient. Verified with `go test ./pkg/agent`. Co-authored-by: multica-agent <github@multica.ai>	2026-05-03 11:43:36 +08:00
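The gate pattern reduces to one `atomic.Bool` that every notification hook consults. The field and hook names mirror the commit, but the struct is hypothetical. The sketch also assumes that `output` is appended from a single reader goroutine.

```go
package main

import "sync/atomic"

// turnGate drops notifications until the current turn starts, so history
// replayed by session/resume is not re-emitted to the UI.
type turnGate struct {
	streamingCurrentTurn atomic.Bool
	output               []string
}

// acceptNotification is the hook consulted before handling a notification.
func (g *turnGate) acceptNotification() bool {
	return g.streamingCurrentTurn.Load()
}

// onMessage appends agent text only once the current turn is streaming.
func (g *turnGate) onMessage(text string) {
	if !g.acceptNotification() {
		return // replayed or pre-prompt update: ignore
	}
	g.output = append(g.output, text)
}

// beginPrompt flips the gate immediately before session/prompt is sent.
func (g *turnGate) beginPrompt() {
	g.streamingCurrentTurn.Store(true)
}
```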
Bohan Jiang	3f046d03f7	fix(agent): expose GPT-5.5 family in Codex runtime model picker (#2020 ) Latest Codex CLI ships with GPT-5.5 / GPT-5.5 mini, but the static catalog still topped out at GPT-5.4 so users couldn't pick the new model from the agent picker. Add gpt-5.5 + gpt-5.5-mini to codexStaticModels and promote 5.5 as the default badge. Keep the older 5.4 / 5.3-codex / gpt-5 / o3 entries for users on older Codex CLI builds. Add a regression test mirroring TestGeminiStaticModelsExposesAliasesAndGemini3 so the next OpenAI release isn't a silent miss. Co-authored-by: multica-agent <github@multica.ai>	2026-05-03 11:12:51 +08:00
Bohan Jiang	cfa38df97b	feat(quick-create): gate on daemon CLI version with pre-check + server enforcement (#1857 ) * fix(quick-create): bound dialog height + scroll editor when content overflows Pasting a screenshot into the agent-create prompt expanded the editor unbounded, which dragged DialogContent past the viewport since the agent mode className had no max-height. Manual mode was unaffected because manualDialogContentClass pins `!h-96`. - Cap agent-mode DialogContent at `!max-h-[80vh]` (width stays `!max-w-xl`); short prompts still render compact, tall content stops at 80% of the viewport. - Switch the editor wrapper to `flex-1 min-h-[140px] overflow-y-auto` so it absorbs the remaining vertical space inside the now-bounded DialogContent and scrolls internally instead of pushing the dialog. * feat(quick-create): gate on daemon CLI version with pre-check + server enforcement The agent-create flow depends on multica CLI behavior introduced in v0.2.20 (URL attachment handling, no-retry semantics on `multica issue create` failure — see PR #1851 / MUL-1496). Older daemons either double-create issues on partial CLI failures or mishandle pasted screenshot URLs. Per J's review on MUL-1496, gate the flow at two layers — frontend pre-check for fast feedback, server re-check as the trust boundary, both fail-closed on missing/unparsable versions. Server: - New MinQuickCreateCLIVersion + CheckMinCLIVersion helper in pkg/agent (with sentinel errors for missing vs too-old). - QuickCreateIssue handler reads runtime metadata.cli_version and returns a stable 422 { code: "daemon_version_unsupported", current_version, min_version, runtime_id } before enqueuing. - The check runs after the existing online + ownership validation, so all rejections surface uniformly through the modal's existing error path. Frontend: - New @multica/core/runtimes/cli-version with the min version constant, parser, and runtime-metadata reader (tiny semver, no new lib dep). 
- AgentCreatePanel resolves the selected agent's runtime, runs the same check, shows an inline amber notice below the agent picker when missing/too old, and disables the Create button. - Submit handler also catches the server's 422 (defensive race — runtime can re-register between pre-check and submit) and surfaces the same wording in the error row. Switching to manual create remains a clean escape hatch — manual mode doesn't talk to a daemon at all, so an outdated CLI doesn't block the user from filing the issue.	2026-04-29 18:44:19 +08:00
Prince Pal	391a4ecd09	feat: add backend default agent args env vars (#1807 ) * feat: add backend default agent args env vars * docs: document default agent args env vars	2026-04-29 16:49:48 +08:00
carmake	805071b5b1	fix(agent/cursor): route Windows launcher through PowerShell -File to preserve multi-line prompts (#1709 ) On Windows the official cursor-agent installer ships cursor-agent.cmd whose body is `powershell ... -File cursor-agent.ps1 %*`. CreateProcess for a .cmd file goes through cmd.exe, and `%*` in a batch file is expanded by re-tokenising the original command line, which mangles arguments containing newlines or other whitespace - most notably a long, multi-line `-p <prompt>`. The agent then only sees a truncated prompt and fails with "Workspace Trust Required" or exits 1 immediately. When LookPath resolves cursor-agent to a .cmd/.bat launcher and a sibling cursor-agent.ps1 exists, invoke PowerShell directly with `-File <ps1>` so Go's os/exec passes each argv as a discrete token. This is exactly what the .cmd does internally; we just skip the cmd.exe re-tokenisation step. PowerShell host resolution prefers pwsh.exe (PS 7) on PATH, then powershell.exe on PATH, and finally falls back to %SystemRoot%\System32\WindowsPowerShell\v1.0. Platform-specific code is split via build tags (cursor_invocation_windows.go / cursor_invocation_other.go) so non-Windows builds carry no Windows-only dependencies. The lookup is exposed as a package variable to make the Windows path fully unit-testable without spawning real PowerShell. Five unit tests cover: passthrough on non-launcher targets, successful rewrite with a multi-line prompt, .exe direct launch (skip), missing .ps1 (skip), and missing PowerShell host (skip). The change leaves macOS / Linux behaviour entirely untouched and stays on the official cursor-agent launch chain - no node.exe direct invocation, no prompt mutation, no extra flags. Closes #1297 Made-with: Cursor	2026-04-29 14:00:15 +08:00
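Here is a platform-neutral sketch of the rewrite decision. The filesystem check and the PowerShell lookup are injected so the logic runs off Windows, echoing the commit's package-variable approach. Function and parameter names are hypothetical, and the real launcher may pass more PowerShell flags than `-File`.

```go
package main

import (
	"path/filepath"
	"strings"
)

// rewriteCmdLauncher returns the argv to execute. For a .cmd/.bat launcher
// with a sibling .ps1 and an available PowerShell host it returns
// [host, "-File", ps1, args...], so each argument (including a multi-line
// prompt) stays one token. Otherwise it returns [target, args...].
func rewriteCmdLauncher(target string, args []string, exists func(string) bool, findPowerShell func() (string, bool)) []string {
	passthrough := append([]string{target}, args...)
	ext := filepath.Ext(target)
	if l := strings.ToLower(ext); l != ".cmd" && l != ".bat" {
		return passthrough // e.g. a .exe launches directly
	}
	ps1 := strings.TrimSuffix(target, ext) + ".ps1"
	if !exists(ps1) {
		return passthrough // no sibling script: keep the launcher
	}
	host, ok := findPowerShell() // pwsh.exe, then powershell.exe, then fallback path
	if !ok {
		return passthrough
	}
	return append([]string{host, "-File", ps1}, args...)
}
```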

120 Commits