multica

mirror of https://github.com/multica-ai/multica.git synced 2026-07-05 13:29:44 +02:00

Author	SHA1	Message	Date
elrrrrrrr	254ec945f5	fix(agent/codex): shut down gracefully so OTEL telemetry flushes (#3888 ) Codex telemetry was never reaching the OTLP collector for tasks run by the daemon. The per-task config (including the [otel] block) is copied into CODEX_HOME correctly, but the lifecycle goroutine closed stdin and then immediately cancelled the run context, which SIGKILLs the app-server. Codex's OTEL batch exporters only force-flush on a graceful shutdown, so the buffered spans/metrics/logs were dropped before they could be exported — short tasks lost everything, long tasks lost the final batch. Let codex exit on its own after stdin EOF (running its shutdown + flush path) and only force-cancel after a bounded grace period if it doesn't, so the reader goroutine still can't block forever. Also set cmd.WaitDelay, matching the other long-lived backends (claude, copilot, cursor, …). Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 14:46:07 +08:00
Bohan Jiang	1ddf89a8f2	feat(daemon): enable Antigravity (agy) per-agent model selection (MUL-3125) (#3894 ) * feat(daemon): wire agy --model and model discovery for Antigravity agy 1.0.6 added a --model flag and an `agy models` catalog command, which were the #1 blocker in the earlier agy-backend review (MUL-3125). The antigravity backend already shipped but deliberately dropped opts.Model because agy 1.0.1 had no way to select a model. - buildAntigravityArgs now passes --model <display name> when opts.Model is set; the value is the exact `agy models` display string (spaces + parens), passed as a single exec arg so no shell quoting is needed. - Block --model in custom_args so it can't override the managed value. - ListModels("antigravity") enumerates via `agy models` (no static fallback: agy silently no-ops on unrecognised models, so a stale guess would turn a typo into a successful empty run). - ModelSelectionSupported now returns true for every built-in provider; the hook stays for any future model-less runtime. - Daemon probe reads MULTICA_ANTIGRAVITY_MODEL for the daemon-wide default. Co-authored-by: multica-agent <github@multica.ai> * docs(providers): mark Antigravity model selection as supported Antigravity gained --model in agy 1.0.6 (MUL-3125). Update the provider matrix + prose (en/zh/ja/ko) from "managed internally / no --model" to dynamic discovery via `agy models`, and refresh the now-stale picker comments. Flag the display-string (not slug) shape and agy's silent no-op on unrecognised values. Co-authored-by: multica-agent <github@multica.ai> * fix(daemon): reject unknown Antigravity model at spawn (MUL-3125) agy exits 0 with empty output on an unrecognised --model, so a stale/typo'd value would surface as a 'completed' but empty task. Validate opts.Model against the `agy models` catalog in Execute before spawning: a non-empty model the CLI does not advertise fails fast with an actionable error listing the real choices. opts.Model is the single funnel for agent.model and the MULTICA_ANTIGRAVITY_MODEL default, so this one check covers every source (UI free-text, API, persisted value, env) — addressing Elon's review that a UI-only guard is bypassable. Validation is fail-OPEN: if the catalog can't be discovered we pass the value through and let agy resolve it, so a discovery hiccup never blocks a run. Pure antigravityModelError() is unit-tested (valid / unknown / near-miss / empty-model / empty-catalog); verified live against real agy 1.0.6. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: J <j@multica.ai> Co-authored-by: multica-agent <github@multica.ai>	2026-06-08 15:32:53 +08:00
Bohan Jiang	3808049361	fix(codex): set semantic thread names (#3887 ) Co-authored-by: J <j@multica.ai> Co-authored-by: multica-agent <github@multica.ai>	2026-06-08 14:53:31 +08:00
Multica Eve	14f89bc08a	Fix Claude control request handling (#3827 ) Co-authored-by: Eve <eve@multica-ai.local> Co-authored-by: multica-agent <github@multica.ai>	2026-06-05 17:14:33 +08:00
Bohan Jiang	3708fb0f07	fix(daemon): inactivity-based agent run timeout, no wall-clock guillotine (MUL-3064) Active long-running sessions are no longer killed by a fixed wall-clock deadline. Liveness is delegated to the idle watchdog (MULTICA_AGENT_IDLE_WATCHDOG, default 30m) with a larger in-flight-tool budget (MULTICA_AGENT_TOOL_WATCHDOG, default 2h). MULTICA_AGENT_TIMEOUT is an opt-in absolute cap (default 0 = no cap). The server-side 2.5h sweeper is unchanged as a coarse backstop. Fixes #3745.	2026-06-05 15:06:07 +08:00
Bohan Jiang	76dbb87762	fix(agent): standardize model-discovery timeouts to 15s, stop caching empty results Raise pi and cursor model-list discovery timeouts 5s->15s to match opencode/ACP; openclaw stays 30s (sequential multi-spawn). Stop caching empty discovery results so a transient timeout doesn't keep the picker blank for the full TTL. Fixes #3729. MUL-2977.	2026-06-05 14:47:59 +08:00
Bohan Jiang	c9ceaee4d9	fix(agent): stop stripping user-facing CLAUDE_CODE_* config from child env (#3690 ) * fix(agent): stop stripping user-facing CLAUDE_CODE_* config from child env isFilteredChildEnvKey blanket-removed every CLAUDE_CODE_* var from the spawned Claude Code child's environment. The intent was only to keep the daemon's internal session markers from leaking, but CLAUDE_CODE_* is also Anthropic's user-facing config namespace. On Windows this stripped the user-set CLAUDE_CODE_GIT_BASH_PATH, so Claude Code could not locate bash.exe, exited immediately, and every task failed with "write claude input: write \|1: The pipe has been ended." Switch from prefixing the whole CLAUDE_CODE_ namespace to an exact-name denylist of the internal runtime/session markers (CLAUDECODE, CLAUDE_CODE_ENTRYPOINT, CLAUDE_CODE_EXECPATH, CLAUDE_CODE_SESSION_ID, CLAUDE_CODE_TMPDIR, CLAUDE_CODE_SSE_PORT), still blanket-stripping the wholly-internal CLAUDECODE_* namespace. Every other CLAUDE_CODE_* var (GIT_BASH_PATH, USE_BEDROCK, USE_VERTEX, MAX_OUTPUT_TOKENS, ...) now reaches the child. The internal-marker set was confirmed against the live runtime, not guessed. Fixes the whole class, not just git-bash: Bedrock/Vertex/etc. were silently dropped the same way. MUL-2940 Co-authored-by: multica-agent <github@multica.ai> * fix(agent): keep CLAUDE_CODE_TMPDIR in child env CLAUDE_CODE_TMPDIR is a documented, user-configurable temp-dir override (public env-vars reference), not an internal per-session marker. Claude Code creates its own per-session subdir under it, so inheriting it is harmless — and stripping it would silently break a user's temp-dir override the same way the broad prefix filter broke CLAUDE_CODE_GIT_BASH_PATH. Drop it from the internal denylist (which now holds only the undocumented per-process runtime markers: CLAUDECODE, CLAUDE_CODE_ENTRYPOINT, CLAUDE_CODE_EXECPATH, CLAUDE_CODE_SESSION_ID, CLAUDE_CODE_SSE_PORT) and assert it reaches the child. MUL-2940 Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: J <j@multica.ai> Co-authored-by: multica-agent <github@multica.ai>	2026-06-05 12:29:55 +08:00
Bohan Jiang	c99c2493ae	fix(agent): keep resolvable models when CLI discovery exits non-zero Parse the discovered catalog even when the model-discovery CLI exits non-zero (pi/opencode/cursor/openclaw) instead of discarding it and returning an empty model picker. Filter pi diagnostic lines so stale-pattern warnings don't coin bogus models. Fixes #3729. MUL-2977.	2026-06-04 14:57:30 +08:00
Bohan Jiang	fcb5099ec5	fix(agent): raise opencode model-discovery timeout to 15s (MUL-2888) (#3689 ) Newer opencode (1.15+) syncs its hosted free-model catalog over the network on `opencode models`, which can take ~6s. The previous 5s cap killed the command, discoverOpenCodeModels returned an empty list, and the daemon reported it as a successful empty result — so the runtime showed online but the model picker was empty ("暂无可用模型"). Fixes #3627 Co-authored-by: J <j@multica.ai> Co-authored-by: multica-agent <github@multica.ai>	2026-06-03 12:18:43 +08:00
Multica Eve	e2720f7d33	feat: add opencode thinking variants Adds OpenCode model variant discovery for thinking controls, passes saved thinking_level through opencode run --variant, and hardens verbose model parsing with fallback coverage.	2026-06-02 13:15:14 +08:00
Jan De Dobbeleer	1e1a4f7845	fix(daemon): fix Copilot CLI invocation on Windows and strip shell quotes from custom args (MUL-2876) Bug 1: detect copilot.cmd/.bat on Windows and invoke the sibling .ps1 directly via powershell -File, bypassing cmd.exe %* re-tokenisation that mangled the multi-line -p prompt. Shared rewriteCmdToPS1() now serves cursor, pi, and copilot. Bug 2: filterCustomArgs (shared by all agent backends) strips one outer layer of shell quotes via unshellQuoteArg() before processing, so shell-style custom args like --deny-tool='write' no longer reach the CLI with literal quotes.	2026-06-01 23:28:51 +08:00
Mohammed Helaiwa	d4b97dc44a	fix(agent): drain claude stdout while writing prompt to stdin (#3490 ) The claude backend wrote the full prompt to the child's stdin and closed it before starting the stdout reader goroutine. With --verbose --output-format stream-json the CLI emits a startup banner before reading its first stdin frame; with no reader draining stdout, the child blocks on its stdout write, never reads stdin, and our stdin Write blocks until the per-task context fires. The field symptom is tasks failing exactly at the 2 h per-task timeout with "write \|1: The pipe has been ended." Move writeClaudeInput into its own goroutine so the prompt write and the stdout drain proceed concurrently. Guard stdin close with sync.Once (it can now be called from both the writer goroutine and, previously, the result handler). Join the write result at cmd.Wait() and surface a write failure as a "failed" status only when no result event arrived and no session was established, so a genuine startup death still reports the stderr tail. Add a regression test that re-execs the test binary as a fake claude which bursts 256 KiB to stdout before reading stdin, with a 128 KiB prompt pushed at stdin — both past any plausible OS pipe buffer — so a regression hangs until the test deadline instead of passing. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-01 13:52:41 +08:00
feifeigood	382cdd6a0b	feat(agent): consume OpenCode mcp_config via OPENCODE_CONFIG_CONTENT (#3098 ) Closes the runtime-side gap of #2106: previously `agent.mcp_config` was honored only by Claude Code (via `--mcp-config <file>`); for OpenCode the field was accepted by the API but silently ignored at execution time. ## Approach OpenCode has no `--mcp-config` flag. Project the agent's `mcp_config` into OpenCode via OPENCODE_CONFIG_CONTENT — OpenCode's general inline-config injection environment variable, which accepts any subset of OpenCode's config schema (model / agent / mode / plugin / mcp / …) and merges at "local" scope after the project-config loop. MCP is the only field this PR projects through that channel; if a future Multica field needs the same channel it would assemble a combined config slice before the env append. The env-var route was deliberate. An earlier draft of this PR wrote the translated MCP servers into <workdir>/opencode.json and removed the file on cleanup; review (#3098) flagged that the task workdir is reused across turns for the same (agent, issue), and any agent- or user-written model / tools / permission settings in opencode.json must survive across runs. OPENCODE_CONFIG_CONTENT avoids the workdir entirely — nothing is written to disk, no cleanup is needed, and the env entry dies with the spawned process. OPENCODE_CONFIG_CONTENT was added to OpenCode in v1.4.10 (2025-09); the official @opencode-ai/sdk uses the same env var to inject runtime config, so the surface is stable. Verified empirically against OpenCode 1.15.6 in our K8s runtime: `opencode debug config` returns the injected mcp slice deep-merged with the user's global config, and <workdir>/opencode.json is observably untouched. ## Translation surface `agent.mcp_config` accepts two shapes for portability: - Claude-style `{"mcpServers": {name: {url\|command, ...}}}` is translated into OpenCode's native form: `type: "local"\|"remote"`, `command` coerced to a string array, `env` renamed to `environment`. - Native OpenCode `{"mcp": {name: ...}}` accepts the three shapes OpenCode's schema permits and is strict-decoded against each: - McpLocalConfig: `{type:"local", command:[…], environment?, enabled?, timeout?}` - McpRemoteConfig: `{type:"remote", url:"…", headers?, oauth?, enabled?, timeout?}` - bare override: `{enabled: bool}` (toggle a server inherited from global / project config without redefining it) Decoding uses `json.DisallowUnknownFields` so any field outside the matching schema is rejected — matching OpenCode's `additionalProperties: false`. Without this, a malformed payload (e.g. `command: "node"` instead of `command: ["node"]`) would reach OpenCode verbatim and either silently disable the server or crash the CLI at startup. Field-level checks the strict decoder doesn't catch: - `timeout` must be a positive integer (rejects 0, negative, fractional) - `oauth` must be either an object (validated against McpOAuthConfig) or the literal `false`; primitives and `true` are rejected as ambiguous - `oauth.callbackPort` must be in 1..65535 when set ## Precedence Go's os/exec dedups `cmd.Env` by key keeping the LAST occurrence (Go 1.9+). Appending OPENCODE_CONFIG_CONTENT after `buildEnv(b.cfg.Env)` guarantees the daemon's value wins over any value the user happened to put in `agent.custom_env` — which matches the intended semantics (`mcp_config` is the authoritative daemon-managed field; `custom_env` is the escape hatch). When that override happens we surface a warning log so accidental clobbers are debuggable. ## Limitation (out of scope, accepted in review) OpenCode also deep-merges its global config (`~/.config/opencode/opencode.json`) into every session and exposes no flag to disable that. Operators who want strict per-agent isolation from the global layer can set: ```jsonc // agent.custom_env on the platform { "XDG_CONFIG_HOME": "/tmp/opencode-isolated" } ``` …pointing at any directory without an `opencode/` subdir. OpenCode then reads no global config and only honors what the daemon injects via OPENCODE_CONFIG_CONTENT. Verified with `opencode debug config`. ## Changes server/pkg/agent/opencode_mcp.go (new): - buildOpenCodeMCPConfigContent — translates raw mcp_config into the JSON string OpenCode accepts via OPENCODE_CONFIG_CONTENT, returns "" when there's nothing to inject so the caller can skip the env entry (avoids clobbering anything the user put in agent.custom_env.OPENCODE_CONFIG_CONTENT) - translateMCPConfigForOpenCode + helpers — Claude-style → OpenCode native shape - validateOpenCodeNativeMCPEntry + opencodeMCPLocal / opencodeMCPRemote / opencodeMCPEnabledOnly / opencodeMCPOAuth typed structs — strict-decode native-shape entries against the schema (DisallowUnknownFields), plus targeted post-decode assertions for timeout / oauth / callbackPort server/pkg/agent/opencode.go: - 12 lines of env injection in Execute(), placed AFTER buildEnv so the daemon's value wins via os/exec dedup - warning log when agent.custom_env duplicates the same key - no on-disk state, no rollback closure, no post-run cleanup — OPENCODE_CONFIG_CONTENT lives only in the spawned process env server/pkg/agent/opencode_mcp_test.go (new): - TestBuildOpenCodeMCPConfigContent_{Empty,Remote,Local,Native} - TestBuildOpenCodeMCPConfigContent_NativeAcceptsAllSchemaFields — covers each native variant round-tripping every optional field (local with env+timeout+enabled; remote with headers+oauth-object+ timeout+enabled; remote with oauth: false; bare {enabled} override) - TestBuildOpenCodeMCPConfigContent_RejectsMalformedNative — 31-case table covering every constraint on Bohan-J's review: command must be a string array, environment / headers values must be strings, oauth must be an object or false, timeout must be a positive integer, additionalProperties: false (per-shape allow-list checked via DisallowUnknownFields) - TestOpencodeBackendInjectsMCPConfigViaEnv — E2E happy path; fake opencode binary captures $OPENCODE_CONFIG_CONTENT, asserts the translated mcp slice is present AND <workdir>/opencode.json was NOT written - TestOpencodeBackendOmitsMCPEnvWhenEmpty — empty mcp_config does NOT inject the env, preserving any value the user set in agent.custom_env - TestOpencodeBackendOverridesUserOpenCodeConfigContent — daemon value wins via os/exec dedup keep-last apps/docs/content/docs/providers.{en,zh}.mdx: - flip OpenCode's MCP cell from ❌ to ✅ - reword the "MCP configuration: only Claude Code actually reads it" section so OpenCode is included; describe each tool's mechanism (Claude → `--mcp-config`, OpenCode → OPENCODE_CONFIG_CONTENT) apps/docs/content/docs/install-agent-runtime.{en,zh}.mdx: - update the Claude Code blurb (no longer "the only one") - expand the OpenCode blurb to mention mcp_config support - fix the now-broken /providers anchor Refs #2106 (TS types and per-agent UI for mcp_config are separate follow-ups, not in this PR). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-30 18:08:21 +08:00
Bohan Jiang	e1745d09ea	MUL-2797 feat(agent): add Claude Opus 4.8 to model catalog & pricing (#3492 ) Claude Code now ships Opus 4.8 (claude-opus-4-8). Add it to the three places that enumerate Claude models so the picker, thinking-level catalog, and usage cost estimates all recognize it: - claudeStaticModels(): list Claude Opus 4.8 (Sonnet 4.6 stays default) - claudeModelEffortAllow: Opus supports the full low..max set incl. xhigh - MODEL_PRICING: $5/$25 in, $0.50 cache read, $6.25 5m cache write — same current-gen Opus tier as 4.5/4.6/4.7, confirmed against platform.claude.com/docs/en/about-claude/pricing Co-authored-by: J <j@multica.ai> Co-authored-by: multica-agent <github@multica.ai>	2026-05-29 10:28:30 +08:00
Bohan Jiang	09f9c7e2ce	MUL-2764 feat(agent): wire mcp_config through ACP runtimes (Hermes / Kimi / Kiro) (#3439 ) * MUL-2764 feat(agent): wire mcp_config through ACP runtimes (Hermes / Kimi / Kiro) The MCP config Tab (#3419) already lets admins save mcp_config on an agent, and the daemon plumbs it through to `agent.ExecOptions.McpConfig` for every runtime. Claude and Codex consume it; the three ACP runtimes (Hermes / Kimi / Kiro) ignored the field and hardcoded an empty `mcpServers: []` in their `session/new` requests. Add `buildACPMcpServers` to translate the Claude-style `{"mcpServers": {"<name>": {...}}}` object-of-objects into the array shape ACP requires (`[{name, command, args, env: [{name,value}, ...]}, ...]` for stdio; `[{type, name, url, headers: [...]}, ...]` for http/sse), then pass the translated array on `session/new` (all three) and `session/load` (kiro resume). Malformed JSON fails the launch closed — same contract Codex's `renderCodexMcpServersBlock` uses — so users see a real error instead of silently running with no MCP servers. Individual unclassifiable entries (no command, no url) are skipped with a warning so one bad row can't take MCP down for the rest of the agent. Co-authored-by: multica-agent <github@multica.ai> * MUL-2764 fix(agent): wire mcp_config through ACP resume + gate http/sse on capability Addresses the two blockers Elon raised on #3439: 1. session/resume now carries mcpServers for Hermes and Kimi (Kiro's session/load already did). Per the ACP Session Setup spec the resume path re-attaches MCP servers, and without this a resumed task lost access to MCP tools that a fresh task on the same agent would have had. Pinned with new TestHermesResumeIncludesMcpServers and TestKimiResumeIncludesMcpServers integration tests that inspect the recorded wire request. 2. Added extractACPMcpCapabilities + filterACPMcpServersByCapability so http/sse MCP entries get dropped (with a daemon-log warning naming the entry) when the runtime's initialize response doesn't advertise mcpCapabilities.http / .sse. Sending those entries to a stdio-only runtime is a spec violation and reliably tanks session/new; now they get filtered and the rest of the session still starts. Stdio entries pass through unconditionally. Both backends wire the filter in right after initialize so session/new and session/resume see the same filtered list. Also added TestKiroLoadIncludesMcpServersFromConfig — Elon flagged that no test pinned "non-empty mcp_config actually reaches the wire" for Kimi/Kiro, so the wire assertions go in for all three runtimes. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: J <j@multica.ai> Co-authored-by: multica-agent <github@multica.ai>	2026-05-28 16:29:49 +08:00
Bohan Jiang	bae8a84abd	MUL-2767 feat(agent): add Antigravity runtime backend (#3427 ) * feat(agent): add Antigravity runtime backend Adds Google's Antigravity CLI (`agy`) as the 12th supported coding-tool runtime, alongside Claude / Codex / Cursor / Copilot / Gemini / Hermes / Kimi / Kiro / OpenCode / OpenClaw / Pi. The CLI emits plain assistant text on stdout (no structured event stream), so the backend streams stdout line-by-line as `MessageText` events and accumulates the same text as the final `Result.Output`. Session resumption uses `--conversation <id>`; because the conversation UUID is not echoed on stdout, the daemon routes `--log-file` to a temp file and recovers the id from the glog-formatted log lines. MUL-2767 Co-authored-by: multica-agent <github@multica.ai> * fix(agent): correct Antigravity capability contract from Elon review - ModelSelectionSupported now returns false for antigravity. `agy` has no --model flag and antigravityBackend deliberately drops opts.Model, so the UI must render a disabled "Managed by runtime" picker instead of an empty dropdown plus a silently-ignored manual-entry field. Also stop seeding AgentEntry.Model from MULTICA_ANTIGRAVITY_MODEL — the backend would silently ignore it. - Antigravity skills now write to {workDir}/.agents/skills/, the CLI's native workspace path (inherits Gemini CLI's layout per https://antigravity.google/docs/gcli-migration). Previously they went to the .agent_context/skills/ fallback that the CLI doesn't scan. Runtime brief moves antigravity into the native-discovery branch and local_skills.go points the user-level skill root at ~/.gemini/antigravity-cli/skills for Runtime → local skill import. - Doc + UI comment sync: providers matrix / install-agent-runtime / cloud-quickstart / agents-create / tasks (session-resume support) / skills / README all now list Antigravity in the right buckets, and the model-picker / model-dropdown comments cite antigravity (not the stale hermes reference) as the supported=false example. New tests: TestAntigravityModelSelectionUnsupported, TestInjectRuntimeConfigAntigravity (native discovery wording), TestWriteContextFilesAntigravityNativeSkills (.agents/skills/ landing, .agent_context/skills/ NOT written). Co-authored-by: multica-agent <github@multica.ai> * feat(provider-logo): swap inline placeholder for real Antigravity PNG Replaces the hand-drawn planet+arc placeholder with the official asset shipped from Downloads. Stored next to the component; bundlers (Next.js / electron-vite) resolve the PNG import to a URL string at build time. Added a small assets.d.ts so packages/views' tsc accepts PNG / SVG module imports — there was no prior asset usage in this package to register the declaration. --------- Co-authored-by: J <j@multica.ai> Co-authored-by: multica-agent <github@multica.ai>	2026-05-28 15:40:05 +08:00
Bohan Jiang	d39da9f7f0	MUL-2764: feat(agents): add MCP config tab to agent detail page (#3419 ) * MUL-2764: feat(agents): add MCP config tab to agent detail page Backend already stores `mcp_config` and the daemon forwards it to the runtime CLI via `--mcp-config`; this only adds the UI entry point. The new tab presents a JSON editor that pretty-prints the existing config, validates the buffer on every keystroke, and saves through the existing `PUT /api/agents/{id}` path. Clearing the editor sends `mcp_config: null`, which the handler reads as "wipe the column" and the daemon falls back to the CLI's own default. When the caller can't see secrets (agent actor, or a non-owner non-admin member), the server already returns `mcp_config: null` with `mcp_config_redacted: true`; the tab renders a read-only "configured but hidden" state in that case so a non-privileged member cannot silently overwrite an admin-owned config by saving an empty editor. Co-authored-by: multica-agent <github@multica.ai> * fix(agents): MCP tab — preserve in-flight edits + warn non-Claude runtimes - Fix stale-editor sync: compare the local draft against the previous original via a ref, so a background agent refetch updates an untouched editor instead of being silently ignored. Without this, a draft equal to the OLD original was treated as user-edited after the prop changed, and the next Save would write the old config back over a concurrent admin edit. - Surface a notice inside the tab when the agent's runtime provider is not Claude — today's daemon only forwards mcp_config via Claude's --mcp-config, so saving on e.g. a Codex agent was silent but ineffective. - Tests for both: rerender resyncs an untouched editor, rerender preserves an in-flight edit, warning renders on non-Claude / hides on Claude. MUL-2764 Co-authored-by: multica-agent <github@multica.ai> * MUL-2764: feat(agents): codex MCP support + hide MCP tab on unsupported runtimes - Backend: codex.go now translates agent.mcp_config (Claude-style `{"mcpServers": {...}}`) into `-c mcp_servers.<name>=<inline-toml>` flags for `codex app-server`, so MCP servers configured in the UI reach Codex's per-task config layer. Bad mcp_config JSON downgrades to a warn-and-skip so it can't break the agent launch. - Frontend: AgentOverviewPane hides the MCP tab when the agent's runtime provider doesn't read mcp_config — only `claude` and `codex` are supported today, every other provider sees no MCP tab. The previous in-tab warning is removed (no longer reachable). - New shared helper `providerSupportsMcpConfig` lives in `@multica/core/agents` so views and any future caller share one list of MCP-aware providers. - Tests: new go-side coverage for stdio + url + multi-server inputs, TOML string escaping, malformed-input fallback, and arg ordering vs custom_args; new views-side coverage for which providers surface the MCP tab. En + zh-Hans copy and parity test refreshed. Co-authored-by: multica-agent <github@multica.ai> * MUL-2764: fix(agents): keep codex mcp_config secrets out of argv/logs Move the agent's mcp_config from a `-c mcp_servers.<id>=<inline-toml>` argv flag into a daemon-managed `[mcp_servers.]` block inside the per-task `$CODEX_HOME/config.toml`. mcp_servers.<id>.env is a documented Codex config field and the UI already treats mcp_config as redacted for non-admins; argv would have leaked those values into `ps aux` and the `agent command` log line. The file is forced to 0600 to keep secrets in the daemon owner's lane regardless of the seed file's mode. Also drop user-supplied `-c/--config mcp_servers.` entries from custom_args. Codex `-c` is last-wins (verified against codex-cli 0.132.0), so without filtering, a custom_args entry could silently shadow whatever the MCP Tab saved. Strip inherited `[mcp_servers.]` tables from the per-task config.toml when the agent has its own mcp_config, mirroring Claude's `--strict-mcp-config`: avoids TOML "table already exists" errors on name collisions and matches admin expectations that the MCP Tab is the authoritative source for that task. Co-authored-by: multica-agent <github@multica.ai> MUL-2764: fix(agents): codex mcp_config three-state semantics + custom_args compat Address the third review pass: 1. Distinguish nil vs present-but-empty mcp_config. `{}` and `{"mcpServers":{}}` now count as "admin saved an explicit (empty) managed set" — strip inherited user `[mcp_servers.]` and pin an empty managed marker block. Only SQL NULL / JSON `null` map to "absent" and fall back to the user's global `~/.codex/config.toml`. This aligns Codex with the API's three-state contract (omit / null / object) and with Claude's `--strict-mcp-config` semantics. 2. Fail closed on `ensureCodexMcpConfig` errors and on managed mcp_config without CODEX_HOME. Previous warn-and-launch would silently inherit the user's global MCP servers and look identical to a successful apply — exactly the surprise the MCP Tab is meant to remove. 3. Only filter `-c mcp_servers.` from `custom_args`/`extra_args` when the agent has a managed mcp_config. Pre-MUL-2764 agents that configured MCP via custom_args keep working; once an admin opts in via the MCP Tab the daemon owns the `mcp_servers` namespace and overrides are dropped (last-wins safety). 4. Update mcp_config locale intro to mention $CODEX_HOME/config.toml instead of the now-removed `-c mcp_servers.*` argv path. Tests: - Split `TestEnsureCodexMcpConfigEmptyInputsAreNoop` into `TestEnsureCodexMcpConfigAbsentLeavesUserTablesAlone` (nil/null) and `TestEnsureCodexMcpConfigEmptyManagedSetStripsUserMcp` (`{}`, `{"mcpServers":{}}`). - Add `TestEnsureCodexMcpConfigEmptyManagedSetIdempotent` to pin byte-identical reruns on the empty managed marker block. - Add `TestHasManagedCodexMcpConfig` covering the eight relevant inputs. - Add `TestBuildCodexArgsPreservesCustomMcpOverridesWhenUnmanaged` and `TestBuildCodexArgsDropsCustomMcpOverridesWhenManaged` to pin the new gating. - Add `TestCodexExecuteFailsClosedWhenMcpConfigInvalid` and `TestCodexExecuteFailsClosedWhenManagedMcpButNoCodexHome` for the Execute paths. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: J <j@multica.ai> Co-authored-by: multica-agent <github@multica.ai>	2026-05-28 15:11:28 +08:00
Bohan Jiang	2bda4065d0	MUL-2708: fix(agent): preserve multi-line Pi prompt on Windows by bypassing the .cmd shim (#3417 ) Pi is installed on Windows via npm, which lays down `pi.cmd` → `pi.ps1` → `node_modules/@mariozechner/pi-coding-agent/dist/cli.js`. The daemon spawns Pi with `exec.Command("pi", ...)`; PATHEXT resolves that to `pi.cmd`, and cmd.exe expands `%` in the shim by re-tokenising the original command line, which truncates any argv containing newlines. buildPiArgs passes the full prompt as the last positional argv, so the multi-line system+user prompt is silently cut at the first newline before it reaches the JS entrypoint. The session JSONL then records only the first line ("You are running as a chat assistant for a Multica workspace.") and Pi replies as if the user message were missing (GitHub multica-ai/multica#3306). Mirror the existing cursor-agent fix: when LookPath resolves Pi to a .cmd/.bat launcher and a sibling pi.ps1 exists, invoke PowerShell with `-File <ps1>` directly and forward each arg as a discrete token. This keeps us on the official launch path while skipping the cmd.exe % re-expansion. Falls back to the original launcher when pi.ps1 or PowerShell can't be located. The Windows test asserts the rewrite produces the expected argv and that the multi-line positional prompt survives unchanged. Co-authored-by: J <j@multica.ai> Co-authored-by: multica-agent <github@multica.ai>	2026-05-28 12:36:16 +08:00
Kagura	f02bc56e70	fix(agent/cursor): remove obsolete 'chat' subcommand from argv (#3077 ) (#3092 ) The current cursor-agent CLI no longer has a 'chat' subcommand. The positional 'chat' argument was silently treated as prompt text, leaking into the user message (e.g. 'chat <actual prompt>'). Remove 'chat' from buildCursorArgs so the generated argv matches the current cursor-agent CLI interface. Fixes #3077	2026-05-27 16:40:29 +08:00
Multica Eve	311cf4d998	fix(agent): surface Codex app-server no-progress diagnostics (MUL-2688) Refs #3262.	2026-05-26 18:42:47 +08:00
Multica Eve	26ff52385b	fix: attribute Hermes usage to current model (MUL-2696) Fix Hermes ACP usage attribution to current model when agent.model is unset. Also preserves cache-read token accounting and makes ACP model-list parsing more tolerant of snake_case payloads and Unknown display names.	2026-05-26 18:13:28 +08:00
Multica Eve	744b474199	revert(agent): remove per-agent local skill toggle (MUL-2603) (#3286 ) * Revert "feat(agents): hide skills_local toggle for runtimes that don't honour it (MUL-2603) (#3276)" This reverts commit `0b50c5a209`. Co-authored-by: multica-agent <github@multica.ai> * Revert "fix(agent): surface host OAuth token via env var on macOS isolation (MUL-2603) (#3267)" This reverts commit `a67bf81225`. Co-authored-by: multica-agent <github@multica.ai> * Revert "fix(agents): tighten skills-tab intro and drop redundant import hint (#3265)" This reverts commit `d8075a5775`. Co-authored-by: multica-agent <github@multica.ai> * Revert "fix(agent): mirror $HOME/.claude.json into isolated config dir (MUL-2661) (#3261)" This reverts commit `40da88fc16`. Co-authored-by: multica-agent <github@multica.ai> * Revert "feat(agent): per-agent toggle to isolate host-machine skills (MUL-2603) (#3200)" This reverts commit `960befa56f`. Co-authored-by: multica-agent <github@multica.ai> * Add migration cleanup for reverted agent skills toggle Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: Eve <eve@multica-ai.local> Co-authored-by: multica-agent <github@multica.ai>	2026-05-26 17:00:01 +08:00
Bohan Jiang	a67bf81225	fix(agent): surface host OAuth token via env var on macOS isolation (MUL-2603) (#3267 ) * fix(agent): surface host OAuth token via env var on macOS isolation (MUL-2603) Claude Code 2.x scopes the macOS keychain credentials entry by sha256(CLAUDE_CONFIG_DIR)[:8], so the MUL-2603 isolation path strands the child at "Not logged in" even after #3261 mirrored .claude.json: the child looks up `Claude Code-credentials-<scratch-hash>`, the host token is sitting in the no-suffix `Claude Code-credentials` entry. Read the host OAuth token from the keychain via /usr/bin/security and inject it as CLAUDE_CODE_OAUTH_TOKEN, which bypasses keychain lookup entirely. Linux/Windows continue to use the .credentials.json mirror (no-op there). Operator-pinned tokens and ANTHROPIC_API_KEY both take precedence over the keychain reader. Co-authored-by: multica-agent <github@multica.ai> * fix(agent): tighten empty-value auth gate, pin Claude CLI env-scrub assumption (MUL-2603) Empty-value gate - `ANTHROPIC_API_KEY=` inherited from a login shell that conditionally exports auth previously posed as an "operator pinned API-key auth" choice and disabled the keychain reader, stranding the isolated child at "Not logged in" even though no auth was actually selected. - Custom_env `CLAUDE_CODE_OAUTH_TOKEN=""` (stale agent config) had the same effect, plus would have shadowed a keychain-injected token in libc env lookups that pick the first match. - Both are now treated as noise: the empty entry is dropped from the child env and the keychain reader runs unchanged. Two new unit tests cover the os.Environ side (`...TreatsEmptyAnthropicAPIKeyAsUnpinned`, `...HonorsNonEmptyAnthropicAPIKey`) and the custom_env side (`...EmptyOAuthTokenInCustomEnvAsUnpinned`). Env-scrub boundary - Surfacing `CLAUDE_CODE_OAUTH_TOKEN` to the isolated child is only safe because Claude Code itself drops that variable from the env it hands to Bash / hook subprocesses, so a model-driven `printenv` can never echo the secret into the agent transcript. - Empirically verified against `claude` 2.1.121: printf '...test -n "$CLAUDE_CODE_OAUTH_TOKEN" && echo SET \|\| echo UNSET...' \ \| CLAUDE_CODE_OAUTH_TOKEN=sk-canary-XYZ \ MUL2603_CONTROL=control-value \ claude --print --output-format text \ --allow-dangerously-skip-permissions --allowedTools Bash returned `UNSET` for the OAuth token while the non-sensitive `MUL2603_CONTROL` control returned `CONTROL-SET`, proving the CLI scrubs only the auth env, not the env in general. - Pinned this assumption in a new skip-gated regression test (`TestClaudeCLIScrubsOAuthTokenFromBashSubprocess`) that boots the real CLI with a canary token; failing the test means upstream Claude Code stopped scrubbing and the passthrough must move off env vars before MUL-2603 can ship. Co-authored-by: multica-agent <github@multica.ai> * fix(agent): gate keychain passthrough on default host dir, harden scrub test (MUL-2603) Two follow-ups from the round-2 review on #3267: 1. Custom CLAUDE_CONFIG_DIR no longer pulls the default OAuth token. Claude Code 2.x maps each config dir to its own suffixed `Claude Code-credentials-<hash>` keychain entry, so an operator that pins a managed/custom CLAUDE_CONFIG_DIR via custom_env or the daemon-host env was getting the daemon user's default unsuffixed entry injected into the isolated child — silently crossing accounts, exactly the boundary mirrorHostClaudeJSONIfMissing already protects for `.claude.json`. buildClaudeEnvWith now threads the effective hostConfigDir through and only calls the reader when that dir is the default `$HOME/.claude`. The new gate has a unit-level truth table (TestIsDefaultHostClaudeConfigDir) plus a regression (TestBuildClaudeEnvIsolatedSkipsKeychainForCustomHostConfigDir) that makes a t.Fatal-armed reader prove the gate keeps the read off for custom dirs. 2. Scrub e2e now asserts the control prong and the proof-of-execution marker, not just "canary absent". The previous assertion would false-pass on a model refusal, paraphrase, or "Bash gets no env at all" upstream change. The strengthened version sets a non-secret MUL2603_CONTROL alongside the canary OAuth token and asserts (a) canary is NOT in the transcript, (b) CONTROL-SET IS in the transcript (env propagation works for non-secrets — proves a targeted scrub), (c) UNSET IS in the transcript (the Bash tool actually ran AND saw the OAuth var as empty/unset). Code comment in buildClaudeEnvWith and the test docstring now narrow the security contract to the Bash tool subprocess only; hook subprocess env-scrub is no longer claimed because it has not been verified. Co-authored-by: multica-agent <github@multica.ai> * test(agent): use per-run nonces in Claude scrub e2e to kill false-pass (MUL-2603) Elon's round-3 review flagged that TestClaudeCLIScrubsOAuthTokenFromBashSubprocess still false-passed: the proof markers "UNSET" / "CONTROL-SET" were literal strings in the prompt, so strings.Contains matched them even when the model only paraphrased the prompt without spawning Bash. Replace the hard-coded markers with two per-run random hex nonces passed only via env vars (MUL2603_UNSET_NONCE, MUL2603_CONTROL_NONCE). The prompt now references the variable names, not the values, so the nonces can land in the transcript only if a real Bash subprocess inherits the env vars and echoes them. A paraphrasing or refusing model cannot fake nonces it never saw. Also update the security-boundary comment in buildClaudeEnvWith to describe the nonce-based proof. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-26 15:29:58 +08:00
Bohan Jiang	40da88fc16	fix(agent): mirror $HOME/.claude.json into isolated config dir (MUL-2661) (#3261 ) PR #3200 introduced per-agent `skills_local=ignore` isolation that mirrors the host's Claude config dir into a per-task scratch dir, omitting `skills/` to keep broken local skills out of the CLI's discovery path. The mirror walks entries inside `hostConfigDir` (default: `$HOME/.claude/`), but Claude Code's default layout stores its main config — login state, project history — at `$HOME/.claude.json`, a sibling of `~/.claude/` rather than inside it. Once `CLAUDE_CONFIG_DIR=$ISOLATED` is set, the CLI looks for `$ISOLATED/.claude.json`, finds only `backups/.claude.json.backup.*` (those live inside `~/.claude/` and DO get mirrored), and exits with: Claude configuration file not found at: …/.claude.json Not logged in · Please run /login — so every agent with `skills_local=ignore` on a host using the default Claude layout dies on the first turn. Flipping the toggle back to "merge" restores the host CLAUDE_CONFIG_DIR and recovers the agent; that's the workaround Bohan flagged in MUL-2661. Fix: after the existing `mirrorHostClaudeExceptSkills`, run a new `mirrorHostClaudeJSONIfMissing` that pulls `$HOME/.claude.json` into the scratch dir as `.claude.json` when (a) the dest doesn't already have one and (b) the host source dir is the default `$HOME/.claude/`. The custom-CLAUDE_CONFIG_DIR path is left alone because a pinned custom dir is expected to be self-contained — silently borrowing `$HOME/.claude.json` from a different account would mask credential drift. The helper goes through `createFileLink`, so it inherits the same symlink → junction → hardlink → copy fallback chain the rest of the mirror uses on Windows-without-Developer-Mode hosts. Tests: - `TestMirrorHostClaudeJSONIfMissing_DefaultLayoutMirrorsParentFile` covers the happy path with an injected `homeDir`/`fileLink`. - `TestMirrorHostClaudeJSONIfMissing_AlreadyPresentNoop` asserts a pre-existing dest `.claude.json` (from a custom CLAUDE_CONFIG_DIR mirror) is not overwritten. - `TestMirrorHostClaudeJSONIfMissing_CustomHostDirSkipped` locks in the custom-host-dir gate. - `TestMirrorHostClaudeJSONIfMissing_MissingSourceNoop` documents the env-var-auth-only / fresh-install case. - `TestClaudeExecuteIsolatesProvidesClaudeJSONFromHome` is the end-to-end MUL-2661 regression: a fake `\$HOME` with the default split layout, `skills_local=ignore`, fake claude binary that prints whatever `.claude.json` reaches the scratch dir. Asserts the file rides through. Verified the test fails (with the documented MUL-2661 error message) when the new mirror call is removed. Verification: - `go test ./pkg/agent/...` green (full agent suite). - `GOOS=windows GOARCH=amd64 go vet ./pkg/agent/...` clean. Co-authored-by: multica-agent <github@multica.ai>	2026-05-26 13:50:35 +08:00
Bohan Jiang	960befa56f	feat(agent): per-agent toggle to isolate host-machine skills (MUL-2603) (#3200 ) * feat(agent): per-agent toggle to isolate host-machine skills (MUL-2603) Adds an agent-scoped `skills_local` switch ("ignore" default / "merge") so shared agents stop inheriting the operator's user-global Claude skill directory. A single broken local skill on one operator's machine was crashing the Claude CLI before it ever read stdin — the daemon saw a "broken pipe" with no recoverable signal (GitHub #3052). - DB: migration 108 adds `agent.skills_local` (NOT NULL DEFAULT 'ignore'), with sqlc CreateAgent/UpdateAgent updates and handler validation. - Claude runtime: when the agent is in "ignore" mode the backend points CLAUDE_CONFIG_DIR at an empty per-task scratch dir under the task cwd (fallback: OS temp), strips any inherited override, and cleans up after the run. Workspace skills under `{cwd}/.claude/skills/` still load. "merge" preserves the legacy inherit-from-machine behavior; Codex and other isolated backends are no-ops. - UI: new Skills toggle in the Create Agent dialog and the Agent → Skills tab, with EN/zh-Hans copy and SkillsLocalToggle shared between the two. - Tests: unit coverage for the new env helper, isolation dir lifecycle, full Claude execute paths (ignore + merge), and the handler tristate contract. Existing skills-tab test updated for the new copy. - Docs: updated `/skills` docs (EN + ZH) and added a 0.3.7 changelog entry in the landing-page i18n. Co-authored-by: multica-agent <github@multica.ai> * fix(agent): preserve claude login + validate skills_local input (MUL-2603) Address Elon's review on PR #3200: 1. Skill isolation no longer drops the operator's Claude login. The per-task scratch dir now mirrors every entry under `~/.claude/` as symlinks except `skills/`, so `.credentials.json`, settings, plugins, etc. reach the CLI exactly as on the host while the user-global skills directory stays hidden. Without this, default `ignore` would have broken every Claude agent on a non-API-key host the moment migration 108 landed. 2. Internal CreateAgent callers (agent_template, onboarding_shim) now set `SkillsLocal: "ignore"`. The Go zero value was about to trip the migration-108 CHECK constraint and 500 template / onboarding agent creation. 3. Create / update handler validation no longer normalizes garbage to "ignore". The strict 400 path is now reachable on bad client input; the drift-safe `normalizeSkillsLocal` stays on the read side only. UI copy + docs clarified that the toggle is Claude-only; other runtimes ignore the setting. Verification: - `go test ./...` green (full suite locally). - `pnpm --filter @multica/views exec vitest run agents/components/tabs/skills-tab.test.tsx` green. - Handler DB-backed tests still skip locally without docker (same as Elon's run) — CI will validate the create / update paths against migration 108. Co-authored-by: multica-agent <github@multica.ai> * fix(agent): mirror effective claude config dir with windows fallback (MUL-2603) Address Elon's second-round review on PR #3200: 1. The per-task scratch dir now mirrors the effective host Claude config dir, not unconditionally `~/.claude/`. Precedence: agent `custom_env` CLAUDE_CONFIG_DIR > parent process env > `~/.claude/`. Without this, an operator who pinned Claude at a managed install (custom env CLAUDE_CONFIG_DIR) would get the wrong credentials in the scratch dir, because `buildClaudeEnv` strips that env before handing it to the child. We resolve the source up front and feed it to the mirror, so the override env still points at the right bytes. 2. Mirror entries now go through platform-aware linkers. On Windows without Developer Mode / admin, `os.Symlink` is denied, which previously left the scratch dir empty and broke Claude Code auth on default `ignore`. The new helpers try symlink first, then fall back to a directory junction (`mklink /J`) for dirs or a hardlink (same-volume content share) / copy for files. Mirrors the execenv/codex_home_link_windows.go pattern. 3. Tests: - `TestResolveHostClaudeConfigDir` locks in the custom_env > parent_env > `~/.claude` precedence. - `TestNewIsolatedClaudeConfigDirMirrorsCustomHostDir` confirms the scratch dir picks up `.credentials.json` from a synthetic custom host dir, proving the source resolution actually propagates into the mirror. - `TestNewIsolatedClaudeConfigDirEmptyHostIsNoop` documents the env-var-auth-only case (no host source ⇒ empty scratch dir). - `TestMirrorHostClaudeExceptSkillsWith_FallbackWhenSymlinkFails` exercises the Windows-no-Developer-Mode path via the new `mirrorHostClaudeExceptSkillsWith` seam, asserting credentials and sub-dir children still reach the scratch dir after the symlink stand-in fails. - `TestMirrorHostClaudeExceptSkillsWith_PropagatesFirstLinkError` confirms callers see the per-entry error when even fallback fails (so the warn-log fires on broken Windows installs). - `TestCopyFileRoundTrip` covers the last-resort copy fallback and its EXCL no-overwrite contract. - `TestClaudeExecuteIsolatesUsesCustomEnvSource` is the end-to-end check: an agent with custom_env CLAUDE_CONFIG_DIR reads its credentials from the pinned dir, not `~/.claude/`. 4. Docs: `apps/docs/content/docs/skills.{mdx,zh.mdx}` updated to describe the effective-source resolution and the Windows fallback chain so the docs match the runtime behaviour. Verification: - `go test ./...` green (full server suite locally, including `pkg/agent` 23 cases covering the new + existing isolation paths). - `GOOS=windows GOARCH=amd64 go vet ./pkg/agent/...` and `go test -c -o /dev/null` both compile clean, confirming the Windows-tagged linker file builds. Co-authored-by: multica-agent <github@multica.ai> * fix(agent): default skills_local to merge to preserve legacy behavior (MUL-2603) Per Bohan's product decision on PR #3200, the per-agent host-skill toggle defaults to "merge" — the pre-MUL-2603 inherit-from-machine behavior — so existing personal workflows that rely on locally installed Claude Skills keep working unchanged. Agent owners explicitly opt into "ignore" when they need to harden a shared agent against a broken local skill on one operator's machine (GitHub #3052). Also audited all 11 runtimes for user-global skill discovery paths and documented the scope of the toggle. Only Claude reads a user-global `~/.claude/skills/`; Codex isolates via `CODEX_HOME`, the ACP backends (Hermes / Kimi / Kiro) and the JSON-stream backends (Copilot / Cursor / Gemini / Pi / OpenCode / OpenClaw) anchor discovery to the task workdir and never read a user-global skill directory. UI copy and docs now say "for runtimes that support it (currently Claude Code)" everywhere so the scope is explicit. Changes: - Migration 108: column default flipped to 'merge'. - Handler CreateAgent: missing field → "merge"; explicit "ignore" / "merge" still validated, garbage still 400. - normalizeSkillsLocal: drift-safe coercion now lands on "merge" for anything that isn't the exact literal "ignore". - agent_template.go / onboarding_shim.go: internal CreateAgent callers send "merge" instead of "ignore" to match the new default. - Claude runtime (`claude.go`): isolate-mode gate flipped from `SkillsLocal != "merge"` to `SkillsLocal == "ignore"`, so "" (legacy daemons / older clients) and "merge" both walk `~/.claude/` directly. - Create Agent dialog + Skills tab: toggle defaults to on (merge); only duplicate of an explicit "ignore" agent carries through. The isolation opt-in is now `skills_local: "ignore"` when the user flips off; "merge" is omitted from the request body. - i18n (EN + zh-Hans): copy reframed — "On (default) — merged"; "Off — ignored. Recommended for shared agents". - Docs (`/skills`, `/guides/agents.zh`): describe new default and enumerate which runtimes act on the toggle. - Landing changelog 0.3.7: retitled "Per-Agent Local-Skill Toggle"; note the on-by-default behavior + off-to-isolate framing. - Tests: - `TestClaudeExecuteIsolatesHostSkillsWhenIgnoreOptedIn` replaces the old by-default isolation case (now requires explicit "ignore"). - New `TestClaudeExecuteDefaultModeKeepsHostConfigDir` locks in that default ExecOptions preserve the host CLAUDE_CONFIG_DIR. - `TestClaudeExecuteIsolatesUsesCustomEnvSource` now explicitly opts into "ignore" mode. - Handler tests: omitted → "merge"; explicit "ignore" round-trips; preserve-existing test seeds "ignore" and asserts "merge" flip-back. - `TestNormalizeSkillsLocal_DriftStaysSafe`: only literal "ignore" maps to ignore; everything else → "merge". - `skills-tab.test.tsx`: toggle ON by default; flip OFF when agent opted into "ignore". Intro-text matcher anchored to a more specific phrase so it no longer collides with the toggle hint copy. Verification: - `go test ./...` green (full server suite locally). - `GOOS=windows GOARCH=amd64 go vet ./pkg/agent/...` and `go test -c -o /dev/null` both compile clean (windows-tagged linker file still builds). - `pnpm typecheck` green across all packages and apps. - `pnpm --filter @multica/views test` 88 files / 771 tests green. - `pnpm --filter @multica/core test` 43 files / 390 tests green. - Handler DB-backed tests still skip locally without docker; CI will validate the create / update paths against migration 108. Co-authored-by: multica-agent <github@multica.ai> * chore(landing): drop 0.3.7 changelog entry from this PR (MUL-2603) The landing-page release notes belong in a separate release-prep PR, not in the feature PR. Co-authored-by: multica-agent <github@multica.ai> * fix(agent): propagate skills_local=ignore to codex user-skill seed (MUL-2603) Make the per-agent skills_local toggle real for Codex too, not just Claude. Previously the toggle was only consumed by the Claude backend, while the daemon's execenv layer always seeded Codex's per-task CODEX_HOME with the host machine's user-installed skills from ~/.codex/skills/. A shared Codex agent with skills_local=ignore could still inherit a broken local skill from one operator's machine. Now: PrepareParams/ReuseParams carry SkillsLocal; hydrateCodexSkills skips seedUserCodexSkills when SkillsLocal == "ignore" so the per-task CODEX_HOME exposes only workspace skills to the codex CLI. Default ("merge", or empty from older servers/clients) preserves existing inherit-from-machine behavior. UI / docs are updated to reflect the contract honestly: Claude Code and Codex honor the toggle; other runtimes (Hermes / Kimi / Kiro / Copilot / Cursor / Gemini / Pi / OpenCode / OpenClaw) leave $HOME untouched and discover user-level skills natively, so the toggle is a no-op for them today. New tests: TestPrepareCodexSkillsLocalIgnoreSkipsUserSeed, TestPrepareCodexSkillsLocalMergeSeedsUserSkills, and TestReuseCodexSkillsLocalIgnoreSkipsUserSeed cover Prepare(ignore), Prepare(merge), and the toggle-flip-on-reuse path. Co-authored-by: multica-agent <github@multica.ai> * docs(skills): scope skills_local toggle copy to Claude Code + Codex (MUL-2603) Off-state hint and Skills tab intro now explicitly call out Claude Code + Codex as the only runtimes that honor the toggle, with "other runtimes ignore this setting" wired into both states (en + zh-Hans), so users on non-Claude/Codex agents don't read "Off" as runtime-wide isolation. Docs (skills.mdx, skills.zh.mdx, guides/agents.zh.mdx) stop describing Hermes / Kimi / Gemini / Copilot / Cursor / Pi / OpenCode / OpenClaw / Kiro as having native user-level skill discovery; the daemon simply does not manage user-level skill discovery for those runtimes today, and the toggle is a no-op regardless of where it is set. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-26 13:26:33 +08:00
Wes	cfc652aa5f	fix(daemon): close stdin pipe in Pi adapter to deliver EOF (#2188 ) (#3118 ) Pi reads its prompt from argv (positional, see buildPiArgs) and never expects interactive input, so the Pi backend previously left cmd.Stdin nil. Under systemd, the resulting /dev/null character device has been observed not to satisfy Pi's readable-side wait, leaving runs stuck in "working" forever (#2188). Attach an explicit StdinPipe and close it immediately after Start so the child sees an EOF on a FIFO, matching the pattern already used by the Claude, Codex, Hermes, Kiro, and Kimi backends. The fix is defensive on the daemon side because Pi is mid-refactor and is not accepting issues upstream; once Pi itself stops blocking on stdin, this close is still correct (a closed pipe is a no-op for a process that does not read it). Test asserts the structural invariant: a shell-stub `pi` inspects /proc/self/fd/0 and only emits a valid event stream when stdin is a FIFO. If a future change drops the StdinPipe and stdin reverts to /dev/null (char device), the stub exits non-zero and the test fails.	2026-05-25 15:29:09 +08:00
Dmitry	5bc77f2953	fix(pi): strip leaked tool markup safely (#2956 )	2026-05-22 15:46:10 +08:00
Bohan Jiang	a6f19380b2	test(agent): use ForkLock helper to fix ETXTBSY flake in thinking tests (#3062 ) Two thinking tests wrote fake CLI scripts via os.WriteFile and immediately execed them. Under t.Parallel() with the rest of pkg/agent, a sibling test's concurrent fork can inherit our still-open write fd, so Linux returns ETXTBSY at exec time (Go #22315). CI hit this on main as "TestRunCodexDebugModels_ArgvSeenByBinary: fork/exec ...: text file busy". Switch both call sites to the existing writeTestExecutable helper, which holds syscall.ForkLock across OpenFile→Write→Close so no concurrent fork can inherit the write fd. Same pattern the rest of the package already uses (kimi, kiro, codex, claude tests).	2026-05-22 14:53:56 +08:00
YOMXXX	ed2957ddf8	fix(claude): record result model usage (#2899 )	2026-05-21 13:00:12 +08:00
iYuan	2f1f90c11a	fix(agent): retry codex semantic inactivity fresh (#2593 )	2026-05-20 20:03:39 +08:00
YOMXXX	34f16e2c7a	fix(opencode): deny interactive questions in daemon mode (#2878 ) * fix(opencode): deny interactive questions in daemon mode * fix(opencode): avoid permission env ordering bypass	2026-05-20 17:17:31 +08:00
Bohan Jiang	2bec2221d2	feat(agent): per-agent thinking_level for claude + codex (MUL-2339) (#2865 ) * feat(agent): persist thinking_level per agent (MUL-2339) Adds a nullable `thinking_level` column to the `agent` table so the backend can route a runtime-native reasoning/effort token (e.g. Claude's `xhigh`, Codex's `minimal`) through to the agent CLI on every dispatch. The column is intentionally TEXT rather than an enum — Claude and Codex publish overlapping but distinct vocabularies and we want the persisted value to round-trip exactly through whichever CLI receives it. NULL is the "use runtime default" sentinel that every downstream consumer reads as "do not inject --effort / reasoning_effort". This commit is just the storage layer (migration + sqlc); subsequent commits wire it through the API, daemon, and agent backends. Co-authored-by: multica-agent <github@multica.ai> * feat(agent-backend): inject reasoning effort for claude + codex (MUL-2339) Extends ExecOptions with a runtime-native ThinkingLevel string and wires it into the Claude and Codex backends. Discovery is driven by the local CLI so the daemon advertises whatever the host install supports rather than a hand-maintained list that goes stale. Per Elon's PR1 review: - Claude: parses `claude --help` to learn the `--effort` superset and projects through a per-model allow-list (xhigh is Opus-only; max is session-only on the smaller models). Falls back to a conservative static list when the binary is missing or help drift hides the line. - Codex: drives `codex debug models --output json` so per-model reasoning subsets and the documented default come directly from the CLI. The older config-error probe trick is gone — the JSON path is stable and doesn't pollute stderr with an intentional misconfig. - Cache key includes (provider, executablePath, cliVersion) so a CLI upgrade invalidates entries that referenced the older help / catalog. Per Trump's PR1 constraint, all three Codex injection points (thread/start.config, thread/resume.config, turn/start.effort) flow through one helper (`applyCodexReasoningEffort`) so they cannot drift independently. The shared `codexReasoningCases` fixture in `thinking_test.go` asserts the same value→{shape, key} contract at each site for every level the runtimes know about. Claude's `--effort` is also added to `claudeBlockedArgs` so a user custom_args entry can't silently outvote the daemon-injected value. Co-authored-by: multica-agent <github@multica.ai> * feat(api): wire thinking_level through API + daemon contract (MUL-2339) End-to-end plumbing for the per-agent reasoning/effort setting: - AgentResponse / TaskAgentData now carry `thinking_level`; the daemon's claim response includes it and the daemon's executor passes it through to agent.ExecOptions, where the Claude and Codex backends already know what to do with it. - ModelEntry on the runtime-models wire format gains a `thinking` block carrying `supported_levels` + `default_level` per model so the UI can render a runtime-aware picker without the server having to know about the local CLI install. `handleModelList` projects the agent-package catalog (including the new Thinking field) into the wire shape. - CreateAgent / UpdateAgent gate the field with a synchronous provider enum check (claude / codex only today). UpdateAgent is tri-state: field omitted = no change, "" = explicit clear (new `ClearAgentThinkingLevel` query, mirrors the existing mcp_config null pattern), non-empty = validate then set. Per Trump's PR1 review, the API NEVER auto-clears on a runtime/model swap and ALWAYS returns 400 on an unknown literal value — same shape across CreateAgent, UpdateAgent, and combined patches that move runtime + level in one request. Per-model combination failures (e.g. `xhigh` against a model that only supports up to `high`) surface as a daemon-side task error, not a silent server-side rewrite. TS types follow the same shape: `Agent.thinking_level`, `CreateAgentRequest`/`UpdateAgentRequest` add the field, `RuntimeModel` grows a `thinking` block. Older backends omit the field, which the front-end treats as "no picker for this model" — installed desktop builds keep working. Co-authored-by: multica-agent <github@multica.ai> * fix(agent): correct codex debug models argv + pin via runner test (MUL-2339) `codex debug models --output json` is rejected by codex-cli 0.131.0 — the subcommand emits JSON on stdout by default and has no `--output` flag. Drop the flag and add `--bundled` to skip the network refresh discovery doesn't need. Move the argv to a package-level var and add a test that runs a fake `codex` to assert the binary actually receives exactly `debug models --bundled`, so the contract can't silently drift on the next refactor. Also teach ValidateThinkingLevel to resolve an empty model to the provider's default model entry. Without this, every default-model task with a persisted thinking_level would be misjudged "unknown model" by the daemon guard. Co-authored-by: multica-agent <github@multica.ai> * fix(api): reject runtime switch that would leave invalid thinking_level (MUL-2339) A PATCH that changed `runtime_id` without touching `thinking_level` used to silently keep the existing value, so a Claude agent storing `max` could land on a Codex runtime where `max` is not a recognised token at all, and the daemon would receive a literal-invalid level. Hold the same "always 400 on literal-invalid, never silent coerce" rule on this implicit path. When runtime_id changes and the existing value is not in the new provider's enum, return 400 with the recovery options (clear via `thinking_level=""` or re-set in the same PATCH). Add coverage for both the kept-when-still-valid and the rejected cases, plus the two recovery paths (clear and replace). Co-authored-by: multica-agent <github@multica.ai> * fix(daemon): guard runTask with per-model thinking_level validator (MUL-2339) ValidateThinkingLevel existed but had no call site — `task.Agent. ThinkingLevel` flowed straight into ExecOptions, so `xhigh` configured on a non-Opus Claude model, or API-side stale values that escaped the provider enum gate, would be injected anyway. Run the validator before building ExecOptions. Invalid combinations log a warning and drop the level instead of failing the task: the agent still runs, just at the runtime's default reasoning effort. Discovery errors fail open (keep the level, let the CLI surface any objection) so a transient `claude --help` failure can't strand work. Empty model is forwarded as-is; the validator resolves it to the provider's default model internally per the cross-package contract. Co-authored-by: multica-agent <github@multica.ai> * chore(agent): drop stale `--output json` comments + unused scanner (MUL-2339) Codex CLI's `debug models` subcommand emits JSON without an `--output` flag, and `parseCodexDebugModels` never read from the bufio.Scanner. Sync the comments with the actual invocation and remove the dead init. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-20 12:30:10 +08:00
Joey Frasier (Boothe)	76cd8275ff	fix(openclaw): parse whole buffer instead of line-by-line scanner (MUL-1908) (#2292 ) * fix(openclaw): parse whole buffer instead of line-by-line scanner Follow-up to `c87d7676` (WOR-10). The stdout/stderr swap fixed the dominant case but `processOutput` still scanned line-by-line and only attempted a whole-buffer parse from a fragile fallback path. Pretty-printed JSON (openclaw 2026.5.x emits the result blob indented across many lines) made every individual line unparseable on its own — `{`, ` "payloads": [`, ` {`, etc. — so the success path hinged entirely on the fallback joining `rawLines` and re-trying. Under load (daemon restarts racing the close-on-cancel goroutine, partial chunked reads when stdout closes mid-flight) the line scanner could see truncated input that never reassembled into valid JSON, surfacing "openclaw returned no parseable output" against runs where the agent had in fact completed the work and posted comments. Roughly 30–40% of recent runs in v0.2.27 logs hit this path; multica still wrote a `task_failed` inbox row for each one even though the underlying issue had moved to `in_review` or `done`. The fix: - processOutput now reads the full stdout buffer with `io.ReadAll` first. - A new `parseWholeBufferOpenclawResult` helper attempts a single `json.Unmarshal` against the entire buffer (after trimming, and after optionally stripping leading non-JSON log lines). When it matches, we build the result and return — the line scanner never runs. - If the whole-buffer parse fails, we fall through to the existing NDJSON line-by-line scanner. This preserves streaming-event support (kept for forward compatibility and other backends) without leaving openclaw's dominant pretty-printed shape at the mercy of timing. - The failure path now emits a `(got N bytes; preview: ...)` suffix on the canonical "no parseable output" error so future debugging isn't blind. The exact canonical phrase is preserved for empty buffers so existing dashboards / log-grep tooling keep matching. Tests: - TestOpenclawProcessOutputWholeBufferPrettyJSON: feeds a hand-crafted multi-line indented blob (multiple payloads, nested agentMeta, usage map) and asserts every field round-trips through the whole-buffer fast path. - TestOpenclawProcessOutputDeeplyIndentedFixture: re-runs the recorded openclaw 2026.5.5 stdout fixture (1070 lines) directly through parseWholeBufferOpenclawResult, asserting the bug-shape parses cleanly on the first attempt without falling through to NDJSON scanning. - TestOpenclawProcessOutputEmptyBufferErrorIncludesByteCount: tightens the empty-buffer failure path, asserts the canonical phrase survives so observability tooling keeps working. All existing tests in the openclaw + buildOpenclawArgs suites stay green (streaming NDJSON event tests, lifecycle tests, structured-error tests, usage-field-variant tests). The two pre-existing flaky timeout-tight codex tests (TestCodexExecuteSemanticInactivityAllowsContinuous) fail on both this branch and on `c87d7676` baseline; they are unrelated and out of scope here. Co-authored-by: multica-agent <github@multica.ai> fix(openclaw): drop dead preview branch, document streaming regression Rebase + review-fix follow-up on top of f27df2d9b. processOutput's preview branch was unreachable: openclawNoParseableOutputError was only called from the `!gotEvents && trimmed == ""` path, which by construction means the entire scanned buffer collapsed to whitespace, so the `(got N bytes; preview: ...)` formatter could never fire on a non-empty buffer. Replace the helper with a single canonical-string constant (callsite is now inline) and update the test name to match what it actually asserts (the canonical empty-buffer error string is preserved for external log-grep / dashboard consumers). Also document on processOutput that the line-scanner path is no longer truly streaming after the io.ReadAll switch: events accumulate until stdout closes. OpenClaw 2026.5.x does not emit streaming events so this regression is invisible today, but flag it for the next backend that might. Misc: switch the scanner's input source from `strings.NewReader(string(buf))` to `bytes.NewReader(buf)` to drop one unnecessary byte/string round-trip. MUL-1908 Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai> Co-authored-by: J (Multica agent) <j@multica.local>	2026-05-19 17:42:41 +08:00
Bohan Jiang	9a577f3e11	fix(runtimes): anchor OpenCode skill + AGENTS.md discovery to task workdir (MUL-2416) (#2849 ) * fix(runtimes): anchor OpenCode skill + AGENTS.md discovery to task workdir OpenCode resolves its project discovery root from `--dir` and `PWD` before falling back to `process.cwd()`. The daemon set `cmd.Dir = workDir` but never overrode the inherited `PWD`, so OpenCode walked from the daemon's shell directory and silently bypassed the per-task workdir — agents lost visibility into `.opencode/skills/` and `AGENTS.md`, falling back to whatever global skills the host had installed (MUL-2416). - Pass `opencode run --dir <workDir>` and override `PWD=<workDir>` in the child env so AGENTS.md walk-up + `.opencode/skills` project config scan both anchor on the task workdir. - Block `--dir` from custom args so user overrides cannot re-introduce the regression. - Plumb skill `description` from DB through service / daemon / execenv. `writeSkillFiles` synthesizes a YAML frontmatter block (`name`, optional `description`) when the stored content lacks one, since runtimes like OpenCode silently drop SKILL.md files without a parseable `name`. Existing frontmatter is preserved unchanged so upstream-imported skills (GitHub / ClawHub / Skills.sh) keep their hand-shaped metadata. Tests: - New fake-CLI test confirms argv carries `--dir <workDir>` and the child sees `PWD=<workDir>`. - New test confirms a user-supplied `--dir` in custom_args is dropped. - New execenv tests cover synthesized frontmatter and preservation of pre-existing frontmatter. Co-authored-by: multica-agent <github@multica.ai> * fix(runtimes): inject SKILL.md `name` when upstream frontmatter omits it Skills imported with frontmatter that sets `description` but leaves `name` implicit (relying on the directory slug, as common in GitHub/Skills.sh imports) still hit OpenCode's "no parseable name → drop" path because the DB Name fallback never made it into the SKILL.md body. ensureSkillFrontmatter now scans the existing block and, when name is missing or empty, prepends `name: <slug>` while preserving description, body, and any runtime-specific keys verbatim. Also tighten yamlEscapeInline to always double-quote so descriptions that look like YAML keywords (`null`, `true`, `[foo]`, `{x: y}`, `2024-01-01`) parse as strings rather than getting reinterpreted and rejected. Adds regression test for the nameless-frontmatter case and updates the existing OpenCode skill test for the always-quoted description format. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-19 16:21:02 +08:00
Bohan Jiang	e8d4b9a0a2	revert: drop exec_command watchdog (#2779 , #2786 ) (MUL-2337) (#2803 ) * Revert "fix(codex): bump default exec_command stuck timeout to 3 minutes (#2786)" This reverts commit `433cd1aaf5`. Co-authored-by: multica-agent <github@multica.ai> * Revert "feat(codex): add per-exec_command watchdog to escape dropped function_call_output (MUL-2337) (#2779)" This reverts commit `60bae62622`. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-18 18:08:07 +08:00
Bohan Jiang	433cd1aaf5	fix(codex): bump default exec_command stuck timeout to 3 minutes (#2786 ) The watchdog fires on a "no progress" window, so the default mainly matters for commands that go fully silent (no outputDelta). Bumping from 2m → 3m leaves more headroom for legitimately slow silent commands before treating them as a dropped function_call_output, at a modest cost to recovery latency. MUL-2337 Co-authored-by: multica-agent <github@multica.ai>	2026-05-18 15:30:05 +08:00
Bohan Jiang	60bae62622	feat(codex): add per-exec_command watchdog to escape dropped function_call_output (MUL-2337) (#2779 ) * feat(codex): add per-exec_command watchdog to escape dropped function_call_output (MUL-2337) Codex app-server can drop the second function_call_output when two exec_command calls fan out in the same turn and both async-yield through the yield_time_ms boundary (observed 2026-05-18, MUL-2334 — Trump Agent wedged for 6+ min with no semantic activity events to drive any existing timer). The model then waits forever for the missing output; only the 10-minute semantic inactivity timeout would eventually rescue the run. Add a per-call watchdog in the codex client that tracks open exec_command / commandExecution items by call_id and fails the turn quickly (default 2 min, configurable via ExecOptions.ExecCommandStuckTimeout) when one stays open without progress. outputDelta events reset the per-call progress timestamp so long-running streaming commands aren't flagged. This is a daemon-side mitigation only — codex itself still has the upstream race, but the daemon no longer burns the full inactivity budget before the run is marked failed and a new run can recover. Co-authored-by: multica-agent <github@multica.ai> * feat(codex): track legacy exec_command_output_delta in watchdog (MUL-2337) Mirrors the raw v2 item/commandExecution/outputDelta refresh on the legacy codex/event protocol so a long-running streaming exec doesn't get falsely flagged as stuck after begin + 2 min. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-18 15:14:45 +08:00
Bohan Jiang	113c4f4e90	docs(agent): clarify openclaw agent id vs name semantics (#2744 ) Follow-up to #2716. Updates two stale comments that still described openclaw's `name` and `id` as interchangeable. The actual contract: `id` is the routing key passed to `openclaw agent --agent <id>`; `name` is a human display label and is not safe to pass to the CLI. No behavior change. Co-authored-by: multica-agent <github@multica.ai>	2026-05-17 17:20:41 +08:00
Kagura	44d2fc1946	fix(agent): use openclaw agent id instead of name for --agent flag (#2716 ) openclawEntriesToModels() used the agent Name (which may contain spaces, e.g. "Sub2API OPS") as Model.ID. This ID is passed to openclaw via --agent, where normalizeAgentId mangles spaces into hyphens ("sub2api-ops"), causing a lookup miss against the registered id ("sub2api") and a "no parseable output" error. Fix: prefer agent ID for Model.ID; use Name only for display Label. When ID is empty, fall back to Name for backward compatibility. Fixes #2714	2026-05-17 17:08:00 +08:00
Bohan Jiang	8d872b7521	fix(daemon): disable Claude AskUserQuestion in non-interactive mode (MUL-2244) (#2656 ) * fix(daemon): disable Claude AskUserQuestion in non-interactive mode (MUL-2244) GitHub #2588: when Claude Code calls its built-in AskUserQuestion tool inside the daemon's stream-json runtime, the question never reaches the user — there's no UI to render it — so the SDK returns an empty answer and the agent silently "infers" and continues. From the issue's perspective, execution looks stuck while the agent is actually charging ahead on its own guess. Two-part fix: - `buildClaudeArgs` now passes `--disallowedTools AskUserQuestion` so the tool is not exposed to the model at all. - The Claude-specific runtime brief tells the agent to use a `blocked` issue comment for genuine clarification, or to state an explicit assumption and proceed. Adds a regression test that pins both: AskUserQuestion is forbidden in CLAUDE.md and is NOT mentioned in the AGENTS.md emitted for non-Claude providers (the tool is Claude-specific). Co-authored-by: multica-agent <github@multica.ai> * refactor(daemon): drop CLAUDE.md AskUserQuestion guidance, rely on --disallowedTools The --disallowedTools flag already prevents Claude from invoking AskUserQuestion, so duplicating the rule in the runtime brief just bloats the prompt without changing behavior. Removes the section and its regression test; the argv-level test in pkg/agent already pins the flag. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-15 12:42:23 +08:00
fr00st	cc9fbd3db0	Fix stale Done replies on comment follow-ups (#2495 ) * fix: avoid stale done replies on comment follow-ups * fix: avoid inlining runtime brief for Hermes ACP * fix: address comment follow-up review feedback	2026-05-14 12:00:04 +08:00
Bohan Jiang	5db96b4007	fix(daemon): bypass Gemini folder-trust gate in headless mode (#2516 ) (#2523 ) Gemini CLI's folder-trust feature throws FatalUntrustedWorkspaceError (exit code 55) when the current workspace isn't in `~/.gemini/trustedFolders.json` and the process is headless — no interactive trust prompt is available. The daemon spawns gemini with `-p` + `--yolo` in a freshly checked-out worktree that the user has never trusted interactively, so every run with `security.folderTrust` enabled fails after ~10s with exit status 55 and no useful output. Default `GEMINI_CLI_TRUST_WORKSPACE=true` on the child env to short- circuit `checkPathTrust` in gemini-core. This mirrors gemini-cli's documented `--skip-trust` flag; the env var has been gemini's documented headless escape hatch for the entire folder-trust feature lifetime so the fix works on every gemini version that can produce the crash. Callers that explicitly set the same key in cfg.Env win, preserving the ability to opt back into the gate. Co-authored-by: multica-agent <github@multica.ai>	2026-05-13 17:05:12 +08:00
Bohan Jiang	178cfb5008	fix(daemon): strip Windows chcp noise from runtime version (#2516 ) (#2521 ) The gemini CLI's Windows shim emits `Active code page: 65001` (from `chcp`) to stdout before the real version reaches `--version` output. The daemon stored the raw concatenation as the runtime version, so the runtime detail page rendered `Active code page: 65001 0.42.0` instead of `0.42.0`. Scan `<cli> --version` line by line and return the first line carrying a semver-shaped token. Full strings like `2.1.5 (Claude Code)` or `codex-cli 0.118.0` survive unchanged; unparseable output falls back to the trimmed raw value. Co-authored-by: multica-agent <github@multica.ai>	2026-05-13 16:58:14 +08:00
Kagura	702c48209b	fix(agent): stop filtering Pi extension tools via hardcoded --tools allowlist (#2379 ) (#2381 ) The Pi backend hardcoded `--tools read,bash,edit,write,grep,find,ls` in buildPiArgs. Pi's SDK treats --tools as a restrictive allowlist: only the listed tools pass through `_refreshToolRegistry()`, silently filtering out any user-installed extension tools registered via `pi.registerTool()`. Omitting --tools makes Pi's `allowedToolNames` undefined, so the `isAllowedTool()` filter becomes a no-op and all tools — built-in and extension — are available. This matches Pi's standalone behavior. Users who want to restrict tools can still pass --tools via custom_args (it is not in piBlockedArgs). Closes #2379	2026-05-11 16:11:32 +08:00
Multica Eve	e79ffc0f01	fix(agent): expand Copilot CLI model catalog with correct dotted IDs (#2336 ) * fix(agent): expand Copilot CLI model catalog with correct dotted IDs The Copilot CLI provider only exposed two models in the runtime dropdown, and one of them used the dashed legacy form `claude-sonnet-4-6` which `copilot --model` rejects with "Model ... is not available". The CLI accepts dotted IDs (e.g. `claude-sonnet-4.6`, `gpt-5.4`). Sync `copilotStaticModels()` with the official supported-models catalog so the dropdown surfaces the full set the user's account can route to (8 OpenAI + 4 Anthropic), and add a regression test that pins the expected IDs and bans the dashed form. Closes MUL-1948. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: multica-agent <github@multica.ai> * feat(agent): dynamic Copilot model discovery via ACP session/new The previous static catalog could only ever lag behind the user's real entitlements and what GitHub ships. Copilot CLI exposes the live catalog through its ACP server (`copilot --acp`): the `session/new` response includes `models.availableModels` plus `currentModelId`, scoped to the authenticated account. Wire copilot through the existing discoverACPModels helper — already used by hermes/kimi/kiro — so the dropdown reflects the account's real catalog, including the `auto` entry and per-tier model availability (Pro / Pro+ / Enterprise / evaluation models). The Copilot CLI puts itself into ACP server mode via the `--acp` flag instead of an `acp` subcommand, so acpDiscoveryProvider now takes an optional acpArgs override. Copilot's ACP payload omits the vendor name, so a small prefix-based inferCopilotProvider keeps the UI's openai / anthropic / google grouping working. When the binary is missing or auth fails, fall back to copilotStaticModels() so self-hosted runtimes without a copilot install still see a populated dropdown. Verified against `copilot 1.0.44`: live discovery returns 13 models with gpt-5.5 marked Default. Closes MUL-1948. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: multica-agent <github@multica.ai> * fix(agent): drop no-op COPILOT_ALLOW_ALL env and generalize OpenAI o-series prefix check - discoverCopilotModels: remove COPILOT_ALLOW_ALL=1 (not a real Copilot CLI env var; copy-pasta from HERMES_YOLO_MODE=1). Discovery only drives initialize + session/new which never trigger tool-permission prompts, so no extra env is needed. - inferCopilotProvider: replace the o1/o3/o4 prefix chain with a generic o<digit>+ check via isOpenAIReasoningSeriesID, so future o5/o6/… reasoning models are tagged as openai automatically. Guards against false positives like 'opus-…' or bare 'o'. - Extend TestInferCopilotProvider with o5/o6 forward-compat cases and negative cases (opus-fake, omni, o). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: Eve <eve@multica-ai.local> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: multica-agent <github@multica.ai>	2026-05-11 14:36:43 +08:00
Multica Eve	72e89a74f3	fix: surface copilot failure details (#2396 ) Co-authored-by: Eve <eve@multica-ai.local> Co-authored-by: multica-agent <github@multica.ai>	2026-05-11 14:08:33 +08:00
Bohan Jiang	b73a301bf9	fix(agent): drain stderr before deciding ACP failure promotion (#2333 ) `hermes`, `kimi`, and `kiro` all wired stderr through `cmd.Stderr = io.MultiWriter(logWriter, providerErrSniffer)`. The OS-pipe → MultiWriter copy goroutine that exec spawns for that form is only joined by `cmd.Wait()`, which the lifecycle goroutine fires in deferred cleanup — after `promoteACPResultOnProviderError` already consulted the sniffer. When stopReason=end_turn (success) raced ahead of the stderr drain, the sniffer's `lines` slice was empty, the helper fell through to the synthetic agent-text fallback ("hermes provider error: API call failed after 3 retries"), and the actionable upstream signal (HTTP 429 / usage limit) was lost. This was visible as a flaky `TestHermesBackendPromotesProviderErrorWithNonEmptyOutput` in CI under high parallelism — a real prod bug, not a test issue: live runs hit the same race when an upstream LLM returns 429 and hermes' synthetic agent turn beats the stderr drain to the parent. Replace the MultiWriter wiring with `cmd.StderrPipe()` + an explicit copier goroutine that signals on `stderrDone`. The lifecycle goroutine already awaits `<-readerDone` for stdout; add `<-stderrDone` next to it before `promoteACPResultOnProviderError` runs. The deferred `cmd.Wait()` ordering is unchanged — it just becomes a cheap reap by the time it fires. Verified: `go test ./pkg/agent/ -run "TestHermes\|TestKimi\|TestKiro" -count=10 -race`, then full package `-count=3 -race`, all green. Co-authored-by: multica-agent <github@multica.ai>	2026-05-09 17:34:25 +08:00
LinYushen	f70105fb12	fix(agent): include JSON-RPC error data field in ACP error messages (#2327 ) ACP backends (Kiro, Hermes, Kimi) put the actionable reason for code=-32603 'Internal error' in the JSON-RPC `data` field, e.g. "No session found with id". The wrapped Go error only carried `code` and `message`, leaving operators staring at a bare "kiro session/prompt failed: session/prompt: Internal error (code=-32603)" with no way to tell apart session expiry, model unavailability, lost auth, or quota. Parse `data` too. Strings render unquoted; objects/arrays render as raw JSON; null/missing keeps the previous format unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-09 16:19:57 +08:00
Bohan Jiang	c57546159d	fix(daemon): mark provider 429 / out-of-credit agent runs as failed, not completed (#2323 ) * fix(daemon): mark provider 429 / out-of-credit runs as failed, not completed Two bugs combined to silently report failed agent runs as "Completed" in the UI when the upstream LLM returned a 4xx (e.g. HTTP 429 rate-limit / no credit on the account). 1. ACP backends (hermes, kimi, kiro) only promoted the run status to "failed" when their stderr sniffer fired AND the agent output buffer was empty. But hermes injects a synthetic agent text turn ("API call failed after 3 retries: HTTP 429...") on retry exhaustion, so the buffer was never empty in the rate-limit case and the promotion never ran. Drop the empty-output precondition: the sniffer's regex (HTTP-status markers, named error types) is specific enough to trust on its own. 2. The daemon's task-result switch only routed "blocked" through FailTask; every other status — including "cancelled", and any future status we forget to enumerate — fell through to CompleteTask. Invert it so only an explicit "completed" status reports success, and extract the switch into reportTaskResult for direct testing. Cancelled now defaults to failure_reason "cancelled" instead of being silently completed. Closes GitHub multica#1952. Co-authored-by: multica-agent <github@multica.ai> * fix(agent): only promote ACP run to failed on terminal provider error Address GPT-Boy's review on the multica#1952 fix. The previous promotion rule ("any sniffer line → fail") was too broad: the existing sniffer also captures transient per-attempt warnings ("API call failed (attempt 1/3): RateLimitError [HTTP 429]"), and those lines stay in the buffer for the rest of the run. A retry sequence whose first attempt blipped but whose third attempt succeeded would have been wrongly reported as failed. Tighten the criteria with two additional signals, both defined on the existing acpProviderErrorSniffer / output buffer: - acpTerminalErrorRe — sticky `terminal` flag set when stderr shows an exhausted/non-retryable marker (❌, [ERROR], "after N retries", Non-retryable, BadRequestError, AuthenticationError). Per-attempt warnings deliberately don't match. - acpAgentOutputTerminalRe — matches the synthetic "API call failed after N retries..." turn that hermes-style adapters inject into the agent text stream when they give up; this catches multica#1952 even if hermes' stderr only logged transient attempts. Promotion logic becomes a shared helper, promoteACPResultOnProviderError, called from hermes / kimi / kiro. Promotes when (a) terminalMessage is non-empty, (b) output contains the synthetic give-up turn, or (c) output is empty and the sniffer captured anything at all (preserves the original empty-output safety net for transient-only sequences with no real result to fall back on). Tests: - TestHermesProviderErrorSnifferTerminalVsTransient — transient attempt 1/3 alone returns terminalMessage="" but message!=""; a follow-on terminal marker flips terminal on. - TestHermesProviderErrorSnifferTerminalNonRetryable — confirms BadRequest / Authentication / Non-retryable / ❌ / [ERROR] are classified terminal even on the very first attempt. - TestHermesBackendDoesNotPromoteOnTransientRetry — fake hermes emits attempt 1/3 to stderr then a normal agent text turn and end_turn; resulting Status must stay "completed". Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>	2026-05-09 16:13:12 +08:00
Bohan Jiang	0eb23df234	fix(agent): scope pi colon-to-slash normalization to legacy format (#2309 ) PR #2281 added table-format support to parsePiModels but kept the unconditional `strings.Replace(":", "/", 1)`, which would silently rewrite a `:` inside a model name read from column 1 of the table output (e.g. `claude-sonnet-4-6:exp` would become `claude-sonnet-4-6/exp`). Move the replace into the legacy `provider:model` branch so only the colon-as-separator case is normalized, and restore a short doc comment describing the dual- format contract. Test extended with a colon-bearing table row. Co-authored-by: multica-agent <github@multica.ai>	2026-05-09 13:56:49 +08:00

1 2 3

131 Commits