Commit Graph

133 Commits

Author SHA1 Message Date
Bohan Jiang
b1c8eb5f11 feat: support Claude Fable 5 pricing (#3982)
Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-10 12:33:27 +08:00
shutcode
a2ef95445b MUL-2794 fix(agent): stop Cursor sessions on terminal result (#3165)
Treats Cursor's stream-json terminal `result` event as the protocol completion boundary so a lingering Cursor worker process can no longer hold the daemon task open after the agent has produced its final result.

- Tighten `cmd.WaitDelay` to 500ms (set before `Start()`)
- Set `resultSeen` and `cancel()` on terminal `result`
- Preserve completed/failed status across the cancellation via two `!resultSeen` guards in the post-loop status decision
- Add unix fake-CLI coverage for success and `is_error` terminal results
2026-06-09 16:49:48 +08:00
elrrrrrrr
254ec945f5 fix(agent/codex): shut down gracefully so OTEL telemetry flushes (#3888)
Codex telemetry was never reaching the OTLP collector for tasks run by the
daemon. The per-task config (including the [otel] block) is copied into
CODEX_HOME correctly, but the lifecycle goroutine closed stdin and then
immediately cancelled the run context, which SIGKILLs the app-server. Codex's
OTEL batch exporters only force-flush on a graceful shutdown, so the buffered
spans/metrics/logs were dropped before they could be exported — short tasks
lost everything, long tasks lost the final batch.

Let codex exit on its own after stdin EOF (running its shutdown + flush path)
and only force-cancel after a bounded grace period if it doesn't, so the reader
goroutine still can't block forever. Also set cmd.WaitDelay, matching the other
long-lived backends (claude, copilot, cursor, …).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 14:46:07 +08:00
Bohan Jiang
1ddf89a8f2 feat(daemon): enable Antigravity (agy) per-agent model selection (MUL-3125) (#3894)
* feat(daemon): wire agy --model and model discovery for Antigravity

agy 1.0.6 added a --model flag and an `agy models` catalog command, which
were the #1 blocker in the earlier agy-backend review (MUL-3125). The
antigravity backend already shipped but deliberately dropped opts.Model
because agy 1.0.1 had no way to select a model.

- buildAntigravityArgs now passes --model <display name> when opts.Model is
  set; the value is the exact `agy models` display string (spaces + parens),
  passed as a single exec arg so no shell quoting is needed.
- Block --model in custom_args so it can't override the managed value.
- ListModels("antigravity") enumerates via `agy models` (no static fallback:
  agy silently no-ops on unrecognised models, so a stale guess would turn a
  typo into a successful empty run).
- ModelSelectionSupported now returns true for every built-in provider; the
  hook stays for any future model-less runtime.
- Daemon probe reads MULTICA_ANTIGRAVITY_MODEL for the daemon-wide default.

Co-authored-by: multica-agent <github@multica.ai>

* docs(providers): mark Antigravity model selection as supported

Antigravity gained --model in agy 1.0.6 (MUL-3125). Update the provider
matrix + prose (en/zh/ja/ko) from "managed internally / no --model" to
dynamic discovery via `agy models`, and refresh the now-stale picker
comments. Flag the display-string (not slug) shape and agy's silent no-op
on unrecognised values.

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): reject unknown Antigravity model at spawn (MUL-3125)

agy exits 0 with empty output on an unrecognised --model, so a stale/typo'd
value would surface as a 'completed' but empty task. Validate opts.Model
against the `agy models` catalog in Execute before spawning: a non-empty
model the CLI does not advertise fails fast with an actionable error listing
the real choices. opts.Model is the single funnel for agent.model and the
MULTICA_ANTIGRAVITY_MODEL default, so this one check covers every source
(UI free-text, API, persisted value, env) — addressing Elon's review that a
UI-only guard is bypassable.

Validation is fail-OPEN: if the catalog can't be discovered we pass the
value through and let agy resolve it, so a discovery hiccup never blocks a
run. Pure antigravityModelError() is unit-tested (valid / unknown / near-miss
/ empty-model / empty-catalog); verified live against real agy 1.0.6.
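
A sketch of the fail-open check, assuming the catalog is a plain slice of display strings. `modelError` is a stand-in written for this note, not the actual antigravityModelError() body, and the model names are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// modelError returns an error only when model is non-empty, the catalog is
// non-empty, and the model is not listed. An empty catalog is treated as
// "discovery failed", so the check fails open.
func modelError(model string, catalog []string) error {
	if model == "" || len(catalog) == 0 {
		return nil
	}
	for _, name := range catalog {
		if name == model {
			return nil
		}
	}
	return fmt.Errorf("unknown model %q; available models: %s",
		model, strings.Join(catalog, ", "))
}

func main() {
	catalog := []string{"Example Model (High)", "Example Model (Low)"} // hypothetical display strings
	fmt.Println(modelError("Example Model (High)", catalog))
	fmt.Println(modelError("example model", catalog))
	fmt.Println(modelError("anything", nil)) // empty catalog: pass through
}
```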

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-08 15:32:53 +08:00
Bohan Jiang
3808049361 fix(codex): set semantic thread names (#3887)
Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-08 14:53:31 +08:00
Multica Eve
14f89bc08a Fix Claude control request handling (#3827)
Co-authored-by: Eve <eve@multica-ai.local>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-05 17:14:33 +08:00
Bohan Jiang
3708fb0f07 fix(daemon): inactivity-based agent run timeout, no wall-clock guillotine (MUL-3064)
Active long-running sessions are no longer killed by a fixed wall-clock deadline. Liveness is delegated to the idle watchdog (MULTICA_AGENT_IDLE_WATCHDOG, default 30m) with a larger in-flight-tool budget (MULTICA_AGENT_TOOL_WATCHDOG, default 2h). MULTICA_AGENT_TIMEOUT is an opt-in absolute cap (default 0 = no cap). The server-side 2.5h sweeper is unchanged as a coarse backstop.

Fixes #3745.
2026-06-05 15:06:07 +08:00
Bohan Jiang
76dbb87762 fix(agent): standardize model-discovery timeouts to 15s, stop caching empty results
Raise pi and cursor model-list discovery timeouts 5s->15s to match opencode/ACP; openclaw stays 30s (sequential multi-spawn). Stop caching empty discovery results so a transient timeout doesn't keep the picker blank for the full TTL. Fixes #3729. MUL-2977.
2026-06-05 14:47:59 +08:00
Bohan Jiang
c9ceaee4d9 fix(agent): stop stripping user-facing CLAUDE_CODE_* config from child env (#3690)
* fix(agent): stop stripping user-facing CLAUDE_CODE_* config from child env

isFilteredChildEnvKey blanket-removed every CLAUDE_CODE_* var from the
spawned Claude Code child's environment. The intent was only to keep the
daemon's internal session markers from leaking, but CLAUDE_CODE_* is also
Anthropic's user-facing config namespace. On Windows this stripped the
user-set CLAUDE_CODE_GIT_BASH_PATH, so Claude Code could not locate
bash.exe, exited immediately, and every task failed with
"write claude input: write |1: The pipe has been ended."

Switch from prefixing the whole CLAUDE_CODE_ namespace to an exact-name
denylist of the internal runtime/session markers (CLAUDECODE,
CLAUDE_CODE_ENTRYPOINT, CLAUDE_CODE_EXECPATH, CLAUDE_CODE_SESSION_ID,
CLAUDE_CODE_TMPDIR, CLAUDE_CODE_SSE_PORT), still blanket-stripping the
wholly-internal CLAUDECODE_* namespace. Every other CLAUDE_CODE_* var
(GIT_BASH_PATH, USE_BEDROCK, USE_VERTEX, MAX_OUTPUT_TOKENS, ...) now
reaches the child. The internal-marker set was confirmed against the live
runtime, not guessed.

Fixes the whole class, not just git-bash: Bedrock/Vertex/etc. were
silently dropped the same way.

MUL-2940

Co-authored-by: multica-agent <github@multica.ai>

* fix(agent): keep CLAUDE_CODE_TMPDIR in child env

CLAUDE_CODE_TMPDIR is a documented, user-configurable temp-dir override
(public env-vars reference), not an internal per-session marker. Claude
Code creates its own per-session subdir under it, so inheriting it is
harmless — and stripping it would silently break a user's temp-dir
override the same way the broad prefix filter broke CLAUDE_CODE_GIT_BASH_PATH.

Drop it from the internal denylist (which now holds only the undocumented
per-process runtime markers: CLAUDECODE, CLAUDE_CODE_ENTRYPOINT,
CLAUDE_CODE_EXECPATH, CLAUDE_CODE_SESSION_ID, CLAUDE_CODE_SSE_PORT) and
assert it reaches the child.
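
For illustration, the resulting filter could look like the sketch below. `isInternalMarker` is a stand-in for the real isFilteredChildEnvKey, whose body is not shown here; the names in the set are the five markers listed above.

```go
package main

import (
	"fmt"
	"strings"
)

// internalMarkers is the exact-name denylist of per-process runtime markers.
var internalMarkers = map[string]bool{
	"CLAUDECODE":             true,
	"CLAUDE_CODE_ENTRYPOINT": true,
	"CLAUDE_CODE_EXECPATH":   true,
	"CLAUDE_CODE_SESSION_ID": true,
	"CLAUDE_CODE_SSE_PORT":   true,
}

// isInternalMarker reports whether key should be stripped from the child env.
func isInternalMarker(key string) bool {
	// The CLAUDECODE_* namespace is treated as wholly internal.
	if strings.HasPrefix(key, "CLAUDECODE_") {
		return true
	}
	return internalMarkers[key]
}

func main() {
	for _, k := range []string{
		"CLAUDE_CODE_SESSION_ID", "CLAUDE_CODE_GIT_BASH_PATH",
		"CLAUDE_CODE_TMPDIR", "CLAUDE_CODE_USE_BEDROCK",
	} {
		fmt.Printf("%s stripped=%v\n", k, isInternalMarker(k))
	}
}
```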

MUL-2940

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-05 12:29:55 +08:00
Bohan Jiang
c99c2493ae fix(agent): keep resolvable models when CLI discovery exits non-zero
Parse the discovered catalog even when the model-discovery CLI exits non-zero (pi/opencode/cursor/openclaw) instead of discarding it and returning an empty model picker. Filter pi diagnostic lines so stale-pattern warnings don't coin bogus models. Fixes #3729. MUL-2977.
2026-06-04 14:57:30 +08:00
Bohan Jiang
fcb5099ec5 fix(agent): raise opencode model-discovery timeout to 15s (MUL-2888) (#3689)
Newer opencode (1.15+) syncs its hosted free-model catalog over the
network on `opencode models`, which can take ~6s. The previous 5s cap
killed the command, discoverOpenCodeModels returned an empty list, and
the daemon reported it as a successful empty result — so the runtime
showed online but the model picker was empty ("暂无可用模型").

Fixes #3627

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-03 12:18:43 +08:00
Multica Eve
e2720f7d33 feat: add opencode thinking variants
Adds OpenCode model variant discovery for thinking controls, passes saved thinking_level through opencode run --variant, and hardens verbose model parsing with fallback coverage.
2026-06-02 13:15:14 +08:00
Jan De Dobbeleer
1e1a4f7845 fix(daemon): fix Copilot CLI invocation on Windows and strip shell quotes from custom args (MUL-2876)
Bug 1: detect copilot.cmd/.bat on Windows and invoke the sibling .ps1 directly via powershell -File, bypassing cmd.exe %* re-tokenisation that mangled the multi-line -p prompt. Shared rewriteCmdToPS1() now serves cursor, pi, and copilot.

Bug 2: filterCustomArgs (shared by all agent backends) strips one outer layer of shell quotes via unshellQuoteArg() before processing, so shell-style custom args like --deny-tool='write' no longer reach the CLI with literal quotes.
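
A sketch of the one-layer unquoting in Bug 2, under the assumption that quotes are stripped either around the whole argument or around the value after the first `=` of a flag. `unquoteOuter` is a hypothetical stand-in for unshellQuoteArg(), whose exact rules are not spelled out here.

```go
package main

import (
	"fmt"
	"strings"
)

// stripPair removes one matching pair of outer single or double quotes.
func stripPair(s string) (string, bool) {
	if len(s) >= 2 && (s[0] == '\'' || s[0] == '"') && s[len(s)-1] == s[0] {
		return s[1 : len(s)-1], true
	}
	return s, false
}

// unquoteOuter strips exactly one outer layer of shell-style quotes.
func unquoteOuter(arg string) string {
	// Whole-argument quoting: 'value' or "value".
	if s, ok := stripPair(arg); ok {
		return s
	}
	// Flag-value quoting: --flag='value'.
	if i := strings.IndexByte(arg, '='); i >= 0 && strings.HasPrefix(arg, "-") {
		if v, ok := stripPair(arg[i+1:]); ok {
			return arg[:i+1] + v
		}
	}
	return arg
}

func main() {
	fmt.Println(unquoteOuter("--deny-tool='write'"))
	fmt.Println(unquoteOuter("plain"))
}
```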
2026-06-01 23:28:51 +08:00
Mohammed Helaiwa
d4b97dc44a fix(agent): drain claude stdout while writing prompt to stdin (#3490)
The claude backend wrote the full prompt to the child's stdin and closed
it before starting the stdout reader goroutine. With
--verbose --output-format stream-json the CLI emits a startup banner
before reading its first stdin frame; with no reader draining stdout, the
child blocks on its stdout write, never reads stdin, and our stdin Write
blocks until the per-task context fires. The field symptom is tasks
failing exactly at the 2 h per-task timeout with
"write |1: The pipe has been ended."

Move writeClaudeInput into its own goroutine so the prompt write and the
stdout drain proceed concurrently. Guard stdin close with sync.Once, since
it can now be called from the new writer goroutine as well as from the
result handler that already closed it. Join the write result at cmd.Wait() and surface a write
failure as a "failed" status only when no result event arrived and no
session was established, so a genuine startup death still reports the
stderr tail.

Add a regression test that re-execs the test binary as a fake claude
which bursts 256 KiB to stdout before reading stdin, with a 128 KiB
prompt pushed at stdin — both past any plausible OS pipe buffer — so a
regression hangs until the test deadline instead of passing.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 13:52:41 +08:00
feifeigood
382cdd6a0b feat(agent): consume OpenCode mcp_config via OPENCODE_CONFIG_CONTENT (#3098)
Closes the runtime-side gap of #2106: previously `agent.mcp_config` was
honored only by Claude Code (via `--mcp-config <file>`); for OpenCode the
field was accepted by the API but silently ignored at execution time.

## Approach

OpenCode has no `--mcp-config` flag. Project the agent's `mcp_config`
into OpenCode via OPENCODE_CONFIG_CONTENT — OpenCode's general
inline-config injection environment variable, which accepts any subset
of OpenCode's config schema (model / agent / mode / plugin / mcp / …)
and merges at "local" scope after the project-config loop. MCP is the
only field this PR projects through that channel; if a future Multica
field needs the same channel it would assemble a combined config slice
before the env append.

The env-var route was deliberate. An earlier draft of this PR wrote
the translated MCP servers into <workdir>/opencode.json and removed
the file on cleanup; review (#3098) flagged that the task workdir is
reused across turns for the same (agent, issue), and any agent- or
user-written model / tools / permission settings in opencode.json
must survive across runs. OPENCODE_CONFIG_CONTENT avoids the workdir
entirely — nothing is written to disk, no cleanup is needed, and the
env entry dies with the spawned process.

OPENCODE_CONFIG_CONTENT was added to OpenCode in v1.4.10 (2025-09); the
official @opencode-ai/sdk uses the same env var to inject runtime
config, so the surface is stable. Verified empirically against
OpenCode 1.15.6 in our K8s runtime: `opencode debug config` returns
the injected mcp slice deep-merged with the user's global config,
and <workdir>/opencode.json is observably untouched.

## Translation surface

`agent.mcp_config` accepts two shapes for portability:

- Claude-style `{"mcpServers": {name: {url|command, ...}}}` is
  translated into OpenCode's native form: `type: "local"|"remote"`,
  `command` coerced to a string array, `env` renamed to `environment`.
- Native OpenCode `{"mcp": {name: ...}}` accepts the three shapes
  OpenCode's schema permits and is strict-decoded against each:
    - McpLocalConfig:  `{type:"local", command:[…], environment?, enabled?, timeout?}`
    - McpRemoteConfig: `{type:"remote", url:"…", headers?, oauth?, enabled?, timeout?}`
    - bare override:   `{enabled: bool}` (toggle a server inherited
                        from global / project config without redefining it)
  Decoding uses `json.DisallowUnknownFields` so any field outside the
  matching schema is rejected — matching OpenCode's
  `additionalProperties: false`. Without this, a malformed payload
  (e.g. `command: "node"` instead of `command: ["node"]`) would reach
  OpenCode verbatim and either silently disable the server or crash
  the CLI at startup.

Field-level checks the strict decoder doesn't catch:
  - `timeout` must be a positive integer (rejects 0, negative, fractional)
  - `oauth` must be either an object (validated against McpOAuthConfig)
    or the literal `false`; primitives and `true` are rejected as ambiguous
  - `oauth.callbackPort` must be in 1..65535 when set
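
A sketch of the strict decode for the local shape, using `json.Decoder.DisallowUnknownFields` plus a post-decode timeout check. The `localMCP` struct and `decodeLocal` are hypothetical stand-ins for the opencodeMCPLocal path; requiring a non-empty `command` is an assumption of this sketch.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// localMCP mirrors the McpLocalConfig fields listed above.
type localMCP struct {
	Type        string            `json:"type"`
	Command     []string          `json:"command"`
	Environment map[string]string `json:"environment,omitempty"`
	Enabled     *bool             `json:"enabled,omitempty"`
	Timeout     *int              `json:"timeout,omitempty"` // int rejects fractional values at decode time
}

// decodeLocal strict-decodes one native local entry.
func decodeLocal(raw []byte) (localMCP, error) {
	var cfg localMCP
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields() // same effect as additionalProperties: false
	if err := dec.Decode(&cfg); err != nil {
		return cfg, err
	}
	if cfg.Type != "local" {
		return cfg, errors.New(`type must be "local"`)
	}
	if len(cfg.Command) == 0 { // assumption: an empty command is never valid
		return cfg, errors.New("command must be a non-empty string array")
	}
	if cfg.Timeout != nil && *cfg.Timeout <= 0 {
		return cfg, errors.New("timeout must be a positive integer")
	}
	return cfg, nil
}

func main() {
	_, err := decodeLocal([]byte(`{"type":"local","command":"node"}`))
	fmt.Println("string command:", err)
	_, err = decodeLocal([]byte(`{"type":"local","command":["node"],"timeout":0}`))
	fmt.Println("zero timeout:", err)
}
```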

## Precedence

Go's os/exec dedups `cmd.Env` by key keeping the LAST occurrence
(Go 1.9+). Appending OPENCODE_CONFIG_CONTENT after `buildEnv(b.cfg.Env)`
guarantees the daemon's value wins over any value the user happened
to put in `agent.custom_env` — which matches the intended semantics
(`mcp_config` is the authoritative daemon-managed field; `custom_env`
is the escape hatch). When that override happens we surface a warning
log so accidental clobbers are debuggable.
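
The append-last precedence can be sketched as below. `withManagedEnv` is a hypothetical helper; the only behavior relied on is os/exec keeping the last occurrence of a duplicate key in `cmd.Env`.

```go
package main

import (
	"fmt"
	"strings"
)

// withManagedEnv appends key=value after the base environment so it wins the
// os/exec keep-last dedup, and reports whether base already carried the key
// (the case that warrants a warning log).
func withManagedEnv(base []string, key, value string) ([]string, bool) {
	prefix := key + "="
	overrode := false
	for _, kv := range base {
		if strings.HasPrefix(kv, prefix) {
			overrode = true
		}
	}
	env := append(append([]string{}, base...), prefix+value)
	return env, overrode
}

func main() {
	base := []string{"PATH=/usr/bin", `OPENCODE_CONFIG_CONTENT={"model":"user"}`}
	env, overrode := withManagedEnv(base, "OPENCODE_CONFIG_CONTENT", `{"mcp":{}}`)
	fmt.Println(env[len(env)-1], "overrode user value:", overrode)
}
```

Copying `base` before appending keeps the caller's slice untouched, which matters if the same base environment is reused across spawns.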

## Limitation (out of scope, accepted in review)

OpenCode also deep-merges its **global** config
(`~/.config/opencode/opencode.json`) into every session and exposes no
flag to disable that. Operators who want strict per-agent isolation
from the global layer can set:

```jsonc
// agent.custom_env on the platform
{ "XDG_CONFIG_HOME": "/tmp/opencode-isolated" }
```

…pointing at any directory without an `opencode/` subdir. OpenCode then
reads no global config and only honors what the daemon injects via
OPENCODE_CONFIG_CONTENT. Verified with `opencode debug config`.

## Changes

server/pkg/agent/opencode_mcp.go (new):
  - buildOpenCodeMCPConfigContent — translates raw mcp_config into the
    JSON string OpenCode accepts via OPENCODE_CONFIG_CONTENT, returns
    "" when there's nothing to inject so the caller can skip the env
    entry (avoids clobbering anything the user put in
    agent.custom_env.OPENCODE_CONFIG_CONTENT)
  - translateMCPConfigForOpenCode + helpers — Claude-style → OpenCode
    native shape
  - validateOpenCodeNativeMCPEntry + opencodeMCPLocal /
    opencodeMCPRemote / opencodeMCPEnabledOnly / opencodeMCPOAuth
    typed structs — strict-decode native-shape entries against the
    schema (DisallowUnknownFields), plus targeted post-decode
    assertions for timeout / oauth / callbackPort

server/pkg/agent/opencode.go:
  - 12 lines of env injection in Execute(), placed AFTER buildEnv so
    the daemon's value wins via os/exec dedup
  - warning log when agent.custom_env duplicates the same key
  - no on-disk state, no rollback closure, no post-run cleanup —
    OPENCODE_CONFIG_CONTENT lives only in the spawned process env

server/pkg/agent/opencode_mcp_test.go (new):
  - TestBuildOpenCodeMCPConfigContent_{Empty,Remote,Local,Native}
  - TestBuildOpenCodeMCPConfigContent_NativeAcceptsAllSchemaFields —
    covers each native variant round-tripping every optional field
    (local with env+timeout+enabled; remote with headers+oauth-object+
    timeout+enabled; remote with oauth: false; bare {enabled} override)
  - TestBuildOpenCodeMCPConfigContent_RejectsMalformedNative — 31-case
    table covering every constraint on Bohan-J's review: command must
    be a string array, environment / headers values must be strings,
    oauth must be an object or false, timeout must be a positive
    integer, additionalProperties: false (per-shape allow-list checked
    via DisallowUnknownFields)
  - TestOpencodeBackendInjectsMCPConfigViaEnv — E2E happy path; fake
    opencode binary captures $OPENCODE_CONFIG_CONTENT, asserts the
    translated mcp slice is present AND <workdir>/opencode.json was
    NOT written
  - TestOpencodeBackendOmitsMCPEnvWhenEmpty — empty mcp_config does
    NOT inject the env, preserving any value the user set in
    agent.custom_env
  - TestOpencodeBackendOverridesUserOpenCodeConfigContent — daemon
    value wins via os/exec dedup keep-last

apps/docs/content/docs/providers.{en,zh}.mdx:
  - flip OpenCode's MCP cell from  to 
  - reword the "MCP configuration: only Claude Code actually reads it"
    section so OpenCode is included; describe each tool's mechanism
    (Claude → `--mcp-config`, OpenCode → OPENCODE_CONFIG_CONTENT)

apps/docs/content/docs/install-agent-runtime.{en,zh}.mdx:
  - update the Claude Code blurb (no longer "the only one")
  - expand the OpenCode blurb to mention mcp_config support
  - fix the now-broken /providers anchor

Refs #2106 (TS types and per-agent UI for mcp_config are separate
follow-ups, not in this PR).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 18:08:21 +08:00
Bohan Jiang
e1745d09ea MUL-2797 feat(agent): add Claude Opus 4.8 to model catalog & pricing (#3492)
Claude Code now ships Opus 4.8 (claude-opus-4-8). Add it to the three
places that enumerate Claude models so the picker, thinking-level
catalog, and usage cost estimates all recognize it:

- claudeStaticModels(): list Claude Opus 4.8 (Sonnet 4.6 stays default)
- claudeModelEffortAllow: Opus supports the full low..max set incl. xhigh
- MODEL_PRICING: $5 input / $25 output, $0.50 cache read, $6.25 5m cache
  write (the same current-gen Opus tier as 4.5/4.6/4.7), confirmed against
  platform.claude.com/docs/en/about-claude/pricing
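
As a worked example of how such a pricing row feeds a cost estimate, assuming the four figures are USD per million tokens for input, output, cache read, and 5-minute cache write. The `pricing` struct and `cost` function are hypothetical, not the MODEL_PRICING implementation.

```go
package main

import "fmt"

// pricing holds per-million-token rates in USD (assumed unit).
type pricing struct {
	Input, Output, CacheRead, CacheWrite5m float64
}

// cost estimates the USD cost of one usage record.
func cost(p pricing, in, out, cacheRead, cacheWrite int64) float64 {
	const perMillion = 1_000_000.0
	total := float64(in)*p.Input +
		float64(out)*p.Output +
		float64(cacheRead)*p.CacheRead +
		float64(cacheWrite)*p.CacheWrite5m
	return total / perMillion
}

func main() {
	opus := pricing{Input: 5, Output: 25, CacheRead: 0.50, CacheWrite5m: 6.25}
	// 200k input, 10k output, 1M cache-read tokens.
	fmt.Printf("$%.2f\n", cost(opus, 200_000, 10_000, 1_000_000, 0))
}
```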

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-29 10:28:30 +08:00
Bohan Jiang
09f9c7e2ce MUL-2764 feat(agent): wire mcp_config through ACP runtimes (Hermes / Kimi / Kiro) (#3439)
* MUL-2764 feat(agent): wire mcp_config through ACP runtimes (Hermes / Kimi / Kiro)

The MCP config Tab (#3419) already lets admins save mcp_config on an
agent, and the daemon plumbs it through to `agent.ExecOptions.McpConfig`
for every runtime. Claude and Codex consume it; the three ACP runtimes
(Hermes / Kimi / Kiro) ignored the field and hardcoded an empty
`mcpServers: []` in their `session/new` requests.

Add `buildACPMcpServers` to translate the Claude-style `{"mcpServers":
{"<name>": {...}}}` object-of-objects into the array shape ACP requires
(`[{name, command, args, env: [{name,value}, ...]}, ...]` for stdio;
`[{type, name, url, headers: [...]}, ...]` for http/sse), then pass the
translated array on `session/new` (all three) and `session/load` (kiro
resume). Malformed JSON fails the launch closed — same contract Codex's
`renderCodexMcpServersBlock` uses — so users see a real error instead of
silently running with no MCP servers. Individual unclassifiable entries
(no command, no url) are skipped with a warning so one bad row can't
take MCP down for the rest of the agent.
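
A sketch of the object-of-objects to array translation, under stated assumptions: `toACP` and the struct names are hypothetical stand-ins for buildACPMcpServers, url entries default to `type: "http"` unless the entry carries its own `type`, and names are sorted only to make the output deterministic.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

type pair struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

type acpServer struct {
	Type    string   `json:"type,omitempty"`
	Name    string   `json:"name"`
	Command string   `json:"command,omitempty"`
	Args    []string `json:"args,omitempty"`
	Env     []pair   `json:"env,omitempty"`
	URL     string   `json:"url,omitempty"`
	Headers []pair   `json:"headers,omitempty"`
}

type claudeEntry struct {
	Type    string            `json:"type"`
	Command string            `json:"command"`
	Args    []string          `json:"args"`
	Env     map[string]string `json:"env"`
	URL     string            `json:"url"`
	Headers map[string]string `json:"headers"`
}

// sortedKeys returns the keys of m in sorted order.
func sortedKeys[V any](m map[string]V) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// toPairs flattens a map into the [{name,value}] array shape.
func toPairs(m map[string]string) []pair {
	out := make([]pair, 0, len(m))
	for _, k := range sortedKeys(m) {
		out = append(out, pair{Name: k, Value: m[k]})
	}
	return out
}

// toACP fails closed on malformed JSON and skips unclassifiable entries.
func toACP(raw []byte) ([]acpServer, []string, error) {
	var cfg struct {
		McpServers map[string]claudeEntry `json:"mcpServers"`
	}
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, nil, fmt.Errorf("parse mcp_config: %w", err)
	}
	var servers []acpServer
	var skipped []string
	for _, name := range sortedKeys(cfg.McpServers) {
		e := cfg.McpServers[name]
		switch {
		case e.Command != "": // stdio
			servers = append(servers, acpServer{Name: name, Command: e.Command, Args: e.Args, Env: toPairs(e.Env)})
		case e.URL != "": // http or sse
			typ := e.Type
			if typ == "" {
				typ = "http" // assumption of this sketch
			}
			servers = append(servers, acpServer{Type: typ, Name: name, URL: e.URL, Headers: toPairs(e.Headers)})
		default:
			skipped = append(skipped, name) // caller logs a warning
		}
	}
	return servers, skipped, nil
}

func main() {
	raw := []byte(`{"mcpServers":{"fs":{"command":"node","args":["s.js"],"env":{"A":"1"}},"bad":{}}}`)
	servers, skipped, err := toACP(raw)
	out, _ := json.Marshal(servers)
	fmt.Println(string(out), skipped, err)
}
```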

Co-authored-by: multica-agent <github@multica.ai>

* MUL-2764 fix(agent): wire mcp_config through ACP resume + gate http/sse on capability

Addresses the two blockers Elon raised on #3439:

1. session/resume now carries mcpServers for Hermes and Kimi (Kiro's
   session/load already did). Per the ACP Session Setup spec the resume
   path re-attaches MCP servers, and without this a resumed task lost
   access to MCP tools that a fresh task on the same agent would have
   had. Pinned with new TestHermesResumeIncludesMcpServers and
   TestKimiResumeIncludesMcpServers integration tests that inspect the
   recorded wire request.

2. Added extractACPMcpCapabilities + filterACPMcpServersByCapability so
   http/sse MCP entries get dropped (with a daemon-log warning naming
   the entry) when the runtime's initialize response doesn't advertise
   mcpCapabilities.http / .sse. Sending those entries to a stdio-only
   runtime is a spec violation and reliably tanks session/new; now they
   get filtered and the rest of the session still starts. Stdio entries
   pass through unconditionally. Both backends wire the filter in right
   after initialize so session/new and session/resume see the same
   filtered list.
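
   The capability gate in item 2 can be sketched as follows. The `server` and `caps` types and `filterByCapability` are simplified, hypothetical stand-ins for the real helpers; an empty `Type` is assumed to mean stdio.

```go
package main

import "fmt"

type server struct {
	Name string
	Type string // "" or "stdio", "http", "sse"
}

// caps mirrors mcpCapabilities.http / .sse from the initialize response.
type caps struct {
	HTTP, SSE bool
}

// filterByCapability drops http/sse entries the runtime did not advertise.
// Stdio entries pass through unconditionally.
func filterByCapability(servers []server, c caps) (kept []server, dropped []string) {
	for _, s := range servers {
		switch s.Type {
		case "http":
			if !c.HTTP {
				dropped = append(dropped, s.Name)
				continue
			}
		case "sse":
			if !c.SSE {
				dropped = append(dropped, s.Name)
				continue
			}
		}
		kept = append(kept, s)
	}
	return kept, dropped
}

func main() {
	servers := []server{{Name: "fs"}, {Name: "web", Type: "http"}, {Name: "events", Type: "sse"}}
	kept, dropped := filterByCapability(servers, caps{HTTP: true})
	fmt.Println(kept, dropped) // dropped names would go into the daemon-log warning
}
```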

Also added TestKiroLoadIncludesMcpServersFromConfig — Elon flagged that
no test pinned "non-empty mcp_config actually reaches the wire" for
Kimi/Kiro, so the wire assertions go in for all three runtimes.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-28 16:29:49 +08:00
Bohan Jiang
bae8a84abd MUL-2767 feat(agent): add Antigravity runtime backend (#3427)
* feat(agent): add Antigravity runtime backend

Adds Google's Antigravity CLI (`agy`) as the 12th supported coding-tool
runtime, alongside Claude / Codex / Cursor / Copilot / Gemini / Hermes /
Kimi / Kiro / OpenCode / OpenClaw / Pi.

The CLI emits plain assistant text on stdout (no structured event
stream), so the backend streams stdout line-by-line as `MessageText`
events and accumulates the same text as the final `Result.Output`.
Session resumption uses `--conversation <id>`; because the conversation
UUID is not echoed on stdout, the daemon routes `--log-file` to a temp
file and recovers the id from the glog-formatted log lines.

MUL-2767

Co-authored-by: multica-agent <github@multica.ai>

* fix(agent): correct Antigravity capability contract from Elon review

- ModelSelectionSupported now returns false for antigravity. `agy` has no
  --model flag and antigravityBackend deliberately drops opts.Model, so
  the UI must render a disabled "Managed by runtime" picker instead of
  an empty dropdown plus a silently-ignored manual-entry field. Also
  stop seeding AgentEntry.Model from MULTICA_ANTIGRAVITY_MODEL — the
  backend would silently ignore it.

- Antigravity skills now write to {workDir}/.agents/skills/, the CLI's
  native workspace path (inherits Gemini CLI's layout per
  https://antigravity.google/docs/gcli-migration). Previously they went
  to the .agent_context/skills/ fallback that the CLI doesn't scan.
  Runtime brief moves antigravity into the native-discovery branch and
  local_skills.go points the user-level skill root at
  ~/.gemini/antigravity-cli/skills for Runtime → local skill import.

- Doc + UI comment sync: providers matrix / install-agent-runtime /
  cloud-quickstart / agents-create / tasks (session-resume support) /
  skills / README all now list Antigravity in the right buckets, and
  the model-picker / model-dropdown comments cite antigravity (not the
  stale hermes reference) as the supported=false example.

New tests: TestAntigravityModelSelectionUnsupported,
TestInjectRuntimeConfigAntigravity (native discovery wording),
TestWriteContextFilesAntigravityNativeSkills (.agents/skills/ landing,
.agent_context/skills/ NOT written).

Co-authored-by: multica-agent <github@multica.ai>

* feat(provider-logo): swap inline placeholder for real Antigravity PNG

Replaces the hand-drawn planet+arc placeholder with the official asset
shipped from Downloads. Stored next to the component; bundlers
(Next.js / electron-vite) resolve the PNG import to a URL string at
build time. Added a small assets.d.ts so packages/views' tsc accepts
PNG / SVG module imports — there was no prior asset usage in this
package to register the declaration.

---------

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-28 15:40:05 +08:00
Bohan Jiang
d39da9f7f0 MUL-2764: feat(agents): add MCP config tab to agent detail page (#3419)
* MUL-2764: feat(agents): add MCP config tab to agent detail page

Backend already stores `mcp_config` and the daemon forwards it to the
runtime CLI via `--mcp-config`; this only adds the UI entry point.

The new tab presents a JSON editor that pretty-prints the existing
config, validates the buffer on every keystroke, and saves through the
existing `PUT /api/agents/{id}` path. Clearing the editor sends
`mcp_config: null`, which the handler reads as "wipe the column" and
the daemon falls back to the CLI's own default.

When the caller can't see secrets (agent actor, or a non-owner
non-admin member), the server already returns `mcp_config: null` with
`mcp_config_redacted: true`; the tab renders a read-only "configured
but hidden" state in that case so a non-privileged member cannot
silently overwrite an admin-owned config by saving an empty editor.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agents): MCP tab — preserve in-flight edits + warn non-Claude runtimes

- Fix stale-editor sync: compare the local draft against the *previous*
  original via a ref, so a background agent refetch updates an untouched
  editor instead of being silently ignored. Without this, a draft equal to
  the OLD original was treated as user-edited after the prop changed, and
  the next Save would write the old config back over a concurrent admin
  edit.
- Surface a notice inside the tab when the agent's runtime provider is not
  Claude — today's daemon only forwards mcp_config via Claude's
  --mcp-config, so saving on e.g. a Codex agent was silent but ineffective.
- Tests for both: rerender resyncs an untouched editor, rerender preserves
  an in-flight edit, warning renders on non-Claude / hides on Claude.

MUL-2764

Co-authored-by: multica-agent <github@multica.ai>

* MUL-2764: feat(agents): codex MCP support + hide MCP tab on unsupported runtimes

- Backend: codex.go now translates agent.mcp_config (Claude-style
  `{"mcpServers": {...}}`) into `-c mcp_servers.<name>=<inline-toml>`
  flags for `codex app-server`, so MCP servers configured in the UI
  reach Codex's per-task config layer. Bad mcp_config JSON downgrades
  to a warn-and-skip so it can't break the agent launch.
- Frontend: AgentOverviewPane hides the MCP tab when the agent's
  runtime provider doesn't read mcp_config — only `claude` and `codex`
  are supported today, every other provider sees no MCP tab. The
  previous in-tab warning is removed (no longer reachable).
- New shared helper `providerSupportsMcpConfig` lives in
  `@multica/core/agents` so views and any future caller share one list
  of MCP-aware providers.
- Tests: new go-side coverage for stdio + url + multi-server inputs,
  TOML string escaping, malformed-input fallback, and arg ordering vs
  custom_args; new views-side coverage for which providers surface the
  MCP tab. En + zh-Hans copy and parity test refreshed.

Co-authored-by: multica-agent <github@multica.ai>

* MUL-2764: fix(agents): keep codex mcp_config secrets out of argv/logs

Move the agent's mcp_config from a `-c mcp_servers.<id>=<inline-toml>`
argv flag into a daemon-managed `[mcp_servers.*]` block inside the
per-task `$CODEX_HOME/config.toml`. mcp_servers.<id>.env is a documented
Codex config field and the UI already treats mcp_config as redacted for
non-admins; argv would have leaked those values into `ps aux` and the
`agent command` log line. The file is forced to 0600 to keep secrets in
the daemon owner's lane regardless of the seed file's mode.

Also drop user-supplied `-c/--config mcp_servers.*` entries from
custom_args. Codex `-c` is last-wins (verified against codex-cli 0.132.0),
so without filtering, a custom_args entry could silently shadow whatever
the MCP Tab saved.

Strip inherited `[mcp_servers.*]` tables from the per-task config.toml
when the agent has its own mcp_config, mirroring Claude's
`--strict-mcp-config`: avoids TOML "table already exists" errors on
name collisions and matches admin expectations that the MCP Tab is the
authoritative source for that task.

Co-authored-by: multica-agent <github@multica.ai>

* MUL-2764: fix(agents): codex mcp_config three-state semantics + custom_args compat

Address the third review pass:

1. Distinguish nil vs present-but-empty mcp_config. `{}` and
   `{"mcpServers":{}}` now count as "admin saved an explicit (empty)
   managed set" — strip inherited user `[mcp_servers.*]` and pin an
   empty managed marker block. Only SQL NULL / JSON `null` map to
   "absent" and fall back to the user's global `~/.codex/config.toml`.
   This aligns Codex with the API's three-state contract (omit / null
   / object) and with Claude's `--strict-mcp-config` semantics.

2. Fail closed on `ensureCodexMcpConfig` errors and on managed
   mcp_config without CODEX_HOME. Previous warn-and-launch would
   silently inherit the user's global MCP servers and look identical
   to a successful apply — exactly the surprise the MCP Tab is meant
   to remove.

3. Only filter `-c mcp_servers.*` from `custom_args`/`extra_args`
   when the agent has a managed mcp_config. Pre-MUL-2764 agents that
   configured MCP via custom_args keep working; once an admin opts
   in via the MCP Tab the daemon owns the `mcp_servers` namespace
   and overrides are dropped (last-wins safety).

4. Update mcp_config locale intro to mention $CODEX_HOME/config.toml
   instead of the now-removed `-c mcp_servers.*` argv path.
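
   The three-state contract from item 1 can be sketched like this. `classify` and the state names are hypothetical; TestHasManagedCodexMcpConfig below covers the real helper.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

type mcpState int

const (
	absent         mcpState = iota // SQL NULL / JSON null: fall back to the user's global config
	managedEmpty                   // {} or {"mcpServers":{}}: strip inherited tables, pin empty block
	managedServers                 // at least one server configured
)

// classify maps a raw mcp_config value onto the three states. A parse error
// is returned so the caller can fail closed instead of launching.
func classify(raw []byte) (mcpState, error) {
	trimmed := bytes.TrimSpace(raw)
	if len(trimmed) == 0 || bytes.Equal(trimmed, []byte("null")) {
		return absent, nil
	}
	var cfg struct {
		McpServers map[string]json.RawMessage `json:"mcpServers"`
	}
	if err := json.Unmarshal(trimmed, &cfg); err != nil {
		return absent, fmt.Errorf("invalid mcp_config: %w", err)
	}
	if len(cfg.McpServers) == 0 {
		return managedEmpty, nil
	}
	return managedServers, nil
}

func main() {
	for _, in := range []string{"null", "{}", `{"mcpServers":{}}`, `{"mcpServers":{"a":{}}}`} {
		s, err := classify([]byte(in))
		fmt.Println(in, s, err)
	}
}
```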

Tests:
- Split `TestEnsureCodexMcpConfigEmptyInputsAreNoop` into
  `TestEnsureCodexMcpConfigAbsentLeavesUserTablesAlone` (nil/null)
  and `TestEnsureCodexMcpConfigEmptyManagedSetStripsUserMcp` (`{}`,
  `{"mcpServers":{}}`).
- Add `TestEnsureCodexMcpConfigEmptyManagedSetIdempotent` to pin
  byte-identical reruns on the empty managed marker block.
- Add `TestHasManagedCodexMcpConfig` covering the eight relevant
  inputs.
- Add `TestBuildCodexArgsPreservesCustomMcpOverridesWhenUnmanaged`
  and `TestBuildCodexArgsDropsCustomMcpOverridesWhenManaged` to
  pin the new gating.
- Add `TestCodexExecuteFailsClosedWhenMcpConfigInvalid` and
  `TestCodexExecuteFailsClosedWhenManagedMcpButNoCodexHome` for the
  Execute paths.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-28 15:11:28 +08:00
Bohan Jiang
2bda4065d0 MUL-2708: fix(agent): preserve multi-line Pi prompt on Windows by bypassing the .cmd shim (#3417)
Pi is installed on Windows via npm, which lays down `pi.cmd` → `pi.ps1`
→ `node_modules/@mariozechner/pi-coding-agent/dist/cli.js`. The daemon
spawns Pi with `exec.Command("pi", ...)`; PATHEXT resolves that to
`pi.cmd`, and cmd.exe expands `%*` in the shim by re-tokenising the
original command line, which truncates any argv containing newlines.

buildPiArgs passes the full prompt as the last positional argv, so the
multi-line system+user prompt is silently cut at the first newline
before it reaches the JS entrypoint. The session JSONL then records
only the first line ("You are running as a chat assistant for a Multica
workspace.") and Pi replies as if the user message were missing
(GitHub multica-ai/multica#3306).

Mirror the existing cursor-agent fix: when LookPath resolves Pi to a
.cmd/.bat launcher and a sibling pi.ps1 exists, invoke PowerShell with
`-File <ps1>` directly and forward each arg as a discrete token. This
keeps us on the official launch path while skipping the cmd.exe %*
re-expansion. Falls back to the original launcher when pi.ps1 or
PowerShell can't be located.

The Windows test asserts the rewrite produces the expected argv and
that the multi-line positional prompt survives unchanged.
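
The rewrite described above can be sketched roughly as follows. This is a minimal illustration, not the daemon's actual code: `rewriteShimLaunch`, its parameters, and the injected `ps1Exists` probe are hypothetical names chosen so the logic can be exercised on any OS.

```go
package main

import (
	"fmt"
	"strings"
)

// rewriteShimLaunch is a hypothetical helper. Given the path that LookPath
// resolved and the argv to forward, it returns the program and argv to spawn.
// powershell is the located PowerShell binary ("" when not found) and
// ps1Exists reports whether a sibling .ps1 launcher is present.
func rewriteShimLaunch(resolved string, args []string, powershell string, ps1Exists func(string) bool) (string, []string) {
	lower := strings.ToLower(resolved)
	if !strings.HasSuffix(lower, ".cmd") && !strings.HasSuffix(lower, ".bat") {
		return resolved, args // not a shim: launch as-is
	}
	// ".cmd" and ".bat" are both four bytes long.
	ps1 := resolved[:len(resolved)-4] + ".ps1"
	if powershell == "" || !ps1Exists(ps1) {
		return resolved, args // fall back to the original launcher
	}
	// Each arg is forwarded as its own argv token, which per the commit
	// skips the cmd.exe %* re-expansion.
	return powershell, append([]string{"-File", ps1}, args...)
}

func main() {
	prog, argv := rewriteShimLaunch(`C:\npm\pi.cmd`, []string{"line one\nline two"},
		"powershell.exe", func(string) bool { return true })
	fmt.Printf("%s %q\n", prog, argv)
}
```

Injecting the existence probe keeps the decision logic testable without a Windows host; the real spawn would still go through `exec.Command`.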

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-28 12:36:16 +08:00
Kagura
f02bc56e70 fix(agent/cursor): remove obsolete 'chat' subcommand from argv (#3077) (#3092)
The current cursor-agent CLI no longer has a 'chat' subcommand. The
positional 'chat' argument was silently treated as prompt text, leaking
into the user message (e.g. 'chat <actual prompt>').

Remove 'chat' from buildCursorArgs so the generated argv matches the
current cursor-agent CLI interface.

Fixes #3077
2026-05-27 16:40:29 +08:00
Multica Eve
311cf4d998 fix(agent): surface Codex app-server no-progress diagnostics (MUL-2688)
Refs #3262.
2026-05-26 18:42:47 +08:00
Multica Eve
26ff52385b fix: attribute Hermes usage to current model (MUL-2696)
Fix Hermes ACP usage attribution to current model when agent.model is unset.

Also preserves cache-read token accounting and makes ACP model-list parsing more tolerant of snake_case payloads and Unknown display names.
2026-05-26 18:13:28 +08:00
Multica Eve
744b474199 revert(agent): remove per-agent local skill toggle (MUL-2603) (#3286)
* Revert "feat(agents): hide skills_local toggle for runtimes that don't honour it (MUL-2603) (#3276)"

This reverts commit 0b50c5a209.

Co-authored-by: multica-agent <github@multica.ai>

* Revert "fix(agent): surface host OAuth token via env var on macOS isolation (MUL-2603) (#3267)"

This reverts commit a67bf81225.

Co-authored-by: multica-agent <github@multica.ai>

* Revert "fix(agents): tighten skills-tab intro and drop redundant import hint (#3265)"

This reverts commit d8075a5775.

Co-authored-by: multica-agent <github@multica.ai>

* Revert "fix(agent): mirror $HOME/.claude.json into isolated config dir (MUL-2661) (#3261)"

This reverts commit 40da88fc16.

Co-authored-by: multica-agent <github@multica.ai>

* Revert "feat(agent): per-agent toggle to isolate host-machine skills (MUL-2603) (#3200)"

This reverts commit 960befa56f.

Co-authored-by: multica-agent <github@multica.ai>

* Add migration cleanup for reverted agent skills toggle

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: Eve <eve@multica-ai.local>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-26 17:00:01 +08:00
Bohan Jiang
a67bf81225 fix(agent): surface host OAuth token via env var on macOS isolation (MUL-2603) (#3267)
* fix(agent): surface host OAuth token via env var on macOS isolation (MUL-2603)

Claude Code 2.x scopes the macOS keychain credentials entry by
sha256(CLAUDE_CONFIG_DIR)[:8], so the MUL-2603 isolation path strands
the child at "Not logged in" even after #3261 mirrored .claude.json:
the child looks up `Claude Code-credentials-<scratch-hash>`, the host
token is sitting in the no-suffix `Claude Code-credentials` entry.

Read the host OAuth token from the keychain via /usr/bin/security and
inject it as CLAUDE_CODE_OAUTH_TOKEN, which bypasses keychain lookup
entirely. Linux/Windows continue to use the .credentials.json mirror
(no-op there). Operator-pinned tokens and ANTHROPIC_API_KEY both take
precedence over the keychain reader.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agent): tighten empty-value auth gate, pin Claude CLI env-scrub assumption (MUL-2603)

Empty-value gate
  - `ANTHROPIC_API_KEY=` inherited from a login shell that conditionally
    exports auth previously posed as an "operator pinned API-key auth"
    choice and disabled the keychain reader, stranding the isolated child
    at "Not logged in" even though no auth was actually selected.
  - A custom_env `CLAUDE_CODE_OAUTH_TOKEN=""` (stale agent config) had the
    same effect, and would also have shadowed a keychain-injected token in
    libc env lookups that pick the first match.
  - Both are now treated as noise: the empty entry is dropped from the
    child env and the keychain reader runs unchanged. Two new unit tests
    cover the os.Environ side (`...TreatsEmptyAnthropicAPIKeyAsUnpinned`,
    `...HonorsNonEmptyAnthropicAPIKey`) and the custom_env side
    (`...EmptyOAuthTokenInCustomEnvAsUnpinned`).
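
A minimal sketch of the empty-value gate described in the bullets above. The helper name `scrubEmptyAuth` and the `authEnvKeys` table are assumptions for illustration; only the two variable names come from the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// authEnvKeys lists the variables the commit treats as auth selectors.
var authEnvKeys = map[string]bool{
	"ANTHROPIC_API_KEY":       true,
	"CLAUDE_CODE_OAUTH_TOKEN": true,
}

// scrubEmptyAuth (hypothetical) drops empty-valued auth entries from env and
// reports whether a non-empty auth entry remains, i.e. whether the operator
// actually pinned a credential.
func scrubEmptyAuth(env []string) (cleaned []string, pinned bool) {
	for _, kv := range env {
		key, val, ok := strings.Cut(kv, "=")
		if ok && authEnvKeys[key] {
			if val == "" {
				continue // noise: drop it so it cannot shadow an injected token
			}
			pinned = true
		}
		cleaned = append(cleaned, kv)
	}
	return cleaned, pinned
}

func main() {
	env, pinned := scrubEmptyAuth([]string{"PATH=/bin", "ANTHROPIC_API_KEY="})
	fmt.Println(env, pinned)
}
```

When `pinned` is false the keychain reader would still run; when it is true the operator's choice takes precedence.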

Env-scrub boundary
  - Surfacing `CLAUDE_CODE_OAUTH_TOKEN` to the isolated child is only
    safe because Claude Code itself drops that variable from the env it
    hands to Bash / hook subprocesses, so a model-driven `printenv` can
    never echo the secret into the agent transcript.
  - Empirically verified against `claude` 2.1.121:
        printf '...test -n "$CLAUDE_CODE_OAUTH_TOKEN" && echo SET || echo UNSET...' \
            | CLAUDE_CODE_OAUTH_TOKEN=sk-canary-XYZ \
              MUL2603_CONTROL=control-value \
              claude --print --output-format text \
                     --allow-dangerously-skip-permissions --allowedTools Bash
    returned `UNSET` for the OAuth token while the non-sensitive
    `MUL2603_CONTROL` control returned `CONTROL-SET`, proving the CLI
    scrubs only the auth env, not the env in general.
  - Pinned this assumption in a new skip-gated regression test
    (`TestClaudeCLIScrubsOAuthTokenFromBashSubprocess`) that boots the
    real CLI with a canary token; failing the test means upstream
    Claude Code stopped scrubbing and the passthrough must move off env
    vars before MUL-2603 can ship.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agent): gate keychain passthrough on default host dir, harden scrub test (MUL-2603)

Two follow-ups from the round-2 review on #3267:

1. Custom CLAUDE_CONFIG_DIR no longer pulls the default OAuth token.
   Claude Code 2.x maps each config dir to its own suffixed
   `Claude Code-credentials-<hash>` keychain entry, so an operator that
   pins a managed/custom CLAUDE_CONFIG_DIR via custom_env or the
   daemon-host env was getting the *daemon user's* default unsuffixed
   entry injected into the isolated child — silently crossing accounts,
   exactly the boundary mirrorHostClaudeJSONIfMissing already protects
   for `.claude.json`. buildClaudeEnvWith now threads the effective
   hostConfigDir through and only calls the reader when that dir is the
   default `$HOME/.claude`. The new gate has a unit-level truth table
   (TestIsDefaultHostClaudeConfigDir) plus a regression
   (TestBuildClaudeEnvIsolatedSkipsKeychainForCustomHostConfigDir) that
   makes a t.Fatal-armed reader prove the gate keeps the read off for
   custom dirs.

2. Scrub e2e now asserts the control prong and the proof-of-execution
   marker, not just "canary absent". The previous assertion would
   false-pass on a model refusal, paraphrase, or "Bash gets no env at
   all" upstream change. The strengthened version sets a non-secret
   MUL2603_CONTROL alongside the canary OAuth token and asserts (a)
   canary is NOT in the transcript, (b) CONTROL-SET IS in the
   transcript (env propagation works for non-secrets — proves a
   targeted scrub), (c) UNSET IS in the transcript (the Bash tool
   actually ran AND saw the OAuth var as empty/unset). Code comment in
   buildClaudeEnvWith and the test docstring now narrow the
   security contract to the Bash tool subprocess only; hook subprocess
   env-scrub is no longer claimed because it has not been verified.
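
The gate from item 1 above might look roughly like this. It is a sketch under stated assumptions: `isDefaultHostConfigDir` and `oauthTokenFor` are hypothetical names, and the keychain reader is injected as a plain function rather than a call to `/usr/bin/security`.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// isDefaultHostConfigDir (hypothetical) reports whether hostConfigDir is the
// default $HOME/.claude location.
func isDefaultHostConfigDir(hostConfigDir, homeDir string) bool {
	if hostConfigDir == "" || homeDir == "" {
		return false
	}
	return filepath.Clean(hostConfigDir) == filepath.Join(homeDir, ".claude")
}

// oauthTokenFor (hypothetical) calls readKeychain only when the gate passes,
// so a pinned custom dir never receives the default account's token.
func oauthTokenFor(hostConfigDir, homeDir string, readKeychain func() (string, error)) string {
	if !isDefaultHostConfigDir(hostConfigDir, homeDir) {
		return ""
	}
	tok, err := readKeychain()
	if err != nil {
		return ""
	}
	return tok
}

func main() {
	reader := func() (string, error) { return "host-token", nil }
	fmt.Printf("%q\n", oauthTokenFor("/home/u/.claude", "/home/u", reader))
	fmt.Printf("%q\n", oauthTokenFor("/opt/managed/claude", "/home/u", reader))
}
```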

Co-authored-by: multica-agent <github@multica.ai>

* test(agent): use per-run nonces in Claude scrub e2e to kill false-pass (MUL-2603)

Elon's round-3 review flagged that TestClaudeCLIScrubsOAuthTokenFromBashSubprocess
still false-passed: the proof markers "UNSET" / "CONTROL-SET" were literal
strings in the prompt, so strings.Contains matched them even when the model
only paraphrased the prompt without spawning Bash.

Replace the hard-coded markers with two per-run random hex nonces passed *only*
via env vars (MUL2603_UNSET_NONCE, MUL2603_CONTROL_NONCE). The prompt now
references the variable names, not the values, so the nonces can land in the
transcript only if a real Bash subprocess inherits the env vars and echoes
them. A paraphrasing or refusing model cannot fake nonces it never saw.

Also update the security-boundary comment in buildClaudeEnvWith to describe
the nonce-based proof.
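
The nonce scheme can be illustrated with a small sketch. The helper names here are hypothetical; only the two env var names are taken from the commit, and the real test drives the `claude` CLI rather than a string.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"strings"
)

// newNonce returns 16 random bytes as 32 hex characters.
func newNonce() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// proofEnv builds the env entries that carry the nonces. The prompt only
// names the variables, so the values can reach the transcript only through
// a subprocess that inherited them.
func proofEnv(unsetNonce, controlNonce string) []string {
	return []string{
		"MUL2603_UNSET_NONCE=" + unsetNonce,
		"MUL2603_CONTROL_NONCE=" + controlNonce,
	}
}

// transcriptProves reports whether both nonces appear and the canary does not.
func transcriptProves(transcript, canary, unsetNonce, controlNonce string) bool {
	return !strings.Contains(transcript, canary) &&
		strings.Contains(transcript, unsetNonce) &&
		strings.Contains(transcript, controlNonce)
}

func main() {
	a, _ := newNonce()
	b, _ := newNonce()
	fmt.Println(proofEnv(a, b))
	fmt.Println(transcriptProves("paraphrased: UNSET CONTROL-SET", "sk-canary", a, b))
}
```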

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-26 15:29:58 +08:00
Bohan Jiang
40da88fc16 fix(agent): mirror $HOME/.claude.json into isolated config dir (MUL-2661) (#3261)
PR #3200 introduced per-agent `skills_local=ignore` isolation that
mirrors the host's Claude config dir into a per-task scratch dir,
omitting `skills/` to keep broken local skills out of the CLI's
discovery path. The mirror walks entries inside `hostConfigDir`
(default: `$HOME/.claude/`), but Claude Code's default layout stores
its main config — login state, project history — at
`$HOME/.claude.json`, a *sibling* of `~/.claude/` rather than inside
it. Once `CLAUDE_CONFIG_DIR=$ISOLATED` is set, the CLI looks for
`$ISOLATED/.claude.json`, finds only `backups/.claude.json.backup.*`
(those live inside `~/.claude/` and DO get mirrored), and exits with:

  Claude configuration file not found at: …/.claude.json
  Not logged in · Please run /login

— so every agent with `skills_local=ignore` on a host using the
default Claude layout dies on the first turn. Flipping the toggle back
to "merge" restores the host CLAUDE_CONFIG_DIR and recovers the agent;
that's the workaround Bohan flagged in MUL-2661.

Fix: after the existing `mirrorHostClaudeExceptSkills`, run a new
`mirrorHostClaudeJSONIfMissing` that pulls `$HOME/.claude.json` into
the scratch dir as `.claude.json` when (a) the dest doesn't already
have one and (b) the host source dir is the default `$HOME/.claude/`.
The custom-CLAUDE_CONFIG_DIR path is left alone because a pinned
custom dir is expected to be self-contained — silently borrowing
`$HOME/.claude.json` from a different account would mask credential
drift.

The helper goes through `createFileLink`, so it inherits the same
symlink → junction → hardlink → copy fallback chain the rest of the
mirror uses on Windows-without-Developer-Mode hosts.
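
A rough sketch of the two conditions, (a) and (b), plus the missing-source case. The signature is an assumption: the commit only says `homeDir` and `fileLink` are injectable, so the `exists` and `link` seams below are illustrative stand-ins.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// mirrorClaudeJSONIfMissing (hypothetical signature) links
// $HOME/.claude.json into scratchDir and reports whether it did so.
func mirrorClaudeJSONIfMissing(homeDir, hostConfigDir, scratchDir string,
	exists func(string) bool, link func(src, dst string) error) (bool, error) {
	dst := filepath.Join(scratchDir, ".claude.json")
	if exists(dst) {
		return false, nil // (a) the dest already has one
	}
	if filepath.Clean(hostConfigDir) != filepath.Join(homeDir, ".claude") {
		return false, nil // (b) a custom dir is expected to be self-contained
	}
	src := filepath.Join(homeDir, ".claude.json")
	if !exists(src) {
		return false, nil // env-var-auth-only or fresh install
	}
	if err := link(src, dst); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	present := map[string]bool{"/home/u/.claude.json": true}
	mirrored, err := mirrorClaudeJSONIfMissing("/home/u", "/home/u/.claude", "/tmp/scratch",
		func(p string) bool { return present[p] },
		func(src, dst string) error { present[dst] = true; return nil })
	fmt.Println(mirrored, err)
}
```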

Tests:
- `TestMirrorHostClaudeJSONIfMissing_DefaultLayoutMirrorsParentFile`
  covers the happy path with an injected `homeDir`/`fileLink`.
- `TestMirrorHostClaudeJSONIfMissing_AlreadyPresentNoop` asserts a
  pre-existing dest `.claude.json` (from a custom CLAUDE_CONFIG_DIR
  mirror) is not overwritten.
- `TestMirrorHostClaudeJSONIfMissing_CustomHostDirSkipped` locks in
  the custom-host-dir gate.
- `TestMirrorHostClaudeJSONIfMissing_MissingSourceNoop` documents the
  env-var-auth-only / fresh-install case.
- `TestClaudeExecuteIsolatesProvidesClaudeJSONFromHome` is the
  end-to-end MUL-2661 regression: a fake `$HOME` with the default
  split layout, `skills_local=ignore`, fake claude binary that prints
  whatever `.claude.json` reaches the scratch dir. Asserts the file
  rides through. Verified the test fails (with the documented
  MUL-2661 error message) when the new mirror call is removed.

Verification:
- `go test ./pkg/agent/...` green (full agent suite).
- `GOOS=windows GOARCH=amd64 go vet ./pkg/agent/...` clean.

Co-authored-by: multica-agent <github@multica.ai>
2026-05-26 13:50:35 +08:00
Bohan Jiang
960befa56f feat(agent): per-agent toggle to isolate host-machine skills (MUL-2603) (#3200)
* feat(agent): per-agent toggle to isolate host-machine skills (MUL-2603)

Adds an agent-scoped `skills_local` switch ("ignore" default / "merge") so
shared agents stop inheriting the operator's user-global Claude skill
directory. A single broken local skill on one operator's machine was
crashing the Claude CLI before it ever read stdin — the daemon saw a
"broken pipe" with no recoverable signal (GitHub #3052).

- DB: migration 108 adds `agent.skills_local` (NOT NULL DEFAULT 'ignore'),
  with sqlc CreateAgent/UpdateAgent updates and handler validation.
- Claude runtime: when the agent is in "ignore" mode the backend points
  CLAUDE_CONFIG_DIR at an empty per-task scratch dir under the task cwd
  (fallback: OS temp), strips any inherited override, and cleans up after
  the run. Workspace skills under `{cwd}/.claude/skills/` still load.
  "merge" preserves the legacy inherit-from-machine behavior; Codex and
  other isolated backends are no-ops.
- UI: new Skills toggle in the Create Agent dialog and the Agent → Skills
  tab, with EN/zh-Hans copy and SkillsLocalToggle shared between the two.
- Tests: unit coverage for the new env helper, isolation dir lifecycle,
  full Claude execute paths (ignore + merge), and the handler tristate
  contract. Existing skills-tab test updated for the new copy.
- Docs: updated `/skills` docs (EN + ZH) and added a 0.3.7 changelog entry
  in the landing-page i18n.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agent): preserve claude login + validate skills_local input (MUL-2603)

Address Elon's review on PR #3200:

1. Skill isolation no longer drops the operator's Claude login. The
   per-task scratch dir now mirrors every entry under `~/.claude/`
   as symlinks except `skills/`, so `.credentials.json`, settings,
   plugins, etc. reach the CLI exactly as on the host while the
   user-global skills directory stays hidden. Without this, default
   `ignore` would have broken every Claude agent on a non-API-key
   host the moment migration 108 landed.

2. Internal CreateAgent callers (agent_template, onboarding_shim)
   now set `SkillsLocal: "ignore"`. The Go zero value was about to
   trip the migration-108 CHECK constraint and 500 template /
   onboarding agent creation.

3. Create / update handler validation no longer normalizes garbage
   to "ignore". The strict 400 path is now reachable on bad client
   input; the drift-safe `normalizeSkillsLocal` stays on the read
   side only.

UI copy + docs clarified that the toggle is Claude-only; other
runtimes ignore the setting.

Verification:
- `go test ./...` green (full suite locally).
- `pnpm --filter @multica/views exec vitest run agents/components/tabs/skills-tab.test.tsx` green.
- Handler DB-backed tests still skip locally without docker (same
  as Elon's run) — CI will validate the create / update paths
  against migration 108.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agent): mirror effective claude config dir with windows fallback (MUL-2603)

Address Elon's second-round review on PR #3200:

1. The per-task scratch dir now mirrors the *effective* host Claude
   config dir, not unconditionally `~/.claude/`. Precedence: agent
   `custom_env` CLAUDE_CONFIG_DIR > parent process env > `~/.claude/`.
   Without this, an operator who pinned Claude at a managed install
   (custom env CLAUDE_CONFIG_DIR) would get the wrong credentials in
   the scratch dir, because `buildClaudeEnv` strips that env before
   handing it to the child. We resolve the source up front and feed
   it to the mirror, so the override env still points at the right
   bytes.

2. Mirror entries now go through platform-aware linkers. On Windows
   without Developer Mode / admin, `os.Symlink` is denied, which
   previously left the scratch dir empty and broke Claude Code auth
   on default `ignore`. The new helpers try symlink first, then fall
   back to a directory junction (`mklink /J`) for dirs or a hardlink
   (same-volume content share) / copy for files. Mirrors the
   execenv/codex_home_link_windows.go pattern.

3. Tests:
   - `TestResolveHostClaudeConfigDir` locks in the custom_env >
     parent_env > `~/.claude` precedence.
   - `TestNewIsolatedClaudeConfigDirMirrorsCustomHostDir` confirms
     the scratch dir picks up `.credentials.json` from a synthetic
     custom host dir, proving the source resolution actually
     propagates into the mirror.
   - `TestNewIsolatedClaudeConfigDirEmptyHostIsNoop` documents the
     env-var-auth-only case (no host source ⇒ empty scratch dir).
   - `TestMirrorHostClaudeExceptSkillsWith_FallbackWhenSymlinkFails`
     exercises the Windows-no-Developer-Mode path via the new
     `mirrorHostClaudeExceptSkillsWith` seam, asserting credentials
     and sub-dir children still reach the scratch dir after the
     symlink stand-in fails.
   - `TestMirrorHostClaudeExceptSkillsWith_PropagatesFirstLinkError`
     confirms callers see the per-entry error when even fallback
     fails (so the warn-log fires on broken Windows installs).
   - `TestCopyFileRoundTrip` covers the last-resort copy fallback
     and its EXCL no-overwrite contract.
   - `TestClaudeExecuteIsolatesUsesCustomEnvSource` is the
     end-to-end check: an agent with custom_env CLAUDE_CONFIG_DIR
     reads its credentials from the pinned dir, not `~/.claude/`.

4. Docs: `apps/docs/content/docs/skills.{mdx,zh.mdx}` updated to
   describe the effective-source resolution and the Windows
   fallback chain so the docs match the runtime behaviour.
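
The precedence from item 1 above (custom_env, then parent process env, then `~/.claude/`) can be sketched as below. `resolveHostConfigDir` and its parameters are hypothetical; the real resolver may take different inputs.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveHostConfigDir (hypothetical) picks the effective host Claude config
// dir: agent custom_env first, then the parent process env, then ~/.claude.
func resolveHostConfigDir(customEnv map[string]string, parentEnv func(string) string, homeDir string) string {
	if v := customEnv["CLAUDE_CONFIG_DIR"]; v != "" {
		return v
	}
	if v := parentEnv("CLAUDE_CONFIG_DIR"); v != "" {
		return v
	}
	return filepath.Join(homeDir, ".claude")
}

func main() {
	home, _ := os.UserHomeDir()
	fmt.Println(resolveHostConfigDir(nil, os.Getenv, home))
}
```

Resolving the source before the env is stripped is what lets the scratch-dir mirror read from the pinned location.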

Verification:
- `go test ./...` green (full server suite locally, including
  `pkg/agent` 23 cases covering the new + existing isolation
  paths).
- `GOOS=windows GOARCH=amd64 go vet ./pkg/agent/...` and
  `go test -c -o /dev/null` both compile clean, confirming the
  Windows-tagged linker file builds.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agent): default skills_local to merge to preserve legacy behavior (MUL-2603)

Per Bohan's product decision on PR #3200, the per-agent host-skill toggle
defaults to "merge" — the pre-MUL-2603 inherit-from-machine behavior —
so existing personal workflows that rely on locally installed Claude
Skills keep working unchanged. Agent owners explicitly opt into "ignore"
when they need to harden a shared agent against a broken local skill on
one operator's machine (GitHub #3052).

Also audited all 11 runtimes for user-global skill discovery paths and
documented the scope of the toggle. Only Claude reads a user-global
`~/.claude/skills/`; Codex isolates via `CODEX_HOME`, the ACP backends
(Hermes / Kimi / Kiro) and the JSON-stream backends (Copilot / Cursor /
Gemini / Pi / OpenCode / OpenClaw) anchor discovery to the task workdir
and never read a user-global skill directory. UI copy and docs now say
"for runtimes that support it (currently Claude Code)" everywhere so
the scope is explicit.

Changes:

- Migration 108: column default flipped to 'merge'.
- Handler CreateAgent: missing field → "merge"; explicit "ignore" /
  "merge" still validated, garbage still 400.
- normalizeSkillsLocal: drift-safe coercion now lands on "merge" for
  anything that isn't the exact literal "ignore".
- agent_template.go / onboarding_shim.go: internal CreateAgent callers
  send "merge" instead of "ignore" to match the new default.
- Claude runtime (`claude.go`): isolate-mode gate flipped from
  `SkillsLocal != "merge"` to `SkillsLocal == "ignore"`, so "" (legacy
  daemons / older clients) and "merge" both walk `~/.claude/` directly.
- Create Agent dialog + Skills tab: toggle defaults to on (merge); an
  "ignore" value carries over only when duplicating an agent explicitly
  set to "ignore". The isolation opt-in is now `skills_local: "ignore"`,
  sent when the user flips the toggle off; "merge" is omitted from the
  request body.
- i18n (EN + zh-Hans): copy reframed — "On (default) — merged"; "Off —
  ignored. Recommended for shared agents".
- Docs (`/skills`, `/guides/agents.zh`): describe new default and
  enumerate which runtimes act on the toggle.
- Landing changelog 0.3.7: retitled "Per-Agent Local-Skill Toggle"; note
  the on-by-default behavior + off-to-isolate framing.
- Tests:
  - `TestClaudeExecuteIsolatesHostSkillsWhenIgnoreOptedIn` replaces the
    old by-default isolation case (now requires explicit "ignore").
  - New `TestClaudeExecuteDefaultModeKeepsHostConfigDir` locks in that
    default ExecOptions preserve the host CLAUDE_CONFIG_DIR.
  - `TestClaudeExecuteIsolatesUsesCustomEnvSource` now explicitly opts
    into "ignore" mode.
  - Handler tests: omitted → "merge"; explicit "ignore" round-trips;
    preserve-existing test seeds "ignore" and asserts "merge" flip-back.
  - `TestNormalizeSkillsLocal_DriftStaysSafe`: only literal "ignore"
    maps to ignore; everything else → "merge".
  - `skills-tab.test.tsx`: toggle ON by default; flip OFF when agent
    opted into "ignore". Intro-text matcher anchored to a more specific
    phrase so it no longer collides with the toggle hint copy.
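
The split between strict write-side validation and drift-safe read-side coercion listed above could be sketched like this. `normalizeSkillsLocal` is named in the commit, but its body here is a guess; `validateSkillsLocal` and `shouldIsolate` are hypothetical names.

```go
package main

import "fmt"

// normalizeSkillsLocal: read-side coercion. Only the exact literal "ignore"
// maps to ignore; everything else lands on "merge".
func normalizeSkillsLocal(v string) string {
	if v == "ignore" {
		return "ignore"
	}
	return "merge"
}

// validateSkillsLocal (hypothetical): write-side check. A missing field
// defaults to "merge"; unknown values are rejected so the handler can
// answer with a 400.
func validateSkillsLocal(v string) (string, error) {
	switch v {
	case "":
		return "merge", nil
	case "ignore", "merge":
		return v, nil
	default:
		return "", fmt.Errorf("invalid skills_local %q", v)
	}
}

// shouldIsolate (hypothetical) mirrors the flipped runtime gate: only an
// explicit "ignore" isolates, so "" from older clients behaves like "merge".
func shouldIsolate(skillsLocal string) bool {
	return skillsLocal == "ignore"
}

func main() {
	fmt.Println(normalizeSkillsLocal("IGNORE"), shouldIsolate(""))
	_, err := validateSkillsLocal("garbage")
	fmt.Println(err)
}
```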

Verification:
- `go test ./...` green (full server suite locally).
- `GOOS=windows GOARCH=amd64 go vet ./pkg/agent/...` and
  `go test -c -o /dev/null` both compile clean (windows-tagged linker
  file still builds).
- `pnpm typecheck` green across all packages and apps.
- `pnpm --filter @multica/views test` 88 files / 771 tests green.
- `pnpm --filter @multica/core test` 43 files / 390 tests green.
- Handler DB-backed tests still skip locally without docker; CI will
  validate the create / update paths against migration 108.

Co-authored-by: multica-agent <github@multica.ai>

* chore(landing): drop 0.3.7 changelog entry from this PR (MUL-2603)

The landing-page release notes belong in a separate release-prep PR, not in the feature PR.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agent): propagate skills_local=ignore to codex user-skill seed (MUL-2603)

Make the per-agent skills_local toggle real for Codex too, not just Claude.
Previously the toggle was only consumed by the Claude backend, while the
daemon's execenv layer always seeded Codex's per-task CODEX_HOME with the
host machine's user-installed skills from ~/.codex/skills/. A shared Codex
agent with skills_local=ignore could still inherit a broken local skill
from one operator's machine.

Now: PrepareParams/ReuseParams carry SkillsLocal; hydrateCodexSkills
skips seedUserCodexSkills when SkillsLocal == "ignore" so the per-task
CODEX_HOME exposes only workspace skills to the codex CLI. Default
("merge", or empty from older servers/clients) preserves existing
inherit-from-machine behavior. UI / docs are updated to reflect the
contract honestly: Claude Code and Codex honor the toggle; other
runtimes (Hermes / Kimi / Kiro / Copilot / Cursor / Gemini / Pi /
OpenCode / OpenClaw) leave $HOME untouched and discover user-level
skills natively, so the toggle is a no-op for them today.

New tests: TestPrepareCodexSkillsLocalIgnoreSkipsUserSeed,
TestPrepareCodexSkillsLocalMergeSeedsUserSkills, and
TestReuseCodexSkillsLocalIgnoreSkipsUserSeed cover Prepare(ignore),
Prepare(merge), and the toggle-flip-on-reuse path.

Co-authored-by: multica-agent <github@multica.ai>

* docs(skills): scope skills_local toggle copy to Claude Code + Codex (MUL-2603)

Off-state hint and Skills tab intro now explicitly call out Claude Code +
Codex as the only runtimes that honor the toggle, with "other runtimes
ignore this setting" wired into both states (en + zh-Hans), so users on
non-Claude/Codex agents don't read "Off" as runtime-wide isolation.

Docs (skills.mdx, skills.zh.mdx, guides/agents.zh.mdx) stop describing
Hermes / Kimi / Gemini / Copilot / Cursor / Pi / OpenCode / OpenClaw / Kiro
as having native user-level skill discovery; the daemon simply does not
manage user-level skill discovery for those runtimes today, and the toggle
is a no-op regardless of where it is set.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-26 13:26:33 +08:00
Wes
cfc652aa5f fix(daemon): close stdin pipe in Pi adapter to deliver EOF (#2188) (#3118)
Pi reads its prompt from argv (positional, see buildPiArgs) and never
expects interactive input, so the Pi backend previously left cmd.Stdin
nil. Under systemd, the resulting /dev/null character device has been
observed not to satisfy Pi's readable-side wait, leaving runs stuck in
"working" forever (#2188).

Attach an explicit StdinPipe and close it immediately after Start so the
child sees an EOF on a FIFO, matching the pattern already used by the
Claude, Codex, Hermes, Kiro, and Kimi backends. The fix is defensive on
the daemon side because Pi is mid-refactor and is not accepting issues
upstream; once Pi itself stops blocking on stdin, this close is still
correct (a closed pipe is a no-op for a process that does not read it).

Test asserts the structural invariant: a shell-stub `pi` inspects
/proc/self/fd/0 and only emits a valid event stream when stdin is a
FIFO. If a future change drops the StdinPipe and stdin reverts to
/dev/null (char device), the stub exits non-zero and the test fails.
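
A minimal sketch of the attach-then-close pattern, assuming a Unix host with `cat` on PATH for the demo. `runWithImmediateEOF` is a hypothetical helper, not the Pi backend's code.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// runWithImmediateEOF (hypothetical) starts the child with an explicit stdin
// pipe and closes it right after Start, so the child sees EOF on a pipe
// instead of inheriting /dev/null.
func runWithImmediateEOF(name string, args ...string) ([]byte, error) {
	cmd := exec.Command(name, args...)
	var out bytes.Buffer
	cmd.Stdout = &out
	stdin, err := cmd.StdinPipe()
	if err != nil {
		return nil, err
	}
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	if err := stdin.Close(); err != nil {
		return nil, err
	}
	err = cmd.Wait()
	return out.Bytes(), err
}

func main() {
	// cat exits as soon as it reads EOF from the closed pipe.
	out, err := runWithImmediateEOF("cat")
	fmt.Printf("bytes=%d err=%v\n", len(out), err)
}
```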
2026-05-25 15:29:09 +08:00
Dmitry
5bc77f2953 fix(pi): strip leaked tool markup safely (#2956)
2026-05-22 15:46:10 +08:00
Bohan Jiang
a6f19380b2 test(agent): use ForkLock helper to fix ETXTBSY flake in thinking tests (#3062)
Two thinking tests wrote fake CLI scripts via os.WriteFile and immediately
execed them. Under t.Parallel() with the rest of pkg/agent, a sibling
test's concurrent fork can inherit our still-open write fd, so Linux
returns ETXTBSY at exec time (Go #22315). CI hit this on main as
"TestRunCodexDebugModels_ArgvSeenByBinary: fork/exec ...: text file busy".

Switch both call sites to the existing writeTestExecutable helper, which
holds syscall.ForkLock across OpenFile→Write→Close so no concurrent fork
can inherit the write fd. Same pattern the rest of the package already
uses (kimi, kiro, codex, claude tests).
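
The pattern can be sketched as below. This is an illustration, not the package's `writeTestExecutable`: the helper names are hypothetical, and taking the read side of `syscall.ForkLock` is an assumption about how the real helper holds the lock. The demo assumes a Unix host with `/bin/sh` and an executable temp dir.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"syscall"
)

// writeExecutable (hypothetical) holds the read side of syscall.ForkLock
// across open, write and close, so no concurrent fork in this process can
// inherit the still-open write fd.
func writeExecutable(path string, content []byte) error {
	syscall.ForkLock.RLock()
	defer syscall.ForkLock.RUnlock()
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o755)
	if err != nil {
		return err
	}
	if _, err := f.Write(content); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// writeAndRun writes a fake CLI script to a temp dir and execs it at once.
func writeAndRun() (string, error) {
	dir, err := os.MkdirTemp("", "forklock-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	script := filepath.Join(dir, "fake-cli")
	if err := writeExecutable(script, []byte("#!/bin/sh\necho ok\n")); err != nil {
		return "", err
	}
	out, err := exec.Command(script).Output()
	return string(out), err
}

func main() {
	out, err := writeAndRun()
	fmt.Printf("%q %v\n", out, err)
}
```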
2026-05-22 14:53:56 +08:00
YOMXXX
ed2957ddf8 fix(claude): record result model usage (#2899)
2026-05-21 13:00:12 +08:00
iYuan
2f1f90c11a fix(agent): retry codex semantic inactivity fresh (#2593)
2026-05-20 20:03:39 +08:00
YOMXXX
34f16e2c7a fix(opencode): deny interactive questions in daemon mode (#2878)
* fix(opencode): deny interactive questions in daemon mode

* fix(opencode): avoid permission env ordering bypass
2026-05-20 17:17:31 +08:00
Bohan Jiang
2bec2221d2 feat(agent): per-agent thinking_level for claude + codex (MUL-2339) (#2865)
* feat(agent): persist thinking_level per agent (MUL-2339)

Adds a nullable `thinking_level` column to the `agent` table so the
backend can route a runtime-native reasoning/effort token (e.g. Claude's
`xhigh`, Codex's `minimal`) through to the agent CLI on every dispatch.

The column is intentionally TEXT rather than an enum — Claude and Codex
publish overlapping but distinct vocabularies and we want the persisted
value to round-trip exactly through whichever CLI receives it. NULL is
the "use runtime default" sentinel that every downstream consumer reads
as "do not inject --effort / reasoning_effort".

This commit is just the storage layer (migration + sqlc); subsequent
commits wire it through the API, daemon, and agent backends.

Co-authored-by: multica-agent <github@multica.ai>

* feat(agent-backend): inject reasoning effort for claude + codex (MUL-2339)

Extends ExecOptions with a runtime-native ThinkingLevel string and wires
it into the Claude and Codex backends. Discovery is driven by the local
CLI so the daemon advertises whatever the host install supports rather
than a hand-maintained list that goes stale.

Per Elon's PR1 review:
- Claude: parses `claude --help` to learn the `--effort` superset and
  projects through a per-model allow-list (xhigh is Opus-only; max is
  session-only on the smaller models). Falls back to a conservative
  static list when the binary is missing or help drift hides the line.
- Codex: drives `codex debug models --output json` so per-model
  reasoning subsets and the documented default come directly from the
  CLI. The older config-error probe trick is gone — the JSON path is
  stable and doesn't pollute stderr with an intentional misconfig.
- Cache key includes (provider, executablePath, cliVersion) so a CLI
  upgrade invalidates entries that referenced the older help / catalog.
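
The cache-key idea in the last bullet above can be sketched as follows. The type and method names are hypothetical, and the level strings and the older version number in the demo are placeholders rather than real discovery output.

```go
package main

import (
	"fmt"
	"sync"
)

// catalogKey (hypothetical) includes the CLI version, so an upgrade yields a
// new key and old entries are simply never hit again.
type catalogKey struct {
	Provider, ExecutablePath, CLIVersion string
}

type catalogCache struct {
	mu      sync.Mutex
	entries map[catalogKey][]string
}

// levels returns the cached result for key, running discover on a miss.
// For simplicity this sketch holds the lock while discover runs.
func (c *catalogCache) levels(key catalogKey, discover func() ([]string, error)) ([]string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if got, ok := c.entries[key]; ok {
		return got, nil
	}
	got, err := discover()
	if err != nil {
		return nil, err // failures are not cached
	}
	if c.entries == nil {
		c.entries = make(map[catalogKey][]string)
	}
	c.entries[key] = got
	return got, nil
}

func main() {
	var cache catalogCache
	discover := func() ([]string, error) { return []string{"low", "high"}, nil }
	got, err := cache.levels(catalogKey{"codex", "/usr/bin/codex", "0.131.0"}, discover)
	fmt.Println(got, err)
}
```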

Per Trump's PR1 constraint, all three Codex injection points
(thread/start.config, thread/resume.config, turn/start.effort) flow
through one helper (`applyCodexReasoningEffort`) so they cannot drift
independently. The shared `codexReasoningCases` fixture in
`thinking_test.go` asserts the same value→{shape, key} contract at
each site for every level the runtimes know about.

Claude's `--effort` is also added to `claudeBlockedArgs` so a user
custom_args entry can't silently outvote the daemon-injected value.

Co-authored-by: multica-agent <github@multica.ai>

* feat(api): wire thinking_level through API + daemon contract (MUL-2339)

End-to-end plumbing for the per-agent reasoning/effort setting:

- AgentResponse / TaskAgentData now carry `thinking_level`; the daemon's
  claim response includes it and the daemon's executor passes it through
  to agent.ExecOptions, where the Claude and Codex backends already know
  what to do with it.
- ModelEntry on the runtime-models wire format gains a `thinking` block
  carrying `supported_levels` + `default_level` per model so the UI can
  render a runtime-aware picker without the server having to know about
  the local CLI install. `handleModelList` projects the agent-package
  catalog (including the new Thinking field) into the wire shape.
- CreateAgent / UpdateAgent gate the field with a synchronous provider
  enum check (claude / codex only today). UpdateAgent is tri-state:
  field omitted = no change, "" = explicit clear (new
  `ClearAgentThinkingLevel` query, mirrors the existing mcp_config null
  pattern), non-empty = validate then set.
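
One way to express the tri-state described in the last bullet above is a pointer field. This is a sketch with hypothetical type and function names; note that with `encoding/json` a JSON `null` also decodes to a nil pointer, so this version treats `null` the same as an omitted field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// updateAgentRequest (hypothetical) uses a pointer so "omitted" and "" differ.
type updateAgentRequest struct {
	ThinkingLevel *string `json:"thinking_level"`
}

type levelAction int

const (
	levelNoChange levelAction = iota // field omitted
	levelClear                       // "" means explicit clear
	levelSet                         // non-empty: validate, then set
)

// classifyThinkingLevel decodes body and reports which branch applies.
func classifyThinkingLevel(body []byte) (levelAction, string, error) {
	var req updateAgentRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return levelNoChange, "", err
	}
	switch {
	case req.ThinkingLevel == nil:
		return levelNoChange, "", nil
	case *req.ThinkingLevel == "":
		return levelClear, "", nil
	default:
		return levelSet, *req.ThinkingLevel, nil
	}
}

func main() {
	for _, body := range []string{`{}`, `{"thinking_level":""}`, `{"thinking_level":"xhigh"}`} {
		action, value, err := classifyThinkingLevel([]byte(body))
		fmt.Println(action, value, err)
	}
}
```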

Per Trump's PR1 review, the API NEVER auto-clears on a runtime/model
swap and ALWAYS returns 400 on an unknown literal value — same shape
across CreateAgent, UpdateAgent, and combined patches that move
runtime + level in one request. Per-model combination failures (e.g.
`xhigh` against a model that only supports up to `high`) surface as a
daemon-side task error, not a silent server-side rewrite.

TS types follow the same shape: `Agent.thinking_level`,
`CreateAgentRequest`/`UpdateAgentRequest` add the field, `RuntimeModel`
grows a `thinking` block. Older backends omit the field, which the
front-end treats as "no picker for this model" — installed desktop
builds keep working.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agent): correct codex debug models argv + pin via runner test (MUL-2339)

`codex debug models --output json` is rejected by codex-cli 0.131.0 —
the subcommand emits JSON on stdout by default and has no `--output`
flag. Drop the flag and add `--bundled` to skip the network refresh
discovery doesn't need. Move the argv to a package-level var and add
a test that runs a fake `codex` to assert the binary actually
receives exactly `debug models --bundled`, so the contract can't
silently drift on the next refactor.

Also teach ValidateThinkingLevel to resolve an empty model to the
provider's default model entry. Without this, every default-model
task with a persisted thinking_level would be misjudged "unknown
model" by the daemon guard.

Co-authored-by: multica-agent <github@multica.ai>

* fix(api): reject runtime switch that would leave invalid thinking_level (MUL-2339)

A PATCH that changed `runtime_id` without touching `thinking_level`
used to silently keep the existing value, so a Claude agent storing
`max` could land on a Codex runtime where `max` is not a recognised
token at all, and the daemon would receive a literal-invalid level.

Hold the same "always 400 on literal-invalid, never silent coerce"
rule on this implicit path. When runtime_id changes and the existing
value is not in the new provider's enum, return 400 with the
recovery options (clear via `thinking_level=""` or re-set in the
same PATCH).

Add coverage for both the kept-when-still-valid and the rejected
cases, plus the two recovery paths (clear and replace).
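
The implicit-path check might be sketched as below. `checkRuntimeSwitch` is a hypothetical name, and the `illustrativeLevels` table is a placeholder: it is not the real Claude or Codex vocabulary, which the daemon discovers from the local CLI.

```go
package main

import "fmt"

// illustrativeLevels is placeholder data for the sketch only.
var illustrativeLevels = map[string]map[string]bool{
	"claude": {"high": true, "xhigh": true, "max": true},
	"codex":  {"minimal": true, "high": true},
}

// checkRuntimeSwitch (hypothetical) rejects a runtime change that would leave
// a level the new provider does not recognise. patchLevel is nil when the
// PATCH does not touch thinking_level.
func checkRuntimeSwitch(newProvider, existingLevel string, patchLevel *string) error {
	level := existingLevel
	if patchLevel != nil {
		level = *patchLevel // explicit clear ("") or replacement in the same PATCH
	}
	if level == "" {
		return nil
	}
	if !illustrativeLevels[newProvider][level] {
		return fmt.Errorf("thinking_level %q is not valid for provider %q: clear it with thinking_level=\"\" or set a valid value in the same request", level, newProvider)
	}
	return nil
}

func main() {
	fmt.Println(checkRuntimeSwitch("codex", "max", nil))
	empty := ""
	fmt.Println(checkRuntimeSwitch("codex", "max", &empty))
}
```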

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): guard runTask with per-model thinking_level validator (MUL-2339)

ValidateThinkingLevel existed but had no call site.
`task.Agent.ThinkingLevel` flowed straight into ExecOptions, so `xhigh`
configured on a non-Opus Claude model, or API-side stale values that
escaped the provider enum gate, would be injected anyway.

Run the validator before building ExecOptions. Invalid combinations
log a warning and drop the level instead of failing the task: the
agent still runs, just at the runtime's default reasoning effort.
Discovery errors fail open (keep the level, let the CLI surface any
objection) so a transient `claude --help` failure can't strand work.
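
A minimal Go sketch of the policy described above (drop on invalid,
fail open on discovery errors). The names `effectiveThinkingLevel`,
`errInvalidLevel`, `errDiscoveryFailed`, and the validator signature are
hypothetical; the real `ValidateThinkingLevel` contract may differ.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errInvalidLevel is a hypothetical sentinel meaning "this model does not
// accept this level". errDiscoveryFailed stands in for any other failure.
var (
	errInvalidLevel    = errors.New("thinking level not supported by model")
	errDiscoveryFailed = errors.New("model discovery failed")
)

// validator is an assumed shape for the per-model check.
type validator func(model, level string) error

// effectiveThinkingLevel returns the level that should reach ExecOptions.
func effectiveThinkingLevel(model, level string, validate validator) string {
	if level == "" {
		return ""
	}
	err := validate(model, level)
	switch {
	case err == nil:
		return level
	case errors.Is(err, errInvalidLevel):
		// Invalid combination: warn and drop, the task still runs.
		log.Printf("warning: dropping thinking_level %q for model %q: %v", level, model, err)
		return ""
	default:
		// Discovery error: fail open and let the CLI object if it wants to.
		log.Printf("warning: could not validate thinking_level %q: %v", level, err)
		return level
	}
}

func main() {
	invalid := func(string, string) error { return errInvalidLevel }
	flaky := func(string, string) error { return errDiscoveryFailed }
	fmt.Printf("%q\n", effectiveThinkingLevel("some-model", "xhigh", invalid))
	fmt.Printf("%q\n", effectiveThinkingLevel("some-model", "xhigh", flaky))
}
```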

Empty model is forwarded as-is; the validator resolves it to the
provider's default model internally per the cross-package contract.

Co-authored-by: multica-agent <github@multica.ai>

* chore(agent): drop stale `--output json` comments + unused scanner (MUL-2339)

Codex CLI's `debug models` subcommand emits JSON without an `--output`
flag, and `parseCodexDebugModels` never read from the bufio.Scanner.
Sync the comments with the actual invocation and remove the dead init.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-20 12:30:10 +08:00
Joey Frasier (Boothe)
76cd8275ff fix(openclaw): parse whole buffer instead of line-by-line scanner (MUL-1908) (#2292)
* fix(openclaw): parse whole buffer instead of line-by-line scanner

Follow-up to c87d7676 (WOR-10). The stdout/stderr swap fixed the dominant
case but `processOutput` still scanned line-by-line and only attempted a
whole-buffer parse from a fragile fallback path. Pretty-printed JSON
(openclaw 2026.5.x emits the result blob indented across many lines) made
every individual line unparseable on its own — `{`, `  "payloads": [`,
`    {`, etc. — so the success path hinged entirely on the fallback
joining `rawLines` and re-trying.

Under load (daemon restarts racing the close-on-cancel goroutine, partial
chunked reads when stdout closes mid-flight) the line scanner could see
truncated input that never reassembled into valid JSON, surfacing
"openclaw returned no parseable output" against runs where the agent had
in fact completed the work and posted comments. Roughly 30–40% of recent
runs in v0.2.27 logs hit this path; multica still wrote a `task_failed`
inbox row for each one even though the underlying issue had moved to
`in_review` or `done`.

The fix:

- processOutput now reads the full stdout buffer with `io.ReadAll` first.
- A new `parseWholeBufferOpenclawResult` helper attempts a single
  `json.Unmarshal` against the entire buffer (after trimming, and after
  optionally stripping leading non-JSON log lines). When it matches, we
  build the result and return — the line scanner never runs.
- If the whole-buffer parse fails, we fall through to the existing NDJSON
  line-by-line scanner. This preserves streaming-event support (kept for
  forward compatibility and other backends) without leaving openclaw's
  dominant pretty-printed shape at the mercy of timing.
- The failure path now emits a `(got N bytes; preview: ...)` suffix on
  the canonical "no parseable output" error so future debugging isn't
  blind. The exact canonical phrase is preserved for empty buffers so
  existing dashboards / log-grep tooling keep matching.
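
The ordering above (whole-buffer parse first, NDJSON scan as fallback)
can be sketched roughly as follows. This is an illustration under
assumptions, not the real `parseWholeBufferOpenclawResult`: the function
names and the generic `map[string]json.RawMessage` result are
hypothetical, and "strip leading non-JSON log lines" is approximated as
"skip lines until one starts with `{`".

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
)

// parseWholeBuffer tries to decode buf as a single JSON object, skipping
// any leading lines that do not start with '{'.
func parseWholeBuffer(buf []byte) (map[string]json.RawMessage, bool) {
	trimmed := bytes.TrimSpace(buf)
	for len(trimmed) > 0 {
		if trimmed[0] == '{' {
			var obj map[string]json.RawMessage
			if json.Unmarshal(trimmed, &obj) == nil {
				return obj, true
			}
			return nil, false // looks like JSON but is not one object
		}
		nl := bytes.IndexByte(trimmed, '\n')
		if nl < 0 {
			return nil, false
		}
		trimmed = bytes.TrimSpace(trimmed[nl+1:]) // drop one log line
	}
	return nil, false
}

// parseOutput returns the decoded objects: one for a whole-buffer match,
// otherwise one per parseable NDJSON line.
func parseOutput(buf []byte) []map[string]json.RawMessage {
	if obj, ok := parseWholeBuffer(buf); ok {
		return []map[string]json.RawMessage{obj}
	}
	var events []map[string]json.RawMessage
	sc := bufio.NewScanner(bytes.NewReader(buf))
	for sc.Scan() {
		var ev map[string]json.RawMessage
		if json.Unmarshal(sc.Bytes(), &ev) == nil && ev != nil {
			events = append(events, ev)
		}
	}
	return events
}

func main() {
	pretty := []byte("starting up\n{\n  \"payloads\": [\n    {\"text\": \"hi\"}\n  ]\n}\n")
	ndjson := []byte("{\"type\":\"a\"}\n{\"type\":\"b\"}\n")
	fmt.Println(len(parseOutput(pretty)), len(parseOutput(ndjson)))
}
```

Because the whole buffer is read before parsing, nothing here is
streamed: results only become available once stdout has closed.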

Tests:

- TestOpenclawProcessOutputWholeBufferPrettyJSON: feeds a hand-crafted
  multi-line indented blob (multiple payloads, nested agentMeta, usage
  map) and asserts every field round-trips through the whole-buffer fast
  path.
- TestOpenclawProcessOutputDeeplyIndentedFixture: re-runs the recorded
  openclaw 2026.5.5 stdout fixture (1070 lines) directly through
  parseWholeBufferOpenclawResult, asserting the bug-shape parses cleanly
  on the first attempt without falling through to NDJSON scanning.
- TestOpenclawProcessOutputEmptyBufferErrorIncludesByteCount: tightens
  the empty-buffer failure path and asserts that the canonical phrase
  survives, so observability tooling keeps working.

All existing tests in the openclaw + buildOpenclawArgs suites stay green
(streaming NDJSON event tests, lifecycle tests, structured-error tests,
usage-field-variant tests). The two pre-existing flaky codex tests with
tight timeouts (TestCodexExecuteSemanticInactivityAllowsContinuous*) fail
on both this branch and the c87d7676 baseline; they are unrelated and out
of scope here.

Co-authored-by: multica-agent <github@multica.ai>

* fix(openclaw): drop dead preview branch, document streaming regression

Rebase + review-fix follow-up on top of f27df2d9b.

processOutput's preview branch was unreachable: openclawNoParseableOutputError
was only called from the `!gotEvents && trimmed == ""` path, which by
construction means the entire scanned buffer collapsed to whitespace, so the
`(got N bytes; preview: ...)` formatter could never fire on a non-empty buffer.
Replace the helper with a single canonical-string constant (callsite is now
inline) and update the test name to match what it actually asserts (the
canonical empty-buffer error string is preserved for external log-grep /
dashboard consumers).

Also document on processOutput that the line-scanner path is no longer
truly streaming after the io.ReadAll switch: events accumulate until
stdout closes. OpenClaw 2026.5.x does not emit streaming events so this
regression is invisible today, but flag it for the next backend that
might.

Misc: switch the scanner's input source from
`strings.NewReader(string(buf))` to `bytes.NewReader(buf)` to drop one
unnecessary byte/string round-trip.

MUL-1908

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
Co-authored-by: J (Multica agent) <j@multica.local>
2026-05-19 17:42:41 +08:00
Bohan Jiang
9a577f3e11 fix(runtimes): anchor OpenCode skill + AGENTS.md discovery to task workdir (MUL-2416) (#2849)
* fix(runtimes): anchor OpenCode skill + AGENTS.md discovery to task workdir

OpenCode resolves its project discovery root from `--dir` and `PWD`
before falling back to `process.cwd()`. The daemon set `cmd.Dir =
workDir` but never overrode the inherited `PWD`, so OpenCode walked
from the daemon's shell directory and silently bypassed the per-task
workdir — agents lost visibility into `.opencode/skills/` and
`AGENTS.md`, falling back to whatever global skills the host had
installed (MUL-2416).

- Pass `opencode run --dir <workDir>` and override `PWD=<workDir>` in
  the child env so AGENTS.md walk-up + `.opencode/skills` project
  config scan both anchor on the task workdir.
- Block `--dir` from custom args so user overrides cannot re-introduce
  the regression.
- Plumb skill `description` from DB through service / daemon /
  execenv. `writeSkillFiles` synthesizes a YAML frontmatter block
  (`name`, optional `description`) when the stored content lacks one,
  since runtimes like OpenCode silently drop SKILL.md files without a
  parseable `name`. Existing frontmatter is preserved unchanged so
  upstream-imported skills (GitHub / ClawHub / Skills.sh) keep their
  hand-shaped metadata.
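
A rough Go sketch of the first two bullets, assuming hypothetical helper
names (`dropDirFlag`, `withPWD`, `buildOpenCodeArgs`) and assuming that
blocking `--dir` means dropping both the `--dir <value>` and
`--dir=<value>` forms. The real arg layout and blocklist handling may
differ.

```go
package main

import (
	"fmt"
	"strings"
)

// dropDirFlag removes "--dir <value>" and "--dir=<value>" from user args.
func dropDirFlag(args []string) []string {
	out := make([]string, 0, len(args))
	for i := 0; i < len(args); i++ {
		a := args[i]
		switch {
		case a == "--dir":
			i++ // also skip the value that follows the flag
		case strings.HasPrefix(a, "--dir="):
			// drop the combined form
		default:
			out = append(out, a)
		}
	}
	return out
}

// withPWD returns env with any inherited PWD replaced by workDir.
func withPWD(env []string, workDir string) []string {
	out := make([]string, 0, len(env)+1)
	for _, kv := range env {
		if !strings.HasPrefix(kv, "PWD=") {
			out = append(out, kv)
		}
	}
	return append(out, "PWD="+workDir)
}

// buildOpenCodeArgs anchors discovery on workDir and appends filtered
// custom args afterwards.
func buildOpenCodeArgs(workDir string, custom []string) []string {
	args := []string{"run", "--dir", workDir}
	return append(args, dropDirFlag(custom)...)
}

func main() {
	args := buildOpenCodeArgs("/tmp/task", []string{"--dir", "/elsewhere", "--verbose"})
	env := withPWD([]string{"PWD=/home/daemon", "HOME=/home/daemon"}, "/tmp/task")
	fmt.Println(args)
	fmt.Println(env)
}
```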

Tests:
- New fake-CLI test confirms argv carries `--dir <workDir>` and the
  child sees `PWD=<workDir>`.
- New test confirms a user-supplied `--dir` in custom_args is dropped.
- New execenv tests cover synthesized frontmatter and preservation of
  pre-existing frontmatter.

Co-authored-by: multica-agent <github@multica.ai>

* fix(runtimes): inject SKILL.md `name` when upstream frontmatter omits it

Skills imported with frontmatter that sets `description` but leaves `name`
implicit (relying on the directory slug, as common in GitHub/Skills.sh
imports) still hit OpenCode's "no parseable name → drop" path because the
DB Name fallback never made it into the SKILL.md body. ensureSkillFrontmatter
now scans the existing block and, when name is missing or empty, prepends
`name: <slug>` while preserving description, body, and any runtime-specific
keys verbatim.

Also tighten yamlEscapeInline to always double-quote so descriptions that
look like YAML keywords (`null`, `true`, `[foo]`, `{x: y}`, `2024-01-01`)
parse as strings rather than getting reinterpreted and rejected.
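
For illustration, an always-double-quote escaper could look like the Go
sketch below. `yamlQuoteInline` and `frontmatter` are hypothetical names,
not the real `yamlEscapeInline` / `ensureSkillFrontmatter`, and quoting
`name` as well as `description` is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// yamlQuoteInline always emits a YAML double-quoted scalar, so values such
// as null, true, [foo] or 2024-01-01 are read back as plain strings.
func yamlQuoteInline(s string) string {
	r := strings.NewReplacer(
		`\`, `\\`,
		`"`, `\"`,
		"\n", `\n`,
		"\r", `\r`,
		"\t", `\t`,
	)
	return `"` + r.Replace(s) + `"`
}

// frontmatter builds a minimal block with name and optional description.
func frontmatter(name, description string) string {
	var b strings.Builder
	b.WriteString("---\n")
	b.WriteString("name: " + yamlQuoteInline(name) + "\n")
	if description != "" {
		b.WriteString("description: " + yamlQuoteInline(description) + "\n")
	}
	b.WriteString("---\n")
	return b.String()
}

func main() {
	fmt.Print(frontmatter("my-skill", "null"))
	fmt.Println(yamlQuoteInline(`say "hi"`))
}
```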

Adds regression test for the nameless-frontmatter case and updates the
existing OpenCode skill test for the always-quoted description format.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-19 16:21:02 +08:00
Bohan Jiang
e8d4b9a0a2 revert: drop exec_command watchdog (#2779, #2786) (MUL-2337) (#2803)
* Revert "fix(codex): bump default exec_command stuck timeout to 3 minutes (#2786)"

This reverts commit 433cd1aaf5.

Co-authored-by: multica-agent <github@multica.ai>

* Revert "feat(codex): add per-exec_command watchdog to escape dropped function_call_output (MUL-2337) (#2779)"

This reverts commit 60bae62622.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-18 18:08:07 +08:00
Bohan Jiang
433cd1aaf5 fix(codex): bump default exec_command stuck timeout to 3 minutes (#2786)
The watchdog fires on a "no progress" window, so the default mainly
matters for commands that go fully silent (no outputDelta). Bumping
from 2m → 3m leaves more headroom for legitimately slow silent
commands before treating them as a dropped function_call_output, at
a modest cost to recovery latency.

MUL-2337

Co-authored-by: multica-agent <github@multica.ai>
2026-05-18 15:30:05 +08:00
Bohan Jiang
60bae62622 feat(codex): add per-exec_command watchdog to escape dropped function_call_output (MUL-2337) (#2779)
* feat(codex): add per-exec_command watchdog to escape dropped function_call_output (MUL-2337)

Codex app-server can drop the second function_call_output when two
exec_command calls fan out in the same turn and both async-yield through
the yield_time_ms boundary (observed 2026-05-18, MUL-2334 — Trump Agent
wedged for 6+ min with no semantic activity events to drive any existing
timer). The model then waits forever for the missing output; only the
10-minute semantic inactivity timeout would eventually rescue the run.

Add a per-call watchdog in the codex client that tracks open
exec_command / commandExecution items by call_id and fails the turn
quickly (default 2 min, configurable via ExecOptions.ExecCommandStuckTimeout)
when one stays open without progress. outputDelta events reset the
per-call progress timestamp so long-running streaming commands aren't
flagged.
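
The bookkeeping can be sketched in Go roughly as below. All names
(`execWatchdog`, `Begin`, `Progress`, `End`, `Stuck`, `at`) are
hypothetical, the sketch is not goroutine-safe, and time is passed in
explicitly to keep it deterministic; the real client wires this to codex
events and timers.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// defaultStuckTimeout mirrors the 2 minute default named above.
const defaultStuckTimeout = 2 * time.Minute

// execWatchdog tracks the last progress time of each open call.
type execWatchdog struct {
	timeout      time.Duration
	lastProgress map[string]time.Time
}

func newExecWatchdog(timeout time.Duration) *execWatchdog {
	return &execWatchdog{timeout: timeout, lastProgress: make(map[string]time.Time)}
}

// Begin marks a call as open.
func (w *execWatchdog) Begin(callID string, now time.Time) { w.lastProgress[callID] = now }

// Progress refreshes an open call, e.g. on an outputDelta event.
func (w *execWatchdog) Progress(callID string, now time.Time) {
	if _, open := w.lastProgress[callID]; open {
		w.lastProgress[callID] = now
	}
}

// End marks a call as finished.
func (w *execWatchdog) End(callID string) { delete(w.lastProgress, callID) }

// Stuck lists open calls with no progress for at least the timeout.
func (w *execWatchdog) Stuck(now time.Time) []string {
	var stuck []string
	for id, last := range w.lastProgress {
		if now.Sub(last) >= w.timeout {
			stuck = append(stuck, id)
		}
	}
	sort.Strings(stuck)
	return stuck
}

// at is a demo helper: a fixed instant sec seconds after the Unix epoch.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

func main() {
	w := newExecWatchdog(defaultStuckTimeout)
	w.Begin("call-1", at(0))
	w.Begin("call-2", at(0))
	w.Progress("call-1", at(90)) // streaming output keeps call-1 alive
	fmt.Println(w.Stuck(at(130))) // prints [call-2]
}
```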

This is a daemon-side mitigation only — codex itself still has the
upstream race, but the daemon no longer burns the full inactivity budget
before the run is marked failed and a new run can recover.

Co-authored-by: multica-agent <github@multica.ai>

* feat(codex): track legacy exec_command_output_delta in watchdog (MUL-2337)

Mirrors the raw v2 item/commandExecution/outputDelta refresh on the legacy
codex/event protocol so a long-running streaming exec doesn't get falsely
flagged as stuck after begin + 2 min.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-18 15:14:45 +08:00
Bohan Jiang
113c4f4e90 docs(agent): clarify openclaw agent id vs name semantics (#2744)
Follow-up to #2716. Updates two stale comments that still described
openclaw's `name` and `id` as interchangeable. The actual contract:
`id` is the routing key passed to `openclaw agent --agent <id>`;
`name` is a human display label and is not safe to pass to the CLI.

No behavior change.

Co-authored-by: multica-agent <github@multica.ai>
2026-05-17 17:20:41 +08:00
Kagura
44d2fc1946 fix(agent): use openclaw agent id instead of name for --agent flag (#2716)
openclawEntriesToModels() used the agent Name (which may contain
spaces, e.g. "Sub2API OPS") as Model.ID. This ID is passed to
openclaw via --agent, where normalizeAgentId mangles spaces into
hyphens ("sub2api-ops"), causing a lookup miss against the
registered id ("sub2api") and a "no parseable output" error.

Fix: prefer agent ID for Model.ID; use Name only for display Label.
When ID is empty, fall back to Name for backward compatibility.

Fixes #2714
2026-05-17 17:08:00 +08:00
Bohan Jiang
8d872b7521 fix(daemon): disable Claude AskUserQuestion in non-interactive mode (MUL-2244) (#2656)
* fix(daemon): disable Claude AskUserQuestion in non-interactive mode (MUL-2244)

GitHub #2588: when Claude Code calls its built-in AskUserQuestion tool
inside the daemon's stream-json runtime, the question never reaches the
user — there's no UI to render it — so the SDK returns an empty answer
and the agent silently "infers" and continues. From the issue's
perspective, execution looks stuck while the agent is actually charging
ahead on its own guess.

Two-part fix:

- `buildClaudeArgs` now passes `--disallowedTools AskUserQuestion` so
  the tool is not exposed to the model at all.
- The Claude-specific runtime brief tells the agent to use a `blocked`
  issue comment for genuine clarification, or to state an explicit
  assumption and proceed.

Adds a regression test that pins both: AskUserQuestion is forbidden in
CLAUDE.md and is NOT mentioned in the AGENTS.md emitted for non-Claude
providers (the tool is Claude-specific).

Co-authored-by: multica-agent <github@multica.ai>

* refactor(daemon): drop CLAUDE.md AskUserQuestion guidance, rely on --disallowedTools

The --disallowedTools flag already prevents Claude from invoking
AskUserQuestion, so duplicating the rule in the runtime brief just bloats
the prompt without changing behavior. Removes the section and its
regression test; the argv-level test in pkg/agent already pins the flag.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-15 12:42:23 +08:00
fr00st
cc9fbd3db0 Fix stale Done replies on comment follow-ups (#2495)
* fix: avoid stale done replies on comment follow-ups

* fix: avoid inlining runtime brief for Hermes ACP

* fix: address comment follow-up review feedback
2026-05-14 12:00:04 +08:00
Bohan Jiang
5db96b4007 fix(daemon): bypass Gemini folder-trust gate in headless mode (#2516) (#2523)
Gemini CLI's folder-trust feature throws FatalUntrustedWorkspaceError
(exit code 55) when the current workspace isn't in
`~/.gemini/trustedFolders.json` and the process is headless — no
interactive trust prompt is available. The daemon spawns gemini with
`-p` + `--yolo` in a freshly checked-out worktree that the user has
never trusted interactively, so every run with `security.folderTrust`
enabled fails after ~10s with exit status 55 and no useful output.

Default `GEMINI_CLI_TRUST_WORKSPACE=true` on the child env to short-
circuit `checkPathTrust` in gemini-core. This mirrors gemini-cli's
documented `--skip-trust` flag; the env var has been gemini's
documented headless escape hatch for the entire folder-trust feature
lifetime so the fix works on every gemini version that can produce
the crash. Callers that explicitly set the same key in cfg.Env win,
preserving the ability to opt back into the gate.
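
A small Go sketch of the "default unless the caller set it" layering.
`buildChildEnv` and `lookup` are hypothetical names; the sketch relies on
os/exec using the last value when `Cmd.Env` contains duplicate keys, and
it assumes the default also overrides any value inherited from the
parent environment.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

const trustKey = "GEMINI_CLI_TRUST_WORKSPACE"

// buildChildEnv layers parent env, then the trust default, then cfgEnv.
// Later entries win, so an explicit cfgEnv value overrides the default.
func buildChildEnv(parent []string, cfgEnv map[string]string) []string {
	env := append([]string{}, parent...)
	if _, set := cfgEnv[trustKey]; !set {
		env = append(env, trustKey+"=true")
	}
	keys := make([]string, 0, len(cfgEnv))
	for k := range cfgEnv {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order for the sketch
	for _, k := range keys {
		env = append(env, k+"="+cfgEnv[k])
	}
	return env
}

// lookup returns the last value for key, matching os/exec semantics.
func lookup(env []string, key string) (string, bool) {
	val, found := "", false
	for _, kv := range env {
		if strings.HasPrefix(kv, key+"=") {
			val, found = kv[len(key)+1:], true
		}
	}
	return val, found
}

func main() {
	def := buildChildEnv([]string{"PATH=/usr/bin"}, nil)
	opt := buildChildEnv([]string{"PATH=/usr/bin"}, map[string]string{trustKey: "false"})
	fmt.Println(lookup(def, trustKey))
	fmt.Println(lookup(opt, trustKey))
}
```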

Co-authored-by: multica-agent <github@multica.ai>
2026-05-13 17:05:12 +08:00
Bohan Jiang
178cfb5008 fix(daemon): strip Windows chcp noise from runtime version (#2516) (#2521)
The gemini CLI's Windows shim prints `Active code page: 65001` (from
`chcp`) to stdout ahead of the real version string in `--version` output.
The daemon stored the raw concatenation as the runtime version, so the
runtime detail page rendered `Active code page: 65001 0.42.0` instead
of `0.42.0`.

Scan `<cli> --version` line by line and return the first line carrying
a semver-shaped token. Full strings like `2.1.5 (Claude Code)` or
`codex-cli 0.118.0` survive unchanged; unparseable output falls back to
the trimmed raw value.
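
A minimal Go sketch of that scan, assuming "semver-shaped" means three
dot-separated numbers; `pickVersionLine` and the regular expression are
hypothetical and may be looser or stricter than the daemon's check.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// semverToken matches a MAJOR.MINOR.PATCH shaped token anywhere in a line.
var semverToken = regexp.MustCompile(`\d+\.\d+\.\d+`)

// pickVersionLine returns the first line that carries a semver-shaped
// token, or the trimmed raw output when no line qualifies.
func pickVersionLine(raw string) string {
	for _, line := range strings.Split(raw, "\n") {
		line = strings.TrimSpace(line) // also strips a trailing \r
		if semverToken.MatchString(line) {
			return line
		}
	}
	return strings.TrimSpace(raw)
}

func main() {
	fmt.Println(pickVersionLine("Active code page: 65001\r\n0.42.0\r\n"))
	fmt.Println(pickVersionLine("2.1.5 (Claude Code)\n"))
	fmt.Println(pickVersionLine("codex-cli 0.118.0\n"))
}
```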

Co-authored-by: multica-agent <github@multica.ai>
2026-05-13 16:58:14 +08:00
Kagura
702c48209b fix(agent): stop filtering Pi extension tools via hardcoded --tools allowlist (#2379) (#2381)
The Pi backend hardcoded `--tools read,bash,edit,write,grep,find,ls` in
buildPiArgs. Pi's SDK treats --tools as a restrictive allowlist: only the
listed tools pass through `_refreshToolRegistry()`, silently filtering
out any user-installed extension tools registered via `pi.registerTool()`.

Omitting --tools makes Pi's `allowedToolNames` undefined, so the
`isAllowedTool()` filter becomes a no-op and all tools — built-in and
extension — are available. This matches Pi's standalone behavior.

Users who want to restrict tools can still pass --tools via custom_args
(it is not in piBlockedArgs).

Closes #2379
2026-05-11 16:11:32 +08:00
Multica Eve
e79ffc0f01 fix(agent): expand Copilot CLI model catalog with correct dotted IDs (#2336)
* fix(agent): expand Copilot CLI model catalog with correct dotted IDs

The Copilot CLI provider only exposed two models in the runtime
dropdown, and one of them used the dashed legacy form
`claude-sonnet-4-6` which `copilot --model` rejects with
"Model ... is not available". The CLI accepts dotted IDs
(e.g. `claude-sonnet-4.6`, `gpt-5.4`).

Sync `copilotStaticModels()` with the official supported-models
catalog so the dropdown surfaces the full set the user's account
can route to (8 OpenAI + 4 Anthropic), and add a regression test
that pins the expected IDs and bans the dashed form.

Closes MUL-1948.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

* feat(agent): dynamic Copilot model discovery via ACP session/new

The previous static catalog could only ever lag behind the user's
real entitlements and what GitHub ships. Copilot CLI exposes the
live catalog through its ACP server (`copilot --acp`): the
`session/new` response includes `models.availableModels` plus
`currentModelId`, scoped to the authenticated account.

Wire copilot through the existing discoverACPModels helper —
already used by hermes/kimi/kiro — so the dropdown reflects the
account's real catalog, including the `auto` entry and per-tier
model availability (Pro / Pro+ / Enterprise / evaluation models).

The Copilot CLI puts itself into ACP server mode via the `--acp`
flag instead of an `acp` subcommand, so acpDiscoveryProvider now
takes an optional acpArgs override.

Copilot's ACP payload omits the vendor name, so a small
prefix-based inferCopilotProvider keeps the UI's openai /
anthropic / google grouping working.

When the binary is missing or auth fails, fall back to
copilotStaticModels() so self-hosted runtimes without a copilot
install still see a populated dropdown.

Verified against `copilot 1.0.44`: live discovery returns 13
models with gpt-5.5 marked Default. Closes MUL-1948.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

* fix(agent): drop no-op COPILOT_ALLOW_ALL env and generalize OpenAI o-series prefix check

- discoverCopilotModels: remove COPILOT_ALLOW_ALL=1 (not a real
  Copilot CLI env var; copy-pasta from HERMES_YOLO_MODE=1).
  Discovery only drives initialize + session/new which never
  trigger tool-permission prompts, so no extra env is needed.
- inferCopilotProvider: replace the o1/o3/o4 prefix chain with a
  generic o<digit>+ check via isOpenAIReasoningSeriesID, so future
  o5/o6/… reasoning models are tagged as openai automatically.
  Guards against false positives like 'opus-…' or bare 'o'.
- Extend TestInferCopilotProvider with o5/o6 forward-compat cases
  and negative cases (opus-fake, omni, o).
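
An `o<digit>+` check along these lines could be written as the Go sketch
below. `looksLikeOSeriesID` is a hypothetical stand-in, not the real
`isOpenAIReasoningSeriesID`; in particular, requiring end-of-string or a
`-` / `.` separator after the digits is an assumption.

```go
package main

import "fmt"

// looksLikeOSeriesID reports whether id is "o" followed by one or more
// digits, then either nothing or a '-' / '.' separated suffix.
func looksLikeOSeriesID(id string) bool {
	if len(id) < 2 || id[0] != 'o' {
		return false
	}
	i := 1
	for i < len(id) && id[i] >= '0' && id[i] <= '9' {
		i++
	}
	if i == 1 {
		return false // no digits after the leading 'o'
	}
	return i == len(id) || id[i] == '-' || id[i] == '.'
}

func main() {
	for _, id := range []string{"o3", "o5", "o4-mini", "opus-fake", "omni", "o"} {
		fmt.Println(id, looksLikeOSeriesID(id))
	}
}
```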

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: Eve <eve@multica-ai.local>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-11 14:36:43 +08:00
Multica Eve
72e89a74f3 fix: surface copilot failure details (#2396)
Co-authored-by: Eve <eve@multica-ai.local>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-11 14:08:33 +08:00
Bohan Jiang
b73a301bf9 fix(agent): drain stderr before deciding ACP failure promotion (#2333)
`hermes`, `kimi`, and `kiro` all wired stderr through
`cmd.Stderr = io.MultiWriter(logWriter, providerErrSniffer)`.
The OS-pipe → MultiWriter copy goroutine that exec spawns for
that form is only joined by `cmd.Wait()`, which the lifecycle
goroutine fires in deferred cleanup — *after*
`promoteACPResultOnProviderError` already consulted the sniffer.
When stopReason=end_turn (success) raced ahead of the stderr
drain, the sniffer's `lines` slice was empty, the helper fell
through to the synthetic agent-text fallback ("hermes provider
error: API call failed after 3 retries"), and the actionable
upstream signal (HTTP 429 / usage limit) was lost.

This was visible as a flaky
`TestHermesBackendPromotesProviderErrorWithNonEmptyOutput` in CI
under high parallelism — a real prod bug, not a test issue: live
runs hit the same race when an upstream LLM returns 429 and
hermes' synthetic agent turn beats the stderr drain to the
parent.

Replace the MultiWriter wiring with `cmd.StderrPipe()` + an
explicit copier goroutine that signals on `stderrDone`. The
lifecycle goroutine already awaits `<-readerDone` for stdout;
add `<-stderrDone` next to it before `promoteACPResultOnProviderError`
runs. The deferred `cmd.Wait()` ordering is unchanged — it just
becomes a cheap reap by the time it fires.
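
The pattern can be sketched in Go as follows. `runAndDrainStderr` is a
hypothetical helper that only handles stderr; the real backends also
read stdout and feed a sniffer rather than a plain buffer. The demo
command assumes a Unix `sh` is available.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os/exec"
)

// runAndDrainStderr starts the command, copies stderr in an explicit
// goroutine, and inspects the captured bytes only after the copier has
// signalled completion. cmd.Wait is called afterwards to reap the process.
func runAndDrainStderr(name string, args ...string) (string, error) {
	cmd := exec.Command(name, args...)
	stderr, err := cmd.StderrPipe()
	if err != nil {
		return "", err
	}
	if err := cmd.Start(); err != nil {
		return "", err
	}

	var buf bytes.Buffer
	stderrDone := make(chan struct{})
	go func() {
		defer close(stderrDone)
		_, _ = io.Copy(&buf, stderr) // returns once the child closes stderr
	}()

	<-stderrDone // every stderr byte is in buf before any decision is made
	captured := buf.String()
	return captured, cmd.Wait()
}

func main() {
	out, err := runAndDrainStderr("sh", "-c", "echo 'HTTP 429: usage limit' 1>&2")
	fmt.Printf("stderr=%q err=%v\n", out, err)
}
```

Reading to EOF before calling `cmd.Wait()` also matches the os/exec
guidance that Wait must not be called until all reads from the pipe
have completed.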

Verified: `go test ./pkg/agent/ -run "TestHermes|TestKimi|TestKiro"
-count=10 -race`, then full package `-count=3 -race`, all green.

Co-authored-by: multica-agent <github@multica.ai>
2026-05-09 17:34:25 +08:00
LinYushen
f70105fb12 fix(agent): include JSON-RPC error data field in ACP error messages (#2327)
ACP backends (Kiro, Hermes, Kimi) put the actionable reason for
code=-32603 'Internal error' in the JSON-RPC `data` field, e.g.
"No session found with id". The wrapped Go error only carried
`code` and `message`, leaving operators staring at a bare
"kiro session/prompt failed: session/prompt: Internal error
(code=-32603)" with no way to tell apart session expiry, model
unavailability, lost auth, or quota.

Parse `data` too. Strings render unquoted; objects/arrays render
as raw JSON; null/missing keeps the previous format unchanged.
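
A Go sketch of those rendering rules. The `rpcError` type, the
`describe` helper, and the `": <data>"` suffix are assumptions made for
illustration; only the `message (code=N)` base format and the
string/object/null rules come from the description above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// rpcError mirrors the JSON-RPC 2.0 error object.
type rpcError struct {
	Code    int             `json:"code"`
	Message string          `json:"message"`
	Data    json.RawMessage `json:"data,omitempty"`
}

// String renders the error, appending data when it is present and non-null.
func (e rpcError) String() string {
	base := fmt.Sprintf("%s (code=%d)", e.Message, e.Code)
	data := bytes.TrimSpace(e.Data)
	if len(data) == 0 || bytes.Equal(data, []byte("null")) {
		return base // missing or null: previous format unchanged
	}
	var s string
	if json.Unmarshal(data, &s) == nil {
		return base + ": " + s // strings render unquoted
	}
	return base + ": " + string(data) // objects/arrays render as raw JSON
}

// describe decodes a raw error object and formats it.
func describe(raw string) (string, error) {
	var e rpcError
	if err := json.Unmarshal([]byte(raw), &e); err != nil {
		return "", err
	}
	return e.String(), nil
}

func main() {
	msg, _ := describe(`{"code":-32603,"message":"Internal error","data":"No session found with id"}`)
	fmt.Println(msg)
}
```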

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-09 16:19:57 +08:00