Compare commits

..

6 Commits

Author SHA1 Message Date
Jiayuan Zhang
aaa5529f61 fix(agents): honour default flag wire-side and stop masking model errors
Addresses three issues from the latest PR #1399 review.

1. Wire the `default` flag end-to-end. The Model struct tagged one
   entry per provider as Default=true so the UI can badge it, but
   the daemon's heartbeat-report writer serialised models as
   `map[string]string`, silently dropping the bool; handler's
   `ModelEntry` also lacked the field. Result: the dropdown always
   showed "Default (provider)" and never "Default — <name>".
   Replace the map with a typed wire struct on the daemon side and
   add `Default bool` to `handler.ModelEntry`, plus a regression
   test asserting the flag round-trips through the report body
   and the store.

2. Stop imposing a static Multica-side default on task execution.
   `runTask` resolved the model via a three-tier chain that ended
   in `agent.DefaultModel(provider)`, forcing `claude-sonnet-4-6`
   / `gpt-5.4` / etc. onto every task whose agent hadn't set a
   model. That's the same shape of bug that bit us on cursor: any
   Go-side guess drifts from what the user's account actually has
   access to. Drop the third tier — when both `agent.model` and
   `MULTICA_<PROVIDER>_MODEL` are empty we now pass `""` through,
   so each backend omits `--model` from the CLI and the provider
   resolves its own default. `DefaultModel` / `defaultStaticModelsFor`
   become dead and are removed along with their test; the per-entry
   `Default: true` markers remain as a *display* hint on
   discovery responses (and are now surfaced via Fix #1).

3. Hermes: fail the task when the user's chosen model can't be
   applied. `session/set_model` errors used to log a warn and
   continue on hermes' own default, so a successful task would
   mislead the user into thinking their explicit pick ran. Now,
   when `opts.Model != ""` and the RPC fails, we send a `failed`
   Result with the hermes error in `result.Error` and skip the
   prompt entirely.

Plus gofmt clean-up on `hermes.go` (the new sniffer block picked
up tab/space noise on the prior commit).
2026-04-21 00:00:34 +08:00
Jiayuan Zhang
a8e2ef4c4d fix(agent/hermes): surface provider errors instead of reporting empty output
When the model the user picked isn't usable by the configured
provider — e.g. ChatGPT-account auth rejecting `gpt-5.1-codex-mini`
as "not supported when using Codex with a ChatGPT account" — hermes
still acknowledges `session/prompt` with `stopReason=end_turn` and
an empty text payload. The daemon's task machinery then reports a
useless "hermes returned empty output" and the real HTTP 400 detail
(the one that actually tells the user what to change) stays buried
in the daemon's stderr log.

Tee stderr through a bounded provider-error sniffer that scans for
`⚠️` / `` / `[ERROR]` headers plus the `Error:` / `detail:` tails
hermes emits for API failures. When the task finishes `completed`
with empty output, promote it to `failed` and use the sniffed
message as the task error. Nothing changes for healthy runs — the
sniffer is an additional writer behind an `io.MultiWriter`, it
doesn't filter or replace the normal stderr log forwarder.

Reproduced against a local hermes configured for
`provider: openai-codex` + `chatgpt.com/backend-api/codex`: before
this change the task result was `{status: completed, comment:
"hermes returned empty output"}`; after, it's
`{status: failed, error: "hermes provider error: HTTP 400: ... The
'gpt-5.1-codex-mini' model is not supported when using Codex with
a ChatGPT account."}`.

Tests cover the real fixture shape, partial-line buffering across
Writer calls, log-line filtering (info-level lines never surface
as errors), and the bounded line buffer.
2026-04-20 23:45:49 +08:00
Jiayuan Zhang
9acaf248dc feat(agent/hermes): ACP-driven model discovery and per-task selection
Hermes' ACP server already exposes everything we need to treat its
model catalog like any other provider's:

* `acp_adapter/server.py::_build_model_state` attaches a full
  `SessionModelState` (availableModels + currentModelId) to the
  response of `session/new` / `session/load` / `session/resume`.
* `acp_adapter/server.py::set_session_model` is an ACP-standard RPC
  that switches the active session's provider + model.

Use both from Multica's daemon instead of treating hermes as a
"dropdown-disabled" special case.

Discovery — `discoverHermesModels` spawns a throwaway `hermes acp`
process, drives `initialize` + `session/new(cwd=<tmpdir>)`, reads the
model catalog straight out of the response, then kills the child.
Failures (hermes missing, bad credentials, config resolution error)
degrade to an empty list so the creatable dropdown still works. The
existing 60s TTL cache amortises the ~few-second process spin-up.

Per-task selection — `hermesBackend.Execute` now issues a
`session/set_model` RPC with the UI-chosen `opts.Model` before
`session/prompt`. Failure is logged but not fatal; the session
falls back to hermes' configured default rather than aborting the
task.

`ModelSelectionSupported` no longer special-cases hermes — every
provider in the registry now honours `agent.model` end-to-end. The
UI "disabled" branch of ModelDropdown is retained (dead for today's
catalog) so a future provider can opt out without reintroducing this
plumbing.

Tests cover the JSON parser for the real `SessionModelState` shape
(including `currentModelId` mapping to Default=true, dedup of
duplicate `modelId` entries, and graceful handling of missing /
malformed payloads), plus a regression guard on the support flag.
2026-04-20 23:34:17 +08:00
Jiayuan Zhang
ad5a2abaa2 fix(agent/cursor): discover models via cursor-agent --list-models
The previous cursor catalog hardcoded model IDs like `claude-sonnet-4-6`,
`gpt-5.4`, `gemini-2.5-pro` and `composer-1.5`. None of those are valid
cursor-agent model names today — the real IDs have shape
`composer-2-fast`, `claude-4.6-sonnet-medium`, `claude-opus-4-7-high`,
`gemini-3.1-pro`, etc. Selecting one of the stale IDs from the UI made
cursor-agent exit 1 during task execution with no actionable error,
visible in practice as "cursor-agent exited with error: exit status 1"
on a 2-second-long failed run.

Cursor's catalog is too volatile to ship statically (there are ~60+
variants covering the fast/medium/high/xhigh/max × thinking matrix).
Switch to dynamic discovery via `cursor-agent --list-models`, with a
single-entry `auto` static fallback when the binary is missing.

- `discoverCursorModels` shells out and parses the `<id> - <label>`
  rows; the entry tagged `(default)` surfaces as Default=true so the
  dropdown badge matches cursor's own pick rather than ours.
- Strip the parenthetical suffix from the display label so
  "Composer 2 Fast (current, default)" renders as "Composer 2 Fast".
- Reuses the same identifier guard as the openclaw text parser to
  skip the "Available models" header / blank lines.
- Tests parse the real fixture from the live CLI.

DefaultModel("cursor") now returns "auto" (the static-fallback
default) instead of a specific ID, since we intentionally don't ship
an opinionated cursor model — cursor itself surfaces the real default
through the Default flag on whichever entry `--list-models` tags.
2026-04-20 23:09:56 +08:00
Jiayuan Zhang
6094a149ac fix(agent/openclaw): reject TUI decoration when enumerating agents
`openclaw agents list` prints a decorated banner with box-drawing
characters and section headers ("Agents:", "Identity:", "Workspace:")
surrounding the actual agent rows. The previous text parser picked
up the first whitespace-separated token on every non-empty,
non-dash-prefixed line, which surfaced the section headers and
single-character icons ("│", "◇") as selectable models in the
dropdown.

- Try structured JSON output first (`agents list --json` / `--output
  json` / `-o json`); on any of those the UI gets names straight from
  the CLI's source of truth, with the model shown as a label suffix.
- Fall back to a conservative text parser that only accepts rows with
  at least two whitespace-separated tokens where both look like
  safe identifiers (letters / digits / `-_./`, no trailing colon).
  Anything decorative or header-like is discarded.
- On ambiguity we return an empty list — a silently-wrong enumeration
  would mislead users more than the creatable dropdown's manual entry.

Adds a regression test using the exact banner shape seen in the field
plus two tests for the JSON paths (array and `{agents: [...]}` wrapped).
2026-04-20 22:51:52 +08:00
Jiayuan Zhang
a67e533742 feat(agents): add per-agent model field with provider-aware dropdown
Adds a first-class `model` field to agents so users can pick the LLM model
from the create / settings UI instead of editing custom_env / custom_args.
The previous "set MULTICA_<PROVIDER>_MODEL env var on the daemon" approach
forced one model per provider per machine and was easy to misconfigure
(e.g. -m as a custom_arg breaks codex app-server initialization).

Backend (server/pkg/agent):
- New `agent.ListModels(provider, path)` returns the models supported by a
  provider. Static catalogs for claude, codex, gemini, cursor, copilot;
  dynamic discovery for opencode (`opencode models`), pi (`pi --list-models`),
  openclaw (`openclaw agents list`); 60s TTL cache + empty-list fallback on
  failure. Hermes returns an empty list and `ModelSelectionSupported=false`
  because its model is configured out-of-band.
- `agent.DefaultModel(provider)` returns the recommended default per
  provider (Sonnet 4.6 for claude, GPT-5.4 for codex, Gemini 2.5 Pro for
  gemini, composer-1.5 for cursor); copilot/openclaw/hermes deliberately
  have no default. The static catalog tags one entry per provider with
  `Default: true` so the UI can render a badge.
- For openclaw, opts.Model is mapped to `--agent <name>` since the CLI
  rejects `--model` outright; custom_args `--agent` still wins for
  back-compat.

Daemon protocol (server/internal/daemon):
- Heartbeat response carries an optional `pending_model_list` request
  (same pattern as PingStore / UpdateStore). The daemon resolves models
  via `agent.ListModels`, including the `supported` flag, and reports
  back via /api/daemon/runtimes/{id}/models/{requestId}/result.
- Task dispatch uses a three-tier fallback for the runtime model:
  agent.model → MULTICA_<PROVIDER>_MODEL env → agent.DefaultModel(provider).

Server API (server/internal/handler):
- `agent.model` is a new column (migration 050) and surfaces in
  Agent / CreateAgent / UpdateAgent payloads.
- New endpoints under /api/runtimes/{id}/models: POST to initiate
  discovery, GET to poll the request, plus the daemon-side report
  endpoint above.

CLI (server/cmd/multica):
- `multica agent create / update --model <id>`. Help copy steers users
  away from passing --model via --custom-args, which fails on codex
  (app-server mode) and openclaw.

Frontend (packages/core, packages/views):
- `Agent.model`, `RuntimeModel`, `RuntimeModelListRequest`,
  `RuntimeModelsResult` types.
- `runtimes/models.ts` exports `runtimeModelsOptions(runtimeId)` which
  initiates discovery and polls the request to completion (500ms
  cadence, 30s ceiling).
- New `ModelDropdown` (packages/views/agents) — searchable popover,
  provider grouping, creatable manual entry, "default" badge on the
  shipped recommendation, disabled state when the provider reports
  `supported=false` (Hermes), and clears any stale model value in that
  case to avoid persisting a ghost configuration.
- Wired into create-agent-dialog and the agent settings tab.

Verification:
- gofmt clean on touched files
- `go build ./... && go test ./...` (server) green; new openclaw and
  models_test cases included
- `pnpm typecheck` green across all 6 packages

Closes the immediate UX gap behind MUL-1151. DeepSeek V4 (or any new
model) becomes a zero-code addition: add it to the relevant static
catalog, or rely on the creatable input for one-off use.
2026-04-20 22:45:56 +08:00
12 changed files with 96 additions and 1448 deletions

View File

@@ -111,20 +111,6 @@ function CursorLogo({ className }: { className: string }) {
);
}
// Kimi (Moonshot AI) — wordmark "K" mark in Moonshot brand purple, simple
// rounded-square logotype suitable for small icon sizes.
function KimiLogo({ className }: { className: string }) {
return (
<svg viewBox="0 0 24 24" fill="none" className={className}>
<rect width="24" height="24" rx="5" fill="#1F1147" />
<path
d="M7.2 6h2.4v5.1l4.3-5.1h2.9l-4.4 5.1L17 18h-2.9l-3.2-5.2-1.3 1.5V18H7.2V6z"
fill="#FFFFFF"
/>
</svg>
);
}
export function ProviderLogo({
provider,
className = "h-4 w-4",
@@ -149,8 +135,6 @@ export function ProviderLogo({
return <CopilotLogo className={className} />;
case "cursor":
return <CursorLogo className={className} />;
case "kimi":
return <KimiLogo className={className} />;
default:
return <Monitor className={className} />;
}

View File

@@ -34,7 +34,7 @@ type Config struct {
CLIVersion string // multica CLI version (e.g. "0.1.13")
LaunchedBy string // "desktop" when spawned by the Electron app, empty for standalone
Profile string // profile name (empty = default)
Agents map[string]AgentEntry // keyed by provider: claude, codex, copilot, opencode, openclaw, hermes, gemini, pi, cursor, kimi
Agents map[string]AgentEntry // keyed by provider: claude, codex, opencode, openclaw, hermes, gemini, pi
WorkspacesRoot string // base path for execution envs (default: ~/multica_workspaces)
KeepEnvAfterTask bool // preserve env after task for debugging
HealthPort int // local HTTP port for health checks (default: 19514)
@@ -142,15 +142,8 @@ func LoadConfig(overrides Overrides) (Config, error) {
Model: strings.TrimSpace(os.Getenv("MULTICA_COPILOT_MODEL")),
}
}
kimiPath := envOrDefault("MULTICA_KIMI_PATH", "kimi")
if _, err := exec.LookPath(kimiPath); err == nil {
agents["kimi"] = AgentEntry{
Path: kimiPath,
Model: strings.TrimSpace(os.Getenv("MULTICA_KIMI_MODEL")),
}
}
if len(agents) == 0 {
return Config{}, fmt.Errorf("no agent CLI found: install claude, codex, copilot, opencode, openclaw, hermes, gemini, pi, cursor-agent, or kimi and ensure it is on PATH")
return Config{}, fmt.Errorf("no agent CLI found: install claude, codex, copilot, opencode, openclaw, hermes, gemini, pi, or cursor-agent and ensure it is on PATH")
}
// Host info

View File

@@ -17,7 +17,6 @@ import (
// OpenCode: skills → {workDir}/.config/opencode/skills/{name}/SKILL.md (native discovery)
// Pi: skills → {workDir}/.pi/agent/skills/{name}/SKILL.md (native discovery)
// Cursor: skills → {workDir}/.cursor/skills/{name}/SKILL.md (native discovery)
// Kimi: skills → {workDir}/.kimi/skills/{name}/SKILL.md (native discovery)
// Default: skills → {workDir}/.agent_context/skills/{name}/SKILL.md
func writeContextFiles(workDir, provider string, ctx TaskContextForEnv) error {
contextDir := filepath.Join(workDir, ".agent_context")
@@ -70,10 +69,6 @@ func resolveSkillsDir(workDir, provider string) (string, error) {
case "cursor":
// Cursor natively discovers skills from .cursor/skills/ in the workdir.
skillsDir = filepath.Join(workDir, ".cursor", "skills")
case "kimi":
// Kimi Code CLI auto-discovers project-level skills from .kimi/skills/
// in the workdir. See https://moonshotai.github.io/kimi-cli/en/customization/skills.html
skillsDir = filepath.Join(workDir, ".kimi", "skills")
default:
// Fallback: write to .agent_context/skills/ (referenced by meta config).
skillsDir = filepath.Join(workDir, ".agent_context", "skills")

View File

@@ -18,14 +18,13 @@ import (
// For Gemini: writes {workDir}/GEMINI.md (discovered natively by the Gemini CLI)
// For Pi: writes {workDir}/AGENTS.md (skills discovered natively from ~/.pi/agent/skills/)
// For Cursor: writes {workDir}/AGENTS.md (skills discovered natively from .cursor/skills/)
// For Kimi: writes {workDir}/AGENTS.md (Kimi Code CLI reads AGENTS.md natively; skills auto-discovered from project skills dirs)
func InjectRuntimeConfig(workDir, provider string, ctx TaskContextForEnv) error {
content := buildMetaSkillContent(provider, ctx)
switch provider {
case "claude":
return os.WriteFile(filepath.Join(workDir, "CLAUDE.md"), []byte(content), 0o644)
case "codex", "copilot", "opencode", "openclaw", "pi", "cursor", "kimi":
case "codex", "copilot", "opencode", "openclaw", "pi", "cursor":
return os.WriteFile(filepath.Join(workDir, "AGENTS.md"), []byte(content), 0o644)
case "gemini":
return os.WriteFile(filepath.Join(workDir, "GEMINI.md"), []byte(content), 0o644)
@@ -152,8 +151,8 @@ func buildMetaSkillContent(provider string, ctx TaskContextForEnv) string {
case "claude":
// Claude discovers skills natively from .claude/skills/ — just list names.
b.WriteString("You have the following skills installed (discovered automatically):\n\n")
case "codex", "copilot", "opencode", "openclaw", "pi", "cursor", "kimi":
// Codex, Copilot, OpenCode, OpenClaw, Pi, Cursor, and Kimi discover skills natively from their respective paths — just list names.
case "codex", "copilot", "opencode", "openclaw", "pi", "cursor":
// Codex, Copilot, OpenCode, OpenClaw, Pi, and Cursor discover skills natively from their respective paths — just list names.
b.WriteString("You have the following skills installed (discovered automatically):\n\n")
case "gemini":
// Gemini reads GEMINI.md directly; point it at the fallback skills dir.

View File

@@ -1,6 +1,5 @@
// Package agent provides a unified interface for executing prompts via
// coding agents (Claude Code, Codex, Copilot, OpenCode, OpenClaw, Hermes,
// Gemini, Pi, Cursor, Kimi). It mirrors the happy-cli AgentBackend
// coding agents (Claude Code, Codex, OpenCode, OpenClaw, Hermes, Pi). It mirrors the happy-cli AgentBackend
// pattern, translated to idiomatic Go.
package agent
@@ -86,13 +85,13 @@ type Result struct {
// Config configures a Backend instance.
type Config struct {
ExecutablePath string // path to CLI binary (claude, codex, copilot, opencode, openclaw, hermes, gemini, pi, cursor, kimi)
ExecutablePath string // path to CLI binary (claude, codex, copilot, opencode, openclaw, hermes, gemini, or pi)
Env map[string]string // extra environment variables
Logger *slog.Logger
}
// New creates a Backend for the given agent type.
// Supported types: "claude", "codex", "copilot", "opencode", "openclaw", "hermes", "gemini", "pi", "cursor", "kimi".
// Supported types: "claude", "codex", "copilot", "opencode", "openclaw", "hermes", "gemini", "pi", "cursor".
func New(agentType string, cfg Config) (Backend, error) {
if cfg.Logger == nil {
cfg.Logger = slog.Default()
@@ -117,10 +116,8 @@ func New(agentType string, cfg Config) (Backend, error) {
return &piBackend{cfg: cfg}, nil
case "cursor":
return &cursorBackend{cfg: cfg}, nil
case "kimi":
return &kimiBackend{cfg: cfg}, nil
default:
return nil, fmt.Errorf("unknown agent type: %q (supported: claude, codex, copilot, opencode, openclaw, hermes, gemini, pi, cursor, kimi)", agentType)
return nil, fmt.Errorf("unknown agent type: %q (supported: claude, codex, copilot, opencode, openclaw, hermes, gemini, pi, cursor)", agentType)
}
}
@@ -145,7 +142,6 @@ var launchHeaders = map[string]string{
"openclaw": "openclaw agent (json)",
"opencode": "opencode run (json)",
"pi": "pi (json mode)",
"kimi": "kimi acp",
}
// LaunchHeader returns the user-visible launch skeleton for agentType, or an

View File

@@ -72,7 +72,7 @@ func TestLaunchHeaderCoversAllSupportedBackends(t *testing.T) {
// entry to launchHeaders in agent.go and extend this list.
supported := []string{
"claude", "codex", "copilot", "cursor", "gemini",
"hermes", "kimi", "openclaw", "opencode", "pi",
"hermes", "openclaw", "opencode", "pi",
}
for _, t_ := range supported {
if header := LaunchHeader(t_); header == "" {

View File

@@ -8,7 +8,6 @@ import (
"io"
"os/exec"
"regexp"
"strconv"
"strings"
"sync"
"time"
@@ -74,7 +73,7 @@ func (b *hermesBackend) Execute(ctx context.Context, prompt string, opts ExecOpt
// without this we'd report a misleading "empty output" and hide
// the real cause (wrong model for the current provider, bad
// credentials, rate limit, …) in the daemon log.
providerErr := newACPProviderErrorSniffer("hermes")
providerErr := newHermesProviderErrorSniffer()
cmd.Stderr = io.MultiWriter(newLogWriter(b.cfg.Logger, "[hermes:stderr] "), providerErr)
if err := cmd.Start(); err != nil {
@@ -93,10 +92,9 @@ func (b *hermesBackend) Execute(ctx context.Context, prompt string, opts ExecOpt
promptDone := make(chan hermesPromptResult, 1)
c := &hermesClient{
cfg: b.cfg,
stdin: stdin,
pending: make(map[int]*pendingRPC),
pendingTools: make(map[string]*pendingToolCall),
cfg: b.cfg,
stdin: stdin,
pending: make(map[int]*pendingRPC),
onMessage: func(msg Message) {
if msg.Type == MessageText {
outputMu.Lock()
@@ -190,7 +188,7 @@ func (b *hermesBackend) Execute(ctx context.Context, prompt string, opts ExecOpt
resCh <- Result{Status: finalStatus, Error: finalError, DurationMs: time.Since(startTime).Milliseconds()}
return
}
sessionID = extractACPSessionID(result)
sessionID = extractHermesSessionID(result)
if sessionID == "" {
finalStatus = "failed"
finalError = "hermes session/new returned no session ID"
@@ -338,7 +336,6 @@ type hermesPromptResult struct {
type hermesClient struct {
cfg Config
stdin interface{ Write([]byte) (int, error) }
writeMu sync.Mutex // serialises stdin.Write calls across goroutines
mu sync.Mutex
nextID int
pending map[int]*pendingRPC
@@ -346,39 +343,10 @@ type hermesClient struct {
onMessage func(Message)
onPromptDone func(hermesPromptResult)
// pendingTools buffers the args for tool calls whose input streams in
// across multiple ACP tool_call_update messages (kimi does this —
// tokens from the LLM arrive one at a time, and each update carries
// the cumulative args JSON so far). We defer emitting MessageToolUse
// until we either see status=completed/failed or have a full arg set,
// so the UI never sees a half-written command like `{"comma`.
toolMu sync.Mutex
pendingTools map[string]*pendingToolCall
usageMu sync.Mutex
usage TokenUsage
}
// pendingToolCall buffers state for a tool call while its arguments
// are streaming in. One entry per ACP toolCallId.
type pendingToolCall struct {
toolName string // already mapped via hermesToolNameFromTitle
input map[string]any // from rawInput when the agent sends it up front (hermes)
argsText string // accumulated `content[].text` args (kimi, cumulative)
emitted bool // whether we've already sent MessageToolUse
}
// writeLine serialises concurrent JSON-RPC writes so request() (main
// goroutine) and handleAgentRequest() (reader goroutine) don't
// interleave frames. The pipe itself is atomic for small writes, but
// we also want deterministic ordering under contention.
func (c *hermesClient) writeLine(data []byte) error {
c.writeMu.Lock()
defer c.writeMu.Unlock()
_, err := c.stdin.Write(data)
return err
}
func (c *hermesClient) request(ctx context.Context, method string, params any) (json.RawMessage, error) {
c.mu.Lock()
id := c.nextID
@@ -401,7 +369,7 @@ func (c *hermesClient) request(ctx context.Context, method string, params any) (
return nil, err
}
data = append(data, '\n')
if err := c.writeLine(data); err != nil {
if _, err := c.stdin.Write(data); err != nil {
c.mu.Lock()
delete(c.pending, id)
c.mu.Unlock()
@@ -434,11 +402,7 @@ func (c *hermesClient) handleLine(line string) {
return
}
// Agent → client request: has id + method (no result / error yet).
// Kimi uses this for session/request_permission; if we don't answer,
// the agent blocks for 300s and the task hangs. Hermes doesn't send
// these when launched with HERMES_YOLO_MODE=1, but we still handle
// the case generically for any future ACP backend we bolt on.
// Check if it's a response to our request (has id + result or error).
if _, hasID := raw["id"]; hasID {
if _, hasResult := raw["result"]; hasResult {
c.handleResponse(raw)
@@ -448,10 +412,6 @@ func (c *hermesClient) handleLine(line string) {
c.handleResponse(raw)
return
}
if _, hasMethod := raw["method"]; hasMethod {
c.handleAgentRequest(raw)
return
}
}
// Notification (no id, has method) — session updates from Hermes.
@@ -460,62 +420,6 @@ func (c *hermesClient) handleLine(line string) {
}
}
// handleAgentRequest replies to JSON-RPC requests the agent sends
// us (agent → client direction). The only one we care about today is
// `session/request_permission`: the daemon is headless and cannot
// actually prompt a user, so we auto-approve every action. Using
// `approve_for_session` rather than `approve` means subsequent
// identical actions (every Shell invocation, every file write) don't
// round-trip through us — the agent remembers them locally.
func (c *hermesClient) handleAgentRequest(raw map[string]json.RawMessage) {
var method string
_ = json.Unmarshal(raw["method"], &method)
rawID, ok := raw["id"]
if !ok {
return
}
var resp map[string]any
switch method {
case "session/request_permission":
resp = map[string]any{
"jsonrpc": "2.0",
"id": json.RawMessage(rawID),
"result": map[string]any{
"outcome": map[string]any{
"outcome": "selected",
"optionId": "approve_for_session",
},
},
}
c.cfg.Logger.Debug("auto-approved agent permission request", "method", method)
default:
// Unknown agent→client method — reply with standard "method
// not found" so the agent doesn't block waiting for us. Better
// than silence: the agent can decide how to proceed.
resp = map[string]any{
"jsonrpc": "2.0",
"id": json.RawMessage(rawID),
"error": map[string]any{
"code": -32601,
"message": "method not found: " + method,
},
}
c.cfg.Logger.Debug("unhandled agent→client request", "method", method)
}
data, err := json.Marshal(resp)
if err != nil {
c.cfg.Logger.Warn("marshal agent-request response", "method", method, "error", err)
return
}
data = append(data, '\n')
if err := c.writeLine(data); err != nil {
c.cfg.Logger.Warn("write agent-request response", "method", method, "error", err)
}
}
func (c *hermesClient) handleResponse(raw map[string]json.RawMessage) {
var id int
if err := json.Unmarshal(raw["id"], &id); err != nil {
@@ -656,283 +560,51 @@ func (c *hermesClient) handleAgentThought(data json.RawMessage) {
func (c *hermesClient) handleToolCallStart(data json.RawMessage) {
var msg struct {
ToolCallID string `json:"toolCallId"`
Title string `json:"title"`
Kind string `json:"kind"`
RawInput map[string]any `json:"rawInput"`
Content []json.RawMessage `json:"content"`
ToolCallID string `json:"toolCallId"`
Title string `json:"title"`
Kind string `json:"kind"`
RawInput map[string]any `json:"rawInput"`
}
if err := json.Unmarshal(data, &msg); err != nil {
return
}
toolName := hermesToolNameFromTitle(msg.Title, msg.Kind)
// Hermes pre-populates rawInput on the initial tool_call — emit
// MessageToolUse immediately so the UI can show the tool invocation
// live. Record the emission so handleToolCallUpdate doesn't re-emit
// on completion.
if msg.RawInput != nil {
c.trackTool(msg.ToolCallID, &pendingToolCall{
toolName: toolName,
input: msg.RawInput,
emitted: true,
if c.onMessage != nil {
c.onMessage(Message{
Type: MessageToolUse,
Tool: toolName,
CallID: msg.ToolCallID,
Input: msg.RawInput,
})
if c.onMessage != nil {
c.onMessage(Message{
Type: MessageToolUse,
Tool: toolName,
CallID: msg.ToolCallID,
Input: msg.RawInput,
})
}
return
}
// Kimi streams args token-by-token across tool_call_update messages;
// the initial tool_call often carries an empty content block. Buffer
// the tool and defer MessageToolUse emission to avoid the UI seeing
// a command with `{""` as its input.
c.trackTool(msg.ToolCallID, &pendingToolCall{
toolName: toolName,
argsText: extractACPToolCallText(msg.Content),
emitted: false,
})
}
func (c *hermesClient) handleToolCallUpdate(data json.RawMessage) {
var msg struct {
ToolCallID string `json:"toolCallId"`
Status string `json:"status"`
Title string `json:"title"`
Kind string `json:"kind"`
RawInput map[string]any `json:"rawInput"`
RawOutput string `json:"rawOutput"`
Content []json.RawMessage `json:"content"`
ToolCallID string `json:"toolCallId"`
Status string `json:"status"`
Kind string `json:"kind"`
RawOutput string `json:"rawOutput"`
}
if err := json.Unmarshal(data, &msg); err != nil {
return
}
// Mid-stream: only buffer updates. Kimi emits many of these per
// tool call, each carrying the cumulative args JSON so far.
// Only emit tool result when the call is completed.
if msg.Status != "completed" && msg.Status != "failed" {
if pending := c.getPendingTool(msg.ToolCallID); pending != nil && !pending.emitted {
if text := extractACPToolCallText(msg.Content); text != "" {
// kimi streams the full cumulative args on every frame;
// overwrite rather than concatenate.
pending.argsText = text
}
}
return
}
// Completion: emit any deferred MessageToolUse first, then the result.
pending := c.takePendingTool(msg.ToolCallID)
c.emitDeferredToolUse(pending, msg.ToolCallID, msg.Title, msg.Kind, msg.RawInput)
output := msg.RawOutput
if output == "" {
output = extractACPToolCallText(msg.Content)
}
if c.onMessage != nil {
c.onMessage(Message{
Type: MessageToolResult,
CallID: msg.ToolCallID,
Output: output,
Output: msg.RawOutput,
})
}
}
// trackTool stores pending-tool state for a given callID. Lazy-inits
// the map so zero-value hermesClient values (common in tests) don't
// panic on the first tool call.
func (c *hermesClient) trackTool(callID string, p *pendingToolCall) {
c.toolMu.Lock()
defer c.toolMu.Unlock()
if c.pendingTools == nil {
c.pendingTools = make(map[string]*pendingToolCall)
}
c.pendingTools[callID] = p
}
// getPendingTool returns the pending entry (may be nil) without
// removing it. Safe to call on a zero-value hermesClient.
func (c *hermesClient) getPendingTool(callID string) *pendingToolCall {
c.toolMu.Lock()
defer c.toolMu.Unlock()
if c.pendingTools == nil {
return nil
}
return c.pendingTools[callID]
}
// takePendingTool removes and returns the pending entry, or nil if
// none was tracked (e.g. the tool completed before we saw its start,
// or we missed the start frame).
func (c *hermesClient) takePendingTool(callID string) *pendingToolCall {
c.toolMu.Lock()
defer c.toolMu.Unlock()
if c.pendingTools == nil {
return nil
}
p := c.pendingTools[callID]
delete(c.pendingTools, callID)
return p
}
// emitDeferredToolUse emits a buffered MessageToolUse right before the
// matching MessageToolResult. Handles three cases:
// - hermes tool: already emitted on tool_call → skip
// - kimi tool with streamed args → parse accumulated JSON as Input
// - unknown tool (completed arrived without a start frame) →
// synthesize minimal info from the update's own fields
func (c *hermesClient) emitDeferredToolUse(
p *pendingToolCall,
callID, updateTitle, updateKind string,
updateRawInput map[string]any,
) {
if p != nil && p.emitted {
return
}
var toolName string
var input map[string]any
switch {
case p != nil && p.input != nil:
// Pre-buffered rawInput path — shouldn't happen because we set
// emitted=true in that case, but handle defensively.
toolName = p.toolName
input = p.input
case p != nil:
toolName = p.toolName
input = parseToolArgsJSON(p.argsText)
default:
// No record of the start frame — fall back to the update's own
// title/kind/rawInput so the UI at least sees the tool name.
toolName = hermesToolNameFromTitle(updateTitle, updateKind)
input = updateRawInput
}
if c.onMessage == nil {
return
}
c.onMessage(Message{
Type: MessageToolUse,
Tool: toolName,
CallID: callID,
Input: input,
})
}
// parseToolArgsJSON turns kimi's accumulated args string into the
// structured map the UI expects under Message.Input. Kimi sends args
// as a JSON-encoded object (`{"command":"echo hi"}`), so a full JSON
// parse recovers the original tool-arg shape. On malformed input
// (streaming glitch, non-JSON tool) we preserve the raw text under a
// `text` key so the UI still has something to render.
func parseToolArgsJSON(argsText string) map[string]any {
argsText = strings.TrimSpace(argsText)
if argsText == "" {
return nil
}
var m map[string]any
if err := json.Unmarshal([]byte(argsText), &m); err == nil {
return m
}
return map[string]any{"text": argsText}
}
// extractACPToolCallText concatenates the rendered text of every ACP
// block in a tool_call / tool_call_update's `content` array.
//
// Handles the two block types kimi emits:
// - {type:"content", content:{type:"text", text:"..."}} — plain text
// (shell output, tool args). Text is concatenated verbatim.
// - {type:"diff", path, oldText, newText} — FileEdit output. Rendered
// as a minimal unified-diff header so the UI distinguishes writes
// from reads without needing a diff viewer.
//
// Terminal blocks ({type:"terminal", terminalId}) reference a remote
// terminal the client would normally subscribe to via terminal/output;
// we don't advertise terminal capability so we never receive those in
// practice, but if one slips through we skip it (nothing useful to
// surface from a bare ID).
func extractACPToolCallText(blocks []json.RawMessage) string {
var b strings.Builder
appendPiece := func(piece string) {
if piece == "" {
return
}
if b.Len() > 0 {
b.WriteByte('\n')
}
b.WriteString(piece)
}
for _, raw := range blocks {
var kind struct {
Type string `json:"type"`
}
if err := json.Unmarshal(raw, &kind); err != nil {
continue
}
switch kind.Type {
case "content":
var outer struct {
Content json.RawMessage `json:"content"`
}
if err := json.Unmarshal(raw, &outer); err != nil || len(outer.Content) == 0 {
continue
}
var inner struct {
Type string `json:"type"`
Text string `json:"text"`
}
if err := json.Unmarshal(outer.Content, &inner); err != nil {
continue
}
if inner.Type != "text" {
continue
}
appendPiece(inner.Text)
case "diff":
var diff struct {
Path string `json:"path"`
OldText string `json:"oldText"`
NewText string `json:"newText"`
}
if err := json.Unmarshal(raw, &diff); err != nil || diff.Path == "" {
continue
}
// Keep it tiny — a full unified diff can be huge and we're
// really just recording "this tool wrote to this file".
// The UI can re-read the file if it needs the actual content.
var piece strings.Builder
piece.WriteString("--- ")
piece.WriteString(diff.Path)
piece.WriteString("\n+++ ")
piece.WriteString(diff.Path)
if diff.OldText == "" {
piece.WriteString("\n(new file, ")
piece.WriteString(strconv.Itoa(len(diff.NewText)))
piece.WriteString(" bytes)")
} else {
piece.WriteString("\n(edited: ")
piece.WriteString(strconv.Itoa(len(diff.OldText)))
piece.WriteString(" → ")
piece.WriteString(strconv.Itoa(len(diff.NewText)))
piece.WriteString(" bytes)")
}
appendPiece(piece.String())
default:
// terminal blocks, image blocks, unknown future types —
// ignore. We have no way to inline-render them.
}
}
return b.String()
}
func (c *hermesClient) handleUsageUpdate(data json.RawMessage) {
var msg struct {
Usage struct {
@@ -962,10 +634,7 @@ func (c *hermesClient) handleUsageUpdate(data json.RawMessage) {
// ── Helpers ──
// extractACPSessionID pulls `sessionId` out of a session/new or
// session/resume response. Shared by all ACP backends (hermes, kimi,
// and anything else that follows the standard ACP schema).
func extractACPSessionID(result json.RawMessage) string {
func extractHermesSessionID(result json.RawMessage) string {
var r struct {
SessionID string `json:"sessionId"`
}
@@ -1028,64 +697,46 @@ func hermesToolNameFromTitle(title string, kind string) string {
case "think":
return "thinking"
default:
// Preserve a non-empty title when we can't classify it: kimi
// emits bare titles like "Shell" or "Read file" without any
// `kind`, so returning an empty string here drops the tool
// name entirely before kimiToolNameFromTitle can map it.
// Hermes titles always carry a colon, so hermes never reaches
// this branch with a non-empty title.
if title != "" {
return title
}
return kind
}
}
// ── Provider-error sniffing ──
//
// ACP agents (hermes, kimi, …) all have the same failure mode:
// session/prompt reports stopReason=end_turn even when the underlying
// HTTP call to the configured LLM endpoint returned an error — the
// actionable detail only appears on stderr (e.g.
// hermes' session/prompt RPC reports stopReason=end_turn even when
// the underlying HTTP call to the configured LLM endpoint returned
// an error — the actionable detail only appears on stderr (e.g.
// `⚠️ API call failed (attempt 1/3): BadRequestError [HTTP 400]` and
// `Error: HTTP 400: Error code: 400 - {'detail': "The '...' model
// is not supported when using Codex with a ChatGPT account."}`).
// The sniffer scans for those patterns so the daemon can surface a
// real failure instead of a generic "empty output".
//
// Parameterised by provider name so both hermes and kimi can share
// the transport: the regexes match format-level signals (HTTP status,
// error-kind tags, "API call failed" banner) that both runtimes emit.
type acpProviderErrorSniffer struct {
provider string
mu sync.Mutex
remains []byte // buffer for a partial trailing line across writes
lines []string // captured error lines, bounded
seen map[string]bool
// We scan for those patterns so the daemon can surface a real
// failure instead of a generic "empty output".
type hermesProviderErrorSniffer struct {
mu sync.Mutex
remains []byte // buffer for a partial trailing line across writes
lines []string // captured error lines, bounded
seen map[string]bool
}
// acpErrorHeaderRe matches the first line of an API-error block.
// ACP agents typically prefix these with ⚠️ / ❌ and include an HTTP
// status code or a non-retryable-error tag.
var acpErrorHeaderRe = regexp.MustCompile(`(?:⚠️|❌|\[ERROR\]).*(?:BadRequestError|AuthenticationError|RateLimitError|HTTP [0-9]{3}|Non-retryable|API call failed)`)
// hermesErrorHeaderRe matches the first line of an API-error block.
// Hermes prefixes these with ⚠️ / ❌ and includes an HTTP status
// code or a non-retryable-error tag.
var hermesErrorHeaderRe = regexp.MustCompile(`(?:⚠️|❌|\[ERROR\]).*(?:BadRequestError|AuthenticationError|RateLimitError|HTTP [0-9]{3}|Non-retryable|API call failed)`)
// acpErrorDetailRe pulls the most useful single-line messages out of
// the subsequent lines of the error block (the one whose "Error:" or
// "Details:" tag actually spells out what happened).
var acpErrorDetailRe = regexp.MustCompile(`(?:Error:|detail:|Details:)\s*(.+)`)
// hermesErrorDetailRe pulls the most useful single-line messages
// out of the subsequent lines of the error block (the one whose
// "Error:" or "Details:" tag actually spells out what happened).
var hermesErrorDetailRe = regexp.MustCompile(`(?:Error:|detail:|Details:)\s*(.+)`)
const acpMaxErrorLines = 8
const hermesMaxErrorLines = 8
// newACPProviderErrorSniffer returns a sniffer that tags its messages
// with the given provider name (e.g. "hermes", "kimi") so failure
// strings make it obvious which runtime produced the error.
func newACPProviderErrorSniffer(provider string) *acpProviderErrorSniffer {
return &acpProviderErrorSniffer{provider: provider, seen: map[string]bool{}}
func newHermesProviderErrorSniffer() *hermesProviderErrorSniffer {
return &hermesProviderErrorSniffer{seen: map[string]bool{}}
}
// Write implements io.Writer so the sniffer can sit behind an
// io.MultiWriter next to the normal stderr log forwarder.
func (s *acpProviderErrorSniffer) Write(p []byte) (int, error) {
func (s *hermesProviderErrorSniffer) Write(p []byte) (int, error) {
s.mu.Lock()
defer s.mu.Unlock()
@@ -1106,7 +757,7 @@ func (s *acpProviderErrorSniffer) Write(p []byte) (int, error) {
if line == "" {
continue
}
if !(acpErrorHeaderRe.MatchString(line) || acpErrorDetailRe.MatchString(line)) {
if !(hermesErrorHeaderRe.MatchString(line) || hermesErrorDetailRe.MatchString(line)) {
continue
}
if s.seen[line] {
@@ -1114,8 +765,8 @@ func (s *acpProviderErrorSniffer) Write(p []byte) (int, error) {
}
s.seen[line] = true
s.lines = append(s.lines, line)
if len(s.lines) > acpMaxErrorLines {
s.lines = s.lines[len(s.lines)-acpMaxErrorLines:]
if len(s.lines) > hermesMaxErrorLines {
s.lines = s.lines[len(s.lines)-hermesMaxErrorLines:]
}
}
return len(p), nil
@@ -1125,22 +776,21 @@ func (s *acpProviderErrorSniffer) Write(p []byte) (int, error) {
// error field. Prefers the most specific "Error:" / "detail:"
// fragment; falls back to the first captured header line; empty
// when nothing useful was seen.
func (s *acpProviderErrorSniffer) message() string {
func (s *hermesProviderErrorSniffer) message() string {
s.mu.Lock()
defer s.mu.Unlock()
prefix := s.provider + " provider error: "
for _, line := range s.lines {
if m := acpErrorDetailRe.FindStringSubmatch(line); m != nil {
if m := hermesErrorDetailRe.FindStringSubmatch(line); m != nil {
detail := strings.TrimSpace(m[1])
if detail != "" {
return prefix + detail
return "hermes provider error: " + detail
}
}
}
for _, line := range s.lines {
if acpErrorHeaderRe.MatchString(line) {
return prefix + line
if hermesErrorHeaderRe.MatchString(line) {
return "hermes provider error: " + line
}
}
return ""

View File

@@ -2,9 +2,7 @@ package agent
import (
"encoding/json"
"log/slog"
"strings"
"sync"
"testing"
)
@@ -19,30 +17,30 @@ func TestNewReturnsHermesBackend(t *testing.T) {
}
}
// ── extractACPSessionID ──
// ── extractHermesSessionID ──
func TestExtractACPSessionID(t *testing.T) {
func TestExtractHermesSessionID(t *testing.T) {
t.Parallel()
raw := json.RawMessage(`{"sessionId":"20260410_141145_47260c"}`)
got := extractACPSessionID(raw)
got := extractHermesSessionID(raw)
if got != "20260410_141145_47260c" {
t.Errorf("got %q, want %q", got, "20260410_141145_47260c")
}
}
func TestExtractACPSessionIDEmpty(t *testing.T) {
func TestExtractHermesSessionIDEmpty(t *testing.T) {
t.Parallel()
raw := json.RawMessage(`{}`)
got := extractACPSessionID(raw)
got := extractHermesSessionID(raw)
if got != "" {
t.Errorf("got %q, want empty", got)
}
}
func TestExtractACPSessionIDInvalidJSON(t *testing.T) {
func TestExtractHermesSessionIDInvalidJSON(t *testing.T) {
t.Parallel()
raw := json.RawMessage(`not json`)
got := extractACPSessionID(raw)
got := extractHermesSessionID(raw)
if got != "" {
t.Errorf("got %q, want empty", got)
}
@@ -67,24 +65,14 @@ func TestHermesToolNameFromTitle(t *testing.T) {
{"delegate: fix the bug", "execute", "delegate_task"},
{"analyze image: what is this?", "read", "vision_analyze"},
{"execute code", "execute", "execute_code"},
// Fallback to kind when no colon in title but kind is known.
// Fallback to kind when no colon in title.
{"unknownTool", "read", "read_file"},
{"unknownTool", "edit", "write_file"},
{"unknownTool", "execute", "terminal"},
{"unknownTool", "search", "search_files"},
{"unknownTool", "fetch", "web_search"},
{"unknownTool", "think", "thinking"},
// Bare title (no colon, no known kind) — preserve the title
// itself rather than falling back to an unclassified kind.
// Matters for kimi: its ACP `tool_call` updates emit a bare
// `title: "Shell"` with no `kind`, and we need downstream
// normalisation (kimiToolNameFromTitle) to see "Shell" rather
// than an empty string.
{"Shell", "", "Shell"},
{"Read file", "", "Read file"},
{"unknownTool", "other", "unknownTool"},
// Empty title falls back to kind, even when kind isn't known.
{"", "other", "other"},
{"unknownTool", "other", "other"},
// Tool with colon but not in known map.
{"custom_tool: args", "other", "custom_tool"},
}
@@ -113,7 +101,7 @@ func TestHermesClientHandleLineResponse(t *testing.T) {
if res.err != nil {
t.Fatalf("unexpected error: %v", res.err)
}
sid := extractACPSessionID(res.result)
sid := extractHermesSessionID(res.result)
if sid != "ses_abc" {
t.Errorf("sessionId: got %q, want %q", sid, "ses_abc")
}
@@ -139,111 +127,6 @@ func TestHermesClientHandleLineError(t *testing.T) {
}
}
// ── agent → client request handling ──
// bufferWriter is a test stand-in for cmd.StdinPipe that captures
// writes in-memory so we can assert what handleAgentRequest emitted.
type bufferWriter struct {
mu sync.Mutex
buf strings.Builder
}
func (b *bufferWriter) Write(p []byte) (int, error) {
b.mu.Lock()
defer b.mu.Unlock()
return b.buf.WriteString(string(p))
}
func (b *bufferWriter) String() string {
b.mu.Lock()
defer b.mu.Unlock()
return b.buf.String()
}
// TestHermesClientAutoApprovesPermissionRequest asserts that when an
// ACP agent sends us `session/request_permission` (kimi does this on
// every Shell / file-mutating tool call), the client replies with
// `approve_for_session` — without this the agent blocks 300s and the
// task hangs. The id in the reply must match the agent's request id
// so its in-flight future resolves.
func TestHermesClientAutoApprovesPermissionRequest(t *testing.T) {
t.Parallel()
w := &bufferWriter{}
c := &hermesClient{
cfg: Config{Logger: slog.Default()},
stdin: w,
pending: make(map[int]*pendingRPC),
}
c.handleLine(`{"jsonrpc":"2.0","id":42,"method":"session/request_permission","params":{"sessionId":"ses_1","options":[{"optionId":"approve","name":"Approve once","kind":"allow_once"},{"optionId":"approve_for_session","name":"Approve for this session","kind":"allow_always"},{"optionId":"reject","name":"Reject","kind":"reject_once"}],"toolCall":{"toolCallId":"tc_1","title":"Shell","content":[]}}}`)
got := w.String()
var resp struct {
JSONRPC string `json:"jsonrpc"`
ID int `json:"id"`
Result struct {
Outcome struct {
Outcome string `json:"outcome"`
OptionID string `json:"optionId"`
} `json:"outcome"`
} `json:"result"`
}
if err := json.Unmarshal([]byte(strings.TrimSpace(got)), &resp); err != nil {
t.Fatalf("reply is not valid JSON: %q err=%v", got, err)
}
if resp.JSONRPC != "2.0" {
t.Errorf("jsonrpc: got %q, want 2.0", resp.JSONRPC)
}
if resp.ID != 42 {
t.Errorf("id: got %d, want 42 (must echo agent's request id)", resp.ID)
}
if resp.Result.Outcome.Outcome != "selected" {
t.Errorf("outcome.outcome: got %q, want %q", resp.Result.Outcome.Outcome, "selected")
}
if resp.Result.Outcome.OptionID != "approve_for_session" {
t.Errorf("outcome.optionId: got %q, want %q", resp.Result.Outcome.OptionID, "approve_for_session")
}
}
// TestHermesClientReplesMethodNotFoundForUnknownAgentRequest ensures
// that any agent → client request we don't explicitly handle gets a
// proper JSON-RPC error back, not silence. Silence would block the
// agent for however long its internal timeout is, same as the
// session/request_permission hang this change fixes.
func TestHermesClientReplesMethodNotFoundForUnknownAgentRequest(t *testing.T) {
t.Parallel()
w := &bufferWriter{}
c := &hermesClient{
cfg: Config{Logger: slog.Default()},
stdin: w,
pending: make(map[int]*pendingRPC),
}
c.handleLine(`{"jsonrpc":"2.0","id":7,"method":"fs/read_text_file","params":{"path":"/tmp/x"}}`)
got := w.String()
var resp struct {
ID int `json:"id"`
Error struct {
Code int `json:"code"`
Message string `json:"message"`
} `json:"error"`
}
if err := json.Unmarshal([]byte(strings.TrimSpace(got)), &resp); err != nil {
t.Fatalf("reply not valid JSON: %q err=%v", got, err)
}
if resp.ID != 7 {
t.Errorf("id echo: got %d, want 7", resp.ID)
}
if resp.Error.Code != -32601 {
t.Errorf("error code: got %d, want -32601 (method not found)", resp.Error.Code)
}
if !strings.Contains(resp.Error.Message, "fs/read_text_file") {
t.Errorf("error message should name the unhandled method, got %q", resp.Error.Message)
}
}
// ── session/update notification handling ──
func TestHermesClientHandleAgentMessage(t *testing.T) {
@@ -343,211 +226,6 @@ func TestHermesClientHandleToolCallComplete(t *testing.T) {
}
}
// TestHermesClientKimiStreamingToolCall walks the real kimi frame
// sequence for a single Shell call:
// 1. tool_call with empty content (LLM hasn't started emitting args yet)
// 2. tool_call_update status=in_progress carrying the cumulative args
// JSON character-by-character ("{", "{\"command", …)
// 3. tool_call_update status=completed carrying the command's stdout
//
// The client must defer MessageToolUse until we have the full args so
// the UI doesn't show a command like `{"comma` — and the MessageToolUse
// must carry the parsed args as the Input map (`{"command": "echo hi"}`
// → Input["command"] = "echo hi") rather than a raw string.
func TestHermesClientKimiStreamingToolCall(t *testing.T) {
t.Parallel()
var got []Message
c := &hermesClient{
pending: make(map[int]*pendingRPC),
onMessage: func(msg Message) {
got = append(got, msg)
},
}
// 1. tool_call: empty content (classic kimi start frame).
c.handleLine(`{"jsonrpc":"2.0","method":"session/update","params":{"sessionId":"ses_1","update":{"sessionUpdate":"tool_call","toolCallId":"tc-kimi-1","title":"Shell","status":"in_progress","content":[{"type":"content","content":{"type":"text","text":""}}]}}}`)
if len(got) != 0 {
t.Fatalf("expected nothing emitted yet (args empty), got %+v", got)
}
// 2. Streaming updates — cumulative args JSON.
partials := []string{
`{"`,
`{"command`,
`{"command":`,
`{"command":"echo `,
`{"command":"echo hi"}`,
}
for _, args := range partials {
// JSON-encode args so embedded quotes are escaped properly.
argsJSON, _ := json.Marshal(args)
line := `{"jsonrpc":"2.0","method":"session/update","params":{"sessionId":"ses_1","update":{"sessionUpdate":"tool_call_update","toolCallId":"tc-kimi-1","status":"in_progress","content":[{"type":"content","content":{"type":"text","text":` + string(argsJSON) + `}}]}}}`
c.handleLine(line)
}
if len(got) != 0 {
t.Fatalf("expected nothing emitted mid-stream, got %+v", got)
}
// 3. Completed — stdout.
c.handleLine(`{"jsonrpc":"2.0","method":"session/update","params":{"sessionId":"ses_1","update":{"sessionUpdate":"tool_call_update","toolCallId":"tc-kimi-1","status":"completed","content":[{"type":"content","content":{"type":"text","text":"hi\n"}}]}}}`)
if len(got) != 2 {
t.Fatalf("expected [MessageToolUse, MessageToolResult], got %d: %+v", len(got), got)
}
if got[0].Type != MessageToolUse {
t.Errorf("first message: got %v, want MessageToolUse", got[0].Type)
}
if got[0].CallID != "tc-kimi-1" {
t.Errorf("first.callID: got %q", got[0].CallID)
}
if cmd, _ := got[0].Input["command"].(string); cmd != "echo hi" {
t.Errorf("first.Input.command: got %v, want %q", got[0].Input["command"], "echo hi")
}
if got[1].Type != MessageToolResult {
t.Errorf("second message: got %v, want MessageToolResult", got[1].Type)
}
if got[1].Output != "hi\n" {
t.Errorf("second.output: got %q, want %q", got[1].Output, "hi\n")
}
}
// TestHermesClientKimiMalformedArgsFallback: if the accumulated args
// aren't valid JSON (streaming glitch, tool with non-JSON args), we
// still surface the text under Input.text rather than silently
// dropping it.
func TestHermesClientKimiMalformedArgsFallback(t *testing.T) {
t.Parallel()
var got []Message
c := &hermesClient{
pending: make(map[int]*pendingRPC),
onMessage: func(msg Message) {
got = append(got, msg)
},
}
c.handleLine(`{"jsonrpc":"2.0","method":"session/update","params":{"sessionId":"ses_1","update":{"sessionUpdate":"tool_call","toolCallId":"tc","title":"Shell","status":"in_progress","content":[{"type":"content","content":{"type":"text","text":"not-json"}}]}}}`)
c.handleLine(`{"jsonrpc":"2.0","method":"session/update","params":{"sessionId":"ses_1","update":{"sessionUpdate":"tool_call_update","toolCallId":"tc","status":"completed","content":[{"type":"content","content":{"type":"text","text":"output"}}]}}}`)
if len(got) < 1 {
t.Fatalf("expected ToolUse+ToolResult, got %+v", got)
}
if text, _ := got[0].Input["text"].(string); text != "not-json" {
t.Errorf("fallback Input.text: got %v", got[0].Input["text"])
}
}
// TestHermesClientHandleToolCallCompleteOrphan: if a completion frame
// arrives without a preceding tool_call (out-of-order / missed frame),
// still emit ToolUse synthesised from the update's own title/rawInput
// before ToolResult. Keeps the UI from showing a bare result with no
// header.
func TestHermesClientHandleToolCallCompleteOrphan(t *testing.T) {
t.Parallel()
var got []Message
c := &hermesClient{
pending: make(map[int]*pendingRPC),
onMessage: func(msg Message) {
got = append(got, msg)
},
}
c.handleLine(`{"jsonrpc":"2.0","method":"session/update","params":{"sessionId":"ses_1","update":{"sessionUpdate":"tool_call_update","toolCallId":"tc","status":"completed","title":"terminal: ls","kind":"execute","rawInput":{"command":"ls"},"content":[{"type":"content","content":{"type":"text","text":"file.go\n"}}]}}}`)
if len(got) != 2 || got[0].Type != MessageToolUse || got[1].Type != MessageToolResult {
t.Fatalf("expected [ToolUse, ToolResult], got %+v", got)
}
if got[0].Tool != "terminal" {
t.Errorf("orphan ToolUse tool: got %q", got[0].Tool)
}
if cmd, _ := got[0].Input["command"].(string); cmd != "ls" {
t.Errorf("orphan ToolUse input.command: got %v", got[0].Input["command"])
}
if got[1].Output != "file.go\n" {
t.Errorf("ToolResult output: got %q", got[1].Output)
}
}
// TestHermesClientHandleToolCallRawOutputTakesPrecedence keeps hermes
// behaviour unchanged: when the update has both `rawOutput` (hermes
// convention) and `content` (would be ambiguous), honour rawOutput.
func TestHermesClientHandleToolCallRawOutputTakesPrecedence(t *testing.T) {
t.Parallel()
var got Message
c := &hermesClient{
pending: make(map[int]*pendingRPC),
onMessage: func(msg Message) {
got = msg
},
}
line := `{"jsonrpc":"2.0","method":"session/update","params":{"sessionId":"ses_1","update":{"sessionUpdate":"tool_call_update","toolCallId":"tc","status":"completed","rawOutput":"raw wins","content":[{"type":"content","content":{"type":"text","text":"ignored"}}]}}}`
c.handleLine(line)
if got.Output != "raw wins" {
t.Errorf("output: got %q, want %q", got.Output, "raw wins")
}
}
func TestExtractACPToolCallText(t *testing.T) {
t.Parallel()
tests := []struct {
name string
json string
want string
}{
{
name: "single text block",
json: `[{"type":"content","content":{"type":"text","text":"hello"}}]`,
want: "hello",
},
{
name: "multiple text blocks join with newline",
json: `[{"type":"content","content":{"type":"text","text":"a"}},{"type":"content","content":{"type":"text","text":"b"}}]`,
want: "a\nb",
},
{
name: "terminal blocks skipped",
json: `[{"type":"terminal","terminalId":"t1"},{"type":"content","content":{"type":"text","text":"shell out"}}]`,
want: "shell out",
},
{
name: "diff block renders as mini header",
json: `[{"type":"diff","path":"foo.go","oldText":"abc","newText":"abcdef"}]`,
want: "--- foo.go\n+++ foo.go\n(edited: 3 → 6 bytes)",
},
{
name: "new-file diff (no oldText)",
json: `[{"type":"diff","path":"new.go","oldText":"","newText":"hi"}]`,
want: "--- new.go\n+++ new.go\n(new file, 2 bytes)",
},
{
name: "empty array returns empty",
json: `[]`,
want: "",
},
{
name: "no text content",
json: `[{"type":"terminal","terminalId":"t1"}]`,
want: "",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
var blocks []json.RawMessage
if err := json.Unmarshal([]byte(tt.json), &blocks); err != nil {
t.Fatalf("unmarshal: %v", err)
}
if got := extractACPToolCallText(blocks); got != tt.want {
t.Errorf("got %q, want %q", got, tt.want)
}
})
}
}
func TestHermesClientHandleToolCallInProgressIgnored(t *testing.T) {
t.Parallel()
@@ -706,7 +384,7 @@ func TestHermesProviderErrorSniffer(t *testing.T) {
// LLM endpoint rejects the requested model. We verify the
// sniffer extracts the `Error: ...` line so the task error
// tells the user *why* it failed.
s := newACPProviderErrorSniffer("hermes")
s := newHermesProviderErrorSniffer()
lines := []string{
"2026-04-20 23:41:47 [INFO] acp_adapter.server: Prompt on session abc",
`⚠️ API call failed (attempt 1/3): BadRequestError [HTTP 400]`,
@@ -731,7 +409,7 @@ func TestHermesProviderErrorSniffer(t *testing.T) {
func TestHermesProviderErrorSnifferIgnoresInfoLines(t *testing.T) {
t.Parallel()
s := newACPProviderErrorSniffer("hermes")
s := newHermesProviderErrorSniffer()
s.Write([]byte("2026-04-20 23:41:45 [INFO] acp_adapter.entry: Loaded env\n"))
s.Write([]byte("2026-04-20 23:41:47 [INFO] agent.auxiliary_client: Vision auto-detect...\n"))
if msg := s.message(); msg != "" {
@@ -744,7 +422,7 @@ func TestHermesProviderErrorSnifferHandlesPartialLines(t *testing.T) {
// Writer may be called mid-line; the sniffer must buffer until
// it sees a newline so the regex doesn't miss the header.
s := newACPProviderErrorSniffer("hermes")
s := newHermesProviderErrorSniffer()
s.Write([]byte(`⚠️ API call failed (attempt 1/3):`))
s.Write([]byte(` BadRequestError [HTTP 400]` + "\n"))
s.Write([]byte(` 📝 Error: something went wrong` + "\n"))
@@ -757,12 +435,12 @@ func TestHermesProviderErrorSnifferHandlesPartialLines(t *testing.T) {
func TestHermesProviderErrorSnifferBoundedBuffer(t *testing.T) {
t.Parallel()
s := newACPProviderErrorSniffer("hermes")
s := newHermesProviderErrorSniffer()
for i := 0; i < 20; i++ {
// Each line differs so dedup doesn't merge them.
s.Write([]byte(`⚠️ API call failed (HTTP 400) attempt ` + string(rune('a'+i%26)) + `: Non-retryable error` + "\n"))
}
if len(s.lines) > acpMaxErrorLines {
t.Errorf("sniffer kept %d lines, limit is %d", len(s.lines), acpMaxErrorLines)
if len(s.lines) > hermesMaxErrorLines {
t.Errorf("sniffer kept %d lines, limit is %d", len(s.lines), hermesMaxErrorLines)
}
}

View File

@@ -1,382 +0,0 @@
package agent
import (
"bufio"
"context"
"fmt"
"io"
"os/exec"
"strings"
"sync"
"time"
)
// kimiBlockedArgs are flags hardcoded by the daemon that must not be
// overridden by user-configured custom_args. `acp` is the protocol
// subcommand that drives the ACP JSON-RPC transport for Kimi Code CLI;
// overriding it would break the daemon↔Kimi communication contract.
var kimiBlockedArgs = map[string]blockedArgMode{
"acp": blockedStandalone,
}
// kimiBackend implements Backend by spawning `kimi acp` and communicating
// via the ACP (Agent Client Protocol) JSON-RPC 2.0 over stdin/stdout.
//
// Kimi Code CLI (https://github.com/MoonshotAI/kimi-cli) supports ACP out of
// the box via the `kimi acp` subcommand. We reuse the existing hermesClient
// ACP transport since both runtimes speak the same protocol — only the
// binary, env, and tool-name extraction differ.
type kimiBackend struct {
cfg Config
}
func (b *kimiBackend) Execute(ctx context.Context, prompt string, opts ExecOptions) (*Session, error) {
execPath := b.cfg.ExecutablePath
if execPath == "" {
execPath = "kimi"
}
if _, err := exec.LookPath(execPath); err != nil {
return nil, fmt.Errorf("kimi executable not found at %q: %w", execPath, err)
}
timeout := opts.Timeout
if timeout == 0 {
timeout = 20 * time.Minute
}
runCtx, cancel := context.WithTimeout(ctx, timeout)
// `kimi acp` ignores --yolo / --auto-approve (they're flags on the
// root `kimi` command, not on the `acp` subcommand). Instead, the
// daemon auto-approves in hermesClient.handleAgentRequest by replying
// "approve_for_session" to every session/request_permission request.
kimiArgs := append([]string{"acp"}, filterCustomArgs(opts.CustomArgs, kimiBlockedArgs, b.cfg.Logger)...)
cmd := exec.CommandContext(runCtx, execPath, kimiArgs...)
b.cfg.Logger.Debug("agent command", "exec", execPath, "args", kimiArgs)
if opts.Cwd != "" {
cmd.Dir = opts.Cwd
}
cmd.Env = buildEnv(b.cfg.Env)
stdout, err := cmd.StdoutPipe()
if err != nil {
cancel()
return nil, fmt.Errorf("kimi stdout pipe: %w", err)
}
stdin, err := cmd.StdinPipe()
if err != nil {
cancel()
return nil, fmt.Errorf("kimi stdin pipe: %w", err)
}
// Forward stderr to the daemon log *and* sniff provider-level
// errors out of it so we can surface them in the task result.
// Kimi's session/prompt still reports stopReason=end_turn when
// the underlying HTTP call to api.kimi.com returns 4xx/5xx, so
// without this the daemon reports a misleading "empty output"
// and the actionable error (expired token, rate limit, upstream
// 5xx, …) stays buried in the daemon log.
providerErr := newACPProviderErrorSniffer("kimi")
cmd.Stderr = io.MultiWriter(newLogWriter(b.cfg.Logger, "[kimi:stderr] "), providerErr)
if err := cmd.Start(); err != nil {
cancel()
return nil, fmt.Errorf("start kimi: %w", err)
}
b.cfg.Logger.Info("kimi acp started", "pid", cmd.Process.Pid, "cwd", opts.Cwd)
msgCh := make(chan Message, 256)
resCh := make(chan Result, 1)
var outputMu sync.Mutex
var output strings.Builder
promptDone := make(chan hermesPromptResult, 1)
// Reuse the hermesClient ACP transport — Kimi speaks the same protocol.
c := &hermesClient{
cfg: b.cfg,
stdin: stdin,
pending: make(map[int]*pendingRPC),
pendingTools: make(map[string]*pendingToolCall),
onMessage: func(msg Message) {
// hermesClient.handleToolCallStart has already mapped
// the raw ACP title via hermesToolNameFromTitle — which
// covers lowercase hermes-style titles ("read:", "patch
// (replace)", …) but not capitalised kimi-style ones
// ("Read file: …", "Run command: …"). Re-normalise so
// the UI sees consistent snake_case identifiers across
// both backends. No-op when the name is already normal
// form (e.g. already mapped to "read_file").
if msg.Type == MessageToolUse {
msg.Tool = kimiToolNameFromTitle(msg.Tool)
}
if msg.Type == MessageText {
outputMu.Lock()
output.WriteString(msg.Content)
outputMu.Unlock()
}
trySend(msgCh, msg)
},
onPromptDone: func(result hermesPromptResult) {
select {
case promptDone <- result:
default:
}
},
}
// Start reading stdout in background.
readerDone := make(chan struct{})
go func() {
defer close(readerDone)
scanner := bufio.NewScanner(stdout)
scanner.Buffer(make([]byte, 0, 1024*1024), 10*1024*1024)
for scanner.Scan() {
line := strings.TrimSpace(scanner.Text())
if line == "" {
continue
}
c.handleLine(line)
}
c.closeAllPending(fmt.Errorf("kimi process exited"))
}()
// Drive the ACP session lifecycle in a goroutine.
go func() {
defer cancel()
defer close(msgCh)
defer close(resCh)
defer func() {
stdin.Close()
_ = cmd.Wait()
}()
startTime := time.Now()
finalStatus := "completed"
var finalError string
var sessionID string
// 1. Initialize handshake.
_, err := c.request(runCtx, "initialize", map[string]any{
"protocolVersion": 1,
"clientInfo": map[string]any{
"name": "multica-agent-sdk",
"version": "0.2.0",
},
"clientCapabilities": map[string]any{},
})
if err != nil {
finalStatus = "failed"
finalError = fmt.Sprintf("kimi initialize failed: %v", err)
resCh <- Result{Status: finalStatus, Error: finalError, DurationMs: time.Since(startTime).Milliseconds()}
return
}
// 2. Create or resume a session.
cwd := opts.Cwd
if cwd == "" {
cwd = "."
}
if opts.ResumeSessionID != "" {
result, err := c.request(runCtx, "session/resume", map[string]any{
"cwd": cwd,
"sessionId": opts.ResumeSessionID,
})
if err != nil {
finalStatus = "failed"
finalError = fmt.Sprintf("kimi session/resume failed: %v", err)
resCh <- Result{Status: finalStatus, Error: finalError, DurationMs: time.Since(startTime).Milliseconds()}
return
}
sessionID = opts.ResumeSessionID
_ = result
} else {
result, err := c.request(runCtx, "session/new", map[string]any{
"cwd": cwd,
"mcpServers": []any{},
})
if err != nil {
finalStatus = "failed"
finalError = fmt.Sprintf("kimi session/new failed: %v", err)
resCh <- Result{Status: finalStatus, Error: finalError, DurationMs: time.Since(startTime).Milliseconds()}
return
}
sessionID = extractACPSessionID(result)
if sessionID == "" {
finalStatus = "failed"
finalError = "kimi session/new returned no session ID"
resCh <- Result{Status: finalStatus, Error: finalError, DurationMs: time.Since(startTime).Milliseconds()}
return
}
}
c.sessionID = sessionID
b.cfg.Logger.Info("kimi session created", "session_id", sessionID)
// 3. If the caller picked a model (via agent.model from the
// UI dropdown), ask kimi to switch the session to it before
// we send any prompt. Kimi's ACP server exposes
// `session/set_model` and advertises available models via
// the `models.availableModels` block returned by
// `session/new` — we pass the chosen modelId through
// verbatim. This MUST fail the task on error: silently
// falling back to kimi's default model would let the user
// believe their pick was honoured while the task actually
// ran on something else.
if opts.Model != "" {
if _, err := c.request(runCtx, "session/set_model", map[string]any{
"sessionId": sessionID,
"modelId": opts.Model,
}); err != nil {
b.cfg.Logger.Warn("kimi set_session_model failed", "error", err, "requested_model", opts.Model)
finalStatus = "failed"
finalError = fmt.Sprintf("kimi could not switch to model %q: %v", opts.Model, err)
resCh <- Result{
Status: finalStatus,
Error: finalError,
DurationMs: time.Since(startTime).Milliseconds(),
SessionID: sessionID,
}
return
}
b.cfg.Logger.Info("kimi session model set", "model", opts.Model)
}
// 4. Build the prompt content. If we have a system prompt, prepend it.
userText := prompt
if opts.SystemPrompt != "" {
userText = opts.SystemPrompt + "\n\n---\n\n" + prompt
}
// 5. Send the prompt and wait for PromptResponse.
_, err = c.request(runCtx, "session/prompt", map[string]any{
"sessionId": sessionID,
"prompt": []map[string]any{
{"type": "text", "text": userText},
},
})
if err != nil {
if runCtx.Err() == context.DeadlineExceeded {
finalStatus = "timeout"
finalError = fmt.Sprintf("kimi timed out after %s", timeout)
} else if runCtx.Err() == context.Canceled {
finalStatus = "aborted"
finalError = "execution cancelled"
} else {
finalStatus = "failed"
finalError = fmt.Sprintf("kimi session/prompt failed: %v", err)
}
} else {
select {
case pr := <-promptDone:
if pr.stopReason == "cancelled" {
finalStatus = "aborted"
finalError = "kimi cancelled the prompt"
}
c.usageMu.Lock()
c.usage.InputTokens += pr.usage.InputTokens
c.usage.OutputTokens += pr.usage.OutputTokens
c.usageMu.Unlock()
default:
}
}
duration := time.Since(startTime)
b.cfg.Logger.Info("kimi finished", "pid", cmd.Process.Pid, "status", finalStatus, "duration", duration.Round(time.Millisecond).String())
stdin.Close()
cancel()
<-readerDone
outputMu.Lock()
finalOutput := output.String()
outputMu.Unlock()
// If kimi produced no visible output but we sniffed a
// provider-level error on stderr (typically HTTP 4xx from
// api.kimi.com — token expired, rate-limited, upstream
// 5xx, …), promote the status to failed and surface the
// real reason. Without this the daemon reports a cryptic
// "completed + empty output" and the actionable error
// stays buried in daemon logs.
if finalStatus == "completed" && finalOutput == "" {
if msg := providerErr.message(); msg != "" {
finalStatus = "failed"
finalError = msg
}
}
c.usageMu.Lock()
u := c.usage
c.usageMu.Unlock()
var usageMap map[string]TokenUsage
if u.InputTokens > 0 || u.OutputTokens > 0 || u.CacheReadTokens > 0 {
model := opts.Model
if model == "" {
model = "unknown"
}
usageMap = map[string]TokenUsage{model: u}
}
resCh <- Result{
Status: finalStatus,
Output: finalOutput,
Error: finalError,
DurationMs: duration.Milliseconds(),
SessionID: sessionID,
Usage: usageMap,
}
}()
return &Session{Messages: msgCh, Result: resCh}, nil
}
// kimiToolNameFromTitle normalises tool names emitted by Kimi's ACP
// server into the snake_case identifiers the Multica UI expects.
//
// Kimi follows the ACP spec where `title` is a short human-readable
// label such as "Read file: /path/to/foo.go" or "Run command: ls".
// hermesToolNameFromTitle upstream handles hermes' lowercase
// convention ("read:", "patch (replace)") but not kimi's capitalised
// format — so we get called on the already-mapped name from hermes
// and fix up anything that slipped through. Empty input returns "".
func kimiToolNameFromTitle(title string) string {
t := strings.TrimSpace(title)
if t == "" {
return ""
}
// Strip everything after the first colon — ACP titles often look like
// "Tool Name: argument detail" and we want only the tool name.
if idx := strings.Index(t, ":"); idx > 0 {
t = strings.TrimSpace(t[:idx])
}
lower := strings.ToLower(t)
switch lower {
case "read", "read file":
return "read_file"
case "write", "write file":
return "write_file"
case "edit", "patch":
return "edit_file"
case "shell", "bash", "terminal", "run command", "run shell command":
return "terminal"
case "search", "grep", "find":
return "search_files"
case "glob":
return "glob"
case "web search":
return "web_search"
case "fetch", "web fetch":
return "web_fetch"
case "todo", "todo write":
return "todo_write"
}
// Fallback: snake_case the title so the UI gets a stable identifier.
return strings.ReplaceAll(lower, " ", "_")
}

View File

@@ -1,215 +0,0 @@
package agent
import (
"context"
"log/slog"
"os"
"path/filepath"
"strings"
"testing"
"time"
)
func TestNewReturnsKimiBackend(t *testing.T) {
t.Parallel()
b, err := New("kimi", Config{ExecutablePath: "/nonexistent/kimi"})
if err != nil {
t.Fatalf("New(kimi) error: %v", err)
}
if _, ok := b.(*kimiBackend); !ok {
t.Fatalf("expected *kimiBackend, got %T", b)
}
}
func TestKimiToolNameFromTitle(t *testing.T) {
t.Parallel()
tests := []struct {
title string
want string
}{
{"Read file: /tmp/foo.go", "read_file"},
{"read", "read_file"},
{"Write: /tmp/bar.go", "write_file"},
{"Edit", "edit_file"},
{"Patch: /tmp/x", "edit_file"},
{"Shell: ls -la", "terminal"},
{"Bash", "terminal"},
{"Run command: pwd", "terminal"},
{"Search: foo", "search_files"},
{"Glob: *.go", "glob"},
{"Web search: golang acp", "web_search"},
{"Fetch: https://example.com", "web_fetch"},
{"Todo Write", "todo_write"},
// Fallback: snake_case the title.
{"Custom Thing", "custom_thing"},
// Empty input returns empty — caller decides how to react.
{"", ""},
}
for _, tt := range tests {
got := kimiToolNameFromTitle(tt.title)
if got != tt.want {
t.Errorf("kimiToolNameFromTitle(%q) = %q, want %q", tt.title, got, tt.want)
}
}
}
// fakeKimiACPScript returns a POSIX-sh script that impersonates
// `kimi acp` for a single short ACP session: it acks initialize /
// session/new and then replies to session/set_model with a JSON-RPC
// error — the scenario the kimiBackend must propagate as a failed
// task rather than silently falling back to the default model.
func fakeKimiACPScript() string {
return `#!/bin/sh
# Fake ` + "`kimi`" + ` binary — used by TestKimiBackendSetModelFailureFailsTask
# and TestKimiBackendPassesYoloFlag.
#
# Writes the full argv (one arg per line) to $KIMI_ARGS_FILE if that env
# var is set, so tests can assert that the daemon invokes us with the
# right flags (`+"`--yolo acp`"+`, not bare `+"`acp`"+`).
#
# Then reads one JSON-RPC request per line from stdin, matches on the
# method name, and writes back a canned response. Exits after set_model
# so the kimiBackend cleanup path can run.
if [ -n "$KIMI_ARGS_FILE" ]; then
for arg in "$@"; do
printf '%s\n' "$arg" >> "$KIMI_ARGS_FILE"
done
fi
while IFS= read -r line; do
id=$(printf '%s' "$line" | sed -n 's/.*"id":\([0-9]*\).*/\1/p')
case "$line" in
*'"method":"initialize"'*)
printf '{"jsonrpc":"2.0","id":%s,"result":{"protocolVersion":1,"agentCapabilities":{}}}\n' "$id"
;;
*'"method":"session/new"'*)
printf '{"jsonrpc":"2.0","id":%s,"result":{"sessionId":"ses_fake"}}\n' "$id"
;;
*'"method":"session/set_model"'*)
printf '{"jsonrpc":"2.0","id":%s,"error":{"code":-32602,"message":"model not available: bogus-model"}}\n' "$id"
exit 0
;;
esac
done
`
}
// TestKimiBackendSetModelFailureFailsTask pins the "don't silently
// fall back" behaviour that landed in this PR: when kimi rejects the
// caller-selected model via session/set_model, the task result must
// report status=failed with a message that names the model and the
// upstream error — not claim success while actually running on the
// default model.
func TestKimiBackendSetModelFailureFailsTask(t *testing.T) {
t.Parallel()
fakePath := filepath.Join(t.TempDir(), "kimi")
if err := os.WriteFile(fakePath, []byte(fakeKimiACPScript()), 0o755); err != nil {
t.Fatalf("write fake kimi: %v", err)
}
backend, err := New("kimi", Config{ExecutablePath: fakePath, Logger: slog.Default()})
if err != nil {
t.Fatalf("new kimi backend: %v", err)
}
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
session, err := backend.Execute(ctx, "prompt-ignored", ExecOptions{
Model: "bogus-model",
Timeout: 5 * time.Second,
})
if err != nil {
t.Fatalf("execute: %v", err)
}
// Drain message stream so the lifecycle goroutine can progress.
go func() {
for range session.Messages {
}
}()
select {
case result, ok := <-session.Result:
if !ok {
t.Fatal("result channel closed without a value")
}
if result.Status != "failed" {
t.Fatalf("expected status=failed, got %q (error=%q)", result.Status, result.Error)
}
if !strings.Contains(result.Error, `could not switch to model "bogus-model"`) {
t.Errorf("expected error to name the requested model, got %q", result.Error)
}
if !strings.Contains(result.Error, "model not available") {
t.Errorf("expected error to surface upstream message, got %q", result.Error)
}
if result.SessionID != "ses_fake" {
t.Errorf("expected session id to be preserved on failure, got %q", result.SessionID)
}
case <-time.After(10 * time.Second):
t.Fatal("timeout waiting for result")
}
}
// TestKimiBackendInvokesACPSubcommand pins the argv for `kimi`. An
// earlier fix tried passing `--yolo` to bypass per-tool approval
// prompts, but the `acp` subcommand in kimi-cli takes no options
// (see cli/__init__.py @cli.command def acp()), so `--yolo` was a
// no-op and the daemon still hung for 5 min on the first Shell call.
// The actual bypass is in hermesClient.handleAgentRequest, which
// auto-approves session/request_permission. This test catches
// accidental re-introduction of the dead flag.
func TestKimiBackendInvokesACPSubcommand(t *testing.T) {
t.Parallel()
tempDir := t.TempDir()
argsFile := filepath.Join(tempDir, "argv.txt")
fakePath := filepath.Join(tempDir, "kimi")
if err := os.WriteFile(fakePath, []byte(fakeKimiACPScript()), 0o755); err != nil {
t.Fatalf("write fake kimi: %v", err)
}
backend, err := New("kimi", Config{
ExecutablePath: fakePath,
Logger: slog.Default(),
Env: map[string]string{"KIMI_ARGS_FILE": argsFile},
})
if err != nil {
t.Fatalf("new kimi backend: %v", err)
}
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
// Set Model so the fake binary exits on set_model and we don't
// have to wait for the prompt branch. We only care about argv here.
session, err := backend.Execute(ctx, "prompt-ignored", ExecOptions{
Model: "bogus-model",
Timeout: 5 * time.Second,
})
if err != nil {
t.Fatalf("execute: %v", err)
}
go func() {
for range session.Messages {
}
}()
<-session.Result
raw, err := os.ReadFile(argsFile)
if err != nil {
t.Fatalf("read args file: %v", err)
}
lines := strings.Split(strings.TrimSpace(string(raw)), "\n")
if len(lines) < 1 {
t.Fatalf("expected at least 1 arg (acp), got %d: %q", len(lines), lines)
}
if lines[0] != "acp" {
t.Errorf("expected first arg to be acp, got %q (full: %q)", lines[0], lines)
}
for _, l := range lines {
switch l {
case "--yolo", "--auto-approve", "--yes", "-y":
t.Errorf("kimi acp doesn't accept %q; auto-approval is handled in hermesClient.handleAgentRequest", l)
}
}
}

View File

@@ -71,10 +71,6 @@ func ListModels(ctx context.Context, providerType, executablePath string) ([]Mod
return cachedDiscovery(providerType, func() ([]Model, error) {
return discoverHermesModels(ctx, executablePath)
})
case "kimi":
return cachedDiscovery(providerType, func() ([]Model, error) {
return discoverKimiModels(ctx, executablePath)
})
case "opencode":
return cachedDiscovery(providerType, func() ([]Model, error) {
return discoverOpenCodeModels(ctx, executablePath)
@@ -321,52 +317,8 @@ func parsePiModels(output string) []Model {
// error) all return an empty list so the UI falls back to the
// creatable manual-entry input instead of blocking the form.
func discoverHermesModels(ctx context.Context, executablePath string) ([]Model, error) {
return discoverACPModels(ctx, executablePath, acpDiscoveryProvider{
defaultBin: "hermes",
clientName: "multica-model-discovery",
extraEnv: []string{"HERMES_YOLO_MODE=1"},
tmpdirPrefix: "multica-hermes-discovery-",
})
}
// discoverKimiModels spins up a throwaway `kimi acp` process and
// drives the same minimal ACP handshake as Hermes to surface the
// model catalog advertised by Kimi's `session/new` response. Kimi's
// ACPServer.new_session returns a `models` block of the same shape
// (`availableModels`/`currentModelId`) so the parsing path is shared.
//
// Failure modes (kimi missing, not logged in, config error) all
// return an empty list so the UI falls back to manual entry.
func discoverKimiModels(ctx context.Context, executablePath string) ([]Model, error) {
return discoverACPModels(ctx, executablePath, acpDiscoveryProvider{
defaultBin: "kimi",
clientName: "multica-model-discovery",
tmpdirPrefix: "multica-kimi-discovery-",
})
}
// acpDiscoveryProvider configures how discoverACPModels launches an
// ACP-speaking agent CLI. The shared helper drives every CLI in
// the same way (initialize → session/new → parse models block) — the
// per-provider differences are which binary to spawn, which env
// vars suppress interactive prompts during init, and what to label
// temporary work directories so they're easy to identify in logs.
type acpDiscoveryProvider struct {
defaultBin string
clientName string
extraEnv []string
tmpdirPrefix string
}
// discoverACPModels runs the ACP handshake for any agent CLI that
// implements the standard `initialize` + `session/new` flow and
// advertises its model catalog in the response under
// `models.availableModels` / `models.currentModelId`. This covers
// Hermes and Kimi today; future ACP backends can plug in by adding
// an acpDiscoveryProvider entry instead of duplicating the loop.
func discoverACPModels(ctx context.Context, executablePath string, p acpDiscoveryProvider) ([]Model, error) {
if executablePath == "" {
executablePath = p.defaultBin
executablePath = "hermes"
}
if _, err := exec.LookPath(executablePath); err != nil {
return []Model{}, nil
@@ -375,9 +327,8 @@ func discoverACPModels(ctx context.Context, executablePath string, p acpDiscover
defer cancel()
cmd := exec.CommandContext(runCtx, executablePath, "acp")
if len(p.extraEnv) > 0 {
cmd.Env = append(os.Environ(), p.extraEnv...)
}
// Mirror the real backend's auto-approve so init doesn't prompt.
cmd.Env = append(os.Environ(), "HERMES_YOLO_MODE=1")
stdin, err := cmd.StdinPipe()
if err != nil {
return []Model{}, nil
@@ -419,16 +370,16 @@ func discoverACPModels(ctx context.Context, executablePath string, p acpDiscover
// Send initialize + session/new.
if err := writeACP(1, "initialize", map[string]any{
"protocolVersion": 1,
"clientInfo": map[string]any{"name": p.clientName, "version": "0.1.0"},
"clientInfo": map[string]any{"name": "multica-model-discovery", "version": "0.1.0"},
"clientCapabilities": map[string]any{},
}); err != nil {
return []Model{}, nil
}
// session/new requires a valid cwd — use a temp directory we
// clean up afterwards, not the daemon's workdir (which might
// be in the middle of another task's worktree).
tmp, err := os.MkdirTemp("", p.tmpdirPrefix)
// Hermes requires a valid cwd for session/new — use a temp
// directory we clean up afterwards, not the daemon's workdir
// (which might be in the middle of another task's worktree).
tmp, err := os.MkdirTemp("", "multica-hermes-discovery-")
if err != nil {
return []Model{}, nil
}
@@ -463,7 +414,7 @@ func discoverACPModels(ctx context.Context, executablePath string, p acpDiscover
if env.ID.String() != "2" || len(env.Result) == 0 {
continue
}
done <- parseACPSessionNewModels(env.Result)
done <- parseHermesSessionNewModels(env.Result)
return
}
}()
@@ -481,15 +432,14 @@ func discoverACPModels(ctx context.Context, executablePath string, p acpDiscover
}
}
// parseACPSessionNewModels extracts the model catalog from an ACP
// `session/new` response. Both Hermes and Kimi (and any other ACP
// agent that follows the standard schema) emit:
// parseHermesSessionNewModels extracts the model catalog from a
// hermes `session/new` response. Hermes' ACP schema emits:
//
// {
// "sessionId": "...",
// "models": {
// "availableModels": [
// {"modelId": "...", "name": "...", "description": "..."}
// {"modelId": "...", "name": "...", "description": "... current"}
// ],
// "currentModelId": "..."
// }
@@ -498,7 +448,7 @@ func discoverACPModels(ctx context.Context, executablePath string, p acpDiscover
// Returns nil (not an empty slice) when the payload is missing so
// the caller can distinguish "parsed with no models" (valid but
// empty catalog) from "couldn't find the structure at all".
func parseACPSessionNewModels(raw json.RawMessage) []Model {
func parseHermesSessionNewModels(raw json.RawMessage) []Model {
var resp struct {
Models struct {
AvailableModels []struct {

View File

@@ -258,7 +258,7 @@ func TestParseHermesSessionNewModels(t *testing.T) {
"currentModelId": "nous:anthropic/claude-opus-4.7"
}
}`)
models := parseACPSessionNewModels(raw)
models := parseHermesSessionNewModels(raw)
if len(models) != 2 {
t.Fatalf("expected 2 models (duplicate deduped), got %d: %+v", len(models), models)
}
@@ -281,13 +281,13 @@ func TestParseHermesSessionNewModelsMissingField(t *testing.T) {
// failed _build_model_state — should yield nil so the caller
// can distinguish "no catalog" from "empty catalog".
raw := []byte(`{"sessionId": "ses_123"}`)
if got := parseACPSessionNewModels(raw); got != nil && len(got) != 0 {
if got := parseHermesSessionNewModels(raw); got != nil && len(got) != 0 {
t.Errorf("expected nil/empty, got %+v", got)
}
}
func TestParseHermesSessionNewModelsGarbage(t *testing.T) {
if got := parseACPSessionNewModels([]byte("not json")); got != nil {
if got := parseHermesSessionNewModels([]byte("not json")); got != nil {
t.Errorf("expected nil for non-JSON, got %+v", got)
}
}