Commit Graph

24 Commits

Author SHA1 Message Date
Bohan Jiang
e0e91fc792 feat(daemon): harden agent mention-loop instructions (#1581)
* feat(daemon): harden agent mention-loop instructions

Two agents that mention each other via `mention://agent/<id>` can fall into
an infinite reply loop — each says "I'm done" in prose but keeps
`@mentioning` the other, which re-enqueues their run. Adding hard caps on
agent-to-agent turns conflicts with Multica's design principle of giving
agents the same authorship freedom as humans, so this change hardens the
instructions that the harness injects instead.

- Replace the terse "mentions are actions" blurb with a full Mentions
  protocol: `side-effecting` warning, explicit "when NOT to mention"
  (replying to another agent, sign-offs, thanks) and "when a mention IS
  appropriate" (human escalation, first-time delegation, user asked).
- Add a pre-workflow decision step for comment-triggered runs: decide
  whether a reply is warranted at all, decide whether to include any
  `@mention`, and clarify that the post-a-comment rule is mandatory *if*
  you reply — silence is a valid exit for agent-to-agent threads.
- Thread the triggering comment's author kind + display name
  (`TriggerAuthorType` / `TriggerAuthorName`) from the claim endpoint
  through the daemon task type, per-turn prompt, and CLAUDE.md workflow.
  When the author is another agent, both surfaces now name that agent
  and warn against sign-off mentions.
- Soften the old closing line that told agents to `always` use the
  mention format — the word generalized to member/agent mentions and
  encouraged the very behavior that causes loops.

Refs GH#1576, MUL-1323.

* fix(daemon): remove MUST-respond conflict and sanitize trigger author name

Addresses two blocking points on PR #1581:

1. buildCommentPrompt told the agent "You MUST respond to THIS comment"
   and unconditionally appended the reply command — directly conflicting
   with the new agent-to-agent silence-as-valid-exit workflow. Models
   were likely to keep following the older must-reply rule and fall back
   into the loop this PR is trying to close.

   Rewrite the header as "Focus on THIS comment — do not confuse it
   with previous ones" (keeps the anti-stale-comment signal) and change
   BuildCommentReplyInstructions to open with "If you decide to reply,
   post it by running exactly this command" so the reply command is
   available but conditional across both prompt surfaces.

2. Raw agent/user display names were being embedded directly into the
   high-priority prompt and CLAUDE.md via TriggerAuthorName. Agent and
   member names are only validated as non-empty at write time, so a
   name containing newlines, backticks, or fake mention markup would
   turn the field into a cross-agent prompt-injection surface.

   Add execenv.SanitizePromptField — strip control runes, collapse
   whitespace, drop markdown structural characters (backtick, asterisk,
   brackets, pipe, angle brackets, hash, backslash), truncate to 64
   runes — and apply it at both embed sites (per-turn prompt and
   CLAUDE.md). Defense-in-depth at the consumption layer so this works
   for already-stored names without a migration.

Tests: TestSanitizePromptField covers the policy; TestBuildPromptSanitizesAgentName
plants an attack payload in TriggerAuthorName and checks the rendered prompt
does not leak the newline-anchored injection or the fake mention markup.
TestBuildPromptCommentTriggered*{,ByMember} updated to lock in the
conditional reply-command framing.

* refactor(daemon): trim redundant CLAUDE.md preamble and drop name sanitizer

Per PR #1581 feedback:

1. Remove the `if ctx.TriggerAuthorType == "agent"` preamble block in
   runtime_config.go. It duplicated what workflow steps 4 and 5 already
   say ("Decide whether a reply is warranted", "Never @mention the
   agent you are replying to as a thank-you or sign-off"), so the
   signal lands the same without the extra ~7 lines of CLAUDE.md. The
   per-turn prompt preamble in prompt.go stays — that surface has no
   numbered workflow below it and would otherwise lose the
   silence-as-exit signal.

2. Delete execenv.SanitizePromptField + its test. Workspace agents are
   created by trusted team members, so the cross-agent name-injection
   surface it defended isn't realistic in the current trust model.

3. Drop TriggerAuthorType/Name from execenv.TaskContextForEnv and stop
   populating them in daemon.go — they're no longer read by the
   execenv package. The same fields on daemon.Task stay because
   prompt.go still needs them to label the triggering author in the
   per-turn prompt.

Tests simplified to match the leaner shape: CLAUDE.md regression
guards now assert that the anti-loop phrases live in the numbered
workflow, and the sanitizer-specific tests are removed.
2026-04-24 01:39:12 +08:00
LinYushen
d97aec83d7 fix: pass model to Hermes ACP and add hermes to InjectRuntimeConfig (#1203)
* fix: pass model to Hermes ACP session/new and add hermes to InjectRuntimeConfig

- hermes.go: include opts.Model in session/new params so Hermes uses
  the configured model instead of its default (fixes local LLM failures)
- runtime_config.go: add "hermes" to the AGENTS.md provider list so
  Hermes receives the Multica runtime instructions and skill discovery

Fixes: https://github.com/multica-ai/multica/issues/1195

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(hermes): drop false native-skill claim and add regression tests

The previous change added 'hermes' to the 'skills discovered automatically'
branch of buildMetaSkillContent, but resolveSkillsDir has no Hermes case so
skills still land in the .agent_context/skills/ fallback. AGENTS.md ended up
claiming native discovery while the files were somewhere else, which would
mislead Hermes (and future debuggers).

- Move 'hermes' to the fallback branch alongside 'gemini' so AGENTS.md points
  Hermes at .agent_context/skills/ — matching where writeContextFiles actually
  writes them.
- Extract buildHermesSessionParams so the session/new payload is unit-testable.
- Add regression tests covering:
  * buildHermesSessionParams includes/omits 'model' correctly
  * InjectRuntimeConfig('hermes') writes AGENTS.md with the fallback hint
  * writeContextFiles('hermes') writes skills to .agent_context/skills/

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: CC-Girl <cc-girl@multica.ai>
2026-04-23 12:43:30 +08:00
Bohan Jiang
c76c790b32 fix(daemon/execenv): make posting result comment an explicit workflow step (#1372)
Agents were silently finishing tasks without ever posting results to the
issue — their final reply stayed in terminal/log output only. See MUL-1124.

Root cause: the injected CLAUDE.md / AGENTS.md put "post a comment with
results" inside the body of step 4 (a nested clause in the default workflow
description), so skill-driven flows jumped straight from "do the work" to
`status in_review`.

- Hoist posting the result comment into its own explicit, numbered step in
  both assignment-triggered and comment-triggered workflows, with the exact
  `multica issue comment add` invocation inlined.
- Add a hard warning at the top of the Output section that terminal / chat
  text is never delivered to the user.
- Add regression test covering both workflow branches.
2026-04-20 17:48:06 +08:00
devv-eve
b2307a5ee9 fix(execenv): write Copilot skills to .github/skills/ for native discovery (#1270)
GitHub Copilot CLI scans project-level skills from .github/skills/<name>/SKILL.md
(per the official cli-config-dir-reference docs), not from .agent_context/skills/.
Previously, skills injected for the copilot provider were placed under
.agent_context/skills/ and only referenced by name in AGENTS.md, meaning
Copilot would not actually pick them up.

- resolveSkillsDir: add a dedicated copilot case writing to .github/skills/
- Update doc comments in context.go and runtime_config.go
- Add TestWriteContextFilesCopilotNativeSkills covering the new path and
  ensuring .agent_context/skills/ is not created for copilot

Co-authored-by: Devv <devv@Devvs-Mac-mini.local>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-17 03:07:32 -07:00
LinYushen
b5de04da59 fix(daemon): platform-aware Codex sandbox config to unbreak macOS network (MUL-963) (#1246)
* fix(daemon): platform-aware Codex sandbox config to unbreak macOS network

On macOS, Codex's Seatbelt sandbox in workspace-write mode silently
ignores '[sandbox_workspace_write] network_access = true' (see
openai/codex#10390). That blocks DNS inside the sandbox, so 'multica
issue get' and other CLI calls fail with 'dial tcp: lookup ...: no such
host' — this is what caused MUL-963.

Changes:

- New server/internal/daemon/execenv/codex_sandbox.go: picks a sandbox
  policy based on runtime.GOOS and the detected Codex CLI version.
  Non-darwin or darwin with a known-fixed version keeps workspace-write
  + network_access=true; older darwin falls back to danger-full-access
  and logs a warn with upgrade hint. The fix-version threshold is a
  single constant (CodexDarwinNetworkAccessFixedVersion) so it's easy
  to bump once upstream ships.
- Per-task config.toml now gets a 'multica-managed' marker block
  (BEGIN/END comments) rewritten idempotently; user-owned keys outside
  the markers are preserved. Legacy inline sandbox directives from
  earlier daemon versions are stripped on migration.
- execenv.PrepareParams gains CodexVersion; execenv.Reuse takes a
  codexVersion arg; daemon.go caches detected versions at registration
  and threads them through to Prepare/Reuse.
- Replaces the old ensureCodexNetworkAccess tests with
  platform-parameterised coverage (linux vs darwin, idempotency,
  legacy-migration, policy matrix).
- docs/codex-sandbox-troubleshooting.md: symptom fingerprint table,
  decision matrix, self-check commands, trade-offs.

Refs: MUL-963

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(daemon): hoist managed sandbox block above user tables (MUL-963)

Review on #1246 flagged that upsertMulticaManagedBlock appended the
managed block to EOF. If the user's config.toml ends inside a TOML table
(e.g. [permissions.multica] or [profiles.foo]), a trailing bare
sandbox_mode = "..." is parsed as a key of that preceding table, so
Codex silently ignores the policy the daemon meant to apply.

Two changes make the block position-independent:

- renderMulticaManagedBlock now emits only top-level key=value lines and
  uses TOML dotted-key form (sandbox_workspace_write.network_access =
  true) instead of opening a [sandbox_workspace_write] header. The block
  therefore neither inherits from nor leaks into any surrounding table.
- upsertMulticaManagedBlock always hoists the block to the top of the
  file (stripping any previously written managed block first), so the
  sandbox_mode line is always at the TOML root regardless of what the
  user put below it. This also migrates configs written by the original
  PR #1246 logic where the block was trapped behind a user table.

Added tests for the regression scenario (pre-existing [permissions.*]
table) and the legacy-trailing-block migration; updated the existing
Linux default test and the troubleshooting runbook to reflect the
dotted-key form.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: CC-Girl <cc-girl@multica.ai>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-17 14:03:13 +08:00
Jiayuan Zhang
bc1185f525 Merge pull request #755 from sanjay3290/feat/gemini-backend
feat(daemon): add Google Gemini CLI backend
2026-04-14 02:46:20 +08:00
yushen
20809052f5 fix(daemon): address GC review feedback
- Move WriteGCMeta from runTask() to handleTask() so it runs after
  task completion, not at start. Mid-task crashes leave orphan dirs
  that get cleaned by GCOrphanTTL.
- Strengthen isBareRepo to check both HEAD and objects/ directory.
- Remove empty workspace directories after all task dirs are cleaned.
- Add 30s context timeout to git worktree prune to prevent hangs.
- Add comprehensive unit tests for shouldCleanTaskDir (8 scenarios),
  cleanTaskDir, gcWorkspace empty-dir cleanup, isBareRepo, and
  WriteGCMeta/ReadGCMeta roundtrip.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 15:00:37 +08:00
bulai0408
47eb6cb612 fix(agent): enable network access for Codex sandbox so Multica CLI can reach API
Codex tasks running in workspace-write sandbox mode could not resolve
api.multica.ai because the hardcoded sandbox parameter in thread/start
overrode any config.toml settings, and the default sandbox policy blocks
network access.

Changes:
- Remove hardcoded `sandbox: "workspace-write"` from thread/start RPC —
  let Codex read sandbox config from its own config.toml instead
- Auto-generate config.toml in per-task CODEX_HOME with
  `sandbox_mode = "workspace-write"` and `network_access = true`,
  preserving any existing user settings
- Fix Reuse() to restore CodexHome for Codex provider on workdir reuse

Closes #368
2026-04-13 01:03:43 +08:00
Sanjay Ramadugu
f99f50eb0c feat(daemon): add Google Gemini CLI backend
Registers `gemini` as a sixth supported agent provider alongside claude,
codex, opencode, openclaw, and hermes.

- Daemon config probes for `gemini` on PATH (MULTICA_GEMINI_PATH /
  MULTICA_GEMINI_MODEL env overrides mirror the other providers).
- New agent.geminiBackend in pkg/agent/gemini.go: spawns
  `gemini -p <prompt> --yolo -o text [-m <model>] [-r <session>]`,
  reads stdout to completion, and returns a single MessageText plus
  the standard Result struct (Status / Output / DurationMs).
- Execution environment writes a GEMINI.md file into the task workdir
  (mirroring the existing CLAUDE.md / AGENTS.md injection for other
  providers) so Gemini discovers the Multica runtime meta-skill
  through its native mechanism.

Tests:

- pkg/agent/gemini_test.go — unit coverage for buildGeminiArgs
  (baseline, model override, resume session, omit-when-empty).
- internal/daemon/execenv/TestInjectRuntimeConfigGemini — verifies
  GEMINI.md is written and that CLAUDE.md/AGENTS.md are NOT.

Scope (intentional for v1):

- Text output only (`-o text`). Streaming tool events via
  `--output-format stream-json` is a follow-up once we have a
  reliable reproduction of Gemini's event schema.
- No MCP config plumbing. Gemini's `--allowed-mcp-server-names`
  filter pairs well with the per-agent MCP work on feat/per-agent-mcp;
  stacking the two can land as a follow-up.
- No token usage scraping (Gemini's accounting lives on the Google
  Cloud side, not a local JSONL log like claude/codex).
- No session resume wiring beyond accepting the ExecOptions field —
  the daemon does not yet persist Gemini session IDs because the text
  output mode does not expose them.

Migration / env changes:

- New optional environment variables MULTICA_GEMINI_PATH and
  MULTICA_GEMINI_MODEL. Default path is the string "gemini" (resolved
  via PATH at daemon startup). If no Gemini install is detected, the
  provider is simply absent from the runtime — no behavior change for
  existing deployments.
2026-04-11 22:58:49 -04:00
Jiayuan Zhang
2c1d1d989c fix(daemon): symlink Codex sessions dir to shared home for discoverability (#627)
Per-task CODEX_HOME isolated session logs in per-task directories, making
them invisible from the global ~/.codex/sessions/ where users expect to
find them. Symlink the sessions directory back to the shared home so
Codex writes session logs to the global location while keeping skills
isolated per task.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 15:38:34 +08:00
Quake Wang
36db325d50 feat(daemon): add opencode as supported agent provider (#341)
* feat(daemon): add opencode as supported agent provider

Add opencode backend alongside claude and codex. The backend spawns
`opencode run --format json`, parses streaming JSON events (text,
tool_use, error, step_start/finish), and supports --prompt for system
prompts. Includes CLI detection, AGENTS.md runtime config, native skill
discovery via .config/opencode/skills/, and 21 tests covering handlers,
JSON parsing, and integration-level processEvents scenarios.

* chore: add .tool-versions to gitignore
2026-04-02 17:52:07 +08:00
Jiayuan
1054e218ed fix(daemon): update execenv tests to match current renderIssueContext output
CLI hints like "multica issue get" were moved to CLAUDE.md and are no
longer rendered into issue_context.md. Remove stale assertions.
2026-03-31 15:15:06 +08:00
Jiayuan
7d126cc549 feat(daemon): group task directories by workspace ID
Task execution environments were all created flat under WorkspacesRoot,
mixing tasks from different workspaces. Now tasks are nested under their
workspace ID for clearer organization and easier per-workspace cleanup.
2026-03-30 20:13:30 +08:00
Jiayuan
cdc1ac708e feat(daemon): agent-driven repo checkout with bare clone cache
Agents now decide which repo to use based on issue context and check out
repos on demand via `multica repo checkout <url>`. Workspace repos are
cached locally as bare clones for fast worktree creation.

Key changes:
- Add repocache package for bare clone management (clone, fetch, worktree)
- Add `multica repo checkout` CLI command that talks to local daemon
- Add POST /repo/checkout endpoint on daemon health server
- Pass workspace repos metadata through register + task claim responses
- Remove pre-created worktrees from execenv (workdir starts empty)
- Update CLAUDE.md template to instruct agents to use `multica repo checkout`
- Pass MULTICA_DAEMON_PORT, WORKSPACE_ID, AGENT_NAME, TASK_ID env vars to agent
2026-03-29 19:37:48 +08:00
Jiayuan
46144646c5 feat(daemon): inject skills into agent-native directories
Write skills to provider-native paths so agents discover them
automatically instead of relying on manual path references in
CLAUDE.md/AGENTS.md.

- Claude: write to {workDir}/.claude/skills/ (native discovery)
- Codex: write to per-task CODEX_HOME/skills/ with auth/config
  seeded from ~/.codex/ (symlink auth.json, copy config files)
- Fallback: keep .agent_context/skills/ for unknown providers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 00:47:00 +08:00
LinYushen
6d2a0b45d2 refactor: decouple task lifecycle from issue status (#151)
* refactor: decouple task lifecycle from issue status, add daemon health server

- Remove automatic issue status changes from StartTask (in_progress),
  CompleteTask (in_review), and FailTask (blocked) in task service.
  Issue status is now fully managed by the agent via `multica issue status`.
- Update agent prompt and meta skill to instruct agents to manage issue
  status themselves (in_progress → done/in_review/blocked).
- Add daemon health HTTP server on 127.0.0.1:19514 with /health endpoint
  exposing pid, uptime, agents, and workspaces. Fail fast if port is taken
  (another daemon already running).
- Update `multica status` to check both server and daemon health.
- Add Save button to repos section in workspace settings UI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(daemon): simplify prompt, fix runtime config path, improve task error logging

- Slim down BuildPrompt to a minimal hint; detailed workflow now lives in CLAUDE.md/AGENTS.md
- Write CLAUDE.md to workDir root instead of .claude/CLAUDE.md
- Fix git-exclude pattern (.claude → CLAUDE.md)
- Decouple task queue reconciliation from issue status changes (agents manage status via CLI)
- Add diagnostic logging when CompleteTask/FailTask fail due to unexpected task state

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(task): use task_completed/task_failed inbox notification types

FailTask was sending "agent_blocked" which conflates agent crash with
issue-level blocked status. Align notification types with the new
decoupled model: task_completed and task_failed. Update frontend types
and labels accordingly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 18:30:21 +08:00
yushen
1deae2a1e9 refactor(daemon): remove context snapshot, let agent fetch data via CLI
Replace the frozen context snapshot pattern with a CLI-driven approach:
agents now use `multica` CLI commands to fetch issue details, comments,
and workspace context on demand, always getting the latest data.

- Remove buildContextSnapshot and snapshot generation from enqueue
- Claim endpoint now returns fresh agent name + skills from DB
- Daemon resolves provider from local runtimeIndex, not snapshot
- Prompt instructs agent to use `multica issue get` / `comment list`
- Meta skill (CLAUDE.md/AGENTS.md) documents all available CLI commands
- Skills still injected as filesystem files (static agent config)
- Simplify daemon types: remove TaskContext/IssueContext/RuntimeContext

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 15:31:22 +08:00
Naiyuan Qing
395814b16a fix(test): update daemon tests after removing acceptance_criteria/context_refs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 19:26:35 +08:00
Naiyuan Qing
a500001093 refactor: remove acceptance_criteria and context_refs from issues
These fields were unused in practice. Removed from frontend types,
issue detail UI, backend handlers, daemon prompt/context, protocol
messages, SQL queries, and tests. DB columns retained with defaults.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 19:24:34 +08:00
yushen
e4a905c841 fix(daemon): improve error handling in auth and workspace loading
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 17:22:12 +08:00
yushen
7b4a73c989 refactor(daemon): remove global ReposRoot, use per-task RepoPath from server
ReposRoot was a daemon-level config that locked all tasks to a single
git repo. Replace with RepoPath in TaskContext so the server can specify
the repo per task. When not provided, daemon falls back to directory mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 16:04:33 +08:00
Naiyuan Qing
8983a9fefa feat(logging): add structured logging across server and SDK
Replace raw fmt/log calls with structured slog logger (Go) and
console-based logger (TypeScript). Add request logging middleware.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 10:57:11 +08:00
Jiayuan Zhang
02df33803a feat: structured skills system with meta skill runtime injection
Replace agent.skills TEXT field with structured skill/skill_file/agent_skill
tables. Skills are workspace-level entities with supporting files, reusable
across agents via many-to-many bindings.

Backend: migration 008, sqlc queries, CRUD handler, agent-skill junction,
structured skill loading in task context snapshot.

Daemon: meta skill injection via runtime-native config (.claude/CLAUDE.md
for Claude, AGENTS.md for Codex) so agents discover .agent_context/ skills
through their native mechanism. Lean prompt without inlined skill content.

Frontend: Skills management page, agent Skills tab picker, SDK methods,
TypeScript types, workspace store integration.

Also removes auto-creation of init issues when creating agents.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 15:17:59 +08:00
Jiayuan Zhang
678266ec87 feat(daemon): add per-task isolated execution environments
Introduce the `execenv` package that creates isolated working directories
for each agent task. Supports git worktree mode (code tasks) and plain
directory mode (non-code tasks), with `.agent_context/issue_context.md`
injected into the workdir for Claude Code to discover.

Key changes:
- New `server/internal/daemon/execenv/` package (Prepare/Cleanup)
- `runTask()` now creates isolated env instead of using shared reposRoot
- Prompt updated to reference `.agent_context/` files
- Add `WorkspacesRoot` config (default ~/multica_workspaces)
- Add `KeepEnvAfterTask` config for debugging
- Default agent timeout increased from 20min to 2h
- `CompleteTask` now forwards branch name to server

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 12:41:52 +08:00