* feat(daemon): harden agent mention-loop instructions
Two agents that mention each other via `mention://agent/<id>` can fall into
an infinite reply loop — each says "I'm done" in prose but keeps
`@mentioning` the other, which re-enqueues their run. Adding hard caps on
agent-to-agent turns conflicts with Multica's design principle of giving
agents the same authorship freedom as humans, so this change hardens the
instructions that the harness injects instead.
- Replace the terse "mentions are actions" blurb with a full Mentions
protocol: `side-effecting` warning, explicit "when NOT to mention"
(replying to another agent, sign-offs, thanks) and "when a mention IS
appropriate" (human escalation, first-time delegation, user asked).
- Add a pre-workflow decision step for comment-triggered runs: decide
whether a reply is warranted at all, decide whether to include any
`@mention`, and clarify that the post-a-comment rule is mandatory *if*
you reply — silence is a valid exit for agent-to-agent threads.
- Thread the triggering comment's author kind + display name
(`TriggerAuthorType` / `TriggerAuthorName`) from the claim endpoint
through the daemon task type, per-turn prompt, and CLAUDE.md workflow.
When the author is another agent, both surfaces now name that agent
and warn against sign-off mentions.
- Soften the old closing line that told agents to `always` use the
mention format — the word generalized to member/agent mentions and
encouraged the very behavior that causes loops.
Refs GH#1576, MUL-1323.
* fix(daemon): remove MUST-respond conflict and sanitize trigger author name
Addresses two blocking points on PR #1581:
1. buildCommentPrompt told the agent "You MUST respond to THIS comment"
and unconditionally appended the reply command — directly conflicting
with the new agent-to-agent silence-as-valid-exit workflow. Models
were likely to keep following the older must-reply rule and fall back
into the loop this PR is trying to close.
Rewrite the header as "Focus on THIS comment — do not confuse it
with previous ones" (keeps the anti-stale-comment signal) and change
BuildCommentReplyInstructions to open with "If you decide to reply,
post it by running exactly this command" so the reply command is
available but conditional across both prompt surfaces.
2. Raw agent/user display names were being embedded directly into the
high-priority prompt and CLAUDE.md via TriggerAuthorName. Agent and
member names are only validated as non-empty at write time, so a
name containing newlines, backticks, or fake mention markup would
turn the field into a cross-agent prompt-injection surface.
Add execenv.SanitizePromptField — strip control runes, collapse
whitespace, drop markdown structural characters (backtick, asterisk,
brackets, pipe, angle brackets, hash, backslash), truncate to 64
runes — and apply it at both embed sites (per-turn prompt and
CLAUDE.md). Defense-in-depth at the consumption layer so this works
for already-stored names without a migration.
Tests: TestSanitizePromptField covers the policy; TestBuildPromptSanitizesAgentName
plants an attack payload in TriggerAuthorName and checks the rendered prompt
does not leak the newline-anchored injection or the fake mention markup.
TestBuildPromptCommentTriggered*{,ByMember} updated to lock in the
conditional reply-command framing.
* refactor(daemon): trim redundant CLAUDE.md preamble and drop name sanitizer
Per PR #1581 feedback:
1. Remove the `if ctx.TriggerAuthorType == "agent"` preamble block in
runtime_config.go. It duplicated what workflow steps 4 and 5 already
say ("Decide whether a reply is warranted", "Never @mention the
agent you are replying to as a thank-you or sign-off"), so the
signal lands the same without the extra ~7 lines of CLAUDE.md. The
per-turn prompt preamble in prompt.go stays — that surface has no
numbered workflow below it and would otherwise lose the
silence-as-exit signal.
2. Delete execenv.SanitizePromptField + its test. Workspace agents are
created by trusted team members, so the cross-agent name-injection
surface it defended isn't realistic in the current trust model.
3. Drop TriggerAuthorType/Name from execenv.TaskContextForEnv and stop
populating them in daemon.go — they're no longer read by the
execenv package. The same fields on daemon.Task stay because
prompt.go still needs them to label the triggering author in the
per-turn prompt.
Tests simplified to match the leaner shape: CLAUDE.md regression
guards now assert that the anti-loop phrases live in the numbered
workflow, and the sanitizer-specific tests are removed.
* fix: pass model to Hermes ACP session/new and add hermes to InjectRuntimeConfig
- hermes.go: include opts.Model in session/new params so Hermes uses
the configured model instead of its default (fixes local LLM failures)
- runtime_config.go: add "hermes" to the AGENTS.md provider list so
Hermes receives the Multica runtime instructions and skill discovery
Fixes: https://github.com/multica-ai/multica/issues/1195
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(hermes): drop false native-skill claim and add regression tests
The previous change added 'hermes' to the 'skills discovered automatically'
branch of buildMetaSkillContent, but resolveSkillsDir has no Hermes case so
skills still land in the .agent_context/skills/ fallback. AGENTS.md ended up
claiming native discovery while the files were somewhere else, which would
mislead Hermes (and future debuggers).
- Move 'hermes' to the fallback branch alongside 'gemini' so AGENTS.md points
Hermes at .agent_context/skills/ — matching where writeContextFiles actually
writes them.
- Extract buildHermesSessionParams so the session/new payload is unit-testable.
- Add regression tests covering:
* buildHermesSessionParams includes/omits 'model' correctly
* InjectRuntimeConfig('hermes') writes AGENTS.md with the fallback hint
* writeContextFiles('hermes') writes skills to .agent_context/skills/
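A minimal sketch of the include/omit behavior the first regression test checks. Only the 'model' key comes from this change; the 'cwd' key, the function signature, and the map-based payload are assumptions, not the real hermes.go code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildHermesSessionParams is a hypothetical reconstruction: it adds "model"
// only when one is configured, so Hermes keeps its own default otherwise.
func buildHermesSessionParams(cwd, model string) map[string]any {
	params := map[string]any{"cwd": cwd} // "cwd" is an assumed field name
	if model != "" {
		params["model"] = model
	}
	return params
}

func main() {
	for _, model := range []string{"", "local-llm"} {
		out, err := json.Marshal(buildHermesSessionParams("/tmp/task", model))
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
}
```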
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: CC-Girl <cc-girl@multica.ai>
Agents were silently finishing tasks without ever posting results to the
issue — their final reply stayed in terminal/log output only. See MUL-1124.
Root cause: the injected CLAUDE.md / AGENTS.md put "post a comment with
results" inside the body of step 4 (a nested clause in the default workflow
description), so skill-driven flows jumped straight from "do the work" to
`status in_review`.
- Hoist posting the result comment into its own explicit, numbered step in
both assignment-triggered and comment-triggered workflows, with the exact
`multica issue comment add` invocation inlined.
- Add a hard warning at the top of the Output section that terminal / chat
text is never delivered to the user.
- Add regression test covering both workflow branches.
GitHub Copilot CLI scans project-level skills from .github/skills/<name>/SKILL.md
(per the official cli-config-dir-reference docs), not from .agent_context/skills/.
Previously, skills injected for the copilot provider were placed under
.agent_context/skills/ and only referenced by name in AGENTS.md, meaning
Copilot would not actually pick them up.
- resolveSkillsDir: add a dedicated copilot case writing to .github/skills/
- Update doc comments in context.go and runtime_config.go
- Add TestWriteContextFilesCopilotNativeSkills covering the new path and
ensuring .agent_context/skills/ is not created for copilot
Co-authored-by: Devv <devv@Devvs-Mac-mini.local>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(daemon): platform-aware Codex sandbox config to unbreak macOS network
On macOS, Codex's Seatbelt sandbox in workspace-write mode silently
ignores '[sandbox_workspace_write] network_access = true' (see
openai/codex#10390). That blocks DNS inside the sandbox, so 'multica
issue get' and other CLI calls fail with 'dial tcp: lookup ...: no such
host' — this is what caused MUL-963.
Changes:
- New server/internal/daemon/execenv/codex_sandbox.go: picks a sandbox
policy based on runtime.GOOS and the detected Codex CLI version.
Non-darwin or darwin with a known-fixed version keeps workspace-write
+ network_access=true; older darwin falls back to danger-full-access
and logs a warning with an upgrade hint. The fix-version threshold is a
single constant (CodexDarwinNetworkAccessFixedVersion) so it's easy
to bump once upstream ships.
- Per-task config.toml now gets a 'multica-managed' marker block
(BEGIN/END comments) rewritten idempotently; user-owned keys outside
the markers are preserved. Legacy inline sandbox directives from
earlier daemon versions are stripped on migration.
- execenv.PrepareParams gains CodexVersion; execenv.Reuse takes a
codexVersion arg; daemon.go caches detected versions at registration
and threads them through to Prepare/Reuse.
- Replaces the old ensureCodexNetworkAccess tests with
platform-parameterised coverage (linux vs darwin, idempotency,
legacy-migration, policy matrix).
- docs/codex-sandbox-troubleshooting.md: symptom fingerprint table,
decision matrix, self-check commands, trade-offs.
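The policy matrix described above can be sketched as a small pure function. Everything named here is hypothetical: the real code reads runtime.GOOS and CodexDarwinNetworkAccessFixedVersion directly, and the placeholder threshold "99.0.0" only stands in for a release that has not been identified in this log.

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
	"strings"
)

// hypotheticalFixedVersion is a placeholder, not the real threshold.
const hypotheticalFixedVersion = "99.0.0"

type sandboxPolicy struct {
	Mode          string
	NetworkAccess bool // only meaningful for workspace-write
}

// parseVersion reads "major.minor.patch"; missing or non-numeric parts count as 0.
func parseVersion(v string) [3]int {
	var out [3]int
	v = strings.TrimPrefix(strings.TrimSpace(v), "v")
	for i, part := range strings.SplitN(v, ".", 3) {
		n, _ := strconv.Atoi(part)
		out[i] = n
	}
	return out
}

// versionAtLeast reports whether v >= min, comparing component by component.
func versionAtLeast(v, min string) bool {
	a, b := parseVersion(v), parseVersion(min)
	for i := range a {
		if a[i] != b[i] {
			return a[i] > b[i]
		}
	}
	return true
}

// pickSandboxPolicy keeps workspace-write unless we are on an older darwin Codex.
func pickSandboxPolicy(goos, codexVersion, fixedVersion string) sandboxPolicy {
	if goos != "darwin" || versionAtLeast(codexVersion, fixedVersion) {
		return sandboxPolicy{Mode: "workspace-write", NetworkAccess: true}
	}
	return sandboxPolicy{Mode: "danger-full-access"}
}

func main() {
	p := pickSandboxPolicy(runtime.GOOS, "0.1.0", hypotheticalFixedVersion)
	fmt.Printf("%s on %s (network_access=%t)\n", p.Mode, runtime.GOOS, p.NetworkAccess)
}
```

Taking the OS and threshold as parameters is what makes the linux-versus-darwin cases testable from a single host.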
Refs: MUL-963
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(daemon): hoist managed sandbox block above user tables (MUL-963)
Review on #1246 flagged that upsertMulticaManagedBlock appended the
managed block to EOF. If the user's config.toml ends inside a TOML table
(e.g. [permissions.multica] or [profiles.foo]), a trailing bare
sandbox_mode = "..." is parsed as a key of that preceding table, so
Codex silently ignores the policy the daemon meant to apply.
Two changes make the block position-independent:
- renderMulticaManagedBlock now emits only top-level key=value lines and
uses TOML dotted-key form (sandbox_workspace_write.network_access =
true) instead of opening a [sandbox_workspace_write] header. The block
therefore neither inherits from nor leaks into any surrounding table.
- upsertMulticaManagedBlock always hoists the block to the top of the
file (stripping any previously written managed block first), so the
sandbox_mode line is always at the TOML root regardless of what the
user put below it. This also migrates configs written by the original
PR #1246 logic where the block was trapped behind a user table.
Added tests for the regression scenario (pre-existing [permissions.*]
table) and the legacy-trailing-block migration; updated the existing
Linux default test and the troubleshooting runbook to reflect the
dotted-key form.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: CC-Girl <cc-girl@multica.ai>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Move WriteGCMeta from runTask() to handleTask() so it runs after
task completion, not at start. Mid-task crashes leave orphan dirs
that get cleaned by GCOrphanTTL.
- Strengthen isBareRepo to check both HEAD and objects/ directory.
- Remove empty workspace directories after all task dirs are cleaned.
- Add 30s context timeout to git worktree prune to prevent hangs.
- Add comprehensive unit tests for shouldCleanTaskDir (8 scenarios),
cleanTaskDir, gcWorkspace empty-dir cleanup, isBareRepo, and
WriteGCMeta/ReadGCMeta roundtrip.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codex tasks running in workspace-write sandbox mode could not resolve
api.multica.ai because the hardcoded sandbox parameter in thread/start
overrode any config.toml settings, and the default sandbox policy blocks
network access.
Changes:
- Remove hardcoded `sandbox: "workspace-write"` from thread/start RPC —
let Codex read sandbox config from its own config.toml instead
- Auto-generate config.toml in per-task CODEX_HOME with
`sandbox_mode = "workspace-write"` and `network_access = true`,
preserving any existing user settings
- Fix Reuse() to restore CodexHome for Codex provider on workdir reuse
Closes #368
Registers `gemini` as a sixth supported agent provider alongside claude,
codex, opencode, openclaw, and hermes.
- Daemon config probes for `gemini` on PATH (MULTICA_GEMINI_PATH /
MULTICA_GEMINI_MODEL env overrides mirror the other providers).
- New agent.geminiBackend in pkg/agent/gemini.go: spawns
`gemini -p <prompt> --yolo -o text [-m <model>] [-r <session>]`,
reads stdout to completion, and returns a single MessageText plus
the standard Result struct (Status / Output / DurationMs).
- Execution environment writes a GEMINI.md file into the task workdir
(mirroring the existing CLAUDE.md / AGENTS.md injection for other
providers) so Gemini discovers the Multica runtime meta-skill
through its native mechanism.
Tests:
- pkg/agent/gemini_test.go — unit coverage for buildGeminiArgs
(baseline, model override, resume session, omit-when-empty).
- internal/daemon/execenv/TestInjectRuntimeConfigGemini — verifies
GEMINI.md is written and that CLAUDE.md/AGENTS.md are NOT.
Scope (intentional for v1):
- Text output only (`-o text`). Streaming tool events via
`--output-format stream-json` is a follow-up once we have a
reliable reproduction of Gemini's event schema.
- No MCP config plumbing. Gemini's `--allowed-mcp-server-names`
filter pairs well with the per-agent MCP work on feat/per-agent-mcp;
stacking the two can land as a follow-up.
- No token usage scraping (Gemini's accounting lives on the Google
Cloud side, not a local JSONL log like claude/codex).
- No session resume wiring beyond accepting the ExecOptions field —
the daemon does not yet persist Gemini session IDs because the text
output mode does not expose them.
Migration / env changes:
- New optional environment variables MULTICA_GEMINI_PATH and
MULTICA_GEMINI_MODEL. Default path is the string "gemini" (resolved
via PATH at daemon startup). If no Gemini install is detected, the
provider is simply absent from the runtime — no behavior change for
existing deployments.
Per-task CODEX_HOME kept session logs inside each task directory, so they
were invisible from the global ~/.codex/sessions/ where users expect to
find them. Symlink the sessions directory back to the shared home so
Codex writes session logs to the global location while keeping skills
isolated per task.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(daemon): add opencode as supported agent provider
Add opencode backend alongside claude and codex. The backend spawns
`opencode run --format json`, parses streaming JSON events (text,
tool_use, error, step_start/finish), and supports --prompt for system
prompts. Includes CLI detection, AGENTS.md runtime config, native skill
discovery via .config/opencode/skills/, and 21 tests covering handlers,
JSON parsing, and integration-level processEvents scenarios.
* chore: add .tool-versions to gitignore
Task execution environments were all created flat under WorkspacesRoot,
mixing tasks from different workspaces. Now tasks are nested under their
workspace ID for clearer organization and easier per-workspace cleanup.
Agents now decide which repo to use based on issue context and check out
repos on demand via `multica repo checkout <url>`. Workspace repos are
cached locally as bare clones for fast worktree creation.
Key changes:
- Add repocache package for bare clone management (clone, fetch, worktree)
- Add `multica repo checkout` CLI command that talks to local daemon
- Add POST /repo/checkout endpoint on daemon health server
- Pass workspace repos metadata through register + task claim responses
- Remove pre-created worktrees from execenv (workdir starts empty)
- Update CLAUDE.md template to instruct agents to use `multica repo checkout`
- Pass MULTICA_DAEMON_PORT, WORKSPACE_ID, AGENT_NAME, TASK_ID env vars to agent
Write skills to provider-native paths so agents discover them
automatically instead of relying on manual path references in
CLAUDE.md/AGENTS.md.
- Claude: write to {workDir}/.claude/skills/ (native discovery)
- Codex: write to per-task CODEX_HOME/skills/ with auth/config
seeded from ~/.codex/ (symlink auth.json, copy config files)
- Fallback: keep .agent_context/skills/ for unknown providers
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: decouple task lifecycle from issue status, add daemon health server
- Remove automatic issue status changes from StartTask (in_progress),
CompleteTask (in_review), and FailTask (blocked) in task service.
Issue status is now fully managed by the agent via `multica issue status`.
- Update agent prompt and meta skill to instruct agents to manage issue
status themselves (in_progress → done/in_review/blocked).
- Add daemon health HTTP server on 127.0.0.1:19514 with /health endpoint
exposing pid, uptime, agents, and workspaces. Fail fast if port is taken
(another daemon already running).
- Update `multica status` to check both server and daemon health.
- Add Save button to repos section in workspace settings UI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(daemon): simplify prompt, fix runtime config path, improve task error logging
- Slim down BuildPrompt to a minimal hint; detailed workflow now lives in CLAUDE.md/AGENTS.md
- Write CLAUDE.md to workDir root instead of .claude/CLAUDE.md
- Fix git-exclude pattern (.claude → CLAUDE.md)
- Decouple task queue reconciliation from issue status changes (agents manage status via CLI)
- Add diagnostic logging when CompleteTask/FailTask fail due to unexpected task state
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(task): use task_completed/task_failed inbox notification types
FailTask was sending "agent_blocked" which conflates agent crash with
issue-level blocked status. Align notification types with the new
decoupled model: task_completed and task_failed. Update frontend types
and labels accordingly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the frozen context snapshot pattern with a CLI-driven approach:
agents now use `multica` CLI commands to fetch issue details, comments,
and workspace context on demand, always getting the latest data.
- Remove buildContextSnapshot and snapshot generation from enqueue
- Claim endpoint now returns fresh agent name + skills from DB
- Daemon resolves provider from local runtimeIndex, not snapshot
- Prompt instructs agent to use `multica issue get` / `comment list`
- Meta skill (CLAUDE.md/AGENTS.md) documents all available CLI commands
- Skills still injected as filesystem files (static agent config)
- Simplify daemon types: remove TaskContext/IssueContext/RuntimeContext
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These fields were unused in practice. Removed from frontend types,
issue detail UI, backend handlers, daemon prompt/context, protocol
messages, SQL queries, and tests. DB columns retained with defaults.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ReposRoot was a daemon-level config that locked all tasks to a single
git repo. Replace with RepoPath in TaskContext so the server can specify
the repo per task. When it is not provided, the daemon falls back to directory mode.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace raw fmt/log calls with structured slog logger (Go) and
console-based logger (TypeScript). Add request logging middleware.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace agent.skills TEXT field with structured skill/skill_file/agent_skill
tables. Skills are workspace-level entities with supporting files, reusable
across agents via many-to-many bindings.
Backend: migration 008, sqlc queries, CRUD handler, agent-skill junction,
structured skill loading in task context snapshot.
Daemon: meta skill injection via runtime-native config (.claude/CLAUDE.md
for Claude, AGENTS.md for Codex) so agents discover .agent_context/ skills
through their native mechanism. Lean prompt without inlined skill content.
Frontend: Skills management page, agent Skills tab picker, SDK methods,
TypeScript types, workspace store integration.
Also removes auto-creation of init issues when creating agents.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduce the `execenv` package that creates isolated working directories
for each agent task. Supports git worktree mode (code tasks) and plain
directory mode (non-code tasks), with `.agent_context/issue_context.md`
injected into the workdir for Claude Code to discover.
Key changes:
- New `server/internal/daemon/execenv/` package (Prepare/Cleanup)
- `runTask()` now creates isolated env instead of using shared reposRoot
- Prompt updated to reference `.agent_context/` files
- Add `WorkspacesRoot` config (default ~/multica_workspaces)
- Add `KeepEnvAfterTask` config for debugging
- Default agent timeout increased from 20min to 2h
- `CompleteTask` now forwards branch name to server
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>