Files
multica/server/internal
Bohan Jiang e0e91fc792 feat(daemon): harden agent mention-loop instructions (#1581)
* feat(daemon): harden agent mention-loop instructions

Two agents that mention each other via `mention://agent/<id>` can fall into
an infinite reply loop — each says "I'm done" in prose but keeps
`@mentioning` the other, which re-enqueues their run. Adding hard caps on
agent-to-agent turns conflicts with Multica's design principle of giving
agents the same authorship freedom as humans, so this change hardens the
instructions that the harness injects instead.

- Replace the terse "mentions are actions" blurb with a full Mentions
  protocol: `side-effecting` warning, explicit "when NOT to mention"
  (replying to another agent, sign-offs, thanks) and "when a mention IS
  appropriate" (human escalation, first-time delegation, user asked).
- Add a pre-workflow decision step for comment-triggered runs: decide
  whether a reply is warranted at all, decide whether to include any
  `@mention`, and clarify that the post-a-comment rule is mandatory *if*
  you reply — silence is a valid exit for agent-to-agent threads.
- Thread the triggering comment's author kind + display name
  (`TriggerAuthorType` / `TriggerAuthorName`) from the claim endpoint
  through the daemon task type, per-turn prompt, and CLAUDE.md workflow.
  When the author is another agent, both surfaces now name that agent
  and warn against sign-off mentions.
- Soften the old closing line that told agents to `always` use the
  mention format — the word generalized to member/agent mentions and
  encouraged the very behavior that causes loops.

Refs GH#1576, MUL-1323.

* fix(daemon): remove MUST-respond conflict and sanitize trigger author name

Addresses two blocking points on PR #1581:

1. buildCommentPrompt told the agent "You MUST respond to THIS comment"
   and unconditionally appended the reply command — directly conflicting
   with the new agent-to-agent silence-as-valid-exit workflow. Models
   were likely to keep following the older must-reply rule and fall back
   into the loop this PR is trying to close.

   Rewrite the header as "Focus on THIS comment — do not confuse it
   with previous ones" (keeps the anti-stale-comment signal) and change
   BuildCommentReplyInstructions to open with "If you decide to reply,
   post it by running exactly this command" so the reply command is
   available but conditional across both prompt surfaces.

2. Raw agent/user display names were being embedded directly into the
   high-priority prompt and CLAUDE.md via TriggerAuthorName. Agent and
   member names are only validated as non-empty at write time, so a
   name containing newlines, backticks, or fake mention markup would
   turn the field into a cross-agent prompt-injection surface.

   Add execenv.SanitizePromptField — strip control runes, collapse
   whitespace, drop markdown structural characters (backtick, asterisk,
   brackets, pipe, angle brackets, hash, backslash), truncate to 64
   runes — and apply it at both embed sites (per-turn prompt and
   CLAUDE.md). Defense-in-depth at the consumption layer so this works
   for already-stored names without a migration.

Tests: TestSanitizePromptField covers the policy; TestBuildPromptSanitizesAgentName
plants an attack payload in TriggerAuthorName and checks the rendered prompt
does not leak the newline-anchored injection or the fake mention markup.
TestBuildPromptCommentTriggered*{,ByMember} updated to lock in the
conditional reply-command framing.

* refactor(daemon): trim redundant CLAUDE.md preamble and drop name sanitizer

Per PR #1581 feedback:

1. Remove the `if ctx.TriggerAuthorType == "agent"` preamble block in
   runtime_config.go. It duplicated what workflow steps 4 and 5 already
   say ("Decide whether a reply is warranted", "Never @mention the
   agent you are replying to as a thank-you or sign-off"), so the
   signal lands the same without the extra ~7 lines of CLAUDE.md. The
   per-turn prompt preamble in prompt.go stays — that surface has no
   numbered workflow below it and would otherwise lose the
   silence-as-exit signal.

2. Delete execenv.SanitizePromptField + its test. Workspace agents are
   created by trusted team members, so the cross-agent name-injection
   surface it defended isn't realistic in the current trust model.

3. Drop TriggerAuthorType/Name from execenv.TaskContextForEnv and stop
   populating them in daemon.go — they're no longer read by the
   execenv package. The same fields on daemon.Task stay because
   prompt.go still needs them to label the triggering author in the
   per-turn prompt.

Tests simplified to match the leaner shape: CLAUDE.md regression
guards now assert that the anti-loop phrases live in the numbered
workflow, and the sanitizer-specific tests are removed.
2026-04-24 01:39:12 +08:00
..