Files
multica/server/internal/daemon/execenv/reply_instructions.go
Multica Eve 8ad673fdb7 MUL-3560: gate slim runtime brief behind runtime_brief_slim feature flag (#4449)
The MUL-3560 slim runtime brief — kind-driven dispatcher, per-section
gating, prose compression for ~7k chars saved on the typical
comment-triggered task — now ships behind the `runtime_brief_slim`
feature flag wired via the framework-level service from MUL-3615.

Default: OFF in every environment (production stays on the legacy
brief that has shipped for ~2 years). Staging opts in via the YAML
rule set; ops can override per-process with `FF_RUNTIME_BRIEF_SLIM=true`.
Production is held back until staging has burned in long enough that
we are confident the slim brief does not regress agent behaviour.

Architecture (one toggle point, two code paths, both fully tested):

  buildMetaSkillContent (runtime_config.go)
      │
      └─ useSlimBrief() → false (default)
      │     → fall through to the legacy verbose body that ships on
      │       main today — byte-for-byte unchanged, no migration risk
      │
      └─ useSlimBrief() → true
            → buildMetaSkillContentSlim (runtime_config_sections.go)
              → classifyTask → 5-way kind switch → per-section writers

BuildCommentReplyInstructions takes the same gate, so the per-turn
comment prompt and the runtime brief stay in sync on which template
they emit.

What's in this PR:

- runtime_config_flag.go (new): package-scope `runtimeFlags` atomic
  pointer + `SetFeatureFlags` setter + `useSlimBrief` toggle point.
  Nil-safe: a daemon that forgets to wire the service falls back to
  legacy, no panic.
- runtime_config_kind.go (new): `taskKind` enum + `classifyTask` +
  `hasIssueContext` predicate. Used only by the slim path.
- runtime_config_sections.go (new): the slim brief itself —
  `buildMetaSkillContentSlim` + per-section `writeXxx` helpers
  + `writeAvailableCommandsQuickCreate` minimal variant +
  `writeBackgroundTaskSafetySlim` compressed safety section. The
  Section × Kind matrix is documented inline on
  `buildMetaSkillContentSlim` and the test below checks the
  dispatcher does not diverge from the spec.
- reply_instructions.go: `BuildCommentReplyInstructions` gains a
  short slim-or-legacy prelude; new `buildCommentReplyInstructionsSlim`
  is the compressed cookbook (defers the shell-hazard rationale to
  `## Comment Formatting`).
- runtime_config.go: `buildMetaSkillContent` gains a 2-line
  dispatcher at the top; the legacy body is otherwise untouched.
- runtime_config_kind_test.go (new): canaries for both paths.
  - TestClassifyTask: 5 kinds + 3 tiebreak cases.
  - TestTaskKindHasIssueContext: predicate semantics.
  - TestSlimFlagOffUsesLegacy: nil flag service → legacy path
    (renders "Get full issue details.", a legacy-only substring).
  - TestSlimFlagOnUsesSlim: flag on → slim path (renders "full
    issue.", a slim-only one-liner) AND must NOT render legacy
    "Get full issue details.".
  - TestBuildMetaSkillContentSlimKindMatrix: locks the per-kind
    section set; heading match is line-anchored so inline references
    don't trip absence assertions.
  - TestSlimQuickCreateAvailableCommands: locks the minimal-variant
    content for quick-create (issue create present, every other
    Core command absent).
  - TestSlimBriefIsSubstantiallyShorter: ≥ 30% reduction guard so
    a future change can't accidentally re-bloat the slim path back
    to legacy levels.
- cmd/server/main.go: now calls `execenv.SetFeatureFlags(flags)`
  immediately after constructing the feature flag service.

Measured impact (slim vs legacy, claude provider, realistic fixture
with 2 repos + 2 skills + member initiator):

    legacy = 19567 chars
    slim   = 11868 chars    Δ = -7699  (-39.3%)

Verification:

- go vet ./internal/daemon/... ./cmd/server/...                  ok
- go test ./internal/daemon/...                                  ok
- go test ./pkg/featureflag/...                                  ok
- TestSlimBriefIsSubstantiallyShorter logs the 39.3% ratio
- TestSlimFlagOffUsesLegacy + TestSlimFlagOnUsesSlim pass both
  directions, so the dispatcher is locked in code.

The pre-existing `internal/handler` test failures
(TestLeaveWorkspace_RevokesOwnRuntimes,
TestDeleteMember_CancelsTasksFromAgentReassignment,
TestDeleteMember_NoRuntimes_DeletesMember) reproduce on plain
`origin/main` with the same `relation "channel_user_binding" does
not exist` SQL error — they are a missing-migration bug from the
recent channels foundation PR (ce28d0aa0), not anything this PR
touched.

Rollout plan:

1. Merge this PR. Production daemons keep emitting the legacy brief
   (flag default false).
2. Add a YAML rule to staging's
   `MULTICA_FEATURE_FLAGS_FILE`:

       runtime_brief_slim:
         default: true

   Staging daemons start emitting the slim brief on next restart.
3. Watch `agent prompt prepared` logs + agent behaviour for 7 days.
4. If staging is clean, flip the prod YAML to `default: true`.
   Legacy code path stays in the binary as a kill-switch
   (`FF_RUNTIME_BRIEF_SLIM=false` to revert without a deploy).
5. After ~30 days clean in prod, follow up with a PR that deletes
   the legacy body and the flag — same pattern as docs/feature-flags.md
   recommends ("plan the death of the flag at birth").

Co-authored-by: Eve <eve@multica-ai.local>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-24 14:23:17 +08:00

246 lines
14 KiB
Go

package execenv
import "fmt"
// BuildNewCommentsHint returns the comment-reading pointer for the WARM path —
// the agent ran on this issue before, so there is a since-anchor. The server
// count is ISSUE-WIDE (every thread, not just the triggering one) and excludes
// the triggering comment itself because that body is already injected into the
// prompt. It ships only the COUNT and the cursor — never the comment bodies —
// so the server stays cheap and the agent pulls details on demand.
//
// The agent is told the full issue-wide volume but steered to read the
// triggering (parent) thread FIRST instead of blindly catching up on every
// thread. The issue-wide `--since` catch-up is kept as an explicit
// "only if you need it" fallback.
//
// Both the per-turn prompt (daemon.buildCommentPrompt) and the CLAUDE.md
// workflow (InjectRuntimeConfig) call this so the two surfaces cannot drift
// (hard requirement from PR #2816).
//
// Renders nothing on cold start (no prior run → newCommentsSince empty) or when
// there are no new comments (newCommentCount <= 0) or issueID is empty. In those
// cases the caller falls back to BuildResumedCommentsHint (when a prior session
// is active) or BuildColdCommentsHint.
func BuildNewCommentsHint(issueID, triggerCommentID, triggerThreadID, newCommentsSince string, newCommentCount int) string {
if newCommentCount <= 0 || newCommentsSince == "" || issueID == "" {
return ""
}
threadID := activeThreadID(triggerThreadID, triggerCommentID)
// When we know the triggering thread, steer the agent to read THAT thread
// first rather than blindly pulling every new comment issue-wide. The
// issue-wide --since catch-up is demoted to an only-if-needed fallback.
if threadID != "" {
return fmt.Sprintf(
"%d new comment(s) on this issue since your last run — don't read them all blindly. "+
"Start with the thread your triggering comment is in: "+
"`multica issue comment list %s --thread %s --since %s --output json` "+
"(swap `--since` for `--tail 30` if you need the full thread, not just the delta). "+
"Only if you need context from the other threads, catch up issue-wide: "+
"`multica issue comment list %s --since %s --output json`.\n\n",
newCommentCount, issueID, threadID, newCommentsSince, issueID, newCommentsSince,
)
}
// Defensive: comment triggers always carry a trigger id, but if one is
// missing there is no thread to anchor on, so fall back to the plain
// issue-wide catch-up.
return fmt.Sprintf(
"%d new comment(s) on this issue since your last run. Catch up: "+
"`multica issue comment list %s --since %s --output json`.\n\n",
newCommentCount, issueID, newCommentsSince,
)
}
// BuildResumedCommentsHint returns the comment-reading pointer for the WARM
// no-delta path: the daemon is resuming a prior provider session and the
// triggering comment body has already been injected into the per-turn prompt.
// newCommentCount == 0 here means no new comments arrived issue-wide since the
// last run (beyond the injected trigger and the agent's own replies). Keep the
// read bounded and conditional, but make it explicit that context-dependent
// replies should refresh the triggering conversation rather than trusting
// resumed memory alone.
func BuildResumedCommentsHint(issueID, triggerCommentID, triggerThreadID string) string {
threadID := activeThreadID(triggerThreadID, triggerCommentID)
if issueID == "" || threadID == "" {
return ""
}
return fmt.Sprintf(
"You're resuming the prior session, and the triggering comment is already included above. "+
"No other new comments on this issue since your last run. "+
"Use the active thread anchor `%s` and triggering comment ID `%s`. "+
"If your reply depends on thread context, do not rely only on resumed session memory — "+
"first pull the triggering conversation with: "+
"`multica issue comment list %s --thread %s --tail 30 --output json`.\n\n",
threadID, triggerCommentID, issueID, threadID,
)
}
// BuildColdCommentsHint returns the comment-reading pointer for the COLD path —
// the agent has no prior run on this issue, so there is no since-anchor and
// BuildNewCommentsHint renders nothing. Instead of dumping the whole flat
// timeline (oldest-first, server cap 2000), point the agent at the triggering
// CONVERSATION: `--thread <trigger> --tail 30` returns that thread's root plus
// its 30 newest replies (root is always included, even at --tail 0) — the
// context the triggering comment actually needs. A `--recent 10` pointer is kept
// for cross-thread background the agent can pull on judgment.
//
// Both surfaces call this so the cold fallback cannot drift between them (same
// single-source rule as BuildNewCommentsHint, PR #2816). Returns "" when there
// is no triggering comment to thread from, so the caller can keep a final plain
// fallback.
func BuildColdCommentsHint(issueID, triggerCommentID, triggerThreadID string) string {
threadID := activeThreadID(triggerThreadID, triggerCommentID)
if issueID == "" || threadID == "" {
return ""
}
return fmt.Sprintf(
"Read the triggering conversation first: "+
"`multica issue comment list %s --thread %s --tail 30 --output json` "+
"(that thread's root + its 30 newest replies). "+
"Need cross-thread background? `multica issue comment list %s --recent 10 --output json` "+
"(resolved threads come back folded — `--full` to expand).\n\n",
issueID, threadID, issueID,
)
}
func activeThreadID(triggerThreadID, triggerCommentID string) string {
if triggerThreadID != "" {
return triggerThreadID
}
return triggerCommentID
}
// BuildCommentReplyInstructions returns the canonical block telling an agent
// how to post its reply for a comment-triggered task. Both the per-turn
// prompt (daemon.buildCommentPrompt) and the CLAUDE.md workflow
// (InjectRuntimeConfig) call this so the trigger comment ID and the
// --parent value cannot drift between surfaces.
//
// The explicit "do not reuse --parent from previous turns" wording exists
// because resumed Claude sessions keep prior turns' tool calls in context
// and will otherwise copy the old --parent UUID forward.
//
// The template is platform-agnostic AND provider-agnostic — the failure it
// guards against lives at the shell layer, so it cannot be scoped to one
// provider or one OS:
//
// - Inline `--content "..."` lets the shell rewrite the body BEFORE the CLI
// receives it: a backtick-wrapped token becomes a failed command
// substitution that is silently deleted, the stored comment no longer
// matches what the model intended, and a model that notices the mismatch
// can retry forever (MUL-2904 / OKK-497). It also lets Codex emit literal
// `\n` escapes inside `--content` (MUL-1467).
// - `--content-stdin` with a HEREDOC has TWO failure modes the model cannot
// see:
// 1. On Windows, PowerShell 5.1's `$OutputEncoding` defaults to
// ASCIIEncoding when piping to native commands and drops non-ASCII as
// `?` before the bytes reach `multica.exe` (#2198 Chinese, #2236
// Chinese, #2376 Cyrillic).
// 2. On any host, when the model emits a multi-flag command (e.g.
// `multica issue create --title ... --assignee-id ... --project ...`)
// the bash heredoc/flag boundary is fragile: a `BODY \` "terminator
// with trailing token" is not recognised as the heredoc end, so flag
// lines after it are swallowed into the description; or a clean
// terminator turns the trailing `--assignee ...` line into a separate
// shell statement that fails while the create already succeeded with
// no assignee. Both paths exit 0 with silently dropped flags. Github
// issue #4182 documents two confirmed cases (OXY-78, OXY-76).
//
// The single safe path is therefore: write the body to a UTF-8 file with
// the file-write tool, post with `--content-file`, then remove the file.
// All flags live on one shell-token line; the body never touches the shell;
// no heredoc boundary exists for flags to leak across. This converges with
// the long-standing Windows path so the cross-platform template is one shape.
//
// provider is retained for caller symmetry and future per-provider tweaks; the
// guardrail itself is intentionally identical across providers and hosts.
func BuildCommentReplyInstructions(provider, issueID, triggerCommentID string) string {
if triggerCommentID == "" {
return ""
}
if useSlimBrief() {
return buildCommentReplyInstructionsSlim(provider, issueID, triggerCommentID)
}
if runtimeGOOS == "windows" {
return fmt.Sprintf(
"If you decide to reply, post it as a comment — always use the trigger comment ID below, "+
"do NOT reuse --parent values from previous turns in this session.\n\n"+
"On Windows, write the reply body to a UTF-8 file with your file-write tool, then post it with `--content-file`. "+
"Do NOT pipe via `--content-stdin` — Windows PowerShell 5.1's `$OutputEncoding` defaults to ASCIIEncoding when piping to native commands and silently drops non-ASCII (Chinese, Japanese, Cyrillic, accents, emoji) as `?` before the bytes reach `multica.exe`. "+
"Do NOT use inline `--content`; it is easy to lose formatting or accidentally compress a structured reply into one line.\n\n"+
"Use this form, preserving the same issue ID and --parent value:\n\n"+
" # 1. Write the reply body to a UTF-8 file (e.g. reply.md) with your file-write tool.\n"+
" # 2. Post the comment:\n"+
" multica issue comment add %s --parent %s --content-file ./reply.md\n"+
" # 3. Remove the temp file so a later run does not pick up stale content:\n"+
" Remove-Item ./reply.md\n\n"+
"Do NOT write literal `\\n` escapes to simulate line breaks; the file preserves real newlines.\n",
issueID, triggerCommentID,
)
}
// Linux/macOS, any provider: `--content-file`. Switched from `--content-stdin` +
// HEREDOC to converge with the Windows path and close GitHub #4182. The
// HEREDOC pattern was safe for the trivial single-flag case, but as soon as
// the model wrapped extra flags around the heredoc (assignee, project on
// `issue create` / `issue update`) it became fragile to flag/heredoc
// boundary mistakes — flags either got swallowed into the body or executed
// as separate failing shell statements while the create succeeded with
// nulls. The file path eliminates that class of error: all flags live on
// one command line, the body never reaches the shell.
return fmt.Sprintf(
"If you decide to reply, post it as a comment — always use the trigger comment ID below, "+
"do NOT reuse --parent values from previous turns in this session.\n\n"+
"Write the reply body to a UTF-8 file with your file-write tool first, then post it with `--content-file`. "+
"Do NOT use inline `--content`; the shell rewrites unescaped backticks, `$()`, `$VAR`, or quotes in the body before the CLI receives them. "+
"Do NOT use `--content-stdin` with a HEREDOC either — when extra flags (e.g. `--assignee`, `--project` on `multica issue create`) accompany the command, the bash heredoc/flag boundary is fragile and flags can be silently swallowed into the stdin stream while the command still exits 0 (see GitHub #4182, OXY-78 / OXY-76). "+
"It is also easy to lose formatting or compress a structured reply into one line with inline forms.\n\n"+
"Use this form, preserving the same issue ID and --parent value:\n\n"+
" # 1. Write the reply body to a UTF-8 file (e.g. reply.md) with your file-write tool.\n"+
" # 2. Post the comment:\n"+
" multica issue comment add %s --parent %s --content-file ./reply.md\n"+
" # 3. Remove the temp file so a later run does not pick up stale content:\n"+
" rm ./reply.md\n\n"+
"Do NOT write literal `\\n` escapes to simulate line breaks; the file preserves real newlines.\n",
issueID, triggerCommentID,
)
}
// buildCommentReplyInstructionsSlim is the post-MUL-3560 compressed
// reply-instructions block. Selected by BuildCommentReplyInstructions when
// the `runtime_brief_slim` feature flag is on; the legacy verbose form
// above stays the default in production.
//
// The slim block carries only the trigger-specific cookbook (the exact
// `--parent` UUID, the file path, the cleanup line) plus the two
// behavioural rules tests pin ("do NOT reuse --parent" and "do not rely
// on `\n` escapes"). The detailed shell-hazard rationale lives in the
// canonical `## Comment Formatting` section the same brief carries, so
// repeating it inline at every comment-triggered step 7 would be
// duplication, not signal.
func buildCommentReplyInstructionsSlim(provider, issueID, triggerCommentID string) string {
if runtimeGOOS == "windows" {
return fmt.Sprintf(
"If you decide to reply, post it as a comment — always use the trigger comment ID below, "+
"do NOT reuse --parent values from previous turns in this session.\n\n"+
"On Windows, write the reply body to a UTF-8 file with your file-write tool first, then post with `--content-file`. "+
"Do NOT pipe via `--content-stdin` — PowerShell 5.1's `$OutputEncoding` defaults to ASCIIEncoding when piping to native commands and silently drops non-ASCII (Chinese, Japanese, Cyrillic, accents, emoji) as `?` before bytes reach `multica.exe`. "+
"See ## Comment Formatting above for the full rule:\n\n"+
" multica issue comment add %s --parent %s --content-file ./reply.md\n"+
" Remove-Item ./reply.md\n\n"+
"Do NOT write literal `\\n` escapes to simulate line breaks; the file preserves real newlines.\n",
issueID, triggerCommentID,
)
}
return fmt.Sprintf(
"If you decide to reply, post it as a comment — always use the trigger comment ID below, "+
"do NOT reuse --parent values from previous turns in this session.\n\n"+
"Write the reply body to a UTF-8 file with your file-write tool first, then post it with `--content-file` "+
"(see ## Comment Formatting above for why inline `--content` and `--content-stdin` HEREDOCs are unsafe — MUL-2904 / #4182):\n\n"+
" multica issue comment add %s --parent %s --content-file ./reply.md\n"+
" rm ./reply.md\n\n"+
"Do NOT write literal `\\n` escapes to simulate line breaks; the file preserves real newlines.\n",
issueID, triggerCommentID,
)
}