fix(issues): wake parent squad leader on same-squad/shared-leader child-done (MUL-3969) (#4843)

* fix(issues): wake parent squad leader on same-squad/shared-leader child-done (MUL-3969) The child-done stage-barrier wrote the 'Stage N complete / wrap up the parent' system comment on the parent but suppressed the parent squad leader's wake whenever the finished child was owned by the same squad (childAssigneeIsSquad) or a squad sharing the leader (effectiveChildAgentOwner). That stranded the common 'a squad decomposes its parent into sub-issues it works itself' pattern: the parent silently stalled in in_progress because the leader was never woken to advance the next stage or wrap up. The prior guards assumed the leader had already observed the work via its own coordination cycle on the child, but that wake lands on the CHILD and never carries the parent-level stage-barrier instruction. Remove both self-trigger guards so the squad path mirrors the agent path (MUL-2808): always dispatch, bounded only by HasPendingTaskForIssueAndAgent. The private-leader access gate is unchanged; member/unassigned parents still never wake. Drop the now-dead effectiveChildAgentOwner / childAssigneeIsSquad helpers and the unused child param. Fixes #4838 Co-authored-by: multica-agent <github@multica.ai> * test(issues): fix stale child-done squad-guard comment (MUL-3969) The TestChildDoneTriggersParentAgentWhenChildSquadSharesLeader comment still claimed the both-sides-squads-sharing-a-leader case was guarded on the squad path and referenced the pre-rename test. That guard was removed in MUL-3969; point at the renamed test and state the current behavior. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: J <j@multica.ai> Co-authored-by: multica-agent <github@multica.ai>
2026-07-05 13:29:44 +02:00 · 2026-07-02 15:48:45 +08:00
parent 0c4c3ff038
commit c328fdbcd0
3 changed files with 89 additions and 98 deletions
--- a/server/internal/handler/issue_child_done.go
+++ b/server/internal/handler/issue_child_done.go
@@ -60,7 +60,7 @@ import (
 // listeners still skip system comments wholesale, so smuggled mentions from
 // the child title cannot light up unrelated members. The parent assignee's
 // own trigger is fired explicitly by dispatchParentAssigneeTrigger below,
-// with the loop and idempotency guards documented there.
+// with the idempotency guard documented there.
 //
 // Errors are logged at warn level and swallowed: this is a best-effort
 // notification on the side of a successful status update; failing it must
@@ -198,7 +198,7 @@ func (h *Handler) notifyParentOfChildDone(ctx context.Context, prev, issue db.Is
 	// author_type='system'); this keeps smuggled mentions from the child
 	// title inert and gives the platform a single place to apply the loop
 	// and idempotency guards.
-	h.dispatchParentAssigneeTrigger(ctx, parent, issue, comment, actorType, actorID)
+	h.dispatchParentAssigneeTrigger(ctx, parent, comment, actorType, actorID)
 }

 // isTerminalChildStatus reports whether a child issue status counts as
@@ -380,7 +380,7 @@ func sanitizeMentionLabel(name string) string {
 // path; notifyParentOfChildDone skips them outright. The generic comment
 // listener is intentionally bypassed (it short-circuits on
 // author_type='system'), so this is the single place where the platform
-// applies loop and idempotency guards for the child-done notification.
+// applies the idempotency guard for the child-done notification.
 //
 // Side-effect semantics (intentionally narrower than a normal @mention):
 //   - agent parent: one EnqueueTaskForMention on the parent assignee, same
@@ -401,24 +401,27 @@ func sanitizeMentionLabel(name string) string {
 //
 // Guards applied here:
 //   - No-op when the parent has no assignee row.
-//   - Squad loop guard (squad parent only): skip when the finished child is
-//     the same squad, or its effective owner is the parent squad's leader. A
-//     squad leader already observes same-squad work through its own
-//     coordination cycle — the worker's completion comment wakes the leader
-//     via computeAssignedSquadLeaderCommentTrigger — so the child-done trigger would
-//     be redundant; this also closes the cross-squad shared-leader loop. The
-//     AGENT parent path intentionally has NO such guard (MUL-2808): a lone
-//     agent that decomposes its parent into sub-issues it owns itself has no
-//     other wake path, and waking the parent agent when its child finishes is
-//     a serial sub-task handoff across two DIFFERENT issues — explicitly not a
-//     self-loop per isAgentRunningOnIssue, and consistent with the @mention
-//     self-trigger path (computeMentionedAgentCommentTriggers). Runaway re-triggering is
-//     bounded by the idempotency guard below, not by suppressing the trigger.
+//   - NO self-trigger guard on either the agent OR the squad path. Waking the
+//     parent assignee when one of its children finishes is a serial sub-task
+//     handoff across two DIFFERENT issues, not a self-loop — legitimate per
+//     isAgentRunningOnIssue and the @mention self-trigger path
+//     (computeMentionedAgentCommentTriggers). The squad path used to skip a
+//     same-squad or shared-leader child on the theory that the leader had
+//     already observed the work through its own coordination cycle on the
+//     child. That stranded the common pattern where a squad decomposes its
+//     parent into sub-issues assigned to its own squad: the stage-barrier
+//     system comment lands on the PARENT carrying the "advance the next stage /
+//     wrap up" instruction, which a child-side wake never delivers — so the
+//     parent silently stalled in in_progress (MUL-3969). The squad path now
+//     mirrors the agent path (MUL-2808): always dispatch, bounded only by
+//     idempotency.
 //   - Idempotency: HasPendingTaskForIssueAndAgent dedupes rapid-fire enqueues
-//     for the same parent (e.g. two children finishing back-to-back).
+//     for the same parent (e.g. two children finishing back-to-back). It also
+//     bounds any re-trigger, since a leader waking on the parent does not by
+//     itself push a child back into a terminal transition.
 //   - Readiness: archived agents / missing runtimes are silently skipped
 //     so a closed-out agent does not surface as a phantom assignee.
-func (h *Handler) dispatchParentAssigneeTrigger(ctx context.Context, parent, child db.Issue, systemComment db.Comment, actorType, actorID string) {
+func (h *Handler) dispatchParentAssigneeTrigger(ctx context.Context, parent db.Issue, systemComment db.Comment, actorType, actorID string) {
 	if !parent.AssigneeType.Valid || !parent.AssigneeID.Valid {
 		return
 	}
@@ -427,7 +430,7 @@ func (h *Handler) dispatchParentAssigneeTrigger(ctx context.Context, parent, chi
 	case "agent":
 		h.triggerChildDoneAgent(ctx, parent, systemComment.ID)
 	case "squad":
-		h.triggerChildDoneSquad(ctx, parent, child, systemComment.ID, actorType, actorID)
+		h.triggerChildDoneSquad(ctx, parent, systemComment.ID, actorType, actorID)
 	}
 }

@@ -470,13 +473,18 @@ func (h *Handler) triggerChildDoneAgent(ctx context.Context, parent db.Issue, tr
 }

 // triggerChildDoneSquad enqueues a leader-role task for the parent's squad
-// assignee, applying the self-trigger guard against:
-//   - same squad on both sides (the leader already observed the child via
-//     its own coordination cycle), and
-//   - same effective leader on both sides — child agent == leader, or
-//     child squad's leader == this squad's leader (the cross-squad shared
-//     leader loop).
-func (h *Handler) triggerChildDoneSquad(ctx context.Context, parent, child db.Issue, triggerCommentID pgtype.UUID, actorType, actorID string) {
+// assignee. Like the agent path (see triggerChildDoneAgent) it applies NO
+// self-trigger guard: even when the finished child is owned by the same squad
+// or by another squad sharing this leader, the leader must still be woken on
+// the PARENT to advance the next stage or wrap up. The prior same-squad /
+// shared-leader guards assumed the leader had already observed the child via
+// its own coordination cycle, but that wake lands on the CHILD and never
+// carries the parent-level stage-barrier instruction, so it stranded the
+// common "squad decomposes its parent into sub-issues assigned to its own
+// squad" pattern (MUL-3969). Re-triggering is bounded by the
+// HasPendingTaskForIssueAndAgent idempotency check below, exactly as the
+// agent path relies on it.
+func (h *Handler) triggerChildDoneSquad(ctx context.Context, parent db.Issue, triggerCommentID pgtype.UUID, actorType, actorID string) {
 	squad, err := h.Queries.GetSquadInWorkspace(ctx, db.GetSquadInWorkspaceParams{
 		ID:          parent.AssigneeID,
 		WorkspaceID: parent.WorkspaceID,
@@ -490,19 +498,6 @@ func (h *Handler) triggerChildDoneSquad(ctx context.Context, parent, child db.Is
 		return
 	}

-	// Same-squad child → the leader has already observed the work via its
-	// own coordination cycle on the child; firing again on the parent would
-	// just re-trigger the same leader run with no new signal.
-	if childAssigneeIsSquad(child, parent.AssigneeID) {
-		return
-	}
-	// Shared-leader loop: child driven directly by the parent squad's leader,
-	// or by another squad whose leader is the same agent.
-	if owner := h.effectiveChildAgentOwner(ctx, child); owner.Valid &&
-		uuidToString(owner) == uuidToString(squad.LeaderID) {
-		return
-	}
-
 	agent, err := h.Queries.GetAgent(ctx, squad.LeaderID)
 	if err != nil || !agent.RuntimeID.Valid || agent.ArchivedAt.Valid {
 		return
@@ -524,44 +519,3 @@ func (h *Handler) triggerChildDoneSquad(ctx context.Context, parent, child db.Is
 			"leader_id", uuidToString(squad.LeaderID))
 	}
 }
-
-// effectiveChildAgentOwner returns the agent identity that effectively
-// "owns" the child issue from the perspective of the child-done trigger:
-//
-//   - child agent assignee → that agent
-//   - child squad assignee → that squad's leader (the agent that would
-//     actually act on a leader task and is the entry point for any squad
-//     work; a shared leader across two squads is the loop vector the
-//     callers above defend against)
-//   - anything else (member assignee, no assignee, missing squad row) →
-//     invalid UUID, signalling "no shared owner to compare against"
-//
-// Callers compare this against the agent they are about to trigger; equality
-// means we'd be enqueueing the same agent that just finished the child,
-// which is the loop case.
-func (h *Handler) effectiveChildAgentOwner(ctx context.Context, child db.Issue) pgtype.UUID {
-	if !child.AssigneeType.Valid || !child.AssigneeID.Valid {
-		return pgtype.UUID{}
-	}
-	switch child.AssigneeType.String {
-	case "agent":
-		return child.AssigneeID
-	case "squad":
-		squad, err := h.Queries.GetSquadInWorkspace(ctx, db.GetSquadInWorkspaceParams{
-			ID:          child.AssigneeID,
-			WorkspaceID: child.WorkspaceID,
-		})
-		if err != nil {
-			return pgtype.UUID{}
-		}
-		return squad.LeaderID
-	}
-	return pgtype.UUID{}
-}
-
-func childAssigneeIsSquad(child db.Issue, squadID pgtype.UUID) bool {
-	if !child.AssigneeType.Valid || child.AssigneeType.String != "squad" || !child.AssigneeID.Valid {
-		return false
-	}
-	return uuidToString(child.AssigneeID) == uuidToString(squadID)
-}
--- a/server/internal/handler/issue_child_done_test.go
+++ b/server/internal/handler/issue_child_done_test.go
@@ -450,9 +450,10 @@ func TestChildDoneTriggersParentAgentWhenSameAgentOwnsChild(t *testing.T) {
 // whose leader is also agent A. Because the parent is an AGENT, dispatch
 // routes through the agent path, which (post-MUL-2808) has no self-trigger
 // guard: A coordinates the parent and must be woken to advance it when the
-// child completes, regardless of who executed the child. The genuinely
-// loop-prone case — BOTH sides squads sharing a leader — is still guarded on
-// the squad path (see TestChildDoneSelfTriggerGuard_SquadParentDifferentSquadSameLeader).
+// child completes, regardless of who executed the child. The squad path now
+// behaves identically: MUL-3969 removed its old same-squad / shared-leader
+// guards, so BOTH sides being squads that share a leader also wakes the leader
+// (see TestChildDoneWakesLeaderWhenParentAndChildSquadsShareLeader).
 func TestChildDoneTriggersParentAgentWhenChildSquadSharesLeader(t *testing.T) {
 	fx := newChildDoneFixture(t, "in_progress")
 	sq := newSquadCommentTriggerFixture(t)
@@ -477,13 +478,16 @@ func TestChildDoneTriggersParentAgentWhenChildSquadSharesLeader(t *testing.T) {
 	}
 }

-// TestChildDoneSelfTriggerGuard_SquadParentDifferentSquadSameLeader — the
-// cross-squad shared-leader loop. Parent is squad A, child is squad B,
-// both squads have the same leader agent. The previous guard only blocked
-// `parent.squad == child.squad`, so two distinct squads sharing a leader
-// would still wake the same agent. effectiveChildAgentOwner reduces both
-// sides to "leader agent" and blocks the enqueue.
-func TestChildDoneSelfTriggerGuard_SquadParentDifferentSquadSameLeader(t *testing.T) {
+// TestChildDoneWakesLeaderWhenParentAndChildSquadsShareLeader — cross-squad
+// shared-leader case. Parent is squad A, child is squad B, both squads have
+// the same leader agent. The squad path used to suppress the leader wake here
+// (effectiveChildAgentOwner reduced both sides to the shared leader), but that
+// guard was removed in MUL-3969: waking the leader on the PARENT is a serial
+// sub-task handoff across two DIFFERENT issues, not a self-loop, and it is the
+// only signal that carries the parent-level stage-barrier instruction. The
+// leader must now be woken exactly once; runaway re-triggering is bounded by
+// the HasPendingTaskForIssueAndAgent idempotency check.
+func TestChildDoneWakesLeaderWhenParentAndChildSquadsShareLeader(t *testing.T) {
 	fx := newChildDoneFixture(t, "in_progress")
 	parentSquad := newSquadCommentTriggerFixture(t)

@@ -516,7 +520,37 @@ func TestChildDoneSelfTriggerGuard_SquadParentDifferentSquadSameLeader(t *testin
 	if !strings.Contains(content, "mention://squad/"+parentSquad.SquadID) {
 		t.Errorf("expected parent-squad mention in system comment, got: %s", content)
 	}
-	if got := countPendingTasksForAgent(t, fx.parent.ID, parentSquad.LeaderID); got != 0 {
-		t.Errorf("expected 0 pending leader tasks on parent (cross-squad shared-leader guard), got %d", got)
+	if got := countPendingTasksForAgent(t, fx.parent.ID, parentSquad.LeaderID); got != 1 {
+		t.Errorf("expected 1 pending leader task on parent (shared-leader guard removed, MUL-3969), got %d", got)
+	}
+}
+
+// TestChildDoneWakesLeaderWhenChildIsSameSquad — the MUL-3969 repro. Parent
+// and the just-finished child are BOTH assigned to the same squad (the common
+// "a squad decomposes its parent into sub-issues it works itself" pattern).
+// The old same-squad guard suppressed the leader wake, so the stage-barrier
+// system comment landed on the parent but the "wrap up / advance" instruction
+// was never delivered to the leader and the parent silently stalled in
+// in_progress. The leader must now be woken exactly once.
+func TestChildDoneWakesLeaderWhenChildIsSameSquad(t *testing.T) {
+	fx := newChildDoneFixture(t, "in_progress")
+	sq := newSquadCommentTriggerFixture(t)
+
+	setIssueAssigneeDirect(t, fx.parent.ID, "squad", sq.SquadID)
+	setIssueAssigneeDirect(t, fx.child.ID, "squad", sq.SquadID)
+	t.Cleanup(func() {
+		testPool.Exec(context.Background(),
+			`DELETE FROM agent_task_queue WHERE issue_id IN ($1, $2)`,
+			fx.parent.ID, fx.child.ID)
+	})
+
+	updateChildStatus(t, fx.child.ID, "done")
+
+	content := parentSystemCommentContent(t, fx.parent.ID)
+	if !strings.Contains(content, "mention://squad/"+sq.SquadID) {
+		t.Errorf("expected parent-squad mention in system comment, got: %s", content)
+	}
+	if got := countPendingTasksForAgent(t, fx.parent.ID, sq.LeaderID); got != 1 {
+		t.Errorf("expected 1 pending leader task for same-squad child (MUL-3969), got %d", got)
 	}
 }
--- a/server/internal/service/builtin_skills/multica-squads/references/squad-source-map.md
+++ b/server/internal/service/builtin_skills/multica-squads/references/squad-source-map.md
@@ -179,13 +179,16 @@ server/internal/handler/issue_child_done.go       # dispatchParentAssigneeTrigge

 Contracts:

- when child issue completes and parent is assigned to squad, parent squad
-  leader can be triggered (triggerChildDoneSquad at issue_child_done.go:304);
+- when a child issue closes a stage barrier and the parent is assigned to a
+  squad, the parent squad leader is triggered (triggerChildDoneSquad in
+  issue_child_done.go);
 - routing is leader-only — one `EnqueueTaskForSquadLeader` on the leader, no
-  member fan-out (issue_child_done.go:214-216, 344);
- loop guards skip same squad, same effective leader, and shared-leader
-  cross-squad cases (issue_child_done.go:229-235, effectiveChildAgentOwner ~367,
-  childAssigneeIsSquad ~387).
+  member fan-out (triggerChildDoneSquad / dispatchParentAssigneeTrigger);
+- no self-trigger guard: a same-squad or shared-leader child still wakes the
+  parent squad leader — the wake is a serial handoff onto the PARENT and is the
+  only carrier of the stage-barrier "advance / wrap up" instruction (MUL-3969,
+  mirrors the agent path from MUL-2808). Re-triggering is bounded only by
+  `HasPendingTaskForIssueAndAgent` (idempotent per parent issue + agent).

 ## Private Leader Access