multica/server/internal/handler/agent_test.go
Naiyuan Qing 21e3cfaa01 Agent runtime status redesign: split presence into availability + last-task (#1794)
* feat(agent-status): add workspace live-tasks endpoint and TaskFailureReason type

Lays the API + type contract for the front-end agent presence cache:

- New `GET /api/active-tasks` returns active (queued/dispatched/running)
  tasks plus failed tasks within the last 2 minutes for the current
  workspace. The 2-minute window powers a UI-side auto-clearing "Failed"
  agent state without back-end pollers.
- `agent_task_queue` has no workspace_id column, so the query JOINs agent;
  `SELECT atq.*` keeps `failure_reason` (migration 055) on the wire.
- Adds `TaskFailureReason` to `AgentTask` so the UI can map the 5 backend
  classifiers (agent_error / timeout / runtime_offline / runtime_recovery
  / manual) to copy without parsing free-text errors.
- New `api.getActiveTasksForWorkspace()` client method; workspace is
  resolved server-side from the X-Workspace-Slug header (no path param,
  matching /api/agents and /api/runtimes conventions).

Includes the joint engineering plan and designer brief that scope the
broader Agent / Runtime status redesign — Phase 0 is this contract plus
the front-end derivation layer landing in the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(agent-status): derive presence/health states with WS sync and desktop IPC bridge

Adds the front-end derivation layer that turns raw server data into the
user-facing 5-state agent / 4-state runtime enums. UI files are
deliberately untouched in this commit — derivation lives behind hooks
(useAgentPresence, useRuntimeHealth) that any component can call with
zero additional network traffic.

Architecture:
- Derivation is pure functions in packages/core/{agents,runtimes}; the
  back-end stays free of UI translation. Agents algorithm: runtime
  offline > recent failed (2-min window) > running > queued > available.
  Runtimes algorithm: status + last_seen_at -> online / recently_lost /
  offline / about_to_gc.
- A single workspace-wide active-tasks query backs all per-agent
  presence reads, eliminating N+1 across hover cards, list rows, and
  pickers. 30-second tick re-renders the hooks so the failed window
  expires even when no underlying data changes.
- WS task lifecycle events (dispatch / completed / failed / cancelled)
  invalidate active-tasks via the prefix dispatcher. completed/failed
  were removed from specificEvents so they go through both the prefix
  invalidate and the existing chat ws.on() handlers. Reconnect refetch
  picks up active-tasks too.
- Desktop bridges window.daemonAPI.onStatusChange directly into the
  runtimes cache via setQueryData, giving the local daemon sub-second
  feedback (vs. 75s server sweep). Bridge is wsId-bound so workspace
  switches automatically rebind the subscription; daemon_id matching
  covers the same-daemon-multiple-providers case.
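
The agent precedence chain above (runtime offline > recent failed > running > queued > available) can be sketched as a single pure function. This is a Go rendering for illustration; the actual derivation lives in the front-end packages, and the type, field, and constant names here are hypothetical.

```go
package main

// AgentPresence is a hypothetical Go rendering of the 5-state agent enum.
type AgentPresence string

const (
	PresenceOffline   AgentPresence = "offline"
	PresenceFailed    AgentPresence = "failed"
	PresenceRunning   AgentPresence = "running"
	PresenceQueued    AgentPresence = "queued"
	PresenceAvailable AgentPresence = "available"
)

// PresenceInput holds the already-aggregated facts the derivation needs.
type PresenceInput struct {
	RuntimeOnline bool
	RecentFailure bool // a failed task inside the 2-minute window
	Running       int  // number of running tasks
	Queued        int  // number of queued tasks
}

// DerivePresence applies the precedence order from the commit message:
// the first matching case wins, so an offline runtime masks everything else.
func DerivePresence(in PresenceInput) AgentPresence {
	switch {
	case !in.RuntimeOnline:
		return PresenceOffline
	case in.RecentFailure:
		return PresenceFailed
	case in.Running > 0:
		return PresenceRunning
	case in.Queued > 0:
		return PresenceQueued
	default:
		return PresenceAvailable
	}
}
```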

24 derivation unit tests cover all branches plus null/empty/boundary
inputs (FAILED_WINDOW_MS edges, null last_seen_at, missing
completed_at). Full core suite: 112 tests passing. Typecheck green
across all 8 workspace packages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(agent-status): redesign agent runtime status as two orthogonal dimensions

Splits the conflated 5-state agent presence into two independent axes:

- AgentAvailability (3-state): online / unstable / offline — drives the
  dot indicator everywhere a dot appears. Pure runtime reachability;
  never sticky-red because of a past task outcome.

- LastTaskState (5-state): running / completed / failed / cancelled /
  idle — surfaced as text + icon on focused surfaces (hover card,
  agent detail page, agents list, runtime detail). Never colours the dot.

Major changes:

* Domain layer: AgentPresence union → AgentAvailability + LastTaskState.
  derive-presence split into deriveAgentAvailability + deriveLastTaskState
  + deriveAgentPresenceDetail orchestrator. Tests reorganised into three
  groups (availability invariants, last-task invariants, composition).

* Visual config: presenceConfig (5 entries) → availabilityConfig (3) +
  taskStateConfig (5). availabilityOrder + lastTaskOrder for filter chips.

* Workspace-level presence prefetch: new useWorkspacePresencePrefetch
  hook + WorkspacePresencePrefetch mount component, wired into
  DashboardLayout (web) and WorkspaceRouteLayout (desktop). Hover cards
  render synchronously with no skeleton flash on first hover.

* ActorAvatar hover: flipped default — disableHoverCard removed,
  enableHoverCard added (default false). Opt-in at ~14 decision-moment
  surfaces; pickers / decoration sub-chips stay plain. Status dot
  decoupled (showStatusDot prop) so picker rows can show presence
  without nesting popovers.

* Hover cards: AgentProfileCard simplified — availability dot only,
  Detail link top-right (logs live on the detail page). New
  MemberProfileCard mirrors the structure: name + role + email +
  top-2 owned agents (sorted by 30d run count) with click-through to
  agent detail.

* Agents list: split Status into two columns — availability (3-color
  dot + label) and Last run (task icon + label, optional running
  counts). Two independent filter chip groups (Status + Last run);
  combination acts as intersection ("online + failed" finds broken-
  but-alive agents).

* Other UI surfaces (issue list/board/detail, comments, autopilots,
  projects, runtimes, mention autocomplete, subscribers picker)
  updated to the new dot semantics; status dot now strictly 3-color.
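
The intersection behaviour of the two filter chip groups in the agents list can be sketched as follows. This is an illustrative Go version with hypothetical names (`AgentRow`, `FilterAgents`); treating an empty selection as "no filter on that axis" is an assumption.

```go
package main

// AgentRow is a hypothetical list row carrying both derived dimensions.
type AgentRow struct {
	Name         string
	Availability string // "online" | "unstable" | "offline"
	LastTask     string // "running" | "completed" | "failed" | "cancelled" | "idle"
}

// FilterAgents keeps rows that match BOTH chip groups. An empty (or nil)
// selection on one axis is treated as "no filter" for that axis.
func FilterAgents(rows []AgentRow, availability, lastTask map[string]bool) []AgentRow {
	var out []AgentRow
	for _, r := range rows {
		if len(availability) > 0 && !availability[r.Availability] {
			continue
		}
		if len(lastTask) > 0 && !lastTask[r.LastTask] {
			continue
		}
		out = append(out, r)
	}
	return out
}
```

Selecting `online` in the first group and `failed` in the second returns only agents that are reachable but whose last run failed.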

Server changes accompany the client redesign — workspace-wide
agent-task-snapshot endpoint, runtime usage queries, etc. — to feed
the derive layer with the data it needs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent-detail): drop last-task chip from detail header + inspector

The Recent work section on the agent detail page already shows the same
data (with task titles, timestamps, error context) — surfacing
"Completed" / "Failed" / etc. up in the header was redundant chrome.
Detail surfaces now show only the 3-state availability dot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(tables): handle narrow viewports across agents / skills / runtimes

Three table layouts were squeezing content into adjacent cells at
intermediate widths. Each fix is small and targeted:

* runtime-list: the Runtime cell's base name had `shrink-0`, so it
  refused to truncate when its grid column was narrowed under width
  pressure — the name visually overflowed into the Health column
  ("ClaudeOnline" etc). Removed shrink-0, added truncate. The Health
  column was also a fixed 9.5rem reservation for the worst-case
  "Recently lost · 2m 14s ago" copy; switched to minmax(0,1fr) so it
  competes fairly with Runtime.

* skills-page: had a single grid template with no responsive
  breakpoints — all 6 columns were rendered at any width and got
  visually jammed below md. Added a <md template that drops Source +
  Updated; the row markup hides those cells via `hidden md:block` /
  `md:contents`.

* agent-list-item: the new Last run column was reserved at minmax(8rem,
  max-content); on narrow md viewports the 8rem floor pushed the row
  past available width. Changed to minmax(0,max-content) so the cell
  shrinks under pressure (its content already truncates).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent-card): hover-only Detail + add Runtime row + breathing room

Three small polish tweaks to the agent hover card:

- Detail link gets `mr-1` + fades in only on card hover (group-hover).
  It was visually flush against the popover edge and competing for
  attention; now it stays out of the way during a quick glance and
  surfaces only when the user is dwelling on the card.

- Runtime row is back, in the meta block (cloud/local icon + runtime
  name). The earlier removal was over-aggressive — knowing where an
  agent runs is part of "who is this agent". The wifi badge stays
  dropped because the availability dot in the header already conveys
  reachability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(runtime): wifi-style health icon (4-state) for runtime list + agent card

Replaces the 6px coloured dot with a wifi-shape icon that carries both
state (Wifi vs WifiOff) and severity (success/warning/muted/destructive).

Mapping:
- online        → Wifi (success)
- recently_lost → WifiHigh (warning) — transient hiccup, fewer bars
- offline       → WifiOff (muted)    — long unreachable
- about_to_gc   → WifiOff (destructive) — sweeper coming soon
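
The mapping above is small enough to express as a lookup table. The Go sketch below is illustrative; the real component is front-end code, and `HealthVisual` and `VisualFor` are hypothetical names.

```go
package main

// HealthVisual pairs an icon name with a severity token.
type HealthVisual struct {
	Icon     string
	Severity string
}

// healthVisuals encodes the 4-state mapping listed in the commit message.
var healthVisuals = map[string]HealthVisual{
	"online":        {Icon: "Wifi", Severity: "success"},
	"recently_lost": {Icon: "WifiHigh", Severity: "warning"},
	"offline":       {Icon: "WifiOff", Severity: "muted"},
	"about_to_gc":   {Icon: "WifiOff", Severity: "destructive"},
}

// VisualFor looks up the visual for a health state. The boolean is false
// for an unknown state, leaving the fallback choice to the caller.
func VisualFor(health string) (HealthVisual, bool) {
	v, ok := healthVisuals[health]
	return v, ok
}
```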

Used in two places:

- Runtime list: replaces HealthDot in the dedicated leading-icon column.
  Bumped the column from 0.5rem (dot-sized) to 0.875rem (icon-sized).

- Agent profile card RuntimeRow: derives runtime health from runtime +
  clock (matching the 4-state semantics) and renders HealthIcon next
  to the runtime name. Cloud runtimes always read as online. The
  duplicate signal with the header availability dot is intentional —
  it confirms WHICH runtime is the one currently in the dot's state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 19:21:13 +08:00


package handler

import (
	"context"
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"testing"
)

// TestListWorkspaceAgentTaskSnapshot covers the agent presence snapshot endpoint:
// every active task (queued/dispatched/running) PLUS each agent's most recent
// OUTCOME task (completed/failed only). Cancelled tasks are excluded by design
// from the outcome half — they're a procedural signal, not an outcome, and
// must NOT mask a prior failure.
//
// The fixtures cover every branch the SQL must classify:
//   - actives are always returned, no dedup
//   - outcomes are deduped to "latest per agent" by completed_at
//   - the OLD 2-minute window must be irrelevant (a 5-minute-old failure is
//     still returned if it's the latest outcome)
//   - cancelled rows are NEVER returned, even when they are temporally newer
//     than a failure — this is what keeps the failed signal sticky after the
//     user cancels their queued retry
func TestListWorkspaceAgentTaskSnapshot(t *testing.T) {
	if testHandler == nil {
		t.Skip("database not available")
	}

	ctx := context.Background()

	// Three agents so we can verify per-agent semantics independently.
	agentA := createHandlerTestAgent(t, "snapshot-agent-a", []byte(`{}`))
	agentB := createHandlerTestAgent(t, "snapshot-agent-b", []byte(`{}`))
	agentC := createHandlerTestAgent(t, "snapshot-agent-c", []byte(`{}`))

	type taskFixture struct {
		agentID     string
		status      string
		completedAt string // SQL expression; "" for NULL
		label       string
	}
	fixtures := []taskFixture{
		// Agent A — actives + a newer completed supersedes an older failed.
		{agentA, "queued", "", "A.queued"},
		{agentA, "dispatched", "", "A.dispatched"},
		{agentA, "running", "", "A.running"},
		{agentA, "failed", "now() - interval '10 minutes'", "A.old_failed"},
		{agentA, "completed", "now() - interval '30 seconds'", "A.latest_completed"},
		// Agent B — old failure with no later outcome stays visible (no
		// time window).
		{agentB, "failed", "now() - interval '5 minutes'", "B.stale_failed_kept"},
		// Agent C — failure followed by a NEWER cancelled. The cancelled
		// must be skipped by the SQL filter so the failure remains visible.
		// This is the scenario where a user fails, then cancels their
		// queued retry to debug.
		{agentC, "failed", "now() - interval '5 minutes'", "C.failure"},
		{agentC, "cancelled", "now() - interval '30 seconds'", "C.newer_cancelled_must_be_ignored"},
	}

	insertedIDs := make([]string, 0, len(fixtures))
	for _, f := range fixtures {
		var id string
		var query string
		if f.completedAt == "" {
			query = `INSERT INTO agent_task_queue (agent_id, runtime_id, status, priority)
				VALUES ($1, $2, $3, 0) RETURNING id`
		} else {
			query = `INSERT INTO agent_task_queue (agent_id, runtime_id, status, priority, completed_at)
				VALUES ($1, $2, $3, 0, ` + f.completedAt + `) RETURNING id`
		}
		if err := testPool.QueryRow(ctx, query, f.agentID, testRuntimeID, f.status).Scan(&id); err != nil {
			t.Fatalf("insert %s: %v", f.label, err)
		}
		insertedIDs = append(insertedIDs, id)
	}
	t.Cleanup(func() {
		for _, id := range insertedIDs {
			testPool.Exec(ctx, `DELETE FROM agent_task_queue WHERE id = $1`, id)
		}
	})

	w := httptest.NewRecorder()
	req := newRequest(http.MethodGet, "/api/agent-task-snapshot", nil)
	testHandler.ListWorkspaceAgentTaskSnapshot(w, req)
	if w.Code != http.StatusOK {
		t.Fatalf("ListWorkspaceAgentTaskSnapshot: expected 200, got %d: %s", w.Code, w.Body.String())
	}

	var tasks []AgentTaskResponse
	if err := json.NewDecoder(w.Body).Decode(&tasks); err != nil {
		t.Fatalf("decode response: %v", err)
	}

	// Per-agent breakdown so leftover tasks from other tests in this package
	// don't pollute the assertions.
	type key struct{ agent, status string }
	counts := map[key]int{}
	for _, task := range tasks {
		if task.AgentID != agentA && task.AgentID != agentB && task.AgentID != agentC {
			continue
		}
		counts[key{task.AgentID, task.Status}]++
	}

	wantCounts := map[key]int{
		// Agent A: 3 actives + the latest outcome (completed). The older
		// failed must be excluded by DISTINCT ON.
		{agentA, "queued"}:     1,
		{agentA, "dispatched"}: 1,
		{agentA, "running"}:    1,
		{agentA, "completed"}:  1,
		// Agent B: just the failed outcome.
		{agentB, "failed"}: 1,
		// Agent C: the failed outcome must survive the temporally newer
		// cancellation — that's the whole point of excluding cancelled
		// from the outcome half.
		{agentC, "failed"}: 1,
	}
	for k, expected := range wantCounts {
		if got := counts[k]; got != expected {
			t.Errorf("agent=%s status=%s: expected %d, got %d", k.agent, k.status, expected, got)
		}
	}

	// The OLD failed terminal on agent A must be excluded.
	if counts[key{agentA, "failed"}] != 0 {
		t.Errorf("agent A old failed must be superseded by newer completed; got %d", counts[key{agentA, "failed"}])
	}

	// No cancelled row may ever appear in the snapshot — they're filtered at
	// SQL level so the front-end's "cancel doesn't mask failure" rule lands
	// without any front-end logic.
	for _, agentID := range []string{agentA, agentB, agentC} {
		if counts[key{agentID, "cancelled"}] != 0 {
			t.Errorf("agent %s: cancelled rows must be excluded from snapshot; got %d",
				agentID, counts[key{agentID, "cancelled"}])
		}
	}
}

func TestCreateAgent_RejectsDuplicateName(t *testing.T) {
	if testHandler == nil {
		t.Skip("database not available")
	}

	// Clean up any agents created by this test.
	t.Cleanup(func() {
		testPool.Exec(context.Background(),
			`DELETE FROM agent WHERE workspace_id = $1 AND name = $2`,
			testWorkspaceID, "duplicate-name-test-agent",
		)
	})

	body := map[string]any{
		"name":                 "duplicate-name-test-agent",
		"description":          "first description",
		"runtime_id":           testRuntimeID,
		"visibility":           "private",
		"max_concurrent_tasks": 1,
	}

	// First call — creates the agent.
	w1 := httptest.NewRecorder()
	testHandler.CreateAgent(w1, newRequest(http.MethodPost, "/api/agents", body))
	if w1.Code != http.StatusCreated {
		t.Fatalf("first CreateAgent: expected 201, got %d: %s", w1.Code, w1.Body.String())
	}
	var resp1 map[string]any
	if err := json.NewDecoder(w1.Body).Decode(&resp1); err != nil {
		t.Fatalf("decode first response: %v", err)
	}
	agentID1, _ := resp1["id"].(string)
	if agentID1 == "" {
		t.Fatalf("first CreateAgent: no id in response: %v", resp1)
	}

	// Second call — same name must be rejected with 409 Conflict.
	// The unique constraint prevents silent duplicates; the UI shows a clear error.
	body["description"] = "updated description"
	w2 := httptest.NewRecorder()
	testHandler.CreateAgent(w2, newRequest(http.MethodPost, "/api/agents", body))
	if w2.Code != http.StatusConflict {
		t.Fatalf("second CreateAgent with duplicate name: expected 409, got %d: %s", w2.Code, w2.Body.String())
	}
}