Files
multica/server/internal/handler/runtime.go
Jiayuan Zhang fc8528d64d feat(autopilot): support assigning to a squad (MUL-2429) (#2888)
* feat(autopilot): support assigning autopilot to a squad (MUL-2429)

Path A (Squad-as-Leader) from the RFC: when an autopilot's assignee is a
squad, dispatch resolves to squad.leader_id and executes against the
leader's runtime — semantics match a human manually assigning the issue
to that squad, no fan-out.

Backend scope only; frontend picker change is a follow-up PR.

Changes:
- 096_autopilot_squad_assignee migration: drop agent FK on
  autopilot.assignee_id, add assignee_type column (default 'agent'),
  add autopilot_run.squad_id attribution column.
- service.AgentReadiness: single source of truth for archived /
  runtime-bound / runtime-online checks. Shared by autopilot
  admission gate, run_only dispatch, and isSquadLeaderReady.
- service.resolveAutopilotLeader: translates assignee_type/id to the
  agent that actually runs the work.
- dispatchCreateIssue: stamps issue with assignee_type='squad' for
  squad autopilots and enqueues via EnqueueTaskForSquadLeader.
- dispatchRunOnly: belt-and-braces readiness re-check after resolving
  squad → leader so a leader that went offline between admission and
  dispatch produces a clean failure instead of a doomed task.
- handler.CreateAutopilot / UpdateAutopilot: accept assignee_type with
  squad/agent existence + leader-archived validation. Backward-compatible
  default of "agent" preserves the contract for older clients.
- Analytics: AutopilotRunStarted/Completed/Failed events carry
  assignee_type and squad_id; PostHog can now group autopilot runs by
  squad without joining back to the autopilot row.

Co-authored-by: multica-agent <github@multica.ai>

* fix(autopilot): reject archived squads, route post-admission skips, cleanup dangling-agent autopilots (MUL-2429)

Addresses three review findings on PR #2888:

1. Archived squad handling: validateAutopilotAssignee now rejects squads
   with archived_at set; resolveAutopilotLeader returns errSquadArchived
   so the admission gate fails closed; DeleteSquad now mirrors the issue
   transfer for autopilot rows (TransferSquadAutopilotsToLeader) so
   surviving autopilots flip to assignee_type='agent' (leader) instead
   of dangling at the archived squad.

2. dispatchRunOnly post-admission readiness: introduces errDispatchSkipped
   sentinel, recognised by DispatchAutopilot via handleDispatchSkip so
   the run is recorded as `skipped` (not `failed`). Manual triggers no
   longer 500 when the leader's runtime goes offline between admission
   and task creation. New TestManualTriggerDoesNotErrorOnPostAdmissionSkip
   locks the behaviour in.

3. Dangling agent assignee after migration 096 dropped the FK:
   shouldSkipDispatch now distinguishes pgx.ErrNoRows / errSquadArchived
   (hard skip — retrying won't help) from transient DB errors
   (fail-open). DeleteAgentRuntime pauses autopilots that target agents
   about to be hard-deleted (ListArchivedAgentIDsByRuntime +
   PauseAutopilotsByAgentAssignees) so the breakage surfaces as a paused
   row in the UI instead of a quiet skip-burning loop.

Unit tests cover the sentinel unwrap contract and errSquadArchived
errors.Is behaviour. Integration test
TestAutopilotDispatchSkipsWhenRuntimeOffline re-verified against a fresh
DB with migration 096 applied.

Co-authored-by: multica-agent <github@multica.ai>

* fix(autopilot): bump last_run_at on post-admission skip (MUL-2429)

Match recordSkippedRun (pre-flight skip) and the success path so the
scheduler / "last seen" UI both reflect that this tick evaluated the
trigger, even when the post-admission readiness gate caught a late
regression.

Addresses Emacs review caveat #1 on PR #2888.

Co-authored-by: multica-agent <github@multica.ai>

* feat(autopilot): mixed agent/squad assignee picker in dialog (MUL-2429)

End-to-end UI for assigning an autopilot to a squad. Closes the PR #2888
backend gap: the squad-as-assignee feature was already wired in Go (Path A,
RFC §4) but the desktop dialog never offered the choice.

- core/types/autopilot: add `AutopilotAssigneeType`, surface
  `assignee_type` on `Autopilot` + Create/Update request payloads.
- views/autopilots/pickers/agent-picker: switch to a polymorphic
  AssigneeSelection (`{type, id}`); render agents and squads as two
  grouped sections with shared pinyin search.
- views/autopilots/autopilot-dialog: maintain `assigneeType` state, send
  it on create/update, render the trigger avatar / hover dot with
  `assignee.type`.
- views/autopilots/autopilots-page + autopilot-detail-page: render the
  assignee row using `autopilot.assignee_type` so squad-typed autopilots
  show the squad avatar + name, not a broken agent lookup.
- locales: add `agents_group` / `squads_group` / `select_assignee` keys
  (en + zh-Hans), keep legacy `select_agent` for callers that still
  reference it.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: Lambda <lambda@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-20 05:30:13 +02:00

734 lines
25 KiB
Go

package handler
import (
"context"
"encoding/json"
"log/slog"
"net/http"
"strconv"
"time"
"github.com/go-chi/chi/v5"
"github.com/jackc/pgx/v5/pgtype"
"github.com/multica-ai/multica/server/pkg/agent"
db "github.com/multica-ai/multica/server/pkg/db/generated"
"github.com/multica-ai/multica/server/pkg/protocol"
)
type AgentRuntimeResponse struct {
ID string `json:"id"`
WorkspaceID string `json:"workspace_id"`
DaemonID *string `json:"daemon_id"`
Name string `json:"name"`
RuntimeMode string `json:"runtime_mode"`
Provider string `json:"provider"`
LaunchHeader string `json:"launch_header"`
Status string `json:"status"`
DeviceInfo string `json:"device_info"`
Metadata any `json:"metadata"`
OwnerID *string `json:"owner_id"`
// Visibility is "private" (default — only the owner / workspace admins
// can bind agents) or "public" (any workspace member can). See migration
// 083 and canUseRuntimeForAgent.
Visibility string `json:"visibility"`
Timezone string `json:"timezone"`
LastSeenAt *string `json:"last_seen_at"`
CreatedAt string `json:"created_at"`
UpdatedAt string `json:"updated_at"`
}
func runtimeToResponse(rt db.AgentRuntime) AgentRuntimeResponse {
var metadata any
if rt.Metadata != nil {
json.Unmarshal(rt.Metadata, &metadata)
}
if metadata == nil {
metadata = map[string]any{}
}
return AgentRuntimeResponse{
ID: uuidToString(rt.ID),
WorkspaceID: uuidToString(rt.WorkspaceID),
DaemonID: textToPtr(rt.DaemonID),
Name: rt.Name,
RuntimeMode: rt.RuntimeMode,
Provider: rt.Provider,
LaunchHeader: agent.LaunchHeader(rt.Provider),
Status: rt.Status,
DeviceInfo: rt.DeviceInfo,
Metadata: metadata,
OwnerID: uuidToPtr(rt.OwnerID),
Visibility: rt.Visibility,
Timezone: rt.Timezone,
LastSeenAt: timestampToPtr(rt.LastSeenAt),
CreatedAt: timestampToString(rt.CreatedAt),
UpdatedAt: timestampToString(rt.UpdatedAt),
}
}
// ---------------------------------------------------------------------------
// Runtime Usage
// ---------------------------------------------------------------------------
type RuntimeUsageResponse struct {
RuntimeID string `json:"runtime_id"`
Date string `json:"date"`
Provider string `json:"provider"`
Model string `json:"model"`
InputTokens int64 `json:"input_tokens"`
OutputTokens int64 `json:"output_tokens"`
CacheReadTokens int64 `json:"cache_read_tokens"`
CacheWriteTokens int64 `json:"cache_write_tokens"`
}
// GetRuntimeUsage returns daily token usage for a runtime, aggregated from
// per-task usage records captured by the daemon. This is scoped to
// Daemon-executed tasks only (i.e. excludes users' local CLI usage of the
// same tool).
func (h *Handler) GetRuntimeUsage(w http.ResponseWriter, r *http.Request) {
runtimeID := chi.URLParam(r, "runtimeId")
runtimeUUID, ok := parseUUIDOrBadRequest(w, runtimeID, "runtime_id")
if !ok {
return
}
rt, err := h.Queries.GetAgentRuntime(r.Context(), runtimeUUID)
if err != nil {
writeError(w, http.StatusNotFound, "runtime not found")
return
}
if _, ok := h.requireWorkspaceMember(w, r, uuidToString(rt.WorkspaceID), "runtime not found"); !ok {
return
}
since := parseSinceParamInTZ(r, 90, rt.Timezone)
resp, err := h.listRuntimeUsage(r.Context(), rt.ID, rt.Timezone, since)
if err != nil {
writeError(w, http.StatusInternalServerError, "failed to list usage")
return
}
writeJSON(w, http.StatusOK, resp)
}
// listRuntimeUsage dispatches between the raw task_usage scan and the
// task_usage_daily rollup based on the UseDailyRollupForRuntimeUsage
// feature flag. Both code paths return rows in the same shape, so the
// handler doesn't care which one ran.
func (h *Handler) listRuntimeUsage(ctx context.Context, runtimeID pgtype.UUID, tz string, since pgtype.Timestamptz) ([]RuntimeUsageResponse, error) {
resolvedRuntimeID := uuidToString(runtimeID)
if h.cfg.UseDailyRollupForRuntimeUsage {
rows, err := h.Queries.ListRuntimeUsageDaily(ctx, db.ListRuntimeUsageDailyParams{
RuntimeID: runtimeID,
Since: since,
Tz: tz,
})
if err != nil {
return nil, err
}
resp := make([]RuntimeUsageResponse, len(rows))
for i, row := range rows {
resp[i] = RuntimeUsageResponse{
RuntimeID: resolvedRuntimeID,
Date: row.Date.Time.Format("2006-01-02"),
Provider: row.Provider,
Model: row.Model,
InputTokens: row.InputTokens,
OutputTokens: row.OutputTokens,
CacheReadTokens: row.CacheReadTokens,
CacheWriteTokens: row.CacheWriteTokens,
}
}
return resp, nil
}
rows, err := h.Queries.ListRuntimeUsage(ctx, db.ListRuntimeUsageParams{
RuntimeID: runtimeID,
Since: since,
Tz: tz,
})
if err != nil {
return nil, err
}
resp := make([]RuntimeUsageResponse, len(rows))
for i, row := range rows {
resp[i] = RuntimeUsageResponse{
RuntimeID: resolvedRuntimeID,
Date: row.Date.Time.Format("2006-01-02"),
Provider: row.Provider,
Model: row.Model,
InputTokens: row.InputTokens,
OutputTokens: row.OutputTokens,
CacheReadTokens: row.CacheReadTokens,
CacheWriteTokens: row.CacheWriteTokens,
}
}
return resp, nil
}
// GetRuntimeTaskActivity returns hourly task activity distribution for a runtime.
func (h *Handler) GetRuntimeTaskActivity(w http.ResponseWriter, r *http.Request) {
runtimeID := chi.URLParam(r, "runtimeId")
runtimeUUID, ok := parseUUIDOrBadRequest(w, runtimeID, "runtime_id")
if !ok {
return
}
rt, err := h.Queries.GetAgentRuntime(r.Context(), runtimeUUID)
if err != nil {
writeError(w, http.StatusNotFound, "runtime not found")
return
}
if _, ok := h.requireWorkspaceMember(w, r, uuidToString(rt.WorkspaceID), "runtime not found"); !ok {
return
}
rows, err := h.Queries.GetRuntimeTaskHourlyActivity(r.Context(), db.GetRuntimeTaskHourlyActivityParams{
RuntimeID: rt.ID,
Tz: rt.Timezone,
})
if err != nil {
writeError(w, http.StatusInternalServerError, "failed to get task activity")
return
}
type HourlyActivity struct {
Hour int `json:"hour"`
Count int `json:"count"`
}
resp := make([]HourlyActivity, len(rows))
for i, row := range rows {
resp[i] = HourlyActivity{Hour: int(row.Hour), Count: int(row.Count)}
}
writeJSON(w, http.StatusOK, resp)
}
// RuntimeUsageByAgentResponse is one (agent, model) row of "Cost by agent".
// Model stays on the wire because cost is computed client-side from a model
// pricing table, intentionally not stored server-side so pricing changes
// don't require a back-fill. The client groups by agent_id and sums.
type RuntimeUsageByAgentResponse struct {
AgentID string `json:"agent_id"`
Model string `json:"model"`
InputTokens int64 `json:"input_tokens"`
OutputTokens int64 `json:"output_tokens"`
CacheReadTokens int64 `json:"cache_read_tokens"`
CacheWriteTokens int64 `json:"cache_write_tokens"`
TaskCount int32 `json:"task_count"`
}
// GetRuntimeUsageByAgent returns per-agent token aggregates for a runtime
// since the cutoff window. Drives the runtime-detail "Cost by agent" tab.
func (h *Handler) GetRuntimeUsageByAgent(w http.ResponseWriter, r *http.Request) {
runtimeID := chi.URLParam(r, "runtimeId")
rt, err := h.Queries.GetAgentRuntime(r.Context(), parseUUID(runtimeID))
if err != nil {
writeError(w, http.StatusNotFound, "runtime not found")
return
}
if _, ok := h.requireWorkspaceMember(w, r, uuidToString(rt.WorkspaceID), "runtime not found"); !ok {
return
}
since := parseSinceParamInTZ(r, 30, rt.Timezone)
rows, err := h.Queries.ListRuntimeUsageByAgent(r.Context(), db.ListRuntimeUsageByAgentParams{
RuntimeID: parseUUID(runtimeID),
Since: since,
})
if err != nil {
writeError(w, http.StatusInternalServerError, "failed to list usage by agent")
return
}
resp := make([]RuntimeUsageByAgentResponse, len(rows))
for i, row := range rows {
resp[i] = RuntimeUsageByAgentResponse{
AgentID: uuidToString(row.AgentID),
Model: row.Model,
InputTokens: row.InputTokens,
OutputTokens: row.OutputTokens,
CacheReadTokens: row.CacheReadTokens,
CacheWriteTokens: row.CacheWriteTokens,
TaskCount: row.TaskCount,
}
}
writeJSON(w, http.StatusOK, resp)
}
// RuntimeUsageByHourResponse is one (hour, model) row. Hours with zero
// activity are omitted by the SQL — clients fill the gap to render a
// continuous 0..23 axis. Model is preserved for client-side cost math.
type RuntimeUsageByHourResponse struct {
Hour int `json:"hour"`
Model string `json:"model"`
InputTokens int64 `json:"input_tokens"`
OutputTokens int64 `json:"output_tokens"`
CacheReadTokens int64 `json:"cache_read_tokens"`
CacheWriteTokens int64 `json:"cache_write_tokens"`
TaskCount int32 `json:"task_count"`
}
// GetRuntimeUsageByHour returns hourly (0..23) token aggregates for a
// runtime since the cutoff window. Drives the "By hour" tab.
func (h *Handler) GetRuntimeUsageByHour(w http.ResponseWriter, r *http.Request) {
runtimeID := chi.URLParam(r, "runtimeId")
rt, err := h.Queries.GetAgentRuntime(r.Context(), parseUUID(runtimeID))
if err != nil {
writeError(w, http.StatusNotFound, "runtime not found")
return
}
if _, ok := h.requireWorkspaceMember(w, r, uuidToString(rt.WorkspaceID), "runtime not found"); !ok {
return
}
since := parseSinceParamInTZ(r, 30, rt.Timezone)
rows, err := h.Queries.GetRuntimeUsageByHour(r.Context(), db.GetRuntimeUsageByHourParams{
RuntimeID: parseUUID(runtimeID),
Since: since,
Tz: rt.Timezone,
})
if err != nil {
writeError(w, http.StatusInternalServerError, "failed to get usage by hour")
return
}
resp := make([]RuntimeUsageByHourResponse, len(rows))
for i, row := range rows {
resp[i] = RuntimeUsageByHourResponse{
Hour: int(row.Hour),
Model: row.Model,
InputTokens: row.InputTokens,
OutputTokens: row.OutputTokens,
CacheReadTokens: row.CacheReadTokens,
CacheWriteTokens: row.CacheWriteTokens,
TaskCount: row.TaskCount,
}
}
writeJSON(w, http.StatusOK, resp)
}
// GetWorkspaceUsageByDay returns daily token usage aggregated by model for the workspace.
func (h *Handler) GetWorkspaceUsageByDay(w http.ResponseWriter, r *http.Request) {
workspaceID := h.resolveWorkspaceID(r)
since := parseSinceParam(r, 30)
rows, err := h.Queries.GetWorkspaceUsageByDay(r.Context(), db.GetWorkspaceUsageByDayParams{
WorkspaceID: parseUUID(workspaceID),
Since: since,
})
if err != nil {
writeError(w, http.StatusInternalServerError, "failed to get usage")
return
}
type DailyUsageRow struct {
Date string `json:"date"`
Model string `json:"model"`
TotalInputTokens int64 `json:"total_input_tokens"`
TotalOutputTokens int64 `json:"total_output_tokens"`
TotalCacheReadTokens int64 `json:"total_cache_read_tokens"`
TotalCacheWriteTokens int64 `json:"total_cache_write_tokens"`
TaskCount int32 `json:"task_count"`
}
resp := make([]DailyUsageRow, len(rows))
for i, row := range rows {
resp[i] = DailyUsageRow{
Date: row.Date.Time.Format("2006-01-02"),
Model: row.Model,
TotalInputTokens: row.TotalInputTokens,
TotalOutputTokens: row.TotalOutputTokens,
TotalCacheReadTokens: row.TotalCacheReadTokens,
TotalCacheWriteTokens: row.TotalCacheWriteTokens,
TaskCount: row.TaskCount,
}
}
writeJSON(w, http.StatusOK, resp)
}
// GetWorkspaceUsageSummary returns total token usage aggregated by model for the workspace.
func (h *Handler) GetWorkspaceUsageSummary(w http.ResponseWriter, r *http.Request) {
workspaceID := h.resolveWorkspaceID(r)
since := parseSinceParam(r, 30)
rows, err := h.Queries.GetWorkspaceUsageSummary(r.Context(), db.GetWorkspaceUsageSummaryParams{
WorkspaceID: parseUUID(workspaceID),
Since: since,
})
if err != nil {
writeError(w, http.StatusInternalServerError, "failed to get usage summary")
return
}
type UsageSummaryRow struct {
Model string `json:"model"`
TotalInputTokens int64 `json:"total_input_tokens"`
TotalOutputTokens int64 `json:"total_output_tokens"`
TotalCacheReadTokens int64 `json:"total_cache_read_tokens"`
TotalCacheWriteTokens int64 `json:"total_cache_write_tokens"`
TaskCount int32 `json:"task_count"`
}
resp := make([]UsageSummaryRow, len(rows))
for i, row := range rows {
resp[i] = UsageSummaryRow{
Model: row.Model,
TotalInputTokens: row.TotalInputTokens,
TotalOutputTokens: row.TotalOutputTokens,
TotalCacheReadTokens: row.TotalCacheReadTokens,
TotalCacheWriteTokens: row.TotalCacheWriteTokens,
TaskCount: row.TaskCount,
}
}
writeJSON(w, http.StatusOK, resp)
}
// parseSinceParam parses the "days" query parameter and returns a timestamptz
// anchored to UTC start-of-today, so `days=N` represents N natural calendar
// days under UTC (today plus N-1 prior full days). `days=1` therefore means
// "today only" — matching the workspace dashboard's `dailyCutoffIso` axis —
// rather than the trailing 24-hour window the wall-clock cutoff used to
// return. The downstream SQL all applies `DATE_TRUNC('day', @since)` so the
// pre-truncated value below lands on the same boundary regardless.
//
// Use parseSinceParamInTZ when the cutoff must align with a per-runtime
// timezone instead of UTC (runtime-detail pages).
func parseSinceParam(r *http.Request, defaultDays int) pgtype.Timestamptz {
days := defaultDays
if d := r.URL.Query().Get("days"); d != "" {
if parsed, err := strconv.Atoi(d); err == nil && parsed > 0 && parsed <= 365 {
days = parsed
}
}
now := time.Now().UTC()
startOfToday := time.Date(now.Year(), now.Month(), now.Day(), 0, 0, 0, 0, time.UTC)
cutoff := startOfToday.AddDate(0, 0, -(days - 1))
return pgtype.Timestamptz{Time: cutoff, Valid: true}
}
// parseSinceParamInTZ is the timezone-aware variant of parseSinceParam.
// Anchors the cutoff to start-of-day-(N) in the supplied IANA zone so that
// `days=N` returns full N+1 calendar buckets in that zone (today's partial
// bucket + N prior full days). If tzName is empty or unparseable, falls back
// to UTC — never returns an error so handlers stay simple.
func parseSinceParamInTZ(r *http.Request, defaultDays int, tzName string) pgtype.Timestamptz {
days := defaultDays
if d := r.URL.Query().Get("days"); d != "" {
if parsed, err := strconv.Atoi(d); err == nil && parsed > 0 && parsed <= 365 {
days = parsed
}
}
loc, err := time.LoadLocation(tzName)
if err != nil || loc == nil {
loc = time.UTC
}
now := time.Now().In(loc)
startOfToday := time.Date(now.Year(), now.Month(), now.Day(), 0, 0, 0, 0, loc)
cutoff := startOfToday.AddDate(0, 0, -days)
return pgtype.Timestamptz{Time: cutoff, Valid: true}
}
// UpdateAgentRuntimeRequest is the JSON body accepted by PATCH /api/runtimes/:id.
// Only fields users may legitimately edit are listed; other runtime metadata
// (provider, daemon_id, status…) flows in from the daemon and is read-only here.
type UpdateAgentRuntimeRequest struct {
// Timezone is an IANA zone name (e.g. "Asia/Shanghai", "America/New_York").
// Validated server-side via time.LoadLocation; "UTC" or empty resets to UTC.
Timezone *string `json:"timezone,omitempty"`
// Visibility flips a runtime between "private" (default — only the owner
// or workspace admins can bind agents) and "public" (any workspace
// member can). Owner / workspace admin only, gated by canEditRuntime.
Visibility *string `json:"visibility,omitempty"`
}
// UpdateAgentRuntime handles PATCH /api/runtimes/:id. Currently only the
// reporting timezone is editable, but the request shape is open-ended so
// future fields (display name, description) can be added without a route
// change. Workspace-membership-checked; no admin-only restriction since the
// runtime owner traditionally edits their own runtime.
func (h *Handler) UpdateAgentRuntime(w http.ResponseWriter, r *http.Request) {
runtimeID := chi.URLParam(r, "runtimeId")
runtimeUUID, ok := parseUUIDOrBadRequest(w, runtimeID, "runtime_id")
if !ok {
return
}
rt, err := h.Queries.GetAgentRuntime(r.Context(), runtimeUUID)
if err != nil {
writeError(w, http.StatusNotFound, "runtime not found")
return
}
member, ok := h.requireWorkspaceMember(w, r, uuidToString(rt.WorkspaceID), "runtime not found")
if !ok {
return
}
if !canEditRuntime(member, rt) {
writeError(w, http.StatusForbidden, "you can only edit your own runtimes")
return
}
var req UpdateAgentRuntimeRequest
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
writeError(w, http.StatusBadRequest, "invalid JSON body")
return
}
// Validate every field that's present BEFORE running any mutation. A
// PATCH that carries both `timezone` and `visibility` must succeed or
// fail atomically from the caller's perspective: writing timezone first
// and then 400-ing on a bad visibility would leave the row half-updated
// (and the usage rollup rebuilt under a tz the caller never asked for).
//
// This loop also fixes the no-op short-circuit: the prior version
// returned early when `timezone == rt.Timezone`, silently dropping a
// concurrent visibility patch in the same request body.
var (
newTimezone string
needTimezone bool
newVisibility string
needVisibility bool
)
if req.Timezone != nil {
tz := *req.Timezone
if tz == "" {
tz = "UTC"
}
if _, err := time.LoadLocation(tz); err != nil {
writeError(w, http.StatusBadRequest, "invalid IANA timezone")
return
}
if tz != rt.Timezone {
newTimezone = tz
needTimezone = true
}
}
if req.Visibility != nil {
v := *req.Visibility
if v != "private" && v != "public" {
writeError(w, http.StatusBadRequest, "visibility must be 'private' or 'public'")
return
}
if v != rt.Visibility {
newVisibility = v
needVisibility = true
}
}
if needTimezone {
tx, err := h.TxStarter.Begin(r.Context())
if err != nil {
writeError(w, http.StatusInternalServerError, "failed to update runtime")
return
}
defer tx.Rollback(r.Context())
qtx := h.Queries.WithTx(tx)
if err := qtx.LockTaskUsageDailyRollup(r.Context()); err != nil {
slog.Error("LockTaskUsageDailyRollup failed", "error", err, "runtime_id", runtimeID)
writeError(w, http.StatusInternalServerError, "failed to update runtime")
return
}
updated, err := qtx.UpdateAgentRuntimeTimezone(r.Context(), db.UpdateAgentRuntimeTimezoneParams{
ID: runtimeUUID,
Timezone: newTimezone,
})
if err != nil {
slog.Error("UpdateAgentRuntimeTimezone failed", "error", err, "runtime_id", runtimeID)
writeError(w, http.StatusInternalServerError, "failed to update runtime")
return
}
if _, err := qtx.DeleteTaskUsageDailyForRuntime(r.Context(), runtimeUUID); err != nil {
slog.Error("DeleteTaskUsageDailyForRuntime failed", "error", err, "runtime_id", runtimeID)
writeError(w, http.StatusInternalServerError, "failed to rebuild runtime usage")
return
}
if _, err := qtx.DeleteTaskUsageDailyDirtyForRuntime(r.Context(), runtimeUUID); err != nil {
slog.Error("DeleteTaskUsageDailyDirtyForRuntime failed", "error", err, "runtime_id", runtimeID)
writeError(w, http.StatusInternalServerError, "failed to rebuild runtime usage")
return
}
if _, err := qtx.InsertTaskUsageDailyForRuntime(r.Context(), runtimeUUID); err != nil {
slog.Error("InsertTaskUsageDailyForRuntime failed", "error", err, "runtime_id", runtimeID)
writeError(w, http.StatusInternalServerError, "failed to rebuild runtime usage")
return
}
if err := tx.Commit(r.Context()); err != nil {
slog.Error("runtime timezone transaction commit failed", "error", err, "runtime_id", runtimeID)
writeError(w, http.StatusInternalServerError, "failed to update runtime")
return
}
rt = updated
}
if needVisibility {
updated, err := h.Queries.UpdateAgentRuntimeVisibility(r.Context(), db.UpdateAgentRuntimeVisibilityParams{
ID: runtimeUUID,
Visibility: newVisibility,
})
if err != nil {
slog.Error("UpdateAgentRuntimeVisibility failed", "error", err, "runtime_id", runtimeID)
writeError(w, http.StatusInternalServerError, "failed to update runtime")
return
}
rt = updated
// Notify connected clients that runtime metadata changed so the
// list/detail pages refresh — matches the pattern used by
// DeleteAgentRuntime.
h.publish(protocol.EventDaemonRegister, uuidToString(rt.WorkspaceID), "member", uuidToString(member.UserID), map[string]any{
"action": "update",
})
}
writeJSON(w, http.StatusOK, runtimeToResponse(rt))
}
func canEditRuntime(member db.Member, rt db.AgentRuntime) bool {
if roleAllowed(member.Role, "owner", "admin") {
return true
}
return rt.OwnerID.Valid && uuidToString(rt.OwnerID) == uuidToString(member.UserID)
}
// canUseRuntimeForAgent reports whether a workspace member is allowed to
// bind a new agent to — or move an existing agent onto — the given runtime.
// Mirrors canEditRuntime but layers on the runtime's visibility flag so a
// `public` runtime is usable by anyone in the workspace while a `private`
// runtime stays bound to its owner. Workspace owners/admins keep an
// administrative override for both. See migration 083 for the visibility
// column.
func canUseRuntimeForAgent(member db.Member, rt db.AgentRuntime) bool {
if roleAllowed(member.Role, "owner", "admin") {
return true
}
if rt.Visibility == "public" {
return true
}
return rt.OwnerID.Valid && uuidToString(rt.OwnerID) == uuidToString(member.UserID)
}
func (h *Handler) ListAgentRuntimes(w http.ResponseWriter, r *http.Request) {
workspaceID := h.resolveWorkspaceID(r)
var runtimes []db.AgentRuntime
var err error
if ownerFilter := r.URL.Query().Get("owner"); ownerFilter == "me" {
userID, ok := requireUserID(w, r)
if !ok {
return
}
runtimes, err = h.Queries.ListAgentRuntimesByOwner(r.Context(), db.ListAgentRuntimesByOwnerParams{
WorkspaceID: parseUUID(workspaceID),
OwnerID: parseUUID(userID),
})
} else {
runtimes, err = h.Queries.ListAgentRuntimes(r.Context(), parseUUID(workspaceID))
}
if err != nil {
writeError(w, http.StatusInternalServerError, "failed to list runtimes")
return
}
resp := make([]AgentRuntimeResponse, len(runtimes))
for i, rt := range runtimes {
resp[i] = runtimeToResponse(rt)
}
writeJSON(w, http.StatusOK, resp)
}
// DeleteAgentRuntime deletes a runtime after permission and dependency checks.
func (h *Handler) DeleteAgentRuntime(w http.ResponseWriter, r *http.Request) {
runtimeID := chi.URLParam(r, "runtimeId")
runtimeUUID, ok := parseUUIDOrBadRequest(w, runtimeID, "runtime_id")
if !ok {
return
}
rt, err := h.Queries.GetAgentRuntime(r.Context(), runtimeUUID)
if err != nil {
writeError(w, http.StatusNotFound, "runtime not found")
return
}
wsID := uuidToString(rt.WorkspaceID)
member, ok := h.requireWorkspaceMember(w, r, wsID, "runtime not found")
if !ok {
return
}
// Permission: owner/admin can delete any runtime; members can only delete their own.
if !canEditRuntime(member, rt) {
writeError(w, http.StatusForbidden, "you can only delete your own runtimes")
return
}
userID := uuidToString(member.UserID)
// Check if any active (non-archived) agents are bound to this runtime.
activeCount, err := h.Queries.CountActiveAgentsByRuntime(r.Context(), rt.ID)
if err != nil {
writeError(w, http.StatusInternalServerError, "failed to check runtime dependencies")
return
}
if activeCount > 0 {
writeError(w, http.StatusConflict, "cannot delete runtime: it has active agents bound to it. Archive or reassign the agents first.")
return
}
// Pause autopilots pointing at the archived agents BEFORE we delete
// them. Migration 096 dropped the autopilot.assignee_id agent FK, so a
// hard-delete here would otherwise leave dangling rows that subsequent
// scheduler ticks would skip with "assignee agent no longer exists" —
// quiet, but burning a run record every tick until an operator notices.
// Pausing makes the breakage visible in the autopilot list so the owner
// can re-point or delete the row instead.
archivedAgentIDs, err := h.Queries.ListArchivedAgentIDsByRuntime(r.Context(), rt.ID)
if err != nil {
writeError(w, http.StatusInternalServerError, "failed to enumerate archived agents")
return
}
if len(archivedAgentIDs) > 0 {
if err := h.Queries.PauseAutopilotsByAgentAssignees(r.Context(), archivedAgentIDs); err != nil {
slog.Warn("pause autopilots for archived agents failed",
"runtime_id", uuidToString(rt.ID), "error", err)
}
}
// Remove archived agents so the FK constraint (ON DELETE RESTRICT) won't block deletion.
if err := h.Queries.DeleteArchivedAgentsByRuntime(r.Context(), rt.ID); err != nil {
writeError(w, http.StatusInternalServerError, "failed to clean up archived agents")
return
}
if err := h.Queries.DeleteAgentRuntime(r.Context(), rt.ID); err != nil {
writeError(w, http.StatusInternalServerError, "failed to delete runtime")
return
}
slog.Info("runtime deleted", "runtime_id", uuidToString(rt.ID), "deleted_by", userID)
// Notify frontend to refresh runtime list.
h.publish(protocol.EventDaemonRegister, wsID, "member", userID, map[string]any{
"action": "delete",
})
writeJSON(w, http.StatusOK, map[string]string{"status": "ok"})
}