Files
multica/server/pkg/db/queries/agent.sql
devv-eve 5fa1da448f fix(chat): preserve chat session resume pointer across failures (#1360)
* fix(chat): preserve chat session resume pointer across failures

The chat 'forgets earlier messages' bug came from PriorSessionID being
silently lost in several edge cases:

- UpdateChatSessionSession unconditionally overwrote chat_session.session_id,
  so any task that completed without a session_id (early agent crash,
  missing result) wiped the resume pointer to NULL.
- CompleteAgentTask + UpdateChatSessionSession ran in separate calls. A
  follow-up chat message claimed in between resumed against a stale (or
  NULL) session and started over.
- FailAgentTask never wrote session_id back, so a task that established
  a real session before failing lost its resume pointer.
- ClaimTaskByRuntime only trusted chat_session.session_id and never
  fell back to the existing GetLastChatTaskSession query, so a single
  bad turn could permanently drop the conversation memory.

This change:

- Use COALESCE in UpdateChatSessionSession so empty inputs preserve the
  existing pointer; surface DB errors instead of swallowing them.
- Run CompleteAgentTask/FailAgentTask + UpdateChatSessionSession inside
  the same transaction (TaskService now takes a TxStarter).
- Extend FailAgentTask + the daemon FailTask path (client, handler,
  service) to forward session_id/work_dir, so failed/blocked tasks that
  built a real session still record it.
- Fall back to GetLastChatTaskSession in ClaimTaskByRuntime when the
  chat_session pointer is missing, and include failed tasks in that
  lookup so a single failure can't lose the conversation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(daemon): forward session_id/work_dir on blocked + timeout paths

runTask previously dropped result.SessionID and env.WorkDir on the
non-completed return paths:

- timeout returned a naked error, so handleTask called FailTask with
  empty session info and the chat resume pointer was either left stale
  or eventually overwritten with NULL.
- blocked / failed (default branch) returned a TaskResult without
  SessionID / WorkDir, so even though FailTask now COALESCEs into
  chat_session, there was no value to write through.
- the empty-output completion path was the same: it raised an error
  even when a real session_id had been built.

All three paths now return a TaskResult that carries the SessionID /
WorkDir the backend produced. Combined with the COALESCE-based update
in UpdateChatSessionSession and the FailTask plumbing introduced in
PR #1360, the next chat turn can always resume from the latest agent
session — even when the previous turn timed out, was rate-limited, or
returned an empty completion — instead of starting over with no memory
of the conversation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(copilot): capture session id from session.start as fallback

The Copilot backend only read sessionId from the synthetic 'result'
event, ignoring the one already present on session.start. When the CLI
was killed before result arrived (timeout, cancel, crash, or a
session.error mid-turn), the daemon reported SessionID="" and the
chat-session resume pointer could not advance — causing the chat to
silently drop conversation memory on the next turn.

Capture session.start.sessionId into state up front, and only let
'result' overwrite it when it actually carries one. result still wins
when present (it is the authoritative end-of-turn record).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(copilot): parse premiumRequests as float to preserve session id

Copilot CLI v1.0.32 serializes premiumRequests as a float (e.g. 7.5),
not an integer. Our copilotResultUsage struct typed it as int, which
made the entire 'result' line fail json.Unmarshal — silently dropping
sessionId on every turn.

This was the real cause of chat memory loss: the daemon reported
SessionID="" to the server, chat_session.session_id stayed NULL, and
the next chat turn never received --resume <id>, so each turn started
a fresh Copilot session with no prior context.

Add a regression test using the real JSON line from CLI v1.0.32 that
asserts sessionId is preserved when premiumRequests is fractional.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Devv <devv@Devvs-Mac-mini.local>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Eve <eve@multica.ai>
Co-authored-by: yushen <ldnvnbl@gmail.com>
2026-04-19 22:50:33 -07:00

205 lines
7.6 KiB
SQL

-- name: ListAgents :many
SELECT * FROM agent
WHERE workspace_id = $1 AND archived_at IS NULL
ORDER BY created_at ASC;
-- name: ListAllAgents :many
SELECT * FROM agent
WHERE workspace_id = $1
ORDER BY created_at ASC;
-- name: GetAgent :one
SELECT * FROM agent
WHERE id = $1;
-- name: GetAgentInWorkspace :one
SELECT * FROM agent
WHERE id = $1 AND workspace_id = $2;
-- name: CreateAgent :one
INSERT INTO agent (
workspace_id, name, description, avatar_url, runtime_mode,
runtime_config, runtime_id, visibility, max_concurrent_tasks, owner_id,
instructions, custom_env, custom_args, mcp_config
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14)
RETURNING *;
-- name: UpdateAgent :one
UPDATE agent SET
name = COALESCE(sqlc.narg('name'), name),
description = COALESCE(sqlc.narg('description'), description),
avatar_url = COALESCE(sqlc.narg('avatar_url'), avatar_url),
runtime_config = COALESCE(sqlc.narg('runtime_config'), runtime_config),
runtime_mode = COALESCE(sqlc.narg('runtime_mode'), runtime_mode),
runtime_id = COALESCE(sqlc.narg('runtime_id'), runtime_id),
visibility = COALESCE(sqlc.narg('visibility'), visibility),
status = COALESCE(sqlc.narg('status'), status),
max_concurrent_tasks = COALESCE(sqlc.narg('max_concurrent_tasks'), max_concurrent_tasks),
instructions = COALESCE(sqlc.narg('instructions'), instructions),
custom_env = COALESCE(sqlc.narg('custom_env'), custom_env),
custom_args = COALESCE(sqlc.narg('custom_args'), custom_args),
mcp_config = COALESCE(sqlc.narg('mcp_config'), mcp_config),
updated_at = now()
WHERE id = $1
RETURNING *;
-- name: ClearAgentMcpConfig :one
UPDATE agent SET mcp_config = NULL, updated_at = now()
WHERE id = $1
RETURNING *;
-- name: ArchiveAgent :one
UPDATE agent SET archived_at = now(), archived_by = $2, updated_at = now()
WHERE id = $1
RETURNING *;
-- name: RestoreAgent :one
UPDATE agent SET archived_at = NULL, archived_by = NULL, updated_at = now()
WHERE id = $1
RETURNING *;
-- name: ListAgentTasks :many
SELECT * FROM agent_task_queue
WHERE agent_id = $1
ORDER BY created_at DESC;
-- name: CreateAgentTask :one
INSERT INTO agent_task_queue (agent_id, runtime_id, issue_id, status, priority, trigger_comment_id)
VALUES ($1, $2, $3, 'queued', $4, sqlc.narg(trigger_comment_id))
RETURNING *;
-- name: CancelAgentTasksByIssue :exec
UPDATE agent_task_queue
SET status = 'cancelled'
WHERE issue_id = $1 AND status IN ('queued', 'dispatched', 'running');
-- name: CancelAgentTasksByAgent :exec
UPDATE agent_task_queue
SET status = 'cancelled'
WHERE agent_id = $1 AND status IN ('queued', 'dispatched', 'running');
-- name: GetAgentTask :one
SELECT * FROM agent_task_queue
WHERE id = $1;
-- name: ClaimAgentTask :one
-- Claims the next queued task for an agent, enforcing per-(issue, agent) serialization:
-- a task is only claimable when no other task for the same issue AND same agent is
-- already dispatched or running. This allows different agents to work on the same
-- issue in parallel while preventing a single agent from running duplicate tasks.
-- Chat tasks (issue_id IS NULL) use chat_session_id for serialization instead.
UPDATE agent_task_queue
SET status = 'dispatched', dispatched_at = now()
WHERE id = (
SELECT atq.id FROM agent_task_queue atq
WHERE atq.agent_id = $1 AND atq.status = 'queued'
AND NOT EXISTS (
SELECT 1 FROM agent_task_queue active
WHERE active.agent_id = atq.agent_id
AND active.status IN ('dispatched', 'running')
AND (
(atq.issue_id IS NOT NULL AND active.issue_id = atq.issue_id)
OR (atq.chat_session_id IS NOT NULL AND active.chat_session_id = atq.chat_session_id)
)
)
ORDER BY atq.priority DESC, atq.created_at ASC
LIMIT 1
FOR UPDATE SKIP LOCKED
)
RETURNING *;
-- name: StartAgentTask :one
UPDATE agent_task_queue
SET status = 'running', started_at = now()
WHERE id = $1 AND status = 'dispatched'
RETURNING *;
-- name: CompleteAgentTask :one
UPDATE agent_task_queue
SET status = 'completed', completed_at = now(), result = $2, session_id = $3, work_dir = $4
WHERE id = $1 AND status = 'running'
RETURNING *;
-- name: GetLastTaskSession :one
-- Returns the session_id and work_dir from the most recent completed task
-- for a given (agent_id, issue_id) pair, used for session resumption.
SELECT session_id, work_dir FROM agent_task_queue
WHERE agent_id = $1 AND issue_id = $2 AND status = 'completed' AND session_id IS NOT NULL
ORDER BY completed_at DESC
LIMIT 1;
-- name: FailAgentTask :one
-- Marks a task as failed. session_id and work_dir are merged via COALESCE so
-- if the agent already established a real session before failing (e.g. it
-- crashed mid-conversation, was cancelled, or hit a tool error) the resume
-- pointer is preserved on the task row. The next chat task can then fall
-- back to GetLastChatTaskSession and continue the conversation instead of
-- silently starting over.
UPDATE agent_task_queue
SET status = 'failed',
completed_at = now(),
error = $2,
session_id = COALESCE(sqlc.narg('session_id'), session_id),
work_dir = COALESCE(sqlc.narg('work_dir'), work_dir)
WHERE id = $1 AND status IN ('dispatched', 'running')
RETURNING *;
-- name: FailStaleTasks :many
-- Fails tasks stuck in dispatched/running beyond the given thresholds.
-- Handles cases where the daemon is alive but the task is orphaned
-- (e.g. agent process hung, daemon failed to report completion).
UPDATE agent_task_queue
SET status = 'failed', completed_at = now(), error = 'task timed out'
WHERE (status = 'dispatched' AND dispatched_at < now() - make_interval(secs => @dispatch_timeout_secs::double precision))
OR (status = 'running' AND started_at < now() - make_interval(secs => @running_timeout_secs::double precision))
RETURNING id, agent_id, issue_id;
-- name: CancelAgentTask :one
UPDATE agent_task_queue
SET status = 'cancelled', completed_at = now()
WHERE id = $1 AND status IN ('queued', 'dispatched', 'running')
RETURNING *;
-- name: CountRunningTasks :one
SELECT count(*) FROM agent_task_queue
WHERE agent_id = $1 AND status IN ('dispatched', 'running');
-- name: HasActiveTaskForIssue :one
-- Returns true if there is any queued, dispatched, or running task for the issue.
SELECT count(*) > 0 AS has_active FROM agent_task_queue
WHERE issue_id = $1 AND status IN ('queued', 'dispatched', 'running');
-- name: HasPendingTaskForIssue :one
-- Returns true if there is a queued or dispatched (but not yet running) task for the issue.
-- Used by the coalescing queue: allow enqueue when a task is running (so
-- the agent picks up new comments on the next cycle) but skip if a pending
-- task already exists (natural dedup).
SELECT count(*) > 0 AS has_pending FROM agent_task_queue
WHERE issue_id = $1 AND status IN ('queued', 'dispatched');
-- name: HasPendingTaskForIssueAndAgent :one
-- Returns true if a specific agent already has a queued or dispatched task
-- for the given issue. Used by @mention trigger dedup.
SELECT count(*) > 0 AS has_pending FROM agent_task_queue
WHERE issue_id = $1 AND agent_id = $2 AND status IN ('queued', 'dispatched');
-- name: ListPendingTasksByRuntime :many
SELECT * FROM agent_task_queue
WHERE runtime_id = $1 AND status IN ('queued', 'dispatched')
ORDER BY priority DESC, created_at ASC;
-- name: ListActiveTasksByIssue :many
SELECT * FROM agent_task_queue
WHERE issue_id = $1 AND status IN ('dispatched', 'running')
ORDER BY created_at DESC;
-- name: ListTasksByIssue :many
SELECT * FROM agent_task_queue
WHERE issue_id = $1
ORDER BY created_at DESC;
-- name: UpdateAgentStatus :one
UPDATE agent SET status = $2, updated_at = now()
WHERE id = $1
RETURNING *;