Files
multica/server/migrations/079_backfill_api_invalid_request.down.sql
Bohan Jiang 6d9ebb0fdd fix(daemon): unblock issues stuck on a poisoned-image agent session (#2314)
* fix(daemon): treat upstream API 400 invalid_request_error as poisoned session

A markdown-linked image in an issue description that the agent downloads as
a tiny CDN auth-error file and Read's as a PNG poisons the conversation:
the LLM API rejects the bad image with 400 invalid_request_error, the
session_id is pinned mid-flight, and every follow-up task on the issue
(comment-trigger, auto-retry) resumes the same poisoned conversation and
hits the same 400 — the issue can no longer be executed even after the
description is cleaned up.

Mirror the existing fallback-output classifier on the error side: detect
"API Error: ... 400 ... invalid_request_error" in the agent error string,
persist failure_reason='api_invalid_request', and add it to the
GetLastTaskSession exclusion list so the next task starts a fresh
session that re-reads the (now-clean) description.

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): unblock issues already poisoned by API 400 invalid_request_error

The forward-only classifier from the previous commit only tags new failures.
Issues like MUL-1918 already have multiple failed-task rows whose
failure_reason is the pre-fix default 'agent_error', and GetLastTaskSession
falls back to those legacy rows on the next claim — so deploying the
classifier alone leaves existing poisoned issues stuck (GPT-Boy review
on PR #2314).

Two complementary changes:

- Migration 079 backfills failure_reason='api_invalid_request' on every
  pre-existing 'agent_error' row whose error text matches the canonical
  Anthropic 400 invalid_request_error shape. Keeps observability
  consistent (multica issue runs / UI now report the right reason).

- GetLastTaskSession adds a defensive ILIKE clause on error text. Closes
  the deploy-window gap where the old binary could write a new
  'agent_error' row between the migration running and the new code
  taking over, and protects against future error-format variants the
  daemon classifier might miss.

Plus regression tests covering the legacy + new coexistence case GPT-Boy
flagged, and a guard rail asserting benign 'agent_error' failures
(timeouts, tool errors) still resume their session.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-09 14:39:10 +08:00

11 lines
465 B
SQL

-- Reverse the backfill: reset the rows we re-classified back to the
-- legacy 'agent_error' default. The error text we use as the witness is
-- preserved on the row so the same WHERE clause still selects the same
-- set unless someone manually relabels in between.
UPDATE agent_task_queue
SET failure_reason = 'agent_error'
WHERE status = 'failed'
AND failure_reason = 'api_invalid_request'
AND error ILIKE '%400%'
AND error ILIKE '%invalid_request_error%';