* fix(daemon): treat upstream API 400 invalid_request_error as poisoned session
A markdown-linked image in an issue description can poison the conversation.
When the agent downloads it, the CDN returns a tiny auth-error file instead,
and the agent then Reads that file as a PNG. The LLM API rejects the bad
image with 400 invalid_request_error, but the session_id has already been
pinned mid-flight, so every follow-up task on the issue (comment-trigger,
auto-retry) resumes the same poisoned conversation and hits the same 400.
The issue can no longer be executed, even after the description is cleaned
up.
Mirror the existing fallback-output classifier on the error side: detect
"API Error: ... 400 ... invalid_request_error" in the agent's error string,
persist failure_reason='api_invalid_request', and add that reason to the
GetLastTaskSession exclusion list so the next task starts a fresh
session that re-reads the (now-clean) description.
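A minimal sketch of the error-side classifier, assuming a regexp over the raw
error string. The names classifyAgentError, failureReasonAgentError and
failureReasonAPIInvalidRequest are hypothetical, and the sample error text is
illustrative, not a captured log.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical constants; the daemon's real identifiers may differ.
const (
	failureReasonAgentError        = "agent_error"
	failureReasonAPIInvalidRequest = "api_invalid_request"
)

// Matches the "API Error: ... 400 ... invalid_request_error" shape.
// (?s) lets '.' span newlines in multi-line error output.
var invalidRequestPattern = regexp.MustCompile(`(?s)API Error:.*\b400\b.*invalid_request_error`)

// classifyAgentError maps an agent error string to a failure_reason.
// Anything that is not the poisoned-session shape keeps the default.
func classifyAgentError(errText string) string {
	if invalidRequestPattern.MatchString(errText) {
		return failureReasonAPIInvalidRequest
	}
	return failureReasonAgentError
}

func main() {
	// Illustrative error text only.
	poisoned := `API Error: 400 {"type":"error","error":{"type":"invalid_request_error","message":"..."}}`
	fmt.Println(classifyAgentError(poisoned))                    // api_invalid_request
	fmt.Println(classifyAgentError("context deadline exceeded")) // agent_error
}
```

Keeping the default at 'agent_error' means only the recognised 400 shape
changes resume behaviour; every other failure is classified as before.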
Co-authored-by: multica-agent <github@multica.ai>
* fix(daemon): unblock issues already poisoned by API 400 invalid_request_error
The forward-only classifier from the previous commit only tags new failures.
Issues like MUL-1918 already have multiple failed-task rows whose
failure_reason is the pre-fix default 'agent_error', and GetLastTaskSession
still falls back to those legacy rows on the next claim. Deploying the
classifier alone therefore leaves already-poisoned issues stuck (raised by
GPT-Boy in review on PR #2314).
Two complementary changes:
- Migration 079 backfills failure_reason='api_invalid_request' on every
  pre-existing 'agent_error' row whose error text matches the canonical
  Anthropic 400 invalid_request_error shape. This keeps observability
  consistent: multica issue runs and the UI now report the correct reason.
- GetLastTaskSession adds a defensive ILIKE clause on the error text. This
  closes the deploy-window gap in which the old binary could still write a
  new 'agent_error' row after the migration ran but before the new code
  took over, and it guards against future error-format variants that the
  daemon classifier might miss.
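The selection rule above can be modelled in memory as a rough sketch. It
assumes rows arrive newest first and that the ILIKE pattern is
'%API Error%400%invalid_request_error%'; the real query, pattern and full
exclusion list are not shown here, so taskRow, lastTaskSession, ilike and
poisonedErrorPattern are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// taskRow is a hypothetical stand-in for a task row.
type taskRow struct {
	SessionID     string
	FailureReason string // "" for successful runs
	ErrorText     string
}

// Only the reason added in this change is shown; the real list has others.
var excludedReasons = map[string]bool{"api_invalid_request": true}

// Assumed pattern; the real ILIKE clause may differ.
const poisonedErrorPattern = "%API Error%400%invalid_request_error%"

// ilike emulates SQL ILIKE for patterns that use only the '%' wildcard:
// the literal pieces must appear in order, compared case-insensitively.
func ilike(text, pattern string) bool {
	text, pattern = strings.ToLower(text), strings.ToLower(pattern)
	parts := strings.Split(pattern, "%")
	if len(parts) == 1 {
		return text == pattern // no wildcard: exact match
	}
	if !strings.HasPrefix(text, parts[0]) {
		return false
	}
	text = text[len(parts[0]):]
	for _, p := range parts[1 : len(parts)-1] {
		i := strings.Index(text, p)
		if i < 0 {
			return false
		}
		text = text[i+len(p):]
	}
	return strings.HasSuffix(text, parts[len(parts)-1])
}

// lastTaskSession skips excluded reasons and, defensively, any row whose
// error text has the poisoned 400 shape regardless of its failure_reason.
func lastTaskSession(rows []taskRow) (string, bool) {
	for _, r := range rows {
		if r.SessionID == "" || excludedReasons[r.FailureReason] {
			continue
		}
		if ilike(r.ErrorText, poisonedErrorPattern) {
			continue
		}
		return r.SessionID, true
	}
	return "", false
}

func main() {
	tagged := taskRow{"sess-poisoned", "api_invalid_request", "API Error: 400 ... invalid_request_error ..."}
	legacy := taskRow{"sess-poisoned", "agent_error", "API Error: 400 ... invalid_request_error ..."}
	timeout := taskRow{"sess-ok", "agent_error", "context deadline exceeded"}

	_, ok := lastTaskSession([]taskRow{tagged, legacy})
	fmt.Println(ok) // false: both poisoned rows skipped, fresh session
	id, _ := lastTaskSession([]taskRow{timeout})
	fmt.Println(id) // sess-ok: benign agent_error still resumes
}
```

The text check is what catches the legacy row: its failure_reason is still
'agent_error', so the reason filter alone would resume it.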
Also adds regression tests covering the legacy/new coexistence case GPT-Boy
flagged, plus a guard-rail test asserting that benign 'agent_error' failures
(timeouts, tool errors) still resume their session.
Co-authored-by: multica-agent <github@multica.ai>
---------
Co-authored-by: multica-agent <github@multica.ai>