Commit Graph

753 Commits

Author SHA1 Message Date
LinYushen
f6ac53a967 fix: squad leader no_action must not post comment on comment-triggered path (#2573)
PR #2564 only added IsSquadLeader handling to the assignment-triggered
workflow path and the Output section. When a squad leader is triggered by
a comment (the common case for re-evaluation), the comment-triggered
workflow path had NO squad leader special handling, so the model still
posted comments announcing no_action/silence.

Changes:
- runtime_config.go: Add IsSquadLeader check to comment-triggered step 4
  with explicit prohibition against posting no_action announcement comments
- runtime_config.go: Strengthen Output section from 'may exit silently' to
  'MUST exit without posting any comment' with explicit DO NOT examples
- runtime_config.go: Strengthen assignment-triggered step 5 similarly
- prompt.go: Add squad leader no_action rule to per-turn comment prompt
  when trigger author is an agent and agent instructions contain the
  Squad Operating Protocol marker
- Add tests for both the per-turn prompt and CLAUDE.md generation

Fixes MUL-2168

Co-authored-by: multica-agent <github@multica.ai>
2026-05-14 12:36:06 +08:00
Bohan Jiang
334d9cdd02 fix(squad): skip leader when a member @mentions anyone (MUL-2170) (#2569)
* fix(squad): skip leader on comment when a member @mentions any agent (MUL-2170)

When a human commenter routes an issue directly at a specific agent via
[@Name](mention://agent/<id>), the squad leader was still being woken up
to evaluate the same comment. The leader's only real options were to
re-delegate to the agent the member already named or to record
no_action — both of which produce queue noise without changing the
outcome.

This skips the leader-enqueue path entirely when:
  - the assignee is a squad,
  - the comment author is a member, AND
  - the comment body contains at least one agent mention.

Agent-authored comments are intentionally exempt: when an agent posts
an update that @mentions another agent, the leader still needs to
coordinate the thread. The existing leader-self-trigger guard is
preserved. Only the current comment's body is inspected — parent
(thread root) mentions are not inherited here.

Tests cover the helper (mentions parsing) plus the integration matrix:
member plain / member @member / member @non-leader-agent /
member @leader / agent @agent / leader-self.

Co-authored-by: multica-agent <github@multica.ai>

* test(squad): exercise full CreateComment path for leader-skip rule (MUL-2170)

Adds an integration test that drives the HTTP-layer CreateComment handler
(not just the helper) to lock the call-site wiring: a member top-level
comment with an @agent skips the squad leader, and a subsequent plain
reply in the same thread DOES wake the leader — the parent's @agent
mention must not be inherited into the leader-skip decision.

Picks up a non-blocking review note on PR #2569.

Co-authored-by: multica-agent <github@multica.ai>

* fix(squad): skip leader on any explicit member mention, not only @agent (MUL-2170)

Broaden the leader-skip rule for squad-assigned issues: a member comment
that explicitly @mentions anyone — @agent, @member, @squad, or @all —
counts as deliberate routing and the squad leader stays out. Issue
cross-references (mention://issue/...) are not routing and still trigger
the leader as before.

Per Bohan's follow-up on MUL-2170 — @member should suppress the leader
for the same reason @agent does: the human has already pointed at a
specific recipient, so a leader turn would just be observation noise.

Helper renamed commentMentionsAnyAgent → commentMentionsAnyone with
explicit handling of all four routing mention types. Existing call-site
wiring (current-comment-only, agent-author exemption, leader self-trigger
guard) is unchanged.

Tests updated and extended to cover the full routing matrix:
@member / @squad / @all / @issue (cross-ref) plus the @agent variants
already covered.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-14 12:22:10 +08:00
fr00st
cc9fbd3db0 Fix stale Done replies on comment follow-ups (#2495)
* fix: avoid stale done replies on comment follow-ups

* fix: avoid inlining runtime brief for Hermes ACP

* fix: address comment follow-up review feedback
2026-05-14 12:00:04 +08:00
LinYushen
7a1284128d fix: allow squad leader to exit silently on no_action without posting a comment (#2564)
The runtime prompt's Output section unconditionally required all tasks to
post a comment via 'multica issue comment add', which conflicted with the
squad leader protocol that says to 'exit silently' on no_action.

Changes:
- Add IsSquadLeader bool to TaskContextForEnv (detected via Squad Operating
  Protocol marker in agent instructions)
- Relax the Output section and assignment-triggered workflow step 5 to
  allow squad leaders to exit with only a 'multica squad activity' call
  when the outcome is no_action

Fixes MUL-2168

Co-authored-by: multica-agent <github@multica.ai>
2026-05-14 11:33:15 +08:00
Bohan Jiang
21b49eb59b fix(cli): resolve squad assignees in issue create/update/assign (MUL-2165) (#2551)
* fix(cli): resolve squad assignees in issue create/update/assign (MUL-2165)

The CLI assignee resolver only searched workspace members and agents, so a
quick-create input like "assign to <SquadName>" silently fell through to
"Unrecognized assignee: <SquadName>" in the issue description — even though
squads are first-class assignees server-side and the prompt's whole point was
to route the work for the user.

Extend resolveAssignee / resolveAssigneeByID to also fetch /api/squads, teach
the actor display lookup to render squad names in table output, update the
quick-create prompt and runtime-config command listing to mention
`multica squad list` alongside members and agents, and lock in the new
behavior with tests.

Co-authored-by: multica-agent <github@multica.ai>

* fix(cli): gate squad assignee resolution behind an allowed-kinds set (MUL-2165)

The earlier MUL-2165 fix taught resolveAssignee / resolveAssigneeByID to also
return (squad, ...), but those helpers are shared. Project lead and issue
subscriber callers were still using them, and their target schemas reject
squads — project.lead_type has a DB CHECK constraint
(server/migrations/034_projects.up.sql:10) and the subscriber handler's
isWorkspaceEntity switch only knows member/agent
(server/internal/handler/handler.go:414). So
`multica project create --lead "<SquadName>"` and
`multica issue subscriber add --user "<SquadName>"` would resolve to
(squad, ...) and surface as a 500/403 server-side instead of a clean
CLI-side resolution error.

Thread an assigneeKinds set through the resolver and the pickAssigneeFromFlags
helper. Issue create/update/assign/list pass `issueAssigneeKinds` (all three);
project lead and subscriber pass `memberOrAgentKinds`. The squads fetch is
skipped entirely when not allowed, and the not-found / no-match error wording
adapts to the allowed kinds so it never mentions a type the caller cannot use.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-13 22:31:50 +08:00
Bohan Jiang
0345285b86 feat(quick-create): searchable actor picker + squad support (#2552)
* feat(quick-create): searchable actor picker + squad support (MUL-2163)

- Replaces the flat agent dropdown in the "Create with agent" modal with a
  searchable PropertyPicker that lists Agents and Squads in separate
  sections, so users can filter by name and pick a squad as the creator.
- Persists the selection as (lastActorType, lastActorId), removing the
  agent-only lastAgentId field on the quick-create store.
- Adds squad_id to the quick-create API request and stamps it onto the
  task's QuickCreateContext. The handler resolves the squad to its leader
  agent (re-using validateAssigneePair) and the daemon claim path injects
  the squad-leader briefing when the task carries a squad hint, matching
  the behavior of issue-bound squad tasks.

Co-authored-by: multica-agent <github@multica.ai>

* fix(create-issue): forward squad picks across manual→agent switch

Manual mode → agent mode previously only carried `agent_id`, so picking
a squad and then flipping to agent silently fell back to the persisted
actor / first visible agent and lost the user's choice. Carry `squad_id`
on the same branch so the agent panel honors the squad pick.

Adds a sibling test alongside the existing project-carry case.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-13 22:31:17 +08:00
LinYushen
29082f7cfe feat: implement Squad feature MVP (#2505)
* feat: implement Squad feature MVP

- Add migration 084_squad: squad, squad_member, squad_activity_log tables
- Extend issue.assignee_type to support 'squad'
- Add sqlc queries for squad CRUD, member management, activity logs
- Add Go handler with full Squad API (CRUD, members, activity log)
- Register routes: /api/squads/*, /api/issues/{id}/squad-activity, /api/squad-activity
- Add Squad trigger logic:
  - Assign Squad immediately triggers leader
  - Every external comment on squad-assigned issue triggers leader
  - Anti-loop: squad members' comments don't trigger leader
  - Dedup: skip if leader already has pending task
- Add squad activity log API (方案 B) for leader no-op recording
- Add frontend TypeScript types (Squad, SquadMember, SquadActivityLog)
- Add protocol events: squad:created, squad:updated, squad:deleted

Co-authored-by: multica-agent <github@multica.ai>

* fix: address PR review blocking issues

1. validateAssigneePair now accepts 'squad' assignee_type
2. All squad endpoints validate workspace ownership via GetSquadInWorkspace
3. CreateSquadActivityLog restricted to squad leader agent only
4. AddSquadMember validates member exists in workspace
5. UpdateSquad auto-adds new leader to squad members
6. DeleteSquad transfers assigned issues to leader before deletion
7. IssueAssigneeType includes 'squad' in frontend types

Co-authored-by: multica-agent <github@multica.ai>

* feat: soft-delete squads via archive instead of hard delete

- Add migration 085: archived_at + archived_by columns on squad table
- ListSquads now excludes archived squads (ListAllSquads for admin)
- DeleteSquad → ArchiveSquad (sets archived_at, preserves all records)
- Transfer squad-assigned issues to leader before archiving
- SquadResponse includes archived_at/archived_by fields
- Frontend Squad type updated with nullable archived fields

Co-authored-by: multica-agent <github@multica.ai>

* feat: re-add Squads frontend entry (sidebar nav + pages)

Re-applies the frontend squad entry that was lost during a merge:
- Sidebar nav: Squads item with Users icon
- Paths: squads() and squadDetail() in workspace paths
- Routes: /squads and /squads/[id] pages
- Views: SquadsPage (list) and SquadDetailPage
- i18n: en 'Squads' / zh '小队'
- Reserved slug: 'squads'

Co-authored-by: multica-agent <github@multica.ai>

* fix: fix SquadsPage rendering - use PageHeader children pattern

PageHeader takes children, not title/actions props. The incorrect
usage caused a React rendering error. Now matches the pattern used
by autopilots and agents pages.

Co-authored-by: multica-agent <github@multica.ai>

* fix(squads): add API client methods and package export for squads pages

* feat: complete Squad frontend - create dialog, member management, API methods

- Add CreateSquadModal with name/description/leader selection
- Register 'create-squad' in modal registry
- Wire 'New Squad' button to open the modal
- Add full API client methods: createSquad, updateSquad, deleteSquad,
  addSquadMember, removeSquadMember
- Rewrite SquadDetailPage with:
  - Member list showing resolved names
  - Add/remove member UI
  - Archive squad button
  - Back navigation to squads list

Co-authored-by: multica-agent <github@multica.ai>

* feat: improve Squad UI - match create agent dialog style

- CreateSquadModal: proper Dialog with Header/Description/Footer,
  agent picker with avatars, textarea for description
- SquadDetailPage: centered max-w-2xl layout, ActorAvatar for members,
  Crown badge for leader, textarea for member description,
  improved spacing and visual hierarchy
- Renamed 'role' field label to 'Description' in add member form
  (describes the member's responsibilities in the squad)

Co-authored-by: multica-agent <github@multica.ai>

* feat(squad): add avatar, instructions; drop unique-name constraint

- 086: add squad.avatar_url
- 087: drop unique constraint on squad.name (squads with the same
  name are legitimate across teams; uniqueness was an accidental
  product constraint)
- 088: add squad.instructions (text, default '')
- UpdateSquad now COALESCEs avatar_url + instructions
- handler exposes Instructions in SquadResponse and accepts it in
  UpdateSquad

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(squad): assignable + mention target; trigger leader on assign

- assignee picker and @mention suggestion list squads alongside
  agents and members; renders squad avatar/icon
- creating or updating an issue with assignee_type=squad enqueues
  a task for the squad's current leader (mirrors agent-assignee
  parking-lot rule: skip backlog only)
- workspace queries/hooks expose squads where needed for the
  pickers
- locales updated for new picker copy

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(squad): agent-style detail page with members + instructions tabs

- restructure squad detail page to mirror the agent detail page:
  320px inspector (creator, leader, created/updated) + tabbed
  pane (Members | Instructions) with dirty-guard AlertDialog
- inline name + avatar editing on the inspector
- inline description editor (modal textarea)
- members tab: leader + member picker with role descriptions,
  swap leader, edit member roles, remove
- instructions tab: ContentEditor + Save (mirrors agent pattern)
- squads list shows the squad avatar/icon
- core types + api.updateSquad accept avatar_url + instructions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(squad): inject leader briefing on claim (protocol + roster + instructions)

When a squad's leader agent claims a task on a squad-assigned issue,
append a system-level briefing to the agent's Instructions composed of:

1. Squad Operating Protocol — hard-coded rules: leader is a
   coordinator, dispatch via @mention, stop after dispatching,
   resume on re-trigger, do not work outside the roster.
2. Squad Roster — leader self-row plus one row per non-archived
   member with a literal mention markdown string ([@Name](mention://
   agent|member/<UUID>)) the leader can paste verbatim. Round-trips
   through util.ParseMentions, enforced by a contract test.
3. Squad Instructions — the user-defined squad.instructions block,
   omitted entirely when empty so we do not leave a dangling heading.

Non-leader members claiming the same issue receive no briefing.

Tests cover: full squad with mixed agent/human members, lone leader,
archived agents skipped, empty user instructions, mention round-trip,
and the leader/non-leader claim-handler gate.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(squad): tell leader not to restate issue context in dispatch comment

After observing leaders padding their delegation comments with full
re-summaries of the issue body and prior discussion, make the
Operating Protocol explicit:

- assignees on Multica already have the full issue (title,
  description, all comments, attachments) and workspace context;
- delegation comments should add only what cannot be inferred
  (who is picked, why, extra constraints), aim for two or three
  sentences;
- restating context is now an explicit hard rule violation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(squad): unify leader evaluation into activity_log, add CLI command

- Squad member comments now trigger leader (only leader self-excluded)
- Replace squad_activity_log with activity_log (action: squad_leader_evaluated)
- Add CLI: multica squad activity <issue-id> <outcome> --reason
- Add API: POST /api/issues/{id}/squad-evaluated
- Update squad operating protocol to require evaluation recording
- Remove squad_activity_log table from schema and generated code

* feat(cli): add squad list, get, member list commands

* fix(squad): address review findings (P1+P2)

P1 fixes:
- Add 'squads' to reserved_slugs.json (source of truth)
- Add 'create-squad' to ModalType union
- Remove unused leaderOpen/selectedLeader in create-squad modal
- Replace literal JSX strings with i18n selectors (en + zh-Hans)

P2 fixes:
- Add 'squad' to mention regex (MentionRe)
- Fix human member lookup in squad briefing (use GetUser directly)
- Add squads routes to desktop app
- Add squad:created/updated/deleted to WSEventType + invalidation
- Reject archived squads as issue assignees

* fix(squad): restore zh-Hans key, publish activity event, invalidate issues on archive

- Restore create_project.title in zh-Hans modals.json (dropped by prior edit)
- Publish activity:created WS event after squad leader evaluation
- Invalidate issue queries on squad:deleted (archive transfers assignees)
- Add creator info to squad list cards

* fix(squad): realtime sync, rerun support, leader validation

- Use workspaceKeys.squads prefix for detail/member queries (realtime invalidation)
- Publish squad:updated after add/remove/role-change member mutations
- Support rerun for squad-assigned issues (targets leader agent)
- Reject assignment to squads whose leader is archived

---------

Co-authored-by: multica-agent <github@multica.ai>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-13 18:46:20 +08:00
Naiyuan Qing
623d29f276 feat(agents): one-click create from curated templates (Phase 1) (#2520)
* docs(agents): three-phase agent quick-create plan

Captures the full design for moving agent creation from manual form +
one-by-one skill attachment to a tiered experience:

- Phase 1 (this PR): one-click curated templates, AI-free.
- Phase 2 (next): AI-recommended skills via the existing quick-create
  task mechanism — no new server-side LLM dependency.
- Phase 3 (later): AI creates the whole agent end-to-end, composing
  Phase 2 with a new `multica agent create` CLI driver.

Documents the architectural decisions that keep all three phases on
existing infrastructure (no SSE, no server-side LLM SDK, no new WS
channels), the two soft blockers Phase 1 unlocks for later phases
(createSkillWithFiles TX composability + skill same-name dedupe), and
the scope decisions we explicitly opted out of (Anthropic plugin
marketplace, ClawHub UI affordances).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(skills): harden import against invalid UTF-8 and binary files

PG rejects two byte patterns in a TEXT column. Both crashed real skill
imports we hit while assembling the template catalog:

- Embedded NUL (0x00) -> SQLSTATE 22021. Already stripped by
  sanitizeNullBytes, kept as-is.
- Other invalid UTF-8 (e.g. 0x91 — Windows-1252 smart quote in a skill
  whose author saved prose from Word). sanitizeNullBytes now also runs
  strings.ToValidUTF8 over the content so the second class no longer
  takes the whole import down.

For non-text payloads (images, fonts, archives, compiled binaries),
sanitization isn't the right fix — agents never read those as text,
and the bytes can't survive a TEXT column at all. addFile now skips
them by extension before the per-bundle cap counters tick, logging
the skip so an unexpected drop leaves a breadcrumb.

Function name kept for compatibility with the many call sites; both
behaviours are strict supersets of the original.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(skills): split createSkillWithFiles for tx composition + add workspace find-or-create query

Two soft blockers cleared so create-from-template (next commit) can
fold N skill creates and the agent + binding writes into one outer
transaction:

1. createSkillWithFiles used to Begin/Commit its own tx. Caller
   composition was impossible — N invocations meant N separate
   transactions and no atomicity over the whole materialise step.
   Pull the body into createSkillWithFilesInTx(ctx, qtx, input); the
   original function becomes a thin wrapper that manages its own tx
   for standalone callers. Existing call sites: zero behaviour change.

2. Add GetSkillByWorkspaceAndName sqlc query — workspace skill lookup
   by name, anchored to UNIQUE(workspace_id, name) from migration
   008. Lets the template materialiser implement find-or-create:
   reuse the workspace's existing skill row when a template
   references the same name, rather than crashing on the unique
   constraint or polluting the workspace with `<name>-2` clones.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(agents): agent template catalog + create-from-template endpoint

Server-side foundation for Phase 1 of the quick-create roadmap (see
docs/agent-quick-create-plan.md). Adds:

- server/internal/agenttmpl/ — embed-loaded catalog of curated agent
  templates. Each template ships pre-written instructions plus a list
  of skill URLs that get materialised into the workspace at create
  time. Validation runs at startup (init() panics on a malformed
  template) so a bad JSON ships as a deploy-time defect, not a
  runtime 500. Slug must equal the filename basename so the URL
  router is mirror-symmetric with the file layout.

- 11 starter templates covering Engineering / Writing / Building /
  Testing (code-reviewer, frontend-builder, planner, docs-writer,
  one-pager, html-slides, full-stack-engineer, …).

- Three new endpoints, all behind RequireWorkspaceMember:
    GET  /api/agent-templates           — picker list (no instructions)
    GET  /api/agent-templates/:slug     — detail with instructions
    POST /api/agents/from-template      — materialise + create

  Create flow:
    1. Auth + runtime authorization happen BEFORE the GitHub fan-out
       so a 403 never wastes 20s of upstream fetches.
    2. Pre-flight dedupe by cached_name reuses workspace skills
       without an HTTP fetch — second create-from-the-same-template
       drops from 20s to <100ms.
    3. Parallel fetch (30s per-URL timeout) for the remaining skills.
    4. Single transaction: every skill insert, the agent insert, and
       the agent_skill bindings. On any upstream fetch failure the TX
       rolls back and the API returns 422 with `failed_urls` so the
       UI can name the bad source(s).
    5. extra_skill_ids (user-supplied additions) are verified through
       GetSkillInWorkspace per id before attach, so a malicious client
       can't graft a skill from another workspace via UUID guessing.

- multica agent create --from-template <slug> CLI flag dispatches to
  the new endpoint with a 60s ceiling, matching `multica skill import`.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(agents): one-click create-from-template UI

Frontend half of Phase 1. CreateAgentDialog becomes a state machine
spanning four steps:

  chooser          → Start blank / From template cards
  blank-form       → existing manual form (post-chooser)
  duplicate-form   → existing form pre-filled from a duplicated agent
  template-picker  → grid of templates, click navigates to detail
  template-detail  → instructions + skill list preview + one-click Use

Picking a template never lands on the form: name auto-deduped against
existingAgentNames, runtime = first usable one, visibility = private.
Refinement happens on the agent detail page if needed. Same rationale
the doc spells out — templates exist precisely to skip configuration.

New components, all collapsible-by-default so quick-create stays fast:
  - template-picker.tsx — categorised grid, lucide icons + semantic
    accent tokens resolved through static maps so Tailwind's JIT picks
    up every variant (dynamic class strings would silently miss).
  - template-detail.tsx — instructions preview, skill list with cached
    descriptions, Use CTA. Renders the failedURLs banner when a 422
    fires — the only step that can trigger that response.
  - instructions-editor.tsx — collapsed preview-card / expanded full
    ContentEditor.
  - skill-multi-select.tsx + skill-picker-list.tsx — shared multi-
    select surface, also adopted by the existing skill-add-dialog.
  - avatar-picker.tsx — agent avatar upload, mirrors the inspector's
    visual language.

Schema-defended client (CLAUDE.md → API Response Compatibility): the
three new endpoints are wired through parseWithFallback with lenient
zod schemas. Desktop builds outlive any given server — a future
field rename / wrapping must not white-screen older installs.
listAgentTemplates accepts both the current bare array and a future
{templates: [...]} envelope. Coverage: 7 new schema-test cases in
schema.test.ts (null body, missing skills/instructions, malformed
create response, envelope migration).

Catalog + detail go through TanStack Query with staleTime: Infinity —
workspace-independent static data, no per-mount refetch.

Other:
- skill-add-dialog becomes a true multi-select (Confirm button +
  checkbox list); attached skills are filtered out of the list.
- agents-page hands the freshly-created Agent back to the dialog so a
  follow-up setAgentSkills can attach the form-selected skills.
- agent-overview-pane drops the mx-auto/max-w-2xl frame on config-
  tab content; the wider dialog visual language reads better with
  tabs filling the column.
- Every new UI string lives in both en/agents.json and
  zh-Hans/agents.json under create_dialog.* / tab_body.skills.* —
  locales/parity.test.ts blocks drift in CI.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): align skill import test + drop next-only lint suppression

- TestFetchFromSkillsSh_ResolvesRootLevelSkillMd now expects assets/logo.png
  to be skipped; matches the new addFile binary-extension guard
  (6fafd86e). The .png is intentionally dropped so PG TEXT inserts don't
  hit SQLSTATE 22021.
- packages/views shares zero next/* deps, so the @next/next/no-img-element
  eslint plugin isn't loaded there. The eslint-disable directive
  referencing it produced a hard "rule not found" error in CI lint. Raw
  <img> is the right primitive in views; remove the disable comment.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>

* test(agents): wrap CreateAgentDialog tests in workspace/navigation providers

The dialog now calls useNavigation() and useWorkspacePaths(), both of
which throw outside their providers. The existing tests rendered the
dialog bare and tripped both new requirements:

- NavigationProvider — supply a stub adapter so push() works for the
  agent-detail redirect.
- WorkspaceSlugProvider — useWorkspacePaths() requires a slug.

The blank-vs-template chooser is now the default first step; the
existing tests target the runtime picker on the manual form, so the
helper auto-clicks "Start blank" when no template is passed
(duplicate-mode tests skip the chooser).

Manual afterEach(cleanup) + document.body wipe. Base UI's Dialog
portal renders into document.body and leaves focus-guard/inert wrapper
divs behind across tests, so the second test in the suite saw two
"All" / "My Runtime" matches and getByText failed. The wipe is local
to this file rather than the shared setup because it isn't a global
issue — only suites that open Base UI dialogs hit it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-13 18:26:04 +08:00
Naiyuan Qing
454c8e3d1a feat: in-app preview for non-image attachments (#2528)
* feat(storage): add GetReader to Storage interface

Adds a streaming read method to the Storage abstraction so callers can
pull object bytes without forcing a full in-memory load. S3Storage wraps
GetObject; LocalStorage opens the file with path-traversal and sidecar
guards. Tests cover happy path, traversal rejection, sidecar rejection,
and missing key.

Used in the next commit by the attachment-preview proxy endpoint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(server): add attachment preview proxy endpoint

GET /api/attachments/{id}/content streams the raw bytes of a
text-previewable attachment back to the client. Exists to (a) bypass
CloudFront CORS, which is not configured on the CDN, and (b) bypass
Content-Disposition: attachment which Chromium honors for iframe document
loads. Media types (image/video/audio/pdf) intentionally do NOT go through
this endpoint — clients render them directly from the signed CloudFront
download_url, which is already served with Content-Disposition: inline.

Hard cap: 2 MB. Larger files return 413. Anything outside the text
whitelist returns 415. The whitelist (isTextPreviewable) mirrors the
client-side dispatcher; the cross-reference comment in file.go flags
the manual sync until a JSON SSOT generator lands.

Response always uses Content-Type: text/plain; charset=utf-8 so a
hostile HTML payload can't be re-interpreted as a document. The
original MIME ships via X-Original-Content-Type for client dispatch.
Cache-Control: no-store so revoked attachment access takes effect
immediately on the next request.

Tests cover happy path (md), extension fallback when content_type is
generic, 415 (pdf), 413 (>2MB), foreign workspace (404 isolation), and
the isTextPreviewable table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(core/api): add getAttachmentTextContent + preview error types

Adds an ApiClient method that fetches the text body of an attachment via
the new /api/attachments/{id}/content proxy. Two typed errors —
PreviewTooLargeError (413) and PreviewUnsupportedError (415) — let the
preview modal render specific fallbacks instead of a generic failure.

Refactors the private fetch() into a shared fetchRaw() helper so the
new method inherits the standard infra: auth headers, 401 →
handleUnauthorized recovery, X-Request-ID, error logging, and the
ApiError contract. The previous draft bypassed all of these by calling
window.fetch directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(views/editor): add AttachmentPreviewModal + Eye entry points

In-app preview for non-image attachments. An Eye icon now sits next to
the existing Download button on file cards / readonly file cards / the
standalone AttachmentList. Clicking it opens a full-screen modal that
dispatches by content_type:

  pdf:      <iframe src={download_url}>           — Chromium PDFium
  video/*:  <video controls src={download_url}>   — native controls
  audio/*:  <audio controls src={download_url}>   — native controls
  md:       <ReadonlyContent>                     — full markdown pipeline
  html:     <iframe srcdoc sandbox="">            — fully restricted
  text:     <code class="hljs">                   — lowlight highlight

Media types render directly from the signed CloudFront download_url
(server marks them inline-disposition). Text types fetch through the
new /api/attachments/{id}/content proxy via TanStack Query, wrapped
in useAttachmentPreview() so each entry point owns its own modal
state without depending on a global Provider mount.

Modal sizing: max-w-6xl × min(90vh, 100vh - 2rem) — slightly larger
than create-issue's max-w-4xl since PDF / video need room, but capped
to viewport on small screens. Sub-renderers use h-full to follow the
fixed modal height instead of viewport-relative units.

Images are intentionally NOT touched — the existing ImageLightbox
(extensions/image-view.tsx) already handles them correctly. The new
modal would be churn without user-visible benefit.

Adds i18n keys under attachment.* (en + zh-Hans) and registers
Preview/Download/Upload in the conventions glossary so future
translations stay consistent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(desktop): enable Chromium PDF viewer for attachment preview

Adds webPreferences.plugins: true to the main BrowserWindow so the
bundled Chromium PDFium plugin activates inside iframes — required for
the attachment preview modal's PDF dispatch. Default is false in Electron;
without it <iframe src=*.pdf> renders blank.

Security trade-off, accepted intentionally and documented inline:
  1. This window already runs with webSecurity: false + sandbox: false,
     so plugins: true does NOT meaningfully widen the renderer's attack
     surface beyond what is already accepted.
  2. The only PDFs that reach an iframe here are signed CloudFront URLs
     we ourselves issued; user-supplied URLs are routed through
     setWindowOpenHandler → openExternalSafely and cannot land in this
     renderer.
  3. Chromium's PDFium plugin is itself sandboxed and only handles
     application/pdf — no Flash/Java/other historical plugin surfaces.

If we ever tighten webSecurity / sandbox, the follow-up is to host the
PDF viewer in a dedicated BrowserView with plugins scoped to that view,
keeping the main renderer plugin-free.

Old desktop builds ship without the preview modal, so the Eye button
never appears and PDF preview is gated by the same release — zero
regression risk for users on stale clients.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 18:24:15 +08:00
Bohan Jiang
5db96b4007 fix(daemon): bypass Gemini folder-trust gate in headless mode (#2516) (#2523)
Gemini CLI's folder-trust feature throws FatalUntrustedWorkspaceError
(exit code 55) when the current workspace isn't in
`~/.gemini/trustedFolders.json` and the process is headless — no
interactive trust prompt is available. The daemon spawns gemini with
`-p` + `--yolo` in a freshly checked-out worktree that the user has
never trusted interactively, so every run with `security.folderTrust`
enabled fails after ~10s with exit status 55 and no useful output.

Default `GEMINI_CLI_TRUST_WORKSPACE=true` on the child env to short-
circuit `checkPathTrust` in gemini-core. This mirrors gemini-cli's
documented `--skip-trust` flag; the env var has been gemini's
documented headless escape hatch for the entire folder-trust feature
lifetime so the fix works on every gemini version that can produce
the crash. Callers that explicitly set the same key in cfg.Env win,
preserving the ability to opt back into the gate.

Co-authored-by: multica-agent <github@multica.ai>
2026-05-13 17:05:12 +08:00
Bohan Jiang
178cfb5008 fix(daemon): strip Windows chcp noise from runtime version (#2516) (#2521)
The gemini CLI's Windows shim emits `Active code page: 65001` (from
`chcp`) to stdout before the real version reaches `--version` output.
The daemon stored the raw concatenation as the runtime version, so the
runtime detail page rendered `Active code page: 65001 0.42.0` instead
of `0.42.0`.

Scan `<cli> --version` line by line and return the first line carrying
a semver-shaped token. Full strings like `2.1.5 (Claude Code)` or
`codex-cli 0.118.0` survive unchanged; unparseable output falls back to
the trimmed raw value.

Co-authored-by: multica-agent <github@multica.ai>
2026-05-13 16:58:14 +08:00
Bohan Jiang
51aa924124 feat(chat): support renaming chat sessions inline (#2522)
Adds a pencil icon next to the trash icon on each session row in the chat
dropdown. Clicking it turns the title into an inline editable input:
Enter / blur saves, Escape cancels.

Server: new PATCH /api/chat/sessions/{id} handler that updates the title
via the existing `UpdateChatSessionTitle` sqlc query, broadcasts a new
`chat:session_updated` WS event so other tabs / devices stay in sync, and
rejects blank titles. Frontend mutation is optimistic with rollback,
matching the existing delete-session pattern.

MUL-2110

Co-authored-by: multica-agent <github@multica.ai>
2026-05-13 16:57:34 +08:00
Bohan Jiang
384ddcbe65 fix(execenv): seed user-installed Codex skills into per-task CODEX_HOME MUL-1626 (#2519)
* fix(execenv): seed user-installed Codex skills into per-task CODEX_HOME

Codex is the only daemon runtime whose HOME is redirected — the daemon
sets CODEX_HOME to a per-task isolated directory so each task gets a
clean config slate without polluting ~/.codex/. Side effect: the codex
CLI never sees the user's `~/.codex/skills/` and tells the user no skill
was found.

Other runtimes (claude / copilot / opencode / pi / cursor / kimi / kiro)
don't have this issue: they leave HOME untouched and discover both
user-level skills (from ~/.<runtime>/skills) and workspace-assigned
skills (written to a workdir-local dotfile dir) natively. Codex is the
outlier.

Fix: in execenv.Prepare and execenv.Reuse, copy each subdirectory under
`~/.codex/skills/` into the per-task `codex-home/skills/` before writing
workspace-assigned skills. Workspace skills still win on sanitized-name
conflict; user-level installer symlinks (lark-cli style) are followed so
the per-task home gets real content rather than dangling links.

Closes #1922

Co-authored-by: multica-agent <github@multica.ai>

* fix(execenv): wipe per-task codex skills dir before each hydration

Without this, the Reuse path leaves two classes of stale state behind:

1. Round 1 seeded user skill `writing/drafts/stale.md`. Round 2 reuses
   the same workdir with workspace skill `Writing` assigned: seed
   stage skips user `writing` (reserved), workspace stage writes
   `SKILL.md` via MkdirAll + WriteFile but never clears the directory,
   so the round-1 user support files surface under the workspace
   skill — violating "workspace fully wins on name conflict" and
   potentially leaking user-level files into a workspace skill view.

2. User uninstalls a skill from ~/.codex/skills between two runs. The
   prior copy in codex-home/skills/<name>/ lingers, so the codex CLI
   keeps seeing the removed skill.

Fix: RemoveAll(codex-home/skills) at the start of hydrateCodexSkills,
then re-seed user skills and re-write workspace skills. On Prepare
this is a no-op (envRoot was already wiped); on Reuse it resets the
slate.

Added two regression tests covering both scenarios.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-13 16:35:03 +08:00
Naiyuan Qing
e8c2855746 fix(chat): collapse chat-done flicker via inline cache write (#2509)
* fix(chat): collapse chat-done flicker via inline cache write

The chat panel flickered at end-of-turn: live TimelineView unmounted →
short blank + scroll jump → persistent AssistantMessage finally appeared.

Root cause: chat:done's WS handler called setQueryData(pendingTask, {})
synchronously while invalidateQueries(messages) was an async refetch.
The render guard pendingAlreadyPersisted (chat-message-list.tsx:62-68)
expected the persisted message to already be in the messages cache
before pending cleared, but the sync/async ordering broke that guard.

Fix follows TkDodo's "combine setQueryData (active query) + invalidate
(others)" pattern. ChatDonePayload now carries the freshly-persisted
ChatMessage (id, content, elapsed_ms, created_at); the WS handler
writes it into chatKeys.messages BEFORE clearing pending. Same render
tick → AssistantMessage mounts before TimelineView unmounts → no
flicker. invalidate(messages) stays as a fallback for clients that
took the older code path or for content drift (redaction, etc.).

Also slim task:completed's chat branch — chat:done already wrote the
message and cleared pending; task:completed only refreshes the
cross-session pending aggregate that drives the FAB.

Field additions are all `omitempty` / TS `?:` so older clients ignore
them and older servers (no fields populated) fall back to invalidate-
only, preserving prior behavior.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>

* test(chat): cover chat done cache handoff

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>
Co-authored-by: Eve <eve@multica-ai.local>
2026-05-13 15:27:44 +08:00
Bohan Jiang
451c46c43f refactor(usage): rename Dashboard → Usage + dynamic per-agent leaderboard (#2511)
The page added in #2462 lived at `/{slug}/dashboard` and was titled
"Dashboard", which collides with the conventional meaning ("personal
landing surface") and doesn't tell new users what the page is for. Its
actual contents — token spend, cost, run time, task counts — map cleanly
onto the OpenAI / Anthropic / Vercel "Usage" surface, so rename to that.

Renames (user-visible)
- Route: `/{slug}/dashboard` → `/{slug}/usage` (web App Router + desktop
  memory router)
- Sidebar entry: label "Dashboard" / "看板" → "Usage" / "用量", icon
  LayoutDashboard → BarChart3 (page header icon swapped in sync)
- Page title in en/zh-Hans
- Reserved-slugs: add `usage` to workspace route segments group;
  `dashboard` stays reserved in the marketing group (back-compat against
  workspace slug collisions + keeps the name free for a future Home page)
- i18n namespace `dashboard` → `usage` across resources-types.ts,
  locales/index.ts, and the moved JSON files
- WORKSPACE_ROUTE_SEGMENTS in editor link-handler
- paths.workspace(slug).dashboard() → .usage(), with matching test
  expectation updates

Per-agent leaderboard polish (`packages/views/dashboard/components/
dashboard-page.tsx`)
- Card title "Cost & run time by agent" → "Leaderboard" with a 4-way
  Segmented control: Tokens / Cost / Time / Tasks
- Active metric drives row order, progress-bar width, and the
  emphasised column header / cell — keeping ranking, visual quantity,
  and column emphasis in lockstep so users always see what's being
  measured
- Default sort = Tokens (most universally meaningful; Cost still one
  click away)
- Project filter dropdown:
  - Show ProjectIcon next to the selected project + each list item;
    FolderKanban as the "All projects" fallback (matches ProjectPicker
    language)
  - alignItemWithTrigger={false} so "All projects" doesn't get pushed
    above the trigger and clipped when the header sits at the top of
    the viewport (was the root cause of "can't re-select All projects"
    once a project was selected)
  - max-h-72 to cap the dropdown when workspaces accrue many projects;
    matches the runtime-detail Select precedent
- Folder name `packages/views/dashboard/*` and `DashboardPage`
  component name intentionally left in place — user-visible rename
  only, no broad code refactor.

Old `/dashboard` routes are not redirected because the page only landed
in #2462 (a few days ago); no real users, external links, or
desktop-tab persistence have settled on it yet.
2026-05-13 14:07:53 +08:00
Multica Eve
ff27142b69 fix: treat empty output on successful completion as completed, not blocked (#2507)
When an agent completes successfully (exit 0) but produces no text
output, the daemon incorrectly classified it as 'blocked'. This is
wrong — agents can legitimately complete work via tool calls (posting
comments, pushing code) without emitting text output.

Change the empty-output path to return status=completed so the task
is correctly reported as successful.

Fixes MUL-2104

Co-authored-by: yushen <ldnvnbl@gmail.com>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-13 12:56:17 +08:00
Bohan Jiang
96695a79c5 feat(dashboard): workspace/project token + run-time dashboard MUL-1882 (#2462)
* feat(dashboard): workspace/project token + run-time dashboard

Add a `/{slug}/dashboard` page showing per-agent token spend and execution
time across the whole workspace, with an optional project filter.

Backend:
  - Three new sqlc queries against task_usage + agent_task_queue: daily
    usage, per-agent usage, per-agent total run-time. All optionally
    scoped to a project via sqlc.narg('project_id'), reaching project
    through the issue join.
  - Handlers under /api/dashboard return the same wire shape the runtime
    page already consumes (model preserved for client-side cost math).

Frontend: - Shared DashboardPage in packages/views/dashboard reusing KpiCard,
    DailyCostChart, ActorAvatar, and estimateCost from the runtime page
    so the visual style and pricing math stay in lock-step.
  - Period selector (7/30/90d), project dropdown, four KPI tiles
    (cost, tokens, run time, tasks), daily cost chart, and a combined
    "cost + run time by agent" list.
  - Routed in both web (app/[slug]/(dashboard)/dashboard) and desktop
    (memory router); sidebar nav entry added under Workspace group.
Co-authored-by: multica-agent <github@multica.ai>

* fix(dashboard): drop stale project filter and stop double-counting tasks

Two issues caught in PR #2462 review:

1. Project filter held the previous selection's UUID across workspace
   switches and project deletions: the dropdown gracefully showed
   "All projects" (because the title lookup missed) while the three
   dashboard queries kept forwarding the dead UUID, leaving the UI
   looking like a full-workspace view but populated with empty
   project-scoped data. Validate the picked UUID against the current
   projects list before passing it to the queries.

2. The "by agent" table read its task count from the token rollup,
   which is grouped per (agent, model). A single task that spans two
   models lands twice and the agent's row reads e.g. "2 tasks" when
   the real count is 1. Prefer `ListDashboardAgentRunTime`'s per-agent
   distinct count when available; fall back to the token aggregate
   only for agents with no terminal run yet (in-flight tasks).

Extract the merge into `mergeAgentDashboardRows` so the precedence
rules are unit-tested directly.

Co-authored-by: multica-agent <github@multica.ai>

* test(dashboard): allocate per-workspace issue.number explicitly

TestDashboardEndpoints creates two issues in the shared fixture
workspace. issue.number defaults to 0 (migration 020), and the table
carries UNIQUE (workspace_id, number), so the second insert raced the
first on the same default and failed in CI.

Allocate MAX(number) + 1 per insert so each row gets a fresh number
without stepping on rows other tests left behind in the same workspace.

Co-authored-by: multica-agent <github@multica.ai>

* feat(dashboard): rollup table + cron-driven aggregation for dashboard

Mirror the per-runtime rollup in `task_usage_daily` (migrations 073/077/082)
to remove the per-request raw aggregation the dashboard was doing.

Migration 084 adds:
  - `task_usage_dashboard_daily` keyed on
    (bucket_date, workspace_id, agent_id, project_id, model) — the
    dimensions the dashboard actually queries, with project_id nullable
    via UNIQUE NULLS NOT DISTINCT (PG15+) so "no-project" buckets
    upsert cleanly.
  - `task_usage_dashboard_rollup_state` watermark table.
  - `task_usage_dashboard_dirty` invalidation queue.
  - Triggers on agent_task_queue DELETE, task_usage DELETE, and
    issue.project_id UPDATE — the cases the updated_at watermark can't
    see. The project_id trigger re-attributes existing rollup rows when
    a user moves an issue across projects.
  - `rollup_task_usage_dashboard_daily_window(from, to)` —
    idempotent recompute primitive (same shape as 077).
  - `rollup_task_usage_dashboard_daily()` cron entry — own advisory
    lock (4244) so it serialises independently of the runtime rollup.
  - `task_usage_dashboard_rollup_lag_seconds()` health helper.

Sqlc queries `ListDashboardUsageDailyRollup` /
`ListDashboardUsageByAgentRollup` read from the new table; the handler
dispatches between rollup and raw on a separate
`UseDailyRollupForDashboard` config flag
(`USAGE_DASHBOARD_ROLLUP_ENABLED` env). Same fail-safe default (false →
raw) so operators can roll out independently of the per-runtime flag.

Bucket date is UTC (the dashboard aggregates across runtimes that may
sit in different tzs; there's no single correct local boundary).

Adds `cmd/backfill_task_usage_dashboard_daily` mirroring the existing
per-runtime backfill — operator runs it once before flipping the flag.

Tests: - TestDashboardEndpoints now also exercises the rollup read path
    (raw vs. rollup, same project-scoped totals).
  - TestDashboardRollupReattributesOnProjectChange verifies the
    issue.project_id trigger enqueues both old + new buckets and the
    next rollup tick zeroes the old project + populates the new one.
Co-authored-by: multica-agent <github@multica.ai>

* fix(dashboard-rollup): close two invalidation gaps

Two leak paths missed by migration 084 review:

1. Issue cascade DELETE — the atq BEFORE DELETE trigger runs AFTER the
   issue row is gone, so `LEFT JOIN issue` returns NULL project_id and
   the original-project bucket never gets cleared (issue 077 calls this
   out for the runtime rollup but didn't need to act on it). Adds an
   `issue BEFORE DELETE` trigger that enqueues using OLD.project_id
   while the issue row is still readable.

2. `LinkTaskToIssue` (quick-create task attaching to a real issue post-
   completion) UPDATEs `agent_task_queue.issue_id` from NULL to a real
   id. Migration 084 only watched DELETE on atq, so usage already
   rolled up under the no-project bucket stayed attributed to NULL
   forever. Extends the atq trigger to fire on UPDATE OF issue_id too,
   enqueueing both OLD (NULL project) and NEW (linked issue's project).

Tests: - TestDashboardRollupClearsOnIssueDelete asserts rollup row drops to
    zero after issue delete + rollup tick.
  - TestDashboardRollupReattributesOnLinkTaskToIssue verifies tokens
    move from the NULL bucket to the project bucket after the UPDATE.
Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-13 12:51:16 +08:00
Bohan Jiang
a02e58b488 fix(github): only auto-close issue after all linked PRs resolve (#2470)
* fix(github): only auto-close issue when all linked PRs have resolved

Previously, the webhook handler unconditionally moved an issue to `done`
as soon as a single linked PR was merged. If a second PR was also linked
to the same issue and still open / draft, the issue would close before
the work was actually finished.

Add `CountOpenSiblingPullRequestsForIssue` and gate the auto-status
transition on it: a merged PR advances its linked issues only when no
sibling PR linked to the same issue is still in flight. Issues stay put
while siblings are open or draft, and the merge that resolves the last
in-flight PR is the one that closes the issue.

Adds an integration test that opens two PRs against the same issue,
merges the first, asserts the issue stays in_progress, then merges the
second and asserts the issue advances to done.

Co-authored-by: multica-agent <github@multica.ai>

* fix(github): re-evaluate auto-close on closed-without-merge events too

GPT-Boy review on #2470: gating only the `state == "merged"` branch left
one ordering hole. PR-A merges first → issue stays in_progress because
PR-B is open; PR-B later closes WITHOUT merging → no event ever re-runs
the auto-close check, so the issue is stuck in_progress.

Generalise the trigger to every terminal PR event (`merged` or `closed`)
and advance the issue only when:
- the issue is not already terminal (done / cancelled);
- no sibling PR is still in flight (open / draft);
- at least one linked PR — current or sibling — actually merged.

Rule (3) preserves "user closed every PR without merging → leave the
issue alone": if no work was delivered, the user decides what to do.

Replace `CountOpenSiblingPullRequestsForIssue` with
`GetSiblingPullRequestStateCountsForIssue`, which returns both the
in-flight count and the merged count in a single roundtrip.

Adds `TestWebhook_ClosedSiblingAfterMerge` (the regression GPT-Boy
flagged) and `TestWebhook_AllClosedWithoutMerge` (the negative case
guarding rule 3). Refactors the multi-PR webhook helper out of the
existing two-merge test so all three multi-PR scenarios share it.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-12 15:39:55 +08:00
Bohan Jiang
caeb146bac feat(github): GitHub App integration for PR ↔ issue linking (#1817)
* feat(github): GitHub App backend for PR ↔ issue linking

- New tables: github_installation (workspace ↔ App install), github_pull_request (mirrored PR state), issue_pull_request (M:N link).
- Webhook handler verifies HMAC-SHA256, upserts PR rows, parses issue identifiers from PR title/body/branch and auto-links them. Merging a linked PR moves the issue to done.
- Connect/setup endpoints power the zero-config "Connect GitHub" install flow; state token is HMAC-signed so the setup callback can recover the workspace.
- Workspace-scoped admin routes for listing/disconnecting installations, plus a per-issue `pull-requests` list endpoint.

Co-authored-by: multica-agent <github@multica.ai>

* feat(github): UI for connecting GitHub and viewing linked PRs

- Settings → Integrations: new tab with Connect GitHub / installations list / disconnect, gated on the deployment having the App configured.
- Issue detail sidebar: Pull requests section showing linked PR title, repo, state (open/draft/merged/closed), and author, with deep link to GitHub.
- Real-time refresh: github_installation:* and pull_request:* events invalidate the matching TanStack Query caches.

Co-authored-by: multica-agent <github@multica.ai>

* fix(github): address review — null actor, role gating, configured guard, scoped uninstall broadcast

- listeners: use optionalUUID(e.ActorID) so the system actor on the github-driven issue:updated event no longer panics activity / notification listeners; merged-PR → issue done now produces a status_changed activity and inbox entry.
- IntegrationsTab: gate the admin-only installations query on canManage so members no longer hit /github/installations 403; the configured/not-configured copy is also scoped to admins.
- backend: introduce isGitHubConfigured() requiring both GITHUB_APP_SLUG and GITHUB_WEBHOOK_SECRET, and surface that single flag from list-installations + connect endpoints so the frontend Connect button stays disabled until both are set.
- DeleteGitHubInstallationByInstallationID now RETURNs workspace_id; webhook handler publishes github_installation:deleted scoped to the right workspace so already-open Settings tabs invalidate in real time. ErrNoRows on a re-fired delete short-circuits cleanly.
- tests: focused webhook integration coverage (auto-link + merge → done, cancelled preservation, uninstall returns workspace).

Co-authored-by: multica-agent <github@multica.ai>

* fix(github): i18n the new GitHub UI strings to satisfy lint

CI flagged every literal string in the Integrations tab, the Pull requests
sidebar section, and the per-PR row label. Move them through useT() and
add the matching `integrations.*` block to settings.json (en / zh-Hans)
plus `detail.section_pull_requests` / `detail.pull_request_state_*` /
loading + empty copy under `issues.json`.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-12 13:49:03 +08:00
Bohan Jiang
f08b2b4f50 fix(attachments): harden local sidecar serving and tighten Upload gate (#2459)
Follow-ups to #2444:

- ServeFile refuses keys ending in .meta.json so the sidecar JSON isn't
  a stable read API. Sits before any disk work so a crafted
  .meta.json sibling can't trigger an out-of-tree read.
- ServeFile rejects paths that resolve outside uploadDir (via
  filepath.Rel) before readLocalMeta runs. http.ServeFile's own ..
  guard fires later on r.URL.Path, but readLocalMeta would otherwise
  do a stray disk read on <some-path>.meta.json before the 400 lands.
- Upload only writes a sidecar when filename is non-empty. ServeFile
  only reads the filename anyway, so a content-type-only sidecar was
  dead disk weight.
- Drop the dead json.Marshal error branch — marshaling two strings
  cannot fail.

Three new tests cover sidecar suffix rejection, the traversal guard,
and the no-filename Upload short-circuit.

Co-authored-by: multica-agent <github@multica.ai>
2026-05-12 12:49:22 +08:00
Truffle
91bdec9a54 fix(attachments): preserve original filename on /uploads/* downloads (#2444)
LocalStorage.ServeFile delegated straight to http.ServeFile without
setting Content-Disposition, so downloads of local-storage attachments
landed on disk under the UUID-based storage key instead of the human
filename the uploader had chosen. The S3 backend already sets
Content-Disposition on PutObject (s3.go:186-187), so the local backend
was the only one losing the original filename — a sibling asymmetry
that's been there since multi-backend support landed.

Upload now writes a sidecar <key>.meta.json beside the data file
capturing the original filename and sniffed content type. ServeFile
reads the sidecar when present and sets Content-Disposition using the
existing sanitizeFilename + isInlineContentType helpers, mirroring the
S3 inline/attachment decision exactly. Uploads from before this lands
have no sidecar and fall through to the previous behavior. Delete now
removes the sidecar alongside the data file so the upload directory
doesn't grow orphans.

Closes #2442
2026-05-12 12:37:07 +08:00
Naiyuan Qing
86aa5199fc feat(chat): support attachments & images in chat input (#2445)
* docs(plans): chat attachment & image support implementation plan

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>

* feat(db): add chat_session_id/chat_message_id to attachment

Co-authored-by: multica-agent <github@multica.ai>

* feat(db): sqlc — chat_session_id on CreateAttachment + LinkAttachmentsToChatMessage

Co-authored-by: multica-agent <github@multica.ai>

* feat(file): upload-file accepts chat_session_id form field

Co-authored-by: multica-agent <github@multica.ai>

* feat(chat): SendChatMessage links uploaded attachments to the new message

Co-authored-by: multica-agent <github@multica.ai>

* feat(api): uploadFile accepts chatSessionId; sendChatMessage accepts attachmentIds

Co-authored-by: multica-agent <github@multica.ai>

* feat(core): useFileUpload supports chatSessionId context

Co-authored-by: multica-agent <github@multica.ai>

* feat(chat): support paste/drag/upload attachments in chat input

Co-authored-by: multica-agent <github@multica.ai>

* test(e2e): chat input attachment upload + send round-trip

Co-authored-by: multica-agent <github@multica.ai>

* chore(chat): keep lazy-created session title empty so untitled fallback localizes

Co-authored-by: multica-agent <github@multica.ai>

* fix(chat): address review — dedupe ensureSession + parse upload response

- chat-window: cache in-flight createSession promise in a ref so a file drop
  followed by a quick send no longer spawns two sessions (and orphans the
  attachment on the losing one).
- Attachment type + EMPTY_ATTACHMENT + AttachmentResponseSchema: include the
  new chat_session_id / chat_message_id fields the server now returns.
- uploadFile: route the response through parseWithFallback so a malformed
  body returns EMPTY_ATTACHMENT instead of an undefined-keyed Attachment,
  matching the API boundary rule.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>

* fix(chat): address PR #2445 review — test ctx, send gating, attachment surface

1. Backend test was 400ing because the handler reads workspace from
   middleware-injected ctx, and `newRequest` only sets the header. Helper
   `withChatTestWorkspaceCtx` mirrors the agent-access-test pattern and
   loads the member row + SetMemberContext before invoking the handler.

2. Attachment metadata now flows end-to-end:
   - new sqlc `ListAttachmentsByChatMessageIDs` (batch lookup, mirrors the
     comment-side query)
   - `chatMessageToResponse` takes `attachments` and `ChatMessageResponse`
     surfaces them — same shape as CommentResponse
   - `ListChatMessages` loads them via a new `groupChatMessageAttachments`
     helper so the chat bubble can render file cards
   - daemon claim path pulls `ListAttachmentsByChatMessage` for the latest
     user message and ships `ChatMessageAttachments` to the daemon
   - `buildChatPrompt` lists id+filename+content_type and instructs the
     agent to `multica attachment download <id>` — fixes the private-CDN
     expiring-URL problem where the markdown URL would have expired by
     the time the agent acts
   - TS `ChatMessage` gains an optional `attachments` field

3. Chat composer now blocks send while uploads are in flight:
   - `pendingUploads` counter increments in handleUpload, SubmitButton
     uses it to disable
   - handleSend also gates on `editorRef.current.hasActiveUploads()` to
     catch the Mod+Enter path that bypasses the button
   - new vitest covers the "drop large file → immediate send" scenario
     where attachment id would otherwise be silently dropped

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>

* chore: drop implementation plan doc

Process artefact, not something the repo needs to keep.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-12 10:57:54 +08:00
Bohan Jiang
63d215e1c3 feat(runtime): visibility (public/private) gate on CreateAgent / UpdateAgent (#2419)
* feat(runtime): visibility (public/private) gate on CreateAgent / UpdateAgent

Closes the hole where a plain workspace member could pick another member's
runtime in the Create Agent dialog and bind an agent to it — the backend
wasn't checking runtime ownership, so the agent ran on someone else's
hardware / tokens. Reported on GH #1804.

Schema
- Migration 083 adds agent_runtime.visibility ('private' default, 'public')
  with a CHECK constraint. Existing rows default to private — same
  ownership semantics as before, no behavior change for legacy data.

Backend
- canUseRuntimeForAgent predicate: allow when caller is workspace
  owner/admin, the runtime owner, or the runtime is public.
- CreateAgent and UpdateAgent both gate on it: UpdateAgent matters because
  a plain member could otherwise create on their own runtime, then re-bind
  to a private one.
- PATCH /api/runtimes/:id accepts { visibility } — owner/admin only,
  validated against the same private/public allow-list.

Frontend
- Create-agent dialog renders other-owned private runtimes disabled with a
  Lock badge + tooltip explaining who to ask.
- Inspector runtime-picker disables the same set so re-binding fails
  the same way at the UI layer.
- Runtime detail diagnostics gains a Visibility editor (owner/admin) or
  read-only chip (everyone else).
- Runtime list shows a private/public chip next to the name.

Tests
- Go: canUseRuntimeForAgent truth table; CreateAgent / UpdateAgent
  end-to-end gate tests (admin / runtime owner / plain member);
  PATCH visibility owner / admin / member / invalid-value coverage.
- Vitest: create-agent dialog disabled state on private/public runtimes,
  default-runtime selection skips locked rows; runtime detail visibility
  editor → mutation, read-only fallback.

Migrating runtimes: existing rows default to private to preserve the
"owner only" status quo. Owners switch to public via the detail page
diagnostics card.

Co-authored-by: multica-agent <github@multica.ai>

* fix(runtime): apply timezone+visibility atomically; don't seed locked template runtime

Two issues surfaced in review of MUL-2062:

1. PATCH /api/runtimes/:id ran the timezone branch first, which:
   - returned early on a tz no-op, silently dropping a concurrent
     `visibility` patch in the same body;
   - committed the timezone mutation (+ usage rollup rebuild) before
     validating visibility, so an invalid visibility left the row
     half-updated.

   Validate every field first, then run the mutations in order. The
   no-op short-circuit now only triggers when nothing else is requested.

2. The Create Agent dialog in duplicate mode unconditionally seeded
   `template.runtime_id` as the selected runtime, even when that runtime
   is now private and owned by someone else — the user saw a selected
   row they couldn't submit (Create → backend 403). Fall back to the
   first usable runtime when the template's runtime is locked, and gate
   the Create button on `selectedRuntimeLocked` as defense in depth.

Tests:
- Go: TestUpdateAgentRuntime_CombinedPatchAppliesBoth (tz no-op +
  visibility flip), TestUpdateAgentRuntime_InvalidVisibilityDoesNotMutateTimezone
  (atomic-fail invariant).
- Vitest: duplicate template pointing at a locked runtime now seeds
  the first usable one; Create button stays disabled when no usable
  alternative exists.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-11 22:53:07 +08:00
Bohan Jiang
046e4b1efa fix(execenv): switch every provider's Windows reply template to --content-file (#2411)
Three user reports converge on the same Windows-shell encoding bug:

- #2198 / #2236 — Chinese, Codex on Win11. Comments / descriptions
  generated by the agent arrive as `?`.
- #2376 — Cyrillic, non-Codex agent ("Ops Lead") on Win11 Desktop.
  Title preserved (argv → CreateProcessW UTF-16), description / agent
  reply garbled (stdin → shell-codepage re-encoding).

woodcoal's independent diagnosis on #2198 confirms the root cause:
Windows PowerShell 5.1's `$OutputEncoding` defaults to ASCIIEncoding
when piping to a native command, so non-ASCII bytes are silently
replaced with `?` before they reach `multica.exe`. The CLI's stdin
parsing is fine; the bytes are corrupted upstream, in the agent's
shell layer.

This PR ships the fix that supersedes the codex-only attempt in
PR #2265 (which is closed in favour of this one):

## CLI

Add `--content-file <path>` to `multica issue comment add` and
`--description-file <path>` to `multica issue {create,update}`. The
CLI reads bytes off disk via `os.ReadFile` and skips the shell
entirely; UTF-8 survives end-to-end regardless of `$OutputEncoding`
or `chcp`. The three input modes (`--content`, `--content-stdin`,
`--content-file`) are mutually exclusive.

## Runtime config

`buildMetaSkillContent`'s Available Commands section is rewritten as a
neutral three-mode menu. The previous unconditional "MUST pipe via
stdin" / `--description-stdin` mandate (over-spread from #1795 /
#1851's Codex-multi-line fix) is gone for non-Codex providers; the
strong directive now lives only in the Codex-Specific section, which
branches on host:

- Codex / Linux+macOS: `--content-stdin` + HEREDOC (preserves MUL-1467
  fix against codex's literal `\n` habit).
- Codex / Windows: `--content-file` (PowerShell ASCII pipe is the
  exact bug we're patching).

## Per-turn reply template

`BuildCommentReplyInstructions` now takes a provider arg and branches
provider × OS:

- Windows + any provider → `--content-file` (the bug is shell-layer,
  not provider-layer; #2376 shows non-Codex agents on Windows also
  hit it). All providers write a UTF-8 file with their file-write tool
  and post via `--content-file ./reply.md`.
- Linux/macOS + Codex → stdin/HEREDOC (MUL-1467 protection).
- Linux/macOS + non-Codex → lightweight pre-#1795 inline
  `--content "..."`. The CLI server-side decodes `\n`, so escaped
  multi-line works; the agent retains stdin / file as escape hatches
  for richer formatting.

`BuildPrompt` and `buildCommentPrompt` gain a `provider` arg;
`daemon.runTask` already has it in scope.

## Tests

- `TestResolveTextFlag` — file-source verbatim with non-ASCII
  (`标题 / Заголовок / 中文段落`), missing-file error, empty-file
  rejection, three-way mutual exclusion.
- `TestInjectRuntimeConfigAvailableCommandsIsNeutral` — every
  non-Codex provider × {linux, darwin, windows} pins the three-mode
  menu present + over-spread "MUST stdin" substrings absent.
- `TestInjectRuntimeConfigCodexLinuxEmphasizesStdin` +
  `TestInjectRuntimeConfigCodexWindowsUsesContentFile` — Codex
  section's per-OS branch.
- `TestBuildCommentReplyInstructionsCodexLinux` +
  `TestBuildCommentReplyInstructionsNonCodexLinux` +
  `TestBuildCommentReplyInstructionsWindowsUsesContentFile` — the
  reply-template provider × OS matrix.
- `TestInjectRuntimeConfigWindowsCommentTriggerHasNoStdin` — end-to-end
  AGENTS.md / CLAUDE.md on Windows has no prescriptive stdin
  directive, for claude / codex / opencode.

`go test ./...` and `go vet ./...` clean.

Closes #2198, #2236, #2376.

Co-authored-by: multica-agent <github@multica.ai>
2026-05-11 17:05:45 +08:00
Kagura
702c48209b fix(agent): stop filtering Pi extension tools via hardcoded --tools allowlist (#2379) (#2381)
The Pi backend hardcoded `--tools read,bash,edit,write,grep,find,ls` in
buildPiArgs. Pi's SDK treats --tools as a restrictive allowlist: only the
listed tools pass through `_refreshToolRegistry()`, silently filtering
out any user-installed extension tools registered via `pi.registerTool()`.

Omitting --tools makes Pi's `allowedToolNames` undefined, so the
`isAllowedTool()` filter becomes a no-op and all tools — built-in and
extension — are available. This matches Pi's standalone behavior.

Users who want to restrict tools can still pass --tools via custom_args
(it is not in piBlockedArgs).

Closes #2379
2026-05-11 16:11:32 +08:00
Bohan Jiang
fae8558263 fix(daemon): self-heal when a runtime is deleted server-side (#2404)
Closes #2391.
2026-05-11 16:09:40 +08:00
Bohan Jiang
f5c2994aed feat(workspace): revoke a member's runtimes when they leave or are removed (#2401)
* feat(workspace): revoke a member's runtimes when they leave or are removed

Previously, leaving or being removed from a workspace only deleted the
member row — every runtime the departed user owned in that workspace
remained in the DB, kept its daemon_token valid, and stayed reachable to
the workspace's other members. The departed user lost access but their
machine kept doing work.

This change converges the runtime state in the same transaction as the
member-row deletion: agents pinned to those runtimes are archived,
in-flight tasks are cancelled (so the daemon's per-task status poller
interrupts the running agent gracefully), the runtimes are forced
offline, and the daemon_token rows are deleted. After commit the
DaemonTokenCache is invalidated and agent:archived / daemon:register
events fire so connected clients reconcile immediately.

Server-side state convergence is the production safety net; the
daemon_token revoke takes effect once the mdt_ flow is live (today most
daemons fall back to PAT/JWT, and the member-row deletion is what stops
those requests via requireWorkspaceMember).

Daemon-side handling (recognising the resulting 401/404 and tearing down
the local pairing for that workspace) lands in a follow-up.

Co-authored-by: multica-agent <github@multica.ai>

* fix(workspace): also cancel tasks for archived agents on member revoke

CancelAgentTasksByRuntime only matched tasks whose runtime_id was in the
revoked set, missing a real path: agent.runtime_id can be reassigned via
UpdateAgent, but agent_task_queue.runtime_id keeps the value from when
the task was queued. So an agent currently bound to the leaving member's
runtime gets archived correctly, but its older tasks still pinned to a
prior runtime stay 'queued' — and ClaimAgentTask does not gate on
agent.archived_at, so those orphaned tasks remain claimable by the
prior runtime.

Replace CancelAgentTasksByRuntime with CancelAgentTasksByRuntimeOrAgent,
which OR-matches runtime_ids and the archived agent IDs in one UPDATE.
Pass the archived agent IDs through from revokeAndRemoveMember.

Adds TestDeleteMember_CancelsTasksFromAgentReassignment as a regression
guard: same agent, two runtimes, the older task on the surviving runtime
must end up cancelled while the surviving runtime stays online.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-11 15:06:50 +08:00
Bohan Jiang
02310d083e docs(util): clarify EnsureHiddenConsole call-order contract (#2399)
Co-authored-by: multica-agent <github@multica.ai>
2026-05-11 14:45:54 +08:00
Kagura
fb026f2607 fix(daemon): suppress git console windows on Windows (#2358)
* fix(daemon): suppress git console windows on Windows

Apply the same HideConsoleWindow pattern used for agent processes
(PR #1474) to all git commands spawned by the daemon's repo-cache,
execenv, and GC packages. Each exec.Command now calls
util.HideConsoleWindow(cmd) which sets CREATE_NEW_CONSOLE + HideWindow
so grandchildren inherit a hidden console instead of flashing visible
console windows.

Closes #2357

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>

* refactor: use EnsureHiddenConsole at daemon startup

Replace per-site HideConsoleWindow(cmd) calls with a single
EnsureHiddenConsole() invoked once at daemon startup. The daemon
now owns a hidden console that every child process (git, cmd /c
mklink, etc.) inherits automatically, eliminating the need for
per-call SysProcAttr configuration.

This also covers the previously missed exec.Command in
codex_home_link_windows.go (cmd /c mklink) which never had a
HideConsoleWindow call.

Signed-off-by: kagura-agent <kagura.agent.ai@gmail.com>

---------

Signed-off-by: kagura-agent <kagura.agent.ai@gmail.com>
Co-authored-by: Claude Opus 4 (1M context) <noreply@anthropic.com>
2026-05-11 14:41:07 +08:00
Multica Eve
d6349c16ec feat(runtime): per-runtime timezone for token-usage aggregation (MUL-1950) (#2394)
* feat: per-runtime timezone for token usage aggregation

The runtime token-usage charts (daily and hourly tabs on the
runtime-detail page) bucketed every event by the Postgres session
timezone, which is UTC in production. For an operator in UTC+8 that
meant a Tuesday afternoon's tasks landed in Tuesday early-morning's
bar — the chart was always one off.

Fix: store an IANA timezone on agent_runtime and aggregate under it.

* migrations 081 / 082 add agent_runtime.timezone (TEXT NOT NULL
  DEFAULT 'UTC') and rebuild the rollup pipeline (window function
  and both trigger functions) to compute bucket_date with
  AT TIME ZONE rt.timezone instead of bare DATE().
* No historical backfill — task_usage_daily rows already on disk
  keep their UTC bucket_date; only future writes / re-touches
  recompute under the new tz. (Product call from MUL-1950: 'guarantee
  future correctness'.)
* runtime_usage.sql gains a @tz parameter on ListRuntimeUsage and
  GetRuntimeUsageByHour and threads tz through GetRuntimeTaskHourly  Activity. ListRuntimeUsageDaily reads bucket_date as-is since the
  rollup already wrote it in tz.
* parseSinceParamInTZ replaces the raw N×24h cutoff with start-of-
  day-N in the runtime's tz so 'last 7 days' lines up with bucket
  boundaries.
* Daemon registration sends the host's IANA tz (TZ env, then
  time.Local), and UpsertAgentRuntime preserves any user override
  via a CASE-on-existing-value pattern so a daemon reconnect can't
  silently revert the operator's setting.
* New PATCH /api/runtimes/:id endpoint (UpdateAgentRuntime) lets
  the runtime detail page edit the tz; the editor seeds with the
  browser tz on first interaction.

Refs: MUL-1950

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

* fix: harden runtime timezone rollups

Co-authored-by: multica-agent <github@multica.ai>

* fix: address runtime timezone review nits

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: Eve <eve@multica.ai>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>
Co-authored-by: Eve <eve@multica-ai.local>
2026-05-11 14:39:35 +08:00
Multica Eve
e79ffc0f01 fix(agent): expand Copilot CLI model catalog with correct dotted IDs (#2336)
* fix(agent): expand Copilot CLI model catalog with correct dotted IDs

The Copilot CLI provider only exposed two models in the runtime
dropdown, and one of them used the dashed legacy form
`claude-sonnet-4-6` which `copilot --model` rejects with
"Model ... is not available". The CLI accepts dotted IDs
(e.g. `claude-sonnet-4.6`, `gpt-5.4`).

Sync `copilotStaticModels()` with the official supported-models
catalog so the dropdown surfaces the full set the user's account
can route to (8 OpenAI + 4 Anthropic), and add a regression test
that pins the expected IDs and bans the dashed form.

Closes MUL-1948.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

* feat(agent): dynamic Copilot model discovery via ACP session/new

The previous static catalog could only ever lag behind the user's
real entitlements and what GitHub ships. Copilot CLI exposes the
live catalog through its ACP server (`copilot --acp`): the
`session/new` response includes `models.availableModels` plus
`currentModelId`, scoped to the authenticated account.

Wire copilot through the existing discoverACPModels helper —
already used by hermes/kimi/kiro — so the dropdown reflects the
account's real catalog, including the `auto` entry and per-tier
model availability (Pro / Pro+ / Enterprise / evaluation models).

The Copilot CLI puts itself into ACP server mode via the `--acp`
flag instead of an `acp` subcommand, so acpDiscoveryProvider now
takes an optional acpArgs override.

Copilot's ACP payload omits the vendor name, so a small
prefix-based inferCopilotProvider keeps the UI's openai /
anthropic / google grouping working.

When the binary is missing or auth fails, fall back to
copilotStaticModels() so self-hosted runtimes without a copilot
install still see a populated dropdown.

Verified against `copilot 1.0.44`: live discovery returns 13
models with gpt-5.5 marked Default. Closes MUL-1948.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

* fix(agent): drop no-op COPILOT_ALLOW_ALL env and generalize OpenAI o-series prefix check

- discoverCopilotModels: remove COPILOT_ALLOW_ALL=1 (not a real
  Copilot CLI env var; copy-pasta from HERMES_YOLO_MODE=1).
  Discovery only drives initialize + session/new which never
  trigger tool-permission prompts, so no extra env is needed.
- inferCopilotProvider: replace the o1/o3/o4 prefix chain with a
  generic o<digit>+ check via isOpenAIReasoningSeriesID, so future
  o5/o6/… reasoning models are tagged as openai automatically.
  Guards against false positives like 'opus-…' or bare 'o'.
- Extend TestInferCopilotProvider with o5/o6 forward-compat cases
  and negative cases (opus-fake, omni, o).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: Eve <eve@multica-ai.local>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-11 14:36:43 +08:00
Multica Eve
72e89a74f3 fix: surface copilot failure details (#2396)
Co-authored-by: Eve <eve@multica-ai.local>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-11 14:08:33 +08:00
Naiyuan Qing
a49222f37b fix(realtime): allow same-origin WebSocket (mobile/CLI) (#2395)
* fix(realtime): allow same-origin WebSocket clients (mobile/CLI)

The previous CheckOrigin implementation (PR #2318) bypassed the Origin
check whenever the request URL carried `client_platform=mobile` and no
browser session cookie. That contract requires every native client to
remember to add a query parameter — and in practice mobile clients hit
ws://localhost:8080/ws with no extra params, so the Origin filled by
the WebSocket library (the server's own host) gets rejected.

Replace the platform-specific bypass with same-origin acceptance: if
Origin's host equals the request Host, allow the upgrade. This is
gorilla/websocket's default CheckOrigin behavior, restored alongside
the existing cross-origin allowlist (for browser web/desktop clients).

Native clients are now zero-config. CSRF defense is unaffected:
SameSite=Strict cookies, the multica_csrf token, workspace membership
check, and the allowlist itself remain in place. Browser CSWSH attacks
fail both same-origin (browser forces Origin = page origin, not the
server's Host) and allowlist checks.

Refs: https://pkg.go.dev/github.com/gorilla/websocket
      https://cheatsheetseries.owasp.org/cheatsheets/WebSocket_Security_Cheat_Sheet.html

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>

* fix(realtime): use case-insensitive Host comparison for same-origin

HTTP host is case-insensitive (RFC 7230 §2.7.3), and gorilla/websocket's
default checkSameOrigin uses equalASCIIFold(u.Host, r.Host). The plain
== comparison would reject legitimate same-origin requests with a
case-mismatched Host header (e.g. Host: LOCALHOST:8080 vs
Origin: http://localhost:8080).

Switch to strings.EqualFold and cover the case with a regression test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-11 13:42:42 +08:00
Bohan Jiang
b26f850d4e feat(agents): gate private-agent surfaces with allowed_principals predicate (#2359)
* feat(agents): gate private-agent surfaces with allowed_principals predicate

Tighten chat/@-mention, history, edit, and delete entry points so private
agents are only reachable by their owner or workspace owner/admin. Agent-to-
agent traffic still bypasses the gate so A2A collaboration keeps working.

- New canAccessPrivateAgent predicate in handler/agent_access.go; used by
  comment.enqueueMentionedAgentTasks (replacing the inline check), GetAgent,
  ListAgents (filter), ListAgentTasks, GetWorkspaceAgentRunCounts /
  Activity30d / TaskSnapshot (workspace-wide aggregations no longer leak
  private-agent existence + counts), chat.CreateChatSession,
  chat.SendChatMessage (re-checks on every send so role changes can't leave
  a stale session as a back-door), and autopilot.shouldSkipDispatch
  (caller = autopilot creator).
- allowed_principals is computed inline as {agent.owner_id} ∪ workspace
  owner/admin members. No new table — manual config is intentionally not
  exposed in v1; the predicate is the extension seam.
- Front-end agent detail page distinguishes 403 (private agent the caller
  can't access) from 404 (deleted/missing) and renders a "no access"
  placeholder with a back-to-agents button.
- Go tests cover the pure predicate matrix + the four protected surfaces;
  vitest passes for the affected views.

Co-authored-by: multica-agent <github@multica.ai>

* feat(agents): gate issue assignment with the private-agent predicate

Refactor validateAssigneePair to call the shared canAccessPrivateAgent
helper. This closes the back door where a plain member could assign a
private agent to an issue and let normal task dispatch run it, side-
stepping the chat / @-mention gate. Agent callers (X-Agent-ID) bypass
so A2A delegation onto a private assignee still works.

Add an integration test covering all three callers (workspace owner,
agent owner, plain member).

Co-authored-by: multica-agent <github@multica.ai>

* fix(agents): close three private-agent gate bypasses found in PR review

1. X-Agent-ID forgery (resolveActor): require X-Task-ID alongside
   X-Agent-ID before trusting the agent identity. Without this a plain
   workspace member could set X-Agent-ID to any visible agent UUID and
   short-circuit the gate to "actor=agent, allow". Daemons already
   pair the two headers, so legitimate A2A traffic is unaffected.

2. Chat history read path (chat.go): GetChatSession / ListChatMessages /
   GetPendingChatTask / MarkChatSessionRead now go through a new
   gateChatSessionForUser helper that re-applies canAccessPrivateAgent
   after the ownership check, so a session creator whose role was later
   downgraded loses transcript access. ListChatSessions and
   ListPendingChatTasks filter their result sets by the same predicate.

3. Cross-workspace @mention (comment.enqueueMentionedAgentTasks):
   resolve the mentioned agent via GetAgentInWorkspace scoped to the
   issue's workspace so a UUID belonging to a different workspace's
   private agent can't slip past the gate (the gate was being applied
   against the current workspace's role table, which is the wrong
   one).

Regression tests cover each bypass, plus an update to the resolveActor
unit test to reflect the new "X-Agent-ID without X-Task-ID falls back
to member" contract.

Co-authored-by: multica-agent <github@multica.ai>

* test(handler): seed X-Task-ID alongside X-Agent-ID in existing agent-caller tests

After tightening resolveActor to require both headers (X-Agent-ID +
X-Task-ID) for the "agent" actor identity, three existing tests that
set only X-Agent-ID started failing because their requests now resolve
to "member" instead of "agent". Add createHandlerTestTaskForAgent
helper and seed a task per agent-caller assertion. Also patch
TestAgentExplicitMentionStillTriggers — it still passed only because
the @mention path doesn't care about author type for member callers,
but the test claims to exercise the agent path, so make it faithful.

Co-authored-by: multica-agent <github@multica.ai>

* test(handler): finish X-Task-ID seeding + fix cross-workspace mention test schema

The previous CI run still failed in two places:

1. server/cmd/server integration tests — postCommentAsAgent → authRequestWithAgent
   only set X-Agent-ID, so resolveActor downgraded the request to "member"
   and the on_comment chain produced the wrong task counts. Fix:
   authRequestWithAgent now also sets X-Task-ID, fetched or seeded by a new
   ensureAgentTask(agentID) helper.

2. TestMentionAgent_RejectsCrossWorkspaceAgentUUID's hand-crafted comment
   INSERT was missing comment.workspace_id, which migration 025 made
   NOT NULL. Pass testWorkspaceID into the seed row.

Build + vet clean locally; both packages compile.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-11 12:39:45 +08:00
Bohan Jiang
39e57b870f fix(cli): allow --mode run_only on autopilot create/update (#2360)
* fix(cli): allow --mode run_only on autopilot create/update

The autopilot run_only dispatch path is wired end-to-end (handler accepts
the mode, AutopilotService.dispatchRunOnly enqueues a task with
AutopilotRunID, daemon resolves workspace via autopilot_run -> autopilot
in ClaimTaskByRuntime and TaskService.ResolveTaskWorkspaceID). The CLI
guard was added before those fixes landed and never removed.

Drop the CLI rejection on both create and update so callers can pick the
same modes the API and UI already support, and remove the stale "unstable"
callout from the autopilots docs.

Closes multica-ai/multica#2347

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): advertise autopilot run_only in agent runtime instructions

The runtime config injected into AGENTS.md / CLAUDE.md only listed
`--mode create_issue` for autopilot create and didn't expose `--mode` on
update at all. So even after the CLI guard was lifted, agents reading
their harness instructions would still believe create_issue was the only
choice — undermining the "agents operate the same surface as humans"
intent.

Update both lines to advertise create_issue|run_only on create and on
update, and add an InjectRuntimeConfig assertion so the runtime prompt
can't drift away from the CLI surface again.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-10 14:12:34 +08:00
Bohan Jiang
15c3886302 docs(daemon): refresh stale comment for inline system prompt path (#2362)
The inline path now carries the full runtime brief (CLI catalog,
workflow steps, persona, skills, project context) rather than just
identity/persona instructions, after #2353 / #2355. The pre-existing
comment still described it as "identity/persona instructions inline",
which would mislead future maintainers about why the inline payload is
load-bearing.

Also call out kiro/kimi alongside openclaw/hermes since they were added
to providerNeedsInlineSystemPrompt in #2328, and document the concrete
failure mode (issues stuck in todo) so the rationale is searchable.

Co-authored-by: multica-agent <github@multica.ai>
2026-05-10 14:00:08 +08:00
Kagura
a6968c7485 fix(daemon): inline runtime brief for providers that need system prompt (#2355)
InjectRuntimeConfig writes the full meta skill content (CLI catalog,
workflow instructions, project context, skills) to workdir/AGENTS.md,
but providers like OpenClaw, Hermes, Kiro, and Kimi read bootstrap
files from their own agent workspace — not the task workdir. The
inline system prompt path (providerNeedsInlineSystemPrompt) only
passed the agent persona instructions, so these providers never
received the runtime brief.

Have InjectRuntimeConfig return the rendered content so the daemon can
both write it to disk (for file-reading providers) and pass it inline
(for workspace-isolated providers). This avoids double-rendering and
keeps the file and inline payloads identical.

Fixes #2353
2026-05-10 13:57:05 +08:00
Bohan Jiang
b73a301bf9 fix(agent): drain stderr before deciding ACP failure promotion (#2333)
`hermes`, `kimi`, and `kiro` all wired stderr through
`cmd.Stderr = io.MultiWriter(logWriter, providerErrSniffer)`.
The OS-pipe → MultiWriter copy goroutine that exec spawns for
that form is only joined by `cmd.Wait()`, which the lifecycle
goroutine fires in deferred cleanup — *after*
`promoteACPResultOnProviderError` already consulted the sniffer.
When stopReason=end_turn (success) raced ahead of the stderr
drain, the sniffer's `lines` slice was empty, the helper fell
through to the synthetic agent-text fallback ("hermes provider
error: API call failed after 3 retries"), and the actionable
upstream signal (HTTP 429 / usage limit) was lost.

This was visible as a flaky
`TestHermesBackendPromotesProviderErrorWithNonEmptyOutput` in CI
under high parallelism — a real prod bug, not a test issue: live
runs hit the same race when an upstream LLM returns 429 and
hermes' synthetic agent turn beats the stderr drain to the
parent.

Replace the MultiWriter wiring with `cmd.StderrPipe()` + an
explicit copier goroutine that signals on `stderrDone`. The
lifecycle goroutine already awaits `<-readerDone` for stdout;
add `<-stderrDone` next to it before `promoteACPResultOnProviderError`
runs. The deferred `cmd.Wait()` ordering is unchanged — it just
becomes a cheap reap by the time it fires.

Verified: `go test ./pkg/agent/ -run "TestHermes|TestKimi|TestKiro"
-count=10 -race`, then full package `-count=3 -race`, all green.

Co-authored-by: multica-agent <github@multica.ai>
2026-05-09 17:34:25 +08:00
Bohan Jiang
d713b57072 fix(daemon): add kiro and kimi to providerNeedsInlineSystemPrompt whitelist (#2328)
Kiro and Kimi share Hermes' ACP architecture and already accept
SystemPrompt prepended in front of the user prompt (kiro.go:244-247,
kimi.go:256-257). Without daemon-side opt-in, ExecOptions.SystemPrompt
is never set, so per-task agent identity instructions are lost in
deployments that rely on inline injection (e.g. K3 Lens-style
daemon → wrapper → docker compose exec acp).

Co-authored-by: multica-agent <github@multica.ai>
2026-05-09 16:54:27 +08:00
LinYushen
f70105fb12 fix(agent): include JSON-RPC error data field in ACP error messages (#2327)
ACP backends (Kiro, Hermes, Kimi) put the actionable reason for
code=-32603 'Internal error' in the JSON-RPC `data` field, e.g.
"No session found with id". The wrapped Go error only carried
`code` and `message`, leaving operators staring at a bare
"kiro session/prompt failed: session/prompt: Internal error
(code=-32603)" with no way to tell apart session expiry, model
unavailability, lost auth, or quota.

Parse `data` too. Strings render unquoted; objects/arrays render
as raw JSON; null/missing keeps the previous format unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-09 16:19:57 +08:00
Bohan Jiang
c57546159d fix(daemon): mark provider 429 / out-of-credit agent runs as failed, not completed (#2323)
* fix(daemon): mark provider 429 / out-of-credit runs as failed, not completed

Two bugs combined to silently report failed agent runs as
"Completed" in the UI when the upstream LLM returned a 4xx (e.g.
HTTP 429 rate-limit / no credit on the account).

1. ACP backends (hermes, kimi, kiro) only promoted the run status to
   "failed" when their stderr sniffer fired AND the agent output
   buffer was empty. But hermes injects a synthetic agent text turn
   ("API call failed after 3 retries: HTTP 429...") on retry
   exhaustion, so the buffer was never empty in the rate-limit
   case and the promotion never ran. Drop the empty-output
   precondition: the sniffer's regex (HTTP-status markers, named
   error types) is specific enough to trust on its own.

2. The daemon's task-result switch only routed "blocked" through
   FailTask; every other status — including "cancelled", and any
   future status we forget to enumerate — fell through to
   CompleteTask. Invert it so only an explicit "completed" status
   reports success, and extract the switch into reportTaskResult
   for direct testing. Cancelled now defaults to failure_reason
   "cancelled" instead of being silently completed.

Closes GitHub multica#1952.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agent): only promote ACP run to failed on terminal provider error

Address GPT-Boy's review on the multica#1952 fix. The previous
promotion rule ("any sniffer line → fail") was too broad: the
existing sniffer also captures transient per-attempt warnings
("API call failed (attempt 1/3): RateLimitError [HTTP 429]"), and
those lines stay in the buffer for the rest of the run. A retry
sequence whose first attempt blipped but whose third attempt
succeeded would have been wrongly reported as failed.

Tighten the criteria with two additional signals, both defined on
the existing acpProviderErrorSniffer / output buffer:

- acpTerminalErrorRe — sticky `terminal` flag set when stderr shows
  an exhausted/non-retryable marker (, [ERROR], "after N retries",
  Non-retryable, BadRequestError, AuthenticationError). Per-attempt
  warnings deliberately don't match.
- acpAgentOutputTerminalRe — matches the synthetic "API call failed
  after N retries..." turn that hermes-style adapters inject into
  the agent text stream when they give up; this catches multica#1952
  even if hermes' stderr only logged transient attempts.

Promotion logic becomes a shared helper, promoteACPResultOnProviderError,
called from hermes / kimi / kiro. Promotes when (a) terminalMessage
is non-empty, (b) output contains the synthetic give-up turn, or
(c) output is empty and the sniffer captured anything at all
(preserves the original empty-output safety net for transient-only
sequences with no real result to fall back on).

Tests:
- TestHermesProviderErrorSnifferTerminalVsTransient — transient
  attempt 1/3 alone returns terminalMessage="" but message!="";
  a follow-on terminal marker flips terminal on.
- TestHermesProviderErrorSnifferTerminalNonRetryable — confirms
  BadRequest / Authentication / Non-retryable /  / [ERROR] are
  classified terminal even on the very first attempt.
- TestHermesBackendDoesNotPromoteOnTransientRetry — fake hermes
  emits attempt 1/3 to stderr then a normal agent text turn and
  end_turn; resulting Status must stay "completed".

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-09 16:13:12 +08:00
Bohan Jiang
003dfd9b4b feat(quick-create): add project picker that remembers last pick (#2321)
* feat(quick-create): add project picker that remembers last pick

Quick-create users targeting one project repeatedly had to restate "in
project X" in every prompt. The modal now exposes a project picker beside
the agent picker, persists the selection per-workspace, and pins the
agent's `multica issue create` invocation to that project so the prompt
text doesn't have to.

The picked project also flows to the daemon as ProjectID/ProjectTitle and
its github_repo resources override the workspace repo fallback — same
treatment issue-bound tasks already get.

Co-authored-by: multica-agent <github@multica.ai>

* fix(quick-create): move project picker into property pill row

Reviewer feedback: the picker felt out of place wedged next to the agent
header. Move it into a property toolbar row above the footer, reusing the
shared `ProjectPicker` + `PillButton` so its placement and styling line up
exactly with the manual create panel.

This also drops the bespoke dropdown / aria / label strings that were only
needed while the picker rendered inline beside "Created by".

Co-authored-by: multica-agent <github@multica.ai>

* fix(quick-create): clear stale persisted project + carry across mode switch

Two review-blocking bugs in PR #2321:

1. The stale-id sweep in AgentCreatePanel only fired when projects.length > 0
   and only cleared local state, leaving lastProjectId pointing at a deleted
   project. The next open re-seeded the dead UUID and submit hit the server's
   `project not found` rejection. Gate on the query's `isSuccess` so we can
   tell "loading" apart from "loaded as empty", and clear both local state
   and the persisted preference when the selection isn't in the resolved list.

2. ManualCreatePanel's switchToAgent dropped the picked project from the carry
   payload, so flipping manual → agent silently fell back to the agent panel's
   own lastProjectId — potentially routing the issue to a different project
   than the one shown in manual mode. Forward project_id alongside prompt /
   agent_id, and add a regression test.

Co-authored-by: multica-agent <github@multica.ai>

* test(quick-create): pass new isExpanded props in stale-project tests

Main got an expand button on AgentCreatePanel via #2320 while this branch
was open, adding `isExpanded` / `setIsExpanded` to the panel's required
props. The two new stale-project tests still passed `{ onClose }` only,
which CI's typecheck (run on the main+branch merge) caught while my
local run did not.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-09 16:12:12 +08:00
Bohan Jiang
3f20999597 refactor(timeline): drop server-side comment + timeline pagination (#2322)
* refactor(timeline): drop server-side comment + timeline pagination (MUL-1929)

The cursor-paginated /timeline and /comments endpoints were sized for a
problem the data shape doesn't have: prod p99 is ~30 comments per issue
and the all-time max is ~1.1k. Time-based pagination also splits reply
threads across page boundaries (orphan replies), which the frontend was
papering over with an "orphan rescue" that promoted disconnected replies
to top-level — confusing UX with no real benefit.

Replace both endpoints with a single full-issue fetch, capped server-side
at 2000 rows as a defensive safety net (never hit in practice).

Server
- /api/issues/:id/timeline now returns a flat ASC TimelineEntry[]
  (matches the legacy desktop contract — older Multica.app builds keep
  working because the wrapped TimelineResponse + cursors are gone, and
  the raw array shape was always what they consumed).
- /api/issues/:id/comments drops limit/offset; only ?since is honoured
  for the CLI agent-polling flow.
- Drop ListCommentsBefore/After/Latest, ListActivitiesBefore/After/Latest
  and the timelineCursor encoding.
- Replace with ListCommentsForIssue / ListCommentsSinceForIssue /
  ListActivitiesForIssue (capped by argument).

CLI
- multica issue comment list drops --limit / --offset and the X-Total-Count
  reporting; --since is preserved for incremental polling.

Frontend
- Replace useInfiniteQuery with useQuery in useIssueTimeline; drop
  fetchOlder/Newer, jumpToLatest, isAtLatest, newEntriesBelowCount.
- Remove timeline-cache helpers (mapAllEntries / filterAllEntries /
  prependToLatestPage) and the TimelinePage / TimelinePageParam types.
- WS event handlers update the single flat-array cache directly.
- Drop the orphan-reply rescue in issue-detail — every reply's parent
  is now guaranteed to be in the same array.
- Strip the "show older / show newer / jump to latest" buttons and their
  i18n strings.

Co-authored-by: multica-agent <github@multica.ai>

* fix(timeline): address review feedback on pagination removal

Three issues caught in PR #2322 review:

1. /timeline broke for stale clients between #2128 and this PR. They send
   ?limit/?before/?after/?around and parse with the wrapped TimelinePageSchema;
   the new flat-array response was failing schema validation and falling back
   to an empty timeline. Restore the wrapped shape on those query params
   (DESC entries, null cursors, has_more_*=false), keeping the flat ASC array
   for bare requests. Around-mode now also fills target_index from the merged
   slice so legacy clients can still scroll-to-anchor without a follow-up.

2. The agent prompts in runtime_config.go and prompt.go still told agents
   that `multica issue comment list` accepts --limit/--offset and to use
   `--limit 30` on truncated output. With those flags removed in this PR,
   new agent runs would hit "unknown flag" or skip context. Update the
   prompt copy to "returns all comments, capped at 2000; --since for
   incremental polling".

3. useCreateComment's onSuccess was a bare append to the timeline cache
   with no id-dedupe, so a fast comment:created WS event firing before
   onSuccess produced a transient duplicate. Restore the id guard the old
   prependToLatestPage helper used to provide.

Adds two new boundary tests:
- TestListTimeline_LegacyWrappedShape_OnPaginationParams
- TestListTimeline_LegacyWrappedShape_AroundFillsTargetIndex

Co-authored-by: multica-agent <github@multica.ai>

* test(handler): fix timeline test assertions for handler-package isolation

The TestListTimeline_* assertions assumed CreateIssue would seed an
"issue_created" activity_log row, but the activity listener that publishes
those rows is registered in cmd/server/main.go — handler-package tests
don't wire it up. CI saw 5 entries (3 comments + 2 activities) where the
test expected ≥6.

Drop the auto-activity assumption: assert exactly 5 entries in
TestListTimeline_MergesCommentsAndActivities, and tighten
TestListTimeline_EmptyIssue to assert a fully-empty timeline.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-09 16:11:58 +08:00
Bohan Jiang
9ded462ecc feat(inbox): auto-archive stale task_failed rows on terminal status (#2319)
When an issue progresses to in_review / done / cancelled, archive any
pre-existing task_failed inbox rows for that issue across all member
recipients and emit inbox:batch-archived per recipient so connected
clients self-heal. Reuses the existing archived column rather than
introducing a parallel dismissed flag; the activity log preserves the
full failure history for audit independently of the inbox surface.

Closes #2291.

Co-authored-by: multica-agent <github@multica.ai>
2026-05-09 15:53:25 +08:00
Multica Eve
4b8939e78e fix: allow mobile websocket origin without cookies (#2318)
Co-authored-by: Eve <eve@multica-ai.local>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-09 15:14:16 +08:00
Multica Eve
a2dd80d4f6 feat(autopilot): skip dispatch when assignee runtime is offline (MUL-1899) (#2311)
* feat(autopilot): skip dispatch when assignee runtime is offline (MUL-1899)

Prevents scheduled autopilots from accumulating doomed tasks against
offline / archived / unbound agents. Before this change, a paused laptop
or crashed daemon would let a 5-minute-cron autopilot pile up thousands
of queued agent_task_queue rows that no runtime would ever drain — this
is the dominant source of the 89k stuck-task backlog flagged in MUL-1899.

DispatchAutopilot now performs a pre-flight admission check on the
assignee agent's runtime status. If the runtime is not 'online' (or the
agent is archived / has no runtime bound / has no assignee), the run is
recorded as 'skipped' with a failure_reason and no task is enqueued.
Skipped runs still emit autopilot:run.done so the UI / activity feed
reflect that the trigger fired and was evaluated.

Skipped runs are deliberately NOT counted toward the failure-ratio
auto-pause: a user who closes their laptop overnight should not have
their autopilot paused. Sustained server-side failures keep their
existing pause path via the failure monitor.

Tests: added an integration test that creates an offline runtime and
asserts DispatchAutopilot records a skipped run with no task enqueued.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

* feat(scheduler): expire stale queued tasks via TTL sweeper (MUL-1899)

Companion to the dispatch-time admission gate added in this PR. The
admission gate prevents *new* tasks from being enqueued against an
offline runtime, but it does not drain the historical backlog
(~89k stuck queued rows observed at MUL-1899 baseline) and does not
help when a runtime goes offline *after* a task has already been
queued. This adds a passive TTL sweeper:

- New SQL query `ExpireStaleQueuedTasks` transitions queued tasks
  older than the TTL to status='failed' with
  failure_reason='queued_expired' and a clear error message.
- Sweep is capped per tick (`queuedExpireBatchSize`, default 500) via
  a CTE+LIMIT so that draining a large backlog cannot monopolise the
  DB on a single tick. At 30s ticks the worst case is 60k rows/hour.
- Wired into the existing 30s `runRuntimeSweeper` loop alongside
  `sweepStaleTasks` and reuses `taskSvc.HandleFailedTasks` so the
  expired tasks broadcast `task:failed` events, reconcile agent
  status, and roll back any in-progress issues — same lifecycle as
  any other failed task.
- Default TTL = 2h. Conservatively above any reasonable
  "queued behind a long-running task" window (default agent timeout
  is 2h, sweeper runs every 30s) so legitimate work isn't expired.
- Integration tests cover the happy path (stale → expired, fresh →
  left alone, correct status/reason/error) and the per-tick batch cap.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

* fix(autopilot): address review blockers from PR #2311 (MUL-1899)

GPT-Boy review of the offline-runtime + queued-TTL PR flagged four
blockers; this commit addresses them all.

1. Restore the 'skipped' autopilot_run status in the DB constraint.
   Migration 043 had removed 'skipped' along with the now-defunct
   concurrency_policy feature, so the new admission gate's INSERT of
   status='skipped' violated `autopilot_run_status_check` and broke
   `TestAutopilotDispatchSkipsWhenRuntimeOffline` in CI. New
   migration 079 re-adds 'skipped' to the CHECK list. The down
   migration migrates skipped → failed before re-tightening, mirror-
   ing what 043 did for the original removal.

2. Make `ExpireStaleQueuedTasks` race-safe.
   The CTE-then-UPDATE pattern could clobber a task that the daemon
   claimed between victim selection and the outer update. Two
   guards added:
     - `FOR UPDATE SKIP LOCKED` in the CTE so we never wait on a
       row that's currently being claimed (and never block the
       claim path either).
     - The outer UPDATE now re-checks `t.status = 'queued'` AND the
       TTL predicate so even if a row's lock is released after a
       successful claim, we cannot transition a now-dispatched/
       running task to 'failed'.

3. Add a partial index for the queued-TTL sweeper.
   `idx_agent_task_queue_queued_created_at` on `created_at WHERE
   status = 'queued'` — keeps the 30s sweep query (status=queued
   AND created_at < ... ORDER BY created_at LIMIT 500) cheap even
   when historical terminal rows accumulate (~89k+ at MUL-1899
   baseline). The partial predicate keeps the index tiny because
   only in-flight rows live in 'queued'.

4. Fix the failure-monitor denominator.
   `SelectAutopilotsExceedingFailureThreshold` had been counting
   'skipped' toward total runs, which would have diluted the failure
   ratio: a 100%-failing autopilot could mask itself behind a wall
   of admission skips. With 'skipped' restored as a real status,
   the auto-pause monitor must explicitly exclude it from BOTH
   numerator and denominator — admission skips are neither a
   success nor a failure.

Verified: `go test ./cmd/server/... ./internal/service/...` passes
(including TestAutopilotDispatchSkipsWhenRuntimeOffline,
TestExpireStaleQueuedTasks, TestExpireStaleQueuedTasksRespectsBatch
Limit). `go build ./... && go vet ./...` clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

* fix(migrations): split queued-task TTL index into concurrent migration

Per PR #2311 review: agent_task_queue is a hot table, so building the
new partial index with plain CREATE INDEX inside migration 079 would
hold ACCESS EXCLUSIVE on the queue and block dispatch during deploy.

The migration runner does not allow CONCURRENTLY to share a file with
other statements (documented in 068), so split the index into its own
single-statement file 080 — matching the existing pattern in 035 /
067 / 074 / 075 / 078. Migration 079 keeps the autopilot_run
constraint change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: Eve <eve@multica-ai.local>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-09 15:07:57 +08:00
Bohan Jiang
6d9ebb0fdd fix(daemon): unblock issues stuck on a poisoned-image agent session (#2314)
* fix(daemon): treat upstream API 400 invalid_request_error as poisoned session

A markdown-linked image in an issue description that the agent downloads as
a tiny CDN auth-error file and Read's as a PNG poisons the conversation:
the LLM API rejects the bad image with 400 invalid_request_error, the
session_id is pinned mid-flight, and every follow-up task on the issue
(comment-trigger, auto-retry) resumes the same poisoned conversation and
hits the same 400 — the issue can no longer be executed even after the
description is cleaned up.

Mirror the existing fallback-output classifier on the error side: detect
"API Error: ... 400 ... invalid_request_error" in the agent error string,
persist failure_reason='api_invalid_request', and add it to the
GetLastTaskSession exclusion list so the next task starts a fresh
session that re-reads the (now-clean) description.

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): unblock issues already poisoned by API 400 invalid_request_error

The forward-only classifier from the previous commit only tags new failures.
Issues like MUL-1918 already have multiple failed-task rows whose
failure_reason is the pre-fix default 'agent_error', and GetLastTaskSession
falls back to those legacy rows on the next claim — so deploying the
classifier alone leaves existing poisoned issues stuck (GPT-Boy review
on PR #2314).

Two complementary changes:

- Migration 079 backfills failure_reason='api_invalid_request' on every
  pre-existing 'agent_error' row whose error text matches the canonical
  Anthropic 400 invalid_request_error shape. Keeps observability
  consistent (multica issue runs / UI now report the right reason).

- GetLastTaskSession adds a defensive ILIKE clause on error text. Closes
  the deploy-window gap where the old binary could write a new
  'agent_error' row between the migration running and the new code
  taking over, and protects against future error-format variants the
  daemon classifier might miss.

Plus regression tests covering the legacy + new coexistence case GPT-Boy
flagged, and a guard rail asserting benign 'agent_error' failures
(timeouts, tool errors) still resume their session.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-09 14:39:10 +08:00
Valentin Mihov
560e081d8f Pass agent instructions inline to Hermes (#2283) 2026-05-09 14:23:41 +08:00
Multica Eve
46eed3b298 Add task dispatched analytics event (#2310)
Co-authored-by: Devv <devv@Devvs-Mac-mini.local>
Co-authored-by: multica-agent <github@multica.ai>
2026-05-09 14:11:20 +08:00
Bohan Jiang
0eb23df234 fix(agent): scope pi colon-to-slash normalization to legacy format (#2309)
PR #2281 added table-format support to parsePiModels but kept the
unconditional `strings.Replace(":", "/", 1)`, which would silently
rewrite a `:` inside a model name read from column 1 of the table
output (e.g. `claude-sonnet-4-6:exp` would become
`claude-sonnet-4-6/exp`). Move the replace into the legacy
`provider:model` branch so only the colon-as-separator case is
normalized, and restore a short doc comment describing the dual-
format contract. Test extended with a colon-bearing table row.

Co-authored-by: multica-agent <github@multica.ai>
2026-05-09 13:56:49 +08:00