35 Commits

Author SHA1 Message Date
Dmitry
0c2f93bcd1 fix: allow same-origin attachment previews (#4679) 2026-06-29 12:10:52 +08:00
Bohan Jiang
81bde585ba MUL-3467: batch load squad roster skills (#4386)
* MUL-3467: batch load squad roster skills

Co-authored-by: multica-agent <github@multica.ai>

* test: isolate redis-backed test databases

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-22 13:14:22 +08:00
LinYushen
3943358e67 feat(billing): proxy /api/cloud-billing/* + Stripe webhook to multica-cloud (#3434) 2026-05-28 16:05:19 +08:00
LinYushen
c968c13c87 feat(auth): support mcn_ Cloud Node PATs verified via Fleet (#3349)
* feat(auth): support mcn_ Cloud Node PATs verified via Fleet

Adds a new token kind, mcn_ (multica cloud node), recognized in both
the regular Auth and DaemonAuth middlewares. mcn_ tokens are minted
and owned by Multica Cloud (not the local personal_access_tokens
table); the server validates them by POSTing to the Fleet's
/api/v1/pat/verify endpoint and uses the returned owner_id as
X-User-ID for downstream handlers.

Cloud is the authoritative owner of token status, so this is a
verifier-only path with no DB fallback:

  * Fleet says valid:false -> 401 (token genuinely bad)
  * Fleet unreachable / 5xx -> 503 (transient, retry)
  * No MULTICA_CLOUD_FLEET_URL configured -> 401 (fail closed)

Verification results are cached in Redis for 60s under
mul:auth:mcn:<sha256> to bound the per-request load on Fleet without
extending the revocation window beyond what the Cloud doc allows.
Negative results are NOT cached, so a freshly minted token doesn't
get locked out by a stale 'token_not_found'.

Reuses MULTICA_CLOUD_FLEET_URL (the same env the cloud-runtime proxy
already uses) so deployments don't need a second config knob.

Tests cover the happy path, every documented invalid reason, 4xx/5xx
mapping, network error, decode error, ctx cancellation, the
fail-closed valid:true-without-owner_id case, trailing-slash URL
normalization, and the Redis cache short-circuit + negative
no-cache contract. Middleware tests pin the four 401/503/200 outcomes
in both Auth and DaemonAuth.

* auth(mcn): require owner_id to map to a real local user; drop X-User-PAT plumbing

Two related changes:

1. Cloud-verified owner_id is now checked against our local users table.
   The Cloud owner_id and our users.id share the same UUID space by
   contract; a missing local user means either the row was deleted
   under an active node or something is forging owner_ids — either
   way, fail closed.

   CloudPATVerifier.Verify takes a new OwnerLookupFunc:
     - returns (true, nil)   -> success, cache + return
     - returns (false, nil)  -> ErrCloudPATInvalid (reason='owner_unknown'),
                                NOT cached (so a freshly-created user
                                doesn't get locked out for a TTL window)
     - returns (_, error)    -> ErrCloudPATUnavailable (transient,
                                middleware emits 503)

   Both Auth and DaemonAuth wire ownerLookupFor(queries), a new shared
   helper that wraps queries.GetUser, mapping pgx.ErrNoRows / unparseable
   UUIDs to (false, nil) and other errors to a real Go error.

2. Removed all X-User-PAT plumbing. Cloud now mints node-scoped mcn_
   PATs itself during /api/v1/nodes (see multica-cloud
   docs/api/node-pat.md) and ships them into the EC2 instance via SSM,
   so multica-api no longer needs to forward the caller's mul_ PAT.
   Propagating a long-lived user PAT into a remote machine widened
   the blast radius of any node compromise; that's gone now.

   Removed:
     - cloud_runtime.go: withUserPAT option, cloudRuntimeUserPAT,
       generateCloudRuntimePAT, revokeGeneratedPAT
     - cloudruntime/Request.UserPAT field + X-User-PAT header
     - X-User-PAT from CORS allowed headers
     - obsolete handler tests:
         TestCreateCloudRuntimeNodeForwardsValidatedPAT
         TestCreateCloudRuntimeNodeRejectsUnownedPAT
         TestCreateCloudRuntimeNodeRejectsExpiredPAT
         TestCreateCloudRuntimeNodeAutoGeneratesPAT
       replaced with TestCreateCloudRuntimeNodeForwardsBody
     - X-User-PAT references in packages/core/api/client.test.ts

Tests:
  * 3 new verifier-level tests (owner_unknown not cached, lookup error
    -> Unavailable, success path is cached for both fleet AND lookup)
  * 5 new owner_lookup_test.go tests (nil queries, existing user,
    missing user, malformed UUID, DB error)
  * 1 new end-to-end DaemonAuth test (cloud says valid, no local user
    -> 401)
  * Existing X-User-PAT TS assertions removed; full vitest run passes.
  * go test ./... and go vet ./... clean on the server module.
2026-05-27 14:52:03 +08:00
Bohan Jiang
13f74e651a feat(agents): remove custom_env from agent resources, add audited env endpoint (MUL-2600) (#3209)
* feat(agents): remove custom_env from agent resources, add audited env endpoint (MUL-2600)

The agent resource shape (list / get / create / update / archive /
restore responses + WebSocket events) no longer carries `custom_env`
values. Reads/writes of env now flow exclusively through a dedicated
`/api/agents/{id}/env` endpoint that is owner/admin-only, rejects
agent-actor sessions, applies a "****" sentinel preserve guard on
PUT, and writes a persistent audit row per reveal/update.

Why
- `multica agent list --output json` historically returned plaintext
  `custom_env` for owner/admin callers (the redaction gate gave only
  members the masked map). Any agent token running on the workspace
  inherits its owner's role and could read every other agent's
  secrets just by listing.
- Patching list/get redaction alone (PR #3175 direction) left
  symmetric leaks via mutation responses, WS events, the "reveal"
  path itself (no actor-aware auth), and a `****` overwrite footgun
  on UpdateAgent.

What changed
- Backend: drop `custom_env` from AgentResponse; add coarse
  `has_custom_env` + `custom_env_key_count`. Strip env handling from
  UpdateAgent (silently ignored if sent). Keep CreateAgent's
  custom_env acceptance.
- Backend: new GET/PUT `/api/agents/{id}/env` handlers in
  `internal/handler/agent_env.go`:
  - resolveActor → 403 for agent actors (closes the lateral-movement
    path).
  - Owner/admin role gate via existing helper.
  - PUT honours value == "****" as "preserve existing value".
  - Both write to `activity_log` with `agent_env_revealed` /
    `agent_env_updated` actions. Audit details record key names only,
    never values.
- Daemon claim path (`ClaimAgentTask`) unchanged — `TaskAgentData`
  still carries plaintext env for runtime injection.
- SQL: new `UpdateAgentCustomEnv` query; sqlc regenerated (v1.31.1).
- CLI: new `multica agent env get|set` subcommands. `--custom-env*`
  flags removed from `multica agent update`; the no-fields error
  now points to the new path.
- Frontend: drop env fields from `Agent` + `UpdateAgentRequest`; add
  `getAgentEnv` / `updateAgentEnv` client methods; rewrite env-tab
  to show "N variables configured" + explicit "Reveal & edit"
  button, fetching values only on intentional reveal.
- Locales: parity-safe additions to en + zh-Hans.
- Docs: agents-create.{mdx,zh.mdx} reflect the new threat model and
  endpoint.
- Mobile: schema drops `custom_env` / `custom_env_redacted`, adds
  metadata fields.

Tests
- Handler tests pinned the new invariants: no env in list/get
  responses, owner reveal happy-path + audit row, agent-actor 403,
  `****` sentinel preserves real values, UpdateAgent silently
  ignores `custom_env`, pure `mergeAgentEnv` cases.
- CLI tests pivot to the new flag surface: `agent update` MUST NOT
  expose the env flags; `agent env set` MUST expose
  --custom-env-stdin/--custom-env-file.
- Frontend test fixtures updated; pnpm typecheck / test / lint
  pass cleanly.

This is a breaking API change. Scripts that read `custom_env` from
`/api/agents` must migrate to `GET /api/agents/{id}/env`.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agents): close actor-spoofing + audit fail-closed in env endpoints (MUL-2600)

Addresses Elon's review of #3209:

* Mint a task-scoped `mat_` token per claim, bound to (agent, task,
  workspace, owner). Daemon injects it into the agent process in place
  of its own credential. Auth middleware authoritatively rebuilds
  X-User-ID / X-Agent-ID / X-Task-ID from the token row and sets
  X-Actor-Source=task_token; that header is server-set only — incoming
  values are stripped before any auth branch runs. resolveActor honors
  the header so an agent that strips X-Agent-ID / X-Task-ID still
  resolves as actor=agent.
* GetAgentEnv / UpdateAgentEnv are now fail-closed on audit-log
  failures: GET refuses to return plaintext, PUT persists inside the
  same tx as the audit row so they commit/roll back together.
* PUT /api/agents/{id} returns 400 when the body carries custom_env
  instead of silently dropping it — directs callers to the audited env
  endpoint.
* Agent actors never see mcp_config, even when the underlying member
  is owner/admin; mutation broadcasts go through a redaction shim so
  WS subscribers don't pick it up either.
* Fix backend test that asserted dense JSON (jsonb::text renders
  whitespace) and frontend test that assumed a unique "Test User"
  match.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agents): close residual MUL-2600 gaps from review (MUL-2600)

Migration 108 FK now correctly references agent_task_queue(id) instead
of the non-existent agent_task table; the previous name blocked CI
backend migrations.

Task-token-authenticated requests can no longer be re-routed at a
different workspace by passing workspace_slug / workspace_id /
?workspace_id / a URL workspace param. ResolveWorkspaceIDFromRequest
and resolveWorkspaceUUID both short-circuit on X-Actor-Source=task_token
and return only the token-bound X-Workspace-ID; buildMiddleware adds a
defence-in-depth 403 if any URL-resolved workspace disagrees with the
token binding.

mcp_config no longer leaks back to agent actors through UpdateAgent /
CreateAgent / ArchiveAgent / RestoreAgent HTTP responses — the same
redactAgentResponseForActor helper that GetAgent/ListAgents use is now
applied to mutation responses too. WS broadcasts were already redacted
via broadcastAgentResponse.

FailTask and every TaskService cancel path (CancelTask /
CancelTasksForIssue / CancelTasksForAgent / CancelTasksByTriggerComment
/ BroadcastCancelledTasks) now eagerly DeleteTaskTokensByTask so the
mat_ token's 24h window doesn't outlive a terminated task. Failure is
non-fatal — the FK cascade and expiry remain durable guards.

Doc-only: clarify that PUT /api/agents/{id} now hard-rejects bodies
that carry custom_env (was previously "silently ignores").

Tests:
- middleware: TestResolveWorkspaceIDFromRequest gains a task_token
  case asserting client-supplied slug/id/query cannot override the
  bound workspace.
- handler: TestUpdateAgent_RedactsMcpConfigForAgentActor and
  TestUpdateAgent_KeepsMcpConfigForMemberActor pin the mutation-
  response redaction contract per actor type.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agents): match redacted mcp_config as JSON null, not Go nil (MUL-2600)

`AgentResponse.McpConfig` is `json.RawMessage` without `omitempty`, so
the redacted response serialises as `"mcp_config": null`. On decode,
`json.RawMessage` keeps the literal bytes `null` rather than collapsing
to Go nil, which made the assertion fire on a non-leak.

The product contract (field always present, distinguished from "no
config" via `mcp_config_redacted`) is intentional, so adjust the test
to check for "no secret-bearing content" instead of weakening the
contract via `omitempty`.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-25 18:42:48 +08:00
iYuan
2f1f90c11a fix(agent): retry codex semantic inactivity fresh (#2593) 2026-05-20 20:03:39 +08:00
Kagura
59617f376e feat(auth): make auth token TTL configurable via AUTH_TOKEN_TTL env var (MUL-2371) (#2713)
* feat(auth): make auth token TTL configurable via AUTH_TOKEN_TTL env var

Add AUTH_TOKEN_TTL environment variable (in seconds) to override the
hardcoded 30-day auth token lifetime. Self-hosted deployments on trusted
networks can set a longer value to avoid frequent magic-link
re-authentication.

The value is read once at startup and cached. Invalid or missing values
fall back to the 30-day default with a warning log.

Closes #2685

* refactor(auth): extract parseAuthTokenTTL for testability

Address review feedback: extract pure parse function from sync.Once
wrapper so the parsing logic can be unit-tested independently.
Add TestParseAuthTokenTTL with table-driven cases.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>

* refactor(auth): accept Go duration strings + hoist shared TTL in SetAuthCookies

Address nice-to-have review feedback from Bohan-J:
- parseAuthTokenTTL now tries time.ParseDuration first (e.g. '8760h'),
  falling back to ParseInt for integer seconds
- Warn on unreasonable values (>10 years) but still accept them
- Hoist AuthTokenTTL() and time.Now() in SetAuthCookies so both
  cookies share the exact same expiry
- Add security trade-off note in .env.example
- Add 5 new test cases for duration strings

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
Signed-off-by: kagura-agent <kagura.agent.ai@gmail.com>

* fix: use AuthTokenTTL() in CloudFront middleware, guard ParseInt overflow

Address review feedback from Bohan-J (round 2):

1. CloudFront refresh middleware (cloudfront.go:21) was hardcoding
   30*24*time.Hour instead of using auth.AuthTokenTTL(). Now calls
   AuthTokenTTL() so the middleware respects AUTH_TOKEN_TTL env var.

2. parseAuthTokenTTL integer-seconds branch: very large values like
   9999999999 would silently overflow int64 when multiplied by
   time.Second. Added overflow guard comparing against
   math.MaxInt64/int64(time.Second) before the multiplication.

3. Updated AuthTokenTTL() doc comment to reflect that it accepts
   Go duration strings or integer seconds (not just seconds).

4. Added middleware test (cloudfront_test.go) verifying short
   AUTH_TOKEN_TTL produces short cookie expiry, not 30-day hardcode.
   Also covers nil signer and existing-cookie-skip cases.

5. Added integer overflow test case to cookie_test.go.

* style: run gofmt on cookie.go and cookie_test.go

---------

Signed-off-by: kagura-agent <kagura.agent.ai@gmail.com>
Co-authored-by: Claude Opus 4 (1M context) <noreply@anthropic.com>
2026-05-19 16:22:07 +08:00
Zohar Babin
e50bfc88da fix(auth): add per-IP rate limiting on public auth endpoints (#2636)
Adds a Redis-backed fixed-window rate limiter middleware on /auth/send-code,
/auth/verify-code, and /auth/google. Prevents brute-force enumeration,
verification_code table flooding, and connection pool exhaustion from
rapid-fire unauthenticated requests.

Key design decisions per reviewer feedback:

- X-Forwarded-For trust model: XFF is NEVER trusted by default. Only
  honored when RemoteAddr is from a CIDR in RATE_LIMIT_TRUSTED_PROXIES.
  Uses rightmost-untrusted algorithm (walks XFF right-to-left, returns
  first non-trusted IP). Matches the project's conservative model in
  health_realtime.go.

- Atomic INCR+EXPIRE via Lua script: prevents a stuck key (permanent
  ban) if EXPIRE fails independently. Follows existing Lua script
  pattern in runtime_local_skills_redis_store.go.

- Fixed-window counter (not sliding-window): simple, adequate for auth
  rate limiting where precision at window boundaries is acceptable.

- Fail-open with startup warning: nil Redis disables rate limiting
  (same as PATCache), but logs a warning at startup so ops can see.

- IPv6 normalization: net.ParseIP().String() produces canonical form.

- Configurable via env vars: RATE_LIMIT_AUTH (default 5/min),
  RATE_LIMIT_AUTH_VERIFY (default 20/min), RATE_LIMIT_TRUSTED_PROXIES.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-18 12:59:28 +08:00
Kerim Incedayi
9418d2a2c1 feat(autopilots): webhook triggers (server + CLI + UI + docs) MUL-2049 (#2348)
* feat(server): add webhook trigger DB migration + sqlc queries

Lays the foundation for webhook autopilot triggers:
- partial unique index on autopilot_trigger.webhook_token (kind=webhook only)
  so the public ingress route can resolve a trigger in O(1)
- GetWebhookTriggerByToken / TouchAutopilotTriggerFiredAt /
  RotateAutopilotTriggerWebhookToken / SetAutopilotTriggerWebhookToken
  queries, regenerated with sqlc

* feat(server): webhook token generator + payload normalizer

Two pure helpers for the webhook autopilot work:
- generateWebhookToken: 32 random bytes -> base64-url, "awt_" prefix.
  256 bits of entropy keeps brute-force off the table; the prefix makes
  leaked tokens recognisable in logs.
- normalizeWebhookPayload: turns arbitrary JSON into the WebhookEnvelope
  shape (event/eventPayload/request) used by trigger_payload. Header- and
  body-based event inference covers GitHub, GitLab, X-Event-Type, and
  caller-provided envelopes; scalar/empty/invalid bodies are rejected so
  the handler can answer 400.

* feat(server): generate webhook tokens and expose rotate endpoint

- New handler.Config.PublicURL fed by MULTICA_PUBLIC_URL env so
  /api/autopilots/.../triggers responses can include an absolute
  webhook_url alongside the always-present webhook_path.
- CreateAutopilotTrigger now mints a webhook_token via crypto/rand
  for kind=webhook and ignores cron/timezone for non-schedule kinds.
  api triggers stay accepted-but-inert per PLAN.md.
- New POST /api/autopilots/{id}/triggers/{triggerId}/rotate-webhook-token
  protected by the existing workspace auth group; old tokens stop
  working immediately because the unique-index lookup keys on the
  current row value.

* feat(server): public webhook ingress route + per-token rate limiter

- New POST /api/webhooks/autopilots/{token} route, mounted outside the
  authenticated group: the path token is the credential. Workspace
  context is derived from the joined autopilot row, never headers.
- Body capped at 256 KiB via http.MaxBytesReader; oversized payloads
  return 413 mid-read instead of being fully buffered.
- Disabled triggers / paused / archived autopilots return
  200 {"status":"ignored"} so providers stop retrying.
- Skipped-runtime dispatches surface 200 {"status":"skipped"} with the
  reason from the autopilot service's pre-flight admission check.
- WebhookRateLimiter interface with sliding-window in-memory + Redis
  Lua-script implementations. Default 60 req/min per token. Test
  coverage on the in-memory path; Redis variant fails open on cache
  errors so a Redis hiccup never blocks ingress.
- Integration tests exercise token generation, dispatch, payload
  envelope persistence, GitHub-header inference, paused/disabled
  short-circuits, oversized rejection, and rotate-then-old-token-404.

* feat(server): include webhook payload in create_issue description

When an autopilot run is triggered by a webhook and execution_mode is
create_issue, the agent only sees the issue body — never the run's
trigger_payload. Append a 'Webhook event:' line and a fenced JSON block
with the normalized eventPayload so the agent has the inbound context
inline. Schedule / manual runs are unchanged.

Tests cover:
  - schedule path keeps existing italic note, no webhook block
  - webhook path emits event line + payload block, italic before block
  - non-envelope JSON falls back to raw body (defensive)
  - non-webhook source with payload still gets no webhook block

* feat(core): types, API client and mutations for webhook triggers

- AutopilotRunStatus gains 'skipped' so the run-list UI handles the
  admission-skipped state explicitly instead of falling through to a
  generic case (the backend already emits it via MUL-1899).
- AutopilotTrigger picks up optional webhook_path / webhook_url. Both
  are optional so older self-hosted servers that pre-date this change
  still parse cleanly.
- buildAutopilotWebhookUrl helper composes a usable absolute URL with
  the priority webhook_url > apiBaseUrl + path > origin + path > path.
  Tested with seven cases covering each branch.
- ApiClient.rotateAutopilotTriggerWebhookToken posts to
  /api/autopilots/{id}/triggers/{triggerId}/rotate-webhook-token; the
  HTTP-contract test pins URL + method.
- useRotateAutopilotTriggerWebhookToken mutation invalidates
  autopilotKeys.detail on settle, mirroring the existing trigger-mutation
  pattern.

* feat(views): webhook trigger UI in Add Trigger dialog and trigger row

Add Trigger dialog gains a Schedule/Webhook segmented toggle:
  - Schedule reuses TriggerConfigSection unchanged.
  - Webhook hides the cron config and shows a help line; the trigger is
    created with kind=webhook and the URL is generated server-side.
  - Toast text differentiates schedule vs webhook on success.

TriggerRow grows a webhook branch:
  - Webhook icon, kind translated via trigger_kind.
  - URL shown in a truncating monospace pill, with copy + rotate
    buttons. Copy uses navigator.clipboard with toast feedback; rotate
    uses an AlertDialog confirm because the old URL stops working
    immediately.
  - api triggers render a Deprecated badge and skip URL/copy/rotate
    affordances.

RunRow gains a 'skipped' RUN_VISUAL entry (muted dash) so admission-
skipped runs don't fall through to a generic case. Source label uses the
new run_source i18n key instead of capitalize.

Locales: en + zh-Hans gain run_status.skipped, run_source.*,
trigger_kind.*, trigger_row.{copy_url,rotate_url,*_confirm_*,toast_*},
add_trigger_dialog.{type_*,webhook_help,toast_added_{schedule,webhook}}.

* feat(cli): support webhook trigger creation and URL rotation

- multica autopilot trigger-add now takes --kind schedule|webhook
  (default schedule for backward compatibility). For webhook it skips
  --cron / --timezone validation and prints the resulting webhook URL,
  preferring the server-provided webhook_url and falling back to
  client.BaseURL + webhook_path.
- New multica autopilot trigger-rotate-url <autopilot-id> <trigger-id>
  command for rotating the bearer URL of a webhook trigger.

* docs(autopilots): add webhook trigger guide (en + zh)

Replaces the 'Webhook and API triggers are not available yet' section
with end-to-end webhook documentation: how the URL is generated, what
payload shapes are accepted, the inferred-event rules, the bearer-secret
warning + rotate flow, status-code semantics for accepted/skipped/
ignored/4xx/5xx outcomes, and the MULTICA_PUBLIC_URL self-host
configuration.

Run history list now mentions skipped status. The 'unavailable
features' section narrows to api-kind triggers, HMAC signing, IP
allowlists, and provider presets.

* feat(views): add Schedule/Webhook toggle to the create autopilot dialog

Closes the gap where a brand-new autopilot could only be created with a
schedule trigger. The right-column config now has a Trigger section
with a segmented Schedule/Webhook control:
  - Schedule keeps the existing cron/timezone UI.
  - Webhook hides the cron UI and shows a help line; on submit, a
    kind=webhook trigger is created right after the autopilot.

In edit mode the toggle is intentionally hidden (PLAN.md treats trigger-
type changes as delete-old + create-new, not in-place updates), but the
panel still picks the right kind based on props.triggers[0].kind so a
webhook autopilot doesn't render an irrelevant cron form.

Locales: section_trigger_kind, trigger_kind_{schedule,webhook},
section_webhook, webhook_help_{create,edit} added in en + zh-Hans.

* feat(views): show webhook URL inline after creating a webhook autopilot

After a successful create with kind=webhook, the dialog stays open and
swaps to a confirmation panel showing the freshly minted URL with a
copy button + 'Treat this URL like a password' warning + Done button.
Avoids the friction of "create the autopilot, then go find it in the
list, click in, scroll to triggers, copy URL."

Locales: dialog.webhook_created_{title,description,warning,done} added
in en + zh-Hans.

Schedule create flow is unchanged (toast + close). The success panel is
gated on the trigger returned from the create mutation, so a partial
failure (autopilot created, trigger creation errored) still falls
through to the toast_create_partial path.

* feat(views): show webhook payload in run detail dialog

The agent transcript dialog now accepts an optional headerSlot that
sits above the event list. The autopilot RunRow drops a
WebhookPayloadPreview into that slot when the run came from a webhook
and trigger_payload is non-empty.

The preview is collapsed by default (the transcript itself is the main
event), shows the inferred event name + receivedAt in the header, and
reveals the eventPayload as pretty-printed JSON with a copy button on
expand. Falls back gracefully if the row's trigger_payload doesn't
match the WebhookEnvelope shape — the whole value is shown instead so
nothing is hidden.

Closes the "agent didn't echo the payload, now I can't see what
triggered the run" gap. PLAN.md tracked this as
"Payload preview in run history" under follow-ups.

Locales: webhook_payload.{label, unknown_event, payload, content_type,
copy, copied, copied_short, copy_failed} added in en + zh-Hans.

* chore(server): wire MULTICA_PUBLIC_URL through self-host compose

Two small follow-ups split out of the webhook trigger PR:

- docker-compose.selfhost.yml passes MULTICA_PUBLIC_URL into the
  backend container so a self-hosted deployment behind a real domain
  gets absolute webhook URLs in the trigger response. Documented in
  .env.example with the rationale for not deriving the public host
  from request headers.
- Drop a duplicated 'invalid json:' prefix in the webhook ingress
  400 error path. normalizeWebhookPayload already prefixes its
  errors, so the handler doesn't need to re-prefix.

* fix(migrations): renumber webhook trigger migration 081 → 089 to avoid collision

The branch's 081_autopilot_webhook_triggers.{up,down}.sql collided
numerically with 081_runtime_timezone.{up,down}.sql that landed on
main, making migration apply order undefined. Renumber to 089 so the
file slots after the latest main migration (088_squad_instructions).

The SQL itself doesn't conflict — it only creates a partial unique
index on autopilot_trigger.webhook_token — but the duplicate prefix
is what the migration runner sees, so the filename must move.

* fix(autopilot-webhook): address PR review blocking issues

- Redact bearer tokens from request logs: paths matching
  /api/webhooks/autopilots/<token> now log "[redacted]" instead of the
  token. The resolved trigger ID is plumbed via context so audit lines
  stay useful for debugging. (Review item Blocking #1.)
- Distinguish pgx.ErrNoRows from transient DB errors in token lookup:
  no-row stays 404 (so providers don't retry on a deleted webhook),
  other errors return 500 (which providers DO retry, avoiding silent
  drops on DB blips). (Review item Blocking #2.)
- Add per-IP sliding-window rate limiter that runs BEFORE the token
  lookup, so spraying random tokens can no longer probe the
  autopilot_trigger index unboundedly. Reuses the existing Lua script
  with a separate Redis key namespace; falls open on Redis errors.
  Default budget 30 req/min/IP. (Review item Blocking #3.)

The webhook handler now applies the gates in the order: per-IP rate
limit → token lookup → per-token rate limit → handler logic.

* fix(autopilot): atomic webhook trigger creation + strict kind/timezone validation

- Mint the webhook bearer token BEFORE the INSERT and pass it via
  CreateAutopilotTriggerParams so the row never exists in a half-written
  kind=webhook + webhook_token=NULL state. On the (vanishingly rare)
  unique-index collision the whole INSERT is retried with a fresh token
  — no UPDATE second step. Removes the now-dead attachFreshWebhookToken
  helper. (Review item Recommended #4.)
- Add new GET /api/autopilots/{id}/runs/{runId} endpoint that returns a
  single run including the full trigger_payload. The list response is
  now slim (omits trigger_payload) so worst-case payload size drops
  from ~5 MB to ~5 KB. (Review item Recommended #5, server side.)
- Reject kind=api with 400 ("kind=api is deprecated; use schedule or
  webhook") and reject kind=webhook with --timezone with 400 — both
  surfaces stragglers loudly instead of silently dropping fields.
  CLI mirrors the check so --timezone with --kind webhook errors
  client-side. (Review nits.)
- Add --yes (-y) flag and an interactive y/N confirmation prompt to
  `multica autopilot trigger-rotate-url` so the destructive rotate
  matches the UI's AlertDialog safety. (Review item Recommended #6.)

* fix(views): fetch webhook payload on-demand and truncate at 4 KiB

- Add useAutopilotRun query hook + getAutopilotRun API client method
  paired with the new server endpoint. The run-detail dialog now mounts
  a WebhookPayloadSlot that fetches the full run (incl. trigger_payload)
  lazily — list responses no longer carry up to 256 KiB × N runs of
  envelope data.
- WebhookPayloadPreview truncates its in-DOM <pre> at 4 KiB with a
  localized marker so jank-y machines aren't asked to render a 256 KiB
  JSON blob. The Copy button still yields the full string.
- Adds the truncated_marker i18n string to en + zh-Hans.

Review items Recommended #5 (frontend) and a nit on the preview's
unbounded <pre>.

* test(autopilot-webhook): close coverage gaps flagged in PR review

- request_logger: redactWebhookPath unit tests + integration test
  proving the bearer token never lands in slog output, plus the
  webhook_trigger_id context plumbing.
- autopilot_webhook_handler: empty body → 400, archived autopilot →
  200 ignored, per-IP rate limiter trips before DB lookup, kind=api
  and webhook+timezone are rejected at 400, slim list + full detail
  endpoint round-trip.
- webhook_rate_limiter: Lua script structure guard (catches reordering
  even without a live Redis), plus live-Redis tests for both per-token
  and per-IP limiters (REDIS_TEST_URL gated, matching the existing
  Redis test pattern in the package).
- WebhookPayloadPreview: envelope rendering, fallback shape, and the
  >4 KiB truncation path with full-payload-on-Copy guarantee.

Two branches are documented as code-review-protected rather than
covered by tests: the 500-on-DB-error path requires injecting a stub
Queries (no interface here), and the cross-workspace defense-in-depth
check is unreachable from valid SQL state.

* fix(middleware): SetWebhookTriggerID must mutate request in place

The round-1 helper returned a fresh *http.Request from WithContext, and
the webhook handler did `r = SetWebhookTriggerID(r, ...)`. That swaps
the handler's local pointer but doesn't propagate the new context back
to RequestLogger, which is still holding the original *http.Request —
so the audit line never actually included webhook_trigger_id in
production. The round-1 test happened to pass because it pre-stashed
the value on the request before calling ServeHTTP, bypassing the bug
it was meant to verify.

Switch to in-place mutation via `*r = *r.WithContext(...)` so the
wrapping middleware sees the new context after next.ServeHTTP returns,
and update the test to exercise the real call pattern (set the context
from inside the handler, assert the surrounding logger reads it).

Verified live: an accepted webhook now logs
  path=/api/webhooks/autopilots/[redacted] webhook_trigger_id=<uuid>

* fix(autopilot-webhook): symmetric ErrNoRows split + trusted-proxy gate

Round-2 review (Bohan-J, PR #2348 follow-up):

- Must-fix #1: the second lookup at autopilot_webhook.go:258
  (GetAutopilot after the token resolves) was folding every error into
  404. A transient DB blip would tell a webhook sender "not found" and
  it would never retry. Apply the same errors.Is(err, pgx.ErrNoRows)
  → 404 / else → 500 split as the first lookup got in round 1.

- Must-fix #2: clientIPForRateLimit was honoring X-Forwarded-For /
  X-Real-IP from any caller. An attacker spraying random tokens could
  just rotate the XFF header and the per-IP bucket became per-request,
  so the limiter that's specifically supposed to gate spraying before
  it hits the DB unique index was bypassed.

  New shape — matches Bohan's suggestion exactly:
  * Default: r.RemoteAddr only, headers ignored.
  * Operator opt-in via MULTICA_TRUSTED_PROXIES (comma-separated
    CIDRs). XFF/X-Real-IP are honored only when r.RemoteAddr is
    inside one of the listed prefixes; otherwise they're dropped.

  Wired through .env.example and docker-compose.selfhost.yml so
  self-host operators can configure their reverse-proxy's CIDR.
  Invalid CIDRs in the env var are dropped with a single slog.Warn at
  startup rather than crashing the server. Uses net/netip (stdlib,
  value-typed) for parsing and containment checks.

Verified live on the rebuilt self-host backend: a 35-request spray
from one source with rotating XFF gets the expected 30× 404 + 5× 429,
proving the per-IP bucket is keyed on the real connection IP.

* fix(autopilot): reject cron/timezone PATCH on non-schedule triggers

Round-2 review should-fix. CreateAutopilotTrigger already 400s on
kind=webhook + timezone/cron_expression, but UpdateAutopilotTrigger
silently wrote those fields regardless of prev.Kind. The values then
sat in the DB visible to nobody and read by nothing — a back door that
left the API contract fuzzy across create vs update.

Mirror the create-path discipline: after loading prev, if prev.Kind
!= "schedule" and the PATCH body sets cron_expression or timezone,
return 400 with a clear message. enabled and label remain accepted on
every kind.

The existing prev.Kind == "schedule" guard on next_run_at recompute
stays as belt-and-braces, but with this gate in place the recompute
branch is now reachable only for the kind it was meant for.

* test(autopilot-webhook): close round-2 coverage gaps

- IPRateLimitNotBypassedByXFFSpoof: drives the must-fix #2 invariant
  by rotating XFF across three calls from the same RemoteAddr and
  asserting the third gets 429. Pre-round-2 this test would have
  passed for the wrong reason (limiter trusted XFF, so per-bucket
  collision was incidental); now it pins the bypass-closed property.
- IPRateLimitReturns429BeforeDBLookup: updated to set RemoteAddr
  explicitly and drop the XFF header it was leaning on. With
  TrustedProxies empty (test default) the limiter keys on the real
  connection IP, which is what the test wants to assert anyway.
- UpdateAutopilotTrigger_RejectsCronExpressionOnWebhookKind +
  UpdateAutopilotTrigger_RejectsTimezoneOnWebhookKind: drive the
  round-2 should-fix from the handler boundary.
- UpdateAutopilotTrigger_AcceptsEnabledAndLabelOnWebhookKind: counter
  test so a regression to a blanket reject is caught.

* fix(migrations): bump webhook trigger migration 089 → 091

origin/main added 089_squad_no_action_activity_index (and 090_task_is_leader)
since our last rebase, re-colliding with our 089_autopilot_webhook_triggers.
Bump to 091 so the filename ordering is unambiguous again. The SQL is
unchanged — same partial unique index on autopilot_trigger.webhook_token —
only the filename moves.

* fix(views): dedupe skipped icon in autopilot RUN_VISUAL after rebase

The rebase against origin/main merged main's add of `Ban` for the
skipped status next to our round-1 `MinusCircle` entry, leaving the
RUN_VISUAL map with two `skipped` keys (only the last would have been
read at runtime, and MinusCircle had been dropped from the imports
during conflict resolution — so the file would not compile).

Keep main's `Ban` icon (latest design) and a single `skipped` entry.
Carry over the round-1 comment about why the muted styling matters
for failure-ratio readability.

---------

Co-authored-by: Kerim Incedayi <kerim.incedayi@digitalchargingsolutions.com>
2026-05-18 12:17:39 +08:00
Bohan Jiang
fae8558263 fix(daemon): self-heal when a runtime is deleted server-side (#2404)
Closes #2391.
2026-05-11 16:09:40 +08:00
Bohan Jiang
86e7de3e41 feat(server/auth): cache auth token lookups in Redis with 10m TTL
* feat(server/auth): cache PAT lookups in Redis with 60s TTL

Personal access tokens used to hit Postgres on every request: a SELECT
to resolve token_hash → user_id, plus a fire-and-forget UPDATE of
last_used_at. For a CLI / daemon making many requests per second this
is wasted DB load — the token is the same and the answer hasn't changed.

Add a Redis-backed cache (auth.PATCache) keyed by token hash, TTL 60s:

- On cache hit, the auth middleware skips both the SELECT and the
  last_used_at UPDATE. last_used_at is now refreshed at most once per
  TTL window per token, not per request.
- On cache miss the middleware falls back to today's behavior: query
  Postgres, populate the cache, async-update last_used_at.
- On revoke, the handler invalidates the cache entry so revocation
  takes effect immediately rather than waiting for the TTL to expire.
  This required changing RevokePersonalAccessToken from :exec to :one
  RETURNING token_hash.

The cache is nil-safe: when REDIS_URL isn't configured, NewPATCache
returns nil and the middleware degrades to today's always-hit-DB
behavior. JWT validation is untouched (already DB-free).

Tested with REDIS_TEST_URL — same gating pattern the rest of the
suite uses for Redis-backed tests. New tests cover nil-safety, set/
get/invalidate, TTL, and the middleware short-circuit on cache hit.

* fix(server/auth): clamp PAT cache TTL to token's remaining lifetime

GPT-Boy review caught: a PAT expiring in <60s would still be cached
for the full PATCacheTTL window, so the token could continue passing
auth on cache hit for up to ~60s after its expires_at. The DB query
filters expired tokens (revoked = FALSE AND expires_at > now()), but
that filter never ran on a cache hit.

Make Set take an explicit ttl, and add TTLForExpiry to compute it:
  - no expires_at      → full PATCacheTTL
  - expires_at far     → full PATCacheTTL
  - expires_at <60s    → time until expiry
  - already expired    → 0, Set skips caching (TOCTOU defense between
                         the SELECT and the Set, since the SELECT
                         already filters expired rows)

Regression test pins the clamp behavior end-to-end against Redis.

* feat(server/auth): cache daemon-token + PAT lookups in DaemonAuth, bump TTL to 10m

Daemon /api/daemon/* requests (heartbeat, claim task) hit DaemonAuth
which previously did its own GetDaemonTokenByHash on every request and
*also* duplicated the PAT lookup on the mul_ fallback — bypassing the
cache added in 1cdd674c. Today's daemons authenticate via mul_ PATs
(mdt_ minting isn't wired up yet), so the duplicate PAT path is the one
that actually matters for hot-path DB load.

Three changes:

1. New auth.DaemonTokenCache mirrors PATCache for the mdt_ path
   (key = mul:auth:daemon:<sha256>, JSON value = {workspace_id, daemon_id}).
   Forward-looking infrastructure for when daemon tokens get minted; the
   middleware short-circuits the DB SELECT on cache hit. TTL clamped to
   the token's expires_at via the shared TTLForExpiry helper.

2. DaemonAuth now also consults PATCache on its mul_ fallback, sharing
   the same cache as the regular Auth middleware. A daemon making 4 hb/min
   collapses from 4 GetPersonalAccessTokenByHash + 4 last_used_at writes
   per minute to ~1 of each per AuthCacheTTL window (~10 minutes).

3. Rename PATCacheTTL → AuthCacheTTL and bump from 60s to 10 minutes.
   The constant is now shared between PAT and daemon caches; 10m matches
   the user-requested longer TTL for further DB write reduction. Revoke
   latency on the happy path is still instant via active invalidation;
   the worst-case (Redis Del miss / direct-DB revoke) grows from ~60s to
   ~10m.

Tests cover nil-safety, set/get/invalidate, TTL, clamped TTL on near-
expiry tokens, and the middleware short-circuit for both cache paths
(mdt_ via DaemonTokenCache, mul_ fallback via PATCache).

* feat(server/auth): cache PAT lookups on the WebSocket auth path

The third place a PAT is resolved — patResolver.ResolveToken used by
realtime.HandleWebSocket — was still hitting Postgres on every /ws
auth and firing an unconditional last_used_at UPDATE, bypassing the
cache added in 1cdd674c. Wire it through the same shared PATCache so
revoking a token through any path (Auth middleware, DaemonAuth PAT
fallback, or WS auth) hits all three caches with one Invalidate.

Also leaves a comment on DeleteDaemonTokensByWorkspaceAndDaemon —
the query has no caller today, but a future deregister/rotate flow
must remember to call DaemonTokenCache.Invalidate(hash) for each
deleted row, otherwise deleted daemon tokens stay valid until TTL.
2026-04-29 17:07:54 +08:00
Bohan Jiang
22136a55fc fix(server/heartbeat): split auth_ms into decode/runtime_lookup/workspace_check + auth_path (#1822)
Prod slow-log on the deployed v0.2.17 fix shows total_ms=4012,
auth_ms=4010, update_ms=1, all skill stages = 0 — meaning the bottleneck
on /api/daemon/heartbeat is now the auth section, not the Redis claim
path. To pinpoint which sub-stage dominates, decompose auth_ms into:

- decode_ms        — JSON body decode
- runtime_lookup_ms — Queries.GetAgentRuntime (PG PK select)
- workspace_check_ms — requireDaemonWorkspaceAccess (string compare for
                       daemon-token, requireWorkspaceMember for PAT/JWT)

Also add auth_path ("daemon_token" | "pat" | "jwt") set by DaemonAuth
middleware so slow-logs disambiguate which token kind was used. PAT/JWT
takes an extra DB round-trip via requireWorkspaceMember and is a
candidate cause of long auth tails on daemons that haven't migrated to
mdt_ tokens.

The handler keeps the same external behavior; the change inlines and
instruments requireDaemonRuntimeAccess in DaemonHeartbeat only — other
callers of the helper are untouched. logHeartbeatEndpointSlow gains the
new fields.

Existing heartbeat tests pass; the slow-probe test output now shows the
new auth_path / decode_ms / runtime_lookup_ms / workspace_check_ms
fields populated.
2026-04-29 15:00:00 +08:00
Bohan Jiang
f628e48775 refactor(server): error-returning ParseUUID to prevent silent data loss
* refactor(server): make ParseUUID error-returning to prevent silent data loss (MUL-1410)

util.ParseUUID previously swallowed errors and returned a zero pgtype.UUID
on invalid input. When this zero UUID reached a write query (DELETE/UPDATE),
the SQL matched zero rows and the handler returned 2xx success — producing
silent data corruption. #1661 (DeleteIssue with identifier-style ID) was the
visible symptom; PR #1680 patched that one site, this commit closes the
class of bug.

Changes:

- util.ParseUUID now returns (pgtype.UUID, error). Add util.MustParseUUID
  for trusted round-trips that should panic on invalid input.
- handler/handler.go: parseUUID wrapper now calls MustParseUUID — any
  unguarded user-input string reaching it surfaces as a recovered panic
  (chi middleware.Recoverer → 500) instead of silently corrupting data.
  Add parseUUIDOrBadRequest(w, s, fieldName) for handler entry points.
- Convert every Queries.Delete*/Update* call site reachable from raw user
  input (autopilot, comment, project, skill, skill_file, label, pin,
  attachment, feedback, issue assignee, daemon runtime, workspace) to
  validate UUIDs explicitly with parseUUIDOrBadRequest, returning 400 on
  invalid input. Where a resolved entity.ID is already in scope, write
  queries now use it directly instead of re-parsing the URL string.
- Update getWorkspaceMember + loadIssueForUser to handle invalid UUIDs
  gracefully (404/400 instead of panic).
- Update util/middleware/cmd-level callers (subscriber_listeners,
  notification_listeners, activity_listeners, scope_authorizer,
  middleware/workspace) to use the error-returning API.
- Add server/internal/util/pgx_test.go covering valid/invalid input and
  the MustParseUUID panic contract.
- Add TestDeleteIssueByIdentifier + TestDeleteIssueRejectsInvalidUUID
  regression tests in handler_test.go (the original #1661 bug + the
  invalid-input case).
- Document the handler UUID parsing convention in CLAUDE.md so the rule
  is enforceable in future PR review.

* fix(server): address GPT-Boy review of #1748

P1 fixes from PR #1748 review:

1. Migrate remaining request-boundary UUIDs to parseUUIDOrBadRequest so
   malformed input returns 400 instead of panic/500. Was missing on:
   - issue.go: workspace_id in CreateIssue/ChildIssueProgress/ListIssues/
     SearchIssues/BatchUpdateIssues/BatchDeleteIssues; project_id /
     parent_issue_id / lead_id / assignee_id / assignee_ids / creator_id
     filters; batch issue_ids and assignee/parent/project fields in
     BatchUpdateIssues (skip on bad input via util.ParseUUID, matching
     the existing per-row continue semantics).
   - project.go: project id + workspace_id in GetProject/UpdateProject/
     DeleteProject; lead_id in CreateProject/UpdateProject;
     workspace_id in ListProjects + SearchProjects.
   - handler.go: resolveActor now uses util.ParseUUID for X-Agent-ID /
     X-Task-ID headers; invalid UUID falls back to "member" (matches
     pre-existing semantics) instead of panicking.
   - issue.go: validateAssigneePair returns 400 on invalid workspace_id
     instead of panicking.

2. Fix issue:deleted WS event payloads to emit uuidToString(issue.ID)
   instead of the raw URL string. After an identifier-path delete
   ("MUL-7"), the previous payload would have leaked the identifier to
   subscribers, leaving stale entries in frontend caches that key by
   UUID. Updated DeleteIssue (issue.go:1341) and BatchDeleteIssues
   (issue.go:1641). The slog "issue deleted" log line also now records
   the resolved UUID so logs match the WS payload.

3. Extend TestDeleteIssueByIdentifier to subscribe to the bus and
   assert issue:deleted.payload.issue_id is the resolved UUID, not
   the identifier.

* fix(server): validate remaining reviewed UUID inputs

* fix(server): validate remaining handler UUID inputs

* fix(server): finish request boundary UUID audit

* fix(server): validate remaining request body UUIDs

* fix(server): validate runtime path UUIDs

* fix(server): validate remaining audit UUID inputs

---------

Co-authored-by: Eve <eve@multica.ai>
2026-04-28 14:50:28 +08:00
devv-eve
9ed1fa95fc feat(server): add readiness health endpoints (#1605)
* feat(server): add readiness health endpoints

* fix(server): cache readiness checks

* fix(server): raise readiness cache ttl

---------

Co-authored-by: Eve <eve@multica.ai>
2026-04-24 13:50:24 +08:00
LinYushen
b624cd98ad feat: identify clients via X-Client-Platform/Version/OS (#1477)
* feat: identify clients via X-Client-Platform/Version/OS

Adds client identification headers (and matching WS query params) across
all first-party clients so the server can split logs/metrics/gating by
caller without parsing User-Agent.

- HTTP: X-Client-Platform, X-Client-Version, X-Client-OS
- WS: client_platform, client_version, client_os query params
- Platform ∈ {web, desktop, cli, daemon}; OS ∈ {macos, windows, linux}

Wired through the shared TS ApiClient/WSClient via a new identity option
on CoreProvider. Web reads its version from package.json/env; Desktop
captures version + OS synchronously in preload via sendSync IPC. Go CLI
and daemon clients populate the same headers using runtime.GOOS
(normalized darwin → macos).

Server-side adds a ClientMetadata middleware that stashes the headers in
request context; the request logger and logger.RequestAttrs surface them
on every access log and handler-level log. Realtime hub logs the same
fields on websocket connect.

CORS allowlist extended for the new headers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test: address client-identity PR nits

- Memoize the CoreProvider identity object on Web and Desktop, and key
  WSProvider's effect on identity primitives instead of the object
  reference, so unrelated parent re-renders no longer tear down and
  reconnect the WebSocket.
- Add direct header-injection tests for the CLI and daemon Go HTTP
  clients (X-Client-Platform/Version/OS) and a normalizeGOOS unit test
  on both packages.
- Add a TS test for WSClient that asserts client_platform/client_version/
  client_os land on the upgrade URL and never leak the auth token.
- Add a hub test that dials the WS endpoint with client_* query params
  and asserts the "websocket connected" log entry surfaces them as
  structured attributes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-22 13:36:13 +08:00
Naiyuan Qing
f0f3cb5c3a fix(server): resolve X-Workspace-Slug in middleware-less handlers (#1165)
Problem
-------
The v2 workspace URL refactor (#1141) switched the frontend from sending
X-Workspace-ID (UUID) to X-Workspace-Slug. The workspace middleware was
updated to accept the slug and translate it via GetWorkspaceBySlug.

But the handler package maintained a PARALLEL resolver
(`resolveWorkspaceID` in handler.go) used by endpoints that sit outside
the workspace middleware — and that resolver was never updated. It only
checked context / ?workspace_id / X-Workspace-ID, never the slug.

/api/upload-file is the one production route that hit the broken path:
it's user-scoped (not behind workspace middleware) because it also
serves avatar uploads (no workspace). Post-refactor requests from the
frontend arrived with only X-Workspace-Slug; the handler resolver
returned "", the code fell into the "no workspace context" branch, and
every file upload since v2 landed in S3 with no corresponding DB
attachment row — files orphaned, invisible to the UI.

Root cause is structural: two resolvers doing the same job, written
independently, diverged silently when one was updated.

Fix
---
Collapse to a single shared helper. middleware.ResolveWorkspaceIDFromRequest
is the new canonical resolver; both the middleware's internal
`resolveWorkspaceUUID` (for middleware gating) and the handler-side
`(h *Handler).resolveWorkspaceID` (promoted from a package function)
now delegate to it. Priority order matches what the middleware has had
since v2: context > X-Workspace-Slug header > ?workspace_slug query >
X-Workspace-ID header > ?workspace_id query.

Impact analysis
---------------
47 call sites of the old `resolveWorkspaceID(r)` are renamed to
`h.resolveWorkspaceID(r)`. 46 of them sit behind workspace middleware,
so they hit the context fast path and see zero behavior change. The
one caller that actually gains capability is UploadFile — which now
correctly recognizes slug requests and creates DB attachment rows.

Tests
-----
- New table-driven unit test for ResolveWorkspaceIDFromRequest covers
  all priority levels and the unknown-slug fallback.
- Regression tests for UploadFile: once with X-Workspace-Slug only
  (the broken path), once with X-Workspace-ID only (legacy CLI/daemon
  compat path). Both assert that a DB attachment row is created.
- Full Go test suite passes; typecheck + pnpm test unaffected.

Plan
----
See docs/plans/2026-04-16-unify-workspace-identity-resolver.md for the
full first-principles writeup.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 18:01:56 +08:00
Naiyuan Qing
fe358feff0 Reapply "feat: workspace URL refactor v2 + rollback-safe compat layer (#1138)" (#1139) (#1141)
This reverts commit b30fd98605.
2026-04-16 13:16:35 +08:00
Naiyuan Qing
b30fd98605 Revert "feat: workspace URL refactor v2 + rollback-safe compat layer (#1138)" (#1139)
This reverts commit 75d12c26c5.
2026-04-16 12:26:40 +08:00
Naiyuan Qing
75d12c26c5 feat: workspace URL refactor v2 + rollback-safe compat layer (#1138)
* Reapply "feat: workspace URL refactor + slug-first API identity (#1131)" (#1137)

This reverts commit 9b94914bc8.

* compat: legacy URL redirect + localStorage double-write for safe rollback

The first attempt at this refactor (#1131) was reverted because existing
users on old URLs (/issues, /projects, etc.) hit 404 immediately after
deploy, and rolling back left them with empty dashboards — the legacy
code reads localStorage["multica_workspace_id"] to attach a workspace
to API requests, but the new code had stopped writing that key.

Two compat layers added on top of the restored refactor:

1. proxy.ts now intercepts legacy route prefixes (/issues/*, /projects/*,
   /agents/*, /inbox/*, /my-issues/*, /autopilots/*, /runtimes/*,
   /skills/*, /settings/*). Logged-in users with a last_workspace_slug
   cookie are 302'd to /{slug}/{rest}, preserving their deep link. Users
   without the cookie bounce through / where the landing page picks a
   workspace client-side. Unauthenticated users go to /login.

2. Both layouts now double-write the workspace id to the legacy
   localStorage key on every workspace entry. New code ignores this key
   — it exists solely so that if this PR ever gets reverted again, the
   legacy build reading the key would still find the correct workspace
   and avoid the empty-dashboard symptom users saw during the rollback.

Net effect: any direction of deploy ↔ rollback is now cache-compatible,
and any direction of old bookmark → new route resolves without 404.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(platform): defer rehydrateAllWorkspaceStores to a microtask

Same React 19 render-phase restriction that forced setCurrentWorkspace
to defer its subscriber notifications. rehydrateAllWorkspaceStores
synchronously calls each persist store's rehydrate, which setState()s
the store, which schedules updates on any subscribed component. When
the workspace layout's render-phase ref guard invoked this, React
complained that SearchCommand (a store subscriber) couldn't be
re-rendered while WorkspaceLayout was still rendering.

Fix: queueMicrotask the rehydrate loop and add a pending-flag guard so
rapid workspace switches coalesce into one rehydrate on the final slug.
Persist stores tolerate one microtask of staleness — they hold UI
preferences, not correctness-critical state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 12:23:41 +08:00
Naiyuan Qing
9b94914bc8 Revert "feat: workspace URL refactor + slug-first API identity (#1131)" (#1137)
This reverts commit 59ace95a1e.
2026-04-16 11:56:15 +08:00
Naiyuan Qing
59ace95a1e feat: workspace URL refactor + slug-first API identity (#1131)
* feat: workspace URL refactor + slug-first API identity

Make the URL the single source of truth for workspace identity.
All workspace-scoped URLs now carry the workspace slug as the first
path segment (/{slug}/issues, /{slug}/projects, etc.), matching the
industry standard (Linear, Notion, Vercel, GitHub).

## Key architectural changes

**URL-driven workspace identity:**
- Web routes moved under app/[workspaceSlug]/(dashboard)/
- Desktop routes nested under /:workspaceSlug
- paths.ts builder centralises all URL construction
- reserved-slugs validation (backend + frontend + DB migration audit)

**Slug-first API contract:**
- Frontend sends X-Workspace-Slug header (from URL) instead of X-Workspace-ID (UUID)
- Backend middleware resolves slug → UUID via GetWorkspaceBySlug, falls back to
  X-Workspace-ID for CLI/daemon backwards compatibility
- WebSocket auth accepts ?workspace_slug query param with SlugResolver callback

**State cleanup:**
- Deleted: useWorkspaceStore (Zustand mirror), switchWorkspace/hydrateWorkspace/
  clearWorkspace, localStorage["multica_workspace_id"], api._workspaceId
- useCurrentWorkspace() derives from URL slug + React Query workspace list
- useWorkspaceId() is now a bridge hook (no Context, derives from useCurrentWorkspace)
- WorkspaceIdProvider removed from DashboardGuard
- Paired module vars (slug + UUID) in workspace-storage.ts for non-React consumers

**Layout simplified:**
- Render-phase ref guard sets workspace context synchronously (no async gate)
- DashboardGuard handles auth redirect, loading state, and workspace resolution
- Subscriber notifications deferred via queueMicrotask (React 19 compat)
- persist namespace uses slug (immutable) instead of UUID

## Issues resolved

MUL-43 (share links), MUL-509 (mobile workspace switch), MUL-723 (workspace in URL),
MUL-727 (create workspace flash), MUL-728 (delete workspace no-navigate),
MUL-820 (sidebar Join not switching)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve code review C3/C4/C5/C6 — desktop deadlock + hardcoded paths

C3: Desktop OnboardingGate was calling useCurrentWorkspace() outside
WorkspaceSlugProvider → always null → permanent onboarding deadlock.
Rewrite to use useQuery(workspaceListOptions()) which reads React Query
cache directly without slug context. Remove DashboardGuard from
DesktopShell (auth gating handled by AppContent, workspace routing by
WorkspaceRouteLayout per-tab).

C4: Landing page "Dashboard" links hardcoded /issues (no longer valid).
Changed to / — proxy handles redirect to /{lastSlug}/issues.

C5: autopilots-page.tsx had one hardcoded /autopilots/${id} link.
Changed to wsPaths.autopilotDetail(id).

C6: inbox-page.tsx hardcoded /inbox paths. Changed to wsPaths.inbox().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): wrap shell in WorkspaceSlugProvider from module var

AppSidebar calls useWorkspacePaths() → useRequiredWorkspaceSlug() which
throws outside WorkspaceSlugProvider. In the desktop shell, the sidebar
renders at the shell level (outside any tab's WorkspaceRouteLayout).

Fix: DesktopShell reads the current slug via useSyncExternalStore on
the workspace-storage singleton. When slug is available, wraps the
entire shell in WorkspaceSlugProvider. When null (first mount before
any tab's WorkspaceRouteLayout sets it), shows a loading spinner.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(desktop): migrate old tab paths + fix shell slug deadlock

Tab store rehydration: old-format paths like "/issues/abc" (missing
workspace slug prefix) are reset to "/" so IndexRedirect picks the
correct workspace. Detection: if the first segment is a known route
name (issues, projects, etc.) rather than a workspace slug, it's an
old-format path.

Desktop shell: TabContent must always render (not gated behind slug
check) so WorkspaceRouteLayout can mount and call setCurrentWorkspace.
Only sidebar and shell-level UI (chat, modals, search) gate on slug
being present.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 11:53:09 +08:00
LinYushen
e20c507dcc fix(security): add Content-Security-Policy response header (#822)
Adds CSP middleware to the global middleware chain as a browser-level
defense against XSS: script-src 'self', object-src 'none',
frame-ancestors 'none', base-uri 'self', form-action 'self'.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 12:53:39 +08:00
LinYushen
95bfd7dd96 feat(auth): migrate auth token to HttpOnly Cookie & WebSocket Origin whitelist (#819)
* feat(auth): migrate auth token to HttpOnly cookie & implement WebSocket Origin whitelist

Security improvements from the MUL-566 audit report:

1. Auth token is now set as an HttpOnly, SameSite=Lax cookie on login,
   preventing XSS-based token theft. Cookie-based auth includes CSRF
   protection via double-submit cookie pattern. The Authorization header
   path is preserved for Electron desktop app and CLI/PAT clients.

2. WebSocket upgrader now validates the Origin header against a
   configurable allowlist (ALLOWED_ORIGINS env var), rejecting
   connections from unauthorized origins.

Backend: new auth cookie helpers, middleware reads cookie as fallback,
WS handler accepts cookie auth, Origin whitelist, logout endpoint.
Frontend: CSRF token in API headers, cookie-aware auth store and WS
client, web app opts into cookieAuth mode while desktop keeps tokens.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(auth): address PR review — Strict cookies, HMAC-bound CSRF, origin sync

1. SameSite=Lax → SameSite=Strict per spec requirement
2. CSRF token now HMAC-signed with auth token (nonce.signature format),
   preventing subdomain cookie injection attacks
3. allowedWSOrigins uses atomic.Value to eliminate data race
4. Removed magic "cookie" sentinel string in WSProvider — pass null token
   and guard with boolean check instead
5. Removed dead delete uploadHeaders["Content-Type"] in API client

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 12:13:35 +08:00
Jiayuan Zhang
82bbce98fd fix(security): add workspace ownership checks to daemon API routes (#684)
* fix(security): add workspace ownership checks to all daemon API routes

Switch daemon routes from middleware.Auth to middleware.DaemonAuth and
add per-handler workspace ownership verification. This prevents
cross-workspace access to runtimes, tasks, usage, and daemon lifecycle
endpoints (HIGH-1/2/3 + CHAIN-1/2/3).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(security): support mdt_ daemon tokens in DaemonRegister + add regression tests

DaemonRegister now handles both auth paths:
- mdt_ daemon tokens: verify workspace match, skip member check, zero OwnerID
  (SQL COALESCE preserves existing owner on upsert)
- PAT/JWT: existing member check + OwnerID from member

Also adds WithDaemonContext helper and regression tests covering:
- Successful register with daemon token
- Workspace mismatch rejection
- Cross-workspace heartbeat rejection
- Cross-workspace task status rejection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 12:49:23 +08:00
sunjie21
3bf094ebf7 fix(auth): extend JWT and CloudFront cookie expiration from 72h to 30 days
Reduces login frequency for users by increasing token lifetime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 21:48:31 +08:00
Jiang Bohan
b8c784dda3 merge: resolve conflicts with main
- Take main's router.go, rich-text-editor.tsx, comment-card.tsx
- Remove deleted daemon_pairing.go
- Keep issue mention card feature
2026-03-31 16:25:20 +08:00
Jiayuan
afdfee78b9 feat(daemon): add authentication for daemon API routes
Issue daemon auth tokens (mdt_) on pairing session claim, bound to
workspace_id + daemon_id with 1-year expiry. Add DaemonAuth middleware
that validates these tokens and falls back to JWT/PAT for backward
compatibility. Apply middleware to all daemon routes except pairing
endpoints.
2026-03-31 16:19:02 +08:00
yushen
9e23fb76fc fix(upload): harden upload flow — sanitize filenames, refresh CF cookies, deduplicate handlers
- Sanitize Content-Disposition filenames to prevent header injection (strip control chars, quotes, semicolons)
- Add CloudFront cookie refresh middleware so cookies are re-issued when expired
- Log errors in groupAttachments instead of silently swallowing them
- Move useFileUpload hook to shared/hooks/ per project architecture conventions
- Add uploadWithToast helper to deduplicate try/catch/toast pattern across 3 components
- Refactor ApiClient.uploadFile to reuse auth headers, 401 handling, and error parsing
- Allow empty MIME types client-side (let server sniff and decide)
- Constrain Image extension max-width in rich-text-editor to prevent layout overflow

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 15:52:40 +08:00
Jiayuan
f4a6e7c475 refactor(server): consolidate workspace permission checks into middleware
Move workspace membership and role validation from individual handlers
into dedicated Chi middleware. The new middleware resolves workspace ID
(from query param, X-Workspace-ID header, or URL param), validates
membership via DB, and injects the member into request context.

Handlers now read workspace ID and member from context instead of
calling requireWorkspaceMember/requireWorkspaceRole directly. This
eliminates ~17 duplicated permission checks across handlers and makes
it harder to accidentally omit access control on new routes.
2026-03-30 03:40:20 +08:00
LinYushen
5c9c2f69fd feat(auth): email verification login and personal access tokens
* feat(auth): add email verification login flow with 401 auto-redirect

Replace the old OAuth-based login with email verification codes:
- Backend: send-code / verify-code endpoints, verification_codes table (migration 009), rate limiting, Resend email service
- Frontend: two-step login UI (email → 6-digit OTP), auth store with sendCode/verifyCode
- SDK: ApiClient gains onUnauthorized callback; 401 responses auto-clear token and redirect to /login
- Fix login button staying disabled due to global isLoading state

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(auth): add brute-force protection, redirect loop guard, and expired code cleanup

- VerifyCode: increment attempts on wrong code, reject after 5 failed tries (migration 010)
- onUnauthorized: skip redirect if already on /login to prevent infinite loops
- SendCode: best-effort cleanup of expired verification codes older than 1 hour

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(auth): add master verification code for non-production environments

Allow code "888888" to bypass email verification in non-production
environments to simplify development and testing workflows.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(auth): add personal access tokens for CLI and API authentication

Add full-stack PAT support: users create tokens in Settings, CLI authenticates
via `multica auth login`. Server stores SHA-256 hashes only. Auth middleware
extended to accept both JWTs and PATs (distinguished by `mul_` prefix).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 14:32:30 +08:00
Naiyuan Qing
8983a9fefa feat(logging): add structured logging across server and SDK
Replace raw fmt/log calls with structured slog logger (Go) and
console-based logger (TypeScript). Add request logging middleware.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 10:57:11 +08:00
Jiayuan Zhang
81e64e9fce Add workspace management and isolated worktree environments 2026-03-23 18:12:11 +08:00
Jiayuan Zhang
6dfc61fa86 test: add comprehensive test suite (Go unit/integration, Vitest, Playwright E2E)
- Add JWT middleware unit tests (8 tests covering all auth edge cases)
- Add WebSocket hub tests (5 tests for client lifecycle and broadcast)
- Add full HTTP integration tests (12 tests through real Chi router with DB)
- Add frontend component tests for login, issues, and issue detail pages
- Add auth context unit tests (9 tests for login/logout/name resolution)
- Add Playwright E2E tests for auth, issues, comments, and navigation
- Configure Vitest with jsdom, React plugin, and path aliases

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 11:50:25 +08:00
Jiayuan Zhang
1e61c1974c feat(server): implement full REST API with JWT auth and real-time WebSocket
- Add HTTP handlers for issues, comments, agents, workspaces, inbox, members, and activity
- Implement JWT authentication middleware with Bearer token validation
- Add sqlc queries for all entities (CRUD operations)
- Extract router into reusable NewRouter() for testability
- Expand SDK with full API client methods (CRUD for all resources)
- Add updateWorkspace to SDK, add Member type to shared types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 11:50:03 +08:00
Jiayuan Zhang
d4f5c5b16f feat: pivot to AI-native task management platform (#232)
Replace the agent framework codebase with a new monorepo structure
for an AI-native Linear-like product where agents are first-class citizens.

New architecture:
- server/ — Go backend (Chi + gorilla/websocket + sqlc)
  - API server with REST routes for issues, agents, inbox, workspaces
  - WebSocket hub for real-time updates
  - Local daemon entry point for agent runtime connection
  - PostgreSQL migration with 13 tables (issue, agent, inbox, etc.)
  - WebSocket protocol types for server<->daemon communication
- apps/web/ — Next.js 16 frontend
  - Dashboard layout with sidebar navigation
  - Route skeleton: inbox, issues, agents, board, settings
- packages/ui/ — Preserved shadcn/ui design system (26+ components)
- packages/types/ — Full API contract types (Issue, Agent, Workspace, Inbox, Events)
- packages/sdk/ — REST ApiClient + WebSocket WSClient
- packages/store/ — Zustand stores (issue, agent, inbox, auth)
- packages/hooks/ — React hooks (useIssues, useAgents, useInbox, useRealtime)
- packages/utils/ — Shared utilities

Removed: apps/cli, apps/desktop, apps/mobile, apps/gateway,
packages/core, skills/, and all agent-framework code.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 17:55:49 +08:00