feat(chat): workspace-scoped attachment binding + fire-and-forget send (#4249)

* feat(chat): workspace-scoped attachment binding + fire-and-forget send

Uploads are now workspace-scoped: the chat session is created and attachments are bound to the message at send time, so a paste/drop no longer creates an empty session the user never sends.

- LinkAttachmentsToChatMessage returns the ids it actually bound; the client diffs requested-vs-bound and warns on partial bind, replacing an extra listChatMessagesPage fetch.
- Cancelling an empty chat task detaches attachments before deleting the user message (attachment FK is ON DELETE CASCADE) and returns them via cancelled_chat_message.attachments, so a restored draft can re-bind.
- SendChatMessageResponse.attachment_ids has no omitempty: "requested but bound zero" serializes [] so the client can tell it apart from an older server and still warn.
- Send is fire-and-forget: it no longer steals focus when the user has navigated to another session (guarded on the live store + new-chat agent id); the reply surfaces via the unread dot. commitInput gets clearEditor so a navigated-away commit doesn't wipe the editor now showing another session, while still clearing the sent draft's data.
- Draft restore is session-aware so a failed fire-and-forget send restores into the session it was sent from, never the one the user moved to.
- Removed the now-unreferenced migrateInputDraft store action.

Verified: core/views typecheck, chat-input (15) / store (3) / api client (24) unit tests, go build + vet, handler SendChatMessage + CancelTaskByUser DB tests. Full make check / E2E left to CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(chat): guard attachment survival on empty-chat cancel

Cancelling an empty chat task deletes the user message, and attachment.chat_message_id is ON DELETE CASCADE (migration 083), so the detach-before-delete in finalizeCancelledChatMessage is the only thing keeping the user's attachment from being silently destroyed. Nothing covered it.

Add a DB regression test that binds an attachment to the cancelled user message and asserts: the row survives the cascade (chat_message_id NULL, chat_session_id retained), the cancel response returns it via cancelled_chat_message.attachments, and a resend re-binds it to the new message. Verified red when the detach step is removed.

Related issue: MUL-3364

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>

* fix(comment): pessimistic submit for comment/reply composers

The comment and reply composers cleared the editor after `await onSubmit` returned, with no in-flight lock. On a slow send the WS `comment:created` event already dropped the real comment into the timeline while the box still held the same text + spinner, so it read as two comments. And because `submitComment`/`submitReply` swallow errors (toast, no rethrow), a failed send still reached `clearContent` and silently discarded the user's draft.

Recover the comment/reply portion of the closed #4236: make the submit callback resolve a success boolean (true on success, false on the caught failure), lock the editor while in flight (pointer-events-none + dimmed wrapper + aria-busy, since ContentEditor can't toggle Tiptap `editable` post-mount), keep the button spinning, and clear only on success; a failed send keeps the draft. Chat composer is out of scope (already reworked on this branch); attachment binding is untouched.

Adds two view tests (in-flight lock then clear-on-success; failed send keeps the draft); both verified red against the un-fixed code.

Related issue: MUL-3364

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>
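The `omitempty` and requested-vs-bound points in the first commit can be illustrated with a minimal Go sketch. The type names `withOmitEmpty` and `withoutOmitEmpty` and the helpers `unbound` and `encode` are hypothetical and are not taken from the repository; the real client-side diff lives in the TypeScript client, so the `unbound` function only mirrors the idea.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// withOmitEmpty is a hypothetical response shape whose slice field is
// dropped from the JSON when it is empty.
type withOmitEmpty struct {
	AttachmentIDs []string `json:"attachment_ids,omitempty"`
}

// withoutOmitEmpty always serializes the field, so a non-nil empty slice
// becomes [] and stays distinguishable from a missing field.
type withoutOmitEmpty struct {
	AttachmentIDs []string `json:"attachment_ids"`
}

// unbound returns the requested ids that are missing from bound,
// preserving the order of requested.
func unbound(requested, bound []string) []string {
	seen := make(map[string]struct{}, len(bound))
	for _, id := range bound {
		seen[id] = struct{}{}
	}
	var missing []string
	for _, id := range requested {
		if _, ok := seen[id]; !ok {
			missing = append(missing, id)
		}
	}
	return missing
}

// encode marshals v and returns the JSON text.
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(encode(withOmitEmpty{AttachmentIDs: []string{}}))     // {}
	fmt.Println(encode(withoutOmitEmpty{AttachmentIDs: []string{}}))  // {"attachment_ids":[]}
	fmt.Println(unbound([]string{"a", "b", "c"}, []string{"a", "c"})) // [b]
}
```

Note that the slice has to be non-nil for `[]` to appear; a nil slice without `omitempty` marshals as `null`.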
MUL-3328: add retry button for failed agent comments (#4217)
2026-06-18 04:09:13 +02:00 · 2026-06-18 09:40:38 +08:00 · 2026-06-17 15:18:44 +02:00 · 2026-06-17 15:18:06 +02:00 · 2026-06-17 18:48:33 +08:00 · 2026-06-17 18:23:46 +08:00
863 changed files with 102985 additions and 9653 deletions
--- a/.env.example
+++ b/.env.example
@@ -21,11 +21,16 @@ APP_ENV=
 # 888888 and keep APP_ENV non-production. This is ignored when APP_ENV=production.
 MULTICA_DEV_VERIFICATION_CODE=
 PORT=8080
-# Optional aliases for the local/self-host backend port. If one is set, it
-# takes precedence over PORT in compose, Makefile, and installer helpers.
-# BACKEND_PORT=8080
+# Docker Compose consumes flat port values. Set BACKEND_PORT directly to
+# override the backend host port.
+BACKEND_PORT=8080
+# Optional aliases for local/self-host backend port helpers outside compose.
 # API_PORT=8080
 # SERVER_PORT=8080
+FRONTEND_PORT=3000
+# Derived by docker-compose.selfhost.yml / local scripts from FRONTEND_PORT.
+# Set explicitly only when serving frontend on a different origin/domain.
+FRONTEND_ORIGIN=http://localhost:${FRONTEND_PORT}
 # Prometheus metrics are disabled by default. When enabled, bind to loopback
 # unless you protect the listener with private networking, allowlists, or
 # proxy auth. Do not expose this endpoint through the public app/API ingress.
@@ -35,9 +40,9 @@ JWT_SECRET=change-me-in-production
 # Derived by Makefile / local scripts from the backend port.
 # Set explicitly only when the daemon reaches the API through a different URL.
 # MULTICA_SERVER_URL=ws://localhost:8080/ws
-# Derived by docker-compose.selfhost.yml / local scripts from FRONTEND_PORT.
+# Derived by docker-compose.selfhost.yml / local scripts from FRONTEND_ORIGIN.
 # Set explicitly only when the app's public URL differs from local frontend.
-# MULTICA_APP_URL=http://localhost:3000
+MULTICA_APP_URL=${FRONTEND_ORIGIN}
 # Public URL the API is reachable at from the open internet (no trailing
 # slash). Used to mint absolute webhook URLs for autopilot webhook
 # triggers and to show correct daemon setup commands in the web UI. Leave
@@ -95,12 +100,16 @@ RESEND_FROM_EMAIL=noreply@multica.ai
 #       Required by providers that only offer port 465 and do not advertise
 #       STARTTLS (e.g. Aliyun enterprise mail). Auto-enabled when SMTP_PORT=465
 #       and SMTP_TLS is unset.
+#   SMTP_EHLO_NAME is the EHLO/HELO name announced to the relay. Defaults to the
+#     machine hostname; set a real FQDN when a strict relay (e.g. Google Workspace
+#     smtp-relay.gmail.com) rejects the default and the connection drops as an EOF.
 SMTP_HOST=
 SMTP_PORT=25
 SMTP_USERNAME=
 SMTP_PASSWORD=
 SMTP_TLS_INSECURE=false
 SMTP_TLS=
+SMTP_EHLO_NAME=

 # Google OAuth
 # The web login page reads GOOGLE_CLIENT_ID from /api/config at runtime, so
@@ -108,9 +117,9 @@ SMTP_TLS=
 # rebuild is needed.
 GOOGLE_CLIENT_ID=
 GOOGLE_CLIENT_SECRET=
-# Derived by docker-compose.selfhost.yml / local scripts from FRONTEND_PORT.
+# Derived by docker-compose.selfhost.yml / local scripts from FRONTEND_ORIGIN.
 # Set explicitly only when your OAuth callback URL differs from local frontend.
-# GOOGLE_REDIRECT_URI=http://localhost:3000/auth/callback
+GOOGLE_REDIRECT_URI=${FRONTEND_ORIGIN}/auth/callback

 # S3 / CloudFront
 # S3_BUCKET — bucket NAME only (e.g. "my-bucket"). Do NOT include the
@@ -118,6 +127,15 @@ GOOGLE_CLIENT_SECRET=
 # from S3_BUCKET + S3_REGION. S3_REGION must match the bucket's real region.
 S3_BUCKET=
 S3_REGION=us-west-2
+AWS_ACCESS_KEY_ID=
+AWS_SECRET_ACCESS_KEY=
+# AWS_ENDPOINT_URL — optional S3-compatible endpoint (MinIO, RustFS, R2, etc.).
+# For internal Docker/VPC hosts such as http://rustfs:9000, leave
+# ATTACHMENT_DOWNLOAD_MODE=auto or set proxy explicitly so browsers/CLI do
+# not need direct access to the object store.
+AWS_ENDPOINT_URL=
+ATTACHMENT_DOWNLOAD_MODE=auto
+ATTACHMENT_DOWNLOAD_URL_TTL=30m
 CLOUDFRONT_KEY_PAIR_ID=
 CLOUDFRONT_PRIVATE_KEY_SECRET=multica/cloudfront-signing-key
 CLOUDFRONT_PRIVATE_KEY=
@@ -188,12 +206,35 @@ CORS_ALLOWED_ORIGINS=
 # GITHUB_APP_SLUG is the tail of https://github.com/apps/<slug>.
 GITHUB_APP_SLUG=
 GITHUB_WEBHOOK_SECRET=
+# Optional: GitHub App identity for App-authenticated REST calls. When set,
+# the setup callback enriches the installation row with the real account
+# login (org / user name) immediately after install. When unset, the row
+# is created with the "unknown" placeholder and the next `installation`
+# webhook from GitHub overwrites it — set both to skip that interim flash.
+# GITHUB_APP_ID is the numeric "App ID" shown on the App's settings page.
+# GITHUB_APP_PRIVATE_KEY is the full PEM block (including BEGIN/END lines)
+# generated under "Private keys" on that same page; preserve newlines.
+GITHUB_APP_ID=
+GITHUB_APP_PRIVATE_KEY=
+
+# Lark / Feishu bot integration (Settings → Integrations "Bind to Lark")
+# Off until MULTICA_LARK_SECRET_KEY is set — a base64-encoded 32-byte key
+# that encrypts each Bot's app secret at rest. Leave empty to disable.
+# Generate one with: openssl rand -base64 32
+MULTICA_LARK_SECRET_KEY=
+# Mainland 飞书 and international Lark are auto-detected per installation
+# (at QR scan) and served side by side — LEAVE THESE EMPTY for normal use.
+# They are optional deployment-wide overrides that force EVERY installation
+# onto one host (a proxy, a mock for tests, or a single-cloud staging
+# setup); HTTP drives outbound Open Platform API calls, CALLBACK the inbound
+# long-conn bootstrap. NOTE: if you previously ran international Lark by
+# setting these to https://open.larksuite.com, the server relabels your
+# existing installs to region=lark on first boot after upgrade, so you can
+# clear these afterwards. See docs/lark-bot-integration.
+MULTICA_LARK_HTTP_BASE_URL=
+MULTICA_LARK_CALLBACK_BASE_URL=

 # Frontend
-FRONTEND_PORT=3000
-# Derived by docker-compose.selfhost.yml / local scripts from FRONTEND_PORT.
-# Set explicitly only when serving frontend on a different origin/domain.
-# FRONTEND_ORIGIN=http://localhost:3000
 # Leave empty — auto-derived from page origin in browser, set by Makefile for local dev.
 # NEXT_PUBLIC_API_URL also feeds the Next.js SSR proxy when explicitly set.
 NEXT_PUBLIC_API_URL=
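The `.env.example` changes above chain three values off `FRONTEND_PORT`. As a rough illustration of the resulting values, the sketch below expands `${NAME}` references in declaration order using Go's `os.Expand`. The `resolve` helper is hypothetical, and this is not how Docker Compose or the project's scripts implement interpolation; it only shows what the defaults would resolve to under simple in-order expansion.

```go
package main

import (
	"fmt"
	"os"
)

// resolve expands ${NAME} references entry by entry, so a later entry can
// refer to any earlier one. Unknown names expand to the empty string.
func resolve(entries [][2]string) map[string]string {
	env := make(map[string]string, len(entries))
	for _, e := range entries {
		env[e[0]] = os.Expand(e[1], func(name string) string { return env[name] })
	}
	return env
}

func main() {
	env := resolve([][2]string{
		{"FRONTEND_PORT", "3000"},
		{"FRONTEND_ORIGIN", "http://localhost:${FRONTEND_PORT}"},
		{"MULTICA_APP_URL", "${FRONTEND_ORIGIN}"},
		{"GOOGLE_REDIRECT_URI", "${FRONTEND_ORIGIN}/auth/callback"},
	})
	fmt.Println(env["FRONTEND_ORIGIN"])     // http://localhost:3000
	fmt.Println(env["MULTICA_APP_URL"])     // http://localhost:3000
	fmt.Println(env["GOOGLE_REDIRECT_URI"]) // http://localhost:3000/auth/callback
}
```

Under this reading, changing only `FRONTEND_PORT` moves all three derived values together, which is presumably why `FRONTEND_PORT` and `FRONTEND_ORIGIN` were moved above their first use in the file.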
--- a/.github/workflows/mobile-verify.yml
+++ b/.github/workflows/mobile-verify.yml
@@ -13,8 +13,9 @@ name: Mobile Verify
 # - pnpm-workspace.yaml           — catalog versions
 # - turbo.json                    — turbo task pipeline
 #
-# Mobile has no vitest suite today; if one lands, add `test` to the turbo
-# task list below.
+# Mobile's vitest suite is intentionally narrow (Node env, pure-function
+# tests under apps/mobile/lib/*.test.ts — see apps/mobile/vitest.config.ts).
+# RN component-level rendering is not exercised here.

 on:
  push:
@@ -61,5 +62,5 @@ jobs:
      - name: Install dependencies
        run: pnpm install

-      - name: Type check and lint
-        run: pnpm exec turbo typecheck lint --filter=@multica/mobile
+      - name: Type check, lint, and test
+        run: pnpm exec turbo typecheck lint test --filter=@multica/mobile
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -176,6 +176,7 @@ make start-worktree     # Start using .env.worktree
 - Avoid broad refactors unless required by the task.
 - New global (pre-workspace) routes MUST use a single word (`/login`, `/inbox`) or a `/{noun}/{verb}` pair (`/workspaces/new`). NEVER add hyphenated word-group root routes (`/new-workspace`, `/create-team`) — they collide with common user workspace names and force endless reserved-slug audits. Reserving the noun (`workspaces`) automatically protects the entire `/workspaces/*` subtree.
 - The reserved-slug list lives in **one** place: `server/internal/handler/reserved_slugs.json`. The Go side embeds the JSON; `packages/core/paths/reserved-slugs.ts` is generated from it by `pnpm generate:reserved-slugs`. Edit the JSON, run the generator, commit both. CI re-runs the generator and fails on any drift, so a stale TS file cannot land.
+- When you change a CLI command or flag, an API request/response field, or product behavior that a built-in skill documents (`server/internal/service/builtin_skills/*`), update that skill's `SKILL.md` **and** its `references/*-source-map.md` in the same PR. The built-in skills are source-traced contracts shipped to agents — if the code moves and the skill doesn't, it silently teaches stale behavior.

 ### API Response Compatibility

--- a/CLI_AND_DAEMON.md
+++ b/CLI_AND_DAEMON.md
@@ -168,7 +168,7 @@ Daemon behavior is configured via flags or environment variables:
 |---------|------|--------------|---------|
 | Poll interval | `--poll-interval` | `MULTICA_DAEMON_POLL_INTERVAL` | `3s` |
 | Heartbeat interval | `--heartbeat-interval` | `MULTICA_DAEMON_HEARTBEAT_INTERVAL` | `15s` |
-| Agent timeout | `--agent-timeout` | `MULTICA_AGENT_TIMEOUT` | `2h` |
+| Agent timeout | `--agent-timeout` | `MULTICA_AGENT_TIMEOUT` | `0` (no cap; bounded by the watchdogs) |
 | Codex semantic inactivity timeout | `--codex-semantic-inactivity-timeout` | `MULTICA_CODEX_SEMANTIC_INACTIVITY_TIMEOUT` | `10m` |
 | Max concurrent tasks | `--max-concurrent-tasks` | `MULTICA_DAEMON_MAX_CONCURRENT_TASKS` | `20` |
 | Daemon ID | `--daemon-id` | `MULTICA_DAEMON_ID` | hostname |
@@ -699,3 +699,79 @@ Most commands support `--output` with two formats:
 multica issue list --output json
 multica daemon status --output json
 ```
+
+## Error Messages
+
+The CLI funnels command errors returned to the top-level handler through a
+single user-facing translation layer (`server/internal/cli/errors.go`) so that
+what you see on the terminal is a short, actionable sentence rather than a raw
+Go error, an HTTP status line, or an internal `resolve issue: ...` chain. (A
+few commands print their own output or run deliberate fast probes — for example
+`setup`'s short `/health` reachability check — and don't go through this
+layer.) The underlying detail is still available on demand (see `--debug`).
+
+### What you see
+
+- **Friendly, single-line message.** Transport failures (timeout, DNS,
+  connection refused, TLS) and HTTP status failures (401/403/404/409/400·422/
+  429/5xx) are each rendered as one clear sentence with a next step — for
+  example a timeout suggests checking the network or raising
+  `MULTICA_HTTP_TIMEOUT`, and a 401 tells you to run `multica login`.
+- **Server-provided validation messages are preserved.** For a 400/422 that
+  carries a message from the server, that message is shown verbatim
+  (`Invalid request: <server message>`); only when there is none do you get the
+  generic "check your values / run with --help" hint.
+- **No leaked internals by default.** Raw URLs, status lines, JSON bodies, and
+  the internal verb chain are hidden unless you ask for them.
+
+### Language
+
+Messages default to **English**, matching the rest of the CLI's help output.
+If a Chinese locale is detected in `LC_ALL`, `LC_MESSAGES`, or `LANG` (in that
+precedence order), messages switch to **Chinese**. No flag is needed; set the
+locale as usual:
+
+```bash
+LANG=zh_CN.UTF-8 multica issue get MUL-9999   # 错误信息显示为中文
+```
+
+### Exit codes
+
+The process exit code is tiered so scripts can branch on the failure class:
+
+| Exit code | Meaning |
+| --- | --- |
+| `0` | success |
+| `1` | generic / unclassified error |
+| `2` | network error (timeout, DNS, connection refused, TLS, offline) |
+| `3` | authentication / authorization (HTTP 401, 403) |
+| `4` | not found (HTTP 404) |
+| `5` | validation (HTTP 400, 422) |
+
+```bash
+multica issue get MUL-9999
+if [ $? -eq 4 ]; then echo "no such issue"; fi
+```
+
+### Seeing the full detail (`--debug`)
+
+Pass the global `--debug` flag (or set `MULTICA_DEBUG=1`) to print the complete
+original error chain — the internal verb chain, the request method/path/status,
+and the raw server body — underneath the friendly message. Use it when you need
+to file a bug or understand exactly what the server returned:
+
+```bash
+multica issue list --debug
+MULTICA_DEBUG=1 multica issue update MUL-1234 --title "x"
+```
+
+### Request timeout
+
+API requests use a default timeout of 30 seconds. Override it with
+`MULTICA_HTTP_TIMEOUT` when you are on a slow network; it accepts a Go duration
+(`45s`, `2m`) or a plain number of seconds (`45`). Command-level deadlines are
+always at least this value, so raising it takes effect across all commands.
+
+```bash
+MULTICA_HTTP_TIMEOUT=60s multica issue list
+```
--- a/3
+++ b/3
@@ -15,8 +15,9 @@ COPY server/ ./server/
 # Build binaries
 ARG VERSION=dev
 ARG COMMIT=unknown
+ARG DATE=unknown
 RUN cd server && CGO_ENABLED=0 go build -ldflags "-s -w -X main.version=${VERSION} -X main.commit=${COMMIT}" -o bin/server ./cmd/server
-RUN cd server && CGO_ENABLED=0 go build -ldflags "-s -w -X main.version=${VERSION} -X main.commit=${COMMIT}" -o bin/multica ./cmd/multica
+RUN cd server && CGO_ENABLED=0 go build -ldflags "-s -w -X main.version=${VERSION} -X main.commit=${COMMIT} -X main.date=${DATE}" -o bin/multica ./cmd/multica
 RUN cd server && CGO_ENABLED=0 go build -ldflags "-s -w" -o bin/migrate ./cmd/migrate
 RUN cd server && CGO_ENABLED=0 go build -ldflags "-s -w" -o bin/backfill_task_usage_hourly ./cmd/backfill_task_usage_hourly

--- a/14
+++ b/14
@@ -58,12 +58,17 @@ selfhost: ## Create .env if needed, then pull and start the official self-hosted
 		echo "==> Creating .env from .env.example..."; \
 		cp .env.example .env; \
 		JWT=$$(openssl rand -hex 32); \
+		PGPASS=$$(openssl rand -hex 24); \
 		if [ "$$(uname)" = "Darwin" ]; then \
 			sed -i '' "s/^JWT_SECRET=.*/JWT_SECRET=$$JWT/" .env; \
+			sed -i '' "s/^POSTGRES_PASSWORD=.*/POSTGRES_PASSWORD=$$PGPASS/" .env; \
+			sed -i '' -E "s#^(DATABASE_URL=postgres://[^:]+:)[^@]*(@.*)#\1$$PGPASS\2#" .env; \
 		else \
 			sed -i "s/^JWT_SECRET=.*/JWT_SECRET=$$JWT/" .env; \
+			sed -i "s/^POSTGRES_PASSWORD=.*/POSTGRES_PASSWORD=$$PGPASS/" .env; \
+			sed -i -E "s#^(DATABASE_URL=postgres://[^:]+:)[^@]*(@.*)#\1$$PGPASS\2#" .env; \
 		fi; \
-		echo "==> Generated random JWT_SECRET"; \
+		echo "==> Generated random JWT_SECRET and POSTGRES_PASSWORD"; \
 	fi
 	@echo "==> Pulling official Multica images..."
 	@if ! docker compose -f docker-compose.selfhost.yml pull; then \
@@ -108,12 +113,17 @@ selfhost-build: ## Build backend/web from the current checkout and start the sel
 		echo "==> Creating .env from .env.example..."; \
 		cp .env.example .env; \
 		JWT=$$(openssl rand -hex 32); \
+		PGPASS=$$(openssl rand -hex 24); \
 		if [ "$$(uname)" = "Darwin" ]; then \
 			sed -i '' "s/^JWT_SECRET=.*/JWT_SECRET=$$JWT/" .env; \
+			sed -i '' "s/^POSTGRES_PASSWORD=.*/POSTGRES_PASSWORD=$$PGPASS/" .env; \
+			sed -i '' -E "s#^(DATABASE_URL=postgres://[^:]+:)[^@]*(@.*)#\1$$PGPASS\2#" .env; \
 		else \
 			sed -i "s/^JWT_SECRET=.*/JWT_SECRET=$$JWT/" .env; \
+			sed -i "s/^POSTGRES_PASSWORD=.*/POSTGRES_PASSWORD=$$PGPASS/" .env; \
+			sed -i -E "s#^(DATABASE_URL=postgres://[^:]+:)[^@]*(@.*)#\1$$PGPASS\2#" .env; \
 		fi; \
-		echo "==> Generated random JWT_SECRET"; \
+		echo "==> Generated random JWT_SECRET and POSTGRES_PASSWORD"; \
 	fi
 	@echo "==> Building Multica from the current checkout..."
 	docker compose -f docker-compose.selfhost.yml -f docker-compose.selfhost.build.yml up -d --build
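To make the new `sed` expression easier to review, here is a Go sketch that applies an equivalent substitution to a sample `.env` body. The `replaceDBPassword` helper and the sample `DATABASE_URL` value are hypothetical; the character classes exclude newlines so the pattern stays on one line, which approximates `sed`'s line-by-line behaviour.

```go
package main

import (
	"fmt"
	"regexp"
)

// dbURLPassword captures everything up to the colon before the password
// (group 1) and everything from the @ onward (group 2), so only the
// password between them is replaced.
var dbURLPassword = regexp.MustCompile(`(?m)^(DATABASE_URL=postgres://[^:\n]+:)[^@\n]*(@.*)`)

// replaceDBPassword swaps the password inside DATABASE_URL. The password
// must not contain "$", which Go would treat as a group reference; the hex
// output of `openssl rand -hex 24` satisfies that.
func replaceDBPassword(envFile, password string) string {
	return dbURLPassword.ReplaceAllString(envFile, "${1}"+password+"${2}")
}

func main() {
	// Hypothetical sample content, not the real .env.example.
	env := "JWT_SECRET=abc\nDATABASE_URL=postgres://multica:multica@localhost:5432/multica\n"
	fmt.Print(replaceDBPassword(env, "deadbeef"))
}
```

The user name, host, port, and database name are left untouched, so `POSTGRES_PASSWORD` and the password embedded in `DATABASE_URL` end up in sync after the two substitutions.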
--- a/SELF_HOSTING.md
+++ b/SELF_HOSTING.md
@@ -144,7 +144,7 @@ If you already run a Kubernetes cluster, you can deploy Multica there instead of
 The chart creates the following resources in the target namespace:

 - `multica-postgres` — `pgvector/pgvector:pg17` backed by a 10Gi PVC
-- `multica-backend` — Go API/WS server backed by a 5Gi uploads PVC
+- `multica-backend` — Go API/WS server. Backed by a 5Gi `ReadWriteOnce` uploads PVC by default; set `backend.uploads.persistence.enabled=false` when you have configured S3 (`backend.config.s3Bucket`) and don't want the chart to declare the PVC at all.
 - `multica-frontend` — Next.js standalone server
 - Two `Ingress` resources: one for the web host, one for the backend host
 - `multica-config` ConfigMap (rendered from `values.yaml`)
@@ -326,7 +326,7 @@ To roll back if an upgrade goes sideways:
 helm -n multica rollback multica
 ```

-> **Upgrading from `v0.3.4` to `v0.3.5+` fails with `refusing to drop legacy daily rollups: ...`?** Same migration guard as the Docker path — see [Usage Dashboard Rollup → Option C](#option-c--backfill-history-first-then-schedule). Run the backfill against the same database the chart is using (`kubectl -n multica exec deploy/multica-backend -- ./backfill_task_usage_hourly --sleep-between-slices=2s`), then restart the backend deployment to re-apply migrations.
+> **Upgrading from `v0.3.4` to `v0.3.5+` fails with `refusing to drop legacy daily rollups: ...`?** As of MUL-2957 the `migrate up` command runs an idempotent monthly-slice backfill automatically before applying migration `103`, so a clean upgrade is a single `helm upgrade` + backend rollout. If you are still on a pre-MUL-2957 binary or the auto-hook fails, run the standalone backfill against the same database the chart is using (`kubectl -n multica exec deploy/multica-backend -- ./backfill_task_usage_hourly --sleep-between-slices=2s`), then restart the backend deployment to re-apply migrations. See [Advanced Configuration → Usage Dashboard Rollup](SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup) for the full recovery flow.

 ### Tearing down

@@ -340,56 +340,52 @@ kubectl delete namespace multica

 ---

-## Usage Dashboard Rollup (Required)
+## Usage Dashboard Rollup

-Starting with `v0.3.5`, the Usage / Runtime dashboards read from a derived `task_usage_hourly` table rather than directly from `task_usage`. Raw `task_usage` rows are written by the backend on every task, but the dashboard only sees data after `rollup_task_usage_hourly()` runs and aggregates them into `task_usage_hourly`.
+The Usage / Runtime dashboards read from a derived `task_usage_hourly` table populated by `rollup_task_usage_hourly()`. As of MUL-2957 the backend runs this rollup **in-process** on every replica via a DB-backed scheduler (`sys_cron_executions`); a fresh self-host install needs no operator action and the bundled `pgvector/pgvector:pg17` image works without changes — you do **not** need to swap it for an image that ships `pg_cron`, register an external cron job, set up a systemd timer, or run a Kubernetes `CronJob`.

-**The bundled `pgvector/pgvector:pg17` image does NOT include `pg_cron`.** If nothing schedules the rollup, the dashboard will stay at zero forever even though `task_usage` is populated. You have three supported options — pick one before relying on the dashboard.
-
-> **Upgrading from `v0.3.4` to `v0.3.5+`** with existing `task_usage` history: migration `103` is fail-closed and will abort `migrate up` with `refusing to drop legacy daily rollups: …`. Run `backfill_task_usage_hourly` first (Option C below), then re-run the upgrade. **Fresh installs** are exempted by that guard and migrate cleanly — but the dashboard will still stay at zero until you pick Option A or Option B.
-
-### Option A — External cron / systemd-timer (simplest)
-
-Schedule a 5-minute job that calls `rollup_task_usage_hourly()`. It is idempotent and watermark-driven, so a missed tick catches up on the next run.
-
-```bash
-# /etc/cron.d/multica-rollup — every 5 minutes
-*/5 * * * * root docker compose -f /path/to/multica/docker-compose.selfhost.yml \
-  exec -T postgres psql -U multica -d multica \
-  -c "SELECT rollup_task_usage_hourly();" >/dev/null
-```
-
-Or as a systemd timer + service if you prefer that surface. The function returns the number of (upserted + deleted-empty) rows; it's safe to call concurrently with itself (an advisory lock makes overlapping runs no-op) and safe to call alongside `backfill_task_usage_hourly`.
-
-### Option B — Swap Postgres for an image that ships `pg_cron`
-
-If you'd rather have Postgres schedule itself, replace `pgvector/pgvector:pg17` in `docker-compose.selfhost.yml` with an image that bundles both `pgvector` and `pg_cron` (e.g. `supabase/postgres`, or your own build of `pgvector/pgvector` with `pg_cron` added and `shared_preload_libraries=pg_cron` set on the server). Then, once:
+Multiple backend replicas are safe: each replica ticks every 30 seconds and tries to claim the current 5-minute UTC plan, but the unique key `(job_name, scope_kind, scope_id, plan_time)` means only one wins each plan. Inspect steady-state operation:

 ```sql
-CREATE EXTENSION IF NOT EXISTS pg_cron;
-SELECT cron.schedule(
-  'rollup_task_usage_hourly',
-  '*/5 * * * *',
-  $$SELECT rollup_task_usage_hourly()$$
-);
+SELECT plan_time, status, attempt, runner_id,
+       error_code, error_msg, started_at, finished_at
+  FROM sys_cron_executions
+ WHERE job_name = 'rollup_task_usage_hourly'
+ ORDER BY plan_time DESC
+ LIMIT 20;
 ```

-`shared_preload_libraries` requires a Postgres restart to take effect — set it in `postgresql.conf` (or via the image's documented mechanism) before bringing the container up.
+Full reference (audit table semantics, advisory lock 4246, the standalone backfill command, flag descriptions, the `v0.3.4 → v0.3.5+` migration auto-hook) lives in [Advanced Configuration → Usage Dashboard Rollup](SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup).

-### Option C — Backfill history first, then schedule
+> **Upgrading from `v0.3.4` to `v0.3.5+`?** As of MUL-2957 the `migrate up` command runs an idempotent monthly-slice backfill automatically right before applying migration `103`, so the upgrade completes in a single invocation — no operator step required. If you are still on a pre-MUL-2957 binary or the auto-hook fails for an environmental reason, run `backfill_task_usage_hourly` against the same database and re-run the upgrade. See [Advanced Configuration → Usage Dashboard Rollup](SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup) for the recovery flow.

-If you're upgrading from `v0.3.4 → v0.3.5+` and already have `task_usage` rows (or you just want the dashboard to show historical data on a fresh install that you've been running for a while), run the bundled backfill command once before scheduling the rollup:
+### Compatibility paths (existing deployments only)

-```bash
-# Backfills task_usage_hourly from all historical task_usage rows and stamps
-# the rollup watermark. Idempotent — safe to re-run.
-docker compose -f docker-compose.selfhost.yml exec backend \
-  ./backfill_task_usage_hourly --sleep-between-slices=2s
-```
+External schedulers — **`pg_cron` registered on the database, an external cron job, a systemd timer, or a Kubernetes `CronJob`** — that call `SELECT rollup_task_usage_hourly()` directly were the only option before MUL-2957 and remain a supported compatibility path. They are no longer the recommended setup; new deployments should rely on the in-process scheduler instead. The SQL function holds advisory lock 4246 internally, so the in-process scheduler and any pre-existing external schedule can coexist without ever double-writing the rollup.

-On a database with years of data this can scan tens of millions of rows; `--sleep-between-slices=2s` throttles the read pressure. Use `--months-back N` (plus `--force-partial`) if you only want the last N months. Once it finishes, set up Option A or Option B so new buckets keep flowing.
+If you already have a `pg_cron` job in production, the safe sequence to retire it is:

-After upgrading, re-run `migrate up` (or restart the backend container — migrations run automatically on startup) to apply migration `103` cleanly.
+1. Confirm the in-process scheduler is healthy on at least one backend replica — recent SUCCESS rows should be landing in `sys_cron_executions` for `rollup_task_usage_hourly`:
+
+   ```sql
+   SELECT plan_time, status, runner_id, finished_at
+     FROM sys_cron_executions
+    WHERE job_name = 'rollup_task_usage_hourly'
+      AND status = 'SUCCESS'
+    ORDER BY plan_time DESC
+    LIMIT 5;
+   ```
+
+2. Once SUCCESS rows are arriving on schedule, unschedule the redundant `pg_cron` entry:
+
+   ```sql
+   SELECT cron.unschedule('rollup_task_usage_hourly')
+     FROM cron.job WHERE jobname = 'rollup_task_usage_hourly';
+   ```
+
+3. Leave the `pg_cron` extension itself installed unless you are sure no other workload depends on it. The bundled `pgvector/pgvector:pg17` image does **not** ship `pg_cron`, so nothing in Multica's default install needs it; uninstalling `pg_cron` from a custom image that other workloads still use is a separate decision.
+
+External cron / systemd timer / Kubernetes `CronJob` setups that call `SELECT rollup_task_usage_hourly()` directly can be retired the same way — once `sys_cron_executions` shows steady SUCCESS rows from the in-process scheduler, the external job is redundant and can be removed.

 ## Stopping Services

@@ -431,7 +427,7 @@ docker compose -f docker-compose.selfhost.yml up -d
 Pin `MULTICA_IMAGE_TAG` in `.env` to an exact version like `v0.2.4` if you want to stay on a specific release. Migrations run automatically on backend startup.
 If the selected GHCR tag has not been published yet, fall back to `make selfhost-build` or `docker compose -f docker-compose.selfhost.yml -f docker-compose.selfhost.build.yml up -d --build`.

-> **Upgrading from `v0.3.4` to `v0.3.5+` fails with `refusing to drop legacy daily rollups: ...`?** That's migration `103`'s fail-closed guard: it requires `task_usage_hourly` to be seeded before the legacy daily rollups are dropped. Run `backfill_task_usage_hourly` first, then re-run the upgrade. Full instructions in [Usage Dashboard Rollup → Option C](#option-c--backfill-history-first-then-schedule).
+> **Upgrading from `v0.3.4` to `v0.3.5+` fails with `refusing to drop legacy daily rollups: ...`?** That's migration `103`'s fail-closed guard: it requires `task_usage_hourly` to be seeded before the legacy daily rollups are dropped. As of MUL-2957 `migrate up` runs that backfill automatically right before applying `103`, so the upgrade completes in a single invocation. If you are still on a pre-MUL-2957 binary or the auto-hook fails, run `backfill_task_usage_hourly` manually first, then re-run the upgrade. Full instructions in [Advanced Configuration → Usage Dashboard Rollup](SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup).

 ---

--- a/SELF_HOSTING_ADVANCED.md
+++ b/SELF_HOSTING_ADVANCED.md
@@ -46,6 +46,7 @@ Use this option when your deployment cannot reach the public internet or you alr
 | `SMTP_PASSWORD` | SMTP password | - |
 | `SMTP_TLS` | TLS mode. `implicit` (aliases `smtps`, `ssl`) forces SMTPS on connect; port `465` auto-enables it. Unset / `starttls` upgrades via STARTTLS | `starttls` |
 | `SMTP_TLS_INSECURE` | Set `true` to skip TLS certificate verification (self-signed / private CA certs) | `false` |
+| `SMTP_EHLO_NAME` | EHLO/HELO name announced to the relay. Set a real FQDN when a strict relay (e.g. Google Workspace) rejects the default greeting from a public IP | machine hostname |

 STARTTLS is used automatically when advertised by the server. Port 465 (SMTPS / implicit TLS) is supported and auto-enables implicit TLS; set `SMTP_TLS=implicit` (aliases `smtps`, `ssl`) to force it on a non-standard port.

@@ -93,6 +94,8 @@ For file uploads and attachments, configure S3 and (optionally) CloudFront:
 | `S3_REGION` | AWS region (default: `us-west-2`). Must match the bucket's actual region — used for both SDK signing and public URLs |
 | `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` | Static credentials. When both are unset, the AWS SDK default credential chain is used |
 | `AWS_ENDPOINT_URL` | Custom S3-compatible endpoint (e.g. MinIO, R2, B2). Setting this switches to path-style URLs |
+| `ATTACHMENT_DOWNLOAD_MODE` | Attachment download behavior: `auto` (default), `cloudfront`, `presign`, or `proxy`. Use `proxy` for private buckets behind Docker/VPC-only endpoints such as `http://rustfs:9000` |
+| `ATTACHMENT_DOWNLOAD_URL_TTL` | TTL for CloudFront signed URLs and S3 presigned download URLs (default: `30m`) |
 | `CLOUDFRONT_DOMAIN` | CloudFront distribution domain — when set, public URLs use this host instead of the S3 host |
 | `CLOUDFRONT_KEY_PAIR_ID` | CloudFront key pair ID for signed URLs |
 | `CLOUDFRONT_PRIVATE_KEY` | CloudFront private key (PEM format) |
@@ -112,7 +115,7 @@ The `Secure` flag on session cookies is derived automatically from the scheme of
 | `PORT` | `8080` | Backend server port |
 | `METRICS_ADDR` | empty | Optional Prometheus metrics listener, for example `127.0.0.1:9090` |
 | `FRONTEND_PORT` | `3000` | Frontend port |
-| `CORS_ALLOWED_ORIGINS` | Value of `FRONTEND_ORIGIN` | Comma-separated list of allowed origins |
+| `CORS_ALLOWED_ORIGINS` | Value of `FRONTEND_ORIGIN` | Comma-separated list of allowed origins. Governs **both** the HTTP CORS allowlist **and** the WebSocket `Origin` check. A browser origin that isn't listed here (and isn't `localhost`) has its real-time WebSocket upgrade rejected with `403`, so live updates stop working until a manual refresh. |
 | `LOG_LEVEL` | `info` | Log level: `debug`, `info`, `warn`, `error` |

 ### CLI / Daemon
@@ -181,74 +184,35 @@ cd server && go run ./cmd/migrate up

 ## Usage Dashboard Rollup

-The Usage and Runtime dashboards read from `task_usage_hourly`, a derived table populated by `rollup_task_usage_hourly()`. The function is **not** scheduled out of the box on the default self-host stack: the bundled `pgvector/pgvector:pg17` image ships without `pg_cron`, and the backend does not run the rollup in-process either. Until something calls it on a schedule, raw `task_usage` rows will keep arriving while the dashboard stays at zero.
+The Usage and Runtime dashboards read from `task_usage_hourly`, a derived table populated by `rollup_task_usage_hourly()`. As of MUL-2957 the backend runs this rollup **in-process** on every replica via a DB-backed scheduler (`sys_cron_executions`); a fresh self-host install needs no operator action — the bundled `pgvector/pgvector:pg17` image works without changes.

-Pick one of the supported paths:
+### How the in-process scheduler works

-### Option A — External cron / systemd-timer
-
-The simplest path. Schedule `SELECT rollup_task_usage_hourly()` every five minutes from any out-of-band timer (host crontab, systemd timer, sidecar container, Kubernetes CronJob). It is idempotent and watermark-driven — overlapping runs are no-ops on an internal advisory lock, and a missed tick catches up on the next run.
-
-Docker Compose:
-
-```bash
-# /etc/cron.d/multica-rollup
-*/5 * * * * root docker compose -f /path/to/multica/docker-compose.selfhost.yml \
-  exec -T postgres psql -U multica -d multica \
-  -c "SELECT rollup_task_usage_hourly();" >/dev/null
-```
-
-Kubernetes (one-off `CronJob`):
-
-```yaml
-apiVersion: batch/v1
-kind: CronJob
-metadata:
-  name: multica-usage-rollup
-spec:
-  schedule: "*/5 * * * *"
-  concurrencyPolicy: Forbid
-  jobTemplate:
-    spec:
-      template:
-        spec:
-          restartPolicy: OnFailure
-          containers:
-            - name: psql
-              image: postgres:17-alpine
-              command:
-                - psql
-                - "$(DATABASE_URL)"
-                - -c
-                - "SELECT rollup_task_usage_hourly();"
-              env:
-                - name: DATABASE_URL
-                  valueFrom:
-                    secretKeyRef:
-                      name: multica-secrets
-                      key: DATABASE_URL
-```
-
-### Option B — Postgres with `pg_cron`
-
-If you'd rather have Postgres schedule itself, swap the bundled image for one that ships both `pgvector` and `pg_cron` (e.g. `supabase/postgres`, or a custom build of `pgvector/pgvector` with `pg_cron` added). `pg_cron` requires `shared_preload_libraries=pg_cron` in `postgresql.conf`, which only takes effect on Postgres restart — set it before bringing the container up.
-
-Then register the job once:
+Every backend replica ticks every 30 seconds and tries to claim the current 5-minute UTC plan in `sys_cron_executions`. The unique key `(job_name, scope_kind, scope_id, plan_time)` makes the claim a single-winner contest across all replicas, so multi-instance deployments do not double-write. The handler then calls `SELECT rollup_task_usage_hourly()`; the SQL function holds advisory lock `4246` internally, so a stray `pg_cron` job or manual call can run alongside the scheduler without ever colliding on the rollup itself. Inspect the audit table for steady-state operation:

 ```sql
-CREATE EXTENSION IF NOT EXISTS pg_cron;
-SELECT cron.schedule(
-  'rollup_task_usage_hourly',
-  '*/5 * * * *',
-  $$SELECT rollup_task_usage_hourly()$$
-);
+SELECT plan_time, status, attempt, runner_id,
+       error_code, error_msg, started_at, finished_at
+  FROM sys_cron_executions
+ WHERE job_name = 'rollup_task_usage_hourly'
+ ORDER BY plan_time DESC
+ LIMIT 20;
 ```

-`pg_cron.database_name` defaults to `postgres`; if your Multica database has a different name, point `pg_cron` at it via that GUC or run `cron.schedule_in_database(...)` instead.
+### Compatibility — existing `pg_cron` registrations

-### Option C — Backfill historical data first
+If you previously registered the rollup as a `pg_cron` job (`SELECT cron.schedule('rollup_task_usage_hourly', '*/5 * * * *', …)`), it is safe to leave it in place: advisory lock 4246 prevents double-writes, and the loser path no-ops cleanly. To drop the redundant entry once the in-process scheduler is up:

-`rollup_task_usage_hourly()` only processes new buckets after it starts running. If you already have `task_usage` rows from before the rollup was scheduled — most commonly when upgrading from `v0.3.4` to `v0.3.5+`, or on a fresh install that has been collecting usage for a while — run `backfill_task_usage_hourly` once to seed historical buckets, then set up Option A or Option B for ongoing rollups.
+```sql
+SELECT cron.unschedule('rollup_task_usage_hourly')
+  FROM cron.job WHERE jobname = 'rollup_task_usage_hourly';
+```
+
+External cron / systemd / Kubernetes `CronJob` setups that call `SELECT rollup_task_usage_hourly()` directly are also still valid — they were the only option before MUL-2957 and remain a supported compatibility path. They are no longer the recommended setup; new deployments should rely on the in-process scheduler.
+
+### Standalone backfill command
+
+`rollup_task_usage_hourly()` only processes new buckets after it starts running. If you already have `task_usage` rows that predate the first rollup run (most commonly when upgrading from `v0.3.4` to `v0.3.5+` on a database that already has months of usage), you can run `backfill_task_usage_hourly` to seed the historical buckets:

 ```bash
 # Docker Compose
@@ -260,7 +224,7 @@ kubectl -n multica exec deploy/multica-backend -- \
  ./backfill_task_usage_hourly --sleep-between-slices=2s
 ```

-The command walks `task_usage`'s full time range in monthly slices and calls the same idempotent primitive the cron path uses, so it's safe to re-run, to interrupt with Ctrl-C, and to run concurrently with an already-scheduled rollup. Flags:
+The command walks `task_usage`'s full time range in monthly slices and calls the same idempotent primitive the in-process scheduler uses, so it's safe to re-run, to interrupt with Ctrl-C, and to run concurrently with the scheduler (advisory lock 4246 serialises them). Flags:

 | Flag | Description |
 |---|---|
@@ -272,17 +236,9 @@ After backfill completes, the rollup-state watermark is stamped to `now() - 5 mi

 ### `v0.3.4 → v0.3.5+` upgrade order

-Migration `103` adds a fail-closed guard that refuses to drop the legacy daily rollups until `task_usage_hourly` has caught up. If you run `migrate up` straight through on a database with existing `task_usage` rows, it aborts with:
+Migration `103` adds a fail-closed guard that refuses to drop the legacy daily rollups until `task_usage_hourly` has caught up. As of MUL-2957 the migrate command runs an idempotent monthly-slice backfill (under advisory lock 4246) **automatically** immediately before applying migration `103`, so v0.3.4 → v0.3.5+ upgrades complete in a single `migrate up` invocation — no operator step is required.

-```text
-ERROR: refusing to drop legacy daily rollups:
-  task_usage_hourly_rollup_state.watermark_at (1970-01-01 ...) trails
-  task_usage latest event (...) by more than 01:00:00 — backfill is
-  incomplete or pg_cron is not running. Run cmd/backfill_task_usage_hourly
-  (and let pg_cron catch up) before re-running migrate
-```
-
-Recovery is straightforward: run `backfill_task_usage_hourly` (Option C above), then re-run `migrate up` (or restart the backend container — migrations run automatically on startup). **Fresh installs are exempt** — the guard short-circuits when `task_usage` is empty, and migrations succeed, but the dashboard will still stay at zero until you set up Option A or Option B.
+If you are upgrading from a binary that pre-dates MUL-2957 (or the auto-hook fails for an environmental reason), recovery is the manual path: run `backfill_task_usage_hourly` against the database, then re-run `migrate up` (or restart the backend container — migrations run automatically on startup). **Fresh installs are exempt** — the guard short-circuits when `task_usage` is empty, and the in-process scheduler picks up new buckets from the first tick.

 ## Manual Setup (Without Docker Compose)

@@ -337,6 +293,8 @@ multica.example.com {
 }
 ```

+> Even on a single domain, set `FRONTEND_ORIGIN` / `CORS_ALLOWED_ORIGINS` to that public origin (e.g. `https://multica.example.com`) on the backend. The backend's default origin allowlist is `localhost` only, so without this it rejects the WebSocket upgrade from the public URL with `403` and real-time updates silently stop working. See [LAN / Non-localhost Access](#lan--non-localhost-access).
+
 **Separate-domain layout** — frontend and backend on different hostnames:

 ```
@@ -456,6 +414,8 @@ HTTP requests (issues, comments, uploads) work on LAN out of the box — Next.js

   `NEXT_PUBLIC_WS_URL` is a build-time variable (see `Dockerfile.web`), so setting it only in `environment:` on the pre-built image has no effect — you must use the `selfhost.build.yml` override that rebuilds the image.

+**Also required: allowlist the browser origin.** The two options above fix the WebSocket *upgrade proxying*, but a second, independent setting gates the connection: the backend validates the WebSocket `Origin` header against an allowlist that defaults to `localhost` only. When you open Multica from any other origin — a LAN IP **or a public domain behind a reverse proxy** — set `CORS_ALLOWED_ORIGINS` (or `FRONTEND_ORIGIN`) on the backend to that exact origin and restart, exactly as shown under [LAN / Non-localhost Access](#lan--non-localhost-access) above. Otherwise the upgrade is refused with `403`: the backend logs `websocket: request origin not allowed by Upgrader.CheckOrigin` and the browser console loops `disconnected, reconnecting in 3s`, while HTTP requests (and manual page refreshes) keep working because they are same-origin to the page. The single value covers both HTTP CORS and the WebSocket origin check.
+
 > **Note:** If you need to hard-code a different public API / WebSocket endpoint into the web image for any other reason, use the same source-build override: `docker compose -f docker-compose.selfhost.yml -f docker-compose.selfhost.build.yml up -d --build`.

 ## Health Check
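
Reviewer note on the origin allowlist documented in this file: the rule can be sketched roughly as below. This is a TypeScript illustration only. The backend is Go, and its real check is the `Upgrader.CheckOrigin` named in the log line. `parseAllowedOrigins` and `isOriginAllowed` are hypothetical names, and the exact-match comparison, the trailing-slash trimming, and the any-port `localhost` rule are assumptions read off the doc text, not taken from the server source.

```typescript
// Hypothetical sketch of the documented rule; not the backend's actual code.

// CORS_ALLOWED_ORIGINS is a comma-separated list that falls back to
// FRONTEND_ORIGIN when unset (per the environment-variable table).
function parseAllowedOrigins(
  corsAllowedOrigins: string | undefined,
  frontendOrigin: string | undefined,
): string[] {
  const raw =
    corsAllowedOrigins !== undefined && corsAllowedOrigins.trim() !== ""
      ? corsAllowedOrigins
      : (frontendOrigin ?? "");
  return raw
    .split(",")
    .map((entry) => entry.trim().replace(/\/+$/, ""))
    .filter((entry) => entry.length > 0);
}

// Assumption: localhost is accepted on any port; every other origin must
// match an allowlist entry exactly (scheme + host + port).
function isOriginAllowed(origin: string, allowed: readonly string[]): boolean {
  if (/^https?:\/\/localhost(:\d+)?$/i.test(origin)) return true;
  return allowed.includes(origin);
}

const allowed = parseAllowedOrigins(
  "https://multica.example.com, http://192.168.1.10:3000",
  undefined,
);
console.log(isOriginAllowed("https://multica.example.com", allowed)); // true
console.log(isOriginAllowed("https://other.example.com", allowed)); // false
console.log(isOriginAllowed("http://localhost:3000", [])); // true
```

If matching is exact as assumed here, `https://multica.example.com` and `https://multica.example.com:8443` count as different origins, so the value to configure is whatever the browser address bar shows, including any non-default port.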
--- a/apps/desktop/src/main/context-menu.test.ts
+++ b/apps/desktop/src/main/context-menu.test.ts
@@ -0,0 +1,221 @@
+import { describe, expect, it, vi, beforeEach } from "vitest";
+
+// Capture every MenuItem the SUT constructs so each test can assert
+// on the menu that would appear at popup time without booting an
+// actual Electron window. State is created via `vi.hoisted` because
+// `vi.mock` factories are hoisted above all top-level statements; a
+// plain `const` would be a TDZ ReferenceError when the factory runs.
+type CapturedMenuItem = {
+  label?: string;
+  role?: string;
+  type?: string;
+  click?: () => void;
+};
+const ctx = vi.hoisted(() => ({
+  capturedItems: [] as CapturedMenuItem[][],
+  browserWindowFromWebContents: vi.fn(),
+  popupSpy: vi.fn(),
+  clipboardWriteText: vi.fn(),
+  openExternalSpy: vi.fn().mockResolvedValue(undefined),
+  preferredLanguagesRef: { current: ["en-US"] as string[] },
+}));
+
+vi.mock("electron", () => {
+  class MockMenu {
+    items: CapturedMenuItem[] = [];
+    constructor() {
+      ctx.capturedItems.push(this.items);
+    }
+    append(item: CapturedMenuItem) {
+      this.items.push(item);
+    }
+    popup(opts: unknown) {
+      ctx.popupSpy(opts);
+    }
+  }
+  class MockMenuItem {
+    label?: string;
+    role?: string;
+    type?: string;
+    click?: () => void;
+    constructor(opts: CapturedMenuItem) {
+      Object.assign(this, opts);
+    }
+  }
+  return {
+    BrowserWindow: { fromWebContents: ctx.browserWindowFromWebContents },
+    Menu: MockMenu,
+    MenuItem: MockMenuItem,
+    app: {
+      getPreferredSystemLanguages: () => ctx.preferredLanguagesRef.current,
+    },
+    clipboard: { writeText: ctx.clipboardWriteText },
+    shell: { openExternal: ctx.openExternalSpy },
+  };
+});
+
+import { installContextMenu } from "./context-menu";
+
+type ContextMenuParams = {
+  selectionText: string;
+  isEditable: boolean;
+  linkURL: string;
+  editFlags: {
+    canCut: boolean;
+    canCopy: boolean;
+    canPaste: boolean;
+    canSelectAll: boolean;
+  };
+};
+
+type Listener = (event: unknown, params: ContextMenuParams) => void;
+
+// Tiny WebContents stub: we only need the `.on("context-menu", ...)`
+// hook the SUT installs and a `fire` helper that replays an event to
+// the registered listeners. The Electron surface the SUT touches
+// (clipboard, shell, preferred-language lookup) is mocked above.
+function makeWebContents() {
+  const handlers: Listener[] = [];
+  return {
+    on(event: string, fn: Listener) {
+      if (event === "context-menu") handlers.push(fn);
+    },
+    fire(params: ContextMenuParams) {
+      for (const h of handlers) h({}, params);
+    },
+  };
+}
+
+const baseEditFlags = {
+  canCut: false,
+  canCopy: false,
+  canPaste: false,
+  canSelectAll: false,
+};
+
+describe("installContextMenu — link items", () => {
+  beforeEach(() => {
+    ctx.capturedItems.length = 0;
+    ctx.popupSpy.mockClear();
+    ctx.clipboardWriteText.mockClear();
+    ctx.openExternalSpy.mockClear();
+    ctx.browserWindowFromWebContents.mockReset();
+    ctx.preferredLanguagesRef.current = ["en-US"];
+  });
+
+  it("adds 'Open Link in Browser' and 'Copy Link Address' when right-clicking an http(s) link", () => {
+    // The link case is the one this test file is here to cover —
+    // before MUL-3083 follow-up, right-clicking an <a> in the
+    // renderer only surfaced 'copy' (when the user happened to have
+    // text selected) and gave no way to open the URL externally.
+    const wc = makeWebContents();
+    installContextMenu(wc as never);
+    wc.fire(
+      baseSelection({ linkURL: "https://multica.ai/welcome" }),
+    );
+
+    const labels = lastMenuLabels();
+    expect(labels).toContain("Open Link in Browser");
+    expect(labels).toContain("Copy Link Address");
+
+    // The two click handlers must route to the existing
+    // openExternalSafely allowlist + clipboard.writeText.
+    invokeByLabel("Open Link in Browser");
+    expect(ctx.openExternalSpy).toHaveBeenCalledWith("https://multica.ai/welcome");
+
+    invokeByLabel("Copy Link Address");
+    expect(ctx.clipboardWriteText).toHaveBeenCalledWith(
+      "https://multica.ai/welcome",
+    );
+    expect(ctx.popupSpy).toHaveBeenCalledTimes(1);
+  });
+
+  it("does NOT add link items when the cursor is over a non-http(s) URL", () => {
+    // Only http(s) links are surfaced — we don't promise anything for
+    // mailto:, javascript:, custom app schemes, etc. Surfacing them
+    // would shell out via openExternalSafely (which would block the
+    // call anyway) or write a non-URL string to the clipboard, both
+    // of which violate user expectations for a "link" item.
+    const wc = makeWebContents();
+    installContextMenu(wc as never);
+    wc.fire(baseSelection({ linkURL: "javascript:alert(1)" }));
+    const labels = lastMenuLabelsOrEmpty();
+    expect(labels).not.toContain("Open Link in Browser");
+    expect(labels).not.toContain("Copy Link Address");
+  });
+
+  it("does NOT add link items when there is no link under the cursor", () => {
+    const wc = makeWebContents();
+    installContextMenu(wc as never);
+    wc.fire({
+      selectionText: "hello",
+      isEditable: false,
+      linkURL: "",
+      editFlags: { ...baseEditFlags, canCopy: true },
+    });
+    const labels = lastMenuLabelsOrEmpty();
+    expect(labels).not.toContain("Open Link in Browser");
+    // Selection-only context still surfaces copy as before — guards
+    // against a regression where adding the link branch broke the
+    // base path.
+    expect(menuItemRoles()).toContain("copy");
+  });
+
+  it("uses zh-Hans labels when the OS preferred language is Chinese", () => {
+    // Locale fallback is intentionally permissive: every zh-* variant
+    // routes to zh-Hans so users on zh-CN / zh-TW / zh-HK still see
+    // Chinese rather than dropping to English. The renderer ships only
+    // zh-Hans translations, so this matches the rest of the app.
+    ctx.preferredLanguagesRef.current = ["zh-CN"];
+    const wc = makeWebContents();
+    installContextMenu(wc as never);
+    wc.fire(baseSelection({ linkURL: "https://multica.ai" }));
+    expect(lastMenuLabels()).toContain("在浏览器中打开链接");
+    expect(lastMenuLabels()).toContain("复制链接地址");
+  });
+
+  it("falls back to English when the OS preferred language is something we don't ship", () => {
+    ctx.preferredLanguagesRef.current = ["fr-FR"];
+    const wc = makeWebContents();
+    installContextMenu(wc as never);
+    wc.fire(baseSelection({ linkURL: "https://multica.ai" }));
+    expect(lastMenuLabels()).toContain("Open Link in Browser");
+  });
+});
+
+// --- helpers ---
+
+function baseSelection(over: Partial<ContextMenuParams>): ContextMenuParams {
+  return {
+    selectionText: "",
+    isEditable: false,
+    linkURL: "",
+    editFlags: { ...baseEditFlags },
+    ...over,
+  };
+}
+
+function lastMenu(): CapturedMenuItem[] {
+  const last = ctx.capturedItems[ctx.capturedItems.length - 1];
+  if (!last) throw new Error("no menu was constructed");
+  return last;
+}
+
+function lastMenuLabelsOrEmpty(): string[] {
+  const last = ctx.capturedItems[ctx.capturedItems.length - 1] ?? [];
+  return last.map((i) => i.label ?? "");
+}
+
+function lastMenuLabels(): string[] {
+  return lastMenu().map((i) => i.label ?? "");
+}
+
+function menuItemRoles(): string[] {
+  return lastMenu().map((i) => i.role ?? "");
+}
+
+function invokeByLabel(label: string): void {
+  const item = lastMenu().find((i) => i.label === label);
+  if (!item) throw new Error(`menu item not found: ${label}`);
+  item.click?.();
+}
--- a/apps/desktop/src/main/context-menu.ts
+++ b/apps/desktop/src/main/context-menu.ts
@@ -1,12 +1,38 @@
-import { BrowserWindow, Menu, MenuItem, type WebContents } from "electron";
+import {
+  BrowserWindow,
+  Menu,
+  MenuItem,
+  app,
+  clipboard,
+  type WebContents,
+} from "electron";
+import { isSafeExternalHttpUrl, openExternalSafely } from "./external-url";

 // Electron ships with no default right-click menu, so a user selecting text
 // in the renderer has no way to copy it. Mirror Chrome's minimal clipboard
 // menu using `roles`, which keeps i18n + accelerator handling native.
+//
+// Custom (non-role) link items below are NOT auto-localized by Electron —
+// roles like "copy" pull labels from the OS, but a custom MenuItem only
+// shows the `label` you give it. We translate by OS-preferred language so
+// the link items at least track Chinese / Japanese / Korean speakers
+// alongside the English default; everything else falls through to English,
+// which matches Chrome's behavior on those locales without app-level
+// translation files.
 export function installContextMenu(webContents: WebContents): void {
  webContents.on("context-menu", (_event, params) => {
-    const { editFlags, selectionText, isEditable } = params;
+    const { editFlags, selectionText, isEditable, linkURL } = params;
    const hasSelection = selectionText.trim().length > 0;
+    // params.linkURL is the resolved absolute URL of the anchor under the
+    // cursor; Electron normalizes relative hrefs against the page URL for
+    // us, so we only need to gate on the http(s) scheme allowlist
+    // (mirrors openExternalSafely + the renderer's <a> usage). Empty for
+    // non-link right-clicks; other schemes (mailto:, javascript:, custom
+    // app schemes) are intentionally not surfaced — opening them via
+    // shell.openExternal would route through the OS handler and is
+    // outside what this menu promises.
+    const linkIsHttpUrl = !!linkURL && isSafeExternalHttpUrl(linkURL);
+    const labels = pickLabels();

    const menu = new Menu();

@@ -26,8 +52,87 @@ export function installContextMenu(webContents: WebContents): void {
      menu.append(new MenuItem({ role: "selectAll" }));
    }

+    // Link items — only when the cursor is over an actual http(s) <a>.
+    // Without these the renderer's <a target="_blank"> gives users no
+    // standard right-click affordance ("Open in new window", "Copy link
+    // address"); the default click handler does forward to
+    // setWindowOpenHandler → openExternalSafely, but discoverability via
+    // the keyboard / mouse context menu was missing.
+    if (linkIsHttpUrl) {
+      if (menu.items.length > 0) {
+        menu.append(new MenuItem({ type: "separator" }));
+      }
+      menu.append(
+        new MenuItem({
+          label: labels.openLink,
+          click: () => {
+            // openExternalSafely re-validates the scheme — defense in
+            // depth in case Electron ever surfaces a non-http linkURL
+            // we forgot to filter at this layer.
+            void openExternalSafely(linkURL);
+          },
+        }),
+      );
+      menu.append(
+        new MenuItem({
+          label: labels.copyLinkAddress,
+          click: () => {
+            clipboard.writeText(linkURL);
+          },
+        }),
+      );
+    }
+
    if (menu.items.length === 0) return;
    const window = BrowserWindow.fromWebContents(webContents) ?? undefined;
    menu.popup({ window });
  });
 }
+
+// Labels for the two link-related menu items in the user's OS-preferred
+// language, with English as the fallback. Kept inline because the main
+// process has no shared i18n loader (the renderer's i18next is per-window
+// and not reachable from here), and pulling one in for two strings would
+// be more rope than payload. Matches the four locales the renderer ships.
+type ContextMenuLabels = {
+  openLink: string;
+  copyLinkAddress: string;
+};
+
+const labelsByLocale: Record<string, ContextMenuLabels> = {
+  en: {
+    openLink: "Open Link in Browser",
+    copyLinkAddress: "Copy Link Address",
+  },
+  "zh-Hans": {
+    openLink: "在浏览器中打开链接",
+    copyLinkAddress: "复制链接地址",
+  },
+  ja: {
+    openLink: "ブラウザでリンクを開く",
+    copyLinkAddress: "リンクのアドレスをコピー",
+  },
+  ko: {
+    openLink: "브라우저에서 링크 열기",
+    copyLinkAddress: "링크 주소 복사",
+  },
+};
+
+// pickLabels resolves the OS-preferred language to one of the four
+// locales we ship copy for. We say "Open Link in Browser" rather than
+// "Open Link in New Window" because the link is opened via
+// shell.openExternal — it lands in the user's default browser, not in
+// another Multica window — so the wording matches what actually
+// happens.
+function pickLabels(): ContextMenuLabels {
+  const preferred = app.getPreferredSystemLanguages()[0]?.toLowerCase() ?? "";
+  if (preferred.startsWith("zh")) {
+    // All Chinese variants get the Simplified copy — Multica only
+    // ships zh-Hans, and zh-Hant users falling through to en would be
+    // worse than reading Simplified Chinese.
+    return labelsByLocale["zh-Hans"];
+  }
+  if (preferred.startsWith("ja")) return labelsByLocale.ja;
+  if (preferred.startsWith("ko")) return labelsByLocale.ko;
+  return labelsByLocale.en;
+}
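
Reviewer note: the fallback in `pickLabels` can be exercised without Electron if the language list is passed in as an argument. A minimal sketch under that assumption follows. `resolveLabelLocale` and `LabelLocale` are hypothetical names that do not exist in this diff; the logic mirrors `pickLabels` (first preferred language only, lowercased prefix match).

```typescript
// Hypothetical pure variant of pickLabels' locale resolution.
type LabelLocale = "en" | "zh-Hans" | "ja" | "ko";

function resolveLabelLocale(preferredLanguages: readonly string[]): LabelLocale {
  // Like pickLabels, only the first (most preferred) entry is consulted.
  const preferred = preferredLanguages[0]?.toLowerCase() ?? "";
  if (preferred.startsWith("zh")) return "zh-Hans"; // every zh-* variant
  if (preferred.startsWith("ja")) return "ja";
  if (preferred.startsWith("ko")) return "ko";
  return "en";
}

console.log(resolveLabelLocale(["zh-TW", "en-US"])); // zh-Hans
console.log(resolveLabelLocale(["ja-JP"])); // ja
console.log(resolveLabelLocale(["fr-FR", "ko-KR"])); // en (second entry is ignored)
console.log(resolveLabelLocale([])); // en
```

The third call shows a consequence of reading only index 0: a user whose list is `fr-FR, ko-KR` gets English labels, because later entries are never consulted.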
--- a/apps/desktop/src/main/daemon-auth-probe.test.ts
+++ b/apps/desktop/src/main/daemon-auth-probe.test.ts
@@ -0,0 +1,56 @@
+import { describe, expect, it } from "vitest";
+
+import { classifyAuthProbe, isAuthStatusError } from "./daemon-auth-probe";
+
+describe("classifyAuthProbe", () => {
+  it("treats a 401 as expired login", () => {
+    expect(classifyAuthProbe({ status: 401 })).toBe("auth_expired");
+  });
+
+  it("treats a missing token as expired login", () => {
+    expect(classifyAuthProbe({ noToken: true })).toBe("auth_expired");
+  });
+
+  it("treats a 2xx as a valid token (failure is non-auth)", () => {
+    expect(classifyAuthProbe({ status: 200 })).toBe("ok");
+    expect(classifyAuthProbe({ status: 204 })).toBe("ok");
+  });
+
+  // The headline guard: a network failure must never be reported as an auth
+  // problem — the daemon is just as unreachable for non-auth reasons.
+  it("does NOT classify a network error as expired login", () => {
+    expect(classifyAuthProbe({ networkError: true })).toBe("unknown");
+  });
+
+  it("leaves 5xx and other statuses inconclusive", () => {
+    expect(classifyAuthProbe({ status: 500 })).toBe("unknown");
+    expect(classifyAuthProbe({ status: 503 })).toBe("unknown");
+    expect(classifyAuthProbe({ status: 403 })).toBe("unknown");
+  });
+
+  it("is inconclusive when nothing is known", () => {
+    expect(classifyAuthProbe({})).toBe("unknown");
+  });
+});
+
+describe("isAuthStatusError", () => {
+  it("is true only for a 401-tagged error (session token is dead)", () => {
+    expect(isAuthStatusError(Object.assign(new Error("x"), { status: 401 }))).toBe(
+      true,
+    );
+  });
+
+  // The reviewer's must-fix: transient failures must NOT be treated as auth
+  // failures (which would log the user out). A 5xx mint, a thrown fetch, a
+  // file-write error — none carry status 401.
+  it("is false for transient / non-401 failures", () => {
+    expect(isAuthStatusError(Object.assign(new Error("x"), { status: 503 }))).toBe(
+      false,
+    );
+    expect(isAuthStatusError(new Error("network down"))).toBe(false);
+    expect(isAuthStatusError(new Error("EACCES: write failed"))).toBe(false);
+    expect(isAuthStatusError(undefined)).toBe(false);
+    expect(isAuthStatusError(null)).toBe(false);
+    expect(isAuthStatusError("401")).toBe(false);
+  });
+});
--- a/apps/desktop/src/main/daemon-auth-probe.ts
+++ b/apps/desktop/src/main/daemon-auth-probe.ts
@@ -0,0 +1,58 @@
+/**
+ * Pure classification for the daemon auth probe. Kept free of Electron imports
+ * so it can be unit-tested in jsdom.
+ *
+ * When the local daemon fails to reach "running" shortly after a start, the
+ * main process probes the daemon's token against the backend (GET /api/me) to
+ * tell "the daemon can't authenticate" apart from "the daemon is slow / the
+ * network is down / it crashed for another reason". Misclassifying a network
+ * blip as an auth failure would be worse than the original silent-Starting bug,
+ * so the rules below are deliberately conservative: only an explicit 401 (or a
+ * missing credential) is treated as auth-expired.
+ */
+
+export interface AuthProbeOutcome {
+  /** HTTP status code returned by the probe request, if one completed. */
+  status?: number;
+  /** The daemon profile has no token at all — there is nothing to validate. */
+  noToken?: boolean;
+  /** The probe request threw (timeout, connection refused, DNS, TLS). */
+  networkError?: boolean;
+}
+
+export type AuthProbeResult = "auth_expired" | "ok" | "unknown";
+
+/**
+ * Whether an error represents a genuine auth rejection (HTTP 401) as opposed to
+ * a transient failure (5xx, network, local I/O). Used by the re-authenticate
+ * flow so that only a real 401 — the session token itself is dead — forces a
+ * full re-login; transient failures keep the user signed in to retry.
+ *
+ * `mintPat` attaches the response status to the error it throws, so a 401
+ * surfaces here as `{ status: 401 }`. Everything else (no status, 5xx, a thrown
+ * fetch, a file-write error) is treated as non-auth.
+ */
+export function isAuthStatusError(err: unknown): boolean {
+  return (
+    typeof err === "object" &&
+    err !== null &&
+    (err as { status?: unknown }).status === 401
+  );
+}
+
+export function classifyAuthProbe(outcome: AuthProbeOutcome): AuthProbeResult {
+  // No credential to validate → the user must sign in.
+  if (outcome.noToken) return "auth_expired";
+  // Couldn't reach the server → this is a network problem, not an auth one.
+  // Stay "unknown" so the caller keeps showing "starting"/"stopped" instead of
+  // wrongly prompting for re-login.
+  if (outcome.networkError) return "unknown";
+  // The server explicitly rejected the token.
+  if (outcome.status === 401) return "auth_expired";
+  // The token is accepted — the daemon is failing for some other reason.
+  if (outcome.status !== undefined && outcome.status >= 200 && outcome.status < 300) {
+    return "ok";
+  }
+  // 5xx and everything else are inconclusive about the token's validity.
+  return "unknown";
+}
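
Reviewer note: the order of the checks in `classifyAuthProbe` decides the verdict when an outcome carries more than one signal, a case the unit tests in this diff do not exercise. The sketch below restates the function in self-contained form (the local names `Outcome`, `Verdict`, and `classify` are stand-ins so the block runs on its own) and feeds it combined outcomes.

```typescript
// Self-contained restatement of classifyAuthProbe, for illustration only.
interface Outcome {
  status?: number;
  noToken?: boolean;
  networkError?: boolean;
}
type Verdict = "auth_expired" | "ok" | "unknown";

function classify(outcome: Outcome): Verdict {
  if (outcome.noToken) return "auth_expired";
  if (outcome.networkError) return "unknown";
  if (outcome.status === 401) return "auth_expired";
  if (outcome.status !== undefined && outcome.status >= 200 && outcome.status < 300) {
    return "ok";
  }
  return "unknown";
}

// A missing token wins over every other signal.
console.log(classify({ noToken: true, networkError: true })); // auth_expired
// A network error masks a status, even a 401.
console.log(classify({ networkError: true, status: 401 })); // unknown
// Redirects and 403 say nothing about the token's validity.
console.log(classify({ status: 302 })); // unknown
console.log(classify({ status: 403 })); // unknown
```

In the call sites visible in this diff, each outcome sets exactly one field, so the combined cases cannot arise today; the ordering matters only for future callers.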
--- a/apps/desktop/src/main/daemon-manager.ts
+++ b/apps/desktop/src/main/daemon-manager.ts
@@ -17,14 +17,37 @@ import {
 import { join } from "path";
 import { homedir, hostname } from "os";
 import type { DaemonStatus, DaemonPrefs } from "../shared/daemon-types";
+import { daemonStatusAlive } from "../shared/daemon-types";
 import { ensureManagedCli, managedCliPath } from "./cli-bootstrap";
 import { decideVersionAction } from "./version-decision";
+import {
+  daemonLifecycleUnreachable,
+  isDaemonExternallyManaged,
+  normalizeHostOS,
+} from "./daemon-os";
+import {
+  classifyAuthProbe,
+  isAuthStatusError,
+  type AuthProbeResult,
+} from "./daemon-auth-probe";

 const DEFAULT_HEALTH_PORT = 19514;
 const POLL_INTERVAL_MS = 5_000;
 const PREFS_PATH = join(homedir(), ".multica", "desktop_prefs.json");
 const LOG_TAIL_RETRY_MS = 2_000;
 const LOG_TAIL_MAX_RETRIES = 5;
+// How long a start may sit in "starting" (with no /health) before we probe the
+// token to find out whether login expired. The daemon's own startup can legitimately
+// take a while (it renews the PAT and lists workspaces before serving /health), so we
+// wait past the common case to avoid probing healthy-but-slow starts.
+const AUTH_PROBE_GRACE_MS = 10_000;
+// `multica daemon start` blocks until the daemon reports ready, polling /health
+// for up to its own startup timeout (45s in server/cmd/multica/cmd_daemon.go) to
+// cover cold-start agent-version detection. This execFile timeout MUST stay
+// above that — otherwise Electron kills the CLI supervisor mid-startup and a
+// healthy-but-slow start is misreported as a failure (the detached daemon child
+// keeps running, so the UI flashes "stopped" then "running").
+const DAEMON_START_EXEC_TIMEOUT_MS = 60_000;

 const DEFAULT_PREFS: DaemonPrefs = { autoStart: true, autoStop: false };

@@ -48,6 +71,15 @@ let pendingVersionRestart = false;
 let targetApiBaseUrl: string | null = null;
 let activeProfile: ActiveProfile | null = null;

+// Auth-probe state for the current start attempt. When a start fails to reach
+// "running", we probe the daemon's token once (after AUTH_PROBE_GRACE_MS) to
+// decide whether the cause is an expired/invalid login. `authExpired` is sticky
+// until the next start attempt or a successful /health, so the UI keeps showing
+// the re-login prompt instead of flapping back to "starting". See #3512.
+let startingSince: number | null = null;
+let authProbeDone = false;
+let authExpired = false;
+
 // Serialize all writes to any profile config file. Multiple paths
 // (syncToken, resolveActiveProfile, clearToken, watch/unwatch handlers)
 // may try to write concurrently; chaining them avoids interleaved writes
@@ -134,6 +166,8 @@ function sendStatus(status: DaemonStatus): void {
 interface HealthPayload {
  status?: string;
  pid?: number;
+  /** Daemon's runtime.GOOS. Absent on daemons older than the #3916 fix. */
+  os?: string;
  uptime?: string;
  daemon_id?: string;
  device_name?: string;
@@ -161,6 +195,36 @@ async function fetchHealthAtPort(
  }
 }

+/**
+ * Validates the daemon profile's token against the backend to find out whether
+ * a stuck start is an auth problem. Hits the same endpoint `multica auth status`
+ * uses (GET /api/me) with the exact token the daemon loads from config.json, so
+ * the verdict matches what the daemon itself would get from the server.
+ *
+ * Only the HTTP status is inspected (never the body) so a future change to the
+ * /api/me response shape can't break this — a 401 means the token is rejected,
+ * a 2xx means it's fine, and a request that throws (including the 4s abort below)
+ * means the network is the problem, not auth. See classifyAuthProbe for the full rule set.
+ */
+async function probeTokenValidity(profile: string): Promise<AuthProbeResult> {
+  if (!targetApiBaseUrl) return "unknown";
+  const cfg = await readProfileConfig(profile);
+  const token = typeof cfg.token === "string" ? cfg.token : "";
+  if (!token) return classifyAuthProbe({ noToken: true });
+  try {
+    const controller = new AbortController();
+    const timeout = setTimeout(() => controller.abort(), 4_000);
+    const res = await fetch(`${targetApiBaseUrl.replace(/\/+$/, "")}/api/me`, {
+      headers: { Authorization: `Bearer ${token}` },
+      signal: controller.signal,
+    });
+    clearTimeout(timeout);
+    return classifyAuthProbe({ status: res.status });
+  } catch {
+    return classifyAuthProbe({ networkError: true });
+  }
+}
+
 // Desktop owns a dedicated CLI profile named after the target API host, so it
 // never reads or writes the user's hand-configured profiles. Profile dir:
 //   ~/.multica/profiles/desktop-<host>/
@@ -249,12 +313,57 @@ async function fetchHealth(): Promise<DaemonStatus> {
  const data = await fetchHealthAtPort(active.port);

  if (!data || data.status !== "running") {
+    // A start that never reaches "running" is the symptom; an expired/invalid
+    // login is the most common cause and the one with no other signal (the
+    // daemon exits before it can serve /health, so we can't read the reason
+    // from it). Probe the token once per attempt, after a grace period, to
+    // surface a re-login prompt instead of spinning on "starting" forever.
+    if (
+      currentState === "starting" &&
+      !authExpired &&
+      !authProbeDone &&
+      startingSince !== null &&
+      Date.now() - startingSince >= AUTH_PROBE_GRACE_MS
+    ) {
+      authProbeDone = true;
+      if ((await probeTokenValidity(active.name)) === "auth_expired") {
+        authExpired = true;
+      }
+    }
+    // Sticky: once login is known-expired, keep reporting it (even after
+    // currentState flips away from "starting") until the next start attempt or
+    // a successful /health clears the flag.
+    if (authExpired) {
+      return { state: "auth_expired", profile: active.name };
+    }
+    // The daemon binds /health before preflight finishes and self-reports
+    // "starting" until it's ready. Trust that over our own currentState, so a
+    // daemon booting on its own — or started via the CLI — surfaces as
+    // "starting" instead of "stopped".
+    if (data?.status === "starting") {
+      return { state: "starting", profile: active.name };
+    }
    return {
      state: currentState === "starting" ? "starting" : "stopped",
      profile: active.name,
    };
  }

+  // A live, authenticated daemon clears any prior auth-failure verdict so the
+  // re-login prompt disappears once the user reconnects.
+  authExpired = false;
+  startingSince = null;
+
+  // A running daemon whose OS differs from this host's is one we can't drive
+  // via the native lifecycle CLI (e.g. Linux-in-WSL2 behind a Windows desktop,
+  // reachable only over localhost forwarding). Surface it so the UI disables
+  // the auto-start/auto-stop toggles instead of letting them silently no-op,
+  // and so before-quit skips a stop that would never land. See #3916.
+  const externallyManaged = isDaemonExternallyManaged(
+    data.os,
+    normalizeHostOS(process.platform),
+  );
+
  // Safety: if we have a target URL and the daemon on our port reports a
  // different server_url, it's not "our" daemon — drop it and re-resolve.
  if (
@@ -278,6 +387,7 @@ async function fetchHealth(): Promise<DaemonStatus> {
      : 0,
    profile: active.name,
    serverUrl: data.server_url,
+    externallyManaged,
  };
 }

@@ -464,6 +574,15 @@ async function ensureRunningDaemonVersionMatches(): Promise<
 > {
  const active = await ensureActiveProfile();
  const running = await fetchHealthAtPort(active.port);
+
+  // Don't try to version-match a daemon we can't restart (e.g. WSL2). Treat it
+  // as up-to-date — restartDaemon would no-op anyway, and skipping here avoids
+  // a misleading "restarting daemon" log on every auto-start. #3916.
+  if (isDaemonExternallyManaged(running?.os, normalizeHostOS(process.platform))) {
+    pendingVersionRestart = false;
+    return "ok";
+  }
+
  const bundled = await getCliBinaryVersion();
  const action = decideVersionAction(bundled, running);

@@ -515,7 +634,13 @@ async function mintPat(jwt: string): Promise<string> {
  });
  if (!res.ok) {
    const body = await res.text().catch(() => "");
-    throw new Error(`mint PAT failed: ${res.status} ${res.statusText} ${body}`);
+    // Attach the status so callers can tell a genuine auth rejection (401 — the
+    // session token is dead) apart from a transient failure (5xx, etc.) without
+    // string-matching the message.
+    throw Object.assign(
+      new Error(`mint PAT failed: ${res.status} ${res.statusText} ${body}`),
+      { status: res.status },
+    );
  }
  const data = (await res.json()) as { token?: unknown };
  if (typeof data.token !== "string" || !data.token.startsWith("mul_")) {
@@ -580,7 +705,10 @@ async function syncToken(
  if (userChanged) {
    try {
      const existing = await fetchHealthAtPort(active.port);
-      if (existing?.status === "running") {
+      if (daemonStatusAlive(existing?.status)) {
+        // Restart whether it's "running" or still "starting" — a booting daemon
+        // already loaded the old token at startup, so it must be restarted to
+        // pick up the rotated credentials.
        console.log(
          "[daemon] user switched — restarting daemon with new credentials",
        );
@@ -620,6 +748,52 @@ async function clearToken(): Promise<void> {
  await removeProfileUserId(active.name);
 }

+// Result of a user-initiated daemon re-authentication. The distinction matters:
+// only `session_invalid` justifies signing the user out of the whole app; a
+// `transient` failure must keep them logged in so they can retry.
+export type ReauthResult =
+  | { ok: true }
+  | { ok: false; reason: "session_invalid" }
+  | { ok: false; reason: "transient"; message: string };
+
+function errorMessage(err: unknown): string {
+  return err instanceof Error ? err.message : String(err);
+}
+
+/**
+ * Recover the local daemon from the "auth_expired" state. Drops the stale
+ * cached PAT, mints a fresh one from the current session token, and restarts
+ * the daemon so it loads the new credential.
+ *
+ * Failures are classified rather than collapsed: a 401 from the mint means the
+ * session token itself is dead (`session_invalid` → the renderer drives a full
+ * re-login); anything else — mint 5xx, a network blip, a config write error, a
+ * restart hiccup — is `transient`, leaving the user signed in so they can retry.
+ * This mirrors the conservative classification the startup probe already uses.
+ */
+async function reauthenticate(
+  token: string,
+  userId: string,
+): Promise<ReauthResult> {
+  try {
+    await clearToken();
+    // syncToken mints a fresh PAT because clearToken just removed any cache.
+    await syncToken(token, userId);
+  } catch (err) {
+    if (isAuthStatusError(err)) return { ok: false, reason: "session_invalid" };
+    return { ok: false, reason: "transient", message: errorMessage(err) };
+  }
+  const restart = await restartDaemon();
+  if (!restart.success) {
+    return {
+      ok: false,
+      reason: "transient",
+      message: restart.error ?? "failed to restart daemon",
+    };
+  }
+  return { ok: true };
+}
+
 async function withGuard<T>(fn: () => Promise<T>): Promise<T | { success: false; error: string }> {
  if (operationInProgress) {
    return { success: false, error: "Another daemon operation is in progress" };
@@ -651,12 +825,19 @@ async function startDaemon(): Promise<{ success: boolean; error?: string }> {

  const active = await ensureActiveProfile();
  const existing = await fetchHealthAtPort(active.port);
-  if (existing?.status === "running") {
+  if (daemonStatusAlive(existing?.status)) {
+    // A daemon is already up ("running") or booting ("starting") on this port —
+    // don't spawn a second one (the CLI rejects that as "already running").
+    // Let polling track it through to "running".
    pollOnce();
    return { success: true };
  }

  currentState = "starting";
+  // Begin a fresh auth-probe window for this attempt.
+  startingSince = Date.now();
+  authProbeDone = false;
+  authExpired = false;
  sendStatus({ state: "starting" });

  const args = ["daemon", "start", ...profileArgs(active)];
@@ -665,7 +846,7 @@ async function startDaemon(): Promise<{ success: boolean; error?: string }> {
    execFile(
      bin,
      args,
-      { timeout: 20_000, env: desktopSpawnEnv() },
+      { timeout: DAEMON_START_EXEC_TIMEOUT_MS, env: desktopSpawnEnv() },
      (err) => {
        if (err) {
          currentState = "stopped";
@@ -683,12 +864,40 @@ async function startDaemon(): Promise<{ success: boolean; error?: string }> {
  });
 }

+/**
+ * Fresh boundary preflight for stop/restart: read the active profile's CURRENT
+ * /health and decide whether the daemon runs somewhere the app can't drive
+ * (WSL2 etc.). Done per call rather than off the poll cache, so a lifecycle op
+ * never shells out to a CLI that can't reach the daemon's process — even on
+ * paths that didn't just poll (e.g. restart-on-user-switch in syncToken, which
+ * calls restartDaemon directly). See #3916.
+ */
+async function lifecycleBlockedByForeignDaemon(): Promise<boolean> {
+  const active = await ensureActiveProfile();
+  return daemonLifecycleUnreachable(
+    async () => (await fetchHealthAtPort(active.port))?.os,
+    normalizeHostOS(process.platform),
+  );
+}
+
 async function stopDaemon(): Promise<{ success: boolean; error?: string }> {
+  // Central lifecycle guard: a daemon running in an environment we can't drive
+  // (e.g. Linux in WSL2 behind a Windows desktop) can't be stopped by the
+  // native CLI — it would act on the host process namespace and no-op, while
+  // still flipping our state to "stopped". Bail as a successful no-op so every
+  // caller (logout, quit, restart, the Runtime card) is covered in one place
+  // rather than each remembering to check. Preflighted against live /health so
+  // it holds even when no poll ran first. #3916.
+  if (await lifecycleBlockedByForeignDaemon()) return { success: true };
+
  const bin = await resolveCliBinary();
  if (!bin) return { success: false, error: "multica CLI is not installed" };

  const active = await ensureActiveProfile();
  currentState = "stopping";
+  // An explicit stop is a clean reset — drop any pending auth-failure verdict.
+  authExpired = false;
+  startingSince = null;
  sendStatus({ state: "stopping" });

  const args = ["daemon", "stop", ...profileArgs(active)];
@@ -707,6 +916,11 @@ async function stopDaemon(): Promise<{ success: boolean; error?: string }> {
 }

 async function restartDaemon(): Promise<{ success: boolean; error?: string }> {
+  // Same central, live-preflighted guard as stopDaemon: we can neither stop nor
+  // start a daemon we don't manage, so don't try (user-switch, reauth,
+  // first-workspace, and any future restart caller all route through here).
+  // #3916.
+  if (await lifecycleBlockedByForeignDaemon()) return { success: true };
  const stopResult = await stopDaemon();
  if (!stopResult.success) return stopResult;
  return startDaemon();
@@ -874,6 +1088,10 @@ export function setupDaemonManager(
    (_event, token: string, userId: string) => syncToken(token, userId),
  );
  ipcMain.handle("daemon:clear-token", () => clearToken());
+  ipcMain.handle(
+    "daemon:reauthenticate",
+    (_event, token: string, userId: string) => reauthenticate(token, userId),
+  );
  ipcMain.handle("daemon:is-cli-installed", async () => {
    const bin = await resolveCliBinary();
    return bin !== null;
@@ -950,6 +1168,8 @@ export function setupDaemonManager(
        isQuitting = true;
        event.preventDefault();
        try {
+          // stopDaemon no-ops for an externally-managed daemon (WSL2 etc.), so
+          // this is safe and instant in that case — the guard lives there. #3916
          await stopDaemon();
        } catch {
          // Best-effort stop on quit
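
Review note: the once-per-attempt probe gate added to fetchHealth above can be read as a pure predicate over the three module-level flags. The following is a minimal sketch under that reading, not code from this change. `shouldProbeAuth` and `ProbeGate` are hypothetical names introduced for illustration; only `AUTH_PROBE_GRACE_MS`, its value, and the field names come from the diff.

```typescript
// Hypothetical extraction of the gate in fetchHealth; not part of the change.
const AUTH_PROBE_GRACE_MS = 10_000; // value taken from the diff

interface ProbeGate {
  currentState: string; // e.g. "starting", "stopped"
  authExpired: boolean; // sticky verdict from an earlier probe
  authProbeDone: boolean; // true once this attempt has probed
  startingSince: number | null; // ms timestamp of the start attempt
}

// True only for a start attempt that is still "starting", has not probed yet,
// has no sticky verdict, and has outlived the grace window.
function shouldProbeAuth(gate: ProbeGate, now: number): boolean {
  return (
    gate.currentState === "starting" &&
    !gate.authExpired &&
    !gate.authProbeDone &&
    gate.startingSince !== null &&
    now - gate.startingSince >= AUTH_PROBE_GRACE_MS
  );
}

const attempt: ProbeGate = {
  currentState: "starting",
  authExpired: false,
  authProbeDone: false,
  startingSince: 1_000,
};

console.log(shouldProbeAuth(attempt, 5_000)); // false: inside the grace window
console.log(shouldProbeAuth(attempt, 11_000)); // true: grace elapsed
console.log(shouldProbeAuth({ ...attempt, authProbeDone: true }, 60_000)); // false: already probed
```

Keeping the predicate free of I/O would make the grace-window boundary testable without a running daemon, which may be worth considering if this gate grows more conditions.
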
--- a/apps/desktop/src/main/daemon-os.test.ts
+++ b/apps/desktop/src/main/daemon-os.test.ts
@@ -0,0 +1,80 @@
+import { describe, expect, it, vi } from "vitest";
+
+import {
+  daemonLifecycleUnreachable,
+  isDaemonExternallyManaged,
+  normalizeHostOS,
+} from "./daemon-os";
+
+describe("normalizeHostOS", () => {
+  it("maps win32 to the GOOS spelling 'windows'", () => {
+    expect(normalizeHostOS("win32")).toBe("windows");
+  });
+
+  it("passes darwin and linux through unchanged (already GOOS spellings)", () => {
+    expect(normalizeHostOS("darwin")).toBe("darwin");
+    expect(normalizeHostOS("linux")).toBe("linux");
+  });
+});
+
+describe("isDaemonExternallyManaged", () => {
+  it("flags a Linux (WSL2) daemon behind a Windows desktop — the #3916 case", () => {
+    expect(isDaemonExternallyManaged("linux", normalizeHostOS("win32"))).toBe(
+      true,
+    );
+  });
+
+  // These three are the "no false positive" guarantees: a native daemon on each platform
+  // must keep its auto-start/auto-stop toggles.
+  it("does NOT flag a native Windows daemon under a Windows desktop", () => {
+    expect(isDaemonExternallyManaged("windows", normalizeHostOS("win32"))).toBe(
+      false,
+    );
+  });
+
+  it("does NOT flag a native macOS daemon under a macOS desktop", () => {
+    expect(isDaemonExternallyManaged("darwin", normalizeHostOS("darwin"))).toBe(
+      false,
+    );
+  });
+
+  it("does NOT flag a native Linux daemon under a Linux desktop", () => {
+    expect(isDaemonExternallyManaged("linux", normalizeHostOS("linux"))).toBe(
+      false,
+    );
+  });
+
+  // Fail safe: an older daemon that predates the `os` field reports nothing.
+  // Hiding a toggle on a guess would penalize a native user, so unknown OS = treat as manageable.
+  it("fails safe to false when the daemon reports no OS", () => {
+    expect(isDaemonExternallyManaged(undefined, "windows")).toBe(false);
+    expect(isDaemonExternallyManaged("", "windows")).toBe(false);
+  });
+});
+
+// The stop/restart lifecycle boundary funnels through this. It must read the
+// daemon's LIVE OS (not a cached poll value), so a restart on a path that
+// didn't just poll — e.g. user-switch — still can't shell out at a WSL2 daemon.
+describe("daemonLifecycleUnreachable", () => {
+  it("consults the live OS reader and blocks a foreign-OS (WSL2) daemon", async () => {
+    const readDaemonOS = vi.fn().mockResolvedValue("linux");
+    expect(await daemonLifecycleUnreachable(readDaemonOS, "windows")).toBe(true);
+    // Proves the decision came from a fresh read, not a stale cache.
+    expect(readDaemonOS).toHaveBeenCalledTimes(1);
+  });
+
+  it("allows a native daemon whose live OS matches the host", async () => {
+    expect(
+      await daemonLifecycleUnreachable(async () => "windows", "windows"),
+    ).toBe(false);
+    expect(
+      await daemonLifecycleUnreachable(async () => "darwin", "darwin"),
+    ).toBe(false);
+  });
+
+  it("fails safe to false when the live OS is unknown (older daemon / none running)", async () => {
+    expect(
+      await daemonLifecycleUnreachable(async () => undefined, "windows"),
+    ).toBe(false);
+  });
+});
--- a/apps/desktop/src/main/daemon-os.ts
+++ b/apps/desktop/src/main/daemon-os.ts
@@ -0,0 +1,67 @@
+/**
+ * Detecting a daemon the desktop app can't manage.
+ *
+ * The app reads daemon liveness over HTTP at 127.0.0.1:{port}/health, but it
+ * starts/stops the daemon by shelling out to the bundled native CLI, which acts
+ * on the *host* OS process namespace. On Windows with the daemon running inside
+ * WSL2, /health is reachable via localhost forwarding (so status looks fine) but
+ * the daemon's process lives in a separate Linux namespace the Windows CLI can't
+ * touch — so auto-start / auto-stop silently do nothing (#3916).
+ *
+ * The reliable, low-false-positive signal is the daemon's own OS (reported as
+ * `os` on /health, = runtime.GOOS) vs the desktop host OS. They only disagree
+ * when the daemon runs in a foreign environment we can't drive. This module is
+ * the single source of truth for that comparison so it stays unit-tested — the
+ * cost of a false positive is hiding a working toggle from a native user, so the
+ * logic must fail safe (treat unknown / matching as manageable).
+ */
+
+/**
+ * Normalize a Node `process.platform` value to the daemon's `runtime.GOOS`
+ * vocabulary so the two are directly comparable. Only `win32` -> `windows`
+ * actually differs across the platforms we ship (darwin/linux already match);
+ * any other value passes through unchanged.
+ */
+export function normalizeHostOS(platform: NodeJS.Platform): string {
+  return platform === "win32" ? "windows" : platform;
+}
+
+/**
+ * Whether a running daemon is in an environment the desktop app can't control.
+ *
+ * Returns true ONLY when the daemon reports a concrete OS that differs from the
+ * host's. Fails safe to false when:
+ *   - `daemonOS` is missing/empty (older daemon that predates the `os` field, or
+ *     a malformed response) — we can't prove it's foreign, so keep toggles live.
+ *   - the OSes match — a normally-managed native daemon.
+ *
+ * Callers must only invoke this for a daemon that is actually running; a stopped
+ * daemon has no OS to compare and its toggles must stay enabled.
+ */
+export function isDaemonExternallyManaged(
+  daemonOS: string | undefined,
+  hostOS: string,
+): boolean {
+  if (typeof daemonOS !== "string" || daemonOS.length === 0) return false;
+  return daemonOS !== hostOS;
+}
+
+/**
+ * Boundary preflight for daemon lifecycle ops (stop / restart): resolve the
+ * daemon's CURRENT OS via `readDaemonOS` and return true when it's running
+ * somewhere the app can't drive.
+ *
+ * `readDaemonOS` is a live `/health` read performed at the call site — never a
+ * cached poll value. That is the whole point: a stale "manageable" cache would
+ * let a lifecycle op shell out to a native CLI that can't reach a WSL2 daemon
+ * (the PID lives in another namespace), which is exactly the bug. Taking the
+ * reader as a parameter keeps this unit-testable without the electron-coupled
+ * daemon-manager module, and lets the test prove the live value — not a cache —
+ * drives the decision. See #3916.
+ */
+export async function daemonLifecycleUnreachable(
+  readDaemonOS: () => Promise<string | undefined>,
+  hostOS: string,
+): Promise<boolean> {
+  return isDaemonExternallyManaged(await readDaemonOS(), hostOS);
+}
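
Review note: the module comment says the cost of a false positive is hiding a working toggle, and that a stopped daemon's toggles must stay enabled. A consumer of the new `externallyManaged` status field could apply the same fail-safe rule as sketched below. This is an assumption about the renderer side, which is not shown in this section: `DaemonStatusView` and `lifecycleTogglesEnabled` are hypothetical names, and the `"running"` state literal is assumed from the surrounding comments.

```typescript
// Hypothetical renderer-side view of the status. Only `state` and
// `externallyManaged` mirror names that appear in the diff.
interface DaemonStatusView {
  state: string;
  externallyManaged?: boolean;
}

// Hypothetical helper: toggles stay enabled unless a running daemon is
// positively known to be externally managed. A missing flag (older main
// process, stopped daemon) is treated as manageable.
function lifecycleTogglesEnabled(status: DaemonStatusView): boolean {
  return !(status.state === "running" && status.externallyManaged === true);
}

console.log(lifecycleTogglesEnabled({ state: "running", externallyManaged: true })); // false
console.log(lifecycleTogglesEnabled({ state: "running" })); // true
console.log(lifecycleTogglesEnabled({ state: "stopped", externallyManaged: true })); // true
```
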
--- a/apps/desktop/src/main/freeze-breadcrumb.test.ts
+++ b/apps/desktop/src/main/freeze-breadcrumb.test.ts
@@ -0,0 +1,90 @@
+import { afterEach, describe, expect, it } from "vitest";
+import { mkdtempSync, rmSync, writeFileSync, existsSync } from "node:fs";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
+
+import {
+  writeFreezeBreadcrumb,
+  readAndClearFreezeBreadcrumb,
+  clearFreezeBreadcrumb,
+  type FreezeBreadcrumb,
+} from "./freeze-breadcrumb";
+
+// Each test gets its own temp dir so the on-disk breadcrumb is isolated.
+const dirs: string[] = [];
+function tempFile(): string {
+  const dir = mkdtempSync(join(tmpdir(), "freeze-breadcrumb-"));
+  dirs.push(dir);
+  return join(dir, "last-client-failure.json");
+}
+
+afterEach(() => {
+  for (const dir of dirs.splice(0)) rmSync(dir, { recursive: true, force: true });
+});
+
+const sample: FreezeBreadcrumb = {
+  kind: "unresponsive",
+  context: { desktopRoute: { path: "/acme/issues" } },
+  ts: 1_700_000_000_000,
+  version: "0.3.1",
+};
+
+describe("freeze breadcrumb round-trip", () => {
+  it("writes then reads back the breadcrumb", () => {
+    const file = tempFile();
+    writeFreezeBreadcrumb(file, sample);
+    expect(readAndClearFreezeBreadcrumb(file)).toEqual(sample);
+  });
+
+  it("read clears the file so a failure reports exactly once", () => {
+    const file = tempFile();
+    writeFreezeBreadcrumb(file, sample);
+    expect(readAndClearFreezeBreadcrumb(file)).toEqual(sample);
+    expect(existsSync(file)).toBe(false);
+    expect(readAndClearFreezeBreadcrumb(file)).toBeNull();
+  });
+
+  it("clearFreezeBreadcrumb removes a pending breadcrumb (hang recovered)", () => {
+    const file = tempFile();
+    writeFreezeBreadcrumb(file, sample);
+    clearFreezeBreadcrumb(file);
+    expect(readAndClearFreezeBreadcrumb(file)).toBeNull();
+  });
+});
+
+// The breadcrumb crosses a process boundary (main writes, renderer flushes via
+// IPC) and lives across app versions — a future write shape or a corrupt file
+// must never throw into boot. See CLAUDE.md "API Response Compatibility".
+describe("freeze breadcrumb defends against malformed input", () => {
+  it("returns null when no file exists", () => {
+    expect(readAndClearFreezeBreadcrumb(tempFile())).toBeNull();
+  });
+
+  it("returns null on corrupt JSON", () => {
+    const file = tempFile();
+    writeFileSync(file, "{ not valid json", "utf8");
+    expect(readAndClearFreezeBreadcrumb(file)).toBeNull();
+  });
+
+  it("returns null when `kind` is missing", () => {
+    const file = tempFile();
+    writeFileSync(file, JSON.stringify({ ts: 1, version: "x" }), "utf8");
+    expect(readAndClearFreezeBreadcrumb(file)).toBeNull();
+  });
+
+  it("returns null when `kind` is the wrong type", () => {
+    const file = tempFile();
+    writeFileSync(file, JSON.stringify({ kind: 42, context: {} }), "utf8");
+    expect(readAndClearFreezeBreadcrumb(file)).toBeNull();
+  });
+
+  it("returns null on a JSON null payload", () => {
+    const file = tempFile();
+    writeFileSync(file, "null", "utf8");
+    expect(readAndClearFreezeBreadcrumb(file)).toBeNull();
+  });
+
+  it("clearing a non-existent file is a no-op, never throws", () => {
+    expect(() => clearFreezeBreadcrumb(tempFile())).not.toThrow();
+  });
+});
--- a/apps/desktop/src/main/freeze-breadcrumb.ts
+++ b/apps/desktop/src/main/freeze-breadcrumb.ts
@@ -0,0 +1,76 @@
+import { writeFileSync, readFileSync, rmSync } from "node:fs";
+import type { FreezeBreadcrumb } from "../shared/freeze-breadcrumb";
+
+// When the renderer truly hangs or its process dies, it can't send telemetry
+// itself — the thread is blocked or gone. The main process (always alive) is
+// the only watcher that can react, but during the hang it can't reach the
+// renderer's posthog-js either. So it writes a breadcrumb to disk; the next
+// time a renderer boots, it reads + clears the file and reports the event.
+// This survives even a force-quit, which is the whole point.
+
+export type { FreezeBreadcrumb };
+
+/**
+ * Best-effort write. A breadcrumb we can't persist is lost, never fatal.
+ *
+ * Known limitation: this is a single slot — last write wins. Multiple failures
+ * within one session collapse to the last one, so per-session failure counts
+ * are undercounted. Acceptable for now: telemetry aggregates presence and
+ * frequency across users, not exhaustive per-session sequences. Upgrade to an
+ * append/ring buffer if per-session failure chains become a question.
+ */
+export function writeFreezeBreadcrumb(filePath: string, breadcrumb: FreezeBreadcrumb): void {
+  try {
+    writeFileSync(filePath, JSON.stringify(breadcrumb), "utf8");
+  } catch {
+    // Disk full / permissions — drop silently.
+  }
+}
+
+/**
+ * Delete a persisted breadcrumb. Called when the renderer recovers from a hang
+ * (a `responsive` event after `unresponsive`): the breadcrumb was written
+ * pre-emptively while the thread was stuck, but since it came back, the
+ * in-thread long-task watchdog already reports it — keeping the breadcrumb
+ * would double-count it AND mislabel a recovered window as `recovered: false`.
+ * Best-effort; a stale breadcrumb only costs one duplicate report.
+ */
+export function clearFreezeBreadcrumb(filePath: string): void {
+  try {
+    rmSync(filePath, { force: true });
+  } catch {
+    // Nothing to clear / permissions — ignore.
+  }
+}
+
+/**
+ * Read the breadcrumb and delete it in the same call, so a failure is reported
+ * exactly once. Returns null when there's no breadcrumb (the normal case) or
+ * when the file is unreadable / corrupt.
+ */
+export function readAndClearFreezeBreadcrumb(filePath: string): FreezeBreadcrumb | null {
+  let raw: string;
+  try {
+    raw = readFileSync(filePath, "utf8");
+  } catch {
+    return null;
+  }
+  try {
+    rmSync(filePath, { force: true });
+  } catch {
+    // If we can't delete it we'd re-report next launch; acceptable over throwing.
+  }
+  try {
+    const parsed: unknown = JSON.parse(raw);
+    if (
+      parsed &&
+      typeof parsed === "object" &&
+      typeof (parsed as FreezeBreadcrumb).kind === "string"
+    ) {
+      return parsed as FreezeBreadcrumb;
+    }
+  } catch {
+    // Corrupt JSON — drop.
+  }
+  return null;
+}
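
Review note: readAndClearFreezeBreadcrumb validates only that `kind` is a string, so a payload such as `{ "kind": "x" }` is returned as a FreezeBreadcrumb even though `ts`, `version`, and `context` are absent. A consumer may want to read those fields defensively. The sketch below shows one way, assuming a hypothetical `summarizeBreadcrumb` helper; the actual renderer flush path is not part of this section and may differ.

```typescript
// Hypothetical consumer-side formatter; not part of the change.
// Accepts whatever crossed the IPC boundary and never throws.
function summarizeBreadcrumb(raw: unknown): string | null {
  if (!raw || typeof raw !== "object") return null;
  const rec = raw as Record<string, unknown>;
  const kind = rec["kind"];
  if (typeof kind !== "string") return null;
  // Fields other than `kind` are not validated by the reader, so fall back.
  const version = typeof rec["version"] === "string" ? rec["version"] : "unknown";
  const ts = typeof rec["ts"] === "number" ? rec["ts"] : 0;
  return `${kind}@${version}#${ts}`;
}

console.log(summarizeBreadcrumb({ kind: "unresponsive", ts: 1, version: "0.3.1" })); // unresponsive@0.3.1#1
console.log(summarizeBreadcrumb({ kind: "crash" })); // crash@unknown#0
console.log(summarizeBreadcrumb(null)); // null
```
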
--- a/apps/desktop/src/main/index.ts
+++ b/apps/desktop/src/main/index.ts
@@ -13,11 +13,21 @@ import { installNavigationGestures } from "./navigation-gestures";
 import { getAppVersion } from "./app-version";
 import { loadRuntimeConfig } from "./runtime-config-loader";
 import type { RuntimeConfigResult } from "../shared/runtime-config";
+import {
+  RENDERER_ROUTE_CONTEXT_CHANNEL,
+  sanitizeRendererRouteContext,
+  type RendererRouteContext,
+} from "../shared/renderer-route-context";
 import {
  createElectronReloadPrompt,
  installRendererRecoveryHandlers,
  type RendererRecoveryWindow,
 } from "./renderer-recovery";
+import {
+  writeFreezeBreadcrumb,
+  readAndClearFreezeBreadcrumb,
+  clearFreezeBreadcrumb,
+} from "./freeze-breadcrumb";

 // Bundled icon used for dock/taskbar branding. macOS/Windows production
 // builds let the OS pick up the icon from the .app bundle / .exe resources,
@@ -61,7 +71,15 @@ if (process.platform !== "win32") {

 const PROTOCOL = "multica";

+// Where the main process parks a freeze/crash breadcrumb until the next
+// renderer boot flushes it to telemetry. Lives in userData so it survives a
+// force-quit. Resolved lazily — app.getPath is only valid after `ready`.
+function freezeBreadcrumbPath(): string {
+  return join(app.getPath("userData"), "last-client-failure.json");
+}
+
 let mainWindow: BrowserWindow | null = null;
+let latestRendererRouteContext: RendererRouteContext | null = null;
 let runtimeConfigResult: RuntimeConfigResult = {
  ok: false,
  error: { message: "Runtime config has not loaded yet" },
@@ -165,10 +183,19 @@ function createWindow(): void {
      additionalArguments: [`--multica-locale=${systemLocale}`],
    },
  });
+  const window = mainWindow;
+  latestRendererRouteContext = null;
+
+  window.on("closed", () => {
+    if (mainWindow === window) {
+      mainWindow = null;
+      latestRendererRouteContext = null;
+    }
+  });

  // Strip Origin header from WebSocket upgrade requests so the server's
  // origin whitelist doesn't reject connections from localhost dev origins.
-  mainWindow.webContents.session.webRequest.onBeforeSendHeaders(
+  window.webContents.session.webRequest.onBeforeSendHeaders(
    { urls: ["wss://*/*", "ws://*/*"] },
    (details, callback) => {
      delete details.requestHeaders["Origin"];
@@ -176,8 +203,8 @@ function createWindow(): void {
    },
  );

-  mainWindow.on("ready-to-show", () => {
-    mainWindow?.show();
+  window.on("ready-to-show", () => {
+    window.show();
  });

  // Detect OS language changes while the app is running. Electron has no
@@ -185,24 +212,28 @@ function createWindow(): void {
  // catches the common case where users switch System Settings → Language
  // and bring the app back. The renderer decides whether to act (it ignores
  // the signal when the user has an explicit Settings choice).
-  mainWindow.on("focus", () => {
+  window.on("focus", () => {
    const current = getSystemLocale();
    if (current === lastKnownSystemLocale) return;
    lastKnownSystemLocale = current;
-    mainWindow?.webContents.send("locale:system-changed", current);
+    window.webContents.send("locale:system-changed", current);
  });

-  mainWindow.webContents.setWindowOpenHandler((details) => {
+  window.webContents.setWindowOpenHandler((details) => {
    openExternalSafely(details.url);
    return { action: "deny" };
  });

  // Window-level keyboard shortcuts. Calling preventDefault here prevents
  // both the renderer keydown AND the application menu accelerator, so
-  // anything we own here (reload-block, zoom) is the sole handler for
-  // that combination — no double-fire with the macOS default View menu.
-  mainWindow.webContents.on("before-input-event", (event, input) => {
-    if (handleAppShortcut(input, mainWindow!.webContents)) {
+  // anything we own here (reload-block, zoom, tab-close) is the sole handler
+  // for that combination — no double-fire with the macOS default View menu.
+  window.webContents.on("before-input-event", (event, input) => {
+    const result = handleAppShortcut(input, window.webContents);
+    if (result === "close-tab") {
+      event.preventDefault();
+      window.webContents.send("tab:close-active");
+    } else if (result) {
      event.preventDefault();
    }
  });
@@ -224,7 +255,7 @@ function createWindow(): void {
    // Forward every renderer-side console.* call. The detail object also
    // carries source URL + line — included so a thrown stack trace from
    // window.onerror is traceable back to a file.
-    mainWindow.webContents.on("console-message", (details) => {
+    window.webContents.on("console-message", (details) => {
      const { level, message, sourceId, lineNumber } = details;
      log(level, `${message} (${sourceId}:${lineNumber})`);
    });
@@ -232,7 +263,7 @@ function createWindow(): void {
    // Fires when loadURL / loadFile can't reach its target (dev server
    // not up yet, network blip, file missing). errorCode is a Chromium
    // net error number; -3 = ABORTED is normal during HMR and skipped.
-    mainWindow.webContents.on(
+    window.webContents.on(
      "did-fail-load",
      (_event, errorCode, errorDescription, validatedURL, isMainFrame) => {
        if (errorCode === -3) return;
@@ -245,20 +276,41 @@ function createWindow(): void {

  }

-  installRendererRecoveryHandlers(mainWindow as unknown as RendererRecoveryWindow, {
+  installRendererRecoveryHandlers(window as unknown as RendererRecoveryWindow, {
    isDev: is.dev,
    showReloadPrompt: createElectronReloadPrompt((options) =>
-      dialog.showMessageBox(mainWindow!, options),
+      dialog.showMessageBox(window, options),
    ),
+    getDiagnosticContext: () => ({
+      windowUrl: window.webContents.getURL(),
+      ...(latestRendererRouteContext
+        ? { desktopRoute: latestRendererRouteContext }
+        : {}),
+    }),
+    // Only persist in production: a true hang/crash can't report itself, so we
+    // write a breadcrumb and the next renderer boot flushes it to PostHog. Dev
+    // is excluded to keep field telemetry clean.
+    persistBreadcrumb: is.dev
+      ? undefined
+      : (payload) =>
+          writeFreezeBreadcrumb(freezeBreadcrumbPath(), {
+            kind: payload.kind,
+            context: payload.context,
+            ts: Date.now(),
+            version: getAppVersion(),
+          }),
+    clearBreadcrumb: is.dev
+      ? undefined
+      : () => clearFreezeBreadcrumb(freezeBreadcrumbPath()),
  });

-  installContextMenu(mainWindow.webContents);
-  installNavigationGestures(mainWindow);
+  installContextMenu(window.webContents);
+  installNavigationGestures(window);

  if (is.dev && process.env["ELECTRON_RENDERER_URL"]) {
-    mainWindow.loadURL(process.env["ELECTRON_RENDERER_URL"]);
+    window.loadURL(process.env["ELECTRON_RENDERER_URL"]);
  } else {
-    mainWindow.loadFile(join(__dirname, "../renderer/index.html"));
+    window.loadFile(join(__dirname, "../renderer/index.html"));
  }
 }

@@ -365,6 +417,11 @@ if (!gotTheLock) {
      return openExternalSafely(url);
    });

+    // Renderer requests window close (e.g. Cmd+W on last tab).
+    ipcMain.on("window:close", () => {
+      mainWindow?.close();
+    });
+
    ipcMain.handle("file:download-url", (_event, url: string) => {
      if (!mainWindow) {
        console.warn("[download] ignored file:download-url — mainWindow torn down");
@@ -383,6 +440,14 @@ if (!gotTheLock) {
      event.returnValue = { version: getAppVersion(), os };
    });

+    // Sync IPC: read + clear any freeze/crash breadcrumb left by a previous
+    // session. The renderer flushes it to telemetry on boot (it couldn't be
+    // reported when it happened — the renderer was hung or gone). Read-and-
+    // clear so a failure reports exactly once.
+    ipcMain.on("freeze:get-last", (event) => {
+      event.returnValue = readAndClearFreezeBreadcrumb(freezeBreadcrumbPath());
+    });
+
    // Sync IPC: preload exposes the validated runtime config before renderer
    // boot. If desktop.json exists but is invalid, renderer receives the
    // blocking error and must not silently fall back to the cloud defaults.
@@ -390,6 +455,13 @@ if (!gotTheLock) {
      event.returnValue = runtimeConfigResult;
    });

+    ipcMain.on(RENDERER_ROUTE_CONTEXT_CHANNEL, (event, context: unknown) => {
+      if (!mainWindow || event.sender !== mainWindow.webContents) return;
+      const sanitized = sanitizeRendererRouteContext(context);
+      if (!sanitized) return;
+      latestRendererRouteContext = sanitized;
+    });
+
    // IPC: toggle immersive mode — hides the macOS traffic lights so full-screen
    // modals (e.g. create-workspace) can place UI in the top-left corner
    // without fighting the native window controls' hit-test.
--- a/apps/desktop/src/main/keyboard-shortcuts.test.ts
+++ b/apps/desktop/src/main/keyboard-shortcuts.test.ts
@@ -14,13 +14,14 @@ function makeWc(initialLevel = 0) {

 function key(
  k: string,
-  mods: Partial<Pick<ShortcutInput, "control" | "meta">> = {},
+  mods: Partial<Pick<ShortcutInput, "control" | "meta" | "shift">> = {},
 ): ShortcutInput {
  return {
    type: "keyDown",
    key: k,
    control: false,
    meta: false,
+    shift: false,
    ...mods,
  };
 }
@@ -150,3 +151,36 @@ describe("handleAppShortcut — unrelated keys pass through", () => {
    expect(handleAppShortcut(key("k", { meta: true }), wc, "darwin")).toBe(false);
  });
 });
+
+describe("handleAppShortcut — close tab (Cmd/Ctrl+W)", () => {
+  it('returns "close-tab" on Cmd+W (macOS)', () => {
+    const wc = makeWc();
+    expect(handleAppShortcut(key("w", { meta: true }), wc, "darwin")).toBe("close-tab");
+  });
+
+  it('returns "close-tab" on Cmd+W uppercase', () => {
+    const wc = makeWc();
+    expect(handleAppShortcut(key("W", { meta: true }), wc, "darwin")).toBe("close-tab");
+  });
+
+  it('returns "close-tab" on Ctrl+W (Linux/Windows)', () => {
+    const wc = makeWc();
+    expect(handleAppShortcut(key("w", { control: true }), wc, "linux")).toBe("close-tab");
+    expect(handleAppShortcut(key("w", { control: true }), wc, "win32")).toBe("close-tab");
+  });
+
+  it("does not trigger without Cmd/Ctrl modifier", () => {
+    const wc = makeWc();
+    expect(handleAppShortcut(key("w"), wc, "darwin")).toBe(false);
+  });
+
+  it("does not trigger on Cmd+Shift+W (reserved for close-window)", () => {
+    const wc = makeWc();
+    expect(handleAppShortcut(key("W", { meta: true, shift: true }), wc, "darwin")).toBe(false);
+  });
+
+  it("does not trigger on Ctrl+Shift+W (reserved for close-window)", () => {
+    const wc = makeWc();
+    expect(handleAppShortcut(key("W", { control: true, shift: true }), wc, "linux")).toBe(false);
+  });
+});
--- a/apps/desktop/src/main/keyboard-shortcuts.ts
+++ b/apps/desktop/src/main/keyboard-shortcuts.ts
@@ -8,6 +8,7 @@ export type ShortcutInput = {
  key: string;
  control: boolean;
  meta: boolean;
+  shift: boolean;
 };

 // Subset of WebContents the zoom handler needs. Keeps the test mock tiny.
@@ -34,11 +35,19 @@ const ZOOM_MAX = 4.5;
 * Handling the shortcuts here gives identical behavior on every platform
 * and every layout.
 */
+/**
+ * Result of handleAppShortcut:
+ * - `false`: not handled, let Electron continue
+ * - `true`: handled (preventDefault), no further action
+ * - `"close-tab"`: Cmd/Ctrl+W intercepted — caller should send IPC to renderer
+ */
+export type ShortcutResult = boolean | "close-tab";
+
 export function handleAppShortcut(
  input: ShortcutInput,
  webContents: ZoomTarget,
  platform: NodeJS.Platform = process.platform,
-): boolean {
+): ShortcutResult {
  if (input.type !== "keyDown") return false;
  const cmdOrCtrl = platform === "darwin" ? input.meta : input.control;

@@ -70,5 +79,12 @@ export function handleAppShortcut(
    return true;
  }

+  // Cmd/Ctrl + W → close active tab (or window if last tab).
+  // Cmd/Ctrl + Shift + W is reserved for "close window" — do not intercept.
+  // Return a signal so the caller can send IPC to the renderer.
+  if (input.key.toLowerCase() === "w" && !input.shift) {
+    return "close-tab";
+  }
+
  return false;
 }
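Reviewer note: the three-way `ShortcutResult` is consumed by the `before-input-event` handler in `index.ts`. The sketch below restates that dispatch in isolation so the contract is easy to check. `ShortcutEventLike`, `RendererSender`, and `dispatchShortcutResult` are hypothetical names used only for illustration; the channel name `tab:close-active` is the one used in this change.

```typescript
// Sketch only: ShortcutResult mirrors the type added in keyboard-shortcuts.ts.
// ShortcutEventLike, RendererSender and dispatchShortcutResult are hypothetical.
type ShortcutResult = boolean | "close-tab";

interface ShortcutEventLike {
  preventDefault(): void;
}

interface RendererSender {
  send(channel: string): void;
}

function dispatchShortcutResult(
  result: ShortcutResult,
  event: ShortcutEventLike,
  renderer: RendererSender,
): void {
  if (result === "close-tab") {
    // Swallow the key and let the renderer decide which tab to close.
    event.preventDefault();
    renderer.send("tab:close-active");
  } else if (result) {
    // Handled entirely in the main process (e.g. zoom): only swallow the key.
    event.preventDefault();
  }
  // result === false: not ours, Electron continues with default handling.
}
```

Because `"close-tab"` is a truthy string, the order of the branches matters: checking `result === "close-tab"` first keeps it from being treated as a plain `true`.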
--- a/apps/desktop/src/main/renderer-recovery.test.ts
+++ b/apps/desktop/src/main/renderer-recovery.test.ts
@@ -1,5 +1,5 @@
 import { describe, expect, it, vi, beforeEach, afterEach } from "vitest";
-import { installRendererRecoveryHandlers } from "./renderer-recovery";
+import { createElectronReloadPrompt, installRendererRecoveryHandlers } from "./renderer-recovery";

 type Handler = (...args: unknown[]) => void;

@@ -83,10 +83,50 @@ describe("installRendererRecoveryHandlers", () => {
    vi.useFakeTimers();
    const fixture = makeWindow();
    const showReloadPrompt = vi.fn(async () => "dismiss" as const);
+    const desktopRoute = {
+      surface: "tab",
+      path: "/acme/issues/MUL-3239",
+      workspaceSlug: "acme",
+      tabId: "tab-1",
+      reportedAt: "2026-06-15T00:00:00.000Z",
+    };

    installRendererRecoveryHandlers(fixture.window, {
      isDev: false,
      showReloadPrompt,
+      getDiagnosticContext: () => ({
+        windowUrl:
+          "file:///Applications/Multica.app/Contents/Resources/app.asar/index.html",
+        desktopRoute,
+      }),
+      unresponsivePromptDelayMs: 100,
+    });
+
+    fixture.windowHandlers.get("unresponsive")?.();
+    await vi.advanceTimersByTimeAsync(100);
+
+    expect(showReloadPrompt).toHaveBeenCalledWith({
+      kind: "unresponsive",
+      context: {
+        windowUrl:
+          "file:///Applications/Multica.app/Contents/Resources/app.asar/index.html",
+        desktopRoute,
+      },
+    });
+    expect(fixture.reload).not.toHaveBeenCalled();
+  });
+
+  it("keeps prompting when diagnostic context collection fails", async () => {
+    vi.useFakeTimers();
+    const fixture = makeWindow();
+    const showReloadPrompt = vi.fn(async () => "dismiss" as const);
+
+    installRendererRecoveryHandlers(fixture.window, {
+      isDev: false,
+      showReloadPrompt,
+      getDiagnosticContext: () => {
+        throw new Error("diagnostics unavailable");
+      },
      unresponsivePromptDelayMs: 100,
    });

@@ -94,7 +134,6 @@ describe("installRendererRecoveryHandlers", () => {
    await vi.advanceTimersByTimeAsync(100);

    expect(showReloadPrompt).toHaveBeenCalledWith({ kind: "unresponsive", context: {} });
-    expect(fixture.reload).not.toHaveBeenCalled();
  });

  it("keeps dev diagnostics non-prompting", async () => {
@@ -109,4 +148,124 @@ describe("installRendererRecoveryHandlers", () => {
    expect(showReloadPrompt).not.toHaveBeenCalled();
    expect(fixture.reload).not.toHaveBeenCalled();
  });
+
+  it("shows actionable recovery guidance before diagnostic details", async () => {
+    let detail = "";
+    const showMessageBox = vi.fn(
+      async (options: { title: string; message: string; detail: string }) => {
+        detail = options.detail;
+        return { response: 1 };
+      },
+    );
+    const showReloadPrompt = createElectronReloadPrompt(showMessageBox);
+
+    await showReloadPrompt({ kind: "unresponsive", context: {} });
+
+    expect(showMessageBox).toHaveBeenCalledWith(
+      expect.objectContaining({
+        title: "Multica needs to reload",
+        message: "The desktop window has been stuck for a few seconds.",
+        detail: expect.stringContaining(
+          "Click Reload to refresh this window and keep using Multica.",
+        ),
+      }),
+    );
+    expect(detail).toContain("what you were doing right before this message appeared");
+    expect(detail).toContain("Activity Monitor sample");
+    expect(detail).toContain("Diagnostic details:\nkind: unresponsive\ncontext: {}");
+  });
+});
+
+describe("freeze/crash breadcrumb state machine", () => {
+  beforeEach(() => vi.clearAllMocks());
+  afterEach(() => vi.useRealTimers());
+
+  function install(fixture: ReturnType<typeof makeWindow>) {
+    const persistBreadcrumb = vi.fn();
+    const clearBreadcrumb = vi.fn();
+    installRendererRecoveryHandlers(fixture.window, {
+      isDev: false,
+      showReloadPrompt: vi.fn(async () => "dismiss" as const),
+      persistBreadcrumb,
+      clearBreadcrumb,
+      unresponsivePromptDelayMs: 100,
+    });
+    return { persistBreadcrumb, clearBreadcrumb };
+  }
+
+  it("a sustained hang writes exactly one unresponsive breadcrumb", async () => {
+    vi.useFakeTimers();
+    const fixture = makeWindow();
+    const { persistBreadcrumb, clearBreadcrumb } = install(fixture);
+
+    fixture.windowHandlers.get("unresponsive")?.();
+    await vi.advanceTimersByTimeAsync(100);
+
+    expect(persistBreadcrumb).toHaveBeenCalledTimes(1);
+    expect(persistBreadcrumb).toHaveBeenCalledWith(
+      expect.objectContaining({ kind: "unresponsive" }),
+    );
+    expect(clearBreadcrumb).not.toHaveBeenCalled();
+  });
+
+  it("recovering after a written breadcrumb clears it (no double-count, no false recovered:false)", async () => {
+    vi.useFakeTimers();
+    const fixture = makeWindow();
+    const { persistBreadcrumb, clearBreadcrumb } = install(fixture);
+
+    fixture.windowHandlers.get("unresponsive")?.();
+    await vi.advanceTimersByTimeAsync(100);
+    expect(persistBreadcrumb).toHaveBeenCalledTimes(1);
+
+    fixture.windowHandlers.get("responsive")?.();
+    expect(clearBreadcrumb).toHaveBeenCalledTimes(1);
+  });
+
+  it("recovering before the delay never writes a breadcrumb, so nothing to clear", async () => {
+    vi.useFakeTimers();
+    const fixture = makeWindow();
+    const { persistBreadcrumb, clearBreadcrumb } = install(fixture);
+
+    fixture.windowHandlers.get("unresponsive")?.();
+    fixture.windowHandlers.get("responsive")?.();
+    await vi.advanceTimersByTimeAsync(100);
+
+    expect(persistBreadcrumb).not.toHaveBeenCalled();
+    expect(clearBreadcrumb).not.toHaveBeenCalled();
+  });
+
+  it("a hang that never recovers (force-quit) keeps its breadcrumb for next-boot reporting", async () => {
+    vi.useFakeTimers();
+    const fixture = makeWindow();
+    const { persistBreadcrumb, clearBreadcrumb } = install(fixture);
+
+    fixture.windowHandlers.get("unresponsive")?.();
+    await vi.advanceTimersByTimeAsync(100);
+
+    // No "responsive" ever fires — the breadcrumb must survive uncleared.
+    expect(persistBreadcrumb).toHaveBeenCalledTimes(1);
+    expect(clearBreadcrumb).not.toHaveBeenCalled();
+  });
+
+  it("a recoverable crash writes a breadcrumb and never clears it (a dead process never recovers)", () => {
+    const fixture = makeWindow();
+    const { persistBreadcrumb, clearBreadcrumb } = install(fixture);
+
+    fixture.webContentsHandlers.get("render-process-gone")?.({}, { reason: "crashed" });
+
+    expect(persistBreadcrumb).toHaveBeenCalledTimes(1);
+    expect(persistBreadcrumb).toHaveBeenCalledWith(
+      expect.objectContaining({ kind: "render-process-gone" }),
+    );
+    expect(clearBreadcrumb).not.toHaveBeenCalled();
+  });
+
+  it("a clean (non-crash) renderer exit writes no breadcrumb", () => {
+    const fixture = makeWindow();
+    const { persistBreadcrumb } = install(fixture);
+
+    fixture.webContentsHandlers.get("render-process-gone")?.({}, { reason: "clean-exit" });
+
+    expect(persistBreadcrumb).not.toHaveBeenCalled();
+  });
 });
--- a/apps/desktop/src/main/renderer-recovery.ts
+++ b/apps/desktop/src/main/renderer-recovery.ts
@@ -17,6 +17,22 @@ type ReloadPromptResult = "reload" | "dismiss";
 type RendererRecoveryOptions = {
  isDev: boolean;
  showReloadPrompt: (payload: ReloadPromptPayload) => Promise<ReloadPromptResult>;
+  getDiagnosticContext?: () => Record<string, unknown>;
+  /**
+   * Persist a freeze/crash breadcrumb to disk. The renderer can't report a
+   * true hang or process death itself (blocked / gone), so the main process
+   * writes it here and the next renderer boot flushes it to telemetry. Omit
+   * in dev to keep field telemetry clean.
+   */
+  persistBreadcrumb?: (payload: ReloadPromptPayload) => void;
+  /**
+   * Delete a previously-persisted unresponsive breadcrumb. Called when the
+   * renderer recovers (`responsive` after `unresponsive`): the window came
+   * back, so the in-thread watchdog reports the freeze and the breadcrumb
+   * would only double-count it. Crash breadcrumbs are never cleared — a dead
+   * process never recovers.
+   */
+  clearBreadcrumb?: () => void;
  log?: (tag: string, ...args: unknown[]) => void;
  unresponsivePromptDelayMs?: number;
 };
@@ -26,11 +42,21 @@ export function installRendererRecoveryHandlers(
  {
    isDev,
    showReloadPrompt,
+    getDiagnosticContext,
+    persistBreadcrumb,
+    clearBreadcrumb,
    log = defaultDevLog,
    unresponsivePromptDelayMs = 1500,
  }: RendererRecoveryOptions,
 ) {
  let unresponsivePromptTimer: ReturnType<typeof setTimeout> | null = null;
+  // True once a breadcrumb has been written for the current hang. A later
+  // `responsive` clears it; only a hang that never returns survives to report.
+  let unresponsiveBreadcrumbWritten = false;
+  const mergeDiagnosticContext = (context: Record<string, unknown>) => ({
+    ...readDiagnosticContext(getDiagnosticContext),
+    ...context,
+  });
  const maybePromptReload = (payload: ReloadPromptPayload) => {
    if (isDev) return;
    void showReloadPrompt(payload).then((result) => {
@@ -43,14 +69,23 @@ export function installRendererRecoveryHandlers(
  window.webContents.on("render-process-gone", (_event, details) => {
    if (isDev) log("process-gone", JSON.stringify(details));
    if (!isRecoverableRendererExit(details)) return;
-    maybePromptReload({ kind: "render-process-gone", context: { details } });
+    const payload: ReloadPromptPayload = {
+      kind: "render-process-gone",
+      context: mergeDiagnosticContext({ details }),
+    };
+    persistBreadcrumb?.(payload);
+    maybePromptReload(payload);
  });

+  // preload-error intentionally does NOT persist a breadcrumb: it's a startup
+  // failure of the preload script itself, and the breadcrumb-flush path depends
+  // on that same preload exposing `getLastFreeze` — if preload is broken, the
+  // next boot couldn't read it back anyway. We only prompt for reload here.
  window.webContents.on("preload-error", (_event, preloadPath, error) => {
    if (isDev) log("preload-error", `path=${preloadPath} err=${formatError(error)}`);
    maybePromptReload({
      kind: "preload-error",
-      context: { preloadPath, error: formatError(error) },
+      context: mergeDiagnosticContext({ preloadPath, error: formatError(error) }),
    });
  });

@@ -58,14 +93,27 @@ export function installRendererRecoveryHandlers(
    if (isDev || unresponsivePromptTimer) return;
    unresponsivePromptTimer = setTimeout(() => {
      unresponsivePromptTimer = null;
-      maybePromptReload({ kind: "unresponsive", context: {} });
+      const payload: ReloadPromptPayload = {
+        kind: "unresponsive",
+        context: mergeDiagnosticContext({}),
+      };
+      persistBreadcrumb?.(payload);
+      unresponsiveBreadcrumbWritten = true;
+      maybePromptReload(payload);
    }, unresponsivePromptDelayMs);
  });

  window.on("responsive", () => {
-    if (!unresponsivePromptTimer) return;
-    clearTimeout(unresponsivePromptTimer);
-    unresponsivePromptTimer = null;
+    if (unresponsivePromptTimer) {
+      clearTimeout(unresponsivePromptTimer);
+      unresponsivePromptTimer = null;
+    }
+    // The window came back: drop any breadcrumb written during this hang so it
+    // isn't re-reported (and mislabeled `recovered: false`) on next boot.
+    if (unresponsiveBreadcrumbWritten) {
+      clearBreadcrumb?.();
+      unresponsiveBreadcrumbWritten = false;
+    }
  });
 }

@@ -109,18 +157,30 @@ function isRecoverableRendererExit(details: unknown) {
 function rendererRecoveryMessage(kind: ReloadPromptPayload["kind"]) {
  switch (kind) {
    case "render-process-gone":
-      return "The desktop renderer process stopped responding or crashed.";
+      return "The desktop window stopped unexpectedly.";
    case "preload-error":
-      return "The desktop preload script failed before the app could start.";
+      return "The desktop window could not finish starting.";
    case "unresponsive":
-      return "The desktop window is not responding.";
+      return "The desktop window has been stuck for a few seconds.";
  }
 }

 function rendererRecoveryDetail(payload: ReloadPromptPayload) {
+  const guidance = [
+    "Click Reload to refresh this window and keep using Multica.",
+    "If this keeps happening, please tell us what you were doing right before this message appeared and whether Reload recovered the window.",
+  ];
+
+  if (payload.kind === "unresponsive") {
+    guidance.push(
+      "For macOS reports, an Activity Monitor sample of the Multica Helper (Renderer) process helps us find what blocked the app.",
+    );
+  }
+
  return [
-    "Reloading is the safest recovery path for this window.",
+    ...guidance,
    "",
+    "Diagnostic details:",
    `kind: ${payload.kind}`,
    `context: ${JSON.stringify(payload.context)}`,
  ].join("\n");
@@ -130,6 +190,17 @@ function defaultDevLog(tag: string, ...args: unknown[]) {
  process.stderr.write(`[renderer ${tag}] ${args.map(String).join(" ")}\n`);
 }

+function readDiagnosticContext(
+  getDiagnosticContext: (() => Record<string, unknown>) | undefined,
+) {
+  if (!getDiagnosticContext) return {};
+  try {
+    return getDiagnosticContext();
+  } catch {
+    return {};
+  }
+}
+
 function formatError(error: unknown) {
  return error instanceof Error ? (error.stack ?? error.message) : String(error);
-}
+}
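Reviewer note: the `unresponsive` / `responsive` handlers above form a small state machine (pending timer, breadcrumb written, breadcrumb cleared). A minimal sketch of just that logic follows, assuming an injectable scheduler so it can be driven without real timers. `createHangTracker`, `HangTrackerDeps`, and `Cancel` are hypothetical names, and the prompt and diagnostic-context steps are left out on purpose.

```typescript
// Sketch only: hypothetical names; the transitions follow the handlers in
// renderer-recovery.ts but omit the reload prompt and diagnostic context.
type Cancel = () => void;

interface HangTrackerDeps {
  // Schedules fn after delayMs and returns a function that cancels it.
  schedule: (fn: () => void, delayMs: number) => Cancel;
  persistBreadcrumb?: () => void;
  clearBreadcrumb?: () => void;
  delayMs: number;
}

function createHangTracker(deps: HangTrackerDeps) {
  let cancelPending: Cancel | null = null;
  let breadcrumbWritten = false;

  return {
    onUnresponsive(): void {
      if (cancelPending) return; // a timer is already running for this hang
      cancelPending = deps.schedule(() => {
        cancelPending = null;
        deps.persistBreadcrumb?.();
        breadcrumbWritten = true;
      }, deps.delayMs);
    },
    onResponsive(): void {
      // Recovered before the delay elapsed: nothing was written.
      if (cancelPending) {
        cancelPending();
        cancelPending = null;
      }
      // Recovered after the delay: drop the breadcrumb so it is not
      // reported again on the next boot.
      if (breadcrumbWritten) {
        deps.clearBreadcrumb?.();
        breadcrumbWritten = false;
      }
    },
  };
}
```

A hang that never recovers never reaches `onResponsive`, so its breadcrumb stays on disk for the next boot, which matches the force-quit test case in `renderer-recovery.test.ts`.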
--- a/apps/desktop/src/main/updater.test.ts
+++ b/apps/desktop/src/main/updater.test.ts
@@ -0,0 +1,170 @@
+import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
+import type { BrowserWindow, WebContents } from "electron";
+
+type Handler = (...args: unknown[]) => void;
+
+const ctx = vi.hoisted(() => ({
+  handlers: new Map<string, Handler[]>(),
+  ipcHandle: vi.fn(),
+  checkForUpdates: vi.fn(async () => ({
+    updateInfo: { version: "0.3.18" },
+    isUpdateAvailable: false,
+  })),
+  downloadUpdate: vi.fn(),
+  quitAndInstall: vi.fn(),
+  getVersion: vi.fn(() => "0.3.17"),
+}));
+
+vi.mock("electron-updater", () => {
+  const autoUpdater = {
+    autoDownload: false,
+    autoInstallOnAppQuit: false,
+    channel: undefined as string | undefined,
+    on: vi.fn((event: string, handler: Handler) => {
+      const handlers = ctx.handlers.get(event) ?? [];
+      handlers.push(handler);
+      ctx.handlers.set(event, handlers);
+      return autoUpdater;
+    }),
+    checkForUpdates: ctx.checkForUpdates,
+    downloadUpdate: ctx.downloadUpdate,
+    quitAndInstall: ctx.quitAndInstall,
+  };
+  return { autoUpdater };
+});
+
+vi.mock("electron", () => ({
+  app: {
+    getVersion: ctx.getVersion,
+  },
+  BrowserWindow: class BrowserWindow {},
+  ipcMain: {
+    handle: ctx.ipcHandle,
+  },
+}));
+
+import { setupAutoUpdater } from "./updater";
+
+function emitUpdater(event: string, ...args: unknown[]) {
+  for (const handler of ctx.handlers.get(event) ?? []) {
+    handler(...args);
+  }
+}
+
+function makeWindow() {
+  const send = vi.fn();
+  return {
+    win: {
+      isDestroyed: () => false,
+      webContents: {
+        isDestroyed: () => false,
+        send,
+      },
+    } as unknown as BrowserWindow,
+    send,
+  };
+}
+
+function makeDestroyedWindow() {
+  return {
+    isDestroyed: () => true,
+    get webContents(): WebContents {
+      throw new TypeError("Object has been destroyed");
+    },
+  } as unknown as BrowserWindow;
+}
+
+function makeWindowWithDestroyedWebContents() {
+  const send = vi.fn(() => {
+    throw new TypeError("Object has been destroyed");
+  });
+  return {
+    win: {
+      isDestroyed: () => false,
+      webContents: {
+        isDestroyed: () => true,
+        send,
+      },
+    } as unknown as BrowserWindow,
+    send,
+  };
+}
+
+function makeWindowWithThrowingSend(error: Error) {
+  const send = vi.fn(() => {
+    throw error;
+  });
+  return {
+    win: {
+      isDestroyed: () => false,
+      webContents: {
+        isDestroyed: () => false,
+        send,
+      },
+    } as unknown as BrowserWindow,
+    send,
+  };
+}
+
+describe("setupAutoUpdater", () => {
+  beforeEach(() => {
+    vi.useFakeTimers();
+    ctx.handlers.clear();
+    ctx.ipcHandle.mockClear();
+    ctx.checkForUpdates.mockClear();
+    ctx.downloadUpdate.mockClear();
+    ctx.quitAndInstall.mockClear();
+    ctx.getVersion.mockClear();
+  });
+
+  afterEach(() => {
+    vi.clearAllTimers();
+    vi.useRealTimers();
+  });
+
+  it("forwards update progress to a live renderer", () => {
+    const { win, send } = makeWindow();
+    setupAutoUpdater(() => win);
+
+    emitUpdater("download-progress", { percent: 42 });
+
+    expect(send).toHaveBeenCalledWith("updater:download-progress", {
+      percent: 42,
+    });
+  });
+
+  it("skips update progress when the BrowserWindow has already been destroyed", () => {
+    setupAutoUpdater(() => makeDestroyedWindow());
+
+    expect(() => emitUpdater("download-progress", { percent: 42 })).not.toThrow();
+  });
+
+  it("skips update progress when the BrowserWindow webContents has already been destroyed", () => {
+    const { win, send } = makeWindowWithDestroyedWebContents();
+    setupAutoUpdater(() => win);
+
+    expect(() => emitUpdater("download-progress", { percent: 42 })).not.toThrow();
+    expect(send).not.toHaveBeenCalled();
+  });
+
+  it("skips update progress when webContents.send loses a destroy race", () => {
+    const { win, send } = makeWindowWithThrowingSend(
+      new TypeError("Object has been destroyed"),
+    );
+    setupAutoUpdater(() => win);
+
+    expect(() => emitUpdater("download-progress", { percent: 42 })).not.toThrow();
+    expect(send).toHaveBeenCalledWith("updater:download-progress", {
+      percent: 42,
+    });
+  });
+
+  it("rethrows non-destroy errors from webContents.send", () => {
+    const { win } = makeWindowWithThrowingSend(new Error("boom"));
+    setupAutoUpdater(() => win);
+
+    expect(() => emitUpdater("download-progress", { percent: 42 })).toThrow(
+      "boom",
+    );
+  });
+});
--- a/apps/desktop/src/main/updater.ts
+++ b/apps/desktop/src/main/updater.ts
@@ -1,5 +1,5 @@
-import { autoUpdater, UpdateDownloadedEvent } from "electron-updater";
-import { app, BrowserWindow, ipcMain } from "electron";
+import { autoUpdater, type UpdateDownloadedEvent } from "electron-updater";
+import { app, type BrowserWindow, ipcMain } from "electron";

 // Silent background updates: electron-updater downloads on its own as soon
 // as `update-available` fires; we only surface UI when the package is fully
@@ -29,6 +29,32 @@ export type ManualUpdateCheckResult =
    }
  | { ok: false; error: string };

+type RendererChannel =
+  | "updater:update-available"
+  | "updater:download-progress"
+  | "updater:update-downloaded";
+
+function isDestroyedObjectError(err: unknown): boolean {
+  return err instanceof Error && err.message.includes("Object has been destroyed");
+}
+
+function sendToLiveRenderer(
+  win: BrowserWindow | null,
+  channel: RendererChannel,
+  payload: unknown,
+): void {
+  if (!win || win.isDestroyed()) return;
+
+  try {
+    const { webContents } = win;
+    if (webContents.isDestroyed()) return;
+    webContents.send(channel, payload);
+  } catch (err) {
+    if (isDestroyedObjectError(err)) return;
+    throw err;
+  }
+}
+
 // Single-flight guard around checkForUpdates(). With autoDownload=true the
 // startup, periodic, and manual triggers can all kick off downloads, and
 // overlapping calls have caused duplicate download warnings in the past
@@ -62,23 +88,20 @@ export function setupAutoUpdater(getMainWindow: () => BrowserWindow | null): voi
  autoUpdater.on("update-available", (info) => {
    // Forwarded for renderer-side state tracking only; the notification UI
    // does not render an "available" affordance with autoDownload=true.
-    const win = getMainWindow();
-    win?.webContents.send("updater:update-available", {
+    sendToLiveRenderer(getMainWindow(), "updater:update-available", {
      version: info.version,
      releaseNotes: info.releaseNotes,
    });
  });

  autoUpdater.on("download-progress", (progress) => {
-    const win = getMainWindow();
-    win?.webContents.send("updater:download-progress", {
+    sendToLiveRenderer(getMainWindow(), "updater:download-progress", {
      percent: progress.percent,
    });
  });

  autoUpdater.on("update-downloaded", (info: UpdateDownloadedEvent) => {
-    const win = getMainWindow();
-    win?.webContents.send("updater:update-downloaded", {
+    sendToLiveRenderer(getMainWindow(), "updater:update-downloaded", {
      version: info.version,
      releaseNotes: info.releaseNotes,
    });
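Reviewer note: one case the `try` block in `sendToLiveRenderer` covers, and that is worth spelling out, is a window torn down between the `isDestroyed()` check and the `webContents` access. The sketch below restates the helper against hypothetical minimal interfaces (`WindowLike`, `WebContentsLike`) so it runs without Electron; it is an illustration under those assumptions, not the shipped code.

```typescript
// Sketch only: WebContentsLike and WindowLike are hypothetical stand-ins for
// Electron's WebContents and BrowserWindow.
interface WebContentsLike {
  isDestroyed(): boolean;
  send(channel: string, payload: unknown): void;
}

interface WindowLike {
  isDestroyed(): boolean;
  readonly webContents: WebContentsLike;
}

function isDestroyedObjectError(err: unknown): boolean {
  return err instanceof Error && err.message.includes("Object has been destroyed");
}

function sendToLiveRenderer(
  win: WindowLike | null,
  channel: string,
  payload: unknown,
): void {
  if (!win || win.isDestroyed()) return;
  try {
    // Reading webContents sits inside the try because the getter itself
    // can throw once the window is gone.
    const { webContents } = win;
    if (webContents.isDestroyed()) return;
    webContents.send(channel, payload);
  } catch (err) {
    if (isDestroyedObjectError(err)) return;
    throw err;
  }
}

// Simulates the race: isDestroyed() still reports false, but the getter throws.
const racingWindow: WindowLike = {
  isDestroyed: () => false,
  get webContents(): WebContentsLike {
    throw new TypeError("Object has been destroyed");
  },
};

// Swallowed rather than crashing the updater event handler.
sendToLiveRenderer(racingWindow, "updater:download-progress", { percent: 42 });
```

Matching on the error message is a deliberate trade-off: it is narrow enough that unrelated failures still surface, which is what the "rethrows non-destroy errors" test asserts.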
--- a/apps/desktop/src/preload/index.d.ts
+++ b/apps/desktop/src/preload/index.d.ts
@@ -1,6 +1,8 @@
 import { ElectronAPI } from "@electron-toolkit/preload";
 import type { RuntimeConfigResult } from "../shared/runtime-config";
 import type { NavigationGesture } from "../shared/navigation-gestures";
+import type { RendererRouteContextInput } from "../shared/renderer-route-context";
+import type { FreezeBreadcrumb } from "../shared/freeze-breadcrumb";

 interface DesktopAPI {
  /** App version + normalized OS, captured synchronously at preload time. */
@@ -14,6 +16,9 @@ interface DesktopAPI {
  onSystemLocaleChanged: (callback: (locale: string) => void) => () => void;
  /** Validated runtime endpoint config, or a blocking config error. */
  runtimeConfig: RuntimeConfigResult;
+  /** Read + clear any freeze/crash breadcrumb from a previous session, so the
+   *  renderer can flush it to telemetry on boot. Null when nothing's pending. */
+  getLastFreeze: () => FreezeBreadcrumb | null;
  /** Listen for auth token delivered via deep link. Returns an unsubscribe function. */
  onAuthToken: (callback: (token: string) => void) => () => void;
  /** Listen for invitation IDs delivered via deep link. Returns an unsubscribe function. */
@@ -45,6 +50,8 @@ interface DesktopAPI {
  ) => () => void;
  /** Listen for native macOS back/forward swipe gestures. Returns an unsubscribe function. */
  onNavigationGesture: (callback: (gesture: NavigationGesture) => void) => () => void;
+  /** Report the renderer's memory-router path for recovery diagnostics. */
+  setRendererRouteContext: (context: RendererRouteContextInput) => void;
  /** Open the OS folder picker and return the chosen absolute path.
   *  Used by the Project settings "Add local directory" flow. */
  pickDirectory: (
@@ -71,10 +78,22 @@ interface DesktopAPI {
      | "error";
    error?: string;
  }>;
+  /** Listen for Cmd/Ctrl+W tab-close requests from the main process.
+   *  Returns an unsubscribe function. */
+  onCloseActiveTab: (callback: () => void) => () => void;
+  /** Ask the main process to close the window. */
+  closeWindow: () => void;
 }

 interface DaemonStatus {
-  state: "running" | "stopped" | "starting" | "stopping" | "installing_cli" | "cli_not_found";
+  state:
+    | "running"
+    | "stopped"
+    | "starting"
+    | "stopping"
+    | "installing_cli"
+    | "cli_not_found"
+    | "auth_expired";
  pid?: number;
  uptime?: string;
  daemonId?: string;
@@ -90,6 +109,11 @@ interface DaemonPrefs {
  autoStop: boolean;
 }

+type DaemonReauthResult =
+  | { ok: true }
+  | { ok: false; reason: "session_invalid" }
+  | { ok: false; reason: "transient"; message: string };
+
 interface DaemonAPI {
  start: () => Promise<{ success: boolean; error?: string }>;
  stop: () => Promise<{ success: boolean; error?: string }>;
@@ -100,6 +124,10 @@ interface DaemonAPI {
  setTargetApiUrl: (url: string) => Promise<void>;
  syncToken: (token: string, userId: string) => Promise<void>;
  clearToken: () => Promise<void>;
+  reauthenticate: (
+    token: string,
+    userId: string,
+  ) => Promise<DaemonReauthResult>;
  isCliInstalled: () => Promise<boolean>;
  getPrefs: () => Promise<DaemonPrefs>;
  setPrefs: (prefs: Partial<DaemonPrefs>) => Promise<DaemonPrefs>;
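Reviewer note: `DaemonReauthResult` is a discriminated union, so callers can branch on `ok` and then on `reason` without casts. A small sketch of such a caller follows. The helper name `nextStepAfterReauth` and the returned step labels are hypothetical and only illustrate the narrowing; the actual renderer handling is not shown in this part of the diff.

```typescript
// Mirrors the union declared in preload/index.d.ts.
type DaemonReauthResult =
  | { ok: true }
  | { ok: false; reason: "session_invalid" }
  | { ok: false; reason: "transient"; message: string };

// Hypothetical helper: maps a reauthenticate() result to a next step label.
function nextStepAfterReauth(result: DaemonReauthResult): string {
  if (result.ok) return "resume";
  switch (result.reason) {
    case "session_invalid":
      // Assumption: the stored session is unusable, so retrying will not help.
      return "sign-in-again";
    case "transient":
      // `message` is only available on this branch of the union.
      return `retry-later: ${result.message}`;
  }
}
```

Keeping `message` on the `transient` variant only means a caller cannot accidentally display an error string for an invalid session.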
--- a/apps/desktop/src/preload/index.ts
+++ b/apps/desktop/src/preload/index.ts
@@ -1,6 +1,11 @@
 import { contextBridge, ipcRenderer } from "electron";
 import { electronAPI } from "@electron-toolkit/preload";
 import type { RuntimeConfigResult } from "../shared/runtime-config";
+import type { FreezeBreadcrumb } from "../shared/freeze-breadcrumb";
+import {
+  RENDERER_ROUTE_CONTEXT_CHANNEL,
+  type RendererRouteContextInput,
+} from "../shared/renderer-route-context";
 import {
  isNavigationGesture,
  NAVIGATION_GESTURE_CHANNEL,
@@ -74,6 +79,16 @@ const desktopAPI = {
  },
  /** Validated runtime endpoint config, or a blocking config error. */
  runtimeConfig,
+  /** Read + clear any freeze/crash breadcrumb left by a previous session, so
+   *  the renderer can flush it to telemetry on boot. Returns null when there's
+   *  nothing pending (the normal case). */
+  getLastFreeze: (): FreezeBreadcrumb | null => {
+    try {
+      return ipcRenderer.sendSync("freeze:get-last") as FreezeBreadcrumb | null;
+    } catch {
+      return null;
+    }
+  },
  /** Listen for auth token delivered via deep link */
  onAuthToken: (callback: (token: string) => void) => {
    const handler = (_event: Electron.IpcRendererEvent, token: string) =>
@@ -156,16 +171,38 @@ const desktopAPI = {
      ipcRenderer.removeListener(NAVIGATION_GESTURE_CHANNEL, handler);
    };
  },
+  /** Report the renderer's memory-router path for recovery diagnostics. */
+  setRendererRouteContext: (context: RendererRouteContextInput) =>
+    ipcRenderer.send(RENDERER_ROUTE_CONTEXT_CHANNEL, context),
  /** Open the OS folder picker and return the chosen absolute path. */
  pickDirectory: (defaultPath?: string) =>
    ipcRenderer.invoke("local-directory:pick", defaultPath),
  /** Validate that a path is an existing readable+writable directory. */
  validateLocalDirectory: (path: string) =>
    ipcRenderer.invoke("local-directory:validate", path),
+  /** Listen for Cmd/Ctrl+W tab-close requests from the main process.
+   *  The renderer should close the active tab; if it was the last tab,
+   *  call `closeWindow()` to dismiss the window. Returns an unsubscribe fn. */
+  onCloseActiveTab: (callback: () => void) => {
+    const handler = () => callback();
+    ipcRenderer.on("tab:close-active", handler);
+    return () => {
+      ipcRenderer.removeListener("tab:close-active", handler);
+    };
+  },
+  /** Ask the main process to close the window (used after closing the last tab). */
+  closeWindow: () => ipcRenderer.send("window:close"),
 };

 interface DaemonStatus {
-  state: "running" | "stopped" | "starting" | "stopping" | "installing_cli" | "cli_not_found";
+  state:
+    | "running"
+    | "stopped"
+    | "starting"
+    | "stopping"
+    | "installing_cli"
+    | "cli_not_found"
+    | "auth_expired";
  pid?: number;
  uptime?: string;
  daemonId?: string;
@@ -176,6 +213,11 @@ interface DaemonStatus {
  serverUrl?: string;
 }

+type DaemonReauthResult =
+  | { ok: true }
+  | { ok: false; reason: "session_invalid" }
+  | { ok: false; reason: "transient"; message: string };
+
 const daemonAPI = {
  start: (): Promise<{ success: boolean; error?: string }> =>
    ipcRenderer.invoke("daemon:start"),
@@ -198,6 +240,11 @@ const daemonAPI = {
    ipcRenderer.invoke("daemon:sync-token", token, userId),
  clearToken: (): Promise<void> =>
    ipcRenderer.invoke("daemon:clear-token"),
+  reauthenticate: (
+    token: string,
+    userId: string,
+  ): Promise<DaemonReauthResult> =>
+    ipcRenderer.invoke("daemon:reauthenticate", token, userId),
  isCliInstalled: (): Promise<boolean> =>
    ipcRenderer.invoke("daemon:is-cli-installed"),
  getPrefs: (): Promise<{ autoStart: boolean; autoStop: boolean }> =>
--- a/apps/desktop/src/renderer/src/App.tsx
+++ b/apps/desktop/src/renderer/src/App.tsx
@@ -19,6 +19,7 @@ import { useTabStore } from "./stores/tab-store";
 import { useWindowOverlayStore } from "./stores/window-overlay-store";
 import { useDaemonIPCBridge } from "./platform/daemon-ipc-bridge";
 import { createDesktopLocaleAdapter } from "./platform/i18n-adapter";
+import { captureEvent } from "@multica/core/analytics";
 import { RESOURCES } from "@multica/views/locales";

 // BCP-47 region tags for the <html lang> attribute, mirroring
@@ -34,10 +35,42 @@ const HTML_LANG: Record<SupportedLocale, string> = {
 };


+/**
+ * Cmd/Ctrl+W: close the active tab. When the last real tab is closed
+ * (or no tabs/workspace exist — e.g. login page), close the window.
+ *
+ * Mounted at the App root so every renderer state — including login,
+ * loading, onboarding, and runtime-config errors — has a working Cmd+W
+ * handler. Without this, states outside the tab shell would swallow the
+ * shortcut and do nothing.
+ */
+function useCmdWCloseTab() {
+  useEffect(() => {
+    return window.desktopAPI.onCloseActiveTab(() => {
+      const store = useTabStore.getState();
+      const { activeWorkspaceSlug, byWorkspace } = store;
+      if (!activeWorkspaceSlug) {
+        // No workspace — nothing to close, dismiss the window.
+        window.desktopAPI.closeWindow();
+        return;
+      }
+      const group = byWorkspace[activeWorkspaceSlug];
+      if (!group || group.tabs.length <= 1) {
+        // Last tab (or no tabs) — close the window.
+        window.desktopAPI.closeWindow();
+        return;
+      }
+      // Multiple tabs — close the active one.
+      store.closeActiveTab();
+    });
+  }, []);
+}
+
 function AppContent() {
  const user = useAuthStore((s) => s.user);
  const isLoading = useAuthStore((s) => s.isLoading);
  const qc = useQueryClient();
+
  // Deep-link login runs loginWithToken → syncToken → listWorkspaces →
  // setQueryData sequentially. loginWithToken sets user+isLoading=false
  // as soon as getMe resolves, which would cause DesktopShell to mount
@@ -298,6 +331,28 @@ export default function App() {
  const { version, os } = window.desktopAPI.appInfo;
  const systemLocale = window.desktopAPI.systemLocale;
  const runtimeConfigResult = window.desktopAPI.runtimeConfig;
+  useCmdWCloseTab();
+
+  // Flush a freeze/crash breadcrumb the main process parked from a previous
+  // session. A true hang or process death can't report itself when it happens
+  // (the renderer is blocked or gone), so the main process persists it and we
+  // emit it here on the next boot. The in-thread, recoverable freeze tier is
+  // handled separately by the shared watchdog in CoreProvider.
+  useEffect(() => {
+    const last = window.desktopAPI.getLastFreeze();
+    if (!last) return;
+    const crashed = last.kind === "render-process-gone";
+    captureEvent(crashed ? "client_crash" : "client_unresponsive", {
+      // Spread context FIRST so our explicit fields below always win — a
+      // future context key (e.g. its own `source`) must not silently override.
+      ...last.context,
+      source: crashed ? "render-process-gone" : "main-unresponsive",
+      recovered: false,
+      breadcrumb_ts: last.ts,
+      crashed_version: last.version,
+    });
+  }, []);
+
  // Stable identity reference so downstream effects (WS reconnect) don't
  // tear down on every parent render.
  const identity = useMemo(
--- a/apps/desktop/src/renderer/src/components/daemon-panel.tsx
+++ b/apps/desktop/src/renderer/src/components/daemon-panel.tsx
@@ -16,6 +16,7 @@ import {
  X,
 } from "lucide-react";
 import { cn } from "@multica/ui/lib/utils";
+import { copyText } from "@multica/ui/lib/clipboard";
 import { Button } from "@multica/ui/components/ui/button";
 import {
  Dialog,
@@ -194,15 +195,12 @@ export function DaemonPanel({

  const handleCopy = useCallback(async () => {
    const text = filtered.map((l) => l.raw).join("\n");
-    try {
-      await navigator.clipboard.writeText(text);
+    if (await copyText(text)) {
      toast.success(
        `Copied ${filtered.length} line${filtered.length === 1 ? "" : "s"}`,
      );
-    } catch (err) {
-      toast.error("Failed to copy", {
-        description: err instanceof Error ? err.message : String(err),
-      });
+    } else {
+      toast.error("Failed to copy");
    }
  }, [filtered]);

--- a/apps/desktop/src/renderer/src/components/daemon-runtime-card.test.tsx
+++ b/apps/desktop/src/renderer/src/components/daemon-runtime-card.test.tsx
@@ -0,0 +1,66 @@
+import { describe, expect, it, vi } from "vitest";
+import { render, screen } from "@testing-library/react";
+
+import type { DaemonStatus } from "../../../shared/daemon-types";
+
+// The component only needs these to render; stub them so the test focuses on
+// the externally-managed branching, not data fetching.
+vi.mock("@tanstack/react-query", () => ({
+  useQuery: () => ({ data: [] }),
+}));
+vi.mock("@multica/core/hooks", () => ({
+  useWorkspaceId: () => "ws-1",
+}));
+vi.mock("@multica/core/runtimes", () => ({
+  runtimeListOptions: () => ({ queryKey: ["runtimes"] }),
+}));
+vi.mock("@multica/core/agents", () => ({
+  agentTaskSnapshotOptions: () => ({ queryKey: ["snapshot"] }),
+}));
+vi.mock("./daemon-panel", () => ({ DaemonPanel: () => null }));
+vi.mock("../platform/daemon-reauth", () => ({
+  reauthenticateDaemon: vi.fn(),
+}));
+vi.mock("sonner", () => ({
+  toast: { error: vi.fn(), success: vi.fn() },
+}));
+
+import { DaemonRuntimeActions } from "./daemon-runtime-card";
+
+function stubDaemonAPI(status: DaemonStatus) {
+  Object.defineProperty(window, "daemonAPI", {
+    configurable: true,
+    value: {
+      getStatus: vi.fn().mockResolvedValue(status),
+      onStatusChange: vi.fn(() => () => {}),
+    },
+  });
+}
+
+describe("DaemonRuntimeActions — externally managed daemon (#3916)", () => {
+  it("hides Stop/Restart and shows the managed-outside hint for a daemon the app can't control", async () => {
+    stubDaemonAPI({ state: "running", daemonId: "d1", externallyManaged: true });
+    render(<DaemonRuntimeActions />);
+
+    // View logs still renders, confirming the running branch mounted.
+    expect(await screen.findByText("View logs")).toBeInTheDocument();
+    expect(screen.getByText("Managed outside the app")).toBeInTheDocument();
+    expect(screen.queryByText("Restart")).not.toBeInTheDocument();
+    expect(screen.queryByText("Stop")).not.toBeInTheDocument();
+  });
+
+  it("shows Stop/Restart for a normally-managed running daemon (no collateral damage)", async () => {
+    stubDaemonAPI({
+      state: "running",
+      daemonId: "d1",
+      externallyManaged: false,
+    });
+    render(<DaemonRuntimeActions />);
+
+    expect(await screen.findByText("Restart")).toBeInTheDocument();
+    expect(screen.getByText("Stop")).toBeInTheDocument();
+    expect(
+      screen.queryByText("Managed outside the app"),
+    ).not.toBeInTheDocument();
+  });
+});
--- a/apps/desktop/src/renderer/src/components/daemon-runtime-card.tsx
+++ b/apps/desktop/src/renderer/src/components/daemon-runtime-card.tsx
@@ -6,6 +6,8 @@ import {
  RotateCw,
  Activity,
  ScrollText,
+  LogIn,
+  Info,
 } from "lucide-react";
 import { useQuery } from "@tanstack/react-query";
 import { useWorkspaceId } from "@multica/core/hooks";
@@ -22,6 +24,7 @@ import {
 } from "@multica/ui/components/ui/dialog";
 import { toast } from "sonner";
 import { DaemonPanel } from "./daemon-panel";
+import { reauthenticateDaemon } from "../platform/daemon-reauth";
 import type { DaemonStatus } from "../../../shared/daemon-types";
 import { DAEMON_STATE_LABELS } from "../../../shared/daemon-types";

@@ -115,9 +118,24 @@ export function DaemonRuntimeActions() {
    }
  }, []);

+  const handleReauth = useCallback(async () => {
+    setActionLoading(true);
+    await reauthenticateDaemon();
+    // onStatusChange resets actionLoading on the next status push; reset here
+    // too in case reauth logged out (unmount) or produced no status change.
+    setActionLoading(false);
+  }, []);
+
  const isRunning = status.state === "running";
+  // The daemon runs somewhere the app can't drive (e.g. inside WSL2): the
+  // lifecycle CLI acts on the host process namespace and can't reach it. Hide
+  // Stop/Restart so they don't silently no-op, mirroring the Settings tab. The
+  // real guard is in the main process (stopDaemon/restartDaemon); this is the
+  // matching UX. See #3916.
+  const externallyManaged = status.externallyManaged === true;
  const isStopped = status.state === "stopped";
  const isCliMissing = status.state === "cli_not_found";
+  const isAuthExpired = status.state === "auth_expired";
  const isTransitioning =
    status.state === "starting" || status.state === "stopping";
  const isInstalling = status.state === "installing_cli";
@@ -131,24 +149,33 @@ export function DaemonRuntimeActions() {
              <ScrollText className="size-3.5 mr-1.5" />
              View logs
            </Button>
-            <Button
-              size="sm"
-              variant="outline"
-              onClick={handleRestart}
-              disabled={actionLoading}
-            >
-              <RotateCw className="size-3.5 mr-1.5" />
-              Restart
-            </Button>
-            <Button
-              size="sm"
-              variant="destructive"
-              onClick={handleStopClick}
-              disabled={actionLoading}
-            >
-              <Square className="size-3.5 mr-1.5" />
-              Stop
-            </Button>
+            {externallyManaged ? (
+              <span className="inline-flex items-center gap-1.5 text-xs text-muted-foreground">
+                <Info className="size-3.5 shrink-0" />
+                Managed outside the app
+              </span>
+            ) : (
+              <>
+                <Button
+                  size="sm"
+                  variant="outline"
+                  onClick={handleRestart}
+                  disabled={actionLoading}
+                >
+                  <RotateCw className="size-3.5 mr-1.5" />
+                  Restart
+                </Button>
+                <Button
+                  size="sm"
+                  variant="destructive"
+                  onClick={handleStopClick}
+                  disabled={actionLoading}
+                >
+                  <Square className="size-3.5 mr-1.5" />
+                  Stop
+                </Button>
+              </>
+            )}
          </>
        )}

@@ -175,6 +202,23 @@ export function DaemonRuntimeActions() {
          </Button>
        )}

+        {isAuthExpired && (
+          <>
+            <span className="inline-flex items-center gap-1.5 text-xs text-destructive">
+              <AlertCircle className="size-3.5 shrink-0" />
+              Sign-in expired
+            </span>
+            <Button size="sm" onClick={handleReauth} disabled={actionLoading}>
+              {actionLoading ? (
+                <Activity className="size-3.5 mr-1.5 animate-pulse" />
+              ) : (
+                <LogIn className="size-3.5 mr-1.5" />
+              )}
+              Sign in again
+            </Button>
+          </>
+        )}
+
        {(isTransitioning || isInstalling) && (
          <Button size="sm" variant="outline" disabled>
            <Activity className="size-3.5 mr-1.5 animate-pulse" />
--- a/apps/desktop/src/renderer/src/components/daemon-settings-tab.tsx
+++ b/apps/desktop/src/renderer/src/components/daemon-settings-tab.tsx
@@ -1,7 +1,9 @@
 import { useState, useEffect, useCallback, type ReactNode } from "react";
+import { AlertCircle, Info, LogIn } from "lucide-react";
 import { Button } from "@multica/ui/components/ui/button";
 import { Switch } from "@multica/ui/components/ui/switch";
 import { cn } from "@multica/ui/lib/utils";
+import { reauthenticateDaemon } from "../platform/daemon-reauth";
 import type { DaemonPrefs, DaemonStatus } from "../../../shared/daemon-types";
 import {
  DAEMON_STATE_COLORS,
@@ -61,6 +63,7 @@ export function DaemonSettingsTab() {
  const [cliInstalled, setCliInstalled] = useState<boolean | null>(null);
  const [saving, setSaving] = useState(false);
  const [status, setStatus] = useState<DaemonStatus>({ state: "stopped" });
+  const [reauthLoading, setReauthLoading] = useState(false);

  useEffect(() => {
    window.daemonAPI.getPrefs().then(setPrefs);
@@ -69,6 +72,12 @@ export function DaemonSettingsTab() {
    return window.daemonAPI.onStatusChange(setStatus);
  }, []);

+  const handleReauth = useCallback(async () => {
+    setReauthLoading(true);
+    await reauthenticateDaemon();
+    setReauthLoading(false);
+  }, []);
+
  const updatePref = useCallback(
    async (key: keyof DaemonPrefs, value: boolean) => {
      setSaving(true);
@@ -79,6 +88,12 @@ export function DaemonSettingsTab() {
    [],
  );

+  // The daemon runs somewhere the app can't drive (e.g. inside WSL2 behind a
+  // Windows desktop): /health is reachable but the lifecycle CLI can't reach
+  // its process. Auto-start/auto-stop can't work, so disable them and say why
+  // rather than letting the toggles silently no-op. See #3916.
+  const externallyManaged = status.externallyManaged === true;
+
  return (
    <div>
      <h2 className="text-lg font-semibold">Daemon</h2>
@@ -86,6 +101,43 @@ export function DaemonSettingsTab() {
        Configure how the local agent daemon behaves with the desktop app.
      </p>

+      {status.state === "auth_expired" && (
+        <div className="mt-4 flex items-start gap-3 rounded-lg border border-destructive/40 bg-destructive/5 px-4 py-3">
+          <AlertCircle className="mt-0.5 size-4 shrink-0 text-destructive" />
+          <div className="min-w-0 flex-1">
+            <p className="text-sm font-medium text-destructive">
+              Sign-in expired
+            </p>
+            <p className="mt-0.5 text-sm text-muted-foreground">
+              The local daemon couldn&apos;t authenticate, so this device
+              can&apos;t take tasks. Sign in again to restore it.
+            </p>
+          </div>
+          <Button
+            size="sm"
+            className="shrink-0"
+            onClick={handleReauth}
+            disabled={reauthLoading}
+          >
+            <LogIn className="size-3.5 mr-1.5" />
+            Sign in again
+          </Button>
+        </div>
+      )}
+
+      {externallyManaged && (
+        <div className="mt-4 flex items-start gap-3 rounded-lg border bg-muted/30 px-4 py-3">
+          <Info className="mt-0.5 size-4 shrink-0 text-muted-foreground" />
+          <p className="min-w-0 text-sm text-muted-foreground">
+            This device&apos;s daemon runs outside the app — for example inside
+            WSL2 — so the app can&apos;t start or stop it. Start or stop it from
+            that environment with{" "}
+            <code className="font-mono text-xs">multica daemon start</code> /{" "}
+            <code className="font-mono text-xs">multica daemon stop</code>.
+          </p>
+        </div>
+      )}
+
      <div className="mt-6 divide-y">
        <SettingRow
          label="Auto-start on launch"
@@ -94,7 +146,7 @@ export function DaemonSettingsTab() {
          <Switch
            checked={prefs.autoStart}
            onCheckedChange={(checked) => updatePref("autoStart", checked)}
-            disabled={saving}
+            disabled={saving || externallyManaged}
          />
        </SettingRow>

@@ -105,7 +157,7 @@ export function DaemonSettingsTab() {
          <Switch
            checked={prefs.autoStop}
            onCheckedChange={(checked) => updatePref("autoStop", checked)}
-            disabled={saving}
+            disabled={saving || externallyManaged}
          />
        </SettingRow>

--- a/apps/desktop/src/renderer/src/components/desktop-layout.tsx
+++ b/apps/desktop/src/renderer/src/components/desktop-layout.tsx
@@ -1,4 +1,4 @@
-import { useEffect, useSyncExternalStore } from "react";
+import { useEffect, useRef, useSyncExternalStore } from "react";
 import { ChevronLeft, ChevronRight } from "lucide-react";
 import { cn } from "@multica/ui/lib/utils";
 import { useTabHistory } from "@/hooks/use-tab-history";
@@ -14,6 +14,7 @@ import { AppSidebar } from "@multica/views/layout";
 import { SearchCommand, SearchTrigger } from "@multica/views/search";
 import { ChatFab, ChatWindow } from "@multica/views/chat";
 import { WorkspaceSlugProvider, paths, useCurrentWorkspace } from "@multica/core/paths";
+import { useNavigation } from "@multica/views/navigation";
 import { getCurrentSlug, subscribeToCurrentSlug } from "@multica/core/platform";
 import { useDesktopUnreadBadge } from "@multica/views/platform";
 import { DesktopNavigationProvider } from "@/platform/navigation";
@@ -127,18 +128,30 @@ function useInternalLinkHandler() {
 *      inbox even if the user has since switched to workspace B. Marking
 *      the row read is handled by InboxPage's selected-item effect, which
 *      covers both click-to-select and URL-param-select paths.
+ *
+ * The click routes through `useNavigation().push` — NOT the
+ * `multica:navigate` event, whose handler `openTab`s into the ACTIVE
+ * workspace's tab group. The navigation adapter detects a cross-workspace
+ * path and translates it into `switchWorkspace(slug, path)`, so clicking a
+ * workspace-A notification while B is active performs a real workspace
+ * switch instead of mounting A's inbox inside B's tab group (#3766).
 */
 function DesktopInboxBridge() {
  const workspace = useCurrentWorkspace();
  useDesktopUnreadBadge(workspace?.id ?? null);
+  const { push } = useNavigation();
+  // The adapter identity changes with the active tab's location; the ref
+  // keeps the main-process subscription stable across navigations.
+  const pushRef = useRef(push);
+  useEffect(() => {
+    pushRef.current = push;
+  }, [push]);

  useEffect(() => {
    return window.desktopAPI.onInboxOpen(({ slug, issueKey }) => {
      if (!slug) return;
      const inboxPath = `${paths.workspace(slug).inbox()}?issue=${encodeURIComponent(issueKey)}`;
-      window.dispatchEvent(
-        new CustomEvent("multica:navigate", { detail: { path: inboxPath } }),
-      );
+      pushRef.current(inboxPath);
    });
  }, []);

--- a/apps/desktop/src/renderer/src/components/pageview-tracker.tsx
+++ b/apps/desktop/src/renderer/src/components/pageview-tracker.tsx
@@ -7,6 +7,7 @@ import {
  useTabStore,
 } from "@/stores/tab-store";
 import { useWindowOverlayStore, type WindowOverlay } from "@/stores/window-overlay-store";
+import type { RendererRouteContextInput } from "../../../shared/renderer-route-context";

 /**
 * Fires a PostHog $pageview whenever the user's visible surface changes,
@@ -90,6 +91,16 @@ export function PageviewTracker() {
    const last = lastSurfaceRef.current;
    const next = { kind, key, path };

+    const routeContext: RendererRouteContextInput = {
+      surface: kind,
+      path,
+    };
+    if (kind === "tab") {
+      routeContext.workspaceSlug = activeWorkspaceSlug ?? undefined;
+      routeContext.tabId = activeTabId ?? undefined;
+    }
+    reportRendererRouteContext(routeContext);
+
    if (kind === "tab" && key !== null) {
      const knownPath = observed.get(key);
      const isReactivation =
@@ -112,6 +123,13 @@ export function PageviewTracker() {
  return null;
 }

+function reportRendererRouteContext(context: RendererRouteContextInput) {
+  const desktopAPI = window.desktopAPI as
+    | { setRendererRouteContext?: (context: RendererRouteContextInput) => void }
+    | undefined;
+  desktopAPI?.setRendererRouteContext?.(context);
+}
+
 function overlayPath(overlay: WindowOverlay): string {
  switch (overlay.type) {
    case "new-workspace":
--- a/apps/desktop/src/renderer/src/platform/daemon-ipc-bridge.ts
+++ b/apps/desktop/src/renderer/src/platform/daemon-ipc-bridge.ts
@@ -11,7 +11,14 @@ import type { AgentRuntime } from "@multica/core/types";
 * to the desktop preload typings (which live in apps/desktop/src/preload).
 */
 interface DaemonStatusLike {
-  state: "running" | "stopped" | "starting" | "stopping" | "installing_cli" | "cli_not_found";
+  state:
+    | "running"
+    | "stopped"
+    | "starting"
+    | "stopping"
+    | "installing_cli"
+    | "cli_not_found"
+    | "auth_expired";
  daemonId?: string;
 }

@@ -25,7 +32,11 @@ interface DaemonStatusLike {
 * within 75s.
 */
 function mergeDaemonStatus(rt: AgentRuntime, status: DaemonStatusLike): AgentRuntime {
-  if (status.state === "stopped" || status.state === "stopping") {
+  if (
+    status.state === "stopped" ||
+    status.state === "stopping" ||
+    status.state === "auth_expired"
+  ) {
    return { ...rt, status: "offline" };
  }
  if (status.state === "running") {
--- a/apps/desktop/src/renderer/src/platform/daemon-reauth.test.ts
+++ b/apps/desktop/src/renderer/src/platform/daemon-reauth.test.ts
@@ -0,0 +1,98 @@
+import { beforeEach, describe, expect, it, vi } from "vitest";
+
+const { mockGetState, logout } = vi.hoisted(() => ({
+  mockGetState: vi.fn(),
+  logout: vi.fn(),
+}));
+
+const { toastError } = vi.hoisted(() => ({ toastError: vi.fn() }));
+
+vi.mock("@multica/core/auth", () => ({
+  useAuthStore: { getState: mockGetState },
+}));
+
+vi.mock("sonner", () => ({
+  toast: { error: toastError },
+}));
+
+import { reauthenticateDaemon } from "./daemon-reauth";
+
+const daemonAPI = {
+  reauthenticate: vi.fn(),
+};
+
+beforeEach(() => {
+  vi.clearAllMocks();
+  localStorage.clear();
+  (window as unknown as { daemonAPI: typeof daemonAPI }).daemonAPI = daemonAPI;
+  mockGetState.mockReturnValue({ user: { id: "user-1" }, logout });
+});
+
+describe("reauthenticateDaemon", () => {
+  it("re-mints + restarts the daemon when signed in, without logging out", async () => {
+    localStorage.setItem("multica_token", "jwt-abc");
+    daemonAPI.reauthenticate.mockResolvedValue({ ok: true });
+
+    await reauthenticateDaemon();
+
+    expect(daemonAPI.reauthenticate).toHaveBeenCalledWith("jwt-abc", "user-1");
+    expect(logout).not.toHaveBeenCalled();
+    expect(toastError).not.toHaveBeenCalled();
+  });
+
+  it("logs out only when the session token itself is rejected (401)", async () => {
+    localStorage.setItem("multica_token", "jwt-abc");
+    daemonAPI.reauthenticate.mockResolvedValue({
+      ok: false,
+      reason: "session_invalid",
+    });
+
+    await reauthenticateDaemon();
+
+    expect(logout).toHaveBeenCalledOnce();
+    expect(toastError).not.toHaveBeenCalled();
+  });
+
+  // The reviewer's must-fix: a non-401 (transient) failure must NOT log the
+  // user out — they stay signed in and can retry.
+  it("does NOT log out on a transient failure; shows a retryable toast", async () => {
+    localStorage.setItem("multica_token", "jwt-abc");
+    daemonAPI.reauthenticate.mockResolvedValue({
+      ok: false,
+      reason: "transient",
+      message: "mint PAT failed: 503 Service Unavailable",
+    });
+
+    await reauthenticateDaemon();
+
+    expect(logout).not.toHaveBeenCalled();
+    expect(toastError).toHaveBeenCalledOnce();
+  });
+
+  it("does NOT log out when the IPC call itself throws unexpectedly", async () => {
+    localStorage.setItem("multica_token", "jwt-abc");
+    daemonAPI.reauthenticate.mockRejectedValue(new Error("ipc boom"));
+
+    await reauthenticateDaemon();
+
+    expect(logout).not.toHaveBeenCalled();
+    expect(toastError).toHaveBeenCalledOnce();
+  });
+
+  it("routes to login when there is no session token", async () => {
+    await reauthenticateDaemon();
+
+    expect(logout).toHaveBeenCalledOnce();
+    expect(daemonAPI.reauthenticate).not.toHaveBeenCalled();
+  });
+
+  it("routes to login when there is no signed-in user", async () => {
+    localStorage.setItem("multica_token", "jwt-abc");
+    mockGetState.mockReturnValue({ user: null, logout });
+
+    await reauthenticateDaemon();
+
+    expect(logout).toHaveBeenCalledOnce();
+    expect(daemonAPI.reauthenticate).not.toHaveBeenCalled();
+  });
+});
--- a/apps/desktop/src/renderer/src/platform/daemon-reauth.ts
+++ b/apps/desktop/src/renderer/src/platform/daemon-reauth.ts
@@ -0,0 +1,48 @@
+import { useAuthStore } from "@multica/core/auth";
+import { toast } from "sonner";
+
+/**
+ * Re-establish the local daemon's credentials after it failed to authenticate
+ * (daemon state "auth_expired", surfaced by daemon-manager's token probe — see
+ * #3512).
+ *
+ * The desktop owns the daemon's PAT: it mints one from the user's session token
+ * and caches it per profile. A stale/revoked cached PAT is the common cause (and
+ * merely restarting the app reuses the same bad PAT), so the main process drops
+ * the cached token, mints a fresh one, and restarts the daemon.
+ *
+ * Failure handling is deliberately conservative — we only force a full re-login
+ * when the session token itself is rejected (a real 401). A transient failure
+ * (mint 5xx, network blip, config write error, restart hiccup) keeps the user
+ * signed in and shows a retryable toast, so a momentary glitch never logs them
+ * out. The 401-vs-transient classification happens in the main process where the
+ * real HTTP status is available; here we just act on the verdict.
+ */
+export async function reauthenticateDaemon(): Promise<void> {
+  const user = useAuthStore.getState().user;
+  const token = localStorage.getItem("multica_token");
+  if (!user || !token) {
+    // No usable session at all — the standard recovery is the login page.
+    useAuthStore.getState().logout();
+    return;
+  }
+
+  try {
+    const result = await window.daemonAPI.reauthenticate(token, user.id);
+    if (result.ok) return; // daemon restarting; status flips via onStatusChange
+    if (result.reason === "session_invalid") {
+      // The session token itself is rejected (401) — full re-login.
+      useAuthStore.getState().logout();
+      return;
+    }
+    // Transient failure — keep the user signed in and let them retry.
+    toast.error("Couldn't reconnect the daemon", {
+      description: result.message || "Please try again in a moment.",
+    });
+  } catch (err) {
+    // An unexpected IPC error is not an auth failure — never log out on it.
+    toast.error("Couldn't reconnect the daemon", {
+      description: err instanceof Error ? err.message : "Please try again.",
+    });
+  }
+}
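The doc comment in daemon-reauth.ts says the 401-vs-transient classification happens in the main process, which this part of the diff does not show. A minimal sketch of what such a classifier could look like, assuming the main process has the HTTP status of the failed mint call at hand. `classifyReauthFailure` and its parameters are hypothetical names for illustration; only the `DaemonReauthResult` shape is taken from the preload typings above.

```typescript
// Result shape copied from the preload typings in this diff.
type DaemonReauthResult =
  | { ok: true }
  | { ok: false; reason: "session_invalid" }
  | { ok: false; reason: "transient"; message: string };

/**
 * Hypothetical helper (not part of the diff): map a failed re-mint attempt to
 * a verdict. `httpStatus` is undefined when the request never got a response
 * (network error, config write failure, restart hiccup).
 */
function classifyReauthFailure(
  httpStatus: number | undefined,
  detail: string,
): DaemonReauthResult {
  // Only a real 401 means the session token itself was rejected.
  if (httpStatus === 401) {
    return { ok: false, reason: "session_invalid" };
  }
  // Anything else is treated as retryable so a momentary glitch never
  // forces a logout.
  return { ok: false, reason: "transient", message: detail };
}
```

Under this assumption the renderer only has to act on the verdict, which matches the branching in `reauthenticateDaemon`: logout on `session_invalid`, a retryable toast on `transient`.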
--- a/apps/desktop/src/shared/daemon-types.test.ts
+++ b/apps/desktop/src/shared/daemon-types.test.ts
@@ -0,0 +1,22 @@
+import { describe, it, expect } from "vitest";
+import { daemonStatusAlive } from "./daemon-types";
+
+describe("daemonStatusAlive", () => {
+  it("treats a ready daemon as alive", () => {
+    expect(daemonStatusAlive("running")).toBe(true);
+  });
+
+  it("treats a still-booting daemon as alive", () => {
+    // /health binds before preflight and reports "starting" until ready; the
+    // Desktop must not spawn a second daemon over it (the CLI rejects that as
+    // "already running").
+    expect(daemonStatusAlive("starting")).toBe(true);
+  });
+
+  it("treats stopped / unknown / missing as not alive", () => {
+    expect(daemonStatusAlive("stopped")).toBe(false);
+    expect(daemonStatusAlive("bogus")).toBe(false);
+    expect(daemonStatusAlive("")).toBe(false);
+    expect(daemonStatusAlive(undefined)).toBe(false);
+  });
+});
--- a/apps/desktop/src/shared/daemon-types.ts
+++ b/apps/desktop/src/shared/daemon-types.ts
@@ -4,7 +4,11 @@ export type DaemonState =
  | "starting"
  | "stopping"
  | "installing_cli"
-  | "cli_not_found";
+  | "cli_not_found"
+  // The daemon can't start because the server rejected its credentials (the
+  // cached PAT expired / was revoked, or the session token is dead). Without
+  // this, an auth failure silently sticks at "starting" forever — see #3512.
+  | "auth_expired";

 export interface DaemonStatus {
  state: DaemonState;
@@ -18,6 +22,16 @@ export interface DaemonStatus {
  profile?: string;
  /** Backend URL the daemon connects to. */
  serverUrl?: string;
+  /**
+   * True when a daemon is running but in an environment the app can't control
+   * — its reported OS differs from the desktop host's (e.g. a Linux daemon
+   * inside WSL2 behind a Windows desktop, reachable only via localhost
+   * forwarding). The app's start/stop CLI acts on the host process namespace,
+   * so auto-start/auto-stop can't reach it; the UI disables those toggles
+   * instead of silently no-op'ing. Only ever set on a running daemon, so it
+   * never disables the toggles for a normally-managed native daemon. See #3916.
+   */
+  externallyManaged?: boolean;
 }

 export interface DaemonPrefs {
@@ -32,6 +46,7 @@ export const DAEMON_STATE_COLORS: Record<DaemonState, string> = {
  stopping: "bg-amber-500 animate-pulse",
  installing_cli: "bg-sky-500 animate-pulse",
  cli_not_found: "bg-red-500",
+  auth_expired: "bg-red-500",
 };

 export const DAEMON_STATE_LABELS: Record<DaemonState, string> = {
@@ -41,6 +56,7 @@ export const DAEMON_STATE_LABELS: Record<DaemonState, string> = {
  stopping: "Stopping…",
  installing_cli: "Setting up…",
  cli_not_found: "Setup Failed",
+  auth_expired: "Sign-in required",
 };

 export function formatUptime(uptime?: string): string {
@@ -52,6 +68,19 @@ export function formatUptime(uptime?: string): string {
  return `${h}${m}`.trim() || uptime;
 }

+/**
+ * Whether a raw daemon `/health` `status` value means a live daemon is on the
+ * port — either fully "running" (ready) or still "starting" (port bound,
+ * preflight in progress). Mirrors the Go `daemonAlive()` in
+ * server/cmd/multica/cmd_daemon.go so the Desktop lifecycle agrees with the
+ * CLI: a "starting" daemon is already there and must not be spawned over (the
+ * CLI rejects that as "already running"). This is liveness, not readiness —
+ * version-restart decisions still gate on the stricter "running".
+ */
+export function daemonStatusAlive(status: string | undefined): boolean {
+  return status === "running" || status === "starting";
+}
+
 /**
 * User-facing description for the local daemon's current state. Replaces the
 * raw state label ("Running" / "Stopped") with a sentence that answers
@@ -81,5 +110,7 @@ export function daemonStateDescription(state: DaemonState, runtimeCount: number)
      return "Setting up the runtime for the first time. Only happens once.";
    case "cli_not_found":
      return "Setup failed · couldn't download the runtime. Check your network.";
+    case "auth_expired":
+      return "Sign-in expired · sign in again to bring this device back online.";
  }
 }
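A minimal sketch of how a lifecycle caller might use the liveness check described in the doc comment above. `daemonStatusAlive` is copied from this diff so the block stands alone; the `LifecycleAction` type and `nextLifecycleAction` helper are hypothetical names used only for illustration, not part of the change.

```typescript
// Copied from the diff: liveness means "running" or "starting".
function daemonStatusAlive(status: string | undefined): boolean {
  return status === "running" || status === "starting";
}

// Hypothetical: what a Desktop lifecycle step could decide from /health.
type LifecycleAction = "spawn" | "wait" | "none";

function nextLifecycleAction(status: string | undefined): LifecycleAction {
  // Nothing alive on the port: safe to spawn a daemon.
  if (!daemonStatusAlive(status)) return "spawn";
  // Alive but still in preflight: do not spawn over it, just wait.
  // Only the stricter "running" counts as ready.
  return status === "running" ? "none" : "wait";
}
```

The point of the split is that liveness gates spawning, while readiness ("running") stays the gate for anything that needs a fully started daemon.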
--- a/apps/desktop/src/shared/freeze-breadcrumb.ts
+++ b/apps/desktop/src/shared/freeze-breadcrumb.ts
@@ -0,0 +1,16 @@
+/**
+ * A freeze/crash breadcrumb persisted by the main process and flushed to
+ * telemetry by the next renderer boot. Shared across main, preload, and
+ * renderer because all three touch it. See main/freeze-breadcrumb.ts for the
+ * read/write logic and the rationale.
+ */
+export interface FreezeBreadcrumb {
+  /** "unresponsive" (hang) or "render-process-gone" (crash). */
+  kind: string;
+  /** Diagnostic context captured at failure time (route, window url, …). */
+  context: Record<string, unknown>;
+  /** Epoch ms when the failure was recorded. */
+  ts: number;
+  /** App version at failure time. */
+  version: string;
+}
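A sketch of how a persisted breadcrumb could be validated before it is flushed to telemetry. It assumes the breadcrumb is stored as JSON, which this diff does not confirm (the real read/write logic lives in main/freeze-breadcrumb.ts); `parseFreezeBreadcrumb` is a hypothetical helper name.

```typescript
// Same shape as the shared interface in this diff.
interface FreezeBreadcrumb {
  kind: string;
  context: Record<string, unknown>;
  ts: number;
  version: string;
}

// Hypothetical: returns null for anything that is not a well-formed breadcrumb.
// Assumes JSON serialization, which is an assumption of this sketch.
function parseFreezeBreadcrumb(raw: string): FreezeBreadcrumb | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // truncated or corrupt file
  }
  if (!value || typeof value !== "object" || Array.isArray(value)) return null;

  const v = value as Record<string, unknown>;
  if (typeof v.kind !== "string" || typeof v.version !== "string") return null;
  if (typeof v.ts !== "number" || !Number.isFinite(v.ts)) return null;
  if (!v.context || typeof v.context !== "object" || Array.isArray(v.context)) return null;

  return {
    kind: v.kind,
    context: v.context as Record<string, unknown>,
    ts: v.ts,
    version: v.version,
  };
}
```

Validating on read matters here because the file is written while the app is hanging or crashing, so a partial write is plausible.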
--- a/apps/desktop/src/shared/renderer-route-context.ts
+++ b/apps/desktop/src/shared/renderer-route-context.ts
@@ -0,0 +1,51 @@
+export const RENDERER_ROUTE_CONTEXT_CHANNEL = "renderer:route-context";
+
+export type RendererRouteSurface = "login" | "overlay" | "tab";
+
+export type RendererRouteContextInput = {
+  surface: RendererRouteSurface;
+  path: string;
+  workspaceSlug?: string;
+  tabId?: string;
+};
+
+export type RendererRouteContext = RendererRouteContextInput & {
+  reportedAt: string;
+};
+
+const MAX_ROUTE_CONTEXT_STRING_LENGTH = 512;
+
+export function sanitizeRendererRouteContext(
+  value: unknown,
+  reportedAt = new Date(),
+): RendererRouteContext | null {
+  if (!value || typeof value !== "object") return null;
+
+  const input = value as Record<string, unknown>;
+  if (!isRendererRouteSurface(input.surface)) return null;
+
+  const path = sanitizeString(input.path);
+  if (!path) return null;
+
+  const workspaceSlug = sanitizeString(input.workspaceSlug);
+  const tabId = sanitizeString(input.tabId);
+
+  return {
+    surface: input.surface,
+    path,
+    ...(workspaceSlug ? { workspaceSlug } : {}),
+    ...(tabId ? { tabId } : {}),
+    reportedAt: reportedAt.toISOString(),
+  };
+}
+
+function isRendererRouteSurface(value: unknown): value is RendererRouteSurface {
+  return value === "login" || value === "overlay" || value === "tab";
+}
+
+function sanitizeString(value: unknown): string | undefined {
+  if (typeof value !== "string") return undefined;
+  const trimmed = value.trim();
+  if (!trimmed) return undefined;
+  return trimmed.slice(0, MAX_ROUTE_CONTEXT_STRING_LENGTH);
+}
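To make the sanitizer's behavior concrete, here is a condensed copy of it followed by three example payloads. The payload values and the fixed clock are made up for illustration; the only deviation from the diff is that `surface` is read into a local before the type guard.

```typescript
type RendererRouteSurface = "login" | "overlay" | "tab";
type RendererRouteContext = {
  surface: RendererRouteSurface;
  path: string;
  workspaceSlug?: string;
  tabId?: string;
  reportedAt: string;
};

const MAX_ROUTE_CONTEXT_STRING_LENGTH = 512;

function sanitizeString(value: unknown): string | undefined {
  if (typeof value !== "string") return undefined;
  const trimmed = value.trim();
  if (!trimmed) return undefined;
  return trimmed.slice(0, MAX_ROUTE_CONTEXT_STRING_LENGTH);
}

function isRendererRouteSurface(value: unknown): value is RendererRouteSurface {
  return value === "login" || value === "overlay" || value === "tab";
}

function sanitizeRendererRouteContext(
  value: unknown,
  reportedAt = new Date(),
): RendererRouteContext | null {
  if (!value || typeof value !== "object") return null;
  const input = value as Record<string, unknown>;
  const surface = input.surface;
  if (!isRendererRouteSurface(surface)) return null;
  const path = sanitizeString(input.path);
  if (!path) return null;
  const workspaceSlug = sanitizeString(input.workspaceSlug);
  const tabId = sanitizeString(input.tabId);
  return {
    surface,
    path,
    ...(workspaceSlug ? { workspaceSlug } : {}),
    ...(tabId ? { tabId } : {}),
    reportedAt: reportedAt.toISOString(),
  };
}

// Illustrative inputs (hypothetical values).
const fixedClock = new Date("2024-01-01T00:00:00.000Z");

// Path is trimmed; the empty tabId is dropped rather than kept as "".
const accepted = sanitizeRendererRouteContext(
  { surface: "tab", path: "  /acme/issues  ", tabId: "" },
  fixedClock,
);

// Unknown surface: the whole report is rejected.
const rejected = sanitizeRendererRouteContext({ surface: "popup", path: "/x" }, fixedClock);

// Oversized path: clipped to MAX_ROUTE_CONTEXT_STRING_LENGTH characters.
const clipped = sanitizeRendererRouteContext({ surface: "login", path: "a".repeat(600) }, fixedClock);
```

Because the input arrives over IPC as `unknown`, rejecting the whole payload on a bad `surface` or empty `path` keeps malformed renderer data out of crash diagnostics.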
--- a/apps/docs/content/docs/auth-setup.ja.mdx
+++ b/apps/docs/content/docs/auth-setup.ja.mdx
@@ -37,7 +37,7 @@ SMTP 経路は、ほとんどのオンプレミスメールサーバー（特に
 |---|---|---|---|
 | 匿名内部 relay | `25` | なし — IP / サブネットで送信を信頼 | 伝送経路上はなし（内部セグメント専用） |
 | 認証付き送信（submission） | `587` | `SMTP_USERNAME` + `SMTP_PASSWORD` | STARTTLS、自動アップグレード |
-| 暗黙的 TLS（SMTPS） | `465` | — | **まだサポートされていません** — ポート 25 または 587 を使用してください |
+| 暗黙的 TLS（SMTPS） | `465` | 任意（`SMTP_USERNAME` + `SMTP_PASSWORD`） | 接続時に TLS ハンドシェイク — ポート `465` で自動的に有効化、非標準ポートでは `SMTP_TLS=implicit` で強制 |

 **ポート 25 の匿名 Exchange relay** — 認証情報なしで信頼されたサブネットからのメールを受け入れる、典型的な「internal SMTP relay」/ Exchange 匿名 receive connector:

@@ -61,7 +61,27 @@ SMTP_TLS_INSECURE=false        # set true only for self-signed / private CA
 RESEND_FROM_EMAIL=noreply@yourdomain.com
 ```

-起動時に、サーバーは選択したプロバイダーを出力します。例えば `EmailService: SMTP relay exchange.internal.example.com:25 from=noreply@example.com`（または `Resend API` / `DEV mode`）のように表示されます。パスワードがログに記録されることは決してありません。再起動後に SMTP の行が見えない場合は `SMTP_HOST` がプロセスに届いていないので、コンテナ環境（`docker compose -f docker-compose.selfhost.yml exec backend env | grep SMTP`）を確認してください。
+**ポート 465 の暗黙的 TLS（SMTPS）** — SMTPS のみを提供し STARTTLS を通知しないプロバイダー（例: Aliyun / Tencent のエンタープライズメール）向け。ポート `465` は暗黙的 TLS を自動的に有効化します。`SMTP_TLS=implicit`（別名: `smtps`、`ssl`）は非標準の SMTPS ポートでこれを強制します:
+
+```bash
+SMTP_HOST=smtp.qiye.aliyun.com
+SMTP_PORT=465                  # implicit TLS auto-enabled on 465
+SMTP_USERNAME=multica@yourdomain.com
+SMTP_PASSWORD=...
+SMTP_TLS=implicit              # optional on 465; required on a non-standard SMTPS port
+RESEND_FROM_EMAIL=noreply@yourdomain.com
+```
+
+**厳格な公開 relay（例: Google Workspace `smtp-relay.gmail.com`）** はさらに有効な EHLO 名を必要とします。これらの relay は公開 IP からのデフォルトの `localhost` 挨拶を拒否し、relay が接続を切断します — これは挨拶の時点ではなく、後続のコマンドで不明瞭な `EOF`（`smtp auth: EOF`）として表面化します。`SMTP_EHLO_NAME` を relay が期待する FQDN に設定してください。デフォルトはマシンのホスト名で、コンテナ内では通常は有効な FQDN ではありません:
+
+```bash
+SMTP_HOST=smtp-relay.gmail.com
+SMTP_PORT=587
+SMTP_EHLO_NAME=mail.yourdomain.com   # FQDN the relay accepts; defaults to the (non-FQDN) container hostname
+RESEND_FROM_EMAIL=noreply@yourdomain.com
+```
+
+起動時に、サーバーは選択したプロバイダーを、ネゴシエートされた TLS モードも含めて出力します。例えば `EmailService: SMTP relay exchange.internal.example.com:25 (starttls) from=noreply@example.com` や `… smtp.qiye.aliyun.com:465 (implicit-tls) from=…`（または `Resend API` / `DEV mode`）のように表示されます。パスワードがログに記録されることは決してありません。再起動後に SMTP の行が見えない場合は `SMTP_HOST` がプロセスに届いていないので、コンテナ環境（`docker compose -f docker-compose.selfhost.yml exec backend env | grep SMTP`）を確認してください。

 **どちらも設定しない場合**: サーバーはエラーを出しませんが、**送信されるはずだったすべてのメールがサーバーの stdout にのみ書き出されます**。ローカル開発には便利ですが（ログからコードをコピーできます）、プロダクションではブラックホールになります。

--- a/apps/docs/content/docs/auth-setup.ko.mdx
+++ b/apps/docs/content/docs/auth-setup.ko.mdx
@@ -37,7 +37,7 @@ SMTP 경로는 대부분의 온프레미스 메일 서버(특히 Microsoft Excha
 |---|---|---|---|
 | 익명 내부 relay | `25` | 없음 — IP / 서브넷으로 제출을 신뢰 | 전송 경로상 없음(내부 세그먼트 전용) |
 | 인증된 제출(submission) | `587` | `SMTP_USERNAME` + `SMTP_PASSWORD` | STARTTLS, 자동 업그레이드 |
-| 암묵적 TLS (SMTPS) | `465` | — | **아직 지원하지 않음** — 포트 25 또는 587을 사용하세요 |
+| 암묵적 TLS (SMTPS) | `465` | 선택 사항(`SMTP_USERNAME` + `SMTP_PASSWORD`) | 연결 시 TLS 핸드셰이크 — 포트 `465`에서 자동 활성화, 비표준 포트에서는 `SMTP_TLS=implicit`로 강제 |

 **포트 25의 익명 Exchange relay** — 자격 증명 없이 신뢰된 서브넷에서 오는 메일을 받아들이는 일반적인 "internal SMTP relay" / Exchange 익명 receive connector:

@@ -61,7 +61,27 @@ SMTP_TLS_INSECURE=false        # set true only for self-signed / private CA
 RESEND_FROM_EMAIL=noreply@yourdomain.com
 ```

-시작 시 서버는 선택한 제공자를 출력합니다. 예를 들어 `EmailService: SMTP relay exchange.internal.example.com:25 from=noreply@example.com`(또는 `Resend API` / `DEV mode`)와 같이 표시됩니다. 비밀번호는 절대 로그에 기록되지 않습니다. 재시작 후 SMTP 줄이 보이지 않는다면 `SMTP_HOST`가 프로세스에 도달하지 못한 것이므로, 컨테이너 환경(`docker compose -f docker-compose.selfhost.yml exec backend env | grep SMTP`)을 확인하세요.
+**포트 465의 암묵적 TLS(SMTPS)** — SMTPS만 제공하고 STARTTLS를 알리지 않는 제공자(예: Aliyun / Tencent 엔터프라이즈 메일)용. 포트 `465`는 암묵적 TLS를 자동으로 활성화하며, `SMTP_TLS=implicit`(별칭: `smtps`, `ssl`)는 비표준 SMTPS 포트에서 이를 강제합니다:
+
+```bash
+SMTP_HOST=smtp.qiye.aliyun.com
+SMTP_PORT=465                  # implicit TLS auto-enabled on 465
+SMTP_USERNAME=multica@yourdomain.com
+SMTP_PASSWORD=...
+SMTP_TLS=implicit              # optional on 465; required on a non-standard SMTPS port
+RESEND_FROM_EMAIL=noreply@yourdomain.com
+```
+
+**엄격한 공개 relay(예: Google Workspace `smtp-relay.gmail.com`)** 는 추가로 유효한 EHLO 이름을 요구합니다. 이들은 공개 IP에서 보내는 기본 `localhost` greeting을 거부하며, relay가 연결을 끊습니다 — 이는 greeting 단계가 아니라 이후 명령에서 불투명한 `EOF`(`smtp auth: EOF`)로 나타납니다. relay가 기대하는 FQDN으로 `SMTP_EHLO_NAME`을 설정하세요. 기본값은 머신 호스트명이며, 컨테이너 안에서는 보통 유효한 FQDN이 아닙니다:
+
+```bash
+SMTP_HOST=smtp-relay.gmail.com
+SMTP_PORT=587
+SMTP_EHLO_NAME=mail.yourdomain.com   # FQDN the relay accepts; defaults to the (non-FQDN) container hostname
+RESEND_FROM_EMAIL=noreply@yourdomain.com
+```
+
+시작 시 서버는 협상된 TLS 모드를 포함하여 선택한 제공자를 출력합니다. 예를 들어 `EmailService: SMTP relay exchange.internal.example.com:25 (starttls) from=noreply@example.com` 또는 `… smtp.qiye.aliyun.com:465 (implicit-tls) from=…`(또는 `Resend API` / `DEV mode`)와 같이 표시됩니다. 비밀번호는 절대 로그에 기록되지 않습니다. 재시작 후 SMTP 줄이 보이지 않는다면 `SMTP_HOST`가 프로세스에 도달하지 못한 것이므로, 컨테이너 환경(`docker compose -f docker-compose.selfhost.yml exec backend env | grep SMTP`)을 확인하세요.

 **둘 다 설정하지 않으면**: 서버는 오류를 내지 않지만, **전송되어야 했던 모든 이메일이 서버의 stdout에만 기록됩니다**. 로컬 개발에는 편리하지만(로그에서 코드를 복사하면 됩니다), 프로덕션에서는 블랙홀이 됩니다.

--- a/apps/docs/content/docs/auth-setup.mdx
+++ b/apps/docs/content/docs/auth-setup.mdx
@@ -72,6 +72,15 @@ SMTP_TLS=implicit              # optional on 465; required on a non-standard SMT
 RESEND_FROM_EMAIL=noreply@yourdomain.com
 ```

+**Strict public relays (e.g. Google Workspace `smtp-relay.gmail.com`)** additionally require a valid EHLO name. They reject the default `localhost` greeting from a public IP, and the relay drops the connection — which surfaces as an opaque `EOF` on a later command (`smtp auth: EOF`) rather than at the greeting. Set `SMTP_EHLO_NAME` to the FQDN the relay expects; it defaults to the machine hostname, which inside a container is usually not a valid FQDN:
+
+```bash
+SMTP_HOST=smtp-relay.gmail.com
+SMTP_PORT=587
+SMTP_EHLO_NAME=mail.yourdomain.com   # FQDN the relay accepts; defaults to the (non-FQDN) container hostname
+RESEND_FROM_EMAIL=noreply@yourdomain.com
+```
+
 At startup the server prints which provider it picked, including the negotiated TLS mode — for example `EmailService: SMTP relay exchange.internal.example.com:25 (starttls) from=noreply@example.com` or `… smtp.qiye.aliyun.com:465 (implicit-tls) from=…` (or `Resend API` / `DEV mode`). The password is never logged. If you don't see the SMTP line after restart, `SMTP_HOST` didn't reach the process — check the container env (`docker compose -f docker-compose.selfhost.yml exec backend env | grep SMTP`).

 **What happens if you set neither**: the server doesn't error, but **every email that should have been sent is written to the server's stdout only**. Handy for local development (copy the code from the logs); in production it's a black hole.
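The TLS-mode rule these auth-setup pages describe (port `465` auto-enables implicit TLS, `SMTP_TLS=implicit` or its aliases force it elsewhere, everything else upgrades via STARTTLS) can be sketched as a small function. This is an illustration in TypeScript, not the server's actual code; `resolveSmtpTlsMode` is a hypothetical name, and two behaviors are assumptions the docs do not state: that port `465` wins even if `SMTP_TLS=starttls` is set explicitly, and that values are matched exactly in lowercase.

```typescript
// Labels match the mode shown in the startup log line.
type SmtpTlsMode = "starttls" | "implicit-tls";

// Hypothetical helper mirroring the documented rule.
function resolveSmtpTlsMode(port: number, smtpTls?: string): SmtpTlsMode {
  const value = (smtpTls ?? "").trim();
  // `smtps` and `ssl` are documented aliases of `implicit`.
  const forcedImplicit = value === "implicit" || value === "smtps" || value === "ssl";
  // Assumption: port 465 always means implicit TLS, whatever SMTP_TLS says.
  if (forcedImplicit || port === 465) return "implicit-tls";
  return "starttls";
}

// Standard SMTPS port, SMTP_TLS unset.
const aliyun = resolveSmtpTlsMode(465);
// Submission port, SMTP_TLS unset.
const submission = resolveSmtpTlsMode(587);
// Non-standard SMTPS port needs the explicit override.
const custom = resolveSmtpTlsMode(2465, "implicit");
```

If the startup log shows `(starttls)` for a provider that only speaks SMTPS, a non-standard port without `SMTP_TLS=implicit` is the first thing to check.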
--- a/apps/docs/content/docs/auth-setup.zh.mdx
+++ b/apps/docs/content/docs/auth-setup.zh.mdx
@@ -72,6 +72,15 @@ SMTP_TLS=implicit              # 465 上可省略；在非标准 SMTPS 端口上
 RESEND_FROM_EMAIL=noreply@yourdomain.com
 ```

+**严格公网 relay（例如 Google Workspace `smtp-relay.gmail.com`）**还要求一个合法的 EHLO 名称。它们会拒绝来自公网 IP 的默认 `localhost` 问候，relay 随即断开连接——这不会在问候阶段报错，而是在后续某条命令上表现为一个不知所云的 `EOF`（`smtp auth: EOF`）。把 `SMTP_EHLO_NAME` 设成 relay 期望的 FQDN；它默认取机器主机名，而在容器内这通常不是合法的 FQDN：
+
+```bash
+SMTP_HOST=smtp-relay.gmail.com
+SMTP_PORT=587
+SMTP_EHLO_NAME=mail.yourdomain.com   # relay 接受的 FQDN；默认取（非 FQDN 的）容器主机名
+RESEND_FROM_EMAIL=noreply@yourdomain.com
+```
+
 启动时 server 会打印当前选择的 provider 和协商出的 TLS 模式，比如 `EmailService: SMTP relay exchange.internal.example.com:25 (starttls) from=noreply@example.com` 或 `… smtp.qiye.aliyun.com:465 (implicit-tls) from=…`（或 `Resend API` / `DEV mode`），密码不会出现在日志里。重启后没看到 SMTP 这行，说明 `SMTP_HOST` 没进到进程，确认下容器环境（`docker compose -f docker-compose.selfhost.yml exec backend env | grep SMTP`）。

 **两种都不配**：server 不报错，但所有本该发出去的邮件**只打到 server 的 stdout**。本地开发方便（你从日志抄验证码），生产环境等于黑洞。
--- a/apps/docs/content/docs/cli.ja.mdx
+++ b/apps/docs/content/docs/cli.ja.mdx
@@ -79,6 +79,19 @@ CI やヘッドレス環境では、ブラウザフローをスキップでき
 | `multica skill import ...` | GitHub、ClawHub、またはローカルマシンからスキルをインポート |
 | `multica skill files ...` | ネスト: スキルのファイルを管理 |

+### スキルインポートの競合
+
+`multica skill import --url <url>` の既定値は `--on-conflict fail` です。同じ名前のスキルがすでに存在する場合、コマンドは構造化された `conflict` 結果で終了し、ワークスペースは変更されません。
+
+既存スキルの作成者で、スキル ID とエージェントの紐付けを維持したまま内容を置き換える場合は `--on-conflict overwrite` を使います。既存スキルを残してコピーを取り込む場合は `--on-conflict rename` を使うと、`-2` のような接尾辞が自動で付きます。同名の項目を単に飛ばす場合は `--on-conflict skip` を使います。
+
+```bash
+multica skill import --url https://skills.sh/acme/repo/review-helper
+multica skill import --url https://skills.sh/acme/repo/review-helper --on-conflict overwrite
+multica skill import --url https://skills.sh/acme/repo/review-helper --on-conflict rename
+multica skill import --url https://skills.sh/acme/repo/review-helper --on-conflict skip
+```
+
 ## スクワッド

 | コマンド | 用途 |
--- a/apps/docs/content/docs/cli.ko.mdx
+++ b/apps/docs/content/docs/cli.ko.mdx
@@ -79,6 +79,19 @@ CI나 headless 환경에서는 브라우저 플로우를 건너뛰세요. 웹
 | `multica skill import ...` | GitHub, ClawHub, 또는 로컬 기기에서 스킬 가져오기 |
 | `multica skill files ...` | 중첩: 스킬의 파일 관리 |

+### 스킬 가져오기 충돌
+
+`multica skill import --url <url>`의 기본값은 `--on-conflict fail`입니다. 같은 이름의 스킬이 이미 있으면 명령은 구조화된 `conflict` 결과로 종료되며 워크스페이스를 변경하지 않습니다.
+
+기존 스킬을 만든 사용자이고, 스킬 ID와 에이전트 연결은 유지한 채 내용을 바꾸려면 `--on-conflict overwrite`를 사용하세요. 기존 스킬을 그대로 두고 복사본을 가져오려면 `--on-conflict rename`을 사용하면 `-2` 같은 접미사가 자동으로 붙습니다. 같은 이름의 항목을 그냥 건너뛰려면 `--on-conflict skip`을 사용하세요.
+
+```bash
+multica skill import --url https://skills.sh/acme/repo/review-helper
+multica skill import --url https://skills.sh/acme/repo/review-helper --on-conflict overwrite
+multica skill import --url https://skills.sh/acme/repo/review-helper --on-conflict rename
+multica skill import --url https://skills.sh/acme/repo/review-helper --on-conflict skip
+```
+
 ## 스쿼드

 | 명령어 | 용도 |
--- a/apps/docs/content/docs/cli.mdx
+++ b/apps/docs/content/docs/cli.mdx
@@ -79,6 +79,25 @@ For the difference between token types, see [Authentication and tokens](/auth-to
 | `multica skill import ...` | Import a skill from GitHub, ClawHub, or the local machine |
 | `multica skill files ...` | Nested: manage a skill's files |

+### Skill import conflicts
+
+`multica skill import --url <url>` defaults to `--on-conflict fail`. If a skill
+with the same name already exists, the command exits with a structured
+`conflict` result and does not change the workspace.
+
+Use `--on-conflict overwrite` when you created the existing skill and want to
+replace its content while preserving its ID and agent bindings. Use
+`--on-conflict rename` to import a copy with an automatic suffix such as `-2`.
+Use `--on-conflict skip` to leave the existing skill untouched and report
+`skipped`.
+
+```bash
+multica skill import --url https://skills.sh/acme/repo/review-helper
+multica skill import --url https://skills.sh/acme/repo/review-helper --on-conflict overwrite
+multica skill import --url https://skills.sh/acme/repo/review-helper --on-conflict rename
+multica skill import --url https://skills.sh/acme/repo/review-helper --on-conflict skip
+```
+
 ## Squads

 | Command | Purpose |
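The four `--on-conflict` policies documented above can be modeled as a small decision function. This is a sketch of the documented behavior, not the CLI's implementation: `resolveImport` and its types are hypothetical, only the `conflict` and `skipped` result names come from the docs, and two details are assumptions: that `overwrite` by a non-creator ends in `conflict`, and that `rename` picks the first free numeric suffix starting at `-2`.

```typescript
type ConflictPolicy = "fail" | "overwrite" | "rename" | "skip";

// "conflict" and "skipped" are documented; the other labels are illustrative.
type ImportOutcome = {
  result: "created" | "overwritten" | "conflict" | "skipped";
  name: string;
};

function resolveImport(
  name: string,
  existing: ReadonlySet<string>,
  policy: ConflictPolicy = "fail", // documented default
  callerIsCreator = false,
): ImportOutcome {
  if (!existing.has(name)) return { result: "created", name };

  switch (policy) {
    case "fail":
      // Workspace is left unchanged.
      return { result: "conflict", name };
    case "skip":
      return { result: "skipped", name };
    case "overwrite":
      // Docs: overwrite is for the creator of the existing skill.
      // Assumption: anyone else gets a conflict.
      return { result: callerIsCreator ? "overwritten" : "conflict", name };
    case "rename": {
      // Assumption: first free suffix, starting at -2.
      let n = 2;
      while (existing.has(`${name}-${n}`)) n += 1;
      return { result: "created", name: `${name}-${n}` };
    }
  }
}

const workspace = new Set(["review-helper", "review-helper-2"]);
const renamed = resolveImport("review-helper", workspace, "rename");
```

Only `overwrite` touches the existing skill, which is why it is the one policy that depends on who created it.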
--- a/apps/docs/content/docs/cli.zh.mdx
+++ b/apps/docs/content/docs/cli.zh.mdx
@@ -79,6 +79,19 @@ Token 类型的详细区分见 [认证与令牌](/auth-tokens)。
 | `multica skill import ...` | 从 GitHub / ClawHub / 本机导入 Skill |
 | `multica skill files ...` | 嵌套：管理 Skill 的文件 |

+### Skill 导入冲突
+
+`multica skill import --url <url>` 默认等同于 `--on-conflict fail`。如果工作区里已经有同名 Skill，命令会返回结构化 `conflict` 结果并退出，不会修改工作区。
+
+如果你是已有 Skill 的 creator，并且想用新导入内容覆盖它，同时保留原 Skill 的 ID 和 agent 绑定，用 `--on-conflict overwrite`。如果想保留已有 Skill、另存一份，用 `--on-conflict rename`，系统会自动加 `-2` 这类后缀。如果只是批量导入时遇到同名项就跳过，用 `--on-conflict skip`。
+
+```bash
+multica skill import --url https://skills.sh/acme/repo/review-helper
+multica skill import --url https://skills.sh/acme/repo/review-helper --on-conflict overwrite
+multica skill import --url https://skills.sh/acme/repo/review-helper --on-conflict rename
+multica skill import --url https://skills.sh/acme/repo/review-helper --on-conflict skip
+```
+
 ## 小队

 | 命令 | 用途 |
--- a/apps/docs/content/docs/cli/reference.zh.mdx
+++ b/apps/docs/content/docs/cli/reference.zh.mdx
@@ -115,7 +115,7 @@ Daemon behavior is configured via flags or environment variables:
 |---------|------|--------------|---------|
 | Poll interval | `--poll-interval` | `MULTICA_DAEMON_POLL_INTERVAL` | `3s` |
 | Heartbeat interval | `--heartbeat-interval` | `MULTICA_DAEMON_HEARTBEAT_INTERVAL` | `15s` |
-| Agent timeout | `--agent-timeout` | `MULTICA_AGENT_TIMEOUT` | `2h` |
+| Agent timeout | `--agent-timeout` | `MULTICA_AGENT_TIMEOUT` | `0`（不限制，由看门狗兜底）|
 | Max concurrent tasks | `--max-concurrent-tasks` | `MULTICA_DAEMON_MAX_CONCURRENT_TASKS` | `20` |
 | Daemon ID | `--daemon-id` | `MULTICA_DAEMON_ID` | hostname |
 | Device name | `--device-name` | `MULTICA_DAEMON_DEVICE_NAME` | hostname |
--- a/apps/docs/content/docs/comments.ja.mdx
+++ b/apps/docs/content/docs/comments.ja.mdx
@@ -39,9 +39,9 @@ import { Callout } from "fumadocs-ui/components/callout";

 ## イシューを参照する

-別のイシューをリンクするには、`MUL-123` のようにそのイシューキーを入力してください。Multica はコメント内で実在するイシューキーを解決し、内部的に `mention://issue/<uuid>` リンクとして保存します。イシューリンクは単なる相互参照にすぎません。人に通知を送ることはなく、エージェントをトリガーすることもありません。
+別のイシューをリンクするには、コメントの mention ピッカーからそのイシューを選択してください。Multica はイシューリンクを明示的な `[MUL-123](mention://issue/<uuid>)` mention リンクとして保存します。イシューリンクは単なる相互参照にすぎません。人に通知を送ることはなく、エージェントをトリガーすることもありません。

-通常は `[MUL-123](mention://issue/<uuid>)` を手で書く必要はありません。その形式は、Multica がキーを解決した後に使う標準的な内部表現です。
+`MUL-123` のような裸のイシューキーを入力しても、通常のテキストのまま残ります。そのため、`feature/MUL-123` のようなコメント内のブランチ名やパスも書き換えられません。

 <Callout type="info">
 Markdown の強調は CommonMark のルールに従います。太字テキストが句読点や閉じ引用符で終わり、その直後に韓国語の助詞が続く場合、閉じの `**` が認識されないことがあります。
--- a/apps/docs/content/docs/comments.ko.mdx
+++ b/apps/docs/content/docs/comments.ko.mdx
@@ -39,9 +39,9 @@ import { Callout } from "fumadocs-ui/components/callout";

 ## 이슈 참조하기

-다른 이슈를 링크하려면 `MUL-123`처럼 이슈 키를 입력하세요. Multica는 댓글에서 실제 존재하는 이슈 키를 해석하여 내부적으로 `mention://issue/<uuid>` 링크로 저장합니다. 이슈 링크는 단순한 상호 참조일 뿐입니다. 사람에게 알림을 보내지 않으며 에이전트를 트리거하지도 않습니다.
+다른 이슈를 링크하려면 댓글 mention 선택기에서 해당 이슈를 선택하세요. Multica는 이슈 링크를 명시적인 `[MUL-123](mention://issue/<uuid>)` mention 링크로 저장합니다. 이슈 링크는 단순한 상호 참조일 뿐입니다. 사람에게 알림을 보내지 않으며 에이전트를 트리거하지도 않습니다.

-보통은 `[MUL-123](mention://issue/<uuid>)`을 직접 손으로 작성할 필요가 없습니다. 그 형식은 Multica가 키를 해석한 뒤에 사용하는 표준 내부 표현입니다.
+`MUL-123` 같은 bare 이슈 키를 입력하면 일반 텍스트로 유지됩니다. 따라서 `feature/MUL-123` 같은 댓글 안의 브랜치 이름과 경로도 다시 작성되지 않습니다.

 <Callout type="info">
 Markdown 강조는 CommonMark 규칙을 따릅니다. 굵은 텍스트가 문장 부호나 닫는 따옴표로 끝나고 그 뒤에 한국어 조사가 바로 이어지면, 닫는 `**`가 인식되지 않을 수 있습니다.
--- a/apps/docs/content/docs/comments.mdx
+++ b/apps/docs/content/docs/comments.mdx
@@ -39,9 +39,9 @@ Mentioning the same person multiple times in one comment still produces **only o

 ## Referencing issues

-To link another issue, type its issue key, such as `MUL-123`. Multica resolves real issue keys in comments and stores them as an internal `mention://issue/<uuid>` link. Issue links are cross-references only: they do not notify people and they do not trigger agents.
+To link another issue, choose it from the comment mention picker. Multica stores issue links as an explicit `[MUL-123](mention://issue/<uuid>)` mention link. Issue links are cross-references only: they do not notify people and they do not trigger agents.

-You normally do not need to write `[MUL-123](mention://issue/<uuid>)` by hand. That format is the canonical internal representation after Multica has resolved the key.
+Typing a bare issue key, such as `MUL-123`, keeps it as plain text. This also keeps branch names and paths, such as `feature/MUL-123`, from being rewritten inside comments.

 <Callout type="info">
 Markdown emphasis follows CommonMark rules. When bold text ends with punctuation or a closing quote and is immediately followed by a Korean particle, the closing `**` may not be recognized.
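The practical effect of the comments change is that only explicit `[KEY](mention://issue/<uuid>)` links count as issue references. A minimal sketch of extracting them from a comment body follows; `extractIssueMentions` and the regular expression are hypothetical, and the key shape (uppercase prefix, dash, digits) and the 36-character UUID form are assumptions made for the example.

```typescript
// Matches only the explicit mention-link form; bare keys are ignored.
// Key and UUID patterns are assumptions of this sketch.
const ISSUE_MENTION =
  /\[([A-Z][A-Z0-9]*-\d+)\]\(mention:\/\/issue\/([0-9a-fA-F-]{36})\)/g;

function extractIssueMentions(body: string): { key: string; issueId: string }[] {
  return Array.from(body.matchAll(ISSUE_MENTION), (m) => ({
    key: m[1],
    issueId: m[2],
  }));
}

const body =
  "Blocked by [MUL-123](mention://issue/123e4567-e89b-12d3-a456-426614174000); " +
  "see branch feature/MUL-456 and also MUL-789.";

// Only MUL-123 is a link; the branch name and the bare key stay plain text.
const mentions = extractIssueMentions(body);
```

Anchoring on the full link syntax is what keeps branch names and paths such as `feature/MUL-123` from being treated as references.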
--- a/apps/docs/content/docs/comments.zh.mdx
+++ b/apps/docs/content/docs/comments.zh.mdx
@@ -39,9 +39,9 @@ import { Callout } from "fumadocs-ui/components/callout";

 ## 引用 issue

-要链接另一个 issue，直接输入它的 issue key，例如 `MUL-123`。Multica 会在评论中解析真实存在的 issue key，并把它存成内部的 `mention://issue/<uuid>` 链接。Issue 链接只是交叉引用：不会通知成员，也不会触发智能体。
+要链接另一个 issue，请在评论的 mention 选择器里选择它。Multica 会把 issue 链接存成显式的 `[MUL-123](mention://issue/<uuid>)` mention 链接。Issue 链接只是交叉引用：不会通知成员，也不会触发智能体。

-通常不需要手写 `[MUL-123](mention://issue/<uuid>)`。这是 Multica 解析 key 之后使用的内部规范格式。
+直接输入裸 issue key，例如 `MUL-123`，会保持为普通文本。这样评论里的分支名和路径，例如 `feature/MUL-123`，也不会被改写。

 <Callout type="info">
 Markdown 加粗遵循 CommonMark 规则。当加粗文本以标点或闭引号结尾，并且后面紧跟韩语助词时，结尾的 `**` 可能不会被识别。
--- a/apps/docs/content/docs/environment-variables.ja.mdx
+++ b/apps/docs/content/docs/environment-variables.ja.mdx
@@ -49,10 +49,12 @@ Multica は 2 つの配信バックエンドをサポートします — クラ
 | 変数 | デフォルト | 説明 |
 |---|---|---|
 | `SMTP_HOST` | 空 | SMTP relay のホスト名。これを設定すると SMTP モードが有効になり、Resend を上書きします |
-| `SMTP_PORT` | `25` | SMTP ポート。STARTTLS サブミッションには `587` を使用してください。**ポート 465（SMTPS / 暗黙的 TLS）はサポートされていません** |
+| `SMTP_PORT` | `25` | SMTP ポート。STARTTLS サブミッションには `587` を、SMTPS（暗黙的 TLS、自動有効化）には `465` を使用します |
 | `SMTP_USERNAME` | 空 | SMTP ユーザー名。認証なしの relay の場合は空のままにしてください |
 | `SMTP_PASSWORD` | 空 | SMTP パスワード |
+| `SMTP_TLS` | `starttls` | TLS モード。`implicit`（別名 `smtps`、`ssl`）は接続時に即座に TLS ハンドシェイクを行います（SMTPS）。`465` ポートでは自動的に有効になります。未設定 / `starttls` の場合は接続後に STARTTLS でアップグレードします |
 | `SMTP_TLS_INSECURE` | `false` | TLS 証明書の検証をスキップするには `true` に設定（プライベート CA / 自己署名証明書のみ） |
+| `SMTP_EHLO_NAME` | マシンのホスト名 | relay に通知する EHLO/HELO 名。厳格な relay（例: Google Workspace `smtp-relay.gmail.com`）が公開 IP からのデフォルトの挨拶を拒否する場合は、実際の FQDN を設定してください — そうしないと relay が接続を切断し、後続のコマンドで不明瞭な `EOF` として表面化します |

 サーバーが STARTTLS を通知すると自動的にアップグレードされます。dial タイムアウトは 10 秒で、SMTP セッション全体には 30 秒のデッドラインがあるため、ブラックホール化した relay が auth ハンドラーをハングさせることはできません。

@@ -84,15 +86,19 @@ Multica はユーザーがアップロードした添付ファイル（コメン
 | `S3_REGION` | `us-west-2` | AWS リージョン。バケットの実際のリージョンと一致する必要があります — SDK 署名と公開 URL の構築の両方に使われます |
 | `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` | 空 | 静的な認証情報。両方を設定しない場合は AWS SDK のデフォルト認証情報チェーン（IAM role / 環境認証情報）が使われます |
 | `AWS_ENDPOINT_URL` | 空 | カスタムの S3 互換エンドポイント（例: [MinIO](https://min.io/)）。これを設定すると path-style URL に切り替わります |
+| `ATTACHMENT_DOWNLOAD_MODE` | `auto` | 添付ファイルのダウンロード方式: `auto`、`cloudfront`、`presign`、`proxy`。`auto` では CloudFront が完全に設定されている場合は優先し、内部/プライベート endpoint host は server proxy、公開 S3 互換 endpoint は対応時に presigned GET を使います |
+| `ATTACHMENT_DOWNLOAD_URL_TTL` | `30m` | CloudFront signed URL と S3 presigned download URL の有効期間。Go duration 形式を受け付けます |

 **`S3_BUCKET` を設定しない場合**: サーバーは起動時に `"S3_BUCKET not set, cloud upload disabled"` をログに記録し、すべてのアップロードはローカルディスクにフォールバックします。

-**公開 URL** は次の優先順位で構築されます。
+**保存されるオブジェクト URL** は次の優先順位で構築されます。

 1. `CLOUDFRONT_DOMAIN` が設定されている場合は `https://<CLOUDFRONT_DOMAIN>/<key>`。
 2. `AWS_ENDPOINT_URL` が設定されている場合は `<AWS_ENDPOINT_URL>/<S3_BUCKET>/<key>`（path-style）。
 3. `https://<S3_BUCKET>.s3.<S3_REGION>.amazonaws.com/<key>`（virtual-hosted-style）。`S3_BUCKET` にドットが含まれる場合、AWS が発行するワイルドカード TLS 証明書がドットを含むバケットホストを検証できないため、サーバーは `https://s3.<S3_REGION>.amazonaws.com/<S3_BUCKET>/<key>`（path-style）にフォールバックします。

+API の `download_url` は、CloudFront 署名が設定されていない場合 `GET /api/attachments/{id}/download` を使います。この endpoint は安全な場合 CloudFront/S3 presigned URL にリダイレクトし、`http://rustfs:9000` のようなプライベート/内部 endpoint では server がストリーミングします。Docker/VPC 内部のオブジェクトストアでは `ATTACHMENT_DOWNLOAD_MODE=proxy` を明示できます。
+
 ### ローカルディスク（S3 が設定されていない場合）

 | 変数 | デフォルト | 説明 |
@@ -192,18 +198,24 @@ S3 の前段に CloudFront を置く場合、3 つの変数が適用されます

 ## GitHub 連携

-[GitHub PR ↔ イシュー連携](/github-integration)には 2 つの変数が必要です。設定で Connect GitHub を有効にし、受信 webhook を受け付けるには両方を設定してください。
+[GitHub PR ↔ イシュー連携](/github-integration)には 2 つの必須変数が必要です。設定で Connect GitHub を有効にし、受信 webhook を受け付けるには両方を設定してください。さらに 2 つの任意変数を設定すると、インストール時点で連携先のアカウント名を取得できます。

 | 変数 | デフォルト | 説明 |
 |---|---|---|
 | `GITHUB_APP_SLUG` | 空 | GitHub App の slug（`https://github.com/apps/<slug>` の末尾部分）。設定 → GitHub のインストールボタン URL を構成します |
 | `GITHUB_WEBHOOK_SECRET` | 空 | GitHub App に設定した Webhook secret。すべての `pull_request` / `installation` delivery の HMAC-SHA256 検証に使われ、setup コールバックの state token の HMAC キーとしても使われます |
+| `GITHUB_APP_ID` | 空 | 任意。App 設定ページに表示される数値の App ID。`GITHUB_APP_PRIVATE_KEY` と併せて設定すると、setup コールバックがインストール時点で GitHub から連携先のアカウント名を取得できます |
+| `GITHUB_APP_PRIVATE_KEY` | 空 | 任意。App の RSA 秘密鍵の完全な PEM ブロック（`-----BEGIN/END-----` 行を含み、改行を保ったまま）。GitHub の App 認証 REST 呼び出しに必要な短命 JWT の発行に使われます |

-**どちらかが設定されていない場合の動作:**
+**いずれかの必須変数が設定されていない場合の動作:**

 - 設定 → GitHub の `Connect GitHub` が**無効**になり、admin に「not configured」というヒントを表示します。
 - `/api/webhooks/github` エンドポイントは **`503 github webhooks not configured`** を返します — Multica はすべての署名を有効として扱うのではなく、secret なしではイベント処理を拒否します。

+**任意の `GITHUB_APP_ID` / `GITHUB_APP_PRIVATE_KEY` が設定されていない場合の動作:**
+
+- インストール直後、接続カードには一時的に `Connected to unknown` と表示されます。GitHub から `installation.created` webhook が届くと（通常は数秒以内）、Multica は行を実際の組織名/ユーザー名に更新し、リアルタイムブロードキャストを発行するため、開いている Settings → GitHub タブは手動更新なしで反映されます。
+
 **注:** `GITHUB_WEBHOOK_SECRET` はインストールフローの state token の署名キーとして再利用されるため、運用者は secret を 1 つだけ管理すればよいです。これは GitHub App の *Client* secret では**ありません** — Client secret は OAuth 関連であり、この連携では使われません。完全な手順は [GitHub 連携 → セルフホストのセットアップ](/github-integration#self-host-setup)を参照してください。

 ## 使用量分析
--- a/apps/docs/content/docs/environment-variables.ko.mdx
+++ b/apps/docs/content/docs/environment-variables.ko.mdx
@@ -49,10 +49,12 @@ Multica는 두 가지 전송 백엔드를 지원합니다 — 클라우드 배
 | 변수 | 기본값 | 설명 |
 |---|---|---|
 | `SMTP_HOST` | 비어 있음 | SMTP relay 호스트명. 이를 설정하면 SMTP 모드가 활성화되고 Resend를 덮어씁니다 |
-| `SMTP_PORT` | `25` | SMTP 포트. STARTTLS 제출에는 `587`을 사용하세요; **포트 465(SMTPS / 암묵적 TLS)는 지원되지 않습니다** |
+| `SMTP_PORT` | `25` | SMTP 포트. STARTTLS 제출에는 `587`을, SMTPS(암묵적 TLS, 자동 활성화)에는 `465`를 사용하세요 |
 | `SMTP_USERNAME` | 비어 있음 | SMTP 사용자명. 인증 없는 relay의 경우 비워 두세요 |
 | `SMTP_PASSWORD` | 비어 있음 | SMTP 비밀번호 |
+| `SMTP_TLS` | `starttls` | TLS 모드. `implicit`(별칭 `smtps`, `ssl`)은 연결 시 즉시 TLS 핸드셰이크를 수행합니다(SMTPS). `465` 포트에서는 자동으로 활성화됩니다. 미설정 / `starttls`는 연결 후 STARTTLS로 업그레이드합니다 |
 | `SMTP_TLS_INSECURE` | `false` | TLS 인증서 검증을 건너뛰려면 `true`로 설정 (사설 CA / 자체 서명 인증서만 해당) |
+| `SMTP_EHLO_NAME` | 머신 호스트명 | relay에 알리는 EHLO/HELO 이름. 엄격한 relay(예: Google Workspace `smtp-relay.gmail.com`)가 공개 IP에서 보내는 기본 greeting을 거부하는 경우 실제 FQDN을 설정하세요 — 그렇지 않으면 relay가 연결을 끊고, 이는 이후 명령에서 불투명한 `EOF`로 나타납니다 |

 서버가 STARTTLS를 알리면 자동으로 업그레이드됩니다. dial 타임아웃은 10초이고 전체 SMTP 세션에는 30초 데드라인이 있어, 블랙홀이 된 relay가 auth 핸들러를 멈추게 할 수 없습니다.

@@ -84,15 +86,19 @@ Multica는 사용자가 업로드한 첨부 파일(댓글의 이미지와 파일
 | `S3_REGION` | `us-west-2` | AWS 리전. 버킷의 실제 리전과 일치해야 합니다 — SDK 서명과 공개 URL 구성 모두에 사용됩니다 |
 | `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` | 비어 있음 | 정적 자격 증명. 둘 다 설정하지 않으면 AWS SDK 기본 자격 증명 체인(IAM role / 환경 자격 증명)이 사용됩니다 |
 | `AWS_ENDPOINT_URL` | 비어 있음 | 사용자 정의 S3 호환 엔드포인트 (예: [MinIO](https://min.io/)). 이를 설정하면 path-style URL로 전환됩니다 |
+| `ATTACHMENT_DOWNLOAD_MODE` | `auto` | 첨부 파일 다운로드 방식: `auto`, `cloudfront`, `presign`, `proxy`. `auto`에서는 CloudFront가 완전히 설정되어 있으면 우선 사용하고, 내부/프라이빗 endpoint host는 server proxy를, 공개 S3 호환 endpoint는 지원되는 경우 presigned GET을 사용합니다 |
+| `ATTACHMENT_DOWNLOAD_URL_TTL` | `30m` | CloudFront signed URL 및 S3 presigned download URL의 유효 기간. Go duration 형식을 받습니다 |

 **`S3_BUCKET`을 설정하지 않으면**: 서버는 시작 시 `"S3_BUCKET not set, cloud upload disabled"`를 로깅하고, 모든 업로드는 로컬 디스크로 폴백합니다.

-**공개 URL**은 다음 우선순위 순서로 구성됩니다:
+**저장된 객체 URL**은 다음 우선순위 순서로 구성됩니다:

 1. `CLOUDFRONT_DOMAIN`이 설정된 경우 `https://<CLOUDFRONT_DOMAIN>/<key>`.
 2. `AWS_ENDPOINT_URL`이 설정된 경우 `<AWS_ENDPOINT_URL>/<S3_BUCKET>/<key>` (path-style).
 3. `https://<S3_BUCKET>.s3.<S3_REGION>.amazonaws.com/<key>` (virtual-hosted-style). `S3_BUCKET`에 점이 포함된 경우, AWS가 발급한 와일드카드 TLS 인증서가 점이 포함된 버킷 호스트를 검증하지 못하므로 서버는 `https://s3.<S3_REGION>.amazonaws.com/<S3_BUCKET>/<key>` (path-style)로 폴백합니다.

+API `download_url` 값은 CloudFront 서명이 설정되지 않은 경우 `GET /api/attachments/{id}/download`를 사용합니다. 이 endpoint는 안전한 경우 CloudFront/S3 presigned URL로 리디렉션하고, `http://rustfs:9000` 같은 프라이빗/내부 endpoint에서는 server가 스트리밍합니다. Docker/VPC 내부 객체 저장소에서는 `ATTACHMENT_DOWNLOAD_MODE=proxy`를 명시할 수 있습니다.
+
 ### 로컬 디스크 (S3가 설정되지 않은 경우)

 | 변수 | 기본값 | 설명 |
@@ -192,18 +198,24 @@ S3 앞에 CloudFront를 두는 경우 세 가지 변수가 적용됩니다: `CLO

 ## GitHub 연동

-[GitHub PR ↔ 이슈 연동](/github-integration)에는 두 개의 변수가 필요합니다. 설정에서 Connect GitHub를 활성화하고 들어오는 webhook을 수락하려면 둘 다 설정하세요.
+[GitHub PR ↔ 이슈 연동](/github-integration)에는 두 개의 필수 변수가 필요합니다. 설정에서 Connect GitHub를 활성화하고 들어오는 webhook을 수락하려면 둘 다 설정하세요. 추가로 두 개의 선택 변수를 설정하면 설치 시점에 연결된 계정 이름을 즉시 가져올 수 있습니다.

 | 변수 | 기본값 | 설명 |
 |---|---|---|
 | `GITHUB_APP_SLUG` | 비어 있음 | GitHub App의 slug (`https://github.com/apps/<slug>`의 끝부분). 설정 → GitHub 설치 버튼 URL을 구성합니다 |
 | `GITHUB_WEBHOOK_SECRET` | 비어 있음 | GitHub App에 설정한 Webhook secret. 모든 `pull_request` / `installation` delivery의 HMAC-SHA256 검증에 사용되며, setup 콜백 state token의 HMAC 키로도 사용됩니다 |
+| `GITHUB_APP_ID` | 비어 있음 | 선택. App 설정 페이지에 표시되는 숫자 App ID. `GITHUB_APP_PRIVATE_KEY`와 함께 설정하면 setup 콜백이 설치 시점에 GitHub에서 연결된 계정 이름을 가져올 수 있습니다 |
+| `GITHUB_APP_PRIVATE_KEY` | 비어 있음 | 선택. App RSA 비공개 키의 전체 PEM 블록 (`-----BEGIN/END-----` 줄 포함, 줄바꿈 유지). GitHub의 App 인증 REST 호출에 필요한 단명 JWT를 발급하는 데 사용됩니다 |

-**둘 중 하나라도 설정하지 않았을 때의 동작:**
+**필수 변수 중 하나라도 설정하지 않았을 때의 동작:**

 - 설정 → GitHub의 `Connect GitHub`가 **비활성화**되고 admin에게 "not configured" 힌트를 표시합니다.
 - `/api/webhooks/github` 엔드포인트는 **`503 github webhooks not configured`**를 반환합니다 — Multica는 모든 서명을 유효한 것으로 취급하기보다, secret 없이는 이벤트 처리를 거부합니다.

+**선택 `GITHUB_APP_ID` / `GITHUB_APP_PRIVATE_KEY`가 설정되지 않았을 때의 동작:**
+
+- 설치 직후 연결 카드에 잠시 `Connected to unknown`이 표시됩니다. GitHub의 `installation.created` 웹훅이 도착하면(보통 몇 초 이내) Multica가 행을 실제 조직/사용자 이름으로 갱신하고 실시간 브로드캐스트를 보내, 열려 있는 Settings → GitHub 탭이 수동 새로고침 없이 업데이트됩니다.
+
 **참고:** `GITHUB_WEBHOOK_SECRET`은 설치 흐름 state token의 서명 키로 재사용되므로, 운영자는 secret 하나만 관리하면 됩니다. 이것은 GitHub App의 *Client* secret이 **아닙니다** — Client secret은 OAuth 관련이며 이 연동에서는 사용되지 않습니다. 전체 안내는 [GitHub 연동 → 자체 호스팅 설정](/github-integration#self-host-setup)을 참고하세요.

 ## 사용량 분석
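The stored object URL priority listed in these environment-variable pages (CloudFront domain first, then a custom endpoint in path-style, then AWS virtual-hosted-style with a path-style fallback for dotted bucket names) can be sketched as follows. `storedObjectUrl` and `StorageEnv` are hypothetical names; stripping a trailing slash from `AWS_ENDPOINT_URL` is an assumption, and the sketch does no key encoding.

```typescript
interface StorageEnv {
  S3_BUCKET: string;
  S3_REGION: string;
  CLOUDFRONT_DOMAIN?: string;
  AWS_ENDPOINT_URL?: string;
}

// Hypothetical helper following the documented priority order.
function storedObjectUrl(env: StorageEnv, key: string): string {
  // 1. CloudFront domain wins when set.
  if (env.CLOUDFRONT_DOMAIN) {
    return `https://${env.CLOUDFRONT_DOMAIN}/${key}`;
  }
  // 2. Custom S3-compatible endpoint: path-style.
  if (env.AWS_ENDPOINT_URL) {
    const base = env.AWS_ENDPOINT_URL.replace(/\/+$/, ""); // assumption: tolerate trailing slash
    return `${base}/${env.S3_BUCKET}/${key}`;
  }
  // 3. AWS: dotted bucket names fall back to path-style because the
  //    wildcard TLS certificate does not validate dotted bucket hosts.
  if (env.S3_BUCKET.includes(".")) {
    return `https://s3.${env.S3_REGION}.amazonaws.com/${env.S3_BUCKET}/${key}`;
  }
  return `https://${env.S3_BUCKET}.s3.${env.S3_REGION}.amazonaws.com/${key}`;
}

const minio = storedObjectUrl(
  { S3_BUCKET: "uploads", S3_REGION: "us-west-2", AWS_ENDPOINT_URL: "http://rustfs:9000" },
  "a/b.png",
);
```

A URL built this way for an internal endpoint such as `http://rustfs:9000` is not reachable from a browser, which is the case the download endpoint's proxy mode covers.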
--- a/apps/docs/content/docs/environment-variables.mdx
+++ b/apps/docs/content/docs/environment-variables.mdx
@@ -54,6 +54,7 @@ Multica supports two delivery backends — [Resend](https://resend.com/) for clo
 | `SMTP_PASSWORD` | empty | SMTP password |
 | `SMTP_TLS` | `starttls` | TLS mode. `implicit` (aliases `smtps`, `ssl`) forces an immediate TLS handshake on connect (SMTPS); port `465` auto-enables it. Unset / `starttls` upgrades via STARTTLS after connect |
 | `SMTP_TLS_INSECURE` | `false` | Set `true` to skip TLS certificate verification (private CA / self-signed only) |
+| `SMTP_EHLO_NAME` | machine hostname | EHLO/HELO name announced to the relay. Set a real FQDN when a strict relay (e.g. Google Workspace `smtp-relay.gmail.com`) rejects the default greeting from a public IP — otherwise the relay drops the connection and it surfaces as an opaque `EOF` on a later command |

 STARTTLS is upgraded automatically when the server advertises it. The dial timeout is 10s and the whole SMTP session has a 30s deadline, so a black-holed relay can't hang the auth handler.

@@ -85,15 +86,19 @@ Multica stores user-uploaded attachments (images and files in comments). **S3 is
 | `S3_REGION` | `us-west-2` | AWS region. Must match the bucket's actual region — it is used both for SDK signing and for building the public URL |
 | `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` | empty | Static credentials. When both are unset, the AWS SDK default credential chain is used (IAM role / environment credentials) |
 | `AWS_ENDPOINT_URL` | empty | Custom S3-compatible endpoint (for example [MinIO](https://min.io/)). Setting this switches to path-style URLs |
+| `ATTACHMENT_DOWNLOAD_MODE` | `auto` | Attachment download path: `auto`, `cloudfront`, `presign`, or `proxy`. In `auto`, CloudFront is preferred when fully configured; internal/private endpoint hosts use the server proxy; public S3-compatible endpoints use presigned GET URLs when supported |
+| `ATTACHMENT_DOWNLOAD_URL_TTL` | `30m` | TTL for CloudFront signed URLs and S3 presigned download URLs. Accepts Go duration strings |

 **When `S3_BUCKET` is unset**: the server logs `"S3_BUCKET not set, cloud upload disabled"` at startup, and all uploads fall back to local disk.

-**Public URLs** are constructed in this order of priority:
+**Stored object URLs** are constructed in this order of priority:

 1. `https://<CLOUDFRONT_DOMAIN>/<key>` if `CLOUDFRONT_DOMAIN` is set.
 2. `<AWS_ENDPOINT_URL>/<S3_BUCKET>/<key>` (path-style) if `AWS_ENDPOINT_URL` is set.
 3. `https://<S3_BUCKET>.s3.<S3_REGION>.amazonaws.com/<key>` (virtual-hosted-style). When `S3_BUCKET` contains dots, the server falls back to `https://s3.<S3_REGION>.amazonaws.com/<S3_BUCKET>/<key>` (path-style) because the AWS-issued wildcard TLS certificate does not validate dotted bucket hosts.

+API `download_url` values use `GET /api/attachments/{id}/download` unless CloudFront signing is configured. The endpoint redirects to CloudFront/S3 presigned URLs when safe, or streams through the server for private/internal endpoints such as `http://rustfs:9000`. For Docker/VPC-only object stores, set `ATTACHMENT_DOWNLOAD_MODE=proxy` explicitly if auto-detection is not conservative enough for your network.
+
 ### Local disk (when S3 is not configured)

 | Variable | Default | Description |
@@ -195,18 +200,24 @@ For a full explanation of how each parameter affects daemon behavior, see [Daemo

 ## GitHub integration

-The [GitHub PR ↔ issue integration](/github-integration) needs two variables. Set both to enable Connect GitHub in Settings and accept incoming webhooks.
+The [GitHub PR ↔ issue integration](/github-integration) needs two variables. Set both to enable Connect GitHub in Settings and accept incoming webhooks. Two additional variables are optional; when both are set, they populate the connected account name at install time.

 | Variable | Default | Description |
 |---|---|---|
 | `GITHUB_APP_SLUG` | empty | The slug of your GitHub App (the tail of `https://github.com/apps/<slug>`). Drives the Settings → GitHub install button URL |
 | `GITHUB_WEBHOOK_SECRET` | empty | The Webhook secret you set on the GitHub App. Used for HMAC-SHA256 verification of every `pull_request` / `installation` delivery, and as the HMAC key for the setup-callback state token |
+| `GITHUB_APP_ID` | empty | Optional. Numeric App ID from the App's settings page. Combined with `GITHUB_APP_PRIVATE_KEY`, lets the setup callback fetch the connected account name from GitHub immediately on install |
+| `GITHUB_APP_PRIVATE_KEY` | empty | Optional. Full PEM block of the App's RSA private key (including `-----BEGIN/END-----` lines, newlines preserved). Used to mint the short-lived JWT GitHub requires for App-authenticated REST calls |

-**Behavior when either is unset:**
+**Behavior when either of the required variables is unset:**

 - `Connect GitHub` in Settings → GitHub is **disabled** and shows a "not configured" hint to admins.
 - The `/api/webhooks/github` endpoint returns **`503 github webhooks not configured`** — Multica refuses to process events with no secret rather than treating every signature as valid.

+**Behavior when the optional `GITHUB_APP_ID` / `GITHUB_APP_PRIVATE_KEY` are unset:**
+
+- The connection card briefly shows `Connected to unknown` after install. Multica refreshes the row to the real org/user name as soon as GitHub delivers the `installation.created` webhook (typically within a few seconds), and broadcasts a realtime update so any open Settings → GitHub tab reflects the change without a manual refresh.
+
 **Note:** `GITHUB_WEBHOOK_SECRET` is reused as the signing key for the install-flow state token, so operators only need to manage one secret. It is **not** the GitHub App's *Client* secret — Client secrets are OAuth-related and not used by this integration. See [GitHub integration → Self-host setup](/github-integration#self-host-setup) for the full walkthrough.

 ## Usage analytics
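Reviewer sketch for the `environment-variables.mdx` hunk above: the three-step priority for stored object URLs, including the dotted-bucket fallback, can be written as a small pure function. This is only an illustration of the documented rules, not the server's actual code. The function name `stored_object_url`, its parameters, and the trailing-slash trimming on the endpoint are assumptions made for the example.

```python
def stored_object_url(key, bucket, region="us-west-2",
                      cloudfront_domain="", endpoint_url=""):
    """Build a stored object URL following the documented priority order.

    Hypothetical helper: names and signature are illustrative only.
    """
    # 1. CloudFront wins whenever CLOUDFRONT_DOMAIN is set.
    if cloudfront_domain:
        return f"https://{cloudfront_domain}/{key}"

    # 2. A custom S3-compatible endpoint switches to path-style URLs.
    #    Trimming a trailing slash is an assumption, not documented behavior.
    if endpoint_url:
        return f"{endpoint_url.rstrip('/')}/{bucket}/{key}"

    # 3a. Dotted bucket names fall back to path-style, because the
    #     wildcard TLS certificate does not validate dotted bucket hosts.
    if "." in bucket:
        return f"https://s3.{region}.amazonaws.com/{bucket}/{key}"

    # 3b. Default: virtual-hosted-style URL on AWS S3.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"


if __name__ == "__main__":
    print(stored_object_url("uploads/a.png", "my-bucket"))
    print(stored_object_url("uploads/a.png", "my.dotted.bucket"))
```

Keeping the decision in one pure function makes the priority order easy to check against the table: each branch corresponds to one numbered rule in the docs.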
--- a/apps/docs/content/docs/environment-variables.zh.mdx
+++ b/apps/docs/content/docs/environment-variables.zh.mdx
@@ -54,6 +54,7 @@ Multica 支持两种邮件发送通道——[Resend](https://resend.com/) 适合
 | `SMTP_PASSWORD` | 空 | SMTP 密码 |
 | `SMTP_TLS` | `starttls` | TLS 模式。`implicit`（别名 `smtps`、`ssl`）在连接时立即进行 TLS 握手（SMTPS）；`465` 端口会自动启用。未设置 / `starttls` 则在连接后通过 STARTTLS 升级 |
 | `SMTP_TLS_INSECURE` | `false` | 设为 `true` 跳过 TLS 证书校验（仅限私有 CA / 自签证书）|
+| `SMTP_EHLO_NAME` | 机器主机名 | 向 relay 通告的 EHLO/HELO 名称。当严格的 relay（例如 Google Workspace `smtp-relay.gmail.com`）拒绝来自公网 IP 的默认问候时，填一个真实的 FQDN——否则 relay 会直接断开连接，并在后续某条命令上表现为一个不知所云的 `EOF` |

 服务端 advertise STARTTLS 时会自动升级。dial 超时 10s，整个 SMTP 会话有 30s deadline，避免 relay 黑洞把 auth handler 挂死。

@@ -85,15 +86,19 @@ Multica 存储用户上传的附件（评论里的图片、文件等）。**优
 | `S3_REGION` | `us-west-2` | AWS 区域。必须和 bucket 所在区域一致——SDK 签名和公开 URL 都用它 |
 | `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` | 空 | 静态凭证。全未设时用 AWS SDK 默认凭证链（IAM role / 环境凭证）|
 | `AWS_ENDPOINT_URL` | 空 | 自定义 S3 兼容端点（例如 [MinIO](https://min.io/)）。设了会切到 path-style URL |
+| `ATTACHMENT_DOWNLOAD_MODE` | `auto` | 附件下载路径：`auto`、`cloudfront`、`presign` 或 `proxy`。`auto` 下 CloudFront 配完整时优先 CloudFront；内网/私有 endpoint host 走 server proxy；公网 S3 兼容 endpoint 在支持时走 presigned GET |
+| `ATTACHMENT_DOWNLOAD_URL_TTL` | `30m` | CloudFront signed URL 和 S3 presigned download URL 的有效期。使用 Go duration 格式 |

 **`S3_BUCKET` 未设时**：server 启动时打 info 日志 `"S3_BUCKET not set, cloud upload disabled"`，所有上传回落到本地磁盘。

-**公开 URL** 按优先级拼装：
+**对象存储 URL** 按优先级拼装：

 1. 设了 `CLOUDFRONT_DOMAIN` → `https://<CLOUDFRONT_DOMAIN>/<key>`
 2. 设了 `AWS_ENDPOINT_URL` → `<AWS_ENDPOINT_URL>/<S3_BUCKET>/<key>`（path-style）
 3. 默认走 AWS S3 → `https://<S3_BUCKET>.s3.<S3_REGION>.amazonaws.com/<key>`（virtual-hosted-style）。bucket 名含点时会回落到 `https://s3.<S3_REGION>.amazonaws.com/<S3_BUCKET>/<key>`（path-style），因为 AWS 通配证书无法覆盖含点 host。

+API 返回的 `download_url` 在未配置 CloudFront 签名时会指向 `GET /api/attachments/{id}/download`。这个端点会在安全时跳转到 CloudFront/S3 presigned URL；遇到 `http://rustfs:9000` 这类私有或内网 endpoint 时则由 server 流式转发。Docker/VPC 内部对象存储建议显式设置 `ATTACHMENT_DOWNLOAD_MODE=proxy`。
+
 ### 本地磁盘（S3 未配时）

 | 环境变量 | 默认值 | 说明 |
@@ -174,6 +179,9 @@ Multica 存储用户上传的附件（评论里的图片、文件等）。**优
 | `MULTICA_DAEMON_HEARTBEAT_INTERVAL` | `15s` | 心跳频率 |
 | `MULTICA_DAEMON_POLL_INTERVAL` | `3s` | 任务轮询频率 |
 | `MULTICA_DAEMON_MAX_CONCURRENT_TASKS` | `20` | 并发任务上限 |
+| `MULTICA_AGENT_TIMEOUT` | `0` | 单次任务的绝对墙钟上限；`0` = 不设上限，任务只受看门狗约束（活跃任务不会因为跑得久被杀）。想要硬性成本/资源天花板时再设一个正值 |
+| `MULTICA_AGENT_IDLE_WATCHDOG` | `30m` | 空闲看门狗：backend 持续静默（无消息、消息队列为空、且没有工具在途）这么久就 force-stop。`0` = 关闭整套看门狗 |
+| `MULTICA_AGENT_TOOL_WATCHDOG` | `2h` | 工具在途时的静默上限：某个工具调用发出后长时间无任何输出（疑似卡死的子进程）这么久就 force-stop。`0` = 关闭该兜底（在途工具永不被停）|
 | `MULTICA_<PROVIDER>_PATH` | 对应 CLI 名 | 各 AI 编程工具的可执行文件路径（如 `MULTICA_CLAUDE_PATH`）|
 | `MULTICA_<PROVIDER>_MODEL` | 空 | 各 AI 编程工具的默认模型 |

@@ -195,18 +203,24 @@ Multica 存储用户上传的附件（评论里的图片、文件等）。**优

 ## GitHub 集成

-[GitHub PR ↔ issue 集成](/github-integration) 依赖两个环境变量。两个都配上才会启用 Settings 里的 Connect GitHub 并接受 webhook。
+[GitHub PR ↔ issue 集成](/github-integration) 依赖两个必填环境变量。两个都配上才会启用 Settings 里的 Connect GitHub 并接受 webhook。另外两个可选变量用于在安装时直接拿到关联账号名。

 | 环境变量 | 默认值 | 说明 |
 |---|---|---|
 | `GITHUB_APP_SLUG` | 空 | 你的 GitHub App slug（`https://github.com/apps/<slug>` 的尾部）。Settings → GitHub 里安装按钮的跳转 URL 用它拼 |
 | `GITHUB_WEBHOOK_SECRET` | 空 | 你在 GitHub App 上设置的 Webhook secret。每条 `pull_request` / `installation` delivery 都用它做 HMAC-SHA256 校验；同一个值也用作 setup 回调里 state token 的签名密钥 |
+| `GITHUB_APP_ID` | 空 | 可选。App 设置页上的数字 App ID。配合 `GITHUB_APP_PRIVATE_KEY` 使用，让 setup 回调在安装那一刻直接从 GitHub 取到关联账号名 |
+| `GITHUB_APP_PRIVATE_KEY` | 空 | 可选。App RSA 私钥的完整 PEM 块（包含 `-----BEGIN/END-----` 两行，保留换行）。用于签发 GitHub App 鉴权 REST 调用所需的短效 JWT |

-**任一变量未设时：**
+**任一必填变量未设时：**

 - Settings → GitHub 里 `Connect GitHub` 按钮 **disable**，对 admin 显示「not configured」提示
 - `/api/webhooks/github` 直接返回 **`503 github webhooks not configured`**——secret 没配置时 Multica 拒绝处理任何 webhook 事件，而不是把所有签名当 valid

+**可选 `GITHUB_APP_ID` / `GITHUB_APP_PRIVATE_KEY` 未设时：**
+
+- 安装完成后，连接卡片会先短暂显示 `已连接到 unknown`。等 GitHub 的 `installation.created` webhook 到达（通常几秒内），Multica 会把 row 刷成真实的组织/用户名，并通过 realtime 推送让正在打开的 Settings → GitHub 页面无需手动刷新即可更新。
+
 **注意：** `GITHUB_WEBHOOK_SECRET` 同时被复用为 install 流程里 state token 的签名密钥，所以运维只需要维护一个 secret。它**不是** GitHub App 的 *Client* secret——Client secret 是 OAuth 用的，和本集成无关。完整配置流程见 [GitHub 集成 → Self-Host 配置](/github-integration#self-host-配置)。

 ## 用量统计
--- a/apps/docs/content/docs/getting-started/self-hosting.zh.mdx
+++ b/apps/docs/content/docs/getting-started/self-hosting.zh.mdx
@@ -51,7 +51,7 @@ cd multica
 make selfhost
 ```

-`make selfhost` automatically creates `.env`, generates a random `JWT_SECRET`, and starts all services via Docker Compose.
+`make selfhost` automatically creates `.env`, generates a random `JWT_SECRET` and Postgres password, and starts all services via Docker Compose.

 By default it pulls the latest stable release images from GHCR. To build the backend/web from your current checkout instead, run `make selfhost-build`.
 If the selected GHCR tag has not been published yet, `make selfhost` now tells you to fall back to `make selfhost-build`.
@@ -63,7 +63,7 @@ Once ready:
 - **Backend API:** http://localhost:8080

 <Callout>
-If you prefer running the Docker Compose steps manually: `cp .env.example .env`, edit `JWT_SECRET`, then `docker compose -f docker-compose.selfhost.yml pull && docker compose -f docker-compose.selfhost.yml up -d`.
+If you prefer running the Docker Compose steps manually: `cp .env.example .env`, edit `JWT_SECRET`, `POSTGRES_PASSWORD`, and the password segment in `DATABASE_URL`, then `docker compose -f docker-compose.selfhost.yml pull && docker compose -f docker-compose.selfhost.yml up -d`.
 </Callout>

 ### Step 2 — Log In
@@ -133,17 +133,54 @@ Alternatively, configure step by step: `multica config set server_url http://loc
 3. Go to **Settings → Agents** and create a new agent
 4. Create an issue and assign it to your agent

-## Usage Dashboard Rollup (Required)
+## Usage Dashboard Rollup

-Starting with `v0.3.5`, the Usage / Runtime dashboards read from a derived `task_usage_hourly` table populated by `rollup_task_usage_hourly()`. The bundled `pgvector/pgvector:pg17` image does **not** include `pg_cron`, and the backend doesn't run the rollup in-process either — until you schedule it yourself, the dashboard will stay at zero even though `task_usage` is populated.
+The Usage / Runtime dashboards read from a derived `task_usage_hourly` table populated by `rollup_task_usage_hourly()`. As of MUL-2957 the backend runs this rollup **in-process** on every replica via a DB-backed scheduler (`sys_cron_executions`). A fresh self-host install needs no operator action — the bundled `pgvector/pgvector:pg17` image works as-is, and you do **not** need to swap it for an image that ships `pg_cron`, register an external cron job, run a systemd timer, or schedule a Kubernetes `CronJob`.

-Pick one supported path before relying on the Usage / Runtime dashboard:
+Multiple backend replicas are safe: every replica ticks every 30 seconds and tries to claim the current 5-minute UTC plan, but the unique key `(job_name, scope_kind, scope_id, plan_time)` means only one wins each plan. Inspect the audit table to confirm steady-state operation:

- **External cron / systemd-timer / Kubernetes `CronJob`**: schedule `SELECT rollup_task_usage_hourly()` every 5 minutes. Idempotent, watermark-driven — overlapping or skipped ticks are safe.
- **Postgres with `pg_cron`**: swap the bundled Postgres image for one that ships `pg_cron`, set `shared_preload_libraries=pg_cron`, then `SELECT cron.schedule('rollup_task_usage_hourly', '*/5 * * * *', 'SELECT rollup_task_usage_hourly()')` once.
- **Backfill historical data**: required on the `v0.3.4 → v0.3.5+` upgrade path when the database already has `task_usage` rows — migration `103` is fail-closed and will abort `migrate up` with `refusing to drop legacy daily rollups: ...` until the hourly table is seeded. Run `./backfill_task_usage_hourly --sleep-between-slices=2s` inside the backend container, then re-run the upgrade and configure one of the schedules above.
+```sql
+SELECT plan_time, status, attempt, runner_id,
+       error_code, error_msg, started_at, finished_at
+  FROM sys_cron_executions
+ WHERE job_name = 'rollup_task_usage_hourly'
+ ORDER BY plan_time DESC
+ LIMIT 20;
+```

-Full reference (Compose + Kubernetes templates, flag descriptions, upgrade order) lives in [`SELF_HOSTING_ADVANCED.md → Usage Dashboard Rollup`](https://github.com/multica-ai/multica/blob/main/SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup).
+<Callout>
+**Upgrading from `v0.3.4` to `v0.3.5+`?** As of MUL-2957 the `migrate up` command runs an idempotent monthly-slice backfill automatically right before applying migration `103`, so the upgrade completes in a single invocation — no operator step required. If you are on a pre-MUL-2957 binary or the auto-hook fails for an environmental reason, run `backfill_task_usage_hourly` against the same database and re-run the upgrade. Full recovery flow lives in [`SELF_HOSTING_ADVANCED.md → Usage Dashboard Rollup`](https://github.com/multica-ai/multica/blob/main/SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup).
+</Callout>
+
+### Compatibility paths (existing deployments only)
+
+External schedulers — **`pg_cron` registered on the database, an external cron job, a systemd timer, or a Kubernetes `CronJob`** — that call `SELECT rollup_task_usage_hourly()` directly were the only option before MUL-2957 and remain a supported compatibility path. They are no longer the recommended setup; new deployments should rely on the in-process scheduler. The SQL function holds advisory lock 4246 internally, so the in-process scheduler and any pre-existing external schedule can coexist without ever double-writing the rollup.
+
+If you already have a `pg_cron` job in production and want to retire it, the safe sequence is:
+
+1. Confirm the in-process scheduler is healthy on at least one backend replica — recent SUCCESS rows should be landing in `sys_cron_executions` for `rollup_task_usage_hourly`:
+
+   ```sql
+   SELECT plan_time, status, runner_id, finished_at
+     FROM sys_cron_executions
+    WHERE job_name = 'rollup_task_usage_hourly'
+      AND status = 'SUCCESS'
+    ORDER BY plan_time DESC
+    LIMIT 5;
+   ```
+
+2. Once SUCCESS rows are arriving on schedule, unschedule the redundant `pg_cron` entry:
+
+   ```sql
+   SELECT cron.unschedule('rollup_task_usage_hourly')
+     FROM cron.job WHERE jobname = 'rollup_task_usage_hourly';
+   ```
+
+3. Leave the `pg_cron` extension itself installed unless you are sure no other workload depends on it. The bundled `pgvector/pgvector:pg17` image does **not** ship `pg_cron`, so nothing in Multica's default install needs it; uninstalling `pg_cron` from a custom image that other workloads still use is a separate decision.
+
+External cron / systemd timer / Kubernetes `CronJob` setups that call `SELECT rollup_task_usage_hourly()` directly can be retired the same way — once `sys_cron_executions` shows steady SUCCESS rows from the in-process scheduler, the external job is redundant and can be removed.
+
+Full reference (audit table semantics, advisory lock 4246, the standalone backfill command, flag descriptions, the migration auto-hook) lives in [`SELF_HOSTING_ADVANCED.md → Usage Dashboard Rollup`](https://github.com/multica-ai/multica/blob/main/SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup).

 ## Stopping Services

@@ -189,7 +226,8 @@ All configuration is done via environment variables. Copy `.env.example` as a st

 | Variable | Description | Example |
 |----------|-------------|---------|
-| `DATABASE_URL` | PostgreSQL connection string | `postgres://multica:multica@localhost:5432/multica?sslmode=disable` |
+| `DATABASE_URL` | PostgreSQL connection string. Keep the password segment in sync with `POSTGRES_PASSWORD`. | `postgres://multica:<postgres-password>@localhost:5432/multica?sslmode=disable` |
+| `POSTGRES_PASSWORD` | **Must change from default.** Password used by the bundled Postgres container. Keep it in sync with `DATABASE_URL`. | `openssl rand -hex 24` |
 | `JWT_SECRET` | **Must change from default.** Secret key for signing JWT tokens. Use a long random string. | `openssl rand -hex 32` |
 | `FRONTEND_ORIGIN` | URL where the frontend is served (used for CORS) | `https://app.example.com` |

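Reviewer sketch for the Usage Dashboard Rollup hunk above: the "every replica ticks, only one wins each 5-minute UTC plan" behavior can be illustrated with an in-memory stand-in for the unique key `(job_name, scope_kind, scope_id, plan_time)`. In the real system the docs say the database enforces that key on `sys_cron_executions`; here a dictionary plays that role. `plan_time`, `InMemoryExecutions`, `try_claim`, and the `"global"` scope values are hypothetical names chosen for the example.

```python
from datetime import datetime, timezone

PLAN_INTERVAL_SECONDS = 5 * 60  # the documented 5-minute plan window


def plan_time(now):
    """Round a timezone-aware datetime down to its 5-minute UTC boundary."""
    epoch = int(now.astimezone(timezone.utc).timestamp())
    return datetime.fromtimestamp(epoch - epoch % PLAN_INTERVAL_SECONDS,
                                  tz=timezone.utc)


class InMemoryExecutions:
    """Stand-in for a table with a unique key on the four claim columns."""

    def __init__(self):
        self._rows = {}

    def try_claim(self, job_name, scope_kind, scope_id, plan, runner_id):
        key = (job_name, scope_kind, scope_id, plan)
        if key in self._rows:
            # A real database would reject the insert with a
            # unique-violation; here the claim simply loses.
            return False
        self._rows[key] = runner_id
        return True


if __name__ == "__main__":
    table = InMemoryExecutions()
    now = datetime(2025, 1, 1, 12, 7, 31, tzinfo=timezone.utc)
    for runner in ("replica-a", "replica-b"):
        won = table.try_claim("rollup_task_usage_hourly", "global", "",
                              plan_time(now), runner)
        print(runner, "won" if won else "lost")
```

Because every replica derives the same `plan_time` from its own clock, ticks at slightly different moments inside one window still collide on the same key, which is what makes multiple replicas safe.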
--- a/apps/docs/content/docs/github-integration.ja.mdx
+++ b/apps/docs/content/docs/github-integration.ja.mdx
@@ -114,13 +114,20 @@ API サーバーで:
 ```dotenv
 GITHUB_APP_SLUG=multica-acme
 GITHUB_WEBHOOK_SECRET=<the webhook secret you generated>
+
+# 任意（推奨）— インストール時点で接続済みアカウント名を取得できるため、
+# 最初の webhook が届くまで待たなくて済みます:
+GITHUB_APP_ID=<App 設定ページに表示される数値の App ID>
+GITHUB_APP_PRIVATE_KEY=<BEGIN/END 行を含む完全な PEM ブロック>
 ```

-両方の変数が必須です。どちらかが欠けていると:
+`GITHUB_APP_SLUG` と `GITHUB_WEBHOOK_SECRET` は必須です。どちらかが欠けていると:

 - Settings の `Connect GitHub` が**無効**になり、「not configured」のヒントが表示されます。
 - `/api/webhooks/github` エンドポイントが **`503 github webhooks not configured`** を返します — Multica は secret なしでイベントを処理することを拒否し、すべての署名を黙って有効として扱うことはありません。

+`GITHUB_APP_ID` と `GITHUB_APP_PRIVATE_KEY` は**任意**です。これらを設定すると、setup コールバックが GitHub の App 認証された `/app/installations/{id}` エンドポイントを呼び出して、インストール時点で実際の組織名やユーザー名を取得できます。設定しない場合、接続カードには一時的に `Connected to unknown` と表示され、GitHub から `installation.created` webhook が届くと（通常は数秒以内に）Multica が行を更新し、リアルタイムブロードキャストを発行するため、開いている Settings タブは手動更新なしで反映されます。秘密鍵は App 設定ページの **Private keys → Generate a private key** で生成し、PEM ブロック全体（`-----BEGIN/END RSA PRIVATE KEY-----` の行を含む）を改行を保ったまま env 変数に貼り付けてください。
+
 `FRONTEND_ORIGIN` も設定されている必要があります（どのプロダクションのセルフホストでもすでに設定されています）。インストール後、setup コールバックがユーザーを `<FRONTEND_ORIGIN>/settings?tab=github` に戻します。

 env 変数を設定した後は API を再起動してください。
--- a/apps/docs/content/docs/github-integration.ko.mdx
+++ b/apps/docs/content/docs/github-integration.ko.mdx
@@ -114,13 +114,20 @@ API 서버에서:
 ```dotenv
 GITHUB_APP_SLUG=multica-acme
 GITHUB_WEBHOOK_SECRET=<the webhook secret you generated>
+
+# 선택(권장) — 설치 직후 연결된 계정 이름을 바로 확보합니다.
+# 설정하지 않으면 첫 webhook이 도착할 때까지 대기해야 합니다:
+GITHUB_APP_ID=<App 설정 페이지에 표시되는 숫자 App ID>
+GITHUB_APP_PRIVATE_KEY=<BEGIN/END 줄을 포함한 전체 PEM 블록>
 ```

-두 변수 모두 필수입니다. 둘 중 하나라도 누락되면:
+`GITHUB_APP_SLUG`와 `GITHUB_WEBHOOK_SECRET`은 필수입니다. 둘 중 하나라도 누락되면:

 - Settings의 `Connect GitHub`이 **비활성화**되고 "not configured" 힌트가 표시됩니다.
 - `/api/webhooks/github` 엔드포인트가 **`503 github webhooks not configured`**를 반환합니다 — Multica는 secret 없이 이벤트를 처리하기를 거부하며, 모든 서명을 조용히 유효한 것으로 취급하지 않습니다.

+`GITHUB_APP_ID`와 `GITHUB_APP_PRIVATE_KEY`는 **선택 사항**입니다. 설정하면 setup 콜백이 GitHub의 App 인증 `/app/installations/{id}` 엔드포인트를 호출해 설치 직후에 실제 조직명/사용자명을 가져옵니다. 설정하지 않으면 연결 카드에 잠시 `Connected to unknown`이 표시되며, GitHub의 `installation.created` 웹훅이 도착하면(보통 몇 초 이내) Multica가 행을 갱신하고 실시간 브로드캐스트를 보내므로 열려 있는 Settings 탭이 수동 새로고침 없이 업데이트됩니다. 비공개 키는 App 설정 페이지의 **Private keys → Generate a private key**에서 생성한 뒤, PEM 블록 전체(`-----BEGIN/END RSA PRIVATE KEY-----` 줄 포함)를 줄바꿈을 유지한 채 env 값에 붙여넣으세요.
+
 `FRONTEND_ORIGIN`도 설정되어 있어야 합니다(어떤 프로덕션 자체 호스팅이든 이미 설정되어 있습니다). 설치 후 setup 콜백이 사용자를 `<FRONTEND_ORIGIN>/settings?tab=github`으로 다시 돌려보냅니다.

 env 변수를 설정한 후 API를 재시작하세요.
--- a/apps/docs/content/docs/github-integration.mdx
+++ b/apps/docs/content/docs/github-integration.mdx
@@ -114,13 +114,20 @@ On the API server:
 ```dotenv
 GITHUB_APP_SLUG=multica-acme
 GITHUB_WEBHOOK_SECRET=<the webhook secret you generated>
+
+# Optional but recommended — populates the connected account name on
+# install instead of waiting for the first webhook to refresh it:
+GITHUB_APP_ID=<numeric App ID from the App's settings page>
+GITHUB_APP_PRIVATE_KEY=<full PEM block, including BEGIN/END lines>
 ```

-Both variables are required. If either is missing:
+`GITHUB_APP_SLUG` and `GITHUB_WEBHOOK_SECRET` are required. If either is missing:

 - `Connect GitHub` in Settings is **disabled** and shows a "not configured" hint.
 - The `/api/webhooks/github` endpoint returns **`503 github webhooks not configured`** — Multica refuses to process events with no secret, rather than silently treating every signature as valid.

+`GITHUB_APP_ID` and `GITHUB_APP_PRIVATE_KEY` are **optional**. They let the setup callback call GitHub's App-authenticated `/app/installations/{id}` endpoint to fetch the real organization or user name during install. Without them, the connection card briefly shows `Connected to unknown` until GitHub delivers the `installation.created` webhook (typically within a few seconds), at which point Multica refreshes the row and broadcasts a realtime update so any open Settings tab updates without a manual refresh. Generate the private key under **Private keys → Generate a private key** on the App's settings page; paste the full PEM block (including the `-----BEGIN/END RSA PRIVATE KEY-----` lines) into the env var, preserving newlines.
+
 `FRONTEND_ORIGIN` must also be set (it already is for any production self-host); the setup callback bounces the user back to `<FRONTEND_ORIGIN>/settings?tab=github` after install.

 Restart the API after setting the env vars.
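Reviewer sketch for the `github-integration.mdx` hunk above: the docs say every delivery is checked with HMAC-SHA256 and that the endpoint refuses to process events when no secret is configured. A minimal version of that check might look like the following. It assumes the signature arrives in GitHub's `X-Hub-Signature-256` header in the form `sha256=<hex digest>`. The names `verify_github_signature` and `WebhookNotConfigured` are hypothetical, and mapping the exception to the documented `503` response is left to the caller.

```python
import hashlib
import hmac


class WebhookNotConfigured(Exception):
    """Raised when no webhook secret is set (hypothetical exception name)."""


def verify_github_signature(secret, body, signature_header):
    """Return True when signature_header matches HMAC-SHA256(secret, body).

    `body` must be the raw request bytes, before any JSON parsing.
    """
    if not secret:
        # Fail closed: without a secret no signature can be trusted.
        raise WebhookNotConfigured("github webhooks not configured")

    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest

    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature_header.encode())


if __name__ == "__main__":
    payload = b'{"action":"created"}'
    header = "sha256=" + hmac.new(b"s3cret", payload, hashlib.sha256).hexdigest()
    print(verify_github_signature("s3cret", payload, header))
```

Raising on an empty secret, rather than returning `False`, keeps "misconfigured server" distinguishable from "bad signature", which matches the separate `503` behavior the docs describe.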
--- a/apps/docs/content/docs/github-integration.zh.mdx
+++ b/apps/docs/content/docs/github-integration.zh.mdx
@@ -114,13 +114,19 @@ API server 上：
 ```dotenv
 GITHUB_APP_SLUG=multica-acme
 GITHUB_WEBHOOK_SECRET=<你刚生成的 webhook secret>
+
+# 可选但建议配置——安装完成时直接拿到关联账号名，而不是等第一次 webhook 才刷新：
+GITHUB_APP_ID=<App 设置页上的数字 App ID>
+GITHUB_APP_PRIVATE_KEY=<完整 PEM 块，包含 BEGIN/END 行>
 ```

-两个都必填。任何一个缺失：
+`GITHUB_APP_SLUG` 和 `GITHUB_WEBHOOK_SECRET` 必填。任何一个缺失：

 - Settings 里 `Connect GitHub` 按钮会被 **disable**，并显示「not configured」提示
 - `/api/webhooks/github` 直接返回 **`503 github webhooks not configured`**——Multica 在 secret 没配置时拒绝处理事件，不会出现「没 secret 也接受 webhook」的安全坑

+`GITHUB_APP_ID` 和 `GITHUB_APP_PRIVATE_KEY` **可选**。配上之后，setup 回调可以用 App JWT 鉴权调用 GitHub `/app/installations/{id}`，安装完成那一刻就拿到真实的组织名/用户名。不配的话，连接卡片会先显示 `已连接到 unknown`，等 GitHub 的 `installation.created` webhook 到达（通常几秒内），Multica 会刷新 row 并通过 realtime 推送让 Settings 页面无需手动刷新即可更新。私钥从 App 设置页 **Private keys → Generate a private key** 生成，把整段 PEM（含 `-----BEGIN/END RSA PRIVATE KEY-----` 两行）粘到 env 里，保留换行。
+
 `FRONTEND_ORIGIN` 也必须设置（任何生产 self-host 都已经设了）——setup 回调结束后用它把用户跳回 `<FRONTEND_ORIGIN>/settings?tab=github`。

 设完 env 重启 API。
--- a/apps/docs/content/docs/install-agent-runtime.ja.mdx
+++ b/apps/docs/content/docs/install-agent-runtime.ja.mdx
@@ -48,7 +48,7 @@ multica daemon restart

 ### Codex (OpenAI)

-よりきめ細かい承認ゲートを備えた JSON-RPC 2.0 のトランスポートです。**セッション再開のコードは存在しますが、現在は到達できません** — 再開が必要な場合は Claude Code か ACP 系列のいずれかを選んでください。
+よりきめ細かい承認ゲートを備えた JSON-RPC 2.0 のトランスポートです。**セッション再開は動作します** — Multica は Codex app-server の `thread/resume` で再開し、古いまたは存在しない thread では新しい thread にフォールバックします。

 | | |
 |---|---|
@@ -58,7 +58,7 @@ multica daemon restart

 ### Cursor (Anysphere)

-Cursor エディタに対応する CLI です。**セッション再開は動作しません** — Cursor の CLI がセッション id を返さないため、再開時に渡す値は常に無効です。
+Cursor エディタに対応する CLI です。**セッション再開は動作します** — 現在の Cursor Agent は stream-json イベントで `session_id` を返し、Multica は次回実行時に `--resume <id>` でそれを渡します。

 | | |
 |---|---|
--- a/apps/docs/content/docs/install-agent-runtime.ko.mdx
+++ b/apps/docs/content/docs/install-agent-runtime.ko.mdx
@@ -48,7 +48,7 @@ multica daemon restart

 ### Codex (OpenAI)

-더 세분화된 승인 게이트를 갖춘 JSON-RPC 2.0 전송 방식입니다. MCP 구성은 작업별 `$CODEX_HOME/config.toml`에 기록됩니다. **세션 재개 코드는 존재하지만 현재 도달할 수 없습니다** — 재개가 필요하다면 Claude Code 또는 ACP 계열 중 하나를 선택하세요.
+더 세분화된 승인 게이트를 갖춘 JSON-RPC 2.0 전송 방식입니다. MCP 구성은 작업별 `$CODEX_HOME/config.toml`에 기록됩니다. **세션 재개가 동작합니다** — Multica는 Codex app-server의 `thread/resume`으로 재개하며, 오래되었거나 없는 thread는 새 thread로 폴백합니다.

 | | |
 |---|---|
@@ -58,7 +58,7 @@ multica daemon restart

 ### Cursor (Anysphere)

-Cursor 에디터에 대응하는 CLI입니다. **세션 재개가 작동하지 않습니다** — Cursor의 CLI가 세션 id를 반환하지 않으므로 재개 시 전달하는 값은 항상 유효하지 않습니다.
+Cursor 에디터에 대응하는 CLI입니다. **세션 재개가 동작합니다** — 현재 Cursor Agent는 stream-json 이벤트에서 `session_id`를 반환하고, Multica는 다음 실행 때 이를 `--resume <id>`로 전달합니다.

 | | |
 |---|---|
--- a/apps/docs/content/docs/install-agent-runtime.mdx
+++ b/apps/docs/content/docs/install-agent-runtime.mdx
@@ -48,7 +48,7 @@ The most complete integration. Session resumption works, MCP works, and it consu

 ### Codex (OpenAI)

-JSON-RPC 2.0 transport with finer-grained approval gates. MCP config is written into the per-task `$CODEX_HOME/config.toml`. **Session resumption code exists but is currently unreachable** — pick Claude Code or one of the ACP family if you need resume.
+JSON-RPC 2.0 transport with finer-grained approval gates. MCP config is written into the per-task `$CODEX_HOME/config.toml`. **Session resumption works** through Codex app-server `thread/resume`; stale or missing threads fall back to a fresh thread.

 | | |
 |---|---|
@@ -58,7 +58,7 @@ JSON-RPC 2.0 transport with finer-grained approval gates. MCP config is written

 ### Cursor (Anysphere)

-The CLI counterpart to the Cursor editor. **Session resumption is broken** — Cursor's CLI doesn't return a session id, so the value you pass on resume is always invalid.
+The CLI counterpart to the Cursor editor. **Session resumption works** with current Cursor Agent releases: Multica reads `session_id` from the stream-json events and passes it back with `--resume <id>`.

 | | |
 |---|---|
--- a/apps/docs/content/docs/install-agent-runtime.zh.mdx
+++ b/apps/docs/content/docs/install-agent-runtime.zh.mdx
@@ -48,7 +48,7 @@ multica daemon restart

 ### Codex（OpenAI）

-JSON-RPC 2.0 传输，审批粒度更细。MCP 配置会写入单次任务的 `$CODEX_HOME/config.toml`。**会话续接的代码在，但调不到** —— 要续接的话选 Claude Code 或 ACP 系列。
+JSON-RPC 2.0 传输，审批粒度更细。MCP 配置会写入单次任务的 `$CODEX_HOME/config.toml`。**会话续接可用**——Multica 通过 Codex app-server 的 `thread/resume` 续接；thread 过期或不存在时会回退到新 thread。

 | | |
 |---|---|
@@ -58,7 +58,7 @@ JSON-RPC 2.0 传输，审批粒度更细。MCP 配置会写入单次任务的 `$

 ### Cursor（Anysphere）

-Cursor 编辑器的 CLI 对应物。**会话续接是坏的** —— Cursor CLI 不返回 session id，你传过去的续接 id 永远无效。
+Cursor 编辑器的 CLI 对应物。**会话续接可用**——当前 Cursor Agent 会在 stream-json 事件里返回 `session_id`，Multica 会在下一次运行时用 `--resume <id>` 传回去。

 | | |
 |---|---|
--- a/apps/docs/content/docs/lark-bot-integration.ja.mdx
+++ b/apps/docs/content/docs/lark-bot-integration.ja.mdx
@@ -0,0 +1,95 @@
+---
+title: Lark Bot 連携
+description: Multica エージェントを Lark（飞书）Bot に紐づければ、Lark の DM やグループからそのまま対話できます——@ でメンションして自然に話しかけたり、/issue と入力して Lark を離れずに Multica イシューを起票したりできます。
+---
+
+import { Callout } from "fumadocs-ui/components/callout";
+
+任意の[エージェント](/agents)を Lark（飞书）Bot に紐づければ、チームは Lark の中から直接それを使えます——Bot に DM したり、グループで @ メンションしたり、`/issue` と入力してアプリを開かずに [Multica イシュー](/issues)を起票したりできます。エージェントの返信は、作業の進行に合わせて更新されるライブカードとしてチャットに戻ってきます。
+
+各 Bot は 1 つの Multica エージェントと **1 対 1** で紐づきます。2 つ目のエージェントを紐づけると 2 つ目の Bot が作られます。1 つのエージェントが 2 つの Bot を持つことはありません。
+
+## この連携でできること
+
+| 場所 | 動作 |
+|---|---|
+| **エージェント → 連携** | エージェント詳細ページには **連携（Integrations）** タブがあります（左サイドバーにも対応する区画があります）。owner と admin はそこに **Lark に紐づける** が表示され、紐づけると **Lark に接続済み** バッジと **Lark で管理** リンクに切り替わります。 |
+| **Bot に DM** | ワークスペースメンバーが Lark の中で Bot に直接メッセージを送ります。各会話はそのエージェントとの Multica [chat](/chat) セッションになり、エージェントはスレッド内で返信します。 |
+| **グループで @ メンション** | Bot を Lark グループに追加して @ メンションします。読み取られるのはメンションしたメッセージだけで、Bot はグループ全体を聞いているわけではありません。 |
+| **`/issue` コマンド** | `/issue <タイトル>`（本文を続けてもよい）と入力すると、ワークスペースに新しい Multica イシューが作られ、あなたの名義になります。 |
+| **ライブ返信カード** | Bot はインタラクティブなカードを投稿し、エージェントの実行に合わせて更新し続けます——進捗、最終的な回答、あるいはエラーが反映されます。 |
+
+## エージェントを紐づける（owner / admin）
+
+紐づけはスキャンしてインストールするフローです——アプリのシークレットをコピーする必要も、開発者コンソールでの操作も不要です。
+
+1. **Agents → あなたのエージェント** からそのエージェントを開きます。
+2. **連携（Integrations）** タブ（または左サイドバーの **連携** 区画）を開き、**Lark に紐づける** をクリックします。
+3. QR コードが表示されます。スマートフォンで **Lark → スキャン** を開き、新しい PersonalAgent Bot を認可します。
+4. スキャンが完了するとダイアログが閉じ、エージェントに **Lark に接続済み** と表示されます。あなた自身の Lark アイデンティティは自動であなたの Multica アカウントに紐づくので、すぐに Bot と対話を始められます。
+
+<Callout type="info">
+QR は使い切りで、短い時間が過ぎると失効します。認可する前に失効してしまったら、**もう一度スキャン** をクリックして新しいコードを取得してください。
+</Callout>
+
+エージェントが接続されると、**Lark に紐づける** ボタンは **Lark で管理** リンクに置き換わります。スコープの調整、名前の変更、追加の権限の申請が必要なときは、これを使って Lark 内の Bot のアプリページを開いてください——再スキャンは意図的に無効化されており、既存の Bot を取り残してしまわないようにしています。
+
+## Bot を使う（メンバー）
+
+### 最初のメッセージ：Lark アイデンティティを紐づける
+
+初めて Bot にメッセージを送ると、Bot は **Lark アイデンティティを紐づける** よう促すカードで返信します。リンクをタップして Multica にサインインすると、あなたの Lark アカウントがあなたの Multica メンバーシップに紐づきます。これによって、エージェントがあなたとして振る舞えるようになります——たとえば `/issue` はあなたの名義でイシューを起票します。
+
+<Callout type="warning">
+Bot を使えるのは **ワークスペースのメンバー** だけです。メンバーでない場合や、アイデンティティの紐づけをスキップした場合、Bot は返信しません——あなたのメッセージは破棄されます（内容は保存せず、監査のために記録されます）。
+</Callout>
+
+### 対話と `/issue`
+
+- **エージェントに何でも聞く** —— Bot に DM するか、グループで @ メンションします。会話は通常のエージェント chat セッションで、エージェントはカードの中で返信します。
+- **イシューを起票する** —— `/issue Fix the login redirect` と送れば、Multica は新しいイシューを作るのと同じやり方でそのイシューをワークスペースに作ります。タイトルの後ろに行を足せば、それが説明になります。
+- **作業を見守る** —— 返信カードはエージェントの実行に合わせて自身を更新するので、進捗と結果がその場で見えます。
+
+エージェントが **オフライン**（ランタイムが接続されていない）または **アーカイブ済み** の場合、Bot はメッセージを黙って破棄するのではなく、短いステータス通知で返信します。
+
+## 管理と切断
+
+ワークスペース全体の管理は **設定 → 連携** にあります。
+
+- **接続済みの Bot** は、ワークスペース内のすべての Bot と、それぞれが紐づくエージェントを一覧表示します。この一覧はすべてのメンバーから見えます。
+- **切断** は **owner / admin 専用** です。切断すると Bot は Lark メッセージの受信を停止し、その接続が破棄されます。インストール記録は監査のために保持され、あとで同じエージェントを再び紐づけられます。
+
+## 権限
+
+- **紐づけ / 切断** にはワークスペースの **owner** または **admin** が必要です。member には接続済み Bot 一覧は見えますが、紐づけや切断の操作は見えません。
+- **Bot との対話** には、Lark アイデンティティを紐づけたワークスペースメンバーであることが必要です。それ以外の人のメッセージは一律に破棄されます。
+- この連携は破棄されたメッセージの本文を保存することはありません——監査のために破棄理由だけを記録します。
+
+## セルフホストのセットアップ
+
+Multica Cloud では連携はすでに利用可能です——このセクションは飛ばしてください。
+
+セルフホストの場合、**保存時の暗号化キーを設定するまで Lark はオフ** です。このキーは、各 Bot の app secret がデータベースに触れる前にそれを暗号化します。
+
+1. 32 バイトのキーを生成し、API サーバーに設定します。
+
+   ```dotenv
+   MULTICA_LARK_SECRET_KEY=<base64-encoded 32-byte key>
+   ```
+
+2. API を再起動します。キーを設定するまで、**設定 → 連携** には「Lark integration not enabled」という通知が表示され、**Lark に紐づける** のエントリポイントは非表示のままになります。
+
+<Callout type="info">
+**Feishu と Lark 国際版の両対応。** 各 Bot がどのクラウド（中国大陸の Feishu = `open.feishu.cn`、国際版 Lark = `open.larksuite.com`）に属するかは、QR コードをスキャンした時点で自動的に判定され、そのインストールに保存されて、その Bot へのすべての呼び出しに使われます。1 つのデプロイで両方を同時に提供できるため、どちらのテナントのチームも追加設定なしでバインドできます。
+
+`MULTICA_LARK_HTTP_BASE_URL` / `MULTICA_LARK_CALLBACK_BASE_URL` は、デプロイ全体を上書きする任意のオプション（プロキシやモック用）としてのみ残っています。通常運用では未設定のままにして、各インストールがそれぞれのクラウドに到達するようにしてください。
+
+**単一クラウド構成からのアップグレード？** これらを `https://open.larksuite.com` に設定して国際版 Lark を運用していた場合、アップグレード後の初回起動時にサーバーが既存のインストールを Lark リージョンへ付け替えるので、その後はこの上書きを外せます。中国大陸の Feishu デプロイでは操作は不要です。
+</Callout>
+
+## 次に
+
+- [エージェント](/agents) — 各 Bot はちょうど 1 つのエージェントに紐づきます
+- [Chat](/chat) — Bot の会話が Multica 内で対応するもの
+- [イシュー](/issues) — `/issue` が作るもの
+- [環境変数](/environment-variables) — セルフホスト構成の完全なリファレンス
--- a/apps/docs/content/docs/lark-bot-integration.ko.mdx
+++ b/apps/docs/content/docs/lark-bot-integration.ko.mdx
@@ -0,0 +1,95 @@
+---
+title: Lark Bot 연동
+description: Multica 에이전트를 Lark(飞书) 봇에 바인딩하면, Lark에서 직접 대화할 수 있습니다 — 개인 메시지나 그룹에서 @로 멘션하거나, 자연스럽게 대화하거나, /issue를 입력해 Lark를 벗어나지 않고 Multica 이슈를 생성하세요.
+---
+
+import { Callout } from "fumadocs-ui/components/callout";
+
+아무 [에이전트](/agents)나 Lark(飞书) 봇에 바인딩하면, 팀이 Lark 안에서 바로 그 에이전트를 사용할 수 있습니다 — 봇에게 개인 메시지를 보내거나, 그룹에서 `@`로 멘션하거나, `/issue`를 입력해 앱을 열지 않고도 [Multica 이슈](/issues)를 생성하세요. 에이전트의 답변은 실시간 카드로 채팅에 돌아오며, 작업이 진행되는 동안 계속 업데이트됩니다.
+
+각 봇은 하나의 Multica 에이전트와 **일대일**로 바인딩됩니다. 두 번째 에이전트를 바인딩하면 두 번째 봇이 생성되며, 하나의 에이전트가 두 개의 봇을 갖는 일은 없습니다.
+
+## 연동이 하는 일
+
+| 위치 | 동작 |
+|---|---|
+| **에이전트 → Integrations** | 에이전트 상세 페이지에 **Integrations** 탭이 있습니다(왼쪽 사이드바에도 대응하는 섹션이 있습니다). owner와 admin에게는 여기에 **Bind to Lark**가 보이며, 바인딩되면 **Connected to Lark** 배지와 **Manage in Lark** 링크로 바뀝니다. |
+| **봇에게 개인 메시지** | 워크스페이스 멤버가 Lark에서 봇에게 직접 메시지를 보냅니다. 각 대화는 그 에이전트와의 Multica [chat](/chat) 세션이 되며, 에이전트는 해당 스레드에서 답변합니다. |
+| **그룹에서 `@` 멘션** | 봇을 Lark 그룹에 추가하고 `@`로 멘션하세요. 멘션한 메시지만 읽으며, 봇이 그룹 전체를 듣지는 않습니다. |
+| **`/issue` 명령** | `/issue <제목>`(본문 추가 가능)을 입력하면 워크스페이스에 새 Multica 이슈가 생성되고, 당신 이름으로 귀속됩니다. |
+| **실시간 답변 카드** | 봇은 인터랙티브 카드를 게시하고 에이전트가 실행되는 동안 계속 갱신합니다 — 진행 상황, 최종 답변, 또는 오류. |
+
+## 에이전트 바인딩하기 (owner / admin)
+
+바인딩은 스캔하여 설치하는 방식입니다 — 복사할 앱 시크릿도, 개발자 콘솔 작업도 없습니다.
+
+1. **Agents → _당신의 에이전트_**에서 에이전트를 엽니다.
+2. **Integrations** 탭으로 이동한 뒤(또는 왼쪽 사이드바의 **Integrations** 섹션을 사용해) **Bind to Lark**를 클릭합니다.
+3. QR 코드가 나타납니다. 휴대폰에서 **Lark → 스캔**을 열고, 새로 생긴 PersonalAgent 봇을 인증하세요.
+4. 스캔이 완료되면 대화상자가 닫히고 에이전트에 **Connected to Lark**가 표시됩니다. 당신의 Lark 신원이 자동으로 Multica 계정에 바인딩되므로, 곧바로 봇과 대화를 시작할 수 있습니다.
+
+<Callout type="info">
+QR 코드는 일회용이며 짧은 시간 후에 만료됩니다. 인증하기 전에 만료되면 **Scan again**을 클릭해 새 코드를 받으세요.
+</Callout>
+
+에이전트가 연결되면 **Bind to Lark** 버튼이 **Manage in Lark** 링크로 바뀝니다. 권한 범위를 조정하거나, 이름을 바꾸거나, 추가 권한을 요청해야 할 때 이 링크로 Lark에서 봇의 앱 페이지를 여세요 — 기존 봇이 고아가 되지 않도록 재스캔은 의도적으로 비활성화되어 있습니다.
+
+## 봇 사용하기 (멤버)
+
+### 첫 메시지: Lark 신원 바인딩하기
+
+봇에게 처음 메시지를 보내면, **Lark 신원을 바인딩**하라는 카드로 답합니다. 링크를 탭하고 Multica에 로그인하면, 당신의 Lark 계정이 Multica 멤버십에 연결됩니다. 바로 이 단계가 에이전트로 하여금 당신을 대신해 행동하게 합니다 — 예를 들어 `/issue`는 이슈를 당신 이름으로 생성합니다.
+
+<Callout type="warning">
+**워크스페이스 멤버**만 봇을 사용할 수 있습니다. 멤버가 아니거나 신원 바인딩을 건너뛰면 봇은 응답하지 않으며, 메시지는 폐기됩니다(감사 목적으로 기록되며, 내용은 저장하지 않습니다).
+</Callout>
+
+### 대화와 `/issue`
+
+- **무엇이든 에이전트에게 물어보기** — 봇에게 개인 메시지를 보내거나 그룹에서 `@`로 멘션하세요. 이 대화는 일반적인 에이전트 chat 세션이며, 에이전트는 카드에서 답변합니다.
+- **이슈 생성** — `/issue 로그인 리디렉션 수정`을 보내면 Multica가 워크스페이스에 그 이슈를 생성하며, 새 이슈가 으레 할당되는 방식 그대로 처리됩니다. 제목 뒤에 줄을 더 추가하면 설명이 됩니다.
+- **작업 지켜보기** — 답변 카드는 에이전트가 실행되는 동안 스스로 갱신되므로, 진행 상황과 결과를 그 자리에서 볼 수 있습니다.
+
+에이전트가 **오프라인**(런타임이 연결되지 않음)이거나 **보관됨** 상태라면, 봇은 메시지를 조용히 폐기하는 대신 짧은 상태 안내로 답합니다.
+
+## 관리 및 연결 해제
+
+워크스페이스 전체 관리는 **설정 → Integrations**에 있습니다.
+
+- **Connected bots**는 워크스페이스 내 모든 봇과 각 봇이 바인딩된 에이전트를 나열합니다. 이 목록은 모든 멤버에게 보입니다.
+- **Disconnect**는 **owner / admin 전용**입니다. 연결을 해제하면 봇이 Lark 메시지 수신을 멈추고 연결이 해체됩니다. 설치 기록은 감사용으로 유지되며, 이후 같은 에이전트를 다시 바인딩할 수 있습니다.
+
+## 권한
+
+- **바인딩 / 연결 해제**에는 워크스페이스 **owner** 또는 **admin**이 필요합니다. 멤버에게는 connected-bots 목록은 보이지만 바인딩이나 연결 해제 컨트롤은 보이지 않습니다.
+- **봇과 대화하기**에는 Lark 신원이 바인딩된 워크스페이스 멤버여야 합니다. 그 외의 사람은 모두 폐기됩니다.
+- 연동은 폐기된 메시지의 본문을 절대 저장하지 않으며 — 감사용 폐기 사유만 기록합니다.
+
+## 자체 호스팅 설정
+
+Multica Cloud에서는 연동이 이미 사용 가능합니다 — 이 섹션은 건너뛰세요.
+
+자체 호스팅의 경우, **at-rest 암호화 키를 설정하기 전까지 Lark는 꺼져 있습니다**. 이 키는 각 봇의 앱 시크릿이 데이터베이스에 닿기 전에 암호화합니다.
+
+1. 32바이트 키를 생성해 API 서버에 설정합니다.
+
+   ```dotenv
+   MULTICA_LARK_SECRET_KEY=<base64-encoded 32-byte key>
+   ```
+
+2. API를 재시작하세요. 키가 설정되기 전까지 **설정 → Integrations**에는 "Lark integration not enabled" 안내가 표시되고, **Bind to Lark** 진입점은 숨겨진 채로 유지됩니다.
+
+<Callout type="info">
+**Feishu와 Lark 국제판을 동시에 지원.** 각 Bot이 어느 클라우드(중국 본토 Feishu = `open.feishu.cn`, 국제판 Lark = `open.larksuite.com`)에 속하는지는 QR 코드를 스캔할 때 자동으로 감지되어 해당 설치에 저장되고, 그 Bot에 대한 모든 호출에 사용됩니다. 하나의 배포로 둘을 동시에 제공하므로, 어느 테넌트의 팀이든 추가 설정 없이 바인딩할 수 있습니다.
+
+`MULTICA_LARK_HTTP_BASE_URL` / `MULTICA_LARK_CALLBACK_BASE_URL`는 배포 전체를 덮어쓰는 선택적 오버라이드(프록시나 mock용)로만 남아 있습니다. 일반 운영에서는 설정하지 않은 채로 두어 각 설치가 자기 클라우드에 도달하도록 하세요.
+
+**단일 클라우드 구성에서 업그레이드하나요?** 이 변수들을 `https://open.larksuite.com`으로 설정해 국제판 Lark를 운영했다면, 업그레이드 후 첫 부팅 시 서버가 기존 설치를 Lark 리전으로 다시 표시하므로 이후에는 오버라이드를 지울 수 있습니다. 중국 본토 Feishu 배포에서는 별도 작업이 필요 없습니다.
+</Callout>
+
+## 다음
+
+- [에이전트](/agents) — 각 봇은 정확히 하나의 에이전트에 바인딩됩니다
+- [Chat](/chat) — 봇 대화가 Multica 내부에서 무엇에 대응하는지
+- [이슈](/issues) — `/issue`가 생성하는 것
+- [환경 변수](/environment-variables) — 전체 자체 호스팅 구성 참조
--- a/apps/docs/content/docs/lark-bot-integration.mdx
+++ b/apps/docs/content/docs/lark-bot-integration.mdx
@@ -0,0 +1,95 @@
+---
+title: Lark Bot integration
+description: Bind a Multica agent to a Lark (飞书) Bot, then talk to it from a Lark DM or group — @-mention it, chat naturally, or type /issue to file a Multica issue without leaving Lark.
+---
+
+import { Callout } from "fumadocs-ui/components/callout";
+
+Bind any [agent](/agents) to a Lark (飞书) Bot and your team can work with it from inside Lark — DM the Bot, @-mention it in a group, or type `/issue` to file a [Multica issue](/issues) without opening the app. The agent's replies stream back into the chat as a live card that updates while it works.
+
+Each Bot is bound **one-to-one** to a single Multica agent. Binding a second agent creates a second Bot; one agent never has two Bots.
+
+## What the integration does
+
+| Surface | Behavior |
+|---|---|
+| **Agent → Integrations** | The agent detail page has an **Integrations** tab (and a matching section in the left sidebar). Owners and admins see **Bind to Lark** there; once bound it flips to a **Connected to Lark** badge with a **Manage in Lark** link. |
+| **DM the Bot** | A workspace member messages the Bot directly in Lark. Each conversation becomes a Multica [chat](/chat) session with the agent; the agent answers in-thread. |
+| **@-mention in a group** | Add the Bot to a Lark group and @-mention it. Only the mentioning message is read — the Bot does not listen to the whole group. |
+| **`/issue` command** | Typing `/issue <title>` (optionally with a body) creates a new Multica issue in the workspace, attributed to you. |
+| **Live reply card** | The Bot posts an interactive card and keeps patching it as the agent runs — progress, the final answer, or an error. |
+
+## Bind an agent (owner / admin)
+
+Binding uses a scan-to-install flow — no app secrets to copy, no developer console steps.
+
+1. Open the agent in **Agents → _your agent_**.
+2. Go to the **Integrations** tab (or use the **Integrations** section in the left sidebar) and click **Bind to Lark**.
+3. A QR code appears. On your phone, open **Lark → Scan**, then authorize the new PersonalAgent Bot.
+4. When the scan completes, the dialog closes and the agent shows **Connected to Lark**. Your own Lark identity is bound to your Multica account automatically, so you can start chatting with the Bot right away.
+
+<Callout type="info">
+The QR is single-use and expires after a short window. If it lapses before you authorize, click **Scan again** for a fresh code.
+</Callout>
+
+Once an agent is connected, the **Bind to Lark** button is replaced by a **Manage in Lark** link. Use it to open the Bot's app page in Lark when you need to adjust scopes, rename it, or request additional permissions — re-scanning is intentionally disabled so you don't strand the existing Bot.
+
+## Use the Bot (members)
+
+### First message: bind your Lark identity
+
+The first time you message the Bot, it replies with a card asking you to **bind your Lark identity**. Tap the link, sign in to Multica, and your Lark account is linked to your Multica membership. This is what lets the agent act as you — for example, `/issue` files the issue under your name.
+
+<Callout type="warning">
+Only people who are **members of the workspace** can use the Bot. If you aren't a member, or you skip the identity bind, the Bot won't respond — your message is dropped (and recorded for audit, without its contents).
+</Callout>
+
+### Chat and `/issue`
+
+- **Ask the agent anything** — DM the Bot or @-mention it in a group. The conversation is a normal agent chat session; the agent replies in the card.
+- **File an issue** — send `/issue Fix the login redirect` and Multica creates that issue in the workspace, assigned the way any new issue would be. Add more lines after the title for a description.
+- **Watch it work** — the reply card patches itself while the agent runs, so you see progress and the result in place.
+
+If the agent is **offline** (its runtime isn't connected) or **archived**, the Bot replies with a short status notice instead of silently dropping your message.
+
+## Manage and disconnect
+
+Workspace-wide management lives in **Settings → Integrations**:
+
+- **Connected bots** lists every Bot in the workspace and the agent each one is bound to. This list is visible to all members.
+- **Disconnect** is **owner / admin only**. Disconnecting stops the Bot from receiving Lark messages and tears down its connection; the installation record is kept for audit, and you can re-bind the same agent later.
+
+## Permissions
+
+- **Bind / disconnect** require the workspace **owner** or **admin** role. Members see the connected-bots list but no bind or disconnect controls.
+- **Talking to the Bot** requires being a workspace member with a bound Lark identity. Messages from anyone else are dropped.
+- The integration never stores message bodies for dropped messages — only a drop reason, for audit.
+
+## Self-host setup
+
+On Multica Cloud the integration is already available — skip this section.
+
+For self-host, Lark is **off until you set an at-rest encryption key**. The key encrypts each Bot's app secret before it touches the database.
+
+1. Generate a random 32-byte key, base64-encode it, and set it on the API server:
+
+   ```dotenv
+   MULTICA_LARK_SECRET_KEY=<base64-encoded 32-byte key>
+   ```
+
+2. Restart the API. Until the key is set, **Settings → Integrations** shows a "Lark integration not enabled" notice and the **Bind to Lark** entry points stay hidden.
+
+<Callout type="info">
+**Feishu and Lark international, side by side.** The cloud each Bot belongs to — mainland Feishu (`open.feishu.cn`) or Lark international (`open.larksuite.com`) — is detected automatically when you scan the QR, stored on the installation, and used for every call to that Bot. A single deployment serves both at once, so teams on either tenant can bind without any extra configuration.
+
+The `MULTICA_LARK_HTTP_BASE_URL` / `MULTICA_LARK_CALLBACK_BASE_URL` env vars remain only as an optional deployment-wide override (a proxy or a mock); leave them unset for normal operation so each installation keeps reaching its own cloud.
+
+**Upgrading from a single-cloud setup?** If you ran an international-Lark deployment by setting those vars to `https://open.larksuite.com`, the server relabels your existing installations to the Lark region on first boot after upgrade — you can then clear the override. Mainland deployments need no action.
+</Callout>
+
+## Next
+
+- [Agents](/agents) — each Bot is bound to exactly one agent
+- [Chat](/chat) — what a Bot conversation maps to inside Multica
+- [Issues](/issues) — what `/issue` creates
+- [Environment variables](/environment-variables) — full self-host configuration reference
--- a/apps/docs/content/docs/lark-bot-integration.zh.mdx
+++ b/apps/docs/content/docs/lark-bot-integration.zh.mdx
@@ -0,0 +1,95 @@
+---
+title: 飞书 Bot 接入
+description: 把 Multica 智能体绑定到飞书（Lark）Bot，就能直接在飞书里和它对话——私聊、群里 @ 它，或者输入 /issue 直接创建 Multica issue，全程不用离开飞书。
+---
+
+import { Callout } from "fumadocs-ui/components/callout";
+
+把任意[智能体](/agents)绑定到飞书 Bot，团队就能在飞书里直接使用它——私聊 Bot、在群里 @ 它，或者输入 `/issue` 直接创建一个 [Multica issue](/issues)，不用打开应用。智能体的回复会以一张实时卡片的形式回到聊天里，随着它干活不断更新。
+
+每个 Bot 与一个 Multica 智能体**一对一**绑定。再绑定一个智能体会创建另一个 Bot；一个智能体永远不会有两个 Bot。
+
+## 这个集成能做什么
+
+| 入口 | 行为 |
+|---|---|
+| **智能体 → 集成** | 智能体详情页有一个 **集成（Integrations）** tab（左侧栏也有对应的区块）。所有者和管理员能在这里看到 **绑定到飞书**；绑定后会变成 **已连接到飞书** 徽标，并带一个 **在飞书中管理** 链接。 |
+| **私聊 Bot** | 工作区成员在飞书里直接给 Bot 发消息。每段对话都会成为该智能体的一个 Multica [chat](/chat) 会话，智能体在会话里回复。 |
+| **群里 @ 它** | 把 Bot 加进飞书群再 @ 它。Bot 只读取 @ 它的那条消息，不会监听整个群。 |
+| **`/issue` 命令** | 输入 `/issue <标题>`（可附正文）会在工作区创建一个新的 Multica issue，记在你名下。 |
+| **实时回复卡片** | Bot 会发出一张可交互卡片，并随着智能体运行不断更新——进度、最终答复或报错。 |
+
+## 绑定智能体（所有者 / 管理员）
+
+绑定走的是扫码安装流程——不用复制任何应用密钥，也不用进开发者后台。
+
+1. 在 **Agents → 你的智能体** 打开该智能体。
+2. 进入 **集成（Integrations）** tab（或使用左侧栏的 **集成** 区块），点击 **绑定到飞书**。
+3. 弹出一个二维码。用手机打开 **飞书 → 扫一扫**，然后授权这个新的 PersonalAgent Bot。
+4. 扫码完成后弹窗关闭，智能体显示 **已连接到飞书**。你自己的飞书身份会自动绑定到你的 Multica 账号，绑完即可开始和 Bot 对话。
+
+<Callout type="info">
+二维码是一次性的，并且会在较短时间后过期。如果在授权前就过期了，点 **重新扫码** 获取新码即可。
+</Callout>
+
+智能体连接后，**绑定到飞书** 按钮会替换成 **在飞书中管理** 链接。需要调整权限范围、改名或申请更多权限时，用它打开 Bot 在飞书里的应用页面——重新扫码被刻意禁用，以免把已有的 Bot 弄成孤儿。
+
+## 使用 Bot（成员）
+
+### 第一条消息：绑定你的飞书身份
+
+第一次给 Bot 发消息时，它会回一张卡片，让你 **绑定飞书身份**。点开链接、登录 Multica，你的飞书账号就会关联到你的 Multica 成员身份。正是这一步让智能体能以你的身份行事——比如 `/issue` 会把 issue 记在你名下。
+
+<Callout type="warning">
+只有**工作区成员**才能使用 Bot。如果你不是成员，或者跳过了身份绑定，Bot 不会回复——你的消息会被丢弃（仅出于审计目的记录，不保存消息内容）。
+</Callout>
+
+### 对话与 `/issue`
+
+- **随便问智能体** —— 私聊 Bot，或在群里 @ 它。对话就是一段普通的智能体 chat 会话，智能体在卡片里回复。
+- **创建 issue** —— 发送 `/issue 修复登录跳转`，Multica 会在工作区创建这个 issue，和新建任何 issue 一样。标题后面再加几行就是描述。
+- **看它干活** —— 回复卡片会随着智能体运行不断更新，进度和结果都在原处呈现。
+
+如果智能体**离线**（运行时未连接）或**已归档**，Bot 会回一条简短的状态提示，而不是悄悄丢掉你的消息。
+
+## 管理与断开
+
+工作区级别的管理在 **设置 → 集成**：
+
+- **已连接的 Bot** 列出工作区里每个 Bot 以及它绑定的智能体。这个列表所有成员都能看到。
+- **断开连接** 仅限 **所有者 / 管理员**。断开后 Bot 停止接收飞书消息、连接被销毁；安装记录会保留以便审计，之后你可以重新绑定同一个智能体。
+
+## 权限
+
+- **绑定 / 断开** 需要工作区**所有者**或**管理员**。成员能看到已连接 Bot 列表，但看不到绑定或断开的操作。
+- **和 Bot 对话** 需要你是工作区成员且已绑定飞书身份。其余的人一律被丢弃。
+- 对于被丢弃的消息，集成不会保存消息内容——只记录一个丢弃原因，用于审计。
+
+## 自部署配置
+
+在 Multica Cloud 上这个集成已经可用——可跳过本节。
+
+自部署时，**在你设置好静态加密密钥之前，飞书集成是关闭的**。这个密钥会在每个 Bot 的 app secret 落库之前对其加密。
+
+1. 生成一个 32 字节的密钥并设置到 API 服务器：
+
+   ```dotenv
+   MULTICA_LARK_SECRET_KEY=<base64 编码的 32 字节密钥>
+   ```
+
+2. 重启 API。在密钥设置好之前，**设置 → 集成** 会显示「未启用飞书集成」提示，**绑定到飞书** 入口也会保持隐藏。
+
+<Callout type="info">
+**同时支持飞书与海外版 Lark。** 每个 Bot 属于哪个云——中国大陆飞书（`open.feishu.cn`）还是海外版 Lark（`open.larksuite.com`）——会在你扫码时自动识别、记录在该安装上，并用于对这个 Bot 的所有调用。同一个部署可以同时服务两者，因此两个租户的团队都能直接绑定，无需任何额外配置。
+
+`MULTICA_LARK_HTTP_BASE_URL` / `MULTICA_LARK_CALLBACK_BASE_URL` 仅作为可选的部署级覆盖项保留（用于代理或 mock）；正常运行时请保持不设置，让每个安装各自连到自己的云。
+
+**从单云部署升级？** 如果你之前是把这两个变量设为 `https://open.larksuite.com` 来跑海外版 Lark，升级后服务会在首次启动时自动把存量安装重标记为 Lark region，之后你就可以清掉这个覆盖项。国内飞书部署无需任何操作。
+</Callout>
+
+## 下一步
+
+- [智能体](/agents) —— 每个 Bot 都绑定到恰好一个智能体
+- [Chat](/chat) —— 一段 Bot 对话在 Multica 里对应什么
+- [Issues](/issues) —— `/issue` 创建的是什么
+- [环境变量](/environment-variables) —— 完整的自部署配置参考
--- a/apps/docs/content/docs/meta.ja.json
+++ b/apps/docs/content/docs/meta.ja.json
@@ -31,6 +31,7 @@
    "inbox",
    "---連携---",
    "github-integration",
+    "lark-bot-integration",
    "---セルフホスト & 運用---",
    "environment-variables",
    "auth-setup",
--- a/apps/docs/content/docs/meta.json
+++ b/apps/docs/content/docs/meta.json
@@ -31,6 +31,7 @@
    "inbox",
    "---Integrations---",
    "github-integration",
+    "lark-bot-integration",
    "---Self-hosting & ops---",
    "environment-variables",
    "auth-setup",
--- a/apps/docs/content/docs/meta.ko.json
+++ b/apps/docs/content/docs/meta.ko.json
@@ -31,6 +31,7 @@
    "inbox",
    "---연동---",
    "github-integration",
+    "lark-bot-integration",
    "---자체 호스팅 & 운영---",
    "environment-variables",
    "auth-setup",
--- a/apps/docs/content/docs/meta.zh.json
+++ b/apps/docs/content/docs/meta.zh.json
@@ -30,6 +30,7 @@
    "inbox",
    "---集成---",
    "github-integration",
+    "lark-bot-integration",
    "---自部署运维---",
    "environment-variables",
    "auth-setup",
--- a/apps/docs/content/docs/providers.ja.mdx
+++ b/apps/docs/content/docs/providers.ja.mdx
@@ -13,32 +13,32 @@ Multica は **12 個の AI コーディングツール**を標準でサポート

 | ツール | ベンダー | セッション再開 | MCP | スキル注入パス | モデル選択 |
 |---|---|---|---|---|---|
-| **Antigravity** | Google | ✅ (`--conversation <id>`) | ❌ | `.agents/skills/` | Antigravity CLI 自体の内部で管理 |
-| **Claude Code** | Anthropic | ✅ | **✅（実際に使用する唯一のツール）** | `.claude/skills/` | 静的 + flag |
-| **Codex** | OpenAI | ⚠️ コードは存在するが到達不可 | ❌ | `$CODEX_HOME/skills/` | 静的 |
+| **Antigravity** | Google | ✅ (`--conversation <id>`) | ❌ | `.agents/skills/` | 動的探索（`agy models`） |
+| **Claude Code** | Anthropic | ✅ | ✅ | `.claude/skills/` | 静的 + flag |
+| **Codex** | OpenAI | ✅ | ✅ | `$CODEX_HOME/skills/` | 静的 |
 | **Copilot** | GitHub | ✅ | ❌ | `.github/skills/` | 静的（アカウントの権限で決定） |
-| **Cursor** | Anysphere | ⚠️ コードは存在するが使用不可 | ❌ | `.cursor/skills/` | 動的探索 |
+| **Cursor** | Anysphere | ✅ | ✅ | `.cursor/skills/` | 動的探索 |
 | **Gemini** | Google | ❌ | ❌ | `.agent_context/skills/` | 静的 |
-| **Hermes** | Nous Research | ✅ | ❌ | `.agent_context/skills/`（フォールバック） | 動的探索 |
-| **Kimi** | Moonshot | ✅ | ❌ | `.kimi/skills/` | 動的探索 |
-| **Kiro CLI** | Amazon | ✅ | ❌ | `.kiro/skills/` | 動的探索 |
-| **OpenCode** | SST | ✅ | ❌ | `.opencode/skills/` | 動的探索 |
-| **OpenClaw** | オープンソース | ✅ | ❌ | `.agent_context/skills/`（フォールバック） | エージェントにバインドされ、タスクごとに切り替え不可 |
+| **Hermes** | Nous Research | ✅ | ✅ | `.agent_context/skills/`（フォールバック） | 動的探索 |
+| **Kimi** | Moonshot | ✅ | ✅ | `.kimi/skills/` | 動的探索 |
+| **Kiro CLI** | Amazon | ✅ | ✅ | `.kiro/skills/` | 動的探索 |
+| **OpenCode** | SST | ✅ | ✅ | `.opencode/skills/` | 動的探索 + variant |
+| **OpenClaw** | オープンソース | ✅ | ✅ | `.agent_context/skills/`（フォールバック） | エージェントにバインドされ、タスクごとに切り替え不可 |
 | **Pi** | Inflection AI | ✅（セッションがファイルパス） | ❌ | `.pi/skills/` | 動的探索 |

 ## 各ツールの用途

 ### Antigravity

-Google が提供します。CLI バイナリ名は `agy` です。Google の Antigravity サービスと連携し、Gemini ベースのデフォルトモデルが付属しています。**セッション再開が動作します** — `--conversation <id>` を通じて行われ、stdout が構造化されたイベントストリームではなくプレーンテキストであるため、デーモンが CLI のログファイルから conversation UUID をキャプチャします。`--model` flag はありません — モデル選択は Antigravity CLI の設定内にあるため、Multica はこのプロバイダーに対してエージェントごとのモデルピッカーを無効にします。スキルは `.agents/skills/` に配置されます（CLI が Gemini CLI のワークスペーススキルレイアウトをそのまま継承します — [Antigravity 移行ドキュメント](https://antigravity.google/docs/gcli-migration)を参照）。
+Google が提供します。CLI バイナリ名は `agy` です。Google の Antigravity サービスと連携し、Gemini ベースのデフォルトモデルが付属しています。**セッション再開が動作します** — `--conversation <id>` を通じて行われ、stdout が構造化されたイベントストリームではなくプレーンテキストであるため、デーモンが CLI のログファイルから conversation UUID をキャプチャします。**モデル選択が動作します** — `--model` flag（agy 1.0.6 で追加）を通じて行われ、デーモンが `agy models` でカタログを列挙し、選択された値をそのまま渡します。これらは `provider/model` slug ではなく `Claude Opus 4.6 (Thinking)` のような人間が読める表示名である点に注意してください。また agy は認識できない値を渡すと黙って空実行するため、手入力ではなく検出されたリストから選ぶことをおすすめします。スキルは `.agents/skills/` に配置されます（CLI が Gemini CLI のワークスペーススキルレイアウトをそのまま継承します — [Antigravity 移行ドキュメント](https://antigravity.google/docs/gcli-migration)を参照）。

 ### Claude Code

-Anthropic が提供します。**新規ユーザーにとって第一の選択肢**であり、最も完成度の高い機能セットを備えています: セッション再開が実際に動作し、**11 個の中で MCP 構成を本当に読み取る唯一のツール**であり、`--max-turns` や `--append-system-prompt` のような細かな調整 flag をサポートします。Anthropic API キーが必要です。
+Anthropic が提供します。**新規ユーザーにとって第一の選択肢**であり、最も完成度の高い機能セットを備えています: セッション再開が実際に動作し、MCP 構成を読み取り、`--max-turns` や `--append-system-prompt` のような細かな調整 flag をサポートします。Anthropic API キーが必要です。

 ### Codex

-OpenAI が提供します。JSON-RPC 2.0 を使用し、ステートフルな能力がより強く、よりきめ細かい承認メカニズム（`exec_command` および `patch_apply` に対する手動承認）を備えています。**セッション再開のコードは存在しますが、現在は到達できません** — 再開が必要なら、Claude Code または ACP 系のいずれかを選んでください。
+OpenAI が提供します。JSON-RPC 2.0 を使用し、ステートフルな能力がより強く、よりきめ細かい承認メカニズム（`exec_command` および `patch_apply` に対する手動承認）を備えています。MCP 構成はタスクごとの `$CODEX_HOME/config.toml` に書き込まれます。**セッション再開は動作します** — Multica は Codex app-server の `thread/resume` で再開します。保存済み thread が見つからない、または古い場合は、新しい thread にフォールバックしてタスクを続行します。

 ### Copilot

@@ -46,7 +46,7 @@ GitHub が提供します。モデルルーティングは GitHub アカウン

 ### Cursor

-Anysphere が提供し、Cursor エディターに対応する CLI です。**セッション再開のコードは存在しますが、実際には動作しません** — Cursor CLI のイベントストリームがセッション ID を返さないため、渡す再開値は常に無効です。再開が必要なら、別のものを選んでください。
+Anysphere が提供し、Cursor エディターに対応する CLI です。**セッション再開は動作します** — 現在の Cursor Agent の stream-json イベントには `session_id` が含まれ、Multica は次回実行時に `--resume <id>` でそれを渡します。MCP 構成はタスクワークスペースの `.cursor/mcp.json` に書き込まれ、Cursor のプロジェクト approval ファイルはタスクごとの `CURSOR_DATA_DIR` 配下に置かれるため、管理対象 MCP server はユーザーのグローバル Cursor approvals に依存しません。

 ### Gemini

@@ -54,23 +54,23 @@ Google が提供し、Gemini 2.5 および 3 シリーズをサポートしま

 ### Hermes

-Nous Research が提供します。ACP プロトコルを使用します（Kimi とトランスポート層を共有します）。セッション再開が動作します。しかし**スキル注入パスは専用のものではなく汎用のフォールバック**（`.agent_context/skills/`）です — Hermes CLI 自体がこのパスを読み取らない場合、スキルが適用されないことがあります。テストで確認してください。
+Nous Research が提供します。ACP プロトコルを使用します（Kimi とトランスポート層を共有します）。セッション再開が動作し、MCP 構成は ACP `mcpServers` として渡されます。しかし**スキル注入パスは専用のものではなく汎用のフォールバック**（`.agent_context/skills/`）です — Hermes CLI 自体がこのパスを読み取らない場合、スキルが適用されないことがあります。テストで確認してください。

 ### Kimi

-Moonshot が提供し、中国市場を対象としています。Hermes と ACP プロトコルを共有しますが、スキルパス `.kimi/skills/` は Kimi CLI のネイティブな探索メカニズムであり、Hermes のフォールバックとは異なります。
+Moonshot が提供し、中国市場を対象としています。Hermes と ACP プロトコルを共有し、MCP 構成も ACP `mcpServers` として渡されますが、スキルパス `.kimi/skills/` は Kimi CLI のネイティブな探索メカニズムであり、Hermes のフォールバックとは異なります。

 ### Kiro CLI

-Amazon が提供します。`kiro-cli acp` を通じて stdio 上で ACP を使用します。セッション再開は ACP `session/load` で動作し、モデル選択は `session/set_model` で動作し、スキルはプロジェクトレベルのネイティブ探索のために `.kiro/skills/` にコピーされます。
+Amazon が提供します。`kiro-cli acp` を通じて stdio 上で ACP を使用します。セッション再開は ACP `session/load` で動作し、MCP 構成は ACP `mcpServers` として渡され、モデル選択は `session/set_model` で動作し、スキルはプロジェクトレベルのネイティブ探索のために `.kiro/skills/` にコピーされます。

 ### OpenCode

-SST が提供するオープンソースです。利用可能なモデルを動的に探索します（CLI の構成ファイルをスキャン）。セッション再開が動作します。**自分のモデルカタログをカスタマイズしたい、いじるのが好きなユーザーに適しています。**
+SST が提供するオープンソースです。利用可能なモデルと model variant を動的に探索します（CLI の構成ファイルをスキャン）。セッション再開が動作し、エージェントの `mcp_config` フィールドを消費します。Multica は `OPENCODE_CONFIG_CONTENT` 環境変数でインライン注入するため、エージェントの MCP server はタスク workdir の `opencode.json`（エージェントまたはユーザーが所有するファイル）を書き換えずに OpenCode に届きます。モデルが variant を公開している場合、Multica はそれをエージェントの thinking selector として表示し、選択値を `opencode run --variant` で OpenCode に渡します。**自分のモデルカタログをカスタマイズしたい、いじるのが好きなユーザーに適しています。**

 ### OpenClaw

-オープンソースプロジェクトであり、CLI エージェントオーケストレーターです。**モデルはエージェント層にバインドされます**（`openclaw agents add --model`） — タスクごとに上書きできません。構成は厳格に制御されます: ユーザーは `--model` や `--system-prompt` を渡せず、エージェント登録時の構成が決定します。
+オープンソースプロジェクトであり、CLI エージェントオーケストレーターです。MCP 構成は Multica のタスクごとの config wrapper 経由で書き込まれます。**モデルはエージェント層にバインドされます**（`openclaw agents add --model`） — タスクごとに上書きできません。構成は厳格に制御されます: ユーザーは `--model` や `--system-prompt` を渡せず、エージェント登録時の構成が決定します。

 ### Pi

@@ -82,18 +82,19 @@ Inflection AI が提供し、ミニマルです。**セッション再開の方

 | 状態 | ツール | 意味 |
 |---|---|---|
-| ✅ 実際に動作 | Antigravity, Claude Code, Copilot, Hermes, Kimi, Kiro CLI, OpenCode, OpenClaw, Pi | 再開 id を渡すと以前のコンテキストから続行します |
-| ⚠️ コードは存在するが到達不可 | Codex, Cursor | コードに再開パスがありますが、実際には到達しません（Codex は静かにフォールバックし、Cursor はセッション id を返しません） — **未サポートとみなしてください** |
+| ✅ 実際に動作 | Antigravity, Claude Code, Codex, Copilot, Cursor, Hermes, Kimi, Kiro CLI, OpenCode, OpenClaw, Pi | 再開 id を渡すと以前のコンテキストから続行します |
 | ❌ なし | Gemini | CLI に再開メカニズムがありません |

 **意思決定のために**: ワークフローでエージェントがタスク間でコンテキストを保持する必要がある場合（失敗時のリトライ、手動の再実行、対話的な反復）、✅ の行にあるツールだけを選んでください。

-## MCP 構成: Claude Code だけが実際に読み取る
+## MCP 構成: ツールごとの対応

-**12 個のツールのうち、`mcp_config` を実際に消費するのは Claude Code だけです**。残りの 11 個はこのフィールドを受け取りますが、**完全に無視します** — エラーも警告もなく、構成はただ効果を発揮しません。
+**12 個のツールのうち、`mcp_config` を実際に消費するのは 8 個です: Claude Code、Codex、Cursor、Hermes、Kimi、Kiro CLI、OpenCode、OpenClaw**。残りの 4 個はこのフィールドを受け取りますが、**無視します** — エラーも警告もなく、構成はただ効果を発揮しません。
+
+接続方式はツールごとに異なります: Claude Code は `--mcp-config` と `--strict-mcp-config` で受け取り、Codex は daemon 管理の `mcp_servers` ブロックをタスクごとの `$CODEX_HOME/config.toml` に書き込み、Cursor は `.cursor/mcp.json` とタスクごとの `CURSOR_DATA_DIR` 配下のプロジェクト approval を書き込みます。Hermes、Kimi、Kiro CLI は ACP `mcpServers` で受け取ります。OpenCode は `OPENCODE_CONFIG_CONTENT` 環境変数でインライン構成を受け取り、OpenClaw は Multica のタスクごとの config wrapper 経由で `mcp.servers` を受け取ります。OpenCode の経路はプロジェクトの `opencode.json` を書き換えません。

 <Callout type="warning">
-エージェント構成で `mcp_config` を設定しても、Claude Code 以外のツールを選んだ場合、MCP サーバーはそのエージェントに**何の効果**も及ぼしません。現在、MCP 連携は Claude Code のみをカバーしています。
+エージェント構成で `mcp_config` を設定しても、MCP 列に ✅ がないツールを選んだ場合、MCP サーバーはそのエージェントに**何の効果**も及ぼしません。MCP 連携はツールごとに実装されています。
 </Callout>

 ## スキルファイルが置かれる場所
--- a/apps/docs/content/docs/providers.ko.mdx
+++ b/apps/docs/content/docs/providers.ko.mdx
@@ -13,11 +13,11 @@ Multica는 **12개의 AI 코딩 도구**를 기본 지원합니다. 이들은

 | 도구 | 공급사 | 세션 재개 | MCP | 스킬 주입 경로 | 모델 선택 |
 |---|---|---|---|---|---|
-| **Antigravity** | Google | ✅ (`--conversation <id>`) | ❌ | `.agents/skills/` | Antigravity CLI 자체 내부에서 관리 |
+| **Antigravity** | Google | ✅ (`--conversation <id>`) | ❌ | `.agents/skills/` | 동적 탐색(`agy models`) |
 | **Claude Code** | Anthropic | ✅ | ✅ | `.claude/skills/` | 정적 + flag |
-| **Codex** | OpenAI | ⚠️ 코드는 존재하지만 도달 불가 | ✅ | `$CODEX_HOME/skills/` | 정적 |
+| **Codex** | OpenAI | ✅ | ✅ | `$CODEX_HOME/skills/` | 정적 |
 | **Copilot** | GitHub | ✅ | ❌ | `.github/skills/` | 정적 (계정 권한으로 결정) |
-| **Cursor** | Anysphere | ⚠️ 코드는 존재하지만 사용 불가 | ❌ | `.cursor/skills/` | 동적 탐색 |
+| **Cursor** | Anysphere | ✅ | ✅ | `.cursor/skills/` | 동적 탐색 |
 | **Gemini** | Google | ❌ | ❌ | `.agent_context/skills/` | 정적 |
 | **Hermes** | Nous Research | ✅ | ✅ | `.agent_context/skills/` (fallback) | 동적 탐색 |
 | **Kimi** | Moonshot | ✅ | ✅ | `.kimi/skills/` | 동적 탐색 |
@@ -30,7 +30,7 @@ Multica는 **12개의 AI 코딩 도구**를 기본 지원합니다. 이들은

 ### Antigravity

-Google에서 제공합니다. CLI 바이너리 이름은 `agy`입니다. Google의 Antigravity 서비스와 연동되며 Gemini 기반의 기본 모델을 함께 제공합니다. **세션 재개가 동작합니다** — `--conversation <id>`를 통해서이며, stdout이 구조화된 이벤트 스트림이 아니라 일반 텍스트이기 때문에 데몬이 CLI의 로그 파일에서 conversation UUID를 캡처합니다. `--model` flag는 없습니다 — 모델 선택은 Antigravity CLI 설정 안에 있으므로, Multica는 이 제공자에 대해 에이전트별 모델 선택기를 비활성화합니다. 스킬은 `.agents/skills/`에 들어갑니다(CLI가 Gemini CLI의 워크스페이스 스킬 레이아웃을 그대로 따릅니다 — [Antigravity 마이그레이션 문서](https://antigravity.google/docs/gcli-migration) 참고).
+Google에서 제공합니다. CLI 바이너리 이름은 `agy`입니다. Google의 Antigravity 서비스와 연동되며 Gemini 기반의 기본 모델을 함께 제공합니다. **세션 재개가 동작합니다** — `--conversation <id>`를 통해서이며, stdout이 구조화된 이벤트 스트림이 아니라 일반 텍스트이기 때문에 데몬이 CLI의 로그 파일에서 conversation UUID를 캡처합니다. **모델 선택이 동작합니다** — `--model` flag(agy 1.0.6에서 추가)를 통해서이며, 데몬이 `agy models`로 카탈로그를 열거하고 선택된 값을 그대로 전달합니다. 이 값들은 `provider/model` slug가 아니라 `Claude Opus 4.6 (Thinking)` 같은 사람이 읽는 표시 이름이라는 점에 유의하세요. 또한 agy는 인식할 수 없는 값을 받으면 조용히 빈 실행을 하므로, 직접 입력하기보다 발견된 목록에서 선택하는 것을 권장합니다. 스킬은 `.agents/skills/`에 들어갑니다(CLI가 Gemini CLI의 워크스페이스 스킬 레이아웃을 그대로 따릅니다 — [Antigravity 마이그레이션 문서](https://antigravity.google/docs/gcli-migration) 참고).

 ### Claude Code

@@ -38,7 +38,7 @@ Anthropic에서 제공합니다. **신규 사용자에게 첫 번째 선택지**

 ### Codex

-OpenAI에서 제공합니다. JSON-RPC 2.0을 사용하고, 상태 유지 능력이 더 강하며, 더 세밀한 승인 메커니즘(`exec_command` 및 `patch_apply`에 대한 수동 승인)을 갖추고 있습니다. MCP 구성은 작업별 `$CODEX_HOME/config.toml`에 기록됩니다. **세션 재개 코드는 존재하지만 현재 도달할 수 없습니다** — 재개가 필요하다면 Claude Code나 ACP 계열 중 하나를 선택하세요.
+OpenAI에서 제공합니다. JSON-RPC 2.0을 사용하고, 상태 유지 능력이 더 강하며, 더 세밀한 승인 메커니즘(`exec_command` 및 `patch_apply`에 대한 수동 승인)을 갖추고 있습니다. MCP 구성은 작업별 `$CODEX_HOME/config.toml`에 기록됩니다. **세션 재개가 동작합니다** — Multica는 Codex app-server의 `thread/resume`으로 재개합니다. 저장된 thread가 없거나 오래된 경우에는 새 thread로 폴백해 작업을 계속 실행합니다.

 ### Copilot

@@ -46,7 +46,7 @@ GitHub에서 제공합니다. 모델 라우팅은 GitHub 계정 권한을 거칩

 ### Cursor

-Anysphere에서 제공하며, Cursor 에디터에 대응하는 CLI입니다. **세션 재개 코드는 존재하지만 실제로는 동작하지 않습니다** — Cursor CLI 이벤트 스트림이 세션 ID를 반환하지 않으므로, 전달하는 재개 값은 항상 무효입니다. 재개가 필요하다면 다른 것을 선택하세요.
+Anysphere에서 제공하며, Cursor 에디터에 대응하는 CLI입니다. **세션 재개가 동작합니다** — 현재 Cursor Agent의 stream-json 이벤트에는 `session_id`가 포함되며, Multica는 다음 실행 때 이를 `--resume <id>`로 다시 전달합니다. MCP 구성은 작업 워크스페이스의 `.cursor/mcp.json`에 기록되고, Cursor의 프로젝트 approval 파일은 작업별 `CURSOR_DATA_DIR` 아래에 기록되므로, 관리되는 MCP 서버는 사용자의 전역 Cursor approval에 의존하지 않습니다.

 ### Gemini

@@ -82,17 +82,16 @@ Inflection AI에서 제공하며, 미니멀합니다. **세션 재개 방식이

 | 상태 | 도구 | 의미 |
 |---|---|---|
-| ✅ 실제로 동작 | Antigravity, Claude Code, Copilot, Hermes, Kimi, Kiro CLI, OpenCode, OpenClaw, Pi | 재개 id를 전달하면 이전 컨텍스트에서 이어집니다 |
-| ⚠️ 코드는 존재하지만 도달 불가 | Codex, Cursor | 코드에 재개 경로가 있지만 실제로는 도달하지 않습니다(Codex는 조용히 폴백하고, Cursor는 세션 id를 반환하지 않습니다) — **미지원으로 간주하세요** |
+| ✅ 실제로 동작 | Antigravity, Claude Code, Codex, Copilot, Cursor, Hermes, Kimi, Kiro CLI, OpenCode, OpenClaw, Pi | 재개 id를 전달하면 이전 컨텍스트에서 이어집니다 |
 | ❌ 없음 | Gemini | CLI에 재개 메커니즘이 없습니다 |

 **의사결정을 위해**: 워크플로에서 에이전트가 작업 간에 컨텍스트를 유지해야 한다면(실패 재시도, 수동 재실행, 대화형 반복), ✅ 행에 있는 도구만 선택하세요.

 ## MCP 구성: 도구별 지원

-**12개 도구 중 `mcp_config`를 실제로 소비하는 것은 7개입니다: Claude Code, Codex, Hermes, Kimi, Kiro CLI, OpenCode, OpenClaw**. 나머지 5개는 이 필드를 받아들이지만 **무시합니다** — 오류도, 경고도 없으며, 구성이 그저 효과를 내지 못합니다.
+**12개 도구 중 `mcp_config`를 실제로 소비하는 것은 8개입니다: Claude Code, Codex, Cursor, Hermes, Kimi, Kiro CLI, OpenCode, OpenClaw**. 나머지 4개는 이 필드를 받아들이지만 **무시합니다** — 오류도, 경고도 없으며, 구성이 그저 효과를 내지 못합니다.

-각 도구의 연결 방식은 다릅니다: Claude Code는 `--mcp-config`와 `--strict-mcp-config`로 받고, Codex는 데몬이 관리하는 `mcp_servers` 블록을 작업별 `$CODEX_HOME/config.toml`에 기록하며, Hermes/Kimi/Kiro CLI는 ACP `mcpServers`로 받습니다. OpenCode는 `OPENCODE_CONFIG_CONTENT` 환경 변수로 인라인 구성을 받고, OpenClaw는 Multica의 작업별 config wrapper를 통해 `mcp.servers`를 받습니다. OpenCode 경로는 프로젝트의 `opencode.json`을 다시 쓰지 않습니다.
+각 도구의 연결 방식은 다릅니다: Claude Code는 `--mcp-config`와 `--strict-mcp-config`로 받고, Codex는 데몬이 관리하는 `mcp_servers` 블록을 작업별 `$CODEX_HOME/config.toml`에 기록하며, Cursor는 `.cursor/mcp.json`과 작업별 `CURSOR_DATA_DIR` 아래의 프로젝트 approval을 기록합니다. Hermes/Kimi/Kiro CLI는 ACP `mcpServers`로 받습니다. OpenCode는 `OPENCODE_CONFIG_CONTENT` 환경 변수로 인라인 구성을 받고, OpenClaw는 Multica의 작업별 config wrapper를 통해 `mcp.servers`를 받습니다. OpenCode 경로는 프로젝트의 `opencode.json`을 다시 쓰지 않습니다.

 <Callout type="warning">
 에이전트 구성에서 `mcp_config`를 설정했더라도 MCP 열에 ✅가 없는 도구를 선택하면, MCP 서버가 해당 에이전트에 **아무런 효과**도 미치지 않습니다. MCP 연동은 도구별로 구현됩니다.
--- a/apps/docs/content/docs/providers.mdx
+++ b/apps/docs/content/docs/providers.mdx
@@ -13,11 +13,11 @@ For guidance on picking a tool when creating an agent, see [Creating and configu

 | Tool | Vendor | Session resumption | MCP | Skill injection path | Model selection |
 |---|---|---|---|---|---|
-| **Antigravity** | Google | ✅ (`--conversation <id>`) | ❌ | `.agents/skills/` | Managed inside the Antigravity CLI itself |
+| **Antigravity** | Google | ✅ (`--conversation <id>`) | ❌ | `.agents/skills/` | Dynamic discovery (`agy models`) |
 | **Claude Code** | Anthropic | ✅ | ✅ | `.claude/skills/` | Static + flag |
-| **Codex** | OpenAI | ⚠️ Code exists but unreachable | ✅ | `$CODEX_HOME/skills/` | Static |
+| **Codex** | OpenAI | ✅ | ✅ | `$CODEX_HOME/skills/` | Static |
 | **Copilot** | GitHub | ✅ | ❌ | `.github/skills/` | Static (determined by account entitlement) |
-| **Cursor** | Anysphere | ⚠️ Code exists but unusable | ❌ | `.cursor/skills/` | Dynamic discovery |
+| **Cursor** | Anysphere | ✅ | ✅ | `.cursor/skills/` | Dynamic discovery |
 | **Gemini** | Google | ❌ | ❌ | `.agent_context/skills/` | Static |
 | **Hermes** | Nous Research | ✅ | ✅ | `.agent_context/skills/` (fallback) | Dynamic discovery |
 | **Kimi** | Moonshot | ✅ | ✅ | `.kimi/skills/` | Dynamic discovery |
@@ -30,7 +30,7 @@ For guidance on picking a tool when creating an agent, see [Creating and configu

 ### Antigravity

-From Google. CLI binary name is `agy`. Pairs with Google's Antigravity service and ships with a Gemini-backed default model. **Session resumption works** via `--conversation <id>`; the daemon captures the conversation UUID from the CLI's log file because stdout is plain text rather than a structured event stream. There is no `--model` flag — model selection lives inside the Antigravity CLI settings, so Multica disables the per-agent model picker for this provider. Skills land in `.agents/skills/` (the CLI inherits Gemini CLI's workspace skill layout — see [Antigravity migration docs](https://antigravity.google/docs/gcli-migration)).
+From Google. CLI binary name is `agy`. Pairs with Google's Antigravity service and ships with a Gemini-backed default model. **Session resumption works** via `--conversation <id>`; the daemon captures the conversation UUID from the CLI's log file because stdout is plain text rather than a structured event stream. **Model selection works** via the `--model` flag (added in agy 1.0.6): the daemon enumerates the catalog with `agy models` and ships the chosen value verbatim. Note these are human display strings such as `Claude Opus 4.6 (Thinking)`, not `provider/model` slugs — and agy silently no-ops on a value it doesn't recognise, so prefer picking from the discovered list over typing a custom one. Skills land in `.agents/skills/` (the CLI inherits Gemini CLI's workspace skill layout — see [Antigravity migration docs](https://antigravity.google/docs/gcli-migration)).

 ### Claude Code

@@ -38,7 +38,7 @@ From Anthropic. **First choice for new users** — the most complete feature set

 ### Codex

-From OpenAI. Uses JSON-RPC 2.0, has stronger statefulness, and a finer-grained approve mechanism (manual approval for `exec_command` and `patch_apply`). MCP config is materialized into the per-task `$CODEX_HOME/config.toml`. **Session resumption code exists but is currently unreachable** — if you need resume, pick Claude Code or one of the ACP family.
+From OpenAI. Uses JSON-RPC 2.0, has stronger statefulness, and a finer-grained approval mechanism (manual approval for `exec_command` and `patch_apply`). MCP config is materialized into the per-task `$CODEX_HOME/config.toml`. **Session resumption works** through Codex app-server `thread/resume`; if the saved thread is missing or stale, Multica falls back to a fresh thread so the task can still run.

 ### Copilot

@@ -46,7 +46,7 @@ From GitHub. Model routing goes through your GitHub account entitlement — the

 ### Cursor

-From Anysphere, the CLI counterpart to the Cursor editor. **Session resumption code exists but doesn't actually work** — the Cursor CLI event stream doesn't return a session ID, so any resume value you pass is always invalid. If you need resume, pick something else.
+From Anysphere, the CLI counterpart to the Cursor editor. **Session resumption works** with current Cursor Agent releases: the stream-json event includes a `session_id`, and Multica passes it back with `--resume <id>` on the next run. MCP config is materialized into the task workspace's `.cursor/mcp.json`, with Cursor's project approval file written under a per-task `CURSOR_DATA_DIR` so managed MCP servers do not depend on the user's global Cursor approvals.

 ### Gemini

@@ -82,17 +82,16 @@ The session resumption mechanism is covered in [Tasks](/tasks#can-a-task-continu

 | Status | Tools | Meaning |
 |---|---|---|
-| ✅ Really works | Antigravity, Claude Code, Copilot, Hermes, Kimi, Kiro CLI, OpenCode, OpenClaw, Pi | Pass the resume id and it continues from the previous context |
-| ⚠️ Code exists but unreachable | Codex, Cursor | Resume paths exist in the code but aren't actually reached (Codex silently falls back; Cursor doesn't return session id) — **treat as unsupported** |
+| ✅ Really works | Antigravity, Claude Code, Codex, Copilot, Cursor, Hermes, Kimi, Kiro CLI, OpenCode, OpenClaw, Pi | Pass the resume id and it continues from the previous context |
 | ❌ None | Gemini | The CLI has no resume mechanism |

 **For your decision**: if your workflow needs agents to preserve context across tasks (failure retries, manual reruns, conversational iteration), pick only from the ✅ row.

 ## MCP configuration: provider-specific support

-**Of the 12 tools, seven consume `mcp_config`: Claude Code, Codex, Hermes, Kimi, Kiro CLI, OpenCode, and OpenClaw**. The other five accept the field but **ignore it** — no error, no warning, the config just has no effect.
+**Of the 12 tools, eight consume `mcp_config`: Claude Code, Codex, Cursor, Hermes, Kimi, Kiro CLI, OpenCode, and OpenClaw**. The other four accept the field but **ignore it** — no error, no warning, the config just has no effect.

-The runtime paths are provider-specific: Claude Code receives it through `--mcp-config` paired with `--strict-mcp-config`; Codex writes a daemon-managed `mcp_servers` block into the per-task `$CODEX_HOME/config.toml`; Hermes, Kimi, and Kiro CLI receive ACP `mcpServers`; OpenCode receives inline config through `OPENCODE_CONFIG_CONTENT`; OpenClaw receives `mcp.servers` through Multica's per-task config wrapper. OpenCode's path does **not** rewrite the project's `opencode.json`.
+The runtime paths are provider-specific: Claude Code receives it through `--mcp-config` paired with `--strict-mcp-config`; Codex writes a daemon-managed `mcp_servers` block into the per-task `$CODEX_HOME/config.toml`; Cursor writes `.cursor/mcp.json` plus per-task project approvals under `CURSOR_DATA_DIR`; Hermes, Kimi, and Kiro CLI receive ACP `mcpServers`; OpenCode receives inline config through `OPENCODE_CONFIG_CONTENT`; OpenClaw receives `mcp.servers` through Multica's per-task config wrapper. OpenCode's path does **not** rewrite the project's `opencode.json`.
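Because the non-consuming tools ignore `mcp_config` silently, a pre-flight check can surface the mismatch before a task runs. The sketch below is a hypothetical helper, not part of Multica. The eight consumers are the ones listed above; the remaining four names are inferred from the tool lists elsewhere on this page, and `mcp_config_warning` is an invented function name.

```python
from typing import Mapping, Optional

# Tools that consume mcp_config, as listed in the paragraph above.
MCP_CONSUMERS = frozenset({
    "Claude Code", "Codex", "Cursor", "Hermes",
    "Kimi", "Kiro CLI", "OpenCode", "OpenClaw",
})

# Assumption: the other four of the 12 tools, inferred from the resume table.
MCP_IGNORERS = frozenset({"Antigravity", "Copilot", "Gemini", "Pi"})

ALL_TOOLS = MCP_CONSUMERS | MCP_IGNORERS


def mcp_config_warning(tool: str, mcp_config: Optional[Mapping]) -> Optional[str]:
    """Return a warning when mcp_config is set for a tool that ignores it."""
    if tool not in ALL_TOOLS:
        raise ValueError(f"unknown tool: {tool!r}")
    if mcp_config and tool not in MCP_CONSUMERS:
        return f"{tool} ignores mcp_config; the configured MCP servers have no effect"
    return None
```

Calling `mcp_config_warning("Gemini", {"servers": {"docs": {}}})` returns a warning string, while the same config with `"Cursor"` returns `None`.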

 <Callout type="warning">
 If you set `mcp_config` in an agent configuration but pick a tool not marked ✅ in the MCP column, your MCP servers have **no effect** on that agent. MCP integration is provider-specific.
--- a/apps/docs/content/docs/providers.zh.mdx
+++ b/apps/docs/content/docs/providers.zh.mdx
@@ -13,11 +13,11 @@ Multica 内置支持 **12 款 AI 编程工具**。它们都实现了同一套接

 | 工具 | 厂商 | 会话恢复 | MCP | Skill 注入路径 | 模型选择 |
 |---|---|---|---|---|---|
-| **Antigravity** | Google | ✅（`--conversation <id>`）| ❌ | `.agents/skills/` | 由 Antigravity CLI 自己管理 |
+| **Antigravity** | Google | ✅（`--conversation <id>`）| ❌ | `.agents/skills/` | 动态发现（`agy models`）|
 | **Claude Code** | Anthropic | ✅ | ✅ | `.claude/skills/` | 静态 + flag |
-| **Codex** | OpenAI | ⚠️ 代码存在但不可达 | ✅ | `$CODEX_HOME/skills/` | 静态 |
+| **Codex** | OpenAI | ✅ | ✅ | `$CODEX_HOME/skills/` | 静态 |
 | **Copilot** | GitHub | ✅ | ❌ | `.github/skills/` | 静态（账号权益决定）|
-| **Cursor** | Anysphere | ⚠️ 代码存在但不可用 | ❌ | `.cursor/skills/` | 动态发现 |
+| **Cursor** | Anysphere | ✅ | ✅ | `.cursor/skills/` | 动态发现 |
 | **Gemini** | Google | ❌ | ❌ | `.agent_context/skills/` | 静态 |
 | **Hermes** | Nous Research | ✅ | ✅ | `.agent_context/skills/` （fallback）| 动态发现 |
 | **Kimi** | Moonshot | ✅ | ✅ | `.kimi/skills/` | 动态发现 |
@@ -30,7 +30,7 @@ Multica 内置支持 **12 款 AI 编程工具**。它们都实现了同一套接

 ### Antigravity

-Google 出品。CLI 二进制名为 `agy`，搭配 Google Antigravity 服务，默认走 Gemini 系列模型。**会话恢复真用**——通过 `--conversation <id>`；因为 stdout 是纯文本而非结构化事件流，守护进程从 CLI 的日志文件里抓取 conversation UUID。CLI 没有 `--model` flag——模型选择保存在 Antigravity 自己的设置里，因此 Multica 禁用了这款工具的模型选择控件。Skill 文件写入 `.agents/skills/`（CLI 沿用 Gemini CLI 的 workspace 布局——见 [Antigravity 迁移文档](https://antigravity.google/docs/gcli-migration)）。
+Google 出品。CLI 二进制名为 `agy`，搭配 Google Antigravity 服务，默认走 Gemini 系列模型。**会话恢复真用**——通过 `--conversation <id>`；因为 stdout 是纯文本而非结构化事件流，守护进程从 CLI 的日志文件里抓取 conversation UUID。**模型选择真用**——通过 `--model` flag（agy 1.0.6 新增）：守护进程用 `agy models` 枚举可选项，并把选中的值原样传入。注意这些是 `Claude Opus 4.6 (Thinking)` 这样的人类可读显示名，而非 `provider/model` slug；而且 agy 遇到无法识别的值会静默空跑，所以优先从发现列表里挑选，不要手填。Skill 文件写入 `.agents/skills/`（CLI 沿用 Gemini CLI 的 workspace 布局——见 [Antigravity 迁移文档](https://antigravity.google/docs/gcli-migration)）。

 ### Claude Code

@@ -38,7 +38,7 @@ Anthropic 出品。**新用户首选**——功能最完整：会话恢复真用

 ### Codex

-OpenAI 出品。使用 JSON-RPC 2.0 协议，状态化更强，approve 机制更细（手动批准 `exec_command` 和 `patch_apply`）。MCP 配置会写入单次任务的 `$CODEX_HOME/config.toml`。**会话恢复代码存在但当前不可达**——如果你需要 resume，选 Claude Code 或 ACP 系列。
+OpenAI 出品。使用 JSON-RPC 2.0 协议，状态化更强，approve 机制更细（手动批准 `exec_command` 和 `patch_apply`）。MCP 配置会写入单次任务的 `$CODEX_HOME/config.toml`。**会话恢复可用**——Multica 通过 Codex app-server 的 `thread/resume` 续接；如果已保存的 thread 不存在或过期，会回退到新 thread，让任务继续执行。

 ### Copilot

@@ -46,7 +46,7 @@ GitHub 出品。模型路由走你的 GitHub 账号权益——工具自己不

 ### Cursor

-Anysphere 出品，Cursor 编辑器的 CLI 对应物。**会话恢复代码存在但实际不工作**——Cursor CLI 的事件流里不回传 session ID，所以你传的 resume 值永远无效。如果要 resume，选别的。
+Anysphere 出品，Cursor 编辑器的 CLI 对应物。**会话恢复可用**——当前 Cursor Agent 的 stream-json 事件会返回 `session_id`，Multica 会在下一次运行时通过 `--resume <id>` 传回去。MCP 配置会写入任务工作区的 `.cursor/mcp.json`，Cursor 的项目 approval 文件写在单次任务的 `CURSOR_DATA_DIR` 下，因此托管的 MCP server 不依赖用户全局 Cursor approvals。

 ### Gemini

@@ -82,17 +82,16 @@ Inflection AI 出品，极简主义。**会话恢复机制特殊**——session

 | 状态 | 工具 | 含义 |
 |---|---|---|
-| ✅ 真用 | Antigravity、Claude Code、Copilot、Hermes、Kimi、Kiro CLI、OpenCode、OpenClaw、Pi | 传 resume id，会从上次上下文接着继续 |
-| ⚠️ 代码存在但不可达 | Codex、Cursor | 代码里有 resume 路径但实际走不到（Codex 静默回落、Cursor session id 不回传）—— **当作不支持** |
+| ✅ 真用 | Antigravity、Claude Code、Codex、Copilot、Cursor、Hermes、Kimi、Kiro CLI、OpenCode、OpenClaw、Pi | 传 resume id，会从上次上下文接着继续 |
 | ❌ 无 | Gemini | CLI 无 resume 机制 |

 **对你的决策**：如果工作流需要智能体在多次任务之间保持上下文（失败重试、手动重跑、对话式迭代），只选 ✅ 那一行的工具。

 ## MCP 配置：按工具不同

-**12 款工具里有 7 款实际消费 `mcp_config`：Claude Code、Codex、Hermes、Kimi、Kiro CLI、OpenCode、OpenClaw**。其他 5 款会接收这个字段但**忽略**——不报错、不警告，只是配置不生效。
+**12 款工具里有 8 款实际消费 `mcp_config`：Claude Code、Codex、Cursor、Hermes、Kimi、Kiro CLI、OpenCode、OpenClaw**。其他 4 款会接收这个字段但**忽略**——不报错、不警告，只是配置不生效。

-各工具的接入方式不同：Claude Code 通过 `--mcp-config` 加 `--strict-mcp-config` 接收；Codex 会把 daemon 管理的 `mcp_servers` block 写入单次任务的 `$CODEX_HOME/config.toml`；Hermes、Kimi、Kiro CLI 通过 ACP `mcpServers` 接收；OpenCode 通过 `OPENCODE_CONFIG_CONTENT` 环境变量内联接收；OpenClaw 通过 Multica 的单次任务配置 wrapper 接收 `mcp.servers`。OpenCode 这条路径**不会**改写项目里的 `opencode.json`。
+各工具的接入方式不同：Claude Code 通过 `--mcp-config` 加 `--strict-mcp-config` 接收；Codex 会把 daemon 管理的 `mcp_servers` block 写入单次任务的 `$CODEX_HOME/config.toml`；Cursor 会写入 `.cursor/mcp.json`，并把项目 approval 写到单次任务的 `CURSOR_DATA_DIR`；Hermes、Kimi、Kiro CLI 通过 ACP `mcpServers` 接收；OpenCode 通过 `OPENCODE_CONFIG_CONTENT` 环境变量内联接收；OpenClaw 通过 Multica 的单次任务配置 wrapper 接收 `mcp.servers`。OpenCode 这条路径**不会**改写项目里的 `opencode.json`。

 <Callout type="warning">
 如果你在智能体配置里设置了 `mcp_config`，但选了矩阵 MCP 列没有标 ✅ 的工具，你的 MCP server 对这个智能体**没有效果**。MCP 集成是按工具实现的。
--- a/apps/docs/content/docs/self-host-quickstart.ja.mdx
+++ b/apps/docs/content/docs/self-host-quickstart.ja.mdx
@@ -30,7 +30,7 @@ make selfhost

 `make selfhost` は次のことを行います。

-1. `.env` がなければ `.env.example` から生成し、**ランダムな JWT_SECRET** を併せて作成します
+1. `.env` がなければ `.env.example` から生成し、**ランダムな JWT_SECRET と Postgres パスワード** を併せて作成します
 2. 公式の Docker イメージ（PostgreSQL、Multica backend、Multica frontend）を取得します
 3. `docker-compose.selfhost.yml` を使ってすべてのサービスを起動します
 4. バックエンドの `/health` エンドポイントが準備できるまで待機します
@@ -49,7 +49,7 @@ make selfhost
 - **バックエンド**: [http://localhost:8080](http://localhost:8080)

 <Callout type="info">
-**ポートは `127.0.0.1` でのみ待ち受けます。** `docker-compose.selfhost.yml` は公開されるすべてのポートを loopback にバインドします — `ss -tlnp` には `0.0.0.0:8080` は表示されず、設計上、他のマシンからはサービスにアクセスできません。デフォルトの `JWT_SECRET` と Postgres の認証情報は、公開インターネット上に置いては絶対にいけません。マシン間アクセスが必要な場合は、TLS を終端するリバースプロキシをスタックの前に置いてください — [ステップ5b — マシン間: リバースプロキシを前に置く](#5b-cross-machine-front-with-a-reverse-proxy)を参照してください。
+**ポートは `127.0.0.1` でのみ待ち受けます。** `docker-compose.selfhost.yml` は公開されるすべてのポートを loopback にバインドします — `ss -tlnp` には `0.0.0.0:8080` は表示されず、設計上、他のマシンからはサービスにアクセスできません。サーバーのシークレットと Postgres の認証情報は、公開インターネット上に置いては絶対にいけません。マシン間アクセスが必要な場合は、TLS を終端するリバースプロキシをスタックの前に置いてください — [ステップ5b — マシン間: リバースプロキシを前に置く](#5b-cross-machine-front-with-a-reverse-proxy)を参照してください。
 </Callout>

 ## 2. 重要: プロダクションの安全設定を維持する
@@ -81,7 +81,7 @@ make selfhost

 **オプション B — SMTP relay（内部ネットワーク / オンプレミス）:**

-デプロイ環境が `api.resend.com` に到達できない場合や、すでに内部メールリレー（Microsoft Exchange、Postfix、オンプレミスの SendGrid など）がある場合に使ってください。両方が設定されている場合は `SMTP_HOST` が Resend より優先されるため、認証メールと招待メールは内部リレーにとどまります。ポート 465（SMTPS / 暗黙的 TLS）は現在サポートされていません — 25 または 587 を使ってください。
+デプロイ環境が `api.resend.com` に到達できない場合や、すでに内部メールリレー（Microsoft Exchange、Postfix、オンプレミスの SendGrid など）がある場合に使ってください。両方が設定されている場合は `SMTP_HOST` が Resend より優先されるため、認証メールと招待メールは内部リレーにとどまります。STARTTLS は広告されると自動的にアップグレードされます。ポート `465`（SMTPS / 暗黙的 TLS）は接続直後の TLS ハンドシェイクを自動的に有効化し、`SMTP_TLS=implicit`（別名: `smtps`、`ssl`）は非標準の SMTPS ポートで強制的に有効化します。

 **匿名 Exchange 内部リレー（ポート 25）** — ホストが IP で信頼され、認証情報なしで送信する場合:

@@ -105,6 +105,26 @@ SMTP_TLS_INSECURE=false        # set true only for private CA / self-signed
 RESEND_FROM_EMAIL=noreply@yourdomain.com
 ```

+**暗黙的 TLS / SMTPS（ポート 465）** — STARTTLS を広告しないアリババクラウド / テンセントの法人メールなどのプロバイダー向け。ポート `465` は暗黙的 TLS を自動的に有効化するため、ここでは `SMTP_TLS` は省略可能です:
+
+```bash
+SMTP_HOST=smtp.qiye.aliyun.com
+SMTP_PORT=465
+SMTP_USERNAME=multica@yourdomain.com
+SMTP_PASSWORD=...
+SMTP_TLS=implicit              # optional on 465; required on a non-standard SMTPS port
+RESEND_FROM_EMAIL=noreply@yourdomain.com
+```
+
+**厳格な公開 relay（例: Google Workspace `smtp-relay.gmail.com`）** が公開 IP からのデフォルトの `localhost` 挨拶を拒否する場合は、`SMTP_EHLO_NAME` を relay が期待する FQDN に設定してください — そうしないと接続が切断され、後続のコマンドで不明瞭な `EOF` として表面化します。デフォルトはコンテナのホスト名で、これは通常は有効な FQDN ではありません。
+
+```bash
+SMTP_HOST=smtp-relay.gmail.com
+SMTP_PORT=587
+SMTP_EHLO_NAME=mail.yourdomain.com   # FQDN the relay accepts; defaults to the (non-FQDN) container hostname
+RESEND_FROM_EMAIL=noreply@yourdomain.com
+```
+
 その後、再起動します: `docker compose -f docker-compose.selfhost.yml restart backend`。再起動時、バックエンドはどのプロバイダーを選んだかを出力します（`EmailService: SMTP relay …` / `Resend API` / `DEV mode`）— 認証情報は決してログに残らないため、この行はヘルプを求めるときに共有しても安全です。

 追加の認証構成（OAuth、サインアップの許可リスト）と SMTP 変数の完全なリファレンスは、[認証設定](/auth-setup)と[環境変数 → メール](/environment-variables#email-configuration)を参照してください。
@@ -134,7 +154,7 @@ multica setup self-host

 ### 5b. マシン間: リバースプロキシを前に置く

-compose スタックは `127.0.0.1` でのみ待ち受けるため、別のマシンにあるデーモンは `http://<server-ip>:8080` に直接接続できません — そして、そうなることを望むべきでもありません。さもなければデフォルトの `JWT_SECRET` が公開インターネットから到達可能になってしまうからです。TLS を終端し、`127.0.0.1:8080`（バックエンド）と `127.0.0.1:3000`（フロントエンド）へ転送するリバースプロキシをサーバーに置き、CLI を公開 HTTPS URL に向けてください。
+compose スタックは `127.0.0.1` でのみ待ち受けるため、別のマシンにあるデーモンは `http://<server-ip>:8080` に直接接続できません — そして、そうなることを望むべきでもありません。さもなければサーバーのシークレットが公開インターネットから到達可能になってしまうからです。TLS を終端し、`127.0.0.1:8080`（バックエンド）と `127.0.0.1:3000`（フロントエンド）へ転送するリバースプロキシをサーバーに置き、CLI を公開 HTTPS URL に向けてください。

 ```bash
 multica setup self-host \
@@ -142,6 +162,10 @@ multica setup self-host \
  --app-url https://<your-domain>
 ```

+<Callout type="info">
+フラグより環境変数を使いたい場合は、対応するフラグを省略すると `setup self-host` が `MULTICA_SERVER_URL` と `MULTICA_APP_URL` を読み取ります（両方設定した場合はフラグが優先されます）。`MULTICA_SERVER_URL` は[環境変数](/environment-variables)で示される `ws://…/ws` というデーモン形式も受け付け、HTTP ベース URL に正規化します。
+</Callout>
+
 単一のホスト名でフロントエンドとバックエンドの両方を前段に置く（デーモンと Web アプリの両方に必要な WebSocket サポートを含む）最小限の Caddyfile は次のとおりです。

 ```nginx
@@ -172,44 +196,26 @@ multica.example.com {

 Cloud と同じ流れです — [Cloud クイックスタート → ステップ5-6](/cloud-quickstart#5-create-an-agent)を参照してください。

-## 7. 使用量ロールアップのスケジューリング（使用量ダッシュボードに必須）
+<span id="7-usage-rollup-no-operator-action-required" />

-<Callout type="warning">
-使用量 / ランタイムのダッシュボードは、`rollup_task_usage_hourly()` が埋める派生テーブル `task_usage_hourly` からデータを読み取ります。バンドルされた `pgvector/pgvector:pg17` の Postgres イメージには **`pg_cron` が含まれておらず**、バックエンドもロールアップをインプロセスで実行しません。`rollup_task_usage_hourly()` をスケジューリングするものが何もないと、生の `task_usage` 行は届き続けるのに、ダッシュボードは永遠にゼロのままになります。
+## 7. 使用量ロールアップ（オペレーターの操作は不要）
+
+<Callout type="info">
+使用量 / ランタイムのダッシュボードは、`rollup_task_usage_hourly()` が埋める派生テーブル `task_usage_hourly` からデータを読み取ります。MUL-2957 以降、バックエンドは DB バックドのスケジューラー経由でこのロールアップをインプロセスで実行するようになり、`pg_cron` は不要になりました。外部 cron / systemd タイマーも推奨セットアップではなくなっています。バンドルされた `pgvector/pgvector:pg17` イメージは変更なしで動作します。
 </Callout>

-サポートされているオプションのいずれか1つを選んでください — 1つあれば十分です。
+インプロセススケジューラーは 30 秒おきにティックし、`sys_cron_executions` テーブルを介して 5 分ごとの UTC プランをクレームします。複数のバックエンドレプリカでも安全です — 一意キー `(job_name, scope_kind, scope_id, plan_time)` により、各プランで勝者は 1 つだけです。新規デプロイでは何の設定も不要です。

-**オプション A — 外部 cron / systemd-timer（最もシンプル）。** 任意の帯域外スケジューラから5分ごとにロールアップを実行します。冪等でウォーターマーク駆動なので、取りこぼしたティックは追いつきます。
-
-```bash
-# /etc/cron.d/multica-rollup — every 5 minutes
-*/5 * * * * root docker compose -f /path/to/multica/docker-compose.selfhost.yml \
-  exec -T postgres psql -U multica -d multica \
-  -c "SELECT rollup_task_usage_hourly();" >/dev/null
-```
-
-**オプション B — Postgres を `pg_cron` を同梱したイメージに置き換える。** `docker-compose.selfhost.yml` の `pgvector/pgvector:pg17` を、`pgvector` と `pg_cron` の両方を備えたイメージ（`supabase/postgres`、またはカスタムビルド）に置き換え、`shared_preload_libraries=pg_cron` を設定して再起動してから、ジョブを一度登録します。
+**互換性 — 既存の `pg_cron` 登録。** 以前 rollup を `pg_cron` ジョブとして登録していた（`SELECT cron.schedule('rollup_task_usage_hourly', '*/5 * * * *', …)`）場合でも、削除する必要はありません — SQL 関数が内部で advisory lock 4246 を保持するため、アプリのスケジューラーと `pg_cron` が二重書き込みすることはありません。冗長なエントリを削除するには:

 ```sql
-CREATE EXTENSION IF NOT EXISTS pg_cron;
-SELECT cron.schedule(
-  'rollup_task_usage_hourly',
-  '*/5 * * * *',
-  $$SELECT rollup_task_usage_hourly()$$
-);
+SELECT cron.unschedule('rollup_task_usage_hourly')
+  FROM cron.job WHERE jobname = 'rollup_task_usage_hourly';
 ```

-**オプション C — まず履歴をバックフィルする（アップグレード経路）。** `v0.3.4 → v0.3.5+` へアップグレード中で、既存の `task_usage` 行がある場合、migration `103` は hourly テーブルがシードされるまで `refusing to drop legacy daily rollups: ...` とともに `migrate up` を中断します。バンドルされたバックフィルを一度実行してから、オプション A または B を設定してください。
+**`v0.3.4 → v0.3.5+` からのアップグレード。** 以前のリリースでは、migration 103 を適用する前にオペレーターが手動で `cmd/backfill_task_usage_hourly` を実行する必要があり、そうしないと migration の fail-closed ガードが `migrate up` を中断していました。MUL-2957 以降、これは自動化されました — migrate コマンドが migration 103 を適用する直前に冪等な月単位スライスのバックフィル（advisory lock 4246 の下）を実行してから処理を続行します。忙しい DB では `--sleep-between-slices=2s` で読み取り負荷を絞るためにスタンドアロンの backfill を実行することもできますが、もはや必須ではありません。

-```bash
-docker compose -f docker-compose.selfhost.yml exec backend \
-  ./backfill_task_usage_hourly --sleep-between-slices=2s
-```
-
-`--sleep-between-slices=2s` は、忙しい DB での読み取り負荷を調整します。完了後、バックエンドのコンテナを再起動すると（起動時に migration が実行されます）アップグレードが完了します。
-
-完全なリファレンス — Kubernetes の `CronJob` テンプレートとアップグレード順序を含む — は、リポジトリの [`SELF_HOSTING_ADVANCED.md → Usage Dashboard Rollup`](https://github.com/multica-ai/multica/blob/main/SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup) にあります。
+完全なリファレンス（運用ノートと Kubernetes デプロイ形態を含む）は、リポジトリの [`SELF_HOSTING_ADVANCED.md → Usage Dashboard Rollup`](https://github.com/multica-ai/multica/blob/main/SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup) にあります。

 ## Kubernetes デプロイ（代替手段）

@@ -263,8 +269,8 @@ multica setup self-host \
 - **バックエンドが起動しない**: `docker compose -f docker-compose.selfhost.yml logs backend` でコンテナのログを確認してください。たいていは `.env` の不正な `DATABASE_URL` または `JWT_SECRET` が原因です
 - **認証コードが届かない**: メールバックエンドが構成されていない場合（Resend も SMTP もない）→ `docker compose logs backend` で `[DEV] Verification code` を探してください
 - **WebSocket が接続できない**: 公開デプロイでは、`FRONTEND_ORIGIN` を実際のフロントエンドのドメインに必ず設定する必要があります。[トラブルシューティング → WebSocket が接続できない](/troubleshooting#websocket-wont-connect)を参照してください
-- **使用量 / ランタイムのダッシュボードがゼロのまま**: `rollup_task_usage_hourly()` がスケジューリングされていません — 上記の [ステップ7](#7-schedule-the-usage-rollup-required-for-the-usage-dashboard)と[トラブルシューティング → 使用量ダッシュボードがゼロと表示される](/troubleshooting#usage-dashboard-stays-at-zero)を参照してください
-- **`migrate up` が `refusing to drop legacy daily rollups` で失敗する**: `v0.3.4 → v0.3.5+` のアップグレード経路ガードです。まず `backfill_task_usage_hourly` を実行してください — [ステップ7 → オプション C](#7-schedule-the-usage-rollup-required-for-the-usage-dashboard)を参照してください
+- **使用量 / ランタイムのダッシュボードがゼロのまま**: `rollup_task_usage_hourly()` がスケジューリングされていません — 上記の [ステップ7](#7-usage-rollup-no-operator-action-required)と[トラブルシューティング → 使用量ダッシュボードがゼロと表示される](/troubleshooting#usage-dashboard-stays-at-zero)を参照してください
+- **`migrate up` が `refusing to drop legacy daily rollups` で失敗する**: `v0.3.4 → v0.3.5+` のアップグレード経路ガードです。MUL-2957 以降、migrate コマンドは migration 103 を適用する前に自動でバックフィルを実行します — [ステップ7](#7-usage-rollup-no-operator-action-required)を参照してください

 ## 次のステップ

--- a/apps/docs/content/docs/self-host-quickstart.ko.mdx
+++ b/apps/docs/content/docs/self-host-quickstart.ko.mdx
@@ -30,7 +30,7 @@ make selfhost

 `make selfhost`는 다음을 수행합니다.

-1. `.env`가 없으면 `.env.example`로부터 생성하며 **무작위 JWT_SECRET**을 함께 만듭니다
+1. `.env`가 없으면 `.env.example`로부터 생성하며 **무작위 JWT_SECRET과 Postgres 비밀번호**를 함께 만듭니다
 2. 공식 Docker 이미지(PostgreSQL, Multica backend, Multica frontend)를 받아옵니다
 3. `docker-compose.selfhost.yml`을 사용해 모든 서비스를 시작합니다
 4. 백엔드의 `/health` 엔드포인트가 준비될 때까지 기다립니다
@@ -49,7 +49,7 @@ make selfhost
 - **백엔드**: [http://localhost:8080](http://localhost:8080)

 <Callout type="info">
-**포트는 `127.0.0.1`에서만 수신합니다.** `docker-compose.selfhost.yml`은 공개된 모든 포트를 loopback에 바인딩합니다 — `ss -tlnp`에서는 `0.0.0.0:8080`이 보이지 않으며, 설계상 다른 기기에서는 서비스에 접근할 수 없습니다. 기본 `JWT_SECRET`과 Postgres 자격 증명이 공개 인터넷에 노출되어서는 절대 안 됩니다. 기기 간 접근이 필요하면 TLS를 종료하는 리버스 프록시를 스택 앞에 두세요 — [5b단계 — 기기 간: 리버스 프록시를 앞에 두기](#5b-cross-machine-front-with-a-reverse-proxy)를 참고하세요.
+**포트는 `127.0.0.1`에서만 수신합니다.** `docker-compose.selfhost.yml`은 공개된 모든 포트를 loopback에 바인딩합니다 — `ss -tlnp`에서는 `0.0.0.0:8080`이 보이지 않으며, 설계상 다른 기기에서는 서비스에 접근할 수 없습니다. 서버 시크릿과 Postgres 자격 증명이 공개 인터넷에 노출되어서는 절대 안 됩니다. 기기 간 접근이 필요하면 TLS를 종료하는 리버스 프록시를 스택 앞에 두세요 — [5b단계 — 기기 간: 리버스 프록시를 앞에 두기](#5b-cross-machine-front-with-a-reverse-proxy)를 참고하세요.
 </Callout>

 ## 2. 중요: 프로덕션 안전 설정 유지하기
@@ -81,7 +81,7 @@ make selfhost

 **옵션 B — SMTP relay(내부 네트워크 / 온프레미스):**

-배포 환경이 `api.resend.com`에 접근할 수 없거나, 이미 내부 메일 릴레이(Microsoft Exchange, Postfix, 온프레미스 SendGrid 등)가 있는 경우에 사용하세요. 둘 다 설정된 경우 `SMTP_HOST`가 Resend보다 우선하므로, 인증 및 초대 메일이 내부 릴레이에 머무릅니다. 465 포트(SMTPS / 암묵적 TLS)는 현재 지원하지 않습니다 — 25 또는 587을 사용하세요.
+배포 환경이 `api.resend.com`에 접근할 수 없거나, 이미 내부 메일 릴레이(Microsoft Exchange, Postfix, 온프레미스 SendGrid 등)가 있는 경우에 사용하세요. 둘 다 설정된 경우 `SMTP_HOST`가 Resend보다 우선하므로, 인증 및 초대 메일이 내부 릴레이에 머무릅니다. STARTTLS는 광고될 때 자동으로 업그레이드됩니다. `465` 포트(SMTPS / 암묵적 TLS)는 연결 직후의 TLS 핸드셰이크를 자동으로 활성화하며, `SMTP_TLS=implicit`(별칭: `smtps`, `ssl`)는 비표준 SMTPS 포트에서 강제로 활성화합니다.

 **익명 Exchange 내부 릴레이(포트 25)** — 호스트가 IP로 신뢰되며 자격 증명 없이 제출하는 경우:

@@ -105,6 +105,26 @@ SMTP_TLS_INSECURE=false        # 비공개 CA / 자체 서명 인증서일 때
 RESEND_FROM_EMAIL=noreply@yourdomain.com
 ```

+**암묵적 TLS / SMTPS(포트 465)** — STARTTLS를 광고하지 않는 알리바바 클라우드 / 텐센트 기업 메일 같은 제공자용. 포트 `465`는 암묵적 TLS를 자동으로 활성화하므로, 여기서 `SMTP_TLS`는 생략할 수 있습니다:
+
+```bash
+SMTP_HOST=smtp.qiye.aliyun.com
+SMTP_PORT=465
+SMTP_USERNAME=multica@yourdomain.com
+SMTP_PASSWORD=...
+SMTP_TLS=implicit              # optional on 465; required on a non-standard SMTPS port
+RESEND_FROM_EMAIL=noreply@yourdomain.com
+```
+
+공개 IP에서 보내는 기본 `localhost` greeting을 거부하는 **엄격한 공개 relay(예: Google Workspace `smtp-relay.gmail.com`)** 의 경우, relay가 기대하는 FQDN으로 `SMTP_EHLO_NAME`을 설정하세요 — 그렇지 않으면 연결이 끊기고, 이는 이후 명령에서 불투명한 `EOF`로 나타납니다. 기본값은 컨테이너 호스트명이며, 보통 유효한 FQDN이 아닙니다:
+
+```bash
+SMTP_HOST=smtp-relay.gmail.com
+SMTP_PORT=587
+SMTP_EHLO_NAME=mail.yourdomain.com   # relay가 받아들이는 FQDN; 기본값은 (FQDN이 아닌) 컨테이너 호스트명
+RESEND_FROM_EMAIL=noreply@yourdomain.com
+```
+
 그런 다음 재시작합니다: `docker compose -f docker-compose.selfhost.yml restart backend`. 재시작 시 백엔드는 어떤 제공자를 선택했는지 출력합니다(`EmailService: SMTP relay …` / `Resend API` / `DEV mode`) — 자격 증명은 절대 로그에 남지 않으므로, 이 줄은 도움을 요청할 때 공유해도 안전합니다.

 추가 인증 구성(OAuth, 가입 허용 목록)과 전체 SMTP 변수 레퍼런스는 [인증 설정](/auth-setup)과 [환경 변수 → 이메일](/environment-variables#email-configuration)을 참고하세요.
@@ -134,7 +154,7 @@ multica setup self-host

 ### 5b. 기기 간: 리버스 프록시를 앞에 두기

-compose 스택은 `127.0.0.1`에서만 수신하므로, 다른 기기에 있는 데몬은 `http://<server-ip>:8080`에 직접 연결할 수 없습니다 — 그리고 그렇게 되기를 원해서도 안 됩니다. 그렇지 않으면 기본 `JWT_SECRET`이 공개 인터넷에서 접근 가능해지기 때문입니다. 서버에 TLS를 종료하고 `127.0.0.1:8080`(백엔드)과 `127.0.0.1:3000`(프런트엔드)으로 전달하는 리버스 프록시를 두고, CLI를 공개 HTTPS URL로 연결하세요.
+compose 스택은 `127.0.0.1`에서만 수신하므로, 다른 기기에 있는 데몬은 `http://<server-ip>:8080`에 직접 연결할 수 없습니다 — 그리고 그렇게 되기를 원해서도 안 됩니다. 그렇지 않으면 서버 시크릿이 공개 인터넷에서 접근 가능해지기 때문입니다. 서버에 TLS를 종료하고 `127.0.0.1:8080`(백엔드)과 `127.0.0.1:3000`(프런트엔드)으로 전달하는 리버스 프록시를 두고, CLI를 공개 HTTPS URL로 연결하세요.

 ```bash
 multica setup self-host \
@@ -142,6 +162,10 @@ multica setup self-host \
  --app-url https://<your-domain>
 ```

+<Callout type="info">
+플래그 대신 환경 변수를 선호한다면, 해당 플래그를 생략할 때 `setup self-host`가 `MULTICA_SERVER_URL`과 `MULTICA_APP_URL`을 읽습니다(둘 다 설정하면 플래그가 우선합니다). `MULTICA_SERVER_URL`은 [환경 변수](/environment-variables)에 나오는 `ws://…/ws` 데몬 형식도 허용하며 HTTP 기본 URL로 정규화합니다.
+</Callout>
+
 단일 호스트네임에서 프런트엔드와 백엔드를 모두 앞단에 두는(데몬과 웹 앱 모두에 필요한 WebSocket 지원 포함) 최소 Caddyfile은 다음과 같습니다.

 ```nginx
@@ -172,44 +196,26 @@ multica.example.com {

 Cloud와 동일한 흐름입니다 — [Cloud 빠른 시작 → 5-6단계](/cloud-quickstart#5-create-an-agent)를 참고하세요.

-## 7. 사용량 롤업 스케줄링(사용량 대시보드에 필수)
+<span id="7-usage-rollup-no-operator-action-required" />

-<Callout type="warning">
-사용량 / 런타임 대시보드는 `rollup_task_usage_hourly()`가 채우는 파생 테이블 `task_usage_hourly`에서 데이터를 읽습니다. 번들된 `pgvector/pgvector:pg17` Postgres 이미지에는 **`pg_cron`이 포함되어 있지 않으며**, 백엔드도 롤업을 인프로세스로 실행하지 않습니다. `rollup_task_usage_hourly()`를 스케줄링하는 것이 없으면, 원시 `task_usage` 행은 계속 들어오는데 대시보드는 영원히 0에 머무릅니다.
+## 7. 사용량 롤업(운영자 작업 불필요)
+
+<Callout type="info">
+사용량 / 런타임 대시보드는 `rollup_task_usage_hourly()`가 채우는 파생 테이블 `task_usage_hourly`에서 데이터를 읽습니다. MUL-2957부터 백엔드는 DB 기반 스케줄러를 통해 인프로세스로 이 롤업을 실행하므로 더 이상 `pg_cron`이 필요하지 않으며, 외부 cron / systemd 타이머도 권장 설정이 아닙니다. 번들된 `pgvector/pgvector:pg17` 이미지가 변경 없이 동작합니다.
 </Callout>

-지원되는 옵션 중 하나를 고르세요 — 하나만 있으면 됩니다.
+인프로세스 스케줄러는 30초마다 틱하면서 `sys_cron_executions` 테이블을 통해 5분 단위 UTC 플랜을 클레임합니다. 백엔드 레플리카가 여러 개여도 안전합니다 — 고유 키 `(job_name, scope_kind, scope_id, plan_time)` 덕분에 각 플랜에서 단 하나만이 승자가 됩니다. 신규 배포에는 어떤 설정도 필요 없습니다.

-**옵션 A — 외부 cron / systemd-timer(가장 간단함).** 임의의 외부 스케줄러에서 5분마다 롤업을 실행합니다. 멱등하고 워터마크 기반이므로, 놓친 틱은 따라잡습니다.
-
-```bash
-# /etc/cron.d/multica-rollup — every 5 minutes
-*/5 * * * * root docker compose -f /path/to/multica/docker-compose.selfhost.yml \
-  exec -T postgres psql -U multica -d multica \
-  -c "SELECT rollup_task_usage_hourly();" >/dev/null
-```
-
-**옵션 B — Postgres를 `pg_cron`이 포함된 이미지로 교체.** `docker-compose.selfhost.yml`의 `pgvector/pgvector:pg17`을 `pgvector`와 `pg_cron`을 모두 갖춘 이미지(`supabase/postgres` 또는 커스텀 빌드)로 교체하고, `shared_preload_libraries=pg_cron`을 설정한 뒤 재시작하고, 작업을 한 번 등록합니다.
+**호환성 — 기존 `pg_cron` 등록.** 이전에 rollup을 `pg_cron` 잡으로 등록했었다면(`SELECT cron.schedule('rollup_task_usage_hourly', '*/5 * * * *', …)`) 굳이 제거할 필요는 없습니다 — SQL 함수가 내부적으로 advisory lock 4246을 잡기 때문에 앱 스케줄러와 `pg_cron`이 이중 쓰기를 할 수 없습니다. 중복 항목을 제거하려면:

 ```sql
-CREATE EXTENSION IF NOT EXISTS pg_cron;
-SELECT cron.schedule(
-  'rollup_task_usage_hourly',
-  '*/5 * * * *',
-  $$SELECT rollup_task_usage_hourly()$$
-);
+SELECT cron.unschedule('rollup_task_usage_hourly')
+  FROM cron.job WHERE jobname = 'rollup_task_usage_hourly';
 ```

-**옵션 C — 먼저 히스토리 백필(업그레이드 경로).** `v0.3.4 → v0.3.5+`로 업그레이드하는 중이고 기존 `task_usage` 행이 있다면, migration `103`이 hourly 테이블이 시드될 때까지 `refusing to drop legacy daily rollups: ...`와 함께 `migrate up`을 중단합니다. 번들된 백필을 한 번 실행한 다음, 옵션 A 또는 B를 설정하세요.
+**`v0.3.4 → v0.3.5+` 업그레이드.** 이전 릴리스에서는 migration 103을 적용하기 전에 운영자가 직접 `cmd/backfill_task_usage_hourly`를 실행해야 했고, 그러지 않으면 fail-closed 가드가 `migrate up`을 중단했습니다. MUL-2957부터 이 작업은 자동입니다 — migrate 명령이 migration 103을 적용하기 직전에(advisory lock 4246 보호 아래에서) 멱등한 월별 슬라이스 백필을 실행한 뒤 계속 진행합니다. 바쁜 DB에서는 여전히 `--sleep-between-slices=2s`로 읽기 부하를 조절하기 위해 스탠드얼론 backfill을 실행할 수 있지만 더 이상 필수는 아닙니다.

-```bash
-docker compose -f docker-compose.selfhost.yml exec backend \
-  ./backfill_task_usage_hourly --sleep-between-slices=2s
-```
-
-`--sleep-between-slices=2s`는 바쁜 DB에서 읽기 부하를 조절합니다. 완료된 후 백엔드 컨테이너를 재시작하면(시작 시 migration이 실행됨) 업그레이드가 완료됩니다.
-
-전체 레퍼런스 — Kubernetes `CronJob` 템플릿과 업그레이드 순서 포함 — 는 저장소의 [`SELF_HOSTING_ADVANCED.md → Usage Dashboard Rollup`](https://github.com/multica-ai/multica/blob/main/SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup)에 있습니다.
+전체 레퍼런스(운영 노트와 Kubernetes 배포 형태 포함)는 저장소의 [`SELF_HOSTING_ADVANCED.md → Usage Dashboard Rollup`](https://github.com/multica-ai/multica/blob/main/SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup)에 있습니다.

 ## Kubernetes 배포(대체 방안)

@@ -263,8 +269,8 @@ multica setup self-host \
 - **백엔드가 시작되지 않음**: `docker compose -f docker-compose.selfhost.yml logs backend`로 컨테이너 로그를 확인하세요. 보통 `.env`의 잘못된 `DATABASE_URL` 또는 `JWT_SECRET`이 원인입니다
 - **인증 코드를 받지 못함**: 이메일 백엔드가 구성되지 않은 경우(Resend도 SMTP도 없음) → `docker compose logs backend`에서 `[DEV] Verification code`를 찾으세요
 - **WebSocket이 연결되지 않음**: 공개 배포에서는 반드시 `FRONTEND_ORIGIN`을 실제 프런트엔드 도메인으로 설정해야 합니다. [문제 해결 → WebSocket이 연결되지 않음](/troubleshooting#websocket-wont-connect)을 참고하세요
-- **사용량 / 런타임 대시보드가 0에 머무름**: `rollup_task_usage_hourly()`가 스케줄링되지 않고 있습니다 — 위의 [7단계](#7-schedule-the-usage-rollup-required-for-the-usage-dashboard)와 [문제 해결 → 사용량 대시보드가 0으로 표시됨](/troubleshooting#usage-dashboard-stays-at-zero)을 참고하세요
-- **`migrate up`이 `refusing to drop legacy daily rollups`로 실패함**: `v0.3.4 → v0.3.5+` 업그레이드 경로 가드입니다. 먼저 `backfill_task_usage_hourly`를 실행하세요 — [7단계 → 옵션 C](#7-schedule-the-usage-rollup-required-for-the-usage-dashboard)를 참고하세요
+- **사용량 / 런타임 대시보드가 0에 머무름**: `rollup_task_usage_hourly()`가 스케줄링되지 않고 있습니다 — 위의 [7단계](#7-usage-rollup-no-operator-action-required)와 [문제 해결 → 사용량 대시보드가 0으로 표시됨](/troubleshooting#usage-dashboard-stays-at-zero)을 참고하세요
+- **`migrate up`이 `refusing to drop legacy daily rollups`로 실패함**: `v0.3.4 → v0.3.5+` 업그레이드 경로 가드입니다. MUL-2957부터 migrate 명령이 migration 103을 적용하기 전에 백필을 자동으로 실행합니다 — [7단계](#7-usage-rollup-no-operator-action-required)를 참고하세요

 ## 다음 단계

--- a/apps/docs/content/docs/self-host-quickstart.mdx
+++ b/apps/docs/content/docs/self-host-quickstart.mdx
@@ -30,7 +30,7 @@ make selfhost

 `make selfhost` will:

-1. Generate a `.env` from `.env.example` if missing, with a **random JWT_SECRET**
+1. Generate a `.env` from `.env.example` if missing, with a **random JWT_SECRET and Postgres password**
 2. Pull the official Docker images (PostgreSQL, Multica backend, Multica frontend)
 3. Bring up every service using `docker-compose.selfhost.yml`
 4. Wait until the backend's `/health` endpoint is ready
@@ -50,7 +50,7 @@ Once it's up:
 - **Backend**: [http://localhost:8080](http://localhost:8080)

 <Callout type="info">
-**Ports listen on `127.0.0.1` only.** `docker-compose.selfhost.yml` binds every published port to loopback — `ss -tlnp` will not show `0.0.0.0:8080`, and the services are unreachable from other machines by design. The default `JWT_SECRET` and Postgres credentials must never sit on the open internet. For cross-machine access, front the stack with a reverse proxy that terminates TLS — see [Step 5b — Cross-machine: front with a reverse proxy](#5b-cross-machine-front-with-a-reverse-proxy).
+**Ports listen on `127.0.0.1` only.** `docker-compose.selfhost.yml` binds every published port to loopback — `ss -tlnp` will not show `0.0.0.0:8080`, and the services are unreachable from other machines by design. Server secrets and Postgres credentials must never sit on the open internet. For cross-machine access, front the stack with a reverse proxy that terminates TLS — see [Step 5b — Cross-machine: front with a reverse proxy](#5b-cross-machine-front-with-a-reverse-proxy).
 </Callout>

 ## 2. Important: keep production safety on
@@ -117,6 +117,15 @@ SMTP_TLS=implicit              # optional on 465; required on a non-standard SMT
 RESEND_FROM_EMAIL=noreply@yourdomain.com
 ```

+For **strict public relays (e.g. Google Workspace `smtp-relay.gmail.com`)** that reject the default `localhost` greeting from a public IP, set `SMTP_EHLO_NAME` to the FQDN the relay expects — otherwise the connection is dropped and surfaces as an opaque `EOF` on a later command. It defaults to the container hostname, which is usually not a valid FQDN:
+
+```bash
+SMTP_HOST=smtp-relay.gmail.com
+SMTP_PORT=587
+SMTP_EHLO_NAME=mail.yourdomain.com   # FQDN the relay accepts; defaults to the (non-FQDN) container hostname
+RESEND_FROM_EMAIL=noreply@yourdomain.com
+```
+
 Then restart: `docker compose -f docker-compose.selfhost.yml restart backend`. On restart, the backend prints which provider it picked and the negotiated TLS mode (`EmailService: SMTP relay <host>:<port> (starttls|implicit-tls) from=…` / `Resend API` / `DEV mode`) — credentials are never logged, so this line is safe to share when asking for help.
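As a rough mental model of how the TLS mode in that log line gets chosen, the rules on this page can be written as a small function. Port `465` turns on implicit TLS automatically, and `SMTP_TLS=implicit` forces it on a non-standard SMTPS port; the aliases `smtps` and `ssl` come from the localized versions of this page in the same change. The function name, the case-insensitive matching, and the treatment of unrecognized values are assumptions, not the backend's actual code.

```python
# Values of SMTP_TLS that force an immediate TLS handshake.
IMPLICIT_ALIASES = frozenset({"implicit", "smtps", "ssl"})


def smtp_tls_mode(port: int, smtp_tls: str = "") -> str:
    """Return the TLS mode label for a given SMTP_PORT / SMTP_TLS pair."""
    # Assumption: matching is case-insensitive and ignores surrounding spaces.
    forced = smtp_tls.strip().lower() in IMPLICIT_ALIASES
    if forced or port == 465:
        return "implicit-tls"
    # "starttls" here means: upgrade the connection when the server
    # advertises STARTTLS. Assumption: any other SMTP_TLS value lands here.
    return "starttls"
```

Under this model, `SMTP_PORT=465` alone and `SMTP_PORT=2465` with `SMTP_TLS=smtps` both resolve to `implicit-tls`, while `SMTP_PORT=587` resolves to `starttls`.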

 For more auth configuration (OAuth, signup allowlist) and the full SMTP variable reference, see [Auth setup](/auth-setup) and [Environment variables → Email](/environment-variables#email-configuration).
@@ -146,7 +155,7 @@ That points the CLI at `http://localhost:8080` (backend) and `http://localhost:3

 ### 5b. Cross-machine: front with a reverse proxy

-Because the compose stack only listens on `127.0.0.1`, a daemon on a different machine cannot reach `http://<server-ip>:8080` directly — and you do not want it to, since the default `JWT_SECRET` would otherwise be reachable from the open internet. Put a reverse proxy on the server that terminates TLS and forwards to `127.0.0.1:8080` (backend) and `127.0.0.1:3000` (frontend), then point the CLI at the public HTTPS URL:
+Because the compose stack only listens on `127.0.0.1`, a daemon on a different machine cannot reach `http://<server-ip>:8080` directly — and you do not want it to, since server secrets would otherwise be reachable from the open internet. Put a reverse proxy on the server that terminates TLS and forwards to `127.0.0.1:8080` (backend) and `127.0.0.1:3000` (frontend), then point the CLI at the public HTTPS URL:

 ```bash
 multica setup self-host \
@@ -154,6 +163,10 @@ multica setup self-host \
  --app-url https://<your-domain>
 ```

+<Callout type="info">
+Prefer environment variables over flags? `setup self-host` reads `MULTICA_SERVER_URL` and `MULTICA_APP_URL` when the matching flag is omitted — a flag still takes precedence over the env var. `MULTICA_SERVER_URL` also accepts the `ws://…/ws` daemon form from [Environment variables](/environment-variables) and normalizes it to the HTTP base.
+</Callout>
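The precedence and normalization described in this callout might look like the sketch below. It is illustrative only: `normalize_server_url` and `resolve_server_url` are hypothetical names, and mapping `wss://` to `https://` is an assumption (the callout only mentions the `ws://…/ws` form).

```python
from typing import Mapping, Optional
from urllib.parse import urlsplit, urlunsplit


def normalize_server_url(raw: str) -> str:
    """Turn a ws://host/ws daemon URL into its HTTP base URL."""
    parts = urlsplit(raw.strip())
    # Assumption: wss maps to https the same way ws maps to http.
    scheme = {"ws": "http", "wss": "https"}.get(parts.scheme, parts.scheme)
    path = parts.path.rstrip("/")
    if parts.scheme in ("ws", "wss") and path.endswith("/ws"):
        path = path[: -len("/ws")]  # drop the WebSocket endpoint suffix
    return urlunsplit((scheme, parts.netloc, path, "", ""))


def resolve_server_url(flag: Optional[str], env: Mapping[str, str]) -> Optional[str]:
    """The flag wins; the env var is read only when the flag is omitted."""
    raw = flag or env.get("MULTICA_SERVER_URL")
    return normalize_server_url(raw) if raw else None
```

For example, `resolve_server_url(None, {"MULTICA_SERVER_URL": "ws://localhost:8080/ws"})` yields the HTTP base `http://localhost:8080`.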
+
 A minimal Caddyfile that fronts both the frontend and the backend (with WebSocket support, which the daemon and the web app both need) on a single hostname:

 ```nginx
@@ -184,44 +197,24 @@ After bringing the proxy up, set `FRONTEND_ORIGIN=https://multica.example.com` i

 Same flow as Cloud — see [Cloud quickstart → Steps 5-6](/cloud-quickstart#5-create-an-agent).

-## 7. Schedule the usage rollup (required for the Usage dashboard)
+## 7. Usage rollup (no operator action required)

-<Callout type="warning">
-The Usage / Runtime dashboards read from a derived `task_usage_hourly` table populated by `rollup_task_usage_hourly()`. The bundled `pgvector/pgvector:pg17` Postgres image **does not include `pg_cron`**, and the backend does not run the rollup in-process either. If nothing schedules `rollup_task_usage_hourly()`, raw `task_usage` rows keep arriving while the dashboard stays at zero forever.
+<Callout type="info">
+The Usage / Runtime dashboards read from a derived `task_usage_hourly` table populated by `rollup_task_usage_hourly()`. As of MUL-2957 the backend runs this rollup in-process via the DB-backed scheduler — `pg_cron` is no longer required, and external cron / systemd timers are no longer the recommended setup. The bundled `pgvector/pgvector:pg17` image works without changes.
 </Callout>

-Pick one of the supported options — only one is needed.
+The in-process scheduler ticks every 30 seconds and claims a 5-minute UTC plan via the `sys_cron_executions` table. Multiple backend replicas are safe — the unique key `(job_name, scope_kind, scope_id, plan_time)` means only one wins each plan. No setup is needed for new deployments.
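To see why a unique key is enough to make multiple replicas safe, here is a minimal sketch of the claim step. It uses SQLite as a stand-in for Postgres so it runs anywhere. Only the table name and the four key columns come from the text above; the column types, the flooring of the plan time to a 5-minute UTC boundary, and the `job_name` / `scope_kind` values are assumptions made for illustration.

```python
import sqlite3
from datetime import datetime, timezone

PLAN_SECONDS = 300  # one plan per 5-minute window


def create_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS sys_cron_executions (
            job_name   TEXT    NOT NULL,
            scope_kind TEXT    NOT NULL,
            scope_id   TEXT    NOT NULL,
            plan_time  INTEGER NOT NULL,
            UNIQUE (job_name, scope_kind, scope_id, plan_time)
        )
        """
    )


def plan_time(now: datetime) -> int:
    """Floor a timestamp to the start of its 5-minute UTC window (epoch seconds)."""
    epoch = int(now.astimezone(timezone.utc).timestamp())
    return epoch - epoch % PLAN_SECONDS


def try_claim(conn: sqlite3.Connection, job_name: str, scope_kind: str,
              scope_id: str, plan: int) -> bool:
    """Insert the plan row; only the first caller for a given key succeeds."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO sys_cron_executions "
        "(job_name, scope_kind, scope_id, plan_time) VALUES (?, ?, ?, ?)",
        (job_name, scope_kind, scope_id, plan),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 means another replica already claimed it
```

Every replica can tick as often as it likes; whichever one inserts the row first runs the rollup for that plan, and the others see a conflict and skip it.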

-**Option A — External cron / systemd-timer (simplest).** Run the rollup every 5 minutes from any out-of-band scheduler. It's idempotent and watermark-driven, so missed ticks catch up:
-
-```bash
-# /etc/cron.d/multica-rollup — every 5 minutes
-*/5 * * * * root docker compose -f /path/to/multica/docker-compose.selfhost.yml \
-  exec -T postgres psql -U multica -d multica \
-  -c "SELECT rollup_task_usage_hourly();" >/dev/null
-```
-
-**Option B — Swap Postgres for an image that ships `pg_cron`.** Replace `pgvector/pgvector:pg17` in `docker-compose.selfhost.yml` with an image that has both `pgvector` and `pg_cron` (`supabase/postgres`, or a custom build), set `shared_preload_libraries=pg_cron`, restart, then register the job once:
+**Compatibility — existing `pg_cron` registrations.** If you previously registered the rollup as a `pg_cron` job (`SELECT cron.schedule('rollup_task_usage_hourly', '*/5 * * * *', …)`), you do not need to remove it — the SQL function holds advisory lock 4246 internally, so the app scheduler and `pg_cron` cannot double-write. To drop the redundant entry:

 ```sql
-CREATE EXTENSION IF NOT EXISTS pg_cron;
-SELECT cron.schedule(
-  'rollup_task_usage_hourly',
-  '*/5 * * * *',
-  $$SELECT rollup_task_usage_hourly()$$
-);
+SELECT cron.unschedule('rollup_task_usage_hourly')
+  FROM cron.job WHERE jobname = 'rollup_task_usage_hourly';
 ```

-**Option C — Backfill history first (upgrade path).** If you're upgrading from `v0.3.4 → v0.3.5+` and have existing `task_usage` rows, migration `103` will abort `migrate up` with `refusing to drop legacy daily rollups: ...` until the hourly table is seeded. Run the bundled backfill once, then set up Option A or B:
+**Upgrade from `v0.3.4 → v0.3.5+`.** The previous release asked operators to run `cmd/backfill_task_usage_hourly` manually before applying migration 103, otherwise the migration's fail-closed guard would abort `migrate up`. As of MUL-2957 this is automatic: the migrate command runs an idempotent monthly-slice backfill (under advisory lock 4246) immediately before applying migration 103, then continues. You may still run the standalone backfill on a busy DB to throttle read pressure with `--sleep-between-slices=2s`, but it is no longer required.

-```bash
-docker compose -f docker-compose.selfhost.yml exec backend \
-  ./backfill_task_usage_hourly --sleep-between-slices=2s
-```
-
-`--sleep-between-slices=2s` throttles read pressure on a busy DB. After it finishes, restart the backend container (migrations run on startup) and the upgrade completes.
-
-Full reference — including the Kubernetes `CronJob` template and the upgrade order — lives in the repo's [`SELF_HOSTING_ADVANCED.md → Usage Dashboard Rollup`](https://github.com/multica-ai/multica/blob/main/SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup).
+Full reference — including operations notes and the Kubernetes deployment shape — lives in the repo's [`SELF_HOSTING_ADVANCED.md → Usage Dashboard Rollup`](https://github.com/multica-ai/multica/blob/main/SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup).

 ## Kubernetes deployment (alternative)

@@ -275,8 +268,8 @@ The full reference — three login modes, the `backend` ExternalName workaround
 - **Backend won't start**: check container logs with `docker compose -f docker-compose.selfhost.yml logs backend`; usually it's a bad `DATABASE_URL` or `JWT_SECRET` in `.env`
 - **Verification code not received**: no email backend is configured (neither Resend nor SMTP) → look for `[DEV] Verification code` in `docker compose logs backend`
 - **WebSocket won't connect**: for public deployments you must set `FRONTEND_ORIGIN` to your real frontend domain; see [Troubleshooting → WebSocket won't connect](/troubleshooting#websocket-wont-connect)
- **Usage / Runtime dashboard stays at zero**: `rollup_task_usage_hourly()` isn't being scheduled — see [Step 7](#7-schedule-the-usage-rollup-required-for-the-usage-dashboard) above and [Troubleshooting → Usage dashboard shows zero](/troubleshooting#usage-dashboard-stays-at-zero)
- **`migrate up` fails with `refusing to drop legacy daily rollups`**: upgrade-path guard from `v0.3.4 → v0.3.5+`. Run `backfill_task_usage_hourly` first — see [Step 7 → Option C](#7-schedule-the-usage-rollup-required-for-the-usage-dashboard)
+- **Usage / Runtime dashboard stays at zero**: the in-process scheduler isn't claiming `rollup_task_usage_hourly()`; see [Step 7](#7-usage-rollup-no-operator-action-required) above and [Troubleshooting → Usage dashboard shows zero](/troubleshooting#usage-dashboard-stays-at-zero)
+- **`migrate up` fails with `refusing to drop legacy daily rollups`**: upgrade-path guard from `v0.3.4 → v0.3.5+`. As of MUL-2957 the migrate command runs the backfill automatically before applying migration 103 — see [Step 7](#7-usage-rollup-no-operator-action-required)

 ## Next steps

--- a/apps/docs/content/docs/self-host-quickstart.zh.mdx
+++ b/apps/docs/content/docs/self-host-quickstart.zh.mdx
@@ -30,7 +30,7 @@ make selfhost

 `make selfhost` 会：

-1. 如果没有 `.env` 文件，从 `.env.example` 自动生成一份并**生成随机 JWT_SECRET**
+1. 如果没有 `.env` 文件，从 `.env.example` 自动生成一份，并**生成随机 JWT_SECRET 和 Postgres 密码**
 2. 拉取官方 Docker 镜像（PostgreSQL、Multica backend、Multica frontend）
 3. 用 `docker-compose.selfhost.yml` 启动全部服务
 4. 等后端 `/health` 端点准备就绪
@@ -49,7 +49,7 @@ make selfhost
 - **后端**：[http://localhost:8080](http://localhost:8080)

 <Callout type="info">
-**所有端口只监听 `127.0.0.1`。** `docker-compose.selfhost.yml` 把每个 publish 出来的端口都绑到 loopback —— `ss -tlnp` 不会看到 `0.0.0.0:8080`，外网/其它机器默认根本连不上。这是为了避免默认 `JWT_SECRET` 和 Postgres 凭据被直接暴露到公网。要做跨机访问，请用反向代理在前面终结 TLS，详见下方 [Step 5b —— 跨机访问：用反向代理把服务挡在前面](#5b-跨机访问用反向代理把服务挡在前面)。
+**所有端口只监听 `127.0.0.1`。** `docker-compose.selfhost.yml` 把每个 publish 出来的端口都绑到 loopback —— `ss -tlnp` 不会看到 `0.0.0.0:8080`，外网/其它机器默认根本连不上。这是为了避免服务密钥和 Postgres 凭据被直接暴露到公网。要做跨机访问，请用反向代理在前面终结 TLS，详见下方 [Step 5b —— 跨机访问：用反向代理把服务挡在前面](#5b-跨机访问用反向代理把服务挡在前面)。
 </Callout>

 ## 2. 重要：保持生产安全配置
@@ -116,6 +116,15 @@ SMTP_TLS=implicit              # 465 上可省略；非标准 SMTPS 端口上必
 RESEND_FROM_EMAIL=noreply@yourdomain.com
 ```

+对于**拒绝来自公网 IP 的默认 `localhost` 问候的严格公网 relay（例如 Google Workspace `smtp-relay.gmail.com`）**，把 `SMTP_EHLO_NAME` 设成 relay 期望的 FQDN——否则连接会被直接断开，并在后续某条命令上表现为一个不知所云的 `EOF`。它默认取容器主机名，而后者通常不是合法的 FQDN：
+
+```bash
+SMTP_HOST=smtp-relay.gmail.com
+SMTP_PORT=587
+SMTP_EHLO_NAME=mail.yourdomain.com   # relay 接受的 FQDN；默认取（非 FQDN 的）容器主机名
+RESEND_FROM_EMAIL=noreply@yourdomain.com
+```
+
 之后重启：`docker compose -f docker-compose.selfhost.yml restart backend`。重启时 backend 会打印当前选择的 provider 和协商出的 TLS 模式（`EmailService: SMTP relay <host>:<port> (starttls|implicit-tls) from=…` / `Resend API` / `DEV mode`），密码不会被记录，所以这行截图给同事是安全的。

 更多 auth 配置（OAuth、注册白名单）以及完整的 SMTP 变量说明见 [登录与注册配置](/auth-setup) 和 [环境变量](/environment-variables)。
@@ -145,7 +154,7 @@ multica setup self-host

 ### 5b. 跨机访问：用反向代理把服务挡在前面

-因为 compose 默认只监听 `127.0.0.1`，从别的机器跑的 daemon 是连不上 `http://<server-ip>:8080` 的——这也是有意为之，否则默认 `JWT_SECRET` 等于直接暴露在公网。正确做法是在 server 上跑一个反向代理（Caddy / nginx / Cloudflare Tunnel），由它终结 TLS，再反代到 `127.0.0.1:8080`（backend）和 `127.0.0.1:3000`（frontend）。然后把 CLI 指到公开的 HTTPS 域名：
+因为 compose 默认只监听 `127.0.0.1`，从别的机器跑的 daemon 是连不上 `http://<server-ip>:8080` 的——这也是有意为之，否则服务密钥会直接暴露在公网。正确做法是在 server 上跑一个反向代理（Caddy / nginx / Cloudflare Tunnel），由它终结 TLS，再反代到 `127.0.0.1:8080`（backend）和 `127.0.0.1:3000`（frontend）。然后把 CLI 指到公开的 HTTPS 域名：

 ```bash
 multica setup self-host \
@@ -153,6 +162,10 @@ multica setup self-host \
  --app-url https://<你的域名>
 ```

+<Callout type="info">
+更习惯用环境变量？省略对应 flag 时，`setup self-host` 会读取 `MULTICA_SERVER_URL` 和 `MULTICA_APP_URL`（同时设置时 flag 优先）。`MULTICA_SERVER_URL` 也接受[环境变量](/environment-variables)里那种 `ws://…/ws` 的 daemon 写法，并自动归一化为 HTTP 地址。
+</Callout>
+
 最小可用的 Caddyfile，单域名同时挂前后端（带 WebSocket 转发，daemon 和网页端都依赖）：

 ```nginx
@@ -183,44 +196,26 @@ multica.example.com {

 流程和 Cloud 一样——见 [Cloud 快速上手 → 5-6 步](/cloud-quickstart#5-创建智能体)。

-## 7. 调度用量汇总任务（Usage Dashboard 必需）
+<span id="7-usage-rollup-no-operator-action-required" />

-<Callout type="warning">
-Usage / Runtime 看板读的是派生表 `task_usage_hourly`，需要 `rollup_task_usage_hourly()` 周期性运行才能填充。**默认的 `pgvector/pgvector:pg17` 镜像不带 `pg_cron`**，后端进程内部也不会跑这个 rollup——什么都没调度的话，原始 `task_usage` 行会继续写入，但 dashboard 会一直停在 0，不会报错。
+## 7. 用量汇总（无需运维操作）
+
+<Callout type="info">
+Usage / Runtime 看板读的是派生表 `task_usage_hourly`，由 `rollup_task_usage_hourly()` 周期性填充。从 MUL-2957 起，后端通过 DB 后端的调度器在进程内运行该 rollup —— 不再需要 `pg_cron`，外部 cron / systemd timer 也不再是推荐方案。默认的 `pgvector/pgvector:pg17` 镜像无需改动即可工作。
 </Callout>

-三种支持路径，三选一即可。
+进程内调度器每 30 秒 tick 一次，通过 `sys_cron_executions` 表认领 5 分钟一档的 UTC plan。多 backend 副本同时跑也安全 —— 唯一键 `(job_name, scope_kind, scope_id, plan_time)` 保证每个 plan 只有一个赢家。新部署不需要任何额外配置。

-**Option A —— 外部 cron / systemd-timer（最简单）。** 在任意外部调度器上每 5 分钟跑一次 rollup。函数是幂等的、按 watermark 推进，丢一两个 tick 下次能补上：
-
-```bash
-# /etc/cron.d/multica-rollup —— 每 5 分钟跑一次
-*/5 * * * * root docker compose -f /path/to/multica/docker-compose.selfhost.yml \
-  exec -T postgres psql -U multica -d multica \
-  -c "SELECT rollup_task_usage_hourly();" >/dev/null
-```
-
-**Option B —— 换成自带 `pg_cron` 的 Postgres 镜像。** 把 `docker-compose.selfhost.yml` 里的 `pgvector/pgvector:pg17` 换成同时带 `pgvector` 和 `pg_cron` 的镜像（比如 `supabase/postgres`，或自己 build 一份），把 `shared_preload_libraries=pg_cron` 配上、重启 Postgres，然后注册一次任务：
+**兼容性 —— 已注册的 `pg_cron` 任务。** 如果你之前用 `pg_cron` 注册过 rollup（`SELECT cron.schedule('rollup_task_usage_hourly', '*/5 * * * *', …)`），不删也行 —— SQL 函数内部持有 advisory lock 4246，应用调度器和 `pg_cron` 不会并发双写。要清掉冗余项可以执行：

 ```sql
-CREATE EXTENSION IF NOT EXISTS pg_cron;
-SELECT cron.schedule(
-  'rollup_task_usage_hourly',
-  '*/5 * * * *',
-  $$SELECT rollup_task_usage_hourly()$$
-);
+SELECT cron.unschedule('rollup_task_usage_hourly')
+  FROM cron.job WHERE jobname = 'rollup_task_usage_hourly';
 ```

-**Option C —— 先回填历史（升级路径）。** 如果你是从 `v0.3.4` 升级到 `v0.3.5+` 且数据库里已经有 `task_usage` 行，migration `103` 会以 `refusing to drop legacy daily rollups: ...` 报错并中止 `migrate up`，直到 hourly 表被 seed 过。先跑一次内置的 backfill 命令，然后再配 Option A 或 Option B 让新数据持续流进来：
+**从 `v0.3.4 → v0.3.5+` 升级。** 上一版要求运维在应用 migration 103 之前手动跑 `cmd/backfill_task_usage_hourly`，否则 fail-closed 守卫会中止 `migrate up`。从 MUL-2957 起这一步是自动的：migrate 命令会在应用 migration 103 之前（advisory lock 4246 保护下）运行幂等的按月切片 backfill，然后继续。在繁忙的数据库上你仍可以用 `--sleep-between-slices=2s` 跑独立 backfill 来限制读压力，但已不是必需。

-```bash
-docker compose -f docker-compose.selfhost.yml exec backend \
-  ./backfill_task_usage_hourly --sleep-between-slices=2s
-```
-
-`--sleep-between-slices=2s` 用来在繁忙的数据库上限制读压力。回填跑完后重启后端容器（migration 在启动时自动跑），升级就能继续。
-
-完整参考（含 Kubernetes `CronJob` 模板和升级顺序）见仓库的 [`SELF_HOSTING_ADVANCED.md → Usage Dashboard Rollup`](https://github.com/multica-ai/multica/blob/main/SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup)。
+完整参考（含运维注意事项和 Kubernetes 部署形态）见仓库的 [`SELF_HOSTING_ADVANCED.md → Usage Dashboard Rollup`](https://github.com/multica-ai/multica/blob/main/SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup)。

 ## Kubernetes 部署（替代方案）

@@ -274,8 +269,8 @@ multica setup self-host \
 - **后端起不来**：看容器日志 `docker compose -f docker-compose.selfhost.yml logs backend`；常见是 `.env` 里 `DATABASE_URL` 或 `JWT_SECRET` 有问题
 - **验证码收不到**：没配任何邮件后端（Resend 和 SMTP 都没设） → 从 `docker compose logs backend` 里找 `[DEV] Verification code`
 - **WebSocket 连不上**：公网部署必须设 `FRONTEND_ORIGIN` 成你真实的前端域名；见 [故障排查 → WebSocket 连不上](/troubleshooting#websocket-连不上)
- **Usage / Runtime 看板一直是 0**：没人调度 `rollup_task_usage_hourly()` —— 见上面的 [第 7 步](#7-调度用量汇总任务usage-dashboard-必需) 和 [故障排查 → Usage 看板一直是 0](/troubleshooting#usage-看板一直是-0)
- **`migrate up` 报 `refusing to drop legacy daily rollups`**：`v0.3.4 → v0.3.5+` 升级路径的 fail-closed guard。先跑 `backfill_task_usage_hourly` —— 见 [第 7 步 → Option C](#7-调度用量汇总任务usage-dashboard-必需)
+- **Usage / Runtime 看板一直是 0**：没人调度 `rollup_task_usage_hourly()` —— 见上面的 [第 7 步](#7-usage-rollup-no-operator-action-required) 和 [故障排查 → Usage 看板一直是 0](/troubleshooting#usage-看板一直是-0)
+- **`migrate up` 报 `refusing to drop legacy daily rollups`**：`v0.3.4 → v0.3.5+` 升级路径的 fail-closed guard。从 MUL-2957 起 migrate 命令在应用 migration 103 之前会自动跑 backfill —— 见 [第 7 步](#7-usage-rollup-no-operator-action-required)

 ## 下一步

--- a/apps/docs/content/docs/skills.ja.mdx
+++ b/apps/docs/content/docs/skills.ja.mdx
@@ -54,7 +54,7 @@ GitHub や ClawHub からインポートしたスキルには、スクリプト
 - **スキル** = 構造化された**ナレッジパック**（静的なコンテンツ + 指示）。エージェントはスキルを読んで「問題 X を見たら、こう考えてこう行動する」を学びます。
 - **MCP**（Model Context Protocol）= **ツールチャネル**。エージェントは MCP を使って外部サービス（データベース、ファイルシステム、サードパーティ API）に接続し、それらを**呼び出します**。

-この 2 つは相互補完的です。現在の Multica では、MCP のサポートを**実際に使うのは Claude Code だけ**です — 他のツールは MCP 設定を受け取りはしますが、実際には使いません。MCP 専用のセクションは今後のリリースで追加される予定です。
+この 2 つは相互補完的です。現在の Multica では、MCP サポートは**ツールごとに実装されています**: Claude Code、Codex、Cursor、Hermes、Kimi、Kiro CLI、OpenCode、OpenClaw は `mcp_config` を使用し、他のツールはこのフィールドを受け取っても実際には使いません。MCP 専用のセクションは今後のリリースで追加される予定です。

 ---

--- a/apps/docs/content/docs/skills.ko.mdx
+++ b/apps/docs/content/docs/skills.ko.mdx
@@ -54,7 +54,7 @@ GitHub나 ClawHub에서 가져온 스킬에는 스크립트와 실행 가능한
 - **스킬** = 구조화된 **지식 팩**(정적 콘텐츠 + 지침). 에이전트는 스킬을 읽어 "문제 X를 만나면 이렇게 생각하고 이렇게 처리하라"를 학습합니다.
 - **MCP**(Model Context Protocol) = **도구 채널**. 에이전트는 MCP를 사용해 외부 서비스(데이터베이스, 파일 시스템, 서드파티 API)에 연결하고 이를 **호출**합니다.

-이 둘은 상호 보완적입니다. 현재 Multica에서 MCP 지원은 **도구별로 구현됩니다**: Claude Code, Codex, Hermes, Kimi, Kiro CLI, OpenCode, OpenClaw는 `mcp_config`를 사용하고, 다른 도구들은 이 필드를 받더라도 실제로 사용하지 않습니다. MCP 전용 섹션은 추후 릴리스에서 추가될 예정입니다.
+이 둘은 상호 보완적입니다. 현재 Multica에서 MCP 지원은 **도구별로 구현됩니다**: Claude Code, Codex, Cursor, Hermes, Kimi, Kiro CLI, OpenCode, OpenClaw는 `mcp_config`를 사용하고, 다른 도구들은 이 필드를 받더라도 실제로 사용하지 않습니다. MCP 전용 섹션은 추후 릴리스에서 추가될 예정입니다.

 ---

--- a/apps/docs/content/docs/skills.mdx
+++ b/apps/docs/content/docs/skills.mdx
@@ -54,7 +54,7 @@ Both augment what an agent can do, but in different directions:
 - **Skill** = a structured **knowledge pack** (static content + instructions). The agent reads a skill to learn "when I see problem X, here's how to think and what to do."
 - **MCP** (Model Context Protocol) = a **tool channel**. The agent uses MCP to connect to external services (databases, filesystems, third-party APIs) and **invoke** them.

-The two are complementary. In Multica today, MCP support is **provider-specific**: Claude Code, Codex, Hermes, Kimi, Kiro CLI, OpenCode, and OpenClaw consume `mcp_config`; other tools receive the field but don't actually use it. A dedicated MCP section will come in a later release.
+The two are complementary. In Multica today, MCP support is **provider-specific**: Claude Code, Codex, Cursor, Hermes, Kimi, Kiro CLI, OpenCode, and OpenClaw consume `mcp_config`; other tools receive the field but don't actually use it. A dedicated MCP section will come in a later release.

 ---

--- a/apps/docs/content/docs/skills.zh.mdx
+++ b/apps/docs/content/docs/skills.zh.mdx
@@ -54,7 +54,7 @@ Skill 导入后需要**挂载到具体的智能体**才会生效。一个智能
 - **Skill** = 结构化的**知识包**（静态内容 + 指令）。智能体读 Skill 来学"遇到 X 类问题该怎么想、怎么做"。
 - **MCP**（Model Context Protocol）= **工具通道**。智能体通过 MCP 连外部服务（数据库、文件系统、第三方 API）并**调用**它们。

-两者可以同时用。目前 Multica 的 MCP 支持是**按工具实现**的：Claude Code、Codex、Hermes、Kimi、Kiro CLI、OpenCode、OpenClaw 会消费 `mcp_config`；其他工具会接收到这个字段但不会实际用。MCP 的专题会在后续版本展开。
+两者可以同时用。目前 Multica 的 MCP 支持是**按工具实现**的：Claude Code、Codex、Cursor、Hermes、Kimi、Kiro CLI、OpenCode、OpenClaw 会消费 `mcp_config`；其他工具会接收到这个字段但不会实际用。MCP 的专题会在后续版本展开。

 ---

--- a/apps/docs/content/docs/tasks.ja.mdx
+++ b/apps/docs/content/docs/tasks.ja.mdx
@@ -105,8 +105,7 @@ Multica はタスク中にセッション ID を**2 回**固定します: 開始

 ただし、**実際にどの AI コーディングツールがこれをサポートするか**は大きく異なります。

- ✅ **実際にサポート** — Antigravity, Claude Code, Copilot, Hermes, Kimi, Kiro CLI, OpenCode, OpenClaw, Pi
- ⚠️ **コードはあるが使用不可** — Codex, Cursor
+- ✅ **実際にサポート** — Antigravity, Claude Code, Codex, Copilot, Cursor, Hermes, Kimi, Kiro CLI, OpenCode, OpenClaw, Pi
 - ❌ **サポートなし** — Gemini

 [プロバイダー対応表 → セッション再開](/providers#session-resumption-who-really-supports-it)を参照してください。
--- a/apps/docs/content/docs/tasks.ko.mdx
+++ b/apps/docs/content/docs/tasks.ko.mdx
@@ -105,8 +105,7 @@ Multica는 작업 중에 세션 ID를 **두 번** 고정합니다: 시작 시

 하지만 **실제로 어떤 AI 코딩 도구가 이를 지원하는지**는 크게 다릅니다:

- ✅ **실제 지원** — Antigravity, Claude Code, Copilot, Hermes, Kimi, Kiro CLI, OpenCode, OpenClaw, Pi
- ⚠️ **코드는 있지만 사용 불가** — Codex, Cursor
+- ✅ **실제 지원** — Antigravity, Claude Code, Codex, Copilot, Cursor, Hermes, Kimi, Kiro CLI, OpenCode, OpenClaw, Pi
 - ❌ **지원 안 함** — Gemini

 [제공자 매트릭스 → 세션 재개](/providers#session-resumption-who-really-supports-it)를 참고하세요.
--- a/apps/docs/content/docs/tasks.mdx
+++ b/apps/docs/content/docs/tasks.mdx
@@ -105,8 +105,7 @@ Multica pins the session ID **twice** during a task: once at the start (when the

 But **which AI coding tools actually support this** varies a lot:

- ✅ **Real support** — Antigravity, Claude Code, Copilot, Hermes, Kimi, Kiro CLI, OpenCode, OpenClaw, Pi
- ⚠️ **Code exists but unusable** — Codex, Cursor
+- ✅ **Real support** — Antigravity, Claude Code, Codex, Copilot, Cursor, Hermes, Kimi, Kiro CLI, OpenCode, OpenClaw, Pi
 - ❌ **No support** — Gemini

 See [Providers Matrix → Session resumption](/providers#session-resumption-who-really-supports-it).
--- a/apps/docs/content/docs/tasks.zh.mdx
+++ b/apps/docs/content/docs/tasks.zh.mdx
@@ -42,6 +42,8 @@ Multica 服务器每 30 秒扫描一次，有两种超时会触发失败：

 两种超时的失败原因都是 `timeout`，**会自动重试**（下一节）。关联的运行时失联判定见 [守护进程与运行时 → 运行时什么时候被判定为离线](/daemon-runtimes#运行时什么时候被判定为离线)。

+上面这层是**服务端的粗粒度兜底**——按任务启动时间算，不看任务是否还在活动。真正区分「卡死」和「正常的长任务」的是**本地守护进程**：它不再用固定墙钟时长砍任务（`MULTICA_AGENT_TIMEOUT` 默认 `0` = 不设上限），而是看活动——只要 agent 还在持续产出事件（消息、工具调用），守护进程就不会因为跑得久判它超时（服务端那条 2.5h 仍是外层上限）。只有真正静默卡死时才会被**空闲看门狗**（`MULTICA_AGENT_IDLE_WATCHDOG`，默认 30 分钟）终止；如果是某个工具调用发出后长时间无任何输出（疑似卡死的子进程），则由更大的**工具看门狗**预算（`MULTICA_AGENT_TOOL_WATCHDOG`，默认 2 小时）兜底。这类被看门狗终止的任务失败原因是 `idle_watchdog`，和墙钟 `timeout` 区分开。各参数见 [环境变量 → 守护进程的调节参数](/environment-variables#守护进程的调节参数)。
+
 ## 哪些失败会自动重试，哪些不会

 失败分两类：**可重试**和**不可重试**。
@@ -105,8 +107,7 @@ Multica 在任务过程中**两次**保存会话 ID——任务一开始（AI

 但**哪些 AI 编程工具真的支持**差别很大：

- ✅ **真支持**——Antigravity、Claude Code、Copilot、Hermes、Kimi、Kiro CLI、OpenCode、OpenClaw、Pi
- ⚠️ **代码看起来支持但实际不可用**——Codex、Cursor
+- ✅ **真支持**——Antigravity、Claude Code、Codex、Copilot、Cursor、Hermes、Kimi、Kiro CLI、OpenCode、OpenClaw、Pi
 - ❌ **不支持**——Gemini

 详见 [Providers Matrix → 会话恢复](/providers#会话恢复谁真的支持)。
--- a/apps/docs/content/docs/troubleshooting.ja.mdx
+++ b/apps/docs/content/docs/troubleshooting.ja.mdx
@@ -180,9 +180,9 @@ docker exec <container> env | grep -E 'APP_ENV|MULTICA_DEV_VERIFICATION_CODE'

 **考えられる原因**:

-1. **`rollup_task_usage_hourly()` が一切スケジュールされていない** — 使用量 / ランタイムのダッシュボードは派生テーブル `task_usage_hourly` から読み取り、このテーブルはその関数によって埋められます。同梱の `pgvector/pgvector:pg17` イメージには `pg_cron` が含まれておらず、バックエンドもプロセス内で rollup を実行しません。外部スケジューラのない新規セルフホストインストールでは、これがデフォルトの状態です。
-2. **`pg_cron` はインストールされているが誤ったデータベースを指している** — `pg_cron.database_name` のデフォルト値は `postgres` です。Multica のデータベース名が異なる場合、スケジュールされたジョブは `rollup_task_usage_hourly()` を一切見つけられません。
-3. **スケジューラは動作しているが rollup が静かにエラーを出している** — 例えば cron エントリ内部の DB ロール / search_path が誤っている。
+1. **`rollup_task_usage_hourly()` がクレームされていない** — 使用量 / ランタイムのダッシュボードは派生テーブル `task_usage_hourly` から読み取り、このテーブルはその関数によって埋められます。MUL-2957 以降、バックエンドは DB バックドのスケジューラー（`sys_cron_executions`）を介してこの rollup をインプロセスで実行します。古いビルド、未適用の migration `113`、またはレプリカが残っていない長時間のバックエンド停止があると、最近の SUCCESS 行のないテーブルが残ることがあります。
+2. **`pg_cron` は互換性のために構成されているが誤ったデータベースを指している** — `pg_cron.database_name` のデフォルトは `postgres` です。Multica データベース名が異なる場合、スケジュールされたジョブは `rollup_task_usage_hourly()` を一切見つけられません。インプロセススケジューラーはこれに依存しませんが、もしインプロセススケジューラーを除去して `pg_cron` に依存している場合、DB 名は一致しなければなりません。
+3. **ハンドラーがクレームされているが静かにエラーを出している** — マイグレーションが部分的にしか適用されていないために SQL 関数が欠落している、あるいは DB ロール / search_path が誤って構成されている、など。`sys_cron_executions` の FAILED 監査行を確認してください。

 **診断方法**:

@@ -191,24 +191,30 @@ docker exec <container> env | grep -E 'APP_ENV|MULTICA_DEV_VERIFICATION_CODE'
 SELECT count(*) AS raw_rows FROM task_usage;
 SELECT count(*) AS hourly_rows FROM task_usage_hourly;

-- Confirm pg_cron is (or isn't) available.
-SELECT * FROM pg_available_extensions WHERE name = 'pg_cron';
-SHOW shared_preload_libraries;
-
-- If pg_cron is installed, check the schedule + last run.
-SELECT jobname, schedule, database, active FROM cron.job;
-SELECT jobname, status, return_message, start_time, end_time
-  FROM cron.job_run_details ORDER BY start_time DESC LIMIT 10;
+-- Inspect the in-process scheduler's audit log.
+SELECT plan_time, status, attempt, runner_id,
+       error_code, error_msg, started_at, finished_at
+  FROM sys_cron_executions
+ WHERE job_name = 'rollup_task_usage_hourly'
+ ORDER BY plan_time DESC
+ LIMIT 20;

 -- Watermark — if this is 1970-01-01, the rollup has never run.
 SELECT watermark_at FROM task_usage_hourly_rollup_state;
+
+-- Compatibility path: if you previously registered pg_cron, confirm
+-- it is (or isn't) available and pointing at the right database.
+SELECT * FROM pg_available_extensions WHERE name = 'pg_cron';
+SHOW shared_preload_libraries;
+SELECT jobname, schedule, database, active FROM cron.job;
 ```

 **解決方法**:

- rollup を手動で一度呼び出して動作するか確認してください: `SELECT rollup_task_usage_hourly();` — ダッシュボードを再読み込みしてください。数値が表示されれば、欠けているのはスケジューラだけです。
- [セルフホストクイックスタート → 使用量 rollup のスケジューリング](/self-host-quickstart#7-schedule-the-usage-rollup-required-for-the-usage-dashboard)からサポートされる方式のいずれかを選んでください: 外部 cron / systemd-timer / Kubernetes CronJob、または Postgres を `pg_cron` を含むイメージに置き換える。
- スケジュール設定より前の履歴がすでにある場合は、バックエンドコンテナ内部で `backfill_task_usage_hourly` を実行し、ウォーターマーク以前のバケットを埋めてください。
+- 少なくとも 1 つのバックエンドレプリカでスケジューラーが実際に動作していることを確認してください — 30 秒ごとに `sys_cron_executions` の `rollup_task_usage_hourly` に SUCCESS 行が追加されているはずです。
+- SQL パスを検証するため、rollup を手動で一度呼び出してください: `SELECT rollup_task_usage_hourly();` — ダッシュボードを再読み込みしてください。数値が表示されれば SQL 関数は問題なく、スケジューラーのクレーム経路に問題があります。
+- migration `113_sys_cron_executions` がまだ適用されていない場合は、バックエンドを再起動してマイグレーションを実行するか、手動で `migrate up` を呼び出してください。
+- インプロセススケジューラー以前のレガシー `pg_cron` 履歴がある場合でも、SQL 関数は内部で advisory lock 4246 を保持するため二重書き込みは発生しません — オプションの `cron.unschedule` クリーンアップについては [セルフホストクイックスタート → 使用量ロールアップ](/self-host-quickstart#7-usage-rollup-no-operator-action-required) を参照してください。

 ## マイグレーション `103` が `refusing to drop legacy daily rollups` で失敗する

@@ -224,9 +230,11 @@ ERROR: refusing to drop legacy daily rollups:

 **考えられる原因**: これはマイグレーション `103` の fail-closed ガードです。`task_usage_hourly` が生の `task_usage` に追いつくまで、レガシーの daily rollup の削除を拒否します。既存の行が存在し、rollup のウォーターマークがまだ epoch に留まっているとき — つまり、まだどの履歴も hourly テーブルに rollup されていないとき — にこのガードが発動します。

+MUL-2957 以降、migrate コマンドは migration `103` を適用する直前に冪等な月別スライス backfill（advisory lock 4246 の下）を自動で実行するため、v0.3.4 → v0.3.5+ への直接アップグレードは単一の `migrate up` 呼び出しで完了します。それでもこのエラーが表示される場合は、MUL-2957 以前のバイナリを使用しているか、フック自体が失敗しています — 直前の `task_usage hourly rollup hook` 行で migrate ログを確認してください。
+
 **解決方法**:

-1. 同じデータベースに対して backfill を実行してください（冪等であり、中断しても安全で、再実行しても安全です）:
+1. MUL-2957 以前のバイナリを使用しており、まずバイナリをアップグレードできない場合は、同じデータベースに対してスタンドアロンの backfill を実行してください（冪等であり、中断しても安全で、再実行しても安全です）:

   ```bash
   # Docker Compose
@@ -239,7 +247,7 @@ ERROR: refusing to drop legacy daily rollups:
   ```

 2. アップグレードを再実行してください — バックエンドコンテナを再起動するだけで十分で、マイグレーションは起動時に実行されます。これでガードが最新のウォーターマークを確認し、`103` の適用を許可します。
-3. ウォーターマークが進み続けるように、継続的な rollup スケジュール（cron / `pg_cron`）を設定してください — [セルフホストクイックスタート → 使用量 rollup のスケジューリング](/self-host-quickstart#7-schedule-the-usage-rollup-required-for-the-usage-dashboard)を参照してください。
+3. インプロセススケジューラーがウォーターマークを進め続けます — [セルフホストクイックスタート → 使用量ロールアップ](/self-host-quickstart#7-usage-rollup-no-operator-action-required) を参照してください。

 `--sleep-between-slices=2s` は、数年分の履歴を持つプロダクションデータベースにとって控えめなデフォルト値です。直近 N か月のみを保持し、それより古いバケットを永久に放棄してもかまわない場合は `--months-back N --force-partial` を使用してください。

--- a/apps/docs/content/docs/troubleshooting.ko.mdx
+++ b/apps/docs/content/docs/troubleshooting.ko.mdx
@@ -180,9 +180,9 @@ docker exec <container> env | grep -E 'APP_ENV|MULTICA_DEV_VERIFICATION_CODE'

 **가능한 원인**:

-1. **`rollup_task_usage_hourly()`가 전혀 스케줄링되지 않음** — 사용량 / 런타임 대시보드는 파생 테이블 `task_usage_hourly`에서 읽으며, 이 테이블은 해당 함수로 채워집니다. 번들된 `pgvector/pgvector:pg17` 이미지에는 `pg_cron`이 포함되어 있지 않으며, 백엔드도 프로세스 내에서 rollup을 실행하지 않습니다. 외부 스케줄러 없이 새로 설치한 자체 호스팅에서는 이것이 기본 상태입니다.
-2. **`pg_cron`이 설치되었지만 잘못된 데이터베이스를 가리킴** — `pg_cron.database_name`의 기본값은 `postgres`입니다. Multica 데이터베이스 이름이 다르면 스케줄된 작업이 `rollup_task_usage_hourly()`를 전혀 보지 못합니다.
-3. **스케줄러는 실행되지만 rollup이 조용히 오류를 냄** — 예를 들어 cron 항목 내부의 DB 역할 / search_path가 잘못됨.
+1. **`rollup_task_usage_hourly()`가 클레임되지 않음** — 사용량 / 런타임 대시보드는 파생 테이블 `task_usage_hourly`에서 읽으며, 이 테이블은 해당 함수로 채워집니다. MUL-2957부터 백엔드는 DB 기반 스케줄러(`sys_cron_executions`)를 통해 인프로세스로 rollup을 실행합니다. 오래된 빌드, 적용되지 않은 migration `113`, 또는 레플리카가 남아있지 않은 장기간의 백엔드 중단이 있으면 최근 SUCCESS 행이 없는 테이블이 남을 수 있습니다.
+2. **`pg_cron`이 호환성 용도로 구성되었지만 잘못된 데이터베이스를 가리킴** — `pg_cron.database_name`의 기본값은 `postgres`입니다. Multica 데이터베이스 이름이 다르면 스케줄된 작업이 `rollup_task_usage_hourly()`를 전혀 보지 못합니다. 인프로세스 스케줄러는 이에 의존하지 않지만, 인프로세스 스케줄러를 제거하고 `pg_cron`에 의존한다면 DB 이름이 일치해야 합니다.
+3. **핸들러가 클레임되지만 조용히 오류를 냄** — 예: 마이그레이션이 일부만 적용되어 SQL 함수가 누락되었거나, DB 역할 / search_path가 잘못 구성됨. `sys_cron_executions`의 FAILED 감사 행을 확인하세요.

 **진단 방법**:

@@ -191,24 +191,30 @@ docker exec <container> env | grep -E 'APP_ENV|MULTICA_DEV_VERIFICATION_CODE'
 SELECT count(*) AS raw_rows FROM task_usage;
 SELECT count(*) AS hourly_rows FROM task_usage_hourly;

-- Confirm pg_cron is (or isn't) available.
-SELECT * FROM pg_available_extensions WHERE name = 'pg_cron';
-SHOW shared_preload_libraries;
-
-- If pg_cron is installed, check the schedule + last run.
-SELECT jobname, schedule, database, active FROM cron.job;
-SELECT jobname, status, return_message, start_time, end_time
-  FROM cron.job_run_details ORDER BY start_time DESC LIMIT 10;
+-- Inspect the in-process scheduler's audit log.
+SELECT plan_time, status, attempt, runner_id,
+       error_code, error_msg, started_at, finished_at
+  FROM sys_cron_executions
+ WHERE job_name = 'rollup_task_usage_hourly'
+ ORDER BY plan_time DESC
+ LIMIT 20;

 -- Watermark — if this is 1970-01-01, the rollup has never run.
 SELECT watermark_at FROM task_usage_hourly_rollup_state;
+
+-- Compatibility path: if you previously registered pg_cron, confirm
+-- it is (or isn't) available and pointing at the right database.
+SELECT * FROM pg_available_extensions WHERE name = 'pg_cron';
+SHOW shared_preload_libraries;
+SELECT jobname, schedule, database, active FROM cron.job;
 ```

 **해결 방법**:

- rollup을 수동으로 한 번 호출하여 동작하는지 확인하세요: `SELECT rollup_task_usage_hourly();` — 대시보드를 새로고침하세요. 숫자가 나타나면 빠진 것은 스케줄러뿐입니다.
- [자체 호스팅 빠른 시작 → 사용량 rollup 스케줄링](/self-host-quickstart#7-schedule-the-usage-rollup-required-for-the-usage-dashboard)에서 지원되는 방식 중 하나를 선택하세요: 외부 cron / systemd-timer / Kubernetes CronJob, 또는 Postgres를 `pg_cron`이 포함된 이미지로 교체.
- 스케줄 설정 이전의 이력이 이미 있다면, 백엔드 컨테이너 내부에서 `backfill_task_usage_hourly`를 실행하여 워터마크 이전의 버킷을 채우세요.
+- 적어도 하나의 백엔드 레플리카에서 스케줄러가 실제로 실행 중인지 확인하세요 — 30초마다 `sys_cron_executions`의 `rollup_task_usage_hourly`에 SUCCESS 행이 추가되어야 합니다.
+- SQL 경로를 검증하기 위해 rollup을 수동으로 한 번 호출하세요: `SELECT rollup_task_usage_hourly();` — 대시보드를 새로고침하세요. 숫자가 나타나면 SQL 함수는 정상이며, 문제는 스케줄러 클레임 경로에 있습니다.
+- migration `113_sys_cron_executions`가 아직 적용되지 않았다면 백엔드를 재시작해 마이그레이션을 실행하거나 수동으로 `migrate up`을 호출하세요.
+- 인프로세스 스케줄러 이전의 레거시 `pg_cron` 이력이 있어도 SQL 함수가 내부적으로 advisory lock 4246을 잡고 있어 두 경로가 이중 쓰기할 수 없습니다 — 선택적 `cron.unschedule` 정리는 [자체 호스팅 빠른 시작 → 사용량 롤업](/self-host-quickstart#7-usage-rollup-no-operator-action-required)을 참고하세요.

 ## 마이그레이션 `103`이 `refusing to drop legacy daily rollups`로 실패함

@@ -224,9 +230,11 @@ ERROR: refusing to drop legacy daily rollups:

 **가능한 원인**: 이것은 마이그레이션 `103`의 fail-closed 가드입니다. `task_usage_hourly`가 원시 `task_usage`를 따라잡을 때까지 레거시 daily rollup 삭제를 거부합니다. 기존 행이 존재하고 rollup 워터마크가 여전히 epoch에 머물러 있을 때 — 즉 아직 어떤 이력도 hourly 테이블로 rollup되지 않았을 때 — 이 가드가 발동합니다.

+MUL-2957부터 migrate 명령은 migration `103`을 적용하기 직전에 멱등한 월별 슬라이스 backfill(advisory lock 4246 보호)을 자동으로 실행하므로, v0.3.4 → v0.3.5+ 직접 업그레이드는 단일 `migrate up` 호출로 완료됩니다. 그래도 이 오류가 보인다면, MUL-2957 이전 바이너리를 사용 중이거나 훅 자체가 실패한 것입니다 — migrate 로그에서 `task_usage hourly rollup hook` 로그를 확인하세요.
+
 **해결 방법**:

-1. 같은 데이터베이스에 대해 backfill을 실행하세요(멱등하며, 중단해도 안전하고, 다시 실행해도 안전합니다):
+1. MUL-2957 이전 바이너리를 사용 중이고 바이너리를 먼저 업그레이드할 수 없다면, 같은 데이터베이스에 대해 스탠드얼론 backfill을 실행하세요(멱등하며, 중단해도 안전하고, 다시 실행해도 안전합니다):

   ```bash
   # Docker Compose
@@ -239,7 +247,7 @@ ERROR: refusing to drop legacy daily rollups:
   ```

 2. 업그레이드를 다시 실행하세요 — 백엔드 컨테이너를 재시작하는 것으로 충분하며, 마이그레이션은 시작 시 실행됩니다. 이제 가드가 최신 워터마크를 확인하고 `103`을 적용하도록 허용합니다.
-3. 워터마크가 계속 진행되도록 지속적인 rollup 스케줄(cron / `pg_cron`)을 설정하세요 — [자체 호스팅 빠른 시작 → 사용량 rollup 스케줄링](/self-host-quickstart#7-schedule-the-usage-rollup-required-for-the-usage-dashboard)을 참고하세요.
+3. 인프로세스 스케줄러가 이후 워터마크를 계속 진행시킵니다 — [자체 호스팅 빠른 시작 → 사용량 롤업](/self-host-quickstart#7-usage-rollup-no-operator-action-required)을 참고하세요.

 `--sleep-between-slices=2s`는 수년 치 이력이 있는 프로덕션 데이터베이스에서 적절한 기본값입니다. 최근 N개월만 보관하고 더 오래된 버킷을 영구히 포기해도 괜찮다면 `--months-back N --force-partial`을 사용하세요.

--- a/apps/docs/content/docs/troubleshooting.mdx
+++ b/apps/docs/content/docs/troubleshooting.mdx
@@ -180,9 +180,9 @@ Check your inbox (including spam) for the real verification code.

 **Likely causes**:

-1. **`rollup_task_usage_hourly()` is never scheduled** — the Usage / Runtime dashboards read from the derived `task_usage_hourly` table, which is populated by that function. The bundled `pgvector/pgvector:pg17` image does not include `pg_cron`, and the backend does not run the rollup in-process either. On a fresh self-host install with no external scheduler, this is the default state.
-2. **`pg_cron` is installed but pointing at the wrong database** — `pg_cron.database_name` defaults to `postgres`; if your Multica database has a different name, the scheduled job never sees `rollup_task_usage_hourly()`.
-3. **The scheduler is running but the rollup is silently erroring** — e.g. wrong DB role / search_path inside the cron entry.
+1. **`rollup_task_usage_hourly()` is never being claimed** — the Usage / Runtime dashboards read from the derived `task_usage_hourly` table, populated by that function. Since MUL-2957 the backend runs the rollup in-process via the DB-backed scheduler (`sys_cron_executions`); a stale build, a missing migration `113`, or a sustained backend outage with no replicas left running can leave the table without a recent SUCCESS row.
+2. **`pg_cron` is configured for compatibility but pointing at the wrong database** — `pg_cron.database_name` defaults to `postgres`; if your Multica database has a different name, the scheduled job never sees `rollup_task_usage_hourly()`. The in-process scheduler does not depend on this, but if you removed the in-process scheduler and rely on `pg_cron`, the DB name must match.
+3. **The handler is being claimed but silently erroring** — e.g. the SQL function is missing because migrations were partially applied, or DB role / search_path is misconfigured. Check the FAILED audit rows in `sys_cron_executions`.

 **How to diagnose**:

@@ -191,24 +191,30 @@ Check your inbox (including spam) for the real verification code.
 SELECT count(*) AS raw_rows FROM task_usage;
 SELECT count(*) AS hourly_rows FROM task_usage_hourly;

-- Confirm pg_cron is (or isn't) available.
-SELECT * FROM pg_available_extensions WHERE name = 'pg_cron';
-SHOW shared_preload_libraries;
-
-- If pg_cron is installed, check the schedule + last run.
-SELECT jobname, schedule, database, active FROM cron.job;
-SELECT jobname, status, return_message, start_time, end_time
-  FROM cron.job_run_details ORDER BY start_time DESC LIMIT 10;
+-- Inspect the in-process scheduler's audit log.
+SELECT plan_time, status, attempt, runner_id,
+       error_code, error_msg, started_at, finished_at
+  FROM sys_cron_executions
+ WHERE job_name = 'rollup_task_usage_hourly'
+ ORDER BY plan_time DESC
+ LIMIT 20;

 -- Watermark — if this is 1970-01-01, the rollup has never run.
 SELECT watermark_at FROM task_usage_hourly_rollup_state;
+
+-- Compatibility path: if you previously registered pg_cron, confirm
+-- it is (or isn't) available and pointing at the right database.
+SELECT * FROM pg_available_extensions WHERE name = 'pg_cron';
+SHOW shared_preload_libraries;
+SELECT jobname, schedule, database, active FROM cron.job;
 ```

 **How to fix**:

- Call the rollup once by hand to confirm it works: `SELECT rollup_task_usage_hourly();` — refresh the dashboard; if numbers appear, the only missing piece is a scheduler.
- Pick one of the supported paths from [Self-host quickstart → Schedule the usage rollup](/self-host-quickstart#7-schedule-the-usage-rollup-required-for-the-usage-dashboard): external cron / systemd-timer / Kubernetes CronJob, or swap Postgres for an image with `pg_cron`.
- If you already have history that pre-dates the schedule, run `backfill_task_usage_hourly` inside the backend container to seed buckets before the watermark.
+- Confirm the scheduler is actually running on at least one backend replica — every 30 seconds it should add a SUCCESS row to `sys_cron_executions` for `rollup_task_usage_hourly`.
+- Call the rollup once by hand to verify the SQL path: `SELECT rollup_task_usage_hourly();` — refresh the dashboard; if numbers appear, the SQL function is fine and the issue is on the scheduler claim path.
+- If migration `113_sys_cron_executions` has not applied yet, restart the backend so migrations run, or invoke `migrate up` manually.
+- A legacy `pg_cron` job that pre-dates the in-process scheduler is harmless: the SQL function still takes advisory lock 4246 internally, so the two paths cannot double-write. See [Self-host quickstart → Usage rollup](/self-host-quickstart#7-usage-rollup-no-operator-action-required) for the optional `cron.unschedule` cleanup.

 ## Migration `103` fails with `refusing to drop legacy daily rollups`

@@ -224,9 +230,11 @@ ERROR: refusing to drop legacy daily rollups:

 **Likely cause**: this is migration `103`'s fail-closed guard. It refuses to drop the legacy daily rollups until `task_usage_hourly` has caught up with raw `task_usage`. The guard fires whenever existing rows are present and the rollup watermark still sits at the epoch — i.e. nothing has rolled history into the hourly table yet.

+Since MUL-2957 the migrate command automatically runs an idempotent monthly-slice backfill (under advisory lock 4246) immediately before applying migration `103`, so a direct v0.3.4 → v0.3.5+ upgrade completes in a single `migrate up` invocation. If you still see this error, you are either on a pre-MUL-2957 binary or the hook itself failed; check the migrate logs for an earlier `task_usage hourly rollup hook` line.
+
 **How to fix**:

-1. Run the backfill against the same database (idempotent, safe to interrupt, safe to re-run):
+1. If you are on a pre-MUL-2957 binary and cannot upgrade the binary first, run the standalone backfill against the same database (idempotent, safe to interrupt, safe to re-run):

   ```bash
   # Docker Compose
@@ -239,7 +247,7 @@ ERROR: refusing to drop legacy daily rollups:
   ```

 2. Re-run the upgrade — restarting the backend container is enough, migrations run on startup. The guard now sees a current watermark and lets `103` apply.
-3. Set up an ongoing rollup schedule (cron / `pg_cron`) so the watermark keeps advancing — see [Self-host quickstart → Schedule the usage rollup](/self-host-quickstart#7-schedule-the-usage-rollup-required-for-the-usage-dashboard).
+3. The in-process scheduler then keeps the watermark advancing — see [Self-host quickstart → Usage rollup](/self-host-quickstart#7-usage-rollup-no-operator-action-required).

 `--sleep-between-slices=2s` is a polite default on production databases with years of history. Use `--months-back N --force-partial` if you only want to keep the last N months and are willing to permanently abandon older buckets.
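
The scheduler check described in the troubleshooting text above (a SUCCESS row for `rollup_task_usage_hourly` roughly every 30 seconds in `sys_cron_executions`) could be automated along these lines. This is only a sketch: the `CronExecutionRow` shape reuses the `plan_time` and `status` column names from the diagnostic query, while `classifyRollupHealth`, the `RollupHealth` labels, and the five-minute staleness threshold are assumptions made for illustration, not part of the project.

```typescript
// Subset of the sys_cron_executions columns used by the diagnostic query.
interface CronExecutionRow {
  plan_time: string; // ISO-8601 timestamp of the planned run
  status: string; // the docs mention "SUCCESS" and "FAILED"; other values may exist
}

// Hypothetical labels for this sketch.
type RollupHealth = "healthy" | "stale" | "never_succeeded";

// Classifies scheduler health from audit rows for a single job.
// maxGapMs is an assumed tolerance, deliberately looser than the 30 s cadence.
function classifyRollupHealth(
  rows: readonly CronExecutionRow[],
  now: Date,
  maxGapMs: number = 5 * 60 * 1000,
): RollupHealth {
  let latestSuccess = Number.NEGATIVE_INFINITY;
  for (const row of rows) {
    if (row.status !== "SUCCESS") continue;
    const planned = Date.parse(row.plan_time);
    if (!Number.isNaN(planned) && planned > latestSuccess) {
      latestSuccess = planned;
    }
  }
  if (latestSuccess === Number.NEGATIVE_INFINITY) return "never_succeeded";
  return now.getTime() - latestSuccess <= maxGapMs ? "healthy" : "stale";
}
```

Feeding it the rows returned by the audit-log query separates "the job has never succeeded" (check migrations and FAILED rows) from "the job used to succeed but has stopped" (check that a backend replica is running).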

--- a/apps/docs/content/docs/troubleshooting.zh.mdx
+++ b/apps/docs/content/docs/troubleshooting.zh.mdx
@@ -180,9 +180,9 @@ docker exec <container> env | grep -E 'APP_ENV|MULTICA_DEV_VERIFICATION_CODE'

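 **Likely causes**:

-1. **Nothing is scheduling `rollup_task_usage_hourly()`**: the Usage / Runtime dashboards read from the derived `task_usage_hourly` table, which must be populated periodically by `rollup_task_usage_hourly()`. The default `pgvector/pgvector:pg17` image does not ship `pg_cron`, and the backend does not run the rollup in-process either. A fresh self-hosted install with no external scheduler configured is in this state by default.
-2. **`pg_cron` is installed but pointing at the wrong database**: `pg_cron.database_name` defaults to `postgres`; if your Multica database has a different name, the scheduled job never sees `rollup_task_usage_hourly()`.
-3. **The schedule runs, but the rollup fails silently**: for example, the DB role / search_path in the cron entry is wrong.
+1. **`rollup_task_usage_hourly()` is never being claimed**: the Usage / Runtime dashboards read from the derived `task_usage_hourly` table, which must be populated periodically by `rollup_task_usage_hourly()`. Since MUL-2957 the backend runs the rollup in-process via the DB-backed scheduler (`sys_cron_executions`); an old binary, an unapplied migration `113`, or all replicas being offline for an extended period can leave this table without a recent SUCCESS row.
+2. **`pg_cron` is configured as a compatibility path but pointing at the wrong database**: `pg_cron.database_name` defaults to `postgres`; if your Multica database has a different name, the scheduled job never sees `rollup_task_usage_hourly()`. The in-process scheduler does not depend on this, but if you deliberately removed the in-process scheduler and rely on `pg_cron`, the DB name must match.
+3. **The handler is being claimed but silently erroring**: for example, the SQL function is missing because migrations were not fully applied, or the DB role / search_path is misconfigured. Check the FAILED audit rows in `sys_cron_executions`.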

 **How to diagnose**:

@@ -191,24 +191,29 @@ docker exec <container> env | grep -E 'APP_ENV|MULTICA_DEV_VERIFICATION_CODE'
 SELECT count(*) AS raw_rows FROM task_usage;
 SELECT count(*) AS hourly_rows FROM task_usage_hourly;

-- Check whether pg_cron is installed and loaded.
-SELECT * FROM pg_available_extensions WHERE name = 'pg_cron';
-SHOW shared_preload_libraries;
-
-- If pg_cron is installed, check the schedule and the most recent run.
-SELECT jobname, schedule, database, active FROM cron.job;
-SELECT jobname, status, return_message, start_time, end_time
-  FROM cron.job_run_details ORDER BY start_time DESC LIMIT 10;
+-- Inspect the in-process scheduler's audit log.
+SELECT plan_time, status, attempt, runner_id,
+       error_code, error_msg, started_at, finished_at
+  FROM sys_cron_executions
+ WHERE job_name = 'rollup_task_usage_hourly'
+ ORDER BY plan_time DESC
+ LIMIT 20;

 -- Watermark: if this is still 1970-01-01, the rollup has never run.
 SELECT watermark_at FROM task_usage_hourly_rollup_state;
+
+-- Compatibility path: if you previously registered pg_cron, confirm it is installed and pointing at the right database.
+SELECT * FROM pg_available_extensions WHERE name = 'pg_cron';
+SHOW shared_preload_libraries;
+SELECT jobname, schedule, database, active FROM cron.job;
 ```

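 **How to fix**:

- Run it once by hand to confirm the function itself works: `SELECT rollup_task_usage_hourly();`. Refresh the dashboard; if numbers appear, the only missing piece is a scheduler.
- Pick one of the scheduling options from [Self-host quickstart → Schedule the usage rollup](/self-host-quickstart#7-调度用量汇总任务usage-dashboard-必需): external cron / systemd-timer / Kubernetes CronJob, or switch to a Postgres image with `pg_cron`.
- If the database already has history from before the schedule was set up, first run `backfill_task_usage_hourly` inside the backend container to fill in the buckets before the watermark.
+- Confirm the scheduler is actually running on at least one backend replica: every 30 seconds it should add a SUCCESS row to `sys_cron_executions` for `rollup_task_usage_hourly`.
+- Run the SQL once by hand to verify the function itself works: `SELECT rollup_task_usage_hourly();`. Refresh the dashboard; if numbers appear, the SQL layer is fine and the problem is on the scheduler claim path.
+- If migration `113_sys_cron_executions` has not been applied yet, restart the backend so migrations run, or run `migrate up` manually.
+- A leftover `pg_cron` entry in your history is harmless: the SQL function still holds advisory lock 4246, so the application scheduler and `pg_cron` cannot double-write. To clean up the redundant entry, see `cron.unschedule` in [Self-host quickstart → Usage rollup](/self-host-quickstart#7-usage-rollup-no-operator-action-required).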

 ## Migration `103` fails with `refusing to drop legacy daily rollups`

@@ -224,9 +229,11 @@ ERROR: refusing to drop legacy daily rollups:

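 **Likely cause**: this is migration `103`'s fail-closed guard. It requires `task_usage_hourly` to have caught up with raw `task_usage` before it allows the legacy daily rollups to be dropped. The guard fires whenever the database has historical data and the rollup watermark still sits at the epoch (meaning history has not yet been backfilled into the hourly table).

+Since MUL-2957, the migrate command automatically runs an idempotent monthly-slice backfill (protected by advisory lock 4246) before applying migration `103`, so a direct v0.3.4 → v0.3.5+ upgrade completes in a single `migrate up`. If you still see this error, either you are on a pre-MUL-2957 binary or the hook itself failed; check the migrate logs for an earlier `task_usage hourly rollup hook` line for the specific cause.
+
 **How to fix**:

-1. Run a backfill against the same database (idempotent, safe to interrupt, safe to retry):
+1. If you are running a pre-MUL-2957 binary and cannot upgrade the binary first, manually run a standalone backfill against the same database (idempotent, safe to interrupt, safe to retry):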

   ```bash
   # Docker Compose
@@ -239,7 +246,7 @@ ERROR: refusing to drop legacy daily rollups:
   ```

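 2. Re-run the upgrade: restarting the backend container is enough, since migrations run automatically on startup. The guard sees the new watermark and `103` goes through.
-3. Also set up an ongoing rollup schedule so the watermark keeps advancing; see [Self-host quickstart → Schedule the usage rollup](/self-host-quickstart#7-调度用量汇总任务usage-dashboard-必需).
+3. From then on the in-process scheduler keeps the watermark advancing; see [Self-host quickstart → Usage rollup](/self-host-quickstart#7-usage-rollup-no-operator-action-required).

 `--sleep-between-slices=2s` is a fairly conservative default on production databases with years of history. If you only want to keep the last N months and can accept permanently losing older buckets, use `--months-back N --force-partial`.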

--- a/apps/mobile/app/(app)/[workspace]/issue/[id]/runs.tsx
+++ b/apps/mobile/app/(app)/[workspace]/issue/[id]/runs.tsx
@@ -1,7 +1,7 @@
 /**
 * Agent Runs sheet — presented as a formSheet by the parent Stack. Two
 * sections: Active (queued/dispatched/running, created_at desc) and Past
- * (failed → cancelled → completed, completed_at desc within each). Empty
+ * (completed_at desc, status rank as tiebreaker). Empty
 * sections hide entirely.
 *
 * Both entry points (the in-card AgentActivityRow and the Stack-header
@@ -58,9 +58,9 @@ export default function IssueRunsRoute() {
        t.status === "cancelled",
    );
    return filtered.sort((a, b) => {
-      const ord = PAST_STATUS_ORDER[a.status] - PAST_STATUS_ORDER[b.status];
-      if (ord !== 0) return ord;
-      return (b.completed_at ?? "").localeCompare(a.completed_at ?? "");
+      const timeDiff = (b.completed_at ?? "").localeCompare(a.completed_at ?? "");
+      if (timeDiff !== 0) return timeDiff;
+      return PAST_STATUS_ORDER[a.status] - PAST_STATUS_ORDER[b.status];
    });
  }, [allTasks]);
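
The effect of the reordered comparator in this hunk can be seen in isolation with a small standalone sketch. The `PastRun` type and the `STATUS_RANK` values below are assumptions for illustration (the real `PAST_STATUS_ORDER` is not shown in this diff); only the comparison logic mirrors the added lines.

```typescript
// Hypothetical shapes for illustration; the real task type lives in @multica/core.
type PastStatus = "failed" | "cancelled" | "completed";

interface PastRun {
  id: string;
  status: PastStatus;
  completed_at: string | null; // ISO-8601 UTC timestamp, or null if missing
}

// Assumed ranks; the actual PAST_STATUS_ORDER values are not visible in the diff.
const STATUS_RANK: Record<PastStatus, number> = {
  failed: 0,
  cancelled: 1,
  completed: 2,
};

// Newest completed_at first; status rank only breaks exact timestamp ties.
// Copies the input so the caller's array is not mutated.
function sortPastRuns(runs: readonly PastRun[]): PastRun[] {
  return [...runs].sort((a, b) => {
    const timeDiff = (b.completed_at ?? "").localeCompare(a.completed_at ?? "");
    if (timeDiff !== 0) return timeDiff;
    return STATUS_RANK[a.status] - STATUS_RANK[b.status];
  });
}

const sortedPastRuns = sortPastRuns([
  { id: "d", status: "failed", completed_at: null },
  { id: "c", status: "cancelled", completed_at: "2024-05-02T10:00:00Z" },
  { id: "a", status: "completed", completed_at: "2024-05-03T08:00:00Z" },
  { id: "b", status: "failed", completed_at: "2024-05-02T10:00:00Z" },
]);
```

With these sample rows the most recent run (`a`) comes first even though it is `completed`; under the previous status-first ordering the `failed` runs would have been listed ahead of it. Rows with a missing `completed_at` compare as the empty string and therefore sort last.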

--- a/apps/mobile/app/(app)/[workspace]/switch-workspace.tsx
+++ b/apps/mobile/app/(app)/[workspace]/switch-workspace.tsx
@@ -30,6 +30,7 @@ import { router } from "expo-router";
 import { useQuery } from "@tanstack/react-query";
 import type { Workspace } from "@multica/core/types";
 import { Text } from "@/components/ui/text";
+import { WorkspaceAvatar } from "@/components/workspace/workspace-avatar";
 import { workspaceListOptions } from "@/data/queries/workspaces";
 import { useWorkspaceStore } from "@/data/workspace-store";
 import { useColorScheme } from "@/lib/use-color-scheme";
@@ -45,12 +46,12 @@ export default function SwitchWorkspaceRoute() {
  const onSelect = (ws: Workspace) => {
    if (ws.slug === activeSlug) return;
    Alert.alert(
-      "切换工作区",
-      `确定切换到 "${ws.name}"?`,
+      "Switch workspace",
+      `Switch to "${ws.name}"?`,
      [
-        { text: "取消", style: "cancel" },
+        { text: "Cancel", style: "cancel" },
        {
-          text: "切换",
+          text: "Switch",
          onPress: () => {
            router.dismiss();
            router.replace(`/${ws.slug}/inbox`);
@@ -64,7 +65,7 @@ export default function SwitchWorkspaceRoute() {
    <View className="flex-1">
      <View className="px-4 pt-4 pb-3">
        <Text className="text-base font-semibold text-foreground">
-          切换工作区
+          Switch workspace
        </Text>
      </View>
      {isLoading ? (
@@ -80,7 +81,6 @@ export default function SwitchWorkspaceRoute() {
              active={ws.slug === activeSlug}
              onPress={() => onSelect(ws)}
              iconTint={t.foreground}
-              mutedIconTint={t.mutedForeground}
            />
          ))}
        </ScrollView>
@@ -94,13 +94,11 @@ function WorkspaceRow({
  active,
  onPress,
  iconTint,
-  mutedIconTint,
 }: {
  workspace: Workspace;
  active: boolean;
  onPress: () => void;
  iconTint: string;
-  mutedIconTint: string;
 }) {
  return (
    <Pressable
@@ -108,18 +106,18 @@ function WorkspaceRow({
      disabled={active}
      accessibilityLabel={
        active
-          ? `${workspace.name}, 当前工作区`
-          : `切换到 ${workspace.name}`
+          ? `${workspace.name}, current workspace`
+          : `Switch to ${workspace.name}`
      }
      className={cn(
        "flex-row items-center gap-3 px-4 py-3 active:bg-secondary",
        active && "opacity-100",
      )}
    >
-      <ExpoImage
-        source="sf:building.2"
-        tintColor={active ? iconTint : mutedIconTint}
-        style={{ width: 18, height: 18 }}
+      <WorkspaceAvatar
+        name={workspace.name}
+        avatarUrl={workspace.avatar_url}
+        size={24}
      />
      <Text
        className={cn(
--- a/apps/mobile/components/inbox/detail-label.tsx
+++ b/apps/mobile/components/inbox/detail-label.tsx
@@ -17,6 +17,7 @@ import type {
  IssueStatus,
  IssuePriority,
 } from "@multica/core/types";
+import { formatDateOnly } from "@multica/core/issues/date";
 import { Text } from "@/components/ui/text";
 import { StatusIcon } from "@/components/ui/status-icon";
 import { PriorityIcon } from "@/components/ui/priority-icon";
@@ -46,6 +47,7 @@ const PRIORITY_LABEL: Record<IssuePriority, string> = {
 // Mirrors useTypeLabels in packages/views/inbox/components/inbox-detail-label.tsx
 const TYPE_LABEL: Record<InboxItemType, string> = {
  issue_assigned: "Assigned",
+  issue_subscribed: "Subscribed",
  unassigned: "Unassigned",
  assignee_changed: "Reassigned",
  status_changed: "Status changed",
@@ -64,12 +66,9 @@ const TYPE_LABEL: Record<InboxItemType, string> = {
  quick_create_failed: "Quick-create failed",
 };

+// due_date is a calendar day — format timezone-safely (no offset day shift).
 function shortDate(dateStr: string): string {
-  if (!dateStr) return "";
-  return new Date(dateStr).toLocaleDateString("en-US", {
-    month: "short",
-    day: "numeric",
-  });
+  return formatDateOnly(dateStr, { month: "short", day: "numeric" }, "en-US");
 }

 function singleLine(value: string | null | undefined): string {
--- a/apps/mobile/components/issue/attribute-row.tsx
+++ b/apps/mobile/components/issue/attribute-row.tsx
@@ -23,6 +23,7 @@ import type {
  Issue,
  IssuePriority,
 } from "@multica/core/types";
+import { formatDateOnly } from "@multica/core/issues/date";
 import { Text } from "@/components/ui/text";
 import { StatusIcon } from "@/components/ui/status-icon";
 import { PriorityIcon } from "@/components/ui/priority-icon";
@@ -67,11 +68,11 @@ const ISSUE_PICKER_PATHNAMES = {
  "due-date": "/[workspace]/issue/[id]/picker/due-date",
 } as const satisfies Record<IssuePickerField, string>;

+// due_date is a calendar day — format timezone-safely so the day never shifts
+// with the viewer's offset. Mirrors web's formatDate in list-row/board-card.
 function formatDueDate(iso: string | null): string | null {
  if (!iso) return null;
-  const d = new Date(iso);
-  if (Number.isNaN(d.getTime())) return null;
-  return d.toLocaleDateString("en-US", { month: "short", day: "numeric" });
+  return formatDateOnly(iso, { month: "short", day: "numeric" }, "en-US") || null;
 }

 export function AttributeRow({ issue }: { issue: Issue }) {
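
The comments added above say that `due_date` is a calendar day and must not shift with the viewer's offset. The implementation of `formatDateOnly` in `@multica/core/issues/date` is not part of this diff, so the following is only a sketch of one way such a helper might work: `formatDateOnlySketch` is a hypothetical name, and returning an empty string for unparseable input is an assumption inferred from the `|| null` fallback at the call site.

```typescript
// Hypothetical stand-in for formatDateOnly; not the real implementation.
// Reads only the YYYY-MM-DD prefix and formats it in UTC, so the rendered
// day does not depend on the viewer's timezone offset.
function formatDateOnlySketch(
  value: string | null | undefined,
  options: Intl.DateTimeFormatOptions,
  locale: string,
): string {
  if (!value) return "";
  const match = /^(\d{4})-(\d{2})-(\d{2})/.exec(value);
  if (!match) return "";
  const year = Number(match[1]);
  const month = Number(match[2]);
  const day = Number(match[3]);
  const utc = new Date(Date.UTC(year, month - 1, day));
  // Reject values that Date silently normalizes, such as 2024-02-31.
  if (
    utc.getUTCFullYear() !== year ||
    utc.getUTCMonth() !== month - 1 ||
    utc.getUTCDate() !== day
  ) {
    return "";
  }
  return new Intl.DateTimeFormat(locale, { ...options, timeZone: "UTC" }).format(utc);
}

const dueLabel = formatDateOnlySketch(
  "2024-03-01T00:00:00Z",
  { month: "short", day: "numeric" },
  "en-US",
);
```

The removed code path, `new Date(iso).toLocaleDateString(...)`, formats in the device's local timezone, so a midnight-UTC due date can render as the previous day for viewers west of UTC. Pinning `timeZone: "UTC"` on a date built from the calendar components avoids that shift.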
--- a/apps/mobile/components/issue/comment-attachment-list.tsx
+++ b/apps/mobile/components/issue/comment-attachment-list.tsx
@@ -26,6 +26,7 @@ import { Linking, Pressable, View } from "react-native";
 import { Ionicons } from "@expo/vector-icons";
 import type { Attachment } from "@multica/core/types";
 import { MarkdownImage } from "@/lib/markdown/markdown-image";
+import { resolveAttachmentUrl } from "@/lib/attachment-url";
 import { useColorScheme } from "@/lib/use-color-scheme";
 import { THEME } from "@/lib/theme";
 import { Text } from "@/components/ui/text";
@@ -108,12 +109,19 @@ function FileCard({
  return (
    <Pressable
      onPress={() => {
-        // download_url is the signed HTTPS link; opening it hands off to
+        // download_url is the canonical link — opening it hands off to
        // Safari which handles auth-token-free download + previewing for
        // common types (PDF, txt). Mirrors what the markdown link renderer
        // does for `[name](url)`.
-        if (attachment.download_url) {
-          void Linking.openURL(attachment.download_url);
+        //
+        // The backend may return a server-relative URL like
+        // `/api/attachments/{id}/download` when no CloudFront signer is
+        // configured (MUL-2976). RN's `Linking.openURL` requires an
+        // absolute http(s) URL — it returns "Cannot open URL" otherwise —
+        // so resolve against `EXPO_PUBLIC_API_URL` first.
+        const target = resolveAttachmentUrl(attachment.download_url);
+        if (target) {
+          void Linking.openURL(target);
        }
      }}
      accessibilityRole="button"
--- a/apps/mobile/components/issue/composer-attachment-row.tsx
+++ b/apps/mobile/components/issue/composer-attachment-row.tsx
@@ -28,6 +28,7 @@
 import { useMemo } from "react";
 import { ActivityIndicator, Linking, Pressable, ScrollView, View } from "react-native";
 import { Ionicons } from "@expo/vector-icons";
+import { resolveAttachmentUrl } from "@/lib/attachment-url";
 import { useLightbox } from "@/lib/markdown/lightbox-provider";
 import { useColorScheme } from "@/lib/use-color-scheme";
 import { THEME } from "@/lib/theme";
@@ -193,8 +194,17 @@ function AttachmentChipView({ item, onRemove, onRetry }: AttachmentChipProps) {
      // Prefer the local on-device file over the network URL — instant,
      // no signed-URL round-trip, works the same pre/post upload.
      open(item.localUri);
-    } else if (item.downloadUrl) {
-      void Linking.openURL(item.downloadUrl);
+    } else {
+      // Non-image file chip: open the canonical download URL in Safari.
+      // `downloadUrl` comes from `api.uploadFile(...).download_url`, which
+      // on non-CloudFront deployments is a server-relative path like
+      // `/api/attachments/{id}/download` (MUL-2976). RN's `Linking.openURL`
+      // requires an absolute http(s) URL — `Cannot open URL` otherwise — so
+      // resolve against `EXPO_PUBLIC_API_URL` first. Already-absolute
+      // CloudFront/presigned URLs pass through unchanged. `null` (no
+      // downloadUrl yet) falls through to a no-op.
+      const target = resolveAttachmentUrl(item.downloadUrl);
+      if (target) void Linking.openURL(target);
    }
  };
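
Both attachment hunks rely on `resolveAttachmentUrl` from `@/lib/attachment-url`, whose body is not shown in this diff. A minimal sketch of the behaviour the comments describe might look like the following; `resolveAttachmentUrlSketch` is a hypothetical name, and passing the base URL as a parameter (rather than reading `EXPO_PUBLIC_API_URL` directly) is a choice made here to keep the example self-contained. It also assumes a standards-compliant `URL` implementation is available in the runtime.

```typescript
// Hypothetical stand-in for resolveAttachmentUrl; not the real implementation.
// Returns an absolute http(s) URL, or null when nothing can be opened.
function resolveAttachmentUrlSketch(
  url: string | null | undefined,
  apiBaseUrl: string,
): string | null {
  if (!url) return null;
  // Already-absolute CloudFront / presigned links pass through unchanged.
  if (/^https?:\/\//i.test(url)) return url;
  try {
    // Server-relative paths are resolved against the API origin.
    const resolved = new URL(url, apiBaseUrl);
    if (resolved.protocol !== "http:" && resolved.protocol !== "https:") {
      return null;
    }
    return resolved.toString();
  } catch {
    // An invalid base or path leaves nothing safe to open.
    return null;
  }
}

const attachmentTarget = resolveAttachmentUrlSketch(
  "/api/attachments/abc/download",
  "https://api.example.com",
);
```

Returning `null` rather than throwing lets the call sites keep the simple `if (target) void Linking.openURL(target);` guard shown in the diff.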
