docs(self-host): document task_usage_hourly rollup requirement (MUL-2682)

The Usage / Runtime dashboards read from `task_usage_hourly`, but the default self-host stack does not schedule `rollup_task_usage_hourly()` anywhere: the bundled pgvector/pgvector:pg17 image ships without pg_cron, and the backend does not run the rollup in-process. Fresh installs see the dashboard stay at zero forever (#3244), and upgrades from v0.3.4 → v0.3.5+ are blocked by migration 103's fail-closed guard (#3015).

Document the three supported paths (external cron / systemd-timer / CronJob, Postgres with pg_cron, or backfill_task_usage_hourly for upgrades) across SELF_HOSTING.md, SELF_HOSTING_ADVANCED.md, and the quickstart pages on the docs site, and add troubleshooting entries for both the silent-zero and the migration-guard failure modes.

Co-authored-by: multica-agent <github@multica.ai>
2026-06-28 10:02:36 +02:00 · 2026-05-26 16:01:27 +08:00
7 changed files with 394 additions and 2 deletions
--- a/SELF_HOSTING.md
+++ b/SELF_HOSTING.md
@@ -298,6 +298,8 @@ To roll back if an upgrade goes sideways:
 helm -n multica rollback multica
 ```

+> **Upgrading from `v0.3.4` to `v0.3.5+` fails with `refusing to drop legacy daily rollups: ...`?** Same migration guard as the Docker path — see [Usage Dashboard Rollup → Option C](#option-c--backfill-history-first-then-schedule). Run the backfill against the same database the chart is using (`kubectl -n multica exec deploy/multica-backend -- ./backfill_task_usage_hourly --sleep-between-slices=2s`), then restart the backend deployment to re-apply migrations.
+
 ### Tearing down

 ```bash
@@ -310,6 +312,57 @@ kubectl delete namespace multica

 ---

+## Usage Dashboard Rollup (Required)
+
+Starting with `v0.3.5`, the Usage / Runtime dashboards read from a derived `task_usage_hourly` table rather than directly from `task_usage`. Raw `task_usage` rows are written by the backend on every task, but the dashboard only sees data after `rollup_task_usage_hourly()` runs and aggregates them into `task_usage_hourly`.
+
+**The bundled `pgvector/pgvector:pg17` image does NOT include `pg_cron`.** If nothing schedules the rollup, the dashboard will stay at zero forever even though `task_usage` is populated. You have three supported options — pick one before relying on the dashboard.
+
+> **Upgrading from `v0.3.4` to `v0.3.5+`** with existing `task_usage` history: migration `103` is fail-closed and will abort `migrate up` with `refusing to drop legacy daily rollups: …`. Run `backfill_task_usage_hourly` first (Option C below), then re-run the upgrade. **Fresh installs** are exempted by that guard and migrate cleanly — but the dashboard will still stay at zero until you pick Option A or Option B.
+
+### Option A — External cron / systemd-timer (simplest)
+
+Schedule a 5-minute job that calls `rollup_task_usage_hourly()`. It is idempotent and watermark-driven, so a missed tick catches up on the next run.
+
+```bash
+# /etc/cron.d/multica-rollup — every 5 minutes
+*/5 * * * * root docker compose -f /path/to/multica/docker-compose.selfhost.yml \
+  exec -T postgres psql -U multica -d multica \
+  -c "SELECT rollup_task_usage_hourly();" >/dev/null
+```
+
+Or as a systemd timer + service if you prefer that surface. The function returns the number of (upserted + deleted-empty) rows; it's safe to call concurrently with itself (an advisory lock makes overlapping runs no-op) and safe to call alongside `backfill_task_usage_hourly`.
+
+### Option B — Swap Postgres for an image that ships `pg_cron`
+
+If you'd rather have Postgres schedule itself, replace `pgvector/pgvector:pg17` in `docker-compose.selfhost.yml` with an image that bundles both `pgvector` and `pg_cron` (e.g. `supabase/postgres`, or your own build of `pgvector/pgvector` with `pg_cron` added and `shared_preload_libraries=pg_cron` set on the server). Then, once:
+
+```sql
+CREATE EXTENSION IF NOT EXISTS pg_cron;
+SELECT cron.schedule(
+  'rollup_task_usage_hourly',
+  '*/5 * * * *',
+  $$SELECT rollup_task_usage_hourly()$$
+);
+```
+
+`shared_preload_libraries` requires a Postgres restart to take effect — set it in `postgresql.conf` (or via the image's documented mechanism) before bringing the container up.
+
+### Option C — Backfill history first, then schedule
+
+If you're upgrading from `v0.3.4` to `v0.3.5+` and already have `task_usage` rows (or you want the dashboard to show historical data on an install that accumulated usage before any rollup was scheduled), run the bundled backfill command once before scheduling the rollup:
+
+```bash
+# Backfills task_usage_hourly from all historical task_usage rows and stamps
+# the rollup watermark. Idempotent — safe to re-run.
+docker compose -f docker-compose.selfhost.yml exec backend \
+  ./backfill_task_usage_hourly --sleep-between-slices=2s
+```
+
+On a database with years of data this can scan tens of millions of rows; `--sleep-between-slices=2s` throttles the read pressure. Use `--months-back N` (plus `--force-partial`) if you only want the last N months. Once it finishes, set up Option A or Option B so new buckets keep flowing.
+
+After the backfill completes, re-run `migrate up` (or restart the backend container; migrations run automatically on startup) so that migration `103` applies cleanly.
+
 ## Stopping Services

 If you installed via the install script:
@@ -350,6 +403,8 @@ docker compose -f docker-compose.selfhost.yml up -d
 Pin `MULTICA_IMAGE_TAG` in `.env` to an exact version like `v0.2.4` if you want to stay on a specific release. Migrations run automatically on backend startup.
 If the selected GHCR tag has not been published yet, fall back to `make selfhost-build` or `docker compose -f docker-compose.selfhost.yml -f docker-compose.selfhost.build.yml up -d --build`.

+> **Upgrading from `v0.3.4` to `v0.3.5+` fails with `refusing to drop legacy daily rollups: ...`?** That's migration `103`'s fail-closed guard: it requires `task_usage_hourly` to be seeded before the legacy daily rollups are dropped. Run `backfill_task_usage_hourly` first, then re-run the upgrade. Full instructions in [Usage Dashboard Rollup → Option C](#option-c--backfill-history-first-then-schedule).
+
 ---

 ## Manual Docker Compose Setup
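Option A in `SELF_HOSTING.md` mentions a systemd timer + service as an alternative to the crontab entry without showing one. The following is a minimal sketch of what that pair might look like, not a shipped artifact: the unit names (`multica-rollup.service`, `multica-rollup.timer`), the helper name `write_rollup_units`, and the `/usr/bin/docker` path are all assumptions made for illustration. The `psql` invocation is the same one the crontab example uses.

```shell
#!/usr/bin/env bash
# Sketch: generate a systemd service + timer that run the rollup every 5 minutes.
# Unit names, helper name, and the docker binary path are hypothetical.
set -eu

# write_rollup_units <unit_dir> <compose_file>
# Writes both unit files into unit_dir (e.g. /etc/systemd/system).
write_rollup_units() {
  unit_dir="$1"
  compose_file="$2"

  # One-shot service: same psql call as the cron example.
  cat >"$unit_dir/multica-rollup.service" <<EOF
[Unit]
Description=Multica task_usage_hourly rollup

[Service]
Type=oneshot
ExecStart=/usr/bin/docker compose -f $compose_file exec -T postgres psql -U multica -d multica -c "SELECT rollup_task_usage_hourly();"
EOF

  # Timer: fire at minutes 0, 5, 10, ... of every hour.
  cat >"$unit_dir/multica-rollup.timer" <<EOF
[Unit]
Description=Run the Multica usage rollup every 5 minutes

[Timer]
OnCalendar=*:0/5
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

On a real host this would be followed by `write_rollup_units /etc/systemd/system /path/to/multica/docker-compose.selfhost.yml`, then `systemctl daemon-reload` and `systemctl enable --now multica-rollup.timer`. Because the rollup is described as idempotent and watermark-driven, `Persistent=true` is a convenience rather than a requirement: a tick missed during downtime is caught up by the next run either way.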
--- a/SELF_HOSTING_ADVANCED.md
+++ b/SELF_HOSTING_ADVANCED.md
@@ -166,6 +166,111 @@ The Docker Compose setup runs migrations automatically. If you need to run them
 cd server && go run ./cmd/migrate up
 ```

+## Usage Dashboard Rollup
+
+The Usage and Runtime dashboards read from `task_usage_hourly`, a derived table populated by `rollup_task_usage_hourly()`. The function is **not** scheduled out of the box on the default self-host stack: the bundled `pgvector/pgvector:pg17` image ships without `pg_cron`, and the backend does not run the rollup in-process either. Until something calls it on a schedule, raw `task_usage` rows will keep arriving while the dashboard stays at zero.
+
+Pick one of the supported paths:
+
+### Option A — External cron / systemd-timer
+
+The simplest path. Schedule `SELECT rollup_task_usage_hourly()` every five minutes from any out-of-band timer (host crontab, systemd timer, sidecar container, Kubernetes CronJob). It is idempotent and watermark-driven: an internal advisory lock turns overlapping runs into no-ops, and a missed tick catches up on the next run.
+
+Docker Compose:
+
+```bash
+# /etc/cron.d/multica-rollup
+*/5 * * * * root docker compose -f /path/to/multica/docker-compose.selfhost.yml \
+  exec -T postgres psql -U multica -d multica \
+  -c "SELECT rollup_task_usage_hourly();" >/dev/null
+```
+
+Kubernetes (`CronJob`):
+
+```yaml
+apiVersion: batch/v1
+kind: CronJob
+metadata:
+  name: multica-usage-rollup
+spec:
+  schedule: "*/5 * * * *"
+  concurrencyPolicy: Forbid
+  jobTemplate:
+    spec:
+      template:
+        spec:
+          restartPolicy: OnFailure
+          containers:
+            - name: psql
+              image: postgres:17-alpine
+              command:
+                - psql
+                - "$(DATABASE_URL)"
+                - -c
+                - "SELECT rollup_task_usage_hourly();"
+              env:
+                - name: DATABASE_URL
+                  valueFrom:
+                    secretKeyRef:
+                      name: multica-secrets
+                      key: DATABASE_URL
+```
+
+### Option B — Postgres with `pg_cron`
+
+If you'd rather have Postgres schedule itself, swap the bundled image for one that ships both `pgvector` and `pg_cron` (e.g. `supabase/postgres`, or a custom build of `pgvector/pgvector` with `pg_cron` added). `pg_cron` requires `shared_preload_libraries=pg_cron` in `postgresql.conf`, which only takes effect on Postgres restart — set it before bringing the container up.
+
+Then register the job once:
+
+```sql
+CREATE EXTENSION IF NOT EXISTS pg_cron;
+SELECT cron.schedule(
+  'rollup_task_usage_hourly',
+  '*/5 * * * *',
+  $$SELECT rollup_task_usage_hourly()$$
+);
+```
+
+`cron.database_name` defaults to `postgres`; if your Multica database has a different name, point `pg_cron` at it via that GUC or run `cron.schedule_in_database(...)` instead.
+
+### Option C — Backfill historical data first
+
+`rollup_task_usage_hourly()` only processes new buckets after it starts running. If you already have `task_usage` rows from before the rollup was scheduled — most commonly when upgrading from `v0.3.4` to `v0.3.5+`, or on a fresh install that has been collecting usage for a while — run `backfill_task_usage_hourly` once to seed historical buckets, then set up Option A or Option B for ongoing rollups.
+
+```bash
+# Docker Compose
+docker compose -f docker-compose.selfhost.yml exec backend \
+  ./backfill_task_usage_hourly --sleep-between-slices=2s
+
+# Kubernetes
+kubectl -n multica exec deploy/multica-backend -- \
+  ./backfill_task_usage_hourly --sleep-between-slices=2s
+```
+
+The command walks `task_usage`'s full time range in monthly slices and calls the same idempotent primitive the cron path uses, so it's safe to re-run, to interrupt with Ctrl-C, and to run concurrently with an already-scheduled rollup. Flags:
+
+| Flag | Description |
+|---|---|
+| `--sleep-between-slices` | Pause between monthly slices to throttle read pressure on busy databases (e.g. `2s`). Recommended on production DBs with years of history. |
+| `--months-back N` | Only backfill the last N months. **Requires `--force-partial`** because the watermark still advances past the skipped older buckets — those are permanently abandoned. |
+| `--dry-run` | Log slices that would be processed without writing anything. |
+
+After backfill completes, the rollup-state watermark is stamped to `now() - 5 minutes`, so the first scheduled tick after backfill does not redo history.
+
+### `v0.3.4 → v0.3.5+` upgrade order
+
+Migration `103` adds a fail-closed guard that refuses to drop the legacy daily rollups until `task_usage_hourly` has caught up. If you run `migrate up` straight through on a database with existing `task_usage` rows, it aborts with:
+
+```text
+ERROR: refusing to drop legacy daily rollups:
+  task_usage_hourly_rollup_state.watermark_at (1970-01-01 ...) trails
+  task_usage latest event (...) by more than 01:00:00 — backfill is
+  incomplete or pg_cron is not running. Run cmd/backfill_task_usage_hourly
+  (and let pg_cron catch up) before re-running migrate
+```
+
+Recovery is straightforward: run `backfill_task_usage_hourly` (Option C above), then re-run `migrate up` (or restart the backend container — migrations run automatically on startup). **Fresh installs are exempt** — the guard short-circuits when `task_usage` is empty, and migrations succeed, but the dashboard will still stay at zero until you set up Option A or Option B.
+
 ## Manual Setup (Without Docker Compose)

 If you prefer to build and run services manually:
--- a/apps/docs/content/docs/getting-started/self-hosting.zh.mdx
+++ b/apps/docs/content/docs/getting-started/self-hosting.zh.mdx
@@ -133,6 +133,18 @@ Alternatively, configure step by step: `multica config set server_url http://loc
 3. Go to **Settings → Agents** and create a new agent
 4. Create an issue and assign it to your agent

+## Usage Dashboard Rollup (Required)
+
+Starting with `v0.3.5`, the Usage / Runtime dashboards read from a derived `task_usage_hourly` table populated by `rollup_task_usage_hourly()`. The bundled `pgvector/pgvector:pg17` image does **not** include `pg_cron`, and the backend doesn't run the rollup in-process either — until you schedule it yourself, the dashboard will stay at zero even though `task_usage` is populated.
+
+Pick one supported path before relying on the Usage / Runtime dashboard:
+
+- **External cron / systemd-timer / Kubernetes `CronJob`**: schedule `SELECT rollup_task_usage_hourly()` every 5 minutes. Idempotent, watermark-driven — overlapping or skipped ticks are safe.
+- **Postgres with `pg_cron`**: swap the bundled Postgres image for one that ships `pg_cron`, set `shared_preload_libraries=pg_cron`, then `SELECT cron.schedule('rollup_task_usage_hourly', '*/5 * * * *', 'SELECT rollup_task_usage_hourly()')` once.
+- **Backfill historical data**: required on the `v0.3.4 → v0.3.5+` upgrade path when the database already has `task_usage` rows — migration `103` is fail-closed and will abort `migrate up` with `refusing to drop legacy daily rollups: ...` until the hourly table is seeded. Run `./backfill_task_usage_hourly --sleep-between-slices=2s` inside the backend container, then re-run the upgrade and configure one of the schedules above.
+
+Full reference (Compose + Kubernetes templates, flag descriptions, upgrade order) lives in [`SELF_HOSTING_ADVANCED.md → Usage Dashboard Rollup`](https://github.com/multica-ai/multica/blob/main/SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup).
+
 ## Stopping Services

 ```bash
--- a/apps/docs/content/docs/self-host-quickstart.mdx
+++ b/apps/docs/content/docs/self-host-quickstart.mdx
@@ -159,6 +159,45 @@ After bringing the proxy up, set `FRONTEND_ORIGIN=https://multica.example.com` i

 Same flow as Cloud — see [Cloud quickstart → Steps 5-6](/cloud-quickstart#5-create-an-agent).

+## 7. Schedule the usage rollup (required for the Usage dashboard)
+
+<Callout type="warning">
+The Usage / Runtime dashboards read from a derived `task_usage_hourly` table populated by `rollup_task_usage_hourly()`. The bundled `pgvector/pgvector:pg17` Postgres image **does not include `pg_cron`**, and the backend does not run the rollup in-process either. If nothing schedules `rollup_task_usage_hourly()`, raw `task_usage` rows keep arriving while the dashboard stays at zero forever.
+</Callout>
+
+Pick one of the supported options — only one is needed.
+
+**Option A — External cron / systemd-timer (simplest).** Run the rollup every 5 minutes from any out-of-band scheduler. It's idempotent and watermark-driven, so missed ticks catch up:
+
+```bash
+# /etc/cron.d/multica-rollup — every 5 minutes
+*/5 * * * * root docker compose -f /path/to/multica/docker-compose.selfhost.yml \
+  exec -T postgres psql -U multica -d multica \
+  -c "SELECT rollup_task_usage_hourly();" >/dev/null
+```
+
+**Option B — Swap Postgres for an image that ships `pg_cron`.** Replace `pgvector/pgvector:pg17` in `docker-compose.selfhost.yml` with an image that has both `pgvector` and `pg_cron` (`supabase/postgres`, or a custom build), set `shared_preload_libraries=pg_cron`, restart, then register the job once:
+
+```sql
+CREATE EXTENSION IF NOT EXISTS pg_cron;
+SELECT cron.schedule(
+  'rollup_task_usage_hourly',
+  '*/5 * * * *',
+  $$SELECT rollup_task_usage_hourly()$$
+);
+```
+
+**Option C — Backfill history first (upgrade path).** If you're upgrading from `v0.3.4 → v0.3.5+` and have existing `task_usage` rows, migration `103` will abort `migrate up` with `refusing to drop legacy daily rollups: ...` until the hourly table is seeded. Run the bundled backfill once, then set up Option A or B:
+
+```bash
+docker compose -f docker-compose.selfhost.yml exec backend \
+  ./backfill_task_usage_hourly --sleep-between-slices=2s
+```
+
+`--sleep-between-slices=2s` throttles read pressure on a busy DB. After it finishes, restart the backend container (migrations run on startup) and the upgrade completes.
+
+Full reference — including the Kubernetes `CronJob` template and the upgrade order — lives in the repo's [`SELF_HOSTING_ADVANCED.md → Usage Dashboard Rollup`](https://github.com/multica-ai/multica/blob/main/SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup).
+
 ## Kubernetes deployment (alternative)

 If you already run a Kubernetes cluster, the repo also ships a Helm chart at `deploy/helm/multica/`. It's the equivalent of `make selfhost` for k8s — same backend image, frontend image, and `pgvector/pgvector:pg17` Postgres, packaged as Deployments / Services / Ingresses with one `ConfigMap` rendered from `values.yaml`. Authored against k3s + Traefik + `local-path` and should work on any cluster with an Ingress controller and a default `ReadWriteOnce` StorageClass.
@@ -211,6 +250,8 @@ The full reference — three login modes, the `backend` ExternalName workaround
 - **Backend won't start**: check container logs with `docker compose -f docker-compose.selfhost.yml logs backend`; usually it's a bad `DATABASE_URL` or `JWT_SECRET` in `.env`
 - **Verification code not received**: no email backend is configured (neither Resend nor SMTP) → look for `[DEV] Verification code` in `docker compose logs backend`
 - **WebSocket won't connect**: for public deployments you must set `FRONTEND_ORIGIN` to your real frontend domain; see [Troubleshooting → WebSocket won't connect](/troubleshooting#websocket-wont-connect)
+- **Usage / Runtime dashboard stays at zero**: `rollup_task_usage_hourly()` isn't being scheduled; see [Step 7](#7-schedule-the-usage-rollup-required-for-the-usage-dashboard) above and [Troubleshooting → Usage dashboard stays at zero](/troubleshooting#usage-dashboard-stays-at-zero)
+- **`migrate up` fails with `refusing to drop legacy daily rollups`**: upgrade-path guard from `v0.3.4 → v0.3.5+`. Run `backfill_task_usage_hourly` first — see [Step 7 → Option C](#7-schedule-the-usage-rollup-required-for-the-usage-dashboard)

 ## Next steps

--- a/apps/docs/content/docs/self-host-quickstart.zh.mdx
+++ b/apps/docs/content/docs/self-host-quickstart.zh.mdx
@@ -158,6 +158,45 @@ multica.example.com {

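 Same flow as Cloud: see [Cloud quickstart → Steps 5-6](/cloud-quickstart#5-创建智能体).

+## 7. Schedule the usage rollup (required for the Usage dashboard)
+
+<Callout type="warning">
+The Usage / Runtime dashboards read from the derived table `task_usage_hourly`, which is only populated when `rollup_task_usage_hourly()` runs periodically. **The default `pgvector/pgvector:pg17` image does not include `pg_cron`**, and the backend does not run the rollup in-process either. If nothing schedules it, raw `task_usage` rows keep being written but the dashboard stays at 0 without reporting any error.
+</Callout>
+
+There are three supported paths; pick any one.
+
+**Option A: External cron / systemd-timer (simplest).** Run the rollup every 5 minutes from any external scheduler. The function is idempotent and watermark-driven, so a missed tick or two is caught up on the next run: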
+
+```bash
+# /etc/cron.d/multica-rollup (every 5 minutes)
+*/5 * * * * root docker compose -f /path/to/multica/docker-compose.selfhost.yml \
+  exec -T postgres psql -U multica -d multica \
+  -c "SELECT rollup_task_usage_hourly();" >/dev/null
+```
+
+**Option B: Switch to a Postgres image that ships `pg_cron`.** Replace `pgvector/pgvector:pg17` in `docker-compose.selfhost.yml` with an image that includes both `pgvector` and `pg_cron` (for example `supabase/postgres`, or your own build), set `shared_preload_libraries=pg_cron`, restart Postgres, then register the job once:
+
+```sql
+CREATE EXTENSION IF NOT EXISTS pg_cron;
+SELECT cron.schedule(
+  'rollup_task_usage_hourly',
+  '*/5 * * * *',
+  $$SELECT rollup_task_usage_hourly()$$
+);
+```
+
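+**Option C: Backfill history first (upgrade path).** If you are upgrading from `v0.3.4` to `v0.3.5+` and the database already has `task_usage` rows, migration `103` aborts `migrate up` with `refusing to drop legacy daily rollups: ...` until the hourly table has been seeded. Run the bundled backfill command once, then set up Option A or Option B so new data keeps flowing in: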
+
+```bash
+docker compose -f docker-compose.selfhost.yml exec backend \
+  ./backfill_task_usage_hourly --sleep-between-slices=2s
+```
+
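+`--sleep-between-slices=2s` throttles read pressure on a busy database. Once the backfill finishes, restart the backend container (migrations run automatically on startup) and the upgrade can continue.
+
+For the full reference (including the Kubernetes `CronJob` template and the upgrade order), see the repo's [`SELF_HOSTING_ADVANCED.md → Usage Dashboard Rollup`](https://github.com/multica-ai/multica/blob/main/SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup).
+
 ## Kubernetes deployment (alternative)

 If you already run a Kubernetes cluster, the repo also ships a Helm chart at `deploy/helm/multica/`. It is the k8s equivalent of `make selfhost`: the same backend image, frontend image, and `pgvector/pgvector:pg17` Postgres, packaged as Deployments / Services / Ingresses plus one `ConfigMap` rendered from `values.yaml`. The chart was written against k3s + Traefik + `local-path`; it runs on any cluster with an Ingress controller and a default `ReadWriteOnce` StorageClass, and other cluster types need only minor adjustments.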
@@ -210,6 +249,8 @@ multica setup self-host \
 - **Backend won't start**: check the container logs with `docker compose -f docker-compose.selfhost.yml logs backend`; the usual cause is a bad `DATABASE_URL` or `JWT_SECRET` in `.env`
 - **Verification code not received**: no email backend is configured (neither Resend nor SMTP) → look for `[DEV] Verification code` in `docker compose logs backend`
 - **WebSocket won't connect**: public deployments must set `FRONTEND_ORIGIN` to your real frontend domain; see [Troubleshooting → WebSocket won't connect](/troubleshooting#websocket-连不上)
+- **Usage / Runtime dashboard stays at 0**: nothing is scheduling `rollup_task_usage_hourly()`; see [Step 7](#7-调度用量汇总任务usage-dashboard-必需) above and [Troubleshooting → Usage dashboard stays at 0](/troubleshooting#usage-看板一直是-0)
+- **`migrate up` fails with `refusing to drop legacy daily rollups`**: the fail-closed guard on the `v0.3.4 → v0.3.5+` upgrade path. Run `backfill_task_usage_hourly` first; see [Step 7 → Option C](#7-调度用量汇总任务usage-dashboard-必需)

 ## Next steps

--- a/apps/docs/content/docs/troubleshooting.mdx
+++ b/apps/docs/content/docs/troubleshooting.mdx
@@ -1,6 +1,6 @@
 ---
 title: Troubleshooting
-description: The top 7 common issues when self-hosting Multica — symptoms, causes, how to diagnose, how to fix.
+description: Common issues when self-hosting Multica — symptoms, causes, how to diagnose, how to fix.
 ---

 import { Callout } from "fumadocs-ui/components/callout";
@@ -132,6 +132,75 @@ Check your inbox (including spam) for the real verification code.
 - In production, leave `MULTICA_DEV_VERIFICATION_CODE` empty — configure Resend and use real codes
 - For local development or internal testing, either copy the generated code from server logs or set `APP_ENV=development` plus `MULTICA_DEV_VERIFICATION_CODE=888888` — never enable a fixed code on a public instance (see [Sign-in and signup configuration → Fixed local testing codes](/auth-setup#fixed-local-testing-codes))

+## Usage dashboard stays at zero
+
+**Symptom**: agents complete tasks, raw token usage is written to the database, but **Settings → Usage** and **Settings → Runtime** show 0 input / output / cost across the board. This is silent — there is no error in the backend logs.
+
+**Likely causes**:
+
+1. **`rollup_task_usage_hourly()` is never scheduled**: the Usage / Runtime dashboards read from the derived `task_usage_hourly` table, which is populated by that function. The bundled `pgvector/pgvector:pg17` image does not include `pg_cron`, and the backend does not run the rollup in-process either. On a fresh self-host install with no external scheduler, this is the default state.
+2. **`pg_cron` is installed but pointing at the wrong database**: `cron.database_name` defaults to `postgres`; if your Multica database has a different name, the scheduled job never sees `rollup_task_usage_hourly()`.
+3. **The scheduler is running but the rollup is silently erroring**: for example, a wrong DB role or `search_path` inside the cron entry.
+
+**How to diagnose**:
+
+```sql
+-- Confirm raw events exist but the hourly table is empty.
+SELECT count(*) AS raw_rows FROM task_usage;
+SELECT count(*) AS hourly_rows FROM task_usage_hourly;
+
+-- Confirm pg_cron is (or isn't) available.
+SELECT * FROM pg_available_extensions WHERE name = 'pg_cron';
+SHOW shared_preload_libraries;
+
+-- If pg_cron is installed, check the schedule + last run.
+SELECT jobname, schedule, database, active FROM cron.job;
+SELECT jobid, command, status, return_message, start_time, end_time
+  FROM cron.job_run_details ORDER BY start_time DESC LIMIT 10;
+
+-- Watermark — if this is 1970-01-01, the rollup has never run.
+SELECT watermark_at FROM task_usage_hourly_rollup_state;
+```
+
+**How to fix**:
+
+- Call the rollup once by hand to confirm it works: `SELECT rollup_task_usage_hourly();` — refresh the dashboard; if numbers appear, the only missing piece is a scheduler.
+- Pick one of the supported paths from [Self-host quickstart → Schedule the usage rollup](/self-host-quickstart#7-schedule-the-usage-rollup-required-for-the-usage-dashboard): external cron / systemd-timer / Kubernetes CronJob, or swap Postgres for an image with `pg_cron`.
+- If you already have history that pre-dates the schedule, run `backfill_task_usage_hourly` inside the backend container to seed buckets before the watermark.
+
+## Migration `103` fails with `refusing to drop legacy daily rollups`
+
+**Symptom**: upgrading from `v0.3.4` to `v0.3.5+`, the backend container fails to start (or `migrate up` aborts) with:
+
+```text
+ERROR: refusing to drop legacy daily rollups:
+  task_usage_hourly_rollup_state.watermark_at (1970-01-01 ...) trails
+  task_usage latest event (...) by more than 01:00:00 — backfill is
+  incomplete or pg_cron is not running. Run cmd/backfill_task_usage_hourly
+  (and let pg_cron catch up) before re-running migrate
+```
+
+**Likely cause**: this is migration `103`'s fail-closed guard. It refuses to drop the legacy daily rollups until `task_usage_hourly` has caught up with raw `task_usage`. The guard fires whenever existing rows are present and the rollup watermark still sits at the epoch — i.e. nothing has rolled history into the hourly table yet.
+
+**How to fix**:
+
+1. Run the backfill against the same database (idempotent, safe to interrupt, safe to re-run):
+
+   ```bash
+   # Docker Compose
+   docker compose -f docker-compose.selfhost.yml exec backend \
+     ./backfill_task_usage_hourly --sleep-between-slices=2s
+
+   # Kubernetes
+   kubectl -n multica exec deploy/multica-backend -- \
+     ./backfill_task_usage_hourly --sleep-between-slices=2s
+   ```
+
+2. Re-run the upgrade — restarting the backend container is enough, migrations run on startup. The guard now sees a current watermark and lets `103` apply.
+3. Set up an ongoing rollup schedule (cron / `pg_cron`) so the watermark keeps advancing — see [Self-host quickstart → Schedule the usage rollup](/self-host-quickstart#7-schedule-the-usage-rollup-required-for-the-usage-dashboard).
+
+`--sleep-between-slices=2s` is a conservative setting for production databases with years of history. Use `--months-back N --force-partial` if you only want to keep the last N months and are willing to permanently abandon older buckets.
+
 ## Port conflicts

 **Symptom**: `multica server` or `multica daemon start` fails with `address already in use`.
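For the "Usage dashboard stays at zero" diagnosis in this file, the watermark query can also be turned into a simple staleness check that a monitoring script could run. The sketch below is illustrative only: the helper names `watermark_is_stale` and `fetch_watermark_epoch` are hypothetical, the one-hour default merely mirrors the `01:00:00` threshold quoted in the migration guard's error message, and it assumes `task_usage_hourly_rollup_state` holds a single row whose `watermark_at` is a timestamp column.

```shell
#!/usr/bin/env bash
# Sketch: flag a stalled rollup by comparing the watermark to a reference time.
# Helper names are hypothetical; they are not part of Multica.
set -eu

# watermark_is_stale <watermark_epoch> <reference_epoch> [max_lag_seconds]
# Exits 0 when the watermark trails the reference time by more than max_lag.
watermark_is_stale() {
  watermark_epoch="$1"
  reference_epoch="$2"
  max_lag="${3:-3600}"   # one hour, mirroring "01:00:00" in the guard message
  [ $((reference_epoch - watermark_epoch)) -gt "$max_lag" ]
}

# fetch_watermark_epoch <compose_file>
# Reads the watermark as epoch seconds through the Compose postgres service.
# Assumes a single-row state table with a timestamp watermark_at column.
fetch_watermark_epoch() {
  docker compose -f "$1" exec -T postgres psql -U multica -d multica -At \
    -c "SELECT extract(epoch FROM watermark_at)::bigint FROM task_usage_hourly_rollup_state;"
}

# Example wiring (commented out so the sketch has no side effects):
# wm="$(fetch_watermark_epoch /path/to/multica/docker-compose.selfhost.yml)"
# if watermark_is_stale "$wm" "$(date +%s)"; then
#   echo "rollup watermark is more than an hour behind" >&2
# fi
```

A watermark still at the epoch (`0`) is always reported as stale, which matches the "rollup has never run" case. Comparing against the current time is a simplification; on a quiet instance, passing the timestamp of the latest `task_usage` event as the reference avoids false alarms.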
--- a/apps/docs/content/docs/troubleshooting.zh.mdx
+++ b/apps/docs/content/docs/troubleshooting.zh.mdx
@@ -1,6 +1,6 @@
 ---
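 title: Troubleshooting
-description: The top 7 common issues when self-hosting Multica (symptoms, causes, how to diagnose, how to fix).
+description: Common issues when self-hosting Multica (symptoms, causes, how to diagnose, how to fix).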
 ---

 import { Callout } from "fumadocs-ui/components/callout";
@@ -132,6 +132,75 @@ docker exec <container> env | grep -E 'APP_ENV|MULTICA_DEV_VERIFICATION_CODE'
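 - In production, keep `MULTICA_DEV_VERIFICATION_CODE` empty and use real verification codes once Resend is configured
 - For local development or internal-network testing, you can copy the generated code from the server logs; if you need `888888`, set `APP_ENV=development` and `MULTICA_DEV_VERIFICATION_CODE=888888`. Never enable a fixed code on a public instance (see [Sign-in and signup configuration → Fixed local testing codes](/auth-setup#固定本地测试验证码))

+## Usage dashboard stays at 0
+
+**Symptom**: agents finish tasks and raw token usage is written to the database, but **Settings → Usage** and **Settings → Runtime** show 0 for input / output / cost across the board. **No error is reported**; this is a silent failure.
+
+**Likely causes**:
+
+1. **Nothing schedules `rollup_task_usage_hourly()`**: the Usage / Runtime dashboards read from the derived table `task_usage_hourly`, which must be populated periodically by `rollup_task_usage_hourly()`. The default `pgvector/pgvector:pg17` image does not include `pg_cron`, and the backend does not run the rollup in-process either. On a fresh self-hosted install with no external scheduler configured, this is the default state.
+2. **`pg_cron` is installed but pointed at the wrong database**: `cron.database_name` defaults to `postgres`; if your Multica database is not named `postgres`, the scheduled job never sees `rollup_task_usage_hourly()`.
+3. **The scheduler runs, but the rollup fails silently**: for example, the DB role or `search_path` in the cron entry is wrong.
+
+**How to diagnose**: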
+
+```sql
+-- Confirm raw data exists while the hourly table is empty
+SELECT count(*) AS raw_rows FROM task_usage;
+SELECT count(*) AS hourly_rows FROM task_usage_hourly;
+
+-- Check whether pg_cron is installed and loaded
+SELECT * FROM pg_available_extensions WHERE name = 'pg_cron';
+SHOW shared_preload_libraries;
+
+-- If pg_cron is installed, check the schedule and the most recent runs
+SELECT jobname, schedule, database, active FROM cron.job;
+SELECT jobid, command, status, return_message, start_time, end_time
+  FROM cron.job_run_details ORDER BY start_time DESC LIMIT 10;
+
+-- Watermark: if it is still 1970-01-01, the rollup has never run
+SELECT watermark_at FROM task_usage_hourly_rollup_state;
+SELECT watermark_at FROM task_usage_hourly_rollup_state;
+```
+
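+**How to fix**:
+
+- Run it once by hand to confirm the function itself works: `SELECT rollup_task_usage_hourly();`, then refresh the dashboard. If numbers appear, the only missing piece is a scheduler.
+- Pick a scheduling method from [Self-host quickstart → Schedule the usage rollup](/self-host-quickstart#7-调度用量汇总任务usage-dashboard-必需): external cron / systemd-timer / Kubernetes CronJob, or switch to a Postgres image that ships `pg_cron`.
+- If the database already had history before the schedule was set up, first run `backfill_task_usage_hourly` inside the backend container to fill in the buckets that predate the watermark.
+
+## Migration `103` fails with `refusing to drop legacy daily rollups`
+
+**Symptom**: when upgrading from `v0.3.4` to `v0.3.5+`, the backend container fails to start (or `migrate up` aborts) with: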
+
+```text
+ERROR: refusing to drop legacy daily rollups:
+  task_usage_hourly_rollup_state.watermark_at (1970-01-01 ...) trails
+  task_usage latest event (...) by more than 01:00:00 — backfill is
+  incomplete or pg_cron is not running. Run cmd/backfill_task_usage_hourly
+  (and let pg_cron catch up) before re-running migrate
+```
+
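+**Likely cause**: this is migration `103`'s fail-closed guard. It only allows the legacy daily rollups to be dropped once `task_usage_hourly` has caught up with raw `task_usage`. The guard fires whenever the database holds historical data and the rollup watermark still sits at the epoch (meaning history has not yet been backfilled into the hourly table).
+
+**How to fix**:
+
+1. Run the backfill once against the same database (idempotent, safe to interrupt, safe to retry):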
+
+   ```bash
+   # Docker Compose
+   docker compose -f docker-compose.selfhost.yml exec backend \
+     ./backfill_task_usage_hourly --sleep-between-slices=2s
+
+   # Kubernetes
+   kubectl -n multica exec deploy/multica-backend -- \
+     ./backfill_task_usage_hourly --sleep-between-slices=2s
+   ```
+
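+2. Re-run the upgrade: restarting the backend container is enough, since migrations run automatically on startup. The guard sees the new watermark and `103` goes through.
+3. Also set up an ongoing rollup schedule so the watermark keeps advancing; see [Self-host quickstart → Schedule the usage rollup](/self-host-quickstart#7-调度用量汇总任务usage-dashboard-必需).
+
+`--sleep-between-slices=2s` is a fairly conservative setting for production databases with years of history. If you only want to keep the last N months and can accept permanently dropping older buckets, use `--months-back N --force-partial`.
+
 ## Port conflicts

 **Symptom**: `multica server` or `multica daemon start` fails to start with `address already in use`.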