Compare commits

...

1 Commits

Author SHA1 Message Date
Jiang Bohan
58017be0e0 docs(self-host): document task_usage_hourly rollup requirement (MUL-2682)
The Usage / Runtime dashboards read from `task_usage_hourly`, but the
default self-host stack does not schedule `rollup_task_usage_hourly()`
anywhere — the bundled pgvector/pgvector:pg17 image ships without
pg_cron, and the backend does not run the rollup in-process. Fresh
installs see the dashboard stay at zero forever (#3244), and upgrades
from v0.3.4 → v0.3.5+ are blocked by migration 103's fail-closed guard
(#3015).

Document the three supported paths (external cron / systemd-timer /
CronJob, Postgres with pg_cron, or backfill_task_usage_hourly for
upgrades) across SELF_HOSTING.md, SELF_HOSTING_ADVANCED.md, the
quickstart pages on the docs site, and add troubleshooting entries
for both the silent-zero and the migration-guard failure modes.

Co-authored-by: multica-agent <github@multica.ai>
2026-05-26 16:01:27 +08:00
7 changed files with 394 additions and 2 deletions

View File

@@ -298,6 +298,8 @@ To roll back if an upgrade goes sideways:
helm -n multica rollback multica
```
> **Upgrading from `v0.3.4` to `v0.3.5+` fails with `refusing to drop legacy daily rollups: ...`?** Same migration guard as the Docker path — see [Usage Dashboard Rollup → Option C](#option-c--backfill-history-first-then-schedule). Run the backfill against the same database the chart is using (`kubectl -n multica exec deploy/multica-backend -- ./backfill_task_usage_hourly --sleep-between-slices=2s`), then restart the backend deployment to re-apply migrations.
### Tearing down
```bash
@@ -310,6 +312,57 @@ kubectl delete namespace multica
---
## Usage Dashboard Rollup (Required)
Starting with `v0.3.5`, the Usage / Runtime dashboards read from a derived `task_usage_hourly` table rather than directly from `task_usage`. Raw `task_usage` rows are written by the backend on every task, but the dashboard only sees data after `rollup_task_usage_hourly()` runs and aggregates them into `task_usage_hourly`.
**The bundled `pgvector/pgvector:pg17` image does NOT include `pg_cron`.** If nothing schedules the rollup, the dashboard will stay at zero forever even though `task_usage` is populated. You have three supported options — pick one before relying on the dashboard.
> **Upgrading from `v0.3.4` to `v0.3.5+`** with existing `task_usage` history: migration `103` is fail-closed and will abort `migrate up` with `refusing to drop legacy daily rollups: …`. Run `backfill_task_usage_hourly` first (Option C below), then re-run the upgrade. **Fresh installs** are exempted by that guard and migrate cleanly — but the dashboard will still stay at zero until you pick Option A or Option B.
### Option A — External cron / systemd-timer (simplest)
Schedule a 5-minute job that calls `rollup_task_usage_hourly()`. It is idempotent and watermark-driven, so a missed tick catches up on the next run.
```bash
# /etc/cron.d/multica-rollup — every 5 minutes
*/5 * * * * root docker compose -f /path/to/multica/docker-compose.selfhost.yml \
exec -T postgres psql -U multica -d multica \
-c "SELECT rollup_task_usage_hourly();" >/dev/null
```
Or as a systemd timer + service if you prefer that surface. The function returns the number of (upserted + deleted-empty) rows; it's safe to call concurrently with itself (an advisory lock makes overlapping runs no-op) and safe to call alongside `backfill_task_usage_hourly`.
### Option B — Swap Postgres for an image that ships `pg_cron`
If you'd rather have Postgres schedule itself, replace `pgvector/pgvector:pg17` in `docker-compose.selfhost.yml` with an image that bundles both `pgvector` and `pg_cron` (e.g. `supabase/postgres`, or your own build of `pgvector/pgvector` with `pg_cron` added and `shared_preload_libraries=pg_cron` set on the server). Then, once:
```sql
CREATE EXTENSION IF NOT EXISTS pg_cron;
SELECT cron.schedule(
'rollup_task_usage_hourly',
'*/5 * * * *',
$$SELECT rollup_task_usage_hourly()$$
);
```
`shared_preload_libraries` requires a Postgres restart to take effect — set it in `postgresql.conf` (or via the image's documented mechanism) before bringing the container up.
### Option C — Backfill history first, then schedule
If you're upgrading from `v0.3.4 → v0.3.5+` and already have `task_usage` rows (or you just want the dashboard to show historical data on a fresh install that you've been running for a while), run the bundled backfill command once before scheduling the rollup:
```bash
# Backfills task_usage_hourly from all historical task_usage rows and stamps
# the rollup watermark. Idempotent — safe to re-run.
docker compose -f docker-compose.selfhost.yml exec backend \
./backfill_task_usage_hourly --sleep-between-slices=2s
```
On a database with years of data this can scan tens of millions of rows; `--sleep-between-slices=2s` throttles the read pressure. Use `--months-back N` (plus `--force-partial`) if you only want the last N months. Once it finishes, set up Option A or Option B so new buckets keep flowing.
After upgrading, re-run `migrate up` (or restart the backend container — migrations run automatically on startup) to apply migration `103` cleanly.
## Stopping Services
If you installed via the install script:
@@ -350,6 +403,8 @@ docker compose -f docker-compose.selfhost.yml up -d
Pin `MULTICA_IMAGE_TAG` in `.env` to an exact version like `v0.2.4` if you want to stay on a specific release. Migrations run automatically on backend startup.
If the selected GHCR tag has not been published yet, fall back to `make selfhost-build` or `docker compose -f docker-compose.selfhost.yml -f docker-compose.selfhost.build.yml up -d --build`.
> **Upgrading from `v0.3.4` to `v0.3.5+` fails with `refusing to drop legacy daily rollups: ...`?** That's migration `103`'s fail-closed guard: it requires `task_usage_hourly` to be seeded before the legacy daily rollups are dropped. Run `backfill_task_usage_hourly` first, then re-run the upgrade. Full instructions in [Usage Dashboard Rollup → Option C](#option-c--backfill-history-first-then-schedule).
---
## Manual Docker Compose Setup

View File

@@ -166,6 +166,111 @@ The Docker Compose setup runs migrations automatically. If you need to run them
cd server && go run ./cmd/migrate up
```
## Usage Dashboard Rollup
The Usage and Runtime dashboards read from `task_usage_hourly`, a derived table populated by `rollup_task_usage_hourly()`. The function is **not** scheduled out of the box on the default self-host stack: the bundled `pgvector/pgvector:pg17` image ships without `pg_cron`, and the backend does not run the rollup in-process either. Until something calls it on a schedule, raw `task_usage` rows will keep arriving while the dashboard stays at zero.
Pick one of the supported paths:
### Option A — External cron / systemd-timer
The simplest path. Schedule `SELECT rollup_task_usage_hourly()` every five minutes from any out-of-band timer (host crontab, systemd timer, sidecar container, Kubernetes CronJob). It is idempotent and watermark-driven — overlapping runs are no-ops on an internal advisory lock, and a missed tick catches up on the next run.
Docker Compose:
```bash
# /etc/cron.d/multica-rollup
*/5 * * * * root docker compose -f /path/to/multica/docker-compose.selfhost.yml \
exec -T postgres psql -U multica -d multica \
-c "SELECT rollup_task_usage_hourly();" >/dev/null
```
Kubernetes (one-off `CronJob`):
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
name: multica-usage-rollup
spec:
schedule: "*/5 * * * *"
concurrencyPolicy: Forbid
jobTemplate:
spec:
template:
spec:
restartPolicy: OnFailure
containers:
- name: psql
image: postgres:17-alpine
command:
- psql
- "$(DATABASE_URL)"
- -c
- "SELECT rollup_task_usage_hourly();"
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: multica-secrets
key: DATABASE_URL
```
### Option B — Postgres with `pg_cron`
If you'd rather have Postgres schedule itself, swap the bundled image for one that ships both `pgvector` and `pg_cron` (e.g. `supabase/postgres`, or a custom build of `pgvector/pgvector` with `pg_cron` added). `pg_cron` requires `shared_preload_libraries=pg_cron` in `postgresql.conf`, which only takes effect on Postgres restart — set it before bringing the container up.
Then register the job once:
```sql
CREATE EXTENSION IF NOT EXISTS pg_cron;
SELECT cron.schedule(
'rollup_task_usage_hourly',
'*/5 * * * *',
$$SELECT rollup_task_usage_hourly()$$
);
```
`pg_cron.database_name` defaults to `postgres`; if your Multica database has a different name, point `pg_cron` at it via that GUC or run `cron.schedule_in_database(...)` instead.
### Option C — Backfill historical data first
`rollup_task_usage_hourly()` only processes new buckets after it starts running. If you already have `task_usage` rows from before the rollup was scheduled — most commonly when upgrading from `v0.3.4` to `v0.3.5+`, or on a fresh install that has been collecting usage for a while — run `backfill_task_usage_hourly` once to seed historical buckets, then set up Option A or Option B for ongoing rollups.
```bash
# Docker Compose
docker compose -f docker-compose.selfhost.yml exec backend \
./backfill_task_usage_hourly --sleep-between-slices=2s
# Kubernetes
kubectl -n multica exec deploy/multica-backend -- \
./backfill_task_usage_hourly --sleep-between-slices=2s
```
The command walks `task_usage`'s full time range in monthly slices and calls the same idempotent primitive the cron path uses, so it's safe to re-run, to interrupt with Ctrl-C, and to run concurrently with an already-scheduled rollup. Flags:
| Flag | Description |
|---|---|
| `--sleep-between-slices` | Pause between monthly slices to throttle read pressure on busy databases (e.g. `2s`). Recommended on production DBs with years of history. |
| `--months-back N` | Only backfill the last N months. **Requires `--force-partial`** because the watermark still advances past the skipped older buckets — those are permanently abandoned. |
| `--dry-run` | Log slices that would be processed without writing anything. |
After backfill completes, the rollup-state watermark is stamped to `now() - 5 minutes`, so the first scheduled tick after backfill does not redo history.
### `v0.3.4 → v0.3.5+` upgrade order
Migration `103` adds a fail-closed guard that refuses to drop the legacy daily rollups until `task_usage_hourly` has caught up. If you run `migrate up` straight through on a database with existing `task_usage` rows, it aborts with:
```text
ERROR: refusing to drop legacy daily rollups:
task_usage_hourly_rollup_state.watermark_at (1970-01-01 ...) trails
task_usage latest event (...) by more than 01:00:00 — backfill is
incomplete or pg_cron is not running. Run cmd/backfill_task_usage_hourly
(and let pg_cron catch up) before re-running migrate
```
Recovery is straightforward: run `backfill_task_usage_hourly` (Option C above), then re-run `migrate up` (or restart the backend container — migrations run automatically on startup). **Fresh installs are exempt** — the guard short-circuits when `task_usage` is empty, and migrations succeed, but the dashboard will still stay at zero until you set up Option A or Option B.
## Manual Setup (Without Docker Compose)
If you prefer to build and run services manually:

View File

@@ -133,6 +133,18 @@ Alternatively, configure step by step: `multica config set server_url http://loc
3. Go to **Settings → Agents** and create a new agent
4. Create an issue and assign it to your agent
## Usage Dashboard Rollup (Required)
Starting with `v0.3.5`, the Usage / Runtime dashboards read from a derived `task_usage_hourly` table populated by `rollup_task_usage_hourly()`. The bundled `pgvector/pgvector:pg17` image does **not** include `pg_cron`, and the backend doesn't run the rollup in-process either — until you schedule it yourself, the dashboard will stay at zero even though `task_usage` is populated.
Pick one supported path before relying on the Usage / Runtime dashboard:
- **External cron / systemd-timer / Kubernetes `CronJob`**: schedule `SELECT rollup_task_usage_hourly()` every 5 minutes. Idempotent, watermark-driven — overlapping or skipped ticks are safe.
- **Postgres with `pg_cron`**: swap the bundled Postgres image for one that ships `pg_cron`, set `shared_preload_libraries=pg_cron`, then `SELECT cron.schedule('rollup_task_usage_hourly', '*/5 * * * *', 'SELECT rollup_task_usage_hourly()')` once.
- **Backfill historical data**: required on the `v0.3.4 → v0.3.5+` upgrade path when the database already has `task_usage` rows — migration `103` is fail-closed and will abort `migrate up` with `refusing to drop legacy daily rollups: ...` until the hourly table is seeded. Run `./backfill_task_usage_hourly --sleep-between-slices=2s` inside the backend container, then re-run the upgrade and configure one of the schedules above.
Full reference (Compose + Kubernetes templates, flag descriptions, upgrade order) lives in [`SELF_HOSTING_ADVANCED.md → Usage Dashboard Rollup`](https://github.com/multica-ai/multica/blob/main/SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup).
## Stopping Services
```bash

View File

@@ -159,6 +159,45 @@ After bringing the proxy up, set `FRONTEND_ORIGIN=https://multica.example.com` i
Same flow as Cloud — see [Cloud quickstart → Steps 5-6](/cloud-quickstart#5-create-an-agent).
## 7. Schedule the usage rollup (required for the Usage dashboard)
<Callout type="warning">
The Usage / Runtime dashboards read from a derived `task_usage_hourly` table populated by `rollup_task_usage_hourly()`. The bundled `pgvector/pgvector:pg17` Postgres image **does not include `pg_cron`**, and the backend does not run the rollup in-process either. If nothing schedules `rollup_task_usage_hourly()`, raw `task_usage` rows keep arriving while the dashboard stays at zero forever.
</Callout>
Pick one of the supported options — only one is needed.
**Option A — External cron / systemd-timer (simplest).** Run the rollup every 5 minutes from any out-of-band scheduler. It's idempotent and watermark-driven, so missed ticks catch up:
```bash
# /etc/cron.d/multica-rollup — every 5 minutes
*/5 * * * * root docker compose -f /path/to/multica/docker-compose.selfhost.yml \
exec -T postgres psql -U multica -d multica \
-c "SELECT rollup_task_usage_hourly();" >/dev/null
```
**Option B — Swap Postgres for an image that ships `pg_cron`.** Replace `pgvector/pgvector:pg17` in `docker-compose.selfhost.yml` with an image that has both `pgvector` and `pg_cron` (`supabase/postgres`, or a custom build), set `shared_preload_libraries=pg_cron`, restart, then register the job once:
```sql
CREATE EXTENSION IF NOT EXISTS pg_cron;
SELECT cron.schedule(
'rollup_task_usage_hourly',
'*/5 * * * *',
$$SELECT rollup_task_usage_hourly()$$
);
```
**Option C — Backfill history first (upgrade path).** If you're upgrading from `v0.3.4 → v0.3.5+` and have existing `task_usage` rows, migration `103` will abort `migrate up` with `refusing to drop legacy daily rollups: ...` until the hourly table is seeded. Run the bundled backfill once, then set up Option A or B:
```bash
docker compose -f docker-compose.selfhost.yml exec backend \
./backfill_task_usage_hourly --sleep-between-slices=2s
```
`--sleep-between-slices=2s` throttles read pressure on a busy DB. After it finishes, restart the backend container (migrations run on startup) and the upgrade completes.
Full reference — including the Kubernetes `CronJob` template and the upgrade order — lives in the repo's [`SELF_HOSTING_ADVANCED.md → Usage Dashboard Rollup`](https://github.com/multica-ai/multica/blob/main/SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup).
## Kubernetes deployment (alternative)
If you already run a Kubernetes cluster, the repo also ships a Helm chart at `deploy/helm/multica/`. It's the equivalent of `make selfhost` for k8s — same backend image, frontend image, and `pgvector/pgvector:pg17` Postgres, packaged as Deployments / Services / Ingresses with one `ConfigMap` rendered from `values.yaml`. Authored against k3s + Traefik + `local-path` and should work on any cluster with an Ingress controller and a default `ReadWriteOnce` StorageClass.
@@ -211,6 +250,8 @@ The full reference — three login modes, the `backend` ExternalName workaround
- **Backend won't start**: check container logs with `docker compose -f docker-compose.selfhost.yml logs backend`; usually it's a bad `DATABASE_URL` or `JWT_SECRET` in `.env`
- **Verification code not received**: no email backend is configured (neither Resend nor SMTP) → look for `[DEV] Verification code` in `docker compose logs backend`
- **WebSocket won't connect**: for public deployments you must set `FRONTEND_ORIGIN` to your real frontend domain; see [Troubleshooting → WebSocket won't connect](/troubleshooting#websocket-wont-connect)
- **Usage / Runtime dashboard stays at zero**: `rollup_task_usage_hourly()` isn't being scheduled — see [Step 7](#7-schedule-the-usage-rollup-required-for-the-usage-dashboard) above and [Troubleshooting → Usage dashboard shows zero](/troubleshooting#usage-dashboard-stays-at-zero)
- **`migrate up` fails with `refusing to drop legacy daily rollups`**: upgrade-path guard from `v0.3.4 → v0.3.5+`. Run `backfill_task_usage_hourly` first — see [Step 7 → Option C](#7-schedule-the-usage-rollup-required-for-the-usage-dashboard)
## Next steps

View File

@@ -158,6 +158,45 @@ multica.example.com {
流程和 Cloud 一样——见 [Cloud 快速上手 → 5-6 步](/cloud-quickstart#5-创建智能体)。
## 7. 调度用量汇总任务Usage Dashboard 必需)
<Callout type="warning">
Usage / Runtime 看板读的是派生表 `task_usage_hourly`,需要 `rollup_task_usage_hourly()` 周期性运行才能填充。**默认的 `pgvector/pgvector:pg17` 镜像不带 `pg_cron`**,后端进程内部也不会跑这个 rollup——什么都没调度的话原始 `task_usage` 行会继续写入,但 dashboard 会一直停在 0不会报错。
</Callout>
三种支持路径,三选一即可。
**Option A —— 外部 cron / systemd-timer最简单。** 在任意外部调度器上每 5 分钟跑一次 rollup。函数是幂等的、按 watermark 推进,丢一两个 tick 下次能补上:
```bash
# /etc/cron.d/multica-rollup —— 每 5 分钟跑一次
*/5 * * * * root docker compose -f /path/to/multica/docker-compose.selfhost.yml \
exec -T postgres psql -U multica -d multica \
-c "SELECT rollup_task_usage_hourly();" >/dev/null
```
**Option B —— 换成自带 `pg_cron` 的 Postgres 镜像。** 把 `docker-compose.selfhost.yml` 里的 `pgvector/pgvector:pg17` 换成同时带 `pgvector` 和 `pg_cron` 的镜像(比如 `supabase/postgres`,或自己 build 一份),把 `shared_preload_libraries=pg_cron` 配上、重启 Postgres然后注册一次任务
```sql
CREATE EXTENSION IF NOT EXISTS pg_cron;
SELECT cron.schedule(
'rollup_task_usage_hourly',
'*/5 * * * *',
$$SELECT rollup_task_usage_hourly()$$
);
```
**Option C —— 先回填历史(升级路径)。** 如果你是从 `v0.3.4` 升级到 `v0.3.5+` 且数据库里已经有 `task_usage` 行migration `103` 会以 `refusing to drop legacy daily rollups: ...` 报错并中止 `migrate up`,直到 hourly 表被 seed 过。先跑一次内置的 backfill 命令,然后再配 Option A 或 Option B 让新数据持续流进来:
```bash
docker compose -f docker-compose.selfhost.yml exec backend \
./backfill_task_usage_hourly --sleep-between-slices=2s
```
`--sleep-between-slices=2s` 用来在繁忙的数据库上限制读压力。回填跑完后重启后端容器migration 在启动时自动跑),升级就能继续。
完整参考(含 Kubernetes `CronJob` 模板和升级顺序)见仓库的 [`SELF_HOSTING_ADVANCED.md → Usage Dashboard Rollup`](https://github.com/multica-ai/multica/blob/main/SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup)。
## Kubernetes 部署(替代方案)
如果你已经在跑 Kubernetes 集群,仓库里也带了一个 Helm chart路径 `deploy/helm/multica/`。它就是 k8s 版的 `make selfhost`——一样的 backend 镜像、frontend 镜像、`pgvector/pgvector:pg17` Postgres封装成 Deployment / Service / Ingress再加上一个由 `values.yaml` 渲染出来的 `ConfigMap`。这套 chart 是按照 k3s + Traefik + `local-path` 写的,集群里只要有 Ingress controller 和默认的 `ReadWriteOnce` StorageClass 就能跑,其他类型的集群稍微改一改也能用。
@@ -210,6 +249,8 @@ multica setup self-host \
- **后端起不来**:看容器日志 `docker compose -f docker-compose.selfhost.yml logs backend`;常见是 `.env` 里 `DATABASE_URL` 或 `JWT_SECRET` 有问题
- **验证码收不到**没配任何邮件后端Resend 和 SMTP 都没设) → 从 `docker compose logs backend` 里找 `[DEV] Verification code`
- **WebSocket 连不上**:公网部署必须设 `FRONTEND_ORIGIN` 成你真实的前端域名;见 [故障排查 → WebSocket 连不上](/troubleshooting#websocket-连不上)
- **Usage / Runtime 看板一直是 0**:没人调度 `rollup_task_usage_hourly()` —— 见上面的 [第 7 步](#7-调度用量汇总任务usage-dashboard-必需) 和 [故障排查 → Usage 看板一直是 0](/troubleshooting#usage-看板一直是-0)
- **`migrate up` 报 `refusing to drop legacy daily rollups`**`v0.3.4 → v0.3.5+` 升级路径的 fail-closed guard。先跑 `backfill_task_usage_hourly` —— 见 [第 7 步 → Option C](#7-调度用量汇总任务usage-dashboard-必需)
## 下一步

View File

@@ -1,6 +1,6 @@
---
title: Troubleshooting
description: The top 7 common issues when self-hosting Multica — symptoms, causes, how to diagnose, how to fix.
description: Common issues when self-hosting Multica — symptoms, causes, how to diagnose, how to fix.
---
import { Callout } from "fumadocs-ui/components/callout";
@@ -132,6 +132,75 @@ Check your inbox (including spam) for the real verification code.
- In production, leave `MULTICA_DEV_VERIFICATION_CODE` empty — configure Resend and use real codes
- For local development or internal testing, either copy the generated code from server logs or set `APP_ENV=development` plus `MULTICA_DEV_VERIFICATION_CODE=888888` — never enable a fixed code on a public instance (see [Sign-in and signup configuration → Fixed local testing codes](/auth-setup#fixed-local-testing-codes))
## Usage dashboard stays at zero
**Symptom**: agents complete tasks, raw token usage is written to the database, but **Settings → Usage** and **Settings → Runtime** show 0 input / output / cost across the board. This is silent — there is no error in the backend logs.
**Likely causes**:
1. **`rollup_task_usage_hourly()` is never scheduled** — the Usage / Runtime dashboards read from the derived `task_usage_hourly` table, which is populated by that function. The bundled `pgvector/pgvector:pg17` image does not include `pg_cron`, and the backend does not run the rollup in-process either. On a fresh self-host install with no external scheduler, this is the default state.
2. **`pg_cron` is installed but pointing at the wrong database** — `pg_cron.database_name` defaults to `postgres`; if your Multica database has a different name, the scheduled job never sees `rollup_task_usage_hourly()`.
3. **The scheduler is running but the rollup is silently erroring** — e.g. wrong DB role / search_path inside the cron entry.
**How to diagnose**:
```sql
-- Confirm raw events exist but the hourly table is empty.
SELECT count(*) AS raw_rows FROM task_usage;
SELECT count(*) AS hourly_rows FROM task_usage_hourly;
-- Confirm pg_cron is (or isn't) available.
SELECT * FROM pg_available_extensions WHERE name = 'pg_cron';
SHOW shared_preload_libraries;
-- If pg_cron is installed, check the schedule + last run.
SELECT jobname, schedule, database, active FROM cron.job;
SELECT jobname, status, return_message, start_time, end_time
FROM cron.job_run_details ORDER BY start_time DESC LIMIT 10;
-- Watermark — if this is 1970-01-01, the rollup has never run.
SELECT watermark_at FROM task_usage_hourly_rollup_state;
```
**How to fix**:
- Call the rollup once by hand to confirm it works: `SELECT rollup_task_usage_hourly();` — refresh the dashboard; if numbers appear, the only missing piece is a scheduler.
- Pick one of the supported paths from [Self-host quickstart → Schedule the usage rollup](/self-host-quickstart#7-schedule-the-usage-rollup-required-for-the-usage-dashboard): external cron / systemd-timer / Kubernetes CronJob, or swap Postgres for an image with `pg_cron`.
- If you already have history that pre-dates the schedule, run `backfill_task_usage_hourly` inside the backend container to seed buckets before the watermark.
## Migration `103` fails with `refusing to drop legacy daily rollups`
**Symptom**: upgrading from `v0.3.4` to `v0.3.5+`, the backend container fails to start (or `migrate up` aborts) with:
```text
ERROR: refusing to drop legacy daily rollups:
task_usage_hourly_rollup_state.watermark_at (1970-01-01 ...) trails
task_usage latest event (...) by more than 01:00:00 — backfill is
incomplete or pg_cron is not running. Run cmd/backfill_task_usage_hourly
(and let pg_cron catch up) before re-running migrate
```
**Likely cause**: this is migration `103`'s fail-closed guard. It refuses to drop the legacy daily rollups until `task_usage_hourly` has caught up with raw `task_usage`. The guard fires whenever existing rows are present and the rollup watermark still sits at the epoch — i.e. nothing has rolled history into the hourly table yet.
**How to fix**:
1. Run the backfill against the same database (idempotent, safe to interrupt, safe to re-run):
```bash
# Docker Compose
docker compose -f docker-compose.selfhost.yml exec backend \
./backfill_task_usage_hourly --sleep-between-slices=2s
# Kubernetes
kubectl -n multica exec deploy/multica-backend -- \
./backfill_task_usage_hourly --sleep-between-slices=2s
```
2. Re-run the upgrade — restarting the backend container is enough, migrations run on startup. The guard now sees a current watermark and lets `103` apply.
3. Set up an ongoing rollup schedule (cron / `pg_cron`) so the watermark keeps advancing — see [Self-host quickstart → Schedule the usage rollup](/self-host-quickstart#7-schedule-the-usage-rollup-required-for-the-usage-dashboard).
`--sleep-between-slices=2s` is a polite default on production databases with years of history. Use `--months-back N --force-partial` if you only want to keep the last N months and are willing to permanently abandon older buckets.
## Port conflicts
**Symptom**: `multica server` or `multica daemon start` fails with `address already in use`.

View File

@@ -1,6 +1,6 @@
---
title: 故障排查
description: self-host Multica 遇到的 Top 7 常见问题——症状、原因、怎么查、怎么修。
description: self-host Multica 常见问题——症状、原因、怎么查、怎么修。
---
import { Callout } from "fumadocs-ui/components/callout";
@@ -132,6 +132,75 @@ docker exec <container> env | grep -E 'APP_ENV|MULTICA_DEV_VERIFICATION_CODE'
- 生产环境保持 `MULTICA_DEV_VERIFICATION_CODE` 为空,配好 Resend 后使用真实验证码
- 本地开发或内网测试可以从 server 日志抄生成的验证码;如果需要 `888888`,设置 `APP_ENV=development` 和 `MULTICA_DEV_VERIFICATION_CODE=888888`。不要在公网实例启用固定验证码(详见 [登录与注册配置 → 固定本地测试验证码](/auth-setup#固定本地测试验证码)
## Usage 看板一直是 0
**症状**agent 执行完任务、原始的 token 用量已经写入数据库,但 **Settings → Usage** 和 **Settings → Runtime** 上输入 / 输出 / 成本全部显示 0。**没有任何报错**——这是静默故障。
**可能原因**
1. **`rollup_task_usage_hourly()` 没人调度** —— Usage / Runtime 看板读的是派生表 `task_usage_hourly`,这张表必须靠 `rollup_task_usage_hourly()` 周期性填充。默认的 `pgvector/pgvector:pg17` 镜像不带 `pg_cron`,后端进程内部也不会跑 rollup。如果你是新装的自部署、没配过外部调度器默认就是这种状态。
2. **`pg_cron` 装了但指向了错的库** —— `pg_cron.database_name` 默认是 `postgres`;如果你的 Multica 数据库名不是 `postgres`,调度任务根本看不到 `rollup_task_usage_hourly()`。
3. **调度跑了,但 rollup 静默报错** —— 比如 cron entry 里 DB role / search_path 不对。
**怎么查**
```sql
-- 确认原始数据有、hourly 表是空的
SELECT count(*) AS raw_rows FROM task_usage;
SELECT count(*) AS hourly_rows FROM task_usage_hourly;
-- 看 pg_cron 装没装、有没有加载
SELECT * FROM pg_available_extensions WHERE name = 'pg_cron';
SHOW shared_preload_libraries;
-- 如果 pg_cron 装了,看调度和最近一次运行
SELECT jobname, schedule, database, active FROM cron.job;
SELECT jobname, status, return_message, start_time, end_time
FROM cron.job_run_details ORDER BY start_time DESC LIMIT 10;
-- watermark —— 如果还是 1970-01-01说明 rollup 从来没跑过
SELECT watermark_at FROM task_usage_hourly_rollup_state;
```
**怎么修**
- 手动跑一次确认函数本身没问题:`SELECT rollup_task_usage_hourly();` —— 刷新看板;如果数字出来了,缺的就只是调度器。
- 从 [Self-host 快速上手 → 调度用量汇总任务](/self-host-quickstart#7-调度用量汇总任务usage-dashboard-必需) 里挑一种调度方式:外部 cron / systemd-timer / Kubernetes CronJob或者换成带 `pg_cron` 的 Postgres 镜像。
- 如果调度配好之前数据库已经有一段历史,先在后端容器里跑 `backfill_task_usage_hourly` 把 watermark 之前的桶补出来。
## migration `103` 报 `refusing to drop legacy daily rollups`
**症状**:从 `v0.3.4` 升级到 `v0.3.5+` 时,后端容器起不来(或 `migrate up` 中止),错误信息:
```text
ERROR: refusing to drop legacy daily rollups:
task_usage_hourly_rollup_state.watermark_at (1970-01-01 ...) trails
task_usage latest event (...) by more than 01:00:00 — backfill is
incomplete or pg_cron is not running. Run cmd/backfill_task_usage_hourly
(and let pg_cron catch up) before re-running migrate
```
**可能原因**:这是 migration `103` 的 fail-closed guard。它要求 `task_usage_hourly` 已经追平了原始的 `task_usage` 之后,才允许丢掉旧的 daily rollup。只要数据库里有历史数据、且 rollup watermark 还停在 epoch说明还没把历史回填进 hourly 表),这条 guard 就会拦住。
**怎么修**
1. 对同一个数据库跑一次 backfill幂等可以打断可以重试
```bash
# Docker Compose
docker compose -f docker-compose.selfhost.yml exec backend \
./backfill_task_usage_hourly --sleep-between-slices=2s
# Kubernetes
kubectl -n multica exec deploy/multica-backend -- \
./backfill_task_usage_hourly --sleep-between-slices=2s
```
2. 重新跑升级 —— 重启 backend 容器即可,启动时会自动跑 migration。Guard 看到新的 watermark`103` 就会通过。
3. 同时配上持续的 rollup 调度,保证 watermark 持续推进 —— 见 [Self-host 快速上手 → 调度用量汇总任务](/self-host-quickstart#7-调度用量汇总任务usage-dashboard-必需)。
`--sleep-between-slices=2s` 在有多年历史的生产库上是个比较克制的默认值。如果你只想保留最近 N 个月、可以接受永久丢掉更老的桶,用 `--months-back N --force-partial`。
## 端口冲突
**症状**`multica server` 或 `multica daemon start` 启动失败,报 `address already in use`。