mirror of
https://github.com/multica-ai/multica.git
synced 2026-06-16 19:29:26 +02:00
docs(self-host): default rollup to in-process scheduler, demote pg_cron/cron/systemd/CronJob to compat paths (#3808)
Rewrite the 'Usage Dashboard Rollup (Required)' section in SELF_HOSTING.md and apps/docs/content/docs/getting-started/self-hosting.zh.mdx so that: - The DB-backed in-process scheduler (sys_cron_executions, MUL-2957) is documented as the default for fresh self-host installs. The bundled pgvector/pgvector:pg17 image works as-is and no operator step is required. - pg_cron, external cron, systemd timer, and Kubernetes CronJob are demoted to a 'Compatibility paths (existing deployments only)' subsection. They remain supported (advisory lock 4246 prevents double-writes) but are no longer the recommended setup. - A retirement sequence is added for production environments that already have a pg_cron job: confirm in-process SUCCESS rows in sys_cron_executions, then cron.unschedule the redundant entry; leave the pg_cron extension installed unless other workloads stop depending on it. - The two upgrade callouts that pointed to the removed 'Usage Dashboard Rollup -> Option C' anchor are repointed to SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup, which already documents the auto-hook backfill and the recovery flow. Refs MUL-3077. Co-authored-by: multica-agent <github@multica.ai>
This commit is contained in:
@@ -326,7 +326,7 @@ To roll back if an upgrade goes sideways:
|
||||
helm -n multica rollback multica
|
||||
```
|
||||
|
||||
> **Upgrading from `v0.3.4` to `v0.3.5+` fails with `refusing to drop legacy daily rollups: ...`?** Same migration guard as the Docker path — see [Usage Dashboard Rollup → Option C](#option-c--backfill-history-first-then-schedule). Run the backfill against the same database the chart is using (`kubectl -n multica exec deploy/multica-backend -- ./backfill_task_usage_hourly --sleep-between-slices=2s`), then restart the backend deployment to re-apply migrations.
|
||||
> **Upgrading from `v0.3.4` to `v0.3.5+` fails with `refusing to drop legacy daily rollups: ...`?** As of MUL-2957 the `migrate up` command runs an idempotent monthly-slice backfill automatically before applying migration `103`, so a clean upgrade is a single `helm upgrade` + backend rollout. If you are still on a pre-MUL-2957 binary or the auto-hook fails, run the standalone backfill against the same database the chart is using (`kubectl -n multica exec deploy/multica-backend -- ./backfill_task_usage_hourly --sleep-between-slices=2s`), then restart the backend deployment to re-apply migrations. See [Advanced Configuration → Usage Dashboard Rollup](SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup) for the full recovery flow.
|
||||
|
||||
### Tearing down
|
||||
|
||||
@@ -340,56 +340,52 @@ kubectl delete namespace multica
|
||||
|
||||
---
|
||||
|
||||
## Usage Dashboard Rollup (Required)
|
||||
## Usage Dashboard Rollup
|
||||
|
||||
Starting with `v0.3.5`, the Usage / Runtime dashboards read from a derived `task_usage_hourly` table rather than directly from `task_usage`. Raw `task_usage` rows are written by the backend on every task, but the dashboard only sees data after `rollup_task_usage_hourly()` runs and aggregates them into `task_usage_hourly`.
|
||||
The Usage / Runtime dashboards read from a derived `task_usage_hourly` table populated by `rollup_task_usage_hourly()`. As of MUL-2957 the backend runs this rollup **in-process** on every replica via a DB-backed scheduler (`sys_cron_executions`); a fresh self-host install needs no operator action and the bundled `pgvector/pgvector:pg17` image works without changes — you do **not** need to swap it for an image that ships `pg_cron`, register an external cron job, set up a systemd timer, or run a Kubernetes `CronJob`.
|
||||
|
||||
**The bundled `pgvector/pgvector:pg17` image does NOT include `pg_cron`.** If nothing schedules the rollup, the dashboard will stay at zero forever even though `task_usage` is populated. You have three supported options — pick one before relying on the dashboard.
|
||||
|
||||
> **Upgrading from `v0.3.4` to `v0.3.5+`** with existing `task_usage` history: migration `103` is fail-closed and will abort `migrate up` with `refusing to drop legacy daily rollups: …`. Run `backfill_task_usage_hourly` first (Option C below), then re-run the upgrade. **Fresh installs** are exempted by that guard and migrate cleanly — but the dashboard will still stay at zero until you pick Option A or Option B.
|
||||
|
||||
### Option A — External cron / systemd-timer (simplest)
|
||||
|
||||
Schedule a 5-minute job that calls `rollup_task_usage_hourly()`. It is idempotent and watermark-driven, so a missed tick catches up on the next run.
|
||||
|
||||
```bash
|
||||
# /etc/cron.d/multica-rollup — every 5 minutes
|
||||
*/5 * * * * root docker compose -f /path/to/multica/docker-compose.selfhost.yml \
|
||||
exec -T postgres psql -U multica -d multica \
|
||||
-c "SELECT rollup_task_usage_hourly();" >/dev/null
|
||||
```
|
||||
|
||||
Or as a systemd timer + service if you prefer that surface. The function returns the number of (upserted + deleted-empty) rows; it's safe to call concurrently with itself (an advisory lock makes overlapping runs no-op) and safe to call alongside `backfill_task_usage_hourly`.
|
||||
|
||||
### Option B — Swap Postgres for an image that ships `pg_cron`
|
||||
|
||||
If you'd rather have Postgres schedule itself, replace `pgvector/pgvector:pg17` in `docker-compose.selfhost.yml` with an image that bundles both `pgvector` and `pg_cron` (e.g. `supabase/postgres`, or your own build of `pgvector/pgvector` with `pg_cron` added and `shared_preload_libraries=pg_cron` set on the server). Then, once:
|
||||
Multiple backend replicas are safe: each replica ticks every 30 seconds and tries to claim the current 5-minute UTC plan, but the unique key `(job_name, scope_kind, scope_id, plan_time)` means only one wins each plan. Inspect steady-state operation:
|
||||
|
||||
```sql
|
||||
CREATE EXTENSION IF NOT EXISTS pg_cron;
|
||||
SELECT cron.schedule(
|
||||
'rollup_task_usage_hourly',
|
||||
'*/5 * * * *',
|
||||
$$SELECT rollup_task_usage_hourly()$$
|
||||
);
|
||||
SELECT plan_time, status, attempt, runner_id,
|
||||
error_code, error_msg, started_at, finished_at
|
||||
FROM sys_cron_executions
|
||||
WHERE job_name = 'rollup_task_usage_hourly'
|
||||
ORDER BY plan_time DESC
|
||||
LIMIT 20;
|
||||
```
|
||||
|
||||
`shared_preload_libraries` requires a Postgres restart to take effect — set it in `postgresql.conf` (or via the image's documented mechanism) before bringing the container up.
|
||||
Full reference (audit table semantics, advisory lock 4246, the standalone backfill command, flag descriptions, the `v0.3.4 → v0.3.5+` migration auto-hook) lives in [Advanced Configuration → Usage Dashboard Rollup](SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup).
|
||||
|
||||
### Option C — Backfill history first, then schedule
|
||||
> **Upgrading from `v0.3.4` to `v0.3.5+`?** As of MUL-2957 the `migrate up` command runs an idempotent monthly-slice backfill automatically right before applying migration `103`, so the upgrade completes in a single invocation — no operator step required. If you are still on a pre-MUL-2957 binary or the auto-hook fails for an environmental reason, run `backfill_task_usage_hourly` against the same database and re-run the upgrade. See [Advanced Configuration → Usage Dashboard Rollup](SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup) for the recovery flow.
|
||||
|
||||
If you're upgrading from `v0.3.4 → v0.3.5+` and already have `task_usage` rows (or you just want the dashboard to show historical data on a fresh install that you've been running for a while), run the bundled backfill command once before scheduling the rollup:
|
||||
### Compatibility paths (existing deployments only)
|
||||
|
||||
```bash
|
||||
# Backfills task_usage_hourly from all historical task_usage rows and stamps
|
||||
# the rollup watermark. Idempotent — safe to re-run.
|
||||
docker compose -f docker-compose.selfhost.yml exec backend \
|
||||
./backfill_task_usage_hourly --sleep-between-slices=2s
|
||||
```
|
||||
External schedulers — **`pg_cron` registered on the database, an external cron job, a systemd timer, or a Kubernetes `CronJob`** — that call `SELECT rollup_task_usage_hourly()` directly were the only option before MUL-2957 and remain a supported compatibility path. They are no longer the recommended setup; new deployments should rely on the in-process scheduler instead. The SQL function holds advisory lock 4246 internally, so the in-process scheduler and any pre-existing external schedule can coexist without ever double-writing the rollup.
|
||||
|
||||
On a database with years of data this can scan tens of millions of rows; `--sleep-between-slices=2s` throttles the read pressure. Use `--months-back N` (plus `--force-partial`) if you only want the last N months. Once it finishes, set up Option A or Option B so new buckets keep flowing.
|
||||
If you already have a `pg_cron` job in production, the safe sequence to retire it is:
|
||||
|
||||
After upgrading, re-run `migrate up` (or restart the backend container — migrations run automatically on startup) to apply migration `103` cleanly.
|
||||
1. Confirm the in-process scheduler is healthy on at least one backend replica — recent SUCCESS rows should be landing in `sys_cron_executions` for `rollup_task_usage_hourly`:
|
||||
|
||||
```sql
|
||||
SELECT plan_time, status, runner_id, finished_at
|
||||
FROM sys_cron_executions
|
||||
WHERE job_name = 'rollup_task_usage_hourly'
|
||||
AND status = 'SUCCESS'
|
||||
ORDER BY plan_time DESC
|
||||
LIMIT 5;
|
||||
```
|
||||
|
||||
2. Once SUCCESS rows are arriving on schedule, unschedule the redundant `pg_cron` entry:
|
||||
|
||||
```sql
|
||||
SELECT cron.unschedule('rollup_task_usage_hourly')
|
||||
FROM cron.job WHERE jobname = 'rollup_task_usage_hourly';
|
||||
```
|
||||
|
||||
3. Leave the `pg_cron` extension itself installed unless you are sure no other workload depends on it. The bundled `pgvector/pgvector:pg17` image does **not** ship `pg_cron`, so nothing in Multica's default install needs it; uninstalling `pg_cron` from a custom image that other workloads still use is a separate decision.
|
||||
|
||||
External cron / systemd timer / Kubernetes `CronJob` setups that call `SELECT rollup_task_usage_hourly()` directly can be retired the same way — once `sys_cron_executions` shows steady SUCCESS rows from the in-process scheduler, the external job is redundant and can be removed.
|
||||
|
||||
## Stopping Services
|
||||
|
||||
@@ -431,7 +427,7 @@ docker compose -f docker-compose.selfhost.yml up -d
|
||||
Pin `MULTICA_IMAGE_TAG` in `.env` to an exact version like `v0.2.4` if you want to stay on a specific release. Migrations run automatically on backend startup.
|
||||
If the selected GHCR tag has not been published yet, fall back to `make selfhost-build` or `docker compose -f docker-compose.selfhost.yml -f docker-compose.selfhost.build.yml up -d --build`.
|
||||
|
||||
> **Upgrading from `v0.3.4` to `v0.3.5+` fails with `refusing to drop legacy daily rollups: ...`?** That's migration `103`'s fail-closed guard: it requires `task_usage_hourly` to be seeded before the legacy daily rollups are dropped. Run `backfill_task_usage_hourly` first, then re-run the upgrade. Full instructions in [Usage Dashboard Rollup → Option C](#option-c--backfill-history-first-then-schedule).
|
||||
> **Upgrading from `v0.3.4` to `v0.3.5+` fails with `refusing to drop legacy daily rollups: ...`?** That's migration `103`'s fail-closed guard: it requires `task_usage_hourly` to be seeded before the legacy daily rollups are dropped. As of MUL-2957 `migrate up` runs that backfill automatically right before applying `103`, so the upgrade completes in a single invocation. If you are still on a pre-MUL-2957 binary or the auto-hook fails, run `backfill_task_usage_hourly` manually first, then re-run the upgrade. Full instructions in [Advanced Configuration → Usage Dashboard Rollup](SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup).
|
||||
|
||||
---
|
||||
|
||||
|
||||
@@ -133,17 +133,54 @@ Alternatively, configure step by step: `multica config set server_url http://loc
|
||||
3. Go to **Settings → Agents** and create a new agent
|
||||
4. Create an issue and assign it to your agent
|
||||
|
||||
## Usage Dashboard Rollup (Required)
|
||||
## Usage Dashboard Rollup
|
||||
|
||||
Starting with `v0.3.5`, the Usage / Runtime dashboards read from a derived `task_usage_hourly` table populated by `rollup_task_usage_hourly()`. The bundled `pgvector/pgvector:pg17` image does **not** include `pg_cron`, and the backend doesn't run the rollup in-process either — until you schedule it yourself, the dashboard will stay at zero even though `task_usage` is populated.
|
||||
The Usage / Runtime dashboards read from a derived `task_usage_hourly` table populated by `rollup_task_usage_hourly()`. As of MUL-2957 the backend runs this rollup **in-process** on every replica via a DB-backed scheduler (`sys_cron_executions`). A fresh self-host install needs no operator action — the bundled `pgvector/pgvector:pg17` image works as-is, and you do **not** need to swap it for an image that ships `pg_cron`, register an external cron job, run a systemd timer, or schedule a Kubernetes `CronJob`.
|
||||
|
||||
Pick one supported path before relying on the Usage / Runtime dashboard:
|
||||
Multiple backend replicas are safe: every replica ticks every 30 seconds and tries to claim the current 5-minute UTC plan, but the unique key `(job_name, scope_kind, scope_id, plan_time)` means only one wins each plan. Inspect the audit table to confirm steady-state operation:
|
||||
|
||||
- **External cron / systemd-timer / Kubernetes `CronJob`**: schedule `SELECT rollup_task_usage_hourly()` every 5 minutes. Idempotent, watermark-driven — overlapping or skipped ticks are safe.
|
||||
- **Postgres with `pg_cron`**: swap the bundled Postgres image for one that ships `pg_cron`, set `shared_preload_libraries=pg_cron`, then `SELECT cron.schedule('rollup_task_usage_hourly', '*/5 * * * *', 'SELECT rollup_task_usage_hourly()')` once.
|
||||
- **Backfill historical data**: required on the `v0.3.4 → v0.3.5+` upgrade path when the database already has `task_usage` rows — migration `103` is fail-closed and will abort `migrate up` with `refusing to drop legacy daily rollups: ...` until the hourly table is seeded. Run `./backfill_task_usage_hourly --sleep-between-slices=2s` inside the backend container, then re-run the upgrade and configure one of the schedules above.
|
||||
```sql
|
||||
SELECT plan_time, status, attempt, runner_id,
|
||||
error_code, error_msg, started_at, finished_at
|
||||
FROM sys_cron_executions
|
||||
WHERE job_name = 'rollup_task_usage_hourly'
|
||||
ORDER BY plan_time DESC
|
||||
LIMIT 20;
|
||||
```
|
||||
|
||||
Full reference (Compose + Kubernetes templates, flag descriptions, upgrade order) lives in [`SELF_HOSTING_ADVANCED.md → Usage Dashboard Rollup`](https://github.com/multica-ai/multica/blob/main/SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup).
|
||||
<Callout>
|
||||
**Upgrading from `v0.3.4` to `v0.3.5+`?** As of MUL-2957 the `migrate up` command runs an idempotent monthly-slice backfill automatically right before applying migration `103`, so the upgrade completes in a single invocation — no operator step required. If you are on a pre-MUL-2957 binary or the auto-hook fails for an environmental reason, run `backfill_task_usage_hourly` against the same database and re-run the upgrade. Full recovery flow lives in [`SELF_HOSTING_ADVANCED.md → Usage Dashboard Rollup`](https://github.com/multica-ai/multica/blob/main/SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup).
|
||||
</Callout>
|
||||
|
||||
### Compatibility paths (existing deployments only)
|
||||
|
||||
External schedulers — **`pg_cron` registered on the database, an external cron job, a systemd timer, or a Kubernetes `CronJob`** — that call `SELECT rollup_task_usage_hourly()` directly were the only option before MUL-2957 and remain a supported compatibility path. They are no longer the recommended setup; new deployments should rely on the in-process scheduler. The SQL function holds advisory lock 4246 internally, so the in-process scheduler and any pre-existing external schedule can coexist without ever double-writing the rollup.
|
||||
|
||||
If you already have a `pg_cron` job in production and want to retire it, the safe sequence is:
|
||||
|
||||
1. Confirm the in-process scheduler is healthy on at least one backend replica — recent SUCCESS rows should be landing in `sys_cron_executions` for `rollup_task_usage_hourly`:
|
||||
|
||||
```sql
|
||||
SELECT plan_time, status, runner_id, finished_at
|
||||
FROM sys_cron_executions
|
||||
WHERE job_name = 'rollup_task_usage_hourly'
|
||||
AND status = 'SUCCESS'
|
||||
ORDER BY plan_time DESC
|
||||
LIMIT 5;
|
||||
```
|
||||
|
||||
2. Once SUCCESS rows are arriving on schedule, unschedule the redundant `pg_cron` entry:
|
||||
|
||||
```sql
|
||||
SELECT cron.unschedule('rollup_task_usage_hourly')
|
||||
FROM cron.job WHERE jobname = 'rollup_task_usage_hourly';
|
||||
```
|
||||
|
||||
3. Leave the `pg_cron` extension itself installed unless you are sure no other workload depends on it. The bundled `pgvector/pgvector:pg17` image does **not** ship `pg_cron`, so nothing in Multica's default install needs it; uninstalling `pg_cron` from a custom image that other workloads still use is a separate decision.
|
||||
|
||||
External cron / systemd timer / Kubernetes `CronJob` setups that call `SELECT rollup_task_usage_hourly()` directly can be retired the same way — once `sys_cron_executions` shows steady SUCCESS rows from the in-process scheduler, the external job is redundant and can be removed.
|
||||
|
||||
Full reference (audit table semantics, advisory lock 4246, the standalone backfill command, flag descriptions, the migration auto-hook) lives in [`SELF_HOSTING_ADVANCED.md → Usage Dashboard Rollup`](https://github.com/multica-ai/multica/blob/main/SELF_HOSTING_ADVANCED.md#usage-dashboard-rollup).
|
||||
|
||||
## Stopping Services
|
||||
|
||||
|
||||
Reference in New Issue
Block a user