6 Commits

Author SHA1 Message Date
LinYushen
b89b9cb4d6 test(migrate): concurrent migration race test using real Postgres (MUL-2956) (#3712)
* test(migrate): add concurrent migration race test using real Postgres (MUL-2956)

Follow-up to MUL-2923 / #3658, which added a Postgres advisory lock to
serialize the migration loop across concurrent runners (multi-replica
backend startup, scale-up, manual `migrate up` overlap). That PR shipped
without a test because cmd/migrate/ had no harness; this commit adds it.

Refactor: extract runMigrations(ctx, pool, runOptions) from main(), with
the lock key, the bookkeeping table, and the file list now injectable.
main() behavior is unchanged. Identifier interpolation goes through
pgx.Identifier{}.Sanitize so callers can pass "schema.schema_migrations"
safely.

Tests (cmd/migrate/migrate_concurrent_test.go): every case isolates
itself in a unique throwaway schema and a unique lock key, so they
never touch the real schema_migrations table or block real production
runners that share the database. The tests skip cleanly when the
database at DATABASE_URL is unreachable, matching the pattern already
used in internal/handler/handler_test.go and
internal/metrics/business_sampler_pgsleep_test.go.

  - TestRunMigrationsConcurrentPending: 16 goroutines apply 5
    deliberately non-idempotent migrations (bare CREATE TABLE +
    ALTER TABLE ADD COLUMN). Without the lock, concurrent CREATE TABLE
    races trip "duplicate key value violates unique constraint
    pg_type_typname_nsp_index", so a clean pass shows the lock is
    doing its job.
  - TestRunMigrationsConcurrentAlreadyApplied: 16 goroutines hit the
    EXISTS no-op path against a pre-populated bookkeeping table; the
    state must be unchanged.
  - TestRunMigrationsAdvisoryLockSerializes: an external connection
    holds the same advisory lock; we assert that zero of the 16
    runners complete during a 1 s observation window, then release
    the side lock and let them all finish. Catches the original
    MUL-2923 bug where the lock got attached to a random pooled
    connection.
  - TestRunMigrationsConcurrentMixedPoolStress: same pending case but
    with a deliberately small pool (runners/2), forcing pgxpool.Acquire
    contention to overlap with pg_advisory_lock contention.

Verified locally: `go test -race -count=10 ./cmd/migrate/` passes in
~15 s. Mutation test (lock acquire/release replaced with `SELECT 1`)
confirms the pending and lock-serializes tests both fail loudly,
catching the regression they were written to detect.

go.mod tidy promotes golang.org/x/sync to a direct dependency
(now imported by the test for errgroup) and incidentally fixes a
stale `// indirect` annotation on prometheus/client_model, which is
already imported directly by internal/metrics/testutil.go.

Co-authored-by: multica-agent <github@multica.ai>

* test(migrate): gofmt + address review nits (MUL-2956)

- gofmt -w cmd/migrate/migrate_concurrent_test.go: fixture struct field
  alignment.
- quoteQualifiedIdentifier: actually reject identifiers with more than
  one dot (the previous version split on the first dot only and would
  silently sanitize "a.b.c" into "a"."b.c", contradicting the comment).
  Inline the splitter via strings.Split now that we explicitly check the
  component count.
- Soften the test's lock-key comment from "never collide" to the
  accurate probabilistic statement (~1 in 2^62 collision odds with the
  production constant).
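
A minimal sketch of the stricter splitting described in the
quoteQualifiedIdentifier item above. The body is an assumption, not the
committed code: quotePart is a hand-written stand-in for
pgx.Identifier{}.Sanitize so the example has no external dependency,
and rejecting empty components is an extra assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteQualifiedIdentifier quotes "table" or "schema.table" for
// interpolation into SQL and rejects anything with more than one dot.
// Hypothetical body; the real helper delegates quoting to pgx.
func quoteQualifiedIdentifier(name string) (string, error) {
	parts := strings.Split(name, ".")
	if len(parts) > 2 {
		return "", fmt.Errorf("identifier %q has %d components, want at most 2", name, len(parts))
	}
	quoted := make([]string, len(parts))
	for i, p := range parts {
		if p == "" {
			return "", fmt.Errorf("identifier %q has an empty component", name)
		}
		quoted[i] = quotePart(p)
	}
	return strings.Join(quoted, "."), nil
}

// quotePart wraps one component in double quotes and doubles any embedded
// double quote (stand-in for pgx.Identifier{}.Sanitize).
func quotePart(s string) string {
	return `"` + strings.ReplaceAll(s, `"`, `""`) + `"`
}

func main() {
	for _, name := range []string{"schema_migrations", "tmp.schema_migrations", "a.b.c"} {
		q, err := quoteQualifiedIdentifier(name)
		fmt.Printf("%s -> %s (err: %v)\n", name, q, err)
	}
}
```

Checking the component count before quoting is what keeps "a.b.c" from
being silently folded into two parts.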

go test -race -count=10 ./cmd/migrate/ still passes (~15 s).

Co-authored-by: multica-agent <github@multica.ai>

* test(migrate): direction whitelist + tidy go.mod (MUL-2956)

Address two follow-ups from review:

- runMigrations now whitelist-checks opts.Direction up front and
  returns an error for anything that is not "up" or "down". The
  previous shape relied on `opts.Direction == "up"` and an else branch,
  so a typo or empty string would silently fall through to the
  rollback path. Add TestRunMigrationsRejectsInvalidDirection covering
  the empty string, "UP"/"DOWN" case mismatches, "rollback", and a
  whitespace-padded value; the check fires before any pool work, so
  the test runs without Postgres.
- go mod tidy: promotes google.golang.org/protobuf to a direct
  dependency (it is imported directly elsewhere in the module and was
  stale-marked indirect).

go test -race -count=10 ./cmd/migrate/ green (~15.7 s, 50/50).

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: wei-heshang <wei-heshang@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-08 13:33:16 +08:00
LinYushen
3caba86b09 feat(scheduler): DB-backed execution-record scheduler [MUL-2957] 2026-06-05 13:46:26 +08:00
LinYushen
24ea169d89 fix(migrate): serialize startup migrations with pg advisory lock (#3658)
cmd/migrate previously ran a check-then-apply loop on a *pgxpool.Pool
with no locking, so two backend pods starting at the same time (multi-
replica Deployment, scale-up, or a manual run overlapping with pod
startup) could both pass the EXISTS check on a pending migration and
race on the DDL or the schema_migrations INSERT, crashing the loser.

Take a single connection from the pool, hold a session-level
pg_advisory_lock for the entire migration loop, and release it on the
way out. We use the blocking variant so a late arriver queues behind
the current runner and then no-ops on the EXISTS checks instead of
crash-looping. The loop deliberately stays outside a transaction so
existing CREATE INDEX CONCURRENTLY migrations keep working.

Also refresh the values.yaml / backend.yaml comments next to
backend.replicas: the chart still ships replicas: 1 by default, but
that is now a recommendation (Recreate strategy, no leader split), not
a correctness requirement.

Refs https://github.com/multica-ai/multica/issues/3647

Co-authored-by: multica-agent <github@multica.ai>
2026-06-03 15:51:03 +08:00
devv-eve
9ed1fa95fc feat(server): add readiness health endpoints (#1605)
* feat(server): add readiness health endpoints

* fix(server): cache readiness checks

* fix(server): raise readiness cache ttl

---------

Co-authored-by: Eve <eve@multica.ai>
2026-04-24 13:50:24 +08:00
Naiyuan Qing
8983a9fefa feat(logging): add structured logging across server and SDK
Replace raw fmt/log calls with structured slog logger (Go) and
console-based logger (TypeScript). Add request logging middleware.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 10:57:11 +08:00
Jiayuan Zhang
b5b0605e9a chore: add one-click setup/start/stop scripts, migration CLI, and seed tool
- Add idempotent seed tool with duplicate detection for agents/issues/comments
- Add migration CLI supporting up/down with schema_migrations tracking
- Add Makefile targets: make setup (first-time), make start, make stop
- Update .gitignore for test artifacts and compiled binaries

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 11:50:33 +08:00