mirror of
https://github.com/multica-ai/multica.git
synced 2026-07-05 13:29:44 +02:00
* feat(analytics): add PostHog client with async batch shipping Introduces server/internal/analytics, the shipping layer for the product funnel defined in docs/analytics.md. Capture is non-blocking — events are enqueued into a bounded channel and a background worker batches them to PostHog's /batch/ endpoint. A broken backend drops events rather than blocking request handlers. Local dev and self-hosted instances run a noop client until the operator sets POSTHOG_API_KEY. This is PR 1 of MUL-1122; signup and workspace_created emission land in the follow-up commit so this change is independently reviewable. * feat(server): emit signup and workspace_created analytics events Wires analytics.Client through handler.New and main, then emits the first two funnel events: - signup fires from findOrCreateUser (which now reports isNew), covering both the verification-code and Google OAuth entry points — a single emission site guarantees Google signups aren't missed. - workspace_created fires after the CreateWorkspace transaction commits, with is_first_workspace computed from a post-commit ListWorkspaces count so we can distinguish fresh-user activation from returning-user expansion. Tests use analytics.NoopClient so nothing ships from test runs. PR 1 of MUL-1122; runtime_registered and issue_executed follow in later PRs per the plan. * refactor(analytics): drop is_first_workspace from workspace_created Stamping "is this the user's first workspace?" at emit time races under concurrent CreateWorkspace requests: two transactions committing close together can both read a post-commit count greater than one and both emit false. Fixing it at the SQL layer requires a schema change we don't want in PR 1. PostHog answers the same question exactly from the event stream (funnel on "first time user does X" / cohort on $initial_event), so removing the property loses no information and makes the emit side race-free. * docs(analytics): document self-host safety defaults Spell out why self-hosted instances never ship events upstream by default (empty POSTHOG_API_KEY → noop client) and explain how operators can point at their own PostHog project without any code change. * feat(analytics): emit runtime_registered, issue_executed, team_invite_* Three server-side funnel events, all gated on first-time state transitions so retries and re-runs don't inflate the WAW buckets: - runtime_registered fires from DaemonRegister when UpsertAgentRuntime reports (xmax = 0) — i.e. the row was inserted, not updated. Heartbeats and re-registrations stay silent. - issue_executed fires from CompleteTask after an atomic UPDATE issue SET first_executed_at = now() WHERE id = $1 AND first_executed_at IS NULL flips the column for the first time. Retries, re-assignments, and comment-triggered follow-up tasks hit the WHERE clause and no-op. Carries nth_issue_for_workspace so the ≥1/≥2/≥5/≥10 buckets filter without extra queries. - team_invite_sent fires from CreateInvitation and team_invite_accepted from AcceptInvitation, closing the expansion funnel. Adds a 050 migration for issue.first_executed_at plus a partial index so the workspace-scoped executed-count query doesn't scan the never-executed tail. * feat(config): surface PostHog key via /api/config Extends AppConfig with posthog_key / posthog_host sourced from env on every request (so operators can rotate the key via secret refresh without a restart). Reading the key off the server — rather than baking it into the frontend bundle via NEXT_PUBLIC_* — means self-hosted instances inherit the blank key automatically and never ship events upstream. * feat(analytics): wire posthog-js identify + UTM capture on the client Adds @multica/core/analytics — a thin wrapper around posthog-js that owns attribution capture and identity merge. Posthog-js config comes from /api/config (not NEXT_PUBLIC_*), so self-hosted instances whose server returns an empty key automatically run the SDK inert. captureSignupSource stamps a multica_signup_source cookie with UTM params and the referrer's origin (never the full referrer — that can leak OAuth code/state in the callback URL). The backend signup event reads this cookie on new-user creation. Identity flows: - auth-initializer fires identify() right after getMe() resolves, on both cookie and token paths. A getConfig/getMe race is handled by buffering a pending identify inside the analytics module and flushing it once initAnalytics finishes. - auth store calls identify() on verifyCode / loginWithGoogle / loginWithToken and resetAnalytics() on logout so the next login merges cleanly without bleeding events. * docs(analytics): describe runtime_registered, issue_executed, invite events Fills in the schema for the remaining funnel events. Captures the design commentary that belongs next to the contract rather than in a PR description — in particular why issue_executed uses the atomic first_executed_at flip instead of counting task-terminal events, and why runtime_registered relies on xmax = 0 rather than a query-then-write. * fix(analytics): drop non-atomic nth_issue_for_workspace from issue_executed Computing the workspace's Nth-issue ordinal at emit time is not atomic under concurrent first-completions — two transactions can both run MarkIssueFirstExecuted, then both run CountExecutedIssuesInWorkspace, and both observe count=1 before either has committed, so both events go out stamped as n=1. Serialising it would mean a per-workspace advisory lock or a SERIALIZABLE-isolated tx; PostHog answers the same question exactly at query time via row_number() partitioned by workspace_id, so the emit-time property adds risk without adding information. Removes the property from analytics.IssueExecuted, deletes the unused CountExecutedIssuesInWorkspace query, and regenerates sqlc. The partial index stays — any future workspace-scoped executed-issue query will want it. * fix(analytics): wire $pageview and harden signup_source cookie payload Two frontend fixes from the PR review: - PageviewTracker, mounted under WebProviders, fires capturePageview on every Next.js App Router path / query-string change. Without this the capturePageview helper in @multica/core/analytics was never called and the acquisition funnel's / → signup step was empty. - captureSignupSource now caps each UTM / referrer value at 96 chars *before* JSON.stringify, and drops the whole cookie when the serialised payload still exceeds 512 chars. Previously the overall slice(0, 256) could leave a half-JSON string on the wire that neither the backend nor PostHog could parse. Both capturePageview and identify now buffer a single pending call when fired before initAnalytics resolves — otherwise the initial "/" pageview and same-turn login identify race the /api/config fetch and get dropped. resetAnalytics clears both buffers so a logout→login cycle stays clean. * fix(analytics): URL-decode signup_source cookie on read Go does not URL-decode Cookie.Value automatically, so the frontend's JSON-then-encodeURIComponent payload was landing in PostHog as percent-encoded garbage (%7B%22utm_source...). Unescape on read so the backend receives the original JSON string the frontend intended, and drop values that fail to decode or exceed the server-side cap — sending truncated garbage is worse than sending nothing. Oversized-cookie guard matches the frontend's SIGNUP_SOURCE_MAX_LEN. * docs(analytics): reflect nth-issue drop, $pageview wiring, cookie encoding Pulls the schema doc back in line with the code: issue_executed no longer advertises nth_issue_for_workspace (with a note about why PostHog derives it at query time instead), the frontend $pageview section names the actual PageviewTracker component that fires it, and the signup_source section documents the per-value cap / overall drop rule and the encode-on-write / decode-on-read contract. --------- Co-authored-by: Jiang Bohan <bhjiang@outlook.com>
138 lines
5.0 KiB
SQL
138 lines
5.0 KiB
SQL
-- name: ListAgentRuntimes :many
|
|
SELECT * FROM agent_runtime
|
|
WHERE workspace_id = $1
|
|
ORDER BY created_at ASC;
|
|
|
|
-- name: GetAgentRuntime :one
|
|
SELECT * FROM agent_runtime
|
|
WHERE id = $1;
|
|
|
|
-- name: GetAgentRuntimeForWorkspace :one
|
|
SELECT * FROM agent_runtime
|
|
WHERE id = $1 AND workspace_id = $2;
|
|
|
|
-- name: UpsertAgentRuntime :one
|
|
-- (xmax = 0) AS inserted distinguishes a fresh insert (true) from an upsert
|
|
-- that updated an existing row (false). Analytics reads this to fire the
|
|
-- runtime_registered event only on first-time registration.
|
|
INSERT INTO agent_runtime (
|
|
workspace_id,
|
|
daemon_id,
|
|
name,
|
|
runtime_mode,
|
|
provider,
|
|
status,
|
|
device_info,
|
|
metadata,
|
|
owner_id,
|
|
last_seen_at
|
|
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, now())
|
|
ON CONFLICT (workspace_id, daemon_id, provider)
|
|
DO UPDATE SET
|
|
name = EXCLUDED.name,
|
|
runtime_mode = EXCLUDED.runtime_mode,
|
|
status = EXCLUDED.status,
|
|
device_info = EXCLUDED.device_info,
|
|
metadata = EXCLUDED.metadata,
|
|
owner_id = COALESCE(EXCLUDED.owner_id, agent_runtime.owner_id),
|
|
last_seen_at = now(),
|
|
updated_at = now()
|
|
RETURNING *, (xmax = 0) AS inserted;
|
|
|
|
-- name: UpdateAgentRuntimeHeartbeat :one
|
|
UPDATE agent_runtime
|
|
SET status = 'online', last_seen_at = now(), updated_at = now()
|
|
WHERE id = $1
|
|
RETURNING *;
|
|
|
|
-- name: SetAgentRuntimeOffline :exec
|
|
UPDATE agent_runtime
|
|
SET status = 'offline', updated_at = now()
|
|
WHERE id = $1;
|
|
|
|
-- name: MarkStaleRuntimesOffline :many
|
|
UPDATE agent_runtime
|
|
SET status = 'offline', updated_at = now()
|
|
WHERE status = 'online'
|
|
AND last_seen_at < now() - make_interval(secs => @stale_seconds::double precision)
|
|
RETURNING id, workspace_id;
|
|
|
|
-- name: FailTasksForOfflineRuntimes :many
|
|
-- Marks dispatched/running tasks as failed when their runtime is offline.
|
|
-- This cleans up orphaned tasks after a daemon crash or network partition.
|
|
UPDATE agent_task_queue
|
|
SET status = 'failed', completed_at = now(), error = 'runtime went offline'
|
|
WHERE status IN ('dispatched', 'running')
|
|
AND runtime_id IN (
|
|
SELECT id FROM agent_runtime WHERE status = 'offline'
|
|
)
|
|
RETURNING id, agent_id, issue_id;
|
|
|
|
-- name: ListAgentRuntimesByOwner :many
|
|
SELECT * FROM agent_runtime
|
|
WHERE workspace_id = $1 AND owner_id = $2
|
|
ORDER BY created_at ASC;
|
|
|
|
-- name: DeleteAgentRuntime :exec
|
|
DELETE FROM agent_runtime WHERE id = $1;
|
|
|
|
-- name: CountActiveAgentsByRuntime :one
|
|
SELECT count(*) FROM agent WHERE runtime_id = $1 AND archived_at IS NULL;
|
|
|
|
-- name: DeleteArchivedAgentsByRuntime :exec
|
|
DELETE FROM agent WHERE runtime_id = $1 AND archived_at IS NOT NULL;
|
|
|
|
-- name: FindLegacyRuntimesByDaemonID :many
|
|
-- Looks up runtime rows keyed on a prior (hostname-derived) daemon_id. Used
|
|
-- at register-time to find rows owned by the same machine under its old
|
|
-- identity so agents/tasks can be re-pointed at the new UUID-keyed row.
|
|
--
|
|
-- Comparison is case-insensitive because os.Hostname() has been observed to
|
|
-- return different casings on the same machine (e.g. `Jiayuans-MacBook-Pro`
|
|
-- vs `jiayuans-macbook-pro`) across reboots/mDNS state changes. A case-
|
|
-- sensitive `=` would strand the old row; LOWER() on both sides handles drift
|
|
-- without forcing the daemon to enumerate cased permutations.
|
|
--
|
|
-- Returns many rather than one because case drift may have already minted
|
|
-- duplicate rows historically (e.g. `Foo.local` AND `foo.local` under the
|
|
-- same workspace+provider). A single-row lookup would consolidate only one
|
|
-- of them and leave the rest orphaned. Callers must merge every returned
|
|
-- row into the new UUID-keyed runtime.
|
|
SELECT * FROM agent_runtime
|
|
WHERE workspace_id = @workspace_id
|
|
AND provider = @provider
|
|
AND LOWER(daemon_id) = LOWER(@daemon_id);
|
|
|
|
-- name: ReassignAgentsToRuntime :execrows
|
|
-- Re-points every agent referencing old_runtime_id at new_runtime_id.
|
|
UPDATE agent
|
|
SET runtime_id = @new_runtime_id
|
|
WHERE runtime_id = @old_runtime_id;
|
|
|
|
-- name: ReassignTasksToRuntime :execrows
|
|
-- Re-points every queued/running/completed task referencing old_runtime_id.
|
|
-- Required before deleting the old runtime row because agent_task_queue has
|
|
-- an ON DELETE CASCADE FK that would otherwise drop historical tasks.
|
|
UPDATE agent_task_queue
|
|
SET runtime_id = @new_runtime_id
|
|
WHERE runtime_id = @old_runtime_id;
|
|
|
|
-- name: RecordRuntimeLegacyDaemonID :exec
|
|
-- Remembers the most recent hostname-derived daemon_id that was merged into
|
|
-- this row. Useful for debugging when tracing back why a given runtime row
|
|
-- subsumed an old one, and only overwrites NULL so the earliest merge is
|
|
-- preserved.
|
|
UPDATE agent_runtime
|
|
SET legacy_daemon_id = COALESCE(legacy_daemon_id, $2)
|
|
WHERE id = $1;
|
|
|
|
-- name: DeleteStaleOfflineRuntimes :many
|
|
-- Deletes runtimes that have been offline for longer than the TTL and have
|
|
-- no agents bound (active or archived). The FK constraint on agent.runtime_id
|
|
-- is ON DELETE RESTRICT, so we must exclude all agent references.
|
|
DELETE FROM agent_runtime
|
|
WHERE status = 'offline'
|
|
AND last_seen_at < now() - make_interval(secs => @stale_seconds::double precision)
|
|
AND id NOT IN (SELECT DISTINCT runtime_id FROM agent)
|
|
RETURNING id, workspace_id;
|