mirror of
https://github.com/multica-ai/multica.git
synced 2026-06-17 03:38:32 +02:00
* feat(server): funnel/community/commercial business metrics + PostHog pairing (MUL-2949)

PR3 of the Grafana board metrics split (parent MUL-2328). Adds 23 new Prometheus counter/histogram families to the PR2 BusinessMetrics collector covering the activation/community/commercial funnels, and binds every PostHog event emission to a matching metric increment so the two sides cannot drift.

Funnel: signup, workspace_created, team_invite_sent/accepted, onboarding_*, cloud_waitlist_joined.
Content: issue_created, chat_message_sent, agent_created, squad_created, autopilot_created, issue_executed.
Runtime: runtime_registered/ready/failed/offline + ready_seconds histogram, daemon_ws_message_received_total.
Autopilot: autopilot_run_started/terminal/skipped.
Webhook/GitHub: webhook_delivery_total, github_event_received_total, github_pr_review_total, github_pr_merge_seconds histogram.
CloudRuntime: cloudruntime_request_total + duration histogram, wired through a small RequestRecorder interface so the cloudruntime package stays decoupled from metrics.
Commercial: feedback_submitted, contact_sales_submitted.

The pairing helper metrics.RecordEvent(client, m, ev) emits the PostHog event AND increments the matching counter via IncForEvent dispatch, reading labels from the analytics event Properties. Every existing h.Analytics.Capture(analytics.X(...)) call site has been migrated to the helper across handler/, service/, and cmd/server/runtime_sweeper.go.

Lint enforcement (server/internal/metrics/business_pairing_test.go):
- TestEveryAnalyticsEventHasPrometheusCounter: every Event* constant in analytics/events.go either dispatches via IncForEvent or is in the taskMetricEvents allow-list (PR2 typed RecordTask* methods).
- TestNoNakedAnalyticsCaptureInHandlersOrServices: AST-walks handler/ service/cmd-server for direct Analytics.Capture(...) calls; only service/task.go's captureTaskEvent helper is allow-listed.
- TestEveryAnalyticsRecordEventTakesAnalyticsHelper: validates the third arg of every metrics.RecordEvent call is built from analytics.*.

Cardinality protection: all new label values pass through fixed allow-lists in labels_pr3.go; unknown values collapse to 'other'/'unknown'/'error'.

Refs:
- Spec MUL-2328 / MUL-2949.
- Builds on PR2 (MUL-2948): collectors registered through the same BusinessMetrics struct, no separate Registry.
- Uses PR1's taskfailure.Reason (MUL-2946) for runtime_failed's failure_reason label via NormalizeFailureReason.

Out of scope: Sampler-class metrics (PR4 / MUL-2947), pr_review_total emission point (no review event handler exists yet; the counter is defined, TODO to wire up when /api/webhooks/github grows pull_request_review handling).

Co-authored-by: multica-agent <github@multica.ai>

* fix(server): tighten PR3 review items: signup_source bucket, fill platform/kind/form_source enums, onboarding_started server emission, lint scope (MUL-2949)

Addresses 张大彪's review on #3698:

1. signup_source: NormalizeSignupSource added to labels_pr3.go with a fixed allow-list bucket (direct/google/twitter/linkedin/.../other). Parses JSON cookie payload for utm_source/source/referrer fields, strips URL schemes, maps well-known hostnames to channel buckets. PostHog event still ships the raw cookie value for analytics; only the Prometheus label is bucketed.

2. Filled the unknown/other label gaps:
   - analytics.IssueCreated and analytics.ChatMessageSent now take a platform parameter sourced from middleware.ClientMetadataFromContext (X-Client-Platform header) at the handler. Autopilot-originated issues stamp PlatformServer.
   - analytics.FeedbackSubmitted now takes a kind parameter; CreateFeedback reads req.Kind (default "general") so the picker selection lights up the metric's kind label instead of long-term "other".
   - analytics.ContactSalesSubmitted now takes a formSource (page / onboarding / agents_page); CreateContactSales reads req.Source. The metric reads ev.Properties["form_source"] so the analytics CoreProperties.Source ("marketing_contact_sales") stays backward-compat for PostHog dashboards.

3. analytics.OnboardingStarted helper added; server-side emission lives in PatchOnboarding, fired exactly once per user on the first PATCH that carries a non-empty questionnaire payload (firstTouch logic compares prior bytes against {} / null). Frontend onboarding_started keeps firing on page open; the server emission is what guarantees the Prometheus counter exists so Grafana can be cross-checked against the PostHog funnel without depending on the SDK roundtrip.

4. business_pairing_test.go tightened:
   - TestNoNakedAnalyticsCaptureInHandlersOrServices now allow-lists at function granularity (just captureTaskEvent in service/task.go), not whole-file. Any future naked Capture in the same file fails CI.
   - TestEveryAnalyticsRecordEventTakesAnalyticsHelper now does def-use tracking inside the enclosing FuncDecl: when RecordEvent's third arg is an *ast.Ident, the test walks the function body for the assignment that defined it and confirms the RHS is an analytics.<Helper>(...) call. Bare local idents that didn't originate from analytics are now caught.

5. gofmt -w applied across the touched files; gofmt -l clean.

Tests: go test ./internal/metrics/... ./internal/analytics/... pass. Pre-existing TestClaimTask_/TestWebhook_MergedPR/TestDeleteIssueByIdentifier failures on origin/main are DB-environment-dependent and not regressions from this change.

Co-authored-by: multica-agent <github@multica.ai>

* fix(server): normalise onboarding_started platform label + regression test (MUL-2949)

Addresses 张大彪's last review nit:
- IncForEvent's EventOnboardingStarted case now wraps the platform property with NormalizePlatform, matching every other platform-bearing metric. A misbehaving frontend can no longer leak a raw X-Client-Platform header value into the multica_onboarding_started_total{platform=...} series.
- New labels_pr3_test.go covers every PR3 normalizer with both a happy-path value and an unknown value, asserting the unknown collapses to the documented fallback bucket. Includes a focused regression for onboarding_started: emits one event with an attacker-shaped platform string and asserts the metric only exposes web + unknown label values (no raw header bleed).
- testutil.go gains a small GatherForTest helper so the regression test can pull the typed MetricFamily map without re-implementing the registry-walk dance.

Co-authored-by: multica-agent <github@multica.ai>

* fix(server): NormalizeTaskSource on workspace_created + document lint limitations (MUL-2949)

Final review touch-ups before merge:
- IncForEvent's EventWorkspaceCreated case wraps source through NormalizeTaskSource, matching the other source-bearing dispatches (issue_created, agent_created, issue_executed). Closes the last raw property leak in the dispatcher table.
- business_pairing_test.go inline docstrings now spell out the two known limitations of the lint gate that 张大彪 / Eve flagged: analyticsBackedIdents matches by ident NAME (not SSA def-use, so a nested-scope shadow could pass) and isMetricsRecordEvent hard-codes the import alias set. PR description carries a Follow-ups section with the same two items so the work is visible after merge.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: 魏和尚 <agent+wei@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
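The pairing and cardinality rules described in the commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's implementation: `Event`, `Capturer`, `Counters`, `memoryCapturer`, and the contents of the platform allow-list are hypothetical stand-ins, and a plain map replaces the Prometheus collectors. The commit message confirms only that `RecordEvent(client, m, ev)` emits the event and increments a matching counter, and that unknown label values collapse to a fallback bucket.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical stand-in for the analytics event type.
type Event struct {
	Name       string
	Properties map[string]any
}

// Capturer is a hypothetical stand-in for the PostHog client.
type Capturer interface {
	Capture(ev Event)
}

// Counters is a toy replacement for Prometheus counter vectors.
type Counters struct {
	mu     sync.Mutex
	counts map[string]int
}

func NewCounters() *Counters { return &Counters{counts: map[string]int{}} }

func (c *Counters) Inc(name, label string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[name+"{"+label+"}"]++
}

func (c *Counters) Get(name, label string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[name+"{"+label+"}"]
}

// allowedPlatforms is an assumed allow-list; the real values live in the
// repository's label helpers and may differ.
var allowedPlatforms = map[string]struct{}{"web": {}, "desktop": {}, "server": {}}

// normalizePlatform collapses anything outside the allow-list to "unknown",
// so a raw header value can never become a label value.
func normalizePlatform(raw string) string {
	if _, ok := allowedPlatforms[raw]; ok {
		return raw
	}
	return "unknown"
}

// incForEvent dispatches on the event name and reads labels from Properties.
// It reports whether the event had a matching counter.
func incForEvent(c *Counters, ev Event) bool {
	switch ev.Name {
	case "onboarding_started":
		platform, _ := ev.Properties["platform"].(string)
		c.Inc("onboarding_started_total", "platform="+normalizePlatform(platform))
		return true
	}
	return false
}

// RecordEvent emits the analytics event and increments the paired counter in
// one call, so call sites cannot do one without the other.
func RecordEvent(client Capturer, c *Counters, ev Event) {
	if client != nil {
		client.Capture(ev)
	}
	if c != nil {
		incForEvent(c, ev)
	}
}

// memoryCapturer records events in memory for the demo.
type memoryCapturer struct{ events []Event }

func (m *memoryCapturer) Capture(ev Event) { m.events = append(m.events, ev) }

func main() {
	client := &memoryCapturer{}
	counters := NewCounters()
	RecordEvent(client, counters, Event{Name: "onboarding_started", Properties: map[string]any{"platform": "web"}})
	RecordEvent(client, counters, Event{Name: "onboarding_started", Properties: map[string]any{"platform": "curl/8.0 <script>"}})
	fmt.Println("captured events:", len(client.events))
	fmt.Println("web:", counters.Get("onboarding_started_total", "platform=web"))
	fmt.Println("unknown:", counters.Get("onboarding_started_total", "platform=unknown"))
}
```

The raw event still reaches the capturer unchanged; only the label derived for the counter is bucketed, which mirrors the split the commit message describes between PostHog payloads and Prometheus labels.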
347 lines
12 KiB
Go
package handler

import (
	"encoding/json"
	"log/slog"
	"net/http"
	"net/mail"
	"strings"

	"github.com/jackc/pgx/v5/pgtype"

	"github.com/multica-ai/multica/server/internal/analytics"
	"github.com/multica-ai/multica/server/internal/logger"
	obsmetrics "github.com/multica-ai/multica/server/internal/metrics"
	"github.com/multica-ai/multica/server/internal/middleware"
	db "github.com/multica-ai/multica/server/pkg/db/generated"
)

// Upper bound on free-text fields. `cloudWaitlistReasonMaxLen` is a
// product cap ("we don't need an essay for a waitlist"); the body-size
// cap further down is defense in depth against arbitrary storage
// abuse via the JSON body.
const (
	cloudWaitlistReasonMaxLen = 500

	// PatchOnboarding body is a tiny JSON with at most a 3-question
	// questionnaire. 16 KiB is ~10x the realistic ceiling: it's the
	// minimum that keeps the door open for future fields without
	// letting a malicious user stuff the JSONB column.
	patchOnboardingBodyLimit = 16 * 1024
)

// completeOnboardingRequest carries the client's view of which exit the
// user took from the flow. Used purely as an analytics dimension; server
// state (onboarded_at) flips the same way regardless. Unknown / missing
// → OnboardingPathUnknown so legacy clients still complete cleanly, just
// without a funnel-ready label.
//
// `workspace_id` is retained for analytics enrichment; the v2 code path
// used it to seed an install-runtime issue inside the same transaction,
// but in v3 every workspace-content seeding lives in the frontend
// welcome hook (see packages/views/workspace/welcome-after-onboarding.tsx).
type completeOnboardingRequest struct {
	CompletionPath string `json:"completion_path,omitempty"`
	WorkspaceID    string `json:"workspace_id,omitempty"`
}

var validCompletionPaths = map[string]struct{}{
	analytics.OnboardingPathFull:           {},
	analytics.OnboardingPathRuntimeSkipped: {},
	analytics.OnboardingPathCloudWaitlist:  {},
	analytics.OnboardingPathSkipExisting:   {},
	analytics.OnboardingPathInviteAccept:   {},
}

// CompleteOnboarding marks the authenticated user as having completed
// onboarding. Idempotent: the underlying query uses COALESCE so the
// original timestamp is preserved if called more than once.
//
// Emits `onboarding_completed` exactly once: on the first call that
// actually flips `onboarded_at` from NULL. Subsequent calls are still
// 200 OK (for client-side retries) but skip the event so the funnel
// counts honest first-completion.
//
// V3 has no in-handler seeding side effect: workspace content (Helper
// agent, starter issues, install-runtime guides) is created by the
// frontend welcome hook via the generic CreateAgent / CreateIssue
// endpoints. This handler does one thing: flip the field.
func (h *Handler) CompleteOnboarding(w http.ResponseWriter, r *http.Request) {
	userID, ok := requireUserID(w, r)
	if !ok {
		return
	}

	// Body is optional: an empty body is a legal legacy call.
	var req completeOnboardingRequest
	if r.ContentLength > 0 {
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil && err.Error() != "EOF" {
			writeError(w, http.StatusBadRequest, "invalid request body")
			return
		}
	}

	// Validate workspace_id if supplied; we don't write with it, but a
	// malformed value should fail fast rather than silently land in
	// PostHog as a junk dimension.
	if req.WorkspaceID != "" {
		wsUUID, ok := parseUUIDOrBadRequest(w, req.WorkspaceID, "workspace_id")
		if !ok {
			return
		}
		req.WorkspaceID = uuidToString(wsUUID)
	}

	before, err := h.Queries.GetUser(r.Context(), parseUUID(userID))
	if err != nil {
		writeError(w, http.StatusInternalServerError, "failed to complete onboarding")
		return
	}
	firstCompletion := !before.OnboardedAt.Valid

	user, err := h.Queries.MarkUserOnboarded(r.Context(), parseUUID(userID))
	if err != nil {
		slog.Warn("complete onboarding: mark user onboarded failed", append(logger.RequestAttrs(r), "error", err)...)
		writeError(w, http.StatusInternalServerError, "failed to complete onboarding")
		return
	}

	if firstCompletion {
		path := req.CompletionPath
		if _, ok := validCompletionPaths[path]; !ok {
			path = analytics.OnboardingPathUnknown
		}
		onboardedAt := ""
		if user.OnboardedAt.Valid {
			onboardedAt = user.OnboardedAt.Time.UTC().Format("2006-01-02T15:04:05Z07:00")
		}
		obsmetrics.RecordEvent(h.Analytics, h.Metrics, analytics.OnboardingCompleted(
			userID,
			req.WorkspaceID,
			path,
			onboardedAt,
			user.CloudWaitlistEmail.Valid,
		))
	}

	writeJSON(w, http.StatusOK, userToResponse(user))
}

type patchOnboardingRequest struct {
	Questionnaire *json.RawMessage `json:"questionnaire,omitempty"`
}

// questionnaireAnswers mirrors the frontend's `QuestionnaireAnswers`
// shape. `use_case` is multi-select (Step 3 allows picking several);
// `source` is single-select (primary acquisition channel) but kept
// as `stringOrSlice` for back-compat with v2 multi-select rows; the
// client now always commits a one-element array. `role` stays
// single-select.
//
// stringOrSlice also tolerates pre-array rows that wrote a bare
// string into the JSONB column; `json.Unmarshal` would otherwise
// fail on type mismatch when reading those back.
type stringOrSlice []string

func (s *stringOrSlice) UnmarshalJSON(data []byte) error {
	// Empty / null both decode to nil slice.
	if len(data) == 0 || string(data) == "null" {
		*s = nil
		return nil
	}
	// Try array first (current shape).
	var arr []string
	if err := json.Unmarshal(data, &arr); err == nil {
		*s = arr
		return nil
	}
	// Fall back to single string (pre-array shape from before this
	// column held a slice). Empty string means "unanswered": keep nil.
	var single string
	if err := json.Unmarshal(data, &single); err != nil {
		return err
	}
	if single == "" {
		*s = nil
		return nil
	}
	*s = []string{single}
	return nil
}

type questionnaireAnswers struct {
	Source         stringOrSlice `json:"source"`
	SourceOther    string        `json:"source_other"`
	SourceSkipped  bool          `json:"source_skipped"`
	Role           string        `json:"role"`
	RoleOther      string        `json:"role_other"`
	RoleSkipped    bool          `json:"role_skipped"`
	UseCase        stringOrSlice `json:"use_case"`
	UseCaseOther   string        `json:"use_case_other"`
	UseCaseSkipped bool          `json:"use_case_skipped"`
	Version        int           `json:"version"`
}

func (q questionnaireAnswers) sourceResolved() bool {
	return len(q.Source) > 0 || q.SourceSkipped
}
func (q questionnaireAnswers) roleResolved() bool {
	return q.Role != "" || q.RoleSkipped
}
func (q questionnaireAnswers) useCaseResolved() bool {
	return len(q.UseCase) > 0 || q.UseCaseSkipped
}

// questionnaireSchemaVersion is the schema this handler understands.
// `complete()` and the funnel event are scoped to this version so a
// future v3 row can't be silently mis-counted against v2 semantics.
const questionnaireSchemaVersion = 2

func (q questionnaireAnswers) complete() bool {
	if q.Version != questionnaireSchemaVersion {
		return false
	}
	return q.sourceResolved() && q.roleResolved() && q.useCaseResolved()
}

// PatchOnboarding persists the user's questionnaire answers. The
// field is optional; an omitted questionnaire is preserved. Which
// step the user is on is deliberately not persisted: every
// onboarding entry starts at Welcome.
//
// Emits `onboarding_questionnaire_submitted` exactly once per user:
// the first PATCH that transitions the answers from "at least one
// slot empty" to "all three filled". Revisions past that point don't
// re-emit, because the funnel counts users, not edits.
func (h *Handler) PatchOnboarding(w http.ResponseWriter, r *http.Request) {
	userID, ok := requireUserID(w, r)
	if !ok {
		return
	}
	// Bound the body so the JSONB column can't be weaponized as bulk
	// storage; otherwise every subsequent `/api/me` read would have
	// to return the bloat.
	r.Body = http.MaxBytesReader(w, r.Body, patchOnboardingBodyLimit)
	var req patchOnboardingRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		writeError(w, http.StatusBadRequest, "invalid request body")
		return
	}

	// Read prior answers so we can detect the NULL/partial → complete
	// transition after the update. An errored decode on the prior row
	// is treated as "incomplete": worst case we emit once more than
	// we should, never twice for the same transition.
	var before questionnaireAnswers
	beforeRaw := []byte("{}")
	if beforeUser, err := h.Queries.GetUser(r.Context(), parseUUID(userID)); err == nil {
		beforeRaw = beforeUser.OnboardingQuestionnaire
		_ = json.Unmarshal(beforeRaw, &before)
	}
	// firstTouch is true when the user has never written any
	// onboarding state on the server before this PATCH. Used to fire
	// onboarding_started exactly once per user from the server side.
	firstTouch := len(beforeRaw) == 0 || string(beforeRaw) == "null" || string(beforeRaw) == "{}"

	params := db.PatchUserOnboardingParams{ID: parseUUID(userID)}
	if req.Questionnaire != nil {
		params.Questionnaire = []byte(*req.Questionnaire)
	}
	user, err := h.Queries.PatchUserOnboarding(r.Context(), params)
	if err != nil {
		slog.Warn("patch onboarding failed", append(logger.RequestAttrs(r), "error", err)...)
		writeError(w, http.StatusInternalServerError, "failed to update onboarding")
		return
	}

	// Server-side onboarding_started: fire on the first PATCH that
	// actually carries a questionnaire payload. The frontend also
	// emits its own onboarding_started on page open; the two together
	// let Grafana cross-check the funnel against PostHog.
	if firstTouch && req.Questionnaire != nil && len(*req.Questionnaire) > 0 && string(*req.Questionnaire) != "{}" {
		platform, _, _ := middleware.ClientMetadataFromContext(r.Context())
		obsmetrics.RecordEvent(h.Analytics, h.Metrics, analytics.OnboardingStarted(userID, platform))
	}

	var after questionnaireAnswers
	_ = json.Unmarshal(user.OnboardingQuestionnaire, &after)
	if after.complete() && !before.complete() {
		obsmetrics.RecordEvent(h.Analytics, h.Metrics, analytics.OnboardingQuestionnaireSubmitted(
			userID,
			[]string(after.Source),
			after.Role,
			[]string(after.UseCase),
			after.SourceSkipped,
			after.RoleSkipped,
			after.UseCaseSkipped,
			after.SourceOther != "",
			after.RoleOther != "",
			after.UseCaseOther != "",
		))
	}

	writeJSON(w, http.StatusOK, userToResponse(user))
}

type joinCloudWaitlistRequest struct {
	Email  string `json:"email"`
	Reason string `json:"reason"`
}

// JoinCloudWaitlist records a user's interest in cloud runtimes.
// Pure side effect: does NOT complete onboarding. The user still
// has to pick a real Step 3 path (CLI with a detected runtime) or
// Skip to move on. Repeating the call overwrites email + reason.
func (h *Handler) JoinCloudWaitlist(w http.ResponseWriter, r *http.Request) {
	userID, ok := requireUserID(w, r)
	if !ok {
		return
	}
	var req joinCloudWaitlistRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		writeError(w, http.StatusBadRequest, "invalid request body")
		return
	}

	// RFC 5321 caps email at 254 chars; the column is VARCHAR(254) and
	// the format check below rejects anything net/mail can't parse.
	email := strings.ToLower(strings.TrimSpace(req.Email))
	if email == "" {
		writeError(w, http.StatusBadRequest, "email is required")
		return
	}
	if len(email) > 254 {
		writeError(w, http.StatusBadRequest, "email is too long")
		return
	}
	if _, err := mail.ParseAddress(email); err != nil {
		writeError(w, http.StatusBadRequest, "email is invalid")
		return
	}

	reason := strings.TrimSpace(req.Reason)
	if len(reason) > cloudWaitlistReasonMaxLen {
		writeError(w, http.StatusBadRequest, "reason is too long")
		return
	}

	reasonParam := pgtype.Text{}
	if reason != "" {
		reasonParam = pgtype.Text{String: reason, Valid: true}
	}

	user, err := h.Queries.JoinCloudWaitlist(r.Context(), db.JoinCloudWaitlistParams{
		ID:                  parseUUID(userID),
		CloudWaitlistEmail:  pgtype.Text{String: email, Valid: true},
		CloudWaitlistReason: reasonParam,
	})
	if err != nil {
		writeError(w, http.StatusInternalServerError, "failed to join waitlist")
		return
	}

	obsmetrics.RecordEvent(h.Analytics, h.Metrics, analytics.CloudWaitlistJoined(userID, reason != ""))

	writeJSON(w, http.StatusOK, userToResponse(user))
}
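The tolerant decoding done by `stringOrSlice` in the file above can be exercised in isolation. The sketch below copies the `UnmarshalJSON` method from the handler and wraps it in a hypothetical `answers` struct and `decode` helper (both exist only for this demo) to show how an array, a bare string, an empty string, `null`, and a wrong-typed value each decode.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stringOrSlice is copied from the handler file above.
type stringOrSlice []string

func (s *stringOrSlice) UnmarshalJSON(data []byte) error {
	// Empty / null both decode to a nil slice.
	if len(data) == 0 || string(data) == "null" {
		*s = nil
		return nil
	}
	// Current shape: a JSON array of strings.
	var arr []string
	if err := json.Unmarshal(data, &arr); err == nil {
		*s = arr
		return nil
	}
	// Legacy shape: a bare string. Anything else is a real type error.
	var single string
	if err := json.Unmarshal(data, &single); err != nil {
		return err
	}
	if single == "" {
		*s = nil
		return nil
	}
	*s = []string{single}
	return nil
}

// answers is a hypothetical, cut-down stand-in for questionnaireAnswers.
type answers struct {
	Source stringOrSlice `json:"source"`
}

// decode is a demo helper, not part of the handler.
func decode(raw string) (answers, error) {
	var a answers
	err := json.Unmarshal([]byte(raw), &a)
	return a, err
}

func main() {
	inputs := []string{
		`{"source":["twitter"]}`, // current array shape
		`{"source":"twitter"}`,   // legacy bare string
		`{"source":""}`,          // legacy "unanswered"
		`{"source":null}`,        // explicit null
		`{"source":42}`,          // neither shape: error
	}
	for _, raw := range inputs {
		a, err := decode(raw)
		fmt.Printf("%-26s len=%d ok=%v\n", raw, len(a.Source), err == nil)
	}
}
```

Trying the array shape first keeps the common path to a single decode; the string fallback only runs for rows written before the column held a slice.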
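The "emit exactly once" rule in `PatchOnboarding` comes down to comparing completeness before and after the write. A minimal sketch of that comparison follows, assuming a simplified `answers` struct with plain `[]string` fields and a hypothetical `shouldEmit` helper; the handler itself inlines this check rather than calling a helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// schemaVersion mirrors questionnaireSchemaVersion in the handler.
const schemaVersion = 2

// answers is a simplified stand-in for questionnaireAnswers.
type answers struct {
	Source         []string `json:"source"`
	SourceSkipped  bool     `json:"source_skipped"`
	Role           string   `json:"role"`
	RoleSkipped    bool     `json:"role_skipped"`
	UseCase        []string `json:"use_case"`
	UseCaseSkipped bool     `json:"use_case_skipped"`
	Version        int      `json:"version"`
}

// complete reports whether all three slots are answered or skipped
// under the schema version this code understands.
func (a answers) complete() bool {
	if a.Version != schemaVersion {
		return false
	}
	return (len(a.Source) > 0 || a.SourceSkipped) &&
		(a.Role != "" || a.RoleSkipped) &&
		(len(a.UseCase) > 0 || a.UseCaseSkipped)
}

// shouldEmit is a hypothetical helper: true only when this write moved the
// row from incomplete to complete. Decode errors leave the zero value, which
// counts as incomplete.
func shouldEmit(beforeRaw, afterRaw []byte) bool {
	var before, after answers
	_ = json.Unmarshal(beforeRaw, &before)
	_ = json.Unmarshal(afterRaw, &after)
	return after.complete() && !before.complete()
}

func main() {
	full := []byte(`{"source":["friend"],"role":"engineer","use_case_skipped":true,"version":2}`)
	revised := []byte(`{"source":["search"],"role":"engineer","use_case_skipped":true,"version":2}`)
	partial := []byte(`{"source":["friend"],"version":2}`)

	fmt.Println(shouldEmit([]byte(`{}`), full))    // first completion
	fmt.Println(shouldEmit(partial, full))         // partial to complete
	fmt.Println(shouldEmit(full, revised))         // later revision
	fmt.Println(shouldEmit([]byte(`{}`), partial)) // still incomplete
}
```

Because the decision depends only on the two stored states, a retried PATCH with the same payload sees a complete "before" row and does not emit again.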