3 Commits

Author SHA1 Message Date
LinYushen
de900b2ba6 feat(server): funnel/community/commercial business metrics + PostHog pairing (MUL-2949) (#3698)
* feat(server): funnel/community/commercial business metrics + PostHog pairing (MUL-2949)

PR3 of the Grafana board metrics split (parent MUL-2328).

Adds 23 new Prometheus counter/histogram families to the PR2 BusinessMetrics
collector covering the activation/community/commercial funnels, and binds
every PostHog event emission to a matching metric increment so the two sides
cannot drift.

Funnel: signup, workspace_created, team_invite_sent/accepted, onboarding_*,
cloud_waitlist_joined.
Content: issue_created, chat_message_sent, agent_created, squad_created,
autopilot_created, issue_executed.
Runtime: runtime_registered/ready/failed/offline + ready_seconds histogram,
daemon_ws_message_received_total.
Autopilot: autopilot_run_started/terminal/skipped.
Webhook/GitHub: webhook_delivery_total, github_event_received_total,
github_pr_review_total, github_pr_merge_seconds histogram.
CloudRuntime: cloudruntime_request_total + duration histogram, wired through
a small RequestRecorder interface so the cloudruntime package stays decoupled
from metrics.
Commercial: feedback_submitted, contact_sales_submitted.

The pairing helper metrics.RecordEvent(client, m, ev) emits the PostHog
event AND increments the matching counter via IncForEvent dispatch, reading
labels from the analytics event Properties. Every existing
h.Analytics.Capture(analytics.X(...)) call site has been migrated to the
helper across handler/, service/, and cmd/server/runtime_sweeper.go.

Lint enforcement (server/internal/metrics/business_pairing_test.go):
- TestEveryAnalyticsEventHasPrometheusCounter: every Event* constant in
  analytics/events.go either dispatches via IncForEvent or is in the
  taskMetricEvents allow-list (PR2 typed RecordTask* methods).
- TestNoNakedAnalyticsCaptureInHandlersOrServices: AST-walks handler/
  service/cmd-server for direct Analytics.Capture(...) calls — only
  service/task.go's captureTaskEvent helper is allow-listed.
- TestEveryAnalyticsRecordEventTakesAnalyticsHelper: validates the third
  arg of every metrics.RecordEvent call is built from analytics.*.

Cardinality protection: all new label values pass through fixed allow-lists
in labels_pr3.go; unknown values collapse to 'other'/'unknown'/'error'.

Refs:
- Spec MUL-2328 / MUL-2949.
- Builds on PR2 (MUL-2948) — collectors registered through the same
  BusinessMetrics struct, no separate Registry.
- Uses PR1's taskfailure.Reason (MUL-2946) for runtime_failed's failure_reason
  label via NormalizeFailureReason.

Out of scope: Sampler-class metrics (PR4 / MUL-2947), pr_review_total
emission point (no review event handler exists yet — counter is defined,
TODO to wire up when /api/webhooks/github grows pull_request_review handling).

Co-authored-by: multica-agent <github@multica.ai>

* fix(server): tighten PR3 review items — signup_source bucket, fill platform/kind/form_source enums, onboarding_started server emission, lint scope (MUL-2949)

Addresses 张大彪's review on #3698:

1. signup_source: NormalizeSignupSource added to labels_pr3.go with a
   fixed allow-list bucket (direct/google/twitter/linkedin/.../other).
   Parses JSON cookie payload for utm_source/source/referrer fields,
   strips URL schemes, maps well-known hostnames to channel buckets.
   PostHog event still ships the raw cookie value for analytics; only
   the Prometheus label is bucketed.

2. Filled the unknown/other label gaps:
   - analytics.IssueCreated and analytics.ChatMessageSent now take a
     platform parameter sourced from middleware.ClientMetadataFromContext
     (X-Client-Platform header) at the handler. Autopilot-originated
     issues stamp PlatformServer.
   - analytics.FeedbackSubmitted now takes a kind parameter; CreateFeedback
     reads req.Kind (default "general") so the picker selection lights up
     the metric's kind label instead of long-term "other".
   - analytics.ContactSalesSubmitted now takes a formSource (page /
     onboarding / agents_page); CreateContactSales reads req.Source.
     The metric reads ev.Properties["form_source"] so the analytics
     CoreProperties.Source ("marketing_contact_sales") stays
     backward-compat for PostHog dashboards.

3. analytics.OnboardingStarted helper added; server-side emission lives
   in PatchOnboarding, fired exactly once per user on the first PATCH
   that carries a non-empty questionnaire payload (firstTouch logic
   compares prior bytes against {} / null). Frontend onboarding_started
   keeps firing on page open; the server emission is what guarantees the
   Prometheus counter exists so Grafana can be cross-checked against the
   PostHog funnel without depending on the SDK roundtrip.

4. business_pairing_test.go tightened:
   - TestNoNakedAnalyticsCaptureInHandlersOrServices now allow-lists at
     function granularity (just captureTaskEvent in service/task.go), not
     whole-file. Any future naked Capture in the same file fails CI.
   - TestEveryAnalyticsRecordEventTakesAnalyticsHelper now does def-use
     tracking inside the enclosing FuncDecl: when RecordEvent's third
     arg is an *ast.Ident, the test walks the function body for the
     assignment that defined it and confirms the RHS is an
     analytics.<Helper>(...) call. Bare local idents that didn't
     originate from analytics are now caught.

5. gofmt -w applied across the touched files; gofmt -l clean.

Tests: go test ./internal/metrics/... ./internal/analytics/... pass.
Pre-existing TestClaimTask_/TestWebhook_MergedPR/TestDeleteIssueByIdentifier
failures on origin/main are DB-environment-dependent and not regressions
from this change.

Co-authored-by: multica-agent <github@multica.ai>

* fix(server): normalise onboarding_started platform label + regression test (MUL-2949)

Addresses 张大彪's last review nit:

- IncForEvent's EventOnboardingStarted case now wraps the platform
  property with NormalizePlatform, matching every other platform-bearing
  metric. A misbehaving frontend can no longer leak a raw X-Client-Platform
  header value into the multica_onboarding_started_total{platform=...}
  series.

- New labels_pr3_test.go covers every PR3 normalizer with both a happy-path
  value and an unknown value, asserting the unknown collapses to the
  documented fallback bucket. Includes a focused regression for
  onboarding_started: emits one event with an attacker-shaped platform
  string and asserts the metric only exposes web + unknown label values
  (no raw header bleed).

- testutil.go gains a small GatherForTest helper so the regression test
  can pull the typed MetricFamily map without re-implementing the
  registry-walk dance.

Co-authored-by: multica-agent <github@multica.ai>

* fix(server): NormalizeTaskSource on workspace_created + document lint limitations (MUL-2949)

Final review touch-ups before merge:

- IncForEvent's EventWorkspaceCreated case wraps source through
  NormalizeTaskSource, matching the other source-bearing dispatches
  (issue_created, agent_created, issue_executed). Closes the last raw
  property leak in the dispatcher table.

- business_pairing_test.go inline docstrings now spell out the two
  known limitations of the lint gate that 张大彪 / Eve flagged:
  analyticsBackedIdents matches by ident NAME (not SSA def-use, so a
  nested-scope shadow could pass) and isMetricsRecordEvent hard-codes
  the import alias set. PR description carries a Follow-ups section
  with the same two items so the work is visible after merge.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: 魏和尚 <agent+wei@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-03 16:39:06 +08:00
Bohan Jiang
f628e48775 refactor(server): error-returning ParseUUID to prevent silent data loss
* refactor(server): make ParseUUID error-returning to prevent silent data loss (MUL-1410)

util.ParseUUID previously swallowed errors and returned a zero pgtype.UUID
on invalid input. When this zero UUID reached a write query (DELETE/UPDATE),
the SQL matched zero rows and the handler returned 2xx success — producing
silent data corruption. #1661 (DeleteIssue with identifier-style ID) was the
visible symptom; PR #1680 patched that one site, this commit closes the
class of bug.

Changes:

- util.ParseUUID now returns (pgtype.UUID, error). Add util.MustParseUUID
  for trusted round-trips that should panic on invalid input.
- handler/handler.go: parseUUID wrapper now calls MustParseUUID — any
  unguarded user-input string reaching it surfaces as a recovered panic
  (chi middleware.Recoverer → 500) instead of silently corrupting data.
  Add parseUUIDOrBadRequest(w, s, fieldName) for handler entry points.
- Convert every Queries.Delete*/Update* call site reachable from raw user
  input (autopilot, comment, project, skill, skill_file, label, pin,
  attachment, feedback, issue assignee, daemon runtime, workspace) to
  validate UUIDs explicitly with parseUUIDOrBadRequest, returning 400 on
  invalid input. Where a resolved entity.ID is already in scope, write
  queries now use it directly instead of re-parsing the URL string.
- Update getWorkspaceMember + loadIssueForUser to handle invalid UUIDs
  gracefully (404/400 instead of panic).
- Update util/middleware/cmd-level callers (subscriber_listeners,
  notification_listeners, activity_listeners, scope_authorizer,
  middleware/workspace) to use the error-returning API.
- Add server/internal/util/pgx_test.go covering valid/invalid input and
  the MustParseUUID panic contract.
- Add TestDeleteIssueByIdentifier + TestDeleteIssueRejectsInvalidUUID
  regression tests in handler_test.go (the original #1661 bug + the
  invalid-input case).
- Document the handler UUID parsing convention in CLAUDE.md so the rule
  is enforceable in future PR review.

* fix(server): address GPT-Boy review of #1748

P1 fixes from PR #1748 review:

1. Migrate remaining request-boundary UUIDs to parseUUIDOrBadRequest so
   malformed input returns 400 instead of panic/500. Was missing on:
   - issue.go: workspace_id in CreateIssue/ChildIssueProgress/ListIssues/
     SearchIssues/BatchUpdateIssues/BatchDeleteIssues; project_id /
     parent_issue_id / lead_id / assignee_id / assignee_ids / creator_id
     filters; batch issue_ids and assignee/parent/project fields in
     BatchUpdateIssues (skip on bad input via util.ParseUUID, matching
     the existing per-row continue semantics).
   - project.go: project id + workspace_id in GetProject/UpdateProject/
     DeleteProject; lead_id in CreateProject/UpdateProject;
     workspace_id in ListProjects + SearchProjects.
   - handler.go: resolveActor now uses util.ParseUUID for X-Agent-ID /
     X-Task-ID headers; invalid UUID falls back to "member" (matches
     pre-existing semantics) instead of panicking.
   - issue.go: validateAssigneePair returns 400 on invalid workspace_id
     instead of panicking.

2. Fix issue:deleted WS event payloads to emit uuidToString(issue.ID)
   instead of the raw URL string. After an identifier-path delete
   ("MUL-7"), the previous payload would have leaked the identifier to
   subscribers, leaving stale entries in frontend caches that key by
   UUID. Updated DeleteIssue (issue.go:1341) and BatchDeleteIssues
   (issue.go:1641). The slog "issue deleted" log line also now records
   the resolved UUID so logs match the WS payload.

3. Extend TestDeleteIssueByIdentifier to subscribe to the bus and
   assert issue:deleted.payload.issue_id is the resolved UUID, not
   the identifier.

* fix(server): validate remaining reviewed UUID inputs

* fix(server): validate remaining handler UUID inputs

* fix(server): finish request boundary UUID audit

* fix(server): validate remaining request body UUIDs

* fix(server): validate runtime path UUIDs

* fix(server): validate remaining audit UUID inputs

---------

Co-authored-by: Eve <eve@multica.ai>
2026-04-28 14:50:28 +08:00
Naiyuan Qing
d6e7824ff1 feat(feedback): in-app feedback flow + Help launcher (#1546)
* feat(feedback): add in-app feedback flow and Help launcher

Replaces the duplicated bottom-sidebar user popover and "What's new" links
with a single Help menu (Docs / Feedback / Change log) pinned to the
sidebar footer. Feedback opens a rich-text modal that POSTs to a new
/api/feedback endpoint; submissions land in a dedicated feedback table
with per-user hourly rate limiting (10/hr) to deter spam without adding
middleware infrastructure. User identity (avatar + name + email) moves
into the workspace dropdown header so the sidebar is no longer visually
redundant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(feedback): harden submit path and cap request body

- Read editor markdown via ref at submit time instead of debounced state,
  so ⌘+Enter immediately after typing doesn't drop the last keystrokes.
- Block submission while images are still uploading; toast prompts the
  user to wait instead of silently sending markdown with blob: URLs
  that get stripped.
- Cap /api/feedback request body at 64 KiB via MaxBytesReader so an
  authenticated client can't bloat the metadata JSONB column with an
  oversized url field.
- Add Go handler tests covering happy path, empty-message rejection,
  and the hourly rate limit boundary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(analytics): instrument feedback funnel

Adds two events pairing frontend intent with backend conversion so we
can compute a completion rate for the in-app Feedback modal:

- `feedback_opened` (frontend) — fires once on FeedbackModal mount.
  Source is currently always "help_menu" but the type is a union so
  future entry points have to extend it explicitly. Workspace id is
  attached when present.
- `feedback_submitted` (backend) — fires from CreateFeedback after the
  DB insert succeeds and the hourly rate-limit check has passed.
  Message content itself is never sent to PostHog; the event carries
  a coarse length bucket (0-100 / 100-500 / 500-2000 / 2000+), an
  image-presence flag, and the client platform / version pulled from
  X-Client-* headers via middleware.ClientMetadataFromContext.

Affects no existing funnel; seeds a new Feedback funnel for product
triage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 10:35:55 +08:00