multica/server/internal/handler/source_telemetry.go
Naiyuan Qing 63eb6f73ad feat(analytics): anonymous self-host onboarding source beacon (MUL-3708) (#4691)
* feat(analytics): anonymous self-host onboarding source beacon (MUL-3708)

Production self-host servers now report the anonymous onboarding "how did
you hear about us" channel to Multica's public write-only ingest, so the
self-host source distribution becomes visible alongside official cloud.
Official cloud keeps its existing PostHog capture unchanged; this is a
submit-time beacon, not a background telemetry pipeline.

- server/internal/sourcebeacon: ShouldSend gate (production + non-local +
  non-*.multica.ai app host, fail-closed — judged by the app/frontend host,
  not the backend URL, which official often leaves unset), per-instance
  salted hashing, deterministic event uuid, fire-and-forget sender.
- POST /api/telemetry/self-host-source: public, write-only, per-IP
  rate-limited, 4 KiB body cap, channel allowlist, strict unknown-field
  rejection. Lands in PostHog as self_host_source_channel with a
  deterministic uuid (best-effort dedup), $process_person_profile=false,
  and deployment=self_host — a distinct event name so it never pollutes the
  official onboarding funnel.
- Hook in PatchOnboarding fires once when the source is first set; never
  blocks onboarding. Only channel enum(s) + two per-instance hashes leave
  the box — never user_id/email/name/workspace/org/domain/role/use_case/the
  source_other free-text/IP.
- migration 128: system_settings singleton holding instance_salt.
- frontend: self-host-only anonymous-collection notice on the source step,
  gated by a new /api/config self_host_source_notice flag (en/zh-Hans/ko/ja).
- analytics.Event gains an optional top-level uuid; docs/analytics.md,
  SELF_HOSTING.md and .env.example document exactly what is/isn't sent and
  how to disable it (ANALYTICS_DISABLED). Also fixes the long-standing
  team_size→source drift in docs/analytics.md.
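
The ShouldSend gate described in the first bullet can be sketched roughly as below. This is a minimal illustration, not the code in server/internal/sourcebeacon: the function name `shouldSend`, its parameters, and the exact reading of "non-local" (localhost, `.local`/`.localhost` names, loopback, private, and unspecified IPs) are all assumptions made for the example. The only rules taken from the commit message are production-only, non-local, non-*.multica.ai app host, and fail-closed.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// shouldSend is a hypothetical stand-in for the ShouldSend gate.
// It returns true only when every check passes; any value it cannot
// parse or classify results in false (fail-closed).
func shouldSend(env, appURL string) bool {
	if env != "production" {
		return false
	}
	u, err := url.Parse(appURL)
	if err != nil || u.Hostname() == "" {
		return false // unset or unparseable app URL: do not send
	}
	host := strings.ToLower(u.Hostname())

	// Assumed meaning of "local": well-known local names and non-public IPs.
	if host == "localhost" || strings.HasSuffix(host, ".localhost") || strings.HasSuffix(host, ".local") {
		return false
	}
	if ip := net.ParseIP(host); ip != nil && (ip.IsLoopback() || ip.IsPrivate() || ip.IsUnspecified()) {
		return false
	}

	// The official cloud keeps its own capture path, so it never beacons.
	if host == "multica.ai" || strings.HasSuffix(host, ".multica.ai") {
		return false
	}
	return true
}

func main() {
	fmt.Println(shouldSend("production", "https://app.example.com"))  // true
	fmt.Println(shouldSend("production", "https://app.multica.ai"))   // false
	fmt.Println(shouldSend("production", "http://localhost:3000"))    // false
	fmt.Println(shouldSend("development", "https://app.example.com")) // false
	fmt.Println(shouldSend("production", ""))                         // false
}
```

Judging by the app host rather than the backend URL matters here because, per the bullet above, the official deployment often leaves the backend URL unset; a gate keyed on that value would have nothing to match against.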

Verified locally: go build/vet, go test (sourcebeacon, analytics, handler),
pnpm typecheck (all packages), locale parity (157), step-source (6) + core
config/schema (69) vitest, lint (0 errors).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>

* fix(analytics): wire self-host source beacon through metrics, guard nil pool (MUL-3708)

Addresses Howard CI blockers on #4691 (no product-direction change):

- loadInstanceSalt returns "" on nil pool; salt is only loaded when
  ShouldSendFromEnv() is true, via a bounded (5s) context — restores the
  "router constructible without a DB" invariant (nil-pool routing tests).
- Add multica_self_host_source_channel_total counter (by source) + an
  IncForEvent case, so every analytics event is paired with a Prometheus
  counter. NormalizeSourceChannel reuses sourcebeacon allowlist (no 3rd copy).
- Beacon handler now builds the event via the analytics.SelfHostSourceChannel
  helper and ships it through obsmetrics.RecordEvent (no naked Capture); not
  IsMetricsOnly, so it still reaches PostHog.
- Prime the new family in the registry-families test.

Verified: go build/vet, go test ./internal/metrics ./internal/sourcebeacon
./internal/handler ./cmd/server (incl. the 3 named blockers + registry +
record-event-helper lints) all green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-29 15:56:16 +08:00

74 lines
2.6 KiB
Go

package handler

import (
	"encoding/json"
	"net/http"

	"github.com/multica-ai/multica/server/internal/analytics"
	obsmetrics "github.com/multica-ai/multica/server/internal/metrics"
	"github.com/multica-ai/multica/server/internal/sourcebeacon"
)

// HandleSelfHostSourceBeacon ingests the self-host onboarding source beacon
// (MUL-3708). It is mounted PUBLIC and unauthenticated by design: self-host
// instances hold no Multica credential, so the data is treated as a coarse
// directional signal, not a trusted count. Abuse is bounded by the per-IP
// rate limiter (router), the tiny body cap, the channel allowlist, and
// strict unknown-field rejection.
//
// The payload is anonymous by construction: only channel enums plus two
// per-instance hashes. DisallowUnknownFields means any stray identity field
// (email, source_other, …) is rejected rather than logged or forwarded.
//
// Each valid channel becomes one PostHog event with a deterministic uuid
// (PostHog dedups on uuid) and $process_person_profile=false (so the
// anonymous uid_hash never spawns a PostHog person). On the official cloud
// this lands in the configured PostHog; on a self-host instance the same
// route exists but is unused (its analytics client is a no-op).
func (h *Handler) HandleSelfHostSourceBeacon(w http.ResponseWriter, r *http.Request) {
	r.Body = http.MaxBytesReader(w, r.Body, sourcebeacon.MaxBodyBytes)
	dec := json.NewDecoder(r.Body)
	dec.DisallowUnknownFields()

	var p sourcebeacon.Payload
	if err := dec.Decode(&p); err != nil {
		writeError(w, http.StatusBadRequest, "invalid request body")
		return
	}
	// Reject trailing data / a second JSON value in the body.
	if dec.More() {
		writeError(w, http.StatusBadRequest, "invalid request body")
		return
	}

	if p.V != sourcebeacon.SchemaVersion {
		writeError(w, http.StatusBadRequest, "unsupported schema version")
		return
	}
	if !sourcebeacon.IsValidHash(p.UIDHash) || !sourcebeacon.IsValidHash(p.InstanceHash) {
		writeError(w, http.StatusBadRequest, "invalid hash")
		return
	}

	channels := sourcebeacon.FilterValidChannels(p.Channels)
	if len(channels) == 0 {
		writeError(w, http.StatusBadRequest, "no valid channels")
		return
	}

	for _, ch := range channels {
		ev := analytics.SelfHostSourceChannel(
			p.InstanceHash,
			p.UIDHash,
			ch,
			sourcebeacon.EventUUID(p.InstanceHash, p.UIDHash, ch),
		)
		// RecordEvent (not a naked Capture) so the event also increments the
		// Prometheus counter via IncForEvent. Not IsMetricsOnly, so it still
		// ships to PostHog. m==nil-safe (tests / metrics-disabled deploys).
		obsmetrics.RecordEvent(h.Analytics, h.Metrics, ev)
	}

	w.WriteHeader(http.StatusNoContent)
}