multica/server/internal/handler/config.go
Xinmin Zeng 270d177475 fix: broken "Add a computer" command on Multica Cloud + two CLI amplifiers (MUL-3087) (#3817)
* fix(server): recognize official cloud by frontend host in daemon setup config

The 'Add a computer' dialog builds its command from /api/config's
daemon_server_url/daemon_app_url, falling back to 'multica setup' when
both are empty. The official cloud is meant to omit them, but the
omission only fired when MULTICA_PUBLIC_URL=https://api.multica.ai. When
that env is unset the server URL defaults to the frontend origin and the
old guard (which required serverURL host == api.multica.ai) didn't match,
so the dialog emitted 'multica setup self-host --server-url
https://multica.ai' — pointing the daemon backend at the frontend (no
/health, no WebSocket proxy).

Identify the official cloud by its frontend host alone (multica.ai /
app.multica.ai) so a missing or misconfigured MULTICA_PUBLIC_URL can no
longer leak the broken self-host command. Regression from #3474.
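
The decision described above can be sketched as a small, env-free function. This is a simplified illustration, not the code in config.go: `setupURLs`, `hostOf`, and `isOfficialCloud` are hypothetical names, the environment values are passed in as plain parameters, and the example.com domain is made up.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// hostOf returns the lowercased hostname of raw, or "" if it cannot be parsed.
func hostOf(raw string) string {
	u, err := url.Parse(strings.TrimSpace(raw))
	if err != nil {
		return ""
	}
	return strings.ToLower(u.Hostname())
}

// isOfficialCloud identifies the managed cloud by its frontend host alone,
// without looking at the server URL.
func isOfficialCloud(appURL string) bool {
	h := hostOf(appURL)
	return h == "multica.ai" || h == "app.multica.ai"
}

// setupURLs (hypothetical) returns the URLs the dialog should embed in a
// self-host command, or two empty strings when plain `multica setup` applies.
func setupURLs(publicURL, appURL string) (string, string) {
	publicURL = strings.TrimRight(strings.TrimSpace(publicURL), "/")
	appURL = strings.TrimRight(strings.TrimSpace(appURL), "/")
	if appURL == "" || isOfficialCloud(appURL) {
		return "", ""
	}
	if publicURL == "" {
		// The server URL defaults to the frontend origin when unset.
		publicURL = appURL
	}
	return publicURL, appURL
}

func main() {
	// Official cloud with the public URL unset: both values are omitted.
	s, a := setupURLs("", "https://multica.ai")
	fmt.Printf("cloud:     %q %q\n", s, a)

	// Self-hosted instance with only a frontend origin configured.
	s, a = setupURLs("", "https://multica.example.com/")
	fmt.Printf("self-host: %q %q\n", s, a)
}
```

Because the check no longer looks at the server URL, the first call yields empty values even though no public URL was supplied, which is the case that previously leaked the self-host command.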

* fix(cli): probe before persisting self-host config to preserve auth on failure

setup self-host wrote a fresh CLIConfig{ServerURL, AppURL} (a full
overwrite that drops the saved token) and only then probed the server,
returning early on failure. A failed probe therefore logged the user out
and left them unconnected, with no recovery in the same command.

Probe first via persistSelfHostConfigIfReachable: an unreachable server
leaves the existing config — and its token — untouched (failed setup =
no-op). The prober is injected so both branches are unit-tested.
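
A minimal sketch of the probe-first ordering, assuming a much-reduced config type. The names `cliConfig`, `prober`, `persistIfReachable`, `probeDown`, and `probeUp` are hypothetical; the real persistSelfHostConfigIfReachable signature is not shown in this commit message and may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// cliConfig is a hypothetical stand-in for the CLI's saved configuration.
type cliConfig struct {
	ServerURL string
	AppURL    string
	Token     string
}

// prober reports whether the server at serverURL is reachable.
// Injecting it lets tests exercise both branches without a network.
type prober func(serverURL string) error

var errDown = errors.New("connection refused")

func probeDown(string) error { return errDown }
func probeUp(string) error   { return nil }

// persistIfReachable probes first and only overwrites *saved on success.
// A failed probe leaves the existing config, including its token, untouched.
func persistIfReachable(saved *cliConfig, serverURL, appURL string, probe prober) error {
	if err := probe(serverURL); err != nil {
		return fmt.Errorf("server %s unreachable, config unchanged: %w", serverURL, err)
	}
	// Full overwrite, as in the commit description: the old token is dropped
	// only once the new server is known to be reachable.
	*saved = cliConfig{ServerURL: serverURL, AppURL: appURL}
	return nil
}

func main() {
	saved := cliConfig{ServerURL: "https://old.example.com", Token: "tok"}

	err := persistIfReachable(&saved, "https://new.example.com", "https://app.example.com", probeDown)
	fmt.Println("failed probe:", err, "| token kept:", saved.Token != "")

	err = persistIfReachable(&saved, "https://new.example.com", "https://app.example.com", probeUp)
	fmt.Println("successful probe:", err, "| server:", saved.ServerURL)
}
```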

* fix(daemon): serve health before preflight so daemon start readiness is accurate

The CLI's 'daemon start' polls the health endpoint for 15s expecting
status=running, but the daemon only began serving health after
preflightAuth, whose initial workspace sync detects every configured
agent's version by exec'ing it (~20s cold with 8 agents). Health served
too late, so a perfectly healthy daemon printed 'may not have started
successfully'.

Start the health server right after resolveAuth (which still fails fast
on a missing token) and before the slow preflight, so readiness reflects
the daemon core being up rather than agent-version detection finishing.

* fix(daemon): gate /health readiness so daemon start can't report a false start

Serving health before preflightAuth fixed the false-negative (a healthy
daemon printed "may not have started"), but health still returned
status:"running" unconditionally — before preflight (PAT renew + workspace
sync + runtime registration) had completed. `daemon start` and the desktop
treat "running" as ready, so a slow or *failing* preflight could be
misreported as a started daemon: setup prints "connected", then the process
exits or hangs in agent-version detection with no runtime registered. That
is harder to diagnose than the original false-negative.

Split liveness from readiness: bind/serve the health port early (so callers
see a live "starting" daemon instead of connection-refused), but report
status:"starting" until d.ready is set after preflight, then "running".
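
The liveness/readiness split can be illustrated with a small handler. This is a sketch under assumptions: the `daemon` type here is a stand-in with only the `ready` flag, the response is reduced to a single `status` field, and `probe` is a hypothetical helper that calls the handler in-process. `atomic.Bool` requires Go 1.19 or later.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// daemon is a stand-in holding only the readiness flag.
type daemon struct {
	ready atomic.Bool // set to true once preflight has completed
}

// healthStatus reports "starting" until ready is set, then "running".
func (d *daemon) healthStatus() string {
	if d.ready.Load() {
		return "running"
	}
	return "starting"
}

// healthHandler can be served before preflight: callers get a response
// (liveness) whose status tells them whether the daemon is ready.
func (d *daemon) healthHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(map[string]string{"status": d.healthStatus()})
}

// probe calls the handler in-process and returns the reported status.
func probe(d *daemon) string {
	rec := httptest.NewRecorder()
	d.healthHandler(rec, httptest.NewRequest(http.MethodGet, "/health", nil))
	var body map[string]string
	if err := json.NewDecoder(rec.Body).Decode(&body); err != nil {
		return ""
	}
	return body["status"]
}

func main() {
	d := &daemon{}
	fmt.Println("before preflight:", probe(d))

	d.ready.Store(true) // preflight done
	fmt.Println("after preflight: ", probe(d))
}
```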

- daemon.go: add d.ready (atomic.Bool); set it true after the background
  loops launch, before pollLoop.
- health.go: healthHandler reports "starting" until ready, else "running".
- cmd_daemon.go: `daemon start` waits for "running" with a deadline raised
  to 45s (covers cold-start agent detection) and a clearer "still starting"
  message; new daemonAlive() helper treats both "running" and "starting" as
  a live daemon, so the already-running guard, restart, and stop act on a
  starting daemon and don't double-spawn or race its listener; `daemon
  status` shows "starting" distinctly.

Older CLI and desktop builds that only know "running" safely treat "starting" as
not ready (status != "running"), so the new status does not break compatibility.
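
The distinction between a live daemon and a ready one can be written as two predicates. The name daemonAlive comes from the list above, but its signature here is an assumption, and `daemonReady` is a hypothetical counterpart added for contrast.

```go
package main

import "fmt"

// daemonAlive treats both "running" and "starting" as a live daemon, so
// guards such as already-running, restart, and stop act on a starting daemon.
func daemonAlive(status string) bool {
	return status == "running" || status == "starting"
}

// daemonReady (hypothetical) is the stricter check that `daemon start`
// waits for: only "running" counts as ready.
func daemonReady(status string) bool {
	return status == "running"
}

func main() {
	// Print the truth table for a few representative statuses.
	for _, s := range []string{"running", "starting", "stopped", ""} {
		fmt.Printf("%-10q alive=%-5t ready=%t\n", s, daemonAlive(s), daemonReady(s))
	}
}
```

A client that only checks `status == "running"` behaves like `daemonReady`, which is why an older client sees a starting daemon as not ready rather than misreading it as started.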

Tests: health reports starting-then-running; daemonAlive truth table.
Co-authored-by: multica-agent <github@multica.ai>

* fix(desktop): handle daemon "starting" health status in lifecycle

The daemon now reports /health status:"starting" until preflight completes
(liveness/readiness split). That made "starting" a new external contract of
/health, but the Desktop daemon-manager only knew "running", so the readiness
fix would have moved the CLI's false-negative into a Desktop start regression:

- `daemon start` now blocks up to 45s waiting for readiness, but the Desktop
  spawned it via execFile({ timeout: 20_000 }). On a cold start (the ~20s agent
  detection this PR targets) Electron killed the CLI supervisor at 20s and
  reported a start failure, even though the detached daemon child kept booting —
  the UI flashed "stopped" then "running". Raise the timeout to 60s (must exceed
  the CLI's 45s startupTimeout).
- The Desktop treated only raw status === "running" as a live daemon, so a
  daemon that was still "starting" (booting on its own or started via the CLI)
  showed as "stopped", and startDaemon() would spawn a second one — which the new
  CLI rejects as "already running", surfacing as a start error.

Add daemonStatusAlive() (shared, pure, unit-tested) mirroring the Go daemonAlive()
and use it for liveness: fetchHealth() surfaces a daemon-reported "starting" as
state "starting" regardless of our own currentState; startDaemon()'s
already-running guard and the restart-on-user-switch guard treat "starting" as an
existing daemon. version-decision stays gated on "running" (readiness, not
liveness) — unchanged.

Verified: desktop typecheck, eslint, full vitest suite (193 tests) all pass.
Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>
2026-06-05 17:01:23 +08:00

131 lines
4.7 KiB
Go

package handler

import (
	"net/http"
	"net/url"
	"os"
	"strings"

	"github.com/multica-ai/multica/server/internal/analytics"
)

type AppConfig struct {
	CdnDomain string `json:"cdn_domain"`

	// Public auth config consumed by the web app at runtime so self-hosted
	// deployments do not need to rebuild the frontend image when operators
	// toggle signup or wire Google OAuth.
	AllowSignup    bool   `json:"allow_signup"`
	GoogleClientID string `json:"google_client_id,omitempty"`

	// WorkspaceCreationDisabled mirrors the server-side
	// DISABLE_WORKSPACE_CREATION env var so the UI can hide every
	// "Create workspace" affordance on self-hosted instances. Omitted
	// from the JSON when false to keep responses identical to the
	// previous shape for the common managed-cloud case (#3433).
	WorkspaceCreationDisabled bool `json:"workspace_creation_disabled,omitempty"`

	// Public daemon setup config consumed by the web app at runtime so
	// self-hosted instances can show `multica setup self-host` commands
	// with the operator's own domains instead of Multica Cloud defaults.
	DaemonServerURL string `json:"daemon_server_url,omitempty"`
	DaemonAppURL    string `json:"daemon_app_url,omitempty"`

	// PostHog public config for the frontend. The key is the same Project
	// API Key the backend uses; returning it here (instead of baking it
	// into the frontend bundle via NEXT_PUBLIC_*) means self-hosted
	// instances — whose server returns an empty key — automatically
	// disable frontend event shipping too.
	PosthogKey           string `json:"posthog_key"`
	PosthogHost          string `json:"posthog_host"`
	AnalyticsEnvironment string `json:"analytics_environment"`
}

// GetConfig is mounted on the public (unauthenticated) route group because
// the web app calls it before login to decide whether to render the Google
// sign-in button and signup UI. Only add fields here that are safe to expose
// to anonymous callers — never user- or tenant-scoped data.
func (h *Handler) GetConfig(w http.ResponseWriter, r *http.Request) {
	config := AppConfig{
		AllowSignup:               os.Getenv("ALLOW_SIGNUP") != "false",
		GoogleClientID:            os.Getenv("GOOGLE_CLIENT_ID"),
		WorkspaceCreationDisabled: os.Getenv("DISABLE_WORKSPACE_CREATION") == "true",
	}
	if h.Storage != nil {
		config.CdnDomain = h.Storage.CdnDomain()
	}
	config.DaemonServerURL, config.DaemonAppURL = daemonSetupURLsFromEnv()

	// Re-read from env on every request so operators can rotate keys via
	// secret refresh without a server restart.
	if v := os.Getenv("ANALYTICS_DISABLED"); v != "true" && v != "1" {
		config.PosthogKey = os.Getenv("POSTHOG_API_KEY")
		config.PosthogHost = os.Getenv("POSTHOG_HOST")
		config.AnalyticsEnvironment = analytics.EnvironmentFromEnv()
		if config.PosthogHost == "" && config.PosthogKey != "" {
			config.PosthogHost = "https://us.i.posthog.com"
		}
	}
	writeJSON(w, http.StatusOK, config)
}

func daemonSetupURLsFromEnv() (string, string) {
	serverURL := normalizePublicURL(os.Getenv("MULTICA_PUBLIC_URL"))
	appURL := normalizePublicURL(os.Getenv("MULTICA_APP_URL"))
	if appURL == "" {
		appURL = normalizePublicURL(os.Getenv("FRONTEND_ORIGIN"))
	}
	if appURL == "" {
		return "", ""
	}
	if serverURL == "" {
		serverURL = appURL
	}
	if isOfficialCloudDaemonConfig(appURL) {
		return "", ""
	}
	return serverURL, appURL
}

func normalizePublicURL(raw string) string {
	return strings.TrimRight(strings.TrimSpace(raw), "/")
}

// isOfficialCloudDaemonConfig reports whether this deployment is the official
// Multica Cloud, identified by its frontend host alone (multica.ai /
// app.multica.ai). The daemon setup for the managed cloud is always
// `multica setup` (which hardcodes api.multica.ai), so the per-deployment URLs
// must be omitted from /api/config even when MULTICA_PUBLIC_URL is unset or
// misconfigured. Previously this also required serverURL==api.multica.ai, so a
// cloud deployment that forgot MULTICA_PUBLIC_URL fell through and emitted a
// `setup self-host --server-url https://multica.ai` command — pointing the
// daemon's backend at the frontend (no /health, no WebSocket proxy).
func isOfficialCloudDaemonConfig(appURL string) bool {
	return urlHostEquals(appURL, "multica.ai") || urlHostEquals(appURL, "app.multica.ai")
}

func urlHostEquals(raw, want string) bool {
	host := canonicalURLHost(raw)
	if host == "" {
		return false
	}
	want = strings.TrimSuffix(strings.ToLower(strings.TrimSpace(want)), ".")
	return host == want
}

func canonicalURLHost(raw string) string {
	raw = strings.TrimSpace(raw)
	u, err := url.Parse(raw)
	if err != nil {
		return ""
	}
	host := u.Hostname()
	if host == "" && !strings.Contains(raw, "://") {
		u, err = url.Parse("https://" + raw)
		if err != nil {
			return ""
		}
		host = u.Hostname()
	}
	return strings.TrimSuffix(strings.ToLower(host), ".")
}