ShardedStreamRelay shard readers previously started with lastID="$",
which meant events published while a pod was down were silently lost.
Replace the cursor with a bounded time-window start ID derived
from (now - ReplayGrace), defaulting to 5 minutes. The timestamp is
clamped to a minimum of 0 so a misconfigured clock cannot yield a negative ID.
Key changes:
- Add ReplayGrace field to ShardedStreamRelayConfig (default 5m)
- Add replayStartID() helper with non-negative clamp
- Extract readShardOnce() from readShard() for testability
- Add REALTIME_RELAY_REPLAY_GRACE env var for runtime tuning
- Add regression tests for bounded cursor and replay behavior
Closes #4797
* fix(realtime): allow same-origin WebSocket clients (mobile/CLI)
The previous CheckOrigin implementation (PR #2318) bypassed the Origin
check whenever the request URL carried `client_platform=mobile` and no
browser session cookie. That contract requires every native client to
remember to add a query parameter — and in practice mobile clients hit
ws://localhost:8080/ws with no extra params, so the Origin filled by
the WebSocket library (the server's own host) gets rejected.
Replace the platform-specific bypass with same-origin acceptance: if
Origin's host equals the request Host, allow the upgrade. This is
gorilla/websocket's default CheckOrigin behavior, restored alongside
the existing cross-origin allowlist (for browser web/desktop clients).
Native clients are now zero-config. CSRF defense is unaffected:
SameSite=Strict cookies, the multica_csrf token, workspace membership
check, and the allowlist itself remain in place. Browser CSWSH attacks
fail both same-origin (browser forces Origin = page origin, not the
server's Host) and allowlist checks.
Refs: https://pkg.go.dev/github.com/gorilla/websocket
https://cheatsheetseries.owasp.org/cheatsheets/WebSocket_Security_Cheat_Sheet.html
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>
* fix(realtime): use case-insensitive Host comparison for same-origin
HTTP host is case-insensitive (RFC 7230 §2.7.3), and gorilla/websocket's
default checkSameOrigin uses equalASCIIFold(u.Host, r.Host). The plain
== comparison would reject legitimate same-origin requests with a
case-mismatched Host header (e.g. Host: LOCALHOST:8080 vs
Origin: http://localhost:8080).
Switch to strings.EqualFold and cover the case with a regression test.
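For reference, a rough sketch of the same-origin comparison with the case-insensitive match. The function names, the slice-based allowlist, and the choice to reject a missing or unparsable Origin are assumptions; the real CheckOrigin may treat those cases differently:

```go
package main

import (
	"net/url"
	"strings"
)

// sameOriginHost reports whether the Origin header's host matches the request
// Host, ignoring case. Hypothetical helper, not the actual implementation.
func sameOriginHost(origin, requestHost string) bool {
	u, err := url.Parse(origin)
	if err != nil || u.Host == "" {
		return false // assumption: missing or malformed Origin is not same-origin
	}
	return strings.EqualFold(u.Host, requestHost)
}

// originAllowed accepts same-origin upgrades first, then falls back to the
// cross-origin allowlist. The allowlist shape is an assumption.
func originAllowed(origin, requestHost string, allowlist []string) bool {
	if sameOriginHost(origin, requestHost) {
		return true
	}
	for _, allowed := range allowlist {
		if strings.EqualFold(allowed, origin) {
			return true
		}
	}
	return false
}
```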
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: multica-agent <github@multica.ai>
* feat(realtime): phase 0 — extract Broadcaster interface + add metrics
Phase 0 of the WebSocket horizontal-scaling plan tracked in MUL-1138.
This change is intentionally behavior-preserving: it sets up the seams
needed for later phases (subscribe/unsubscribe protocol, scope-level
fanout, Redis Streams relay) without altering any wire protocol or
producer call sites.
What changed
- New realtime.Broadcaster interface covering the three fanout methods
producers already use on *Hub (BroadcastToWorkspace, SendToUser,
Broadcast). *Hub continues to satisfy it; a future Redis-backed
implementation can be dropped in without touching listeners.
- registerListeners now depends on realtime.Broadcaster instead of
*realtime.Hub, isolating the bus → realtime fanout layer behind an
interface.
- New realtime.Metrics singleton with atomic counters: connects,
disconnects, active connections, slow-client evictions, total
messages sent/dropped, and per-event-type send counters. Wired into
Hub register/unregister/broadcast paths and into every listener.
- New GET /health/realtime endpoint returning a JSON snapshot of the
metrics so we can observe baseline fanout pressure before phase 1.
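A sketch of the seam described above. The three method names come from this commit; the parameter lists, the recording double, and notifyIssueCreated are hypothetical and only show how a listener can depend on the interface instead of *Hub:

```go
package main

// Broadcaster lists the three fanout methods named in the commit message.
// The parameter lists are assumptions; the real signatures may differ.
type Broadcaster interface {
	BroadcastToWorkspace(workspaceID string, message []byte)
	SendToUser(userID string, message []byte)
	Broadcast(message []byte)
}

// recordingBroadcaster is a hypothetical stand-in for *Hub (or a future
// Redis-backed implementation) that records which fanout path was used.
type recordingBroadcaster struct{ calls []string }

func (r *recordingBroadcaster) BroadcastToWorkspace(workspaceID string, _ []byte) {
	r.calls = append(r.calls, "workspace:"+workspaceID)
}

func (r *recordingBroadcaster) SendToUser(userID string, _ []byte) {
	r.calls = append(r.calls, "user:"+userID)
}

func (r *recordingBroadcaster) Broadcast(_ []byte) {
	r.calls = append(r.calls, "all")
}

// notifyIssueCreated is a hypothetical listener body. It only sees the
// interface, so swapping the implementation does not touch it.
func notifyIssueCreated(b Broadcaster, workspaceID string, payload []byte) {
	b.BroadcastToWorkspace(workspaceID, payload)
}
```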
Why phase 0 first
GPT-Boy's only-Redis plan and CC-Girl's review both call out the same
prerequisite: get a Broadcaster seam and visibility in place before
introducing scope-level subscriptions or a Redis relay. Doing this as
a standalone step keeps each later PR focused and trivially revertable.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* feat(realtime): only-Redis fanout — scopes, subscribe protocol, Redis Streams relay (MUL-1138)
Implements the final-version plan agreed in MUL-1138 on top of phase 0:
* Hub: 4 scope types (workspace/user/task/chat), per-client subscription
set, subscribe/unsubscribe WS frames, ScopeAuthorizer hook for
task/chat scope auth, first/last-subscriber callbacks for the relay,
workspace+user auto-subscribe on connect.
* RedisRelay: Broadcaster impl that XADDs every event into
ws:scope:{type}:{id}:stream and XREADGROUPs only the scopes for which
this node has live subscribers. Per-node consumer group, heartbeat,
stale-consumer sweeper, MAXLEN cap, lag/disconnect metrics.
* Listeners: route task:* events to ScopeTask, chat:* events to
ScopeChat; workspace remains the default for everything else.
* events.Event: optional TaskID / ChatSessionID hints so the listener
layer can pick the right scope without re-parsing payloads.
* Handler: publishTask / publishChat helpers; chat + task message
publishers updated to use them.
* main.go: when REDIS_URL is set, wrap the hub with NewRedisRelay and
pass the relay (instead of the hub) to registerListeners. A
db-backed ScopeAuthorizer enforces that task/chat subscribes belong
to the caller's workspace.
* Metrics: per-scope subscribe/deny counters, redis connect state, node
id, lag/dropped counters surfaced via /health/realtime.
Behavior in single-node mode (REDIS_URL unset) is unchanged.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(realtime): address PR #1429 review must-fix items (MUL-1138)
- listeners: keep task/chat events on workspace fanout until the WS
client supports scope-subscribe + reconnect-replay. Routing them
through BroadcastToScope today (without any client subscriber) would
silently drop every chat / task message and break the live timeline,
chat unread badges, and pending-task UI. The server-side scope infra
(Hub subscribe/unsubscribe, ScopeAuthorizer, Redis Streams relay)
stays in place so flipping the switch in the client follow-up PR is
a one-line change.
- scope_authorizer: ScopeChat now enforces CreatorID == userID, mirroring
the HTTP layer (handler/chat.go: GetChatSession / SendChatMessage /
MarkChatSessionRead). Without this, any workspace member who learned a
session_id could subscribe to chat:message / chat:done /
chat:session_read for a peer's private chat. The same creator-only
check is applied to ScopeTask when the task is a chat task
(task.ChatSessionID set). Issue tasks remain workspace-scoped.
- Refactor scope authorizer to depend on a narrow scopeAuthQuerier
interface so its decisions can be unit-tested without a live DB.
- Add tests:
* listeners_scope_test.go pins the workspace-fanout fallback for
task:message / task:progress / chat:message / chat:done /
chat:session_read.
* scope_authorizer_test.go covers chat creator-only access, chat-task
creator-only access, and issue-task workspace-only access (creator
allowed, peer denied, cross-workspace denied, missing session
denied, empty userID denied).
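The creator-only chat rule and the narrow querier can be sketched roughly as follows. The chatSession fields, the GetChatSession method on the querier, and the fakeQuerier double are assumptions for illustration, not the real scopeAuthQuerier:

```go
package main

import "errors"

// chatSession holds only the fields the decision needs (assumed shape).
type chatSession struct {
	WorkspaceID string
	CreatorID   string
}

// scopeAuthQuerier is a narrow lookup interface so the decision can be
// unit-tested without a live DB. The method name is hypothetical.
type scopeAuthQuerier interface {
	GetChatSession(sessionID string) (chatSession, error)
}

// canSubscribeChat allows a chat-scope subscribe only for the session's
// creator inside the caller's workspace.
func canSubscribeChat(q scopeAuthQuerier, workspaceID, userID, sessionID string) bool {
	if userID == "" {
		return false
	}
	s, err := q.GetChatSession(sessionID)
	if err != nil {
		return false // missing session or lookup failure: deny
	}
	return s.WorkspaceID == workspaceID && s.CreatorID == userID
}

// fakeQuerier is an in-memory double keyed by session ID.
type fakeQuerier map[string]chatSession

func (f fakeQuerier) GetChatSession(id string) (chatSession, error) {
	s, ok := f[id]
	if !ok {
		return chatSession{}, errors.New("chat session not found")
	}
	return s, nil
}
```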
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: CC-Girl <cc-girl@multica.ai>
* feat: identify clients via X-Client-Platform/Version/OS
Adds client identification headers (and matching WS query params) across
all first-party clients so the server can split logs/metrics/gating by
caller without parsing User-Agent.
- HTTP: X-Client-Platform, X-Client-Version, X-Client-OS
- WS: client_platform, client_version, client_os query params
- Platform ∈ {web, desktop, cli, daemon}; OS ∈ {macos, windows, linux}
Wired through the shared TS ApiClient/WSClient via a new identity option
on CoreProvider. Web reads its version from package.json/env; Desktop
captures version + OS synchronously in preload via sendSync IPC. Go CLI
and daemon clients populate the same headers using runtime.GOOS
(normalized darwin → macos).
Server-side adds a ClientMetadata middleware that stashes the headers in
request context; the request logger and logger.RequestAttrs surface them
on every access log and handler-level log. Realtime hub logs the same
fields on websocket connect.
CORS allowlist extended for the new headers.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* test: address client-identity PR nits
- Memoize the CoreProvider identity object on Web and Desktop, and key
WSProvider's effect on identity primitives instead of the object
reference, so unrelated parent re-renders no longer tear down and
reconnect the WebSocket.
- Add direct header-injection tests for the CLI and daemon Go HTTP
clients (X-Client-Platform/Version/OS) and a normalizeGOOS unit test
on both packages.
- Add a TS test for WSClient that asserts client_platform/client_version/
client_os land on the upgrade URL and never leak the auth token.
- Add a hub test that dials the WS endpoint with client_* query params
and asserts the "websocket connected" log entry surfaces them as
structured attributes.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Reapply "feat: workspace URL refactor + slug-first API identity (#1131)" (#1137)
This reverts commit 9b94914bc8.
* compat: legacy URL redirect + localStorage double-write for safe rollback
The first attempt at this refactor (#1131) was reverted because existing
users on old URLs (/issues, /projects, etc.) hit 404 immediately after
deploy, and rolling back left them with empty dashboards — the legacy
code reads localStorage["multica_workspace_id"] to attach a workspace
to API requests, but the new code had stopped writing that key.
Two compat layers added on top of the restored refactor:
1. proxy.ts now intercepts legacy route prefixes (/issues/*, /projects/*,
/agents/*, /inbox/*, /my-issues/*, /autopilots/*, /runtimes/*,
/skills/*, /settings/*). Logged-in users with a last_workspace_slug
cookie are 302'd to /{slug}/{rest}, preserving their deep link. Users
without the cookie bounce through / where the landing page picks a
workspace client-side. Unauthenticated users go to /login.
2. Both layouts now double-write the workspace id to the legacy
localStorage key on every workspace entry. New code ignores this key
— it exists solely so that if this PR ever gets reverted again, the
legacy build reading the key would still find the correct workspace
and avoid the empty-dashboard symptom users saw during the rollback.
Net effect: any direction of deploy ↔ rollback is now cache-compatible,
and any direction of old bookmark → new route resolves without 404.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(platform): defer rehydrateAllWorkspaceStores to a microtask
Same React 19 render-phase restriction that forced setCurrentWorkspace
to defer its subscriber notifications. rehydrateAllWorkspaceStores
synchronously calls each persist store's rehydrate, which setState()s
the store, which schedules updates on any subscribed component. When
the workspace layout's render-phase ref guard invoked this, React
complained that SearchCommand (a store subscriber) couldn't be
re-rendered while WorkspaceLayout was still rendering.
Fix: queueMicrotask the rehydrate loop and add a pending-flag guard so
rapid workspace switches coalesce into one rehydrate on the final slug.
Persist stores tolerate one microtask of staleness — they hold UI
preferences, not correctness-critical state.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: workspace URL refactor + slug-first API identity
Make the URL the single source of truth for workspace identity.
All workspace-scoped URLs now carry the workspace slug as the first
path segment (/{slug}/issues, /{slug}/projects, etc.), matching the
industry standard (Linear, Notion, Vercel, GitHub).
## Key architectural changes
**URL-driven workspace identity:**
- Web routes moved under app/[workspaceSlug]/(dashboard)/
- Desktop routes nested under /:workspaceSlug
- paths.ts builder centralises all URL construction
- reserved-slugs validation (backend + frontend + DB migration audit)
**Slug-first API contract:**
- Frontend sends X-Workspace-Slug header (from URL) instead of X-Workspace-ID (UUID)
- Backend middleware resolves slug → UUID via GetWorkspaceBySlug, falls back to
X-Workspace-ID for CLI/daemon backwards compatibility
- WebSocket auth accepts ?workspace_slug query param with SlugResolver callback
**State cleanup:**
- Deleted: useWorkspaceStore (Zustand mirror), switchWorkspace/hydrateWorkspace/
clearWorkspace, localStorage["multica_workspace_id"], api._workspaceId
- useCurrentWorkspace() derives from URL slug + React Query workspace list
- useWorkspaceId() is now a bridge hook (no Context, derives from useCurrentWorkspace)
- WorkspaceIdProvider removed from DashboardGuard
- Paired module vars (slug + UUID) in workspace-storage.ts for non-React consumers
**Layout simplified:**
- Render-phase ref guard sets workspace context synchronously (no async gate)
- DashboardGuard handles auth redirect, loading state, and workspace resolution
- Subscriber notifications deferred via queueMicrotask (React 19 compat)
- persist namespace uses slug (immutable) instead of UUID
## Issues resolved
MUL-43 (share links), MUL-509 (mobile workspace switch), MUL-723 (workspace in URL),
MUL-727 (create workspace flash), MUL-728 (delete workspace no-navigate),
MUL-820 (sidebar Join not switching)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve code review C3/C4/C5/C6 — desktop deadlock + hardcoded paths
C3: Desktop OnboardingGate was calling useCurrentWorkspace() outside
WorkspaceSlugProvider → always null → permanent onboarding deadlock.
Rewrite to use useQuery(workspaceListOptions()) which reads React Query
cache directly without slug context. Remove DashboardGuard from
DesktopShell (auth gating handled by AppContent, workspace routing by
WorkspaceRouteLayout per-tab).
C4: Landing page "Dashboard" links hardcoded /issues (no longer valid).
Changed to / — proxy handles redirect to /{lastSlug}/issues.
C5: autopilots-page.tsx had one hardcoded /autopilots/${id} link.
Changed to wsPaths.autopilotDetail(id).
C6: inbox-page.tsx hardcoded /inbox paths. Changed to wsPaths.inbox().
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(desktop): wrap shell in WorkspaceSlugProvider from module var
AppSidebar calls useWorkspacePaths() → useRequiredWorkspaceSlug() which
throws outside WorkspaceSlugProvider. In the desktop shell, the sidebar
renders at the shell level (outside any tab's WorkspaceRouteLayout).
Fix: DesktopShell reads the current slug via useSyncExternalStore on
the workspace-storage singleton. When slug is available, wraps the
entire shell in WorkspaceSlugProvider. When null (first mount before
any tab's WorkspaceRouteLayout sets it), shows a loading spinner.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(desktop): migrate old tab paths + fix shell slug deadlock
Tab store rehydration: old-format paths like "/issues/abc" (missing
workspace slug prefix) are reset to "/" so IndexRedirect picks the
correct workspace. Detection: if the first segment is a known route
name (issues, projects, etc.) rather than a workspace slug, it's an
old-format path.
Desktop shell: TabContent must always render (not gated behind slug
check) so WorkspaceRouteLayout can mount and call setCurrentWorkspace.
Only sidebar and shell-level UI (chat, modals, search) gate on slug
being present.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without a heartbeat, dead or silently-dropped WebSocket connections are
not detected until the next write fails. This causes goroutine and memory
leaks for each stale client, and breaks real-time updates for users whose
connections are dropped by a load balancer or proxy idle timeout (e.g.
Nginx default 60s, AWS ALB default 60s) without a TCP RST.
This commit applies the standard gorilla/websocket keepalive pattern:
- writePump sends a ping frame every pingPeriod (54 s) using a ticker.
The ticker replaces the simple range-over-channel loop with a select,
which also adds a proper write deadline on every write operation.
- readPump installs a pong handler that resets the read deadline on each
pong, keeping healthy connections alive indefinitely. A connection
that misses a pong is detected within pongWait (60 s) and closed,
which causes readPump to exit and send the client to hub.unregister
for clean removal.
Timing constants:
writeWait = 10 s (per-write deadline, prevents hung writers)
pongWait = 60 s (max silence before declaring a connection dead)
pingPeriod = 54 s (ping interval, 90 % of pongWait)
Also adds user_id and workspace_id to the write-error log line so that
connection problems can be attributed to a specific client in production.
All existing hub tests continue to pass unchanged.
Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
* fix(security): use first-message auth for WebSocket instead of URL query param
Token was exposed in URL query parameters (HIGH-4 from security audit),
visible in server/proxy logs, browser history, and referrer headers.
Now non-cookie clients (desktop, CLI) send the token as the first
WebSocket message after the connection opens. Cookie-based auth (web)
continues to work unchanged. Server-side auth priority flipped to
cookie-first.
Closes MUL-580
* fix(security): add auth_ack and fix test JSON construction
Server sends auth_ack after successful first-message auth so the client
knows auth completed before firing reconnect callbacks. Test now uses
json.Marshal instead of string concatenation for the auth message.
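To illustrate why the test builds the frame with json.Marshal, here is a sketch of encoding and parsing a first-message auth frame. The wire shape ({"type":"auth","token":...}) and both function names are assumptions; the actual field names are not given in this commit:

```go
package main

import "encoding/json"

// authMessage is a hypothetical wire shape for the first WebSocket frame.
type authMessage struct {
	Type  string `json:"type"`
	Token string `json:"token"`
}

// encodeAuthMessage builds the frame with json.Marshal so that quotes or
// backslashes in the token are escaped correctly.
func encodeAuthMessage(token string) ([]byte, error) {
	return json.Marshal(authMessage{Type: "auth", Token: token})
}

// parseAuthMessage extracts the token from the first frame, rejecting any
// frame that is not a well-formed auth message.
func parseAuthMessage(frame []byte) (string, bool) {
	var m authMessage
	if err := json.Unmarshal(frame, &m); err != nil {
		return "", false
	}
	if m.Type != "auth" || m.Token == "" {
		return "", false
	}
	return m.Token, true
}
```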
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(test): update WebSocket integration test for first-message auth
The integration test still passed the token as a URL query param,
causing a timeout since the server now expects first-message auth
for non-cookie clients.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: yushen <ldnvnbl@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add online status indicator dot on agent & member avatars
Backend:
- Track member presence via WebSocket connections in the Hub
- Broadcast member:online/offline events when users connect/disconnect
- Add GET /api/workspaces/{id}/members/online endpoint
- Add member:online and member:offline event type constants
Frontend:
- Add isOnline prop to ActorAvatar with a status dot at top-right corner
- Green dot = online, gray dot = offline, no dot = status unknown
- Fetch online member list via new query, update optimistically on WS events
- Derive agent online status from existing agent.status field
- Wire online status through ActorAvatar views wrapper (enabled by default)
* fix: address code review — fix hub tests and avatar rounding
1. Hub tests: consume the member:online presence event from the first
connection before asserting on broadcast messages.
2. ActorAvatar: use rounded-[inherit] on the inner wrapper so callers
can override rounding (e.g. rounded-lg for agent list items).
* fix: consume member:online presence event in integration test
Same fix as the hub unit tests — read and discard the member:online
event before asserting on issue:created in TestWebSocketIntegration.
* feat(auth): migrate auth token to HttpOnly cookie & implement WebSocket Origin whitelist
Security improvements from the MUL-566 audit report:
1. Auth token is now set as an HttpOnly, SameSite=Lax cookie on login,
preventing XSS-based token theft. Cookie-based auth includes CSRF
protection via double-submit cookie pattern. The Authorization header
path is preserved for Electron desktop app and CLI/PAT clients.
2. WebSocket upgrader now validates the Origin header against a
configurable allowlist (ALLOWED_ORIGINS env var), rejecting
connections from unauthorized origins.
Backend: new auth cookie helpers, middleware reads cookie as fallback,
WS handler accepts cookie auth, Origin whitelist, logout endpoint.
Frontend: CSRF token in API headers, cookie-aware auth store and WS
client, web app opts into cookieAuth mode while desktop keeps tokens.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(auth): address PR review — Strict cookies, HMAC-bound CSRF, origin sync
1. SameSite=Lax → SameSite=Strict per spec requirement
2. CSRF token now HMAC-signed with auth token (nonce.signature format),
preventing subdomain cookie injection attacks
3. allowedWSOrigins uses atomic.Value to eliminate data race
4. Removed magic "cookie" sentinel string in WSProvider — pass null token
and guard with boolean check instead
5. Removed dead delete uploadHeaders["Content-Type"] in API client
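A sketch of the nonce.signature CSRF token from item 2, assuming HMAC-SHA256 keyed by the auth token, a 16-byte random nonce, and hex encoding. Those three choices and the function names are assumptions; only the nonce.signature format and the binding to the auth token come from this commit:

```go
package main

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"strings"
)

// signNonce computes the hex HMAC-SHA256 of the nonce, keyed by the auth token.
func signNonce(nonce, authToken string) string {
	mac := hmac.New(sha256.New, []byte(authToken))
	mac.Write([]byte(nonce))
	return hex.EncodeToString(mac.Sum(nil))
}

// newCSRFToken returns "nonce.signature" bound to the given auth token.
func newCSRFToken(authToken string) (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	nonce := hex.EncodeToString(buf)
	return nonce + "." + signNonce(nonce, authToken), nil
}

// validCSRFToken recomputes the signature and compares it in constant time,
// so a cookie injected without knowledge of the auth token fails the check.
func validCSRFToken(token, authToken string) bool {
	nonce, sig, ok := strings.Cut(token, ".")
	if !ok || nonce == "" {
		return false
	}
	return hmac.Equal([]byte(sig), []byte(signNonce(nonce, authToken)))
}
```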
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The /ws endpoint only accepted JWT tokens while REST /api/* routes
accepted both JWTs and PATs (mul_*). Add PATResolver interface and
wire it into HandleWebSocket so PAT holders can use WebSocket streaming.
Also update README (en + zh-CN) to list OpenClaw and OpenCode as
supported agent runtimes alongside Claude Code and Codex.
Inbox events (new, read, archived, batch) are now sent via SendToUser
instead of broadcasting to the entire workspace room. Adds a new
Hub.SendToUser method. Also guards task broadcasts against deleted
issues to prevent global event leaks.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace raw fmt/log calls with structured slog logger (Go) and
console-based logger (TypeScript). Add request logging middleware.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix useRealtimeSync never receiving WSClient (useRef → useState for
re-render trigger, keeping ref for lazy subscribe callback)
- Fix Hub.Run() global broadcast mutating map under RLock (same
two-phase collect+cleanup pattern as BroadcastToWorkspace)
- Move visibleStatuses to module-level constant (prevent useCallback
recreation every render)
- Replace console.error with toast.error for user-facing operations
in issues page and inbox page
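The two-phase collect+cleanup pattern mentioned above can be sketched as follows. The hub and client types are simplified stand-ins, not the real Hub: slow clients are collected under RLock and removed only after upgrading to the write lock:

```go
package main

import "sync"

// client is a simplified stand-in with a buffered outbound queue.
type client struct{ send chan []byte }

// hub is a simplified stand-in for the real Hub.
type hub struct {
	mu      sync.RWMutex
	clients map[*client]bool
}

// broadcast never mutates the map while holding only the read lock.
func (h *hub) broadcast(msg []byte) {
	// Phase 1: fan out under RLock and collect clients whose queue is full.
	var slow []*client
	h.mu.RLock()
	for c := range h.clients {
		select {
		case c.send <- msg:
		default:
			slow = append(slow, c)
		}
	}
	h.mu.RUnlock()

	if len(slow) == 0 {
		return
	}

	// Phase 2: take the write lock and evict the collected clients.
	h.mu.Lock()
	for _, c := range slow {
		if h.clients[c] { // may already have been removed concurrently
			delete(h.clients, c)
			close(c.send)
		}
	}
	h.mu.Unlock()
}
```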
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add internal event bus (server/internal/events/) with synchronous
pub/sub and panic isolation per listener
- Upgrade WebSocket Hub to workspace-scoped rooms with JWT auth
and membership verification on connect
- Add 10 new WS event types (comment CRUD, inbox read/archive,
agent create/delete, workspace/member events)
- Refactor all handlers and TaskService to publish events via Bus
instead of direct Hub.Broadcast calls
- Add WS broadcast listener that routes events to correct workspace
- Frontend: WSClient sends token + workspace_id on connect with
auto-reconnect refetch
- Frontend: centralized useRealtimeSync hook dispatches all WS
events to global Zustand stores
- Migrate issues and inbox pages from local useState to global
useIssueStore/useInboxStore
- Make store addIssue/addItem idempotent to prevent duplicates
- Remove dead packages/hooks/src/use-realtime.ts
- Add feature tracking files for 4 planned features
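A rough sketch of a synchronous bus with per-listener panic isolation, in the spirit of server/internal/events/. The Event fields, method names, and the silent recover are assumptions; the real bus presumably logs the recovered panic:

```go
package main

import "sync"

// Event is a simplified event shape (assumed fields).
type Event struct {
	Type        string
	WorkspaceID string
	Payload     any
}

// Handler is a listener callback.
type Handler func(Event)

// Bus is a minimal synchronous pub/sub.
type Bus struct {
	mu       sync.RWMutex
	handlers map[string][]Handler
}

func NewBus() *Bus {
	return &Bus{handlers: map[string][]Handler{}}
}

// Subscribe registers a handler for one event type.
func (b *Bus) Subscribe(eventType string, h Handler) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.handlers[eventType] = append(b.handlers[eventType], h)
}

// Publish calls every handler for the event type in registration order on the
// caller's goroutine. A panic in one handler does not stop the others.
func (b *Bus) Publish(e Event) {
	b.mu.RLock()
	hs := append([]Handler(nil), b.handlers[e.Type]...)
	b.mu.RUnlock()
	for _, h := range hs {
		safeCall(h, e)
	}
}

// safeCall isolates a single listener by recovering from its panic.
func safeCall(h Handler, e Event) {
	defer func() { _ = recover() }()
	h(e)
}
```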
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add JWT middleware unit tests (8 tests covering all auth edge cases)
- Add WebSocket hub tests (5 tests for client lifecycle and broadcast)
- Add full HTTP integration tests (12 tests through real Chi router with DB)
- Add frontend component tests for login, issues, and issue detail pages
- Add auth context unit tests (9 tests for login/logout/name resolution)
- Add Playwright E2E tests for auth, issues, comments, and navigation
- Configure Vitest with jsdom, React plugin, and path aliases
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>