mirror of
https://github.com/multica-ai/multica.git
synced 2026-06-17 03:38:32 +02:00
* feat(server/auth): cache PAT lookups in Redis with 60s TTL
Personal access tokens used to hit Postgres on every request: a SELECT
to resolve token_hash → user_id, plus a fire-and-forget UPDATE of
last_used_at. For a CLI / daemon making many requests per second this
is wasted DB load — the token is the same and the answer hasn't changed.
Add a Redis-backed cache (auth.PATCache) keyed by token hash, TTL 60s:
- On cache hit, the auth middleware skips both the SELECT and the
  last_used_at UPDATE. last_used_at is now refreshed at most once per
  TTL window per token, not per request.
- On cache miss the middleware falls back to today's behavior: query
  Postgres, populate the cache, async-update last_used_at.
- On revoke, the handler invalidates the cache entry so revocation
  takes effect immediately rather than waiting for the TTL to expire.
  This required changing RevokePersonalAccessToken from :exec to :one
  RETURNING token_hash.
The cache is nil-safe: when REDIS_URL isn't configured, NewPATCache
returns nil and the middleware degrades to today's always-hit-DB
behavior. JWT validation is untouched (already DB-free).
Tested with REDIS_TEST_URL — same gating pattern the rest of the
suite uses for Redis-backed tests. New tests cover nil-safety, set/
get/invalidate, TTL, and the middleware short-circuit on cache hit.
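The hit/miss flow described above can be sketched without Redis or Postgres. This is a minimal illustration, not the middleware itself: tokenCache, tokenStore, and resolve are hypothetical names, the cache is a plain map with no TTL, and the last_used_at refresh runs inline rather than asynchronously.

```go
package main

import "fmt"

// tokenCache is a hypothetical in-memory stand-in for the Redis-backed
// PATCache (token hash -> user ID). TTL expiry is not modeled here.
type tokenCache map[string]string

// tokenStore is a hypothetical stand-in for Postgres that counts how
// often it is touched.
type tokenStore struct {
	users   map[string]string // token hash -> user ID
	selects int               // SELECT-equivalent lookups
	touches int               // last_used_at refreshes
}

func (s *tokenStore) lookup(hash string) (string, bool) {
	s.selects++
	id, ok := s.users[hash]
	return id, ok
}

func (s *tokenStore) touch(hash string) { s.touches++ }

// resolve follows the flow from the commit message: a cache hit skips
// both the lookup and the last_used_at refresh; a miss queries the
// store, populates the cache, and refreshes last_used_at.
func resolve(c tokenCache, s *tokenStore, hash string) (string, bool) {
	if id, ok := c[hash]; ok {
		return id, true
	}
	id, ok := s.lookup(hash)
	if !ok {
		return "", false // unknown token: nothing is cached
	}
	c[hash] = id
	s.touch(hash) // assumption: the real middleware does this asynchronously
	return id, true
}

func main() {
	store := &tokenStore{users: map[string]string{"abc": "user-1"}}
	cache := tokenCache{}
	for i := 0; i < 5; i++ {
		resolve(cache, store, "abc")
	}
	// Five requests, one store lookup and one last_used_at refresh.
	fmt.Println(store.selects, store.touches) // prints "1 1"
}
```

Only the first request reaches the store; the remaining four are served from the cache, which is the per-request load the commit removes.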
* fix(server/auth): clamp PAT cache TTL to token's remaining lifetime
GPT-Boy review caught: a PAT expiring in <60s would still be cached
for the full PATCacheTTL window, so the token could continue passing
auth on cache hit for up to ~60s after its expires_at. The DB query
filters expired tokens (revoked = FALSE AND expires_at > now()), but
that filter never ran on a cache hit.
Make Set take an explicit ttl, and add TTLForExpiry to compute it:
- no expires_at    → full PATCacheTTL
- expires_at far   → full PATCacheTTL
- expires_at <60s  → time until expiry
- already expired  → 0, Set skips caching (TOCTOU defense between
                     the SELECT and the Set, since the SELECT
                     already filters expired rows)
Regression test pins the clamp behavior end-to-end against Redis.
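The four clamp cases can be worked through with fixed timestamps. The sketch below assumes the 60s PATCacheTTL from this commit and reimplements TTLForExpiry from the description above; exampleTTLs and the chosen offsets are illustrative only.

```go
package main

import (
	"fmt"
	"time"
)

// PATCacheTTL uses the 60s value named in this commit.
const PATCacheTTL = 60 * time.Second

// TTLForExpiry clamps the cache TTL to the token's remaining lifetime.
// A zero expiresAt means the token never expires.
func TTLForExpiry(now, expiresAt time.Time) time.Duration {
	if expiresAt.IsZero() {
		return PATCacheTTL
	}
	remaining := expiresAt.Sub(now)
	if remaining <= 0 {
		return 0 // already expired: the caller should skip caching
	}
	if remaining < PATCacheTTL {
		return remaining
	}
	return PATCacheTTL
}

// exampleTTLs evaluates the four cases against a fixed "now"
// (hypothetical helper, for illustration only).
func exampleTTLs() []time.Duration {
	now := time.Date(2026, 1, 1, 12, 0, 0, 0, time.UTC)
	return []time.Duration{
		TTLForExpiry(now, time.Time{}),              // no expires_at
		TTLForExpiry(now, now.Add(time.Hour)),       // expires_at far
		TTLForExpiry(now, now.Add(20*time.Second)),  // expires_at <60s
		TTLForExpiry(now, now.Add(-1*time.Second)),  // already expired
	}
}

func main() {
	for _, ttl := range exampleTTLs() {
		fmt.Println(ttl)
	}
	// prints:
	// 1m0s
	// 1m0s
	// 20s
	// 0s
}
```

A token with 20 seconds left is cached for 20 seconds, so a cache hit can no longer outlive expires_at.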
* feat(server/auth): cache daemon-token + PAT lookups in DaemonAuth, bump TTL to 10m
Daemon /api/daemon/* requests (heartbeat, claim task) hit DaemonAuth
which previously did its own GetDaemonTokenByHash on every request and
*also* duplicated the PAT lookup on the mul_ fallback — bypassing the
cache added in 1cdd674c. Today's daemons authenticate via mul_ PATs
(mdt_ minting isn't wired up yet), so the duplicate PAT path is the one
that actually matters for hot-path DB load.
Three changes:
1. New auth.DaemonTokenCache mirrors PATCache for the mdt_ path
   (key = mul:auth:daemon:<sha256>, JSON value = {workspace_id, daemon_id}).
   Forward-looking infrastructure for when daemon tokens get minted; the
   middleware short-circuits the DB SELECT on cache hit. TTL clamped to
   the token's expires_at via the shared TTLForExpiry helper.
2. DaemonAuth now also consults PATCache on its mul_ fallback, sharing
   the same cache as the regular Auth middleware. A daemon sending 4
   heartbeats/min collapses from 4 GetPersonalAccessTokenByHash + 4
   last_used_at writes per minute to ~1 of each per AuthCacheTTL window
   (~10 minutes).
3. Rename PATCacheTTL → AuthCacheTTL and bump from 60s to 10 minutes.
   The constant is now shared between PAT and daemon caches; 10m matches
   the user-requested longer TTL for further DB write reduction. Revoke
   latency on the happy path is still instant via active invalidation;
   the worst case (Redis Del miss / direct-DB revoke) grows from ~60s to
   ~10m.
Tests cover nil-safety, set/get/invalidate, TTL, clamped TTL on near-
expiry tokens, and the middleware short-circuit for both cache paths
(mdt_ via DaemonTokenCache, mul_ fallback via PATCache).
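The daemon-token cache entry described in item 1 can be sketched as follows. The key prefix and JSON field names come from the commit message; the daemonIdentity type, the helper names, and the hex encoding of the SHA-256 digest are assumptions made for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// daemonCachePrefix is the key namespace named in the commit message.
const daemonCachePrefix = "mul:auth:daemon:"

// daemonIdentity is a hypothetical name for the cached JSON value.
type daemonIdentity struct {
	WorkspaceID string `json:"workspace_id"`
	DaemonID    string `json:"daemon_id"`
}

// hashToken returns the SHA-256 of the raw token, hex-encoded
// (the encoding is an assumption; the source only says <sha256>).
func hashToken(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

func daemonCacheKey(hash string) string { return daemonCachePrefix + hash }

// encodeIdentity and decodeIdentity convert the value to and from the
// JSON stored under the cache key.
func encodeIdentity(id daemonIdentity) ([]byte, error) { return json.Marshal(id) }

func decodeIdentity(b []byte) (daemonIdentity, error) {
	var id daemonIdentity
	err := json.Unmarshal(b, &id)
	return id, err
}

func main() {
	key := daemonCacheKey(hashToken("mdt_example-token"))
	val, err := encodeIdentity(daemonIdentity{WorkspaceID: "ws-1", DaemonID: "d-1"})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(key))    // prints 80: 16-byte prefix + 64 hex characters
	fmt.Println(string(val)) // prints {"workspace_id":"ws-1","daemon_id":"d-1"}
}
```

Storing both IDs in one value lets a cache hit rebuild the daemon's identity without any DB round-trip, which a bare user_id string (as in PATCache) could not do.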
* feat(server/auth): cache PAT lookups on the WebSocket auth path
The third place a PAT is resolved — patResolver.ResolveToken used by
realtime.HandleWebSocket — was still hitting Postgres on every /ws
auth and firing an unconditional last_used_at UPDATE, bypassing the
cache added in 1cdd674c. Wire it through the same shared PATCache so
a single Invalidate on revoke takes effect on all three paths (Auth
middleware, DaemonAuth PAT fallback, and WS auth).
Also leaves a comment on DeleteDaemonTokensByWorkspaceAndDaemon —
the query has no caller today, but a future deregister/rotate flow
must remember to call DaemonTokenCache.Invalidate(hash) for each
deleted row, otherwise deleted daemon tokens stay valid until TTL.
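Why one Invalidate is enough, and why a bulk delete needs one call per row, can be sketched with a shared in-memory cache. All names here (sharedCache, authPath, invalidateAll) are hypothetical stand-ins; the real caches are Redis-backed.

```go
package main

import "fmt"

// sharedCache is a hypothetical in-memory stand-in for a Redis-backed
// token cache (token hash -> user ID).
type sharedCache struct {
	entries map[string]string
}

func (c *sharedCache) hit(hash string) bool {
	_, ok := c.entries[hash]
	return ok
}

// invalidate drops a single entry, like one Redis DEL.
func (c *sharedCache) invalidate(hash string) { delete(c.entries, hash) }

// invalidateAll is what a bulk delete would need: one invalidate per
// deleted row's hash (for example, hashes returned by the DELETE).
func (c *sharedCache) invalidateAll(hashes []string) {
	for _, h := range hashes {
		c.invalidate(h)
	}
}

// authPath models one place where a token is resolved. Every path
// holds the same *sharedCache.
type authPath struct {
	name  string
	cache *sharedCache
}

func main() {
	cache := &sharedCache{entries: map[string]string{"h1": "user-1", "h2": "user-2"}}
	paths := []authPath{
		{name: "auth-middleware", cache: cache},
		{name: "daemon-auth-fallback", cache: cache},
		{name: "websocket", cache: cache},
	}

	cache.invalidate("h1") // revoke h1 once

	for _, p := range paths {
		// Each line prints "<name> false true": h1 is gone on every
		// path, h2 is untouched.
		fmt.Println(p.name, p.cache.hit("h1"), p.cache.hit("h2"))
	}
}
```

Because all three paths read the same cache, there is nothing to fan out on revoke; the risk the comment warns about is only a delete path that never calls invalidate at all.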
109 lines
3.5 KiB
Go
package auth

import (
	"context"
	"errors"
	"log/slog"
	"time"

	"github.com/redis/go-redis/v9"
)

// AuthCacheTTL bounds how long a token-hash lookup stays cached before
// the auth middleware goes back to Postgres. Shared by PATCache and
// DaemonTokenCache so both kinds of token follow the same revocation
// latency contract. Short enough that revocation lag from a missed
// invalidation is bounded; long enough that a high-frequency client
// (CLI, daemon) collapses from one DB round-trip per request to one
// per TTL window.
const AuthCacheTTL = 10 * time.Minute

// patCachePrefix namespaces auth-cache keys away from the realtime relay
// (ws:*) and local-skill (mul:local_skill:*) keys.
const patCachePrefix = "mul:auth:pat:"

// PATCache caches resolved PAT lookups in Redis. A nil *PATCache is safe
// to use: every method becomes a no-op or reports a cache miss, and the
// auth middleware degrades to direct DB lookups.
type PATCache struct {
	rdb *redis.Client
}

// NewPATCache returns a cache backed by rdb. Pass nil to disable caching;
// the returned *PATCache is safe to call but never hits Redis.
func NewPATCache(rdb *redis.Client) *PATCache {
	if rdb == nil {
		return nil
	}
	return &PATCache{rdb: rdb}
}

func patCacheKey(hash string) string { return patCachePrefix + hash }

// Get returns the cached user_id for a token hash. ok=false on cache miss
// or any Redis error; a dead Redis must not take down auth.
func (c *PATCache) Get(ctx context.Context, hash string) (userID string, ok bool) {
	if c == nil {
		return "", false
	}
	v, err := c.rdb.Get(ctx, patCacheKey(hash)).Result()
	if err != nil {
		if !errors.Is(err, redis.Nil) {
			slog.Warn("pat_cache: get failed; falling back to DB", "error", err)
		}
		return "", false
	}
	return v, true
}

// Set populates the cache with the given TTL. Callers MUST pass a TTL no
// longer than the token's remaining lifetime; otherwise an entry could
// outlive the PAT's expires_at and let an expired token pass auth on
// cache hit. Use TTLForExpiry to compute it from a token's expires_at.
//
// Errors are logged and swallowed: a cache write failure is not a
// request failure.
func (c *PATCache) Set(ctx context.Context, hash, userID string, ttl time.Duration) {
	if c == nil || ttl <= 0 {
		return
	}
	if err := c.rdb.Set(ctx, patCacheKey(hash), userID, ttl).Err(); err != nil {
		slog.Warn("pat_cache: set failed", "error", err)
	}
}

// TTLForExpiry returns the cache TTL for a token given its expires_at.
//   - Zero expiresAt (token never expires) → full AuthCacheTTL.
//   - expiresAt in the future → min(AuthCacheTTL, time until expiry).
//   - expiresAt at or before now → 0 (caller should skip caching; the
//     middleware shouldn't reach here because the SELECT already
//     filters expired tokens, but a TOCTOU between SELECT and Set is
//     possible).
//
// Pass time.Time{} when the token has no expiry (pgtype.Timestamptz with
// Valid=false maps to a zero Time).
func TTLForExpiry(now, expiresAt time.Time) time.Duration {
	if expiresAt.IsZero() {
		return AuthCacheTTL
	}
	remaining := expiresAt.Sub(now)
	if remaining <= 0 {
		return 0
	}
	if remaining < AuthCacheTTL {
		return remaining
	}
	return AuthCacheTTL
}

// Invalidate removes the entry for hash. Called on PAT revocation so the
// revoke takes effect immediately rather than waiting for the TTL.
func (c *PATCache) Invalidate(ctx context.Context, hash string) {
	if c == nil {
		return
	}
	if err := c.rdb.Del(ctx, patCacheKey(hash)).Err(); err != nil {
		slog.Warn("pat_cache: invalidate failed; entry will expire on TTL", "error", err)
	}
}