multica/server/internal/auth/daemon_token_cache_test.go
Bohan Jiang 86e7de3e41 feat(server/auth): cache auth token lookups in Redis with 10m TTL
* feat(server/auth): cache PAT lookups in Redis with 60s TTL

Personal access tokens used to hit Postgres on every request: a SELECT
to resolve token_hash → user_id, plus a fire-and-forget UPDATE of
last_used_at. For a CLI / daemon making many requests per second this
is wasted DB load — the token is the same and the answer hasn't changed.

Add a Redis-backed cache (auth.PATCache) keyed by token hash, TTL 60s:

- On cache hit, the auth middleware skips both the SELECT and the
  last_used_at UPDATE. last_used_at is now refreshed at most once per
  TTL window per token, not per request.
- On cache miss the middleware falls back to today's behavior: query
  Postgres, populate the cache, async-update last_used_at.
- On revoke, the handler invalidates the cache entry so revocation
  takes effect immediately rather than waiting for the TTL to expire.
  This required changing RevokePersonalAccessToken from :exec to :one
  RETURNING token_hash.
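
A minimal sketch of the three cases above (hit, miss, revoke), not the actual middleware. It assumes an in-memory map in place of Redis and a counting fake in place of Postgres; memCache, fakeDB, resolve, and revoke are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// entry is a cached token_hash -> user_id mapping with an absolute deadline.
type entry struct {
	userID  string
	expires time.Time
}

// memCache is a hypothetical in-memory stand-in for the Redis-backed cache.
type memCache struct {
	mu sync.Mutex
	m  map[string]entry
}

func newMemCache() *memCache { return &memCache{m: map[string]entry{}} }

func (c *memCache) Get(hash string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.m[hash]
	if !ok || !time.Now().Before(e.expires) {
		delete(c.m, hash) // drop expired entries lazily
		return "", false
	}
	return e.userID, true
}

func (c *memCache) Set(hash, userID string, ttl time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[hash] = entry{userID: userID, expires: time.Now().Add(ttl)}
}

func (c *memCache) Invalidate(hash string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.m, hash)
}

// fakeDB stands in for Postgres and counts how often the SELECT would run.
type fakeDB struct {
	tokens  map[string]string
	selects int
}

func (d *fakeDB) lookup(hash string) (string, bool) {
	d.selects++
	u, ok := d.tokens[hash]
	return u, ok
}

// resolve returns the cached user on a hit; on a miss it queries the DB
// and populates the cache. The real middleware would also update
// last_used_at asynchronously on the miss path.
func resolve(c *memCache, db *fakeDB, hash string) (string, bool) {
	if u, ok := c.Get(hash); ok {
		return u, true
	}
	u, ok := db.lookup(hash)
	if !ok {
		return "", false
	}
	c.Set(hash, u, 60*time.Second)
	return u, true
}

// revoke removes the token and invalidates its cache entry immediately.
func revoke(c *memCache, db *fakeDB, hash string) {
	delete(db.tokens, hash)
	c.Invalidate(hash)
}

func main() {
	db := &fakeDB{tokens: map[string]string{"h1": "user-1"}}
	c := newMemCache()
	resolve(c, db, "h1")
	resolve(c, db, "h1")
	fmt.Println("selects after two requests:", db.selects) // 1
	revoke(c, db, "h1")
	_, ok := resolve(c, db, "h1")
	fmt.Println("valid after revoke:", ok) // false
}
```

The second request never reaches the fake DB, and the revoked token fails on the very next request because the entry was deleted rather than left to expire.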

The cache is nil-safe: when REDIS_URL isn't configured, NewPATCache
returns nil and the middleware degrades to today's always-hit-DB
behavior. JWT validation is untouched (already DB-free).
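
The nil-safety described above relies on Go allowing method calls on a nil pointer receiver. A hedged sketch of that pattern follows; tokenCache and newTokenCache are hypothetical stand-ins, and the real constructor takes a Redis client rather than a bool.

```go
package main

import "fmt"

// tokenCache is a hypothetical stand-in for the real cache type.
type tokenCache struct {
	m map[string]string
}

// newTokenCache returns nil when no backend is configured, mirroring the
// "no REDIS_URL means nil cache" behavior described in the commit message.
func newTokenCache(backendConfigured bool) *tokenCache {
	if !backendConfigured {
		return nil
	}
	return &tokenCache{m: map[string]string{}}
}

// Get always misses on a nil cache, so callers fall through to the DB.
func (c *tokenCache) Get(hash string) (string, bool) {
	if c == nil {
		return "", false
	}
	v, ok := c.m[hash]
	return v, ok
}

// Set is a no-op on a nil cache.
func (c *tokenCache) Set(hash, userID string) {
	if c == nil {
		return
	}
	c.m[hash] = userID
}

func main() {
	disabled := newTokenCache(false)
	disabled.Set("h1", "user-1") // safe: nil receiver, nothing stored
	_, ok := disabled.Get("h1")
	fmt.Println("hit with cache disabled:", ok)

	enabled := newTokenCache(true)
	enabled.Set("h1", "user-1")
	u, ok := enabled.Get("h1")
	fmt.Println("hit with cache enabled:", u, ok)
}
```

Because every method checks for nil itself, call sites need no "is the cache configured?" branches.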

Tested with REDIS_TEST_URL — same gating pattern the rest of the
suite uses for Redis-backed tests. New tests cover nil-safety, set/
get/invalidate, TTL, and the middleware short-circuit on cache hit.

* fix(server/auth): clamp PAT cache TTL to token's remaining lifetime

GPT-Boy review caught: a PAT expiring in <60s would still be cached
for the full PATCacheTTL window, so the token could continue passing
auth on cache hit for up to ~60s after its expires_at. The DB query
filters expired tokens (revoked = FALSE AND expires_at > now()), but
that filter never ran on a cache hit.

Make Set take an explicit ttl, and add TTLForExpiry to compute it:
  - no expires_at      → full PATCacheTTL
  - expires_at far     → full PATCacheTTL
  - expires_at <60s    → time until expiry
  - already expired    → 0, Set skips caching (TOCTOU defense between
                         the SELECT and the Set, since the SELECT
                         already filters expired rows)

Regression test pins the clamp behavior end-to-end against Redis.

* feat(server/auth): cache daemon-token + PAT lookups in DaemonAuth, bump TTL to 10m

Daemon /api/daemon/* requests (heartbeat, claim task) hit DaemonAuth
which previously did its own GetDaemonTokenByHash on every request and
*also* duplicated the PAT lookup on the mul_ fallback — bypassing the
cache added in 1cdd674c. Today's daemons authenticate via mul_ PATs
(mdt_ minting isn't wired up yet), so the duplicate PAT path is the one
that actually matters for hot-path DB load.

Three changes:

1. New auth.DaemonTokenCache mirrors PATCache for the mdt_ path
   (key = mul:auth:daemon:<sha256>, JSON value = {workspace_id, daemon_id}).
   Forward-looking infrastructure for when daemon tokens get minted; the
   middleware short-circuits the DB SELECT on cache hit. TTL clamped to
   the token's expires_at via the shared TTLForExpiry helper.

2. DaemonAuth now also consults PATCache on its mul_ fallback, sharing
   the same cache as the regular Auth middleware. A daemon sending 4
   heartbeats/min drops from 4 GetPersonalAccessTokenByHash calls + 4
   last_used_at writes per minute to ~1 of each per AuthCacheTTL window
   (~10 minutes).

3. Rename PATCacheTTL → AuthCacheTTL and bump from 60s to 10 minutes.
   The constant is now shared between PAT and daemon caches; 10m matches
   the user-requested longer TTL for further DB write reduction. Revoke
   latency on the happy path is still instant via active invalidation;
   the worst-case (Redis Del miss / direct-DB revoke) grows from ~60s to
   ~10m.

Tests cover nil-safety, set/get/invalidate, TTL, clamped TTL on near-
expiry tokens, and the middleware short-circuit for both cache paths
(mdt_ via DaemonTokenCache, mul_ fallback via PATCache).

* feat(server/auth): cache PAT lookups on the WebSocket auth path

The third place a PAT is resolved (patResolver.ResolveToken, used by
realtime.HandleWebSocket) was still hitting Postgres on every /ws
auth and firing an unconditional last_used_at UPDATE, bypassing the
cache added in 1cdd674c. Wire it through the same shared PATCache so
a single Invalidate on revoke takes effect on all three paths (Auth
middleware, DaemonAuth PAT fallback, and WS auth).

Also leaves a comment on DeleteDaemonTokensByWorkspaceAndDaemon —
the query has no caller today, but a future deregister/rotate flow
must remember to call DaemonTokenCache.Invalidate(hash) for each
deleted row, otherwise deleted daemon tokens stay valid until TTL.
2026-04-29 17:07:54 +08:00


package auth

import (
	"context"
	"testing"
	"time"
)
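
// TestDaemonTokenCache_NilSafe verifies that every method is safe to call
// on a nil *DaemonTokenCache: Get misses, Set and Invalidate are no-ops.
func TestDaemonTokenCache_NilSafe(t *testing.T) {
	var c *DaemonTokenCache // nil
	ctx := context.Background()
	if id, ok := c.Get(ctx, "any-hash"); ok || id != (DaemonTokenIdentity{}) {
		t.Fatalf("nil cache must miss; got (%+v, %v)", id, ok)
	}
	// Neither call may panic on a nil receiver.
	c.Set(ctx, "any-hash", DaemonTokenIdentity{WorkspaceID: "w", DaemonID: "d"}, AuthCacheTTL)
	c.Invalidate(ctx, "any-hash")
}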
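
// TestNewDaemonTokenCache_NilRedisReturnsNil verifies that the constructor
// returns a nil cache when no Redis client is configured.
func TestNewDaemonTokenCache_NilRedisReturnsNil(t *testing.T) {
	if c := NewDaemonTokenCache(nil); c != nil {
		t.Fatalf("NewDaemonTokenCache(nil) must return nil, got %#v", c)
	}
}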

// TestDaemonTokenCache_SetGetInvalidate covers the basic round trip against
// a real Redis: miss before Set, hit after Set, miss after Invalidate.
func TestDaemonTokenCache_SetGetInvalidate(t *testing.T) {
	rdb := newRedisTestClient(t)
	c := NewDaemonTokenCache(rdb)
	if c == nil {
		t.Fatal("NewDaemonTokenCache returned nil")
	}
	ctx := context.Background()

	if _, ok := c.Get(ctx, "missing"); ok {
		t.Fatal("expected miss before set")
	}

	want := DaemonTokenIdentity{WorkspaceID: "ws-uuid", DaemonID: "daemon-1"}
	c.Set(ctx, "hash-D", want, AuthCacheTTL)
	if got, ok := c.Get(ctx, "hash-D"); !ok || got != want {
		t.Fatalf("expected hit %+v, got (%+v, %v)", want, got, ok)
	}

	c.Invalidate(ctx, "hash-D")
	if _, ok := c.Get(ctx, "hash-D"); ok {
		t.Fatal("expected miss after invalidate")
	}
}

// TestDaemonTokenCache_TTL verifies that Set stores the entry with a
// positive TTL no larger than AuthCacheTTL (plus one second of slack).
func TestDaemonTokenCache_TTL(t *testing.T) {
	rdb := newRedisTestClient(t)
	c := NewDaemonTokenCache(rdb)
	if c == nil {
		t.Fatal("NewDaemonTokenCache returned nil")
	}
	ctx := context.Background()

	c.Set(ctx, "hash-T", DaemonTokenIdentity{WorkspaceID: "w", DaemonID: "d"}, AuthCacheTTL)
	ttl, err := rdb.TTL(ctx, daemonTokenCacheKey("hash-T")).Result()
	if err != nil {
		t.Fatalf("TTL: %v", err)
	}
	if ttl <= 0 || ttl > AuthCacheTTL+time.Second {
		t.Fatalf("unexpected TTL %v (want ~%v)", ttl, AuthCacheTTL)
	}
}

// TestDaemonTokenCache_Set_RespectsClampedTTL verifies that Set honors a
// caller-supplied TTL shorter than AuthCacheTTL, and that a zero or
// negative TTL skips caching entirely.
func TestDaemonTokenCache_Set_RespectsClampedTTL(t *testing.T) {
	rdb := newRedisTestClient(t)
	c := NewDaemonTokenCache(rdb)
	if c == nil {
		t.Fatal("NewDaemonTokenCache returned nil")
	}
	ctx := context.Background()

	// A short TTL must be stored as given, not widened to AuthCacheTTL.
	c.Set(ctx, "hash-short", DaemonTokenIdentity{WorkspaceID: "w", DaemonID: "d"}, 5*time.Second)
	ttl, err := rdb.TTL(ctx, daemonTokenCacheKey("hash-short")).Result()
	if err != nil {
		t.Fatalf("TTL: %v", err)
	}
	if ttl <= 0 || ttl > 5*time.Second+time.Second {
		t.Fatalf("expected clamped TTL ~5s, got %v", ttl)
	}

	// Zero TTL (token already expired) must not create an entry.
	c.Set(ctx, "hash-zero", DaemonTokenIdentity{WorkspaceID: "w", DaemonID: "d"}, 0)
	if _, ok := c.Get(ctx, "hash-zero"); ok {
		t.Fatal("zero-TTL Set must not cache")
	}

	// Negative TTL must not create an entry either.
	c.Set(ctx, "hash-neg", DaemonTokenIdentity{WorkspaceID: "w", DaemonID: "d"}, -time.Second)
	if _, ok := c.Get(ctx, "hash-neg"); ok {
		t.Fatal("negative-TTL Set must not cache")
	}
}