Files
multica/packages/views/dashboard/utils.test.ts
Jiayuan Zhang 35fc318d68 feat(runtimes): weekly usage dimension + tz-aware aggregation (MUL-2382) (#2822)
* feat(runtimes): weekly usage dimension + tz-aware aggregation (MUL-2382)

Adds a Weekly view to the runtime Usage chart alongside Daily and Hourly,
backed by `aggregateByWeek` on the existing 180-day daily cache (no new
endpoint). Weeks are ISO 8601 Mon–Sun; the in-progress week is rendered at
half opacity and tooltip-labelled "partial · N / 7 days".

Side effects called out in the RFC:

- `sliceWindow` now reads "today" in the runtime's IANA timezone, fixing a
  one-day drift at the window edge when the browser and runtime sit in
  different time zones.
- ActivityHeatmap rows are reordered Mon → Sun to match the rest of the
  Weekly aggregation; "today" is computed in runtime tz so the grid's
  trailing column lines up with the daily rows the backend buckets.

Dimension / period coupling: switching dimension resets the period to that
dimension's default when the active value isn't in its allowed set
(Hourly 7/30, Daily 7/30/90, Weekly 30/90/180).

Unit tests cover weekStart / addDays / tz-aware today, the sliceWindow
boundary, and aggregateByWeek's partial-week math.

Co-authored-by: multica-agent <github@multica.ai>

* fix(runtimes): weekly chart shows trailing calendar weeks (MUL-2382)

aggregateByWeek built one bucket per week-with-data, and the caller
took the last N buckets. With sparse data — old populated weeks plus
empty stretches near today — the slice surfaced the old weeks instead
of the trailing in-window calendar weeks the user selected.

Now aggregateByWeek takes weekCount and emits exactly that many
trailing calendar weeks anchored at today's week in the runtime tz.
Buckets are pre-zeroed so empty in-range weeks render as empty bars;
rows outside the window are dropped.

Co-authored-by: multica-agent <github@multica.ai>

* feat(usage): drop Hourly dim + add Daily/Weekly to workspace dashboard (MUL-2382)

- Remove Hourly from the runtime usage WHEN-chart: segmented control is
  now Daily / Weekly. Drop the HourlyActivityChart component,
  aggregateCostByHour helper, byHour query subscription, and the
  when_tab_hourly i18n key.
- Add the same Daily / Weekly dimension toggle to the workspace-level
  Usage page (dashboard-page.tsx). Time-range linkage matches the runtime
  page: Daily allows 7/30/90 (default 30), Weekly allows 30/90/180
  (default 90); switching dimensions resets `days` when the current value
  isn't in the new dimension's set.
- Reuse `aggregateByWeek` from runtimes/utils for cost / tokens
  (signature relaxed to accept the wider DashboardUsageDaily shape).
  Add `aggregateWeeklyTime` / `aggregateWeeklyTasks` in dashboard/utils
  with identical pre-zeroed trailing-week semantics. Workspace dashboard
  uses the user-chosen timezone (existing TimezoneSelect) as the
  week-boundary tz; runtime page continues to use the runtime's IANA tz.
- New `WeeklyTimeChart` / `WeeklyTasksChart` mirror their daily
  counterparts plus partial-week half-opacity bars and rangeLabel
  tooltips, matching the existing Weekly cost / tokens charts.
- Tests: drop hourly-related setup; add weekly run-time / tasks coverage
  asserting pre-zeroed trailing buckets and the same MUL-2382 sparse
  window-scoping regression we caught on the runtime side.

Co-authored-by: multica-agent <github@multica.ai>

* fix(usage): correct workspace Weekly window + lock tz to UTC (MUL-2382)

Two blocking correctness bugs from Emacs's PR #2822 review:

1. The Weekly chart paints `ceil(days/7)` trailing calendar weeks but the
   API was still asked for exactly `days`. Worst case (today = Sunday on a
   30D request) the leftmost Monday sits 34 days back, so the first week's
   bucket was silently truncated. Over-fetch the per-date queries to
   `weekCount * 7` days when Weekly is active; per-agent rollups stay at
   `days` so the KPI / leaderboard labels keep their advertised window.
   Daily-aggregation surfaces (cost/tokens/time/tasks KPIs and the Daily
   chart) re-scope the over-fetched rows back to `days` so the labels
   stay consistent.

2. The backend dashboard rollup buckets data by UTC `bucket_date` (and the
   raw fallback queries by `DATE(tu.created_at)`, also UTC), but the
   frontend was driving Weekly boundaries from the user-chosen
   `TimezoneSelect`. Near midnight UTC that put cross-boundary rows into
   the wrong calendar week. Lock workspace Weekly to UTC and remove the
   timezone picker from this page; the runtime detail page keeps its own
   `runtime.timezone`-anchored aggregation, which is consistent because
   its rollup is materialized in that runtime's tz.

Verification: pnpm --filter @multica/views test (636 passed),
typecheck clean, lint 0 errors / 13 pre-existing warnings.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
2026-05-19 04:24:46 +02:00

308 lines
10 KiB
TypeScript
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
import {
aggregateAgentTokens,
aggregateDailyCost,
aggregateWeeklyTasks,
aggregateWeeklyTime,
computeDailyTotals,
formatDuration,
mergeAgentDashboardRows,
} from "./utils";
describe("aggregateDailyCost", () => {
it("collapses multiple rows per day into one stack and sorts by date asc", () => {
const result = aggregateDailyCost([
{
date: "2026-05-10",
model: "claude-sonnet-4-6",
input_tokens: 1_000_000,
output_tokens: 500_000,
cache_read_tokens: 0,
cache_write_tokens: 0,
task_count: 3,
},
{
date: "2026-05-09",
model: "claude-sonnet-4-6",
input_tokens: 1_000_000,
output_tokens: 0,
cache_read_tokens: 0,
cache_write_tokens: 0,
task_count: 1,
},
]);
// Sort: oldest day first.
expect(result.map((r) => r.date)).toEqual(["2026-05-09", "2026-05-10"]);
// claude-sonnet-4-6: input $3/M, output $15/M.
// 2026-05-09 → 1M input × $3 = $3 input, $0 output, $0 cache.
expect(result[0]).toMatchObject({ input: 3, output: 0, cacheWrite: 0, total: 3 });
// 2026-05-10 → $3 input + (0.5M × $15) = $7.5 output. Total $10.5.
expect(result[1]).toMatchObject({ input: 3, output: 7.5, cacheWrite: 0, total: 10.5 });
});
it("treats unmapped models as zero-cost", () => {
const result = aggregateDailyCost([
{
date: "2026-05-10",
model: "made-up-model",
input_tokens: 999_999_999,
output_tokens: 0,
cache_read_tokens: 0,
cache_write_tokens: 0,
task_count: 0,
},
]);
expect(result[0]?.total).toBe(0);
});
});
describe("aggregateAgentTokens", () => {
it("folds per-(agent, model) rows into per-agent totals and sorts by cost desc", () => {
const rows = aggregateAgentTokens([
{
agent_id: "small-spender",
model: "claude-sonnet-4-6",
input_tokens: 100_000,
output_tokens: 0,
cache_read_tokens: 0,
cache_write_tokens: 0,
task_count: 1,
},
{
agent_id: "big-spender",
model: "claude-sonnet-4-6",
input_tokens: 5_000_000,
output_tokens: 0,
cache_read_tokens: 0,
cache_write_tokens: 0,
task_count: 3,
},
{
agent_id: "big-spender",
model: "claude-haiku-4-5",
input_tokens: 1_000_000,
output_tokens: 0,
cache_read_tokens: 0,
cache_write_tokens: 0,
task_count: 2,
},
]);
expect(rows.map((r) => r.agentId)).toEqual(["big-spender", "small-spender"]);
expect(rows[0]?.taskCount).toBe(5);
// big-spender across two models — verify cost > small-spender's.
expect(rows[0]!.cost).toBeGreaterThan(rows[1]!.cost);
});
});
describe("computeDailyTotals", () => {
it("sums tokens across rows and adds estimated cost", () => {
const totals = computeDailyTotals([
{
date: "2026-05-10",
model: "claude-sonnet-4-6",
input_tokens: 1_000_000,
output_tokens: 0,
cache_read_tokens: 0,
cache_write_tokens: 0,
task_count: 2,
},
{
date: "2026-05-09",
model: "claude-sonnet-4-6",
input_tokens: 2_000_000,
output_tokens: 0,
cache_read_tokens: 0,
cache_write_tokens: 0,
task_count: 3,
},
]);
expect(totals.input).toBe(3_000_000);
expect(totals.cost).toBe(9); // 3M × $3/M
expect(totals.taskCount).toBe(5);
});
});
describe("mergeAgentDashboardRows", () => {
it("uses run-time rollup's per-agent task count, not the token sum", () => {
// Token rollup returns two (agent, model) rows for the same task
// (the agent ran one task that touched two models). The token-side
// aggregator sums per-row task_count and lands at 2; the run-time
// rollup correctly reports the underlying distinct count of 1.
const tokenRows = [
{
agentId: "agent-a",
tokens: 3_000_000,
cost: 12,
taskCount: 2, // overcounted because (model-1: 1) + (model-2: 1)
},
];
const runTimeRows = [
{
agent_id: "agent-a",
total_seconds: 600,
task_count: 1, // truth: one task touched both models
failed_count: 0,
},
];
const merged = mergeAgentDashboardRows(tokenRows, runTimeRows);
expect(merged).toHaveLength(1);
expect(merged[0]!.taskCount).toBe(1);
expect(merged[0]!.seconds).toBe(600);
});
it("falls back to token count when no run-time row exists (in-flight task)", () => {
// Tokens reported mid-run; task hasn't terminated yet so the run-time
// rollup is silent on this agent. Keep the token-side estimate
// instead of dropping the agent from the table entirely.
const merged = mergeAgentDashboardRows(
[{ agentId: "agent-b", tokens: 100, cost: 0.5, taskCount: 1 }],
[],
);
expect(merged[0]!.taskCount).toBe(1);
expect(merged[0]!.seconds).toBe(0);
});
it("includes agents that have run-time but no tokens", () => {
// Task errored before reporting any usage — run-time row exists but
// there's no corresponding token row. Agent must still appear on the
// list with zeroed-out token columns.
const merged = mergeAgentDashboardRows(
[],
[{ agent_id: "agent-c", total_seconds: 30, task_count: 1, failed_count: 1 }],
);
expect(merged).toHaveLength(1);
expect(merged[0]!.tokens).toBe(0);
expect(merged[0]!.cost).toBe(0);
expect(merged[0]!.taskCount).toBe(1);
});
it("sorts by cost desc with run-time as a tiebreaker", () => {
const merged = mergeAgentDashboardRows(
[
{ agentId: "low", tokens: 100, cost: 1, taskCount: 1 },
{ agentId: "high", tokens: 100, cost: 9, taskCount: 1 },
{ agentId: "zero-cost-long", tokens: 0, cost: 0, taskCount: 0 },
],
[
{ agent_id: "zero-cost-long", total_seconds: 1000, task_count: 5, failed_count: 0 },
],
);
expect(merged.map((r) => r.agentId)).toEqual(["high", "low", "zero-cost-long"]);
});
});
describe("formatDuration", () => {
it("formats seconds-only durations", () => {
expect(formatDuration(45, "<1m")).toBe("45s");
});
it("formats minutes and seconds when under one hour", () => {
expect(formatDuration(150, "<1m")).toBe("2m 30s");
expect(formatDuration(60, "<1m")).toBe("1m");
});
it("formats hours and minutes when under one day", () => {
expect(formatDuration(3 * 3600 + 17 * 60, "<1m")).toBe("3h 17m");
expect(formatDuration(3600, "<1m")).toBe("1h");
});
it("formats days and hours when more than 24 hours", () => {
expect(formatDuration(2 * 86400 + 5 * 3600, "<1m")).toBe("2d 5h");
});
it("falls back to the supplied label for sub-second durations", () => {
expect(formatDuration(0, "<1m")).toBe("<1m");
expect(formatDuration(0.4, "<1m")).toBe("<1m");
});
});
// ---------------------------------------------------------------------------
// Weekly run-time / tasks aggregation. Mirrors the runtimes-side
// aggregateByWeek tests: trailing N calendar weeks anchored at today-in-tz,
// pre-zeroed buckets, partial-week metadata, and rows outside the window
// dropped. We assert the same invariants on the workspace dashboard helpers
// so all four metrics behave consistently when the user toggles Weekly.
// ---------------------------------------------------------------------------
describe("aggregateWeeklyTime", () => {
beforeEach(() => {
vi.useFakeTimers();
});
afterEach(() => {
vi.useRealTimers();
});
it("folds per-day run-time rows into Mon-anchored weekly totals", () => {
// 2026-05-19 is a Tuesday → current week is Mon=05-18..Sun=05-24.
vi.setSystemTime(new Date("2026-05-19T12:00:00Z"));
const rows = [
{ date: "2026-05-11", total_seconds: 100, task_count: 0, failed_count: 0 },
{ date: "2026-05-17", total_seconds: 50, task_count: 0, failed_count: 0 },
{ date: "2026-05-18", total_seconds: 25, task_count: 0, failed_count: 0 },
];
const result = aggregateWeeklyTime(rows, "UTC", 2);
expect(result).toHaveLength(2);
expect(result[0]).toMatchObject({
weekStart: "2026-05-11",
weekEnd: "2026-05-17",
totalSeconds: 150,
partial: false,
daysCovered: 7,
});
expect(result[1]).toMatchObject({
weekStart: "2026-05-18",
totalSeconds: 25,
partial: true,
daysCovered: 2, // Mon + Tue
});
});
it("drops rows that fall outside the trailing window and keeps empty buckets", () => {
// Same MUL-2382 sparse-data regression we caught on the runtimes side:
// an old populated week must not surface when the requested window
// doesn't include it; in-range empty weeks must remain as zero buckets.
vi.setSystemTime(new Date("2026-05-19T12:00:00Z"));
const rows = [
// 2026-04-13 is a Monday — exactly one week earlier than the oldest
// in-range week (Mon=04-20) for a 5-week trailing window.
{ date: "2026-04-13", total_seconds: 999, task_count: 0, failed_count: 0 },
];
const result = aggregateWeeklyTime(rows, "UTC", 5);
expect(result.map((w) => w.weekStart)).toEqual([
"2026-04-20",
"2026-04-27",
"2026-05-04",
"2026-05-11",
"2026-05-18",
]);
for (const w of result) expect(w.totalSeconds).toBe(0);
});
});
describe("aggregateWeeklyTasks", () => {
beforeEach(() => {
vi.useFakeTimers();
});
afterEach(() => {
vi.useRealTimers();
});
it("splits completed and failed counts per calendar week", () => {
vi.setSystemTime(new Date("2026-05-19T12:00:00Z"));
const rows = [
{ date: "2026-05-12", total_seconds: 0, task_count: 5, failed_count: 1 },
{ date: "2026-05-18", total_seconds: 0, task_count: 3, failed_count: 0 },
];
const result = aggregateWeeklyTasks(rows, "UTC", 2);
expect(result[0]).toMatchObject({
weekStart: "2026-05-11",
completed: 4,
failed: 1,
});
expect(result[1]).toMatchObject({
weekStart: "2026-05-18",
completed: 3,
failed: 0,
partial: true,
});
});
});