mirror of
https://github.com/multica-ai/multica.git
synced 2026-06-17 03:38:32 +02:00
* feat(agent): persist thinking_level per agent (MUL-2339) Adds a nullable `thinking_level` column to the `agent` table so the backend can route a runtime-native reasoning/effort token (e.g. Claude's `xhigh`, Codex's `minimal`) through to the agent CLI on every dispatch. The column is intentionally TEXT rather than an enum — Claude and Codex publish overlapping but distinct vocabularies and we want the persisted value to round-trip exactly through whichever CLI receives it. NULL is the "use runtime default" sentinel that every downstream consumer reads as "do not inject --effort / reasoning_effort". This commit is just the storage layer (migration + sqlc); subsequent commits wire it through the API, daemon, and agent backends. Co-authored-by: multica-agent <github@multica.ai> * feat(agent-backend): inject reasoning effort for claude + codex (MUL-2339) Extends ExecOptions with a runtime-native ThinkingLevel string and wires it into the Claude and Codex backends. Discovery is driven by the local CLI so the daemon advertises whatever the host install supports rather than a hand-maintained list that goes stale. Per Elon's PR1 review: - Claude: parses `claude --help` to learn the `--effort` superset and projects through a per-model allow-list (xhigh is Opus-only; max is session-only on the smaller models). Falls back to a conservative static list when the binary is missing or help drift hides the line. - Codex: drives `codex debug models --output json` so per-model reasoning subsets and the documented default come directly from the CLI. The older config-error probe trick is gone — the JSON path is stable and doesn't pollute stderr with an intentional misconfig. - Cache key includes (provider, executablePath, cliVersion) so a CLI upgrade invalidates entries that referenced the older help / catalog. Per Trump's PR1 constraint, all three Codex injection points (thread/start.config, thread/resume.config, turn/start.effort) flow through one helper (`applyCodexReasoningEffort`) so they cannot drift independently. The shared `codexReasoningCases` fixture in `thinking_test.go` asserts the same value→{shape, key} contract at each site for every level the runtimes know about. Claude's `--effort` is also added to `claudeBlockedArgs` so a user custom_args entry can't silently outvote the daemon-injected value. Co-authored-by: multica-agent <github@multica.ai> * feat(api): wire thinking_level through API + daemon contract (MUL-2339) End-to-end plumbing for the per-agent reasoning/effort setting: - AgentResponse / TaskAgentData now carry `thinking_level`; the daemon's claim response includes it and the daemon's executor passes it through to agent.ExecOptions, where the Claude and Codex backends already know what to do with it. - ModelEntry on the runtime-models wire format gains a `thinking` block carrying `supported_levels` + `default_level` per model so the UI can render a runtime-aware picker without the server having to know about the local CLI install. `handleModelList` projects the agent-package catalog (including the new Thinking field) into the wire shape. - CreateAgent / UpdateAgent gate the field with a synchronous provider enum check (claude / codex only today). UpdateAgent is tri-state: field omitted = no change, "" = explicit clear (new `ClearAgentThinkingLevel` query, mirrors the existing mcp_config null pattern), non-empty = validate then set. Per Trump's PR1 review, the API NEVER auto-clears on a runtime/model swap and ALWAYS returns 400 on an unknown literal value — same shape across CreateAgent, UpdateAgent, and combined patches that move runtime + level in one request. Per-model combination failures (e.g. `xhigh` against a model that only supports up to `high`) surface as a daemon-side task error, not a silent server-side rewrite. TS types follow the same shape: `Agent.thinking_level`, `CreateAgentRequest`/`UpdateAgentRequest` add the field, `RuntimeModel` grows a `thinking` block. Older backends omit the field, which the front-end treats as "no picker for this model" — installed desktop builds keep working. Co-authored-by: multica-agent <github@multica.ai> * fix(agent): correct codex debug models argv + pin via runner test (MUL-2339) `codex debug models --output json` is rejected by codex-cli 0.131.0 — the subcommand emits JSON on stdout by default and has no `--output` flag. Drop the flag and add `--bundled` to skip the network refresh discovery doesn't need. Move the argv to a package-level var and add a test that runs a fake `codex` to assert the binary actually receives exactly `debug models --bundled`, so the contract can't silently drift on the next refactor. Also teach ValidateThinkingLevel to resolve an empty model to the provider's default model entry. Without this, every default-model task with a persisted thinking_level would be misjudged "unknown model" by the daemon guard. Co-authored-by: multica-agent <github@multica.ai> * fix(api): reject runtime switch that would leave invalid thinking_level (MUL-2339) A PATCH that changed `runtime_id` without touching `thinking_level` used to silently keep the existing value, so a Claude agent storing `max` could land on a Codex runtime where `max` is not a recognised token at all, and the daemon would receive a literal-invalid level. Hold the same "always 400 on literal-invalid, never silent coerce" rule on this implicit path. When runtime_id changes and the existing value is not in the new provider's enum, return 400 with the recovery options (clear via `thinking_level=""` or re-set in the same PATCH). Add coverage for both the kept-when-still-valid and the rejected cases, plus the two recovery paths (clear and replace). Co-authored-by: multica-agent <github@multica.ai> * fix(daemon): guard runTask with per-model thinking_level validator (MUL-2339) ValidateThinkingLevel existed but had no call site — `task.Agent. ThinkingLevel` flowed straight into ExecOptions, so `xhigh` configured on a non-Opus Claude model, or API-side stale values that escaped the provider enum gate, would be injected anyway. Run the validator before building ExecOptions. Invalid combinations log a warning and drop the level instead of failing the task: the agent still runs, just at the runtime's default reasoning effort. Discovery errors fail open (keep the level, let the CLI surface any objection) so a transient `claude --help` failure can't strand work. Empty model is forwarded as-is; the validator resolves it to the provider's default model internally per the cross-package contract. Co-authored-by: multica-agent <github@multica.ai> * chore(agent): drop stale `--output json` comments + unused scanner (MUL-2339) Codex CLI's `debug models` subcommand emits JSON without an `--output` flag, and `parseCodexDebugModels` never read from the bufio.Scanner. Sync the comments with the actual invocation and remove the dead init. Co-authored-by: multica-agent <github@multica.ai> --------- Co-authored-by: multica-agent <github@multica.ai>
9 lines
512 B
SQL
9 lines
512 B
SQL
-- Per-agent thinking / reasoning effort setting. Stored as the
|
|
-- runtime-native string (e.g. Claude's "low|medium|high|xhigh|max",
|
|
-- Codex's "none|minimal|low|medium|high|xhigh") rather than a
|
|
-- cross-runtime abstraction, so the user-visible value matches what
|
|
-- each CLI's own UI advertises (see MUL-2339). NULL means "use the
|
|
-- runtime/model default" — every backend treats this as "do not
|
|
-- inject --effort / reasoning_effort" and lets the CLI pick.
|
|
ALTER TABLE agent ADD COLUMN thinking_level TEXT;
|