mirror of
https://github.com/multica-ai/multica.git
synced 2026-06-17 03:38:32 +02:00
* feat(agent): persist thinking_level per agent (MUL-2339)

Adds a nullable `thinking_level` column to the `agent` table so the backend can route a runtime-native reasoning/effort token (e.g. Claude's `xhigh`, Codex's `minimal`) through to the agent CLI on every dispatch.

The column is intentionally TEXT rather than an enum: Claude and Codex publish overlapping but distinct vocabularies and we want the persisted value to round-trip exactly through whichever CLI receives it. NULL is the "use runtime default" sentinel that every downstream consumer reads as "do not inject --effort / reasoning_effort".

This commit is just the storage layer (migration + sqlc); subsequent commits wire it through the API, daemon, and agent backends.

Co-authored-by: multica-agent <github@multica.ai>

* feat(agent-backend): inject reasoning effort for claude + codex (MUL-2339)

Extends ExecOptions with a runtime-native ThinkingLevel string and wires it into the Claude and Codex backends. Discovery is driven by the local CLI so the daemon advertises whatever the host install supports rather than a hand-maintained list that goes stale.

Per Elon's PR1 review:

- Claude: parses `claude --help` to learn the `--effort` superset and projects through a per-model allow-list (xhigh is Opus-only; max is session-only on the smaller models). Falls back to a conservative static list when the binary is missing or help drift hides the line.
- Codex: drives `codex debug models --output json` so per-model reasoning subsets and the documented default come directly from the CLI. The older config-error probe trick is gone; the JSON path is stable and doesn't pollute stderr with an intentional misconfig.
- Cache key includes (provider, executablePath, cliVersion) so a CLI upgrade invalidates entries that referenced the older help / catalog.

Per Trump's PR1 constraint, all three Codex injection points (thread/start.config, thread/resume.config, turn/start.effort) flow through one helper (`applyCodexReasoningEffort`) so they cannot drift independently. The shared `codexReasoningCases` fixture in `thinking_test.go` asserts the same value→{shape, key} contract at each site for every level the runtimes know about.

Claude's `--effort` is also added to `claudeBlockedArgs` so a user custom_args entry can't silently outvote the daemon-injected value.

Co-authored-by: multica-agent <github@multica.ai>

* feat(api): wire thinking_level through API + daemon contract (MUL-2339)

End-to-end plumbing for the per-agent reasoning/effort setting:

- AgentResponse / TaskAgentData now carry `thinking_level`; the daemon's claim response includes it and the daemon's executor passes it through to agent.ExecOptions, where the Claude and Codex backends already know what to do with it.
- ModelEntry on the runtime-models wire format gains a `thinking` block carrying `supported_levels` + `default_level` per model so the UI can render a runtime-aware picker without the server having to know about the local CLI install. `handleModelList` projects the agent-package catalog (including the new Thinking field) into the wire shape.
- CreateAgent / UpdateAgent gate the field with a synchronous provider enum check (claude / codex only today). UpdateAgent is tri-state: field omitted = no change, "" = explicit clear (new `ClearAgentThinkingLevel` query, mirrors the existing mcp_config null pattern), non-empty = validate then set.

Per Trump's PR1 review, the API NEVER auto-clears on a runtime/model swap and ALWAYS returns 400 on an unknown literal value: same shape across CreateAgent, UpdateAgent, and combined patches that move runtime + level in one request. Per-model combination failures (e.g. `xhigh` against a model that only supports up to `high`) surface as a daemon-side task error, not a silent server-side rewrite.

TS types follow the same shape: `Agent.thinking_level`, `CreateAgentRequest`/`UpdateAgentRequest` add the field, `RuntimeModel` grows a `thinking` block. Older backends omit the field, which the front-end treats as "no picker for this model", so installed desktop builds keep working.

Co-authored-by: multica-agent <github@multica.ai>

* fix(agent): correct codex debug models argv + pin via runner test (MUL-2339)

`codex debug models --output json` is rejected by codex-cli 0.131.0: the subcommand emits JSON on stdout by default and has no `--output` flag. Drop the flag and add `--bundled` to skip the network refresh discovery doesn't need. Move the argv to a package-level var and add a test that runs a fake `codex` to assert the binary actually receives exactly `debug models --bundled`, so the contract can't silently drift on the next refactor.

Also teach ValidateThinkingLevel to resolve an empty model to the provider's default model entry. Without this, every default-model task with a persisted thinking_level would be misjudged "unknown model" by the daemon guard.

Co-authored-by: multica-agent <github@multica.ai>

* fix(api): reject runtime switch that would leave invalid thinking_level (MUL-2339)

A PATCH that changed `runtime_id` without touching `thinking_level` used to silently keep the existing value, so a Claude agent storing `max` could land on a Codex runtime where `max` is not a recognised token at all, and the daemon would receive a literal-invalid level.

Hold the same "always 400 on literal-invalid, never silent coerce" rule on this implicit path. When runtime_id changes and the existing value is not in the new provider's enum, return 400 with the recovery options (clear via `thinking_level=""` or re-set in the same PATCH). Add coverage for both the kept-when-still-valid and the rejected cases, plus the two recovery paths (clear and replace).

Co-authored-by: multica-agent <github@multica.ai>

* fix(daemon): guard runTask with per-model thinking_level validator (MUL-2339)

ValidateThinkingLevel existed but had no call site: `task.Agent.ThinkingLevel` flowed straight into ExecOptions, so `xhigh` configured on a non-Opus Claude model, or API-side stale values that escaped the provider enum gate, would be injected anyway.

Run the validator before building ExecOptions. Invalid combinations log a warning and drop the level instead of failing the task: the agent still runs, just at the runtime's default reasoning effort. Discovery errors fail open (keep the level, let the CLI surface any objection) so a transient `claude --help` failure can't strand work. Empty model is forwarded as-is; the validator resolves it to the provider's default model internally per the cross-package contract.

Co-authored-by: multica-agent <github@multica.ai>

* chore(agent): drop stale `--output json` comments + unused scanner (MUL-2339)

Codex CLI's `debug models` subcommand emits JSON without an `--output` flag, and `parseCodexDebugModels` never read from the bufio.Scanner. Sync the comments with the actual invocation and remove the dead init.

Co-authored-by: multica-agent <github@multica.ai>

---------

Co-authored-by: multica-agent <github@multica.ai>
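The tri-state UpdateAgent rule in the commit message (field omitted = no change, "" = explicit clear, non-empty = validate then set) can be illustrated with a pointer-typed JSON field. This is a minimal sketch, not the repository's handler: `updateAgentPatch` and `classify` are hypothetical names introduced here, and only the `thinking_level` key comes from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// updateAgentPatch is a hypothetical request shape for illustration only.
// A *string distinguishes "key absent" (nil) from "key present but empty".
type updateAgentPatch struct {
	ThinkingLevel *string `json:"thinking_level"`
}

// classify reports which of the three branches a PATCH body selects.
func classify(body []byte) (string, error) {
	var p updateAgentPatch
	if err := json.Unmarshal(body, &p); err != nil {
		return "", err
	}
	switch {
	case p.ThinkingLevel == nil:
		return "no change", nil
	case *p.ThinkingLevel == "":
		return "clear", nil
	default:
		return "validate then set: " + *p.ThinkingLevel, nil
	}
}

func main() {
	for _, body := range []string{`{}`, `{"thinking_level":""}`, `{"thinking_level":"high"}`} {
		got, err := classify([]byte(body))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(body, "=>", got)
	}
}
```

In this sketch an explicit JSON `null` also decodes to a nil pointer and therefore lands in the "no change" branch; how the real handler treats `null` is not stated in the commit message.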
398 lines
14 KiB
Go
package handler

import (
	"context"
	"encoding/json"
	"log/slog"
	"net/http"
	"sync"
	"time"

	"github.com/go-chi/chi/v5"
)

// ---------------------------------------------------------------------------
// Model list request store
// ---------------------------------------------------------------------------
//
// The server cannot call the daemon directly (the daemon is behind the user's
// NAT and only polls the server). So "list models for this runtime" uses a
// pending-request pattern: a frontend POST creates a pending request, the
// daemon pops it on the next heartbeat, executes locally, and reports the
// result back.
//
// The store is the cross-cutting state for that flow. It MUST stay coherent
// across API replicas: POST, heartbeat and poll can each land on a different
// node, and they all need to see the same request lifecycle. The single-node
// in-memory implementation is fine for self-hosted dev; multi-node deploys
// (Multica Cloud) MUST use the Redis-backed implementation, otherwise the
// pending request is invisible to whichever replica receives the next call
// and the picker shows "No models available" (regression: see issue
// review on multica-ai/multica#2009).

// ModelListStatus represents the lifecycle of a model list request.
type ModelListStatus string

const (
	ModelListPending   ModelListStatus = "pending"
	ModelListRunning   ModelListStatus = "running"
	ModelListCompleted ModelListStatus = "completed"
	ModelListFailed    ModelListStatus = "failed"
	ModelListTimeout   ModelListStatus = "timeout"
)

// ModelListRequest represents a pending or completed model list request.
// Supported is false when the provider ignores per-agent model
// selection entirely (currently: hermes). The UI uses this to
// disable its dropdown rather than silently accepting a value the
// backend will drop.
//
// RunStartedAt is set when PopPending claims the request. It is
// `json:"-"` because it's a server-side bookkeeping field: the UI only
// needs Status / UpdatedAt to drive the polling loop.
type ModelListRequest struct {
	ID           string          `json:"id"`
	RuntimeID    string          `json:"runtime_id"`
	Status       ModelListStatus `json:"status"`
	Models       []ModelEntry    `json:"models,omitempty"`
	Supported    bool            `json:"supported"`
	Error        string          `json:"error,omitempty"`
	CreatedAt    time.Time       `json:"created_at"`
	UpdatedAt    time.Time       `json:"updated_at"`
	RunStartedAt *time.Time      `json:"-"`
}

// ModelEntry mirrors agent.Model for the wire. `Default` tags the
// model the runtime advertises as its preferred pick (e.g. Claude
// Code's shipped default, or hermes' currentModelId) so the UI can
// badge it; don't drop it when marshalling.
//
// `Thinking` carries the per-model reasoning-effort catalog discovered
// by the daemon for runtimes that support it (claude, codex; see
// MUL-2339). nil means "no picker for this model"; the UI hides the
// thinking_level selector. Older daemons (pre-2026-05) won't send this
// field, which is fine: the UI hides the selector and the agent runs
// with the runtime default.
type ModelEntry struct {
	ID       string         `json:"id"`
	Label    string         `json:"label"`
	Provider string         `json:"provider,omitempty"`
	Default  bool           `json:"default,omitempty"`
	Thinking *ModelThinking `json:"thinking,omitempty"`
}

// ModelThinking is the wire shape for the per-model thinking catalog.
// Mirrors agent.ModelThinking so the daemon's report passes through
// without remapping.
type ModelThinking struct {
	SupportedLevels []ThinkingLevel `json:"supported_levels"`
	DefaultLevel    string          `json:"default_level,omitempty"`
}

// ThinkingLevel is the wire shape for a single entry in a model's
// reasoning-effort catalog. `Value` is the literal token the daemon
// passes to the CLI; `Label` is the human-readable display string;
// `Description` is optional helper copy (Codex's debug-models output
// includes one per level).
type ThinkingLevel struct {
	Value       string `json:"value"`
	Label       string `json:"label"`
	Description string `json:"description,omitempty"`
}

const (
	// modelListPendingTimeout bounds how long a pending request can sit in
	// the store before the UI is told "daemon didn't pick this up".
	modelListPendingTimeout = 30 * time.Second
	// modelListRunningTimeout bounds how long a claimed (running) request
	// can stay claimed before the UI is told "daemon picked this up but
	// never reported a result". This matters when the heartbeat response
	// carrying `pending_model_list` is lost in transit (e.g. HTTP client
	// timeout after PopPending already mutated store state): without this
	// transition the UI would keep polling a record that is stuck in
	// `running` until retention sweeps it.
	modelListRunningTimeout = 60 * time.Second
	// modelListStoreRetention bounds how long any stored request lives in
	// the backing store. The Redis backend uses it as a TTL; the in-memory
	// backend GCs on Create. The window is deliberately wider than the
	// running/pending timeouts so terminal records are still readable when
	// the UI's last poll arrives.
	modelListStoreRetention = 2 * time.Minute
)

// ModelListStore is the contract every backend (in-memory single-node,
// Redis multi-node) must satisfy. Methods take a context so the Redis
// implementation can honour the heartbeat-side timeout that gates a
// slow shared store from stalling the rest of the heartbeat.
type ModelListStore interface {
	Create(ctx context.Context, runtimeID string) (*ModelListRequest, error)
	Get(ctx context.Context, id string) (*ModelListRequest, error)
	// HasPending is a cheap read-only probe used by the heartbeat hot path
	// to gate the side-effecting PopPending. A spurious "true" is fine:
	// PopPending handles "queue empty after probe" by returning nil.
	HasPending(ctx context.Context, runtimeID string) (bool, error)
	PopPending(ctx context.Context, runtimeID string) (*ModelListRequest, error)
	Complete(ctx context.Context, id string, models []ModelEntry, supported bool) error
	Fail(ctx context.Context, id string, errMsg string) error
}

// applyModelListTimeout transitions a request to ModelListTimeout when it has
// been stuck in a non-terminal state past its threshold. Returns true when
// the record was modified so callers can persist the change. The pending
// threshold catches "daemon never picked this up"; the running threshold
// catches "daemon picked it up but the result report was lost". Without
// the running escape, only retention sweep ends the polling loop.
func applyModelListTimeout(req *ModelListRequest, now time.Time) bool {
	switch req.Status {
	case ModelListPending:
		if now.Sub(req.CreatedAt) > modelListPendingTimeout {
			req.Status = ModelListTimeout
			req.Error = "daemon did not respond within 30 seconds"
			req.UpdatedAt = now
			return true
		}
	case ModelListRunning:
		if req.RunStartedAt != nil && now.Sub(*req.RunStartedAt) > modelListRunningTimeout {
			req.Status = ModelListTimeout
			req.Error = "daemon did not finish within 60 seconds"
			req.UpdatedAt = now
			return true
		}
	}
	return false
}

// InMemoryModelListStore is the single-node implementation. Adequate for
// self-hosted dev and the test suite, but unsafe in multi-node deploys
// (each replica gets its own map and the pending request is invisible to
// every replica that didn't receive the POST).
type InMemoryModelListStore struct {
	mu       sync.Mutex
	requests map[string]*ModelListRequest
}

func NewInMemoryModelListStore() *InMemoryModelListStore {
	return &InMemoryModelListStore{requests: make(map[string]*ModelListRequest)}
}

func (s *InMemoryModelListStore) Create(_ context.Context, runtimeID string) (*ModelListRequest, error) {
	s.mu.Lock()
	defer s.mu.Unlock()

	// Garbage-collect stale entries so the map can't grow unbounded.
	for id, req := range s.requests {
		if time.Since(req.CreatedAt) > modelListStoreRetention {
			delete(s.requests, id)
		}
	}

	now := time.Now()
	req := &ModelListRequest{
		ID:        randomID(),
		RuntimeID: runtimeID,
		Status:    ModelListPending,
		// Default to true; the daemon overrides this in the report
		// for providers that don't support per-agent model selection.
		Supported: true,
		CreatedAt: now,
		UpdatedAt: now,
	}
	s.requests[req.ID] = req
	return req, nil
}

func (s *InMemoryModelListStore) Get(_ context.Context, id string) (*ModelListRequest, error) {
	s.mu.Lock()
	defer s.mu.Unlock()

	req, ok := s.requests[id]
	if !ok {
		return nil, nil
	}
	applyModelListTimeout(req, time.Now())
	return req, nil
}

func (s *InMemoryModelListStore) HasPending(_ context.Context, runtimeID string) (bool, error) {
	s.mu.Lock()
	defer s.mu.Unlock()

	now := time.Now()
	for _, req := range s.requests {
		applyModelListTimeout(req, now)
		if req.RuntimeID == runtimeID && req.Status == ModelListPending {
			return true, nil
		}
	}
	return false, nil
}

func (s *InMemoryModelListStore) PopPending(_ context.Context, runtimeID string) (*ModelListRequest, error) {
	s.mu.Lock()
	defer s.mu.Unlock()

	var oldest *ModelListRequest
	now := time.Now()
	for _, req := range s.requests {
		applyModelListTimeout(req, now)
		if req.RuntimeID == runtimeID && req.Status == ModelListPending {
			if oldest == nil || req.CreatedAt.Before(oldest.CreatedAt) {
				oldest = req
			}
		}
	}
	if oldest != nil {
		oldest.Status = ModelListRunning
		startedAt := now
		oldest.RunStartedAt = &startedAt
		oldest.UpdatedAt = now
	}
	return oldest, nil
}

func (s *InMemoryModelListStore) Complete(_ context.Context, id string, models []ModelEntry, supported bool) error {
	s.mu.Lock()
	defer s.mu.Unlock()

	if req, ok := s.requests[id]; ok {
		req.Status = ModelListCompleted
		req.Models = models
		req.Supported = supported
		req.UpdatedAt = time.Now()
	}
	return nil
}

func (s *InMemoryModelListStore) Fail(_ context.Context, id string, errMsg string) error {
	s.mu.Lock()
	defer s.mu.Unlock()

	if req, ok := s.requests[id]; ok {
		req.Status = ModelListFailed
		req.Error = errMsg
		req.UpdatedAt = time.Now()
	}
	return nil
}

func modelListRequestTerminal(status ModelListStatus) bool {
	return status == ModelListCompleted || status == ModelListFailed || status == ModelListTimeout
}

// ---------------------------------------------------------------------------
// Handlers
// ---------------------------------------------------------------------------

// InitiateListModels creates a pending model list request for a runtime.
// Called by the frontend; the daemon picks it up on its next heartbeat.
func (h *Handler) InitiateListModels(w http.ResponseWriter, r *http.Request) {
	runtimeID := chi.URLParam(r, "runtimeId")
	runtimeUUID, ok := parseUUIDOrBadRequest(w, runtimeID, "runtime_id")
	if !ok {
		return
	}

	rt, err := h.Queries.GetAgentRuntime(r.Context(), runtimeUUID)
	if err != nil {
		writeError(w, http.StatusNotFound, "runtime not found")
		return
	}
	if _, ok := h.requireWorkspaceMember(w, r, uuidToString(rt.WorkspaceID), "runtime not found"); !ok {
		return
	}
	if rt.Status != "online" {
		writeError(w, http.StatusServiceUnavailable, "runtime is offline")
		return
	}

	req, err := h.ModelListStore.Create(r.Context(), uuidToString(rt.ID))
	if err != nil {
		writeError(w, http.StatusInternalServerError, "failed to enqueue model list request: "+err.Error())
		return
	}
	writeJSON(w, http.StatusOK, req)
}

// GetModelListRequest returns the status of a model list request.
func (h *Handler) GetModelListRequest(w http.ResponseWriter, r *http.Request) {
	requestID := chi.URLParam(r, "requestId")

	req, err := h.ModelListStore.Get(r.Context(), requestID)
	if err != nil {
		writeError(w, http.StatusInternalServerError, "failed to load request: "+err.Error())
		return
	}
	if req == nil {
		writeError(w, http.StatusNotFound, "request not found")
		return
	}
	writeJSON(w, http.StatusOK, req)
}

// ReportModelListResult receives the list result from the daemon.
func (h *Handler) ReportModelListResult(w http.ResponseWriter, r *http.Request) {
	runtimeID := chi.URLParam(r, "runtimeId")

	if _, ok := h.requireDaemonRuntimeAccess(w, r, runtimeID); !ok {
		return
	}

	requestID := chi.URLParam(r, "requestId")

	// Fetch first so we can ignore stale reports for already-terminal
	// requests (e.g. the heartbeat response that triggered the daemon
	// run was a retry, and the original report already landed).
	existing, err := h.ModelListStore.Get(r.Context(), requestID)
	if err != nil {
		writeError(w, http.StatusInternalServerError, "failed to load request: "+err.Error())
		return
	}
	if existing == nil || existing.RuntimeID != runtimeID {
		writeError(w, http.StatusNotFound, "request not found")
		return
	}
	if modelListRequestTerminal(existing.Status) {
		slog.Debug("ignoring stale model list report", "runtime_id", runtimeID, "request_id", requestID, "status", existing.Status)
		writeJSON(w, http.StatusOK, map[string]string{"status": "ok"})
		return
	}

	var body struct {
		Status    string       `json:"status"` // "completed" or "failed"
		Models    []ModelEntry `json:"models"`
		Supported *bool        `json:"supported"`
		Error     string       `json:"error"`
	}
	if err := json.NewDecoder(r.Body).Decode(&body); err != nil {
		writeError(w, http.StatusBadRequest, "invalid request body")
		return
	}

	if body.Status == "completed" {
		// Older daemons may omit `supported`; default to true to keep
		// the UI usable while they haven't been redeployed yet.
		supported := true
		if body.Supported != nil {
			supported = *body.Supported
		}
		if err := h.ModelListStore.Complete(r.Context(), requestID, body.Models, supported); err != nil {
			// Surface the store failure as 5xx so the daemon can retry instead
			// of swallowing the report (leaves the request stuck in running
			// until the server-side timeout, which is exactly the "looks OK
			// but nothing happens" class of bug we're trying to avoid).
			slog.Error("ModelListStore Complete failed", "error", err, "request_id", requestID)
			writeError(w, http.StatusInternalServerError, "failed to persist completion")
			return
		}
	} else {
		if err := h.ModelListStore.Fail(r.Context(), requestID, body.Error); err != nil {
			slog.Error("ModelListStore Fail failed", "error", err, "request_id", requestID)
			writeError(w, http.StatusInternalServerError, "failed to persist failure")
			return
		}
	}

	slog.Debug("model list report", "runtime_id", runtimeID, "request_id", requestID, "status", body.Status, "count", len(body.Models))
	writeJSON(w, http.StatusOK, map[string]string{"status": "ok"})
}