mirror of
https://github.com/multica-ai/multica.git
synced 2026-07-05 21:39:54 +02:00
`hermes`, `kimi`, and `kiro` all wired stderr through
`cmd.Stderr = io.MultiWriter(logWriter, providerErrSniffer)`.
The OS-pipe → MultiWriter copy goroutine that exec spawns for
that form is only joined by `cmd.Wait()`, which the lifecycle
goroutine fires in deferred cleanup — *after*
`promoteACPResultOnProviderError` already consulted the sniffer.
When stopReason=end_turn (success) raced ahead of the stderr
drain, the sniffer's `lines` slice was empty, the helper fell
through to the synthetic agent-text fallback ("hermes provider
error: API call failed after 3 retries"), and the actionable
upstream signal (HTTP 429 / usage limit) was lost.
This was visible as a flaky
`TestHermesBackendPromotesProviderErrorWithNonEmptyOutput` in CI
under high parallelism — a real prod bug, not a test issue: live
runs hit the same race when an upstream LLM returns 429 and
hermes' synthetic agent turn beats the stderr drain to the
parent.
Replace the MultiWriter wiring with `cmd.StderrPipe()` + an
explicit copier goroutine that signals on `stderrDone`. The
lifecycle goroutine already awaits `<-readerDone` for stdout;
add `<-stderrDone` next to it before `promoteACPResultOnProviderError`
runs. The deferred `cmd.Wait()` ordering is unchanged — it just
becomes a cheap reap by the time it fires.
Verified: `go test ./pkg/agent/ -run "TestHermes|TestKimi|TestKiro"
-count=10 -race`, then full package `-count=3 -race`, all green.
Co-authored-by: multica-agent <github@multica.ai>