mirror of
https://github.com/multica-ai/multica.git
synced 2026-07-05 13:29:44 +02:00
The race detector caught a flaky failure on main (it passes on retry). Supervisor.startSupervisor calls s.wg.Add(1) while holding s.mu, but Supervisor.Wait calls s.wg.Wait() without taking the lock, so the mutex does not order the two calls. Calling WaitGroup.Add concurrently with WaitGroup.Wait is a data race and violates the WaitGroup contract, which is why the failure only trips occasionally (the test passed locally and in PR CI).

Wait now blocks on stopChan, which Run's defer closes when Run returns, before calling wg.Wait(). Run is the sole caller of startSupervisor, so once Run has returned no further Add can happen and wg.Wait is race-free. WaitWithTimeout inherits the fix because it calls Wait, and its timer still bounds shutdown.

The same latent race existed in the original lark.Hub.Wait; it is fixed properly here in the generalized Supervisor.

Verified: go test -race -count=300 on the flagged test and -count=8 on the whole engine package, all clean. The stopChan gate introduces no deadlock, since every caller pairs Wait with a started Run and a cancelled ctx.

Co-authored-by: J <j@multica.ai>
Co-authored-by: multica-agent <github@multica.ai>