fix(queue): close TOCTOU race between schedule() and start()#165
fix(queue): close TOCTOU race between schedule() and start()#165
Conversation
Introduce an activeSlots counter protected by the existing mutex to track reserved worker slots. This eliminates the window where schedule() could observe a stale BusyWorkers() value and over-dispatch beyond workerCount. Closes #154 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Fixes a concurrency race in the queue scheduler so concurrent schedule() calls can’t over-dispatch beyond the configured workerCount.
Changes:
- Add a mutex-protected
activeSlotscounter to reserve worker capacity earlier and close the TOCTOU gap between scheduling and “busy” accounting. - Decrement
activeSlotswhen a task finishes (and on early-exit paths) to release capacity safely. - Add a regression test intended to assert
BusyWorkers()never exceedsworkerCountunder load.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
queue.go |
Introduces activeSlots and integrates it into schedule(), start(), and work() to prevent over-dispatch. |
ring_test.go |
Adds TestBusyWorkersNeverExceedsWorkerCount to validate the busy-workers invariant under concurrent execution. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Use a gate channel to block tasks and create scheduling pressure - Monitor BusyWorkers continuously from a dedicated goroutine - Replace fixed 2s sleep with deterministic WaitGroup completion Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Replace tight select/default loop with a 100µs ticker for sampling BusyWorkers, avoiding unnecessary CPU consumption during tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Honor ctx.Done() in task function to prevent goroutine leaks on early exit - Add 10s timeout when sending gate tokens to fail fast on deadlock Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Extract tryReserveSlot, releaseSlot, signalReadyLocked helpers to replace three inline mutex+counter blocks - Fold work() defer's slot release and schedule signal into one critical section, saving a lock round-trip per task - Expand activeSlots field comment to clarify its lifecycle versus BusyWorkers Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Shorten trailing comment to under 120 chars; helper docs carry the detail Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
schedule()calls could observe a staleBusyWorkers()value and over-dispatch beyondworkerCountactiveSlotscounter that atomically reserves worker slots, eliminating the gap between the scheduling decision and the busy counter incrementTestBusyWorkersNeverExceedsWorkerCountthat validates the invariant under concurrency pressureRoot Cause
schedule()checkedBusyWorkers()under the mutex and sent on thereadychannel, butIncBusyWorker()instart()happened much later — after receiving from ready, spawning a request goroutine, and waiting for a task. During this window, concurrentschedule()calls (fromwork()defers) saw stale values and could send extra ready signals.Approach
Added an internal
activeSlots int64field that tracks reserved (committed) worker slots. Bothschedule()andstart()check/modify this counter under the same mutex, closing the TOCTOU gap entirely. The publicBusyWorkers()metric remains unchanged — it still only counts workers actively executing tasks.Test plan
go test -race -v ./...— all existing tests passgo test -race -count=50 -run=TestBusyWorkersNeverExceedsWorkerCount— stress test passesgo test -race -count=10 -run=TestDecreaseWorkerCount— dynamic worker count still worksCloses #154
🤖 Generated with Claude Code