fix(agentmain): replace fixed-timeout queue polling with lifecycle-aware drain_queue #374
Open
sontianye wants to merge 1 commit into
…are drain_queue

Tasks running longer than the hardcoded 300 s (task mode) or 180 s (reflect mode) caused queue.Empty to be raised mid-execution. In task mode there was no try/except, so the exception propagated uncaught and left the background agent thread blocked forever on a queue no consumer would drain, making the process unresponsive to all subsequent messages.

Root cause: the consumer had no knowledge of whether the agent was still alive; it raced against a wall-clock deadline instead of the agent lifecycle.

Fix: add GenericAgent.drain_queue(), a generator that polls with a short interval (2 s) but keeps looping unconditionally while self.is_running is True. When the agent finishes it does one final non-blocking flush to collect items enqueued in the narrow window between the last put() and is_running being cleared. Both task mode and reflect mode are updated to use it. The helper is also available to third-party frontends and subagent code that currently open-code their own polling loops.
Problem
In `--task` mode, any job running longer than 300 s raised an unhandled `queue.Empty` exception. The background agent thread was still alive and continued pushing items onto the display queue, but with no consumer left to drain it, `queue.put()` eventually blocked, freezing the agent thread permanently. All subsequent messages received no response.

`--reflect` mode had a `try/except` around the same pattern, so it degraded more gracefully, but a 180 s ceiling still caused silent failures on longer-running reflect tasks.

The root cause in both cases was the same: the consumer was racing against a
wall-clock deadline with no knowledge of whether the agent was still alive.
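
The failure mode can be reproduced in miniature. The following is a minimal sketch, not the project's code: `slow_agent`, the item shape `{"next": i}`, the bounded `maxsize=2`, and the scaled-down timings (0.1 s standing in for the 300 s deadline) are all assumptions made for illustration.

```python
import queue
import threading
import time


def slow_agent(dq: queue.Queue, work_seconds: float, n_items: int) -> None:
    """Hypothetical stand-in for the agent thread: works for a while, then reports."""
    time.sleep(work_seconds)
    for i in range(n_items):
        dq.put({"next": i})  # blocks once the bounded queue is full


# Assumption: the display queue is bounded, so put() can block.
dq = queue.Queue(maxsize=2)
worker = threading.Thread(target=slow_agent, args=(dq, 0.3, 5), daemon=True)
worker.start()

# Old pattern: the consumer waits against a wall-clock deadline only.
try:
    dq.get(timeout=0.1)
    consumer_outcome = "got item"
except queue.Empty:
    consumer_outcome = "queue.Empty"

# The deadline fired although the agent was still working.
alive_at_timeout = worker.is_alive()

# With the consumer gone, the agent fills the queue and then hangs in put().
time.sleep(0.6)
agent_stuck = worker.is_alive() and dq.full()
```

Here the consumer gives up while `alive_at_timeout` is still true, and the agent thread later ends up parked inside `put()` with nobody left to drain the queue.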
Fix
Add `GenericAgent.drain_queue(dq)`, a generator that ties the wait to the agent's actual lifecycle instead of a fixed timeout:

- Keeps polling while `self.is_running` is `True`, regardless of elapsed time
- Once `is_running` flips to `False`, performs one final non-blocking flush to collect any items enqueued in the narrow window between the last `put()` and the flag being cleared
- `next` progress and the final `done` result are handled in a single loop

Both `--task` and `--reflect` modes are updated to use `drain_queue`. The method is also available as a public API for third-party frontends and subagent orchestration code that currently open-code their own polling loops.
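
The behaviour described above might look roughly like the sketch below. It is an illustration under stated assumptions, not the code from this PR: only the names `GenericAgent`, `drain_queue(dq)` and `is_running`, plus the 2 s poll interval, come from the description. The `poll_interval` parameter, the `start` and `_run` helpers, and the `{"next": ...}` / `{"done": ...}` item shapes are hypothetical.

```python
import queue
import threading
import time
from typing import Any, Iterator


class GenericAgent:
    """Hypothetical minimal stand-in for the real agent class."""

    def __init__(self) -> None:
        self.is_running = False

    def drain_queue(self, dq: queue.Queue, poll_interval: float = 2.0) -> Iterator[Any]:
        # Poll with a short timeout, but never give up while the agent is alive.
        while self.is_running:
            try:
                item = dq.get(timeout=poll_interval)
            except queue.Empty:
                continue  # nothing yet; re-check is_running and wait again
            yield item
        # Final non-blocking flush: pick up items enqueued between the last
        # put() and the moment is_running was cleared.
        while True:
            try:
                item = dq.get_nowait()
            except queue.Empty:
                return
            yield item

    def start(self, dq: queue.Queue, steps: int, step_seconds: float) -> threading.Thread:
        # Set the flag before the thread starts so a consumer never sees a stale False.
        self.is_running = True
        thread = threading.Thread(target=self._run, args=(dq, steps, step_seconds), daemon=True)
        thread.start()
        return thread

    def _run(self, dq: queue.Queue, steps: int, step_seconds: float) -> None:
        try:
            for i in range(steps):
                time.sleep(step_seconds)  # simulated slow turn
                dq.put({"next": f"step {i}"})
            dq.put({"done": "ok"})
        finally:
            self.is_running = False  # cleared only after the last put()


# Usage: each step takes longer than the poll interval, yet nothing is lost.
agent = GenericAgent()
dq = queue.Queue()
agent.start(dq, steps=3, step_seconds=0.05)
items = list(agent.drain_queue(dq, poll_interval=0.01))
```

A frontend that currently open-codes its own polling loop could iterate over `drain_queue` in the same way and branch on whether an item carries progress or the final result. The short poll interval only bounds how quickly the consumer notices that the agent has stopped; it is not a deadline on the task.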
Testing
Manually verified with a task that runs ~10 minutes (70 turns, slow model):
is handled immediately