
fix(agentmain): replace fixed-timeout queue polling with lifecycle-aware drain_queue #374

Open

sontianye wants to merge 1 commit into lsdefine:main from sontianye:worktree-fix-task-timeout

Conversation

@sontianye
Contributor

Problem

In --task mode, any job running longer than 300 s raised an unhandled
queue.Empty exception. The background agent thread was still alive and
continued pushing items onto the display queue — but with no consumer left
to drain it, queue.put() eventually blocked, freezing the agent thread
permanently. All subsequent messages received no response.

--reflect mode had a try/except around the same pattern, so it degraded
more gracefully, but a 180 s ceiling still caused silent failures on
longer-running reflect tasks.

The root cause in both cases was the same: the consumer was racing against a
wall-clock deadline with no knowledge of whether the agent was still alive.
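
For concreteness, the old consumer pattern looked roughly like this (reconstructed for illustration, not copied from the code; 300 s is the task-mode ceiling):

```python
import queue

dq = queue.Queue()  # display queue fed by the background agent thread

# The consumer blocks with a fixed wall-clock timeout. If the agent is
# still working when 300 s elapse, queue.Empty is raised; task mode had
# no try/except, so the consumer died while the producer kept put()-ing.
item = dq.get(timeout=300)
```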

Fix

Add GenericAgent.drain_queue(dq) — a generator that ties the wait to the
agent's actual lifecycle instead of a fixed timeout:

  • Polls with a short 2 s interval so the wait doesn't spin the CPU
  • Keeps looping as long as self.is_running is True, regardless of elapsed
    time
  • When is_running flips to False, performs one final non-blocking flush
    to collect any items enqueued in the narrow window between the last put()
    and the flag being cleared
  • Yields every item so callers can handle both streaming "next" progress
    and the final "done" result in a single loop (see the sketch below)
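
A minimal sketch of the approach, reconstructed from the description above (the method and flag names come from this PR; poll_interval is an assumed parameter name, not confirmed by the diff):

```python
import queue

class GenericAgent:
    def __init__(self):
        self.is_running = False  # set by the agent thread while it works

    def drain_queue(self, dq, poll_interval=2.0):
        """Yield display-queue items until the agent finishes.

        The wait is tied to the agent's lifecycle (self.is_running)
        rather than a wall-clock deadline, so long-running tasks can
        never hit an uncaught queue.Empty.
        """
        while self.is_running:
            try:
                # Short timeout: wake up to re-check is_running instead
                # of blocking forever or busy-waiting.
                yield dq.get(timeout=poll_interval)
            except queue.Empty:
                continue
        # Final non-blocking flush: collect items enqueued in the window
        # between the last put() and is_running being cleared.
        while True:
            try:
                yield dq.get_nowait()
            except queue.Empty:
                break
```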

Both --task and --reflect modes are updated to use drain_queue.
The method is also exposed as a public API for third-party frontends and
subagent-orchestration code that currently open-code their own polling loops.
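
A consumer then handles streaming progress and the final result in one loop. This usage sketch assumes queue items are dicts with a "type" field of "next" or "done", which is inferred from the wording above rather than taken from the diff:

```python
import queue

dq = queue.Queue()
agent = GenericAgent()
# The background agent thread (started elsewhere) sets agent.is_running,
# pushes items onto dq, and clears the flag when it finishes.

for item in agent.drain_queue(dq):
    if item.get("type") == "next":    # streaming progress
        print(item.get("content", ""), end="", flush=True)
    elif item.get("type") == "done":  # final result
        print(item.get("result"))
```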

Testing

Manually verified with a task that runs ~10 minutes (70 turns, slow model):

  • Before: process hangs after 5 minutes, subsequent messages get no reply
  • After: task completes normally, output written correctly, next message
    is handled immediately

