Add model stream idle-timeout handling #80

@byapparov

Context

The CLI currently relies on provider/model streaming APIs to keep producing events or eventually fail. If a provider connection stalls mid-stream without closing or raising an error, the session can remain busy indefinitely and automation wrappers may appear hung.

This came up during review of a local exploratory patch that added an AICTRL_MODEL_STREAM_IDLE_TIMEOUT_MS guard in SessionProcessor. That patch is not being shipped yet, because timeout behavior should be designed deliberately at the execution/runtime boundary.

Problem / Goal

Add a deliberate model-stream timeout strategy so stalled provider streams are surfaced as explicit failures instead of leaving sessions running forever.

Success means the CLI has a clear, configurable timeout/cancellation policy for model stream reads, emits useful failure information, and does not accidentally abort legitimate long-running reasoning/model calls that are still making progress.

Proposed Approach

Investigate the right layer for timeout handling, likely around LLM.stream / SessionProcessor, and define whether the guard should be:

  • idle-time based: abort when no stream events arrive for a configurable interval
  • wall-clock based: abort after a maximum total model turn duration
  • provider-specific: only enabled for known problematic providers/transports
  • configurable: exposed through config/env, with timeouts reported as JSON error events

Avoid baking in a hidden global timeout until the policy is explicit.
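
To make the idle-time option concrete, here is a minimal sketch of a guard that wraps any async-iterable event stream and fails when the gap between two events exceeds a limit. The names withIdleTimeout and StreamIdleTimeoutError are hypothetical and do not exist in the CLI today. The sketch also assumes the model stream can be consumed as an AsyncIterable (as the exploratory patch did with stream.fullStream); where the wrapper should actually live is the open design question.

```typescript
// Hypothetical error type; the name and shape are assumptions for illustration.
class StreamIdleTimeoutError extends Error {
  readonly idleMs: number;
  constructor(idleMs: number) {
    super(`model stream produced no events for ${idleMs} ms`);
    this.name = "StreamIdleTimeoutError";
    this.idleMs = idleMs;
  }
}

// Waits for the next event, rejecting if it does not arrive within idleMs.
function nextOrTimeout<T>(
  iterator: AsyncIterator<T>,
  idleMs: number,
): Promise<IteratorResult<T>> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new StreamIdleTimeoutError(idleMs)), idleMs);
  });
  // The timer is cleared as soon as either side settles, so time the
  // consumer spends handling an event never counts as idle time.
  return Promise.race([iterator.next(), timeout]).finally(() => clearTimeout(timer));
}

// Re-yields events from `source`. The timer restarts for every event, so a
// slow but progressing stream is never aborted; only silence is.
async function* withIdleTimeout<T>(
  source: AsyncIterable<T>,
  idleMs: number,
  onTimeout?: () => void, // e.g. () => abortController.abort()
): AsyncGenerator<T> {
  const iterator = source[Symbol.asyncIterator]();
  try {
    while (true) {
      const result = await nextOrTimeout(iterator, idleMs);
      if (result.done) return;
      yield result.value;
    }
  } catch (err) {
    if (err instanceof StreamIdleTimeoutError) onTimeout?.();
    throw err;
  } finally {
    // Best-effort cleanup. Deliberately not awaited: on a stalled source,
    // return() may never settle.
    Promise.resolve(iterator.return?.()).catch(() => {});
  }
}
```

A caller would iterate `withIdleTimeout(events, idleMs, () => controller.abort())` instead of the raw stream. The onTimeout callback matters because rejecting the read does not by itself close the provider connection; the underlying request still needs to be cancelled, for example through an AbortController passed to the provider call.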

User Story

As a developer running aictrl run in CI or automation,
I want stalled model streams to fail with a clear timeout error,
So that jobs do not hang forever and can retry or alert correctly.

Acceptance Criteria

  • A model stream that stops producing events can be detected and cancelled without leaving the session busy indefinitely.
  • Timeout behavior is configurable or explicitly documented, including the default and opt-out/override behavior.
  • Timeout failures are classified as timeout errors and surfaced in human output and JSON/NDJSON output where applicable.
  • Tests cover a stalled stream and verify session completion/error handling.
  • Long-running streams that continue producing events are not aborted by an idle timeout.
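
The configuration and classification criteria above could look roughly like the following sketch. Only the environment variable name comes from the exploratory patch. Everything else is an assumption made for illustration: the resolveIdleTimeoutMs and toNdjsonLine helpers, the "unset or 0 means disabled" policy, and the event shape are all hypothetical and do not reflect the CLI's existing output schema.

```typescript
// Hypothetical policy: no hidden default. Unset, empty, or "0" disables the
// guard; any other non-negative number is the idle limit in milliseconds.
function resolveIdleTimeoutMs(
  env: Record<string, string | undefined>,
): number | undefined {
  const raw = env["AICTRL_MODEL_STREAM_IDLE_TIMEOUT_MS"];
  if (raw === undefined || raw.trim() === "") return undefined;
  const ms = Number(raw);
  if (!Number.isFinite(ms) || ms < 0) {
    // Fail loudly on bad input rather than silently running without a guard.
    throw new Error(
      `invalid AICTRL_MODEL_STREAM_IDLE_TIMEOUT_MS: ${JSON.stringify(raw)}`,
    );
  }
  return ms === 0 ? undefined : ms;
}

// Hypothetical event shape: `kind: "timeout"` lets automation tell a stalled
// stream apart from other failures without parsing the message text.
interface TimeoutErrorEvent {
  type: "error";
  error: { kind: "timeout"; message: string; idleMs: number };
}

function timeoutErrorEvent(idleMs: number): TimeoutErrorEvent {
  return {
    type: "error",
    error: {
      kind: "timeout",
      message: `model stream produced no events for ${idleMs} ms`,
      idleMs,
    },
  };
}

// NDJSON: one JSON document per line, terminated by a newline.
function toNdjsonLine(event: TimeoutErrorEvent): string {
  return JSON.stringify(event) + "\n";
}
```

Whether the guard should default to off, as sketched here, or to some conservative limit is exactly the kind of policy decision this issue asks to make explicit and document.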

Out of Scope

  • The separate structured session_error parity change for existing session.error events.
  • Provider-specific retry/fallback policy after a timeout.
  • Shipping the current exploratory local patch as-is without a design pass.

Roadmap Alignment

  • Pillar: EXEC
  • Quarter: Q2 2026
  • Theme fit: Supports executor observability and operational reliability for automation workflows.
  • Decision gate impact: indirect — improves reliability and debuggability of headless execution.

References

  • Closest existing milestone: Enterprise Observability
  • Related code areas: packages/cli/src/session/processor.ts, packages/cli/src/session/llm.ts, packages/cli/src/cli/cmd/run.ts
  • Local exploratory patch considered: AICTRL_MODEL_STREAM_IDLE_TIMEOUT_MS wrapping reads from stream.fullStream

Metadata

  • Labels: enhancement