test(pty): stabilize flaky PTY interactions by benvinegar · Pull Request #361 · modem-dev/hunk

benvinegar · 2026-05-24T02:56:41Z

Summary

Stabilize flaky PTY mouse-wheel assertions by retrying pager wheel ticks one at a time
Give pager wheel retry tests enough timeout headroom for slow CI runs
Stabilize the note-focus PTY test by using the explicit cancel control after verifying shortcuts are blocked while the draft composer is active

Context

Main CI was intermittently failing in PTY integration tests. The original failure was pager wheel scrolling; after that was addressed, CI exposed another unrelated PTY timing flake in the note-focus cancellation test.

Verification

bun test ./test/pty/pager.test.ts (5x)
bun test ./test/pty/notes.test.ts -t "draft note focus blocks app shortcuts until cancelled"
bun run test:integration
bun run typecheck
bun run format:check
bun run lint

This PR description was generated by Pi using GPT-5

greptile-apps · 2026-05-24T02:58:57Z

Greptile Summary

This PR stabilizes flaky PTY wheel-scroll assertions by replacing single 10-tick scroll bursts (followed by a 5 s waitForSnapshot) with a new scrollWheelUntil helper that fires one tick at a time, checks the predicate after each tick with a 700 ms window, and retries up to 12 times before failing.

New helper scrollWheelUntil: closes over the module-level harness, accepts a direction and a predicate, and surfaces the last waitForSnapshot error on exhaustion.
Four scroll tests updated: stdin patch mode, stdin patch auto theme, general pager mode, and explicit pager mode --pager all now use the helper instead of the old burst + long poll pattern.

Confidence Score: 4/5

Safe to merge — changes are test-only and address a real CI flakiness root cause.

The retry-per-tick approach is sound and well-scoped. The two observations (outer timeout budget arithmetic and the dead non-Error fallback) are edge cases that won't affect typical CI runs, but are worth keeping in mind if the 20 s default ever needs adjusting for especially slow machines.

test/pty/pager.test.ts — specifically the interaction between the 700 ms per-attempt window, up to 12 retries per direction, and the 20 s global test timeout.

Important Files Changed

Filename	Overview
test/pty/pager.test.ts	Introduces `scrollWheelUntil` helper that retries one PTY wheel tick at a time (up to 12 × 700 ms) instead of a single 10-tick burst, and applies it to four scroll tests; two minor concerns around the global timeout budget and the dead non-Error fallback branch.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Test calls scrollWheelUntil] --> B{direction?}
    B -- down --> C[session.scrollDown 1]
    B -- up --> D[session.scrollUp 1]
    C --> E[harness.waitForSnapshot 700ms]
    D --> E
    E -- predicate matches --> F[return snapshot]
    E -- timeout / no match --> G[save lastError\nattempt += 1]
    G --> H{attempt < 12?}
    H -- yes --> B
    H -- no --> I{lastError instanceof Error?}
    I -- yes --> J[throw lastError]
    I -- no --> K[throw generic timeout Error]

Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
test/pty/pager.test.ts:22-34
**Total timeout budget may still be tight under extreme CI load**

Tests that call `scrollWheelUntil` twice (down + up) can accumulate up to 12 × 700 ms per direction = ~8.4 s each, for a combined ~16.8 s on top of the 15 s `waitForText` startup timeout and the 200 ms `waitIdle`. That's a ceiling of ~32 s against the file-wide `setDefaultTimeout(20_000)`. In practice the predicate matches on the first or second attempt, so the tests pass; but if the PTY is slow enough that either direction needs 4–5 retries (≥ 2.8 s) alongside a slow startup, the test can still time out via the outer limit rather than the inner 700 ms window.

### Issue 2 of 2
test/pty/pager.test.ts:36-40
**Fallback error message swallows the actual timeout text**

If `waitForSnapshot` throws a non-`Error` value (or if `lastError` is still `undefined` because the loop never ran, though the loop guard prevents that), execution falls through to the generic `"Timed out waiting for pager wheel scroll ${direction}."` message instead of the last real snapshot captured inside `waitForSnapshot`. Since `waitForSnapshot` always throws a proper `Error` with the last PTY snapshot in its message, the `instanceof Error` branch will always win in practice — but the dead fallback hides a future footgun if the predicate path ever changes to throw a non-`Error`.

_{Reviews (1): Last reviewed commit: "test(pty): stabilize pager wheel scrolli..." | Re-trigger Greptile}

greptile-apps · 2026-05-24T02:59:00Z

+  for (let attempt = 0; attempt < 12; attempt += 1) {
+    if (direction === "down") {
+      await session.scrollDown(1);
+    } else {
+      await session.scrollUp(1);
+    }
+
+    try {
+      return await harness.waitForSnapshot(session, predicate, 700);
+    } catch (error) {
+      lastError = error;
+    }
+  }


Total timeout budget may still be tight under extreme CI load

Tests that call scrollWheelUntil twice (down + up) can accumulate up to 12 × 700 ms per direction = ~8.4 s each, for a combined ~16.8 s on top of the 15 s waitForText startup timeout and the 200 ms waitIdle. That's a ceiling of ~32 s against the file-wide setDefaultTimeout(20_000). In practice the predicate matches on the first or second attempt, so the tests pass; but if the PTY is slow enough that either direction needs 4–5 retries (≥ 2.8 s) alongside a slow startup, the test can still time out via the outer limit rather than the inner 700 ms window.

Prompt To Fix With AI

This is a comment left during a code review. Path: test/pty/pager.test.ts Line: 22-34 Comment: **Total timeout budget may still be tight under extreme CI load** Tests that call `scrollWheelUntil` twice (down + up) can accumulate up to 12 × 700 ms per direction = ~8.4 s each, for a combined ~16.8 s on top of the 15 s `waitForText` startup timeout and the 200 ms `waitIdle`. That's a ceiling of ~32 s against the file-wide `setDefaultTimeout(20_000)`. In practice the predicate matches on the first or second attempt, so the tests pass; but if the PTY is slow enough that either direction needs 4–5 retries (≥ 2.8 s) alongside a slow startup, the test can still time out via the outer limit rather than the inner 700 ms window. How can I resolve this? If you propose a fix, please make it concise.

Good catch. I raised the pager PTY file timeout to 45s so the worst-case wheel retry budget has headroom instead of racing Bun's outer test timeout.

This comment was generated by Pi using GPT-5

greptile-apps · 2026-05-24T02:59:01Z

+  if (lastError instanceof Error) {
+    throw lastError;
+  }
+
+  throw new Error(`Timed out waiting for pager wheel scroll ${direction}.`);


Fallback error message swallows the actual timeout text

If waitForSnapshot throws a non-Error value (or if lastError is still undefined because the loop never ran, though the loop guard prevents that), execution falls through to the generic "Timed out waiting for pager wheel scroll ${direction}." message instead of the last real snapshot captured inside waitForSnapshot. Since waitForSnapshot always throws a proper Error with the last PTY snapshot in its message, the instanceof Error branch will always win in practice — but the dead fallback hides a future footgun if the predicate path ever changes to throw a non-Error.

Prompt To Fix With AI

This is a comment left during a code review. Path: test/pty/pager.test.ts Line: 36-40 Comment: **Fallback error message swallows the actual timeout text** If `waitForSnapshot` throws a non-`Error` value (or if `lastError` is still `undefined` because the loop never ran, though the loop guard prevents that), execution falls through to the generic `"Timed out waiting for pager wheel scroll ${direction}."` message instead of the last real snapshot captured inside `waitForSnapshot`. Since `waitForSnapshot` always throws a proper `Error` with the last PTY snapshot in its message, the `instanceof Error` branch will always win in practice — but the dead fallback hides a future footgun if the predicate path ever changes to throw a non-`Error`. How can I resolve this? If you propose a fix, please make it concise.

Fixed. The helper now tracks the last failure message and always throws it after retries, preserving the waitForSnapshot diagnostic text even if a future path throws a non-Error.

This comment was generated by Pi using GPT-5

test(pty): stabilize pager wheel scrolling

22f8723

greptile-apps Bot reviewed May 24, 2026

View reviewed changes

benvinegar added 2 commits May 23, 2026 23:03

test(pty): widen pager wheel retry budget

cbe828d

test(pty): stabilize note focus cancellation

4e6bac3

benvinegar changed the title ~~test(pty): stabilize pager wheel scrolling~~ test(pty): stabilize flaky PTY interactions May 24, 2026

benvinegar merged commit 01ee8a1 into main May 24, 2026
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(pty): stabilize flaky PTY interactions#361

test(pty): stabilize flaky PTY interactions#361
benvinegar merged 3 commits into
mainfrom
test/stabilize-pager-pty-wheel

benvinegar commented May 24, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented May 24, 2026

Uh oh!

greptile-apps Bot May 24, 2026

Uh oh!

benvinegar May 24, 2026

Uh oh!

greptile-apps Bot May 24, 2026

Uh oh!

benvinegar May 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

benvinegar commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Context

Verification

Uh oh!

greptile-apps Bot commented May 24, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps Bot May 24, 2026

Choose a reason for hiding this comment

Uh oh!

benvinegar May 24, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 24, 2026

Choose a reason for hiding this comment

Uh oh!

benvinegar May 24, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

benvinegar commented May 24, 2026 •

edited

Loading