test(pty): stabilize flaky PTY interactions#361
Conversation
Greptile SummaryThis PR stabilizes flaky PTY wheel-scroll assertions by replacing single 10-tick scroll bursts (followed by a 5 s
Confidence Score: 4/5Safe to merge — changes are test-only and address a real CI flakiness root cause. The retry-per-tick approach is sound and well-scoped. The two observations (outer timeout budget arithmetic and the dead non-Error fallback) are edge cases that won't affect typical CI runs, but are worth keeping in mind if the 20 s default ever needs adjusting for especially slow machines. test/pty/pager.test.ts — specifically the interaction between the 700 ms per-attempt window, up to 12 retries per direction, and the 20 s global test timeout. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Test calls scrollWheelUntil] --> B{direction?}
B -- down --> C[session.scrollDown 1]
B -- up --> D[session.scrollUp 1]
C --> E[harness.waitForSnapshot 700ms]
D --> E
E -- predicate matches --> F[return snapshot]
E -- timeout / no match --> G[save lastError\nattempt += 1]
G --> H{attempt < 12?}
H -- yes --> B
H -- no --> I{lastError instanceof Error?}
I -- yes --> J[throw lastError]
I -- no --> K[throw generic timeout Error]
Prompt To Fix All With AIFix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
test/pty/pager.test.ts:22-34
**Total timeout budget may still be tight under extreme CI load**
Tests that call `scrollWheelUntil` twice (down + up) can accumulate up to 12 × 700 ms per direction = ~8.4 s each, for a combined ~16.8 s on top of the 15 s `waitForText` startup timeout and the 200 ms `waitIdle`. That's a ceiling of ~32 s against the file-wide `setDefaultTimeout(20_000)`. In practice the predicate matches on the first or second attempt, so the tests pass; but if the PTY is slow enough that either direction needs 4–5 retries (≥ 2.8 s) alongside a slow startup, the test can still time out via the outer limit rather than the inner 700 ms window.
### Issue 2 of 2
test/pty/pager.test.ts:36-40
**Fallback error message swallows the actual timeout text**
If `waitForSnapshot` throws a non-`Error` value (or if `lastError` is still `undefined` because the loop never ran, though the loop guard prevents that), execution falls through to the generic `"Timed out waiting for pager wheel scroll ${direction}."` message instead of the last real snapshot captured inside `waitForSnapshot`. Since `waitForSnapshot` always throws a proper `Error` with the last PTY snapshot in its message, the `instanceof Error` branch will always win in practice — but the dead fallback hides a future footgun if the predicate path ever changes to throw a non-`Error`.
Reviews (1): Last reviewed commit: "test(pty): stabilize pager wheel scrolli..." | Re-trigger Greptile |
| for (let attempt = 0; attempt < 12; attempt += 1) { | ||
| if (direction === "down") { | ||
| await session.scrollDown(1); | ||
| } else { | ||
| await session.scrollUp(1); | ||
| } | ||
|
|
||
| try { | ||
| return await harness.waitForSnapshot(session, predicate, 700); | ||
| } catch (error) { | ||
| lastError = error; | ||
| } | ||
| } |
There was a problem hiding this comment.
Total timeout budget may still be tight under extreme CI load
Tests that call scrollWheelUntil twice (down + up) can accumulate up to 12 × 700 ms per direction = ~8.4 s each, for a combined ~16.8 s on top of the 15 s waitForText startup timeout and the 200 ms waitIdle. That's a ceiling of ~32 s against the file-wide setDefaultTimeout(20_000). In practice the predicate matches on the first or second attempt, so the tests pass; but if the PTY is slow enough that either direction needs 4–5 retries (≥ 2.8 s) alongside a slow startup, the test can still time out via the outer limit rather than the inner 700 ms window.
Prompt To Fix With AI
This is a comment left during a code review.
Path: test/pty/pager.test.ts
Line: 22-34
Comment:
**Total timeout budget may still be tight under extreme CI load**
Tests that call `scrollWheelUntil` twice (down + up) can accumulate up to 12 × 700 ms per direction = ~8.4 s each, for a combined ~16.8 s on top of the 15 s `waitForText` startup timeout and the 200 ms `waitIdle`. That's a ceiling of ~32 s against the file-wide `setDefaultTimeout(20_000)`. In practice the predicate matches on the first or second attempt, so the tests pass; but if the PTY is slow enough that either direction needs 4–5 retries (≥ 2.8 s) alongside a slow startup, the test can still time out via the outer limit rather than the inner 700 ms window.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Good catch. I raised the pager PTY file timeout to 45s so the worst-case wheel retry budget has headroom instead of racing Bun's outer test timeout.
This comment was generated by Pi using GPT-5
| if (lastError instanceof Error) { | ||
| throw lastError; | ||
| } | ||
|
|
||
| throw new Error(`Timed out waiting for pager wheel scroll ${direction}.`); |
There was a problem hiding this comment.
Fallback error message swallows the actual timeout text
If waitForSnapshot throws a non-Error value (or if lastError is still undefined because the loop never ran, though the loop guard prevents that), execution falls through to the generic "Timed out waiting for pager wheel scroll ${direction}." message instead of the last real snapshot captured inside waitForSnapshot. Since waitForSnapshot always throws a proper Error with the last PTY snapshot in its message, the instanceof Error branch will always win in practice — but the dead fallback hides a future footgun if the predicate path ever changes to throw a non-Error.
Prompt To Fix With AI
This is a comment left during a code review.
Path: test/pty/pager.test.ts
Line: 36-40
Comment:
**Fallback error message swallows the actual timeout text**
If `waitForSnapshot` throws a non-`Error` value (or if `lastError` is still `undefined` because the loop never ran, though the loop guard prevents that), execution falls through to the generic `"Timed out waiting for pager wheel scroll ${direction}."` message instead of the last real snapshot captured inside `waitForSnapshot`. Since `waitForSnapshot` always throws a proper `Error` with the last PTY snapshot in its message, the `instanceof Error` branch will always win in practice — but the dead fallback hides a future footgun if the predicate path ever changes to throw a non-`Error`.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Fixed. The helper now tracks the last failure message and always throws it after retries, preserving the waitForSnapshot diagnostic text even if a future path throws a non-Error.
This comment was generated by Pi using GPT-5
Summary
Context
Main CI was intermittently failing in PTY integration tests. The original failure was pager wheel scrolling; after that was addressed, CI exposed another unrelated PTY timing flake in the note-focus cancellation test.
Verification
bun test ./test/pty/pager.test.ts(5x)bun test ./test/pty/notes.test.ts -t "draft note focus blocks app shortcuts until cancelled"bun run test:integrationbun run typecheckbun run format:checkbun run lintThis PR description was generated by Pi using GPT-5