
Commit 4bea2cd

Stabilize browser suites for isolated parallel runs
1 parent 3b6639e commit 4bea2cd

37 files changed (+350, -270 lines)

AGENTS.md

Lines changed: 5 additions & 0 deletions
@@ -121,12 +121,15 @@ Rule format:
 - When diagnosing browser-suite CI failures, always inspect and preserve Playwright screenshot artifacts from the failing run, and extend the harness so early bootstrap or fixture failures also leave a screenshot whenever the browser page still exists.
 - Selector-contract remediation requests must be handled repo-wide across all relevant test files (`Web.Tests` and `Web.UITests`), not as partial per-file cleanups.
 - Browser and editor acceptance tests must be stable against seeded demo content, user-entered text, and import filename variations: assert invariant UI behavior, use shared fixtures/constants, and wait on explicit ready or completion signals instead of incidental page titles, transient text, or race-prone intermediate chrome.
+- Browser UI tests must be parallel-safe by construction: if a test can run beside another test, it must not depend on shared mutable route state, shared browser-storage keys, shared warmed pages, or a shared writable seeded document.
 - Editor browser tests that mutate document content must create or import an isolated script for that scenario and edit that script only; do not point multiple mutating tests at the same seeded library script or shared draft state.
 - Do not keep `NotInParallel` on browser-test classes that already use isolated browser contexts and isolated writable scripts; parallel locks need a concrete documented shared-resource conflict, not legacy caution.
+- Browser test harness work is first-class platform work: prefer reusable per-test draft factories, route-ready drivers, storage reset helpers, and failure artifacts over ad-hoc waits or fixture shortcuts that hide shared-state coupling.
 - When the user asks to stabilize failing tests or gather a red baseline, prioritize reproducing the failures, capturing the failing test list, and fixing the suites before spending time on git history, commit hygiene, or other bookkeeping.
 - Repo-wide quality audits and agent-generated review handoff artifacts must be written as root-level task files so other coding agents can pick them up quickly; do not bury those temporary audit results under `docs/` unless the task is explicitly about durable product documentation.
 - Repo-wide cleanup and review passes must explicitly inventory forbidden implementation string literals, `MarkupString` or raw-HTML UI composition, duplicated JS/CSS patterns, architecture-boundary drift, and `foreach`-driven test scenarios that should become isolated TUnit cases.
 - Repo-wide audits should use multiple independent reviewers with distinct focuses when the tooling is available, including external CLI reviewers such as Claude and Copilot plus internal agents, and all review outputs should be captured in root-level task files before remediation starts.
+- When CI browser suites keep flaking, use the available assistant CLIs and internal sub-agents as parallel investigators on the same failure cluster instead of debugging only through one serial line of inquiry.
 - Legacy, dead, duplicate, or speculative code paths should be deleted aggressively instead of being preserved behind compatibility instincts; if code has no clear runtime owner or authoritative contract, remove it rather than keep it as “just in case” ballast.
 - For repo-wide remediation passes, keep an explicit root-level accounting of fixed versus remaining feedback items, finish the code fixes first, and only then run and stabilize the test suites; do not bounce back into verification mid-remediation unless the user explicitly asks.
 - For task-scoped work, edit, stage, and commit only the files directly required for the requested change; do not widen the change set into unrelated user-owned or parallel worktree edits, do not touch changes owned by another agent, and if a blocker comes from that parallel work, wait briefly and re-check instead of patching around their in-flight fix.
@@ -178,6 +181,8 @@ Browser test execution rules:
 - Browser UI scenarios are the primary acceptance gate for this repo. Component and core tests are supporting layers, not the release bar.
 - Major user flows MUST be covered by long Playwright scenarios that execute real browser interactions end to end.
 - Major browser scenarios MUST capture screenshot artifacts under `output/playwright/`.
+- Browser test infrastructure MUST keep CI and local bootstrap behavior equivalent for the page handed to the test; do not add CI-only navigation, warmup, or seeding steps that change the initial routed state seen by the scenario.
+- Browser test seed state MUST be applied explicitly after a verified storage reset on the isolated test page or context; do not inject mutable library or settings seed data through navigation-triggered `AddInitScript` hooks.
 - For new visual elements, visual regressions, or editor chrome/layout work, inspect the real browser surface and capture screenshots; bUnit may support structural contracts but is not sufficient as the primary signal for visual correctness.
 - Responsive layout work is not done until Playwright verifies every routed screen across a phone-and-tablet viewport matrix that includes small, medium, and large handset sizes plus small, medium, and large tablet sizes in both portrait and landscape, with assertions that primary page controls stay visible inside the viewport without clipping.
 - Editor typing and latency fixes are not done until they are reproduced and cleared on the live dev-host editor with real keyboard input, not only synthetic input helpers or the static UI-test host.
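
Reviewer note: the added rules on per-test draft factories and on seeding only after a verified storage reset could look roughly like the sketch below. It is not the repo's harness, which appears to be .NET with Playwright and TUnit. It is a Python illustration against an in-memory stand-in for one context's browser storage, and every name in it (`FakeStorage`, `new_draft_id`, `reset_and_seed`, `prepare_test_storage`, the `draft:` key prefix) is hypothetical.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class FakeStorage:
    """Hypothetical in-memory stand-in for one isolated context's browser storage."""
    items: dict[str, str] = field(default_factory=dict)

    def clear(self) -> None:
        self.items.clear()


def new_draft_id(test_name: str) -> str:
    """Return a script id unique to this call, so no two tests share a writable draft."""
    return f"{test_name}-{uuid.uuid4().hex[:8]}"


def reset_and_seed(storage: FakeStorage, seed: dict[str, str]) -> None:
    """Clear storage, verify it is empty, then apply the seed explicitly."""
    storage.clear()
    if storage.items:
        # Assumed policy: refuse to seed on top of state that survived the reset.
        raise RuntimeError("storage reset could not be verified; refusing to seed")
    # Copy the seed so later writes by the test never touch the shared constant.
    storage.items.update(dict(seed))


def prepare_test_storage(
    test_name: str, base_seed: dict[str, str]
) -> tuple[FakeStorage, str]:
    """Build fresh storage for one test, including its own empty writable draft."""
    storage = FakeStorage()
    draft_id = new_draft_id(test_name)
    seed = {**base_seed, f"draft:{draft_id}": ""}
    reset_and_seed(storage, seed)
    return storage, draft_id
```

In a real harness the reset, the emptiness check, and the seed write would presumably run against the isolated page or context after navigation rather than through an init script. The diff does not show the repo's helper names, so none are assumed here.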
