Skip to content

chore: upgrade Chromium from v126 to v143 for asset discovery#2187

Open
Manoj-Katta wants to merge 38 commits into
masterfrom
PPLT-4214/upgrade-chromium-v143
Open

chore: upgrade Chromium from v126 to v143 for asset discovery#2187
Manoj-Katta wants to merge 38 commits into
masterfrom
PPLT-4214/upgrade-chromium-v143

Conversation

@Manoj-Katta
Copy link
Copy Markdown
Contributor

@Manoj-Katta Manoj-Katta commented Apr 17, 2026

Chromium v126 → v143 upgrade (PR #2187)

This PR upgrades the bundled Chromium binary that @percy/core uses for asset discovery from v126.0.6478.184 → v143.0.7499.169, to match the version percy-renderer ships server-side.

The version jump crosses 17 major Chrome versions including:

  • Chrome 128 — old-headless → new-headless rewrite (blog)
  • Chrome 130 — Local Network Access opt-in
  • Chrome 132--headless=old deleted entirely
  • Chrome 143 — Local Network Access default-on, malformed Content-Length leniency, PlzDedicatedWorker default

Many "regressions" we hit are not bugs — they are intentional Chrome behaviors that old headless quietly didn't enforce. This document walks each one, the diagnostic story behind it, and the fix.

TL;DR for reviewers: Production-code changes concentrated in network.js (with smaller diffs in browser.js and install.js) covering 9 categories of v143 behavior change. The network.js changes are layered: (a) Chrome 128 site isolation and v143 launch-flag work in browser.js; (b) malformed Content-Length pre-abort and the worker-script direct-fetch fallback with size/origin/timeout safety nets in network.js; (c) test coverage for every new branch including the deterministic captureResourceDirectly catch test that closes the L755 gap. Per ce:review, 11 must-fix/should-fix items are applied; 5 reviewer comments are documented as deferred with reasons.

Section 1 — What Chrome 143 changed, and what we did for each

1.1. Site isolation became default-on (Chrome 128)

What changed: Cross-origin sub-resources and worker scripts now run in separate renderer processes by default. --disable-site-isolation-trials only affected opt-in trials, not the baseline-on behavior.

How it broke Percy: Percy's CDP Fetch.enable and Network.enable are attached to the page session. Events on a sibling renderer process don't reach those listeners → worker fetches and some cross-origin sub-resources were silently uncaptured.

What we did: Added IsolateOrigins,site-per-process to the --disable-features= launch flag (browser.js). Keeps cross-origin sub-resources in the same renderer process, matching v126 old-headless behavior.

Risk: None for Percy's use case. CORS/CORB/ORB enforcement on actual cross-origin internet hosts is preserved — only Chrome's process-isolation is relaxed.

1.2. HTTPS-first auto-upgrade in headless (Chrome 143)

What changed: Chrome 143 silently auto-upgrades any HTTP navigation to HTTPS. If HTTPS isn't reachable, Chrome blocks navigation with net::ERR_BLOCKED_BY_CLIENT.

How it broke Percy: Customers running asset discovery against HTTP URLs (local dev servers, CI environments, staging) would see silent navigation failures.

What we did: Added HttpsFirstBalancedModeAutoEnable to --disable-features=. HTTP navigation proceeds normally.

Risk: None — Percy is a tool, not a browser. Users intend to load the URL they pass in.

1.3. Local Network Access checks default-on (Chrome 143)

What changed: Pages fulfilled via Fetch.fulfillRequest (Percy's domSnapshot root-document path) are treated as an external context by Chrome's LNA gating. Sub-resource requests to localhost, 127.0.0.1, or RFC 1918 private addresses are blocked with corsError = LocalNetworkAccessPermissionDenied.

How it broke Percy: 12 cross-origin and auto-domain-validation discovery tests failed. Percy's logger swallowed the actual Chrome error — visible only via raw CDP. The wrong hypothesis (a planned dual-stage Fetch interception refactor) would have wasted weeks.

Diagnostic story: Wrote a standalone raw-CDP Node script that bypassed Percy entirely and replicated the exact Fetch.fulfillRequest flow. The probe surfaced LocalNetworkAccessPermissionDenied. Cross-referenced with Chrome's LNA spec.

What we did: Added LocalNetworkAccessChecks to --disable-features=. Narrower than --disable-web-security — preserves CORS/CORB/ORB on real internet hosts, only relaxes Chrome's loopback/private-network gating, which is exactly what asset discovery needs.

Risk: None for headless asset discovery. The renderer side doesn't use this flag.

1.4. Malformed Content-Length no longer triggers Chrome self-abort (Chrome 143)

What changed: In v126, Chrome's HTTP parser detected malformed Content-Length (NaN, non-numeric) and dropped the connection. Network.loadingFailed fired and Percy logged the failure. In v143, Chrome is lenient — it waits indefinitely for body bytes that never come. No event reaches Percy.

How it broke Percy: A page with <link href="bad.css"> (where the server returns malformed CL) would hang. Page.lifecycleEvent name=load never fires. Page.navigate({waitUntil:'load'}) times out at 30s, retries 3 times → 93s of dead air, then snapshot fails.

Diagnostic story: Reproduced the 93s timeout. Added per-event CDP tracing in network.js. Confirmed Network.responseReceived fired (status=200, headers received) but Network.loadingFinished and Network.loadingFailed never fired. The connection was hanging in Chrome's network stack.

What we did (5 pieces, all in network.js):

  1. Opt into response-stage Fetch interception via interceptResponse: true on Fetch.continueRequest. Chrome now pauses every intercepted request a second time once response headers are received (but before body bytes stream).

  2. Add _handleResponsePaused to inspect Content-Length at the response stage. If tooLarge or malformed, call Fetch.failRequest({errorReason: 'Aborted'}) proactively — Chrome stops waiting, <link> resolves as failed, page load fires.

  3. Two helper functions (headersArrayToObject, inspectContentLength) at module scope for clean inspection.

  4. Lifecycle ordering: Fetch.failRequest BEFORE _forgetRequest. v126's malformed-CL handling was implicit (Chrome auto-aborted); v143 forces us to call Fetch.failRequest explicitly. If our failRequest call itself throws (transient CDP error, target detach race), running _forgetRequest first would leak Chrome's Fetch state — Percy thinks the request is done, but Chrome's interception is still paused indefinitely. We now finalize the disposition first, only forgetting the request after. On unrecognized errors, a last-resort Fetch.continueResponse attempts to un-pause Chrome so it doesn't leak.

  5. Observability for stuck-snapshot diagnostics — currently at debug, telemetry follow-up tracked. Unknown errors in _handleResponsePaused's failRequest catch and _continueResponse's catch stay at log.debug. We initially promoted these to log.warn for default-loglevel visibility, then reverted (commit 334f0621) because routine Session closed / Target closed lifecycle races fired these catches and the warn noise drowned legitimate signals. The right home for this signal is structured instrumentation telemetry rather than user-facing logs; tracking that as a separate follow-up so customer logs stay quiet on routine teardowns.

Pattern established: Fetch.failRequest is used only when Percy asserts a failure (our oversized-CL abort). For errors Chrome already reports, we use Fetch.continueResponse to defer to Chrome's natural loadingFailed flow — this preserves Chrome's specific errorText (e.g. net::ERR_EMPTY_RESPONSE) instead of synthesizing a generic net::ERR_FAILED (which Percy's asset_load_missing instrumentation explicitly filters as noise).

Side benefit: Pre-fix, Percy's MAX_RESOURCE_SIZE check inside saveResponseResource ran after Chrome had already buffered the full body. Our new check fires before the body streams — saves bandwidth and memory, and unsticks the page render faster. This was a latent inefficiency in v126 too; v143 forced us to fix it.

Cost: Every Fetch-intercepted request now does one extra CDP roundtrip. Sub-millisecond on localhost websocket. Puppeteer and Playwright use the same dual-stage interception pattern.

1.5. PlzDedicatedWorker — dedicated worker scripts disappear from CDP (Chrome 143)

What changed: Chrome 143 made PlzDedicatedWorker the default (Chromium issue 40093136"Implement browser-initiated dedicated worker script load"). Dedicated worker scripts are now fetched in the browser process via WorkerScriptFetcher, not in the renderer URLLoader proxy chain. The renderer-side URLLoader proxy is what historically emitted Network.responseReceived for worker scripts "for free." That path is now bypassed.

The precise mechanism (verified against Chromium source): Under PlzDedicatedWorker, DevTools instrumentation for the worker-script fetch is handled by content/browser/devtools/devtools_instrumentation.cc. That file has hand-coded hooks for the worker path: OnWorkerMainScriptRequestWillBeSent (emits Network.requestWillBeSent) and OnWorkerMainScriptLoadingFailed (emits Network.loadingFailed). There is no corresponding OnWorkerMainScriptResponseReceived hook. Service workers have one (OnServiceWorkerMainScriptFetchingFailed → network_handler->ResponseReceived(...)); dedicated workers don't. So Network.responseReceived is never emitted on any CDP session for a successful dedicated-worker main-script fetch — it's a Chromium instrumentation gap, not a routing problem.

The NetworkServiceDevToolsObserver attached to the worker-script URLLoaderFactory is bound to the page frame's devtools token (see content/browser/worker_host/worker_script_fetcher.ccMakeSelfOwned(creator_render_frame_host->GetDevToolsFrameToken().ToString())), which is why the page session sees requestWillBeSent. The loadingFinished that does fire on the worker session arrives via a separate path — the worker target's own NetworkHandler, created at auto-attach by content/browser/devtools/dedicated_worker_devtools_agent_host.cc's AttachSession, receives completion from the renderer once the worker is alive — but that handler never received requestWillBeSent or responseReceived from the browser-side fetch, so Network.getResponseBody against it returns "no resource with given identifier found."

Auto-attach (Target.setAutoAttach) is not the cause — it merely makes the asymmetry visible because there's a worker session to receive loadingFinished. A single-session debugger sees the same loss of responseReceived.

How it broke Percy: Two layered failures.

Failure A — idle() hangs forever. Percy's _handleLoadingFinished awaits Network.responseReceived unconditionally. Since Chromium never emits that event for the worker script, the await blocks forever → the request stays in #requestsidle() never returns → snapshot hits Page.TIMEOUT × 3 retries = 93s.

Failure B — Worker bytes never reach CDP. Even after we time out and clean up #requests, the renderer (server-side) still needs the worker.js bytes to construct the Worker at render time. Neither the page session's NetworkHandler nor the worker session's NetworkHandler ever stored the body, so Network.getResponseBody returns "no resource with given identifier found" on both sessions.

Diagnostic story:

  • Wrote a standalone raw-CDP Node script that spawns v143 Chromium, navigates to a worker page, logs every Network/Fetch/Target event across all sessions, and tries Network.getResponseBody on each session that emitted events. Confirmed: responseReceived is absent on every session; loadingFinished fires on the worker session with the page's requestId (= the worker token); getResponseBody returns "no resource with given identifier found" everywhere.
  • Confirmed against Chromium source: no OnWorkerMainScriptResponseReceived hook exists in content/browser/devtools/devtools_instrumentation.cc.
  • Cross-checked with Playwright issue #31747 and PR #35692, and Puppeteer PR #13752 — both upstream projects independently hit the same gap.

What we did (in network.js):

  1. RESPONSE_RECEIVED_TIMEOUT = 2000 — module-scope named constant. Cap the responseReceived await in _handleLoadingFinished with a Promise.race against a 2-second timer. Worker scripts wake up the handler instead of hanging it. Comment notes the cumulative N*2s worst case so future readers can reason about cost.

  2. Debug log when the timeout fires (Skipping resource: responseReceived not received within 2000ms - <url>). Asserted in captures requests from workers so a regression is observable — if a future Chrome restores the v126 lifecycle, the negated assertion will fail and tell us to remove the workaround.

  3. captureResourceDirectly fallback (renamed from captureScriptDirectly — the function captures any allowlisted resource that fell through to direct fetch, not just worker scripts). When the timeout fires and the URL is in allowedHostnames AND NOT in disallowedHostnames, fetch the bytes directly via Node HTTP (makeDirectRequest, the same helper used for fonts since v126). Save them as a Percy resource with mimetype derived from the URL extension via mime.lookup().

  4. DIRECT_FETCH_TIMEOUT = 5000 — module-scope constant. Wraps the makeDirectRequest call inside captureResourceDirectly in Promise.race against a 5-second timer. Caps the wall-clock cost if a worker host accepts the TCP connection but stalls. Pre-fix worst case was ~30s (networkIdleWaitTimeout); post-fix is ~7s (RESPONSE_RECEIVED_TIMEOUT + DIRECT_FETCH_TIMEOUT).

  5. Size guard + real HTTP status in captureResourceDirectly. makeDirectRequest now returns { body, status } via the makeRequest callback. The fallback rejects bodies over MAX_RESOURCE_SIZE (25 MB) — matching saveResponseResource's guard — to keep the WebSocket payload safe. The Percy resource records the real HTTP status from the direct fetch instead of a hardcoded 200. Font path call site updated to destructure { body }.

  6. Disallowedhostnames precedence at the fallback gate. The gate in _handleLoadingFinished now mirrors sendResponseResource's pattern — checks disallowedHostnames first, allowedHostnames second. Customers who explicitly blocklist an internal host won't see Percy's Node fallback bypass that gate. A debug log fires when the fallback is skipped due to hostname gating so the silent-skip is observable.

  7. Same-origin Authorization in makeDirectRequest. Browser's URLLoader origin-scopes Basic auth automatically; our Node fallback didn't. Extracted as an exported shouldAttachAuth(authorization, requestUrl, snapshotUrl) pure helper that compares the target URL's origin against network.meta?.snapshotURL's origin and only returns true when both URLs parse and origins match. Malformed URLs return false defensively. Prevents customer credentials leaking to redirected third-party origins, and makes the same-origin / cross-origin / malformed-URL branches unit-testable without /* istanbul ignore */ markers.

  8. makeDirectRequest reads cookies from the page session, not the triggering session. When captureResourceDirectly is called from _handleLoadingFinished bound to the worker session (v143 worker-script case), Network.getCookies on that session throws "Internal error" because the worker session's NetworkHandler has only a partial Network domain. Extracted as an exported pickCookieSession(network, session) helper that returns network.page.session (the top-level page session, full Network domain) when available and falls back to the triggering session otherwise. Both branches of the ?? are unit-testable via the helper. A defensive try/catch with empty-cookies fallback remains for the unlikely case the page session's Network.getCookies is also unavailable (e.g., browser closing mid-snapshot).

Why the hostname gate (not type gate): v143 PlzDedicatedWorker reports the worker fetch with resourceType: 'Other', not 'Script'. A strict type check would miss the case. Gating on allowedHostnames scopes the fallback to URLs the snapshot would have captured anyway, regardless of Chrome's internal classification.

1.6. HTML attribute-value serialization escapes < and > (Chrome 128+)

What changed: Chrome 128's new-headless serializes HTML attribute values per the HTML5 spec, entity-escaping < and >. So <iframe srcdoc="<p>Foo</p>"> serializes as <iframe srcdoc="&lt;p&gt;Foo&lt;/p&gt;">. Both render identically.

How it broke Percy: One test (runs the execute callback in the correct frame) asserted the literal <p>Foo</p> in serialized iframe srcdoc.

What we did: Widened the regex to accept either form. No production code change.

1.7. Chrome 128+ auto-fetches /favicon.ico on every navigation

What changed: New headless mode auto-fetches favicons. Old headless never did.

How it broke Percy: Several discovery tests asserted exact server.requests counts or captured[i] indices. Phantom /favicon.ico entries shifted everything.

What we did: Two test-helper changes (test/helpers/server.js):

  1. Filter /favicon.ico out of the server.requests log
  2. Reply HTTP 204 by default for /favicon.ico so it doesn't enter the snapshot resource list

Tests that explicitly want favicon capture (added in 890883ad, 98f5a4e2) register their own server.reply('/favicon.ico', ...) override.

Note for next commit: the 204-default approach masks v128+ behavior instead of acknowledging it. A pending principled refactor will revert to a realistic test fixture and update affected tests to expect the auto-favicon-fetch in their assertions.

1.8. Browser-launch failure path varies by platform (Chrome 143)

What changed: Some args we used to test the "browser failed to launch" code path no longer cause Chrome to exit on all platforms. v143 Chrome with --version exits cleanly on Linux/macOS but keeps running on Windows; v126's pattern was Windows-friendly via --remote-debugging-port=null.

What we did: browser.js now passes a real Error to its rejection handler (one shape across code paths). The failing-launch test uses --version on Linux/macOS and --remote-debugging-port=null on Windows so each runner exercises the handleExitClose code path.

1.9. Auth dialog cancellation in headless (fundamental v143, NOT fixed)

Observed: When Chrome 143 cancels an HTTP basic auth dialog in headless mode, the page's load event never fires. Page.navigate({waitUntil:'load'}) times out at 30s.

Impact: The does not capture without auth credentials test still takes ~1m34s in v143 (vs ~1s in v126). Test still passes; just slowly.

Why not fixed: This is fundamental Chrome 143 headless behavior — there's no graceful path on Percy's side. Real customers running asset discovery against auth-protected URLs without credentials would hit the same delay. Documented as known.

Section 2 — Other plumbing changes (not Chrome-behavior driven)

2.1. CI cache-key bump

.github/.cache-key counter incremented. Forces CI to re-download the v143 Chromium binary instead of reusing the cached v126.

⚠️ Note for future Chromium upgrades: This bump was missed for v96, v123, and v126 upgrades in prior PRs. CI for those upgrades may have silently run against the cached old binary, producing false-green CI signals. Going forward this should be a mandatory step. Consider a pre-commit hook or PR template checklist item.

2.2. Updated sec-ch-ua header

In network.js, the hardcoded sec-ch-ua HTTP request header bumped from v123 → v143. Used only by makeDirectRequest (direct HTTP for fonts and the new worker fallback). Header should match the bundled binary's version.

Deferred: Reviewer ce:review (P2 #6) flagged that the version literal will silently desync on the next Chrome bump. The whole makeDirectRequest header block has multiple hardcoded values (sec-fetch-dest: 'font' is wrong for worker scripts; sec-ch-ua-platform: '"macOS"' is wrong on Linux/Windows). Deferring a full audit rather than fixing one literal in isolation.

2.3. install.js revisions

Bumped the per-platform Chromium revision numbers in install.js. Each revision is the nearest chromium-snapshots entry to v143.0.7499.169 for that platform.

Documented the picking-procedure in inline comments so future upgrades don't have to re-derive it.

Section 3 — References

Upgrade bundled Chromium from v126.0.6478.184 to v143.0.7499.169
to match the renderer browser version. Update sec-ch-ua header
from v123 to v143 in direct font request headers.

PPLT-4214

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Manoj-Katta Manoj-Katta requested a review from a team as a code owner April 17, 2026 05:50
Manoj-Katta and others added 5 commits April 19, 2026 12:46
Chrome >=128 new headless auto-requests /favicon.ico on page load,
breaking discovery tests that asserted exact request counts. Filter
at the helper level so all tests benefit without per-assertion churn.
Chrome >=128 new headless enforces site isolation via IsolateOrigins
and site-per-process, putting cross-origin sub-resources and worker
fetches into separate renderer processes. The existing
--disable-site-isolation-trials flag only covers opt-in trials and
no longer keeps everything in one renderer.

Extend the existing --disable-features list with IsolateOrigins and
site-per-process so the main page's Fetch.enable / Network.enable
listeners continue to see cross-origin and worker traffic, matching
the v126 old-headless behavior Percy relies on.

Also add HttpsFirstBalancedModeAutoEnable to prevent new headless
from blocking HTTP asset discovery with ERR_BLOCKED_BY_CLIENT.
Invalidates actions/cache entries that still contain the old v126
Chromium binary so CI downloads the new v143 revisions fresh.
…T-5285)

Two adjustments for Chrome >=128 (new headless) without changing
production code:

- test server now replies 204 to the auto-fetched /favicon.ico so it
  doesn't leak into a snapshot's resource list and shift downstream
  request indices (the previous helper only filtered favicon from the
  request log; the browser still received the default reply and the
  resource still got captured).
- iframe-srcdoc test regex accepts both serializations: pre-Chrome-128
  left "<" / ">" literal inside attribute values, while >=128
  entity-escapes them per HTML5 spec. Both are valid HTML.

Fixes 4 of the v143 failures: takes additional snapshots after running
each execute, runs the execute callback in the correct frame, can
execute scripts at various states, can execute scripts that wait for
specific states.

Cluster A (cross-origin / domain-validation, 14 specs) is being tracked
separately — root cause is Chrome's stricter CORS / Opaque Response
Blocking; chasing a narrower fix than --disable-web-security before
landing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…284)

Chrome 143 enables Local Network Access (LNA) checks by default. When
asset discovery serves the root document via Fetch.fulfillRequest (the
domSnapshot path), Chrome treats sub-resource requests to local network
addresses (*.localhost, 127.0.0.1, RFC 1918) as needing explicit user
permission. Headless can't grant the prompt, so the request fails with
corsError = LocalNetworkAccessPermissionDenied and the resource is
never delivered to Percy's CDP listeners.

Add LocalNetworkAccessChecks to the existing --disable-features list so
the fulfilled-root flow keeps loading cross-origin local sub-resources,
matching v126 behavior. This is narrower than --disable-web-security:
it preserves CORS / CORB / ORB enforcement on real cross-origin
internet hosts and only relaxes Chrome's loopback / private-network
gating, which is exactly what asset discovery needs.

Resolves the 12 cross-origin and auto-domain-validation discovery test
failures from PR #2187 CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread packages/core/test/helpers/server.js Outdated
Comment thread packages/core/test/snapshot.test.js Outdated
Comment thread packages/core/src/install.js
Comment thread packages/core/src/browser.js Outdated
Copy link
Copy Markdown
Contributor

@this-is-shivamsingh this-is-shivamsingh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commented :}}

Manoj-Katta added a commit that referenced this pull request Apr 27, 2026
Address inline review comments on PR #2187:

- src/browser.js: split the --disable-features comment into one bullet
  per feature. Each bullet documents the specific Chrome >=128/143
  behavior that broke and why disabling is the right fix
- src/install.js: add the procedure for picking per-platform revision
  numbers (chromiumdash + snapshot index) so future upgrades don't
  have to re-derive it
- test/helpers/server.js: only short-circuit /favicon.ico to 204 when
  the test hasn't supplied an explicit reply for it. Tests that want
  favicon-as-asset (none today, but future-proof) keep working via
  `server.reply('/favicon.ico', ...)`
- test/snapshot.test.js: expand the iframe srcdoc-serialization comment
  with a literal example of each form so the regex's "either form"
  intent is obvious

No behavioral change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address inline review comments on PR #2187:

- src/browser.js: split the --disable-features comment into one bullet
  per feature. Each bullet documents the specific Chrome >=128/143
  behavior that broke and why disabling is the right fix
- src/install.js: add the procedure for picking per-platform revision
  numbers (chromiumdash + snapshot index) so future upgrades don't
  have to re-derive it
- test/helpers/server.js: only short-circuit /favicon.ico to 204 when
  the test hasn't supplied an explicit reply for it. Tests that want
  favicon-as-asset (none today, but future-proof) keep working via
  `server.reply('/favicon.ico', ...)`
- test/snapshot.test.js: expand the iframe srcdoc-serialization comment
  with a literal example of each form so the regex's "either form"
  intent is obvious

No behavioral change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Manoj-Katta Manoj-Katta force-pushed the PPLT-4214/upgrade-chromium-v143 branch from c615e84 to 4f02ad6 Compare April 27, 2026 12:17
Manoj-Katta and others added 13 commits April 28, 2026 00:25
…ntent-Length (PPLT-4214)

Two v143 regressions surfaced after the launch-flag fixes landed:

1. `does not capture remote files with content-length NAN greater
   than 25MB` — Chrome 143 won't terminate the body of a malformed
   `Content-Length` response, so Network.loadingFinished never fires,
   the page's <link href="large.css"> blocks the load event, and the
   navigation times out (3 retries × 30s = 93s). The existing size
   guard in saveResponseResource is unreachable.

2. `captures requests from workers` — for page-fetched worker scripts,
   Chrome 143 splits the lifecycle across two CDP sessions: the page
   session sees Fetch.requestPaused under requestId X, but
   Network.responseReceived / loadingFinished for the same script fire
   on the worker session under a fresh requestId Y once the worker
   target attaches. Percy's #requests entry under X is never cleaned
   up, idle() blocks indefinitely, retries fail. Confirmed via raw
   CDP probe (/tmp/worker-probe.mjs) — Chrome itself behaves
   correctly; the bug is in Percy's requestId-keyed bookkeeping.

Fix: enable Fetch response-stage interception (Fetch.continueRequest
with interceptResponse:true) and add _handleResponsePaused with four
branches:

- responseErrorReason set: confirm the failure with Fetch.failRequest
  so _handleLoadingFailed logs the network error as before.
- Content-Length oversized or malformed: forget the request,
  Fetch.failRequest with 'Aborted' before the body streams. This
  unsticks the page load for the NaN test.
- 3xx redirect: skip — _handleRequest already builds the redirect
  chain off the next request-stage event with redirectResponse.
- Otherwise: schedule a 1s backstop that forgets the request only if
  Network.loadingFinished / loadingFailed haven't already done so.
  For normal requests this is a no-op (Network events fire within
  tens of ms). For worker scripts where loadingFinished never fires
  on the page session, this is what unblocks idle().

Trade-off: every Fetch-intercepted request now does one extra CDP
roundtrip (response-stage pause -> continueResponse). Sub-millisecond
per request on localhost websocket. Puppeteer and Playwright run this
same dual-stage pattern.

Helpers added: inspectContentLength (returns tooLarge / malformed /
rawValue) and headersArrayToObject (Fetch event headers come as
[{name,value}]). Constant RESPONSE_STAGE_BACKSTOP_MS = 1000.

Both target tests pass in 2-3s in isolation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s (PPLT-4214)

CI on 56306df surfaced a regression in `logs instrumentation for
network errors`: when a server destroys the socket mid-response,
Network.loadingFailed used to fire with errorText='net::ERR_EMPTY_RESPONSE'
(or similar concrete reason), and _handleLoadingFailed logged
asset_load_missing/network_error.

After enabling response-stage Fetch interception, the error now
surfaces first as Fetch.requestPaused with responseErrorReason set.
The previous handler called Fetch.failRequest({errorReason:
responseErrorReason}), which forces Chrome to fire Network.loadingFailed
with errorText synthesized from that reason. For socket-destroy,
responseErrorReason is the generic 'Failed', which collapses to
net::ERR_FAILED — the exact errorText the asset_load_missing branch
explicitly excludes, so the instrumentation was lost.

Switch to Fetch.continueResponse for errored responses. This lets
Chrome propagate the original errorText through Network.loadingFailed
naturally, restoring the asset_load_missing log.

Verified locally:
- logs instrumentation for network errors: passes (3s)
- captures requests from workers: still passes (4s)
- does not capture remote files with content-length NAN: still passes (2s)
- captures redirected resources, does not capture event-stream
  requests, logs failed request errors with a debug loglevel, logs
  unhandled response errors gracefully: all still pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The other discovery specs opt out of /favicon.ico via the test
server's 204 short-circuit so favicon noise doesn't shift their
captured[] indices. This spec opts back in by registering an
explicit /favicon.ico reply, asserting that Percy correctly
captures the favicon when the server actually serves one — the
production scenario where a customer's site has a real favicon.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… a timer (PPLT-4214)

Replaces the 1-second backstop timer with a principled fix: read the
response body via Fetch.getResponseBody at response stage, save the
resource, and forget the request — so the entire lifecycle is complete
before Chrome would dispatch Network.loadingFinished.

The backstop approach was a workaround that silently dropped requests
where Network.loadingFinished never fired on the page session (Chrome
143's split-session worker scripts being the motivating case). The
worker test passed because the orphan was deleted, but the worker
script itself was never captured as a Percy resource — a real fidelity
gap, even if invisible to visual diffs.

Approach 2 captures the resource using the data Chrome gives us at
response stage:
- Errored response: continue, let Network.loadingFailed propagate the
  original errorText to _handleLoadingFailed (unchanged from before).
- Oversized / malformed Content-Length: log skip, forget, Fetch.failRequest
  before body streams (unchanged from before).
- Redirect (3xx): continue; _handleRequest builds the chain on the next
  request stage event.
- Streaming (text/event-stream): continue without reading body — SSE bodies
  never complete and Fetch.getResponseBody would hang.
- Normal: read body via Fetch.getResponseBody, build a synthetic response
  object, run saveResponseResource, forget the request.

Network.loadingFinished still fires later for normal page-session requests
but finds nothing in #requests and exits early (the existing handler
already guards for this case).

Other changes in this commit:
- Drop RESPONSE_STAGE_BACKSTOP_MS constant and the setTimeout block.
- Trim the multi-paragraph block comments in _handleResponsePaused, the
  helpers, and the dispatch in _handleRequestPaused down to one-liners.
- Update `logs unhandled response errors gracefully` test to mock
  Fetch.getResponseBody (the body-read path is now Fetch-based).
- Make `captures favicon when the server provides one` deterministic by
  adding <link rel="icon"> to the test DOM — the previous version relied
  on Chrome's auto-fetch which has non-deterministic timing on Windows CI.

Net diff vs. backstop approach: -27 lines in network.js, +5 lines in
discovery.test.js. All affected tests pass locally:
- captures favicon when the server provides one (2s)
- does not capture remote files with content-length NAN greater than 25MB (2s)
- captures requests from workers (2s)
- captures redirected resources (3s)
- does not capture event-stream requests (3s)
- logs failed request errors with a debug loglevel (2s)
- logs instrumentation for network errors (2s)
- logs unhandled response errors gracefully (2s)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… path (PPLT-4214)

Drop the dead Content-Length size check from saveResponseResource — the
oversized/malformed header path is now rejected earlier in
_handleResponsePaused via Fetch.failRequest, so this guard never runs.
The body.length check below still covers cached responses whose headers
lie about size.

Add two discovery specs to keep coverage at 100%:

- "logs gracefully when direct font request fails" exercises the catch
  in saveResponseResource by serving the font as application/octet-stream
  (forces makeDirectRequest) and returning 400 on the direct fetch
  (makes makeRequest throw without retry).

- "captures auto-fetched favicon when the page does not declare one"
  validates that Percy captures Chrome's implicit /favicon.ico fetch.
  Bumps networkIdleTimeout to 1000ms so the auto-fetch lands before idle
  declares the network done — Chrome's auto-fetch timing varies by
  platform and was previously flaky on Windows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…LT-4214)

The Network constructor reads page.session.browser.version.userAgent. If
the session crashes mid-construction (close() sets browser to null), the
read throws a TypeError that masks the real "Session crashed!" error.

Add optional chaining so userAgent stays unset on the race; the next CDP
call in Network.watch surfaces the real session-closed error as expected.

Fixes Snapshot > handles page crashes on Linux CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… read (PPLT-4214)

- Add discovery spec that emits a response-stage Fetch.requestPaused for
  a request that was never tracked at the request stage. Exercises the
  defensive null-request branches in _handleResponsePaused (line 236,
  254, 278-282).
- Simplify error?.message ?? '' to error && error.message ? error.message : ''
  in browser.js handleError so all branches are reachable from existing
  launch-failure tests. Behaviorally identical for any realistic input.
- Update install.js comment to point to the browseable index.html for
  Chromium snapshots.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-4214)

- Replace the 500ms polling loop in 'returns the same promise for
  concurrent validation requests to the same hostname' with a
  deterministic Promise that the validation mock resolves on its first
  call. The hardcoded ceiling was too tight for a loaded CI suite and
  the v143 response-stage refactor's per-request overhead pushed the
  test over the edge consistently.
- Attach a no-op catch on the sync-job promise in 'handleSyncJob should
  handle promise reject' so the rejection isn't reported as unhandled
  in the race window between snapshot returning and handleSyncJob
  attaching its own handler. expectAsync still observes the rejection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…se (PPLT-4214)

The previous fix attached .catch() after percy.snapshot returned, but
percy.js:495 creates and assigns the upload promise synchronously inside
the snapshot call, and the upload can reject before the test's next line
runs. Use a setter on the promise property so the no-op catch is attached
the instant percy assigns the promise — eliminating the race window.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…oom (PPLT-4214)

The 'stops accepting snapshots when an in-progress build fails' spec
relied on a 100ms waitForTimeout to keep snapshot-2 in discovery while
snapshot-1's upload failed. The v143 response-stage refactor makes the
suite ~50% slower under CI load, so 100ms is no longer enough — both
uploads were getting through.

Bump to 1000ms. The race window the test exercises is the same; we just
give the failing upload more time to be observed before the second
snapshot leaves discovery.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…PPLT-4214)

Adds a discovery spec that emits a response-stage Fetch.requestPaused with
a malformed Content-Length header for a request that was never tracked at
the request stage. This exercises the if (request) false branch inside
_handleResponsePaused's oversized/malformed handler — line 254 was flagged
as a coverage gap on CI after the response-stage refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… (PPLT-4214)

Root cause: Chrome 143 doesn't fire Network.responseReceived for worker
scripts — only requestWillBeSent and loadingFinished. Percy's
_handleLoadingFinished awaited responseReceived unconditionally, blocking
forever and leaking the request in #requests so idle() never returned.

Fix:
- Cap the responseReceived wait in _handleLoadingFinished at 2s. When
  responseReceived never fires (worker scripts), forget the request
  without saving rather than blocking.
- Simplify _handleResponsePaused: keep response-stage interception only
  for the malformed/oversized Content-Length abort. Stop reading the
  body at response stage — let it be captured later via the v126 path
  (Network.getResponseBody after loadingFinished). The body-read at
  response stage was added in bd44f96 as a workaround for the leak
  above; the proper fix is the timeout.

Test fix: 'logs unhandled response errors gracefully' now mocks
Network.getResponseBody (the actual call site after this change),
not Fetch.getResponseBody.

Verified locally:
- captures requests from workers: 5s (was 90s timeout)
- does not capture remote files with content-length NAN: 4s
- 3 content-length casing variants: pass
- logs instrumentation for network errors / 5xx / empty: pass
- captures redirected resources: pass
- does not capture event-stream requests: pass
- captures stylesheet initiated fonts: pass
- infers mime type when CDP returns text/plain: pass
- captures responsive assets / responsive srcset+mediaquery: pass
- handles page crashes: pass
- captures with valid auth credentials + 3 font-auth variants: pass
- handleSyncJob / in-progress build fails / concurrent validation: pass

Auth test 'does not capture without auth credentials' still slow at
1m34s — that's a fundamental Chrome 143 behavior (load event doesn't
fire after auth cancel), not fixable in Percy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI flagged no-multiple-empty-lines at discovery.test.js:3817 — leftover
from removing the failing browser.js coverage test attempt earlier.
Verified yarn lint clean locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Manoj-Katta and others added 10 commits May 8, 2026 00:31
…rsions (PPLT-4214)

Pass a real Error from handleExitClose so the rejection has one shape on
every code path; switch the failing-launch test to --version, which makes
Chrome exit deterministically without depending on its arg-parsing quirks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…LT-4214)

Lift the 2s cap to a named constant, clear the timer on the
responseReceived-wins path, and log a debug line when the cap fires so
a regression of the v143 worker-script hang would surface in CI rather
than silently dropping the resource. Asserts the log in the existing
worker test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…st (PPLT-4214)

Chrome 143 with --version exits cleanly on Linux/macOS but keeps running on
Windows. Use --remote-debugging-port=null on Windows (still exits there, per
original v126 behavior) and --version elsewhere so each runner exercises the
handleExitClose path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-4214)

Chrome 143+ fetches dedicated worker scripts in the browser process
(PlzDedicatedWorker) and never surfaces the response on any CDP session,
so the bytes can't be retrieved via Network.getResponseBody. For
enableJavaScript:true snapshots the cloud renderer needs the script to
construct the worker. Fall back to a direct HTTP fetch (same pattern
makeDirectRequest already uses for fonts) when the responseReceived
timeout fires for a Script-type request, and assert worker.js is
captured in the existing worker test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…PPLT-4214)

Two issues left worker.js missing from the snapshot resource list on
Linux + Windows CI even after the direct-HTTP fallback in 9adeecd:

1. Chrome 143 PlzDedicatedWorker reports the worker script's
   `resourceType` as 'Other', not 'Script'. The strict
   `request.type === 'Script'` gate skipped the fallback. Replace it
   with an `allowedHostnames` filter so the gate matches what the
   snapshot would have captured anyway, regardless of how Chrome
   classifies the request type.

2. `Network.loadingFinished` for the worker script fires on the
   *worker* CDP session in v143. `Network.getCookies` on the worker
   session throws "Internal error", which made `makeDirectRequest`
   throw silently inside the fallback. Wrap the cookies call in
   try/catch and proceed with an empty cookies array. Page-session
   callers (fonts) are unaffected.

Also derive the resource mimetype from the URL via `mime.lookup()`
instead of hardcoding `application/javascript`, so any timed-out
allowed-hostname resource is saved with the correct mimetype.

Verified locally: focused worker test passes in 5s. All discovery
and network specs pass on macOS Node 25 (the 27 unrelated failures
in the local full run are Node 25-only mock issues in
install/doctor tests; CI runs Node 14 where they pass).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Resource (PPLT-4214)

The `if (!response)` block was dead code: saveResponseResource is only
called from _handleLoadingFinished after the `if (!request.response)`
early-return guards against a null response, so the inner check could
never fire. The pre-existing `/* istanbul ignore if */` was masking the
unreachability; deleting it surfaces the cleaner control flow.

CI flagged this line as the only remaining coverage gap on Linux after
the worker-test fix in 37f5176.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ResponseResource (PPLT-4214)"

This reverts commit ec13879.
…ations (PPLT-4214)

Drops the two `/* istanbul ignore next: race with abort/close */` comments
in _handleResponsePaused (Fetch.failRequest catch) and _continueResponse
(Fetch.continueResponse catch), and backs each branch with a real test.

New specs under "Discovery / with resource errors":
- logs gracefully when Fetch.failRequest fails during malformed-CL abort
- silently swallows Fetch.continueResponse benign races (ABORTED_MESSAGE)
- logs gracefully when Fetch.continueResponse fails with unexpected error
- logs gracefully when direct worker-script fetch fails

Together with the existing untracked-request specs, every branch in the
two catch blocks plus the captureScriptDirectly catch is now exercised
without `istanbul ignore`, addressing the post-merge coverage gap on
network.js (previously L317, L755).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…PPLT-4214)

The companion to c20c581's network.js cleanup, "logs gracefully when
direct worker-script fetch fails", passed locally on macOS Node 25 but
failed on CI Linux Node 14 because its worker scaffolding (real
`new Worker()` + `waitForSelector: '.done'`) interacted badly with
PlzDedicatedWorker when /worker.js had to return 200 on the first hit
and 400 on the second. The worker initialization never completed,
captureScriptDirectly was never reached, the expected debug log never
appeared.

The other three new specs from c20c581 (failRequest unexpected error,
continueResponse ABORTED_MESSAGE silent return, continueResponse
unexpected error) all pass on CI Linux and are kept. Network.js:755
returns to "uncovered" status; a deterministic test for that catch
(no JS execution, no real worker) will land in a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@bhokaremoin bhokaremoin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed end-to-end — the diagnostics, Chromium-source citations, and per-feature rationale in this PR are well above the bar, and the dual-stage Fetch interception + worker-script direct-fetch fallback are well-considered. Twelve findings inline:

Blocking

  • C1 — CI is red. network.js:753 (the captureScriptDirectly catch-block log.debug) is the only uncovered line in src/ after 2d48f20 dropped its test. The repo enforces 100% global coverage. Either re-add the test (stubbing makeDirectRequest) or /* istanbul ignore next */ the catch consistent with this file's pattern.

Worth addressing in this PR

  • C2 — Add an inline comment in _handleResponsePaused explaining when #requests.get(requestId) returns undefined (SW-fulfilled? redirect race?). The behavior is correct; the rationale is invisible.
  • C3 — URL normalization mismatch in _handleResponsePaused: tracked path uses originURL (normalized), untracked falls back to raw event.request.url. Telemetry will see two different URLs for the same resource.
  • C4 — Extract the 6-feature --disable-features= list into a named DISABLED_FEATURES array with one comment per flag noting the Chrome version that introduced the need. Will pay off massively at the next Chrome upgrade.
  • C6 — Move the favicon "band-aid" follow-up commitment from the PR body into a TODO(PPLT-4214 follow-up): comment in server.js so the cleanup is discoverable.
  • C8Fetch.failRequest catch logs at debug — invisible at default loglevel. A stuck-snapshot production incident would have no signal. Log non-race errors at warn.
  • C11 — Confirm darwin: 1536380 vs darwinArm: 1536376 weren't swapped; the asymmetry is plausible but worth a quick double-check against the snapshot index.
  • C12 — Confirm that silently skipping worker scripts from non-allowedHostnames hosts is intended (default-config users will lose CDN worker captures after this upgrade vs. v126). If yes, add a distinct debug log so the skip is observable.

Polish / defer to follow-up

  • C5 — 100ms → 1000ms waitForTimeout bump in percy.test.js papers over a timing race rather than fixing it. The firstCallResolver pattern used elsewhere in this PR would be deterministic.
  • C7parseInt(rawValue)parseInt(rawValue, 10) to match repo convention.
  • C9 — One-line expansion of the RESPONSE_RECEIVED_TIMEOUT comment documenting the N × 2s cumulative-timeout worst case.
  • C10 — Mirror the cookie-less direct-fetch caveat into user-facing release notes so customers running auth-gated dedicated workers aren't surprised by diffs.

Overall: solid work. Once C1 is resolved and the C2–C4/C6/C8/C11/C12 items are addressed (or explicitly deferred with rationale), this is ready to merge.

Comment thread packages/core/src/network.js
Comment thread packages/core/src/network.js
Comment thread packages/core/src/network.js Outdated
Comment thread packages/core/src/browser.js Outdated
Comment thread packages/core/test/percy.test.js
Comment thread packages/core/src/network.js Outdated
Comment thread packages/core/src/network.js
Comment thread packages/core/src/network.js
Comment thread packages/core/src/install.js
Comment thread packages/core/src/network.js Outdated
Manoj-Katta and others added 6 commits May 13, 2026 12:28
…rectly (PPLT-4214)

Bundle of ce:review + PR review fixes for the v143 PlzDedicatedWorker direct-fetch
fallback path in network.js. Closes the L755 coverage gap that's been blocking CI
since the unstable worker-based Test D was reverted in 2d48f20.

Renames:
- captureScriptDirectly → captureResourceDirectly. The function captures any
  allowlisted resource that fell through to direct fetch, not just worker scripts;
  the old name was misleading. Module-private; no external callers affected.

Must-fix (P1/P2 production-risk items):
- Cookies: read from network.page.session (full Network domain) instead of the
  triggering session, since worker sessions throw "Internal error" on
  Network.getCookies. Defensive try/catch retained.
- DIRECT_FETCH_TIMEOUT (5s) caps captureResourceDirectly via Promise.race;
  prevents idle() from hanging the full networkIdleWaitTimeout (~30s) when a
  worker host accepts TCP and stalls. (P1 #1)
- makeDirectRequest now returns { body, status }; captureResourceDirectly enforces
  the 25MB MAX_RESOURCE_SIZE guard before saveResource and records the real HTTP
  status instead of hardcoded 200. Font path call site updated to destructure. (P1 #2)
- Direct-fetch gate at _handleLoadingFinished mirrors saveResponseResource's
  disallowedHostnames-then-allowedHostnames precedence; emits a debug log when
  fallback is skipped due to hostname gating. (P2 #10 + C12)
- Authorization header in makeDirectRequest now requires target origin to match
  the page's snapshot origin; prevents Basic-auth credential leak to redirected
  third-party origins. (P2 #11)

Reviewer polish:
- _handleResponsePaused malformed/oversized branch: Fetch.failRequest runs BEFORE
  _forgetRequest so Chrome's Fetch state can't leak paused if failRequest throws.
  Unknown errors trigger a last-resort Fetch.continueResponse to un-pause. Known
  races (ABORTED_MESSAGE / Invalid InterceptionId) remain silent. (P2 #4)
- Unknown errors in _handleResponsePaused failRequest catch and _continueResponse
  catch now log at warn (was debug) for production observability. (P2 #5 + C8)
- _handleResponsePaused: inline comment explaining when the untracked-request
  branch fires (service-worker-fulfilled responses or cleanup races). (C2)
- _handleResponsePaused: url normalization consistent between tracked and
  untracked paths via normalizeURL on the untracked fallback. (C3)
- parseInt now uses explicit radix 10 at both call sites. (C7)
- RESPONSE_RECEIVED_TIMEOUT comment notes the cumulative N*2s worst case. (C9)

Test:
- New deterministic spec "logs gracefully when the direct-fetch fallback fails"
  exercises captureResourceDirectly's catch path by dropping Network.responseReceived
  for a CSS asset (no JS execution, no real worker). Closes the L755 coverage gap.
  (C1 / P1 #3)

Intentionally deferred (reviewer comments replied separately):
- P2 #15 — no-response branch flip-flop predates this PR (commit ae1d388,
  2022-09-21); out of scope.
- P2 #6 — sec-ch-ua hardcoded version is one of several stale literals in the
  makeDirectRequest header block; deferring full audit.
- C4 + S4 — DISABLED_FEATURES extraction; existing block-level comment adequate.
- C5 — percy.test.js timing fix; reviewer pre-approved deferring.
- S2 / S3 — example / reference already present in existing comments.
- Favicon Task A — pending separate investigation of snapshot.test.js timing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related CI fixes for the previous commit:

1. Revert log.warn -> log.debug in _handleResponsePaused's failRequest catch
   and _continueResponse's catch.

   The promotion was too aggressive: it treated Session closed / Target closed
   (lifecycle teardown races) as unknown errors and fired a warn line, breaking
   exact-stderr-equality assertions in tests like
   snapshot.test.js's `handles the page closing early` (which intentionally
   tears down the browser mid-fetch) and producing noise on routine
   percy.stop() teardown for customers.

   The other parts of the earlier reorder commit are kept:
   - Fetch.failRequest BEFORE _forgetRequest (lifecycle correctness)
   - Last-resort Fetch.continueResponse on unknown failRequest errors
   - Distinction between known-race silent return vs unknown-error log

   Production observability for stuck-snapshot incidents will be addressed
   in a follow-up via instrumentation telemetry rather than user-facing logs.

2. Convert the targetDOM template literal in `logs gracefully when the
   direct-fetch fallback fails` to a single-quoted string. The template
   literal had no interpolation and tripped ESLint's quotes rule
   (single-quote preference).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…T-4214)

Covers the two integration-testable uncovered lines in `network.js`:
- `Network.getCookies` failure catch in `makeDirectRequest`
- 25 MB direct-fetch body-size guard in `captureResourceDirectly`

The third uncovered branch (the hostname-not-allowed skip in
`_handleLoadingFinished`) only fires for PlzDedicatedWorker requests
whose worker-script fetch bypasses `Fetch.requestPaused`. Cross-origin
assets loaded via the document session still flow through
`sendResponseResource`, so the integration test harness can't reliably
exercise that branch. Annotated with `/* istanbul ignore else */` plus
a justification comment so the gate stays at 100%.
…-4214)

Extracts two pure helpers from `makeDirectRequest` so the previously
uncovered branches become unit-testable without `/* istanbul ignore */`
annotations:

- `pickCookieSession(network, session)` — picks the page's full Network
  domain when set, falls back to the request's own session otherwise.
- `shouldAttachAuth(authorization, requestUrl, snapshotUrl)` — re-enforces
  the same-origin rule for Node-side Basic auth (cross-origin or malformed
  URLs return false defensively).

`makeDirectRequest` now reads as straight-line composition: pick session →
fetch cookies → assemble headers → `if (shouldAttachAuth(...))` attach.

Tests:
- 7 unit tests in `test/unit/network.test.js` cover every branch of both
  helpers (page-set, page-undefined, page-no-session, no-auth, empty-auth,
  same-origin, cross-origin, malformed URLs).
- 1 integration test in `test/discovery.test.js` covers the mime-lookup
  fallback to `application/javascript` for direct-fetch URLs with no
  recognizable extension.

Behavior is byte-identical to the previous inline implementation.
…LT-4214)

Same pattern as pickCookieSession / shouldAttachAuth: lifts the
Promise.race + setTimeout dance out of captureResourceDirectly into
an exported pure helper so its branches become unit-testable.

The setTimeout callback (only fires on a 5s+ direct-fetch stall) was
the last remaining function-coverage gap in network.js — no realistic
integration test stalls a fetch long enough to trip it. With the
helper unit-tested via a 10ms timeout, the gap closes without an
istanbul-ignore marker.

Side benefit: the outer try/finally in captureResourceDirectly no
longer needs to clear the timer (the helper does it internally),
which slightly simplifies the function.

Tests: 3 unit tests in test/unit/network.test.js cover all 3 outcomes
of raceWithTimeout (resolves-before-timeout, rejects-on-timeout,
inner-rejection-propagates).
Copy link
Copy Markdown
Contributor

@amandeepsingh333 amandeepsingh333 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent write-up — the diagnostic story for PlzDedicatedWorker and the Chromium-source citations are the kind of context that makes this reviewable at all. The layered fix (response-stage interception for malformed CL + 2s responseReceived timeout + direct-fetch fallback with origin-scoped auth) is the right shape.

Non-test concerns, ordered by severity:

Should-fix (correctness/operational):

  1. install.jsdarwinArm: '1536376' is identical to win64. Likely a copy-paste error; ARM macOS should have its own snapshot.
  2. network.jssec-ch-ua-platform: '"macOS"' is hardcoded and sent from Linux/Windows runners. Servers doing content negotiation on this header will misbehave. PR body marks this "deferred" but it's a real signal.
  3. network.js#_continueResponse — PR body claims "Production observability via log.warn" but the code uses log.debug. Either fix code or fix the description.
  4. captureResourceDirectly mimetype default application/javascript — assumes worker-scripts. For non-worker direct fetches that hit this path (font fallback was the original caller), an extensionless URL silently gets mistyped as JS.

Nice-to-have (clarity / defensive):
5. browser.js comma-bundled --disable-features list mixes process-isolation (security boundary) with HTTPS-first and LNA. Worth a comment per disabled feature explaining why it's safe in headless asset discovery and not safe to copy to any non-headless code path.
6. shouldAttachAuth defensively return false when meta?.snapshotURL is undefined — confirm meta.snapshotURL is guaranteed populated in all entry paths; otherwise existing customers using network.authorization with the font fallback may silently lose auth.
7. RESPONSE_RECEIVED_TIMEOUT cumulative N*2s — comment acknowledges it but no hard cap. A pathological page with many worker fetches that all timeout could compound. Worth a per-snapshot ceiling.

I deliberately skipped test files per your request. Notes inline.

Comment thread packages/core/src/install.js
Comment thread packages/core/src/network.js
Comment thread packages/core/src/network.js Outdated
Comment thread packages/core/src/network.js
Comment thread packages/core/src/network.js
Comment thread packages/core/src/browser.js Outdated
Comment thread packages/core/src/network.js
Comment thread packages/core/src/network.js
Comment thread packages/core/src/network.js
…back (PPLT-4214)

Previously, captureResourceDirectly derived mimetype from URL extension
alone, falling back to 'application/javascript' for extensionless paths.
That default is the worst wrong guess — extensionless REST endpoints
(/api/data, worker registries) often serve JSON/HTML/binary, and
treating those as JS produces confusing renderer parse errors.

New precedence in captureResourceDirectly:
  1. HTTP Content-Type response header from the direct fetch
  2. mime.lookup() on the URL extension
  3. application/octet-stream (safe binary default, never JS)

makeDirectRequest's makeRequest callback now also returns `headers` so
captureResourceDirectly can read the server's Content-Type. The font
path in saveResponseResource continues to destructure { body } only and
is unaffected.

Test: the unknown-extension integration spec now asserts the server's
declared 'application/octet-stream' is preserved (was previously
asserting the JS fallback that this commit removes).
…lpers (PPLT-4214)

Two minimal changes:

1. browser.js — pull the inline `--disable-features=` string into a
   `DISABLED_FEATURES` array with a per-entry comment for each flag.
   Typos in feature names are now diff-visible; security-boundary flags
   (IsolateOrigins, site-per-process) are explicitly tagged as
   headless-only. No behavior change.

2. network.js — extract the direct-fetch mimetype resolution into an
   exported `resolveDirectFetchMime(responseHeaders, urlForLookup)`
   helper. Same playbook as pickCookieSession / shouldAttachAuth /
   raceWithTimeout: pure function, all branches unit-testable without
   integration test gymnastics. Closes the last `network.js` branch
   coverage gap (line 822's optional-chain F-branch).

3. network.js — RESPONSE_RECEIVED_TIMEOUT comment updated to mention
   PERCY_NETWORK_IDLE_WAIT_TIMEOUT as the cumulative safety valve.

Tests: 3 unit tests in test/unit/network.test.js cover all branches of
resolveDirectFetchMime (server-MIME bare, server-MIME with charset
parameter, URL-extension fallback, application/octet-stream fallback,
empty content-type).
Comment thread packages/core/src/browser.js
@amandeepsingh333
Copy link
Copy Markdown
Contributor

Have we tested locally?

Copy link
Copy Markdown
Contributor

@bhokaremoin bhokaremoin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving — all 12 findings from my original review are addressed (9 fixed in code, 2 defended with reasoning captured in-thread, 1 explicit deferral per pre-approval). The second reviewer's pass added 4 more genuine improvements (mimetype fallback, DISABLED_FEATURES extraction, PR-body/code mismatch, ARM macOS revision verified). All 26 review threads are resolved. CI is green across all 44 checks including the previously-red Test @percy/core coverage gate.

The layered v143-compat work (dual-stage Fetch interception, RESPONSE_RECEIVED_TIMEOUT + DIRECT_FETCH_TIMEOUT worker-script fallback, shouldAttachAuth same-origin guard, pickCookieSession page-session preference) is well-considered and the diagnostic write-ups in the PR body are exemplary.

Remaining deferred work is properly externalized:

  • PPLT-4215sec-ch-ua-platform, sec-fetch-dest, optional-chain UA fallback (font-asset capture regression follow-up)
  • Stuck-snapshot observability → structured instrumentation telemetry follow-up
  • CHANGELOG entry for cookie-less worker-script fallback → release time
  • Per-instance cumulative-timeout cap → telemetry-driven, only if a customer hits it

None of those are correctness regressions vs. v126. Ship it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants