fix(ci): Fix E2E test flakiness #5830

Draft
antonis wants to merge 3 commits into main from antonis/fix-e2e-flakiness-combined

@antonis antonis commented Mar 17, 2026

📢 Type of change

  • Bugfix

📜 Description

Fixes iOS E2E flakiness.

Simulator configuration

  • wait_for_boot: true / erase_before_boot: false
  • MAESTRO_DRIVER_STARTUP_TIMEOUT: 180000 (3 min, up from 90–120s)
  • Settings.app warm-up step before tests start
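
The simulator settings above could be approximated from a Node script roughly as follows. This is only a sketch under stated assumptions: the PR configures wait_for_boot and erase_before_boot in CI rather than in code, so `xcrun simctl bootstatus` is used here as a stand-in for waiting on boot, and launching `com.apple.Preferences` (the Settings bundle identifier) is an assumed way to do the warm-up. The function names `maestroEnv`, `simulatorWarmUpCommands`, and `warmUpSimulator` are hypothetical.

```javascript
const { execFileSync } = require('node:child_process');

// Environment for the Maestro process. The timeout is in milliseconds
// (180000 ms = 3 min), matching the value listed above.
function maestroEnv(baseEnv = process.env) {
  return { ...baseEnv, MAESTRO_DRIVER_STARTUP_TIMEOUT: String(180000) };
}

// Commands expressed as [file, args] pairs so no shell is involved.
// Assumption: the PR's actual warm-up step may differ from these commands.
function simulatorWarmUpCommands(device = 'booted') {
  return [
    // Block until the simulator reports that boot has finished.
    ['xcrun', ['simctl', 'bootstatus', device]],
    // Open Settings.app once so post-boot initialisation can settle.
    ['xcrun', ['simctl', 'launch', device, 'com.apple.Preferences']],
  ];
}

// Runs the warm-up commands in order. `exec` is injectable for testing.
function warmUpSimulator({ exec = execFileSync, device = 'booted' } = {}) {
  for (const [file, args] of simulatorWarmUpCommands(device)) {
    exec(file, args, { stdio: 'inherit' });
  }
}
```

Keeping the command list separate from the code that executes it makes the sequence easy to inspect or log before anything runs on the VM.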

e2e-v2 test runner (cli.mjs)

  • Per-flow process isolation: running "maestro test maestro" executes every flow in a single shared session, so when crash.yml kills the app, the flows that follow it fail. Running each flow in its own process prevents this cascade.
  • Per-flow retries (up to 3 attempts) for transient timing failures
  • execSync → execFileSync to avoid shell interpolation
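
A minimal sketch of the runner changes listed above, assuming each flow is a separate YAML file that can be passed to `maestro test`. The names `runMaestroFlow` and `runFlowsIsolated` are hypothetical and are not taken from cli.mjs; the real implementation may differ.

```javascript
const { execFileSync } = require('node:child_process');

// Default runner: one `maestro test <flow>` process per flow file.
// execFileSync takes arguments as an array, so no shell parses the path.
function runMaestroFlow(flowPath) {
  execFileSync('maestro', ['test', flowPath], { stdio: 'inherit' });
}

// Runs every flow in its own process, retrying each one up to maxAttempts
// times. Returns the list of flows that failed on all attempts.
function runFlowsIsolated(flowPaths, { runFlow = runMaestroFlow, maxAttempts = 3 } = {}) {
  const failed = [];
  for (const flowPath of flowPaths) {
    let passed = false;
    for (let attempt = 1; attempt <= maxAttempts && !passed; attempt++) {
      try {
        runFlow(flowPath); // throws if the process exits with a non-zero code
        passed = true;
      } catch (error) {
        console.warn(`Flow ${flowPath} failed (attempt ${attempt}/${maxAttempts}): ${error.message}`);
      }
    }
    if (!passed) failed.push(flowPath);
  }
  return failed;
}
```

Because a crashing flow only takes down its own process, a failure in one flow cannot poison the session used by the next. A caller would typically exit with a non-zero status when the returned list is non-empty.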

crash.yml

  • Removed post-crash launchApp + assertTestReady — unreliable on Tart VMs and unnecessary with per-flow process isolation

Sample application test fixes

  • Search all envelopes for app start transaction (may arrive in a separate envelope on slow VMs)
  • Sort envelopes by timestamp for consistent ordering
  • Filter TTID/TTFD assertion to op === 'navigation' only (deny-list missed types like navigation.processing)
  • Relax HTTP spans assertion to >= 1 (native layer span may not complete on slow VMs)
  • Per-flow retries in maestro.ts (up to 3 attempts)
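
The envelope handling described above might look roughly like the sketch below. It assumes each captured envelope is a tuple of `[envelopeHeader, items]`, where every item is `[itemHeader, payload]`, and that a transaction's operation lives at `contexts.trace.op`. The helper names are hypothetical, and the sample application's real test utilities may be shaped differently.

```javascript
// Extracts transaction payloads from one envelope.
// Assumed shape: [envelopeHeader, [[itemHeader, payload], ...]].
function transactionsOf(envelope) {
  const [, items] = envelope;
  return items
    .filter(([itemHeader]) => itemHeader.type === 'transaction')
    .map(([, payload]) => payload);
}

// Returns a copy sorted by the first transaction's timestamp, so assertions
// do not depend on arrival order. Envelopes without a transaction sort last.
function sortEnvelopesByTimestamp(envelopes) {
  const key = (envelope) => transactionsOf(envelope)[0]?.timestamp ?? Number.MAX_SAFE_INTEGER;
  return [...envelopes].sort((a, b) => key(a) - key(b));
}

// Searches every envelope, not just the first, because a transaction may
// arrive in a separate envelope on a slow VM.
function findTransaction(envelopes, predicate) {
  for (const envelope of envelopes) {
    const match = transactionsOf(envelope).find(predicate);
    if (match) return match;
  }
  return undefined;
}

// Allow-list check: only exact 'navigation' ops qualify, so variants such
// as 'navigation.processing' are excluded without listing them one by one.
function isNavigationTransaction(transaction) {
  return transaction.contexts?.trace?.op === 'navigation';
}
```

An allow-list is the safer choice here because a new operation type is excluded by default, whereas a deny-list has to be updated every time one appears.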

#skip-changelog

💡 Motivation and Context

iOS E2E tests have been failing on every main commit:

  • e2e-v2: flows fail with "App crashed or stopped" due to crash cascade and slow simulator boot
  • sample-application: assertion failures from envelope ordering, missing transactions, and incomplete spans

💚 How did you test it?

CI

📝 Checklist

  • I added tests to verify changes
  • No new PII added or SDK only sends newly added PII if sendDefaultPII is enabled
  • All tests passing
  • No breaking changes

🔮 Next steps

  • Monitor retry frequency post-merge

github-actions bot commented Mar 17, 2026

Semver Impact of This PR

None (no version bump detected)

📋 Changelog Preview

This is how your changes will appear in the changelog.
Entries from this PR are highlighted with a left border (blockquote style).


This PR will not appear in the changelog.


🤖 This preview updates automatically when you update the PR.

github-actions bot commented Mar 17, 2026

Android (legacy) Performance metrics 🚀

Metric        Plain      With Sentry  Diff
Startup time  408.35 ms  411.92 ms    3.57 ms
Size          43.75 MiB  48.08 MiB    4.32 MiB

Baseline results on branch: main

Startup times

Revision Plain With Sentry Diff
4a17c8f+dirty 406.62 ms 400.58 ms -6.04 ms
df1f7df+dirty 442.64 ms 427.16 ms -15.48 ms
a483f9f+dirty 396.82 ms 453.28 ms 56.46 ms
60cd796+dirty 445.84 ms 492.45 ms 46.61 ms
5c16cdc+dirty 423.48 ms 452.35 ms 28.88 ms
80e4616+dirty 411.58 ms 462.12 ms 50.54 ms
55b77fc+dirty 411.87 ms 417.16 ms 5.29 ms
bca62c0+dirty 414.36 ms 451.06 ms 36.70 ms
0b64753+dirty 448.67 ms 474.61 ms 25.94 ms
4e6d7d7+dirty 480.73 ms 515.73 ms 35.00 ms

App size

Revision Plain With Sentry Diff
4a17c8f+dirty 43.75 MiB 47.99 MiB 4.24 MiB
df1f7df+dirty 43.75 MiB 48.08 MiB 4.33 MiB
a483f9f+dirty 43.75 MiB 48.41 MiB 4.66 MiB
60cd796+dirty 43.75 MiB 48.07 MiB 4.32 MiB
5c16cdc+dirty 17.75 MiB 19.68 MiB 1.94 MiB
80e4616+dirty 43.75 MiB 48.55 MiB 4.80 MiB
55b77fc+dirty 43.75 MiB 47.99 MiB 4.24 MiB
bca62c0+dirty 43.75 MiB 48.41 MiB 4.66 MiB
0b64753+dirty 17.75 MiB 19.70 MiB 1.95 MiB
4e6d7d7+dirty 43.75 MiB 48.40 MiB 4.64 MiB

Previous results on branch: antonis/fix-e2e-flakiness-combined

Startup times

Revision Plain With Sentry Diff
2cf0553+dirty 448.74 ms 483.10 ms 34.36 ms
470fa9f+dirty 389.60 ms 430.42 ms 40.82 ms
b6f917a+dirty 468.08 ms 529.56 ms 61.48 ms
4e6aa78+dirty 494.12 ms 525.73 ms 31.61 ms
d34c279+dirty 440.10 ms 477.96 ms 37.86 ms
ca6a32c+dirty 396.66 ms 442.52 ms 45.86 ms
a6bb98b+dirty 453.56 ms 492.82 ms 39.26 ms
c96c5b7+dirty 446.10 ms 449.47 ms 3.37 ms
ab6855b+dirty 459.24 ms 476.94 ms 17.70 ms
7e6fe7f+dirty 425.16 ms 471.08 ms 45.92 ms

App size

Revision Plain With Sentry Diff
2cf0553+dirty 43.75 MiB 48.08 MiB 4.32 MiB
470fa9f+dirty 43.75 MiB 48.08 MiB 4.32 MiB
b6f917a+dirty 43.75 MiB 48.07 MiB 4.32 MiB
4e6aa78+dirty 43.75 MiB 48.08 MiB 4.32 MiB
d34c279+dirty 43.75 MiB 48.32 MiB 4.57 MiB
ca6a32c+dirty 43.75 MiB 48.08 MiB 4.32 MiB
a6bb98b+dirty 43.75 MiB 48.08 MiB 4.32 MiB
c96c5b7+dirty 43.75 MiB 48.08 MiB 4.32 MiB
ab6855b+dirty 43.75 MiB 48.08 MiB 4.32 MiB
7e6fe7f+dirty 43.75 MiB 48.08 MiB 4.32 MiB

github-actions bot commented Mar 17, 2026

iOS (legacy) Performance metrics 🚀

Metric        Plain       With Sentry  Diff
Startup time  1224.46 ms  1218.07 ms   -6.39 ms
Size          3.38 MiB    4.73 MiB     1.35 MiB

Baseline results on branch: main

Startup times

Revision Plain With Sentry Diff
ea3e26e+dirty 1229.13 ms 1228.46 ms -0.67 ms
80e4616+dirty 1221.32 ms 1225.64 ms 4.32 ms
818a608+dirty 1205.76 ms 1208.00 ms 2.24 ms
77061ed+dirty 1233.16 ms 1234.88 ms 1.71 ms
bef3709+dirty 1222.07 ms 1220.24 ms -1.83 ms
a206511+dirty 1185.00 ms 1186.35 ms 1.35 ms
74979ac+dirty 1210.49 ms 1213.31 ms 2.82 ms
a2bb688+dirty 1223.53 ms 1232.90 ms 9.37 ms
8a868fe+dirty 1221.50 ms 1230.78 ms 9.28 ms
d590428+dirty 1211.77 ms 1220.51 ms 8.75 ms

App size

Revision Plain With Sentry Diff
ea3e26e+dirty 3.41 MiB 4.58 MiB 1.17 MiB
80e4616+dirty 3.38 MiB 4.60 MiB 1.22 MiB
818a608+dirty 2.63 MiB 3.91 MiB 1.28 MiB
77061ed+dirty 2.63 MiB 3.98 MiB 1.34 MiB
bef3709+dirty 3.38 MiB 4.78 MiB 1.40 MiB
a206511+dirty 3.41 MiB 4.67 MiB 1.25 MiB
74979ac+dirty 3.38 MiB 4.60 MiB 1.22 MiB
a2bb688+dirty 2.63 MiB 3.99 MiB 1.36 MiB
8a868fe+dirty 3.38 MiB 4.60 MiB 1.22 MiB
d590428+dirty 3.38 MiB 4.78 MiB 1.39 MiB

Previous results on branch: antonis/fix-e2e-flakiness-combined

Startup times

Revision Plain With Sentry Diff
c96c5b7+dirty 1194.79 ms 1193.75 ms -1.04 ms
b6f917a+dirty 1230.39 ms 1223.63 ms -6.76 ms
7e6fe7f+dirty 1194.50 ms 1192.54 ms -1.96 ms
2cf0553+dirty 1204.83 ms 1209.55 ms 4.72 ms
4e6aa78+dirty 1196.21 ms 1199.65 ms 3.43 ms
470fa9f+dirty 1216.77 ms 1218.02 ms 1.26 ms
d34c279+dirty 1226.58 ms 1225.89 ms -0.69 ms
ca6a32c+dirty 1229.87 ms 1228.40 ms -1.47 ms
ab6855b+dirty 1220.92 ms 1221.73 ms 0.81 ms
a6bb98b+dirty 1229.87 ms 1227.51 ms -2.36 ms

App size

Revision Plain With Sentry Diff
c96c5b7+dirty 3.38 MiB 4.73 MiB 1.35 MiB
b6f917a+dirty 3.38 MiB 4.72 MiB 1.34 MiB
7e6fe7f+dirty 3.38 MiB 4.73 MiB 1.35 MiB
2cf0553+dirty 3.38 MiB 4.73 MiB 1.35 MiB
4e6aa78+dirty 3.38 MiB 4.73 MiB 1.35 MiB
470fa9f+dirty 3.38 MiB 4.73 MiB 1.35 MiB
d34c279+dirty 3.38 MiB 4.72 MiB 1.34 MiB
ca6a32c+dirty 3.38 MiB 4.73 MiB 1.35 MiB
ab6855b+dirty 3.38 MiB 4.73 MiB 1.35 MiB
a6bb98b+dirty 3.38 MiB 4.73 MiB 1.35 MiB

github-actions bot commented Mar 17, 2026

iOS (new) Performance metrics 🚀

Metric        Plain       With Sentry  Diff
Startup time  1237.26 ms  1217.72 ms   -19.54 ms
Size          3.38 MiB    4.73 MiB     1.35 MiB

Baseline results on branch: main

Startup times

Revision Plain With Sentry Diff
ea3e26e+dirty 1216.61 ms 1214.15 ms -2.47 ms
80e4616+dirty 1206.90 ms 1205.94 ms -0.96 ms
818a608+dirty 1218.84 ms 1223.18 ms 4.34 ms
77061ed+dirty 1210.77 ms 1218.45 ms 7.68 ms
bef3709+dirty 1217.79 ms 1225.33 ms 7.54 ms
a206511+dirty 1225.02 ms 1223.74 ms -1.28 ms
74979ac+dirty 1212.33 ms 1212.54 ms 0.21 ms
a2bb688+dirty 1244.82 ms 1238.60 ms -6.22 ms
8a868fe+dirty 1206.85 ms 1215.04 ms 8.19 ms
d590428+dirty 1221.23 ms 1225.27 ms 4.03 ms

App size

Revision Plain With Sentry Diff
ea3e26e+dirty 3.41 MiB 4.58 MiB 1.17 MiB
80e4616+dirty 3.38 MiB 4.60 MiB 1.22 MiB
818a608+dirty 3.19 MiB 4.48 MiB 1.29 MiB
77061ed+dirty 3.19 MiB 4.54 MiB 1.36 MiB
bef3709+dirty 3.38 MiB 4.78 MiB 1.40 MiB
a206511+dirty 3.41 MiB 4.67 MiB 1.25 MiB
74979ac+dirty 3.38 MiB 4.60 MiB 1.22 MiB
a2bb688+dirty 3.19 MiB 4.56 MiB 1.37 MiB
8a868fe+dirty 3.38 MiB 4.60 MiB 1.22 MiB
d590428+dirty 3.38 MiB 4.78 MiB 1.39 MiB

Previous results on branch: antonis/fix-e2e-flakiness-combined

Startup times

Revision Plain With Sentry Diff
c96c5b7+dirty 1223.89 ms 1228.02 ms 4.13 ms
b6f917a+dirty 1212.11 ms 1220.00 ms 7.89 ms
7e6fe7f+dirty 1211.67 ms 1210.47 ms -1.20 ms
2cf0553+dirty 1220.64 ms 1224.72 ms 4.08 ms
4e6aa78+dirty 1226.67 ms 1225.72 ms -0.94 ms
470fa9f+dirty 1220.96 ms 1221.77 ms 0.81 ms
d34c279+dirty 1210.63 ms 1224.85 ms 14.22 ms
ca6a32c+dirty 1231.83 ms 1241.28 ms 9.45 ms
ab6855b+dirty 1214.08 ms 1214.30 ms 0.21 ms
a6bb98b+dirty 1213.77 ms 1220.53 ms 6.76 ms

App size

Revision Plain With Sentry Diff
c96c5b7+dirty 3.38 MiB 4.73 MiB 1.35 MiB
b6f917a+dirty 3.38 MiB 4.72 MiB 1.34 MiB
7e6fe7f+dirty 3.38 MiB 4.73 MiB 1.35 MiB
2cf0553+dirty 3.38 MiB 4.73 MiB 1.35 MiB
4e6aa78+dirty 3.38 MiB 4.73 MiB 1.35 MiB
470fa9f+dirty 3.38 MiB 4.73 MiB 1.35 MiB
d34c279+dirty 3.38 MiB 4.72 MiB 1.34 MiB
ca6a32c+dirty 3.38 MiB 4.73 MiB 1.35 MiB
ab6855b+dirty 3.38 MiB 4.73 MiB 1.35 MiB
a6bb98b+dirty 3.38 MiB 4.73 MiB 1.35 MiB

github-actions bot commented Mar 17, 2026

Android (new) Performance metrics 🚀

Metric        Plain      With Sentry  Diff
Startup time  442.42 ms  480.02 ms    37.60 ms
Size          43.94 MiB  48.93 MiB    5.00 MiB

Baseline results on branch: main

Startup times

Revision Plain With Sentry Diff
70250df+dirty 418.08 ms 480.84 ms 62.76 ms
8d89cc9+dirty 357.69 ms 415.79 ms 58.10 ms
1853710+dirty 360.67 ms 396.28 ms 35.61 ms
55b77fc+dirty 410.46 ms 414.11 ms 3.65 ms
69602ce+dirty 375.37 ms 405.28 ms 29.91 ms
c1573b3+dirty 355.65 ms 448.82 ms 93.17 ms
90afdd3+dirty 367.79 ms 404.84 ms 37.05 ms
955f2eb+dirty 388.13 ms 433.56 ms 45.44 ms
80e4616+dirty 427.31 ms 461.15 ms 33.84 ms
276d348+dirty 356.30 ms 405.27 ms 48.97 ms

App size

Revision Plain With Sentry Diff
70250df+dirty 43.94 MiB 48.91 MiB 4.97 MiB
8d89cc9+dirty 7.15 MiB 8.41 MiB 1.26 MiB
1853710+dirty 7.15 MiB 8.41 MiB 1.26 MiB
55b77fc+dirty 43.94 MiB 48.82 MiB 4.88 MiB
69602ce+dirty 7.15 MiB 8.41 MiB 1.26 MiB
c1573b3+dirty 7.15 MiB 8.42 MiB 1.27 MiB
90afdd3+dirty 7.15 MiB 8.43 MiB 1.28 MiB
955f2eb+dirty 7.15 MiB 8.42 MiB 1.27 MiB
80e4616+dirty 43.94 MiB 49.38 MiB 5.44 MiB
276d348+dirty 7.15 MiB 8.42 MiB 1.26 MiB

Previous results on branch: antonis/fix-e2e-flakiness-combined

Startup times

Revision Plain With Sentry Diff
2cf0553+dirty 368.80 ms 401.65 ms 32.86 ms
470fa9f+dirty 396.50 ms 455.08 ms 58.58 ms
b6f917a+dirty 367.91 ms 412.94 ms 45.03 ms
4e6aa78+dirty 440.60 ms 465.96 ms 25.36 ms
d34c279+dirty 422.73 ms 453.91 ms 31.18 ms
ca6a32c+dirty 413.82 ms 475.83 ms 62.02 ms
a6bb98b+dirty 396.57 ms 419.20 ms 22.63 ms
c96c5b7+dirty 380.02 ms 436.37 ms 56.35 ms
ab6855b+dirty 372.10 ms 406.22 ms 34.12 ms
7e6fe7f+dirty 477.04 ms 520.53 ms 43.49 ms

App size

Revision Plain With Sentry Diff
2cf0553+dirty 43.94 MiB 48.93 MiB 5.00 MiB
470fa9f+dirty 43.94 MiB 48.93 MiB 5.00 MiB
b6f917a+dirty 43.94 MiB 48.93 MiB 4.99 MiB
4e6aa78+dirty 43.94 MiB 48.93 MiB 5.00 MiB
d34c279+dirty 43.94 MiB 49.18 MiB 5.24 MiB
ca6a32c+dirty 43.94 MiB 48.93 MiB 5.00 MiB
a6bb98b+dirty 43.94 MiB 48.93 MiB 5.00 MiB
c96c5b7+dirty 43.94 MiB 48.93 MiB 5.00 MiB
ab6855b+dirty 43.94 MiB 48.93 MiB 5.00 MiB
7e6fe7f+dirty 43.94 MiB 48.93 MiB 5.00 MiB

@antonis antonis changed the title from "fix(ci): Fix E2E test flakiness on Cirrus Labs runners" to "fix(ci): Fix E2E test flakiness" on Mar 24, 2026

@cursor cursor bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

@antonis antonis force-pushed the antonis/fix-e2e-flakiness-combined branch 2 times, most recently from c28307e to ad66727 Compare March 25, 2026 09:27
iOS E2E tests have been failing on every main commit since the migration
to Cirrus Labs Tart VMs (nested virtualisation). The simulator is
significantly slower to stabilise, causing Maestro's XCTest driver to
lose communication with the app.

Simulator configuration:
- wait_for_boot: true — block until simulator fully boots
- erase_before_boot: false — skip redundant erase (each flow uses clearState)
- MAESTRO_DRIVER_STARTUP_TIMEOUT: 180000 (3 min)
- Settings.app warm-up step to let SpringBoard finish post-boot init

e2e-v2 test runner (cli.mjs):
- Run each Maestro flow in its own process to isolate crashes
  (maestro test maestro shares a session — if crash.yml kills the app,
  subsequent flows fail because the XCTest driver loses the connection)
- Per-flow retries (up to 3 attempts) for transient timing failures
- execSync → execFileSync to avoid shell interpolation

crash.yml:
- Removed post-crash relaunch — unreliable on Tart VMs and unnecessary
  since each flow now runs in its own process

Sample application test fixes:
- Search all envelopes for app start transaction (may arrive separately)
- Sort news envelopes by timestamp for consistent ordering
- Exclude auto.app.start from time-to-display assertions
- Per-flow retries in maestro.ts for transient failures

Supersedes #5752 and #5755.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@antonis antonis force-pushed the antonis/fix-e2e-flakiness-combined branch from ad66727 to d6ae495 Compare March 25, 2026 10:11
@antonis antonis removed the ready-to-merge Triggers the full CI test suite label Mar 25, 2026
@github-actions

Fails
🚫 Pull request is not ready for merge, please add the "ready-to-merge" label to the pull request

Generated by 🚫 dangerJS against 61af906
