refactor(12/12): add snapshot test fixtures and benchmarks by cameroncooke · Pull Request #330 · getsentry/XcodeBuildMCP

cameroncooke · 2026-04-08T21:12:09Z

Summary

This is PR 12 of 12, the final PR in the stacked series that decouples the rendering pipeline from MCP transport. Depends on PR 11.

Adds the snapshot test suite and performance benchmarks that validate the entire rendering pipeline end-to-end. These are large in line count but are almost entirely test fixtures (expected output files) and benchmark scripts.

Snapshot test suite (159 files, ~5700 lines)

The snapshot test infrastructure captures the rendered output of tool invocations and compares against expected fixtures. This provides regression protection for the rendering pipeline -- any change to event formatting, diagnostic grouping, or output ordering will be caught by a fixture mismatch.

Test harness (src/snapshot-tests/):

harness.ts: Core test runner that invokes tools with mock executors and captures rendered output
fixture-io.ts: Reads/writes fixture files, handles normalization (timestamps, paths, UUIDs)
flowdeck-fixture-io.ts: Flowdeck-specific fixture handling
normalize.ts: Output normalization for stable comparisons across environments
resource-harness.ts: Resource-specific snapshot testing

Fixtures: Expected output files for each tool covering success, error, and edge case scenarios. These serve as living documentation of what each tool's output looks like.

Benchmarks (~3300 lines)

Performance benchmarks for the rendering pipeline and xcodebuild parsing:

Parser throughput: lines/second for xcodebuild output parsing
Render session performance: events/second for text and JSON strategies
End-to-end tool invocation timing

These benchmarks establish baselines and can be run in CI to catch performance regressions.

Note for reviewers

This PR is large by line count but low in conceptual complexity. The fixture files are auto-generated expected outputs. The benchmark scripts are straightforward timing loops. The meaningful code is the ~500 lines of test harness infrastructure.

Stack navigation

PR 1-11/12: All code and configuration changes
PR 12/12 (this PR): Snapshot tests and benchmarks

Test plan

npx vitest run passes -- snapshot tests match expected fixtures
npx vitest run --config vitest.snapshot.config.ts runs snapshot suite specifically
Benchmarks execute without errors (performance numbers are informational)

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

✅ Fixed: Fixture has un-normalized username causing cross-environment test failures
- Added normalization for USER, ALTERNATE_OWNER, INSTALL_OWNER, and VERSION_INFO_BUILDER build settings to replace usernames with , and updated the fixture file accordingly.
✅ Fixed: Developer team ID committed in fixture file
- Added normalization for DEVELOPMENT_TEAM build setting to replace team ID with , and updated the fixture file to remove the hardcoded team ID.

Or push these changes by commenting:

@cursor push d89a7eafb2

Preview (d89a7eafb2)

diff --git a/src/snapshot-tests/__fixtures__/project-discovery/show-build-settings--success.txt b/src/snapshot-tests/__fixtures__/project-discovery/show-build-settings--success.txt
--- a/src/snapshot-tests/__fixtures__/project-discovery/show-build-settings--success.txt
+++ b/src/snapshot-tests/__fixtures__/project-discovery/show-build-settings--success.txt
@@ -15,7 +15,7 @@
       ALLOW_TARGET_PLATFORM_SPECIALIZATION = NO
       ALTERNATE_GROUP = staff
       ALTERNATE_MODE = u+w,go-w,a+rX
-      ALTERNATE_OWNER = cameroncooke
+      ALTERNATE_OWNER = <redacted>
       ALTERNATIVE_DISTRIBUTION_WEB = NO
       ALWAYS_EMBED_SWIFT_STANDARD_LIBRARIES = NO
       ALWAYS_SEARCH_USER_PATHS = NO
@@ -156,7 +156,7 @@
       DEVELOPER_TOOLS_DIR = /Applications/Xcode-26.4.0.app/Contents/Developer/Tools
       DEVELOPER_USR_DIR = /Applications/Xcode-26.4.0.app/Contents/Developer/usr
       DEVELOPMENT_LANGUAGE = en
-      DEVELOPMENT_TEAM = BR6WD3M6ZD
+      DEVELOPMENT_TEAM = <redacted>
       DIAGNOSE_MISSING_TARGET_DEPENDENCIES = YES
       DIFF = /usr/bin/diff
       DOCUMENTATION_FOLDER_PATH = CalculatorApp.app/en.lproj/Documentation
@@ -293,7 +293,7 @@
       INSTALL_DIR = /tmp/CalculatorApp.dst/Applications
       INSTALL_GROUP = staff
       INSTALL_MODE_FLAG = u+w,go-w,a+rX
-      INSTALL_OWNER = cameroncooke
+      INSTALL_OWNER = <redacted>
       INSTALL_PATH = /Applications
       INSTALL_ROOT = /tmp/CalculatorApp.dst
       IPHONEOS_DEPLOYMENT_TARGET = 17.0
@@ -568,7 +568,7 @@
       UNLOCALIZED_RESOURCES_FOLDER_PATH_SHALLOW_BUNDLE_NO = CalculatorApp.app/Resources
       UNLOCALIZED_RESOURCES_FOLDER_PATH_SHALLOW_BUNDLE_YES = CalculatorApp.app
       UNSTRIPPED_PRODUCT = NO
-      USER = cameroncooke
+      USER = <redacted>
       USER_APPS_DIR = <HOME>/Applications
       USER_LIBRARY_DIR = <HOME>/Library
       USE_DYNAMIC_NO_PIC = YES
@@ -579,7 +579,7 @@
       VALID_ARCHS = arm64 arm64e armv7 armv7s
       VERBOSE_PBXCP = NO
       VERSIONPLIST_PATH = CalculatorApp.app/version.plist
-      VERSION_INFO_BUILDER = cameroncooke
+      VERSION_INFO_BUILDER = <redacted>
       VERSION_INFO_FILE = CalculatorApp_vers.c
       VERSION_INFO_STRING = "@(#)PROGRAM:CalculatorApp  PROJECT:CalculatorApp-1"
       WATCHOS_DEPLOYMENT_TARGET = 26.4

diff --git a/src/snapshot-tests/normalize.ts b/src/snapshot-tests/normalize.ts
--- a/src/snapshot-tests/normalize.ts
+++ b/src/snapshot-tests/normalize.ts
@@ -47,6 +47,9 @@
 const ACQUIRED_USAGE_ASSERTION_TIME_REGEX =
   /(^\s*)\d{2}:\d{2}:\d{2}( {2}Acquired usage assertion\.)$/gm;
 const BUILD_SETTINGS_PATH_REGEX = /^( {6}PATH = ).+$/gm;
+const BUILD_SETTINGS_USER_REGEX =
+  /^( {6}(?:USER|ALTERNATE_OWNER|INSTALL_OWNER|VERSION_INFO_BUILDER) = ).+$/gm;
+const BUILD_SETTINGS_DEVELOPMENT_TEAM_REGEX = /^( {6}DEVELOPMENT_TEAM = ).+$/gm;
 const TRAILING_WHITESPACE_REGEX = /[ \t]+$/gm;
 
 function sortLinesInBlock(text: string, marker: RegExp): string {
@@ -130,6 +133,8 @@
 
   normalized = normalized.replace(TARGET_DEVICE_IDENTIFIER_REGEX, '$1<UUID>');
   normalized = normalized.replace(BUILD_SETTINGS_PATH_REGEX, '$1<PATH>');
+  normalized = normalized.replace(BUILD_SETTINGS_USER_REGEX, '$1<redacted>');
+  normalized = normalized.replace(BUILD_SETTINGS_DEVELOPMENT_TEAM_REGEX, '$1<redacted>');
   normalized = normalized.replace(CODEX_ARG0_PATH_REGEX, '<HOME>/.codex/tmp/arg0/codex-arg0<ARG0>');
   normalized = normalized.replace(ACQUIRED_USAGE_ASSERTION_TIME_REGEX, '$1<TIME>$2');
   normalized = normalized.replace(

_{This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.}

src/snapshot-tests/__fixtures__/project-discovery/show-build-settings--success.txt

benchmarks/simulator-test/2026-03-17T14-18-19-390Z/summary.json

src/snapshot-tests/__tests__/coverage.snapshot.test.ts

src/snapshot-tests/harness.ts

src/snapshot-tests/__fixtures__/project-discovery/show-build-settings--success.txt

src/snapshot-tests/__fixtures__/simulator/screenshot--success.txt

cameroncooke · 2026-04-09T09:07:46Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 383ee04. Configure here.}

src/snapshot-tests/normalize.ts

cameroncooke · 2026-04-10T12:18:57Z

Merge activity

Apr 10, 12:18 PM UTC: A user started a stack merge that includes this pull request via Graphite.
Apr 10, 12:34 PM UTC: Graphite rebased this pull request as part of a merge.
Apr 10, 12:35 PM UTC: @cameroncooke merged this pull request with Graphite.

…eline

The show-build-settings fixture had hardcoded username and UID values that would cause test failures on other developer machines. Added normalizer rules for ALTERNATE_OWNER, INSTALL_OWNER, USER, VERSION_INFO_BUILDER, and UID build settings.

The regex required a trailing slash after the path segment, missing files placed directly in tmpdir (e.g., screenshot temp files).

Add mcp-harness.ts that combines the MCP test harness (real server with in-memory transport and mocked executors) with the snapshot normalization and fixture comparison pipeline. Initial test scenarios cover session management (show/set defaults) and error paths (missing required params). The harness can be extended with additional tool scenarios.

All consumers were migrated to event-based handlers in PRs 6-9. The barrel with createErrorResponse/createTextResponse shims is no longer imported by any source file. Updated the ARCHITECTURE.md tool pattern example to reflect the event-based handler contract.

…ssertion

The --console-pty flag causes PTY buffering that prevents the PID banner from flushing to the log file reliably (0/5 in testing). Replace file-based PID polling with an idempotent simctl launch call via CommandExecutor. Calling simctl launch on an already-running app returns the existing PID without relaunching (verified 10/10). Algorithm: 1. Spawn --console-pty with fd-to-file (captures print/NSLog) 2. Wait for app startup 3. Plain simctl launch returns existing app PID 4. Start oslog stream separately for Logger messages

- Fix test summary regex to preserve "tests" word in failure counts - Normalize doctor cwd, axe video capture capability - Update fixtures for bundle ID change (MCPTest.macOS) and improved tmpdir/test-count normalization

sentry · 2026-04-10T12:42:05Z

src/snapshot-tests/normalize.ts

+  // Normalize final test summary line (counts vary across environments)
+  normalized = normalized.replace(
+    /\d+ (tests? failed), \d+ (passed)(?:, \d+ (skipped))?/g,
+    '<FAIL_COUNT> $1, <PASS_COUNT> $2, <SKIP_COUNT> skipped',
+  );


Bug: The test summary normalization regex incorrectly adds a placeholder for skipped tests even when none are present, causing incorrect snapshot generation.
_{Severity: MEDIUM}

Suggested Fix

Modify the replacement logic to be conditional. Instead of a single replace call with a static string, use a replacer function. This function would check if the capture group for 'skipped' (e.g., $3) is defined. If it is, include the skipped count placeholder in the returned string; otherwise, omit it.

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid. Location: src/snapshot-tests/normalize.ts#L152-L156 Potential issue: The regex replacement in `normalize.ts` for test summary lines incorrectly handles cases where there are no skipped tests. The replacement string hard-codes the text `, <SKIP_COUNT> skipped` instead of conditionally including it based on the optional capture group `(?:, \d+ (skipped))?`. As a result, when a test summary line without skipped tests is processed (e.g., "10 tests failed, 5 passed"), the normalized output will incorrectly append the skipped test placeholder. This leads to incorrect snapshot fixtures and can cause inconsistent test comparisons across different environments.

This is **PR 12 of 12**, the final PR in the stacked series that decouples the rendering pipeline from MCP transport. Depends on PR 11. Adds the snapshot test suite and performance benchmarks that validate the entire rendering pipeline end-to-end. These are large in line count but are almost entirely test fixtures (expected output files) and benchmark scripts. The snapshot test infrastructure captures the rendered output of tool invocations and compares against expected fixtures. This provides regression protection for the rendering pipeline -- any change to event formatting, diagnostic grouping, or output ordering will be caught by a fixture mismatch. **Test harness** (`src/snapshot-tests/`): - `harness.ts`: Core test runner that invokes tools with mock executors and captures rendered output - `fixture-io.ts`: Reads/writes fixture files, handles normalization (timestamps, paths, UUIDs) - `flowdeck-fixture-io.ts`: Flowdeck-specific fixture handling - `normalize.ts`: Output normalization for stable comparisons across environments - `resource-harness.ts`: Resource-specific snapshot testing **Fixtures**: Expected output files for each tool covering success, error, and edge case scenarios. These serve as living documentation of what each tool's output looks like. Performance benchmarks for the rendering pipeline and xcodebuild parsing: - Parser throughput: lines/second for xcodebuild output parsing - Render session performance: events/second for text and JSON strategies - End-to-end tool invocation timing These benchmarks establish baselines and can be run in CI to catch performance regressions. This PR is large by line count but low in conceptual complexity. The fixture files are auto-generated expected outputs. The benchmark scripts are straightforward timing loops. The meaningful code is the ~500 lines of test harness infrastructure. - PR 1-11/12: All code and configuration changes - **PR 12/12** (this PR): Snapshot tests and benchmarks - [ ] `npx vitest run` passes -- snapshot tests match expected fixtures - [ ] `npx vitest run --config vitest.snapshot.config.ts` runs snapshot suite specifically - [ ] Benchmarks execute without errors (performance numbers are informational)

This is **PR 12 of 12**, the final PR in the stacked series that decouples the rendering pipeline from MCP transport. Depends on PR 11. Adds the snapshot test suite and performance benchmarks that validate the entire rendering pipeline end-to-end. These are large in line count but are almost entirely test fixtures (expected output files) and benchmark scripts. The snapshot test infrastructure captures the rendered output of tool invocations and compares against expected fixtures. This provides regression protection for the rendering pipeline -- any change to event formatting, diagnostic grouping, or output ordering will be caught by a fixture mismatch. **Test harness** (`src/snapshot-tests/`): - `harness.ts`: Core test runner that invokes tools with mock executors and captures rendered output - `fixture-io.ts`: Reads/writes fixture files, handles normalization (timestamps, paths, UUIDs) - `normalize.ts`: Output normalization for stable comparisons across environments - `resource-harness.ts`: Resource-specific snapshot testing **Fixtures**: Expected output files for each tool covering success, error, and edge case scenarios. These serve as living documentation of what each tool's output looks like. Performance benchmarks for the rendering pipeline and xcodebuild parsing: - Parser throughput: lines/second for xcodebuild output parsing - Render session performance: events/second for text and JSON strategies - End-to-end tool invocation timing These benchmarks establish baselines and can be run in CI to catch performance regressions. This PR is large by line count but low in conceptual complexity. The fixture files are auto-generated expected outputs. The benchmark scripts are straightforward timing loops. The meaningful code is the ~500 lines of test harness infrastructure. - PR 1-11/12: All code and configuration changes - **PR 12/12** (this PR): Snapshot tests and benchmarks - [ ] `npx vitest run` passes -- snapshot tests match expected fixtures - [ ] `npx vitest run --config vitest.snapshot.config.ts` runs snapshot suite specifically - [ ] Benchmarks execute without errors (performance numbers are informational)

cursor bot reviewed Apr 8, 2026

View reviewed changes

src/snapshot-tests/__fixtures__/project-discovery/show-build-settings--success.txt Outdated Show resolved Hide resolved

src/snapshot-tests/__fixtures__/project-discovery/show-build-settings--success.txt Show resolved Hide resolved

cameroncooke force-pushed the refactor/snapshot-tests-benchmarks branch from c179927 to f33bc8d Compare April 8, 2026 21:29

cameroncooke force-pushed the refactor/manifests-config-docs branch from 55a323e to 22247c9 Compare April 8, 2026 21:29

cursor bot reviewed Apr 8, 2026

View reviewed changes

benchmarks/simulator-test/2026-03-17T14-18-19-390Z/summary.json Show resolved Hide resolved

cameroncooke force-pushed the refactor/manifests-config-docs branch from 22247c9 to 785bdd9 Compare April 9, 2026 07:49

cameroncooke force-pushed the refactor/snapshot-tests-benchmarks branch from f33bc8d to 3b9be04 Compare April 9, 2026 07:49

cursor bot reviewed Apr 9, 2026

View reviewed changes

src/snapshot-tests/__tests__/coverage.snapshot.test.ts Show resolved Hide resolved

sentry bot reviewed Apr 9, 2026

View reviewed changes

src/snapshot-tests/harness.ts Show resolved Hide resolved

cameroncooke force-pushed the refactor/snapshot-tests-benchmarks branch from 3b9be04 to daa650e Compare April 9, 2026 07:59

cursor bot reviewed Apr 9, 2026

View reviewed changes

src/snapshot-tests/__fixtures__/project-discovery/show-build-settings--success.txt Show resolved Hide resolved

src/snapshot-tests/__fixtures__/simulator/screenshot--success.txt Outdated Show resolved Hide resolved

cameroncooke force-pushed the refactor/manifests-config-docs branch from 083f816 to f5391c3 Compare April 9, 2026 08:14

cameroncooke force-pushed the refactor/snapshot-tests-benchmarks branch from daa650e to 672e575 Compare April 9, 2026 08:14

cameroncooke force-pushed the refactor/manifests-config-docs branch from f5391c3 to b6e35ad Compare April 9, 2026 08:45

cameroncooke force-pushed the refactor/snapshot-tests-benchmarks branch from 672e575 to ebee70a Compare April 9, 2026 08:45

cameroncooke mentioned this pull request Apr 9, 2026

refactor(11/12): update manifests, config, docs, and example projects #329

Merged

4 tasks

cameroncooke force-pushed the refactor/snapshot-tests-benchmarks branch from ebee70a to ed126b6 Compare April 9, 2026 09:33

cameroncooke force-pushed the refactor/manifests-config-docs branch from 0497670 to dc62323 Compare April 9, 2026 10:39

cameroncooke force-pushed the refactor/snapshot-tests-benchmarks branch 2 times, most recently from 91087af to 25e9218 Compare April 9, 2026 10:56

cameroncooke force-pushed the refactor/snapshot-tests-benchmarks branch from 66f1dca to 7583cb0 Compare April 9, 2026 11:48

cameroncooke force-pushed the refactor/manifests-config-docs branch from d9a8caa to 6ea4df8 Compare April 9, 2026 12:03

cameroncooke force-pushed the refactor/snapshot-tests-benchmarks branch from 7583cb0 to 1c849e2 Compare April 9, 2026 12:03

cameroncooke force-pushed the refactor/manifests-config-docs branch from 6ea4df8 to 86b945d Compare April 9, 2026 14:43

cameroncooke force-pushed the refactor/snapshot-tests-benchmarks branch 2 times, most recently from 42de996 to 7bcad74 Compare April 9, 2026 15:15

cameroncooke force-pushed the refactor/manifests-config-docs branch 2 times, most recently from 546ad1c to 81d1e82 Compare April 9, 2026 15:40

cameroncooke force-pushed the refactor/snapshot-tests-benchmarks branch from 7bcad74 to 383ee04 Compare April 9, 2026 15:40

cursor bot reviewed Apr 9, 2026

View reviewed changes

src/snapshot-tests/normalize.ts Show resolved Hide resolved

cameroncooke force-pushed the refactor/snapshot-tests-benchmarks branch from 5fc14c0 to 75d408f Compare April 10, 2026 07:36

cameroncooke force-pushed the refactor/manifests-config-docs branch from aa949ff to 3c0a6bb Compare April 10, 2026 07:36

cameroncooke changed the base branch from refactor/manifests-config-docs to graphite-base/330 April 10, 2026 12:32

cameroncooke changed the base branch from graphite-base/330 to main April 10, 2026 12:33

cameroncooke added 9 commits April 10, 2026 12:34

refactor: add snapshot test fixtures and benchmarks for rendering pip…

3783c34

…eline

fix: tmpdir normalization now matches files directly under tmpdir

833254e

The regex required a trailing slash after the path segment, missing files placed directly in tmpdir (e.g., screenshot temp files).

fix: remove duplicate packageCachePath and update list_devices test a…

7eb7e72

…ssertion

style: apply prettier formatting

a1c1cd2

fix: improve snapshot normalization and update fixtures

343e579

- Fix test summary regex to preserve "tests" word in failure counts - Normalize doctor cwd, axe video capture capability - Update fixtures for bundle ID change (MCPTest.macOS) and improved tmpdir/test-count normalization

cameroncooke force-pushed the refactor/snapshot-tests-benchmarks branch from 33e3c2f to 343e579 Compare April 10, 2026 12:34

cameroncooke merged commit 90d9ac6 into main Apr 10, 2026
10 of 11 checks passed

cameroncooke deleted the refactor/snapshot-tests-benchmarks branch April 10, 2026 12:35

sentry bot reviewed Apr 10, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor(12/12): add snapshot test fixtures and benchmarks#330

refactor(12/12): add snapshot test fixtures and benchmarks#330
cameroncooke merged 9 commits intomainfrom
refactor/snapshot-tests-benchmarks

cameroncooke commented Apr 8, 2026

Uh oh!

cursor bot left a comment •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cameroncooke commented Apr 9, 2026 •

edited

Loading

Uh oh!

cursor bot left a comment

Uh oh!

Uh oh!

cameroncooke commented Apr 10, 2026 •

edited

Loading

Uh oh!

Uh oh!

sentry bot Apr 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

cameroncooke commented Apr 8, 2026

Summary

Snapshot test suite (159 files, ~5700 lines)

Benchmarks (~3300 lines)

Note for reviewers

Stack navigation

Test plan

Uh oh!

cursor bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cameroncooke commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cameroncooke commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merge activity

Uh oh!

Uh oh!

sentry bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

cursor bot left a comment •

edited

Loading

cameroncooke commented Apr 9, 2026 •

edited

Loading

cameroncooke commented Apr 10, 2026 •

edited

Loading