Skip to content

refactor(12/12): add snapshot test fixtures and benchmarks#330

Merged
cameroncooke merged 9 commits intomainfrom
refactor/snapshot-tests-benchmarks
Apr 10, 2026
Merged

refactor(12/12): add snapshot test fixtures and benchmarks#330
cameroncooke merged 9 commits intomainfrom
refactor/snapshot-tests-benchmarks

Conversation

@cameroncooke
Copy link
Copy Markdown
Collaborator

Summary

This is PR 12 of 12, the final PR in the stacked series that decouples the rendering pipeline from MCP transport. Depends on PR 11.

Adds the snapshot test suite and performance benchmarks that validate the entire rendering pipeline end-to-end. These are large in line count but are almost entirely test fixtures (expected output files) and benchmark scripts.

Snapshot test suite (159 files, ~5700 lines)

The snapshot test infrastructure captures the rendered output of tool invocations and compares against expected fixtures. This provides regression protection for the rendering pipeline -- any change to event formatting, diagnostic grouping, or output ordering will be caught by a fixture mismatch.

Test harness (src/snapshot-tests/):

  • harness.ts: Core test runner that invokes tools with mock executors and captures rendered output
  • fixture-io.ts: Reads/writes fixture files, handles normalization (timestamps, paths, UUIDs)
  • flowdeck-fixture-io.ts: Flowdeck-specific fixture handling
  • normalize.ts: Output normalization for stable comparisons across environments
  • resource-harness.ts: Resource-specific snapshot testing

Fixtures: Expected output files for each tool covering success, error, and edge case scenarios. These serve as living documentation of what each tool's output looks like.

Benchmarks (~3300 lines)

Performance benchmarks for the rendering pipeline and xcodebuild parsing:

  • Parser throughput: lines/second for xcodebuild output parsing
  • Render session performance: events/second for text and JSON strategies
  • End-to-end tool invocation timing

These benchmarks establish baselines and can be run in CI to catch performance regressions.

Note for reviewers

This PR is large by line count but low in conceptual complexity. The fixture files are auto-generated expected outputs. The benchmark scripts are straightforward timing loops. The meaningful code is the ~500 lines of test harness infrastructure.

Stack navigation

  • PR 1-11/12: All code and configuration changes
  • PR 12/12 (this PR): Snapshot tests and benchmarks

Test plan

  • npx vitest run passes -- snapshot tests match expected fixtures
  • npx vitest run --config vitest.snapshot.config.ts runs snapshot suite specifically
  • Benchmarks execute without errors (performance numbers are informational)

Copy link
Copy Markdown
Contributor

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Fixture has un-normalized username causing cross-environment test failures
    • Added normalization for USER, ALTERNATE_OWNER, INSTALL_OWNER, and VERSION_INFO_BUILDER build settings to replace usernames with , and updated the fixture file accordingly.
  • ✅ Fixed: Developer team ID committed in fixture file
    • Added normalization for DEVELOPMENT_TEAM build setting to replace team ID with , and updated the fixture file to remove the hardcoded team ID.

Create PR

Or push these changes by commenting:

@cursor push d89a7eafb2
Preview (d89a7eafb2)
diff --git a/src/snapshot-tests/__fixtures__/project-discovery/show-build-settings--success.txt b/src/snapshot-tests/__fixtures__/project-discovery/show-build-settings--success.txt
--- a/src/snapshot-tests/__fixtures__/project-discovery/show-build-settings--success.txt
+++ b/src/snapshot-tests/__fixtures__/project-discovery/show-build-settings--success.txt
@@ -15,7 +15,7 @@
       ALLOW_TARGET_PLATFORM_SPECIALIZATION = NO
       ALTERNATE_GROUP = staff
       ALTERNATE_MODE = u+w,go-w,a+rX
-      ALTERNATE_OWNER = cameroncooke
+      ALTERNATE_OWNER = <redacted>
       ALTERNATIVE_DISTRIBUTION_WEB = NO
       ALWAYS_EMBED_SWIFT_STANDARD_LIBRARIES = NO
       ALWAYS_SEARCH_USER_PATHS = NO
@@ -156,7 +156,7 @@
       DEVELOPER_TOOLS_DIR = /Applications/Xcode-26.4.0.app/Contents/Developer/Tools
       DEVELOPER_USR_DIR = /Applications/Xcode-26.4.0.app/Contents/Developer/usr
       DEVELOPMENT_LANGUAGE = en
-      DEVELOPMENT_TEAM = BR6WD3M6ZD
+      DEVELOPMENT_TEAM = <redacted>
       DIAGNOSE_MISSING_TARGET_DEPENDENCIES = YES
       DIFF = /usr/bin/diff
       DOCUMENTATION_FOLDER_PATH = CalculatorApp.app/en.lproj/Documentation
@@ -293,7 +293,7 @@
       INSTALL_DIR = /tmp/CalculatorApp.dst/Applications
       INSTALL_GROUP = staff
       INSTALL_MODE_FLAG = u+w,go-w,a+rX
-      INSTALL_OWNER = cameroncooke
+      INSTALL_OWNER = <redacted>
       INSTALL_PATH = /Applications
       INSTALL_ROOT = /tmp/CalculatorApp.dst
       IPHONEOS_DEPLOYMENT_TARGET = 17.0
@@ -568,7 +568,7 @@
       UNLOCALIZED_RESOURCES_FOLDER_PATH_SHALLOW_BUNDLE_NO = CalculatorApp.app/Resources
       UNLOCALIZED_RESOURCES_FOLDER_PATH_SHALLOW_BUNDLE_YES = CalculatorApp.app
       UNSTRIPPED_PRODUCT = NO
-      USER = cameroncooke
+      USER = <redacted>
       USER_APPS_DIR = <HOME>/Applications
       USER_LIBRARY_DIR = <HOME>/Library
       USE_DYNAMIC_NO_PIC = YES
@@ -579,7 +579,7 @@
       VALID_ARCHS = arm64 arm64e armv7 armv7s
       VERBOSE_PBXCP = NO
       VERSIONPLIST_PATH = CalculatorApp.app/version.plist
-      VERSION_INFO_BUILDER = cameroncooke
+      VERSION_INFO_BUILDER = <redacted>
       VERSION_INFO_FILE = CalculatorApp_vers.c
       VERSION_INFO_STRING = "@(#)PROGRAM:CalculatorApp  PROJECT:CalculatorApp-1"
       WATCHOS_DEPLOYMENT_TARGET = 26.4

diff --git a/src/snapshot-tests/normalize.ts b/src/snapshot-tests/normalize.ts
--- a/src/snapshot-tests/normalize.ts
+++ b/src/snapshot-tests/normalize.ts
@@ -47,6 +47,9 @@
 const ACQUIRED_USAGE_ASSERTION_TIME_REGEX =
   /(^\s*)\d{2}:\d{2}:\d{2}( {2}Acquired usage assertion\.)$/gm;
 const BUILD_SETTINGS_PATH_REGEX = /^( {6}PATH = ).+$/gm;
+const BUILD_SETTINGS_USER_REGEX =
+  /^( {6}(?:USER|ALTERNATE_OWNER|INSTALL_OWNER|VERSION_INFO_BUILDER) = ).+$/gm;
+const BUILD_SETTINGS_DEVELOPMENT_TEAM_REGEX = /^( {6}DEVELOPMENT_TEAM = ).+$/gm;
 const TRAILING_WHITESPACE_REGEX = /[ \t]+$/gm;
 
 function sortLinesInBlock(text: string, marker: RegExp): string {
@@ -130,6 +133,8 @@
 
   normalized = normalized.replace(TARGET_DEVICE_IDENTIFIER_REGEX, '$1<UUID>');
   normalized = normalized.replace(BUILD_SETTINGS_PATH_REGEX, '$1<PATH>');
+  normalized = normalized.replace(BUILD_SETTINGS_USER_REGEX, '$1<redacted>');
+  normalized = normalized.replace(BUILD_SETTINGS_DEVELOPMENT_TEAM_REGEX, '$1<redacted>');
   normalized = normalized.replace(CODEX_ARG0_PATH_REGEX, '<HOME>/.codex/tmp/arg0/codex-arg0<ARG0>');
   normalized = normalized.replace(ACQUIRED_USAGE_ASSERTION_TIME_REGEX, '$1<TIME>$2');
   normalized = normalized.replace(

This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.

@cameroncooke cameroncooke force-pushed the refactor/snapshot-tests-benchmarks branch from c179927 to f33bc8d Compare April 8, 2026 21:29
@cameroncooke cameroncooke force-pushed the refactor/manifests-config-docs branch from 55a323e to 22247c9 Compare April 8, 2026 21:29
@cameroncooke cameroncooke force-pushed the refactor/manifests-config-docs branch from 22247c9 to 785bdd9 Compare April 9, 2026 07:49
@cameroncooke cameroncooke force-pushed the refactor/snapshot-tests-benchmarks branch from f33bc8d to 3b9be04 Compare April 9, 2026 07:49
@cameroncooke cameroncooke force-pushed the refactor/snapshot-tests-benchmarks branch from 3b9be04 to daa650e Compare April 9, 2026 07:59
@cameroncooke cameroncooke force-pushed the refactor/snapshot-tests-benchmarks branch from ebee70a to ed126b6 Compare April 9, 2026 09:33
@cameroncooke cameroncooke force-pushed the refactor/manifests-config-docs branch from 0497670 to dc62323 Compare April 9, 2026 10:39
@cameroncooke cameroncooke force-pushed the refactor/snapshot-tests-benchmarks branch 2 times, most recently from 91087af to 25e9218 Compare April 9, 2026 10:56
@cameroncooke cameroncooke force-pushed the refactor/snapshot-tests-benchmarks branch from 66f1dca to 7583cb0 Compare April 9, 2026 11:48
@cameroncooke cameroncooke force-pushed the refactor/manifests-config-docs branch from d9a8caa to 6ea4df8 Compare April 9, 2026 12:03
@cameroncooke cameroncooke force-pushed the refactor/snapshot-tests-benchmarks branch from 7583cb0 to 1c849e2 Compare April 9, 2026 12:03
@cameroncooke cameroncooke force-pushed the refactor/manifests-config-docs branch from 6ea4df8 to 86b945d Compare April 9, 2026 14:43
@cameroncooke cameroncooke force-pushed the refactor/snapshot-tests-benchmarks branch 2 times, most recently from 42de996 to 7bcad74 Compare April 9, 2026 15:15
@cameroncooke cameroncooke force-pushed the refactor/manifests-config-docs branch 2 times, most recently from 546ad1c to 81d1e82 Compare April 9, 2026 15:40
@cameroncooke cameroncooke force-pushed the refactor/snapshot-tests-benchmarks branch from 7bcad74 to 383ee04 Compare April 9, 2026 15:40
Copy link
Copy Markdown
Contributor

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 383ee04. Configure here.

@cameroncooke cameroncooke force-pushed the refactor/snapshot-tests-benchmarks branch from 5fc14c0 to 75d408f Compare April 10, 2026 07:36
@cameroncooke cameroncooke force-pushed the refactor/manifests-config-docs branch from aa949ff to 3c0a6bb Compare April 10, 2026 07:36
Copy link
Copy Markdown
Collaborator Author

cameroncooke commented Apr 10, 2026

Merge activity

  • Apr 10, 12:18 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Apr 10, 12:34 PM UTC: Graphite rebased this pull request as part of a merge.
  • Apr 10, 12:35 PM UTC: @cameroncooke merged this pull request with Graphite.

@cameroncooke cameroncooke changed the base branch from refactor/manifests-config-docs to graphite-base/330 April 10, 2026 12:32
@cameroncooke cameroncooke changed the base branch from graphite-base/330 to main April 10, 2026 12:33
The show-build-settings fixture had hardcoded username and UID values
that would cause test failures on other developer machines. Added
normalizer rules for ALTERNATE_OWNER, INSTALL_OWNER, USER,
VERSION_INFO_BUILDER, and UID build settings.
The regex required a trailing slash after the path segment, missing
files placed directly in tmpdir (e.g., screenshot temp files).
Add mcp-harness.ts that combines the MCP test harness (real server
with in-memory transport and mocked executors) with the snapshot
normalization and fixture comparison pipeline.

Initial test scenarios cover session management (show/set defaults)
and error paths (missing required params). The harness can be
extended with additional tool scenarios.
All consumers were migrated to event-based handlers in PRs 6-9.
The barrel with createErrorResponse/createTextResponse shims is
no longer imported by any source file. Updated the ARCHITECTURE.md
tool pattern example to reflect the event-based handler contract.
The --console-pty flag causes PTY buffering that prevents the PID
banner from flushing to the log file reliably (0/5 in testing).

Replace file-based PID polling with an idempotent simctl launch
call via CommandExecutor. Calling simctl launch on an already-running
app returns the existing PID without relaunching (verified 10/10).

Algorithm:
1. Spawn --console-pty with fd-to-file (captures print/NSLog)
2. Wait for app startup
3. Plain simctl launch returns existing app PID
4. Start oslog stream separately for Logger messages
- Fix test summary regex to preserve "tests" word in failure counts
- Normalize doctor cwd, axe video capture capability
- Update fixtures for bundle ID change (MCPTest.macOS) and
  improved tmpdir/test-count normalization
@cameroncooke cameroncooke force-pushed the refactor/snapshot-tests-benchmarks branch from 33e3c2f to 343e579 Compare April 10, 2026 12:34
@cameroncooke cameroncooke merged commit 90d9ac6 into main Apr 10, 2026
10 of 11 checks passed
@cameroncooke cameroncooke deleted the refactor/snapshot-tests-benchmarks branch April 10, 2026 12:35
Comment on lines +152 to +156
// Normalize final test summary line (counts vary across environments)
normalized = normalized.replace(
/\d+ (tests? failed), \d+ (passed)(?:, \d+ (skipped))?/g,
'<FAIL_COUNT> $1, <PASS_COUNT> $2, <SKIP_COUNT> skipped',
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The test summary normalization regex incorrectly adds a placeholder for skipped tests even when none are present, causing incorrect snapshot generation.
Severity: MEDIUM

Suggested Fix

Modify the replacement logic to be conditional. Instead of a single replace call with a static string, use a replacer function. This function would check if the capture group for 'skipped' (e.g., $3) is defined. If it is, include the skipped count placeholder in the returned string; otherwise, omit it.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/snapshot-tests/normalize.ts#L152-L156

Potential issue: The regex replacement in `normalize.ts` for test summary lines
incorrectly handles cases where there are no skipped tests. The replacement string
hard-codes the text `, <SKIP_COUNT> skipped` instead of conditionally including it based
on the optional capture group `(?:, \d+ (skipped))?`. As a result, when a test summary
line without skipped tests is processed (e.g., "10 tests failed, 5 passed"), the
normalized output will incorrectly append the skipped test placeholder. This leads to
incorrect snapshot fixtures and can cause inconsistent test comparisons across different
environments.

cameroncooke added a commit that referenced this pull request Apr 10, 2026
This is **PR 12 of 12**, the final PR in the stacked series that decouples the rendering pipeline from MCP transport. Depends on PR 11.

Adds the snapshot test suite and performance benchmarks that validate the entire rendering pipeline end-to-end. These are large in line count but are almost entirely test fixtures (expected output files) and benchmark scripts.

The snapshot test infrastructure captures the rendered output of tool invocations and compares against expected fixtures. This provides regression protection for the rendering pipeline -- any change to event formatting, diagnostic grouping, or output ordering will be caught by a fixture mismatch.

**Test harness** (`src/snapshot-tests/`):
- `harness.ts`: Core test runner that invokes tools with mock executors and captures rendered output
- `fixture-io.ts`: Reads/writes fixture files, handles normalization (timestamps, paths, UUIDs)
- `flowdeck-fixture-io.ts`: Flowdeck-specific fixture handling
- `normalize.ts`: Output normalization for stable comparisons across environments
- `resource-harness.ts`: Resource-specific snapshot testing

**Fixtures**: Expected output files for each tool covering success, error, and edge case scenarios. These serve as living documentation of what each tool's output looks like.

Performance benchmarks for the rendering pipeline and xcodebuild parsing:
- Parser throughput: lines/second for xcodebuild output parsing
- Render session performance: events/second for text and JSON strategies
- End-to-end tool invocation timing

These benchmarks establish baselines and can be run in CI to catch performance regressions.

This PR is large by line count but low in conceptual complexity. The fixture files are auto-generated expected outputs. The benchmark scripts are straightforward timing loops. The meaningful code is the ~500 lines of test harness infrastructure.

- PR 1-11/12: All code and configuration changes
- **PR 12/12** (this PR): Snapshot tests and benchmarks

- [ ] `npx vitest run` passes -- snapshot tests match expected fixtures
- [ ] `npx vitest run --config vitest.snapshot.config.ts` runs snapshot suite specifically
- [ ] Benchmarks execute without errors (performance numbers are informational)
cameroncooke added a commit that referenced this pull request Apr 10, 2026
This is **PR 12 of 12**, the final PR in the stacked series that decouples the rendering pipeline from MCP transport. Depends on PR 11.

Adds the snapshot test suite and performance benchmarks that validate the entire rendering pipeline end-to-end. These are large in line count but are almost entirely test fixtures (expected output files) and benchmark scripts.

The snapshot test infrastructure captures the rendered output of tool invocations and compares against expected fixtures. This provides regression protection for the rendering pipeline -- any change to event formatting, diagnostic grouping, or output ordering will be caught by a fixture mismatch.

**Test harness** (`src/snapshot-tests/`):
- `harness.ts`: Core test runner that invokes tools with mock executors and captures rendered output
- `fixture-io.ts`: Reads/writes fixture files, handles normalization (timestamps, paths, UUIDs)
- `normalize.ts`: Output normalization for stable comparisons across environments
- `resource-harness.ts`: Resource-specific snapshot testing

**Fixtures**: Expected output files for each tool covering success, error, and edge case scenarios. These serve as living documentation of what each tool's output looks like.

Performance benchmarks for the rendering pipeline and xcodebuild parsing:
- Parser throughput: lines/second for xcodebuild output parsing
- Render session performance: events/second for text and JSON strategies
- End-to-end tool invocation timing

These benchmarks establish baselines and can be run in CI to catch performance regressions.

This PR is large by line count but low in conceptual complexity. The fixture files are auto-generated expected outputs. The benchmark scripts are straightforward timing loops. The meaningful code is the ~500 lines of test harness infrastructure.

- PR 1-11/12: All code and configuration changes
- **PR 12/12** (this PR): Snapshot tests and benchmarks

- [ ] `npx vitest run` passes -- snapshot tests match expected fixtures
- [ ] `npx vitest run --config vitest.snapshot.config.ts` runs snapshot suite specifically
- [ ] Benchmarks execute without errors (performance numbers are informational)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant