refactor(12/12): add snapshot test fixtures and benchmarks#330
refactor(12/12): add snapshot test fixtures and benchmarks#330cameroncooke merged 9 commits intomainfrom
Conversation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Fixture has un-normalized username causing cross-environment test failures
- Added normalization for USER, ALTERNATE_OWNER, INSTALL_OWNER, and VERSION_INFO_BUILDER build settings to replace usernames with , and updated the fixture file accordingly.
- ✅ Fixed: Developer team ID committed in fixture file
- Added normalization for DEVELOPMENT_TEAM build setting to replace team ID with , and updated the fixture file to remove the hardcoded team ID.
Or push these changes by commenting:
@cursor push d89a7eafb2
Preview (d89a7eafb2)
diff --git a/src/snapshot-tests/__fixtures__/project-discovery/show-build-settings--success.txt b/src/snapshot-tests/__fixtures__/project-discovery/show-build-settings--success.txt
--- a/src/snapshot-tests/__fixtures__/project-discovery/show-build-settings--success.txt
+++ b/src/snapshot-tests/__fixtures__/project-discovery/show-build-settings--success.txt
@@ -15,7 +15,7 @@
ALLOW_TARGET_PLATFORM_SPECIALIZATION = NO
ALTERNATE_GROUP = staff
ALTERNATE_MODE = u+w,go-w,a+rX
- ALTERNATE_OWNER = cameroncooke
+ ALTERNATE_OWNER = <redacted>
ALTERNATIVE_DISTRIBUTION_WEB = NO
ALWAYS_EMBED_SWIFT_STANDARD_LIBRARIES = NO
ALWAYS_SEARCH_USER_PATHS = NO
@@ -156,7 +156,7 @@
DEVELOPER_TOOLS_DIR = /Applications/Xcode-26.4.0.app/Contents/Developer/Tools
DEVELOPER_USR_DIR = /Applications/Xcode-26.4.0.app/Contents/Developer/usr
DEVELOPMENT_LANGUAGE = en
- DEVELOPMENT_TEAM = BR6WD3M6ZD
+ DEVELOPMENT_TEAM = <redacted>
DIAGNOSE_MISSING_TARGET_DEPENDENCIES = YES
DIFF = /usr/bin/diff
DOCUMENTATION_FOLDER_PATH = CalculatorApp.app/en.lproj/Documentation
@@ -293,7 +293,7 @@
INSTALL_DIR = /tmp/CalculatorApp.dst/Applications
INSTALL_GROUP = staff
INSTALL_MODE_FLAG = u+w,go-w,a+rX
- INSTALL_OWNER = cameroncooke
+ INSTALL_OWNER = <redacted>
INSTALL_PATH = /Applications
INSTALL_ROOT = /tmp/CalculatorApp.dst
IPHONEOS_DEPLOYMENT_TARGET = 17.0
@@ -568,7 +568,7 @@
UNLOCALIZED_RESOURCES_FOLDER_PATH_SHALLOW_BUNDLE_NO = CalculatorApp.app/Resources
UNLOCALIZED_RESOURCES_FOLDER_PATH_SHALLOW_BUNDLE_YES = CalculatorApp.app
UNSTRIPPED_PRODUCT = NO
- USER = cameroncooke
+ USER = <redacted>
USER_APPS_DIR = <HOME>/Applications
USER_LIBRARY_DIR = <HOME>/Library
USE_DYNAMIC_NO_PIC = YES
@@ -579,7 +579,7 @@
VALID_ARCHS = arm64 arm64e armv7 armv7s
VERBOSE_PBXCP = NO
VERSIONPLIST_PATH = CalculatorApp.app/version.plist
- VERSION_INFO_BUILDER = cameroncooke
+ VERSION_INFO_BUILDER = <redacted>
VERSION_INFO_FILE = CalculatorApp_vers.c
VERSION_INFO_STRING = "@(#)PROGRAM:CalculatorApp PROJECT:CalculatorApp-1"
WATCHOS_DEPLOYMENT_TARGET = 26.4
diff --git a/src/snapshot-tests/normalize.ts b/src/snapshot-tests/normalize.ts
--- a/src/snapshot-tests/normalize.ts
+++ b/src/snapshot-tests/normalize.ts
@@ -47,6 +47,9 @@
const ACQUIRED_USAGE_ASSERTION_TIME_REGEX =
/(^\s*)\d{2}:\d{2}:\d{2}( {2}Acquired usage assertion\.)$/gm;
const BUILD_SETTINGS_PATH_REGEX = /^( {6}PATH = ).+$/gm;
+const BUILD_SETTINGS_USER_REGEX =
+ /^( {6}(?:USER|ALTERNATE_OWNER|INSTALL_OWNER|VERSION_INFO_BUILDER) = ).+$/gm;
+const BUILD_SETTINGS_DEVELOPMENT_TEAM_REGEX = /^( {6}DEVELOPMENT_TEAM = ).+$/gm;
const TRAILING_WHITESPACE_REGEX = /[ \t]+$/gm;
function sortLinesInBlock(text: string, marker: RegExp): string {
@@ -130,6 +133,8 @@
normalized = normalized.replace(TARGET_DEVICE_IDENTIFIER_REGEX, '$1<UUID>');
normalized = normalized.replace(BUILD_SETTINGS_PATH_REGEX, '$1<PATH>');
+ normalized = normalized.replace(BUILD_SETTINGS_USER_REGEX, '$1<redacted>');
+ normalized = normalized.replace(BUILD_SETTINGS_DEVELOPMENT_TEAM_REGEX, '$1<redacted>');
normalized = normalized.replace(CODEX_ARG0_PATH_REGEX, '<HOME>/.codex/tmp/arg0/codex-arg0<ARG0>');
normalized = normalized.replace(ACQUIRED_USAGE_ASSERTION_TIME_REGEX, '$1<TIME>$2');
normalized = normalized.replace(This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.
src/snapshot-tests/__fixtures__/project-discovery/show-build-settings--success.txt
Outdated
Show resolved
Hide resolved
src/snapshot-tests/__fixtures__/project-discovery/show-build-settings--success.txt
Show resolved
Hide resolved
c179927 to
f33bc8d
Compare
55a323e to
22247c9
Compare
22247c9 to
785bdd9
Compare
f33bc8d to
3b9be04
Compare
3b9be04 to
daa650e
Compare
src/snapshot-tests/__fixtures__/project-discovery/show-build-settings--success.txt
Show resolved
Hide resolved
src/snapshot-tests/__fixtures__/simulator/screenshot--success.txt
Outdated
Show resolved
Hide resolved
083f816 to
f5391c3
Compare
daa650e to
672e575
Compare
f5391c3 to
b6e35ad
Compare
672e575 to
ebee70a
Compare
ebee70a to
ed126b6
Compare
0497670 to
dc62323
Compare
91087af to
25e9218
Compare
66f1dca to
7583cb0
Compare
d9a8caa to
6ea4df8
Compare
7583cb0 to
1c849e2
Compare
6ea4df8 to
86b945d
Compare
42de996 to
7bcad74
Compare
546ad1c to
81d1e82
Compare
7bcad74 to
383ee04
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 383ee04. Configure here.
5fc14c0 to
75d408f
Compare
aa949ff to
3c0a6bb
Compare
Merge activity
|
The show-build-settings fixture had hardcoded username and UID values that would cause test failures on other developer machines. Added normalizer rules for ALTERNATE_OWNER, INSTALL_OWNER, USER, VERSION_INFO_BUILDER, and UID build settings.
The regex required a trailing slash after the path segment, missing files placed directly in tmpdir (e.g., screenshot temp files).
Add mcp-harness.ts that combines the MCP test harness (real server with in-memory transport and mocked executors) with the snapshot normalization and fixture comparison pipeline. Initial test scenarios cover session management (show/set defaults) and error paths (missing required params). The harness can be extended with additional tool scenarios.
All consumers were migrated to event-based handlers in PRs 6-9. The barrel with createErrorResponse/createTextResponse shims is no longer imported by any source file. Updated the ARCHITECTURE.md tool pattern example to reflect the event-based handler contract.
The --console-pty flag causes PTY buffering that prevents the PID banner from flushing to the log file reliably (0/5 in testing). Replace file-based PID polling with an idempotent simctl launch call via CommandExecutor. Calling simctl launch on an already-running app returns the existing PID without relaunching (verified 10/10). Algorithm: 1. Spawn --console-pty with fd-to-file (captures print/NSLog) 2. Wait for app startup 3. Plain simctl launch returns existing app PID 4. Start oslog stream separately for Logger messages
- Fix test summary regex to preserve "tests" word in failure counts - Normalize doctor cwd, axe video capture capability - Update fixtures for bundle ID change (MCPTest.macOS) and improved tmpdir/test-count normalization
33e3c2f to
343e579
Compare
| // Normalize final test summary line (counts vary across environments) | ||
| normalized = normalized.replace( | ||
| /\d+ (tests? failed), \d+ (passed)(?:, \d+ (skipped))?/g, | ||
| '<FAIL_COUNT> $1, <PASS_COUNT> $2, <SKIP_COUNT> skipped', | ||
| ); |
There was a problem hiding this comment.
Bug: The test summary normalization regex incorrectly adds a placeholder for skipped tests even when none are present, causing incorrect snapshot generation.
Severity: MEDIUM
Suggested Fix
Modify the replacement logic to be conditional. Instead of a single replace call with a static string, use a replacer function. This function would check if the capture group for 'skipped' (e.g., $3) is defined. If it is, include the skipped count placeholder in the returned string; otherwise, omit it.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/snapshot-tests/normalize.ts#L152-L156
Potential issue: The regex replacement in `normalize.ts` for test summary lines
incorrectly handles cases where there are no skipped tests. The replacement string
hard-codes the text `, <SKIP_COUNT> skipped` instead of conditionally including it based
on the optional capture group `(?:, \d+ (skipped))?`. As a result, when a test summary
line without skipped tests is processed (e.g., "10 tests failed, 5 passed"), the
normalized output will incorrectly append the skipped test placeholder. This leads to
incorrect snapshot fixtures and can cause inconsistent test comparisons across different
environments.
This is **PR 12 of 12**, the final PR in the stacked series that decouples the rendering pipeline from MCP transport. Depends on PR 11. Adds the snapshot test suite and performance benchmarks that validate the entire rendering pipeline end-to-end. These are large in line count but are almost entirely test fixtures (expected output files) and benchmark scripts. The snapshot test infrastructure captures the rendered output of tool invocations and compares against expected fixtures. This provides regression protection for the rendering pipeline -- any change to event formatting, diagnostic grouping, or output ordering will be caught by a fixture mismatch. **Test harness** (`src/snapshot-tests/`): - `harness.ts`: Core test runner that invokes tools with mock executors and captures rendered output - `fixture-io.ts`: Reads/writes fixture files, handles normalization (timestamps, paths, UUIDs) - `flowdeck-fixture-io.ts`: Flowdeck-specific fixture handling - `normalize.ts`: Output normalization for stable comparisons across environments - `resource-harness.ts`: Resource-specific snapshot testing **Fixtures**: Expected output files for each tool covering success, error, and edge case scenarios. These serve as living documentation of what each tool's output looks like. Performance benchmarks for the rendering pipeline and xcodebuild parsing: - Parser throughput: lines/second for xcodebuild output parsing - Render session performance: events/second for text and JSON strategies - End-to-end tool invocation timing These benchmarks establish baselines and can be run in CI to catch performance regressions. This PR is large by line count but low in conceptual complexity. The fixture files are auto-generated expected outputs. The benchmark scripts are straightforward timing loops. The meaningful code is the ~500 lines of test harness infrastructure. - PR 1-11/12: All code and configuration changes - **PR 12/12** (this PR): Snapshot tests and benchmarks - [ ] `npx vitest run` passes -- snapshot tests match expected fixtures - [ ] `npx vitest run --config vitest.snapshot.config.ts` runs snapshot suite specifically - [ ] Benchmarks execute without errors (performance numbers are informational)
This is **PR 12 of 12**, the final PR in the stacked series that decouples the rendering pipeline from MCP transport. Depends on PR 11. Adds the snapshot test suite and performance benchmarks that validate the entire rendering pipeline end-to-end. These are large in line count but are almost entirely test fixtures (expected output files) and benchmark scripts. The snapshot test infrastructure captures the rendered output of tool invocations and compares against expected fixtures. This provides regression protection for the rendering pipeline -- any change to event formatting, diagnostic grouping, or output ordering will be caught by a fixture mismatch. **Test harness** (`src/snapshot-tests/`): - `harness.ts`: Core test runner that invokes tools with mock executors and captures rendered output - `fixture-io.ts`: Reads/writes fixture files, handles normalization (timestamps, paths, UUIDs) - `normalize.ts`: Output normalization for stable comparisons across environments - `resource-harness.ts`: Resource-specific snapshot testing **Fixtures**: Expected output files for each tool covering success, error, and edge case scenarios. These serve as living documentation of what each tool's output looks like. Performance benchmarks for the rendering pipeline and xcodebuild parsing: - Parser throughput: lines/second for xcodebuild output parsing - Render session performance: events/second for text and JSON strategies - End-to-end tool invocation timing These benchmarks establish baselines and can be run in CI to catch performance regressions. This PR is large by line count but low in conceptual complexity. The fixture files are auto-generated expected outputs. The benchmark scripts are straightforward timing loops. The meaningful code is the ~500 lines of test harness infrastructure. - PR 1-11/12: All code and configuration changes - **PR 12/12** (this PR): Snapshot tests and benchmarks - [ ] `npx vitest run` passes -- snapshot tests match expected fixtures - [ ] `npx vitest run --config vitest.snapshot.config.ts` runs snapshot suite specifically - [ ] Benchmarks execute without errors (performance numbers are informational)



Summary
This is PR 12 of 12, the final PR in the stacked series that decouples the rendering pipeline from MCP transport. Depends on PR 11.
Adds the snapshot test suite and performance benchmarks that validate the entire rendering pipeline end-to-end. These are large in line count but are almost entirely test fixtures (expected output files) and benchmark scripts.
Snapshot test suite (159 files, ~5700 lines)
The snapshot test infrastructure captures the rendered output of tool invocations and compares against expected fixtures. This provides regression protection for the rendering pipeline -- any change to event formatting, diagnostic grouping, or output ordering will be caught by a fixture mismatch.
Test harness (
src/snapshot-tests/):harness.ts: Core test runner that invokes tools with mock executors and captures rendered outputfixture-io.ts: Reads/writes fixture files, handles normalization (timestamps, paths, UUIDs)flowdeck-fixture-io.ts: Flowdeck-specific fixture handlingnormalize.ts: Output normalization for stable comparisons across environmentsresource-harness.ts: Resource-specific snapshot testingFixtures: Expected output files for each tool covering success, error, and edge case scenarios. These serve as living documentation of what each tool's output looks like.
Benchmarks (~3300 lines)
Performance benchmarks for the rendering pipeline and xcodebuild parsing:
These benchmarks establish baselines and can be run in CI to catch performance regressions.
Note for reviewers
This PR is large by line count but low in conceptual complexity. The fixture files are auto-generated expected outputs. The benchmark scripts are straightforward timing loops. The meaningful code is the ~500 lines of test harness infrastructure.
Stack navigation
Test plan
npx vitest runpasses -- snapshot tests match expected fixturesnpx vitest run --config vitest.snapshot.config.tsruns snapshot suite specifically