Depends on: Wednesday PR 1 (summary cache benchmark #7) merged — this suite organizes the summary-cache benchmarks alongside parse/export/search and captures baselines for the full set. Capstone of the week's benchmark work.
Problem
The project has individual test files that measure specific latencies but no unified benchmark infrastructure with tracked baselines, regression gates, and CI integration. benchmarks/baselines.json exists but has empty groups. Without a benchmark suite that gates CI, performance-sensitive code paths (parse, export, search, summary cache) can degrade silently across releases — "the operations most likely to degrade under growth are exactly the ones with no regression gate."
Goal
One merged PR that unifies all benchmark files under a common conftest, populates benchmarks/baselines.json from a reference run, adds a CI gate failing on >20% regression, and documents local + CI usage.
Scope
Touch points
tests/benchmarks/ — organize test_parse_bench.py, test_export_bench.py, test_search_bench.py, and test_summary_cache_bench.py (from Support Cursor CLI agent sessions #7) under a common conftest.py with shared fixtures
benchmarks/baselines.json — populate from a reference run
.github/workflows/ci.yml — add benchmark job with --benchmark-compare
benchmarks/README.md (new) — document local + CI usage
Gate behavior
Fail if a current mean exceeds its baseline by >20%
Pinned runner + consistent data to control variance
Missing baselines for new benchmark names: warn, do not fail
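The gate rules above can be sketched in a few lines of Python. This is only an illustration of the intended semantics, not the project's implementation: the function name `check_regressions` and the flat `{benchmark name: mean seconds}` mapping are assumptions, and the real `benchmarks/baselines.json` layout may differ. In CI the comparison itself is expected to come from pytest-benchmark's `--benchmark-compare-fail=mean:20%`.

```python
import warnings

# Assumed threshold from the gate description: fail on a >20% slower mean.
THRESHOLD = 0.20


def check_regressions(baselines, current, threshold=THRESHOLD):
    """Return the names of benchmarks whose current mean regressed.

    `baselines` and `current` are hypothetical flat mappings of
    benchmark name -> mean latency in seconds.
    """
    failures = []
    for name, mean in current.items():
        base = baselines.get(name)
        if base is None:
            # New benchmark without a recorded baseline: warn, do not fail.
            warnings.warn(f"no baseline for {name!r}; skipping gate")
            continue
        if mean > base * (1 + threshold):
            failures.append(name)
    return failures


if __name__ == "__main__":
    baselines = {"parse": 1.0, "export": 2.0}
    current = {"parse": 1.25, "export": 2.3, "search": 0.5}
    # "parse" is 25% slower (fails), "export" is 15% slower (passes),
    # "search" has no baseline (warning only).
    print(check_regressions(baselines, current))
```

A CI step built on this idea would exit non-zero when the returned list is non-empty, while warnings for unbaselined benchmarks only appear in the log.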
Acceptance Criteria
A pytest-benchmark-based suite exists covering: parse latency, export latency, search latency, and summary cache (from item Support Cursor CLI agent sessions #7)
benchmarks/baselines.json is populated with initial values from a reference run
CI has a benchmark job that compares against baselines and fails on >20% regression
The benchmark job runs on a consistent environment (pinned runner, consistent data)
A benchmarks/README.md documents how to run benchmarks locally and update baselines
PR approved by at least 1 reviewer
Verification
cd C:\Users\Jasen\CppAliance\cppa-cursor-browser
.\.venv\Scripts\Activate.ps1
pytest tests/benchmarks/ --benchmark-only
pytest tests/benchmarks/ --benchmark-compare --benchmark-compare-fail=mean:20%
Calendar Day
Thursday, June 25, 2026 (PR 1 of 2)
Planned Effort
5 story points — sprint item #8 (Medium-High)