cppa-cursor-browser: Performance benchmark suite — create benchmark infra + baselines

## Calendar Day

Thursday, June 25, 2026 (**PR 1 of 2**)

## Planned Effort

**5 story points** — sprint item **#8** (Medium-High)

**Depends on:** Wednesday PR 1 (summary cache benchmark #7) merged — this suite organizes the summary-cache benchmarks alongside parse/export/search and captures baselines for the full set. Capstone of the week's benchmark work.

## Problem

The project has individual test files that measure specific latencies but no unified benchmark infrastructure with tracked baselines, regression gates, and CI integration. `benchmarks/baselines.json` exists but has empty groups. Without a benchmark suite that gates CI, performance-sensitive code paths (parse, export, search, summary cache) can degrade silently across releases — "the operations most likely to degrade under growth are exactly the ones with no regression gate."

## Goal

One merged PR that unifies all benchmark files under a common conftest, populates `benchmarks/baselines.json` from a reference run, adds a CI gate failing on >20% regression, and documents local + CI usage.

## Scope

### Touch points

- `tests/benchmarks/` — organize `test_parse_bench.py`, `test_export_bench.py`, `test_search_bench.py`, and `test_summary_cache_bench.py` (from #7) under a common `conftest.py` with shared fixtures
- `benchmarks/baselines.json` — populate from a reference run
- `.github/workflows/ci.yml` — add benchmark job with `--benchmark-compare`
- `benchmarks/README.md` (new) — document local + CI usage

### Gate behavior

- Fail if a current mean exceeds its baseline by >20%
- Pinned runner + consistent data to control variance
- Missing baselines for new benchmark names: warn, do not fail

## Acceptance Criteria

- [ ] A `pytest-benchmark` based suite exists covering: parse latency, export latency, search latency, and summary cache (from item #7)
- [ ] `benchmarks/baselines.json` is populated with initial values from a reference run
- [ ] CI has a benchmark job that compares against baselines and fails on >20% regression
- [ ] The benchmark job runs on a consistent environment (pinned runner, consistent data)
- [ ] A `benchmarks/README.md` documents how to run benchmarks locally and update baselines
- [ ] PR approved by at least 1 reviewer

## Verification

```powershell
cd C:\Users\Jasen\CppAliance\cppa-cursor-browser
.\.venv\Scripts\Activate.ps1
pytest tests/benchmarks/ --benchmark-only
pytest tests/benchmarks/ --benchmark-compare --benchmark-compare-fail=mean:20%
```


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cppa-cursor-browser: Performance benchmark suite — create benchmark infra + baselines #110

Calendar Day

Planned Effort

Problem

Goal

Scope

Touch points

Gate behavior

Acceptance Criteria

Verification

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

cppa-cursor-browser: Performance benchmark suite — create benchmark infra + baselines #110

Description

Calendar Day

Planned Effort

Problem

Goal

Scope

Touch points

Gate behavior

Acceptance Criteria

Verification

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions