Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
113 changes: 113 additions & 0 deletions .github/workflows/perf-canary.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,113 @@
name: Perf Canary

# Lightweight per-PR build-time regression gate for PRs that touch the
# extractor, graph-builder, or native Rust layers — the parts of the codebase
# that caused the v3.12.0 regressions (+1827% 1-file rebuild, +98% full build).
#
# Only the incremental-benchmark suite is run (full build + no-op + 1-file
# rebuild for both engines). The regression guard uses BENCH_CANARY=1 mode,
# which applies a 50% threshold instead of the full suite's 25% — enough
# to catch catastrophic regressions while tolerating CI runner variance.
#
# This is intentionally separate from the full pre-publish-benchmark job in
# ci.yml, which runs unconditionally on every PR and measures the complete
# suite. The canary completes in roughly 5–10 minutes; the full suite takes
# 20–60 minutes.

on:
pull_request:
paths:
- "src/extractors/**"
- "src/domain/graph/**"
- "crates/**"
- "scripts/benchmark.ts"
- "scripts/incremental-benchmark.ts"
- "scripts/lib/bench-config.ts"
- "scripts/lib/fork-engine.ts"
Comment on lines +19 to +26

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The path filter doesn't include tests/benchmarks/regression-guard.test.ts or scripts/update-incremental-report.ts. This very PR modifies the former to introduce BENCH_CANARY mode, but the canary won't fire for that change because no src/extractors/** / crates/** paths were touched. A future PR that adjusts the canary thresholds or the report-update logic will silently bypass this gate: the full pre-publish-benchmark runs but without BENCH_CANARY=1, so the BENCH_CANARY code path is never exercised in CI.

Suggested change
paths:
- "src/extractors/**"
- "src/domain/graph/**"
- "crates/**"
- "scripts/benchmark.ts"
- "scripts/incremental-benchmark.ts"
- "scripts/lib/bench-config.ts"
- "scripts/lib/fork-engine.ts"
paths:
- "src/extractors/**"
- "src/domain/graph/**"
- "crates/**"
- "scripts/benchmark.ts"
- "scripts/incremental-benchmark.ts"
- "scripts/lib/bench-config.ts"
- "scripts/lib/fork-engine.ts"
- "scripts/update-incremental-report.ts"
- "tests/benchmarks/regression-guard.test.ts"

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added both scripts/update-incremental-report.ts and tests/benchmarks/regression-guard.test.ts to the path filter. PRs that only touch the canary infrastructure will now correctly trigger the canary run.

- "scripts/update-incremental-report.ts"
- "tests/benchmarks/regression-guard.test.ts"

permissions: {}

concurrency:
group: perf-canary-${{ github.ref }}
cancel-in-progress: true
Comment on lines +32 to +34

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The workflow is missing an explicit permissions: block. Other workflows in this repo (e.g. benchmark.yml) use permissions: {} to lock down the token scope. Without an explicit declaration the job inherits the repo default, which may be broader than needed for a read-only benchmark run that only writes an artifact.

Suggested change
concurrency:
group: perf-canary-${{ github.ref }}
cancel-in-progress: true
permissions: {}
concurrency:
group: perf-canary-${{ github.ref }}
cancel-in-progress: true

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added permissions: {} at workflow level (above concurrency:) to lock down the token scope, consistent with benchmark.yml and publish.yml.


jobs:
perf-canary:
name: Perf canary (incremental tiers)
runs-on: ubuntu-latest
env:
CODEGRAPH_FAST_SKIP_DIAG: "1"

steps:
- uses: actions/checkout@v6

- uses: actions/setup-node@v6
with:
node-version: 22
cache: "npm"

- name: Setup Rust
uses: dtolnay/rust-toolchain@stable

- name: Rust cache
uses: Swatinem/rust-cache@v2
with:
workspaces: crates/codegraph-core

- name: Install napi-rs CLI
timeout-minutes: 5
run: npm install -g @napi-rs/cli@3

- name: Build native addon
working-directory: crates/codegraph-core
run: napi build --release

- name: Install dependencies
timeout-minutes: 20
shell: bash
run: |
for attempt in 1 2 3; do
npm install && break
if [ "$attempt" -lt 3 ]; then
echo "::warning::npm install attempt $attempt failed, retrying in 15s..."
sleep 15
else
echo "::error::npm install failed after 3 attempts"
exit 1
fi
done

- name: Install native addon over published binary
run: node scripts/ci-install-native.mjs

# Build dist/ so benchmarks load the same compiled JS that ships to npm,
# matching the methodology used by the full pre-publish-benchmark gate.
- name: Build TypeScript
run: npm run build

- name: Run incremental benchmark
timeout-minutes: 15
run: |
STRIP_FLAG=$(node -e "const [M]=process.versions.node.split('.').map(Number); console.log(M>=23?'--strip-types':'--experimental-strip-types')")
node $STRIP_FLAG --import ./scripts/ts-resolve-loader.js scripts/incremental-benchmark.ts --version dev --dist > incremental-canary-result.json

- name: Update incremental report
run: |
STRIP_FLAG=$(node -e "const [M]=process.versions.node.split('.').map(Number); console.log(M>=23?'--strip-types':'--experimental-strip-types')")
node $STRIP_FLAG scripts/update-incremental-report.ts incremental-canary-result.json

- name: Regression guard (50% threshold)
env:
RUN_REGRESSION_GUARD: "1"
BENCH_CANARY: "1"
run: npm run test:regression-guard
Comment on lines +101 to +105

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Canary must be a required status check to act as a gate

The workflow is a separate file and is not included in ci.yml's ci-pipeline aggregator (needs: [lint, native-host-build, test, ...]). For it to actually block merges on regressing PRs, perf-canary / Perf canary (incremental tiers) needs to be added as a required status check in branch protection rules. Without that step the canary is advisory only — a failing run won't prevent the PR from being merged.

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged — making the canary a required status check in branch protection rules is a repo settings change, not a code change. Filed as a follow-up action for the repo admin to add perf-canary / Perf canary (incremental tiers) to required status checks after this PR merges.


- name: Upload canary result
if: always()
uses: actions/upload-artifact@v7
with:
name: incremental-canary-result
path: incremental-canary-result.json
if-no-files-found: warn
49 changes: 39 additions & 10 deletions tests/benchmarks/regression-guard.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,16 @@ import { describe, expect, test } from 'vitest';

// ── Configuration ────────────────────────────────────────────────────────

/**
* When BENCH_CANARY=1, only incremental-benchmark checks run and all timing
* thresholds are raised to 50%. This mode is used by the per-PR perf-canary
* workflow (.github/workflows/perf-canary.yml) which runs only on PRs
* touching src/extractors/, src/domain/graph/, or crates/. The looser
* threshold absorbs CI runner variance while still catching the class of
* catastrophic regressions that hit v3.12.0 (+98%/+1827%).
*/
const BENCH_CANARY = process.env.BENCH_CANARY === '1';

/**
* Maximum allowed regression (as a fraction, e.g. 0.25 = 25%).
*
Expand All @@ -26,8 +36,10 @@ import { describe, expect, test } from 'vitest';
*
* Genuinely high-variance sub-30ms metrics get a wider tolerance via
* `NOISY_METRICS` below — see that set's docstring for rationale.
*
* In BENCH_CANARY mode this is overridden to 0.5 (50%) — see above.
*/
const REGRESSION_THRESHOLD = 0.25;
const REGRESSION_THRESHOLD = BENCH_CANARY ? 0.5 : 0.25;

/**
* Wider regression threshold applied to metrics in NOISY_METRICS.
Expand All @@ -41,8 +53,11 @@ const REGRESSION_THRESHOLD = 0.25;
* Keeping the global threshold at 25% means a regression in the 30–100ms
* range is still caught (e.g. 50ms→63ms = +26%, flagged), while sub-30ms
* metrics in this set get the wider 50% allowance.
*
* In BENCH_CANARY mode this is overridden to 1.0 (100%) — the canary's
* purpose is to catch gross regressions (+50%+), not sub-30ms jitter.
*/
const NOISY_METRIC_THRESHOLD = 0.5;
const NOISY_METRIC_THRESHOLD = BENCH_CANARY ? 1.0 : 0.5;

/**
* Metric labels treated as high-variance and given the NOISY_METRIC_THRESHOLD
Expand Down Expand Up @@ -86,8 +101,12 @@ const NOISY_METRICS = new Set<string>(['No-op rebuild', '1-file rebuild', 'fnDep
* v3.0.1–3.4.0), which 75% still flags, while absorbing the ≤71% shared-runner
* jitter. Size metrics (DB bytes/file) are engine-independent and excluded from
* this widening via SIZE_METRICS below — they keep the strict threshold.
*
* In BENCH_CANARY mode this is overridden to 1.5 (150%) — the canary targets
* gross regressions only, and WASM incremental metrics have extreme variance
* on shared runners.
*/
const WASM_TIMING_THRESHOLD = 0.75;
const WASM_TIMING_THRESHOLD = BENCH_CANARY ? 1.5 : 0.75;

/**
* Metric labels that measure size/count rather than wall-clock time. These are
Expand Down Expand Up @@ -622,6 +641,10 @@ interface IncrementalEntry {
// in the default `npm test` run so docs commits that merge already-recorded
// regressed history into main don't trigger false failures — by then the
// release has already passed the gate.
//
// When BENCH_CANARY=1 (set by .github/workflows/perf-canary.yml), only the
// incremental-benchmark suite runs and thresholds are raised to 50% — see
// the BENCH_CANARY constant above.
const RUN_REGRESSION_GUARD = process.env.RUN_REGRESSION_GUARD === '1';

describe.runIf(RUN_REGRESSION_GUARD)('Benchmark regression guard', () => {
Expand All @@ -641,7 +664,9 @@ describe.runIf(RUN_REGRESSION_GUARD)('Benchmark regression guard', () => {
// Warn when KNOWN_REGRESSIONS entries are stale (more than 1 minor version
// behind the current package version). This makes the stale-exemption
// problem self-detecting rather than requiring manual bookkeeping.
test('KNOWN_REGRESSIONS entries are not stale', () => {
// Skipped in canary mode — this check is maintenance-only and irrelevant
// for a lightweight build-time regression gate.
test.skipIf(BENCH_CANARY)('KNOWN_REGRESSIONS entries are not stale', () => {
// eslint-disable-next-line @typescript-eslint/no-require-imports
const pkgVersion: string = JSON.parse(
fs.readFileSync(path.join(ROOT, 'package.json'), 'utf8'),
Expand Down Expand Up @@ -670,18 +695,22 @@ describe.runIf(RUN_REGRESSION_GUARD)('Benchmark regression guard', () => {
).toBe(0);
});

// Validate newest-first ordering assumption for all history arrays
test('build history is sorted newest-first', () => {
// Validate newest-first ordering assumption for all history arrays.
// Build/query ordering checks are skipped in canary mode (only incremental
// history is updated by the canary workflow).
test.skipIf(BENCH_CANARY)('build history is sorted newest-first', () => {
assertNewestFirst(buildHistory, 'Build benchmark');
});
test('query history is sorted newest-first', () => {
test.skipIf(BENCH_CANARY)('query history is sorted newest-first', () => {
assertNewestFirst(queryHistory, 'Query benchmark');
});
test('incremental history is sorted newest-first', () => {
assertNewestFirst(incrementalHistory, 'Incremental benchmark');
});

describe('build benchmarks', () => {
// In canary mode only the incremental suite runs — build/query/resolution
// benchmarks are not measured by the perf-canary workflow.
describe.skipIf(BENCH_CANARY)('build benchmarks', () => {
for (const engineKey of ['native', 'wasm'] as const) {
const pair = findLatestPair(buildHistory, (e) => e[engineKey] != null);
if (!pair) continue;
Expand Down Expand Up @@ -714,7 +743,7 @@ describe.runIf(RUN_REGRESSION_GUARD)('Benchmark regression guard', () => {
});
});

describe('query benchmarks', () => {
describe.skipIf(BENCH_CANARY)('query benchmarks', () => {
for (const engineKey of ['native', 'wasm'] as const) {
const pair = findLatestPair(queryHistory, (e) => e[engineKey] != null);
if (!pair) continue;
Expand Down Expand Up @@ -817,7 +846,7 @@ describe.runIf(RUN_REGRESSION_GUARD)('Benchmark regression guard', () => {
});
});

describe('resolution benchmarks', () => {
describe.skipIf(BENCH_CANARY)('resolution benchmarks', () => {
/**
* Resolution precision/recall regression thresholds.
* These are percentage-point drops (not relative %) because resolution
Expand Down
Loading