Skip to content

chore(build): give tests first-class adversarial reviewers#57

Merged
bluestreak01 merged 1 commit into
mainfrom
chore/review-pr-test-reviewers
Jun 26, 2026
Merged

chore(build): give tests first-class adversarial reviewers#57
bluestreak01 merged 1 commit into
mainfrom
chore/review-pr-test-reviewers

Conversation

@bluestreak01

Copy link
Copy Markdown
Member

What

Both review-pr skills (the Claude skill under .claude/ and the pi port under .pi/) treated test code as a single catch-all concern: one "Test review & coverage" agent plus passive checklist text. That conflates three distinct questions — does a test exist, does the test actually exercise the change, and is the test well-written — and the latter two reliably got dropped.

This splits test review into three dedicated adversarial reviewers, mirroring the production reviewers, and wires them through the existing depth levels (0-3).

Changes

  • Reviewer 11 — test efficacy: proves each test could fail if the production change regressed — vacuous assertions, tests that never reach the changed code, happy-path-only gaps, racy concurrency harnesses (swallowed AssertionError on spawned threads, Thread.sleep sync), leaky @Before setup.
  • Reviewer 12 — test-code quality: reflection overuse with the non-reflective alternative named, reinvented helpers vs existing fixtures, javadoc bloat, debugging residue — plus the nuance that zero-GC / io.questdb.std rules do not apply to test code.
  • Reviewer 13 — regression-test efficacy: mentally reverts the production hunk and confirms the new test would then fail, catching "regression tests" that pass with the fix removed.
  • Step 2.5e (test surface & helper inventory) grounds reinvented-helper and reflection findings in a real Grep sweep instead of memory.
  • Reviewer 5 narrowed to coverage-only, handing efficacy and quality to 11-13.
  • Step 3b adds test-specific verification passes to filter the characteristic false positives.
  • Levels 0-3 scale the test reviewers with depth, and a "Test code quality" checklist section is added.

The Claude skill is ported to match the pi skill; the two are back in sync.

🤖 Generated with Claude Code

Both review-pr skills (Claude and pi) treated test code as a single
catch-all concern: one "Test review & coverage" agent plus passive
checklist text. In practice that conflates three distinct questions —
does a test exist, does the test actually exercise the change, and is
the test well-written — and the latter two reliably got dropped.

Split test review into three dedicated adversarial reviewers, mirroring
the production reviewers, and wire them through the existing depth levels:

- Agent/Reviewer 11 (test efficacy): proves each test could fail if the
  production change regressed — vacuous assertions, tests that never
  reach the changed code, happy-path-only gaps, racy concurrency
  harnesses (swallowed AssertionError on spawned threads, Thread.sleep
  sync), and leaky @before setup.
- Agent/Reviewer 12 (test-code quality): reflection overuse with the
  non-reflective alternative named, reinvented helpers vs existing
  fixtures, javadoc bloat, debugging residue — and the nuance that
  zero-GC / io.questdb.std rules do not apply to test code.
- Agent/Reviewer 13 (regression-test efficacy): mentally reverts the
  production hunk and confirms the new test would then fail, catching
  "regression tests" that pass with the fix removed.

Supporting changes:
- Step 2.5e (test surface & helper inventory) grounds reinvented-helper
  and reflection findings in a real Grep sweep instead of memory.
- Agent/Reviewer 5 narrowed to coverage-only, handing efficacy and
  quality to 11-13.
- Step 3b adds test-specific verification passes to filter the
  characteristic false positives (an assertion isn't vacuous if
  production recomputes the value; a reflection finding needs a real
  non-reflective path to exist).
- Levels 0-3 scale the test reviewers with depth, and a "Test code
  quality" checklist section is added.

The Claude skill is ported to match the pi skill; the two are back in
sync.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@bluestreak01 bluestreak01 changed the title chore(review-pr): give tests first-class adversarial reviewers chore: give tests first-class adversarial reviewers Jun 25, 2026
@bluestreak01 bluestreak01 changed the title chore: give tests first-class adversarial reviewers chore(build): give tests first-class adversarial reviewers Jun 25, 2026
@bluestreak01 bluestreak01 merged commit bbc0c96 into main Jun 26, 2026
13 of 14 checks passed
@bluestreak01 bluestreak01 deleted the chore/review-pr-test-reviewers branch June 26, 2026 12:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant