chore(build): give tests first-class adversarial reviewers #57
Merged
Both review-pr skills (Claude and pi) treated test code as a single catch-all concern: one "Test review & coverage" agent plus passive checklist text. In practice that conflates three distinct questions (does a test exist, does the test actually exercise the change, and is the test well-written), and the latter two reliably got dropped.

Split test review into three dedicated adversarial reviewers, mirroring the production reviewers, and wire them through the existing depth levels:

- Agent/Reviewer 11 (test efficacy): proves each test could fail if the production change regressed. Targets vacuous assertions, tests that never reach the changed code, happy-path-only gaps, racy concurrency harnesses (swallowed AssertionError on spawned threads, Thread.sleep sync), and leaky @Before setup.
- Agent/Reviewer 12 (test-code quality): reflection overuse with the non-reflective alternative named, reinvented helpers vs existing fixtures, javadoc bloat, debugging residue, and the nuance that zero-GC / io.questdb.std rules do not apply to test code.
- Agent/Reviewer 13 (regression-test efficacy): mentally reverts the production hunk and confirms the new test would then fail, catching "regression tests" that pass with the fix removed.

Supporting changes:

- Step 2.5e (test surface & helper inventory) grounds reinvented-helper and reflection findings in a real Grep sweep instead of memory.
- Agent/Reviewer 5 narrowed to coverage-only, handing efficacy and quality to 11-13.
- Step 3b adds test-specific verification passes to filter the characteristic false positives (an assertion isn't vacuous if production recomputes the value; a reflection finding needs a real non-reflective path to exist).
- Levels 0-3 scale the test reviewers with depth, and a "Test code quality" checklist section is added.

The Claude skill is ported to match the pi skill; the two are back in sync.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Both review-pr skills (the Claude skill under .claude/ and the pi port under .pi/) treated test code as a single catch-all concern: one "Test review & coverage" agent plus passive checklist text. That conflates three distinct questions (does a test exist, does the test actually exercise the change, and is the test well-written), and the latter two reliably got dropped.

This splits test review into three dedicated adversarial reviewers, mirroring the production reviewers, and wires them through the existing depth levels (0-3).
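To make the second question concrete (does the test actually exercise the change), here is a minimal Java sketch. Everything in it is hypothetical and not taken from the skills or from any real codebase: `REVERTED` stands in for production code with the fix removed, `FIXED` for the code with the fix applied, and the two "tests" are reduced to boolean checks so they can be run against either version.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical illustration; none of these names come from the PR.
final class RegressionEfficacyDemo {

    // Stand-in for production code with the fix reverted: negatives leak through.
    static final IntUnaryOperator REVERTED = n -> n;

    // Stand-in for production code with the fix applied: negatives clamp to zero.
    static final IntUnaryOperator FIXED = n -> Math.max(0, n);

    // A "regression test" that only uses an input the bug never affected.
    // It passes with or without the fix, so it proves nothing about the change.
    static boolean vacuousTestPasses(IntUnaryOperator impl) {
        return impl.applyAsInt(5) == 5;
    }

    // A test that feeds the input whose behaviour the fix changed.
    // It passes with the fix and fails once the fix is reverted.
    static boolean effectiveTestPasses(IntUnaryOperator impl) {
        return impl.applyAsInt(-3) == 0;
    }
}
```

Running both checks against `REVERTED` is the mechanical version of mentally reverting the hunk: a test that still passes there does not guard the change.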
Changes
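- Reviewer 11 (test efficacy) covers, among other things, racy concurrency harnesses (swallowed AssertionError on spawned threads, Thread.sleep sync) and leaky @Before setup.
- Reviewer 12 (test-code quality) carries the nuance that zero-GC / io.questdb.std rules do not apply to test code.

The Claude skill is ported to match the pi skill; the two are back in sync.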
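The swallowed AssertionError case is easy to miss in review, so a small Java sketch may help. It assumes a hand-rolled harness; `SpawnedThreadHarness`, `runSwallowing`, and `runPropagating` are hypothetical names for illustration, not helpers from the skills or from QuestDB. The first method shows the failure mode, the second one possible way to surface worker failures on the calling thread, using a latch with a timeout rather than a Thread.sleep.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration; none of these names come from the PR.
final class SpawnedThreadHarness {

    // Anti-pattern: an AssertionError thrown inside the worker only kills the
    // worker thread. The caller returns normally, so the test stays green.
    static void runSwallowing(Runnable check) {
        Thread worker = new Thread(check);
        // Silence the default stack-trace print to keep the demo output quiet.
        worker.setUncaughtExceptionHandler((t, e) -> { });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining worker", e);
        }
    }

    // One way to fix it: capture the worker's failure and rethrow it on the
    // calling thread, waiting on a latch with a timeout instead of sleeping.
    static void runPropagating(Runnable check) {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                check.run();
            } catch (Throwable t) {
                failure.set(t);
            } finally {
                done.countDown();
            }
        });
        worker.start();
        try {
            if (!done.await(10, TimeUnit.SECONDS)) {
                throw new AssertionError("worker did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        Throwable t = failure.get();
        if (t != null) {
            throw new AssertionError("worker thread failed", t);
        }
    }
}
```

With a check that always throws, `runSwallowing` returns normally while `runPropagating` throws on the caller, which is the difference a test-efficacy reviewer would be looking for.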
🤖 Generated with Claude Code