Merged
30 changes: 16 additions & 14 deletions docs/scenarios.md
@@ -152,26 +152,28 @@ requirements (D8), code behavior not traced to any requirement (D9),
and constraint violations in the implementation (D10), with
implementation coverage metrics and specific code locations.

---

## Future Scenarios (Roadmap)

These scenarios describe capabilities that are planned but not yet
implemented. See the [roadmap](roadmap.md) for details.

### "Do our tests actually test what the plan says they should?"

Your validation plan specifies 58 test cases. Your test suite has
tests. But are they the same tests? Do the assertions match the
acceptance criteria?
acceptance criteria? Are there test cases in the plan that have no
automated test at all?

**Planned template:** `audit-test-compliance` ·
**Taxonomy:** `specification-drift` (D11–D13)
**Template:** `audit-test-compliance` · **Persona:** `specification-analyst` ·
**Protocol:** `test-compliance-audit` · **Taxonomy:** `specification-drift` (D11–D13)

**What you'd get:** A report mapping validation plan test cases to
actual test implementations, identifying unimplemented test cases,
tests with wrong assertions, and coverage gaps between the plan and
reality.
**What you get:** An investigation report mapping validation plan test
cases to actual test implementations, identifying unimplemented test
cases (D11), missing acceptance criterion assertions (D12), and
assertion mismatches where the test checks different conditions than
the plan specifies (D13).

---

## Future Scenarios (Roadmap)

These scenarios describe capabilities that are planned but not yet
implemented. See the [roadmap](roadmap.md) for details.

### "Extract the invariants from this RFC"

20 changes: 20 additions & 0 deletions manifest.yaml
@@ -160,6 +160,14 @@ protocols:
and classifies findings using the specification-drift taxonomy
(D8–D10).

- name: test-compliance-audit
path: protocols/reasoning/test-compliance-audit.md
description: >
Systematic protocol for auditing test code against a validation
plan and requirements document. Maps test case definitions to
test implementations and classifies findings using the
specification-drift taxonomy (D11–D13).

formats:
- name: requirements-doc
path: formats/requirements-doc.md
@@ -340,6 +348,18 @@ templates:
format: investigation-report
requires: requirements-document

- name: audit-test-compliance
path: templates/audit-test-compliance.md
description: >
Audit test code against a validation plan and requirements
document. Detects unimplemented test cases, missing acceptance
criterion assertions, and assertion mismatches.
persona: specification-analyst
protocols: [anti-hallucination, self-verification, operational-constraints, test-compliance-audit]
taxonomies: [specification-drift]
format: investigation-report
requires: [requirements-document, validation-plan]

investigation:
- name: investigate-bug
path: templates/investigate-bug.md
175 changes: 175 additions & 0 deletions protocols/reasoning/test-compliance-audit.md
@@ -0,0 +1,175 @@
<!-- SPDX-License-Identifier: MIT -->
<!-- Copyright (c) PromptKit Contributors -->

---
name: test-compliance-audit
type: reasoning
description: >
Systematic protocol for auditing test code against a validation plan
and requirements document. Maps test case definitions to test
implementations, verifies assertions match acceptance criteria, and
classifies findings using the specification-drift taxonomy (D11–D13).
applicable_to:
- audit-test-compliance
---

# Protocol: Test Compliance Audit

Apply this protocol when auditing test code against a validation plan
and requirements document to determine whether the automated tests
implement what the validation plan specifies. The goal is to find every
gap between planned and actual test coverage — missing tests,
incomplete assertions, and mismatched expectations.

## Phase 1: Validation Plan Inventory

Extract the complete set of test case definitions from the validation
plan.

1. **Test cases** — for each TC-NNN, extract:
- The test case ID and title
- The linked requirement(s) (REQ-XXX-NNN)
- The test steps (inputs, actions, sequence)
- The expected results and pass/fail criteria
- The test level (unit, integration, system, etc.)
- Any preconditions or environmental assumptions

2. **Requirements cross-reference** — for each linked REQ-ID, look up
its acceptance criteria in the requirements document. These are the
ground truth for what the test should verify.

3. **Test scope classification** — classify each test case as:
- **Automatable**: Can be implemented as an automated test
- **Manual-only**: Requires human judgment, physical interaction,
or platform-specific behavior that cannot be automated
- **Deferred**: Explicitly marked as not-yet-implemented in the
validation plan
Restrict the audit to automatable test cases. Report manual-only
and deferred counts in the coverage summary.
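The inventory this phase produces can be sketched as a small data structure. This is a minimal illustration only — the field names and the example records are assumptions of this sketch, not prescribed by the protocol:

```python
from dataclasses import dataclass

# Illustrative record for one validation-plan test case.
# Field names are assumptions for this sketch, not part of the protocol.
@dataclass
class TestCase:
    tc_id: str                   # e.g. "TC-042"
    title: str
    requirements: list           # linked REQ-XXX-NNN IDs
    steps: list                  # inputs, actions, sequence
    expected_results: list
    level: str                   # "unit", "integration", "system", ...
    scope: str = "automatable"   # or "manual-only" / "deferred"

plan = [
    TestCase("TC-001", "Login rejects bad password",
             ["REQ-AUTH-003"], ["POST /login with wrong password"],
             ["response code is 403"], "integration"),
    TestCase("TC-002", "Manual UX review", ["REQ-UI-001"], [], [],
             "system", scope="manual-only"),
]

# Restrict the audit to automatable cases (step 3); report the rest
# in the coverage summary.
audited = [tc for tc in plan if tc.scope == "automatable"]
excluded = len(plan) - len(audited)
print(len(audited), excluded)
```

Keeping scope as an explicit field makes the Phase 6 manual/deferred count a one-line aggregation later.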

## Phase 2: Test Code Inventory

Survey the test code to understand its structure.

1. **Test organization**: Identify the test framework (e.g., pytest,
   JUnit, Rust's `#[test]` attribute, Jest), test file structure, and
   naming conventions.
2. **Test function catalog**: List all test functions/methods with
their names, locations (file, line), and any identifying markers
(TC-NNN in name or comment, requirement references).
3. **Test helpers and fixtures**: Identify shared setup, teardown,
mocking, and assertion utilities — these affect what individual
tests can verify.

Do NOT attempt to understand every test's implementation in detail.
Build the catalog first, then trace specific tests in Phase 3.
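A lightweight catalog pass might look like the following sketch. It assumes pytest-style `def test_*` naming and TC/REQ markers in comments — both are assumptions for illustration; other frameworks need different patterns:

```python
import re

# Regexes assume pytest-style naming and TC-NNN / REQ-XXX-NNN markers
# in names or comments; adapt for JUnit, Jest, etc.
TEST_DEF = re.compile(r"^def (test_\w+)", re.MULTILINE)
MARKER = re.compile(r"\b(TC-\d{3}|REQ-[A-Z]+-\d{3})\b")

def catalog(source, path):
    entries = []
    for m in TEST_DEF.finditer(source):
        line = source.count("\n", 0, m.start()) + 1
        # Take the function body as the text up to the next test def.
        nxt = TEST_DEF.search(source, m.end())
        body = source[m.start():nxt.start() if nxt else len(source)]
        entries.append({"name": m.group(1), "file": path, "line": line,
                        "markers": sorted(set(MARKER.findall(body)))})
    return entries

src = '''
def test_login_rejects_bad_password():
    # TC-001 / REQ-AUTH-003
    assert login("u", "wrong").status == 403

def test_misc_regression():
    assert parse("") == []
'''
cat = catalog(src, "tests/test_auth.py")
print([e["name"] for e in cat])
print(cat[0]["markers"])
```

The catalog deliberately records only names, locations, and markers — detailed tracing is deferred to Phase 3, as the text above directs.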

## Phase 3: Forward Traceability (Validation Plan → Test Code)

For each automatable test case in the validation plan:

1. **Find the implementing test**: Search the test code for a test
function that implements TC-NNN. Match by:
- Explicit TC-NNN reference in test name or comments
- Behavioral equivalence (test steps and assertions match the
validation plan's specification, even without an ID reference)
- Requirement reference (test references the same REQ-ID)

2. **Assess implementation completeness**: For each matched test:

a. **Step coverage**: Does the test execute the steps described in
the validation plan? Are inputs, actions, and sequences present?

b. **Assertion coverage**: Does the test assert the expected results
from the validation plan? Check each expected result individually.

c. **Acceptance criteria alignment**: Cross-reference the linked
requirement's acceptance criteria. Does the test verify ALL
criteria, or only a subset? Flag missing criteria as
D12_UNTESTED_ACCEPTANCE_CRITERION.

d. **Assertion correctness**: Do the test's assertions match the
expected behavior? Check for:
- Wrong thresholds (plan says 200ms, test checks for non-null)
- Wrong error codes (plan says 403, test checks not-200)
- Missing negative assertions (plan says "MUST NOT", test only
checks positive path)
- Structural assertions that don't verify semantics (checking
"response exists" instead of "response contains expected data")
Flag mismatches as D13_ASSERTION_MISMATCH.

3. **Classify the result**:
- **IMPLEMENTED**: Test fully implements the validation plan's
test case with correct assertions. Record the test location.
- **PARTIALLY IMPLEMENTED**: Test exists but is incomplete.
Classify based on *what* is missing:
- Missing acceptance criteria assertions →
D12_UNTESTED_ACCEPTANCE_CRITERION
- Wrong assertions or mismatched expected results →
D13_ASSERTION_MISMATCH
- **NOT IMPLEMENTED**: No test implements this test case (no
matching test function found in the provided code). Flag as
D11_UNIMPLEMENTED_TEST_CASE. Note: a test stub with an empty
body or skip annotation is NOT an implementation — classify it
as D13 (assertions don't match because there are none) and
record its code location.
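The classification rules above can be sketched as a decision function. The labels follow the protocol; the inputs and the tie-break (reporting D13 when both D12 and D13 conditions hold) are simplifying assumptions of this sketch:

```python
# Classify one validation-plan test case against its matched test.
# matched_test is None when no implementing test was found.
def classify(tc_id, matched_test, missing_criteria, assertion_mismatches):
    if matched_test is None:
        # NOT IMPLEMENTED: no matching test function found.
        return "D11_UNIMPLEMENTED_TEST_CASE"
    if matched_test.get("is_stub"):
        # A stub or skipped test is not an implementation; per the
        # protocol it is classified as D13 (no matching assertions).
        return "D13_ASSERTION_MISMATCH"
    if assertion_mismatches:
        # PARTIALLY IMPLEMENTED: wrong thresholds, codes, or semantics.
        # If D12 conditions also hold, this sketch reports the
        # higher-risk D13 label.
        return "D13_ASSERTION_MISMATCH"
    if missing_criteria:
        # PARTIALLY IMPLEMENTED: acceptance criteria not asserted.
        return "D12_UNTESTED_ACCEPTANCE_CRITERION"
    return "IMPLEMENTED"

print(classify("TC-001", None, [], []))
print(classify("TC-002", {"name": "test_stub", "is_stub": True}, [], []))
print(classify("TC-003", {"name": "test_ok"}, ["AC-2"], []))
```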

## Phase 4: Backward Traceability (Test Code → Validation Plan)

Identify tests that don't trace to the validation plan.

1. **For each test function** in the test code, determine whether it
maps to a TC-NNN in the validation plan.

2. **Classify unmatched tests**:
- **Regression tests**: Tests added for specific bugs, not part of
the validation plan. These are expected and not findings.
- **Exploratory tests**: Tests that cover scenarios not in the
validation plan. Note these but do not flag as drift — they may
indicate validation plan gaps (candidates for new test cases).
- **Orphaned tests**: Tests that reference TC-NNN IDs or REQ-IDs
that do not exist in the validation plan or requirements. These
may be stale after a renumbering. Report orphaned tests as
observations in the coverage summary (Phase 6), not as D11–D13
findings — they don't fit the taxonomy since no valid TC-NNN
is involved.
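A minimal sketch of this backward classification, assuming the Phase 1 inventory supplies the valid ID sets and regression status comes from commit history or test markers (both assumptions of this sketch):

```python
# Classify a test function with no confirmed TC mapping.
# plan_ids / req_ids are the valid identifiers from Phase 1.
def classify_unmatched(markers, plan_ids, req_ids, is_regression):
    stale = [m for m in markers if m not in plan_ids and m not in req_ids]
    if stale:
        # References IDs that no longer exist — report as an
        # observation in the coverage summary, not a D11-D13 finding.
        return "orphaned"
    if is_regression:
        return "regression"    # expected, not a finding
    # Covers a scenario the plan lacks — a candidate new test case.
    return "exploratory"

plan_ids = {"TC-001"}
req_ids = {"REQ-AUTH-003"}
print(classify_unmatched(["TC-999"], plan_ids, req_ids, False))
print(classify_unmatched([], plan_ids, req_ids, True))
print(classify_unmatched([], plan_ids, req_ids, False))
```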

## Phase 5: Classification and Reporting

Classify every finding using the specification-drift taxonomy.

1. Assign exactly one drift label (D11, D12, or D13) to each finding.
2. Assign severity using the taxonomy's severity guidance.
3. For each finding, provide:
- The drift label and short title
- The validation plan location (TC-NNN, section) and test code
location (file, function, line). For D11 findings, the test code
location is "None — no implementing test found" with a description
of what was searched.
- The linked requirement and its acceptance criteria
- Evidence: what the validation plan specifies and what the test
does (or doesn't)
- Impact: what could go wrong
- Recommended resolution
4. Order findings primarily by severity, then by taxonomy ranking
within each severity tier.
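The two-level ordering in step 4 can be sketched as a composite sort key. The tier numbers mirror the taxonomy's ranking criteria for D11–D13; the findings themselves are illustrative:

```python
# Order findings by severity first, then by taxonomy risk tier.
SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}
TAXONOMY_TIER = {"D13": 1, "D12": 2, "D11": 3}  # lower tier = higher risk

findings = [
    {"label": "D11", "tc": "TC-004", "severity": "medium"},
    {"label": "D13", "tc": "TC-001", "severity": "high"},
    {"label": "D12", "tc": "TC-002", "severity": "high"},
]
findings.sort(key=lambda f: (SEVERITY_ORDER[f["severity"]],
                             TAXONOMY_TIER[f["label"]]))
print([f["label"] for f in findings])
```

High-severity D13 and D12 findings surface first; the medium-severity D11 sorts last even though D11 precedes D13 numerically.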

## Phase 6: Coverage Summary

After reporting individual findings, produce aggregate metrics:

1. **Test implementation rate**: automatable test cases with
implementing tests / total automatable test cases.
2. **Assertion coverage**: test cases with complete assertion
coverage / total implemented test cases.
3. **Acceptance criteria coverage**: individual acceptance criteria
verified by test assertions / total acceptance criteria across
all linked requirements.
4. **Manual/deferred test count**: count of test cases classified as
manual-only or deferred (excluded from the audit).
5. **Unmatched test count**: count of test functions in the test code
with no corresponding TC-NNN in the validation plan (regression,
exploratory, or orphaned).
6. **Overall assessment**: a summary judgment of test compliance
(e.g., "High compliance — 2 missing tests" or "Low compliance —
systemic assertion gaps across the test suite").
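The metrics reduce to simple ratios over the Phase 1–4 counts. A sketch with illustrative numbers (all counts here are invented for the example):

```python
# Aggregate coverage metrics; all counts are illustrative.
automatable, implemented = 20, 18
complete_assertions = 15          # implemented cases with full assertions
criteria_verified, criteria_total = 52, 60
manual_or_deferred, unmatched = 4, 7

print(f"implementation rate: {implemented / automatable:.0%}")
print(f"assertion coverage:  {complete_assertions / implemented:.0%}")
print(f"criteria coverage:   {criteria_verified / criteria_total:.0%}")
print(f"excluded (manual/deferred): {manual_or_deferred}")
print(f"unmatched tests (regression/exploratory/orphaned): {unmatched}")
```

Note the different denominators: implementation rate is over automatable cases only, assertion coverage is over implemented cases, and criteria coverage counts individual acceptance criteria, not test cases.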
88 changes: 73 additions & 15 deletions taxonomies/specification-drift.md
@@ -13,6 +13,7 @@ domain: specification-traceability
applicable_to:
- audit-traceability
- audit-code-compliance
- audit-test-compliance
---

# Taxonomy: Specification Drift
@@ -200,32 +201,89 @@ safety-critical, security-related, or regulatory. High for performance
or functional constraints. Assess based on the constraint itself,
not the code's complexity.

## Reserved Labels (Future Use)
## Test Compliance Labels

The following label range is reserved for future specification drift
categories involving test code:
### D11_UNIMPLEMENTED_TEST_CASE

- **D11–D13**: Reserved for **test compliance** drift (validation plan
vs. test code). Example: D11_UNIMPLEMENTED_TEST_CASE — a test case in
the validation plan has no corresponding automated test.
A test case is defined in the validation plan but has no corresponding
automated test in the test code.

These labels will be defined when the `audit-test-compliance` template
is added to the library.
**Pattern**: TC-NNN is specified in the validation plan with steps,
inputs, and expected results. No test function, test class, or test
file in the test code implements this test case — either by name
reference, by TC-NNN identifier, or by behavioral equivalence.

**Risk**: The validation plan claims coverage that does not exist in
the automated test suite. The requirement linked to this test case
is effectively untested in CI, even though the validation plan says
it is covered.

**Severity guidance**: High when the linked requirement is
safety-critical or security-related. Medium for functional
requirements. Note: test cases classified as manual-only or deferred
in the validation plan are excluded from D11 findings and reported
only in the coverage summary.

### D12_UNTESTED_ACCEPTANCE_CRITERION

A test implementation exists for a test case, but it does not assert
one or more acceptance criteria specified for the linked requirement.

**Pattern**: TC-NNN is implemented as an automated test. The linked
requirement (REQ-XXX-NNN) has multiple acceptance criteria. The test
implementation asserts some criteria but omits others — for example,
it checks the happy-path output but does not verify error handling,
boundary conditions, or timing constraints specified in the acceptance
criteria.

**Risk**: The test passes but does not verify the full requirement.
Defects in the untested acceptance criteria will not be caught by CI.
This is the test-code equivalent of D7 (acceptance criteria mismatch
in the validation plan) but at the implementation level.

**Severity guidance**: High when the missing criterion is a security
or safety property. Medium for functional criteria. Assess based on
what the missing criterion protects, not on the test's overall
coverage.

### D13_ASSERTION_MISMATCH

A test implementation exists for a test case, but its assertions do
not match the expected behavior specified in the validation plan.

**Pattern**: TC-NNN is implemented as an automated test. The test
asserts different conditions, thresholds, or outcomes than what the
validation plan specifies — for example, the plan says "verify
response within 200ms" but the test asserts "response is not null",
or the plan says "verify error code 403" but the test asserts "status
is not 200".

**Risk**: The test passes but does not verify what the validation plan
says it should. This creates illusory coverage — the traceability
matrix shows the requirement as tested, but the actual test checks
something different. More dangerous than D11 (missing test) because
it is invisible without comparing test code to the validation plan.

**Severity guidance**: High. This is the most dangerous test
compliance drift type because it creates false confidence. Severity
should be assessed based on the gap between what is asserted and what
should be asserted.

## Ranking Criteria

Within a given severity level, order findings by impact on specification
integrity:

1. **Highest risk**: D6 (constraint violation in design), D7 (illusory
test coverage), and D10 (constraint violation in code) — these
indicate active conflicts between artifacts.
2. **High risk**: D2 (untested requirement), D5 (assumption drift), and
D8 (unimplemented requirement) — these indicate silent gaps that
will surface late.
test coverage), D10 (constraint violation in code), and D13
(assertion mismatch) — these indicate active conflicts between
artifacts.
2. **High risk**: D2 (untested requirement), D5 (assumption drift),
D8 (unimplemented requirement), and D12 (untested acceptance
criterion) — these indicate silent gaps that will surface late.
3. **Medium risk**: D1 (untraced requirement), D3 (orphaned design),
and D9 (undocumented behavior) — these indicate incomplete
traceability that needs human resolution.
D9 (undocumented behavior), and D11 (unimplemented test case) —
these indicate incomplete traceability that needs human resolution.
4. **Lowest risk**: D4 (orphaned test case) — effort misdirection but
no safety or correctness impact.
