From 7e54903dd753ce77f44429679516322217b63c2a Mon Sep 17 00:00:00 2001 From: Alan Jowett Date: Fri, 20 Mar 2026 09:35:24 -0700 Subject: [PATCH 1/3] Add test compliance audit template (spec+validation -> test code) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add a new template that audits test code against a validation plan and requirements document for test compliance drift. Detects unimplemented test cases (D11), missing acceptance criterion assertions (D12), and assertion mismatches (D13). New components: - Protocol: test-compliance-audit — 6-phase methodology (validation plan inventory, test code inventory, forward/backward traceability, classification, coverage summary) - Template: audit-test-compliance — consumes requirements + validation plan + test code, produces investigation-report Extended components: - Taxonomy: specification-drift — D11-D13 labels defined (previously reserved). Ranking criteria updated. No more reserved labels. - Scenarios gallery updated (moved from future to existing) This completes the audit trifecta: - audit-traceability: doc <-> doc <-> doc (D1-D7) - audit-code-compliance: spec <-> code (D8-D10) - audit-test-compliance: spec <-> test code (D11-D13) Closes #38 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- docs/scenarios.md | 30 ++-- manifest.yaml | 20 +++ protocols/reasoning/test-compliance-audit.md | 166 +++++++++++++++++++ taxonomies/specification-drift.md | 87 ++++++++-- templates/audit-test-compliance.md | 130 +++++++++++++++ 5 files changed, 404 insertions(+), 29 deletions(-) create mode 100644 protocols/reasoning/test-compliance-audit.md create mode 100644 templates/audit-test-compliance.md diff --git a/docs/scenarios.md b/docs/scenarios.md index 448eb3a..34443d3 100644 --- a/docs/scenarios.md +++ b/docs/scenarios.md @@ -152,26 +152,28 @@ requirements (D8), code behavior not traced to any requirement (D9), and constraint violations in the implementation 
(D10), with implementation coverage metrics and specific code locations. ---- - -## Future Scenarios (Roadmap) - -These scenarios describe capabilities that are planned but not yet -implemented. See the [roadmap](roadmap.md) for details. - ### "Do our tests actually test what the plan says they should?" Your validation plan specifies 58 test cases. Your test suite has tests. But are they the same tests? Do the assertions match the -acceptance criteria? +acceptance criteria? Are there test cases in the plan that have no +automated test at all? -**Planned template:** `audit-test-compliance` · -**Taxonomy:** `specification-drift` (D11–D13) +**Template:** `audit-test-compliance` · **Persona:** `specification-analyst` · +**Protocol:** `test-compliance-audit` · **Taxonomy:** `specification-drift` (D11–D13) -**What you'd get:** A report mapping validation plan test cases to -actual test implementations, identifying unimplemented test cases, -tests with wrong assertions, and coverage gaps between the plan and -reality. +**What you get:** An investigation report mapping validation plan test +cases to actual test implementations, identifying unimplemented test +cases (D11), missing acceptance criterion assertions (D12), and +assertion mismatches where the test checks different conditions than +the plan specifies (D13). + +--- + +## Future Scenarios (Roadmap) + +These scenarios describe capabilities that are planned but not yet +implemented. See the [roadmap](roadmap.md) for details. ### "Extract the invariants from this RFC" diff --git a/manifest.yaml b/manifest.yaml index a81d89c..2fc8580 100644 --- a/manifest.yaml +++ b/manifest.yaml @@ -160,6 +160,14 @@ protocols: and classifies findings using the specification-drift taxonomy (D8–D10). + - name: test-compliance-audit + path: protocols/reasoning/test-compliance-audit.md + description: > + Systematic protocol for auditing test code against a validation + plan and requirements document. 
Maps test case definitions to + test implementations and classifies findings using the + specification-drift taxonomy (D11–D13). + formats: - name: requirements-doc path: formats/requirements-doc.md @@ -340,6 +348,18 @@ templates: format: investigation-report requires: requirements-document + - name: audit-test-compliance + path: templates/audit-test-compliance.md + description: > + Audit test code against a validation plan and requirements + document. Detects unimplemented test cases, missing acceptance + criterion assertions, and assertion mismatches. + persona: specification-analyst + protocols: [anti-hallucination, self-verification, operational-constraints, test-compliance-audit] + taxonomies: [specification-drift] + format: investigation-report + requires: [requirements-document, validation-plan] + investigation: - name: investigate-bug path: templates/investigate-bug.md diff --git a/protocols/reasoning/test-compliance-audit.md b/protocols/reasoning/test-compliance-audit.md new file mode 100644 index 0000000..685700a --- /dev/null +++ b/protocols/reasoning/test-compliance-audit.md @@ -0,0 +1,166 @@ + + + +--- +name: test-compliance-audit +type: reasoning +description: > + Systematic protocol for auditing test code against a validation plan + and requirements document. Maps test case definitions to test + implementations, verifies assertions match acceptance criteria, and + classifies findings using the specification-drift taxonomy (D11–D13). +applicable_to: + - audit-test-compliance +--- + +# Protocol: Test Compliance Audit + +Apply this protocol when auditing test code against a validation plan +and requirements document to determine whether the automated tests +implement what the validation plan specifies. The goal is to find every +gap between planned and actual test coverage — missing tests, +incomplete assertions, and mismatched expectations. 
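The ID-based part of this gap-finding can be sketched mechanically. Below is a minimal illustration, not part of the protocol itself: the function name, the `TC-NNN` zero-padded ID convention, and the test naming style are all hypothetical, and the protocol's behavioral-equivalence matching (Phase 3) is deliberately not attempted here.

```python
import re

def find_unimplemented(plan_tc_ids, test_names):
    """Return plan test-case IDs with no test referencing them (D11 candidates).

    plan_tc_ids: IDs like "TC-001" from the validation plan (assumed
        zero-padded to three digits).
    test_names: test function names, e.g. "test_tc_001_happy_path".
    """
    implemented = set()
    for name in test_names:
        # Normalize "tc_001" / "TC-001" style references found in test names.
        for m in re.finditer(r"tc[-_]?(\d+)", name, re.IGNORECASE):
            implemented.add(f"TC-{int(m.group(1)):03d}")
    return [tc for tc in plan_tc_ids if tc not in implemented]

print(find_unimplemented(
    ["TC-001", "TC-002", "TC-003"],
    ["test_tc_001_happy_path", "test_TC-003_error_code"],
))  # → ['TC-002']
```

An ID scan like this only finds explicit references; a test case implemented without any `TC-NNN` marker still requires the manual tracing the phases below describe.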
+ +## Phase 1: Validation Plan Inventory + +Extract the complete set of test case definitions from the validation +plan. + +1. **Test cases** — for each TC-NNN, extract: + - The test case ID and title + - The linked requirement(s) (REQ-XXX-NNN) + - The test steps (inputs, actions, sequence) + - The expected results and pass/fail criteria + - The test level (unit, integration, system, etc.) + - Any preconditions or environmental assumptions + +2. **Requirements cross-reference** — for each linked REQ-ID, look up + its acceptance criteria in the requirements document. These are the + ground truth for what the test should verify. + +3. **Test scope classification** — classify each test case as: + - **Automatable**: Can be implemented as an automated test + - **Manual-only**: Requires human judgment, physical interaction, + or platform-specific behavior that cannot be automated + - **Deferred**: Explicitly marked as not-yet-implemented in the + validation plan + Restrict the audit to automatable test cases. Report manual-only + and deferred counts in the coverage summary. + +## Phase 2: Test Code Inventory + +Survey the test code to understand its structure. + +1. **Test organization**: Identify the test framework (e.g., pytest, + JUnit, Rust #[test], Jest), test file structure, and naming + conventions. +2. **Test function catalog**: List all test functions/methods with + their names, locations (file, line), and any identifying markers + (TC-NNN in name or comment, requirement references). +3. **Test helpers and fixtures**: Identify shared setup, teardown, + mocking, and assertion utilities — these affect what individual + tests can verify. + +Do NOT attempt to understand every test's implementation in detail. +Build the catalog first, then trace specific tests in Phase 3. + +## Phase 3: Forward Traceability (Validation Plan → Test Code) + +For each automatable test case in the validation plan: + +1. 
**Find the implementing test**: Search the test code for a test + function that implements TC-NNN. Match by: + - Explicit TC-NNN reference in test name or comments + - Behavioral equivalence (test steps and assertions match the + validation plan's specification, even without an ID reference) + - Requirement reference (test references the same REQ-ID) + +2. **Assess implementation completeness**: For each matched test: + + a. **Step coverage**: Does the test execute the steps described in + the validation plan? Are inputs, actions, and sequences present? + + b. **Assertion coverage**: Does the test assert the expected results + from the validation plan? Check each expected result individually. + + c. **Acceptance criteria alignment**: Cross-reference the linked + requirement's acceptance criteria. Does the test verify ALL + criteria, or only a subset? Flag missing criteria as + D12_UNTESTED_ACCEPTANCE_CRITERION. + + d. **Assertion correctness**: Do the test's assertions match the + expected behavior? Check for: + - Wrong thresholds (plan says 200ms, test checks for non-null) + - Wrong error codes (plan says 403, test checks not-200) + - Missing negative assertions (plan says "MUST NOT", test only + checks positive path) + - Structural assertions that don't verify semantics (checking + "response exists" instead of "response contains expected data") + Flag mismatches as D13_ASSERTION_MISMATCH. + +3. **Classify the result**: + - **IMPLEMENTED**: Test fully implements the validation plan's + test case with correct assertions. Record the test location. + - **PARTIALLY IMPLEMENTED**: Test exists but is missing steps, + assertions, or acceptance criteria. Flag as D12 with details on + what is present and what is missing. + - **NOT IMPLEMENTED**: No test implements this test case. Flag as + D11_UNIMPLEMENTED_TEST_CASE. + +## Phase 4: Backward Traceability (Test Code → Validation Plan) + +Identify tests that don't trace to the validation plan. + +1. 
**For each test function** in the test code, determine whether it + maps to a TC-NNN in the validation plan. + +2. **Classify unmatched tests**: + - **Regression tests**: Tests added for specific bugs, not part of + the validation plan. These are expected and not findings. + - **Exploratory tests**: Tests that cover scenarios not in the + validation plan. Note these but do not flag as drift — they may + indicate validation plan gaps (candidates for new test cases). + - **Orphaned tests**: Tests that reference TC-NNN IDs or REQ-IDs + that do not exist in the validation plan or requirements. These + may be stale after a renumbering. Flag if they reference invalid + identifiers. + +## Phase 5: Classification and Reporting + +Classify every finding using the specification-drift taxonomy. + +1. Assign exactly one drift label (D11, D12, or D13) to each finding. +2. Assign severity using the taxonomy's severity guidance. +3. For each finding, provide: + - The drift label and short title + - The validation plan location (TC-NNN, section) and test code + location (file, function, line). For D11 findings, the test code + location is "None — no implementing test found" with a description + of what was searched. + - The linked requirement and its acceptance criteria + - Evidence: what the validation plan specifies and what the test + does (or doesn't) + - Impact: what could go wrong + - Recommended resolution +4. Order findings primarily by severity, then by taxonomy ranking + within each severity tier. + +## Phase 6: Coverage Summary + +After reporting individual findings, produce aggregate metrics: + +1. **Test implementation rate**: automatable test cases with + implementing tests / total automatable test cases. +2. **Assertion coverage**: test cases with complete assertion + coverage / total implemented test cases. +3. **Acceptance criteria coverage**: individual acceptance criteria + verified by test assertions / total acceptance criteria across + all linked requirements. 
+4. **Manual/deferred test count**: count of test cases classified as + manual-only or deferred (excluded from the audit). +5. **Unmatched test count**: count of test functions in the test code + with no corresponding TC-NNN in the validation plan (regression, + exploratory, or orphaned). +6. **Overall assessment**: a summary judgment of test compliance + (e.g., "High compliance — 2 missing tests" or "Low compliance — + systemic assertion gaps across the test suite"). diff --git a/taxonomies/specification-drift.md b/taxonomies/specification-drift.md index ea223e9..3fa8266 100644 --- a/taxonomies/specification-drift.md +++ b/taxonomies/specification-drift.md @@ -13,6 +13,7 @@ domain: specification-traceability applicable_to: - audit-traceability - audit-code-compliance + - audit-test-compliance --- # Taxonomy: Specification Drift @@ -200,17 +201,72 @@ safety-critical, security-related, or regulatory. High for performance or functional constraints. Assess based on the constraint itself, not the code's complexity. -## Reserved Labels (Future Use) +## Test Compliance Labels -The following label range is reserved for future specification drift -categories involving test code: +### D11_UNIMPLEMENTED_TEST_CASE -- **D11–D13**: Reserved for **test compliance** drift (validation plan - vs. test code). Example: D11_UNIMPLEMENTED_TEST_CASE — a test case in - the validation plan has no corresponding automated test. +A test case is defined in the validation plan but has no corresponding +automated test in the test code. -These labels will be defined when the `audit-test-compliance` template -is added to the library. +**Pattern**: TC-NNN is specified in the validation plan with steps, +inputs, and expected results. No test function, test class, or test +file in the test code implements this test case — either by name +reference, by TC-NNN identifier, or by behavioral equivalence. + +**Risk**: The validation plan claims coverage that does not exist in +the automated test suite. 
The requirement linked to this test case +is effectively untested in CI, even though the validation plan says +it is covered. + +**Severity guidance**: High when the linked requirement is +safety-critical or security-related. Medium for functional +requirements. Low for non-functional or exploratory test cases +explicitly marked as manual-only in the validation plan. + +### D12_UNTESTED_ACCEPTANCE_CRITERION + +A test implementation exists for a test case, but it does not assert +one or more acceptance criteria specified for the linked requirement. + +**Pattern**: TC-NNN is implemented as an automated test. The linked +requirement (REQ-XXX-NNN) has multiple acceptance criteria. The test +implementation asserts some criteria but omits others — for example, +it checks the happy-path output but does not verify error handling, +boundary conditions, or timing constraints specified in the acceptance +criteria. + +**Risk**: The test passes but does not verify the full requirement. +Defects in the untested acceptance criteria will not be caught by CI. +This is the test-code equivalent of D7 (acceptance criteria mismatch +in the validation plan) but at the implementation level. + +**Severity guidance**: High when the missing criterion is a security +or safety property. Medium for functional criteria. Assess based on +what the missing criterion protects, not on the test's overall +coverage. + +### D13_ASSERTION_MISMATCH + +A test implementation exists for a test case, but its assertions do +not match the expected behavior specified in the validation plan. + +**Pattern**: TC-NNN is implemented as an automated test. The test +asserts different conditions, thresholds, or outcomes than what the +validation plan specifies — for example, the plan says "verify +response within 200ms" but the test asserts "response is not null", +or the plan says "verify error code 403" but the test asserts "status +is not 200". 
+ +**Risk**: The test passes but does not verify what the validation plan +says it should. This creates illusory coverage — the traceability +matrix shows the requirement as tested, but the actual test checks +something different. More dangerous than D11 (missing test) because +it is invisible without comparing test code to the validation plan. + +**Severity guidance**: High. This is the most dangerous test +compliance drift type because it creates false confidence. Severity +should be assessed based on the gap between what is asserted and what +should be asserted. ## Ranking Criteria @@ -218,14 +274,15 @@ Within a given severity level, order findings by impact on specification integrity: 1. **Highest risk**: D6 (constraint violation in design), D7 (illusory - test coverage), and D10 (constraint violation in code) — these - indicate active conflicts between artifacts. -2. **High risk**: D2 (untested requirement), D5 (assumption drift), and - D8 (unimplemented requirement) — these indicate silent gaps that - will surface late. + test coverage), D10 (constraint violation in code), and D13 + (assertion mismatch) — these indicate active conflicts between + artifacts. +2. **High risk**: D2 (untested requirement), D5 (assumption drift), + D8 (unimplemented requirement), and D12 (untested acceptance + criterion) — these indicate silent gaps that will surface late. 3. **Medium risk**: D1 (untraced requirement), D3 (orphaned design), - and D9 (undocumented behavior) — these indicate incomplete - traceability that needs human resolution. + D9 (undocumented behavior), and D11 (unimplemented test case) — + these indicate incomplete traceability that needs human resolution. 4. **Lowest risk**: D4 (orphaned test case) — effort misdirection but no safety or correctness impact. 
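The two-level ordering above (severity first, taxonomy tier within a severity) maps directly onto a sort key. A minimal sketch, with illustrative finding records, encoding the tiers exactly as listed:

```python
SEVERITY_ORDER = {"High": 0, "Medium": 1, "Low": 2}

# Risk tiers from the ranking criteria above (lower number = higher risk).
RISK_TIER = {
    "D6": 1, "D7": 1, "D10": 1, "D13": 1,
    "D2": 2, "D5": 2, "D8": 2, "D12": 2,
    "D1": 3, "D3": 3, "D9": 3, "D11": 3,
    "D4": 4,
}

def order_findings(findings):
    """Sort findings by severity, then by taxonomy risk tier within each severity."""
    return sorted(
        findings,
        key=lambda f: (SEVERITY_ORDER[f["severity"]], RISK_TIER[f["label"]]),
    )

findings = [
    {"id": "F-003", "label": "D11", "severity": "Medium"},
    {"id": "F-001", "label": "D13", "severity": "High"},
    {"id": "F-002", "label": "D12", "severity": "High"},
]
# D13 outranks D12 within High severity; Medium D11 comes last:
print([f["id"] for f in order_findings(findings)])  # → ['F-001', 'F-002', 'F-003']
```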
diff --git a/templates/audit-test-compliance.md b/templates/audit-test-compliance.md new file mode 100644 index 0000000..847f821 --- /dev/null +++ b/templates/audit-test-compliance.md @@ -0,0 +1,130 @@ + + + +--- +name: audit-test-compliance +description: > + Audit test code against a validation plan and requirements document. + Detects unimplemented test cases, missing acceptance criterion + assertions, and assertion mismatches. Classifies findings using the + specification-drift taxonomy (D11–D13). +persona: specification-analyst +protocols: + - guardrails/anti-hallucination + - guardrails/self-verification + - guardrails/operational-constraints + - reasoning/test-compliance-audit +taxonomies: + - specification-drift +format: investigation-report +params: + project_name: "Name of the project or feature being audited" + requirements_doc: "The requirements document content" + validation_plan: "The validation plan content (with TC-NNN test case definitions)" + test_code: "Test source code to audit — test files, test modules, or test directory contents" + focus_areas: "Optional narrowing — e.g., 'security test cases only', 'TC-001 through TC-020' (default: audit all)" + audience: "Who will read the audit report — e.g., 'QA leads', 'development team'" +input_contract: + type: validation-plan + description: > + A validation plan with numbered test case definitions (TC-NNN), + linked requirements (REQ-IDs), and expected results. A requirements + document with acceptance criteria. Test source code to audit against + the plan. +output_contract: + type: investigation-report + description: > + An investigation report classifying test compliance findings + using the D11–D13 taxonomy, with test implementation coverage + metrics and remediation recommendations. 
+--- + +# Task: Audit Test Compliance + +You are tasked with auditing test code against its validation plan to +detect **test compliance drift** — gaps between what was planned for +testing and what the automated tests actually verify. + +## Inputs + +**Project Name**: {{project_name}} + +**Requirements Document**: +{{requirements_doc}} + +**Validation Plan**: +{{validation_plan}} + +**Test Code**: +{{test_code}} + +**Focus Areas**: {{focus_areas}} + +## Instructions + +1. **Apply the test-compliance-audit protocol.** Execute all phases in + order. This is the core methodology — do not skip phases. + +2. **Classify every finding** using the specification-drift taxonomy + (D11–D13). Every finding MUST have exactly one drift label, a + severity, evidence, and a recommended resolution. Include specific + locations in both the validation plan and the test code — except for + D11 findings, which by definition have no test code location (use + "None — no implementing test found" and describe what was searched). + +3. **If focus areas are specified**, perform the full inventories + (Phases 1–2) but restrict detailed tracing (Phases 3–4) to test + cases and code modules related to the focus areas. + +4. **Apply the anti-hallucination protocol.** Every finding must cite + specific TC-NNN IDs and test code locations. Do NOT invent test + cases or claim tests verify behavior you cannot point to. If you + cannot fully trace a test case due to incomplete test code context, + assign the appropriate drift label (D11) but set its confidence to + Low and state what additional test code would be needed to confirm. + +5. **Apply the operational-constraints protocol.** Do not attempt to + ingest the entire test suite. Focus on the test functions that map + to validation plan test cases and trace into helpers/fixtures only + as needed to verify assertions. + +6. **Format the output** according to the investigation-report format. 
+ Map the protocol's output to the report structure: + - Phase 1–2 inventories → Investigation Scope (section 3) + - Phases 3–4 findings → Findings (section 4), one F-NNN per issue + - Phase 5 classification → Finding severity and categorization + - Phase 6 coverage summary → Executive Summary (section 1) and + a "Coverage Metrics" subsection in Root Cause Analysis (section 5) + - Recommended resolutions → Remediation Plan (section 6) + +7. **Quality checklist** — before finalizing, verify: + - [ ] Every automatable TC-NNN from the validation plan appears in + at least one finding or is confirmed as implemented + - [ ] Every finding has a specific drift label (D11, D12, or D13) + - [ ] Every finding cites both validation plan and test code + locations (D11 findings use "None — no implementing test found") + - [ ] D11 findings include what test case was expected and why no + implementation was found + - [ ] D12 findings include which acceptance criteria are missing + and which are present + - [ ] D13 findings include both the expected assertion (from the + plan) and the actual assertion (from the code) + - [ ] Manual-only and deferred test cases are excluded from findings + but counted in the coverage summary + - [ ] Coverage metrics are calculated from actual counts + - [ ] The executive summary is understandable without reading the + full report + +## Non-Goals + +- Do NOT modify the test code — report findings only. +- Do NOT execute or run the tests — this is static analysis of test + code against the validation plan, not test execution. +- Do NOT assess test code quality (style, readability, performance) + unless it directly relates to whether the test verifies what the + plan specifies. +- Do NOT generate missing test implementations — identify and classify + the gaps. +- Do NOT evaluate whether the validation plan's test cases are correct + or sufficient — only whether the test code implements them faithfully. 
+- Do NOT expand scope beyond the provided documents and code. From f0f07947ac8b4d285310e290a307f8a6e1c0ca3d Mon Sep 17 00:00:00 2001 From: Alan Jowett Date: Fri, 20 Mar 2026 12:05:20 -0700 Subject: [PATCH 2/3] Fix review feedback: D12 classification, inconclusive handling, D11 severity - PARTIALLY IMPLEMENTED now maps to the correct drift type based on what's missing (criteria -> D12, assertions -> D13, stub -> D11) - Incomplete code context is INCONCLUSIVE, not D11 (same pattern as code-compliance audit) - D11 severity no longer references manual-only tests (those are excluded from findings by the protocol) - acceptance criterion -> acceptance criteria (plural) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- protocols/reasoning/test-compliance-audit.md | 11 ++++++++--- taxonomies/specification-drift.md | 5 +++-- templates/audit-test-compliance.md | 8 +++++--- 3 files changed, 16 insertions(+), 8 deletions(-) diff --git a/protocols/reasoning/test-compliance-audit.md b/protocols/reasoning/test-compliance-audit.md index 685700a..526134b 100644 --- a/protocols/reasoning/test-compliance-audit.md +++ b/protocols/reasoning/test-compliance-audit.md @@ -101,9 +101,14 @@ For each automatable test case in the validation plan: 3. **Classify the result**: - **IMPLEMENTED**: Test fully implements the validation plan's test case with correct assertions. Record the test location. - - **PARTIALLY IMPLEMENTED**: Test exists but is missing steps, - assertions, or acceptance criteria. Flag as D12 with details on - what is present and what is missing. + - **PARTIALLY IMPLEMENTED**: Test exists but is incomplete. 
+ Classify based on *what* is missing: + - Missing acceptance criteria assertions → + D12_UNTESTED_ACCEPTANCE_CRITERION + - Wrong assertions or mismatched expected results → + D13_ASSERTION_MISMATCH + - Test stub exists (e.g., empty body, skip annotation) with no + meaningful assertions → D11_UNIMPLEMENTED_TEST_CASE - **NOT IMPLEMENTED**: No test implements this test case. Flag as D11_UNIMPLEMENTED_TEST_CASE. diff --git a/taxonomies/specification-drift.md b/taxonomies/specification-drift.md index 3fa8266..6e175f0 100644 --- a/taxonomies/specification-drift.md +++ b/taxonomies/specification-drift.md @@ -220,8 +220,9 @@ it is covered. **Severity guidance**: High when the linked requirement is safety-critical or security-related. Medium for functional -requirements. Low for non-functional or exploratory test cases -explicitly marked as manual-only in the validation plan. +requirements. Note: test cases classified as manual-only or deferred +in the validation plan are excluded from D11 findings and reported +only in the coverage summary. ### D12_UNTESTED_ACCEPTANCE_CRITERION diff --git a/templates/audit-test-compliance.md b/templates/audit-test-compliance.md index 847f821..a0db7e5 100644 --- a/templates/audit-test-compliance.md +++ b/templates/audit-test-compliance.md @@ -5,7 +5,7 @@ name: audit-test-compliance description: > Audit test code against a validation plan and requirements document. - Detects unimplemented test cases, missing acceptance criterion + Detects unimplemented test cases, missing acceptance criteria assertions, and assertion mismatches. Classifies findings using the specification-drift taxonomy (D11–D13). persona: specification-analyst @@ -80,8 +80,10 @@ testing and what the automated tests actually verify. specific TC-NNN IDs and test code locations. Do NOT invent test cases or claim tests verify behavior you cannot point to. 
If you cannot fully trace a test case due to incomplete test code context, - assign the appropriate drift label (D11) but set its confidence to - Low and state what additional test code would be needed to confirm. + do NOT assign D11 — instead note the test case as INCONCLUSIVE with + confidence Low and state what additional test code would be needed. + Only assign D11 after explicitly searching the provided test code + and failing to find an implementation. 5. **Apply the operational-constraints protocol.** Do not attempt to ingest the entire test suite. Focus on the test functions that map From 62595a54b9695ac5dc9a71429066156d7b888392 Mon Sep 17 00:00:00 2001 From: Alan Jowett Date: Fri, 20 Mar 2026 12:46:20 -0700 Subject: [PATCH 3/3] Fix stub classification and orphaned test handling - Test stubs (empty body, skip annotation) are now D13 (assertions don't match because there are none) with a code location, not D11 (which implies no test function exists at all) - Orphaned tests referencing invalid TC-NNN/REQ-IDs are reported as observations in the coverage summary, not as D11-D13 findings (they don't fit the taxonomy since no valid TC-NNN is involved) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- protocols/reasoning/test-compliance-audit.md | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/protocols/reasoning/test-compliance-audit.md b/protocols/reasoning/test-compliance-audit.md index 526134b..c2aa3f9 100644 --- a/protocols/reasoning/test-compliance-audit.md +++ b/protocols/reasoning/test-compliance-audit.md @@ -107,10 +107,12 @@ For each automatable test case in the validation plan: D12_UNTESTED_ACCEPTANCE_CRITERION - Wrong assertions or mismatched expected results → D13_ASSERTION_MISMATCH - - Test stub exists (e.g., empty body, skip annotation) with no - meaningful assertions → D11_UNIMPLEMENTED_TEST_CASE - - **NOT IMPLEMENTED**: No test implements this test case. 
Flag as - D11_UNIMPLEMENTED_TEST_CASE. + - **NOT IMPLEMENTED**: No test implements this test case (no + matching test function found in the provided code). Flag as + D11_UNIMPLEMENTED_TEST_CASE. Note: a test stub with an empty + body or skip annotation is NOT an implementation — classify it + as D13 (assertions don't match because there are none) and + record its code location. ## Phase 4: Backward Traceability (Test Code → Validation Plan) @@ -127,8 +129,10 @@ Identify tests that don't trace to the validation plan. indicate validation plan gaps (candidates for new test cases). - **Orphaned tests**: Tests that reference TC-NNN IDs or REQ-IDs that do not exist in the validation plan or requirements. These - may be stale after a renumbering. Flag if they reference invalid - identifiers. + may be stale after a renumbering. Report orphaned tests as + observations in the coverage summary (Phase 6), not as D11–D13 + findings — they don't fit the taxonomy since no valid TC-NNN + is involved. ## Phase 5: Classification and Reporting