
Add test compliance audit template (spec+validation -> test code)#52

Open
Alan-Jowett wants to merge 1 commit into microsoft:main from Alan-Jowett:add-test-compliance-audit

Conversation

@Alan-Jowett
Member

Closes #38

Adds a new template that audits test code against a validation plan and requirements document for test compliance drift — gaps between what was planned for testing and what the automated tests actually verify. Completes the audit trifecta.

The Audit Trifecta

| Template | Compares | Drift Types |
| --- | --- | --- |
| audit-traceability | doc ↔ doc ↔ doc | D1–D7 |
| audit-code-compliance | spec ↔ source code | D8–D10 |
| audit-test-compliance | spec ↔ test code | D11–D13 |

New Drift Types (D11–D13)

| Label | Description |
| --- | --- |
| D11_UNIMPLEMENTED_TEST_CASE | TC-NNN in validation plan has no automated test |
| D12_UNTESTED_ACCEPTANCE_CRITERION | Test exists but doesn't assert all acceptance criteria |
| D13_ASSERTION_MISMATCH | Test assertions don't match the expected behavior from the plan |
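As an illustration of the simplest of these checks, a D11 gap can be thought of as a TC-NNN identifier from the validation plan that is never referenced by any automated test. The sketch below is hypothetical (the test code, identifiers, and tagging-by-comment convention are invented, not taken from this PR):

```python
import re

# Hypothetical sketch: flag D11_UNIMPLEMENTED_TEST_CASE candidates by
# checking whether each TC-NNN from the validation plan is referenced
# anywhere in the test source.
def find_unimplemented(plan_tc_ids, test_source):
    referenced = set(re.findall(r"TC-\d{3}", test_source))
    return [tc for tc in plan_tc_ids if tc not in referenced]

# Invented test file contents for illustration:
test_code = """
def test_login():  # TC-001
    assert login("user", "pw")
"""

print(find_unimplemented(["TC-001", "TC-002"], test_code))
# -> ['TC-002']
```

A real audit would of course trace by behavior as well as by identifier; this only shows the shape of the forward-traceability pass.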

New Components

| Component | File |
| --- | --- |
| Protocol | protocols/reasoning/test-compliance-audit.md — 6-phase methodology |
| Template | templates/audit-test-compliance.md — consumes requirements + validation plan + test code |

Reused Components

specification-analyst persona, investigation-report format, anti-hallucination + self-verification + operational-constraints guardrails, specification-drift taxonomy (extended).

Validation

  • tests/validate-manifest.py passes
  • Assembly smoke test confirms D11/D12/D13 taxonomy content in assembled prompt
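The smoke test described above could be sketched roughly as follows. This is a minimal illustration only; the stand-in prompt string below is invented, and the repository's actual assembly API is not shown:

```python
# Hypothetical sketch of the assembly smoke test: confirm that all
# three new drift labels appear in the assembled prompt text.
EXPECTED_LABELS = [
    "D11_UNIMPLEMENTED_TEST_CASE",
    "D12_UNTESTED_ACCEPTANCE_CRITERION",
    "D13_ASSERTION_MISMATCH",
]

def missing_labels(assembled_prompt):
    """Return any expected drift labels absent from the prompt."""
    return [lbl for lbl in EXPECTED_LABELS if lbl not in assembled_prompt]

# Stand-in for the real assembled output:
prompt = ("... D11_UNIMPLEMENTED_TEST_CASE ... "
          "D12_UNTESTED_ACCEPTANCE_CRITERION ... "
          "D13_ASSERTION_MISMATCH ...")
assert missing_labels(prompt) == []
```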

Add a new template that audits test code against a validation plan
and requirements document for test compliance drift. Detects
unimplemented test cases (D11), missing acceptance criterion assertions
(D12), and assertion mismatches (D13).

New components:
- Protocol: test-compliance-audit — 6-phase methodology (validation
  plan inventory, test code inventory, forward/backward traceability,
  classification, coverage summary)
- Template: audit-test-compliance — consumes requirements + validation
  plan + test code, produces investigation-report

Extended components:
- Taxonomy: specification-drift — D11-D13 labels defined (previously
  reserved). Ranking criteria updated. No more reserved labels.
- Scenarios gallery updated (moved from future to existing)

This completes the audit trifecta:
- audit-traceability: doc <-> doc <-> doc (D1-D7)
- audit-code-compliance: spec <-> code (D8-D10)
- audit-test-compliance: spec <-> test code (D11-D13)

Closes microsoft#38

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 20, 2026 16:36
Contributor

Copilot AI left a comment


Pull request overview

Adds a new audit-test-compliance template and supporting taxonomy/protocol content to detect test compliance drift — divergence between the requirements + validation plan and the automated test code (D11–D13).

Changes:

  • Introduces audit-test-compliance template to audit validation-plan test cases against test code and report drift findings.
  • Adds test-compliance-audit reasoning protocol (6 phases) for inventorying, tracing, and classifying drift.
  • Extends specification-drift taxonomy with concrete D11–D13 labels and updates docs/manifest to register the new components.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| templates/audit-test-compliance.md | New template for auditing test code vs validation plan + requirements using D11–D13 |
| taxonomies/specification-drift.md | Defines D11–D13 labels and updates drift ranking guidance |
| protocols/reasoning/test-compliance-audit.md | New 6-phase methodology for test compliance audits |
| manifest.yaml | Registers the new protocol and template |
| docs/scenarios.md | Promotes the scenario from “future” to current with richer metadata |

Comment on lines +104 to +106
- **PARTIALLY IMPLEMENTED**: Test exists but is missing steps,
assertions, or acceptance criteria. Flag as D12 with details on
what is present and what is missing.

Copilot AI Mar 20, 2026


The protocol instructs classifying any partial implementation (missing steps/assertions) as D12, but the taxonomy/PR description define D12 specifically as missing acceptance criteria coverage. Missing validation-plan expected-result assertions or missing steps are not necessarily ‘untested acceptance criteria’ and may be misclassified. Update the classification rules to map: (a) missing acceptance criteria → D12, (b) assertion mismatch/weak or missing expected-result assertions vs the plan → D13, and reserve D12 for acceptance-criteria gaps only.

Suggested change
- **PARTIALLY IMPLEMENTED**: Test exists but is missing steps,
assertions, or acceptance criteria. Flag as D12 with details on
what is present and what is missing.
- **PARTIALLY IMPLEMENTED**: Test exists but is incomplete relative
to the validation plan. Classify as follows, and record what is
present and what is missing:
- If one or more linked acceptance criteria are not verified at
all, flag D12_UNTESTED_ACCEPTANCE_CRITERION for each missing
criterion.
- If validation-plan steps are missing, expected-result assertions
are missing, or the implemented assertions are weaker or differ
from the validation plan's expected results, flag
D13_ASSERTION_MISMATCH with details.
- If both acceptance-criteria gaps and assertion/step issues are
present, record both D12 and D13.
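The D12/D13 split proposed in this suggestion can be captured in a few lines. This is a hypothetical sketch of the classification rule only (the argument names are invented, not part of the template):

```python
# Sketch of the proposed rule: unverified acceptance criteria map to
# D12, missing/weakened expected-result assertions map to D13, and a
# partially implemented test can legitimately carry both labels.
def classify_partial(missing_criteria, assertion_issues):
    labels = []
    if missing_criteria:
        labels.append("D12_UNTESTED_ACCEPTANCE_CRITERION")
    if assertion_issues:
        labels.append("D13_ASSERTION_MISMATCH")
    return labels

print(classify_partial(["AC-2"], []))
print(classify_partial([], ["assertion weaker than expected result"]))
print(classify_partial(["AC-2"], ["missing expected-result assertion"]))
```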

Comment on lines +82 to +84
cannot fully trace a test case due to incomplete test code context,
assign the appropriate drift label (D11) but set its confidence to
Low and state what additional test code would be needed to confirm.

Copilot AI Mar 20, 2026


These instructions conflate ‘incomplete test code context’ with ‘no implementing test found’. D11 is a concrete claim that no automated test exists, but incomplete context only supports an inconclusive result. Consider changing this to: only label D11 after explicitly searching the provided test code and failing to find an implementation; if the provided code is incomplete, report the limitation explicitly and avoid counting the case as unimplemented in coverage metrics (or provide a min/max coverage range).

Suggested change
cannot fully trace a test case due to incomplete test code context,
assign the appropriate drift label (D11) but set its confidence to
Low and state what additional test code would be needed to confirm.
cannot fully trace a test case because the provided test code
context is incomplete or clearly insufficient, do **not** assign
any D11–D13 drift label. Instead, mark the situation as
"Inconclusive — insufficient test code context", clearly describe
what test files, modules, or directories were searched, what
additional test code would be needed to confirm the presence or
absence of an implementation, and reflect this uncertainty in
coverage metrics (e.g., by reporting a best‑case/worst‑case or
min/max coverage range rather than counting it as an unimplemented
D11 gap).
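The min/max coverage range suggested here amounts to counting inconclusive cases as covered in the best case and uncovered in the worst case. A small sketch with invented counts:

```python
# Sketch of a best/worst-case coverage range: inconclusive test cases
# (insufficient test code context) widen the range instead of being
# silently counted as D11 gaps.
def coverage_range(covered, uncovered, inconclusive):
    total = covered + uncovered + inconclusive
    worst = covered / total                   # inconclusive all missing
    best = (covered + inconclusive) / total   # inconclusive all present
    return worst, best

worst, best = coverage_range(covered=8, uncovered=1, inconclusive=1)
print(f"coverage: {worst:.0%}-{best:.0%}")  # coverage: 80%-90%
```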

Comment on lines +27 to +33
input_contract:
type: validation-plan
description: >
A validation plan with numbered test case definitions (TC-NNN),
linked requirements (REQ-IDs), and expected results. A requirements
document with acceptance criteria. Test source code to audit against
the plan.

Copilot AI Mar 20, 2026


The input_contract.type is validation-plan, but the described inputs include three distinct artifacts (validation plan + requirements doc + test code). If the template system uses input_contract.type for validation/routing, this is ambiguous. Suggestion: either (1) introduce a composite input type (e.g., test-compliance-audit-input) or (2) keep validation-plan but make the contract explicit that it requires accompanying requirements_doc and test_code, and ensure the manifest/schema can represent that requirement consistently.

protocols: [anti-hallucination, self-verification, operational-constraints, test-compliance-audit]
taxonomies: [specification-drift]
format: investigation-report
requires: [requirements-document, validation-plan]

Copilot AI Mar 20, 2026


The template’s requires list omits the test-code artifact, even though the template has a test_code parameter and the PR purpose is spec↔test-code auditing. If requires is used by tooling/assemblers to ensure inputs are provided, this can lead to incomplete invocations. Consider adding an explicit requirement for test code (if there is an existing ‘test-code’/‘source-tree’ format) or extending the manifest schema to represent raw ‘code’ inputs for templates that audit code.

Suggested change
requires: [requirements-document, validation-plan]
requires: [requirements-document, validation-plan, test-code]

Comment on lines +223 to +224
requirements. Low for non-functional or exploratory test cases
explicitly marked as manual-only in the validation plan.

Copilot AI Mar 20, 2026


This severity guidance for D11 references ‘manual-only’ test cases, but manual-only cases are explicitly excluded from the audit scope (per the protocol/template) and shouldn’t produce D11 findings. Recommend removing the manual-only clause here, and instead (if needed) clarify that manual-only/deferred cases are excluded from findings and only counted in coverage summaries.

Suggested change
requirements. Low for non-functional or exploratory test cases
explicitly marked as manual-only in the validation plan.
requirements. Low when the linked requirement is non-critical and the
missing automation is explicitly tracked as deferred in the validation
plan. Test cases that are designated as permanently manual-only are out
of scope for D11 findings and only contribute to coverage summaries.

name: audit-test-compliance
description: >
Audit test code against a validation plan and requirements document.
Detects unimplemented test cases, missing acceptance criterion

Copilot AI Mar 20, 2026


Use ‘acceptance criteria’ (plural) rather than ‘acceptance criterion’ here, since requirements typically have multiple criteria and the rest of the template/protocol discusses multiple criteria.

Suggested change
Detects unimplemented test cases, missing acceptance criterion
Detects unimplemented test cases, missing acceptance criteria

