
Add test compliance audit template (spec+validation -> test code)#52

Open
Alan-Jowett wants to merge 1 commit into microsoft:main from Alan-Jowett:add-test-compliance-audit

Conversation

@Alan-Jowett
Member

Closes #38

Adds a new template that audits test code against a validation plan and requirements document for test compliance drift — gaps between what was planned for testing and what the automated tests actually verify. Completes the audit trifecta.

The Audit Trifecta

| Template | Compares | Drift Types |
| --- | --- | --- |
| audit-traceability | doc ↔ doc ↔ doc | D1–D7 |
| audit-code-compliance | spec ↔ source code | D8–D10 |
| audit-test-compliance | spec ↔ test code | D11–D13 |

New Drift Types (D11–D13)

| Label | Description |
| --- | --- |
| D11_UNIMPLEMENTED_TEST_CASE | TC-NNN in validation plan has no automated test |
| D12_UNTESTED_ACCEPTANCE_CRITERION | Test exists but doesn't assert all acceptance criteria |
| D13_ASSERTION_MISMATCH | Test assertions don't match the expected behavior from the plan |
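As an illustration of the simplest of these checks, a D11 gap can be thought of as a TC-NNN identifier from the validation plan that is never referenced by any automated test. The sketch below is hypothetical (the test code, identifiers, and tagging-by-comment convention are invented, not taken from this PR):

```python
import re

# Hypothetical sketch: flag D11_UNIMPLEMENTED_TEST_CASE candidates by
# checking whether each TC-NNN from the validation plan is referenced
# anywhere in the test source.
def find_unimplemented(plan_tc_ids, test_source):
    referenced = set(re.findall(r"TC-\d{3}", test_source))
    return [tc for tc in plan_tc_ids if tc not in referenced]

# Invented test file contents for illustration:
test_code = """
def test_login():  # TC-001
    assert login("user", "pw")
"""

print(find_unimplemented(["TC-001", "TC-002"], test_code))
# -> ['TC-002']
```

A real audit would of course trace by behavior as well as by identifier; this only shows the shape of the forward-traceability pass.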

New Components

| Component | File |
| --- | --- |
| Protocol | protocols/reasoning/test-compliance-audit.md — 6-phase methodology |
| Template | templates/audit-test-compliance.md — consumes requirements + validation plan + test code |

Reused Components

specification-analyst persona, investigation-report format, anti-hallucination + self-verification + operational-constraints guardrails, specification-drift taxonomy (extended).

Validation

  • tests/validate-manifest.py passes
  • Assembly smoke test confirms D11/D12/D13 taxonomy content in assembled prompt
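The smoke test described above could be sketched roughly as follows. This is a minimal illustration only; the stand-in prompt string below is invented, and the repository's actual assembly API is not shown:

```python
# Hypothetical sketch of the assembly smoke test: confirm that all
# three new drift labels appear in the assembled prompt text.
EXPECTED_LABELS = [
    "D11_UNIMPLEMENTED_TEST_CASE",
    "D12_UNTESTED_ACCEPTANCE_CRITERION",
    "D13_ASSERTION_MISMATCH",
]

def missing_labels(assembled_prompt):
    """Return any expected drift labels absent from the prompt."""
    return [lbl for lbl in EXPECTED_LABELS if lbl not in assembled_prompt]

# Stand-in for the real assembled output:
prompt = ("... D11_UNIMPLEMENTED_TEST_CASE ... "
          "D12_UNTESTED_ACCEPTANCE_CRITERION ... "
          "D13_ASSERTION_MISMATCH ...")
assert missing_labels(prompt) == []
```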

Add a new template that audits test code against a validation plan
and requirements document for test compliance drift. Detects
unimplemented test cases (D11), missing acceptance criterion assertions
(D12), and assertion mismatches (D13).

New components:
- Protocol: test-compliance-audit — 6-phase methodology (validation
  plan inventory, test code inventory, forward/backward traceability,
  classification, coverage summary)
- Template: audit-test-compliance — consumes requirements + validation
  plan + test code, produces investigation-report

Extended components:
- Taxonomy: specification-drift — D11-D13 labels defined (previously
  reserved). Ranking criteria updated. No more reserved labels.
- Scenarios gallery updated (moved from future to existing)

This completes the audit trifecta:
- audit-traceability: doc <-> doc <-> doc (D1-D7)
- audit-code-compliance: spec <-> code (D8-D10)
- audit-test-compliance: spec <-> test code (D11-D13)

Closes microsoft#38

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 20, 2026 16:36
Contributor

Copilot AI left a comment


Pull request overview

Adds a new audit-test-compliance template and supporting taxonomy/protocol content to detect test compliance drift — divergence between the requirements + validation plan and the automated test code (D11–D13).

Changes:

  • Introduces audit-test-compliance template to audit validation-plan test cases against test code and report drift findings.
  • Adds test-compliance-audit reasoning protocol (6 phases) for inventorying, tracing, and classifying drift.
  • Extends specification-drift taxonomy with concrete D11–D13 labels and updates docs/manifest to register the new components.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| templates/audit-test-compliance.md | New template for auditing test code vs validation plan + requirements using D11–D13 |
| taxonomies/specification-drift.md | Defines D11–D13 labels and updates drift ranking guidance |
| protocols/reasoning/test-compliance-audit.md | New 6-phase methodology for test compliance audits |
| manifest.yaml | Registers the new protocol and template |
| docs/scenarios.md | Promotes the scenario from “future” to current with richer metadata |

Comment on lines +104 to +106
- **PARTIALLY IMPLEMENTED**: Test exists but is missing steps,
assertions, or acceptance criteria. Flag as D12 with details on
what is present and what is missing.

Copilot AI Mar 20, 2026


The protocol instructs classifying any partial implementation (missing steps/assertions) as D12, but the taxonomy/PR description define D12 specifically as missing acceptance criteria coverage. Missing validation-plan expected-result assertions or missing steps are not necessarily ‘untested acceptance criteria’ and may be misclassified. Update the classification rules to map: (a) missing acceptance criteria → D12, (b) assertion mismatch/weak or missing expected-result assertions vs the plan → D13, and reserve D12 for acceptance-criteria gaps only.

Suggested change
- **PARTIALLY IMPLEMENTED**: Test exists but is missing steps,
assertions, or acceptance criteria. Flag as D12 with details on
what is present and what is missing.
- **PARTIALLY IMPLEMENTED**: Test exists but is incomplete relative
to the validation plan. Classify as follows, and record what is
present and what is missing:
- If one or more linked acceptance criteria are not verified at
all, flag D12_UNTESTED_ACCEPTANCE_CRITERION for each missing
criterion.
- If validation-plan steps are missing, expected-result assertions
are missing, or the implemented assertions are weaker or differ
from the validation plan's expected results, flag
D13_ASSERTION_MISMATCH with details.
- If both acceptance-criteria gaps and assertion/step issues are
present, record both D12 and D13.
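The D12/D13 split proposed in this suggestion can be captured in a few lines. This is a hypothetical sketch of the classification rule only (the argument names are invented, not part of the template):

```python
# Sketch of the proposed rule: unverified acceptance criteria map to
# D12, missing/weakened expected-result assertions map to D13, and a
# partially implemented test can legitimately carry both labels.
def classify_partial(missing_criteria, assertion_issues):
    labels = []
    if missing_criteria:
        labels.append("D12_UNTESTED_ACCEPTANCE_CRITERION")
    if assertion_issues:
        labels.append("D13_ASSERTION_MISMATCH")
    return labels

print(classify_partial(["AC-2"], []))
print(classify_partial([], ["assertion weaker than expected result"]))
print(classify_partial(["AC-2"], ["missing expected-result assertion"]))
```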

Comment on lines +82 to +84
cannot fully trace a test case due to incomplete test code context,
assign the appropriate drift label (D11) but set its confidence to
Low and state what additional test code would be needed to confirm.

Copilot AI Mar 20, 2026


These instructions conflate ‘incomplete test code context’ with ‘no implementing test found’. D11 is a concrete claim that no automated test exists, but incomplete context only supports an inconclusive result. Consider changing this to: only label D11 after explicitly searching the provided test code and failing to find an implementation; if the provided code is incomplete, report the limitation explicitly and avoid counting the case as unimplemented in coverage metrics (or provide a min/max coverage range).

Suggested change
cannot fully trace a test case due to incomplete test code context,
assign the appropriate drift label (D11) but set its confidence to
Low and state what additional test code would be needed to confirm.
cannot fully trace a test case because the provided test code
context is incomplete or clearly insufficient, do **not** assign
any D11–D13 drift label. Instead, mark the situation as
"Inconclusive — insufficient test code context", clearly describe
what test files, modules, or directories were searched, what
additional test code would be needed to confirm the presence or
absence of an implementation, and reflect this uncertainty in
coverage metrics (e.g., by reporting a best‑case/worst‑case or
min/max coverage range rather than counting it as an unimplemented
D11 gap).
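The min/max coverage range suggested here amounts to counting inconclusive cases as covered in the best case and uncovered in the worst case. A small sketch with invented counts:

```python
# Sketch of a best/worst-case coverage range: inconclusive test cases
# (insufficient test code context) widen the range instead of being
# silently counted as D11 gaps.
def coverage_range(covered, uncovered, inconclusive):
    total = covered + uncovered + inconclusive
    worst = covered / total                   # inconclusive all missing
    best = (covered + inconclusive) / total   # inconclusive all present
    return worst, best

worst, best = coverage_range(covered=8, uncovered=1, inconclusive=1)
print(f"coverage: {worst:.0%}-{best:.0%}")  # coverage: 80%-90%
```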

Comment on lines +27 to +33
input_contract:
type: validation-plan
description: >
A validation plan with numbered test case definitions (TC-NNN),
linked requirements (REQ-IDs), and expected results. A requirements
document with acceptance criteria. Test source code to audit against
the plan.

Copilot AI Mar 20, 2026


The input_contract.type is validation-plan, but the described inputs include three distinct artifacts (validation plan + requirements doc + test code). If the template system uses input_contract.type for validation/routing, this is ambiguous. Suggestion: either (1) introduce a composite input type (e.g., test-compliance-audit-input) or (2) keep validation-plan but make the contract explicit that it requires accompanying requirements_doc and test_code, and ensure the manifest/schema can represent that requirement consistently.

protocols: [anti-hallucination, self-verification, operational-constraints, test-compliance-audit]
taxonomies: [specification-drift]
format: investigation-report
requires: [requirements-document, validation-plan]

Copilot AI Mar 20, 2026


The template’s requires list omits the test-code artifact, even though the template has a test_code parameter and the PR purpose is spec↔test-code auditing. If requires is used by tooling/assemblers to ensure inputs are provided, this can lead to incomplete invocations. Consider adding an explicit requirement for test code (if there is an existing ‘test-code’/‘source-tree’ format) or extending the manifest schema to represent raw ‘code’ inputs for templates that audit code.

Suggested change
requires: [requirements-document, validation-plan]
requires: [requirements-document, validation-plan, test-code]

Comment on lines +223 to +224
requirements. Low for non-functional or exploratory test cases
explicitly marked as manual-only in the validation plan.

Copilot AI Mar 20, 2026


This severity guidance for D11 references ‘manual-only’ test cases, but manual-only cases are explicitly excluded from the audit scope (per the protocol/template) and shouldn’t produce D11 findings. Recommend removing the manual-only clause here, and instead (if needed) clarify that manual-only/deferred cases are excluded from findings and only counted in coverage summaries.

Suggested change
requirements. Low for non-functional or exploratory test cases
explicitly marked as manual-only in the validation plan.
requirements. Low when the linked requirement is non-critical and the
missing automation is explicitly tracked as deferred in the validation
plan. Test cases that are designated as permanently manual-only are out
of scope for D11 findings and only contribute to coverage summaries.

name: audit-test-compliance
description: >
Audit test code against a validation plan and requirements document.
Detects unimplemented test cases, missing acceptance criterion

Copilot AI Mar 20, 2026


Use ‘acceptance criteria’ (plural) rather than ‘acceptance criterion’ here, since requirements typically have multiple criteria and the rest of the template/protocol discusses multiple criteria.

Suggested change
Detects unimplemented test cases, missing acceptance criterion
Detects unimplemented test cases, missing acceptance criteria

