Add test compliance audit template (spec+validation -> test code)#52
Alan-Jowett wants to merge 1 commit into microsoft:main
Conversation
Add a new template that audits test code against a validation plan and requirements document for test compliance drift. Detects unimplemented test cases (D11), missing acceptance criterion assertions (D12), and assertion mismatches (D13).

New components:
- Protocol: test-compliance-audit — 6-phase methodology (validation plan inventory, test code inventory, forward/backward traceability, classification, coverage summary)
- Template: audit-test-compliance — consumes requirements + validation plan + test code, produces investigation-report

Extended components:
- Taxonomy: specification-drift — D11–D13 labels defined (previously reserved). Ranking criteria updated. No more reserved labels.
- Scenarios gallery updated (moved from future to existing)

This completes the audit trifecta:
- audit-traceability: doc <-> doc <-> doc (D1–D7)
- audit-code-compliance: spec <-> code (D8–D10)
- audit-test-compliance: spec <-> test code (D11–D13)

Closes microsoft#38

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Adds a new audit-test-compliance template and supporting taxonomy/protocol content to detect test compliance drift between requirements + validation plan vs automated test code (D11–D13).
Changes:
- Introduces the `audit-test-compliance` template to audit validation-plan test cases against test code and report drift findings.
- Adds the `test-compliance-audit` reasoning protocol (6 phases) for inventorying, tracing, and classifying drift.
- Extends the `specification-drift` taxonomy with concrete D11–D13 labels and updates docs/manifest to register the new components.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| templates/audit-test-compliance.md | New template for auditing test code vs validation plan + requirements using D11–D13 |
| taxonomies/specification-drift.md | Defines D11–D13 labels and updates drift ranking guidance |
| protocols/reasoning/test-compliance-audit.md | New 6-phase methodology for test compliance audits |
| manifest.yaml | Registers the new protocol and template |
| docs/scenarios.md | Promotes the scenario from “future” to current with richer metadata |
```
- **PARTIALLY IMPLEMENTED**: Test exists but is missing steps,
  assertions, or acceptance criteria. Flag as D12 with details on
  what is present and what is missing.
```
The protocol instructs classifying any partial implementation (missing steps/assertions) as D12, but the taxonomy/PR description define D12 specifically as missing acceptance criteria coverage. Missing validation-plan expected-result assertions or missing steps are not necessarily ‘untested acceptance criteria’ and may be misclassified. Update the classification rules to map: (a) missing acceptance criteria → D12, (b) assertion mismatch/weak or missing expected-result assertions vs the plan → D13, and reserve D12 for acceptance-criteria gaps only.
```diff
- - **PARTIALLY IMPLEMENTED**: Test exists but is missing steps,
-   assertions, or acceptance criteria. Flag as D12 with details on
-   what is present and what is missing.
+ - **PARTIALLY IMPLEMENTED**: Test exists but is incomplete relative
+   to the validation plan. Classify as follows, and record what is
+   present and what is missing:
+   - If one or more linked acceptance criteria are not verified at
+     all, flag D12_UNTESTED_ACCEPTANCE_CRITERION for each missing
+     criterion.
+   - If validation-plan steps are missing, expected-result assertions
+     are missing, or the implemented assertions are weaker or differ
+     from the validation plan's expected results, flag
+     D13_ASSERTION_MISMATCH with details.
+   - If both acceptance-criteria gaps and assertion/step issues are
+     present, record both D12 and D13.
```
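The suggested classification rules can be sketched as a small classifier. This is an illustrative sketch only: the protocol itself is prose, and the field names (`missing_criteria`, `missing_steps`, `assertion_mismatches`) are hypothetical, not taken from the repository.

```python
# Hedged sketch of the suggested D12/D13 split for partially
# implemented test cases. Field names are illustrative.
from dataclasses import dataclass, field

@dataclass
class PartialTestCase:
    test_id: str
    # Linked acceptance criteria that are not verified at all.
    missing_criteria: list = field(default_factory=list)
    # Validation-plan steps with no corresponding test logic.
    missing_steps: list = field(default_factory=list)
    # Expected-result assertions that are weak, missing, or different.
    assertion_mismatches: list = field(default_factory=list)

def classify(tc: PartialTestCase) -> list:
    findings = []
    # (a) Acceptance-criteria gaps -> one D12 finding per criterion.
    for criterion in tc.missing_criteria:
        findings.append(("D12_UNTESTED_ACCEPTANCE_CRITERION",
                         tc.test_id, criterion))
    # (b) Step or assertion issues vs the plan -> a D13 finding.
    if tc.missing_steps or tc.assertion_mismatches:
        findings.append(("D13_ASSERTION_MISMATCH", tc.test_id,
                         tc.missing_steps + tc.assertion_mismatches))
    # If both kinds of gaps are present, both D12 and D13 are recorded.
    return findings

tc = PartialTestCase(
    "TC-004",
    missing_criteria=["AC-2"],
    assertion_mismatches=["plan expects exact match; test only checks non-empty"],
)
for label, test_id, _ in classify(tc):
    print(label, test_id)
```

A case with both kinds of gaps yields both a D12 and a D13 finding, which matches the suggested rule of recording both rather than collapsing them into one label.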
```
cannot fully trace a test case due to incomplete test code context,
assign the appropriate drift label (D11) but set its confidence to
Low and state what additional test code would be needed to confirm.
```
These instructions conflate ‘incomplete test code context’ with ‘no implementing test found’. D11 is a concrete claim that no automated test exists, but incomplete context only supports an inconclusive result. Consider changing this to: only label D11 after explicitly searching the provided test code and failing to find an implementation; if the provided code is incomplete, report the limitation explicitly and avoid counting the case as unimplemented in coverage metrics (or provide a min/max coverage range).
```diff
- cannot fully trace a test case due to incomplete test code context,
- assign the appropriate drift label (D11) but set its confidence to
- Low and state what additional test code would be needed to confirm.
+ cannot fully trace a test case because the provided test code
+ context is incomplete or clearly insufficient, do **not** assign
+ any D11–D13 drift label. Instead, mark the situation as
+ "Inconclusive — insufficient test code context", clearly describe
+ what test files, modules, or directories were searched, what
+ additional test code would be needed to confirm the presence or
+ absence of an implementation, and reflect this uncertainty in
+ coverage metrics (e.g., by reporting a best-case/worst-case or
+ min/max coverage range rather than counting it as an unimplemented
+ D11 gap).
```
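The suggested min/max coverage range is simple arithmetic; a minimal sketch, assuming each in-scope test case is counted as exactly one of implemented, unimplemented (confirmed D11), or inconclusive:

```python
def coverage_range(implemented: int, unimplemented: int, inconclusive: int):
    """Best/worst-case automated coverage over in-scope test cases.

    Inconclusive cases (insufficient test code context) count as
    implemented in the best case and as gaps in the worst case,
    instead of being folded into the D11 total.
    """
    total = implemented + unimplemented + inconclusive
    if total == 0:
        return (0.0, 0.0)
    worst = implemented / total
    best = (implemented + inconclusive) / total
    return (worst, best)

# e.g. 8 implemented, 1 confirmed D11 gap, 1 inconclusive:
lo_cov, hi_cov = coverage_range(8, 1, 1)
print(f"coverage: {lo_cov:.0%}-{hi_cov:.0%}")  # coverage: 80%-90%
```

Reporting the pair rather than a single number keeps the inconclusive cases visible without overstating either coverage or drift.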
```yaml
input_contract:
  type: validation-plan
  description: >
    A validation plan with numbered test case definitions (TC-NNN),
    linked requirements (REQ-IDs), and expected results. A requirements
    document with acceptance criteria. Test source code to audit against
    the plan.
```
The input_contract.type is validation-plan, but the described inputs include three distinct artifacts (validation plan + requirements doc + test code). If the template system uses input_contract.type for validation/routing, this is ambiguous. Suggestion: either (1) introduce a composite input type (e.g., test-compliance-audit-input) or (2) keep validation-plan but make the contract explicit that it requires accompanying requirements_doc and test_code, and ensure the manifest/schema can represent that requirement consistently.
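Option (2) might look like the following sketch. This is hypothetical: the `requires_companions` key is invented for illustration and is not part of the actual manifest schema.

```yaml
# Hypothetical sketch of option (2): keep validation-plan as the
# primary type but declare the accompanying artifacts explicitly.
# The requires_companions key is illustrative, not a real schema field.
input_contract:
  type: validation-plan
  requires_companions:
    - requirements-document   # acceptance criteria to trace against
    - test-code               # test sources to audit
  description: >
    A validation plan with numbered test case definitions (TC-NNN),
    linked requirements (REQ-IDs), and expected results, accompanied
    by the requirements document and the test source code under audit.
```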
```yaml
protocols: [anti-hallucination, self-verification, operational-constraints, test-compliance-audit]
taxonomies: [specification-drift]
format: investigation-report
requires: [requirements-document, validation-plan]
```
The template’s requires list omits the test-code artifact, even though the template has a test_code parameter and the PR purpose is spec↔test-code auditing. If requires is used by tooling/assemblers to ensure inputs are provided, this can lead to incomplete invocations. Consider adding an explicit requirement for test code (if there is an existing ‘test-code’/‘source-tree’ format) or extending the manifest schema to represent raw ‘code’ inputs for templates that audit code.
```diff
- requires: [requirements-document, validation-plan]
+ requires: [requirements-document, validation-plan, test-code]
```
```
requirements. Low for non-functional or exploratory test cases
explicitly marked as manual-only in the validation plan.
```
This severity guidance for D11 references ‘manual-only’ test cases, but manual-only cases are explicitly excluded from the audit scope (per the protocol/template) and shouldn’t produce D11 findings. Recommend removing the manual-only clause here, and instead (if needed) clarify that manual-only/deferred cases are excluded from findings and only counted in coverage summaries.
```diff
- requirements. Low for non-functional or exploratory test cases
- explicitly marked as manual-only in the validation plan.
+ requirements. Low when the linked requirement is non-critical and the
+ missing automation is explicitly tracked as deferred in the validation
+ plan. Test cases that are designated as permanently manual-only are out
+ of scope for D11 findings and only contribute to coverage summaries.
```
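The revised severity rule reduces to a short decision function. A hedged sketch, assuming hypothetical boolean inputs derived from the validation plan (the actual protocol expresses this in prose):

```python
def d11_severity(requirement_critical: bool, deferred_in_plan: bool,
                 manual_only: bool):
    """Severity for a D11 (unimplemented test case) finding.

    Manual-only cases return None: under the revised rule they are out
    of scope for D11 findings and only feed the coverage summary.
    """
    if manual_only:
        return None  # excluded from findings entirely
    # Low only when the requirement is non-critical AND the missing
    # automation is explicitly tracked as deferred in the plan.
    if not requirement_critical and deferred_in_plan:
        return "Low"
    return "High"

print(d11_severity(True, False, False))   # High
print(d11_severity(False, True, False))   # Low
print(d11_severity(False, False, True))   # None
```

Returning a sentinel for manual-only cases (rather than a low severity) keeps them out of the findings list entirely, which is the point of the review comment.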
```yaml
name: audit-test-compliance
description: >
  Audit test code against a validation plan and requirements document.
  Detects unimplemented test cases, missing acceptance criterion
```
Use ‘acceptance criteria’ (plural) rather than ‘acceptance criterion’ here, since requirements typically have multiple criteria and the rest of the template/protocol discusses multiple criteria.
```diff
- Detects unimplemented test cases, missing acceptance criterion
+ Detects unimplemented test cases, missing acceptance criteria
```
Closes #38
Adds a new template that audits test code against a validation plan and requirements document for test compliance drift — gaps between what was planned for testing and what the automated tests actually verify. Completes the audit trifecta.
The Audit Trifecta
- `audit-traceability`
- `audit-code-compliance`
- `audit-test-compliance`

New Drift Types (D11–D13)
New Components
- `protocols/reasoning/test-compliance-audit.md` — 6-phase methodology
- `templates/audit-test-compliance.md` — consumes requirements + validation plan + test code

Reused Components
`specification-analyst` persona, `investigation-report` format, `anti-hallucination` + `self-verification` + `operational-constraints` guardrails, `specification-drift` taxonomy (extended).

Validation
`tests/validate-manifest.py` passes