25 changes: 13 additions & 12 deletions docs/scenarios.md
@@ -137,26 +137,27 @@ stale, and the critical ones are buried under feature requests.
priority and effort, identifying patterns and duplicates, and
recommending a workflow for the next sprint.

---

## Future Scenarios (Roadmap)

These scenarios describe capabilities that are planned but not yet
implemented. See the [roadmap](roadmap.md) for details.

### "Does the code actually implement what the spec says?"

You have a requirements document and a design document. The code has
been written. But does it actually implement the specified behavior?
Are there requirements with no implementation? Features in the code
that nobody asked for?

**Planned template:** `audit-code-compliance` ·
**Taxonomy:** `specification-drift` (D8–D10)
**Template:** `audit-code-compliance` · **Persona:** `specification-analyst` ·
**Protocol:** `code-compliance-audit` · **Taxonomy:** `specification-drift` (D8–D10)

**What you'd get:** An investigation report listing unimplemented
requirements, code behavior not traced to any requirement, and
mismatched assumptions between the spec and the implementation.
**What you get:** An investigation report listing unimplemented
requirements (D8), code behavior not traced to any requirement (D9),
and constraint violations in the implementation (D10), with
implementation coverage metrics and specific code locations.

---

## Future Scenarios (Roadmap)

These scenarios describe capabilities that are planned but not yet
implemented. See the [roadmap](roadmap.md) for details.

### "Do our tests actually test what the plan says they should?"

20 changes: 20 additions & 0 deletions manifest.yaml
@@ -152,6 +152,14 @@ protocols:
traceability matrices and classifies divergence using the
specification-drift taxonomy.

- name: code-compliance-audit
path: protocols/reasoning/code-compliance-audit.md
description: >
Systematic protocol for auditing source code against requirements
and design documents. Maps specification claims to code behavior
and classifies findings using the specification-drift taxonomy
(D8–D10).

formats:
- name: requirements-doc
path: formats/requirements-doc.md
@@ -320,6 +328,18 @@ templates:
pipeline_position: 4
requires: [requirements-document, validation-plan]

- name: audit-code-compliance
path: templates/audit-code-compliance.md
description: >
Audit source code against requirements and design documents.
Detects unimplemented requirements, undocumented behavior, and
constraint violations.
persona: specification-analyst
protocols: [anti-hallucination, self-verification, operational-constraints, code-compliance-audit]
taxonomies: [specification-drift]
format: investigation-report
requires: [requirements-document]

investigation:
- name: investigate-bug
path: templates/investigate-bug.md
166 changes: 166 additions & 0 deletions protocols/reasoning/code-compliance-audit.md
@@ -0,0 +1,166 @@
<!-- SPDX-License-Identifier: MIT -->
<!-- Copyright (c) PromptKit Contributors -->

---
name: code-compliance-audit
type: reasoning
description: >
Systematic protocol for auditing source code against requirements and
design documents. Maps specification claims to code behavior, detects
unimplemented requirements, undocumented behavior, and constraint
violations. Classifies findings using the specification-drift taxonomy
(D8–D10).
applicable_to:
- audit-code-compliance
---

# Protocol: Code Compliance Audit

Apply this protocol when auditing source code against requirements and
design documents to determine whether the implementation matches the
specification. The goal is to find every gap between what was specified
and what was built — in both directions.

## Phase 1: Specification Inventory

Extract the audit targets from the specification documents.

1. **Requirements document** — extract:
- Every REQ-ID with its summary, acceptance criteria, and category
- Every constraint (performance, security, behavioral)
- Every assumption that affects implementation
- Defined terms and their precise meanings

2. **Design document** (if provided) — extract:
- Components, modules, and interfaces described
- API contracts (signatures, pre/postconditions, error handling)
- Data models and state management approach
- Non-functional strategies (caching, pooling, concurrency model)
- Explicit mapping of design elements to REQ-IDs

3. **Build a requirements checklist**: a flat list of every testable
claim from the specification that can be verified against code.
Each entry has: REQ-ID, the specific behavior or constraint, and
what evidence in code would confirm implementation.
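For illustration, one checklist entry could be modeled like this (the field names and the sample values are assumptions, not part of the protocol):

```python
from dataclasses import dataclass

# Hypothetical sketch of one checklist entry (Phase 1, step 3);
# field names are illustrative, not prescribed by the protocol.
@dataclass
class ChecklistEntry:
    req_id: str         # e.g. "REQ-042"
    claim: str          # the specific behavior or constraint
    evidence_hint: str  # what code evidence would confirm implementation

entry = ChecklistEntry(
    req_id="REQ-042",
    claim="Passwords MUST NOT be stored in plaintext",
    evidence_hint="password-hashing call on every persistence path",
)
```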

## Phase 2: Code Inventory

Survey the source code to understand its structure before tracing.

1. **Module/component map**: Identify the major code modules, classes,
or packages and their responsibilities.
2. **API surface**: Catalog public functions, endpoints, interfaces —
the externally visible behavior.
3. **Configuration and feature flags**: Identify behavior that is
conditionally enabled or parameterized.
4. **Error handling paths**: Catalog how errors are handled — these
often implement (or fail to implement) requirements around
reliability and graceful degradation.

Do NOT attempt to understand every line of code. Focus on the
**behavioral surface** — what the code does, not how it does it
internally — unless the specification constrains the implementation
approach.

## Phase 3: Forward Traceability (Specification → Code)

For each requirement in the checklist:

1. **Search for implementation**: Identify the code module(s),
function(s), or path(s) that implement this requirement.
- Look for explicit references (comments citing REQ-IDs, function
names matching requirement concepts).
- Look for behavioral evidence (code that performs the specified
action under the specified conditions).
- Check configuration and feature flags that may gate the behavior.

2. **Assess implementation completeness**:
- Does the code implement the **full** requirement, including edge
cases described in acceptance criteria?
- Does the code implement the requirement under all specified
conditions, or only the common case?
- Are constraints (performance, resource limits, timing) enforced?

3. **Classify the result**:
- **IMPLEMENTED**: Code clearly implements the requirement. Record
the code location(s) as evidence.
- **PARTIALLY IMPLEMENTED**: Some aspects are present but acceptance
criteria are not fully met. Flag as D8_UNIMPLEMENTED_REQUIREMENT
with the finding describing what is present and what is missing.
Set confidence to Medium.
- **NOT IMPLEMENTED**: No code implements this requirement. Flag as
D8_UNIMPLEMENTED_REQUIREMENT with confidence High.
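The classification rules above can be sketched as a small function. The status strings and the label/confidence pairs follow the protocol text; the function name and return shape are assumptions:

```python
# Illustrative sketch of the Phase 3 classification rules. Returns
# None for IMPLEMENTED (no finding; record code locations as evidence),
# otherwise a (drift label, confidence) pair per the protocol text.
def classify_forward_trace(status: str):
    if status == "IMPLEMENTED":
        return None
    if status == "PARTIALLY_IMPLEMENTED":
        return ("D8_UNIMPLEMENTED_REQUIREMENT", "Medium")
    if status == "NOT_IMPLEMENTED":
        return ("D8_UNIMPLEMENTED_REQUIREMENT", "High")
    raise ValueError(f"unknown status: {status!r}")
```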

## Phase 4: Backward Traceability (Code → Specification)

Identify code behavior that is not specified.

1. **For each significant code module or feature**: determine whether
it traces to a requirement or design element.
- "Significant" means it implements user-facing behavior, data
processing, access control, external communication, or state
changes. Infrastructure (logging, metrics, boilerplate) is not
significant unless the specification constrains it.

2. **Flag undocumented behavior**:
- Code that implements meaningful behavior but traces to no
requirement is a candidate D9_UNDOCUMENTED_BEHAVIOR.
- Distinguish between: (a) genuine scope creep, (b) reasonable
infrastructure that supports requirements indirectly, and
(c) requirements gaps (behavior that should have been specified).
Report all three, but note the distinction.

## Phase 5: Constraint Verification

Check that specified constraints are respected in the implementation.

1. **For each constraint in the requirements**:
- Identify the code path(s) responsible for satisfying it.
- Assess whether the implementation approach **can** satisfy the
constraint (algorithmic feasibility, not just correctness).
- Check for explicit violations — code that demonstrably contradicts
the constraint.

2. **Common constraint categories to check**:
- Performance: response time limits, throughput requirements,
resource consumption bounds
- Security: encryption requirements, authentication enforcement,
input validation, access control
- Data integrity: validation rules, consistency guarantees,
atomicity requirements
- Compatibility: API versioning, backward compatibility,
interoperability constraints

3. **Flag violations** as D10_CONSTRAINT_VIOLATION_IN_CODE with
specific evidence (code location, the constraint, and how the
code violates it).
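A flagged violation might be recorded like this (a hypothetical example: the field names, REQ-ID, file path, and function name are all invented for illustration):

```python
# Illustrative D10 finding record; the field names and every example
# value below (REQ-017, auth/signup.py, create_user) are hypothetical.
def d10_finding(constraint, spec_location, code_location, violation):
    return {
        "label": "D10_CONSTRAINT_VIOLATION_IN_CODE",
        "constraint": constraint,
        "spec_location": spec_location,
        "code_location": code_location,
        "violation": violation,
    }

finding = d10_finding(
    constraint="Passwords MUST NOT be stored in plaintext",
    spec_location="REQ-017, section 4.2",
    code_location="auth/signup.py: create_user()",
    violation="writes the raw password field directly to the users table",
)
```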

## Phase 6: Classification and Reporting

Classify every finding using the specification-drift taxonomy.

1. Assign exactly one drift label (D8, D9, or D10) to each finding.
2. Assign severity using the taxonomy's severity guidance.
3. For each finding, provide:
- The drift label and short title
- The spec location (REQ-ID, section) and code location (file,
function, line range). For D9 findings, the spec location is
"None — no matching requirement identified" with a description
of what was searched.
- Evidence: what the spec says and what the code does (or doesn't)
- Impact: what could go wrong
- Recommended resolution
4. Order findings primarily by severity, then by taxonomy ranking
within each severity tier.

## Phase 7: Coverage Summary

After reporting individual findings, produce aggregate metrics:

1. **Implementation coverage**: % of REQ-IDs with confirmed
implementations in code.
2. **Undocumented behavior rate**: count of significant code behaviors
that trace to no requirement.
3. **Constraint compliance**: count of constraints verified vs.
violated vs. unverifiable from code analysis alone.
4. **Overall assessment**: a summary judgment of code-to-spec alignment.
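The aggregate metrics above could be computed, as a sketch, like this (function and key names are assumptions; the inputs are the Phase 3-5 results):

```python
# Illustrative Phase 7 aggregation; names are assumptions.
def coverage_summary(checklist_results, undocumented_count,
                     constraints_verified, constraints_violated,
                     constraints_unverifiable):
    """checklist_results maps REQ-ID -> Phase 3 status string."""
    total = len(checklist_results)
    implemented = sum(
        1 for s in checklist_results.values() if s == "IMPLEMENTED"
    )
    return {
        "implementation_coverage_pct":
            round(100 * implemented / total, 1) if total else 0.0,
        "undocumented_behavior_count": undocumented_count,
        "constraints": {
            "verified": constraints_verified,
            "violated": constraints_violated,
            "unverifiable": constraints_unverifiable,
        },
    }
```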
86 changes: 72 additions & 14 deletions taxonomies/specification-drift.md
@@ -12,6 +12,7 @@ description: >
domain: specification-traceability
applicable_to:
- audit-traceability
- audit-code-compliance
---

# Taxonomy: Specification Drift
@@ -141,33 +142,90 @@ criteria are not verified.
**Severity guidance**: High. This is more dangerous than D2 (untested
requirement) because it creates a false sense of coverage.

## Code Compliance Labels

### D8_UNIMPLEMENTED_REQUIREMENT

A requirement exists in the requirements document but has no
corresponding implementation in the source code.

**Pattern**: REQ-ID specifies a behavior, constraint, or capability.
No function, module, class, or code path in the source implements
or enforces this requirement.

**Risk**: The requirement was specified but never built. The system
does not deliver this capability despite it being in the spec.

**Severity guidance**: Critical when the requirement is safety-critical
or security-related. High for functional requirements. Medium for
non-functional requirements that affect quality attributes.

### D9_UNDOCUMENTED_BEHAVIOR

The source code implements behavior that is not specified in any
requirement or design document.

**Pattern**: A function, module, or code path implements meaningful
behavior (not just infrastructure like logging or error handling)
that does not trace to any REQ-ID in the requirements document or
any section in the design document.

**Risk**: Scope creep in implementation — the code does more than
was specified. The undocumented behavior may be intentional (a missing
requirement) or accidental (a developer's assumption). Either way,
it is untested against any specification.

**Severity guidance**: Medium when the behavior is benign feature
logic. High when the behavior involves security, access control,
data mutation, or external communication — undocumented behavior
in these areas is a security concern.

### D10_CONSTRAINT_VIOLATION_IN_CODE

The source code violates a constraint stated in the requirements or
design document.

**Pattern**: The requirements document states a constraint (e.g.,
"MUST respond within 200ms", "MUST NOT store passwords in plaintext",
"MUST use TLS 1.3 or later") and the source code demonstrably violates
it — through algorithmic choice, missing implementation, or explicit
contradiction.

**Risk**: The implementation will not meet requirements. Unlike D6
(constraint violation in design), this is a concrete defect in code,
not a planning gap.

**Severity guidance**: Critical when the violated constraint is
safety-critical, security-related, or regulatory. High for performance
or functional constraints. Assess based on the constraint itself,
not the code's complexity.

## Reserved Labels (Future Use)

The following label ranges are reserved for future specification drift
categories involving implementation and test code:
The following label range is reserved for future specification drift
categories involving test code:

- **D8–D10**: Reserved for **code compliance** drift (requirements/design
vs. source code). Example: D8_UNIMPLEMENTED_REQUIREMENT — a requirement
has no corresponding implementation in source code.
- **D11–D13**: Reserved for **test compliance** drift (validation plan
vs. test code). Example: D11_UNIMPLEMENTED_TEST_CASE — a test case in
the validation plan has no corresponding automated test.

These labels will be defined when the corresponding audit templates
(`audit-code-compliance`, `audit-test-compliance`) are added to the
library.
These labels will be defined when the `audit-test-compliance` template
is added to the library.

## Ranking Criteria

Within a given severity level, order findings by impact on specification
integrity:

1. **Highest risk**: D6 (active constraint violation) and D7 (illusory
coverage) — these indicate the documents are actively misleading.
2. **High risk**: D2 (untested requirement) and D5 (assumption drift) —
these indicate silent gaps that will surface late.
3. **Medium risk**: D1 (untraced requirement) and D3 (orphaned design) —
these indicate incomplete traceability that needs human resolution.
1. **Highest risk**: D6 (constraint violation in design), D7 (illusory
test coverage), and D10 (constraint violation in code) — these
indicate active conflicts between artifacts.
2. **High risk**: D2 (untested requirement), D5 (assumption drift), and
D8 (unimplemented requirement) — these indicate silent gaps that
will surface late.
3. **Medium risk**: D1 (untraced requirement), D3 (orphaned design),
and D9 (undocumented behavior) — these indicate incomplete
traceability that needs human resolution.
4. **Lowest risk**: D4 (orphaned test case) — effort misdirection but
no safety or correctness impact.
