25 changes: 13 additions & 12 deletions docs/scenarios.md
@@ -137,26 +137,27 @@ stale, and the critical ones are buried under feature requests.
priority and effort, identifying patterns and duplicates, and
recommending a workflow for the next sprint.

---

## Future Scenarios (Roadmap)

These scenarios describe capabilities that are planned but not yet
implemented. See the [roadmap](roadmap.md) for details.

### "Does the code actually implement what the spec says?"

You have a requirements document and a design document. The code has
been written. But does it actually implement the specified behavior?
Are there requirements with no implementation? Features in the code
that nobody asked for?

**Planned template:** `audit-code-compliance` ·
**Taxonomy:** `specification-drift` (D8–D10)
**Template:** `audit-code-compliance` · **Persona:** `specification-analyst` ·
**Protocol:** `code-compliance-audit` · **Taxonomy:** `specification-drift` (D8–D10)

**What you'd get:** An investigation report listing unimplemented
requirements, code behavior not traced to any requirement, and
mismatched assumptions between the spec and the implementation.
**What you get:** An investigation report listing unimplemented
requirements (D8), code behavior not traced to any requirement (D9),
and constraint violations in the implementation (D10), with
implementation coverage metrics and specific code locations.

---

## Future Scenarios (Roadmap)

These scenarios describe capabilities that are planned but not yet
implemented. See the [roadmap](roadmap.md) for details.

### "Do our tests actually test what the plan says they should?"

20 changes: 20 additions & 0 deletions manifest.yaml
@@ -152,6 +152,14 @@ protocols:
traceability matrices and classifies divergence using the
specification-drift taxonomy.

- name: code-compliance-audit
path: protocols/reasoning/code-compliance-audit.md
description: >
Systematic protocol for auditing source code against requirements
and design documents. Maps specification claims to code behavior
and classifies findings using the specification-drift taxonomy
(D8–D10).

formats:
- name: requirements-doc
path: formats/requirements-doc.md
@@ -320,6 +328,18 @@ templates:
pipeline_position: 4
requires: [requirements-document, validation-plan]

- name: audit-code-compliance
path: templates/audit-code-compliance.md
description: >
Audit source code against requirements and design documents.
Detects unimplemented requirements, undocumented behavior, and
constraint violations.
persona: specification-analyst
protocols: [anti-hallucination, self-verification, operational-constraints, code-compliance-audit]
taxonomies: [specification-drift]
format: investigation-report
requires: [requirements-document]

investigation:
- name: investigate-bug
path: templates/investigate-bug.md
166 changes: 166 additions & 0 deletions protocols/reasoning/code-compliance-audit.md
@@ -0,0 +1,166 @@
<!-- SPDX-License-Identifier: MIT -->
<!-- Copyright (c) PromptKit Contributors -->

---
name: code-compliance-audit
type: reasoning
description: >
Systematic protocol for auditing source code against requirements and
design documents. Maps specification claims to code behavior, detects
unimplemented requirements, undocumented behavior, and constraint
violations. Classifies findings using the specification-drift taxonomy
(D8–D10).
applicable_to:
- audit-code-compliance
---

# Protocol: Code Compliance Audit

Apply this protocol when auditing source code against requirements and
design documents to determine whether the implementation matches the
specification. The goal is to find every gap between what was specified
and what was built — in both directions.

## Phase 1: Specification Inventory

Extract the audit targets from the specification documents.

1. **Requirements document** — extract:
- Every REQ-ID with its summary, acceptance criteria, and category
- Every constraint (performance, security, behavioral)
- Every assumption that affects implementation
- Defined terms and their precise meanings

2. **Design document** (if provided) — extract:
- Components, modules, and interfaces described
- API contracts (signatures, pre/postconditions, error handling)
- Data models and state management approach
- Non-functional strategies (caching, pooling, concurrency model)
- Explicit mapping of design elements to REQ-IDs

3. **Build a requirements checklist**: a flat list of every testable
claim from the specification that can be verified against code.
Each entry has: REQ-ID, the specific behavior or constraint, and
what evidence in code would confirm implementation.
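For illustration, one checklist entry could be modeled like this (the field names and the sample values are assumptions, not part of the protocol):

```python
from dataclasses import dataclass

# Hypothetical sketch of one checklist entry (Phase 1, step 3);
# field names are illustrative, not prescribed by the protocol.
@dataclass
class ChecklistEntry:
    req_id: str         # e.g. "REQ-042"
    claim: str          # the specific behavior or constraint
    evidence_hint: str  # what code evidence would confirm implementation

entry = ChecklistEntry(
    req_id="REQ-042",
    claim="Passwords MUST NOT be stored in plaintext",
    evidence_hint="password-hashing call on every persistence path",
)
```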

## Phase 2: Code Inventory

Survey the source code to understand its structure before tracing.

1. **Module/component map**: Identify the major code modules, classes,
or packages and their responsibilities.
2. **API surface**: Catalog public functions, endpoints, interfaces —
the externally visible behavior.
3. **Configuration and feature flags**: Identify behavior that is
conditionally enabled or parameterized.
4. **Error handling paths**: Catalog how errors are handled — these
often implement (or fail to implement) requirements around
reliability and graceful degradation.

Do NOT attempt to understand every line of code. Focus on the
**behavioral surface** — what the code does, not how it does it
internally — unless the specification constrains the implementation
approach.

## Phase 3: Forward Traceability (Specification → Code)

For each requirement in the checklist:

1. **Search for implementation**: Identify the code module(s),
function(s), or path(s) that implement this requirement.
- Look for explicit references (comments citing REQ-IDs, function
names matching requirement concepts).
- Look for behavioral evidence (code that performs the specified
action under the specified conditions).
- Check configuration and feature flags that may gate the behavior.

2. **Assess implementation completeness**:
- Does the code implement the **full** requirement, including edge
cases described in acceptance criteria?
- Does the code implement the requirement under all specified
conditions, or only the common case?
- Are constraints (performance, resource limits, timing) enforced?

3. **Classify the result**:
- **IMPLEMENTED**: Code clearly implements the requirement. Record
the code location(s) as evidence.
- **PARTIALLY IMPLEMENTED**: Some aspects are present but acceptance
criteria are not fully met. Flag as D8_UNIMPLEMENTED_REQUIREMENT
with the finding describing what is present and what is missing.
Set confidence to Medium.
- **NOT IMPLEMENTED**: No code implements this requirement. Flag as
D8_UNIMPLEMENTED_REQUIREMENT with confidence High.
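The classification rules above can be sketched as a small function. The status strings and the label/confidence pairs follow the protocol text; the function name and return shape are assumptions:

```python
# Illustrative sketch of the Phase 3 classification rules. Returns
# None for IMPLEMENTED (no finding; record code locations as evidence),
# otherwise a (drift label, confidence) pair per the protocol text.
def classify_forward_trace(status: str):
    if status == "IMPLEMENTED":
        return None
    if status == "PARTIALLY_IMPLEMENTED":
        return ("D8_UNIMPLEMENTED_REQUIREMENT", "Medium")
    if status == "NOT_IMPLEMENTED":
        return ("D8_UNIMPLEMENTED_REQUIREMENT", "High")
    raise ValueError(f"unknown status: {status!r}")
```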

## Phase 4: Backward Traceability (Code → Specification)

Identify code behavior that is not specified.

1. **For each significant code module or feature**: determine whether
it traces to a requirement or design element.
- "Significant" means it implements user-facing behavior, data
processing, access control, external communication, or state
changes. Infrastructure (logging, metrics, boilerplate) is not
significant unless the specification constrains it.

2. **Flag undocumented behavior**:
- Code that implements meaningful behavior but traces to no
requirement is a candidate D9_UNDOCUMENTED_BEHAVIOR.
- Distinguish between: (a) genuine scope creep, (b) reasonable
infrastructure that supports requirements indirectly, and
(c) requirements gaps (behavior that should have been specified).
Report all three, but note the distinction.

## Phase 5: Constraint Verification

Check that specified constraints are respected in the implementation.

1. **For each constraint in the requirements**:
- Identify the code path(s) responsible for satisfying it.
- Assess whether the implementation approach **can** satisfy the
constraint (algorithmic feasibility, not just correctness).
- Check for explicit violations — code that demonstrably contradicts
the constraint.

2. **Common constraint categories to check**:
- Performance: response time limits, throughput requirements,
resource consumption bounds
- Security: encryption requirements, authentication enforcement,
input validation, access control
- Data integrity: validation rules, consistency guarantees,
atomicity requirements
- Compatibility: API versioning, backward compatibility,
interoperability constraints

3. **Flag violations** as D10_CONSTRAINT_VIOLATION_IN_CODE with
specific evidence (code location, the constraint, and how the
code violates it).
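A flagged violation might be recorded like this (a hypothetical example: the field names, REQ-ID, file path, and function name are all invented for illustration):

```python
# Illustrative D10 finding record; the field names and every example
# value below (REQ-017, auth/signup.py, create_user) are hypothetical.
def d10_finding(constraint, spec_location, code_location, violation):
    return {
        "label": "D10_CONSTRAINT_VIOLATION_IN_CODE",
        "constraint": constraint,
        "spec_location": spec_location,
        "code_location": code_location,
        "violation": violation,
    }

finding = d10_finding(
    constraint="Passwords MUST NOT be stored in plaintext",
    spec_location="REQ-017, section 4.2",
    code_location="auth/signup.py: create_user()",
    violation="writes the raw password field directly to the users table",
)
```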

## Phase 6: Classification and Reporting

Classify every finding using the specification-drift taxonomy.

1. Assign exactly one drift label (D8, D9, or D10) to each finding.
2. Assign severity using the taxonomy's severity guidance.
3. For each finding, provide:
- The drift label and short title
- The spec location (REQ-ID, section) and code location (file,
function, line range). For D9 findings, the spec location is
"None — no matching requirement identified" with a description
of what was searched.
- Evidence: what the spec says and what the code does (or doesn't)
- Impact: what could go wrong
- Recommended resolution
4. Order findings primarily by severity, then by taxonomy ranking
within each severity tier.

## Phase 7: Coverage Summary

After reporting individual findings, produce aggregate metrics:

1. **Implementation coverage**: % of REQ-IDs with confirmed
implementations in code.
2. **Undocumented behavior rate**: count of significant code behaviors
that trace to no requirement.
3. **Constraint compliance**: count of constraints verified vs.
violated vs. unverifiable from code analysis alone.
4. **Overall assessment**: a summary judgment of code-to-spec alignment.
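The aggregate metrics above could be computed, as a sketch, like this (function and key names are assumptions; the inputs are the Phase 3-5 results):

```python
# Illustrative Phase 7 aggregation; names are assumptions.
def coverage_summary(checklist_results, undocumented_count,
                     constraints_verified, constraints_violated,
                     constraints_unverifiable):
    """checklist_results maps REQ-ID -> Phase 3 status string."""
    total = len(checklist_results)
    implemented = sum(
        1 for s in checklist_results.values() if s == "IMPLEMENTED"
    )
    return {
        "implementation_coverage_pct":
            round(100 * implemented / total, 1) if total else 0.0,
        "undocumented_behavior_count": undocumented_count,
        "constraints": {
            "verified": constraints_verified,
            "violated": constraints_violated,
            "unverifiable": constraints_unverifiable,
        },
    }
```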
86 changes: 72 additions & 14 deletions taxonomies/specification-drift.md
@@ -12,6 +12,7 @@ description: >
domain: specification-traceability
applicable_to:
- audit-traceability
- audit-code-compliance
---

# Taxonomy: Specification Drift
@@ -141,33 +142,90 @@ criteria are not verified.
**Severity guidance**: High. This is more dangerous than D2 (untested
requirement) because it creates a false sense of coverage.

## Code Compliance Labels

### D8_UNIMPLEMENTED_REQUIREMENT

A requirement exists in the requirements document but has no
corresponding implementation in the source code.

**Pattern**: REQ-ID specifies a behavior, constraint, or capability.
No function, module, class, or code path in the source implements
or enforces this requirement.

**Risk**: The requirement was specified but never built. The system
does not deliver this capability despite it being in the spec.

**Severity guidance**: Critical when the requirement is safety-critical
or security-related. High for functional requirements. Medium for
non-functional requirements that affect quality attributes.

### D9_UNDOCUMENTED_BEHAVIOR

The source code implements behavior that is not specified in any
requirement or design document.

**Pattern**: A function, module, or code path implements meaningful
behavior (not just infrastructure like logging or error handling)
that does not trace to any REQ-ID in the requirements document or
any section in the design document.

**Risk**: Scope creep in implementation — the code does more than
was specified. The undocumented behavior may be intentional (a missing
requirement) or accidental (a developer's assumption). Either way,
it is untested against any specification.

**Severity guidance**: Medium when the behavior is benign feature
logic. High when the behavior involves security, access control,
data mutation, or external communication — undocumented behavior
in these areas is a security concern.

### D10_CONSTRAINT_VIOLATION_IN_CODE

The source code violates a constraint stated in the requirements or
design document.

**Pattern**: The requirements document states a constraint (e.g.,
"MUST respond within 200ms", "MUST NOT store passwords in plaintext",
"MUST use TLS 1.3 or later") and the source code demonstrably violates
it — through algorithmic choice, missing implementation, or explicit
contradiction.

**Risk**: The implementation will not meet requirements. Unlike D6
(constraint violation in design), this is a concrete defect in code,
not a planning gap.

**Severity guidance**: Critical when the violated constraint is
safety-critical, security-related, or regulatory. High for performance
or functional constraints. Assess based on the constraint itself,
not the code's complexity.

## Reserved Labels (Future Use)

The following label ranges are reserved for future specification drift
categories involving implementation and test code:
The following label range is reserved for future specification drift
categories involving test code:

- **D8–D10**: Reserved for **code compliance** drift (requirements/design
vs. source code). Example: D8_UNIMPLEMENTED_REQUIREMENT — a requirement
has no corresponding implementation in source code.
- **D11–D13**: Reserved for **test compliance** drift (validation plan
vs. test code). Example: D11_UNIMPLEMENTED_TEST_CASE — a test case in
the validation plan has no corresponding automated test.

These labels will be defined when the corresponding audit templates
(`audit-code-compliance`, `audit-test-compliance`) are added to the
library.
These labels will be defined when the `audit-test-compliance` template
is added to the library.

## Ranking Criteria

Within a given severity level, order findings by impact on specification
integrity:

1. **Highest risk**: D6 (active constraint violation) and D7 (illusory
coverage) — these indicate the documents are actively misleading.
2. **High risk**: D2 (untested requirement) and D5 (assumption drift) —
these indicate silent gaps that will surface late.
3. **Medium risk**: D1 (untraced requirement) and D3 (orphaned design) —
these indicate incomplete traceability that needs human resolution.
1. **Highest risk**: D6 (constraint violation in design), D7 (illusory
test coverage), and D10 (constraint violation in code) — these
indicate active conflicts between artifacts.
2. **High risk**: D2 (untested requirement), D5 (assumption drift), and
D8 (unimplemented requirement) — these indicate silent gaps that
will surface late.
3. **Medium risk**: D1 (untraced requirement), D3 (orphaned design),
and D9 (undocumented behavior) — these indicate incomplete
traceability that needs human resolution.
4. **Lowest risk**: D4 (orphaned test case) — effort misdirection but
no safety or correctness impact.
