diff --git a/docs/scenarios.md b/docs/scenarios.md index 3e8c54c..448eb3a 100644 --- a/docs/scenarios.md +++ b/docs/scenarios.md @@ -137,13 +137,6 @@ stale, and the critical ones are buried under feature requests. priority and effort, identifying patterns and duplicates, and recommending a workflow for the next sprint. ---- - -## Future Scenarios (Roadmap) - -These scenarios describe capabilities that are planned but not yet -implemented. See the [roadmap](roadmap.md) for details. - ### "Does the code actually implement what the spec says?" You have a requirements document and a design document. The code has @@ -151,12 +144,20 @@ been written. But does it actually implement the specified behavior? Are there requirements with no implementation? Features in the code that nobody asked for? -**Planned template:** `audit-code-compliance` · -**Taxonomy:** `specification-drift` (D8–D10) +**Template:** `audit-code-compliance` · **Persona:** `specification-analyst` · +**Protocol:** `code-compliance-audit` · **Taxonomy:** `specification-drift` (D8–D10) -**What you'd get:** An investigation report listing unimplemented -requirements, code behavior not traced to any requirement, and -mismatched assumptions between the spec and the implementation. +**What you get:** An investigation report listing unimplemented +requirements (D8), code behavior not traced to any requirement (D9), +and constraint violations in the implementation (D10), with +implementation coverage metrics and specific code locations. + +--- + +## Future Scenarios (Roadmap) + +These scenarios describe capabilities that are planned but not yet +implemented. See the [roadmap](roadmap.md) for details. ### "Do our tests actually test what the plan says they should?" diff --git a/manifest.yaml b/manifest.yaml index 7056aa3..c7ad756 100644 --- a/manifest.yaml +++ b/manifest.yaml @@ -152,6 +152,14 @@ protocols: traceability matrices and classifies divergence using the specification-drift taxonomy. 
+ - name: code-compliance-audit + path: protocols/reasoning/code-compliance-audit.md + description: > + Systematic protocol for auditing source code against requirements + and design documents. Maps specification claims to code behavior + and classifies findings using the specification-drift taxonomy + (D8–D10). + formats: - name: requirements-doc path: formats/requirements-doc.md @@ -320,6 +328,18 @@ templates: pipeline_position: 4 requires: [requirements-document, validation-plan] + - name: audit-code-compliance + path: templates/audit-code-compliance.md + description: > + Audit source code against requirements and design documents. + Detects unimplemented requirements, undocumented behavior, and + constraint violations. + persona: specification-analyst + protocols: [anti-hallucination, self-verification, operational-constraints, code-compliance-audit] + taxonomies: [specification-drift] + format: investigation-report + requires: requirements-document + investigation: - name: investigate-bug path: templates/investigate-bug.md diff --git a/protocols/reasoning/code-compliance-audit.md b/protocols/reasoning/code-compliance-audit.md new file mode 100644 index 0000000..49a6b8d --- /dev/null +++ b/protocols/reasoning/code-compliance-audit.md @@ -0,0 +1,166 @@ + + + +--- +name: code-compliance-audit +type: reasoning +description: > + Systematic protocol for auditing source code against requirements and + design documents. Maps specification claims to code behavior, detects + unimplemented requirements, undocumented behavior, and constraint + violations. Classifies findings using the specification-drift taxonomy + (D8–D10). +applicable_to: + - audit-code-compliance +--- + +# Protocol: Code Compliance Audit + +Apply this protocol when auditing source code against requirements and +design documents to determine whether the implementation matches the +specification. The goal is to find every gap between what was specified +and what was built — in both directions. 
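The two audit directions amount to set differences between specified requirement IDs and traced code behaviors. A minimal sketch of that idea, with invented REQ-IDs and behavior names; nothing here is part of the protocol itself:

```python
# Hypothetical illustration of the two audit directions.
# All REQ-IDs and behavior names below are invented.
spec_reqs = {"REQ-001", "REQ-002", "REQ-003"}
code_behaviors = {"rate_limiting", "audit_logging", "csv_export"}

# Tracing links established during the audit (Phases 3-4).
traced = {"REQ-001": "rate_limiting", "REQ-002": "audit_logging"}

# Forward gap: specified but never built (candidate D8 findings).
unimplemented = spec_reqs - traced.keys()

# Backward gap: built but never specified (candidate D9 findings).
undocumented = code_behaviors - set(traced.values())

print(sorted(unimplemented))  # ['REQ-003']
print(sorted(undocumented))   # ['csv_export']
```

In practice the tracing links are built by the phases below; the sketch only shows why both directions must be checked to find every gap.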
+ +## Phase 1: Specification Inventory + +Extract the audit targets from the specification documents. + +1. **Requirements document** — extract: + - Every REQ-ID with its summary, acceptance criteria, and category + - Every constraint (performance, security, behavioral) + - Every assumption that affects implementation + - Defined terms and their precise meanings + +2. **Design document** (if provided) — extract: + - Components, modules, and interfaces described + - API contracts (signatures, pre/postconditions, error handling) + - Data models and state management approach + - Non-functional strategies (caching, pooling, concurrency model) + - Explicit mapping of design elements to REQ-IDs + +3. **Build a requirements checklist**: a flat list of every testable + claim from the specification that can be verified against code. + Each entry has: REQ-ID, the specific behavior or constraint, and + what evidence in code would confirm implementation. + +## Phase 2: Code Inventory + +Survey the source code to understand its structure before tracing. + +1. **Module/component map**: Identify the major code modules, classes, + or packages and their responsibilities. +2. **API surface**: Catalog public functions, endpoints, interfaces — + the externally visible behavior. +3. **Configuration and feature flags**: Identify behavior that is + conditionally enabled or parameterized. +4. **Error handling paths**: Catalog how errors are handled — these + often implement (or fail to implement) requirements around + reliability and graceful degradation. + +Do NOT attempt to understand every line of code. Focus on the +**behavioral surface** — what the code does, not how it does it +internally — unless the specification constrains the implementation +approach. + +## Phase 3: Forward Traceability (Specification → Code) + +For each requirement in the checklist: + +1. **Search for implementation**: Identify the code module(s), + function(s), or path(s) that implement this requirement. 
+ - Look for explicit references (comments citing REQ-IDs, function + names matching requirement concepts). + - Look for behavioral evidence (code that performs the specified + action under the specified conditions). + - Check configuration and feature flags that may gate the behavior. + +2. **Assess implementation completeness**: + - Does the code implement the **full** requirement, including edge + cases described in acceptance criteria? + - Does the code implement the requirement under all specified + conditions, or only the common case? + - Are constraints (performance, resource limits, timing) enforced? + +3. **Classify the result**: + - **IMPLEMENTED**: Code clearly implements the requirement. Record + the code location(s) as evidence. + - **PARTIALLY IMPLEMENTED**: Some aspects are present but acceptance + criteria are not fully met. Flag as D8_UNIMPLEMENTED_REQUIREMENT + with the finding describing what is present and what is missing. + Set confidence to Medium. + - **NOT IMPLEMENTED**: No code implements this requirement. Flag as + D8_UNIMPLEMENTED_REQUIREMENT with confidence High. + +## Phase 4: Backward Traceability (Code → Specification) + +Identify code behavior that is not specified. + +1. **For each significant code module or feature**: determine whether + it traces to a requirement or design element. + - "Significant" means it implements user-facing behavior, data + processing, access control, external communication, or state + changes. Infrastructure (logging, metrics, boilerplate) is not + significant unless the specification constrains it. + +2. **Flag undocumented behavior**: + - Code that implements meaningful behavior with no tracing + requirement is a candidate D9_UNDOCUMENTED_BEHAVIOR. + - Distinguish between: (a) genuine scope creep, (b) reasonable + infrastructure that supports requirements indirectly, and + (c) requirements gaps (behavior that should have been specified). + Report all three, but note the distinction. 
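The classification rules of Phases 3 and 4 can be captured in a small record type. This is a non-normative sketch: the status strings mirror the phase text above, but the class and field names are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TraceResult:
    """Outcome of tracing one requirement (Phase 3). Illustrative only."""
    req_id: str
    status: str  # "IMPLEMENTED" | "PARTIALLY IMPLEMENTED" | "NOT IMPLEMENTED"
    code_locations: List[str] = field(default_factory=list)

    def drift_label(self) -> Optional[str]:
        # Partial and missing implementations are both flagged as D8;
        # a confirmed implementation carries no drift label.
        if self.status == "IMPLEMENTED":
            return None
        return "D8_UNIMPLEMENTED_REQUIREMENT"

    def confidence(self) -> Optional[str]:
        # Phase 3, step 3: Medium for partial, High for fully absent.
        return {"PARTIALLY IMPLEMENTED": "Medium",
                "NOT IMPLEMENTED": "High"}.get(self.status)
```

A requirement with no matching code (here the hypothetical `REQ-042`) would be recorded as `TraceResult("REQ-042", "NOT IMPLEMENTED")`, yielding label D8 at High confidence.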
+ +## Phase 5: Constraint Verification + +Check that specified constraints are respected in the implementation. + +1. **For each constraint in the requirements**: + - Identify the code path(s) responsible for satisfying it. + - Assess whether the implementation approach **can** satisfy the + constraint (algorithmic feasibility, not just correctness). + - Check for explicit violations — code that demonstrably contradicts + the constraint. + +2. **Common constraint categories to check**: + - Performance: response time limits, throughput requirements, + resource consumption bounds + - Security: encryption requirements, authentication enforcement, + input validation, access control + - Data integrity: validation rules, consistency guarantees, + atomicity requirements + - Compatibility: API versioning, backward compatibility, + interoperability constraints + +3. **Flag violations** as D10_CONSTRAINT_VIOLATION_IN_CODE with + specific evidence (code location, the constraint, and how the + code violates it). + +## Phase 6: Classification and Reporting + +Classify every finding using the specification-drift taxonomy. + +1. Assign exactly one drift label (D8, D9, or D10) to each finding. +2. Assign severity using the taxonomy's severity guidance. +3. For each finding, provide: + - The drift label and short title + - The spec location (REQ-ID, section) and code location (file, + function, line range). For D9 findings, the spec location is + "None — no matching requirement identified" with a description + of what was searched. + - Evidence: what the spec says and what the code does (or doesn't) + - Impact: what could go wrong + - Recommended resolution +4. Order findings primarily by severity, then by taxonomy ranking + within each severity tier. + +## Phase 7: Coverage Summary + +After reporting individual findings, produce aggregate metrics: + +1. **Implementation coverage**: % of REQ-IDs with confirmed + implementations in code. +2. 
**Undocumented behavior rate**: count of significant code behaviors + with no tracing requirement. +3. **Constraint compliance**: count of constraints verified vs. + violated vs. unverifiable from code analysis alone. +4. **Overall assessment**: a summary judgment of code-to-spec alignment. diff --git a/taxonomies/specification-drift.md b/taxonomies/specification-drift.md index 4f0f5d1..ea223e9 100644 --- a/taxonomies/specification-drift.md +++ b/taxonomies/specification-drift.md @@ -12,6 +12,7 @@ description: > domain: specification-traceability applicable_to: - audit-traceability + - audit-code-compliance --- # Taxonomy: Specification Drift @@ -141,33 +142,90 @@ criteria are not verified. **Severity guidance**: High. This is more dangerous than D2 (untested requirement) because it creates a false sense of coverage. +## Code Compliance Labels + +### D8_UNIMPLEMENTED_REQUIREMENT + +A requirement exists in the requirements document but has no +corresponding implementation in the source code. + +**Pattern**: REQ-ID specifies a behavior, constraint, or capability. +No function, module, class, or code path in the source implements +or enforces this requirement. + +**Risk**: The requirement was specified but never built. The system +does not deliver this capability despite it being in the spec. + +**Severity guidance**: Critical when the requirement is safety-critical +or security-related. High for functional requirements. Medium for +non-functional requirements that affect quality attributes. + +### D9_UNDOCUMENTED_BEHAVIOR + +The source code implements behavior that is not specified in any +requirement or design document. + +**Pattern**: A function, module, or code path implements meaningful +behavior (not just infrastructure like logging or error handling) +that does not trace to any REQ-ID in the requirements document or +any section in the design document. + +**Risk**: Scope creep in implementation — the code does more than +was specified. 
The undocumented behavior may be intentional (a missing +requirement) or accidental (a developer's assumption). Either way, +it is untested against any specification. + +**Severity guidance**: Medium when the behavior is benign feature +logic. High when the behavior involves security, access control, +data mutation, or external communication — undocumented behavior +in these areas is a security concern. + +### D10_CONSTRAINT_VIOLATION_IN_CODE + +The source code violates a constraint stated in the requirements or +design document. + +**Pattern**: The requirements document states a constraint (e.g., +"MUST respond within 200ms", "MUST NOT store passwords in plaintext", +"MUST use TLS 1.3 or later") and the source code demonstrably violates +it — through algorithmic choice, missing implementation, or explicit +contradiction. + +**Risk**: The implementation will not meet requirements. Unlike D6 +(constraint violation in design), this is a concrete defect in code, +not a planning gap. + +**Severity guidance**: Critical when the violated constraint is +safety-critical, security-related, or regulatory. High for performance +or functional constraints. Assess based on the constraint itself, +not the code's complexity. + ## Reserved Labels (Future Use) -The following label ranges are reserved for future specification drift -categories involving implementation and test code: +The following label range is reserved for future specification drift +categories involving test code: -- **D8–D10**: Reserved for **code compliance** drift (requirements/design - vs. source code). Example: D8_UNIMPLEMENTED_REQUIREMENT — a requirement - has no corresponding implementation in source code. - **D11–D13**: Reserved for **test compliance** drift (validation plan vs. test code). Example: D11_UNIMPLEMENTED_TEST_CASE — a test case in the validation plan has no corresponding automated test. 
-These labels will be defined when the corresponding audit templates -(`audit-code-compliance`, `audit-test-compliance`) are added to the -library. +These labels will be defined when the `audit-test-compliance` template +is added to the library. ## Ranking Criteria Within a given severity level, order findings by impact on specification integrity: -1. **Highest risk**: D6 (active constraint violation) and D7 (illusory - coverage) — these indicate the documents are actively misleading. -2. **High risk**: D2 (untested requirement) and D5 (assumption drift) — - these indicate silent gaps that will surface late. -3. **Medium risk**: D1 (untraced requirement) and D3 (orphaned design) — - these indicate incomplete traceability that needs human resolution. +1. **Highest risk**: D6 (constraint violation in design), D7 (illusory + test coverage), and D10 (constraint violation in code) — these + indicate active conflicts between artifacts. +2. **High risk**: D2 (untested requirement), D5 (assumption drift), and + D8 (unimplemented requirement) — these indicate silent gaps that + will surface late. +3. **Medium risk**: D1 (untraced requirement), D3 (orphaned design), + and D9 (undocumented behavior) — these indicate incomplete + traceability that needs human resolution. 4. **Lowest risk**: D4 (orphaned test case) — effort misdirection but no safety or correctness impact. diff --git a/templates/audit-code-compliance.md b/templates/audit-code-compliance.md new file mode 100644 index 0000000..dbf48a5 --- /dev/null +++ b/templates/audit-code-compliance.md @@ -0,0 +1,132 @@ + + + +--- +name: audit-code-compliance +description: > + Audit source code against requirements and design documents for + specification drift. Detects unimplemented requirements, undocumented + behavior, and constraint violations. Classifies findings using the + specification-drift taxonomy (D8–D10). 
+persona: specification-analyst +protocols: + - guardrails/anti-hallucination + - guardrails/self-verification + - guardrails/operational-constraints + - reasoning/code-compliance-audit +taxonomies: + - specification-drift +format: investigation-report +params: + project_name: "Name of the project or feature being audited" + requirements_doc: "The requirements document content" + design_doc: "The design document content (optional — omit for a requirements-only audit)" + code_context: "Source code to audit — files, modules, or repository path" + focus_areas: "Optional narrowing — e.g., 'security requirements only', 'API contracts' (default: audit all)" + audience: "Who will read the audit report — e.g., 'engineering leads', 'development team'" +input_contract: + type: requirements-document + description: > + A requirements document with numbered REQ-IDs and acceptance criteria. + Source code to audit against the specification. + Optionally, a design document with architecture and design decisions. +output_contract: + type: investigation-report + description: > + An investigation report classifying code compliance findings + using the D8–D10 taxonomy, with implementation coverage metrics + and remediation recommendations. +--- + +# Task: Audit Code Compliance + +You are tasked with auditing source code against its specification +documents to detect **code compliance drift** — gaps between what was +specified and what was built. + +## Inputs + +**Project Name**: {{project_name}} + +**Requirements Document**: +{{requirements_doc}} + +**Design Document** (if provided): +{{design_doc}} + +**Source Code**: +{{code_context}} + +**Focus Areas**: {{focus_areas}} + +## Instructions + +1. **Apply the code-compliance-audit protocol.** Execute all phases in + order. This is the core methodology — do not skip phases. + +2. **Classify every finding** using the specification-drift taxonomy + (D8–D10). 
Every finding MUST have exactly one drift label, a severity, + evidence, and a recommended resolution. Include specific locations in + both the spec and the code — except for D9 findings, which by + definition have no spec location (use "None — no matching requirement + identified" and describe what was searched). + +3. **If the design document is not provided**, skip design-related + checks. Trace requirements directly to code without an intermediate + design layer. Do NOT fabricate design content. + +4. **If focus areas are specified**, perform the full inventories + (Phases 1–2) but restrict detailed tracing (Phases 3–5) to + requirements and code modules related to the focus areas. + +5. **Apply the anti-hallucination protocol.** Every finding must cite + specific REQ-IDs and code locations. Do NOT invent requirements or + claim code implements behavior you cannot point to. If you cannot + fully trace a requirement due to incomplete code context, assign the + appropriate drift label (D8) but set its confidence to Low and state + what additional code would be needed to confirm. + +6. **Apply the operational-constraints protocol.** Do not attempt to + ingest the entire codebase. Focus on the behavioral surface — public + APIs, entry points, configuration, error handling — and trace inward + only as needed to verify specific requirements. + +7. **Format the output** according to the investigation-report format. + Map the protocol's output to the report structure: + - Phase 1–2 inventories → Investigation Scope (section 3) + - Phases 3–5 findings → Findings (section 4), one F-NNN per issue + - Phase 6 classification → Finding severity and categorization + - Phase 7 coverage summary → Executive Summary (section 1) and + a "Coverage Metrics" subsection in Root Cause Analysis (section 5) + - Recommended resolutions → Remediation Plan (section 6) + +8. 
**Quality checklist** — before finalizing, verify: + - [ ] Every REQ-ID from the requirements document appears in at least + one finding or is confirmed as implemented + - [ ] Every finding has a specific drift label (D8, D9, or D10) + - [ ] Every finding cites both spec and code locations (D9 findings + use "None — no matching requirement identified" for spec location) + - [ ] D8 findings include what was expected and why no implementation + was found + - [ ] D9 findings include the undocumented code behavior and why it + does not trace to any requirement + - [ ] D10 findings include the specific constraint and how the code + violates it + - [ ] Coverage metrics are calculated from actual counts + - [ ] The executive summary is understandable without reading the + full report + +## Non-Goals + +- Do NOT modify the source code — report findings only. +- Do NOT execute or test the code — this is static analysis against + the specification, not runtime verification. +- Do NOT assess code quality (style, readability, complexity) unless + it directly relates to a specification requirement. +- Do NOT generate missing requirements or design sections — identify + and classify the gaps. +- Do NOT evaluate whether the requirements are correct for the domain — + only whether the code implements them. +- Do NOT expand scope beyond the provided documents and code. External + knowledge about the domain may inform severity assessment but must + not introduce findings that are not evidenced in the inputs.
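The checklist requirement that coverage metrics be "calculated from actual counts" reduces to simple arithmetic over the Phase 3-5 tallies. A minimal sketch with invented counts:

```python
# Phase 7 coverage arithmetic with invented example counts.
total_reqs = 40           # REQ-IDs extracted in Phase 1
implemented = 31          # confirmed IMPLEMENTED in Phase 3
undocumented_count = 5    # D9 findings from Phase 4
constraints = {"verified": 9, "violated": 2, "unverifiable": 1}

coverage_pct = 100 * implemented / total_reqs
print(f"Implementation coverage: {coverage_pct:.1f}% "
      f"({implemented}/{total_reqs} REQ-IDs)")
print(f"Undocumented behavior count: {undocumented_count}")
print(f"Constraint compliance: {constraints['verified']} verified, "
      f"{constraints['violated']} violated, "
      f"{constraints['unverifiable']} unverifiable")
```

Reporting the raw counts alongside the percentage keeps the executive summary auditable: a reader can recompute every metric from the findings list.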