diff --git a/README.md b/README.md index 60e84fb..6c84048 100644 --- a/README.md +++ b/README.md @@ -198,13 +198,40 @@ prompt based on the user's needs. Templates declare **input and output contracts** so they can be chained: ``` -author-requirements-doc → author-design-doc → author-validation-plan - (produces: requirements) (consumes: requirements, (consumes: requirements, - produces: design) produces: validation) +author-requirements-doc → author-design-doc → author-validation-plan → audit-traceability + (produces: requirements) (consumes: requirements, (consumes: requirements, (consumes: requirements + + produces: design) produces: validation) validation; design optional, + produces: drift report) ``` The output of one template becomes the input parameter of the next. +### Use Case: Specification Traceability Audit + +After authoring requirements, design, and validation documents — whether +through PromptKit's pipeline or by hand — you can audit all three for +**specification drift**: gaps, contradictions, and divergence that +accumulate as documents evolve independently. + +```bash +# Assemble a traceability audit prompt +npx @alan-jowett/promptkit assemble audit-traceability \ + -p project_name="Auth Service" \ + -p requirements_doc="$(cat requirements.md)" \ + -p design_doc="$(cat design.md)" \ + -p validation_plan="$(cat validation-plan.md)" \ + -o audit-report.md +``` + +The audit uses the `specification-drift` taxonomy (D1–D7) to classify +findings — untraced requirements, orphaned design decisions, assumption +drift, constraint violations, and illusory test coverage. Each finding +includes specific document locations, evidence, severity, and remediation +guidance. + +The design document is optional — omit it for a focused +requirements ↔ validation plan audit. + ## Components ### Personas @@ -214,6 +241,7 @@ The output of one template becomes the input parameter of the next. 
| `systems-engineer` | Memory management, concurrency, performance, debugging | | `security-auditor` | Vulnerability discovery, threat modeling, secure design | | `software-architect` | System design, API contracts, tradeoff analysis | +| `specification-analyst` | Cross-document traceability, coverage analysis, specification drift | ### Protocols @@ -240,6 +268,7 @@ The output of one template becomes the input parameter of the next. |------|-------------| | `root-cause-analysis` | Systematic root cause analysis | | `requirements-elicitation` | Requirements extraction from natural language | +| `traceability-audit` | Cross-document specification drift detection | ### Formats @@ -256,6 +285,7 @@ The output of one template becomes the input parameter of the next. | Name | Domain | Description | |------|--------|-------------| | `stack-lifetime-hazards` | Memory safety | H1–H5 labels for stack escape and lifetime violations | +| `specification-drift` | Specification traceability | D1–D7 labels for cross-document drift and divergence | ### Templates @@ -269,6 +299,7 @@ The output of one template becomes the input parameter of the next. 
| `review-code` | Code analysis | Code review for correctness and safety | | `plan-implementation` | Planning | Implementation task breakdown | | `plan-refactoring` | Planning | Safe, incremental refactoring plan | +| `audit-traceability` | Document auditing | Cross-document specification drift audit | ## Directory Structure diff --git a/cli/lib/assemble.js b/cli/lib/assemble.js index 1ecd167..b4d3dea 100644 --- a/cli/lib/assemble.js +++ b/cli/lib/assemble.js @@ -42,7 +42,7 @@ function substituteParams(content, params) { function assemble(contentDir, manifest, templateEntry, params = {}) { const { resolveTemplateDeps } = require("./manifest"); - const { persona, protocols, format } = resolveTemplateDeps( + const { persona, protocols, taxonomies, format } = resolveTemplateDeps( manifest, templateEntry ); @@ -69,7 +69,19 @@ function assemble(contentDir, manifest, templateEntry, params = {}) { } } - // 3. Output Format + // 3. Classification Taxonomy + if (taxonomies.length > 0) { + const taxonomyBodies = taxonomies + .map((t) => loadComponent(contentDir, t.path)) + .filter(Boolean); + if (taxonomyBodies.length > 0) { + sections.push( + "# Classification Taxonomy\n\n" + taxonomyBodies.join("\n\n---\n\n") + ); + } + } + + // 4. Output Format if (format) { const body = loadComponent(contentDir, format.path); if (body) { @@ -77,7 +89,7 @@ function assemble(contentDir, manifest, templateEntry, params = {}) { } } - // 4. Task (template) + // 5. 
Task (template) const templateBody = loadComponent(contentDir, templateEntry.path); if (templateBody) { sections.push("# Task\n\n" + templateBody); diff --git a/cli/lib/manifest.js b/cli/lib/manifest.js index ee4e23e..acdc562 100644 --- a/cli/lib/manifest.js +++ b/cli/lib/manifest.js @@ -39,6 +39,10 @@ function getFormat(manifest, name) { return (manifest.formats || []).find((f) => f.name === name); } +function getTaxonomy(manifest, name) { + return (manifest.taxonomies || []).find((t) => t.name === name); +} + function resolveTemplateDeps(manifest, template) { const persona = getPersona(manifest, template.persona); @@ -54,7 +58,15 @@ function resolveTemplateDeps(manifest, template) { const format = template.format ? getFormat(manifest, template.format) : null; - return { persona, protocols, format }; + const taxonomies = (template.taxonomies || []).map((name) => { + const tax = getTaxonomy(manifest, name); + if (!tax) { + console.warn(`Warning: taxonomy '${name}' not found in manifest`); + } + return tax; + }).filter(Boolean); + + return { persona, protocols, taxonomies, format }; } module.exports = { @@ -63,5 +75,6 @@ module.exports = { getPersona, getProtocol, getFormat, + getTaxonomy, resolveTemplateDeps, }; diff --git a/cli/package-lock.json b/cli/package-lock.json index 60270ea..89f560b 100644 --- a/cli/package-lock.json +++ b/cli/package-lock.json @@ -1,11 +1,11 @@ { - "name": "promptkit", + "name": "@alan-jowett/promptkit", "version": "0.1.0", "lockfileVersion": 3, "requires": true, "packages": { "": { - "name": "promptkit", + "name": "@alan-jowett/promptkit", "version": "0.1.0", "license": "MIT", "dependencies": { diff --git a/docs/case-studies/audit-traceability.md b/docs/case-studies/audit-traceability.md new file mode 100644 index 0000000..c1c5da3 --- /dev/null +++ b/docs/case-studies/audit-traceability.md @@ -0,0 +1,193 @@ +# Case Study: Auditing Specification Drift with PromptKit + +## The Problem + +A team has written three specification documents 
for an authentication +service: a requirements document, a design document, and a validation +plan. The documents were authored at different times — requirements first, +then design a week later, then the validation plan two weeks after that. +During that time, the design introduced a session token refresh mechanism +that wasn't in the original requirements, and the validation plan was +written primarily from the design document rather than the requirements. + +Without PromptKit, a project lead reviews the three documents manually, +skimming for obvious gaps. They notice a few things seem off but can't +systematically identify every inconsistency. They sign off, and the team +starts implementation. Three sprints later, QA discovers that two +security requirements have no test cases, the session refresh feature was +never formally required, and a performance constraint in the requirements +is directly contradicted by the design's synchronous API call chain. + +## The PromptKit Approach + +### Assembling the Prompt + +```bash +npx @alan-jowett/promptkit assemble audit-traceability \ + -p project_name="Auth Service v2" \ + -p requirements_doc="$(cat auth-requirements.md)" \ + -p design_doc="$(cat auth-design.md)" \ + -p validation_plan="$(cat auth-validation.md)" \ + -p focus_areas="all" \ + -p audience="engineering leads and QA" \ + -o auth-traceability-audit.md +``` + +### What Gets Assembled + +The prompt composes four layers: + +**1. Identity — Specification Analyst Persona** + +The LLM adopts the identity of a senior specification analyst — +adversarial toward completeness claims, systematic rather than +impressionistic. Behavioral constraints include "treat every coverage +claim as unproven until traced" and "work by enumerating identifiers +and building matrices, not by skimming." + +**2. Reasoning Protocols** + +Three protocols are loaded: + +- **Anti-hallucination** — the LLM cannot invent requirements or test + cases that aren't in the documents. 
Every finding must cite specific + identifiers and locations. If the LLM infers a gap, it must label the + inference. +- **Self-verification** — before finalizing, the LLM verifies every + REQ-ID appears in at least one finding or is confirmed as traced, and + all coverage metrics are calculated from actual counts. +- **Traceability audit** — the 6-phase methodology: + 1. Artifact inventory (extract all IDs from each document) + 2. Forward traceability (requirements → design, requirements → validation) + 3. Backward traceability (design → requirements, validation → requirements) + 4. Cross-document consistency (assumptions, constraints, terminology) + 5. Classification using the specification-drift taxonomy (D1–D7) + 6. Coverage summary with aggregate metrics + +**3. Classification Taxonomy — Specification Drift** + +The D1–D7 taxonomy gives the LLM a precise vocabulary: + +| Label | Meaning | +|-------|---------| +| D1 | Requirement not traced to design | +| D2 | Requirement not traced to test case | +| D3 | Design decision with no originating requirement | +| D4 | Test case with no linked requirement | +| D5 | Assumption conflict across documents | +| D6 | Design violates a stated constraint | +| D7 | Test case doesn't verify its linked requirement's acceptance criteria | + +**4. Output Format — Investigation Report** + +Findings are structured as F-NNN entries with severity, evidence, +location, and remediation — the same format used for bug investigations +and security audits. + +## The Difference + +### Without PromptKit (manual review) + +A project lead reads through three documents, comparing them by memory. +They might catch obvious gaps — a section title that doesn't match, a +requirement they remember but can't find a test for. 
But they miss: + +- The two security requirements (REQ-SEC-003, REQ-SEC-007) buried in a + subsection with no test cases +- The session refresh feature that appeared in the design but was never + formally required +- The performance constraint (200ms response time) that the design's + synchronous multi-service call chain can't satisfy +- Three test cases referencing REQ-IDs that were renumbered in a + requirements revision + +### With PromptKit (composed audit) + +The LLM produces a structured investigation report: + +**Findings:** + +- **F-001 (Critical) [D6]:** Constraint violation. REQ-PERF-001 requires + 200ms response time. The design (section 5.3) specifies a synchronous + call chain: auth service → user store → token service → session cache. + At p95 latency, this chain exceeds 400ms. The design does not include + caching or async strategies to meet the constraint. + +- **F-002 (High) [D2]:** Untested requirement. REQ-SEC-003 (encryption + at rest for session tokens) does not appear in the traceability matrix. + No test case references this REQ-ID. The validation plan's security + section covers authentication but not token storage encryption. + +- **F-003 (High) [D2]:** Untested requirement. REQ-SEC-007 (token + revocation within 30 seconds) has no corresponding test case. The + validation plan tests token expiry but not revocation timing. + +- **F-004 (High) [D7]:** Acceptance criteria mismatch. TC-042 is linked + to REQ-AUTH-002 (multi-factor authentication). The test case verifies + that MFA is prompted but does not verify the acceptance criterion + "MUST reject single-factor authentication for admin roles." + +- **F-005 (Medium) [D3]:** Orphaned design decision. The session token + refresh mechanism (design section 4.7) is not traced to any + requirement. This may be scope creep, or the requirements document is + missing a requirement for session continuity. + +- **F-006 (Medium) [D5]:** Assumption drift. 
The design assumes a + maximum of 10,000 concurrent sessions (section 6.1). The requirements + document specifies 50,000 concurrent users (REQ-PERF-003) with no + stated session-to-user ratio. + +- **F-007 (Low) [D4]:** Orphaned test cases. TC-018, TC-019, and TC-020 + reference REQ-AUTH-010, REQ-AUTH-011, and REQ-AUTH-012 — none of which + exist in the current requirements document. These REQ-IDs appear to be + from a prior numbering scheme. + +**Coverage Summary:** + +| Metric | Value | +|--------|-------| +| Forward traceability (reqs → design) | 94% (2 of 32 untraced) | +| Forward traceability (reqs → validation) | 88% (4 of 32 untested) | +| Backward traceability (design → reqs) | 96% (1 of 24 orphaned) | +| Backward traceability (validation → reqs) | 95% (3 of 58 orphaned) | +| Assumption consistency | 1 conflict, 2 unstated | + +## Why It Works + +1. **The persona** sets the right mindset. The specification analyst + doesn't skim and approve — it systematically enumerates every ID and + checks every cell in the traceability matrix. The adversarial stance + means it actively looks for what's missing. + +2. **The traceability audit protocol** prevents shortcuts. The 6-phase + structure forces the LLM to build a complete inventory before drawing + conclusions. Forward AND backward traceability catches both missing + coverage and scope creep. + +3. **The specification-drift taxonomy** produces precise, actionable + findings. "D6: constraint violation" is more useful than "the design + might not meet performance requirements." The taxonomy also ranks + findings — D6 and D7 (active conflicts and illusory coverage) surface + before D4 (orphaned test cases). + +4. **Anti-hallucination** is critical here. Without it, the LLM might + invent a connection between a requirement and a design section because + they use similar words. The protocol forces the LLM to verify actual + ID references, not keyword proximity. 
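Under the hood, the four layers are concatenated in a fixed order by `cli/lib/assemble.js`. A simplified sketch of that layering — the header strings other than `# Classification Taxonomy` and `# Task` are placeholders here, and the real assembler also loads each component body from disk and substitutes parameters before joining:

```javascript
// Simplified layering sketch (illustrative only; see cli/lib/assemble.js).
// Section headers other than "# Classification Taxonomy" and "# Task" are
// placeholders, and the "---" separator is assumed for protocols.
function layerSections({ persona, protocols = [], taxonomies = [], format, task }) {
  const sections = [];
  if (persona) sections.push("# Persona\n\n" + persona);            // 1. identity
  if (protocols.length > 0) {
    sections.push("# Protocols\n\n" + protocols.join("\n\n---\n\n")); // 2. reasoning rules
  }
  if (taxonomies.length > 0) {
    sections.push(
      "# Classification Taxonomy\n\n" + taxonomies.join("\n\n---\n\n") // 3. drift labels
    );
  }
  if (format) sections.push("# Output Format\n\n" + format);        // 4. report shape
  sections.push("# Task\n\n" + task);                               // 5. the template itself
  return sections.join("\n\n");
}
```

Omitting a layer (no taxonomy, no format) simply drops that section; the remaining sections keep their relative order.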
+ +## Takeaways + +- **Documents drift silently.** The three-week gap between authoring + requirements and validation was enough for scope creep, renumbered + IDs, and contradicted constraints to accumulate. +- **Manual review misses systematic gaps.** A human reviewer catches + "this doesn't look right" but not "REQ-SEC-003 has zero test cases." + The traceability matrix approach is exhaustive where skimming is not. +- **The design document is optional.** If the team only has requirements + and a validation plan, the audit still works — it restricts to + requirements ↔ validation traceability. This is useful earlier in the + lifecycle, before a design document exists. +- **Findings are actionable.** Each F-NNN has a specific resolution: + add a test case, add a requirement, fix a constraint violation, or + resolve an assumption conflict. The team can assign findings directly + to owners. diff --git a/docs/scenarios.md b/docs/scenarios.md new file mode 100644 index 0000000..3e8c54c --- /dev/null +++ b/docs/scenarios.md @@ -0,0 +1,198 @@ +# PromptKit Scenarios + +Real-world situations where PromptKit turns a vague ask into a +structured, repeatable result. Each scenario shows the problem, +which components PromptKit assembles, and what you get. + +For full walkthroughs, see [case studies](case-studies/). + +--- + +## Existing Templates + +### "We keep finding bugs that the tests should have caught" + +Your validation plan says it covers all requirements, but two critical +security requirements have zero test cases and a third has a test that +checks the wrong thing. Nobody noticed because the traceability matrix +was built from memory, not verified. 
+ +**Template:** `audit-traceability` · **Persona:** `specification-analyst` · +**Protocol:** `traceability-audit` · **Taxonomy:** `specification-drift` (D1–D7) + +**What you get:** An investigation report listing every requirement with +no test case (D2), every test case that doesn't actually verify its +linked acceptance criteria (D7), and coverage metrics showing exactly +where the validation plan has gaps. + +### "This crash only happens under load" + +A segfault in your C networking code appears at 100+ concurrent +connections but never in unit tests. The stack trace points to +`parse_header()` but the real problem is somewhere else. + +**Template:** `investigate-bug` · **Persona:** `systems-engineer` · +**Protocols:** `root-cause-analysis` + `memory-safety-c` + +**What you get:** A structured investigation report with ≥3 hypotheses +ranked by plausibility, evidence-based elimination, and a root-vs-proximate +cause distinction that prevents shallow fixes. The memory-safety protocol +catches lifetime issues the root cause analysis alone might miss. + +### "We need a requirements doc but the scope is fuzzy" + +The product manager gave you a half-page description of a new +authentication system. You need a real requirements document with +numbered REQ-IDs, acceptance criteria, and enough precision to hand +off to a design phase. + +**Template:** `interactive-design` · **Persona:** configurable · +**Protocols:** `requirements-elicitation` + `iterative-refinement` + +**What you get:** An interactive session that challenges your assumptions, +asks for quantified constraints ("what does 'fast' mean?"), identifies +implicit requirements you hadn't considered, and produces a structured +requirements document with stable identifiers. + +### "The design doesn't match what we agreed on" + +You wrote a requirements document last month. 
Now the design document +and validation plan are done, but you suspect they drifted — new +features crept in, a performance constraint might be violated, and some +requirements seem to have been quietly dropped. + +**Template:** `audit-traceability` · **Persona:** `specification-analyst` · +**Taxonomy:** `specification-drift` (D1–D7) + +**What you get:** A three-document audit identifying untraced +requirements (D1), untested requirements (D2), orphaned design +decisions (D3), orphaned test cases (D4), assumption drift (D5), +constraint violations (D6), and acceptance criteria mismatches (D7). +Each finding has specific document locations and a recommended +resolution. + +### "Review this PR for memory safety" + +A teammate submitted a C PR that touches buffer management code. You +want a thorough review that goes beyond style and catches real safety +issues. + +**Template:** `review-code` · **Persona:** `systems-engineer` · +**Protocols:** `memory-safety-c` + `thread-safety` + +**What you get:** An investigation report with severity-classified +findings covering allocation/deallocation pairing, pointer lifetime, +buffer boundaries, data races, and undefined behavior. Each finding +includes the code location, evidence, and a specific fix. + +### "We inherited a codebase with no documentation" + +A legacy C library has no spec, no design doc, and sparse comments. +You need to understand what it actually guarantees to its callers +before you can safely modify it. + +**Template:** `reverse-engineer-requirements` · **Persona:** `reverse-engineer` · +**Protocol:** `requirements-from-implementation` + +**What you get:** A structured requirements document extracted from the +code — API contracts, behavioral guarantees, error handling semantics, +and invariants — with each requirement labeled as KNOWN (directly +evidenced) or INFERRED (reasonable conclusion from patterns). 
+ +### "Set up CI/CD for a new project" + +You need a GitHub Actions pipeline for a Python web app: lint, test, +build a Docker image, deploy to staging on PR merge, and deploy to +production on release tags. + +**Template:** `author-pipeline` · **Persona:** `devops-engineer` · +**Protocol:** `devops-platform-analysis` + +**What you get:** Production-ready YAML with design rationale, secret +and variable requirements, and a customization guide. Secure by default — +pinned action versions, least-privilege permissions, environment +protection rules. + +### "I want Copilot to always apply memory safety checks to C files" + +Instead of assembling a one-off prompt, you want the memory-safety +analysis baked into every Copilot session that touches C code in your +project. + +**Template:** `author-agent-instructions` · **Format:** `agent-instructions` + +**What you get:** A `.github/instructions/memory-safety-c.instructions.md` +file with `applyTo: "**/*.c, **/*.h"` that loads automatically in every +Copilot session touching C files. The systems-engineer persona and +memory-safety protocol become standing instructions. + +### "We have 47 open issues and no idea what to work on first" + +Your backlog has grown unwieldy. Some issues are duplicates, some are +stale, and the critical ones are buried under feature requests. + +**Template:** `triage-issues` · **Persona:** `devops-engineer` + +**What you get:** A prioritized triage report classifying every issue by +priority and effort, identifying patterns and duplicates, and +recommending a workflow for the next sprint. + +--- + +## Future Scenarios (Roadmap) + +These scenarios describe capabilities that are planned but not yet +implemented. See the [roadmap](roadmap.md) for details. + +### "Does the code actually implement what the spec says?" + +You have a requirements document and a design document. The code has +been written. But does it actually implement the specified behavior? 
+Are there requirements with no implementation? Features in the code +that nobody asked for? + +**Planned template:** `audit-code-compliance` · +**Taxonomy:** `specification-drift` (D8–D10) + +**What you'd get:** An investigation report listing unimplemented +requirements, code behavior not traced to any requirement, and +mismatched assumptions between the spec and the implementation. + +### "Do our tests actually test what the plan says they should?" + +Your validation plan specifies 58 test cases. Your test suite has +tests. But are they the same tests? Do the assertions match the +acceptance criteria? + +**Planned template:** `audit-test-compliance` · +**Taxonomy:** `specification-drift` (D11–D13) + +**What you'd get:** A report mapping validation plan test cases to +actual test implementations, identifying unimplemented test cases, +tests with wrong assertions, and coverage gaps between the plan and +reality. + +### "Extract the invariants from this RFC" + +You're implementing RFC 9110 (HTTP Semantics). You need to know every +MUST, SHOULD, and MAY — plus the state transitions, error conditions, +and timing constraints — as structured, testable requirements. + +**Planned template:** Invariant extraction · +**Planned persona:** `standards-analyst` + +**What you'd get:** A structured requirements document derived from the +RFC, with each normative statement extracted, classified by keyword +(MUST/SHOULD/MAY), and linked to the originating RFC section. + +### "Does our implementation match the RFC?" + +You've implemented a protocol. The RFC has been updated. Has your +implementation drifted? Are there MUST requirements you're violating? +Behaviors you implement that the RFC forbids? + +**Planned template:** RFC ↔ implementation audit + +**What you'd get:** A drift report between the RFC's normative +requirements and your implementation's actual behavior, with +security-sensitive mismatches flagged first. 
diff --git a/manifest.yaml b/manifest.yaml index 9ce19f5..7056aa3 100644 --- a/manifest.yaml +++ b/manifest.yaml @@ -48,6 +48,13 @@ personas: behavioral requirements from existing implementations. Separates essential behavior from implementation details. + - name: specification-analyst + path: personas/specification-analyst.md + description: > + Senior specification analyst. Cross-examines requirements, design, + and validation artifacts for consistency, completeness, and + traceability. Adversarial toward completeness claims. + protocols: guardrails: - name: anti-hallucination @@ -137,6 +144,14 @@ protocols: from existing source code. Transforms code understanding into testable, atomic requirements with acceptance criteria. + - name: traceability-audit + path: protocols/reasoning/traceability-audit.md + description: > + Systematic cross-document comparison protocol for auditing + requirements, design, and validation artifacts. Builds + traceability matrices and classifies divergence using the + specification-drift taxonomy. + formats: - name: requirements-doc path: formats/requirements-doc.md @@ -230,6 +245,15 @@ taxonomies: hazards at system boundaries. Covers stack address escape, async pend/complete lifetime violations, and writable views of read-only data. + - name: specification-drift + path: taxonomies/specification-drift.md + domain: specification-traceability + description: > + Classification scheme (D1-D7) for specification drift across + requirements, design, and validation artifacts. Covers untraced + requirements, orphaned design decisions, assumption drift, and + acceptance criteria mismatch. Extensible to D8+ for code/test audits. 
+ templates: document-authoring: - name: author-requirements-doc @@ -283,6 +307,19 @@ templates: protocols: [anti-hallucination, self-verification, operational-constraints, requirements-from-implementation] format: requirements-doc + - name: audit-traceability + path: templates/audit-traceability.md + description: > + Audit requirements, design, and validation documents for + specification drift. Cross-checks traceability, assumption + consistency, constraint propagation, and coverage completeness. + persona: specification-analyst + protocols: [anti-hallucination, self-verification, traceability-audit] + taxonomies: [specification-drift] + format: investigation-report + pipeline_position: 4 + requires: [requirements-document, validation-plan] + investigation: - name: investigate-bug path: templates/investigate-bug.md @@ -422,3 +459,6 @@ pipelines: - template: author-validation-plan consumes: requirements-document produces: validation-plan + - template: audit-traceability + consumes: [requirements-document, validation-plan] + produces: investigation-report diff --git a/personas/specification-analyst.md b/personas/specification-analyst.md new file mode 100644 index 0000000..428f919 --- /dev/null +++ b/personas/specification-analyst.md @@ -0,0 +1,64 @@ + + + +--- +name: specification-analyst +description: > + Senior specification analyst. Cross-examines requirements, design, and + validation artifacts for consistency, completeness, and traceability. + Treats every coverage claim as unproven until evidence confirms it. +domain: + - specification analysis + - traceability and coverage analysis + - requirements verification + - document integrity auditing +tone: precise, skeptical, evidence-driven +--- + +# Persona: Senior Specification Analyst + +You are a senior specification analyst with deep experience auditing +software specifications for consistency and completeness across document +sets. 
Your expertise spans: + +- **Cross-document traceability**: Systematically tracing identifiers + (REQ-IDs, test case IDs, design references) across requirements, + design, and validation artifacts to verify complete, bidirectional + coverage. +- **Gap detection**: Finding what is absent — requirements with no + design realization, design decisions with no originating requirement, + test cases with no requirement linkage, acceptance criteria with no + corresponding test. +- **Assumption forensics**: Surfacing implicit assumptions in one document + that contradict, extend, or are absent from another. Assumptions that + cross-document boundaries without explicit acknowledgment are findings. +- **Constraint verification**: Checking that constraints stated in + requirements are respected in design decisions and validated by test + cases — not just referenced, but actually addressed. +- **Drift detection**: Identifying where documents have diverged over time — + terminology shifts, scope changes reflected in one document but not + others, numbering inconsistencies, and orphaned references. + +## Behavioral Constraints + +- You treat every claim of coverage as **unproven until traced**. "The design + addresses all requirements" is not evidence — a mapping from each REQ-ID + to a specific design section is evidence. +- You are **adversarial toward completeness claims**. Your job is to find + what is missing, inconsistent, or unjustified — not to confirm that + documents are adequate. +- You work **systematically, not impressionistically**. You enumerate + identifiers, build matrices, and check cells — you do not skim + documents and report a general sense of alignment. +- You distinguish between **structural gaps** (a requirement has no test + case) and **semantic gaps** (a test case exists but does not actually + verify the requirement's acceptance criteria). Both are findings. 
+- When a document is absent (e.g., no design document provided), you + **restrict your analysis** to the documents available. You do not + fabricate what the missing document might contain. +- You report findings with **specific locations** — document, section, + identifier — not vague observations. Every finding must be traceable + to a concrete artifact. +- You do NOT assume that proximity implies traceability. A design section + that *mentions* a requirement keyword is not the same as a design + section that *addresses* a requirement. diff --git a/protocols/reasoning/traceability-audit.md b/protocols/reasoning/traceability-audit.md new file mode 100644 index 0000000..a26deca --- /dev/null +++ b/protocols/reasoning/traceability-audit.md @@ -0,0 +1,156 @@ + + + +--- +name: traceability-audit +type: reasoning +description: > + Systematic cross-document comparison protocol for auditing requirements, + design, and validation artifacts. Builds traceability matrices, detects + gaps in both directions, and classifies divergence using the + specification-drift taxonomy. +applicable_to: + - audit-traceability +--- + +# Protocol: Traceability Audit + +Apply this protocol when auditing a set of specification documents +(requirements, design, validation plan) for consistency, completeness, +and traceability. The goal is to find every gap, conflict, and +unjustified assumption across the document set — not to confirm adequacy. + +## Phase 1: Artifact Inventory + +Before comparing documents, extract a complete inventory of traceable +items from each document provided. + +1. **Requirements document** — extract: + - Every REQ-ID (e.g., REQ-AUTH-001) with its category and summary + - Every acceptance criterion linked to each REQ-ID + - Every assumption (ASM-NNN) and constraint (CON-NNN) + - Every dependency (DEP-NNN) + - Defined terms and glossary entries + +2. 
**Design document** (if provided) — extract: + - Every component, interface, and module described + - Every explicit REQ-ID reference in design sections + - Every design decision and its stated rationale + - Every assumption stated or implied in the design + - Non-functional approach (performance strategy, security approach, etc.) + +3. **Validation plan** — extract: + - Every test case ID (TC-NNN) with its linked REQ-ID(s) + - The traceability matrix (REQ-ID → TC-NNN mappings) + - Test levels (unit, integration, system, etc.) + - Pass/fail criteria for each test case + - Environmental assumptions for test execution + +**Output**: A structured inventory for each document. If a document is +not provided, note its absence and skip its inventory — do NOT invent +content for the missing document. + +## Phase 2: Forward Traceability (Requirements → Downstream) + +Check that every requirement flows forward into downstream documents. + +1. **Requirements → Design** (skip if no design document): + - For each REQ-ID, search the design document for explicit references + or sections that address the requirement's specified behavior. + - A design section *mentioning* a requirement keyword is NOT sufficient. + The section must describe *how* the requirement is realized. + - Record: REQ-ID → design section(s), or mark as UNTRACED. + +2. **Requirements → Validation**: + - For each REQ-ID, check the traceability matrix for linked test cases. + - If the traceability matrix is absent or incomplete, search test case + descriptions for REQ-ID references. + - Record: REQ-ID → TC-NNN(s), or mark as UNTESTED. + +3. **Acceptance Criteria → Test Cases**: + - For each requirement that IS linked to a test case, verify that the + test case's steps and expected results actually exercise the + requirement's acceptance criteria. + - A test case that is *linked* but does not *verify* the acceptance + criteria is a D7_ACCEPTANCE_CRITERIA_MISMATCH. 
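The Requirements → Validation step above amounts to building a small matrix. A sketch of that structure — illustrative only, since the audit is performed by the LLM following this protocol rather than by code, and the `requirements` field on each test case is this sketch's assumption about how the Phase 1 inventory might be held:

```javascript
// Illustrative: the REQ-ID → TC-ID matrix Phase 2, step 2 asks for.
// A requirement with no referencing test case is marked UNTESTED.
function requirementsToValidation(reqIds, testCases) {
  const matrix = {};
  for (const reqId of reqIds) {
    const linked = testCases
      .filter((tc) => (tc.requirements || []).includes(reqId)) // exact ID match, not keyword proximity
      .map((tc) => tc.id);
    matrix[reqId] = linked.length > 0 ? linked : "UNTESTED";
  }
  return matrix;
}
```

Note that the match is on the identifier itself: a test case that merely mentions a requirement's keywords does not populate the matrix.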
+ +## Phase 3: Backward Traceability (Downstream → Requirements) + +Check that every item in downstream documents traces back to a requirement. + +1. **Design → Requirements** (skip if no design document): + - For each design component, interface, or major decision, identify + the originating requirement(s). + - Flag any design element that does not trace to a REQ-ID as a + candidate D3_ORPHANED_DESIGN_DECISION. + - Distinguish between: (a) genuine scope creep, (b) reasonable + architectural infrastructure (e.g., logging, monitoring) that + supports requirements indirectly, and (c) requirements gaps. + Report all three, but note the distinction. + +2. **Validation → Requirements**: + - For each test case (TC-NNN), verify it maps to a valid REQ-ID + that exists in the requirements document. + - Flag any test case with no REQ-ID mapping or with a reference + to a nonexistent REQ-ID as D4_ORPHANED_TEST_CASE. + +## Phase 4: Cross-Document Consistency + +Check that shared concepts, assumptions, and constraints are consistent +across all documents. + +1. **Assumption alignment**: + - Compare assumptions stated in the requirements document against + assumptions stated or implied in the design and validation plan. + - Flag contradictions, unstated assumptions, and extensions as + D5_ASSUMPTION_DRIFT. + +2. **Constraint propagation**: + - For each constraint in the requirements document, verify that: + - The design does not violate it (D6_CONSTRAINT_VIOLATION if it does). + - The validation plan includes tests that verify it. + - Pay special attention to non-functional constraints (performance, + scalability, security) which are often acknowledged in design but + not validated. + +3. **Terminology consistency**: + - Check that key terms are used consistently across documents. + - Flag cases where the same concept uses different names in different + documents, or where the same term means different things. + +4. 
**Scope alignment**: + - Compare the scope sections (or equivalent) across all documents. + - Flag items that are in scope in one document but out of scope + (or unmentioned) in another. + +## Phase 5: Classification and Reporting + +Classify every finding using the specification-drift taxonomy. + +1. Assign exactly one drift label (D1–D7) to each finding. +2. Assign severity using the taxonomy's severity guidance. +3. For each finding, provide: + - The drift label and short title + - The specific location in each relevant document (section, ID, line) + - Evidence (what is present, what is absent, what conflicts) + - Impact (what could go wrong if this drift is not resolved) + - Recommended resolution +4. Order findings primarily by severity (Critical, then High, then + Medium, then Low). Within each severity tier, order by the taxonomy's + ranking criteria (D6/D7 first, then D2/D5, then D1/D3, then D4). + +## Phase 6: Coverage Summary + +After reporting individual findings, produce aggregate metrics: + +1. **Forward traceability rate**: % of REQ-IDs traced to design, + % traced to test cases. +2. **Backward traceability rate**: % of design elements traced to + requirements, % of test cases traced to requirements. +3. **Acceptance criteria coverage**: % of acceptance criteria with + corresponding test verification. +4. **Assumption consistency**: count of aligned vs. conflicting vs. + unstated assumptions. +5. **Overall assessment**: a summary judgment of specification integrity + (e.g., "High confidence — 2 minor gaps" or "Low confidence — + systemic traceability failures across all three documents"). 
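
The Phase 6 rates are straight arithmetic over the trace data gathered in Phases 2 and 3 — computed from actual counts, never estimated. A minimal sketch; `trace_map` and `test_map` are assumed shapes (REQ-ID → realizing design sections, REQ-ID → linked test cases), not names this protocol defines:

```python
def coverage_summary(trace_map, test_map):
    """Phase 6 forward-traceability rates from actual counts.

    trace_map: REQ-ID -> set of design sections realizing it
    test_map:  REQ-ID -> set of linked TC-NNN IDs
    Both maps are assumed to cover the same REQ-ID inventory.
    """
    total = len(trace_map) or 1  # guard against an empty inventory
    traced = sum(1 for sections in trace_map.values() if sections)
    tested = sum(1 for tcs in test_map.values() if tcs)
    return {
        "forward_design_rate": 100.0 * traced / total,
        "forward_test_rate": 100.0 * tested / total,
    }
```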
diff --git a/taxonomies/specification-drift.md b/taxonomies/specification-drift.md new file mode 100644 index 0000000..4f0f5d1 --- /dev/null +++ b/taxonomies/specification-drift.md @@ -0,0 +1,184 @@ + + + +--- +name: specification-drift +type: taxonomy +description: > + Classification scheme for specification drift and divergence across + requirements, design, and validation artifacts. Use when auditing + document sets for traceability gaps, scope creep, assumption drift, + and coverage failures. +domain: specification-traceability +applicable_to: + - audit-traceability +--- + +# Taxonomy: Specification Drift + +Use these labels to classify findings when auditing requirements, design, +and validation documents for consistency and completeness. Every finding +MUST use exactly one label from this taxonomy. + +## Labels + +### D1_UNTRACED_REQUIREMENT + +A requirement exists in the requirements document but is not referenced +or addressed in the design document. + +**Pattern**: REQ-ID appears in the requirements document. No section of +the design document references this REQ-ID or addresses its specified +behavior. + +**Risk**: The requirement may be silently dropped during implementation. +Without a design realization, there is no plan to deliver this capability. + +**Severity guidance**: High when the requirement is functional or +safety-critical. Medium when it is a non-functional or low-priority +constraint. + +### D2_UNTESTED_REQUIREMENT + +A requirement exists in the requirements document but has no +corresponding test case in the validation plan. + +**Pattern**: REQ-ID appears in the requirements document and may appear +in the traceability matrix, but no test case (TC-NNN) is linked to it — +or the traceability matrix entry is missing entirely. + +**Risk**: The requirement will not be verified. Defects against this +requirement will not be caught by the validation process. 
+ +**Severity guidance**: Critical when the requirement is safety-critical +or security-related. High for functional requirements. Medium for +non-functional requirements with measurable criteria. + +### D3_ORPHANED_DESIGN_DECISION + +A design section, component, or decision does not trace back to any +requirement in the requirements document. + +**Pattern**: A design section describes a component, interface, or +architectural decision. No REQ-ID from the requirements document is +referenced or addressed by this section. + +**Risk**: Scope creep — the design introduces capabilities or complexity +not justified by the requirements. Alternatively, the requirements +document is incomplete and the design is addressing an unstated need. + +**Severity guidance**: Medium. Requires human judgment — the finding may +indicate scope creep (remove from design) or a requirements gap (add a +requirement). + +### D4_ORPHANED_TEST_CASE + +A test case in the validation plan does not map to any requirement in +the requirements document. + +**Pattern**: TC-NNN exists in the validation plan but references no +REQ-ID, or references a REQ-ID that does not exist in the requirements +document. + +**Risk**: Test effort is spent on behavior that is not required. +Alternatively, the requirements document is incomplete and the test +covers an unstated need. + +**Severity guidance**: Low to Medium. The test may still be valuable +(e.g., regression or exploratory), but it is not contributing to +requirements coverage. + +### D5_ASSUMPTION_DRIFT + +An assumption stated or implied in one document contradicts, extends, +or is absent from another document. + +**Pattern**: The design document states an assumption (e.g., "the system +will have at most 1000 concurrent users") that is not present in the +requirements document's assumptions section — or contradicts a stated +constraint. Similarly, the validation plan may assume environmental +conditions not specified in requirements. 
+ +**Risk**: Documents are based on incompatible premises. Implementation +may satisfy the design's assumptions while violating the requirements' +constraints, or vice versa. + +**Severity guidance**: High when the assumption affects architectural +decisions or test validity. Medium when it affects non-critical behavior. + +### D6_CONSTRAINT_VIOLATION + +A design decision directly violates a stated requirement or constraint. + +**Pattern**: The requirements document states a constraint (e.g., +"the system MUST respond within 200ms") and the design document +describes an approach that cannot satisfy it (e.g., a synchronous +multi-service call chain with no caching), or explicitly contradicts +it (e.g., "response times up to 2 seconds are acceptable"). + +**Risk**: The implementation will not meet requirements by design. +This is not a gap but an active conflict. + +**Severity guidance**: Critical when the violated constraint is +safety-critical, regulatory, or a hard performance requirement. High +for functional constraints. + +### D7_ACCEPTANCE_CRITERIA_MISMATCH + +A test case is linked to a requirement but does not actually verify the +requirement's acceptance criteria. + +**Pattern**: TC-NNN is mapped to REQ-XXX-NNN in the traceability matrix, +but the test case's steps, inputs, or expected results do not correspond +to the acceptance criteria defined for that requirement. The test may +verify related but different behavior, or may be too coarse to confirm +the specific criterion. + +**Risk**: The traceability matrix shows coverage, but the coverage is +illusory. The requirement appears tested but its actual acceptance +criteria are not verified. + +**Severity guidance**: High. This is more dangerous than D2 (untested +requirement) because it creates a false sense of coverage. 
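
Severity combines with the per-label risk tiers (defined under Ranking Criteria below) to order findings in a report. A minimal sort-key sketch, assuming a hypothetical finding shape with `label` and `severity` fields:

```python
# Ordering per this taxonomy: severity first, then the label's risk tier
# (D6/D7, then D2/D5, then D1/D3, then D4).
SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}
RISK_TIER = {"D6": 0, "D7": 0, "D2": 1, "D5": 1, "D1": 2, "D3": 2, "D4": 3}

def order_findings(findings):
    """Sort findings by severity, then by the taxonomy's risk tier."""
    return sorted(
        findings,
        key=lambda f: (SEVERITY_ORDER[f["severity"]],
                       RISK_TIER[f["label"].split("_", 1)[0]]),
    )
```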
+ +## Reserved Labels (Future Use) + +The following label ranges are reserved for future specification drift +categories involving implementation and test code: + +- **D8–D10**: Reserved for **code compliance** drift (requirements/design + vs. source code). Example: D8_UNIMPLEMENTED_REQUIREMENT — a requirement + has no corresponding implementation in source code. +- **D11–D13**: Reserved for **test compliance** drift (validation plan + vs. test code). Example: D11_UNIMPLEMENTED_TEST_CASE — a test case in + the validation plan has no corresponding automated test. + +These labels will be defined when the corresponding audit templates +(`audit-code-compliance`, `audit-test-compliance`) are added to the +library. + +## Ranking Criteria + +Within a given severity level, order findings by impact on specification +integrity: + +1. **Highest risk**: D6 (active constraint violation) and D7 (illusory + coverage) — these indicate the documents are actively misleading. +2. **High risk**: D2 (untested requirement) and D5 (assumption drift) — + these indicate silent gaps that will surface late. +3. **Medium risk**: D1 (untraced requirement) and D3 (orphaned design) — + these indicate incomplete traceability that needs human resolution. +4. **Lowest risk**: D4 (orphaned test case) — effort misdirection but + no safety or correctness impact. + +## Usage + +In findings, reference labels as: + +``` +[DRIFT: D2_UNTESTED_REQUIREMENT] +Requirement: REQ-SEC-003 (requirements doc, section 4.2) +Evidence: REQ-SEC-003 does not appear in the traceability matrix + (validation plan, section 4). No test case references this REQ-ID. +Impact: The encryption-at-rest requirement will not be verified. 
+``` diff --git a/templates/audit-traceability.md b/templates/audit-traceability.md new file mode 100644 index 0000000..5a55446 --- /dev/null +++ b/templates/audit-traceability.md @@ -0,0 +1,122 @@ + + + +--- +name: audit-traceability +description: > + Audit requirements, design, and validation documents for specification + drift. Cross-checks traceability, assumption consistency, constraint + propagation, and coverage completeness. Classifies findings using the + specification-drift taxonomy. +persona: specification-analyst +protocols: + - guardrails/anti-hallucination + - guardrails/self-verification + - reasoning/traceability-audit +taxonomies: + - specification-drift +format: investigation-report +params: + project_name: "Name of the project or feature being audited" + requirements_doc: "The requirements document content" + design_doc: "The design document content (optional — omit for a two-document audit)" + validation_plan: "The validation plan content" + focus_areas: "Optional narrowing — e.g., 'security requirements only', 'API contracts' (default: audit all)" + audience: "Who will read the audit report — e.g., 'engineering leads', 'project stakeholders'" +input_contract: + type: validation-plan + description: > + A validation plan with test cases and traceability matrix, plus the + requirements document it traces to. Optionally, a design document + with architecture and design decisions. +output_contract: + type: investigation-report + description: > + An investigation report classifying specification drift findings + using the D1–D7 taxonomy, with traceability matrices, coverage + metrics, and remediation recommendations. +--- + +# Task: Audit Specification Traceability + +You are tasked with auditing a set of specification documents for +**specification drift** — gaps, conflicts, and divergence between +requirements, design, and validation artifacts. 
+ +## Inputs + +**Project Name**: {{project_name}} + +**Requirements Document**: +{{requirements_doc}} + +**Design Document** (if provided): +{{design_doc}} + +**Validation Plan**: +{{validation_plan}} + +**Focus Areas**: {{focus_areas}} + +## Instructions + +1. **Apply the traceability-audit protocol.** Execute all phases in order. + This is the core methodology — do not skip phases or take shortcuts. + +2. **Classify every finding** using the specification-drift taxonomy + (D1–D7). Every finding MUST have exactly one drift label, a severity, + specific locations in the source documents, evidence, and a + recommended resolution. + +3. **If the design document is not provided**, skip all design-related + checks (Phase 2 step 1, Phase 3 step 1, design-related consistency + checks in Phase 4). Restrict the audit to requirements ↔ validation + plan traceability. Do NOT fabricate or assume design content. + +4. **If focus areas are specified**, perform the full inventory (Phase 1) + but restrict detailed analysis (Phases 2–5) to requirements matching + the focus areas. Still report if the focus-area filter causes + significant portions of the document set to be excluded from audit. + +5. **Apply the anti-hallucination protocol.** Every finding must cite + specific identifiers and locations in the provided documents. Do NOT + invent requirements, test cases, or design sections that are not in + the inputs. If you infer a gap, label the inference explicitly. + +6. **Format the output** according to the investigation-report format. + Map the protocol's output to the report structure: + - Phase 1 inventory → Investigation Scope (section 3) + - Phases 2–4 findings → Findings (section 4), one F-NNN per drift item + - Phase 5 classification → Finding severity and categorization + - Phase 6 coverage summary → Executive Summary (section 1) and + a "Coverage Metrics" subsection in Root Cause Analysis (section 5) + - Recommended resolutions → Remediation Plan (section 6) + +7. 
**Quality checklist** — before finalizing, verify: + - [ ] Every REQ-ID from the requirements document appears in at least + one finding or is confirmed as fully traced + - [ ] Every finding has a specific drift label (D1–D7) + - [ ] Every finding cites specific document locations, not vague + references + - [ ] Severity assignments follow the taxonomy's guidance + - [ ] Findings are ordered by severity (Critical → High → Medium → Low), + and within each severity level by the taxonomy's ranking criteria + - [ ] Coverage metrics in the summary are calculated from actual + counts, not estimated + - [ ] If design document was absent, no findings reference design + content + - [ ] The executive summary is understandable without reading the + full report + +## Non-Goals + +- Do NOT modify or improve the input documents — report findings only. +- Do NOT generate missing requirements, design sections, or test cases — + identify and classify the gaps. +- Do NOT assess the quality of individual requirements, design decisions, + or test cases in isolation — focus on cross-document consistency. +- Do NOT evaluate whether the requirements are correct for the domain — + only whether the document set is internally consistent. +- Do NOT expand scope beyond the provided documents. External knowledge + about the domain may inform severity assessment but must not introduce + findings that are not evidenced in the documents.