Skip to content

PDX-483: add STEPS_REQUIRED runtime guard on testcase_generate#176

Merged
mrdailey99 merged 2 commits into
developfrom
feature/PDX-483-steps-required-guard
May 15, 2026
Merged

PDX-483: add STEPS_REQUIRED runtime guard on testcase_generate#176
mrdailey99 merged 2 commits into
developfrom
feature/PDX-483-steps-required-guard

Conversation

@mrdailey99
Copy link
Copy Markdown
Collaborator

Summary

  • Add an active runtime guard to provar_testcase_generate that rejects the PDX-479 multi-call construction shape (steps:[] + dry_run:false + output_path) with a new STEPS_REQUIRED error code, before any side effects.
  • The rejection returns details.suggestion instructing the LLM to pass the FULL step tree in a single call and notes the dry_run=true escape hatch for skeleton inspection. Other empty-steps shapes (dry-run preview, no output_path) remain allowed.
  • Completes the defense-in-depth picture: PDX-481 (prompt + resource), PDX-482 (passive tool-description contract), PDX-483 (active runtime guard).

Jira

https://provartesting.atlassian.net/browse/PDX-483

Test plan

  • yarn compile passes
  • node_modules/.bin/nyc node_modules/.bin/mocha "test/**/*.test.ts" — 1136 passing
  • node scripts/mcp-smoke.cjs — 55 / 55 PASS (new entry 6b exercises the STEPS_REQUIRED rejection at protocol surface)
  • node scripts/pdx-482-validate.cjs — 28 / 28 PASS (8 new runtime-guard assertions: tools/call returns isError, error_code === STEPS_REQUIRED, retryable=false, details.suggestion non-empty + mentions FULL step tree + dry_run=true, no file on disk)
  • yarn lint passes

Edge cases covered by unit tests

steps.length dry_run output_path Result
0 true absent Allowed — skeleton inspection
0 true present Allowed — dry-run wins (no file written)
0 false absent Allowed — no persistence target
0 false present REJECTED with STEPS_REQUIRED, no file
≥ 1 true or false any Allowed — happy path

Plus an ordering test: when both STEPS_REQUIRED and PATH_NOT_ALLOWED would fire, STEPS_REQUIRED wins (the actionable root-cause error, not the misleading-fix one).

Changes

  • src/mcp/tools/testCaseGenerate.ts: runtime guard at top of handler, before try { buildTestCaseXml(...) }, before any assertPathAllowed/fs.writeFileSync side effect.
  • test/unit/mcp/testCaseGenerate.test.ts: new STEPS_REQUIRED runtime guard (PDX-483) block with 7 tests covering all 6 edge cases plus the ordering check. Existing writing to disk and path policy tests updated to use a non-empty steps[] so they target their original assertion (FILE_EXISTS, PATH_NOT_ALLOWED, mkdirp) rather than the new guard.
  • docs/mcp.md: new STEPS_REQUIRED row in the provar_testcase_generate error codes table, plus an inline behaviour matrix showing which empty-steps shapes are allowed vs rejected and what the suggestion says.
  • docs/mcp-pilot-guide.md: Scenario 12 now lists the runtime guard as the third layer of defense (prompt + description + runtime guard).
  • scripts/pdx-482-validate.cjs: new runRuntimeGuardValidation() phase drives a real tools/call with the rejected shape and asserts the structured error response and zero side effects. File renamed in spirit (still pdx-482-validate.cjs) but now covers PDX-482 + PDX-483.
  • scripts/mcp-smoke.cjs: new entry 6b calls the rejected shape; TOTAL_EXPECTED is dynamic so no static bump needed.

Backwards-compatibility note

The only programmatic caller of provar_testcase_generate is the MCP tool surface itself. A repo-wide search confirms no internal scripts or tests call the tool with the rejected shape outside of the test cases explicitly exercising the guard. Legitimate skeleton-inspection callers continue to work via dry_run=true (the escape hatch is documented in details.suggestion so any caller that hits the rejection learns it from the error response).

External-docs flag

docs/provar-mcp-public-docs.md and docs/university-of-provar-mcp-course.md may want to mention the runtime guard alongside their PDX-482 description-contract notes — flagging for the Provar team since those are maintained separately.

…rate

RCA: PDX-482 hardened the testcase_generate tool description, but the contract is passive — an LLM that ignores it pays no price. Calling generate with steps:[]+dry_run:false+output_path still wrote a TODO-only skeleton file, reproducing the PDX-479 regression class. The contract was enforced only by the agent's reading comprehension.
Fix: Add an active runtime guard that rejects the exact shape (steps:[] + non-dry-run + output_path) with STEPS_REQUIRED before any side effects. details.suggestion tells the LLM to pass the FULL step tree in one call and notes the dry_run=true escape hatch for skeleton inspection. Other empty-steps shapes (dry-run preview, no output_path) remain allowed. Docs, smoke entry, and pdx-482-validate.cjs all updated.
Copilot AI review requested due to automatic review settings May 15, 2026 16:05
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 15, 2026

Quality Orchestrator

🟢 LOW · 2 / 100 · All changed files have mapped tests.


🧪 Tests to Run · Running 1 of 51 tests

  • unit/mcp/testCaseGenerate.test.ts
▶ Run command
npx vitest run \
  unit/mcp/testCaseGenerate.test.ts

⚡ quality-orchestrator  ·  /qo stub <file>  ·  qo analyze-local

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an active STEPS_REQUIRED runtime guard to provar_testcase_generate so empty persisted test case generation is rejected before side effects, complementing existing prompt/tool-description defenses.

Changes:

  • Rejects steps: [] + dry_run: false + output_path with structured guidance.
  • Adds unit, smoke, and protocol-surface validation coverage for the guard.
  • Updates MCP documentation and pilot guide with the new behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/mcp/tools/testCaseGenerate.ts Adds the pre-write STEPS_REQUIRED guard.
test/unit/mcp/testCaseGenerate.test.ts Covers accepted/rejected empty-step cases and preserves existing disk/path tests.
scripts/pdx-482-validate.cjs Adds protocol-level validation for the runtime guard.
scripts/mcp-smoke.cjs Exercises the rejected shape during authoring smoke runs.
docs/mcp.md Documents the new error code and empty-steps behavior matrix.
docs/mcp-pilot-guide.md Describes the new runtime guard as part of defense-in-depth.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…stomer docs

RCA: The initial PDX-483 commit added internal Jira IDs (PDX-479, PDX-481, PDX-482, PDX-483) and engineering-process phrasing ("regression class", "before this guard") to docs/mcp.md and docs/mcp-pilot-guide.md. These are customer-facing surfaces — pilot guides and MCP reference docs — and must not leak internal ticket plumbing or dev-process language to integrators.
Fix: Rewrite the new STEPS_REQUIRED section in docs/mcp.md and the Defense-in-depth block in docs/mcp-pilot-guide.md in terms of observable behaviour and the API contract. Pre-existing PDX-479 references in the Scenario 12 background and bug-filing line are out of scope for this PR and left as-is.
@mrdailey99 mrdailey99 merged commit a1037d1 into develop May 15, 2026
4 checks passed
mrdailey99 added a commit that referenced this pull request May 15, 2026
…ol-titles

RCA: PR #176 (PDX-483 runtime guard) landed on develop while PDX-484 was in flight. Both PRs touched src/mcp/tools/testCaseGenerate.ts, docs/mcp.md, docs/mcp-pilot-guide.md, scripts/pdx-482-validate.cjs, and the test file. Git auto-merged 5 of 6 files cleanly; scripts/pdx-482-validate.cjs needed manual resolution in the header comment block where both branches rewrote the file purpose.
Fix: Combined the two header comments into a unified PDX-482/PDX-483/PDX-484 explanation that names all three layers (description contract, runtime guard, title contract). All other auto-merges verified by compile + 1140-passing tests + lint. The script header will be scrubbed of PDX-XXX refs and the file renamed in a follow-up chore PR per the no-tickets-in-script-paths convention.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants