PDX-483: add STEPS_REQUIRED runtime guard on testcase_generate#176
Merged
Conversation
…rate RCA: PDX-482 hardened the testcase_generate tool description, but the contract is passive — an LLM that ignores it pays no price. Calling generate with steps:[]+dry_run:false+output_path still wrote a TODO-only skeleton file, reproducing the PDX-479 regression class. The contract was enforced only by the agent's reading comprehension. Fix: Add an active runtime guard that rejects the exact shape (steps:[] + non-dry-run + output_path) with STEPS_REQUIRED before any side effects. details.suggestion tells the LLM to pass the FULL step tree in one call and notes the dry_run=true escape hatch for skeleton inspection. Other empty-steps shapes (dry-run preview, no output_path) remain allowed. Docs, smoke entry, and pdx-482-validate.cjs all updated.
Quality Orchestrator🟢 LOW · 🧪 Tests to Run · Running 1 of 51 tests
▶ Run commandnpx vitest run \
unit/mcp/testCaseGenerate.test.ts⚡ quality-orchestrator · |
Contributor
There was a problem hiding this comment.
Pull request overview
Adds an active STEPS_REQUIRED runtime guard to provar_testcase_generate so empty persisted test case generation is rejected before side effects, complementing existing prompt/tool-description defenses.
Changes:
- Rejects
steps: []+dry_run: false+output_pathwith structured guidance. - Adds unit, smoke, and protocol-surface validation coverage for the guard.
- Updates MCP documentation and pilot guide with the new behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
src/mcp/tools/testCaseGenerate.ts |
Adds the pre-write STEPS_REQUIRED guard. |
test/unit/mcp/testCaseGenerate.test.ts |
Covers accepted/rejected empty-step cases and preserves existing disk/path tests. |
scripts/pdx-482-validate.cjs |
Adds protocol-level validation for the runtime guard. |
scripts/mcp-smoke.cjs |
Exercises the rejected shape during authoring smoke runs. |
docs/mcp.md |
Documents the new error code and empty-steps behavior matrix. |
docs/mcp-pilot-guide.md |
Describes the new runtime guard as part of defense-in-depth. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…stomer docs
RCA: The initial PDX-483 commit added internal Jira IDs (PDX-479, PDX-481, PDX-482, PDX-483) and engineering-process phrasing ("regression class", "before this guard") to docs/mcp.md and docs/mcp-pilot-guide.md. These are customer-facing surfaces — pilot guides and MCP reference docs — and must not leak internal ticket plumbing or dev-process language to integrators.
Fix: Rewrite the new STEPS_REQUIRED section in docs/mcp.md and the Defense-in-depth block in docs/mcp-pilot-guide.md in terms of observable behaviour and the API contract. Pre-existing PDX-479 references in the Scenario 12 background and bug-filing line are out of scope for this PR and left as-is.
mrdailey99
added a commit
that referenced
this pull request
May 15, 2026
…ol-titles RCA: PR #176 (PDX-483 runtime guard) landed on develop while PDX-484 was in flight. Both PRs touched src/mcp/tools/testCaseGenerate.ts, docs/mcp.md, docs/mcp-pilot-guide.md, scripts/pdx-482-validate.cjs, and the test file. Git auto-merged 5 of 6 files cleanly; scripts/pdx-482-validate.cjs needed manual resolution in the header comment block where both branches rewrote the file purpose. Fix: Combined the two header comments into a unified PDX-482/PDX-483/PDX-484 explanation that names all three layers (description contract, runtime guard, title contract). All other auto-merges verified by compile + 1140-passing tests + lint. The script header will be scrubbed of PDX-XXX refs and the file renamed in a follow-up chore PR per the no-tickets-in-script-paths convention.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
provar_testcase_generatethat rejects the PDX-479 multi-call construction shape (steps:[]+dry_run:false+output_path) with a newSTEPS_REQUIREDerror code, before any side effects.details.suggestioninstructing the LLM to pass the FULL step tree in a single call and notes thedry_run=trueescape hatch for skeleton inspection. Other empty-steps shapes (dry-run preview, nooutput_path) remain allowed.Jira
https://provartesting.atlassian.net/browse/PDX-483
Test plan
yarn compilepassesnode_modules/.bin/nyc node_modules/.bin/mocha "test/**/*.test.ts"— 1136 passingnode scripts/mcp-smoke.cjs— 55 / 55 PASS (new entry 6b exercises the STEPS_REQUIRED rejection at protocol surface)node scripts/pdx-482-validate.cjs— 28 / 28 PASS (8 new runtime-guard assertions: tools/call returns isError, error_code === STEPS_REQUIRED, retryable=false, details.suggestion non-empty + mentions FULL step tree + dry_run=true, no file on disk)yarn lintpassesEdge cases covered by unit tests
steps.lengthdry_runoutput_pathtruetruefalsefalseSTEPS_REQUIRED, no filePlus an ordering test: when both
STEPS_REQUIREDandPATH_NOT_ALLOWEDwould fire,STEPS_REQUIREDwins (the actionable root-cause error, not the misleading-fix one).Changes
src/mcp/tools/testCaseGenerate.ts: runtime guard at top of handler, beforetry { buildTestCaseXml(...) }, before anyassertPathAllowed/fs.writeFileSyncside effect.test/unit/mcp/testCaseGenerate.test.ts: newSTEPS_REQUIRED runtime guard (PDX-483)block with 7 tests covering all 6 edge cases plus the ordering check. Existingwriting to diskandpath policytests updated to use a non-emptysteps[]so they target their original assertion (FILE_EXISTS, PATH_NOT_ALLOWED, mkdirp) rather than the new guard.docs/mcp.md: newSTEPS_REQUIREDrow in theprovar_testcase_generateerror codes table, plus an inline behaviour matrix showing which empty-steps shapes are allowed vs rejected and what the suggestion says.docs/mcp-pilot-guide.md: Scenario 12 now lists the runtime guard as the third layer of defense (prompt + description + runtime guard).scripts/pdx-482-validate.cjs: newrunRuntimeGuardValidation()phase drives a realtools/callwith the rejected shape and asserts the structured error response and zero side effects. File renamed in spirit (stillpdx-482-validate.cjs) but now covers PDX-482 + PDX-483.scripts/mcp-smoke.cjs: new entry 6b calls the rejected shape;TOTAL_EXPECTEDis dynamic so no static bump needed.Backwards-compatibility note
The only programmatic caller of
provar_testcase_generateis the MCP tool surface itself. A repo-wide search confirms no internal scripts or tests call the tool with the rejected shape outside of the test cases explicitly exercising the guard. Legitimate skeleton-inspection callers continue to work viadry_run=true(the escape hatch is documented indetails.suggestionso any caller that hits the rejection learns it from the error response).External-docs flag
docs/provar-mcp-public-docs.mdanddocs/university-of-provar-mcp-course.mdmay want to mention the runtime guard alongside their PDX-482 description-contract notes — flagging for the Provar team since those are maintained separately.