feat: validate command with five fail-fast schema/lockstep/shape checks

dhruva-reddy · dhruva-reddy · commit bcd23de440b6 · 2026-05-02T01:31:20.000Z

**Problem.** The Vapi API rejects bad configs at PATCH time with terse 400s ("property speed should not exist"), and by then the push has already partially completed against other resources. We watched the same five classes of mistake hit production over and over:

1. Assistant names (or eval names) longer than 40 chars (silent cap).
2. Structured-output ↔ assistant lockstep mismatch: one side declares the relationship, the other doesn't, dashboard ends up inconsistent.
3. Prompts duplicated by paste-on-top dashboard edits (10kB prompt with two identical headers stacked, agent follows both).
4. `maxTokens` set lower than the JSON-schema size of the attached tools' arguments: assistant looks fine on push, bricks on first tool-using call.
5. Voice fields nested wrong for the provider (`voice.speed` on Cartesia, where it lives at `voice.generationConfig.speed`).

**What this fix does.** Five client-side validators, all running off the same `LoadedResources` shape that `push.ts` would actually ship, so the lint runs against exactly what would be pushed, no separate parser to drift. Surfaces as warnings by default (one bad spec doesn't block an otherwise-good push); promote to abort with `--strict`. Run standalone via `npm run validate -- <org>`.

**Outcome you'll notice.** Most schema-class mistakes get caught locally in seconds instead of mid-push 400s. Voice provider field mismatch gets a specific message pointing at the right path. CI can add `npm run push -- <env> --strict` as a gate before any deploy.

---

Catch the classes of errors that today only surface when the API returns a 400 mid-push. The push pipeline runs validation in warn-only mode by default; `--strict` promotes errors to a blocking abort before any API call. Standalone runner via `npm run validate -- <org>`.

Validators implemented:

1. Name length cap (40 chars). Walks every `assistant.name` and every `evaluations[].structuredOutput.name` in scenarios. Closes #18.
2. SO ↔ assistant bidirectional lockstep. For every SO file's `assistant_ids`, checks the named assistant's `structuredOutputIds` mirrors it; reverse direction too. Closes #11.
3. Prompt duplication heuristics. Same H1 heading appearing twice, repeated CONTINUITY ON ENTRY / CLOSEOUT FLOW STRUCTURE blocks. Partial fix for #8 (paste-on-top dashboard duplications).
4. `maxTokens` floor for tool-using assistants. Computes floor ≈ 25 + sum(len(JSON.stringify(tool.function.parameters))) per attached tool. Warns under floor. Closes #19.
5. Per-provider voice schema. Cartesia rejects top-level `speed` / `stability` / `similarityBoost` / `enableSsmlParsing` (point at `generationConfig.*` / drop the field). 11labs rejects `generationConfig` (it's a Cartesia path). Closes #9 (engine half).

- src/validate.ts (NEW): `validateResources(loadedResources)` returning `ValidationFinding[]` with severity / type / resourceId / rule / message / fieldPath. Pure data; safe to test directly.
- src/validate-cmd.ts (NEW): CLI entry. Loads same resource shape as push.ts so the lint runs against exactly what would ship. Exit non-zero on any error finding.
- src/config.ts: `--strict` flag.
- src/push.ts: validators run in default-warn mode; `--strict` aborts.
- package.json: `validate` script.
- AGENTS.md: document `npm run validate` and `--strict`.
- tests/validate.test.ts: per-rule fixtures (golden + bad inputs) covering all five checks.

Closes improvements.md #11, #18, #19. Resolves engine half of #9. Partial #8, #20 (heuristic only).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
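
For reviewers who want a feel for the rule shapes, here is a minimal sketch of validators 1, 4 and 5 applied to a single assistant. It is not the contents of `src/validate.ts` (that diff is not shown here). The `AssistantSketch`, `ToolSketch` and `VoiceSketch` types, the rule names, the severities other than the `maxTokens` warning, and the assumption that `maxTokens` and resolved tool bodies are available directly on the assistant object are all hypothetical, chosen only for illustration. The finding fields and the floor formula follow the description above.

```typescript
// Hypothetical shapes for illustration only; the real types live in the repo.
type Severity = "error" | "warning";

interface ValidationFinding {
  severity: Severity;
  type: string; // resource kind, e.g. "assistants"
  resourceId: string;
  rule: string; // rule names below are invented for this sketch
  message: string;
  fieldPath: string;
}

interface ToolSketch {
  function?: { parameters?: unknown };
}

interface VoiceSketch {
  provider?: string;
  [key: string]: unknown;
}

interface AssistantSketch {
  name?: string;
  maxTokens?: number; // assumed location; the real field may be nested elsewhere
  tools?: ToolSketch[]; // assumed to be already resolved to tool bodies
  voice?: VoiceSketch;
}

const NAME_CAP = 40;
const MAX_TOKENS_BASE = 25;
const CARTESIA_TOP_LEVEL_REJECTED = ["speed", "stability", "similarityBoost", "enableSsmlParsing"];

// floor ≈ 25 + sum(len(JSON.stringify(tool.function.parameters)))
function maxTokensFloor(tools: ToolSketch[]): number {
  return tools.reduce((sum, tool) => {
    const params = tool.function?.parameters;
    // Tools without a parameters schema contribute nothing (an assumption).
    return params === undefined ? sum : sum + JSON.stringify(params).length;
  }, MAX_TOKENS_BASE);
}

function validateAssistantSketch(resourceId: string, a: AssistantSketch): ValidationFinding[] {
  const findings: ValidationFinding[] = [];
  const add = (severity: Severity, rule: string, message: string, fieldPath: string): void => {
    findings.push({ severity, type: "assistants", resourceId, rule, message, fieldPath });
  };

  // Validator 1: name length cap.
  if (a.name !== undefined && a.name.length > NAME_CAP) {
    add("error", "name-length", `name is ${a.name.length} chars; cap is ${NAME_CAP}`, "name");
  }

  // Validator 4: maxTokens floor, only for assistants that attach tools.
  const tools = a.tools ?? [];
  if (tools.length > 0 && a.maxTokens !== undefined) {
    const floor = maxTokensFloor(tools);
    if (a.maxTokens < floor) {
      add("warning", "max-tokens-floor", `maxTokens ${a.maxTokens} is under the floor of ${floor}`, "maxTokens");
    }
  }

  // Validator 5: per-provider voice field placement.
  const voice = a.voice;
  if (voice !== undefined) {
    if (voice.provider === "cartesia") {
      for (const key of CARTESIA_TOP_LEVEL_REJECTED) {
        if (key in voice) {
          add("error", "voice-schema", `voice.${key} is rejected for Cartesia; use voice.generationConfig.* or drop the field`, `voice.${key}`);
        }
      }
    } else if (voice.provider === "11labs" && "generationConfig" in voice) {
      add("error", "voice-schema", "voice.generationConfig is a Cartesia path; 11labs rejects it", "voice.generationConfig");
    }
  }

  return findings;
}

// Usage: {"type":"object"} serializes to 17 characters, so the floor is 25 + 17 = 42.
const exampleFindings = validateAssistantSketch("my-agent", {
  name: "my-agent",
  maxTokens: 1,
  tools: [{ function: { parameters: { type: "object" } } }],
  voice: { provider: "cartesia", speed: 1.2 },
});
console.log(exampleFindings.map((f) => `${f.severity} ${f.rule} ${f.fieldPath}`));
```

Keeping each rule as a pure function from loaded data to findings is what makes the per-rule fixtures in `tests/validate.test.ts` cheap to write: no API client or state file is needed to exercise a rule.
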
diff --git a/AGENTS.md b/AGENTS.md
@@ -748,7 +748,9 @@ npm run push -- <org> assistants                   # Push only assistants
 npm run push -- <org> resources/<org>/assistants/my-agent.md  # Push single file
 npm run push -- <org> <path1> <path2>              # Push multiple specific files (one state write)
 npm run push -- <org> --dry-run                    # Preview without applying any platform changes
+npm run push -- <org> --strict                     # Abort push if any validator returns an error
 npm run apply -- <org>                             # Pull then push (full sync)
+npm run validate -- <org>                          # Lint resources locally (fails fast on schema drift)
 
 # Testing
 npm run call -- <org> -a <assistant-name>          # Call an assistant via WebSocket
diff --git a/improvements.md b/improvements.md
@@ -59,19 +59,19 @@ you which stack PR closes the row.**
 | 5   | `push --dry-run`                                         | Cheapest operator-safety win                       | None       | RESOLVED 2026-04-30 (Stack C)     |
 | 6   | API-level optimistic concurrency                         | Server-side conflict rejection                     | Platform   | Deferred (Stack I, gated)         |
 | 7   | Voice edits drop pronunciation-dictionary attachments    | Silent regression on Cartesia + 11labs voice edits | #4         | Open (Stack G planned)            |
-| 8   | Dashboard prompt edits can in-place duplicate the prompt | Two stacked prompt versions = stitched output      | None       | Open (Stack D planned)            |
-| 9   | Provider-specific voice schema mismatch (push 400)       | `voice.speed` vs `voice.generationConfig.speed`    | None       | Partial — doc cheat-sheet (Stack A) |
+| 8   | Dashboard prompt edits can in-place duplicate the prompt | Two stacked prompt versions = stitched output      | None       | Partial — Stack D heuristic       |
+| 9   | Provider-specific voice schema mismatch (push 400)       | `voice.speed` vs `voice.generationConfig.speed`    | None       | RESOLVED 2026-04-30 (Stack D + A) |
 | 10  | Targeted assistant push mints duplicate tools            | Re-pushing assistant duplicates `end-call-*` tools | #4         | Partial                           |
-| 11  | Bidirectional SO ↔ assistant lockstep has no validation  | One-sided edits silently inconsistent              | None       | Open (Stack D planned)            |
+| 11  | Bidirectional SO ↔ assistant lockstep has no validation  | One-sided edits silently inconsistent              | None       | RESOLVED 2026-04-30 (Stack D)     |
 | 12  | State file accumulates UUIDs without source files        | Silent gitops drift                                | None       | Partial                           |
 | 13  | `.agent/` and `.claude/handoffs/` not gitignored         | `git add -A` sweeps PII handoff scratch            | None       | RESOLVED 2026-04-30 (Stack A)     |
 | 14  | Multi-file push undocumented                             | Discoverability                                    | None       | RESOLVED 2026-04-30 (Stack A)     |
 | 15  | Scoped push rewrites entire state file                   | Pre-existing drift sweeps into focused commits     | #4         | Open (Stack J planned)            |
 | 16  | No CLI runner for simulation suites                      | Engine pushes them, can't run them                 | None       | Open (Stack E planned)            |
 | 17  | State file key-order churn produces noisy diffs          | Reorderings hide real changes                      | None       | RESOLVED 2026-04-30 (Stack B)     |
-| 18  | Structured-output `name` capped at 40 chars (no warning) | Push fails partway after partial application       | None       | Open (Stack D planned)            |
-| 19  | No `maxTokens` floor warning for tool-using assistants   | `maxTokens: 1` bricks the assistant silently       | None       | Open (Stack D planned)            |
-| 20  | Prompt vocabulary leaks into TTS                         | `Reason.` becomes verbal contaminant               | None       | Open (Stack D heuristic planned)  |
+| 18  | Structured-output `name` capped at 40 chars (no warning) | Push fails partway after partial application       | None       | RESOLVED 2026-04-30 (Stack D)     |
+| 19  | No `maxTokens` floor warning for tool-using assistants   | `maxTokens: 1` bricks the assistant silently       | None       | RESOLVED 2026-04-30 (Stack D)     |
+| 20  | Prompt vocabulary leaks into TTS                         | `Reason.` becomes verbal contaminant               | None       | Partial — Stack D heuristic       |
 
 ---
 
diff --git a/package.json b/package.json
@@ -13,6 +13,7 @@
     "call": "bash -c 'exec tsx src/call-cmd.ts \"$@\" 2> >(grep --line-buffered -v \"buffer underflow\" >&2)' --",
     "cleanup": "tsx src/cleanup-cmd.ts",
     "eval": "tsx src/eval.ts",
+    "validate": "tsx src/validate-cmd.ts",
     "build": "tsc --noEmit",
     "test": "node --import tsx --test tests/*.test.ts"
   },
diff --git a/src/config.ts b/src/config.ts
@@ -86,18 +86,21 @@ function parseFlags(): {
   forceDelete: boolean;
   bootstrapSync: boolean;
   dryRun: boolean;
+  strictValidation: boolean;
   applyFilter: ApplyFilter;
 } {
   const args = process.argv.slice(3);
   const result: {
     forceDelete: boolean;
     bootstrapSync: boolean;
     dryRun: boolean;
+    strictValidation: boolean;
     applyFilter: ApplyFilter;
   } = {
     forceDelete: args.includes("--force"),
     bootstrapSync: args.includes("--bootstrap"),
     dryRun: args.includes("--dry-run"),
+    strictValidation: args.includes("--strict"),
     applyFilter: {},
   };
 
@@ -108,7 +111,12 @@ function parseFlags(): {
     const arg = args[i];
     if (!arg) continue;
 
-    if (arg === "--force" || arg === "--bootstrap" || arg === "--dry-run")
+    if (
+      arg === "--force" ||
+      arg === "--bootstrap" ||
+      arg === "--dry-run" ||
+      arg === "--strict"
+    )
       continue;
 
     // --confirm <slug>: consumed by cleanup.ts directly. Eat the value here so
@@ -243,6 +251,7 @@ export const {
   forceDelete: FORCE_DELETE,
   bootstrapSync: BOOTSTRAP_SYNC,
   dryRun: DRY_RUN,
+  strictValidation: STRICT_VALIDATION,
   applyFilter: APPLY_FILTER,
 } = parseFlags();
 
diff --git a/src/push.ts b/src/push.ts
@@ -6,10 +6,12 @@ import {
   VAPI_BASE_URL,
   FORCE_DELETE,
   DRY_RUN,
+  STRICT_VALIDATION,
   APPLY_FILTER,
   BASE_DIR,
   removeExcludedKeys,
 } from "./config.ts";
+import { summarizeFindings, validateResources } from "./validate.ts";
 import { loadState, saveState } from "./state.ts";
 import { loadResources, loadSingleResource, FOLDER_MAP } from "./resources.ts";
 import { fetchAllResources, resourceIdMatchesName, runPull } from "./pull.ts";
@@ -909,6 +911,30 @@ async function main(): Promise<void> {
 
   state = await maybeBootstrapState(loadedResources, state);
 
+  // Run client-side validators against the loaded resource set. In default
+  // mode, errors are surfaced as warnings so a single bad spec doesn't block
+  // an otherwise-good push. With --strict, any error-severity finding aborts
+  // before any API call.
+  console.log("\n🔎 Running validators...");
+  const findings = validateResources(loadedResources);
+  if (findings.length > 0) {
+    console.log(summarizeFindings(findings));
+  } else {
+    console.log("   ✅ No validation issues.");
+  }
+  const errorCount = findings.filter((f) => f.severity === "error").length;
+  if (errorCount > 0) {
+    if (STRICT_VALIDATION) {
+      console.error(
+        `\n❌ Validation failed (${errorCount} error(s)). --strict refuses to push. Fix the issues above or drop --strict.`,
+      );
+      process.exit(1);
+    }
+    console.warn(
+      `   ⚠️  ${errorCount} validation error(s) detected — push will continue (use --strict to abort on errors).`,
+    );
+  }
+
   // Resolve credential names → UUIDs in all resource data before applying
   const credMap = credentialForwardMap(state);
   if (credMap.size > 0) {
diff --git a/src/validate-cmd.ts b/src/validate-cmd.ts
@@ -0,0 +1,63 @@
+// CLI entry: `npm run validate -- <org>`
+//
+// Loads the same resource shape as `push.ts` would (so the validator runs
+// against exactly what would ship), then runs all client-side validators
+// and prints findings. Exit code 0 if no errors, 1 if any error-severity
+// finding is present.
+
+import { resolve } from "path";
+import { fileURLToPath } from "url";
+import { VAPI_ENV, VAPI_BASE_URL } from "./config.ts";
+import { loadResources } from "./resources.ts";
+import { summarizeFindings, validateResources } from "./validate.ts";
+import type { LoadedResources } from "./types.ts";
+
+async function main(): Promise<void> {
+  console.log(
+    "═══════════════════════════════════════════════════════════════",
+  );
+  console.log(`🔎 Vapi GitOps Validate - Environment: ${VAPI_ENV}`);
+  console.log(`   API: ${VAPI_BASE_URL}`);
+  console.log(
+    "═══════════════════════════════════════════════════════════════\n",
+  );
+
+  console.log("📂 Loading resources...\n");
+  const resources: LoadedResources = {
+    tools: await loadResources("tools"),
+    structuredOutputs: await loadResources("structuredOutputs"),
+    assistants: await loadResources("assistants"),
+    squads: await loadResources("squads"),
+    personalities: await loadResources("personalities"),
+    scenarios: await loadResources("scenarios"),
+    simulations: await loadResources("simulations"),
+    simulationSuites: await loadResources("simulationSuites"),
+    evals: await loadResources("evals"),
+  };
+
+  const findings = validateResources(resources);
+  console.log(`\n${summarizeFindings(findings)}\n`);
+
+  const errorCount = findings.filter((f) => f.severity === "error").length;
+  if (errorCount > 0) {
+    console.error(
+      `❌ Validation failed with ${errorCount} error(s). Fix the issues above before pushing.`,
+    );
+    process.exit(1);
+  }
+  console.log("✅ Validation passed.");
+}
+
+const isMainModule =
+  process.argv[1] !== undefined &&
+  resolve(process.argv[1]) === fileURLToPath(import.meta.url);
+
+if (isMainModule) {
+  main().catch((error) => {
+    console.error(
+      "\n❌ Validation failed:",
+      error instanceof Error ? error.message : error,
+    );
+    process.exit(1);
+  });
+}
diff --git a/src/validate.ts b/src/validate.ts
diff --git a/tests/validate.test.ts b/tests/validate.test.ts