From 28aa4c9d74138601d2f3de205b2196d6a8e0c84b Mon Sep 17 00:00:00 2001
From: Dhruva Reddy
Date: Fri, 1 May 2026 12:37:38 -0700
Subject: [PATCH] docs: adopt upstream improvements.md log + voice-providers cheat-sheet
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## ELI5

**Problem.** Two of our customer-fork repos (`gitops-mudflap`,
`gitops-amazon3p`) kept their own running notes about engine quirks
("man, this is annoying when X happens"). Those notes never made it back
upstream, so every new customer hit the same friction. There was also no
convention for *anyone*, human or AI, to leave behind a "this should be
better" trail.

**What this fix does.** Adopts the customer-log format upstream
(severity-ranked, evidence-tagged) and seeds it with 20 entries
catalogued from both customer logs. Adds a voice-provider cheat-sheet
under `docs/learnings/` so the most common 400-rejection class
(`voice.speed` on Cartesia) becomes a one-page lookup. Updates
`.gitignore` to stop sweeping AI agent handoff scratch (`.agent/`,
`.claude/handoffs/`) into commits via `git add -A`. Adds a CLAUDE.md
section telling future contributors how to log new entries.

**Outcome you'll notice.** Every fresh customer clone of this template
inherits the running log on day one. When you hit something annoying,
you append an entry in the same change instead of carrying it as
folklore. As later stacks land, rows in the triage table flip from
`Open` to `RESOLVED`, so the file becomes a living changelog.

---

Land all the zero-engine-change cleanups in one small PR so the rest of
the stack starts from a clean docs surface.

- improvements.md (NEW, repo root): adopt the severity-ranked,
  evidence-tagged catalog format from the Amazon3p customer log. Seeds 20
  entries catalogued from gitops-mudflap and gitops-amazon3p. Triage
  table rows flip from Open → RESOLVED as later stacks land.
- docs/learnings/voice-providers.md (NEW): per-provider voice block
  cheat-sheet (Cartesia vs 11labs vs OpenAI/Azure/Rime/LMNT/Minimax/
  Neuphonic/SmallestAI). Closes the manual-lookup half of #9.
- docs/learnings/README.md: route the new entry from the index.
- AGENTS.md: document multi-file push (closes #14) + voice-providers
  routing row.
- CLAUDE.md: add Improvements log section instructing future
  contributors (humans + AI agents) to append entries when they hit
  friction.
- .gitignore: cover .agent/, .agent/handoffs/, .claude/handoffs/ so
  `git add -A` doesn't sweep PII handoff scratch (closes #13). Drop the
  legacy "requested improvements.md" line since the local-only
  convention is superseded by upstream's improvements.md.

Closes improvements.md #13, #14. Partial #9 (doc cheat-sheet half).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---

## Update: 11labs pronunciation dictionary coverage

Generalized the docs + log from "Cartesia voice picker drops the
pronunciation dictionary" to "voice edits drop the dictionary across
both Cartesia and 11labs", since the Vapi platform documents 11labs as
the supported provider for pronunciation dictionaries
(https://docs.vapi.ai/assistants/pronunciation-dictionaries) and exposes
a different field shape:

- 11labs (documented): `voice.pronunciationDictionaryLocators[]` (array
  of `{ pronunciationDictionaryId, versionId }`).
- Cartesia (passthrough): `voice.pronunciationDictId` (single string
  id; not in Vapi docs but observed in real customer payloads).

Doc + improvements.md updates only; the engine-side drift detection that
uses these shapes lands in PR #20 (Stack G).
---
 .gitignore                        |  10 +-
 AGENTS.md                         |   3 +
 CLAUDE.md                         |  20 +
 docs/learnings/README.md          |   4 +-
 docs/learnings/voice-providers.md |  97 +++
 improvements.md                   | 977 ++++++++++++++++++++++++++++++
 6 files changed, 1106 insertions(+), 5 deletions(-)
 create mode 100644 docs/learnings/voice-providers.md
 create mode 100644 improvements.md

diff --git a/.gitignore b/.gitignore
index b04f530..f38c759 100644
--- a/.gitignore
+++ b/.gitignore
@@ -20,8 +20,12 @@ Thumbs.db

 tmp/

+# Local snapshots written by `npm run push` for `npm run rollback` recovery.
+# Operator-local; not shared.
+.vapi-state.*.snapshots/
+
 # Local agent state
 .claude/
-
-# Local-only audit notes (not part of the upstream repo)
-requested improvements.md
+.agent/
+.agent/handoffs/
+.claude/handoffs/
diff --git a/AGENTS.md b/AGENTS.md
index 37db125..5ee3bad 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -31,6 +31,7 @@ This project manages **Vapi voice agent configurations** as code. All resources
 | Building outbound calling agents | `docs/learnings/outbound-agents.md` |
 | Voicemail detection / VM vs human classification | `docs/learnings/voicemail-detection.md` |
 | Enforcing call time limits / graceful call ending | `docs/learnings/call-duration.md` |
+| Voice provider field cheat-sheet (Cartesia vs 11labs vs OpenAI etc.) | `docs/learnings/voice-providers.md` |

 ---

@@ -50,6 +51,7 @@ This project manages **Vapi voice agent configurations** as code. All resources
 | Pull latest from Vapi | `npm run pull -- `, `--force`, or `--bootstrap` |
 | Pull one known remote resource | `npm run pull -- --type assistants --id ` |
 | Push only one file | `npm run push -- resources//assistants/my-agent.md` |
+| Push multiple specific files | `npm run push -- ` (one state-file rewrite at the end) |
 | Test a call | `npm run call -- -a ` |

 ---
@@ -744,6 +746,7 @@ npm run pull -- --type squads --id # Pull one known remote resou
 npm run push -- # Push all local changes to Vapi
 npm run push -- assistants # Push only assistants
 npm run push -- resources//assistants/my-agent.md # Push single file
+npm run push -- # Push multiple specific files (one state write)
 npm run apply -- # Pull then push (full sync)

 # Testing
diff --git a/CLAUDE.md b/CLAUDE.md
index 83ff0a5..b9f1a7c 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -26,6 +26,26 @@ When both files exist, follow both. If guidance overlaps, treat `AGENTS.md` as t
 - WebSocket transport → `docs/learnings/websocket.md`
 - Call time limits / graceful ending → `docs/learnings/call-duration.md`

+## Improvements log
+
+This repo maintains an upstream-only running log at `improvements.md` (repo
+root). It tracks engine friction, footguns, and improvement ideas surfaced
+during real customer work, both before and after fixes land.
+
+**When you (Claude or human) hit something that makes you go "this should be
+better," append or update an entry in `improvements.md` in the same change.**
+The format is **Problem → Current behavior → Risk → Current mitigation →
+Possible fix → Status**, ordered by severity / blast radius. Cite source
+file paths with line numbers so future readers can verify your claims.
+
+When a fix lands, mark the entry `[RESOLVED YYYY-MM-DD] (#)` at
+the top; don't delete it. The history is the point.
+
+Customer-fork logs (`gitops-mudflap/improvements.md`,
+`gitops-amazon3p/improvements.md`) feed upstream: when an entry there is
+generic enough to apply across customers, surface it here in the same
+revision.
+
 ## Test-Call CLI Notes

 When debugging a customer issue with `npm run call -- -s `:
diff --git a/docs/learnings/README.md b/docs/learnings/README.md
index 480bf04..483c14e 100644
--- a/docs/learnings/README.md
+++ b/docs/learnings/README.md
@@ -26,7 +26,7 @@ Each file targets a specific topic so you can load only the context you need.
 | Bulk-dialing from a CSV (Outbound Call Campaigns) | [outbound-campaigns.md](outbound-campaigns.md) |
 | Voicemail detection / VM vs human classification | [voicemail-detection.md](voicemail-detection.md) |
 | Enforcing call time limits / graceful call ending | [call-duration.md](call-duration.md) |
-| Authoring YAML resource files (scalar coercion, frontmatter conventions) | [yaml-conventions.md](yaml-conventions.md) |
+| Voice provider field cheat-sheet (Cartesia vs 11labs vs others) | [voice-providers.md](voice-providers.md) |

 ---

@@ -44,7 +44,7 @@ Gotchas and silent defaults for each resource type:
 | [structured-outputs.md](structured-outputs.md) | Schema type gotchas, assistant_ids, default models, target modes, KPI patterns |
 | [simulations.md](simulations.md) | Personalities, evaluation comparators, chat-mode gotcha, missing references, full `/eval/simulation/*` API reference |
 | [webhooks.md](webhooks.md) | Default server messages, timeouts, unreachable servers, credential resolution, payload shape |
-| [yaml-conventions.md](yaml-conventions.md) | YAML 1.1 boolean coercion (`off`/`yes`/`no`), whitespace-truthy gotchas, discriminated-union sentinels, deprecated-field footguns, multi-line block scalars, anchors/aliases, frontmatter fence rules |
+| [voice-providers.md](voice-providers.md) | Per-provider voice block layout (Cartesia vs 11labs vs OpenAI/Azure/Rime/LMNT/Minimax/Neuphonic/SmallestAI); saves 400s at push time |

 ### Troubleshooting Runbooks

diff --git a/docs/learnings/voice-providers.md b/docs/learnings/voice-providers.md
new file mode 100644
index 0000000..9598e53
--- /dev/null
+++ b/docs/learnings/voice-providers.md
@@ -0,0 +1,97 @@
+# Voice Providers: Field Cheat-Sheet
+
+The `voice` block on an assistant or `membersOverrides.voice` on a squad is **provider-specific**. The same conceptual field (e.g. "speed") lives at different paths depending on the provider. The Vapi platform rejects misplaced fields with a generic `property X should not exist` 400 that does not point to the correct path. This page is the lookup table.
+
+> **When a 400 says "property X should not exist":** check this page for the provider's field layout before re-pushing. The engine has no schema awareness and will accept whatever you write, then surface the error only after the push reaches the API.
+
+---
+
+## Quick lookup
+
+| Field | 11labs | Cartesia (sonic-3) | OpenAI / Azure / Rime / LMNT / Minimax / Neuphonic / SmallestAI |
+|-------|--------|---------------------|------------------------------------------------------------------|
+| Speech rate | `voice.speed` (0.7–1.2) | `voice.generationConfig.speed` (0.6–1.5) | `voice.speed` |
+| Stability / consistency | `voice.stability` (0.0–1.0) | n/a (not exposed) | n/a |
+| Voice similarity | `voice.similarityBoost` (0.0–1.0) | n/a | n/a |
+| SSML parsing | `voice.enableSsmlParsing: true` | (parsed natively, no flag) | varies; see provider docs |
+| Pronunciation dictionary | `voice.pronunciationDictionaryLocators[]` (array of `{pronunciationDictionaryId, versionId}`) | `voice.pronunciationDictId` (single string id; not in Vapi docs but accepted as a Cartesia passthrough) | n/a |
+| Volume control | n/a | `voice.generationConfig.volume` (0.5–2.0) | n/a |
+| Emotion / accent (experimental) | n/a | `voice.experimentalControls.emotion`, `voice.experimentalControls.speed` (-1 to 1, older API) | n/a |
+
+---
+
+## 11labs
+
+```yaml
+voice:
+  provider: 11labs
+  voiceId:
+  model: eleven_turbo_v2           # or eleven_flash_v2_5
+  speed: 1.05                      # 0.7–1.2
+  stability: 0.6                   # 0.0–1.0; higher = less expressive variation
+  similarityBoost: 0.75            # 0.0–1.0; higher = closer to source voice
+  enableSsmlParsing: true          # required for ``, ``, etc.
+  pronunciationDictionaryLocators: # ElevenLabs PLS dictionaries; multiple allowed
+    - pronunciationDictionaryId: rjshI10OgN6KxqtJBqO4
+      versionId: xJl0ImZzi3cYp61T0UQG
+```
+
+Common pitfalls:
+- `voice.generationConfig.*`: **does not exist** for 11labs. That's a Cartesia path; push will 400.
+- Forgetting `enableSsmlParsing: true`: SSML tags will be spoken literally.
+- `voice.pronunciationDictId` (single string): that's the Cartesia shape. 11labs uses `voice.pronunciationDictionaryLocators[]` (array of `{pronunciationDictionaryId, versionId}`). Reference: .
+
+**Pronunciation dictionary warning (11labs):** dashboard edits that change the voice can silently drop `pronunciationDictionaryLocators` entries. This is the same drift class as Cartesia, just with the array shape. Treat the locators array as part of the voice's identity during edits.
+
+---
+
+## Cartesia (sonic-3)
+
+```yaml
+voice:
+  provider: cartesia
+  model: sonic-3
+  voiceId:
+  pronunciationDictId: pdict_   # optional but sticky; see warning below
+  generationConfig:
+    speed: 1.1                  # 0.6–1.5
+    volume: 1.0                 # 0.5–2.0
+  experimentalControls:
+    speed: 0.0                  # -1 to 1 (older API path)
+    emotion: ["positivity:high"]
+```
+
+**Forbidden at top level for Cartesia (will 400):**
+- `voice.speed`: use `voice.generationConfig.speed` instead.
+- `voice.enableSsmlParsing`: Cartesia parses SSML (``, ``) natively from the text stream; no opt-in flag exists.
+- `voice.stability`, `voice.similarityBoost`: those are 11labs fields.
+
+**Pronunciation dictionary warning (Cartesia):** changing the `voiceId` in the Vapi dashboard's voice picker silently drops `pronunciationDictId` from the resource. If you swap the Cartesia voice via the dashboard, re-add `pronunciationDictId` to the local file after the next pull and push it back, or the dictionary stays detached. Treat `(voiceId, pronunciationDictId)` as one atomic unit during edits. Note: `voice.pronunciationDictId` for Cartesia is observed in real customer payloads but is not in the Vapi docs (Vapi only documents the 11labs `pronunciationDictionaryLocators[]` shape; see the 11labs section above). Vapi appears to pass the field through to Cartesia's native API; behavior may change without notice.
+
+---
+
+## OpenAI / Azure / Rime / LMNT / Minimax / Neuphonic / SmallestAI
+
+```yaml
+voice:
+  provider: openai   # or azure, rime, lmnt, minimax, neuphonic, smallestai
+  voiceId:
+  model:             # e.g. tts-1-hd for openai
+  speed: 1.0         # top-level for these providers
+```
+
+These providers expose `speed` at the top of the `voice` block. Refer to the [Vapi voice provider docs](https://docs.vapi.ai/providers/voice) for additional provider-specific fields (instructions, language hints, etc.).
+
+---
+
+## Switching providers
+
+When migrating an assistant or squad member from Cartesia to 11labs (or vice versa), the field layout flips. If you carry over `generationConfig` from a Cartesia config to an 11labs voice, the next push will 400. Always rewrite the voice block from the target provider's template; do not patch in place.
+
+If a customer changes the provider on the dashboard and your local YAML still has the old nesting, `pull` will overwrite it cleanly, but a subsequent `push` from a stale branch will 400. Pull first, then edit.
+
+---
+
+## Adding a new provider
+
+If you find yourself reaching for a provider not in the table above, append a row here in the same PR. The cheat-sheet only stays useful if it grows with the platform.
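The misplaced-field rules in the voice-provider cheat-sheet above are regular enough for a mechanical pre-push check. The sketch below is a hypothetical lint, not engine code. `findMisplacedVoiceFields`, `MISPLACED`, and the hint strings are illustrative names. It encodes only the "Forbidden at top level" and "Common pitfalls" rules listed in the cheat-sheet, so it is not a full schema and will not catch errors the tables don't describe.

```typescript
// Hypothetical lint encoding the misplaced-field rules from
// docs/learnings/voice-providers.md; not part of the engine.
type VoiceBlock = { provider?: string } & Record<string, unknown>;

interface VoiceFieldIssue {
  field: string; // dotted path under `voice`
  hint: string;  // what to do instead
}

// Top-level keys each provider rejects, mapped to a more useful hint than
// the API's generic "property X should not exist" 400.
const MISPLACED: Record<string, Record<string, string>> = {
  cartesia: {
    speed: "use voice.generationConfig.speed (0.6-1.5)",
    enableSsmlParsing: "remove it; Cartesia parses SSML natively",
    stability: "11labs-only field; remove it",
    similarityBoost: "11labs-only field; remove it",
  },
  "11labs": {
    generationConfig: "Cartesia-only path; use top-level voice.speed",
    pronunciationDictId: "Cartesia shape; use voice.pronunciationDictionaryLocators[]",
  },
};

// Own-property check, so inherited names like "toString" never match a rule.
const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

function findMisplacedVoiceFields(voice: VoiceBlock): VoiceFieldIssue[] {
  const provider = voice.provider ?? "";
  const rules: Record<string, string> = hasOwn(MISPLACED, provider) ? MISPLACED[provider] : {};
  return Object.keys(voice)
    .filter((key) => hasOwn(rules, key))
    .map((key) => ({ field: `voice.${key}`, hint: rules[key] }));
}

// Example: a Cartesia block that still carries 11labs-style fields.
const issues = findMisplacedVoiceFields({
  provider: "cartesia",
  voiceId: "example-voice-id",
  speed: 1.1,
  enableSsmlParsing: true,
});
for (const issue of issues) console.log(`${issue.field}: ${issue.hint}`);
// voice.speed: use voice.generationConfig.speed (0.6-1.5)
// voice.enableSsmlParsing: remove it; Cartesia parses SSML natively
```

Because the rules are plain data, a new provider row in the cheat-sheet maps to one new entry in `MISPLACED`.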
diff --git a/improvements.md b/improvements.md
new file mode 100644
index 0000000..ba9c30a
--- /dev/null
+++ b/improvements.md
@@ -0,0 +1,977 @@
+# Vapi GitOps: Engine Improvements Log
+
+> **MAINTENANCE DIRECTIVE FOR CONTRIBUTORS (humans and AI agents):**
+> This file is the running log of friction, footguns, and improvement ideas
+> for the gitops engine in this template repo. It is the upstream source of
+> truth: every customer fork inherits it on clone, and every customer log
+> entry that surfaces an upstream-relevant gap eventually lands here.
+>
+> **When you discover ANY of the following, add an entry to this file in the
+> same change:**
+> - A push/pull/apply behavior that surprises a user or causes data loss
+> - A footgun in `src/*.ts` that isn't documented in `AGENTS.md` or `docs/learnings/`
+> - A missing safety rail (no drift detection, no dry-run, no rollback, etc.)
+> - A coordination problem (concurrent edits, dashboard-vs-local divergence)
+> - A workflow-level recommendation that emerged from real customer work
+>
+> **Format:** each entry uses the **Problem → Current behavior → Risk →
+> Current mitigation → Possible fix → Status** structure (see "Entry
+> template" below). Date the entry. Link to relevant source files / PRs
+> with line references so future readers can verify your claims.
+>
+> **Two evidence rules keep this file trustworthy:**
+> - **Verified current behavior**: confirmed in this repo (source, scripts,
+>   or docs) and cited directly.
+> - **Needs platform validation**: engine-side behavior verified, but the
+>   corresponding Vapi platform capability is still unknown. Label any
+>   platform-side claim that hasn't been confirmed.
+>
+> **When a fix lands**, mark the entry `[RESOLVED YYYY-MM-DD] (#)`
+> at the top of the entry; don't delete it. The history is the point.
+
+---
+
+## How to read this file
+
+Sections are ordered by **severity / blast radius**, not by date discovered.
+
+Within each entry:
+
+- **Problem**: one-sentence statement of what's wrong.
+- **Current behavior**: what the engine actually does today, with code
+  references so the next person can verify.
+- **Risk**: what can go wrong in real workflows.
+- **Current mitigation**: what users should do today to avoid the problem.
+- **Possible fix**: sketch of an engineering change.
+- **Status**: open / partially mitigated / resolved.
+
+## Triage at a glance
+
+**Statuses below reflect the state at the tip of each PR. Subsequent PRs in
+this stack flip rows from `Open` to `RESOLVED` as they land; the cell tells
+you which stack PR closes the row.**
+
+| # | Title | Why it matters | Depends on | Status |
+| --- | --- | --- | --- | --- |
+| 1 | `push` drift detection | Prevent silent overwrites of dashboard edits | #4 | Open (Stack G planned) |
+| 2 | `apply` same-file conflict | `apply` drops concurrent same-file dashboard edits | #4 | Open (Stack G planned) |
+| 3 | Rollback | Current undo can clobber newer live changes | #4, #5 | Open (Stack H planned) |
+| 4 | State schema content hashes | Architectural unlock for #1, #2, #3, #6, #7 | None | Open (Stack F planned) |
+| 5 | `push --dry-run` | Cheapest operator-safety win | None | Open (Stack C planned) |
+| 6 | API-level optimistic concurrency | Server-side conflict rejection | Platform | Deferred (Stack I, gated) |
+| 7 | Voice edits drop pronunciation-dictionary attachments | Silent regression on Cartesia + 11labs voice edits | #4 | Open (Stack G planned) |
+| 8 | Dashboard prompt edits can in-place duplicate the prompt | Two stacked prompt versions = stitched output | None | Open (Stack D planned) |
+| 9 | Provider-specific voice schema mismatch (push 400) | `voice.speed` vs `voice.generationConfig.speed` | None | Partial: doc cheat-sheet (Stack A) |
+| 10 | Targeted assistant push mints duplicate tools | Re-pushing assistant duplicates `end-call-*` tools | #4 | Partial |
+| 11 | Bidirectional SO ↔ assistant lockstep has no validation | One-sided edits silently inconsistent | None | Open (Stack D planned) |
+| 12 | State file accumulates UUIDs without source files | Silent gitops drift | None | Partial |
+| 13 | `.agent/` and `.claude/handoffs/` not gitignored | `git add -A` sweeps PII handoff scratch | None | RESOLVED 2026-04-30 (Stack A) |
+| 14 | Multi-file push undocumented | Discoverability | None | RESOLVED 2026-04-30 (Stack A) |
+| 15 | Scoped push rewrites entire state file | Pre-existing drift sweeps into focused commits | #4 | Open (Stack J planned) |
+| 16 | No CLI runner for simulation suites | Engine pushes them, can't run them | None | Open (Stack E planned) |
+| 17 | State file key-order churn produces noisy diffs | Reorderings hide real changes | None | Open (Stack B planned) |
+| 18 | Structured-output `name` capped at 40 chars (no warning) | Push fails partway after partial application | None | Open (Stack D planned) |
+| 19 | No `maxTokens` floor warning for tool-using assistants | `maxTokens: 1` bricks the assistant silently | None | Open (Stack D planned) |
+| 20 | Prompt vocabulary leaks into TTS | `Reason.` becomes verbal contaminant | None | Open (Stack D heuristic planned) |
+
+---
+
+## 1. `push` has no drift detection: silently overwrites concurrent dashboard edits
+
+**Discovered:** customer-fork log (Amazon3p `improvements.md` #1, 2026-04-17)
+
+### Problem
+
+`npm run push -- ` blindly `PATCH`es the local payload onto the
+platform without checking whether the platform's current state matches what
+we last pulled. If anyone else (a teammate, a customer, an automation)
+edits the same resource on the dashboard between our last pull and our
+push, their change is silently overwritten with no warning.
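The content-hash check proposed for this entry under *Possible fix* fits in a few lines. The sketch below makes four assumptions:

- The current platform payload has already been fetched. The GET itself is out of scope.
- Hashes are sha256 over JSON with keys sorted at every level.
- The input is JSON-decoded data.
- `canonicalJson`, `hashPayload`, `checkDrift`, and the `overwrite` switch are hypothetical names.

Nothing like this exists in `src/push.ts` today, and the sketch leaves out surfacing the actual diff.

```typescript
import { createHash } from "node:crypto";

// Serialize with object keys sorted at every level so that key order in the
// platform response does not change the hash. Assumes JSON-decoded input.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalJson(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const fields = Object.keys(obj)
      .filter((key) => obj[key] !== undefined) // JSON.stringify drops these too
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalJson(obj[key])}`);
    return `{${fields.join(",")}}`;
  }
  return JSON.stringify(value);
}

function hashPayload(payload: unknown): string {
  return createHash("sha256").update(canonicalJson(payload)).digest("hex");
}

type DriftDecision = { action: "push" } | { action: "refuse"; reason: string };

// Decide whether a push may proceed, given the hash recorded at the last
// pull and the payload the platform returns right now.
function checkDrift(
  lastPulledHash: string | undefined,
  currentPlatformPayload: unknown,
  overwrite = false,
): DriftDecision {
  if (lastPulledHash === undefined) {
    // Legacy state entry with no hash: fall back to today's behavior.
    return { action: "push" };
  }
  const currentHash = hashPayload(currentPlatformPayload);
  if (currentHash === lastPulledHash || overwrite) {
    return { action: "push" };
  }
  return {
    action: "refuse",
    reason: `platform changed since last pull (${lastPulledHash.slice(0, 8)} -> ${currentHash.slice(0, 8)}); re-run with --overwrite to force`,
  };
}

// Example: a payload recorded at pull time, then two possible live states.
const pulled = { name: "support-agent", firstMessage: "Hi!" };
const recorded = hashPayload(pulled);
console.log(checkDrift(recorded, { firstMessage: "Hi!", name: "support-agent" }).action); // push
console.log(checkDrift(recorded, { name: "support-agent", firstMessage: "Hello!" }).action); // refuse
```

Legacy entries without a stored hash fall back to an unconditional push. That keeps the check additive while older state files are migrated.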
+
+### Current behavior (Verified)
+
+The push code path is a straight `PATCH /resource/{uuid}` with the full
+local payload: no `If-Match` header, no version field comparison, no
+fetch-then-diff. See `src/push.ts:73-79` and `src/api.ts:65-71` (no
+conditional-write headers anywhere in the request path). The state file
+(`.vapi-state..json`) only stores identity mappings (`name → UUID`),
+with no content hashes, version numbers, or timestamps.
+
+### Risk
+
+A teammate dashboard-edits a prompt during a live test; you push your
+unrelated branch and their edit disappears. A customer success rep updates
+business hours via the dashboard; the next gitops push silently reverts
+it. A `git revert + push` rollback inherits the same problem: it
+overwrites whatever's currently live, not just the change being reverted.
+
+### Current mitigation
+
+Use `npm run apply -- ` (`pull → push`) instead of bare `push`. The
+`pull` step is git-aware and preserves locally-modified files while
+pulling fresh state for everything else (see #2 for the residual same-file
+conflict case). Bare `push` should be reserved for environments where you
+know nobody else touches the dashboard.
+
+### Possible fix
+
+1. **Content-hash drift detection.** Store sha256 of the platform's
+   last-known content per resource in `.vapi-state..json`. On push,
+   GET the current platform version, hash it, and refuse to push if the
+   hash doesn't match: surface the diff and require an explicit
+   `--overwrite` flag. Depends on #4.
+2. **Server-side ETag / If-Match.** See #6.
+3. **Pre-push diff (poor man's version of #1).** Run a `pull --dry-run`
+   before push and show the user what's about to change. Partial
+   mitigation only.
+
+### Status
+
+**Open.** Targeted by **Stack G** (drift detection); depends on **Stack F**
+(state schema). Mitigated by `apply -- ` for the non-same-file case.
+
+---
+
+## 2. `apply` (pull → push) silently drops dashboard edits to files modified locally
+
+**Discovered:** customer-fork log (Amazon3p #2, 2026-04-17)
+
+### Problem
+
+`pull` uses `git status --porcelain` to identify locally-modified files
+and **preserves the local version**, dropping the platform's version of
+those files entirely. There's no warning that the platform's version
+differs from what your local file was based on.
+
+### Current behavior (Verified)
+
+`src/pull.ts:117-135` (`getLocallyChangedFiles()`) and `src/pull.ts:705-735`
+(the preserve-local-on-pull branch). The "preserved" message in
+`src/pull.ts:887-896` tells you the count but not whether the platform's
+version of that same file diverged from your branch point. There's no
+3-way merge; local wins by default.
+
+### Risk
+
+You edit `assistants/foo.md` locally. A teammate edits the same
+`assistants/foo.md` on the dashboard. You run `apply`. Pull preserves your
+local version with no warning that the dashboard had a different version,
+then push overwrites the dashboard with yours. Their change is lost.
+
+### Current mitigation
+
+Coordinate on shared resources. Always commit before pushing so git
+history at least preserves your version cleanly. After any known
+dashboard-side change, run `pull` first so the conflict surfaces as a
+`git diff` rather than a silent overwrite.
+
+### Possible fix
+
+Same as #1: with content-hash drift detection (#4), `pull` could detect
+the same-file conflict and either refuse to preserve (requiring
+`--keep-local ` resolution), or write the platform's version to a
+sibling `.platform.yml` for manual 3-way merge.
+
+### Status
+
+**Open.** Targeted by **Stack G**.
+
+---
+
+## 3. No rollback command: `git revert + push` inherits all of #1's problems
+
+**Discovered:** customer-fork log (Amazon3p #3, 2026-04-17)
+
+### Problem
+
+The README documents the rollback strategy as `git revert + push`. That
+restores local content to a previous git state, but it does **not**
+restore a known platform snapshot. The subsequent push still has all the
+drift problems above, so a "rollback" can clobber unrelated dashboard
+edits made since the bad deploy. There is also no engine-level snapshot
+of what was sent.
+
+### Current behavior (Verified)
+
+`package.json` has no `rollback` script. The README still documents
+rollback as a git-level revert followed by a push. The platform-side
+safety net is the dashboard's Version History feature (manual,
+per-resource, dashboard-driven).
+
+### Risk
+
+Rollback is a manual two-step (`git revert ` → `npm run push --
+`), with the same overwrite risk as any other push. If the bad push
+was never committed locally, there's no clean rollback target in git.
+
+### Current mitigation
+
+Always `git commit` before `push -- `. For mission-critical
+resources, note UUIDs so dashboard Version History is reachable.
+
+### Possible fix
+
+**Snapshot-on-push.** Before each PATCH, write the *outgoing* payload AND
+the *current platform payload* to
+`.vapi-state..snapshots///.json`. Add
+`npm run rollback -- --to `.
+
+### Status
+
+**Open.** Targeted by **Stack H**; depends on **Stack F**.
+
+---
+
+## 4. State file is identity-only: no content snapshots
+
+**Discovered:** customer-fork log (Amazon3p #4, 2026-04-17)
+
+### Problem
+
+`.vapi-state..json` stores `name → UUID` mappings only. It has no
+record of the content that was last pulled or pushed for each resource.
+This is the architectural reason drift detection isn't possible: the
+engine has no "last known platform state" to compare against.
+
+### Current behavior (Verified)
+
+`src/types.ts:5-16` types every section as `Record`.
+`src/state.ts:10-22` (`createEmptyState()`) and the load/save flow at
+`src/state.ts:25-64` carry only identity mappings.
+
+### Risk
+
+Upstream cause of #1, #2, #3, #6, #7, #15. Fixing this enables the
+proposed mitigations above.
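The backwards-compatibility claim under *Possible fix* below depends on wrapping today's string-only entries when the state file is loaded. Here is a minimal sketch of that step. It assumes each state section is a plain object mapping `name → UUID` strings, as described above. `StoredSection` and `upgradeSection` are hypothetical names, and `ResourceState` repeats the interface proposed below so that this block stands alone.

```typescript
// Mirrors the ResourceState shape proposed for #4 (all new fields optional).
interface ResourceState {
  uuid: string;
  lastPulledHash?: string;
  lastPulledAt?: string;
  lastPushedHash?: string;
  platformVersionId?: string;
}

// On disk, a section may hold legacy entries (bare UUID strings written by
// older engine versions), new-style objects, or a mix of both.
type StoredSection = Record<string, string | ResourceState>;

// Normalize one section to the new shape without dropping any mapping.
// Running it twice is a no-op, so half-migrated files load cleanly.
function upgradeSection(section: StoredSection): Record<string, ResourceState> {
  const upgraded: Record<string, ResourceState> = {};
  for (const name of Object.keys(section)) {
    const entry = section[name];
    upgraded[name] = typeof entry === "string" ? { uuid: entry } : entry;
  }
  return upgraded;
}

// Example: one legacy entry and one already-upgraded entry.
const onDisk: StoredSection = {
  "support-agent": "00000000-0000-4000-8000-000000000001",
  "end-call-tool": { uuid: "00000000-0000-4000-8000-000000000002", lastPulledHash: "abc123" },
};
console.log(upgradeSection(onDisk)["support-agent"].uuid); // 00000000-0000-4000-8000-000000000001
```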
+
+### Possible fix
+
+Extend the state schema to include content hashes (and optionally
+last-pull timestamps and platform-reported version IDs):
+
+```ts
+interface ResourceState {
+  uuid: string;
+  lastPulledHash?: string;    // sha256 of normalized platform payload
+  lastPulledAt?: string;      // ISO timestamp
+  lastPushedHash?: string;    // sha256 of last pushed payload
+  platformVersionId?: string; // if Vapi exposes one
+}
+```
+
+The existing `loadState()` merge with `createEmptyState()` (`src/state.ts:48-52`)
+makes the additive shape backwards-compatible: legacy string-only
+entries can be wrapped at load time.
+
+### Status
+
+**Open.** Targeted by **Stack F**, the architectural prerequisite for
+G, H, I, J.
+
+---
+
+## 5. No `push --dry-run` / pre-push diff
+
+**Discovered:** customer-fork log (Mudflap #6 + Amazon3p #5, 2026-04-17/28)
+
+### Problem
+
+There's no way to preview what `push` will change on the platform before
+running it. Vapi's dashboard has "Version Preview" for the same purpose;
+the engine doesn't have a local equivalent.
+
+### Current behavior (Verified)
+
+`push.ts` has a dry-run concept only for **deletions**: with
+`FORCE_DELETE` off (the default), orphaned resources are listed but not
+deleted (see `src/push.ts:842`). There is no dry-run for updates or
+creates.
+
+### Risk
+
+Users cannot validate "is this push doing what I think it's doing"
+before it lands on prod. In a multi-customer repo with prod state, an
+accidental wide-scope push (e.g. forgetting a file path arg) hits live
+assistants. Compounds #1.
+
+### Possible fix
+
+Add `--dry-run` to `src/config.ts`'s `parseFlags()`. At every
+`vapiRequest("PATCH"|"POST"|"DELETE", ...)` call site, gate behind
+`if (!DRY_RUN)`. Print `[dry-run] would PATCH /assistant/` instead.
+Skip the state-file write entirely. End-of-run summary: `would create N,
+would update M, would delete K`.
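The entry above suggests gating each `vapiRequest` call site behind `if (!DRY_RUN)`. An alternative is to wrap the request function once, so a newly added call site cannot forget the gate. The sketch below assumes a generic `(method, path, body)` request function. `withDryRun`, `Requester`, and the log format are hypothetical and do not reflect the real signature of `vapiRequest`.

```typescript
type HttpMethod = "GET" | "POST" | "PATCH" | "DELETE";
type Requester = (method: HttpMethod, path: string, body?: unknown) => Promise<unknown>;

interface DryRunCounts {
  create: number;
  update: number;
  delete: number;
}

// Wrap a request function so that, in dry-run mode, mutating calls are
// logged and counted instead of sent. GETs still go through so the run
// can read live state.
function withDryRun(send: Requester, dryRun: boolean) {
  const counts: DryRunCounts = { create: 0, update: 0, delete: 0 };

  const request: Requester = async (method, path, body) => {
    if (!dryRun || method === "GET") {
      return send(method, path, body);
    }
    if (method === "POST") counts.create++;
    else if (method === "PATCH") counts.update++;
    else counts.delete++;
    console.log(`[dry-run] would ${method} ${path}`);
    return undefined; // no response body in dry-run mode
  };

  const summary = (): string =>
    `would create ${counts.create}, would update ${counts.update}, would delete ${counts.delete}`;

  return { request, summary, counts };
}

// Stand-in for the real HTTP call; dry-run mode never reaches it for writes.
const send: Requester = async () => {
  throw new Error("network disabled in this example");
};
const { request, summary } = withDryRun(send, true);
void request("PATCH", "/assistant/example-uuid", { name: "support-agent" });
void request("POST", "/tool", { name: "end-call-tool" });
console.log(summary()); // would create 1, would update 1, would delete 0
```

In dry-run mode the wrapper returns `undefined` instead of a response. Callers that read a POST result (for example, to record a newly minted UUID) must tolerate that, and the state-file write still has to be skipped separately, as the entry says.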
+
+### Status
+
+**Open.** Targeted by **Stack C**, the cheapest individual fix; partially
+mitigates #1, #3, #6.
+
+---
+
+## 6. No optimistic concurrency at the API protocol level
+
+**Discovered:** customer-fork log (Amazon3p #6, 2026-04-17)
+
+### Problem
+
+Even if the engine were perfectly drift-aware locally, true race
+prevention still needs help from the write API. If two clients race, the
+cleanest outcome is for the server to reject stale writes rather than
+letting the last writer win silently.
+
+### Current behavior
+
+**Verified in engine:** mutating requests in `src/api.ts:65-71` send only
+auth and content-type headers. No `If-Match` / `If-Unmodified-Since`
+anywhere.
+
+**Needs platform validation:** we have not yet confirmed whether Vapi
+write endpoints support ETags, `If-Match`, `If-Unmodified-Since`, or any
+equivalent optimistic-concurrency mechanism. Until that is verified,
+"the engine does not send conditional headers" and "the API does/does
+not support them" are separate statements.
+
+### Risk
+
+Two simultaneous gitops pipelines (e.g. a dev pushing and a CI job
+deploying) could race on the same resource with no conflict detection at
+any layer.
+
+### Current mitigation
+
+None at the API level. The `apply` flow + git coordination is the only
+defense.
+
+### Possible fix
+
+1. Confirm whether the API supports `If-Match` / `If-Unmodified-Since`
+   on `PATCH /assistant/{id}`, `PATCH /squad/{id}`, etc.
+2. If yes: extend `vapiRequest` to accept an optional ETag and have the
+   apply functions in `src/push.ts` send the last-known ETag (stored in
+   #4's extended state file).
+3. If no: file a feature request with Vapi.
+
+### Status
+
+**Deferred pending platform validation (2026-04-30).** Stack I in the
+sequenced plan is intentionally not landed in this branch. Implementing
+`If-Match` / `ETag` on the engine side without confirming the platform
+honors the headers would create dead code that gives a false sense of
+safety: pushes would still succeed under races, and the conditional-header
+guard would do nothing. Owner: file a feature-request ticket with the Vapi
+platform team to confirm support, then ship Stack I behind a flag.
+
+---
+
+## 7. Voice edits drop pronunciation-dictionary attachments (Cartesia + 11labs)
+
+**Discovered:** customer-fork log (Amazon3p #7, 2026-04-19)
+
+### Problem
+
+When a voice configuration changes in the Vapi dashboard, the
+pronunciation-dictionary attachment can be **silently removed** from
+the resource. Two shapes are affected:
+
+- **Cartesia:** `voice.pronunciationDictId` (single string id),
+  observed dropping on voice-picker edits in the customer log.
+- **11labs:** `voice.pronunciationDictionaryLocators` (array of
+  `{ pronunciationDictionaryId, versionId }` objects), the
+  documented Vapi shape; the same drift class applies if a
+  dashboard edit detaches an entry from the array.
+
+The new voice is selected, but the dictionary attachment is dropped
+without warning.
+
+### Current behavior (Verified)
+
+Confirmed for Cartesia by diffing pre/post-customer-edit pulls of the
+same squad's `membersOverrides.voice` block: the `pronunciationDictId`
+line vanishes on voice change. The 11labs shape is documented at
+ and uses
+an array; either array shrink or array clear is the equivalent drift.
+Note Cartesia's single-id form is **not** in the Vapi docs but is
+accepted as a passthrough to Cartesia's native API.
+
+### Risk
+
+Acronym/brand pronunciation regresses wherever the dictionary was the
+only source of truth. Customers compensate by stuffing inline
+pronunciation rules into prompts, which is strictly worse. Drift is
+invisible until you actually listen to the agent.
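Fix 1 under *Possible fix* below (the pull-side warning) needs only two inputs: the voice block as it was before the pull and the block as it is after. Here is a minimal sketch that handles the two shapes described above. `dictionaryDropWarnings` is a hypothetical name. The sketch matches 11labs locators on `pronunciationDictionaryId` alone. That is a design choice, so that a `versionId` bump is not reported as a detachment.

```typescript
interface DictionaryLocator {
  pronunciationDictionaryId: string;
  versionId: string;
}

// Only the two dictionary fields matter here; everything else is opaque.
interface VoiceBlock {
  pronunciationDictId?: string;                          // Cartesia (passthrough)
  pronunciationDictionaryLocators?: DictionaryLocator[]; // 11labs (documented)
  [key: string]: unknown;
}

// Compare the local voice block from before a pull with the pulled one and
// describe every dictionary attachment that disappeared.
function dictionaryDropWarnings(before: VoiceBlock, after: VoiceBlock): string[] {
  const warnings: string[] = [];

  // Cartesia shape: one id that should survive a voice change.
  if (before.pronunciationDictId && !after.pronunciationDictId) {
    warnings.push(`voice.pronunciationDictId "${before.pronunciationDictId}" was removed`);
  }

  // 11labs shape: every dictionary attached before should still be attached.
  const remaining = after.pronunciationDictionaryLocators ?? [];
  for (const locator of before.pronunciationDictionaryLocators ?? []) {
    const stillAttached = remaining.some(
      (l) => l.pronunciationDictionaryId === locator.pronunciationDictionaryId,
    );
    if (!stillAttached) {
      warnings.push(`voice.pronunciationDictionaryLocators lost ${locator.pronunciationDictionaryId}`);
    }
  }
  return warnings;
}

// Example: a dashboard voice swap on a Cartesia voice.
const beforePull: VoiceBlock = {
  provider: "cartesia",
  voiceId: "old-voice",
  pronunciationDictId: "pdict_example",
};
const afterPull: VoiceBlock = { provider: "cartesia", voiceId: "new-voice" };
for (const warning of dictionaryDropWarnings(beforePull, afterPull)) {
  console.warn(`warning: ${warning}`);
}
// warning: voice.pronunciationDictId "pdict_example" was removed
```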
+ +### Current mitigation + +After any known voice change, immediately verify that the dictionary +attachment is still set: + +- Cartesia: `voice.pronunciationDictId` still present. +- 11labs: `voice.pronunciationDictionaryLocators` still has the + expected entries. + +Treat the dictionary attachment as part of the voice's identity during +edits. See `docs/learnings/voice-providers.md`. + +### Possible fix + +1. **Pull-side warning.** When `pull` materialises a `voice` block that + loses a previously-tracked dictionary attachment (either the + Cartesia `pronunciationDictId` or shrinkage in the 11labs + `pronunciationDictionaryLocators` array), log a warning so the + removal isn't invisible in the diff. Doesn't need #4. +2. **Push-side warning.** When `push` detects that local has a + dictionary attachment but platform doesn't, surface a warning + before applying. Needs #4 + drift detection. +3. **Vapi dashboard fix.** File a feature request to preserve + dictionary attachments across voice changes (when the new voice + supports it), or warn the user explicitly. + +### Status + +**Open.** Targeted by **Stack G** as a provider-aware drift-detection +warning covering both shapes. + +--- + +## 8. Dashboard prompt edits can in-place duplicate the existing prompt + +**Discovered:** customer-fork log (Amazon3p #8, 2026-04-19) + +### Problem + +When a user edits a long prompt in the Vapi dashboard, it's easy to paste +a new version on top of the existing one without first selecting and +removing the old text. The result: the saved prompt contains BOTH the +old and new versions stacked, with internally contradictory instructions. +The agent then follows both sets of rules and produces stitched-together +/ repeating output. + +### Current behavior (Verified) + +The dashboard accepts the duplicated prompt without complaint. The +gitops repo only surfaces the issue on the next pull, where the file +silently grows 2-5x. + +### Risk + +Silent prompt corruption. 
Hard to diagnose from runtime symptoms alone.
Customers who edit both through gitops and in the dashboard are affected most acutely.

### Current mitigation

After any customer-side prompt edit, run `pull -- ` and inspect
prompt sizes. A sudden 2-5x size jump is almost always either a paste-on-top
duplication or an intentional rewrite that needs review.

### Possible fix

1. **Engine-level lint.** `npm run validate -- ` heuristics:
   - Same opening header (`You are the ...` or any `# H1`) appearing twice
     in one prompt
   - Two `CONTINUITY ON ENTRY` blocks
   - Same line repeated 3+ times consecutively
   - Tool references in the prompt that aren't in `model.toolIds` or
     `tools:append`
2. **Vapi dashboard fix.** A diff/preview view in the dashboard prompt
   editor that highlights apparent duplicate blocks before save.

### Status

**Open.** Targeted by **Stack D** (heuristic lint; engine intervention
is only partial, because duplicated prompts can also be authored deliberately).

---

## 9. Provider-specific voice fields nest differently, and schema mismatches only surface at push time

**Discovered:** customer-fork log (Amazon3p #9, 2026-04-19)

### Problem

Vapi's voice config schema is **provider-specific**. For 11labs,
`voice.speed` is the correct path. For Cartesia, speed lives at
`voice.generationConfig.speed`: same field name, different nesting. The
gitops engine has no schema awareness: it accepts whatever you write,
posts it to Vapi, and only the API rejection at push time tells you the
field is in the wrong place.

### Current behavior (Verified)

Observed: `voice.speed` on a Cartesia voice → `400: property speed
should not exist`. `voice.enableSsmlParsing: true` on Cartesia → the same
400. The error is informative but doesn't say where the field _should_
exist, or whether it exists at all for that provider.

### Risk

Push fails after the change is fully prepped. It's easy to misread "rejected"
as "tool unavailable" rather than "wrong path."
Provider switches break silently in the inverse direction.

### Current mitigation

After any voice-related edit, push to a non-prod environment first if
one is available, OR consult `docs/learnings/voice-providers.md` (added in
**Stack A**) for the per-provider field layout.

### Possible fix

1. **Engine-level validator.** `npm run validate -- ` rejects:
   - Cartesia: `voice.speed`, `voice.enableSsmlParsing`,
     `voice.stability`, `voice.similarityBoost` at the top level (pointing
     the author to `generationConfig.*` instead).
   - 11labs: `voice.generationConfig.*` (pointing the author to the top level).
2. **Vapi side: clearer error message.** The API would respond with `property
   speed should not exist at this path; for cartesia use
   voice.generationConfig.speed`.

### Status

**Open.** Targeted by the **Stack D** validator plus the per-provider
cheat-sheet in `docs/learnings/voice-providers.md` (Stack A).

---

## 10. Targeted assistant pushes can auto-create duplicate tool dependencies

**Discovered:** customer-fork log (Amazon3p #10, 2026-04-29)

### Problem

Repeated targeted pushes of one assistant can auto-apply local tool
dependencies and mint new duplicate tool resources instead of reusing
the already-created dependency. In the customer log, repeatedly pushing one
assistant file to refresh only its voice config created multiple
`end-call-*` tools.

### Current behavior (Partially mitigated)

`src/push.ts:697-723` (`ensureToolExists()`) skips creation when the tool's
`toolId` is already a UUID, already exists as an exact key in
`state.tools`, or was auto-applied earlier in the same process. But
state can lose a tool's stable local key across bootstrap /
name-mismatch refreshes; the resolver then treats the same local
dependency as missing and creates a new dashboard tool.

### Risk

Dashboard clutter and state churn. The wrong dependency can become live:
the assistant may point at the newest duplicate while older ones remain
in state, making cleanup risky.
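Spotting this clutter is mostly a grouping problem. A sketch, assuming a hypothetical `ToolRecord` shape with just an `id` and a `name` (real tool payloads are richer, and `findDuplicateTools` is not an engine function). Grouping by exact name is also an assumption, so the comparison key is a parameter.

```typescript
// Hypothetical, minimal view of a tool record (remote or from state); real
// tool payloads carry many more fields.
interface ToolRecord {
  id: string;
  name: string;
}

// Group tool ids by a comparison key and keep only keys shared by more than
// one id: the candidate duplicates that repeated auto-apply would leave behind.
// The default key is the exact name; pass another `keyOf` (for example, one
// that strips a numeric suffix) if duplicates get renamed.
function findDuplicateTools(
  tools: ToolRecord[],
  keyOf: (tool: ToolRecord) => string = (tool) => tool.name,
): Map<string, string[]> {
  const byKey = new Map<string, string[]>();
  for (const tool of tools) {
    const key = keyOf(tool);
    const ids = byKey.get(key) ?? [];
    ids.push(tool.id);
    byKey.set(key, ids);
  }
  const duplicates = new Map<string, string[]>();
  byKey.forEach((ids, key) => {
    if (ids.length > 1) duplicates.set(key, ids);
  });
  return duplicates;
}
```

Any group it returns is a cleanup candidate; deciding which id is the live one still needs a human, since the assistant may point at the newest duplicate.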

### Current mitigation

Before re-pushing an assistant with local tool dependencies, inspect
`.vapi-state..json` for duplicate aliases and run
`npm run cleanup -- ` as a dry-run.

### Possible fix

1. **Resolve dependencies by stable identity before create.**
   `ensureToolExists()` should detect when a local tool payload already
   corresponds to an existing dashboard resource under a renamed /
   state-only key, and re-key state instead of creating a new tool.
2. **Duplicate-name guard for auto-applied dependencies.** Before
   `applyTool()` creates a tool during dependency resolution, query existing
   remote tools by name / function signature and warn, or reuse the tool if
   an equivalent exists.
3. **Dry-run output for targeted pushes** (Stack C).

### Status

**Partial.** `ensureToolExists()` blocks the most common path; the
state-renaming case remains. **Stack C dry-run** is intended to surface
auto-apply intent before mutation.

---

## 11. Bidirectional SO ↔ assistant attachment has no validation

**Discovered:** customer-fork log (Mudflap #3, 2026-04-28)

### Problem

A structured output's `assistant_ids:` list and each assistant's
`structuredOutputIds:` list are independent declarations of the same
edge. A one-sided edit looks fine locally but produces inconsistent
dashboard state depending on which side `push` reconciles from. Lockstep
rules become memory-only conventions, not engine-enforced invariants.

### Current behavior (Verified)

The push pipeline's `updateStructuredOutputAssistantRefs()`
(`src/push.ts:574-606`) and `updateToolAssistantRefs()` independently
PATCH each side based on whichever local file was authored, without ever
cross-checking that both sides agree.

### Risk

Inconsistent dashboard state. Hard to audit visually, because you have to
grep both files to detect drift.

### Current mitigation

Manual: grep both files when editing one side. Easy to miss.
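The manual grep can be replaced by a mechanical check over both sides of the edge. A sketch, assuming hypothetical parsed-file maps keyed by each resource's local name, and assuming both lists reference those local names (real files may hold UUIDs instead); `findOneSidedEdges` is an illustrative name.

```typescript
// Hypothetical parsed-file shapes, keyed by each resource's local name.
type StructuredOutputFiles = Record<string, { assistant_ids?: string[] }>;
type AssistantFiles = Record<string, { structuredOutputIds?: string[] }>;

// Return one message per SO <-> assistant edge declared on only one side.
function findOneSidedEdges(sos: StructuredOutputFiles, assistants: AssistantFiles): string[] {
  const problems: string[] = [];
  // SO side: every listed assistant must list this SO back.
  for (const [soName, so] of Object.entries(sos)) {
    for (const assistantName of so.assistant_ids ?? []) {
      const back = assistants[assistantName]?.structuredOutputIds ?? [];
      if (!back.includes(soName)) {
        problems.push(`SO ${soName} lists assistant ${assistantName}, which does not list it back`);
      }
    }
  }
  // Assistant side: every listed SO must list this assistant back.
  for (const [assistantName, assistant] of Object.entries(assistants)) {
    for (const soName of assistant.structuredOutputIds ?? []) {
      const back = sos[soName]?.assistant_ids ?? [];
      if (!back.includes(assistantName)) {
        problems.push(`assistant ${assistantName} lists SO ${soName}, which does not list it back`);
      }
    }
  }
  return problems;
}
```

An optional `--fix` mode would add the missing name to the other side instead of reporting it.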

### Possible fix

`npm run validate -- `:
- For every SO file's `assistant_ids:`, check that the named assistant's
  `structuredOutputIds:` lists this SO. If not, flag it.
- For every assistant's `structuredOutputIds:`, check that the named SO's
  `assistant_ids:` lists this assistant. If not, flag it.
- Optional `--fix` to auto-mirror.

### Status

**Open.** Targeted by **Stack D**.

---

## 12. State file accumulates UUIDs without source files (silent drift)

**Discovered:** customer-fork log (Mudflap #2, 2026-04-28)

### Problem

The state file claims live resources whose specs aren't in the repo. New
engineers cloning the repo see state references to phantom resources.
Lockstep guarantees ("source matches dashboard") quietly break.

### Current behavior (Partial)

`src/push.ts:167-231` (`getInvalidStateMappings()`) detects
`missing_remote` and `name_mismatch` cases at push time and triggers a
bootstrap pull, but it doesn't catch "state has UUID, no local source
file." The pull side treats a deleted local file as an intentional
delete tracked in state (`src/pull.ts:776-790`); that is the inverse
direction, and it is by design.

### Risk

Silent gitops drift. Phantom resources accumulate across sessions.

### Current mitigation

Run `npm run cleanup -- ` periodically to surface orphans on the dashboard
side. There is no equivalent for state-side orphans.

### Possible fix

At the start of `push` and the end of `pull`, run a reconciliation pass:
- For every UUID in state, check that a matching source file exists at
  the expected path. If not, warn:
  `state has UUID for X but no source file at ; either run pull
  or remove from state`.
- For every source file, check that state has a UUID entry. If not,
  warn: `source file Y exists but state has no UUID; will create new
  on push`.

Make these warnings non-blocking but very visible.
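A minimal sketch of that reconciliation pass for one resource type, assuming the state section is a plain local-key → UUID map and that the caller has already collected which local keys have a source file. `reconcileState` is a hypothetical name, the key-to-path mapping is left out, and the warning text paraphrases the messages proposed in this entry.

```typescript
// Hypothetical inputs for one resource type: the state section as a
// local-key -> UUID map, and the set of local keys that have a source file.
function reconcileState(state: Record<string, string>, sourceKeys: Set<string>): string[] {
  const warnings: string[] = [];
  // State-side orphans: a UUID is tracked but no source file backs it.
  for (const [key, uuid] of Object.entries(state)) {
    if (!sourceKeys.has(key)) {
      warnings.push(`state has UUID ${uuid} for ${key} but no source file; run pull or remove it from state`);
    }
  }
  // Source-side gaps: a file exists, so push would create a new resource.
  sourceKeys.forEach((key) => {
    if (!Object.prototype.hasOwnProperty.call(state, key)) {
      warnings.push(`source file ${key} exists but state has no UUID; push will create a new resource`);
    }
  });
  return warnings;
}
```

Callers would print the returned strings and carry on, which keeps the pass non-blocking.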

### Status

**Partial.** `getInvalidStateMappings()` covers two of the three cases;
state orphans without a source file remain uncovered.

---

## 13. `.agent/` and `.claude/handoffs/` are not gitignored

**[RESOLVED 2026-04-30] (Stack A)**

**Discovered:** customer-fork log (Mudflap #4, 2026-04-28)

### Problem

`.agent/` and `.claude/handoffs/` showed up in `git status` from the start
of a session. The repo's `.gitignore` did not cover the handoff-scratch
directories written by Claude Code's SessionStart hook and the new-thread skill.

### Risk

`git add -A` (or `gt modify -cam`, which uses it internally) silently
sweeps these directories into commits. Handoff files contain conversation
snapshots, sometimes including draft messages with PII or in-progress
decisions.

### Resolution

`.gitignore` now lists `.agent/`, `.agent/handoffs/`, and
`.claude/handoffs/` (the existing `.claude/` line already covered the
last of these, but Mudflap's log explicitly called out `.agent/`, which was
not covered). The legacy `requested improvements.md` line was removed: it
was a per-engineer convention, superseded by adopting the upstream
`improvements.md`.

---

## 14. Multi-file push works but is undocumented

**[RESOLVED 2026-04-30] (Stack A)**

**Discovered:** customer-fork log (Mudflap #5, 2026-04-28)

### Problem

`AGENTS.md` documented `npm run push -- ` for scoped
pushes. Multi-file pushes (` `) worked but were undiscoverable, so
engineers fell back to "push the whole org" (wider blast radius) or to
sequential single-file pushes (multiple state-file rewrites, hence more diff
noise).

### Resolution

The `AGENTS.md` Quick Reference table and Available Commands block now
document multi-file push. Verified as intentional in `src/config.ts:104-184`
(file-path argument detection accumulates into `filePaths[]`).

---

## 15. Scoped push still rewrites the entire state file

**Discovered:** customer-fork log (Mudflap #7, 2026-04-28)

### Problem

A surgical push of just two files rewrote the entire
`.vapi-state..json`, sweeping in pre-existing drift from earlier
pushes. The resulting committable state-file diff was much larger than
the actual push scope warranted.

### Current behavior (Verified)

`src/push.ts:1278-1280` calls `saveState(state)` with the full state
object after every push, regardless of which paths were targeted.

### Risk

Even a focused push produces a noisy state diff that may include
unintended, pre-existing dashboard drift. Reviewers can't tell "what did
this push do?" from the state-file diff alone.

### Possible fix

When a push is scoped, only update the state entries for resources it
actually touched. Track touched IDs during apply; at the end of the push,
merge (load existing state → replace only touched keys → save). Needs #4 to
distinguish "stale" from "just not touched."

### Status

**Open.** Targeted by **Stack J**; depends on **Stack F**.

---

## 16. No CLI runner for simulation suites (despite the engine tracking them)

**Discovered:** customer-fork log (Mudflap #8, 2026-04-28)

### Problem

The engine fully tracks simulation suites in state (and `AGENTS.md`
describes `simulations/suites/` as a first-class resource type), but
there is no `npm run` command to actually *execute* a suite.
`npm run eval` runs the legacy `/evals` endpoint, not the unified simulation
runner (`POST /eval/simulation/run`). When you actually want to run a
suite, the engine leaves you at the API's doorstep.

### Current behavior (Verified)

`package.json` has `eval` (legacy) but no `sim`. `src/push.ts`'s
`applySimulationSuite()` (line 491) creates and updates suites, but the
engine has no run path.
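If a `sim` script were added, its first job would be turning flags into a run request. A dependency-free sketch, assuming a hypothetical flag set of `--suite` (a suite name), a comma-separated `--simulations`, `--target`, and `--watch`; nothing here exists in the engine today, and resolving names to UUIDs and calling the API are left out.

```typescript
// Hypothetical options for the proposed (not yet existing) `npm run sim`.
interface SimOptions {
  suite?: string;
  simulations: string[];
  target?: string;
  watch: boolean;
}

// Parse the tokens that follow `npm run sim --`. Exactly one of --suite or
// --simulations must be given; --target and --watch are optional.
function parseSimArgs(argv: string[]): SimOptions {
  const opts: SimOptions = { simulations: [], watch: false };
  let i = 0;
  // Consume the token after `flag` as its value.
  const valueFor = (flag: string): string => {
    const value = argv[i + 1];
    if (value === undefined || value.startsWith("--")) {
      throw new Error(`${flag} expects a value`);
    }
    i += 1;
    return value;
  };
  for (; i < argv.length; i++) {
    const flag = argv[i];
    switch (flag) {
      case "--suite":
        opts.suite = valueFor(flag);
        break;
      case "--simulations":
        // Comma-separated list of local simulation names.
        opts.simulations = valueFor(flag).split(",").filter((s) => s.length > 0);
        break;
      case "--target":
        opts.target = valueFor(flag);
        break;
      case "--watch":
        opts.watch = true;
        break;
      default:
        throw new Error(`unknown flag: ${flag}`);
    }
  }
  if ((opts.suite === undefined) === (opts.simulations.length === 0)) {
    throw new Error("pass exactly one of --suite or --simulations");
  }
  return opts;
}
```

Requiring exactly one of `--suite` or `--simulations` keeps a run's scope unambiguous; whether `--target` should be mandatory is left open.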

### Risk

Asymmetric tooling: engineers will go straight to the dashboard UI to
trigger runs (losing reproducibility) or write per-customer shell
wrappers. The naming overlap (`npm run eval` vs `simulations/`)
actively misleads.

### Possible fix

Add `npm run sim`:

```sh
npm run sim -- --suite --target
npm run sim -- --simulations , --target
npm run sim -- --suite --watch
```

Reuse `src/eval.ts`'s local-name → UUID resolver and
`src/api.ts:vapiRequest`. Print a pass/fail summary on completion.

Renaming `npm run eval` to disambiguate is a separate,
backwards-incompatible follow-up.

### Status

**Open.** Targeted by **Stack E**.

---

## 17. State file key-order churn produces noisy diffs

**Discovered:** customer-fork log (Mudflap #1, 2026-04-28)

### Problem

After pushes, the diff of `.vapi-state..json` includes reorderings
of the section objects: same keys, same UUIDs, just emitted in a
different insertion order. About half the diff is pure reordering.

### Current behavior (Verified)

`src/state.ts:55-64` (`saveState()`) calls `JSON.stringify(state, null, 2)`
with no key sorter. `JSON.stringify` preserves insertion order, so
maps merged from multiple sources (push, pull, bootstrap) end up with
unpredictable key orders.

### Risk

Noisy state-file diffs bury the meaningful entries (new UUIDs,
removed entries) under a wall of reorderings. Reviewers rubber-stamp
state-file changes because they're hard to read.

### Possible fix

Add a `sortedKeysReplacer` to `JSON.stringify` so object keys serialize
alphabetically. Preserve the atomic write pattern in
`src/state.ts:60-62`.

**One-time noise:** the first push after this lands produces a
state-file diff of pure reordering for every customer. Worth calling
out in the PR description.

### Status

**Open.** Targeted by **Stack B**.

---

## 18. Structured-output evaluation `name` capped at 40 chars with no client-side validation

**Discovered:** customer-fork log (Mudflap #9, 2026-04-29)

### Problem

Structured-output `evaluations[].structuredOutput.name` is capped at 40
characters server-side. The engine accepts a 51-char name, posts it,
and only fails when the API returns 400 mid-push.

### Current behavior (Verified)

The push failed partway through a multi-resource apply. By the time the
scenario errored, both assistants and one new personality had already been
applied, AND the state file had been written with the new personality
UUID. The push left the dashboard in an intermediate state.

### Risk

Failure happens partway through a multi-resource push, and recovery is
non-obvious. Engineers naturally write self-describing names that
exceed the cap.

### Possible fix

A client-side validator (`npm run validate`) that walks every assistant
`name` and every `evaluations[].structuredOutput.name` in scenarios and
fails fast (printing the offending field path) before any API call.
The same validator can enforce caps on other fields with known length
limits (e.g. assistant `name`, also capped at 40).

### Status

**Open.** Targeted by **Stack D**.

---

## 19. No engine warning when `maxTokens` is too low for a tool-using assistant

**Discovered:** customer-fork log (Mudflap #10, 2026-04-29)

### Problem

Any engineer can write `maxTokens: 1` (or 10, or 25) into an assistant
`.md`. The engine syncs it to the dashboard with no warning. The first
symptom on a real call is a malformed tool-call payload, which is opaque to
debug. The risk window is widest when an engineer is *trying to suppress
speech* on a silent classifier.

### Current behavior

**Verified in engine:** the push pipeline passes `maxTokens` through
unchanged.
**Needs platform validation:** the exact behavior at a low `maxTokens`
boundary is provider-specific; the customer log cites OpenAI streaming
behavior at `maxTokens: 1` that returns `finish_reason: 'length'`
mid-JSON for tool calls.

### Possible fix

At validate / push time, for any assistant with non-empty
`model.toolIds`, compute a soft floor:
`floor ≈ 25 + sum(len(JSON.stringify(tool.function.parameters)) for tool in tools)`.
If `model.maxTokens < floor`, warn (non-blocking).

### Status

**Open.** Targeted by **Stack D**.

---

## 20. Prompt vocabulary leaks into TTS

**Discovered:** customer-fork log (Mudflap #11, 2026-04-29)

### Problem

A prompt section heading or example word that names a tool argument can
become a TTS contaminant. Customer log: a `# Reasoning Channel Discipline`
section with `Reason.` examples caused the model to open turns with
`"Reason."` as a TTS preface. The squad regressed 7/18 → 4/18.

### Current behavior (Verified)

The engine treats prompts as opaque text. There is no surface for
detecting this class of regression at push time.

### Risk

Prompt-authoring footguns ship clean through the engine. They are
discovered days later via sim regressions, and attributing them to the
prompt's literal word choice is non-obvious.

### Possible fix

Heuristic only: a real fix requires linguistic modeling, which is out of
scope for an engine intervention.

1. If a prompt body contains a structured concept word (`Reason`,
   `Reasoning`, `Channel`, `Discipline`, `Argument`, etc., capitalized)
   AND the assistant has a tool with a parameter of the same name, warn
   at validate time.
2. A templating convention `<>` is overkill but worth thinking
   about.

The full write-up lives in `docs/learnings/assistants.md` as a known
regression shape.

### Status

**Open.** Targeted by **Stack D** as a heuristic; the entry stays open to
flag that the heuristic is only partial.
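The first heuristic in #20 could look roughly like the following. It is a sketch under stated assumptions: tools are reduced to a hypothetical `ToolParams` shape listing parameter names, and instead of a curated list of concept words it checks every capitalized word, which is noisier than the proposal. `findParameterWordsInPrompt` is an illustrative name.

```typescript
// Hypothetical, minimal tool description: only parameter names matter here.
interface ToolParams {
  toolName: string;
  parameterNames: string[];
}

// Report capitalized words in a prompt that match (case-insensitively) the
// name of a parameter on one of the assistant's tools, e.g. "Reason." in an
// example next to a tool argument called `reason`.
function findParameterWordsInPrompt(prompt: string, tools: ToolParams[]): string[] {
  const owners = new Map<string, string>(); // lowercased parameter -> tool name
  for (const tool of tools) {
    for (const param of tool.parameterNames) owners.set(param.toLowerCase(), tool.toolName);
  }
  const warnings = new Set<string>();
  // Only capitalized words are checked; lowercase prose usage is ignored.
  const words: string[] = prompt.match(/\b[A-Z][A-Za-z]*\b/g) ?? [];
  for (const word of words) {
    const owner = owners.get(word.toLowerCase());
    if (owner !== undefined) {
      warnings.add(`prompt word "${word}" matches a parameter of tool ${owner}`);
    }
  }
  return Array.from(warnings);
}
```

Matching is case-insensitive, so `Reason` in a prompt is flagged against a `reason` parameter, while lowercase prose such as "give a short reason" is not.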

---

## Out of scope (intentionally not improvements)

- **State file is identity-only and not git-ignored.** It's intentionally
  committed so all collaborators share the same local → UUID mapping.
  The proposal in #4 is *additive*: keep identity mappings and add
  content hashes.
- **`push -- ` does not require an interactive confirmation prompt.**
  That's a UX choice: adding a prompt would break automation. The right
  place to add friction is `--dry-run` (#5).
- **No environment-cross-pollination guard.** `push -- ` only
  touches `resources//`; this is correct and documented in
  `AGENTS.md`. Don't conflate that with drift detection.
- **Renaming `npm run eval` to disambiguate from `npm run sim`.**
  Backwards-incompatible script change; raise as a separate issue.