diff --git a/.gitignore b/.gitignore
index b04f530..f38c759 100644
--- a/.gitignore
+++ b/.gitignore
@@ -20,8 +20,12 @@ Thumbs.db
 
 tmp/
 
+# Local snapshots written by `npm run push` for `npm run rollback` recovery.
+# Operator-local; not shared.
+.vapi-state.*.snapshots/
+
 # Local agent state
 .claude/
-
-# Local-only audit notes (not part of the upstream repo)
-requested improvements.md
+.agent/
+.agent/handoffs/
+.claude/handoffs/
diff --git a/AGENTS.md b/AGENTS.md
index 37db125..5ee3bad 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -31,6 +31,7 @@ This project manages **Vapi voice agent configurations** as code. All resources
 | Building outbound calling agents | `docs/learnings/outbound-agents.md` |
 | Voicemail detection / VM vs human classification | `docs/learnings/voicemail-detection.md` |
 | Enforcing call time limits / graceful call ending | `docs/learnings/call-duration.md` |
+| Voice provider field cheat-sheet (Cartesia vs 11labs vs OpenAI etc.) | `docs/learnings/voice-providers.md` |
 
 ---
 
@@ -50,6 +51,7 @@ This project manages **Vapi voice agent configurations** as code. All resources
 | Pull latest from Vapi               | `npm run pull -- <org>`, `--force`, or `--bootstrap`                              |
 | Pull one known remote resource      | `npm run pull -- <org> --type assistants --id <uuid>`                             |
 | Push only one file                  | `npm run push -- <org> resources/<org>/assistants/my-agent.md`                    |
+| Push multiple specific files        | `npm run push -- <org> <path1> <path2>` (one state-file rewrite at the end)       |
 | Test a call                         | `npm run call -- <org> -a <assistant-name>`                                       |
 
 ---
@@ -744,6 +746,7 @@ npm run pull -- <org> --type squads --id <uuid>    # Pull one known remote resou
 npm run push -- <org>                              # Push all local changes to Vapi
 npm run push -- <org> assistants                   # Push only assistants
 npm run push -- <org> resources/<org>/assistants/my-agent.md  # Push single file
+npm run push -- <org> <path1> <path2>              # Push multiple specific files (one state write)
 npm run apply -- <org>                             # Pull then push (full sync)
 
 # Testing
diff --git a/CLAUDE.md b/CLAUDE.md
index 83ff0a5..b9f1a7c 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -26,6 +26,26 @@ When both files exist, follow both. If guidance overlaps, treat `AGENTS.md` as t
    - WebSocket transport → `docs/learnings/websocket.md`
    - Call time limits / graceful ending → `docs/learnings/call-duration.md`
 
+## Improvements log
+
+This repo maintains an upstream-only running log at `improvements.md` (repo
+root). It tracks engine friction, footguns, and improvement ideas surfaced
+during real customer work — both before and after fixes land.
+
+**When you (Claude or human) hit something that makes you go "this should be
+better," append or update an entry in `improvements.md` in the same change.**
+The format is **Problem → Current behavior → Risk → Current mitigation →
+Possible fix → Status**, ordered by severity / blast radius. Cite source
+file paths with line numbers so future readers can verify your claims.
+
+When a fix lands, mark the entry `[RESOLVED YYYY-MM-DD] (#<PR-number>)` at
+the top — don't delete it. The history is the point.
+
+Customer-fork logs (`gitops-mudflap/improvements.md`,
+`gitops-amazon3p/improvements.md`) feed upstream: when an entry there is
+generic enough to apply across customers, surface it here in the same
+revision.
+
 ## Test-Call CLI Notes
 
 When debugging a customer issue with `npm run call -- <org> -s <squad>`:
diff --git a/docs/learnings/README.md b/docs/learnings/README.md
index 480bf04..483c14e 100644
--- a/docs/learnings/README.md
+++ b/docs/learnings/README.md
@@ -26,7 +26,7 @@ Each file targets a specific topic so you can load only the context you need.
 | Bulk-dialing from a CSV (Outbound Call Campaigns) | [outbound-campaigns.md](outbound-campaigns.md) |
 | Voicemail detection / VM vs human classification | [voicemail-detection.md](voicemail-detection.md) |
 | Enforcing call time limits / graceful call ending | [call-duration.md](call-duration.md) |
-| Authoring YAML resource files (scalar coercion, frontmatter conventions) | [yaml-conventions.md](yaml-conventions.md) |
+| Voice provider field cheat-sheet (Cartesia vs 11labs vs others) | [voice-providers.md](voice-providers.md) |
 
 ---
 
@@ -44,7 +44,7 @@ Gotchas and silent defaults for each resource type:
 | [structured-outputs.md](structured-outputs.md) | Schema type gotchas, assistant_ids, default models, target modes, KPI patterns |
 | [simulations.md](simulations.md) | Personalities, evaluation comparators, chat-mode gotcha, missing references, full `/eval/simulation/*` API reference |
 | [webhooks.md](webhooks.md) | Default server messages, timeouts, unreachable servers, credential resolution, payload shape |
-| [yaml-conventions.md](yaml-conventions.md) | YAML 1.1 boolean coercion (`off`/`yes`/`no`), whitespace-truthy gotchas, discriminated-union sentinels, deprecated-field footguns, multi-line block scalars, anchors/aliases, frontmatter fence rules |
+| [voice-providers.md](voice-providers.md) | Per-provider voice block layout (Cartesia vs 11labs vs OpenAI/Azure/Rime/LMNT/Minimax/Neuphonic/SmallestAI) — saves 400s at push time |
 
 ### Troubleshooting Runbooks
 
diff --git a/docs/learnings/voice-providers.md b/docs/learnings/voice-providers.md
new file mode 100644
index 0000000..9598e53
--- /dev/null
+++ b/docs/learnings/voice-providers.md
@@ -0,0 +1,97 @@
+# Voice Providers — Field Cheat-Sheet
+
+The `voice` block on an assistant or `membersOverrides.voice` on a squad is **provider-specific**. The same conceptual field (e.g. "speed") lives at different paths depending on the provider. The Vapi platform rejects misplaced fields with a generic `property X should not exist` 400 that does not point to the correct path. This page is the lookup table.
+
+> **When a 400 says "property X should not exist":** check this page for the provider's field layout before re-pushing. The engine has no schema awareness and will accept whatever you write, then surface the error only after the push reaches the API.
+
+---
+
+## Quick lookup
+
+| Field | 11labs | Cartesia (sonic-3) | OpenAI / Azure / Rime / LMNT / Minimax / Neuphonic / SmallestAI |
+|-------|--------|---------------------|------------------------------------------------------------------|
+| Speech rate | `voice.speed` (0.7–1.2) | `voice.generationConfig.speed` (0.6–1.5) | `voice.speed` |
+| Stability / consistency | `voice.stability` (0.0–1.0) | — (not exposed) | — |
+| Voice similarity | `voice.similarityBoost` (0.0–1.0) | — | — |
+| SSML parsing | `voice.enableSsmlParsing: true` | (parsed natively, no flag) | varies — see provider docs |
+| Pronunciation dictionary | `voice.pronunciationDictionaryLocators[]` (array of `{pronunciationDictionaryId, versionId}`) | `voice.pronunciationDictId` (single string id; not in Vapi docs but accepted as a Cartesia passthrough) | — |
+| Volume control | — | `voice.generationConfig.volume` (0.5–2.0) | — |
+| Emotion / accent (experimental) | — | `voice.experimentalControls.emotion`, `voice.experimentalControls.speed` (-1 to 1, older API) | — |
+
+---
+
+## 11labs
+
+```yaml
+voice:
+  provider: 11labs
+  voiceId: <uuid-or-name>
+  model: eleven_turbo_v2          # or eleven_flash_v2_5
+  speed: 1.05                      # 0.7–1.2
+  stability: 0.6                   # 0.0–1.0; higher = less expressive variation
+  similarityBoost: 0.75            # 0.0–1.0; higher = closer to source voice
+  enableSsmlParsing: true          # required for `<break>`, `<flush/>`, etc.
+  pronunciationDictionaryLocators: # ElevenLabs PLS dictionaries; multiple allowed
+    - pronunciationDictionaryId: rjshI10OgN6KxqtJBqO4
+      versionId: xJl0ImZzi3cYp61T0UQG
+```
+
+Common pitfalls:
+- `voice.generationConfig.*` — **does not exist** for 11labs. That's a Cartesia path. Push will 400.
+- Forgetting `enableSsmlParsing: true` — SSML tags will be spoken literally.
+- `voice.pronunciationDictId` (single string) — that's the Cartesia shape. 11labs uses `voice.pronunciationDictionaryLocators[]` (array of `{pronunciationDictionaryId, versionId}`). Reference: <https://docs.vapi.ai/assistants/pronunciation-dictionaries>.
+
+**Pronunciation dictionary warning (11labs):** dashboard edits that change the voice can drop `pronunciationDictionaryLocators` entries silently — the same drift class as Cartesia, just with the array shape. Treat the locators array as part of the voice's identity during edits.
+
+---
+
+## Cartesia (sonic-3)
+
+```yaml
+voice:
+  provider: cartesia
+  model: sonic-3
+  voiceId: <uuid>
+  pronunciationDictId: pdict_<id>  # optional but sticky — see warning below
+  generationConfig:
+    speed: 1.1                     # 0.6–1.5
+    volume: 1.0                    # 0.5–2.0
+  experimentalControls:
+    speed: 0.0                     # -1 to 1 (older API path)
+    emotion: ["positivity:high"]
+```
+
+**Forbidden at top level for Cartesia (will 400):**
+- `voice.speed` — use `voice.generationConfig.speed` instead.
+- `voice.enableSsmlParsing` — Cartesia parses SSML (`<break time='0.4s'/>`, `<speed ratio='0.9'/>`) natively from the text stream; no opt-in flag exists.
+- `voice.stability`, `voice.similarityBoost` — those are 11labs fields.
+
+**Pronunciation dictionary warning (Cartesia):** changing the `voiceId` in the Vapi dashboard's voice picker silently drops `pronunciationDictId` from the resource. If you swap the Cartesia voice via the dashboard, re-attach the dictionary on the next pull or it will be gone. Treat `(voiceId, pronunciationDictId)` as one atomic unit during edits. Note: `voice.pronunciationDictId` for Cartesia is observed in real customer payloads but is not in the Vapi docs (Vapi only documents the 11labs `pronunciationDictionaryLocators[]` shape — see the 11labs section above). Vapi appears to pass the field through to Cartesia's native API; behavior may change without notice.
+
+---
+
+## OpenAI / Azure / Rime / LMNT / Minimax / Neuphonic / SmallestAI
+
+```yaml
+voice:
+  provider: openai           # or azure, rime, lmnt, minimax, neuphonic, smallestai
+  voiceId: <provider-voice-id>
+  model: <provider-model>    # e.g. tts-1-hd for openai
+  speed: 1.0                 # top-level for these providers
+```
+
+These providers expose `speed` at the top of the `voice` block. Refer to the [Vapi voice provider docs](https://docs.vapi.ai/providers/voice) for additional provider-specific fields (instructions, language hints, etc.).
+
+---
+
+## Switching providers
+
+When migrating an assistant or squad member from Cartesia to 11labs (or vice versa), the field layout flips. If you carry over `generationConfig` from a Cartesia config to an 11labs voice, the next push will 400. Always rewrite the voice block from the target provider's template; do not patch in place.
+
+If a customer changes the provider on the dashboard and your local YAML still has the old nesting, `pull` will overwrite it cleanly — but a subsequent `push` from a stale branch will 400. Pull first, then edit.
+
+---
+
+## Adding a new provider
+
+If you find yourself reaching for a provider not in the table above, append a row here in the same PR. The cheat-sheet only stays useful if it grows with the platform.
diff --git a/improvements.md b/improvements.md
new file mode 100644
index 0000000..ba9c30a
--- /dev/null
+++ b/improvements.md
@@ -0,0 +1,977 @@
+# Vapi GitOps — Engine Improvements Log
+
+> **MAINTENANCE DIRECTIVE FOR CONTRIBUTORS (humans and AI agents):**
+> This file is the running log of friction, footguns, and improvement ideas
+> for the gitops engine in this template repo. It is the upstream source of
+> truth — every customer fork inherits it on clone, and every customer log
+> entry that surfaces an upstream-relevant gap eventually lands here.
+>
+> **When you discover ANY of the following, add an entry to this file in the
+> same change:**
+> - A push/pull/apply behavior that surprises a user or causes data loss
+> - A footgun in `src/*.ts` that isn't documented in `AGENTS.md` or `docs/learnings/`
+> - A missing safety rail (no drift detection, no dry-run, no rollback, etc.)
+> - A coordination problem (concurrent edits, dashboard-vs-local divergence)
+> - A workflow-level recommendation that emerged from real customer work
+>
+> **Format:** each entry uses the **Problem → Current behavior → Risk →
+> Current mitigation → Possible fix → Status** structure (see "Entry
+> template" below). Date the entry. Link to relevant source files / PRs
+> with line references so future readers can verify your claims.
+>
+> **Two evidence rules keep this file trustworthy:**
+> - **Verified current behavior** — confirmed in this repo (source, scripts,
+>   or docs) and cited directly.
+> - **Needs platform validation** — engine-side behavior verified, but the
+>   corresponding Vapi platform capability is still unknown. Label any
+>   platform-side claim that hasn't been confirmed.
+>
+> **When a fix lands**, mark the entry `[RESOLVED YYYY-MM-DD] (#<PR-number>)`
+> at the top of the entry — don't delete it. The history is the point.
+
+---
+
+## How to read this file
+
+Sections are ordered by **severity / blast radius**, not by date discovered.
+Within each entry:
+
+- **Problem** — one-sentence statement of what's wrong.
+- **Current behavior** — what the engine actually does today, with code
+  references so the next person can verify.
+- **Risk** — what can go wrong in real workflows.
+- **Current mitigation** — what users should do today to avoid the problem.
+- **Possible fix** — sketch of an engineering change.
+- **Status** — open / partially mitigated / resolved.
+
+## Triage at a glance
+
+**Statuses below reflect the state at the tip of each PR. Subsequent PRs in
+this stack flip rows from `Open` to `RESOLVED` as they land — the cell tells
+you which stack PR closes the row.**
+
+| #   | Title                                                    | Why it matters                                     | Depends on | Status                            |
+| --- | -------------------------------------------------------- | -------------------------------------------------- | ---------- | --------------------------------- |
+| 1   | `push` drift detection                                   | Prevent silent overwrites of dashboard edits       | #4         | Open (Stack G planned)            |
+| 2   | `apply` same-file conflict                               | `apply` drops concurrent same-file dashboard edits | #4         | Open (Stack G planned)            |
+| 3   | Rollback                                                 | Current undo can clobber newer live changes        | #4, #5     | Open (Stack H planned)            |
+| 4   | State schema content hashes                              | Architectural unlock for #1-#3, #6, #7, #10, #15   | None       | Open (Stack F planned)            |
+| 5   | `push --dry-run`                                         | Cheapest operator-safety win                       | None       | Open (Stack C planned)            |
+| 6   | API-level optimistic concurrency                         | Server-side conflict rejection                     | Platform   | Deferred (Stack I, gated)         |
+| 7   | Voice edits drop pronunciation-dictionary attachments    | Silent regression on Cartesia + 11labs voice edits | #4         | Open (Stack G planned)            |
+| 8   | Dashboard prompt edits can in-place duplicate the prompt | Two stacked prompt versions = stitched output      | None       | Open (Stack D planned)            |
+| 9   | Provider-specific voice schema mismatch (push 400)       | `voice.speed` vs `voice.generationConfig.speed`    | None       | Partial — doc cheat-sheet (Stack A) |
+| 10  | Targeted assistant push mints duplicate tools            | Re-pushing assistant duplicates `end-call-*` tools | #4         | Partial                           |
+| 11  | Bidirectional SO ↔ assistant lockstep has no validation  | One-sided edits silently inconsistent              | None       | Open (Stack D planned)            |
+| 12  | State file accumulates UUIDs without source files        | Silent gitops drift                                | None       | Partial                           |
+| 13  | `.agent/` and `.claude/handoffs/` not gitignored         | `git add -A` sweeps PII handoff scratch            | None       | RESOLVED 2026-04-30 (Stack A)     |
+| 14  | Multi-file push undocumented                             | Discoverability                                    | None       | RESOLVED 2026-04-30 (Stack A)     |
+| 15  | Scoped push rewrites entire state file                   | Pre-existing drift sweeps into focused commits     | #4         | Open (Stack J planned)            |
+| 16  | No CLI runner for simulation suites                      | Engine pushes them, can't run them                 | None       | Open (Stack E planned)            |
+| 17  | State file key-order churn produces noisy diffs          | Reorderings hide real changes                      | None       | Open (Stack B planned)            |
+| 18  | Structured-output `name` capped at 40 chars (no warning) | Push fails partway after partial application       | None       | Open (Stack D planned)            |
+| 19  | No `maxTokens` floor warning for tool-using assistants   | `maxTokens: 1` bricks the assistant silently       | None       | Open (Stack D planned)            |
+| 20  | Prompt vocabulary leaks into TTS                         | `Reason.` becomes verbal contaminant               | None       | Open (Stack D heuristic planned)  |
+
+---
+
+## 1. `push` has no drift detection — silently overwrites concurrent dashboard edits
+
+**Discovered:** customer-fork log (Amazon3p `improvements.md` #1, 2026-04-17)
+
+### Problem
+
+`npm run push -- <env>` blindly `PATCH`es the local payload onto the
+platform without checking whether the platform's current state matches what
+we last pulled. If anyone else (a teammate, a customer, an automation)
+edits the same resource on the dashboard between our last pull and our
+push, their change is silently overwritten with no warning.
+
+### Current behavior (Verified)
+
+The push code path is a straight `PATCH /resource/{uuid}` with the full
+local payload — no `If-Match` header, no version field comparison, no
+fetch-then-diff. See `src/push.ts:73-79` and `src/api.ts:65-71` (no
+conditional-write headers anywhere in the request path). The state file
+(`.vapi-state.<env>.json`) only stores identity mappings (`name → UUID`)
+— no content hashes, no version numbers, no timestamps.
+
+### Risk
+
+A teammate dashboard-edits a prompt during a live test; you push your
+unrelated branch and their edit disappears. A customer success rep updates
+business hours via the dashboard; the next gitops push silently reverts
+it. A `git revert + push` rollback inherits the same problem — it
+overwrites whatever's currently live, not just the change being reverted.
+
+### Current mitigation
+
+Use `npm run apply -- <env>` (`pull → push`) instead of bare `push`. The
+`pull` step is git-aware and preserves locally-modified files while
+pulling fresh state for everything else (see #2 for the residual same-file
+conflict case). Bare `push` should be reserved for environments where you
+know nobody else touches the dashboard.
+
+### Possible fix
+
+1. **Content-hash drift detection.** Store sha256 of the platform's
+   last-known content per resource in `.vapi-state.<env>.json`. On push,
+   GET the current platform version, hash it, refuse to push if the hash
+   doesn't match — surface the diff and require an explicit
+   `--overwrite` flag. Depends on #4.
+2. **Server-side ETag / If-Match.** See #6.
+3. **Pre-push diff (poor man's version of option 1).** Run a `pull --dry-run`
+   before push and show the user what's about to change. This is a partial
+   mitigation only.
+
+### Status
+
+**Open.** Targeted by **Stack G** (drift detection); depends on **Stack F**
+(state schema). Mitigated by `apply -- <env>` for the non-same-file case.
+
+---
+
+## 2. `apply` (pull → push) silently drops dashboard edits to files modified locally
+
+**Discovered:** customer-fork log (Amazon3p #2, 2026-04-17)
+
+### Problem
+
+`pull` uses `git status --porcelain` to identify locally-modified files
+and **preserves the local version**, dropping the platform's version of
+those files entirely. There's no warning that the platform's version
+differs from what your local file was based on.
+
+### Current behavior (Verified)
+
+The relevant code is `src/pull.ts:117-135` (`getLocallyChangedFiles()`) and
+`src/pull.ts:705-735` (the preserve-local-on-pull branch). The "preserved"
+message in `src/pull.ts:887-896` tells you the count but not whether the
+platform's version of that same file diverged from your branch point. There
+is no 3-way merge; local wins by default.
+
+### Risk
+
+You edit `assistants/foo.md` locally. A teammate edits the same
+`assistants/foo.md` on the dashboard. You run `apply`. Pull preserves your
+local version with no warning that the dashboard had a different version,
+then push overwrites the dashboard with yours. Their change is lost.
+
+### Current mitigation
+
+Coordinate on shared resources. Always commit before pushing so git
+history at least preserves your version cleanly. After any known
+dashboard-side change, run `pull` first so the conflict surfaces as a
+`git diff` rather than a silent overwrite.
+
+### Possible fix
+
+Same as #1: with content-hash drift detection (#4), `pull` could detect
+the same-file conflict and either refuse to preserve (requiring
+`--keep-local <file>` resolution), or write the platform's version to a
+sibling `.platform.yml` for manual 3-way merge.
+
+### Status
+
+**Open.** Targeted by **Stack G**.
+
+---
+
+## 3. No rollback command — `git revert + push` inherits all of #1's problems
+
+**Discovered:** customer-fork log (Amazon3p #3, 2026-04-17)
+
+### Problem
+
+The README documents the rollback strategy as `git revert + push`. That
+restores local content to a previous git state, but it does **not**
+restore a known platform snapshot. The subsequent push still has all the
+drift problems above, so a "rollback" can clobber unrelated dashboard
+edits made since the bad deploy. There is also no engine-level snapshot
+of what was sent.
+
+### Current behavior (Verified)
+
+`package.json` has no `rollback` script. The README still documents
+rollback as a git-level revert followed by a push. The platform-side
+safety net is the dashboard's Version History feature (manual,
+per-resource, dashboard-driven).
+
+### Risk
+
+Rollback is a manual two-step (`git revert <sha>`, then
+`npm run push -- <env>`) with the same overwrite risk as any other push. If the
+bad push was never committed locally, there's no clean rollback target in git.
+
+### Current mitigation
+
+Always `git commit` before `push -- <env>`. For mission-critical
+resources, note UUIDs so dashboard Version History is reachable.
+
+### Possible fix
+
+**Snapshot-on-push.** Before each PATCH, write the *outgoing* payload AND
+the *current platform payload* to
+`.vapi-state.<env>.snapshots/<timestamp>/<resource-type>/<id>.json`. Add
+`npm run rollback -- <env> --to <timestamp>`.
+
+### Status
+
+**Open.** Targeted by **Stack H**; depends on **Stack F**.
+
+---
+
+## 4. State file is identity-only — no content snapshots
+
+**Discovered:** customer-fork log (Amazon3p #4, 2026-04-17)
+
+### Problem
+
+`.vapi-state.<env>.json` stores `name → UUID` mappings only. It has no
+record of the content that was last pulled or pushed for each resource.
+This is the architectural reason drift detection isn't possible — the
+engine has no "last known platform state" to compare against.
+
+### Current behavior (Verified)
+
+`src/types.ts:5-16` types every section as `Record<string, string>`.
+`src/state.ts:10-22` (`createEmptyState()`) and the load/save flow at
+`src/state.ts:25-64` carry only identity mappings.
+
+### Risk
+
+Upstream cause of #1, #2, #3, #6, #7, #10, #15. Fixing this enables the
+proposed mitigations above.
+
+### Possible fix
+
+Extend the state schema to include content hashes (and optionally
+last-pull timestamps and platform-reported version IDs):
+
+```ts
+interface ResourceState {
+  uuid: string;
+  lastPulledHash?: string;     // sha256 of normalized platform payload
+  lastPulledAt?: string;        // ISO timestamp
+  lastPushedHash?: string;      // sha256 of last pushed payload
+  platformVersionId?: string;   // if Vapi exposes one
+}
+```
+
+The existing `loadState()` merge with `createEmptyState()` (`src/state.ts:48-52`)
+makes the additive shape backwards-compatible — legacy string-only
+entries can be wrapped at load time.
+
+### Status
+
+**Open.** Targeted by **Stack F** — architectural prerequisite for
+G, H, I, J.
+
+---
+
+## 5. No `push --dry-run` / pre-push diff
+
+**Discovered:** customer-fork log (Mudflap #6 + Amazon3p #5, 2026-04-17/28)
+
+### Problem
+
+There's no way to preview what `push` will change on the platform before
+running it. Vapi's dashboard has "Version Preview" for the same purpose;
+the engine doesn't have a local equivalent.
+
+### Current behavior (Verified)
+
+`push.ts` has a dry-run concept only for **deletions**: with `FORCE_DELETE`
+off (the default), orphaned resources are listed but not deleted (see
+`src/push.ts:842`). There is no dry-run for updates or creates.
+
+### Risk
+
+Users cannot validate "is this push doing what I think it's doing"
+before it lands on prod. In a multi-customer repo with prod state, an
+accidental wide-scope push (e.g. forgetting a file path arg) hits live
+assistants. Compounds #1.
+
+### Possible fix
+
+Add `--dry-run` to `src/config.ts`'s `parseFlags()`. Gate every
+`vapiRequest("PATCH"|"POST"|"DELETE", ...)` call site behind
+`if (!DRY_RUN)` and print `[dry-run] would PATCH /assistant/<uuid>` instead.
+Skip the state-file write entirely. End-of-run summary:
+`would create N, would update M, would delete K`.
+
+### Status
+
+**Open.** Targeted by **Stack C** — cheapest individual fix; partially
+mitigates #1, #3, #6.
+
+---
+
+## 6. No optimistic concurrency at the API protocol level
+
+**Discovered:** customer-fork log (Amazon3p #6, 2026-04-17)
+
+### Problem
+
+Even if the engine were perfectly drift-aware locally, true race
+prevention still needs help from the write API. If two clients race, the
+cleanest outcome is for the server to reject stale writes rather than
+letting the last writer win silently.
+
+### Current behavior
+
+**Verified in engine:** mutating requests in `src/api.ts:65-71` send only
+auth and content-type headers. No `If-Match` / `If-Unmodified-Since`
+anywhere.
+
+**Needs platform validation:** we have not yet confirmed whether Vapi
+write endpoints support ETags, `If-Match`, `If-Unmodified-Since`, or any
+equivalent optimistic-concurrency mechanism. Until that is verified,
+"the engine does not send conditional headers" and "the API does/does
+not support them" are separate statements.
+
+### Risk
+
+Two simultaneous gitops pipelines (e.g. a dev pushing and a CI job
+deploying) could race on the same resource with no conflict detection at
+any layer.
+
+### Current mitigation
+
+None at the API level. The `apply` flow + git coordination is the only
+defense.
+
+### Possible fix
+
+1. Confirm whether the API supports `If-Match` / `If-Unmodified-Since`
+   on `PATCH /assistant/{id}`, `PATCH /squad/{id}`, etc.
+2. If yes: extend `vapiRequest` to accept an optional ETag and have the
+   apply functions in `src/push.ts` send the last-known ETag (stored in
+   #4's extended state file).
+3. If no: file a feature request with Vapi.
+
+### Status
+
+**Deferred pending platform validation (2026-04-30).** Stack I in the
+sequenced plan is intentionally not landed in this branch. Implementing
+`If-Match` / `ETag` on the engine side without confirming the platform
+honors the headers would create dead code that gives a false sense of
+safety: pushes would still succeed under races, and the conditional-header
+guard would do nothing. Owner: file a feature-request ticket with the Vapi
+platform team to confirm support, then ship Stack I behind a flag.
+
+---
+
+## 7. Voice edits drop pronunciation-dictionary attachments (Cartesia + 11labs)
+
+**Discovered:** customer-fork log (Amazon3p #7, 2026-04-19)
+
+### Problem
+
+When a voice configuration changes in the Vapi dashboard, the
+pronunciation-dictionary attachment can be **silently removed** from
+the resource. Two shapes are affected:
+
+- **Cartesia:** `voice.pronunciationDictId` (single string id) —
+  observed dropping on voice-picker edits in the customer log.
+- **11labs:** `voice.pronunciationDictionaryLocators` (array of
+  `{ pronunciationDictionaryId, versionId }` objects) — the
+  documented Vapi shape; the same drift class applies if a
+  dashboard edit detaches an entry from the array.
+
+The new voice is selected, but the dictionary attachment is dropped
+without warning.
+
+### Current behavior (Verified)
+
+Confirmed for Cartesia by diffing pre/post-customer-edit pulls of the
+same squad's `membersOverrides.voice` block — the `pronunciationDictId`
+line vanishes on voice change. The 11labs shape is documented at
+<https://docs.vapi.ai/assistants/pronunciation-dictionaries> and uses
+an array; there, the equivalent drift is an entry dropping out of the
+array or the array being emptied. Note that Cartesia's single-id form is
+**not** in the Vapi docs but is accepted as a passthrough to Cartesia's native API.
+
+### Risk
+
+Acronym/brand pronunciation regresses wherever the dictionary was the
+only source of truth. Customers compensate by stuffing inline
+pronunciation rules into prompts, which is strictly worse. Drift is
+invisible until you actually listen to the agent.
+
+### Current mitigation
+
+After any known voice change, immediately verify that the dictionary
+attachment is still set:
+
+- Cartesia: `voice.pronunciationDictId` still present.
+- 11labs: `voice.pronunciationDictionaryLocators` still has the
+  expected entries.
+
+Treat the dictionary attachment as part of the voice's identity during
+edits. See `docs/learnings/voice-providers.md`.
+
+### Possible fix
+
+1. **Pull-side warning.** When `pull` materialises a `voice` block that
+   loses a previously-tracked dictionary attachment (either the
+   Cartesia `pronunciationDictId` or shrinkage in the 11labs
+   `pronunciationDictionaryLocators` array), log a warning so the
+   removal isn't invisible in the diff. Doesn't need #4.
+2. **Push-side warning.** When `push` detects that local has a
+   dictionary attachment but platform doesn't, surface a warning
+   before applying. Needs #4 + drift detection.
+3. **Vapi dashboard fix.** File a feature request to preserve
+   dictionary attachments across voice changes (when the new voice
+   supports it), or warn the user explicitly.
+
+### Status
+
+**Open.** Targeted by **Stack G** as a provider-aware drift-detection
+warning covering both shapes.
+
+---
+
+## 8. Dashboard prompt edits can in-place duplicate the existing prompt
+
+**Discovered:** customer-fork log (Amazon3p #8, 2026-04-19)
+
+### Problem
+
+When a user edits a long prompt in the Vapi dashboard, it's easy to paste
+a new version on top of the existing one without first selecting and
+removing the old text. The result: the saved prompt contains BOTH the
+old and new versions stacked, with internally contradictory instructions.
+The agent then follows both sets of rules and produces stitched-together
+/ repeating output.
+
+### Current behavior (Verified)
+
+The dashboard accepts the duplicated prompt without complaint. The
+gitops repo only surfaces the issue on the next pull, where the file
+silently grows 2-5x.
+
+### Risk
+
+Silent prompt corruption. Hard to diagnose from runtime symptoms alone.
+Customers who edit through both gitops and the dashboard concurrently
+are most exposed.
+
+### Current mitigation
+
+After any customer-side prompt edit, run `pull -- <env>` and inspect
+prompt sizes. A sudden 2-5x size jump is almost always a paste-on-top
+duplication or an intentional rewrite that needs review.
+
+### Possible fix
+
+1. **Engine-level lint.** `npm run validate -- <env>` heuristics:
+   - Same opening header (`You are the ...` or any `# H1`) appearing twice
+     in one prompt
+   - Two `CONTINUITY ON ENTRY` blocks
+   - Same line repeated 3+ times consecutively
+   - Tool references in the prompt that aren't in `model.toolIds` or
+     `tools:append`
+2. **Vapi dashboard fix.** Diff/preview view in the dashboard prompt
+   editor that highlights apparent duplicate blocks before save.
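+
+Two of the lint heuristics in item 1 (repeated `# H1` header, same line
+repeated 3+ times consecutively) can be sketched as follows.
+`lintPromptDuplication` is a hypothetical helper, not part of the
+engine, and the thresholds are the ones listed above.
+
+```typescript
+// Hypothetical lint helper; returns human-readable findings for one prompt.
+function lintPromptDuplication(prompt: string): string[] {
+  const findings: string[] = [];
+  const lines = prompt.split("\n").map((line) => line.trim());
+
+  // Heuristic 1: the same `# H1` header appears more than once.
+  const headerCounts: Record<string, number> = {};
+  for (const line of lines) {
+    if (/^# \S/.test(line)) {
+      headerCounts[line] = (headerCounts[line] || 0) + 1;
+    }
+  }
+  for (const header of Object.keys(headerCounts)) {
+    if (headerCounts[header] > 1) {
+      findings.push(`header "${header}" appears ${headerCounts[header]} times`);
+    }
+  }
+
+  // Heuristic 2: the same non-empty line repeated 3+ times in a row.
+  let run = 1;
+  for (let i = 1; i <= lines.length; i++) {
+    if (i < lines.length && lines[i] !== "" && lines[i] === lines[i - 1]) {
+      run++;
+      continue;
+    }
+    if (run >= 3) {
+      findings.push(`line "${lines[i - 1]}" repeated ${run} times in a row`);
+    }
+    run = 1;
+  }
+  return findings;
+}
+```
+
+Findings are warnings only, since a duplicated header can also be
+authored deliberately.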
+
+### Status
+
+**Open.** Targeted by **Stack D** (heuristic lint; engine intervention
+is partial — duplicated prompts can also be authored deliberately).
+
+---
+
+## 9. Provider-specific voice fields nest differently — schema mismatch only surfaces at push time
+
+**Discovered:** customer-fork log (Amazon3p #9, 2026-04-19)
+
+### Problem
+
+Vapi's voice config schema is **provider-specific**. For 11labs,
+`voice.speed` is the correct path. For Cartesia, speed lives at
+`voice.generationConfig.speed`. Same field name, different nesting. The
+gitops engine has no schema awareness — it accepts whatever you write,
+posts to Vapi, and only the API rejection at push time tells you the
+field is in the wrong place.
+
+### Current behavior (Verified)
+
+Observed: `voice.speed` on a Cartesia voice → `400: property speed
+should not exist`. `voice.enableSsmlParsing: true` on Cartesia → same
+400. The error is informative but doesn't say where the field _should_
+exist or whether it exists at all for that provider.
+
+### Risk
+
+Push fails after the change is fully prepped. Easy to misread "rejected"
+as "tool unavailable" rather than "wrong path." Provider switches break
+silently in the inverse direction.
+
+### Current mitigation
+
+After any voice-related edit, push to a non-prod environment first if
+available, OR consult `docs/learnings/voice-providers.md` (added in
+**Stack A**) for the per-provider field layout.
+
+### Possible fix
+
+1. **Engine-level validator.** `npm run validate -- <env>` rejects:
+   - Cartesia: `voice.speed`, `voice.enableSsmlParsing`,
+     `voice.stability`, `voice.similarityBoost` at top level (point at
+     `generationConfig.*` instead).
+   - 11labs: `voice.generationConfig.*` (point at top level).
+2. **Vapi side: clearer error message.** API responds with `property
+   speed should not exist at this path; for cartesia use
+   voice.generationConfig.speed`.
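+
+A minimal sketch of the validator in item 1, assuming the provider
+strings are `cartesia` and `11labs` and that the rule table below
+matches what the API enforces. `validateVoiceFields` and the rule table
+are hypothetical; the field names come from this entry.
+
+```typescript
+// Fields this entry reports as rejected at the top level of a Cartesia voice.
+const CARTESIA_MISPLACED_FIELDS = [
+  "speed",
+  "enableSsmlParsing",
+  "stability",
+  "similarityBoost",
+];
+
+interface Voice {
+  provider: string;
+  [key: string]: unknown;
+}
+
+// Returns one error per field that sits at the wrong nesting level.
+function validateVoiceFields(voice: Voice): string[] {
+  const errors: string[] = [];
+  if (voice.provider === "cartesia") {
+    for (const field of CARTESIA_MISPLACED_FIELDS) {
+      if (field in voice) {
+        errors.push(
+          `voice.${field} is not valid for cartesia; check voice.generationConfig.*`,
+        );
+      }
+    }
+  }
+  if (voice.provider === "11labs" && "generationConfig" in voice) {
+    errors.push(
+      "voice.generationConfig is not valid for 11labs; use top-level fields",
+    );
+  }
+  return errors;
+}
+```
+
+Unknown providers pass through untouched, so the check never blocks a
+provider the table does not cover.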
+
+### Status
+
+**Open.** Targeted by **Stack D** validator + the per-provider
+cheat-sheet in `docs/learnings/voice-providers.md` (Stack A).
+
+---
+
+## 10. Targeted assistant pushes can auto-create duplicate tool dependencies
+
+**Discovered:** customer-fork log (Amazon3p #10, 2026-04-29)
+
+### Problem
+
+Repeated targeted pushes of one assistant can auto-apply local tool
+dependencies and mint new duplicate tool resources instead of reusing
+the already-created dependency. In the observed case, repeatedly
+pushing one assistant file to refresh only its voice config created
+multiple `end-call-*` tools.
+
+### Current behavior (Partially mitigated)
+
+`src/push.ts:697-723` (`ensureToolExists()`) skips when the tool's
+`toolId` is already a UUID, already exists as an exact key in
+`state.tools`, or was auto-applied earlier in the same process. But the
+state can lose the stable local key for a tool across bootstrap /
+name-mismatch refreshes; the resolver then treats the same local
+dependency as missing and creates a new dashboard tool.
+
+### Risk
+
+Dashboard clutter and state churn. The wrong dependency can become live —
+the assistant may point at the newest duplicate while older ones remain
+in state, making cleanup risky.
+
+### Current mitigation
+
+Before re-pushing an assistant with local tool dependencies, inspect
+`.vapi-state.<env>.json` for duplicate aliases and run
+`npm run cleanup -- <org>` as a dry-run.
+
+### Possible fix
+
+1. **Resolve dependencies by stable identity before create.**
+   `ensureToolExists()` should detect when a local tool payload already
+   corresponds to an existing dashboard resource under a renamed /
+   state-only key and re-key state instead of creating.
+2. **Duplicate-name guard for auto-applied dependencies.** Before
+   `applyTool()` creates from dependency resolution, query existing
+   remote tools by name / function signature and warn or reuse if
+   equivalent exists.
+3. **Dry-run output for targeted pushes** (Stack C).
+
+### Status
+
+**Partial.** `ensureToolExists()` blocks the most common path; the
+state-renaming case remains. **Stack C dry-run** surfaces auto-apply
+intent before mutation.
+
+---
+
+## 11. Bidirectional SO ↔ assistant attachment has no validation
+
+**Discovered:** customer-fork log (Mudflap #3, 2026-04-28)
+
+### Problem
+
+A structured output's `assistant_ids:` list and each assistant's
+`structuredOutputIds:` list are independent declarations of the same
+edge. A one-sided edit looks fine locally but produces inconsistent
+dashboard state depending on which side `push` reconciles from. Lockstep
+rules become memory-only conventions, not engine-enforced invariants.
+
+### Current behavior (Verified)
+
+The push pipeline's `updateStructuredOutputAssistantRefs()`
+(`src/push.ts:574-606`) and `updateToolAssistantRefs()` independently
+PATCH each side based on whichever local file was authored — never
+cross-checking that both sides agree.
+
+### Risk
+
+Inconsistent dashboard state. Hard to audit visually because you have to
+grep both files to detect drift.
+
+### Current mitigation
+
+Manual: grep both files when editing one side. Easy to miss.
+
+### Possible fix
+
+`npm run validate -- <env>`:
+- For every SO file's `assistant_ids:`, check the named assistant's
+  `structuredOutputIds:` lists this SO. If not, flag.
+- For every assistant's `structuredOutputIds:`, check the named SO's
+  `assistant_ids:` lists this assistant. If not, flag.
+- Optional `--fix` to auto-mirror.
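+
+The two checks above can be sketched as a single pass over the parsed
+files. The sketch assumes both lists reference the other side by local
+name; `StructuredOutputFile`, `AssistantFile`, and `findOneSidedEdges`
+are hypothetical names.
+
+```typescript
+// Hypothetical in-memory view of the parsed resource files.
+interface StructuredOutputFile {
+  name: string;
+  assistant_ids: string[];
+}
+interface AssistantFile {
+  name: string;
+  structuredOutputIds: string[];
+}
+
+// Reports every SO <-> assistant edge that is declared on one side only.
+function findOneSidedEdges(
+  outputs: StructuredOutputFile[],
+  assistants: AssistantFile[],
+): string[] {
+  const problems: string[] = [];
+  for (const so of outputs) {
+    for (const assistantName of so.assistant_ids) {
+      const assistant = assistants.find((a) => a.name === assistantName);
+      if (!assistant || assistant.structuredOutputIds.indexOf(so.name) === -1) {
+        problems.push(`${so.name} lists ${assistantName}, but not the reverse`);
+      }
+    }
+  }
+  for (const assistant of assistants) {
+    for (const soName of assistant.structuredOutputIds) {
+      const so = outputs.find((o) => o.name === soName);
+      if (!so || so.assistant_ids.indexOf(assistant.name) === -1) {
+        problems.push(`${assistant.name} lists ${soName}, but not the reverse`);
+      }
+    }
+  }
+  return problems;
+}
+```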
+
+### Status
+
+**Open.** Targeted by **Stack D**.
+
+---
+
+## 12. State file accumulates UUIDs without source files (silent drift)
+
+**Discovered:** customer-fork log (Mudflap #2, 2026-04-28)
+
+### Problem
+
+The state file claims live resources whose specs aren't in the repo. New
+engineers cloning the repo see state references to phantom resources.
+Lockstep guarantees ("source matches dashboard") quietly break.
+
+### Current behavior (Partial)
+
+`src/push.ts:167-231` (`getInvalidStateMappings()`) detects
+`missing_remote` and `name_mismatch` cases at push time and triggers a
+bootstrap pull, but it doesn't catch "state has UUID, no local source
+file." The pull side handles deleted-local-file as an intentional
+delete tracked in state (`src/pull.ts:776-790`), which is the inverse
+direction — that case is by design.
+
+### Risk
+
+Silent gitops drift. Phantom resources accumulate across sessions.
+
+### Current mitigation
+
+Periodic `npm run cleanup -- <org>` to surface orphans on the dashboard
+side. No equivalent for state-side orphans.
+
+### Possible fix
+
+At start of `push` and end of `pull`, run a reconciliation pass:
+- For every UUID in state, check that a matching source file exists at
+  the expected path. If not, warn:
+  `state has UUID for X but no source file at <path> — either run pull
+  or remove from state`.
+- For every source file, check the state has a UUID entry. If not,
+  warn: `source file Y exists but state has no UUID — will create new
+  on push`.
+
+Make these warnings non-blocking but very visible.
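+
+At its core the reconciliation pass is a two-way set difference. A
+minimal sketch, assuming the caller has already collected the local
+keys present in state and the keys derived from source files on disk
+(`reconcile` and `ReconcileReport` are hypothetical names):
+
+```typescript
+interface ReconcileReport {
+  stateOnly: string[]; // UUID in state, no source file
+  sourceOnly: string[]; // source file exists, no UUID in state
+}
+
+// Compares the keys tracked in state with the keys found on disk.
+function reconcile(stateKeys: string[], sourceKeys: string[]): ReconcileReport {
+  const inState = new Set(stateKeys);
+  const inSource = new Set(sourceKeys);
+  return {
+    stateOnly: stateKeys.filter((key) => !inSource.has(key)),
+    sourceOnly: sourceKeys.filter((key) => !inState.has(key)),
+  };
+}
+```
+
+Each entry in `stateOnly` maps to the first warning above and each
+entry in `sourceOnly` to the second.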
+
+### Status
+
+**Partial.** `getInvalidStateMappings()` covers two of the three cases;
+state-orphans-without-source remain.
+
+---
+
+## 13. `.agent/` and `.claude/handoffs/` are not gitignored
+
+**[RESOLVED 2026-04-30] (Stack A)**
+
+**Discovered:** customer-fork log (Mudflap #4, 2026-04-28)
+
+### Problem
+
+`.agent/` and `.claude/handoffs/` showed up in `git status` from session
+start. The repo's `.gitignore` did not cover handoff-scratch directories
+written by Claude Code's SessionStart hook and the new-thread skill.
+
+### Risk
+
+`git add -A` (or `gt modify -cam`, which uses it internally) silently
+sweeps these dirs into commits. Handoff files contain conversation
+snapshots, sometimes including draft messages with PII or in-progress
+decisions.
+
+### Resolution
+
+`.gitignore` extended with `.agent/`, `.agent/handoffs/`,
+`.claude/handoffs/` (the existing `.claude/` line covered the latter
+already, but Mudflap's log explicitly called out `.agent/` which was
+uncovered). Removed the legacy `requested improvements.md` line — that
+was a per-engineer convention superseded by adopting upstream
+`improvements.md`.
+
+---
+
+## 14. Multi-file push works but is undocumented
+
+**[RESOLVED 2026-04-30] (Stack A)**
+
+**Discovered:** customer-fork log (Mudflap #5, 2026-04-28)
+
+### Problem
+
+`AGENTS.md` documented `npm run push -- <org> <single-path>` for scoped
+pushes. Multi-file (`<path1> <path2>`) worked but was undiscoverable —
+engineers fell back to "push the whole org" (wider blast radius) or
+sequential single-file pushes (multiple state file rewrites = more diff
+noise).
+
+### Resolution
+
+`AGENTS.md` Quick Reference table + Available Commands block now
+document multi-file push. Verified intentional in `src/config.ts:104-184`
+(file-path arg detection accumulates into `filePaths[]`).
+
+---
+
+## 15. Scoped push still rewrites the entire state file
+
+**Discovered:** customer-fork log (Mudflap #7, 2026-04-28)
+
+### Problem
+
+A surgical push of just two files rewrote the entire
+`.vapi-state.<env>.json`, sweeping in pre-existing drift from earlier
+pushes. The resulting state-file diff, which is committed, was much
+larger than the push scope warranted.
+
+### Current behavior (Verified)
+
+`src/push.ts:1278-1280` calls `saveState(state)` with the full state
+object after every push, regardless of which paths were targeted.
+
+### Risk
+
+Even a focused push produces a noisy state diff that may include
+unintended pre-existing dashboard drift. Reviewers can't tell "what did
+this push do" from the state file diff alone.
+
+### Possible fix
+
+When push is scoped, only update state entries for resources actually
+touched. Track touched IDs during apply; at end-of-push, merge
+(load existing state → replace only touched keys → save). Needs #4 to
+distinguish "stale" from "just-not-touched."
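+
+The load, replace, save merge could be sketched like this. The state
+shape (section name to local-key/UUID map) and the `touched` record are
+assumptions for illustration; `mergeTouchedState` is a hypothetical
+helper, and the sketch does not address the stale-vs-untouched
+distinction that needs #4.
+
+```typescript
+// Hypothetical state shape: section name -> (local key -> remote UUID).
+type StateSection = Record<string, string>;
+type State = Record<string, StateSection>;
+
+// Copies the on-disk state and overwrites only the keys this push touched.
+function mergeTouchedState(
+  onDisk: State,
+  inMemory: State,
+  touched: Record<string, string[]>,
+): State {
+  const merged: State = {};
+  for (const section of Object.keys(onDisk)) {
+    merged[section] = { ...onDisk[section] };
+  }
+  for (const section of Object.keys(touched)) {
+    if (!merged[section]) merged[section] = {};
+    const source = inMemory[section] || {};
+    for (const key of touched[section]) {
+      if (key in source) {
+        merged[section][key] = source[key];
+      } else {
+        delete merged[section][key]; // resource was removed during this push
+      }
+    }
+  }
+  return merged;
+}
+```
+
+Untouched keys keep their on-disk values, so pre-existing drift held in
+memory never reaches the saved file.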
+
+### Status
+
+**Open.** Targeted by **Stack J**; depends on **Stack F**.
+
+---
+
+## 16. No CLI runner for simulation suites (despite engine tracking them)
+
+**Discovered:** customer-fork log (Mudflap #8, 2026-04-28)
+
+### Problem
+
+The engine fully tracks simulation suites in state (and AGENTS.md
+describes `simulations/suites/` as a first-class resource type), but
+there is no `npm run` command to actually *execute* a suite. `npm run
+eval` runs the legacy `/evals` endpoint, not the unified simulation
+runner (`POST /eval/simulation/run`). The engine drops you at the API
+doorstep when you actually want to run it.
+
+### Current behavior (Verified)
+
+`package.json` has `eval` (legacy) but no `sim`. `src/push.ts`'s
+`applySimulationSuite()` (line 491) creates and updates suites but the
+engine has no run path.
+
+### Risk
+
+Asymmetric tooling — engineers will go straight to the dashboard UI to
+trigger runs (losing reproducibility) or write per-customer shell
+wrappers. The naming overlap (`npm run eval` vs `simulations/`)
+actively misleads.
+
+### Possible fix
+
+Add `npm run sim`:
+```
+npm run sim -- <org> --suite <name> --target <assistant-or-squad>
+npm run sim -- <org> --simulations <n1>,<n2> --target <assistant>
+npm run sim -- <org> --suite <name> --watch
+```
+Reuse `src/eval.ts`'s local-name → UUID resolver and
+`src/api.ts:vapiRequest`. Print pass/fail summary on completion.
+
+Renaming `npm run eval` to disambiguate is a separate,
+backwards-incompatible follow-up.
+
+### Status
+
+**Open.** Targeted by **Stack E**.
+
+---
+
+## 17. State file key-order churn produces noisy diffs
+
+**Discovered:** customer-fork log (Mudflap #1, 2026-04-28)
+
+### Problem
+
+After pushes, the diff of `.vapi-state.<env>.json` includes reorderings
+of the section objects. Same keys, same UUIDs — just emitted in a
+different insertion order. About half the diff is pure reordering.
+
+### Current behavior (Verified)
+
+`src/state.ts:55-64` (`saveState()`) calls `JSON.stringify(state, null,
+2)` with no key sorter. JS `JSON.stringify` preserves insertion order;
+maps merged from multiple sources (push, pull, bootstrap) end up with
+unpredictable orders.
+
+### Risk
+
+Noisy state-file diffs hide the actually meaningful entries (new UUIDs,
+removed entries) under a wall of reorderings. Reviewers rubber-stamp
+state file changes because they're hard to read.
+
+### Possible fix
+
+Add `sortedKeysReplacer` to `JSON.stringify` so object keys serialize
+alphabetically. Preserve the atomic write pattern in
+`src/state.ts:60-62`.
+
+**One-time noise:** the first push after this lands produces a
+state-file diff of pure reordering across every customer. Worth calling
+out in the PR description.
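+
+One possible shape for `sortedKeysReplacer`, as a sketch rather than
+the final implementation. It sorts by JavaScript's default string
+order (UTF-16 code units), which is alphabetical for plain ASCII keys.
+
+```typescript
+// Replacer that rebuilds plain objects with sorted keys; arrays and
+// primitives pass through unchanged.
+function sortedKeysReplacer(_key: string, value: unknown): unknown {
+  if (value === null || typeof value !== "object" || Array.isArray(value)) {
+    return value;
+  }
+  const source = value as Record<string, unknown>;
+  const sorted: Record<string, unknown> = {};
+  for (const key of Object.keys(source).sort()) {
+    sorted[key] = source[key];
+  }
+  return sorted;
+}
+
+// JSON.stringify applies the replacer to the root and to every nested value.
+const example = JSON.stringify({ b: 1, a: { d: 2, c: 3 } }, sortedKeysReplacer);
+// example === '{"a":{"c":3,"d":2},"b":1}'
+```
+
+Keys that look like non-negative integers are always emitted first by
+the JS engine regardless of insertion order, which should not matter
+for name-keyed state sections.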
+
+### Status
+
+**Open.** Targeted by **Stack B**.
+
+---
+
+## 18. Structured-output evaluation `name` capped at 40 chars with no client-side validation
+
+**Discovered:** customer-fork log (Mudflap #9, 2026-04-29)
+
+### Problem
+
+Structured-output `evaluations[].structuredOutput.name` is capped at 40
+characters server-side. The engine accepts a 51-char name, posts it,
+and only fails when the API returns 400 mid-push.
+
+### Current behavior (Verified)
+
+The push failed partway through a multi-resource apply. By the time the scenario
+errored, both assistants and one new personality had already been
+applied AND the state file had been written with the new personality
+UUID. The push left the dashboard in an intermediate state.
+
+### Risk
+
+Failure happens partway through a multi-resource push. Recovery is
+non-obvious. Engineers naturally write self-describing names that
+exceed the cap.
+
+### Possible fix
+
+Client-side validator (`npm run validate`) that walks every assistant
+`name` and every `evaluations[].structuredOutput.name` in scenarios.
+Fail fast (with the offending field path printed) before any API call.
+Same validator can apply the cap to other known-finite fields (e.g.
+assistant `name` capped at 40 too).
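+
+A minimal sketch of the length check, assuming the caller has already
+flattened the relevant fields into path/value pairs. `NamedField` and
+`findOverlongNames` are hypothetical; the 40-character cap is the one
+reported in this entry. `String.length` counts UTF-16 code units, and
+whether the server counts the same way is not verified here.
+
+```typescript
+const MAX_NAME_LENGTH = 40; // cap reported in this entry
+
+interface NamedField {
+  path: string; // e.g. "evaluations[0].structuredOutput.name"
+  value: string;
+}
+
+// Returns one message per field whose value exceeds the cap.
+function findOverlongNames(
+  fields: NamedField[],
+  max: number = MAX_NAME_LENGTH,
+): string[] {
+  return fields
+    .filter((field) => field.value.length > max)
+    .map(
+      (field) =>
+        `${field.path}: ${field.value.length} chars exceeds the ${max}-char cap`,
+    );
+}
+```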
+
+### Status
+
+**Open.** Targeted by **Stack D**.
+
+---
+
+## 19. No engine warning when `maxTokens` is too low for a tool-using assistant
+
+**Discovered:** customer-fork log (Mudflap #10, 2026-04-29)
+
+### Problem
+
+Any engineer can write `maxTokens: 1` (or 10, or 25) into an assistant
+`.md`. The engine syncs it to the dashboard with no warning. The first
+symptom on a real call is a malformed tool-call payload — opaque to
+debug. Risk window is widest when an engineer is *trying to suppress
+speech* on a silent classifier.
+
+### Current behavior
+
+**Verified in engine:** the push pipeline passes `maxTokens` through
+unchanged. **Needs platform validation:** the exact OpenAI / provider
+behavior at low `maxTokens` boundary is provider-specific; the customer
+log cites OpenAI streaming behavior at `maxTokens: 1` that returns
+`finish_reason: 'length'` mid-JSON for tool calls.
+
+### Possible fix
+
+At validate / push time, for any assistant with non-empty
+`model.toolIds`, compute a soft floor:
+`floor ≈ 25 + sum(len(JSON.stringify(tool.function.parameters)) for tool in tools)`.
+If `model.maxTokens < floor`, warn (non-blocking).
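+
+The soft floor above can be sketched as below. `ToolSpec`,
+`softMaxTokensFloor`, and `maxTokensWarning` are hypothetical names.
+The formula measures serialized characters, not tokens, so it is only a
+rough proxy.
+
+```typescript
+interface ToolSpec {
+  function: { parameters?: unknown };
+}
+
+// floor = 25 + total length of each tool's serialized parameter schema.
+function softMaxTokensFloor(tools: ToolSpec[]): number {
+  return tools.reduce(
+    (sum, tool) => sum + JSON.stringify(tool.function.parameters || {}).length,
+    25,
+  );
+}
+
+// Returns a warning string when maxTokens is below the floor, else null.
+function maxTokensWarning(maxTokens: number, tools: ToolSpec[]): string | null {
+  if (tools.length === 0) return null; // no tools, nothing to check
+  const floor = softMaxTokensFloor(tools);
+  return maxTokens < floor
+    ? `model.maxTokens (${maxTokens}) is below the soft floor (${floor})`
+    : null;
+}
+```
+
+For a single tool whose parameters are `{ type: "object" }`, the
+serialized schema is 17 characters, giving a floor of 42.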
+
+### Status
+
+**Open.** Targeted by **Stack D**.
+
+---
+
+## 20. Prompt vocabulary leaks into TTS
+
+**Discovered:** customer-fork log (Mudflap #11, 2026-04-29)
+
+### Problem
+
+A prompt section heading or example word that names a tool argument can
+become a TTS contaminant. Customer log: a `# Reasoning Channel
+Discipline` section with `Reason.` examples caused the model to open
+turns with `"Reason."` as a TTS preface. Squad regressed 7/18 → 4/18.
+
+### Current behavior (Verified)
+
+The engine treats prompts as opaque text. No surface to detect this
+class of regression at push time.
+
+### Risk
+
+Prompt-authoring footguns ship clean through the engine. Discovered
+days later via sim regressions; attribution to the prompt's literal
+word choice is non-obvious.
+
+### Possible fix
+
+Heuristic only — a real fix requires linguistic modeling out of scope
+for an engine intervention:
+
+1. If a prompt body contains a structured concept word (`Reason`,
+   `Reasoning`, `Channel`, `Discipline`, `Argument`, etc., capitalized)
+   AND the assistant has a tool whose parameter has the same name, warn
+   at validate time.
+2. Templating convention `<<arg:reason>>` is overkill but worth thinking
+   about.
+
+The full fix lives in `docs/learnings/assistants.md` as a known
+regression shape.
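+
+Heuristic 1 can be sketched as an exact, case-insensitive match between
+capitalized prompt words and tool parameter names.
+`findLeakyPromptWords` is a hypothetical helper, and it matches whole
+words only, so `Reasoning` does not match a `reason` parameter.
+
+```typescript
+// Returns capitalized prompt words that share a name with a tool parameter.
+function findLeakyPromptWords(
+  prompt: string,
+  parameterNames: string[],
+): string[] {
+  const lowerParams = parameterNames.map((name) => name.toLowerCase());
+  const words = prompt.match(/\b[A-Z][a-z]+\b/g) || [];
+  const hits: string[] = [];
+  for (const word of words) {
+    const isParam = lowerParams.indexOf(word.toLowerCase()) !== -1;
+    if (isParam && hits.indexOf(word) === -1) {
+      hits.push(word);
+    }
+  }
+  return hits;
+}
+```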
+
+### Status
+
+**Open.** Targeted by **Stack D** as a heuristic; entry stays open to
+flag that the heuristic is partial.
+
+---
+
+## Out of scope (intentionally not improvements)
+
+- **State file is identity-only and not git-ignored.** It's intentionally
+  committed so all collaborators share the same local→UUID mapping.
+  The proposal in #4 is *additive* — keep identity mappings, add
+  content hashes.
+- **`push -- <env>` does not require an interactive confirmation prompt.**
+  That's a UX choice — adding a prompt would break automation. The right
+  place to add friction is `--dry-run` (#5).
+- **No environment-cross-pollination guard.** `push -- <env>` only
+  touches `resources/<env>/` — this is correct and documented in
+  `AGENTS.md`. Don't conflate that with drift detection.
+- **Renaming `npm run eval` to disambiguate from `npm run sim`.**
+  Backwards-incompatible script change; raise as a separate issue.