* fix: dedup dependency auto-apply to prevent duplicate tool mints (Gap #10)
Targeted assistant pushes minted duplicate dashboard tools when bootstrap
pull stored an existing dashboard tool under a name-slugged state key
(e.g. `end-call-67aea057`) instead of the user's original local key. The
exact-key lookup in `ensureToolExists` / `ensureStructuredOutputExists`
missed and POSTed a fresh duplicate. Each subsequent targeted push
repeated the cycle, accumulating dashboard orphans.
This adds a name-based dedup check between the exact-key short-circuit
and the create path, in two layers:
1. State-side: scan state for any key whose `extractBaseSlug` matches
the local payload's slugified name (handles bootstrap-renamed keys).
2. Dashboard-side: lazy-fetch the live `/tool` (and `/structured-output`)
list once per push and check for a remote resource with the same
canonical name.
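The two lookup layers can be sketched roughly as follows. This is an illustration, not the code in `src/dep-dedup.ts`: the `StateMap` and `RemoteResource` shapes, the `slugify` rule, and the `baseSlugOf` helper (a stand-in for `extractBaseSlug`, assuming bootstrap keys end in a dash plus eight hex characters) are all assumptions.

```typescript
// Hypothetical shapes; the real state file and API payloads may differ.
type StateMap = Record<string, { uuid: string }>;
type RemoteResource = { id: string; name: string };
type MatchSource = "state" | "dashboard" | "both" | "none";

// Assumed slug rule: lowercase, non-alphanumeric runs collapse to "-".
function slugify(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Stand-in for extractBaseSlug: strips an assumed "-<8 hex chars>" suffix.
function baseSlugOf(stateKey: string): string {
  return stateKey.replace(/-[0-9a-f]{8}$/, "");
}

function findCandidatesByName(
  localKey: string,
  localName: string,
  state: StateMap,
  remote: RemoteResource[],
): { uuids: string[]; via: MatchSource } {
  const wanted = slugify(localName);

  // Layer 1: state keys whose base slug matches (the exact key is handled earlier).
  const fromState: string[] = [];
  for (const key of Object.keys(state)) {
    const entry = state[key];
    if (key === localKey || entry === undefined) continue;
    if (baseSlugOf(key) === wanted) fromState.push(entry.uuid);
  }

  // Layer 2: live dashboard resources with the same canonical name.
  const fromDashboard = remote
    .filter((r) => slugify(r.name) === wanted)
    .map((r) => r.id);

  // Distinct UUIDs across both layers, sorted for determinism.
  const uuids = fromState
    .concat(fromDashboard)
    .filter((u, i, all) => all.indexOf(u) === i)
    .sort();

  const via: MatchSource =
    fromState.length > 0 && fromDashboard.length > 0 ? "both"
    : fromState.length > 0 ? "state"
    : fromDashboard.length > 0 ? "dashboard"
    : "none";
  return { uuids, via };
}
```

In this sketch the dashboard list is passed in as a plain array; the once-per-push lazy fetch described above would sit in the caller.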
When more than one distinct UUID matches the same name (real on-dashboard
duplicates left by earlier runs of this bug), adopt the lexicographically
smallest UUID so the choice is stable across pushes, log a warning that
names the losing UUIDs, and point the user at `npm run cleanup`. Never
mint another duplicate.
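A minimal sketch of the tie-break, assuming candidates arrive as plain UUID strings. The function name `pickAdoptee` and its return shape are hypothetical; the real warning text and call site are not shown here.

```typescript
// Picks the UUID to adopt when several resources share one name.
// The default sort orders strings by UTF-16 code unit, which is
// lexicographic for lowercase hex UUIDs.
function pickAdoptee(uuids: string[]): { adopted: string; losers: string[] } {
  const distinct = uuids.filter((u, i, all) => all.indexOf(u) === i).sort();
  const adopted = distinct[0];
  if (adopted === undefined) {
    throw new Error("pickAdoptee needs at least one candidate UUID");
  }
  // Everything after the first entry is a duplicate to warn about.
  return { adopted, losers: distinct.slice(1) };
}
```

A caller would log a warning only when `losers` is non-empty, and would never fall through to the create path once any candidate exists.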
Adoption flow:
- Re-key state to the adopted UUID under the local resourceId.
- Drop other state keys pointing at the same UUID and mark them
touched, so a subsequent full push doesn't orphan-delete the
adopted dashboard resource (Stack J / mergeScoped flushes the
deletion).
- Route through `applyTool`/`applyStructuredOutput` so the local
payload PATCHes the dashboard with the standard drift-check flow,
instead of recording a fake `lastPushedHash` that would silently
drop a locally-edited dependency.
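The state side of the adoption flow might look like the sketch below. The `StateMap` shape and the `adoptIntoState` name are assumptions for illustration. The sketch deliberately stores no content hash for the adopted entry, since the flow above leaves that to the normal apply step.

```typescript
// Hypothetical state shape: resource key -> dashboard UUID.
type StateMap = Record<string, { uuid: string }>;

function adoptIntoState(
  state: StateMap,
  localKey: string,
  adoptedUuid: string,
): { next: StateMap; touched: string[] } {
  const next: StateMap = {};
  const touched: string[] = [localKey];

  for (const key of Object.keys(state)) {
    const entry = state[key];
    if (entry === undefined) continue;
    if (key !== localKey && entry.uuid === adoptedUuid) {
      // Alias of the adopted resource: drop it and mark it touched so the
      // removal is written out and a later full push does not treat the
      // dashboard resource as an orphan.
      touched.push(key);
      continue;
    }
    next[key] = entry;
  }

  // Re-key the adopted UUID under the local resource id.
  next[localKey] = { uuid: adoptedUuid };
  return { next, touched };
}
```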
Tests: 12 unit tests for `findExistingResourceByName` covering state-only,
dashboard-only, both-agree, ambiguous (state-vs-state, state-vs-dashboard),
no-name, exact-key-excluded, no-match. All 114 suites pass.
Refs: improvements.md §10
* docs(improvements): mark §10 as resolved (#23)
* docs: surface tool/SO dedup behavior in learnings; align AGENTS.md / CLAUDE.md
- src/dep-dedup.ts: drop "Gap #10" issue marker from the file header (it
rots; the rationale is what matters, not the tracker reference).
- docs/learnings/tools.md: new section "Renaming a tool file is safe — the
engine dedups by `function.name`" — explains the auto-apply dedup safety
net, the 🔁 / ⚠️ log line semantics, and the cleanup path. Counterpart in
docs/learnings/structured-outputs.md cross-references it.
- AGENTS.md: add `outbound-campaigns.md` to the Learnings & recipes table
(was missing); refresh the docs/learnings/ tree in the Project Structure
section to be complete; add an explicit "Where new knowledge goes" table
pinning the convention (per-resource tips → docs/learnings/<topic>.md;
engine-friction log → improvements.md; rationale → code comments;
onboarding → README.md).
- CLAUDE.md: sync the Required Reading Order list with AGENTS.md's table
(was missing voice-providers, outbound-agents, outbound-campaigns,
voicemail-detection); add a brief "Where new knowledge goes" reminder
pointing back at AGENTS.md as the canonical convention table.
No source behavior changes. Build clean, 114/114 tests pass.
* refactor: address review — extend dedup to assistants, drop internal identifier refs from comments
Code-review follow-ups on PR #23:
- src/dep-dedup.ts: replace `Record<string, unknown>` + `as`-cast with a
named `NameablePayload = { name?: unknown; function?: unknown }` shape
and `in`-operator narrowing. No casts, no laundered types — the function
reads two known paths and narrows them at use.
- src/push.ts: scrub "Gap #10" / "Stack J" / "improvements.md #15"
identifiers from comments. These were internal stack/log markers that
don't help anyone reading the code; rephrased in domain language while
keeping the rationale. Also drops redundant `as Record<string, unknown>`
casts at call sites and reuses `extractResourceName` for the display-name
fallback in dedup warnings.
- src/push.ts: extend dedup to assistants. The squad → assistant
auto-apply path (`ensureAssistantExists`) had the same bug class as
tools / SOs — bootstrap pull stores assistants under `<slug>-<uuid8>`
keys, and a squad referencing the original local key would mint a
duplicate assistant on every push. Adds `getExistingRemoteAssistants`
lazy-fetch + dedup branch with the same orphan-deletion guard and
apply-via-PATCH flow already in place for tools / SOs. Documents in
the DependencyContext comment why simulations / personalities /
scenarios / sim-suites are NOT covered: they're not auto-applied as
dependencies anywhere in the engine, so the bug class doesn't fire.
- tests/dep-dedup.test.ts: add explicit assistant-payload test
(top-level `name`, no nested `function`).
Build clean, 115/115 tests pass (was 114, +1 new assistant test).
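For readers unfamiliar with `in`-operator narrowing, here is a small sketch of reading a name from the `NameablePayload` shape without casts. The function name `resourceNameOf`, the precedence of `function.name` over top-level `name`, and the treatment of empty strings as missing are assumptions, not confirmed details of `extractResourceName`. The `in` narrowing on an `object` value relies on TypeScript 4.9 or later.

```typescript
type NameablePayload = { name?: unknown; function?: unknown };

function resourceNameOf(payload: NameablePayload): string | undefined {
  const fn = payload.function;
  // Tool-style payloads: nested function.name.
  if (typeof fn === "object" && fn !== null && "name" in fn) {
    const nested = fn.name; // typed as unknown after the `in` check
    if (typeof nested === "string" && nested.length > 0) return nested;
  }
  // Assistant-style and structured-output-style payloads: top-level name.
  const top = payload.name;
  if (typeof top === "string" && top.length > 0) return top;
  return undefined;
}
```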
* refactor: scrub internal stack/issue identifiers and customer references
Two cleanup sweeps prompted by review feedback on PR #23:
Sweep 1 — internal stack/log identifiers in code comments. References
to "Stack F/G/H/I/J" and "improvements.md #N" are internal-only and
mean nothing to a customer reading the code. Each comment is rephrased
in domain language while preserving the WHY:
- Stack F → "per-resource content-hash state schema" / "schema migration"
- Stack G → "drift detection layer"
- Stack H → "snapshot-on-push for rollback"
- Stack I → "ETag-based optimistic concurrency"
- Stack J → "scoped state writes"
- improvements.md #N → dropped entirely (the rationale stands on its own)
Touched: src/api.ts, cleanup.ts, dep-dedup.ts, drift.ts, pull.ts, push.ts,
resolver.ts, sim-cmd.ts, sim.ts, snapshot.ts, state-merge.ts,
state-serialize.ts, types.ts.
Sweep 2 — customer-specific identifiers in docs/learnings. Customer
brand names (`iForm`, `Mudflap`) and internal ticket IDs
(`PRISM-481`, `PRISM-528`, `PRISM-474`) replaced with generic
placeholders so the public template doesn't carry customer artifacts:
- iForm → Acme Logistics (in scenario examples)
- Mudflap → "a customer rollout" (in cross-references)
- PRISM-* tickets → dropped entirely
- handoffToiFormSales → handoffToAcmeSales
- b2b-invoice-end-call.yml → intake-end-call.yml (in renaming example)
Touched: docs/learnings/assistants.md, simulations.md, squads.md, tools.md,
voice-providers.md.
No source-behavior changes. Build clean, 115/115 tests pass.
| Per-resource gotchas, recipes, troubleshooting |`docs/learnings/<topic>.md`| One file per resource type or topic. Add a row to this table AND to `docs/learnings/README.md` when you add a new file. `CLAUDE.md` mirrors this list — keep both in sync. |
+| Engine-friction log (push/pull/state/cleanup pain points + fixes) |`improvements.md`| Format: Problem → Current behavior → Risk → Current mitigation → Possible fix → Status. Mark `[RESOLVED YYYY-MM-DD] (#<PR>)` when fixed; never delete. |
+| Code-level rationale (why a function works the way it does) | Code comments | Only when the WHY is non-obvious — not what the code does. Don't reference PR/issue numbers; they rot. |
This list mirrors the "Learnings & recipes" table in `AGENTS.md`. Keep both in sync — if you add a new learnings file, update both files plus `docs/learnings/README.md`.
+
+## Where new knowledge goes
+
+Per-resource tips/recipes/troubleshooting → `docs/learnings/<topic>.md`. Engine-friction log (push/pull/state/cleanup pain points + their fixes) → `improvements.md`. Code-level rationale → comments only when the *why* is non-obvious; never reference PR/issue numbers in code comments (they rot). One-time onboarding/install → `README.md`. When unsure, default to `docs/learnings/`. The full convention table lives in `AGENTS.md` under "Where new knowledge goes" — read it once, then this reminder is enough.
+

## Improvements log

This repo maintains an upstream-only running log at `improvements.md` (repo
docs/learnings/assistants.md (1 addition, 1 deletion)
@@ -656,7 +656,7 @@ They are merged, not mutually exclusive. But be aware of potential duplicates.

## Liquid Variable Bag and Trust Tiers

-Cross-reference: [docs.vapi.ai/assistants/dynamic-variables](https://docs.vapi.ai/assistants/dynamic-variables). The trust-tier framing came out of Mudflap progressive-auth work (PRISM-528).
+Cross-reference: [docs.vapi.ai/assistants/dynamic-variables](https://docs.vapi.ai/assistants/dynamic-variables). The trust-tier framing came out of progressive caller-ID auth work on a customer rollout.

Vapi exposes a Liquid templating layer in prompts, tool config, and overrides — `{{ customer.number }}`, `{{ now }}`, etc. The variables in scope at runtime fall into three trust tiers based on where they originate. This matters because anything you place in a security-sensitive field (tool static `parameters`, message templates that go to a backend) is only as trustworthy as the source of the variable.
docs/learnings/simulations.md (11 additions, 11 deletions)
@@ -20,35 +20,35 @@ Extra system messages beyond `messages[0]` are **not** included in the tester's
When the same rubric needs to run against multiple personality variants in a sim suite, give EACH `(rubric, personality)` pair its own scenario file with a uniquely descriptive name — even if the rubric content is identical across them.

-**Why:** the dashboard's run-history view displays scenarios by `name`, NOT by which personality drove the test. If 4 sims share a scenario named `iForm Live Human Pickup Handling`, all 4 result entries show identically in the suite-run sidebar — you can't tell which test was the "quick" pickup vs the "self-id" pickup vs the "question" pickup vs the "ambiguous-short" pickup without drilling into each item to see the personality. This makes failure investigation painful: every flickering test looks like the same test.
+**Why:** the dashboard's run-history view displays scenarios by `name`, NOT by which personality drove the test. If 4 sims share a scenario named `Acme Logistics Live Human Pickup Handling`, all 4 result entries show identically in the suite-run sidebar — you can't tell which test was the "quick" pickup vs the "self-id" pickup vs the "question" pickup vs the "ambiguous-short" pickup without drilling into each item to see the personality. This makes failure investigation painful: every flickering test looks like the same test.

**Recommendation:** name each scenario as `<base>-<personality-variant>-handling`, with a descriptive `name:` field that calls out the personality being tested.
**Cost:** scenario file duplication — each variant is a copy of the same rubric content with a different `name:` field. Cheap. The duplication is mechanical (you can clone the source scenario file 4-6 times with a one-line `name:` change each).
**Anti-pattern:** putting one shared scenario behind N personality variants in the same suite. The dashboard sidebar shows N rows with identical scenario names, only distinguishable by clicking into each item to see the personality. Sim iteration time inflates because every failure investigation starts with "wait, which one was this?"
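As a rough aid, the naming convention and the anti-pattern check could be scripted as below. The `Sim` shape and both function names are hypothetical; nothing here reflects the engine's own types or an existing command.

```typescript
// Hypothetical view of a sim: which scenario name it uses and which
// personality drives it.
type Sim = { scenarioName: string; personality: string };

// Builds names in the recommended <base>-<personality-variant>-handling form.
function scenarioNamesFor(base: string, variants: string[]): string[] {
  return variants.map((variant) => `${base}-${variant}-handling`);
}

// Returns scenario names used by more than one sim, i.e. rows that would
// look identical in the suite-run sidebar.
function sharedScenarioNames(sims: Sim[]): string[] {
  const counts: Record<string, number> = {};
  for (const sim of sims) {
    counts[sim.scenarioName] = (counts[sim.scenarioName] || 0) + 1;
  }
  return Object.keys(counts)
    .filter((name) => (counts[name] || 0) > 1)
    .sort();
}
```

Running a check like `sharedScenarioNames` over a suite before pushing would flag shared scenarios while they are still cheap to split.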

-Cross-reference: this convention surfaced as friction during the Mudflap iForm Voicemail Triage sim iteration (PRISM-481). Original suites shipped with one shared scenario per group (4 live-pickup tests sharing one scenario, 6 voicemail-edge-cases sharing one scenario); split into per-personality scenarios mid-iteration. Worth shipping new suites in the per-personality form from day one.
+Cross-reference: this convention surfaced as friction during a customer voicemail-triage sim iteration. Original suites shipped with one shared scenario per group (4 live-pickup tests sharing one scenario, 6 voicemail-edge-cases sharing one scenario); split into per-personality scenarios mid-iteration. Worth shipping new suites in the per-personality form from day one.
docs/learnings/squads.md (1 addition, 1 deletion)
@@ -107,7 +107,7 @@ For sim suites grading the destination's first-turn behavior, see [simulations.m

## Passing data between assistants

-Cross-reference: [docs.vapi.ai/squads/passing-data-between-assistants](https://docs.vapi.ai/squads/passing-data-between-assistants). The trust-tier framing came out of Mudflap progressive-auth work (PRISM-528).
+Cross-reference: [docs.vapi.ai/squads/passing-data-between-assistants](https://docs.vapi.ai/squads/passing-data-between-assistants). The trust-tier framing came out of progressive caller-ID auth work on a customer rollout.

When a squad hands off mid-call, three approaches exist for getting data from one assistant to the next. They differ on trust level, latency, and determinism.
docs/learnings/structured-outputs.md (4 additions, 0 deletions)
@@ -32,6 +32,10 @@ evaluations.5.structuredOutput.Name must be between 1 and 40 characters
Long, descriptive evaluator names like `assistant_left_voicemail_and_ended_call_promptly` (48 chars) or `assistant_detected_hostile_recording_and_ended_call` (51 chars) will silently exceed the limit until you POST. Keep names compact (`assistant_ended_call_after_message`, `assistant_handled_hostile_recording`) and put the descriptive nuance in the `description` field, which has no length cap. The constraint applies to the field on every structured output type — both standalone resources and inline evaluations within scenarios.
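A local pre-push check for this limit could look like the sketch below. The bounds come from the API error quoted above (1 to 40 characters); the function name is hypothetical, and counting by `String.length` (UTF-16 code units) is an assumption about how the API measures length.

```typescript
// Bounds taken from the quoted API error message.
const MIN_NAME_LENGTH = 1;
const MAX_NAME_LENGTH = 40;

// Returns null when the name fits, otherwise a short description of the problem.
function checkStructuredOutputName(name: string): string | null {
  if (name.length < MIN_NAME_LENGTH) {
    return "name is empty";
  }
  if (name.length > MAX_NAME_LENGTH) {
    return `name is ${name.length} characters; the limit is ${MAX_NAME_LENGTH}`;
  }
  return null;
}
```

Running a check like this over every standalone structured output and every inline evaluation before pushing would catch the problem without a round trip to the API.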

+### Renaming a structured-output file is safe — the engine dedups by `name`
+
+Same dedup behavior as for tools: if you rename a structured-output file but keep its `name` field stable, the push pipeline detects the existing dashboard resource (by slugified `name` against state and the live dashboard list) and adopts its UUID instead of creating a duplicate. You'll see `🔁 Reusing existing structured output: <localKey> → <uuid>` in the push log. See [tools.md → "Renaming a tool file is safe"](tools.md#renaming-a-tool-file-is-safe--the-engine-dedups-by-functionname) for the full mechanism, ambiguity warning semantics, and `npm run cleanup` workflow — they're identical for SOs.
docs/learnings/tools.md (23 additions, 3 deletions)
@@ -37,7 +37,27 @@ Vapi enforces a hard **1000-character maximum** on `function.description` across

### `function.name` matches `^[A-Za-z0-9_-]+$`

-Tool names are validated against this regex by Vapi. Spaces, dots, slashes, parentheses, or unicode characters cause a 400 at push time. Use snake_case or camelCase (e.g. `end_call_vapi_testing`, `handoffToiFormSales`). The name is what the LLM emits in its function call, so keep it stable across config changes — renaming a tool invalidates any prompt rule that mentions the old name.
+Tool names are validated against this regex by Vapi. Spaces, dots, slashes, parentheses, or unicode characters cause a 400 at push time. Use snake_case or camelCase (e.g. `end_call_vapi_testing`, `handoffToAcmeSales`). The name is what the LLM emits in its function call, so keep it stable across config changes — renaming a tool invalidates any prompt rule that mentions the old name.
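The same pattern can be checked locally before a push. This is a minimal sketch: the regex is the one quoted in the heading, while the function name is hypothetical and the check is not something the engine is known to run.

```typescript
// Pattern quoted in the heading above.
const TOOL_NAME_PATTERN = /^[A-Za-z0-9_-]+$/;

// True when the name contains only ASCII letters, digits, "_" or "-".
function isValidToolName(name: string): boolean {
  return TOOL_NAME_PATTERN.test(name);
}
```

For example, `isValidToolName("end_call_vapi_testing")` holds, while names containing a space or a dot are rejected.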
+
+### Renaming a tool file is safe — the engine dedups by `function.name`
+
+The push pipeline includes a name-based dedup safety net that prevents minting duplicate dashboard tools when:
+
+- You renamed the local file (e.g. `end-call.yml` → `intake-end-call.yml`) but kept `function.name` the same.
+- Bootstrap pull stored the dashboard tool under a slug-suffixed state key (e.g. `end-call-67aea057`) and your assistant references the original local key.
+- The tool exists on the dashboard but isn't yet in your local state file (e.g. fresh clone, partial pull).
+
+In all three cases the engine looks up the tool by slugified `function.name` against both state entries and the live dashboard tool list, then **adopts** the existing UUID instead of creating a new one. You'll see this log line:
+
+```
+🔁 Reusing existing tool: <localKey> → <uuid> (matched via state|dashboard|both)
+```
+
+Adoption then routes through the standard PATCH path, so any local edits to the tool's payload are pushed normally with drift detection. Your old state-key entries are dropped automatically so the next full push doesn't orphan-delete the just-adopted dashboard tool.
+
+**When you see `⚠️ Multiple dashboard tools share the name "<n>" — adopting <uuid> (lex-smallest)`**, real duplicate dashboard resources exist (typically from before the dedup was added). Run `npm run cleanup -- <org>` to inspect and prune; the engine adopts the lex-smallest UUID deterministically so subsequent pushes stay stable.
+
+**What this does NOT do:** if you rename `function.name` (not just the file), that's a new logical tool — the engine creates a new dashboard resource. Function-name renames need an explicit `npm run cleanup` of the old one.

---

@@ -335,7 +355,7 @@ Only `function` tools support `strict` mode.

## Tool Security and Data Visibility

-Cross-reference: [docs.vapi.ai/tools/static-variables-and-aliases](https://docs.vapi.ai/tools/static-variables-and-aliases) and [docs.vapi.ai/tools/custom-tools](https://docs.vapi.ai/tools/custom-tools). The full data-flow / threat-model writeup that motivates this section came out of Mudflap progressive-auth work (PRISM-528).
+Cross-reference: [docs.vapi.ai/tools/static-variables-and-aliases](https://docs.vapi.ai/tools/static-variables-and-aliases) and [docs.vapi.ai/tools/custom-tools](https://docs.vapi.ai/tools/custom-tools). The full data-flow / threat-model writeup that motivates this section came out of progressive caller-ID auth work on a customer rollout.

### Every tool result is in conversation history

@@ -374,7 +394,7 @@ The dashboard renders these as "Parameters" (JSON schema editor) and "Static Bod
| Legacy `assistant.model.functions[]` (deprecated) | ❌ — converter zeroes it out |
docs/learnings/voice-providers.md (2 additions, 2 deletions)
@@ -102,7 +102,7 @@ If you find yourself reaching for a provider not in the table above, append a ro

Pronunciation dictionaries do not share a field shape across voice providers. Same conceptual feature, three different surfaces.

-> **Public-docs note:** As of 2026-05-08 the public Vapi docs state pronunciation dictionaries are "exclusive to ElevenLabs voices." This is out of date — Cartesia has been confirmed in production deployments and Vapi-voice schema-level support is in active rollout (PRISM-474). Treat this wiki as the more current source.
+> **Public-docs note:** As of 2026-05-08 the public Vapi docs state pronunciation dictionaries are "exclusive to ElevenLabs voices." This is out of date — Cartesia has been confirmed in production deployments and Vapi-voice schema-level support is in active rollout. Treat this wiki as the more current source.

### Cartesia

@@ -120,7 +120,7 @@ Pronunciation dictionaries do not share a field shape across voice providers. Sa
### Vapi voices

- **Schema-level**: accepts pronunciation dictionary configs at the API.
-- **Dashboard UI surface**: in active rollout (PRISM-474, Q2 2026). Schema acceptance does **not** guarantee runtime TTS engine honors the dictionary.
+- **Dashboard UI surface**: in active rollout. Schema acceptance does **not** guarantee runtime TTS engine honors the dictionary.
- **Recommendation**: verify runtime behavior with a call test before depending on it for production Vapi-voice deployments.