Add Fish Audio s2.1-pro transport (WS, latency=balanced)#3

Open: cshape wants to merge 1 commit into VapiAI:main from cshape:feat/add-fish-audio


@cshape cshape commented Jun 17, 2026

Plumbing for Issue #2.

What's here

  • src/pipeline/transports/fish.ts: new transport.
    • synthesize and TTFB share the realtime WebSocket wss://api.fish.audio/v1/tts/live with msgpack framing (start/text/flush/stop out, audio/finish back).
    • createClone hits the documented multipart POST https://api.fish.audio/model (instant cloning, train_mode: "fast", private model).
  • src/pipeline/transports/index.ts: register fish in TRANSPORTS.
  • src/pipeline/models.ts: pre-registration row with frozen arenaApiId: "s2-1-pro", vendorModelId: "s2.1-pro". Once the maintainer hashes clips, that id is permanent.
  • @msgpack/msgpack added as a devDep. Transport-only; the production Next.js bundle never imports pipeline/transports.
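The frame flow described above can be sketched roughly as follows. This is not the code in fish.ts: the event names (start/text/flush/stop, audio/finish) come from this description, while the field names inside each frame, the `mp3` format value, and the helpers `buildFrames` and `timeToFirstByte` are assumptions made for illustration. Frames are shown as plain objects; in the real transport each one would be msgpack-encoded before being sent over the socket.

```typescript
// Hypothetical frame shapes. Only the event names are taken from the PR text;
// every other field name here is an assumption.
type LatencyMode = "low" | "balanced";

type OutboundFrame =
  | { event: "start"; request: { text: string; latency: LatencyMode; format: string } }
  | { event: "text"; text: string }
  | { event: "flush" }
  | { event: "stop" };

type InboundFrame =
  | { event: "audio"; audio: Uint8Array }
  | { event: "finish"; reason: string };

// Build the outbound sequence for one utterance: start, text, flush, stop.
function buildFrames(text: string, latency: LatencyMode = "balanced"): OutboundFrame[] {
  return [
    { event: "start", request: { text: "", latency, format: "mp3" } },
    { event: "text", text },
    { event: "flush" },
    { event: "stop" },
  ];
}

interface TimedFrame {
  atMs: number; // receive time, on the same clock as sentAtMs
  frame: InboundFrame;
}

// TTFB here means: time from a caller-supplied send timestamp to the first
// inbound audio frame that actually carries bytes. Returns null if none arrived.
function timeToFirstByte(sentAtMs: number, received: TimedFrame[]): number | null {
  for (const { atMs, frame } of received) {
    if (frame.event === "audio" && frame.audio.byteLength > 0) {
      return atMs - sentAtMs;
    }
  }
  return null;
}
```

Because synthesis and TTFB share one socket, a single receive loop could feed both the clip writer and a timer like the one above; whether fish.ts is structured that way is not stated here.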

What's NOT here (maintainer follow-up, per docs/ADDING_A_MODEL.md)

  1. Clone the 4 source voices on Fish (humanness:clone fish <source-clips-dir>).
  2. Generate + upload + HEAD-verify the 80 clips (humanness:clips fish-s21-pro --upload).
  3. Add the catalog entry in src/catalog/{providers,models}.ts: the Fish provider and the s2.1-pro model, with sourced stats ($15 per 1M chars; 80+ languages; s2-pro released March 2026; s2.1-pro launching July 2026 but already available via the API). Pin the sample fallbackClip from the uploaded hashes and bump the pinned counts in catalog.test.ts.
  4. Run humanness:ttfb fish-s21-pro and record the median as pipelineTtfb(<n>).
  5. Browser-check /models/fish-s2-1-pro.

Verification

  • bun run check-types
  • bun test (109 pass)
  • bun run build

Test plan

  • Maintainer runs the 5 steps above against a Fish API key + the licensed source voices.
  • The WS bench should produce a median comparable to the existing providers. A local non-clone bench saw a 283 ms median for balanced; the registered number comes from the 50-trial pipeline bench.

Plumbing only. Maintainer still has to clone the four source voices on
Fish, generate + upload the 80 clips, and add the catalog entry; this
commit gives them a working transport to do it with.

- transports/fish.ts: synthesis + TTFB share wss://api.fish.audio/v1/tts/live
  (start/text/flush/stop msgpack frames, audio/finish back). Clone creation
  hits the documented multipart POST /model with train_mode=fast.
- latency=balanced beat latency=low by ~18 ms median in a local 20-trial
  bench, so the entry uses balanced.
- Pre-registration row in pipeline/models.ts with frozen arenaApiId s2-1-pro.
- @msgpack/msgpack added as a devDep (transport-only; never imported by the
  production Next.js bundle).
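
The choice of latency=balanced rests on comparing median TTFB across trials. A minimal sketch of that comparison is below; `median` and `pickLatencyMode` are hypothetical helpers, not part of this commit, and any numbers passed to them in a usage example are placeholders rather than the bench results.

```typescript
// Median of a non-empty list of millisecond samples.
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error("median of empty sample set");
  const sorted = [...samples].sort((a, b) => a - b); // numeric sort, input left untouched
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Given per-mode TTFB trials, return the mode with the lowest median.
function pickLatencyMode(
  trials: Record<string, number[]>,
): { mode: string; medianMs: number } {
  let best: { mode: string; medianMs: number } | null = null;
  for (const [mode, samples] of Object.entries(trials)) {
    const medianMs = median(samples);
    if (best === null || medianMs < best.medianMs) {
      best = { mode, medianMs };
    }
  }
  if (best === null) throw new Error("no latency modes supplied");
  return best;
}
```

A median is less sensitive than a mean to the occasional slow connection setup, which matters with a trial count as small as 20. A gap of about 18 ms at that sample size is still modest, so the 50-trial pipeline bench remains the number to record.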

vercel Bot commented Jun 17, 2026

@cshape is attempting to deploy a commit to the Vapi Team on Vercel.

A member of the Team first needs to authorize it.

@cshape cshape mentioned this pull request Jun 17, 2026
@cshape cshape changed the title Add Fish Audio s2.1-pro transport (WS, latency=balanced) add fish audio s2.1-pro transport (ws, latency=balanced) Jun 17, 2026
@cshape cshape changed the title add fish audio s2.1-pro transport (ws, latency=balanced) Add Fish Audio s2.1-pro transport (WS, latency=balanced) Jun 17, 2026