python-agents-examples/docs/examples/tts_node_modifications/README.md at main · livekit-examples/python-agents-examples

title

Prerequisites

Add a .env in this directory with your LiveKit credentials:

LIVEKIT_URL=your_livekit_url
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
DEEPGRAM_API_KEY=your_deepgram_key
OPENAI_API_KEY=your_openai_key
RIME_API_KEY=your_rime_key

Install dependencies:

pip install "livekit-agents[silero]" python-dotenv livekit-plugins-deepgram livekit-plugins-openai livekit-plugins-rime

Load configuration and create the AgentServer

Load your environment variables and set up logging to see the before/after replacements. Create an AgentServer to manage sessions.

import logging
from typing import AsyncIterable
from dotenv import load_dotenv
from livekit.agents import JobContext, JobProcess, AgentServer, cli, Agent, AgentSession, ModelSettings
from livekit.plugins import deepgram, openai, silero, rime

load_dotenv()

logger = logging.getLogger("tts_node")
logger.setLevel(logging.INFO)

server = AgentServer()

Prewarm VAD for faster connections

Preload the VAD model once per process to reduce connection latency.

def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()

server.setup_fnc = prewarm

Define the agent and override tts_node

Use Deepgram STT, GPT-4o, and Rime TTS. Override tts_node to wrap the outgoing text stream and modify it before synthesis.

class TtsNodeOverrideAgent(Agent):
    def __init__(self, vad) -> None:
        super().__init__(
            instructions="""
                You are a helpful assistant communicating through voice.
                Feel free to use "lol" in your responses when something is funny.
            """,
            stt=deepgram.STT(),
            llm=openai.LLM(model="gpt-4o"),
            tts=rime.TTS(model="arcana"),
            vad=vad
        )

    async def tts_node(self, text: AsyncIterable[str], model_settings: ModelSettings):
        """Modify the TTS output by replacing 'lol' with '<laugh>'."""

Replace text chunks before synthesis

Iterate over the async text stream, replace "lol"/"LOL", log the swap, and delegate to the default TTS node with the modified stream.

        async def process_text():
            async for chunk in text:
                original_chunk = chunk
                modified_chunk = chunk.replace("lol", "<laugh>").replace("LOL", "<laugh>")

                if original_chunk != modified_chunk:
                    logger.info(f"TTS original: '{original_chunk}'")
                    logger.info(f"TTS modified: '{modified_chunk}'")

                yield modified_chunk

        return Agent.default.tts_node(self, process_text(), model_settings)

Greet on entry

Send a short greeting so you can immediately hear the behavior when you respond with something funny.

    async def on_enter(self):
        await self.session.say("Hi there! Is there anything I can help you with? If you say something funny, I might respond with lol.")

Create the RTC session entrypoint

Start the agent in your LiveKit room; any "lol" in responses will be replaced before TTS renders audio.

@server.rtc_session()
async def entrypoint(ctx: JobContext):
    ctx.log_context_fields = {"room": ctx.room.name}

    session = AgentSession()

    await session.start(
        agent=TtsNodeOverrideAgent(vad=ctx.proc.userdata["vad"]),
        room=ctx.room
    )
    await ctx.connect()

Run it

python tts_node_modifications.py console

How it works

tts_node lets you intercept the text stream headed to TTS.
A wrapper coroutine replaces target phrases and logs the changes.
The modified stream is passed to the default TTS pipeline so timing/buffering are preserved.
Rime TTS renders the adjusted text to audio.

Full example

import logging
from typing import AsyncIterable
from dotenv import load_dotenv
from livekit.agents import JobContext, JobProcess, AgentServer, cli, Agent, AgentSession, ModelSettings
from livekit.plugins import deepgram, openai, silero, rime

load_dotenv()

logger = logging.getLogger("tts_node")
logger.setLevel(logging.INFO)

class TtsNodeOverrideAgent(Agent):
    def __init__(self, vad) -> None:
        super().__init__(
            instructions="""
                You are a helpful assistant communicating through voice.
                Feel free to use "lol" in your responses when something is funny.
            """,
            stt=deepgram.STT(),
            llm=openai.LLM(model="gpt-4o"),
            tts=rime.TTS(model="arcana"),
            vad=vad
        )

    async def tts_node(self, text: AsyncIterable[str], model_settings: ModelSettings):
        """Modify the TTS output by replacing 'lol' with '<laugh>'."""

        async def process_text():
            async for chunk in text:
                original_chunk = chunk
                modified_chunk = chunk.replace("lol", "<laugh>").replace("LOL", "<laugh>")

                if original_chunk != modified_chunk:
                    logger.info(f"TTS original: '{original_chunk}'")
                    logger.info(f"TTS modified: '{modified_chunk}'")

                yield modified_chunk

        return Agent.default.tts_node(self, process_text(), model_settings)

    async def on_enter(self):
        await self.session.say(f"Hi there! Is there anything I can help you with? If you say something funny, I might respond with lol.")

server = AgentServer()

def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()

server.setup_fnc = prewarm

@server.rtc_session()
async def entrypoint(ctx: JobContext):
    ctx.log_context_fields = {"room": ctx.room.name}

    session = AgentSession()

    await session.start(
        agent=TtsNodeOverrideAgent(vad=ctx.proc.userdata["vad"]),
        room=ctx.room
    )
    await ctx.connect()

if __name__ == "__main__":
    cli.run_app(server)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prerequisites

Load configuration and create the AgentServer

Prewarm VAD for faster connections

Define the agent and override tts_node

Replace text chunks before synthesis

Greet on entry

Create the RTC session entrypoint

Run it

How it works

Full example

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

Prerequisites

Load configuration and create the AgentServer

Prewarm VAD for faster connections

Define the agent and override tts_node

Replace text chunks before synthesis

Greet on entry

Create the RTC session entrypoint

Run it

How it works

Full example