| title | category | tags | difficulty | description | demonstrates |
|---|---|---|---|---|---|
| MCP Agent | mcp | | beginner | Shows how to use a LiveKit Agent as an MCP client. | |
This example shows how to connect a LiveKit voice agent to an MCP (Model Context Protocol) server. MCP allows the agent to access external tools and data sources. In this case, the agent connects to a local Codex instance via stdio, enabling voice-based interaction with the Codex coding assistant.
- Add a `.env` in this directory with your LiveKit credentials:

  ```
  LIVEKIT_URL=your_livekit_url
  LIVEKIT_API_KEY=your_api_key
  LIVEKIT_API_SECRET=your_api_secret
  ```

- Install dependencies (the turn detector plugin is imported by the example, so include its extra):

  ```bash
  pip install "livekit-agents[silero,deepgram,openai,cartesia,turn-detector]" python-dotenv
  ```

- Have an MCP server available (this example uses Codex); a quick preflight check is sketched after this list.
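Since the agent spawns the `codex` executable as a subprocess to act as its MCP server, it can help to confirm the CLI is actually on your PATH before starting. A minimal preflight sketch using only the standard library (not part of the example code):

```python
import shutil

# Verify the `codex` CLI that the agent will spawn as its MCP server is installed.
codex_path = shutil.which("codex")
if codex_path is None:
    raise SystemExit("`codex` not found on PATH; install Codex before running this agent")
print(f"codex found at {codex_path}")
```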
Load environment variables and configure logging. Create an AgentServer to manage the agent lifecycle.
```python
import logging

from dotenv import load_dotenv
from livekit.agents import AgentServer, AgentSession, JobContext, JobProcess, cli, Agent, inference, mcp
from livekit.plugins import silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

logger = logging.getLogger("mcp-agent")

load_dotenv()

server = AgentServer()
```

Preload the VAD model once per process. This runs before any sessions start and stores the VAD instance in `proc.userdata` so it can be reused.
```python
def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()

server.setup_fnc = prewarm
```

Keep the Agent lightweight with just instructions. The MCP server provides the tools, so the agent doesn't need to define any function tools itself. The instructions explain how to interact with the MCP server.
```python
class MyAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions=(
                """
                You can retrieve data via the MCP server. The interface is voice-based:
                accept spoken user queries and respond with synthesized speech.
                The MCP server is a codex instance running on the local machine.
                When you call the codex MCP server, you should use the following parameters:
                - approval-policy: never
                - sandbox: workspace-write
                - prompt: [user_prompt_goes_here]
                """
            ),
        )

    async def on_enter(self):
        self.session.generate_reply()
```

Create the AgentSession with STT, LLM, TTS, VAD, and the MCP server configuration. The `mcp_servers` parameter accepts a list of MCP server connections; here we use `MCPServerStdio` to connect to a local Codex process.
```python
@server.rtc_session()
async def entrypoint(ctx: JobContext):
    ctx.log_context_fields = {"room": ctx.room.name}

    session = AgentSession(
        stt=inference.STT(model="deepgram/nova-3-general"),
        llm=inference.LLM(model="openai/gpt-4.1-mini"),
        tts=inference.TTS(model="cartesia/sonic-2", voice="6f84f4b8-58a2-430c-8c79-688dad597532"),
        vad=ctx.proc.userdata["vad"],
        turn_detection=MultilingualModel(),
        mcp_servers=[mcp.MCPServerStdio(command="codex", args=["mcp"], client_session_timeout_seconds=600000)],
        preemptive_generation=True,
    )

    agent = MyAgent()

    await session.start(agent=agent, room=ctx.room)
    await ctx.connect()
```

The `cli.run_app()` function starts the agent server, manages the worker lifecycle, and processes incoming jobs.
```python
if __name__ == "__main__":
    cli.run_app(server)
```

Run the agent using the console command for local testing:
```bash
python stdio_mcp_client.py console
```

To test with a real LiveKit room, use dev mode:
```bash
python stdio_mcp_client.py dev
```

- The agent connects to the MCP server (Codex) via stdio when the session starts.
- The MCP server exposes tools that the LLM can call.
- When users speak, their requests are transcribed and sent to the LLM.
- The LLM can invoke MCP tools to perform actions like code generation or file operations.
- Tool results are incorporated into the response and spoken back to the user.
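The `mcp_servers` list is not limited to stdio processes. As a hedged sketch (assuming your livekit-agents version ships `mcp.MCPServerHTTP`, and using a placeholder URL that is not part of this example), the Codex stdio connection could be swapped for a remote MCP server reached over HTTP:

```python
from livekit.agents import mcp

# Hypothetical alternative to the stdio Codex connection: talk to a remote MCP
# server over HTTP instead of spawning a local process. The class name and URL
# are assumptions; check the mcp module of your livekit-agents version.
mcp_servers = [
    mcp.MCPServerHTTP(url="http://localhost:8000/mcp"),
]
```

This list would be passed as the `mcp_servers` argument of `AgentSession`, exactly as in the entrypoint above. The complete code for `stdio_mcp_client.py` follows.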
```python
import logging

from dotenv import load_dotenv
from livekit.agents import AgentServer, AgentSession, JobContext, JobProcess, cli, Agent, inference, mcp
from livekit.plugins import silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

logger = logging.getLogger("mcp-agent")

load_dotenv()


class MyAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions=(
                """
                You can retrieve data via the MCP server. The interface is voice-based:
                accept spoken user queries and respond with synthesized speech.
                The MCP server is a codex instance running on the local machine.
                When you call the codex MCP server, you should use the following parameters:
                - approval-policy: never
                - sandbox: workspace-write
                - prompt: [user_prompt_goes_here]
                """
            ),
        )

    async def on_enter(self):
        self.session.generate_reply()


server = AgentServer()


def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()


server.setup_fnc = prewarm


@server.rtc_session()
async def entrypoint(ctx: JobContext):
    ctx.log_context_fields = {"room": ctx.room.name}

    session = AgentSession(
        stt=inference.STT(model="deepgram/nova-3-general"),
        llm=inference.LLM(model="openai/gpt-4.1-mini"),
        tts=inference.TTS(model="cartesia/sonic-2", voice="6f84f4b8-58a2-430c-8c79-688dad597532"),
        vad=ctx.proc.userdata["vad"],
        turn_detection=MultilingualModel(),
        mcp_servers=[mcp.MCPServerStdio(command="codex", args=["mcp"], client_session_timeout_seconds=600000)],
        preemptive_generation=True,
    )

    agent = MyAgent()

    await session.start(agent=agent, room=ctx.room)
    await ctx.connect()


if __name__ == "__main__":
    cli.run_app(server)
```