15 changes: 15 additions & 0 deletions sdk/voicelive/azure-ai-voicelive/CHANGELOG.md
@@ -1,5 +1,20 @@
# Release History

## 1.3.0b1 (2026-05-22)

### Features Added

- **Azure Realtime Native Voice Support**: Added `AzureRealtimeNativeVoice` and
`AzureRealtimeNativeVoiceName`, and expanded `voice` fields to accept Azure realtime native voices.
- **WebRTC Call Negotiation Support**: Added `ClientEventRtcCallSdpCreate`, `ServerEventRtcCallSdpCreated`,
`ServerEventRtcCallError`, and `RtcCallErrorDetails` for SDP-based WebRTC call setup.
- **Hosted Agent Invocation Input**: Added `invoke_input` to `ResponseCreateParams` and
`ServerEventResponseInvocationDelta` for hosted agent invocation passthrough data.
- **Audio Playback Lifecycle Events**: Added `ServerEventOutputAudioBufferStarted` and
`ServerEventOutputAudioBufferStopped` to track model audio playback start and stop.
- **Echo Cancellation Configuration**: Added `EchoCancellationReferenceSource` and new
`reference_source` / `channels` options on `AudioEchoCancellation` for client-provided stereo echo reference input.

## 1.2.0 (2026-05-22)

### Features Added
42 changes: 23 additions & 19 deletions sdk/voicelive/azure-ai-voicelive/README.md
@@ -5,7 +5,7 @@ This package provides a **real-time, speech-to-speech** client for Azure AI Voic
It opens a WebSocket session to stream microphone audio to the service and receive
typed server events (including audio) for responsive, interruptible conversations.

> **Status:** General Availability (GA). This is a stable release suitable for production use.
> **Status:** Preview (`1.3.0b1`). This beta release includes the latest SDK and sample updates and may change before the next stable release.

> **Important:** As of version 1.0.0, this SDK is **async-only**. The synchronous API has been removed to focus exclusively on async patterns. All examples and samples use `async`/`await` syntax.

@@ -16,34 +16,35 @@ Getting started

### Prerequisites

- **Python 3.9+**
- **Python 3.10+**
- An **Azure subscription**
- A **VoiceLive** resource and endpoint
- A working **microphone** and **speakers/headphones** if you run the voice samples

### Install

Install the stable GA version:
Install the latest preview version:

```bash
# Base install (core client only)
python -m pip install azure-ai-voicelive
python -m pip install --pre azure-ai-voicelive

# For asynchronous streaming (uses aiohttp)
python -m pip install "azure-ai-voicelive[aiohttp]"
python -m pip install --pre "azure-ai-voicelive[aiohttp]"

# For voice samples (includes audio processing)
# First install PyAudio dependencies for your platform:
# Linux: sudo apt-get install -y portaudio19-dev libasound2-dev
# macOS: brew install portaudio
python -m pip install azure-ai-voicelive[aiohttp] pyaudio python-dotenv
python -m pip install --pre "azure-ai-voicelive[aiohttp]" azure-identity pyaudio python-dotenv
```
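
The `--pre` flag matters here because pip skips pre-release versions such as `1.3.0b1` unless asked to include them. A minimal sketch of how a script might tell a pre-release version string from a stable one is shown below. The helper name `is_prerelease` is hypothetical, and the regex is a simplification that only handles `aN`/`bN`/`rcN` and `.devN` suffixes, not the full PEP 440 grammar.

```python
import re

# Simplified pattern (an assumption, not the full PEP 440 grammar):
# release digits, then an optional a/b/rc tag, then an optional .dev segment.
_VERSION = re.compile(
    r"^\d+(?:\.\d+)*(?P<pre>(?:a|b|rc)\d+)?(?P<dev>\.dev\d+)?$"
)


def is_prerelease(version: str) -> bool:
    """Return True when the version carries an a/b/rc or .dev suffix."""
    match = _VERSION.match(version.strip())
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    return bool(match.group("pre") or match.group("dev"))


if __name__ == "__main__":
    for candidate in ("1.3.0b1", "1.2.0"):
        print(candidate, is_prerelease(candidate))
```

For production tooling, the `packaging` library's version parser is likely a better fit than a hand-written regex.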

The SDK provides async-only WebSocket connections using `aiohttp` for optimal performance and reliability.

### Authenticate

You can authenticate with an **API key** or an **Azure Active Directory (AAD) token**.
You can authenticate with an **API key** or a Microsoft Entra ID token.
The samples default to `DefaultAzureCredential`; for local development, `az login` is usually the simplest path.

#### API Key Authentication (Quick Start)

@@ -66,7 +67,7 @@ async def main():
    async with connect(
        endpoint="your-endpoint",
        credential=AzureKeyCredential("your-api-key"),
        model="gpt-4o-realtime-preview"
        model="gpt-realtime"
    ) as connection:
        # Your async code here
        pass
@@ -76,7 +77,7 @@ asyncio.run(main())

#### AAD Token Authentication

For production applications, AAD authentication is recommended:
For production applications, Entra ID authentication is recommended:

```python
import asyncio
@@ -85,14 +86,17 @@ from azure.ai.voicelive import connect

async def main():
    credential = DefaultAzureCredential()

    async with connect(
        endpoint="your-endpoint",
        credential=credential,
        model="gpt-4o-realtime-preview"
    ) as connection:
        # Your async code here
        pass

    try:
        async with connect(
            endpoint="your-endpoint",
            credential=credential,
            model="gpt-realtime"
        ) as connection:
            # Your async code here
            pass
    finally:
        await credential.close()

asyncio.run(main())
```
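
The `try`/`finally` added in this hunk ensures the credential is closed even when the connection fails. The sketch below illustrates that behavior in isolation. `StubCredential` and `use_credential` are hypothetical stand-ins written for this note and are not part of `azure-identity` or `azure-ai-voicelive`.

```python
import asyncio


class StubCredential:
    """Hypothetical stand-in for an async credential; not an SDK class."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


async def use_credential(credential: StubCredential, fail: bool) -> None:
    try:
        if fail:
            # Simulates the connection attempt raising part-way through.
            raise RuntimeError("simulated connection failure")
    finally:
        # Runs on both the success path and the failure path.
        await credential.close()


async def demo():
    ok = StubCredential()
    await use_credential(ok, fail=False)

    failed = StubCredential()
    try:
        await use_credential(failed, fail=True)
    except RuntimeError:
        pass  # The error still propagates; cleanup has already run.

    return ok.closed, failed.closed


if __name__ == "__main__":
    print(asyncio.run(demo()))
```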
@@ -142,7 +146,7 @@ The Basic Voice Assistant sample demonstrates full-featured voice interaction wi
python samples/basic_voice_assistant_async.py

# With custom parameters
python samples/basic_voice_assistant_async.py --model gpt-4o-realtime-preview --voice alloy --instructions "You're a helpful assistant"
python samples/basic_voice_assistant_async.py --model gpt-realtime --voice alloy --instructions "You're a helpful assistant"
```

### Minimal example
@@ -157,7 +161,7 @@ from azure.ai.voicelive.models import (

API_KEY = "your-api-key"
ENDPOINT = "wss://your-endpoint.com/openai/realtime"
MODEL = "gpt-4o-realtime-preview"
MODEL = "gpt-realtime"

async def main():
    async with connect(
4 changes: 2 additions & 2 deletions sdk/voicelive/azure-ai-voicelive/_metadata.json
@@ -1,6 +1,6 @@
{
  "apiVersion": "2026-04-10",
  "apiVersion": "2026-06-01-preview",
  "apiVersions": {
    "VoiceLive": "2026-04-10"
    "VoiceLive": "2026-06-01-preview"
  }
}
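
This hunk moves the package from a dated API version to one ending in `-preview`. A minimal sketch of how a release check might flag that is shown below. The helper `preview_api_versions` is hypothetical, and treating a `-preview` suffix as the marker of a preview API is an assumption based on the version strings in this diff.

```python
import json

# Metadata payloads mirroring the old and new sides of the hunk above.
OLD_METADATA = '{"apiVersion": "2026-04-10", "apiVersions": {"VoiceLive": "2026-04-10"}}'
NEW_METADATA = (
    '{"apiVersion": "2026-06-01-preview", '
    '"apiVersions": {"VoiceLive": "2026-06-01-preview"}}'
)


def preview_api_versions(metadata_text: str) -> dict:
    """Return the service entries whose API version ends in '-preview'."""
    metadata = json.loads(metadata_text)
    versions = metadata.get("apiVersions", {})
    # Assumption: a '-preview' suffix marks a preview API version.
    return {
        name: version
        for name, version in versions.items()
        if version.endswith("-preview")
    }


if __name__ == "__main__":
    print(preview_api_versions(OLD_METADATA))
    print(preview_api_versions(NEW_METADATA))
```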
42 changes: 26 additions & 16 deletions sdk/voicelive/azure-ai-voicelive/apiview-properties.json
@@ -18,6 +18,7 @@
"azure.ai.voicelive.models.AzureAvatarVoiceSyncVoice": "VoiceLive.AzureAvatarVoiceSyncVoice",
"azure.ai.voicelive.models.AzureCustomVoice": "VoiceLive.AzureCustomVoice",
"azure.ai.voicelive.models.AzurePersonalVoice": "VoiceLive.AzurePersonalVoice",
"azure.ai.voicelive.models.AzureRealtimeNativeVoice": "VoiceLive.AzureRealtimeNativeVoice",
"azure.ai.voicelive.models.EouDetection": "VoiceLive.EouDetection",
"azure.ai.voicelive.models.AzureSemanticDetection": "VoiceLive.AzureSemanticDetection",
"azure.ai.voicelive.models.AzureSemanticDetectionEn": "VoiceLive.AzureSemanticDetectionEn",
@@ -45,6 +46,7 @@
"azure.ai.voicelive.models.ClientEventOutputAudioBufferClear": "VoiceLive.ClientEventOutputAudioBufferClear",
"azure.ai.voicelive.models.ClientEventResponseCancel": "VoiceLive.ClientEventResponseCancel",
"azure.ai.voicelive.models.ClientEventResponseCreate": "VoiceLive.ClientEventResponseCreate",
"azure.ai.voicelive.models.ClientEventRtcCallSdpCreate": "VoiceLive.ClientEventRtcCallSdpCreate",
"azure.ai.voicelive.models.ClientEventSessionAvatarConnect": "VoiceLive.ClientEventSessionAvatarConnect",
"azure.ai.voicelive.models.ClientEventSessionUpdate": "VoiceLive.ClientEventSessionUpdate",
"azure.ai.voicelive.models.ContentPart": "VoiceLive.ContentPart",
@@ -92,6 +94,7 @@
"azure.ai.voicelive.models.ResponseSession": "VoiceLive.ResponseSession",
"azure.ai.voicelive.models.ResponseTextContentPart": "VoiceLive.ResponseTextContentPart",
"azure.ai.voicelive.models.ResponseWebSearchCallItem": "VoiceLive.ResponseWebSearchCallItem",
"azure.ai.voicelive.models.RtcCallErrorDetails": "VoiceLive.RtcCallErrorDetails",
"azure.ai.voicelive.models.Scene": "VoiceLive.Scene",
"azure.ai.voicelive.models.ServerEvent": "VoiceLive.ServerEvent",
"azure.ai.voicelive.models.ServerEventConversationItemCreated": "VoiceLive.ServerEventConversationItemCreated",
@@ -111,6 +114,8 @@
"azure.ai.voicelive.models.ServerEventMcpListToolsFailed": "VoiceLive.ServerEventMcpListToolsFailed",
"azure.ai.voicelive.models.ServerEventMcpListToolsInProgress": "VoiceLive.ServerEventMcpListToolsInProgress",
"azure.ai.voicelive.models.ServerEventOutputAudioBufferCleared": "VoiceLive.ServerEventOutputAudioBufferCleared",
"azure.ai.voicelive.models.ServerEventOutputAudioBufferStarted": "VoiceLive.ServerEventOutputAudioBufferStarted",
"azure.ai.voicelive.models.ServerEventOutputAudioBufferStopped": "VoiceLive.ServerEventOutputAudioBufferStopped",
"azure.ai.voicelive.models.ServerEventResponseAnimationBlendshapeDelta": "VoiceLive.ServerEventResponseAnimationBlendshapeDelta",
"azure.ai.voicelive.models.ServerEventResponseAnimationBlendshapeDone": "VoiceLive.ServerEventResponseAnimationBlendshapeDone",
"azure.ai.voicelive.models.ServerEventResponseAnimationVisemeDelta": "VoiceLive.ServerEventResponseAnimationVisemeDelta",
@@ -131,6 +136,7 @@
"azure.ai.voicelive.models.ServerEventResponseFileSearchCallSearching": "VoiceLive.ServerEventResponseFileSearchCallSearching",
"azure.ai.voicelive.models.ServerEventResponseFunctionCallArgumentsDelta": "VoiceLive.ServerEventResponseFunctionCallArgumentsDelta",
"azure.ai.voicelive.models.ServerEventResponseFunctionCallArgumentsDone": "VoiceLive.ServerEventResponseFunctionCallArgumentsDone",
"azure.ai.voicelive.models.ServerEventResponseInvocationDelta": "VoiceLive.ServerEventResponseInvocationDelta",
"azure.ai.voicelive.models.ServerEventResponseMcpCallArgumentsDelta": "VoiceLive.ServerEventResponseMcpCallArgumentsDelta",
"azure.ai.voicelive.models.ServerEventResponseMcpCallArgumentsDone": "VoiceLive.ServerEventResponseMcpCallArgumentsDone",
"azure.ai.voicelive.models.ServerEventResponseMcpCallCompleted": "VoiceLive.ServerEventResponseMcpCallCompleted",
@@ -144,6 +150,8 @@
"azure.ai.voicelive.models.ServerEventResponseWebSearchCallCompleted": "VoiceLive.ServerEventResponseWebSearchCallCompleted",
"azure.ai.voicelive.models.ServerEventResponseWebSearchCallInProgress": "VoiceLive.ServerEventResponseWebSearchCallInProgress",
"azure.ai.voicelive.models.ServerEventResponseWebSearchCallSearching": "VoiceLive.ServerEventResponseWebSearchCallSearching",
"azure.ai.voicelive.models.ServerEventRtcCallError": "VoiceLive.ServerEventRtcCallError",
"azure.ai.voicelive.models.ServerEventRtcCallSdpCreated": "VoiceLive.ServerEventRtcCallSdpCreated",
"azure.ai.voicelive.models.ServerEventSessionAvatarConnecting": "VoiceLive.ServerEventSessionAvatarConnecting",
"azure.ai.voicelive.models.ServerEventSessionAvatarSwitchToIdle": "VoiceLive.ServerEventSessionAvatarSwitchToIdle",
"azure.ai.voicelive.models.ServerEventSessionAvatarSwitchToSpeaking": "VoiceLive.ServerEventSessionAvatarSwitchToSpeaking",
@@ -165,35 +173,37 @@
"azure.ai.voicelive.models.VideoParams": "VoiceLive.VideoParams",
"azure.ai.voicelive.models.VideoResolution": "VoiceLive.VideoResolution",
"azure.ai.voicelive.models.VoiceLiveErrorDetails": "VoiceLive.VoiceLiveErrorDetails",
"azure.ai.voicelive.models.ClientEventType": "VoiceLive.ClientEventType",
"azure.ai.voicelive.models.ItemType": "VoiceLive.ItemType",
"azure.ai.voicelive.models.ItemParamStatus": "VoiceLive.ItemParamStatus",
"azure.ai.voicelive.models.MessageRole": "VoiceLive.MessageRole",
"azure.ai.voicelive.models.ContentPartType": "VoiceLive.ContentPartType",
"azure.ai.voicelive.models.Modality": "VoiceLive.Modality",
"azure.ai.voicelive.models.AnimationOutputType": "VoiceLive.AnimationOutputType",
"azure.ai.voicelive.models.OpenAIVoiceName": "VoiceLive.OAIVoice",
"azure.ai.voicelive.models.AzureVoiceType": "VoiceLive.AzureVoiceType",
"azure.ai.voicelive.models.PersonalVoiceModels": "VoiceLive.PersonalVoiceModels",
"azure.ai.voicelive.models.OutputAudioFormat": "VoiceLive.OutputAudioFormat",
"azure.ai.voicelive.models.AzureRealtimeNativeVoiceName": "VoiceLive.AzureRealtimeNativeVoiceName",
"azure.ai.voicelive.models.EouThresholdLevel": "VoiceLive.EouThresholdLevel",
"azure.ai.voicelive.models.TurnDetectionType": "VoiceLive.TurnDetectionType",
"azure.ai.voicelive.models.EchoCancellationReferenceSource": "VoiceLive.EchoCancellationReferenceSource",
"azure.ai.voicelive.models.AvatarConfigTypes": "VoiceLive.AvatarConfigTypes",
"azure.ai.voicelive.models.PhotoAvatarBaseModes": "VoiceLive.PhotoAvatarBaseModes",
"azure.ai.voicelive.models.AvatarOutputProtocol": "VoiceLive.AvatarOutputProtocol",
"azure.ai.voicelive.models.ToolType": "VoiceLive.ToolType",
"azure.ai.voicelive.models.MCPApprovalType": "VoiceLive.MCPApprovalType",
"azure.ai.voicelive.models.ReasoningEffort": "VoiceLive.ReasoningEffort",
"azure.ai.voicelive.models.InterimResponseConfigType": "VoiceLive.InterimResponseConfigType",
"azure.ai.voicelive.models.InterimResponseTrigger": "VoiceLive.InterimResponseTrigger",
"azure.ai.voicelive.models.AnimationOutputType": "VoiceLive.AnimationOutputType",
"azure.ai.voicelive.models.Modality": "VoiceLive.Modality",
"azure.ai.voicelive.models.InputAudioFormat": "VoiceLive.InputAudioFormat",
"azure.ai.voicelive.models.TurnDetectionType": "VoiceLive.TurnDetectionType",
"azure.ai.voicelive.models.EouThresholdLevel": "VoiceLive.EouThresholdLevel",
"azure.ai.voicelive.models.AvatarConfigTypes": "VoiceLive.AvatarConfigTypes",
"azure.ai.voicelive.models.PhotoAvatarBaseModes": "VoiceLive.PhotoAvatarBaseModes",
"azure.ai.voicelive.models.AvatarOutputProtocol": "VoiceLive.AvatarOutputProtocol",
"azure.ai.voicelive.models.OutputAudioFormat": "VoiceLive.OutputAudioFormat",
"azure.ai.voicelive.models.AudioTimestampType": "VoiceLive.AudioTimestampType",
"azure.ai.voicelive.models.ToolChoiceLiteral": "VoiceLive.ToolChoiceLiteral",
"azure.ai.voicelive.models.ReasoningEffort": "VoiceLive.ReasoningEffort",
"azure.ai.voicelive.models.SessionIncludeOption": "VoiceLive.SessionIncludeOption",
"azure.ai.voicelive.models.ClientEventType": "VoiceLive.ClientEventType",
"azure.ai.voicelive.models.ItemType": "VoiceLive.ItemType",
"azure.ai.voicelive.models.ItemParamStatus": "VoiceLive.ItemParamStatus",
"azure.ai.voicelive.models.MessageRole": "VoiceLive.MessageRole",
"azure.ai.voicelive.models.ContentPartType": "VoiceLive.ContentPartType",
"azure.ai.voicelive.models.ResponseStatus": "VoiceLive.ResponseStatus",
"azure.ai.voicelive.models.ResponseItemStatus": "VoiceLive.ResponseItemStatus",
"azure.ai.voicelive.models.RequestImageContentPartDetail": "VoiceLive.RequestImageContentPartDetail",
"azure.ai.voicelive.models.ResponseItemStatus": "VoiceLive.ResponseItemStatus",
"azure.ai.voicelive.models.ServerEventType": "VoiceLive.ServerEventType"
},
"CrossLanguageVersion": "4f7c08a38aa5"
"CrossLanguageVersion": "d4391398f022"
}
@@ -10,6 +10,8 @@

if TYPE_CHECKING:
    from . import models as _models
Voice = Union[str, "_models.OpenAIVoiceName", "_models.OpenAIVoice", "_models.AzureVoice"]
InterimResponseConfig = Union["_models.StaticInterimResponseConfig", "_models.LlmInterimResponseConfig"]
Voice = Union[
    str, "_models.OpenAIVoiceName", "_models.OpenAIVoice", "_models.AzureVoice", "_models.AzureRealtimeNativeVoice"
]
ToolChoice = Union[str, "_models.ToolChoiceLiteral", "_models.ToolChoiceSelection"]
InterimResponseConfig = Union["_models.StaticInterimResponseConfig", "_models.LlmInterimResponseConfig"]