
feat: add Pulse STT support for smallest.ai pulse (streaming + pre-recorded) #4858

Open
mahimairaja wants to merge 24 commits into livekit:main from mahimairaja:feat/smallest-ai-stt

Conversation

@mahimairaja
Contributor

What does this PR do?

Adds Speech-to-Text (STT) support to the livekit-plugins-smallestai plugin using Smallest AI's Pulse STT API. The existing plugin only supported TTS; this PR brings it to parity with plugins like Deepgram, ElevenLabs, and Soniox that offer both TTS and STT.

Closes #4856

Summary of Changes

New: STT class (stt.py)

  • Pre-recorded transcription via HTTP POST (/api/v1/pulse/get_text)
  • Real-time streaming via WebSocket (wss://waves-api.smallest.ai/api/v1/pulse/get_text)
  • ~64ms TTFB streaming, word-level timestamps, speaker diarization
  • 32+ languages with auto-detection (language="multi")
  • Capabilities: streaming=True, interim_results=True (constructor options sketched below)
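
A minimal construction sketch for the options listed above. The word_timestamps and diarize argument names mirror the update_options() parameters quoted in the review thread below; treat them as assumptions until checked against stt.py:

from livekit.plugins import smallestai

# Sketch only: argument names mirror update_options() as quoted in the
# review below; confirm against the plugin's stt.py before relying on them.
stt = smallestai.STT(
    language="multi",      # auto-detect across the 32+ supported languages
    word_timestamps=True,  # word-level timestamps (assumed constructor arg)
    diarize=True,          # speaker diarization (assumed constructor arg)
)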

New: SpeechStream class (stt.py)

  • WebSocket-based streaming with concurrent send/recv/keepalive tasks
  • Audio chunking via AudioByteStream (~4096 byte chunks per Smallest AI docs)
  • Full speech event lifecycle: START_OF_SPEECH → INTERIM_TRANSCRIPT / FINAL_TRANSCRIPT → END_OF_SPEECH
  • Graceful shutdown with {"type": "end"} signaling (concurrency pattern sketched below)
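
A minimal sketch of the send/recv/keepalive pattern described above, assuming an aiohttp WebSocket. The run_stream function and the 5-second keepalive interval are illustrative; the actual SpeechStream in stt.py owns the task lifecycle, chunking, and speech-event emission:

import asyncio
import json

import aiohttp


async def run_stream(ws: aiohttp.ClientWebSocketResponse, audio_chunks) -> None:
    async def send() -> None:
        async for chunk in audio_chunks:  # ~4096-byte chunks, per the docs above
            await ws.send_bytes(chunk)
        await ws.send_str(json.dumps({"type": "end"}))  # graceful shutdown signal

    async def recv() -> None:
        async for msg in ws:
            if msg.type == aiohttp.WSMsgType.TEXT:
                payload = json.loads(msg.data)
                # map interim/final payloads to INTERIM_TRANSCRIPT /
                # FINAL_TRANSCRIPT speech events here
                print(payload)

    async def keepalive() -> None:
        while True:
            await asyncio.sleep(5)  # interval is an assumption
            await ws.ping()

    # run send/recv concurrently; cancel keepalive once both finish
    ka = asyncio.create_task(keepalive())
    try:
        await asyncio.gather(send(), recv())
    finally:
        ka.cancel()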

Usage

from livekit.agents import AgentSession
from livekit.plugins import smallestai

# Pre-recorded
stt = smallestai.STT(language="en")

# Streaming (used in AgentSession)
session = AgentSession(
    stt=smallestai.STT(language="en"),
    llm=...,
    tts=smallestai.TTS(),
)

Configuration via SMALLEST_API_KEY environment variable (same key used for TTS).
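
For example (the plugin picks the key up from the environment on its own; the explicit api_key parameter is an assumption based on how the sibling LiveKit plugins are configured):

import os

from livekit.plugins import smallestai

# SMALLEST_API_KEY is read automatically; passing api_key explicitly is
# assumed to work the same way as in the other LiveKit plugins.
stt = smallestai.STT(api_key=os.environ["SMALLEST_API_KEY"])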

Testing

  • Verified pre-recorded transcription with WAV audio files
  • Verified real-time streaming with live microphone input via LiveKit Agents Playground
  • Tested interim + final transcript emission and speech event lifecycle
  • Tested with language="en" and language="multi" (auto-detection)
  • Ran ruff format and check
❯ uv run ruff check .
All checks passed!

❯ uv run ruff format .
629 files left unchanged
  • Ran type checking
❯ uv pip install pip && uv run mypy --install-types --non-interactive \
    -p livekit.agents \
    -p livekit.plugins.smallestai
Audited 1 package in 5ms
Success: no issues found in 169 source files


@CLAassistant

CLAassistant commented Feb 16, 2026

CLA assistant check
All committers have signed the CLA.


@mahimairaja
Contributor Author

mahimairaja commented Feb 16, 2026

Tested prerecorded:

import asyncio
from pathlib import Path

import aiohttp
from dotenv import load_dotenv

from livekit.agents import utils
from livekit.plugins import smallestai

load_dotenv()


async def main():
    wav = Path(__file__).resolve().parent / "sample.wav"

    async with aiohttp.ClientSession() as session:
        stt = smallestai.STT(language="en", http_session=session)
        frames = [
            f
            async for f in utils.audio.audio_frames_from_file(
                str(wav), sample_rate=16000, num_channels=1
            )
        ]
        event = await stt.recognize(frames)

    print(event.alternatives[0].text if event.alternatives else "")


if __name__ == "__main__":
    asyncio.run(main())

@mahimairaja
Contributor Author

mahimairaja commented Feb 16, 2026

Testing streaming:

from dotenv import load_dotenv

from livekit import agents
from livekit.agents import Agent, AgentServer, AgentSession, room_io
from livekit.plugins import silero
from livekit.plugins.openai.llm import LLM
from livekit.plugins.smallestai.stt import STT
from livekit.plugins.smallestai.tts import TTS
from livekit.plugins.turn_detector.english import EnglishModel

load_dotenv()


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""You are a helpful voice AI assistant.""",
        )


server = AgentServer()


@server.rtc_session(agent_name="my-agent")
async def my_agent(ctx: agents.JobContext):
    session = AgentSession(
        stt=STT(),
        llm=LLM(model="gpt-4.1-mini"),
        tts=TTS(),
        vad=silero.VAD.load(),
        turn_detection=EnglishModel(),
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_options=room_io.RoomOptions(),
    )

    await session.generate_reply(instructions="Greet the user and offer your assistance.")


if __name__ == "__main__":
    agents.cli.run_app(server)


@mahimairaja
Contributor Author

After conversations with @harshitajain165 from smallest.ai, I learned that a few more steps are needed on the smallest.ai server side for streaming support. For now, I am moving this PR to draft.

@mahimairaja mahimairaja marked this pull request as draft February 16, 2026 19:35
@mahimairaja mahimairaja marked this pull request as ready for review March 2, 2026 20:53

Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

View 14 additional findings in Devin Review.


Comment on lines +248 to +262
if is_given(encoding):
    self._opts.encoding = encoding
if is_given(word_timestamps):
    self._opts.word_timestamps = word_timestamps
if is_given(diarize):
    self._opts.diarize = diarize

for stream in self._streams:
    stream.update_options(
        language=language,
        sample_rate=sample_rate,
        encoding=encoding,
        word_timestamps=word_timestamps,
        diarize=diarize,
    )

🟡 STT.update_options modifies own state before stream validation, corrupting state on failure

STT.update_options at line 249 sets self._opts.encoding = encoding before propagating to streams at line 256. When the stream's update_options calls _validate_stream_encoding(encoding) at stt.py:315, any non-"linear16" encoding (e.g., "mulaw", "opus") raises ValueError. At that point, the STT's _opts.encoding is already corrupted to the invalid value. This has two consequences: (1) subsequent calls to self.stream() (stt.py:224) will also fail because _validate_stream_encoding(config.encoding) uses the now-invalid STT encoding, and (2) if multiple streams exist, only the first stream's update_options is attempted — later streams are never updated. The STT is left in a broken state where streaming is permanently disabled until the user manually resets with update_options(encoding="linear16").

Prompt for agents
In livekit-plugins/livekit-plugins-smallestai/livekit/plugins/smallestai/stt.py, the STT.update_options method (lines 235-262) should validate the encoding before modifying self._opts. Add a call to _validate_stream_encoding(encoding) at the top of the method (or at least before line 249 where self._opts.encoding is set) when encoding is given and there are active streams. For example, after line 243 add:

    if is_given(encoding) and self._streams:
        _validate_stream_encoding(encoding)

This ensures the STT state is not modified if the validation will fail when propagated to streams.
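
A standalone sketch of the validate-before-mutate ordering this prompt asks for. Names mirror the plugin, but the class below is illustrative, not the plugin's actual code:

_VALID_STREAM_ENCODINGS = {"linear16"}


def _validate_stream_encoding(encoding: str) -> None:
    if encoding not in _VALID_STREAM_ENCODINGS:
        raise ValueError(f"unsupported stream encoding: {encoding!r}")


class STTSketch:
    def __init__(self) -> None:
        self.encoding = "linear16"
        self.streams: list = []

    def update_options(self, encoding: str | None = None) -> None:
        # 1) validate before touching any state, so a failure here cannot
        #    leave self.encoding set to an invalid value
        if encoding is not None and self.streams:
            _validate_stream_encoding(encoding)
        # 2) only then mutate
        if encoding is not None:
            self.encoding = encoding
        # 3) propagate to every stream, not just until the first failure
        for stream in self.streams:
            stream.update_options(encoding=encoding)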

