Skip to content

feat(stt): honor proxy in OpenAI Whisper provider#8668

Open
SunmiJJW wants to merge 2 commits into
AstrBotDevs:masterfrom
SunmiJJW:feat/openai-whisper-proxy
Open

feat(stt): honor proxy in OpenAI Whisper provider#8668
SunmiJJW wants to merge 2 commits into
AstrBotDevs:masterfrom
SunmiJJW:feat/openai-whisper-proxy

Conversation

@SunmiJJW

@SunmiJJW SunmiJJW commented Jun 8, 2026

Copy link
Copy Markdown

Summary

  • Make the openai_whisper_api STT provider honor its configured proxy value when constructing the OpenAI SDK client.
  • Reuse the OpenAI SDK's own httpx module when building the custom HTTP client, matching the chat-completion provider pattern.
  • Add a focused regression test to verify that the configured proxy is passed through as http_client.

中文说明

这个 PR 把我们本地插件桥里验证过的最小修复拆到 AstrBot 本体:openai_whisper_api 语音转文字 provider 之前会读取 api_base / timeout,但没有把配置里的 proxy 传给 OpenAI SDK,因此健康检查或实际转写在需要代理的环境中仍可能直连失败。这里复用现有 create_proxy_client(),不改变模型、API key、转写流程或其他 STT 行为。

Duplicate check

Checked open PRs for Whisper proxy, STT proxy, openai_whisper_api, and whisper_api_source proxy; no direct duplicate was found. Existing Whisper-related PRs appear to cover other options such as language/prompt, not proxy passthrough.

Tests

  • python -m pytest tests/test_whisper_api_source.py -q
  • python -m ruff format --check .
  • python -m ruff check .

Summary by Sourcery

Make the OpenAI Whisper STT provider use a proxy-aware HTTP client consistent with other OpenAI providers.

Bug Fixes:

  • Ensure the openai_whisper_api provider passes its configured proxy through to the OpenAI SDK HTTP client.

Enhancements:

  • Align the Whisper provider's HTTP client construction with the chat-completion provider by reusing the OpenAI SDK httpx module when available.

Tests:

  • Add a regression test verifying that the configured proxy is propagated to the OpenAI client http_client configuration.

@dosubot dosubot Bot added size:S This PR changes 10-29 lines, ignoring generated files. area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. labels Jun 8, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds proxy support to the OpenAI Whisper API source by creating and passing a custom HTTP client configured with a proxy to AsyncOpenAI. It also includes unit tests to verify that the proxy configuration is correctly propagated. The reviewer identified a potential connection leak, noting that custom HTTP clients passed to AsyncOpenAI are not automatically closed by the OpenAI SDK. To resolve this, the reviewer suggested storing the custom HTTP client as an instance variable so it can be explicitly closed during termination.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +46 to +59
def _create_http_client(self, provider_config: dict) -> httpx.AsyncClient:
proxy = provider_config.get("proxy", "")
httpx_module: Any = httpx
try:
from openai import _base_client as openai_base_client

httpx_module = getattr(openai_base_client, "httpx", httpx)
except ImportError:
pass
return create_proxy_client(
"OpenAI Whisper",
proxy,
httpx_module=httpx_module,
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

When a custom http_client is passed to AsyncOpenAI, calling self.client.close() does not close the custom client (the OpenAI SDK explicitly skips closing custom clients to avoid lifecycle conflicts). This leads to unclosed client/connection leaks.

To fix this, store the created client in self.http_client so that it can be closed in terminate():

    async def terminate(self):
        if self.client:
            await self.client.close()
        if hasattr(self, "http_client") and self.http_client:
            await self.http_client.aclose()
Suggested change
def _create_http_client(self, provider_config: dict) -> httpx.AsyncClient:
proxy = provider_config.get("proxy", "")
httpx_module: Any = httpx
try:
from openai import _base_client as openai_base_client
httpx_module = getattr(openai_base_client, "httpx", httpx)
except ImportError:
pass
return create_proxy_client(
"OpenAI Whisper",
proxy,
httpx_module=httpx_module,
)
def _create_http_client(self, provider_config: dict) -> httpx.AsyncClient:
proxy = provider_config.get("proxy", "")
httpx_module: Any = httpx
try:
from openai import _base_client as openai_base_client
httpx_module = getattr(openai_base_client, "httpx", httpx)
except ImportError:
pass
self.http_client = create_proxy_client(
"OpenAI Whisper",
proxy,
httpx_module=httpx_module,
)
return self.http_client

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • In _create_http_client, consider passing None when proxy is not configured instead of an empty string so create_proxy_client can fall back to its default behavior (e.g., respecting environment configuration) rather than receiving a potentially misleading empty proxy value.
  • The regression test reaches into openai._base_client.httpx, which is an internal detail of the OpenAI SDK and may change; it might be more robust to assert that http_client is constructed via create_proxy_client with the expected module passed in (e.g., by stubbing the module parameter) without relying on a private _base_client attribute.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `_create_http_client`, consider passing `None` when `proxy` is not configured instead of an empty string so `create_proxy_client` can fall back to its default behavior (e.g., respecting environment configuration) rather than receiving a potentially misleading empty proxy value.
- The regression test reaches into `openai._base_client.httpx`, which is an internal detail of the OpenAI SDK and may change; it might be more robust to assert that `http_client` is constructed via `create_proxy_client` with the expected module passed in (e.g., by stubbing the module parameter) without relying on a private `_base_client` attribute.

## Individual Comments

### Comment 1
<location path="tests/test_whisper_api_source.py" line_range="40-49" />
<code_context>
+def test_provider_passes_configured_proxy_to_openai_http_client(monkeypatch):
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a complementary test case for when no proxy is configured

Please also add a test where `ProviderOpenAIWhisperAPI` is constructed without a `proxy` key in `provider_config` (e.g. `test_provider_uses_default_http_client_when_proxy_missing`). That test should verify `create_proxy_client` is called with the correct provider label and the default `proxy` value (empty string or `None`, per the intended contract), so the no-proxy behavior remains covered during refactors.

Suggested implementation:

```python
def test_provider_passes_configured_proxy_to_openai_http_client(monkeypatch):
    captured: dict[str, object] = {}
    fake_http_client = object()

    def fake_create_proxy_client(
        provider_label: str,
        proxy: str | None = None,
        headers: dict[str, str] | None = None,
        verify=None,
        httpx_module=None,
    ):
        captured["provider_label"] = provider_label
        captured["proxy"] = proxy
        captured["headers"] = headers
        captured["verify"] = verify
        captured["httpx_module"] = httpx_module

        return fake_http_client

    # Adjust the target string here to match where `create_proxy_client` is imported/used
    monkeypatch.setattr(
        "whisper_api_source.create_proxy_client",
        fake_create_proxy_client,
    )

    provider = ProviderOpenAIWhisperAPI(
        provider_config={
            # include whatever keys are normally required by your provider_config
            "provider_label": "openai_whisper_api",
            "proxy": "http://configured-proxy.example.com",
        }
    )

    # Trigger the code path that builds the HTTP client. Adjust this call as needed.
    http_client = provider._get_http_client()

    assert http_client is fake_http_client
    assert captured["provider_label"] == "openai_whisper_api"
    assert captured["proxy"] == "http://configured-proxy.example.com"


def test_provider_uses_default_http_client_when_proxy_missing(monkeypatch):
    captured: dict[str, object] = {}
    fake_http_client = object()

    def fake_create_proxy_client(
        provider_label: str,
        proxy: str | None = None,
        headers: dict[str, str] | None = None,
        verify=None,
        httpx_module=None,
    ):
        captured["provider_label"] = provider_label
        captured["proxy"] = proxy
        captured["headers"] = headers
        captured["verify"] = verify
        captured["httpx_module"] = httpx_module

        return fake_http_client

    # Adjust the target string here to match where `create_proxy_client` is imported/used
    monkeypatch.setattr(
        "whisper_api_source.create_proxy_client",
        fake_create_proxy_client,
    )

    # Construct the provider WITHOUT a `proxy` key in provider_config
    provider = ProviderOpenAIWhisperAPI(
        provider_config={
            # include whatever keys are normally required by your provider_config
            "provider_label": "openai_whisper_api",
            # NOTE: no "proxy" key here on purpose
        }
    )

    # Trigger the code path that builds the HTTP client. Adjust this call as needed.
    http_client = provider._get_http_client()

    assert http_client is fake_http_client
    assert captured["provider_label"] == "openai_whisper_api"
    # Depending on your intended contract, assert the default value here:
    # - if default is "", use `== ""`
    # - if default is None, use `is None`
    assert captured["proxy"] in ("", None)

```

To integrate this cleanly with your existing codebase, you will likely need to:

1. **Adjust the monkeypatch target**: Replace `"whisper_api_source.create_proxy_client"` with the actual import path used in your production code (e.g. `"app.providers.whisper_api_source.create_proxy_client"`).
2. **Align `provider_config`**:
   - Replace `"provider_label": "openai_whisper_api"` with the real key/value(s) your `ProviderOpenAIWhisperAPI` expects (e.g. `{"label": "whisper_api"}` or whatever your config schema is).
   - Ensure all required config keys (API key, base URL, etc.) are included so the provider initializes correctly.
3. **Use the proper provider label assertion**:
   - If `create_proxy_client` is called with some constant like `ProviderOpenAIWhisperAPI.PROVIDER_LABEL`, assert against that instead of the hardcoded `"openai_whisper_api"`.
4. **Match the HTTP client creation API**:
   - If your provider does not expose `_get_http_client()`, replace that call with whatever actually triggers the creation of the OpenAI HTTP client (e.g. accessing `provider._openai_http_client`, calling `provider._ensure_client()`, etc.).
5. **Proxy default contract**:
   - If the intended default proxy is always `None`, change `assert captured["proxy"] in ("", None)` to `assert captured["proxy"] is None`.
   - If it is always an empty string, change it to `assert captured["proxy"] == ""`.
</issue_to_address>

### Comment 2
<location path="tests/test_whisper_api_source.py" line_range="11-16" />
<code_context>
 from astrbot.core.provider.sources.whisper_api_source import ProviderOpenAIWhisperAPI


+class _FakeAsyncOpenAI:
+    def __init__(self, **kwargs):
+        self.kwargs = kwargs
+
+    async def close(self):
+        return None
+
+
</code_context>
<issue_to_address>
**suggestion (testing):** Tighten the fake client assertions to better lock in the HTTP client wiring

`_FakeAsyncOpenAI` currently only stores `kwargs`, and tests assert individual keys later. To better lock in the client wiring, also assert that `kwargs` either contains `http_client` (at minimum) or only has the expected keys (`api_key`, `base_url`, `timeout`, `http_client`). This will help catch any unexpected parameters passed to `AsyncOpenAI` in future changes.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +40 to +49
def test_provider_passes_configured_proxy_to_openai_http_client(monkeypatch):
captured: dict[str, object] = {}
fake_http_client = object()

def fake_create_proxy_client(
provider_label: str,
proxy: str | None = None,
headers: dict[str, str] | None = None,
verify=None,
httpx_module=None,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Consider adding a complementary test case for when no proxy is configured

Please also add a test where ProviderOpenAIWhisperAPI is constructed without a proxy key in provider_config (e.g. test_provider_uses_default_http_client_when_proxy_missing). That test should verify create_proxy_client is called with the correct provider label and the default proxy value (empty string or None, per the intended contract), so the no-proxy behavior remains covered during refactors.

Suggested implementation:

def test_provider_passes_configured_proxy_to_openai_http_client(monkeypatch):
    captured: dict[str, object] = {}
    fake_http_client = object()

    def fake_create_proxy_client(
        provider_label: str,
        proxy: str | None = None,
        headers: dict[str, str] | None = None,
        verify=None,
        httpx_module=None,
    ):
        captured["provider_label"] = provider_label
        captured["proxy"] = proxy
        captured["headers"] = headers
        captured["verify"] = verify
        captured["httpx_module"] = httpx_module

        return fake_http_client

    # Adjust the target string here to match where `create_proxy_client` is imported/used
    monkeypatch.setattr(
        "whisper_api_source.create_proxy_client",
        fake_create_proxy_client,
    )

    provider = ProviderOpenAIWhisperAPI(
        provider_config={
            # include whatever keys are normally required by your provider_config
            "provider_label": "openai_whisper_api",
            "proxy": "http://configured-proxy.example.com",
        }
    )

    # Trigger the code path that builds the HTTP client. Adjust this call as needed.
    http_client = provider._get_http_client()

    assert http_client is fake_http_client
    assert captured["provider_label"] == "openai_whisper_api"
    assert captured["proxy"] == "http://configured-proxy.example.com"


def test_provider_uses_default_http_client_when_proxy_missing(monkeypatch):
    captured: dict[str, object] = {}
    fake_http_client = object()

    def fake_create_proxy_client(
        provider_label: str,
        proxy: str | None = None,
        headers: dict[str, str] | None = None,
        verify=None,
        httpx_module=None,
    ):
        captured["provider_label"] = provider_label
        captured["proxy"] = proxy
        captured["headers"] = headers
        captured["verify"] = verify
        captured["httpx_module"] = httpx_module

        return fake_http_client

    # Adjust the target string here to match where `create_proxy_client` is imported/used
    monkeypatch.setattr(
        "whisper_api_source.create_proxy_client",
        fake_create_proxy_client,
    )

    # Construct the provider WITHOUT a `proxy` key in provider_config
    provider = ProviderOpenAIWhisperAPI(
        provider_config={
            # include whatever keys are normally required by your provider_config
            "provider_label": "openai_whisper_api",
            # NOTE: no "proxy" key here on purpose
        }
    )

    # Trigger the code path that builds the HTTP client. Adjust this call as needed.
    http_client = provider._get_http_client()

    assert http_client is fake_http_client
    assert captured["provider_label"] == "openai_whisper_api"
    # Depending on your intended contract, assert the default value here:
    # - if default is "", use `== ""`
    # - if default is None, use `is None`
    assert captured["proxy"] in ("", None)

To integrate this cleanly with your existing codebase, you will likely need to:

  1. Adjust the monkeypatch target: Replace "whisper_api_source.create_proxy_client" with the actual import path used in your production code (e.g. "app.providers.whisper_api_source.create_proxy_client").
  2. Align provider_config:
    • Replace "provider_label": "openai_whisper_api" with the real key/value(s) your ProviderOpenAIWhisperAPI expects (e.g. {"label": "whisper_api"} or whatever your config schema is).
    • Ensure all required config keys (API key, base URL, etc.) are included so the provider initializes correctly.
  3. Use the proper provider label assertion:
    • If create_proxy_client is called with some constant like ProviderOpenAIWhisperAPI.PROVIDER_LABEL, assert against that instead of the hardcoded "openai_whisper_api".
  4. Match the HTTP client creation API:
    • If your provider does not expose _get_http_client(), replace that call with whatever actually triggers the creation of the OpenAI HTTP client (e.g. accessing provider._openai_http_client, calling provider._ensure_client(), etc.).
  5. Proxy default contract:
    • If the intended default proxy is always None, change assert captured["proxy"] in ("", None) to assert captured["proxy"] is None.
    • If it is always an empty string, change it to assert captured["proxy"] == "".

Comment thread tests/test_whisper_api_source.py Outdated
Comment on lines +11 to +16
class _FakeAsyncOpenAI:
def __init__(self, **kwargs):
self.kwargs = kwargs

async def close(self):
return None

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Tighten the fake client assertions to better lock in the HTTP client wiring

_FakeAsyncOpenAI currently only stores kwargs, and tests assert individual keys later. To better lock in the client wiring, also assert that kwargs either contains http_client (at minimum) or only has the expected keys (api_key, base_url, timeout, http_client). This will help catch any unexpected parameters passed to AsyncOpenAI in future changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. size:S This PR changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant