Skip to content

fix: skip STT after audio conversion failure#8704

Open
he-yufeng wants to merge 1 commit into
AstrBotDevs:masterfrom
he-yufeng:fix/stt-skip-unconverted-record
Open

fix: skip STT after audio conversion failure#8704
he-yufeng wants to merge 1 commit into
AstrBotDevs:masterfrom
he-yufeng:fix/stt-skip-unconverted-record

Conversation

@he-yufeng

@he-yufeng he-yufeng commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Summary

  • remember which voice components failed WAV preprocessing
  • skip only those failed components during the later STT pass
  • cover both top-level and quoted voice messages with regression tests

Root cause

When ensure_wav() failed, preprocessing logged a warning but left the original Record in the message chain. The STT pass then sent that unconverted file to the configured provider, producing a second, less useful provider error such as HTTP 400 for unsupported AMR input.

Impact

A missing or failing ffmpeg installation no longer causes AstrBot to submit the known-unconverted audio to STT. Other valid voice components in the same message still get transcribed, and non-voice content remains untouched.

Validation

  • pytest tests/test_preprocess_stage.py tests/test_whisper_api_source.py -q (3 passed)
  • python -m ruff check astrbot/core/pipeline/preprocess_stage/stage.py tests/test_preprocess_stage.py
  • python -m ruff format --check astrbot/core/pipeline/preprocess_stage/stage.py tests/test_preprocess_stage.py
  • git diff --check

Fixes #8651

Summary by Sourcery

Ensure speech-to-text processing skips voice components whose audio conversion to WAV failed while leaving other message content unaffected.

Bug Fixes:

  • Prevent unconverted or failed audio records from being sent to the STT provider in both top-level and reply message chains.

Tests:

  • Add regression tests confirming that failed WAV conversion records are preserved and excluded from STT, while valid records are still transcribed.

@dosubot dosubot Bot added size:S This PR changes 10-29 lines, ignoring generated files. area:core The bug / feature is about astrbot's core, backend labels Jun 9, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces tracking for failed audio conversions during the preprocessing stage to prevent sending failed records to the speech-to-text (STT) provider. It adds a set failed_record_ids to track failed records by their ID and skips STT processing for them. Unit tests are also added to verify this behavior. The reviewer suggests refactoring the duplicate WAV conversion and error handling logic for direct and reply records into a shared helper function to improve code maintainability.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

logger.debug(f"路径映射: {url} -> {component.url}")
message_chain[idx] = component

failed_record_ids: set[int] = set()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is significant code duplication between processing direct Record components (lines 70-84) and processing Record components inside Reply chains (lines 87-103). Both blocks perform the exact same WAV conversion, temporary file tracking, and error handling.

Following the general rule to avoid code duplication when implementing similar functionality for direct vs. quoted attachments, we should refactor this logic into a shared helper function.

Here is an example of how you can refactor this:

        async def _process_record(record_comp: Record, context: str = "") -> bool:
            try:
                original_path = await record_comp.convert_to_file_path()
                record_path = await ensure_wav(original_path)
                if record_path != original_path:
                    event.track_temporary_local_file(record_path)
                record_comp.file = record_path
                record_comp.path = record_path
                return True
            except Exception as e:
                failed_record_ids.add(id(record_comp))
                logger.warning(f"Voice processing {context}failed: {e}")
                return False

Then, the loops can be simplified to:

        # Process direct Record components
        message_chain = event.get_messages()
        for idx, component in enumerate(message_chain):
            if isinstance(component, Record):
                if await _process_record(component):
                    message_chain[idx] = component

        # Process Record components inside Reply chains
        for component in event.get_messages():
            if isinstance(component, Reply) and component.chain:
                for idx, reply_comp in enumerate(component.chain):
                    if isinstance(reply_comp, Record):
                        if await _process_record(reply_comp, "in reply chain "):
                            component.chain[idx] = reply_comp
References
  1. When implementing similar functionality for different cases (e.g., direct vs. quoted attachments), refactor the logic into a shared helper function to avoid code duplication.

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="astrbot/core/pipeline/preprocess_stage/stage.py" line_range="68-72" />
<code_context>
                             logger.debug(f"路径映射: {url} -> {component.url}")
                     message_chain[idx] = component

+        failed_record_ids: set[int] = set()
+
         # In here, we convert all Record components to wav format and update the file path.
</code_context>
<issue_to_address>
**issue (bug_risk):** Tracking failed records by `id()` may be brittle if components are reconstructed or copied.

This relies on the same `Record` instances being reused throughout the STT pass. If `event.get_messages()` or related logic can recreate or copy `Record`s, their `id()` values will change and `failed_record_ids` will no longer prevent reprocessing failed items. In that case, prefer a stable identifier (e.g., a `uid` field, file path, or a boolean flag on the record) instead of `id()` and confirm which invariant actually holds in this pipeline.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +68 to 72
failed_record_ids: set[int] = set()

# In here, we convert all Record components to wav format and update the file path.
message_chain = event.get_messages()
for idx, component in enumerate(message_chain):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Tracking failed records by id() may be brittle if components are reconstructed or copied.

This relies on the same Record instances being reused throughout the STT pass. If event.get_messages() or related logic can recreate or copy Records, their id() values will change and failed_record_ids will no longer prevent reprocessing failed items. In that case, prefer a stable identifier (e.g., a uid field, file path, or a boolean flag on the record) instead of id() and confirm which invariant actually holds in this pipeline.

@he-yufeng

Copy link
Copy Markdown
Contributor Author

Checked the pipeline invariant behind this: AstrMessageEvent.get_messages() returns message_obj.message directly, and the conversion and STT loops reuse those same component instances. Reply.chain is also traversed in place, with no reconstruction or copying between the two passes. Keeping id() here avoids depending on component value-equality semantics; this would need revisiting if the pipeline later starts copying message components.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:core The bug / feature is about astrbot's core, backend size:S This PR changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]STT:ffmpeg 未安装时降级处理

1 participant