
Optimize omni merge #1255

Open

WANDY666 wants to merge 22 commits into main from optimize_omni_merge

Conversation

@WANDY666 (Contributor) commented Apr 3, 2026

Optimized the audio path

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces significant optimizations and features for audio multimodal processing, including the addition of Triton autotune kernel configurations for the RTX 5090, implementation of audio preloading and warmup mechanisms, and the introduction of a prompt encoding cache. The server components were updated to support audio-specific batch sizes and data parallelism, while the multimodal parameter handling was refactored to support shared memory formats and more efficient resource allocation. Review feedback highlighted a bug where the device was hardcoded to CUDA in audio preprocessing, and suggested offloading CPU-bound tasks like audio loading and MD5 hashing to threads to avoid blocking the asynchronous event loop.
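The prompt encoding cache mentioned in the summary could take many shapes; the sketch below is one plausible form (a bounded LRU memo keyed by a digest of the prompt), written as an assumption for illustration, not the PR's actual implementation — the class and method names are invented.

```python
import hashlib
from collections import OrderedDict


class PromptEncodeCache:
    """Hypothetical sketch of a bounded prompt -> token-id cache
    (class and method names are illustrative, not from the PR)."""

    def __init__(self, max_entries: int = 1024):
        self.max_entries = max_entries
        self._cache: OrderedDict = OrderedDict()

    def get_or_encode(self, prompt: str, encode):
        # Key on a digest so long prompts stay cheap to look up.
        key = hashlib.md5(prompt.encode("utf-8")).hexdigest()
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        token_ids = encode(prompt)
        self._cache[key] = token_ids
        if len(self._cache) > self.max_entries:
            self._cache.popitem(last=False)  # evict least recently used
        return token_ids
```

Repeated requests with identical prompts then skip tokenization entirely, which matters most when the same system prompt prefixes every request.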

Comment on lines +126 to +128
```python
compact_features = torch.from_numpy(extracted[:, :, :num_frames]).to(device="cuda", dtype=torch.bfloat16)
compact_features = compact_features[0].contiguous()
feature_lens = torch.tensor([num_frames], device="cuda", dtype=torch.long)
```

high

The `device` argument passed to `_preprocess_single_padded` is ignored when creating the `compact_features` and `feature_lens` tensors, as they are hardcoded to `"cuda"`. This will cause failures if the intended device is `"cpu"` (the default value of the `device` parameter) or a specific GPU index.

Suggested change:

```diff
-compact_features = torch.from_numpy(extracted[:, :, :num_frames]).to(device="cuda", dtype=torch.bfloat16)
-compact_features = compact_features[0].contiguous()
-feature_lens = torch.tensor([num_frames], device="cuda", dtype=torch.long)
+compact_features = torch.from_numpy(extracted[:, :, :num_frames]).to(device=device, dtype=torch.bfloat16)
+compact_features = compact_features[0].contiguous()
+feature_lens = torch.tensor([num_frames], device=device, dtype=torch.long)
```

```diff
-audio_values, _ = librosa.load(BytesIO(audio_data), sr=16000)
+from lightllm.models.whisper.defaults import MIN_AUDIO_LEN
+decode_start = time.time()
+audio_values, _ = librosa.load(BytesIO(audio_data), sr=target_sample_rate)
```

high

`librosa.load` is a CPU-bound blocking operation. Calling it directly within an async method like `preload` will block the event loop, preventing the server from handling other concurrent requests; this is particularly impactful for long audio files. Consider offloading it to a thread pool using `asyncio.to_thread` (Python 3.9+) or `loop.run_in_executor`.

Suggested change:

```diff
-audio_values, _ = librosa.load(BytesIO(audio_data), sr=target_sample_rate)
+audio_values, _ = await asyncio.to_thread(librosa.load, BytesIO(audio_data), sr=target_sample_rate)
```
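As a self-contained sketch of the pattern the reviewer suggests (the `_decode_audio` stand-in below replaces `librosa.load` purely for illustration; function names are not from the PR):

```python
import asyncio
import hashlib


def _decode_audio(data: bytes) -> str:
    # Stand-in for a CPU-bound decode such as librosa.load; hashing the
    # bytes just simulates work that would otherwise block the event loop.
    return hashlib.md5(data).hexdigest()


async def preload(audio_data: bytes) -> str:
    # Offload the blocking call to a worker thread so the server can keep
    # accepting other requests while the audio is being decoded.
    return await asyncio.to_thread(_decode_audio, audio_data)


result = asyncio.run(preload(b"\x00" * 16))
```

`asyncio.to_thread` forwards positional and keyword arguments to the wrapped callable, so the real call can pass `sr=target_sample_rate` through unchanged.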

```python
self.tokenizer.init_imageitem_extral_params(img, multimodal_params, sampling_params)
data = img.read()
token_num = self.tokenizer.get_image_token_length(img)
md5sum = hashlib.md5(data).hexdigest() + "_" + str(hash(frozendict(img.extra_params)))
```

medium

Calculating the MD5 hash of image data is a CPU-bound operation that can block the event loop, especially for large images. Since the `preload` method already runs asynchronously, it would be more efficient to compute and store the MD5 hash on the `ImageItem` during preloading (similar to the existing implementation for `AudioItem`) and reuse it here, avoiding blocking the main server loop.
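One way the suggestion could look, as a minimal sketch: the `ImageItem` class below is an illustrative stand-in (its fields and methods are assumptions, not the project's actual API), showing the digest computed once off the event loop during preload and cached on the item.

```python
import asyncio
import hashlib


class ImageItem:
    """Illustrative stand-in for an image item (field and method names
    here are assumptions, not the project's actual API)."""

    def __init__(self, data: bytes, extra_params: dict):
        self._data = data
        self.extra_params = extra_params
        self.md5sum = None  # computed once during preload, reused later

    def read(self) -> bytes:
        return self._data

    async def preload(self) -> None:
        # Hash in a worker thread so the event loop stays responsive;
        # the digest is cached so the hot path never re-hashes the data.
        digest = await asyncio.to_thread(hashlib.md5, self._data)
        params_key = str(hash(tuple(sorted(self.extra_params.items()))))
        self.md5sum = digest.hexdigest() + "_" + params_key


img = ImageItem(b"pixels", {"size": 224})
asyncio.run(img.preload())
```

Request handling would then read `img.md5sum` directly instead of calling `hashlib.md5(data)` inline.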
