
Fix BOS token embedding: use embed_tokens() instead of raw integer #264

Open

Mr-Neutr0n wants to merge 1 commit into OpenGVLab:main from Mr-Neutr0n:fix/bos-token-embedding

Conversation

@Mr-Neutr0n

Summary

  • Bug: In the forward() methods across 4 model files, the BOS token position in inputs_embeds is set using the raw integer bos_token_id (e.g., 1) instead of its embedding vector. Since inputs_embeds is a float tensor produced by embed_tokens(), assigning a scalar integer broadcasts that value across the entire embedding dimension, corrupting the BOS position with a uniform constant instead of the learned BOS embedding (a minimal sketch of this broadcast follows below).
  • Fix: Pass bos_token_id through embed_tokens() to obtain the proper embedding vector before assigning it to inputs_embeds[:, :1]. The fix respects each file's model attribute path (llama_model vs mistral_model) and LoRA configuration (base_model.model.model vs model).
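To make the broadcast issue concrete, here is a minimal, self-contained sketch. It is not code from the repository; the toy nn.Embedding and the sizes are purely illustrative. It shows how the buggy assignment fills the BOS position with a constant, while looking the id up through the embedding layer yields the learned vector:

import torch
import torch.nn as nn

embed_tokens = nn.Embedding(num_embeddings=32000, embedding_dim=8)
input_ids = torch.randint(0, 32000, (2, 5))   # [B=2, seq_len=5]
inputs_embeds = embed_tokens(input_ids)       # [2, 5, 8] float tensor
bos_token_id = 1

# Buggy: the scalar 1 is broadcast over the hidden dimension,
# so position 0 becomes a constant vector of ones.
inputs_embeds[:, :1] = bos_token_id
print(inputs_embeds[0, 0])                    # tensor([1., 1., ..., 1.])

# Fixed: look up the learned BOS embedding first, then assign it.
bos_ids = torch.tensor([[bos_token_id]])      # shape [1, 1]
inputs_embeds[:, :1] = embed_tokens(bos_ids)  # [1, 1, 8] broadcasts over the batch
print(inputs_embeds[0, 0])                    # values vary across the hidden dimension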

Affected files

  • video_chat/models/videochat_it.py
  • video_chat2/models/videochat_vicuna/videochat2_it_vicuna.py
  • video_chat2/models/videochat_mistra/videochat2_it_mistral.py
  • video_chat2/models/videochat_mistra/videochat2_it_hd_mistral.py

Before (buggy)

# inputs_embeds is a float tensor of shape [B, seq_len, hidden_dim]
# bos_token_id is an integer (e.g., 1)
# This broadcasts the scalar 1 across all hidden_dim dimensions
inputs_embeds[:, :1] = self.llama_tokenizer.bos_token_id

After (fixed)

# Look up the BOS id as a [1, 1] tensor on the same device as inputs_embeds
bos_token_id = torch.tensor([[self.llama_tokenizer.bos_token_id]], device=inputs_embeds.device)
if self.use_lora:
    # With LoRA, embed_tokens sits under the PEFT wrapper's base_model.model
    bos_embeds = self.llama_model.base_model.model.model.embed_tokens(bos_token_id)
else:
    bos_embeds = self.llama_model.model.embed_tokens(bos_token_id)
# bos_embeds has shape [1, 1, hidden_dim] and broadcasts over the batch dimension
inputs_embeds[:, :1] = bos_embeds

Test plan

  • Verify that inputs_embeds[:, :1] now contains a proper embedding vector (values varying across the hidden dimension) rather than a uniform scalar; a verification sketch follows this list
  • Run training on a small sample to confirm loss values are comparable or improved
  • Check that inference outputs remain coherent with the corrected BOS embedding
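A hedged sketch of that first check, assuming it is run from a debugging hook inside forward() right after inputs_embeds is built. check_bos_position, and the way embed_tokens and bos_token_id are passed in, are illustrative names rather than anything defined in the repository:

import torch

def check_bos_position(inputs_embeds, embed_tokens, bos_token_id):
    # The first position of every sequence should equal the learned BOS embedding.
    bos_slice = inputs_embeds[:, 0]                                  # [B, hidden_dim]
    expected = embed_tokens(
        torch.tensor([bos_token_id], device=inputs_embeds.device)
    )                                                                # [1, hidden_dim]
    uniform = torch.full_like(bos_slice, float(bos_token_id))
    assert not torch.allclose(bos_slice, uniform), \
        "BOS position is still the broadcast scalar"
    assert torch.allclose(bos_slice, expected.expand_as(bos_slice), atol=1e-5), \
        "BOS position does not match the learned BOS embedding"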

Commit message: Fix BOS token embedding: use embed_tokens() instead of raw integer

In the forward() methods, the BOS token position in inputs_embeds was
being set to the raw integer bos_token_id (e.g. 1) instead of the
actual embedding vector. Since inputs_embeds is a float tensor produced
by embed_tokens(), assigning a scalar integer broadcasts that value
across the entire embedding dimension, corrupting the BOS representation.

Fix by passing bos_token_id through embed_tokens() to obtain the correct
embedding vector before assignment. This is applied consistently across
all four model files, respecting each file's model attribute path and
LoRA configuration.

Affected files:
- video_chat/models/videochat_it.py
- video_chat2/models/videochat_vicuna/videochat2_it_vicuna.py
- video_chat2/models/videochat_mistra/videochat2_it_mistral.py
- video_chat2/models/videochat_mistra/videochat2_it_hd_mistral.py