
Fix(fun-asr-dataset): fix dynamic masking error in input_ids #2801

Merged
LauraGPT merged 1 commit into modelscope:main from roney123:fix_dataset on Feb 4, 2026

Conversation

@roney123 (Contributor) commented Feb 2, 2026

Problem: the input_ids sequence incorrectly contained -100 masking tokens.

Solution:

  • Use original target_ids for input construction
  • Apply masking only to labels
  • Ensure input_ids contain only valid token IDs

Result: correctly implements the simulated prev-token fixed output.
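A minimal sketch of the corrected preparation, assuming hypothetical names (build_example, prompt_ids, mask_prob) and the standard -100 ignore index used by PyTorch's cross-entropy loss; the actual FunASR dataset code may differ:

```python
import copy
import random

IGNORE_INDEX = -100  # ignored by PyTorch's CrossEntropyLoss

def build_example(prompt_ids, target_ids, mask_prob=0.15):
    # Mask a *copy* of target_ids for the labels; never touch the inputs.
    target_labels = copy.deepcopy(target_ids)
    for i in range(len(target_labels)):
        if random.random() < mask_prob:
            target_labels[i] = IGNORE_INDEX

    # input_ids are built from the original, unmasked target_ids.
    input_ids = prompt_ids + target_ids
    # Loss is computed only on the (dynamically masked) target positions.
    labels = [IGNORE_INDEX] * len(prompt_ids) + target_labels
    return input_ids, labels
```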

@gemini-code-assist (bot) commented

Summary of Changes

Hello @roney123, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where the dynamic masking mechanism incorrectly inserted -100 tokens into the input_ids sequence. The changes ensure that input_ids are built from the original, unmasked target tokens, while the masking operation is correctly applied only to the labels sequence. This refinement in data preparation guarantees that the model receives clean inputs and appropriately masked labels for the prev-token fixed output simulation.

Highlights

  • Corrected Dynamic Masking Application: Introduced a separate target_labels variable, which is a copy of target_ids, to ensure that dynamic masking is applied exclusively to the labels and not to the input sequence.
  • Ensured Valid Input IDs: Modified the data processing to guarantee that input_ids are constructed using the original, unmasked target_ids, preventing the inclusion of -100 masking tokens in the model's input.
  • Accurate Label Generation: Updated the labels sequence to incorporate the dynamically masked target_labels, aligning the masking behavior with the intended prev-token fixed output simulation.
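As a quick sanity check of the invariants listed above, reusing the hypothetical build_example sketch from the PR description:

```python
input_ids, labels = build_example(prompt_ids=[1, 2], target_ids=[10, 11, 12, 13])

assert IGNORE_INDEX not in input_ids      # inputs hold only valid token IDs
assert len(input_ids) == len(labels)      # inputs and labels stay aligned
assert labels[:2] == [IGNORE_INDEX] * 2   # prompt positions never contribute to the loss
```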


@gemini-code-assist (bot) left a comment
Code Review

This pull request correctly fixes a bug where masking tokens were incorrectly added to input_ids. By creating a copy of target_ids for masking and using the original for input_ids, the issue is resolved. The change is clear and effective. I've added one suggestion to improve maintainability by refactoring the new masking logic.
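The suggestion itself is not quoted in this excerpt; one plausible shape for such a refactor, pulling the masking into a single side-effect-free helper, might look like this (hypothetical names, not the code actually merged):

```python
import random

def mask_tokens(token_ids, mask_prob, ignore_index=-100):
    """Return a masked copy of token_ids; the input list is left untouched."""
    return [ignore_index if random.random() < mask_prob else tok
            for tok in token_ids]

# build_example (from the sketch above) would then reduce to:
#     target_labels = mask_tokens(target_ids, mask_prob)
```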

LauraGPT merged commit 81d96c9 into modelscope:main on Feb 4, 2026
roney123 deleted the fix_dataset branch on February 4, 2026, 11:51
