[WIP] Support dpo/grpo simple padding_free #181
tastelikefeet wants to merge 3 commits into modelscope:main from
Conversation
Code Review
This pull request centralizes the logic for unpacking packed sequences (padding-free mode) into the InputProcessor class, moving it out of specific loss implementations like GRPO. It introduces a canonical method to detect packing and unpack tensors such as log-probabilities and labels into a per-sequence batch format. These changes are integrated into both Megatron and Transformers sequence parallel strategies. The review feedback identifies several improvement opportunities: ensuring the boundary detection logic explicitly includes the first sequence, relaxing the packing detection heuristic to support sequences of length one, and optimizing the unpacking process for better performance and consistency.
```python
pos = pos[0]  # mrope
pos_flat = pos.view(-1)

boundaries = (pos_flat == 0).nonzero(as_tuple=True)[0].unique(sorted=True)
```
The boundary detection logic assumes that the first sequence in the packed tensor always starts at index 0 and that index 0 is always marked with position_id == 0. While generally true for PackingDataset, if the tensor has been sliced or if the first token has a non-zero position ID for any reason, the tokens before the first 0 will be lost. It is safer to explicitly include index 0 as a boundary.
```diff
-boundaries = (pos_flat == 0).nonzero(as_tuple=True)[0].unique(sorted=True)
+zero_indices = (pos_flat == 0).nonzero(as_tuple=True)[0]
+boundaries = torch.cat([pos_flat.new_tensor([0]), zero_indices]).unique(sorted=True)
```
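A minimal standalone sketch of the suggested boundary computation, assuming torch tensors; the helper name `find_boundaries` is illustrative, not part of the PR's API:

```python
import torch

def find_boundaries(pos_flat: torch.Tensor) -> torch.Tensor:
    """Return the start index of every packed sequence.

    Explicitly includes index 0 so that tokens before the first
    position_id == 0 are never dropped (e.g. on a sliced tensor
    whose first token has a non-zero position id).
    """
    zero_indices = (pos_flat == 0).nonzero(as_tuple=True)[0]
    return torch.cat([pos_flat.new_tensor([0]), zero_indices]).unique(sorted=True)

# A sliced pack whose first token has position_id 3: without the
# explicit 0, the first two tokens would belong to no sequence.
pos = torch.tensor([3, 4, 0, 1, 2, 0, 1])
print(find_boundaries(pos).tolist())  # [0, 2, 5]
```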
```python
    return False
for i in range(pos.shape[0]):
    row = pos[i]
    if int((row == 0).sum()) > 1 and int((row == 1).sum()) > 1:
```
The heuristic for detecting packed sequences is too restrictive. Checking for `(row == 1).sum() > 1` means that at least two sequences in the pack must have a length of at least 2. If a pack contains multiple sequences where some have length 1 (e.g., `[0, 1, 2, 0]`), this check will return `False`, and the sequences won't be unpacked, leading to incorrect loss calculation in GRPO. Checking for multiple `0` resets is sufficient to identify packing.
```diff
-if int((row == 0).sum()) > 1 and int((row == 1).sum()) > 1:
+if (row == 0).sum().item() > 1:
```
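To see why the relaxed check matters, a small standalone sketch (the helper name `is_packed` is hypothetical, not the PR's actual method):

```python
import torch

def is_packed(pos: torch.Tensor) -> bool:
    """Detect packing by counting position-id resets (== 0) per row.

    A single row holding more than one 0 must contain multiple
    sequences, even if some of them have length 1.
    """
    for row in pos:
        if (row == 0).sum().item() > 1:
            return True
    return False

# Two sequences packed into one row; the second has length 1.
row = torch.tensor([[0, 1, 2, 0]])
print(is_packed(row))  # True
# The old check additionally required (row == 1).sum() > 1, which is
# False here, so this pack would have been silently left packed.
```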
```python
from copy import copy
unpacked_logps, unpacked_labels = self._unpack_by_position_ids(
    position_ids, logps, labels, padding_values=[0, -100])

inputs = copy(inputs)
outputs = copy(outputs)
outputs['logps'] = unpacked_logps
inputs['labels'] = unpacked_labels
return inputs, outputs
```
There are several improvements possible in this block:
- Performance: Importing `copy` inside a method called every forward pass is inefficient.
- Idiomatic Python: For dictionaries, `dict.copy()` is preferred over `copy.copy(dict)`.
- Consistency: It is better to also unpack `position_ids` and update it in the `inputs` dictionary. This ensures that all sequence-length tensors in the batch have consistent shapes `[num_seqs, max_len]`, preventing potential shape mismatches in downstream metrics or custom loss functions.
```diff
-from copy import copy
-unpacked_logps, unpacked_labels = self._unpack_by_position_ids(
-    position_ids, logps, labels, padding_values=[0, -100])
-inputs = copy(inputs)
-outputs = copy(outputs)
-outputs['logps'] = unpacked_logps
-inputs['labels'] = unpacked_labels
-return inputs, outputs
+unpacked_logps, unpacked_labels, unpacked_pos = self._unpack_by_position_ids(
+    position_ids, logps, labels, position_ids, padding_values=[0, -100, -1])
+inputs = inputs.copy()
+outputs = outputs.copy()
+outputs['logps'] = unpacked_logps
+inputs['labels'] = unpacked_labels
+inputs['position_ids'] = unpacked_pos
+return inputs, outputs
```
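As a rough illustration of what unpacking into a consistent `[num_seqs, max_len]` shape could look like, here is a self-contained sketch; the function name and signature are assumptions and do not reflect the PR's actual `_unpack_by_position_ids` implementation:

```python
import torch

def unpack_by_position_ids(pos_flat, tensors, padding_values):
    """Split flat packed tensors at position-id resets and right-pad
    each resulting sequence, yielding one [num_seqs, max_len] tensor
    per input tensor (hypothetical sketch, not the PR's API).
    """
    # Sequence starts: every position_id == 0, plus an explicit index 0.
    zero_idx = (pos_flat == 0).nonzero(as_tuple=True)[0]
    starts = torch.cat([pos_flat.new_tensor([0]), zero_idx]).unique(sorted=True)
    ends = torch.cat([starts[1:], starts.new_tensor([pos_flat.numel()])])
    out = []
    for t, pad in zip(tensors, padding_values):
        chunks = [t[s:e] for s, e in zip(starts.tolist(), ends.tolist())]
        out.append(torch.nn.utils.rnn.pad_sequence(
            chunks, batch_first=True, padding_value=pad))
    return out

pos = torch.tensor([0, 1, 2, 0, 1])
labels = torch.tensor([10, 11, 12, 20, 21])
unpacked_labels, unpacked_pos = unpack_by_position_ids(
    pos, [labels, pos], padding_values=[-100, -1])
print(unpacked_labels.tolist())  # [[10, 11, 12], [20, 21, -100]]
print(unpacked_pos.tolist())     # [[0, 1, 2], [0, 1, -1]]
```

Passing `position_ids` through the same helper (padded with -1) is what keeps every per-sequence tensor at the same `[num_seqs, max_len]` shape downstream.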
PR type
PR information
Write the detailed information belonging to this PR.
Experiment results
Paste your experiment results here (if needed).