[RL] Fix shape mismatch on tail batch in GRPO training by susanbao · Pull Request #4252 · AI-Hypercomputer/maxtext

susanbao · 2026-06-24T07:24:03Z

This PR fixes the JAX shard_map shape mismatch (ValueError) caused by lazy filtering of long prompts in the Grain dataset. It adds drop_remainder=True and doubles the test dataset slice.
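To illustrate why a lazily filtered dataset can produce a short tail batch, here is a minimal pure-Python sketch. It is not the MaxText or Grain code from this PR. The helpers `lazy_filter_long_prompts` and `batch`, the prompt data, and the sizes are all hypothetical and only mimic the behavior described above.

```python
from typing import Iterable, Iterator, List


def lazy_filter_long_prompts(prompts: Iterable[str], max_len: int) -> Iterator[str]:
    """Lazily drop prompts longer than max_len (hypothetical helper).

    Because this is a generator, the number of surviving prompts is not
    known until the data is actually iterated.
    """
    return (p for p in prompts if len(p) <= max_len)


def batch(items: Iterable[str], batch_size: int, drop_remainder: bool) -> Iterator[List[str]]:
    """Group items into lists of batch_size (hypothetical helper).

    With drop_remainder=False a final, smaller batch is emitted.
    With drop_remainder=True that partial tail batch is discarded.
    """
    buf: List[str] = []
    for item in items:
        buf.append(item)
        if len(buf) == batch_size:
            yield buf
            buf = []
    if buf and not drop_remainder:
        yield buf


# 13 prompts, 3 of which exceed the length limit, so 10 survive filtering.
prompts = ["short"] * 10 + ["x" * 100] * 3

ragged = [len(b) for b in batch(lazy_filter_long_prompts(prompts, 16), 4, drop_remainder=False)]
uniform = [len(b) for b in batch(lazy_filter_long_prompts(prompts, 16), 4, drop_remainder=True)]

print(ragged)   # [4, 4, 2]  <- the tail batch has a different leading dimension
print(uniform)  # [4, 4]     <- every batch has the same shape
```

A computation that expects a fixed batch dimension on every step would see a differently shaped input on the last batch in the first case, while dropping the remainder keeps all batches uniform. Dropping it also discards examples, which is presumably why the test dataset slice is enlarged, though the PR text does not state the reason.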

codecov · 2026-06-24T07:32:57Z

Codecov Report

❌ Patch coverage is 20.00000% with 4 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/maxtext/trainers/post_train/rl/train_rl.py	0.00%	3 Missing ⚠️
...maxtext/trainers/post_train/rl/math_verify_pool.py	50.00%	1 Missing ⚠️

Fix shape mismatch on tail batch in GRPO training by adding drop_remainder=True and doubling test dataset slice

1efafe4

Fix host CPU OOM by keeping math_verify_pool size constant

b5c14b6

susanbao wants to merge 2 commits into main from sanbao/gpt