
[JAX] Fix get_seqlens_and_offsets() to accept vmapped seg ids and non vmapped seg offsets #2692

Open
KshitijLakhani wants to merge 12 commits into NVIDIA:main from KshitijLakhani:klakhani/fix/vmap-get-seg-ids-pos

Conversation


@KshitijLakhani KshitijLakhani commented Feb 19, 2026

Description

What is the bug ?

TE provides a convenience function from_segment_ids_and_pos() which lets users pass only segment ids; the function returns a SequenceDescriptor containing the passed segment ids and internally generated segment pos.

As mentioned in issue #2685, if a user vmaps a function forward() that i) accepts q, k, v, and segment ids, ii) calls from_segment_ids_and_pos(), and iii) then calls DPA(), JAX sees the segment ids as vmapped and adds an extra leading dimension (e.g. (1, 2, 128)), whereas the segment offsets are not given a leading dimension (e.g. (2, 128)). The resulting shape mismatch between seg ids and seg pos triggers the assert in the FusedAttn primitive impl(), as described in issue #2685.
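The reported shapes can be illustrated with a minimal sketch (the shapes are taken from the report; this only mirrors the mismatch, not the TE internals):

```python
import jax.numpy as jnp

# Under vmap, the user-supplied segment ids gain a leading batch dimension,
# while the segment pos generated inside from_segment_ids_and_pos() do not.
segment_ids = jnp.zeros((1, 2, 128), dtype=jnp.int32)  # batched by vmap
segment_pos = jnp.zeros((2, 128), dtype=jnp.int32)     # generated internally
assert segment_ids.shape != segment_pos.shape          # what trips the impl() assert
assert segment_ids.shape[1:] == segment_pos.shape      # only a leading dim differs
```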

What is the root cause for the bug ?

On debugging, it can be seen that the shapes start to differ when the batcher is traced for the FusedAttn primitive:

  • segment_ids in the primitive: treated as a vmapped input, hence batched → (1, 2, 128).
  • segment_pos in the primitive: treated as derived within the function, hence not batched → (2, 128).

Fixes #2685

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

There are two possible approaches to solve this:

  1. Ensure that the issue is resolved at the source, i.e., ensure that segment_pos has the same leading batching dims as segment_ids, adding any extra dims in the batcher so that impl() sees matching shapes. Pros: Issue resolved in a "JAX" way and at the source. Cons: Increased memory from expanding the segment pos dims.
  2. Resolve the issue when impl() is called, i.e., accommodate mismatched seg id and seg pos dims when generating the seqlens and offsets. Pros: No extra memory needed as no dims are expanded. Cons: Not "truly" solved at the source.
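A hedged sketch of the second approach (helper names here are illustrative, not the actual TE code): flatten the extra leading batch dims of segment_ids, vmap the per-batch computation with segment_pos broadcast via in_axes=None, then restore the batch shape on the outputs.

```python
import jax
import jax.numpy as jnp

def _seqlens_one_batch(seg_ids, seg_pos):
    # Toy stand-in for the per-batch seqlens computation:
    # count non-padding tokens per row.
    return jnp.sum(seg_ids != 0, axis=-1)

def get_seqlens(segment_ids, segment_pos):
    # Number of extra leading batch dims that vmap added to segment_ids only.
    extra = segment_ids.ndim - segment_pos.ndim
    if extra == 0:
        return _seqlens_one_batch(segment_ids, segment_pos)
    batch_shape = segment_ids.shape[:extra]
    # Flatten the extra batch dims into one mapped axis.
    flat = segment_ids.reshape((-1,) + segment_pos.shape)
    # in_axes=(0, None): map over flattened batches, broadcast segment_pos.
    out = jax.vmap(_seqlens_one_batch, in_axes=(0, None))(flat, segment_pos)
    # Restore the original leading batch dims.
    return out.reshape(batch_shape + out.shape[1:])

seg_ids = jnp.ones((1, 2, 128), dtype=jnp.int32)  # vmapped: extra leading dim
seg_pos = jnp.zeros((2, 128), dtype=jnp.int32)    # derived: no leading dim
print(get_seqlens(seg_ids, seg_pos).shape)  # (1, 2)
```

No broadcast copy of segment_pos is materialized, which is the memory advantage over approach 1.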

The second approach is chosen here as it is more memory-efficient. After this PR merges, the end user can wrap TE API calls in vmap without worrying about batching inside TE.
Accommodate for

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

… the TE constructed segment pos are not thereby causing mismatches in impl()

Signed-off-by: Kshitij Lakhani <klakhani@nvidia.com>
@KshitijLakhani KshitijLakhani self-assigned this Feb 19, 2026
@KshitijLakhani
Collaborator Author

/te-ci jax L0 L1 L2

@KshitijLakhani KshitijLakhani marked this pull request as ready for review February 20, 2026 06:54
@greptile-apps
Contributor

greptile-apps bot commented Feb 20, 2026

Greptile Summary

Fixed shape mismatch bug when using from_segment_ids_and_pos() inside vmapped functions. The issue occurred because JAX added leading batch dimensions to user-provided segment_ids but not to internally-generated segment_pos, causing FusedAttn primitive assertions to fail.

Key changes:

  • Modified get_seqlens_and_offsets() in SequenceDescriptor to detect and handle extra leading batch dims on segment_ids
  • When segment_ids has more dimensions than segment_pos, the code now flattens extra batch dims, vmaps the seqlens/offsets computation with segment_pos broadcast, then reshapes outputs back
  • Replaced strict shape equality assertions with more flexible validation that allows segment_ids to have additional leading dims
  • Updated comments in FusedAttn primitive batchers to document that segment_ids/segment_pos may have different batch dimensions

Implementation notes:

  • The vmap approach with in_axes=(0, 0, None, None) correctly broadcasts segment_pos across the batch dimension
  • JAX will raise clear errors if q and kv segment_ids have incompatible batch sizes, which is appropriate for catching user errors
  • Changes only affect THD layout; BSHD path remains unchanged
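The broadcast and error behavior noted above can be checked with a toy function (placeholders, not TE internals): in_axes=(0, 0, None, None) maps the two segment-id operands and broadcasts the positions, and JAX rejects mismatched sizes on the mapped axis with a ValueError.

```python
import jax
import jax.numpy as jnp

def f(seg_ids_q, seg_ids_kv, pos_q, pos_kv):
    # Toy reduction standing in for the seqlens/offsets computation.
    return jnp.sum(seg_ids_q * pos_q) + jnp.sum(seg_ids_kv * pos_kv)

ids_q = jnp.ones((4, 2, 128), dtype=jnp.int32)
ids_kv = jnp.ones((4, 2, 128), dtype=jnp.int32)
pos = jnp.broadcast_to(jnp.arange(128), (2, 128))

# Positions are broadcast across the mapped axis via in_axes=None.
out = jax.vmap(f, in_axes=(0, 0, None, None))(ids_q, ids_kv, pos, pos)
print(out.shape)  # (4,)

# Mismatched batch sizes on the mapped axis are caught by vmap itself.
err = None
try:
    bad = jnp.ones((3, 2, 128), dtype=jnp.int32)
    jax.vmap(f, in_axes=(0, 0, None, None))(ids_q, bad, pos, pos)
except ValueError as e:
    err = e
assert err is not None
```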

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The implementation correctly solves the vmap shape mismatch issue with a clean approach: detecting extra batch dims on segment_ids, flattening them, vmapping the computation with broadcasted segment_pos, and reshaping outputs. The validation logic appropriately checks shape compatibility. Edge cases with mismatched batch sizes are caught by JAX's vmap error handling. Changes are scoped to THD layout only, minimizing risk to other code paths.
  • No files require special attention

Important Files Changed

Filename Overview
transformer_engine/jax/attention.py Modified get_seqlens_and_offsets() to handle vmapped segment_ids with broadcasted segment_pos by flattening extra batch dims, vmapping the computation, and reshaping outputs
transformer_engine/jax/cpp_extensions/attention.py Updated batcher comments to clarify that segment_ids/segment_pos may have different batch dims and conversion is handled in attention.py

Last reviewed commit: 40e4d28

Contributor

@greptile-apps greptile-apps bot left a comment

2 files reviewed, 3 comments

Collaborator

@jberchtold-nvidia jberchtold-nvidia left a comment


LGTM, thanks!

Signed-off-by: Kshitij Lakhani <klakhani@nvidia.com>
Contributor

@greptile-apps greptile-apps bot left a comment

1 file reviewed, 1 comment

Signed-off-by: Kshitij Lakhani <klakhani@nvidia.com>
Contributor

@greptile-apps greptile-apps bot left a comment

1 file reviewed, 2 comments

for _ in range(leading_bdim):
    expanded = lax.expand_dims(expanded, (0,))
batched_args_list[seg_pos_idx] = jnp.broadcast_to(expanded, target_shape)
updated_batch_dims[seg_pos_idx] = 0
Contributor


consider using seg_id_bdim instead of hardcoding 0 for consistency, even though check_valid_batch_dims ensures it's always 0

Suggested change
updated_batch_dims[seg_pos_idx] = 0
updated_batch_dims[seg_pos_idx] = seg_id_bdim

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

for _ in range(leading_bdim):
    expanded = lax.expand_dims(expanded, (0,))
batched_args_list[seg_pos_idx] = jnp.broadcast_to(expanded, target_shape)
updated_batch_dims[seg_pos_idx] = 0
Contributor


consider using seg_id_bdim instead of hardcoding 0 for consistency, even though check_valid_batch_dims ensures it's always 0

Suggested change
updated_batch_dims[seg_pos_idx] = 0
updated_batch_dims[seg_pos_idx] = seg_id_bdim


…rts.

Signed-off-by: Kshitij Lakhani <klakhani@nvidia.com>
… the seqlens and offsets for fused attn

Signed-off-by: Kshitij Lakhani <klakhani@nvidia.com>
Comment on lines 739 to 741
# assert flat_batch_q == flat_batch_kv, (
#     f"segment_ids batch size mismatch: {batch_shape_q} vs {batch_shape_kv}"
# )
Contributor


commented assertion could lead to unclear error if q and kv have mismatched batch sizes. vmap would fail but with a generic JAX error. consider uncommenting or adding a comment explaining why validation isn't needed


…ed to get_seqlens_and_offsets()

Signed-off-by: Kshitij Lakhani <klakhani@nvidia.com>
@KshitijLakhani KshitijLakhani force-pushed the klakhani/fix/vmap-get-seg-ids-pos branch from a7c398c to 395ac54 on February 27, 2026 18:36
Signed-off-by: Kshitij Lakhani <klakhani@nvidia.com>
Signed-off-by: Kshitij Lakhani <klakhani@nvidia.com>
@KshitijLakhani KshitijLakhani force-pushed the klakhani/fix/vmap-get-seg-ids-pos branch from 386a633 to 693ba65 on February 27, 2026 19:19
@KshitijLakhani KshitijLakhani changed the title [JAX] Fix batcher in FusedAttn primitive for when seg ids bdims != seg pos bdims [JAX] Fix get_seqlens_and_offsets() to accept vmapped seg ids and non vmapped seg offsets Feb 27, 2026
@KshitijLakhani
Collaborator Author

/te-ci jax L0 L1 L2

@KshitijLakhani
Collaborator Author

CI passes. The only failure was due to HF requests in the A100 L2 test; rerunning passed it.



Successfully merging this pull request may close these issues.

JAX vmap issue with TE Attention
