Conversation
…ipe state
Signed-off-by: Evgeny <etsykunov@nvidia.com>
Greptile Summary: This PR introduces QuantizerRole for fine-grained quantization control.
Confidence Score: 5/5
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[TE Module/Op] -->|emits| B[QuantizerRole]
B -->|module_type<br/>tensor_type<br/>name| C{CustomRecipe<br/>qfactory}
C -->|dispatches based<br/>on role fields| D[Quantizer Instance]
subgraph "QuantizerRole Fields"
B1[module_type:<br/>linear, grouped_linear, dpa]
B2[tensor_type:<br/>input, weight, grad_output]
B3[name:<br/>qkv, proj, fc1, fc2]
end
subgraph "Module Examples"
M1[Linear] -->|get_quantizer_roles| B
M2[GroupedLinear] -->|get_quantizer_roles| B
M3[LayerNormMLP] -->|get_quantizer_roles| B
M4[DotProductAttention] -->|get_quantizer_roles| B
end
subgraph "Quantizer Factories"
C -->|role-based dispatch| F1[NVFP4Quantizer]
C -->|role-based dispatch| F2[MXFP8Quantizer]
C -->|role-based dispatch| F3[Float8CurrentScalingQuantizer]
C -->|role-based dispatch| F4[Float8BlockQuantizer]
end
style B fill:#e1f5ff
style C fill:#fff4e6
style D fill:#e8f5e9
Last reviewed commit: 343f653
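The dispatch path in the flowchart can be sketched as follows. This is a minimal stand-in, not TE's actual API: `QuantizerRole` is reduced to a plain dataclass, and the returned strings merely label the quantizer classes named in the flowchart.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QuantizerRole:
    """Minimal stand-in for the role object a TE module/op emits."""
    module_type: str   # e.g. "linear", "grouped_linear", "dpa"
    tensor_type: str   # e.g. "input", "weight", "grad_output"
    name: str = ""     # e.g. "qkv", "proj", "fc1", "fc2"

def qfactory(role: QuantizerRole) -> str:
    """Role-based dispatch: pick a quantizer label from the role's fields."""
    if role.module_type == "dpa":
        return "Float8CurrentScalingQuantizer"
    if role.tensor_type == "weight":
        return "NVFP4Quantizer"
    return "MXFP8Quantizer"

print(qfactory(QuantizerRole("linear", "weight", "fc1")))  # NVFP4Quantizer
print(qfactory(QuantizerRole("dpa", "input", "qkv")))      # Float8CurrentScalingQuantizer
```

The split between role emission (modules) and dispatch (the factory) is what lets one recipe mix formats per module and per tensor.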
timmoon10 left a comment:
Overall this design is quite clean and generalizable.
transformer_engine/pytorch/custom_recipes/quantization_nvfp4.py (outdated)
    base = [
        QuantizerRole(module_type="linear", tensor_type="input", name=name),
        QuantizerRole(module_type="linear", tensor_type="weight", name=name),
        QuantizerRole(module_type="linear", tensor_type="output", name=name),
    ]
else:
    base = [
        QuantizerRole(module_type="linear", tensor_type="grad_output", name=name),
        QuantizerRole(module_type="linear", tensor_type="grad_input", name=name),
    ]
"output" and "grad_input" roles don't make sense. In reality, we are implicitly assuming that the tensor will be consumed by another linear-like layer.
Suggested change:

      base = [
          QuantizerRole(module_type="linear", tensor_type="input", name=name),
          QuantizerRole(module_type="linear", tensor_type="weight", name=name),
  -       QuantizerRole(module_type="linear", tensor_type="output", name=name),
  +       QuantizerRole(module_type="linear", tensor_type="input", name=name),
      ]
  else:
      base = [
          QuantizerRole(module_type="linear", tensor_type="grad_output", name=name),
  -       QuantizerRole(module_type="linear", tensor_type="grad_input", name=name),
  +       QuantizerRole(module_type="linear", tensor_type="grad_output", name=name),
      ]
Alternatively, if we want to use the output in FP8 DPA, the right role would be module_type="dpa" and tensor_type="input". We should probably make this configurable. I kind of like that this design exposes the hidden assumptions we've been making.
I agree about the "output" and "grad_input" roles. Setting the roles for those slots to None (the safest default) and making them configurable. Also configured it in MHA.
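The agreed direction can be sketched like this. The function name `get_quantizer_roles` comes from the PR, but its signature and the `quantize_output` flag here are illustrative assumptions: the "output" and "grad_input" slots default to None, and quantizing the output is opt-in and expressed as the consumer's role (dpa input).

```python
from typing import List, Optional

def get_quantizer_roles(
    name: str, forward: bool, quantize_output: bool = False
) -> List[Optional[dict]]:
    """Roles for a linear layer; None means 'do not quantize this slot'."""
    if forward:
        roles = [
            {"module_type": "linear", "tensor_type": "input", "name": name},
            {"module_type": "linear", "tensor_type": "weight", "name": name},
            # Output slot is None by default: quantizing it implicitly assumes
            # a consumer (e.g. FP8 DPA), so it is opt-in, phrased as the
            # consumer's own role rather than a "linear output" role.
            None,
        ]
        if quantize_output:
            roles[2] = {"module_type": "dpa", "tensor_type": "input", "name": name}
        return roles
    return [
        {"module_type": "linear", "tensor_type": "grad_output", "name": name},
        None,  # grad_input slot: not quantized by default
    ]
```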
tests/pytorch/test_custom_recipe.py (outdated)
assert counts["input"] == 1
assert counts["weight"] == 1
assert counts["output"] == 1
assert counts["grad_output"] == 1
assert counts["grad_input"] == 1
Suggested change:

  - assert counts["input"] == 1
  - assert counts["weight"] == 1
  - assert counts["output"] == 1
  - assert counts["grad_output"] == 1
  - assert counts["grad_input"] == 1
  + assert counts["input"] == 2
  + assert counts["weight"] == 1
  + assert counts["output"] == 0
  + assert counts["grad_output"] == 2
  + assert counts["grad_input"] == 0
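For context, the suggested counts follow from tallying tensor_type over the roles a linear layer emits once the output slot is re-expressed as a consumer "input" and grad_input as "grad_output". The role tuples below are stand-in data, not the test's actual fixtures.

```python
from collections import Counter

# Stand-in roles from a linear layer: forward emits input, weight, and the
# former "output" slot as a second "input"; backward emits grad_output twice
# (the former "grad_input" slot is also a grad_output for the producing GEMM).
roles = [
    ("linear", "input"), ("linear", "weight"), ("linear", "input"),
    ("linear", "grad_output"), ("linear", "grad_output"),
]
counts = Counter(tensor_type for _, tensor_type in roles)
assert counts["input"] == 2
assert counts["weight"] == 1
assert counts["grad_output"] == 2
assert counts["output"] == 0 and counts["grad_input"] == 0
```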
def is_gemm(self) -> bool:
    """Whether this role belongs to a GEMM-based module."""
    return self.module_type in self.GEMM_MODULE_TYPES
I think this is baking in assumptions about which formats are similar (our recent experience with grouped tensors makes me wonder whether the requirements for "linear" and "grouped_linear" will diverge in the future), and it's also not giving us that much convenience.
Suggested change (remove):

  - def is_gemm(self) -> bool:
  -     """Whether this role belongs to a GEMM-based module."""
  -     return self.module_type in self.GEMM_MODULE_TYPES
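A sketch of the alternative the comment implies: drop the is_gemm() catch-all and let factories branch on module_type explicitly, so "linear" and "grouped_linear" can diverge later without touching a shared helper. Function name and quantizer labels are illustrative, not TE's actual classes.

```python
def pick_quantizer(module_type: str) -> str:
    """Explicit per-module dispatch instead of an is_gemm() catch-all."""
    if module_type == "linear":
        return "NVFP4Quantizer"
    if module_type == "grouped_linear":
        # Grouped tensors may grow different requirements than plain linear,
        # so they get their own branch rather than sharing a GEMM bucket.
        return "MXFP8Quantizer"
    return "Float8CurrentScalingQuantizer"  # e.g. "dpa"
```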
Description

Introducing QuantizerRole: an API that allows going as fine-grained as "set this LayerNormLinear in this transformer layer to be less aggressively quantized" (a per-module/per-tensor quantization control mechanism). See test_custom_recipe.py::test_custom_recipe_quantization_targets(). The quantizer factory uses roles to dispatch according to its needs.

Each TE module/op emits a list of QuantizerRole:
- Linear, LayerNormLinear, and LayerNormMLP emit module_type="linear" with tensor_type in {"input", "weight", "grad_output"}.
- GroupedLinear emits module_type="grouped_linear".
- CustomRecipe accepts a qfactory callable that receives a QuantizerRole and returns a quantizer.
- Factories can be composed, e.g. dispatch (optionally to different sub-factories) based on module_type (dpa vs linear), then refine based on tensor_type.
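The composed-factory pattern described in the PR can be sketched as follows. This is a hedged illustration: the sub-factory names are invented here, and the returned strings only label quantizer classes, they are not TE's actual constructors.

```python
def linear_factory(tensor_type: str) -> str:
    """Sub-factory for linear-like modules; refines on tensor_type."""
    return "NVFP4Quantizer" if tensor_type == "weight" else "MXFP8Quantizer"

def dpa_factory(tensor_type: str) -> str:
    """Sub-factory for dot-product attention modules."""
    return "Float8CurrentScalingQuantizer"

def qfactory(role: dict) -> str:
    """Top-level dispatch on module_type, then refine inside the sub-factory."""
    sub = dpa_factory if role["module_type"] == "dpa" else linear_factory
    return sub(role["tensor_type"])

print(qfactory({"module_type": "linear", "tensor_type": "weight"}))  # NVFP4Quantizer
```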