[ET Device Support] Annotate device attributes of CUDA backend IO tensors cuda device #18080

Open

Gasoonjia wants to merge 13 commits into gh/gasoonjia/137/base from gh/gasoonjia/137/head

Contributor

Gasoonjia commented Mar 10, 2026

Stack from ghstack (oldest at bottom):

-> [ET Device Support] Annotate device attributes of CUDA backend IO tensors cuda device #18080
[ET Device Support] Parse device info from serialized tensor in tensor_parser #18328

Update cuda backend partitioner to annotate its IO tensors as cuda device

Differential Revision: D96010436


[ET Device Support] Annotate device attributes of CUDA backend IO tensors cuda device

50df074

Update cuda backend partitioner to annotate its IO tensors as cuda device

Differential Revision: [D96010436](https://our.internmc.facebook.com/intern/diff/D96010436/)

[ghstack-poisoned]

Gasoonjia mentioned this pull request

[ET Device Support] Schema changes: device info on Tensor and buffer-level device array #17533

Merged

pytorch-bot Bot commented Mar 10, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18080

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

[CI][B200] Smoke test encounters CUDA Unknown error for dgxb200-03 and dgxb200-04

❌ 2 New Failures, 3 Unrelated Failures

As of commit f5f20d9 with merge base 81bc830:

NEW FAILURES - The following jobs have failed:

pull / unittest / macos / macos-job (gh)
export/tests/test_target_recipes.py::TestTargetRecipes::test_mv3_model
Test CUDA Builds / unittest-cuda / linux-job (gh)
backends/cuda/tests/test_cuda_export.py::TestCudaExport::test_device_info_propagated_to_cuda_delegate_outputs

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

pull / test-mcu-cortex-m-backend / linux-job (gh) (detected as infra flaky with no log or failing log classifier)
pull / unittest-editable / windows / windows-job (gh) (matched win rule in flaky-rules.json)
##[error]The operation was canceled.

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / unittest / windows / windows-job (gh) (trunk failure)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

This was referenced Mar 4, 2026

[ET Device Support] TensorImpl carries device info #17534

Merged

[executorch] Propagate device metadata from partitioner result onto TensorSpecs #18078

Merged

[ET Device Support] Propagate device info from TensorSpec into serialized Tensor #18079

Merged

Gasoonjia added a commit that referenced this pull request


[ET Device Support] Annotate device attributes of CUDA backend IO tensors cuda device

c2231c4

Update cuda backend partitioner to annotate its IO tensors as cuda device

Differential Revision: [D96010436](https://our.internmc.facebook.com/intern/diff/D96010436/)

ghstack-source-id: 350230761
Pull Request resolved: #18080

meta-cla Bot added the CLA Signed label

github-actions Bot commented Mar 10, 2026

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

meta-codesync Bot added fb-exported meta-exported labels


Update on "[ET Device Support] Annotate device attributes of CUDA backend IO tensors cuda device"

c3b9a3c

Update cuda backend partitioner to annotate its IO tensors as cuda device

Differential Revision: [D96010436](https://our.internmc.facebook.com/intern/diff/D96010436/)

[ghstack-poisoned]

Gasoonjia mentioned this pull request

[ET Device Support] DeviceAllocator interface and DeviceAllocatorRegistry #17535

Open


Update on "[ET Device Support] Annotate device attributes of CUDA backend IO tensors cuda device"

fa49aaf

Update cuda backend partitioner to annotate its IO tensors as cuda device

Differential Revision: [D96010436](https://our.internmc.facebook.com/intern/diff/D96010436/)

[ghstack-poisoned]

Gasoonjia added a commit that referenced this pull request


[ET Device Support] Annotate device attributes of CUDA backend IO tensors cuda device

6073b6b

Pull Request resolved: #18080

Update cuda backend partitioner to annotate its IO tensors as cuda device
ghstack-source-id: 351558872
@exported-using-ghexport

Differential Revision: [D96010436](https://our.internmc.facebook.com/intern/diff/D96010436/)


Update on "[ET Device Support] Annotate device attributes of CUDA backend IO tensors cuda device"

e137237

Update cuda backend partitioner to annotate its IO tensors as cuda device

Differential Revision: [D96010436](https://our.internmc.facebook.com/intern/diff/D96010436/)

[ghstack-poisoned]

Gasoonjia added a commit that referenced this pull request


[ET Device Support] Annotate device attributes of CUDA backend IO tensors cuda device

6b388c1

Pull Request resolved: #18080

Update cuda backend partitioner to annotate its IO tensors as cuda device
ghstack-source-id: 353202795
@exported-using-ghexport

Differential Revision: [D96010436](https://our.internmc.facebook.com/intern/diff/D96010436/)


Update on "[ET Device Support] Annotate device attributes of CUDA backend IO tensors cuda device"

22c5eb1

Update cuda backend partitioner to annotate its IO tensors as cuda device

Differential Revision: [D96010436](https://our.internmc.facebook.com/intern/diff/D96010436/)

[ghstack-poisoned]

Gasoonjia mentioned this pull request

[ET Device Support] Parse device info from serialized tensor in tensor_parser #18310

Open

Gasoonjia added a commit that referenced this pull request


[ET Device Support] Annotate device attributes of CUDA backend IO tensors cuda device

808e3da

Pull Request resolved: #18080

Update cuda backend partitioner to annotate its IO tensors as cuda device
ghstack-source-id: 354478933
@exported-using-ghexport

Differential Revision: [D96010436](https://our.internmc.facebook.com/intern/diff/D96010436/)


Update on "[ET Device Support] Annotate device attributes of CUDA backend IO tensors cuda device"

8f4e042

Update cuda backend partitioner to annotate its IO tensors as cuda device

Differential Revision: [D96010436](https://our.internmc.facebook.com/intern/diff/D96010436/)

[ghstack-poisoned]

This was referenced Mar 19, 2026

[ET Device Support] Parse device info from serialized tensor in tensor_parser #18328

Merged

[ET Device Support] Add NonConstBufferDevice schema for per-buffer device mapping #18330

Open

Gasoonjia added 2 commits

March 19, 2026 11:43


Update on "[ET Device Support] Annotate device attributes of CUDA backend IO tensors cuda device"

1ee0cf9

Update cuda backend partitioner to annotate its IO tensors as cuda device

Differential Revision: [D96010436](https://our.internmc.facebook.com/intern/diff/D96010436/)

[ghstack-poisoned]


Update on "[ET Device Support] Annotate device attributes of CUDA backend IO tensors cuda device"

61f249a

Update cuda backend partitioner to annotate its IO tensors as cuda device

Differential Revision: [D96010436](https://our.internmc.facebook.com/intern/diff/D96010436/)

[ghstack-poisoned]

Gasoonjia mentioned this pull request

[ET Device Support] Device-aware memory planning: separate buffers per device type #18375

Open


Update on "[ET Device Support] Annotate device attributes of CUDA backend IO tensors cuda device"

55761da

Update cuda backend partitioner to annotate its IO tensors as cuda device

Differential Revision: [D96010436](https://our.internmc.facebook.com/intern/diff/D96010436/)

[ghstack-poisoned]

This was referenced Mar 24, 2026

[ET Device Support] Emitter reads non_const_buffer_device from graph meta #18463

Open

[ET Device Support] DeviceMemoryBuffer RAII class for device memory lifetime management #18464

Open

[ET Device Support] MethodMeta: expose per-buffer device placement API #18465

Open

This was referenced Mar 24, 2026

[ET Device Support] MemoryManager: add per-buffer device metadata #18466

Open

[ET Device Support] Module: allocate device memory for planned buffers #18467

Open

[ET Device Support] CudaAllocator: device memory allocator for CUDA backend #18468

Open

[ET Device Support] Emitter reads non_const_buffer_device from graph meta #18472

Open

[ET Device Support] DeviceMemoryBuffer RAII class for device memory lifetime management #18473

Open

[ET Device Support] MethodMeta: expose per-buffer device placement API #18474

Open

[ET Device Support] MemoryManager: add per-buffer device metadata #18475

Open

[ET Device Support] Module: allocate device memory for planned buffers #18476

Open

[ET Device Support] CudaAllocator: device memory allocator for CUDA backend #18477

Open

lucylq approved these changes



Update on "[ET Device Support] Annotate device attributes of CUDA backend IO tensors cuda device"

dc30a34

Update cuda backend partitioner to annotate its IO tensors as cuda device

Differential Revision: [D96010436](https://our.internmc.facebook.com/intern/diff/D96010436/)

[ghstack-poisoned]

This was referenced Apr 6, 2026

[ET Device Support] Define AOT device copy ops registry #18728

Open

[ET Device Support] Define et_copy runtime h2d and d2h copy ops #18729

Open

[ET Device Support] PropagateDevicePass inserts H2D/D2H copy ops at delegate boundaries #18730

Open


Update on "[ET Device Support] Annotate device attributes of CUDA backend IO tensors cuda device"

5e48eb8

Update cuda backend partitioner to annotate its IO tensors as cuda device

Differential Revision: [D96010436](https://our.internmc.facebook.com/intern/diff/D96010436/)

[ghstack-poisoned]

This was referenced Apr 8, 2026

[ET Device Support] Add ExecutorchBackendConfig flags for skipping H2D/D2H copies #18760

Open

[ET Device Support] Add device tensor helper functions to TensorPtr API #18761

Open

[ET Device Support] Extract shared device test utilities to reduce redundancy #18762

Open

[ET Device Support] CUDA-native Qwen 3.5 MoE inference with device tensor pipeline #18788

Open


Update on "[ET Device Support] Annotate device attributes of CUDA backend IO tensors cuda device"

709138e

Update cuda backend partitioner to annotate its IO tensors as cuda device

Differential Revision: [D96010436](https://our.internmc.facebook.com/intern/diff/D96010436/)

[ghstack-poisoned]

Gasoonjia added a commit that referenced this pull request


[ET Device Support] Annotate device attributes of CUDA backend IO tensors cuda device

0c9eb12

Pull Request resolved: #18080

Update cuda backend partitioner to annotate its IO tensors as cuda device, and add checks in cuda backend to guarantee it works
ghstack-source-id: 366850769
@exported-using-ghexport

Differential Revision: [D96010436](https://our.internmc.facebook.com/intern/diff/D96010436/)

Gasoonjia added a commit that referenced this pull request


[executorch] Propagate device metadata from partitioner result onto TensorSpecs (#18078)

cebbb1b

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at
bottom):
* #18080
* #18328
* #18079
* __->__ #18078

Add end-to-end device type annotation support from export to runtime.
Currently we only support one device per graph.

The overall pipeline is:
a. The partitioner uses `compile_spec` to determine which device the
partitioned blob runs on.
b. After the partitioned graph is lowered to the backend, the newly
introduced `propagate_device_pass` annotates the input and output tensors
of the delegate blob with the target device.

Differential Revision:
[D95842511](https://our.internmc.facebook.com/intern/diff/D95842511/)
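
A minimal sketch of the two pipeline steps described above, for illustration only. Every name here (`TensorSpec`, `DelegateBlob`, the `"target_device"` key, `propagate_device`) is a hypothetical stand-in and not the ExecuTorch API. Rejecting a graph with more than one non-CPU device is an assumed reading of "one device per graph".

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical stand-ins; the real ExecuTorch types differ.
@dataclass
class TensorSpec:
    name: str
    device: str = "cpu"  # assumed default when nothing annotates the tensor


@dataclass
class DelegateBlob:
    compile_spec: Dict[str, str]  # e.g. {"target_device": "cuda"} (assumed key)
    inputs: List[TensorSpec] = field(default_factory=list)
    outputs: List[TensorSpec] = field(default_factory=list)


def target_device(blob: DelegateBlob) -> str:
    """Step (a): read the target device out of the compile spec."""
    return blob.compile_spec.get("target_device", "cpu")


def propagate_device(blobs: List[DelegateBlob]) -> None:
    """Step (b): annotate each delegate's IO tensors with its target device."""
    devices = {target_device(b) for b in blobs} - {"cpu"}
    if len(devices) > 1:
        # Assumed interpretation of the one-device-per-graph limitation.
        raise ValueError(f"multiple devices in one graph: {sorted(devices)}")
    for blob in blobs:
        dev = target_device(blob)
        for spec in blob.inputs + blob.outputs:
            spec.device = dev
```

With a single delegate whose compile spec names `cuda`, calling `propagate_device([blob])` leaves every spec in `blob.inputs` and `blob.outputs` with `device == "cuda"`.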

Gasoonjia added a commit that referenced this pull request


[ET Device Support] Propagate device info from TensorSpec into serialized Tensor (#18079)

d0bed94

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at
bottom):
* #18080
* #18328
* __->__ #18079
* #18078

Propagate device information from `TensorSpec.device` (set by
`PropagateDevicePass`) to the serialized `schema.Tensor` in the emitted
PTE file, so that the runtime is aware of it.

Differential Revision:
[D95899706](https://our.internmc.facebook.com/intern/diff/D95899706/)
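
A rough sketch of what this emit step could look like, using plain dataclasses in place of the real schema types. The field names and the `emit_tensor` helper are hypothetical. Leaving `extra_tensor_info` unset for CPU tensors is an assumption made for the example, not something this PR confirms.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical stand-ins for the export-side spec and the serialized schema.
@dataclass
class TensorSpec:
    shape: Tuple[int, ...]
    device_type: str = "cpu"
    device_index: int = 0


@dataclass
class ExtraTensorInfo:
    device_type: str
    device_index: int


@dataclass
class SerializedTensor:
    sizes: List[int]
    extra_tensor_info: Optional[ExtraTensorInfo] = None


def emit_tensor(spec: TensorSpec) -> SerializedTensor:
    """Copy the device annotation from the spec into the serialized tensor."""
    extra = None
    if spec.device_type != "cpu":
        # Assumption: only non-CPU tensors carry explicit device info.
        extra = ExtraTensorInfo(spec.device_type, spec.device_index)
    return SerializedTensor(list(spec.shape), extra)
```

In this sketch, `emit_tensor(TensorSpec((2, 3), "cuda", 0))` produces a tensor whose `extra_tensor_info` records `cuda` and index `0`, while a default CPU spec produces one with no extra info.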


Update on "[ET Device Support] Annotate device attributes of CUDA backend IO tensors cuda device"

f5f20d9

Update cuda backend partitioner to annotate its IO tensors as cuda device

Differential Revision: [D96010436](https://our.internmc.facebook.com/intern/diff/D96010436/)

[ghstack-poisoned]

Gasoonjia added a commit that referenced this pull request


[ET Device Support] Annotate device attributes of CUDA backend IO tensors cuda device

46591f4

Pull Request resolved: #18080

Update cuda backend partitioner to annotate its IO tensors as cuda device, and add checks in cuda backend to guarantee it works
ghstack-source-id: 368551184
@exported-using-ghexport

Differential Revision: [D96010436](https://our.internmc.facebook.com/intern/diff/D96010436/)

Gasoonjia added the ciflow/cuda label

Gasoonjia added a commit that referenced this pull request


[ET Device Support] Parse device info from serialized tensor in tensor_parser (#18328)

c72f072

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at
bottom):
* #18080
* __->__ #18328

Parse device info (device_type, device_index) from the serialized
ExtraTensorInfo in .pte files into TensorImpl at runtime.
When a tensor's extra_tensor_info contains device annotations (e.g.,
CUDA), the tensor parser now reads and propagates them to the TensorImpl
constructor. Tensors without extra_tensor_info default to CPU/0 for
backward compatibility with older PTE files.

Differential Revision:
[D97199497](https://our.internmc.facebook.com/intern/diff/D97199497/)
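
The parsing rule above, including the CPU/0 fallback, can be sketched as follows. This is a Python illustration only; the dataclasses and `parse_tensor` are hypothetical names that mirror the terms in the description rather than the actual tensor parser code.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical mirrors of the serialized and runtime tensor types.
@dataclass
class ExtraTensorInfo:
    device_type: str = "cpu"
    device_index: int = 0


@dataclass
class SerializedTensor:
    sizes: List[int]
    extra_tensor_info: Optional[ExtraTensorInfo] = None


@dataclass
class TensorImpl:
    sizes: List[int]
    device_type: str
    device_index: int


def parse_tensor(t: SerializedTensor) -> TensorImpl:
    """Build a runtime tensor, defaulting to CPU/0 when no device info exists."""
    info = t.extra_tensor_info
    if info is None:
        # Older PTE files carry no extra_tensor_info, so fall back to CPU/0.
        return TensorImpl(t.sizes, "cpu", 0)
    return TensorImpl(t.sizes, info.device_type, info.device_index)
```

The fallback branch is what keeps files without `extra_tensor_info` loadable: they parse exactly as before, as CPU tensors on index 0.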


Labels

ciflow/cuda CLA Signed fb-exported meta-exported