
[ET Device Support] Parse device info from serialized tensor in tensor_parser #18328

Merged
Gasoonjia merged 9 commits into gh/gasoonjia/143/base from gh/gasoonjia/143/head
Apr 17, 2026
Conversation

@Gasoonjia
Contributor

@Gasoonjia Gasoonjia commented Mar 19, 2026

Stack from ghstack (oldest at bottom):

Parse device info (device_type, device_index) from the serialized ExtraTensorInfo in .pte files into TensorImpl at runtime.
When a tensor's extra_tensor_info contains device annotations (e.g., CUDA), the tensor parser now reads and propagates them to the TensorImpl constructor. Tensors without extra_tensor_info default to CPU/0 for backward compatibility with older PTE files.

Differential Revision: D97199497
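The fallback behavior described above can be sketched as follows. This is a hypothetical, simplified illustration in Python, not the actual C++ tensor_parser code; `ExtraTensorInfo`, `SerializedTensor`, and `parse_device` here are stand-ins for the flatbuffer tables and parser logic.

```python
# Hypothetical sketch of the parser's device fallback: if the serialized
# tensor carries extra_tensor_info with device annotations, use them;
# otherwise default to CPU / index 0 (backward compatibility with older
# PTE files). Names are illustrative, not the real ExecuTorch types.
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ExtraTensorInfo:  # stand-in for the serialized ExtraTensorInfo table
    device_type: str
    device_index: int


@dataclass
class SerializedTensor:  # stand-in for the serialized schema.Tensor
    extra_tensor_info: Optional[ExtraTensorInfo] = None


def parse_device(t: SerializedTensor) -> Tuple[str, int]:
    """Return (device_type, device_index), defaulting to ("cpu", 0)."""
    info = t.extra_tensor_info
    if info is None:
        # Older PTE files have no extra_tensor_info: default to CPU/0.
        return ("cpu", 0)
    return (info.device_type, info.device_index)
```

A CUDA-annotated tensor would yield `("cuda", index)`, while a legacy tensor with no annotation falls back to `("cpu", 0)`.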

…r_parser

Parse device info (device_type, device_index) from the serialized ExtraTensorInfo in .pte files into TensorImpl at runtime.
When a tensor's extra_tensor_info contains device annotations (e.g., CUDA), the tensor parser now reads and propagates them to the TensorImpl constructor. Tensors without extra_tensor_info default to CPU/0 for backward compatibility with older PTE files.

Differential Revision: [D97199497](https://our.internmc.facebook.com/intern/diff/D97199497/)

[ghstack-poisoned]
@pytorch-bot

pytorch-bot Bot commented Mar 19, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18328

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is currently 1 active SEV. If your PR is affected, please view it below:

✅ You can merge normally! (2 Unrelated Failures)

As of commit 6deab13 with merge base 81bc830:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Gasoonjia added a commit that referenced this pull request Apr 13, 2026
…r_parser

Pull Request resolved: #18328

Parse device info (device_type, device_index) from the serialized ExtraTensorInfo in .pte files into TensorImpl at runtime.
When a tensor's extra_tensor_info contains device annotations (e.g., CUDA), the tensor parser now reads and propagates them to the TensorImpl constructor. Tensors without extra_tensor_info default to CPU/0 for backward compatibility with older PTE files.
ghstack-source-id: 366667637
@exported-using-ghexport

Differential Revision: [D97199497](https://our.internmc.facebook.com/intern/diff/D97199497/)
Gasoonjia added a commit that referenced this pull request Apr 14, 2026
…ensorSpecs (#18078)

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at
bottom):
* #18080
* #18328
* #18079
* __->__ #18078

Add end-to-end device type annotation support from export to runtime.
Currently we only support one device per graph.

The overall pipeline is:
a. The partitioner uses `compile_spec` to determine which device the
partitioned blob runs on.
b. After the partitioned graph is lowered to the backend, the newly
introduced propagate_device_pass annotates the input and output tensors
of the delegate blob with the target device.

Differential Revision:
[D95842511](https://our.internmc.facebook.com/intern/diff/D95842511/)
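The two pipeline steps in that commit message could be sketched roughly as below. This is an illustrative-only mock-up: `TensorSpec`, `DelegateBlob`, `propagate_device_pass`, and the `compile_spec` dict shape are all assumed names for the sketch, not the real ExecuTorch export APIs.

```python
# Hypothetical sketch of the export-side pipeline: (a) the target device
# comes from compile_spec, (b) the pass annotates the delegate blob's
# input and output tensor specs with that device. All names are made up.
from dataclasses import dataclass
from typing import List


@dataclass
class TensorSpec:
    name: str
    device: str = "cpu"  # device before the pass runs


@dataclass
class DelegateBlob:  # stand-in for the lowered, partitioned subgraph
    input_specs: List[TensorSpec]
    output_specs: List[TensorSpec]


def propagate_device_pass(blob: DelegateBlob, compile_spec: dict) -> DelegateBlob:
    # Step a: the partitioner's compile_spec names the target device
    # (one device per graph, as the commit message notes).
    device = compile_spec["device"]
    # Step b: annotate every input/output tensor spec of the delegate blob.
    for spec in blob.input_specs + blob.output_specs:
        spec.device = device
    return blob
```

With `compile_spec = {"device": "cuda"}`, every I/O tensor spec of the blob would end up annotated as `cuda`.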
Gasoonjia added a commit that referenced this pull request Apr 14, 2026
…ized Tensor (#18079)

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at
bottom):
* #18080
* #18328
* __->__ #18079
* #18078

Propagate device information from `TensorSpec.device` (set by
`PropagateDevicePass`) to the serialized `schema.Tensor` in the emitted
PTE file, so the runtime is aware of it.

Differential Revision:
[D95899706](https://our.internmc.facebook.com/intern/diff/D95899706/)
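One plausible shape for that emitter-side propagation is sketched below. This is an assumption-laden illustration: `serialize_tensor` is a made-up name, the dict stands in for the flatbuffer `schema.Tensor`/`ExtraTensorInfo` tables, and the choice to omit the annotation for CPU tensors is an assumption consistent with the runtime's CPU/0 default, not confirmed by the source.

```python
# Hypothetical sketch: copy TensorSpec.device into the serialized tensor's
# extra_tensor_info. Omitting the annotation for CPU (an assumption here)
# keeps CPU-only models on the legacy layout, which the runtime already
# interprets as CPU / index 0.
from typing import Optional


def serialize_tensor(spec_device: str, spec_index: int) -> dict:
    tensor: dict = {"extra_tensor_info": None}
    if spec_device != "cpu":
        tensor["extra_tensor_info"] = {
            "device_type": spec_device,
            "device_index": spec_index,
        }
    return tensor
```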
@Gasoonjia Gasoonjia merged commit c72f072 into gh/gasoonjia/143/base Apr 17, 2026
161 of 164 checks passed
@Gasoonjia Gasoonjia deleted the gh/gasoonjia/143/head branch April 17, 2026 02:14
@Gasoonjia Gasoonjia temporarily deployed to cherry-pick-bot April 17, 2026 02:14 — with GitHub Actions Inactive
Gasoonjia added a commit that referenced this pull request Apr 17, 2026
…r_parser (#18966)

This PR was created by the merge bot to help merge the original PR into
the main branch.
ghstack PR number: #18328 by
@Gasoonjia
^ Please use this as the source of truth for the PR details, comments,
and reviews
ghstack PR base:
https://github.com/pytorch/executorch/tree/gh/gasoonjia/143/base
ghstack PR head:
https://github.com/pytorch/executorch/tree/gh/gasoonjia/143/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head:
https://github.com/pytorch/executorch/tree/gh/gasoonjia/143/orig
Differential Revision:
[D97199497](https://our.internmc.facebook.com/intern/diff/D97199497/)
@diff-train-skip-merge

Co-authored-by: gasoonjia <gasoonjia@icloud.com>

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants