[Cadence: Vision] ResNet18 & ResNet50: Optimized, DMA-enabled, functional #19111

Open
cad-rlc wants to merge 8 commits into pytorch:main from cad-rlc:main


@cad-rlc cad-rlc commented Apr 24, 2026

Summary

Optimized Cadence Vision DSP operators for ResNet18 and ResNet50 inference. All operators are DMA-enabled with ping-pong tiling and functionally verified (int8 quantized, NCHW layout).
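
For readers unfamiliar with ping-pong tiling, the following is a minimal host-side sketch of the buffer alternation only. It is not the code in this PR: `process_ping_pong`, `tile_len`, and `compute` are hypothetical names, and the "DMA transfer" is modeled as a plain list copy that runs sequentially rather than overlapping with compute.

```python
from typing import Callable, List, Sequence


def process_ping_pong(
    src: Sequence[int],
    tile_len: int,
    compute: Callable[[List[int]], List[int]],
) -> List[int]:
    """Process `src` tile by tile using two alternating local buffers."""
    if tile_len <= 0:
        raise ValueError("tile_len must be positive")
    buffers: List[List[int]] = [[], []]  # stand-ins for two local-memory tiles
    out: List[int] = []
    n_tiles = (len(src) + tile_len - 1) // tile_len
    if n_tiles == 0:
        return out
    # Prologue: fetch tile 0 into buffer 0 (models the first DMA transfer).
    buffers[0] = list(src[0:tile_len])
    for t in range(n_tiles):
        cur, nxt = t % 2, (t + 1) % 2
        if t + 1 < n_tiles:
            # On hardware this fetch would be in flight while `compute` runs;
            # here it is sequential and only shows which buffer is refilled.
            start = (t + 1) * tile_len
            buffers[nxt] = list(src[start:start + tile_len])
        out.extend(compute(buffers[cur]))
    return out
```

The point of the two buffers is that the tile being computed on is never the one being refilled, which is what would allow a real DMA engine to hide transfer latency behind compute.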

Operators

Conv2d (quantized_conv2d_nchw)

  • Kernel variants: 7x7j2, 3x3j1, 3x3j2, 1x1j1, 1x1j2
  • Modes: DMA ping-pong tiling (with iDMA) and cache-only (no DMA)
  • Dispatch: Automatic kernel selection based on layer config (kernel size, stride, dilation)
  • Quantization: int8 asymmetric input × symmetric weights, per-tensor output scaling
  • Bias correction: 24-bit clamped kernel bias with post-kernel residual correction
  • Config generator: Python tool to generate per-DRAM-size layer config headers
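
The dispatch rule above could be sketched roughly as a table lookup. This is an illustration under assumptions, not the PR's dispatcher: `ConvLayer` and `select_conv_variant` are hypothetical names, the `jN` suffix is read as "stride N", and requiring unit dilation and equal strides is a guess at what the optimized kernels expect.

```python
from typing import NamedTuple, Optional, Tuple


class ConvLayer(NamedTuple):
    """Hypothetical layer description used only for this sketch."""
    kernel: Tuple[int, int]
    stride: Tuple[int, int]
    dilation: Tuple[int, int] = (1, 1)


# (kernel size, stride) -> variant name, taken from the list in the description.
_VARIANTS = {
    ((7, 7), 2): "7x7j2",
    ((3, 3), 1): "3x3j1",
    ((3, 3), 2): "3x3j2",
    ((1, 1), 1): "1x1j1",
    ((1, 1), 2): "1x1j2",
}


def select_conv_variant(layer: ConvLayer) -> Optional[str]:
    """Return the matching variant name, or None if no variant applies."""
    if layer.dilation != (1, 1):
        return None  # assumption: optimized variants need unit dilation
    if layer.stride[0] != layer.stride[1]:
        return None  # assumption: optimized variants need equal H/W stride
    return _VARIANTS.get((layer.kernel, layer.stride[0]))
```

For example, `select_conv_variant(ConvLayer((7, 7), (2, 2)))` picks the variant a ResNet stem convolution would use, while a `None` result stands for whatever fallback path the backend provides.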

MaxPool2d (maxpool_exec_mxnj2)

  • Kernel: Arbitrary MxN kernel size, stride-2
  • Modes: DMA tiled and cache-only (no DMA)
  • Layout: NCHW float32
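
As a point of comparison for functional checks, an MxN, stride-2 max pool over a single channel plane can be written as a plain reference. This sketch assumes no padding and floor-mode output sizing, neither of which is stated in the description; `maxpool_mxn_stride2` is a hypothetical name.

```python
from typing import List


def maxpool_mxn_stride2(plane: List[List[float]], m: int, n: int) -> List[List[float]]:
    """Max pool one HxW plane with an m x n window and stride 2 (no padding)."""
    if m <= 0 or n <= 0:
        raise ValueError("kernel dimensions must be positive")
    h = len(plane)
    w = len(plane[0]) if h else 0
    # Floor-mode output size; non-positive sizes yield an empty result.
    out_h = max((h - m) // 2 + 1, 0)
    out_w = max((w - n) // 2 + 1, 0)
    return [
        [
            max(plane[2 * i + di][2 * j + dj] for di in range(m) for dj in range(n))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]
```

In NCHW layout the same function would be applied independently to each of the N×C planes.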

Mean / AdaptiveAvgPool (mean_exec_dma)

  • Kernel: SIMD-optimized channel-wise mean with DMA tiling
  • Layout: NCHW float32, reduces spatial dims to 1x1
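
The reduction itself is simple to state as a reference: each channel plane collapses to its average, giving an N×C×1×1 result. The sketch below uses nested lists and a hypothetical `channel_mean_nchw` name; it says nothing about how the SIMD kernel or its DMA tiling is organized.

```python
from typing import List

Tensor4D = List[List[List[List[float]]]]


def channel_mean_nchw(x: Tensor4D) -> Tensor4D:
    """Average each HxW plane of an NCHW tensor, returning shape N x C x 1 x 1."""
    out: Tensor4D = []
    for sample in x:
        pooled = []
        for plane in sample:
            count = sum(len(row) for row in plane)
            if count == 0:
                raise ValueError("empty spatial plane")
            total = sum(v for row in plane for v in row)
            pooled.append([[total / count]])
        out.append(pooled)
    return out
```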

Quantize / Dequantize (quantize_per_tensor, dequantize_per_tensor)

  • Modes: DMA ping-pong and HW-optimized (no DMA)
  • Types: int8 asymmetric (asym8s)
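
For reference, per-tensor asymmetric int8 quantization is usually defined as q = clamp(round(x / scale) + zero_point, -128, 127), with dequantization x ≈ (q - zero_point) * scale. The sketch below follows that definition with hypothetical function names. Python's `round` rounds ties to even, and the rounding mode of the DSP kernels is not stated in the description, so results on exact .5 ties may differ.

```python
from typing import List

INT8_MIN, INT8_MAX = -128, 127


def quantize_per_tensor_ref(x: List[float], scale: float, zero_point: int) -> List[int]:
    """Quantize floats to int8 with a single scale and zero point."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    # round() uses ties-to-even; the hardware rounding mode is an assumption.
    return [max(INT8_MIN, min(INT8_MAX, round(v / scale) + zero_point)) for v in x]


def dequantize_per_tensor_ref(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Map int8 values back to floats."""
    return [(v - zero_point) * scale for v in q]
```

A round trip through both functions recovers the input to within half a quantization step, except where the value was clamped.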

Quantized ReLU (quantized_relu)

  • Modes: DMA ping-pong and HW-optimized (no DMA)
  • Type: int8 clamp
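
If input and output share the same scale and zero point (an assumption; the actual operator may also requantize), ReLU on quantized data reduces to clamping each value at the zero point, since the zero point is the int8 code for real 0.0. A minimal sketch with a hypothetical function name:

```python
from typing import List


def quantized_relu_ref(q: List[int], zero_point: int) -> List[int]:
    """ReLU in the quantized domain: values below the zero point become the zero point."""
    if not -128 <= zero_point <= 127:
        raise ValueError("zero_point must fit in int8")
    # Assumes identical input/output quantization parameters.
    return [max(v, zero_point) for v in q]
```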

Quantized Linear (quantized_linear_out)

  • Mode: SIMD with DMA tiling
  • Type: int8 input × int8 weights, int32 bias
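
The arithmetic behind an int8 × int8 linear layer with int32 bias can be sketched as follows: subtract zero points, accumulate in a wide integer, add the bias, then rescale and clamp to int8. Names and parameters here (`quantized_linear_ref`, `requant_scale`, `out_zp`) are hypothetical, and a floating-point multiplier stands in for whatever fixed-point requantization the kernel actually uses.

```python
from typing import List


def quantized_linear_ref(
    x: List[int],
    w: List[List[int]],
    bias: List[int],
    x_zp: int,
    w_zp: int,
    requant_scale: float,
    out_zp: int,
) -> List[int]:
    """Reference int8 linear: one output per weight row."""
    if len(w) != len(bias):
        raise ValueError("need one bias value per weight row")
    out = []
    for row, b in zip(w, bias):
        if len(row) != len(x):
            raise ValueError("weight row length must match input length")
        acc = b  # Python ints are unbounded; hardware would use an int32 accumulator
        for xi, wi in zip(x, row):
            acc += (xi - x_zp) * (wi - w_zp)
        q = round(acc * requant_scale) + out_zp  # float rescale is an assumption
        out.append(max(-128, min(127, q)))
    return out
```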

Add (op_add)

  • Mode: DMA ping-pong element-wise float32 add

Softmax (op_softmax)

  • Mode: HW-optimized softmax

Build Configuration

  • Supports configurable DRAM buffer sizes
  • Automatic DMA vs cache-only dispatch based on DRAM availability
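
One way such a dispatch could work is a simple capacity check: use the DMA path only if the ping-pong buffers fit in the configured DRAM budget. The sizing rule below (two input tiles plus two output tiles) and the names `choose_exec_mode`, `"dma"`, and `"cache_only"` are assumptions for illustration; the description does not state the actual criterion.

```python
def choose_exec_mode(in_tile_bytes: int, out_tile_bytes: int, dram_bytes: int) -> str:
    """Pick "dma" when double-buffered tiles fit in the DRAM budget, else "cache_only"."""
    if in_tile_bytes < 0 or out_tile_bytes < 0 or dram_bytes < 0:
        raise ValueError("sizes must be non-negative")
    # Ping-pong needs two copies of each tile (assumed sizing rule).
    required = 2 * (in_tile_bytes + out_tile_bytes)
    return "dma" if dram_bytes >= required else "cache_only"
```

With 1 KiB input and output tiles, for instance, this rule selects the DMA path once at least 4 KiB of DRAM is available.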

cc @mcremon-meta @hsharma35 @zonglinpengmeta


pytorch-bot Bot commented Apr 24, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19111

Note: Links to docs will display an error until the docs builds have been completed.

❗ 2 Active SEVs

There are 2 currently active SEVs. If your PR is affected, please view them below:

⚠️ 11 Awaiting Approval

As of commit 93271c8 with merge base 513a4ea:

AWAITING APPROVAL - The following workflows need approval before CI can run:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla Bot added the "CLA Signed" label Apr 24, 2026
github-actions Bot added the "ciflow/trunk" and "module: arm" labels Apr 24, 2026

pytorch-bot Bot commented Apr 24, 2026

The following ciflow label(s) have been added but CI has not been triggered yet because the workflows are awaiting approval:

  • ciflow/trunk

Once a maintainer approves the workflows (scroll to the bottom of the PR page), the corresponding CI jobs will be triggered automatically. Please ping one of the reviewers if you do not have access to approve and run workflows.

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Author

cad-rlc commented May 8, 2026

@mcremon-meta @hsharma35 @zonglinpengmeta
This is the final PR for the ResNet18 and ResNet50 models.

Contributor

@mcremon-meta mcremon-meta left a comment

Will continue the review later, but can we clean the set of files first? I don't quite understand why we have so many files checked in, including CMakeFiles etc.

Comment thread 1 Outdated
@@ -0,0 +1,25 @@
Collecting matplotlib
Contributor

not sure what this file is?

Comment thread backends/cadence/aot/functions.yaml Outdated
kernel_name: impl::generic::quantized_matmul_asym8uxasym8u_asym8u_out

- func: cadence::im2row.out(Tensor input, int[2] kernel_size, int[2] dilation, int[2] padding, int[2] stride, Tensor in_zero_point, bool channel_last=False, *, Tensor(a!) out) -> Tensor(a!)
- func: cadence::im2row.out(Tensor input, int[2] kernel_size, int[2] dilation, int[2] padding, int[2] stride, Tensor in_zero_point, bool channel_last, *, Tensor(a!) out) -> Tensor(a!)
Contributor

why is this needed?

Author

cad-rlc commented May 15, 2026

@mcremon-meta A few stale files were accidentally committed in this pull request. We are addressing the issue and will submit a new PR shortly.


linux-foundation-easycla Bot commented May 28, 2026

CLA Missing ID

Suraj Raut added 7 commits May 28, 2026 07:16
…onal

- Add DMA-optimized operators: conv2d (1x1/3x3/7x7), maxpool, quantize/dequantize, relu, add, mean, softmax, linear
- Add new operators: embedding, full, im2row, quantized_fully_connected, quantized_layer_norm, quantized_matmul, requantize, view_copy
- Add vision/kernels library and quantized_ops.h header
- Add config generator for DMA buffer sizing
- Update functions_vision.yaml and CMakeLists.txt
- Add third-party XAI libraries (libxai, libxai_common, libxa_nnlib)
- FACTO submodule update

Labels

ciflow/trunk, CLA Signed, module: arm


2 participants