Add Microbenchmarking Infrastructure and CI Using GVSOC CSR by runwangdl · Pull Request #162 · pulp-platform/Deeploy

runwangdl · 2026-02-12T12:57:04Z

This PR adds a CSR-based microbenchmarking mechanism from GVSOC.

Added

Changed

Fixed

PR Merge Checklist

The PR is rebased on the latest devel commit and pointing to devel.
Your PR reviewed and approved.
All checks are passing.
The CHANGELOG.md file has been updated.
If the docker was modified, change back its link after review.

coderabbitai · 2026-02-12T13:00:52Z

📝 Walkthrough

Walkthrough

A new performance benchmarking header (perf_utils.h) is introduced, defining utilities to initialize, control, and read hardware performance counters on RISCV/PMSIS platforms. The GEMM implementation is instrumented to capture and report performance metrics during kernel execution.

Changes

Cohort / File(s)	Summary
Performance Benchmarking Utilities `TargetLibraries/PULPOpen/inc/perf_utils.h`	Introduces perf_stats_t struct to hold 15 performance counter fields (cycles, instructions, loads, stores, stalls, branches, misses, etc.) and five inline helper functions: perf_bench_init(), perf_bench_start(), perf_bench_stop(), perf_bench_read(), and perf_bench_print() for managing and reporting hardware counters; also includes perf_bench_diff() for computing counter deltas.
GEMM Instrumentation `TargetLibraries/PULPOpen/src/Gemm.c`	Adds performance counter instrumentation by including perf_utils.h, declaring perf_start/perf_end/perf_total snapshot structs, and calling benchmark functions at function entry and exit (core 0 only) to measure and print labeled performance metrics.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 inconclusive)

Check name	Status	Explanation	Resolution
Description check	❓ Inconclusive	The description mentions adding a CSR-based microbenchmarking mechanism but provides no detail about what was actually added; the template sections are empty.	Fill in the PR template sections with specifics about what was added (perf_utils.h with helper functions) and how it was integrated (instrumentation in Gemm.c).
Title check	❓ Inconclusive	The title mentions adding 'Microbenchmarking Infrastructure' but doesn't specify the key implementation detail that it's based on GVSOC CSR, and doesn't mention the GEMM instrumentation which is the primary concrete change.	Consider refining the title to be more specific about the primary change, such as 'Add performance benchmarking utilities and instrument GEMM with GVSOC CSR' to better reflect the main deliverables.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🤖 Fix all issues with AI agents

In `@TargetLibraries/PULPOpen/inc/perf_utils.h`:
- Around line 1-3: Add the project's SPDX header and copyright notice to the top
of perf_utils.h (same style as Gemm.c): insert an SPDX-License-Identifier line
and the matching copyright/ownership comment block as the file prologue so the
reuse pre-commit hook passes; update the header in
TargetLibraries/PULPOpen/inc/perf_utils.h (top-of-file comment) to match the
project's existing license metadata.

In `@TargetLibraries/PULPOpen/src/Gemm.c`:
- Around line 21-29: The perf counters are started on core 0 before checking for
an early return when M_size==0, so core 0 can return without calling
perf_bench_stop(), leaving counters running; move the early-return check (the
M_size == 0/NUM_CORES logic) to occur before calling perf_bench_init(),
perf_bench_start(), and perf_bench_read, or alternatively ensure that before any
return when core_id==0 you call perf_bench_stop() and perf_bench_read/print
cleanup; update the code around
perf_bench_init()/perf_bench_start()/perf_bench_read() and the early-return
condition (referencing core_id, M_size, NUM_CORES, perf_bench_stop) so core 0
always stops and cleans up counters.
- Line 9: The file TargetLibraries/PULPOpen/src/Gemm.c fails CI due to
clang-format changes; run your formatter or the repository pre-commit hooks to
apply the project's clang-format style (e.g., run clang-format on Gemm.c or
execute the repo's pre-commit script), then stage the modified Gemm.c and update
the PR so the include line and surrounding formatting match the project's style;
ensure the header include "perf_utils.h" and nearby code conform after
formatting.

🧹 Nitpick comments (4)

TargetLibraries/PULPOpen/inc/perf_utils.h (2)
139-157: Add const qualifiers to read-only pointer parameters.

end and start are not modified; mark them const for safety and self-documentation. Same applies to stats in perf_bench_print (Line 102).
Proposed fix
-static inline void perf_bench_diff(perf_stats_t *result,
-                                    perf_stats_t *end,
-                                    perf_stats_t *start) {
+static inline void perf_bench_diff(perf_stats_t *result,
+                                    const perf_stats_t *end,
+                                    const perf_stats_t *start) {
102-136: Heavy printf with %f formatting on an embedded target.

Floating-point printf (e.g., %.2f, %.3f) can pull in a large soft-float formatting library, significantly inflating code size on PULP. Consider whether this is acceptable for your target, or switch to scaled-integer printing if code size matters.
TargetLibraries/PULPOpen/src/Gemm.c (2)

25-29: The perf_start snapshot is redundant — counters were just reset.

perf_bench_start() calls pi_perf_reset() then pi_perf_start(), so all counters are zero. Reading into perf_start immediately after will yield near-zero values, making perf_bench_diff equivalent to just using perf_end. You can simplify by dropping the start snapshot and the diff, and printing perf_end directly.

365-376: Perf measurement captures only core 0's counters and includes non-kernel overhead.

The measurement window includes the chunk-setup arithmetic (Lines 31–34) and the snprintf label formatting (Lines 372–374), not just the GEMM kernel. On core 0, this is likely negligible noise, but be aware the reported numbers aren't purely kernel cycles. Also, only core 0's counters are captured — if multi-core behavior is of interest, you'd need per-core reporting or a barrier-based approach.

coderabbitai · 2026-02-12T13:00:55Z

TargetLibraries/PULPOpen/inc/perf_utils.h

+/*
+ * Performance Counter Utilities for PULP Benchmarking
+ */


⚠️ Potential issue | 🔴 Critical

Missing SPDX license header — CI is failing.

The pipeline reports that the reuse pre-commit hook fails because this file has no license identifier or copyright notice. Add an SPDX header consistent with the rest of the project (see Gemm.c for an example).

Proposed fix

-/* - * Performance Counter Utilities for PULP Benchmarking - */ +/* + * SPDX-FileCopyrightText: 2026 ETH Zurich and University of Bologna + * + * SPDX-License-Identifier: Apache-2.0 + * + * Performance Counter Utilities for PULP Benchmarking + */

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

/*

* Performance Counter Utilities for PULP Benchmarking

*/

/*

* SPDX-FileCopyrightText: 2026 ETH Zurich and University of Bologna

*

* SPDX-License-Identifier: Apache-2.0

*

* Performance Counter Utilities for PULP Benchmarking

*/

🧰 Tools

🪛 GitHub Actions: CI • Lint & Licenses

[error] 1-1: pre-commit hook 'reuse' failed: no license identifier and no copyright notice in perf_utils.h.

[error] 1-1: pre-commit hook 'clang-format' modified the file. Please re-run hooks to apply formatting changes.

🤖 Prompt for AI Agents

In `@TargetLibraries/PULPOpen/inc/perf_utils.h` around lines 1 - 3, Add the project's SPDX header and copyright notice to the top of perf_utils.h (same style as Gemm.c): insert an SPDX-License-Identifier line and the matching copyright/ownership comment block as the file prologue so the reuse pre-commit hook passes; update the header in TargetLibraries/PULPOpen/inc/perf_utils.h (top-of-file comment) to match the project's existing license metadata.

coderabbitai · 2026-02-12T13:00:55Z

TargetLibraries/PULPOpen/src/Gemm.c


 #include "DeeployPULPMath.h"
 #include "pmsis.h"
+#include "perf_utils.h"


⚠️ Potential issue | 🟡 Minor

CI clang-format failure — run the formatter.

The pipeline reports that clang-format modified this file. Re-run the pre-commit hooks to fix formatting before merge.

🤖 Prompt for AI Agents

In `@TargetLibraries/PULPOpen/src/Gemm.c` at line 9, The file TargetLibraries/PULPOpen/src/Gemm.c fails CI due to clang-format changes; run your formatter or the repository pre-commit hooks to apply the project's clang-format style (e.g., run clang-format on Gemm.c or execute the repo's pre-commit script), then stage the modified Gemm.c and update the PR so the include line and surrounding formatting match the project's style; ensure the header include "perf_utils.h" and nearby code conform after formatting.

coderabbitai · 2026-02-12T13:00:55Z

TargetLibraries/PULPOpen/src/Gemm.c

+  // Performance monitoring structures
+  perf_stats_t perf_start, perf_end, perf_total;
+
+  // Initialize and start performance counters (only core 0)
+  if (core_id == 0) {
+    perf_bench_init();
+    perf_bench_start();
+    perf_bench_read(&perf_start);
+  }


⚠️ Potential issue | 🔴 Critical

Bug: early return on Line 37 leaves perf counters running on core 0.

If M < NUM_CORES, core 0 may get M_size == 0 and return at Line 37 without ever executing perf_bench_stop() (Line 368). This leaves counters running and skips the cleanup/print block entirely.

Move the early-return check before the perf init, or add stop logic before the return:

Proposed fix (option A — move early return before perf init)

int8_t core_id = pi_core_id(); int8_t log2Core = LOG2(NUM_CORES); + uint32_t M_chunk = (M >> log2Core) + ((M & (NUM_CORES - 1)) != 0); + uint32_t M_start = MIN(core_id * M_chunk, M); + uint32_t M_end = MIN(M_start + M_chunk, M); + uint32_t M_size = M_end - M_start; + + if (M_size == 0) { + return; + } + // Performance monitoring structures perf_stats_t perf_start, perf_end, perf_total; // Initialize and start performance counters (only core 0) if (core_id == 0) { perf_bench_init(); perf_bench_start(); perf_bench_read(&perf_start); } - uint32_t M_chunk = (M >> log2Core) + ((M & (NUM_CORES - 1)) != 0); - uint32_t M_start = MIN(core_id * M_chunk, M); - uint32_t M_end = MIN(M_start + M_chunk, M); - uint32_t M_size = M_end - M_start; - - if (M_size == 0) { - return; - }

Also applies to: 36-38

🤖 Prompt for AI Agents

In `@TargetLibraries/PULPOpen/src/Gemm.c` around lines 21 - 29, The perf counters are started on core 0 before checking for an early return when M_size==0, so core 0 can return without calling perf_bench_stop(), leaving counters running; move the early-return check (the M_size == 0/NUM_CORES logic) to occur before calling perf_bench_init(), perf_bench_start(), and perf_bench_read, or alternatively ensure that before any return when core_id==0 you call perf_bench_stop() and perf_bench_read/print cleanup; update the code around perf_bench_init()/perf_bench_start()/perf_bench_read() and the early-return condition (referencing core_id, M_size, NUM_CORES, perf_bench_stop) so core 0 always stops and cleans up counters.

Xeratec

As discussed with @runwangdl offline, I am strongly against modifying the kernel libraries to add profiling code to every individual kernel.

Instead, I suggest adding a performance profiling code transformation that can be configured to different modes:

"kernel": Profile the call to the kernel function (potentially for multiple cores)
"tile": Profile the function inside the tiling loop (effectively profiling each tile)
"layer ": Profile each layer from a given memory level
"network": Profile the full network

Not that kernel in this case will include the small overhead of the function call. This functionality can be merged with the existing profiling infrastructure (probably requires some refactoring, though)

@Victor-Jung What do you think?

Example Artificial Logging Output

Optimally, I would imagine something like the output below, with a hierarchical breakdown of the different function calls and runtime and performance statistics at different levels.

Note that the numbers are artificial.

=== Cycle Breakdown ====================================================================================================
[network] 100855 cyc (100.00%)
- [matmul_node_1] 98876 cyc (58.04%)
  - [L3-L2][Tile 0]      38363 cyc, 6070 cyc (15.8%) kernel, 32293 cyc (84.2%) overhead
    - [L2-L1][Tile 0]     5642 cyc, 4716 cyc (83.6%) kernel,   926 cyc (16.4%) overhead
  - [L3-L2][Tile 1]      24095 cyc, 3599 cyc (14.9%) kernel, 20596 cyc (85.1%) overhead
    - [L2-L1][Tile 1]     3180 cyc, 2719 cyc (85.5%) kernel,   461 cyc (14.5%) overhead
  - [L3-L2][Tile 2]      24141 cyc, 5881 cyc (24.4%) kernel, 18260 cyc (75.6%) overhead
    - [L2-L1][Tile 2]     5526 cyc, 4653 cyc (84.2%) kernel,   873 cyc (15.8%) overhead
  - [L3-L2][Tile 3]      11550 cyc, 3598 cyc (31.2%) kernel,  7952 cyc (68.8%) overhead
    - [L2-L1][Tile 3]     3213 cyc, 2773 cyc (86.3%) kernel,   440 cyc (13.7%) overhead
- [matmul_node_2] 98876 cyc (38.04%)
  - [L3-L2][Tile 0]      38363 cyc, 6070 cyc (15.8%) kernel, 32293 cyc (84.2%) overhead
    - [L2-L1][Tile 0]     5642 cyc, 4716 cyc (83.6%) kernel,   926 cyc (16.4%) overhead
  - [L3-L2][Tile 1]      24095 cyc, 3599 cyc (14.9%) kernel, 20596 cyc (85.1%) overhead
    - [L2-L1][Tile 1]     3180 cyc, 2719 cyc (85.5%) kernel,   461 cyc (14.5%) overhead
========================================================================================================================

=== Statistics [network] ===============================================================================================
- [All Cores]
    Cycles          : 5642
    Operations      : 5642
    Ops/Cycle       : 1.000
- [Core 0]
    Cycles          : 5642
    Instructions    : 4716
    Ins/Cycle       : 0.836
    Loads           : 5142 (38.73%)
    Stores          :  148 (1.11%)
    Branches        :   93 (0.70%)
    Taken Branches  :   12 (12.90%)
    Load Stalls     : 402
    Jump Stalls     :   0
    I$ Misses       : 647
    TCDM Cont.      :   0
- [Core 1]
    Cycles          : 5642
    Instructions    : 4716
    Ins/Cycle       : 0.836
    Loads           : 5142 (38.73%)
    Stores          :  148 (1.11%)
    Branches        :   93 (0.70%)
    Taken Branches  :   12 (12.90%)
    Load Stalls     : 402
    Jump Stalls     :   0
    I$ Misses       : 647
    TCDM Cont.      :   0
[...]
========================================================================================================================

=== Statistics [matmul_node_1] =========================================================================================
[...]
========================================================================================================================

=== Statistics [matmul_node_1][L3-L2][Tile 0] ==========================================================================
[...]
========================================================================================================================

=== Statistics [matmul_node_1][L2-L1][Tile 0] ==========================================================================
[...]
========================================================================================================================

=== Statistics [matmul_node_1][L2-L1][Tile 0][PULP_MatMul_fp32_fp32_fp32_unroll1x7] ====================================
[...]
========================================================================================================================

runwangdl requested review from Victor-Jung and Xeratec as code owners February 12, 2026 12:57

runwangdl marked this pull request as draft February 12, 2026 12:57

runwangdl self-assigned this Feb 12, 2026

coderabbitai bot reviewed Feb 12, 2026

View reviewed changes

runwangdl changed the title ~~Add Microbenchmarking Infrastructure Using GVSOC CSR~~ Add Microbenchmarking Infrastructure and CI Using GVSOC CSR Feb 12, 2026

Deeploy Microbenchmark with GVSoC CSR and Demo on GEMM

49cddd2

runwangdl force-pushed the microbenchmark branch from a6231d0 to 49cddd2 Compare February 12, 2026 15:28

Xeratec reviewed Feb 13, 2026

View reviewed changes

Xeratec added the Feature Addition of new features label Feb 13, 2026

Xeratec added this to Deeploy Feb 13, 2026

Xeratec added this to the Release 0.2.2 milestone Feb 13, 2026

Xeratec moved this to In progress in Deeploy Feb 13, 2026

Add microbenchmark to codepass

609179c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Microbenchmarking Infrastructure and CI Using GVSOC CSR #162

Add Microbenchmarking Infrastructure and CI Using GVSOC CSR #162
runwangdl wants to merge 2 commits intopulp-platform:develfrom
runwangdl:microbenchmark

runwangdl commented Feb 12, 2026

Uh oh!

coderabbitai bot commented Feb 12, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Feb 12, 2026

Uh oh!

coderabbitai bot Feb 12, 2026

Uh oh!

coderabbitai bot Feb 12, 2026

Uh oh!

Xeratec left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

runwangdl commented Feb 12, 2026

Added

Changed

Fixed

PR Merge Checklist

Uh oh!

coderabbitai bot commented Feb 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

Xeratec left a comment

Choose a reason for hiding this comment

Example Artificial Logging Output

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai bot commented Feb 12, 2026 •

edited

Loading