maple52046/benchtestkit

 
 

BenchtestKit

A pluggable harness for benchmarking ROCm software stacks. Each ROCm build (typically a full container image: ROCm + OS) is exercised by a set of micro-benchmarks (GEMM, attention, collective/PCIe/HBM bandwidth, ...) that emit a uniform, machine-readable result format.

Status: scaffold. This repository currently contains the architecture, the project standards, and a non-functional core skeleton. The individual benchmark plugins are being migrated in incrementally; see DESIGN.md for the full refactoring plan.

Why this exists

The previous generation of the harness grew organically: ad-hoc per-test bash and Python, results parsed out of mixed stdout, xlsx reports, duplicated regression scripts, and committed throwaway artifacts. This rewrite replaces that with one shared pipeline and a thin per-benchmark plugin.

Design at a glance

  • One pipeline, many plugins. Every benchmark follows build -> run -> parse -> result -> (optional) regression. The shared pipeline lives in benchtestkit/; each benchmark is a plugin under benchmarks/.
  • Output is CSV only. The canonical result is a tidy/long CSV (one metric per row), ideal for ingestion by an external data platform. An optional human-facing wide CSV can be derived from the same data.
  • Logs are separated from results. A run writes stdout.log / stderr.log next to the structured result.csv; they are never mixed.
  • Regression is pluggable and off by default. Backends: none (default), local (baseline CSV comparison, for open-source users), and kish (external data platform, planned).
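The local backend's "baseline CSV comparison" can be pictured with a minimal sketch like the one below. It is not the harness's implementation: the column names (benchmark, metric, value, unit), the metric names, the 5% tolerance, and the assumption that higher values are better are all illustrative, and the numbers are placeholders rather than measurements.

```python
import csv
import io


def load_tidy(text):
    """Map (benchmark, metric) -> float value from a tidy CSV string."""
    rows = csv.DictReader(io.StringIO(text))
    return {(r["benchmark"], r["metric"]): float(r["value"]) for r in rows}


def find_regressions(baseline, current, tolerance=0.05):
    """Return metrics that dropped more than `tolerance` below the baseline.

    Assumes higher is better (bandwidth, throughput); latency-style
    metrics would need the comparison flipped.
    """
    regressions = []
    for key, base in sorted(baseline.items()):
        if key not in current:
            continue  # metric missing from this run; nothing to compare
        cur = current[key]
        if base > 0 and (base - cur) / base > tolerance:
            regressions.append((key, base, cur))
    return regressions


# Hypothetical column layout and placeholder values.
BASELINE = """benchmark,metric,value,unit
hbm_bandwidth,copy_bw,1000.0,GB/s
peak_gemm,fp16_tflops,500.0,TFLOPS
"""
CURRENT = """benchmark,metric,value,unit
hbm_bandwidth,copy_bw,990.0,GB/s
peak_gemm,fp16_tflops,450.0,TFLOPS
"""

regressions = find_regressions(load_tidy(BASELINE), load_tidy(CURRENT))
for (bench, metric), base, cur in regressions:
    print(f"REGRESSION {bench}/{metric}: {base} -> {cur}")
```

With these placeholder inputs only the peak_gemm row is flagged: a 1% drop stays inside the tolerance, a 10% drop does not.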

Layout

benchtestkit/      # core pipeline (knows nothing about any specific benchmark)
benchmarks/        # one plugin per benchmark (see benchmarks/_example)
vendor/            # third-party native sources without an upstream submodule
configs/           # global + per-benchmark parameter matrices
docs/standards/    # project standards (architecture, coding style, schema)
tests/             # unit tests for parsers, regression, etc.
runs/              # benchmark outputs (git-ignored)

Contributing

BenchtestKit uses a fork + pull request workflow with a strictly linear history (every change lands via Rebase and merge). See CONTRIBUTING.md for the full fork / branch / rebase / PR flow.

Standards

Contributors must follow the project standards under docs/standards/. They are surfaced to the Cursor agent as thin rules in .cursor/rules/.

Installation

python -m pip install -e ".[dev]"

Usage

benchtestkit list                       # list available benchmarks
benchtestkit check-env                  # verify required tools/deps
benchtestkit run sustained_gemm flash_attention   # run a subset
benchtestkit run --all --regression none
benchtestkit run --all --regression local   # gate against stored baselines

Each benchmark writes its output to runs/<run_id>/<benchmark>/: the tidy result.csv, an optional result_wide.csv, stdout.log and stderr.log, and meta.json. The run as a whole is described by a run-level run.json.
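To illustrate how a wide CSV can be derived from the tidy one, here is a small standard-library sketch. The column names (config, metric, value) and the sample rows are assumptions made for the example; the actual schema is defined under docs/standards/.

```python
import csv
import io


def tidy_to_wide(tidy_text, index="config"):
    """Pivot tidy rows (one metric per row) into one row per `index` value."""
    rows = list(csv.DictReader(io.StringIO(tidy_text)))
    metrics = []  # wide columns, in order of first appearance
    wide = {}     # index value -> {metric: value}
    for r in rows:
        if r["metric"] not in metrics:
            metrics.append(r["metric"])
        wide.setdefault(r[index], {})[r["metric"]] = r["value"]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([index] + metrics)
    for key, values in wide.items():
        # Metrics missing for a given config are left as empty cells.
        writer.writerow([key] + [values.get(m, "") for m in metrics])
    return out.getvalue()


# Hypothetical tidy input; values are placeholders, not measurements.
TIDY = """config,metric,value
m4096_n4096_k4096,tflops,100.0
m4096_n4096_k4096,latency_us,50.0
m8192_n8192_k8192,tflops,120.0
"""

wide_csv = tidy_to_wide(TIDY)
print(wide_csv)
```

Because the wide form is computed from the tidy rows, the tidy file stays the single source of truth and the wide file can be regenerated at any time.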

Benchmarks

Each benchmark is a plugin under benchmarks/<name>/ (see its README.md):

  • pcie_bandwidth, hbm_bandwidth, p2p_bandwidth - PCIe / HBM / peer-to-peer bandwidth.
  • allreduce, alltoall - RCCL collective bandwidth.
  • paged_attention, flash_attention - paged-attention and flash-attention kernels.
  • sustained_gemm, peak_gemm - GEMM shape sweep and peak throughput.
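As a rough picture of how a plugin fits the build -> run -> parse -> result pipeline, the sketch below runs a stand-in workload and writes stdout.log, stderr.log, and result.csv side by side. The class, its method names, and the key=value output format are hypothetical and chosen only for illustration; the real plugin interface is the one shown in benchmarks/_example.

```python
import csv
import subprocess
import sys
import tempfile
from pathlib import Path


class ExamplePlugin:
    """Hypothetical plugin; the method names are not the harness API."""

    name = "example"

    def build(self):
        # Nothing to compile for this toy workload.
        return None

    def command(self):
        # Stand-in workload: one metric on stdout, log noise on stderr.
        code = (
            "import sys; print('bandwidth_gbps=12.5'); "
            "print('warmup done', file=sys.stderr)"
        )
        return [sys.executable, "-c", code]

    def parse(self, stdout):
        # Turn 'key=value' lines into tidy rows (one metric per row).
        rows = []
        for line in stdout.splitlines():
            if "=" in line:
                metric, value = line.split("=", 1)
                rows.append(
                    {"benchmark": self.name, "metric": metric, "value": value}
                )
        return rows


def run_plugin(plugin, out_dir):
    """build -> run -> parse -> result, keeping logs apart from result.csv."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    plugin.build()
    proc = subprocess.run(
        plugin.command(), capture_output=True, text=True, check=True
    )
    (out_dir / "stdout.log").write_text(proc.stdout)
    (out_dir / "stderr.log").write_text(proc.stderr)
    rows = plugin.parse(proc.stdout)
    with open(out_dir / "result.csv", "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["benchmark", "metric", "value"])
        writer.writeheader()
        writer.writerows(rows)
    return rows


with tempfile.TemporaryDirectory() as tmp:
    rows = run_plugin(ExamplePlugin(), Path(tmp) / "example")
    result_text = (Path(tmp) / "example" / "result.csv").read_text()
    log_names = sorted(p.name for p in (Path(tmp) / "example").iterdir())
```

The point of the split is that the shared runner owns process handling and file layout, while the plugin only supplies a command and a parser.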

External dependencies

The harness orchestrates workloads but does not bundle them; the target host or container must provide the tools each benchmark needs:

  • TransferBench
  • hip-stream and p2pBandwidthLatencyTest (both built from vendor/)
  • hipblaslt-bench
  • mpirun and the RCCL perf tests
  • the Python packages torch, vllm, and flash_attn (for the attention benchmarks)

Run benchtestkit check-env before a run to see what is missing.
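The kind of check that check-env performs can be approximated by hand with the standard library. This is a sketch, not the command's actual logic: it assumes the tools are installed on PATH under the names listed above, and it only tests whether the Python packages can be located, not whether they work against the installed ROCm build.

```python
import importlib.util
import shutil


def missing_dependencies(tools, modules):
    """Return (missing executables, missing Python modules)."""
    # shutil.which returns None when no executable of that name is on PATH.
    missing_tools = [t for t in tools if shutil.which(t) is None]
    # find_spec locates a top-level module without importing it.
    missing_modules = [
        m for m in modules if importlib.util.find_spec(m) is None
    ]
    return missing_tools, missing_modules


# Names taken from the dependency list; the on-PATH spelling is an assumption.
TOOLS = ["TransferBench", "hipblaslt-bench", "mpirun"]
MODULES = ["torch", "vllm", "flash_attn"]

missing_tools, missing_modules = missing_dependencies(TOOLS, MODULES)
print("missing tools:", missing_tools or "none")
print("missing Python packages:", missing_modules or "none")
```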
