BenchtestKit

A pluggable harness for benchmarking ROCm software stacks. Each ROCm build (typically a full container image: ROCm + OS) is exercised by a set of micro-benchmarks (GEMM, attention, collective/PCIe/HBM bandwidth, ...) that emit a uniform, machine-readable result format.

Status: scaffold. This repository currently contains the architecture, the project standards, and a non-functional core skeleton. The individual benchmark plugins are being migrated in incrementally. See DESIGN.md for the full refactoring plan.

Why this exists

The previous generation of the harness grew organically: ad-hoc per-test bash and Python, results parsed out of mixed stdout, xlsx reports, duplicated regression scripts, and committed throwaway artifacts. This rewrite replaces that with one shared pipeline and a thin per-benchmark plugin.

Design at a glance

One pipeline, many plugins. Every benchmark follows build -> run -> parse -> result -> (optional) regression. The shared pipeline lives in benchtestkit/; each benchmark is a plugin under benchmarks/.
Output is CSV only. The canonical result is a tidy/long CSV (one metric per row), ideal for ingestion by an external data platform. An optional human-facing wide CSV can be derived from the same data.
Logs are separated from results. A run writes stdout.log / stderr.log next to the structured result.csv; they are never mixed.
Regression is pluggable and off by default. Backends: none (default), local (baseline CSV comparison, for open-source users), and kish (external data platform, planned).
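
As a rough illustration of the build -> run -> parse -> result flow described above, here is a minimal sketch in plain Python. Everything in it is an assumption made for the example: the Plugin protocol, the Metric row type, the EchoGemm toy plugin, and the column names are hypothetical and are not taken from the benchtestkit/ package. The toy plugin fakes its tool output rather than launching a real benchmark.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Metric:
    """One tidy row: a single metric value for a single benchmark case."""
    benchmark: str
    case: str
    metric: str
    value: float
    unit: str


class Plugin(Protocol):
    """Hypothetical plugin interface (an assumption, not the project's API)."""
    name: str

    def build(self) -> None: ...
    def run(self) -> str: ...
    def parse(self, stdout: str) -> Iterable[Metric]: ...


class EchoGemm:
    """Toy plugin that returns canned output instead of running a workload."""
    name = "example_gemm"

    def build(self) -> None:
        pass  # nothing to compile in the toy case

    def run(self) -> str:
        return "m=4096 n=4096 k=4096 tflops=123.4\n"

    def parse(self, stdout: str) -> Iterable[Metric]:
        for line in stdout.splitlines():
            fields = dict(tok.split("=", 1) for tok in line.split())
            case = "x".join(fields[d] for d in ("m", "n", "k"))
            yield Metric(self.name, case, "throughput",
                         float(fields["tflops"]), "TFLOP/s")


def run_pipeline(plugin: Plugin) -> str:
    """Drive build -> run -> parse and return the tidy CSV as text."""
    plugin.build()
    stdout = plugin.run()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["benchmark", "case", "metric", "value", "unit"])
    for m in plugin.parse(stdout):
        writer.writerow([m.benchmark, m.case, m.metric, m.value, m.unit])
    return out.getvalue()


tidy_csv = run_pipeline(EchoGemm())
print(tidy_csv)
```

The point of the sketch is the split of responsibilities: the shared driver owns ordering and CSV writing, while the plugin only knows how to build, launch, and parse its own workload.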

Layout

benchtestkit/      # core pipeline (knows nothing about any specific benchmark)
benchmarks/        # one plugin per benchmark (see benchmarks/_example)
vendor/            # third-party native sources without an upstream submodule
configs/           # global + per-benchmark parameter matrices
docs/standards/    # project standards (architecture, coding style, schema)
tests/             # unit tests for parsers, regression, etc.
runs/              # benchmark outputs (git-ignored)

Contributing

BenchtestKit uses a fork + pull request workflow with a strictly linear history (every change lands via Rebase and merge). See CONTRIBUTING.md for the full fork / branch / rebase / PR flow.

Standards

Contributors must follow the project standards under docs/standards/. They are surfaced to the Cursor agent as thin rules in .cursor/rules/.

Installation

python -m pip install -e ".[dev]"

Usage

benchtestkit list                                 # list available benchmarks
benchtestkit check-env                            # verify required tools/deps
benchtestkit run sustained_gemm flash_attention   # run a subset
benchtestkit run --all --regression none          # run everything, regression off (the default)
benchtestkit run --all --regression local         # gate against stored baselines
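
To give a feel for what gating against a stored baseline could involve, the following is a minimal sketch of a baseline CSV comparison. It is not the harness's regression backend: the column names, the 5% tolerance, and the assumption that a higher value is always better are all choices made for this example.

```python
import csv
import io


def load_tidy(text):
    """Map (benchmark, case, metric) -> value from tidy CSV text."""
    return {
        (r["benchmark"], r["case"], r["metric"]): float(r["value"])
        for r in csv.DictReader(io.StringIO(text))
    }


def find_regressions(baseline, current, tolerance=0.05):
    """Return entries whose current value fell more than `tolerance`
    below the baseline. Assumes higher is better for every metric."""
    failures = []
    for key, base in baseline.items():
        now = current.get(key)
        if now is None:
            continue  # metric absent from this run; ignored in this sketch
        if now < base * (1.0 - tolerance):
            failures.append((key, base, now))
    return failures


# Made-up numbers, purely to exercise the comparison.
BASELINE = """benchmark,case,metric,value
sustained_gemm,4096,throughput,100.0
hbm_bandwidth,copy,bandwidth,1000.0
"""
CURRENT = """benchmark,case,metric,value
sustained_gemm,4096,throughput,97.0
hbm_bandwidth,copy,bandwidth,900.0
"""

failures = find_regressions(load_tidy(BASELINE), load_tidy(CURRENT))
for key, base, now in failures:
    print(f"REGRESSION {key}: baseline={base} current={now}")
```

With these inputs only the hbm_bandwidth row is flagged, since a 3% drop stays inside the tolerance and a 10% drop does not. A real backend would also need per-metric direction (latency is lower-is-better) and a policy for missing rows.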

Each benchmark writes its results to runs/<run_id>/<benchmark>/: the tidy result.csv, an optional result_wide.csv, stdout.log and stderr.log, and meta.json. A run-level run.json is written as well.
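
The relationship between the tidy result.csv and the optional wide CSV can be sketched as a simple pivot: one row per case, one column per metric. The column names (case, metric, value, unit) and the sample rows below are assumptions for illustration; the actual schema is defined in the project standards.

```python
import csv
import io


def tidy_to_wide(tidy_text):
    """Pivot tidy rows (one metric per row) into one row per case."""
    columns = []  # metric columns in first-seen order
    wide = {}     # case -> {column: value}
    for r in csv.DictReader(io.StringIO(tidy_text)):
        column = f'{r["metric"]} ({r["unit"]})'
        if column not in columns:
            columns.append(column)
        wide.setdefault(r["case"], {})[column] = r["value"]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["case", *columns])
    for case, values in wide.items():
        # Cases that lack a metric get an empty cell.
        writer.writerow([case, *(values.get(c, "") for c in columns)])
    return out.getvalue()


# Made-up sample rows.
TIDY = """case,metric,value,unit
4096x4096x4096,throughput,123.4,TFLOP/s
4096x4096x4096,latency,1.1,ms
8192x8192x8192,throughput,130.0,TFLOP/s
"""

wide_csv = tidy_to_wide(TIDY)
print(wide_csv)
```

Because the wide form is derived, it can be regenerated at any time from the tidy file, which is why only the tidy CSV needs to be treated as canonical.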

Benchmarks

Each benchmark is a plugin under benchmarks/<name>/ (see its README.md):

pcie_bandwidth, hbm_bandwidth, p2p_bandwidth - PCIe / HBM / peer-to-peer bandwidth.
allreduce, alltoall - RCCL collective bandwidth.
paged_attention, flash_attention - paged-attention and flash-attention kernels.
sustained_gemm, peak_gemm - GEMM shape sweep and peak throughput.

External dependencies

The harness orchestrates workloads but does not bundle them. The target host or container must provide the tools each benchmark needs: TransferBench, hip-stream (built from vendor/), p2pBandwidthLatencyTest (built from vendor/), hipblaslt-bench, mpirun + RCCL perf tests, and the Python packages torch / vllm / flash_attn for the attention benchmarks. Run benchtestkit check-env before a run to see what is missing.
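
For readers curious what an environment check of this kind might look like, here is a minimal sketch using only the standard library. It is not the implementation behind benchtestkit check-env. The executable names in REQUIRED_TOOLS are assumptions about how the tools appear on PATH, and the "tool:" / "python:" prefixes are an invented reporting format.

```python
import importlib.util
import shutil

# Assumed executable names on PATH (hypothetical for this sketch).
REQUIRED_TOOLS = ["TransferBench", "hipblaslt-bench", "mpirun"]
# Python packages named above for the attention benchmarks.
REQUIRED_MODULES = ["torch", "vllm", "flash_attn"]


def missing_dependencies(tools, modules):
    """Return the tools not found on PATH and the modules not importable."""
    missing = [f"tool:{t}" for t in tools if shutil.which(t) is None]
    missing += [
        f"python:{m}" for m in modules
        if importlib.util.find_spec(m) is None  # looks up without importing
    ]
    return missing


if __name__ == "__main__":
    for item in missing_dependencies(REQUIRED_TOOLS, REQUIRED_MODULES):
        print("missing", item)
```

Using importlib.util.find_spec rather than a plain import keeps the check cheap: heavy packages such as torch are located but not loaded.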
