collector: add ebsnvme collector for Amazon EBS NVMe performance stats (Feat/ebsnvme collector) #3695
Open
AllenXieSZ wants to merge 3 commits into
* Fix kernel_hung for no data (prometheus#3613)
  Return an ErrNoData for the kernel_hung collector if the file does not exist.
  Fixes: prometheus#3612
  Signed-off-by: Ben Kochie <superq@gmail.com>
* Release v1.11.1 (prometheus#3615)
  * [BUGFIX] Fix kernel_hung for no data prometheus#3613
  Signed-off-by: Ben Kochie <superq@gmail.com>
---------
Signed-off-by: Ben Kochie <superq@gmail.com>
Add a new (disabled-by-default) Linux collector, ebsnvme, that exposes the Amazon EBS detailed performance statistics vended by Nitro-based EC2 instances through the EBS NVMe device log page (log page 0xD0).

The collector reads the EBS statistics log page from each EBS-backed NVMe device via an NVMe admin ioctl, parses the binary structure, and exposes read/write ops, bytes, time, IOPS/throughput exceeded counters, queue length, and read/write latency histograms. Metrics are labelled by volume_id, device, and mount_path.

The log-page parsing logic is derived from the Amazon EBS CSI Driver (pkg/metrics/nvme.go), Apache-2.0, Copyright The Kubernetes Authors. Statistic names and semantics follow the Amazon EBS User Guide: https://docs.aws.amazon.com/ebs/latest/userguide/nvme-detailed-performance-stats.html

Signed-off-by: Allen Xie <weifeng.xie@qq.com>
Opening the NVMe character device and issuing the admin passthru ioctl requires CAP_SYS_ADMIN (in practice, root). Note this in the package doc comment and the README so users running node_exporter as an unprivileged user understand why no node_ebs_* metrics appear. Signed-off-by: Allen Xie <weifeng.xie@qq.com>
SuperQ (Member) requested changes on Jun 21, 2026 and left a comment:
I'm not sure this is an appropriate feature for the node_exporter. I think it is a bit too vendor-specific.
What do you think, @discordianfish?
This adds a new disabled-by-default Linux collector, ebsnvme, that exposes the Amazon EBS detailed performance statistics that Nitro-based EC2 instances vend through the EBS NVMe device log page (log page 0xD0).
What it does
For every EBS-backed NVMe device, the collector:
Maps NVMe devices to EBS volume IDs and mount paths via lsblk -nd --json -o NAME,SERIAL,MOUNTPOINT.
Reads EBS statistics log page 0xD0 via an NVMe admin ioctl (NVME_IOCTL_ADMIN_CMD), opening the device read-only.
Parses the binary EBS statistics structure (validated by its magic number 0x3C23B510).
Exposes the values as Prometheus metrics labelled by volume_id, device, and mount_path.
Metrics (namespace node_ebs_*)
node_ebs_read_ops_total, node_ebs_write_ops_total
node_ebs_read_bytes_total, node_ebs_write_bytes_total
node_ebs_read_seconds_total, node_ebs_write_seconds_total
node_ebs_exceeded_iops_seconds_total, node_ebs_exceeded_tp_seconds_total
node_ebs_ec2_exceeded_iops_seconds_total, node_ebs_ec2_exceeded_tp_seconds_total
node_ebs_volume_queue_length
node_ebs_read_io_latency_seconds, node_ebs_write_io_latency_seconds (histograms)
Each metric Help string references the corresponding official EBS statistic name (e.g. total_read_ops, ebs_volume_performance_exceeded_iops, read_io_latency_histogram).
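As a rough illustration of how the latency histograms could be turned into Prometheus buckets, the sketch below converts per-bin counts into cumulative counts keyed by upper bound in seconds, which is the shape Prometheus histograms use. The latencyBin type, the assumption that bin bounds are in microseconds and sorted ascending, and the function name are hypothetical; the PR's actual conversion may differ.

```go
package main

import "fmt"

// latencyBin is a hypothetical per-bin record: Count I/Os completed with a
// latency at or below UpperMicros and above the previous bin's bound.
type latencyBin struct {
	UpperMicros uint64
	Count       uint64
}

// toCumulativeBuckets converts per-bin counts into cumulative counts keyed
// by the bin's upper bound in seconds, and also returns the total count.
// Assumption: bins are sorted by ascending upper bound.
func toCumulativeBuckets(bins []latencyBin) (map[float64]uint64, uint64) {
	buckets := make(map[float64]uint64, len(bins))
	var running uint64
	for _, b := range bins {
		running += b.Count
		buckets[float64(b.UpperMicros)/1e6] = running
	}
	return buckets, running
}

func main() {
	// Made-up sample data, not measurements from the PR.
	bins := []latencyBin{
		{UpperMicros: 1000, Count: 3},
		{UpperMicros: 2000, Count: 5},
		{UpperMicros: 500000, Count: 2},
	}
	buckets, total := toCumulativeBuckets(bins)
	fmt.Println(buckets, total)
}
```

Keeping the conversion separate from metric construction makes it easy to unit-test without a device or a registry.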
Why disabled by default
The collector only applies to Linux hosts running on Nitro-based EC2 instances with EBS NVMe volumes, and it issues an NVMe admin ioctl per device, so it is meaningless elsewhere. Enable it with --collector.ebsnvme.
Attribution / licensing
The EBS log-page parsing logic is derived from the Amazon EBS CSI Driver (pkg/metrics/nvme.go, Apache-2.0, Copyright The Kubernetes Authors). This is noted in the file header alongside the standard Prometheus Apache-2.0 header.
Tested on real AWS EBS + EC2
Validated live on a Nitro-based EC2 instance (us-east-2) with 6 attached EBS NVMe volumes running a MySQL workload (data / redo / binlog / undo / relay_log on separate EBS volumes):
gofmt clean; go build OK; unit tests pass (TestParseEBSLogPageInvalidMagic, TestParseEBSLogPageValid, TestConvertEBSHistogram).
Scraped by Prometheus for 30+ minutes: target up 100%, scrape duration steady 24–47 ms, all node_ebs_* series present (6 per metric, one per volume).
Grafana panels (read/write latency + P99, throughput, I/O size, IOPS, EBS/EC2 exceeded counters, queue length) render correctly per-volume, with the mount_path label distinguishing each MySQL data path. Screenshots attached below.
An example Grafana dashboard built entirely on these node_ebs_* metrics (14 panels, Prometheus datasource templated) is attached.
Notes
Commit is DCO signed-off.
mount_path is NotMounted for a device with no direct mount point (e.g. a disk mounted only through one of its partitions).
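To make the NotMounted behaviour concrete, here is a minimal sketch of how the lsblk JSON output could be mapped to the volume_id, device, and mount_path labels. It is not the PR's implementation. The type and function names are hypothetical, the sample volume IDs are made up, and the sketch assumes that EBS devices report a serial beginning with "vol" that may lack the hyphen found in the volume ID.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// lsblkOutput mirrors the top-level object printed by `lsblk --json`.
type lsblkOutput struct {
	BlockDevices []lsblkDevice `json:"blockdevices"`
}

// lsblkDevice uses pointers because lsblk prints null for empty columns.
type lsblkDevice struct {
	Name       string  `json:"name"`
	Serial     *string `json:"serial"`
	MountPoint *string `json:"mountpoint"`
}

// ebsDevice is a hypothetical holder for the three metric labels.
type ebsDevice struct {
	Device, VolumeID, MountPath string
}

const notMounted = "NotMounted"

// parseLsblk keeps devices whose serial looks like an EBS volume ID.
// Assumption: such serials start with "vol" and may omit the hyphen.
func parseLsblk(data []byte) ([]ebsDevice, error) {
	var out lsblkOutput
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, fmt.Errorf("parse lsblk output: %w", err)
	}
	var devs []ebsDevice
	for _, d := range out.BlockDevices {
		if d.Serial == nil || !strings.HasPrefix(*d.Serial, "vol") {
			continue // not an EBS volume under this sketch's assumption
		}
		id := *d.Serial
		if !strings.HasPrefix(id, "vol-") {
			id = "vol-" + strings.TrimPrefix(id, "vol")
		}
		mount := notMounted
		if d.MountPoint != nil && *d.MountPoint != "" {
			mount = *d.MountPoint
		}
		devs = append(devs, ebsDevice{Device: d.Name, VolumeID: id, MountPath: mount})
	}
	return devs, nil
}

func main() {
	// Made-up sample resembling `lsblk -nd --json -o NAME,SERIAL,MOUNTPOINT`.
	sample := `{"blockdevices":[
	  {"name":"nvme0n1","serial":"vol0123456789abcdef0","mountpoint":null},
	  {"name":"nvme1n1","serial":"vol0fedcba9876543210","mountpoint":"/data"}]}`
	devs, err := parseLsblk([]byte(sample))
	fmt.Println(devs, err)
}
```

In this sample, nvme0n1 has a null mountpoint (as when only a partition is mounted), so it gets the NotMounted label value.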