Merged
127 changes: 126 additions & 1 deletion docs/PLUGIN_DOC.md
@@ -17,7 +17,7 @@
| KernelModulePlugin | cat /proc/modules<br>modinfo amdgpu<br>wmic os get Version /Value | **Analyzer Args:**<br>- `kernel_modules`: dict[str, dict] — Expected kernel module name -> {version, etc.}. Analyzer checks collected modules match.<br>- `regex_filter`: list[str] — List of regex patterns to filter which collected modules are checked (default: amd). | - | [KernelModuleDataModel](#KernelModuleDataModel-Model) | [KernelModuleCollector](#Collector-Class-KernelModuleCollector) | [KernelModuleAnalyzer](#Data-Analyzer-Class-KernelModuleAnalyzer) |
| MemoryPlugin | free -b<br>lsmem<br>numactl -H<br>wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value | **Analyzer Args:**<br>- `ratio`: float — Required free-memory ratio (0-1). Analysis fails if free/total < ratio.<br>- `memory_threshold`: str — Minimum free memory required (e.g. '30Gi', '1T'). Used when ratio is not sufficient. | - | [MemoryDataModel](#MemoryDataModel-Model) | [MemoryCollector](#Collector-Class-MemoryCollector) | [MemoryAnalyzer](#Data-Analyzer-Class-MemoryAnalyzer) |
| NetworkPlugin | ip addr show<br>curl<br>ethtool -S {interface}<br>ethtool {interface}<br>lldpcli show neighbor<br>lldpctl<br>ip neighbor show<br>ping<br>ip route show<br>ip rule show<br>wget | - | **Collection Args:**<br>- `url`: Optional[str] — Optional URL to probe for network connectivity (used with netprobe).<br>- `netprobe`: Optional[Literal['ping', 'wget', 'curl']] — Tool to use for network connectivity probe: ping, wget, or curl. | [NetworkDataModel](#NetworkDataModel-Model) | [NetworkCollector](#Collector-Class-NetworkCollector) | - |
| NicPlugin | - | **Analyzer Args:**<br>- `expected_values`: Optional[Dict[str, Dict[str, Any]]] — Per-command expected checks keyed by canonical key (see command_to_canonical_key).<br>- `performance_profile_expected`: str — Expected Broadcom performance_profile value (case-insensitive). Default RoCE.<br>- `support_rdma_disabled_values`: List[str] — Values that indicate RDMA is not supported (case-insensitive).<br>- `pcie_relaxed_ordering_expected`: str — Expected Broadcom pcie_relaxed_ordering value (e.g. 'Relaxed ordering = enabled'); checked case-insensitively. Defaul...<br>- `expected_qos_prio_map`: Optional[Dict[Any, Any]] — Expected priority-to-TC map (e.g. {0: 0, 1: 1}; keys may be int or str in config). Checked per device when set.<br>- `expected_qos_pfc_enabled`: Optional[int] — Expected PFC enabled value (0/1 or bitmask). Checked per device when set.<br>- `expected_qos_tsa_map`: Optional[Dict[Any, Any]] — Expected TSA map for ETS (e.g. {0: 'ets', 1: 'strict'}; keys may be int or str in config). Checked per device when set.<br>- `expected_qos_tc_bandwidth`: Optional[List[int]] — Expected TC bandwidth percentages. Checked per device when set.<br>- `require_qos_consistent_across_adapters`: bool — When True and no expected_qos_* are set, require all adapters to have the same prio_map, pfc_enabled, and tsa_map.<br>- `nicctl_log_error_regex`: Optional[List[Dict[str, Any]]] — Optional list of error patterns for nicctl show card logs. | **Collection Args:**<br>- `commands`: Optional[List[str]] — Optional list of niccli/nicctl commands to run. When None, default command set is used.<br>- `use_sudo_niccli`: bool — If True, run niccli commands with sudo when required.<br>- `use_sudo_nicctl`: bool — If True, run nicctl commands with sudo when required. | [NicDataModel](#NicDataModel-Model) | [NicCollector](#Collector-Class-NicCollector) | [NicAnalyzer](#Data-Analyzer-Class-NicAnalyzer) |
| NicPlugin | niccli --listdev<br>niccli --list<br>niccli --list_devices<br>niccli -dev {device_num} nvm -getoption pcie_relaxed_ordering<br>niccli --dev {device_num} nvm --getoption pcie_relaxed_ordering<br>niccli -dev {device_num} nvm -getoption performance_profile<br>niccli --dev {device_num} nvm --getoption performance_profile<br>niccli -dev {device_num} nvm -getoption support_rdma -scope 0<br>niccli -dev {device_num} getqos<br>niccli --dev {device_num} nvm --getoption support_rdma<br>niccli --dev {device_num} qos --ets --show<br>niccli --version<br>nicctl show card<br>nicctl --version<br>nicctl show card flash partition --json<br>nicctl show card interrupts --json<br>nicctl show card logs --non-persistent<br>nicctl show card logs --boot-fault<br>nicctl show card logs --persistent<br>nicctl show card profile --json<br>nicctl show card time --json<br>nicctl show card statistics packet-buffer summary --json<br>nicctl show lif statistics --json<br>nicctl show lif internal queue-to-ud-pinning<br>nicctl show pipeline internal anomalies<br>nicctl show pipeline internal rsq-ring<br>nicctl show pipeline internal statistics memory<br>nicctl show port fsm<br>nicctl show port transceiver --json<br>nicctl show port statistics --json<br>nicctl show port internal mac<br>nicctl show qos headroom --json<br>nicctl show rdma queue --json<br>nicctl show rdma queue-pair --detail --json<br>nicctl show version firmware<br>nicctl show dcqcn<br>nicctl show environment<br>nicctl show lif<br>nicctl show pcie ats<br>nicctl show port<br>nicctl show qos<br>nicctl show rdma statistics<br>nicctl show version host-software<br>nicctl show dcqcn --card {card_id} --json<br>nicctl show card hardware-config --card {card_id} | **Analyzer Args:**<br>- `expected_values`: Optional[Dict[str, Dict[str, Any]]] — Per-command expected checks keyed by canonical key (see command_to_canonical_key).<br>- `performance_profile_expected`: str — Expected Broadcom performance_profile value (case-insensitive). Default RoCE.<br>- `support_rdma_disabled_values`: List[str] — Values that indicate RDMA is not supported (case-insensitive).<br>- `pcie_relaxed_ordering_expected`: str — Expected Broadcom pcie_relaxed_ordering value (e.g. 'Relaxed ordering = enabled'); checked case-insensitively. Defaul...<br>- `expected_qos_prio_map`: Optional[Dict[Any, Any]] — Expected priority-to-TC map (e.g. {0: 0, 1: 1}; keys may be int or str in config). Checked per device when set.<br>- `expected_qos_pfc_enabled`: Optional[int] — Expected PFC enabled value (0/1 or bitmask). Checked per device when set.<br>- `expected_qos_tsa_map`: Optional[Dict[Any, Any]] — Expected TSA map for ETS (e.g. {0: 'ets', 1: 'strict'}; keys may be int or str in config). Checked per device when set.<br>- `expected_qos_tc_bandwidth`: Optional[List[int]] — Expected TC bandwidth percentages. Checked per device when set.<br>- `require_qos_consistent_across_adapters`: bool — When True and no expected_qos_* are set, require all adapters to have the same prio_map, pfc_enabled, and tsa_map.<br>- `nicctl_log_error_regex`: Optional[List[Dict[str, Any]]] — Optional list of error patterns for nicctl show card logs. | **Collection Args:**<br>- `commands`: Optional[List[str]] — Optional list of niccli/nicctl commands to run. When None, default command set is used.<br>- `use_sudo_niccli`: bool — If True, run niccli commands with sudo when required.<br>- `use_sudo_nicctl`: bool — If True, run nicctl commands with sudo when required. | [NicDataModel](#NicDataModel-Model) | [NicCollector](#Collector-Class-NicCollector) | [NicAnalyzer](#Data-Analyzer-Class-NicAnalyzer) |
| NvmePlugin | nvme smart-log {dev}<br>nvme error-log {dev} --log-entries=256<br>nvme id-ctrl {dev}<br>nvme id-ns {dev}{ns}<br>nvme fw-log {dev}<br>nvme self-test-log {dev}<br>nvme get-log {dev} --log-id=6 --log-len=512<br>nvme telemetry-log {dev} --output-file={dev}_{f_name}<br>nvme list -o json | - | - | [NvmeDataModel](#NvmeDataModel-Model) | [NvmeCollector](#Collector-Class-NvmeCollector) | - |
| OsPlugin | sh -c '( lsb_release -ds &#124;&#124; (cat /etc/*release &#124; grep PRETTY_NAME) &#124;&#124; uname -om ) 2>/dev/null &#124; head -n1'<br>cat /etc/*release &#124; grep VERSION_ID<br>wmic os get Version /value<br>wmic os get Caption /Value | **Analyzer Args:**<br>- `exp_os`: Union[str, list] — Expected OS name/version string(s) to match (e.g. from lsb_release or /etc/os-release).<br>- `exact_match`: bool — If True, require exact match for exp_os; otherwise substring match. | - | [OsDataModel](#OsDataModel-Model) | [OsCollector](#Collector-Class-OsCollector) | [OsAnalyzer](#Data-Analyzer-Class-OsAnalyzer) |
| PackagePlugin | dnf list --installed<br>dpkg-query -W<br>pacman -Q<br>cat /etc/*release<br>wmic product get name,version | **Analyzer Args:**<br>- `exp_package_ver`: Dict[str, Optional[str]] — Map package name -> expected version (None = any version). Checked against installed packages.<br>- `regex_match`: bool — If True, match package versions with regex; otherwise exact or prefix match.<br>- `rocm_regex`: Optional[str] — Optional regex to identify ROCm package version (used when enable_rocm_regex is True).<br>- `enable_rocm_regex`: bool — If True, use rocm_regex (or default pattern) to extract ROCm version for checks. | - | [PackageDataModel](#PackageDataModel-Model) | [PackageCollector](#Collector-Class-PackageCollector) | [PackageAnalyzer](#Data-Analyzer-Class-PackageAnalyzer) |
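
The NicPlugin row notes that `expected_qos_prio_map` keys "may be int or str in config" and that the map is checked per device. A minimal sketch of what such a comparison could look like follows. The function names `normalize_prio_map` and `prio_map_mismatches` and the shape of the collected data are assumptions made for illustration; this is not NicAnalyzer's actual code.

```python
from typing import Any, Dict, Optional


def normalize_prio_map(raw: Optional[Dict[Any, Any]]) -> Optional[Dict[int, int]]:
    """Coerce keys and values to int so {"0": 0} and {0: 0} compare equal."""
    if raw is None:
        return None
    return {int(k): int(v) for k, v in raw.items()}


def prio_map_mismatches(
    expected: Dict[Any, Any],
    collected_by_device: Dict[str, Dict[Any, Any]],
) -> Dict[str, Dict[Any, Any]]:
    """Return device -> collected map for each device that differs from expected.

    `collected_by_device` is a hypothetical structure: device id -> prio map.
    """
    want = normalize_prio_map(expected)
    bad: Dict[str, Dict[Any, Any]] = {}
    for device, got in collected_by_device.items():
        if normalize_prio_map(got) != want:
            bad[device] = got
    return bad


if __name__ == "__main__":
    expected = {"0": 0, "1": 1}  # str keys, as they might appear in a JSON config
    collected = {"dev1": {0: 0, 1: 1}, "dev2": {0: 0, 1: 0}}
    print(prio_map_mismatches(expected, collected))
```

Normalizing both sides before comparing avoids false mismatches when a JSON config supplies string keys and the parsed command output supplies integers.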
@@ -439,10 +439,135 @@ Collect raw output from niccli (Broadcom) and nicctl (Pensando) commands.

**Link to code**: [nic_collector.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/inband/nic/nic_collector.py)

### Class Variables

- **CMD_NICCLI_VERSION**: `niccli --version`
- **CMD_NICCLI_LIST**: `niccli --list`
- **CMD_NICCLI_LIST_DEVICES**: `niccli --list_devices`
- **CMD_NICCLI_LIST_DEVICES_LEGACY**: `niccli --listdev`
- **CMD_NICCLI_DISCOVERY_LEGACY**: `['niccli --listdev', 'niccli --list']`
- **CMD_NICCLI_DISCOVERY_NEW**: `['niccli --list_devices', 'niccli --list']`
- **CMD_NICCLI_DISCOVERY**: `['niccli --listdev', 'niccli --list']`
- **CMD_NICCLI_DISCOVERY_ALL**: `frozenset({'niccli --listdev', 'niccli --list_devices', 'niccli --list'})`
- **CMD_NICCLI_SUPPORT_RDMA_TEMPLATE_LEGACY**: `niccli -dev {device_num} nvm -getoption support_rdma -scope 0`
- **CMD_NICCLI_PERFORMANCE_PROFILE_TEMPLATE_LEGACY**: `niccli -dev {device_num} nvm -getoption performance_profile`
- **CMD_NICCLI_PCIE_RELAXED_ORDERING_TEMPLATE_LEGACY**: `niccli -dev {device_num} nvm -getoption pcie_relaxed_ordering`
- **CMD_NICCLI_QOS_TEMPLATE_LEGACY**: `niccli -dev {device_num} getqos`
- **CMD_NICCLI_PER_DEVICE_LEGACY**: `[
niccli -dev {device_num} nvm -getoption support_rdma -scope 0,
niccli -dev {device_num} nvm -getoption performance_profile,
niccli -dev {device_num} nvm -getoption pcie_relaxed_ordering,
niccli -dev {device_num} getqos
]`
- **CMD_NICCLI_SUPPORT_RDMA_TEMPLATE_NEW**: `niccli --dev {device_num} nvm --getoption support_rdma`
- **CMD_NICCLI_PERFORMANCE_PROFILE_TEMPLATE_NEW**: `niccli --dev {device_num} nvm --getoption performance_profile`
- **CMD_NICCLI_PCIE_RELAXED_ORDERING_TEMPLATE_NEW**: `niccli --dev {device_num} nvm --getoption pcie_relaxed_ordering`
- **CMD_NICCLI_QOS_TEMPLATE_NEW**: `niccli --dev {device_num} qos --ets --show`
- **CMD_NICCLI_PER_DEVICE_NEW**: `[
niccli --dev {device_num} nvm --getoption support_rdma,
niccli --dev {device_num} nvm --getoption performance_profile,
niccli --dev {device_num} nvm --getoption pcie_relaxed_ordering,
niccli --dev {device_num} qos --ets --show
]`
- **CMD_NICCLI_SUPPORT_RDMA_TEMPLATE**: `niccli -dev {device_num} nvm -getoption support_rdma -scope 0`
- **CMD_NICCLI_PERFORMANCE_PROFILE_TEMPLATE**: `niccli -dev {device_num} nvm -getoption performance_profile`
- **CMD_NICCLI_PCIE_RELAXED_ORDERING_TEMPLATE**: `niccli -dev {device_num} nvm -getoption pcie_relaxed_ordering`
- **CMD_NICCLI_PER_DEVICE**: `[
niccli -dev {device_num} nvm -getoption support_rdma -scope 0,
niccli -dev {device_num} nvm -getoption performance_profile,
niccli -dev {device_num} nvm -getoption pcie_relaxed_ordering,
niccli -dev {device_num} getqos
]`
- **CMD_NICCTL_CARD_TEXT**: `nicctl show card`
- **CMD_NICCTL_GLOBAL**: `[
nicctl --version,
nicctl show card flash partition --json,
nicctl show card interrupts --json,
nicctl show card logs --non-persistent,
nicctl show card logs --boot-fault,
nicctl show card logs --persistent,
nicctl show card profile --json,
nicctl show card time --json,
nicctl show card statistics packet-buffer summary --json,
nicctl show lif statistics --json,
nicctl show lif internal queue-to-ud-pinning,
nicctl show pipeline internal anomalies,
nicctl show pipeline internal rsq-ring,
nicctl show pipeline internal statistics memory,
nicctl show port fsm,
nicctl show port transceiver --json,
nicctl show port statistics --json,
nicctl show port internal mac,
nicctl show qos headroom --json,
nicctl show rdma queue --json,
nicctl show rdma queue-pair --detail --json,
nicctl show version firmware
]`
- **CMD_NICCTL_PER_CARD**: `['nicctl show dcqcn --card {card_id} --json', 'nicctl show card hardware-config --card {card_id}']`
- **CMD_NICCTL_LEGACY_TEXT**: `[
nicctl show card,
nicctl show dcqcn,
nicctl show environment,
nicctl show lif,
nicctl show pcie ats,
nicctl show port,
nicctl show qos,
nicctl show rdma statistics,
nicctl show version host-software
]`
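
The per-device templates above carry a `{device_num}` placeholder and come in a legacy (`-dev`) and a new (`--dev`) spelling. As an illustration only, the sketch below expands such templates with `str.format`. The helper `render_per_device` and the way device numbers are obtained are assumptions; the diff does not show how NicCollector fills the placeholder.

```python
from typing import Iterable, List

# Template strings copied from the class variables listed above.
LEGACY_TEMPLATES = [
    "niccli -dev {device_num} nvm -getoption support_rdma -scope 0",
    "niccli -dev {device_num} nvm -getoption performance_profile",
    "niccli -dev {device_num} nvm -getoption pcie_relaxed_ordering",
    "niccli -dev {device_num} getqos",
]
NEW_TEMPLATES = [
    "niccli --dev {device_num} nvm --getoption support_rdma",
    "niccli --dev {device_num} nvm --getoption performance_profile",
    "niccli --dev {device_num} nvm --getoption pcie_relaxed_ordering",
    "niccli --dev {device_num} qos --ets --show",
]


def render_per_device(templates: Iterable[str], device_nums: Iterable[int]) -> List[str]:
    """Expand every template for every device number (hypothetical helper)."""
    templates = list(templates)
    return [t.format(device_num=n) for n in device_nums for t in templates]


if __name__ == "__main__":
    for cmd in render_per_device(NEW_TEMPLATES, [1, 2]):
        print(cmd)
```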

### Provides Data

NicDataModel

### Commands

- niccli --listdev
- niccli --list
- niccli --list_devices
- niccli -dev {device_num} nvm -getoption pcie_relaxed_ordering
- niccli --dev {device_num} nvm --getoption pcie_relaxed_ordering
- niccli -dev {device_num} nvm -getoption performance_profile
- niccli --dev {device_num} nvm --getoption performance_profile
- niccli -dev {device_num} nvm -getoption support_rdma -scope 0
- niccli -dev {device_num} getqos
- niccli --dev {device_num} nvm --getoption support_rdma
- niccli --dev {device_num} qos --ets --show
- niccli --version
- nicctl show card
- nicctl --version
- nicctl show card flash partition --json
- nicctl show card interrupts --json
- nicctl show card logs --non-persistent
- nicctl show card logs --boot-fault
- nicctl show card logs --persistent
- nicctl show card profile --json
- nicctl show card time --json
- nicctl show card statistics packet-buffer summary --json
- nicctl show lif statistics --json
- nicctl show lif internal queue-to-ud-pinning
- nicctl show pipeline internal anomalies
- nicctl show pipeline internal rsq-ring
- nicctl show pipeline internal statistics memory
- nicctl show port fsm
- nicctl show port transceiver --json
- nicctl show port statistics --json
- nicctl show port internal mac
- nicctl show qos headroom --json
- nicctl show rdma queue --json
- nicctl show rdma queue-pair --detail --json
- nicctl show version firmware
- nicctl show dcqcn
- nicctl show environment
- nicctl show lif
- nicctl show pcie ats
- nicctl show port
- nicctl show qos
- nicctl show rdma statistics
- nicctl show version host-software
- nicctl show dcqcn --card {card_id} --json
- nicctl show card hardware-config --card {card_id}

## Collector Class NvmeCollector

### Description
22 changes: 3 additions & 19 deletions docs/generate_plugin_doc_bundle.py
@@ -439,17 +439,9 @@ def generate_plugin_table_rows(plugins: List[type]) -> List[List[str]]:
         an = get_attr(p, "ANALYZER", None)
         args = get_attr(p, "ANALYZER_ARGS", None)
         collector_args_cls = get_attr(p, "COLLECTOR_ARGS", None)
-        cmds = []
+        cmds: List[str] = []
         if inspect.isclass(col):
-            cmds += extract_cmds_from_classvars(col)
-            seen = set()
-            uniq = []
-            for c in cmds:
-                key = " ".join(c.split())
-                if key not in seen:
-                    seen.add(key)
-                    uniq.append(c)
-            cmds = uniq
+            cmds = extract_cmds_from_classvars(col)

         # Extract regexes and args from analyzer
         regex_and_args = []
@@ -505,16 +497,8 @@ def render_collector_section(col: type, link_base: str, rel_root: Optional[str])
     dm = get_attr(col, "DATA_MODEL", None)
     s += md_header("Provides Data", 3) + (f"{dm.__name__}\n\n" if inspect.isclass(dm) else "-\n\n")

-    cmds = []
-    cmds += extract_cmds_from_classvars(col)
+    cmds = extract_cmds_from_classvars(col)
     if cmds:
-        seen, uniq = set(), []
-        for c in cmds:
-            key = " ".join(c.split())
-            if key not in seen:
-                seen.add(key)
-                uniq.append(c)
-        cmds = uniq
         s += md_header("Commands", 3) + md_list(cmds)

     return s
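
For reference, the loop deleted from both call sites removed duplicate commands while preserving order, treating commands that differ only in whitespace as equal. A standalone sketch of that logic is shown here. The wrapper name `dedupe_commands` is hypothetical, and this diff does not show whether `extract_cmds_from_classvars` now performs the same step internally.

```python
from typing import Iterable, List


def dedupe_commands(cmds: Iterable[str]) -> List[str]:
    """Drop repeated commands, keeping the first occurrence of each.

    Two commands count as the same when they match after collapsing
    runs of whitespace, mirroring the `" ".join(c.split())` key in the
    removed code.
    """
    seen = set()
    uniq: List[str] = []
    for c in cmds:
        key = " ".join(c.split())
        if key not in seen:
            seen.add(key)
            uniq.append(c)
    return uniq


if __name__ == "__main__":
    print(dedupe_commands(["niccli --list", "niccli  --list", "nicctl show card"]))
```

Note that the first-seen spelling is what gets kept, so the output retains the original whitespace of whichever variant appeared first.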