Prometheus exporter for SONiC network switches.
This project collects switch telemetry from SONiC Redis databases and exposes it in Prometheus format. It also enables a curated subset of node_exporter host metrics, so you can monitor switch services and system health in one scrape target.
Exporters let you turn platform-specific telemetry into a standard metrics format:
- Prometheus can scrape data from many systems in one consistent way.
- You can build shared dashboards and alerts across vendors and platforms.
- Metrics become queryable with one language (PromQL), which reduces operational friction.
- You can correlate switch-level signals (interfaces, queues, FDB, LLDP) with host-level signals (CPU, memory, filesystem).
For SONiC environments, this means less custom glue code and faster troubleshooting from a single monitoring stack.
sonic-exporter is focused on production-friendly SONiC observability:
- Reads from SONiC Redis and selected local read-only sources.
- Keeps scrape latency stable with cached refresh loops per collector.
- Enforces guardrails (timeouts, caps, bounded labels) to control cardinality and scrape cost.
- Keeps experimental collectors opt-in.
flowchart LR
subgraph SONiC host
R[(SONiC Redis DBs)]
F[/Read-only files/]
C[[Allowlisted commands]]
end
subgraph sonic-exporter
M[cmd/sonic-exporter/main.go]
COL[Collectors\ninterface, hw, crm, queue, lldp, vlan, lag, fdb\nsystem*, docker*]
CACHE[(In-memory metric cache)]
NODE[node_exporter subset\nloadavg,cpu,diskstats,filesystem,meminfo,time,stat]
end
P[(Prometheus)]
R --> COL
F --> COL
C --> COL
COL --> CACHE
CACHE --> M
NODE --> M
M -->|/metrics| P
* Experimental collectors are disabled by default.
sonic-exporter/
├── cmd/sonic-exporter/ # bootstrap, collector registration, HTTP server
├── internal/collector/ # SONiC collectors and collector tests
├── pkg/redis/ # Redis access wrapper
├── fixtures/test/ # test fixtures loaded into miniredis
├── scripts/ # static build and package helpers
└── .github/workflows/ # CI test and release pipelines
For a deeper breakdown, see docs/architecture.md.
| Collector | Purpose | Default |
|---|---|---|
| Interface | Interface operational status and traffic metrics | Enabled |
| HW | PSU and fan health metrics | Enabled |
| CRM | Critical resource monitoring | Enabled |
| Queue | Queue counters and watermarks | Enabled |
| LLDP | LLDP neighbors from Redis | Enabled |
| VLAN | VLAN and VLAN member state | Enabled |
| LAG | PortChannel and member state | Enabled |
| FDB | FDB summary from ASIC DB | Disabled (FDB_ENABLED=false) |
| System (experimental) | Switch identity, software metadata, uptime | Disabled (SYSTEM_ENABLED=false) |
| Docker (experimental) | Container runtime metrics from STATE_DB | Disabled (DOCKER_ENABLED=false) |
Collector implementations live in internal/collector/*_collector.go.
./sonic-exporter
curl localhost:9101/metrics

docker-compose up --build -d
curl localhost:9101/metrics

| Variable | Description | Default |
|---|---|---|
| REDIS_ADDRESS | Redis address (host:port for TCP) | localhost:6379 |
| REDIS_PASSWORD | Password for Redis | empty |
| REDIS_NETWORK | Redis network type (tcp or unix) | tcp |
| Variable | Description | Default |
|---|---|---|
| LLDP_ENABLED | Enable LLDP collector | true |
| LLDP_INCLUDE_MGMT | Include management interfaces like eth0 | true |
| LLDP_REFRESH_INTERVAL | Cache refresh interval | 30s |
| LLDP_TIMEOUT | Timeout for one refresh cycle | 2s |
| LLDP_MAX_NEIGHBORS | Max neighbors exported per refresh | 512 |
| Variable | Description | Default |
|---|---|---|
| VLAN_ENABLED | Enable VLAN collector | true |
| VLAN_REFRESH_INTERVAL | Cache refresh interval | 30s |
| VLAN_TIMEOUT | Timeout for one refresh cycle | 2s |
| VLAN_MAX_VLANS | Max VLANs exported per refresh | 1024 |
| VLAN_MAX_MEMBERS | Max VLAN members exported per refresh | 8192 |
| Variable | Description | Default |
|---|---|---|
| LAG_ENABLED | Enable LAG collector | true |
| LAG_REFRESH_INTERVAL | Cache refresh interval | 30s |
| LAG_TIMEOUT | Timeout for one refresh cycle | 2s |
| LAG_MAX_LAGS | Max LAGs exported per refresh | 512 |
| LAG_MAX_MEMBERS | Max LAG members exported per refresh | 4096 |
| Variable | Description | Default |
|---|---|---|
| FDB_ENABLED | Enable FDB collector | false |
| FDB_REFRESH_INTERVAL | Cache refresh interval | 60s |
| FDB_TIMEOUT | Timeout for one refresh cycle | 2s |
| FDB_MAX_ENTRIES | Max ASIC FDB entries processed per refresh | 50000 |
| FDB_MAX_PORTS | Max per-port FDB series exported | 1024 |
| FDB_MAX_VLANS | Max per-VLAN FDB series exported | 4096 |
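Caps such as FDB_MAX_PORTS bound how many label values can reach Prometheus. The following sketch shows one way a cap could be applied to per-port counts. The `capSeries` helper is hypothetical, and keeping entries in sorted name order is an assumption; the exporter may choose which series to keep differently.

```go
package main

import (
	"fmt"
	"sort"
)

// capSeries keeps at most limit label values and reports how many were dropped.
// Names are kept in sorted order so the exported set is stable between refreshes
// (an assumption for this sketch, not documented exporter behavior).
func capSeries(counts map[string]int, limit int) (kept map[string]int, dropped int) {
	if limit < 0 {
		limit = 0
	}
	names := make([]string, 0, len(counts))
	for name := range counts {
		names = append(names, name)
	}
	sort.Strings(names)

	if len(names) > limit {
		dropped = len(names) - limit
		names = names[:limit]
	}
	kept = make(map[string]int, len(names))
	for _, name := range names {
		kept[name] = counts[name]
	}
	return kept, dropped
}

func main() {
	perPort := map[string]int{"Ethernet0": 120, "Ethernet4": 7, "Ethernet8": 42}
	kept, dropped := capSeries(perPort, 2)
	fmt.Println(len(kept), "series kept,", dropped, "dropped")
}
```

Reporting the dropped count makes truncation visible instead of silently hiding data.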
| Variable | Description | Default |
|---|---|---|
| SYSTEM_ENABLED | Enable system collector | false |
| SYSTEM_REFRESH_INTERVAL | Cache refresh interval | 60s |
| SYSTEM_TIMEOUT | Timeout for one refresh cycle | 4s |
| SYSTEM_COMMAND_ENABLED | Enable allowlisted read-only command fallback | true |
| SYSTEM_COMMAND_TIMEOUT | Timeout per command | 2s |
| SYSTEM_COMMAND_MAX_OUTPUT_BYTES | Max bytes read per command | 262144 |
| SYSTEM_VERSION_FILE | SONiC version metadata path | /etc/sonic/sonic_version.yml |
| SYSTEM_MACHINE_CONF_FILE | Machine config path | /host/machine.conf |
| SYSTEM_HOSTNAME_FILE | Hostname path | /etc/hostname |
| SYSTEM_UPTIME_FILE | Uptime path | /proc/uptime |
Enable:
SYSTEM_ENABLED=true ./sonic-exporter

System collector exports:
- sonic_system_identity_info
- sonic_system_software_info
- sonic_system_uptime_seconds
- sonic_system_collector_success
- sonic_system_scrape_duration_seconds
- sonic_system_cache_age_seconds

Data source order:
- Redis (DEVICE_METADATA|localhost, CHASSIS_INFO|chassis 1)
- Read-only files (/etc/sonic/sonic_version.yml, /host/machine.conf, /etc/hostname, /proc/uptime)
- Optional allowlisted command fallback (show platform summary --json, show version, show platform syseeprom)
| Variable | Description | Default |
|---|---|---|
| DOCKER_ENABLED | Enable docker collector | false |
| DOCKER_REFRESH_INTERVAL | Cache refresh interval | 60s |
| DOCKER_TIMEOUT | Timeout for one refresh cycle | 2s |
| DOCKER_MAX_CONTAINERS | Max container entries exported per refresh | 128 |
| DOCKER_SOURCE_STALE_THRESHOLD | Source age threshold for stale signal | 5m |
Enable:
DOCKER_ENABLED=true ./sonic-exporter

Docker collector behavior:
- Reads STATE_DB keys DOCKER_STATS|* and DOCKER_STATS|LastUpdateTime.
- No Docker socket access.
- No writes.
- Controlled label cardinality (container only).
These are compact anonymized examples. Labels can vary by SONiC platform/version.
sonic_interface_operational_status{device="Ethernet0"} 1
sonic_hw_psu_operational_status{psu="PSU1"} 1
sonic_crm_stats_used{resource="ipv4_route"} 1610
sonic_queue_dropped_packets_total{device="Ethernet0",queue="3"} 73
sonic_lldp_neighbors 64
sonic_vlan_admin_status{vlan="Vlan1000"} 1
sonic_lag_oper_status{lag="PortChannel1"} 1
sonic_fdb_entries 1331
sonic_system_uptime_seconds 123456
sonic_docker_container_cpu_percent{container="swss"} 1.5
node_memory_MemAvailable_bytes 1.24e+10
These tests were done with SONiC Community releases (not SONiC Enterprise releases).
| Model Number | SONiC Software Version | SONiC OS Version | Distribution | Kernel | Platform | ASIC |
|---|---|---|---|---|---|---|
| DellEMC-S5232f-C8D48 | 202012 | 10 | Debian 10.13 | 4.19.0-12-2-amd64 | x86_64-dellemc_s5232f_c3538-r0 | broadcom |
| SSE-T7132SR | 202505 | 12 | Debian 12.11 | 6.1.0-29-2-amd64 | x86_64-supermicro_sse_t7132s-r0 | marvell-teralynx |
| MSN2100-CB2FC | 202411 | 12 | Debian 12.12 | 6.1.0-29-2-amd64 | x86_64-mlnx_msn2100-r0 | mellanox |
go test ./...
go build ./...
./scripts/build.sh
./scripts/package.sh
docker-compose up --build -d

Notes:
- ./scripts/build.sh produces a static Linux binary (CGO_ENABLED=0).
- If you add keys to Redis fixtures manually, persist them with SAVE in Redis.
This section shows an example way to run sonic-exporter as a Linux service using systemd, with collector toggles set by environment variables.
Note: this systemd setup is not fully tested yet. Validate it in a lab or canary environment before using it in production.
sudo useradd --system --no-create-home --shell /usr/sbin/nologin sonic-exporter

If your distro uses /sbin/nologin, use that path instead.

sudo install -m 0755 ./sonic-exporter /usr/local/bin/sonic-exporter

Use an env file so collector toggles and Redis settings are easy to manage without editing the unit file.
sudo install -d -m 0755 /etc/sonic-exporter
sudo tee /etc/sonic-exporter/sonic-exporter.env >/dev/null <<'EOF'
REDIS_ADDRESS=localhost:6379
REDIS_PASSWORD=
REDIS_NETWORK=tcp
LLDP_ENABLED=true
VLAN_ENABLED=true
LAG_ENABLED=true
FDB_ENABLED=false
SYSTEM_ENABLED=false
DOCKER_ENABLED=false
EOF

Create /etc/systemd/system/sonic-exporter.service:
[Unit]
Description=SONiC Prometheus Exporter
Documentation=https://github.com/rokernel/sonic-exporter
Wants=network-online.target
After=network-online.target
[Service]
Type=simple
User=sonic-exporter
Group=sonic-exporter
EnvironmentFile=/etc/sonic-exporter/sonic-exporter.env
ExecStart=/usr/local/bin/sonic-exporter
Restart=on-failure
RestartSec=5s
# Logging
StandardOutput=journal
StandardError=journal
# Hardening (safe defaults for this exporter)
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
LockPersonality=true
MemoryDenyWriteExecute=true
RestrictSUIDSGID=true
RestrictRealtime=true
SystemCallArchitectures=native
# Allow read access to host and SONiC files used by collectors
ReadOnlyPaths=/etc/sonic /host /proc
[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl enable --now sonic-exporter
sudo systemctl status sonic-exporter

sudo systemctl show sonic-exporter --property=Environment
curl -s http://127.0.0.1:9101/metrics | head

To confirm a specific collector is enabled/disabled, look for its metric prefix:
- LLDP: sonic_lldp_
- VLAN: sonic_vlan_
- LAG: sonic_lag_
- FDB: sonic_fdb_
- System: sonic_system_
- Docker: sonic_docker_
Edit only the env file, then restart:
sudoedit /etc/sonic-exporter/sonic-exporter.env
sudo systemctl restart sonic-exporter

If the unit is package-managed in future, do not edit it directly. Use an override:
sudo systemctl edit sonic-exporter

Example override:
[Service]
Environment="FDB_ENABLED=true"
Environment="SYSTEM_ENABLED=true"

Then:
sudo systemctl daemon-reload
sudo systemctl restart sonic-exporter

Validate unit syntax first (no restart):
sudo systemd-analyze verify /etc/systemd/system/sonic-exporter.service

Run a canary service on a different port:
- Copy the unit to sonic-exporter-canary.service
- Change ExecStart=/usr/local/bin/sonic-exporter --web.listen-address=:19101
- Optionally use a canary env file
- Start only the canary:
  sudo systemctl daemon-reload
  sudo systemctl start sonic-exporter-canary
  sudo systemctl status sonic-exporter-canary
- Verify:
  curl -sf http://127.0.0.1:19101/metrics >/dev/null && echo OK
  journalctl -u sonic-exporter-canary -n 100 --no-pager
- Clean rollback:
  sudo systemctl stop sonic-exporter-canary
  sudo systemctl disable sonic-exporter-canary
  sudo rm /etc/systemd/system/sonic-exporter-canary.service
  sudo systemctl daemon-reload
- SONiC collector toggles are controlled by environment variables, not dedicated CLI flags.
- Keep SYSTEM_ENABLED and DOCKER_ENABLED off unless you need them.
- If hardening blocks file access on your distro, relax only the minimum setting and document the reason.
This project is primarily a learning exercise, and parts of it were developed using AI-assisted workflows (often referred to as "vibe coding") while learning Go.
Before any production deployment, the source code must be reviewed, validated, and approved by qualified human engineers.
This project builds on work from upstream open source projects. Thank you to the maintainers and contributors.
- SONiC project: https://github.com/sonic-net/SONiC
- Original sonic-exporter fork lineage referenced by module path github.com/vinted/sonic-exporter
- Prometheus ecosystem components used by this project:
  - node_exporter: https://github.com/prometheus/node_exporter
  - client_golang: https://github.com/prometheus/client_golang
If this repository was forked from another sonic-exporter repository in your organization history, add that URL here as well so lineage stays explicit for users.