Introduction#
NVSwitch Device Monitoring (NVSDM) is a library for monitoring NVSwitch devices on NVIDIA Blackwell systems. The NVSDM API provides a wide range of telemetry, including device health, port counters, and PCIe statistics.
The NVSDM package also contains the experimental nvsdm_cli utility, which provides a convenient way to use the NVSDM library.
Note
nvsdm_cli is an experimental tool and may be changed or removed without notice.
Note
NVSDM does not currently support Ethernet devices.
Change log of NVSDM library#
This chapter lists the API changes and bug fixes introduced in each release of the library.
Changes between NVSDM v1.2.0 and v1.3.0#
Added new ConnectX counter ID NVSDM_CONNECTX_TELEM_CTR_PCIE_LINK_INBOUND_BYTES
Added new ConnectX counter ID NVSDM_CONNECTX_TELEM_CTR_PCIE_LINK_OUTBOUND_BYTES
Updated the Doxygen documentation to fix warnings and improve grouping support.
Changes between NVSDM v1.1.0 and v1.2.0#
Added a new API to retrieve the “local” port number: nvsdmPortGetLocalNum
Modified nvsdmDeviceGetFirmwareVersion to retrieve firmware versions for ConnectX HCAs in addition to switches
Added support for four “extended” (i.e., 64-bit) PMA counters:
NVSDM_PORT_TELEM_CTR_EXT_XMIT_DATA
NVSDM_PORT_TELEM_CTR_EXT_RCV_DATA
NVSDM_PORT_TELEM_CTR_EXT_XMIT_PKTS
NVSDM_PORT_TELEM_CTR_EXT_RCV_PKTS
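The practical benefit of a 64-bit counter is headroom. The following is a rough arithmetic sketch only, not NVSDM code: it does not call the library, the helper name seconds_until_full is hypothetical, and the increment rate is an assumed illustrative value rather than a figure from NVSDM. It estimates how long an unsigned counter that starts at zero takes to exhaust its range at a steady rate.

```python
def seconds_until_full(counter_bits: int, increments_per_second: float) -> float:
    """Seconds for an unsigned counter starting at zero to reach its maximum value.

    Hypothetical helper for illustration; it is not part of the NVSDM API.
    """
    if counter_bits <= 0:
        raise ValueError("counter_bits must be positive")
    if increments_per_second <= 0:
        raise ValueError("increments_per_second must be positive")
    max_value = (1 << counter_bits) - 1  # largest value the counter can hold
    return max_value / increments_per_second


if __name__ == "__main__":
    rate = 1e9  # assumed rate: one billion counter increments per second
    for bits in (32, 64):
        secs = seconds_until_full(bits, rate)
        print(f"{bits}-bit counter: {secs:.3g} s (about {secs / 86400:.3g} days)")
```

At this assumed rate, a 32-bit counter exhausts its range in roughly 4.3 seconds, while a 64-bit counter lasts for hundreds of years, so a monitoring tool that polls extended counters can sample far less often without losing information.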
Changes between NVSDM v1.0 and v1.1.0#
Added nvsdmSetLogFile to specify a log file.
Added nvsdmDeviceGetFirmwareVersion to retrieve the firmware version for a given switch.
Added nvsdmDeviceGetTelemetryValues to retrieve telemetry from a device.
Added a new telemetry type: NVSDM_TELEM_TYPE_CONNECTX for ConnectX device telemetry.
Known issues in the current version of NVSDM library#
This is a list of known NVSDM issues in the current release:
The following ConnectX inbound and outbound byte counters are currently calculated over a very short period of time. The intended behavior is for them to be calculated over the lifetime of the NVSDM library.
ConnectX counter ID NVSDM_CONNECTX_TELEM_CTR_PCIE_LINK_INBOUND_BYTES
ConnectX counter ID NVSDM_CONNECTX_TELEM_CTR_PCIE_LINK_OUTBOUND_BYTES