Introduction#

NVSwitch Device Monitoring (NVSDM) is a library for monitoring NVSwitch devices on NVIDIA Blackwell systems. The NVSDM API provides a wide range of telemetry, including device health, port counters, and PCIe statistics.

The NVSDM package also contains the experimental nvsdm_cli utility, which provides a convenient command-line way to use the NVSDM library.

Note

nvsdm_cli is an experimental tool and may be changed or removed without notice.

Note

NVSDM does not currently support Ethernet devices.

Change log of NVSDM library#

This chapter lists the API changes and bug fixes introduced in each release of the library.

Changes between NVSDM v1.2.0 and v1.3.0#

  • Added new ConnectX counter ID NVSDM_CONNECTX_TELEM_CTR_PCIE_LINK_INBOUND_BYTES

  • Added new ConnectX counter ID NVSDM_CONNECTX_TELEM_CTR_PCIE_LINK_OUTBOUND_BYTES

  • Updated the Doxygen documentation to fix warnings and improve grouping.

Changes between NVSDM v1.1.0 and v1.2.0#

  • Added a new API to retrieve the “local” port number: nvsdmPortGetLocalNum

  • Modified nvsdmDeviceGetFirmwareVersion to retrieve firmware versions for ConnectX HCAs in addition to switches

  • Added support for four “extended” (i.e., 64-bit) PMA counters:

    • NVSDM_PORT_TELEM_CTR_EXT_XMIT_DATA

    • NVSDM_PORT_TELEM_CTR_EXT_RCV_DATA

    • NVSDM_PORT_TELEM_CTR_EXT_XMIT_PKTS

    • NVSDM_PORT_TELEM_CTR_EXT_RCV_PKTS
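To illustrate why 64-bit counters are useful, the following minimal sketch estimates how long a counter of a given width can run before it reaches its maximum value. It does not call the NVSDM API. The helper seconds_until_full and the rate used in the usage note are hypothetical and chosen only for illustration, and the sketch makes no assumption about whether a full counter wraps or saturates.

```c
#include <stdint.h>

/* Hypothetical helper, not part of NVSDM.
 * Returns the number of seconds a counter that starts at zero needs to
 * reach max_value when it grows by increments_per_sec every second.
 * Pass UINT32_MAX for a 32-bit counter or UINT64_MAX for a 64-bit one.
 * The conversion to double rounds UINT64_MAX slightly, which is fine for
 * an order-of-magnitude estimate. */
static double seconds_until_full(uint64_t max_value, double increments_per_sec)
{
    return (double)max_value / increments_per_sec;
}

/* Hypothetical helper: converts seconds to years (365.25-day years). */
static double seconds_to_years(double seconds)
{
    return seconds / (365.25 * 24.0 * 60.0 * 60.0);
}
```

At an assumed rate of one billion increments per second, seconds_until_full(UINT32_MAX, 1e9) is about 4.29 seconds, while seconds_to_years(seconds_until_full(UINT64_MAX, 1e9)) is more than 500 years. A 32-bit counter would therefore have to be polled every few seconds to avoid losing information, whereas a 64-bit counter effectively never fills.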

Changes between NVSDM v1.0 and v1.1.0#

  • Added nvsdmSetLogFile to specify a log file.

  • Added nvsdmDeviceGetFirmwareVersion to retrieve the firmware version for a given switch.

  • Added nvsdmDeviceGetTelemetryValues to retrieve telemetry from a device.

  • Added a new telemetry type: NVSDM_TELEM_TYPE_CONNECTX for ConnectX device telemetry.

Known issues in the current version of NVSDM library#

This is a list of known NVSDM issues in the current release:

  • The following ConnectX inbound and outbound byte counters are calculated over a very short period of time. The intended behavior is for them to be calculated over the lifetime of the NVSDM library:

    • ConnectX counter ID NVSDM_CONNECTX_TELEM_CTR_PCIE_LINK_INBOUND_BYTES

    • ConnectX counter ID NVSDM_CONNECTX_TELEM_CTR_PCIE_LINK_OUTBOUND_BYTES