Release Notes

NVSM 24.03.03 Release

NVSM Version 24.03.03 was released in April 2024.

Changes and New Features

The following are the changes in 24.03.03.

  • Expanded software health service (nvsm show health -swh) to include Kubernetes and Slurm stack deployment verification.

  • Deprecated nvsm-health command in favor of nvsm show health.

  • Improved NVSM parsing of IPMI System Event Log (SEL) records to avoid generating false alerts.

  • Updated DIMM consistency validation and added support for additional DIMM vendors on DGX H100/H800 platforms.

Known Issues

  • With GPU driver R550, nvsm.service shows as inactive. This does not affect any NVSM functionality.

  • When more than 56 Virtual Functions (VFs) are created on InfiniBand NICs, nvsm show health reports the GPUDirect Topology consistency check as unhealthy. The issue will be fixed in a future release.
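
To judge whether a system is likely to hit the VF-related issue above, the number of enabled VFs can be read from the standard Linux sysfs attribute sriov_numvfs. The following is a minimal sketch and not part of NVSM: the function names are hypothetical, and because the note does not say whether the limit of 56 applies per NIC or across all NICs, the sketch reports per-device counts and compares the total, which is an assumption.

```python
from pathlib import Path

# Threshold quoted in the known issue; whether it applies per NIC or
# system-wide is not stated, so this sketch compares the system-wide total.
VF_LIMIT = 56


def count_ib_vfs(sysfs_root="/sys/class/infiniband"):
    """Return {device_name: enabled VF count} for InfiniBand devices.

    Devices without a readable sriov_numvfs attribute (for example,
    devices that are themselves VFs) are skipped.
    """
    counts = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return counts
    for dev in sorted(root.iterdir()):
        numvfs = dev / "device" / "sriov_numvfs"
        try:
            counts[dev.name] = int(numvfs.read_text().strip())
        except (OSError, ValueError):
            continue
    return counts


def exceeds_vf_limit(counts, limit=VF_LIMIT):
    """True when the total number of enabled VFs is above the limit."""
    return sum(counts.values()) > limit


if __name__ == "__main__":
    counts = count_ib_vfs()
    for name, n in counts.items():
        print(f"{name}: {n} VFs")
    if exceeds_vf_limit(counts):
        print("More than 56 VFs enabled; the GPUDirect Topology check "
              "may report unhealthy (known issue).")
```

If the total exceeds the limit, an unhealthy GPUDirect Topology result from nvsm show health may be this known issue rather than a hardware fault.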

NVSM 23.12.01 Release

NVSM Version 23.12.01 was released in December 2023.

Changes and New Features

The following are the changes in 23.12.01.

  • Introduced the software health service (nvsm show health -swh) for DGX OS and container stack deployment verification.

  • Enhanced functionality to collect MLX cable information in nvsm dump health.

  • Improved accuracy of NVSM alert generation based on System Event Log (SEL) records.

Bug Fixes

  • Fixed an issue with RAID volume rebuilding on an encrypted root filesystem.