Release Notes
NVSM 24.03.03 Release
NVSM Version 24.03.03 was released in April 2024.
Changes and New Features
The following are the changes in 24.03.03.
Expanded software health service (nvsm show health -swh) to include Kubernetes and Slurm stack deployment verification.
Deprecated the nvsm-health command in favor of nvsm show health.
Improved NVSM parsing of IPMI System Event Log (SEL) records to avoid generating false alerts.
Updated DIMM consistency validation and support for additional DIMM vendors for DGX H100/H800 platforms.
Known Issues
The nvsm.service shows as inactive with GPU driver R550; the issue does not impact any NVSM functionality.
When more than 56 Virtual Functions (VFs) are created on InfiniBand NICs, nvsm show health reports an unhealthy status in the GPUDirect Topology consistency check. The issue will be fixed in a future release.
NVSM 23.12.01 Release
NVSM Version 23.12.01 was released in December 2023.
Changes and New Features
The following are the changes in 23.12.01.
Introduced the software health service (nvsm show health -swh) for DGX OS and container stack deployment verification.
Enhanced functionality to collect MLX cable information in nvsm dump health.
Improved accuracy of NVSM alert generation based on System Event Log (SEL) records.
Bug Fixes
Fixed an issue with RAID volume rebuilding on an encrypted root filesystem.