System Health Check

NVIDIA provides customers with a diagnostic and management tool called NVIDIA System Management (NVSM). The nvsm command reports the system's health, identifies component issues and alerts, and can run a stress test to confirm that all components work correctly under load.

The following instructions show how to perform a health check on a DGX GB rack system.

  1. Establish an SSH connection to the rack.

  2. Run a basic system check.

    sudo nvsm show health
    
  3. Verify that the output summary reports every check as Healthy and the overall system status as Healthy.
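The verification in step 3 can also be scripted. The following is a minimal Bash sketch, not an NVIDIA-provided tool. The function name nvsm_health_ok is hypothetical, and the sketch assumes two things about the output format: the summary contains a line reading "Overall system status is Healthy", and each per-check line ends in a dotted leader followed by its status word. Confirm both patterns against the output your NVSM version actually prints before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `sudo nvsm show health` output on stdin and
# returns 0 only if the overall status is Healthy and no individual check
# reports any other status.
# ASSUMPTION: the output format described below; adjust the patterns to
# match what your NVSM version prints.
nvsm_health_ok() {
  local output bad
  output=$(cat)

  # Assumed summary line, e.g. "Overall system status is Healthy".
  # "is unhealthy" does not match this pattern, so it is treated as a failure.
  if ! grep -qiE 'overall system status is healthy' <<<"$output"; then
    echo "Overall system status is not reported as Healthy" >&2
    return 1
  fi

  # Assumed per-check lines: a dotted leader followed by a status word.
  # Keep only those whose status is not exactly "Healthy".
  bad=$(grep -E '\.{3,} *[A-Za-z]+ *$' <<<"$output" \
        | grep -viE '\.{3,} *healthy *$' || true)
  if [[ -n "$bad" ]]; then
    echo "Checks not reported as Healthy:" >&2
    echo "$bad" >&2
    return 1
  fi
  return 0
}

# Usage, on the rack after establishing the SSH connection:
#   sudo nvsm show health | nvsm_health_ok && echo "All checks are Healthy"
```

The function reads from stdin rather than calling nvsm itself, so the same check can be applied to a saved copy of the health output.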

For more information about the nvsm command, refer to the NVIDIA System Management User Guide.