System Health Check

NVIDIA provides customers with a diagnostics and management tool called NVIDIA System Management (NVSM). You can use the nvsm command to determine the system’s health, identify component issues and alerts, or run a stress test that verifies all components work correctly under load.

The following instructions show how to perform a health check on the DGX GB200 system.

  1. Establish an SSH connection to the DGX GB200 system.

  2. Run a basic system check.

    sudo nvsm show health
    
  3. Verify that the output summary shows that all checks are Healthy and that the overall system status is Healthy.
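
If you run this check routinely, steps 2 and 3 can be scripted. The following Bash sketch runs sudo nvsm show health, prints the full report, and returns a non-zero status unless the summary reports the overall system status as Healthy. It assumes the summary contains a line worded like "Overall system status is Healthy"; that wording is an assumption, so compare it against the output of your NVSM version before relying on the script. The function names nvsm_report_is_healthy and check_nvsm_health are hypothetical and are not part of NVSM.

```bash
#!/usr/bin/env bash
# Minimal sketch that scripts steps 2 and 3 of the health check.
# ASSUMPTION: the NVSM summary contains a line worded like
#   "Overall system status is Healthy"
# Compare this against the output of your NVSM version before relying on it.
# Both function names below are hypothetical, not part of NVSM.

# nvsm_report_is_healthy reads an NVSM health report on stdin and returns 0
# only if the overall status line reports Healthy. The match is
# case-insensitive, and "Unhealthy" is rejected because "Healthy" must
# directly follow whitespace.
nvsm_report_is_healthy() {
  grep -Eqi '^[[:space:]]*overall system status( is|:)?[[:space:]]+healthy[^[:alpha:]]*$'
}

# check_nvsm_health runs the health check (requires sudo rights and nvsm),
# prints the full report, and returns 0 only if the system is Healthy.
check_nvsm_health() {
  local report
  local status=0
  # Capture the report even if nvsm exits with a non-zero status.
  report=$(sudo nvsm show health) || status=$?
  printf '%s\n' "$report"

  if [[ $status -ne 0 ]]; then
    echo "nvsm show health exited with status $status" >&2
    return 1
  fi

  if nvsm_report_is_healthy <<< "$report"; then
    echo "Overall system status is Healthy."
  else
    echo "Overall system status is not Healthy; review the checks and alerts above." >&2
    return 1
  fi
}
```

To use it, save the functions to a file (for example, a hypothetical nvsm_health.sh) and run source nvsm_health.sh && check_nvsm_health from the SSH session opened in step 1. The script checks only the overall status line, which keeps it simple; when it fails, read the full report for the individual checks and alerts.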

For more information about the nvsm command, refer to the NVIDIA System Management User Guide.