Power Supply Replacement

This chapter describes how to replace one of the DGX-2 System power supplies (PSUs).

Power Supply Replacement Overview

This is a high-level overview of the steps needed to replace a power supply.
  1. Identify the failed power supply through the BMC and submit a service ticket.
  2. Get a replacement power supply from NVIDIA Enterprise Support.
  3. Locate the failed power supply in the chassis, using the diagram and the indicator LEDs as a reference.
  4. Remove the power cord from the power supply that will be replaced.
  5. Remove the failed power supply.
  6. Insert the new power supply.
  7. Insert the power cord and make sure both LEDs (IN and OUT) light up green.
  8. Use the BMC to confirm that the power supply is working correctly.

Identifying the Failed Power Supply

Identifying the Failed Power Supply from the Back

If physical access to the system is available, you can identify a failed PSU by inspecting the LEDs on the power supply.

Both LEDs should be solid green. If either LED is not green or is blinking, contact NVIDIA Enterprise Support to troubleshoot the issue.

Identifying the Failed Power Supply from the Console

You can identify the failed PSU from the DGX-2 console in either of two ways.
  • Use the NVSM CLI as follows.
    $ sudo nvsm show psus

    The output shows information for each PSU. Look for any that do not report Status_Health=OK.

  • You can also log in to the BMC, then click Sensor from the left side menu and inspect the PSU information from the Normal Sensors section.
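
The NVSM check above can be scripted. The following is a minimal sketch, not an NVIDIA-provided tool. It assumes that the output of nvsm show psus consists of one block per PSU, where each block starts with a path line ending in PSUx and contains a Status_Health property line. The sample_output function and its contents are hypothetical stand-ins for the real command output, and the patterns may need adjusting to match what your system prints.

```shell
#!/bin/sh
# Hypothetical sample standing in for `sudo nvsm show psus` output.
# The real layout may differ; adjust the patterns to match your system.
sample_output() {
cat <<'EOF'
/chassis/localhost/power/PSU0
Properties:
    Status_Health = OK
/chassis/localhost/power/PSU1
Properties:
    Status_Health = Critical
EOF
}

# Print the name of every PSU whose Status_Health value is not "OK".
unhealthy_psus() {
    awk '
        # Remember the most recent PSU name seen on a path line.
        /PSU[0-9]+$/ { n = split($0, parts, "/"); psu = parts[n] }
        /Status_Health/ {
            value = $0
            sub(/^[^=]*=[ \t]*/, "", value)   # keep only the text after "="
            sub(/[ \t\r]+$/, "", value)       # trim trailing whitespace
            if (value != "OK") print psu
        }
    '
}

sample_output | unhealthy_psus
```

On a live system, the same filter would be fed from the real command, for example: sudo nvsm show psus | unhealthy_psus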

Both NVSM and the BMC identify each power supply as PSUx, where x is from 0 to 5. The following diagram shows the physical location of each PSU.

Identifying the Power Supply Manufacturer

Enter the following NVSM CLI command to see the manufacturer of the PSUs in the system.
$ sudo nvsm show psus | grep Manufacturer

Request a replacement PSU from NVIDIA Enterprise Support, specifying this information.
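
If the grep output is long, it can be reduced to a list of distinct manufacturer names. The following is a minimal sketch that assumes each matching line has the form "Manufacturer = name". The vendor names VendorA and VendorB and the sample_manufacturers function are hypothetical placeholders, not real output.

```shell
#!/bin/sh
# Hypothetical sample standing in for
# `sudo nvsm show psus | grep Manufacturer` output.
sample_manufacturers() {
cat <<'EOF'
    Manufacturer = VendorA
    Manufacturer = VendorA
    Manufacturer = VendorB
EOF
}

# Reduce the Manufacturer lines to a sorted list of distinct vendor names.
distinct_manufacturers() {
    # Strip everything up to "=" and any trailing whitespace, then dedupe.
    sed 's/^[^=]*=[[:space:]]*//; s/[[:space:]]*$//' | sort -u
}

sample_manufacturers | distinct_manufacturers
```

On a live system, the filter would be appended to the command shown above: sudo nvsm show psus | grep Manufacturer | distinct_manufacturers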

Replacing the Power Supply

  1. Be sure you have obtained the replacement PSU and that you have saved the packaging to use when sending back the failed PSU.
  2. Determine whether you need to shut down the system.
    • If the five remaining PSUs are working and energized, then you do not need to shut down power to the DGX-2 System.
    • If fewer than five PSUs are working and energized, then you do need to shut down power to the DGX-2 System.
  3. Unplug the power cable from the PSU to be replaced. You may need to dislodge the power cord from the retaining clip.
  4. Remove the PSU.
    1. Push on the blue tab to release the lock.

    2. Pull on the handle to remove the PSU from the chassis.

  5. Install the new power supply.
    1. Insert the new power supply into the chassis and push it all the way in, making sure that the blue locking mechanism engages.
    2. Plug in the power cord and attach the retaining clip.
    3. If needed, power on the system.
  6. Confirm the installation by doing the following:
    • View the PSU status on the BMC dashboard->Sensors page.
    • Run nvsm show health to confirm the health of the system.
  7. Pack the old power supply and ship it back to NVIDIA Enterprise Support.
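
The shutdown decision in step 2 can be expressed as a simple count. The following is a minimal sketch under two assumptions: that nvsm show psus prints one block per PSU with a path line ending in PSUx and a Status_Health property line, and that Status_Health = OK is an acceptable proxy for "working and energized". The sample_output function is a hypothetical stand-in for the real command output. Confirm the result against the PSU LEDs before acting on it.

```shell
#!/bin/sh
# Count PSUs, other than the one being replaced ($1), that report
# Status_Health OK. Reads nvsm-style text on stdin (layout is an assumption).
healthy_remaining() {
    awk -v skip="$1" '
        /PSU[0-9]+$/ { n = split($0, parts, "/"); psu = parts[n] }
        /Status_Health/ {
            value = $0
            sub(/^[^=]*=[ \t]*/, "", value)
            sub(/[ \t\r]+$/, "", value)
            if (psu != skip && value == "OK") count++
        }
        END { print count + 0 }
    '
}

# Apply the rule from step 2: five healthy remaining PSUs allow a hot swap.
shutdown_needed() {
    if [ "$(healthy_remaining "$1")" -ge 5 ]; then
        echo "no"
    else
        echo "yes"
    fi
}

# Hypothetical sample: six PSUs, PSU2 failed, the rest healthy.
sample_output() {
    for i in 0 1 2 3 4 5; do
        echo "/chassis/localhost/power/PSU$i"
        if [ "$i" -eq 2 ]; then
            echo "    Status_Health = Critical"
        else
            echo "    Status_Health = OK"
        fi
    done
}

sample_output | shutdown_needed PSU2
```

On a live system, the check would read from the real command and name the PSU being replaced, for example: sudo nvsm show psus | shutdown_needed PSU2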