DIMM Replacement

Caution

Static Sensitive Devices: Be sure to observe best practices for electrostatic discharge (ESD) protection. This includes making sure personnel and equipment are connected to a common ground, such as by wearing a wrist strap connected to the chassis ground, and placing components on static-free work surfaces.

DIMM Replacement Overview

This is a high-level overview of the procedure to replace a dual inline memory module (DIMM) on the DGX H100 system.

  1. Use the nvsm show health command to identify the failed DIMM

  2. Get a replacement DIMM from NVIDIA Enterprise Support

  3. Shut down the system

  4. Label all motherboard tray cables and unplug them

  5. Remove the motherboard tray and place on a solid flat surface

  6. Remove the motherboard tray lid

  7. Use the reference diagram on the lid of the motherboard tray to identify the failed DIMM

  8. Replace the bad DIMM with the new one

  9. Close the lid on the motherboard tray

  10. Insert the motherboard tray into the system

  11. Plug in all cables using the labels as a reference

  12. Power on the system

  13. Verify that all DIMMs are now healthy with nvsm show health

  14. Ship the failed DIMM back to NVIDIA Enterprise Support using the packaging provided

Identifying the Failed DIMM

  1. From the console, run the following nvsm command to identify memory alerts:

    sudo nvsm show health
    
  2. Run the following command to determine the DIMM manufacturer:

    sudo nvsm show memory
    
  3. Request the replacement DIMM from NVIDIA Enterprise Support, specifying the manufacturer.
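The health report can be long, so it may help to narrow it to the memory-related lines before contacting support. The following is a minimal sketch, not an NVIDIA-provided tool: filter_memory_lines is a hypothetical helper name, and because the exact layout of the nvsm output is not described here, the filter only assumes that relevant lines contain the words "DIMM" or "memory".

```shell
#!/bin/sh
# Sketch only: the layout of nvsm output is an assumption, so this
# filter simply keeps lines that mention DIMMs or memory.

# filter_memory_lines (hypothetical helper) reads a report on stdin and
# prints the matching lines, each prefixed with its line number.
filter_memory_lines() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -n -i -E 'dimm|memory' || true
}

# Example use on the system (requires nvsm and sudo):
#   sudo nvsm show health | filter_memory_lines
#   sudo nvsm show memory > memory-before.txt
```

Saving the memory report to a file, as in the last commented line, keeps a record of the manufacturer details to quote in the replacement request.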

Replacing the DIMM

  1. Power off the system.

  2. Remove the motherboard tray. Refer to Motherboard Tray - Removal and Installation for more information.

  3. Pull the motherboard tray out of the system and place it on a solid, flat surface. Then remove the lid and air baffles to expose the DIMMs.

  4. Identify the failed DIMM on the motherboard. Use the label on the lid to identify the position of the DIMM to be replaced. The names of the DIMMs also include the CPU numbering for easier identification.

    _images/motherboard.png
  5. Remove the DIMM. Press down on the side latches at both ends of the DIMM socket to push them away from the DIMM. This should unseat the DIMM from the socket.

    _images/dimm-socket-levels.png
  6. Install the replacement DIMM. Make sure both levers are in the open position. Align the notch on the DIMM with the key in the socket, then press down on the DIMM until it clicks into the socket and the levers close.

    _images/dimm-socket-open.png _images/dimm-location.png

Finalize DIMM Replacement

  1. Install the air baffles, close the motherboard tray lid, and install the tray in the chassis. Refer to Motherboard Tray - Removal and Installation for more information.

  2. Plug in all cables.

  3. Install all power cords.

    _images/case-rear.png
  4. Power on the system.

  5. Log in and use the nvsm command to confirm that the system is healthy:

    sudo nvsm show health
    
  6. Ship the bad DIMM back to NVIDIA Enterprise Support.
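To confirm what changed, it can be useful to compare a memory report saved before the replacement with one taken afterwards. The sketch below assumes such a report was saved to a file beforehand (for example with sudo nvsm show memory > memory-before.txt). The helper name compare_reports and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch only: compares two saved text reports, whatever their layout.

# compare_reports (hypothetical helper) prints a unified diff of the two
# files, or a short message when they are identical.
compare_reports() {
    if diff -u "$1" "$2"; then
        echo "No differences between $1 and $2"
    fi
}

# Example use on the system (requires nvsm and sudo):
#   sudo nvsm show memory > memory-after.txt
#   compare_reports memory-before.txt memory-after.txt
```

Lines prefixed with - come from the first file and lines prefixed with + from the second, so entries for the replaced DIMM should appear as changed lines.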