DIMM Upgrade and Replacement

Caution

Static Sensitive Devices: Be sure to observe best practices for electrostatic discharge (ESD) protection. Ensure that personnel and equipment are connected to a common ground, such as wearing a wrist strap connected to the chassis ground and placing components on static-free work surfaces.

DIMM Upgrade Procedure

To upgrade DIMMs:

  1. Contact NVIDIA to obtain the complete upgrade kit.

  2. Replace all DIMMs following the instructions in the DIMM Replacement section.

DIMM Replacement Overview

This is a high-level overview of the procedure to replace a dual inline memory module (DIMM) on the NVIDIA DGX™ B200 system.

  1. Use the nvsm show health command to identify the failed DIMM.

  2. Get a replacement DIMM from NVIDIA Enterprise Support.

  3. Shut down the system.

  4. Label all motherboard tray cables and unplug them.

  5. Remove the motherboard tray and place it on a solid flat surface.

  6. Remove the motherboard tray lid.

  7. Use the reference diagram on the lid of the motherboard tray to identify the failed DIMM.

  8. Replace the failed DIMM with the new one.

  9. Close the lid on the motherboard tray.

  10. Insert the motherboard tray into the system.

  11. Plug in all cables using the labels as a reference.

  12. Power on the system.

  13. Use the nvsm show health command to verify that all DIMMs are now healthy.

  14. Send the failed DIMM to NVIDIA Enterprise Support using the packaging provided.

Note

You should observe the following DIMM population guidelines:

  • Each memory channel (A, B, C, D, E, F, G, H) should be populated with identical DIMMs for optimal performance in a dual-memory configuration. For example, DIMMs in slots CPU1_B0 and CPU1_B1 within channel B should have the same part number.

  • Different memory channels can be populated with DIMMs of different part numbers. For example, the DIMMs in slots CPU1_A0 and CPU1_A1 should share one part number, and the DIMMs in slots CPU1_B0 and CPU1_B1 should share one part number, but the part number and the DIMM manufacturer in channel A can differ from those in channel B.
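The pairing rule in the note above can be checked mechanically. The following is a minimal sketch, not an NVIDIA tool: it assumes you have already written the slot names and part numbers into a plain list, one "SLOT PART_NUMBER" pair per line. The function name check_channel_pairs, the input format, and the part numbers shown are all hypothetical.

```shell
# Sketch: flag memory channels whose DIMMs report different part numbers.
# Input (stdin): one "SLOT PART_NUMBER" pair per line, for example
#   CPU1_B0 PN-EXAMPLE-1
# The input format and part numbers are hypothetical placeholders.
check_channel_pairs() {
  awk '
    BEGIN { bad = 0 }
    NF >= 2 {
      slot = $1
      # Drop the trailing slot digit to get the channel: CPU1_B0 becomes CPU1_B
      channel = substr(slot, 1, length(slot) - 1)
      if (channel in part) {
        if (part[channel] != $2) {
          printf "MISMATCH %s: %s vs %s\n", channel, part[channel], $2
          bad = 1
        }
      } else {
        part[channel] = $2
      }
    }
    END { exit bad }   # non-zero exit status if any channel is mismatched
  '
}

# Example use with a hand-written inventory file (hypothetical name):
#   check_channel_pairs < dimm-inventory.txt
```

The function prints one line per mismatched channel and returns a non-zero status if it finds any. Channels with different part numbers from each other pass, as the guidelines allow.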

Identifying the Failed DIMM

  1. From the console, run the following nvsm command to identify the failed DIMM:

    sudo nvsm show health
    
  2. Determine the DIMM manufacturer:

    sudo nvsm show memory
    
  3. Request a replacement DIMM from NVIDIA Enterprise Support, specifying the manufacturer.
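It can help to save both reports before shutting the system down, so that the slot name and manufacturer are on hand when you contact support. The sketch below is one possible way to do that. The helper filter_dimm_lines and the file names are hypothetical, and the filter assumes that the report text names memory modules with the string "DIMM", which this document does not confirm.

```shell
# Sketch: keep only the lines of a report that mention a DIMM.
# Assumes the report text contains the string "DIMM" (case-insensitive)
# on the relevant lines; adjust the pattern if your output differs.
filter_dimm_lines() {
  grep -i 'dimm' || true   # finding no matching lines is not an error here
}

# Example use on the system, with the commands from the steps above
# (output file names are hypothetical):
#   sudo nvsm show health | tee health-before.txt | filter_dimm_lines
#   sudo nvsm show memory > memory-before.txt
```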

Replacing the DIMM

  1. Power off the system.

  2. Remove the motherboard tray. Refer to Motherboard Tray - Removal and Installation for more information.

  3. Pull the motherboard tray out of the system and place it on a solid, flat surface.

    Remove the lid and air baffles to expose the DIMMs.

  4. Identify the failed DIMM on the motherboard.

    Use the label on the lid to identify the position of the DIMM to be replaced. Each DIMM name includes the CPU number (for example, CPU1_B0) for easier identification.

    _images/dgx-b200-motherboard.png
  5. To remove the failed DIMM, press down on the ejection levers to eject the DIMM from the socket.

    _images/dimm-socket-levels.png
  6. To insert the new DIMM, position it in the socket and press down until the levers close and the DIMM clicks into place.

    _images/dimm-socket-open.png _images/dimm-location.png

Finalize the DIMM Replacement

  1. Install the air baffles, close the motherboard tray lid, and install the tray in the chassis. For more information, refer to Motherboard Tray - Removal and Installation.

  2. Plug in all cables.

  3. Install all power cords.

  4. Power on the system.

  5. Log in and use the nvsm command to confirm the system is healthy:

    sudo nvsm show health
    
  6. Send the failed DIMM to NVIDIA Enterprise Support.
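As an optional cross-check alongside the nvsm health report, you can count how many DIMM slots the firmware reports as populated. This sketch uses the standard Linux dmidecode utility rather than an NVIDIA tool. It assumes dmidecode is installed on the system, and the helper name count_populated_dimms is hypothetical. Compare the result against the number of DIMMs you expect for your configuration.

```shell
# Sketch: count populated DIMM slots in `dmidecode --type memory` output.
# Reads the report on stdin and counts "Size:" fields that do not read
# "No Module Installed". Fields such as "Volatile Size:" are ignored
# because the pattern is anchored to the start of the field name.
count_populated_dimms() {
  awk '
    /^[ \t]*Size:/ && $0 !~ /No Module Installed/ { n++ }
    END { print n + 0 }
  '
}

# Example use on the system:
#   sudo dmidecode --type memory | count_populated_dimms
```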