System Memory Replacement

This section provides information about how to replace the system memory (DIMM).

System Memory Replacement Overview

This is a high-level overview of the process to replace the system memory (DIMM).

  1. Identify the failed DIMM.

  2. Contact NVIDIA Enterprise Support to obtain a replacement.

  3. Power off the system and turn off the power supply switch.

  4. Open the left cover (motherboard side).

  5. Remove the air baffle.

  6. Locate the failed DIMM on the motherboard.

  7. Replace the failed DIMM with the new one.

  8. Install the air baffle.

  9. Install the system cover.

  10. Power on the system.

  11. Test the memory and overall system health.

Identify the Failed DIMM

  1. To identify the failed DIMM, run the following command and review its output:

    $ sudo nvsm show health
    
  2. Contact NVIDIA Enterprise Support, provide the requested output, and obtain a replacement DIMM.

  3. After the new DIMM arrives, power off the DGX Station A100 and switch off the power supply.

  4. Remove the left system cover by pressing on the button that releases it.

    _images/remove-left-system-cover.png

  5. Pull the cover off and set it aside.

    _images/pull-cover-off.png

  6. Remove the air baffle.

    _images/remove-air-baffle.png
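
The health output from step 1 can be long, so it may help to save a copy for NVIDIA Enterprise Support and to skim only the memory-related lines. The following is a minimal sketch and not part of the NVIDIA procedure: `memory_lines` is a hypothetical helper, `nvsm-health.txt` is an arbitrary file name, and the keyword filter assumes that DIMM problems appear on lines containing "DIMM" or "memory", which this section does not confirm.

```shell
# Hypothetical helper: read a health report on stdin and print only the
# lines that mention DIMMs or memory (case-insensitive).
# Assumption: memory problems are reported on lines containing these words.
memory_lines() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -i -E 'dimm|memory' || true
}

# Usage on the DGX Station A100 (run before powering off the system):
#   sudo nvsm show health | tee nvsm-health.txt | memory_lines
# tee keeps the complete report in nvsm-health.txt for the support case.
```

If the filter prints nothing, review the complete saved report rather than concluding that the memory is healthy.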

Locate and Replace the Failed DIMM

  1. Use this diagram, or the label located on the back of the system cover, to locate the DIMM that needs to be replaced.

    _images/dimm-diagram.png

  2. Use the ejector lever on the DIMM to push it out of its socket.

    _images/push-dimm-out-of-socket.png

  3. Install the new DIMM and press down until the ejector lever returns to its locked position.

    _images/return-dimm-locked-position.png

After you replace the failed DIMM, see Close the System and Check the Memory.

Close the System and Check the Memory

After replacing the failed DIMM, you need to close the DGX Station A100 and check the memory.

  1. Install the air baffle by inserting it in the holes on the right side and then allowing the magnets on the left side to secure it in place.

  2. Close the system cover by inserting the bottom of the cover into the chassis, and then rotating it until it locks.

    _images/close-lock-system-cover.png

  3. To confirm that the DGX Station A100 is healthy and that the memory is working properly, power on the system, and run the following command:

    $ sudo nvsm show health
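
As a complementary check that is not part of the NVIDIA procedure, the standard Linux `dmidecode` tool can show whether the firmware detects a module in every slot after the replacement. The following is a minimal sketch: `list_dimms` is a hypothetical helper, it assumes that `dmidecode` is installed and prints `Size:` before `Locator:` in each memory device entry, and the slot names reported by the firmware might not match the labels in the DIMM diagram.

```shell
# Hypothetical helper: read `dmidecode -t memory` output on stdin and print
# one "<locator>: <size>" line per memory slot.
# Assumption: within each entry, "Size:" appears before "Locator:".
list_dimms() {
    awk -F': ' '
        /^[ \t]*Size:/    { size = $2 }            # remember the slot size
        /^[ \t]*Locator:/ { print $2 ": " size }   # skips "Bank Locator:"
    '
}

# Usage on the DGX Station A100:
#   sudo dmidecode -t memory | list_dimms
```

A slot that reports `No Module Installed` where a DIMM was just fitted suggests that the module is not fully seated.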