M.2 Boot Drive Riser Assembly Replacement

This chapter applies when both M.2 OS drives need to be replaced. In that case, order a replacement riser assembly, which includes both M.2 NVMe drives.

M.2 Boot Drive Riser Assembly Replacement Overview

This is a high-level overview of the procedure to replace the boot drive riser assembly.
  1. Confirm that both M.2 drives are inaccessible and need to be replaced.
  2. Obtain a replacement M.2 riser assembly from NVIDIA Enterprise Support.
  3. Power down the system.
  4. Label all motherboard tray cables and unplug them.
  5. Slide out the motherboard tray and open the motherboard tray lid.
  6. Pull out the M.2 riser assembly with both M.2 drives attached.
  7. Install the new M.2 riser assembly with both M.2 drives attached.
  8. Close the lid on the motherboard tray.
  9. Slide the motherboard tray into the system.
  10. Plug in all cables using the labels as a reference.
  11. Power on the system.
  12. Re-install the OS and confirm the system is healthy.
  13. Ship the failed riser assembly back to NVIDIA Enterprise Support using the packaging provided.

Determining a Failed M.2 NVMe Riser Assembly

NVIDIA Enterprise Support may instruct you to replace the M.2 riser assembly under any of the following conditions:
  • The DGX A100 cannot be booted.
  • The boot drives cannot be seen from the SBIOS.
  • The system indicates that the boot drives are not available when booting from the ISO image.
  • Communication with the M.2 boot drives is lost.
  • The M.2 riser assembly is damaged.
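
If the system can still boot some Linux environment (for example, from the ISO image), one way to see which NVMe devices the kernel currently exposes is to list the entries under /sys/block. The following is a minimal sketch, not an NVIDIA-provided tool. The function name visible_nvme_devices is hypothetical, and the sketch does not know which device names correspond to the M.2 boot drives: any other NVMe drives in the system also appear in the list, so interpret the result together with NVIDIA Enterprise Support.

```python
from pathlib import Path


def visible_nvme_devices(sys_block="/sys/block"):
    """Return the sorted names of NVMe block devices the kernel exposes.

    Hypothetical helper for illustration only. It reports every NVMe
    namespace it finds and cannot tell boot drives from other NVMe drives.
    """
    root = Path(sys_block)
    if not root.is_dir():
        # Not a Linux sysfs tree (or the path is wrong): nothing to report.
        return []
    return sorted(p.name for p in root.iterdir() if p.name.startswith("nvme"))


if __name__ == "__main__":
    devices = visible_nvme_devices()
    if devices:
        print("NVMe block devices visible:", ", ".join(devices))
    else:
        print("No NVMe block devices are visible to the OS.")
```

An empty or shorter-than-expected list is only a hint that drives are missing. The conditions listed above, as assessed by NVIDIA Enterprise Support, remain the basis for replacing the riser assembly.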

Replacing the M.2 NVMe Riser Assembly

Before attempting to replace the M.2 NVMe riser assembly, be sure you have obtained the replacement assembly and have saved the packaging for use when returning the faulty riser assembly.

CAUTION: Static Sensitive Devices. Be sure to observe best practices for electrostatic discharge (ESD) protection. This includes making sure personnel and equipment are connected to a common ground, such as by wearing a wrist strap connected to the chassis ground, and placing components on static-free work surfaces.

  1. Power down the system. You will likely need to use the BMC console.
  2. Label all cables connected to the motherboard tray for easy identification when reconnecting.
  3. Remove the motherboard tray.

    Refer to the instructions in the section Accessing the Motherboard Tray.

  4. Remove the M.2 riser card from the motherboard tray by lifting the riser assembly.

  5. Install the replacement riser assembly on the motherboard by inserting the riser card into its slot.

  6. Close the motherboard tray lid and then install the motherboard tray.

    Refer to the instructions in the section Replacing the Motherboard Tray.

  7. Reconnect all cables to the motherboard tray, using the labels as a reference.
  8. Power on the system and re-install the DGX OS server software. See the DGX A100 User Guide for detailed instructions.
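
Step 1 notes that powering down will likely require the BMC. Besides the BMC console, a BMC that accepts IPMI over LAN can usually be driven with the standard ipmitool utility. The sketch below only illustrates that approach and rests on assumptions: that ipmitool is installed on a management host, that IPMI over LAN is enabled on the BMC, and that the address and user name shown are placeholders. The helper names are hypothetical. The password is read from the IPMI_PASSWORD environment variable (the ipmitool -E option) so that it does not appear on the command line.

```python
import subprocess

# Chassis power actions this sketch allows; all are standard ipmitool
# "chassis power" subcommands.
ALLOWED_ACTIONS = {"status", "soft", "off", "on"}


def ipmi_power_command(host, user, action):
    """Build the ipmitool argument list for a chassis power action.

    Hypothetical helper: host and user are placeholders supplied by the
    caller, not values taken from this manual.
    """
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported power action: {action!r}")
    return [
        "ipmitool", "-I", "lanplus",
        "-H", host, "-U", user,
        "-E",  # read the password from the IPMI_PASSWORD environment variable
        "chassis", "power", action,
    ]


def run_ipmi_power(host, user, action):
    """Run the command and return ipmitool's text output."""
    result = subprocess.run(
        ipmi_power_command(host, user, action),
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


# Example (placeholder address and user; requires IPMI_PASSWORD to be set):
#   run_ipmi_power("192.0.2.10", "admin", "soft")    # request a soft shutdown
#   run_ipmi_power("192.0.2.10", "admin", "status")  # confirm power is off
```

The same helper with the "on" action could be used for the power-on in the final step. A soft shutdown depends on the operating system responding, so if the system cannot boot, the "off" action or the BMC console may be needed instead.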

Returning the Riser Assembly

Use the packaging from the new riser assembly and follow the instructions that came with the package to ship the old riser assembly back to NVIDIA Enterprise Support.

Note: If your organization has purchased a media retention policy, you may be able to keep failed drives for destruction. Check with NVIDIA Enterprise Support on the status of the policy for specifics.