U.2 NVMe Cache Drive Replacement

U.2 NVMe Cache Drive Replacement Overview

This is a high-level overview of the procedure to replace a cache Non-Volatile Memory Express (NVMe) drive.

  1. Identify the failed SSD.

  2. Request a replacement SSD from NVIDIA Enterprise Support.

  3. Power off the system.

  4. Remove the failed SSD identified earlier.

  5. Insert the new SSD.

  6. Power on the system.

  7. Rebuild the RAID volume and mount the filesystem.

  8. Return the failed unit to NVIDIA Enterprise Support using the packaging provided.
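
Step 7 above does not say how to check the state of the RAID volume. As a minimal sketch, assuming the cache volume is a Linux md (software RAID) array, which this page does not state, the standard /proc/mdstat file can be summarized with a small helper. The function name md_array_states is hypothetical and is not part of any NVIDIA tool.

```shell
# Hypothetical helper (an assumption, not an NVIDIA tool): summarize Linux md
# (software RAID) arrays from /proc/mdstat-style text on standard input,
# printing one line per array as "<array> <state> <level>".
md_array_states() {
    awk '
        /^md[0-9]+ :/ {
            state = $3
            level = "-"                       # inactive arrays list no level
            if (state == "active") {
                # "active (auto-read-only) raid1 ..." puts the level in $5
                level = ($4 ~ /^\(/) ? $5 : $4
            }
            print $1, state, level
        }
    '
}

# Example usage (on the system itself):
#   md_array_states < /proc/mdstat
```

An array that is reported as inactive, or that is missing from the output, suggests that the volume still needs to be rebuilt.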

Identifying the Failed U.2 NVMe SSD

Identifying the Failed NVMe from the Front

If you have physical access to the system, you can identify a failed drive by its illuminated amber LED.

_images/b200-u2-nvme-mapping.png

Identifying the Failed NVMe from the Console

  • To identify the failed drive, run the nvsm command:

    sudo nvsm show health
    

    View the command output and look for drive alerts to identify the failed drive.

  • Alternatively, you can use the BMC Web User Interface to access the Sensor screen, the IPMI event log, and the System log to identify issues with the U.2 drives.
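
The health report can be long, so it may help to narrow it to drive-related lines. The following is a minimal sketch: the helper name filter_drive_lines is hypothetical, and it makes no assumption about the exact layout of the nvsm output. It is only a case-insensitive text filter for lines that mention "nvme" or "drive".

```shell
# Hypothetical helper (not part of nvsm): keep only the lines read from
# standard input that mention drives or NVMe devices. It does not interpret
# the report format; it is a plain case-insensitive text filter.
filter_drive_lines() {
    # "|| true" keeps the exit status zero when no line matches.
    grep -i -E 'nvme|drive' || true
}

# Example usage (on the system itself):
#   sudo nvsm show health | filter_drive_lines
```

Review the full report as well, because an alert might be worded without either keyword.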

Identifying the NVMe Manufacturer and Model

  • Use the nvsm command to display the drive information:

    sudo nvsm show /systems/localhost/storage/drives/nvmeXn1
    

    Replace X in the preceding command with the number in the Linux device name of the failed drive. For example, X is 5 for the device nvme5n1.

    Example Output

     /systems/localhost/storage/drives/nvme5n1
     Properties:
         PhysicalLocation_Info = SlotU.2_Slot3
         BlockSizeBytes = 512
         SerialNumber = 22L0A01WT2N8
         Model = KCM6DRUL3T84
         Revision = 0107
         Manufacturer = KIOXIA Corporation
         Status_State = Enabled
         Status_Health = OK
         Name = nvme5n1
         MediaType = SSD
         EncryptionStatus = Unlocked
         CapacityBytes = 3840755982336
         Id = nvme5n1
     Targets:
     Verbs:
         cd
         set
         show
    

    Note the Manufacturer and Model fields in the output, and provide these values when you request a replacement NVMe drive from NVIDIA Enterprise Support.
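
To pull individual fields out of this output without reading it by eye, a small text filter is one option. The sketch below assumes the "Key = Value" layout shown in the example output above. The helper name nvsm_property and the file name drive.txt are hypothetical.

```shell
# Hypothetical helper (not part of nvsm): print the value of one
# "Key = Value" property from nvsm-style output read on standard input.
# Only the first matching line is printed.
nvsm_property() {
    awk -v key="$1" '
        {
            pos = index($0, " = ")
            if (pos == 0) next                 # not a property line
            name = substr($0, 1, pos - 1)
            sub(/^[ \t]+/, "", name)           # strip leading indentation
            if (name == key && !found) {
                print substr($0, pos + 3)      # everything after " = "
                found = 1
            }
        }
    '
}

# Example usage, with X replaced by the number of the failed drive:
#   sudo nvsm show /systems/localhost/storage/drives/nvmeXn1 > drive.txt
#   nvsm_property Manufacturer < drive.txt
#   nvsm_property Model < drive.txt
```

With the example output shown earlier, the two calls would print KIOXIA Corporation and KCM6DRUL3T84.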

Replacing the U.2 NVMe Drive

  1. Confirm that you have received the replacement drive from NVIDIA Enterprise Support.

  2. Back up any critical data to a network share or another backup location.

  3. Power off the system using the power button.

  4. Remove the bezel. Refer to Removing and Attaching the Bezel for more information.

  5. After the system powers off, use the following figure to identify the drive to replace in the chassis.

    The figures in the following steps show the replacement of drive number 7 at PCI address ae.

    _images/b200-u2-nvme-mapping.png
  6. Remove the NVMe drive.

    1. Press the tab on the right side of the drive to release the lever:

      _images/b200-nvme-lever.png
    2. Pull the drive out by using the lever:

      _images/b200-nvme-lever-remove.png
    3. Remove the drive:

      _images/b200-u2-nvme-remove.png

Insert the U.2 NVMe Drive

  1. Open the new drive’s ejector handle by pressing the release tab, and insert the drive fully, until the connector on the drive engages with the midplane:

    _images/b200-nvme-install.png
  2. Close the handle on the drive to secure the drive in place:

    _images/b200-nvme-lever-close.png
  3. Confirm that the drive is flush with the system:

    _images/b200-nvme-flush.png
  4. Install the bezel after the drive replacement is complete.
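
This page does not describe a verification step. One possible check after the system is powered on again is to list the NVMe namespace devices that Linux has detected and confirm that the replaced drive appears. The following is a minimal sketch: the helper name list_nvme_namespaces is hypothetical, and it assumes the usual Linux naming of nvme<N>n<M> device nodes under /dev.

```shell
# Hypothetical helper (an assumption, not an NVIDIA tool): list NVMe
# namespace block devices (nvme<N>n<M>) found in a device directory, one
# name per line. Controller nodes (nvme0) do not match the glob, and
# partition nodes such as nvme0n1p1 are skipped.
list_nvme_namespaces() {
    dev_dir="${1:-/dev}"
    for path in "$dev_dir"/nvme*n*; do
        [ -e "$path" ] || continue          # glob matched nothing
        name="${path##*/}"
        case "$name" in
            *p*) continue ;;                # skip partitions (nvme0n1p1)
        esac
        printf '%s\n' "$name"
    done
}

# Example usage (on the system itself):
#   list_nvme_namespaces
```

A device number can change after a drive is replaced, so compare the list against the physical slot information reported by nvsm rather than relying on the device name alone.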

Next Steps