U.2 NVMe Cache Drive Post-Installation Tasks

This section describes the tasks that you typically need to perform after replacing a U.2 NVMe drive.

Recreating the Cache RAID 0 Volume

  1. Power on the system and log in.

  2. Confirm that all expected drives are visible:

    sudo nvme list
    

    The output should list every installed drive: typically two boot drives and up to eight cache drives, depending on how many are installed in the system.

    Example Output

    Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
    ---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
    /dev/nvme0n1     S4YPNE0N200093       SAMSUNG MZWLJ3T8HBLS-00007               1           3.84  TB /   3.84  TB    512   B +  0 B   EPK9CB5Q
    /dev/nvme1n1     S4YPNE0N200040       SAMSUNG MZWLJ3T8HBLS-00007               1           3.84  TB /   3.84  TB    512   B +  0 B   EPK9CB5Q
    /dev/nvme2n1     S436NA0N106764       SAMSUNG MZ1LB1T9HALS-00007               1          44.44  GB /   1.92  TB    512   B +  0 B   EDA7602Q
    /dev/nvme3n1     S436NA0N106850       SAMSUNG MZ1LB1T9HALS-00007               1          45.18  GB /   1.92  TB    512   B +  0 B   EDA7602Q
    ...
    
  3. If the cache volume was locked with an access key, unlock the drives:

    sudo nv-disk-encrypt disable
    

    This command requires the disk encryption packages to be installed on the system. Refer to the NVIDIA DGX H100 User Guide for more information.

  4. Recreate the cache volume and the /raid filesystem:

    sudo configure_raid_array.py -c -f
    

    At the prompt, enter y to confirm the rebuild action.

  5. Optional: To lock the volume with an access key, refer to the NVIDIA DGX H100 User Guide.

  6. Confirm the volume is healthy:

    sudo nvsm show volumes
    

Make sure that the drive firmware is up to date. Refer to the NVIDIA DGX H100 Firmware Update Guide for more information.
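
As a quick cross-check for the drive-visibility step, the rows of the nvme list output can be counted and compared against the number of drives you expect. The following is a minimal sketch, not part of the NVIDIA tooling: the function name count_nvme_drives is hypothetical, and the expected count of 10 (two boot drives plus eight cache drives) is an assumption that you should adjust to match your configuration.

```shell
# Count the NVMe namespaces listed in `nvme list` output read from stdin.
# count_nvme_drives is a hypothetical helper name, not part of nvme-cli.
count_nvme_drives() {
    # Device rows start with a /dev/nvme node; header and separator rows do not.
    # grep exits non-zero when the count is 0, so `|| true` keeps the status clean.
    grep -cE '^[[:space:]]*/dev/nvme' || true
}

# Usage (assumes nvme-cli is installed and that 10 drives are expected):
#   found=$(sudo nvme list | count_nvme_drives)
#   [ "$found" -eq 10 ] || echo "Expected 10 NVMe drives, found $found" >&2
```

A lower count than expected suggests that a drive is not seated correctly or was not detected, in which case the cache volume should not be recreated until the drive is visible.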

Returning the NVMe Drive

Use the packaging from the new drive and follow the instructions that came with the package to ship the old drive back to NVIDIA Enterprise Support.

Note

If your organization purchased a media retention policy, you might be able to keep failed drives for destruction. Contact NVIDIA Enterprise Support to confirm the status of your policy.