U.2 NVMe Cache Drive Post-Installation Tasks
This section describes the tasks that you typically need to perform after replacing a U.2 NVMe drive.
Recreating the Cache RAID 0 Volume
Power on the system and log in.
Confirm that all expected drives are visible:
sudo nvme list
The output should list every installed drive, for example two boot drives and eight cache drives, depending on how many are installed in the system.
Example Output
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     S4YPNE0N200093       SAMSUNG MZWLJ3T8HBLS-00007               1         3.84 TB / 3.84 TB          512 B + 0 B      EPK9CB5Q
/dev/nvme1n1     S4YPNE0N200040       SAMSUNG MZWLJ3T8HBLS-00007               1         3.84 TB / 3.84 TB          512 B + 0 B      EPK9CB5Q
/dev/nvme2n1     S436NA0N106764       SAMSUNG MZ1LB1T9HALS-00007               1         44.44 GB / 1.92 TB         512 B + 0 B      EDA7602Q
/dev/nvme3n1     S436NA0N106850       SAMSUNG MZ1LB1T9HALS-00007               1         45.18 GB / 1.92 TB         512 B + 0 B      EDA7602Q
...
If the cache volume was locked with an access key, unlock the drives:
sudo nv-disk-encrypt disable
The disk encryption packages must be installed on the system. Refer to the NVIDIA DGX H100/H200 User Guide for more information.
Recreate the cache volume and the /raid filesystem:
configure_raid_array.py -c -f
At the prompt, enter y to confirm the rebuild action.
Optional: To lock the volume with an access key, refer to the NVIDIA DGX H100/H200 User Guide.
Confirm the volume is healthy:
sudo nvsm show volumes
Make sure that the drive firmware is up to date. Refer to the NVIDIA DGX H100/H200 Firmware Update Guide for more information.
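The drive-visibility check and the /raid rebuild in the steps above can also be spot-checked from a script. The following is a minimal sketch, not an NVIDIA-provided tool: the function names count_nvme_devices and is_mounted are hypothetical, the parser assumes each device row of the nvme list output begins with /dev/nvme (as in the example output), and it assumes the mountpoint utility from util-linux is available. It does not replace the nvsm show volumes health check.

```shell
#!/usr/bin/env bash
# Sketch only: helper checks to run after the documented steps.
# Function names are hypothetical and are not part of any NVIDIA tool.

# Count NVMe device rows in `nvme list` output supplied on stdin.
# Assumes each device row begins with /dev/nvme, as in the example output.
count_nvme_devices() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c '^[[:space:]]*/dev/nvme' || true
}

# Return success if the given directory (default: /raid) is a mount point.
is_mounted() {
    mountpoint -q "${1:-/raid}"
}

# Example usage; set EXPECTED to the number of drives installed in your system:
#   EXPECTED=10
#   found=$(sudo nvme list | count_nvme_devices)
#   [ "$found" -eq "$EXPECTED" ] || echo "found $found NVMe drives, expected $EXPECTED" >&2
#   is_mounted /raid || echo "/raid is not mounted" >&2
```

Reading the nvme list output from standard input keeps the counting logic separate from the privileged command, so the parser can be tried against saved output before it is run on the system.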
Returning the NVMe Drive
Use the packaging from the new drive and follow the instructions that came with the package to ship the old drive back to NVIDIA Enterprise Support.
Note
If your organization purchased a media retention policy, you might be able to keep failed drives for destruction. Contact NVIDIA Enterprise Support to confirm the status and terms of the policy.