U.2 NVMe Cache Drive Post-Installation Tasks

This chapter describes the tasks that are typically needed after replacing a U.2 NVMe drive or upgrading from 8 to 16 drives.

Recreating the Cache RAID 0 Volume

  1. Stop cachefilesd.
    $ sudo systemctl stop cachefilesd
  2. Unmount /raid and stop the RAID 0 array.
    $ sudo umount -f /raid
    $ sudo mdadm --stop /dev/md1
  3. Run the script to rebuild the RAID volume.
    $ sudo /usr/bin/configure_raid_array.py -c -f
    Answer Y to any prompts.
  4. When completed, confirm that the /raid volume is mounted.
    $ df -hl /raid
    The /dev/md1 filesystem should be mounted on /raid with a size of 28 TB (8 drives installed) or 56 TB (16 drives installed).
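
The check in step 4 can also be scripted. The following is a minimal sketch and is not part of the NVIDIA tooling. The function name raid_mount_source is hypothetical, and the sketch assumes the POSIX output format of df -P, where each data line starts with the device and ends with the mount point.

```shell
#!/bin/sh
# Hypothetical helper (not an NVIDIA tool): reads `df -P` output on stdin
# and prints the device that backs the mount point passed as $1.
# Returns a non-zero status if the mount point does not appear.
raid_mount_source() {
    awk -v mp="$1" '
        NR > 1 && $NF == mp { print $1; found = 1 }   # skip the header line
        END { exit !found }
    '
}

# Example usage on the system itself:
#   src=$(df -P /raid | raid_mount_source /raid)
#   if [ "$src" = "/dev/md1" ]; then echo "/raid is on /dev/md1"; fi
```

The size still has to be compared by eye against the 28 TB or 56 TB value given in step 4, because the exact figure that df reports depends on the unit options used.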

Confirming the Volume is Ready

  1. Confirm the storage devices and volumes in the system are healthy using the following command.
    $ sudo nvsm show systems/1/storage/1/volumes/md1
  2. Verify that the output shows Status_Health = OK and that the number of drives listed after Drives = matches the number of drives installed.
  3. Confirm that the drives are now available.
    $ sudo mdadm -D /dev/md1  
If the drive manufacturer is Micron, perform the steps in Enabling the Temperature Sensor.
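
The mdadm -D output from step 3 can be reduced to the fields of interest with a small filter. This is a sketch under stated assumptions: the function name md_field is hypothetical, and it relies on mdadm printing detail lines in the form "Key : Value" (for example "Raid Devices" and "State").

```shell
#!/bin/sh
# Hypothetical helper (not an NVIDIA tool): reads `mdadm -D` output on
# stdin and prints the value of the first line whose key equals $1.
md_field() {
    awk -v key="$1" -F' : ' '
        {
            k = $1
            gsub(/^[ \t]+|[ \t]+$/, "", k)        # trim the key
        }
        k == key && !done {
            v = $2
            gsub(/^[ \t]+|[ \t]+$/, "", v)        # trim the value
            print v
            done = 1                              # report only the first match
        }
    '
}

# Example usage on the system itself:
#   sudo mdadm -D /dev/md1 | md_field "Raid Devices"   # should match the installed drive count
#   sudo mdadm -D /dev/md1 | md_field "State"
```

Matching on the exact key avoids confusing "State" with similarly named lines such as "Active Devices".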

Enabling the Temperature Sensor

Follow the steps in this section only for Micron NVMe drives.
  1. Check whether temperature reading needs to be enabled for the installed NVMe drives by running ipmitool.
    $ sudo ipmitool sdr | grep -i -e "nvme.*temp"
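  2. If any of the NVMe drives do not show a temperature reading, run the following loop. It calls msecli once for each Micron drive reported by nvme list; the sed expression removes the last two characters (the namespace suffix, such as n1) from each device name.
    $ for drive in $(nvme list | grep Micron | cut -d' ' -f1 | sed 's/..$//')
    do /opt/MicronTechnology/MicronMSECLI/msecli -M -k 1 -n "$drive"
    done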
  3. Confirm that temperature reading for the replaced drive is enabled by running ipmitool again.
    $ sudo ipmitool sdr | grep -i -e "nvme.*temp"
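
On a system with 8 or 16 drives, it can be easier to list only the sensors that still lack a reading. The following sketch is not part of the NVIDIA or Micron tooling. The function name nvme_temp_missing is hypothetical, and the sketch assumes the usual ipmitool sdr layout of "name | reading | status", where a valid temperature reading contains the word "degrees".

```shell
#!/bin/sh
# Hypothetical helper (not an NVIDIA tool): reads `ipmitool sdr` output on
# stdin and prints the names of NVMe temperature sensors whose reading
# column does not contain a temperature value.
nvme_temp_missing() {
    awk -F'|' '
        tolower($1) ~ /nvme.*temp/ && $2 !~ /degrees/ {
            name = $1
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim padding around the name
            print name
        }
    '
}

# Example usage on the system itself:
#   sudo ipmitool sdr | nvme_temp_missing
# No output means every matching sensor reports a temperature.
```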

Returning NVMe Drives

Use the packaging from the new drive and follow the instructions that came with the package to ship the old drive back to NVIDIA Enterprise Support.
Note: If your organization has purchased a media retention policy, you may be able to keep failed drives for destruction. Contact NVIDIA Enterprise Support to confirm the status and terms of the policy.