U.2 NVMe Cache Drive Post-Installation Tasks

This chapter describes the tasks that are typically needed after replacing a U.2 NVMe drive or upgrading from four to eight drives.

Recreating the Cache RAID 0 Volume

  1. Power on the system and log in.

  2. Confirm that all expected drives are visible.

    $ sudo nvme list
    

    Two boot drives and either four or eight cache drives should be visible, depending on how many are installed in the system.

  3. The cache drives are self-encrypting drives (SEDs). If the previously installed cache drives are locked with an access key, unlock them.

    $ sudo nv-disk-encrypt disable
    
  4. Re-create the cache volume and the /raid filesystem.

    $ sudo nvsm start /systems/localhost/storage/volumes/rebuild
    

    Example output, with appropriate responses to the prompts:

    PROMPT: In order to rebuild volume, volume type is required.
    Please specify the volume type to rebuild.
    (Options: raid-0 only) Type of volume rebuild (CTRL-C to cancel): raid-0
    
    WARNING: Once the RAID-0 rebuild process is started,
    all data currently stored on raid will be lost. Start RAID-0 rebuild on raid? [y/n] y
    /systems/localhost/storage/volumes/rebuild started at 2021-01-27 16:03:41.093694
    Finished rebuilding RAID-0 on volume raid
    100.0% [=========================================]
    Status: Done
    
  5. If the volume is to be locked with an access key, then re-enable drive encryption.

    • To enter drive passwords yourself, issue the following.

      $ sudo nv-disk-encrypt init -g
      

      The software prompts you to enter a password for the vault, and then a password for each eligible SED.

      Passwords must consist only of upper-case letters, lower-case letters, digits, and the following special characters: ~ : @ % ^ + = _ ,

      $ sudo nv-disk-encrypt lock
      
    • To allow the encryption software to randomly generate the passwords, issue the following.

      $ sudo nv-disk-encrypt init -k <your-vault-password> -g -r
      

      The vault password must consist only of upper-case letters, lower-case letters, digits, and the following special characters: ~ : @ % ^ + = _ ,

      $ sudo nv-disk-encrypt lock
      
    $ sudo nvsm enable /systems/localhost/storage/volumes/md1/encryption
    
  6. Issue the following commands to confirm that both the volume and the system are healthy.

    $ sudo nvsm show storage
    
    $ sudo nvsm show health
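
The drive-count check in step 2 can be scripted. The following is a minimal sketch, not an NVIDIA-provided tool. It assumes that each drive appears in the nvme list output as exactly one row beginning with a /dev/nvme device path (one namespace per drive). The function names count_nvme_rows and drive_count_ok are hypothetical.

```shell
# Count the device rows in `nvme list` output read from stdin.
# Assumption: every drive row starts with a /dev/nvme device path,
# and header or separator lines do not.
count_nvme_rows() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps the
    # function usable under "set -e" while still printing the count.
    grep -c '^/dev/nvme' || true
}

# Succeed when the total equals 2 boot drives plus 4 or 8 cache drives.
drive_count_ok() {
    [ "$1" -eq 6 ] || [ "$1" -eq 10 ]
}
```

With both functions defined in the current shell, the check might be run as follows.

    $ drive_count_ok "$(sudo nvme list | count_nvme_rows)" && echo "drive count OK"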
    

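The password character rule in step 5 can be checked before a password is passed to nv-disk-encrypt. The following is a minimal sketch under stated assumptions: the helper name is_valid_sed_password is hypothetical and is not part of nv-disk-encrypt, and rejecting an empty password is an assumption of this sketch rather than a documented rule.

```shell
# Succeed only if the candidate password consists solely of upper-case
# letters, lower-case letters, digits, and the characters ~ : @ % ^ + = _ ,
# Hypothetical helper; not part of nv-disk-encrypt.
is_valid_sed_password() {
    # Assumption: an empty password is treated as invalid.
    [ -n "$1" ] || return 1
    # Delete every permitted character and compare what is left.
    # The trailing "." is a sentinel so that a trailing newline in the
    # input is not silently dropped by command substitution.
    [ "$(printf '%s.' "$1" | LC_ALL=C tr -d 'A-Za-z0-9~:@%^+=_,')" = "." ]
}
```

For example, a wrapper script could refuse to continue when the check fails.

    $ is_valid_sed_password "$candidate" || echo "password contains unsupported characters"
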
Make sure that the drive firmware is up to date. Refer to the DGX A100 firmware release notes for information on the latest firmware for the U.2 NVMe drive.
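
To compare a drive against the release notes, the running firmware revision can be read from the fr field that nvme id-ctrl prints (the same revision also appears in the FW Rev column of nvme list). The following is a minimal sketch. The helper name extract_firmware_rev is hypothetical, and the device path /dev/nvme2 in the usage line is a placeholder for whichever controller is being checked.

```shell
# Print the firmware revision from `nvme id-ctrl` text output on stdin.
# Assumption: the revision is reported on a line of the form
#   fr        : <revision>
extract_firmware_rev() {
    # Match only the "fr" field (not "frmw"), then strip the label
    # and any trailing whitespace before printing.
    sed -n '/^fr[[:space:]]*:/{s/^fr[[:space:]]*:[[:space:]]*//;s/[[:space:]]*$//;p;}'
}
```

With the function defined in the current shell, it might be used as follows.

    $ sudo nvme id-ctrl /dev/nvme2 | extract_firmware_rev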

Returning the NVMe Drive

Use the packaging from the new drive and follow the instructions that came with the package to ship the old drive back to NVIDIA Enterprise Support.

Note

If your organization has purchased a media retention policy, you may be able to keep failed drives for destruction. Contact NVIDIA Enterprise Support to confirm the status and terms of your policy.