Known Limitations

This section lists known limitations and other issues that will not be fixed.

Unable to Boot from Degraded RAID 1 Array

Issue

After the second partition of the OS RAID 1 array is deleted, which puts the array into degraded mode, the system cannot boot.

Explanation and Workaround

This occurs with Red Hat Enterprise Linux 7 or CentOS 7. The OS boots into emergency mode.

To recover manually, run the following commands while in emergency mode to enter maintenance mode.

mdadm --run /dev/md0
exit

While in maintenance mode, recover by replacing the lost RAID partition.

mdadm /dev/md0 --add /dev/nvme1n1p2
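
After the partition is added back, it can help to confirm that the array is no longer degraded. The following is a minimal sketch, assuming the usual /proc/mdstat layout in which each array's status line ends with a member field such as [UU] (all members up) or [U_] (one member missing). The helper name md_is_degraded is hypothetical and is not part of mdadm.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if the named md array shows a missing member
# in its /proc/mdstat status field, for example "[U_]" instead of "[UU]".
# md_is_degraded is a hypothetical helper name, not an mdadm command.
md_is_degraded() {
    array="$1"                    # array name as shown in mdstat, e.g. md0
    mdstat="${2:-/proc/mdstat}"   # optional alternate file, useful for testing
    awk -v a="$array" '
        # Every array header starts with "md"; track whether it is ours.
        /^md/ { found = ($1 == a); next }
        # The first status line after our header carries the member field.
        found && /\[[U_]+\]/ {
            if ($0 ~ /\[[U_]*_[U_]*\]/) degraded = 1
            exit
        }
        END { exit (degraded ? 0 : 1) }
    ' "$mdstat"
}
```

For example, `md_is_degraded md0 && echo "md0 is still degraded"` prints the message only while a member is missing. An inactive array has no member field, so this check reports it as not degraded. For a fuller report, `mdadm --detail /dev/md0` shows the array state and each member device.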

NGC Containers Might Not Run

Issue

NGC containers might not run unless you do one of the following:

  • use the --privileged argument, or
  • disable SELinux

Explanation and Workaround

NVIDIA devices are sometimes not labeled correctly after boot. To work around this, run the following command before running the NGC container.

$ sudo restorecon /dev/nvidia*
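
If the workaround is run from a startup script, it may be convenient to apply it only when SELinux is actually enforcing. The following is a minimal sketch under that assumption. The helper name relabel_nvidia_devices is hypothetical; getenforce and restorecon are the standard SELinux tools, and the -v option makes restorecon print each label it changes.

```shell
#!/bin/sh
# Sketch: restore SELinux labels on the NVIDIA device nodes, but only when
# SELinux is in enforcing mode. relabel_nvidia_devices is a hypothetical
# helper name used for illustration.
relabel_nvidia_devices() {
    # Without the SELinux userland tools there is nothing to check.
    if ! command -v getenforce >/dev/null 2>&1; then
        echo "getenforce not found; skipping"
        return 0
    fi
    mode=$(getenforce)
    if [ "$mode" != "Enforcing" ]; then
        echo "SELinux mode is $mode; skipping"
        return 0
    fi
    # Same command as the workaround above, with -v to report changes.
    sudo restorecon -v /dev/nvidia*
}
```

Call `relabel_nvidia_devices` before starting the NGC container.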

DGX-2, DGX Station: Ubuntu Boot Option Appears After Installing Red Hat Enterprise Linux

Issue

After installing Red Hat Enterprise Linux 7.6 and rebooting, the Ubuntu boot option still appears in the boot menu.

Explanation and Workaround

After installing Red Hat Enterprise Linux, the OS leaves entries from the previous DGX OS in the EFI boot table. These entries have no effect on the system other than potentially causing confusion. You can manually remove the entries as follows.

  1. Obtain a list of all the entries in the boot table.
    efibootmgr list
  2. To remove an entry, run the following.
    sudo efibootmgr -b <xxxx> -B

    Where <xxxx> is the boot entry number.

    Example: To remove the following boot entry

    Boot000A* ubuntu    HD(1,GPT,ae7ba5cb-d73f-43af-ae8c-96d8579d7299,0x800,0x100000)/File(\EFI\UBUNTU\GRUBX64.EFI)..BO

    run the following.

    sudo efibootmgr -b 000A -B
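
When the boot table is long, picking out the leftover entries by eye is error-prone. The following is a minimal sketch that reads efibootmgr output on standard input and prints the boot numbers of entries whose label begins with a given string. It assumes the entry format shown in the example above (BootXXXX, an optional asterisk, then the label). The helper name boot_numbers_for_label is hypothetical and is not part of efibootmgr.

```shell
#!/bin/sh
# Sketch: print the four-digit boot numbers of EFI boot entries whose label
# starts with the given string (case-insensitive). Reads efibootmgr-style
# output on stdin. boot_numbers_for_label is a hypothetical helper name.
boot_numbers_for_label() {
    label="$1"
    awk -v want="$label" '
        # Entry lines look like "Boot000A* ubuntu ..."; lines such as
        # BootCurrent and BootOrder do not have four hex digits after "Boot".
        /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]/ {
            num = substr($1, 5, 4)
            rest = $0
            sub(/^[^ \t]+[ \t]+/, "", rest)   # drop the BootXXXX field
            if (index(tolower(rest), tolower(want)) == 1) print num
        }
    '
}
```

For example, `efibootmgr | boot_numbers_for_label ubuntu` lists the candidate numbers. The sketch only prints them; review the list before passing each number to `sudo efibootmgr -b <xxxx> -B`.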

DGX-1: NVSM Storage Alerts are Cleared After Removing All Four RAID 0 Data Drives

Issue

When data drives are removed, NVSM raises several alerts, including a controller alert. However, after the last drive is removed, the controller alert is cleared.

Status

This is not a typical or likely use case.