Known Limitations
This section lists known limitations and other issues that will not be fixed.
Issue
After the second partition of the OS RAID 1 array is deleted, which puts the array into degraded mode, the system cannot boot.
Explanation and Workaround
This occurs with Red Hat Enterprise Linux 7 or CentOS 7. The OS boots into emergency mode.
To recover manually, run the following commands while in emergency mode to enter maintenance mode.
mdadm --run /dev/md0
exit
While in maintenance mode, recover by replacing the lost RAID partition.
mdadm /dev/md0 --add /dev/nvme1n1p2
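To confirm the array state before and after re-adding the partition, a check along the following lines could be used. This is a minimal sketch, not part of the documented procedure: md_is_degraded is a hypothetical helper name, and it assumes the usual /proc/mdstat layout, in which a missing member appears as an underscore in the member-status field (for example [U_]).

```shell
# Hypothetical helper (not part of mdadm): returns 0 if the mdstat-format
# text in file $1 (default: /proc/mdstat) shows a degraded array, that is,
# a member-status field containing an underscore such as [U_] or [_U].
md_is_degraded() {
    grep -Eq '\[[U_]*_[U_]*\]' "${1:-/proc/mdstat}"
}
```

For example, running md_is_degraded && echo "array still degraded" reads /proc/mdstat by default. For a fuller report while the array rebuilds, mdadm --detail /dev/md0 can be used.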
Issue
NGC containers might not run without either using the --privileged argument or disabling SELinux.
Explanation and Workaround
NVIDIA device nodes are sometimes not given the correct SELinux label after boot. To work around this issue, run the following command before running the NGC container.
sudo restorecon /dev/nvidia*
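If the relabeling step is scripted (for example, to run once after each boot), it may help to guard against the case where no NVIDIA device nodes exist yet. The following is a sketch under stated assumptions: relabel_nvidia_devices is a hypothetical wrapper name, and the optional directory argument exists only so the function can be exercised outside /dev.

```shell
# Hypothetical wrapper: relabel NVIDIA device nodes under a directory
# (default: /dev). Returns 1 without calling restorecon when no nvidia*
# nodes are present.
relabel_nvidia_devices() {
    dev_dir="${1:-/dev}"
    # Expand the glob into the positional parameters; if nothing matches,
    # the pattern stays literal (or expands to nothing) and the -e test fails.
    set -- "$dev_dir"/nvidia*
    if [ ! -e "$1" ]; then
        echo "no NVIDIA device nodes found in $dev_dir" >&2
        return 1
    fi
    # -v prints each path whose label was changed.
    sudo restorecon -v "$@"
}
```

To inspect the labels before and after relabeling, ls -Z /dev/nvidia* shows the SELinux context of each device node.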
Issue
When data drives are removed, NVSM raises several alerts, including a controller alert. After the last drive is removed, however, the controller alert is cleared.
Status
This is not a typical or likely use case.
Issue
Docker 1.13.1, the version provided with RHEL 7, does not support Deep Learning Framework (DLFW) containers 23.05 or newer.
Explanation and Workaround
These containers use the clone3 syscall, which this version of Docker does not support. Use DLFW containers 23.04 or older. Alternatively, install docker-ce 20.10.14 or newer by following the instructions here.
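To check whether an installed Docker version meets the 20.10.14 minimum stated above, a comparison like the following could be used. This is a minimal sketch: version_ge and docker_meets_minimum are hypothetical helper names, and the comparison relies on the -V (version sort) option of GNU sort.

```shell
# Hypothetical helper: returns 0 if version $1 is greater than or equal to
# version $2, using GNU sort's version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: returns 0 if the given Docker version is at least
# 20.10.14, the minimum docker-ce version named in the workaround.
docker_meets_minimum() {
    version_ge "$1" "20.10.14"
}
```

For example, docker_meets_minimum "$(docker version --format '{{.Server.Version}}')" || echo "use DLFW containers 23.04 or older" reports when the running Docker daemon is older than the stated minimum.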