Known Limitations

This section lists known limitations and other issues that will not be fixed.

Issue

After the second partition of the OS RAID 1 array is deleted, which puts the array into degraded mode, the system fails to boot.

Explanation and Workaround

This occurs with Red Hat Enterprise Linux 7 or CentOS 7. The OS boots into emergency mode.

To recover manually, run the following commands while in emergency mode to start the degraded array and enter maintenance mode.

mdadm --run /dev/md0
exit

While in maintenance mode, restore the array by adding the missing RAID partition back.

mdadm /dev/md0 --add /dev/nvme1n1p2
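To confirm whether the array is still degraded before or after adding the partition back, the kernel's `/proc/mdstat` status can be inspected. The following is a minimal sketch, not part of the official procedure: the helper name `md_is_degraded` is hypothetical, and it assumes the usual `/proc/mdstat` layout in which a missing member appears as `_` in the status field (for example `[U_]`).

```shell
# md_is_degraded ARRAY [MDSTAT_FILE]
# Hypothetical helper: succeeds (exit 0) when ARRAY shows a missing member
# ("_" in its [UU]-style status field), fails otherwise.
md_is_degraded() {
    local array="$1" mdstat="${2:-/proc/mdstat}"
    awk -v name="$array" '
        $1 == name { in_array = 1; next }          # first line of the stanza
        in_array && /^[^ \t]/ { in_array = 0 }     # a new stanza has started
        in_array && match($0, /\[[U_]+\]/) {       # status field, e.g. [U_]
            status = substr($0, RSTART, RLENGTH)
            found = 1
            exit
        }
        END { exit !(found && status ~ /_/) }
    ' "$mdstat"
}

# Report the state of md0, the array used in the steps above.
if [ -r /proc/mdstat ]; then
    if md_is_degraded md0; then
        echo "md0 is degraded: a member partition is missing"
    else
        echo "md0 is not reported as degraded"
    fi
fi
```

While the array resynchronizes, `cat /proc/mdstat` or `mdadm --detail /dev/md0` shows the rebuild progress.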

Issue

NGC containers might not run unless you either

  • use the --privileged argument, or

  • disable SELinux

Explanation and Workaround

The NVIDIA device nodes are sometimes not labeled with the correct SELinux context after boot. To work around this, issue the following before running the NGC container.

sudo restorecon /dev/nvidia*
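To see whether any device node is actually mislabeled before changing anything, `restorecon` can be run as a dry run. This is a sketch under assumptions: the function name `check_nvidia_labels` and its exit codes are hypothetical, and it relies on `restorecon -n -v` listing the files whose labels would be changed without modifying them.

```shell
# check_nvidia_labels PATH...
# Hypothetical helper. Exit codes (chosen for this sketch only):
#   0 = labels already match policy, 1 = relabel needed, 2 = could not check.
check_nvidia_labels() {
    if [ "$#" -eq 0 ] || [ ! -e "$1" ]; then
        echo "no matching device nodes found"
        return 2
    fi
    if ! command -v restorecon >/dev/null 2>&1; then
        echo "restorecon is not available on this system"
        return 2
    fi
    local pending
    # -n: do not change any labels; -v: print files whose label would change
    pending="$(restorecon -n -v "$@" 2>/dev/null)"
    if [ -n "$pending" ]; then
        echo "device nodes needing a relabel:"
        echo "$pending"
        return 1
    fi
    echo "device labels already match the SELinux policy"
}

# The glob stays literal when no NVIDIA device nodes exist,
# which the function reports as "no matching device nodes found".
status=0
check_nvidia_labels /dev/nvidia* || status=$?
echo "check finished with status $status"
```

`ls -Z /dev/nvidia*` shows the current contexts, and `getenforce` reports whether SELinux is enforcing.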

Issue

When data drives are removed, NVSM raises several alerts, including a controller alert. After the last drive is removed, however, the controller alert is cleared.

Status

This is not a typical or likely use case.

Issue

The version of Docker provided in RHEL 7, 1.13.1, does not support Deep Learning Framework (DLFW) containers 23.05 or newer.

Explanation and Workaround

These containers use the clone3 syscall, which this version of Docker does not support. Use DLFW containers 23.04 or older instead. Alternatively, install docker-ce 20.10.14 or newer by following the instructions here.
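A quick way to tell which case applies is to compare the installed Docker version against the 20.10.14 minimum. The sketch below is illustrative only: `version_ge` is a hypothetical helper, and it assumes GNU `sort -V` is available for version ordering.

```shell
# version_ge A B: succeeds when version A is equal to or newer than B.
# Hypothetical helper; relies on GNU sort -V (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="20.10.14"   # minimum docker-ce version named in the workaround

if command -v docker >/dev/null 2>&1; then
    # Stays empty when the daemon cannot be reached.
    installed="$(docker version --format '{{.Server.Version}}' 2>/dev/null)"
    if [ -z "$installed" ]; then
        echo "could not query the Docker daemon version"
    elif version_ge "$installed" "$required"; then
        echo "Docker $installed meets the $required minimum"
    else
        echo "Docker $installed is older than $required:" \
             "use DLFW containers 23.04 or older, or install docker-ce"
    fi
else
    echo "docker not found in PATH"
fi
```

For the RHEL 7 package, `version_ge 1.13.1 20.10.14` fails, so the script takes the last branch.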

© Copyright 2022-2023, NVIDIA. Last updated on Jun 27, 2023.