Known Issues#

This section provides summaries of the issues in DGX OS 5.

Known Issue Overview#

Known Issues for DGX Server:

Errors Occur When Loading Mirrored Repositories on Air-Gapped Systems
DGX A800 Station/Server: DCGM Diagnostics may return Skip - All
DGX A800 Station/Server: mig-parted config
Regression of CUDA application startup performance
NVSM Stress Test Logs Do Not Contain Summary Information
nvidia-release-upgrade May Report That Not All Updates Have Been Installed and Exit
Duplicate EFI Variable May Cause efibootmgr to Fail
Erroneous Insufficient Power Error May Occur for PCIe Slots
AMD Crypto Coprocessor is not Supported
nvsm show alerts Reports NVSwitch PCIe Link Width is Degraded
nvsm show health Reports Firmware as Not Authenticated
Running NGC Containers Older than 20.10 May Produce “Incompatible MOFED Driver” Message
System May Slow Down When Using mpirun
Forced Reboot Hangs the OS
Applications that call the cuCtxCreate API Might Experience a Performance Drop

Known Issues for DGX Station:

Applications that call the cuCtxCreate API Might Experience a Performance Drop
NVIDIA Desktop Shortcuts Not Updated After a DGX OS Release Upgrade

Known Issues for DGX Station A100:

DGX A800 Station/Server: DCGM Diagnostics may return Skip - All
Unable to Set a Separate/xinerama Mode through the xorg.conf File or through nvidia-settings

Known Limitations (Issues that will not be fixed):

Virtualization Not Supported
No RAID Partition Created After ISO Install
System Services Startup Messages Appear Upon Completion of First-Boot Setup
[DGX A100]: Hot-plugging of Storage Drives not Supported
[DGX A100]: Syslog Contains Numerous “SM LID is 0, maybe no SM is running” Error Messages
[DGX-2]: Serial Over LAN Does not Work After Cold Resetting the BMC
[DGX-2]: Some BMC Dashboard Quick Links Appear Erroneously
[DGX-2]: Applications Cannot be Run Immediately Upon Powering on the DGX-2
[DGX-1]: Script Cannot Recreate RAID Array After Re-inserting a Known Good SSD
[DGX Station A100] Suspend and Power Button Section Appears in Power Settings
[DGX-2] NVSM Does not Detect Downgraded GPU PCIe Link

Resolved Issues:

[DGX A100] A System with Encrypted rootfs May Fail to Boot if one of the M.2 drives is Corrupted
NVSM Fails to Show CPU Information on Non-English Locales
Driver Version Mismatch Reported
[All DGX systems]: When starting the DCGM service, a version mismatch error message similar to the following will appear: [78075.772392] nvidia-nvswitch: Version mismatch, kernel version 450.80.02 user version 450.51.06
[All DGX systems]: When issuing nvsm show health, the nvsmhealth_log.txt log file reports that the /proc/driver/ folders are empty.
[DGX A100]: The Mellanox software that is included in the DGX OS installed on DGX A100 system does not automatically update the Mellanox firmware as needed when the Mellanox driver is installed.
[DGX A100]: nvsm stress-test does not stress the system if MIG is enabled. Reported in 4.99.10.
[DGX A100]: With eight U.2 NVMe drives installed, the nvsm-plugin-pcie service reports “ERROR: Device not found in mapping table” for the additional four drives (for example, in response to systemctl status nvsm*). Reported in 4.99.11.
[DGX A100]: When starting the Fabric Manager service, the following error is reported: detected NVSwitch non-fatal error 10003 on NVSwitch pci. Reported in 4.99.9.

Known Issues Details#

This section provides details for known issues in DGX OS 5.x.

Virtualization Not Supported#

Issue#

Virtualization technology, such as ESXi hypervisors or kernel-based virtual machines (KVM), is not an intended use case on DGX systems and has not been tested.

DGX A800 Station/Server: DCGM Diagnostics may return Skip - All#

Issue#

DCGM Diagnostics dcgmi diag may return a “Skip - All” error message for some tests.

Explanation#

DCGM 2.4 does not identify the A800 device IDs by default.

Workaround#

To continue using DCGM Diagnostics:

Create a file called a800-sxm4-diag.yaml with the following command:

```
cat << EOF > a800-sxm4-diag.yaml
version: "@CMAKE_PROJECT_VERSION@"
spec: dcgm-diag-v1
skus:
  - name: A800-SXM4-80GB
    id: 20f3
    targeted_power:
      is_allowed: true
      starting_matrix_dim: 1024
      target_power: 399.0
      use_dgemm: false
    targeted_stress:
      is_allowed: true
      use_dgemm: false
    sm_stress:
      is_allowed: true
      # dcgmproftester -t 1007 measures ~18600. Multiply by .75 to get ~13950
      target_stress: 13950.0
      use_dgemm: false
    pcie:
      is_allowed: true
      h2d_d2h_single_pinned:
        min_pci_generation: 3.0
        min_pci_width: 8.0
      h2d_d2h_single_unpinned:
        min_pci_generation: 3.0
        min_pci_width: 8.0
    memory:
      is_allowed: true
      l1cache_size_kb_per_sm: 192.0
    diagnostic:
      is_allowed: true
    memory_bandwidth:
      is_allowed: true
      # dcgmproftester -t 1005 shows ~1566000. Multiply by .75 to get ~1175000
      minimum_bandwidth: 971000.0
    pulse_test:
      is_allowed: false
EOF
```

Then, when you run dcgmi diag, pass the configuration file you created in the previous step with the -c option. For example:
```
dcgmi diag -r 2 -c a800-sxm4-diag.yaml
```

Note

The Pulse test is not supported in this release and thus will continue to be skipped.
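If you run the diagnostic often, a small wrapper can catch a missing or mistyped SKU file before you start a long run. The sketch below is hypothetical. The run_a800_diag function and the DRY_RUN switch are illustrative names, and the check only looks for the `id: 20f3` line. It does not validate the whole YAML file.

```shell
# Hypothetical helper: print a command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Hypothetical wrapper around the documented command:
#   dcgmi diag -r 2 -c a800-sxm4-diag.yaml
# Refuses to start if the SKU file is missing or lacks the A800 "id: 20f3" entry.
run_a800_diag() {
  local cfg="${1:-a800-sxm4-diag.yaml}"
  if ! grep -Eq '^[[:space:]]*id:[[:space:]]*20f3[[:space:]]*$' "$cfg" 2>/dev/null; then
    echo "error: $cfg is missing or does not define id: 20f3" >&2
    return 1
  fi
  run dcgmi diag -r 2 -c "$cfg"
}
```

For example, `DRY_RUN=1 run_a800_diag` prints the dcgmi command it would run without starting the diagnostic.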

DGX A800 Station/Server: mig-parted config#

Issue#

DGX Station A800 is not currently supported in the all-balanced configuration of the default mig-parted config file.

Workaround#

To add the A800 device ID to the all-balanced configuration:

Make a copy of the default configuration.
Add device ID 0x20F310DE to the device-filter of the all-balanced config.
Point mig-parted apply at this new file when selecting a config.
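The A800 device ID appears in two forms in these workarounds: `20f3` in the DCGM SKU file and `0x20F310DE` in the mig-parted device-filter. The second form is the 16-bit PCI device ID followed by the 16-bit NVIDIA vendor ID (0x10DE). This is also how `nvidia-smi --query-gpu=pci.device_id` reports it. The sketch below can help confirm which GPUs need the modified config. It is not part of mig-parted or DCGM: the helper names are hypothetical, and so is the file name in the final comment.

```shell
# Hypothetical helpers (not part of mig-parted or DCGM).
# nvidia-smi's pci.device_id packs the 16-bit PCI device ID above the 16-bit
# NVIDIA vendor ID 0x10DE, so an A800 SXM4 reports 0x20F310DE.

# Print the lowercase 4-digit device ID, the form used by "id:" in the DCGM SKU file.
dcgm_sku_id() {
  local hex="${1#0x}"
  hex="${hex#0X}"
  printf '%s\n' "${hex:0:4}" | tr 'A-F' 'a-f'
}

# Succeed if a combined ID (in any letter case) equals the A800 device-filter ID.
is_a800() {
  local upper
  upper=$(printf '%s' "$1" | tr 'a-fx' 'A-FX')
  [ "$upper" = "0X20F310DE" ]
}

# On a DGX system, list each GPU's combined ID (skipped if nvidia-smi is absent).
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=index,pci.device_id --format=csv,noheader |
    while IFS=', ' read -r idx id; do
      if is_a800 "$id"; then
        echo "GPU $idx: $id (A800, needs the modified all-balanced filter)"
      else
        echo "GPU $idx: $id"
      fi
    done
fi

# With the edited copy saved (file name hypothetical), apply it, for example:
#   sudo nvidia-mig-parted apply -f ./config-a800.yaml -c all-balanced
```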

Regression of CUDA application startup performance#

Issue#

Reported in release 5.4.0. CUDA applications may take longer to load CUDA binaries.

Explanation#

The CUBIN/FATBINARY loading execution time may increase by up to about 15% with certain drivers and CUDA versions (observed with R510 and CUDA 11.6). This affects all CUDA module loading APIs (cuModuleLoad*), as well as CUDA modules loaded through the CUDA Runtime (CUDART). Once the modules are loaded, the issue is not expected to affect the application.

NVSM Stress Test Logs Do Not Contain Summary Information#

Issue#

When you run an NVSM stress test, the log does not include the test summary.

Explanation#

This issue is currently under investigation.

nvidia-release-upgrade May Report That Not All Updates Have Been Installed and Exit#

Issue#

When running the nvidia-release-upgrade command on systems running DGX OS 4.99.x, it may exit and tell users: “Please install all available updates for your release before upgrading” even though all upgrades have been installed.

Explanation#

To recover, issue the following command:

sudo apt install -y nvidia-fabricmanager-450/bionic-updates --allow-downgrades

After running the command, proceed with the regular upgrade steps:

sudo apt update
sudo apt full-upgrade -y
sudo apt install -y nvidia-release-upgrade
sudo nvidia-release-upgrade

Duplicate EFI Variable May Cause efibootmgr to Fail#

Issue#

Reported in release 5.1.0.

On some DGX-2 systems, the ‘efibootmgr’ command may fail with the following signature:

sudo efibootmgr

No BootOrder is set; firmware will attempt recovery

Explanation#

This happens when the SBIOS presents duplicate EFI variables. Because of this, efivarfs will not be fully populated which may ultimately cause efibootmgr to fail.

To work around:

Flash the BIOS with the latest SBIOS revision using the BMC. Refer to: Updating the SBIOS from the BMC Dashboard for instructions.

Warning

Do not power cycle the system after clicking Cancel at the Firmware update completed dialog.
From the command line, issue the following command to read the “Restore PLDM Flag”.
```
sudo ipmitool raw 0x03 0x0D
```
This flag is cleared after reading, meaning that the system will not restore the PLDM table after the subsequent power cycle.
Power-cycle the system.

Erroneous Insufficient Power Error May Occur for PCIe Slots#

Issue#

Reported in release 4.99.9.

The DGX A100 server reports “Insufficient power” on PCIe slots when network cables are connected.

Explanation#

This may occur with optical cables. It indicates that the calculated power draw of the card plus two optical cables exceeds what the PCIe slot can provide.

The message can be ignored.

AMD Crypto Coprocessor is not Supported#

Issue#

Reported in release 4.99.9.

The DGX A100 currently does not support the AMD Cryptographic Coprocessor (CCP). When booting the system, you may see the following error message in the syslog:

ccp initialization failed

Explanation#

Even if the message does not appear, CCP is still not supported. The SBIOS makes zero CCP queues available to the driver, so CCP cannot be activated.

nvsm show alerts Reports NVSwitch PCIe Link Width is Degraded#

Issue#

Reported in release 4.99.10.

NVSM raises alerts of Severity=Warning against PCIe links between NVSwitch and the Draco switch. The alert states “PCIe link width degraded” - the PCIe link width is expected to be x4 while the actual link width is x2.

There are six pairs of the PCIe links, so NVSM raises six such alerts in this condition.

Explanation#

The Broadcom firmware for the synthetic switch advertises that the Draco switch has a PCIe link width capability of x4. This synthesized information does not reflect the actual hardware capability, which is x2. NVSM raises alerts based on this incorrect information.

This issue will be resolved with updated firmware to be provided in the DGX A100 Firmware Update Container after version 20.05.12.3. See the DGX A100 Firmware Update Container for the latest firmware status.

nvsm show health Reports Firmware as Not Authenticated#

Issue#

Reported in release 5.0.

When issuing nvsm show health, the output shows CEC firmware components as Not Authenticated, even when they have passed authentication.

Example:

CEC:
    CEC Version: 3.5
    EC_FW_TAG0: Not Authenticated
    EC_FW_TAG1: Not Authenticated
    BMC FW authentication state: Not Authenticated

Explanation#

The message can be ignored and does not affect the overall nvsm health output status.

Running NGC Containers Older than 20.10 May Produce “Incompatible MOFED Driver” Message#

Issue#

Reported in release 5.0.

DGX OS 5.0 incorporates Mellanox OFED 5.1 for high-performance multi-node connectivity. Support for this version of OFED was added in NGC containers 20.10. When you run earlier container versions (or containers derived from them), a message similar to the following may appear:

ERROR: Detected MOFED driver 5.1-2.4.6, but this container has version 4.6-1.0.1. Unable to automatically upgrade this container. Multi-node communication may be unreliable or may result in crashes with this version. This incompatibility will be resolved in an upcoming release.

Explanation#

For applications that rely on OFED (typically those used in multi-node jobs), this is an indication that an update to NGC containers 20.10 or greater is required. For most other applications, this error can be ignored.

Some applications may return an error such as the following when running with NCCL debug messages enabled (export NCCL_DEBUG=WARN):

misc/ibvwrap.cc:284 NCCL WARN Call to ibv_modify_qp failed with error No such device … common.cu:777 'unhandled system error'

This may occur even for single-node training jobs. To work around this, issue the following:

export NCCL_IB_DISABLE=1
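NCCL_IB_DISABLE=1 stops NCCL from using its InfiniBand transport, so jobs that need inter-node bandwidth fall back to slower transports. You may prefer to scope the setting to one job instead of exporting it for the whole shell. The sketch below does this. The function name is hypothetical, and train.py stands in for your own single-node workload.

```shell
# Hypothetical helper: run a single command with NCCL's InfiniBand transport
# disabled and NCCL warnings printed, leaving the calling shell unchanged.
run_without_nccl_ib() {
  NCCL_IB_DISABLE=1 NCCL_DEBUG=WARN "$@"
}

# Example (train.py is a placeholder for your own single-node job):
#   run_without_nccl_ib python train.py
```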

System May Slow Down When Using mpirun#

Issue#

Customers running Message Passing Interface (MPI) workloads may find that the OS becomes very slow to respond. When this occurs, a message similar to the following appears in the kernel log:

kernel BUG at /build/linux-fQ94TU/linux-4.4.0/fs/ext4/inode.c:1899!

Explanation#

Due to the current design of the Linux kernel, the condition may be triggered when get_user_pages is used on a file that is on persistent storage. For example, this can happen when cudaHostRegister is used on a file path that is stored in an ext4 filesystem. DGX systems implement /tmp on a persistent ext4 filesystem.

To avoid using persistent storage, MPI can be configured to use shared memory at /dev/shm, which is a temporary in-memory filesystem.

If you are using Open MPI, you can resolve the issue by configuring the Modular Component Architecture (MCA) parameters so that mpirun uses this in-memory temporary filesystem.

For details on how to accomplish this, see the Knowledge Base article DGX System Slows (requires login to the NVIDIA Enterprise Support portal).

Note: If you performed this workaround on a previous DGX OS software version, you do not need to do it again after updating to the latest DGX OS version.
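The KB article is the authoritative procedure. The following is a hedged sketch of what such an MCA setting can look like. It assumes Open MPI 4.x or earlier, where the orte_tmpdir_base parameter sets the base directory for mpirun's session (temporary) files. The sketch points those files at /dev/shm instead of the ext4-backed /tmp.

```shell
# Sketch, assuming Open MPI 4.x or earlier, where orte_tmpdir_base sets the
# base directory for mpirun's session (temporary) files. Verify against the
# KB article before relying on it.

# Per shell session: an OMPI_MCA_<name> variable sets MCA parameter <name>.
export OMPI_MCA_orte_tmpdir_base=/dev/shm

# Per job, on the command line (./app is a placeholder):
#   mpirun --mca orte_tmpdir_base /dev/shm -np 8 ./app

# System-wide (path assumed for Ubuntu's Open MPI package):
#   echo "orte_tmpdir_base = /dev/shm" | sudo tee -a /etc/openmpi/openmpi-mca-params.conf

# Check that /dev/shm is the in-memory filesystem (GNU stat prints "tmpfs"):
#   stat -f -c %T /dev/shm
```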

Forced Reboot Hangs the OS#

Issue#

When issuing reboot -f (forced reboot), I/O error messages appear on the console and then the system hangs.

The system reboots normally when issuing reboot.

Explanation#

This issue will be resolved in a future version of the DGX OS.

Applications that call the cuCtxCreate API Might Experience a Performance Drop#

Issue#

Reported in release 5.0.

When some applications call cuCtxCreate, cuGLCtxCreate, or cuCtxDestroy, there might be a drop in performance.

Explanation#

This issue occurs with Ubuntu 20.04 but not with previous versions. It affects three kinds of applications. The first perform graphics/compute interoperation. The second have a plugin mechanism for CUDA in which every plugin creates its own context. The third are video streaming applications that require computation. Examples include ffmpeg, Blender, simpleDrvRuntime, and cuSolverSp_LinearSolver.

This issue is not expected to impact deep learning training.

NVIDIA Desktop Shortcuts Not Updated After a DGX OS Release Upgrade#

Issue#

Reported in release 4.0.4.

In DGX OS 4 releases, the NVIDIA desktop shortcuts have been updated to reflect current information about NVIDIA DGX systems and containers for deep learning frameworks. These desktop shortcuts are also organized in a single folder on the desktop. After a DGX OS release upgrade, the NVIDIA desktop shortcuts for existing users are not updated. However, the desktop for a user added after the upgrade will have the current desktop shortcuts in a single folder.

Explanation#

If you want quick access to current information about NVIDIA DGX systems and containers from your desktop, replace the old desktop shortcuts with the new desktop shortcuts.

Change to your desktop directory.

cd /home/your-user-login-id/Desktop

Remove the existing NVIDIA desktop shortcuts.

rm dgx-container-registry.desktop \
   dgxstation-userguide.desktop \
   dgx-container-registry-userguide.desktop \
   nvidia-customer-support.desktop

Copy the folder that contains the new NVIDIA desktop shortcuts and its contents to your desktop directory.

cp -rf /etc/skel/Desktop/Getting\ Started/ /home/your-user-login-id/Desktop/
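The three steps above can be combined into one function for the current user. The sketch below is hypothetical. Both directories are parameters, so you can try the function on scratch paths first. The defaults assume the skeleton folder shown above and a desktop at $HOME/Desktop.

```shell
# Hypothetical helper that performs the steps above for one user.
refresh_nvidia_shortcuts() {
  local skel="${1:-/etc/skel/Desktop/Getting Started}"
  local desk="${2:-$HOME/Desktop}"
  if [ ! -d "$skel" ]; then
    echo "error: $skel not found" >&2
    return 1
  fi
  mkdir -p "$desk"
  # Remove the old shortcuts listed above; -f ignores any that are absent.
  rm -f "$desk/dgx-container-registry.desktop" \
        "$desk/dgxstation-userguide.desktop" \
        "$desk/dgx-container-registry-userguide.desktop" \
        "$desk/nvidia-customer-support.desktop"
  # Copy the "Getting Started" folder (with its contents) into the desktop directory.
  cp -rf "$skel" "$desk/"
}
```

Other files on the desktop are left untouched; only the four named shortcuts are removed.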

Unable to Set a Separate/xinerama Mode through the xorg.conf File or through nvidia-settings#

Issue#

Reported in release 5.0.2

On the DGX Station A100, when the BIOS option OnBrd/Ext VGA Select is set to Auto or External, the nvidia-conf-xconfig service configures Xorg to use only the Display adapter.

Explanation#

Manually edit the existing /etc/X11/xorg.conf.d/xorg-nvidia.conf file to apply the following changes:

--- xorg-nvidia.conf    2020-12-10 02:42:25.585721167 +0530
+++ /root/working-xinerama-xorg-nvidia.conf    2020-12-10 02:38:05.368218170 +0530
@@ -8,8 +8,10 @@
 Section "ServerLayout"
     Identifier     "Layout0"
     Screen      0  "Screen0"
+    Screen      1  "Screen0 (1)" RightOf "Screen0"
     InputDevice    "Keyboard0" "CoreKeyboard"
     InputDevice    "Mouse0" "CorePointer"
+    Option         "Xinerama" "1"
 EndSection
 
 Section "Files"
@@ -43,6 +45,7 @@
     Driver         "nvidia"
     BusID          "PCI:2:0:0"
     VendorName     "NVIDIA Corporation"
+    Screen          0
 EndSection
 
 Section "Screen"
@@ -51,6 +54,25 @@
     Monitor        "Monitor0"
     DefaultDepth    24
     Option         "AllowEmptyInitialConfiguration" "True"
+    SubSection     "Display"
+        Depth       24
+    EndSubSection
+EndSection
+
+Section "Device"
+    Identifier     "Device0 (1)"
+    Driver         "nvidia"
+    BusID          "PCI:2:0:0"
+    VendorName     "NVIDIA Corporation"
+    Screen          1
+EndSection
+
+Section "Screen"
+    Identifier     "Screen0 (1)"
+    Device         "Device0 (1)"
+    Monitor        "Monitor0"
+    DefaultDepth    24
+    Option         "AllowEmptyInitialConfiguration" "True"
     SubSection     "Display"
         Depth       24
     EndSubSection
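Instead of editing the file by hand, you can save the diff above to a file and apply it with patch after backing up the original. The sketch below is hypothetical: the function name, the DRY_RUN switch, and the xinerama.diff file name are illustrative. It runs patch with --dry-run first, so the file is not modified if the hunks do not match your copy.

```shell
# Hypothetical helper: print a command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Back up the Xorg file, check that the diff applies cleanly, then apply it.
# Passing the target file explicitly makes patch ignore the ---/+++ file names.
apply_xinerama_diff() {
  local conf="${1:-/etc/X11/xorg.conf.d/xorg-nvidia.conf}"
  local diff="${2:-xinerama.diff}"   # hypothetical name for the saved diff
  run sudo cp "$conf" "$conf.bak" &&
    run sudo patch --dry-run "$conf" "$diff" &&
    run sudo patch "$conf" "$diff"
}
```

The new layout takes effect the next time the X server starts. To undo the change, restore the .bak copy.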

Known Limitations Details#

This section lists details for known limitations and other issues that will not be fixed.

No RAID Partition Created After ISO Install#

Issue#

After using the DGX OS ISO to install the DGX OS, there is no /raid partition created.

Explanation#

This occurs if you reboot the system right after the installation is completed. To create the data RAID, the DGX OS installer sets up a systemd service to create the /raid partition on first boot. If you reboot before you give that service a chance to finish, the /raid partition may not be properly set up.

To create the /raid partition, issue the following.

sudo configure_raid_array.py -c -f
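To avoid recreating an array that is already in place, you can first check whether /raid is mounted. The sketch below is hypothetical. The function name and DRY_RUN switch are illustrative, and it uses the util-linux mountpoint command for the check.

```shell
# Hypothetical helper: print a command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# If the RAID mount point is missing (for example, because the system was
# rebooted before the first-boot service finished), recreate the data RAID.
ensure_raid() {
  local mp="${1:-/raid}"
  if mountpoint -q "$mp"; then
    echo "$mp is already mounted"
  else
    run sudo configure_raid_array.py -c -f
  fi
}
```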

System Services Startup Messages Appear Upon Completion of First-Boot Setup#

Issue#

After completing the first-boot setup process and getting to the login prompt, system services startup messages appear.

Explanation#

Some services cannot be started until after the initial configuration process is completed. Starting the services at the Ubuntu prompt avoids the need for an additional reboot to complete the setup process.

Once completed, the service messages do not appear at subsequent system reboots.

[DGX A100]: Hot-plugging of Storage Drives not Supported#

Issue#

Hot-plugging or hot-swapping one of the storage drives might result in system instability or incorrect device reporting.

Explanation and Workaround#

Turn off the system before removing and replacing any of the storage drives.

[DGX A100]: Syslog Contains Numerous “SM LID is 0, maybe no SM is running” Error Messages#

Issue#

The system log (/var/log/syslog) contains multiple “SM LID is 0, maybe no SM is running” error message entries.

Explanation and Workaround#

This issue is the result of the srp_daemon within the Mellanox driver. The daemon is used to discover and connect to InfiniBand SCSI RDMA Protocol (SRP) targets.

If you are not using RDMA, then disable the srp_daemon as follows.

sudo systemctl disable srp_daemon.service

sudo systemctl disable srptools.service
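The commands above take effect only at the next boot. `systemctl disable --now` also stops the running services immediately. The sketch below wraps this in a function. The function name and DRY_RUN switch are hypothetical.

```shell
# Hypothetical helper: print a command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Disable both SRP services at boot and stop them now.
disable_srp_services() {
  run sudo systemctl disable --now srp_daemon.service srptools.service
}
```

If you later need SRP targets, `sudo systemctl enable --now srp_daemon.service srptools.service` reverses the change.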

[DGX-2]: Serial Over LAN Does not Work After Cold Resetting the BMC#

Issue#

After performing a cold reset on the BMC (ipmitool mc reset cold) while serial over LAN (SOL) is active, you cannot restart the SOL session.

Explanation and Workaround#

To reactivate SOL, either:

Reboot the system, or
Kill and then restart the process as follows:

Identify the Process ID of the SOL TTY process by running the following.

ps -ef | grep -F "/sbin/agetty -o -p -- \u --keep-baud 115200,38400,9600 ttyS0 vt220"

Kill the process.
```
kill <PID>
```
where <PID> is the Process ID returned by the previous command.
Either wait for the cron job to respawn the process or manually restart the process by running:
```
/sbin/agetty -o '-p -- \u' --keep-baud 115200,38400,9600 ttyS0 vt220
```
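The ps | grep step can also be done with pgrep, which never reports its own process. The sketch below is hypothetical: kill_sol_getty and DRY_RUN are illustrative names. The [a] bracket in the default pattern stops it from matching any shell whose command line contains the pattern text itself.

```shell
# Hypothetical helper: print a command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Find the SOL getty on ttyS0 by its full command line and kill it.
kill_sol_getty() {
  local pattern="${1:-[a]getty.*ttyS0}"
  local pids
  pids=$(pgrep -f "$pattern") || { echo "no process matches: $pattern" >&2; return 1; }
  # $pids is intentionally unquoted so several PIDs become separate arguments.
  run sudo kill $pids
}

# Afterwards, wait for the cron job to respawn the getty, or start it manually:
#   /sbin/agetty -o '-p -- \u' --keep-baud 115200,38400,9600 ttyS0 vt220
```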

[DGX-2]: Some BMC Dashboard Quick Links Appear Erroneously#

Issue#

On the BMC dashboard, the following Quick Links appear by mistake and should not be used.

Maintenance->Firmware Update
Settings->NvMeManagement->NvMe P3700Vpd Info

[DGX-2]: Applications Cannot be Run Immediately Upon Powering on the DGX-2#

Issue#

When attempting to run an application that uses the GPUs immediately upon powering on the DGX-2 system, you may encounter the following error.

CUDA_ERROR_SYSTEM_NOT_READY

Explanation and Workaround#

The DGX-2 uses a fabric manager service to manage communication between all the GPUs in the system. When the DGX-2 system is powered on, the fabric manager initializes all the GPUs. This can take approximately 45 seconds. Until the GPUs are initialized, applications that attempt to use them will fail.

If you encounter the error, wait and launch the application again.
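Instead of relaunching by hand, you can wrap the launch in a retry loop. The sketch below is hypothetical: retry_until_ready and ./my_gpu_app are illustrative names. The loop retries on any non-zero exit status, not only on CUDA_ERROR_SYSTEM_NOT_READY, so use it only for applications that are safe to restart.

```shell
# Hypothetical helper: rerun a command until it succeeds, up to a fixed number
# of attempts, sleeping between attempts. Retries on any non-zero exit status.
retry_until_ready() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Example: up to 6 attempts, 15 s apart (75 s of waiting, above the ~45 s init time).
#   retry_until_ready 6 15 ./my_gpu_app
# Alternative (service name assumed): wait for Fabric Manager to be active first.
#   until systemctl is-active --quiet nvidia-fabricmanager; do sleep 5; done
```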

[DGX-1]: Script Cannot Recreate RAID Array After Re-inserting a Known Good SSD#

Issue#

When a good SSD is removed from the DGX-1 RAID 0 array and then re-inserted, the script to recreate the array fails.

Explanation and Workaround#

After the SSD is re-inserted, the RAID controller sets the array to offline and marks the re-inserted SSD as Unconfigured_Bad (UBad). The script fails when it attempts to rebuild an array in which one or more SSDs are marked UBad.

To recreate the array in this case:

Set the drive back to a good state.

sudo /opt/MegaRAID/storcli/storcli64 /c0/e<enclosure_id>/s<drive_slot> set good

Run the script to recreate the array.

sudo /usr/bin/configure_raid_array.py -c -f
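To find the <enclosure_id> and <drive_slot> values, storcli can list every drive on controller 0 with its EID:Slt and State columns. The re-inserted SSD should show UBad. The sketch below wraps both steps. The function names, the DRY_RUN switch, and the example IDs 252 and 3 are hypothetical.

```shell
# Hypothetical helper: print a command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

STORCLI=/opt/MegaRAID/storcli/storcli64

# List all drives on controller 0 (look for EID:Slt with State UBad).
list_drives() { run sudo "$STORCLI" /c0/eall/sall show; }

# Mark one drive good, then recreate the array with the DGX script.
recover_ubad_drive() {
  local eid="$1" slot="$2"
  run sudo "$STORCLI" "/c0/e${eid}/s${slot}" set good &&
    run sudo /usr/bin/configure_raid_array.py -c -f
}

# Example (IDs are illustrative): recover_ubad_drive 252 3
```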

[DGX Station A100] Suspend and Power Button Section Appears in Power Settings#

Issue#

Reported in release 5.0.2.

In the Power Settings page of the DGX Station A100 GUI, the Suspend & Power Button section is displayed even though the options do not work.

Explanation#

Suspend and sleep modes are not supported on the DGX Station A100.

[DGX-2] NVSM Does not Detect Downgraded GPU PCIe Link#

Issue#

If the GPU PCIe link is downgraded to Gen1, NVSM still reports the GPU health status as OK.

Explanation#

NVSM does not propagate the health status from the PCIe subsystem to other subsystems. For example, if a PCIe link degradation is reported for a network adapter, NVSM does not mark the network adapter as unhealthy.

Resolved Issues Details#

Here are the issues that are resolved in the latest release:

NVSM Platform Displays as Unsupported#

Issue#

Reported in release 5.0.

In DGX Station, when you run

nvsm show version

instead of displaying DGX Station, the platform field displays Unsupported.

Explanation#

You can ignore this message.

NVSM Enumerates NVSwitches as 8-13 Instead of 0-5#

Issue#

Reported in release 4.99.9. Fixed in release 5.1.

NVSM commands that list the NVSwitches (such as nvsm show nvswitches) will return the switches with 8-13 enumeration.

Example:

nvsm show /systems/localhost/nvswitches

/systems/localhost/nvswitches

Targets:

- NVSwitch10
- NVSwitch11
- NVSwitch12
- NVSwitch13
- NVSwitch8
- NVSwitch9

Explanation#

Currently, NVSM recognizes NVSwitches as graphics devices, and enumerates them as a continuation of the GPU 0-7 enumeration.

[DGX A100] A System with Encrypted rootfs May Fail to Boot if one of the M.2 drives is Corrupted#

Issue#

Reported in release 4.99.9. Fixed in 5.0.2.

On systems with encrypted rootfs, if one of the M.2 drives is corrupted, the system stops at the BusyBox shell when booting.

Explanation#

The inactive RAID array (due to the corrupted M.2 drive) is not getting converted to a degraded RAID array.

To work around this issue, perform the following steps at the BusyBox shell.

Issue the following:
```
mdadm --run /dev/md?*
```
Wait a few seconds for the RAID and crypt to be discovered.
Exit.
```
exit
```

NVSM Fails to Show CPU Information on Non-English Locales#

Issue#

Reported in release 4.1.0 and 5.0 update 3

If the locale is other than English, the nvsm show cpu command reports the target processor does not exist.

sudo nvsm show cpu
ERROR:nvsm:Not Found for target address /systems/localhost/processors
ERROR:nvsm:Target address "/systems/*/processors/*" does not exist

Explanation#

To work around, set the locale to English before issuing nvsm show cpu.
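The sketch below assumes that the C locale (untranslated English output) is enough for NVSM. en_US.UTF-8 should also work if that locale is generated on the system. The wrapper name is hypothetical. It changes the locale for one command only, not for your whole session.

```shell
# Hypothetical wrapper: run one command with the C locale (English output)
# without changing the locale of the calling shell.
with_english_locale() {
  env LC_ALL=C LANG=C "$@"
}

# sudo may reset the environment, so place env after sudo for NVSM:
#   sudo env LC_ALL=C LANG=C nvsm show cpu
```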

Driver Version Mismatch Reported#

Issue#

Reported in release 5.0: 4/20/21 update

Fixed in 5/06/21 update.

After updating the DGX OS, the syslog/dmesg reports the following version mismatch:

nvidia-nvswitch: Version mismatch, kernel version 450.119.03 user version 450.51.06

Explanation#

This occurs with driver 450.119.03 on NVSwitch systems such as DGX-2 or DGX A100. It is caused by a bug that prevents the NSCQ library from loading. This will be resolved in an updated driver version.
