Upgrading#
Between releases, NVIDIA and Red Hat provide OS updates as software packages that contain security mitigations and bug fixes.
Important
Note the following before you upgrade:
An in-place upgrade from Red Hat Enterprise Linux 9 to Red Hat Enterprise Linux 10 with the DGX software stack installed is not supported.
Before you install or upgrade, refer to the Release Notes section for the latest Red Hat Enterprise Linux version, known issues, and workarounds.
To remain at the same RHEL release and prevent incompatibility between the Linux kernel and the GPU drivers, pin the RHEL release by using the subscription-manager release --set=<release> command. For example, the subscription-manager release --set=10.1 command ties the system to RHEL 10.1.
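As a rough sketch of the pinning step, the shell function below checks that its argument looks like a MAJOR.MINOR release before it calls subscription-manager, then prints the pinned release with subscription-manager release --show. The function name pin_rhel_release is hypothetical and is not part of any Red Hat or NVIDIA tooling.

```shell
# Hypothetical helper (not part of Red Hat or NVIDIA tooling).
# Pins the system to a RHEL minor release such as 10.1.
pin_rhel_release() {
    target="$1"
    # Accept only MAJOR.MINOR so that typos never reach subscription-manager.
    if ! printf '%s\n' "$target" | grep -Eq '^[0-9]+\.[0-9]+$'; then
        echo "error: expected a release such as 10.1, got '$target'" >&2
        return 2
    fi
    # Pin the release, then print it to confirm that the setting took effect.
    sudo subscription-manager release --set="$target" &&
        sudo subscription-manager release --show
}

# Example: pin_rhel_release 10.1
```

The format check is only a guard against typos; it does not confirm that the release is actually offered to the system.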
Evaluate the available updates at regular intervals and update the system by using the sudo dnf update --nobest command.
For a list of the known Common Vulnerabilities and Exposures (CVEs), including those that can be resolved by updating the OS software, refer to the Red Hat Security Updates page.
Note
You are responsible for upgrading the software on the DGX system to install the updates from these sources.
If updates are available, you can obtain the package upgrades by running:
sudo dnf update --nobest
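To evaluate updates without installing them, one option is dnf check-update, which exits with status 100 when updates are available, 0 when there are none, and 1 on error. The sketch below wraps that convention; the function names are hypothetical and only illustrate the idea.

```shell
# Hypothetical helper: translate the exit status of `dnf check-update`.
# dnf uses 0 = no updates, 100 = updates available, 1 = error.
describe_check_update_status() {
    case "$1" in
        0)   echo "up-to-date" ;;
        100) echo "updates-available" ;;
        *)   echo "error" ;;
    esac
}

# Hypothetical wrapper: report the update state without installing anything.
report_updates() {
    status=0
    dnf check-update >/dev/null 2>&1 || status=$?
    describe_check_update_status "$status"
}

# Example:
#   [ "$(report_updates)" = "updates-available" ] && sudo dnf update --nobest
```

A check like this could run from a scheduled job so that updates are reviewed at regular intervals before anyone applies them.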
Upgrading the NVIDIA Graphics Drivers for Linux requires a restart to complete the kernel upgrade.
If you upgrade the NVIDIA Graphics Drivers for Linux without restarting the DGX system,
the nvidia-smi command may fail with an error message such as the following:
nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
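The mismatch above can be detected from a script. The following is a minimal sketch that assumes the error text matches the message shown; driver_needs_restart and check_driver_state are hypothetical names, not NVIDIA tools.

```shell
# Hypothetical helper: succeed when the given nvidia-smi output contains the
# mismatch message that appears after a driver upgrade without a restart.
driver_needs_restart() {
    case "$1" in
        *"Driver/library version mismatch"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: run nvidia-smi and suggest a restart on mismatch.
check_driver_state() {
    # nvidia-smi exits non-zero on failure, so keep going and inspect the text.
    output=$(nvidia-smi 2>&1) || true
    if driver_needs_restart "$output"; then
        echo "Driver upgraded without a restart; run: sudo reboot"
    else
        echo "No driver/library version mismatch reported"
    fi
}

# Example: check_driver_state
```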
Upgrading the OS and DGX Software#
This section provides information for upgrading your DGX system and optionally moving to a different GPU driver branch.
Upgrading the Software without Moving to a New Driver Branch#
To upgrade your DGX system with the latest Red Hat Enterprise Linux updates, run the following commands:
sudo dnf update -y --nobest
sudo reboot
Installing or Upgrading to a Newer CUDA Toolkit Release#
Important
Before you install or upgrade to any CUDA Toolkit release, ensure the release is compatible with the driver that is installed on the system. Refer to CUDA Compatibility for more information and a compatibility matrix.
The CUDA Toolkit is not installed by default. You can manually install a qualified CUDA Toolkit release.
All CUDA Toolkit releases that interoperate with the installed GPU driver are supported. Refer to the release notes for the current CUDA Toolkit release versions.
Checking the Currently Installed CUDA Toolkit Release#
This section describes how to determine which CUDA Toolkit release is currently installed.
Important
The CUDA Toolkit is not installed on DGX servers by default. Until you install it, the following command lists no installed packages.
Before you install a new CUDA Toolkit release, run the following command to check the currently installed release:
sudo dnf list installed "cuda-toolkit-*"
Determining the Available CUDA Toolkit Releases#
To see the CUDA Toolkit releases that are available to be installed, run the following command:
sudo dnf search "cuda-toolkit-*"
Installing the CUDA Toolkit or Upgrading Your CUDA Toolkit to a Newer Release#
To install the CUDA Toolkit or upgrade it to a newer release, run the following command:
sudo dnf install cuda-toolkit-13-0
Note
Version 13.0 is shown as an example; replace the value with the version you want to install. Refer to NVIDIA DGX Software for Red Hat Enterprise Linux 10 for the recommended CUDA Toolkit version.
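Because the package name uses dashes where the version number uses a dot, a small helper can build the name from a version string. This is only a sketch: cuda_toolkit_package and install_cuda_toolkit are hypothetical names, and the rpm -q check is one possible way to skip a release that is already installed.

```shell
# Hypothetical helper: map a CUDA version such as 13.0 to its package name.
cuda_toolkit_package() {
    # Package names use dashes in place of dots: 13.0 -> cuda-toolkit-13-0
    printf 'cuda-toolkit-%s\n' "$(printf '%s' "$1" | tr '.' '-')"
}

# Hypothetical wrapper: install the toolkit only if it is not already present.
install_cuda_toolkit() {
    pkg=$(cuda_toolkit_package "$1")
    if rpm -q "$pkg" >/dev/null 2>&1; then
        echo "$pkg is already installed"
    else
        sudo dnf install "$pkg"
    fi
}

# Example: install_cuda_toolkit 13.0
```

The helper does not check driver compatibility; confirm that separately against the CUDA Compatibility matrix before installing.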
Installing GPUDirect Storage Support#
NVIDIA® Magnum IO GPUDirect® Storage (GDS) enables a direct data path for direct memory access (DMA) transfers between GPU memory and storage, which avoids a bounce buffer through the CPU.
Installing nvidia-gds#
To use GDS, perform the following steps:
Install nvidia-gds with the correct dependencies:
sudo dnf install nvidia-gds-<ver>
Use the CUDA Toolkit version number in place of <ver>; for example, 13-0.
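Since the GDS package takes the same version suffix as the CUDA Toolkit package, the name can be derived from the toolkit package name. The helper below is a sketch of that mapping; gds_package_for is a hypothetical name, and it assumes toolkit package names of the form cuda-toolkit-13-0.

```shell
# Hypothetical helper: derive the GDS package name from a CUDA Toolkit
# package name, for example cuda-toolkit-13-0 -> nvidia-gds-13-0.
gds_package_for() {
    # Accept only names of the form cuda-toolkit-MAJOR-MINOR.
    if ! printf '%s\n' "$1" | grep -Eq '^cuda-toolkit-[0-9]+-[0-9]+$'; then
        echo "error: expected a name such as cuda-toolkit-13-0, got '$1'" >&2
        return 2
    fi
    # Strip the cuda-toolkit- prefix and reuse the version suffix.
    printf 'nvidia-gds-%s\n' "${1#cuda-toolkit-}"
}

# Example: sudo dnf install "$(gds_package_for cuda-toolkit-13-0)"
```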