Software Update
Clone the category in BCM
cmsh
category
clone <dgx-gb200> <new-dgx-gb200>
commit
Clone the OS image
cmsh
softwareimage
clone <dgx-image> <new-dgx-image>
commit
Set the new Category to the new image
cmsh
category
set <new-dgx-gb200> softwareimage <new-dgx-image>
commit
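The three cmsh steps above can also be scripted. The following is a minimal sketch that assumes cmsh's `-c` option, which accepts a semicolon-separated command string; the helper names `cmsh_clone_cmd` and `cmsh_set_image_cmd` are hypothetical, and the object names in the example are placeholders to be replaced with the real category and image names.

```shell
# Sketch only: build the cmsh command strings for the clone steps above.
# The helper names are hypothetical and the object names are placeholders.

# Print the cmsh command string that clones an object in a given cmsh mode
# (for example "category" or "softwareimage") and commits the change.
cmsh_clone_cmd() {
    local mode=$1 src=$2 dst=$3
    printf '%s; clone %s %s; commit\n' "$mode" "$src" "$dst"
}

# Print the cmsh command string that points a category at a software image.
cmsh_set_image_cmd() {
    local category=$1 image=$2
    printf 'category; set %s softwareimage %s; commit\n' "$category" "$image"
}

# Example (uncomment on a BCM head node after substituting real names):
# cmsh -c "$(cmsh_clone_cmd category dgx-gb200 new-dgx-gb200)"
# cmsh -c "$(cmsh_clone_cmd softwareimage dgx-image new-dgx-image)"
# cmsh -c "$(cmsh_set_image_cmd new-dgx-gb200 new-dgx-image)"
```

Building the strings separately makes it possible to print and review each command before passing it to cmsh.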
Enter the Image to make changes
cm-chroot /cm/images/<new-dgx-image>/
Create DOCA Repo based on Architecture
x86_64:
dd status=none of=/etc/apt/sources.list.d/doca.sources << EOF
Types: deb
URIs: https://linux.mellanox.com/public/repo/doca/baseos8-latest/ubuntu24.04/x86_64/
Suites: /
Signed-By: /usr/share/keyrings/GPG-KEY-Mellanox.gpg
EOF
arm64:
dd status=none of=/etc/apt/sources.list.d/doca.sources << EOF
Types: deb
URIs: https://linux.mellanox.com/public/repo/doca/baseos8-latest/ubuntu24.04/arm64-sbsa/
Suites: /
Signed-By: /usr/share/keyrings/GPG-KEY-Mellanox.gpg
EOF
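Choosing between the two repo definitions above can be automated. The sketch below assumes that `dpkg --print-architecture`, run inside the image, reports `amd64` or `arm64`, and maps that value to the path segment used in the two URIs above. The function names `doca_repo_arch` and `write_doca_sources` are hypothetical.

```shell
# Sketch only: pick the DOCA repo path segment from the image's Debian architecture.
# The mapping mirrors the two URIs shown above; other architectures are rejected.
doca_repo_arch() {
    # Default to the architecture dpkg reports inside the image.
    local deb_arch=${1:-$(dpkg --print-architecture)}
    case "$deb_arch" in
        amd64) echo "x86_64" ;;
        arm64) echo "arm64-sbsa" ;;
        *) echo "unsupported architecture: $deb_arch" >&2; return 1 ;;
    esac
}

# Write the deb822 source stanza for the selected architecture to a target file.
# The optional second argument overrides the detected Debian architecture.
write_doca_sources() {
    local target=$1 repo_arch
    repo_arch=$(doca_repo_arch "${2:-}") || return 1
    cat > "$target" << EOF
Types: deb
URIs: https://linux.mellanox.com/public/repo/doca/baseos8-latest/ubuntu24.04/${repo_arch}/
Suites: /
Signed-By: /usr/share/keyrings/GPG-KEY-Mellanox.gpg
EOF
}

# Example, run inside the image chroot:
# write_doca_sources /etc/apt/sources.list.d/doca.sources
```

Querying dpkg rather than `uname -m` is a deliberate choice here, since it describes the image being edited rather than the machine hosting the chroot.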
Install the latest DGX OS packages
Make sure all system software and firmware components are updated according to the latest information in the software bill of materials (SBOM).
Note
Individual components (such as drivers or libraries) may be updated independently of the overall DGX OS version, and these component updates can be released more frequently than full OS releases. As a result, you may need to update certain components even when the OS version number does not change.
For a detailed DGX OS update guide, refer to https://docs.nvidia.com/dgx/dgx-os-7-user-guide/upgrading-the-os.html#performing-package-upgrades-using-the-cli
Also note the known-issue workaround documented here: https://docs.nvidia.com/dgx/dgx-os-7-user-guide/known_issues.html#dgx-gb200-system-failure-during-upgrade
# 1. Update the internal database with the list of available packages and their versions
apt update
# 2. Review the packages that will be upgraded
apt full-upgrade -s
# 3. Upgrade to the latest version
apt full-upgrade
# 4. Re-run DKMS build with the --force option against the newly installed kernel (from Step 2)
sudo dkms autoinstall --force -k <New Installed kernel>
# 5. Re-configure broken packages
sudo apt -f install -y
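Step 4 needs the version string of the newly installed kernel. Inside a chroot, `uname -r` reports the kernel running on the host, not the kernel installed in the image, so the value has to come from the image itself. One possible approach, sketched below, is to take the highest-versioned directory under `/lib/modules`. The helper name `latest_image_kernel` is hypothetical, and the sketch assumes that the newest kernel sorts last under GNU `sort -V`, which may not hold if several kernel flavors are installed side by side.

```shell
# Sketch only: print the highest-versioned kernel directory under a modules path.
# Assumes every installed kernel has a directory there and that the newest
# kernel sorts last in version order (GNU sort -V).
latest_image_kernel() {
    local modules_dir=${1:-/lib/modules}
    ls -1 "$modules_dir" | sort -V | tail -n 1
}

# Example, run inside the image chroot after "apt full-upgrade":
# dkms autoinstall --force -k "$(latest_image_kernel)"
```

It is worth comparing the result against the kernel packages listed in the Step 2 review before running the DKMS rebuild.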
Note
This does not update the BCM Kernel in use.
Install MFT, DOCA, NVIDIA driver packages:
# Make sure the external repo is pointed to for DOCA packages
cat /etc/apt/sources.list.d/doca.sources
# Expected output:
# Types: deb
# URIs: https://linux.mellanox.com/public/repo/doca/DGX_GBxx_latest_DOCA/ubuntu24.04/arm64-sbsa/
# Suites: /
# Signed-By: /usr/share/keyrings/GPG-KEY-Mellanox.gpg

# Install DOCA package
sudo apt-get update
sudo apt install doca-all

# Install driver package
sudo dpkg -i nvidia-driver-local-repo-ubuntu2404-570.158.01_1.0-1_arm64.deb
sudo cp /var/nvidia-driver-local-repo-ubuntu2404-570.158.01/nvidia-driver-local-5778B6CA-keyring.gpg /usr/share/keyrings/
sudo mv /etc/apt/sources.list.d/cuda-compute-repo.sources /etc/apt/sources.list.d/cuda-compute-repo.sources.disabled
sudo apt update
sudo apt install nvidia-driver-570-open
sudo apt-get install nvidia-imex-570
sudo apt-get install nvidia-fabricmanager-570
sudo apt-get install libnvidia-nscq-570
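The manual `cat` check of the DOCA repo at the start of the block above can be turned into a scripted guard that runs before the install. The sketch below reads the `URIs` field from the deb822 file and tests its trailing path. The helper names `deb822_field` and `doca_uri_ends_with` are hypothetical, and the parser assumes single-line `Field: value` entries, as in the expected output above.

```shell
# Sketch only: print the value of one field from a deb822 .sources file.
# Assumes each field sits on a single "Field: value" line.
deb822_field() {
    local file=$1 field=$2
    awk -F': ' -v key="$field" '$1 == key { print $2 }' "$file"
}

# Return success when the URIs field ends with the expected path suffix.
doca_uri_ends_with() {
    local file=$1 suffix=$2 uri
    uri=$(deb822_field "$file" URIs)
    case "$uri" in
        *"$suffix") return 0 ;;
        *) return 1 ;;
    esac
}

# Example, run inside the image chroot before installing doca-all:
# doca_uri_ends_with /etc/apt/sources.list.d/doca.sources /arm64-sbsa/ \
#     || echo "unexpected DOCA repo" >&2
```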
Verify installations:
# Check DOCA packages
sudo dpkg -l | grep <Expected DOCA Ver>
# Check driver package
sudo dpkg -l | grep <Expected Driver ver>
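The grep checks above rely on reading the output by eye. A stricter variant, sketched below, queries a single package with `dpkg-query` and compares its version against an expected prefix, so the result can drive a script's exit status. The helper names `installed_version` and `version_has_prefix` are hypothetical, and the expected version in the example is the driver version used in the install commands above.

```shell
# Sketch only: compare an installed package version against an expected prefix.

# Print the installed version of a package, or nothing if dpkg does not know it.
installed_version() {
    dpkg-query -W -f='${Version}' "$1" 2>/dev/null
}

# Return success when a version string starts with a non-empty expected prefix.
version_has_prefix() {
    local version=$1 prefix=$2
    [ -n "$prefix" ] || return 1
    case "$version" in
        "$prefix"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example, run inside the image chroot:
# version_has_prefix "$(installed_version nvidia-driver-570-open)" 570.158.01 \
#     || echo "driver version mismatch" >&2
```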
Save changes into the image
exit
Set compute node to DGX Category
cmsh
device
foreach -n dgx-nodes[XX-XX] (set category <new-dgx-gb200>)
commit
Reboot compute nodes
reboot -c <new-dgx-gb200>
Verify all components have been upgraded
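As one example of such a check, the driver version reported by each GPU on a rebooted node can be compared with the expected version. The sketch below assumes `nvidia-smi --query-gpu=driver_version --format=csv,noheader`, which prints one version per GPU. The helper name `all_lines_equal` is hypothetical, and the expected version in the example is the driver version installed earlier in this procedure.

```shell
# Sketch only: succeed when every non-empty line on stdin equals the expected
# value and at least one such line was read.
all_lines_equal() {
    local expected=$1 line seen=0
    # The "|| [ -n ... ]" clause also processes a final line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        [ -n "$line" ] || continue
        seen=1
        [ "$line" = "$expected" ] || return 1
    done
    [ "$seen" -eq 1 ]
}

# Example, run on a rebooted compute node:
# nvidia-smi --query-gpu=driver_version --format=csv,noheader \
#     | all_lines_equal 570.158.01 \
#     || echo "driver version mismatch on $(hostname)" >&2
```

Requiring at least one line means an empty result, for example when no GPU is visible, counts as a failure rather than a pass.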