Software Update
Clone the category in BCM
cmsh
category clone <dgx-gb200> <new-dgx-gb200>
commit
Clone the OS image
cmsh
softwareimage clone <dgx-image> <new-dgx-image>
commit
Point the new category at the new image
cmsh
category set <new-dgx-gb200> softwareimage <new-dgx-image>
commit
Enter the new image to make changes
cm-chroot /cm/images/<new-dgx-image>/
Create the DOCA repository for the node architecture
x86_64:
dd status=none of=/etc/apt/sources.list.d/doca.sources << EOF
Types: deb
URIs: https://linux.mellanox.com/public/repo/doca/baseos8-latest/ubuntu24.04/x86_64/
Suites: /
Signed-By: /usr/share/keyrings/GPG-KEY-Mellanox.gpg
EOF
arm64:
dd status=none of=/etc/apt/sources.list.d/doca.sources << EOF
Types: deb
URIs: https://linux.mellanox.com/public/repo/doca/baseos8-latest/ubuntu24.04/arm64-sbsa/
Suites: /
Signed-By: /usr/share/keyrings/GPG-KEY-Mellanox.gpg
EOF
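The two `doca.sources` heredocs above differ only in the architecture directory at the end of the URI. A minimal sketch, assuming the architecture reported by `dpkg --print-architecture` inside the image (`amd64` or `arm64`) is the right way to choose between them, folds both into one function. `write_doca_sources` is a hypothetical helper, not part of BCM or DGX OS.

```shell
# Hypothetical helper: writes the DOCA apt source for the image's architecture.
# Assumes the two repository URIs above are the only ones needed.
write_doca_sources() {
    # $1: dpkg architecture (amd64/arm64) or uname-style name (x86_64/aarch64);
    #     defaults to the architecture dpkg reports inside the image
    # $2: output file; defaults to the path used in the commands above
    local arch="${1:-$(dpkg --print-architecture)}"
    local out="${2:-/etc/apt/sources.list.d/doca.sources}"
    local base="https://linux.mellanox.com/public/repo/doca/baseos8-latest/ubuntu24.04"
    local subdir
    case "$arch" in
        amd64|x86_64)  subdir="x86_64" ;;
        arm64|aarch64) subdir="arm64-sbsa" ;;
        *)
            echo "unsupported architecture: $arch" >&2
            return 1
            ;;
    esac
    # Same deb822 fields as the heredocs above; only the URI varies
    cat > "$out" <<EOF
Types: deb
URIs: ${base}/${subdir}/
Suites: /
Signed-By: /usr/share/keyrings/GPG-KEY-Mellanox.gpg
EOF
}
```

Run inside the chroot with no arguments, it writes the file for the image's own architecture. The optional arguments exist mainly so the function can be tried against a scratch file first.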
Install the latest DGX OS packages
Install drivers and software packages that are compatible with the new firmware.
For the complete DGX OS upgrade procedure, refer to https://docs.nvidia.com/dgx/dgx-os-7-user-guide/upgrading-the-os.html#performing-package-upgrades-using-the-cli
Also review the workaround for the known issue documented here: https://docs.nvidia.com/dgx/dgx-os-7-user-guide/known_issues.html#dgx-gb200-system-failure-during-upgrade
# 1. Update the package database with the list of available packages and their versions
apt update
# 2. Review the packages that will be upgraded (note the new kernel version)
apt full-upgrade -s
# 3. Upgrade to the latest versions
apt full-upgrade
# 4. Re-run the DKMS build with --force against the newly installed kernel (the version listed in step 2)
sudo dkms autoinstall --force -k <New Installed kernel>
# 5. Fix any broken packages
sudo apt -f install -y
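Step 4 needs the version string of the kernel that `apt full-upgrade` just installed into the image. Inside the chroot, `uname -r` reports the head node's running kernel rather than the image's, so it cannot supply this value. A minimal sketch, assuming each installed kernel has a directory named after its version under `/lib/modules` and that only one kernel flavor is installed, picks the highest version. `latest_image_kernel` is a hypothetical helper.

```shell
# Hypothetical helper: prints the highest kernel version installed in the image.
# Assumes one /lib/modules/<version> directory per installed kernel and a single
# kernel flavor, so that a version sort picks the newest one.
latest_image_kernel() {
    # $1: modules directory; defaults to /lib/modules inside the image
    local modules_dir="${1:-/lib/modules}"
    local d
    for d in "$modules_dir"/*/; do
        # Skip the unexpanded pattern when the directory is empty
        [ -d "$d" ] || continue
        basename "$d"
    done | sort -V | tail -n 1
}
```

For example, `sudo dkms autoinstall --force -k "$(latest_image_kernel)"`. Compare the printed version with the one shown in step 2 before relying on it, especially if more than one kernel flavor is present.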
Note
This does not change the kernel that BCM is configured to use.
Install the MFT, DOCA, and NVIDIA driver packages:
# Make sure the external DOCA repository is configured
cat /etc/apt/sources.list.d/doca.sources
# Expected output:
# Types: deb
# URIs: https://linux.mellanox.com/public/repo/doca/DGX_GBxx_latest_DOCA/ubuntu24.04/arm64-sbsa/
# Suites: /
# Signed-By: /usr/share/keyrings/GPG-KEY-Mellanox.gpg
# Install the DOCA packages
sudo apt-get update
sudo apt install doca-all
# Install the driver local repository (the .deb must already be copied into the image)
sudo dpkg -i nvidia-driver-local-repo-ubuntu2404-570.158.01_1.0-1_arm64.deb
sudo cp /var/nvidia-driver-local-repo-ubuntu2404-570.158.01/nvidia-driver-local-5778B6CA-keyring.gpg /usr/share/keyrings/
# Disable the CUDA compute repository
sudo mv /etc/apt/sources.list.d/cuda-compute-repo.sources /etc/apt/sources.list.d/cuda-compute-repo.sources.disabled
# Install the driver, IMEX, Fabric Manager, and NSCQ packages
sudo apt update
sudo apt install nvidia-driver-570-open
sudo apt-get install nvidia-imex-570
sudo apt-get install nvidia-fabricmanager-570
sudo apt-get install libnvidia-nscq-570
Verify installations:
# Check DOCA packages
sudo dpkg -l | grep <Expected DOCA Ver>
# Check driver package
sudo dpkg -l | grep <Expected Driver ver>
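The `dpkg -l | grep` checks above show every line that contains the version string, which makes it easy to overlook a package that is at a different version. A slightly stricter sketch reads one `<package> <version>` line per package and reports each one as OK or MISMATCH against an expected version prefix. `check_versions` is a hypothetical helper, and the prefix match is an assumption about how the package versions are formed.

```shell
# Hypothetical helper: compares installed package versions with an expected prefix.
# Reads "<package> <version>" lines on stdin and returns non-zero on any mismatch.
check_versions() {
    # $1: version prefix, e.g. the driver version in the .deb name (570.158.01)
    local expected="$1" pkg ver status=0
    while read -r pkg ver; do
        [ -n "$pkg" ] || continue
        case "$ver" in
            "$expected"*) echo "OK       $pkg $ver" ;;
            *)            echo "MISMATCH $pkg ${ver:-<none>} (expected $expected*)"; status=1 ;;
        esac
    done
    return "$status"
}
```

For example, pipe `dpkg-query -W -f='${Package} ${Version}\n' nvidia-driver-570-open nvidia-imex-570 nvidia-fabricmanager-570 libnvidia-nscq-570` into `check_versions 570.158.01`. Packages that are not installed at all make `dpkg-query` print an error on stderr instead of a line, so watch for that output as well.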
Save changes into the image
exit
Set the compute nodes to the new DGX category
cmsh
device foreach -n dgx-nodes[XX-XX] (set category <new-dgx-gb200>)
commit
Reboot the compute nodes in the new category (from cmsh device mode)
reboot -c <new-dgx-gb200>
Verify that all components have been upgraded, for example by repeating the package version checks on the rebooted nodes.
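Checking the image alone does not confirm that every node actually rebooted into it. A minimal sketch, assuming one `<node>: <driver version>` line per node can be collected from the head node, lists the nodes whose reported driver version differs from the target. `find_stale_nodes` and the node names below are hypothetical.

```shell
# Hypothetical helper: lists nodes whose reported driver version is not the target.
# Reads "<node>: <version>" lines on stdin; returns non-zero if any node differs.
find_stale_nodes() {
    # $1: driver version every node should report, e.g. 570.158.01
    local expected="$1" node ver stale=0
    while read -r node ver; do
        [ -n "$node" ] || continue
        node="${node%:}"   # drop the trailing colon after the node name
        if [ "$ver" != "$expected" ]; then
            echo "$node ${ver:-<no output>}"
            stale=1
        fi
    done
    return "$stale"
}
```

One way to produce the input, assuming passwordless `ssh` from the head node: `for n in dgx-node01 dgx-node02; do echo "$n: $(ssh "$n" nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"; done | find_stale_nodes 570.158.01`. A node that is unreachable produces an empty version and is therefore reported as well.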