Software Update

  1. Clone the category in BCM

    cmsh
    category
    clone <dgx-gb200> <new-dgx-gb200>
    commit
    
  2. Clone the OS image

    cmsh
    softwareimage
    clone <dgx-image> <new-dgx-image>
    commit
    

    Assign the new image to the new category:

    cmsh
    category
    set <new-dgx-gb200> softwareimage <new-dgx-image>
    commit
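
    The interactive cmsh sessions in steps 1 and 2 can also be scripted. The following is a minimal sketch, assuming that `cmsh -c` accepts a semicolon-separated command string on your BCM version. The function names `clone_cmds` and `run_clone` and the `RUN` variable are hypothetical helpers, not part of BCM. By default the sketch only prints the commands so they can be reviewed before anything is changed.

```shell
#!/bin/sh
# Hypothetical helper: build the cmsh command strings for steps 1 and 2.
# Arguments: <old-category> <new-category> <old-image> <new-image>
clone_cmds() {
    old_cat="$1"; new_cat="$2"; old_img="$3"; new_img="$4"
    echo "category; clone ${old_cat} ${new_cat}; commit"
    echo "softwareimage; clone ${old_img} ${new_img}; commit"
    echo "category; set ${new_cat} softwareimage ${new_img}; commit"
}

# Hypothetical helper: print each command (default) or run it when RUN=1.
# Assumption: cmsh -c "<cmd>; <cmd>" is available on the head node.
run_clone() {
    clone_cmds "$@" | while IFS= read -r cmd; do
        if [ "${RUN:-0}" = "1" ]; then
            cmsh -c "$cmd"
        else
            echo "cmsh -c \"$cmd\""
        fi
    done
}
```

    For example, `run_clone <dgx-gb200> <new-dgx-gb200> <dgx-image> <new-dgx-image>` prints the three commands; prefixing the call with `RUN=1` would execute them.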
    
  3. Enter the image to make changes

    cm-chroot /cm/images/<new-dgx-image>/
    
  4. Create the DOCA repository file for the image's architecture

    x86_64:

    dd status=none of=/etc/apt/sources.list.d/doca.sources << EOF
    Types: deb
    URIs: https://linux.mellanox.com/public/repo/doca/baseos8-latest/ubuntu24.04/x86_64/
    Suites: /
    Signed-By: /usr/share/keyrings/GPG-KEY-Mellanox.gpg
    EOF
    

    arm64:

    dd status=none of=/etc/apt/sources.list.d/doca.sources << EOF
    Types: deb
    URIs: https://linux.mellanox.com/public/repo/doca/baseos8-latest/ubuntu24.04/arm64-sbsa/
    Suites: /
    Signed-By: /usr/share/keyrings/GPG-KEY-Mellanox.gpg
    EOF
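
    The two variants above differ only in the last path segment of the URI. As a rough sketch, the choice could be derived from the architecture reported by `dpkg --print-architecture` (which prints `amd64` on x86_64 systems and `arm64` on Arm systems). The function names `doca_repo_arch` and `doca_sources` are hypothetical; the URL and key path are copied from the blocks above.

```shell
#!/bin/sh
# Hypothetical helper: map a Debian architecture name to the DOCA repo path segment.
doca_repo_arch() {
    case "$1" in
        amd64) echo "x86_64" ;;
        arm64) echo "arm64-sbsa" ;;
        *) echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: print the deb822 stanza for the given architecture.
doca_sources() {
    repo_arch=$(doca_repo_arch "$1") || return 1
    cat <<EOF
Types: deb
URIs: https://linux.mellanox.com/public/repo/doca/baseos8-latest/ubuntu24.04/${repo_arch}/
Suites: /
Signed-By: /usr/share/keyrings/GPG-KEY-Mellanox.gpg
EOF
}

# Possible use inside the image chroot (not executed here):
#   doca_sources "$(dpkg --print-architecture)" > /etc/apt/sources.list.d/doca.sources
```

    Printing the stanza to standard output first makes it easy to inspect the result before redirecting it into `/etc/apt/sources.list.d/doca.sources`.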
    
  5. Install the latest DGX OS packages

    Make sure all system software and firmware components are updated according to the latest information in the software bill of materials (SBOM).

    Note

    Individual components (such as drivers or libraries) may be updated independently of the overall DGX OS version, and these component updates can be released more frequently than full OS releases. As a result, you may need to update certain components even when the OS version number does not change.

    For a detailed DGX OS update guide, please refer to https://docs.nvidia.com/dgx/dgx-os-7-user-guide/upgrading-the-os.html#performing-package-upgrades-using-the-cli

    Also note the known-issue workaround described here: https://docs.nvidia.com/dgx/dgx-os-7-user-guide/known_issues.html#dgx-gb200-system-failure-during-upgrade

    # 1. Update the internal database with the list of available packages and their versions
    apt update
    
    # 2. Review the packages that will be upgraded
    apt full-upgrade -s
    
    # 3. Upgrade to the latest version
    apt full-upgrade
    
    # 4. Rebuild the DKMS modules with the --force option against the newly installed kernel (from Step 2)
    sudo dkms autoinstall --force -k <newly-installed-kernel>
    
    # 5. Re-configure broken packages
    sudo apt -f install -y
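
    Inside the chroot, `uname -r` reports the kernel running on the host rather than the kernel installed in the image, so the version passed to `dkms -k` has to come from the image itself. A minimal sketch, assuming the newly installed kernel is the highest-versioned directory under `/lib/modules` (check this against the package list from step 2); `newest_kernel` is a hypothetical helper and relies on GNU `sort -V`.

```shell
#!/bin/sh
# Hypothetical helper: print the highest-versioned entry in a modules directory.
# Defaults to /lib/modules; a different directory can be passed as the first argument.
newest_kernel() {
    modules_dir="${1:-/lib/modules}"
    ls -1 "$modules_dir" | sort -V | tail -n 1
}

# Possible use inside the image chroot (not executed here):
#   dkms autoinstall --force -k "$(newest_kernel)"
```

    If the image keeps several kernels and the highest version is not the one BCM will boot, pass the intended version to `dkms` explicitly instead.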
    

    Note

    This does not update the BCM Kernel in use.

    Install the MFT, DOCA, and NVIDIA driver packages:

    # Confirm that the DOCA sources file points to the external repository
    cat /etc/apt/sources.list.d/doca.sources
    
    # Expected output:
    # Types: deb
    # URIs: https://linux.mellanox.com/public/repo/doca/DGX_GBxx_latest_DOCA/ubuntu24.04/arm64-sbsa/
    # Suites: /
    # Signed-By: /usr/share/keyrings/GPG-KEY-Mellanox.gpg
    
    # Install DOCA package
    sudo apt-get update
    sudo apt install doca-all
    
    # Install driver package
    sudo dpkg -i nvidia-driver-local-repo-ubuntu2404-570.158.01_1.0-1_arm64.deb
    sudo cp /var/nvidia-driver-local-repo-ubuntu2404-570.158.01/nvidia-driver-local-5778B6CA-keyring.gpg /usr/share/keyrings/
    sudo mv /etc/apt/sources.list.d/cuda-compute-repo.sources /etc/apt/sources.list.d/cuda-compute-repo.sources.disabled
    sudo apt update
    sudo apt install nvidia-driver-570-open
    sudo apt-get install nvidia-imex-570
    sudo apt-get install nvidia-fabricmanager-570
    sudo apt-get install libnvidia-nscq-570
    

    Verify installations:

    # Check DOCA packages
    sudo dpkg -l | grep <Expected DOCA Ver>
    
    # Check driver package
    sudo dpkg -l | grep <Expected Driver ver>
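
    The `grep` checks above can be made stricter by comparing each package's installed version against an expected prefix. The following is a sketch only: `version_matches`, `installed_version`, and `check_pkg` are hypothetical helpers, and the expected versions must be taken from the SBOM rather than from this example.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the installed version starts with the expected prefix.
# Note: an empty prefix matches any version, so always pass a real expected value.
version_matches() {
    case "$1" in
        "$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print a package's installed version, or nothing if it is not installed.
installed_version() {
    dpkg-query -W -f='${Version}' "$1" 2>/dev/null
}

# Hypothetical helper: report OK or MISMATCH for one package.
# Arguments: <package-name> <expected-version-prefix>
check_pkg() {
    pkg="$1"; expected="$2"
    actual=$(installed_version "$pkg")
    if version_matches "$actual" "$expected"; then
        echo "OK $pkg $actual"
    else
        echo "MISMATCH $pkg installed='${actual}' expected='${expected}*'"
        return 1
    fi
}

# Possible use inside the image chroot (not executed here):
#   check_pkg nvidia-driver-570-open "<Expected Driver ver>"
#   check_pkg doca-all "<Expected DOCA Ver>"
```

    Because `check_pkg` returns a non-zero status on a mismatch, it can be chained with `&&` so that a script stops at the first package that is missing or at the wrong version.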
    
  6. Save changes into the image

    exit
    
  7. Assign the compute nodes to the new DGX category

    cmsh
    device
    foreach -n dgx-nodes[XX-XX] (set category <new-dgx-gb200>)
    commit
    
  8. Reboot compute nodes

    reboot -c <new-dgx-gb200>
    
  9. Verify all components have been upgraded