Software Image Setup
Each node category that is provisioned must have its own software image, so that each node type can receive the customizations it needs.
For the control plane nodes, each image is a clone of the default image for the node's architecture. For example, ARM/aarch64-based control plane nodes clone default-image on an ARM/aarch64-based head node.
Conversely, on such an ARM head node, the image for an x86-based node is cloned from the x86 default image imported in the Mixed Architecture Setup section, default-image-ubuntu2404-x86_64.
For DGX GB200 nodes, a DGX OS 7 image must be imported or created. If a BCM11-dgx.iso is used, the image is already included.
After the images are created, use cm-chroot-sw-img to customize each control node image according to that node type's requirements.
Example: Using cm-chroot-sw-img to update/modify software images
cm-chroot-sw-img /cm/images/<image name>
Example: k8s-system-user-image
cm-chroot-sw-img /cm/images/k8s-system-user-image
mounted /cm/images/k8s-system-user-image/dev
mounted /cm/images/k8s-system-user-image/dev/pts
mounted /cm/images/k8s-system-user-image/proc
mounted /cm/images/k8s-system-user-image/sys
mounted /cm/images/k8s-system-user-image/run
mounted /run/systemd/resolve/stub-resolv.conf -> /cm/images/k8s-system-user-image/run/systemd/resolve/resolv.conf

Using chroot with mounted virtual filesystems to chroot in /cm/images/k8s-system-user-image....
Type 'exit' or ctrl-D to exit from the chroot in the software image.
This also unmounts the above mentioned /dev /dev/pts /proc /sys /run filesystems in the software image.

root@k8s-system-user-image:/# apt update
root@k8s-system-user-image:/# history
    1  apt update
    2  apt install cmdaemon
    3  dpkg -l | grep cmdaemon
# <hit Ctrl+D to exit>
If scripts or other files must be present on nodes provisioned from an image, copy them into the corresponding directory inside the image: /cm/images/<node image>/<regular linux file directory>.
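A minimal sketch of the copy step above, assuming the /cm/images/<node image> layout described in the text. copy_into_image and the IMAGES_ROOT override are hypothetical names used only for illustration; they are not part of BCM.

```shell
#!/bin/sh
# Sketch: stage a file inside a software image so that it is present on
# every node provisioned from that image.
# IMAGES_ROOT defaults to /cm/images (the path used in the text); the
# override exists only so the helper can be tried against a scratch directory.
IMAGES_ROOT="${IMAGES_ROOT:-/cm/images}"

copy_into_image() {
    local image="$1" src="$2" dest_dir
    # Normalize the destination so it always starts with exactly one "/"
    dest_dir="/${3#/}"
    if [ ! -d "$IMAGES_ROOT/$image" ]; then
        echo "error: image '$image' not found under $IMAGES_ROOT" >&2
        return 1
    fi
    mkdir -p "$IMAGES_ROOT/$image$dest_dir" || return 1
    # -p preserves mode and timestamps, e.g. the executable bit on scripts
    cp -p "$src" "$IMAGES_ROOT/$image$dest_dir/"
}

# Example (on the head node, as root):
# copy_into_image slogin-image ./site-setup.sh /usr/local/bin
```

The helper refuses to create a new image directory, so a mistyped image name fails loudly instead of silently creating a stray tree under /cm/images.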
Reference: Adding bonding module to an image
cmsh -c "softwareimage; use default-image; kernelmodules; add bonding;commit"
Example: Setting the kernel parameters for DGX GB200/GB300 nodes.
cmsh -c "softwareimage; use <software image name>; set kernelparameters \"nouveau.modeset=0 iommu.passthrough=1 systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller init_on_alloc=0 numa_balancing=disable acpi_power_meter.force_map_on=y\"; commit"
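Keeping the GB200/GB300 kernel parameters as a one-per-line list makes them easier to review and diff. This POSIX sh sketch joins that list into one string and prints the cmsh command as a dry run instead of executing it; GB_KERNEL_PARAMS and kernelparams_cmd are hypothetical names. Note that set replaces the image's existing kernelparameters value rather than adding to it.

```shell
#!/bin/sh
# Kernel parameters from the DGX GB200/GB300 example, one per line.
GB_KERNEL_PARAMS='
nouveau.modeset=0
iommu.passthrough=1
systemd.unified_cgroup_hierarchy=0
systemd.legacy_systemd_cgroup_controller
init_on_alloc=0
numa_balancing=disable
acpi_power_meter.force_map_on=y
'

kernelparams_cmd() {
    local image="$1" params
    # Unquoted expansion splits the list on whitespace; rejoin with single spaces
    params=$(printf '%s ' $GB_KERNEL_PARAMS)
    params=${params%" "}
    # \\" in the format prints \" so the shell hands cmsh a quoted string
    printf 'cmsh -c "softwareimage; use %s; set kernelparameters \\"%s\\"; commit"\n' \
        "$image" "$params"
}

# Review the printed command, then run it on the head node:
# kernelparams_cmd <software image name>
```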
Example: Disabling multipathing on NVMe devices
Check the current kernel parameters:
cmsh -c "softwareimage; use <software image name>; get kernelparameters"
If the nvme_core.multipath parameter is not present, append it and commit:
cmsh -c "softwareimage; use <software image name>; append kernelparameters \" nvme_core.multipath=n\"; commit"
Verify the result:
cmsh -c "softwareimage; use <software image name>; get kernelparameters"
rd.driver.blacklist=nouveau nvme_core.multipath=n
Note
If kernel parameters are already set, start the appended value with a space (as in " nvme_core.multipath=n") so that the new parameter is not joined to the previous one.
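The check-then-append steps above can be expressed as a pure string function. This sketch, with the hypothetical name add_kernel_param, returns the parameter string with the new parameter added once and separated by exactly one space; it leaves the string unchanged if the identical parameter is already present, and fails if the same key is set to a different value (for example nvme_core.multipath=y), since that case needs a manual decision.

```shell
#!/bin/sh
add_kernel_param() {
    local current="$1" param="$2" key
    key="${param%%=*}"
    # Pad with spaces so matches are on whole space-separated tokens
    case " $current " in
        *" $param "*)
            # Already present: nothing to do
            printf '%s\n' "$current"
            return 0 ;;
        *" $key="*|*" $key "*)
            echo "error: $key is already set with a different value" >&2
            return 1 ;;
    esac
    if [ -z "$current" ]; then
        printf '%s\n' "$param"
    else
        printf '%s %s\n' "$current" "$param"
    fi
}
```

Usage, feeding it the output of get kernelparameters: add_kernel_param "$(cmsh -c "softwareimage; use <software image name>; get kernelparameters")" nvme_core.multipath=n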
Control Plane Software Image Setup
For NVIDIA Mission Control 2.0, the control nodes that require provisioning are:
slogin
k8s-system-admin
k8s-system-user
The procedure for creating software images for these control nodes depends on the CPU architecture of the head node (the node running the management software). The following instructions cover both scenarios: a head node based on ARM (aarch64) and a head node based on x86.
Note
When creating software images for control nodes with a different architecture than the head node, ensure that the appropriate default image for the target architecture is available.
For example:
If the head node is x86, the default-image-ubuntu2404-aarch64 image is needed for ARM control nodes.
If the head node is ARM, the default-image-ubuntu2404-x86_64 image is needed for x86 control nodes.
These cross-architecture default images should be generated or imported during the mixed architecture setup process.
Scenario 1: Head Node on ARM (aarch64) Architecture
To create a software image for an ARM control node (head node and control node are both ARM):
Example command:
cmsh -c "softwareimage; use default-image; clone default-image <control node type>-image; commit"
To create a software image for an x86 control node (head node is ARM, control node is x86):
Example command:
cmsh -c "softwareimage; use default-image-ubuntu2404-x86_64; clone default-image-ubuntu2404-x86_64 <control node type>-image; commit"
Scenario 2: Head Node on x86 Architecture
To create a software image for an x86 control node (head node and control node are both x86):
Example command:
cmsh -c "softwareimage; use default-image; clone default-image <control node type>-image; commit"
To create a software image for an ARM control node (head node is x86, control node is ARM):
Example command:
cmsh -c "softwareimage; use default-image-ubuntu2404-aarch64; clone default-image-ubuntu2404-aarch64 <control node type>-image; commit"
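Both scenarios reduce to one rule: clone default-image when the head node and control node share an architecture, otherwise clone default-image-ubuntu2404-<target architecture>. This dry-run sketch prints the matching clone command; normalize_arch, source_image_for, and clone_cmd are hypothetical helpers, and the accepted architecture spellings (arm64/aarch64, amd64/x86_64) are an assumption for convenience.

```shell
#!/bin/sh
# Map common spellings to the names used in the default image names
normalize_arch() {
    case "$1" in
        aarch64|arm64|ARM|arm) echo aarch64 ;;
        x86_64|amd64|x86) echo x86_64 ;;
        *) echo "error: unknown architecture '$1'" >&2; return 1 ;;
    esac
}

# Pick the image to clone from: head node architecture first, target second
source_image_for() {
    local head target
    head=$(normalize_arch "$1") || return 1
    target=$(normalize_arch "$2") || return 1
    if [ "$head" = "$target" ]; then
        echo default-image
    else
        echo "default-image-ubuntu2404-$target"
    fi
}

# Print (not run) the clone command for a control node type
clone_cmd() {
    local src
    src=$(source_image_for "$1" "$2") || return 1
    printf 'cmsh -c "softwareimage; use %s; clone %s %s-image; commit"\n' \
        "$src" "$src" "$3"
}

# Example on the head node, for x86 k8s-system-admin nodes:
# clone_cmd "$(uname -m)" x86_64 k8s-system-admin
```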
For DGX SuperPOD Reference Architectures (RA), do the following:
SLURM Login (slogin)
In the RA, the slogin node(s) are aarch64/ARM-based; in field deployments they can be either aarch64/ARM or x86. The following example assumes that the head node and the slogin node are both aarch64/ARM.
cmsh -c "softwareimage; use default-image; clone default-image slogin-image; commit"
K8s-system-admin
In NMC 2.0, the three k8s-system-admin control plane nodes are x86-only because NMX-M is supported only on x86. Because the head node in the RA is C2/ARM-based, this step requires the x86 default image created in the Mixed Architecture Setup section. The resulting image is assigned to the k8s-system-admin category and therefore applies to all three k8s-system-admin nodes.
cmsh -c "softwareimage; use default-image-ubuntu2404-x86_64; clone default-image-ubuntu2404-x86_64 k8s-system-admin-image; commit"
K8s-system-user
In NMC 2.0, the three k8s-system-user control plane nodes can be either x86 or aarch64/ARM, since Run:ai can be installed on either architecture. The software image created for k8s-system-user nodes must match the architecture of the nodes themselves, which is not necessarily the architecture of the head node.
Case 1: Head node and k8s-system-user node are the same architecture (either both aarch64/ARM or both x86)
cmsh -c "softwareimage; use default-image; clone default-image k8s-system-user-image; commit"
Case 2: Head node and k8s-system-user node are different architectures
If the head node is x86 but the k8s-system-user node is aarch64/ARM:
cmsh -c "softwareimage; use default-image-ubuntu2404-aarch64; clone default-image-ubuntu2404-aarch64 k8s-system-user-image; commit"
If the head node is aarch64/ARM but the k8s-system-user node is x86:
cmsh -c "softwareimage; use default-image-ubuntu2404-x86_64; clone default-image-ubuntu2404-x86_64 k8s-system-user-image; commit"
In all cases, ensure that the software image created and assigned to the k8s-system-user category matches the architecture of the k8s-system-user nodes.
GB200/GB300 Software Image Setup
The latest DGX OS 7 software image, or any Linux distribution that meets the GB200/GB300 SBOM requirements, is required to provision GB200/GB300 nodes. The latest SBOM can be found in the NVIDIA Mission Control Release Notes. For version 2.0, see the SBOM and look for "NVIDIA Mission Control 2.0.0 - Software Bill of Materials (SBOM) for NVIDIA GB200 NVL72".
The following is the direct link to the SBOM for NMC 2.0. For newer releases, always check the release notes.
If multiple workload managers are needed, first create or import a ‘default’ GB200/GB300 software image, then clone it for each workload manager as necessary.
The general steps for importing the GB200/GB300 software image are as follows:
If available, download the BCM-created DGXOS.tar.gz (which includes BCM software packages).
DGX OS Download link: TBD, not available at this time.
Import the image from the tar.gz using the cm-create-image tool. If the tar.gz is not available, generate the DGX OS image using the appropriate BCM ISO.
Note
The ISO required to create the GB200/GB300 software image depends on the architecture of the BCM installer ISO (head node):
If the BCM installer ISO (head node) is ARM-based, the same ARM-based installer ISO can be used for creating the DGX OS image for GB200/GB300 nodes—no additional ISO download is required.
If the BCM installer ISO (head node) is x86-based, the ARM version of the DGX OS ISO (ARM.iso) must be downloaded to create the software image for GB200/GB300 nodes.
Clone the image for each workload manager as needed.
cmsh -c "softwareimage; use <imported or created GB200/GB300 software image name>; clone <GB200/GB300 software image name> <dgx-gb200/gb300-slurm-image or dgx-gb200/gb300-k8s-image>; commit"
Update the software image with the required software packages for the GB200/GB300 nodes if the image is not already compliant. Use the cm-chroot-sw-img tool to update the software image; alternatively, update a provisioned node and then import those changes back into the software image.
Commit all changes to the GB200/GB300 software image and any related categories.
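When several workload managers share one base GB200/GB300 image, the clone step is simply repeated per workload manager. This dry-run sketch prints one cmsh clone command per workload manager; gb_clone_cmds is a hypothetical helper, and the dgx-<model>-<workload manager>-image naming follows the placeholder pattern used in the clone example above.

```shell
#!/bin/sh
# Usage: gb_clone_cmds <base image> <gb200|gb300> <wlm> [<wlm> ...]
gb_clone_cmds() {
    local base="$1" model="$2" wlm
    shift 2
    # One clone command per workload manager, all from the same base image
    for wlm in "$@"; do
        printf 'cmsh -c "softwareimage; use %s; clone %s dgx-%s-%s-image; commit"\n' \
            "$base" "$base" "$model" "$wlm"
    done
}

# Example: gb_clone_cmds <imported GB200/GB300 image name> gb200 slurm k8s
```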
DGX OS 7 Image Creation
If the DGX OS 7 image is not included in the ISO, it can be created in one of the following ways.
Method 1: Import the DGX OS 7 image from a tar.gz (recommended)
Download the BCM-created DGXOS.tar.gz, which is generated to include the BCM software packages.
DGX OS download link: Installing DGX OS (https://docs.nvidia.com/base-os/dgx-os-5/installing_dgx_os.html).
Import the image from the tar.gz using the cm-create-image tool.
$ cm-create-image -a <IMPORT_IMAGE_NAME>.tar.gz -n <SOFTWARE_IMAGE_NAME_AS_IT_APPEARS_IN_BCM> --no-cm-cuda-repo
Reference: Results of DGX OS 7 image creation
Running validate base tar........................ [ OK ]
Running sanity check............................. [ OK ]
Running unpack base tar.......................... [ OK ]
******************** IMPORTANT ****************************
Please confirm that the base distribution repositories for the software image are enabled.
For instructions on how to enable repositories for your software image, please refer the administrator's manual.
Image creation can be resumed in one of the following ways:
-----------------------------------------------------------
1. Enter 'e' to exit, and configure repositories. Then, restart program with the -d (--fromdir) option.
   cm-create-image -d /cm/images/dgxos-7.2-image -n dgxos-7.2-image
2. Open a new console, and configure repositories. Then enter 'c' on this console, to continue software image creation.
***********************************************************
Continue(c)/Exit(e)? c
Finalize base distribution....................... [ OK ]
Copying cm repo files............................ [ OK ]
Validating repo configuration.................... [ OK ]
Installing distribution packages................. [ OK ]
Finalizing image services........................ [ OK ]
Installing CM packages........................... [ OK ]
Finalizing cluster services...................... [ OK ]
Copying cluster certificate to image............. [ OK ]
Adding/Updating software image................... [ OK ]
Note
If the BCM 11 online repositories are not available, a local BCM 11 ISO can be used as the package source instead:
$ cm-create-image -a /root/baseos7.1-image-arm64-04-09-2025.tar.gz --cmdvd /root/bcm-11.0-ubuntu2404.iso -n baseos7.1-image -s --no-cm-cuda-repo
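The two import variants above differ only in whether a local ISO is passed. This dry-run sketch prints the matching cm-create-image command; build_import_cmd is a hypothetical helper, the -s flag is carried over verbatim from the ISO example above rather than explained here, and paths containing spaces are not quoted in the output.

```shell
#!/bin/sh
# Usage: build_import_cmd <tar.gz> <image name in BCM> [<local BCM ISO>]
build_import_cmd() {
    local tarball="$1" name="$2" iso="${3:-}"
    if [ -n "$iso" ]; then
        # Offline variant: pull BCM packages from the local ISO
        printf 'cm-create-image -a %s --cmdvd %s -n %s -s --no-cm-cuda-repo\n' \
            "$tarball" "$iso" "$name"
    else
        # Online variant: BCM 11 repositories are reachable
        printf 'cm-create-image -a %s -n %s --no-cm-cuda-repo\n' "$tarball" "$name"
    fi
}
```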
Method 2: Use the cm-create-image tool with the --dgx flag to generate the image
When the tar.gz cannot be imported, BCM can be used to generate the image. It requires:
the correct BCM ISO for the architecture for the DGX OS image.
the correct DOCA version for the DGX OS image. Check the SBOM for the correct DOCA version.
the correct DGX type for the DGX OS image. Currently, only dgx_b200 and dgx_gb200 are supported as --dgx-type arguments; for GB300, use dgx_gb200.
$ cm-create-image --cmdvd <local source iso to generate the image on the correct architecture> --no-cm-cuda-repo --extra-pkg-group doca_ofed_<doca version> --dgx --dgx-type dgx_gb200 --imagename <gb200/gb300>-image-<date>
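The --dgx-type rule above (GB300 reuses dgx_gb200) can be captured in a small dry-run sketch that fills in the Method 2 command. dgx_type_for and dgx_create_cmd are hypothetical helpers; the ISO path, DOCA version (take it from the SBOM), and date tag are placeholders supplied by the caller.

```shell
#!/bin/sh
# Only dgx_b200 and dgx_gb200 are accepted by --dgx-type; GB300 reuses dgx_gb200.
dgx_type_for() {
    case "$1" in
        b200) echo dgx_b200 ;;
        gb200|gb300) echo dgx_gb200 ;;
        *) echo "error: unsupported DGX model '$1'" >&2; return 1 ;;
    esac
}

# Usage: dgx_create_cmd <local BCM ISO> <doca version> <gb200|gb300|b200> <date tag>
dgx_create_cmd() {
    local iso="$1" doca="$2" model="$3" tag="$4" dgx_type
    dgx_type=$(dgx_type_for "$model") || return 1
    printf 'cm-create-image --cmdvd %s --no-cm-cuda-repo --extra-pkg-group doca_ofed_%s --dgx --dgx-type %s --imagename %s-image-%s\n' \
        "$iso" "$doca" "$dgx_type" "$model" "$tag"
}
```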
Method 3: Manual creation of a GB200/GB300-compatible image
The ARM default image, either included in the ISO or generated during the mixed architecture setup (on an x86 head node), can be used to create a GB200/GB300-compatible image. Clone that default image, then modify the clone with cm-chroot-sw-img. Refer to the SBOM for the correct versions of the software packages to install. For more details, see the Appendix section “Check the DGX OS packages to meet current SBOM”.
Software image summary
When complete, the available software images should resemble the following example:
$ cmsh -c "softwareimage;list"
Reference: Software image summary
Name (key)                      Path (key)                                  Kernel version        Nodes
------------------------------- ------------------------------------------- --------------------- -----
k8s-system-admin-image          /cm/images/k8s-system-admin-image           6.8.0-51-generic      3
baseos7-image                   /cm/images/baseos7-image                    6.8.0-1021-nvidia-64k 0
default-image                   /cm/images/default-image                    6.8.0-51-generic-64k  0
default-image-ubuntu2404-x86_64 /cm/images/default-image-ubuntu2404-x86_64  6.8.0-51-generic      0
k8s-system-user-image           /cm/images/k8s-system-user-image            6.8.0-51-generic-64k  3
slogin-image                    /cm/images/slogin-image                     6.8.0-51-generic-64k  2
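To sanity-check the summary, the list output can be filtered with awk. This sketch assumes the format shown above (a header row starting with "Name", a dashed separator, then one row per image with the node count in the last column); check_images is a hypothetical helper. It prints "missing:" for expected images that are absent (and exits non-zero) and "unassigned:" for images that exist but have no nodes.

```shell
#!/bin/sh
# Usage: cmsh -c "softwareimage;list" | check_images <image> [<image> ...]
check_images() {
    awk -v want="$*" '
        # Skip the header row and the dashed separator
        $1 == "Name" || $1 ~ /^-/ { next }
        # Remember the node count (last column) for each image (first column)
        NF >= 4 { nodes[$1] = $NF }
        END {
            n = split(want, w, " ")
            rc = 0
            for (i = 1; i <= n; i++) {
                if (!(w[i] in nodes)) { print "missing: " w[i]; rc = 1 }
                else if (nodes[w[i]] + 0 == 0) print "unassigned: " w[i]
            }
            exit rc
        }'
}
```

For example, cmsh -c "softwareimage;list" | check_images slogin-image k8s-system-admin-image k8s-system-user-image prints nothing for the summary above, because all three images exist and have nodes assigned.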