Mixed Architecture Setup
By default, the OS image available to other connected servers (nodes) is built for the architecture installed on the head node. When nodes use a different architecture, an OS image compiled for that architecture must be created or imported. For example, if the head node is an x86 server, an ARM/aarch64 OS image must be created to support any ARM/aarch64 nodes. The cm-image tool is used to create these images for a different OS/architecture combination: the software image, the node-installer image, and the cm-shared image.
Note
The headnode architecture can be either x86 or aarch64/ARM. The process is the same in either case.
The cm-image tool uses QEMU to emulate the other architecture if necessary. This can take a long time (approximately 4 hours).
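A mixed setup needs images for the architecture that the head node is not running. The following is a minimal sketch, assuming only the two architectures named in this section; the helper name other_arch is hypothetical and not part of BCM. It derives the --arch value from uname -m. If that value differs from the head node's architecture, cm-image may need QEMU emulation to build for it.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of BCM): print the architecture that is NOT
# the head node's. Only x86_64 and aarch64 are handled, matching this section.
other_arch() {
  local host="${1:-$(uname -m)}"
  case "$host" in
    x86_64)  echo aarch64 ;;
    aarch64) echo x86_64 ;;
    *) echo "unsupported head node architecture: $host" >&2; return 1 ;;
  esac
}

# On the head node: show which --arch value the mixed setup needs.
if target="$(other_arch)"; then
  echo "Head node is $(uname -m); add images with --arch $target (cm-image may need QEMU emulation for it)"
fi
```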
The cm-image tool manages three components (software image, node-installer, and cm-shared), each of which has two parts:
- A filesystem directory (for example, /cm/images/default-image-ubuntu2404-<arch>)
- A BCM entity in cmsh. All three components have entries under cmsh→fspart; the software image also appears under cmsh→softwareimage.
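The directory names used throughout this section follow a distro/arch pattern. The sketch below assumes that pattern holds for any distro and architecture, which is inferred from the examples here and not from BCM documentation. It prints the three filesystem directories expected for a given distro and architecture. The helper name component_dirs is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the filesystem directories for the three
# cm-image components of a non-native distro/arch, using the naming pattern
# seen in this section's examples (an assumption, not a BCM guarantee).
component_dirs() {
  local distro="$1" arch="$2"
  case "$arch" in
    x86_64|aarch64) ;;
    *) echo "unsupported arch: $arch" >&2; return 1 ;;
  esac
  printf '%s\n' \
    "/cm/images/default-image-${distro}-${arch}" \
    "/cm/node-installer-${distro}-${arch}" \
    "/cm/shared-${distro}-${arch}"
}

# Example: directories for an imported x86_64 Ubuntu 24.04 image
component_dirs ubuntu2404 x86_64
```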
Method 1: Import Pre-compiled Images (fastest)
1. Download the pre-compiled .tar.gz files for /cm/node-installer, /cm/shared, and the default image for the architecture to be imported. Contact your NVIDIA enterprise support representative for more details.
   cd /tmp
   wget <node-installer url>/node-installer.tar.gz
   wget <cmshared url>/cmshared.tar.gz
   wget <default-image url>/default-image-ubuntu2404-<arch>.tar.gz
2. Extract the software image:
   mkdir /cm/images/default-image-ubuntu2404-<arch>
   cd /cm/images/default-image-ubuntu2404-<arch>
   tar -xzvf /tmp/default-image-ubuntu2404-<arch>.tar.gz
3. Extract the node-installer:
   mkdir /cm/node-installer-ubuntu2404-<arch>
   cd /cm/node-installer-ubuntu2404-<arch>
   tar -xzvf /tmp/node-installer.tar.gz
4. Extract the /cm/shared component:
   mkdir /cm/shared-ubuntu2404-<arch>
   cd /cm/shared-ubuntu2404-<arch>
   tar -xzvf /tmp/cmshared.tar.gz
5. Add all distribution artifacts (for example, ubuntu2404) to BCM, where <arch> is x86_64 or aarch64:
   cm-image --verbose create all --arch <arch> --distro ubuntu2404 --add-only
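The download checks and the mkdir/cd/tar extraction steps above can be combined into one guarded function. This is a minimal sketch that assumes each archive's contents sit at the top level, as the cd-then-extract steps imply. The helper name import_component is hypothetical. It refuses a missing or unreadable archive and will not extract into a directory that already has content.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify a downloaded .tar.gz, then extract it into a
# new, empty directory. tar -C has the same effect as cd'ing there first.
import_component() {
  local archive="$1" dest="$2"
  # Reject missing/empty files and anything tar cannot list as gzip'd tar.
  if [ ! -s "$archive" ] || ! tar -tzf "$archive" >/dev/null 2>&1; then
    echo "missing or unreadable archive: $archive" >&2
    return 1
  fi
  # Do not mix a fresh extraction with leftovers from an earlier attempt.
  if [ -d "$dest" ] && [ -n "$(ls -A "$dest")" ]; then
    echo "refusing to extract into non-empty directory: $dest" >&2
    return 1
  fi
  mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}

# Usage on the head node (as root), for an x86_64 import:
# ARCH=x86_64
# import_component /tmp/default-image-ubuntu2404-$ARCH.tar.gz /cm/images/default-image-ubuntu2404-$ARCH
# import_component /tmp/node-installer.tar.gz /cm/node-installer-ubuntu2404-$ARCH
# import_component /tmp/cmshared.tar.gz /cm/shared-ubuntu2404-$ARCH
```

Listing the archive before extracting reads a large image twice. If that is too slow, the tar -tzf check can be dropped, because a failing tar -xzf also returns a non-zero status.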
Troubleshooting
If step 5 (the --add-only command) fails at the archOS stage and the components are intact, resume by running only the archOS step (<arch> is x86_64 or aarch64):
cm-image --verbose create all --arch <arch> --distro ubuntu2404 --add-archos
If --add-archos fails with an “arch/OS does not match” error, the error message identifies the failing component. That component must be recreated:
Remove the component’s BCM entities and filesystem directory.
In cmsh→fspart, remove the failing component (there may be more than one):
cmsh -c "fspart; remove <component>; commit"
Example: identifying fspart components to remove
[a06-u02-bcm-01->fspart]% ls | grep x86
/cm/images/default-image-ubuntu2404-x86_64       image           default-image-ubuntu2404-x86_64
/cm/images/default-image-ubuntu2404-x86_64/boot  boot            default-image-ubuntu2404-x86_64:boot
/cm/node-installer-ubuntu2404-x86_64             node-installer
/cm/shared-ubuntu2404-x86_64                     cm-shared
[a06-u02-bcm-01->fspart]%
If the failed component is the software image, also remove it from cmsh→softwareimage:
cmsh -c "softwareimage; remove -d <softwareimage>; commit"
Example: identifying the software image to remove
[a06-u02-bcm-01->softwareimage]% ls | grep x86
default-image-ubuntu2404-x86_64  /cm/images/default-image-ubuntu2404-x86_64  6.8.0-51-generic  0
[a06-u02-bcm-01->softwareimage]%
Then remove the filesystem directory and verify it is gone:
rm -rf <filesystem directory>
Example: verifying removal
root@a06-u02-bcm-01:~# ls /cm/images/default-image-ubuntu2404-x86_64
ls: cannot access '/cm/images/default-image-ubuntu2404-x86_64': No such file or directory
root@a06-u02-bcm-01:~# ls /cm/node-installer-ubuntu2404-x86_64
ls: cannot access '/cm/node-installer-ubuntu2404-x86_64': No such file or directory
root@a06-u02-bcm-01:~# ls /cm/shared-ubuntu2404-x86_64
ls: cannot access '/cm/shared-ubuntu2404-x86_64': No such file or directory
Recreate only the failed component by re-running cm-image create (<arch> is x86_64 or aarch64). If using cm-image create all, answer n when prompted for any components that are already correct.
cm-image create swimage --arch <arch> --distro ubuntu2404 --add-only          (software image)
cm-image create node-installer --arch <arch> --distro ubuntu2404 --add-only   (node-installer)
cm-image create cmshared --arch <arch> --distro ubuntu2404 --add-only         (/cm/shared)
Once the component is recreated, run --add-archos to complete the setup:
cm-image --verbose create all --arch <arch> --distro ubuntu2404 --add-archos
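The directory removal and verification in the recovery procedure above can be scripted. This is a minimal sketch, assuming the component directories live under /cm as in the examples. The function remove_and_verify and the CM_ROOT override are hypothetical. CM_ROOT exists only so that the function can be tried on a scratch directory. The function does not touch the cmsh entities, which still have to be removed through fspart and softwareimage as described above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rm -rf each given directory and confirm it is gone,
# refusing paths outside CM_ROOT (default /cm) or paths containing "..".
remove_and_verify() {
  local root="${CM_ROOT:-/cm}" dir rc=0
  for dir in "$@"; do
    case "$dir" in
      */..|*/../*) echo "refusing path containing ..: $dir" >&2; rc=1; continue ;;
      "$root"/?*) ;;  # allowed: strictly below the root
      *) echo "refusing to remove $dir (not under $root)" >&2; rc=1; continue ;;
    esac
    rm -rf -- "$dir"
    if [ -e "$dir" ]; then
      echo "still present: $dir" >&2; rc=1
    else
      echo "removed: $dir"
    fi
  done
  return "$rc"
}

# Usage, matching the x86_64 example above:
# remove_and_verify /cm/images/default-image-ubuntu2404-x86_64 \
#   /cm/node-installer-ubuntu2404-x86_64 /cm/shared-ubuntu2404-x86_64
```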
Method 2: Create images from an ISO or a vanilla tar.gz
To create images for the other architecture, download either a BCM installation ISO or a vanilla (base-distribution) .tar.gz. A vanilla .tar.gz can be downloaded from Base Distributions: navigate to the architecture of the desired OS and download the appropriate .tar.gz file.
Pass the .iso or .tar.gz to cm-image as the image source with the --source option.
If using an .iso, the command is (<arch> is x86_64 or aarch64):
cm-image create all --arch <arch> --source /root/bcm-11.0-ubuntu2404.iso --distro ubuntu2404 --air-gapped
If using a vanilla/base-distribution .tar.gz, the command is:
cm-image create all --arch <arch> --source /root/UBUNTU2404.tar.gz --distro ubuntu2404 --air-gapped
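Image creation from a source file can run for hours, so it can be worth checking the --source file first. This is a minimal sketch. The helper check_source is hypothetical. It checks only that the file exists and, for a .tar.gz, that gzip can verify it. It cannot tell whether the file was built for the right architecture, so that remains the user's responsibility, as described in the note below.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a cm-image --source file. Prints "iso" or
# "tar.gz" on success. Only existence and gzip integrity are checked; the
# target architecture of the file is NOT verified.
check_source() {
  local src="$1"
  if [ ! -f "$src" ]; then
    echo "source not found: $src" >&2
    return 1
  fi
  case "$src" in
    *.iso)
      echo iso ;;
    *.tar.gz)
      if gzip -t "$src" 2>/dev/null; then
        echo tar.gz
      else
        echo "corrupt or not gzip-compressed: $src" >&2
        return 1
      fi ;;
    *)
      echo "unsupported source type: $src" >&2
      return 1 ;;
  esac
}

# Usage:
# check_source /root/UBUNTU2404.tar.gz && \
#   cm-image create all --arch <arch> --source /root/UBUNTU2404.tar.gz --distro ubuntu2404 --air-gapped
```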
Note
To create an aarch64/ARM image, use --arch aarch64. The base tar.gz or ISO must also be built for the aarch64/ARM architecture.
The --air-gapped option skips the connectivity check to the BCM repositories and assumes that all required packages are available locally.
Caution
If a package dependency/conflict failure occurs, use the -j option to exclude the problem package. For example, if the libglapi-mesa package is causing a conflict, use the following command:
cm-image create all --arch x86_64 --source /root/bcm-11.0-ubuntu2404.iso --distro ubuntu2404 --air-gapped -j libglapi-mesa
Method 3: Generate an image with cm-image
This method creates an image from scratch. It still requires an x86_64 or aarch64 base tar.gz or ISO that matches the architecture of the image to be generated. Whenever the image architecture differs from the head node architecture, cm-image uses QEMU emulation, so this process can take a long time. In the command below, <arch> is x86_64 or aarch64:
cm-image create all --bootstrap -d ubuntu2404 -a <arch>
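Because an emulated build can run for hours, it may help to start it so that it survives a dropped SSH session and keeps a log. The following is a minimal sketch using nohup. The helper start_logged is hypothetical. Its stdin is redirected from /dev/null, so it suits only runs where cm-image does not need to ask questions. If cm-image may prompt (for example, about components that already exist), run it interactively inside screen or tmux instead.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command in the background, ignoring SIGHUP,
# with stdout and stderr captured in a log file. stdin is /dev/null, so the
# command must not need interactive answers.
start_logged() {
  local log="$1"; shift
  nohup "$@" >"$log" 2>&1 </dev/null &
  echo "started PID $! (log: $log)"
}

# Usage on the head node; follow progress with: tail -f /root/cm-image-bootstrap-x86_64.log
# start_logged /root/cm-image-bootstrap-x86_64.log \
#   cm-image create all --bootstrap -d ubuntu2404 -a x86_64
```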
Created Directories After the Import is Completed
A working mixed-architecture setup should look similar to the following examples, taken from an ARM head node with an additional x86_64 image:
Example: Mixed architecture (ARM headnode) - default Images
cmsh -c "softwareimage;list"
+---------------------------------+--------------------------------------------+----------------------+-------+
| Name (key)                      | Path (key)                                 | Kernel version       | Nodes |
+---------------------------------+--------------------------------------------+----------------------+-------+
| default-image                   | /cm/images/default-image                   | 6.8.0-51-generic-64k | 0     |
| default-image-ubuntu2404-x86_64 | /cm/images/default-image-ubuntu2404-x86_64 | 6.8.0-51-generic     | 0     |
+---------------------------------+--------------------------------------------+----------------------+-------+
Example: Mixed architecture (ARM headnode) - default Categories
cmsh -c "category;list"
+-------------------------------+------------------------------------+-------+
| Name (key) | Software Image | Nodes |
+-------------------------------+------------------------------------+-------+
| default-ubuntu2404-aarch64 | default-image | 0 |
| default-ubuntu2404-x86_64 | default-image-ubuntu2404-x86_64 | 0 |
| dgx | dgx-image | 0 |
+-------------------------------+------------------------------------+-------+
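To confirm these entries from a script rather than by eye, the first column of a cmsh listing can be matched exactly. This is a minimal sketch. The helper list_has_key is hypothetical. It accepts both the bordered tables shown in these examples and plain whitespace-separated columns, because this section does not guarantee the exact cmsh output format.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if a cmsh "list" listing read from stdin has
# a row whose first column equals KEY exactly (no partial matches).
list_has_key() {
  awk -v key="$1" '
    {
      line = $0
      sub(/^[ \t|]+/, "", line)      # drop a leading table border/padding
      split(line, f, /[ \t|]+/)      # f[1] is the first column value
      if (f[1] == key) found = 1
    }
    END { exit !found }'
}

# Usage on the head node:
# cmsh -c "softwareimage;list" | list_has_key default-image-ubuntu2404-x86_64
# cmsh -c "category;list" | list_has_key default-ubuntu2404-x86_64
```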
Example: Mixed architecture (ARM headnode) - /cm/shared
cmsh -c "device use master;fsmounts;list"
+-------------------------------------------+-------------------------------+------------+
| Device                                    | Mountpoint (key)              | Filesystem |
+-------------------------------------------+-------------------------------+------------+
| 7.241.16.39:/home                         | /home                         | nfs        |
| 7.241.16.39:/cm_shared/ubuntu2404-x86_64  | /cm/shared-ubuntu2404-x86_64  | nfs        |
| 7.241.16.39:/cm_shared/ubuntu2404-aarch64 | /cm/shared-ubuntu2404-aarch64 | nfs        |
+-------------------------------------------+-------------------------------+------------+
Note
This mixed architecture configuration may not appear in fsmounts until after HA setup is complete.
Example: Mixed architecture (ARM headnode) - fsexports
The fsexports list shows that both the x86_64 and aarch64 node-installers are exported, as well as both /cm/shared directories.
cmsh -c "device use master;fsexports;list"
+-----------------------------------------------+-----------------------------------------------+-----------+-------+-------+----------+
| Name (key) | Path | Network | Hosts | Write | Disabled |
+-----------------------------------------------+-----------------------------------------------+-----------+-------+-------+----------+
| /var/spool/burn@internalnet | /var/spool/burn | internalnet| yes | no | |
| /home@internalnet | /home | internalnet| yes | no | |
| /cm/shared-ubuntu2404-x86_64@internalnet | /cm/shared-ubuntu2404-x86_64 | internalnet| yes | no | |
| /cm/shared-ubuntu2404-aarch64@internalnet | /cm/shared-ubuntu2404-aarch64 | internalnet| yes | no | |
| /cm/node-installer-ubuntu2404-x86_64@internalnet | /cm/node-installer-ubuntu2404-x86_64 | internalnet| no | no | |
| /cm/node-installer-ubuntu2404-x86_64/certificat+ | /cm/node-installer-ubuntu2404-x86_64/certificat+ | internalnet| yes | no | |
| /cm/node-installer-ubuntu2404-aarch64@internaln+ | /cm/node-installer-ubuntu2404-aarch64 | internalnet| no | no | |
| /cm/node-installer-ubuntu2404-aarch64/certifica+ | /cm/node-installer-ubuntu2404-aarch64/certifica+ | internalnet| yes | no | |
+-----------------------------------------------+-----------------------------------------------+-----------+-------+-------+----------+
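The same check can be scripted. This is a minimal sketch, assuming the export paths follow the naming seen in the table above. The helper check_exports is hypothetical. It reads the fsexports listing on stdin and reports any architecture whose /cm/shared or node-installer export is missing. It matches on the Path column, which is shown untruncated in the example even where the Name column is cut off with "+".

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that an fsexports listing (stdin) contains the
# /cm/shared and node-installer exports for every given architecture.
# Usage: check_exports DISTRO ARCH...
check_exports() {
  local distro="$1"; shift
  local listing arch path rc=0
  listing="$(cat)"
  for arch in "$@"; do
    for path in "/cm/shared-$distro-$arch" "/cm/node-installer-$distro-$arch"; do
      # Surrounding spaces make the match exact: the ".../certificat+" rows
      # and the "...@internalnet" names do not count as the base export.
      if ! printf '%s\n' "$listing" | grep -Fq -- " $path "; then
        echo "missing export: $path" >&2; rc=1
      fi
    done
  done
  return "$rc"
}

# Usage on the head node:
# cmsh -c "device use master;fsexports;list" | check_exports ubuntu2404 x86_64 aarch64
```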
Note
There will be an fsexports entry for each network that is used for node provisioning.