Mixed Architecture Setup

In general, the architecture installed on the head node determines the default OS image available to other connected servers (nodes). When nodes use a different architecture, an OS image compiled for that architecture must be created or imported. For example, if the head node is an x86 server, an ARM/aarch64 OS image must be created to support any ARM/aarch64 nodes. The cm-image tool creates the images for other OS/architecture combinations (software image, node-installer image, and cm-shared image).

Note

  • The head node architecture can be either x86 or aarch64/ARM. The process is the same in either case.

  • The cm-image tool uses QEMU to emulate the other architecture if necessary. This can take a long time (approximately 4 hours).

The cm-image tool manages three components — software image, node-installer, and cm-shared — each of which has two parts:

  • A filesystem directory (for example, /cm/images/default-image-ubuntu2404-<arch>)

  • A BCM entity in cmsh. All three components have entries under fspart mode in cmsh; the software image also appears under softwareimage mode.
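
The directory naming used throughout this page can be captured in a small shell helper. This is a minimal sketch: the function name component_dirs is hypothetical, and the paths only follow the /cm/images/default-image-<distro>-<arch>, /cm/node-installer-<distro>-<arch>, and /cm/shared-<distro>-<arch> pattern seen in the examples on this page. The head node's own default image may use a different name (for example, /cm/images/default-image).

```shell
# Hypothetical helper: print the three filesystem directories used for an
# additional architecture, following the naming seen on this page.
component_dirs() {
    distro=$1   # for example: ubuntu2404
    arch=$2     # x86_64 or aarch64
    printf '%s\n' \
        "/cm/images/default-image-${distro}-${arch}" \
        "/cm/node-installer-${distro}-${arch}" \
        "/cm/shared-${distro}-${arch}"
}

component_dirs ubuntu2404 aarch64
```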

Method 1: Import Pre-compiled Images (fastest)

  1. Download the pre-compiled .tar.gz files for /cm/node-installer, /cm/shared, and the default image for the architecture to be imported. Contact your NVIDIA enterprise support representative for more details.

    cd /tmp
    wget <node-installer url>/node-installer.tar.gz
    wget <cmshared url>/cmshared.tar.gz
    wget <default-image url>/default-image-ubuntu2404-<arch>.tar.gz
    
  2. Extract the software image:

    mkdir /cm/images/default-image-ubuntu2404-<arch>
    cd /cm/images/default-image-ubuntu2404-<arch>
    tar -xzvf /tmp/default-image-ubuntu2404-<arch>.tar.gz
    
  3. Extract the node-installer:

    mkdir /cm/node-installer-ubuntu2404-<arch>
    cd /cm/node-installer-ubuntu2404-<arch>
    tar -xzvf /tmp/node-installer.tar.gz
    
  4. Extract the /cm/shared component:

    mkdir /cm/shared-ubuntu2404-<arch>
    cd /cm/shared-ubuntu2404-<arch>
    tar -xzvf /tmp/cmshared.tar.gz
    
  5. Add all distribution artifacts (for example, ubuntu2404) to BCM:

    # Set --arch to the architecture being imported: x86_64 or aarch64
    cm-image --verbose create all --arch <arch> --distro ubuntu2404 --add-only
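
Before running step 5, it can be worth confirming that the extracted tree really contains binaries for the architecture passed to --arch. The sketch below is an illustration under stated assumptions and is not part of cm-image: the function name elf_arch is hypothetical, and it only reads the e_machine field of a little-endian ELF header (62 for x86-64, 183 for AArch64).

```shell
# Hypothetical helper: report the architecture of an ELF binary by reading
# its header. Bytes 0-3 are the ELF magic; bytes 18-19 hold e_machine
# (stored little-endian on both x86_64 and aarch64 Linux).
elf_arch() {
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo not-elf
        return 1
    fi
    machine=$(od -An -tx1 -j18 -N2 "$1" | tr -d ' \n')
    case "$machine" in
        3e00) echo x86_64 ;;    # EM_X86_64 = 62
        b700) echo aarch64 ;;   # EM_AARCH64 = 183
        *)    echo unknown ;;
    esac
}
```

For example, running elf_arch on /cm/images/default-image-ubuntu2404-<arch>/usr/bin/bash should print the same value that is passed to --arch. The location of bash inside the image is an assumption; any regular (non-symlink) binary in the tree would do.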
    

Troubleshooting

If Step 5 fails at the archOS stage and the components are intact, you can resume by running only the archOS step:

cm-image --verbose create all --arch <arch> --distro ubuntu2404 --add-archos

If --add-archos fails with an “arch/OS does not match” error, the error message identifies the failing component. That component must be recreated:

  1. Remove the component’s BCM entities and filesystem directory.

    In cmsh fspart mode, remove the failing component (there may be more than one):

    cmsh -c "fspart; remove <component>; commit"
    
    Example: identifying fspart components to remove
    [a06-u02-bcm-01->fspart]% ls | grep x86
    /cm/images/default-image-ubuntu2404-x86_64       image           default-image-ubuntu2404-x86_64
    /cm/images/default-image-ubuntu2404-x86_64/boot  boot            default-image-ubuntu2404-x86_64:boot
    /cm/node-installer-ubuntu2404-x86_64             node-installer
    /cm/shared-ubuntu2404-x86_64                    cm-shared
    [a06-u02-bcm-01->fspart]%
    

    If the failed component is the software image, also remove it in cmsh softwareimage mode:

    cmsh -c "softwareimage; remove -d <softwareimage>; commit"
    
    Example: identifying the software image to remove
    [a06-u02-bcm-01->softwareimage]% ls | grep x86
    default-image-ubuntu2404-x86_64     /cm/images/default-image-ubuntu2404-x86_64     6.8.0-51-generic        0
    [a06-u02-bcm-01->softwareimage]%
    

    Then remove the filesystem directory and verify it is gone:

    rm -rf <filesystem directory>
    
    Example: verifying removal
    root@a06-u02-bcm-01:~# ls /cm/images/default-image-ubuntu2404-x86_64
    ls: cannot access '/cm/images/default-image-ubuntu2404-x86_64': No such file or directory
    
    root@a06-u02-bcm-01:~# ls /cm/node-installer-ubuntu2404-x86_64
    ls: cannot access '/cm/node-installer-ubuntu2404-x86_64': No such file or directory
    
    root@a06-u02-bcm-01:~# ls /cm/shared-ubuntu2404-x86_64
    ls: cannot access '/cm/shared-ubuntu2404-x86_64': No such file or directory
    
  2. Recreate only the failed component by re-running cm-image create. If using cm-image create all, answer n when prompted for any components that are already correct.

    # In each command, set --arch to x86_64 or aarch64.
    # software image
    cm-image create swimage --arch <arch> --distro ubuntu2404 --add-only
    
    # node-installer
    cm-image create node-installer --arch <arch> --distro ubuntu2404 --add-only
    
    # /cm/shared
    cm-image create cmshared --arch <arch> --distro ubuntu2404 --add-only
    
  3. Once the component is recreated, run --add-archos to complete the setup:

    cm-image --verbose create all --arch <arch> --distro ubuntu2404 --add-archos
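
The removal check in step 1 can be scripted so that no stale directory is missed before a component is recreated. This is a minimal sketch with a hypothetical function name, leftover_dirs; it only inspects the filesystem and does not touch the cmsh entities.

```shell
# Hypothetical helper: print every given path that still exists.
# An empty result means all of the directories were removed.
leftover_dirs() {
    for dir in "$@"; do
        if [ -e "$dir" ]; then
            printf '%s\n' "$dir"
        fi
    done
}

# Example with the x86_64 paths used in the troubleshooting steps.
leftover_dirs \
    /cm/images/default-image-ubuntu2404-x86_64 \
    /cm/node-installer-ubuntu2404-x86_64 \
    /cm/shared-ubuntu2404-x86_64
```

Only the directories of the components that were actually removed need to be passed in; components that are still correct should be left out of the check.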
    

Method 2: Create images from ISO or a vanilla tar.gz

  1. Download either a BCM installation ISO or a vanilla (base-distribution) .tar.gz for the other architecture. A vanilla .tar.gz can be downloaded from Base Distributions: navigate to the architecture of the desired OS and download the appropriate .tar.gz file.

  2. Run cm-image with the .iso or .tar.gz as the image source.

    • If using an .iso, the command is:

      cm-image create all --arch <arch> --source /root/bcm-11.0-ubuntu2404.iso --distro ubuntu2404 --air-gapped
      
    • If using a vanilla/base-distribution .tar.gz, the command is:

      cm-image create all --arch <arch> --source /root/UBUNTU2404.tar.gz --distro ubuntu2404 --air-gapped
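
The two commands above differ only in the value of --source, so the invocation can be assembled by a small wrapper. The following is a sketch under assumptions: the function name cm_image_from_source_cmd is hypothetical, the extension check is a convenience of this sketch rather than a documented cm-image requirement, and the function prints the command as a dry run instead of executing it. It uses only the options shown on this page.

```shell
# Hypothetical wrapper: validate the arguments, then print the cm-image
# command for this step instead of running it (a dry run).
cm_image_from_source_cmd() {
    arch=$1; source_file=$2; distro=$3
    case "$arch" in
        x86_64|aarch64) ;;
        *) echo "unsupported arch: $arch" >&2; return 1 ;;
    esac
    # Assumption of this sketch: sources are named *.iso or *.tar.gz.
    case "$source_file" in
        *.iso|*.tar.gz) ;;
        *) echo "source must be an .iso or .tar.gz: $source_file" >&2; return 1 ;;
    esac
    printf 'cm-image create all --arch %s --source %s --distro %s --air-gapped\n' \
        "$arch" "$source_file" "$distro"
}

cm_image_from_source_cmd x86_64 /root/UBUNTU2404.tar.gz ubuntu2404
```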
      

Note

  • To create an aarch64/ARM image, use --arch aarch64. The basetar/iso must itself be built for the aarch64/ARM architecture.

  • The --air-gapped option skips the connectivity check to the BCM repos, assuming all the packages are present locally.

Caution

  • If a package dependency/conflict failure occurs, use the -j option to exclude the problem package. For example, if the libglapi-mesa package is causing a conflict, use the following command:

    cm-image create all --arch x86_64 --source /root/bcm-11.0-ubuntu2404.iso --distro ubuntu2404 --air-gapped -j libglapi-mesa
    

Method 3: Generate an image with cm-image

This method creates an image from scratch. It still requires an x86 or aarch64 basetar/iso that matches the architecture of the image being generated. QEMU emulation is used whenever the image architecture differs from the head node architecture, so the process can take a long time.

cm-image create all --bootstrap -d ubuntu2404 -a <arch>
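
Whether a build will go through QEMU emulation, and so take much longer, follows directly from comparing the two architectures. A minimal sketch of that comparison, with the hypothetical function name needs_emulation; it assumes the head node reports its architecture through uname -m as x86_64 or aarch64.

```shell
# Hypothetical helper: decide whether building an image for image_arch on
# a head node of head_arch would rely on QEMU emulation.
needs_emulation() {
    head_arch=$1    # typically the output of: uname -m
    image_arch=$2   # x86_64 or aarch64
    if [ "$head_arch" = "$image_arch" ]; then
        echo no
    else
        echo yes
    fi
}

needs_emulation "$(uname -m)" aarch64
```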

Created Directories After the Import is Completed

This is what a working mixed architecture setup should look like:

Example: Mixed architecture (ARM headnode) - default Images

cmsh -c "softwareimage;list"
+---------------------------------+--------------------------------------------+----------------------+-------+
| Name (key)                      | Path (key)                                 | Kernel version       | Nodes |
+---------------------------------+--------------------------------------------+----------------------+-------+
| default-image                   | /cm/images/default-image                   | 6.8.0-51-generic-64k | 0     |
| default-image-ubuntu2404-x86_64 | /cm/images/default-image-ubuntu2404-x86_64 | 6.8.0-51-generic     | 0     |
+---------------------------------+--------------------------------------------+----------------------+-------+
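
To check a listing like this from a script rather than by eye, the first column can be pulled out of the table. This is a sketch only: the function name table_first_column is hypothetical, and it assumes the output keeps the bordered table layout shown above when it is piped.

```shell
# Hypothetical helper: read a bordered table on stdin and print the first
# column of each data row, skipping border lines and the header row.
table_first_column() {
    awk -F'|' '
        /^\|/ {
            name = $2
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim surrounding blanks
            if (name != "" && name !~ /\(key\)$/) print name
        }'
}
```

A possible use is cmsh -c "softwareimage;list" | table_first_column, followed by a grep for the expected image name of each architecture.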

Example: Mixed architecture (ARM headnode) - default Categories

cmsh -c "category;list"
+-------------------------------+------------------------------------+-------+
| Name (key)                    | Software Image                     | Nodes |
+-------------------------------+------------------------------------+-------+
| default-ubuntu2404-aarch64    | default-image                      |  0    |
| default-ubuntu2404-x86_64     | default-image-ubuntu2404-x86_64    |  0    |
| dgx                           | dgx-image                          |  0    |
+-------------------------------+------------------------------------+-------+

Example: Mixed architecture (ARM headnode) - /cm/shared

cmsh -c "device use master;fsmounts;list"
+-------------------------------------------+-------------------------------+------------+
| Device                                    | Mountpoint (key)              | Filesystem |
+-------------------------------------------+-------------------------------+------------+
| 7.241.16.39:/home                         | /home                         | nfs        |
| 7.241.16.39:/cm_shared/ubuntu2404-x86_64  | /cm/shared-ubuntu2404-x86_64  | nfs        |
| 7.241.16.39:/cm_shared/ubuntu2404-aarch64 | /cm/shared-ubuntu2404-aarch64 | nfs        |
+-------------------------------------------+-------------------------------+------------+

Note

This mixed architecture configuration may not appear in fsmounts until after HA setup is complete.

Example: Mixed architecture (ARM headnode) - fsexports

The fsexports list should show the node-installers for both x86 and aarch64, as well as both /cm/shared directories.

cmsh -c "device use master;fsexports;list"
+--------------------------------------------------+--------------------------------------------------+-------------+-------+-------+----------+
| Name (key)                                       | Path                                             | Network     | Hosts | Write | Disabled |
+--------------------------------------------------+--------------------------------------------------+-------------+-------+-------+----------+
| /var/spool/burn@internalnet                      | /var/spool/burn                                  | internalnet | yes   | no    |          |
| /home@internalnet                                | /home                                            | internalnet | yes   | no    |          |
| /cm/shared-ubuntu2404-x86_64@internalnet         | /cm/shared-ubuntu2404-x86_64                     | internalnet | yes   | no    |          |
| /cm/shared-ubuntu2404-aarch64@internalnet        | /cm/shared-ubuntu2404-aarch64                    | internalnet | yes   | no    |          |
| /cm/node-installer-ubuntu2404-x86_64@internalnet | /cm/node-installer-ubuntu2404-x86_64             | internalnet | no    | no    |          |
| /cm/node-installer-ubuntu2404-x86_64/certificat+ | /cm/node-installer-ubuntu2404-x86_64/certificat+ | internalnet | yes   | no    |          |
| /cm/node-installer-ubuntu2404-aarch64@internaln+ | /cm/node-installer-ubuntu2404-aarch64            | internalnet | no    | no    |          |
| /cm/node-installer-ubuntu2404-aarch64/certifica+ | /cm/node-installer-ubuntu2404-aarch64/certifica+ | internalnet | yes   | no    |          |
+--------------------------------------------------+--------------------------------------------------+-------------+-------+-------+----------+

Note

There will be an fsexports entry for each network that is used for node provisioning.
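
The presence of the expected exports can also be checked against saved output. The sketch below assumes the fsexports listing was first written to a file (for example, by redirecting the cmsh command above); the function name missing_exports is hypothetical, and the expected paths follow the /cm/node-installer-<distro>-<arch> and /cm/shared-<distro>-<arch> pattern seen in the table.

```shell
# Hypothetical helper: given a file holding saved fsexports output, print
# each expected node-installer and /cm/shared path that does not appear.
# Usage: missing_exports <listing-file> <distro> <arch>...
missing_exports() {
    listing=$1; distro=$2; shift 2
    for arch in "$@"; do
        for path in "/cm/node-installer-${distro}-${arch}" "/cm/shared-${distro}-${arch}"; do
            # Fixed-string match; report the path only when it is absent.
            grep -qF -- "$path" "$listing" || printf '%s\n' "$path"
        done
    done
}
```

An empty result for missing_exports <file> ubuntu2404 x86_64 aarch64 would indicate that all four base paths are listed. The match is a plain substring search, so it does not distinguish between networks.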