Introduction

This guide helps a deployment engineer complete the software installation of Base Command Manager (BCM) 11 on the head node, set up the control plane nodes, and prepare the BCM software to power on, provision, and manage the NVIDIA GB200 NVL72 rack(s) in the cluster. It also shows the deployment engineer how to prepare the cluster for NVIDIA Mission Control software installation. The guide is designed to be followed in order, with each section building on the previous one unless indicated otherwise. It assumes that the reader has a basic understanding of Linux and networking concepts, as well as familiarity with NVIDIA hardware and software.

The major steps that will be covered in this guide are:

  1. BCM 11 software installation on a server designated as the head node.

    • Installing the BCM 11 software on the head node.

    • Configuring the head node for mixed architecture provisioning support.

    • Setting up the head node for network connectivity.

  2. Configuring networking for the entire cluster within BCM 11, either manually (OEM deployments) or with the bcm-netautogen tool (DGX SuperPOD deployments only).

  3. Creating categories and their respective software images for node provisioning and NVIDIA Mission Control Software installation, including:

    • Slurm login.

    • Kubernetes (K8s) Administrator.

    • Kubernetes (K8s) User space.

    • DGX GB200 compute trays.

  4. Individual control plane node hardware setup:

    • Slurm control plane nodes.

    • K8s Administrator control plane nodes.

    • K8s User space control plane nodes.

  5. GB200 rack setup, either through the GB200 rack import process or through a manual setup in which the major rack devices are added to BCM:

    • GB200 compute trays.

    • NVLink Switch devices.

    • Power shelves.

  6. High availability (HA) setup.

  7. Adding shared storage (NFS) to BCM.
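
As a quick sanity check at various points in the guide, the objects created in these steps can be listed with the cmsh shell that ships with BCM. The commands below are a minimal sketch using standard cmsh modes; the objects they list will depend on how far through the guide the cluster is:

    # Spot checks from the head node; output will differ per deployment.
    cmsh -c "network; list"        # networks configured in step 2
    cmsh -c "category; list"       # categories created in step 3
    cmsh -c "softwareimage; list"  # software images created in step 3
    cmsh -c "device; list"         # head node, control plane nodes, and GB200 rack devices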

Note

NVIDIA DGX SuperPOD supports workload orchestration with Slurm or with Kubernetes (K8s) through RunAI, but not both concurrently.