Introduction#
This document details the following:
Installation and configuration of all software components required to enable full NVIDIA Mission Control functionality
Software dependencies for each feature
Installation and deployment of the features themselves
Verification and testing procedures to ensure proper feature functionality
For control plane hardware information and requirements, see https://apps.nvidia.com/PID/ContentLibraries/Detail?id=1137731&srch=nmc%20hardware
An overview of Mission Control is shown in Figure 1.

Figure 1. Mission Control Software Architecture#
Assumptions and Prerequisites#
Before installing the Mission Control software, confirm that the following conditions are met in BCM 11, as outlined in the NVIDIA Mission Control Management Plane and Rack Setup with NVIDIA GB200 NVL72 Systems Installation Guide:
All networks are defined and all switches (in-band and out-of-band) are “Up”
All control plane nodes are configured and “Up”:
K8s-CTRL-nodes
slogin
NMX-M
HA setup is configured and failover is verified
GB200 NVL72 rack(s) setup is completed:
All NVLink switch chips are online with NMX-C/T enabled
NVLink switch leader is assigned
Each GB200 NVL72 rack contains 9 NVLink switch chips
All 18 compute trays per rack are provisioned and in an “Up” state
Power control is established at rack, compute tray, and NVLink switch tray levels
NFS setup is complete and available
A valid BCM license with Mission Control enabled is installed
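Many of these conditions can be spot-checked from the active head node before proceeding. The sketch below is a minimal example only, assuming cmsh is on the PATH, the default device status output format is in use, and that /nfs/shared stands in for your actual NFS export; adapt the placeholder values to your site.

#!/usr/bin/env python3
"""Minimal pre-installation sanity checks, intended for the active BCM head node.

Assumptions (placeholders to adapt): cmsh is on the PATH, the default
"device status" output format is in use, and /nfs/shared is the NFS export
to verify.
"""
import subprocess


def run(cmd):
    """Run a shell command and return its stdout; raise if it fails."""
    return subprocess.run(cmd, shell=True, check=True,
                          capture_output=True, text=True).stdout


def devices_not_up():
    """Return cmsh device status lines whose state bracket is not UP."""
    out = run('cmsh -c "device; status"')
    return [line for line in out.splitlines() if "[" in line and "UP" not in line]


def license_info():
    """Return the BCM license summary (confirm it shows Mission Control enabled)."""
    return run('cmsh -c "main; licenseinfo"')


def nfs_mounted(export_path="/nfs/shared"):
    """Check whether the (placeholder) NFS export path appears in /proc/mounts."""
    with open("/proc/mounts") as mounts:
        return export_path in mounts.read()


if __name__ == "__main__":
    print("Devices not UP:", devices_not_up() or "none")
    print(license_info())
    print("NFS mounted:", nfs_mounted())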
Mission Control Components#
The NVIDIA Mission Control 2.0 control plane provides a modular, scalable, and secure architecture for the NVIDIA GB200 NVL72 platform. Its composable design gives customers access to all system capabilities and resources through a centralized administrative interface, streamlining operations across distributed infrastructure components without the need to manage multiple administrative domains.
Admin Control Plane
Head Nodes - x86 (Cluster deployment, management, and monitoring):
Base Command Manager (BCM) providing:
GUI, CLI, and API interfaces
OS provisioning
Observability*
Network provisioning
Rack and inventory management
Power profiles*
Leak monitoring and CHP
Slurm workflow software
Admin Service Nodes - x86 (K8s and BCM-integrated services):
BCM-Integrated Services:
NMX Manager*
Observability stack*
Autonomous Hardware Recovery (AHR)*
Autonomous Job Recovery (AJR)*
Power Reservation Steering (PRS) Service*
Common K8s Services, including Loki, Prometheus, operators, and other components
BCM-Provisioned K8s
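One way to confirm that the BCM-provisioned admin Kubernetes cluster and its common services are reachable is to query pod status with kubectl. The sketch below is illustrative only: the namespace names are assumptions and will vary with how the observability stack was deployed.

#!/usr/bin/env python3
"""Spot-check common admin K8s services (for example Loki and Prometheus).

Assumption: kubectl on this host is configured against the BCM-provisioned
admin Kubernetes cluster. The namespace names below are illustrative only;
substitute the namespaces your deployment actually uses.
"""
import subprocess


def running_pods(namespace):
    """Count pods reported as Running in a namespace (0 if the query fails)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "--no-headers"],
        capture_output=True, text=True)
    if result.returncode != 0:
        return 0
    return sum(1 for line in result.stdout.splitlines() if "Running" in line)


for namespace in ("monitoring", "loki"):  # placeholder namespaces
    print(f"{namespace}: {running_pods(namespace)} pod(s) Running")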
User Control Plane
Slurm Nodes - Arm64 (User access to Slurm cluster):
BCM-provisioned Slurm submission software
User Service Nodes - x86 (K8s and BCM-integrated services):
Run:AI components:
Control plane
Scheduler
Common K8s Services, including GPU Operator, DRA, and Network Operator
BCM-Provisioned K8s
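From a user-facing slogin node, a quick way to confirm the Slurm submission path is to list partitions and run a trivial job. The sketch below assumes the Slurm client tools are installed and the user may submit jobs; the partition name is a placeholder.

#!/usr/bin/env python3
"""Smoke-test Slurm submission from an slogin node.

Assumptions: the Slurm client tools (sinfo, srun) are installed and this
user may submit jobs. The partition name "defq" is a placeholder; use the
partition configured for your cluster.
"""
import subprocess


def sh(cmd):
    """Run a command and return its stdout; raise if it fails."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


# Summarize partitions and node states as seen from this login node.
print(sh(["sinfo", "--summarize"]))

# Run a trivial single-task job to confirm the end-to-end submission path.
print(sh(["srun", "--partition=defq", "--ntasks=1", "hostname"]))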
Compute Plane
GB200 Rack (8 racks per SU):
Execution hosts with compute trays containing CPU, GPU, and memory
18 compute trays per rack with integrated hardware components
Additional Systems
The control plane integrates with additional infrastructure components:
BMS (Bare Metal Service) - Customer-provided API-compliant BMS*
NVLink Switches and InfiniBand Switches & UFM
Ethernet Switches
NFS Storage and HFS Storage
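Power control at the rack, compute tray, and NVLink switch tray levels is typically exposed through BMC Redfish endpoints, and a customer-provided BMS would integrate at a similar API level. The sketch below is a hedged example of reading a tray's power state over standard Redfish; the BMC address, credentials, and system ID are placeholders, and your BMS may expose a different interface.

#!/usr/bin/env python3
"""Read one compute tray's power state over the standard Redfish API.

Assumptions: the tray BMC is reachable at BMC_HOST with the given
credentials, and the system resource is listed under /redfish/v1/Systems.
Every value below is a placeholder; a customer-provided BMS may expose a
different (but API-compliant) interface.
"""
import requests

BMC_HOST = "10.0.0.100"        # placeholder BMC address
AUTH = ("admin", "password")   # placeholder credentials


def power_state(system_id):
    """Return the Redfish PowerState (for example 'On' or 'Off') of one system."""
    url = f"https://{BMC_HOST}/redfish/v1/Systems/{system_id}"
    response = requests.get(url, auth=AUTH, verify=False, timeout=10)
    response.raise_for_status()
    return response.json().get("PowerState")


if __name__ == "__main__":
    # "Self" is only a placeholder member name; list /redfish/v1/Systems to confirm yours.
    print("PowerState:", power_state("Self"))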
Key Features
Supports both training and inference workloads
Provides centralized management for configuration and observability
Delivers a standardized, scalable, secure control plane for all NVIDIA GB200 NVL72 systems
Note: Components marked with an asterisk (*) are new in Mission Control 2.0.0.