Installing NVIDIA Mission Control Components in a Connected Environment
Install each NVIDIA Mission Control component by following the steps for that component. All packages and container images are pulled directly from their upstream sources (NGC, public Helm repositories) during installation, so no local registry or bundle is required.
The Autonomous Recovery Engine (ARE) consists of two independently installed components: Autonomous Hardware Recovery (AHR) and Autonomous Job Recovery (AJR). Both must be installed to enable full ARE functionality.
Note
Run:ai installs on the k8s-user cluster (k8s-system-user nodes), not the k8s-admin cluster. Ensure that both the k8s-system-user node category and the GPU worker node category are created in BCM before running the Run:ai installation wizard.
- NetQ
- Observability Stack
  - Configure storage and retention for Prometheus
  - Configure storage and retention for Loki
  - Enable Prometheus scrape endpoint in BCM
  - Configure Prometheus to scrape metrics from BCM endpoint
  - Configure Prometheus to scrape metrics from DCGM endpoints
  - Grafana and querying BCM metrics
  - Install NMC Grafana dashboards
  - Reduce BCM monitoring directory size
- Autonomous Hardware Recovery (AHR)
  - Deployment Diagram
  - Prerequisites
  - NVIDIA Mission Control autonomous hardware recovery Installation via BCM TUI Wizard
  - Initial Login to the NVIDIA autonomous hardware recovery UI
  - Backend Health and Agent Connectivity
  - NVIDIA Mission Control autonomous hardware recovery Uninstallation via BCM TUI Wizard
  - AHR Appendix
- Autonomous Job Recovery (AJR)
  - Introduction
  - Prerequisites
  - Before the Autonomous Job Recovery installation
  - Create certificates for AJR endpoints
  - Setup DNS resolution for AJR endpoints
  - Autonomous Job Recovery Installation
  - Autonomous Job Recovery post-installation steps
  - Autonomous Job Recovery Verification Steps
  - Autonomous Job Recovery uninstallation
  - Third-Party Open Source Software Licenses
- Power Reservation Steering
- Domain Power Service (DPS)
- NVIDIA Run:ai Installation
- NIM Operator
- NVIDIA Dynamo
- NVIDIA Mission Control Launchpad
- Security Hardening