NVIDIA Mission Control Software with GB200 NVL72 Systems Administration Guide

Installation

For installation instructions, see the Power Reservation Steering Installation Guide.

Copyright © 2024-2025, NVIDIA Corporation.

Last updated on Sep 26, 2025.