NVIDIA Mission Control Software Administration Guide

Installation

For installation instructions, see the Power Reservation Steering Installation Guide.

Copyright © 2024-2026, NVIDIA Corporation.

Last updated on Mar 04, 2026.