Runbooks

Overview

Runbooks are guided, end-to-end procedures for exercising specific DPS configurations against real hardware. Unlike the Tasks section, which documents individual operations, each runbook walks through a complete pilot scenario from baseline measurement through cleanup.

The "Inference power pilot: NVL72 fleet management" runbook covers inference on GB200/GB300 NVL72 with automatic fleet power management that is decoupled from workload schedulers and provisioning systems. It first establishes a 3-rack baseline without DPS in the control path, then runs the same workload across 5 racks under DPS while keeping the operating envelope fixed. The runbook defines MaxLPS as background terminology and follows a scheduler-independent path: topology plus DPM, PRS, and Shared GPU in one or more resource groups whose lifecycle is not driven by scheduler or provisioning integrations. The resource group API remains available when you do want that coupling.