dpsctl

Overview

dpsctl is the command-line interface for DPS. It is the primary tool that administrators, operators, and users rely on to manage and monitor a DPS deployment from the command line.

Command Categories

dpsctl organizes functionality into logical command groups:

Infrastructure Management

  • topology - Import, activate, and manage datacenter topologies
  • import - Import entities from external sources (BCM, Nautobot)
  • device - Manage device specifications and hardware capabilities
  • policy - Create and manage power policies

Workload Management

  • resource-group - Create, activate, and manage workload power allocation
  • gpu-policy - Set per-GPU power limits for fine-grained control

Authentication & Access

  • login - Authenticate with the DPS server and manage user sessions
  • verify - Check DPS deployment status and connectivity

Monitoring & Diagnostics

  • check - Diagnostic operations, connectivity tests, and health monitoring
  • server-version - Get DPS server version and build information
  • task - Monitor ongoing async tasks (e.g., activation/deactivation operations)

Usage

dpsctl can be installed as a native binary or run directly from the published nvcr.io/nvidia/dpsctl container image. See Installing dpsctl for both options.

Basic dpsctl operations:

# Authenticate with DPS
dpsctl login --username alice

# Import datacenter topology
dpsctl topology import datacenter.json

# Create workload resource group
dpsctl resource-group create --resource-group "ml-job" --policy "Node-High"

# Check system health
dpsctl check connection
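
The four operations above can also be chained so that a failure in one step stops the rest. The following is a minimal sketch, not an official workflow: the function name dps_bootstrap and the DPSCTL override variable are hypothetical, and the sketch assumes that each dpsctl command returns a non-zero exit status on failure, which this page does not state. It uses only the commands and flags shown above.

```shell
#!/usr/bin/env bash
# Sketch: run the basic dpsctl operations in order and stop at the first failure.
# dps_bootstrap and DPSCTL are hypothetical names introduced for illustration.

# Usage: dps_bootstrap USERNAME TOPOLOGY_FILE RESOURCE_GROUP POLICY
dps_bootstrap() {
  if [ "$#" -ne 4 ]; then
    echo "usage: dps_bootstrap USERNAME TOPOLOGY_FILE RESOURCE_GROUP POLICY" >&2
    return 2
  fi

  local user="$1" topology="$2" group="$3" policy="$4"
  # DPSCTL (assumption) lets the binary be swapped, e.g. DPSCTL=echo for a dry run.
  local dpsctl="${DPSCTL:-dpsctl}"

  # Authenticate with DPS
  "$dpsctl" login --username "$user" || return 1

  # Import datacenter topology
  "$dpsctl" topology import "$topology" || return 1

  # Create workload resource group
  "$dpsctl" resource-group create --resource-group "$group" --policy "$policy" || return 1

  # Check system health
  "$dpsctl" check connection || return 1
}
```

Calling dps_bootstrap alice datacenter.json ml-job Node-High runs the same sequence as the individual commands above. Setting DPSCTL=echo first prints each command's arguments instead of executing it, which is a cheap way to review the sequence before running it against a real deployment.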

Further Reading