SDK Simulator Topology

Overview

The DPS SDK provides a complete simulated datacenter environment designed to serve as an evaluation playground for the Domain Power Service (DPS). This topology, combined with the BMC simulator and simulator playbooks, enables developers and administrators to test power management strategies, resource group operations, and workload optimization without requiring physical hardware infrastructure.

The simulator topology represents a small-scale datacenter deployment with realistic power distribution hierarchies, allowing users to:

  • Test and validate power policies before deploying to production
  • Experiment with resource group creation and management
  • Understand power distribution constraints and optimization strategies
  • Develop integration code against a stable, reproducible environment
  • Perform load testing and capacity planning exercises

Topology Structure

The SDK simulator topology defines a complete power distribution network for a modern AI datacenter featuring 144 DGX GB300 compute nodes organized across 8 racks.

Power Distribution Hierarchy

The topology implements a four-tier power distribution architecture that mirrors real-world datacenter deployments:

Utility (750 kW)
    └── Switchboard (1.6 MW)
            ├── FloorPDU-1 (450 kW) → Racks 1, 5
            ├── FloorPDU-2 (450 kW) → Racks 2, 6
            ├── FloorPDU-3 (450 kW) → Racks 3, 7
            └── FloorPDU-4 (450 kW) → Racks 4, 8

Tier 1: Power Domains

Utility Power Domain

  • Capacity: 750,000 W (750 kW)
  • Purpose: Represents the building utility connection and primary power feed
  • Design Note: This intentionally restrictive capacity is chosen to demonstrate power-constrained scenarios. Under maximum load (all 144 nodes at GB300-High policy), the total hardware demand exceeds the utility capacity, enabling testing of power allocation and optimization strategies.
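The constraint can be verified with quick arithmetic using the figures from this section (the 5,600 W per-node cap comes from the GB300-High policy described later):

```python
# Demonstrate the intentional power constraint at maximum load.
NODES = 144
GB300_HIGH_NODE_WATTS = 5_600      # per-node cap under the GB300-High policy
UTILITY_CAPACITY_WATTS = 750_000   # utility power domain capacity

total_demand = NODES * GB300_HIGH_NODE_WATTS
print(total_demand)                           # 806400 W, i.e. 806.4 kW
print(total_demand > UTILITY_CAPACITY_WATTS)  # True: demand exceeds the utility feed
```

At 806.4 kW of worst-case demand against a 750 kW feed, DPS must shed roughly 56.4 kW through policy or allocation decisions, which is exactly the scenario the topology is designed to exercise.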

Switchboard Power Domain

  • Capacity: 1,600,000 W (1.6 MW)
  • Purpose: Main electrical distribution point for the datacenter floor

Tier 2: Floor PDUs

Four Floor Power Distribution Units distribute power across the datacenter:

FloorPDU     Capacity   Serves Racks   Total Nodes
FloorPDU-1   450 kW     R01, R05       36
FloorPDU-2   450 kW     R02, R06       36
FloorPDU-3   450 kW     R03, R07       36
FloorPDU-4   450 kW     R04, R08       36
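A quick sanity check shows that each Floor PDU has ample headroom even at the most aggressive policy (assuming every node runs at the GB300-High node cap):

```python
# Per-FloorPDU demand at the GB300-High policy vs. its 450 kW rating.
nodes_per_pdu = 36          # two racks of 18 nodes each
node_watts = 5_600          # GB300-High node-level cap
pdu_capacity_watts = 450_000

demand = nodes_per_pdu * node_watts        # 201600 W
headroom = pdu_capacity_watts - demand     # 248400 W of margin per PDU
print(demand, headroom)
```

The constraint in this topology therefore sits at the utility tier, not at the Floor PDUs.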

Tier 3: Rack Power Supplies (PSUs)

Each of the 8 racks contains 8 redundant Power Supply Units:

Per-Rack Configuration:

  • PSU Count: 8 power shelves per rack
  • PSU Model: PowerSupply95_33000W (defined in custom-devices.yaml)
  • PSU Capacity: 33,000 W (33 kW) each
  • Efficiency: 95% (modeled), 97.5% peak in actual hardware
  • Total Rack Capacity: 264 kW (with redundancy)
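The per-rack figures above can be cross-checked against worst-case node demand (assuming all 18 nodes run at the GB300-High cap):

```python
# Rack-level PSU capacity vs. worst-case node demand.
psu_count = 8
psu_watts = 33_000
rack_capacity = psu_count * psu_watts   # 264000 W = 264 kW

nodes_per_rack = 18
node_watts = 5_600                      # GB300-High node cap
rack_demand = nodes_per_rack * node_watts   # 100800 W = 100.8 kW
print(rack_capacity, rack_demand)
```

Even at maximum performance, a rack draws well under half its PSU capacity, leaving generous margin for the redundant shelf configuration.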

Architecture Note: Each PSU in the topology represents a simplified view of the GB300 NVL72 power architecture. In actual GB300 NVL72 deployments, each “PSU” is implemented as a 1U power shelf containing 6 integrated power supply units (each 5.5 kW) in a 3+3 redundant configuration, delivering up to 33 kW per shelf. This topology abstraction treats each power shelf as a single logical PSU for simplicity while maintaining accurate power capacity modeling.
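The shelf capacity follows directly from its internal composition:

```python
# A GB300 NVL72 power shelf: six 5.5 kW supplies in a 3+3 redundant layout.
units_per_shelf = 6
unit_watts = 5_500
shelf_watts = units_per_shelf * unit_watts
print(shelf_watts)  # 33000 W, matching the PowerSupply95_33000W model
```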

Custom Device Model: PowerSupply95_33000W

This custom power supply model is defined in custom-devices.yaml. The SDK simulator playbooks load this custom device definition before importing the topology, making it available for use in topology.json.

- type: PowerSupply
  description: GB300 Power Shelf at 33000W 95 Pct Eff
  model: PowerSupply95_33000W
  spec:
    maxLoadWatts: 33000
    efficiencyFactor: 0.95

Specifications:

  • Maximum Load: 33,000 W (33 kW)
  • Efficiency: 95% (0.95 factor)
  • Input Power Calculation: Output power ÷ 0.95
  • Example: Delivering 33 kW requires ~34.7 kW input (33,000 ÷ 0.95)
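The input-power figure in the example above works out as follows:

```python
# Input power required by a PowerSupply95_33000W shelf at full output.
max_load_watts = 33_000
efficiency = 0.95

input_watts = max_load_watts / efficiency
print(round(input_watts))  # 34737 W, roughly 34.7 kW
```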

Naming Convention: PSU-R{rack}-{number}

  • Example: PSU-R01-1 through PSU-R01-8 for Rack 1

Power Distribution Model:

  • All 8 power shelves in a rack power all 18 compute nodes in that rack
  • Each node can draw power from any of the 8 power shelves in its rack

Tier 4: Compute Systems

Total Compute Nodes: 144 DGX GB300 systems

Per-Rack Distribution:

  • Nodes per Rack: 18

Naming Convention: gb300-r{rack}-{node}

  • Rack 1: gb300-r01-0001 through gb300-r01-0018
  • Rack 2: gb300-r02-0001 through gb300-r02-0018
  • …continuing through…
  • Rack 8: gb300-r08-0001 through gb300-r08-0018
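The full set of node names can be generated programmatically from the convention above:

```python
# Generate all node names for the topology, following gb300-r{rack}-{node}
# with zero-padded rack (2 digits) and node (4 digits) numbers.
names = [
    f"gb300-r{rack:02d}-{node:04d}"
    for rack in range(1, 9)    # racks 1..8
    for node in range(1, 19)   # 18 nodes per rack
]
print(names[0])    # gb300-r01-0001
print(names[-1])   # gb300-r08-0018
print(len(names))  # 144
```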

Power Policies

The simulator includes three pre-configured power policies that define power limits at different granularities:

GB300-High (Maximum Performance)

Optimized for maximum computational performance:

Element   Power Limit   Purpose
Node      5,600 W       Total node-level power cap
GPU       5,600 W       Combined GPU power allocation (4 × 1,400 W)
CPU       600 W         CPU subsystem power cap (2 × 300 W)

Use Case: Training runs, inference workloads requiring maximum throughput

GB300-Med (Balanced)

Balanced performance and power consumption:

Element   Power Limit   Purpose
Node      3,200 W       Total node-level power cap
GPU       3,200 W       Combined GPU power allocation (4 × 800 W)
CPU       400 W         CPU subsystem power cap (2 × 200 W)

Use Case: Standard workloads, development tasks, cost-optimized operations

GB300-Low (Power Saving)

Optimized for minimum power consumption:

Element   Power Limit   Purpose
Node      1,600 W       Total node-level power cap
GPU       1,600 W       Combined GPU power allocation (4 × 400 W)
CPU       200 W         CPU subsystem power cap (2 × 100 W)

Use Case: Idle periods, power-constrained scenarios, testing minimum viable configurations
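Comparing cluster-wide demand under each policy against the 750 kW utility feed shows why only GB300-High triggers the power-constrained scenario:

```python
# Cluster-wide demand under each policy vs. the 750 kW utility feed.
NODES = 144
UTILITY_WATTS = 750_000
policies = {"GB300-High": 5_600, "GB300-Med": 3_200, "GB300-Low": 1_600}

for name, node_watts in policies.items():
    demand = NODES * node_watts
    fits = demand <= UTILITY_WATTS
    print(f"{name}: {demand / 1000:.1f} kW (within utility feed: {fits})")
```

Only GB300-High (806.4 kW) exceeds the utility capacity; GB300-Med (460.8 kW) and GB300-Low (230.4 kW) both fit, so mixing policies across resource groups is one way to bring total demand under budget.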

Using the Topology

The topology is automatically imported during SDK initialization:

# Initialize SDK simulator with topology
task simulator:init

# For development environments
task simulator:init:dev

Either command:

  1. Deploys the DPS server and BMC simulator
  2. Creates custom device models
  3. Imports the topology definition
  4. Registers all power policies
  5. Activates the topology
  6. Makes all 144 nodes available for resource group assignment

File Locations

The SDK includes the following key topology files:

  • Topology Definition: sim/topology.json
  • Custom Devices: sim/custom-devices.yaml

Summary

The DPS SDK simulator topology provides a complete, realistic datacenter power distribution model that serves as an invaluable evaluation and development environment. With 144 DGX GB300 nodes across 8 racks, multiple power policies, and a full four-tier power hierarchy, it enables thorough testing of DPS functionality without physical hardware.

This topology, combined with the BMC simulator and simulator playbooks, allows developers, administrators, and partners to:

  • Understand DPS capabilities through hands-on experimentation
  • Develop integrations with workload schedulers, grid controllers, and optimization engines
  • Test power management strategies before production deployment
  • Perform capacity planning and power budget analysis
  • Validate custom policies and optimization algorithms

For setup instructions, usage examples, and playbook documentation, refer to the SDK README.