SDK Simulator Topology
Overview
The DPS SDK provides a complete simulated datacenter environment designed to serve as an evaluation playground for the Domain Power Service (DPS). This topology, combined with the BMC simulator and simulator playbooks, enables developers and administrators to test power management strategies, resource group operations, and workload optimization without requiring physical hardware infrastructure.
The simulator topology represents a small-scale datacenter deployment with realistic power distribution hierarchies, allowing users to:
- Test and validate power policies before deploying to production
- Experiment with resource group creation and management
- Understand power distribution constraints and optimization strategies
- Develop integration code against a stable, reproducible environment
- Perform load testing and capacity planning exercises
Topology Structure
The SDK simulator topology defines a complete power distribution network for a modern AI datacenter featuring 144 DGX GB300 compute nodes organized across 8 racks.
Power Distribution Hierarchy
The topology implements a four-tier power distribution architecture that mirrors real-world datacenter deployments:
Utility (750 kW)
└── Switchboard (1.6 MW)
    ├── FloorPDU-1 (450 kW) → Racks 1, 5
    ├── FloorPDU-2 (450 kW) → Racks 2, 6
    ├── FloorPDU-3 (450 kW) → Racks 3, 7
    └── FloorPDU-4 (450 kW) → Racks 4, 8
Tier 1: Power Domains
Utility Power Domain
- Capacity: 750,000 W (750 kW)
- Purpose: Represents the building utility connection and primary power feed
- Design Note: This capacity is intentionally restrictive so that the topology can demonstrate power-constrained scenarios. Under maximum load (all 144 nodes at the GB300-High policy), the aggregate node cap of 144 × 5,600 W = 806.4 kW exceeds the 750 kW utility capacity, which enables testing of power allocation and optimization strategies.
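The design note can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the SDK: the constant and function names are hypothetical, and it counts only the node-level caps from the GB300-High policy, ignoring PSU efficiency losses and any non-compute load.

```python
# Values taken from the topology description; names are illustrative only.
UTILITY_CAPACITY_W = 750_000
NODE_COUNT = 144
GB300_HIGH_NODE_CAP_W = 5_600


def utility_shortfall(node_count: int, node_cap_w: int, capacity_w: int) -> int:
    """Return the watts by which aggregate node caps exceed capacity (0 if they fit)."""
    return max(0, node_count * node_cap_w - capacity_w)


total_demand_w = NODE_COUNT * GB300_HIGH_NODE_CAP_W
shortfall_w = utility_shortfall(NODE_COUNT, GB300_HIGH_NODE_CAP_W, UTILITY_CAPACITY_W)
# Largest number of nodes that can run at the GB300-High cap within the utility budget.
max_nodes_at_high = UTILITY_CAPACITY_W // GB300_HIGH_NODE_CAP_W

print(f"demand={total_demand_w} W, shortfall={shortfall_w} W, "
      f"nodes that fit at High={max_nodes_at_high}")
```

Under these assumptions, at most 133 of the 144 nodes can hold the GB300-High cap at once, so the remaining nodes would need a lower policy.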
Switchboard Power Domain
- Capacity: 1,600,000 W (1.6 MW)
- Purpose: Main electrical distribution point for the datacenter floor
Tier 2: Floor PDUs
Four Floor Power Distribution Units distribute power across the datacenter:
| FloorPDU | Capacity | Serves Racks | Total Nodes |
|---|---|---|---|
| FloorPDU-1 | 450 kW | R01, R05 | 36 |
| FloorPDU-2 | 450 kW | R02, R06 | 36 |
| FloorPDU-3 | 450 kW | R03, R07 | 36 |
| FloorPDU-4 | 450 kW | R04, R08 | 36 |
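The table follows a regular pattern: FloorPDU-n serves rack n and rack n+4. As a sketch of how integration code might derive the mapping rather than hard-code it, the snippet below rebuilds the table. The helper name and dictionary layout are assumptions for illustration and are not taken from topology.json.

```python
NODES_PER_RACK = 18
PDU_COUNT = 4
RACK_COUNT = 8


def racks_for_pdu(pdu: int) -> list[int]:
    """Return rack numbers served by FloorPDU-<pdu> (rack n and rack n+4 in the table)."""
    return [rack for rack in range(1, RACK_COUNT + 1)
            if (rack - 1) % PDU_COUNT == pdu - 1]


# Map each Floor PDU name to the rack labels used in the table (R01..R08).
pdu_map = {
    f"FloorPDU-{pdu}": [f"R{rack:02d}" for rack in racks_for_pdu(pdu)]
    for pdu in range(1, PDU_COUNT + 1)
}
nodes_per_pdu = {name: len(racks) * NODES_PER_RACK for name, racks in pdu_map.items()}

for name, racks in pdu_map.items():
    print(name, racks, nodes_per_pdu[name])
```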
Tier 3: Rack Power Supplies (PSUs)
Each of the 8 racks contains 8 redundant Power Supply Units:
Per-Rack Configuration:
- PSU Count: 8 power shelves per rack
- PSU Model: PowerSupply95_33000W (defined in custom-devices.yaml)
- PSU Capacity: 33,000 W (33 kW) each
- Efficiency: 95% as modeled (actual hardware peaks at 97.5%)
- Total Rack Capacity: 264 kW (8 × 33 kW, counting all redundant shelves)
Architecture Note: Each PSU in the topology represents a simplified view of the GB300 NVL72 power architecture. In actual GB300 NVL72 deployments, each “PSU” is implemented as a 1U power shelf containing 6 integrated power supply units (each 5.5 kW) in a 3+3 redundant configuration, delivering up to 33 kW per shelf. This topology abstraction treats each power shelf as a single logical PSU for simplicity while maintaining accurate power capacity modeling.
Custom Device Model: PowerSupply95_33000W
This custom power supply model is defined in custom-devices.yaml. The SDK simulator playbooks load this custom device definition before importing the topology, making it available for use in topology.json.
- type: PowerSupply
  description: GB300 Power Shelf at 33000W 95 Pct Eff
  model: PowerSupply95_33000W
  spec:
    maxLoadWatts: 33000
    efficiencyFactor: 0.95
Specifications:
- Maximum Load: 33,000 W (33 kW)
- Efficiency: 95% (0.95 factor)
- Input Power Calculation: Output power ÷ 0.95
- Example: Delivering 33 kW requires ~34.7 kW input (33,000 ÷ 0.95)
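The input power calculation above can be expressed as a small helper. This is a sketch of the stated formula only (output power ÷ efficiency factor); the function name is hypothetical and it does not reflect how the simulator computes losses internally.

```python
def input_power_w(output_w: float, efficiency: float = 0.95) -> float:
    """Input power needed to deliver output_w at the given efficiency factor."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    return output_w / efficiency


# Full-load case from the specification list: 33,000 W delivered at 95% efficiency.
full_load_input_w = input_power_w(33_000)
print(f"{full_load_input_w:.0f} W input for 33000 W output")
```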
Naming Convention: PSU-R{rack}-{number}
- Example: PSU-R01-1 through PSU-R01-8 for Rack 1
Power Distribution Model:
- All 8 power shelves in a rack power all 18 compute nodes in that rack
- Each node can draw power from any of the 8 power shelves in its rack
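Because every shelf can feed every node in the rack, rack-level headroom is a comparison of pooled shelf capacity against pooled node demand. The sketch below works through that comparison for the GB300-High node cap (5,600 W). The even split across shelves is an assumption made for illustration; the document does not state how load is balanced.

```python
import math

PSU_CAPACITY_W = 33_000
PSUS_PER_RACK = 8
NODES_PER_RACK = 18
NODE_CAP_W = 5_600  # GB300-High node-level cap

rack_capacity_w = PSU_CAPACITY_W * PSUS_PER_RACK
rack_demand_w = NODES_PER_RACK * NODE_CAP_W

# Fewest shelves whose combined capacity covers the rack at the High cap.
shelves_needed = math.ceil(rack_demand_w / PSU_CAPACITY_W)

# Per-shelf load if demand were shared evenly (an assumption, not documented behavior).
per_shelf_load_w = rack_demand_w / PSUS_PER_RACK

print(rack_capacity_w, rack_demand_w, shelves_needed, per_shelf_load_w)
```

With these numbers a rack is far from its own limit, so the binding constraint in this topology is upstream at the utility domain rather than at the rack.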
Tier 4: Compute Systems
Total Compute Nodes: 144 DGX GB300 systems
Per-Rack Distribution:
- Nodes per Rack: 18
Naming Convention: gb300-r{rack}-{node}
- Rack 1: gb300-r01-0001 through gb300-r01-0018
- Rack 2: gb300-r02-0001 through gb300-r02-0018
- …continuing through…
- Rack 8: gb300-r08-0001 through gb300-r08-0018
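Both naming conventions are regular enough to generate programmatically, which can be handy when building resource group membership lists in test code. The helpers below are a hypothetical sketch that simply reproduces the patterns shown in this document (two-digit rack, four-digit node, unpadded PSU number); they are not part of the SDK.

```python
def node_names(rack: int, nodes_per_rack: int = 18) -> list[str]:
    """Node names for one rack, following gb300-r{rack}-{node}."""
    return [f"gb300-r{rack:02d}-{node:04d}" for node in range(1, nodes_per_rack + 1)]


def psu_names(rack: int, psus_per_rack: int = 8) -> list[str]:
    """Power shelf names for one rack, following PSU-R{rack}-{number}."""
    return [f"PSU-R{rack:02d}-{number}" for number in range(1, psus_per_rack + 1)]


# All nodes in the topology, rack 1 through rack 8.
all_nodes = [name for rack in range(1, 9) for name in node_names(rack)]

print(all_nodes[0], all_nodes[-1], len(all_nodes))
print(psu_names(1))
```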
Power Policies
The simulator includes three pre-configured power policies that define power limits at different granularities:
GB300-High (Maximum Performance)
Optimized for maximum computational performance:
| Element | Power Limit | Purpose |
|---|---|---|
| Node | 5,600 W | Total node-level power cap |
| GPU | 5,600 W | Combined GPU power allocation (4 × 1,400 W) |
| CPU | 600 W | CPU subsystem power cap (2 × 300 W) |
Use Case: Training runs, inference workloads requiring maximum throughput
GB300-Med (Balanced)
Balanced performance and power consumption:
| Element | Power Limit | Purpose |
|---|---|---|
| Node | 3,200 W | Total node-level power cap |
| GPU | 3,200 W | Combined GPU power allocation (4 × 800 W) |
| CPU | 400 W | CPU subsystem power cap (2 × 200 W) |
Use Case: Standard workloads, development tasks, cost-optimized operations
GB300-Low (Power Saving)
Optimized for minimum power consumption:
| Element | Power Limit | Purpose |
|---|---|---|
| Node | 1,600 W | Total node-level power cap |
| GPU | 1,600 W | Combined GPU power allocation (4 × 400 W) |
| CPU | 200 W | CPU subsystem power cap (2 × 100 W) |
Use Case: Idle periods, power-constrained scenarios, testing minimum viable configurations
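To see where each policy runs into a limit, the node-level caps can be compared against every tier of the hierarchy. This is a rough sketch under stated assumptions: the data structures and function name are hypothetical, every node is assumed to run the same policy, and only node caps are counted (no PSU efficiency losses or other loads).

```python
# Node-level caps from the three policy tables.
POLICY_NODE_CAP_W = {"GB300-High": 5_600, "GB300-Med": 3_200, "GB300-Low": 1_600}

# (tier name, capacity in watts, nodes fed by one instance of that tier)
TIERS = [
    ("Utility", 750_000, 144),
    ("Switchboard", 1_600_000, 144),
    ("FloorPDU", 450_000, 36),
    ("Rack PSUs", 264_000, 18),
]


def tier_report(policy: str) -> dict[str, bool]:
    """Map each tier to True if all of its nodes fit at the policy's node cap."""
    cap_w = POLICY_NODE_CAP_W[policy]
    return {name: nodes * cap_w <= capacity_w for name, capacity_w, nodes in TIERS}


for policy in POLICY_NODE_CAP_W:
    print(policy, tier_report(policy))
```

Under these assumptions only GB300-High applied to all nodes overruns a tier, and only at the utility domain, which matches the design intent of the 750 kW limit.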
Using the Topology
The topology is automatically imported during SDK initialization:
# Initialize SDK simulator with topology
task simulator:init
# For development environments
task simulator:init:dev
This command:
- Deploys the DPS server and BMC simulator
- Creates custom device models
- Imports the topology definition
- Registers all power policies
- Activates the topology
- Makes all 144 nodes available for resource group assignment
File Locations
The SDK includes the following key topology files:
- Topology Definition: sim/topology.json
- Custom Devices: sim/custom-devices.yaml
Summary
The DPS SDK simulator topology provides a complete, realistic datacenter power distribution model for evaluation and development. With 144 DGX GB300 nodes across 8 racks, three power policies, and a full four-tier power hierarchy, it enables thorough testing of DPS functionality without physical hardware.
This topology, combined with the BMC simulator and simulator playbooks, allows developers, administrators, and partners to:
- Understand DPS capabilities through hands-on experimentation
- Develop integrations with workload schedulers, grid controllers, and optimization engines
- Test power management strategies before production deployment
- Perform capacity planning and power budget analysis
- Validate custom policies and optimization algorithms
Related Documentation
For setup instructions, usage examples, and playbook documentation, refer to the SDK README.