SDK Simulator User Guide
The DPS SDK is a Kubernetes-based development environment for testing power management solutions without physical hardware. It ships with three deployment profiles so you can pick the right starting point for your goals.
Deployment Profiles
- Default - Blank DPS environment with no topology or simulators running. Recommended starting point for hands-on `dpsctl` exercises and API exploration.
- Hardware Emulation - 144 emulated DGX GB300 nodes across 8 racks with pseudorandom hardware responses. Best for general API development, integration validation, and automated playbooks.
- Workload-Aware - Same 144-node GB300 topology, but the BMC simulator calls a pluggable workload model over gRPC to return realistic GPU power traces. Best for power-aware analysis and workload optimization studies. See the Workload-Aware Simulation guide.
What’s Included
Emulated Infrastructure (Hardware Emulation and Workload-Aware)
- 144 DGX GB300 Compute Nodes - Organized across 8 racks (18 nodes per rack)
- Complete Power Distribution Hierarchy - Utility, Switchboard, Floor PDUs, Rack PSUs, Compute Systems
- BMC Simulator - Provides Redfish API endpoints for all emulated nodes
- Pre-configured Power Policies - GB300-High (5600 W), GB300-Med (3200 W), and GB300-Low (1600 W)
For detailed information about the simulator’s datacenter topology, see the Simulator Topology Guide.
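The rack and node counts above imply some quick arithmetic that can help when sizing experiments. The sketch below multiplies them out in shell. It assumes each policy value is a per-node power cap, which this guide does not state explicitly, so treat the totals as illustrative only. The `policy_budget` helper is a hypothetical name introduced for this example.

```shell
#!/bin/sh
# Topology figures taken from the list above.
RACKS=8
NODES_PER_RACK=18
TOTAL_NODES=$((RACKS * NODES_PER_RACK))

# policy_budget WATTS NODES
# Prints WATTS * NODES.
# ASSUMPTION: the policy wattage is a per-node cap (not confirmed by this guide).
policy_budget() {
    echo $(( $1 * $2 ))
}

echo "total nodes: $TOTAL_NODES"
for entry in GB300-High:5600 GB300-Med:3200 GB300-Low:1600; do
    name=${entry%%:*}     # text before the colon
    watts=${entry##*:}    # text after the colon
    echo "$name: rack=$(policy_budget "$watts" "$NODES_PER_RACK") W, all racks=$(policy_budget "$watts" "$TOTAL_NODES") W"
done
```

Under that assumption, a fully loaded rack at GB300-High would be budgeted at 18 × 5600 W = 100,800 W.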
Monitoring and Visualization
- Grafana Dashboards - Real-time metrics for datacenter power, resource groups, and system operations
- Prometheus - Time-series metrics collection and alerting
- Pyroscope - Continuous profiling for performance analysis
- Web UI - Interactive interface for managing DPS
Automated Simulation Playbooks
- Resource Group Simulation - Automated workload lifecycle testing with configurable parameters
- Grid Simulation - Domain-level power management and grid integration testing
- Load Shedding Simulation - Power reduction events and recovery scenarios
- Combined Simulations - Run multiple scenarios simultaneously for thorough testing
Use Cases
The simulator is ideal for:
- Learning DPS - Explore concepts, APIs, and workflows in a safe environment
- SDK Development - Build and test custom integrations without hardware dependencies
- Partner Integration - Develop grid or optimization integrations
- Quick Start - Jump-start your custom datacenter deployment
Quick Start
System Requirements:
- Linux (Ubuntu/Debian) or macOS
- Minimum 8 GB RAM, 20 GB free disk space
- Internet connection for dependencies
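The RAM and disk minimums above can be checked before installing anything. The following is a minimal sketch, not part of the SDK: the function names are hypothetical, and it only prints warnings rather than blocking setup.

```shell
#!/bin/sh
# Minimums from the System Requirements list above.
MIN_RAM_GB=8
MIN_DISK_GB=20

# at_least ACTUAL REQUIRED -> exit status 0 when ACTUAL >= REQUIRED.
at_least() {
    [ "$1" -ge "$2" ]
}

# Free space in whole GB on the filesystem holding the current directory.
# df -P guarantees one line per filesystem; column 4 is available KB.
free_disk_gb() {
    df -Pk . | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Installed RAM in GB, rounded to the nearest integer because the kernel
# reports slightly less than the nominal size.
# Linux reads /proc/meminfo; macOS falls back to sysctl hw.memsize.
total_ram_gb() {
    if [ -r /proc/meminfo ]; then
        awk '/^MemTotal:/ { print int($2 / 1024 / 1024 + 0.5) }' /proc/meminfo
    else
        echo $(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 ))
    fi
}

at_least "$(total_ram_gb)" "$MIN_RAM_GB" || echo "warning: less than ${MIN_RAM_GB} GB RAM detected"
at_least "$(free_disk_gb)" "$MIN_DISK_GB" || echo "warning: less than ${MIN_DISK_GB} GB free disk in $(pwd)"
```

Run it from the directory where you plan to unpack the SDK so that the disk check looks at the right filesystem.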
Setup Instructions:
- Download and unarchive the DPS SDK files from the NVIDIA NVOnline Portal.
- Install SDK dependencies. From the SDK directory, run:

      cd dps-sdk
      task setup

  This installs docker, kubectl, k3d, helm, helm-git, uv, and `dpsctl`.
- Deploy the Default profile. This is the recommended starting point: a blank DPS environment with no topology loaded.

      task deploy

- Configure your shell for `dpsctl`. Paste these exports once per shell session to avoid repeating connection flags on every command:

      export DPSCTL_HOST=api.dps.sdk
      export DPSCTL_PORT=80
      export DPSCTL_INSECURE_TLS_SKIP_VERIFY=true

- Continue with the Default-profile playbook. The Simulator Playbooks guide walks you through importing the topology, creating resource groups, and setting grid load targets with `dpsctl`.
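If a later step fails, it can help to confirm that the setup steps above took effect in the current shell. The sketch below checks that the listed tools are on `PATH` and that the three `DPSCTL_*` variables are set. The helper names are hypothetical, and helm-git is skipped on the assumption that it is a helm plugin rather than a standalone binary.

```shell
#!/bin/sh
# missing_commands NAME... -> prints each NAME that is not found on PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# unset_vars NAME... -> prints each NAME whose variable is empty or unset
# in the current shell.
unset_vars() {
    for var in "$@"; do
        eval "value=\${$var:-}"
        [ -n "$value" ] || echo "$var"
    done
}

# Tools named in the setup step (helm-git omitted, see the note above).
missing=$(missing_commands docker kubectl k3d helm uv dpsctl)
[ -z "$missing" ] || echo "not on PATH:" $missing

# Connection settings from the export step.
unset_list=$(unset_vars DPSCTL_HOST DPSCTL_PORT DPSCTL_INSECURE_TLS_SKIP_VERIFY)
[ -z "$unset_list" ] || echo "not set:" $unset_list
```

The script prints nothing when everything is in place. Source it, or run it in the same shell where you pasted the exports, so that it sees the same variables.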
Other Deployment Paths
- Hardware Emulation (automated playbooks):

      task sdk
      task sim

  `task sdk` creates the k3d cluster and deploys DPS; `task sim` imports the topology and runs the combined resource-groups plus grid simulation.
- Workload-Aware (realistic power traces): See the Workload-Aware Simulation guide for the full deployment flow.
Access Services
After deployment, the SDK exposes these endpoints:
| Service | URL | Default Auth |
|---|---|---|
| DPS API (gRPC) | api.dps.sdk | dps/dps |
| DPS Web UI | http://ui.dps.sdk | dps/dps |
| Grafana | http://grafana.dps.sdk | admin/dps |
| Prometheus | http://prometheus.dps.sdk | - |
| Alertmanager | http://alertmanager.dps.sdk | - |
| Pyroscope | http://pyroscope.dps.sdk | - |
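A quick way to confirm that the HTTP endpoints above are reachable is to request each one with `curl` and print the status code. This is a minimal sketch: the `probe` and `probe_all` helpers, the `PROBE` switch, and the `check-endpoints.sh` file name are all hypothetical. The gRPC API is left out because a plain HTTP request is not a meaningful check for it.

```shell
#!/bin/sh
# HTTP endpoints from the table above, one per line.
SDK_URLS="http://ui.dps.sdk
http://grafana.dps.sdk
http://prometheus.dps.sdk
http://alertmanager.dps.sdk
http://pyroscope.dps.sdk"

# probe URL -> prints the HTTP status code; curl prints 000 when no
# response arrives (DNS failure, refused connection, or 5 s timeout).
probe() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# probe_all -> one "status url" line per endpoint.
probe_all() {
    for url in $SDK_URLS; do
        echo "$(probe "$url") $url"
    done
}

# Probes run only on request, e.g.:  PROBE=1 sh check-endpoints.sh
if [ "${PROBE:-0}" = 1 ]; then
    probe_all
fi
```

Any status other than 000 means the service answered. Redirects to a login page also count as reachable.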
Next Steps
- Simulator Playbooks - Default-profile walkthroughs and Hardware Emulation automated simulations.
- Simulator Topology Guide - GB300 datacenter topology reference.
- Workload-Aware Simulation - Deploy and use the surrogate workload model.