Deployment Guide
Overview
This guide provides a step-by-step process for deploying DPS (NVIDIA Domain Power Service) to a Kubernetes cluster.
Prerequisites
- Kubernetes Cluster >= 1.31.x
- Helm 3.x
- kubectl context configured for the target cluster
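The prerequisites above can be checked from a shell before installing. The following is a minimal sketch, not part of the DPS tooling: `version_ge` is a hypothetical helper, it assumes a `sort` that supports `-V` (for example GNU coreutils), and it assumes `kubectl version` prints a `Server Version: vX.Y.Z` line. Each check is skipped when the corresponding tool is not installed.

```shell
#!/bin/sh
# version_ge A B: succeeds when version A is greater than or equal to B.
# Hypothetical helper; relies on "sort -V" (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Helm: the guide asks for a 3.x release.
if command -v helm >/dev/null 2>&1; then
  helm_version="$(helm version --short 2>/dev/null)"
  case "$helm_version" in
    v3.*) echo "helm $helm_version: OK" ;;
    *)    echo "helm ${helm_version:-unknown}: 3.x required" ;;
  esac
fi

# Kubernetes: show the active context and compare the server version with 1.31.
if command -v kubectl >/dev/null 2>&1; then
  echo "kube context: $(kubectl config current-context 2>/dev/null)"
  # Assumption: kubectl prints a "Server Version: vX.Y.Z" line.
  k8s_version="$(kubectl version 2>/dev/null | sed -n 's/^Server Version: v//p')"
  k8s_version="${k8s_version%%[-+]*}"  # drop vendor suffixes such as -gke.1 or +k3s1
  if [ -n "$k8s_version" ] && version_ge "$k8s_version" "1.31.0"; then
    echo "kubernetes $k8s_version: OK"
  else
    echo "kubernetes ${k8s_version:-unknown}: 1.31.x or later required"
  fi
fi
```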
Configuration Reference
The Helm chart’s values.yaml file contains the complete list of configuration options with detailed comments. Always refer to this file for the latest options and defaults.
To download and view the values file:
# Add the NGC Helm repository (if not already added)
helm repo add ngc https://helm.ngc.nvidia.com/nvidia
helm repo update ngc
# Download and extract the chart
helm pull ngc/dps --untar
# View the values file
cat dps/values.yaml
Quickstart
For evaluation and testing, you can deploy DPS with default settings:
helm repo add ngc https://helm.ngc.nvidia.com/nvidia
helm repo update ngc
helm install dps -n dps ngc/dps --create-namespace
kubectl get pods -n dps
Note: The Quickstart deployment uses default settings suitable for evaluation only:
- Built-in PostgreSQL (not production-ready)
- No LDAP authentication
- Authentication in warning-only mode (accepts any credentials)
- Default hostnames: api.dps and ui.dps
For production deployments, follow the Production Deployment section below.
Production Deployment
1. Configure External PostgreSQL
The built-in PostgreSQL is not production-ready. For production, use your own PostgreSQL instance.
In your values.yaml, disable the built-in PostgreSQL and configure your external database:
postgresql:
  enabled: false
global:
  postgresql:
    host: "your-postgres.example.com"
    auth:
      username: "dps"
      database: "dps"
      existingSecret: "dps-postgres-credentials"
      secretKeys:
        passwordKey: "password"
See the chart’s values.yaml for all available PostgreSQL options including SSL mode and port configuration.
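The secret named in existingSecret has to exist before the chart can read the database password from it. The sketch below prints a matching Secret manifest. `postgres_secret_manifest` is a hypothetical helper, and it assumes the secret belongs in the same namespace as the release (dps in this guide) and that the key name equals the passwordKey value shown above.

```shell
# postgres_secret_manifest NAMESPACE PASSWORD
# Prints a Secret whose name and key match existingSecret and
# secretKeys.passwordKey from the values snippet above.
# Assumption: PASSWORD contains no double quotes or backslashes,
# because it is embedded in a double-quoted YAML string.
postgres_secret_manifest() {
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: dps-postgres-credentials
  namespace: $1
type: Opaque
stringData:
  password: "$2"
EOF
}

# Usage (requires kubectl and cluster access; PGPASSWORD is a placeholder):
#   postgres_secret_manifest dps "$PGPASSWORD" | kubectl apply -f -
```

For passwords with special characters, `kubectl create secret generic dps-postgres-credentials -n dps --from-literal=password=...` avoids the YAML quoting concern.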
2. Create BMC Credentials
Create secrets for each BMC (Baseboard Management Controller) you want to manage. BMC credentials are required for bare-metal compute nodes (such as DGX systems or servers with Redfish/IPMI interfaces) that DPS will monitor and control. These are not Kubernetes node credentials. See Supported Platforms for a complete list of supported hardware.
First, create the namespace:
kubectl create namespace dps
Then create secrets for your BMC nodes:
kubectl apply -f - <<EOF
---
apiVersion: v1
kind: Secret
metadata:
  name: node001
  namespace: dps
  labels:
    app: bmc-secret
type: Opaque
stringData:
  bmc: |
    {
      "username": "admin",
      "password": "your-bmc-password"
    }
---
apiVersion: v1
kind: Secret
metadata:
  name: node002
  namespace: dps
  labels:
    app: bmc-secret
type: Opaque
stringData:
  bmc: |
    {
      "username": "admin",
      "password": "your-bmc-password"
    }
EOF
Note: The secret names are referenced in your topology configuration via the SecretName field in the Redfish configuration.
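With more than a few nodes, the per-node manifests can be generated rather than written by hand. The sketch below follows the same Secret layout as the example above. `bmc_secret_manifest` and `bmc_secret_manifests` are hypothetical helper names, and the sketch assumes all listed nodes share one BMC username and password.

```shell
# bmc_secret_manifest NODE USERNAME PASSWORD
# Prints one BMC Secret in the layout used in this guide.
# Assumption: USERNAME and PASSWORD contain no double quotes or
# backslashes, because they are embedded in a JSON document.
bmc_secret_manifest() {
  cat <<EOF
---
apiVersion: v1
kind: Secret
metadata:
  name: $1
  namespace: dps
  labels:
    app: bmc-secret
type: Opaque
stringData:
  bmc: |
    {
      "username": "$2",
      "password": "$3"
    }
EOF
}

# bmc_secret_manifests USERNAME PASSWORD NODE...
# Prints one Secret per node name.
bmc_secret_manifests() {
  bmc_user="$1"
  bmc_pass="$2"
  shift 2
  for node in "$@"; do
    bmc_secret_manifest "$node" "$bmc_user" "$bmc_pass"
  done
}

# Usage (requires kubectl and cluster access; BMC_PASSWORD is a placeholder):
#   bmc_secret_manifests admin "$BMC_PASSWORD" node001 node002 | kubectl apply -f -
```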
3. Create Values File
Create a values.yaml file with your configuration. At minimum, configure your ingress hostnames and mount your BMC secrets:
dps:
  ingress:
    hostname: "api.dps.your-domain.com"
  secrets:
    - name: node001
      secretKey: bmc
      mountPath: /home/nonroot/secrets/baremetal/node001/bmc
    - name: node002
      secretKey: bmc
      mountPath: /home/nonroot/secrets/baremetal/node002/bmc
ui:
  ingress:
    hostname: "ui.dps.your-domain.com"
See the chart’s values.yaml for additional options including TLS configuration, ingress annotations, and resource limits.
4. Install the Chart
helm install dps ngc/dps \
--namespace dps \
--values values.yaml \
--wait
Optional Configuration
These options can be added to your values.yaml during initial installation or applied later via helm upgrade.
Certificate Management
DPS can work with cert-manager for automatic TLS certificate management:
dps:
  ingress:
    grpcAnnotations:
      cert-manager.io/cluster-issuer: "letsencrypt-prod"
    httpAnnotations:
      cert-manager.io/cluster-issuer: "letsencrypt-prod"
See dps.ingress.* and ui.ingress.* in the values file for all available annotation fields.
LDAP Integration
To enable LDAP authentication:
dps:
  ldap:
    enabled: true
Configure your LDAP server connection via dps.ldap.* settings including serverUrl, bindDn, and TLS certificate paths. See the chart’s values.yaml for all LDAP options and volume mount examples.
Verifying the Installation
1. Check Pod Status
kubectl get pods -n dps
Expected output:
NAME                          READY   STATUS    RESTARTS   AGE
dps-server-7d94d5744b-rf8ql   1/1     Running   0          4m
dps-ui-7d94d5744b-rf8ql       1/1     Running   0          4m
2. Check Services
kubectl get services -n dps
3. Test the API with dpsctl
Note: If you haven’t installed dpsctl yet, see Installing dpsctl for download and setup instructions.
Note: These commands require that DNS resolves DPS_HOST to your cluster’s ingress IP and that ingress is configured. Consult your cluster administrator if the hostname is not reachable.
# Set environment variables
export DPS_HOST="api.dps.your-domain.com"
export DPS_PORT="443"
# Log in and verify
dpsctl login
dpsctl verify
dpsctl server-version
4. Access the UI
Note: This requires that DNS resolves the UI hostname to your cluster’s ingress IP and ingress is configured. Consult your cluster administrator if the hostname is not reachable.
Open your browser and navigate to https://ui.dps.your-domain.com
Login Credentials:
- Evaluation/Quickstart deployments: Authentication is in warning-only mode by default, so any credentials will work (e.g., abcd for both username and password).
- Production deployments: Use your LDAP credentials as configured in dps.ldap.*.
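The pod status check from step 1 can also be scripted, for example in a CI job that runs after the install. The sketch below parses the same `kubectl get pods` columns shown in the expected output. `dps_pods_ready` is a hypothetical helper name, and it treats any status other than Running (including Completed) as not ready.

```shell
# dps_pods_ready [NAMESPACE]
# Succeeds only when at least one pod exists and every pod reports
# STATUS "Running" with all of its containers ready (READY column n/n).
dps_pods_ready() {
  kubectl get pods -n "${1:-dps}" --no-headers | awk '
    {
      split($2, ready, "/")                      # READY column, e.g. 1/1
      if ($3 != "Running" || ready[1] != ready[2]) bad = 1
      seen = 1
    }
    END { exit ((seen && !bad) ? 0 : 1) }'
}

# Usage (requires kubectl and cluster access):
#   if dps_pods_ready dps; then echo "DPS pods are ready"; fi
```

kubectl also provides a built-in blocking alternative: `kubectl wait --for=condition=Ready pod --all -n dps --timeout=300s`.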
Upgrading DPS
helm repo update ngc
helm upgrade dps ngc/dps \
--namespace dps \
--values values.yaml
Troubleshooting
Common Issues
- Ingress Issues: Verify your ingress controller is properly configured
- BMC Connection Failures: Check BMC credentials and network connectivity to the BMC from dps-server
- Database Issues: Verify PostgreSQL is running and accessible
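A first pass over these issues usually starts with the state of the namespace. The sketch below groups a few standard kubectl queries into one function. `dps_diagnose` is a hypothetical helper name, the queries are generic Kubernetes checks rather than DPS-specific tooling, and the namespace defaults to dps as used throughout this guide.

```shell
# dps_diagnose [NAMESPACE]
# Prints ingress resources, pods that are not in the Running phase,
# and namespace events sorted by timestamp.
dps_diagnose() {
  ns="${1:-dps}"
  echo "== Ingress resources =="
  kubectl get ingress -n "$ns"
  echo "== Pods not in Running phase =="
  kubectl get pods -n "$ns" --field-selector='status.phase!=Running'
  echo "== Events (oldest first) =="
  kubectl get events -n "$ns" --sort-by=.lastTimestamp
}

# Usage (requires kubectl and cluster access):
#   dps_diagnose dps
```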
Getting Logs
# DPS server logs
kubectl logs -n dps -l app=dps-server --all-containers
# UI logs
kubectl logs -n dps -l app=dps-ui --all-containers
# PRS logs
kubectl logs -n dps -l app=prs --all-containers
# PostgreSQL logs (if using built-in)
kubectl logs -n dps -l app.kubernetes.io/name=postgresql --all-containers
Next Steps
- Installing dpsctl - Command-line client for DPS
- API Documentation - gRPC API documentation