Deployment Guide
Overview
This guide provides a step-by-step process for deploying DPS (NVIDIA Domain Power Service) to a Kubernetes cluster.
Prerequisites
- Kubernetes Cluster >= 1.31.x
- Helm 3.x
- kubectl configured to access your cluster
- Access to DPS NGC
- Access to the DPS Artifactory Repositories
- Access to the managed hosts and their BMC (Baseboard Management Controller)
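The tool and version requirements above can be checked before starting. The following is a minimal sketch, not part of DPS: the function names check_tools and version_ge are hypothetical, and the version comparison assumes a `sort` that supports `-V` (GNU coreutils).

```shell
# Return success when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Return success only when every named CLI tool is found on PATH.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_tools kubectl helm` reports any missing client, and `version_ge "1.31.2" "1.31.0"` succeeds when the cluster version (as reported by `kubectl version`, without the leading `v`) meets the 1.31 minimum.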
Downloading DPS Artifacts
1. Download the Helm Chart
Download the DPS Helm chart from NGC. Visit DPS Helm Charts on NGC to find available versions.
# Set your version (e.g., 0.7.0)
VERSION="0.7.0"
# Download the chart
helm fetch https://helm.ngc.nvidia.com/nvidia/charts/dps-${VERSION}.tgz
# Extract the chart
tar -xzf dps-${VERSION}.tgz
Preparing Your Kubernetes Cluster
1. Create the DPS Namespace
kubectl create namespace dps
2. Configure Image Pull Secrets (if needed)
If your cluster requires authentication to pull images from NVIDIA’s NGC registry:
kubectl create secret docker-registry nvcr-secret \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password=<your-ngc-api-key> \
  --namespace=dps
3. Configure BMC Credentials
Create secrets for each BMC (Baseboard Management Controller) you want to manage:
---
apiVersion: v1
kind: Secret
metadata:
  name: <node-secret-name>
  namespace: dps
  labels:
    app: bmc-secret
type: Opaque
stringData:
  bmc: |
    {
      "username": "your-bmc-username",
      "password": "your-bmc-password"
    }
Example for multiple nodes:
# Create secrets for your BMC nodes
kubectl apply -f - <<EOF
---
apiVersion: v1
kind: Secret
metadata:
  name: dgxh100
  namespace: dps
  labels:
    app: bmc-secret
type: Opaque
stringData:
  bmc: |
    {
      "username": "admin",
      "password": "admin"
    }
---
apiVersion: v1
kind: Secret
metadata:
  name: viking592
  namespace: dps
  labels:
    app: bmc-secret
type: Opaque
stringData:
  bmc: |
    {
      "username": "your-username",
      "password": "your-password"
    }
EOF
Note: The secret names are referenced in your topology configuration via the SecretName field in the Redfish configuration.
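When many nodes are managed, the per-node manifest shown above can be generated rather than written by hand. This is a minimal sketch under stated assumptions: the helper name bmc_secret_manifest is hypothetical, and it performs no JSON escaping, so it assumes the username and password contain no double quotes or backslashes.

```shell
# Print a BMC Secret manifest for one node to stdout.
# Usage: bmc_secret_manifest <secret-name> <bmc-username> <bmc-password>
# Assumption: credentials contain no double quotes or backslashes,
# because the values are inserted into the JSON without escaping.
bmc_secret_manifest() {
  local name="$1" user="$2" pass="$3"
  cat <<EOF
---
apiVersion: v1
kind: Secret
metadata:
  name: ${name}
  namespace: dps
  labels:
    app: bmc-secret
type: Opaque
stringData:
  bmc: |
    {
      "username": "${user}",
      "password": "${pass}"
    }
EOF
}
```

The output can be piped straight to the cluster, for example `bmc_secret_manifest dgxh100 admin 'your-password' | kubectl apply -f -`.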
Installing DPS
1. Create a Values File
Create a values.yaml file with your configuration. For example:
# Basic DPS configuration
global:
  imagePullSecrets:
    - name: nvcr-secret
dps:
  ingress:
    hostname: "api.dps.your-domain.com"
    tls:
      - hosts:
          - "api.dps.your-domain.com"
        secretName: dps-api-tls
  # Mount BMC secrets
  secrets:
    - name: dgxh100
      secretKey: bmc
      mountPath: /home/nonroot/secrets/baremetal/dgxh100/bmc
    - name: viking592
      secretKey: bmc
      mountPath: /home/nonroot/secrets/baremetal/viking592/bmc
    - name: viking593
      secretKey: bmc
      mountPath: /home/nonroot/secrets/baremetal/viking593/bmc
# Enable UI
ui:
  ingress:
    hostname: "ui.dps.your-domain.com"
    tls:
      - hosts:
          - "ui.dps.your-domain.com"
        secretName: dps-ui-tls
# Enable documentation
docs:
  ingress:
    hostname: "docs.dps.your-domain.com"
    tls:
      - hosts:
          - "docs.dps.your-domain.com"
        secretName: dps-docs-tls
2. Install the Chart
# Install DPS
helm install dps ./dps-${VERSION} \
  --namespace dps \
  --values values.yaml \
  --wait
# Verify installation
kubectl get pods -n dps
kubectl get services -n dps
Dependencies and Optional Components
Certificate Management
DPS can work with cert-manager for automatic TLS certificate management:
# In your values.yaml
dps:
  ingress:
    grpcAnnotations:
      cert-manager.io/cluster-issuer: "letsencrypt-prod"
    httpAnnotations:
      cert-manager.io/cluster-issuer: "letsencrypt-prod"
LDAP Integration
To authenticate users through LDAP, configure DPS to use an external LDAP server. For example:
dps:
  ldap:
    enabled: true
    serverUrl: "ldaps://your-ldap-server.company.com:636"
    bindDn: "cn=dps-service,ou=services,dc=company,dc=com"
    bindPassword: "your-service-account-password"
    defaultRole: "admin"
    groupRoleMapping: "dcpower-admins=admin,dcpower-users=user,dcpower-readonly=readonly"
    certSource: "file"
    tlsCaCert: "/home/nonroot/secrets/ldap/ca.crt"
    tlsClientCert: "/home/nonroot/secrets/ldap/client.crt"
    tlsClientKey: "/home/nonroot/secrets/ldap/client.key"
  extraVolumeMounts:
    - name: ldap-certs
      mountPath: "/home/nonroot/secrets/ldap"
      readOnly: true
  extraVolumes:
    - name: ldap-certs
      secret:
        secretName: "external-ldap-certs"
        items:
          - key: "ca.crt"
            path: "ca.crt"
            mode: 0444
          - key: "client.crt"
            path: "client.crt"
            mode: 0444
          - key: "client.key"
            path: "client.key"
            mode: 0400
Change the values above to match your environment.
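The extraVolumes entry refers to a Secret named external-ldap-certs with the keys ca.crt, client.crt, and client.key, which must exist in the dps namespace before the pod can mount it. A minimal sketch for creating it, assuming the three files sit together in one local directory (the function name create_ldap_cert_secret and that directory layout are illustrative, not part of DPS):

```shell
# Create the external-ldap-certs Secret from three files in one directory.
# Usage: create_ldap_cert_secret <directory-containing-the-cert-files>
# Assumption: the directory holds ca.crt, client.crt, and client.key.
create_ldap_cert_secret() {
  local dir="$1" f
  # Fail early if any expected file is missing or unreadable.
  for f in ca.crt client.crt client.key; do
    if [ ! -r "${dir}/${f}" ]; then
      echo "missing or unreadable: ${dir}/${f}" >&2
      return 1
    fi
  done
  kubectl create secret generic external-ldap-certs \
    --namespace dps \
    --from-file=ca.crt="${dir}/ca.crt" \
    --from-file=client.crt="${dir}/client.crt" \
    --from-file=client.key="${dir}/client.key"
}
```

For example, `create_ldap_cert_secret ./ldap-certs` creates the Secret from a local ldap-certs directory.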
Verifying the Installation
1. Check Pod Status
kubectl get pods -n dps
Expected output:
NAME                          READY   STATUS    RESTARTS        AGE
dps-postgresql-0              1/1     Running   0               4m22s
dps-server-7d94d5744b-rf8ql   1/1     Running   1 (2m18s ago)   4m22s
dps-ui-7d94d5744b-rf8ql       1/1     Running   0               4m22s
dps-docs-7d94d5744b-rf8ql     1/1     Running   0               4m22s
2. Check Services
kubectl get services -n dps
3. Test the API with dpsctl
# Set environment variables for dpsctl defaults
export DPS_HOST="api.dps.your-domain.com"
export DPS_PORT="443"
# Log in to the DPS server
dpsctl login
# Test server functionality
dpsctl verify
# Check the server version
dpsctl server-version
# Or test the gRPC endpoint directly
# Get a token
grpcurl -d '{
  "passwordCredential": {
    "username": "<username>",
    "password": "<password>"
  }
}' \
  ${DPS_HOST}:${DPS_PORT} \
  nvidia.dcpower.v1.AuthService/Token
# List gRPC endpoints (set ACCESS_TOKEN to the token returned by the previous call)
grpcurl \
  -H "authorization: Bearer ${ACCESS_TOKEN}" \
  ${DPS_HOST}:${DPS_PORT} \
  list
4. Access the UI
Open your browser and navigate to https://ui.dps.your-domain.com
Troubleshooting
Common Issues
- Image Pull Errors: Ensure your image pull secrets are configured correctly
- Ingress Issues: Verify your ingress controller is properly configured
- BMC Connection Failures: Check BMC credentials and network connectivity
- Database Issues: Verify PostgreSQL is running and accessible
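To narrow down which of these issues applies, it helps to list only the pods that are not healthy. The following is a minimal sketch; the function name unhealthy_pods is hypothetical, and it assumes the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS first).

```shell
# Read `kubectl get pods --no-headers` output on stdin and print the name,
# ready count, and status of every pod that is not Running or whose ready
# container count is below the total.
unhealthy_pods() {
  awk '{
    split($2, ready, "/")
    if ($3 != "Running" || ready[1] != ready[2]) print $1 " " $2 " " $3
  }'
}
```

For example, `kubectl get pods -n dps --no-headers | unhealthy_pods` prints nothing when every pod is ready. A status such as ImagePullBackOff or ErrImagePull points to the image pull secret, and `kubectl describe pod <pod-name> -n dps` shows the related events.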
Getting Logs
# DPS server logs
kubectl logs -n dps deployment/dps-server
# UI logs
kubectl logs -n dps deployment/dps-ui
# PostgreSQL logs
kubectl logs -n dps statefulset/dps-postgresql
Upgrading DPS
# Using Helm
helm upgrade dps ./dps-new-version \
  --namespace dps \
  --values values.yaml
Next Steps
- Installing dpsctl - Command-line client for DPS
- API Documentation - gRPC API documentation