Deployment Guide
Overview
This guide provides a step-by-step process for deploying DPS (NVIDIA Domain Power Service) to a Kubernetes cluster.
Prerequisites
- Kubernetes cluster, version 1.31 or later
- Helm 3.x
- kubectl configured to access your cluster
- Access to DPS GitLab Repository
- GitLab personal access token for downloading artifacts
- Access to the DPS Artifactory Repositories
- Access to the managed hosts and their BMC (Baseboard Management Controller)
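The tool and version prerequisites above can be checked before starting. The following is a minimal sketch, not part of DPS: the function names `missing_tools` and `k8s_version_ok` are hypothetical, and the version check assumes a version string of the form `v1.31.2` or `1.31`.

```shell
# Hypothetical preflight helpers; the names are illustrative and not part of DPS.

# Print the name of every listed tool that is not on PATH, one per line.
missing_tools() (
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
)

# Succeed if a version string such as "v1.31.2" is at least 1.31.
k8s_version_ok() (
  version="${1#v}"            # drop an optional leading "v"
  major="${version%%.*}"
  rest="${version#*.}"
  minor="${rest%%.*}"
  minor="${minor%%[!0-9]*}"   # drop suffixes such as "+" in "31+"
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 31 ]; }
)
```

For example, `missing_tools kubectl helm tar` prints nothing when all three tools are installed, and `k8s_version_ok` can be given the server version string that your cluster reports.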
Downloading DPS Artifacts
1. Get the Latest Release
Visit the DPS GitLab Releases page to find the latest stable version.
2. Download the Helm Chart
Download the DPS Helm chart from the GitLab package registry and then extract it:
# Set your version
VERSION="v0.3.3" # Replace with your desired version
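# After downloading {{ .Site.Params.ProjectShortNameLower }}-${VERSION}.tgz from the package registry, extract the chart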
tar -xzf {{ .Site.Params.ProjectShortNameLower }}-${VERSION}.tgz
Preparing Your Kubernetes Cluster
1. Create the DPS Namespace
kubectl create namespace {{ .Site.Params.ProjectShortNameLower }}
2. Configure Image Pull Secrets (if needed)
If your cluster requires authentication to pull images from NVIDIA’s registry:
kubectl create secret docker-registry nvidia-gitlab-registry \
--docker-server=gitlab-master.nvidia.com:5005 \
--docker-username=<your-gitlab-username> \
--docker-password=<your-gitlab-token> \
--namespace={{ .Site.Params.ProjectShortNameLower }}
3. Configure BMC Credentials
Create one secret for each BMC you want to manage:
---
apiVersion: v1
kind: Secret
metadata:
  name: <node-secret-name>
  namespace: {{ .Site.Params.ProjectShortNameLower }}
  labels:
    app: bmc-secret
type: Opaque
stringData:
  bmc: |
    {
      "username": "your-bmc-username",
      "password": "your-bmc-password"
    }
Example for multiple nodes:
# Create secrets for your BMC nodes
kubectl apply -f - <<EOF
---
apiVersion: v1
kind: Secret
metadata:
  name: dgxh100
  namespace: dps
  labels:
    app: bmc-secret
type: Opaque
stringData:
  bmc: |
    {
      "username": "admin",
      "password": "admin"
    }
---
apiVersion: v1
kind: Secret
metadata:
  name: viking592
  namespace: dps
  labels:
    app: bmc-secret
type: Opaque
stringData:
  bmc: |
    {
      "username": "your-username",
      "password": "your-password"
    }
EOF
Note: The secret names are referenced in your topology configuration via the
SecretName field in the Redfish configuration.
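When many nodes are managed, the per-BMC manifest can be generated rather than written by hand. The sketch below is illustrative only: `bmc_secret_manifest` is a hypothetical helper, the default namespace `dps` is taken from the example above, and the function assumes the username and password contain no characters that need JSON escaping.

```shell
# Hypothetical helper: print a BMC Secret manifest for one node.
# Usage: bmc_secret_manifest <secret-name> <username> <password> [namespace]
# Assumes the credentials contain no characters that need JSON escaping.
bmc_secret_manifest() (
  name="$1"; user="$2"; pass="$3"; namespace="${4:-dps}"
  cat <<EOF
---
apiVersion: v1
kind: Secret
metadata:
  name: ${name}
  namespace: ${namespace}
  labels:
    app: bmc-secret
type: Opaque
stringData:
  bmc: |
    {
      "username": "${user}",
      "password": "${pass}"
    }
EOF
)
```

The output can be piped straight to the cluster, for example `bmc_secret_manifest viking592 "$BMC_USER" "$BMC_PASS" | kubectl apply -f -`, where `BMC_USER` and `BMC_PASS` are variables you set yourself.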
Installing DPS
1. Create a Values File
Create a values.yaml file with your configuration. For example:
# Basic DPS configuration
global:
imagePullSecrets:
- name: nvidia-gitlab-registry
{{ .Site.Params.ProjectShortNameLower }}:
ingress:
hostname: "api.dps.your-domain.com"
tls:
- hosts:
- "api.dps.your-domain.com"
secretName: {{ .Site.Params.ProjectShortNameLower }}-api-tls
# Mount BMC secrets (each name must match a BMC Secret that exists in the namespace)
secrets:
- name: dgxh100
secretKey: bmc
mountPath: /home/nonroot/secrets/baremetal/dgxh100/bmc
- name: viking592
secretKey: bmc
mountPath: /home/nonroot/secrets/baremetal/viking592/bmc
- name: viking593
secretKey: bmc
mountPath: /home/nonroot/secrets/baremetal/viking593/bmc
# Enable UI
ui:
ingress:
hostname: "ui.dps.your-domain.com"
tls:
- hosts:
- "ui.dps.your-domain.com"
secretName: {{ .Site.Params.ProjectShortNameLower }}-ui-tls
# Enable documentation
docs:
ingress:
hostname: "docs.dps.your-domain.com"
tls:
- hosts:
- "docs.dps.your-domain.com"
secretName: {{ .Site.Params.ProjectShortNameLower }}-docs-tls
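Each BMC mount entry in the values file above follows the same pattern: the secret name, the `bmc` key, and a mount path of `/home/nonroot/secrets/baremetal/<name>/bmc`. As a sketch, the entries could be generated from a list of secret names; `bmc_mount_entries` is a hypothetical helper, and its output still has to be indented to fit under the `secrets` key of your values file.

```shell
# Hypothetical helper: print one values.yaml mount entry per BMC secret name.
# Usage: bmc_mount_entries <secret-name>...
bmc_mount_entries() (
  for name in "$@"; do
    printf '%s\n' "- name: $name"
    printf '%s\n' "  secretKey: bmc"
    printf '%s\n' "  mountPath: /home/nonroot/secrets/baremetal/$name/bmc"
  done
)
```

For example, `bmc_mount_entries dgxh100 viking592` prints two three-line entries matching the first two mounts in the example.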
2. Install the Chart
# Install DPS
helm install {{ .Site.Params.ProjectShortNameLower }} ./{{ .Site.Params.ProjectShortNameLower }}-${VERSION} \
--namespace {{ .Site.Params.ProjectShortNameLower }} \
--values values.yaml \
--wait
# Verify installation
kubectl get pods -n {{ .Site.Params.ProjectShortNameLower }}
kubectl get services -n {{ .Site.Params.ProjectShortNameLower }}
Dependencies and Optional Components
Certificate Management
DPS can work with cert-manager for automatic TLS certificate management:
# In your values.yaml
{{ .Site.Params.ProjectShortNameLower }}:
  ingress:
    grpcAnnotations:
      cert-manager.io/cluster-issuer: "letsencrypt-prod"
    httpAnnotations:
      cert-manager.io/cluster-issuer: "letsencrypt-prod"
LDAP Integration
For LDAP authentication, configure DPS to use an external LDAP server. For example:
{{ .Site.Params.ProjectShortNameLower }}:
ldap:
enabled: true
serverUrl: "ldaps://your-ldap-server.company.com:636"
bindDn: "cn=dps-service,ou=services,dc=company,dc=com"
bindPassword: "your-service-account-password"
defaultRole: "admin"
groupRoleMapping: "dcpower-admins=admin,dcpower-users=user,dcpower-readonly=readonly"
certSource: "file"
tlsCaCert: "/home/nonroot/secrets/ldap/ca.crt"
tlsClientCert: "/home/nonroot/secrets/ldap/client.crt"
tlsClientKey: "/home/nonroot/secrets/ldap/client.key"
extraVolumeMounts:
- name: ldap-certs
mountPath: "/home/nonroot/secrets/ldap"
readOnly: true
extraVolumes:
- name: ldap-certs
secret:
secretName: "external-ldap-certs"
items:
- key: "ca.crt"
path: "ca.crt"
mode: 0444
- key: "client.crt"
path: "client.crt"
mode: 0444
- key: "client.key"
path: "client.key"
mode: 0400
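Change the values above to match your environment. The external-ldap-certs secret must already exist in the DPS namespace and contain the ca.crt, client.crt, and client.key keys.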
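The `extraVolumes` entry in the LDAP example references a secret named `external-ldap-certs` with three keys. One possible way to create it from local files is sketched below; `create_ldap_cert_secret` is a hypothetical helper, and both the certificate directory layout and the default namespace `dps` are assumptions.

```shell
# Hypothetical helper: create the Secret referenced by the LDAP extraVolumes entry.
# Usage: create_ldap_cert_secret <cert-dir> [namespace]
# Assumes <cert-dir> holds ca.crt, client.crt, and client.key.
create_ldap_cert_secret() (
  dir="$1"; namespace="${2:-dps}"
  # Fail early if any expected file is missing.
  for f in ca.crt client.crt client.key; do
    [ -f "$dir/$f" ] || { echo "missing $dir/$f" >&2; exit 1; }
  done
  kubectl create secret generic external-ldap-certs \
    --namespace "$namespace" \
    --from-file=ca.crt="$dir/ca.crt" \
    --from-file=client.crt="$dir/client.crt" \
    --from-file=client.key="$dir/client.key"
)
```

For example, `create_ldap_cert_secret ./ldap-certs` creates the secret in the default namespace, where `./ldap-certs` is a directory of your choosing.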
Verifying the Installation
1. Check Pod Status
kubectl get pods -n {{ .Site.Params.ProjectShortNameLower }}
Expected output:
NAME READY STATUS RESTARTS AGE
{{ .Site.Params.ProjectShortNameLower }}-postgresql-0 1/1 Running 0 4m22s
{{ .Site.Params.ProjectShortNameLower }}-server-7d94d5744b-rf8ql 1/1 Running 1 (2m18s ago) 4m22s
{{ .Site.Params.ProjectShortNameLower }}-ui-7d94d5744b-rf8ql 1/1 Running 0 4m22s
{{ .Site.Params.ProjectShortNameLower }}-docs-7d94d5744b-rf8ql 1/1 Running 0 4m22s
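Comparing the pod list against the expected output by eye gets tedious on repeated installs. As a sketch, a small filter can print only the pods that are not fully ready; `not_ready_pods` is a hypothetical helper that reads the output of `kubectl get pods --no-headers` on standard input. Pods in other normal terminal states, such as completed jobs, would also be listed.

```shell
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin and
# print the name of each pod that is not Running with all containers ready.
not_ready_pods() {
  awk '{
    split($2, ready, "/")   # READY column, e.g. "1/1"
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}
```

For example, `kubectl get pods -n <namespace> --no-headers | not_ready_pods` prints nothing when every pod matches the expected output.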
2. Check Services
kubectl get services -n {{ .Site.Params.ProjectShortNameLower }}
3. Test the API with dpsctl
# Set environment variables for dpsctl defaults
export {{ .Site.Params.ProjectShortName }}_HOST="api.dps.your-domain.com"
export {{ .Site.Params.ProjectShortName }}_PORT="443"
# Log in to the DPS server
dpsctl login
# Test server functionality
dpsctl verify
# Print the server version
dpsctl server-version
# Or test the gRPC endpoint directly
# Get a token
grpcurl -d '{
"passwordCredential": {
"username": "<username>",
"password": "<password>"
}
}' \
${{{ .Site.Params.ProjectShortName }}_HOST}:${{{ .Site.Params.ProjectShortName }}_PORT} \
nvidia.dcpower.v1.AuthService/Token
# List gRPC endpoints (set ACCESS_TOKEN to the token returned by the previous call)
grpcurl \
-H "authorization: Bearer ${ACCESS_TOKEN}" \
${{{ .Site.Params.ProjectShortName }}_HOST}:${{{ .Site.Params.ProjectShortName }}_PORT} \
list
4. Access the UI
Open your browser and navigate to https://ui.dps.your-domain.com
Troubleshooting
Common Issues
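- Image Pull Errors: Ensure the nvidia-gitlab-registry secret exists in the DPS namespace and is listed under global.imagePullSecrets
- Ingress Issues: Verify that your ingress controller is running and that the configured hostnames resolve to it
- BMC Connection Failures: Check the BMC credentials stored in each secret and the network connectivity from the cluster to the BMCs
- Database Issues: Verify that the PostgreSQL pod is running and accessible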
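A first pass over the issues above can be gathered with a few standard kubectl queries. The following is a sketch only: `dps_diagnostics` is a hypothetical helper, the default namespace `dps` is an assumption, and the `app=bmc-secret` selector relies on the label used in the BMC secret examples earlier in this guide.

```shell
# Hypothetical helper: gather first-pass diagnostics for common issues.
# Usage: dps_diagnostics [namespace]   (the default "dps" is an assumption)
dps_diagnostics() (
  ns="${1:-dps}"
  # Image pull errors and crash loops show up in recent events.
  kubectl get events -n "$ns" --sort-by=.lastTimestamp
  # Ingress issues: confirm that hosts and addresses were assigned.
  kubectl get ingress -n "$ns"
  # BMC connection failures: confirm that the labeled BMC secrets exist.
  kubectl get secrets -n "$ns" -l app=bmc-secret
  # Database issues: check the PostgreSQL StatefulSet.
  kubectl get statefulsets -n "$ns"
)
```

Run it as `dps_diagnostics` or pass a different namespace as the first argument.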
Getting Logs
# DPS server logs
kubectl logs -n {{ .Site.Params.ProjectShortNameLower }} deployment/{{ .Site.Params.ProjectShortNameLower }}-server
# UI logs
kubectl logs -n {{ .Site.Params.ProjectShortNameLower }} deployment/{{ .Site.Params.ProjectShortNameLower }}-ui
# PostgreSQL logs
kubectl logs -n {{ .Site.Params.ProjectShortNameLower }} statefulset/{{ .Site.Params.ProjectShortNameLower }}-postgresql
Upgrading DPS
# Using Helm (replace new-version with the version of the chart you extracted)
helm upgrade {{ .Site.Params.ProjectShortNameLower }} ./{{ .Site.Params.ProjectShortNameLower }}-new-version \
--namespace {{ .Site.Params.ProjectShortNameLower }} \
--values values.yaml
Next Steps
- Installing dpsctl - Command-line client for DPS
- API Documentation - gRPC API documentation