Deployment Guide

Overview

This guide provides a step-by-step process for deploying DPS (NVIDIA Domain Power Service) to a Kubernetes cluster.

Prerequisites

  • Kubernetes cluster, version 1.31 or later
  • Helm 3.x
  • A kubectl context configured for the target cluster
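
A quick preflight check can confirm that the required tools are available before you start. The following is a minimal sketch and not part of DPS: check_tool is a hypothetical helper name, and it only tests that kubectl and helm are on PATH, not which versions they are.

```shell
#!/bin/sh
# Preflight sketch: report whether the prerequisite CLIs are on PATH.
# check_tool is a hypothetical helper name, not part of DPS or Helm.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

missing=0
for tool in kubectl helm; do
  check_tool "$tool" || missing=$((missing + 1))
done
echo "missing tools: $missing"
```

Once both tools are present, kubectl version, helm version, and kubectl config current-context report the cluster version, the Helm version, and the active context respectively.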

Configuration Reference

The Helm chart’s values.yaml file contains the complete list of configuration options with detailed comments. Always refer to this file for the latest options and defaults.

To download and view the values file:

# Add the NGC Helm repository (if not already added)
helm repo add ngc https://helm.ngc.nvidia.com/nvidia
helm repo update ngc

# Download and extract the chart
helm pull ngc/dps --untar

# View the values file
cat dps/values.yaml

Quickstart

For evaluation and testing, you can deploy DPS with default settings:

helm repo add ngc https://helm.ngc.nvidia.com/nvidia
helm repo update ngc
helm install dps -n dps ngc/dps --create-namespace
kubectl get pods -n dps

Note: The Quickstart deployment uses default settings suitable for evaluation only:

  • Built-in PostgreSQL (not production-ready)
  • No LDAP authentication
  • Authentication in warning-only mode (accepts any credentials)
  • Default hostnames: api.dps and ui.dps

For production deployments, follow the Production Deployment section below.

Production Deployment

1. Configure External PostgreSQL

The built-in PostgreSQL is not production-ready. For production, use your own PostgreSQL instance.

In your values.yaml, disable the built-in PostgreSQL and configure your external database:

postgresql:
  enabled: false

global:
  postgresql:
    host: "your-postgres.example.com"
    auth:
      username: "dps"
      database: "dps"
      existingSecret: "dps-postgres-credentials"
      secretKeys:
        passwordKey: "password"

See the chart’s values.yaml for all available PostgreSQL options including SSL mode and port configuration.
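
The configuration above refers to a secret named dps-postgres-credentials with a password key, which must exist in the cluster before the chart is installed. The following is a minimal sketch of one way to produce it, assuming the secret lives in the dps namespace (created in the next step) and that the password contains no double quotes or backslashes. render_pg_secret and the DPS_PG_PASSWORD variable are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sketch: render a Secret manifest matching the existingSecret and passwordKey
# values shown above. render_pg_secret is a hypothetical helper.
# Assumption: the password needs no YAML escaping (no double quotes or backslashes).
render_pg_secret() {
  # $1 = namespace, $2 = secret name, $3 = password
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: $2
  namespace: $1
type: Opaque
stringData:
  password: "$3"
EOF
}

# Print the manifest for review; pipe it to `kubectl apply -f -` to create it.
render_pg_secret dps dps-postgres-credentials "${DPS_PG_PASSWORD:-change-me}"
```

Printing the manifest first keeps the step reviewable. Replace the change-me placeholder with the real database password before applying.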

2. Create BMC Credentials

Create one secret for each BMC (Baseboard Management Controller) that DPS will manage. BMC credentials are required for the bare-metal compute nodes (such as DGX systems or servers with Redfish/IPMI interfaces) that DPS monitors and controls. These are not Kubernetes node credentials. See Supported Platforms for a complete list of supported hardware.

First, create the namespace:

kubectl create namespace dps

Then create secrets for your BMC nodes:

kubectl apply -f - <<EOF
---
apiVersion: v1
kind: Secret
metadata:
  name: node001
  namespace: dps
  labels:
    app: bmc-secret
type: Opaque
stringData:
  bmc: |
    {
      "username": "admin",
      "password": "your-bmc-password"
    }
---
apiVersion: v1
kind: Secret
metadata:
  name: node002
  namespace: dps
  labels:
    app: bmc-secret
type: Opaque
stringData:
  bmc: |
    {
      "username": "admin",
      "password": "your-bmc-password"
    }
EOF

Note: The secret names are referenced in your topology configuration via the SecretName field in the Redfish configuration.
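
With more than a few nodes, writing each manifest by hand becomes repetitive. The following sketch generates the same manifest shape as the example above for a list of node names. render_bmc_secret is a hypothetical helper, node003 is an illustrative extra node, and the credentials are placeholders; it assumes the username and password need no JSON escaping.

```shell
#!/bin/sh
# Sketch: render one BMC Secret manifest per node, in the same shape as the
# example above. render_bmc_secret is a hypothetical helper.
# Assumption: username and password contain no characters that need JSON escaping.
render_bmc_secret() {
  # $1 = node/secret name, $2 = BMC username, $3 = BMC password
  cat <<EOF
---
apiVersion: v1
kind: Secret
metadata:
  name: $1
  namespace: dps
  labels:
    app: bmc-secret
type: Opaque
stringData:
  bmc: |
    {
      "username": "$2",
      "password": "$3"
    }
EOF
}

# Print manifests for review; pipe the output to `kubectl apply -f -` to create them.
for node in node001 node002 node003; do
  render_bmc_secret "$node" "admin" "your-bmc-password"
done
```

If each BMC has different credentials, call the helper once per node with that node's values instead of looping over shared placeholders.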

3. Create Values File

Create a values.yaml file with your configuration. If you are using an external database, include the PostgreSQL settings from step 1. At minimum, configure your ingress hostnames and mount your BMC secrets:

dps:
  ingress:
    hostname: "api.dps.your-domain.com"
  secrets:
    - name: node001
      secretKey: bmc
      mountPath: /home/nonroot/secrets/baremetal/node001/bmc
    - name: node002
      secretKey: bmc
      mountPath: /home/nonroot/secrets/baremetal/node002/bmc

ui:
  ingress:
    hostname: "ui.dps.your-domain.com"

See the chart’s values.yaml for additional options including TLS configuration, ingress annotations, and resource limits.
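
Each BMC secret needs a matching entry under dps.secrets, and the entries follow a fixed pattern. The following sketch prints those entries for a list of secret names so they can be pasted into values.yaml. render_secret_mounts is a hypothetical helper, and it assumes every secret uses the bmc key and the mountPath layout shown in the example above.

```shell
#!/bin/sh
# Sketch: print dps.secrets entries for a list of BMC secret names, following
# the mountPath pattern used in the example above.
# render_secret_mounts is a hypothetical helper.
render_secret_mounts() {
  for node in "$@"; do
    printf '    - name: %s\n' "$node"
    printf '      secretKey: bmc\n'
    printf '      mountPath: /home/nonroot/secrets/baremetal/%s/bmc\n' "$node"
  done
}

render_secret_mounts node001 node002 node003
```

The output is indented to sit directly under the secrets: key inside the dps: block.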

4. Install the Chart

helm install dps ngc/dps \
  --namespace dps \
  --values values.yaml \
  --wait

Optional Configuration

These options can be added to your values.yaml during initial installation or applied later via helm upgrade.

Certificate Management

DPS can work with cert-manager for automatic TLS certificate management:

dps:
  ingress:
    grpcAnnotations:
      cert-manager.io/cluster-issuer: "letsencrypt-prod"
    httpAnnotations:
      cert-manager.io/cluster-issuer: "letsencrypt-prod"

See dps.ingress.* and ui.ingress.* in the values file for all available annotation fields.

LDAP Integration

To enable LDAP authentication:

dps:
  ldap:
    enabled: true

Configure your LDAP server connection via dps.ldap.* settings including serverUrl, bindDn, and TLS certificate paths. See the chart’s values.yaml for all LDAP options and volume mount examples.

Verifying the Installation

1. Check Pod Status

kubectl get pods -n dps

Expected output (pod name suffixes and ages will differ in your cluster):

NAME                          READY   STATUS    RESTARTS   AGE
dps-server-7d94d5744b-rf8ql   1/1     Running   0          4m
dps-ui-7d94d5744b-rf8ql       1/1     Running   0          4m
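
For scripted checks, the same table can be parsed rather than read by eye. The following is a minimal sketch that treats a pod as healthy only when its STATUS is Running and its READY count is complete. pods_ready is a hypothetical helper, and the demo input is the sample output above rather than live cluster data.

```shell
#!/bin/sh
# Sketch: read `kubectl get pods` table output on stdin and fail if any pod
# is not Running with all of its containers ready.
# pods_ready is a hypothetical helper.
pods_ready() {
  awk 'NR > 1 {
         split($2, ready, "/")
         if ($3 != "Running" || ready[1] != ready[2]) {
           print "not ready: " $1
           bad = 1
         }
       }
       END { exit bad }'
}

# Demo on the sample output above; against a cluster, use:
#   kubectl get pods -n dps | pods_ready
pods_ready <<'EOF' && echo "all pods ready"
NAME                          READY   STATUS    RESTARTS   AGE
dps-server-7d94d5744b-rf8ql   1/1     Running   0          4m
dps-ui-7d94d5744b-rf8ql       1/1     Running   0          4m
EOF
```

This check is deliberately strict: it passes on empty input, and it reports pods in a Completed state as not ready. As a built-in alternative, kubectl wait --for=condition=Ready pods --all -n dps --timeout=300s blocks until the pods report Ready or the timeout expires.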

2. Check Services

kubectl get services -n dps

3. Test the API with dpsctl

Note: If you haven’t installed dpsctl yet, see Installing dpsctl for download and setup instructions.

Note: These commands require that DNS resolves DPS_HOST to your cluster’s ingress IP and ingress is configured. Consult your cluster administrator if the hostname is not reachable.

# Set environment variables
export DPS_HOST="api.dps.your-domain.com"
export DPS_PORT="443"

# Log in and verify
dpsctl login
dpsctl verify
dpsctl server-version
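
A typo in these variables can look like a connectivity failure, so it may help to sanity-check them first. The following sketch only validates the values locally; it does not contact DPS or test DNS. check_dps_env is a hypothetical helper and not part of dpsctl, and the defaults are the placeholder values from the example above.

```shell
#!/bin/sh
# Sketch: sanity-check the DPS_HOST and DPS_PORT values before running dpsctl.
# check_dps_env is a hypothetical helper, not part of dpsctl.
check_dps_env() {
  # $1 = host, $2 = port
  if [ -z "$1" ]; then
    echo "DPS_HOST is empty"
    return 1
  fi
  case "$2" in
    ''|*[!0-9]*)
      echo "DPS_PORT is not a number: '$2'"
      return 1
      ;;
  esac
  if [ "$2" -lt 1 ] || [ "$2" -gt 65535 ]; then
    echo "DPS_PORT is out of range: $2"
    return 1
  fi
  echo "target: $1:$2"
}

check_dps_env "${DPS_HOST:-api.dps.your-domain.com}" "${DPS_PORT:-443}"
```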

4. Access the UI

Note: This requires that DNS resolves the UI hostname to your cluster’s ingress IP and ingress is configured. Consult your cluster administrator if the hostname is not reachable.

Open your browser and navigate to https://ui.dps.your-domain.com

Login Credentials:

  • Evaluation/Quickstart deployments: Authentication is in warning-only mode by default, so any credentials will work (e.g., abcd for both username and password).
  • Production deployments: Use your LDAP credentials as configured in dps.ldap.*.

Upgrading DPS

helm repo update ngc
helm upgrade dps ngc/dps \
  --namespace dps \
  --values values.yaml

Troubleshooting

Common Issues

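  1. Ingress Issues: Confirm that an ingress controller is installed and that the DPS ingress resources exist (kubectl get ingress -n dps)
  2. BMC Connection Failures: Check that the BMC credentials in the mounted secrets are correct and that dps-server can reach each BMC over the network
  3. Database Issues: Verify that PostgreSQL is running and reachable from the cluster, and that the credentials secret matches the database user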

Getting Logs

# DPS server logs
kubectl logs -n dps -l app=dps-server --all-containers

# UI logs
kubectl logs -n dps -l app=dps-ui --all-containers

# PRS logs
kubectl logs -n dps -l app=prs --all-containers

# PostgreSQL logs (if using built-in)
kubectl logs -n dps -l app.kubernetes.io/name=postgresql --all-containers

Next Steps