
Installing with Helm

This document describes the installation procedure for the 0.5.0-dev version of the Universe service orchestration stack using Helm.

Environment

It is possible to deploy all Universe components to a single virtual Kind cluster, but it is highly recommended to have separate hosts/VMs, at least for the following roles:

  • Physical Host or VM for iCP master

  • Physical Host or VM for tenant1 master

  • Physical Host with DPU for tenant1 worker

This environment is the bare minimum and is not enough to test all multi-tenant use cases; for real-world test cases, it should be extended to include multiple tenant clusters with multiple hosts each.

Vault

The Universe service orchestration stack uses Vault for TLS certificate management.

The Vault server is not part of Universe.

Control-plane nodes in the Infrastructure cluster and all nodes in Tenant clusters should have network access to the external Vault server running in your infrastructure.

You can install and configure the Vault server by following the official documentation.

If you are going to use an existing Vault server, you need permissions on that server to initialize the PKI (Public Key Infrastructure) and create AppRoles for Universe components.

You can find an example of how to configure PKI in Vault here: Vault PKI configuration
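For orientation only, a minimal PKI bootstrap on the Vault server might look like the sketch below. The mount path, policy, and AppRole names (pki, universe-pki, universe) are illustrative assumptions; follow the linked Vault PKI configuration for the exact setup expected by Universe.

# Illustrative sketch; mount path, role and policy names are assumptions
vault secrets enable pki
vault secrets tune -max-lease-ttl=87600h pki
vault write pki/root/generate/internal common_name="universe" ttl=87600h

# AppRole later used by Universe components (the roleID/secretID go into the Helm values)
vault auth enable approle
vault policy write universe-pki - <<EOF
path "pki/*" { capabilities = ["create", "read", "update", "list"] }
EOF
vault write auth/approle/role/universe token_policies="universe-pki"
vault read auth/approle/role/universe/role-id
vault write -f auth/approle/role/universe/secret-id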

Helm

For Universe installation, Helm v3 must be installed on the iCP master and on all Tenant masters. Carefully check the document describing the maximum version skew supported between Helm and Kubernetes.

These steps should be applied on all Kubernetes master nodes (both iCP and all Tenants).
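If Helm is not installed yet, one common way to install Helm v3 is the official installer script; check the Helm documentation for the method appropriate to your environment and the version skew note above.

curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
helm version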

Download Helm charts from NGC

Note

To be able to use images and Helm charts from NGC, you need access to the nvstaging/doca group.

Replace <YOUR_NGC_TOKEN> with your NGC token. A token can be generated here.

mkdir ~/universe-helm-charts && cd ~/universe-helm-charts

NGC_API_KEY=<YOUR_NGC_TOKEN>
NGC_CHARTS_REPO=https://helm.ngc.nvidia.com/nvstaging/doca/charts

helm fetch "$NGC_CHARTS_REPO"/universe-tenant-control-plane-0.5.0-dev.tgz --username='$oauthtoken' --password="$NGC_API_KEY"
helm fetch "$NGC_CHARTS_REPO"/universe-infra-control-plane-0.5.0-dev.tgz --username='$oauthtoken' --password="$NGC_API_KEY"
helm fetch "$NGC_CHARTS_REPO"/universe-vault-0.5.0-dev.tgz --username='$oauthtoken' --password="$NGC_API_KEY"

ls universe-*-0.5.0-dev.tgz | xargs -n 1 tar xf

Install Vault agent

Commands in this section should run from the root directory of the universe-vault chart.


cd ~/universe-helm-charts/universe-vault


Prepare values file for Vault helm chart

Check universe-vault Chart documentation for all available options.

Note

You should change the settings in the snippet below to match your environment.

Fields to change:

  • Replace vault.injector.externalVaultAddr with the address of the Vault server

Prepare helm values for vault agent


cat << 'EOF' | tee values-external.yaml
vault:
  enabled: true
  injector:
    enabled: true
    externalVaultAddr: http://<VAULT_SERVER>:<VAULT_PORT>
    nodeSelector:
      node-role.kubernetes.io/control-plane: ""
    tolerations:
      - effect: NoSchedule
        operator: "Exists"
        key: node-role.kubernetes.io/master
      - effect: NoSchedule
        operator: "Exists"
        key: node-role.kubernetes.io/control-plane
  server:
    enabled: false
EOF


Deploy vault agent


helm install -n vault --create-namespace -f values-external.yaml vault .
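Before proceeding, you can verify that the Vault agent injector is running (the exact pod name will differ in your cluster):

kubectl get pods -n vault
# Expect a vault-agent-injector pod in Running state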


Install infrastructure control plane

This section describes the deployment steps for the infrastructure control plane (iCP).

Follow these steps on the iCP master.

Create an imagePullSecret so that images can be pulled from the private registry.
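A minimal sketch of creating the pull secret, assuming the nvcrio-cred secret name referenced in the values file below and the universe namespace used by the chart; adjust the registry and credentials to your environment.

kubectl create namespace universe
kubectl create secret docker-registry nvcrio-cred \
  --namespace universe \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password=<YOUR_NGC_TOKEN>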

Commands in this section should run from the root directory of the universe-infra-control-plane chart.


cd ~/universe-helm-charts/universe-infra-control-plane


Download and unpack DCM image

For DPU provisioning, the DCM image is required; it can be downloaded from the GitLab Package Registry.

Download DCM image

Download DCM image from https://gitlab-master.nvidia.com/api/v4/projects/77555/packages/generic/dcm_ngn/0.0.4/dcm-images.tar.gz

SHA-256: 5fed7981c0481a456c0fbd0f7166fe6d2cefd504ff3dd0ce9bf079d30f0ed817
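For example, you might download the archive and verify the checksum as sketched below. The curl invocation is an assumption; depending on your GitLab access you may need to pass an authentication token, and any download tool will do.

curl -L -o dcm-images.tar.gz \
  "https://gitlab-master.nvidia.com/api/v4/projects/77555/packages/generic/dcm_ngn/0.0.4/dcm-images.tar.gz"
echo "5fed7981c0481a456c0fbd0f7166fe6d2cefd504ff3dd0ce9bf079d30f0ed817  dcm-images.tar.gz" | sha256sum -c -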

Unpack DCM image

Note

Unpack the DCM image into a shared directory that is used by the Deploy iCP components step.

You need to set this shared directory as global.provisioningStorage.hostpath in Deploy iCP components.
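For example (the path below is only an assumption; it must match global.provisioningStorage.hostpath in the values file used in Deploy iCP components):

# Example path only; must match global.provisioningStorage.hostpath below
export SHARED_RESOURCES_DIR=/opt/universe/shared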


mkdir -p ${SHARED_RESOURCES_DIR}/images
tar -xf dcm-images.tar.gz -C ${SHARED_RESOURCES_DIR}/images

Deploy iCP components

Check universe-infra-control-plane Chart documentation for all available options.

Note

You should change the settings in the snippet below to match your environment.

Fields to change:

  • global.ironicHostIP

  • global.provisioningStorage

  • universe-infra-provisioning-controller.capiConfig

  • universe-infra-provisioning-controller.controller

  • universe-infra-provisioning-executor.universe-infra-provisioning-mariadb

  • universe-infra-provisioning-executor.universe-infra-provisioning-bootp

  • universe-infra-admin-controller.tenantConfig.tenants

  • universe-infra-admin-controller.dpuInventory.dpus

  • universe-infra-api-gateway.vaultApproleSecret.roleID and secretID - these should be created on the Vault server during Vault PKI configuration

  • helm_set_file_params: the paths of the kubeconfig and CA files

Prepare values-secure.yaml and deploy helm chart to the iCP cluster


cat << 'EOF' | tee values-secure.yaml
global:
  image:
    registry: nvcr.io/nvstaging/doca/
  imagePullSecrets:
    - name: nvcrio-cred
  nodeSelector:
    node-role.kubernetes.io/control-plane: ""
  tolerations:
    - effect: NoSchedule
      operator: "Exists"
      key: node-role.kubernetes.io/master
    - effect: NoSchedule
      operator: "Exists"
      key: node-role.kubernetes.io/control-plane
  # ip for ironic host
  ironicHostIP: "<ironic ip address>"
  provisioningStorage:
    # hostpath is used by bootp and ironic
    hostpath: <host path of local pv>
    # hostname is used by bootp and ironic
    hostname: "<host name of local pv>"
universe-infra-admin-controller:
  enabled: true
  tenantConfig:
    create: true
    tenants:
      - id: tenant1
        hostnames:
          - host-a
          - host-b
  dpuInventory:
    create: true
    dpus:
      - id: dpu1-host-a
        host: host-a
      - id: dpu1-host-b
        host: host-b
universe-infra-resource-manager:
  enabled: true
universe-infra-provisioning-manager:
  enabled: true
universe-infra-provisioning-controller:
  enabled: true
  capiConfig:
    create: true
    clusterName: <name of the cluster>
    controlPlaneEndPoint: <the ip address, host ip or vip of control plane endpoint>
    controlPlaneEndPointPort: <the port of control plane endpoint>
  controller:
    args:
      ntpServer: <ntp server ip address>
      imageRegistry: <image registry ip address>
universe-infra-provisioning-executor:
  enabled: true
  universe-infra-provisioning-mariadb:
    pv:
      name: mariadb-pv
      hostpath: <host path of local pv>
      hostname: <host name of local pv>
  universe-infra-provisioning-bootp:
    bootp:
      # -- dnsmasq configuration, refer https://linux.die.net/man/8/dnsmasq
      dnsmasq:
        # args is a list of dnsmasq command parameters. You can set any parameters supported by dnsmasq.
        # --k, --interface, --dhcp-range and --dhcp-boot are required.
        # --k: do not go into the background at startup.
        # --interface: Listen only on the specified interface(s).
        # --dhcp-range: addresses will be given out from the range <start-addr> to <end-addr>. If the lease time is given, then leases will be given for that length of time.
        # --dhcp-boot: dnsmasq is providing a TFTP service. The filename is required here to enable network booting.
        # --dhcp-option: specify different or extra options to DHCP clients.
        args:
          - --k
          - --interface=<interface name>
          - --dhcp-range=<dhcp ip range> # e.g. "172.16.105.200,172.16.105.240,12h"
          - --dhcp-boot=<dhcp boot images> # e.g. "efi/grubaa64-BlueField-3.9.2.12271.2.7.4.efi"
          - --dhcp-option=3
          - --dhcp-option=6,<dns server> # multiple DNSs are comma separated. e.g. "10.211.0.124,10.211.0.121,10.7.77.135"
          - --dhcp-option=option:classless-static-route,<route rules> # multiple route rules are comma separated. e.g. "0.0.0.0/0,172.16.105.1"
universe-infra-workload-manager:
  enabled: true
universe-infra-workload-rule-manager:
  enabled: true
universe-infra-workload-controller:
  enabled: true
universe-infra-api-gateway:
  enabled: true
  vaultApproleSecret:
    create: true
    roleID: 3134f7ed-f66b-1347-83e0-54e1e003cd10 # example roleID
    secretID: b7ac107d-d7be-cd38-4ad0-d41b4dddf5a0 # example secretID
  vaultAnnotations:
    addAnnotations: true
  envoy:
    config:
      listener:
        serverTLS:
          enabled: true
        peerValidation:
          enabled: true
EOF

# The kubeconfig and CA files are used to authenticate provisioned DPUs to join the current cluster; if the cluster
# was installed by kubeadm, the default paths are "/root/.kube/config", "/etc/kubernetes/pki/ca.crt" and
# "/etc/kubernetes/pki/ca.key" as below.
#
# NOTES: It is important to check in the kubeconfig that the "server" URL is not using localhost (127.0.0.1),
# but the control plane API IP/VIP address
helm_set_file_params="universe-infra-provisioning-controller.capiConfig.kubeconfig=/root/.kube/config,"
helm_set_file_params+="universe-infra-provisioning-controller.capiConfig.tlsCrt=/etc/kubernetes/pki/ca.crt,"
helm_set_file_params+="universe-infra-provisioning-controller.capiConfig.tlsKey=/etc/kubernetes/pki/ca.key"

helm install -n universe --create-namespace -f values-secure.yaml icp . --set-file ${helm_set_file_params}
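After the install completes, you can check that the iCP components come up; the exact pod list depends on which sub-charts are enabled.

helm list -n universe
kubectl get pods -n universe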


Install tenant control plane

This section describes the deployment steps for the tenant control plane (tCP).

Follow these steps on the tCP masters for each tenant.

Create an imagePullSecret so that images can be pulled from the private registry.
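The pull secret must exist in each tenant cluster as well; a minimal sketch mirroring the iCP example above (the secret name and namespace match the values file below, the credentials are your own):

kubectl create namespace universe
kubectl create secret docker-registry nvcrio-cred \
  --namespace universe \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password=<YOUR_NGC_TOKEN>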

Commands in this section should run from the root directory of the universe-tenant-control-plane chart.


cd ~/universe-helm-charts/universe-tenant-control-plane


Deploy tCP components

Check universe-tenant-control-plane Chart documentation for all available options.

Note

You should change the settings in the snippet below to match your environment.

Fields to change:

  • global.vaultApproleSecret.roleID and secretID - these should be created on the Vault server during Vault PKI configuration

  • global.sidecars.proxy.config.listener.inject_headers.tenant-id

  • global.sidecars.proxy.config.upstream.address and port

Prepare values-secure.yaml and deploy helm chart to the tCP cluster


cat << 'EOF' | tee values-secure.yaml
global:
  image:
    registry: nvcr.io/nvstaging/doca/
  imagePullSecrets:
    - name: nvcrio-cred
  nodeSelector:
    node-role.kubernetes.io/control-plane: ""
  tolerations:
    - effect: NoSchedule
      operator: "Exists"
      key: node-role.kubernetes.io/master
    - effect: NoSchedule
      operator: "Exists"
      key: node-role.kubernetes.io/control-plane
  sidecars:
    proxy:
      config:
        listener:
          inject_headers:
            tenant-id: tenant1
        upstream:
          address: 10.133.133.1 # iCP master ip address
          port: 30001
          clientTLS:
            enabled: true
          peerValidation:
            enabled: true
  vaultAnnotations:
    addAnnotations: true
  vaultApproleSecret:
    create: true
    roleID: 24ad0b7c-ad63-bc9d-f45d-bdd0bf75d7a2 # vault roleID, will be shared by all plugins
    secretID: 290f4c90-5e10-80e2-a068-937c31ac512b # vault secretID, will be shared by all plugins
universe-k8s-tenant-resource-plugin:
  enabled: true
universe-k8s-tenant-workload-plugin:
  enabled: true
universe-k8s-tenant-workload-rule-plugin:
  enabled: true
EOF

helm install -n universe --create-namespace -f values-secure.yaml tcp .
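As with the iCP cluster, you can verify that the tenant plugins are running before moving on to the manual tests.

helm list -n universe
kubectl get pods -n universe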


You can run basic manual tests to validate that Universe components work as expected. These tests should be executed from the tenant control plane.

Check resource API

Create UVSPod resource


cat << 'EOF' | tee tenant-pod1.yaml
apiVersion: resource.universe.nvidia.com/v1alpha1
kind: UVSPod
metadata:
  name: tenant-pod1
  namespace: universe
spec:
  object:
    apiVersion: v1
    kind: Pod
    metadata:
      name: tenant-pod1
    spec:
      containers:
        - name: nginx
          image: nginx:1.14.2
EOF

kubectl apply -f tenant-pod1.yaml


Check the UVSPod resource status. The result should be success.


kubectl get uvspods.resource.universe.nvidia.com -n universe tenant-pod1 -o jsonpath='{.status.syncResult}{"\n"}'

{"result":"success"}


If everything is fine, you should also be able to see tenant-pod1 running on a DPU in the iCP cluster.
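For example, on the iCP master you could look for the mirrored pod; a cluster-wide listing is used here because the target namespace for tenant workloads depends on your tenant configuration.

kubectl get pods -A -o wide | grep tenant-pod1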

Check workload API

Create rule1 WorkloadRule CR in tenant cluster


cat << 'EOF' | tee workloadrule1.yaml
apiVersion: workload.universe.nvidia.com/v1alpha1
kind: WorkloadRule
metadata:
  name: rule1
  namespace: universe
spec:
  resourceType: v1/Pod
  workloadTerms:
    - matchExpressions:
        - key: metadata.resourceNamespace
          operator: In
          values:
            - default
        - key: metadata.resourceName
          operator: In
          values:
            - test-pod
  workloadInfoInject:
    - workloadKey: state.nodeName
      asAnnotation:
        name: tenant-node-name
  dpuSelectionPolicy: Any
  template:
    apiVersion: v1
    kind: Pod
    spec:
      containers:
        - name: nginx
          image: nginx:1.14.2
          volumeMounts:
            - name: workload-info
              mountPath: /workload-info
      # standard k8s way to mount annotation as a volume
      volumes:
        - name: workload-info
          downwardAPI:
            items:
              - path: node-name
                fieldRef:
                  fieldPath: metadata.annotations['tenant-node-name']
EOF

kubectl apply -f workloadrule1.yaml


The rule above means the following: create a Pod with nginx on any DPU (which belongs to the tenant) if a Pod with the name test-pod is created in the default namespace in the tenant cluster.

Create a Pod with the name test-pod in the default namespace in the tenant cluster


cat << 'EOF' | tee test-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
    - name: nginx
      image: nginx:1.14.2
EOF

kubectl apply -f test-pod.yaml


The Infrastructure cluster will receive a notification that test-pod has been created in the Tenant cluster. rule1 will match this Pod, and as a result an NGINX Pod will be created in the tenant namespace on a DPU in the infrastructure cluster. The Resource API tenant plugin will detect this Pod in the infrastructure cluster and will create a mirror UVSPod object in the tenant cluster. Below we check that the expected UVSPod resource was created.


kubectl get uvspods -n universe \
  -o jsonpath='{range .items[?(@.spec.object.metadata.annotations.workloadrule\.workload\.universe\.nvidia\.com/name=="rule1")]}{ .metadata.namespace }/{ .metadata.name}{"\n"}{end}'

universe/rule1-57283160-a077-446d-882b-4f1373b0d02e

