Installing with Helm
This document describes the installation procedure for the 0.5.0-dev version of the Universe service orchestration stack with Helm.
Environment
It is possible to deploy all Universe components to a single virtual Kind cluster, but it is highly recommended to have separate hosts/VMs at least for the following roles:
Physical Host or VM for iCP master
Physical Host or VM for tenant1 master
Physical Host with DPU for tenant1 worker
This environment is the bare minimum and is not sufficient to test all multi-tenant use cases; for real-world test cases, it should be extended to include multiple tenant clusters with multiple hosts each.
Vault
The Universe service orchestration stack uses Vault for TLS certificate management.
The Vault server is not part of Universe.
Control-plane nodes in the Infrastructure cluster and all nodes in Tenant clusters should have network access to the external Vault server running in your infrastructure.
You can install and configure a Vault server by following the official documentation.
If you are going to use an existing Vault server, you should have permissions on that server to initialize PKI (Public Key Infrastructure) and create AppRoles for Universe components.
You can find an example of how to configure PKI in Vault here: Vault PKI configuration
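For reference, a heavily abbreviated sketch of the kind of Vault CLI commands involved is shown below. It is not a substitute for the linked Vault PKI configuration document; the mount path, common name, TTLs, role name, and the universe-pki policy are all placeholder assumptions.
# Enable a PKI secrets engine and generate a root CA (names and TTLs are examples)
vault secrets enable pki
vault secrets tune -max-lease-ttl=87600h pki
vault write pki/root/generate/internal common_name="universe" ttl=87600h
# Enable AppRole auth and create an example role; the "universe-pki" policy must be defined separately
vault auth enable approle
vault write auth/approle/role/universe token_policies="universe-pki"
# Fetch the roleID and generate a secretID for use in the Helm values later in this guide
vault read auth/approle/role/universe/role-id
vault write -f auth/approle/role/universe/secret-id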
Helm
For Universe installation, it is required to install Helm v3 on the iCP master and all Tenant masters. Carefully check the document describing the maximum version skew supported between Helm and Kubernetes.
These steps should be applied on all Kubernetes master nodes (both iCP and all Tenants).
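One common way to install Helm v3 is the upstream installer script, shown here as an example; any installation method from the Helm documentation works.
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
helm version   # confirm the installed version against the Helm/Kubernetes version skew policy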
Download Helm charts from NGC
To be able to use images and Helm charts from NGC, you need to have access to the nvstaging/doca group.
Replace <YOUR_NGC_TOKEN> with your NGC token. The token can be generated here.
mkdir ~/universe-helm-charts && cd ~/universe-helm-charts
NGC_API_KEY=<YOUR_NGC_TOKEN>
NGC_CHARTS_REPO=https://helm.ngc.nvidia.com/nvstaging/doca/charts
helm fetch "$NGC_CHARTS_REPO"/universe-tenant-control-plane-0.5.0-dev.tgz --username='$oauthtoken' --password="$NGC_API_KEY"
helm fetch "$NGC_CHARTS_REPO"/universe-infra-control-plane-0.5.0-dev.tgz --username='$oauthtoken' --password="$NGC_API_KEY"
helm fetch "$NGC_CHARTS_REPO"/universe-vault-0.5.0-dev.tgz --username='$oauthtoken' --password="$NGC_API_KEY"
ls universe-*-0.5.0-dev.tgz | xargs -n 1 tar xf
Install Vault agent
Commands in this section should be run from the root directory of the universe-vault chart.
cd ~/universe-helm-charts/universe-vault
Prepare values file for Vault helm chart
Check universe-vault Chart documentation for all available options.
You should change the settings in the snippet below to match your environment.
Fields to change:
Replace vault.injector.externalVaultAddr with the address of the Vault server
Prepare helm values for vault agent
cat << 'EOF' | tee values-external.yaml
vault:
  enabled: true
  injector:
    enabled: true
    externalVaultAddr: http://<VAULT_SERVER>:<VAULT_PORT>
    nodeSelector:
      node-role.kubernetes.io/control-plane: ""
    tolerations:
      - effect: NoSchedule
        operator: "Exists"
        key: node-role.kubernetes.io/master
      - effect: NoSchedule
        operator: "Exists"
        key: node-role.kubernetes.io/control-plane
  server:
    enabled: false
EOF
Deploy vault agent
helm install -n vault --create-namespace -f values-external.yaml vault .
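To confirm the deployment, you can check that the injector pod is running; the pod name below follows the upstream Vault chart naming and may differ in your environment.
kubectl get pods -n vault
# Expect a Running pod similar to: vault-agent-injector-<hash>-<hash>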
Install infrastructure control plane
This section describes the deployment steps for the infrastructure control plane (iCP).
Follow these steps on the iCP master.
Create imagePullSecret to use images from a private registry.
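A minimal sketch of creating such a secret is shown below; the secret name nvcrio-cred and the universe namespace are taken from the values used later in this section, and <YOUR_NGC_TOKEN> is your NGC token.
kubectl create namespace universe
kubectl create secret docker-registry nvcrio-cred \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password=<YOUR_NGC_TOKEN> \
  -n universe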
Commands in this section should be run from the root directory of the universe-infra-control-plane chart.
cd ~/universe-helm-charts/universe-infra-control-plane
Download and unpack DCM image
For DPU provisioning, the DCM image is required and can be downloaded from the GitLab Package Registry.
Download DCM image
Download DCM image from https://gitlab-master.nvidia.com/api/v4/projects/77555/packages/generic/dcm_ngn/0.0.4/dcm-images.tar.gz
SHA-256: 5fed7981c0481a456c0fbd0f7166fe6d2cefd504ff3dd0ce9bf079d30f0ed817
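A possible way to download the archive and verify the checksum is shown below; the curl invocation is only an example, and the registry may require authentication (e.g. a GitLab PRIVATE-TOKEN header).
curl -L -o dcm-images.tar.gz \
  "https://gitlab-master.nvidia.com/api/v4/projects/77555/packages/generic/dcm_ngn/0.0.4/dcm-images.tar.gz"
echo "5fed7981c0481a456c0fbd0f7166fe6d2cefd504ff3dd0ce9bf079d30f0ed817  dcm-images.tar.gz" | sha256sum -c -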
Unpack DCM image
Unpack the DCM image into the shared directory that is used by the Deploy iCP components step.
You need to set this shared directory as global.provisioningStorage.hostpath in the Deploy iCP components step.
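SHARED_RESOURCES_DIR below is a placeholder for that shared directory; the path in this example is an assumption and must match global.provisioningStorage.hostpath.
export SHARED_RESOURCES_DIR=/opt/universe/shared   # example path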
mkdir -p $SHARED_RESOURCES_DIR/images
tar -xf dcm-images.tar.gz -C ${SHARED_RESOURCES_DIR}/images
Deploy iCP components
Check universe-infra-control-plane Chart documentation for all available options.
You should change the settings in the snippet below to match your environment.
Fields to change:
global.ironicHostIP
global.provisioningStorage
universe-infra-provisioning-controller.capiConfig
universe-infra-provisioning-controller.controller
universe-infra-provisioning-executor.universe-infra-provisioning-mariadb
universe-infra-provisioning-executor.universe-infra-provisioning-bootp
universe-infra-admin-controller.tenantConfig.tenants
universe-infra-admin-controller.dpuInventory.dpus
universe-infra-api-gateway.vaultApproleSecret.roleID and secretID - should be created on the Vault server during Vault PKI configuration
helm_set_file_params: the paths of the kubeconfig and CA files
Prepare values-secure.yaml and deploy the helm chart to the iCP cluster
cat << 'EOF' | tee values-secure.yaml
global:
  image:
    registry: nvcr.io/nvstaging/doca/
  imagePullSecrets:
    - name: nvcrio-cred
  nodeSelector:
    node-role.kubernetes.io/control-plane: ""
  tolerations:
    - effect: NoSchedule
      operator: "Exists"
      key: node-role.kubernetes.io/master
    - effect: NoSchedule
      operator: "Exists"
      key: node-role.kubernetes.io/control-plane
  # ip for ironic host
  ironicHostIP: "<ironic ip address>"
  provisioningStorage:
    # hostpath is used by bootp and ironic
    hostpath: <host path of local pv>
    # hostname is used by bootp and ironic
    hostname: "<host name of local pv>"
universe-infra-admin-controller:
  enabled: true
  tenantConfig:
    create: true
    tenants:
      - id: tenant1
        hostnames:
          - host-a
          - host-b
  dpuInventory:
    create: true
    dpus:
      - id: dpu1-host-a
        host: host-a
      - id: dpu1-host-b
        host: host-b
universe-infra-resource-manager:
  enabled: true
universe-infra-provisioning-manager:
  enabled: true
universe-infra-provisioning-controller:
  enabled: true
  capiConfig:
    create: true
    clusterName: <name of the cluster>
    controlPlaneEndPoint: <the ip address, host ip or vip of control plane endpoint>
    controlPlaneEndPointPort: <the port of control plane endpoint>
  controller:
    args:
      ntpServer: <ntp server ip address>
      imageRegistry: <image registry ip address>
universe-infra-provisioning-executor:
  enabled: true
  universe-infra-provisioning-mariadb:
    pv:
      name: mariadb-pv
      hostpath: <host path of local pv>
      hostname: <host name of local pv>
  universe-infra-provisioning-bootp:
    bootp:
      # -- dnsmasq configuration, refer https://linux.die.net/man/8/dnsmasq
      dnsmasq:
        # args is a list of dnsmasq command parameters. You can set any parameters supported by dnsmasq.
        # --k, --interface, --dhcp-range and --dhcp-boot are required.
        # --k: do not go into the background at startup.
        # --interface: Listen only on the specified interface(s).
        # --dhcp-range: addresses will be given out from the range <start-addr> to <end-addr>. If the lease time is given, then leases will be given for that length of time.
        # --dhcp-boot: dnsmasq is providing a TFTP service. The filename is required here to enable network booting.
        # --dhcp-option: specify different or extra options to DHCP clients.
        args:
          - --k
          - --interface=<interface name>
          - --dhcp-range=<dhcp ip range> # e.g. "172.16.105.200,172.16.105.240,12h"
          - --dhcp-boot=<dhcp boot images> # e.g. "efi/grubaa64-BlueField-3.9.2.12271.2.7.4.efi"
          - --dhcp-option=3
          - --dhcp-option=6,<dns server> # multiple DNSs are comma separated. e.g. "10.211.0.124,10.211.0.121,10.7.77.135"
          - --dhcp-option=option:classless-static-route,<route rules> # multiple route rules are comma separated. e.g. "0.0.0.0/0,172.16.105.1"
universe-infra-workload-manager:
  enabled: true
universe-infra-workload-rule-manager:
  enabled: true
universe-infra-workload-controller:
  enabled: true
universe-infra-api-gateway:
  enabled: true
  vaultApproleSecret:
    create: true
    roleID: 3134f7ed-f66b-1347-83e0-54e1e003cd10 # example roleID
    secretID: b7ac107d-d7be-cd38-4ad0-d41b4dddf5a0 # example secretID
  vaultAnnotations:
    addAnnotations: true
  envoy:
    config:
      listener:
        serverTLS:
          enabled: true
        peerValidation:
          enabled: true
EOF
# The kubeconfig and CA files are used to authenticate the provisioned DPU so that it can join the current cluster;
# if the cluster was installed by kubeadm, the default paths are "/root/.kube/config", "/etc/kubernetes/pki/ca.crt" and
# "/etc/kubernetes/pki/ca.key", as used below.
#
# NOTE: It is important to check in the kubeconfig that the "server" URL is not using localhost (127.0.0.1),
# but the control plane API IP/VIP address.
helm_set_file_params="universe-infra-provisioning-controller.capiConfig.kubeconfig=/root/.kube/config,"
helm_set_file_params+="universe-infra-provisioning-controller.capiConfig.tlsCrt=/etc/kubernetes/pki/ca.crt,"
helm_set_file_params+="universe-infra-provisioning-controller.capiConfig.tlsKey=/etc/kubernetes/pki/ca.key"
helm install -n universe --create-namespace -f values-secure.yaml icp . --set-file ${helm_set_file_params}
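After the install completes, you can check the release and pod status; pod names depend on the chart contents and the enabled components.
helm status -n universe icp
kubectl get pods -n universe -o wide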
Install tenant control plane
This section describes the deployment steps for the tenant control plane (tCP).
Follow these steps on the tCP masters for each tenant.
Create imagePullSecret to use images from a private registry.
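The secret can be created in the same way as on the iCP master; the name nvcrio-cred and the universe namespace are assumptions matching the values below, and <YOUR_NGC_TOKEN> is your NGC token.
kubectl create namespace universe
kubectl create secret docker-registry nvcrio-cred \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password=<YOUR_NGC_TOKEN> \
  -n universe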
Commands in this section should be run from the root directory of the universe-tenant-control-plane chart.
cd ~/universe-helm-charts/universe-tenant-control-plane
Deploy tCP components
Check universe-tenant-control-plane Chart documentation for all available options.
You should change the settings in the snippet below to match your environment.
Fields to change:
global.vaultApproleSecret.roleID and secretID - should be created on the Vault server during Vault PKI configuration
global.sidecars.proxy.config.listener.inject_headers.tenant-id
global.sidecars.proxy.config.upstream.address and port
Prepare values-secure.yaml and deploy the helm chart to the tCP cluster
cat << 'EOF' | tee values-secure.yaml
global:
  image:
    registry: nvcr.io/nvstaging/doca/
  imagePullSecrets:
    - name: nvcrio-cred
  nodeSelector:
    node-role.kubernetes.io/control-plane: ""
  tolerations:
    - effect: NoSchedule
      operator: "Exists"
      key: node-role.kubernetes.io/master
    - effect: NoSchedule
      operator: "Exists"
      key: node-role.kubernetes.io/control-plane
  sidecars:
    proxy:
      config:
        listener:
          inject_headers:
            tenant-id: tenant1
        upstream:
          address: 10.133.133.1 # iCP master ip address
          port: 30001
          clientTLS:
            enabled: true
          peerValidation:
            enabled: true
  vaultAnnotations:
    addAnnotations: true
  vaultApproleSecret:
    create: true
    roleID: 24ad0b7c-ad63-bc9d-f45d-bdd0bf75d7a2 # vault roleID, will be shared by all plugins
    secretID: 290f4c90-5e10-80e2-a068-937c31ac512b # vault secretID, will be shared by all plugins
universe-k8s-tenant-resource-plugin:
  enabled: true
universe-k8s-tenant-workload-plugin:
  enabled: true
universe-k8s-tenant-workload-rule-plugin:
  enabled: true
EOF
helm install -n universe --create-namespace -f values-secure.yaml tcp .
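As with the iCP cluster, you can check the release and pod status after the install; pod names depend on the enabled plugins.
helm status -n universe tcp
kubectl get pods -n universe -o wide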
You can run basic manual tests to validate that the Universe components work as expected. These tests should be executed from the tenant control plane.
Check resource API
Create UVSPod resource
cat << 'EOF' | tee tenant-pod1.yaml
apiVersion: resource.universe.nvidia.com/v1alpha1
kind: UVSPod
metadata:
  name: tenant-pod1
  namespace: universe
spec:
  object:
    apiVersion: v1
    kind: Pod
    metadata:
      name: tenant-pod1
    spec:
      containers:
        - name: nginx
          image: nginx:1.14.2
EOF
kubectl apply -f tenant-pod1.yaml
Check the UVSPod resource status. The result should be success.
kubectl get uvspods.resource.universe.nvidia.com -n universe tenant-pod1 -o jsonpath='{.status.syncResult}{"\n"}'
{"result":"success"}
If everything is fine, you should also be able to see tenant-pod1 running on a DPU in the iCP cluster.
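One way to check this from the iCP cluster without assuming the tenant namespace name:
kubectl get pods -A -o wide | grep tenant-pod1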
Check workload API
Create the rule1 WorkloadRule CR in the tenant cluster.
cat << 'EOF' | tee workloadrule1.yaml
apiVersion: workload.universe.nvidia.com/v1alpha1
kind: WorkloadRule
metadata:
  name: rule1
  namespace: universe
spec:
  resourceType: v1/Pod
  workloadTerms:
    - matchExpressions:
        - key: metadata.resourceNamespace
          operator: In
          values:
            - default
        - key: metadata.resourceName
          operator: In
          values:
            - test-pod
  workloadInfoInject:
    - workloadKey: state.nodeName
      asAnnotation:
        name: tenant-node-name
  dpuSelectionPolicy: Any
  template:
    apiVersion: v1
    kind: Pod
    spec:
      containers:
        - name: nginx
          image: nginx:1.14.2
          volumeMounts:
            - name: workload-info
              mountPath: /workload-info
      # standard k8s way to mount annotation as a volume
      volumes:
        - name: workload-info
          downwardAPI:
            items:
              - path: node-name
                fieldRef:
                  fieldPath: metadata.annotations['tenant-node-name']
EOF
kubectl apply -f workloadrule1.yaml
The rule above means the following: create a Pod with nginx on any DPU (which belongs to the tenant) if a Pod with the name test-pod is created in the default namespace in the tenant cluster.
Create a Pod with the name test-pod in the default namespace in the tenant cluster.
cat << 'EOF' | tee test-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
    - name: nginx
      image: nginx:1.14.2
EOF
kubectl apply -f test-pod.yaml
The Infrastructure cluster will receive a notification that test-pod has been created in the Tenant cluster. rule1 will match this Pod; as a result, an NGINX Pod will be created in the tenant namespace on a DPU in the infrastructure cluster. The Resource API tenant plugin will detect this Pod in the infrastructure cluster and will create a mirror UVSPod object in the tenant cluster. Below we check that the expected UVSPod resource was created.
kubectl get uvspods -n universe \
-o jsonpath='{range .items[?(@.spec.object.metadata.annotations.workloadrule\.workload\.universe\.nvidia\.com/name=="rule1")]}{ .metadata.namespace }/{ .metadata.name}{"\n"}{end}'
universe/rule1-57283160-a077-446d-882b-4f1373b0d02e
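To verify that the workload information was actually injected, you can locate the Pod created by rule1 on the infrastructure cluster and read the mounted downward API file; the namespace and Pod name are environment-specific placeholders, so look them up first.
# Run on the iCP cluster
kubectl get pods -A -o wide | grep rule1
kubectl exec -n <tenant namespace> <rule1 pod name> -- cat /workload-info/node-name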