
Installing with Helm

This document describes the installation procedure for the 0.5.0-dev version of the Universe service orchestration stack using Helm.

Environment

It is possible to deploy all Universe components to a single virtual Kind cluster, but it is highly recommended to have separate hosts/VMs, at least for the following roles:

  • Physical Host or VM for iCP master

  • Physical Host or VM for tenant1 master

  • Physical Host with DPU for tenant1 worker

This environment is the bare minimum and is not enough to test all multi-tenant use cases; for real-world test cases, it should be extended to include multiple tenant clusters with multiple hosts each.

Vault

The Universe service orchestration stack uses Vault for TLS certificate management.

The Vault server is not part of Universe.

Control-plane nodes in the Infrastructure cluster and all nodes in Tenant clusters should have network access to the external Vault server running in your infrastructure.

You can install and configure the Vault server by following the official documentation.

If you are going to use an existing Vault server, you need permissions on that server to initialize the PKI (Public Key Infrastructure) and create AppRoles for Universe components.

You can find an example of how to configure PKI in Vault here: Vault PKI configuration
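For orientation only, a minimal PKI bootstrap on the Vault server might look like the sketch below. The mount path, policy, and AppRole names (pki, universe-pki, universe) are illustrative assumptions; follow the linked Vault PKI configuration for the exact setup expected by Universe.

# Illustrative sketch; mount path, role and policy names are assumptions
vault secrets enable pki
vault secrets tune -max-lease-ttl=87600h pki
vault write pki/root/generate/internal common_name="universe" ttl=87600h

# AppRole later used by Universe components (the roleID/secretID go into the Helm values)
vault auth enable approle
vault policy write universe-pki - <<EOF
path "pki/*" { capabilities = ["create", "read", "update", "list"] }
EOF
vault write auth/approle/role/universe token_policies="universe-pki"
vault read auth/approle/role/universe/role-id
vault write -f auth/approle/role/universe/secret-id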

Helm

For Universe installation, Helm v3 must be installed on the iCP master and on all Tenant masters. Carefully check the document describing the maximum version skew supported between Helm and Kubernetes.

These steps should be applied on all Kubernetes master nodes (both iCP and all Tenants).
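If Helm is not installed yet, one common way to install Helm v3 is the official installer script; check the Helm documentation for the method appropriate to your environment and the version skew note above.

curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
helm version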

Download Helm charts from NGC

Note

To be able to use images and Helm charts from NGC, you need access to the nvstaging/doca group.

Replace <YOUR_NGC_TOKEN> with your NGC token. A token can be generated here.

mkdir ~/universe-helm-charts && cd ~/universe-helm-charts

NGC_API_KEY=<YOUR_NGC_TOKEN>
NGC_CHARTS_REPO=https://helm.ngc.nvidia.com/nvstaging/doca/charts

helm fetch "$NGC_CHARTS_REPO"/universe-tenant-control-plane-0.5.0-dev.tgz --username='$oauthtoken' --password="$NGC_API_KEY"
helm fetch "$NGC_CHARTS_REPO"/universe-infra-control-plane-0.5.0-dev.tgz --username='$oauthtoken' --password="$NGC_API_KEY"
helm fetch "$NGC_CHARTS_REPO"/universe-vault-0.5.0-dev.tgz --username='$oauthtoken' --password="$NGC_API_KEY"

ls universe-*-0.5.0-dev.tgz | xargs -n 1 tar xf

Install Vault agent

Commands in this section should run from the root directory of the universe-vault chart.


cd ~/universe-helm-charts/universe-vault


Prepare values file for Vault helm chart

Check universe-vault Chart documentation for all available options.

Note

You should change the settings in the snippet below to match your environment.

Fields to change:

  • Replace vault.injector.externalVaultAddr with the address of the Vault server

Prepare helm values for vault agent


cat << 'EOF' | tee values-external.yaml
vault:
  enabled: true
  injector:
    enabled: true
    externalVaultAddr: http://<VAULT_SERVER>:<VAULT_PORT>
    nodeSelector:
      node-role.kubernetes.io/control-plane: ""
    tolerations:
      - effect: NoSchedule
        operator: "Exists"
        key: node-role.kubernetes.io/master
      - effect: NoSchedule
        operator: "Exists"
        key: node-role.kubernetes.io/control-plane
  server:
    enabled: false
EOF


Deploy vault agent


helm install -n vault --create-namespace -f values-external.yaml vault .
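Before proceeding, you can verify that the Vault agent injector is running (the exact pod name will differ in your cluster):

kubectl get pods -n vault
# Expect a vault-agent-injector pod in Running state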


Install infrastructure control plane

This section describes the deployment steps for the infrastructure control plane (iCP).

Follow these steps on the iCP master.

Create an imagePullSecret so that images can be pulled from the private registry.
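A minimal sketch of creating the pull secret, assuming the nvcrio-cred secret name referenced in the values file below and the universe namespace used by the chart; adjust the registry and credentials to your environment.

kubectl create namespace universe
kubectl create secret docker-registry nvcrio-cred \
  --namespace universe \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password=<YOUR_NGC_TOKEN>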

Commands in this section should run from the root directory of the universe-infra-control-plane chart.


cd ~/universe-helm-charts/universe-infra-control-plane


Download and unpack DCM image

For DPU provisioning, the DCM image is required; it can be downloaded from the GitLab Package Registry.

Download DCM image

Download DCM image from https://gitlab-master.nvidia.com/api/v4/projects/77555/packages/generic/dcm_ngn/0.0.4/dcm-images.tar.gz

SHA-256: 5fed7981c0481a456c0fbd0f7166fe6d2cefd504ff3dd0ce9bf079d30f0ed817
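For example, you might download the archive and verify the checksum as sketched below. The curl invocation is an assumption; depending on your GitLab access you may need to pass an authentication token, and any download tool will do.

curl -L -o dcm-images.tar.gz \
  "https://gitlab-master.nvidia.com/api/v4/projects/77555/packages/generic/dcm_ngn/0.0.4/dcm-images.tar.gz"
echo "5fed7981c0481a456c0fbd0f7166fe6d2cefd504ff3dd0ce9bf079d30f0ed817  dcm-images.tar.gz" | sha256sum -c -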

Unpack DCM image

Note

Unpack the DCM image into a shared directory that is used by the Deploy iCP components step.

You need to set this shared directory as global.provisioningStorage.hostpath in Deploy iCP components.
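For example (the path below is only an assumption; it must match global.provisioningStorage.hostpath in the values file used in Deploy iCP components):

# Example path only; must match global.provisioningStorage.hostpath below
export SHARED_RESOURCES_DIR=/opt/universe/shared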


mkdir -p ${SHARED_RESOURCES_DIR}/images
tar -xf dcm-images.tar.gz -C ${SHARED_RESOURCES_DIR}/images

Deploy iCP components

Check universe-infra-control-plane Chart documentation for all available options.

Note

You should change the settings in the snippet below to match your environment.

Fields to change:

  • global.ironicHostIP

  • global.provisioningStorage

  • universe-infra-provisioning-controller.capiConfig

  • universe-infra-provisioning-controller.controller

  • universe-infra-provisioning-executor.universe-infra-provisioning-mariadb

  • universe-infra-provisioning-executor.universe-infra-provisioning-bootp

  • universe-infra-admin-controller.tenantConfig.tenants

  • universe-infra-admin-controller.dpuInventory.dpus

  • universe-infra-api-gateway.vaultApproleSecret.roleID and secretID - these should be created on the Vault server during Vault PKI configuration

  • helm_set_file_params: the paths of the kubeconfig and CA files

Prepare values-secure.yaml and deploy helm chart to the iCP cluster


cat << 'EOF' | tee values-secure.yaml
global:
  image:
    registry: nvcr.io/nvstaging/doca/
  imagePullSecrets:
    - name: nvcrio-cred
  nodeSelector:
    node-role.kubernetes.io/control-plane: ""
  tolerations:
    - effect: NoSchedule
      operator: "Exists"
      key: node-role.kubernetes.io/master
    - effect: NoSchedule
      operator: "Exists"
      key: node-role.kubernetes.io/control-plane
  # ip for ironic host
  ironicHostIP: "<ironic ip address>"
  provisioningStorage:
    # hostpath is used by bootp and ironic
    hostpath: <host path of local pv>
    # hostname is used by bootp and ironic
    hostname: "<host name of local pv>"
universe-infra-admin-controller:
  enabled: true
  tenantConfig:
    create: true
    tenants:
      - id: tenant1
        hostnames:
          - host-a
          - host-b
  dpuInventory:
    create: true
    dpus:
      - id: dpu1-host-a
        host: host-a
      - id: dpu1-host-b
        host: host-b
universe-infra-resource-manager:
  enabled: true
universe-infra-provisioning-manager:
  enabled: true
universe-infra-provisioning-controller:
  enabled: true
  capiConfig:
    create: true
    clusterName: <name of the cluster>
    controlPlaneEndPoint: <the ip address, host ip or vip of control plane endpoint>
    controlPlaneEndPointPort: <the port of control plane endpoint>
  controller:
    args:
      ntpServer: <ntp server ip address>
      imageRegistry: <image registry ip address>
universe-infra-provisioning-executor:
  enabled: true
  universe-infra-provisioning-mariadb:
    pv:
      name: mariadb-pv
      hostpath: <host path of local pv>
      hostname: <host name of local pv>
  universe-infra-provisioning-bootp:
    bootp:
      # -- dnsmasq configuration, refer https://linux.die.net/man/8/dnsmasq
      dnsmasq:
        # args is a list of dnsmasq command parameters. You can set any parameters supported by dnsmasq.
        # --k, --interface, --dhcp-range and --dhcp-boot are required.
        # --k: do not go into the background at startup.
        # --interface: Listen only on the specified interface(s).
        # --dhcp-range: addresses will be given out from the range <start-addr> to <end-addr>. If the lease time is given, then leases will be given for that length of time.
        # --dhcp-boot: dnsmasq is providing a TFTP service. The filename is required here to enable network booting.
        # --dhcp-option: specify different or extra options to DHCP clients.
        args:
          - --k
          - --interface=<interface name>
          - --dhcp-range=<dhcp ip range> # e.g. "172.16.105.200,172.16.105.240,12h"
          - --dhcp-boot=<dhcp boot images> # e.g. "efi/grubaa64-BlueField-3.9.2.12271.2.7.4.efi"
          - --dhcp-option=3
          - --dhcp-option=6,<dns server> # multiple DNSs are comma separated. e.g. "10.211.0.124,10.211.0.121,10.7.77.135"
          - --dhcp-option=option:classless-static-route,<route rules> # multiple route rules are comma separated. e.g. "0.0.0.0/0,172.16.105.1"
universe-infra-workload-manager:
  enabled: true
universe-infra-workload-rule-manager:
  enabled: true
universe-infra-workload-controller:
  enabled: true
universe-infra-api-gateway:
  enabled: true
  vaultApproleSecret:
    create: true
    roleID: 3134f7ed-f66b-1347-83e0-54e1e003cd10 # example roleID
    secretID: b7ac107d-d7be-cd38-4ad0-d41b4dddf5a0 # example secretID
  vaultAnnotations:
    addAnnotations: true
  envoy:
    config:
      listener:
        serverTLS:
          enabled: true
        peerValidation:
          enabled: true
EOF

# The kubeconfig and CA files are used to authenticate provisioned DPUs to join the current cluster; if the cluster
# was installed by kubeadm, the default paths are "/root/.kube/config", "/etc/kubernetes/pki/ca.crt" and
# "/etc/kubernetes/pki/ca.key" as below.
#
# NOTES: It is important to check in the kubeconfig that the "server" URL is not using localhost (127.0.0.1),
# but the control plane API IP/VIP address
helm_set_file_params="universe-infra-provisioning-controller.capiConfig.kubeconfig=/root/.kube/config,"
helm_set_file_params+="universe-infra-provisioning-controller.capiConfig.tlsCrt=/etc/kubernetes/pki/ca.crt,"
helm_set_file_params+="universe-infra-provisioning-controller.capiConfig.tlsKey=/etc/kubernetes/pki/ca.key"

helm install -n universe --create-namespace -f values-secure.yaml icp . --set-file ${helm_set_file_params}
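After the install completes, you can check that the iCP components come up; the exact pod list depends on which sub-charts are enabled.

helm list -n universe
kubectl get pods -n universe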


Install tenant control plane

This section describes the deployment steps for the tenant control plane (tCP).

Follow these steps on the tCP masters for each tenant.

Create an imagePullSecret so that images can be pulled from the private registry.
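The pull secret must exist in each tenant cluster as well; a minimal sketch mirroring the iCP example above (the secret name and namespace match the values file below, the credentials are your own):

kubectl create namespace universe
kubectl create secret docker-registry nvcrio-cred \
  --namespace universe \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password=<YOUR_NGC_TOKEN>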

Commands in this section should run from the root directory of the universe-tenant-control-plane chart.


cd ~/universe-helm-charts/universe-tenant-control-plane


Deploy tCP components

Check universe-tenant-control-plane Chart documentation for all available options.

Note

You should change the settings in the snippet below to match your environment.

Fields to change:

  • global.vaultApproleSecret.roleID and secretID - these should be created on the Vault server during Vault PKI configuration

  • global.sidecars.proxy.config.listener.inject_headers.tenant-id

  • global.sidecars.proxy.config.upstream.address and port

Prepare values-secure.yaml and deploy helm chart to the tCP cluster


cat << 'EOF' | tee values-secure.yaml
global:
  image:
    registry: nvcr.io/nvstaging/doca/
  imagePullSecrets:
    - name: nvcrio-cred
  nodeSelector:
    node-role.kubernetes.io/control-plane: ""
  tolerations:
    - effect: NoSchedule
      operator: "Exists"
      key: node-role.kubernetes.io/master
    - effect: NoSchedule
      operator: "Exists"
      key: node-role.kubernetes.io/control-plane
  sidecars:
    proxy:
      config:
        listener:
          inject_headers:
            tenant-id: tenant1
        upstream:
          address: 10.133.133.1 # iCP master ip address
          port: 30001
          clientTLS:
            enabled: true
          peerValidation:
            enabled: true
  vaultAnnotations:
    addAnnotations: true
  vaultApproleSecret:
    create: true
    roleID: 24ad0b7c-ad63-bc9d-f45d-bdd0bf75d7a2 # vault roleID, will be shared by all plugins
    secretID: 290f4c90-5e10-80e2-a068-937c31ac512b # vault secretID, will be shared by all plugins
universe-k8s-tenant-resource-plugin:
  enabled: true
universe-k8s-tenant-workload-plugin:
  enabled: true
universe-k8s-tenant-workload-rule-plugin:
  enabled: true
EOF

helm install -n universe --create-namespace -f values-secure.yaml tcp .
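As with the iCP cluster, you can verify that the tenant plugins are running before moving on to the manual tests.

helm list -n universe
kubectl get pods -n universe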


You can run basic manual tests to validate that Universe components work as expected. These tests should be executed from the tenant control plane.

Check resource API

Create UVSPod resource


cat << 'EOF' | tee tenant-pod1.yaml
apiVersion: resource.universe.nvidia.com/v1alpha1
kind: UVSPod
metadata:
  name: tenant-pod1
  namespace: universe
spec:
  object:
    apiVersion: v1
    kind: Pod
    metadata:
      name: tenant-pod1
    spec:
      containers:
        - name: nginx
          image: nginx:1.14.2
EOF

kubectl apply -f tenant-pod1.yaml


Check the UVSPod resource status. The result should be success.


kubectl get uvspods.resource.universe.nvidia.com -n universe tenant-pod1 -o jsonpath='{.status.syncResult}{"\n"}'

{"result":"success"}


If everything is fine, you should also be able to see tenant-pod1 running on a DPU in the iCP cluster.
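For example, on the iCP master you could look for the mirrored pod; a cluster-wide listing is used here because the target namespace for tenant workloads depends on your tenant configuration.

kubectl get pods -A -o wide | grep tenant-pod1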

Check workload API

Create rule1 WorkloadRule CR in tenant cluster


cat << 'EOF' | tee workloadrule1.yaml
apiVersion: workload.universe.nvidia.com/v1alpha1
kind: WorkloadRule
metadata:
  name: rule1
  namespace: universe
spec:
  resourceType: v1/Pod
  workloadTerms:
    - matchExpressions:
        - key: metadata.resourceNamespace
          operator: In
          values:
            - default
        - key: metadata.resourceName
          operator: In
          values:
            - test-pod
  workloadInfoInject:
    - workloadKey: state.nodeName
      asAnnotation:
        name: tenant-node-name
  dpuSelectionPolicy: Any
  template:
    apiVersion: v1
    kind: Pod
    spec:
      containers:
        - name: nginx
          image: nginx:1.14.2
          volumeMounts:
            - name: workload-info
              mountPath: /workload-info
      # standard k8s way to mount annotation as a volume
      volumes:
        - name: workload-info
          downwardAPI:
            items:
              - path: node-name
                fieldRef:
                  fieldPath: metadata.annotations['tenant-node-name']
EOF

kubectl apply -f workloadrule1.yaml


The rule above means the following: create a Pod with nginx on any DPU (which belongs to the tenant) if a Pod with the name test-pod is created in the default namespace in the tenant cluster.

Create a Pod with the name test-pod in the default namespace in the tenant cluster


cat << 'EOF' | tee test-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
    - name: nginx
      image: nginx:1.14.2
EOF

kubectl apply -f test-pod.yaml


The Infrastructure cluster will receive a notification that test-pod has been created in the Tenant cluster. rule1 will match this Pod, and as a result an NGINX Pod will be created in the tenant namespace on a DPU in the infrastructure cluster. The Resource API tenant plugin will detect this Pod in the infrastructure cluster and will create a mirror UVSPod object in the tenant cluster. Below we check that the expected UVSPod resource was created.


kubectl get uvspods -n universe \
  -o jsonpath='{range .items[?(@.spec.object.metadata.annotations.workloadrule\.workload\.universe\.nvidia\.com/name=="rule1")]}{ .metadata.namespace }/{ .metadata.name}{"\n"}{end}'

universe/rule1-57283160-a077-446d-882b-4f1373b0d02e

