Workload API - overview

This page describes Workload API concepts with examples.

The Workload API provides a way to define Tenant-driven or CloudAdmin-driven rules. A rule triggers the creation of a Pod resource in the infrastructure cluster when a workload (Pod) that matches the rule starts in the tenant cluster.

A workload is something that is running in the tenant cluster. For example, a workload can represent a Kubernetes Pod, an OpenStack VM, or something else. Currently, the only supported tenant orchestrator is Kubernetes, and the only supported resource type is Pod.
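The support constraint above can be pictured with a small model. This is only an illustrative sketch: the `Workload` class, its field names, and the `"kubernetes"` orchestrator string are hypothetical and are not taken from the Universe schema; only the `v1/Pod` resource type string appears later on this page.

```python
from dataclasses import dataclass

# Hypothetical model: the real Workload schema is not shown on this page.
# The single supported (orchestrator, resource type) pair comes from the text above.
SUPPORTED = {("kubernetes", "v1/Pod")}


@dataclass(frozen=True)
class Workload:
    orchestrator: str   # assumed identifier for the tenant orchestrator
    resource_type: str  # e.g. "v1/Pod"
    namespace: str
    name: str


def is_supported(workload: Workload) -> bool:
    """Return True only for the one combination this page says is supported."""
    return (workload.orchestrator, workload.resource_type) in SUPPORTED
```

With this model, a Kubernetes Pod workload passes the check, while a workload from any other orchestrator or of any other resource type does not.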

Workload API is a set of APIs; it includes Tenant and CloudAdmin APIs.

Tenant API

  • Workload GRPC API - API to deliver notifications about workloads running in the Tenant cluster to the infrastructure cluster

  • WorkloadRule GRPC API - API to create WorkloadRules in the Infrastructure cluster

  • WorkloadRule CRD - simplifies usage of the WorkloadRule GRPC API from the tenant cluster

CloudAdmin API

The following infrastructure is used for the demonstration

workload_api_overview.png

Infrastructure cluster

✓ icp> kubectl get node
NAME          STATUS   ROLES           AGE   VERSION
dpu1-host-a   Ready    <none>          27h   v1.24.0
dpu1-host-b   Ready    <none>          27h   v1.24.0
dpu1-host-c   Ready    <none>          27h   v1.24.0
dpu1-host-d   Ready    <none>          27h   v1.24.0
icp-master    Ready    control-plane   27h   v1.24.0

Tenant1 cluster

✓ tenant1> kubectl get node
NAME             STATUS   ROLES           AGE   VERSION
host-a           Ready    <none>          27h   v1.24.0
host-b           Ready    <none>          27h   v1.24.0
tenant1-master   Ready    control-plane   27h   v1.24.0

Tenant2 cluster

✓ tenant2> kubectl get node
NAME             STATUS   ROLES           AGE   VERSION
host-c           Ready    <none>          27h   v1.24.0
host-d           Ready    <none>          27h   v1.24.0
tenant2-master   Ready    control-plane   27h   v1.24.0

Universe components

Universe components are deployed to the infrastructure and tenant clusters by following the Deployment guide.

As a result, in the infrastructure cluster, we have a separate namespace for each tenant

✓ icp> kubectl get ns | grep tenant
tenant-tenant1   Active   26h
tenant-tenant2   Active   28h

Universe components in infrastructure cluster

✓ icp> kubectl get po -n vault
NAME                                    READY   STATUS    RESTARTS   AGE
vault-0                                 1/1     Running   0          28h
vault-agent-injector-6fd8f84794-xqlg9   1/1     Running   0          21s
✓ icp> kubectl get po -n universe
NAME                                                        READY   STATUS    RESTARTS   AGE
icp-universe-infra-admin-controller-6c578657ff-5bggg        1/1     Running   0          32s
icp-universe-infra-api-gateway-888c7dd8b-bqgpn              2/2     Running   0          31s
icp-universe-infra-provisioning-manager-65ddd8d568-c9bzr    1/1     Running   0          32s
icp-universe-infra-resource-manager-5cfcd597bc-shk25        1/1     Running   0          32s
icp-universe-infra-workload-controller-68f7ffcc77-sfg9c     1/1     Running   0          32s
icp-universe-infra-workload-manager-58fdbd88bd-4z7l8        1/1     Running   0          31s
icp-universe-infra-workload-rule-manager-7d7686d6cc-56qfz   1/1     Running   0          31s

Tenant1 components

✓ tenant1> kubectl get po -n vault
NAME                                    READY   STATUS    RESTARTS   AGE
vault-agent-injector-55d7dc8c6f-67kcl   1/1     Running   0          64s
✓ tenant1> kubectl get po -n universe
NAME                                                           READY   STATUS    RESTARTS   AGE
tcp-universe-k8s-tenant-resource-plugin-8455d9cd59-dnl2q       3/3     Running   0          35s
tcp-universe-k8s-tenant-workload-plugin-857dcb4b8c-tpn8s       3/3     Running   0          35s
tcp-universe-k8s-tenant-workload-rule-plugin-f5bc8d45b-h66tq   3/3     Running   0          35s

Tenant2 components

✓ tenant2> kubectl get po -n vault
NAME                                    READY   STATUS    RESTARTS   AGE
vault-agent-injector-55d7dc8c6f-6xc52   1/1     Running   0          19s
✓ tenant2> kubectl get po -n universe
NAME                                                           READY   STATUS    RESTARTS   AGE
tcp-universe-k8s-tenant-resource-plugin-8455d9cd59-pqfb6       3/3     Running   0          13s
tcp-universe-k8s-tenant-workload-plugin-857dcb4b8c-5qmm5       3/3     Running   0          13s
tcp-universe-k8s-tenant-workload-rule-plugin-f5bc8d45b-pp6j2   3/3     Running   0          12s

Here we will test universe resource API in the tenant1 cluster.

Create UVSPod resource

✓ tenant1> cat << 'EOF' | tee tenant1-uvspod1.yaml
apiVersion: resource.universe.nvidia.com/v1alpha1
kind: UVSPod
metadata:
  name: tenant1-uvspod1
  namespace: universe
spec:
  object:
    apiVersion: v1
    kind: Pod
    metadata:
      name: tenant1-uvspod1
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
EOF
✓ tenant1> kubectl apply -f tenant1-uvspod1.yaml
uvspod.resource.universe.nvidia.com/tenant1-uvspod1 created

Check UVSPod status in the tenant cluster

✓ tenant1> kubectl get uvspods.resource.universe.nvidia.com -n universe tenant1-uvspod1
NAME              RESULT    MESSAGE
tenant1-uvspod1   success

If everything operates correctly, RESULT should be success.
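When this check needs to be scripted, the table printed by kubectl can be parsed directly. The following is a minimal sketch, assuming the three-column NAME / RESULT / MESSAGE layout shown above; the helper name `parse_uvspod_results` is hypothetical.

```python
from typing import Dict


def parse_uvspod_results(table: str) -> Dict[str, str]:
    """Map each UVSPod name to its RESULT column from `kubectl get uvspods` output."""
    lines = [line for line in table.splitlines() if line.strip()]
    results = {}
    # The first non-empty line is the header (NAME RESULT MESSAGE); skip it.
    for line in lines[1:]:
        fields = line.split()
        if len(fields) >= 2:
            results[fields[0]] = fields[1]
    return results
```

A caller could then assert that every value in the returned mapping equals `success` before continuing with the next step.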

Check that the Pod resource has been created in the tenant namespace in the infrastructure cluster.

✓ icp> kubectl get po -n tenant-tenant1
NAME              READY   STATUS    RESTARTS   AGE
tenant1-uvspod1   1/1     Running   0          4m16s

In the infrastructure cluster, we should see the tenant1-uvspod1 Pod with the spec from the UVSPod that we created in the tenant cluster.
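The UVSPod manifest used in this test simply embeds a regular Pod object under `spec.object`. A minimal sketch of building such a manifest programmatically is shown here; the function name `wrap_pod_in_uvspod` is hypothetical, and the layout is copied from the YAML in this section rather than from a schema reference.

```python
import copy


def wrap_pod_in_uvspod(pod: dict, namespace: str = "universe") -> dict:
    """Embed a plain Pod manifest in a UVSPod, mirroring the example YAML on this page."""
    return {
        "apiVersion": "resource.universe.nvidia.com/v1alpha1",
        "kind": "UVSPod",
        # The example reuses the Pod name as the UVSPod name.
        "metadata": {"name": pod["metadata"]["name"], "namespace": namespace},
        # Deep copy so later edits to the input do not change the manifest.
        "spec": {"object": copy.deepcopy(pod)},
    }
```

The resulting dictionary can be serialized to YAML or JSON and passed to `kubectl apply -f`.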

Tenant1 cluster

Create tenant1-pod1 Pod in the default namespace in the tenant1 cluster

✓ tenant1> cat << 'EOF' | tee tenant1-pod1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: tenant1-pod1
spec:
  containers:
  - name: nginx
    image: nginx:1.14.2
EOF
✓ tenant1> kubectl apply -f tenant1-pod1.yaml
pod/tenant1-pod1 created

Create tenant1-pod2 Pod in the default namespace in the tenant1 cluster

✓ tenant1> cat << 'EOF' | tee tenant1-pod2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: tenant1-pod2
spec:
  containers:
  - name: nginx
    image: nginx:1.14.2
EOF
✓ tenant1> kubectl apply -f tenant1-pod2.yaml
pod/tenant1-pod2 created

Tenant2 cluster

Create tenant2-pod1 Pod in the default namespace in the tenant2 cluster

✓ tenant2> cat << 'EOF' | tee tenant2-pod1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: tenant2-pod1
spec:
  containers:
  - name: nginx
    image: nginx:1.14.2
EOF
✓ tenant2> kubectl apply -f tenant2-pod1.yaml
pod/tenant2-pod1 created

Tenant1 cluster

Create tenant1-rule1 WorkloadRule CR in the tenant1 cluster. This rule will match a workload if it runs in the default namespace in the tenant cluster and the workload name is tenant1-pod1.

For matching workloads, the rule will trigger the creation of the Pod, defined in the CR template section (simple nginx pod in our case).

Check WorkloadRule CRD format description for details.

✓ tenant1> cat << 'EOF' | tee tenant1-rule1.yaml
apiVersion: workload.universe.nvidia.com/v1alpha1
kind: WorkloadRule
metadata:
  name: tenant1-rule1
  namespace: universe
spec:
  resourceType: v1/Pod
  workloadTerms:
  - matchExpressions:
    - key: metadata.resourceNamespace
      operator: In
      values:
      - default
    - key: metadata.resourceName
      operator: In
      values:
      - tenant1-pod1
  workloadInfoInject:
  - workloadKey: state.nodeName
    asAnnotation:
      name: tenant-node-name
  - workloadKey: state.extra.labels
    asAnnotation:
      name: tenant-workload-labels
  dpuSelectionPolicy: Any
  template:
    apiVersion: v1
    kind: Pod
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        volumeMounts:
        - name: workload-info
          mountPath: /workload-info
        - name: workload-labels
          mountPath: /workload-labels
      # standard k8s way to mount annotation as a volume
      volumes:
      - name: workload-info
        downwardAPI:
          items:
          - path: node-name
            fieldRef:
              fieldPath: metadata.annotations['tenant-node-name']
      - name: workload-labels
        downwardAPI:
          items:
          - path: labels
            fieldRef:
              fieldPath: metadata.annotations['tenant-workload-labels']
EOF
✓ tenant1> kubectl apply -f tenant1-rule1.yaml
workloadrule.workload.universe.nvidia.com/tenant1-rule1 created
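The matching behaviour described for tenant1-rule1 can be illustrated with a short sketch. It assumes that all expressions inside one term must hold (which agrees with the description above) and that a rule matches if any of its terms matches (an assumption borrowed from Kubernetes selector terms, not confirmed on this page). Only the `In` operator is handled, because it is the only one shown here. The function names and the dictionary shape of the workload are hypothetical.

```python
def get_path(obj, dotted_key):
    """Resolve a dotted key such as 'metadata.resourceName' in nested dicts."""
    current = obj
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def expression_matches(workload, expression):
    """Evaluate a single matchExpressions entry; only 'In' is sketched."""
    if expression["operator"] != "In":
        raise NotImplementedError("only the 'In' operator is shown on this page")
    return get_path(workload, expression["key"]) in expression["values"]


def rule_matches(workload, workload_terms):
    """Assumed semantics: terms are ORed, expressions within a term are ANDed."""
    return any(
        all(expression_matches(workload, e) for e in term["matchExpressions"])
        for term in workload_terms
    )
```

Under these assumptions, a workload named tenant1-pod1 in the default namespace matches the terms of tenant1-rule1, while tenant1-pod2 does not.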

Infrastructure cluster

tenant1-rule1 should match only the tenant1-pod1 Pod, so we expect a single nginx Pod to be created in the tenant-tenant1 namespace in the infrastructure cluster.

# tenant1-uvspod1 is a pod which we created earlier
✓ icp> kubectl get po -n tenant-tenant1
NAME                                                 READY   STATUS    RESTARTS   AGE
tenant1-rule1-0a7c0d7f-ba7f-4301-afed-8db108dbee1a   1/1     Running   0          63s
tenant1-uvspod1                                      1/1     Running   0          16m
# there should be no pods in tenant-tenant2 namespace
✓ icp> kubectl get po -n tenant-tenant2
No resources found in tenant-tenant2 namespace.

You can use the following snippet to check which Pod was created by which rule.

✓ icp> kubectl get pods -n tenant-tenant1 -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.workloadrule\.workload\.universe\.nvidia\.com/name}{"\n"}{end}'
tenant1-rule1-0a7c0d7f-ba7f-4301-afed-8db108dbee1a	tenant1-rule1
tenant1-uvspod1
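The same lookup can be done on the JSON form of the Pod list. This is a minimal sketch that assumes its input is the text produced by `kubectl get pods -n tenant-tenant1 -o json`; the annotation key is the one used in the jsonpath expression above, and the helper name `pods_by_rule` is hypothetical.

```python
import json
from typing import Dict, Optional

# Annotation key taken from the jsonpath expression on this page.
RULE_ANNOTATION = "workloadrule.workload.universe.nvidia.com/name"


def pods_by_rule(pod_list_json: str) -> Dict[str, Optional[str]]:
    """Map each Pod name to the rule named in its annotation, or None if absent."""
    result = {}
    for pod in json.loads(pod_list_json)["items"]:
        metadata = pod["metadata"]
        annotations = metadata.get("annotations") or {}
        result[metadata["name"]] = annotations.get(RULE_ANNOTATION)
    return result
```

Pods that were not created by a rule, such as tenant1-uvspod1, map to `None`.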

It is also possible to inspect the workload status in the infrastructure cluster to see which rules the workload matches and which Pods were created for it.

✓ icp> kubectl get -n tenant-tenant1 workloads.workload.infra.universe.nvidia.com workload-0a7c0d7f-ba7f-4301-afed-8db108dbee1a -o jsonpath={.status} | jq
{
  "rules": {
    "tenant": [
      {
        "id": "tenant-tenant1/tenant1-rule1",
        "status": {
          "objRef": {
            "apiVersion": "v1",
            "kind": "Pod",
            "name": "tenant1-rule1-0a7c0d7f-ba7f-4301-afed-8db108dbee1a",
            "namespace": "tenant-tenant1"
          }
        }
      }
    ]
  }
}
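A status document of this shape can be reduced to a list of rule-to-Pod pairs. The sketch below relies only on the fields visible in the output above (`rules`, `id`, `status.objRef`); it iterates over every group under `rules` because this page shows only the `tenant` group and other groups may exist. The function name `created_pods` is hypothetical.

```python
from typing import List, Tuple


def created_pods(status: dict) -> List[Tuple[str, str]]:
    """Return (rule id, 'namespace/name') pairs for every objRef in a workload status."""
    refs = []
    for group in status.get("rules", {}).values():
        for entry in group:
            ref = entry.get("status", {}).get("objRef")
            if ref:
                refs.append((entry["id"], f'{ref["namespace"]}/{ref["name"]}'))
    return refs
```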

Tenant1 cluster

Update Pod template in tenant1-rule1 WorkloadRule

kubectl patch workloadrules.workload.universe.nvidia.com -n universe --type='json' \
  -p '[{"op" : "replace","path" : "/spec/template/spec/containers/0/name", "value": "updated"}]' tenant1-rule1
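If the patch has to be generated from a script, building the JSON Patch document with a JSON encoder avoids shell quoting mistakes. The following is a minimal sketch; the helper names are hypothetical, and the argument list only mirrors the kubectl command above without executing it.

```python
import json
from typing import List


def replace_patch(path: str, value) -> str:
    """Build a one-operation JSON Patch document (RFC 6902 'replace')."""
    return json.dumps([{"op": "replace", "path": path, "value": value}])


def kubectl_patch_argv(rule_name: str, patch: str, namespace: str = "universe") -> List[str]:
    """Assemble the argument list for the kubectl patch command shown above."""
    return [
        "kubectl", "patch", "workloadrules.workload.universe.nvidia.com",
        "-n", namespace, "--type=json", "-p", patch, rule_name,
    ]
```

The returned list can be handed to `subprocess.run(argv, check=True)` on a host where kubectl is configured for the tenant1 cluster.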

Infrastructure cluster

The Pod in the infrastructure cluster should be recreated with the updated spec; its container should now be named updated.

✓ icp> kubectl get pods -n tenant-tenant1 -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].name}{"\n"}{end}'
tenant1-rule1-0a7c0d7f-ba7f-4301-afed-8db108dbee1a	updated
tenant1-uvspod1	nginx

Now let’s check that workload info injection works as expected. In tenant1-rule1 WorkloadRule, we have a section configuring Workload labels injection for the Pod created in the infra cluster. With the command below, we check the content of the /workload-labels/labels file, which should include workload labels in JSON format. Currently, it should be empty.

# find the Pod which was created by the tenant1-rule1 rule
icp > RULE_POD=$(kubectl get pod -n tenant-tenant1 -o jsonpath='{range .items[?(@.metadata.annotations.workloadrule\.workload\.universe\.nvidia\.com/name=="tenant1-rule1")]}{ .metadata.name}{"\n"}{end}' | head -n1)
# check the file content inside the Pod
icp > kubectl exec -ti -n tenant-tenant1 $RULE_POD -- cat /workload-labels/labels; echo
{}

Tenant1 cluster

Update the labels of tenant1-pod1 in the tenant1 cluster. We expect this information to be transferred to the Pod that was created by the tenant1-rule1 WorkloadRule in the infrastructure cluster.

tenant1> kubectl label pod tenant1-pod1 foo=bar
pod/tenant1-pod1 labeled

Infrastructure cluster

Let’s check that the workload labels were injected into the Pod in the infrastructure cluster.

# find the Pod which was created by the tenant1-rule1 rule
icp > RULE_POD=$(kubectl get pod -n tenant-tenant1 -o jsonpath='{range .items[?(@.metadata.annotations.workloadrule\.workload\.universe\.nvidia\.com/name=="tenant1-rule1")]}{ .metadata.name}{"\n"}{end}' | head -n1)
# check the file content inside the Pod
icp > kubectl exec -ti -n tenant-tenant1 $RULE_POD -- cat /workload-labels/labels; echo
{"foo":"bar"}
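The injection step checked above can be pictured as a mapping from workload keys to annotation values. This is only a sketch of the idea: the function names and the dictionary shape of the workload are hypothetical, and the choice to pass strings through unchanged while serializing other values as compact JSON is an assumption inferred from the `{}` and `{"foo":"bar"}` file contents shown on this page.

```python
import json


def get_path(obj, dotted_key):
    """Resolve a dotted key such as 'state.extra.labels' in nested dicts."""
    current = obj
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def render_annotations(workload: dict, inject_spec: list) -> dict:
    """Compute annotation name -> value for each workloadInfoInject entry."""
    annotations = {}
    for item in inject_spec:
        value = get_path(workload, item["workloadKey"])
        if not isinstance(value, str):
            # Assumed encoding: non-string values become compact JSON text.
            value = json.dumps(value, separators=(",", ":"))
        annotations[item["asAnnotation"]["name"]] = value
    return annotations
```

The Pod template then exposes these annotations as files through the downwardAPI volumes defined in tenant1-rule1, which is why the label change shows up in /workload-labels/labels.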

Tenant1 cluster

Remove resourceName constraint from tenant1-rule1 WorkloadRule

kubectl patch workloadrules.workload.universe.nvidia.com -n universe --type='json' \
  -p '[{"op" : "remove","path" : "/spec/workloadTerms/0/matchExpressions/1"}]' tenant1-rule1

Now the tenant1-rule1 rule should match all Pods that are running in the default namespace in the tenant1 cluster.
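To see what the remove operation leaves behind, the patch can be applied to an in-memory copy of the rule. The sketch below implements only the JSON Patch `remove` operation for simple pointers and ignores the `~0` and `~1` escape sequences; the helper name `remove_at` is hypothetical.

```python
import copy


def remove_at(document: dict, pointer: str) -> dict:
    """Return a copy of document with the element at a JSON pointer removed."""
    result = copy.deepcopy(document)
    parts = pointer.lstrip("/").split("/")
    parent = result
    # Walk to the container that holds the element to delete.
    for part in parts[:-1]:
        parent = parent[int(part)] if isinstance(parent, list) else parent[part]
    last = parts[-1]
    if isinstance(parent, list):
        del parent[int(last)]
    else:
        del parent[last]
    return result
```

Applied to tenant1-rule1 with the pointer `/spec/workloadTerms/0/matchExpressions/1`, only the `metadata.resourceNamespace` expression remains, which is why the rule now matches every Pod in the default namespace.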

Infrastructure cluster

✓ icp> kubectl get po -n tenant-tenant1
NAME                                                 READY   STATUS    RESTARTS   AGE
tenant1-rule1-0a7c0d7f-ba7f-4301-afed-8db108dbee1a   1/1     Running   0          3m20s
tenant1-rule1-10ead831-47b6-407f-a903-c5c4cd92e6e8   1/1     Running   0          3m22s
tenant1-uvspod1                                      1/1     Running   0          51m

An additional Pod was created in the infrastructure cluster because tenant1-rule1 now matches tenant1-pod2 in the tenant1 cluster.

Tenant1 cluster

Mirror UVSPods should be created in the tenant1 cluster for tenant1-rule1-* Pods.

tenant1> kubectl get uvspods.resource.universe.nvidia.com -n universe
NAME                                                 RESULT    MESSAGE
tenant1-rule1-0a7c0d7f-ba7f-4301-afed-8db108dbee1a   success
tenant1-rule1-10ead831-47b6-407f-a903-c5c4cd92e6e8   success

Now we will remove tenant1-pod2 in the tenant1 cluster.

✓ tenant1> kubectl delete po tenant1-pod2
pod "tenant1-pod2" deleted

Infrastructure cluster

As a result, the Pod in the infrastructure cluster that was created by the tenant1-rule1 rule for the tenant1-pod2 Pod should be removed.

✓ icp> kubectl get po -n tenant-tenant1
NAME                                                 READY   STATUS    RESTARTS   AGE
tenant1-rule1-0a7c0d7f-ba7f-4301-afed-8db108dbee1a   1/1     Running   0          9m6s
tenant1-uvspod1                                      1/1     Running   0          51m
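The create and delete behaviour observed so far is consistent with a simple reconciliation model: one Pod per matching workload, named after the rule and the workload identifier. The sketch below only illustrates that model and is not the controller's actual implementation; the function name `plan_rule_pods` is hypothetical, and the `<rule>-<workload id>` naming is inferred from the Pod names in the listings above.

```python
from typing import Iterable, List, Tuple


def plan_rule_pods(
    rule_name: str, matching_ids: Iterable[str], existing_pods: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (pods to create, pods to delete) for one rule."""
    prefix = rule_name + "-"
    # Desired state: one Pod per workload that currently matches the rule.
    desired = {prefix + workload_id for workload_id in matching_ids}
    # Only Pods carrying this rule's prefix are considered owned by the rule.
    owned = {name for name in existing_pods if name.startswith(prefix)}
    return sorted(desired - owned), sorted(owned - desired)
```

In this model, deleting tenant1-pod2 removes its identifier from the matching set, so the corresponding tenant1-rule1-* Pod lands in the delete list while tenant1-uvspod1 is never touched.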

Tenant1 cluster

Remove tenant1-rule1 rule in the tenant1 cluster

✓ tenant1> kubectl delete -n universe workloadrules.workload.universe.nvidia.com tenant1-rule1
workloadrule.workload.universe.nvidia.com "tenant1-rule1" deleted

Infrastructure cluster

As a result, all Pods created by the tenant1-rule1 rule in the infrastructure cluster should be removed.

✓ icp> kubectl get po -n tenant-tenant1
NAME              READY   STATUS    RESTARTS   AGE
tenant1-uvspod1   1/1     Running   0          59m

universe.admin.workload.v1 GRPC API documentation

The current state of the clusters is as follows:

  • tenant1 cluster has the tenant1-pod1 Pod

  • tenant2 cluster has the tenant2-pod1 Pod

tenant1> kubectl get po
NAME           READY   STATUS    RESTARTS   AGE
tenant1-pod1   1/1     Running   0          3h42m
tenant2> kubectl get po
NAME           READY   STATUS    RESTARTS   AGE
tenant2-pod1   1/1     Running   0          3h41m

Now we will define an AdminWorkloadRule that matches Pods from both tenants.

Check the Manual GRPC API usage doc for instructions on how to use CloudAdmin APIs with grpcurl.

From Cloud Admin host

# put the base64-encoded Pod spec into the RULE_TEMPLATE shell variable
RULE_TEMPLATE=$(cat << EOM | base64 -w0
{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": {
    "name": "nginx"
  },
  "spec": {
    "containers": [
      {
        "name": "nginx",
        "image": "nginx:1.14.2",
        "ports": [
          {
            "containerPort": 80
          }
        ]
      }
    ]
  }
}
EOM
)
# the "-d @" argument tells grpcurl to read the request body from STDIN
# use the content of the RULE_TEMPLATE shell variable as rule.data.rule_template
grpcurl -cacert=ca.crt -cert=admin.crt -key=admin.key -servername api-gateway.local \
  -d @ -proto universe/admin/workload/v1/admin_workload_rule.proto 10.133.133.1:30001 \
  universe.admin.workload.v1.AdminWorkloadRuleService.Create << EOM
{
  "rule": {
    "id": "adminrule1",
    "tenant_match": [
      "tenant1",
      "tenant2"
    ],
    "data": {
      "orchestrator_type": 1,
      "resource_type": "v1/Pod",
      "dpu_selection_policy": "SameNode",
      "workload_terms": [
        {
          "match_expressions": [
            {
              "key": "metadata.resourceNamespace",
              "operation": 1,
              "values": [
                "default"
              ]
            }
          ]
        }
      ],
      "workload_info_inject": [
        {
          "key": "@",
          "as_annotation": {
            "name": "full-workload-info"
          }
        }
      ],
      "rule_template": "$RULE_TEMPLATE"
    }
  }
}
EOM

The command above creates an AdminWorkloadRule that matches workloads (Pods) in the default namespace in both tenant clusters. This rule should match tenant1-pod1 and tenant2-pod1 and create a Pod in the universe namespace in the infrastructure cluster for each.
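The request body for the Create call can also be assembled in a script instead of a shell here-document. The sketch below copies the field names and numeric enum values verbatim from the grpcurl example above (their meaning is defined in the proto file, which is not reproduced here); the function name `build_create_request` is hypothetical.

```python
import base64
import json


def build_create_request(rule_id: str, tenants: list, pod_template: dict) -> dict:
    """Build the JSON body for AdminWorkloadRuleService.Create as used on this page."""
    # rule_template carries the Pod manifest as base64-encoded JSON,
    # the same thing `base64 -w0` produces in the shell example.
    encoded = base64.b64encode(json.dumps(pod_template).encode("utf-8")).decode("ascii")
    return {
        "rule": {
            "id": rule_id,
            "tenant_match": list(tenants),
            "data": {
                "orchestrator_type": 1,  # value copied from the example above
                "resource_type": "v1/Pod",
                "dpu_selection_policy": "SameNode",
                "workload_terms": [
                    {
                        "match_expressions": [
                            {
                                "key": "metadata.resourceNamespace",
                                "operation": 1,  # value copied from the example above
                                "values": ["default"],
                            }
                        ]
                    }
                ],
                "workload_info_inject": [
                    {"key": "@", "as_annotation": {"name": "full-workload-info"}}
                ],
                "rule_template": encoded,
            }
        }
    }
```

Serializing the result with `json.dumps` gives text that can be piped to `grpcurl -d @` in place of the here-document.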

The AdminWorkloadRule uses "dpu_selection_policy": "SameNode", which means that the Pod created in the infrastructure cluster should start on the DPU that is installed in the host on which the tenant workload is running.
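The effect of the two selection policies that appear on this page can be sketched as follows. This is purely illustrative: the host-to-DPU mapping is an assumption based on the node names in the cluster listings (for example, host-a and dpu1-host-a), the way "Any" picks a node here is an arbitrary stand-in for whatever the scheduler really does, and the function name `select_dpu_node` is hypothetical.

```python
def select_dpu_node(policy: str, tenant_node: str, dpu_by_host: dict) -> str:
    """Pick an infrastructure node for a rule Pod under a DPU selection policy."""
    if policy == "SameNode":
        # The DPU installed in the host that runs the tenant workload.
        return dpu_by_host[tenant_node]
    if policy == "Any":
        # Assumed stand-in: any DPU is acceptable, so take the first by name.
        return sorted(dpu_by_host.values())[0]
    raise ValueError(f"unknown policy: {policy}")
```

For a workload running on host-c, the "SameNode" branch would return dpu1-host-c under the assumed mapping.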

Infrastructure cluster

icp > kubectl get po -n universe | grep adminrule1
adminrule1-tenant-tenant1-0a7c0d7f-ba7f-4301-afed-8db108dbee1a   1/1   Running   0   3m26s
adminrule1-tenant-tenant2-6c815148-a769-4271-8c4c-a9485c59cfbd   1/1   Running   0   3m26s
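The Pod names in this listing appear to combine the rule id, the tenant namespace, and a workload identifier. The sketch below splits a name along that pattern; the pattern is inferred from the two names shown here, not from a documented naming contract, it would not handle tenant names that contain hyphens, and the function name `parse_admin_pod_name` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed layout: <rule id>-<tenant namespace>-<UUID>
_ADMIN_POD = re.compile(
    r"^(?P<rule>.+)-(?P<namespace>tenant-[^-]+)-"
    r"(?P<uid>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$"
)


def parse_admin_pod_name(name: str) -> Optional[Tuple[str, str, str]]:
    """Split a Pod name into (rule id, tenant namespace, workload id), or return None."""
    match = _ADMIN_POD.match(name)
    if match is None:
        return None
    return match.group("rule"), match.group("namespace"), match.group("uid")
```

The identifier extracted for the tenant1 Pod is the same one that appears in the tenant1-rule1-* Pod name and in the workload-* object name earlier on this page.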

From Cloud Admin host

Remove the AdminWorkloadRule adminrule1 and check that the related Pods are removed from the infrastructure cluster.

grpcurl -cacert=ca.crt -cert=admin.crt -key=admin.key -servername api-gateway.local \
  -d '{"id": "adminrule1"}' \
  -proto universe/admin/workload/v1/admin_workload_rule.proto 10.133.133.1:30001 \
  universe.admin.workload.v1.AdminWorkloadRuleService.Delete

Infrastructure cluster

All Pods created by adminrule1 should be removed

icp > kubectl get po -n universe | grep adminrule1

© Copyright 2023, NVIDIA. Last updated on Feb 7, 2024.