Workload API

This page describes Workload API concepts with examples.

The Workload API provides a way to define tenant-driven or CloudAdmin-driven rules. A rule triggers the creation of a Pod resource in the infrastructure cluster when a workload (Pod) that matches the rule starts in the tenant cluster.

A workload is something that runs in the tenant cluster. For example, a workload can represent a Kubernetes Pod, an OpenStack VM, or something else. Currently, the only supported tenant orchestrator is Kubernetes, and the only supported resource type is Pod.

The Workload API is a set of APIs, split into Tenant and CloudAdmin APIs.

Tenant API

Workload GRPC API - delivers notifications about workloads running in the tenant cluster to the infrastructure cluster
WorkloadRule GRPC API - creates WorkloadRules in the infrastructure cluster
WorkloadRule CRD - simplifies use of the WorkloadRule GRPC API from the tenant cluster

CloudAdmin API

AdminWorkloadRule GRPC API - creates AdminWorkloadRules in the infrastructure cluster

Infrastructure

The following infrastructure is used for the demonstration:

Infrastructure cluster

            ✓ icp> kubectl get node
NAME          STATUS   ROLES           AGE   VERSION
dpu1-host-a   Ready    <none>          27h   v1.24.0
dpu1-host-b   Ready    <none>          27h   v1.24.0
dpu1-host-c   Ready    <none>          27h   v1.24.0
dpu1-host-d   Ready    <none>          27h   v1.24.0
icp-master    Ready    control-plane   27h   v1.24.0

Tenant1 cluster

            ✓ tenant1> kubectl get node
NAME             STATUS   ROLES           AGE   VERSION
host-a           Ready    <none>          27h   v1.24.0
host-b           Ready    <none>          27h   v1.24.0
tenant1-master   Ready    control-plane   27h   v1.24.0

Tenant2 cluster

            ✓ tenant2> kubectl get node
NAME             STATUS   ROLES           AGE   VERSION
host-c           Ready    <none>          27h   v1.24.0
host-d           Ready    <none>          27h   v1.24.0
tenant2-master   Ready    control-plane   27h   v1.24.0

Universe components

Universe components are deployed to the infrastructure and tenant clusters by following the Deployment guide.

As a result, the infrastructure cluster has a separate namespace for each tenant:

            ✓ icp> kubectl get ns | grep tenant
tenant-tenant1       Active   26h
tenant-tenant2       Active   28h

Universe components in infrastructure cluster

            ✓ icp> kubectl get po -n vault
NAME                                    READY   STATUS    RESTARTS   AGE
vault-0                                 1/1     Running   0          28h
vault-agent-injector-6fd8f84794-xqlg9   1/1     Running   0          21s

✓ icp> kubectl get po -n universe
NAME                                                        READY   STATUS    RESTARTS   AGE
icp-universe-infra-admin-controller-6c578657ff-5bggg        1/1     Running   0          32s
icp-universe-infra-api-gateway-888c7dd8b-bqgpn              2/2     Running   0          31s
icp-universe-infra-provisioning-manager-65ddd8d568-c9bzr    1/1     Running   0          32s
icp-universe-infra-resource-manager-5cfcd597bc-shk25        1/1     Running   0          32s
icp-universe-infra-workload-controller-68f7ffcc77-sfg9c     1/1     Running   0          32s
icp-universe-infra-workload-manager-58fdbd88bd-4z7l8        1/1     Running   0          31s
icp-universe-infra-workload-rule-manager-7d7686d6cc-56qfz   1/1     Running   0          31s

Tenant1 components

            ✓ tenant1> kubectl get po -n vault
NAME                                    READY   STATUS    RESTARTS   AGE
vault-agent-injector-55d7dc8c6f-67kcl   1/1     Running   0          64s
✓ tenant1> kubectl get po -n universe
NAME                                                           READY   STATUS    RESTARTS   AGE
tcp-universe-k8s-tenant-resource-plugin-8455d9cd59-dnl2q       3/3     Running   0          35s
tcp-universe-k8s-tenant-workload-plugin-857dcb4b8c-tpn8s       3/3     Running   0          35s
tcp-universe-k8s-tenant-workload-rule-plugin-f5bc8d45b-h66tq   3/3     Running   0          35s

Tenant2 components

            ✓ tenant2> kubectl get po -n vault
NAME                                    READY   STATUS    RESTARTS   AGE
vault-agent-injector-55d7dc8c6f-6xc52   1/1     Running   0          19s
✓ tenant2> kubectl get po -n universe
NAME                                                           READY   STATUS    RESTARTS   AGE
tcp-universe-k8s-tenant-resource-plugin-8455d9cd59-pqfb6       3/3     Running   0          13s
tcp-universe-k8s-tenant-workload-plugin-857dcb4b8c-5qmm5       3/3     Running   0          13s
tcp-universe-k8s-tenant-workload-rule-plugin-f5bc8d45b-pp6j2   3/3     Running   0          12s

Resource API - testing

Here we test the Universe Resource API in the tenant1 cluster.

Create UVSPod resource

            ✓ tenant1> cat << 'EOF' | tee tenant1-uvspod1.yaml
apiVersion: resource.universe.nvidia.com/v1alpha1
kind: UVSPod
metadata:
  name: tenant1-uvspod1
  namespace: universe
spec:
  object:
    apiVersion: v1
    kind: Pod
    metadata:
      name: tenant1-uvspod1
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
EOF

✓ tenant1> kubectl apply -f tenant1-uvspod1.yaml
uvspod.resource.universe.nvidia.com/tenant1-uvspod1 created

Check UVSPod status in the tenant cluster

            ✓ tenant1> kubectl get uvspods.resource.universe.nvidia.com -n universe tenant1-uvspod1
NAME              RESULT    MESSAGE
tenant1-uvspod1   success

If everything operates correctly, RESULT should be success.

Check that the Pod resource has been created in the tenant namespace in the infrastructure cluster.

            ✓ icp> kubectl get po -n tenant-tenant1
NAME              READY   STATUS    RESTARTS   AGE
tenant1-uvspod1   1/1     Running   0          4m16s

In the infrastructure cluster, we should see the tenant1-uvspod1 Pod with the spec from the UVSPod that we created in the tenant cluster.

Workload API – testing

Tenant1 cluster

Create tenant1-pod1 Pod in the default namespace in the tenant1 cluster

            ✓ tenant1> cat << 'EOF' | tee tenant1-pod1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: tenant1-pod1
spec:
  containers:
  - name: nginx
    image: nginx:1.14.2
EOF
✓ tenant1> kubectl apply -f tenant1-pod1.yaml
pod/tenant1-pod1 created

Create tenant1-pod2 Pod in the default namespace in the tenant1 cluster

            ✓ tenant1> cat << 'EOF' | tee tenant1-pod2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: tenant1-pod2
spec:
  containers:
  - name: nginx
    image: nginx:1.14.2
EOF
✓ tenant1> kubectl apply -f tenant1-pod2.yaml
pod/tenant1-pod2 created

Tenant2 cluster

Create tenant2-pod1 Pod in the default namespace in the tenant2 cluster

            ✓ tenant2> cat << 'EOF' | tee tenant2-pod1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: tenant2-pod1
spec:
  containers:
  - name: nginx
    image: nginx:1.14.2
EOF
✓ tenant2> kubectl apply -f tenant2-pod1.yaml
pod/tenant2-pod1 created

Tenant1 cluster

Create the tenant1-rule1 WorkloadRule CR in the tenant1 cluster. This rule matches a workload if it runs in the default namespace in the tenant cluster and the workload name is tenant1-pod1.

For each matching workload, the rule triggers the creation of the Pod defined in the template section of the CR (a simple nginx Pod in our case).

See the WorkloadRule CRD format description for details.

            ✓ tenant1> cat << 'EOF' | tee tenant1-rule1.yaml
apiVersion: workload.universe.nvidia.com/v1alpha1
kind: WorkloadRule
metadata:
  name: tenant1-rule1
  namespace: universe
spec:
  resourceType: v1/Pod
  workloadTerms:
  - matchExpressions:
    - key: metadata.resourceNamespace
      operator: In
      values:
      - default
    - key: metadata.resourceName
      operator: In
      values:
      - tenant1-pod1
  workloadInfoInject:
  - workloadKey: state.nodeName
    asAnnotation:
      name: tenant-node-name
  - workloadKey: state.extra.labels
    asAnnotation:
      name: tenant-workload-labels
  dpuSelectionPolicy: Any
  template:
    apiVersion: v1
    kind: Pod
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        volumeMounts:
        - name: workload-info
          mountPath: /workload-info
        - name: workload-labels
          mountPath: /workload-labels
      # standard k8s way to mount annotation as a volume
      volumes:
      - name: workload-info
        downwardAPI:
          items:
          - path: node-name
            fieldRef:
              fieldPath: metadata.annotations['tenant-node-name']
      - name: workload-labels
        downwardAPI:
          items:
          - path: labels
            fieldRef:
              fieldPath: metadata.annotations['tenant-workload-labels']
EOF
✓ tenant1> kubectl apply -f tenant1-rule1.yaml
workloadrule.workload.universe.nvidia.com/tenant1-rule1 created
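
The matching behaviour described for tenant1-rule1 can be modelled in a few lines of Python. This is only a sketch for reasoning about the rule, not the Universe implementation: the function names and the flat workload dictionary are hypothetical, only the In operator is modelled, and treating several workloadTerms entries as alternatives (OR) is an assumption borrowed from Kubernetes node selector terms. The walkthrough itself only confirms that all expressions inside one term must match.

```python
# Hypothetical model of WorkloadRule matching; not the Universe implementation.

def expression_matches(expr, workload):
    """Evaluate a single matchExpression against a flat workload dict."""
    if expr["operator"] != "In":
        raise ValueError(f"operator {expr['operator']!r} is not modelled here")
    return workload.get(expr["key"]) in expr["values"]


def rule_matches(workload_terms, workload):
    """Expressions inside a term are ANDed; terms are ORed (assumption)."""
    return any(
        all(expression_matches(e, workload) for e in term["matchExpressions"])
        for term in workload_terms
    )


# workloadTerms copied from tenant1-rule1
TENANT1_RULE1_TERMS = [
    {
        "matchExpressions": [
            {"key": "metadata.resourceNamespace", "operator": "In", "values": ["default"]},
            {"key": "metadata.resourceName", "operator": "In", "values": ["tenant1-pod1"]},
        ]
    }
]

# Flat stand-ins for the two workloads running in the tenant1 cluster
pod1 = {"metadata.resourceNamespace": "default", "metadata.resourceName": "tenant1-pod1"}
pod2 = {"metadata.resourceNamespace": "default", "metadata.resourceName": "tenant1-pod2"}

print(rule_matches(TENANT1_RULE1_TERMS, pod1))  # True
print(rule_matches(TENANT1_RULE1_TERMS, pod2))  # False
```

Under this model only tenant1-pod1 satisfies both expressions, so a single Pod is expected in the tenant-tenant1 namespace.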

Infrastructure cluster

tenant1-rule1 should match only the tenant1-pod1 Pod, so we expect a single nginx Pod to be created in the tenant-tenant1 namespace in the infrastructure cluster.

            # tenant1-uvspod1 is a pod which we created earlier
✓ icp> kubectl get po -n tenant-tenant1
NAME                                                 READY   STATUS    RESTARTS   AGE
tenant1-rule1-0a7c0d7f-ba7f-4301-afed-8db108dbee1a   1/1     Running   0          63s
tenant1-uvspod1                                      1/1     Running   0          16m
# there should be no pods in tenant-tenant2 namespace
✓ icp> kubectl get po -n tenant-tenant2
No resources found in tenant-tenant2 namespace.

You can use the following snippet to check which Pod was created by which rule:

            ✓ icp> kubectl get pods -n tenant-tenant1 -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.workloadrule\.workload\.universe\.nvidia\.com/name}{"\n"}{end}'
tenant1-rule1-0a7c0d7f-ba7f-4301-afed-8db108dbee1a  tenant1-rule1
tenant1-uvspod1
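
The same lookup can be done on JSON output when the result needs to be processed further. The sketch below assumes the input has the shape of `kubectl get pods -n tenant-tenant1 -o json`; the helper name `pods_by_rule` and the trimmed sample data are hypothetical, while the annotation key is the one used in the jsonpath query above.

```python
import json

# Annotation that the jsonpath query above reads from each Pod
RULE_ANNOTATION = "workloadrule.workload.universe.nvidia.com/name"


def pods_by_rule(pod_list_json):
    """Map each Pod name to the rule named in its annotation, or None."""
    pod_list = json.loads(pod_list_json)
    result = {}
    for pod in pod_list.get("items", []):
        metadata = pod["metadata"]
        annotations = metadata.get("annotations") or {}
        result[metadata["name"]] = annotations.get(RULE_ANNOTATION)
    return result


# Trimmed-down sample shaped like `kubectl get pods -n tenant-tenant1 -o json`
SAMPLE = json.dumps({
    "items": [
        {"metadata": {
            "name": "tenant1-rule1-0a7c0d7f-ba7f-4301-afed-8db108dbee1a",
            "annotations": {RULE_ANNOTATION: "tenant1-rule1"},
        }},
        {"metadata": {"name": "tenant1-uvspod1"}},
    ]
})

for name, rule in pods_by_rule(SAMPLE).items():
    print(f"{name}\t{rule or ''}")
```

Pods that were not created by a rule, such as tenant1-uvspod1, map to None.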

You can also inspect the workload status in the infrastructure cluster to see which rules the workload matches and which Pods were created for it.

            ✓ icp> kubectl get -n tenant-tenant1 workloads.workload.infra.universe.nvidia.com workload-0a7c0d7f-ba7f-4301-afed-8db108dbee1a -o jsonpath={.status} | jq
{
  "rules": {
    "tenant": [
      {
        "id": "tenant-tenant1/tenant1-rule1",
        "status": {
          "objRef": {
            "apiVersion": "v1",
            "kind": "Pod",
            "name": "tenant1-rule1-0a7c0d7f-ba7f-4301-afed-8db108dbee1a",
            "namespace": "tenant-tenant1"
          }
        }
      }
    ]
  }
}
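
A status document of this shape can also be summarised programmatically. The sketch below lists which object each matched rule created. The helper name `created_objects` is hypothetical, and iterating over every key under "rules" assumes that any other rule group has the same entry layout as "tenant", which the walkthrough does not confirm.

```python
import json

# Status shaped like the output above
STATUS = json.loads("""
{
  "rules": {
    "tenant": [
      {
        "id": "tenant-tenant1/tenant1-rule1",
        "status": {
          "objRef": {
            "apiVersion": "v1",
            "kind": "Pod",
            "name": "tenant1-rule1-0a7c0d7f-ba7f-4301-afed-8db108dbee1a",
            "namespace": "tenant-tenant1"
          }
        }
      }
    ]
  }
}
""")


def created_objects(status):
    """Return (rule id, 'namespace/name') pairs for every rule entry with an objRef."""
    pairs = []
    for entries in status.get("rules", {}).values():
        for entry in entries:
            ref = entry.get("status", {}).get("objRef")
            if ref:
                pairs.append((entry["id"], f"{ref['namespace']}/{ref['name']}"))
    return pairs


print(created_objects(STATUS))
```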

Tenant1 cluster

Update the Pod template in the tenant1-rule1 WorkloadRule

            kubectl patch workloadrules.workload.universe.nvidia.com -n universe --type='json' \
 -p '[{"op" : "replace","path" : "/spec/template/spec/containers/0/name", "value": "updated"}]' tenant1-rule1

Infrastructure cluster

The Pod in the infrastructure cluster should be recreated with the updated spec. Its container should now be named updated.

            ✓ icp> kubectl get pods -n tenant-tenant1 -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].name}{"\n"}{end}'
tenant1-rule1-0a7c0d7f-ba7f-4301-afed-8db108dbee1a  updated
tenant1-uvspod1     nginx

Now let’s check that workload info injection works as expected. The tenant1-rule1 WorkloadRule has a section that configures the injection of workload labels into the Pod created in the infra cluster. The command below prints the content of the /workload-labels/labels file, which should contain the workload labels in JSON format. At this point it should be an empty JSON object, because tenant1-pod1 has no labels yet.

# find the Pod which was created by the tenant1-rule1 rule
icp > RULE_POD=$(kubectl get pod -n tenant-tenant1 -o jsonpath='{range .items[?(@.metadata.annotations.workloadrule\.workload\.universe\.nvidia\.com/name=="tenant1-rule1")]}{ .metadata.name}{"\n"}{end}' | head -n1)
# check file content inside the POD
icp > kubectl exec -ti -n tenant-tenant1 $RULE_POD -- cat /workload-labels/labels; echo
{}

Tenant1 cluster

Update the labels of tenant1-pod1 in the tenant1 cluster. We expect this information to be propagated to the Pod that was created by the tenant1-rule1 WorkloadRule in the infrastructure cluster.

            tenant1> kubectl label pod tenant1-pod1 foo=bar
pod/tenant1-pod1 labeled

Infrastructure cluster

Let’s check that the workload labels were injected into the Pod in the infrastructure cluster

# find the Pod which was created by the tenant1-rule1 rule
icp > RULE_POD=$(kubectl get pod -n tenant-tenant1 -o jsonpath='{range .items[?(@.metadata.annotations.workloadrule\.workload\.universe\.nvidia\.com/name=="tenant1-rule1")]}{ .metadata.name}{"\n"}{end}' | head -n1)
# check file content inside the POD
icp > kubectl exec -ti -n tenant-tenant1 $RULE_POD -- cat /workload-labels/labels; echo
{"foo":"bar"}
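
A process inside the infrastructure Pod could consume the injected labels by parsing this file. The following is a minimal sketch, assuming the file holds a single JSON object as shown in the output above. The helper name `read_workload_labels` is hypothetical, and a temporary file stands in for /workload-labels/labels so that the example runs anywhere.

```python
import json
import tempfile
from pathlib import Path


def read_workload_labels(path):
    """Parse the injected labels file; an empty or missing file means no labels."""
    file = Path(path)
    if not file.exists():
        return {}
    text = file.read_text().strip()
    return json.loads(text) if text else {}


# Demo with a temporary file standing in for /workload-labels/labels
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "labels"
    demo.write_text('{"foo":"bar"}')
    labels = read_workload_labels(demo)
    print(labels)
```

Because the file is backed by a downward API volume, its content changes when the annotation changes, so a long-running process would need to re-read it to see updated labels.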

Tenant1 cluster

Remove the resourceName constraint from the tenant1-rule1 WorkloadRule

            kubectl patch workloadrules.workload.universe.nvidia.com -n universe --type='json' \
 -p '[{"op" : "remove","path" : "/spec/workloadTerms/0/matchExpressions/1"}]' tenant1-rule1

Now the tenant1-rule1 rule should match all Pods running in the default namespace in the tenant1 cluster.
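
To see why the two JSON patches used in this walkthrough have these effects, the sketch below applies them to a trimmed copy of the rule. It is a simplified illustration of the replace and remove operations only, not a complete JSON Patch implementation: the function `apply_patch` is hypothetical, and path escapes and other operations are not handled.

```python
import copy


def apply_patch(document, operations):
    """Apply 'replace' and 'remove' JSON Patch operations to a copy of document."""
    result = copy.deepcopy(document)
    for op in operations:
        # "/a/0/b" -> ["a", "0", "b"]; "~0" and "~1" escapes are not handled
        *parents, last = op["path"].lstrip("/").split("/")
        target = result
        for token in parents:
            # Numeric tokens index into lists, other tokens are dict keys
            target = target[int(token)] if isinstance(target, list) else target[token]
        key = int(last) if isinstance(target, list) else last
        if op["op"] == "replace":
            target[key] = op["value"]
        elif op["op"] == "remove":
            del target[key]
        else:
            raise ValueError(f"unsupported op: {op['op']}")
    return result


# Trimmed copy of tenant1-rule1 with only the fields the patches touch
rule = {
    "spec": {
        "workloadTerms": [
            {"matchExpressions": [
                {"key": "metadata.resourceNamespace", "operator": "In", "values": ["default"]},
                {"key": "metadata.resourceName", "operator": "In", "values": ["tenant1-pod1"]},
            ]}
        ],
        "template": {"spec": {"containers": [{"name": "nginx", "image": "nginx:1.14.2"}]}},
    }
}

patched = apply_patch(rule, [
    {"op": "replace", "path": "/spec/template/spec/containers/0/name", "value": "updated"},
    {"op": "remove", "path": "/spec/workloadTerms/0/matchExpressions/1"},
])

remaining = [e["key"] for e in patched["spec"]["workloadTerms"][0]["matchExpressions"]]
print(patched["spec"]["template"]["spec"]["containers"][0]["name"])  # updated
print(remaining)  # ['metadata.resourceNamespace']
```

Index 1 in the remove path refers to the second matchExpression, the one on metadata.resourceName, so only the namespace constraint remains.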

Infrastructure cluster

            ✓ icp> kubectl get po -n tenant-tenant1
NAME                                                 READY   STATUS    RESTARTS   AGE
tenant1-rule1-0a7c0d7f-ba7f-4301-afed-8db108dbee1a   1/1     Running   0          3m20s
tenant1-rule1-10ead831-47b6-407f-a903-c5c4cd92e6e8   1/1     Running   0          3m22s
tenant1-uvspod1                                      1/1     Running   0          51m

An additional Pod was created in the infrastructure cluster because tenant1-rule1 now also matches tenant1-pod2 in the tenant1 cluster.

Tenant1 cluster

Mirror UVSPods should be created in the tenant1 cluster for tenant1-rule1-* Pods.

            tenant1> kubectl get uvspods.resource.universe.nvidia.com  -n universe
NAME                                                 RESULT    MESSAGE
tenant1-rule1-0a7c0d7f-ba7f-4301-afed-8db108dbee1a   success
tenant1-rule1-10ead831-47b6-407f-a903-c5c4cd92e6e8   success

Now we will remove tenant1-pod2 in the tenant1 cluster.

            ✓ tenant1> kubectl delete po tenant1-pod2
pod "tenant1-pod2" deleted

Infrastructure cluster

As a result, the Pod that the tenant1-rule1 rule created in the infrastructure cluster for the tenant1-pod2 Pod should be removed.

            ✓ icp> kubectl get po -n tenant-tenant1
NAME                                                 READY   STATUS    RESTARTS   AGE
tenant1-rule1-0a7c0d7f-ba7f-4301-afed-8db108dbee1a   1/1     Running   0          9m6s
tenant1-uvspod1                                      1/1     Running   0          51m

Tenant1 cluster

Remove the tenant1-rule1 rule in the tenant1 cluster

            ✓ tenant1> kubectl delete -n universe workloadrules.workload.universe.nvidia.com tenant1-rule1
workloadrule.workload.universe.nvidia.com "tenant1-rule1" deleted

Infrastructure cluster

As a result, all Pods created by the tenant1-rule1 rule in the infrastructure cluster should be removed.

            ✓ icp> kubectl get po -n tenant-tenant1
NAME              READY   STATUS    RESTARTS   AGE
tenant1-uvspod1   1/1     Running   0          59m

AdminWorkload API – testing

universe.admin.workload.v1 GRPC API documentation

The current state of the clusters is as follows:

* tenant1 cluster has the tenant1-pod1 Pod
* tenant2 cluster has the tenant2-pod1 Pod

            tenant1> kubectl get po
NAME           READY   STATUS    RESTARTS   AGE
tenant1-pod1   1/1     Running   0          3h42m

tenant2> kubectl get po
NAME           READY   STATUS    RESTARTS   AGE
tenant2-pod1   1/1     Running   0          3h41m

Now we will define an AdminWorkloadRule that matches Pods from both tenants.

See the Manual GRPC API usage doc for instructions on how to use CloudAdmin APIs with grpcurl.

From Cloud Admin host

# put the base64-encoded Pod spec into the RULE_TEMPLATE shell variable
RULE_TEMPLATE=$(cat << EOM | base64 -w0
{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": {
    "name": "nginx"
  },
  "spec": {
    "containers": [
      {
        "name": "nginx",
        "image": "nginx:1.14.2",
        "ports": [
          {
            "containerPort": 80
          }
        ]
      }
    ]
  }
}
EOM
)

# the -d @ argument tells grpcurl to read the request body from STDIN
# use the content of the RULE_TEMPLATE shell variable as rule.data.rule_template
grpcurl -cacert=ca.crt -cert=admin.crt -key=admin.key -servername api-gateway.local \
    -d @ -proto universe/admin/workload/v1/admin_workload_rule.proto 10.133.133.1:30001 \
     universe.admin.workload.v1.AdminWorkloadRuleService.Create << EOM
{
  "rule": {
    "id": "adminrule1",
    "tenant_match": [
      "tenant1", "tenant2"
    ],
    "data": {
      "orchestrator_type": 1,
      "resource_type": "v1/Pod",
      "dpu_selection_policy": "SameNode",
      "workload_terms": [
        {
          "match_expressions": [
            {
              "key": "metadata.resourceNamespace",
              "operation": 1,
              "values": [
                "default"
              ]
            }
          ]
        }
      ],
      "workload_info_inject": [
        {
          "key": "@",
          "as_annotation": {
            "name": "full-workload-info"
          }
        }
      ],
      "rule_template": "$RULE_TEMPLATE"
    }
  }
}
EOM

The command above creates an AdminWorkloadRule that matches workloads (Pods) in the default namespace in both tenant clusters. This rule should match tenant1-pod1 and tenant2-pod1 and create a Pod in the universe namespace in the infrastructure cluster for each of them.

The AdminWorkloadRule uses "dpu_selection_policy": "SameNode", which means that the Pod created in the infrastructure cluster should start on the DPU installed in the host on which the tenant workload is running.
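
The request body can also be assembled programmatically instead of with shell heredocs. The sketch below builds the same JSON as the grpcurl example, with the Pod spec base64-encoded into rule_template. The helper names `encode_template` and `build_create_request` are hypothetical; all field names and values are copied from the example above.

```python
import base64
import json

# Pod spec used as the rule template, same content as RULE_TEMPLATE above
pod_template = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "nginx"},
    "spec": {"containers": [{"name": "nginx", "image": "nginx:1.14.2",
                             "ports": [{"containerPort": 80}]}]},
}


def encode_template(template):
    """Serialize the Pod spec to JSON and base64-encode it without line breaks."""
    return base64.b64encode(json.dumps(template).encode("utf-8")).decode("ascii")


def build_create_request(rule_id, tenants, template):
    """Assemble a request body with the same fields as the grpcurl example."""
    return {
        "rule": {
            "id": rule_id,
            "tenant_match": tenants,
            "data": {
                "orchestrator_type": 1,
                "resource_type": "v1/Pod",
                "dpu_selection_policy": "SameNode",
                "workload_terms": [{"match_expressions": [
                    {"key": "metadata.resourceNamespace", "operation": 1, "values": ["default"]}
                ]}],
                "workload_info_inject": [
                    {"key": "@", "as_annotation": {"name": "full-workload-info"}}
                ],
                "rule_template": encode_template(template),
            },
        }
    }


request = build_create_request("adminrule1", ["tenant1", "tenant2"], pod_template)
# Decode again to confirm the template survives the round trip
decoded = json.loads(base64.b64decode(request["rule"]["data"]["rule_template"]))
print(json.dumps(request, indent=2))
```

The printed JSON could then be passed to grpcurl on STDIN with -d @, in place of the heredoc.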

Infrastructure cluster

            icp > kubectl get po -n universe  | grep adminrule1
adminrule1-tenant-tenant1-0a7c0d7f-ba7f-4301-afed-8db108dbee1a   1/1     Running   0          3m26s
adminrule1-tenant-tenant2-6c815148-a769-4271-8c4c-a9485c59cfbd   1/1     Running   0          3m26s

From Cloud Admin host

Remove the AdminWorkloadRule adminrule1 and check that the related Pods are removed from the infrastructure cluster

            grpcurl -cacert=ca.crt -cert=admin.crt -key=admin.key -servername api-gateway.local \
    -d '{"id": "adminrule1"}' \
    -proto universe/admin/workload/v1/admin_workload_rule.proto 10.133.133.1:30001 \
     universe.admin.workload.v1.AdminWorkloadRuleService.Delete

Infrastructure cluster

All Pods created by adminrule1 should be removed, so the following command should return no output.

            icp > kubectl get po -n universe  | grep adminrule1