UCX support in UCS Tools
This document assumes that two application containers - one with a UCX server and another with a UCX client - are already available.
Sample UCX Server Microservice
The following are the contents of the manifest file for a sample UCX server microservice:
type: msapplication
specVersion: 2.5.0
name: ucf.svc.server-ucx
chartName: server-ucx
description: default description
version: 0.0.1
displayName: ""
category:
  functional: ""
  industry: ""
tags: []
keywords: []
nSpectId: NSPECT-0000-0000
publish: false

ingress-endpoints:
  - name: ucx-server
    description: UCX server
    scheme: ucx
    data-flow: in-out # Or in or out

params:
  networks: "rdma-net-ipam"
  #> type: string
  #> description: available networks for UCX endpoint
  serviceNetwork: "rdma-net-ipam"
  #> type: string
  #> description: network to which UCX endpoint's service should be connected
---
spec:
  - name: server-ucx-deployment
    type: ucf.k8s.app.deployment
    parameters:
      apptype: stateless

  - name: "server-ucx-container"
    type: ucf.k8s.container
    parameters:
      image:
        repository: gitlab-master.nvidia.com:5005/adv-dev-team/distributed-ai/deepstream-sdk/ucf-ms
        tag: "server"
      securityContext:
        privileged: true
        capabilities:
          add:
            - CAP_SYS_PTRACE
      volumeMounts:
        - mountPath: /dev/shm
          name: dshm
        - name: graph-config
          subPath: parameters.yaml
          mountPath: /workspace/sample_graph/server-parameters.yaml

  - name: dshm
    type: ucf.k8s.volume
    parameters:
      emptyDir:
        medium: Memory

  - name: graph-config
    type: ucf.k8s.volume
    parameters:
      configMap:
        name: ucx-graph-config-server

  - name: ucx-graph-config-server
    type: ucf.k8s.configmap
    parameters:
      name: ucx-graph-config-server
      data:
        parameters.yaml: |
          ---
          components:
          - name: nv_ds_single_src_input0
            parameters:
              uri: file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h265.mp4
          name: NvDsSingleSrcInput
          ---
          components:
          - name: nv_ds_ucx_server_sink48
            parameters:
              port: 7174
              addr: 0.0.0.0
          name: NvDsUcxServerSink
          ---
          components:
          - name: gst_caps_filter77
            parameters:
              caps: video/x-raw(memory:NVMM), width=1280, height=720
          name: GstCapsFilter

  - name: server-ucx-service
    type: ucf.k8s.service
    parameters:
      labels:
        service.kubernetes.io/service-proxy-name: multus-proxy
      annotations:
        k8s.v1.cni.cncf.io/service-network: $params.serviceNetwork
      ports:
        - port: 7174
          name: ucx-server
          protocol: UDP
      clusterIP: None

  - name: podAnnotations
    type: ucf.k8s.podAnnotations
    parameters:
      annotations:
        k8s.v1.cni.cncf.io/networks: $params.networks
Salient points to note in the manifest file are:
- An Ingress endpoint with scheme ucx.
- A parameter networks for the networks that should be used by the pods.
- A parameter serviceNetwork for the network to which the Kubernetes service for the pods should be connected.
- Exposing these two configurations as parameters allows the application developer to set them at deployment time.
- A memory-backed emptyDir volume named dshm that must be mounted at /dev/shm in containers using UCX.
- A ucf.k8s.service component providing the Kubernetes service abstraction for the UCX server, with:
  - the service.kubernetes.io/service-proxy-name label set to multus-proxy
  - the k8s.v1.cni.cncf.io/service-network annotation set to the parameter $params.serviceNetwork
  - a port named ucx-server that matches the Ingress endpoint name
  - clusterIP set to None
- A podAnnotations component with the annotation k8s.v1.cni.cncf.io/networks set to the parameter $params.networks.
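For illustration, the server-ucx-service component above corresponds to a headless Kubernetes Service roughly like the following sketch. The pod selector and any additional labels are generated by UCS Tools and are shown here only as assumptions:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: server-ucx-service
  labels:
    service.kubernetes.io/service-proxy-name: multus-proxy  # handled by multus-proxy instead of kube-proxy
  annotations:
    k8s.v1.cni.cncf.io/service-network: rdma-net-ipam       # from $params.serviceNetwork (default value)
spec:
  clusterIP: None          # headless: DNS resolves directly to pod addresses on the secondary network
  ports:
    - name: ucx-server     # matches the Ingress endpoint name
      port: 7174
      protocol: UDP
  selector:
    app: server-ucx        # hypothetical; the actual selector comes from the generated chart
```

Because the service is headless and proxied by multus-proxy, clients resolve it to pod addresses on the RDMA secondary network rather than a cluster IP.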
Sample UCX Client Microservice
The following are the contents of the manifest file for a sample UCX client microservice:
type: msapplication
specVersion: 2.5.0
name: ucf.svc.client-ucx
chartName: client-ucx
description: default description
version: 0.0.1
displayName: ""
category:
  functional: ""
  industry: ""
tags: []
keywords: []
nSpectId: NSPECT-0000-0000
publish: false

egress-endpoints:
  - name: ucx-client
    description: Client UCX EP
    protocol: UDP
    scheme: ucx # Or grpc / rtsp / asyncio / none
    mandatory: True # Or False
    data-flow: in-out # Or in or out

params:
  networks: "rdma-net-ipam"
  #> type: string
  #> description: available networks for UCX endpoint
---
spec:
  - name: client-ucx-deployment
    type: ucf.k8s.app.deployment
    parameters:
      apptype: job

  - name: "client-ucx-container"
    type: ucf.k8s.container
    parameters:
      ucxConfig:
        name: ucx-graph-config
        mountPath: /workspace/sample_graph/client-parameters.yaml
      image:
        repository: gitlab-master.nvidia.com:5005/adv-dev-team/distributed-ai/deepstream-sdk/ucf-ms
        tag: "client"
      securityContext:
        privileged: true
        capabilities:
          add:
            - CAP_SYS_PTRACE
      volumeMounts:
        - mountPath: /dev/shm
          name: dshm

  - name: dshm
    type: ucf.k8s.volume
    parameters:
      emptyDir:
        medium: Memory

  - name: ucx-graph-config
    type: ucf.k8s.ucx-config
    parameters:
      name: ucx-graph-config
      data: |
        ---
        components:
        - name: nv_ds_ucx_client_src7
          parameters:
            addr: $egress.ucx-client.address
            port: {{ index .Values.egress "ucx-client" "port" }}
        name: NvDsUcxClientSrc
        ---
        components:
        - name: gst_caps_filter9
          parameters:
            caps: video/x-raw(memory:NVMM), format=NV12, width=1280, height=720, framerate=30/1
        name: GstCapsFilter
        ---
        components:
        - name: nv_ds_video_renderer13
          parameters:
            video-sink: 1
        name: NvDsVideoRenderer

  - name: restartPolicy
    type: ucf.k8s.restartPolicy
    parameters:
      policy: OnFailure # Always / OnFailure / Never

  - name: dnsPolicy
    type: ucf.k8s.dnsPolicy
    parameters:
      policy: ClusterFirst

  - name: podAnnotations
    type: ucf.k8s.podAnnotations
    parameters:
      annotations:
        k8s.v1.cni.cncf.io/networks: $params.networks
Salient points to note in the manifest file are:
- An Egress endpoint with scheme ucx.
- A parameter networks for the networks that should be used by the pods.
- Exposing this configuration as a parameter allows the application developer to set it at deployment time.
- A memory-backed emptyDir volume named dshm that must be mounted at /dev/shm in containers using UCX.
- A ucf.k8s.ucx-config component named ucx-graph-config, which can be used as a configuration file for the application using UCX, with:
  - the configuration file contents set as a string on the data parameter of the component
  - a placeholder $egress.<egress-endpoint-name>.address (e.g. $egress.ucx-client.address in this case) used wherever the IP address of the UCX service is required
  - a placeholder {{ index .Values.egress "<egress-endpoint-name>" "port" }} (e.g. {{ index .Values.egress "ucx-client" "port" }} in this case) used wherever the port of the UCX service is required
- The configuration file is mounted into the application container (client-ucx-container here) by setting the ucxConfig parameter with:
  - name set to the name of the ucf.k8s.ucx-config component (ucx-graph-config in this case)
  - mountPath set to the path where the configuration file should be mounted
- A podAnnotations component with the annotation k8s.v1.cni.cncf.io/networks set to the parameter $params.networks.
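To make the placeholder mechanism concrete, here is a sketch of what the mounted client-parameters.yaml could contain after deployment, assuming the client's ucx-client egress endpoint is connected to the server's ucx-server ingress endpoint. The address shown is purely illustrative; the actual value is whatever the server's service resolves to on the RDMA network:

```yaml
---
components:
- name: nv_ds_ucx_client_src7
  parameters:
    addr: 192.168.2.10   # illustrative; substituted for $egress.ucx-client.address
    port: 7174           # substituted for {{ index .Values.egress "ucx-client" "port" }}
name: NvDsUcxClientSrc
```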
Adding and connecting microservices with UCX endpoints in a UCS Application
Microservices with UCX endpoints can be added and connected in the same way as microservices with other endpoint types.
An example of such an application, using the microservices described in the previous sections, is:
specVersion: 2.5.0
version: 0.0.1
name: ucx-app
description: Sample UCX app
dependencies:
- ucf.svc.server-ucx:0.0.1
- ucf.svc.client-ucx:0.0.1
components:
- name: server-ucx
  type: ucf.svc.server-ucx
  parameters:
    imagePullSecrets:
    - name: <image-pull-secret-name>
    networks: "rdma-net-ipam"
    serviceNetwork: rdma-net-ipam
    resources:
      limits:
        rdma/rdma_shared_device_a: 1
      requests:
        rdma/rdma_shared_device_a: 1
- name: client-ucx
  type: ucf.svc.client-ucx
  parameters:
    imagePullSecrets:
    - name: <image-pull-secret-name>
    networks: "rdma-net-ipam"
    resources:
      limits:
        rdma/rdma_shared_device_a: 1
      requests:
        rdma/rdma_shared_device_a: 1
connections:
  client-ucx/ucx-client: server-ucx/ucx-server
Salient points to note in the application are:
- The networks parameter is set on both microservices and indicates the network interfaces to be used by their pods. (The microservices must implement a parameter for this to be settable by the application.)
- The serviceNetwork parameter is set on the server microservice and indicates the network interface to be used by the Kubernetes Service of the microservice. (The microservice must implement a parameter for this to be settable by the application.)
- A NIC resource must be assigned to the microservices using resource limits and requests (rdma/rdma_shared_device_a set to 1 in this case). The resource name depends on the infrastructure and network operator configuration.
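The connection entry client-ucx/ucx-client: server-ucx/ucx-server is what lets UCS Tools fill the client's egress placeholders. Conceptually, it populates values of roughly the following shape for the client chart; the key layout mirrors the {{ index .Values.egress ... }} placeholder, and the address value is an assumption about how the server's service is exposed:

```yaml
egress:
  ucx-client:
    address: server-ucx-service   # hypothetical resolved name/IP of the server's service
    port: 7174                    # the port of the matching ucx-server ingress endpoint
```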
Deploying UCS Applications with UCX
Hardware requirements
- RDMA-capable hardware: Mellanox ConnectX-5 NIC or newer
- NVIDIA GPU and driver supporting GPUDirect, e.g. Quadro RTX 6000/8000, Tesla T4, or Tesla V100 (GPUDirect only)
Prerequisites
Network Operator
Refer to the Network Operator documentation: https://docs.nvidia.com/networking/display/COKAN10/Network+Operator
- Install the Network Operator
- Enable secondary network configuration
- Enable the RDMA Shared Device Plugin
- Enable the NVIDIA OFED Driver
- Deploy a NetworkAttachmentDefinition CRD to define the RDMA device network
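As an illustration of the last step, a NetworkAttachmentDefinition for an RDMA shared device could look like the sketch below. The interface name (ibs1f0), CNI type, and IPAM range are assumptions that must be adapted to the actual cluster; what must line up with the rest of this document is the metadata name (rdma-net-ipam, used by the networks/serviceNetwork parameters) and the resource annotation (rdma/rdma_shared_device_a, from the RDMA shared device plugin configuration):

```yaml
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: rdma-net-ipam
  annotations:
    k8s.v1.cni.cncf.io/resourceName: rdma/rdma_shared_device_a
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "rdma-net-ipam",
      "type": "macvlan",
      "master": "ibs1f0",
      "ipam": {
        "type": "whereabouts",
        "range": "192.168.2.0/24"
      }
    }
```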
Multus Service Controller
Install the Multus Service Controller using the following command:
kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-service/main/deploy-nft.yml
Disable the coredns cache
Edit the coredns configmap using the following command and, if a ttl is specified, change it to 0:
$ kubectl edit configmap coredns -n kube-system
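After the edit, the kubernetes block of the Corefile inside the configmap should look similar to the following fragment, with the cache TTL set to 0 (other plugins omitted):

```
.:53 {
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 0
    }
}
```

A TTL of 0 ensures that clients always receive fresh pod addresses for the headless UCX service instead of stale cached records.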
Deploying the application
Make sure that the networks and serviceNetwork parameters match the network name provided in the NetworkAttachmentDefinition CRD that is attached to the RDMA device. Also make sure the RDMA device resource name (rdma_shared_device_a in the example) is correct; it is determined by the RDMA shared device plugin configuration.
Install the application using:
helm install <release-name> <application-helm-chart>