UCX support in UCS Tools#
This document assumes that two application containers - one with a UCX server and another with a UCX client - are already available.
Sample UCX Server Microservice#
The following are the contents of the manifest file for a sample UCX server microservice:
type: msapplication
specVersion: 2.5.0
name: ucf.svc.server-ucx
chartName: server-ucx
description: default description
version: 0.0.1
displayName: ""
category:
  functional: ""
  industry: ""
tags: []
keywords: []
nSpectId: NSPECT-0000-0000
publish: false

ingress-endpoints:
  - name: ucx-server
    description: UCX server
    scheme: ucx
    data-flow: in-out # Or in or out

params:
  networks: "rdma-net-ipam"
  #> type: string
  #> description: available networks for UCX endpoint
  serviceNetwork: "rdma-net-ipam"
  #> type: string
  #> description: network to which UCX endpoint's service should be connected

---
spec:
  - name: server-ucx-deployment
    type: ucf.k8s.app.deployment
    parameters:
      apptype: stateless

  - name: "server-ucx-container"
    type: ucf.k8s.container
    parameters:
      image:
        repository: gitlab-master.nvidia.com:5005/adv-dev-team/distributed-ai/deepstream-sdk/ucf-ms
        tag: "server"
      securityContext:
        privileged: true
        capabilities:
          add:
            - CAP_SYS_PTRACE
      volumeMounts:
        - mountPath: /dev/shm
          name: dshm
        - name: graph-config
          subPath: parameters.yaml
          mountPath: /workspace/sample_graph/server-parameters.yaml

  - name: dshm
    type: ucf.k8s.volume
    parameters:
      emptyDir:
        medium: Memory

  - name: graph-config
    type: ucf.k8s.volume
    parameters:
      configMap:
        name: ucx-graph-config-server

  - name: ucx-graph-config-server
    type: ucf.k8s.configmap
    parameters:
      name: ucx-graph-config-server
      data:
        parameters.yaml: |
          ---
          components:
          - name: nv_ds_single_src_input0
            parameters:
              uri: file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h265.mp4
          name: NvDsSingleSrcInput
          ---
          components:
          - name: nv_ds_ucx_server_sink48
            parameters:
              port: 7174
              addr: 0.0.0.0
          name: NvDsUcxServerSink
          ---
          components:
          - name: gst_caps_filter77
            parameters:
              caps: video/x-raw(memory:NVMM), width=1280, height=720
          name: GstCapsFilter

  - name: server-ucx-service
    type: ucf.k8s.service
    parameters:
      labels:
        service.kubernetes.io/service-proxy-name: multus-proxy
      annotations:
        k8s.v1.cni.cncf.io/service-network: $params.serviceNetwork
      ports:
        - port: 7174
          name: ucx-server
          protocol: UDP
      clusterIP: None

  - name: podAnnotations
    type: ucf.k8s.podAnnotations
    parameters:
      annotations:
        k8s.v1.cni.cncf.io/networks: $params.networks
Salient points to note in the manifest file are:
- An Ingress endpoint with scheme ucx
- A parameter networks for the networks that should be used by the pods
- A parameter serviceNetwork for the network that should be used by the Kubernetes Service for the pods
  - Having parameters for these two configurations allows them to be set by the application developer at deployment time
- A Memory-based emptyDir volume named dshm that must be mounted at /dev/shm in containers using UCX
- A ucf.k8s.service component providing the Kubernetes Service abstraction for the UCX server, with:
  - The service.kubernetes.io/service-proxy-name label set to multus-proxy
  - The k8s.v1.cni.cncf.io/service-network annotation set to the parameter $params.serviceNetwork
  - A port named ucx-server that matches the Ingress endpoint name
  - clusterIP set to None
- A podAnnotations component with the annotation k8s.v1.cni.cncf.io/networks set to the parameter $params.networks
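For reference, the ucf.k8s.service component above renders to a Kubernetes Service along these lines. This is an illustrative sketch, not the exact output of UCS Tools; in particular the metadata name and the selector labels are assumptions:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: server-ucx-service          # assumed; the actual name is generated by UCS Tools
  labels:
    service.kubernetes.io/service-proxy-name: multus-proxy
  annotations:
    k8s.v1.cni.cncf.io/service-network: rdma-net-ipam   # value of $params.serviceNetwork
spec:
  clusterIP: None                   # headless Service: DNS resolves directly to pod IPs
  selector:
    app: server-ucx                 # assumed selector; real labels come from the deployment
  ports:
  - name: ucx-server                # matches the Ingress endpoint name
    port: 7174
    protocol: UDP
```

Because the Service is headless and proxied by multus-proxy, clients reach the server over the secondary RDMA network rather than the default cluster network.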
Sample UCX Client Microservice#
The following are the contents of the manifest file for a sample UCX client microservice:
type: msapplication
specVersion: 2.5.0
name: ucf.svc.client-ucx
chartName: client-ucx
description: default description
version: 0.0.1
displayName: ""
category:
  functional: ""
  industry: ""
tags: []
keywords: []
nSpectId: NSPECT-0000-0000
publish: false

egress-endpoints:
  - name: ucx-client
    description: Client UCX EP
    protocol: UDP
    scheme: ucx # Or grpc / rtsp / asyncio / none
    mandatory: True # Or False
    data-flow: in-out # Or in or out

params:
  networks: "rdma-net-ipam"
  #> type: string
  #> description: available networks for UCX endpoint

---
spec:
  - name: client-ucx-deployment
    type: ucf.k8s.app.deployment
    parameters:
      apptype: job

  - name: "client-ucx-container"
    type: ucf.k8s.container
    parameters:
      ucxConfig:
        name: ucx-graph-config
        mountPath: /workspace/sample_graph/client-parameters.yaml
      image:
        repository: gitlab-master.nvidia.com:5005/adv-dev-team/distributed-ai/deepstream-sdk/ucf-ms
        tag: "client"
      securityContext:
        privileged: true
        capabilities:
          add:
            - CAP_SYS_PTRACE
      volumeMounts:
        - mountPath: /dev/shm
          name: dshm

  - name: dshm
    type: ucf.k8s.volume
    parameters:
      emptyDir:
        medium: Memory

  - name: ucx-graph-config
    type: ucf.k8s.ucx-config
    parameters:
      name: ucx-graph-config
      data: |
        ---
        components:
        - name: nv_ds_ucx_client_src7
          parameters:
            addr: $egress.ucx-client.address
            port: {{ index .Values.egress "ucx-client" "port" }}
        name: NvDsUcxClientSrc
        ---
        components:
        - name: gst_caps_filter9
          parameters:
            caps: video/x-raw(memory:NVMM), format=NV12, width=1280, height=720, framerate=30/1
        name: GstCapsFilter
        ---
        components:
        - name: nv_ds_video_renderer13
          parameters:
            video-sink: 1
        name: NvDsVideoRenderer

  - name: restartPolicy
    type: ucf.k8s.restartPolicy
    parameters:
      policy: OnFailure # Always / OnFailure / Never

  - name: dnsPolicy
    type: ucf.k8s.dnsPolicy
    parameters:
      policy: ClusterFirst

  - name: podAnnotations
    type: ucf.k8s.podAnnotations
    parameters:
      annotations:
        k8s.v1.cni.cncf.io/networks: $params.networks
Salient points to note in the manifest file are:
- An Egress endpoint with scheme ucx
- A parameter networks for the networks that should be used by the pods
  - Having a parameter for this configuration allows it to be set by the application developer at deployment time
- A Memory-based emptyDir volume named dshm that must be mounted at /dev/shm in containers using UCX
- A ucf.k8s.ucx-config component named ucx-graph-config that serves as a configuration file for the application using UCX, with:
  - The configuration file contents set as a string on the data parameter of the component
  - A placeholder $egress.<egress-endpoint-name>.address (e.g. $egress.ucx-client.address in this case) used wherever the IP address of the UCX service is required
  - A placeholder {{ index .Values.egress "<egress-endpoint-name>" "port" }} (e.g. {{ index .Values.egress "ucx-client" "port" }} in this case) used wherever the port of the UCX service is required
- The configuration file mounted into the application container (client-ucx-container here) by setting the ucxConfig parameter with:
  - name set to the name of the ucf.k8s.ucx-config component (ucx-graph-config in this case)
  - mountPath set to the path where the configuration file should be mounted
- A podAnnotations component with the annotation k8s.v1.cni.cncf.io/networks set to the parameter $params.networks
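At deployment time the placeholders in the data string are resolved before the file is mounted at the ucxConfig mountPath. Assuming the server's UCX service resolves to 192.168.2.10 (a purely illustrative address) and the connected ingress port is 7174, the mounted client-parameters.yaml would look roughly like:

```yaml
---
components:
- name: nv_ds_ucx_client_src7
  parameters:
    addr: 192.168.2.10   # illustrative resolved value of $egress.ucx-client.address
    port: 7174           # resolved value of {{ index .Values.egress "ucx-client" "port" }}
name: NvDsUcxClientSrc
```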
Adding and connecting microservices with UCX endpoints in UCS Application#
Microservices with UCX endpoints can be added and connected in the same way as microservices with other endpoint types.
An example of such an application that includes microservices described in the previous sections is:
specVersion: 2.5.0
version: 0.0.1
name: ucx-app
description: Sample UCX app

dependencies:
- ucf.svc.server-ucx:0.0.1
- ucf.svc.client-ucx:0.0.1

components:
- name: server-ucx
  type: ucf.svc.server-ucx
  parameters:
    imagePullSecrets:
    - name: <image-pull-secret-name>
    networks: "rdma-net-ipam"
    serviceNetwork: rdma-net-ipam
    resources:
      limits:
        rdma/rdma_shared_device_a: 1
      requests:
        rdma/rdma_shared_device_a: 1
- name: client-ucx
  type: ucf.svc.client-ucx
  parameters:
    imagePullSecrets:
    - name: <image-pull-secret-name>
    networks: "rdma-net-ipam"
    resources:
      limits:
        rdma/rdma_shared_device_a: 1
      requests:
        rdma/rdma_shared_device_a: 1

connections:
  client-ucx/ucx-client: server-ucx/ucx-server
Salient points to note in the application are:
- The networks parameter is set on both microservices and indicates the network interfaces to be used by the pods of the microservices. (The microservices must implement a parameter for this to be settable by the application.)
- The serviceNetwork parameter is set on the server microservice and indicates the network interface to be used by the Kubernetes Service of the microservice. (The microservice must implement a parameter for this to be settable by the application.)
- A NIC resource must be assigned to the microservices using resource limits and requests (rdma/rdma_shared_device_a set to 1 in this case). The exact resource name depends on the infrastructure and network operator configuration.
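Once deployed, these parameters surface on the generated pods roughly as sketched below. Field placement is illustrative; the exact pod spec is generated by UCS Tools:

```yaml
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: rdma-net-ipam   # secondary network attached by Multus
spec:
  containers:
  - name: server-ucx-container
    resources:
      limits:
        rdma/rdma_shared_device_a: 1    # NIC resource from the RDMA shared device plugin
      requests:
        rdma/rdma_shared_device_a: 1
```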
Deploying UCS Applications with UCX#
Hardware requirements#
RDMA capable hardware: Mellanox ConnectX-5 NIC or newer
NVIDIA GPU and driver supporting GPUDirect, e.g. Quadro RTX 6000/8000, Tesla T4, or Tesla V100 (needed for GPUDirect only)
Prerequisites#
Network Operator#
Refer to the Network Operator documentation - https://docs.nvidia.com/networking/display/COKAN10/Network+Operator
- Install the Network Operator
- Enable secondary network configuration
- Enable the RDMA Shared Device Plugin
- Enable the NVIDIA OFED Driver
- Deploy a NetworkAttachmentDefinition CRD to define the RDMA device network
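A NetworkAttachmentDefinition for the RDMA network might look like the following sketch. The macvlan type, the master interface name (ens3f0), and the whereabouts IPAM configuration are assumptions; use values that match your cluster and the Network Operator documentation:

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: rdma-net-ipam            # must match the networks / serviceNetwork parameters
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "ens3f0",
      "ipam": {
        "type": "whereabouts",
        "range": "192.168.2.0/24"
      }
    }
```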
Multus Service Controller#
Install the Multus Service Controller using the following command:
kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-service/main/deploy-nft.yml
Disable the coredns cache#
Edit the coredns configmap using the following command and, if a ttl value is specified, change it to 0:
$ kubectl edit configmap coredns -n kube-system
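The ttl option lives in the kubernetes block of the Corefile inside the configmap. A typical fragment after the change looks like the following; your cluster's Corefile may contain additional plugins:

```
.:53 {
    errors
    health
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 0
    }
    forward . /etc/resolv.conf
}
```

Setting ttl to 0 disables caching of Service DNS records, so the UCX client always resolves the server Service to its current pod IP.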
Deploying the application#
Make sure that the networks and serviceNetwork parameters match the network name provided in the NetworkAttachmentDefinition CRD attached to the RDMA device. Also make sure that the RDMA device resource name (rdma_shared_device_a in the example) is correct; it is determined by the RDMA shared device plugin.
Install the application using:
helm install <release-name> <application-helm-chart>