UCX support in UCS Tools#
This document assumes that two application containers - one with a UCX server and another with a UCX client - are already available.
Sample UCX Server Microservice#
The following is the manifest file for a sample UCX server microservice:
type: msapplication
specVersion: 2.5.0
name: ucf.svc.server-ucx
chartName: server-ucx
description: default description
version: 0.0.1
displayName: ""
category:
  functional: ""
  industry: ""
tags: []
keywords: []
nSpectId: NSPECT-0000-0000
publish: false
ingress-endpoints:
  - name: ucx-server
    description: UCX server
    scheme: ucx
    data-flow: in-out # Or in or out
params:
  networks: "rdma-net-ipam"
  #> type: string
  #> description: available networks for UCX endpoint
  serviceNetwork: "rdma-net-ipam"
  #> type: string
  #> description: network to which UCX endpoint's service should be connected
---
spec:
  - name: server-ucx-deployment
    type: ucf.k8s.app.deployment
    parameters:
      apptype: stateless
  - name: "server-ucx-container"
    type: ucf.k8s.container
    parameters:
      image:
        repository: gitlab-master.nvidia.com:5005/adv-dev-team/distributed-ai/deepstream-sdk/ucf-ms
        tag: "server"
      securityContext:
        privileged: true
        capabilities:
          add:
          - CAP_SYS_PTRACE
      volumeMounts:
      - mountPath: /dev/shm
        name: dshm
      - name: graph-config
        subPath: parameters.yaml
        mountPath: /workspace/sample_graph/server-parameters.yaml
  - name: dshm
    type: ucf.k8s.volume
    parameters:
      emptyDir:
        medium: Memory
  - name: graph-config
    type: ucf.k8s.volume
    parameters:
      configMap:
        name: ucx-graph-config-server
  - name: ucx-graph-config-server
    type: ucf.k8s.configmap
    parameters:
      name: ucx-graph-config-server
      data:
        parameters.yaml: |
           ---
           components:
           - name: nv_ds_single_src_input0
             parameters:
                    uri: file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h265.mp4
           name: NvDsSingleSrcInput
           ---
           components:
           - name: nv_ds_ucx_server_sink48
             parameters:
                    port: 7174
                    addr: 0.0.0.0
           name: NvDsUcxServerSink
           ---
           components:
           - name: gst_caps_filter77
             parameters:
               caps: video/x-raw(memory:NVMM), width=1280, height=720
           name: GstCapsFilter
  - name: server-ucx-service
    type: ucf.k8s.service
    parameters:
      labels:
        service.kubernetes.io/service-proxy-name: multus-proxy
      annotations:
        k8s.v1.cni.cncf.io/service-network: $params.serviceNetwork
      ports:
      - port: 7174
        name: ucx-server
        protocol: UDP
      clusterIP: None
  - name: podAnnotations
    type: ucf.k8s.podAnnotations
    parameters:
      annotations:
        k8s.v1.cni.cncf.io/networks: $params.networks
Salient points to note in the manifest file are:
- An ingress endpoint with scheme ucx
- A parameter, networks, for the networks that should be used by the pods
- A parameter, serviceNetwork, for the network that should be used by the Kubernetes service for the pods
- Exposing these two settings as parameters allows the application developer to set them at deployment time
- A memory-based emptyDir volume named dshm that must be mounted at /dev/shm in containers using UCX
- A ucf.k8s.service component that provides the Kubernetes service abstraction for the UCX server, with:
  - the service.kubernetes.io/service-proxy-name label set to multus-proxy
  - the k8s.v1.cni.cncf.io/service-network annotation set to the parameter $params.serviceNetwork
  - a port named ucx-server, matching the ingress endpoint name
  - clusterIP set to None
- A podAnnotations component with the annotation k8s.v1.cni.cncf.io/networks set to the parameter $params.networks
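To make the service settings above more concrete, the following is a rough sketch of the plain Kubernetes Service that such a ucf.k8s.service component could correspond to. It is not the output of UCS Tools: the metadata name and the selector label are hypothetical, and the object actually rendered may differ. The label, annotation, port, and clusterIP values are taken from the manifest, with $params.serviceNetwork replaced by its default value.

```yaml
# Illustrative only: an assumed plain-Kubernetes equivalent of the
# server-ucx-service component, not actual UCS Tools output.
apiVersion: v1
kind: Service
metadata:
  name: server-ucx-service          # hypothetical name
  labels:
    # Tells kube-proxy to ignore this service so that the Multus
    # service controller can handle it instead
    service.kubernetes.io/service-proxy-name: multus-proxy
  annotations:
    # Secondary network on which the service is exposed
    # (default value of $params.serviceNetwork)
    k8s.v1.cni.cncf.io/service-network: rdma-net-ipam
spec:
  clusterIP: None                   # headless service
  selector:
    app: server-ucx                 # hypothetical selector label
  ports:
  - name: ucx-server                # matches the ingress endpoint name
    port: 7174
    protocol: UDP
```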
Sample UCX Client Microservice#
The following is the manifest file for a sample UCX client microservice:
type: msapplication
specVersion: 2.5.0
name: ucf.svc.client-ucx
chartName: client-ucx
description: default description
version: 0.0.1
displayName: ""
category:
  functional: ""
  industry: ""
tags: []
keywords: []
nSpectId: NSPECT-0000-0000
publish: false
egress-endpoints:
  - name: ucx-client
    description: Client UCX EP
    protocol: UDP
    scheme: ucx # Or grpc / rtsp / asyncio / none
    mandatory: True # Or False
    data-flow: in-out # Or in or out
params:
  networks: "rdma-net-ipam"
  #> type: string
  #> description: available networks for UCX endpoint
---
spec:
  - name: client-ucx-deployment
    type: ucf.k8s.app.deployment
    parameters:
      apptype: job
  - name: "client-ucx-container"
    type: ucf.k8s.container
    parameters:
      ucxConfig:
        name: ucx-graph-config
        mountPath: /workspace/sample_graph/client-parameters.yaml
      image:
        repository: gitlab-master.nvidia.com:5005/adv-dev-team/distributed-ai/deepstream-sdk/ucf-ms
        tag: "client"
      securityContext:
        privileged: true
        capabilities:
          add:
          - CAP_SYS_PTRACE
      volumeMounts:
      - mountPath: /dev/shm
        name: dshm
  - name: dshm
    type: ucf.k8s.volume
    parameters:
      emptyDir:
        medium: Memory
  - name: ucx-graph-config
    type: ucf.k8s.ucx-config
    parameters:
      name: ucx-graph-config
      data: |
           ---
           components:
           - name: nv_ds_ucx_client_src7
             parameters:
                    addr: $egress.ucx-client.address
                    port: {{ index .Values.egress "ucx-client" "port" }}
           name: NvDsUcxClientSrc
           ---
           components:
           - name: gst_caps_filter9
             parameters:
               caps: video/x-raw(memory:NVMM), format=NV12, width=1280, height=720, framerate=30/1
           name: GstCapsFilter
           ---
           components:
           - name: nv_ds_video_renderer13
             parameters:
               video-sink: 1
           name: NvDsVideoRenderer
  - name: restartPolicy
    type: ucf.k8s.restartPolicy
    parameters:
      policy: OnFailure # Always / OnFailure / Never
  - name: dnsPolicy
    type: ucf.k8s.dnsPolicy
    parameters:
      policy: ClusterFirst
  - name: podAnnotations
    type: ucf.k8s.podAnnotations
    parameters:
      annotations:
        k8s.v1.cni.cncf.io/networks: $params.networks
Salient points to note in the manifest file are:
- An egress endpoint with scheme ucx
- A parameter, networks, for the networks that should be used by the pods
- Exposing this setting as a parameter allows the application developer to set it at deployment time
- A memory-based emptyDir volume named dshm that must be mounted at /dev/shm in containers using UCX
- A ucf.k8s.ucx-config component named ucx-graph-config, which can be used as a configuration file for the application using UCX, with:
  - The configuration file contents must be set as a string on the data parameter of the component
  - The placeholder $egress.<egress-endpoint-name>.address (e.g. $egress.ucx-client.address in this case) must be used wherever the IP address of the UCX service is required
  - The placeholder {{ index .Values.egress "<egress-endpoint-name>" "port" }} (e.g. {{ index .Values.egress "ucx-client" "port" }} in this case) must be used wherever the port of the UCX service is required
- Mount the configuration file into the application container (client-ucx-container here) by setting the ucxConfig parameter with:
  - name set to the name of the ucf.k8s.ucx-config component (ucx-graph-config in this case)
  - mountPath set to the path where the configuration file should be mounted
- A podAnnotations component with the annotation k8s.v1.cni.cncf.io/networks set to the parameter $params.networks
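The port placeholder is an ordinary Helm template expression: index looks up the key ucx-client under .Values.egress and then the key port under that entry. The index function is needed because a hyphenated key such as ucx-client cannot be written in dotted form. A minimal sketch of the values structure that the expression reads is shown below. The value 7174 is only an assumption borrowed from the sample server's port, and any other keys that may exist under the endpoint are not shown.

```yaml
# Sketch of the Helm values shape read by
#   {{ index .Values.egress "ucx-client" "port" }}
# 7174 is assumed from the sample server manifest; the actual value
# comes from the application's Helm values.
egress:
  ucx-client:
    port: 7174
```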
Adding and connecting microservices with UCX endpoints in UCS Application#
Microservices with UCX endpoints can be added and connected in the same way as microservices with other endpoint types.
An example of such an application that includes microservices described in the previous sections is:
specVersion: 2.5.0
version: 0.0.1
name: ucx-app
description: Sample UCX app
dependencies:
- ucf.svc.server-ucx:0.0.1
- ucf.svc.client-ucx:0.0.1
components:
- name: server-ucx
  type: ucf.svc.server-ucx
  parameters:
    imagePullSecrets:
    - name: <image-pull-secret-name>
    networks: "rdma-net-ipam"
    serviceNetwork: rdma-net-ipam
    resources:
      limits:
        rdma/rdma_shared_device_a: 1
      requests:
        rdma/rdma_shared_device_a: 1
- name: client-ucx
  type: ucf.svc.client-ucx
  parameters:
    imagePullSecrets:
    - name: <image-pull-secret-name>
    networks: "rdma-net-ipam"
    resources:
      limits:
        rdma/rdma_shared_device_a: 1
      requests:
        rdma/rdma_shared_device_a: 1
connections:
  client-ucx/ucx-client: server-ucx/ucx-server
Salient points to note in the application are:
- The networks parameter is set on both microservices and indicates the network interfaces to be used by the pods of the microservices. (The microservices must implement a parameter for this to be set by the application.)
- The serviceNetwork parameter is set on the server microservice and indicates the network interface to be used by the Kubernetes Service of the microservice. (The microservice must implement a parameter for this to be set by the application.)
- A NIC resource must be assigned to the microservices using resource limits and requests (e.g. rdma/rdma_shared_device_a set to 1 in this case). The exact resource depends on the infrastructure and the Network Operator configuration.
Deploying UCS Applications with UCX#
Hardware requirements#
- RDMA-capable hardware: Mellanox ConnectX-5 NIC or newer
- NVIDIA GPU and driver supporting GPUDirect, e.g. Quadro RTX 6000/8000, Tesla T4, or Tesla V100 (GPUDirect only)
Pre-requisites#
Network Operator#
Refer to the Network Operator documentation - https://docs.nvidia.com/networking/display/COKAN10/Network+Operator
- Install the Network Operator 
- Enable secondary network configuration 
- Enable RDMA Shared Device Plugin 
- Enable NVIDIA OFED Driver 
- Deploy a NetworkAttachmentDefinition CRD to define the RDMA device network 
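As an illustration of the last step, a NetworkAttachmentDefinition for the rdma-net-ipam network used in this document might look like the sketch below. Only the network name and the RDMA resource name come from the examples here. The CNI type, the host interface ens2f0, the IPAM plugin, and the IP range are placeholder assumptions that must be adapted to the cluster. The Network Operator can also generate such an object from its own network resources, so treat this as a picture of the end result and follow the Network Operator documentation for the supported procedure.

```yaml
# Illustrative sketch; CNI type, interface name, IPAM plugin and IP range
# are assumptions, not values from this document.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: rdma-net-ipam               # must match the networks / serviceNetwork parameters
  annotations:
    # Ties the network to the resource advertised by the RDMA shared device plugin
    k8s.v1.cni.cncf.io/resourceName: rdma/rdma_shared_device_a
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "ens2f0",
      "mode": "bridge",
      "ipam": {
        "type": "whereabouts",
        "range": "192.168.2.0/24"
      }
    }
```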
Multus Service Controller#
Install the Multus Service Controller using the following command:
kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-service/main/deploy-nft.yml
Disable the coredns cache#
Edit the coredns ConfigMap using the following command and, if a ttl value is specified, change it to 0:
kubectl edit configmap coredns -n kube-system
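For orientation, the relevant part of a coredns ConfigMap is sketched below with the ttl already set to 0. The surrounding Corefile contents are an assumption based on a commonly seen default and vary between clusters. Only the ttl line inside the kubernetes block is the setting being changed here.

```yaml
# Excerpt of a coredns ConfigMap. Everything except the ttl line is an
# assumed, commonly seen default and may differ in your cluster.
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        # ... other plugins unchanged ...
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           # changed from a non-zero value such as 30
           ttl 0
        }
        # ... other plugins unchanged ...
    }
```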
Deploying the application#
Make sure that the networks and serviceNetwork parameters match the network name provided in the NetworkAttachmentDefinition CRD
that is attached to the RDMA device. Also make sure the RDMA device resource name (rdma_shared_device_a in the example) is
correct. This name is determined by the RDMA Shared Device Plugin configuration.
Install the application using:
helm install <release-name> <application-helm-chart>