UCX support in UCF

This document assumes that two application containers - one with a UCX server and another with a UCX client - are already available.

Sample UCX Server Microservice

Following are the contents of the manifest file for a sample UCX server microservice:

type: msapplication
specVersion: 2.0.0
name: ucf.svc.server-ucx
chartName: server-ucx
description: default description
version: 0.0.1
displayName: ""
category:
  functional: ""
  industry: ""
tags: []
keywords: []
nSpectId: NSPECT-0000-0000

publish: false


ingress-endpoints:
  - name: ucx-server
    description: UCX server
    scheme: ucx
    data-flow: in-out # Or in or out


params:
  networks: "rdma-net-ipam"
  #> type: string
  #> description: available networks for UCX endpoint
  serviceNetwork: "rdma-net-ipam"
  #> type: string
  #> description: network to which UCX endpoint's service should be connected


---
spec:
  - name: server-ucx-deployment
    type: ucf.k8s.app.deployment
    parameters:
      apptype: stateless

  - name: "server-ucx-container"
    type: ucf.k8s.container
    parameters:
      image:
        repository: gitlab-master.nvidia.com:5005/adv-dev-team/distributed-ai/deepstream-sdk/ucf-ms
        tag: "server"
      securityContext:
        privileged: true
        capabilities:
          add:
          - CAP_SYS_PTRACE
      volumeMounts:
      - mountPath: /dev/shm
        name: dshm
      - name: graph-config
        subPath: parameters.yaml
        mountPath: /workspace/sample_graph/server-parameters.yaml

  - name: dshm
    type: ucf.k8s.volume
    parameters:
      emptyDir:
        medium: Memory

  - name: graph-config
    type: ucf.k8s.volume
    parameters:
      configMap:
        name: ucx-graph-config-server

  - name: ucx-graph-config-server
    type: ucf.k8s.configmap
    parameters:
      name: ucx-graph-config-server
      data:
        parameters.yaml: |
           ---
           components:
           - name: nv_ds_single_src_input0
             parameters:
               uri: file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h265.mp4
           name: NvDsSingleSrcInput
           ---
           components:
           - name: nv_ds_ucx_server_sink48
             parameters:
               port: 7174
               addr: 0.0.0.0
           name: NvDsUcxServerSink
           ---
           components:
           - name: gst_caps_filter77
             parameters:
               caps: video/x-raw(memory:NVMM), width=1280, height=720
           name: GstCapsFilter

  - name: server-ucx-service
    type: ucf.k8s.service
    parameters:
      labels:
        service.kubernetes.io/service-proxy-name: multus-proxy
      annotations:
        k8s.v1.cni.cncf.io/service-network: $params.serviceNetwork
      ports:
      - port: 7174
        name: ucx-server
        protocol: UDP
      clusterIP: None

  - name: podAnnotations
    type: ucf.k8s.podAnnotations
    parameters:
      annotations:
        k8s.v1.cni.cncf.io/networks: $params.networks

Salient points to note in the manifest file are:

  • An Ingress endpoint with scheme ucx

  • A parameter networks for the networks that should be used by the pods

  • A parameter serviceNetwork for the network that should be used by the Kubernetes service for the pods

  • Exposing these two values as parameters allows the application developer to set them at deployment time

  • A Memory-backed emptyDir volume named dshm that must be mounted at /dev/shm in containers using UCX

  • A ucf.k8s.service component providing the Kubernetes Service abstraction for the UCX server (a sketch of the rendered Service follows this list) with:

    • service.kubernetes.io/service-proxy-name label set to multus-proxy

    • k8s.v1.cni.cncf.io/service-network annotation set to the parameter $params.serviceNetwork

    • A port with name ucx-server that matches the Ingress endpoint name

    • clusterIP set to None

  • A podAnnotations component with the annotation k8s.v1.cni.cncf.io/networks set to the parameter $params.networks
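
For reference, the ucf.k8s.service component above renders to a headless Kubernetes Service along the lines of the following sketch. The metadata name and selector are generated by UCF and are shown here only as hypothetical values:

apiVersion: v1
kind: Service
metadata:
  name: server-ucx-service                             # hypothetical generated name
  labels:
    service.kubernetes.io/service-proxy-name: multus-proxy
  annotations:
    k8s.v1.cni.cncf.io/service-network: rdma-net-ipam  # from $params.serviceNetwork
spec:
  clusterIP: None       # headless; DNS resolves directly to pod IPs on the secondary network
  ports:
  - name: ucx-server    # matches the ingress endpoint name
    port: 7174
    protocol: UDP
  selector:
    app: server-ucx     # hypothetical generated pod selector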

Sample UCX Client Microservice

Following are the contents of the manifest file for a sample UCX client microservice:

type: msapplication
specVersion: 2.0.0
name: ucf.svc.client-ucx
chartName: client-ucx
description: default description
version: 0.0.1
displayName: ""
category:
  functional: ""
  industry: ""
tags: []
keywords: []
nSpectId: NSPECT-0000-0000

publish: false

egress-endpoints:
  - name: ucx-client
    description: Client UCX EP
    protocol: UDP
    scheme: ucx # Or grpc / rtsp / asyncio / none
    mandatory: True # Or False
    data-flow: in-out # Or in or out

params:
  networks: "rdma-net-ipam"
  #> type: string
  #> description: available networks for UCX endpoint

---
spec:
  - name: client-ucx-deployment
    type: ucf.k8s.app.deployment
    parameters:
      apptype: job

  - name: "client-ucx-container"
    type: ucf.k8s.container
    parameters:
      ucxConfig:
        name: ucx-graph-config
        mountPath: /workspace/sample_graph/client-parameters.yaml
      image:
        repository: gitlab-master.nvidia.com:5005/adv-dev-team/distributed-ai/deepstream-sdk/ucf-ms
        tag: "client"
      securityContext:
        privileged: true
        capabilities:
          add:
          - CAP_SYS_PTRACE
      volumeMounts:
      - mountPath: /dev/shm
        name: dshm

  - name: dshm
    type: ucf.k8s.volume
    parameters:
      emptyDir:
        medium: Memory

  - name: ucx-graph-config
    type: ucf.k8s.ucx-config
    parameters:
      name: ucx-graph-config
      data: |
           ---
           components:
           - name: nv_ds_ucx_client_src7
             parameters:
               addr: $egress.ucx-client.address
               port: {{ index .Values.egress "ucx-client" "port" }}
           name: NvDsUcxClientSrc
           ---
           components:
           - name: gst_caps_filter9
             parameters:
               caps: video/x-raw(memory:NVMM), format=NV12, width=1280, height=720, framerate=30/1
           name: GstCapsFilter
           ---
           components:
           - name: nv_ds_video_renderer13
             parameters:
               video-sink: 1
           name: NvDsVideoRenderer

  - name: restartPolicy
    type: ucf.k8s.restartPolicy
    parameters:
      policy: OnFailure # Always / OnFailure / Never

  - name: dnsPolicy
    type: ucf.k8s.dnsPolicy
    parameters:
      policy: ClusterFirst

  - name: podAnnotations
    type: ucf.k8s.podAnnotations
    parameters:
      annotations:
        k8s.v1.cni.cncf.io/networks: $params.networks

Salient points to note in the manifest file are:

  • An Egress endpoint with scheme ucx

  • A parameter networks for the networks that should be used by the pods

  • Exposing this value as a parameter allows the application developer to set it at deployment time

  • A Memory-backed emptyDir volume named dshm that must be mounted at /dev/shm in containers using UCX

  • A ucf.k8s.ucx-config component named ucx-graph-config that provides the configuration file for the application using UCX, where:

    • The configuration file contents must be set as a string on the data parameter of the component

    • A placeholder $egress.<egress-endpoint-name>.address (e.g. $egress.ucx-client.address in this case) must be used wherever the IP address of the UCX service is required

    • A placeholder {{ index .Values.egress "<egress-endpoint-name>" "port" }} (e.g. {{ index .Values.egress "ucx-client" "port" }} in this case) must be used wherever the port of the UCX service is required (a rendered example follows this list)

  • Mount the configuration file into the application container (client-ucx-container here) by setting the ucxConfig parameter with:

    • name set to name of ucf.k8s.ucx-config component (ucx-graph-config in this case)

    • mountPath set to the path where the configuration file should be mounted

  • A podAnnotations component with the annotation k8s.v1.cni.cncf.io/networks set to the parameter $params.networks
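
To illustrate the placeholder substitution, once the client's egress endpoint is connected to the server endpoint (see the application in the next section), the rendered client-parameters.yaml looks roughly like the sketch below; the address shown is hypothetical and is filled in by UCF with the actual address of the server's service:

---
components:
- name: nv_ds_ucx_client_src7
  parameters:
    addr: 192.168.2.10   # hypothetical; substituted for $egress.ucx-client.address
    port: 7174           # substituted for {{ index .Values.egress "ucx-client" "port" }}
name: NvDsUcxClientSrc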

Adding and connecting microservices with UCX endpoints in a UCF application

Microservices with UCX endpoints can be added and connected in the same way as microservices with other endpoint types.

An example of such an application, composed of the microservices described in the previous sections, is:

specVersion: 2.0.0
version: 0.0.1
name: ucx-app
description: Sample UCX app

dependencies:
- ucf.svc.server-ucx:0.0.1
- ucf.svc.client-ucx:0.0.1

components:
- name: server-ucx
  type: ucf.svc.server-ucx
  parameters:
    imagePullSecrets:
    - name: <image-pull-secret-name>

    networks: "rdma-net-ipam"
    serviceNetwork: rdma-net-ipam

    resources:
      limits:
        rdma/rdma_shared_device_a: 1
      requests:
        rdma/rdma_shared_device_a: 1

- name: client-ucx
  type: ucf.svc.client-ucx
  parameters:
    imagePullSecrets:
    - name: <image-pull-secret-name>

    networks: "rdma-net-ipam"

    resources:
      limits:
        rdma/rdma_shared_device_a: 1
      requests:
        rdma/rdma_shared_device_a: 1

connections:
  client-ucx/ucx-client: server-ucx/ucx-server

Salient points to note in the application are:

  • A networks parameter set on both microservices, indicating the network interfaces to be used by the pods of the microservices. (The microservices must implement a parameter for this to be set by the application.)

  • A serviceNetwork parameter set on the server microservice, indicating the network interface to be used by the Kubernetes Service of the microservice. (The microservice must implement a parameter for this to be set by the application.)

  • A NIC resource must be assigned to the microservices using resource limits and requests (e.g. rdma/rdma_shared_device_a set to 1 in this case). The exact resource name depends on the infrastructure and Network Operator configuration, as shown in the sketch after this list.
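
The resource name itself is defined by the RDMA shared device plugin configuration deployed via the Network Operator. A minimal sketch of such a plugin configuration, with assumed values for the interface selector and device count:

{
  "configList": [
    {
      "resourceName": "rdma_shared_device_a",
      "rdmaHcaMax": 63,
      "selectors": {
        "ifNames": ["ens1f0"]
      }
    }
  ]
}

Resources defined this way are advertised to Kubernetes under the rdma/ prefix, which is why the application requests rdma/rdma_shared_device_a.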

Deploying UCF Applications with UCX

Hardware requirements

  • RDMA-capable hardware: Mellanox ConnectX-5 NIC or newer

  • NVIDIA GPU and driver supporting GPUDirect, e.g. Quadro RTX 6000/8000, Tesla T4, or Tesla V100 (GPUDirect only)

Prerequisites

Network Operator

Refer to the Network Operator documentation - https://docs.nvidia.com/networking/display/COKAN10/Network+Operator

  • Install the Network Operator

  • Enable secondary network configuration

  • Enable RDMA Shared Device Plugin

  • Enable NVIDIA OFED Driver

  • Deploy a NetworkAttachmentDefinition CRD to define the RDMA device network (a sample definition follows this list)
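
As an illustration, a NetworkAttachmentDefinition for the RDMA device network could look like the following sketch, assuming a macvlan secondary network on interface ens1f0 with the whereabouts IPAM plugin; adapt the CNI type, master interface, and IP range to your cluster:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: rdma-net-ipam   # must match the networks/serviceNetwork parameters
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "ens1f0",
      "ipam": {
        "type": "whereabouts",
        "range": "192.168.2.0/24"
      }
    }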

Multus Service Controller

Install the Multus Service Controller using the following command:

kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-service/main/deploy-nft.yml
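
To verify that the controller is up before proceeding, list its pods (the exact pod names depend on the manifest version):

kubectl get pods -A | grep multus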

Disable the CoreDNS cache

Edit the coredns ConfigMap using the following command and change the ttl to 0 if one is specified:

$ kubectl edit configmap coredns -n kube-system
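
The ttl option, if present, appears in the kubernetes block of the Corefile stored in the ConfigMap. With caching disabled, the block looks roughly like this (other options left as found):

    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 0
    }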

Deploying the application

Make sure that the networks and serviceNetwork parameters match the network name provided in the NetworkAttachmentDefinition CRD attached to the RDMA device. Also make sure the RDMA device resource name (rdma_shared_device_a in the example) is correct; it is determined by the RDMA shared device plugin configuration.
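
For example, the following commands (substitute your node name) list the available secondary networks and show the resources a node advertises:

kubectl get network-attachment-definitions -A

kubectl get node <node-name> -o jsonpath='{.status.allocatable}'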

Install the application using:

helm install <release-name> <application-helm-chart>
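
For example, assuming the application chart was packaged as ucx-app-0.0.1.tgz (a hypothetical file name):

helm install ucx-app ucx-app-0.0.1.tgz

Once installed, kubectl get pods should show the server deployment pod running and the client job pod starting.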