RDG for Deploying Media Streaming Applications using Rivermax, DeepStream over Accelerated K8s Cluster

Created on June 15, 2022.

Scope

This Reference Deployment Guide (RDG) describes the deployment of NVIDIA Rivermax and DeepStream streaming applications over an accelerated Kubernetes (K8s) cluster.

Abbreviations and Acronyms

Term - Definition

CDN - Content Delivery Network

CNI - Container Network Interface

CR - Custom Resource

CRD - Custom Resource Definition

CRI - Container Runtime Interface

DHCP - Dynamic Host Configuration Protocol

DNS - Domain Name System

DP - Device Plugin

DS - DeepStream

IPAM - IP Address Management

K8s - Kubernetes

LLDP - Link Layer Discovery Protocol

NCCL - NVIDIA Collective Communication Library

NFD - Node Feature Discovery

OCI - Open Container Initiative

PF - Physical Function

QSG - Quick Start Guide

RDG - Reference Deployment Guide

RDMA - Remote Direct Memory Access

RoCE - RDMA over Converged Ethernet

SR-IOV - Single Root Input/Output Virtualization

VF - Virtual Function

Introduction

This guide covers the complete solution cycle of a K8s cluster deployment, including a technology overview, design, component selection, deployment steps, and application workload examples. The solution is built on standard servers, with NVIDIA end-to-end Ethernet infrastructure carrying the workload traffic.
In this guide, we use the NVIDIA GPU Operator and the NVIDIA Network Operator, which manage the deployment and configuration of the GPU and network components in the K8s cluster. These components allow you to accelerate workloads using CUDA, RDMA, and GPUDirect technologies.

This guide shows the design of a K8s cluster with two K8s worker nodes and provides detailed instructions for deploying a K8s cluster.

A Greenfield deployment is assumed for this guide.

The information presented is written for experienced Media and Entertainment (M&E) broadcast system administrators, system engineers, and solution architects who need to deploy Rivermax streaming applications for their customers.

Solution Architecture

Key Components and Technologies

  • NVIDIA DGX A100

    NVIDIA DGX™ A100 is the universal system for all AI workloads, offering unprecedented compute density, performance, and flexibility in the world’s first 5 petaFLOPS AI system. NVIDIA DGX A100 features the world’s most advanced accelerator, the NVIDIA A100 Tensor Core GPU, enabling enterprises to consolidate training, inference, and analytics into a unified, easy-to-deploy AI infrastructure that includes direct access to NVIDIA AI experts.

  • NVIDIA ConnectX SmartNICs
    10/25/40/50/100/200 and 400G Ethernet Network Adapters
    The industry-leading NVIDIA® ConnectX® family of smart network interface cards (SmartNICs) offer advanced hardware offloads and accelerations.
    NVIDIA Ethernet adapters enable the highest ROI and lowest Total Cost of Ownership for hyperscale, public and private clouds, storage, machine learning, AI, big data, and telco platforms.

  • NVIDIA LinkX Cables

    The NVIDIA® LinkX® product family of cables and transceivers provides the industry's most complete line of 10, 25, 40, 50, 100, 200, and 400GbE Ethernet and 100, 200, and 400Gb/s InfiniBand products for Cloud, HPC, hyperscale, Enterprise, telco, storage, and artificial intelligence data center applications.

  • NVIDIA Spectrum Ethernet Switches

    Flexible form-factors with 16 to 128 physical ports, supporting 1GbE through 400GbE speeds.
    Based on a ground-breaking silicon technology optimized for performance and scalability, NVIDIA Spectrum switches are ideal for building high-performance, cost-effective, and efficient Cloud Data Center Networks, Ethernet Storage Fabric, and Deep Learning Interconnects.
    NVIDIA combines the benefits of NVIDIA Spectrum switches, based on an industry-leading application-specific integrated circuit (ASIC) technology, with a wide variety of modern network operating system choices, including NVIDIA Cumulus® Linux, SONiC, and NVIDIA Onyx®.

  • Kubernetes
    Kubernetes is an open-source container orchestration platform for deployment automation, scaling, and management of containerized applications.

  • Kubespray
    Kubespray is a composition of Ansible playbooks, inventory, provisioning tools, and domain knowledge for generic OS and Kubernetes cluster configuration management tasks, and provides:

    • A highly available cluster

    • Composable attributes

    • Support for most popular Linux distributions

  • NVIDIA GPU Operator

    The NVIDIA GPU Operator uses the operator framework within Kubernetes to automate the management of all NVIDIA software components needed to provision GPUs. These components include the NVIDIA drivers (to enable CUDA), the Kubernetes device plugin for GPUs, the NVIDIA Container Runtime, automatic node labelling, DCGM-based monitoring, and more.

  • NVIDIA Network Operator

    Analogous to the NVIDIA GPU Operator, the NVIDIA Network Operator simplifies scale-out network design for Kubernetes by automating aspects of network deployment and configuration that would otherwise require manual work. It loads the required drivers, libraries, device plugins, and CNIs on any cluster node with an NVIDIA network interface. Paired with the NVIDIA GPU Operator, the Network Operator enables GPUDirect RDMA, a key technology that accelerates cloud-native AI workloads by orders of magnitude. The NVIDIA Network Operator uses Kubernetes CRDs and the Operator Framework to provision the host software needed for enabling accelerated networking.

  • NVIDIA CUDA

    CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs. In GPU-accelerated applications, the sequential part of the workload runs on the CPU – which is optimized for single-threaded performance – while the compute-intensive portion of the application runs on thousands of GPU cores in parallel.

  • NVIDIA Rivermax SDK
    NVIDIA Rivermax offers a unique IP-based solution for any media and data streaming use case. Rivermax together with NVIDIA GPU accelerated computing technologies unlocks innovation for a wide range of applications in Media and Entertainment (M&E), Broadcast, Healthcare, Smart Cities and more. Rivermax leverages NVIDIA ConnectX and BlueField DPU hardware streaming acceleration technology that enables direct data transfers to and from the GPU, delivering best-in-class throughput and latency with minimal CPU utilization for streaming workloads.

  • NVIDIA DeepStream SDK
    NVIDIA DeepStream allows the rapid development and deployment of Vision AI applications and services. DeepStream provides multi-platform, scalable, TLS-encrypted security that can be deployed on-premises, on the edge, and in the cloud. It delivers a complete streaming analytics toolkit for AI-based multi-sensor processing and video, audio, and image understanding. DeepStream is principally aimed at vision AI developers, software partners, startups, and OEMs building IVA applications and services.

  • Networked Media Open Specifications (NMOS)
    NMOS specifications are a family of open, free-of-charge specifications that enable interoperability between media devices on an IP infrastructure. The core specifications, IS-04 Registration and Discovery and IS-05 Device Connection Management, provide uniform mechanisms to enable media devices and services to advertise their capabilities onto the network, and control systems to configure the video, audio and data streams between the devices' senders and receivers. NMOS is extensible and, for example, includes specifications for audio channel mapping, for exchange of event and tally information, and for securing the APIs, leveraging IT best practices. There are open-source NMOS implementations available, and NVIDIA provides a free NMOS Node library in the DeepStream SDK.

Logical Design

The logical design includes the following parts:

  • Deployment node running Kubespray that deploys Kubernetes cluster

  • K8s Master node running all Kubernetes management components

  • K8s Worker nodes with NVIDIA GPUs and NVIDIA ConnectX-6 Dx adapters

  • High-speed Ethernet fabric (Secondary K8s network)

  • Deployment and K8s Management networks

sol.png

Application Logical Design

In this guide, the following applications are deployed:

  1. Rivermax Media node

  2. NMOS registry controller

  3. DeepStream gateway

  4. Time synchronization service

  5. VNC apps for internal GUI access

apps.png

Software Stack Components

soft.png

Bill of Materials

The following hardware setup is utilized in this guide to build a K8s cluster with two K8s Worker Nodes.

Warning

You can use any suitable hardware according to the network topology and software stack.

bom.png

Deployment and Configuration

Network / Fabric

This RDG describes K8s cluster deployment with multiple K8s Worker Nodes.

The high-performance network is a secondary network for the Kubernetes cluster and requires an L2 network topology.

The Deployment/Management network topology and the DNS/DHCP network services are part of the IT infrastructure. Their installation and configuration procedures are not covered in this guide.

Network IP Configuration

Below are the server names with their relevant network configurations.

Server/Switch Type | Server/Switch Name | High-Speed Network IP and NICs | Management Network IP and NICs

Deployment node | depserver | N/A | eth0: DHCP, 192.168.100.202

K8s Master node | node1 | N/A | eth0: DHCP, 192.168.100.29

K8s Worker Node1 | node2 | enp57s0f0: no IP set | eth0: DHCP, 192.168.100.34

K8s Worker Node2 | node3 | enp57s0f0: no IP set | eth0: DHCP, 192.168.100.39

High-speed switch | switch | N/A | mgmt0: DHCP, 192.168.100.49

enpXXs0f0 high-speed network interfaces do not require additional configuration.

Wiring

On each K8s Worker Node, only the first port of the NVIDIA network adapter is wired to an NVIDIA switch in the high-performance fabric using NVIDIA LinkX DAC cables.

The below figure illustrates the required wiring for building a K8s cluster.

network.png

Fabric Configuration

Switch configuration is provided below:

Switch console

##
## Running database "initial"
## Generated at 2022/05/10 15:49:25 +0200
## Hostname: switch
## Product release: 3.9.3202
##

##
## Running-config temporary prefix mode setting
##
no cli default prefix-modes enable

##
## Interface Ethernet configuration
##
interface ethernet 1/1-1/32 speed 100GxAuto force
interface ethernet 1/1-1/32 switchport mode hybrid

##
## VLAN configuration
##
vlan 2
vlan 1001
vlan 2 name "RiverData"
vlan 1001 name "PTP"
interface ethernet 1/1-1/32 switchport hybrid allowed-vlan all
interface ethernet 1/5 switchport access vlan 1001
interface ethernet 1/7 switchport access vlan 1001
interface ethernet 1/5 switchport hybrid allowed-vlan add 2
interface ethernet 1/7 switchport hybrid allowed-vlan add 2

##
## STP configuration
##
no spanning-tree

##
## L3 configuration
##
interface vlan 1001
interface vlan 1001 ip address 172.20.0.1/24 primary

##
## IGMP Snooping configuration
##
ip igmp snooping unregistered multicast forward-to-mrouter-ports
ip igmp snooping
vlan 1001 ip igmp snooping
vlan 1001 ip igmp snooping querier
interface ethernet 1/5 ip igmp snooping fast-leave
interface ethernet 1/7 ip igmp snooping fast-leave

##
## Local user account configuration
##
username admin password 7 $6$mSW1WwYI$M5xfvsphrTRht6J2ByfF.J475tq8YuGKR6K1FwSgvkdb1QQFZbx/PtqK.GVJEBoMcmXsnB57QycP7jSp.Hy/Q.
username monitor password 7 $6$V/Og9kzY$qc.oU2Ma9MPJClZlbvymOrb1wtE0N5yfQYPamhzRYeN2npVY/lOE5iisHUpxNqm3Ku8lIWDTPiO/bklyCMi2o.

##
## AAA remote server configuration
##
# ldap bind-password ********
ldap vrf default enable
radius-server vrf default enable
# radius-server key ********
tacacs-server vrf default enable
# tacacs-server key ********

##
## Password restriction configuration
##
no password hardening enable

##
## SNMP configuration
##
snmp-server vrf default enable

##
## Network management configuration
##
# web proxy auth basic password ********
clock timezone Asia Middle_East Jerusalem
ntp vrf default disable
terminal sysrq enable
web vrf default enable

##
## PTP protocol
##
protocol ptp
ptp priority1 1
ptp vrf default enable
interface ethernet 1/5 ptp enable
interface ethernet 1/7 ptp enable
interface vlan 1001 ptp enable

##
## X.509 certificates configuration
##
#
# Certificate name system-self-signed, ID ca9888a2ed650c5c4bd372c055bdc6b4da65eb1e
# (public-cert config omitted since private-key config is hidden)

##
## Persistent prefix mode setting
##
cli default prefix-modes enable

Host

General Configuration

General Prerequisites:

  • Hardware

    Ensure that all the K8s Worker Nodes have the same hardware specification (see the BoM for details).

  • Host BIOS

    Verify that an SR-IOV-capable server platform is used, and review the BIOS settings in the server platform vendor documentation to enable SR-IOV in the BIOS.

  • Host OS

    The Ubuntu Server 20.04 operating system should be installed on all servers with OpenSSH server packages.

  • Experience with Kubernetes

    Familiarization with the Kubernetes Cluster architecture is essential.

Important

Make sure that the BIOS settings on the K8s Worker Nodes are tuned for maximum performance.

All K8s Worker Nodes must have the same PCIe placement for the NIC and must expose the same interface name.
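
You can confirm identical NIC placement and interface naming by comparing the adapter PCIe address and netdev names on each Worker Node. A minimal check could look like this (the adapters in this setup report as Mellanox devices):

Worker Node console

# List NVIDIA (Mellanox) adapters with their PCIe addresses
$ lspci -D | grep -i mellanox
# Show which network interface names are backed by which PCIe devices
$ ls -l /sys/class/net/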

Host OS Prerequisites

Ensure that a non-root depuser account is created during the deployment of the Ubuntu Server 20.04 operating system.

Update the Ubuntu software packages by running the following commands:

Server console

$ sudo apt-get update
$ sudo apt-get install linux-image-lowlatency -y
$ sudo apt-get upgrade -y
$ sudo reboot

Grant the non-root depuser account sudo privileges without a password.

In this solution, the following line was added to the end of /etc/sudoers:

Server console

$ sudo vim /etc/sudoers

#includedir /etc/sudoers.d
#K8s cluster deployment user with sudo privileges without password
depuser ALL=(ALL) NOPASSWD:ALL

OFED Installation and Configuration

OFED installation is required only on the K8s Worker Nodes. To download the latest OFED version please visit Linux Drivers (nvidia.com).
Download and installation procedures are provided below. All steps must be performed with root privileges.
After OFED installation, please reboot your node.

Server console

wget https://content.mellanox.com/ofed/MLNX_OFED-5.5-1.0.3.2/MLNX_OFED_LINUX-5.5-1.0.3.2-ubuntu20.04-x86_64.iso
mount -o loop ./MLNX_OFED_LINUX-5.5-1.0.3.2-ubuntu20.04-x86_64.iso /mnt/
/mnt/mlnxofedinstall --vma --without-fw-update
reboot
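
Optionally, after the reboot you can verify the installed OFED version and the RDMA-device-to-netdev mapping on each Worker Node:

Worker Node console

# Print the installed MLNX_OFED version
ofed_info -s
# Map RDMA devices to network interface names and link state
ibdev2netdev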

K8s Cluster Deployment

The Kubernetes cluster in this solution is installed using Kubespray with a non-root depuser account from the deployment node.

SSH Private Key and SSH Passwordless Login

Log in to the Deployment Node as the deployment user (in our case, depuser) and create an SSH key pair for configuring passwordless authentication by running the following command:

Deployment Node console

$ ssh-keygen

Generating public/private rsa key pair.
Enter file in which to save the key (/home/depuser/.ssh/id_rsa):
Created directory '/home/depuser/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/depuser/.ssh/id_rsa
Your public key has been saved in /home/depuser/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:IfcjdT/spXVHVd3n6wm1OmaWUXGuHnPmvqoXZ6WZYl0 depuser@depserver
The key's randomart image is:
+---[RSA 3072]----+
| *|
| .*|
| . o . . o=|
| o + . o +E|
| S o .**O|
| . .o=OX=|
| . o%*.|
| O.o.|
| .*.ooo|
+----[SHA256]-----+

Copy your SSH public key (for example, ~/.ssh/id_rsa.pub) to all nodes in the deployment by running the following command (example):

Deployment Node console

$ ssh-copy-id depuser@192.168.100.29

/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/depuser/.ssh/id_rsa.pub"
The authenticity of host '192.168.100.29 (192.168.100.29)' can't be established.
ECDSA key fingerprint is SHA256:6nhUgRlt9gY2Y2ofukUqE0ltH+derQuLsI39dFHe0Ag.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
depuser@192.168.100.29's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh 'depuser@192.168.100.29'"
and check to make sure that only the key(s) you wanted were added.

Verify that you have passwordless SSH connectivity to all nodes in your deployment by running the following command (example):

Deployment Node console


$ ssh depuser@192.168.100.29

Kubespray Deployment and Configuration

General Settings

To install dependencies for running Kubespray with Ansible on the Deployment Node, please run the following commands:

Deployment Node console

$ cd ~
$ sudo apt -y install python3-pip jq
$ wget https://github.com/kubernetes-sigs/kubespray/archive/v2.18.1.tar.gz
$ tar -zxf v2.18.1.tar.gz
$ cd kubespray-2.18.1
$ sudo pip3 install -r requirements.txt

Warning

The default folder for subsequent commands is ~/kubespray-2.18.1.
To download the latest Kubespray version please visit Releases · kubernetes-sigs/kubespray · GitHub.

Deployment Customization

Create a new cluster configuration and a hosts configuration file.
Replace the IP addresses below with your nodes' IP addresses:

Deployment Node console

$ cp -rfp inventory/sample inventory/mycluster
$ declare -a IPS=(192.168.100.29 192.168.100.34 192.168.100.39)
$ CONFIG_FILE=inventory/mycluster/hosts.yaml python3 contrib/inventory_builder/inventory.py ${IPS[@]}

As a result, the inventory/mycluster/hosts.yaml file will be created.
Review and change the host configuration in the file. Below is an example of this deployment:

inventory/mycluster/hosts.yaml

all:
  hosts:
    node1:
      ansible_host: 192.168.100.29
      ip: 192.168.100.29
      access_ip: 192.168.100.29
    node2:
      ansible_host: 192.168.100.34
      ip: 192.168.100.34
      access_ip: 192.168.100.34
    node3:
      ansible_host: 192.168.100.39
      ip: 192.168.100.39
      access_ip: 192.168.100.39
  children:
    kube_control_plane:
      hosts:
        node1:
    kube_node:
      hosts:
        node2:
        node3:
    etcd:
      hosts:
        node1:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
    calico_rr:
      hosts: {}

Deploying the Cluster Using KubeSpray Ansible Playbook

Run the following line to start the deployment procedure:

Deployment Node console


$ ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root cluster.yml

The K8s cluster deployment takes a while to complete. Please make sure that no errors are encountered in the playbook log.

Below is an example of a successful result:

Deployment Node console

...
PLAY RECAP ****************************************************************************************************
localhost : ok=4 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
node1 : ok=501 changed=111 unreachable=0 failed=0 skipped=1131 rescued=0 ignored=2
node2 : ok=360 changed=40 unreachable=0 failed=0 skipped=661 rescued=0 ignored=1
node3 : ok=360 changed=40 unreachable=0 failed=0 skipped=660 rescued=0 ignored=1

Sunday 9 May 2021 19:39:17 +0000 (0:00:00.064) 0:06:54.711 ********
===============================================================================
kubernetes/control-plane : kubeadm | Initialize first master ----------------------------------------- 28.13s
kubernetes/control-plane : Master | wait for kube-scheduler ------------------------------------------ 12.78s
download : download_container | Download image if required ------------------------------------------- 10.56s
container-engine/containerd : ensure containerd packages are installed -------------------------------- 9.48s
download : download_container | Download image if required -------------------------------------------- 9.36s
download : download_container | Download image if required -------------------------------------------- 9.08s
download : download_container | Download image if required -------------------------------------------- 9.05s
download : download_file | Download item -------------------------------------------------------------- 8.91s
download : download_container | Download image if required -------------------------------------------- 8.47s
kubernetes/preinstall : Install packages requirements ------------------------------------------------- 8.30s
download : download_container | Download image if required -------------------------------------------- 7.49s
download : download_container | Download image if required -------------------------------------------- 7.39s
kubernetes-apps/ansible : Kubernetes Apps | Start Resources ------------------------------------------- 7.07s
download : download_container | Download image if required -------------------------------------------- 5.99s
container-engine/containerd : ensure containerd repository is enabled --------------------------------- 5.59s
container-engine/crictl : download_file | Download item ----------------------------------------------- 5.45s
download : download_file | Download item -------------------------------------------------------------- 5.34s
kubernetes-apps/ansible : Kubernetes Apps | Lay Down CoreDNS templates -------------------------------- 5.00s
download : download_container | Download image if required -------------------------------------------- 4.95s
download : download_file | Download item -------------------------------------------------------------- 4.50s

K8s Cluster Customization and Verification

Now that the K8s cluster is deployed, it can be accessed from any K8s Master Node with the root user account, or from another server with the kubectl command installed and KUBECONFIG=<path-to-config-file> configured, in order to customize the deployment.
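
For example, to run kubectl from another host, a minimal sketch (assuming the admin kubeconfig is copied from the Master Node to ~/.kube/config) could be:

Server console

$ mkdir -p ~/.kube
$ scp root@192.168.100.29:/etc/kubernetes/admin.conf ~/.kube/config
$ export KUBECONFIG=~/.kube/config
$ kubectl get nodes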

In this guide, we continue the deployment from the K8s Master Node with the root user account.

Label the Worker Nodes:

Master Node console

$ kubectl label nodes node2 node-role.kubernetes.io/worker=
$ kubectl label nodes node3 node-role.kubernetes.io/worker=

Important

K8s Worker Node labeling is required for a proper installation of the NVIDIA Network Operator.

Below is an output example of the K8s cluster deployment information using the Calico CNI plugin.

To ensure that the Kubernetes cluster is installed correctly, run the following commands:

Master Node console

## Get cluster node status

kubectl get node -o wide

NAME    STATUS   ROLES                  AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION         CONTAINER-RUNTIME
node1   Ready    control-plane,master   9d    v1.22.8   192.168.100.29   <none>        Ubuntu 20.04.4 LTS   5.4.0-109-generic      containerd://1.5.8
node2   Ready    worker                 9d    v1.22.8   192.168.100.34   <none>        Ubuntu 20.04.4 LTS   5.4.0-109-lowlatency   containerd://1.5.8
node3   Ready    worker                 9d    v1.22.8   192.168.100.39   <none>        Ubuntu 20.04.4 LTS   5.4.0-109-lowlatency   containerd://1.5.8

## Get system pods status

kubectl -n kube-system get pods -o wide

NAME                                      READY   STATUS    RESTARTS   AGE   IP               NODE    NOMINATED NODE   READINESS GATES
calico-kube-controllers-5788f6558-bm5h9   1/1     Running   0          9d    192.168.100.29   node1   <none>           <none>
calico-node-4f748                         1/1     Running   0          9d    192.168.100.34   node2   <none>           <none>
calico-node-jhbjh                         1/1     Running   0          9d    192.168.100.39   node3   <none>           <none>
calico-node-m78p6                         1/1     Running   0          9d    192.168.100.29   node1   <none>           <none>
coredns-8474476ff8-dczww                  1/1     Running   0          9d    10.233.90.23     node1   <none>           <none>
coredns-8474476ff8-ksvkd                  1/1     Running   0          9d    10.233.96.234    node2   <none>           <none>
dns-autoscaler-5ffdc7f89d-h6nc8           1/1     Running   0          9d    10.233.90.20     node1   <none>           <none>
kube-apiserver-node1                      1/1     Running   0          9d    192.168.100.29   node1   <none>           <none>
kube-controller-manager-node1             1/1     Running   0          9d    192.168.100.29   node1   <none>           <none>
kube-proxy-2bq45                          1/1     Running   0          9d    192.168.100.34   node2   <none>           <none>
kube-proxy-4c8p7                          1/1     Running   0          9d    192.168.100.39   node3   <none>           <none>
kube-proxy-j226w                          1/1     Running   0          9d    192.168.100.29   node1   <none>           <none>
kube-scheduler-node1                      1/1     Running   0          9d    192.168.100.29   node1   <none>           <none>
nginx-proxy-node2                         1/1     Running   0          9d    192.168.100.34   node2   <none>           <none>
nginx-proxy-node3                         1/1     Running   0          9d    192.168.100.39   node3   <none>           <none>
nodelocaldns-9rffq                        1/1     Running   0          9d    192.168.100.39   node3   <none>           <none>
nodelocaldns-fdnr7                        1/1     Running   0          9d    192.168.100.34   node2   <none>           <none>
nodelocaldns-qhpxk                        1/1     Running   0          9d    192.168.100.29   node1   <none>           <none>

NVIDIA GPU Operator Installation for the K8s Cluster

The preferred method to deploy the GPU Operator is using Helm from the K8s Master Node. To install Helm, simply use the following command:


$ snap install helm --classic

Add the NVIDIA GPU Operator Helm repository.

$ helm repo add nvidia https://nvidia.github.io/gpu-operator
$ helm repo update

Deploy the NVIDIA GPU Operator.

The GPU Operator should be deployed with the GPUDirect RDMA kernel module enabled: driver.rdma.enabled=true.

$ helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator --set driver.rdma.enabled=true --set driver.rdma.useHostMofed=true

$ helm ls -n gpu-operator
NAME                      NAMESPACE      REVISION   UPDATED                                   STATUS     CHART                  APP VERSION
gpu-operator-1652190420   gpu-operator   1          2022-05-10 13:47:01.106147933 +0000 UTC   deployed   gpu-operator-v1.10.0   v1.10.0

Once the Helm chart is installed, check the status of the pods to ensure all the containers are running and the validation is complete:

$ kubectl get pod -n gpu-operator -o wide

NAME                                                              READY   STATUS      RESTARTS       AGE     IP              NODE    NOMINATED NODE   READINESS GATES
gpu-feature-discovery-bcc22                                       1/1     Running     1 (3d8h ago)   5d18h   10.233.96.3     node2   <none>           <none>
gpu-feature-discovery-vl68h                                       1/1     Running     0              5d18h   10.233.92.58    node3   <none>           <none>
gpu-operator-1652190420-node-feature-discovery-master-5b5fx8zlx   1/1     Running     1 (4m5s ago)   5d18h   10.233.90.17    node1   <none>           <none>
gpu-operator-1652190420-node-feature-discovery-worker-czsb4       1/1     Running     0              4s      10.233.92.75    node3   <none>           <none>
gpu-operator-1652190420-node-feature-discovery-worker-fnlj6       1/1     Running     0              4s      10.233.96.253   node2   <none>           <none>
gpu-operator-1652190420-node-feature-discovery-worker-r44r8       1/1     Running     1 (4m5s ago)   5d18h   10.233.90.22    node1   <none>           <none>
gpu-operator-6497cbf9cd-vcsrg                                     1/1     Running     1 (4m6s ago)   5d18h   10.233.90.19    node1   <none>           <none>
nvidia-container-toolkit-daemonset-4h9dr                          1/1     Running     0              5d18h   10.233.96.246   node2   <none>           <none>
nvidia-container-toolkit-daemonset-rv7sn                          1/1     Running     1 (5d18h ago)  5d18h   10.233.92.50    node3   <none>           <none>
nvidia-cuda-validator-kr6q9                                       0/1     Completed   0              5d18h   10.233.92.61    node3   <none>           <none>
nvidia-cuda-validator-zb4p8                                       0/1     Completed   0              5d18h   10.233.96.4     node2   <none>           <none>
nvidia-dcgm-exporter-5hdzh                                        1/1     Running     0              5d18h   10.233.96.198   node2   <none>           <none>
nvidia-dcgm-exporter-lnqzb                                        1/1     Running     0              5d18h   10.233.92.57    node3   <none>           <none>
nvidia-device-plugin-daemonset-dxgnz                              1/1     Running     0              5d18h   10.233.92.62    node3   <none>           <none>
nvidia-device-plugin-daemonset-w692b                              1/1     Running     0              5d18h   10.233.96.9     node2   <none>           <none>
nvidia-device-plugin-validator-pqns8                              0/1     Completed   0              5d18h   10.233.92.64    node3   <none>           <none>
nvidia-device-plugin-validator-sgtmt                              0/1     Completed   0              5d18h   10.233.96.10    node2   <none>           <none>
nvidia-driver-daemonset-l9x4n                                     2/2     Running     1 (2d19h ago)  5d18h   10.233.92.30    node3   <none>           <none>
nvidia-driver-daemonset-tf2tl                                     2/2     Running     5 (2d21h ago)  5d18h   10.233.96.244   node2   <none>           <none>
nvidia-operator-validator-p6794                                   1/1     Running     0              5d18h   10.233.96.6     node2   <none>           <none>
nvidia-operator-validator-xjrg9                                   1/1     Running     0              5d18h   10.233.92.54    node3   <none>           <none>
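
Optionally, you can run a quick end-to-end check with a test pod that requests one GPU and runs nvidia-smi; this is a minimal sketch, and the CUDA image tag is an assumption you may need to adjust for your environment:

Master Node console

$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    # Assumed public CUDA base image; replace with an image available in your registry
    image: nvidia/cuda:11.6.2-base-ubuntu20.04
    command: [ "nvidia-smi" ]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
$ kubectl logs gpu-smoke-test
$ kubectl delete pod gpu-smoke-test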

NVIDIA Network Operator Installation

The NVIDIA Network Operator leverages Kubernetes CRDs and the Operator SDK to manage networking-related components, in order to enable fast networking and RDMA for workloads in a K8s cluster. The fast network is a secondary network of the K8s cluster for applications that require high bandwidth or low latency.

To make it work, several components need to be provisioned and configured. Helm is required for the Network Operator deployment.

Add the NVIDIA Network Operator Helm repository:

## Add REPO

helm repo add mellanox https://mellanox.github.io/network-operator \
  && helm repo update

Create the values.yaml file to customize the Network Operator deployment (example):

nfd:
  enabled: true

sriovNetworkOperator:
  enabled: true

ofedDriver:
  deploy: false
nvPeerDriver:
  deploy: false
rdmaSharedDevicePlugin:
  deploy: false
sriovDevicePlugin:
  deploy: false

deployCR: true
secondaryNetwork:
  deploy: true
  cniPlugins:
    deploy: true
  multus:
    deploy: true
  ipamPlugin:
    deploy: true

Deploy the operator:

helm install -f ./values.yaml -n network-operator --create-namespace --wait mellanox/network-operator --generate-name

helm ls -n network-operator
NAME                          NAMESPACE          REVISION   UPDATED                                   STATUS     CHART                    APP VERSION
network-operator-1648457278   network-operator   1          2022-03-28 08:47:59.548667592 +0000 UTC   deployed   network-operator-1.1.0   v1.1.0

Once the Helm chart is installed, check the status of the pods to ensure all the containers are running:

## PODs status in namespace - network-operator

kubectl -n network-operator get pods -o wide
NAME                                                               READY   STATUS    RESTARTS   AGE   IP               NODE    NOMINATED NODE   READINESS GATES
network-operator-1648457278-5885dbfff5-wjgsc                       1/1     Running   0          5m    10.233.90.15     node1   <none>           <none>
network-operator-1648457278-node-feature-discovery-master-zbcx8   1/1     Running   0          5m    10.233.90.16     node1   <none>           <none>
network-operator-1648457278-node-feature-discovery-worker-kk4qs   1/1     Running   0          5m    10.233.90.18     node1   <none>           <none>
network-operator-1648457278-node-feature-discovery-worker-n44b6   1/1     Running   0          5m    10.233.92.221    node3   <none>           <none>
network-operator-1648457278-node-feature-discovery-worker-xhzfw   1/1     Running   0          5m    10.233.96.233    node2   <none>           <none>
network-operator-1648457278-sriov-network-operator-5cd4bdb6mm9f   1/1     Running   0          5m    10.233.90.21     node1   <none>           <none>
sriov-device-plugin-cxnrl                                          1/1     Running   0          5m    192.168.100.34   node2   <none>           <none>
sriov-device-plugin-djlmn                                          1/1     Running   0          5m    192.168.100.39   node3   <none>           <none>
sriov-network-config-daemon-rgfvk                                  3/3     Running   0          5m    192.168.100.39   node3   <none>           <none>
sriov-network-config-daemon-zzchs                                  3/3     Running   0          5m    192.168.100.34   node2   <none>           <none>

## PODs status in namespace - nvidia-network-operator-resources

kubectl -n nvidia-network-operator-resources get pods -o wide
NAME                   READY   STATUS    RESTARTS   AGE   IP               NODE    NOMINATED NODE   READINESS GATES
cni-plugins-ds-snf6x   1/1     Running   0          5m    192.168.100.39   node3   <none>           <none>
cni-plugins-ds-zjb27   1/1     Running   0          5m    192.168.100.34   node2   <none>           <none>
kube-multus-ds-mz7nd   1/1     Running   0          5m    192.168.100.39   node3   <none>           <none>
kube-multus-ds-xjxgd   1/1     Running   0          5m    192.168.100.34   node2   <none>           <none>
whereabouts-jgt24      1/1     Running   0          5m    192.168.100.34   node2   <none>           <none>
whereabouts-sphx4      1/1     Running   0          5m    192.168.100.39   node3   <none>           <none>

High-Speed Network Configuration

After installing the operator, please check the SriovNetworkNodeState CRs to see all SR-IOV-enabled devices in your nodes.
In this deployment, the network interface enp57s0f0 has been chosen.

To review the interface status please use the following command:

NICs status

## NIC status
kubectl -n network-operator get sriovnetworknodestates.sriovnetwork.openshift.io node2 -o yaml
...
status:
  interfaces:
  - deviceID: 101d
    driver: mlx5_core
    eSwitchMode: legacy
    linkSpeed: 100000 Mb/s
    linkType: ETH
    mac: 0c:42:a1:2b:73:fa
    mtu: 9000
    name: enp57s0f0
    numVfs: 8
    pciAddress: "0000:39:00.0"
    totalvfs: 8
    vendor: 15b3
  - deviceID: 101d
    driver: mlx5_core
...

Create a SriovNetworkNodePolicy CR for the chosen network interface (the policy.yaml file) by specifying the chosen interface in the 'nicSelector'.

According to the application design, VF0 is allocated into a separate pool (timepool) from the rest of the VFs (rdmapool):

policy.yaml

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: mlnxnics-sw1
  namespace: network-operator
spec:
  nodeSelector:
    feature.node.kubernetes.io/custom-rdma.capable: "true"
  resourceName: timepool
  priority: 99
  mtu: 9000
  numVfs: 8
  nicSelector:
    pfNames: [ "enp57s0f0#0-0" ]
  deviceType: netdevice
  isRdma: true
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: mlnxnics-sw2
  namespace: network-operator
spec:
  nodeSelector:
    feature.node.kubernetes.io/custom-rdma.capable: "true"
  resourceName: rdmapool
  priority: 99
  mtu: 9000
  numVfs: 8
  nicSelector:
    pfNames: [ "enp57s0f0#1-7" ]
  deviceType: netdevice
  isRdma: true

Deploy policy.yaml:

kubectl apply -f policy.yaml
sriovnetworknodepolicy.sriovnetwork.openshift.io/mlnxnics-sw1 created
sriovnetworknodepolicy.sriovnetwork.openshift.io/mlnxnics-sw2 created

Important

This step takes a while. The duration depends on the number of K8s Worker Nodes that apply the configuration and on the number of VFs for each selected network interface.
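
One way to follow the progress is to watch the syncStatus field of each node state until it reports Succeeded, for example:

Master Node console

$ kubectl -n network-operator get sriovnetworknodestates.sriovnetwork.openshift.io \
    -o custom-columns=NODE:.metadata.name,SYNC:.status.syncStatus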

Create SriovNetwork CRs for the chosen network interface (the network.yaml file), which refer to the 'resourceName' defined in the SriovNetworkNodePolicy.

In the example below, the following networks are created:

  • timenet - K8s network name for PTP time sync

  • rdmanet - K8s network name with dynamic IPAM

  • rdma-static - K8s network name with static IPAM

network.yaml

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: timenet
  namespace: network-operator
spec:
  ipam: |
    {
      "datastore": "kubernetes",
      "kubernetes": {"kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"},
      "log_file": "/tmp/whereabouts.log",
      "log_level": "debug",
      "type": "whereabouts",
      "range": "172.20.0.0/24",
      "exclude": [ "172.20.0.1/32" ]
    }
  networkNamespace: default
  resourceName: timepool
  trust: "on"
  vlan: 0
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: rdmanet
  namespace: network-operator
spec:
  ipam: |
    {
      "datastore": "kubernetes",
      "kubernetes": { "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig" },
      "log_file": "/tmp/whereabouts.log",
      "log_level": "debug",
      "type": "whereabouts",
      "range": "192.168.102.0/24",
      "exclude": [ "192.168.102.254/32", "192.168.102.253/32" ]
    }
  networkNamespace: default
  resourceName: rdmapool
  vlan: 2
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: rdmanet-static
  namespace: network-operator
spec:
  ipam: |
    { "type": "static" }
  networkNamespace: default
  resourceName: rdmapool
  vlan: 2

Deploy network.yaml:

kubectl apply -f network.yaml
sriovnetwork.sriovnetwork.openshift.io/timenet created
sriovnetwork.sriovnetwork.openshift.io/rdmanet created
sriovnetwork.sriovnetwork.openshift.io/rdmanet-static created
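
The operator renders a matching NetworkAttachmentDefinition for each SriovNetwork in the target namespace (default in this example), which can be verified with:

Master Node console

$ kubectl get network-attachment-definitions.k8s.cni.cncf.io -n default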

Manage HugePages

Kubernetes supports the allocation and consumption of pre-allocated HugePages by applications in a Pod. The nodes automatically discover and report all HugePages resources as schedulable resources. For additional information on K8s HugePages management, please refer to the Kubernetes documentation.

To allocate HugePages, modify the GRUB_CMDLINE_LINUX_DEFAULT parameter in /etc/default/grub. The setting below allocates 2MB * 8192 pages = 16GB of HugePages at boot time:

/etc/default/grub

...

GRUB_CMDLINE_LINUX_DEFAULT="default_hugepagesz=2M hugepagesz=2M hugepages=8192"

...

Run update-grub to apply the configuration and reboot the server:

Worker Node console

# update-grub
# reboot
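
Optionally, the allocation can first be verified locally on the rebooted Worker Node before checking it through Kubernetes:

Worker Node console

# With the setting above, HugePages_Total should report 8192 pages of 2048 kB
# grep Huge /proc/meminfo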

After the server comes back up, check the HugePages allocation from the Master Node with the following command:

Master Node console

# kubectl describe nodes node2
...
Capacity:
  cpu:                  48
  ephemeral-storage:    459923528Ki
  hugepages-1Gi:        0
  hugepages-2Mi:        16Gi
  memory:               264050900Ki
  nvidia.com/gpu:       2
  nvidia.com/rdmapool:  7
  nvidia.com/timepool:  1
  pods:                 110
Allocatable:
  cpu:                  46
  ephemeral-storage:    423865522704
  hugepages-1Gi:        0
  hugepages-2Mi:        16Gi
  memory:               246909140Ki
  nvidia.com/gpu:       2
  nvidia.com/rdmapool:  7
  nvidia.com/timepool:  1
  pods:                 110
...

Enable CPU and Topology Management

CPU Manager manages groups of CPUs and constrains workloads to specific CPUs.

CPU Manager is useful for workloads that have some of these attributes:

  • Require as much CPU time as possible

  • Are sensitive to processor cache misses

  • Are low-latency network applications

  • Coordinate with other processes and benefit from sharing a single processor cache

Topology Manager uses topology information from collected hints to decide if a pod can be accepted or rejected on a node, based on the configured Topology Manager policy and Pod resources requested. In order to extract the best performance, optimizations related to CPU isolation and memory and device locality are required.

Topology Manager is useful for workloads that use hardware accelerators to support latency-critical execution and high throughput parallel computation.

Important

To use Topology Manager, CPU Manager with static policy must be used.

For additional information, please refer to Control CPU Management Policies on the Node and Control Topology Management Policies on a Node.

In order to enable CPU Manager and Topology Manager, add the following lines to the kubelet configuration file /etc/kubernetes/kubelet-config.yaml on each K8s Worker Node:

/etc/kubernetes/kubelet-config.yaml

...
cpuManagerPolicy: static
cpuManagerReconcilePeriod: 10s
topologyManagerPolicy: single-numa-node
featureGates:
  CPUManager: true
  TopologyManager: true

Due to the change in cpuManagerPolicy, remove /var/lib/kubelet/cpu_manager_state and restart the kubelet service on each affected K8s Worker Node.

Worker Node console

# rm -f /var/lib/kubelet/cpu_manager_state
# service kubelet restart

Application

This section provides the K8s-specific components and YAML configuration files used to deploy the Rivermax applications in the K8s cluster.

Note

For proper application execution, a Rivermax license is required. To obtain a license, please refer to the Rivermax License Generation Guidelines.

Note

To download the Rivermax application container images from the container repository and the application pipeline, you need to register and log in to the Rivermax portal by clicking "Get Started".
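
The application manifests below reference an image pull secret (shown as < Container registry secret >). A minimal sketch for creating such a secret with kubectl is shown here; the secret name, registry URL, and credentials are placeholders you must replace with your own:

Master Node console

$ kubectl create secret docker-registry rivermax-registry-secret \
    --docker-server=<registry URL> \
    --docker-username=<user name> \
    --docker-password=<password or API key>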

Rivermax License

Upload the Rivermax license as a ConfigMap in the K8s cluster:


kubectl create configmap rivermax-config --from-file=rivermax.lic=./rivermax.lic

Media Node Application

This pod definition contains an implementation of the AMWA Networked Media Open Specifications (NMOS) with the NMOS Rivermax Node. For more information about AMWA, NMOS, and the Networked Media Incubator, please refer to http://amwa.tv/. For more information about the Rivermax SDK, please refer to https://developer.nvidia.com/networking/Rivermax.
The YAML configuration file for the Media Node deployment is provided below. Please fill in your container image name and your registry secret.

apiVersion: v1
kind: ConfigMap
metadata:
  name: river-config
data:
  container-config: |-
    #media_node JSON file to run
    config_json=/var/home/config.json
    #Output registry stdout/stderr output to a log inside container
    log_output=FALSE
    #Update/insert label parameter with container hostname on entrypoint script run
    update_label=TRUE
    #Allow these network interfaces in /etc/avahi/avahi-daemon.conf
    allow_interfaces=net1
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: "mnc"
  labels:
    apps: rivermax
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rivermax
  template:
    metadata:
      labels:
        app: rivermax
      annotations:
        k8s.v1.cni.cncf.io/networks: rdmanet
    spec:
      containers:
      - command:
        image: < media node container image >
        name: "medianode"
        env:
        - name: DISPLAY
          value: "192.168.102.253:0.0"
        resources:
          requests:
            nvidia.com/rdmapool: 1
            hugepages-2Mi: 4Gi
            memory: 8Gi
            cpu: 4
          limits:
            nvidia.com/rdmapool: 1
            hugepages-2Mi: 4Gi
            memory: 8Gi
            cpu: 4
        securityContext:
          capabilities:
            add: [ "IPC_LOCK", "SYS_RESOURCE", "NET_RAW", "NET_ADMIN" ]
        volumeMounts:
        - name: config
          mountPath: /var/home/ext/
        - name: licconfig
          mountPath: /opt/mellanox/rivermax/
        - mountPath: /hugepages
          name: hugepage
        - mountPath: /dev/shm
          name: dshm
      volumes:
      - name: config
        configMap:
          name: river-config
      - name: licconfig
        configMap:
          name: rivermax-config
      - name: hugepage
        emptyDir:
          medium: HugePages
      - name: dshm
        emptyDir: { medium: 'Memory', sizeLimit: '4Gi' }
      imagePullSecrets:
      - name: < Container registry secret >
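
After applying the manifest, the Deployment can be checked with standard kubectl commands; as a suggested verification, the multus network-status annotation on a pod shows the secondary SR-IOV interface (net1) and the IP assigned by whereabouts:

Master Node console

$ kubectl get pods -l app=rivermax -o wide
$ kubectl get pods -l app=rivermax \
    -o jsonpath='{.items[0].metadata.annotations.k8s\.v1\.cni\.cncf\.io/network-status}'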

NMOS Controller

An AMWA NMOS Controller is a device that can interact with the NMOS APIs, which are a family of open specifications for networked media for professional applications. The NMOS Controller can discover, register, connect, and manage media devices on an IP infrastructure using common methods and protocols. It can also handle event and tally, audio channel mapping, authorization, and other functions that are part of the NMOS roadmap. For more information, please refer to the project's README.md.

apiVersion: v1
kind: Pod
metadata:
  name: nmos-cpp
  labels:
    app.kubernetes.io/name: nmos
  annotations:
    k8s.v1.cni.cncf.io/networks: |
      [
        { "name": "rdmanet-static", "ips": [ "192.168.102.254/24" ] }
      ]
spec:
  containers:
  - name: nmos-pod
    image: docker.io/rhastie/nmos-cpp:latest
    env:
    - name: RUN_NODE
      value: "true"
    resources:
      requests:
        cpu: 2
        memory: 1Gi
        nvidia.com/rdmapool: 1
      limits:
        cpu: 2
        memory: 1Gi
        nvidia.com/rdmapool: 1
    ports:
    - containerPort: 8010
      name: port-8010
    - containerPort: 8011
      name: port-8011
    - containerPort: 11000
      name: port-11000
    - containerPort: 11001
      name: port-11001
    - containerPort: 1883
      name: port-1883
    - containerPort: 5353
      name: port-5353
      protocol: UDP
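
As a quick check, the NMOS APIs exposed on ports 8010 and 8011 can be reached from the Master Node with a port-forward and then browsed locally (assuming the default port assignment of the nmos-cpp container):

Master Node console

$ kubectl port-forward pod/nmos-cpp 8010:8010 8011:8011
# Then browse to http://localhost:8010/ from the same host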

DeepStream Media Gateway

One of the applications of the DeepStream SDK is to encode RAW data into an SRT stream. This application can capture video frames from a camera or a file, encode them using the H.264 or H.265 codec, and send them over the network using the SRT protocol. SRT stands for Secure Reliable Transport, which is a low-latency and secure streaming technology. This application can be useful for scenarios such as remote surveillance, live broadcasting, or video conferencing.
The YAML configuration file for the Media Gateway deployment is provided below. Please fill in your container image name and your registry secret.

apiVersion: v1
kind: Pod
metadata:
  name: ds-rmax
  labels:
    name: dsrmax-app
  annotations:
    k8s.v1.cni.cncf.io/networks: rdmanet
spec:
  containers:
  - name: dsrmax
    image: < DeepStream media gateway container image >
    command:
    - sh
    - -c
    - sleep inf
    env:
    - name: DISPLAY
      value: "192.168.102.253:0.0"
    ports:
    - containerPort: 7001
      name: udp-port
    securityContext:
      capabilities:
        add: [ "IPC_LOCK", "SYS_RESOURCE", "NET_RAW", "NET_ADMIN" ]
    resources:
      requests:
        nvidia.com/rdmapool: 1
        nvidia.com/gpu: 1
        hugepages-2Mi: 2Gi
        memory: 8Gi
        cpu: 8
      limits:
        nvidia.com/rdmapool: 1
        nvidia.com/gpu: 1
        hugepages-2Mi: 2Gi
        memory: 8Gi
        cpu: 8
    volumeMounts:
    - name: config
      mountPath: /var/home/ext/
    - name: licconfig
      mountPath: /opt/mellanox/rivermax/
    - mountPath: /hugepages
      name: hugepage
    - mountPath: /dev/shm
      name: dshm
  volumes:
  - name: config
    configMap:
      name: river-config
  - name: licconfig
    configMap:
      name: rivermax-config
  - name: hugepage
    emptyDir:
      medium: HugePages
  - name: dshm
    emptyDir: { medium: 'Memory', sizeLimit: '4Gi' }
  imagePullSecrets:
  - name: < Container registry secret >
---
apiVersion: v1
kind: Service
metadata:
  name: rmax-service
spec:
  type: NodePort
  selector:
    name: dsrmax-app
  ports:
    # By default and for convenience, the `targetPort` is set to the same value as the `port` field.
    - port: 7001
      name: udp-port
      protocol: UDP
      targetPort: 7001
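
Because rmax-service is of type NodePort, Kubernetes allocates a node port for UDP 7001; it can be looked up as follows, and the stream is then reachable on any Worker Node IP at that port:

Master Node console

$ kubectl get service rmax-service
$ kubectl get service rmax-service -o jsonpath='{.spec.ports[0].nodePort}'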

VNC Container with GUI

This pod definition allows you to access a web VNC interface with an Ubuntu LXDE/LXQT desktop environment inside the Kubernetes cluster. It uses an interface on the K8s secondary network to manage applications via GUI on your cluster nodes.
The YAML configuration file for the VNC deployment is provided below. Please fill in your container image name.

Note

An example of this application can be found at GitHub - theasp/docker-novnc: noVNC Display Container for Docker, but you can create your own container image.

apiVersion: v1
kind: Pod
metadata:
  name: ub-vnc
  labels:
    name: ubuntu-vnc
  annotations:
    k8s.v1.cni.cncf.io/networks: |
      [
        { "name": "rdmanet-static", "ips": [ "192.168.102.253/24" ] }
      ]
spec:
  volumes:
  - name: dshm
    emptyDir:
      medium: Memory
  containers:
  - image: < NOVNC container image >
    name: vnc-container
    resources:
      limits:
        cpu: 4
        memory: 8Gi
        nvidia.com/rdmapool: 1
    env:
    - name: DISPLAY_WIDTH
      value: "1920"
    - name: DISPLAY_HEIGHT
      value: "1080"
    - name: RUN_XTERM
      value: "yes"
    - name: RUN_FLUXBOX
      value: "yes"
    ports:
    - containerPort: 8080
      name: http-port
    volumeMounts:
    - mountPath: /dev/shm
      name: dshm
---
apiVersion: v1
kind: Service
metadata:
  name: vnc-service
spec:
  type: NodePort
  selector:
    name: ubuntu-vnc
  ports:
    - port: 8080
      name: http-port
      targetPort: 8080
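
The noVNC web interface is published through the vnc-service NodePort; after looking up the allocated port, it can be opened in a browser against any Worker Node management IP (for example, http://192.168.100.34:<nodePort>):

Master Node console

$ kubectl get service vnc-service -o jsonpath='{.spec.ports[0].nodePort}'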

Authors

ID-2.jpg

Vitaliy Razinkov

Over the past few years, Vitaliy Razinkov has been working as a Solutions Architect on the NVIDIA Networking team, responsible for the research and design of complex Kubernetes/OpenShift and Microsoft solutions. He previously spent more than 25 years in senior positions at several companies. Vitaliy has written several reference design guides on Microsoft technologies, RoCE/RDMA accelerated machine learning in Kubernetes/OpenShift, and container solutions, all of which are available on the NVIDIA Networking Documentation website.

garethsb-badge-photo.jpg

Gareth Sylvester-Bradley

Gareth Sylvester-Bradley is a Principal Engineer at NVIDIA, currently serving as the chair of the Networked Media Open Specifications (NMOS) Architecture Review group in the Advanced Media Workflow Association (AMWA). He is focused on building software toolkits and agile, collaborative industry specifications to deliver open, software-defined, hardware-accelerated media workflows for broadcast, live production, medical imaging, industrial video, and more.

Last updated on Sep 12, 2023.