RDG for Deploying Media Streaming Applications using Rivermax, DeepStream over Accelerated K8s Cluster
Created on June 15, 2022.
Scope
The following Reference Deployment Guide (RDG) shows the deployment of Rivermax and DeepStream streaming applications over an accelerated Kubernetes cluster.
Abbreviations and Acronyms
| Term | Definition | Term | Definition |
|---|---|---|---|
| CDN | Content Delivery Network | LLDP | Link Layer Discovery Protocol |
| CNI | Container Network Interface | NFD | Node Feature Discovery |
| CR | Custom Resource | NCCL | NVIDIA Collective Communication Library |
| CRD | Custom Resource Definition | OCI | Open Container Initiative |
| CRI | Container Runtime Interface | PF | Physical Function |
| DHCP | Dynamic Host Configuration Protocol | QSG | Quick Start Guide |
| DNS | Domain Name System | RDG | Reference Deployment Guide |
| DP | Device Plugin | RDMA | Remote Direct Memory Access |
| DS | DeepStream | RoCE | RDMA over Converged Ethernet |
| IPAM | IP Address Management | SR-IOV | Single Root Input/Output Virtualization |
| K8s | Kubernetes | VF | Virtual Function |
Introduction
This guide covers the complete solution cycle of a K8s cluster deployment, including a technology overview, design, component selection, deployment steps, and application workload examples.
The solution is delivered on top of standard servers, with NVIDIA end-to-end Ethernet infrastructure carrying the workload.
In this guide, we use the NVIDIA GPU Operator and the NVIDIA Network Operator, which manage the deployment and configuration of GPU and network components in the K8s cluster. These components allow you to accelerate workloads using CUDA, RDMA, and GPUDirect technologies.
This guide shows the design of a K8s cluster with two K8s worker nodes and provides detailed instructions for deploying a K8s cluster.
A Greenfield deployment is assumed for this guide.
The information presented is written for experienced Media and Entertainment Broadcast System Admins, System Engineers and Solution Architects who need to deploy the Rivermax streaming apps for their customers.
References
Solution Architecture
Key Components and Technologies
NVIDIA DGX A100
NVIDIA DGX™ A100 is the universal system for all AI workloads, offering unprecedented compute density, performance, and flexibility in the world’s first 5 petaFLOPS AI system. NVIDIA DGX A100 features the world’s most advanced accelerator, the NVIDIA A100 Tensor Core GPU, enabling enterprises to consolidate training, inference, and analytics into a unified, easy-to-deploy AI infrastructure that includes direct access to NVIDIA AI experts.
NVIDIA ConnectX SmartNICs
10/25/40/50/100/200 and 400G Ethernet Network Adapters
The industry-leading NVIDIA® ConnectX® family of smart network interface cards (SmartNICs) offers advanced hardware offloads and accelerations.
NVIDIA Ethernet adapters enable the highest ROI and lowest total cost of ownership for hyperscale, public and private clouds, storage, machine learning, AI, big data, and telco platforms.
NVIDIA LinkX Cables and Transceivers
The NVIDIA® LinkX® product family of cables and transceivers provides the industry’s most complete line of 10, 25, 40, 50, 100, 200, and 400GbE Ethernet and 100, 200, and 400Gb/s InfiniBand products for Cloud, HPC, hyperscale, Enterprise, telco, storage, and artificial intelligence data center applications.
NVIDIA Spectrum Ethernet Switches
Flexible form-factors with 16 to 128 physical ports, supporting 1GbE through 400GbE speeds.
Based on a ground-breaking silicon technology optimized for performance and scalability, NVIDIA Spectrum switches are ideal for building high-performance, cost-effective, and efficient Cloud Data Center Networks, Ethernet Storage Fabric, and Deep Learning Interconnects.
NVIDIA combines the benefits of NVIDIA Spectrum™ switches, based on an industry-leading application-specific integrated circuit (ASIC) technology, with a wide variety of modern network operating system choices, including NVIDIA Cumulus® Linux , SONiC and NVIDIA Onyx®.
Kubernetes
Kubernetes is an open-source container orchestration platform for deployment automation, scaling, and management of containerized applications.
Kubespray
Kubespray is a composition of Ansible playbooks, inventory, provisioning tools, and domain knowledge for generic OS/Kubernetes cluster configuration management tasks, and provides:
- A highly available cluster
- Composable attributes
- Support for most popular Linux distributions
NVIDIA GPU Operator
The NVIDIA GPU Operator uses the operator framework within Kubernetes to automate the management of all NVIDIA software components needed to provision GPUs. These components include the NVIDIA drivers (to enable CUDA), the Kubernetes device plugin for GPUs, the NVIDIA Container Runtime, automatic node labelling, DCGM-based monitoring, and more.
NVIDIA Network Operator
Analogous to the NVIDIA GPU Operator, the NVIDIA Network Operator simplifies scale-out network design for Kubernetes by automating aspects of network deployment and configuration that would otherwise require manual work. It loads the required drivers, libraries, device plugins, and CNIs on any cluster node with an NVIDIA network interface. Paired with the NVIDIA GPU Operator, the Network Operator enables GPUDirect RDMA, a key technology that accelerates cloud-native AI workloads by orders of magnitude. The NVIDIA Network Operator uses Kubernetes CRDs and the Operator Framework to provision the host software needed for enabling accelerated networking.
NVIDIA CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs. In GPU-accelerated applications, the sequential part of the workload runs on the CPU, which is optimized for single-threaded performance, while the compute-intensive portion of the application runs on thousands of GPU cores in parallel.
NVIDIA Rivermax SDK
NVIDIA Rivermax offers a unique IP-based solution for any media and data streaming use case. Rivermax together with NVIDIA GPU accelerated computing technologies unlocks innovation for a wide range of applications in Media and Entertainment (M&E), Broadcast, Healthcare, Smart Cities and more. Rivermax leverages NVIDIA ConnectX and BlueField DPU hardware streaming acceleration technology that enables direct data transfers to and from the GPU, delivering best-in-class throughput and latency with minimal CPU utilization for streaming workloads.
NVIDIA DeepStream SDK
NVIDIA DeepStream allows the rapid development and deployment of Vision AI applications and services. DeepStream provides multi-platform, scalable, TLS-encrypted security that can be deployed on premises, at the edge, and in the cloud. It delivers a complete streaming analytics toolkit for AI-based multi-sensor processing and video, audio, and image understanding. DeepStream is principally intended for vision AI developers, software partners, startups, and OEMs building IVA apps and services.
Networked Media Open Specifications (NMOS)
NMOS specifications are a family of open, free-of-charge specifications that enable interoperability between media devices on an IP infrastructure. The core specifications, IS-04 Registration and Discovery and IS-05 Device Connection Management, provide uniform mechanisms to enable media devices and services to advertise their capabilities onto the network, and control systems to configure the video, audio and data streams between the devices' senders and receivers. NMOS is extensible and, for example, includes specifications for audio channel mapping, for exchange of event and tally information, and for securing the APIs, leveraging IT best practices. There are open-source NMOS implementations available, and NVIDIA provides a free NMOS Node library in the DeepStream SDK.
Logical Design
The logical design includes the following parts:
- Deployment node running Kubespray that deploys the Kubernetes cluster
- K8s Master node running all Kubernetes management components
- K8s Worker nodes with NVIDIA GPUs and NVIDIA ConnectX-6 Dx adapters
- High-speed Ethernet fabric (secondary K8s network)
- Deployment and K8s management networks
Application Logical Design
In our guide, we deployed the following applications:
- Rivermax Media Node
- NMOS registry controller
- DeepStream gateway
- Time synchronization service
- VNC apps for internal GUI access
Software Stack Components
Bill of Materials
The following hardware setup is utilized in this guide to build a K8s cluster with two K8s Worker Nodes.
You can use any suitable hardware according to the network topology and software stack.
Deployment and Configuration
Network / Fabric
This RDG describes K8s cluster deployment with multiple K8s Worker Nodes.
The high-performance network is a secondary network for the Kubernetes cluster and requires an L2 network topology.
The Deployment/Management network topology and the DNS/DHCP network services are part of the IT infrastructure. Their installation and configuration are not covered in this guide.
Network IP Configuration
Below are the server names with their relevant network configurations.
| Server/Switch Type | Server/Switch Name | High-Speed Network (IP and NICs) | Management Network (IP and NICs) |
|---|---|---|---|
| Deployment node | depserver | N/A | eth0: DHCP 192.168.100.202 |
| K8s Master node | node1 | N/A | eth0: DHCP 192.168.100.29 |
| K8s Worker Node1 | node2 | enp57s0f0: no IP set | eth0: DHCP 192.168.100.34 |
| K8s Worker Node2 | node3 | enp57s0f0: no IP set | eth0: DHCP 192.168.100.39 |
| High-speed switch | switch | N/A | mgmt0: DHCP 192.168.100.49 |
enpXXs0f0 high-speed network interfaces do not require additional configuration.
Wiring
On each K8s Worker Node, only the first port of the NVIDIA network adapter is wired to the NVIDIA switch in the high-performance fabric, using NVIDIA LinkX DAC cables.
The figure below illustrates the required wiring for building the K8s cluster.
Fabric Configuration
Switch configuration is provided below:
Switch console
##
## Running database "initial"
## Generated at 2022/05/10 15:49:25 +0200
## Hostname: switch
## Product release: 3.9.3202
##
##
## Running-config temporary prefix mode setting
##
no cli default prefix-modes enable
##
## Interface Ethernet configuration
##
interface ethernet 1/1-1/32 speed 100GxAuto force
interface ethernet 1/1-1/32 switchport mode hybrid
##
## VLAN configuration
##
vlan 2
vlan 1001
vlan 2 name "RiverData"
vlan 1001 name "PTP"
interface ethernet 1/1-1/32 switchport hybrid allowed-vlan all
interface ethernet 1/5 switchport access vlan 1001
interface ethernet 1/7 switchport access vlan 1001
interface ethernet 1/5 switchport hybrid allowed-vlan add 2
interface ethernet 1/7 switchport hybrid allowed-vlan add 2
##
## STP configuration
##
no spanning-tree
##
## L3 configuration
##
interface vlan 1001
interface vlan 1001 ip address 172.20.0.1/24 primary
##
## IGMP Snooping configuration
##
ip igmp snooping unregistered multicast forward-to-mrouter-ports
ip igmp snooping
vlan 1001 ip igmp snooping
vlan 1001 ip igmp snooping querier
interface ethernet 1/5 ip igmp snooping fast-leave
interface ethernet 1/7 ip igmp snooping fast-leave
##
## Local user account configuration
##
username admin password 7 $6$mSW1WwYI$M5xfvsphrTRht6J2ByfF.J475tq8YuGKR6K1FwSgvkdb1QQFZbx/PtqK.GVJEBoMcmXsnB57QycP7jSp.Hy/Q.
username monitor password 7 $6$V/Og9kzY$qc.oU2Ma9MPJClZlbvymOrb1wtE0N5yfQYPamhzRYeN2npVY/lOE5iisHUpxNqm3Ku8lIWDTPiO/bklyCMi2o.
##
## AAA remote server configuration
##
# ldap bind-password ********
ldap vrf default enable
radius-server vrf default enable
# radius-server key ********
tacacs-server vrf default enable
# tacacs-server key ********
##
## Password restriction configuration
##
no password hardening enable
##
## SNMP configuration
##
snmp-server vrf default enable
##
## Network management configuration
##
# web proxy auth basic password ********
clock timezone Asia Middle_East Jerusalem
ntp vrf default disable
terminal sysrq enable
web vrf default enable
##
## PTP protocol
##
protocol ptp
ptp priority1 1
ptp vrf default enable
interface ethernet 1/5 ptp enable
interface ethernet 1/7 ptp enable
interface vlan 1001 ptp enable
##
## X.509 certificates configuration
##
#
# Certificate name system-self-signed, ID ca9888a2ed650c5c4bd372c055bdc6b4da65eb1e
# (public-cert config omitted since private-key config is hidden)
##
## Persistent prefix mode setting
##
cli default prefix-modes enable
Host
General Configuration
General Prerequisites:
Hardware
Ensure that all the K8s Worker Nodes have the same hardware specification (see the BoM for details).
Host BIOS
Verify that an SR-IOV-capable server platform is used, and review the BIOS settings in the server platform vendor documentation to enable SR-IOV in the BIOS.
Host OS
The Ubuntu Server 20.04 operating system should be installed on all servers with the OpenSSH server package.
Experience with Kubernetes
Familiarity with the Kubernetes cluster architecture is essential.
Make sure that the BIOS settings on the K8s Worker Nodes are tuned for maximum performance.
All K8s Worker Nodes must have the same PCIe placement for the NIC and must expose the same interface name.
Host OS Prerequisites
Ensure that a non-root depuser account is created during the deployment of the Ubuntu Server 20.04 operating system.
Update the Ubuntu software packages by running the following commands:
Server console
$ sudo apt-get update
$ sudo apt-get install linux-image-lowlatency -y
$ sudo apt-get upgrade -y
$ sudo reboot
Grant the non-root depuser account sudo privileges without a password.
In this solution, the following lines were added to the end of the /etc/sudoers file:
Server console
$ sudo vim /etc/sudoers
#includedir /etc/sudoers.d
#K8s cluster deployment user with sudo privileges without password
depuser ALL=(ALL) NOPASSWD:ALL
OFED Installation and Configuration
OFED installation is required only on the K8s Worker Nodes. To download the latest OFED version, please visit Linux Drivers (nvidia.com).
The download and installation procedure is provided below. All steps must be performed with root privileges.
After the OFED installation, please reboot your node.
Server console
wget https://content.mellanox.com/ofed/MLNX_OFED-5.5-1.0.3.2/MLNX_OFED_LINUX-5.5-1.0.3.2-ubuntu20.04-x86_64.iso
mount -o loop ./MLNX_OFED_LINUX-5.5-1.0.3.2-ubuntu20.04-x86_64.iso /mnt/
/mnt/mlnxofedinstall --vma --without-fw-update
reboot
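After the reboot, you can optionally confirm that the OFED driver stack is loaded and that the high-speed interface is recognized. This is a minimal, hedged verification step (not part of the original procedure) using standard MLNX_OFED utilities:
Server console
## Show the installed OFED version
ofed_info -s
## Map RDMA devices to their network interface names (the chosen interface, e.g. enp57s0f0, should appear as Up once cabled)
ibdev2netdev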
K8s Cluster Deployment
The Kubernetes cluster in this solution is installed using Kubespray with a non-root depuser account from the deployment node.
SSH Private Key and SSH Passwordless Login
Log in to the Deployment Node as the deployment user (in our case, depuser) and generate an SSH key pair for passwordless authentication by running the following command:
Deployment Node console
$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/depuser/.ssh/id_rsa):
Created directory '/home/depuser/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/depuser/.ssh/id_rsa
Your public key has been saved in /home/depuser/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:IfcjdT/spXVHVd3n6wm1OmaWUXGuHnPmvqoXZ6WZYl0 depuser@depserver
The key's randomart image is:
+---[RSA 3072]----+
| *|
| .*|
| . o . . o=|
| o + . o +E|
| S o .**O|
| . .o=OX=|
| . o%*.|
| O.o.|
| .*.ooo|
+----[SHA256]-----+
Copy your SSH public key (by default, ~/.ssh/id_rsa.pub) to all nodes in the deployment by running the following command (example):
Deployment Node console
$ ssh-copy-id depuser@192.168.100.29
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/depuser/.ssh/id_rsa.pub"
The authenticity of host '192.168.100.29 (192.168.100.29)' can't be established.
ECDSA key fingerprint is SHA256:6nhUgRlt9gY2Y2ofukUqE0ltH+derQuLsI39dFHe0Ag.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
depuser@192.168.100.29's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'depuser@192.168.100.29'"
and check to make sure that only the key(s) you wanted were added.
Verify that you have passwordless SSH connectivity to all nodes in your deployment by running the following command (example):
Deployment Node console
$ ssh depuser@192.168.100.29
Kubespray Deployment and Configuration
General Setting
To install dependencies for running Kubespray with Ansible on the Deployment Node, please run the following commands:
Deployment Node console
$ cd ~
$ sudo apt -y install python3-pip jq
$ wget https://github.com/kubernetes-sigs/kubespray/archive/v2.18.1.tar.gz
$ tar -zxf v2.18.1.tar.gz
$ cd kubespray-2.18.1
$ sudo pip3 install -r requirements.txt
The default folder for subsequent commands is ~/kubespray-2.18.1.
To download the latest Kubespray version please visit Releases · kubernetes-sigs/kubespray · GitHub.
Deployment Customization
Create a new cluster configuration and host configuration file.
Replace the IP addresses below with your nodes' IP addresses:
Deployment Node console
$ cp -rfp inventory/sample inventory/mycluster
$ declare -a IPS=(192.168.100.29 192.168.100.34 192.168.100.39)
$ CONFIG_FILE=inventory/mycluster/hosts.yaml python3 contrib/inventory_builder/inventory.py ${IPS[@]}
As a result, the inventory/mycluster/hosts.yaml file will be created.
Review and change the host configuration in the file. Below is an example of this deployment:
inventory/mycluster/hosts.yaml
all:
hosts:
node1:
ansible_host: 192.168.100.29
ip: 192.168.100.29
access_ip: 192.168.100.29
node2:
ansible_host: 192.168.100.34
ip: 192.168.100.34
access_ip: 192.168.100.34
node3:
ansible_host: 192.168.100.39
ip: 192.168.100.39
access_ip: 192.168.100.39
children:
kube_control_plane:
hosts:
node1:
kube_node:
hosts:
node2:
node3:
etcd:
hosts:
node1:
k8s_cluster:
children:
kube_control_plane:
kube_node:
calico_rr:
hosts: {}
Deploying the Cluster Using KubeSpray Ansible Playbook
Run the following line to start the deployment procedure:
Deployment Node console
$ ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root cluster.yml
It takes a while for the K8s cluster deployment to complete. Please make sure no errors are encountered in the playbook log.
Below is an example of a successful result:
Deployment Node console
...
PLAY RECAP ***************************************************************************************************************************************************
localhost : ok=4 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
node1 : ok=501 changed=111 unreachable=0 failed=0 skipped=1131 rescued=0 ignored=2
node2 : ok=360 changed=40 unreachable=0 failed=0 skipped=661 rescued=0 ignored=1
node3 : ok=360 changed=40 unreachable=0 failed=0 skipped=660 rescued=0 ignored=1
Sunday 9 May 2021 19:39:17 +0000 (0:00:00.064) 0:06:54.711 ********
===============================================================================
kubernetes/control-plane : kubeadm | Initialize first master ----------------------------------------------------------------------------------------- 28.13s
kubernetes/control-plane : Master | wait for kube-scheduler ------------------------------------------------------------------------------------------ 12.78s
download : download_container | Download image if required ------------------------------------------------------------------------------------------- 10.56s
container-engine/containerd : ensure containerd packages are installed -------------------------------------------------------------------------------- 9.48s
download : download_container | Download image if required -------------------------------------------------------------------------------------------- 9.36s
download : download_container | Download image if required -------------------------------------------------------------------------------------------- 9.08s
download : download_container | Download image if required -------------------------------------------------------------------------------------------- 9.05s
download : download_file | Download item -------------------------------------------------------------------------------------------------------------- 8.91s
download : download_container | Download image if required -------------------------------------------------------------------------------------------- 8.47s
kubernetes/preinstall : Install packages requirements ------------------------------------------------------------------------------------------------- 8.30s
download : download_container | Download image if required -------------------------------------------------------------------------------------------- 7.49s
download : download_container | Download image if required -------------------------------------------------------------------------------------------- 7.39s
kubernetes-apps/ansible : Kubernetes Apps | Start Resources ------------------------------------------------------------------------------------------- 7.07s
download : download_container | Download image if required -------------------------------------------------------------------------------------------- 5.99s
container-engine/containerd : ensure containerd repository is enabled --------------------------------------------------------------------------------- 5.59s
container-engine/crictl : download_file | Download item ----------------------------------------------------------------------------------------------- 5.45s
download : download_file | Download item -------------------------------------------------------------------------------------------------------------- 5.34s
kubernetes-apps/ansible : Kubernetes Apps | Lay Down CoreDNS templates -------------------------------------------------------------------------------- 5.00s
download : download_container | Download image if required -------------------------------------------------------------------------------------------- 4.95s
download : download_file | Download item -------------------------------------------------------------------------------------------------------------- 4.50s
K8s Cluster Customization and Verification
Now that the K8s cluster is deployed, you can connect to it from any K8s Master Node with the root user account, or from another server that has the kubectl command installed and KUBECONFIG=<path-to-config-file> configured, in order to customize the deployment.
In this guide, we continue the deployment from the K8s Master Node with the root user account.
Label the Worker Nodes:
Master Node console
$ kubectl label nodes node2 node-role.kubernetes.io/worker=
$ kubectl label nodes node3 node-role.kubernetes.io/worker=
K8s Worker Node labeling is required for a proper installation of the NVIDIA Network Operator.
Below is an output example of the K8s cluster deployment information using the Calico CNI plugin.
To ensure that the Kubernetes cluster is installed correctly, run the following commands:
Master Node console
## Get cluster node status
kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node1 Ready control-plane,master 9d v1.22.8 192.168.100.29 <none> Ubuntu 20.04.4 LTS 5.4.0-109-generic containerd://1.5.8
node2 Ready worker 9d v1.22.8 192.168.100.34 <none> Ubuntu 20.04.4 LTS 5.4.0-109-lowlatency containerd://1.5.8
node3 Ready worker 9d v1.22.8 192.168.100.39 <none> Ubuntu 20.04.4 LTS 5.4.0-109-lowlatency containerd://1.5.8
## Get system pods status
kubectl -n kube-system get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-5788f6558-bm5h9 1/1 Running 0 9d 192.168.100.29 node1 <none> <none>
calico-node-4f748 1/1 Running 0 9d 192.168.100.34 node2 <none> <none>
calico-node-jhbjh 1/1 Running 0 9d 192.168.100.39 node3 <none> <none>
calico-node-m78p6 1/1 Running 0 9d 192.168.100.29 node1 <none> <none>
coredns-8474476ff8-dczww 1/1 Running 0 9d 10.233.90.23 node1 <none> <none>
coredns-8474476ff8-ksvkd 1/1 Running 0 9d 10.233.96.234 node2 <none> <none>
dns-autoscaler-5ffdc7f89d-h6nc8 1/1 Running 0 9d 10.233.90.20 node1 <none> <none>
kube-apiserver-node1 1/1 Running 0 9d 192.168.100.29 node1 <none> <none>
kube-controller-manager-node1 1/1 Running 0 9d 192.168.100.29 node1 <none> <none>
kube-proxy-2bq45 1/1 Running 0 9d 192.168.100.34 node2 <none> <none>
kube-proxy-4c8p7 1/1 Running 0 9d 192.168.100.39 node3 <none> <none>
kube-proxy-j226w 1/1 Running 0 9d 192.168.100.29 node1 <none> <none>
kube-scheduler-node1 1/1 Running 0 9d 192.168.100.29 node1 <none> <none>
nginx-proxy-node2 1/1 Running 0 9d 192.168.100.34 node2 <none> <none>
nginx-proxy-node3 1/1 Running 0 9d 192.168.100.39 node3 <none> <none>
nodelocaldns-9rffq 1/1 Running 0 9d 192.168.100.39 node3 <none> <none>
nodelocaldns-fdnr7 1/1 Running 0 9d 192.168.100.34 node2 <none> <none>
nodelocaldns-qhpxk 1/1 Running 0 9d 192.168.100.29 node1 <none> <none>
NVIDIA GPU Operator Installation for K8s Cluster
The preferred method to deploy the GPU Operator is using Helm from the K8s Master Node. To install Helm, simply run the following command:
$ snap install helm --classic
Add the NVIDIA GPU Operator Helm repository.
$ helm repo add nvidia https://nvidia.github.io/gpu-operator
$ helm repo update
Deploy the NVIDIA GPU Operator.
The GPU Operator should be deployed with the GPUDirect RDMA kernel module enabled (driver.rdma.enabled=true).
$ helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator --set driver.rdma.enabled=true --set driver.rdma.useHostMofed=true
$ helm ls -n gpu-operator
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
gpu-operator-1652190420 gpu-operator 1 2022-05-10 13:47:01.106147933 +0000 UTC deployed gpu-operator-v1.10.0 v1.10.0
Once the Helm chart is installed, check the status of the pods to ensure all the containers are running and the validation is complete:
$ kubectl get pod -n gpu-operator -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
gpu-feature-discovery-bcc22 1/1 Running 1 (3d8h ago) 5d18h 10.233.96.3 node2 <none> <none>
gpu-feature-discovery-vl68h 1/1 Running 0 5d18h 10.233.92.58 node3 <none> <none>
gpu-operator-1652190420-node-feature-discovery-master-5b5fx8zlx 1/1 Running 1 (4m5s ago) 5d18h 10.233.90.17 node1 <none> <none>
gpu-operator-1652190420-node-feature-discovery-worker-czsb4 1/1 Running 0 4s 10.233.92.75 node3 <none> <none>
gpu-operator-1652190420-node-feature-discovery-worker-fnlj6 1/1 Running 0 4s 10.233.96.253 node2 <none> <none>
gpu-operator-1652190420-node-feature-discovery-worker-r44r8 1/1 Running 1 (4m5s ago) 5d18h 10.233.90.22 node1 <none> <none>
gpu-operator-6497cbf9cd-vcsrg 1/1 Running 1 (4m6s ago) 5d18h 10.233.90.19 node1 <none> <none>
nvidia-container-toolkit-daemonset-4h9dr 1/1 Running 0 5d18h 10.233.96.246 node2 <none> <none>
nvidia-container-toolkit-daemonset-rv7sn 1/1 Running 1 (5d18h ago) 5d18h 10.233.92.50 node3 <none> <none>
nvidia-cuda-validator-kr6q9 0/1 Completed 0 5d18h 10.233.92.61 node3 <none> <none>
nvidia-cuda-validator-zb4p8 0/1 Completed 0 5d18h 10.233.96.4 node2 <none> <none>
nvidia-dcgm-exporter-5hdzh 1/1 Running 0 5d18h 10.233.96.198 node2 <none> <none>
nvidia-dcgm-exporter-lnqzb 1/1 Running 0 5d18h 10.233.92.57 node3 <none> <none>
nvidia-device-plugin-daemonset-dxgnz 1/1 Running 0 5d18h 10.233.92.62 node3 <none> <none>
nvidia-device-plugin-daemonset-w692b 1/1 Running 0 5d18h 10.233.96.9 node2 <none> <none>
nvidia-device-plugin-validator-pqns8 0/1 Completed 0 5d18h 10.233.92.64 node3 <none> <none>
nvidia-device-plugin-validator-sgtmt 0/1 Completed 0 5d18h 10.233.96.10 node2 <none> <none>
nvidia-driver-daemonset-l9x4n 2/2 Running 1 (2d19h ago) 5d18h 10.233.92.30 node3 <none> <none>
nvidia-driver-daemonset-tf2tl 2/2 Running 5 (2d21h ago) 5d18h 10.233.96.244 node2 <none> <none>
nvidia-operator-validator-p6794 1/1 Running 0 5d18h 10.233.96.6 node2 <none> <none>
nvidia-operator-validator-xjrg9 1/1 Running 0 5d18h 10.233.92.54 node3 <none> <none>
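As an optional extra check, you can schedule a short test pod that requests a GPU and runs nvidia-smi through the device plugin. This is a minimal sketch; the pod name, file name, and CUDA base image tag are illustrative assumptions - substitute any CUDA image available to your cluster.
gpu-smoke-test.yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    # Runs nvidia-smi once and exits; requires one GPU from the device plugin
    image: nvcr.io/nvidia/cuda:11.6.2-base-ubuntu20.04
    command: [ "nvidia-smi" ]
    resources:
      limits:
        nvidia.com/gpu: 1
Master Node console
kubectl apply -f gpu-smoke-test.yaml
## Once the pod completes, the nvidia-smi output should list the allocated GPU
kubectl logs gpu-smoke-test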
NVIDIA Network Operator Installation
The NVIDIA Network Operator leverages Kubernetes CRDs and the Operator SDK to manage networking-related components, enabling fast networking and RDMA for workloads in the K8s cluster. The fast network is a secondary network of the K8s cluster for applications that require high bandwidth or low latency.
To make it work, several components need to be provisioned and configured. Helm is required for the Network Operator deployment.
Add the NVIDIA Network Operator Helm repository:
## Add REPO
helm repo add mellanox https://mellanox.github.io/network-operator \
&& helm repo update
Create the values.yaml file to customize the Network Operator deployment (example):
nfd:
enabled: true
sriovNetworkOperator:
enabled: true
ofedDriver:
deploy: false
nvPeerDriver:
deploy: false
rdmaSharedDevicePlugin:
deploy: false
sriovDevicePlugin:
deploy: false
deployCR: true
secondaryNetwork:
deploy: true
cniPlugins:
deploy: true
multus:
deploy: true
ipamPlugin:
deploy: true
Deploy the operator:
helm install -f ./values.yaml -n network-operator --create-namespace --wait mellanox/network-operator --generate-name
helm ls -n network-operator
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
network-operator-1648457278 network-operator 1 2022-03-28 08:47:59.548667592 +0000 UTC deployed network-operator-1.1.0 v1.1.0
Once the Helm chart is installed, check the status of the pods to ensure all the containers are running:
## PODs status in namespace - network-operator
kubectl -n network-operator get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
network-operator-1648457278-5885dbfff5-wjgsc 1/1 Running 0 5m 10.233.90.15 node1 <none> <none>
network-operator-1648457278-node-feature-discovery-master-zbcx8 1/1 Running 0 5m 10.233.90.16 node1 <none> <none>
network-operator-1648457278-node-feature-discovery-worker-kk4qs 1/1 Running 0 5m 10.233.90.18 node1 <none> <none>
network-operator-1648457278-node-feature-discovery-worker-n44b6 1/1 Running 0 5m 10.233.92.221 node3 <none> <none>
network-operator-1648457278-node-feature-discovery-worker-xhzfw 1/1 Running 0 5m 10.233.96.233 node2 <none> <none>
network-operator-1648457278-sriov-network-operator-5cd4bdb6mm9f 1/1 Running 0 5m 10.233.90.21 node1 <none> <none>
sriov-device-plugin-cxnrl 1/1 Running 0 5m 192.168.100.34 node2 <none> <none>
sriov-device-plugin-djlmn 1/1 Running 0 5m 192.168.100.39 node3 <none> <none>
sriov-network-config-daemon-rgfvk 3/3 Running 0 5m 192.168.100.39 node3 <none> <none>
sriov-network-config-daemon-zzchs 3/3 Running 0 5m 192.168.100.34 node2 <none> <none>
## PODs status in namespace - nvidia-network-operator-resources
kubectl -n nvidia-network-operator-resources get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cni-plugins-ds-snf6x 1/1 Running 0 5m 192.168.100.39 node3 <none> <none>
cni-plugins-ds-zjb27 1/1 Running 0 5m 192.168.100.34 node2 <none> <none>
kube-multus-ds-mz7nd 1/1 Running 0 5m 192.168.100.39 node3 <none> <none>
kube-multus-ds-xjxgd 1/1 Running 0 5m 192.168.100.34 node2 <none> <none>
whereabouts-jgt24 1/1 Running 0 5m 192.168.100.34 node2 <none> <none>
whereabouts-sphx4 1/1 Running 0 5m 192.168.100.39 node3 <none> <none>
High-Speed Network Configuration
After installing the operator, check the SriovNetworkNodeState CRs to see all SR-IOV-enabled devices in your nodes.
In this deployment, the network interface chosen has the following name: enp57s0f0.
To review the interface status, use the following command:
NICs status
## NIC status
kubectl -n network-operator get sriovnetworknodestates.sriovnetwork.openshift.io node2 -o yaml
...
status:
interfaces:
- deviceID: 101d
driver: mlx5_core
eSwitchMode: legacy
linkSpeed: 100000 Mb/s
linkType: ETH
mac: 0c:42:a1:2b:73:fa
mtu: 9000
name: enp57s0f0
numVfs: 8
pciAddress: "0000:39:00.0"
totalvfs: 8
vendor: 15b3
- deviceID: 101d
driver: mlx5_core
...
Create a SriovNetworkNodePolicy CR for the chosen network interface (the policy.yaml file) by specifying the chosen interface in the 'nicSelector'.
According to the application design, VF0 is allocated to a separate pool from the rest of the VFs:
policy.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
name: mlnxnics-sw1
namespace: network-operator
spec:
nodeSelector:
feature.node.kubernetes.io/custom-rdma.capable: "true"
resourceName: timepool
priority: 99
mtu: 9000
numVfs: 8
nicSelector:
pfNames: [ "enp57s0f0#0-0" ]
deviceType: netdevice
isRdma: true
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
name: mlnxnics-sw2
namespace: network-operator
spec:
nodeSelector:
feature.node.kubernetes.io/custom-rdma.capable: "true"
resourceName: rdmapool
priority: 99
mtu: 9000
numVfs: 8
nicSelector:
pfNames: [ "enp57s0f0#1-7" ]
deviceType: netdevice
isRdma: true
Deploy policy.yaml:
kubectl apply -f policy.yaml
sriovnetworknodepolicy.sriovnetwork.openshift.io/mlnxnics-sw1 created
sriovnetworknodepolicy.sriovnetwork.openshift.io/mlnxnics-sw2 created
This step may take a while, depending on the number of K8s Worker Nodes that need to apply the configuration and the number of VFs configured for each selected network interface.
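When the node state has synced, you can verify that the two resource pools defined above (nvidia.com/timepool and nvidia.com/rdmapool) are advertised by the worker nodes. A minimal, hedged check, assuming node2 is one of the configured workers:
Master Node console
## The allocatable resources should include nvidia.com/timepool and nvidia.com/rdmapool entries
kubectl describe node node2 | grep -E 'timepool|rdmapool'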
Create a SriovNetwork CR for the chosen network interface (the network.yaml file), which refers to the 'resourceName' defined in the SriovNetworkNodePolicy.
The example below creates:
- timenet - K8s network name for PTP time sync
- rdmanet - K8s network name with dynamic IPAM
- rdmanet-static - K8s network name with static IPAM
network.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
name: timenet
namespace: network-operator
spec:
ipam: |
{
"datastore": "kubernetes",
"kubernetes": {"kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"},
"log_file": "/tmp/whereabouts.log",
"log_level": "debug",
"type": "whereabouts",
"range": "172.20.0.0/24",
"exclude": [ "172.20.0.1/32" ]
}
networkNamespace: default
resourceName: timepool
trust: "on"
vlan: 0
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
name: rdmanet
namespace: network-operator
spec:
ipam: |
{
"datastore": "kubernetes",
"kubernetes": { "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig" },
"log_file": "/tmp/whereabouts.log",
"log_level": "debug",
"type": "whereabouts",
"range": "192.168.102.0/24",
"exclude": [ "192.168.102.254/32", "192.168.102.253/32" ]
}
networkNamespace: default
resourceName: rdmapool
vlan: 2
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
name: rdmanet-static
namespace: network-operator
spec:
ipam: |
{
"type": "static"
}
networkNamespace: default
resourceName: rdmapool
vlan: 2
Deploy network.yaml:
kubectl apply -f network.yaml
sriovnetwork.sriovnetwork.openshift.io/timenet created
sriovnetwork.sriovnetwork.openshift.io/rdmanet created
sriovnetwork.sriovnetwork.openshift.io/rdmanet-static created
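Each SriovNetwork CR is rendered into a NetworkAttachmentDefinition in the namespace given by networkNamespace (default in this example). Before referencing the networks in pod annotations, you can confirm that they exist - a minimal check, assuming Multus has registered the CRD:
Master Node console
## The timenet, rdmanet and rdmanet-static attachments should be listed
kubectl get network-attachment-definitions -n default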
Manage HugePages
Kubernetes supports the allocation and consumption of pre-allocated HugePages by applications in a Pod. The nodes automatically discover and report all HugePages resources as schedulable resources. For additional information on K8s HugePages management, please refer to the Kubernetes documentation.
To allocate HugePages, modify the GRUB_CMDLINE_LINUX_DEFAULT parameter in /etc/default/grub.
The setting below allocates 2MB * 8192 pages = 16GB of HugePages at boot time:
/etc/default/grub
...
GRUB_CMDLINE_LINUX_DEFAULT="default_hugepagesz=2M hugepagesz=2M hugepages=8192"
...
Run update-grub to apply the configuration and reboot the server:
Worker Node console
# update-grub
# reboot
After the server comes back up, check the HugePages allocation from the Master Node with the following command:
Master Node console
# kubectl describe nodes node2
...
Capacity:
cpu: 48
ephemeral-storage: 459923528Ki
hugepages-1Gi: 0
hugepages-2Mi: 16Gi
memory: 264050900Ki
nvidia.com/gpu: 2
nvidia.com/rdmapool: 7
nvidia.com/timepool: 1
pods: 110
Allocatable:
cpu: 46
ephemeral-storage: 423865522704
hugepages-1Gi: 0
hugepages-2Mi: 16Gi
memory: 246909140Ki
nvidia.com/gpu: 2
nvidia.com/rdmapool: 7
nvidia.com/timepool: 1
pods: 110
...
Enable CPU and Topology Management
CPU Manager manages groups of CPUs and constrains workloads to specific CPUs.
CPU Manager is useful for workloads that have some of these attributes:
- Require as much CPU time as possible
- Are sensitive to processor cache misses
- Are low-latency network applications
- Coordinate with other processes and benefit from sharing a single processor cache
Topology Manager uses topology information from collected hints to decide if a pod can be accepted or rejected on a node, based on the configured Topology Manager policy and Pod resources requested. In order to extract the best performance, optimizations related to CPU isolation and memory and device locality are required.
Topology Manager is useful for workloads that use hardware accelerators to support latency-critical execution and high throughput parallel computation.
To use the Topology Manager, the CPU Manager must be configured with the static policy.
For additional information, please refer to Control CPU Management Policies on the Node and Control Topology Management Policies on a Node.
To enable the CPU Manager and the Topology Manager, add the following lines to the kubelet configuration file /etc/kubernetes/kubelet-config.yaml:
/etc/kubernetes/kubelet-config.yaml
...
cpuManagerPolicy: static
cpuManagerReconcilePeriod: 10s
topologyManagerPolicy: single-numa-node
featureGates:
CPUManager: true
TopologyManager: true
Due to changes in cpuManagerPolicy, remove /var/lib/kubelet/cpu_manager_state and restart kubelet service on each affected K8s worker node.
Worker Node console
# rm -f /var/lib/kubelet/cpu_manager_state
# service kubelet restart
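Once kubelet has restarted, you can optionally confirm that the static policy is active by inspecting the regenerated state file (a hedged check; the file is recreated by kubelet on startup):
Worker Node console
## The policyName field in the regenerated file should report "static"
# cat /var/lib/kubelet/cpu_manager_state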
Application
This section provides the K8s-specific components and K8s YAML configuration files required to deploy the Rivermax applications in the K8s cluster.
A Rivermax license is required for proper application execution. To obtain a license, please see the Rivermax License Generation Guidelines.
To download the Rivermax application container images from the container repository and the application pipeline, you need to register and log in to the Rivermax portal by clicking "Get Started".
Rivermax License
Upload the Rivermax license as a ConfigMap in the K8s cluster:
kubectl create configmap rivermax-config --from-file=rivermax.lic=./rivermax.lic
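You can verify that the license file was stored correctly before deploying the media applications (a minimal optional check):
Master Node console
## The rivermax.lic key should be listed in the ConfigMap data
kubectl describe configmap rivermax-config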
Media Node Application
This deployment definition contains an implementation of the AMWA Networked Media Open Specifications (NMOS) with the NMOS Rivermax Node implementation. For more information about AMWA, NMOS and the Networked Media Incubator, please refer to http://amwa.tv/. For more information about the Rivermax SDK, please refer to https://developer.nvidia.com/networking/Rivermax.
Below is the YAML configuration file for the Media Node deployment. Please fill in your container image name and your registry secret.
apiVersion: v1
kind: ConfigMap
metadata:
name: river-config
data:
container-config: |-
#media_node JSON file to run
config_json=/var/home/config.json
#Output registry stdout/stderr output to a log inside container
log_output=FALSE
#Update/insert label parameter with container hostname on entrypoint script run
update_label=TRUE
#Allow these network interfaces in /etc/avahi/avahi-daemon.conf
allow_interfaces=net1
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: "mnc"
labels:
apps: rivermax
spec:
replicas: 3
selector:
matchLabels:
app: rivermax
template:
metadata:
labels:
app: rivermax
annotations:
k8s.v1.cni.cncf.io/networks: rdmanet
spec:
containers:
- command:
image: < media node container image >
name: "medianode"
env:
- name: DISPLAY
value: "192.168.102.253:0.0"
resources:
requests:
nvidia.com/rdmapool: 1
hugepages-2Mi: 4Gi
memory: 8Gi
cpu: 4
limits:
nvidia.com/rdmapool: 1
hugepages-2Mi: 4Gi
memory: 8Gi
cpu: 4
securityContext:
capabilities:
add: [ "IPC_LOCK", "SYS_RESOURCE", "NET_RAW","NET_ADMIN" ]
volumeMounts:
- name: config
mountPath: /var/home/ext/
- name: licconfig
mountPath: /opt/mellanox/rivermax/
- mountPath: /hugepages
name: hugepage
- mountPath: /dev/shm
name: dshm
volumes:
- name: config
configMap:
name: river-config
- name: licconfig
configMap:
name: rivermax-config
- name: hugepage
emptyDir:
medium: HugePages
- name: dshm
emptyDir: {
medium: 'Memory',
sizeLimit: '4Gi'
}
imagePullSecrets:
- name: < Container registry secret >
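To deploy the Media Node, save the manifest above and apply it, then confirm that the replicas are running and were scheduled on the worker nodes. A short sketch - the file name medianode.yaml is an assumption:
Master Node console
kubectl apply -f medianode.yaml
## All replicas should reach the Running state
kubectl get pods -l app=rivermax -o wide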
NMOS Controller
An AMWA NMOS controller is a device that can interact with the NMOS APIs, which are a family of open specifications for networked media for professional applications. An NMOS controller can discover, register, connect, and manage media devices on an IP infrastructure using common methods and protocols. It can also handle event and tally, audio channel mapping, authorization, and other functions that are part of the NMOS roadmap. For more information, please see the project's README.md.
apiVersion: v1
kind: Pod
metadata:
name: nmos-cpp
labels:
app.kubernetes.io/name: nmos
annotations:
k8s.v1.cni.cncf.io/networks: |
[
{ "name": "rdmanet-static",
"ips": [ "192.168.102.254/24" ]
}
]
spec:
containers:
- name: nmos-pod
image: docker.io/rhastie/nmos-cpp:latest
env:
- name: RUN_NODE
value: "true"
resources:
requests:
cpu: 2
memory: 1Gi
nvidia.com/rdmapool: 1
limits:
cpu: 2
memory: 1Gi
nvidia.com/rdmapool: 1
ports:
- containerPort: 8010
name: port-8010
- containerPort: 8011
name: port-8011
- containerPort: 11000
name: port-11000
- containerPort: 11001
name: port-11001
- containerPort: 1883
name: port-1883
- containerPort: 5353
name: port-5353
protocol: UDP
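Apply the NMOS controller manifest in the same way and verify that the pod is running with its static secondary-network address. A short sketch - the file name nmos.yaml is an assumption:
Master Node console
kubectl apply -f nmos.yaml
kubectl get pod nmos-cpp -o wide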
DeepStream Media Gateway
One application of the DeepStream SDK is encoding RAW data to an SRT stream. The application can capture video frames from a camera or a file, encode them using the H.264 or H.265 codec, and send them over the network using the SRT protocol. SRT stands for Secure Reliable Transport, a low-latency and secure streaming technology. This application can be useful for scenarios such as remote surveillance, live broadcasting, or video conferencing.
Below is the YAML configuration file for the Media Gateway deployment. Please fill in your container image name and your registry secret.
apiVersion: v1
kind: Pod
metadata:
name: ds-rmax
labels:
name: dsrmax-app
annotations:
k8s.v1.cni.cncf.io/networks: rdmanet
spec:
containers:
- name: dsrmax
image: < DeepStream media gateway container image >
command:
- sh
- -c
- sleep inf
env:
- name: DISPLAY
value: "192.168.102.253:0.0"
ports:
- containerPort: 7001
name: udp-port
securityContext:
capabilities:
add: [ "IPC_LOCK", "SYS_RESOURCE", "NET_RAW","NET_ADMIN"]
resources:
requests:
nvidia.com/rdmapool: 1
nvidia.com/gpu: 1
hugepages-2Mi: 2Gi
memory: 8Gi
cpu: 8
limits:
nvidia.com/rdmapool: 1
nvidia.com/gpu: 1
hugepages-2Mi: 2Gi
memory: 8Gi
cpu: 8
volumeMounts:
- name: config
mountPath: /var/home/ext/
- name: licconfig
mountPath: /opt/mellanox/rivermax/
- mountPath: /hugepages
name: hugepage
- mountPath: /dev/shm
name: dshm
volumes:
- name: config
configMap:
name: river-config
- name: licconfig
configMap:
name: rivermax-config
- name: hugepage
emptyDir:
medium: HugePages
- name: dshm
emptyDir: {
medium: 'Memory',
sizeLimit: '4Gi'
}
imagePullSecrets:
- name: < Container registry secret >
---
apiVersion: v1
kind: Service
metadata:
name: rmax-service
spec:
type: NodePort
selector:
name: dsrmax-app
ports:
# By default and for convenience, the `targetPort` is set to the same value as the `port` field.
- port: 7001
name: udp-port
protocol: UDP
targetPort: 7001
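After filling in the image and registry secret, apply the manifest and look up the NodePort assigned to the UDP service. A short sketch - the file name ds-gateway.yaml is an assumption:
Master Node console
kubectl apply -f ds-gateway.yaml
## Note the NodePort mapped to UDP port 7001
kubectl get service rmax-service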
VNC Container with GUI
This pod definition allows you to access a web VNC interface with an Ubuntu LXDE/LXQT desktop environment inside the Kubernetes cluster. It uses an interface on the K8s secondary network to manage applications via a GUI on your cluster nodes.
Below is the YAML configuration file for the VNC deployment. Please fill in your container image name.
An example of this application can be found at GitHub - theasp/docker-novnc (noVNC Display Container for Docker), but you can create your own container image.
apiVersion: v1
kind: Pod
metadata:
name: ub-vnc
labels:
name: ubuntu-vnc
annotations:
k8s.v1.cni.cncf.io/networks: |
[
{ "name": "rdmanet-static",
"ips": [ "192.168.102.253/24" ]
}
]
spec:
volumes:
- name: dshm
emptyDir:
medium: Memory
containers:
- image: < NOVNC container image >
name: vnc-container
resources:
limits:
cpu: 4
memory: 8Gi
nvidia.com/rdmapool: 1
env:
- name: DISPLAY_WIDTH
value: "1920"
- name: DISPLAY_HEIGHT
value: "1080"
- name: RUN_XTERM
value: "yes"
- name: RUN_FLUXBOX
value: "yes"
ports:
- containerPort: 8080
name: http-port
volumeMounts:
- mountPath: /dev/shm
name: dshm
---
apiVersion: v1
kind: Service
metadata:
name: vnc-service
spec:
type: NodePort
selector:
name: ubuntu-vnc
ports:
- port: 8080
name: http-port
targetPort: 8080
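Apply the manifest and retrieve the NodePort mapped to the noVNC web interface; you can then browse to http://<worker-node-ip>:<node-port> from the management network. A short sketch - the file name vnc.yaml is an assumption:
Master Node console
kubectl apply -f vnc.yaml
## Note the NodePort mapped to port 8080
kubectl get service vnc-service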
Authors
Vitaliy Razinkov
Over the past few years, Vitaliy Razinkov has been working as a Solutions Architect on the NVIDIA Networking team, responsible for complex Kubernetes/OpenShift and Microsoft's leading solutions, research and design. He previously spent more than 25 years in senior positions at several companies. Vitaliy has written several reference design guides on Microsoft technologies, RoCE/RDMA accelerated machine learning in Kubernetes/OpenShift, and container solutions, all of which are available on the NVIDIA Networking Documentation website.

Gareth Sylvester-Bradley
Gareth Sylvester-Bradley is a Principal Engineer at NVIDIA, currently serving as the chair of the Networked Media Open Specifications (NMOS) Architecture Review group in the Advanced Media Workflow Association (AMWA). He is focused on building software toolkits and agile, collaborative industry specifications to deliver open, software-defined, hardware-accelerated media workflows for broadcast, live production, medical imaging, industrial video, and more.