RDG for DPDK Applications on SR-IOV Enabled Kubernetes Cluster with NVIDIA Network Operator
Created on July 7, 2021.
Scope
The following Reference Deployment Guide (RDG) explains how to build a high-performing Kubernetes (K8s) cluster with the containerd container runtime that is capable of running DPDK-based applications over NVIDIA Networking end-to-end Ethernet infrastructure.
This RDG describes a solution with multiple servers connected to a single switch that provides secondary network for the Kubernetes cluster. A more complex scale-out network topology of multiple L2 domains is beyond the scope of this document.
Abbreviations and Acronyms
| Term | Definition | Term | Definition |
|------|------------|------|------------|
| CNI | Container Network Interface | LLDP | Link Layer Discovery Protocol |
| CR | Custom Resource | NFD | Node Feature Discovery |
| CRD | Custom Resource Definition | OCI | Open Container Initiative |
| CRI | Container Runtime Interface | PF | Physical Function |
| DHCP | Dynamic Host Configuration Protocol | QSG | Quick Start Guide |
| DNS | Domain Name System | RDG | Reference Deployment Guide |
| DP | Device Plugin | RDMA | Remote Direct Memory Access |
| DPDK | Data Plane Development Kit | RoCE | RDMA over Converged Ethernet |
| EVPN | Ethernet VPN | SR-IOV | Single Root Input/Output Virtualization |
| HWE | Hardware Enablement | VF | Virtual Function |
| IPAM | IP Address Management | VPN | Virtual Private Network |
| K8s | Kubernetes | VXLAN | Virtual eXtensible Local Area Network |
Introduction
Provisioning a Kubernetes cluster with the containerd container runtime for running DPDK-based workloads can be an extremely complicated task.
Proper design and software and hardware component selection may become a gating task toward a successful deployment.
This guide provides a complete solution cycle, including technology overview, design, component selection, and deployment steps.
The solution is delivered on top of standard servers over NVIDIA end-to-end Ethernet infrastructure.
In this document, we use the new NVIDIA Network Operator, which is in charge of deploying and configuring the SR-IOV Device Plugin and SR-IOV CNI.
These components allow DPDK workloads to run on a Kubernetes worker node.
References
Solution Architecture
Key Components and Technologies
NVIDIA ConnectX SmartNICs
10/25/40/50/100/200 and 400G Ethernet Network Adapters
The industry-leading NVIDIA® ConnectX® family of smart network interface cards (SmartNICs) offer advanced hardware offloads and accelerations.
NVIDIA Ethernet adapters enable the highest ROI and lowest Total Cost of Ownership for hyperscale, public and private clouds, storage, machine learning, AI, big data, and telco platforms.
The NVIDIA® LinkX® product family of cables and transceivers provides the industry’s most complete line of 10, 25, 40, 50, 100, 200, and 400GbE in Ethernet and 100, 200 and 400Gb/s InfiniBand products for Cloud, HPC, hyperscale, Enterprise, telco, storage and artificial intelligence, data center applications.
NVIDIA Spectrum Ethernet Switches
Flexible form-factors with 16 to 128 physical ports, supporting 1GbE through 400GbE speeds.
Based on a ground-breaking silicon technology optimized for performance and scalability, NVIDIA Spectrum switches are ideal for building high-performance, cost-effective, and efficient Cloud Data Center Networks, Ethernet Storage Fabric, and Deep Learning Interconnects.
NVIDIA combines the benefits of NVIDIA Spectrum™ switches, based on an industry-leading application-specific integrated circuit (ASIC) technology, with a wide variety of modern network operating system choices, including NVIDIA Cumulus® Linux , SONiC and NVIDIA Onyx®.
NVIDIA® Cumulus® Linux is the industry's most innovative open network operating system that allows you to automate, customize, and scale your data center network like no other.
RDMA
RDMA is a technology that allows computers in a network to exchange data without involving the processor, cache, or operating system of either computer.
Like locally based DMA, RDMA improves throughput and performance and frees up compute resources.
Kubernetes
Kubernetes is an open-source container orchestration platform for deployment automation, scaling, and management of containerized applications.
Kubespray
Kubespray is a composition of Ansible playbooks, inventory, provisioning tools, and domain knowledge for generic OS/Kubernetes cluster configuration management tasks, and provides:
A highly available cluster
Composable attributes
Support for most popular Linux distributions
NVIDIA Network Operator
An analog to the NVIDIA GPU Operator, the NVIDIA Network Operator simplifies scale-out network design for Kubernetes by automating aspects of network deployment and configuration that would otherwise require manual work. It loads the required drivers, libraries, device plugins, and CNIs on any cluster node with an NVIDIA network interface. Paired with the NVIDIA GPU Operator, the Network Operator enables GPUDirect RDMA, a key technology that accelerates cloud-native AI workloads by orders of magnitude. The NVIDIA Network Operator uses Kubernetes CRDs and the Operator Framework to provision the host software needed for enabling accelerated networking.
What is containerd?
An industry-standard container runtime with an emphasis on simplicity, robustness, and portability. containerd is available as a daemon for Linux and Windows. It manages the complete container lifecycle of its host system, from image transfer and storage to container execution and supervision to low-level storage to network attachments and beyond.
NVIDIA Poll Mode Driver (PMD)
NVIDIA Poll Mode Driver (PMD) is an open-source upstream driver, embedded within dpdk.org releases, designed for fast packet processing and low latency by providing kernel bypass for receive and send and by avoiding the interrupt processing performance overhead.
TRex—Realistic Traffic Generator
TRex is an open-source stateful and stateless traffic generator fueled by DPDK. It generates L3-7 traffic and provides, in a single tool, capabilities found in commercial traffic generators. TRex can scale up to 200 Gb/sec with one server.
Logical Design
The logical design includes the following parts:
Deployment node running Kubespray that deploys Kubernetes clusters.
K8s master node running all Kubernetes management components.
K8s worker nodes.
TRex server.
High-speed Ethernet fabric for the DPDK tenant network.
Deployment and K8s management network.
Fabric Design
The high-performance network is a secondary network for the Kubernetes cluster and requires an L2 network topology.
This RDG describes a solution with multiple servers connected to a single switch that provides secondary network for the Kubernetes cluster.
A more complex scale-out network topology of multiple L2 domains is beyond the scope of this document.
Software Stack Components
Bill of Materials
The following hardware setup is utilized in this guide.
The above table does not contain the management network connectivity components.
Deployment and Configuration
Wiring
On each K8s worker node and TRex server, the first port of each NVIDIA Network Adapter is wired to the NVIDIA switch in high-performance fabric using NVIDIA LinkX DAC cables.
Deployment and Management network is part of IT infrastructure and is not covered in this guide.
Fabric
Prerequisites
High-performance Ethernet fabric
Single switch: NVIDIA SN2100
Switch OS: Cumulus Linux v4.2.1
Deployment and management network
DNS and DHCP network services and network topology are part of the IT infrastructure. The component installation and configuration are not covered in this guide.
Network Configuration
Below are the server names with their relevant network configurations.
| Server/Switch type | Server/Switch name | High-speed network | Management network (1/25 GbE) |
|--------------------|--------------------|--------------------|-------------------------------|
| Deployment node | depserver | - | ens4f0: DHCP 192.168.222.110 |
| Master node | node1 | - | ens4f0: DHCP 192.168.222.111 |
| Worker node | node2 | ens2f0: no IP set | ens4f0: DHCP 192.168.222.101 |
| Worker node | node3 | ens2f0: no IP set | ens4f0: DHCP 192.168.222.102 |
| TRex server | node4 | ens2f0: no IP set, ens2f1: no IP set | ens4f0: DHCP 192.168.222.103 |
| High-speed switch | leaf01 | - | mgmt0: DHCP 192.168.222.201 |
The high-speed network interfaces (ens2f0/ens2f1) do not require additional IP configuration.
Fabric Configuration
This solution is based on the Cumulus Linux v4.2.1 switch operating system.
Intermediate-level Linux knowledge is assumed for this guide. Familiarity with basic text editing, Linux file permissions, and process monitoring is required. A variety of text editors are pre-installed, including vi and nano.
Networking engineers who are unfamiliar with Linux concepts should refer to this reference guide to compare the Cumulus Linux CLI and configuration options and their equivalent Cisco Nexus 3000 NX-OS commands and settings. There is also a series of short videos with an introduction to Linux and Cumulus-Linux-specific concepts.
A Greenfield deployment is assumed for this guide. Please refer to the following guide for Upgrading Cumulus Linux.
Fabric configuration steps:
Administratively enable all physical ports.
Create a bridge and configure one or more front panel ports as members of the bridge.
Commit configuration.
Perform the switch configuration steps from the switch console:
Switch console
Linux swx-mld-l03 4.19.0-cl-1-amd64 #1 SMP Cumulus 4.19.94-1+cl4.2.1u1 (2020-08-28) x86_64
Welcome to NVIDIA Cumulus (R) Linux (R)
For support and online technical documentation, visit
http://www.cumulusnetworks.com/support
The registered trademark Linux (R) is used pursuant to a sublicense from LMI,
the exclusive licensee of Linus Torvalds, owner of the mark on a world-wide
basis.
cumulus@leaf01:mgmt:~$ net add interface swp1-16
cumulus@leaf01:mgmt:~$ net add bridge bridge ports swp1-16
cumulus@leaf01:mgmt:~$ net commit
To view link status, use the net show interface all command. The following examples show the output of ports in admin down, down, and up modes.
Switch console
cumulus@leaf01:mgmt:~$ net show interface all
State Name Spd MTU Mode LLDP Summary
----- ------ ---- ----- ---------- ------------------------------- ------------------------
UP lo N/A 65536 Loopback IP: 127.0.0.1/8
lo IP: ::1/128
UP eth0 1G 1500 Mgmt mgmt-xxx-xxx-xxx-xxx (8) Master: mgmt(UP)
eth0 IP: 192.168.222.201/24(DHCP)
UP swp1 100G 9216 Access/L2 Master: bridge(UP)
UP swp2 100G 9216 Access/L2 node2 (0c:42:a1:2b:74:ae) Master: bridge(UP)
UP swp3 100G 9216 Access/L2 Master: bridge(UP)
UP swp4 100G 9216 Access/L2 node3 (0c:42:a1:24:05:4a) Master: bridge(UP)
UP swp5 100G 9216 Access/L2 Master: bridge(UP)
UP swp6 100G 9216 Access/L2 node4 (0c:42:a1:24:05:1a) Master: bridge(UP)
UP swp7 100G 9216 Access/L2 Master: bridge(UP)
UP swp8 100G 9216 Access/L2 node4 (0c:42:a1:24:05:1b) Master: bridge(UP)
DN swp9 N/A 9216 Access/L2 Master: bridge(UP)
DN swp10 N/A 9216 Access/L2 Master: bridge(UP)
DN swp11 N/A 9216 Access/L2 Master: bridge(UP)
DN swp12 N/A 9216 Access/L2 Master: bridge(UP)
DN swp13 N/A 9216 Access/L2 Master: bridge(UP)
DN swp14 N/A 9216 Access/L2 Master: bridge(UP)
DN swp15 N/A 9216 Access/L2 Master: bridge(UP)
DN swp16 N/A 9216 Access/L2 Master: bridge(UP)
UP bridge N/A 9216 Bridge/L2
UP mgmt N/A 65536 VRF IP: 127.0.0.1/8
mgmt IP: ::1/128
Nodes Configuration
General Prerequisites:
Hardware
All the K8s worker nodes have the same hardware specification (see BoM for details).
Host BIOS
Verify that an SR-IOV-capable server platform is being used and review the BIOS settings in the server platform vendor documentation to enable SR-IOV in the BIOS.
Host OS
Ubuntu Server 20.04 operating system should be installed on all servers with OpenSSH server packages.
Experience with Kubernetes
Familiarization with the Kubernetes Cluster architecture is essential.
Make sure that the BIOS settings on the worker nodes servers have SR-IOV enabled and that the servers are tuned for maximum performance.
All worker nodes must have the same PCIe placement for the NIC and expose the same interface name.
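A quick way to confirm consistent PCIe placement and SR-IOV capability is to query the adapter directly from the host OS on each worker node. The following is a minimal sketch, assuming the high-speed interface is named ens2f0 as in this guide; adjust the interface name to your environment. All worker nodes should report the same PCI slot name and a non-zero sriov_totalvfs value.
Worker Node console
# lspci | grep -i mellanox
# grep PCI_SLOT_NAME /sys/class/net/ens2f0/device/uevent
# cat /sys/class/net/ens2f0/device/sriov_totalvfs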
Host OS Prerequisites
Make sure the Ubuntu Server 20.04 operating system is installed on all servers with OpenSSH server packages, and create a non-root depuser account with passwordless sudo privileges.
Update the Ubuntu software packages by running the following commands:
Server console
$ sudo apt-get update
$ sudo apt-get upgrade -y
$ sudo reboot
In this solution, we appended the following lines to the end of /etc/sudoers:
Server console
$ sudo vim /etc/sudoers
#includedir /etc/sudoers.d
#K8s cluster deployment user with sudo privileges without password
depuser ALL=(ALL) NOPASSWD:ALL
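As a quick sanity check (assuming the depuser account already exists on the node), verify that sudo no longer prompts for a password:
Server console
$ sudo -n true && echo "passwordless sudo is working"
passwordless sudo is working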
NIC Firmware Upgrade
It is recommended to upgrade the NIC firmware on the worker nodes to the latest released version.
Download the mlxup firmware update and query utility to each worker node and update the NIC firmware.
The most recent version of mlxup can be downloaded from the official download page. mlxup can download and update the NIC firmware to the latest firmware version over the Internet.
Running the utility requires sudo privileges:
Worker Node console
# wget http://www.mellanox.com/downloads/firmware/mlxup/4.15.2/SFX/linux_x64/mlxup
# chmod +x mlxup
# ./mlxup -online -u
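Optionally, the currently installed and available firmware versions can be queried first without applying any changes. The following is a sketch using mlxup's query mode; the exact output depends on the adapters and firmware versions present:
Worker Node console
# ./mlxup --query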
RDMA Subsystem Configuration
RDMA subsystem configuration is required on each worker node.
Install the LLDP daemon and the RDMA core userspace libraries and daemons.
Worker Node console
# apt install -y lldpd rdma-core
LLDPD is a daemon able to receive and send LLDP frames. The Link Layer Discovery Protocol (LLDP) is a vendor-neutral Layer 2 protocol that allows a network device to advertise its identity and capabilities on the local network.
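With lldpd running and the switch configured as described in the Fabric section above, the physical wiring can optionally be verified from the host side. This is a minimal sketch, assuming the lldpd service is active; the output should list the leaf01 switch port connected to each high-speed interface:
Worker Node console
# lldpcli show neighbors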
Identify the name of the RDMA-capable interface for the high-performance K8s network.
In this guide, the ens2f0 network interface was chosen for the high-performance K8s network; it will be activated by the NVIDIA Network Operator deployment:
Worker Node console
# rdma link
link rocep7s0f0/1 state DOWN physical_state DISABLED netdev ens2f0
link rocep7s0f1/1 state DOWN physical_state DISABLED netdev ens2f1
link rocep131s0f0/1 state ACTIVE physical_state LINK_UP netdev ens4f0
link rocep131s0f1/1 state DOWN physical_state DISABLED netdev ens4f1
Set RDMA subsystem network namespace mode to exclusive mode.
Setting the RDMA subsystem network namespace mode (the netns parameter of the ib_core module) to exclusive mode allows network namespace isolation for RDMA workloads on the worker node servers. Create the /etc/modprobe.d/ib_core.conf configuration file to change the ib_core module parameters:
/etc/modprobe.d/ib_core.conf
# Set netns to exclusive mode for namespace isolation
options ib_core netns_mode=0
Then re-generate the initial RAM disks and reboot servers:
Worker Node console
# update-initramfs -u
# reboot
After the server comes back, check netns mode:
Worker Node console
# rdma system
netns exclusive
K8s Cluster Deployment and Configuration
The Kubernetes cluster in this solution will be installed using Kubespray with a non-root depuser account from the deployment node.
SSH Private Key and SSH Passwordless Login
Log in to the deployment node as a deployment user (in this case, depuser) and create an SSH private key for configuring the passwordless authentication on your computer by running the following commands:
Deployment Node console
$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/depuser/.ssh/id_rsa):
Created directory '/home/depuser/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/depuser/.ssh/id_rsa
Your public key has been saved in /home/depuser/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:IfcjdT/spXVHVd3n6wm1OmaWUXGuHnPmvqoXZ6WZYl0 depuser@depserver
The key's randomart image is:
+---[RSA 3072]----+
| *|
| .*|
| . o . . o=|
| o + . o +E|
| S o .**O|
| . .o=OX=|
| . o%*.|
| O.o.|
| .*.ooo|
+----[SHA256]-----+
Copy your SSH public key (such as ~/.ssh/id_rsa.pub) to all nodes in the deployment by running the following command (example):
Deployment Node console
$ ssh-copy-id depuser@192.168.222.111
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/depuser/.ssh/id_rsa.pub"
The authenticity of host '192.168.222.111 (192.168.222.111)' can't be established.
ECDSA key fingerprint is SHA256:6nhUgRlt9gY2Y2ofukUqE0ltH+derQuLsI39dFHe0Ag.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
depuser@192.168.222.111's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'depuser@192.168.222.111'"
and check to make sure that only the key(s) you wanted were added.
Verify that you have passwordless SSH connectivity to all nodes in your deployment by running the following command (example):
Deployment Node console
$ ssh depuser@192.168.222.111
Kubespray Deployment and Configuration
General Setting
To install the dependencies for running Kubespray with Ansible on the deployment node, run the following commands:
Deployment Node console
$ cd ~
$ sudo apt -y install python3-pip jq
$ wget https://github.com/kubernetes-sigs/kubespray/archive/v2.15.0.tar.gz
$ tar -zxf v2.15.0.tar.gz
$ cd kubespray-2.15.0
$ sudo pip3 install -r requirements.txt
The default folder for subsequent commands is ~/kubespray-2.15.0.
Deployment Customization
Create a new cluster configuration and host configuration file.
Replace the IP addresses below with your nodes' IP addresses:
Deployment Node console
$ cp -rfp inventory/sample inventory/mycluster
$ declare -a IPS=(192.168.222.111 192.168.222.101 192.168.222.102)
$ CONFIG_FILE=inventory/mycluster/hosts.yaml python3 contrib/inventory_builder/inventory.py ${IPS[@]}
As a result, the inventory/mycluster/hosts.yaml file will be created.
Review and change the host configuration in the file. Below is an example for this deployment:
inventory/mycluster/hosts.yaml
all:
  hosts:
    node1:
      ansible_host: 192.168.222.111
      ip: 192.168.222.111
      access_ip: 192.168.222.111
    node2:
      ansible_host: 192.168.222.101
      ip: 192.168.222.101
      access_ip: 192.168.222.101
    node3:
      ansible_host: 192.168.222.102
      ip: 192.168.222.102
      access_ip: 192.168.222.102
  children:
    kube-master:
      hosts:
        node1:
    kube-node:
      hosts:
        node2:
        node3:
    etcd:
      hosts:
        node1:
    k8s-cluster:
      children:
        kube-master:
        kube-node:
    calico-rr:
      hosts: {}
Review and change cluster installation parameters in the files:
inventory/mycluster/group_vars/all/all.yml
inventory/mycluster/group_vars/k8s-cluster/k8s-cluster.yml
In inventory/mycluster/group_vars/k8s-cluster/k8s-cluster.yml, set the default Kubernetes CNI by setting the desired kube_network_plugin value (default: calico).
inventory/mycluster/group_vars/k8s-cluster/k8s-cluster.yml
...
# Choose network plugin (cilium, calico, contiv, weave or flannel. Use cni for generic cni plugin)
# Can also be set to 'cloud', which lets the cloud provider setup appropriate routing
kube_network_plugin: calico
# Setting multi_networking to true will install Multus: https://github.com/intel/multus-cni
kube_network_plugin_multus: false
...
Choose the container runtime.
In this guide, containerd was chosen as the default container runtime for the K8s cluster deployment because Docker support in Kubernetes is being deprecated.
To use the containerd container runtime, set the following variables:
In inventory/mycluster/group_vars/k8s-cluster/k8s-cluster.yml :
inventory/mycluster/group_vars/k8s-cluster/k8s-cluster.yml
...
## Container runtime
## docker for docker, crio for cri-o and containerd for containerd.
container_manager: containerd
...
In inventory/mycluster/group_vars/all/all.yml:
inventory/mycluster/group_vars/all/all.yml
...
## Experimental kubeadm etcd deployment mode. Available only for new deployment
etcd_kubeadm_enabled: true
...
In inventory/mycluster/group_vars/etcd.yml:
inventory/mycluster/group_vars/etcd.yml
...
## Settings for etcd deployment type
etcd_deployment_type: host
...
Deploying the Cluster Using KubeSpray Ansible Playbook
Run the following line to start the deployment process:
Deployment Node console
$ ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root cluster.yml
It takes a while for this deployment to complete. Please make sure no errors are encountered.
A successful result should look something like the following:
Deployment Node console
...
PLAY RECAP ***********************************************************************************************************************************************************************************
localhost : ok=3 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
node1 : ok=554 changed=81 unreachable=0 failed=0 skipped=1152 rescued=0 ignored=2
node2 : ok=360 changed=42 unreachable=0 failed=0 skipped=633 rescued=0 ignored=1
node3 : ok=360 changed=42 unreachable=0 failed=0 skipped=632 rescued=0 ignored=1
Sunday 11 July 2021 22:36:04 +0000 (0:00:00.053) 0:06:51.785 ************
===============================================================================
kubernetes/kubeadm : Join to cluster ------------------------------------------------------------------------------------------------------------------------------------------------- 37.24s
kubernetes/control-plane : kubeadm | Initialize first master ------------------------------------------------------------------------------------------------------------------------- 28.29s
download_file | Download item -------------------------------------------------------------------------------------------------------------------------------------------------------- 16.57s
kubernetes/control-plane : Master | wait for kube-scheduler -------------------------------------------------------------------------------------------------------------------------- 14.23s
download_container | Download image if required -------------------------------------------------------------------------------------------------------------------------------------- 11.06s
download_container | Download image if required --------------------------------------------------------------------------------------------------------------------------------------- 9.18s
download_file | Download item --------------------------------------------------------------------------------------------------------------------------------------------------------- 8.61s
kubernetes-apps/ansible : Kubernetes Apps | Start Resources --------------------------------------------------------------------------------------------------------------------------- 7.02s
container-engine/crictl : download_file | Download item ------------------------------------------------------------------------------------------------------------------------------- 5.78s
download_container | Download image if required --------------------------------------------------------------------------------------------------------------------------------------- 5.52s
Configure | Check if etcd cluster is healthy ------------------------------------------------------------------------------------------------------------------------------------------ 5.24s
download_file | Download item --------------------------------------------------------------------------------------------------------------------------------------------------------- 4.89s
download_container | Download image if required --------------------------------------------------------------------------------------------------------------------------------------- 4.81s
kubernetes-apps/ansible : Kubernetes Apps | Lay Down CoreDNS templates ---------------------------------------------------------------------------------------------------------------- 4.68s
reload etcd --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 4.65s
download_file | Download item --------------------------------------------------------------------------------------------------------------------------------------------------------- 4.24s
kubernetes/preinstall : Get current calico cluster version ---------------------------------------------------------------------------------------------------------------------------- 3.70s
network_plugin/calico : Start Calico resources ---------------------------------------------------------------------------------------------------------------------------------------- 3.42s
container-engine/crictl : extract_file | Unpacking archive ---------------------------------------------------------------------------------------------------------------------------- 3.35s
kubernetes-apps/cluster_roles : Apply workaround to allow all nodes with cert O=system:nodes to register ------------------------------------------------------------------------------ 3.32s
K8s Cluster Customization
Now that the K8s cluster is deployed, connect to the K8s master node with the root user account in order to customize the deployment.
Label the worker nodes.
Master Node console
# kubectl label nodes node2 node-role.kubernetes.io/worker=
# kubectl label nodes node3 node-role.kubernetes.io/worker=
K8S Cluster Deployment Verification
Following is an output example of K8s cluster deployment information using the Calico CNI plugin.
To ensure that the Kubernetes cluster is installed correctly, run the following commands:
Master Node console
# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node1 Ready master 44m v1.19.7 192.168.222.111 <none> Ubuntu 20.04.2 LTS 5.4.0-72-generic containerd://1.4.4
node2 Ready worker 42m v1.19.7 192.168.222.101 <none> Ubuntu 20.04.2 LTS 5.4.0-72-generic containerd://1.4.4
node3 Ready worker 42m v1.19.7 192.168.222.102 <none> Ubuntu 20.04.2 LTS 5.4.0-72-generic containerd://1.4.4
# kubectl -n kube-system get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-8b5ff5d58-ph86x 1/1 Running 0 43m 192.168.222.101 node2 <none> <none>
calico-node-l48qg 1/1 Running 0 43m 192.168.222.102 node3 <none> <none>
calico-node-ldx7w 1/1 Running 0 43m 192.168.222.111 node1 <none> <none>
calico-node-x9bh5 1/1 Running 0 43m 192.168.222.101 node2 <none> <none>
coredns-85967d65-pslmm 1/1 Running 0 27m 10.233.96.1 node2 <none> <none>
coredns-85967d65-qp2rl 1/1 Running 0 43m 10.233.90.230 node1 <none> <none>
dns-autoscaler-5b7b5c9b6f-8wb67 1/1 Running 0 43m 10.233.90.229 node1 <none> <none>
etcd-node1 1/1 Running 0 45m 192.168.222.111 node1 <none> <none>
kube-apiserver-node1 1/1 Running 0 45m 192.168.222.111 node1 <none> <none>
kube-controller-manager-node1 1/1 Running 0 45m 192.168.222.111 node1 <none> <none>
kube-proxy-6p4rm 1/1 Running 0 44m 192.168.222.101 node2 <none> <none>
kube-proxy-8bj6s 1/1 Running 0 44m 192.168.222.111 node1 <none> <none>
kube-proxy-dj4l8 1/1 Running 0 44m 192.168.222.102 node3 <none> <none>
kube-scheduler-node1 1/1 Running 0 45m 192.168.222.111 node1 <none> <none>
nginx-proxy-node2 1/1 Running 0 44m 192.168.222.101 node2 <none> <none>
nginx-proxy-node3 1/1 Running 0 44m 192.168.222.102 node3 <none> <none>
nodelocaldns-8b6kf 1/1 Running 0 43m 192.168.222.102 node3 <none> <none>
nodelocaldns-kzmmh 1/1 Running 0 43m 192.168.222.101 node2 <none> <none>
nodelocaldns-zh9fz 1/1 Running 0 43m 192.168.222.111 node1 <none> <none>
NVIDIA Network Operator Installation for K8S Cluster
The NVIDIA Network Operator leverages Kubernetes CRDs and the Operator SDK to manage networking-related components in order to enable fast networking and RDMA for workloads in a K8s cluster. The fast network is a secondary network of the K8s cluster for applications that require high bandwidth or low latency.
To make it work, several components need to be provisioned and configured. All operator configuration and installation steps should be performed from the K8S master node with the root user account.
Prerequisites
Install Helm.
Master Node console
# snap install helm --classic
Install the additional RDMA CNI plugin.
The RDMA CNI plugin allows network namespace isolation for RDMA workloads in a containerized environment.
Deploy the CNI using the following YAML file:
Master Node console
# kubectl apply -f https://raw.githubusercontent.com/Mellanox/rdma-cni/master/deployment/rdma-cni-daemonset.yaml
To ensure the plugin is installed correctly, run the following command:
Master Node console
# kubectl -n kube-system get pods -o wide | egrep "rdma"
kube-rdma-cni-ds-5zl8d    1/1   Running   0   11m   192.168.222.102   node3   <none>   <none>
kube-rdma-cni-ds-q74n5    1/1   Running   0   11m   192.168.222.101   node2   <none>   <none>
kube-rdma-cni-ds-rnqkr    1/1   Running   0   11m   192.168.222.111   node1   <none>   <none>
Deployment
Add the NVIDIA Network Operator Helm repository:
Master Node console
# helm repo add mellanox https://mellanox.github.io/network-operator
# helm repo update
Create the values.yaml file in the user home folder (example):
values.yaml
nfd:
  enabled: true
sriovNetworkOperator:
  enabled: true
# NicClusterPolicy CR values:
deployCR: true
ofedDriver:
  deploy: false
nvPeerDriver:
  deploy: false
rdmaSharedDevicePlugin:
  deploy: false
sriovDevicePlugin:
  deploy: false
secondaryNetwork:
  deploy: true
  cniPlugins:
    deploy: true
    image: containernetworking-plugins
    repository: mellanox
    version: v0.8.7
    imagePullSecrets: []
  multus:
    deploy: true
    image: multus
    repository: nfvpe
    version: v3.6
    imagePullSecrets: []
    config: ''
  ipamPlugin:
    deploy: true
    image: whereabouts
    repository: mellanox
    version: v0.3
    imagePullSecrets: []
Deploy the operator:
Master Node console
# helm install -f ./values.yaml -n network-operator --create-namespace --wait mellanox/network-operator --generate-name
NAME: network-operator
LAST DEPLOYED: Sun Jul 11 23:06:54 2021
NAMESPACE: network-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Get Network Operator deployed resources by running the following commands:
$ kubectl -n network-operator get pods
$ kubectl -n mlnx-network-operator-resources get pods
To ensure that the Operator is deployed correctly, run the following commands:
Master Node console
# kubectl -n network-operator get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
network-operator-1627211751-5bd467cbd9-2hwqx 1/1 Running 0 29h 10.233.90.5 node1 <none> <none>
network-operator-1627211751-node-feature-discovery-master-dgs69 1/1 Running 0 29h 10.233.90.6 node1 <none> <none>
network-operator-1627211751-node-feature-discovery-worker-7n6gs 1/1 Running 0 29h 10.233.90.3 node1 <none> <none>
network-operator-1627211751-node-feature-discovery-worker-sjdxw 1/1 Running 1 29h 10.233.96.7 node2 <none> <none>
network-operator-1627211751-node-feature-discovery-worker-vzpvg 1/1 Running 1 29h 10.233.92.5 node3 <none> <none>
network-operator-1627211751-sriov-network-operator-5f869696sdzp 1/1 Running 0 29h 10.233.90.4 node1 <none> <none>
High-Speed Network Configuration
After installing the operator, check the SriovNetworkNodeState CRs to see all SR-IOV-enabled devices in your node.
In our deployment, the network interface named ens2f0 was chosen. To review the interface status, use the following command:
Master Node console
# kubectl -n network-operator get sriovnetworknodestates.sriovnetwork.openshift.io node2 -o yaml
...
status:
  interfaces:
  - deviceID: 101d
    driver: mlx5_core
    linkSpeed: 100000 Mb/s
    linkType: ETH
    mac: 0c:42:a1:2b:74:ae
    mtu: 1500
    name: ens2f0
    pciAddress: "0000:07:00.0"
    totalvfs: 8
    vendor: 15b3
  - deviceID: 101d
    driver: mlx5_core
    linkType: ETH
    mac: 0c:42:a1:2b:74:af
    mtu: 1500
    name: ens2f1
    pciAddress: "0000:07:00.1"
    totalvfs: 8
    vendor: 15b3
...
Create a SriovNetworkNodePolicy CR file, policy.yaml, specifying the chosen interface in the nicSelector (in this example, the ens2f0 interface):
policy.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: mlnxnics
  namespace: network-operator
spec:
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  resourceName: mlnx2f0
  priority: 98
  mtu: 9000
  numVfs: 8
  nicSelector:
    vendor: "15b3"
    pfNames: [ "ens2f0" ]
  deviceType: netdevice
  isRdma: true
Deploy policy.yaml:
Master Node console
# kubectl apply -f policy.yaml
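Once the policy is applied, the SR-IOV Network Operator configures the requested number of VFs on the selected interface of every matching worker node. As an optional check (a sketch, assuming the ens2f0 interface and numVfs: 8 from the policy above), the VF count can be read directly on a worker node; ip link show on the PF should also list the individual VF entries:
Worker Node console
# cat /sys/class/net/ens2f0/device/sriov_numvfs
8
# ip link show ens2f0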
Create a SriovNetwork CR file, network.yaml, which refers to the resourceName defined in the SriovNetworkNodePolicy (in this example, referencing the mlnx2f0 resource and setting 192.168.101.0/24 as the CIDR range for the high-speed network):
network.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: "netmlnx2f0"
  namespace: network-operator
spec:
  ipam: |
    {
      "datastore": "kubernetes",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
      },
      "log_file": "/tmp/whereabouts.log",
      "log_level": "debug",
      "type": "whereabouts",
      "range": "192.168.101.0/24"
    }
  vlan: 0
  networkNamespace: "default"
  spoofChk: "off"
  resourceName: "mlnx2f0"
  linkState: "enable"
  metaPlugins: |
    {
      "type": "rdma"
    }
Deploy network.yaml:
Master Node console
# kubectl apply -f network.yaml
Validating the Deployment
Check that the deployment finished successfully:
Master Node console
# kubectl -n nvidia-network-operator-resources get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cni-plugins-ds-f548q 1/1 Running 1 30m 192.168.222.101 node2 <none> <none>
cni-plugins-ds-qw7hx 1/1 Running 1 30m 192.168.222.102 node3 <none> <none>
kube-multus-ds-cjbf9 1/1 Running 1 30m 192.168.222.102 node3 <none> <none>
kube-multus-ds-rgc95 1/1 Running 1 30m 192.168.222.101 node2 <none> <none>
whereabouts-gwr7p 1/1 Running 1 30m 192.168.222.101 node2 <none> <none>
whereabouts-n29nq 1/1 Running 1 30m 192.168.222.102 node3 <none> <none>
Check deployed network:
Master Node console
# kubectl get network-attachment-definitions.k8s.cni.cncf.io
NAME AGE
netmlnx2f0 4m56s
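To confirm that the SriovNetwork CR was rendered into the expected CNI configuration, including the whereabouts IPAM range and the rdma metaplugin, the network attachment definition itself can optionally be inspected:
Master Node console
# kubectl get network-attachment-definitions.k8s.cni.cncf.io netmlnx2f0 -o yaml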
Check worker node resources:
Master Node console
# kubectl describe nodes node2
...
Addresses:
InternalIP: 192.168.222.101
Hostname: node2
Capacity:
cpu: 24
ephemeral-storage: 229698892Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 264030604Ki
nvidia.com/mlnx2f0: 8
pods: 110
Allocatable:
cpu: 23900m
ephemeral-storage: 211690498517
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 242694540Ki
nvidia.com/mlnx2f0: 8
pods: 110
...
Manage HugePages
Kubernetes supports the allocation and consumption of pre-allocated HugePages by applications in a Pod. The nodes automatically discover and report all HugePages resources as schedulable resources. For additional information on K8s HugePages management, please refer to the Kubernetes documentation.
To allocate HugePages, modify the GRUB_CMDLINE_LINUX_DEFAULT parameter in /etc/default/grub.
The setting below allocates 1GB * 16 pages = 16GB and 2MB * 2048 pages = 4GB of HugePages at boot time:
/etc/default/grub
...
GRUB_CMDLINE_LINUX_DEFAULT="default_hugepagesz=1G hugepagesz=1G hugepages=16 hugepagesz=2M hugepages=2048"
...
Run update-grub to apply the configuration to GRUB and reboot the server:
Worker Node console
# update-grub
# reboot
After the server comes back, check the HugePages allocation from the master node with the following command:
Master Node console
# kubectl describe nodes node2
...
Capacity:
cpu: 24
ephemeral-storage: 229698892Ki
hugepages-1Gi: 16Gi
hugepages-2Mi: 4Gi
memory: 264030604Ki
nvidia.com/mlnx2f0: 8
pods: 110
Allocatable:
cpu: 23900m
ephemeral-storage: 211690498517
hugepages-1Gi: 16Gi
hugepages-2Mi: 4Gi
memory: 242694540Ki
nvidia.com/mlnx2f0: 8
pods: 110
...
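The allocation can also be verified directly on the worker node itself, independently of Kubernetes; a minimal check:
Worker Node console
# grep Huge /proc/meminfo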
Enable CPU and Topology Management
CPU Manager manages groups of CPUs and constrains workloads to specific CPUs.
CPU Manager is useful for workloads that have some of these attributes:
Require as much CPU time as possible
Are sensitive to processor cache misses
Are low-latency network applications
Coordinate with other processes and benefit from sharing a single processor cache
Topology Manager uses topology information from collected hints to decide if a pod can be accepted or rejected on a node, based on the configured Topology Manager policy and Pod resources requested. In order to extract the best performance, optimizations related to CPU isolation and memory and device locality are required.
Topology Manager is useful for workloads that use hardware accelerators to support latency-critical execution and high throughput parallel computation.
To use Topology Manager, CPU Manager with static policy must be used.
For additional information, please refer to Control CPU Management Policies on a Node and Control Topology Management Policies on a Node.
To enable CPU Manager and Topology Manager, add the following lines to the kubelet configuration file /etc/kubernetes/kubelet-config.yaml:
/etc/kubernetes/kubelet-config.yaml
...
cpuManagerPolicy: static
cpuManagerReconcilePeriod: 10s
topologyManagerPolicy: single-numa-node
featureGates:
  CPUManager: true
  TopologyManager: true
Due to the change in cpuManagerPolicy, remove /var/lib/kubelet/cpu_manager_state and restart the kubelet service on each affected K8s worker node.
Worker Node console
# rm -f /var/lib/kubelet/cpu_manager_state
# service kubelet restart
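After the restart, kubelet recreates the state file with the new policy. As an optional sanity check (the exact JSON content varies per node), confirm that the static policy is now recorded:
Worker Node console
# cat /var/lib/kubelet/cpu_manager_state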
Application
DPDK traffic emulation is shown in the Testbed Flow Diagram below. Traffic is pushed from the TRex server via the ens2f0 interface to the TestPMD pod via the SR-IOV network interface net1. The TestPMD pod swaps the MAC addresses and routes the ingress traffic back via the same net1 interface to the same interface on the TRex server.
Verification
Create a sample deployment test-deployment.yaml (container image should include InfiniBand userspace drivers and performance tools):
test-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlnx-inbox-pod
  labels:
    app: sriov
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sriov
  template:
    metadata:
      labels:
        app: sriov
      annotations:
        k8s.v1.cni.cncf.io/networks: netmlnx2f0
    spec:
      containers:
      - image: < Container image >
        name: mlnx-inbox-ctr
        securityContext:
          capabilities:
            add: [ "IPC_LOCK" ]
        resources:
          requests:
            cpu: 4
            nvidia.com/mlnx2f0: 1
          limits:
            cpu: 4
            nvidia.com/mlnx2f0: 1
        command:
        - sh
        - -c
        - sleep inf
Deploy the sample deployment.
Master Node console
# kubectl apply -f test-deployment.yaml
Verify the deployment is running.
Master Node console
# kubectl get pod -o wide
NAME                              READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
mlnx-inbox-pod-599dc445c8-72x6g   1/1     Running   0          12s   10.233.96.5   node2   <none>           <none>
mlnx-inbox-pod-599dc445c8-v5lnx   1/1     Running   0          12s   10.233.92.4   node3   <none>           <none>
Check available network interfaces in POD.
Master Node console
# kubectl exec -it mlnx-inbox-pod-599dc445c8-72x6g -- bash
root@mlnx-inbox-pod-599dc445c8-72x6g:/tmp# rdma link
link rocep7s0f0v2/1 state ACTIVE physical_state LINK_UP netdev net1
root@mlnx-inbox-pod-599dc445c8-72x6g:/tmp# ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if208: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 12:51:ab:b3:ef:26 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.233.96.5/32 brd 10.233.96.5 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::1051:abff:feb3:ef26/64 scope link
       valid_lft forever preferred_lft forever
201: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
    link/ether 02:40:7d:5e:5f:af brd ff:ff:ff:ff:ff:ff
    inet 192.168.101.2/24 brd 192.168.101.255 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fe80::40:7dff:fe5e:5faf/64 scope link
       valid_lft forever preferred_lft forever
Run a synthetic RDMA benchmark with ib_write_bw, a bandwidth and latency test that uses RDMA write transactions.
Server
ib_write_bw -F -d $IB_DEV_NAME --report_gbits
Client
ib_write_bw -F $SERVER_IP -d $IB_DEV_NAME --report_gbits
Please open two consoles to K8s master node—one for the server apps side and the second for the client apps side.
In the first console (server side), run the following commands:
Master Node console
# kubectl exec -it mlnx-inbox-pod-599dc445c8-72x6g -- bash
root@mlnx-inbox-pod-599dc445c8-72x6g:/tmp# ip a s net1 | grep inet
    inet 192.168.101.2/24 brd 192.168.101.255 scope global net1
    inet6 fe80::40:7dff:fe5e:5faf/64 scope link
root@mlnx-inbox-pod-599dc445c8-72x6g:/tmp# rdma link
link rocep7s0f0v2/1 state ACTIVE physical_state LINK_UP netdev net1
root@mlnx-inbox-pod-599dc445c8-72x6g:/tmp# ib_write_bw -F -d rocep7s0f0v2 --report_gbits

************************************
* Waiting for client to connect... *
************************************
In the second console (client side), run the following commands:
Master Node console
# kubectl exec -it mlnx-inbox-pod-599dc445c8-v5lnx -- bash
root@mlnx-inbox-pod-599dc445c8-v5lnx:/tmp# rdma link
link rocep7s0f0v3/1 state ACTIVE physical_state LINK_UP netdev net1
root@mlnx-inbox-pod-599dc445c8-v5lnx:/tmp# ib_write_bw -F -d rocep7s0f0v3 192.168.101.2 --report_gbits
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : rocep7s0f0v3
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : Ethernet
 GID index       : 2
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x01f2 PSN 0x75e7cf RKey 0x050e26 VAddr 0x007f51e51b9000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:101:01
 remote address: LID 0000 QPN 0x00f2 PSN 0x13427f RKey 0x010e26 VAddr 0x007f1ecaac8000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:101:02
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]    MsgRate[Mpps]
 65536      5000           94.26              92.87                 0.169509
---------------------------------------------------------------------------------------
TRex Server Deployment
This guide uses TRex package v2.87.
For a detailed TRex installation and configuration guide, please refer to the TRex Documentation.
The TRex installation and configuration steps are performed with the root user account.
Prerequisites
For the TRex server, a standard server with the RDMA subsystem installed was used.
Activate the network interfaces used by the TRex application with netplan.
In our deployment, interfaces ens2f0 and ens2f1 are used:
/etc/netplan/00-installer-config.yaml
# This is the network config written by 'subiquity'
network:
  ethernets:
    ens4f0:
      dhcp4: true
      dhcp-identifier: mac
    ens2f0: {}
    ens2f1: {}
  version: 2
Then re-apply netplan and check link status for ens2f0/ens2f1 network interfaces.
TRex server console
# netplan apply
# rdma link
link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev ens2f0
link mlx5_1/1 state ACTIVE physical_state LINK_UP netdev ens2f1
link mlx5_2/1 state ACTIVE physical_state LINK_UP netdev ens4f0
link mlx5_3/1 state DOWN physical_state DISABLED netdev ens4f1
Update the MTU size for interfaces ens2f0 and ens2f1.
TRex server console
# ip link set ens2f0 mtu 9000
# ip link set ens2f1 mtu 9000
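Note that MTU changes made with ip link do not persist across reboots. If persistence is desired, the MTU can instead be set in the netplan configuration shown earlier, after which netplan apply should be re-run. This is a sketch, assuming the same /etc/netplan/00-installer-config.yaml file:
/etc/netplan/00-installer-config.yaml
...
    ens2f0:
      mtu: 9000
    ens2f1:
      mtu: 9000
...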
Installation
Create the TRex working directory and obtain the TRex package.
TRex server console
# cd /tmp
# wget https://trex-tgn.cisco.com/trex/release/v2.87.tar.gz --no-check-certificate
# mkdir /scratch
# cd /scratch
# tar -zxf /tmp/v2.87.tar.gz
# chmod 777 -R /scratch
First-Time Scripts
The next steps continue from the /scratch/v2.87 folder.
Run the TRex configuration script in interactive mode. Follow the on-screen instructions to create a basic configuration file, /etc/trex_cfg.yaml:
TRex server console
# ./dpdk_setup_ports.py -i
The /etc/trex_cfg.yaml configuration file is created. Later, we will change it to suit our setup.
Appendix
Performance Testing
Below is a performance test of DPDK traffic emulation between the TRex traffic generator and the TESTPMD application running on a K8s worker node, in accordance with the testbed diagram presented above.
Prerequisites
Before starting the test, update the TRex configuration file /etc/trex_cfg.yaml with the MAC address of the high-performance interface from the TESTPMD pod. Below are the steps to complete this update.
Run a pod on the K8s cluster with the TESTPMD application according to the YAML configuration file testpmd-inbox.yaml presented below (the container image should include InfiniBand userspace drivers and the TESTPMD application):
testpmd-inbox.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deployment
  labels:
    app: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
      annotations:
        k8s.v1.cni.cncf.io/networks: netmlnx2f0
    spec:
      containers:
      - image: < container image >
        name: test-pod
        securityContext:
          capabilities:
            add: [ "IPC_LOCK" ]
        volumeMounts:
        - mountPath: /hugepages
          name: hugepage
        resources:
          requests:
            hugepages-1Gi: 2Gi
            memory: 16Gi
            cpu: 8
            nvidia.com/mlnx2f0: 1
          limits:
            hugepages-1Gi: 2Gi
            memory: 16Gi
            cpu: 8
            nvidia.com/mlnx2f0: 1
        command:
        - sh
        - -c
        - sleep inf
      volumes:
      - name: hugepage
        emptyDir:
          medium: HugePages
Deploy the deployment with the following command:
Master Node console
# kubectl apply -f testpmd-inbox.yaml
Get the network information from the deployed pod by running the following:
Master Node console
# kubectl get pod -o wide
NAME                               READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
test-deployment-676476c78d-glbfs   1/1     Running   0          30s   10.233.92.5   node3   <none>           <none>
# kubectl exec -it test-deployment-676476c78d-glbfs -- ip a s net1
193: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
    link/ether 32:f9:3f:e3:dc:89 brd ff:ff:ff:ff:ff:ff
    inet 192.168.101.3/24 brd 192.168.101.255 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fe80::30f9:3fff:fee3:dc89/64 scope link
       valid_lft forever preferred_lft forever
Update the TRex configuration file /etc/trex_cfg.yaml with the MAC address of the net1 network interface (32:f9:3f:e3:dc:89):
/etc/trex_cfg.yaml
### Config file generated by dpdk_setup_ports.py ###
- version: 2
  interfaces: ['07:00.0', '0d:00.0']
  port_info:
    - dest_mac: 32:f9:3f:e3:dc:89  # MAC OF NET1 INTERFACE
      src_mac:  0c:42:a1:24:05:1a
    - dest_mac: 32:f9:3f:e3:dc:89  # MAC OF NET1 INTERFACE
      src_mac:  0c:42:a1:24:05:1b
  platform:
      master_thread_id: 0
      latency_thread_id: 12
      dual_if:
        - socket: 0
          threads: [1,2,3,4,5,6,7,8,9,10,11]
DPDK Emulation Test
Run the TESTPMD application in the container:
Master Node console
# kubectl exec -it test-deployment-676476c78d-glbfs -- bash
root@test-deployment-676476c78d-glbfs:/tmp# dpdk-testpmd -c 0x1fe -m 1024 -w $PCIDEVICE_NVIDIA_COM_MLNX2F0 -- --burst=64 --txd=1024 --rxd=1024 --mbcache=512 --rxq=8 --txq=8 --nb-cores=4 --rss-udp --forward-mode=macswap -a -i
...
testpmd>
Warning: Specific TESTPMD parameters:
$PCIDEVICE_NVIDIA_COM_MLNX2F0 - an environment variable containing the PCI address of the allocated VF (net1)
More information about additional TESTPMD parameters:
https://doc.dpdk.org/guides/testpmd_app_ug/run_app.html?highlight=testpmd
https://doc.dpdk.org/guides/linux_gsg/linux_eal_parameters.html
Run the TRex traffic generator on the TRex server:
TRex server console
# cd /scratch/v2.87/
# ./t-rex-64 -v 7 -i -c 11 --no-ofed-check
Open a second console to the TRex server and create a traffic generation file, mlnx-trex.py, in the folder /scratch/v2.87:
mlnx-trex.py
from trex_stl_lib.api import *

class STLS1(object):

    def create_stream (self):
        pkt = Ether()/IP(src="16.0.0.1",dst="48.0.0.1")/UDP(dport=12)/(22*'x')
        vm = STLScVmRaw( [ STLVmFlowVar(name="v_port",
                                        min_value=4337,
                                        max_value=5337,
                                        size=2, op="inc"),
                           STLVmWrFlowVar(fv_name="v_port",
                                          pkt_offset= "UDP.sport" ),
                           STLVmFixChecksumHw(l3_offset="IP",
                                              l4_offset="UDP",
                                              l4_type=CTRexVmInsFixHwCs.L4_TYPE_UDP),
                         ]
                       )
        return STLStream(packet = STLPktBuilder(pkt = pkt ,vm = vm ) ,
                         mode = STLTXCont(pps = 8000000) )

    def get_streams (self, direction = 0, **kwargs):
        # create 1 stream
        return [ self.create_stream() ]

# dynamic load - used for trex console or simulator
def register():
    return STLS1()
Then run the TRex console and generate traffic to the TESTPMD pod:
TRex server console
# cd /scratch/v2.87/
# ./trex-console

Using 'python3' as Python interpeter

Connecting to RPC server on localhost:4501       [SUCCESS]
Connecting to publisher server on localhost:4500 [SUCCESS]
Acquiring ports [0, 1]:                          [SUCCESS]

Server Info:
Server version:   v2.87 @ STL
Server mode:      Stateless
Server CPU:       11 x Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
Ports count:      2 x 100Gbps @ MT2892 Family [ConnectX-6 Dx]

-=TRex Console v3.0=-

Type 'help' or '?' for supported actions

trex> tui<enter>
...
tui> start -f mlnx-trex.py -m 45mpps -p 0
...
Global Statistitcs
connection   : localhost, Port 4501                     total_tx_L2  : 23.9 Gbps
version      : STL @ v2.87                              total_tx_L1  : 30.93 Gbps
cpu_util.    : 82.88% @ 11 cores (11 per dual port)     total_rx     : 25.31 Gbps
rx_cpu_util. : 0.0% / 0 pps                             total_pps    : 44.84 Mpps
async_util.  : 0.05% / 11.22 Kbps                       drop_rate    : 0 bps
total_cps.   : 0 cps                                    queue_full   : 0 pkts
...
Summary
From the above test, it is evident that the desired traffic rate of 45 Mpps is achieved with an SR-IOV network port in the pod.
Warning: To get better results, additional application tuning is required for TRex and TESTPMD.
Done!
Authors
Vitaliy Razinkov
Over the past few years, Vitaliy Razinkov has been working as a Solutions Architect on the NVIDIA Networking team, responsible for complex Kubernetes/OpenShift and Microsoft's leading solutions, research and design. He previously spent more than 25 years in senior positions at several companies. Vitaliy has written several reference design guides on Microsoft technologies, RoCE/RDMA accelerated machine learning in Kubernetes/OpenShift, and container solutions, all of which are available on the NVIDIA Networking Documentation website.

Amir Zeidner
For the past several years, Amir has worked as a Solutions Architect primarily in the Telco space, leading advanced solutions to answer 5G, NFV, and SDN networking infrastructure requirements. Amir’s expertise in data plane acceleration technologies, such as Accelerated Switching and Network Processing (ASAP²) and DPDK, together with a deep knowledge of open source cloud-based infrastructures, allows him to promote and deliver unique end-to-end NVIDIA Networking solutions throughout the Telco world.
Related Documents