
Created on Aug 8, 2019

Introduction

This document describes how to enable PVRDMA in VMware vSphere 6.5/6.7 with NVIDIA ConnectX network cards. 

This guide assumes the following software and drivers are installed:

  • VMware ESXi 6.7 Update 2, build 13006603
  • vCenter 6.7 Update 2, build 13007421
  • Distributed Switch 6.6.0
  • ConnectX® Ethernet Driver for VMware® ESXi Server 4.17.13.1-1vmw.670.2.48.13006603 
  • CentOS 7.6

Components Overview

vSphere Distributed Switch

A vSphere Distributed Switch provides centralized management and monitoring of the networking configuration of all hosts that are associated with the switch. You must set up a distributed switch on a vCenter Server system, and its settings will be propagated to all hosts that are associated with the switch.

Paravirtual RDMA (PVRDMA)

Direct Memory Access (DMA) - a device's capability to access host memory directly, without CPU intervention.

Remote Direct Memory Access (RDMA) - the ability to access memory (read, write) on a remote machine without interrupting the CPU(s) on that system.

RDMA Advantages:

  • Zero-copy - Allows applications to perform data transfers without involving the network software stack. Data is sent and received directly to the buffers without being copied between the network layers.

  • Kernel bypass - Allows applications to perform data transfers directly from user space, without kernel involvement.

  • CPU Offload - Allows applications to access remote memory without consuming any CPU time on the remote server. The remote memory is read without any intervention from the remote processor, and the cache of the remote CPU is not polluted with the accessed memory content.

PVRDMA Architecture


Accelerating VM Data

Solution Overview

Setup

 

Solution Logical Design


Bill of Materials


Solution Physical Network Wiring

 

Configuration

Network Configuration

The below table provides details of ESXi server names and their network configuration:

ESXi Server    Server Name      High-Speed Ethernet Network    Management Network (192.168.1.0)
ESXi-01        sl01w01esx21     none                           eno0: From DHCP (reserved)
ESXi-02        sl01w01esx22     none                           eno0: From DHCP (reserved)
ESXi-03        sl01w01esx23     none                           eno0: From DHCP (reserved)
ESXi-04        sl01w01esx24     none                           eno0: From DHCP (reserved)


The below table provides details of VM names and their network configuration:

VM Server    Name           High-Speed Ethernet Network    Management Network (192.168.1.0)
VM-01        pvrdma-vm01    192.168.11.51                  eno0: From DHCP (reserved)
VM-02        pvrdma-vm02    192.168.11.52                  eno0: From DHCP (reserved)
VM-03        pvrdma-vm03    192.168.11.53                  eno0: From DHCP (reserved)
VM-04        pvrdma-vm04    192.168.11.54                  eno0: From DHCP (reserved)

ESXi Host Configuration

Check host configurations:

  1. Enable SSH Access to ESXi server.
  2. Log into ESXi vSphere Command-Line Interface with root permissions.
  3. Verify that the host is equipped with an NVIDIA adapter card:

    ESXi Console
    ~ lspci | grep Mellanox
    
    0000:02:00.0 Network controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex] [vmnic2]
    0000:02:00.1 Network controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex] [vmnic3]
    Note: in this case, the NVIDIA card is using vmnic2 and vmnic3.
  4. Verify that the logical RDMA devices are currently registered on the system:

    ESXi Console
    ~ esxcli rdma device list
    Name 	 Driver 	 State 	 MTU    Speed 	   Paired Uplink 	Description
    -------	 ----------	 ------	 ----   --------   -------------	-----------------------------------
    vmrdma0	 nmlx5_rdma  Active	 1024	100 Gbps   vmnic2			MT28800 Family [ConnectX-5 MT28831]
    vmrdma1	 nmlx5_rdma  Down	 1024	0		   vmnic3			MT28800 Family [ConnectX-5 MT28831]
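Optionally, you can cross-check the physical uplinks that back these RDMA devices and confirm their link state and speed. This is a quick sanity check using a standard esxcli command; the vmnic names follow this example's setup:

    ESXi Console
    esxcli network nic list

The Paired Uplink shown for each vmrdma device should correspond to an uplink reported as Up in the NIC list.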

Deployment

Before starting the deployment process, we need to create a vSphere Distributed Switch (vDS).

Creating a vDS

To create a new vDS:

  1. Launch the vSphere Web Client and connect to a vCenter Server instance.
  2. On the vSphere Web Client home screen, select the vCenter object from the list on the left.
    Hover over Distributed Switches in the Inventory Lists area, then click the New Distributed Switch icon:

    This will launch the New vDS creation wizard.
  3. Provide a name for the new distributed switch and select the location within the vCenter inventory where you would like to store the new vDS  (a datacenter object or a folder). Click Next.
  4. Select the version of the vDS you would like to create:
  5. Specify the number of uplink ports as 2, tick the Create a default port group box, and give a name to that group:
  6. Click Next to Finish.

Adding Hosts to a vDS

To add an ESXi host to an existing vDS:

  1. Launch the vSphere Web Client, and connect to a vCenter Server instance.
  2. Navigate to the list of distributed switches.
  3. Select the new distributed switch in the list of objects on the right, and select Add and Manage Hosts from the Actions menu:
  4. Select the Add hosts button and click Next:
  5. Click on the New hosts green plus icon to add an ESXi host.

    This opens the Select New Host dialog box.
  6. From the list of new hosts, tick the boxes with the names of each ESXi host you would like to add to the vDS.

    Click OK when you are done, and then click Next to continue.

  7. In the next screen, make sure both the Manage physical adapters and Manage VMkernel adapters boxes are selected. Click Next to continue.

  8. Configure vmnic2 on each ESXi host as Uplink 1 for the vDS:
  9. Create and attach the VMkernel adapter vmk2 to the vDS port group sl01-w01-vds02-pvrdma. Click the green plus icon, select one of the existing networks, and click OK.


    Click Next.
  10. Provide an IPv4 address and Subnet mask for the vmkernel adapter vmk2:
  11. Click Next until the wizard is finished:



  12. Click Finish:
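Once the wizard finishes, you can optionally verify from the ESXi shell that the host has joined the vDS and that vmk2 carries the address you assigned. This is a hedged sanity check using standard esxcli commands; the vmk2 adapter name follows this example:

    ESXi Console
    esxcli network vswitch dvs vmware list
    esxcli network ip interface ipv4 get -i vmk2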

Configure an ESXi Host for PVRDMA

To use PVRDMA in vSphere 6.5/6.7, your environment must meet several configuration requirements.

To configure an ESXi host for PVRDMA, follow the below steps.

Tag a VMkernel Adapter for PVRDMA

To tag a VMkernel adapter, select it and enable it for PVRDMA communication using the following steps:

  1. In the vSphere Web Client, navigate to the host.
  2. On the Configure tab, expand the System subheading and click Advanced System Settings.
  3. Locate Net.PVRDMAvmknic and click Edit.
  4. Enter the value of the VMkernel adapter that you want to use and click OK.

    For this example we used vmk2.


Optional: 

You can use the ESXi CLI to tag a vmknic created on the DVS for PVRDMA to use as its TCP channel by running the following command:


ESXi Console
esxcli system settings advanced set -o /Net/PVRDMAVmknic -s vmk2
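
To confirm the value was applied, you can read the setting back (an optional check, not part of the original procedure):

ESXi Console
esxcli system settings advanced list -o /Net/PVRDMAVmknic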

Enable Firewall Rule for PVRDMA

To enable the firewall rule for PVRDMA in the security profile of the ESXi host:

  1. In the vSphere Web Client, navigate to the host.
  2. In the Configure tab, expand the System subheading.
  3. Go to the Security Profile → Firewall (6.7) or Firewall (6.5) section and click Edit.
  4. Scroll to the pvrdma rule and tick the relevant box next to it:
  5. Click OK to finish.


Optional: 

You can use the ESXi CLI to enable the pvrdma firewall rule (or disable the firewall) with the following command:

ESXi Console
esxcli network firewall ruleset set -e true -r pvrdma
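
You can then verify that the rule is enabled; grep is available in the ESXi shell:

ESXi Console
esxcli network firewall ruleset list | grep pvrdma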



Assign PVRDMA Adapter to a Virtual Machine

To enable a virtual machine to exchange data using RDMA, you must associate the VM with a PVRDMA network adapter. To do so:

  1. Locate the VM in the vSphere Web Client.
  2. Select a data center, folder, cluster, resource pool, or a host and click on the VMs tab.
  3. Click Virtual Machines and double-click the VM from the list.
  4. Power off the VM.
  5. In the Configure tab of the VM, expand the Settings subheading and select VM Hardware.
  6. Click Edit and select the Virtual Hardware tab in the dialog box displaying the settings.
  7. At the bottom of the window next to New device, select Network and click Add.
  8. Expand the New Network section and connect the VM to a distributed port group.
    For Adapter Type, select PVRDMA.
  9. Expand the Memory section, tick the box next to Reserve all guest memory (All locked).
  10. Click OK to close the dialog window.
  11. Power on the virtual machine.
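
After powering the VM back on, a quick optional check is to list the PCI devices from inside the guest and look for the VMware paravirtual RDMA adapter; the exact device description depends on the guest's PCI ID database:

VM Console
lspci | grep -i vmware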

Configure Guest OS for PVRDMA

This section assumes that a PVRDMA adapter has already been assigned to a virtual machine running CentOS 7.2 or later, or Ubuntu 18.04.

To configure a guest OS for PVRDMA, you need to install a PVRDMA driver. The installation process depends on the ESXi version, VMware Tools, and the guest OS version:

Guest OS                VM Hardware Version    ESXi Version
CentOS 7.3 and later    14                     6.7
CentOS 7.2              13                     6.5
Ubuntu 18.04            14                     6.7


For CentOS 7.3 and later (VM hardware version 14, ESXi 6.7):

  1. Create a VM with VM Compatibility version 14 and install CentOS version 7.3 or later.
  2. Add PVRDMA adapter over DVS portgroup from the vCenter.
  3. Install the InfiniBand packages and reload pvrdma driver with the following command line:

    VM Console
    yum groupinstall "Infiniband Support" -y
    rmmod vmw_pvrdma
    modprobe vmw_pvrdma
    ibv_devinfo
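
If ibv_devinfo does not show a vmw_pvrdma device, it can help to confirm that the kernel module actually loaded. These are optional checks, not part of the original procedure:

VM Console
lsmod | grep vmw_pvrdma
dmesg | grep -i pvrdma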






For CentOS 7.2 (VM hardware version 13, ESXi 6.5):

  1. Create a VM with VM Compatibility version 13 and install CentOS 7.2.
  2. Add PVRDMA adapter over DVS portgroup from the vCenter.
  3. Install the InfiniBand drivers with the following command line:

    VM Console
    yum groupinstall "Infiniband Support" -y
  4. Install the pvrdma driver:

    VM Console
    tar xf vrdma_ib_devel.tar
    cd vrdma_ib_devel/
    make
    cp pvrdma.ko /lib/modules/3.10.0-327.el7.x86_64/extra/
    depmod -a
    modprobe pvrdma
  5. Install vrdma lib:

    VM Console
    cd /tmp/
    tar xf libvrdma_devel.tar
    cd libvrdma_devel/
    ./autogen.sh
    ./configure --libdir=/lib64
    make
    make install
    cp pvrdma.driver /etc/libibverbs.d/
    rmmod pvrdma
    modprobe pvrdma
    ibv_devinfo
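
As with the newer guest, you can optionally confirm that the device registered with libibverbs; ibv_devices (part of the libibverbs utilities) lists the device name and node GUID:

VM Console
ibv_devices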


For Ubuntu 18.04 (VM hardware version 14, ESXi 6.7), the vmw_pvrdma driver should already be included in the distribution's kernel. The user-level libraries can be installed with:


VM Console
apt-get install rdma-core
reboot

If the VM compatibility version does not match the above, you can upgrade it to version 13 on ESXi 6.5 or to version 14 on ESXi 6.7.
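
On Ubuntu, the same quick checks used for CentOS apply; these optional commands confirm that the in-box driver is loaded and the verbs device is visible:

VM Console
lsmod | grep vmw_pvrdma
ibv_devinfo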

Deployment Verification

To test communication over PVRDMA, we will use Perftest, a collection of tests written over uverbs, intended for use as a performance micro-benchmark.

The tests may be used for hardware or software tuning as well as for functional testing.

To install and run the benchmark:

  1. Install Perftest:

    VM Console
    yum install git
    git clone https://github.com/linux-rdma/perftest.git
    cd perftest/
    yum install autoconf automake
    yum install libtool
    yum install libibverbs-devel
    ./autogen.sh
    ./configure
    make -j 8
  2. Check the network interface name:

    VM Console
    ifconfig
    ...
    ens224f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
  3. Add a static IP configuration to the network interface by editing /etc/sysconfig/network-scripts/ifcfg-ens224f0:

    Add to file
    HWADDR=00:50:56:aa:65:92
    DNS1=192.168.1.21
    DOMAIN=vwd.clx
    BOOTPROTO="static"
    NAME="ens224f0"
    DEVICE="ens224f0"
    ONBOOT="yes"
    USERCTL=no
    IPADDR=192.168.11.51
    NETMASK=255.255.255.0
    PEERDNS=no
    IPV6INIT=no
    IPV6_AUTOCONF=no

    Then verify that the address responds:

    VM Console
    ping 192.168.11.51
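
Note that the interface only picks up the new settings after its configuration is re-read; on CentOS 7 this is typically done by restarting the legacy network service (this restart step is an assumption and is not spelled out in the original procedure):

VM Console
systemctl restart network
ip addr show ens224f0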

Repeat steps 1-3 for the second VM.

On the first VM ("Server"), run the following:

VM01 "Server" Console
systemctl stop firewalld
systemctl disable firewalld
firewall-cmd --state
./ib_write_bw -x 0 -d vmw_pvrdma0 --report_gbits

On the second VM ("Client"), run the following:

VM02 "Client" Console
./ib_write_bw -x 0 -F 192.168.11.51 -d vmw_pvrdma0 --report_gbits
************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : vmw_pvrdma0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 CQ Moderation   : 100
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 0
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x0004 PSN 0xfb9486 RKey 0x000005 VAddr 0x007f68c62a1000
 GID: 254:128:00:00:00:00:00:00:02:80:86:255:254:170:101:146
 remote address: LID 0000 QPN 0x0002 PSN 0xe72165 RKey 0x000003 VAddr 0x007f2ab4361000
 GID: 254:128:00:00:00:00:00:00:02:80:86:255:254:170:58:174
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]    MsgRate[Mpps]
 65536      5000           90.56              90.39                 0.172405
---------------------------------------------------------------------------------------
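
Perftest builds several other micro-benchmarks in the same directory, such as ib_read_bw and ib_send_lat. They accept the same device and GID-index arguments, so a latency test between the same two VMs could look like the following (an illustrative invocation, not part of the original test run):

VM01 "Server" Console
./ib_send_lat -x 0 -d vmw_pvrdma0

VM02 "Client" Console
./ib_send_lat -x 0 -F 192.168.11.51 -d vmw_pvrdma0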


Done!

Notice

This document is provided for information purposes only and shall not be regarded as a warranty of a certain functionality, condition, or quality of a product. Neither NVIDIA Corporation nor any of its direct or indirect subsidiaries and affiliates (collectively: “NVIDIA”) make any representations or warranties, expressed or implied, as to the accuracy or completeness of the information contained in this document and assumes no responsibility for any errors contained herein. NVIDIA shall have no liability for the consequences or use of such information or for any infringement of patents or other rights of third parties that may result from its use. This document is not a commitment to develop, release, or deliver any Material (defined below), code, or functionality.
NVIDIA reserves the right to make corrections, modifications, enhancements, improvements, and any other changes to this document, at any time without notice.
Customer should obtain the latest relevant information before placing orders and should verify that such information is current and complete.
NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the time of order acknowledgement, unless otherwise agreed in an individual sales agreement signed by authorized representatives of NVIDIA and customer (“Terms of Sale”). NVIDIA hereby expressly objects to applying any customer general terms and conditions with regards to the purchase of the NVIDIA product referenced in this document. No contractual obligations are formed either directly or indirectly by this document.
NVIDIA products are not designed, authorized, or warranted to be suitable for use in medical, military, aircraft, space, or life support equipment, nor in applications where failure or malfunction of the NVIDIA product can reasonably be expected to result in personal injury, death, or property or environmental damage. NVIDIA accepts no liability for inclusion and/or use of NVIDIA products in such equipment or applications and therefore such inclusion and/or use is at customer’s own risk.
NVIDIA makes no representation or warranty that products based on this document will be suitable for any specified use. Testing of all parameters of each product is not necessarily performed by NVIDIA. It is customer’s sole responsibility to evaluate and determine the applicability of any information contained in this document, ensure the product is suitable and fit for the application planned by customer, and perform the necessary testing for the application in order to avoid a default of the application or the product. Weaknesses in customer’s product designs may affect the quality and reliability of the NVIDIA product and may result in additional or different conditions and/or requirements beyond those contained in this document. NVIDIA accepts no liability related to any default, damage, costs, or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this document or (ii) customer product designs.
No license, either expressed or implied, is granted under any NVIDIA patent right, copyright, or other NVIDIA intellectual property right under this document. Information published by NVIDIA regarding third-party products or services does not constitute a license from NVIDIA to use such products or services or a warranty or endorsement thereof. Use of such information may require a license from a third party under the patents or other intellectual property rights of the third party, or a license from NVIDIA under the patents or other intellectual property rights of NVIDIA.
Reproduction of information in this document is permissible only if approved in advance by NVIDIA in writing, reproduced without alteration and in full compliance with all applicable export laws and regulations, and accompanied by all associated conditions, limitations, and notices.
THIS DOCUMENT AND ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, “MATERIALS”) ARE BEING PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL NVIDIA BE LIABLE FOR ANY DAMAGES, INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING OUT OF ANY USE OF THIS DOCUMENT, EVEN IF NVIDIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Notwithstanding any damages that customer might incur for any reason whatsoever, NVIDIA’s aggregate and cumulative liability towards customer for the products described herein shall be limited in accordance with the Terms of Sale for the product.

Trademarks
NVIDIA, the NVIDIA logo, and Mellanox are trademarks and/or registered trademarks of NVIDIA Corporation and/or Mellanox Technologies Ltd. in the U.S. and in other countries. Other company and product names may be trademarks of the respective companies with which they are associated.

Copyright
© 2022 NVIDIA Corporation & affiliates. All Rights Reserved.