
Introduction

This document describes how to enable PVRDMA in VMware vSphere 6.5/6.7 with a Mellanox ConnectX network adapter.

This guide assumes the following software and drivers are installed:

  • VMware ESXi 6.7 Update 2, build 13006603
  • vCenter 6.7 Update 2, build 13007421
  • Distributed Switch 6.6.0
  • ConnectX® Ethernet Driver for VMware® ESXi Server 4.17.13.1-1vmw.670.2.48.13006603 
  • CentOS 7.6


Components Overview

vSphere Distributed Switch

A vSphere Distributed Switch provides centralized management and monitoring of the networking configuration of all hosts that are associated with the switch. You must set up a distributed switch on a vCenter Server system, and its settings will be propagated to all hosts that are associated with the switch.
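As an optional check once hosts have been added to a distributed switch (described later in this guide), you can confirm the association from the ESXi command line; the switch name shown will be whatever you chose when creating the vDS:

~ esxcli network vswitch dvs vmware list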

Paravirtual RDMA (PVRDMA)

Direct Memory Access (DMA) is an ability of a device to access host memory directly, without the intervention of the CPU.

Remote Direct Memory Access (RDMA) is the ability to access (read or write) memory on a remote machine without interrupting the processing of the CPU(s) on that system.

RDMA Advantages

  • Zero-copy - applications can perform data transfers without the involvement of the network software stack. Data is sent from and received directly into application buffers, without being copied between network layers.

  • Kernel bypass - applications can perform data transfers directly from user-space without kernel involvement.

  • No CPU involvement - applications can access remote memory without consuming any CPU time on the remote server. The remote memory is read without any intervention from the remote processor, and the caches of the remote CPU are not filled with the accessed memory content.
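As a quick, optional illustration of these properties, the ibv_rc_pingpong example from libibverbs-utils exchanges messages over a Reliable Connected queue pair directly through the Verbs API, bypassing the TCP/IP stack. This is a sketch that assumes two Linux machines (or, later in this guide, two PVRDMA-enabled VMs) with a working RDMA device and libibverbs-utils installed; the device name and IP address below are taken from the lab setup described later and may differ in your environment.

    # On the first node ("server"); use the device name reported by ibv_devices
    ibv_rc_pingpong -d vmw_pvrdma0 -g 0

    # On the second node ("client"); point it at the server's IP address
    ibv_rc_pingpong -d vmw_pvrdma0 -g 0 192.168.11.51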

PVRDMA Architecture


Virtual Machine

  • Exposes a dual-function virtual PCIe device:
        - A network interface – vmxnet3
        - An RDMA provider – PVRDMA
  • The RDMA provider plugs into the OpenFabrics Enterprise Distribution (OFED) stack, in kernel and user space
  • Full support for the Verbs RDMA API
  • Live vMotion, Snapshots and HA
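Inside a guest VM that has been configured for PVRDMA as described later in this document, this dual-function device can be observed with standard Linux tools. A minimal check (the exact lspci description strings vary with guest OS and driver version):

    lspci | grep -i vmxnet3     # the paired vmxnet3 network function
    lsmod | grep vmw_pvrdma     # the PVRDMA guest driver, once loaded
    ibv_devinfo                 # the device as exposed through the OFED/Verbs stack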




PVRDMA Backend

  • Creates virtual RDMA resources for the VM
  • Guests operate on these virtual resources


ESXi

  • Leverages the native RDMA and core drivers
  • The physical HCA services all VMs
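On the ESXi host itself, you can verify that the native Mellanox drivers backing PVRDMA are installed and loaded. This is an optional sanity check using standard esxcli queries (the nmlx5 names match the ConnectX-5 driver used in this guide):

~ esxcli software vib list | grep nmlx
~ esxcli system module list | grep nmlx5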





Accelerating VM Data





The PVRDMA data path works as follows:

1. VM memory address translations are registered with the HCA (buffer registration).

2. The application issues a request (a Work Request) to read or write a particular guest address and size.

3. The PVRDMA backend intercepts these requests and issues requests to the mapped hardware resources.

4. The HCA performs DMAs to/from application memory without any software involvement, enabling direct zero-copy data transfers in hardware.





Solution Overview

Equipment

 

Solution Logical Design


Bill of Materials


Solution Physical Network Wiring

 

Network Configuration

The table below lists the ESXi server names and their network configuration:

ESXi Server   Server Name     High-Speed Ethernet Network   Management Network (192.168.1.0)
-----------   -------------   ---------------------------   ---------------------------------
ESXi-01       sl01w01esx21    none                          eno0: From DHCP (reserved)
ESXi-02       sl01w01esx22    none                          eno0: From DHCP (reserved)
ESXi-03       sl01w01esx23    none                          eno0: From DHCP (reserved)
ESXi-04       sl01w01esx24    none                          eno0: From DHCP (reserved)


The table below lists the VM names and their network configuration:

VM      Server Name    High-Speed Ethernet Network   Management Network (192.168.1.0)
-----   ------------   ---------------------------   ---------------------------------
VM-01   pvrdma-vm01    192.168.11.51                 eno0: From DHCP (reserved)
VM-02   pvrdma-vm02    192.168.11.52                 eno0: From DHCP (reserved)
VM-03   pvrdma-vm03    192.168.11.53                 eno0: From DHCP (reserved)
VM-04   pvrdma-vm04    192.168.11.54                 eno0: From DHCP (reserved)

ESXi Host Configuration

Check host configurations

1. Enable SSH access to the ESXi server.

2. Log in to the ESXi command-line interface with root permissions.

3. Verify that the host is equipped with a Mellanox adapter:

~ lspci | grep Mellanox

0000:02:00.0 Network controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex] [vmnic2]
0000:02:00.1 Network controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex] [vmnic3]
Note: In this case, the Mellanox adapter is using vmnic2 and vmnic3.

4. Verify the logical RDMA devices currently registered on the system.

~ esxcli rdma device list
Name     Driver      State   MTU   Speed     Paired Uplink  Description
-------  ----------  ------  ----  --------  -------------  -----------------------------------
vmrdma0  nmlx5_rdma  Active  1024  100 Gbps  vmnic2         MT28800 Family [ConnectX-5 MT28831]
vmrdma1  nmlx5_rdma  Down    1024  0         vmnic3         MT28800 Family [ConnectX-5 MT28831]
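If a logical RDMA device is reported as Down (as vmrdma1 is above), you can optionally check the state and speed of its paired physical uplink:

~ esxcli network nic list
~ esxcli network nic get -n vmnic3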

Deployment Guide

Prerequisites: vSphere Distributed Switch (vDS)

Creating a vDS

Perform the following steps to create a new vDS:

1. Launch the vSphere Web Client and connect to a vCenter Server instance.

2. On the vSphere Web Client home screen, select the vCenter object from the list on the left. Then, select Distributed Switches from the Inventory Lists area.

3. On the right side of the vSphere Web Client, click the Create a New Distributed Switch icon (it looks like a switch with a green plus mark in the corner).
This launches the New vDS wizard.

4. Supply a name for the new distributed switch and select the location within the vCenter inventory (a datacenter object or a folder) where you would like to store the new vDS.
Click Next.

5. Select the version of the vDS you would like to create.

6. Specify the number of uplink ports as 2 and create a default port group, giving it a name.

7. Click Next and then Finish.

Adding Hosts to the vDS

Perform the following steps to add an ESXi host to an existing vDS:

1. Launch the vSphere Web Client, and connect to a vCenter Server instance.

2. Navigate to the list of distributed switches.

3. Select the new distributed switch in the list of objects on the right, and select Add and Manage Hosts from the Actions menu.

4. Select the Add Hosts radio button and click Next.

5. Click the green plus icon to add an ESXi host. This opens the Select New Host dialog box.

6. From the list of new hosts to add, place a check mark next to the name of each ESXi host you would like to add to the vDS.
Click OK when you are done, and then click Next to continue.

7. On the next screen, make sure both the Manage physical adapters and Manage VMkernel adapters options are selected. Click Next to continue.

8. Configure vmnic2 on each ESXi host as Uplink 1 for the vDS.

9. Create and attach the VMkernel adapter vmk2 to the vDS port group sl01-w01-vds02-pvrdma. Click the green plus icon and select an existing network. Click OK to continue.

10. Click Next to continue.

11. Supply an IPv4 address and subnet mask for the VMkernel adapter vmk2. Click Next to continue.

12. Click Next to continue.

13. Click Next to continue. 

14. Click Finish.
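Optionally, you can verify from the ESXi CLI that vmk2 received the address you supplied and that it can reach the other hosts on the high-speed network. Replace the placeholder with the vmk2 address of another host:

~ esxcli network ip interface ipv4 get -i vmk2
~ vmkping -I vmk2 <peer-vmk2-IP>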

Configure an ESXi Host for PVRDMA

To use PVRDMA in vSphere 6.5/6.7, your environment must meet several configuration requirements.

To configure an ESXi host for PVRDMA, perform the steps described in the following subsections.

Tag a VMkernel Adapter for PVRDMA

Select a VMkernel adapter and enable it for PVRDMA communication using the following steps:

  1. In the vSphere Web Client, navigate to the host.
  2. On the Configure tab, expand System.
  3. Click Advanced System Settings.
  4. Locate Net.PVRDMAvmknic and click Edit.

  5. Enter the value of the VMkernel adapter that you want to use and click OK. In our lab environment, we entered vmk2.


(Optional) You can also use the ESXi CLI to tag the vmknic created on the vDS that PVRDMA should use for its TCP channel:

esxcli system settings advanced set -o /Net/PVRDMAVmknic -s vmk2
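To confirm that the value was applied, you can read the setting back (optional check):

esxcli system settings advanced list -o /Net/PVRDMAVmknic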

Enable the Firewall Rule for PVRDMA

Enable the firewall rule for PVRDMA in the security profile of the ESXi host using the following procedure:

  1. In the vSphere Web Client, navigate to the host.
  2. On the Configure tab, expand System.
  3. In the Security Profile → Firewall(6.7) or Firewall(6.5) section, click Edit.
  4. Scroll to the pvrdma rule and select the check box next to it.
  5. Click OK.


(Optional) You can also use the ESXi CLI to enable the pvrdma firewall ruleset:

esxcli network firewall ruleset set -e true -r pvrdma
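To confirm that the ruleset is now enabled (optional check):

esxcli network firewall ruleset list | grep pvrdma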


Assign a PVRDMA Adapter to a Virtual Machine

To enable a virtual machine to exchange data using RDMA, you must associate the VM with a PVRDMA network adapter. The steps are as follows:

  1. Locate the VM in the vSphere Web Client.
    1. Select a data center, folder, cluster, resource pool, or host and click the VMs tab.
    2. Click Virtual Machines and double-click the VM from the list.
  2. Power off the VM.
  3. In the Configure tab of the VM, expand Settings and select VM Hardware.
  4. Click Edit and select the Virtual Hardware tab in the dialog box displaying the settings.
  5. At the bottom of the window next to New device, select Network and click Add.
  6. Expand the New Network section and connect the VM to a distributed port group.
  7. For Adapter Type, select PVRDMA.
  8. Expand the Memory section and select Reserve all guest memory (All locked).
  9. Click OK to close the dialog window.
  10. Power on the virtual machine.

Configure Guest OS for PVRDMA

This step assumes that a PVRDMA adapter has already been assigned to the virtual machine, as described in the previous section, and that the guest OS is CentOS 7.2 or later.

To configure a guest OS for PVRDMA, you need to install a PVRDMA driver. The installation process depends on the ESXi version, VMware Tools, and the guest OS version.


Guest OS: CentOS 7.3 and later
VM hardware version 14
ESXi v6.7
  1. Create a VM with VM compatibility version 14 and install CentOS 7.3 or later.

  2. Add a PVRDMA adapter over the vDS port group from vCenter.
  3. Install the InfiniBand packages and reload the pvrdma driver:

    yum groupinstall "Infiniband Support" -y
    rmmod vmw_pvrdma
    modprobe vmw_pvrdma
    ibv_devinfo
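Before moving on, it can be useful to confirm that the guest verbs stack sees the PVRDMA device. A brief optional check (the device name may differ slightly in your guest):

    ls /sys/class/infiniband/                        # expect a device such as vmw_pvrdma0
    ibv_devinfo | grep -E "hca_id|state|link_layer"  # port should be PORT_ACTIVE with link_layer Ethernet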

Guest OS: CentOS 7.2
VM hardware version 13
ESXi v6.5

  1. Create a VM with VM compatibility version 13 and install CentOS 7.2.

  2. Add a PVRDMA adapter over the vDS port group from vCenter.
  3. Install the InfiniBand drivers:

    yum groupinstall "Infiniband Support" -y
  4. Install the pvrdma driver:

    tar xf vrdma_ib_devel.tar
    cd vrdma_ib_devel/
    make
    cp pvrdma.ko /lib/modules/3.10.0-327.el7.x86_64/extra/
    depmod -a
    modprobe pvrdma
  5. Install vrdma lib:

    cd /tmp/
    tar xf libvrdma_devel.tar
    cd libvrdma_devel/
    ./autogen.sh
    ./configure --libdir=/lib64
    make
    make install
    cp pvrdma.driver /etc/libibverbs.d/
    rmmod pvrdma
    modprobe pvrdma
    ibv_devinfo




You can upgrade the VM compatibility to version 13 on ESXi 6.5 or version 14 on ESXi 6.7 if it has not been upgraded already.

Deployment Verification

To test communication over PVRDMA, we will use Perftest. This is a collection of tests written over uverbs, intended for use as a performance micro-benchmark.

The tests may be used for HW or SW tuning as well as for functional testing.

To install and run the benchmark, follow the steps below:

  1. Install Perftest:

    yum install git
    git clone https://github.com/linux-rdma/perftest.git
    cd perftest/
    yum install autoconf automake
    yum install libtool
    yum install libibverbs-devel
    ./autogen.sh
    ./configure
    make -j 8
  2. Check the network interface name:

    ifconfig
    ...
    ens224f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
  3. Add a static IP configuration to the network interface by modifying /etc/sysconfig/network-scripts/ifcfg-ens224f0:

    HWADDR=00:50:56:aa:65:92
    DNS1=192.168.1.21
    DOMAIN=vwd.clx
    BOOTPROTO="static"
    NAME="ens224f0"
    DEVICE="ens224f0"
    ONBOOT="yes"
    USERCTL=no
    IPADDR=192.168.11.51
    NETMASK=255.255.255.0
    PEERDNS=no
    IPV6INIT=no
    IPV6_AUTOCONF=no

    Verify that the configured address responds:

    ping 192.168.11.51
  4. Repeat steps 1-3 on the second VM (using its own IP address, e.g. 192.168.11.52 for VM-02).
  5. On the first VM (the "server"), stop and disable the firewall, then start the server side of the benchmark:

    systemctl stop firewalld
    systemctl disable firewalld
    firewall-cmd --state
    ./ib_write_bw -x 0 -d vmw_pvrdma0 --report_gbits
  6. On the second VM (the "client"), run the benchmark against the server's IP address:

    ./ib_write_bw -x 0 -F 192.168.11.51 -d vmw_pvrdma0 --report_gbits
    ************************************
    * Waiting for client to connect... *
    ************************************
    ---------------------------------------------------------------------------------------
                        RDMA_Write BW Test
     Dual-port       : OFF          Device         : vmw_pvrdma0
     Number of qps   : 1            Transport type : IB
     Connection type : RC           Using SRQ      : OFF
     CQ Moderation   : 100
     Mtu             : 1024[B]
     Link type       : Ethernet
     GID index       : 0
     Max inline data : 0[B]
     rdma_cm QPs     : OFF
     Data ex. method : Ethernet
    ---------------------------------------------------------------------------------------
     local address: LID 0000 QPN 0x0004 PSN 0xfb9486 RKey 0x000005 VAddr 0x007f68c62a1000
     GID: 254:128:00:00:00:00:00:00:02:80:86:255:254:170:101:146
     remote address: LID 0000 QPN 0x0002 PSN 0xe72165 RKey 0x000003 VAddr 0x007f2ab4361000
     GID: 254:128:00:00:00:00:00:00:02:80:86:255:254:170:58:174
    ---------------------------------------------------------------------------------------
     #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]    MsgRate[Mpps]
     65536      5000           90.56              90.39                 0.172405
    ---------------------------------------------------------------------------------------
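Besides bandwidth, the same perftest build also includes latency benchmarks. As an optional follow-up, using the same device name, GID index, and server address as above:

    # On the first VM ("server"):
    ./ib_write_lat -x 0 -d vmw_pvrdma0

    # On the second VM ("client"):
    ./ib_write_lat -x 0 -F 192.168.11.51 -d vmw_pvrdma0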

Done!