




Created on Sep 12, 2019 

Introduction

More and more media and entertainment (M&E) solution providers are moving their proprietary legacy video solutions to next-generation IP-based infrastructures to meet the increasing global demand for ultra-high-definition video content. Broadcast providers are looking into cloud-based solutions for better scalability and flexibility, which introduces new challenges such as multi-tenant high-quality streaming at scale and time synchronization in the cloud.

Red Hat OpenStack Platform (OSP) is a cloud computing platform that enables the creation, deployment, scale, and management of a secure and reliable public or private OpenStack-based cloud. This production-ready platform offers a tight integration with NVIDIA products and technologies and is used in this guide to demonstrate a full deployment of the "NVIDIA Media Cloud".

NVIDIA Media Cloud is a solution that includes the NVIDIA Rivermax library for packet pacing, kernel bypass and packet aggregation, along with cloud time synchronization and OVS HW offload to the NVIDIA SmartNIC using the Accelerated Switching And Packet Processing (ASAP²) framework.
By the end of this guide you will be able to run offloaded HD media streams between VMs in different racks and validate that they comply with the SMPTE 2110 standards, while using commodity switches, servers and NICs.

The following reference deployment guide (RDG) demonstrates a complete deployment of Red Hat OpenStack Platform 13 for media streaming applications with NVIDIA SmartNIC hardware offload capabilities.

We'll explain the setup components, scale considerations and other aspects such as the hardware BoM (Bill of Materials) and time synchronization, as well as streaming application compliance testing in the cloud.

Before we start it's highly recommended to become familiar with the key technology features of this deployment guide: 

  • Rivermax video streaming library 
  • ASAP2 - Accelerated Switch and Packet Processing

Visit the product pages in the links below to learn more about these features and their capabilities: 

NVIDIA Rivermax Video Streaming Library 

NVIDIA Accelerated Switching and Packet Processing 


Downloadable Content

All configuration files are located here: 


References


NVIDIA Components

  • NVIDIA Rivermax implements an optimized software library API for media streaming applications. It runs on NVIDIA ConnectX®-5 network adapters or higher, enabling the use of off-the-shelf (COTS) servers for HD to Ultra HD flows. The Rivermax and ConnectX®-5 adapter cards combination complies with the SMPTE 2110-21 standards, which reduces CPU utilization for video data streaming, and removes bottlenecks for the highest throughput.
  • NVIDIA Accelerated Switching and Packet Processing (ASAP²) is a framework that offloads network data planes into SmartNIC hardware. For example, Open vSwitch (OVS) offload enables a performance boost of up to 10x while freeing the CPU from packet-processing load. ASAP² is available on ConnectX-4 Lx and later adapters.
  • NVIDIA Spectrum Switch family provides the most efficient network solutions for the ever-increasing performance demands of data center applications.
  • NVIDIA ConnectX Network Adapter family delivers industry-leading connectivity for performance-driven server and storage applications. ConnectX adapter cards enable high bandwidth, coupled with ultra-low latency for diverse applications and systems, resulting in faster access and real-time responses.
  • NVIDIA LinkX Cables and Transceivers family provides the industry’s most complete line of 10, 25, 40, 50, 100, 200, and 400Gb interconnect products for Cloud, Web 2.0, Enterprise, telco, and storage data center applications. They are often used to link top-of-rack switches downwards to servers, storage and appliances, and upwards in switch-to-switch applications.

Solution Setup Overview 

Below is a list of all the different components in this solution and how they are utilized:

Cloud Platform

RH-OSP13 will be deployed at large scale and utilized as the cloud platform. 

Compute Nodes 

The compute nodes will be configured and deployed as “Media Compute Nodes”, adjusted for low latency virtual media applications. Each Compute/Controller node is equipped with a dual-port 100GB NIC, of which one port is dedicated to the VXLAN tenant network and the other to the VLAN Multicast tenant network, Storage, Control, and PTP time synchronization.

Packet Pacing is enabled on the NIC ports specifically to allow MC (MultiCast) Pacing on the VLAN Tenant Network.

Network

The different network components used in this user guide are configured in the following way:

  • Multiple racks interconnected via Spine/Leaf network architecture
  • Composable routed provider networks are used per rack
  • Compute nodes on different provider network segments will host local DHCP agent instances per OpenStack subnet segment
  • L3 OSPF underlay is used to route between the provider routed networks (another fabric-wide IGP can be used as desired)
  • Multicast
    • VLAN tenant network is used for Multicast media traffic and will utilize SR-IOV on the VM 
    • IP-PIM (Sparse Mode) is used for routing the tenant Multicast streams between the racks which are located in routed provider networks
    • IGMP snooping is used to manage the tenant multicast groups in the same L2 racks domain
  • Unicast  
    • ASAP²-enabled Compute nodes are located in different racks and maintain VXLAN tunnels as overlay for tenant VM traffic
    • The VXLAN tenant network is used for Unicast media traffic and will utilize ASAP²  to offload the CPU-intensive VXLAN traffic, in order to avoid the encapsulation/decapsulation performance penalty and achieve the optimum throughput
  • OpenStack Neutron is used as an SDN controller. All network configuration for every OpenStack node is done via OpenStack orchestration
  • RHOSP inbox drivers are used on all infrastructure components except for VM guests

Time Synchronization

Time Synchronization will be configured in the following way:

  • linuxptp tools are used on compute nodes and application VMs
  • PTP traffic is untagged on the compute nodes
  • Onyx Switches propagate the time between the compute nodes and act as PTP Boundary Clock devices
  • One of the switches is used as PTP master clock (in real-life deployments a dedicated grand master should be used)
  • KVM virtual PTP driver is used by the VMs to pull the PTP time from their hosting hypervisor which is synced to the PTP clock source
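On the VM side, the last bullet can be sketched with standard linuxptp commands. This is a minimal sketch, assuming a stock RHEL guest kernel that ships the ptp_kvm module and that the virtual PHC appears as /dev/ptp0 (device numbering may differ on your system):

```shell
# Inside a guest VM: consume the hypervisor's PTP-disciplined clock via KVM.
modprobe ptp_kvm                     # exposes the KVM virtual PHC, e.g. /dev/ptp0
grep . /sys/class/ptp/*/clock_name   # look for "KVM virtual PTP"
# Discipline the VM system clock from the virtual PHC:
phc2sys -s /dev/ptp0 -O 0 -m &
# -m prints offset statistics; they should converge once the hypervisor is synced
```

The VM never talks PTP on the network; it only reads the already-synchronized hypervisor clock, which is why PTP traffic stays untagged and confined to the compute nodes.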

Media Application

NVIDIA provides a Rivermax VM cloud image which includes all Rivermax tools and applications. The Rivermax VM provides a demonstration of the media test application and allows the user to validate compliance with the relevant media standards, i.e., SMPTE 2110 (an evaluation license is required).



Solution Components   





Solution General Design


Solution Multicast Design


Cloud Media Application Design 


VXLAN HW Offload Overview



Large Scale Overview


HW Configuration

Bill of Materials (BoM)


Note

  • The BoM above refers to the maximal configuration in a large scale solution, with a blocking ratio of 3:1
  • It is possible to change the blocking ratio to obtain a different capacity
  • The SN2100 and SN2700 switches share the same feature set and can be used in this solution according to the required compute and/or network capacity
  • The 2-Rack BoM will be used in the solution example described below
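The blocking ratio above is simple port arithmetic: host-facing bandwidth divided by fabric-facing bandwidth per leaf. A sketch with assumed example port counts (12 server-facing ports and 4 spine uplinks per leaf; the actual split depends on your BoM), where all ports are 100GbE so the port counts alone set the ratio:

```shell
# Leaf oversubscription: downlink bandwidth vs. uplink bandwidth.
# 12 downlinks and 4 uplinks are assumed example values.
downlinks=12
uplinks=4
ratio=$(( downlinks / uplinks ))
echo "blocking ratio: ${ratio}:1"
```

Changing the downlink/uplink split (e.g. 8 downlinks and 8 uplinks for 1:1) is how a different capacity is obtained, as noted above.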


Solution Example

We chose the key features below as a baseline to demonstrate the solution used in this RDG.

Note

The solution example below does not contain redundancy configuration

Solution Scale

  • 2 x racks with a dedicated provider network set per rack
  • 1 x SN2700 switch as Spine switch
  • 2 x SN2100 switches as Leaf switches, 1 per rack
  • 5 nodes in rack 1 (3 x Controller, 2 x Compute)
  • 2 nodes in rack 2 (2 x Compute)
  • All nodes are connected to the Leaf switches using 2 x 100GB ports per node
  • Leaf switches are connected to each Spine switch using a single 100GB port

Physical Rack Diagram

In this RDG we placed all the equipment into the same rack, but the wiring and configuration simulates a two rack network setup.

  


PTP Diagram


Note

One of the Onyx Leaf switches is used as PTP clock source GrandMaster instead of a dedicated device.

Solution Networking

Network Diagram


Note

Compute nodes access External Network/Internet through the undercloud node which functions as a router.


Network Physical Configuration

Important!

The configuration steps below refer to a solution example based on 2 racks

Below is a detailed step-by-step description of the network configuration:


  1. Connect the switches to the switch mgmt network
  2. Interconnect the switches using 100GB/s cables
  3. Connect the Controller/Compute servers to the relevant networks according to the following diagrams:

  4. Connect the Undercloud Director server to the IPMI, PXE and External networks.

Switch Profile Configuration

The MC Max profile must be set on all switches. Applying it removes the existing configuration and requires a reboot.

Back up your switch configuration if you plan to reuse it later.


Run the command on all switches:

system profile eth-ipv4-mc-max 
show system profile

Switch Interface Configuration

Set the VLANs and VLAN interfaces on the Leaf switches according to the following:

| Network Name | Network Set | Leaf Switch Location | Network Details | Switch Interface IP | VLAN ID | Switch Physical Port | Switchport Mode | Note |
|---|---|---|---|---|---|---|---|---|
| Storage | 1 | Rack 1 | 172.16.0.0/24 | 172.16.0.1 | 11 | A | hybrid | |
| Storage_Mgmt | 1 | Rack 1 | 172.17.0.0/24 | 172.17.0.1 | 21 | A | hybrid | |
| Internal API | 1 | Rack 1 | 172.18.0.0/24 | 172.18.0.1 | 31 | A | hybrid | |
| PTP | 1 | Rack 1 | 172.20.0.0/24 | 172.20.0.1 | 51 | A | hybrid | access vlan |
| MC_Tenant_VLAN | 1 | Rack 1 | 11.11.11.0/24 | 11.11.11.1 | 101 | A | hybrid | |
| Tenant_VXLAN | 1 | Rack 1 | 172.19.0.0/24 | 172.19.0.1 | 41 | B | access | |
| Storage_2 | 2 | Rack 2 | 172.16.2.0/24 | 172.16.2.1 | 12 | A | hybrid | |
| Storage_Mgmt_2 | 2 | Rack 2 | 172.17.2.0/24 | 172.17.2.1 | 22 | A | hybrid | |
| Internal API_2 | 2 | Rack 2 | 172.18.2.0/24 | 172.18.2.1 | 32 | A | hybrid | |
| PTP_2 | 2 | Rack 2 | 172.20.2.0/24 | 172.20.2.1 | 52 | A | hybrid | access vlan |
| MC_Tenant_VLAN | 2 | Rack 2 | 22.22.22.0/24 | 22.22.22.1 | 101 | A | hybrid | |
| Tenant_VXLAN_2 | 2 | Rack 2 | 172.19.2.0/24 | 172.19.2.1 | 42 | B | access | |

Rack 1 Leaf switch VLAN Diagram
Rack 2 Leaf switch VLAN Diagram

Switch Full Configuration

Note

  • Onyx 3.8.1204 version and up is required
  • Switch SW-09 is used as Spine switch
  • SW-10 and SW-11 are used as Leaf switches
  • Leaf SW-11 is configured with a PTP grandmaster role
  • Port 1/9 on all Leaf switches should face the Spine switch. The rest of the ports should face the Compute/Controller nodes
  • The igmp immediate/fast-leave switch configurations should be removed in case multiple virtual receivers are used on a Compute node

Spine (SW-09) configuration: 

##
## STP configuration
##
no spanning-tree

##   
## L3 configuration
##
ip routing
interface ethernet 1/1-1/2 no switchport force
interface ethernet 1/1 ip address 192.168.119.9/24 primary
interface ethernet 1/2 ip address 192.168.109.9/24 primary
interface loopback 0 ip address 1.1.1.9/32 primary
 
##
## LLDP configuration
##
   lldp
   
##
## OSPF configuration
##
protocol ospf
router ospf router-id 1.1.1.9
   interface ethernet 1/1 ip ospf area 0.0.0.0
   interface ethernet 1/2 ip ospf area 0.0.0.0
   interface ethernet 1/1 ip ospf network broadcast
   interface ethernet 1/2 ip ospf network broadcast
   router ospf redistribute direct   
  
##
## IP Multicast router configuration
##
   ip multicast-routing 
   
##
## PIM configuration
##
   protocol pim
   interface ethernet 1/1 ip pim sparse-mode
   interface ethernet 1/2 ip pim sparse-mode
   ip pim multipath next-hop s-g-hash
   ip pim rp-address 1.1.1.9

##
## IGMP configuration
##
   interface ethernet 1/1 ip igmp immediate-leave
   interface ethernet 1/2 ip igmp immediate-leave
 
##
## Network management configuration
##
   ntp disable
   
##
## PTP protocol
##
   protocol ptp
   ptp vrf default enable
   interface ethernet 1/1 ptp enable
   interface ethernet 1/2 ptp enable

Leaf Rack 1 (SW-11) configuration:

##
## STP configuration
##
no spanning-tree


##
## LLDP configuration
##
   lldp
    
##
## VLAN configuration
##

vlan 11 
name storage
exit
vlan 21
name storage_mgmt
exit
vlan 31 
name internal_api
exit
vlan 41 
name tenant_vxlan
exit
vlan 51 
name ptp
exit
vlan 101 
name tenant_vlan_mc
exit

interface ethernet 1/1-1/5 switchport access vlan 41
interface ethernet 1/11-1/15 switchport mode hybrid 
interface ethernet 1/11-1/15 switchport access vlan 51
interface ethernet 1/11-1/15 switchport hybrid allowed-vlan add 11
interface ethernet 1/11-1/15 switchport hybrid allowed-vlan add 21
interface ethernet 1/11-1/15 switchport hybrid allowed-vlan add 31
interface ethernet 1/11-1/15 switchport hybrid allowed-vlan add 101

##
## IGMP Snooping configuration
##
   ip igmp snooping unregistered multicast forward-to-mrouter-ports
   ip igmp snooping
   vlan 51 ip igmp snooping
   vlan 101 ip igmp snooping
   vlan 51 ip igmp snooping querier
   vlan 101 ip igmp snooping querier
   interface ethernet 1/11-1/15 ip igmp snooping fast-leave

   
##   
## L3 configuration
##
ip routing
interface ethernet 1/9 no switchport force
interface ethernet 1/9 ip address 192.168.119.11/24 primary

interface vlan 11 ip address 172.16.0.1 255.255.255.0
interface vlan 21 ip address 172.17.0.1 255.255.255.0
interface vlan 31 ip address 172.18.0.1 255.255.255.0
interface vlan 41 ip address 172.19.0.1 255.255.255.0
interface vlan 51 ip address 172.20.0.1 255.255.255.0
interface vlan 101 ip address 11.11.11.1 255.255.255.0
  
  
##
## OSPF configuration
##
protocol ospf
router ospf router-id 1.1.1.11
interface ethernet 1/9 ip ospf area 0.0.0.0
interface ethernet 1/9 ip ospf network broadcast
router ospf redistribute direct  
  
   
##
## IP Multicast router configuration
##
   ip multicast-routing 
   
##
## PIM configuration
##
   protocol pim
   interface ethernet 1/9 ip pim sparse-mode
   ip pim multipath next-hop s-g-hash
   interface vlan 101 ip pim sparse-mode
   ip pim rp-address 1.1.1.9

##
## IGMP configuration
##
   interface ethernet 1/9 ip igmp immediate-leave
   interface vlan 101 ip igmp immediate-leave
   
##
## Network management configuration
##
   ntp disable
   
##
## PTP protocol
##
   protocol ptp
   ptp vrf default enable
   ptp priority1 1
   interface ethernet 1/9 ptp enable
   interface ethernet 1/11-1/15 ptp enable
   interface vlan 51 ptp enable 

Leaf Rack 2 (SW-10) configuration:

##
## STP configuration
##
no spanning-tree


##
## LLDP configuration
##
   lldp
    
##
## VLAN configuration
##

vlan 12
name storage
exit
vlan 22
name storage_mgmt
exit
vlan 32 
name internal_api
exit
vlan 42 
name tenant_vxlan
exit
vlan 52 
name ptp
exit
vlan 101 
name tenant_vlan_mc
exit

interface ethernet 1/1-1/2 switchport access vlan 42
interface ethernet 1/11-1/12 switchport mode hybrid 
interface ethernet 1/11-1/12 switchport access vlan 52
interface ethernet 1/11-1/12 switchport hybrid allowed-vlan add 12
interface ethernet 1/11-1/12 switchport hybrid allowed-vlan add 22
interface ethernet 1/11-1/12 switchport hybrid allowed-vlan add 32
interface ethernet 1/11-1/12 switchport hybrid allowed-vlan add 101

##
## IGMP Snooping configuration
##
   ip igmp snooping unregistered multicast forward-to-mrouter-ports
   ip igmp snooping
   vlan 52 ip igmp snooping
   vlan 52 ip igmp snooping querier
   vlan 101 ip igmp snooping
   vlan 101 ip igmp snooping querier
   interface ethernet 1/11-1/12 ip igmp snooping fast-leave

   
##   
## L3 configuration
##
ip routing
interface ethernet 1/9 no switchport force
interface ethernet 1/9 ip address 192.168.109.10/24 primary

interface vlan 12 ip address 172.16.2.1 255.255.255.0
interface vlan 22 ip address 172.17.2.1 255.255.255.0
interface vlan 32 ip address 172.18.2.1 255.255.255.0
interface vlan 42 ip address 172.19.2.1 255.255.255.0
interface vlan 52 ip address 172.20.2.1 255.255.255.0
interface vlan 101 ip address 22.22.22.1 255.255.255.0
  
  
##
## OSPF configuration
##
protocol ospf
router ospf router-id 2.2.2.10
interface ethernet 1/9 ip ospf area 0.0.0.0
interface ethernet 1/9 ip ospf network broadcast
router ospf redistribute direct  
  
   
##
## IP Multicast router configuration
##
   ip multicast-routing 
   
##
## PIM configuration
##
   protocol pim
   interface ethernet 1/9 ip pim sparse-mode
   ip pim multipath next-hop s-g-hash
   interface vlan 101 ip pim sparse-mode
   ip pim rp-address 1.1.1.9

##
## IGMP configuration
##
   interface ethernet 1/9 ip igmp immediate-leave
   interface vlan 101 ip igmp immediate-leave
   
##
## Network management configuration
##
   ntp disable
   
##
## PTP protocol
##
   protocol ptp
   ptp vrf default enable
   interface ethernet 1/9 ptp enable
   interface ethernet 1/11-1/12 ptp enable
   interface vlan 52 ptp enable 
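After applying the configurations above, basic control-plane health can be verified from the switch CLI. The show commands below are the usual Onyx forms, but command names can vary slightly between Onyx releases, so verify them against your Onyx user manual:

```
show ip ospf neighbors        # Leaf-Spine OSPF adjacencies should be FULL
show ip pim neighbor          # PIM neighbors on the routed fabric ports
show ip igmp snooping groups  # tenant multicast group membership per VLAN
show ptp clock                # PTP state; SW-11 should hold the grandmaster role
```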



Solution Configuration and Deployment

The following sections walk you through the configuration and deployment steps of the solution.

Prerequisites

Make sure that the hardware specifications are identical for servers with the same role (Compute/Controller/etc.)

Server Preparation - BIOS

Make sure that for all servers:

  1. The network boot is set on the interface connected to PXE network.
  2. Virtualization and SRIOV are enabled.


Additionally, make sure that for Compute servers:

  1. Power Profile is set to "Maximum Performance"
  2. HyperThreading is disabled
  3. C-state is disabled
  4. Turbo Mode is disabled
  5. Collaborative Power Control is disabled
  6. Processor Power and Utilization Monitoring (Ctrl+A) are disabled

NIC Preparation 

SR-IOV configuration is disabled by default on ConnectX-5 NICs and must be enabled for every NIC used by a Compute node.

To enable and configure SR-IOV, insert the Compute NIC into a test server with an OS installed, and follow the steps below:

  1. Run the following to verify that the firmware version is 16.21.2030 or later:

    [root@host ~]# ethtool -i ens2f0
    driver: mlx5_core
    version: 5.0-0
    firmware-version: 16.22.1002 (MT_0000000009)
    expansion-rom-version:
    bus-info: 0000:07:00.0
    supports-statistics: yes
    supports-test: yes
    supports-eeprom-access: no
    supports-register-dump: no
    supports-priv-flags: yes


    If the firmware version is older, download and burn the new firmware as described here


  2. Install the mstflint package:

    [root@host ~]# yum install mstflint
    
  3. Identify the PCI ID of the first 100G port and enable SRIOV:

    [root@host ~]# lspci | grep -i mel
    07:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
    07:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
    [root@host ~]#
    [root@host ~]# mstconfig -d 0000:07:00.0 query | grep -i sriov
    SRIOV_EN False(0)
    SRIOV_IB_ROUTING_MODE_P1 GID(0)
    SRIOV_IB_ROUTING_MODE_P2 GID(0)
    [root@host ~]# mstconfig -d 0000:07:00.0 set SRIOV_EN=1
    Device #1:
    ----------
    
    Device type: ConnectX5
    PCI device: 0000:07:00.0
    
    Configurations: Next Boot New
    SRIOV_EN False(0) True(1)
    
    Apply new Configuration? ? (y/n) [n] : y
    Applying... Done!
    -I- Please reboot machine to load new configurations.
  4. Set the number of VFs to a high value, such as 64, and reboot the server to apply the new configuration:

    [root@host ~]# mstconfig -d 0000:07:00.0 query | grep -i vfs
    NUM_OF_VFS 0
    [root@host ~]# mstconfig -d 0000:07:00.0 set NUM_OF_VFS=64
    
    Device #1:
    ----------
    
    Device type: ConnectX5
    PCI device: 0000:07:00.0
    
    Configurations: Next Boot New
    NUM_OF_VFS 0 64
    
    Apply new Configuration? ? (y/n) [n] : y
    Applying... Done!
    -I- Please reboot machine to load new configurations.
    [root@host ~]# reboot
  5. Confirm the new settings were applied using the mstconfig query commands shown above.
  6. Insert the NIC back into the Compute node.
  7. Repeat the procedure above for every Compute node NIC used in the setup.
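The per-NIC steps above can be condensed into a short script. This is a sketch using the example values from the steps (device 0000:07:00.0 and 64 VFs); the -y flag is assumed to auto-confirm the "Apply new Configuration?" prompt in your mstflint version, so adjust if your mstconfig prompts interactively:

```shell
# Enable SR-IOV and set the VF count on the first 100G port of a ConnectX-5.
dev=0000:07:00.0
mstconfig -y -d "$dev" set SRIOV_EN=1
mstconfig -y -d "$dev" set NUM_OF_VFS=64
# Confirm the "Next Boot" values before rebooting:
mstconfig -d "$dev" query | grep -E 'SRIOV_EN|NUM_OF_VFS'
echo "Reboot the server to load the new configuration."
```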


Note

  • In our solution, the first port of the two 100G ports in every NIC is used for the ASAP² accelerated data plane. This is the reason we enable SR-IOV only on the first ConnectX NIC PCI device (07:00.0 in the example above).
  • There are future plans to support an automated procedure to update and configure the NICs on the Compute nodes from the Undercloud.

Accelerated RH-OSP Installation and Deployment

The following steps will take you through the accelerated RH-OSP installation and deployment procedure:

  1. Install Red Hat 7.6 OS on the Undercloud server and set an IP on the interface connected to the external network; make sure it has internet connectivity.
  2. Install the Undercloud and the director as instructed in section 4 of the Red Hat OSP Director Installation and Usage - Red Hat Customer Portal. Our undercloud.conf file is attached as a reference here: Configuration Files
  3. Configure a container image source as instructed in section 5 of the above guide. Our solution uses the undercloud as a local registry.

    Note

    The following overcloud image versions are used in our deployment:

    rhosp-director-images-13.0-20190418.1.el7ost.noarch
    rhosp-director-images-ipa-13.0-20190418.1.el7ost.noarch
    rhosp-director-images-ipa-x86_64-13.0-20190418.1.el7ost.noarch
    rhosp-director-images-x86_64-13.0-20190418.1.el7ost.noarch

    The overcloud image is RH 7.6 with kernel 3.10.0-957.10.1.el7.x86_64

  4. Register the nodes of the overcloud as instructed in section 6.1. Our instackenv.json file is attached as a reference.

  5. Inspect the hardware of the nodes as instructed in section 6.2.
    Once introspection is completed, it is recommended to confirm for each node that the desired root disk was detected since cloud deployment can fail later because of insufficient disk space. Use the following command to check the free space on the detected disk selected as root:

    (undercloud) [stack@rhosp-director ~]$ openstack baremetal node show 92c4c1cb-ce7d-48d4-a2d9-75b2651db097 | grep properties
    | properties | {u'memory_mb': u'131072', u'cpu_arch': u'x86_64', u'local_gb': u'418', u'cpus': u'24', u'capabilities': u'boot_option:local'} |

    The “local_gb” value represents the disk size. If the disk size is lower than expected, use the procedure described in section 6.6 to define the root disk for the node. Note that an additional introspection cycle is required for the node after the root disk is changed.

  6. Verify that all nodes were registered properly and changed their state to “available” before proceeding to the next step:

    +--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
    | UUID                                 | Name         | Instance UUID | Power State | Provisioning State | Maintenance |
    +--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
    | d1fca940-e341-491b-8afd-0cf6d748aa29 | controller-1 | None          | power off   | available          | False       |
    | 6b24d02c-3fd2-4e55-a730-c45008f01723 | controller-2 | None          | power off   | available          | False       |
    | 098c3e2d-1c70-41d2-983b-6c266387de0b | controller-3 | None          | power off   | available          | False       |
    | 91492c2a-b26c-49ef-9d4e-e492a1578076 | compute-1    | None          | power off   | available          | False       |
    | cdf9e0ec-e3cb-4005-86f6-d40e684a9b19 | compute-2    | None          | power off   | available          | False       |
    | 92c4c1cb-ce7d-48d4-a2d9-75b2651db097 | compute-3    | None          | power off   | available          | False       |
    | bb5e829a-834b-4eb1-b733-0012ce9d5f00 | compute-4    | None          | power off   | available          | False       |
    +--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
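The root-disk check from step 5 can be scripted. This is an illustrative helper, not part of the RH-OSP tooling; the 100 GB threshold is an assumed minimum for this deployment:

```shell
# Pull local_gb out of the "properties" blob printed by
# `openstack baremetal node show <uuid> | grep properties` and flag a small root disk.
props="{u'memory_mb': u'131072', u'cpu_arch': u'x86_64', u'local_gb': u'418', u'cpus': u'24', u'capabilities': u'boot_option:local'}"
local_gb=$(printf '%s' "$props" | grep -o "local_gb': u'[0-9]*" | grep -o '[0-9]*$')
min_gb=100   # assumed minimum root disk size for this deployment
if [ "$local_gb" -ge "$min_gb" ]; then
    echo "root disk OK: ${local_gb} GB"
else
    echo "root disk too small: ${local_gb} GB - redefine the root disk (section 6.6) and re-introspect"
fi
```

Running this for every node before deployment avoids the late deployment failures caused by insufficient disk space mentioned above.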



The next step is to tag the nodes into profiles:


  1. Tag the controllers nodes into “control” default profile:

    (undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:control,boot_option:local' controller-1
    (undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:control,boot_option:local' controller-2
    (undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:control,boot_option:local' controller-3


  2. Create two new compute flavors -- one per rack (compute-r1, compute-r2) -- and attach the flavors to profiles with a correlated name:

    (undercloud) [stack@rhosp-director ~]$ openstack flavor create --id auto --ram 4096 --disk 40 --vcpus 1 compute-r1
    (undercloud) [stack@rhosp-director ~]$ openstack flavor set --property "capabilities:boot_option"="local" --property "capabilities:profile"="compute-r1" --property "resources:CUSTOM_BAREMETAL"="1" --property "resources:DISK_GB"="0" --property "resources:MEMORY_MB"="0" --property "resources:VCPU"="0" compute-r1
    
    (undercloud) [stack@rhosp-director ~]$ openstack flavor create --id auto --ram 4096 --disk 40 --vcpus 1 compute-r2
    (undercloud) [stack@rhosp-director ~]$ openstack flavor set --property "capabilities:boot_option"="local" --property "capabilities:profile"="compute-r2" --property "resources:CUSTOM_BAREMETAL"="1" --property "resources:DISK_GB"="0" --property "resources:MEMORY_MB"="0" --property "resources:VCPU"="0" compute-r2
  3. Tag compute nodes 1,3 into “compute-r1” profile to associate it with Rack 1, and compute nodes 2,4 into “compute-r2” profile to associate it with Rack 2:

    (undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:compute-r1,boot_option:local' compute-1
    (undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:compute-r1,boot_option:local' compute-3
    (undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:compute-r2,boot_option:local' compute-2
    (undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:compute-r2,boot_option:local' compute-4
  4. Verify profile tagging per node using the command below:

    (undercloud) [stack@rhosp-director ~]$ openstack overcloud profiles list
    +--------------------------------------+--------------+-----------------+-----------------+-------------------+
    | Node UUID                            | Node Name    | Provision State | Current Profile | Possible Profiles |
    +--------------------------------------+--------------+-----------------+-----------------+-------------------+
    | d1fca940-e341-491b-8afd-0cf6d748aa29 | controller-1 | available       | control | |
    | 6b24d02c-3fd2-4e55-a730-c45008f01723 | controller-2 | available       | control | |
    | 098c3e2d-1c70-41d2-983b-6c266387de0b | controller-3 | available       | control | |
    | 91492c2a-b26c-49ef-9d4e-e492a1578076 | compute-1    | available       | compute-r1 | |
    | cdf9e0ec-e3cb-4005-86f6-d40e684a9b19 | compute-2    | available       | compute-r2 | |
    | 92c4c1cb-ce7d-48d4-a2d9-75b2651db097 | compute-3    | available       | compute-r1 | |
    | bb5e829a-834b-4eb1-b733-0012ce9d5f00 | compute-4    | available       | compute-r2 | |
    +--------------------------------------+--------------+-----------------+-----------------+-------------------+

    It is possible to tag the nodes into profiles in the instackenv.json file during node registration (section 6.1) instead of running the tag command per node; however, the flavors and profiles must be created in any case.
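The per-node tagging commands from step 3 can equally be expressed as a loop, using the same node names and profiles as in this example:

```shell
# Rack 1 profile for compute nodes 1 and 3, Rack 2 profile for nodes 2 and 4.
for node in compute-1 compute-3; do
  openstack baremetal node set --property capabilities='profile:compute-r1,boot_option:local' "$node"
done
for node in compute-2 compute-4; do
  openstack baremetal node set --property capabilities='profile:compute-r2,boot_option:local' "$node"
done
```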

NVIDIA NICs Listing

Run the following command to iterate over all registered nodes and identify the interface names of the dual-port NVIDIA 100GB NIC. The interface names are used later on in the configuration files.

(undercloud) [stack@rhosp-director templates]$ for node in $(openstack baremetal node list --fields uuid -f value) ; do openstack baremetal introspection interface list $node ; done
.
.
+-----------+-------------------+----------------------+-------------------+----------------+
| Interface | MAC Address       | Switch Port VLAN IDs | Switch Chassis ID | Switch Port ID |
+-----------+-------------------+----------------------+-------------------+----------------+
| eno1      | ec:b1:d7:83:11:b8 | []                   | 94:57:a5:25:fa:80 | 29 |
| eno2      | ec:b1:d7:83:11:b9 | []                   | None              | None |
| eno3      | ec:b1:d7:83:11:ba | []                   | None              | None |
| eno4      | ec:b1:d7:83:11:bb | []                   | None              | None |
| ens1f1    | ec:0d:9a:7d:81:b3 | []                   | 24:8a:07:7f:ef:00 | Eth1/14 |
| ens1f0    | ec:0d:9a:7d:81:b2 | []                   | 24:8a:07:7f:ef:00 | Eth1/1 |
+-----------+-------------------+----------------------+-------------------+----------------+

Note

Names must be identical for all nodes, or at least for all nodes sharing the same role. In our case, it is ens2f0/ens2f1 on Controller nodes, and ens1f0/ens1f1 on Compute nodes.



Note

The configuration file examples in the following sections are partial and were employed to highlight specific sections. The full configuration files are available to download in the following link:

Configuration Files

Deployment configuration and environment files:

Role definitions file:

  • The provided /home/stack/templates/roles_data_rivermax.yaml file includes a standard Controller role and two types of Compute roles, one per associated network rack

  • The NeutronDhcpAgent service is added to the Compute roles

 Below is a partial output of the config files:

###############################################################################
# Role: ComputeSriov1                                                         #
###############################################################################
- name: ComputeSriov1
  description: |
    Compute SR-IOV Role R1
  CountDefault: 1
  networks:
    - InternalApi
    - Tenant
    - Storage
    - Ptp
  HostnameFormatDefault: '%stackname%-computesriov1-%index%'
  disable_upgrade_deployment: True
  ServicesDefault:
###############################################################################
# Role: ComputeSriov2                                                         #
###############################################################################
- name: ComputeSriov2
  description: |
    Compute SR-IOV Role R2
  CountDefault: 1
  networks:
    - InternalApi_2
    - Tenant_2
    - Storage_2
    - Ptp_2
  HostnameFormatDefault: '%stackname%-computesriov2-%index%'
  disable_upgrade_deployment: True
  ServicesDefault:


The full configuration file is attached to this document for your convenience.


Node Counts and Flavors file:

The provided /home/stack/templates/node-info.yaml file specifies the node count and the correlated flavor per role.


Full configuration file:

parameter_defaults:
  OvercloudControllerFlavor: control
  OvercloudComputeSriov1Flavor: compute-r1
  OvercloudComputeSriov2Flavor: compute-r2
  ControllerCount: 3
  ComputeSriov1Count: 2



Rivermax Environment Configuration file:


The provided /home/stack/templates/rivermax-env.yaml file is used to configure the Compute nodes for low latency applications with HW offload:


  • ens1f0 is used for the accelerated VXLAN data plane (Nova physical_network: null is required for VXLAN offload).

  • CPU isolation: cores 2-5,12-17 are isolated from the hypervisor, and cores 2-5,12-15 will be used by Rivermax VMs. Cores 16,17 are excluded from Nova and will be used exclusively for running linuxptp tasks on the compute node.

  • ens1f1 is used for VLAN traffic.

  • Each compute node role is associated with a dedicated physical network to be used later on for a multi-segment network. Notice that the Nova PCI whitelist physical network remains the same for both roles.

  • VF function #1 is excluded from the Nova PCI whitelist (it will be used as a hypervisor-owned VF for PTP traffic).

  • userdata_disable_service.yaml is called to disable the chrony (NTP) service on overcloud compute nodes during deployment; this is required for a stable PTP setup.

  • ExtraConfig is used for mapping role config params to the correct network set, and for setting firewall rules allowing PTP traffic to the compute nodes.


The full configuration file is attached to this document.

Note

The following configuration file is correlated to the specific compute server hardware, OS and drivers used in this guide, in which:

The NVIDIA ConnectX adapter interface names are ens1f0 and ens1f1.

The PCI IDs used for the SR-IOV VFs allocated for Nova usage are specified explicitly per compute role.

On a different system the interface names and PCI addresses might differ. Collect this information before cloud deployment in order to adjust the configuration files.

# A Heat environment file for adjusting the compute nodes to low latency media applications with HW Offload

resource_registry:
  OS::TripleO::Services::NeutronSriovHostConfig: /usr/share/openstack-tripleo-heat-templates/puppet/services/neutron-sriov-host-config.yaml
  OS::TripleO::NodeUserData: /home/stack/templates/userdata_disable_service.yaml
  OS::TripleO::Services::Ntp: OS::Heat::None
  OS::TripleO::Services::NeutronOvsAgent: /usr/share/openstack-tripleo-heat-templates/puppet/services/neutron-ovs-agent.yaml
  
parameter_defaults:

  DisableService: "chronyd"  
  NovaSchedulerDefaultFilters: ['RetryFilter','AvailabilityZoneFilter','RamFilter','ComputeFilter','ComputeCapabilitiesFilter','ImagePropertiesFilter','ServerGroupAntiAffinityFilter','ServerGroupAffinityFilter','PciPassthroughFilter']
  NovaSchedulerAvailableFilters: ["nova.scheduler.filters.all_filters","nova.scheduler.filters.pci_passthrough_filter.PciPassthroughFilter"]  

  # ComputeSriov1 Role params: 1 vxlan offload interface, 1 legacy sriov interface, isolated cores, cores 16-17 are isolated and excluded from nova for ptp usage. 
  ComputeSriov1Parameters:
    KernelArgs: "default_hugepagesz=2MB hugepagesz=2MB hugepages=8192 intel_iommu=on iommu=pt processor.max_cstate=0 intel_idle.max_cstate=0 nosoftlockup isolcpus=2-5,12-17 nohz_full=2-5,12-17 rcu_nocbs=2-5,12-17"
    NovaVcpuPinSet: "2-5,12-15" 
    OvsHwOffload: True
    NovaReservedHostMemory: 4096
    NovaPCIPassthrough:
      - devname: "ens1f0"
        physical_network: null
      - address: {"domain": ".*", "bus": "08", "slot": "08", "function": "[4-7]"}
        physical_network: "tenantvlan1"      
    NeutronPhysicalDevMappings: "tenantvlan1:ens1f1"
    NeutronBridgeMappings: ["tenantvlan1:br-stor"]
    
  # Extra config for mapping config params to rack 1 networks and for setting PTP Firewall rule
  ComputeSriov1ExtraConfig:
    neutron::agents::ml2::ovs::local_ip: "%{hiera('tenant')}"
    nova::vncproxy::host: "%{hiera('internal_api')}"
    nova::compute::vncserver_proxyclient_address: "%{hiera('internal_api')}"
    nova::compute::libvirt::vncserver_listen: "%{hiera('internal_api')}"    
    nova::my_ip: "%{hiera('internal_api')}"
    nova::migration::libvirt::live_migration_inbound_addr: "%{hiera('internal_api')}"
    cold_migration_ssh_inbound_addr: "%{hiera('internal_api')}"
    live_migration_ssh_inbound_addr: "%{hiera('internal_api')}" 
    tripleo::profile::base::database::mysql::client::mysql_client_bind_address: "%{hiera('internal_api')}"
    tripleo::firewall::firewall_rules:
      '199 allow PTP traffic over dedicated interface':
        dport: [319,320]
        proto: udp
        action: accept
  
  # ComputeSriov2 Role params: 1 vxlan offload interface, 1 legacy sriov interface, isolated cores, cores 16-17 are isolated and excluded from nova for ptp usage. 
  ComputeSriov2Parameters:
    KernelArgs: "default_hugepagesz=2MB hugepagesz=2MB hugepages=8192 intel_iommu=on iommu=pt processor.max_cstate=0 intel_idle.max_cstate=0 nosoftlockup isolcpus=2-5,12-17 nohz_full=2-5,12-17 rcu_nocbs=2-5,12-17"
    NovaVcpuPinSet: "2-5,12-15" 
    OvsHwOffload: True
    NovaReservedHostMemory: 4096
    NeutronSriovNumVFs: 
    NovaPCIPassthrough:
      - devname: "ens1f0"
        physical_network: null
      - address: {"domain": ".*", "bus": "08", "slot": "02", "function": "[4-7]"}
        physical_network: "tenantvlan1"     
    NeutronPhysicalDevMappings: "tenantvlan2:ens1f1"
    NeutronBridgeMappings: ["tenantvlan2:br-stor"]

    # Extra config for mapping config params to rack 2 networks and for setting PTP Firewall rule
  ComputeSriov2ExtraConfig:
    neutron::agents::ml2::ovs::local_ip: "%{hiera('tenant_2')}"
    nova::vncproxy::host: "%{hiera('internal_api_2')}"
    nova::compute::vncserver_proxyclient_address: "%{hiera('internal_api_2')}"
    nova::compute::libvirt::vncserver_listen: "%{hiera('internal_api_2')}"      
    nova::my_ip: "%{hiera('internal_api_2')}"
    nova::migration::libvirt::live_migration_inbound_addr: "%{hiera('internal_api_2')}"
    cold_migration_ssh_inbound_addr: "%{hiera('internal_api_2')}"
    live_migration_ssh_inbound_addr: "%{hiera('internal_api_2')}" 
    tripleo::profile::base::database::mysql::client::mysql_client_bind_address: "%{hiera('internal_api_2')}"
    tripleo::firewall::firewall_rules:
      '199 allow PTP traffic over dedicated interface':
        dport: [319,320]
        proto: udp
        action: accept
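The NovaPCIPassthrough address entries above match PCI address fields individually, so a function pattern like "[4-7]" selects only VFs 4-7 while leaving the lower-numbered functions (including the hypervisor-owned PTP VF) out of the whitelist. A simplified illustration of that field-by-field matching (this is a sketch, not Nova's actual implementation):

```python
import re

# Simplified sketch of Nova PCI whitelist matching: each field in the
# whitelist entry is applied as a regular expression against the
# corresponding part of a VF's PCI address (domain:bus:slot.function).
whitelist = {"domain": ".*", "bus": "08", "slot": "08", "function": "[4-7]"}

def matches(addr, spec):
    domain, rest = addr.split(":", 1)
    bus, slotfn = rest.split(":")
    slot, function = slotfn.split(".")
    fields = {"domain": domain, "bus": bus, "slot": slot, "function": function}
    return all(re.fullmatch(pattern, fields[name])
               for name, pattern in spec.items())

vfs = [f"0000:08:08.{fn}" for fn in range(8)]
allowed = [vf for vf in vfs if matches(vf, whitelist)]
print(allowed)   # only functions 4-7 pass the whitelist
```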



Disable_Service Configuration file:

The provided /home/stack/templates/userdata_disable_service.yaml file is used to disable services on overcloud nodes during deployment.

It is referenced in rivermax-env.yaml to disable the chrony (NTP) service:

heat_template_version: queens                                                          

description: >
  Uses cloud-init to disable a given systemd service on overcloud nodes
  at first boot. The service to disable is passed in via the
  DisableService parameter.

parameters:
  DisableService:
    description: Disable a service
    hidden: true
    type: string

resources:
  userdata:
    type: OS::Heat::MultipartMime
    properties:
      parts:
      - config: {get_resource: disable_service}

  disable_service:
    type: OS::Heat::SoftwareConfig
    properties:
      config:
        str_replace:
          template: |
            #!/bin/bash
            set -x
            sudo systemctl disable $service
            sudo systemctl stop $service
          params:
            $service: {get_param: DisableService}

outputs:
  OS::stack_id:
    value: {get_resource: userdata}
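Heat's str_replace simply substitutes the params values into the template text, so the first-boot script cloud-init runs for DisableService: "chronyd" can be previewed with plain string substitution (an illustrative sketch, not Heat itself):

```python
# Sketch: emulate Heat's str_replace to preview the rendered first-boot script.
template = """#!/bin/bash
set -x
sudo systemctl disable $service
sudo systemctl stop $service
"""

params = {"$service": "chronyd"}  # from DisableService in rivermax-env.yaml

rendered = template
for placeholder, value in params.items():
    rendered = rendered.replace(placeholder, value)

print(rendered)
```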


Network configuration files:

The provided network_data_rivermax.yaml file is used to configure the cloud networks according to the following guidelines:

  • The Rack 1 network set parameters match the subnets/VLANs configured on the Rack 1 Leaf switch. The network names used are specified in roles_data.yaml for the Controller/ComputeSriov1 role networks.
  • The Rack 2 networks match the subnets/VLANs configured on the Rack 2 Leaf switch. The network names are specified in roles_data.yaml for the ComputeSriov2 role networks.
  • The "management" network is not used in our example.
  • The PTP network is shared by both racks in our example.


The configuration is based on the following matrix to match the Leaf switch configuration as executed in Network Configuration section above:

Network Name      Network Set  Network Location  Network Details  VLAN      Network Allocation Pool
Storage           1            Rack 1            172.16.0.0/24    11        172.16.0.100-250
Storage_Mgmt      1            Rack 1            172.17.0.0/24    21        172.17.0.100-250
Internal API      1            Rack 1            172.18.0.0/24    31        172.18.0.100-250
Tenant            1            Rack 1            172.19.0.0/24    41        172.19.0.100-250
PTP               1            Rack 1            172.20.0.0/24    untagged  172.20.0.100-250
Storage_2         2            Rack 2            172.16.2.0/24    12        172.16.2.100-250
Storage_Mgmt_2    2            Rack 2            172.17.2.0/24    22        172.17.2.100-250
Internal API_2    2            Rack 2            172.18.2.0/24    32        172.18.2.100-250
Tenant_2          2            Rack 2            172.19.2.0/24    42        172.19.2.100-250
PTP_2             2            Rack 2            172.20.2.0/24    untagged  172.20.2.100-250
External          -            Public Switch     10.7.208.0/24    -         10.7.208.10-21
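Before deployment it is worth sanity-checking that each allocation pool actually falls inside its network's subnet; a small sketch using only the standard library (a few rows copied from the table above):

```python
import ipaddress

# Sketch: verify each allocation pool lies inside its network's subnet.
networks = {
    "Storage":   ("172.16.0.0/24", "172.16.0.100", "172.16.0.250"),
    "Tenant":    ("172.19.0.0/24", "172.19.0.100", "172.19.0.250"),
    "Storage_2": ("172.16.2.0/24", "172.16.2.100", "172.16.2.250"),
    "External":  ("10.7.208.0/24", "10.7.208.10", "10.7.208.21"),
}

for name, (cidr, start, end) in networks.items():
    subnet = ipaddress.ip_network(cidr)
    ok = (ipaddress.ip_address(start) in subnet
          and ipaddress.ip_address(end) in subnet)
    print(f"{name}: pool inside subnet -> {ok}")
```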

Full configuration file is attached to this document



Below is a partial example for one of the configured networks: Storage (2 networks sets), External, and PTP networks configuration:

- name: Storage
  vip: true
  vlan: 11
  name_lower: storage
  ip_subnet: '172.16.0.0/24'
  allocation_pools: [{'start': '172.16.0.100', 'end': '172.16.0.250'}]
  ipv6_subnet: 'fd00:fd00:fd00:1100::/64'
  ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:1100::10', 'end': 'fd00:fd00:fd00:1100:ffff:ffff:ffff:fffe'}]
.
.
- name: Storage_2
  vip: true
  vlan: 12
  name_lower: storage_2
  ip_subnet: '172.16.2.0/24'
  allocation_pools: [{'start': '172.16.2.100', 'end': '172.16.2.250'}]
  ipv6_subnet: 'fd00:fd00:fd00:1200::/64'
  ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:1200::10', 'end': 'fd00:fd00:fd00:1200:ffff:ffff:ffff:fffe'}]
.
.
- name: External
  vip: true
  name_lower: external
  vlan: 10
  ip_subnet: '10.7.208.0/24'
  allocation_pools: [{'start': '10.7.208.10', 'end': '10.7.208.21'}]
  gateway_ip: '10.7.208.1'
  ipv6_subnet: '2001:db8:fd00:1000::/64'
  ipv6_allocation_pools: [{'start': '2001:db8:fd00:1000::10', 'end': '2001:db8:fd00:1000:ffff:ffff:ffff:fffe'}]
  gateway_ipv6: '2001:db8:fd00:1000::1'
.
.
- name: Ptp
  name_lower: ptp
  ip_subnet: '172.20.1.0/24'
  allocation_pools: [{'start': '172.20.1.100', 'end': '172.20.1.250'}]

- name: Ptp_2
  name_lower: ptp_2
  ip_subnet: '172.20.2.0/24'
  allocation_pools: [{'start': '172.20.2.100', 'end': '172.20.2.250'}] 



The provided network-environment-rivermax.yaml file is used to configure the Nova/Neutron network parameters according to the cloud networks:

  • VXLAN tunnels are used for the tenant overlay networks.
  • The tenant VLAN ranges to be used for SR-IOV ports are 100-200.

Full configuration file is attached to this document

.
.
.
  NeutronNetworkType: 'vlan,vxlan,flat'
  NeutronTunnelTypes: 'vxlan'
  NeutronNetworkVLANRanges: 'tenantvlan1:100:200,tenantvlan2:100:200'
  NeutronFlatNetworks: 'datacentre'
  NeutronBridgeMappings: 'datacentre:br-ex,tenantvlan1:br-stor'


Role type configuration files:

/home/stack/templates/controller.yaml

  • Make sure the location of the run-os-net-config.sh script in the configuration file points to the correct script location.
  • A supernet and GW per network allow routing between network sets located in different racks. The GW is the IP interface configured on the Leaf switch interface facing this network. The supernet and gateways for the two tenant networks can be seen below.
  • Controller node network settings used:
    • A dedicated 1G interface (type "interface") for the provisioning (PXE) network.
    • A dedicated 1G interface (type "ovs_bridge") for the External network. This network has a default GW configured.
    • A dedicated 100G interface (type "interface", without VLANs) for the data plane (Tenant) network in Rack 1. The network is associated with a supernet and has a route allowing it to reach other networks in the same supernet located in different racks.
    • A dedicated 100G interface (type "ovs_bridge") with VLANs for the Storage/StorageMgmt/InternalApi networks in Rack 1. Each network is associated with a supernet and has a route allowing it to reach other networks in the same supernet located in different racks.
  • See the example below. The full configuration file is attached to this document.

    TenantSupernet:
      default: '172.19.0.0/16'
      description: Supernet that contains Tenant subnets for all roles.
      type: string
    TenantGateway:
      default: '172.19.0.1'
      description: Router gateway on tenant network
      type: string
    Tenant_2Gateway:
      default: '172.19.2.1'
      description: Router gateway on tenant_2 network
      type: string
    .
    .
    resources:
      OsNetConfigImpl:
        type: OS::Heat::SoftwareConfig
        properties:
          group: script
          config:
            str_replace:
              template:
                get_file: /usr/share/openstack-tripleo-heat-templates/network/scripts/run-os-net-config.sh
              params:
                $network_config:
                  network_config:
    .
    .
                  # NIC 3 - Data Plane (Tenant net)
                  - type: ovs_bridge
                    name: br-sriov
                    use_dhcp: false
                    members:
                    - type: interface
                      name: ens2f0
                      addresses:
                      - ip_netmask:
                          get_param: TenantIpSubnet
                      routes:
                      - ip_netmask:
                          get_param: TenantSupernet
                        next_hop:
                          get_param: TenantGateway
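The supernet route above works because each rack-local tenant subnet is contained in TenantSupernet, so a single route via the local leaf gateway reaches the remote rack's tenant network. A quick standard-library check of that containment (subnets copied from this guide):

```python
import ipaddress

# Sketch: both rack-local tenant subnets must fall inside TenantSupernet
# for the single supernet route to cover inter-rack traffic.
supernet = ipaddress.ip_network("172.19.0.0/16")     # TenantSupernet
tenant_rack1 = ipaddress.ip_network("172.19.0.0/24") # Tenant
tenant_rack2 = ipaddress.ip_network("172.19.2.0/24") # Tenant_2

for subnet in (tenant_rack1, tenant_rack2):
    print(subnet, "in supernet:", subnet.subnet_of(supernet))
```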

/home/stack/templates/computesriov1.yaml:

  • Make sure the location of the run-os-net-config.sh script in the configuration file points to the correct script location.
  • A supernet and GW per network allow routing between network sets located in different racks. The GW is the IP interface configured on the Leaf switch interface facing this network (not shown in the example below; see the example above or the full configuration file).
  • Networks and routes used by the Compute nodes in Rack 1 with the ComputeSriov1 role:
    • A dedicated 1G interface for the provisioning (PXE) network.
    • A dedicated 100G interface for the offloaded VXLAN data plane network in Rack 1. The network is associated with a supernet and has a route allowing it to reach other networks in the same supernet located in different racks.
    • A dedicated 100G interface with a host VF for PTP and with OVS VLANs for the Storage/InternalApi networks in Rack 1. Each network is associated with a supernet and has a route allowing it to reach other networks in the same supernet located in different racks (not fully shown in the example below; see the full configuration file).
  • See the example below. The full configuration file is attached to this document.

     network_config:
                     # NIC 1 - Provisioning net
                  - type: interface                                                                                                
                    name: eno1                                                                                               
                    use_dhcp: false                                                                                                 
                    dns_servers:                                                                                                    
                      get_param: DnsServers                                                                                         
                    addresses:                                                                                                      
                    - ip_netmask:
                        list_join:
                        - /
                        - - get_param: ControlPlaneIp
                          - get_param: ControlPlaneSubnetCidr
                    routes:
                    - ip_netmask: 169.254.169.254/32
                      next_hop:
                        get_param: EC2MetadataIp
                    - default: true
                      next_hop:
                        get_param: ControlPlaneDefaultRoute
    
                       
                    # NIC 2 - ASAP2 VXLAN Data Plane (Tenant net)
                  - type: sriov_pf
                    name: ens1f0
                    numvfs: 8
                    link_mode: switchdev
                  - type: interface
                    name: ens1f0
                    use_dhcp: false
                    addresses:
                      - ip_netmask:
                          get_param: TenantIpSubnet 
                    routes:
                      - ip_netmask:
                          get_param: TenantSupernet
                        next_hop:
                          get_param: TenantGateway
                  
                  
                    # NIC 3 - Storage and Control over OVS, legacy SRIOV for Data Plane, NIC Partitioning for PTP VF owned by Host
                  - type: ovs_bridge
                    name: br-stor
                    use_dhcp: false
                    members:
                    - type: sriov_pf
                      name: ens1f1
                      numvfs: 8
                      # force the MAC address of the bridge to this interface
                      primary: true
                    - type: vlan
                      vlan_id:
                        get_param: StorageNetworkVlanID
                      addresses:
                      - ip_netmask:
                          get_param: StorageIpSubnet
                      routes:
                      - ip_netmask:
                          get_param: StorageSupernet
                        next_hop:
                          get_param: StorageGateway
                    - type: vlan
                      vlan_id:
                        get_param: InternalApiNetworkVlanID
                      addresses:
                      - ip_netmask:
                          get_param: InternalApiIpSubnet
                      routes:
                      - ip_netmask:
                          get_param: InternalApiSupernet
                        next_hop:
                          get_param: InternalApiGateway
                  - type: sriov_vf
                    device: ens1f1
                    vfid: 1
                    addresses:
                    - ip_netmask:
                        get_param: PtpIpSubnet
    
    

/home/stack/templates/computesriov2.yaml:

  • Make sure the location of the run-os-net-config.sh script in the configuration file points to the correct script location.
  • A supernet and GW per network allow routing between network sets located in different racks. The GW is the IP interface configured on the Leaf switch interface facing this network (not shown in the example below; see the example above or the full configuration file).
  • Networks and routes used by the Compute nodes in Rack 2 with the ComputeSriov2 role:
    • A dedicated 1G interface for the provisioning (PXE) network.
    • A dedicated 100G interface for the offloaded VXLAN data plane network in Rack 2. The network is associated with a supernet and has a route allowing it to reach other networks in the same supernet located in different racks.
    • A dedicated 100G interface with a host VF for PTP and with OVS VLANs for the Storage/InternalApi networks in Rack 2. Each network is associated with a supernet and has a route allowing it to reach other networks in the same supernet located in different racks (not fully shown in the example below; see the full configuration file).
  • See the example below. The full configuration file is attached to this document.

    network_config:
                    # NIC 1 - Provisioning net
                  - type: interface                                                                                                
                    name: eno1                                                                                               
                    use_dhcp: false                                                                                                 
                    dns_servers:                                                                                                    
                      get_param: DnsServers                                                                                         
                    addresses:                                                                                                      
                    - ip_netmask:
                        list_join:
                        - /
                        - - get_param: ControlPlaneIp
                          - get_param: ControlPlaneSubnetCidr
                    routes:
                    - ip_netmask: 169.254.169.254/32
                      next_hop:
                        get_param: EC2MetadataIp
                    - default: true
                      next_hop:
                        get_param: ControlPlaneDefaultRoute
    
                       
                    # NIC 2 - ASAP2 VXLAN Data Plane (Tenant net)
                  - type: sriov_pf
                    name: ens1f0
                    numvfs: 8
                    link_mode: switchdev
                  - type: interface
                    name: ens1f0
                    use_dhcp: false
                    addresses:
                      - ip_netmask:
                          get_param: Tenant_2IpSubnet 
                    routes:
                      - ip_netmask:
                          get_param: TenantSupernet
                        next_hop:
                          get_param: Tenant_2Gateway
                  
                  
                    # NIC 3 - Storage and Control over OVS, legacy SRIOV for Data Plane, NIC Partitioning for PTP VF owned by Host
                  - type: ovs_bridge
                    name: br-stor
                    use_dhcp: false
                    members:
                    - type: sriov_pf
                      name: ens1f1
                      numvfs: 8
                      # force the MAC address of the bridge to this interface
                      primary: true
                    - type: vlan
                      vlan_id:
                        get_param: Storage_2NetworkVlanID
                      addresses:
                      - ip_netmask:
                          get_param: Storage_2IpSubnet
                      routes:
                      - ip_netmask:
                          get_param: StorageSupernet
                        next_hop:
                          get_param: Storage_2Gateway
                    - type: vlan
                      vlan_id:
                        get_param: InternalApi_2NetworkVlanID
                      addresses:
                      - ip_netmask:
                          get_param: InternalApi_2IpSubnet
                      routes:
                      - ip_netmask:
                          get_param: InternalApiSupernet
                        next_hop:
                          get_param: InternalApi_2Gateway
                  - type: sriov_vf
                    device: ens1f1
                    vfid: 1
                    addresses:
                    - ip_netmask:
                        get_param: Ptp_2IpSubnet

Deploying the Overcloud

Using the provided configuration and environment files, the cloud will be deployed with:

  • 3 Controller nodes associated with Rack 1 networks
  • 2 Compute nodes associated with Rack 1 (provider network 1)
  • 2 Compute nodes associated with Rack 2 (provider network 2)
  • Routes to allow connectivity between racks/networks
  • VXLAN overlay tunnels between all the nodes

Before starting the deployment, verify connectivity between the VLAN interfaces on the racks' Leaf switches facing the nodes, over the OSPF underlay fabric. Without inter-rack connectivity for all networks, the overcloud deployment will fail.

Note

  • Do not change the order of the environment files in the deploy command.
  • Make sure that the NTP server specified in the deploy command is accessible and can provide time to the undercloud node.
  • The overcloud_images.yaml file used in the deploy command is created during undercloud installation; verify its existence in the specified location.
  • The network-isolation.yaml and neutron-sriov.yaml files specified in the deploy command are created automatically during deployment from j2.yaml template files.

To start the overcloud deployment, issue the command below: 

(undercloud) [stack@rhosp-director templates]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates \
--libvirt-type kvm \
-n /home/stack/templates/network_data_rivermax.yaml \
-r /home/stack/templates/roles_data_rivermax.yaml \
--timeout 90 \
--validation-warnings-fatal \
--ntp-server 0.asia.pool.ntp.org \
-e /home/stack/templates/node-info.yaml \
-e /home/stack/templates/overcloud_images.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/neutron-sriov.yaml \
-e /home/stack/templates/network-environment-rivermax.yaml \
-e /home/stack/templates/rivermax-env.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/host-config-and-reboot.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/disable-telemetry.yaml


Post Deployment Steps

Compute Node configuration:

  1. Verify the system booted with the required low latency adjustments

    # cat /proc/cmdline
    BOOT_IMAGE=/boot/vmlinuz-3.10.0-957.10.1.el7.x86_64 root=UUID=334f450f-1946-4577-a4eb-822bd33b8db2 ro console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet default_hugepagesz=2MB hugepagesz=2MB hugepages=8192 intel_iommu=on iommu=pt processor.max_cstate=0 intel_idle.max_cstate=0 nosoftlockup isolcpus=2-5,12-17 nohz_full=2-5,12-17 rcu_nocbs=2-5,12-17
    
    # cat /sys/module/intel_idle/parameters/max_cstate
    0
    
    # cat /sys/devices/system/cpu/cpuidle/current_driver
    none
  2. Upload the MFT package to the compute node and install it 

    Note

    NVIDIA Mellanox Firmware Tools (MFT) can be obtained here.

    GCC and kernel-devel packages are required for the MFT installation.

    # yum install gcc kernel-devel-3.10.0-957.10.1.el7.x86_64 -y
    # tar -xzvf mft-4.12.0-105-x86_64-rpm.tgz
    # cd mft-4.12.0-105-x86_64-rpm
    # ./install.sh
    # mst start
  3. Verify the NIC firmware version and upgrade it to the latest if required 

    # mlxfwmanager --query
    Querying Mellanox devices firmware ...
    
    Device #1:
    ----------
    
      Device Type:      ConnectX5
      Part Number:      MCX556A-EDA_Ax
      Description:      ConnectX-5 Ex VPI adapter card; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; PCIe4.0 x16; tall bracket; ROHS R6
      PSID:             MT_0000000009
      PCI Device Name:  /dev/mst/mt4121_pciconf0
      Base MAC:         ec0d9a7d81b2
      Versions:         Current        Available
         FW             16.25.1020     N/A
         PXE            3.5.0701       N/A
         UEFI           14.18.0019     N/A
    
      Status:           No matching image found
  4. Enable packet pacing and HW timestamping on the port used for PTP

    Note

    The rivermax_config script is available for download here.

    The relevant interface in our case is ens1f1.

    A reboot is required between the steps.

    The "mcra" setting will not survive a reboot. This is expected to be persistent and enabled by default in future firmware releases.

    #mst start
    Starting MST (Mellanox Software Tools) driver set
    Loading MST PCI module - Success
    [warn] mst_pciconf is already loaded, skipping
    Create devices
    -W- Missing "lsusb" command, skipping MTUSB devices detection
    Unloading MST PCI module (unused) - Success
    
    #mst status -v
    MST modules:
    ------------
        MST PCI module is not loaded
        MST PCI configuration module loaded
    PCI devices:
    ------------
    DEVICE_TYPE             MST                           PCI       RDMA            NET                       NUMA
    ConnectX5(rev:0)        /dev/mst/mt4121_pciconf0.1    08:00.1   mlx5_1          net-ens1f1                0
    
    ConnectX5(rev:0)        /dev/mst/mt4121_pciconf0      08:00.0   mlx5_0          net-ens1f0                0
    
     
    #chmod 777 rivermax_config
    #./rivermax_config ens1f1
    running this can take few minutes...
    enabling
    Done!
    # reboot
    #mst start
    Starting MST (Mellanox Software Tools) driver set
    Loading MST PCI module - Success
    [warn] mst_pciconf is already loaded, skipping
    Create devices
    -W- Missing "lsusb" command, skipping MTUSB devices detection
    Unloading MST PCI module (unused) - Success
     
    #mcra /dev/mst/mt4121_pciconf0.1 0xd8068 3
    
    #mcra /dev/mst/mt4121_pciconf0.1 0xd8068
    0x00000003
  5. Sync the compute node clock

  6. Install linuxptp

    # yum install -y linuxptp
  7. Use one of the following methods to identify the host VF interface name used for PTP: look for an IP address from the PTP network, or for "virtfn1", which correlates to vfid 1 used in the deployment configuration files

    [root@overcloud-computesriov1-0 ~]# ip addr show | grep "172.20"
        inet 172.20.0.102/24 brd 172.20.0.255 scope global enp8s8f3
    
    
     
    [root@overcloud-computesriov1-0 ~]# ls /sys/class/net/ens1f1/device/virtfn1/net/
    enp8s8f3
  8. Verify connectivity to the clock master (Onyx leaf switch sw11 over vlan 51 for Rack 1, Onyx leaf switch sw10 over vlan 52 for Rack 2)

    [root@overcloud-computesriov1-0 ~]# ping 172.20.0.1
    PING 172.20.0.1 (172.20.0.1) 56(84) bytes of data.
    64 bytes from 172.20.0.1: icmp_seq=1 ttl=64 time=0.158 ms
    
    
     
    [root@overcloud-computesriov2-0 ~]# ping 172.20.2.1
    PING 172.20.2.1 (172.20.2.1) 56(84) bytes of data.
    64 bytes from 172.20.2.1: icmp_seq=1 ttl=64 time=0.110 ms
  9. Edit /etc/ptp4l.conf to include the following global parameters and the PTP interface parameters

    [global]
    domainNumber 127
    priority1 128
    priority2 127
    use_syslog 1
    logging_level 6
    tx_timestamp_timeout 30
    hybrid_e2e 1
    dscp_event 46
    dscp_general 46
     
    [enp8s8f3]
    logAnnounceInterval -2
    announceReceiptTimeout 3
    logSyncInterval -3
    logMinDelayReqInterval -3
    delay_mechanism E2E
    network_transport UDPv4
     
  10. Start ptp4l on the PTP VF interface

    Note

    The first command below runs ptp4l in slave mode on a dedicated host CPU that is isolated and excluded from Nova per our deployment configuration files (core 16 in our case).

    The second command verifies that the PTP clock is locked to the master clock source; rms values should be low.

    # taskset -c 16 ptp4l -s -f /etc/ptp4l.conf &
    # tail -f /var/log/messages | grep rms
    ptp4l: [2560.009] rms   12 max   22 freq -12197 +/-  16
    ptp4l: [2561.010] rms   10 max   18 freq -12200 +/-  13 delay    63 +/-   0
    ptp4l: [2562.010] rms   10 max   21 freq -12212 +/-  10 delay    63 +/-   0
    ptp4l: [2563.011] rms   10 max   21 freq -12208 +/-  14 delay    63 +/-   0
    ptp4l: [2564.012] rms    9 max   14 freq -12220 +/-   8
  11. Start phc2sys on the same interface to sync the host system clock

    Note

    The first command below runs phc2sys on a dedicated host CPU that is isolated and excluded from Nova per our deployment configuration files (core 17 in our case).

    The second command verifies that the system clock is synced to PTP; offset values should be low and match the ptp4l rms values.

    # taskset -c 17  phc2sys -s enp8s8f3 -w -m -n 127 >> /var/log/messages &
    # tail -f /var/log/messages | grep offset
    phc2sys[2797.730] phc offset         0 s2 freq  +14570 delay    959
    phc2sys[2798.730]: phc offset       -43 s2 freq  +14527 delay    957
    phc2sys[2799.730]: phc offset        10 s2 freq  +14567 delay    951
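The rms and offset figures in the log lines above can also be extracted programmatically when scripting a sync-quality check; a minimal sketch (sample lines copied from this guide; the 100 ns threshold is an illustrative choice, not a SMPTE requirement):

```python
import re

# Sketch: extract sync-quality figures from ptp4l/phc2sys log lines.
ptp4l_line = "ptp4l: [2561.010] rms   10 max   18 freq -12200 +/-  13 delay    63 +/-   0"
phc2sys_line = "phc2sys[2798.730]: phc offset       -43 s2 freq  +14527 delay    957"

rms = int(re.search(r"rms\s+(\d+)", ptp4l_line).group(1))
offset = int(re.search(r"phc offset\s+(-?\d+)", phc2sys_line).group(1))

# Illustrative sanity threshold: both values are in the tens of ns when locked.
print("rms:", rms, "offset:", offset, "locked:", rms < 100 and abs(offset) < 100)
```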

Application VMs and Use Cases

In the section below we will cover two main use cases:

  1. IP Multicast stream between media VMs located in different L3 routed provider networks 




  2. HW-Offloaded Unicast stream over VXLAN tunnel between media VMs located in different L3 routed provider networks 



Media Instances Creation

Each media VM owns both an SR-IOV-based VLAN network port and an ASAP²-based VXLAN network port. The same VMs can be used to test all of the use cases.

  1. Contact Nvidia Networking Support to get the Rivermax VM cloud image file (RivermaxCloud_v3.qcow2)

    Note

    The login credentials to VMs that are using this image are: root/3tango

  2. Upload Rivermax cloud image to overcloud image repository

    source overcloudrc
    openstack image create --file RivermaxCloud_v3.qcow2 --disk-format qcow2 --container-format bare rivermax
  3. Create a flavor with a dedicated CPU policy to ensure the VM vCPUs are pinned to the isolated host CPUs

    openstack flavor create m1.rivermax --id auto --ram 4096 --disk 20 --vcpus 4
    openstack flavor set m1.rivermax --property hw:mem_page_size=large
    openstack flavor set m1.rivermax --property hw:cpu_policy=dedicated
  4. Create a multi-segment network for the tenant vlan Multicast traffic

    Notes

    Each network segment contains an SR-IOV direct port with an IP address from a different subnet.

    Each subnet is associated with a different physical network, correlated with a different routed provider rack.

    Routes to the subnets are propagated between racks via the provider L3 infrastructure (OSPF in our case).

    The subnet gateways are the leaf ToR switches (one per rack).

    Both segments under this multi-segment network carry the same segment VLAN ID.

    openstack network create mc_vlan_net --provider-physical-network tenantvlan1 --provider-network-type vlan --provider-segment 101 --share
    openstack network segment list --network mc_vlan_net
    +--------------------------------------+------+--------------------------------------+--------------+---------+
    | ID                                   | Name | Network                              | Network Type | Segment |
    +--------------------------------------+------+--------------------------------------+--------------+---------+
    | 309dd695-b45d-455e-b171-5739cc309dcf | None | 00665b03-eeae-4b5d-af65-063f8e989c24 | vlan         |     101 |
    +--------------------------------------+------+--------------------------------------+--------------+---------+
    
    
    openstack network segment set --name segment1 309dd695-b45d-455e-b171-5739cc309dcf
    openstack network segment create --physical-network tenantvlan2 --network-type vlan --segment 101 --network mc_vlan_net segment2
     
    (overcloud) [stack@rhosp-director ~]$ openstack network segment list 
    +--------------------------------------+----------+--------------------------------------+--------------+---------+
    | ID                                   | Name     | Network                              | Network Type | Segment |
    +--------------------------------------+----------+--------------------------------------+--------------+---------+
    | 309dd695-b45d-455e-b171-5739cc309dcf | segment1 | 00665b03-eeae-4b5d-af65-063f8e989c24 | vlan         |     101 |
    | cac89791-2d7f-45e7-8c85-cc0a65060e81 | segment2 | 00665b03-eeae-4b5d-af65-063f8e989c24 | vlan         |     101 |
    +--------------------------------------+----------+--------------------------------------+--------------+---------+
    
    openstack subnet create mc_vlan_subnet --dhcp --network mc_vlan_net --network-segment segment1 --subnet-range 11.11.11.0/24 --gateway 11.11.11.1
    openstack subnet create mc_vlan_subnet_2 --dhcp --network mc_vlan_net --network-segment segment2 --subnet-range 22.22.22.0/24 --gateway 22.22.22.1
    
    openstack port create mc_direct1 --vnic-type=direct --network mc_vlan_net 
    openstack port create mc_direct2 --vnic-type=direct --network mc_vlan_net 
  5. Create a VXLAN tenant network for Unicast traffic with 2 x ASAP² offload ports 

    openstack network create tenant_vxlan_net --provider-network-type vxlan --share
    openstack subnet create tenant_vxlan_subnet --dhcp --network tenant_vxlan_net --subnet-range 33.33.33.0/24 --gateway none
    openstack port create offload1 --vnic-type=direct --network tenant_vxlan_net --binding-profile '{"capabilities":["switchdev"]}'
    openstack port create offload2 --vnic-type=direct --network tenant_vxlan_net --binding-profile '{"capabilities":["switchdev"]}'
  6. Create a Rivermax instance on the media compute node located in Rack 1 (provider network segment 1), with one direct SR-IOV port on the VLAN network and one ASAP² offload port on the VXLAN network

    openstack server create --flavor m1.rivermax --image rivermax --nic port-id=mc_direct1 --nic port-id=offload1 vm1 --availability-zone nova:overcloud-computesriov1-0.localdomain
  7. Create a second Rivermax instance on the media compute node located in Rack 2 (provider network segment 2), with one direct SR-IOV port on the VLAN network and one ASAP² offload port on the VXLAN network

    openstack server create --flavor m1.rivermax --image rivermax --nic port-id=mc_direct2 --nic port-id=offload2 vm2 --availability-zone nova:overcloud-computesriov2-0.localdomain
  8. Connect to the compute nodes and verify the VMs are pinned to the isolated CPUs

    [root@overcloud-computesriov1-0 ~]# virsh list
     Id    Name                           State
    ----------------------------------------------------
     1     instance-0000002b              running
    
    
    [root@overcloud-computesriov1-0 ~]# virsh vcpupin 1
    VCPU: CPU Affinity
    ----------------------------------
       0: 15
       1: 2
       2: 3
       3: 4
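The pinning check can be automated as well. The sketch below flags any vCPU whose affinity is a range or list rather than a single host core; the heredoc is a hypothetical capture of the virsh vcpupin listing above, and on a compute node you would pipe the live virsh output in instead.

```shell
# Sketch: warn if any vCPU affinity is a range or list (e.g. "0-31"),
# which would mean the vCPU is floating rather than pinned. The sample
# file is a hypothetical copy of the "virsh vcpupin" output shown above.
cat <<'EOF' > /tmp/vcpupin.txt
VCPU: CPU Affinity
----------------------------------
   0: 15
   1: 2
   2: 3
   3: 4
EOF
awk -F': ' '
    /^ *[0-9]+: / { if ($2 ~ /[-,]/) { print "vCPU" $1 ": floating"; bad = 1 } }
    END { if (!bad) print "all vCPUs pinned" }' /tmp/vcpupin.txt
```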
    



Rivermax Application Testing - Use Case 1

In the following section we use Rivermax application VMs created on 2 media compute nodes located in different network racks.

First, we will lock on the PTP clock generated by the Onyx switches and propagated into the VMs via the KVM vPTP driver.

Next, we will generate a media-standards-compliant stream on VM1 and validate compliance using the NVIDIA Rivermax AnalyzeX tool on VM2. The Multicast stream generated by VM1 will traverse the network using PIM-SM and will be received by VM2, which has joined the group. Please note that each packet in this stream carries an RTP header (including 1 SRD) and complies with the known media RFCs; however, the RTP payload is 0, so the stream cannot be visually displayed.

In the last step, we will decode and stream a real video file on VM1 and play it in the receiver VM2's graphical interface using the NVIDIA Rivermax Simple Viewer tool.


  1. Upload the rivermax and analyzex license files to the Rivermax VMs and place them under the /opt/mellanox/rivermx directory.
  2. On both VMs run the following command to sync the system time from PTP:

    taskset -c 1 phc2sys -s /dev/ptp2 -O 0 -m >> /var/log/messages &
    
    # tail -f /var/log/messages | grep offset
    phc2sys[2797.730]: phc offset         0 s2 freq  +14570 delay    959
    phc2sys[2798.730]: phc offset       -43 s2 freq  +14527 delay    957
    phc2sys[2799.730]: phc offset        10 s2 freq  +14567 delay    951

    Notice that phc2sys runs on dedicated VM core 1 (which is isolated from the hypervisor) and is applied to the ptp2 device. In some cases the PTP device names in the VM will be different.

    Ignore the "clock is not adjustable" message when applying the command.

    Low and stable offset values will indicate a lock.

    Note

    Important: Verify that the freq values in the output are close to the values seen at the compute node level (see above, where we ran the command on the host).
    If not, rerun the phc2sys command with a different /dev/ptp device that is available in the VM system.

  3. On both VMs, run the SDP file modification script to adjust the media configuration file (sdp_hd_video_audio) as desired:

    #cd /home/Rivermax
    #./sdp_modify.sh
    === SDP File Modification Script ===
    Default source IP Address is 11.11.11.10 would you like to change it (Y\N)?y
    Please select source IP Address in format X.X.X.X :11.11.11.25
    Default Video stream multicast IP Address is 224.1.1.20 would you like to change it (Y\N)?y
    Please select Video stream multicast IP Address:224.1.1.110
    Default  Video stream multicast Port is 5000 would you like to change it (Y\N)?n
    Default Audio stream multicast IP Address is 224.1.1.30 would you like to change it (Y\N)?y
    Please select Audio stream multicast IP Address:224.1.1.110
    Default Audio stream multicast Port: is 5010 would you like to change it (Y\N)?n
    Your SDP file is ready with the following parameters:
    IP_ADDR 11.11.11.25
    MC_VIDEO_IP 224.1.1.110
    MC_VIDEO_PORT 5000
    MC_AUDIO_IP 224.1.1.110
    MC_AUDIO_PORT 5010
    
    
    # cat sdp_hd_video_audio
    v=0
    o=- 1443716955 1443716955 IN IP4 11.11.11.25
    s=st2110 stream
    t=0 0
    m=video 5000 RTP/AVP 96
    c=IN IP4 224.1.1.110/64
    a=source-filter:incl IN IP4 224.1.1.110 11.11.11.25
    a=rtpmap:96 raw/90000
    a=fmtp:96 sampling=YCbCr-4:2:2; width=1920; height=1080; exactframerate=50; depth=10; TCS=SDR; colorimetry=BT709; PM=2110GPM; SSN=ST2110-20:2017; TP=2110TPN;
    a=mediaclk:direct=0
    a=ts-refclk:localmac=40-a3-6b-a0-2b-d2
    m=audio 5010 RTP/AVP 97
    c=IN IP4 224.1.1.110/64
    a=source-filter:incl IN IP4 224.1.1.110 11.11.11.25
    a=rtpmap:97 L24/48000/2
    a=mediaclk:direct=0 rate=48000
    a=ptime:1
    a=ts-refclk:localmac=40-a3-6b-a0-2b-d2
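As a quick sanity check, the addresses can be extracted back out of the SDP file and compared with the values entered in sdp_modify.sh. The sketch below works on a trimmed, hypothetical copy of the file; on the VM you would point it at sdp_hd_video_audio directly.

```shell
# Sketch: pull the source (o= line) and multicast (c= lines) addresses
# back out of an SDP file. The heredoc is a trimmed hypothetical copy
# of the sdp_hd_video_audio file shown above.
cat <<'EOF' > /tmp/sdp_check
o=- 1443716955 1443716955 IN IP4 11.11.11.25
c=IN IP4 224.1.1.110/64
c=IN IP4 224.1.1.110/64
EOF
SRC=$(awk '/^o=/ { print $NF }' /tmp/sdp_check)          # last field of o= line
MC=$(awk -F'[ /]' '/^c=IN IP4/ { print $3 }' /tmp/sdp_check | sort -u)
echo "source=$SRC multicast=$MC"
```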
  4. On both VMs, issue the following commands to define the VMA memory buffers:

    export VMA_RX_BUFS=2048
    export VMA_TX_BUFS=2048
    export VMA_RX_WRE=1024
    export VMA_TX_WRE=1024
  5. On the first VM (the "transmitter VM"), generate the media stream using the Rivermax media_sender application. The command below runs the application on dedicated VM vCPUs 2 and 3 (which are isolated from the hypervisor).

    The media sender application uses the system time to operate.

    # ./media_sender -c 2 -a 3 -s sdp_hd_video_audio -m
  6. On the second VM (the "receiver VM"), run the AnalyzeX tool to verify compliance. The command below runs the Rivermax AnalyzeX compliance tool on dedicated VM vCPUs 1-3 (which are isolated from the hypervisor).

    # VMA_HW_TS_CONVERSION=2 ANALYZEX_STACK_JITTER=2 LD_PRELOAD=libvma.so taskset -c 1-3 ./analyzex -i ens4 -s sdp_hd_video_audio -p
  7. The following AnalyzeX results indicate full compliance with the SMPTE ST 2110 media standards:



  8. Stop Rivermax media_sender application on VM1 and AnalyzeX tool on VM2.

  9. Log in to VM1 and extract the video file under the /home/Rivermax directory

    # gunzip mellanoxTV_1080p50.ycbcr.gz
  10. Re-run the Rivermax media_sender application on VM1, this time specifying the video file. A lower rate is used to allow the graphical interface to cope with the video playing task:

    # ./media_sender -c 2 -a 3 -s sdp_hd_video_audio -m -f mellanoxTV_1080p50.ycbcr --fps 25
  11. Open a graphical remote session to VM2. In our case, we allocated a public floating IP to VM2 and used the X2Go client to open a remote session:

  12. Open the Terminal and run the Rivermax rx_hello_world_viewer application under the /home/Rivermax directory. Specify the local VLAN IP address of VM2 and the Multicast address of the stream. Once the command is issued, the video will start playing on screen.

    #cd /home/Rivermax
    # ./rx_hello_world_viewer -i 22.22.22.4 -m 224.1.1.110 -p 5000

    The following video demonstrates the procedure:
    Simple_player.mp4



Rivermax Application Testing - Use Case 2

In the following section we use the same Rivermax application VMs that were created on 2 remote media compute nodes to generate a Unicast stream between the VMs over VXLAN overlay network.

After validating the PTP clock is locked, we will start the stream and monitor it with the same tools.

The Unicast stream generated by VM1 will create a VXLAN OVS flow that will be offloaded to the NIC HW.


  1. Make sure the rivermax and analyzex license files are placed on the Rivermax VMs as instructed in Use Case 1.
  2. Make sure the system time on both VMs is updated from PTP as instructed in Use Case 1.

    # tail -f /var/log/messages | grep offset
    phc2sys[2797.730]: phc offset         0 s2 freq  +14570 delay    959
    phc2sys[2798.730]: phc offset       -43 s2 freq  +14527 delay    957
    phc2sys[2799.730]: phc offset        10 s2 freq  +14567 delay    951
  3. On transmitter VM1, run the SDP file modification script to create a Unicast configuration file. Specify the VM1 VXLAN IP address as the source IP and the VM2 VXLAN IP address as the stream destination.

    # ./sdp_modify.sh
    === SDP File Modification Script ===
    Default source IP Address is 11.11.11.10 would you like to change it (Y\N)?y
    Please select source IP Address in format X.X.X.X :33.33.33.12
    Default Video stream multicast IP Address is 224.1.1.20 would you like to change it (Y\N)?y
    Please select Video stream multicast IP Address:33.33.33.16
    Default  Video stream multicast Port is 5000 would you like to change it (Y\N)?n
    Default Audio stream multicast IP Address is 224.1.1.30 would you like to change it (Y\N)?y
    Please select Audio stream multicast IP Address:33.33.33.16
    Default Audio stream multicast Port: is 5010 would you like to change it (Y\N)?n
    Your SDP file is ready with the following parameters:
    IP_ADDR 33.33.33.12
    MC_VIDEO_IP 33.33.33.16
    MC_VIDEO_PORT 5000
    MC_AUDIO_IP 33.33.33.16
    MC_AUDIO_PORT 5010
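Since the script's "multicast" fields are reused here for a unicast destination, it is worth confirming the addresses entered are not in the 224.0.0.0/4 multicast range. A minimal sketch, using the example addresses from the run above:

```shell
# Sketch: classify an IPv4 address as multicast or unicast by checking
# whether its first octet falls in 224-239 (the 224.0.0.0/4 range).
# The addresses are the example values from the sdp_modify.sh run above.
classify() {
    first=${1%%.*}                       # first octet of the address
    if [ "$first" -ge 224 ] && [ "$first" -le 239 ]; then
        echo "$1 multicast"
    else
        echo "$1 unicast"
    fi
}
classify 33.33.33.16     # VXLAN stream destination -> unicast
classify 224.1.1.110     # use case 1 group         -> multicast
```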
  4. On VM1, generate the media stream using the Rivermax media_sender application, with the unicast SDP file you created in the previous step

    # ./media_sender -c 2 -a 3 -s sdp_hd_video_audio -m
  5. On receiver VM2, run the Rivermax rx_hello_world application with the local VXLAN interface IP

    Note

    Make sure you use rx_hello_world tool and not rx_hello_world_viewer.

    # ./rx_hello_world -i 33.33.33.16 -m 33.33.33.16 -p 5000
  6. On the compute nodes, verify the flows are offloaded to the HW

    1. On compute node 1, which hosts transmitter VM1, the offloaded flow includes the traffic coming from the VM over the Representor interface and going into the VXLAN tunnel:

      [root@overcloud-computesriov1-0 heat-admin]# ovs-dpctl dump-flows type=offloaded --name
      
       
      in_port(eth4),eth(src=fa:16:3e:94:a4:5d,dst=fa:16:3e:fc:59:f3),eth_type(0x0800),ipv4(tos=0/0x3,frag=no), packets:54527279, bytes:71539619808, used:0.330s, actions:set(tunnel(tun_id=0x8,src=172.19.0.100,dst=172.19.2.105,tp_dst=4789,flags(key))),vxlan_sys_4789
    2. On compute node 2, which hosts receiver VM2, the offloaded flow includes the traffic coming over the VXLAN tunnel and going into the VM over the Representor interface:

      [root@overcloud-computesriov2-0 ~]# ovs-dpctl dump-flows type=offloaded --name
       
      tunnel(tun_id=0x8,src=172.19.0.100,dst=172.19.2.105,tp_dst=4789,flags(+key)),in_port(vxlan_sys_4789),eth(src=fa:16:3e:94:a4:5d,dst=fa:16:3e:fc:59:f3),eth_type(0x0800),ipv4(frag=no), packets:75722169, bytes:95561342656, used:0.420s, actions:eth5
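To confirm traffic is actually taking the offloaded path, the flow counters can be sampled over time; rising packet and byte counts between two samples indicate the HW path is carrying the stream. The sketch below extracts the counters from a trimmed, hypothetical copy of the ovs-dpctl line above.

```shell
# Sketch: extract the packet/byte counters from an offloaded flow dump
# so two samples taken a few seconds apart can be compared. The heredoc
# is a trimmed hypothetical copy of the ovs-dpctl output shown above;
# on a compute node you would pipe "ovs-dpctl dump-flows type=offloaded"
# in directly.
cat <<'EOF' > /tmp/flows.txt
in_port(vxlan_sys_4789),eth_type(0x0800), packets:75722169, bytes:95561342656, used:0.420s, actions:eth5
EOF
sed -n 's/.*packets:\([0-9]*\), bytes:\([0-9]*\),.*/packets=\1 bytes=\2/p' /tmp/flows.txt
```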

Authors

Itai Levy

Over the past few years, Itai Levy has worked as a Solutions Architect and member of the NVIDIA Networking “Solutions Labs” team. Itai designs and executes cutting-edge solutions around Cloud Computing, SDN, SDS and Security. His main areas of expertise include NVIDIA BlueField Data Processing Unit (DPU) solutions and accelerated OpenStack/K8s platforms.




Notice

This document is provided for information purposes only and shall not be regarded as a warranty of a certain functionality, condition, or quality of a product. Neither NVIDIA Corporation nor any of its direct or indirect subsidiaries and affiliates (collectively: “NVIDIA”) make any representations or warranties, expressed or implied, as to the accuracy or completeness of the information contained in this document and assumes no responsibility for any errors contained herein. NVIDIA shall have no liability for the consequences or use of such information or for any infringement of patents or other rights of third parties that may result from its use. This document is not a commitment to develop, release, or deliver any Material (defined below), code, or functionality.
NVIDIA reserves the right to make corrections, modifications, enhancements, improvements, and any other changes to this document, at any time without notice.
Customer should obtain the latest relevant information before placing orders and should verify that such information is current and complete.
NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the time of order acknowledgement, unless otherwise agreed in an individual sales agreement signed by authorized representatives of NVIDIA and customer (“Terms of Sale”). NVIDIA hereby expressly objects to applying any customer general terms and conditions with regards to the purchase of the NVIDIA product referenced in this document. No contractual obligations are formed either directly or indirectly by this document.
NVIDIA products are not designed, authorized, or warranted to be suitable for use in medical, military, aircraft, space, or life support equipment, nor in applications where failure or malfunction of the NVIDIA product can reasonably be expected to result in personal injury, death, or property or environmental damage. NVIDIA accepts no liability for inclusion and/or use of NVIDIA products in such equipment or applications and therefore such inclusion and/or use is at customer’s own risk.
NVIDIA makes no representation or warranty that products based on this document will be suitable for any specified use. Testing of all parameters of each product is not necessarily performed by NVIDIA. It is customer’s sole responsibility to evaluate and determine the applicability of any information contained in this document, ensure the product is suitable and fit for the application planned by customer, and perform the necessary testing for the application in order to avoid a default of the application or the product. Weaknesses in customer’s product designs may affect the quality and reliability of the NVIDIA product and may result in additional or different conditions and/or requirements beyond those contained in this document. NVIDIA accepts no liability related to any default, damage, costs, or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this document or (ii) customer product designs.
No license, either expressed or implied, is granted under any NVIDIA patent right, copyright, or other NVIDIA intellectual property right under this document. Information published by NVIDIA regarding third-party products or services does not constitute a license from NVIDIA to use such products or services or a warranty or endorsement thereof. Use of such information may require a license from a third party under the patents or other intellectual property rights of the third party, or a license from NVIDIA under the patents or other intellectual property rights of NVIDIA.
Reproduction of information in this document is permissible only if approved in advance by NVIDIA in writing, reproduced without alteration and in full compliance with all applicable export laws and regulations, and accompanied by all associated conditions, limitations, and notices.
THIS DOCUMENT AND ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, “MATERIALS”) ARE BEING PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL NVIDIA BE LIABLE FOR ANY DAMAGES, INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING OUT OF ANY USE OF THIS DOCUMENT, EVEN IF NVIDIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Notwithstanding any damages that customer might incur for any reason whatsoever, NVIDIA’s aggregate and cumulative liability towards customer for the products described herein shall be limited in accordance with the Terms of Sale for the product.

Trademarks
NVIDIA, the NVIDIA logo, and Mellanox are trademarks and/or registered trademarks of NVIDIA Corporation and/or Mellanox Technologies Ltd. in the U.S. and in other countries. Other company and product names may be trademarks of the respective companies with which they are associated.

Copyright
© 2022 NVIDIA Corporation & affiliates. All Rights Reserved.