

Created on Sep 12, 2019 

Introduction

More and more media and entertainment (M&E) solution providers are moving their proprietary legacy video solutions to next-generation IP-based infrastructures to meet the increasing global demand for ultra-high-definition video content. Broadcast providers are looking into cloud-based solutions to offer better scalability and flexibility, which introduces new challenges such as multi-tenant high-quality streaming at scale and time synchronization in the cloud.

Red Hat OpenStack Platform (OSP) is a cloud computing platform that enables the creation, deployment, scale, and management of a secure and reliable public or private OpenStack-based cloud. This production-ready platform offers a tight integration with NVIDIA products and technologies and is used in this guide to demonstrate a full deployment of the "NVIDIA Media Cloud".

NVIDIA Media Cloud is a solution that includes the NVIDIA Rivermax library for Packet Pacing, Kernel Bypass and Packet Aggregation, along with cloud time synchronization and OVS HW offload to the NVIDIA SmartNIC using the Accelerated Switching and Packet Processing (ASAP²) framework.
By the end of this guide you will be able to run offloaded HD media streams between VMs in different racks and validate that they comply with the SMPTE 2110 standards, while using commodity switches, servers and NICs.

Related videos:

  • http://youtube.com/watch?v=LuKuW5PAvwU
  • https://www.youtube.com/watch?v=m6MIcdw5e5I
  • https://www.youtube.com/watch?v=S0ebcHnZwk0


The following reference deployment guide (RDG) demonstrates a complete deployment of the Red Hat OpenStack Platform 13 for media streaming applications with NVIDIA SmartNIC hardware offload capabilities.

We'll explain the setup components, scale considerations and other aspects such as the hardware BoM (Bill of Materials) and time synchronization, as well as streaming application compliance testing in the cloud.

Before we start, it's highly recommended to become familiar with the key technology features of this deployment guide:

  • Rivermax video streaming library
  • ASAP² - Accelerated Switching and Packet Processing

Visit the product pages in the links below to learn more about these features and their capabilities:

NVIDIA Rivermax Video Streaming Library 

NVIDIA Accelerated Switching and Packet Processing 


Downloadable Content

All configuration files are located here: 


References



NVIDIA Components

  • NVIDIA Rivermax implements an optimized software library API for media streaming applications. It runs on NVIDIA ConnectX®-5 or higher network adapters, enabling the use of common off-the-shelf (COTS) servers for HD to Ultra HD flows. The combination of Rivermax and ConnectX®-5 adapter cards also complies with the SMPTE 2110-21 standards, reduces CPU utilization for video data streaming, and removes bottlenecks for the highest throughput.
  • NVIDIA Accelerated Switching and Packet Processing (ASAP²) is a framework that enables offloading of network data planes, such as Open vSwitch (OVS), into SmartNIC HW. The offload enables a performance boost of up to 10x with complete CPU load reduction. ASAP² is available on ConnectX-4 Lx and later.
  • NVIDIA Spectrum Switch family provides the most efficient network solutions for the ever-increasing performance demands of data center applications.
  • NVIDIA ConnectX Network Adapter family delivers industry-leading connectivity for performance-driven server and storage applications. ConnectX adapter cards enable high bandwidth, coupled with ultra-low latency for diverse applications and systems, resulting in faster access and real-time responses.
  • NVIDIA LinkX Cables and Transceivers family provides the industry’s most complete line of 10, 25, 40, 50, 100, 200, and 400Gb interconnect products for Cloud, Web 2.0, Enterprise, telco, and storage data center applications. They are often used to link top-of-rack switches downwards to servers, storage and appliances, and upwards in switch-to-switch applications.

Solution Setup Overview 

Below is a list of all the different components in this solution and how they are utilized:

Cloud Platform

The RH-OSP13 cloud will be deployed in large scale and utilized as the cloud platform.

Compute Nodes

The compute nodes will be configured and deployed as “Media Compute Nodes”, adjusted for low latency virtual media applications. Each Compute/Controller node is equipped with a dual-port 100Gb/s NIC, of which one port is dedicated to the VXLAN tenant network and the other to the VLAN Multicast tenant network, Storage, Control, and PTP time synchronization.

Packet Pacing is enabled on the NIC ports specifically to allow MC (MultiCast) Pacing on the VLAN Tenant Network.

Network

The different network components used in this user guide are configured in the following way:

  • Multiple racks interconnected via Spine/Leaf network architecture
  • Composable routed provider networks are used per rack
  • Compute nodes on different provider network segments will host local DHCP agent instances per OpenStack subnet segment
  • L3 OSPF underlay is used to route between the provider routed networks (another fabric-wide IGP can be used as desired)
  • Multicast
    • VLAN tenant network is used for Multicast media traffic and will utilize SR-IOV on the VM 
    • IP-PIM (Sparse Mode) is used for routing the tenant Multicast streams between the racks which are located in routed provider networks
    • IGMP snooping is used to manage the tenant multicast groups in the same L2 racks domain
  • Unicast  
    • ASAP²-enabled Compute nodes are located in different racks and maintain VXLAN tunnels as overlay for tenant VM traffic
    • The VXLAN tenant network is used for Unicast media traffic and will utilize ASAP² to offload the CPU-intensive VXLAN traffic, in order to avoid the encapsulation/decapsulation performance penalty and achieve optimum throughput
  • OpenStack Neutron is used as an SDN controller. All network configuration for every OpenStack node will be done via OpenStack orchestration
  • RHOSP inbox drivers are used on all infrastructure components except for VM guests

Time Synchronization

Time Synchronization will be configured in the following way:

  • linuxptp tools are used on compute nodes and application VMs
  • PTP traffic is untagged on the compute nodes
  • Onyx Switches propagate the time between the compute nodes and act as PTP Boundary Clock devices
  • One of the switches is used as PTP master clock (in real-life deployments a dedicated grand master should be used)
  • KVM virtual PTP driver is used by the VMs to pull the PTP time from their hosting hypervisor which is synced to the PTP clock source

Media Application

NVIDIA provides a Rivermax VM cloud image which includes all Rivermax tools and applications.

The Rivermax VM provides a demonstration of the media test application and allows the user to validate compliance with the relevant media standards, i.e. SMPTE 2110 (an evaluation license is required).



Solution Components   


Image Added




Solution General Design


Solution Multicast Design


Cloud Media Application Design 


VXLAN HW Offload Overview



Large Scale Overview


HW Configuration

Bill of Materials (BoM)

Image Added

Note:
  • The BoM above refers to the maximal configuration in a large scale solution, with a blocking ratio of 3:1.
  • It is possible to change the blocking ratio in order to obtain a different capacity.
  • The SN2100 and SN2700 switches share the same feature set and can be used in this solution according to the compute and/or network capacity required.
  • The 2-Rack BoM will be used in the solution example described below.
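
As a rough illustration of how a blocking ratio relates to port allocation, the sketch below divides the server-facing bandwidth of a Leaf switch by its Spine-facing bandwidth. The port counts used here are hypothetical and are not taken from the BoM; the function name is also an illustrative choice.

```python
from fractions import Fraction


def blocking_ratio(downlink_ports: int, uplink_ports: int,
                   downlink_gbps: int = 100, uplink_gbps: int = 100) -> Fraction:
    """Return server-facing bandwidth divided by Spine-facing bandwidth."""
    if downlink_ports <= 0 or uplink_ports <= 0:
        raise ValueError("port counts must be positive")
    return Fraction(downlink_ports * downlink_gbps, uplink_ports * uplink_gbps)


# Hypothetical Leaf: 12 x 100Gb/s ports to servers, 4 x 100Gb/s ports to Spines.
ratio = blocking_ratio(12, 4)
print(f"{ratio.numerator}:{ratio.denominator}")  # prints 3:1
```

Moving ports from the server side to the Spine side lowers the ratio, trading compute capacity for fabric bandwidth.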


Solution Example

We chose the key features below as a baseline to demonstrate the solution used in this RDG.

Note: The solution example below does not contain redundancy configuration.

Solution Scale

  • 2 x racks with a dedicated provider network set per rack
  • 1 x SN2700 switch as Spine switch
  • 2 x SN2100 switches as Leaf switches, 1 per rack
  • 5 nodes in rack 1 (3 x Controller, 2 x Compute)
  • 2 nodes in rack 2 (2 x Compute)
  • All nodes are connected to the Leaf switches using 2 x 100Gb/s ports per node
  • Leaf switches are connected to each Spine switch using a single 100Gb/s port


Rack Diagram

In this RDG we placed all the equipment into the same rack, but the wiring and configuration simulate a two-rack network setup.

  Image Added


PTP Diagram

Image Added


Note: One of the Onyx Leaf switches is used as the PTP GrandMaster clock source instead of a dedicated device.

Solution Networking

Network Diagram


Image Added

Note: Compute nodes access the External Network/Internet through the undercloud node, which functions as a router.


Network Physical Configuration

Important: The configuration steps below refer to a solution example based on 2 racks.

Network Configuration Steps

Physical Configuration

Below is a detailed step-by-step description of the network configuration:


  1. Connect the switches to the switch management network
  2. Interconnect the switches using 100Gb/s cables
     Image Added
  3. Connect the Controller/Compute servers to the relevant networks according to the following diagrams:
     Image Added Image Added
  4. Connect the Undercloud Director server to the IPMI, PXE and External networks.

Switch Profile Configuration

The MC Max profile must be set on all switches. This will remove all existing configurations and will require a reboot.

Warning: Back up your switch configuration if you plan to use it later.

Run the following commands on all switches:

Code Block
system profile eth-ipv4-mc-max 
show system profile

Switch Interface Configuration

Set the VLANs and VLAN interfaces on the Leaf switches according to the following table and diagrams:

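+----------------+-------------+----------------------+-----------------+---------------------+---------+----------------------+-----------------+-------------+
| Network Name   | Network Set | Leaf Switch Location | Network Details | Switch Interface IP | VLAN ID | Switch Physical Port | Switchport Mode | Note        |
+----------------+-------------+----------------------+-----------------+---------------------+---------+----------------------+-----------------+-------------+
| Storage        | 1           | Rack 1               | 172.16.0.0/24   | 172.16.0.1          | 11      | A                    | hybrid          |             |
| Storage_Mgmt   | 1           | Rack 1               | 172.17.0.0/24   | 172.17.0.1          | 21      | A                    | hybrid          |             |
| Internal API   | 1           | Rack 1               | 172.18.0.0/24   | 172.18.0.1          | 31      | A                    | hybrid          |             |
| PTP            | 1           | Rack 1               | 172.20.0.0/24   | 172.20.0.1          | 51      | A                    | hybrid          | access vlan |
| MC_Tenant_VLAN | 1           | Rack 1               | 11.11.11.0/24   | 11.11.11.1          | 101     | A                    | hybrid          |             |
| Tenant_VXLAN   | 1           | Rack 1               | 172.19.0.0/24   | 172.19.0.1          | 41      | B                    | access          |             |
| Storage_2      | 2           | Rack 2               | 172.16.2.0/24   | 172.16.2.1          | 12      | A                    | hybrid          |             |
| Storage_Mgmt_2 | 2           | Rack 2               | 172.17.2.0/24   | 172.17.2.1          | 22      | A                    | hybrid          |             |
| Internal API_2 | 2           | Rack 2               | 172.18.2.0/24   | 172.18.2.1          | 32      | A                    | hybrid          |             |
| PTP_2          | 2           | Rack 2               | 172.20.2.0/24   | 172.20.2.1          | 52      | A                    | hybrid          | access vlan |
| MC_Tenant_VLAN | 2           | Rack 2               | 22.22.22.0/24   | 22.22.22.1          | 101     | A                    | hybrid          |             |
| Tenant_VXLAN_2 | 2           | Rack 2               | 172.19.2.0/24   | 172.19.2.1          | 42      | B                    | access          |             |
+----------------+-------------+----------------------+-----------------+---------------------+---------+----------------------+-----------------+-------------+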
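
Before applying the addressing plan to the switches, it can help to sanity-check it. The sketch below is one possible way to do so with Python's standard ipaddress module: it confirms that every switch interface IP falls inside its subnet and that no two subnets overlap. The values are copied from the table; the function names and the "(Rack N)" suffixes are illustrative assumptions.

```python
import ipaddress

# (network name, subnet, switch interface IP, VLAN ID), copied from the table.
LEAF_NETWORKS = [
    ("Storage", "172.16.0.0/24", "172.16.0.1", 11),
    ("Storage_Mgmt", "172.17.0.0/24", "172.17.0.1", 21),
    ("Internal API", "172.18.0.0/24", "172.18.0.1", 31),
    ("PTP", "172.20.0.0/24", "172.20.0.1", 51),
    ("MC_Tenant_VLAN (Rack 1)", "11.11.11.0/24", "11.11.11.1", 101),
    ("Tenant_VXLAN", "172.19.0.0/24", "172.19.0.1", 41),
    ("Storage_2", "172.16.2.0/24", "172.16.2.1", 12),
    ("Storage_Mgmt_2", "172.17.2.0/24", "172.17.2.1", 22),
    ("Internal API_2", "172.18.2.0/24", "172.18.2.1", 32),
    ("PTP_2", "172.20.2.0/24", "172.20.2.1", 52),
    ("MC_Tenant_VLAN (Rack 2)", "22.22.22.0/24", "22.22.22.1", 101),
    ("Tenant_VXLAN_2", "172.19.2.0/24", "172.19.2.1", 42),
]


def misplaced_gateways(rows):
    """Return names of networks whose switch interface IP is outside the subnet."""
    return [name for name, subnet, gateway, _vlan in rows
            if ipaddress.ip_address(gateway) not in ipaddress.ip_network(subnet)]


def overlapping_subnets(rows):
    """Return pairs of network names whose subnets overlap."""
    nets = [(name, ipaddress.ip_network(subnet)) for name, subnet, _gw, _vlan in rows]
    return [(a, b) for i, (a, net_a) in enumerate(nets)
            for b, net_b in nets[i + 1:] if net_a.overlaps(net_b)]


print(misplaced_gateways(LEAF_NETWORKS))   # prints []
print(overlapping_subnets(LEAF_NETWORKS))  # prints []
```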

Rack 1 Leaf switch VLAN Diagram

Image Added

Rack 2 Leaf switch VLAN Diagram

Image Added

Switch Full Configuration

Note:
  • Onyx version 3.8.1204 or later is required
  • Switch SW-09 is used as the Spine switch, while SW-10 and SW-11 are used as Leaf switches
  • Leaf SW-11 is configured with a PTP grandmaster role
  • Port 1/9 in all Leaf switches should face the Spine switch. The rest of the ports should face the Compute/Controller nodes.
  • IGMP immediate/fast-leave switch configurations should be removed in case multiple virtual receivers are used on a Compute node

SW-09 (Spine) configuration:

Code Block
##
## STP configuration
##
no spanning-tree

##   
## L3 configuration
##
ip routing
interface ethernet 1/1-1/2 no switchport force
interface ethernet 1/1 ip address 192.168.119.9/24 primary
interface ethernet 1/2 ip address 192.168.109.9/24 primary
interface loopback 0 ip address 1.1.1.9/32 primary
 
##
## LLDP configuration
##
   lldp
   
##
## OSPF configuration
##
protocol ospf
router ospf router-id 1.1.1.9
   interface ethernet 1/1 ip ospf area 0.0.0.0
   interface ethernet 1/2 ip ospf area 0.0.0.0
   interface ethernet 1/1 ip ospf network broadcast
   interface ethernet 1/2 ip ospf network broadcast
   router ospf redistribute direct   
  
##
## IP Multicast router configuration
##
   ip multicast-routing 
   
##
## PIM configuration
##
   protocol pim
   interface ethernet 1/1 ip pim sparse-mode
   interface ethernet 1/2 ip pim sparse-mode
   ip pim multipath next-hop s-g-hash
   ip pim rp-address 1.1.1.9

##
## IGMP configuration
##
   interface ethernet 1/1 ip igmp immediate-leave
   interface ethernet 1/2 ip igmp immediate-leave
 
##
## Network management configuration
##
   ntp disable
   
##
## PTP protocol
##
   protocol ptp
   ptp vrf default enable
   interface ethernet 1/1 ptp enable
   interface ethernet 1/2 ptp enable

SW-11 (Leaf Rack 1) configuration:

Code Block
##
## STP configuration
##
no spanning-tree


##
## LLDP configuration
##
   lldp
    
##
## VLAN configuration
##

vlan 11 
name storage
exit
vlan 21
name storage_mgmt
exit
vlan 31 
name internal_api
exit
vlan 41 
name tenant_vxlan
exit
vlan 51 
name ptp
exit
vlan 101 
name tenant_vlan_mc
exit

interface ethernet 1/1-1/5 switchport access vlan 41
interface ethernet 1/11-1/15 switchport mode hybrid 
interface ethernet 1/11-1/15 switchport access vlan 51
interface ethernet 1/11-1/15 switchport hybrid allowed-vlan add 11
interface ethernet 1/11-1/15 switchport hybrid allowed-vlan add 21
interface ethernet 1/11-1/15 switchport hybrid allowed-vlan add 31
interface ethernet 1/11-1/15 switchport hybrid allowed-vlan add 101

##
## IGMP Snooping configuration
##
   ip igmp snooping unregistered multicast forward-to-mrouter-ports
   ip igmp snooping
   vlan 51 ip igmp snooping
  vlan 101 ip igmp snooping
   vlan 51 ip igmp snooping querier
   vlan 101 ip igmp snooping querier
   interface ethernet 1/11-1/15 ip igmp snooping fast-leave

   
##   
## L3 configuration
##
ip routing
interface ethernet 1/9 no switchport force
interface ethernet 1/9 ip address 192.168.119.11/24 primary

interface vlan 11 ip address 172.16.0.1 255.255.255.0
interface vlan 21 ip address 172.17.0.1 255.255.255.0
interface vlan 31 ip address 172.18.0.1 255.255.255.0
interface vlan 41 ip address 172.19.0.1 255.255.255.0
interface vlan 51 ip address 172.20.0.1 255.255.255.0
interface vlan 101 ip address 11.11.11.1 255.255.255.0
  
  
##
## OSPF configuration
##
protocol ospf
router ospf router-id 1.1.1.11
interface ethernet 1/9 ip ospf area 0.0.0.0
interface ethernet 1/9 ip ospf network broadcast
router ospf redistribute direct  
  
   
##
## IP Multicast router configuration
##
   ip multicast-routing 
   
##
## PIM configuration
##
   protocol pim
   interface ethernet 1/9 ip pim sparse-mode
   ip pim multipath next-hop s-g-hash
   interface vlan 101 ip pim sparse-mode
   ip pim rp-address 1.1.1.9

##
## IGMP configuration
##
   interface ethernet 1/9 ip igmp immediate-leave
   interface vlan 101 ip igmp immediate-leave
   
##
## Network management configuration
##
   ntp disable
   
##
## PTP protocol
##
   protocol ptp
   ptp vrf default enable
   ptp priority1 1
   interface ethernet 1/9 ptp enable
   interface ethernet 1/11-1/15 ptp enable
   interface vlan 51 ptp enable 

SW-10 (Leaf Rack 2) configuration:

Code Block
## SW-10 Leaf Rack 2

##
## STP configuration
##
no spanning-tree


##
## LLDP configuration
##
   lldp
    
##
## VLAN configuration
##

vlan 12
name storage
exit
vlan 22
name storage_mgmt
exit
vlan 32 
name internal_api
exit
vlan 42 
name tenant_vxlan
exit
vlan 52 
name ptp
exit
vlan 101 
name tenant_vlan_mc
exit

interface ethernet 1/1-1/2 switchport access vlan 42
interface ethernet 1/11-1/12 switchport mode hybrid 
interface ethernet 1/11-1/12 switchport access vlan 52
interface ethernet 1/11-1/12 switchport hybrid allowed-vlan add 12
interface ethernet 1/11-1/12 switchport hybrid allowed-vlan add 22
interface ethernet 1/11-1/12 switchport hybrid allowed-vlan add 32
interface ethernet 1/11-1/12 switchport hybrid allowed-vlan add 101

##
## IGMP Snooping configuration
##
   ip igmp snooping unregistered multicast forward-to-mrouter-ports
   ip igmp snooping
   vlan 52 ip igmp snooping
   vlan 52 ip igmp snooping querier
   vlan 101 ip igmp snooping
   vlan 101 ip igmp snooping querier
   interface ethernet 1/11-1/12 ip igmp snooping fast-leave

   
##   
## L3 configuration
##
ip routing
interface ethernet 1/9 no switchport force
interface ethernet 1/9 ip address 192.168.109.10/24 primary

interface vlan 12 ip address 172.16.2.1 255.255.255.0
interface vlan 22 ip address 172.17.2.1 255.255.255.0
interface vlan 32 ip address 172.18.2.1 255.255.255.0
interface vlan 42 ip address 172.19.2.1 255.255.255.0
interface vlan 52 ip address 172.20.2.1 255.255.255.0
interface vlan 101 ip address 22.22.22.1 255.255.255.0
  
  
##
## OSPF configuration
##
protocol ospf
router ospf router-id 2.2.2.10
interface ethernet 1/9 ip ospf area 0.0.0.0
interface ethernet 1/9 ip ospf network broadcast
router ospf redistribute direct  
  
   
##
## IP Multicast router configuration
##
   ip multicast-routing 
   
##
## PIM configuration
##
   protocol pim
   interface ethernet 1/9 ip pim sparse-mode
   ip pim multipath next-hop s-g-hash
   interface vlan 101 ip pim sparse-mode
   ip pim rp-address 1.1.1.9

##
## IGMP configuration
##
   interface ethernet 1/9 ip igmp immediate-leave
   interface vlan 101 ip igmp immediate-leave
   
##
## Network management configuration
##
   ntp disable
   
##
## PTP protocol
##
   protocol ptp
   ptp vrf default enable
   interface ethernet 1/9 ptp enable
   interface ethernet 1/11-1/12 ptp enable
   interface vlan 52 ptp enable 
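
The two Leaf configurations above differ only in VLAN IDs, port ranges and addresses, so the VLAN and switchport lines lend themselves to templating. The following is a minimal sketch, assuming the same command syntax as shown above; the function and variable names are hypothetical, and the output should be reviewed before being pasted into a switch.

```python
def leaf_vlan_config(vlans, access_ports, hybrid_ports, access_vlan, ptp_vlan):
    """Build the VLAN and switchport lines of a Leaf switch configuration."""
    lines = []
    for vlan_id, name in vlans:
        lines += [f"vlan {vlan_id}", f"name {name}", "exit"]
    # First NIC port of each node: access port on the VXLAN tenant VLAN.
    lines.append(f"interface ethernet {access_ports} switchport access vlan {access_vlan}")
    # Second NIC port of each node: hybrid port, untagged PTP plus tagged VLANs.
    lines.append(f"interface ethernet {hybrid_ports} switchport mode hybrid")
    lines.append(f"interface ethernet {hybrid_ports} switchport access vlan {ptp_vlan}")
    for vlan_id, _name in vlans:
        if vlan_id not in (access_vlan, ptp_vlan):
            lines.append(
                f"interface ethernet {hybrid_ports} switchport hybrid allowed-vlan add {vlan_id}")
    return lines


# Values taken from the Rack 2 Leaf (SW-10) configuration above.
RACK2_VLANS = [(12, "storage"), (22, "storage_mgmt"), (32, "internal_api"),
               (42, "tenant_vxlan"), (52, "ptp"), (101, "tenant_vlan_mc")]
rack2_config = leaf_vlan_config(RACK2_VLANS, "1/1-1/2", "1/11-1/12",
                                access_vlan=42, ptp_vlan=52)
print("\n".join(rack2_config))
```

Keeping the per-rack values in one place makes it easier to add a rack without copy-and-paste errors in the VLAN IDs.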



Solution Configuration and Deployment

Steps

The following information will take you through the configuration and deployment steps of the solution.

Prerequisites

Make sure that the hardware specifications are identical for servers with the same role (Compute/Controller/etc.).

Server Preparation - BIOS

Make sure that for all servers:

  1. The network boot is set on the interface connected to the PXE network
  2. Virtualization and SRIOV are enabled


Make sure that for Compute servers:

  1. The network boot is set on the interface connected to the PXE network
  2. Virtualization and SRIOV are enabled
  3. Power Profile is at "Maximum Performance"
  4. HyperThreading is disabled
  5. C-state is disabled
  6. Turbo Mode is disabled
  7. Collaborative Power Control is disabled
  8. Processor Power and Utilization Monitoring (Ctrl+A) is disabled

NIC Preparation 

SRIOV configuration is disabled by default on ConnectX-5 NICs and must be enabled for every NIC used by a Compute node.

To enable and configure SRIOV, insert the Compute NIC into a test server with an installed OS, and follow the steps below:

  1. Run the following to verify that the firmware version is 16.21.2030 or later:

    Code Block
    [root@host ~]# ethtool -i ens2f0
    driver: mlx5_core
    version: 5.0-0
    firmware-version: 16.22.1002 (MT_0000000009)
    expansion-rom-version:
    bus-info: 0000:07:00.0
    supports-statistics: yes
    supports-test: yes
    supports-eeprom-access: no
    supports-register-dump: no
    supports-priv-flags: yes

    If the firmware version is older, download and burn the new firmware as described in How to Install Mellanox OFED on Linux (Rev 4.4-2.0.7.0).


  2. Install the mstflint package:

    Code Block
    [root@host ~]# yum install mstflint
    
  3. Identify the PCI ID of the first 100G port and enable SRIOV:

    Code Block
    [root@host ~]# lspci | grep -i mel
    07:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
    07:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
    [root@host ~]#
    [root@host ~]# mstconfig -d 0000:07:00.0 query | grep -i sriov
    SRIOV_EN False(0)
    SRIOV_IB_ROUTING_MODE_P1 GID(0)
    SRIOV_IB_ROUTING_MODE_P2 GID(0)
    [root@host ~]# mstconfig -d 0000:07:00.0 set SRIOV_EN=1
    Device #1:
    ----------
    
    Device type: ConnectX5
    PCI device: 0000:07:00.0
    
    Configurations: Next Boot New
    SRIOV_EN False(0) True(1)
    
    Apply new Configuration? ? (y/n) [n] : y
    Applying... Done!
    -I- Please reboot machine to load new configurations.
  4. Set the number of VFs to a high value, such as 64, and reboot the server to apply the new configuration:

    Code Block
    [root@host ~]# mstconfig -d 0000:07:00.0 query | grep -i vfs
    NUM_OF_VFS 0
    [root@host ~]# mstconfig -d 0000:07:00.0 set NUM_OF_VFS=64
    
    Device #1:
    ----------
    
    Device type: ConnectX5
    PCI device: 0000:07:00.0
    
    Configurations: Next Boot New
    NUM_OF_VFS 0 64
    
    Apply new Configuration? ? (y/n) [n] : y
    Applying... Done!
    -I- Please reboot machine to load new configurations.
    [root@host ~]# reboot
  5. Confirm the new settings were applied using the mstconfig query commands shown above.
  6. Insert the NIC back to the Compute node.
  7. Repeat the procedure above for every Compute node NIC used in our setup.
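
When many NICs have to be prepared, the manual checks in the procedure above can be scripted. The sketch below parses the text printed by `ethtool -i` and `mstconfig query`, as captured in the steps above, and compares it with the required values. It only works on captured output; the function names are hypothetical, and the post-reboot `mstconfig` sample is an assumed format modeled on the pre-change output shown earlier.

```python
import re

MIN_FIRMWARE = (16, 21, 2030)  # minimum firmware version required by this guide


def firmware_version(ethtool_output):
    """Extract the firmware version as a tuple from `ethtool -i` output."""
    match = re.search(r"^\s*firmware-version:\s*(\d+)\.(\d+)\.(\d+)",
                      ethtool_output, re.MULTILINE)
    if match is None:
        raise ValueError("firmware-version line not found")
    return tuple(int(part) for part in match.groups())


def firmware_is_supported(ethtool_output, minimum=MIN_FIRMWARE):
    """Return True if the reported firmware is at least the minimum version."""
    return firmware_version(ethtool_output) >= minimum


def sriov_settings(mstconfig_output):
    """Parse SRIOV_EN and NUM_OF_VFS from `mstconfig query` output."""
    settings = {}
    for line in mstconfig_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "SRIOV_EN":
            settings["SRIOV_EN"] = fields[1].startswith("True")
        elif len(fields) >= 2 and fields[0] == "NUM_OF_VFS":
            settings["NUM_OF_VFS"] = int(fields[1])
    return settings


ETHTOOL_SAMPLE = """driver: mlx5_core
version: 5.0-0
firmware-version: 16.22.1002 (MT_0000000009)
"""
# Assumed output after the reboot, following the format shown before the change.
MSTCONFIG_SAMPLE = """SRIOV_EN True(1)
NUM_OF_VFS 64
"""

print(firmware_is_supported(ETHTOOL_SAMPLE))  # prints True
print(sriov_settings(MSTCONFIG_SAMPLE))
```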

 


Note:
  • In our solution, the first port of the two 100G ports in every NIC is used for the ASAP² accelerated data plane. This is the reason we enable SR-IOV only on the first ConnectX NIC PCI device (07:00.0 in the example above).
  • There are future plans to support an automated procedure to update and configure the NICs on the Compute nodes from the Undercloud.

Accelerated RH-OSP Installation and Deployment

Steps

The following steps will take you through the accelerated RH-OSP installation and deployment procedure:

  1. Install Red Hat 7.6 OS on the Undercloud server and set an IP to its interface which is connected to the external network; make sure it has internet connectivity.
  2. Install the Undercloud and the director as instructed in section 4 of the Red Hat OSP Director Installation and Usage guide (Red Hat Customer Portal). Our undercloud.conf file is attached as a reference here: Configuration Files
  3. Configure a container image source as instructed in section 5 of the above guide. Our solution uses the undercloud as a local registry.

    Note: The following overcloud image versions are used in our deployment:

    rhosp-director-images-13.0-20190418.1.el7ost.noarch
    rhosp-director-images-ipa-13.0-20190418.1.el7ost.noarch
    rhosp-director-images-ipa-x86_64-13.0-20190418.1.el7ost.noarch
    rhosp-director-images-x86_64-13.0-20190418.1.el7ost.noarch

    The overcloud image is RH 7.6 with kernel 3.10.0-957.10.1.el7.x86_64

  4. Register the nodes of the overcloud as instructed in section 6.1. Our instackenv.json file is attached as a reference.
  5. Inspect the hardware of the nodes as instructed in section 6.2.
    Once introspection is completed, it is recommended to confirm for each node that the desired root disk was detected, since cloud deployment can fail later because of insufficient disk space. Use the following command to check the size of the disk selected as root:

    Code Block
    (undercloud) [stack@rhosp-director ~]$ openstack baremetal node show 92c4c1cb-ce7d-48d4-a2d9-75b2651db097 | grep properties
    | properties | {u'memory_mb': u'131072', u'cpu_arch': u'x86_64', u'local_gb': u'418', u'cpus': u'24', u'capabilities': u'boot_option:local'}

    The “local_gb” value represents the disk size. If the disk size is lower than expected, use the procedure described in section 6.6 to define the root disk for the node. Note that an additional introspection cycle is required for this node after the root disk is changed.

  6. Verify that all nodes were registered properly and changed their state to “available” before proceeding to the next step:

    Code Block
    +--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
    | UUID                                 | Name         | Instance UUID | Power State | Provisioning State | Maintenance |
    +--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
    | d1fca940-e341-491b-8afd-0cf6d748aa29 | controller-1 | None          | power off   | available          | False       |
    | 6b24d02c-3fd2-4e55-a730-c45008f01723 | controller-2 | None          | power off   | available          | False       |
    | 098c3e2d-1c70-41d2-983b-6c266387de0b | controller-3 | None          | power off   | available          | False       |
    | 91492c2a-b26c-49ef-9d4e-e492a1578076 | compute-1    | None          | power off   | available          | False       |
    | cdf9e0ec-e3cb-4005-86f6-d40e684a9b19 | compute-2    | None          | power off   | available          | False       |
    | 92c4c1cb-ce7d-48d4-a2d9-75b2651db097 | compute-3    | None          | power off   | available          | False       |
    | bb5e829a-834b-4eb1-b733-0012ce9d5f00 | compute-4    | None          | power off   | available          | False       |
    +--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
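
With more than a handful of nodes, scanning the table by eye becomes error-prone. The sketch below shows one way to parse such an ASCII table from captured CLI output and list the nodes that are not yet in the “available” state. The helper names are hypothetical, and the second row of the sample is an invented example of a node that is not ready.

```python
def parse_table(text):
    """Parse an ASCII table, as printed by the openstack CLI, into a list of dicts."""
    def cells(row):
        return [cell.strip() for cell in row.strip("|").split("|")]

    rows = [line.strip() for line in text.splitlines() if line.strip().startswith("|")]
    if not rows:
        return []
    header = cells(rows[0])
    return [dict(zip(header, cells(row))) for row in rows[1:]]


def nodes_not_available(text):
    """Return the names of nodes whose Provisioning State is not 'available'."""
    return [row["Name"] for row in parse_table(text)
            if row["Provisioning State"] != "available"]


# Illustrative sample: the second row is a made-up node that is not ready yet.
SAMPLE = """
+------+--------------+---------------+-------------+--------------------+-------------+
| UUID | Name         | Instance UUID | Power State | Provisioning State | Maintenance |
+------+--------------+---------------+-------------+--------------------+-------------+
| d1fc | controller-1 | None          | power off   | available          | False       |
| 9149 | compute-1    | None          | power off   | manageable         | False       |
+------+--------------+---------------+-------------+--------------------+-------------+
"""

print(nodes_not_available(SAMPLE))  # prints ['compute-1']
```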

Tagging Nodes into Profiles

The next step is to tag the nodes into profiles:

  1. Tag the controller nodes into the default “control” profile:

    Code Block
    (undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:control,boot_option:local' controller-1
    (undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:control,boot_option:local' controller-2
    (undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:control,boot_option:local' controller-3


  2. Create two new compute flavors -- one per rack (compute-r1, compute-r2) -- and attach the flavors to profiles with a correlated name:

    Code Block
    (undercloud) [stack@rhosp-director ~]$ openstack flavor create --id auto --ram 4096 --disk 40 --vcpus 1 compute-r1
    (undercloud) [stack@rhosp-director ~]$ openstack flavor set --property "capabilities:boot_option"="local" --property "capabilities:profile"="compute-r1" --property "resources:CUSTOM_BAREMETAL"="1" --property "resources:DISK_GB"="0" --property "resources:MEMORY_MB"="0" --property "resources:VCPU"="0" compute-r1
    
    (undercloud) [stack@rhosp-director ~]$ openstack flavor create --id auto --ram 4096 --disk 40 --vcpus 1 compute-r2
    (undercloud) [stack@rhosp-director ~]$ openstack flavor set --property "capabilities:boot_option"="local" --property "capabilities:profile"="compute-r2" --property "resources:CUSTOM_BAREMETAL"="1" --property "resources:DISK_GB"="0" --property "resources:MEMORY_MB"="0" --property "resources:VCPU"="0" compute-r2
  3. Tag compute nodes 1 and 3 into the “compute-r1” profile to associate them with Rack 1, and compute nodes 2 and 4 into the “compute-r2” profile to associate them with Rack 2:

    Code Block
    (undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:compute-r1,boot_option:local' compute-1
    (undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:compute-r1,boot_option:local' compute-3
    (undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:compute-r2,boot_option:local' compute-2
    (undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:compute-r2,boot_option:local' compute-4
  4. Verify profile tagging per node using the command below:

    Code Block
    (undercloud) [stack@rhosp-director ~]$ openstack overcloud profiles list
    +--------------------------------------+--------------+-----------------+-----------------+-------------------+
    | Node UUID                            | Node Name    | Provision State | Current Profile | Possible Profiles |
    +--------------------------------------+--------------+-----------------+-----------------+-------------------+
    | d1fca940-e341-491b-8afd-0cf6d748aa29 | controller-1 | available       | control         |                   |
    | 6b24d02c-3fd2-4e55-a730-c45008f01723 | controller-2 | available       | control         |                   |
    | 098c3e2d-1c70-41d2-983b-6c266387de0b | controller-3 | available       | control         |                   |
    | 91492c2a-b26c-49ef-9d4e-e492a1578076 | compute-1    | available       | compute-r1      |                   |
    | cdf9e0ec-e3cb-4005-86f6-d40e684a9b19 | compute-2    | available       | compute-r2      |                   |
    | 92c4c1cb-ce7d-48d4-a2d9-75b2651db097 | compute-3    | available       | compute-r1      |                   |
    | bb5e829a-834b-4eb1-b733-0012ce9d5f00 | compute-4    | available       | compute-r2      |                   |
    +--------------------------------------+--------------+-----------------+-----------------+-------------------+
    Note

    It is possible to tag the nodes into profiles in the instackenv.json file during node registration (section 6.1) instead of running the tag command per node. However, flavors and profiles must be created in any case.
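
The per-node tagging commands can also be generated from a small rack mapping. The following is a minimal sketch, assuming the node names and rack assignment used in this guide; `RACK_PROFILES` and `tag_commands` are hypothetical names introduced for illustration, and the script only prints the commands rather than running them.

```python
# Hypothetical mapping of rack profile -> bare metal node names (as tagged in this guide).
RACK_PROFILES = {
    "compute-r1": ["compute-1", "compute-3"],
    "compute-r2": ["compute-2", "compute-4"],
}


def tag_commands(rack_profiles):
    """Return one 'openstack baremetal node set' command line per node."""
    commands = []
    for profile, nodes in rack_profiles.items():
        for node in nodes:
            capabilities = f"profile:{profile},boot_option:local"
            commands.append(
                "openstack baremetal node set "
                f"--property capabilities='{capabilities}' {node}"
            )
    return commands


if __name__ == "__main__":
    for line in tag_commands(RACK_PROFILES):
        print(line)
```

Keeping the mapping in one place makes it easier to review which node lands in which rack profile before any command is issued.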

NVIDIA NICs Listing

Run the following command to go over all registered nodes and identify the interface names of the dual-port NVIDIA 100GbE NIC. The interface names are used later on in the configuration files.

Code Block
languagetext
themeRDark
(undercloud) [stack@rhosp-director templates]$ for node in $(openstack baremetal node list --fields uuid -f value) ; do openstack baremetal introspection interface list $node ; done
.
.
+-----------+-------------------+----------------------+-------------------+----------------+
| Interface | MAC Address       | Switch Port VLAN IDs | Switch Chassis ID | Switch Port ID |
+-----------+-------------------+----------------------+-------------------+----------------+
| eno1      | ec:b1:d7:83:11:b8 | []                   | 94:57:a5:25:fa:80 | 29             |
| eno2      | ec:b1:d7:83:11:b9 | []                   | None              | None           |
| eno3      | ec:b1:d7:83:11:ba | []                   | None              | None           |
| eno4      | ec:b1:d7:83:11:bb | []                   | None              | None           |
| ens1f1    | ec:0d:9a:7d:81:b3 | []                   | 24:8a:07:7f:ef:00 | Eth1/14        |
| ens1f0    | ec:0d:9a:7d:81:b2 | []                   | 24:8a:07:7f:ef:00 | Eth1/1         |
+-----------+-------------------+----------------------+-------------------+----------------+
Note

Interface names must be identical for all nodes, or at least for all nodes sharing the same role. In our case, they are ens2f0/ens2f1 on the Controller nodes and ens1f0/ens1f1 on the Compute nodes.
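
The naming requirement above can be checked mechanically once the interface names have been collected per node. The sketch below is only an illustration: `inconsistent_roles` is a hypothetical helper, the role labels are placeholders, and the sample data simply reuses the interface names mentioned in this guide.

```python
from collections import defaultdict


def inconsistent_roles(node_roles, node_interfaces):
    """Return the roles whose nodes do not all expose the same interface names.

    node_roles:      dict of node name -> role name
    node_interfaces: dict of node name -> iterable of interface names
    """
    names_per_role = defaultdict(set)
    for node, role in node_roles.items():
        # A frozenset makes the comparison independent of listing order.
        names_per_role[role].add(frozenset(node_interfaces[node]))
    return sorted(role for role, variants in names_per_role.items() if len(variants) > 1)


if __name__ == "__main__":
    # Illustrative data only.
    roles = {"compute-1": "compute", "compute-2": "compute", "controller-1": "controller"}
    interfaces = {
        "compute-1": ["eno1", "ens1f0", "ens1f1"],
        "compute-2": ["eno1", "ens1f0", "ens1f1"],
        "controller-1": ["eno1", "ens2f0", "ens2f1"],
    }
    print(inconsistent_roles(roles, interfaces))
```

An empty result means every role has a single, consistent set of interface names.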



Note

The configuration file examples in the following sections are partial and are intended to highlight specific sections. The full configuration files are available for download at the following link:

Configuration Files

Deployment configuration and environment files:

Role definitions file:

  • The provided /home/stack/templates/roles_data_rivermax.yaml file includes a standard Controller role and two types of Compute roles, one per associated network rack.

  • The NeutronDhcpAgent service is added to the Compute roles.

Below is a partial output of the configuration file:

Code Block
languagetext
themeRDark
###############################################################################
# Role: ComputeSriov1 #
###############################################################################
- name: ComputeSriov1
  description: |
    Compute SR-IOV Role R1
  CountDefault: 1
  networks:
    - InternalApi
    - Tenant
    - Storage
    - Ptp
  HostnameFormatDefault: '%stackname%-computesriov1-%index%'
  disable_upgrade_deployment: True
  ServicesDefault:
Code Block
languagetext
themeRDark
###############################################################################
# Role: ComputeSriov2 #
###############################################################################
- name: ComputeSriov2
  description: |
    Compute SR-IOV Role R2
  CountDefault: 1
  networks:
    - InternalApi_2
    - Tenant_2
    - Storage_2
    - Ptp_2
  HostnameFormatDefault: '%stackname%-computesriov2-%index%'
  disable_upgrade_deployment: True
  ServicesDefault:


The full configuration file is attached to this document for your convenience.


Node Counts and Flavors file:

The provided /home/stack/templates/node-info.yaml file specifies the node count and the correlated flavor per role.


Full configuration file:

Code Block
languagetext
themeRDark
parameter_defaults:
  OvercloudControllerFlavor: control
  OvercloudComputeSriov1Flavor: compute-r1
  OvercloudComputeSriov2Flavor: compute-r2
  ControllerCount: 3
  ComputeSriov1Count: 2
  ComputeSriov2Count: 2



Rivermax Environment Configuration file:



The provided /home/stack/templates/rivermax-env.yaml file is used to configure the Compute nodes for low latency applications with HW offload:


  • ens1f0 is used for the accelerated VXLAN data plane (Nova physical_network: null is required for VXLAN offload).

  • CPU isolation: cores 2-5,12-17 are isolated from the hypervisor. Cores 2-5,12-15 are used by Rivermax VMs, while cores 16-17 are excluded from Nova and are used exclusively for running linuxptp tasks on the compute node.

  • ens1f1 is used for VLAN traffic:

    • Each compute node role is associated with a dedicated physical network to be used later on for the multi-segment network. Notice that the Nova PCI whitelist physical network remains the same.

    • VF function #1 is excluded from the Nova PCI whitelist (it is used as the hypervisor VF for PTP traffic).

  • userdata_disable_service.yaml is called to disable the chrony (NTP) service on overcloud nodes during deployment. This is required for a stable PTP setup.

  • ExtraConfig is used for mapping role configuration parameters to the correct network set, and for setting firewall rules that allow PTP traffic to the compute nodes.
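
The relationship between the isolated cores and the cores handed to Nova can be verified with a few lines of Python. This is a minimal sketch under the values quoted in this section (isolcpus=2-5,12-17 and a Nova pin set of 2-5,12-15); `expand_cpu_list` and `kernel_arg` are hypothetical helpers, not part of any OpenStack tool.

```python
def expand_cpu_list(spec):
    """Expand a CPU list such as '2-5,12-17' into a set of core numbers."""
    cores = set()
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            cores.update(range(int(first), int(last) + 1))
        else:
            cores.add(int(part))
    return cores


def kernel_arg(kernel_args, name):
    """Return the value of 'name=value' from a kernel command line, or None."""
    for token in kernel_args.split():
        key, _, value = token.partition("=")
        if key == name:
            return value
    return None


if __name__ == "__main__":
    kernel_args = "intel_iommu=on iommu=pt isolcpus=2-5,12-17 nohz_full=2-5,12-17 rcu_nocbs=2-5,12-17"
    isolated = expand_cpu_list(kernel_arg(kernel_args, "isolcpus"))
    nova_pinned = expand_cpu_list("2-5,12-15")  # value of NovaVcpuPinSet
    # Cores isolated from the hypervisor but not given to Nova remain for linuxptp.
    print(sorted(isolated - nova_pinned))
```

The difference between the two sets is exactly the pair of cores reserved for PTP, and the Nova pin set should always be a subset of the isolated set.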


Full configuration file is attached to this document

Note

The following configuration file is correlated to specific compute server HW, OS and drivers in which:

  • NVIDIA's ConnectX adapter interface names are ens1f0 and ens1f1.
  • The PCI IDs used for SR-IOV VFs allocated for Nova usage are specified explicitly per compute role.

On a different system, the interface names and PCI addresses might be different.

This information is required before cloud deployment in order to adjust the configuration files.

Code Block
languagetext
themeRDark
# A Heat environment file for adjusting the compute nodes to low latency media applications with HW Offload

resource_registry:
  OS::TripleO::Services::NeutronSriovHostConfig: /usr/share/openstack-tripleo-heat-templates/puppet/services/neutron-sriov-host-config.yaml
  OS::TripleO::NodeUserData: /home/stack/templates/userdata_disable_service.yaml
  OS::TripleO::Services::Ntp: OS::Heat::None
  OS::TripleO::Services::NeutronOvsAgent: /usr/share/openstack-tripleo-heat-templates/puppet/services/neutron-ovs-agent.yaml
  
parameter_defaults:

  DisableService: "chronyd"  
  NovaSchedulerDefaultFilters: ['RetryFilter','AvailabilityZoneFilter','RamFilter','ComputeFilter','ComputeCapabilitiesFilter','ImagePropertiesFilter','ServerGroupAntiAffinityFilter','ServerGroupAffinityFilter','PciPassthroughFilter']
  NovaSchedulerAvailableFilters: ["nova.scheduler.filters.all_filters","nova.scheduler.filters.pci_passthrough_filter.PciPassthroughFilter"]  

  # ComputeSriov1 Role params: 1 vxlan offload interface, 1 legacy sriov interface, isolated cores, cores 16-17 are isolated and excluded from nova for ptp usage. 
  ComputeSriov1Parameters:
    KernelArgs: "default_hugepagesz=2MB hugepagesz=2MB hugepages=8192 intel_iommu=on iommu=pt processor.max_cstate=0 intel_idle.max_cstate=0 nosoftlockup isolcpus=2-5,12-17 nohz_full=2-5,12-17 rcu_nocbs=2-5,12-17"
    NovaVcpuPinSet: "2-5,12-15" 
    OvsHwOffload: True
    NovaReservedHostMemory: 4096
    NovaPCIPassthrough:
      - devname: "ens1f0"
        physical_network: null
      - address: {"domain": ".*", "bus": "08", "slot": "08", "function": "[4-7]"}
        physical_network: "tenantvlan1"      
    NeutronPhysicalDevMappings: "tenantvlan1:ens1f1"
    NeutronBridgeMappings: ["tenantvlan1:br-stor"]
    
  # Extra config for mapping config params to rack 1 networks and for setting PTP Firewall rule
  ComputeSriov1ExtraConfig:
    neutron::agents::ml2::ovs::local_ip: "%{hiera('tenant')}"
    nova::vncproxy::host: "%{hiera('internal_api')}"
    nova::compute::vncserver_proxyclient_address: "%{hiera('internal_api')}"
    nova::compute::libvirt::vncserver_listen: "%{hiera('internal_api')}"    
    nova::my_ip: "%{hiera('internal_api')}"
    nova::migration::libvirt::live_migration_inbound_addr: "%{hiera('internal_api')}"
    cold_migration_ssh_inbound_addr: "%{hiera('internal_api')}"
    live_migration_ssh_inbound_addr: "%{hiera('internal_api')}" 
    tripleo::profile::base::database::mysql::client::mysql_client_bind_address: "%{hiera('internal_api')}"
    tripleo::firewall::firewall_rules:
      '199 allow PTP traffic over dedicated interface':
        dport: [319,320]
        proto: udp
        action: accept
  
  # ComputeSriov2 Role params: 1 vxlan offload interface, 1 legacy sriov interface, isolated cores, cores 16-17 are isolated and excluded from nova for ptp usage. 
  ComputeSriov2Parameters:
    KernelArgs: "default_hugepagesz=2MB hugepagesz=2MB hugepages=8192 intel_iommu=on iommu=pt processor.max_cstate=0 intel_idle.max_cstate=0 nosoftlockup isolcpus=2-5,12-17 nohz_full=2-5,12-17 rcu_nocbs=2-5,12-17"
    NovaVcpuPinSet: "2-5,12-15" 
    OvsHwOffload: True
    NovaReservedHostMemory: 4096
    NeutronSriovNumVFs: 
    NovaPCIPassthrough:
      - devname: "ens1f0"
        physical_network: null
      - address: {"domain": ".*", "bus": "08", "slot": "02", "function": "[4-7]"}
        physical_network: "tenantvlan1"     
    NeutronPhysicalDevMappings: "tenantvlan2:ens1f1"
    NeutronBridgeMappings: ["tenantvlan2:br-stor"]

    # Extra config for mapping config params to rack 2 networks and for setting PTP Firewall rule
  ComputeSriov2ExtraConfig:
    neutron::agents::ml2::ovs::local_ip: "%{hiera('tenant_2')}"
    nova::vncproxy::host: "%{hiera('internal_api_2')}"
    nova::compute::vncserver_proxyclient_address: "%{hiera('internal_api_2')}"
    nova::compute::libvirt::vncserver_listen: "%{hiera('internal_api_2')}"      
    nova::my_ip: "%{hiera('internal_api_2')}"
    nova::migration::libvirt::live_migration_inbound_addr: "%{hiera('internal_api_2')}"
    cold_migration_ssh_inbound_addr: "%{hiera('internal_api_2')}"
    live_migration_ssh_inbound_addr: "%{hiera('internal_api_2')}" 
    tripleo::profile::base::database::mysql::client::mysql_client_bind_address: "%{hiera('internal_api_2')}"
    tripleo::firewall::firewall_rules:
      '199 allow PTP traffic over dedicated interface':
        dport: [319,320]
        proto: udp
        action: accept



Disable_Service Configuration file:

The provided /home/stack/templates/userdata_disable_service.yaml is used to disable services on overcloud nodes during deployment.

It is used in rivermax-env.yaml to disable the chrony (NTP) service:

Code Block
languagetext
themeRDark
heat_template_version: queens                                                          

description: >
  Uses cloud-init to disable and stop the service named in the
  DisableService parameter on the overcloud nodes.

parameters:
  DisableService:
    description: Disable a service
    hidden: true
    type: string

resources:
  userdata:
    type: OS::Heat::MultipartMime
    properties:
      parts:
      - config: {get_resource: disable_service}

  disable_service:
   type: OS::Heat::SoftwareConfig
   properties:
      config:
        str_replace:
          template: |
           #!/bin/bash
           set -x
           sudo systemctl disable $service
           sudo systemctl stop $service
          params:
           $service: {get_param: DisableService}

outputs:
  OS::stack_id:
    value: {get_resource: userdata}
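
To see what script the template above produces, the sketch below mimics the placeholder substitution with plain Python string replacement. It is an illustration only and not Heat's implementation; `render_str_replace` is a hypothetical helper, and `chronyd` is the DisableService value set in rivermax-env.yaml.

```python
# Script template copied from the disable_service resource above.
TEMPLATE = """#!/bin/bash
set -x
sudo systemctl disable $service
sudo systemctl stop $service
"""


def render_str_replace(template, params):
    """Replace every occurrence of each params key in template with its value."""
    rendered = template
    for placeholder, value in params.items():
        rendered = rendered.replace(placeholder, value)
    return rendered


if __name__ == "__main__":
    print(render_str_replace(TEMPLATE, {"$service": "chronyd"}))
```

The rendered script disables and then stops chronyd, which is the behavior the PTP setup relies on.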


Network configuration Files:

The provided network_data_rivermax.yaml file is used to configure the cloud networks according to the following guidelines:

  • The Rack 1 network set parameters match the subnets/VLANs configured on the Rack 1 Leaf switch. The network names used are specified in roles_data.yaml for the Controller/ComputeSriov1 role networks.
  • The Rack 2 network set parameters match the subnets/VLANs configured on the Rack 2 Leaf switch. The network names are specified in roles_data.yaml for the ComputeSriov2 role networks.
  • The “management” network is not used in our example.
  • The PTP network is shared by both racks in our example.


The configuration is based on the following matrix to match the Leaf switch configuration as executed in Network Configuration section above:

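+----------------+-------------+------------------+-----------------+----------+-------------------------+
| Network Name   | Network Set | Network Location | Network Details | VLAN     | Network Allocation Pool |
+----------------+-------------+------------------+-----------------+----------+-------------------------+
| Storage        | 1           | Rack 1           | 172.16.0.0/24   | 11       | 172.16.0.100-250        |
| Storage_Mgmt   | 1           | Rack 1           | 172.17.0.0/24   | 21       | 172.17.0.100-250        |
| Internal API   | 1           | Rack 1           | 172.18.0.0/24   | 31       | 172.18.0.100-250        |
| Tenant         | 1           | Rack 1           | 172.19.0.0/24   | 41       | 172.19.0.100-250        |
| PTP            | 1           | Rack 1           | 172.20.0.0/24   | untagged | 172.20.0.100-250        |
| Storage_2      | 2           | Rack 2           | 172.16.2.0/24   | 12       | 172.16.2.100-250        |
| Storage_Mgmt_2 | 2           | Rack 2           | 172.17.2.0/24   | 22       | 172.17.2.100-250        |
| Internal API_2 | 2           | Rack 2           | 172.18.2.0/24   | 32       | 172.18.2.100-250        |
| Tenant_2       | 2           | Rack 2           | 172.19.2.0/24   | 42       | 172.19.2.100-250        |
| PTP_2          | 2           | Rack 2           | 172.20.2.0/24   | untagged | 172.20.2.100-250        |
| External       | -           | Public Switch    | 10.7.208.0/24   | -        | 10.7.208.10-21          |
+----------------+-------------+------------------+-----------------+----------+-------------------------+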

Full configuration file is attached to this document
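
Before deployment, it can help to sanity-check that each allocation pool lies inside its subnet and that no two subnets overlap. The sketch below uses Python's standard `ipaddress` module on a subset of the rows from the network matrix above; `NETWORKS`, `pool_errors` and `overlapping_subnets` are hypothetical names introduced for this illustration.

```python
import ipaddress

# (name, subnet, first pool address, last pool address), copied from the matrix above.
NETWORKS = [
    ("Storage", "172.16.0.0/24", "172.16.0.100", "172.16.0.250"),
    ("Storage_2", "172.16.2.0/24", "172.16.2.100", "172.16.2.250"),
    ("Tenant", "172.19.0.0/24", "172.19.0.100", "172.19.0.250"),
    ("Tenant_2", "172.19.2.0/24", "172.19.2.100", "172.19.2.250"),
    ("External", "10.7.208.0/24", "10.7.208.10", "10.7.208.21"),
]


def pool_errors(networks):
    """Return a list of human-readable problems found in the allocation pools."""
    errors = []
    for name, subnet, start, end in networks:
        net = ipaddress.ip_network(subnet)
        first, last = ipaddress.ip_address(start), ipaddress.ip_address(end)
        if first not in net or last not in net:
            errors.append(f"{name}: pool {start}-{end} is outside {subnet}")
        elif first > last:
            errors.append(f"{name}: pool start {start} is after pool end {end}")
    return errors


def overlapping_subnets(networks):
    """Return pairs of network names whose subnets overlap."""
    parsed = [(name, ipaddress.ip_network(subnet)) for name, subnet, _, _ in networks]
    return [
        (a, b)
        for i, (a, net_a) in enumerate(parsed)
        for b, net_b in parsed[i + 1:]
        if net_a.overlaps(net_b)
    ]


if __name__ == "__main__":
    print(pool_errors(NETWORKS))
    print(overlapping_subnets(NETWORKS))
```

Both functions return empty lists when the matrix is consistent; the remaining rows of the matrix can be added to `NETWORKS` in the same form.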

Below is a partial example showing the configuration of the Storage (both network sets), External, and PTP networks:

Code Block
languagetext
themeRDark
- name: Storage
  vip: true
  vlan: 11
  name_lower: storage
  ip_subnet: '172.16.0.0/24'
  allocation_pools: [{'start': '172.16.0.100', 'end': '172.16.0.250'}]
  ipv6_subnet: 'fd00:fd00:fd00:1100::/64'
  ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:1100::10', 'end': 'fd00:fd00:fd00:1100:ffff:ffff:ffff:fffe'}]
.
.
- name: Storage_2
  vip: true
  vlan: 12
  name_lower: storage_2
  ip_subnet: '172.16.2.0/24'
  allocation_pools: [{'start': '172.16.2.100', 'end': '172.16.2.250'}]
  ipv6_subnet: 'fd00:fd00:fd00:1200::/64'
  ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:1200::10', 'end': 'fd00:fd00:fd00:1200:ffff:ffff:ffff:fffe'}]
.
.
- name: External
  vip: true
  name_lower: external
  vlan: 10
  ip_subnet: '10.7.208.0/24'
  allocation_pools: [{'start': '10.7.208.10', 'end': '10.7.208.21'}]
  gateway_ip: '10.7.208.1'
  ipv6_subnet: '2001:db8:fd00:1000::/64'
  ipv6_allocation_pools: [{'start': '2001:db8:fd00:1000::10', 'end': '2001:db8:fd00:1000:ffff:ffff:ffff:fffe'}]
  gateway_ipv6: '2001:db8:fd00:1000::1'
.
.
- name: Ptp
  name_lower: ptp
  ip_subnet: '172.20.1.0/24'
  allocation_pools: [{'start': '172.20.1.100', 'end': '172.20.1.250'}]

- name: Ptp_2
  name_lower: ptp_2
  ip_subnet: '172.20.2.0/24'
  allocation_pools: [{'start': '172.20.2.100', 'end': '172.20.2.250'}] 

The provided network-environment-rivermax.yaml file is used to configure the Nova/Neutron network parameters according to the cloud networks:

  • VXLAN tunnels
  • Tenant VLAN range 100-200 is used for SR-IOV ports

Full configuration file is attached to this document

Code Block
languagetext
themeRDark
.
.
.
  NeutronNetworkType: 'vlan,vxlan,flat'
  NeutronTunnelTypes: 'vxlan'
  NeutronNetworkVLANRanges: 'tenantvlan1:100:200,tenantvlan2:100:200'
  NeutronFlatNetworks: 'datacentre'
  NeutronBridgeMappings: 'datacentre:br-ex,tenantvlan1:br-stor'
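
The NeutronNetworkVLANRanges value follows a `physnet:min:max` pattern per entry. As a minimal sketch (the `parse_vlan_ranges` helper is hypothetical and only illustrates the format), the string can be parsed to check whether a given VLAN ID is available on a physical network:

```python
def parse_vlan_ranges(value):
    """Parse 'physnet:min:max,...' into {physnet: range(min, max + 1)}."""
    ranges = {}
    for entry in value.split(","):
        physnet, vlan_min, vlan_max = entry.split(":")
        ranges[physnet] = range(int(vlan_min), int(vlan_max) + 1)
    return ranges


if __name__ == "__main__":
    vlan_ranges = parse_vlan_ranges("tenantvlan1:100:200,tenantvlan2:100:200")
    for physnet, vlans in vlan_ranges.items():
        # range.stop is exclusive, so the last usable VLAN ID is stop - 1.
        print(physnet, vlans.start, vlans.stop - 1)
```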


Role type configuration files:

/home/stack/templates/controller.yaml:

  • Make sure the location of run-os-net-config.sh script in the configuration file is pointing to the correct script location.
  • Supernet and GW per network allow routing between network sets located in different racks. The GW would be the IP interface which was configured on the Leaf switch interface facing this network. Supernet and gateway for 2 tenant networks can be seen below.
  • Controller nodes network settings we used:
    • Dedicated 1G interface (type “interface”) for provisioning (PXE) network.
    • Dedicated 1G interface (type “ovs_bridge”) for External network. This network has a default GW configured.
    • Dedicated 100G interface (type “interface” without vlans) for data plane (Tenant) network in Rack 1. The network is associated with a supernet and has a route allowing it to reach other networks in the same supernet located in different racks.
    • Dedicated 100G interface (type “ovs_bridge”) with vlans for Storage/StorageMgmt/InternalApi networks in Rack 1. Each network is associated with a supernet and has a route allowing it to reach other networks in the same supernet located in different racks.
  • See example below. Full configuration file is attached to this document.

    Code Block
    languagetext
    themeRDark
    TenantSupernet:
      default: '172.19.0.0/16'
      description: Supernet that contains Tenant subnets for all roles.
      type: string
    TenantGateway:
      default: '172.19.0.1'
      description: Router gateway on tenant network
      type: string
    Tenant_2Gateway:
      default: '172.19.2.1'
      description: Router gateway on tenant_2 network
      type: string
    .
    .
    resources:
      OsNetConfigImpl:
        type: OS::Heat::SoftwareConfig
        properties:
          group: script
          config:
            str_replace:
              template:
                get_file: /usr/share/openstack-tripleo-heat-templates/network/scripts/run-os-net-config.sh
              params:
                $network_config:
                  network_config:
                  .
                  .
                  # NIC 3 - Data Plane (Tenant net)
                  - type: ovs_bridge
                    name: br-sriov
                    use_dhcp: false
                    members:
                    - type: interface
                      name: ens2f0
                    addresses:
                    - ip_netmask:
                        get_param: TenantIpSubnet
                    routes:
                    - ip_netmask:
                        get_param: TenantSupernet
                      next_hop:
                        get_param: TenantGateway
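
The effect of the supernet route can be illustrated with a short longest-prefix-match sketch. It assumes the Rack 1 tenant subnet from the network matrix (172.19.0.0/24) together with the TenantSupernet and TenantGateway defaults from the example; `next_hop` is a hypothetical helper and models only these two tenant routes, not the node's full routing table.

```python
import ipaddress

TENANT_SUBNET = ipaddress.ip_network("172.19.0.0/24")    # Rack 1 tenant network
TENANT_SUPERNET = ipaddress.ip_network("172.19.0.0/16")  # TenantSupernet
TENANT_GATEWAY = ipaddress.ip_address("172.19.0.1")      # TenantGateway


def next_hop(destination):
    """Return how a Rack 1 node reaches destination on the tenant network.

    Longest-prefix match: the directly connected /24 wins over the /16 route.
    """
    address = ipaddress.ip_address(destination)
    if address in TENANT_SUBNET:
        return "on-link"
    if address in TENANT_SUPERNET:
        return str(TENANT_GATEWAY)
    return None  # not covered by the tenant routes; other routes would apply


if __name__ == "__main__":
    for target in ("172.19.0.150", "172.19.2.100", "10.7.208.10"):
        print(target, next_hop(target))
```

A tenant address in Rack 2 (172.19.2.x) falls outside the local /24 but inside the /16 supernet, so it is sent to the Leaf switch gateway, which is what allows VXLAN tunnel endpoints in different racks to reach each other.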

/home/stack/templates/computesriov1.yaml:

  • Make sure the location of run-os-net-config.sh script in the configuration file is pointing to the correct script location.
  • Supernet and GW per network allow routing between network sets located in different racks. The GW is the IP interface configured on the Leaf switch interface facing this network. These parameters are not shown in the example below; see the example above or the full configuration file.
  • Networks and routes used by Compute nodes in Rack 1 with ComputeSriov1 role:
    • Dedicated 1G interface for provisioning (PXE) network 
    • Dedicated 100G interface for offloaded vxlan data plane network in Rack 1. The network is associated with a supernet and has a route allowing it to reach other networks in the same supernet located in different racks
    • Dedicated 100G interface with host VF for PTP and with OVS vlans for Storage/InternalApi networks in Rack 1. Each network is associated with a supernet and has a route allowing it to reach other networks in the same supernet located in different racks - not mentioned in the example below, see full configuration file.
  • See example below. Full configuration file is attached to this document.

    Code Block
    languagetext
    themeRDark
     network_config:
                     # NIC 1 - Provisioning net
                  - type: interface                                                                                                
                    name: eno1                                                                                               
                    use_dhcp: false                                                                                                 
                    dns_servers:                                                                                                    
                      get_param: DnsServers                                                                                         
                    addresses:                                                                                                      
                    - ip_netmask:
                        list_join:
                        - /
                        - - get_param: ControlPlaneIp
                          - get_param: ControlPlaneSubnetCidr
                    routes:
                    - ip_netmask: 169.254.169.254/32
                      next_hop:
                        get_param: EC2MetadataIp
                    - default: true
                      next_hop:
                        get_param: ControlPlaneDefaultRoute
    
                       
                    # NIC 2 - ASAP2 VXLAN Data Plane (Tenant net)
                  - type: sriov_pf
                    name: ens1f0
                    numvfs: 8
                    link_mode: switchdev
                  - type: interface
                    name: ens1f0
                    use_dhcp: false
                    addresses:
                      - ip_netmask:
                          get_param: TenantIpSubnet 
                    routes:
                      - ip_netmask:
                          get_param: TenantSupernet
                        next_hop:
                          get_param: TenantGateway
                  
                  
                    # NIC 3 - Storage and Control over OVS, legacy SRIOV for Data Plane, NIC Partitioning for PTP VF owned by Host
                  - type: ovs_bridge
                    name: br-stor
                    use_dhcp: false
                    members:
                    - type: sriov_pf
                      name: ens1f1
                      numvfs: 8
                      # force the MAC address of the bridge to this interface
                      primary: true
                    - type: vlan
                      vlan_id:
                        get_param: StorageNetworkVlanID
                      addresses:
                      - ip_netmask:
                          get_param: StorageIpSubnet
                      routes:
                      - ip_netmask:
                          get_param: StorageSupernet
                        next_hop:
                          get_param: StorageGateway
                    - type: vlan
                      vlan_id:
                        get_param: InternalApiNetworkVlanID
                      addresses:
                      - ip_netmask:
                          get_param: InternalApiIpSubnet
                      routes:
                      - ip_netmask:
                          get_param: InternalApiSupernet
                        next_hop:
                          get_param: InternalApiGateway
                  - type: sriov_vf
                    device: ens1f1
                    vfid: 1
                    addresses:
                    - ip_netmask:
                        get_param: PtpIpSubnet
    
    

/home/stack/templates/computesriov2.yaml:

  • Make sure the location of run-os-net-config.sh script in the configuration file is pointing to the correct script location.
  • Supernet and GW per network allow routing between network sets located in different racks. The GW is the IP interface configured on the Leaf switch interface facing this network. These parameters are not shown in the example below; see the example above or the full configuration file.
  • Networks and routes used by Compute nodes in Rack 2 with ComputeSriov2 role:
    • Dedicated 1G interface for provisioning (PXE) network.
    • Dedicated 100G interface for offloaded VXLAN data plane network in Rack 2. The network is associated with a supernet and has a route allowing it to reach other networks in the same supernet located in different racks.
    • Dedicated 100G interface with host VF for PTP and with OVS VLANs for Storage/InternalApi networks in Rack 2. Each network is associated with a supernet and has a route allowing it to reach other networks in the same supernet located in different racks. See the full configuration file for details.

  • See example below. Full configuration file is attached to this document.

    Code Block
    languagetext
    themeRDark
    network_config:
                    # NIC 1 - Provisioning net
                  - type: interface                                                                                                
                    name: eno1                                                                                               
                    use_dhcp: false                                                                                                 
                    dns_servers:                                                                                                    
                      get_param: DnsServers                                                                                         
                    addresses:                                                                                                      
                    - ip_netmask:
                        list_join:
                        - /
                        - - get_param: ControlPlaneIp
                          - get_param: ControlPlaneSubnetCidr
                    routes:
                    - ip_netmask: 169.254.169.254/32
                      next_hop:
                        get_param: EC2MetadataIp
                    - default: true
                      next_hop:
                        get_param: ControlPlaneDefaultRoute
    
                       
                    # NIC 2 - ASAP2 VXLAN Data Plane (Tenant net)
                  - type: sriov_pf
                    name: ens1f0
                    numvfs: 8
                    link_mode: switchdev
                  - type: interface
                    name: ens1f0
                    use_dhcp: false
                    addresses:
                      - ip_netmask:
                          get_param: Tenant_2IpSubnet 
                    routes:
                      - ip_netmask:
                          get_param: TenantSupernet
                        next_hop:
                          get_param: Tenant_2Gateway
                  
                  
                    # NIC 3 - Storage and Control over OVS, legacy SRIOV for Data Plane, NIC Partitioning for PTP VF owned by Host
                  - type: ovs_bridge
                    name: br-stor
                    use_dhcp: false
                    members:
                    - type: sriov_pf
                      name: ens1f1
                      numvfs: 8
                      # force the MAC address of the bridge to this interface
                      primary: true
                    - type: vlan
                      vlan_id:
                        get_param: Storage_2NetworkVlanID
                      addresses:
                      - ip_netmask:
                          get_param: Storage_2IpSubnet
                      routes:
                      - ip_netmask:
                          get_param: StorageSupernet
                        next_hop:
                          get_param: Storage_2Gateway
                    - type: vlan
                      vlan_id:
                        get_param: InternalApi_2NetworkVlanID
                      addresses:
                      - ip_netmask:
                          get_param: InternalApi_2IpSubnet
                      routes:
                      - ip_netmask:
                          get_param: InternalApiSupernet
                        next_hop:
                          get_param: InternalApi_2Gateway
                  - type: sriov_vf
                    device: ens1f1
                    vfid: 1
                    addresses:
                    - ip_netmask:
                        get_param: Ptp_2IpSubnet

Deploying the Overcloud

Using the provided configuration and environment files, the cloud will be deployed utilizing:

  • 3 controllers associated with Rack 1 networks
  • 2 Compute nodes associated with Rack 1 (provider network 1)
  • 2 Compute nodes associated with Rack 2 (provider network 2)
  • Routes to allow connectivity between racks/networks
  • VXLAN overlay tunnels between all the nodes

Before starting the deployment, verify connectivity between the racks' Leaf switches SW vlan interfaces facing the nodes over the OSPF underlay fabric. Without inter-rack connectivity for all networks, the overcloud deployment will fail.

  Note
    • Do not change the order of the environment files in the deploy command.
    • Make sure that the NTP server specified in the deploy command is accessible and can provide time to the undercloud node.
    • The overcloud_images.yaml file used in the deploy command is created during undercloud installation. Verify that it exists in the specified location.
    • The network-isolation.yaml and neutron-sriov.yaml files specified in the deploy command are created automatically during deployment from the j2.yaml template files.

    To start the overcloud deployment, issue the command below: 

    Code Block
    languagetext
    themeRDark
    (undercloud) [stack@rhosp-director templates]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates \
    --libvirt-type kvm \
    -n /home/stack/templates/network_data_rivermax.yaml \
    -r /home/stack/templates/roles_data_rivermax.yaml \
    --timeout 90 \
    --validation-warnings-fatal \
    --ntp-server 0.asia.pool.ntp.org \
    -e /home/stack/templates/node-info.yaml \
    -e /home/stack/templates/overcloud_images.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/neutron-sriov.yaml \
    -e /home/stack/templates/network-environment-rivermax.yaml \
    -e /home/stack/templates/rivermax-env.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/host-config-and-reboot.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/disable-telemetry.yaml
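
Because the order of the environment files matters, one option is to keep them in an ordered list and build the command from it. The following is a minimal sketch that only reassembles the flags and paths of the deploy command above; `ENVIRONMENT_FILES` and `deploy_command` are hypothetical names, and the script prints the command rather than executing it.

```python
import shlex

THT = "/usr/share/openstack-tripleo-heat-templates"
TEMPLATES = "/home/stack/templates"

# Environment files in the exact order used by the deploy command above.
ENVIRONMENT_FILES = [
    f"{TEMPLATES}/node-info.yaml",
    f"{TEMPLATES}/overcloud_images.yaml",
    f"{THT}/environments/network-isolation.yaml",
    f"{THT}/environments/neutron-sriov.yaml",
    f"{TEMPLATES}/network-environment-rivermax.yaml",
    f"{TEMPLATES}/rivermax-env.yaml",
    f"{THT}/environments/host-config-and-reboot.yaml",
    f"{THT}/environments/disable-telemetry.yaml",
]


def deploy_command(ntp_server):
    """Build the argument list of the overcloud deploy command."""
    argv = [
        "openstack", "overcloud", "deploy", "--templates", THT,
        "--libvirt-type", "kvm",
        "-n", f"{TEMPLATES}/network_data_rivermax.yaml",
        "-r", f"{TEMPLATES}/roles_data_rivermax.yaml",
        "--timeout", "90",
        "--validation-warnings-fatal",
        "--ntp-server", ntp_server,
    ]
    for env_file in ENVIRONMENT_FILES:  # list order is preserved
        argv += ["-e", env_file]
    return argv


if __name__ == "__main__":
    # shlex.join requires Python 3.8 or later.
    print(shlex.join(deploy_command("0.asia.pool.ntp.org")))
```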
    
    


    Post Deployment Steps

    Media

    Compute Node configuration:

    1. Verify the system booted with the required low latency adjustments

      Code Block
      languagetext
      themeRDark
      # cat /proc/cmdline
      BOOT_IMAGE=/boot/vmlinuz-3.10.0-957.10.1.el7.x86_64 root=UUID=334f450f-1946-4577-a4eb-822bd33b8db2 ro console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet default_hugepagesz=2MB hugepagesz=2MB hugepages=8192 intel_iommu=on iommu=pt processor.max_cstate=0 intel_idle.max_cstate=0 nosoftlockup isolcpus=2-5,12-17 nohz_full=2-5,12-17 rcu_nocbs=2-5,12-17
      
      # cat /sys/module/intel_idle/parameters/max_cstate
      0
      
      # cat /sys/devices/system/cpu/cpuidle/current_driver
      none
    2. Upload the MFT package to the compute node and install it 

      Note

      NVIDIA Mellanox Firmware Tools (MFT) can be obtained at http://www.mellanox.com/page/management_tools.

      GCC and kernel-devel packages are required for the MFT installation.

      Code Block
      languagetext
      themeRDark
      # yum install gcc kernel-devel-3.10.0-957.10.1.el7.x86_64 -y
      # tar -xzvf mft-4.12.0-105-x86_64-rpm.tgz
      # cd mft-4.12.0-105-x86_64-rpm
      # ./install.sh
      # mst start
    3. Verify the NIC firmware and upgrade it to the latest version if required 

      Code Block
      languagetext
      themeRDark
      # mlxfwmanager --query
      Querying Mellanox devices firmware ...
      
      Device #1:
      ----------
      
        Device Type:      ConnectX5
        Part Number:      MCX556A-EDA_Ax
        Description:      ConnectX-5 Ex VPI adapter card; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; PCIe4.0 x16; tall bracket; ROHS R6
        PSID:             MT_0000000009
        PCI Device Name:  /dev/mst/mt4121_pciconf0
        Base MAC:         ec0d9a7d81b2
        Versions:         Current        Available
           FW             16.25.1020     N/A
           PXE            3.5.0701       N/A
           UEFI           14.18.0019     N/A
      
        Status:           No matching image found
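To decide whether an upgrade is required without reading the table by eye, the "FW" row of the query output can be parsed. This is a rough sketch under two assumptions: the helper names are hypothetical, and the "Available" column is assumed to hold a version string (rather than N/A) when a matching firmware image is found.

```shell
# Print "<current> <available>" from the FW row of `mlxfwmanager --query` output on stdin.
fw_versions() {
    awk '$1 == "FW" { print $2, $3 }'
}

# Succeed only if an available version is listed and differs from the current one.
fw_update_available() {
    fw_versions | awk '
        $2 != "N/A" && $2 != $1 { found = 1 }
        END { if (found) exit 0; exit 1 }'
}

# On the compute node:
#   mlxfwmanager --query | fw_versions
#   mlxfwmanager --query | fw_update_available && echo "firmware update available"
```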
    3. Enable packet pacing and HW time stamp on the port used for PTP

      Info
      titleNote

      rivermax_config script is available to download here

      The relevant interface in our case is ens1f1.

      REBOOT is required between the steps.

      The "mcra" setting will not survive a reboot.

      This is expected to be persistent and enabled by default in future FW releases.

      Code Block
      languagetext
      themeRDark
      #mst start
      Starting MST (Mellanox Software Tools) driver set
      Loading MST PCI module - Success
      [warn] mst_pciconf is already loaded, skipping
      Create devices
      -W- Missing "lsusb" command, skipping MTUSB devices detection
      Unloading MST PCI module (unused) - Success
      
      #mst status -v
      MST modules:
      ------------
          MST PCI module is not loaded
          MST PCI configuration module loaded
      PCI devices:
      ------------
      DEVICE_TYPE             MST                           PCI       RDMA            NET                       NUMA
      ConnectX5(rev:0)        /dev/mst/mt4121_pciconf0.1    08:00.1   mlx5_1          net-ens1f1                0
      
      ConnectX5(rev:0)        /dev/mst/mt4121_pciconf0      08:00.0   mlx5_0          net-ens1f0                0
      
       
      #chmod 777 rivermax_config
      #./rivermax_config ens1f1
      running this can take few minutes...
      enabling
      Done!
      Code Block
      languagetext
      themeRDark
      # reboot
      Code Block
      languagetext
      themeRDark
      #mst start
      Starting MST (Mellanox Software Tools) driver set
      Loading MST PCI module - Success
      [warn] mst_pciconf is already loaded, skipping
      Create devices
      -W- Missing "lsusb" command, skipping MTUSB devices detection
      Unloading MST PCI module (unused) - Success
       
      #mcra /dev/mst/mt4121_pciconf0.1 0xd8068 3
      
      #mcra /dev/mst/mt4121_pciconf0.1 0xd8068
      0x00000003
    2. Sync the compute node clock

    3. Install linuxptp

      Code Block
      languagetext
      themeRDark
      # yum install -y linuxptp
    4. Use one of the following methods to identify the host VF interface name used for PTP: look for an IP address from the PTP network, or for "virtfn1", which corresponds to VF ID 1 used in the deployment configuration files

      Code Block
      languagetext
      themeRDark
      [root@overcloud-computesriov1-0 ~]# ip addr show | grep "172.20"
          inet 172.20.0.102/24 brd 172.20.0.255 scope global enp8s8f3
      
      
       
      [root@overcloud-computesriov1-0 ~]# ls /sys/class/net/ens1f1/device/virtfn1/net/
      enp8s8f3
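The sysfs lookup above can be wrapped in a small function so that the interface name is available to later commands. This is a minimal sketch: `vf_netdev` is a hypothetical helper, and the optional third argument (an alternate sysfs root) exists only to make the function testable outside the compute node.

```shell
# Print the network device name behind a given VF of a physical function.
# Usage: vf_netdev <pf-interface> <vf-id> [sysfs-root]
vf_netdev() (
    pf="$1"
    vf_id="$2"
    sysfs_root="${3:-/sys}"
    dir="$sysfs_root/class/net/$pf/device/virtfn$vf_id/net"
    [ -d "$dir" ] || exit 1      # VF does not exist or has no netdev on the host
    ls "$dir"
)

# On the compute node (PF and VF ID taken from this guide):
#   PTP_IF=$(vf_netdev ens1f1 1)
```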
    5. Verify connectivity to the clock master (Onyx leaf switch sw11 over vlan 51 for Rack1, Onyx leaf switch sw10 over vlan 52 for Rack2)

      Code Block
      languagetext
      themeRDark
      [root@overcloud-computesriov1-0 ~]# ping 172.20.0.1
      PING 172.20.0.1 (172.20.0.1) 56(84) bytes of data.
      64 bytes from 172.20.0.1: icmp_seq=1 ttl=64 time=0.158 ms
      
      
       
      [root@overcloud-computesriov2-0 ~]# ping 172.20.2.1
      PING 172.20.2.1 (172.20.2.1) 56(84) bytes of data.
      64 bytes from 172.20.2.1: icmp_seq=1 ttl=64 time=0.110 ms
    6. Edit /etc/ptp4l.conf to include the following global parameters and the PTP interface parameters

      Code Block
      languagetext
      themeRDark
      [global]
      domainNumber 127
      priority1 128
      priority2 127
      use_syslog 1
      logging_level 6
      tx_timestamp_timeout 30
      hybrid_e2e 1
      dscp_event 46
      dscp_general 46
       
      [enp8s8f3]
      logAnnounceInterval -2
      announceReceiptTimeout 3
      logSyncInterval -3
      logMinDelayReqInterval -3
      delay_mechanism E2E
      network_transport UDPv4
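Because the domain number configured here is passed again to phc2sys later (`-n 127`), it can help to read values back from the file rather than retyping them. The following is a minimal sketch, assuming the simple "[section]" plus "key value" layout shown above; `ptp4l_conf_get` is a hypothetical helper name.

```shell
# Print the value of <key> inside [<section>] of a ptp4l configuration read from stdin.
# Usage: ptp4l_conf_get <section> <key> < /etc/ptp4l.conf
ptp4l_conf_get() {
    awk -v section="[$1]" -v key="$2" '
        $1 ~ /^\[/ { current = $1; next }                  # track the current section header
        current == section && $1 == key { print $2; exit } # first match wins
    '
}

# On the compute node:
#   DOMAIN=$(ptp4l_conf_get global domainNumber < /etc/ptp4l.conf)
```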
       
    7. Start ptp4l on the PTP VF interface

      Info
      titleNote

      The command below is used to run the ptp4l in slave mode on a dedicated host CPU which is isolated and excluded from Nova per our deployment configuration files (core 16 in our case).

      The second command verifies that the PTP clock is locked to the master clock source; rms values should be low.

      Code Block
      languagetext
      themeRDark
      # taskset -c 16 ptp4l -s -f /etc/ptp4l.conf &
      # tail -f /var/log/messages | grep rms
      ptp4l: [2560.009] rms   12 max   22 freq -12197 +/-  16
      ptp4l: [2561.010] rms   10 max   18 freq -12200 +/-  13 delay    63 +/-   0
      ptp4l: [2562.010] rms   10 max   21 freq -12212 +/-  10 delay    63 +/-   0
      ptp4l: [2563.011] rms   10 max   21 freq -12208 +/-  14 delay    63 +/-   0
      ptp4l: [2564.012] rms    9 max   14 freq -12220 +/-   8
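"Low" rms values can be checked automatically against a limit of your choosing. The sketch below is an illustration only: `ptp_rms_within` is a hypothetical helper, and any limit passed to it (for example 100 ns) is an assumption rather than a value taken from this guide.

```shell
# Succeed if ptp4l log lines on stdin contain rms values and none exceeds <limit> (ns).
# Usage: ptp_rms_within <limit>
ptp_rms_within() {
    awk -v limit="$1" '
        {
            for (i = 1; i < NF; i++)
                if ($i == "rms") {
                    seen = 1
                    if ($(i + 1) + 0 > limit) bad = 1
                }
        }
        END { if (seen && !bad) exit 0; exit 1 }'
}

# On the compute node (the 100 ns limit is an assumed example):
#   grep rms /var/log/messages | tail -n 20 | ptp_rms_within 100 && echo "PTP locked"
```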
    8. Start phc2sys on the same interface to sync the host system clock time

      Info
      titleNote

      The command below is used to run the phc2sys on a dedicated host CPU which is isolated and excluded from Nova per our deployment configuration files (core 17 in our case).

      The second command verifies that the system clock is synced to PTP; offset values should be low and match the ptp4l rms values.

      Code Block
      languagetext
      themeRDark
      # taskset -c 17  phc2sys -s enp8s8f3 -w -m -n 127 >> /var/log/messages &
      # tail -f /var/log/messages | grep offset
      phc2sys[2797.730] phc offset         0 s2 freq  +14570 delay    959
      phc2sys[2798.730]: phc offset       -43 s2 freq  +14527 delay    957
      phc2sys[2799.730]: phc offset        10 s2 freq  +14567 delay    951
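To compare the phc2sys offsets with the ptp4l rms values, the largest absolute offset in a log window can be extracted as follows. This is a minimal sketch; `max_abs_offset` is a hypothetical helper name and it assumes the log line layout shown above.

```shell
# Print the largest absolute "offset" value found in phc2sys log lines on stdin.
max_abs_offset() {
    awk '
        {
            for (i = 1; i < NF; i++)
                if ($i == "offset") {
                    v = $(i + 1) + 0
                    if (v < 0) v = -v
                    if (v > max) max = v
                }
        }
        END { print max + 0 }'
}

# On the compute node:
#   grep offset /var/log/messages | tail -n 20 | max_abs_offset
```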

    Application VMs and Use Cases

    In the section below we will cover two main use cases:

    1. IP Multicast stream between media VMs located in different L3 routed provider networks

    2. HW-Offloaded Unicast stream over VXLAN tunnel between media VMs located in different L3 routed provider networks

    Media Instances Creation

    Info
    titleNote

    Each Media VM will own both SRIOV-based vlan network and ASAP²-based VXLAN network. The same VMs can be used to test all of the use cases.

    1. Contact NVIDIA Networking Support to get the Rivermax VM cloud image file (RivermaxCloud_v3.qcow2)

      Info
      titleNote

      The login credentials to VMs that are using this image are: root/3tango

    2. Upload Rivermax cloud image to overcloud image repository

      Code Block
      languagetext
      themeRDark
      source overcloudrc
      openstack image create --file RivermaxCloud_v3.qcow2 --disk-format qcow2 --container-format bare rivermax
    2. Create a flavor with dedicated cpu policy to ensure VM vCPUs are pinned to the isolated host CPUs

      Code Block
      languagetext
      themeRDark
      openstack flavor create m1.rivermax --id auto --ram 4096 --disk 20 --vcpus 4
      openstack flavor set m1.rivermax --property hw:mem_page_size=large
      openstack flavor set m1.rivermax --property hw:cpu_policy=dedicated
    3. Create a multi-segment network for the tenant vlan Multicast traffic

      Info
      titleNotes

      Each network segment contains SRIOV direct port with IP from a different subnet.

      The subnets are associated with a different physical network, each one correlated with a different routed provider rack.

      Routes to the subnets are propagated between racks via provider L3 infrastructure (OSPF in our case).

      The subnets GWs are the Leaf ToR Switch per rack.

      Both segments under this multi-segment network are carrying the same segment vlan.

      Code Block
      languagetext
      themeRDark
      openstack network create mc_vlan_net --provider-physical-network tenantvlan1 --provider-network-type vlan --provider-segment 101 --share
      openstack network segment list --network mc_vlan_net
      +--------------------------------------+------+--------------------------------------+--------------+---------+
      | ID                                   | Name | Network                              | Network Type | Segment |
      +--------------------------------------+------+--------------------------------------+--------------+---------+
      | 309dd695-b45d-455e-b171-5739cc309dcf | None | 00665b03-eeae-4b5d-af65-063f8e989c24 | vlan         |     101 |
      +--------------------------------------+------+--------------------------------------+--------------+---------+
      
      
      openstack network segment set --name segment1 309dd695-b45d-455e-b171-5739cc309dcf
      openstack network segment create --physical-network tenantvlan2 --network-type vlan --segment 101 --network mc_vlan_net segment2
       
      (overcloud) [stack@rhosp-director ~]$ openstack network segment list 
      +--------------------------------------+----------+--------------------------------------+--------------+---------+
      | ID                                   | Name     | Network                              | Network Type | Segment |
      +--------------------------------------+----------+--------------------------------------+--------------+---------+
      | 309dd695-b45d-455e-b171-5739cc309dcf | segment1 | 00665b03-eeae-4b5d-af65-063f8e989c24 | vlan         |     101 |
      | cac89791-2d7f-45e7-8c85-cc0a65060e81 | segment2 | 00665b03-eeae-4b5d-af65-063f8e989c24 | vlan         |     101 |
      +--------------------------------------+----------+--------------------------------------+--------------+---------+
      
      openstack subnet create mc_vlan_subnet --dhcp --network mc_vlan_net --network-segment segment1 --subnet-range 11.11.11.0/24 --gateway 11.11.11.1
      openstack subnet create mc_vlan_subnet_2 --dhcp --network mc_vlan_net --network-segment segment2 --subnet-range 22.22.22.0/24 --gateway 22.22.22.1
      
      openstack port create mc_direct1 --vnic-type=direct --network mc_vlan_net 
      openstack port create mc_direct2 --vnic-type=direct --network mc_vlan_net 
    4. Create a vxlan tenant network for Unicast traffic with 2 x ASAP² offload ports

      Code Block
      languagetext
      themeRDark
      openstack network create tenant_vxlan_net --provider-network-type vxlan --share
      openstack subnet create tenant_vxlan_subnet --dhcp --network tenant_vxlan_net --subnet-range 33.33.33.0/24 --gateway none
      openstack port create offload1 --vnic-type=direct --network tenant_vxlan_net --binding-profile '{"capabilities":["switchdev"]}'
      openstack port create offload2 --vnic-type=direct --network tenant_vxlan_net --binding-profile '{"capabilities":["switchdev"]}'
    5. Create a rivermax instance on media compute node located in Rack 1 (provider network segment 1) with one direct SRIOV port on the vlan network and one ASAP² offload port on the vxlan network

      Code Block
      languagetext
      themeRDark
      openstack server create --flavor m1.rivermax --image rivermax --nic port-id=mc_direct1 --nic port-id=offload1 vm1 --availability-zone nova:overcloud-computesriov1-0.localdomain
    6. Create a second rivermax instance on media compute node located in Rack 2 (provider network segment 2) with one direct SRIOV port on the vlan network and one ASAP² offload port on the vxlan network

      Code Block
      languagetext
      themeRDark
      openstack server create --flavor m1.rivermax --image rivermax --nic port-id=mc_direct2 --nic port-id=offload2 vm2 --availability-zone nova:overcloud-computesriov2-0.localdomain
    7. Connect to the compute nodes and verify the VMs are pinned to the isolated CPUs

      Code Block
      languagetext
      themeRDark
      [root@overcloud-computesriov1-0 ~]# virsh list
       Id    Name                           State
      ----------------------------------------------------
       1     instance-0000002b              running
      
      
      [root@overcloud-computesriov1-0 ~]# virsh vcpupin 1
      VCPU: CPU Affinity
      ----------------------------------
         0: 15
         1: 2
         2: 3
         3: 4
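The pinning check above can be automated by comparing the `virsh vcpupin` output against the isolated CPU list. The following is a minimal sketch: both helper names are hypothetical, and using the `isolcpus` value from the kernel command line (2-5,12-17) as the allowed set is an assumption; your Nova pinning set may be narrower.

```shell
# Expand a CPU list such as "2-5,12-17" into "2 3 4 5 12 13 14 15 16 17".
expand_cpu_list() (
    IFS=','
    out=""
    for part in $1; do
        cpu="${part%-*}"      # range start (or the single CPU)
        last="${part#*-}"     # range end (or the single CPU)
        while [ "$cpu" -le "$last" ]; do
            out="$out $cpu"
            cpu=$((cpu + 1))
        done
    done
    echo "${out# }"
)

# Read `virsh vcpupin <domain>` output on stdin and print every vCPU
# whose affinity is not a single CPU from the allowed list.
# Usage: vcpus_outside <cpu-list>
vcpus_outside() (
    allowed=" $(expand_cpu_list "$1") "
    while read -r vcpu cpu; do
        case "$vcpu" in
            [0-9]*:)
                case "$allowed" in
                    *" $cpu "*) ;;             # pinned to an allowed CPU
                    *) echo "${vcpu%:}" ;;     # report the offending vCPU
                esac ;;
        esac
    done
)

# On the compute node:
#   virsh vcpupin 1 | vcpus_outside 2-5,12-17
```

No output means every vCPU is pinned to a single isolated host CPU.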
      



    Rivermax Application Testing - Use Case 1

    Info
    titleNote

    In the following section we use Rivermax application VMs created on 2 media compute nodes located in different network racks.

    First we will lock on the PTP clock generated by the Onyx switches and propagated into the VMs via KVM vPTP driver.

    Next we will generate a media-standards-compliant stream on VM1 and validate compliance using the NVIDIA Rivermax AnalyzeX tool on VM2. The Multicast stream generated by VM1 will traverse the network using PIM-SM and will be received by VM2, which joined the group. Note that each packet in this stream carries an RTP header (including 1 SRD) and complies with the known media RFCs; however, the RTP payload is 0, so it cannot be visually displayed.

    In the last step we will decode and stream a real video file on VM1 and play it on the receiver VM2 graphical interface using the NVIDIA Rivermax Simple Viewer tool.


    1. Upload the rivermax and analyzex license files to the Rivermax VMs and place them under the /opt/mellanox/rivermx directory.
    2. On both VMs run the following command to sync the system time from PTP:

      Code Block
      languagetext
      themeRDark
      taskset -c 1 phc2sys -s /dev/ptp2 -O 0 -m >> /var/log/messages &
      
      # tail -f /var/log/messages | grep offset
      phc2sys[2797.730] phc offset         0 s2 freq  +14570 delay    959
      phc2sys[2798.730]: phc offset       -43 s2 freq  +14527 delay    957
      phc2sys[2799.730]: phc offset        10 s2 freq  +14567 delay    951

      Notice the phc2sys is running on a dedicated VM core 1 (which is isolated from the hypervisor) and applied on ptp2 device. In some cases the ptp devices names in the VM will be different.

      Ignore the "clock is not adjustable" message when applying the command.

      Low and stable offset values will indicate a lock.

      Info
      titleNote

      Important: Verify the freq values in the output are close to the values seen in the compute node level (see above where we performed the command on host).
      If not, use in the phc2sys command a different /dev/ptp device that is available in the VM system.
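The freq comparison between the host and the VM can be done numerically. This is a rough sketch under stated assumptions: the helper names are hypothetical, the log files named in the usage comments are placeholders for wherever you saved the host and VM output, and the tolerance is a value you choose; this guide only says the values should be "close".

```shell
# Print the integer mean of the "freq" values found in phc2sys log lines on stdin.
mean_freq() {
    awk '
        { for (i = 1; i < NF; i++) if ($i == "freq") { sum += $(i + 1); n++ } }
        END { if (n) printf "%d\n", sum / n }'
}

# Succeed if two mean freq values differ by no more than <tolerance>.
# Usage: freq_close <host-mean> <vm-mean> <tolerance>
freq_close() (
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( -diff ))
    fi
    [ "$diff" -le "$3" ]
)

# Example (host.log and vm.log are placeholder file names, 500 is an assumed tolerance):
#   freq_close "$(grep offset host.log | mean_freq)" "$(grep offset vm.log | mean_freq)" 500
```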

    3. On both VMs, run the SDP file modification script to adjust the media configuration file (sdp_hd_video_audio) as desired:

      Code Block
      languagetext
      themeRDark
      #cd /home/Rivermax
      #./sdp_modify.sh
      === SDP File Modification Script ===
      Default source IP Address is 11.11.11.10 would you like to change it (Y\N)?y
      Please select source IP Address in format X.X.X.X :11.11.11.25
      Default Video stream multicast IP Address is 224.1.1.20 would you like to change it (Y\N)?y
      Please select Video stream multicast IP Address:224.1.1.110
      Default  Video stream multicast Port is 5000 would you like to change it (Y\N)?n
      Default Audio stream multicast IP Address is 224.1.1.30 would you like to change it (Y\N)?y
      Please select Audio stream multicast IP Address:224.1.1.110
      Default Audio stream multicast Port: is 5010 would you like to change it (Y\N)?n
      Your SDP file is ready with the following parameters:
      IP_ADDR 11.11.11.25
      MC_VIDEO_IP 224.1.1.110
      MC_VIDEO_PORT 5000
      MC_AUDIO_IP 224.1.1.110
      MC_AUDIO_PORT 5010
      
      
      # cat sdp_hd_video_audio
      v=0
      o=- 1443716955 1443716955 IN IP4 11.11.11.25
      s=st2110 stream
      t=0 0
      m=video 5000 RTP/AVP 96
      c=IN IP4 224.1.1.110/64
      a=source-filter:incl IN IP4 224.1.1.110 11.11.11.25
      a=rtpmap:96 raw/90000
      a=fmtp:96 sampling=YCbCr-4:2:2; width=1920; height=1080; exactframerate=50; depth=10; TCS=SDR; colorimetry=BT709; PM=2110GPM; SSN=ST2110-20:2017; TP=2110TPN;
      a=mediaclk:direct=0
      a=ts-refclk:localmac=40-a3-6b-a0-2b-d2
      m=audio 5010 RTP/AVP 97
      c=IN IP4 224.1.1.110/64
      a=source-filter:incl IN IP4 224.1.1.110 11.11.11.25
      a=rtpmap:97 L24/48000/2
      a=mediaclk:direct=0 rate=48000
      a=ptime:1
      a=ts-refclk:localmac=40-a3-6b-a0-2b-d2
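The destination address and port of each media section are needed again later (for example as the `-m` and `-p` arguments of the receiver tools). The sketch below extracts them from the SDP file; `sdp_streams` is a hypothetical helper, and it assumes each `m=` line is followed by its own `c=` line, as in the file above.

```shell
# Print "<media> <address> <port>" for every media section of an SDP file read from stdin.
sdp_streams() {
    awk '
        /^m=/ { split(substr($0, 3), m, " "); media = m[1]; port = m[2] }
        /^c=/ && media != "" {
            split($3, addr, "/")         # drop the "/<ttl>" suffix
            print media, addr[1], port
        }'
}

# On the VM:
#   sdp_streams < /home/Rivermax/sdp_hd_video_audio
```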
    4. On both VMs issue the following command to define the VMA memory buffers:

      Code Block
      languagetext
      themeRDark
      export VMA_RX_BUFS=2048
      export VMA_TX_BUFS=2048
      export VMA_RX_WRE=1024
      export VMA_TX_WRE=1024
    5. On the first VM ("transmitter VM") generate the media stream using the Rivermax media_sender application. The command below runs the Rivermax media sender application on dedicated VM vCPUs 2,3 (which are isolated from the hypervisor).

      Media sender application is using the system time to operate.

      Code Block
      languagetext
      themeRDark
      # ./media_sender -c 2 -a 3 -s sdp_hd_video_audio -m
    2. On the second VM ("receiver VM") run the AnalyzeX tool to verify compliance. The command below runs the Rivermax AnalyzeX compliance tool on dedicated VM vCPUs 1-3 (which are isolated from the hypervisor).

      Code Block
      languagetext
      themeRDark
      # VMA_HW_TS_CONVERSION=2 ANALYZEX_STACK_JITTER=2 LD_PRELOAD=libvma.so taskset -c 1-3 ./analyzex -i ens4 -s sdp_hd_video_audio -p
    2. The following AnalyzeX results indicate full compliance with the ST2110 media standards:

      [Image: AnalyzeX compliance results]

    2. Stop Rivermax media_sender application on VM1 and AnalyzeX tool on VM2.

    3. Login to VM1 and extract the video file under /home/Rivermax directory

      Code Block
      languagetext
      themeRDark
      # gunzip mellanoxTV_1080p50.ycbcr.gz
    4. Re-run the Rivermax media_sender application on VM1, this time specifying the video file. A lower rate is used to allow the graphical interface to cope with the video playing task:

      Code Block
      languagetext
      themeRDark
      # ./media_sender -c 2 -a 3 -s sdp_hd_video_audio -m -f mellanoxTV_1080p50.ycbcr --fps 25
    2. Open a graphical remote session to VM2. In our case we allocated a public floating IP to VM2 and used the X2Go client to open a remote session:

      [Image: X2Go remote session to VM2]

    2. Open the Terminal and run the Rivermax rx_hello_world_viewer application under the /home/Rivermax directory. Specify the local VLAN IP address of VM2 and the Multicast address of the stream. Once the command is issued the video will start playing on screen.

      Code Block
      languagetext
      themeRDark
      #cd /home/Rivermax
      # ./rx_hello_world_viewer -i 22.22.22.4 -m 224.1.1.110 -p 5000


      The following video demonstrates the procedure:

      View file
      nameSimple_player.mp4
      height250



    Rivermax Application Testing - Use Case 2

    Info
    titleNote

    In the following section we use the same Rivermax application VMs that were created on 2 remote media compute nodes to generate a Unicast stream between the VMs over VXLAN overlay network.

    After validating the PTP clock is locked, we will start the stream and monitor it with the same tools.

    The Unicast stream generated by VM1 will create vxlan OVS flow that will be offloaded to the NIC HW.


    1. Make sure rivermax and analyzex license files are placed on the Rivermax VMs as instructed in Use Case 1.
    2. Make sure the system time on both VMs is updated from PTP as instructed in Use Case 1.

      Code Block
      languagetext
      themeRDark
      # tail -f /var/log/messages | grep offset
      phc2sys[2797.730] phc offset         0 s2 freq  +14570 delay    959
      phc2sys[2798.730]: phc offset       -43 s2 freq  +14527 delay    957
      phc2sys[2799.730]: phc offset        10 s2 freq  +14567 delay    951
    3. On transmitter VM1 run the SDP file modification script to create a Unicast configuration file. Specify the VM1 VXLAN IP address as the source IP and the VM2 VXLAN IP address as the stream destination

      Code Block
      languagetext
      themeRDark
      # ./sdp_modify.sh
      === SDP File Modification Script ===
      Default source IP Address is 11.11.11.10 would you like to change it (Y\N)?y
      Please select source IP Address in format X.X.X.X :33.33.33.12
      Default Video stream multicast IP Address is 224.1.1.20 would you like to change it (Y\N)?y
      Please select Video stream multicast IP Address:33.33.33.16
      Default  Video stream multicast Port is 5000 would you like to change it (Y\N)?n
      Default Audio stream multicast IP Address is 224.1.1.30 would you like to change it (Y\N)?y
      Please select Audio stream multicast IP Address:33.33.33.16
      Default Audio stream multicast Port: is 5010 would you like to change it (Y\N)?n
      Your SDP file is ready with the following parameters:
      IP_ADDR 33.33.33.12
      MC_VIDEO_IP 33.33.33.16
      MC_VIDEO_PORT 5000
      MC_AUDIO_IP 33.33.33.16
      MC_AUDIO_PORT 5010
    4. On VM1 generate the media stream using Rivermax media_sender application - use the unicast SDP file you created in previous step

      Code Block
      languagetext
      themeRDark
      # ./media_sender -c 2 -a 3 -s sdp_hd_video_audio -m
    5. On receiver VM2 run the Rivermax rx_hello_world application with the local VXLAN interface IP

      Info
      titleNote

      Make sure you use rx_hello_world tool and not rx_hello_world_viewer.

      Code Block
      languagetext
      themeRDark
      # ./rx_hello_world -i 33.33.33.16 -m 33.33.33.16 -p 5000
    6. On the Compute nodes verify the flows are offloaded to the HW

      1. On compute node 1, which hosts transmitter VM1, the offloaded flow covers traffic coming from the VM over the Representor interface and going into the VXLAN tunnel:

        Code Block
        languagetext
        themeRDark
        [root@overcloud-computesriov1-0 heat-admin]# ovs-dpctl dump-flows type=offloaded --name
        
         
        in_port(eth4),eth(src=fa:16:3e:94:a4:5d,dst=fa:16:3e:fc:59:f3),eth_type(0x0800),ipv4(tos=0/0x3,frag=no), packets:54527279, bytes:71539619808, used:0.330s, actions:set(tunnel(tun_id=0x8,src=172.19.0.100,dst=172.19.2.105,tp_dst=4789,flags(key))),vxlan_sys_4789
      2. On compute node 2, which hosts receiver VM2, the offloaded flow covers traffic coming over the VXLAN tunnel and going into the VM over the Representor interface:

        Code Block
        languagetext
        themeRDark
        [root@overcloud-computesriov2-0 ~]# ovs-dpctl dump-flows type=offloaded --name
         
        tunnel(tun_id=0x8,src=172.19.0.100,dst=172.19.2.105,tp_dst=4789,flags(+key)),in_port(vxlan_sys_4789),eth(src=fa:16:3e:94:a4:5d,dst=fa:16:3e:fc:59:f3),eth_type(0x0800),ipv4(frag=no), packets:75722169, bytes:95561342656, used:0.420s, actions:eth5 sys_4789
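One way to confirm that the stream is really handled by the offloaded flow is to take two dumps a few seconds apart and check that the packet counter grows. The following is a minimal sketch: the helper names are hypothetical, and it assumes the `packets:<n>, bytes:<n>` layout shown in the flow dumps of this guide.

```shell
# Print "<packets> <bytes>" for each flow line read from stdin.
flow_counters() {
    sed -n 's/.*packets:\([0-9][0-9]*\), bytes:\([0-9][0-9]*\).*/\1 \2/p'
}

# Succeed only if the first flow's packet counter grew between two dumps.
# Usage: counters_increased "<first dump>" "<second dump>"
counters_increased() (
    before=$(printf '%s\n' "$1" | flow_counters | head -n 1 | cut -d' ' -f1)
    after=$(printf '%s\n' "$2" | flow_counters | head -n 1 | cut -d' ' -f1)
    [ -n "$before" ] && [ -n "$after" ] && [ "$after" -gt "$before" ]
)

# On a compute node:
#   first=$(ovs-dpctl dump-flows type=offloaded --name); sleep 5
#   second=$(ovs-dpctl dump-flows type=offloaded --name)
#   counters_increased "$first" "$second" && echo "offloaded flow is carrying traffic"
```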

    Authors

    Itai Levy
