
Created on Jul 11, 2019


Introduction

The Red Hat OpenStack Platform solution allows cloud and communication service providers to increase efficiency and agility while reducing operational costs.

In this Reference Deployment Guide (RDG) we demonstrate a complete deployment process of Red Hat OpenStack Platform 13 as a Network Functions Virtualization Infrastructure (NFVI) with NVIDIA Network ASAP²-based OVS Hardware Offload, to achieve a high-throughput SRIOV data path while keeping the existing Open vSwitch control path and VXLAN connectivity.

We'll cover setup components, scale considerations and other technological aspects including Hardware BoM, network topology and the steps to validate VXLAN traffic offload between virtualized network functions (VNFs).

Before you start, it is highly recommended to become familiar with the OVS Hardware Offload ASAP² technology, which is introduced as an inbox feature in RH-OSP13.

You are welcome to watch this 6 min video: Accelerated Switch and Packet Processing.


Components Overview

  • NVIDIA Spectrum Switch family provides the most efficient network solutions for the ever-increasing performance demands of data center applications.
  • NVIDIA ConnectX Network Adapter family delivers industry-leading connectivity for performance-driven server and storage applications. ConnectX adapter cards enable high bandwidth, coupled with ultra-low latency for diverse applications and systems, resulting in faster access and real-time responses.
  • NVIDIA Accelerated Switching and Packet Processing (ASAP²) technology combines the performance and efficiency of server/storage networking hardware with the flexibility of virtual switching software. ASAP² offers up to 10 times better performance than non-offloaded OVS solutions, delivering software-defined networks with the highest total infrastructure efficiency, deployment flexibility and operational simplicity. (Introduced starting in ConnectX-4 Lx NICs.)
  • NVIDIA NEO™ is a powerful platform for managing computing networks. It enables data center operators to efficiently provision, monitor and operate the modern data center fabric.
  • NVIDIA LinkX Cables and Transceivers family provides the industry’s most complete line of 10, 25, 40, 50, 100, 200, and 400Gb interconnect products for Cloud, Web 2.0, Enterprise, telco, and storage data center applications. They are often used to link top-of-rack switches downwards to servers, storage and appliances, and upwards in switch-to-switch applications.

Solution Overview

Solution Design

  • RH-OSP13 cloud is deployed in large scale over multiple racks interconnected via Spine/Leaf network architecture.
  • Each Compute/Controller node is equipped with a dual-port 100Gb/s NIC, of which one port is dedicated to tenant data traffic and the other to storage and control traffic.
  • Composable custom networks are used for network isolation between the racks. In our case, an L3 OSPF underlay is used to route between the networks; however, another fabric infrastructure could be used as desired.
  • ASAP²-enabled Compute nodes are located in different racks and maintain VXLAN tunnels as overlay for tenant VM traffic.
  • OVS ASAP² data plane acceleration is used by the Compute nodes to offload the CPU-intensive VXLAN traffic, in order to avoid the encapsulation/decapsulation performance penalty and achieve high throughput.
  • Switches are configured and managed by NEO.
  • OpenStack Neutron is used as an SDN controller.






Bill of Materials

 


Note

  • The BOM above refers to the maximal large-scale configuration with a blocking ratio of 3:1.
  • The blocking ratio can be changed in order to obtain a different capacity.
  • The SN2100 switch shares the same feature set as the SN2700 and can be used in this solution when lower capacity is required.
  • The 2-Rack BOM will be used in the solution example described below.


Large Scale Overview

Maximal Scale Diagram



Solution Example

We have chosen the key features below as a baseline to demonstrate the accelerated RH-OSP solution.

Solution Scale

  • 2 x racks with a custom network set per rack
  • 2 x SN2700 switches as Spine switches
  • 2 x SN2100 switches as Leaf switches, 1 per rack
  • 5 nodes in rack 1 (3 x Controller, 2 x Compute)
  • 2 nodes in rack 2 (2 x Compute)
  • All nodes are connected to Leaf switches using 2 x 100Gb/s ports per node
  • Leaf switches are connected to each Spine switch using a single 100Gb/s port
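
As a rough sanity check of the scale listed above, the downlink and uplink capacity of each Leaf switch can be compared. The sketch below is illustrative only: the function name is hypothetical, and it assumes every port runs at 100Gb/s and that each Leaf has one uplink to each of the two Spine switches, as listed above.

```python
from fractions import Fraction


def blocking_ratio(nodes, ports_per_node, uplinks, port_gbps=100):
    """Return downlink capacity divided by uplink capacity for one leaf switch."""
    downlink = nodes * ports_per_node * port_gbps
    uplink = uplinks * port_gbps
    return Fraction(downlink, uplink)


# Rack 1 in the example: 5 nodes x 2 ports down, 1 port up to each of 2 spines.
rack1 = blocking_ratio(nodes=5, ports_per_node=2, uplinks=2)
# Rack 2 in the example: 2 nodes x 2 ports down, 1 port up to each of 2 spines.
rack2 = blocking_ratio(nodes=2, ports_per_node=2, uplinks=2)

for name, ratio in (("rack 1", rack1), ("rack 2", rack2)):
    print(f"{name}: {ratio.numerator}:{ratio.denominator}")
```

Adding uplinks per Leaf lowers the ratio, which is the adjustment the BOM note refers to when it mentions changing the blocking ratio.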

Physical Diagram

Network Diagram

Note

  • The Storage network is configured; however, no storage nodes are used.
  • Compute nodes reach the external network via the undercloud node.

 


Important!

The configuration steps below refer to a solution example based on 2 racks.

Network Configuration Steps

Physical Configuration

  • Connect the switches to the switch mgmt network
  • Interconnect the switches using 100Gb/s cables


  • Connect the Controller/Compute servers to the relevant networks according to the following diagrams:


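    +--------------+----------------------+
    | Role         | Leaf Switch Location |
    +--------------+----------------------+
    | Controller 1 | Rack 1               |
    | Controller 2 | Rack 1               |
    | Controller 3 | Rack 1               |
    | Compute 1    | Rack 1               |
    | Compute 2    | Rack 2               |
    | Compute 3    | Rack 1               |
    | Compute 4    | Rack 2               |
    +--------------+----------------------+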

       



  • Connect the Undercloud Director server to the IPMI/PXE/External networks.

OSPF Configuration

  • Configure OSPF on the Leaf/Spine switches using NVIDIA NEO



Interface Configuration

  • Set VLANs and VLAN interfaces on the Leaf switches according to the following diagrams:

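    +----------------+-------------+----------------------+-----------------+---------------------+---------+-----------------+
    | Network Name   | Network Set | Leaf Switch Location | Network Details | Switch Interface IP | VLAN ID | Switchport Mode |
    +----------------+-------------+----------------------+-----------------+---------------------+---------+-----------------+
    | Storage        | 1           | Rack 1               | 172.16.0.0/24   | 172.16.0.1          | 11      | hybrid          |
    | Storage_Mgmt   | 1           | Rack 1               | 172.17.0.0/24   | 172.17.0.1          | 21      | hybrid          |
    | Internal API   | 1           | Rack 1               | 172.18.0.0/24   | 172.18.0.1          | 31      | hybrid          |
    | Tenant         | 1           | Rack 1               | 172.19.0.0/24   | 172.19.0.1          | 41      | access          |
    | Storage_2      | 2           | Rack 2               | 172.16.2.0/24   | 172.16.2.1          | 12      | hybrid          |
    | Storage_Mgmt_2 | 2           | Rack 2               | 172.17.2.0/24   | 172.17.2.1          | 22      | hybrid          |
    | Internal API_2 | 2           | Rack 2               | 172.18.2.0/24   | 172.18.2.1          | 32      | hybrid          |
    | Tenant_2       | 2           | Rack 2               | 172.19.2.0/24   | 172.19.2.1          | 42      | access          |
    +----------------+-------------+----------------------+-----------------+---------------------+---------+-----------------+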



  • Use NVIDIA NEO to provision the VLANs and interfaces via the pre-defined Provisioning Tasks:
    • Add-VLAN to create the VLANs and set names.
    • Set-Access-VLAN-Port to set access VLAN on the tenant network ports.
    • Set-Hybrid-Vlan-Port to allow the required VLANs on the storage/storage_mgmt/internal API networks ports.
    • Add-VLAN-To-OSPF-Area for distribution of the networks over OSPF.
    • Add VLAN IP Address to set IP per VLAN (currently no pre-defined template).
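
Before provisioning the Leaf switches, it can help to check the VLAN matrix for simple mistakes. The following is a minimal sketch, not part of NEO or of the original procedure: the helper name `check_matrix` is hypothetical, and the rows are copied by hand from the table above.

```python
import ipaddress

# (network name, subnet, switch interface IP, VLAN ID), copied from the table above.
LEAF_NETWORKS = [
    ("Storage", "172.16.0.0/24", "172.16.0.1", 11),
    ("Storage_Mgmt", "172.17.0.0/24", "172.17.0.1", 21),
    ("Internal API", "172.18.0.0/24", "172.18.0.1", 31),
    ("Tenant", "172.19.0.0/24", "172.19.0.1", 41),
    ("Storage_2", "172.16.2.0/24", "172.16.2.1", 12),
    ("Storage_Mgmt_2", "172.17.2.0/24", "172.17.2.1", 22),
    ("Internal API_2", "172.18.2.0/24", "172.18.2.1", 32),
    ("Tenant_2", "172.19.2.0/24", "172.19.2.1", 42),
]


def check_matrix(rows):
    """Return a list of human-readable problems found in the matrix."""
    problems = []
    seen_vlans = set()
    seen_subnets = []
    for name, cidr, switch_ip, vlan in rows:
        net = ipaddress.ip_network(cidr)
        # The switch VLAN interface must live inside the subnet it serves.
        if ipaddress.ip_address(switch_ip) not in net:
            problems.append(f"{name}: {switch_ip} is outside {cidr}")
        # Each network uses its own VLAN ID.
        if vlan in seen_vlans:
            problems.append(f"{name}: VLAN {vlan} is used more than once")
        seen_vlans.add(vlan)
        # Subnets routed over OSPF must not overlap.
        for other_name, other_net in seen_subnets:
            if net.overlaps(other_net):
                problems.append(f"{name}: {cidr} overlaps {other_name}")
        seen_subnets.append((name, net))
    return problems


problems = check_matrix(LEAF_NETWORKS)
print(problems or "matrix is consistent")
```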


For example, in order to set the port hybrid mode and allowed VLAN via the pre-defined Provisioning Tasks:


Note that since there is currently no predefined provisioning template for configuring the VLAN interface IP address, you can manually add the IP configuration into the “Add-VLAN-To-OSPF-Area” template and use it to define both IP addresses and OSPF distribution, for example:


Solution Configuration and Deployment Steps

Prerequisites

HW specifications must be identical for servers with the same role (Compute/Controller/etc.).

Server Preparation

For all servers, make sure that in BIOS settings:

  • SRIOV is enabled
  • Network boot is set on the interface connected to PXE network

NIC Preparation

SRIOV configuration is disabled on ConnectX-5 NICs by default and must be enabled for every NIC used by a Compute node.

In order to enable and configure it, insert the Compute NIC into a test server with installed OS, and follow the steps below:

  • Verify using ethtool that the firmware version is 16.21.2030 or newer:

    [root@host ~]# ethtool -i ens2f0
    driver: mlx5_core
    version: 5.0-0
    firmware-version: 16.22.1002 (MT_0000000009)
    expansion-rom-version:
    bus-info: 0000:07:00.0
    supports-statistics: yes
    supports-test: yes
    supports-eeprom-access: no
    supports-register-dump: no
    supports-priv-flags: yes

If it is older, download the new firmware and burn it as explained here.


  • Install the mstflint package:

    [root@host ~]# yum install mstflint
    
  • Identify the PCI ID of the first 100G port and enable SRIOV:

    [root@host ~]# lspci | grep -i mel
    07:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
    07:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
    [root@host ~]#
    [root@host ~]# mstconfig -d 0000:07:00.0 query | grep -i sriov
    SRIOV_EN False(0)
    SRIOV_IB_ROUTING_MODE_P1 GID(0)
    SRIOV_IB_ROUTING_MODE_P2 GID(0)
    [root@host ~]# mstconfig -d 0000:07:00.0 set SRIOV_EN=1
    Device #1:
    ----------
    
    Device type: ConnectX5
    PCI device: 0000:07:00.0
    
    Configurations: Next Boot New
    SRIOV_EN False(0) True(1)
    
    Apply new Configuration? ? (y/n) [n] : y
    Applying... Done!
    -I- Please reboot machine to load new configurations.
  • Set the number of VFs to a high value, such as 64, and reboot the server to apply new configuration:

    [root@host ~]# mstconfig -d 0000:07:00.0 query | grep -i vfs
    NUM_OF_VFS 0
    [root@host ~]# mstconfig -d 0000:07:00.0 set NUM_OF_VFS=64
    
    Device #1:
    ----------
    
    Device type: ConnectX5
    PCI device: 0000:07:00.0
    
    Configurations: Next Boot New
    NUM_OF_VFS 0 64
    
    Apply new Configuration? ? (y/n) [n] : y
    Applying... Done!
    -I- Please reboot machine to load new configurations.
    [root@host ~]# reboot
  • Confirm the new settings were applied using the mstconfig query commands shown above.
  • Insert the NIC back to the Compute node.
  • Repeat the procedure above for every Compute node NIC used in our setup.
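
When many NICs have to be prepared, the firmware check in the first step can be scripted. The sketch below is only an illustration: it parses text in the format printed by `ethtool -i` (as in the sample output above) and compares it with the minimum version named in this guide. The function name `parse_fw_version` is hypothetical.

```python
MIN_FW = (16, 21, 2030)  # minimum firmware version required by this guide


def parse_fw_version(ethtool_output):
    """Extract the firmware version from `ethtool -i` text as a tuple of ints."""
    for line in ethtool_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "firmware-version":
            # The value looks like "16.22.1002 (MT_0000000009)".
            version = value.split()[0]
            return tuple(int(part) for part in version.split("."))
    raise ValueError("firmware-version line not found")


# Sample text taken from the ethtool output shown above.
SAMPLE = """driver: mlx5_core
version: 5.0-0
firmware-version: 16.22.1002 (MT_0000000009)
bus-info: 0000:07:00.0"""

fw = parse_fw_version(SAMPLE)
print(fw, "is new enough" if fw >= MIN_FW else "needs a firmware update")
```

Comparing tuples of integers avoids the pitfall of comparing version strings lexically.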

 

Note

  • In our solution, the first port of the two 100G ports in every NIC is used for the ASAP² accelerated data plane. This is the reason we enable SRIOV only on the first ConnectX NIC PCI device (07:00.0 in the example above).
  • There are future plans to support an automated procedure to update and configure the NICs on the Compute nodes from the Undercloud.


Accelerated RH-OSP Installation and Deployment Steps

  • Install Red Hat 7.5 OS on the Undercloud server and set an IP on its interface connected to the External network; make sure it has internet connectivity.
  • Install the Undercloud and the director as instructed in section 4 of the Red Hat OSP DIRECTOR INSTALLATION AND USAGE guide: Director Installation and Usage - Red Hat Customer Portal
    • Our undercloud.conf file is attached as a reference.
  • Configure a container image source as instructed in section 5 of the guide.
    • Our solution is using undercloud as a local registry.
  • Register the nodes of the overcloud as instructed in section 6.1.
    • Our instackenv.json file is attached as a reference.
  • Inspect the hardware of the nodes as instructed in section 6.2.
    • Once introspection is completed, it is recommended to confirm for each node that the desired root disk was detected since cloud deployment can fail later because of insufficient disk space. Use the following command to check the free space on the detected disk selected as root:

      (undercloud) [stack@rhosp-director ~]$ openstack baremetal node show 92c4c1cb-ce7d-48d4-a2d9-75b2651db097 | grep properties
      
      | properties | {u'memory_mb': u'131072', u'cpu_arch': u'x86_64', u'local_gb': u'418', u'cpus': u'24', u'capabilities': u'boot_option:local'}
    • The “local_gb” value represents the disk size. If the disk size is lower than expected, use the procedure described in section 6.6 to define the root disk for the node. Note that an additional introspection cycle is required for this node after the root disk is changed.
    • Verify that all nodes were registered properly and changed their state to “available” before proceeding to the next step:

      +--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
      | UUID                                 | Name         | Instance UUID | Power State | Provisioning State | Maintenance |
      +--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
      | d1fca940-e341-491b-8afd-0cf6d748aa29 | controller-1 | None          | power off   | available          | False       |
      | 6b24d02c-3fd2-4e55-a730-c45008f01723 | controller-2 | None          | power off   | available          | False       |
      | 098c3e2d-1c70-41d2-983b-6c266387de0b | controller-3 | None          | power off   | available          | False       |
      | 91492c2a-b26c-49ef-9d4e-e492a1578076 | compute-1    | None          | power off   | available          | False       |
      | cdf9e0ec-e3cb-4005-86f6-d40e684a9b19 | compute-2    | None          | power off   | available          | False       |
      | 92c4c1cb-ce7d-48d4-a2d9-75b2651db097 | compute-3    | None          | power off   | available          | False       |
      | bb5e829a-834b-4eb1-b733-0012ce9d5f00 | compute-4    | None          | power off   | available          | False       |
      +--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
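
The root disk check described above can be automated for all nodes. The following is a minimal sketch that parses the `properties` value as printed in the example above; the threshold `MIN_ROOT_DISK_GB` and the function name are assumptions made for illustration, not values from the Red Hat guide.

```python
import ast

MIN_ROOT_DISK_GB = 100  # hypothetical threshold, adjust to your deployment


def root_disk_gb(properties_text):
    """Return local_gb from the properties dictionary text printed by the CLI."""
    # The CLI prints a Python-style dict with u'' strings; literal_eval accepts it.
    properties = ast.literal_eval(properties_text)
    return int(properties["local_gb"])


# Properties text taken from the command output shown above.
SAMPLE = (
    "{u'memory_mb': u'131072', u'cpu_arch': u'x86_64', u'local_gb': u'418', "
    "u'cpus': u'24', u'capabilities': u'boot_option:local'}"
)

size = root_disk_gb(SAMPLE)
print(f"root disk: {size} GB,", "ok" if size >= MIN_ROOT_DISK_GB else "too small")
```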
  • Tagging Nodes into Profiles
    • Tag the controller nodes into the default “control” profile:

      (undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:control,boot_option:local' controller-1
      (undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:control,boot_option:local' controller-2
      (undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:control,boot_option:local' controller-3
  • Create two new compute flavors -- one per rack (compute-r1, compute-r2) -- and attach the flavors to profiles with a correlated name:

    (undercloud) [stack@rhosp-director ~]$ openstack flavor create --id auto --ram 4096 --disk 40 --vcpus 1 compute-r1
    (undercloud) [stack@rhosp-director ~]$ openstack flavor set --property "capabilities:boot_option"="local" --property "capabilities:profile"="compute-r1" --property "resources:CUSTOM_BAREMETAL"="1" --property "resources:DISK_GB"="0" --property "resources:MEMORY_MB"="0" --property "resources:VCPU"="0" compute-r1
    
    (undercloud) [stack@rhosp-director ~]$ openstack flavor create --id auto --ram 4096 --disk 40 --vcpus 1 compute-r2
    (undercloud) [stack@rhosp-director ~]$ openstack flavor set --property "capabilities:boot_option"="local" --property "capabilities:profile"="compute-r2" --property "resources:CUSTOM_BAREMETAL"="1" --property "resources:DISK_GB"="0" --property "resources:MEMORY_MB"="0" --property "resources:VCPU"="0" compute-r2
  • Tag compute nodes 1,3 into the “compute-r1” profile to associate them with Rack 1, and compute nodes 2,4 into the “compute-r2” profile to associate them with Rack 2:

    (undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:compute-r1,boot_option:local' compute-1
    (undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:compute-r1,boot_option:local' compute-3
    (undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:compute-r2,boot_option:local' compute-2
    (undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:compute-r2,boot_option:local' compute-4
  • Verify profile tagging per node using the command below:

    (undercloud) [stack@rhosp-director ~]$ openstack overcloud profiles list
    +--------------------------------------+--------------+-----------------+-----------------+-------------------+
    | Node UUID                            | Node Name    | Provision State | Current Profile | Possible Profiles |
    +--------------------------------------+--------------+-----------------+-----------------+-------------------+
    | d1fca940-e341-491b-8afd-0cf6d748aa29 | controller-1 | available       | control         |                   |
    | 6b24d02c-3fd2-4e55-a730-c45008f01723 | controller-2 | available       | control         |                   |
    | 098c3e2d-1c70-41d2-983b-6c266387de0b | controller-3 | available       | control         |                   |
    | 91492c2a-b26c-49ef-9d4e-e492a1578076 | compute-1    | available       | compute-r1      |                   |
    | cdf9e0ec-e3cb-4005-86f6-d40e684a9b19 | compute-2    | available       | compute-r2      |                   |
    | 92c4c1cb-ce7d-48d4-a2d9-75b2651db097 | compute-3    | available       | compute-r1      |                   |
    | bb5e829a-834b-4eb1-b733-0012ce9d5f00 | compute-4    | available       | compute-r2      |                   |
    +--------------------------------------+--------------+-----------------+-----------------+-------------------+

    Note

    It is possible to tag the nodes into profiles in the instackenv.json file during node registration (section 6.1) instead of running the tag command per node; however, flavors and profiles must be created in any case.
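
For larger setups, the per-node tagging commands can be generated from a single node-to-profile mapping instead of being typed by hand. The sketch below only builds the command strings in the same form used above; the mapping reflects this example setup and the helper name `tag_command` is hypothetical.

```python
# Node name -> profile, matching the rack placement used in this example.
NODE_PROFILES = {
    "controller-1": "control",
    "controller-2": "control",
    "controller-3": "control",
    "compute-1": "compute-r1",
    "compute-3": "compute-r1",
    "compute-2": "compute-r2",
    "compute-4": "compute-r2",
}


def tag_command(node, profile):
    """Build the node tagging command for one node."""
    return (
        "openstack baremetal node set --property "
        f"capabilities='profile:{profile},boot_option:local' {node}"
    )


commands = [tag_command(node, profile) for node, profile in NODE_PROFILES.items()]
print("\n".join(commands))
```

The printed lines can be reviewed and then run on the undercloud as the stack user.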


Note

The configuration file examples in the following sections are partial and were employed to highlight specific sections. The full configuration files are attached to this document.


  • Role definitions:
    • Create the /home/stack/templates/ directory and generate inside it a new roles file (named roles_data.yaml) with two types of roles using the following commands:

      (undercloud) [stack@rhosp-director ~]$ mkdir /home/stack/templates
      (undercloud) [stack@rhosp-director ~]$ cd /home/stack/templates/
      (undercloud) [stack@rhosp-director templates]$ openstack overcloud roles generate -o roles_data.yaml Controller ComputeSriov
    • Edit the file by changing ComputeSriov to ComputeSriov1:

      ###############################################################################
      # Role: ComputeSriov1                                                         #
      ###############################################################################
      - name: ComputeSriov1
        description: |
          Compute SR-IOV Role R1
        CountDefault: 1
        networks:
          - InternalApi
          - Tenant
          - Storage
        HostnameFormatDefault: '%stackname%-computesriov1-%index%'
        disable_upgrade_deployment: True
        ServicesDefault:
    • Clone the entire ComputeSriov1 role section, change it to ComputeSriov2, and change its networks to represent the network set on the second rack:

      ###############################################################################
      # Role: ComputeSriov2                                                         #
      ###############################################################################
      - name: ComputeSriov2
        description: |
          Compute SR-IOV Role R2
        CountDefault: 1
        networks:
          - InternalApi_2
          - Tenant_2
          - Storage_2
        HostnameFormatDefault: '%stackname%-computesriov2-%index%'
        disable_upgrade_deployment: True
        ServicesDefault:
    • The roles_data.yaml file now includes 3 types of roles: Controller and ComputeSriov1, which are associated with the Rack 1 network set, and ComputeSriov2, which is associated with the Rack 2 network set.
    • The full configuration file is attached to this document for your convenience.


  • Environment File for Defining Node Counts and Flavors:
    • Create /home/stack/templates/node-info.yaml, as explained in section 6.7, and edit it to include the count per role and the correlated flavor per role.
    • Full configuration file:

      parameter_defaults:
        OvercloudControllerFlavor: control
        OvercloudComputeSriov1Flavor: compute-r1
        OvercloudComputeSriov2Flavor: compute-r2
        ControllerCount: 3
        ComputeSriov1Count: 2
        ComputeSriov2Count: 2
  • NVIDIA ConnectX NICs Listing
    • Run the following command to go over all registered nodes and identify the interface names of the dual-port ConnectX 100Gb/s NIC:

      (undercloud) [stack@rhosp-director templates]$ for node in $(openstack baremetal node list --fields uuid -f value) ; do openstack baremetal introspection interface list $node ; done
      .
      .
      +-----------+-------------------+----------------------+-------------------+----------------+
      | Interface | MAC Address       | Switch Port VLAN IDs | Switch Chassis ID | Switch Port ID |
      +-----------+-------------------+----------------------+-------------------+----------------+
      | eno1      | ec:b1:d7:83:11:b8 | []                   | 94:57:a5:25:fa:80 | 29             |
      | eno2      | ec:b1:d7:83:11:b9 | []                   | None              | None           |
      | eno3      | ec:b1:d7:83:11:ba | []                   | None              | None           |
      | eno4      | ec:b1:d7:83:11:bb | []                   | None              | None           |
      | ens1f1    | ec:0d:9a:7d:81:b3 | []                   | 24:8a:07:7f:ef:00 | Eth1/14        |
      | ens1f0    | ec:0d:9a:7d:81:b2 | []                   | 24:8a:07:7f:ef:00 | Eth1/1         |
      +-----------+-------------------+----------------------+-------------------+----------------+

      Note

      Names must be identical for all nodes, or at least for all nodes sharing the same role. In our case, it is ens2f0/ens2f1 in Controller nodes, and ens1f0/ens1f1 in Compute nodes.



  • HW Offload Configuration File
    • Locate /usr/share/openstack-tripleo-heat-templates/environments/ovs-hw-offload.yaml file and edit it according to the following guidelines per ComputeSriov role:
      • Set offload enabled
      • Set kernel args for huge pages
      • Set the desired interface for accelerated data plane (ens1f0 in our case)
      • Set the desired VF count (64 in our example)
      • Set the correct Nova PCI Passthrough devname, and physical_network: null
      • Set ExtraConfig for correlation between the role and the correct tenant/api network set
    • Full configuration file is attached to this document, see example below:

      # A Heat environment file that enables OVS Hardware Offload in the overcloud.
      # This works by configuring SR-IOV NIC with switchdev and OVS Hardware Offload on
      # compute nodes. The feature supported in OVS 2.8.0

      resource_registry:
        OS::TripleO::Services::NeutronSriovHostConfig: ../puppet/services/neutron-sriov-host-config.yaml

      parameter_defaults:

        NovaSchedulerDefaultFilters: ['RetryFilter','AvailabilityZoneFilter','RamFilter','ComputeFilter','ComputeCapabilitiesFilter','ImagePropertiesFilter','ServerGroupAntiAffinityFilter','ServerGroupAffinityFilter','PciPassthroughFilter']
        NovaSchedulerAvailableFilters: ["nova.scheduler.filters.all_filters","nova.scheduler.filters.pci_passthrough_filter.PciPassthroughFilter"]

        # Kernel arguments for ComputeSriov1 node
        ComputeSriov1Parameters:
          KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=16 intel_iommu=on iommu=pt"
          OvsHwOffload: True
          # Number of VFs that needs to be configured for a physical interface
          NeutronSriovNumVFs: ["ens1f0:64:switchdev"]
          # Mapping of SR-IOV PF interface to neutron physical_network.
          # In case of Vxlan/GRE physical_network should be null.
          # In case of flat/vlan the physical_network should as configured in neutron.
          NovaPCIPassthrough:
            - devname: "ens1f0"
              physical_network: null
          NovaReservedHostMemory: 4096
        # Extra config for mapping the ovs local_ip to the relevant tenant network
        ComputeSriov1ExtraConfig:
          nova::vncproxy::host: "%{hiera('internal_api')}"
          neutron::agents::ml2::ovs::local_ip: "%{hiera('tenant')}"

        # Kernel arguments for ComputeSriov2 node
        ComputeSriov2Parameters:
          KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=16 intel_iommu=on iommu=pt"
          OvsHwOffload: True
          # Number of VFs that needs to be configured for a physical interface
          NeutronSriovNumVFs: ["ens1f0:64:switchdev"]
          # Mapping of SR-IOV PF interface to neutron physical_network.
          # In case of Vxlan/GRE physical_network should be null.
          # In case of flat/vlan the physical_network should as configured in neutron.
          NovaPCIPassthrough:
            - devname: "ens1f0"
              physical_network: null
          NovaReservedHostMemory: 4096
        # Extra config for mapping the ovs local_ip to the relevant tenant network
        ComputeSriov2ExtraConfig:
          nova::vncproxy::host: "%{hiera('internal_api_2')}"
          neutron::agents::ml2::ovs::local_ip: "%{hiera('tenant_2')}"
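
The per-role values above have to agree with each other and with the NIC firmware settings made during NIC Preparation. The sketch below illustrates such a cross-check for one role. It is not part of TripleO: both function names are hypothetical, and the entry format `interface:count:mode` is read off the example above.

```python
def parse_num_vfs(entry):
    """Split an 'interface:count[:mode]' entry into (interface, count, mode)."""
    parts = entry.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected entry: {entry!r}")
    mode = parts[2] if len(parts) == 3 else None  # None means no mode was given
    return parts[0], int(parts[1]), mode


def offload_problems(num_vfs_entry, passthrough_devname, firmware_num_of_vfs):
    """Return a list of inconsistencies between the role parameters."""
    iface, count, mode = parse_num_vfs(num_vfs_entry)
    problems = []
    if mode != "switchdev":
        problems.append(f"{iface}: mode is {mode!r}, this guide uses 'switchdev'")
    if count > firmware_num_of_vfs:
        problems.append(f"{iface}: {count} VFs requested, firmware allows {firmware_num_of_vfs}")
    if iface != passthrough_devname:
        problems.append(f"NovaPCIPassthrough devname {passthrough_devname!r} differs from {iface!r}")
    return problems


# Values from the example: NeutronSriovNumVFs entry, devname, NUM_OF_VFS set with mstconfig.
problems = offload_problems("ens1f0:64:switchdev", "ens1f0", 64)
print(problems or "role parameters are consistent")
```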
  • Network Configuration File:
    • Locate /usr/share/openstack-tripleo-heat-templates/network_data.yaml file and edit it according to the following guidelines:
      • Set External network parameters (subnet, allocation pool, default GW).
      • Set the Rack 1 network set parameters to match the subnets/vlans configured on the Rack 1 Leaf switch.
      • Make sure you use the network names you specified in roles_data.yaml for the Controller/ComputeSriov1 role networks.
      • Create a second set of networks to match the subnets/vlans configured on Rack 2 Leaf switch.
      • Make sure you use the network names you specified in roles_data.yaml for ComputeSriov2 role networks.
      • Disable the “management” network, as it is not used in our example.
      • The configuration is based on the following matrix to match the Leaf switch configuration as executed in Network Configuration section above:

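        +----------------+-------------+------------------+-----------------+------+-------------------------+
        | Network Name   | Network Set | Network Location | Network Details | VLAN | Network Allocation Pool |
        +----------------+-------------+------------------+-----------------+------+-------------------------+
        | Storage        | 1           | Rack 1           | 172.16.0.0/24   | 11   | 172.16.0.100-250        |
        | Storage_Mgmt   | 1           | Rack 1           | 172.17.0.0/24   | 21   | 172.17.0.100-250        |
        | Internal API   | 1           | Rack 1           | 172.18.0.0/24   | 31   | 172.18.0.100-250        |
        | Tenant         | 1           | Rack 1           | 172.19.0.0/24   | 41   | 172.19.0.100-250        |
        | Storage_2      | 2           | Rack 2           | 172.16.2.0/24   | 12   | 172.16.2.100-250        |
        | Storage_Mgmt_2 | 2           | Rack 2           | 172.17.2.0/24   | 22   | 172.17.2.100-250        |
        | Internal API_2 | 2           | Rack 2           | 172.18.2.0/24   | 32   | 172.18.2.100-250        |
        | Tenant_2       | 2           | Rack 2           | 172.19.2.0/24   | 42   | 172.19.2.100-250        |
        | External       | -           | Public Switch    | 10.7.208.0/24   | -    | 10.7.208.10-21          |
        +----------------+-------------+------------------+-----------------+------+-------------------------+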

      • Full configuration file is attached to this document
      • Partial example for one of the configured networks (Storage network - 2 sets), External network and Management network configuration:

        - name: Storage
          vip: true
          vlan: 11
          name_lower: storage
          ip_subnet: '172.16.0.0/24'
          allocation_pools: [{'start': '172.16.0.100', 'end': '172.16.0.250'}]
          ipv6_subnet: 'fd00:fd00:fd00:1100::/64'
          ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:1100::10', 'end': 'fd00:fd00:fd00:1100:ffff:ffff:ffff:fffe'}]
        .
        .
        - name: Storage_2
          vip: true
          vlan: 12
          name_lower: storage_2
          ip_subnet: '172.16.2.0/24'
          allocation_pools: [{'start': '172.16.2.100', 'end': '172.16.2.250'}]
          ipv6_subnet: 'fd00:fd00:fd00:1200::/64'
          ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:1200::10', 'end': 'fd00:fd00:fd00:1200:ffff:ffff:ffff:fffe'}]
        .
        .
        - name: External
          vip: true
          name_lower: external
          vlan: 10
          ip_subnet: '10.7.208.0/24'
          allocation_pools: [{'start': '10.7.208.10', 'end': '10.7.208.21'}]
          gateway_ip: '10.7.208.1'
          ipv6_subnet: '2001:db8:fd00:1000::/64'
          ipv6_allocation_pools: [{'start': '2001:db8:fd00:1000::10', 'end': '2001:db8:fd00:1000:ffff:ffff:ffff:fffe'}]
          gateway_ipv6: '2001:db8:fd00:1000::1'

        - name: Management
          # Management network is enabled by default for backwards-compatibility, but
          # is not included in any roles by default. Add to role definitions to use.
          enabled: false
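
Since each allocation pool must fall inside its subnet and must not include the address already used by the Leaf switch or gateway, a quick check before deployment can catch typos. This is a minimal sketch with a hypothetical helper `pool_problems`; the sample entries are copied from the partial example above, and the reserved addresses come from the switch and gateway settings in this guide.

```python
import ipaddress


def pool_problems(name, subnet, start, end, reserved=()):
    """Return problems with one allocation pool; an empty list means it looks sane."""
    net = ipaddress.ip_network(subnet)
    first, last = ipaddress.ip_address(start), ipaddress.ip_address(end)
    problems = []
    if first not in net or last not in net:
        problems.append(f"{name}: pool {start}-{end} is not inside {subnet}")
    if first > last:
        problems.append(f"{name}: pool start {start} is after pool end {end}")
    for address in reserved:
        # Addresses already used by the switch or gateway must stay out of the pool.
        if first <= ipaddress.ip_address(address) <= last:
            problems.append(f"{name}: reserved address {address} is inside the pool")
    return problems


# (name, subnet, pool start, pool end, reserved addresses)
NETWORKS = [
    ("Storage", "172.16.0.0/24", "172.16.0.100", "172.16.0.250", ["172.16.0.1"]),
    ("Storage_2", "172.16.2.0/24", "172.16.2.100", "172.16.2.250", ["172.16.2.1"]),
    ("External", "10.7.208.0/24", "10.7.208.10", "10.7.208.21", ["10.7.208.1"]),
]

all_problems = [p for row in NETWORKS for p in pool_problems(*row)]
print(all_problems or "allocation pools are consistent")
```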
  • Deploying a plan from existing templates
    • Use the following command to create a plan called “asap-plan”:

      (undercloud) [stack@rhosp-director templates]$ openstack overcloud plan create --templates /usr/share/openstack-tripleo-heat-templates asap-plan
    • Create a dedicated folder and save the plan files into it:

      (undercloud) [stack@rhosp-director templates]$ mkdir /home/stack/asap-plan
      (undercloud) [stack@rhosp-director templates]$ cd /home/stack/asap-plan
      (undercloud) [stack@rhosp-director asap-plan]$ openstack container save asap-plan
  • Editing plan files to be used in deployment
    • Copy the following files into the /home/stack/templates directory
      • /home/stack/asap-plan/environments/network-environment.yaml
      • /home/stack/asap-plan/network/config/single-nic-vlans/controller.yaml
      • /home/stack/asap-plan/network/config/single-nic-vlans/computesriov1.yaml
      • /home/stack/asap-plan/network/config/single-nic-vlans/computesriov2.yaml
    • Edit /home/stack/templates/network-environment.yaml according to the following guidelines:
      • Set the role file locations under resource_registry section.
      • Set the Undercloud control plane IP as the default route for this network.
      • Set the required DNS servers for the setup nodes.
      • See example below. Full configuration file is attached to this document.

        #This file is an example of an environment file for defining the isolated
        #networks and related parameters.
        resource_registry:
          # Network Interface templates to use (these files must exist). You can
          # override these by including one of the net-*.yaml environment files,
          # such as net-bond-with-vlans.yaml, or modifying the list here.
          # Port assignments for the Controller
          OS::TripleO::Controller::Net::SoftwareConfig:
            /home/stack/templates/controller.yaml
          # Port assignments for the ComputeSriov1
          OS::TripleO::ComputeSriov1::Net::SoftwareConfig:
            /home/stack/templates/computesriov1.yaml
          # Port assignments for the ComputeSriov2
          OS::TripleO::ComputeSriov2::Net::SoftwareConfig:
            /home/stack/templates/computesriov2.yaml

        parameter_defaults:
          # This section is where deployment-specific configuration is done
          # CIDR subnet mask length for provisioning network
          ControlPlaneSubnetCidr: '24'
          # Gateway router for the provisioning network (or Undercloud IP)
          ControlPlaneDefaultRoute: 192.168.24.1
          .
          .
          # Define the DNS servers (maximum 2) for the overcloud nodes
          DnsServers: ["10.7.77.192","10.7.77.135"]
        
        
    • Edit /home/stack/templates/controller.yaml according to the following guidelines:
      • Set the location of run-os-net-config.sh script.
      • Set Supernet and GW per network to allow routing between network sets located in different racks. The GW is the IP address configured on the Leaf switch interface facing this network. The supernet and gateways for the 2 tenant networks appear in the example below.
      • Set type, networks and routes for each interface used by Controller nodes. In our example, we use for Controller nodes:
        • Dedicated 1G interface (type “interface”) for provisioning (PXE) network.
        • Dedicated 1G interface (type “ovs_bridge”) for External network. This network has a default GW configured.
        • Dedicated 100G interface (type “interface” without vlans) for data plane (Tenant) network in Rack 1. The network is associated with a supernet and has a route allowing it to reach other networks in the same supernet located in different racks.
        • Dedicated 100G interface (type “ovs_bridge”) with vlans for Storage/StorageMgmt/InternalApi networks in Rack 1. Each network is associated with a supernet and has a route allowing it to reach other networks in the same supernet located in different racks.
        • See example below. Full configuration file is attached to this document.

            TenantSupernet:
              default: '172.19.0.0/16'
              description: Supernet that contains Tenant subnets for all roles.
              type: string
            TenantGateway:
              default: '172.19.0.1'
              description: Router gateway on tenant network
              type: string
            Tenant_2Gateway:
              default: '172.19.2.1'
              description: Router gateway on tenant_2 network
              type: string
          .
          .
          resources:
            OsNetConfigImpl:
              type: OS::Heat::SoftwareConfig
              properties:
                group: script
                config:
                  str_replace:
                    template:
                      get_file: /usr/share/openstack-tripleo-heat-templates/network/scripts/run-os-net-config.sh
                    params:
                      $network_config:
                        network_config:
          .
          .
          # NIC 3 - Data Plane (Tenant net)
          - type: ovs_bridge
          name: br-sriov
          use_dhcp: false
          members:
          - type: interface
          name: ens2f0
          addresses:
          - ip_netmask:
          get_param: TenantIpSubnet
          routes:
          - ip_netmask:
          get_param: TenantSupernet
          next_hop:
          get_param: TenantGateway
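The supernet and gateway values can be sanity-checked before deployment. The following is a minimal sketch using Python's standard `ipaddress` module; it assumes the per-rack tenant subnets are 172.19.0.0/24 (Rack 1) and 172.19.2.0/24 (Rack 2), as they appear in the validation output of this guide. The function and dictionary names are illustrative only and are not part of the deployment templates.

```python
import ipaddress

# Values taken from the template excerpt above (TenantSupernet, TenantGateway,
# Tenant_2Gateway) and from the tenant subnets used in this guide.
TENANT_SUPERNET = ipaddress.ip_network("172.19.0.0/16")
RACK_TENANT_NETWORKS = {
    "rack1": ("172.19.0.0/24", "172.19.0.1"),  # TenantGateway
    "rack2": ("172.19.2.0/24", "172.19.2.1"),  # Tenant_2Gateway
}


def validate_rack(subnet, gateway):
    """Return True if the subnet lies inside the supernet and contains the gateway."""
    network = ipaddress.ip_network(subnet)
    inside_supernet = network.subnet_of(TENANT_SUPERNET)
    gateway_in_subnet = ipaddress.ip_address(gateway) in network
    return bool(inside_supernet and gateway_in_subnet)


results = {rack: validate_rack(*pair) for rack, pair in RACK_TENANT_NETWORKS.items()}
print(results)
```

A subnet outside 172.19.0.0/16, or a gateway that does not belong to its rack subnet, would make the corresponding entry `False`, which usually points to a typo in the template parameters.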
    • Edit /home/stack/templates/computesriov1.yaml according to the following guidelines:
      • Set the location of the run-os-net-config.sh script (not shown in the example below; see the example above or the full configuration file).
      • Set the Supernet and GW per network to allow routing between network sets located in different racks. The GW is the IP interface configured on the Leaf switch interface facing this network (not shown in the example below; see the example above or the full configuration file).
      • Set the type, networks and routes for each interface used by the Compute nodes in Rack 1. In our example, the ComputeSriov1 nodes use:
        • Dedicated 1G interface (type “interface”) for the provisioning (PXE) network.
        • Dedicated 100G interface (type “interface” without VLANs) for the data plane (Tenant) network in Rack 1. The network is associated with a supernet and has a route allowing it to reach other networks in the same supernet located in different racks.
        • Dedicated 100G interface (type “ovs_bridge”) with VLANs for the Storage/InternalApi networks in Rack 1. Each network is associated with a supernet and has a route allowing it to reach other networks in the same supernet located in different racks (not shown in the example below; see the full configuration file).

          # NIC 1 - Provisioning net
          - type: interface
            name: eno1
            use_dhcp: false
            dns_servers:
              get_param: DnsServers
            addresses:
            - ip_netmask:
                list_join:
                - /
                - - get_param: ControlPlaneIp
                  - get_param: ControlPlaneSubnetCidr
            routes:
            - ip_netmask: 169.254.169.254/32
              next_hop:
                get_param: EC2MetadataIp
            - default: true
              next_hop:
                get_param: ControlPlaneDefaultRoute
          
          
          # NIC 2 - ASAP² Data Plane (Tenant net)
          - type: ovs_bridge
          name: br-sriov
          use_dhcp: false
          members:
          - type: interface
          name: ens1f0
          addresses:
          - ip_netmask:
          get_param: TenantIpSubnet
          routes:
          - ip_netmask:
          get_param: TenantSupernet
          next_hop:
          get_param: TenantGateway
    • Edit /home/stack/templates/computesriov2.yaml according to the following guidelines:
      • Set the location of the run-os-net-config.sh script (not shown in the example below; see the example above or the full configuration file).
      • Set the Supernet and GW per network to allow routing between network sets located in different racks. The GW is the IP interface configured on the Leaf switch interface facing this network (not shown in the example below; see the example above or the full configuration file).
      • Set the type, networks and routes for each interface used by the Compute nodes in Rack 2. In our example, the ComputeSriov2 nodes use:
        • Dedicated 1G interface (type “interface”) for the provisioning (PXE) network (not shown in the example below; see the example above or the full configuration file).
        • Dedicated 100G interface (type “interface” without VLANs) for the data plane (Tenant) network in Rack 2. The network is associated with a supernet and has a route allowing it to reach other networks in the same supernet located in different racks.
        • Dedicated 100G interface (type “ovs_bridge”) with VLANs for the Storage/InternalApi networks in Rack 2. Each network is associated with a supernet and has a route allowing it to reach other networks in the same supernet located in different racks (not shown in the example below; see the example above or the full configuration file).
        • See the example below. The full configuration file is attached to this document.

          # NIC 2 - ASAP² Data Plane (Tenant net)
          - type: ovs_bridge
          name: br-sriov
          use_dhcp: false
          members:
          - type: interface
          name: ens1f0
          addresses:
          - ip_netmask:
          get_param: Tenant_2IpSubnet
          routes:
          - ip_netmask:
          get_param: TenantSupernet
          next_hop:
          get_param: Tenant_2Gateway
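To illustrate why a single supernet route is enough for inter-rack reachability, the sketch below models the routing decision of a Rack 2 compute node as a longest-prefix match. It assumes the node holds a directly connected 172.19.2.0/24 tenant subnet plus the static TenantSupernet route via Tenant_2Gateway from the example above; the `next_hop` helper is hypothetical and only mimics what the kernel routing table does.

```python
import ipaddress


def next_hop(destination, routes):
    """Return the next hop of the most specific route matching the destination.

    routes is a list of (prefix, next_hop) pairs; None is returned if nothing matches.
    """
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # Longest prefix wins, as in a regular IP routing table.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


# Tenant routing view of a Rack 2 compute node.
rack2_routes = [
    ("172.19.2.0/24", "directly connected"),
    ("172.19.0.0/16", "172.19.2.1"),  # TenantSupernet via Tenant_2Gateway
]

print(next_hop("172.19.2.108", rack2_routes))  # peer in the same rack
print(next_hop("172.19.0.104", rack2_routes))  # peer in Rack 1
```

Traffic to a node in the same rack stays on the local subnet, while traffic to any other tenant subnet inside the supernet is handed to the Leaf switch gateway of the local rack.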
  • Deploying the overcloud
    • We are now ready to deploy an overcloud based on our customized configuration files.
    • The cloud will be deployed with:
      • 3 Controller nodes associated with Rack 1 networks
      • 2 Compute nodes associated with Rack 1 networks with ASAP² OVS HW offload
      • 2 Compute nodes associated with Rack 2 networks with ASAP² OVS HW offload
      • Routes to allow connectivity between racks/networks
      • VXLAN overlay tunnels between all the nodes
    • Before starting the deployment, verify connectivity over the OSPF underlay fabric between the Leaf switch VLAN interfaces that face the nodes in each rack. Without inter-rack connectivity for all networks, the overcloud deployment will fail.
    • To start the overcloud deployment, issue the command below. Note the custom environment files.


Note

  • Do not change the order of the environment files.
  • Make sure that the NTP server specified in the deploy command is reachable and can provide time to the undercloud node.
  • The overcloud_images.yaml file used in the deploy command is created during undercloud installation. Verify that it exists in the specified location.
  • The network-isolation.yaml file specified in the deploy command is created automatically during deployment from its j2.yaml template file.
(undercloud) [stack@rhosp-director templates]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates \
--libvirt-type kvm \
-n /usr/share/openstack-tripleo-heat-templates/network_data.yaml \
-r /home/stack/templates/roles_data.yaml \
--timeout 90 \
--validation-warnings-fatal \
--ntp-server 0.asia.pool.ntp.org \
-e /home/stack/templates/node-info.yaml \
-e /home/stack/templates/overcloud_images.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/templates/network-environment.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ovs-hw-offload.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/host-config-and-reboot.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/disable-telemetry.yaml
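The notes above can be turned into a small pre-flight check. The following is a minimal sketch, not part of the deployment tooling: it lists the environment files in the order used by the deploy command and reports those missing on disk. The `missing_files` helper and the `generated` flag are illustrative assumptions; network-isolation.yaml is skipped because it is rendered from its j2.yaml template during deployment.

```python
import os

# Environment files in the order passed with -e to "openstack overcloud deploy".
# The second field marks files that are generated during deployment and
# therefore cannot be checked in advance.
ENV_FILES = [
    ("/home/stack/templates/node-info.yaml", False),
    ("/home/stack/templates/overcloud_images.yaml", False),
    ("/usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml", True),
    ("/home/stack/templates/network-environment.yaml", False),
    ("/usr/share/openstack-tripleo-heat-templates/environments/ovs-hw-offload.yaml", False),
    ("/usr/share/openstack-tripleo-heat-templates/environments/host-config-and-reboot.yaml", False),
    ("/usr/share/openstack-tripleo-heat-templates/environments/disable-telemetry.yaml", False),
]


def missing_files(env_files, exists=os.path.isfile):
    """Return the paths that should already exist but do not."""
    return [path for path, generated in env_files if not generated and not exists(path)]


if __name__ == "__main__":
    for path in missing_files(ENV_FILES):
        print("missing environment file:", path)
```

Run on the undercloud node as the stack user, the script prints nothing when all pre-existing files are in place.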


  • Overcloud VXLAN Configuration Validation
    • Once the cloud is deployed, log in to the overcloud nodes and verify that VXLAN tunnels have been established between each node and the rest of the overcloud nodes over the routed Tenant networks.
    • The example below shows VXLAN tunnels maintained at the OVS level between a node located in Rack 2 (tenant network 172.19.2.0/24) and all the nodes located in Rack 1 (tenant network 172.19.0.0/24), in addition to the other node located in its own rack.

      (undercloud) [stack@rhosp-director ~]$ openstack server list
      +--------------------------------------+---------------------------+--------+------------------------+----------------+------------+
      | ID | Name | Status | Networks | Image | Flavor |
      +--------------------------------------+---------------------------+--------+------------------------+----------------+------------+
      | 35d3b3b6-b867-4408-bfc3-b3d25395450d | overcloud-controller-0 | ACTIVE | ctlplane=192.168.24.19 | overcloud-full | control |
      | 0af372ed-4c5c-41fb-882a-c8a61cc01ba9 | overcloud-controller-1 | ACTIVE | ctlplane=192.168.24.20 | overcloud-full | control |
      | 3c189bb9-fd2f-451d-b2f8-4d17d7fa0381 | overcloud-computesriov1-1 | ACTIVE | ctlplane=192.168.24.13 | overcloud-full | compute-r1 |
      | 7eebc6f0-95af-4ec6-bf44-4db817bc4029 | overcloud-computesriov2-1 | ACTIVE | ctlplane=192.168.24.6 | overcloud-full | compute-r2 |
      | ebc7c38b-6221-45c9-b5ca-98023f5bbebc | overcloud-controller-2 | ACTIVE | ctlplane=192.168.24.17 | overcloud-full | control |
      | 7c700b2c-6a9f-480f-ada5-11866a891f04 | overcloud-computesriov2-0 | ACTIVE | ctlplane=192.168.24.12 | overcloud-full | compute-r2 |
      | 971e8651-4059-42b9-834d-74449007343d | overcloud-computesriov1-0 | ACTIVE | ctlplane=192.168.24.11 | overcloud-full | compute-r1 |
      +--------------------------------------+---------------------------+--------+------------------------+----------------+------------+
      
      (undercloud) [stack@rhosp-director ~]$ ssh heat-admin@192.168.24.12
      [heat-admin@overcloud-computesriov2-0 ~]$ sudo su
      [root@overcloud-computesriov2-0 heat-admin]# ovs-vsctl show
      .
      .
      Bridge br-tun
          Controller "tcp:127.0.0.1:6633"
              is_connected: true
          fail_mode: secure
          Port "vxlan-ac130068"
              Interface "vxlan-ac130068"
                  type: vxlan
                  options: {df_default="true", in_key=flow, local_ip="172.19.2.102", out_key=flow, remote_ip="172.19.0.104"}
          Port "vxlan-ac130070"
              Interface "vxlan-ac130070"
                  type: vxlan
                  options: {df_default="true", in_key=flow, local_ip="172.19.2.102", out_key=flow, remote_ip="172.19.0.112"}
          Port "vxlan-ac13026c"
              Interface "vxlan-ac13026c"
                  type: vxlan
                  options: {df_default="true", in_key=flow, local_ip="172.19.2.102", out_key=flow, remote_ip="172.19.2.108"}
          Port "vxlan-ac13006b"
              Interface "vxlan-ac13006b"
                  type: vxlan
                  options: {df_default="true", in_key=flow, local_ip="172.19.2.102", out_key=flow, remote_ip="172.19.0.107"}
          Port br-tun
              Interface br-tun
                  type: internal
          Port patch-int
              Interface patch-int
                  type: patch
                  options: {peer=patch-tun}
          Port "vxlan-ac130064"
              Interface "vxlan-ac130064"
                  type: vxlan
                  options: {df_default="true", in_key=flow, local_ip="172.19.2.102", out_key=flow, remote_ip="172.19.0.100"}
          Port "vxlan-ac130065"
              Interface "vxlan-ac130065"
                  type: vxlan
                  options: {df_default="true", in_key=flow, local_ip="172.19.2.102", out_key=flow, remote_ip="172.19.0.101"}
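The tunnel port names in the output above encode the remote tunnel endpoint: the suffix after `vxlan-` is the remote IPv4 address written as eight hexadecimal digits (for example, `ac130068` is 172.19.0.104). The sketch below decodes the port names from this output and maps each peer to its rack. It assumes the per-rack tenant subnets used in this guide; the helper names are illustrative.

```python
import ipaddress

# Per-rack tenant subnets used in this guide.
RACK_SUBNETS = {
    "rack1": ipaddress.ip_network("172.19.0.0/24"),
    "rack2": ipaddress.ip_network("172.19.2.0/24"),
}

# Tunnel port names copied from the ovs-vsctl output above.
TUNNEL_PORTS = [
    "vxlan-ac130068", "vxlan-ac130070", "vxlan-ac13026c",
    "vxlan-ac13006b", "vxlan-ac130064", "vxlan-ac130065",
]


def remote_ip_from_port(port_name):
    """Decode the remote tunnel IP from a 'vxlan-<8 hex digits>' port name."""
    prefix, _, hex_ip = port_name.partition("-")
    if prefix != "vxlan" or len(hex_ip) != 8:
        raise ValueError("unexpected tunnel port name: " + port_name)
    return str(ipaddress.ip_address(int(hex_ip, 16)))


def rack_of(ip):
    """Return the rack whose tenant subnet contains the IP, or None."""
    address = ipaddress.ip_address(ip)
    for rack, subnet in RACK_SUBNETS.items():
        if address in subnet:
            return rack
    return None


for port in TUNNEL_PORTS:
    ip = remote_ip_from_port(port)
    print(port, ip, rack_of(ip))
```

For the node shown above, five of the decoded peers fall in the Rack 1 subnet and one in the Rack 2 subnet, which matches the `remote_ip` values in the output.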
  • Overcloud Host Aggregate Configuration
    • To be able to specify the target rack when creating a VM, first configure a Host Aggregate per rack.
    • Log in to the Overcloud dashboard and create a Host Aggregate for Rack 1. Add Compute nodes 1,3 to it. You can identify the relevant hypervisors by their hostnames, which indicate their role/rack location.
    • Create a Host Aggregate for Rack 2 and add Compute nodes 2,4 to it.
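Since the hostname indicates the role and rack, the grouping of hypervisors into aggregates can be derived mechanically. The sketch below is an illustration only: it assumes the `overcloud-computesriov<rack>-<index>` naming seen in the server list earlier in this guide, and the aggregate names `rack1`/`rack2` are hypothetical.

```python
import re
from collections import defaultdict

# Node names copied from the "openstack server list" output in this guide.
HOSTNAMES = [
    "overcloud-controller-0", "overcloud-controller-1", "overcloud-controller-2",
    "overcloud-computesriov1-0", "overcloud-computesriov1-1",
    "overcloud-computesriov2-0", "overcloud-computesriov2-1",
]


def group_compute_by_rack(hostnames):
    """Map a hypothetical aggregate name (rackN) to its compute hostnames."""
    racks = defaultdict(list)
    for name in hostnames:
        # An optional domain suffix (e.g. ".localdomain") is tolerated.
        match = re.fullmatch(r"overcloud-computesriov(\d+)-\d+(\..*)?", name)
        if match:
            racks["rack" + match.group(1)].append(name)
    return {rack: sorted(hosts) for rack, hosts in sorted(racks.items())}


print(group_compute_by_rack(HOSTNAMES))
```

Controller nodes are ignored because only hypervisors are added to Host Aggregates.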






  • Overcloud Instance Creation with ASAP²-based Ports
    • Create a Flavor as desired.
    • Upload an Image – use an updated OS image which includes the latest NVIDIA Network drivers.
    • Create a VXLAN overlay private network to be used by the instances (using the CLI):

      (undercloud) [stack@rhosp-director ~]$ source overcloudrc
      (overcloud) [stack@rhosp-director ~]$ openstack network create private --provider-network-type vxlan --share
    • Create a subnet and assign it to the private network:

      (overcloud) [stack@rhosp-director ~]$ openstack subnet create private_subnet --dhcp --network private --subnet-range 11.11.11.0/24
      
    • Create 2 direct ports with ASAP² capabilities (use CLI commands only). Each port will be used by a VM in a different rack:

      (overcloud) [stack@rhosp-director ~]$ openstack port create direct1 --vnic-type=direct --network private --binding-profile '{"capabilities":["switchdev"]}'
      (overcloud) [stack@rhosp-director ~]$ openstack port create direct2 --vnic-type=direct --network private --binding-profile '{"capabilities":["switchdev"]}'
    • Spawn an instance with an ASAP² port on each rack. Attach only the allocated ports, without an allocated network, as shown below:
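As a CLI-oriented sketch of the port and instance steps, the snippet below assembles the corresponding `openstack` command lines as argument lists, which avoids shell-quoting mistakes in the JSON binding profile. Several names are assumptions and not taken from this guide: the flavor `asap2-flavor`, the image `asap2-image`, the instance name `vm1`, the `<direct1-port-id>` placeholder, and the availability zone `rack1`, which presumes that each Host Aggregate was created with a matching availability zone.

```python
import json
import shlex


def port_create_cmd(name, network):
    """Build the 'openstack port create' command for a switchdev-capable direct port."""
    profile = json.dumps({"capabilities": ["switchdev"]}, separators=(",", ":"))
    return ["openstack", "port", "create", name, "--vnic-type=direct",
            "--network", network, "--binding-profile", profile]


def server_create_cmd(name, port_id, flavor, image, zone):
    """Build an 'openstack server create' command that attaches only the given port."""
    return ["openstack", "server", "create", name,
            "--flavor", flavor, "--image", image,
            "--nic", "port-id=" + port_id,
            "--availability-zone", zone]


def render(cmd):
    """Render an argument list as a copy-paste-safe shell command line."""
    return " ".join(shlex.quote(arg) for arg in cmd)


print(render(port_create_cmd("direct1", "private")))
# Hypothetical flavor, image, port ID placeholder and zone names.
print(render(server_create_cmd("vm1", "<direct1-port-id>", "asap2-flavor", "asap2-image", "rack1")))
```

The first rendered line reproduces the port creation command used above; the second should be adapted to the flavor, image and aggregate names of the actual environment.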











  • OVS ASAP² Offload Validation
    • Ping or run traffic between the instances. The traffic goes over the OVS VXLAN overlay network and is accelerated by ASAP² HW offload in the NIC.
    • SSH into the Compute nodes hosting the instances and issue the following command to see the accelerated bi-directional traffic flows that were offloaded to the NIC using ASAP². The output below shows one flow per direction:

      [root@overcloud-computesriov2-0 heat-admin]# ovs-dpctl dump-flows type=offloaded --name
      in_port(eth3),eth(src=fa:16:3e:15:e5:a8,dst=fa:16:3e:01:b3:aa),eth_type(0x0800),ipv4(frag=no), packets:1764662605, bytes:194112828502, used:0.470s, actions:set(tunnel(tun_id=0xf,src=172.19.2.102,dst=172.19.0.104,tp_dst=4789,flags(key))),vxlan_sys_4789
      
      tunnel(tun_id=0xf,src=172.19.0.104,dst=172.19.2.102,tp_dst=4789,flags(+key)),in_port(vxlan_sys_4789),eth(src=fa:16:3e:01:b3:aa,dst=fa:16:3e:15:e5:a8),eth_type(0x0800),ipv4(frag=no), packets:1760910540, bytes:105654631616, used:0.470s, actions:eth3
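When many flows are listed, it can help to summarize the dump programmatically. The following is a minimal parsing sketch built around the two lines shown above; it is not an official tool, and the `parse_flow` helper with its `encap`/`decap` labels is an illustrative assumption. A flow whose actions set tunnel metadata is treated as VXLAN encapsulation, and a flow received on the `vxlan_sys_4789` device as decapsulation.

```python
import re

# The two offloaded flows copied from the output above.
FLOWS = [
    "in_port(eth3),eth(src=fa:16:3e:15:e5:a8,dst=fa:16:3e:01:b3:aa),eth_type(0x0800),"
    "ipv4(frag=no), packets:1764662605, bytes:194112828502, used:0.470s, "
    "actions:set(tunnel(tun_id=0xf,src=172.19.2.102,dst=172.19.0.104,tp_dst=4789,"
    "flags(key))),vxlan_sys_4789",
    "tunnel(tun_id=0xf,src=172.19.0.104,dst=172.19.2.102,tp_dst=4789,flags(+key)),"
    "in_port(vxlan_sys_4789),eth(src=fa:16:3e:01:b3:aa,dst=fa:16:3e:15:e5:a8),"
    "eth_type(0x0800),ipv4(frag=no), packets:1760910540, bytes:105654631616, "
    "used:0.470s, actions:eth3",
]


def parse_flow(line):
    """Extract the input port, counters and direction from one dump-flows line."""
    in_port = re.search(r"in_port\(([^)]+)\)", line).group(1)
    packets = int(re.search(r"packets:(\d+)", line).group(1))
    byte_count = int(re.search(r"bytes:(\d+)", line).group(1))
    actions = line.split("actions:", 1)[1].strip()
    if "set(tunnel(" in actions:
        direction = "encap"  # VM traffic pushed into the VXLAN tunnel
    elif in_port.startswith("vxlan_sys"):
        direction = "decap"  # tunnel traffic delivered to the VM port
    else:
        direction = "other"
    return {"in_port": in_port, "packets": packets, "bytes": byte_count,
            "direction": direction}


flows = [parse_flow(line) for line in FLOWS]
both_directions = {flow["direction"] for flow in flows} >= {"encap", "decap"}
for flow in flows:
    print(flow)
print("offloaded in both directions:", both_directions)
```

Seeing both an encapsulation and a decapsulation flow with growing packet counters indicates that traffic is offloaded in both directions.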



Configuration Files

config_files.zip

Performance Benchmarks 

For benchmark tests between instances in an OpenStack environment utilizing OVS ASAP² acceleration, refer to:

Quick Start Guide: ASAP² technology performance evaluation on Red Hat OpenStack Platform 13.



Notice

This document is provided for information purposes only and shall not be regarded as a warranty of a certain functionality, condition, or quality of a product. Neither NVIDIA Corporation nor any of its direct or indirect subsidiaries and affiliates (collectively: “NVIDIA”) make any representations or warranties, expressed or implied, as to the accuracy or completeness of the information contained in this document and assumes no responsibility for any errors contained herein. NVIDIA shall have no liability for the consequences or use of such information or for any infringement of patents or other rights of third parties that may result from its use. This document is not a commitment to develop, release, or deliver any Material (defined below), code, or functionality.
NVIDIA reserves the right to make corrections, modifications, enhancements, improvements, and any other changes to this document, at any time without notice.
Customer should obtain the latest relevant information before placing orders and should verify that such information is current and complete.
NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the time of order acknowledgement, unless otherwise agreed in an individual sales agreement signed by authorized representatives of NVIDIA and customer (“Terms of Sale”). NVIDIA hereby expressly objects to applying any customer general terms and conditions with regards to the purchase of the NVIDIA product referenced in this document. No contractual obligations are formed either directly or indirectly by this document.
NVIDIA products are not designed, authorized, or warranted to be suitable for use in medical, military, aircraft, space, or life support equipment, nor in applications where failure or malfunction of the NVIDIA product can reasonably be expected to result in personal injury, death, or property or environmental damage. NVIDIA accepts no liability for inclusion and/or use of NVIDIA products in such equipment or applications and therefore such inclusion and/or use is at customer’s own risk.
NVIDIA makes no representation or warranty that products based on this document will be suitable for any specified use. Testing of all parameters of each product is not necessarily performed by NVIDIA. It is customer’s sole responsibility to evaluate and determine the applicability of any information contained in this document, ensure the product is suitable and fit for the application planned by customer, and perform the necessary testing for the application in order to avoid a default of the application or the product. Weaknesses in customer’s product designs may affect the quality and reliability of the NVIDIA product and may result in additional or different conditions and/or requirements beyond those contained in this document. NVIDIA accepts no liability related to any default, damage, costs, or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this document or (ii) customer product designs.
No license, either expressed or implied, is granted under any NVIDIA patent right, copyright, or other NVIDIA intellectual property right under this document. Information published by NVIDIA regarding third-party products or services does not constitute a license from NVIDIA to use such products or services or a warranty or endorsement thereof. Use of such information may require a license from a third party under the patents or other intellectual property rights of the third party, or a license from NVIDIA under the patents or other intellectual property rights of NVIDIA.
Reproduction of information in this document is permissible only if approved in advance by NVIDIA in writing, reproduced without alteration and in full compliance with all applicable export laws and regulations, and accompanied by all associated conditions, limitations, and notices.
THIS DOCUMENT AND ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, “MATERIALS”) ARE BEING PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL NVIDIA BE LIABLE FOR ANY DAMAGES, INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING OUT OF ANY USE OF THIS DOCUMENT, EVEN IF NVIDIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Notwithstanding any damages that customer might incur for any reason whatsoever, NVIDIA’s aggregate and cumulative liability towards customer for the products described herein shall be limited in accordance with the Terms of Sale for the product.

Trademarks
NVIDIA, the NVIDIA logo, and Mellanox are trademarks and/or registered trademarks of NVIDIA Corporation and/or Mellanox Technologies Ltd. in the U.S. and in other countries. Other company and product names may be trademarks of the respective companies with which they are associated.

Copyright
© 2022 NVIDIA Corporation & affiliates. All Rights Reserved.