Introduction
The Red Hat OpenStack Platform solution allows cloud and communication service providers to increase efficiency and agility while reducing operational costs.
In this Reference Deployment Guide (RDG) we demonstrate the complete deployment process of Red Hat OpenStack Platform 13 as a Network Functions Virtualization Infrastructure (NFVI) with NVIDIA ASAP²-based OVS hardware offload, achieving a high-throughput SR-IOV data path while keeping the existing Open vSwitch control path and VXLAN connectivity.
We cover the setup components, scale considerations and other technological aspects, including the hardware BoM, the network topology and the steps to validate VXLAN traffic offload between virtualized network functions (VNFs).
Before you start, it is highly recommended to become familiar with the OVS hardware offload (ASAP²) technology, which is introduced as an inbox feature in RH-OSP 13.
You are welcome to watch this 6-minute video: Accelerated Switch and Packet Processing.
References
- Director Installation and Usage - Red Hat Customer Portal
- Using Composable Networks - Red Hat Customer Portal
- Quick Start Guide: ASAP² technology performance evaluation on Red Hat OpenStack Platform 13.
Components Overview
- NVIDIA Spectrum Switch family provides the most efficient network solutions for the ever-increasing performance demands of data center applications.
- NVIDIA ConnectX Network Adapter family delivers industry-leading connectivity for performance-driven server and storage applications. ConnectX adapter cards enable high bandwidth, coupled with ultra-low latency for diverse applications and systems, resulting in faster access and real-time responses.
- NVIDIA Accelerated Switching and Packet Processing (ASAP²) technology combines the performance and efficiency of server/storage networking hardware with the flexibility of virtual switching software. ASAP² offers up to 10 times better performance than non-offloaded OVS solutions, delivering software-defined networks with the highest total infrastructure efficiency, deployment flexibility and operational simplicity. (Introduced starting in ConnectX-4 Lx NICs.)
- NVIDIA NEO™ is a powerful platform for managing computing networks. It enables data center operators to efficiently provision, monitor and operate the modern data center fabric.
- NVIDIA LinkX Cables and Transceivers family provides the industry’s most complete line of 10, 25, 40, 50, 100, 200, and 400Gb/s interconnect products for cloud, Web 2.0, enterprise, telco and storage data center applications. They are often used to link top-of-rack switches down to servers, storage and appliances, and up in switch-to-switch applications.
Solution Overview
Solution Design
- RH-OSP13 cloud is deployed in large scale over multiple racks interconnected via Spine/Leaf network architecture.
- Each Compute/Controller node is equipped with a dual-port 100Gb/s NIC, with one port dedicated to tenant data traffic and the other to storage and control traffic.
- Composable custom networks are used for network isolation between the racks. In our case, an L3 OSPF underlay is used to route between the networks; however, another fabric infrastructure can be used as desired.
- ASAP²-enabled Compute nodes are located in different racks and maintain VXLAN tunnels as an overlay for tenant VM traffic.
- OVS ASAP² data plane acceleration is used by the Compute nodes to offload the CPU-intensive VXLAN traffic, avoiding the encapsulation/decapsulation performance penalty and achieving high throughput.
- Switches are configured and managed by NEO.
- OpenStack Neutron is used as an SDN controller.
Bill of Materials
Note
- The BoM above refers to the maximal configuration at large scale, with a blocking ratio of 3:1.
- The blocking ratio can be changed to obtain a different capacity.
- The SN2100 switch shares the same feature set as the SN2700 and can be used in this solution when lower capacity is required.
- The 2-Rack BOM will be used in the solution example described below.
Large Scale Overview
Maximal Scale Diagram
Solution Example
We have chosen the key features below as a baseline to demonstrate the accelerated RH-OSP solution.
Solution Scale
- 2 x racks with a custom network set per rack
- 2 x SN2700 switches as Spine switches
- 2 x SN2100 switches as Leaf switches, 1 per rack
- 5 nodes in rack 1 (3 x Controller, 2 x Compute)
- 2 nodes in rack 2 (2 x Compute)
- All nodes are connected to the Leaf switches using 2 x 100Gb/s ports per node
- Each Leaf switch is connected to each Spine switch using a single 100Gb/s port
Physical Diagram
Network Diagram
Note
- The Storage network is configured, however no storage nodes are used.
- Compute nodes reach the external network via the Undercloud node.
Important!
The configuration steps below refer to a solution example based on 2 racks.
Network Configuration Steps
Physical Configuration
- Connect the switches to the switch management network
- Interconnect the switches using 100Gb/s cables
Connect the Controller/Compute servers to the relevant networks according to the following table:
| Role | Leaf Switch Location |
|---|---|
| Controller 1 | Rack 1 |
| Controller 2 | Rack 1 |
| Controller 3 | Rack 1 |
| Compute 1 | Rack 1 |
| Compute 2 | Rack 2 |
| Compute 3 | Rack 1 |
| Compute 4 | Rack 2 |
- Connect the Undercloud Director server to the IPMI/PXE/External networks.
OSPF Configuration
Interface Configuration
Set VLANs and VLAN interfaces on the Leaf switches according to the following table:
| Network Name | Network Set | Leaf Switch Location | Network Details | Switch Interface IP | VLAN ID | Switchport Mode |
|---|---|---|---|---|---|---|
| Storage | 1 | Rack 1 | 172.16.0.0/24 | 172.16.0.1 | 11 | hybrid |
| Storage_Mgmt | 1 | Rack 1 | 172.17.0.0/24 | 172.17.0.1 | 21 | hybrid |
| Internal API | 1 | Rack 1 | 172.18.0.0/24 | 172.18.0.1 | 31 | hybrid |
| Tenant | 1 | Rack 1 | 172.19.0.0/24 | 172.19.0.1 | 41 | access |
| Storage_2 | 2 | Rack 2 | 172.16.2.0/24 | 172.16.2.1 | 12 | hybrid |
| Storage_Mgmt_2 | 2 | Rack 2 | 172.17.2.0/24 | 172.17.2.1 | 22 | hybrid |
| Internal API_2 | 2 | Rack 2 | 172.18.2.0/24 | 172.18.2.1 | 32 | hybrid |
| Tenant_2 | 2 | Rack 2 | 172.19.2.0/24 | 172.19.2.1 | 42 | access |
- Use NVIDIA NEO to provision the VLANs and interfaces via the pre-defined Provisioning Tasks:
- Add-VLAN to create the VLANs and set names.
- Set-Access-VLAN-Port to set access VLAN on the tenant network ports.
- Set-Hybrid-Vlan-Port to allow the required VLANs on the Storage/Storage_Mgmt/Internal API network ports.
- Add-VLAN-To-OSPF-Area for distribution of the networks over OSPF.
- Add VLAN IP Address to set IP per VLAN (currently no pre-defined template).
For example, to set port hybrid mode and allowed VLANs via the pre-defined Provisioning Tasks:
Note that since there is currently no pre-defined provisioning template for configuring the VLAN interface IP address, you can manually add the IP configuration to the “Add-VLAN-To-OSPF-Area” template and use it to define both the IP addresses and the OSPF distribution, for example:
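For reference, below is a minimal sketch of the switch-side commands such a template would end up applying for the Rack 1 Tenant network (VLAN 41, gateway 172.19.0.1/24, OSPF area 0.0.0.0). The exact CLI syntax varies between switch OS versions, so treat it as illustrative only:
switch (config) # protocol ospf
switch (config) # vlan 41
switch (config vlan 41) # exit
switch (config) # interface vlan 41
switch (config interface vlan 41) # ip address 172.19.0.1 255.255.255.0
switch (config interface vlan 41) # ip ospf area 0.0.0.0
switch (config interface vlan 41) # exit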
Solution Configuration and Deployment Steps
Prerequisites
HW Specifications must be identical for servers with the same role (Compute/Controller/etc.)
Server Preparation
For all servers, make sure that in the BIOS settings:
- SRIOV is enabled
- Network boot is set on the interface connected to the PXE network
NIC Preparation
SRIOV configuration is disabled on ConnectX-5 NICs by default and must be enabled for every NIC used by a Compute node.
To enable and configure it, insert the Compute NIC into a test server with an installed OS and follow the steps below:
Verify that the firmware version is 16.21.2030 or newer:
[root@host ~]# ethtool -i ens2f0
driver: mlx5_core
version: 5.0-0
firmware-version: 16.22.1002 (MT_0000000009)
expansion-rom-version:
bus-info: 0000:07:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
If it is older, download a newer firmware image and burn it as explained here.
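For reference, a minimal sketch of burning a downloaded firmware image with mstflint (installed in the next step); the image file name below is a placeholder:
[root@host ~]# mstflint -d 0000:07:00.0 -i fw-ConnectX5-new.bin burn
[root@host ~]# reboot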
Install the mstflint package:
[root@host ~]# yum install mstflint
Identify the PCI ID of the first 100G port and enable SRIOV:
[root@host ~]# lspci | grep -i mel
07:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
07:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
[root@host ~]#
[root@host ~]# mstconfig -d 0000:07:00.0 query | grep -i sriov
         SRIOV_EN                            False(0)
         SRIOV_IB_ROUTING_MODE_P1            GID(0)
         SRIOV_IB_ROUTING_MODE_P2            GID(0)
[root@host ~]# mstconfig -d 0000:07:00.0 set SRIOV_EN=1

Device #1:
----------

Device type:    ConnectX5
PCI device:     0000:07:00.0

Configurations:                              Next Boot       New
         SRIOV_EN                            False(0)        True(1)

 Apply new Configuration? ? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.
Set the number of VFs to a high value, such as 64, and reboot the server to apply the new configuration:
[root@host ~]# mstconfig -d 0000:07:00.0 query | grep -i vfs
         NUM_OF_VFS                          0
[root@host ~]# mstconfig -d 0000:07:00.0 set NUM_OF_VFS=64

Device #1:
----------

Device type:    ConnectX5
PCI device:     0000:07:00.0

Configurations:                              Next Boot       New
         NUM_OF_VFS                          0               64

 Apply new Configuration? ? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.
[root@host ~]# reboot
- Confirm the new settings were applied using the mstconfig query commands shown above.
- Insert the NIC back to the Compute node.
- Repeat the procedure above for every Compute node NIC used in our setup.
Note
- In our solution, the first port of the two 100G ports in every NIC is used for the ASAP² accelerated data plane. This is the reason we enable SRIOV only on the first ConnectX NIC PCI device (07:00.0 in the example above).
- There are future plans to support an automated procedure to update and configure the NICs on the Compute nodes from the Undercloud.
Accelerated RH-OSP Installation and Deployment Steps
- Install Red Hat Enterprise Linux 7.5 on the Undercloud server and set an IP on its interface connected to the External network; make sure it has internet connectivity.
- Install the Undercloud and the director as instructed in section 4 of the Red Hat OSP DIRECTOR INSTALLATION AND USAGE guide: Director Installation and Usage - Red Hat Customer Portal
- Our undercloud.conf file is attached as a reference.
- Configure a container image source as instructed in section 5 of the guide.
- Our solution is using undercloud as a local registry.
- Register the nodes of the overcloud as instructed in section 6.1.
- Our instackenv.json file is attached as a reference.
- Inspect the hardware of the nodes as instructed in section 6.2.
Once introspection is completed, it is recommended to confirm for each node that the desired root disk was detected, since the cloud deployment can fail later because of insufficient disk space. Use the following command to check the size of the disk that was selected as root:
(undercloud) [stack@rhosp-director ~]$ openstack baremetal node show 92c4c1cb-ce7d-48d4-a2d9-75b2651db097 | grep properties
| properties | {u'memory_mb': u'131072', u'cpu_arch': u'x86_64', u'local_gb': u'418', u'cpus': u'24', u'capabilities': u'boot_option:local'} |
- The “local_gb” value represents the disk size. If the disk size is lower than expected, use the procedure described in section 6.6 to define the root disk for the node. Note that an additional introspection cycle is required for the node after the root disk is changed.
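For reference, a minimal sketch of setting a root device hint by disk serial number (the serial value below is a placeholder); re-run introspection for the node afterwards, as described in section 6.2:
(undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property root_device='{"serial": "61866da04f380d001ea4e13c"}' compute-3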
Verify that all nodes were registered properly and have changed their state to “available” before proceeding to the next step:
(undercloud) [stack@rhosp-director ~]$ openstack baremetal node list
+--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
| d1fca940-e341-491b-8afd-0cf6d748aa29 | controller-1 | None          | power off   | available          | False       |
| 6b24d02c-3fd2-4e55-a730-c45008f01723 | controller-2 | None          | power off   | available          | False       |
| 098c3e2d-1c70-41d2-983b-6c266387de0b | controller-3 | None          | power off   | available          | False       |
| 91492c2a-b26c-49ef-9d4e-e492a1578076 | compute-1    | None          | power off   | available          | False       |
| cdf9e0ec-e3cb-4005-86f6-d40e684a9b19 | compute-2    | None          | power off   | available          | False       |
| 92c4c1cb-ce7d-48d4-a2d9-75b2651db097 | compute-3    | None          | power off   | available          | False       |
| bb5e829a-834b-4eb1-b733-0012ce9d5f00 | compute-4    | None          | power off   | available          | False       |
+--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
- Tagging Nodes into Profiles
Tag the controller nodes into the default “control” profile:
(undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:control,boot_option:local' controller-1
(undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:control,boot_option:local' controller-2
(undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:control,boot_option:local' controller-3
Create two new compute flavors, one per rack (compute-r1, compute-r2), and attach the flavors to profiles with correlated names:
(undercloud) [stack@rhosp-director ~]$ openstack flavor create --id auto --ram 4096 --disk 40 --vcpus 1 compute-r1
(undercloud) [stack@rhosp-director ~]$ openstack flavor set --property "capabilities:boot_option"="local" --property "capabilities:profile"="compute-r1" --property "resources:CUSTOM_BAREMETAL"="1" --property "resources:DISK_GB"="0" --property "resources:MEMORY_MB"="0" --property "resources:VCPU"="0" compute-r1
(undercloud) [stack@rhosp-director ~]$ openstack flavor create --id auto --ram 4096 --disk 40 --vcpus 1 compute-r2
(undercloud) [stack@rhosp-director ~]$ openstack flavor set --property "capabilities:boot_option"="local" --property "capabilities:profile"="compute-r2" --property "resources:CUSTOM_BAREMETAL"="1" --property "resources:DISK_GB"="0" --property "resources:MEMORY_MB"="0" --property "resources:VCPU"="0" compute-r2
Tag compute nodes 1 and 3 into the “compute-r1” profile to associate them with Rack 1, and compute nodes 2 and 4 into the “compute-r2” profile to associate them with Rack 2:
(undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:compute-r1,boot_option:local' compute-1
(undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:compute-r1,boot_option:local' compute-3
(undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:compute-r2,boot_option:local' compute-2
(undercloud) [stack@rhosp-director ~]$ openstack baremetal node set --property capabilities='profile:compute-r2,boot_option:local' compute-4
Verify profile tagging per node using the command below:
(undercloud) [stack@rhosp-director ~]$ openstack overcloud profiles list
+--------------------------------------+--------------+-----------------+-----------------+-------------------+
| Node UUID                            | Node Name    | Provision State | Current Profile | Possible Profiles |
+--------------------------------------+--------------+-----------------+-----------------+-------------------+
| d1fca940-e341-491b-8afd-0cf6d748aa29 | controller-1 | available       | control         |                   |
| 6b24d02c-3fd2-4e55-a730-c45008f01723 | controller-2 | available       | control         |                   |
| 098c3e2d-1c70-41d2-983b-6c266387de0b | controller-3 | available       | control         |                   |
| 91492c2a-b26c-49ef-9d4e-e492a1578076 | compute-1    | available       | compute-r1      |                   |
| cdf9e0ec-e3cb-4005-86f6-d40e684a9b19 | compute-2    | available       | compute-r2      |                   |
| 92c4c1cb-ce7d-48d4-a2d9-75b2651db097 | compute-3    | available       | compute-r1      |                   |
| bb5e829a-834b-4eb1-b733-0012ce9d5f00 | compute-4    | available       | compute-r2      |                   |
+--------------------------------------+--------------+-----------------+-----------------+-------------------+
Note
It is possible to tag the nodes into profiles in the instackenv.json file during node registration (section 6.1) instead of running the tag command per node; however, the flavors and profiles must be created in any case.
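For reference, a minimal sketch of such a node entry in instackenv.json; the MAC address, IPMI address and credentials below are placeholders, and the pm_type value may differ in your environment:
{
  "nodes": [
    {
      "name": "compute-1",
      "capabilities": "profile:compute-r1,boot_option:local",
      "mac": ["aa:bb:cc:dd:ee:01"],
      "cpu": "24",
      "memory": "131072",
      "disk": "418",
      "arch": "x86_64",
      "pm_type": "ipmi",
      "pm_user": "admin",
      "pm_password": "admin",
      "pm_addr": "192.168.1.101"
    }
  ]
}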
- Role definitions:
Create the /home/stack/templates/ directory and generate inside it a new roles file (named roles_data.yaml) with two types of roles, using the following command:
(undercloud) [stack@rhosp-director ~]$ mkdir /home/stack/templates
(undercloud) [stack@rhosp-director ~]$ cd /home/stack/templates/
(undercloud) [stack@rhosp-director templates]$ openstack overcloud roles generate -o roles_data.yaml Controller ComputeSriov
Edit the file by changing ComputeSriov to ComputeSriov1:
###############################################################################
# Role: ComputeSriov1                                                         #
###############################################################################
- name: ComputeSriov1
  description: |
    Compute SR-IOV Role R1
  CountDefault: 1
  networks:
    - InternalApi
    - Tenant
    - Storage
  HostnameFormatDefault: '%stackname%-computesriov1-%index%'
  disable_upgrade_deployment: True
  ServicesDefault:
Clone the entire ComputeSriov1 role section, change it to ComputeSriov2, and change its networks to represent the network set on the second rack:
###############################################################################
# Role: ComputeSriov2                                                         #
###############################################################################
- name: ComputeSriov2
  description: |
    Compute SR-IOV Role R2
  CountDefault: 1
  networks:
    - InternalApi_2
    - Tenant_2
    - Storage_2
  HostnameFormatDefault: '%stackname%-computesriov2-%index%'
  disable_upgrade_deployment: True
  ServicesDefault:
- Now the roles_data.yaml file includes 3 types of roles: Controller and ComputeSriov1, which are associated with the Rack 1 network set, and ComputeSriov2, which is associated with the Rack 2 network set.
- The full configuration file is attached to this document for your convenience.
- Environment File for Defining Node Counts and Flavors:
- Create /home/stack/templates/node-info.yaml as explained in section 6.7, and edit it to include the count per role and the correlated flavor per role.
Full configuration file:
parameter_defaults:
  OvercloudControllerFlavor: control
  OvercloudComputeSriov1Flavor: compute-r1
  OvercloudComputeSriov2Flavor: compute-r2
  ControllerCount: 3
  ComputeSriov1Count: 2
  ComputeSriov2Count: 2
- NVIDIA ConnectX NICs Listing
Run the following command to go over all registered nodes and identify the interface names of the dual-port ConnectX 100Gb/s NIC:
(undercloud) [stack@rhosp-director templates]$ for node in $(openstack baremetal node list --fields uuid -f value) ; do openstack baremetal introspection interface list $node ; done
.
.
+-----------+-------------------+----------------------+-------------------+----------------+
| Interface | MAC Address       | Switch Port VLAN IDs | Switch Chassis ID | Switch Port ID |
+-----------+-------------------+----------------------+-------------------+----------------+
| eno1      | ec:b1:d7:83:11:b8 | []                   | 94:57:a5:25:fa:80 | 29             |
| eno2      | ec:b1:d7:83:11:b9 | []                   | None              | None           |
| eno3      | ec:b1:d7:83:11:ba | []                   | None              | None           |
| eno4      | ec:b1:d7:83:11:bb | []                   | None              | None           |
| ens1f1    | ec:0d:9a:7d:81:b3 | []                   | 24:8a:07:7f:ef:00 | Eth1/14        |
| ens1f0    | ec:0d:9a:7d:81:b2 | []                   | 24:8a:07:7f:ef:00 | Eth1/1         |
+-----------+-------------------+----------------------+-------------------+----------------+
Note
Names must be identical for all nodes, or at least for all nodes sharing the same role. In our case, it is ens2f0/ens2f1 on the Controller nodes and ens1f0/ens1f1 on the Compute nodes.
- HW Offload Configuration File
- Locate /usr/share/openstack-tripleo-heat-templates/environments/ovs-hw-offload.yaml file and edit it according to the following guidelines per ComputeSriov role:
- Set offload enabled
- Set kernel args for huge pages
- Set the desired interface for accelerated data plane (ens1f0 in our case)
- Set the desired VF count (64 in our example)
- Set the correct Nova PCI Passthrough devname, and physical_network: null
- Set ExtraConfig for correlation between the role and the correct tenant/api network set
The full configuration file is attached to this document; see the example below:
# A Heat environment file that enables OVS Hardware Offload in the overcloud.
# This works by configuring SR-IOV NIC with switchdev and OVS Hardware Offload on
# compute nodes. The feature supported in OVS 2.8.0

resource_registry:
  OS::TripleO::Services::NeutronSriovHostConfig: ../puppet/services/neutron-sriov-host-config.yaml

parameter_defaults:

  NovaSchedulerDefaultFilters: ['RetryFilter','AvailabilityZoneFilter','RamFilter','ComputeFilter','ComputeCapabilitiesFilter','ImagePropertiesFilter','ServerGroupAntiAffinityFilter','ServerGroupAffinityFilter','PciPassthroughFilter']
  NovaSchedulerAvailableFilters: ["nova.scheduler.filters.all_filters","nova.scheduler.filters.pci_passthrough_filter.PciPassthroughFilter"]

  # Kernel arguments for ComputeSriov1 node
  ComputeSriov1Parameters:
    KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=16 intel_iommu=on iommu=pt"
    OvsHwOffload: True
    # Number of VFs that needs to be configured for a physical interface
    NeutronSriovNumVFs: ["ens1f0:64:switchdev"]
    # Mapping of SR-IOV PF interface to neutron physical_network.
    # In case of Vxlan/GRE physical_network should be null.
    # In case of flat/vlan the physical_network should as configured in neutron.
    NovaPCIPassthrough:
      - devname: "ens1f0"
        physical_network: null
    NovaReservedHostMemory: 4096

  # Extra config for mapping the ovs local_ip to the relevant tenant network
  ComputeSriov1ExtraConfig:
    nova::vncproxy::host: "%{hiera('internal_api')}"
    neutron::agents::ml2::ovs::local_ip: "%{hiera('tenant')}"

  # Kernel arguments for ComputeSriov2 node
  ComputeSriov2Parameters:
    KernelArgs: "default_hugepagesz=1GB hugepagesz=1G hugepages=16 intel_iommu=on iommu=pt"
    OvsHwOffload: True
    # Number of VFs that needs to be configured for a physical interface
    NeutronSriovNumVFs: ["ens1f0:64:switchdev"]
    # Mapping of SR-IOV PF interface to neutron physical_network.
    # In case of Vxlan/GRE physical_network should be null.
    # In case of flat/vlan the physical_network should as configured in neutron.
    NovaPCIPassthrough:
      - devname: "ens1f0"
        physical_network: null
    NovaReservedHostMemory: 4096

  # Extra config for mapping the ovs local_ip to the relevant tenant network
  ComputeSriov2ExtraConfig:
    nova::vncproxy::host: "%{hiera('internal_api_2')}"
    neutron::agents::ml2::ovs::local_ip: "%{hiera('tenant_2')}"
- Network Configuration File:
- Locate /usr/share/openstack-tripleo-heat-templates/network_data.yaml file and edit it according to the following guidelines:
- Set External network parameters (subnet, allocation pool, default GW).
- Set rack 1 networks set parameters to match the subnets/vlans configured on Rack 1 Leaf switch.
- Make sure you use the network names you specified in roles_data.yaml for the Controller/ComputeSriov1 role networks.
- Create a second set of networks to match the subnets/vlans configured on Rack 2 Leaf switch.
- Make sure you use the network names you specified in roles_data.yaml for ComputeSriov2 role networks.
- Disable the “management” network, as it is not used in our example.
The configuration is based on the following matrix, matching the Leaf switch configuration applied in the Network Configuration section above:
| Network Name | Network Set | Network Location | Network Details | VLAN | Network Allocation Pool |
|---|---|---|---|---|---|
| Storage | 1 | Rack 1 | 172.16.0.0/24 | 11 | 172.16.0.100-250 |
| Storage_Mgmt | 1 | Rack 1 | 172.17.0.0/24 | 21 | 172.17.0.100-250 |
| Internal API | 1 | Rack 1 | 172.18.0.0/24 | 31 | 172.18.0.100-250 |
| Tenant | 1 | Rack 1 | 172.19.0.0/24 | 41 | 172.19.0.100-250 |
| Storage_2 | 2 | Rack 2 | 172.16.2.0/24 | 12 | 172.16.2.100-250 |
| Storage_Mgmt_2 | 2 | Rack 2 | 172.17.2.0/24 | 22 | 172.17.2.100-250 |
| Internal API_2 | 2 | Rack 2 | 172.18.2.0/24 | 32 | 172.18.2.100-250 |
| Tenant_2 | 2 | Rack 2 | 172.19.2.0/24 | 42 | 172.19.2.100-250 |
| External | - | Public Switch | 10.7.208.0/24 | - | 10.7.208.10-21 |
- Full configuration file is attached to this document
Partial example for one of the configured networks (the Storage network, both sets), the External network, and the Management network configuration:
- name: Storage
  vip: true
  vlan: 11
  name_lower: storage
  ip_subnet: '172.16.0.0/24'
  allocation_pools: [{'start': '172.16.0.100', 'end': '172.16.0.250'}]
  ipv6_subnet: 'fd00:fd00:fd00:1100::/64'
  ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:1100::10', 'end': 'fd00:fd00:fd00:1100:ffff:ffff:ffff:fffe'}]
.
.
- name: Storage_2
  vip: true
  vlan: 12
  name_lower: storage_2
  ip_subnet: '172.16.2.0/24'
  allocation_pools: [{'start': '172.16.2.100', 'end': '172.16.2.250'}]
  ipv6_subnet: 'fd00:fd00:fd00:1200::/64'
  ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:1200::10', 'end': 'fd00:fd00:fd00:1200:ffff:ffff:ffff:fffe'}]
.
.
- name: External
  vip: true
  name_lower: external
  vlan: 10
  ip_subnet: '10.7.208.0/24'
  allocation_pools: [{'start': '10.7.208.10', 'end': '10.7.208.21'}]
  gateway_ip: '10.7.208.1'
  ipv6_subnet: '2001:db8:fd00:1000::/64'
  ipv6_allocation_pools: [{'start': '2001:db8:fd00:1000::10', 'end': '2001:db8:fd00:1000:ffff:ffff:ffff:fffe'}]
  gateway_ipv6: '2001:db8:fd00:1000::1'
- name: Management
  # Management network is enabled by default for backwards-compatibility, but
  # is not included in any roles by default. Add to role definitions to use.
  enabled: false
- Deploying a plan from existing templates
Use the following command to create a plan called “asap-plan”:
(undercloud) [stack@rhosp-director templates]$ openstack overcloud plan create --templates /usr/share/openstack-tripleo-heat-templates asap-plan
Create a dedicated folder and deploy the plan files inside it:
(undercloud) [stack@rhosp-director templates]$ mkdir /home/stack/asap-plan
(undercloud) [stack@rhosp-director templates]$ cd /home/stack/asap-plan
(undercloud) [stack@rhosp-director asap-plan]$ openstack container save asap-plan
- Editing plan files to be used in deployment
- Copy the following files into the /home/stack/templates directory
- /home/stack/asap-plan/environments/network-environment.yaml
- /home/stack/asap-plan/network/config/single-nic-vlans/controller.yaml
- /home/stack/asap-plan/network/config/single-nic-vlans/computesriov1.yaml
- /home/stack/asap-plan/network/config/single-nic-vlans/computesriov2.yaml
- Edit /home/stack/templates/network-environment.yaml according to the following guidelines:
- Set the role file locations under resource_registry section.
- Set the Undercloud control plane IP as the default route for this network.
- Set the required DNS servers for the setup nodes.
See example below. Full configuration file is attached to this document.
#This file is an example of an environment file for defining the isolated
#networks and related parameters.
resource_registry:
  # Network Interface templates to use (these files must exist). You can
  # override these by including one of the net-*.yaml environment files,
  # such as net-bond-with-vlans.yaml, or modifying the list here.
  # Port assignments for the Controller
  OS::TripleO::Controller::Net::SoftwareConfig:
    /home/stack/templates/controller.yaml
  # Port assignments for the ComputeSriov1
  OS::TripleO::ComputeSriov1::Net::SoftwareConfig:
    /home/stack/templates/computesriov1.yaml
  # Port assignments for the ComputeSriov2
  OS::TripleO::ComputeSriov2::Net::SoftwareConfig:
    /home/stack/templates/computesriov2.yaml

parameter_defaults:
  # This section is where deployment-specific configuration is done
  # CIDR subnet mask length for provisioning network
  ControlPlaneSubnetCidr: '24'
  # Gateway router for the provisioning network (or Undercloud IP)
  ControlPlaneDefaultRoute: 192.168.24.1
.
.
  # Define the DNS servers (maximum 2) for the overcloud nodes
  DnsServers: ["10.7.77.192","10.7.77.135"]
- Edit /home/stack/templates/controller.yaml according to the following guidelines:
- Set the location of run-os-net-config.sh script.
- Set the Supernet and GW per network to allow routing between network sets located in different racks. The GW is the IP interface that was configured on the Leaf switch interface facing this network. The supernet and gateway parameters for the two Tenant networks are shown in the example below.
- Set the type, networks and routes for each interface used by the Controller nodes. In our example, the Controller nodes use:
- Dedicated 1G interface (type “interface”) for provisioning (PXE) network.
- Dedicated 1G interface (type “ovs_bridge”) for External network. This network has a default GW configured.
- Dedicated 100G interface (type “interface” without vlans) for data plane (Tenant) network in Rack 1. The network is associated with a supernet and has a route allowing it to reach other networks in the same supernet located in different racks.
- Dedicated 100G interface (type “ovs_bridge”) with vlans for Storage/StorageMgmt/InternalApi networks in Rack 1. Each network is associated with a supernet and has a route allowing it to reach other networks in the same supernet located in different racks.
See example below. Full configuration file is attached to this document.
  TenantSupernet:
    default: '172.19.0.0/16'
    description: Supernet that contains Tenant subnets for all roles.
    type: string
  TenantGateway:
    default: '172.19.0.1'
    description: Router gateway on tenant network
    type: string
  Tenant_2Gateway:
    default: '172.19.2.1'
    description: Router gateway on tenant_2 network
    type: string
.
.
resources:
  OsNetConfigImpl:
    type: OS::Heat::SoftwareConfig
    properties:
      group: script
      config:
        str_replace:
          template:
            get_file: /usr/share/openstack-tripleo-heat-templates/network/scripts/run-os-net-config.sh
          params:
            $network_config:
              network_config:
.
.
              # NIC 3 - Data Plane (Tenant net)
              - type: ovs_bridge
                name: br-sriov
                use_dhcp: false
                members:
                - type: interface
                  name: ens2f0
                addresses:
                - ip_netmask:
                    get_param: TenantIpSubnet
                routes:
                - ip_netmask:
                    get_param: TenantSupernet
                  next_hop:
                    get_param: TenantGateway
- Edit /home/stack/templates/computesriov1.yaml according to the following guidelines:
- Set the location of the run-os-net-config.sh script (not shown in the example below; see the example above or the full configuration file).
- Set the Supernet and GW per network to allow routing between network sets located in different racks. The GW is the IP interface that was configured on the Leaf switch interface facing this network (not shown in the example below; see the example above or the full configuration file).
- Set the type, networks and routes for each interface used by Compute nodes in Rack 1. In our example, the ComputeSriov1 nodes use:
- Dedicated 1G interface (type “interface”) for the provisioning (PXE) network.
- Dedicated 100G interface (type “interface” without vlans) for the data plane (Tenant) network in Rack 1. The network is associated with a supernet and has a route allowing it to reach other networks in the same supernet located in different racks.
- Dedicated 100G interface (type “ovs_bridge”) with vlans for the Storage/InternalApi networks in Rack 1. Each network is associated with a supernet and has a route allowing it to reach other networks in the same supernet located in different racks (not shown in the example below; see the full configuration file).
              # NIC 1 - Provisioning net
              - type: interface
                name: eno1
                use_dhcp: false
                dns_servers:
                  get_param: DnsServers
                addresses:
                - ip_netmask:
                    list_join:
                    - /
                    - - get_param: ControlPlaneIp
                      - get_param: ControlPlaneSubnetCidr
                routes:
                - ip_netmask: 169.254.169.254/32
                  next_hop:
                    get_param: EC2MetadataIp
                - default: true
                  next_hop:
                    get_param: ControlPlaneDefaultRoute
              # NIC 2 - ASAP² Data Plane (Tenant net)
              - type: ovs_bridge
                name: br-sriov
                use_dhcp: false
                members:
                - type: interface
                  name: ens1f0
                addresses:
                - ip_netmask:
                    get_param: TenantIpSubnet
                routes:
                - ip_netmask:
                    get_param: TenantSupernet
                  next_hop:
                    get_param: TenantGateway
- Edit /home/stack/templates/computesriov2.yaml according to the following guidelines:
- Set the location of the run-os-net-config.sh script (not shown in the example below; see the example above or the full configuration file).
- Set the Supernet and GW per network to allow routing between network sets located in different racks. The GW is the IP interface that was configured on the Leaf switch interface facing this network (not shown in the example below; see the example above or the full configuration file).
- Set the type, networks and routes for each interface used by Compute nodes in Rack 2. In our example, the ComputeSriov2 nodes use:
- Dedicated 1G interface (type “interface”) for the provisioning (PXE) network (not shown in the example below; see the example above or the full configuration file).
- Dedicated 100G interface (type “interface” without vlans) for the data plane (Tenant) network in Rack 2. The network is associated with a supernet and has a route allowing it to reach other networks in the same supernet located in different racks.
- Dedicated 100G interface (type “ovs_bridge”) with vlans for the Storage/InternalApi networks in Rack 2. Each network is associated with a supernet and has a route allowing it to reach other networks in the same supernet located in different racks (not shown in the example below; see the example above or the full configuration file).
See example below. Full configuration file is attached to this document.
              # NIC 2 - ASAP² Data Plane (Tenant net)
              - type: ovs_bridge
                name: br-sriov
                use_dhcp: false
                members:
                - type: interface
                  name: ens1f0
                addresses:
                - ip_netmask:
                    get_param: Tenant_2IpSubnet
                routes:
                - ip_netmask:
                    get_param: TenantSupernet
                  next_hop:
                    get_param: Tenant_2Gateway
- Deploying the overcloud
- Now we are ready to deploy an overcloud based on our customized configuration files.
- The cloud will be deployed with:
- 3 controllers associated with Rack 1 networks
- 2 Compute nodes associated with Rack 1 networks with ASAP² OVS HW offload
- 2 Compute nodes associated with Rack 2 networks with ASAP² OVS HW offload
- Routes to allow connectivity between racks/networks
- VXLAN overlay tunnels between all the nodes
- Before starting the deployment, verify connectivity over the OSPF underlay fabric between the Leaf switch VLAN interfaces facing the nodes in both racks. Without inter-rack connectivity on all networks, the overcloud deployment will fail.
- To start the overcloud deployment, issue the command below; notice the custom environment files.
Note
- Do not change the order of the environment files.
- Make sure that the NTP server specified in the deploy command is accessible and can provide time to the undercloud node.
- The overcloud_images.yaml file used in the deploy command is created during the undercloud installation; verify its existence in the specified location.
- The network-isolation.yaml file specified in the deploy command is created automatically during deployment from a j2.yaml template file.
(undercloud) [stack@rhosp-director templates]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates \
--libvirt-type kvm \
-n /usr/share/openstack-tripleo-heat-templates/network_data.yaml \
-r /home/stack/templates/roles_data.yaml \
--timeout 90 \
--validation-warnings-fatal \
--ntp-server 0.asia.pool.ntp.org \
-e /home/stack/templates/node-info.yaml \
-e /home/stack/templates/overcloud_images.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/templates/network-environment.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ovs-hw-offload.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/host-config-and-reboot.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/disable-telemetry.yaml
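While the deployment is running, its progress can be monitored from the Undercloud. A minimal sketch using standard Heat/TripleO CLI commands:
(undercloud) [stack@rhosp-director ~]$ openstack stack list
(undercloud) [stack@rhosp-director ~]$ openstack stack resource list overcloud | grep -v COMPLETE
(undercloud) [stack@rhosp-director ~]$ openstack stack failures list overcloud
The last command is useful for listing the root-cause errors if the deployment fails.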
- Overcloud VXLAN Configuration Validation
- Once the cloud is deployed, log into the overcloud nodes and verify that VXLAN tunnels have been established between each node and the rest of the overcloud nodes over the routed Tenant networks.
In the example below, we can see that VXLAN tunnels are maintained at the OVS level between a node located in Rack 2 (tenant network 172.19.2.0/24) and all the other nodes, both those located in Rack 1 (tenant network 172.19.0.0/24) and the one located in its own rack. Note that the remote tenant IP is encoded in hexadecimal in each VXLAN port name (for example, vxlan-ac130068 corresponds to remote_ip 172.19.0.104).
(undercloud) [stack@rhosp-director ~]$ openstack server list
+--------------------------------------+---------------------------+--------+------------------------+----------------+------------+
| ID                                   | Name                      | Status | Networks               | Image          | Flavor     |
+--------------------------------------+---------------------------+--------+------------------------+----------------+------------+
| 35d3b3b6-b867-4408-bfc3-b3d25395450d | overcloud-controller-0    | ACTIVE | ctlplane=192.168.24.19 | overcloud-full | control    |
| 0af372ed-4c5c-41fb-882a-c8a61cc01ba9 | overcloud-controller-1    | ACTIVE | ctlplane=192.168.24.20 | overcloud-full | control    |
| 3c189bb9-fd2f-451d-b2f8-4d17d7fa0381 | overcloud-computesriov1-1 | ACTIVE | ctlplane=192.168.24.13 | overcloud-full | compute-r1 |
| 7eebc6f0-95af-4ec6-bf44-4db817bc4029 | overcloud-computesriov2-1 | ACTIVE | ctlplane=192.168.24.6  | overcloud-full | compute-r2 |
| ebc7c38b-6221-45c9-b5ca-98023f5bbebc | overcloud-controller-2    | ACTIVE | ctlplane=192.168.24.17 | overcloud-full | control    |
| 7c700b2c-6a9f-480f-ada5-11866a891f04 | overcloud-computesriov2-0 | ACTIVE | ctlplane=192.168.24.12 | overcloud-full | compute-r2 |
| 971e8651-4059-42b9-834d-74449007343d | overcloud-computesriov1-0 | ACTIVE | ctlplane=192.168.24.11 | overcloud-full | compute-r1 |
+--------------------------------------+---------------------------+--------+------------------------+----------------+------------+

(undercloud) [stack@rhosp-director ~]$ ssh heat-admin@192.168.24.12
[heat-admin@overcloud-computesriov2-0 ~]$ sudo su
[root@overcloud-computesriov2-0 heat-admin]# ovs-vsctl show
.
.
    Bridge br-tun
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port "vxlan-ac130068"
            Interface "vxlan-ac130068"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.19.2.102", out_key=flow, remote_ip="172.19.0.104"}
        Port "vxlan-ac130070"
            Interface "vxlan-ac130070"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.19.2.102", out_key=flow, remote_ip="172.19.0.112"}
        Port "vxlan-ac13026c"
            Interface "vxlan-ac13026c"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.19.2.102", out_key=flow, remote_ip="172.19.2.108"}
        Port "vxlan-ac13006b"
            Interface "vxlan-ac13006b"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.19.2.102", out_key=flow, remote_ip="172.19.0.107"}
        Port br-tun
            Interface br-tun
                type: internal
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port "vxlan-ac130064"
            Interface "vxlan-ac130064"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.19.2.102", out_key=flow, remote_ip="172.19.0.100"}
        Port "vxlan-ac130065"
            Interface "vxlan-ac130065"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.19.2.102", out_key=flow, remote_ip="172.19.0.101"}
- Overcloud Host Aggregate Configuration
- In order to enable the option to specify the target rack for VM creation, a Host Aggregate per rack must be configured first.
- Log into the Overcloud dashboard and create a Host Aggregate for Rack 1. Add Compute nodes 1 and 3 into it. You can identify the relevant hypervisors by their hostnames, which indicate their role and rack location.
- Create a Host Aggregate for Rack 2 and add Compute nodes 2 and 4 into it (a CLI alternative is sketched below).
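The same Host Aggregates can alternatively be created from the overcloud CLI. A minimal sketch, assuming the hypervisor hostnames shown earlier with a hypothetical ".localdomain" suffix; the --zone option additionally exposes each aggregate as an availability zone:
(overcloud) [stack@rhosp-director ~]$ openstack aggregate create --zone rack1 rack1
(overcloud) [stack@rhosp-director ~]$ openstack aggregate add host rack1 overcloud-computesriov1-0.localdomain
(overcloud) [stack@rhosp-director ~]$ openstack aggregate add host rack1 overcloud-computesriov1-1.localdomain
(overcloud) [stack@rhosp-director ~]$ openstack aggregate create --zone rack2 rack2
(overcloud) [stack@rhosp-director ~]$ openstack aggregate add host rack2 overcloud-computesriov2-0.localdomain
(overcloud) [stack@rhosp-director ~]$ openstack aggregate add host rack2 overcloud-computesriov2-1.localdomain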
- Overcloud Instance Creation with ASAP²-based Ports
- Create a Flavor as desired.
- Upload an Image: use an updated OS image that includes the latest NVIDIA network drivers (a CLI sketch for both steps follows below).
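For reference, a minimal sketch of creating such a flavor and uploading an image from the overcloud CLI; the flavor sizing, flavor name and image file name below are placeholders:
(overcloud) [stack@rhosp-director ~]$ openstack flavor create --ram 4096 --disk 20 --vcpus 4 m1.asap
(overcloud) [stack@rhosp-director ~]$ openstack image create centos-mlnx --disk-format qcow2 --container-format bare --file ./centos-mlnx.qcow2 --public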
Create a VXLAN overlay private network to be used by the instances (a CLI command is used):
(undercloud) [stack@rhosp-director ~]$ source overcloudrc
(overcloud) [stack@rhosp-director ~]$ openstack network create private --provider-network-type vxlan --share
Create a subnet and assign it to the private network:
(overcloud) [stack@rhosp-director ~]$ openstack subnet create private_subnet --dhcp --network private --subnet-range 11.11.11.0/24
Create 2 direct ports with ASAP² capabilities (use CLI commands only); each one will be used by a VM in a different rack:
(overcloud) [stack@rhosp-director ~]$ openstack port create direct1 --vnic-type=direct --network private --binding-profile '{"capabilities":["switchdev"]}'
(overcloud) [stack@rhosp-director ~]$ openstack port create direct2 --vnic-type=direct --network private --binding-profile '{"capabilities":["switchdev"]}'
- Spawn an instance with an ASAP² port on each rack. Use the allocated ports only, without an allocated network, as shown in the sketch below:
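A minimal sketch of spawning the instances from the CLI, reusing the hypothetical flavor, image and availability zone names from the sketches above:
(overcloud) [stack@rhosp-director ~]$ port1=$(openstack port show direct1 -f value -c id)
(overcloud) [stack@rhosp-director ~]$ port2=$(openstack port show direct2 -f value -c id)
(overcloud) [stack@rhosp-director ~]$ openstack server create --flavor m1.asap --image centos-mlnx --nic port-id=$port1 --availability-zone rack1 vm1
(overcloud) [stack@rhosp-director ~]$ openstack server create --flavor m1.asap --image centos-mlnx --nic port-id=$port2 --availability-zone rack2 vm2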
- OVS ASAP² Offload Validation
- Ping or run traffic between the instances. The traffic will go over the OVS VXLAN overlay network and will be accelerated by ASAP² hardware offload in the NIC.
SSH into the Compute nodes hosting the instances and issue the following command to see the accelerated bi-directional traffic flows that were offloaded to the NIC using ASAP². In the output below, we can see one flow per direction:
[root@overcloud-computesriov2-0 heat-admin]# ovs-dpctl dump-flows type=offloaded --name
in_port(eth3),eth(src=fa:16:3e:15:e5:a8,dst=fa:16:3e:01:b3:aa),eth_type(0x0800),ipv4(frag=no), packets:1764662605, bytes:194112828502, used:0.470s, actions:set(tunnel(tun_id=0xf,src=172.19.2.102,dst=172.19.0.104,tp_dst=4789,flags(key))),vxlan_sys_4789
tunnel(tun_id=0xf,src=172.19.0.104,dst=172.19.2.102,tp_dst=4789,flags(+key)),in_port(vxlan_sys_4789),eth(src=fa:16:3e:01:b3:aa,dst=fa:16:3e:15:e5:a8),eth_type(0x0800),ipv4(frag=no), packets:1760910540, bytes:105654631616, used:0.470s, actions:eth3
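As an additional check (not part of the original procedure), you can run tcpdump on the VF representor port on the Compute node (eth3 in the flows above). Once a flow is offloaded, only its first packets should appear in software, since subsequent packets are handled entirely by the NIC:
[root@overcloud-computesriov2-0 heat-admin]# tcpdump -nnn -i eth3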
Configuration Files
Performance Benchmarks
For benchmark tests between instances in an OpenStack environment utilizing OVS ASAP² acceleration, refer to:
Quick Start Guide: ASAP² technology performance evaluation on Red Hat OpenStack Platform 13.
Related Documents