VXLAN
Data centers are increasingly being consolidated and outsourced to shorten application deployment times and reduce operational costs, while applications continually demand more compute, storage, and network resources. To scale these resources, physical resources are abstracted from their logical representation, an approach referred to as server, storage, and network virtualization. Virtualization can be implemented at various layers of computer systems and networks.
Multi-tenant data centers take advantage of server virtualization to provide a new kind of hosting: a virtual hosted data center. In a multi-tenant data center, individual tenants may belong to different companies or different departments. To a tenant, a virtual data center is similar to its physical counterpart, consisting of end-stations attached to a network, complete with services such as load balancers and firewalls. To tenant systems, a virtual network looks like a normal network, except that the only end-stations connected to it are those belonging to that tenant's virtual network.
How a virtual network is implemented does not generally matter to the tenant; what matters is that the service provided (Layer 2 (L2) or Layer 3 (L3)) has the right semantics, performance, etc. It could be implemented via a pure routed network, a pure bridged network, or a combination of bridged and routed networks.
VXLAN (Virtual eXtensible Local Area Network) addresses these L2 and L3 data center network infrastructure requirements for virtual networks in a multi-tenant environment. It runs over the existing network infrastructure and provides a means to “stretch” an L2 network across it. Each overlay bridge is called a VXLAN segment, and only machines within the same VXLAN segment can communicate with each other. Each VXLAN segment is identified by a 24-bit segment ID called the VXLAN Network Identifier (VNI), which allows up to about 16 million segments. A network endpoint that converts traffic from the virtual network to the physical network and back is called a VXLAN Tunnel End Point (VTEP).
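To make the 24-bit VNI concrete, the following is a minimal Python sketch of the 8-byte VXLAN header as defined in RFC 7348 (flags, reserved bits, VNI, reserved bits). It is illustrative only and is not how the switch builds packets; the VNI value 7777 is just an example, and the function names are hypothetical.

```python
import struct

VXLAN_UDP_PORT = 4789         # IANA-assigned VXLAN destination UDP port (RFC 7348)
VXLAN_FLAG_VNI_VALID = 0x08   # "I" flag: the VNI field is valid
MAX_VNI = (1 << 24) - 1       # 24-bit VNI, so 16,777,215 is the largest value

def encode_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte header: flags(8) | reserved(24) | VNI(24) | reserved(8)."""
    if not 0 <= vni <= MAX_VNI:
        raise ValueError(f"VNI {vni} does not fit in 24 bits")
    # Word 0: flags in the top byte, 24 reserved bits left at zero.
    # Word 1: VNI in the top 24 bits, 8 reserved bits left at zero.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)

def decode_vni(header: bytes) -> int:
    """Return the VNI from an 8-byte VXLAN header after checking the I flag."""
    word0, word1 = struct.unpack("!II", header[:8])
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("I flag not set; VNI is not valid")
    return word1 >> 8

hdr = encode_vxlan_header(7777)
print(hdr.hex())        # 08000000001e6100
print(decode_vni(hdr))  # 7777
```

The VNI occupies the upper 24 bits of the second 32-bit word, which is why it is shifted left by 8 when packing and right by 8 when unpacking.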
In virtual environments, logical switches are typically needed to forward traffic between virtual machines (VMs) on the same physical host, between virtual machines and physical machines, and between networks. Virtual switch environments use the OVSDB management protocol for configuration and state discovery of the virtual networks; it provides programmable access to the virtual switch configuration database.
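OVSDB is a JSON-RPC protocol (RFC 7047). As a rough sketch of what "programmable access" looks like on the wire, the Python below only serializes two requests: `list_dbs`, and a `transact` that selects rows from the `Physical_Switch` table of the `hardware_vtep` schema used for hardware VTEPs. The helper names are hypothetical, and no connection is opened; a real client would send these messages over TCP or SSL to the OVSDB server.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids; responses echo the id back

def ovsdb_request(method: str, params: list) -> str:
    """Serialize an OVSDB JSON-RPC 1.0 request: method, params, id."""
    return json.dumps({"method": method, "params": params, "id": next(_ids)})

def select_all(db: str, table: str, columns: list[str]) -> str:
    """Build a 'transact' request that selects the given columns from every row."""
    op = {"op": "select", "table": table, "where": [], "columns": columns}
    return ovsdb_request("transact", [db, op])

print(ovsdb_request("list_dbs", []))
print(select_all("hardware_vtep", "Physical_Switch", ["name", "tunnel_ips"]))
```

An empty `where` list matches every row, so the second request asks for the name and tunnel IPs of each physical switch known to the database.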
To enable VXLAN:
Configure jumbo frames for NVE ports. Run:
switch (config) # interface ethernet 1/1-1/4 mtu 9216 force
Configure jumbo frames for underlay-facing ports. Run:
switch (config) # interface ethernet 1/17 mtu 9216 force
Create a VLAN for all VXLAN traffic. Run:
switch (config) # vlan 3
Assign the underlay-facing interface to the VXLAN VLAN. Run:
switch (config) # interface ethernet 1/17 switchport access vlan 3
Enable IP routing. Run:
switch (config) # ip routing vrf default
Create an interface on the VXLAN VLAN and assign it an IP address. Run:
switch (config) # interface vlan 3
switch (config interface vlan 3) # ip address 33.33.33.254 255.255.255.0
switch (config interface vlan 3) # interface vlan 3 mtu 9216
Enable the NVE protocol. Run:
switch (config) # protocol nve
Create the NVE interface. Run:
switch (config) # interface nve 1
Create a loopback interface to terminate the VXLAN tunnel. Its IP address will be the VTEP endpoint address and must be reachable in the underlay network. Run:
switch (config) # interface loopback 1
switch (config interface loopback 1) # ip address 1.2.3.4 255.255.255.255
Set the loopback interface as the source of the NVE interface. Run:
switch (config) # interface nve 1 vxlan source interface loopback 1
Configure routing to other VTEP devices. Run:
switch (config) # ip route vrf default 1.2.3.5 /32 33.33.33.253
switch (config) # ip route vrf default 1.2.3.6 /32 33.33.33.252
Configure the overlay-facing ports for NVE mode. Run:
switch (config) # interface ethernet 1/1 nve mode only force
switch (config) # interface ethernet 1/2 nve mode only force
switch (config) # interface ethernet 1/3 nve mode only force
switch (config) # interface ethernet 1/4 nve mode only force
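The addressing and MTU choices in the procedure above can be sanity-checked offline. The Python sketch below is an illustration under stated assumptions: it treats the MTU values as IP MTUs, assumes an IPv4 underlay with no outer 802.1Q tag (giving the usual 50 bytes of VXLAN overhead), and uses hypothetical helper names. It checks that the local VTEP loopback is a /32 and that each static route's next hop lies on the interface vlan 3 subnet.

```python
import ipaddress

# Bytes added in front of the inner IP packet by VXLAN encapsulation
# (IPv4 underlay, no outer 802.1Q tag):
# outer IPv4 (20) + UDP (8) + VXLAN (8) + encapsulated inner Ethernet header (14)
VXLAN_OVERHEAD = 20 + 8 + 8 + 14  # 50 bytes

def underlay_mtu_ok(inner_ip_mtu: int, underlay_ip_mtu: int) -> bool:
    """True if an inner packet of inner_ip_mtu bytes still fits once encapsulated."""
    return inner_ip_mtu + VXLAN_OVERHEAD <= underlay_ip_mtu

def check_static_routes(svi: str, routes: dict[str, str]) -> list[str]:
    """Return the remote VTEPs whose next hop is NOT on the VXLAN VLAN subnet."""
    subnet = ipaddress.ip_interface(svi).network
    return [vtep for vtep, next_hop in routes.items()
            if ipaddress.ip_address(next_hop) not in subnet]

# Values taken from the procedure above.
svi = "33.33.33.254/255.255.255.0"  # interface vlan 3
routes = {"1.2.3.5": "33.33.33.253", "1.2.3.6": "33.33.33.252"}
loopback = ipaddress.ip_interface("1.2.3.4/255.255.255.255")

print(loopback.network.prefixlen)        # 32: a host address for the local VTEP
print(check_static_routes(svi, routes))  # []: every next hop is on-link
print(underlay_mtu_ok(1500, 9216))       # True
print(underlay_mtu_ok(9216, 9216))       # False: no room for the 50-byte overhead
```

This is why jumbo frames are configured on the underlay: with a 9216-byte underlay MTU, tenant traffic with a standard 1500-byte MTU (or even a 9000-byte MTU) fits after encapsulation.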
For deployments with a controller, set up OVSDB:
Start the OVSDB server. Run:
switch (config) # ovs ovsdb server
Point the OVSDB manager at the IP address of the controller. Run:
switch (config) # ovs ovsdb manager remote ssl ip address 10.130.250.5
For controller-less deployments, configure the bridging from the CLI directly:
Create bridges. Run:
switch (config) # interface nve 1 nve bridge 7777
switch (config) # interface ethernet 1/1 nve vlan 10 bridge 7777
Configure source-node replication. Run:
switch (config) # no interface nve 1 nve fdb flood load-balance
Configure flood addresses for BUM traffic. Run:
switch (config) # interface nve 1 nve fdb flood bridge 7777 address 1.2.3.5
switch (config) # interface nve 1 nve fdb flood bridge 7777 address 1.2.3.6
Configure FDB remote learning. Run:
switch (config) # interface nve 1 nve fdb learning remote
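The controller-less steps above combine three behaviors: a static flood list per bridge, source-node (head-end) replication of BUM traffic to every address on that list, and remote learning of MAC-to-VTEP mappings from decapsulated traffic. The following Python sketch is a toy model of that forwarding decision, assuming one bridge; the class and method names are hypothetical and do not describe the switch's internal implementation.

```python
class NveBridgeFdb:
    """Toy model of one NVE bridge: a static flood list plus remotely learned MACs."""

    def __init__(self, local_vtep: str, flood_list: list[str]):
        self.local_vtep = local_vtep
        self.flood_list = list(flood_list)     # remote VTEPs that receive BUM copies
        self.remote_macs: dict[str, str] = {}  # inner MAC -> remote VTEP IP

    def learn(self, inner_src_mac: str, outer_src_ip: str) -> None:
        """Remote learning: map the inner source MAC to the VTEP that sent the packet."""
        self.remote_macs[inner_src_mac.lower()] = outer_src_ip

    def egress_vteps(self, dst_mac: str) -> list[str]:
        """Known unicast goes to one VTEP; BUM traffic is copied to every flood-list VTEP."""
        dst = dst_mac.lower()
        # The I/G bit of the first octet is set for multicast and broadcast MACs.
        is_group = int(dst.split(":")[0], 16) & 0x01
        if not is_group and dst in self.remote_macs:
            return [self.remote_macs[dst]]
        # Source-node replication: one unicast copy per remote VTEP.
        return [vtep for vtep in self.flood_list if vtep != self.local_vtep]

fdb = NveBridgeFdb("1.2.3.4", ["1.2.3.5", "1.2.3.6"])
print(fdb.egress_vteps("00:11:22:33:44:55"))  # ['1.2.3.5', '1.2.3.6'] (unknown unicast)
fdb.learn("00:11:22:33:44:55", "1.2.3.6")
print(fdb.egress_vteps("00:11:22:33:44:55"))  # ['1.2.3.6'] (learned)
print(fdb.egress_vteps("ff:ff:ff:ff:ff:ff"))  # ['1.2.3.5', '1.2.3.6'] (broadcast)
```

Once a MAC has been learned, unicast traffic to it stops being flooded and goes only to the VTEP behind which it was seen.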
Hardware Topology
2 ESXi servers pre-configured with VXLAN networking using VMware NSX
3 NSX Controllers available for VXLAN unicast type logical switches
1 NVIDIA switch connected to the ESXi servers and to a physical database server
Out-of-band network for management and a VLAN network to carry VXLAN traffic
Switch Configuration
Configure jumbo frames on the ESXi-facing and database server-facing interfaces. Run:
switch (config) # interface ethernet 1/1-1/3 mtu 9216 force
Create VLAN 3 to carry VXLAN traffic (if it does not exist yet). Run:
switch (config) # vlan 3
switch (config vlan 3) # exit
Enable IP routing. Run:
switch (config) # ip routing vrf default
Create an interface on VLAN 3 and assign an IP address to it.
The IP address must be the default gateway of the VXLAN netstack created by NSX after enabling VXLAN traffic on the hosts.
To check the default gateway in the vSphere Web Client, select an ESXi host and go to Configure → TCP/IP configuration. Then run:
switch (config) # interface vlan 3
switch (config interface vlan 3) # ip address 33.33.33.254 255.255.255.0
switch (config interface vlan 3) # interface vlan 3 mtu 9216
Create a loopback interface for communicating with the VTEPs on the ESXi servers, routed through interface vlan 3. Its IP address will be the VTEP IP assigned to the switch. Run:
switch (config) # interface loopback 1
switch (config interface loopback 1) # ip address 1.2.3.4 255.255.255.255
Enable the NVE protocol. Run:
switch (config) # protocol nve
Create the NVE interface. Run:
switch (config) # interface nve 1
Configure the source of the NVE interface to be the loopback created above. Run:
switch (config) # interface nve 1 vxlan source interface loopback 1
Start the OVSDB server and connect it to the NSX Controllers. Run:
switch (config) # ovs ovsdb server
switch (config) # ovs ovsdb manager remote ssl ip address 10.130.200.100
switch (config) # ovs ovsdb manager remote ssl ip address 10.144.200.101
switch (config) # ovs ovsdb manager remote ssl ip address 10.144.200.102
Configure the port facing the database server as an NVE port. Run:
switch (config) # interface ethernet 1/3 nve mode only force
Get the switch certificate for later configuration in NSX Manager. Run:
switch (config) # show crypto certificate name system-self-signed public-pem
Copy the certificate, starting with the line:
-----BEGIN CERTIFICATE-----
and ending with the line:
-----END CERTIFICATE-----
Make sure to include both of those lines.
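If the command output is captured to a file or a script, the certificate block can be cut out programmatically so that both marker lines are kept. This is a minimal Python sketch; the function name is hypothetical, and the sample text is a made-up, truncated stand-in for the real command output, which contains other fields around the PEM block.

```python
import re

# Non-greedy match from the BEGIN line to the first END line, across newlines.
PEM_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----\r?\n.*?\r?\n-----END CERTIFICATE-----",
    re.DOTALL,
)

def extract_certificate(cli_output: str) -> str:
    """Return the first PEM certificate in cli_output, including BEGIN and END lines."""
    match = PEM_RE.search(cli_output)
    if match is None:
        raise ValueError("no PEM certificate found in the command output")
    return match.group(0)

# Hypothetical, truncated sample; the base64 body is a placeholder, not a real certificate.
sample = """(other output fields)
-----BEGIN CERTIFICATE-----
MIIBszCCAVmgAwIBAgIJ...
-----END CERTIFICATE-----
switch (config) #"""

print(extract_certificate(sample))
```

The result is exactly the text to paste into the certificate field in NSX Manager.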
NSX Manager Configuration
Adding Hosts to the Replication Cluster
In NSX Manager, go to “Service Definitions” → “Hardware Devices”.
Under “Replication Cluster” click Edit.
Add both of the ESXi servers to the replication cluster.
All hosts added to the replication cluster can replicate BUM (Broadcast, Unknown unicast and Multicast) traffic to other ESXi servers.
When the switch needs to send BUM traffic to a virtual machine, it selects one of the hosts in the replication cluster and sends the traffic to it; that host then replicates it to all other ESXi hosts.
It is recommended to add at least 2 ESXi servers to the replication cluster for redundancy.
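The redundancy recommendation follows from how BUM traffic is delivered: everything depends on the switch finding a reachable replicator. The Python sketch below models that dependency using a hypothetical "first reachable host" selection policy; NSX's actual selection logic is not described here, and the host names are placeholders.

```python
from typing import Callable

def pick_replicator(cluster: list[str], is_up: Callable[[str], bool]) -> str:
    """Return the first reachable host in the replication cluster (hypothetical policy)."""
    for host in cluster:
        if is_up(host):
            return host
    raise RuntimeError("no replication host is reachable")

def replicate_bum(cluster: list[str], all_hosts: list[str],
                  is_up: Callable[[str], bool]) -> tuple[str, list[str]]:
    """Return the chosen replicator and the other hosts it forwards a copy to."""
    replicator = pick_replicator(cluster, is_up)
    copies = [h for h in all_hosts if h != replicator and is_up(h)]
    return replicator, copies

hosts = ["esxi-1", "esxi-2"]
print(replicate_bum(hosts, hosts, lambda h: True))           # ('esxi-1', ['esxi-2'])
print(replicate_bum(hosts, hosts, lambda h: h != "esxi-1"))  # ('esxi-2', [])
```

With a single-host cluster, losing that host leaves no replicator at all (the `RuntimeError` path), which is the outage that a second cluster member avoids.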
Adding the Switch to NSX
Under Hardware Devices click the + sign to add a new hardware device.
Fill in a name for the new hardware device.
Paste the switch certificate obtained earlier.
Click OK.
Wait until the new switch shows as “UP” in the Connectivity column. You may need to refresh the vSphere client a few times.
Mapping a Logical Switch to a Physical Switch Port
In NSX Manager go to “Logical Switches”.
Right click the logical switch you wish to map to the physical switch port and select “Manage Hardware Bindings”.
Click the “+” sign to add a new mapping instance.
Click Select under the Port column and select port “eth3”. This corresponds to Ethernet 1/3, which was configured earlier as an NVE port on the switch.
Under the VLAN column, set the VLAN that maps this logical switch to this specific switch port. You can map multiple logical switches to the same port on different VLANs (for example, to connect a firewall appliance to several logical switches). For an “access” configuration (no VLAN required on the host connected to the physical switch port), use VLAN 1.
Click OK.
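A hardware binding is essentially a lookup keyed by (port, VLAN), which is why several logical switches can share one port as long as each uses a different VLAN. The Python sketch below models that table; it is loosely based on the `vlan_bindings` column of the `Physical_Port` table in the OVSDB `hardware_vtep` schema (a per-port map from VLAN ID to logical switch), and the class and logical switch names are hypothetical.

```python
from typing import Optional

class PortBindings:
    """Hypothetical model of hardware bindings: (port, VLAN) -> logical switch."""

    def __init__(self) -> None:
        self.bindings: dict[str, dict[int, str]] = {}

    def bind(self, port: str, vlan: int, logical_switch: str) -> None:
        if not 1 <= vlan <= 4094:  # usable 802.1Q VLAN IDs
            raise ValueError(f"VLAN {vlan} is out of range")
        port_map = self.bindings.setdefault(port, {})
        current = port_map.get(vlan)
        if current is not None and current != logical_switch:
            # A single (port, VLAN) pair can carry only one logical switch.
            raise ValueError(f"{port} VLAN {vlan} is already bound to {current}")
        port_map[vlan] = logical_switch

    def lookup(self, port: str, vlan: int) -> Optional[str]:
        """Return the logical switch for traffic received on port with this VLAN."""
        return self.bindings.get(port, {}).get(vlan)

bindings = PortBindings()
bindings.bind("eth3", 1, "db-segment")   # "access" binding: VLAN 1
bindings.bind("eth3", 20, "fw-segment")  # a second logical switch on the same port
print(bindings.lookup("eth3", 1))   # db-segment
print(bindings.lookup("eth3", 30))  # None
```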
For more information about this feature and its potential applications, please refer to the following community posts:
RoCEv2 Using PFC and ECN
The following figure and flow demonstrate how to configure RoCEv2 using PFC and ECN. RoCEv2 QoS is carried in the DSCP field.
In Onyx, the DSCP value is automatically copied from the original packet into the outer header of the VXLAN-encapsulated packet.
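As a bit-level illustration of that copy, the Python sketch below reads the DSCP from an inner IPv4 header (the upper 6 bits of the second header byte) and places it in an outer ToS byte. It is a sketch, not the switch's datapath: DSCP 26 is only an example value, the function names are hypothetical, and the 2-bit ECN field is simply passed in as a parameter because its handling at encapsulation is outside the scope of this example.

```python
def dscp_of(ipv4_header: bytes) -> int:
    """DSCP is the upper 6 bits of the second IPv4 header byte (the former ToS byte)."""
    return ipv4_header[1] >> 2

def outer_tos(inner_ipv4_header: bytes, outer_ecn: int = 0) -> int:
    """Build an outer ToS byte that carries the inner packet's DSCP unchanged."""
    return (dscp_of(inner_ipv4_header) << 2) | (outer_ecn & 0x3)

# Example inner header: version/IHL 0x45, DSCP 26, ECN = ECT(0) (0b10), rest zeroed.
inner = bytes([0x45, (26 << 2) | 0b10]) + bytes(18)
print(dscp_of(inner))               # 26
print(hex(outer_tos(inner, 0b10)))  # 0x6a
```

Because the outer header carries the same DSCP, underlay switches that trust DSCP classify the encapsulated RoCEv2 traffic exactly as they would the original packet.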
Configure the switch buffer to support lossless traffic.
traffic pool roce type lossless
traffic pool roce memory percent 50.00
traffic pool roce map switch-priority 3
Enable ECN on traffic class 3.
interface ethernet 1/15 traffic-class 3 congestion-control ecn minimum-absolute 150 maximum-absolute 1500
interface ethernet 1/16 traffic-class 3 congestion-control ecn minimum-absolute 150 maximum-absolute 1500
interface mlag-port-channel 7-8 traffic-class 3 congestion-control ecn minimum-absolute 150 maximum-absolute 1500
interface port-channel 1 traffic-class 3 congestion-control ecn minimum-absolute 150 maximum-absolute 1500
Set traffic class 6 to strict-priority scheduling.
interface ethernet 1/15 traffic-class 6 dcb ets strict
interface ethernet 1/16 traffic-class 6 dcb ets strict
interface mlag-port-channel 7-8 traffic-class 6 dcb ets strict
interface port-channel 1 traffic-class 6 dcb ets strict
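The `minimum-absolute` and `maximum-absolute` values define a marking window on queue depth. The Python sketch below shows a generic RED-style ECN marking curve using the values above, under loudly stated assumptions: thresholds in KB, no marking at or below the minimum, a linear ramp between the thresholds, and every packet marked at or above the maximum. The function name is hypothetical, and the switch's actual marking curve may differ.

```python
def ecn_mark_probability(queue_kb: float, min_kb: float = 150,
                         max_kb: float = 1500, max_prob: float = 1.0) -> float:
    """Generic RED-style ECN marking probability for a given queue depth (assumed KB)."""
    if queue_kb <= min_kb:
        return 0.0          # shallow queue: never mark
    if queue_kb >= max_kb:
        return 1.0          # deep queue: mark every ECN-capable packet
    # Linear ramp from 0 at min_kb up to max_prob at max_kb.
    return max_prob * (queue_kb - min_kb) / (max_kb - min_kb)

for depth in (100, 150, 825, 1500, 2000):
    print(depth, ecn_mark_probability(depth))
```

Marked packets cause the receiver to send congestion notifications back to the sender, which lowers its rate before the queue grows deep enough to trigger PFC pauses.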
Set QoS trust to DSCP.
interface ethernet 1/15-1/16 qos trust L3
interface mlag-port-channel 7-8 qos trust L3
interface port-channel 1 qos trust L3
RoCEv1 Using PFC
The following figure and flow demonstrate how to configure RoCEv1 using PFC. RoCEv1 QoS is based on the PCP field sent by the server.
Configure the switch buffer to support lossless traffic.
traffic pool roce type lossless
traffic pool roce memory percent 50.00
traffic pool roce map switch-priority 3
Set the uplinks and the IPL to trust DSCP.
interface ethernet 1/15-1/16 qos trust L3
interface port-channel 1 qos trust L3
Set the downlinks to trust PCP.
interface mlag-port-channel 7-8 qos trust L2
Enable DSCP rewrite on the downlinks. This translates PCP into DSCP for VXLAN-encapsulated traffic.
interface mlag-port-channel 7-8 qos rewrite dscp
Enable PCP rewrite on the uplinks and the IPL. This translates DSCP back into PCP.
interface ethernet 1/15-1/16 qos rewrite pcp
interface port-channel 1 qos rewrite pcp
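The RoCEv1 flow above relies on a round trip: PCP from the server is classified into a switch priority, rewritten into DSCP so that it survives VXLAN encapsulation across the routed underlay, and translated back into PCP on the far side. The Python sketch below traces that chain. The specific mappings (PCP equal to switch priority, and DSCP equal to 8 times the priority) are assumptions chosen for illustration, not necessarily the switch's configured tables, and the function names are hypothetical.

```python
def classify_by_pcp(pcp: int) -> int:
    """'qos trust L2' on a downlink: assumed 1:1 PCP -> switch priority."""
    return pcp & 0x7

def rewrite_dscp(priority: int) -> int:
    """'qos rewrite dscp': assumed mapping DSCP = 8 * priority."""
    return priority << 3

def classify_by_dscp(dscp: int) -> int:
    """'qos trust L3' on an uplink or the IPL: assumed priority = DSCP // 8."""
    return (dscp >> 3) & 0x7

def rewrite_pcp(priority: int) -> int:
    """'qos rewrite pcp': assumed 1:1 switch priority -> PCP."""
    return priority

# A server sends RoCEv1 with PCP 3; the frame crosses the VXLAN fabric and is
# delivered with its priority restored.
dscp_on_wire = rewrite_dscp(classify_by_pcp(3))
pcp_out = rewrite_pcp(classify_by_dscp(dscp_on_wire))
print(dscp_on_wire, pcp_out)  # 24 3
```

Whatever mapping tables are actually configured, the property that matters is the one this sketch checks: translating PCP to DSCP and back must return the original priority, so that PFC on priority 3 still applies end to end.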