Many data centers today are moving from legacy Layer 2 (L2) designs to modern Layer 3 (L3) web-scale IT architectures. L3 designs simplify troubleshooting, provide clear upgrade strategies, support multi-vendor environments, and dramatically reduce the size of failure domains.
General Data Center Network with EVPN
However, many applications and storage appliances still require Layer 2 adjacency. VXLAN tunnels can satisfy this requirement, and EVPN serves as a standard for scale-out L2 Ethernet fabrics. VXLAN virtualizes the data center network, enabling Layer 2 segments to be extended over an IP core (the underlay). EVPN is the control plane for modern VXLAN deployments: it allows VTEPs to discover each other and exchange reachability information, such as MAC and IP addresses, across racks.
ARP suppression is used to reduce the number of broadcast packets crossing the extended L2 domain. BGP is the underlay routing protocol, serving as the transport layer for the VXLAN overlay.
Example of How To Configure EVPN
The configuration flow is described using the setup illustrated below, with leaf3 as the example switch.
Layer 2 Configuration, MLAG, and VLANs
MLAG between leaf3 and leaf4
Layer 2 Ports
In our setup, we use VLAN 6 as the native VLAN and VLAN 10 as the tagged VLAN.
Our servers use LACP bonds, so we enable LACP on the corresponding switch MPOs.
To support PXE boot, the MPOs must be set to "lacp-individual enable".
Layer 3 Configuration
Layer 3 Interfaces
- Since we use VXLAN, we set all of our L3 interfaces to an MTU of 9216. The servers' MTU should be set below the fabric MTU to leave room for the additional VXLAN headers. VXLAN encapsulation adds 50 bytes to the overall size of an Ethernet frame.
- Router ports serve as uplinks.
- Loopback for VTEP source is unique per leaf switch.
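The MTU guidance above can be checked with a little arithmetic. The sketch below is illustrative only: it assumes an IPv4 underlay without an outer 802.1Q tag, where the 50 bytes are commonly broken down as shown, and the function names are hypothetical.

```python
# Commonly cited VXLAN overhead for an IPv4 underlay without an outer VLAN tag.
OUTER_ETHERNET = 14  # outer Ethernet header
OUTER_IPV4 = 20      # outer IPv4 header
OUTER_UDP = 8        # outer UDP header
VXLAN_HEADER = 8     # VXLAN header carrying the VNI
VXLAN_OVERHEAD = OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER  # 50 bytes

FABRIC_MTU = 9216  # MTU configured on the L3 interfaces in this guide


def max_server_mtu(fabric_mtu: int = FABRIC_MTU) -> int:
    """Largest server MTU that still leaves room for the VXLAN headers."""
    return fabric_mtu - VXLAN_OVERHEAD


def fits_fabric(server_mtu: int, fabric_mtu: int = FABRIC_MTU) -> bool:
    """True if a frame of server_mtu bytes fits the fabric after encapsulation."""
    return server_mtu + VXLAN_OVERHEAD <= fabric_mtu


print(max_server_mtu())   # 9166
print(fits_fabric(9000))  # True
```

A server MTU of 9000, for example, stays comfortably below the 9166-byte ceiling that this calculation gives.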
VXLAN Tunnels Configuration
NVE represents a VTEP. We will use a single VTEP with multiple VNIs.
Note that "vxlan mlag-tunnel-ip" is used to configure MLAG with VXLAN. This way, other VTEPs see the MLAG pair as a single entity (for this reason, the "mlag-tunnel-ip" setting should be unique per MLAG pair). As long as the MLAG is up, both switches use the same IP as the VTEP source. If the MLAG state changes to Split Brain (IPL is down but the mgmt0 interface is up), the standby switch uses its local loopback for the advertisements; this prevents the Split Brain scenario from affecting traffic from stand-alone ports.
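The source-IP behavior described above can be summarized as a small decision function. This is a sketch of the rule as stated in this section, not the switch implementation. The class, the function name, and the IP addresses are hypothetical, and the code assumes the shared mlag-tunnel-ip is used in every state this section does not describe.

```python
from dataclasses import dataclass


@dataclass
class MlagPeer:
    role: str            # "master" or "standby"
    loopback_ip: str     # unique per leaf switch
    mlag_tunnel_ip: str  # shared by both switches of the MLAG pair


def vtep_source_ip(peer: MlagPeer, ipl_up: bool, mgmt_up: bool) -> str:
    """Pick the VTEP source IP a switch advertises.

    Split Brain is defined here as in the text: IPL down, mgmt0 up.
    Assumption: any state other than Split Brain on the standby
    keeps the shared mlag-tunnel-ip.
    """
    split_brain = (not ipl_up) and mgmt_up
    if split_brain and peer.role == "standby":
        return peer.loopback_ip
    return peer.mlag_tunnel_ip


# Hypothetical addresses, for illustration only.
leaf3 = MlagPeer("master", "10.10.10.3", "10.10.10.34")
leaf4 = MlagPeer("standby", "10.10.10.4", "10.10.10.34")

print(vtep_source_ip(leaf4, ipl_up=True, mgmt_up=True))   # shared tunnel IP
print(vtep_source_ip(leaf4, ipl_up=False, mgmt_up=True))  # leaf4's own loopback
```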
The only command needed to add more VNIs to a switch is:
In a traditional L2 network, broadcast traffic generated by ARP requests can overload the network. Using ARP suppression with VXLAN allows these messages to be suppressed at the leaf layer. Consider the example setup illustrated below.
Gratuitous ARP (GARP) is supported in EVPN, including when ARP suppression is enabled. GARP packets are generated on the egress VTEP only when neighbor suppression is enabled on both VTEPs in the chain (ingress and egress). Suppression should be enabled either on the NVE interface or on a particular VLAN of the VTEP.
- The first time Server2 communicates, it sends an ARP request.
- Leaf2 learns its MAC and IP, and sends an EVPN update containing the IP and MAC on the corresponding VNI4010.
- Leaf1 learns the IP and MAC of Server2 on VNI4010.
- When Server1 sends an ARP request to Server2, leaf1 replies to the ARP request as it has all of the details.
- The result is that broadcasts to all leafs that are part of VNI4010 are suppressed.
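The steps above can be sketched as a toy simulation. It is a simplified model, not the switch implementation: EVPN updates are delivered directly between leaf objects rather than over BGP through the spines, and the class name, method names, and addresses are hypothetical.

```python
class Leaf:
    """Toy leaf switch holding a per-VNI neighbor (IP to MAC) table."""

    def __init__(self, name):
        self.name = name
        self.neighbors = {}  # (vni, ip) -> mac
        self.peers = []      # other leafs that receive our EVPN updates

    def learn_local(self, vni, ip, mac):
        # A locally attached server was heard: store it and advertise it.
        self.neighbors[(vni, ip)] = mac
        for peer in self.peers:
            peer.receive_evpn_update(vni, ip, mac)

    def receive_evpn_update(self, vni, ip, mac):
        # A remote leaf advertised a MAC/IP binding for this VNI.
        self.neighbors[(vni, ip)] = mac

    def handle_arp_request(self, vni, target_ip):
        # Answer locally when the binding is known; otherwise flood.
        mac = self.neighbors.get((vni, target_ip))
        if mac is not None:
            return ("reply", mac)
        return ("flood", None)


leaf1, leaf2 = Leaf("leaf1"), Leaf("leaf2")
leaf1.peers, leaf2.peers = [leaf2], [leaf1]

# Before Server2 has spoken, leaf1 has nothing to answer with.
print(leaf1.handle_arp_request(4010, "192.168.10.2"))  # ('flood', None)

# Server2 sends its first ARP request; leaf2 learns it and advertises it.
leaf2.learn_local(4010, "192.168.10.2", "00:00:00:00:00:02")

# Server1 now asks for Server2; leaf1 replies on its behalf.
print(leaf1.handle_arp_request(4010, "192.168.10.2"))  # ('reply', '00:00:00:00:00:02')
```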
IPv4 Normal ARP
IPv6 Neighbor Discovery (equivalent to IPv4 ARP)
IPv6 Unsolicited Neighbor Advertisement (equivalent to IPv4 GARP)
* the GARP (Gratuitous ARP) packet will reach the destination endpoint despite neighbor suppression
Because IPv4 GARP is processed locally on the ingress VTEP, and only the resulting BGP update is propagated through the EVPN network, there are several limitations related to scale and performance. These limitations vary with the CPU type and the current switch load; switches with higher performance achieve better results. The minimum expected performance is listed below.
- Ingress VTEP: max 1000 frames per second of ingress GARP
- Egress VTEP: at least 100 fps for GARP generation
BGP and EVPN Configuration
The examples below use eBGP. Nevertheless, iBGP can be used as well.
Now we will configure our L3 underlay using eBGP as the underlay protocol. The Autonomous System (AS) design that we use as an example represents a common eBGP design for leaf/spine data centers. Specifically, each leaf switch is in a separate AS, and all spine switches share a single AS.
Note: It is necessary to advertise both the local loopback network and the mlag-tunnel-ip network.
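The AS design described above can be sketched as a simple numbering plan. The AS numbers and switch names below are hypothetical, chosen from the 16-bit private AS range (64512 to 65534), and the helper names are illustrative.

```python
def build_as_plan(spines, leaves, spine_as=65000, first_leaf_as=65001):
    """Give every spine the same AS and every leaf its own AS."""
    plan = {spine: spine_as for spine in spines}
    for offset, leaf in enumerate(leaves):
        plan[leaf] = first_leaf_as + offset
    return plan


def ebgp_sessions(plan, spines, leaves):
    """List leaf-to-spine sessions; each one crosses an AS boundary (eBGP)."""
    return [
        (leaf, spine)
        for leaf in leaves
        for spine in spines
        if plan[leaf] != plan[spine]
    ]


spines = ["spine1", "spine2"]
leaves = ["leaf1", "leaf2", "leaf3", "leaf4"]
plan = build_as_plan(spines, leaves)

for name, asn in plan.items():
    print(f"{name}: AS {asn}")
print(len(ebgp_sessions(plan, spines, leaves)), "eBGP sessions")
```

With two spines and four leaves, every leaf peers with both spines, which gives eight eBGP sessions in this sketch.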
EVPN Address Family
In the following code, we create a peer group that contains all of the EVPN configuration and attach it to our L3 interfaces.
Each spine has a unique loopback address that we use to represent its Router-ID.
Traffic Behavior During Failures
Server Link Failure
Traffic forwarding during a failure follows standard MLAG behavior. If a link of the server fails, traffic will be forwarded across one of the remaining active links.
With reference to the illustration below: if traffic is received on leaf3 due to the ECMP hash of the spine, leaf3 decapsulates the frame and, based on its local MAC table, switches it across the peer link so that leaf4 can forward it to the server.
To cover rare cases such as losing all of the uplinks on one of the MLAG peers, we enable BGP over the IPL. This way, traffic coming from the servers towards that leaf can still be routed towards the remote servers.
Note: Traffic coming towards the servers connected to leaf4 from the spine will always be terminated on leaf4 and sent directly to the servers without passing over the IPL.
show interface nve 1
Display the configured VTEP on a network device participating in BGP EVPN.
show interface nve 1 detail
Display the configured VNIs on a network device participating in BGP EVPN.
show ip bgp evpn summary
Display the BGP peers participating in the layer 2 EVPN address-family and their states.
show ip bgp evpn
Display all EVPN routes, both local and remote. The routes are listed per route distinguisher (RD), because the output spans all VNIs.
show ip bgp evpn vni 10060
Display the EVPN information for a specific VNI in detail.
show ip bgp evpn with multiple filters
Display the EVPN information for a specific VNI in detail, using different filters.
Display all local and remote MAC addresses.
show ip arp
Display all local and remote neighbors (ARP entries). This command is only relevant when arp-suppression is enabled.
EVPN Data Center Interconnect (DCI)
Layer 2 DCI Connection
The regular BGP/EVPN configuration is required, since the connection between the sites is L2-based.
Layer 3 Routed WAN
Because the WAN transport layer does not support the BGP EVPN address family, a remote BGP/EVPN session should be established between each of the local leafs and the remote leafs. To allow this session, BGP should be set to multi-hop mode.
EVPN Centralized L3 Gateway
In a centralized L3 gateway design, a specific VTEP can be configured to act as the default gateway for all the hosts in a particular subnet throughout the EVPN network. It is possible to provision an MLAG pair in active-active mode as the default gateway. The VTEP routes traffic to the destination host in addition to performing VXLAN ingress and egress bridging.
Configuration Example of EVPN Centralized Gateway
Run the following:
VTEP Key Outputs
Configuration Example of MLAG EVPN Centralized Gateway
Configure the MLAG Master in the following way:
Configure the MLAG Standby in the following way:
EVPN Logging Examples
EVPN MAC Mobility Logs
A MAC mobility warning is raised when a MAC address is observed moving between a local site and one or more remote customer sites 5 times within a period of 180 seconds. This indicates that multiple hosts have been configured with the same MAC address. The MAC mobility warning is cleared when only one route for the MAC address is left (either local or remote).
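The detection rule can be illustrated with a sliding-window counter. This is a minimal sketch of the stated threshold (5 moves within 180 seconds), not the switch's actual algorithm. The class and method names are hypothetical, and the clearing condition (a single remaining route) is not modeled.

```python
from collections import defaultdict, deque

MOVE_THRESHOLD = 5    # moves that trigger the warning
WINDOW_SECONDS = 180  # observation window


class MacMobilityMonitor:
    """Count MAC moves per address inside a sliding time window."""

    def __init__(self):
        self._last_location = {}           # mac -> last site it was seen at
        self._moves = defaultdict(deque)   # mac -> timestamps of recent moves

    def observe(self, mac, location, now):
        """Record a sighting at time `now` (seconds); return True on warning."""
        previous = self._last_location.get(mac)
        self._last_location[mac] = location
        moves = self._moves[mac]
        if previous is not None and previous != location:
            moves.append(now)
        # Drop moves that have fallen out of the window.
        while moves and now - moves[0] > WINDOW_SECONDS:
            moves.popleft()
        return len(moves) >= MOVE_THRESHOLD


monitor = MacMobilityMonitor()
mac = "00:11:22:33:44:55"
monitor.observe(mac, "local", 0)
warning = False
for i, site in enumerate(["remote", "local", "remote", "local", "remote"], start=1):
    warning = monitor.observe(mac, site, i * 10)
print(warning)  # True: five moves within 50 seconds
```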
When detecting EVPN MAC duplication, the following message will appear:
A static MAC error is detected when a remote route is received for a MAC address for which a local existing route has been marked as static. The local route being marked as static indicates that the MAC address is not expected to move. In this case, any remote route with this MAC address is an error. The static MAC error is cleared when all remote routes for the MAC address are withdrawn or if the local route is no longer marked as static.
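The static MAC condition can be expressed as a small state check. This sketch only mirrors the rule described above; the class, field, and VTEP names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MacEntry:
    """State kept for one MAC address on the local VTEP."""
    local_static: bool = False                       # local route marked static
    remote_vteps: set = field(default_factory=set)   # VTEPs advertising this MAC

    def in_static_error(self) -> bool:
        # Error: the local route is static and at least one remote route exists.
        return self.local_static and bool(self.remote_vteps)


entry = MacEntry(local_static=True)
entry.remote_vteps.add("vtep-remote-1")      # a remote route arrives
print(entry.in_static_error())               # True

entry.remote_vteps.discard("vtep-remote-1")  # the remote route is withdrawn
print(entry.in_static_error())               # False
```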
When an EVPN MAC mobility route is received for a static MAC address, the following message will appear: