NVIDIA® Cumulus Linux is the first full-featured Linux operating system for the networking industry. The Debian Buster-based, networking-focused distribution runs on hardware produced by a broad partner ecosystem, ensuring unmatched customer choice regarding silicon, optics, cables, and systems.
This user guide provides in-depth documentation on the Cumulus Linux installation process, system configuration and management, network solutions, and monitoring and troubleshooting recommendations. In addition, the quick start guide provides an end-to-end setup process to get you started.
Cumulus Linux 4.3 includes the NetQ agent and CLI, which are installed by default on the Cumulus Linux switch. Use NetQ to monitor and manage your data center network infrastructure and operational health. Refer to the NetQ documentation for details.
For a list of the new features in this release, see What's New. For bug fixes and known issues present in this release, refer to the Cumulus Linux 4.3 Release Notes.
Open Source Contributions
To implement various Cumulus Linux features, Cumulus Networks has forked various software projects, like CFEngine Netdev and some Puppet Labs packages. Some of the forked code resides in the Cumulus Networks GitHub repository and some is available as part of the Cumulus Linux repository as Debian source packages.
Cumulus Networks has also developed and released new applications as open source. The list of open source projects is on the open source software page.
Download the User Guide
You can view the complete Cumulus Linux 4.3 user guide as a single page to print to PDF here.
What's New
This document supports the Cumulus Linux 4.3 release, and lists new platforms and features.
This quick start guide provides an end-to-end setup process for installing and running Cumulus Linux, as well as a collection of example commands for getting started after installation is complete.
Prerequisites
Intermediate-level Linux knowledge is assumed for this guide. You need to be familiar with basic text editing, Unix file permissions, and process monitoring. A variety of text editors are pre-installed, including vi and nano.
You must have access to a Linux or UNIX shell. If you are running Windows, use a Linux environment like Cygwin as your command line tool for interacting with Cumulus Linux.
If you are a networking engineer but are unfamiliar with Linux concepts, refer to this reference guide to compare the Cumulus Linux CLI and configuration options, and their equivalent Cisco Nexus 3000 NX-OS commands and settings. You can also watch a series of short videos introducing you to Linux and Cumulus Linux-specific concepts.
Install Cumulus Linux
To install Cumulus Linux, you use ONIE (Open Network Install Environment), an extension to the traditional U-Boot software that allows for automatic discovery of a network installer image. This facilitates the ecosystem model of procuring switches with an operating system choice, such as Cumulus Linux. The easiest way to install Cumulus Linux with ONIE is with local HTTP discovery:
If your host (laptop or server) is IPv6-enabled, make sure it is running a web server. If the host is IPv4-enabled, make sure it is running DHCP in addition to a web server.
Download the Cumulus Linux installation file to the root directory of the web server. Rename this file onie-installer.
Connect your host using an Ethernet cable to the management Ethernet port of the switch.
Power on the switch. The switch downloads the ONIE image installer and boots. You can watch the progress of the install in your terminal. After the installation completes, the Cumulus Linux login prompt appears in the terminal window.
These steps describe a flexible unattended installation method. You do not need a console cable. A fresh install with ONIE using a local web server typically completes in less than ten minutes.
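The download-and-rename step above can be sketched as a small shell helper. This is a minimal sketch under stated assumptions: stage_onie_installer is a hypothetical function name, and both the image path and the web root directory are placeholders for your own host.

```shell
# Sketch: copy a downloaded image into a web root under the name onie-installer.
# stage_onie_installer is a hypothetical helper, not part of Cumulus Linux or ONIE.
stage_onie_installer() {
  image=$1      # path to the downloaded Cumulus Linux installation file
  webroot=$2    # root directory served by your web server (placeholder)
  if [ ! -f "$image" ]; then
    echo "no such image: $image" >&2
    return 1
  fi
  # The file must be named onie-installer and be readable by the web server.
  cp "$image" "$webroot/onie-installer" && chmod 644 "$webroot/onie-installer"
}
```

Call it with the path of the downloaded file and the directory your web server serves, for example stage_onie_installer ./downloaded-image /path/to/webroot.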
You have more options for installing Cumulus Linux with ONIE. Read Installing a New Cumulus Linux Image to install Cumulus Linux using ONIE in the following ways:
DHCP/web server with and without DHCP options
Web server without DHCP
FTP without a web server
Local file
USB
After installing Cumulus Linux, you are ready to:
Log in to Cumulus Linux on the switch.
Install the Cumulus Linux license.
Configure Cumulus Linux. This quick start guide provides instructions on configuring switch ports and a loopback interface.
Get Started
When starting Cumulus Linux for the first time, the management port makes a DHCPv4 request. To determine the IP address of the switch, you can cross reference the MAC address of the switch with your DHCP server. The MAC address is typically located on the side of the switch or on the box in which the unit ships.
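If your DHCP server is ISC dhcpd, the cross reference can be sketched as a lookup in its lease file. This is an illustration only: ip_for_mac is a hypothetical helper, the lease file location depends on your DHCP server, and the parsing assumes the usual "lease <ip> { ... hardware ethernet <mac>; ... }" layout.

```shell
# Sketch: print the most recent leased IP address for a given MAC address.
# ip_for_mac is a hypothetical helper; the lease file format is assumed to be ISC dhcpd's.
ip_for_mac() {
  mac=$1        # MAC address printed on the switch or its box
  leases=$2     # path to the DHCP lease file (placeholder)
  awk -v mac="$mac" '
    $1 == "lease" { ip = $2 }                      # remember the IP of the current lease block
    $1 == "hardware" && $2 == "ethernet" {
      sub(/;$/, "", $3)                            # drop the trailing semicolon
      if (tolower($3) == tolower(mac)) found = ip  # later entries override earlier ones
    }
    END { if (found != "") print found; else exit 1 }
  ' "$leases"
}
```

The function exits with a non-zero status and prints nothing when no lease matches the MAC address.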
Login Credentials
The default installation includes the system account (root), with full system privileges, and the user account (cumulus), with sudo privileges. The root account password is locked by default (which prohibits login). The cumulus account is configured with this default password:
cumulus
When you log into Cumulus Linux for the first time with the cumulus account, you are prompted to change the default password. After you provide a new password, the SSH session disconnects and you have to reconnect with the new password.
In this quick start guide, you use the cumulus account to configure Cumulus Linux.
All accounts except root are permitted remote SSH login; you can use sudo to grant a non-root account root-level access. Commands that change the system configuration require this elevated level of access.
You are encouraged to perform management and configuration over the network, either in band or out of band. A serial console is fully supported; however, you might prefer the convenience of network-based management.
Typically, switches ship from the manufacturer with a mating DB9 serial cable. Switches with ONIE are always set to a 115200 baud rate.
Wired Ethernet Management
Switches supported in Cumulus Linux always contain at least one dedicated Ethernet management port, which is named eth0. This interface is geared specifically for out-of-band management use. The management interface uses DHCPv4 for addressing by default. You can set a static IP address with the Network Command Line Utility (NCLU) or by editing the /etc/network/interfaces file (Linux).
Set the static IP address with the interface address and interface gateway NCLU commands:
cumulus@switch:~$ net add interface eth0 ip address 192.0.2.42/24
cumulus@switch:~$ net add interface eth0 ip gateway 192.0.2.1
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
To set a static IP address on Linux, edit the eth0 stanza in the /etc/network/interfaces file. To set a static IP address with CUE, run the following commands:
cumulus@switch:~$ cl set interface eth0 ip address 192.0.2.42/24
cumulus@switch:~$ cl set interface eth0 ip gateway 192.0.2.1
cumulus@switch:~$ cl config apply
Configure the Hostname and Time Zone
Configure the hostname and time zone for your switch. The hostname identifies the switch; make sure you configure the hostname to be unique and descriptive.
Do not use an underscore (_) in the hostname; underscores are not permitted.
Avoid using apostrophes or non-ASCII characters in the hostname. Cumulus Linux does not parse these characters.
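The naming rules above can be checked before you apply a hostname. The following is a minimal sketch, not a Cumulus Linux tool: valid_hostname is a hypothetical helper, and it is deliberately stricter than the rules stated here, accepting only ASCII letters, digits, and hyphens, with no leading or trailing hyphen.

```shell
# Sketch: reject hostnames that contain underscores, apostrophes, or non-ASCII characters.
# valid_hostname is a hypothetical helper; the letters/digits/hyphens rule is an assumption.
valid_hostname() (
  LC_ALL=C    # make the character ranges below ASCII-only
  case "$1" in
    '' | *[!A-Za-z0-9-]* | -* | *-) exit 1 ;;   # empty, illegal character, or edge hyphen
    *) exit 0 ;;
  esac
)

if valid_hostname "leaf01"; then
  echo "leaf01 is acceptable"
fi
```

The body runs in a subshell so that the locale change does not leak into the calling shell.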
To change the hostname:
Run the net add hostname command, which modifies both the /etc/hostname and /etc/hosts files with the desired hostname. The following example sets the hostname to leaf01:
cumulus@switch:~$ net add hostname leaf01
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
On Linux, change the hostname with the hostnamectl command. With CUE, the following example sets the hostname to leaf01:
cumulus@switch:~$ cl set platform hostname leaf01
cumulus@switch:~$ cl config apply
The command prompt in the terminal does not reflect the new hostname until you either log out of the switch or start a new shell.
When you use the NCLU command to set the hostname, DHCP does not override the hostname when you reboot the switch. However, if you disable the hostname setting with NCLU, DHCP does override the hostname the next time you reboot the switch.
The default time zone on the switch is UTC (Coordinated Universal Time). Change the time zone on your switch to the time zone for your location.
To update the time zone, use interactive mode:
Run the following command in a terminal.
cumulus@switch:~$ sudo dpkg-reconfigure tzdata
Follow the on-screen menu options to select the geographic area and region.
Programs that are already running (including log files) and users currently logged in do not see time zone changes made with interactive mode. To set the time zone for all services and daemons, reboot the switch.
Verify the System Time
Before you install the license, verify that the date and time on the switch are correct, and correct them if necessary. If the date and time are incorrect, the switch might not be able to synchronize with Puppet or might return errors after you restart switchd:
Warning: Unit file of switchd.service changed on disk, 'systemctl daemon-reload' recommended.
Install the License
Cumulus Linux is licensed on a per-instance basis. Each network system is fully operational, enabling any capability to be utilized on the switch with the exception of forwarding on switch panel ports. Only eth0 and console ports are activated on an unlicensed instance of Cumulus Linux. Enabling front panel ports requires a license.
NVIDIA provides a generic license for Cumulus Linux. Download the license from the NVIDIA Enterprise support portal and apply it.
There are three ways to install the license onto the switch:
Copy the license from a local server. Create a text file with the license and copy it to a server accessible from the switch. On the switch, use the following command to transfer the file directly on the switch, then install the license file:
It is not necessary to reboot the switch to activate the switch ports. After you install the license, restart the switchd service. All front panel ports become active and show up as swp1, swp2, and so on.
Restarting the switchd service causes all network ports to reset, interrupting network services, in addition to resetting the switch hardware configuration.
If a license is not installed on a Cumulus Linux switch, the switchd service does not start. After you install the license, start switchd as described above.
Configure Breakout Ports with Splitter Cables
If you are using 4x10G DAC or AOC cables, or want to break out 100G or 40G switch ports, configure the breakout ports. For more details, see Switch Port Attributes.
Test Cable Connectivity
By default, all data plane ports (every Ethernet port except the management interface, eth0) are turned off.
To test cable connectivity:
To administratively enable a port:
cumulus@switch:~$ net add interface swp1
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
To administratively enable all physical ports, run the following command, where swp1-52 represents a switch with switch ports numbered from swp1 to swp52:
cumulus@switch:~$ net add interface swp1-52
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
To view link status, use the net show interface all command. The following examples show the output of ports in admin down, down, and up modes:
cumulus@switch:~$ net show interface all
State Name Spd MTU Mode LLDP Summary
----- ------------- --- ----- ------------- ---------------------- -------------------------
UP lo N/A 65536 Loopback IP: 127.0.0.1/8
lo IP: 10.0.0.11/32
lo IP: 10.0.0.112/32
lo IP: ::1/128
UP eth0 1G 1500 Mgmt oob-mgmt-switch (swp6) Master: mgmt(UP)
eth0 IP: 192.168.0.11/24(DHCP)
UP swp1 1G 9000 BondMember server01 (eth1) Master: bond01(UP)
UP swp2 1G 9000 BondMember server02 (eth1) Master: bond02(UP)
ADMDN swp45 N/A 1500 NotConfigured
ADMDN swp46 N/A 1500 NotConfigured
ADMDN swp47 N/A 1500 NotConfigured
ADMDN swp48 N/A 1500 NotConfigured
UP swp49 1G 9000 BondMember leaf02 (swp49) Master: peerlink(UP)
UP swp50 1G 9000 BondMember leaf02 (swp50) Master: peerlink(UP)
UP swp51 1G 9216 NotConfigured spine01 (swp1)
UP swp52 1G 9216 NotConfigured spine02 (swp1)
UP bond01 1G 9000 802.3ad Master: bridge(UP)
bond01 Bond Members: swp1(UP)
UP bond02 1G 9000 802.3ad Master: bridge(UP)
bond02 Bond Members: swp2(UP)
UP bridge N/A 1500 Bridge/L2
UP mgmt N/A 65536 Interface/L3 IP: 127.0.0.1/8
UP peerlink 2G 9000 802.3ad Master: bridge(UP)
peerlink Bond Members: swp49(UP)
peerlink Bond Members: swp50(UP)
DN peerlink.4094 2G 9000 SubInt/L3 IP: 169.254.1.1/30
ADMDN vagrant N/A 1500 NotConfigured
UP vlan13 N/A 1500 Interface/L3 Master: vrf1(UP)
vlan13 IP: 10.1.3.11/24
UP vlan13-v0 N/A 1500 Interface/L3 Master: vrf1(UP)
vlan13-v0 IP: 10.1.3.1/24
UP vlan24 N/A 1500 Interface/L3 Master: vrf1(UP)
vlan24 IP: 10.2.4.11/24
UP vlan24-v0 N/A 1500 Interface/L3 Master: vrf1(UP)
vlan24-v0 IP: 10.2.4.1/24
UP vlan4001 N/A 1500 NotConfigured Master: vrf1(UP)
UP vni13 N/A 9000 Access/L2 Master: bridge(UP)
UP vni24 N/A 9000 Access/L2 Master: bridge(UP)
UP vrf1 N/A 65536 NotConfigured
UP vxlan4001 N/A 1500 Access/L2 Master: bridge(UP)
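To pick out the ports that are still administratively down from output like the above, you can filter on the State column. This is a minimal sketch: admin_down_ports is a hypothetical helper, and it assumes the first two whitespace-separated columns are State and Name, as in the sample output.

```shell
# Sketch: read `net show interface all` output on stdin and print the names
# of interfaces whose State column is ADMDN.
# admin_down_ports is a hypothetical helper, not an NCLU command.
admin_down_ports() {
  awk '$1 == "ADMDN" { print $2 }'
}

# Possible usage on a switch:
#   net show interface all | admin_down_ports
```

Continuation lines in the output (such as extra IP addresses for lo) start with the interface name rather than a state, so they never match.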
To enable a port, run the ip link set <interface> up command. For example:
cumulus@switch:~$ sudo ip link set swp1 up
As root, run the following bash script to administratively enable all physical ports:
cumulus@switch:~$ sudo su -
root@switch:~# for i in /sys/class/net/*; do iface=$(basename "$i"); if [[ $iface == swp* ]]; then ip link set "$iface" up; fi; done
To view link status, use the ip link show command. The following examples show the output of a port in down and up mode:
# Administratively Down
swp1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT qlen 1000
# Administratively Up but Layer 1 protocol is Down
swp1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT qlen 500
# Administratively Up, Layer 1 protocol is Up
swp1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 500
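The three cases above differ only in the flag list between the angle brackets: no UP flag means administratively down, UP without LOWER_UP means the link is down, and LOWER_UP means the link is up. A minimal sketch of that classification follows; link_state is a hypothetical helper that takes one line of ip link show output as its argument.

```shell
# Sketch: classify a port from the <...> flag list in one line of `ip link show` output.
# link_state is a hypothetical helper, not part of iproute2.
link_state() {
  flags=${1#*"<"}             # drop everything up to and including "<"
  flags=",${flags%%">"*},"    # keep the flag list and wrap it in commas
  case "$flags" in
    *,LOWER_UP,*) echo "up" ;;          # admin up and layer 1 up
    *,UP,*)       echo "down" ;;        # admin up, layer 1 down
    *)            echo "admin down" ;;  # UP flag absent
  esac
}

link_state 'swp1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT qlen 500'
```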
To administratively enable a port with CUE:
cumulus@switch:~$ cl set interface swp1 link state up
cumulus@switch:~$ cl config apply
To administratively enable all physical ports, run the following command, where swp1-52 represents a switch with switch ports numbered from swp1 to swp52:
cumulus@switch:~$ cl set interface swp1-52 link state up
cumulus@switch:~$ cl config apply
Configure Switch Ports
Layer 2 Port Configuration
Cumulus Linux does not put all ports into a bridge by default. To create a bridge and configure one or more front panel ports as members of the bridge, use the following examples as a guide.
In the following configuration example, the front panel port swp1 is placed into a bridge called bridge.
cumulus@switch:~$ net add bridge bridge ports swp1
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
You can add a range of ports in one command. For example, to add swp1 through swp10, swp12, and swp14 through swp20 to bridge:
cumulus@switch:~$ net add bridge bridge ports swp1-10,12,14-20
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
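To see exactly which ports a range expression such as swp1-10,12,14-20 covers before committing it, you can expand it locally. This is an illustrative sketch only: expand_swp_ranges is a hypothetical helper that mimics the range notation described above and does not call NCLU.

```shell
# Sketch: expand a port range expression like "swp1-10,12,14-20" into one name per line.
# expand_swp_ranges is a hypothetical helper; it only mimics the notation, not NCLU itself.
expand_swp_ranges() {
  spec=$1
  prefix=${spec%%[0-9]*}      # text before the first digit, for example "swp"
  list=${spec#"$prefix"}      # remaining numeric list, for example "1-10,12,14-20"
  for part in $(printf '%s' "$list" | tr ',' ' '); do
    case "$part" in
      *-*) first=${part%-*}; last=${part#*-} ;;   # a range such as 14-20
      *)   first=$part; last=$part ;;             # a single port such as 12
    esac
    n=$first
    while [ "$n" -le "$last" ]; do
      printf '%s%s\n' "$prefix" "$n"
      n=$((n + 1))
    done
  done
}

expand_swp_ranges swp1-10,12,14-20
```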
In the following /etc/network/interfaces example, the front panel port swp1 is placed into a bridge called br0:
...
auto br0
iface br0
bridge-ports swp1
bridge-stp on
To put a range of ports into a bridge, use the glob keyword. For example, to add swp1 through swp10, swp12, and swp14 through swp20 to br0:
...
auto br0
iface br0
bridge-ports glob swp1-10 swp12 glob swp14-20
bridge-stp on
To activate or apply the configuration to the kernel:
# First, check for typos:
cumulus@switch:~$ sudo ifquery -a
# Then activate the change if no errors are found:
cumulus@switch:~$ sudo ifup -a
In the following configuration example, the front panel port swp1 is placed into a bridge called br_default.
To view the changes in the kernel, use the Linux brctl show command:
cumulus@switch:~$ brctl show
bridge name bridge id STP enabled interfaces
br0 8000.089e01cedcc2 yes swp1
Layer 3 Port Configuration
You can also configure a front panel port or bridge interface as a layer 3 port.
In the following configuration example, the front panel port swp1 is configured as a layer 3 access port:
cumulus@switch:~$ net add interface swp1 ip address 10.1.1.1/30
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
To add an IP address to a bridge interface, you must put it into a VLAN interface. If you want to use a VLAN other than the native one, set the bridge PVID:
cumulus@switch:~$ net add vlan 100 ip address 10.2.2.1/24
cumulus@switch:~$ net add bridge bridge pvid 100
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
In the following /etc/network/interfaces example, the front panel port swp1 is configured as a layer 3 access port:
auto swp1
iface swp1
address 10.1.1.1/30
To add an IP address to a bridge interface, include the address under the iface stanza in the /etc/network/interfaces file. If you want to use a VLAN other than the native one, set the bridge PVID:
auto br0
iface br0
address 10.2.2.1/24
bridge-ports glob swp1-10 swp12 glob swp14-20
bridge-pvid 100
bridge-stp on
To activate or apply the configuration to the kernel:
# First check for typos:
cumulus@switch:~$ sudo ifquery -a
# Then activate the change if no errors are found:
cumulus@switch:~$ sudo ifup -a
In the following CUE example, the front panel port swp1 is configured as a layer 3 access port:
cumulus@switch:~$ cl set interface swp1 ip address 10.1.1.1/30
cumulus@switch:~$ cl config apply
To add an IP address to a bridge interface, you must put it into a VLAN interface. If you want to use a VLAN other than the native one, set the bridge PVID:
cumulus@switch:~$ cl set interface swp1-2 bridge domain br_default
cumulus@switch:~$ cl set bridge domain br_default vlan 100
cumulus@switch:~$ cl set interface vlan100 ip address 10.2.2.1/24
cumulus@switch:~$ cl set bridge domain br_default untagged 100
cumulus@switch:~$ cl config apply
To view the changes in the kernel, use the ip addr show command:
cumulus@switch:~$ ip addr show
...
4. swp1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bridge state UP group default qlen 1000
link/ether 44:38:39:00:6e:fe brd ff:ff:ff:ff:ff:ff
...
14: bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 44:38:39:00:00:04 brd ff:ff:ff:ff:ff:ff
inet6 fe80::4638:39ff:fe00:4/64 scope link
valid_lft forever preferred_lft forever
...
Configure a Loopback Interface
Cumulus Linux has a preconfigured loopback interface. When the switch boots up, it has a loopback interface, called lo, which is up and assigned an IP address of 127.0.0.1.
The loopback interface lo must always be specified in the /etc/network/interfaces file and must always be up.
To see the status of the loopback interface (lo):
Use the net show interface lo command.
cumulus@switch:~$ net show interface lo
Name MAC Speed MTU Mode
-- ------ ----------------- ------- ----- --------
UP lo 00:00:00:00:00:00 N/A 65536 Loopback
Alias
-----
loopback interface
IP Details
------------------------- --------------------
IP: 127.0.0.1/8, ::1/128
IP Neighbor(ARP) Entries: 0
The loopback is up and is assigned an IP address of 127.0.0.1.
To add an IP address to a loopback interface, configure the lo interface:
cumulus@switch:~$ net add loopback lo ip address 10.10.10.1/32
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Use the ip addr show lo command.
cumulus@switch:~$ ip addr show lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
The loopback is up and is assigned an IP address of 127.0.0.1.
To add an IP address to a loopback interface, add it directly under the iface lo inet loopback definition in the /etc/network/interfaces file:
auto lo
iface lo inet loopback
address 10.10.10.10
If an IP address is configured without a mask (as shown in the preceding example), the IP address becomes a /32. In the preceding case, 10.10.10.10 is actually 10.10.10.10/32.
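The default mask rule can be expressed as a one-line normalization, which is handy when comparing addresses in a script. This is a minimal sketch for IPv4 addresses only; with_default_mask is a hypothetical helper and is not something Cumulus Linux provides.

```shell
# Sketch: show how an IPv4 address written without a mask is interpreted as a /32.
# with_default_mask is a hypothetical helper.
with_default_mask() {
  case "$1" in
    */*) printf '%s\n' "$1" ;;      # a mask is already present, keep it
    *)   printf '%s/32\n' "$1" ;;   # no mask, so the address is treated as a /32
  esac
}

with_default_mask 10.10.10.10
```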
Use the cl show interface lo command.
cumulus@switch:~$ cl show interface lo
running applied pending description
----------------------- ----------- -------- -------- ----------------------------------------------------------------------
type loopback loopback loopback The type of interface
ip
vrf default default Virtual routing and forwarding
ipv4 forward forward IPv4 support on the interface. A value of 'on' means IPv4 is enable...
ipv6 forward forward IPv6 support on the interface. A value of 'on' means IPv6 is enable...
[address] 127.0.0.1/8 ipv4 and ipv6 address
[address] ::1/128
link
mtu 65536 interface mtu
state up The state of the interface
stats
carrier-transitions 0 Number of times the interface state has transitioned between up and...
in-bytes 8360290 total number of bytes received on the interface
in-drops 0 number of received packets dropped
in-errors 0 number of received packets with errors
in-pkts 127169 total number of packets received on the interface
out-bytes 8360290 total number of bytes transmitted out of the interface
out-drops 0 The number of outbound packets that were chosen to be discarded eve...
out-errors 0 The number of outbound packets that could not be transmitted becaus...
out-pkts 127169 total number of packets transmitted out of the interface
Alias
-----
loopback interface
IP Details
------------------------- --------------------
IP: 127.0.0.1/8, ::1/128
IP Neighbor(ARP) Entries: 0
The loopback is up and is assigned an IP address of 127.0.0.1.
To add an IP address to a loopback interface, configure the lo interface:
cumulus@switch:~$ cl set interface lo ip address 10.10.10.1/32
cumulus@switch:~$ cl config apply
To determine if your switch is on an x86 or ARM platform, run the uname -m command.
For example, on an x86 platform, uname -m outputs x86_64:
cumulus@switch:~$ uname -m
x86_64
On an ARM platform, uname -m outputs armv7l:
cumulus@switch:~$ uname -m
armv7l
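In a script, the same check can select between x86 and ARM instructions. The sketch below is an assumption-laden illustration: platform_family is a hypothetical helper, and it only recognizes the two uname -m values shown above (x86_64 and names starting with arm).

```shell
# Sketch: map `uname -m` output to the platform family used in this guide.
# platform_family is a hypothetical helper; pass a machine name or let it call uname -m.
platform_family() {
  case "${1:-$(uname -m)}" in
    x86_64) echo "x86" ;;
    arm*)   echo "ARM" ;;
    *)      echo "unknown" ;;
  esac
}

platform_family
```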
Reprovision the System (Restart the Installer)
Reprovisioning the system deletes all system data from the switch.
To stage an ONIE installer from the network (where ONIE automatically locates the installer), run the onie-select -i command. A reboot is required for the reinstall to begin.
cumulus@switch:~$ sudo onie-select -i
WARNING:
WARNING: Operating System install requested.
WARNING: This will wipe out all system data.
WARNING:
Are you sure (y/N)? y
Enabling install at next reboot...done.
Reboot required to take effect.
To cancel a pending reinstall operation, run the onie-select -c command:
cumulus@switch:~$ sudo onie-select -c
Cancelling pending install at next reboot...done.
To stage an installer located in a specific location, run the onie-install -i command. You can specify a local absolute or relative path, or an HTTP, HTTPS, SCP, or FTP server. You can also stage a Zero Touch Provisioning (ZTP) script along with the installer.
The onie-install command is typically used with the -a option to activate installation. If you do not specify the -a option, a reboot is required for the reinstall to begin.
The following example stages the installer located at http://203.0.113.10/image-installer together with the ZTP script located at http://203.0.113.10/ztp-script, and activates installation and ZTP, with all options specified together in the same command:
cumulus@switch:~$ sudo onie-install -i http://203.0.113.10/image-installer -z http://203.0.113.10/ztp-script -a
To see more onie-install options, run man onie-install.
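When you script reprovisioning, it can help to assemble the command line first and review it before running it. The following sketch only prints the command and uses only the -i, -z, and -a options described above; build_onie_install_cmd is a hypothetical helper, and the URLs are the documentation examples.

```shell
# Sketch: print (do not run) an onie-install command for an installer URL
# and an optional ZTP script URL.
# build_onie_install_cmd is a hypothetical helper.
build_onie_install_cmd() {
  installer=$1    # location of the installer image
  ztp=${2:-}      # optional location of the ZTP script
  cmd="sudo onie-install -i $installer"
  if [ -n "$ztp" ]; then
    cmd="$cmd -z $ztp"
  fi
  printf '%s -a\n' "$cmd"   # -a activates the staged installation
}

build_onie_install_cmd http://203.0.113.10/image-installer http://203.0.113.10/ztp-script
```

Printing the command instead of executing it keeps the sketch safe to run anywhere, because reprovisioning deletes all system data from the switch.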
Migrate from Cumulus Linux to ONIE (Uninstall All Images and Remove the Configuration)
To remove all installed images and configurations, and return the switch to its factory defaults, run the onie-select -k command.
The onie-select -k command takes a long time to run as it overwrites the entire NOS section of the flash. Only use this command if you want to erase all NOS data and take the switch out of service.
cumulus@switch:~$ sudo onie-select -k
WARNING:
WARNING: Operating System uninstall requested.
WARNING: This will wipe out all system data.
WARNING:
Are you sure (y/N)? y
Enabling uninstall at next reboot...done.
Reboot required to take effect.
A reboot is required for the uninstallation process to begin.
To cancel a pending uninstall operation, run the onie-select -c command:
cumulus@switch:~$ sudo onie-select -c
Cancelling pending uninstall at next reboot...done.
Boot into Rescue Mode
If your system becomes unresponsive in some way, you can correct certain issues by booting into ONIE rescue mode. In rescue mode, the file systems are unmounted and you can use various Cumulus Linux utilities to try to resolve a problem.
To reboot the system into ONIE rescue mode, run the onie-select -r command:
cumulus@switch:~$ sudo onie-select -r
WARNING:
WARNING: Rescue boot requested.
WARNING:
Are you sure (y/N)? y
Enabling rescue at next reboot...done.
Reboot required to take effect.
A reboot is required to boot into rescue mode.
To cancel a pending rescue boot operation, run the onie-select -c command:
cumulus@switch:~$ sudo onie-select -c
Cancelling pending rescue at next reboot...done.
Inspect the Image File
The Cumulus Linux image file is executable. From a running switch, you can display, extract, and verify the contents of the image file.
To display the contents of the Cumulus Linux image file, pass the info option to the image file. For example, to display the contents of an image file called onie-installer located in the /var/lib/cumulus/installer directory:
To extract the contents of the image file, use the extract <path> option. For example, to extract an image file called onie-installer located in the /var/lib/cumulus/installer directory to the mypath directory:
cumulus@switch:~$ sudo /var/lib/cumulus/installer/onie-installer extract mypath
total 181860
-rw-r--r-- 1 4000 4000 308 May 16 19:04 control
drwxr-xr-x 5 4000 4000 4096 Apr 26 21:28 embedded-installer
-rw-r--r-- 1 4000 4000 13273936 May 16 19:04 initrd
-rw-r--r-- 1 4000 4000 4239088 May 16 19:04 kernel
-rw-r--r-- 1 4000 4000 168701528 May 16 19:04 sysroot.tar
To verify the contents of the image file, use the verify option. For example, to verify the contents of an image file called onie-installer located in the /var/lib/cumulus/installer directory:
cumulus@switch:~$ sudo /var/lib/cumulus/installer/onie-installer verify
Verifying image checksum ...OK.
Preparing image archive ... OK.
./cumulus-linux-bcm-amd64.bin.1: 161: ./cumulus-linux-bcm-amd64.bin.1: onie-sysinfo: not found
Verifying image compatibility ...OK.
Verifying system ram ...OK.
In Cumulus Linux 4.2.0 and later, the default password for the cumulus user account is cumulus. The first time you log into Cumulus Linux, you are required to change this default password. Be sure to update any automation scripts before installing a new image. Cumulus Linux provides command line options to change the default password automatically during the installation process. Refer to ONIE Installation Options.
You can install a new Cumulus Linux image using ONIE, an open source project (equivalent to PXE on servers) that enables the installation of network operating systems (NOS) on bare metal switches.
Before you install Cumulus Linux, the switch can be in two different states:
No image is installed on the switch (the switch is only running ONIE).
Cumulus Linux is already installed on the switch but you want to use ONIE to reinstall Cumulus Linux or upgrade to a newer version.
Cumulus Linux 4.3.1 and 4.3.2 support Broadcom switches only. You cannot upgrade to Cumulus Linux 4.3.1 or 4.3.2 on a Mellanox switch.
The sections below describe some of the different ways you can install the Cumulus Linux image, such as using a DHCP/web server, FTP, a local file, or a USB drive. Steps are provided for both installing directly from ONIE (if no image is installed on the switch) and from Cumulus Linux (if the image is already installed on the switch), where applicable. For additional methods to find and install the Cumulus Linux image, see the ONIE Design Specification.
Installing the Cumulus Linux image is destructive: configuration files on the switch are not saved. Copy them to a different server before installing.
In the following procedures:
You can name your Cumulus Linux image using any of the ONIE naming schemes mentioned here.
In the example commands, [PLATFORM] can be any supported Cumulus Linux platform, such as x86_64 or arm.
Run the sudo onie-install -h command to show the ONIE installer options.
After you install the Cumulus Linux image, you need to install the license file. Refer to Install the License.
Install Using a DHCP/Web Server with DHCP Options
To install Cumulus Linux using a DHCP/web server with DHCP options, set up a DHCP/web server on your laptop and connect the eth0 management port of the switch to your laptop. After you connect the cable, the installation proceeds as follows:
The bare metal switch boots up and requests an IP address (DHCP request).
The DHCP server acknowledges and responds with DHCP option 114 and the location of the installation image.
ONIE downloads the Cumulus Linux image, installs, and reboots.
Success! You are now running Cumulus Linux.
The most common method is to send DHCP option 114 with the entire URL to the web server (this can be the same system). However, there are many other ways to use DHCP even if you do not have full control over DHCP. See the ONIE user guide for help with partial installer URLs and advanced DHCP options; both articles list more supported DHCP options.
Here is an example DHCP configuration with an ISC DHCP server:
Place the Cumulus Linux image in a directory on the web server.
From the Cumulus Linux command prompt, run the onie-install command, then reboot the switch.
cumulus@switch:~$ sudo onie-install -a -i http://10.0.1.251/path/to/cumulus-install-[PLATFORM].bin
Install Using a Web Server with no DHCP
Follow the steps below if you can log into the switch on a serial console (ONIE), or log in on the console or with ssh (Install from Cumulus Linux) but no DHCP server is available.
You need a console connection to access the switch; you cannot perform this procedure remotely.
ONIE is in discovery mode. You must disable discovery mode with the following command:
onie# onie-discovery-stop
On older ONIE versions, if the onie-discovery-stop command is not supported, run:
onie# /etc/init.d/discover.sh stop
Assign a static address to eth0 with the ip addr add command:
ONIE:/ # ip addr add 10.0.1.252/24 dev eth0
Place the Cumulus Linux image in a directory on your web server.
Run the installer manually (because there are no DHCP options):
From the Cumulus Linux command prompt, run the onie-install command, then reboot the switch.
cumulus@switch:~$ sudo onie-install -a -i /path/to/local/file/cumulus-install-[PLATFORM].bin
Install Using a USB Drive
Follow the steps below to install the Cumulus Linux image using a USB drive. Instructions are provided for x86 and ARM platforms.
Installing Cumulus Linux using a USB drive is fine for an occasional single switch but does not scale. Unlike USB installs, DHCP can scale to hundreds of switch installs with zero manual input.
From a computer, prepare your USB drive by formatting it using one of the supported formats: FAT32, vFAT or EXT2.
Optional: Prepare a USB Drive inside Cumulus Linux
Insert your USB drive into the USB port on the switch running Cumulus Linux and log in to the switch. Examine output from cat /proc/partitions and sudo fdisk -l [device] to determine on which device your USB drive can be found. For example, sudo fdisk -l /dev/sdb.
These instructions assume your USB drive is the /dev/sdb device, which is typical if you insert the USB drive after the machine is already booted. However, if you insert the USB drive during the boot process, it is possible that your USB drive is the /dev/sda device. Make sure to modify the commands below to use the proper device for your USB drive.
Create a new partition table on the USB drive. (The parted utility should already be installed. However, if it is not, install it with sudo -E apt-get install parted.)
sudo parted /dev/sdb mklabel msdos
Create a new partition on the USB drive:
sudo parted /dev/sdb -a optimal mkpart primary 0% 100%
Format the partition to your filesystem of choice using one of the examples below:
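The formatting examples are not reproduced here. As a minimal sketch, assuming the standard mkfs tools from e2fsprogs and dosfstools are installed and the new partition is /dev/sdb1, the hypothetical helper below prints (but does not run) a suitable command for each supported format so you can check the device name first.

```shell
#!/bin/sh
# Print, without running it, the mkfs command for a filesystem and partition.
# The helper name and the fat32/vfat/ext2 keywords are illustrative only.
mkfs_command() {
    fs="$1"
    part="$2"
    case "$fs" in
        ext2)  echo "sudo mkfs.ext2 $part" ;;
        fat32) echo "sudo mkfs.vfat -F 32 $part" ;;
        vfat)  echo "sudo mkfs.vfat $part" ;;
        *)     echo "unsupported filesystem: $fs" >&2; return 1 ;;
    esac
}

mkfs_command ext2 /dev/sdb1
mkfs_command fat32 /dev/sdb1
```

Formatting erases the partition, so confirm the device with sudo fdisk -l before running the printed command.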
If you rename the installation file on a Mac or Windows computer, the file extension might still be present. Make sure to remove the file extension; otherwise, ONIE cannot detect the file.
Insert the USB drive into the switch, then continue with the appropriate instructions below for your x86 or ARM platform.
Prepare the switch for installation:
If the switch is offline, connect to the console and power on the switch.
If the switch is already online in ONIE, use the reboot command.
SSH sessions to the switch get dropped after this step. To complete the remaining instructions, connect to the console of the switch. Cumulus Linux switches display their boot process to the console; you need to monitor the console specifically to complete the next step.
Monitor the console and select the ONIE option from the first GRUB screen shown below.
Cumulus Linux on x86 uses GRUB chainloading to present a second GRUB menu specific to the ONIE partition. No action is necessary in this menu to select the default option ONIE: Install OS.
The USB drive is recognized and mounted automatically. The image file is located and automatic installation of Cumulus Linux begins. Here is some sample output:
```
ONIE: OS Install Mode ...
Version : quanta_common_rangeley-2019.05.05-6919d98-201410171013
Build Date: 2019-10-17T10:13+0800
Info: Mounting kernel filesystems... done.
Info: Mounting LABEL=ONIE-BOOT on /mnt/onie-boot ...
initializing eth0...
scsi 6:0:0:0: Direct-Access SanDisk Cruzer Facet 1.26 PQ: 0 ANSI: 6
sd 6:0:0:0: [sdb] 31266816 512-byte logical blocks: (16.0 GB/14.9 GiB)
sd 6:0:0:0: [sdb] Write Protect is off
sd 6:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
sd 6:0:0:0: [sdb] Attached SCSI disk
<...snip...>
ONIE: Executing installer: file://dev/sdb1/onie-installer-x86_64
Verifying image checksum ... OK.
Preparing image archive ... OK.
Dumping image info...
Control File Contents
=====================
Description: Cumulus Linux
OS-Release: 4.1.0
Architecture: amd64
Date: Fri, 22 November 2019 17:10:30 -0700
Installer-Version: 1.2
Platforms: accton_as5712_54x accton_as6712_32x mlx_sx1400_i73612 dell_s4000_c2338 dell_s3000_c2338 cel_redstone_xp cel_smallstone_xp cel_pebble quanta_panther quanta_ly8_rangeley quanta_ly6_rangeley quanta_ly9_rangeley
Homepage: http://www.cumulusnetworks.com/
```
After installation completes, the switch automatically reboots into the newly installed instance of Cumulus Linux.
Prepare the switch for installation:
If the switch is offline, connect to the console and power on the switch.
If the switch is already online in ONIE, use the reboot command.
SSH sessions to the switch get dropped after this step. To complete the remaining instructions, connect to the console of the switch. Cumulus Linux switches display their boot process to the console; you need to monitor the console specifically to complete the next step.
Interrupt the normal boot process before the countdown (shown below) completes. Press any key to stop the autoboot.
A command prompt appears so that you can run commands. Execute the following command:
run onie_bootcmd
The USB drive is recognized and mounted automatically. The image file is located and automatic installation of Cumulus Linux begins. Here is some sample output:
```
Loading Open Network Install Environment ...
Platform: arm-as4610_54p-r0
Version : 1.6.1.3
WARNING: adjusting available memory to 30000000
## Booting kernel from Legacy Image at ec040000 ...
Image Name: as6701_32x.1.6.1.3
Image Type: ARM Linux Multi-File Image (gzip compressed)
Data Size: 4456555 Bytes = 4.3 MiB
Load Address: 00000000
Entry Point: 00000000
Contents:
Image 0: 3738543 Bytes = 3.6 MiB
Image 1: 706440 Bytes = 689.9 KiB
Image 2: 11555 Bytes = 11.3 KiB
Verifying Checksum ... OK
## Loading init Ramdisk from multi component Legacy Image at ec040000 ...
## Flattened Device Tree from multi component Image at EC040000
Booting using the fdt at 0xec47d388
Uncompressing Multi-File Image ... OK
Loading Ramdisk to 2ff53000, end 2ffff788 ... OK
Loading Device Tree to 03ffa000, end 03fffd22 ... OK
<...snip...>
ONIE: Starting ONIE Service Discovery
ONIE: Executing installer: file://dev/sdb1/onie-installer-arm
Verifying image checksum ... OK.
Preparing image archive ... OK.
Dumping image info ...
Control File Contents
=====================
Description: Cumulus Linux
OS-Release: 4.1.0
Architecture: arm
Date: Fri, 13 March 2020 17:08:35 -0700
Installer-Version: 1.2
Platforms: accton_as4600_54t, accton_as6701_32x, accton_5652, accton_as5610_52x, dni_6448, dni_7448, dni_c7448n, cel_kennisis, cel_redstone, cel_smallstone, cumulus_p2020, quanta_lb9, quanta_ly2, quanta_ly2r, quanta_ly6_p2020
Homepage: http://www.cumulusnetworks.com/
```
After installation completes, the switch automatically reboots into the newly installed instance of Cumulus Linux.
ONIE Installation Options
You can run several installer command line options from ONIE to perform basic switch configuration automatically after installation completes and Cumulus Linux boots for the first time. These options enable you to:
Set a unique password for the cumulus user
Apply a Cumulus Linux license
Provide an initial network configuration
Execute a ZTP script to perform necessary configuration
The onie-nos-install command does not allow you to specify command line parameters. You must access the switch from the console and transfer a disk image to the switch. You must then make the disk image executable and install the image directly from the ONIE command line with the options you want to use.
The following example commands transfer a disk image to the switch, make the image executable, and install the image with the --password option to change the default cumulus user password:
You can run more than one option in the same command.
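The transfer, chmod, and install sequence described above can be sketched as follows. The helper onie_install_steps and the server name are hypothetical; the helper only prints the three commands that you would then type at the ONIE prompt, and it assumes wget is available in ONIE.

```shell
#!/bin/sh
# Print the ONIE-side commands for installing an image with installer options.
# Usage: onie_install_steps IMAGE_URL [INSTALLER_OPTION...]
onie_install_steps() {
    url="$1"
    shift
    file="${url##*/}"    # the image file name is the last URL component
    echo "wget $url"
    echo "chmod 755 $file"
    if [ "$#" -gt 0 ]; then
        echo "./$file $*"
    else
        echo "./$file"
    fi
}

# Hypothetical server; two options combined in one installer command.
onie_install_steps http://myserver.example.com/cumulus-linux-4.3.0-bcm-amd64.bin \
    --password "'MyP4\$\$word'" --interfaces-file network.intf
```

The single quotes around the password are printed deliberately so that the shell at the ONIE prompt does not expand $$.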
Set the cumulus User Password
The default cumulus user account password is cumulus. When you log into Cumulus Linux for the first time, you must provide a new password for the cumulus account, then log back into the system. This password change is required.
To automate this process, you can specify a new password from the command line of the installer with the --password '<clear text-password>' option. For example, to change the default cumulus user password to MyP4$$word:
To provide a hashed password instead of a clear text password, use the --hashed-password '<hash>' option. Using an encrypted hash is recommended to maintain a secure management network.
Generate a sha-512 password hash with the following python command. The example command generates a sha-512 password hash for the password MyP4$$word.
If you specify both the --password and --hashed-password options, the --hashed-password option takes precedence and the --password option is ignored.
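As an alternative to the Python command, a SHA-512 crypt hash can also be produced with OpenSSL. This is a sketch under the assumption that OpenSSL 1.1.1 or later is installed on your workstation; the helper name hash_password is hypothetical.

```shell
#!/bin/sh
# Generate a SHA-512 crypt hash (a string starting with $6$) for a password.
# The password is passed on stdin so it does not appear in the process list.
hash_password() {
    printf '%s\n' "$1" | openssl passwd -6 -stdin
}

# Single quotes keep the shell from expanding $$ in the password.
hash_password 'MyP4$$word'
```

The salt is random, so the hash differs on every run. Pass the resulting string, in single quotes, to the --hashed-password option.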
Apply a Cumulus Linux License
To apply a license and start the switchd service automatically after Cumulus Linux boots for the first time after installation, use the --license <license-string> option. For example:
Provide Initial Network Configuration
To provide initial network configuration automatically when Cumulus Linux boots for the first time after installation, use the --interfaces-file <filename> option. For example, to copy the contents of a file called network.intf into the /etc/network/interfaces file and run the ifreload -a command:
Execute a ZTP Script
To run a ZTP script that contains commands to execute after Cumulus Linux boots for the first time after installation, use the --ztp <filename> option. For example, to run a ZTP script called initial-conf.ztp:
The ZTP script must contain the CUMULUS-AUTOPROVISIONING string near the beginning of the file and must reside on the ONIE filesystem. Refer to Zero Touch Provisioning - ZTP.
If you use the --ztp option together with any of the other command line options, the ZTP script takes precedence and the other command line options are ignored.
Edit the Cumulus Linux Image (Advanced)
The Cumulus Linux disk image file contains a BASH script that includes a set of variables. Set these variables to install a fully configured system from a single image file.
To edit the image
Example Image File
The Cumulus Linux disk image file is a self-extracting executable. The executable part of the file is a BASH script and is located at the beginning of the file. Towards the beginning of this BASH script are a set of variables set to an empty string:
CL_INSTALLER_PASSWORD
Defines the clear text password. This variable is equivalent to the ONIE installer command line option --password.
CL_INSTALLER_HASHED_PASSWORD
Defines the hashed password. This variable is equivalent to the ONIE installer command line option --hashed-password. If you set both the CL_INSTALLER_PASSWORD and CL_INSTALLER_HASHED_PASSWORD variables, CL_INSTALLER_HASHED_PASSWORD takes precedence.
CL_INSTALLER_LICENSE
Defines the Cumulus Linux license you want to install. This variable is equivalent to the ONIE installer command line option --license.
CL_INSTALLER_INTERFACES_FILENAME
Defines the name of the file on the ONIE filesystem you want to use as the /etc/network/interfaces file. This variable is equivalent to the ONIE installer command line option --interfaces-file.
CL_INSTALLER_INTERFACES_CONTENT
Describes the network interfaces available on your system and how to activate them. Setting this variable defines the contents of the /etc/network/interfaces file. There is no equivalent ONIE installer command line option. If you set both the CL_INSTALLER_INTERFACES_FILENAME and CL_INSTALLER_INTERFACES_CONTENT variables, the CL_INSTALLER_INTERFACES_FILENAME takes precedence.
CL_INSTALLER_ZTP_FILENAME
Defines the name of the ZTP file on the ONIE filesystem you want to execute at first boot after installation. This variable is equivalent to the ONIE installer command line option --ztp.
Edit the Image File
Because the Cumulus Linux image file is mostly a binary file, you cannot use standard text editors to edit the file directly. Instead, you must split the file into two parts, edit the first part, then put the two parts back together.
Copy the first 20 lines to an empty file:
head -20 cumulus-linux-4.3.0-bcm-amd64.bin > cumulus-linux-4.3.0-bcm-amd64.bin.1
Remove the first 20 lines of the image, then copy the remaining lines into another empty file:
sed -e '1,20d' cumulus-linux-4.3.0-bcm-amd64.bin > cumulus-linux-4.3.0-bcm-amd64.bin.2
The original file is now split, with the first 20 lines in cumulus-linux-4.3.0-bcm-amd64.bin.1 and the remaining lines in cumulus-linux-4.3.0-bcm-amd64.bin.2.
Use a text editor to change the variables in cumulus-linux-4.3.0-bcm-amd64.bin.1.
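Put the two parts back together:
cat cumulus-linux-4.3.0-bcm-amd64.bin.1 cumulus-linux-4.3.0-bcm-amd64.bin.2 > cumulus-linux-4.3.0-bcm-amd64.bin.final
Calculate the new checksum and update the CL_INSTALLER_PAYLOAD_SHA256 variable:
sed -e '1,/^exit_marker$/d' "cumulus-linux-4.3.0-bcm-amd64.bin.final" | sha256sum | awk '{ print $1 }'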
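The split, edit, rejoin, and checksum steps can be tried end to end on a stand-in file before you touch a real image. The sketch below builds a small fake image; the file name demo-image.bin, its contents, and the placement of exit_marker on line 20 are assumptions for illustration, not the layout of an actual Cumulus Linux image.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
img="$work/demo-image.bin"

# Build a stand-in image: a 20-line script header ending in exit_marker,
# followed by a payload. A real image carries a binary payload.
{
    echo '#!/bin/sh'
    echo "CL_INSTALLER_PAYLOAD_SHA256=''"
    echo "CL_INSTALLER_PASSWORD=''"
    i=4
    while [ "$i" -le 19 ]; do echo "# filler line $i"; i=$((i + 1)); done
    echo 'exit_marker'
    printf 'payload-bytes\n'
} > "$img"

# 1. Split the file after line 20.
head -20 "$img" > "$img.1"
sed -e '1,20d' "$img" > "$img.2"

# 2. Edit a variable in the text part (sed stands in for a text editor).
sed "s/^CL_INSTALLER_PASSWORD=.*/CL_INSTALLER_PASSWORD='MyP4\$\$word'/" \
    "$img.1" > "$img.1.tmp"
mv "$img.1.tmp" "$img.1"

# 3. Rejoin the two parts.
cat "$img.1" "$img.2" > "$img.final"

# 4. Checksum everything after the exit_marker line.
sum=$(sed -e '1,/^exit_marker$/d' "$img.final" | sha256sum | awk '{ print $1 }')

# 5. Record the checksum in the header, then rejoin again. The payload is
#    unchanged, so the checksum stays valid.
sed "s/^CL_INSTALLER_PAYLOAD_SHA256=.*/CL_INSTALLER_PAYLOAD_SHA256='$sum'/" \
    "$img.1" > "$img.1.tmp"
mv "$img.1.tmp" "$img.1"
cat "$img.1" "$img.2" > "$img.final"

grep '^CL_INSTALLER_' "$img.final"
echo "demo files are in $work"
```

Because the checksum covers only the bytes after exit_marker, editing the header variables does not invalidate it.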
This is an example of a modified image file:
...
CL_INSTALLER_PAYLOAD_SHA256='d14a028c2a3a2bc9476102bb288234c415a2b01f828ea62ac332e42f'
CL_INSTALLER_PASSWORD='MyP4$$word'
CL_INSTALLER_HASHED_PASSWORD=''
CL_INSTALLER_LICENSE='customer@datacenter.com|4C3YMCACDiK0D/EnrxlXpj71FBBNAg4Yrq+brza4ZtJFCInvalid'
CL_INSTALLER_INTERFACES_FILENAME=''
CL_INSTALLER_INTERFACES_CONTENT='# This file describes the network interfaces available on your system and how to activate them.
source /etc/network/interfaces.d/*.intf
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
auto eth0
iface eth0 inet dhcp
vrf mgmt
auto bridge
iface bridge
bridge-ports swp1 swp2
bridge-pvid 1
bridge-vids 10 11
bridge-vlan-aware yes
auto mgmt
iface mgmt
address 127.0.0.1/8
address ::1/128
vrf-table auto
'
CL_INSTALLER_ZTP_FILENAME=''
...
You can install this edited image file in the usual way, using either the ONIE install waterfall or the onie-nos-install command.
If you install the modified installation image and specify installer command line parameters, the command line parameters take precedence over the variables modified in the image.
In Cumulus Linux 4.2.0 and later, the default password for the cumulus user account is cumulus. The first time you log into Cumulus Linux, you are required to change this default password. Be sure to update any automation scripts before you upgrade. You can use ONIE command line options to change the default password automatically during the Cumulus Linux image installation process. Refer to ONIE Installation Options.
This topic describes how to upgrade Cumulus Linux on your switch.
Consider deploying, provisioning, configuring, and upgrading switches using automation, even with small networks or test labs. During the upgrade process, you can quickly upgrade dozens of devices in a repeatable manner. Using tools like Ansible, Chef, or Puppet for configuration management greatly increases the speed and accuracy of the next major upgrade; these tools also enable the quick swap of failed switch hardware.
Understanding the location of configuration data is required for successful upgrades, migrations, and backup. As with other Linux distributions, the /etc directory is the primary location for all configuration data in Cumulus Linux. The following list is a likely set of files that you need to back up and migrate to a new release. Make sure you examine any file that has been changed. Make the following files and directories part of a backup strategy.
| File Name and Location | Explanation |
| --- | --- |
| /etc/network/ | Network configuration files, most notably /etc/network/interfaces and /etc/network/interfaces.d/ |
|  | Per-platform hardware configuration directory, created on first boot. Do not copy. |
| /etc/mlx/ | Per-platform hardware configuration directory, created on first boot. Do not copy. |
| /etc/default/clagd | Created and managed by ifupdown2. Do not copy. |
| /etc/default/grub | Grub init table. Do not modify manually. |
| /etc/default/hwclock | Platform hardware-specific file. Created during first boot. Do not copy. |
| /etc/init | Platform initialization files. Do not copy. |
| /etc/init.d/ | Platform initialization files. Do not copy. |
| /etc/fstab | Static information on filesystem. Do not copy. |
| /etc/image-release | System version data. Do not copy. |
| /etc/os-release | System version data. Do not copy. |
| /etc/lsb-release | System version data. Do not copy. |
| /etc/lvm/archive | Filesystem files. Do not copy. |
| /etc/lvm/backup | Filesystem files. Do not copy. |
| /etc/modules | Created during first boot. Do not copy. |
| /etc/modules-load.d/ | Created during first boot. Do not copy. |
| /etc/sensors.d | Platform-specific sensor data. Created during first boot. Do not copy. |
| /root/.ansible | Ansible tmp files. Do not copy. |
| /home/cumulus/.ansible | Ansible tmp files. Do not copy. |
You can check which files have changed since the last Cumulus Linux image install with the following commands. Be sure to back up any changed files:
Run the sudo dpkg --verify command to show a list of changed files.
Run the egrep -v '^$|^#|=""$' /etc/default/isc-dhcp-* command to see if any of the generated /etc/default/isc-* files have changed.
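To turn the dpkg --verify report into a list of files to back up, a filter along the following lines may help. It is a sketch that assumes the rpm-style report format that dpkg --verify prints by default, where changed configuration files carry a c marker in the second column; the helper name and the sample lines are hypothetical, and paths containing spaces are not handled.

```shell
#!/bin/sh
# Extract the paths of changed configuration files from
# `dpkg --verify` style output on stdin.
changed_conffiles() {
    awk '$2 == "c" { print $3 }'
}

# Sample report lines (hypothetical): one conffile, one ordinary file.
printf '%s\n' \
    '??5?????? c /etc/frr/daemons' \
    '??5??????   /usr/bin/example' | changed_conffiles
```

On a switch, sudo dpkg --verify | changed_conffiles produces the list, which you can then hand to your backup tooling.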
Create a cl-support File
Before and after you upgrade the switch, run the cl-support script to create a cl-support archive file. The file is a compressed archive of useful information for troubleshooting. If you experience any issues during upgrade, you can send this archive file to the Cumulus Linux support team to investigate.
Create the cl-support archive file with the cl-support command:
cumulus@switch:~$ sudo cl-support
Copy the cl-support file off the switch to a different location.
After upgrade is complete, run the cl-support command again to create a new archive file:
cumulus@switch:~$ sudo cl-support
Upgrade Cumulus Linux
You can upgrade Cumulus Linux in one of two ways:
Install a Cumulus Linux image of the new release, using ONIE.
Upgrade only the changed packages using the sudo -E apt-get update and sudo -E apt-get upgrade commands.
Cumulus Linux also provides the Smart System Manager that enables you to upgrade an active switch with minimal disruption to the network. See Smart System Manager.
Upgrading an MLAG pair requires additional steps. If you are using MLAG to dual connect two Cumulus Linux switches in your environment, follow the steps in Upgrade Switches in an MLAG Pair below to ensure a smooth upgrade.
Cumulus Linux 4.3.1 is supported on Broadcom switches only. You cannot upgrade to Cumulus Linux 4.3.1 on a Mellanox switch.
NVIDIA does not provide a 4.3.1 image for Mellanox switches.
Should I Install a Cumulus Linux Image or Upgrade Packages?
The decision to upgrade Cumulus Linux by either installing a Cumulus Linux image or upgrading packages depends on your environment and your preferences. Here are some recommendations for each upgrade method.
Installing a Cumulus Linux image is recommended if you are performing a rolling upgrade in a production environment and if you are using up-to-date and comprehensive automation scripts. This upgrade method enables you to choose the exact release to which you want to upgrade and is the only method available to upgrade your switch to a new release train (for example, from 3.7.12 to 4.1.0).
Be aware of the following when installing the Cumulus Linux image:
Installing a Cumulus Linux image is destructive: configuration files on the switch are not saved. Copy them to a different server before you start the Cumulus Linux image install.
You must move configuration data to the new OS using ZTP or automation when the OS first boots, or soon afterwards using out-of-band management.
Merge conflicts with configuration file changes in the new release might go undetected.
If configuration files are not restored correctly, you might be unable to ssh to the switch from in-band management. Out-of-band connectivity (eth0 or console) is recommended.
You must reinstall and reconfigure third-party applications after upgrade.
Package upgrade is recommended if you are upgrading from Cumulus Linux 4.0, or if you use third-party applications (package upgrade does not replace or remove third-party applications, unlike the Cumulus Linux image install).
Be aware of the following when upgrading packages:
You cannot upgrade the switch to a new release train. For example, you cannot upgrade the switch from 3.7.x to 4.1.0.
The sudo -E apt-get upgrade command might result in services being restarted or stopped as part of the upgrade process.
The sudo -E apt-get upgrade command might disrupt core services by changing core service dependency packages.
After you upgrade, account UIDs and GIDs created by packages might be different on different switches, depending on the configuration and package installation history.
Cumulus Linux Image Install (ONIE)
ONIE is an open source project (equivalent to PXE on servers) that enables the installation of network operating systems (NOS) on a bare metal switch.
Lightweight network virtualization (LNV) is deprecated in Cumulus Linux 4.0 in favor of Ethernet virtual private networks (EVPN). If your network is configured for LNV, you need to migrate your network configuration to a BGP EVPN configuration that is functionally equivalent before you upgrade. Refer to Migrating from LNV to EVPN.
To upgrade the switch:
Back up the configurations off the switch.
Download the Cumulus Linux image.
Install the Cumulus Linux image with the onie-install -a -i <image-location> command, which boots the switch into ONIE. The following example command installs the image from a web server, then reboots the switch. There are additional ways to install the Cumulus Linux image, such as using FTP, TFTP, a local file, or a USB drive. For more information, see Installing a New Cumulus Linux Image.
cumulus@switch:~$ sudo onie-install -a -i http://10.0.1.251/cumulus-linux-4.1.0-mlx-amd64.bin && sudo reboot
Restore the configuration files to the new release, ideally with automation.
Verify correct operation with the old configurations on the new release.
Reinstall third-party applications and associated configurations.
Package Upgrade
Cumulus Linux completely embraces the Linux and Debian upgrade workflow, where you use an installer to install a base image, then perform any upgrades within that release train with sudo -E apt-get update and sudo -E apt-get upgrade commands. Any packages that have been changed since the base install get upgraded in place from the repository. All switch configuration files remain untouched, or in rare cases merged (using the Debian merge function) during the package upgrade.
When you use package upgrade to upgrade your switch, configuration data stays in place while the packages are upgraded. If the new release updates a configuration file that you changed previously, you are prompted for the version you want to use or if you want to evaluate the differences.
Cumulus Linux 4.3.1 is supported on Broadcom switches only and requires a different upgrade procedure.
Back up the configurations from the switch.
Fetch the latest update metadata from the repository.
cumulus@switch:~$ sudo -E apt-get update
Review potential upgrade issues (in some cases, upgrading new packages might also upgrade additional existing packages due to dependencies). Run the following command to see the additional packages that will be installed or upgraded.
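One way to preview the upgrade is the --dry-run option of apt-get, which simulates the operation without changing the system. The sketch below reduces that simulated output to package names; the helper name packages_to_upgrade and the sample version strings are hypothetical.

```shell
#!/bin/sh
# List package names from `apt-get upgrade --dry-run` style output on stdin.
# In simulated output, each package to be installed or upgraded appears on
# a line beginning with "Inst".
packages_to_upgrade() {
    awk '$1 == "Inst" { print $2 }'
}

# Sample simulated output (hypothetical versions).
printf '%s\n' \
    'Reading package lists...' \
    'Inst frr [7.0-1] (7.4-1 example-repo [amd64])' \
    'Conf frr (7.4-1 example-repo [amd64])' | packages_to_upgrade
```

On a switch, the equivalent pipeline is sudo -E apt-get upgrade --dry-run | packages_to_upgrade.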
Upgrade all the packages to the latest distribution.
cumulus@switch:~$ sudo -E apt-get upgrade
If no reboot is required after the upgrade completes, the upgrade ends, restarts all upgraded services, and logs messages in the /var/log/syslog file similar to the ones shown below. In this example, only the frr package is upgraded.
Policy: Service frr.service action stop postponed
Policy: Service frr.service action start postponed
Policy: Restarting services: frr.service
Policy: Finished restarting services
Policy: Removed /usr/sbin/policy-rc.d
Policy: Upgrade is finished
If the upgrade process encounters changed configuration files that have new versions in the release to which you are upgrading, you see a message similar to this:
Configuration file '/etc/frr/daemons'
==> Modified (by you or by a script) since installation.
==> Package distributor has shipped an updated version.
What would you like to do about it ? Your options are:
Y or I : install the package maintainer's version
N or O : keep your currently-installed version
D : show the differences between the versions
Z : start a shell to examine the situation
The default action is to keep your current version.
*** daemons (Y/I/N/O/D/Z) [default=N] ?
- To see the differences between the currently installed version and the new version, type `D`.
- To keep the currently installed version, type `N`. The new package version is installed with the suffix `.dpkg-dist` (for example, `/etc/frr/daemons.dpkg-dist`). When the upgrade is complete and **before** you reboot, merge your changes with the changes from the newly installed file.
- To install the new version, type `I`. Your currently installed version is saved with the suffix `.dpkg-old`.

When the upgrade is complete, you can search for the files with the `sudo find / -mount -type f -name '*.dpkg-*'` command.
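To help with the merge step, the sketch below pairs each .dpkg-dist file with the version in use and prints a unified diff. The helper name review_dpkg_dist and the demo directory contents are hypothetical; point it at / on a switch only when you have time to read the output.

```shell
#!/bin/sh
# Show how each shipped *.dpkg-dist file differs from the version in use.
review_dpkg_dist() {
    find "$1" -mount -type f -name '*.dpkg-dist' | while IFS= read -r new; do
        current="${new%.dpkg-dist}"    # strip the suffix to get the live file
        echo "== $current"
        diff -u "$current" "$new" || true    # diff exits 1 when files differ
    done
}

# Demo on a throwaway directory with made-up file contents.
demo=$(mktemp -d)
printf 'bgpd=yes\n' > "$demo/daemons"
printf 'bgpd=no\n' > "$demo/daemons.dpkg-dist"
review_dpkg_dist "$demo"
rm -rf "$demo"
```

Lines prefixed with - come from your current file and lines prefixed with + come from the package maintainer's version.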
If you see errors for expired GPG keys that prevent you from upgrading packages, follow the steps in Upgrading Expired GPG Keys.
Reboot the switch if the upgrade messages indicate that a system restart is required.
cumulus@switch:~$ sudo -E apt-get upgrade
... upgrade messages here ...
*** Caution: Service restart prior to reboot could cause unpredictable behavior
*** System reboot required ***
cumulus@switch:~$ sudo reboot
Verify correct operation with the old configurations on the new version.
To ensure that the 4.3.1 package update is available only for Broadcom switches, you must either run apt update and apt upgrade twice, or manually edit the sources.list file and then run apt update and apt upgrade once. Both procedures are below.
Mellanox switches do not support Cumulus Linux 4.3.1. When you run apt update on a Mellanox switch, the /etc/apt/sources.list does not change. Cumulus Linux remains at 4.3.0 or upgrades to 4.3.0 if you are running an earlier release.
Back up the configurations from the switch.
Fetch the latest update metadata from the repository:
cumulus@switch:~$ sudo -E apt-get update
Review potential upgrade issues (in some cases, upgrading new packages might also upgrade additional existing packages due to dependencies). Run the following command to see the additional packages that will be installed or upgraded:
Upgrade all the packages to the latest distribution. You might be prompted to reboot the switch but this is not required until step 9.
cumulus@switch:~$ sudo -E apt-get upgrade
If the upgrade process encounters changed configuration files that have new versions in the release to which you are upgrading, you see a message similar to this:
Configuration file '/etc/frr/daemons'
==> Modified (by you or by a script) since installation.
==> Package distributor has shipped an updated version.
What would you like to do about it ? Your options are:
Y or I : install the package maintainer's version
N or O : keep your currently-installed version
D : show the differences between the versions
Z : start a shell to examine the situation
The default action is to keep your current version.
*** daemons (Y/I/N/O/D/Z) [default=N] ?
To see the differences between the currently installed version and the new version, type D.
To keep the currently installed version, type N. The new package version is installed with the suffix .dpkg-dist (for example, /etc/frr/daemons.dpkg-dist). When the upgrade is complete and before you reboot, merge your changes with the changes from the newly installed file.
To install the new version, type I. Your currently installed version is saved with the suffix .dpkg-old.
When the upgrade is complete, you can search for the files with the sudo find / -mount -type f -name '*.dpkg-*' command.
If you see errors for expired GPG keys that prevent you from upgrading packages, follow the steps in Upgrading Expired GPG Keys.
Confirm that the distribution in /etc/apt/sources.list has changed from CumulusLinux-4-latest to CumulusLinux-4-latest-BCM:
Fetch the latest update metadata from the repository again:
cumulus@switch:~$ sudo -E apt-get update
Upgrade all the packages to the latest distribution again.
cumulus@switch:~$ sudo -E apt-get upgrade
If the upgrade process encounters changed configuration files that have new versions in the release to which you are upgrading, you see a message similar to the one shown in step 4.
Confirm that the /etc/lsb-release file contains the target Cumulus Linux version (4.3.1).
cumulus@switch:~$ cat /etc/lsb-release
DISTRIB_ID="Cumulus Linux"
DISTRIB_RELEASE=4.3.1
DISTRIB_DESCRIPTION="Cumulus Linux 4.3.1"
Reboot the switch to finalize the upgrade:
cumulus@switch:~$ sudo reboot
Do not perform this procedure on a Mellanox switch; the switch will become unusable and you will have to reinstall the image. To verify that the switch ASIC is Broadcom, run the dpkg -l | grep cumulus-newpackages-bcm command.
Back up the configurations from the switch.
In the /etc/apt/sources.list file, change the distribution from CumulusLinux-4-latest to CumulusLinux-4-latest-BCM:
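The distribution change can also be scripted. This is a minimal sketch: the helper name use_bcm_distribution is hypothetical, and the sample line with its repository URL is a placeholder, so check the actual contents of /etc/apt/sources.list on your switch first.

```shell
#!/bin/sh
# Rewrite the apt distribution name on stdin to the Broadcom-specific one.
# Lines that already use CumulusLinux-4-latest-BCM are left unchanged.
use_bcm_distribution() {
    sed -E 's/(CumulusLinux-4-latest)([[:space:]]|$)/\1-BCM\2/'
}

# Sample line with a placeholder repository URL.
printf '%s\n' \
    'deb https://apt.example.com/repo CumulusLinux-4-latest cumulus upstream' |
    use_bcm_distribution
```

To apply it, work from a backup copy of the file, for example by reading /etc/apt/sources.list.bak through the filter and writing the result back with sudo tee /etc/apt/sources.list.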
Fetch the latest update metadata from the repository:
cumulus@switch:~$ sudo -E apt-get update
Review potential upgrade issues (in some cases, upgrading new packages might also upgrade additional existing packages due to dependencies). Run the following command to see the additional packages that will be installed or upgraded:
Upgrade all the packages to the latest distribution.
cumulus@switch:~$ sudo -E apt-get upgrade
If the upgrade process encounters changed configuration files that have new versions in the release to which you are upgrading, you see a message similar to this:
Configuration file '/etc/frr/daemons'
==> Modified (by you or by a script) since installation.
==> Package distributor has shipped an updated version.
What would you like to do about it ? Your options are:
Y or I : install the package maintainer's version
N or O : keep your currently-installed version
D : show the differences between the versions
Z : start a shell to examine the situation
The default action is to keep your current version.
*** daemons (Y/I/N/O/D/Z) [default=N] ?
To see the differences between the currently installed version and the new version, type D.
To keep the currently installed version, type N. The new package version is installed with the suffix .dpkg-dist (for example, /etc/frr/daemons.dpkg-dist). When the upgrade is complete and before you reboot, merge your changes with the changes from the newly installed file.
To install the new version, type I. Your currently installed version is saved with the suffix .dpkg-old.
When the upgrade is complete, you can search for the files with the sudo find / -mount -type f -name '*.dpkg-*' command.
If you see errors for expired GPG keys that prevent you from upgrading packages, follow the steps in Upgrading Expired GPG Keys.
Confirm that the /etc/lsb-release file contains the target Cumulus Linux version (4.3.1).
cumulus@switch:~$ cat /etc/lsb-release
DISTRIB_ID="Cumulus Linux"
DISTRIB_RELEASE=4.3.1
DISTRIB_DESCRIPTION="Cumulus Linux 4.3.1"
Reboot the switch to finalize the upgrade:
cumulus@switch:~$ sudo reboot
Upgrade Notes
Package upgrade always updates to the latest release available for the switch ASIC in the Cumulus Linux repository. For example, if you are currently running Cumulus Linux 4.0.0 and run the sudo -E apt-get upgrade command on that switch, the packages are upgraded to the versions contained in the latest 4.y.z release.
Because Cumulus Linux is a collection of different Debian Linux packages, be aware of the following:
The /etc/os-release and /etc/lsb-release files are updated to the currently installed Cumulus Linux release when you upgrade the switch using either package upgrade or Cumulus Linux image install. For example, if you run sudo -E apt-get upgrade and the latest Cumulus Linux release on the repository is 4.1.0, these two files display the release as 4.1.0 after the upgrade.
The /etc/image-release file is updated only when you run a Cumulus Linux image install. Therefore, if you run a Cumulus Linux image install of Cumulus Linux 4.0.0, followed by a package upgrade to 4.1.0 using sudo -E apt-get upgrade, the /etc/image-release file continues to display Cumulus Linux 4.0.0, which is the originally installed base image.
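A post-upgrade check of the reported release can be scripted against /etc/lsb-release. The sketch below relies only on the DISTRIB_RELEASE field shown in the sample output earlier in this topic; the helper name release_in and the temporary demo file are hypothetical.

```shell
#!/bin/sh
# Print the DISTRIB_RELEASE value from an lsb-release style file.
# The file is parsed as text, not sourced, so nothing in it is executed.
release_in() {
    sed -n 's/^DISTRIB_RELEASE=//p' "$1" | tr -d '"'
}

# Demo against a temporary copy of the sample file contents.
demo=$(mktemp)
cat > "$demo" <<'EOF'
DISTRIB_ID="Cumulus Linux"
DISTRIB_RELEASE=4.3.1
DISTRIB_DESCRIPTION="Cumulus Linux 4.3.1"
EOF

target=4.3.1
if [ "$(release_in "$demo")" = "$target" ]; then
    echo "release matches $target"
else
    echo "release does not match $target" >&2
fi
rm -f "$demo"
```

On a switch, call release_in /etc/lsb-release and compare the result with the release you intended to install.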
Upgrade Switches in an MLAG Pair
If you are using MLAG to dual connect two switches in your environment, follow the steps below to upgrade the switches.
You must upgrade both switches in the MLAG pair to the same release of Cumulus Linux.
Cumulus Linux supports different software versions between MLAG peer switches only during the upgrade process. After you upgrade the first MLAG switch in the pair, run the clagctl showtimers command to monitor the init-delay timer. When the timer expires, make the upgraded MLAG switch the primary, then upgrade the peer to the same version of Cumulus Linux.
Running different versions of Cumulus Linux on MLAG peer switches outside of the upgrade time period is untested and might have unexpected results.
For networks with MLAG deployments, you can only upgrade to Cumulus Linux 4.3 from version 3.7.10 or later. If you are using a version of Cumulus Linux earlier than 3.7.10, you must upgrade to version 3.7.10 first, then upgrade to version 4.3. Version 3.7.10 is available on the NVIDIA Enterprise support portal.
During upgrade, MLAG bonds stay single-connected while the switches are running different major releases; for example, while leaf01 is running 3.7.12 and leaf02 is running 4.3.0.
This is due to a change in the bonding driver regarding how the actor port key is derived, which causes the port key to have a different value for links with the same speed/duplex settings across different major releases. The port key received from the LACP partner must remain consistent between all bond members in order for all bonds to be synchronized. When each MLAG switch sends LACPDUs with different port keys, only links to one MLAG switch are in sync.
1. Verify the switch is in the secondary role:
cumulus@switch:~$ clagctl status
2. Shut down the core uplink layer 3 interfaces:
cumulus@switch:~$ sudo ip link set swpX down
3. Shut down the peer link:
cumulus@switch:~$ sudo ip link set peerlink down
4. To boot the switch into ONIE, run the onie-install -a -i <image-location> command. The following example command installs the image from a web server. There are additional ways to install the Cumulus Linux image, such as using FTP, a local file, or a USB drive. For more information, see Installing a New Cumulus Linux Image.
cumulus@switch:~$ sudo onie-install -a -i http://10.0.1.251/downloads/cumulus-linux-4.1.0-mlx-amd64.bin
To upgrade the switch with package upgrade instead of booting into ONIE, run the sudo -E apt-get update and sudo -E apt-get upgrade commands; see Package Upgrade.
5. Reboot the switch:
cumulus@switch:~$ sudo reboot
6. If you installed a new image on the switch, restore the configuration files to the new release.
7. Verify STP convergence across both switches:
cumulus@switch:~$ mstpctl showall
8. Verify core uplinks and peer links are UP:
cumulus@switch:~$ net show interface
9. Verify MLAG convergence:
cumulus@switch:~$ clagctl status
10. Make this secondary switch the primary:
cumulus@switch:~$ clagctl priority 2048
11. Verify the other switch is now in the secondary role.
12. Repeat steps 2-9 on the new secondary switch.
13. Remove the priority 2048 and restore the priority back to 32768 on the current primary switch:
cumulus@switch:~$ clagctl priority 32768
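The priority changes in the procedure above can be reasoned about with a small model. This Python sketch assumes, as the steps imply, that the peer with the lower priority value takes the primary role. The function name is hypothetical, and the tie-break used when both peers have the same priority is deliberately not modeled.

```python
DEFAULT_PRIORITY = 32768  # value restored at the end of the procedure


def expected_primary(priorities):
    """Return the switch with the lowest priority value, or None on a tie."""
    lowest = min(priorities.values())
    winners = [name for name, value in priorities.items() if value == lowest]
    return winners[0] if len(winners) == 1 else None


if __name__ == "__main__":
    # leaf02 was upgraded first and is then given priority 2048.
    print(expected_primary({"leaf01": DEFAULT_PRIORITY, "leaf02": 2048}))
```

Confirm the actual roles with clagctl status rather than relying on this model.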
Roll Back a Cumulus Linux Installation
Even the most well planned and tested upgrades can result in unforeseen problems; sometimes the best solution is to roll back to the previous state. There are two main strategies; both require detailed planning and execution:
Flatten and rebuild: If the OS becomes unusable, you can use orchestration tools to reinstall the previous OS release from scratch and then rebuild the configuration automatically.
Backup and restore: Another common strategy is to restore to a previous state using a backup captured before the upgrade. See Back up and Restore.
The method you employ is specific to your deployment strategy, so providing detailed steps for each scenario is outside the scope of this document.
Third Party Packages
Third party packages in the Linux host world often use the same package system as the distribution into which they are installed (for example, Debian uses apt-get). Alternatively, the system administrator might compile and install the package. Configuration and executable files generally follow the same filesystem hierarchy standards as other applications.
If you install any third party applications on a Cumulus Linux switch, configuration data is typically installed into the /etc directory, but it is not guaranteed. It is your responsibility to understand the behavior and configuration file information of any third party packages installed on the switch.
After you upgrade using a full Cumulus Linux image install, you need to reinstall any third party packages or any Cumulus Linux add-on packages.
Lightweight network virtualization (LNV) is deprecated in Cumulus Linux 4.0 in favor of Ethernet virtual private networks (EVPN) to enable interoperability with switches from other manufacturers, to commit to industry standards, and because the benefits of EVPN outweigh those of LNV.
If your network is configured for LNV, you need to migrate your network configuration to a BGP EVPN configuration that is functionally equivalent before you upgrade to Cumulus Linux 4.0 or later.
Migration Considerations
You cannot run LNV and EVPN at the same time for the following reasons:
It is not possible to reconcile the bridge-learning configuration on all of the VTEP interfaces if both LNV and EVPN are enabled at the same time. LNV requires MAC learning to be enabled on the VXLAN VTEP interfaces. EVPN requires MAC learning to be disabled on the VXLAN VTEP interfaces.
The Linux bridge installs MAC address entries differently when LNV is enabled than when EVPN is enabled. Different flags are set on the MAC addresses in the Linux kernel depending on how the address is learned. Duplicate and/or conflicting bridge entries and race conditions become a possibility when both are enabled at the same time. Because the kernel bridging table is the basis for programming the forwarding ASICs, this might lead to downstream inconsistencies in the hardware forwarding tables.
The standard IPv4 unicast address family is commonly used to route inside the fabric for spine and leaf Clos networks. Because FRRouting does not currently support BGP dynamic capability negotiation, enabling the EVPN address family requires all of the neighbors to restart for the changes to take effect. This results in a brief disruption to traffic forwarding.
Upgrade to EVPN
Use automation, such as Ansible, to upgrade to EVPN. Automation ensures minimal downtime, reduces human error, and is useful at almost any scale.
Using NCLU to update the configuration provides several benefits:
NCLU restarts services and reloads interfaces automatically so the changes can take effect.
With the transactional commit model of NCLU, the order in which you enter the NCLU commands does not matter. This further reduces complexity and hidden dependencies.
The upgrade steps described here are based on the following example topology (based on the Reference Topology):
The BGP EVPN configuration for a centralized routing topology is slightly different on the exit/routing leafs compared to the other ToR leaf switches.
Run the following NCLU commands on each type of device shown (leaf, exit, spine):
Leaf node NCLU commands
# BGP changes
cumulus@switch:~$ net add bgp l2vpn evpn neighbor swp51-52 activate
cumulus@switch:~$ net add bgp l2vpn evpn advertise-all-vni
# Disable MAC learning on VNI
cumulus@switch:~$ net add vxlan vni-13 bridge learning off
cumulus@switch:~$ net add vxlan vni-24 bridge learning off
# Remove LNV (vxrd) configuration
cumulus@switch:~$ net del loopback lo vxrd-src-ip
cumulus@switch:~$ net del loopback lo vxrd-svcnode-ip
Exit node NCLU commands
# BGP changes
cumulus@switch:~$ net add bgp l2vpn evpn neighbor swp51-52 activate
cumulus@switch:~$ net add bgp l2vpn evpn advertise-all-vni
cumulus@switch:~$ net add bgp l2vpn evpn advertise-default-gw
# Disable MAC learning on VNI
cumulus@switch:~$ net add vxlan vni-13 bridge learning off
cumulus@switch:~$ net add vxlan vni-24 bridge learning off
# Remove LNV (vxrd) configuration
cumulus@switch:~$ net del loopback lo vxrd-src-ip
cumulus@switch:~$ net del loopback lo vxrd-svcnode-ip
Spine node NCLU commands
# BGP changes
cumulus@switch:~$ net add bgp l2vpn evpn neighbor swp1-4 activate
# Remove LNV service node (vxsnd) configuration
cumulus@switch:~$ net del lnv service-node anycast-ip 10.0.0.200
cumulus@switch:~$ net del lnv service-node peers 10.0.0.21 10.0.0.22
cumulus@switch:~$ net del lnv service-node source [primary-loopback-ip]
# Remove unused LNV anycast address 10.0.0.200
cumulus@switch:~$ net del loopback lo ip address 10.0.0.200/32
cumulus@switch:~$ net del bgp ipv4 unicast network 10.0.0.200/32
Manually disable and stop the LNV daemons. NCLU can remove the LNV configuration from the configuration files, but you must manually stop and disable these daemons before you commit the NCLU changes. After you commit the NCLU changes, NCLU restarts the BGP daemon, which enables the EVPN address family.
Traffic loss can start to occur at this point.
To disable and stop the LNV registration daemon, run the following commands on the leaf and exit nodes:
To commit and apply the pending NCLU changes, run the following command on all the nodes:
cumulus@switch:~$ net commit
Verify the Upgrade
To check that LNV is disabled, run the net show lnv command on any node. This command returns no output when LNV is disabled.
This command is for verification on Cumulus Linux 3.x only. This command has been removed in Cumulus Linux 4.0 and does not work after you upgrade.
cumulus@switch:~$ net show lnv
To ensure that EVPN BGP neighbors are up, run the net show bgp l2vpn evpn summary command:
cumulus@switch:~$ net show bgp l2vpn evpn summary
BGP router identifier 10.0.0.11, local AS number 65011 vrf-id 0
BGP table version 0
RIB entries 23, using 3496 bytes of memory
Peers 2, using 39 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
spine01(swp51) 4 65020 10932 11064 0 0 0 00:14:28 48
spine02(swp52) 4 65020 10938 11068 0 0 0 00:14:27 48
Total number of neighbors 2
To examine the EVPN routes, run the net show bgp l2vpn evpn route command. A MAC address appears as a type-2 route only after the host has generated traffic and the local EVPN-enabled switch has learned its MAC address. A host that sends no traffic therefore does not create a type-2 EVPN route until it sends a frame that ingresses the local EVPN-enabled switch.
cumulus@switch:~$ net show bgp l2vpn evpn route
BGP table version is 45, local router ID is 10.0.0.11
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[ESI]:[EthTag]:[IPlen]:[IP]
Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 10.0.0.11:2
*> [2]:[0]:[0]:[48]:[00:03:00:11:11:01]
10.0.0.100 32768 i
*> [2]:[0]:[0]:[48]:[02:03:00:11:11:01]
10.0.0.100 32768 i
*> [2]:[0]:[0]:[48]:[02:03:00:11:11:02]
10.0.0.100 32768 i
*> [3]:[0]:[32]:[10.0.0.100]
10.0.0.100 32768 i
Route Distinguisher: 10.0.0.11:3
*> [2]:[0]:[0]:[48]:[00:03:00:22:22:02]
10.0.0.100 32768 i
*> [2]:[0]:[0]:[48]:[02:03:00:22:22:01]
10.0.0.100 32768 i
*> [2]:[0]:[0]:[48]:[02:03:00:22:22:02]
10.0.0.100 32768 i
*> [3]:[0]:[32]:[10.0.0.100]
10.0.0.100 32768 i
Route Distinguisher: 10.0.0.13:2
* [2]:[0]:[0]:[48]:[00:03:00:33:33:01]
10.0.0.101 0 65020 65013 i
*> [2]:[0]:[0]:[48]:[00:03:00:33:33:01]
10.0.0.101 0 65020 65013 i
* [2]:[0]:[0]:[48]:[02:03:00:33:33:01]
10.0.0.101 0 65020 65013 i
*> [2]:[0]:[0]:[48]:[02:03:00:33:33:01]
10.0.0.101 0 65020 65013 i
* [2]:[0]:[0]:[48]:[02:03:00:33:33:02]
10.0.0.101 0 65020 65013 i
*> [2]:[0]:[0]:[48]:[02:03:00:33:33:02]
10.0.0.101 0 65020 65013 i
* [3]:[0]:[32]:[10.0.0.101]
10.0.0.101 0 65020 65013 i
*> [3]:[0]:[32]:[10.0.0.101]
10.0.0.101 0 65020 65013 i
...
You can filter the EVPN route output by route type. The multicast route type corresponds to type-3. The prefix route type is type-5 (but is not used here).
cumulus@switch:~$ net show bgp l2vpn evpn route type
macip : MAC-IP (Type-2) route
multicast : Multicast
prefix : An IPv4 or IPv6 prefix
In the EVPN route output below, Cumulus Linux learned 00:03:00:33:33:01 with a next-hop (VTEP IP address) of 10.0.0.101. The MAC address of server03 is 00:03:00:33:33:01.
cumulus@leaf01:~$ net show bgp l2vpn evpn route
...
Route Distinguisher: 10.0.0.13:2
* [2]:[0]:[0]:[48]:[00:03:00:33:33:01]
10.0.0.101 0 65020 65013 i
...
To ensure the type-2 route is installed in the bridge table, run the net show bridge macs <mac-address> command on leaf01:
cumulus@leaf01:~$ net show bridge macs 00:03:00:33:33:01
VLAN Master Interface MAC TunnelDest State Flags LastSeen
-------- ------ --------- ----------------- ---------- ----- ------------- --------
13 bridge vni-13 00:03:00:33:33:01 offload 00:01:49
untagged vni-13 00:03:00:33:33:01 10.0.0.101 self, offload 00:01:49
Back up and Restore
You can back up the current configuration on a switch and restore the configuration on the same switch or on another Cumulus Linux switch of the same type and release. The backup is a compressed tar file that includes all configuration files installed by Debian packages and marked as configuration files. In addition, the backup contains files in the /etc directory that are not installed by a Debian package but are modified when you install a new image or enable/disable certain services (such as the Cumulus license file).
Cumulus Linux automatically creates a backup of the configuration files on the switch after you install the Cumulus Linux image, in case you want to return to the initial switch configuration. NCLU automatically creates a backup of the configuration files when you run the net commit command and restores a previous configuration when you run the net rollback command.
Back up Configuration Files
To back up the current configuration files on the switch, run the config-backup command:
cumulus@switch:~$ sudo config-backup
If you run this command without any options, Cumulus Linux creates a backup of the current configuration and stores the backup file in the /var/lib/config-backup/backups directory. The filename includes the date and time you run the backup, and the switch name; for example, config_backup-2019-04-23-21.30.47_leaf01. You can restore the backup with the config-restore command, described below.
The switch can store up to 30 non-permanent backup files (or can allocate a maximum of 25 MB of disk space) in addition to the permanent backup files (see the -p option below). When this limit is reached, Cumulus Linux keeps the oldest and the newest backup files, then starts removing the second oldest file up to the second newest file.
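To make the retention order concrete, the following Python sketch models the rule described above. It is an interpretation, not the config-backup implementation: it treats the limit as a count only (the 25 MB size limit is not modeled), assumes pruning starts once the count exceeds the limit, and identifies permanent files by the -perm suffix. The function name is hypothetical.

```python
def prune_backups(names, limit=30):
    """Split backup names (listed oldest first) into (kept, removed) lists."""
    rotating = [name for name in names if not name.endswith("-perm")]
    removed = []
    # rotating[0] is the oldest file and rotating[-1] the newest; keep both.
    while len(rotating) > limit and len(rotating) > 2:
        removed.append(rotating.pop(1))   # remove the second-oldest file
    dropped = set(removed)
    kept = [name for name in names if name not in dropped]
    return kept, removed


if __name__ == "__main__":
    sample = [f"config_backup-2019-04-23-21.30.{i:02d}_leaf01" for i in range(5)]
    kept, removed = prune_backups(sample, limit=3)
    print("kept:", kept)
    print("removed:", removed)
```

Permanent files never enter the rotation, which is why they can accumulate if you use the -p option often.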
Cumulus Linux recommends you copy the backup file off the switch after the backup is complete.
The config-backup command includes the following options:
Option            Description
-h                Displays this list of command options.
-d                Enables debugging output, which shows status messages during the backup process.
-D <description>  Adds a description, which is shown in the archive file list when you run the config-restore -l command.
-p                Adds -perm to the end of the backup filename to mark it as permanent. For example, config_backup-2019-04-23-21.30.47_leaf01-perm. Be careful when using this option. Permanent backup files are not removed.
-q                Runs the command in quiet mode. No status messages are shown, only errors.
-t <type>         Specifies the type of configuration, which is shown in the archive file list when you run the config-restore -l command. You can provide any short text. For example, you can specify pre, post, or pre-restore.
-v                Enables verbose mode to show messages during the backup process.
-X <pattern>      Excludes certain files that match a specified pattern. For example, to exclude all backup files ending with a tilde (~), use the -X .*~$ option.
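To see what a pattern such as .*~$ selects, the following Python sketch applies it to a list of paths. It assumes the pattern is treated as a regular expression searched against each file path; how config-backup itself evaluates the pattern is not confirmed here, and the function name and sample paths are hypothetical.

```python
import re


def apply_exclude(paths, pattern):
    """Return the paths that do NOT match the exclusion pattern."""
    regex = re.compile(pattern)
    return [path for path in paths if not regex.search(path)]


if __name__ == "__main__":
    candidates = [
        "/etc/network/interfaces",
        "/etc/network/interfaces~",   # editor backup copy, ends with a tilde
        "/etc/frr/frr.conf~",
    ]
    print(apply_exclude(candidates, r".*~$"))
```

When you type such a pattern on the command line, enclosing it in single quotes prevents the shell from interpreting the * and $ characters.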
config-backup Command Examples
The following command example creates a backup file in debugging mode and provides the description myconfig, which shows in the backup archive list.
cumulus@switch:~$ sudo config-backup -d -D myconfig
The following command example creates a backup file in quiet mode and excludes files that end in a tilde (~).
cumulus@switch:~$ sudo config-backup -q -X .*~$
The following command example creates a backup file in verbose mode and marks the file as permanent.
cumulus@switch:~$ sudo config-backup -pv
Restore Backup Files
You can restore a backup to the same switch or to a different switch. When restoring to a different switch, the switch must be of the same type and release. For example, you can restore a backup from a Broadcom Trident3 switch to a Broadcom Trident3 switch; however, you cannot restore a backup from a Broadcom Trident3 switch to an NVIDIA Spectrum or to a Broadcom Tomahawk2 switch.
To restore a backup file, run the config-restore command with a specific filename (-b <filename>), file number (-n <number>), or the -N option, which restores the most recent backup file.
You can run the config-restore -l command to list the archived backup files by filename and number (see config-restore Command Examples below).
After the backup file is restored successfully, you are prompted to restart any affected services or reboot the switch if necessary.
Cumulus Linux reports any issues encountered during restore and prompts you to continue or stop.
The config-restore command requires a filename, file number, or the most recent file option (-N).
You can run only one instance of the config-backup or config-restore command at a time.
The config-restore command includes the following options:
Option            Description
-h                Displays this list of command options.
-a <directory>    Restores the backup to the directory specified.
-B                Runs no backup before restoring the configuration. If you do not specify this option, Cumulus Linux runs a backup to save the current configuration before the restore so that you can do a rollback if needed.
-b <filename>     Specifies the name of the backup file you want to restore (shown by -l).
-D                Shows the differences between the current configuration and the configuration in the backup file.
-d                Displays debugging output, which provides status messages during the restore process.
-f                Forces the restore; does not prompt for confirmations.
-F <filename>     Shows differences for only this file (used with -D).
-i                Displays information about the current backup file.
-L                Lists the configuration files in the backup file.
-l                Lists all backup files archived on the switch and includes the file number, type, and description.
-N                Restores the newest (most recent) backup file.
-n <number>       Specifies the backup file by number (shown by -l).
-q                Runs the command in quiet mode. No status messages are displayed, only errors.
-T                Runs the command in test mode; does not restore the configuration but shows what would be restored.
-v                Enables verbose mode to display status messages during restore.
config-restore Command Examples
The following command example lists the backup files available on the switch. The list includes the file number (#), type, description, and filename. Type is the text specified with the config-backup -t option.
cumulus@switch:~$ sudo config-restore -l
# Type Description Name
1 Initial First system boot config_backup-2019-04-23-00.42.11_cumulus-perm
2 Initial First system boot config_backup-2019-04-23-00.47.43_cumulus-perm
3 Initial First system boot config_backup-2019-04-23-18.12.26_cumulus-perm
4 pre nclu "net commit" (user cumulus) config_backup-2019-04-23-19.55.13_leaf01
5 post-4 nclu "net commit" (user cumulus) config_backup-2019-04-23-19.55.26_leaf01
6 config_backup-2019-04-23-21.20.41_leaf01
7 config_backup-2019-04-23-21.30.47_leaf01-perm
...
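The names in the last column encode the backup date and time, the switch name, and the optional -perm marker. If you post-process the listing in a script, a parser along the following lines might help. This Python sketch infers the name layout from the examples above; the function name is hypothetical.

```python
import re
from datetime import datetime

# Layout inferred from names such as config_backup-2019-04-23-21.30.47_leaf01-perm
BACKUP_NAME = re.compile(
    r"^config_backup-(?P<stamp>\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2})"
    r"_(?P<host>.+?)(?P<perm>-perm)?$"
)


def parse_backup_name(name):
    """Return (timestamp, switch name, is_permanent) for a backup filename."""
    match = BACKUP_NAME.match(name)
    if match is None:
        raise ValueError(f"not a config-backup filename: {name!r}")
    taken = datetime.strptime(match.group("stamp"), "%Y-%m-%d-%H.%M.%S")
    return taken, match.group("host"), match.group("perm") is not None


if __name__ == "__main__":
    print(parse_backup_name("config_backup-2019-04-23-21.30.47_leaf01-perm"))
```

A switch whose hostname itself ends in -perm would be misread by this pattern, so treat the result as a convenience rather than a guarantee.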
The following command example runs in verbose mode to restore the backup file config_backup-2019-04-23-21.30.47_leaf01.
cumulus@switch:~$ sudo config-restore -v -b config_backup-2019-04-23-21.30.47_leaf01
The following command example runs test mode to restore the most recent backup file (no configuration is actually restored).
cumulus@switch:~$ sudo config-restore -T -N
The following command example lists the files in the most recent backup file.
cumulus@switch:~$ sudo config-restore -L -N
Adding and Updating Packages
You use the Advanced Packaging Tool (apt) to manage additional applications (in the form of packages) and to install the latest updates.
Updating, upgrading, and installing packages with apt causes disruptions to network services:
Upgrading a package might result in services being restarted or stopped as part of the upgrade process.
Installing a package might disrupt core services by changing core service dependency packages. In some cases, installing new packages might also upgrade additional existing packages due to dependencies.
If services are stopped, you might need to reboot the switch for those services to restart.
Update the Package Cache
To work properly, apt relies on a local cache listing of the available packages. You must populate the cache initially, then periodically update it with sudo -E apt-get update:
cumulus@switch:~$ sudo -E apt-get update
Use the -E option with sudo whenever you run any apt-get command. This option preserves your environment variables (such as HTTP proxies) before you install new packages or upgrade your distribution.
List Available Packages
After the cache is populated, use the apt-cache command to search the cache and find the packages in which you are interested or to get information about an available package.
Here are examples of the search and show sub-commands:
cumulus@switch:~$ apt-cache search tcp
collectd-core - statistics collection and monitoring daemon (core system)
fakeroot - tool for simulating superuser privileges
iperf - Internet Protocol bandwidth measuring tool
iptraf-ng - Next Generation Interactive Colorful IP LAN Monitor
libfakeroot - tool for simulating superuser privileges - shared libraries
libfstrm0 - Frame Streams (fstrm) library
libibverbs1 - Library for direct userspace use of RDMA (InfiniBand/iWARP)
libnginx-mod-stream - Stream module for Nginx
libqt4-network - Qt 4 network module
librtr-dev - Small extensible RPKI-RTR-Client C library - development files
librtr0 - Small extensible RPKI-RTR-Client C library
libwiretap8 - network packet capture library -- shared library
libwrap0 - Wietse Venema's TCP wrappers library
libwrap0-dev - Wietse Venema's TCP wrappers library, development files
netbase - Basic TCP/IP networking system
nmap-common - Architecture independent files for nmap
nuttcp - network performance measurement tool
openssh-client - secure shell (SSH) client, for secure access to remote machines
openssh-server - secure shell (SSH) server, for secure access from remote machines
openssh-sftp-server - secure shell (SSH) sftp server module, for SFTP access from remote machines
python-dpkt - Python 2 packet creation / parsing module for basic TCP/IP protocols
rsyslog - reliable system and kernel logging daemon
socat - multipurpose relay for bidirectional data transfer
tcpdump - command-line network traffic analyzer
cumulus@switch:~$ apt-cache show tcpdump
Package: tcpdump
Version: 4.9.3-1~deb10u1
Installed-Size: 1109
Maintainer: Romain Francoise <rfrancoise@debian.org>
Architecture: amd64
Replaces: apparmor-profiles-extra (<< 1.12~)
Depends: libc6 (>= 2.14), libpcap0.8 (>= 1.5.1), libssl1.1 (>= 1.1.0)
Suggests: apparmor (>= 2.3)
Breaks: apparmor-profiles-extra (<< 1.12~)
Size: 400060
SHA256: 3a63be16f96004bdf8848056f2621fbd863fadc0baf44bdcbc5d75dd98331fd3
SHA1: 2ab9f0d2673f49da466f5164ecec8836350aed42
MD5sum: 603baaf914de63f62a9f8055709257f3
Description: command-line network traffic analyzer
This program allows you to dump the traffic on a network. tcpdump
is able to examine IPv4, ICMPv4, IPv6, ICMPv6, UDP, TCP, SNMP, AFS
BGP, RIP, PIM, DVMRP, IGMP, SMB, OSPF, NFS and many other packet
types.
.
It can be used to print out the headers of packets on a network
interface, filter packets that match a certain expression. You can
use this tool to track down network problems, to detect attacks
or to monitor network activities.
Description-md5: f01841bfda357d116d7ff7b7a47e8782
Homepage: http://www.tcpdump.org/
Multi-Arch: foreign
Section: net
Priority: optional
Filename: pool/upstream/t/tcpdump/tcpdump_4.9.3-1~deb10u1_amd64.deb
The search commands look for the search terms not only in the package name but in other parts of the package information; the search matches on more packages than you might expect.
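If you only care about packages whose name contains the search term, you can filter the search output yourself. The following Python sketch does this for text in the "name - summary" layout shown above; the function name is hypothetical and the sample lines are copied from the example output.

```python
def names_matching(search_output, term):
    """Return package names that contain term, given 'name - summary' lines."""
    hits = []
    for line in search_output.splitlines():
        name, separator, _summary = line.partition(" - ")
        if separator and term in name:
            hits.append(name.strip())
    return hits


if __name__ == "__main__":
    sample = "\n".join([
        "iperf - Internet Protocol bandwidth measuring tool",
        "netbase - Basic TCP/IP networking system",
        "nuttcp - network performance measurement tool",
        "tcpdump - command-line network traffic analyzer",
    ])
    print(names_matching(sample, "tcp"))
```

The apt-cache search command also accepts a --names-only option that restricts matching to package names, which may make this kind of post-filtering unnecessary.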
List Packages Installed on the System
The apt-cache command shows information about all the packages available in the repository. To see which packages are actually installed on your system with their versions, run the following commands.
Run the net show package version command:
cumulus@switch:~$ net show package version
Package Installed Version(s)
--------------------------------- -----------------------------------------------------------------------
acpi 1.7-1.1
acpi-support-base 0.142-8
acpid 1:2.0.31-1
adduser 3.118
apt 1.8.2
arping 2.19-6
arptables 0.0.4+snapshot20181021-4
...
Run the dpkg -l command:
cumulus@switch:~$ dpkg -l
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-===================-=========================-============-=================================
ii acpi 1.7-1.1 amd64 displays information on ACPI devices
ii acpi-support-base 0.142-8 all scripts for handling base ACPI events such as th
ii acpid 1:2.0.31-1 amd64 Advanced Configuration and Power Interface event
ii adduser 3.118 all add and remove users and groups
ii apt 1.8.2 amd64 commandline package manager
ii arping 2.19-6 amd64 sends IP and/or ARP pings (to the MAC address)
ii arptables 0.0.4+snapshot20181021-4 amd64 ARP table administration
...
The apps repository was removed in Cumulus Linux 4.0.0.
Show the Version of a Package
To show the version of a specific package installed on the system:
Run the net show package version <package> command. For example, the following command shows which version of the vrf package is installed on the system:
cumulus@switch:~$ net show package version vrf
1.0-cl4u2
Run the Linux dpkg -l <package_name> command. For example, the following command shows which version of the vrf package is installed on the system:
cumulus@switch:~$ dpkg -l vrf
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==========-============-============-=================================
ii vrf 1.0-cl4u2 amd64 Linux tools for VRF
Upgrade Packages
To upgrade all the packages installed on the system to their latest versions, run the following commands:
cumulus@switch:~$ sudo -E apt-get update
cumulus@switch:~$ sudo -E apt-get upgrade
A list of packages that will be upgraded is displayed and you are prompted to continue.
The above commands upgrade all installed packages to their latest versions but do not install any new packages.
Add New Packages
To add a new package, first ensure the package is not already installed on the system:
cumulus@switch:~$ dpkg -l | grep <name of package>
If the package is installed already, you can update the package from the Cumulus Linux repository as part of the package upgrade process, which upgrades all packages on the system. See Upgrade Packages above.
If the package is not already installed, add it by running sudo -E apt-get install <name of package>. This retrieves the package from the Cumulus Linux repository and installs it on your system together with any other packages on which this package might depend. The following example adds the tcpreplay package to the system:
cumulus@switch:~$ sudo -E apt-get update
cumulus@switch:~$ sudo -E apt-get install tcpreplay
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
tcpreplay
0 upgraded, 1 newly installed, 0 to remove and 1 not upgraded.
Need to get 436 kB of archives.
After this operation, 1008 kB of additional disk space will be used
...
You can install several packages at the same time:
cumulus@switch:~$ sudo -E apt-get install <package 1> <package 2> <package 3>
In some cases, installing a new package might also upgrade additional existing packages due to dependencies. To view these additional packages before you install, run the apt-get install --dry-run command.
Add Packages from Another Repository
As shipped, Cumulus Linux searches the Cumulus Linux repository for available packages. You can add additional repositories to search by adding them to the list of sources that apt-get consults. See man sources.list for more information.
NVIDIA has added features or made bug fixes to certain packages; you must not replace these packages with versions from other repositories. Cumulus Linux is configured to ensure that the packages from the Cumulus Linux repository are always preferred over packages from other repositories.
If you want to install packages that are not in the Cumulus Linux repository, the procedure is the same as above, but with one additional step.
Packages that are not part of the Cumulus Linux Repository are not typically tested and might not be supported by Cumulus Linux Technical Support.
Installing packages outside of the Cumulus Linux repository requires the use of sudo -E apt-get; however, depending on the package, you can use easy-install and other commands.
To install a new package, complete the following steps:
Run the dpkg command to ensure that the package is not already installed on the system:
cumulus@switch:~$ dpkg -l | grep <name of package>
If the package is installed already, ensure it is the version you need. If it is an older version, update the package from the Cumulus Linux repository:
If the package is not on the system, the package source location is most likely not in the /etc/apt/sources.list file. If the source for the new package is not in sources.list, edit and add the appropriate source to the file. For example, add the following if you want a package from the Debian repository that is not in the Cumulus Linux repository:
deb http://http.us.debian.org/debian buster main
deb http://security.debian.org/ buster/updates main
Otherwise, the repository might be listed in /etc/apt/sources.list but is commented out. To uncomment the repository, remove the # at the start of the line, then save the file.
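Run sudo -E apt-get update, then install the package and upgrade:
cumulus@switch:~$ sudo -E apt-get update
cumulus@switch:~$ sudo -E apt-get install <name of package>
cumulus@switch:~$ sudo -E apt-get upgrade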
Cumulus Linux contains a local archive embedded in the Cumulus Linux image. This archive contains the packages needed to install ifplugd, LDAP, RADIUS or TACACS+ without needing a network connection.
The archive is called cumulus-local-apt-archive and is referenced in the /etc/apt/cumulus-local-apt-archive-sources.list file. It contains the following packages:
audisp-tacplus
ifplugd
libdaemon0
libnss-ldapd
libnss-mapuser
libnss-tacplus
libpam-ldapd
libpam-radius-auth
libpam-tacplus
libtac2
libtacplus-map1
nslcd
You add these packages normally with apt-get update && apt-get install, as described above.
man pages for apt-get, dpkg, sources.list, apt_preferences
Considerations
At this time, you cannot directly browse the contents of the apt.cumulusnetworks.com repository using HTTP.
Zero Touch Provisioning - ZTP
Zero touch provisioning (ZTP) enables you to deploy network devices quickly in large-scale environments. On first boot, Cumulus Linux invokes ZTP, which executes the provisioning automation used to deploy the device for its intended role in the network.
The provisioning framework allows for a one-time, user-provided script to be executed. You can develop this script using a variety of automation tools and scripting languages, providing ample flexibility for you to design the provisioning scheme to meet your needs. You can also use it to add the switch to a configuration management (CM) platform such as Puppet, Chef, CFEngine or possibly a custom, proprietary tool.
While developing and testing the provisioning logic, you can use the ztp command in Cumulus Linux to manually invoke your provisioning script on a device.
ZTP in Cumulus Linux can occur automatically in one of the following ways, in this order:
Through a local file
Using a USB drive inserted into the switch (ZTP-USB)
Through DHCP
Each method is discussed in greater detail below.
Use a Local File
ZTP only looks once for a ZTP script on the local file system when the switch boots. ZTP searches for an install script that matches an ONIE-style waterfall in /var/lib/cumulus/ztp, looking for the most specific name first, and ending at the most generic:
You can also trigger the ZTP process manually by running the ztp --run <URL> command, where the URL is the path to the ZTP script.
Use a USB Drive
This feature has been tested only with thumb drives, not with large external USB hard drives.
If the ztp process does not discover a local script, it tries once to locate an inserted but unmounted USB drive. If it discovers one, it begins the ZTP process.
Cumulus Linux supports the use of a FAT32, FAT16, or VFAT-formatted USB drive as an installation source for ZTP scripts. You must plug in the USB drive before you power up the switch.
At minimum, the script must:
Install the Cumulus Linux operating system and license.
Copy over a basic configuration to the switch.
Restart the switch or the relevant services to get switchd up and running with that configuration.
Follow these steps to perform ZTP using a USB drive:
Copy the Cumulus Linux license and installation image to the USB drive.
The ztp process searches the root filesystem of the newly mounted drive for filenames matching an ONIE-style waterfall (see the patterns and examples above), looking for the most specific name first, and ending at the most generic.
The contents of the script are parsed to ensure it contains the CUMULUS-AUTOPROVISIONING flag (see example scripts).
The USB drive is mounted to a temporary directory under /tmp (for example, /tmp/tmpigGgjf/). To reference files on the USB drive, use the environment variable ZTP_USB_MOUNTPOINT to refer to the USB root partition.
ZTP over DHCP
If the ztp process does not discover a local/ONIE script or applicable USB drive, it checks DHCP every ten seconds for up to five minutes for the presence of a ZTP URL specified in /var/run/ztp.dhcp. The URL can be any of HTTP, HTTPS, or FTP.
For ZTP using DHCP, provisioning initially takes place over the management network and is initiated through a DHCP hook. A DHCP option is used to specify a configuration script. This script is then requested from the Web server and executed locally on the switch.
The ZTP process over DHCP follows these steps:
The first time you boot Cumulus Linux, eth0 is configured for DHCP and makes a DHCP request.
The DHCP server offers a lease to the switch.
If option 239 is present in the response, the ZTP process starts.
The ZTP process requests the contents of the script from the URL, sending additional HTTP headers containing details about the switch.
The contents of the script are parsed to ensure it contains the CUMULUS-AUTOPROVISIONING flag (see example scripts).
If provisioning is necessary, the script executes locally on the switch with root privileges.
The return code of the script is examined. If it is 0, the provisioning state is marked as complete in the autoprovisioning configuration file.
Trigger ZTP over DHCP
If provisioning has not already occurred, it is possible to trigger the ZTP process over DHCP when eth0 is set to use DHCP and one of the following events occurs:
The switch boots.
You plug a cable into or unplug a cable from the eth0 port.
You disconnect, then reconnect the switch power cord.
You can also run the ztp --run <URL> command, where the URL is the path to the ZTP script.
Configure the DHCP Server
During the DHCP process over eth0, Cumulus Linux requests DHCP option 239. This option is used to specify the custom provisioning script.
For example, the /etc/dhcp/dhcpd.conf file for an ISC DHCP server looks like:
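A minimal sketch of such a configuration, wrapped in a bash function so that the script URL can be passed in. The option name cumulus-provision-url, the 192.0.2.0/24 subnet, and the address range are placeholders assumed for illustration; only option code 239 comes from the text above.

```shell
#!/bin/bash
# Print an ISC dhcpd.conf fragment that advertises a ZTP script URL in option 239.
# The option name, subnet, and address range are placeholders for this sketch.
render_ztp_dhcp_conf() {
    local url="$1"
    cat <<EOF
option cumulus-provision-url code 239 = text;

subnet 192.0.2.0 netmask 255.255.255.0 {
    range 192.0.2.100 192.0.2.200;
    option cumulus-provision-url "${url}";
}
EOF
}
```

Running render_ztp_dhcp_conf http://192.0.2.1/demo.sh prints a fragment that you can review and merge into /etc/dhcp/dhcpd.conf on the DHCP server.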
Do not use an underscore (_) in the hostname; underscores are not permitted in hostnames.
Inspect HTTP Headers
The following HTTP headers are sent in the request to the webserver to retrieve the provisioning script:
Header                      Value              Example
------                      -----              -------
User-Agent                                     CumulusLinux-AutoProvision/0.4
CUMULUS-ARCH                CPU architecture   x86_64
CUMULUS-BUILD                                  4.1.0
CUMULUS-LICENSE-INSTALLED   Either 0 or 1      1
CUMULUS-MANUFACTURER                           odm
CUMULUS-PRODUCTNAME                            switch_model
CUMULUS-SERIAL                                 XYZ123004
CUMULUS-BASE-MAC                               44:38:39:FF:40:94
CUMULUS-MGMT-MAC                               44:38:39:FF:00:00
CUMULUS-VERSION                                4.1.0
CUMULUS-PROV-COUNT                             0
CUMULUS-PROV-MAX                               32
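To check what a web server returns for a particular switch, you can emulate the request from a workstation. The following sketch sends a few of the headers listed above with curl; the function name fetch_as_switch and the CURL override variable are hypothetical, and the header values are the examples from the table.

```shell
#!/bin/bash
# Fetch a provisioning script while sending a few of the headers listed above.
# CURL can be overridden, for example to print the arguments instead of sending them.
fetch_as_switch() {
    local url="$1" serial="$2"
    "${CURL:-curl}" -sS \
        -A "CumulusLinux-AutoProvision/0.4" \
        -H "CUMULUS-ARCH: x86_64" \
        -H "CUMULUS-SERIAL: ${serial}" \
        "$url"
}
```

For example, fetch_as_switch http://192.0.2.1/demo.sh XYZ123004 prints the script that the server would hand to the switch with that serial number.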
Write ZTP Scripts
Remember to include the following line in any of the supported scripts that you expect to run using the autoprovisioning framework.
# CUMULUS-AUTOPROVISIONING
This line is required somewhere in the script file for execution to occur. You can include the flag in a comment or remark; it does not need to be echoed or written to stdout.
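Before you publish a script, you can confirm that the flag is present. The following is a minimal sketch; has_ztp_flag is a hypothetical helper name, and the check only looks for the flag text anywhere in the file, as described above.

```shell
#!/bin/bash
# Return 0 if the given file contains the CUMULUS-AUTOPROVISIONING flag anywhere.
has_ztp_flag() {
    grep -q 'CUMULUS-AUTOPROVISIONING' "$1"
}
```

For example, has_ztp_flag ./ztp.sh || echo "flag missing" warns you before the switch rejects the script.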
You can write the script in any language currently supported by Cumulus Linux, such as:
Perl
Python
Ruby
Shell
The script must return an exit code of 0 upon success, as this triggers the autoprovisioning process to be marked as complete in the autoprovisioning configuration file.
The following script installs Cumulus Linux and its license from a USB drive and applies a configuration:
#!/bin/bash
function error() {
    echo -e "\e[0;33mERROR: The ZTP script failed while running the command $BASH_COMMAND at line $BASH_LINENO.\e[0m" >&2
    exit 1
}
# Log all output from this script
exec >> /var/log/autoprovision 2>&1
date "+%FT%T ztp starting script $0"
trap error ERR
#Add Debian Repositories
echo "deb http://http.us.debian.org/debian buster main" >> /etc/apt/sources.list
echo "deb http://security.debian.org/ buster/updates main" >> /etc/apt/sources.list
#Update Package Cache
apt-get update -y
#Load interface config from usb
cp ${ZTP_USB_MOUNTPOINT}/interfaces /etc/network/interfaces
#Load port config from usb
# (if breakout cables are used for certain interfaces)
cp ${ZTP_USB_MOUNTPOINT}/ports.conf /etc/cumulus/ports.conf
#Install a License from usb and restart switchd
/usr/cumulus/bin/cl-license -i ${ZTP_USB_MOUNTPOINT}/license.txt && systemctl restart switchd.service
#Reload interfaces to apply loaded config
ifreload -a
#Output state of interfaces
net show interface
# CUMULUS-AUTOPROVISIONING
exit 0
Best Practices
ZTP scripts come in different forms and frequently perform many of the same tasks. As BASH is the most common language used for ZTP scripts, the following BASH snippets are provided to accelerate your ability to perform common tasks with robust error checking.
Set the Default Cumulus User Password
The default cumulus user account password is cumulus. When you log into Cumulus Linux for the first time, you must provide a new password for the cumulus account, then log back into the system. This password change at first login is required in Cumulus Linux 4.2 and later.
Add the following function to your ZTP script to change the default cumulus user account password to a clear-text password. The example changes the password cumulus to MyP4$$word.
function set_password(){
    # Unexpire the cumulus account
    passwd -x 99999 cumulus
    # Set the password
    echo 'cumulus:MyP4$$word' | chpasswd
}
set_password
If you have an insecure management network, set the password with an encrypted hash instead of a clear-text password. Using an encrypted hash is recommended.
First, generate a sha-512 password hash with the following python commands. The example commands generate a sha-512 password hash for the password MyP4$$word.
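One way to generate such a hash from a shell, offered here as an alternative to Python, is the openssl passwd command. This sketch assumes OpenSSL 1.1.1 or later on your workstation (the -6 option selects SHA-512 crypt); hash_password is a hypothetical helper name. The salt is random, so the output differs on every run.

```shell
#!/bin/bash
# Print a SHA-512 crypt hash for the given clear-text password.
# Assumes OpenSSL 1.1.1 or later, which provides the -6 option of "openssl passwd".
# The password is passed on stdin so that it does not appear in the openssl arguments.
hash_password() {
    printf '%s\n' "$1" | openssl passwd -6 -stdin
}
```

For example, hash_password 'MyP4$$word' prints a string beginning with $6$ that you can paste, in single quotes, into the usermod -p command.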
Then, add the following function to the ZTP script to change the default cumulus user account password:
function set_password(){
    # Unexpire the cumulus account
    passwd -x 99999 cumulus
    # Set the password
    usermod -p '$6$hs7OPmnrfvLNKfoZ$iB3hy5N6Vv6koqDmxixpTO6lej6VaoKGvs5E8p5zNo4tPec0KKqyQnrFMII3jGxVEYWntG9e7Z7DORdylG5aR/' cumulus
}
set_password
Install a License
Use the following function to include error checking for license file installation.
function install_license(){
    # Install license
    echo "$(date) INFO: Installing License..."
    # Quote the argument so the license text is passed through unmodified
    echo "$1" | /usr/cumulus/bin/cl-license -i
    return_code=$?
    if [ "$return_code" == "0" ]; then
        echo "$(date) INFO: License Installed."
    else
        echo "$(date) ERROR: License not installed. Return code was: $return_code"
        /usr/cumulus/bin/cl-license
        exit 1
    fi
}
Test DNS Name Resolution
DNS names are frequently used in ZTP scripts. The ping_until_reachable function tests that each DNS name resolves into a reachable IP address. Call this function with each DNS target used in your script before you use the DNS name elsewhere in your script.
The following example shows how to call the ping_until_reachable function in the context of a larger task.
function ping_until_reachable(){
    last_code=1
    max_tries=30
    tries=0
    # Retry until a ping succeeds or the attempt limit is reached
    while [ "0" != "$last_code" ] && [ "$tries" -lt "$max_tries" ]; do
        tries=$((tries+1))
        echo "$(date) INFO: ( Attempt $tries of $max_tries ) Pinging $1 Target Until Reachable."
        ping -c2 "$1" &> /dev/null
        last_code=$?
        sleep 1
    done
    if [ "$tries" -eq "$max_tries" ] && [ "$last_code" -ne "0" ]; then
        echo "$(date) ERROR: Reached maximum number of attempts to ping the target $1 ."
        exit 1
    fi
}
Check the Cumulus Linux Release
The following script segment demonstrates how to check which Cumulus Linux release is currently running and upgrade the node if it is not the target release. If the release is the target release, normal ZTP tasks execute. The script calls the ping_until_reachable function (described above) to make sure the server that holds the image and the ZTP script is reachable.
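A minimal sketch of this logic follows. The target release, server hostname, and URLs are placeholders. Reading DISTRIB_RELEASE from /etc/lsb-release and the onie-install options shown (-fa, -i, -z) are assumptions to verify on your switch, for example with onie-install -h. init_ztp is a hypothetical stand-in for your normal provisioning tasks, and ping_until_reachable is the function defined earlier.

```shell
#!/bin/bash
# Placeholder values for this sketch; replace them with your own.
CUMULUS_TARGET_RELEASE="4.3.0"
IMAGE_SERVER_HOSTNAME="webserver.example.com"
IMAGE_URL="http://${IMAGE_SERVER_HOSTNAME}/${CUMULUS_TARGET_RELEASE}.bin"
ZTP_URL="http://${IMAGE_SERVER_HOSTNAME}/ztp.sh"

# Print the running release, read from DISTRIB_RELEASE in an lsb-release style file.
# Assumption: the release is stored as DISTRIB_RELEASE=<version> in /etc/lsb-release.
current_release() {
    grep '^DISTRIB_RELEASE=' "${1:-/etc/lsb-release}" | cut -d= -f2 | tr -d '"'
}

# Hypothetical stand-in for your normal provisioning tasks.
init_ztp() {
    echo "$(date) INFO: Running normal ZTP tasks."
}

# Upgrade if the running release differs from the target; otherwise provision.
# Relies on ping_until_reachable, defined earlier in the ZTP script.
check_release() {
    if [ "$(current_release)" != "$CUMULUS_TARGET_RELEASE" ]; then
        ping_until_reachable "$IMAGE_SERVER_HOSTNAME"
        /usr/cumulus/bin/onie-install -fa -i "$IMAGE_URL" -z "$ZTP_URL" && reboot
    else
        init_ztp
    fi
}
```

Call check_release near the start of the ZTP script, after ping_until_reachable is defined, so that an upgrade happens before any other provisioning work.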
If you apply a management VRF in your script, either apply it last or reboot instead. If you do not apply a management VRF last, you need to prepend any commands that require eth0 to communicate out with /usr/bin/ip vrf exec mgmt; for example, /usr/bin/ip vrf exec mgmt apt-get update -y.
Perform Ansible Provisioning Callbacks
After initially configuring a node with ZTP, use Provisioning Callbacks to inform Ansible Tower or AWX that the node is ready for more detailed provisioning. The following example demonstrates how to use a provisioning callback:
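A minimal sketch of such a callback, using curl. The server name, job template ID, and host config key are placeholders that you take from your own Ansible Tower or AWX job template; the function names are hypothetical.

```shell
#!/bin/bash
# Build the JSON body for a provisioning callback from a host config key.
callback_payload() {
    printf '{"host_config_key": "%s"}' "$1"
}

# POST the callback. The server name, job template ID, and key are placeholders.
provisioning_callback() {
    local server="$1" template_id="$2" key="$3"
    curl -sS -f -X POST -H "Content-Type: application/json" \
        --data "$(callback_payload "$key")" \
        "https://${server}/api/v2/job_templates/${template_id}/callback/"
}
```

For example, provisioning_callback ansible.example.com 1111 somekey requests the job template with ID 1111 for the calling host. The -f option makes curl return a nonzero exit code on an HTTP error, so the ZTP script can detect a failed callback.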
Make sure to disable the DHCP hostname override setting in your script (NCLU does this automatically).
function set_hostname(){
    # Remove DHCP Setting of Hostname
    sed -i 's/SETHOSTNAME="yes"/SETHOSTNAME="no"/g' /etc/dhcp/dhclient-exit-hooks.d/dhcp-sethostname
    hostnamectl set-hostname "$1"
}
NCLU in ZTP Scripts
Not all aspects of NCLU are supported when running during ZTP. Use traditional Linux methods of providing configuration to the switch during ZTP.
When you use NCLU in ZTP scripts, add the following loop to make sure NCLU has time to start up before being called.
# Waiting for NCLU to finish starting up
last_code=1
while [ "1" == "$last_code" ]; do
net show interface &> /dev/null
last_code=$?
done
net add vrf mgmt
net add time zone Etc/UTC
net add time ntp server 192.168.0.254 iburst
net commit
Test ZTP Scripts
There are a few commands you can use to test and debug your ZTP scripts.
You can use verbose mode to debug your script and see where your script failed. Include the -v option when you run ZTP:
cumulus@switch:~$ sudo ztp -v -r http://192.0.2.1/demo.sh
Attempting to provision via ZTP Manual from http://192.0.2.1/demo.sh
Broadcast message from root@dell-s6010-01 (ttyS0) (Tue May 10 22:44:17 2016):
ZTP: Attempting to provision via ZTP Manual from http://192.0.2.1/demo.sh
ZTP Manual: URL response code 200
ZTP Manual: Found Marker CUMULUS-AUTOPROVISIONING
ZTP Manual: Executing http://192.0.2.1/demo.sh
error: ZTP Manual: Payload returned code 1
error: Script returned failure
To see if ZTP is enabled and to see results of the most recent execution, you can run the ztp -s command.
cumulus@switch:~$ ztp -s
ZTP INFO:
State enabled
Version 1.0
Result Script Failure
Date Mon 20 May 2019 09:31:27 PM UTC
Method ZTP DHCP
URL http://192.0.2.1/demo.sh
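If you want to act on the result in a script, you can extract the Result field from this output. The following sketch assumes the field layout shown above (a label followed by its value on one line); ztp_result is a hypothetical helper name.

```shell
#!/bin/bash
# Read "ztp -s" output on stdin and print the value of the Result field.
ztp_result() {
    awk '$1 == "Result" { $1 = ""; sub(/^[ \t]+/, ""); print }'
}
```

For example, sudo ztp -s | ztp_result prints Script Failure for the output shown above.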
If ZTP runs when the switch boots rather than manually, run the systemctl -l status ztp.service command, then the journalctl -l -u ztp.service command, to see if any failures occurred:
cumulus@switch:~$ sudo systemctl -l status ztp.service
● ztp.service - Cumulus Linux ZTP
Loaded: loaded (/lib/systemd/system/ztp.service; enabled)
Active: failed (Result: exit-code) since Wed 2016-05-11 16:38:45 UTC; 1min 47s ago
Docs: man:ztp(8)
Process: 400 ExecStart=/usr/sbin/ztp -b (code=exited, status=1/FAILURE)
Main PID: 400 (code=exited, status=1/FAILURE)
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP USB: Device not found
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: Looking for ZTP Script provided by DHCP
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: Attempting to provision via ZTP DHCP from http://192.0.2.1/demo.sh
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: URL response code 200
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: Found Marker CUMULUS-AUTOPROVISIONING
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: Executing http://192.0.2.1/demo.sh
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: Payload returned code 1
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: Script returned failure
May 11 16:38:45 dell-s6010-01 systemd[1]: ztp.service: main process exited, code=exited, status=1/FAILURE
May 11 16:38:45 dell-s6010-01 systemd[1]: Unit ztp.service entered failed state.
cumulus@switch:~$
cumulus@switch:~$ sudo journalctl -l -u ztp.service --no-pager
-- Logs begin at Wed 2016-05-11 16:37:42 UTC, end at Wed 2016-05-11 16:40:39 UTC. --
May 11 16:37:45 cumulus ztp[400]: ztp [400]: /var/lib/cumulus/ztp: State Directory does not exist. Creating it...
May 11 16:37:45 cumulus ztp[400]: ztp [400]: /var/run/ztp.lock: Lock File does not exist. Creating it...
May 11 16:37:45 cumulus ztp[400]: ztp [400]: /var/lib/cumulus/ztp/ztp_state.log: State File does not exist. Creating it...
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP LOCAL: Looking for ZTP local Script
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64-dell_s6010_s1220-rUNKNOWN
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64-dell_s6010_s1220
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64-dell
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP USB: Looking for unmounted USB devices
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP USB: Parsing partitions
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP USB: Device not found
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: Looking for ZTP Script provided by DHCP
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: Attempting to provision via ZTP DHCP from http://192.0.2.1/demo.sh
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: URL response code 200
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: Found Marker CUMULUS-AUTOPROVISIONING
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: Executing http://192.0.2.1/demo.sh
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: Payload returned code 1
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: Script returned failure
May 11 16:38:45 dell-s6010-01 systemd[1]: ztp.service: main process exited, code=exited, status=1/FAILURE
May 11 16:38:45 dell-s6010-01 systemd[1]: Unit ztp.service entered failed state.
Instead of running journalctl, you can see the log history by running:
cumulus@switch:~$ cat /var/log/syslog | grep ztp
2016-05-11T16:37:45.132583+00:00 cumulus ztp [400]: /var/lib/cumulus/ztp: State Directory does not exist. Creating it...
2016-05-11T16:37:45.134081+00:00 cumulus ztp [400]: /var/run/ztp.lock: Lock File does not exist. Creating it...
2016-05-11T16:37:45.135360+00:00 cumulus ztp [400]: /var/lib/cumulus/ztp/ztp_state.log: State File does not exist. Creating it...
2016-05-11T16:37:45.185598+00:00 cumulus ztp [400]: ZTP LOCAL: Looking for ZTP local Script
2016-05-11T16:37:45.485084+00:00 cumulus ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64-dell_s6010_s1220-rUNKNOWN
2016-05-11T16:37:45.486394+00:00 cumulus ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64-dell_s6010_s1220
2016-05-11T16:37:45.488385+00:00 cumulus ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64-dell
2016-05-11T16:37:45.489665+00:00 cumulus ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64
2016-05-11T16:37:45.490854+00:00 cumulus ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp
2016-05-11T16:37:45.492296+00:00 cumulus ztp [400]: ZTP USB: Looking for unmounted USB devices
2016-05-11T16:37:45.493525+00:00 cumulus ztp [400]: ZTP USB: Parsing partitions
2016-05-11T16:37:45.636422+00:00 cumulus ztp [400]: ZTP USB: Device not found
2016-05-11T16:38:43.372857+00:00 cumulus ztp [1805]: Found ZTP DHCP Request
2016-05-11T16:38:45.696562+00:00 cumulus ztp [400]: ZTP DHCP: Looking for ZTP Script provided by DHCP
2016-05-11T16:38:45.698598+00:00 cumulus ztp [400]: Attempting to provision via ZTP DHCP from http://192.0.2.1/demo.sh
2016-05-11T16:38:45.816275+00:00 cumulus ztp [400]: ZTP DHCP: URL response code 200
2016-05-11T16:38:45.817446+00:00 cumulus ztp [400]: ZTP DHCP: Found Marker CUMULUS-AUTOPROVISIONING
2016-05-11T16:38:45.818402+00:00 cumulus ztp [400]: ZTP DHCP: Executing http://192.0.2.1/demo.sh
2016-05-11T16:38:45.834240+00:00 cumulus ztp [400]: ZTP DHCP: Payload returned code 1
2016-05-11T16:38:45.835488+00:00 cumulus ztp [400]: Script returned failure
2016-05-11T16:38:45.876334+00:00 cumulus systemd[1]: ztp.service: main process exited, code=exited, status=1/FAILURE
2016-05-11T16:38:45.879410+00:00 cumulus systemd[1]: Unit ztp.service entered failed state.
If you see that the issue is a script failure, you can modify the script and then run ZTP manually using ztp -v -r <URL/path to that script>, as above.
cumulus@switch:~$ sudo ztp -v -r http://192.0.2.1/demo.sh
Attempting to provision via ZTP Manual from http://192.0.2.1/demo.sh
Broadcast message from root@dell-s6010-01 (ttyS0) (Tue May 10 22:44:17 2019):
ZTP: Attempting to provision via ZTP Manual from http://192.0.2.1/demo.sh
ZTP Manual: URL response code 200
ZTP Manual: Found Marker CUMULUS-AUTOPROVISIONING
ZTP Manual: Executing http://192.0.2.1/demo.sh
error: ZTP Manual: Payload returned code 1
error: Script returned failure
cumulus@switch:~$ sudo ztp -s
State enabled
Version 1.0
Result Script Failure
Date Mon 20 May 2019 09:31:27 PM UTC
Method ZTP Manual
URL http://192.0.2.1/demo.sh
To check syslog for information about ZTP, search /var/log/syslog for ztp entries, as shown earlier in this section.
Errors in syslog for ZTP like those shown above often occur if the script was created (or edited at some point) on a Windows machine. Check to make sure that \r\n characters are not present in the end-of-line encodings.
Use the cat -v ztp.sh command to view the contents of the script and search for any hidden characters.
root@oob-mgmt-server:/var/www/html# cat -v ./ztp_oob_windows.sh
#!/bin/bash^M
^M
###################^M
# ZTP Script^M
###################^M
^M
/usr/cumulus/bin/cl-license -i http://192.168.0.254/license.txt^M
^M
# Clean method of performing a Reboot^M
nohup bash -c 'sleep 2; shutdown now -r "Rebooting to Complete ZTP"' &^M
^M
exit 0^M
^M
# The line below is required to be a valid ZTP script^M
#CUMULUS-AUTOPROVISIONING^M
root@oob-mgmt-server:/var/www/html#
The ^M characters in the output of your ZTP script, as shown above, indicate the presence of Windows end-of-line encodings that you need to remove.
Use the translate (tr) command on any Linux system to remove the '\r' characters from the file.
root@oob-mgmt-server:/var/www/html# tr -d '\r' < ztp_oob_windows.sh > ztp_oob_unix.sh
root@oob-mgmt-server:/var/www/html# cat -v ./ztp_oob_unix.sh
#!/bin/bash
###################
# ZTP Script
###################
/usr/cumulus/bin/cl-license -i http://192.168.0.254/license.txt
# Clean method of performing a Reboot
nohup bash -c 'sleep 2; shutdown now -r "Rebooting to Complete ZTP"' &
exit 0
# The line below is required to be a valid ZTP script
#CUMULUS-AUTOPROVISIONING
root@oob-mgmt-server:/var/www/html#
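To catch this problem before a switch downloads the script, you can test for carriage return characters on the server. The following is a minimal sketch; has_crlf is a hypothetical helper name.

```shell
#!/bin/bash
# Return 0 if the file contains carriage return characters (Windows line endings).
has_crlf() {
    grep -q $'\r' "$1"
}
```

For example, has_crlf ztp_oob_windows.sh && echo "Windows line endings found" flags the first file shown above, while the converted file passes the check.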
Manually Use the ztp Command
To enable ZTP, use the -e option:
cumulus@switch:~$ sudo ztp -e
Enabling ZTP means that ZTP tries to run the next time the switch boots. However, if ZTP already ran on a previous boot or if a manual configuration is found, ZTP exits without looking for a script.
ZTP checks for these manual configurations during bootup:
Password changes
User and group changes
Package changes
Interface changes
The presence of an installed license
When the switch boots for the very first time, ZTP records the state of important files that are most likely to be modified after the switch is configured. If ZTP is still enabled after a reboot, ZTP compares the recorded state to the current state of these files. If they do not match, ZTP considers that the switch has already been provisioned and exits. These files are only erased after a reset.
To reset ZTP to its original state, use the -R option. This removes the ztp directory and ZTP runs the next time the switch reboots.
cumulus@switch:~$ sudo ztp -R
To disable ZTP, use the -d option:
cumulus@switch:~$ sudo ztp -d
To force provisioning to occur and ignore the status listed in the configuration file, use the -r option:
cumulus@switch:~$ sudo ztp -r cumulus-ztp.sh
To see the current ZTP state, use the -s option:
cumulus@switch:~$ sudo ztp -s
ZTP INFO:
State disabled
Version 1.0
Result success
Date Mon May 20 21:51:04 2019 UTC
Method Switch manually configured
URL None
You can run the NCLU net show system ztp script or net show system ztp json command to see the current ztp state.
Considerations
During the development of a provisioning script, the switch might need to be rebooted.
You can use the Cumulus Linux onie-select -i command to cause the switch to reprovision itself and install a network operating system again using ONIE.
System Configuration
This section describes how to configure your Cumulus Linux switch. You can set the date and time, configure authentication, authorization, and accounting and configure access control lists (ACLs), which control the traffic entering your network.
This section also describes the services and daemons that Cumulus Linux uses, and describes how to configure switchd, the daemon at the heart of Cumulus Linux.
An overview of the Network Command Line Utility (NCLU) is also provided.
Network Command Line Utility - NCLU
The Network Command Line Utility (NCLU) is a command line interface that simplifies the networking configuration process for all users.
NCLU resides in the Linux user space and provides consistent access to networking commands directly through bash, which simplifies configuration and troubleshooting; you do not need to edit files or enter modes and sub-modes. NCLU provides these benefits:
Embeds help, examples, and automatic command checking with suggestions in case you enter a typo.
Runs directly from and integrates with bash, while being interoperable with the regular way of accessing underlying configuration files.
Configures dependent features automatically so that you don’t have to.
The NCLU wrapper utility called net is capable of configuring layer 2 and layer 3 features of the networking stack, installing ACLs and VXLANs, restoring configuration files, as well as providing monitoring and troubleshooting functionality for these features. You can configure both the /etc/network/interfaces and /etc/frr/frr.conf files with net, in addition to running show and clear commands related to ifupdown2 and FRRouting.
If you use automation to configure your switches, NVIDIA recommends that you do not use NCLU. Edit configuration files directly.
NCLU Basics
Use the following workflow to stage and commit changes to Cumulus Linux with NCLU:
Use the net add and net del commands to stage and remove configuration changes.
Use the net pending command to review staged changes.
Use net commit and net abort to commit and delete staged changes.
net commit applies the changes to the relevant configuration files, such as /etc/network/interfaces, then runs the necessary follow-on commands to enable the configuration, such as ifreload -a.
If two different users try to commit a change at the same time, NCLU displays a warning but implements the change according to the first commit received. The second user will need to abort the commit.
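The workflow above can be sketched as a small bash function that stages one change, shows the pending changes, and commits them. The function name set_mtu and the NET override variable are hypothetical; the net add interface <interface> mtu command form is taken from the NCLU help output in this guide.

```shell
#!/bin/bash
# Stage an MTU change on one interface, show what is pending, then commit it.
# NET defaults to the NCLU "net" command; override it to dry-run the sequence.
set_mtu() {
    local iface="$1" mtu="$2"
    ${NET:-net} add interface "$iface" mtu "$mtu" &&
    ${NET:-net} pending &&
    ${NET:-net} commit
}
```

For example, NET=echo set_mtu swp10 9216 prints the three commands without running them, and set_mtu swp10 9216 applies the change on a switch. Each step runs only if the previous one succeeds, so a rejected net add does not lead to a commit.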
When you have a running configuration, you can review and update the configuration with the following commands:
net show is a series of commands for viewing various parts of the network configuration. For example, use net show configuration to view the complete network configuration, net show commit history to view a history of commits using NCLU, and net show bgp to view BGP status.
net clear provides a way to clear net show counters, BGP and OSPF neighbor content, and more.
net rollback provides a mechanism to revert back to an earlier configuration.
net commit confirm requires you to press Enter to commit changes using NCLU. If you run net commit confirm but do not press Enter within 10 seconds, the commit automatically reverts and no changes are made.
net commit description <description> enables you to provide a descriptive summary of the changes you are about to commit.
net commit permanent retains the backup file taken when committing the change. Otherwise, the backup files created from NCLU commands are cleaned up periodically.
net del all deletes all configurations.
The net del all command does not remove management VRF configurations; NCLU does not interact with eth0 interfaces and management VRF.
Tab Completion, Verification, and Inline Help
In addition to tab completion and partial keyword command identification, NCLU includes verification checks to ensure you use the correct syntax. The examples below show the output for incorrect commands:
cumulus@switch:~$ net add bgp router-id 1.1.1.1/32
ERROR: Command not found
Did you mean one of the following?
net add bgp router-id <ipv4>
This command is looking for an IP address, not an IP/prefixlen
cumulus@switch:~$ net add bgp router-id 1.1.1.1
cumulus@switch:~$ net add int swp10 mtu <TAB>
<552-9216> :
cumulus@switch:~$ net add int swp10 mtu 9300
ERROR: Command not found
Did you mean one of the following?
net add interface <interface> mtu <552-9216>
NCLU has a comprehensive built-in help system. In addition to the net man page, you can use ? and help to display available commands:
cumulus@switch:~$ net help
Usage:
# net <COMMAND> [<ARGS>] [help]
#
# net is a command line utility for networking on Cumulus Linux switches.
#
# COMMANDS are listed below and have context specific arguments which can
# be explored by typing "<TAB>" or "help" anytime while using net.
#
# Use 'man net' for a more comprehensive overview.
net abort
net commit [verbose] [confirm [<number-seconds>]] [description <wildcard>]
net commit permanent <wildcard>
net del all
net help [verbose]
net pending [json]
net rollback (<number>|last)
net rollback description <wildcard-snapshot>
net show commit (history|<number>|last)
net show rollback (<number>|last)
net show rollback description <wildcard-snapshot>
net show configuration [commands|files|acl|bgp|multicast|ospf|ospf6]
net show configuration interface [<interface>] [json]
Options:
# Help commands
help : context sensitive information; see section below
example : detailed examples of common workflows
# Configuration commands
add : add/modify configuration
del : remove configuration
# Commit buffer commands
abort : abandon changes in the commit buffer
commit : apply the commit buffer to the system
pending : show changes staged in the commit buffer
rollback : revert to a previous configuration state
# Status commands
show : show command output
clear : clear counters, BGP neighbors, etc
cumulus@switch:~$ net help bestpath
The following commands contain keyword(s) 'bestpath'
net (add|del) bgp bestpath as-path multipath-relax [as-set|no-as-set]
net (add|del) bgp bestpath compare-routerid
net (add|del) bgp bestpath med missing-as-worst
net (add|del) bgp ipv4 labeled-unicast neighbor <bgppeer> addpath-tx-bestpath-per-AS
net (add|del) bgp ipv4 unicast neighbor <bgppeer> addpath-tx-bestpath-per-AS
net (add|del) bgp ipv6 labeled-unicast neighbor <bgppeer> addpath-tx-bestpath-per-AS
net (add|del) bgp ipv6 unicast neighbor <bgppeer> addpath-tx-bestpath-per-AS
net (add|del) bgp neighbor <bgppeer> addpath-tx-bestpath-per-AS
net (add|del) bgp vrf <text> bestpath as-path multipath-relax [as-set|no-as-set]
net (add|del) bgp vrf <text> bestpath compare-routerid
net (add|del) bgp vrf <text> bestpath med missing-as-worst
net (add|del) bgp vrf <text> ipv4 labeled-unicast neighbor <bgppeer> addpath-tx-bestpath-per-AS
net (add|del) bgp vrf <text> ipv4 unicast neighbor <bgppeer> addpath-tx-bestpath-per-AS
net (add|del) bgp vrf <text> ipv6 labeled-unicast neighbor <bgppeer> addpath-tx-bestpath-per-AS
net (add|del) bgp vrf <text> ipv6 unicast neighbor <bgppeer> addpath-tx-bestpath-per-AS
net (add|del) bgp vrf <text> neighbor <bgppeer> addpath-tx-bestpath-per-AS
net add bgp debug bestpath <ip/prefixlen>
net del bgp debug bestpath [<ip/prefixlen>]
net show bgp (<ipv4>|<ipv4/prefixlen>|<ipv6>|<ipv6/prefixlen>) [bestpath|multipath] [json]
net show bgp vrf <text> (<ipv4>|<ipv4/prefixlen>|<ipv6>|<ipv6/prefixlen>) [bestpath|multipath] [json]
You can configure multiple interfaces at once:
cumulus@switch:~$ net add int swp7-9,12,15-17,22 mtu 9216
Search for Specific Commands
To search for specific NCLU commands so that you can identify the correct syntax to use, run the net help verbose | grep <term> command. For example, to show only commands that include clag (for MLAG):
cumulus@leaf01:mgmt:~$ net help verbose | grep clag
net example clag basic-clag
net example clag l2-with-server-vlan-trunks
net example clag l3-uplinks-virtual-address
net add clag peer sys-mac <mac-clag> interface <interface> (primary|secondary) [backup-ip <ipv4>]
net add clag peer sys-mac <mac-clag> interface <interface> (primary|secondary) [backup-ip <ipv4> vrf <text>]
net del clag peer
net add clag port bond <interface> interface <interface> clag-id <0-65535>
net del clag port bond <interface>
net show clag [our-macs|our-multicast-entries|our-multicast-route|our-multicast-router-ports|peer-macs|peer-multicast-entries|peer-multicast-route|peer-multicast-router-ports|params|backup-ip|id] [verbose] [json]
net show clag macs [<mac>] [json]
net show clag neighbors [verbose]
net show clag peer-lacp-rate
net show clag verify-vlans [verbose]
net show clag status [verbose] [json]
net add bond <interface> clag id <0-65535>
net add interface <interface> clag args <wildcard>
net add interface <interface> clag backup-ip (<ipv4>|<ipv4> vrf <text>)
net add interface <interface> clag enable (yes|no)
net add interface <interface> clag peer-ip (<ipv4>|<ipv6>|linklocal)
net add interface <interface> clag priority <0-65535>
net add interface <interface> clag sys-mac <mac>
net add loopback lo clag vxlan-anycast-ip <ipv4>
net del bond <interface> clag id [<0-65535>]
net del interface <interface> clag args [<wildcard>]
...
Add ? (Question Mark) Ability to NCLU
While tab completion is enabled by default, you can also configure NCLU to use the ? (question mark character) to look at available commands. To enable this feature for the cumulus user, open the following file:
cumulus@switch:~$ sudo nano ~/.inputrc
Uncomment the very last line in the .inputrc file so that the file changes from this:
# Uncomment to use ? as an alternative to
# ?: complete
to this:
# Uncomment to use ? as an alternative to
?: complete
Save the file and reconnect to the switch. The ? (question mark) ability will work on all subsequent sessions on the switch.
cumulus@switch:~$ net
abort : abandon changes in the commit buffer
add : add/modify configuration
clear : clear counters, BGP neighbors, etc
commit : apply the commit buffer to the system
del : remove configuration
example : detailed examples of common workflows
help : Show this screen and exit
pending : show changes staged in the commit buffer
rollback : revert to a previous configuration state
show : show command output
When the question mark is typed, NCLU will autocomplete and show all available options, but the question mark does not actually appear on the terminal. This is normal, expected behavior.
Built-In Examples
NCLU has a number of built-in examples to guide you through basic configuration setup:
cumulus@switch:~$ net example
acl : access-list
bgp : Border Gateway Protocol
bond : bond, port-channel, etc
bridge : a layer2 bridge
clag : Multi-Chassis Link Aggregation
dhcp : Dynamic Host Configuration Protocol
dot1x : Configure, Enable, Delete or Show IEEE 802.1X EAPOL
evpn : Ethernet VPN
link-settings : Physical link parameters
management-vrf : Management VRF
mlag : Multi-Chassis Link Aggregation
ospf : Open Shortest Path First (OSPFv2)
snmp-server : Configure the SNMP server
syslog : Set syslog logging
vlan-interfaces : IP interfaces for VLANs
voice-vlan : VLAN used for IP Phones
vrr : add help text
cumulus@switch:~$ net example bridge
Scenario
========
We are configuring switch1 and would like to configure the following
- configure switch1 as an L2 switch for host-11 and host-12
- enable vlans 10-20
- place host-11 in vlan 10
- place host-12 in vlan 20
- create an SVI interface for vlan 10
- create an SVI interface for vlan 20
- assign IP 10.0.0.1/24 to the SVI for vlan 10
- assign IP 20.0.0.1/24 to the SVI for vlan 20
- configure swp3 as a trunk for vlans 10, 11, 12 and 20
swp3
*switch1 --------- switch2
/\
swp1 / \ swp2
/ \
/ \
host-11 host-12
switch1 net commands
====================
- enable vlans 10-20
switch1# net add vlan 10-20
- place host-11 in vlan 10
- place host-12 in vlan 20
switch1# net add int swp1 bridge access 10
switch1# net add int swp2 bridge access 20
- create an SVI interface for vlan 10
- create an SVI interface for vlan 20
- assign IP 10.0.0.1/24 to the SVI for vlan 10
- assign IP 20.0.0.1/24 to the SVI for vlan 20
switch1# net add vlan 10 ip address 10.0.0.1/24
switch1# net add vlan 20 ip address 20.0.0.1/24
- configure swp3 as a trunk for vlans 10, 11, 12 and 20
switch1# net add int swp3 bridge trunk vlans 10-12,20
switch1# net pending
switch1# net commit
Verification
============
switch1# net show interface
switch1# net show bridge macs
Configure User Accounts
You can configure user accounts in Cumulus Linux with read-only or edit permissions for NCLU:
You create user accounts with read-only permissions for NCLU by adding them to the netshow group. A user in the netshow group can run NCLU net show commands, such as net show interface or net show config, and certain general Linux commands, such as ls, cd or man, but cannot run net add, net del or net commit commands.
You create user accounts with edit permissions for NCLU by adding them to the netedit group. A user in the netedit group can run NCLU configuration commands, such as net add, net del or net commit, in addition to NCLU net show commands.
The examples below demonstrate how to add a new user account or modify an existing user account called myuser.
To add a new user account with NCLU show permissions:
cumulus@switch:~$ sudo adduser --ingroup netshow myuser
Adding user `myuser' ...
Adding new user `myuser' (1001) with group `netshow'...
...
To add NCLU show permissions to a user account that already exists:
cumulus@switch:~$ sudo addgroup myuser netshow
Adding user `myuser' to group `netshow' ...
Adding user myuser to group netshow
Done
To add a new user account with NCLU edit permissions:
cumulus@switch:~$ sudo adduser --ingroup netedit myuser
Adding user `myuser' ...
Adding new user `myuser' (1001) with group `netedit'
...
To add NCLU edit permissions to a user account that already exists:
cumulus@switch:~$ sudo addgroup myuser netedit
Adding user `myuser' to group `netedit' ...
Adding user myuser to group netedit
Done
You can use the adduser command for local user accounts only. You can use the addgroup command for both local and remote user accounts. For a remote user account, you must use the mapping username, such as tacacs3 or radius_user, not the TACACS or RADIUS account name.
If the user tries to run commands that are not allowed, the following error displays:
myuser@switch:~$ net add hostname host01
ERROR: User username does not have permission to make networking changes.
Edit the netd.conf File
Instead of using the NCLU commands described above, you can manually configure users and groups to be able to run NCLU commands.
Edit the /etc/netd.conf file to add users to the users_with_edit and users_with_show lines in the file, then save the file.
For example, if you want the user netoperator to be able to run both edit and show commands, add the user to the users_with_edit and users_with_show lines in the /etc/netd.conf file:
cumulus@switch:~$ sudo nano /etc/netd.conf
# Control which users/groups are allowed to run 'add', 'del',
# 'clear', 'net abort', 'net commit' and restart services
# to apply those changes
users_with_edit = root, cumulus, netoperator
groups_with_edit = netedit
# Control which users/groups are allowed to run 'show' commands
users_with_show = root, cumulus, netoperator
groups_with_show = netshow, netedit
To configure a new user group to use NCLU, add that group to the groups_with_edit and groups_with_show lines in the file.
Use caution giving edit permissions to groups. For example, do not give edit permissions to the tacacs group.
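To illustrate how the four lines in the file combine, here is a small Python sketch. It is not how netd itself evaluates permissions. The parsing rules, the function names, and the rule that a user is permitted when either the user name or one of the user's groups is listed are simplifying assumptions for illustration.

```python
def parse_netd_conf(text):
    """Collect the comma-separated values of each 'key = value' line."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = {v.strip() for v in value.split(",") if v.strip()}
    return settings


def allowed(settings, action, user, groups=()):
    """action is 'edit' or 'show'; groups are the user's Linux groups."""
    users = settings.get("users_with_" + action, set())
    permitted_groups = settings.get("groups_with_" + action, set())
    return user in users or bool(permitted_groups & set(groups))


conf = """
users_with_edit = root, cumulus, netoperator
groups_with_edit = netedit
users_with_show = root, cumulus, netoperator
groups_with_show = netshow, netedit
"""
settings = parse_netd_conf(conf)
print(allowed(settings, "edit", "netoperator"))
print(allowed(settings, "edit", "myuser", ["netshow"]))
print(allowed(settings, "show", "myuser", ["netshow"]))
```

In this model, netoperator can edit because the name is listed directly, while myuser in the netshow group can run show commands but not edit commands.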
Restart the netd Service
Whenever you modify netd.conf or when NSS services change, you must restart the netd service for the changes to take effect.
You can easily back up your NCLU configuration to a file by outputting the results of net show configuration commands to a file, then retrieving the contents of the file using the source command. You can then view the configuration at any time or copy it to other switches and use the source command to apply that configuration to those switches.
For example, to copy the configuration of a leaf switch called leaf01, run the following command:
cumulus@leaf01:~$ net show configuration commands >> leaf01.txt
With the commands all stored in a single file, you can now copy this file to another ToR switch in your network called leaf02 and apply the configuration by running:
cumulus@leaf02:~$ source leaf01.txt
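Because source runs every line of the file as a shell command, it can be worth checking the file before you apply it to another switch. The Python sketch below is an illustration under stated assumptions, not an NCLU feature: it treats blank lines, lines starting with #, and lines beginning with net as expected, and reports anything else. The function name and the sample file contents are made up for the example.

```python
def unexpected_lines(text):
    """Return (line_number, line) pairs that are not net commands or comments."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if stripped == "net" or stripped.startswith("net "):
            continue
        problems.append((number, line))
    return problems


# Hypothetical file contents; the third line should be flagged.
saved = """net add hostname leaf01
net add interface swp1 bridge access 10
rm -rf /tmp/example
net commit
"""
print(unexpected_lines(saved))
```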
Advanced Configuration
NCLU needs no initial configuration; however, if you need to change its behavior, you must manually update the /etc/netd.conf file. You can configure this file to allow different permission levels for users to edit configurations and run show commands. The file also contains a blacklist that hides less frequently used terms from the tabbed autocomplete.
After you edit the netd.conf file, restart the netd service for the changes to take effect.
The blacklist hides corner-case command options from tab completion to simplify and streamline the output.
net provides an environment variable to set where the net output is directed. To only use stdout, set the NCLU_TAB_STDOUT environment variable to true. The value is not case sensitive.
Considerations
Unsupported Interface Names
NCLU does not support interfaces named dev.
Bonds With No Configured Members
If a bond interface is configured but contains no members, NCLU reports that the interface does not exist.
Large NCLU Inputs
The system must parse each NCLU command. Large inputs, such as a large paste of NCLU commands, can take some time to process, sometimes minutes.
Cumulus User Experience - CUE
Cumulus User Experience (CUE) is an early access feature currently in ALPHA and open to customer feedback.
CUE is not currently intended to run in production and is not supported through NVIDIA networking support.
Your evaluation is welcome and appreciated as we start to roll out this new Cumulus Linux CLI. You can provide feedback by sending an email to net-cl-cue-ea-feedback@nvidia.com.
What is CUE?
CUE is an object-oriented, schema-driven model of a complete Cumulus Linux system (hardware and software). It provides a robust API that allows multiple interfaces to both view (show) and configure (set and unset) any element within a system running the CUE software. The CUE CLI and the REST API leverage the same API to interface with Cumulus Linux.
CUE follows a declarative model, removing context-specific commands and settings. It is structured as a big tree that represents the entire state of a Cumulus Linux instance. At the base of the tree are high level branches representing objects, such as router and interface. Under each of these branches are further branches. As you navigate through the tree, you gain a more specific context. At the leaves of the tree are actual attributes, represented as key/value pairs. The path through the tree is similar to a filesystem path.
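The tree described above can be pictured with nested dictionaries. The Python sketch below is only a mental model, not the CUE API: the attribute values are taken from the BGP and interface examples in this guide, while the dictionary layout, the use of / as a path separator, and the function names lookup and leaves are assumptions made for illustration.

```python
# A tiny stand-in for the CUE tree: branches are dicts, leaves are values.
tree = {
    "router": {
        "bgp": {
            "autonomous-system": 65101,
            "router-id": "10.10.10.1",
            "graceful-restart": {"mode": "helper-only", "restart-time": 120},
        }
    },
    "interface": {"swp1": {"link": {"mtu": 9216}}},
}


def lookup(node, path):
    """Walk a filesystem-style path such as 'router/bgp/router-id'."""
    for part in path.strip("/").split("/"):
        node = node[part]  # raises KeyError if the branch does not exist
    return node


def leaves(node, prefix=""):
    """Yield (path, value) for every attribute at the leaves of the tree."""
    for key, value in node.items():
        path = prefix + "/" + key
        if isinstance(value, dict):
            yield from leaves(value, path)
        else:
            yield path, value


print(lookup(tree, "router/bgp/router-id"))
for path, value in leaves(tree):
    print(path, "=", value)
```

Each step along the path narrows the context, and only the final step returns a key/value attribute.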
In this ALPHA release of CUE, you have full access to the new CLI, which leverages the underlying CUE API. Future releases will provide access to the API through REST, Python and more.
This documentation describes how to access CUE and navigate the CUE CLI to configure and monitor Cumulus Linux.
Install CUE
CUE is not installed by default on Cumulus Linux. To install CUE, follow the procedure below.
Log out of the switch, then log back in to get the CUE CLI prompt.
Command Line Basics
The CUE command line has a flat structure as opposed to a modal structure. This means that you can run all commands from the primary prompt instead of only in a specific mode.
Command Syntax
CUE commands all begin with cl and fall into one of three syntax categories:
Configuration (cl set and cl unset)
Monitoring (cl show)
Configuration management (cl config)
Command Completion
As you enter commands, you can get help with the valid keywords or options using the Tab key. For example, using Tab completion with cl set displays the possible objects for the command, and returns you to the command prompt to complete the command.
cumulus@switch:~$ cl set <<press Tab>>
bridge interface nve router vrf
evpn mlag platform system
cumulus@switch:~$ cl set
Command Help
As you enter commands, you can get help with command syntax by entering -h or --help at various points within a command entry. For example, to find out what options are available for cl set interface, enter cl set interface -h or cl set interface --help.
cumulus@switch:~$ cl set interface -h
Usage:
cl set interface <interface-id> ...
Description:
Interfaces
Identifiers:
<interface-id> Interface
General Options:
-h, --help Show help.
Command List
You can list all the CUE commands by running cl list-commands. See List All CUE Commands below.
Command History
At the command prompt, press the Up Arrow and Down Arrow keys to move back and forth through the list of commands previously entered. When you find a given command, you can run the command by pressing Enter. Optionally, you can modify the command before you run it.
Command Categories
The CUE CLI has a flat structure; however, the commands are conceptually grouped into three functional categories:
Configuration
Monitoring
Configuration Management
Configuration Commands
The CUE configuration commands modify switch configuration. You can set and unset configuration options.
The cl set and cl unset commands are grouped into the following categories. Each command group includes arguments. Use command completion (Tab key) to list the subcommands.
Command Group
Description
cl set router cl unset router
Configures router policies, such as prefix list rules and route maps, and global BGP options. This is where you enable and disable BGP, set the ASN and the router ID, and configure BGP graceful restart and shutdown.
cl set platform cl unset platform
Configures hostname options, such as the static hostname for the switch, the local domain, and whether DHCP is allowed to override the hostname. You can also set how configuration apply operations are performed (such as which files to ignore and which files to overwrite).
cl set bridge cl unset bridge
Configures a bridge domain. This is where you configure the bridge type (such as VLAN-aware), 802.1Q encapsulation, the STP state and priority, and the VLANs in the bridge domain.
cl set mlag cl unset mlag
Configures MLAG. This is where you configure the backup IP address or interface, MLAG system MAC address, peer IP address, MLAG priority, and the delay before bonds are brought up.
cl set evpn cl unset evpn
Configures EVPN. This is where you enable and disable the EVPN control plane, and set EVPN route advertise options, default gateway configuration for centralized routing, and duplicate address detection options.
cl set interface <interface-id> cl unset interface <interface-id>
Configures the switch interfaces. Use this command to configure bond interfaces, bridge interfaces, interface IP addresses, VLAN IDs, and links (MTU, FEC, speed, duplex, and so on).
cl set system cl unset system
Configures global system settings, such as NTP, DHCP servers, DNS, LLDP, and syslog.
cl set vrf <vrf-id> cl unset vrf <vrf-id>
Configures VRFs. This is where you configure VRF-level router configuration such as BGP, including BGP for the default VRF.
cl set service cl unset service
Configures DHCP relays. This is where you configure the DHCP relay server IP address, the set of interfaces on which to handle DHCP relay traffic, the DHCP relay gateway IP address on the interfaces, and the source IP address to use on the relayed packet.
cl set nve cl unset nve
Configures network virtualization (VXLAN) settings. This is where you configure the UDP port for VXLAN frames, control dynamic MAC learning over VXLAN tunnels, and configure how Cumulus Linux handles BUM traffic in the overlay.
Monitoring Commands
The CUE monitoring commands show various parts of the network configuration. For example, you can show the complete network configuration or only interface configuration. The monitoring commands are grouped into the following categories. Each command group includes subcommands. Use command completion (Tab key) to list the subcommands.
Command Group
Description
cl show router
Shows router configuration, such as router policies and global BGP configuration.
cl show platform
Shows platform configuration, such as hardware and software components, and the hostname of the switch.
cl show bridge
Shows bridge domain configuration.
cl show mlag
Shows MLAG configuration.
cl show evpn
Shows EVPN configuration.
cl show interface
Shows interface configuration.
cl show system
Shows global system settings, such as NTP, DHCP server, DNS, syslog and LLDP.
cl show service
Shows DHCP relay configuration, such as the DHCP relay server IP address, the set of interfaces on which DHCP relay traffic is handled, and the DHCP relay gateway IP address on the interfaces.
cl show vrf
Shows VRF configuration.
cl show nve
Shows network virtualization configuration, such as VXLAN-specific MLAG configuration and VXLAN flooding.
The following example shows the cl show router commands after pressing the TAB key, then shows the output of the cl show router bgp command.
cumulus@leaf01:mgmt:~$ cl show router <<press Tab>>
bgp policy
cumulus@leaf01:mgmt:~$ cl show router bgp
running applied pending description
------------------------------ ------- ----------- ----------- ----------------------------------------------------------------------
enable on Turn the feature 'on' or 'off'. The default is 'off'.
autonomous-system 65101 ASN for all VRFs, if a single AS is in use. If "none", then ASN mu...
graceful-shutdown off Graceful shutdown enable will initiate the GSHUT community to be an...
policy-update-timer 5 Wait time in seconds before processing updates to policies to ensur...
router-id 10.10.10.1 BGP router-id for all VRFs, if a common one is used. If "none", th...
convergence-wait
establish-wait-time 0 Maximum time to wait to establish BGP sessions. Any peerswhich do...
time 0 Time to wait for peers to send end-of-RIB before router performs pa...
graceful-restart
mode helper-only Role of router during graceful restart. helper-only, router is in h...
path-selection-deferral-time 360 Used by the restarter as an upper-bounds for waiting for peering es...
restart-time 120 Amount of time taken to restart by router. It is advertised to the...
stale-routes-time 360 Specifies an upper-bounds on how long we retain routes from a ....
If there are no pending or applied configuration changes, the cl show command only shows the running configuration.
Revision options are available for the cl show commands. You can choose the configuration you want to show (pending, applied, startup, or running):
Option
Description
--rev <revision>
Shows a detached pending configuration. See the cl config detach configuration management command below.
--pending
Shows the configuration you set and unset but have not yet applied or saved.
--applied
Shows the last set of commands applied with the cl config apply command.
--startup
Shows the set of commands saved with the cl config save command. This will be the configuration after the switch boots.
--running
Shows the running configuration (the actual system state). The running and applied configuration should be the same. If different, inspect the logs.
The following example shows pending BGP graceful restart configuration:
cumulus@switch:~$ cl show router bgp graceful-restart --pending
pending_20210128_212626_4WSY description
---------------------------- ---------------------------- ----------------------------------------------------------------------
mode helper-only Role of router during graceful restart. helper-only, router is in h...
path-selection-deferral-time 360 Used by the restarter as an upper-bounds for waiting for peeringes...
restart-time 120 Amount of time taken to restart by router. It is advertised to the...
stale-routes-time 360 Specifies an upper-bounds on how long we retain routes from a resta...
Configuration Management Commands
The CUE configuration management commands manage and apply configurations.
Command
Description
cl config apply
Applies the pending configuration to become the applied configuration. You can also use these prompt options:
--y or --assume-yes to automatically reply yes to all prompts.
--assume-no to automatically reply no to all prompts.
The configuration is applied but not saved and does not persist after a reboot.
cl config detach
Detaches the configuration from the current pending configuration. The detached configuration is called pending and includes a timestamp with extra characters. For example: pending_20210128_212626_4WSY
cl config diff <revision> <revision>
Shows differences between configurations, such as the pending configuration and the applied configuration or the detached configuration and the pending configuration.
cl config patch <cue-file>
Updates the pending configuration with the specified YAML configuration file.
cl config replace <cue-file>
Replaces the pending configuration with the specified YAML configuration file.
cl config save
Overwrites the startup configuration with the applied configuration by writing to the /etc/cue.d/startup.yaml file. The configuration persists after a reboot.
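The difference between cl config patch and cl config replace can be pictured with nested dictionaries. In this Python sketch, which is a model and not the CUE implementation, patch merges the contents of the file into the pending configuration, while replace discards the pending configuration in favor of the file. The deep-merge behavior, the function names, and the sample data are assumptions.

```python
import copy


def patch(pending, update):
    """Merge update into a copy of pending, descending into nested dicts."""
    merged = copy.deepcopy(pending)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = patch(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def replace(pending, update):
    """Ignore pending entirely and use the file contents."""
    return copy.deepcopy(update)


pending = {"platform": {"hostname": "leaf01"}, "system": {"dns": ["192.168.200.1"]}}
from_file = {"platform": {"hostname": "leaf02"}}

print(patch(pending, from_file))
print(replace(pending, from_file))
```

After the patch, the DNS setting from the pending configuration is still present; after the replace, only the settings from the file remain.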
List All CUE Commands
To show the full list of CUE commands, run cl list-commands. For example:
cumulus@switch:~$ cl list-commands
...
cl show interface <interface-id> link lldp neighbor
cl show interface <interface-id> link lldp neighbor <neighbor-id>
cl show interface <interface-id> link lldp neighbor <neighbor-id> bridge
cl show interface <interface-id> link lldp neighbor <neighbor-id> bridge vlan
cl show interface <interface-id> link lldp neighbor <neighbor-id> bridge vlan <vid>
cl show interface <interface-id> link stats
cl show system
cl show system global
cl show system ntp
cl show system ntp server
cl show system ntp server <server-id>
cl show system ntp pool
cl show system ntp pool <server-id>
cl show system dhcp-server
...
You can show the list of commands for a command grouping and for subcommands. For example, to show the list of interface commands:
cumulus@switch:~$ cl list-commands interface
cl show interface
cl show interface <interface-id>
cl show interface <interface-id> bond
cl show interface <interface-id> bond member
cl show interface <interface-id> bond member <member-id>
cl show interface <interface-id> bond mlag
cl show interface <interface-id> bridge
cl show interface <interface-id> bridge domain
cl show interface <interface-id> bridge domain <domain-id>
cl show interface <interface-id> bridge domain <domain-id> stp
cl show interface <interface-id> bridge domain <domain-id> vlan
cl show interface <interface-id> bridge domain <domain-id> vlan <vid>
cl show interface <interface-id> ip
...
Use the Tab key to get help for the command lists you want to see. For example, to show the list of command options available for the interface swp1, run:
cumulus@switch:~$ cl list-commands interface swp1 <<press Tab>>
bond bridge ip link
cumulus@switch:~$ cl list-commands interface swp1 bond
cl show interface <interface-id> bond
cl show interface <interface-id> bond member
cl show interface <interface-id> bond member <member-id>
cl show interface <interface-id> bond mlag
cl set interface <interface-id> bond
cl set interface <interface-id> bond member <member-id>
cl set interface <interface-id> bond mlag
cl set interface <interface-id> bond mlag id (1-65535|auto)
cl set interface <interface-id> bond down-delay 0-65535
cl set interface <interface-id> bond lacp-bypass (on|off)
cl set interface <interface-id> bond lacp-rate (fast|slow)
cl set interface <interface-id> bond mode (lacp|static)
cl set interface <interface-id> bond up-delay 0-65535
cl unset interface <interface-id> bond
cl unset interface <interface-id> bond member
cl unset interface <interface-id> bond member <member-id>
cl unset interface <interface-id> bond mlag
cl unset interface <interface-id> bond mlag id
cl unset interface <interface-id> bond down-delay
cl unset interface <interface-id> bond lacp-bypass
cl unset interface <interface-id> bond lacp-rate
cl unset interface <interface-id> bond mode
cl unset interface <interface-id> bond up-delay
Example Configuration Commands
This section provides examples of how to configure a Cumulus Linux switch using CUE commands.
Configure the System Hostname
The example below shows the CUE commands required to change the hostname for the switch to leaf01:
cumulus@switch:~$ cl set platform hostname value leaf01
cumulus@switch:~$ cl config apply
Configure the System DNS Server
The example below shows the CUE commands required to define the DNS server for the switch:
cumulus@switch:~$ cl set system dns server 192.168.200.1
cumulus@switch:~$ cl config apply
Configure an Interface
The example below shows the CUE commands required to bring up swp1.
cumulus@switch:~$ cl set interface swp1 link state up
cumulus@switch:~$ cl config apply
Configure a Bond
The example below shows the CUE commands required to configure the front panel port interfaces swp1 through swp4 to be slaves in bond0.
cumulus@switch:~$ cl set interface bond0 bond member swp1-4
cumulus@switch:~$ cl config apply
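The commands in this guide accept interface ranges such as swp1-4, bond1-2, and swp49-50. The Python sketch below shows one way to expand such a range into individual interface names, for example when generating commands from a script. It is not taken from CUE: the function name and the rule that a range is a name prefix followed by two numbers separated by a hyphen are assumptions.

```python
import re


def expand_range(name):
    """Expand 'swp1-4' into ['swp1', 'swp2', 'swp3', 'swp4']."""
    match = re.fullmatch(r"([A-Za-z_]+)(\d+)-(\d+)", name)
    if match is None:
        return [name]  # not a range, return it unchanged
    prefix, first, last = match.group(1), int(match.group(2)), int(match.group(3))
    if first > last:
        raise ValueError("range start is greater than range end: " + name)
    return [prefix + str(n) for n in range(first, last + 1)]


print(expand_range("swp1-4"))
print(expand_range("bond1-2"))
print(expand_range("peerlink"))
```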
Configure a Bridge
The example below shows the CUE commands required to create a VLAN-aware bridge that contains two switch ports (swp1 and swp2) and includes three VLANs: tagged VLANs 10 and 20, and untagged (native) VLAN 1.
With CUE, there is a default bridge called br_default, which has no ports assigned to it. The example below configures this default bridge.
cumulus@switch:~$ cl set interface swp1-2 bridge domain br_default
cumulus@switch:~$ cl set bridge domain br_default vlan 10,20
cumulus@switch:~$ cl set bridge domain br_default untagged 1
cumulus@switch:~$ cl config apply
Configure MLAG
The example below shows the CUE commands required to configure MLAG on leaf01. The commands:
Place swp1 into bond1 and swp2 into bond2.
Configure the MLAG ID to 1 for bond1 and to 2 for bond2.
Add bond1 and bond2 to the default bridge (br_default).
Create the inter-chassis bond (swp49 and swp50) and the peer link (peerlink).
Set the peer link IP address to linklocal, the MLAG system MAC address to 44:38:39:BE:EF:AA, and the backup IP address to 10.10.10.2.
cumulus@leaf01:~$ cl set interface bond1 bond member swp1
cumulus@leaf01:~$ cl set interface bond2 bond member swp2
cumulus@leaf01:~$ cl set interface bond1 bond mlag id 1
cumulus@leaf01:~$ cl set interface bond2 bond mlag id 2
cumulus@leaf01:~$ cl set interface bond1-2 bridge domain br_default
cumulus@leaf01:~$ cl set interface peerlink bond member swp49-50
cumulus@leaf01:~$ cl set mlag mac-address 44:38:39:BE:EF:AA
cumulus@leaf01:~$ cl set mlag backup 10.10.10.2
cumulus@leaf01:~$ cl set mlag peer-ip linklocal
cumulus@leaf01:~$ cl config apply
Configure BGP Unnumbered
The example below shows the CUE commands required to configure BGP unnumbered on leaf01. The commands:
Assign the ASN for this BGP node to 65101.
Set the router ID to 10.10.10.1.
Distribute routing information to the peer on swp51.
Originate the prefix 10.10.10.1/32 from this BGP node.
cumulus@leaf01:~$ cl set router bgp autonomous-system 65101
cumulus@leaf01:~$ cl set router bgp router-id 10.10.10.1
cumulus@leaf01:~$ cl set vrf default router bgp peer swp51 remote-as external
cumulus@leaf01:~$ cl set vrf default router bgp address-family ipv4-unicast static-network 10.10.10.1/32
cumulus@leaf01:~$ cl config apply
Example Monitoring Commands
This section provides monitoring command examples.
Show Installed Software
The following example command lists the software installed on the switch:
The following example command shows the running and applied swp1 interface configuration. There is no pending configuration.
cumulus@leaf01:~$ cl show interface swp1
running applied description
----------------------- ---------- ----------- ----------------------------------------------------------------------
type swp swp The type of interface
bridge
[domain] br_default br_default Bridge domains on this interface
[domain] bridge
ip
vrf default Virtual routing and forwarding
ipv4 forward IPv4 support on the interface. A value of 'on' means IPv4 is enable...
ipv6 forward IPv6 support on the interface. A value of 'on' means IPv6 is enable...
[address] 10.1.1.1/30 ipv4 and ipv6 address
link
auto-negotiate on Link speed and characteritic auto negotiation
breakout 1x sub-divide, aggregate, or disable ports (only valid on plug interfa...
duplex full Link duplex
fec auto Link forward error correction mechanism
mtu 9216 9216 interface mtu
speed auto Link speed
dot1x
mab off bypass MAC authentication
parking-vlan off VLAN for unauthorized MAC addresses
state down up The state of the interface
stats
carrier-transitions 3 Number of times the interface state has transitioned between up and...
in-bytes 0 total number of bytes received on the interface
in-drops 0 number of received packets dropped
in-errors 0 number of received packets with errors
in-pkts 0 total number of packets received on the interface
out-bytes 65700 total number of bytes transmitted out of the interface
out-drops 0 The number of outbound packets that were chosen to be discarded eve...
out-errors 0 The number of outbound packets that could not be transmitted becaus...
out-pkts 934 total number of packets transmitted out of the interface
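When the running and applied values for an attribute differ, that attribute is worth a closer look. The Python sketch below compares two sets of values and lists the attributes that disagree. The sample values are copied from the output above, reading the first value on each line as running and the second as applied, which is an assumption about the flattened columns. The function name and the flat dictionary layout are also illustrative.

```python
def mismatches(running, applied):
    """Return {attribute: (running, applied)} where both exist and differ."""
    return {
        key: (running[key], applied[key])
        for key in sorted(running.keys() & applied.keys())
        if running[key] != applied[key]
    }


running = {"type": "swp", "mtu": 9216, "state": "down"}
applied = {"type": "swp", "mtu": 9216, "state": "up"}
print(mismatches(running, applied))
```

Attributes that appear in only one of the two sets are ignored, because only one column has a value to compare.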
Example Configuration Management Commands
This section provides examples of how to use the configuration management commands to apply, save, and detach configurations.
Apply and Save a Configuration
The following example command configures the front panel port interfaces swp1 through swp4 to be slaves in bond0. At this point the configuration is only pending; it is not applied, and CUE has not yet made any changes to the running configuration.
cumulus@switch:~$ cl set interface bond0 bond member swp1-4
To apply the pending configuration to the running configuration, run the cl config apply command. The configuration does not persist after a reboot.
cumulus@switch:~$ cl config apply
To save the applied configuration to the startup configuration, run the cl config save command. This command overwrites the startup configuration with the applied configuration by writing to the /etc/cue.d/startup.yaml file. The configuration persists after a reboot.
cumulus@switch:~$ cl config save
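The relationship between the pending, applied, and startup configurations can be summarized in a toy model. The Python sketch below is not how CUE is implemented. The class and method names are hypothetical, and the behavior of the pending configuration across a reboot is an assumption. It only shows why a configuration that is applied but not saved does not survive a reboot.

```python
import copy


class Revisions:
    """Toy model of the pending, applied, and startup configurations."""

    def __init__(self):
        self.pending = {}
        self.applied = {}
        self.startup = {}

    def set(self, key, value):
        self.pending[key] = value  # staged only; nothing is applied yet

    def apply(self):
        self.applied = copy.deepcopy(self.pending)

    def save(self):
        self.startup = copy.deepcopy(self.applied)

    def reboot(self):
        # After a reboot the switch comes up with the startup configuration.
        # Resetting pending as well is an assumption of this model.
        self.applied = copy.deepcopy(self.startup)
        self.pending = copy.deepcopy(self.startup)


switch = Revisions()
switch.set("interface bond0 bond member", "swp1-4")
switch.apply()
switch.reboot()
print(switch.applied)  # applied but never saved, so the bond is gone

switch.set("interface bond0 bond member", "swp1-4")
switch.apply()
switch.save()
switch.reboot()
print(switch.applied)  # saved before the reboot, so the bond is still there
```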
Detach a Pending Configuration
The following example configures the IP address of the loopback interface, then detaches the configuration from the current pending configuration. The detached configuration is saved to a file called pending that includes a timestamp with extra characters to distinguish it from other pending configurations; for example, pending_20210128_212626_4WSY.
cumulus@switch:~$ cl set interface lo ip address 10.10.10.1
cumulus@switch:~$ cl config detach
View Differences between Configurations
To view differences between configurations, run the cl config diff command.
To view differences between two detached pending configurations, type cl config diff and press the Tab key to list all the current detached pending configurations, then run the cl config diff command with the pending configurations you want to diff:
The following example replaces the pending configuration with the contents of the YAML configuration file called cl-02/13/2021.yaml located in the /deps directory:
The following example patches the pending configuration (runs the set or unset commands from the configuration in the cl-02/13/2021.yaml file located in the /deps directory):
This section lists some of the differences between CUE and the NCLU command line interface to help you navigate configuration.
Configuration File
When you save network configuration using CUE, the configuration is written to the /etc/cue.d/startup.yaml file.
CUE also writes to underlying Linux files when you apply a configuration, such as the /etc/network/interfaces and /etc/frr/frr.conf files. You can view these configuration files; however, NVIDIA recommends that you do not manually edit them while using CUE.
Bridge Configuration
You set global bridge configuration on the bridge domain. For example:
cumulus@leaf01:~$ cl set bridge domain br_default vlan 10,20
However, you set specific bridge interface options with interface commands. For example:
cumulus@leaf01:~$ cl set interface swp1 bridge domain br_default learning on
The default VLAN-aware bridge in CUE is br_default. The default VLAN-aware bridge in NCLU is bridge.
BGP Configuration
You can set global BGP configuration, such as the ASN, router ID, graceful shutdown, and graceful restart, with the cl set router bgp command. For example:
cumulus@leaf01:~$ cl set router bgp autonomous-system 65101
However, BGP peer and peer group, route information, timer, and address family configuration requires a VRF. For example:
The switch contains a battery-backed hardware clock that maintains the time while the switch is powered off and in between reboots. When the switch is running, the Cumulus Linux operating system maintains its own software clock.
During boot up, the time from the hardware clock is copied into the operating system’s software clock. The software clock is then used for all timekeeping responsibilities. During system shutdown, the software clock is copied back to the battery-backed hardware clock.
You can set the date and time on the software clock using the date command. First, determine your current time zone:
cumulus@switch:~$ date +%Z
If you need to reconfigure the current time zone, refer to the instructions above.
Then, to set the system clock according to the time zone configured:
cumulus@switch:~$ sudo date -s "Tue Jan 12 00:37:13 2016"
See man date(1) for more information.
You can write the current value of the system (software) clock to the hardware clock using the hwclock command:
cumulus@switch:~$ sudo hwclock -w
See man hwclock(8) for more information.
Use NTP
The ntpd daemon running on the switch implements the NTP protocol. It synchronizes the system time with time servers listed in the /etc/ntp.conf file. The ntpd daemon is started at boot by default. See man ntpd(8) for details.
If you intend to run this service within a VRF, including the management VRF, follow these steps for configuring the service.
Configure NTP Servers
The default NTP configuration comprises the following servers, which are listed in the /etc/ntp.conf file:
server 0.cumulusnetworks.pool.ntp.org iburst
server 1.cumulusnetworks.pool.ntp.org iburst
server 2.cumulusnetworks.pool.ntp.org iburst
server 3.cumulusnetworks.pool.ntp.org iburst
To add the NTP server or servers you want to use:
Run the following commands. Include the iburst option to increase the sync speed.
cumulus@switch:~$ net add time ntp server 4.cumulusnetworks.pool.ntp.org iburst
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
These commands add the NTP server to the list of servers in the /etc/ntp.conf file:
# pool.ntp.org maps to about 1000 low-stratum NTP servers. Your server will
# pick a different set every time it starts up. Please consider joining the
# pool: <http://www.pool.ntp.org/join.html>
server 0.cumulusnetworks.pool.ntp.org iburst
server 1.cumulusnetworks.pool.ntp.org iburst
server 2.cumulusnetworks.pool.ntp.org iburst
server 3.cumulusnetworks.pool.ntp.org iburst
server 4.cumulusnetworks.pool.ntp.org iburst
Edit the /etc/ntp.conf file to add or update NTP server information:
cumulus@switch:~$ sudo nano /etc/ntp.conf
# pool.ntp.org maps to about 1000 low-stratum NTP servers. Your server will
# pick a different set every time it starts up. Please consider joining the
# pool: <http://www.pool.ntp.org/join.html>
server 0.cumulusnetworks.pool.ntp.org iburst
server 1.cumulusnetworks.pool.ntp.org iburst
server 2.cumulusnetworks.pool.ntp.org iburst
server 3.cumulusnetworks.pool.ntp.org iburst
server 4.cumulusnetworks.pool.ntp.org iburst
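After either method, you can list the configured servers straight from the file as a quick check. The following is a minimal sketch and not part of Cumulus Linux; the function name list_ntp_servers is hypothetical, and the sketch assumes one server directive per line, as in the examples above.

```shell
# Print the host of every active "server" line in an ntp.conf-style file.
# Commented-out entries such as "#server ..." are skipped because their
# first field is not exactly "server".
# list_ntp_servers is a hypothetical helper name, not a Cumulus Linux command.
list_ntp_servers() {
    conf="${1:-/etc/ntp.conf}"
    awk '$1 == "server" { print $2 }' "$conf"
}
```

Calling list_ntp_servers with no argument reads /etc/ntp.conf; pass a different path to inspect another file.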
To set the initial date and time with NTP before starting the ntpd daemon, run the ntpd -q command. This command performs the same one-time synchronization as ntpdate, which is being retired and will no longer be available.
Be aware that ntpd -q can hang if the time servers are not reachable.
To check the NTP peers that the switch is using, run the net show time ntp servers command:
cumulus@switch:~$ net show time ntp servers
remote refid st t when poll reach delay offset jitter
==============================================================================
+minime.fdf.net 58.180.158.150 3 u 140 1024 377 55.659 0.339 1.464
+69.195.159.158 128.138.140.44 2 u 259 1024 377 41.587 1.011 1.677
*chl.la 216.218.192.202 2 u 210 1024 377 4.008 1.277 1.628
+vps3.drown.org 17.253.2.125 2 u 743 1024 377 39.319 -0.316 1.384
Run the ntpq -p command:
cumulus@switch:~$ ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
+ec2-34-225-6-20 129.6.15.30 2 u 73 1024 377 70.414 -2.414 4.110
+lax1.m-d.net 132.163.96.1 2 u 69 1024 377 11.676 0.155 2.736
*69.195.159.158 199.102.46.72 2 u 133 1024 377 48.047 -0.457 1.856
-2.time.dbsinet. 198.60.22.240 2 u 1057 1024 377 63.973 2.182 2.692
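In this output, the row prefixed with an asterisk (*) is the peer that ntpd selected for synchronization. The following sketch extracts that peer from ntpq -p style text on standard input; the function name ntp_sys_peer is hypothetical.

```shell
# Print the remote name of the peer that ntpd selected for synchronization.
# In ntpq -p output, that row is prefixed with an asterisk (*).
# Reads ntpq -p style text on standard input; prints nothing if no peer
# has been selected yet.
# ntp_sys_peer is a hypothetical helper name.
ntp_sys_peer() {
    awk '/^\*/ { print substr($1, 2) }'
}
```

A typical invocation would be ntpq -p | ntp_sys_peer. Empty output suggests that the switch has not yet synchronized with any server.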
To remove one or more NTP servers:
Run the net del time ntp server <server> command. The following example commands remove the default NTP servers.
cumulus@switch:~$ net del time ntp server 0.cumulusnetworks.pool.ntp.org
cumulus@switch:~$ net del time ntp server 1.cumulusnetworks.pool.ntp.org
cumulus@switch:~$ net del time ntp server 2.cumulusnetworks.pool.ntp.org
cumulus@switch:~$ net del time ntp server 3.cumulusnetworks.pool.ntp.org
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Edit the /etc/ntp.conf file to delete the NTP servers.
cumulus@switch:~$ sudo nano /etc/ntp.conf
...
# pool.ntp.org maps to about 1000 low-stratum NTP servers. Your server will
# pick a different set every time it starts up. Please consider joining the
# pool: <http://www.pool.ntp.org/join.html>
server 4.cumulusnetworks.pool.ntp.org iburst
...
Specify the NTP Source Interface
By default, the source interface that NTP uses is eth0. To change the source interface:
Run the net add time ntp source <interface> command. The following command example changes the NTP source interface to swp10.
cumulus@switch:~$ net add time ntp source swp10
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
These commands create the following configuration snippet in the ntp.conf file:
Edit the /etc/ntp.conf file and modify the entry under the # Specify interfaces comment. The following example shows that the NTP source interface is swp10.
You can use DHCP to specify your NTP servers. Ensure that the DHCP-generated configuration file named /run/ntp.conf.dhcp exists. This file is generated by the /etc/dhcp/dhclient-exit-hooks.d/ntp script and is a copy of the default /etc/ntp.conf with a modified server list from the DHCP server. If this file does not exist and you plan on using DHCP in the future, you can copy your current /etc/ntp.conf file to the location of the DHCP file.
To use DHCP to specify your NTP servers, run the sudo -E systemctl edit ntp.service command and add the ExecStart= line:
The sudo -E systemctl edit ntp.service command always updates the base ntp.service even if ntp@mgmt.service is used. The ntp@mgmt.service is re-generated automatically.
To validate your configuration, run these commands:
If the state is not Active, or the alternate configuration file does not appear in the ntp command line, the configuration most likely contains a mistake. Correct the mistake, then rerun the three commands above to verify.
When you use the above procedure to specify your NTP servers, the NCLU commands for changing NTP settings do not take effect.
Configure NTP with Authorization Keys
For added security, you can configure NTP to use authorization keys.
Configure the NTP Server
Create a .keys file, such as /etc/ntp.keys. Specify a key identifier (a number from 1 to 65535), an encryption method (M for MD5), and the password. For example:
#
# PLEASE DO NOT USE THE DEFAULT VALUES HERE.
#
#65535 M akey
#1 M pass
1 M CumulusLinux!
In the /etc/ntp/ntp.conf file, add a pointer to the /etc/ntp.keys file you created above and specify the key identifier. For example:
Restart NTP with the sudo systemctl restart ntp command.
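Before restarting NTP, it can help to confirm that every active line in the keys file follows the three-field layout described above. The following is a minimal sketch under that assumption. The function name check_ntp_keys is hypothetical, and the sketch accepts only the M key type shown in this section.

```shell
# Check each active line of an ntp.keys-style file for the three fields
# described above: key ID (1-65535), type M (MD5), and password.
# Comment lines and blank lines are ignored.
# Prints offending lines and returns non-zero if any are found.
# check_ntp_keys is a hypothetical helper name.
check_ntp_keys() {
    keys="${1:-/etc/ntp.keys}"
    awk '
        $1 ~ /^#/ || NF == 0 { next }
        NF != 3 || $1 !~ /^[0-9]+$/ || $1 < 1 || $1 > 65535 || $2 != "M" {
            print "invalid key line " NR ": " $0
            bad = 1
        }
        END { exit bad }
    ' "$keys"
}
```

Because the keys file holds passwords, run the check as a user that is allowed to read it.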
Configure the NTP Client
The NTP client is the Cumulus Linux switch.
Create the same .keys file you created on the NTP server (/etc/ntp.keys). For example:
cumulus@switch:~$ sudo nano /etc/ntp.keys
#
# PLEASE DO NOT USE THE DEFAULT VALUES HERE.
#
#65535 M akey
#1 M pass
1 M CumulusLinux!
Edit the /etc/ntp.conf file to specify the server you want to use, the key identifier, and a pointer to the /etc/ntp.keys file you created in step 1. For example:
cumulus@switch:~$ sudo nano /etc/ntp.conf
...
# You do need to talk to an NTP server or two (or three).
#pool ntp.your-provider.example
# OR
#server ntp.your-provider.example
# pool.ntp.org maps to about 1000 low-stratum NTP servers. Your server will
# pick a different set every time it starts up. Please consider joining the
# pool: <http://www.pool.ntp.org/join.html>
#server 0.cumulusnetworks.pool.ntp.org iburst
#server 1.cumulusnetworks.pool.ntp.org iburst
#server 2.cumulusnetworks.pool.ntp.org iburst
#server 3.cumulusnetworks.pool.ntp.org iburst
server 10.50.23.121 key 1
#keys
keys /etc/ntp.keys
trustedkey 1
controlkey 1
requestkey 1
...
Restart NTP in the active VRF (default or management). For example:
Wait a few minutes, then run the ntpq -c as command to verify the configuration:
cumulus@switch:~$ ntpq -c as
ind assid status conf reach auth condition last_event cnt
===========================================================
1 40828 f014 yes yes ok reject reachable 1
After authorization is accepted, you see the following command output:
cumulus@switch:~$ ntpq -c as
ind assid status conf reach auth condition last_event cnt
===========================================================
1 40828 f61a yes yes ok sys.peer sys_peer 1
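The difference between the two outputs is the condition column, which changes from reject to sys.peer while the auth column reads ok. A script that waits for this state could use a check like the following sketch. The function name ntp_auth_synced is hypothetical, and the column positions are assumed to match the output shown above.

```shell
# Succeed if any association in "ntpq -c as" output has passed authentication
# (auth column "ok") and been selected as the system peer (condition
# column "sys.peer"). Reads the command output on standard input.
# Columns: ind assid status conf reach auth condition last_event cnt
# ntp_auth_synced is a hypothetical helper name.
ntp_auth_synced() {
    awk '$6 == "ok" && $7 == "sys.peer" { found = 1 } END { exit !found }'
}
```

For example, ntpq -c as | ntp_auth_synced returns 0 only after the authenticated server is selected.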
Precision Time Protocol (PTP) Boundary Clock
With the growth of low-latency and high-performance applications, precision timing has become increasingly important. Precision Time Protocol (PTP) is used to synchronize clocks in a network and is capable of sub-microsecond accuracy. The clocks are organized in a master-slave hierarchy. The slaves are synchronized to their masters, which can be slaves to their own masters. The hierarchy is created and updated automatically by the best master clock (BMC) algorithm, which runs on every clock. The grandmaster clock is the top-level master and is typically synchronized by using a Global Positioning System (GPS) time source to provide a high degree of accuracy.
A boundary clock has multiple ports: one or more master ports and one or more slave ports. The master ports provide time (the time can originate from other masters further up the hierarchy) and the slave ports receive time. The boundary clock absorbs sync messages in the slave port, uses that port to set its clock, then generates new sync messages from this clock out of all of its master ports.
Cumulus Linux includes the linuxptp package for PTP, which uses the phc2sys daemon to synchronize the PTP clock with the system clock.
Cumulus Linux currently supports PTP on the Mellanox Spectrum ASIC only.
PTP is supported in boundary clock mode only (the switch provides timing to downstream servers; it is a slave to a higher-level clock and a master to downstream clocks).
The switch uses hardware time stamping to capture timestamps from an Ethernet frame at the physical layer. This allows PTP to account for delays in message transfer and greatly improves the accuracy of time synchronization.
Only IPv4/UDP PTP packets are supported.
Only a single PTP domain per network is supported. A PTP domain is a network or a portion of a network within which all the clocks are synchronized.
In the following example, boundary clock 2 receives time from Master 1 (the grandmaster) on a PTP slave port, sets its clock and passes the time down from the PTP master port to boundary clock 1. Boundary clock 1 receives the time on a PTP slave port, sets its clock and passes the time down the hierarchy through the PTP master ports to the hosts that receive the time.
Enable the PTP Boundary Clock on the Switch
To enable the PTP boundary clock on the switch:
Open the /etc/cumulus/switchd.conf file in a text editor and add the following line:
Restarting the switchd service causes all network ports to reset, interrupting network services, in addition to resetting the switch hardware configuration.
Configure the PTP Boundary Clock
To configure a boundary clock:
Configure the interfaces on the switch that you want to use for PTP. Each interface must be configured as a layer 3 routed interface with an IP address.
PTP is supported on BGP unnumbered interfaces.
PTP is not supported on switched virtual interfaces (SVIs).
cumulus@switch:~$ net add interface swp13s0 ip address 10.0.0.9/32
cumulus@switch:~$ net add interface swp13s1 ip address 10.0.0.10/32
Configure PTP options on the switch:
Set the gm-capable option to no to configure the switch to be a boundary clock.
Set the priority, which the best master clock algorithm uses to select the master. You can set priority1 and priority2, each to a number between 0 and 255. The default priority is 255. For the boundary clock, use a number above 128. Lower numbers take precedence.
Add the time-stamping parameter. The switch automatically enables hardware time-stamping to capture timestamps from an Ethernet frame at the physical layer. If you are testing PTP in a virtual environment, hardware time-stamping is not available; however, the time-stamping parameter is still required.
Add the PTP master and slave interfaces. You do not specify which is a master interface and which is a slave interface; this is determined by the PTP packet received. The following commands provide an example configuration:
cumulus@switch:~$ net add ptp global gm-capable no
cumulus@switch:~$ net add ptp global priority2 254
cumulus@switch:~$ net add ptp global priority1 254
cumulus@switch:~$ net add ptp global time-stamping
cumulus@switch:~$ net add ptp interface swp13s0
cumulus@switch:~$ net add ptp interface swp13s1
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
The ptp4l man page describes all the configuration parameters.
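When you generate PTP configuration from a script, you can check the priority values against the guidance above before running the net add ptp global commands. The following is a minimal sketch; the function name check_bc_priority and its return codes are hypothetical.

```shell
# Validate a PTP priority value for a boundary clock, following the guidance
# above: the value must be an integer from 0 to 255, and a boundary clock
# should use a number above 128.
# Returns 0 if acceptable, 1 if out of range or not a number,
# and 2 if in range but not above 128.
# check_bc_priority is a hypothetical helper name.
check_bc_priority() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;    # empty or contains a non-digit
    esac
    [ "$1" -le 255 ] || return 1
    [ "$1" -gt 128 ] || return 2
    return 0
}
```

The value 254 used in the examples passes the check, whereas 100 is in range but falls below the recommended threshold for a boundary clock.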
In the following example, the boundary clock on the switch receives time from Master 1 (the grandmaster) on PTP slave port swp3s0, sets its clock and passes the time down through PTP master ports swp3s1, swp3s2, and swp3s3 to the hosts that receive the time.
The configuration for the above example is shown below. The example assumes that you have already configured the layer 3 routed interfaces (swp3s0, swp3s1, swp3s2, and swp3s3) you want to use for PTP.
cumulus@switch:~$ net add ptp global gm-capable no
cumulus@switch:~$ net add ptp global priority2 254
cumulus@switch:~$ net add ptp global priority1 254
cumulus@switch:~$ net add ptp global time-stamping
cumulus@switch:~$ net add ptp interface swp3s0
cumulus@switch:~$ net add ptp interface swp3s1
cumulus@switch:~$ net add ptp interface swp3s2
cumulus@switch:~$ net add ptp interface swp3s3
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Verify PTP Boundary Clock Configuration
To view a summary of the PTP configuration on the switch, run the net show configuration ptp command:
To view additional PTP status information, including the delta in nanoseconds from the master clock, run the sudo pmc -u -b 0 'GET TIME_STATUS_NP' command:
To delete PTP configuration, delete the PTP master and slave interfaces. The following example commands delete the PTP interfaces swp3s0, swp3s1, and swp3s2.
cumulus@switch:~$ net del ptp interface swp3s0
cumulus@switch:~$ net del ptp interface swp3s1
cumulus@switch:~$ net del ptp interface swp3s2
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Considerations
Spanning Tree and PTP
PTP frames are affected by STP filtering; events, such as an STP topology change (where ports temporarily go into the blocking state), can cause interruptions to PTP communications.
If you configure PTP on bridge ports, NVIDIA recommends that the bridge ports are spanning tree edge ports or in a bridge domain where spanning tree is disabled.
This section describes how to set up user accounts, SSH for remote access, LDAP authentication, TACACS+, and RADIUS AAA.
SSH for Remote Access
You can generate authentication keys to access a Cumulus Linux switch securely with the ssh-keygen component of the Secure Shell (SSH) protocol. Cumulus Linux uses the OpenSSH package to provide this functionality. This section describes how to generate an SSH key pair.
Generate an SSH Key Pair
To generate the SSH key pair, run the ssh-keygen command and follow the prompts:
To configure a completely password-free system, do not enter a passphrase when prompted in the following step.
cumulus@leaf01:~$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/cumulus/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/cumulus/.ssh/id_rsa.
Your public key has been saved in /home/cumulus/.ssh/id_rsa.pub.
The key fingerprint is:
5a:b4:16:a0:f9:14:6b:51:f6:f6:c0:76:1a:35:2b:bb cumulus@leaf01
The key's randomart image is:
+---[RSA 2048]----+
| +.o o |
| o * o . o |
| o + o O o |
| + . = O |
| . S o . |
| + . |
| . E |
| |
| |
+-----------------+
To copy the generated public key to the desired location, run the ssh-copy-id command and follow the prompts:
cumulus@leaf01:~$ ssh-copy-id -i /home/cumulus/.ssh/id_rsa.pub cumulus@leaf02
The authenticity of host 'leaf02 (192.168.0.11)' can't be established.
ECDSA key fingerprint is b1:ce:b7:6a:20:f4:06:3a:09:3c:d9:42:de:99:66:6e.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
cumulus@leaf02's password:
Number of key(s) added: 1
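ssh-copy-id does not work if the username on the remote switch is different from the username on the local switch. To work around this issue, use the scp command instead. Because scp overwrites any existing .ssh/authorized_keys file on the remote switch, use it only when no other keys are installed there: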
cumulus@leaf01:~$ scp .ssh/id_rsa.pub cumulus@leaf02:.ssh/authorized_keys
Enter passphrase for key '/home/cumulus/.ssh/id_rsa':
id_rsa.pub
Connect to the remote switch to confirm that the authentication keys are in place:
cumulus@leaf01:~$ ssh cumulus@leaf02
Welcome to Cumulus VX (TM)
Cumulus VX (TM) is a community supported virtual appliance designed for
experiencing, testing and prototyping the latest technology.
For any questions or technical support, visit our community site at:
http://community.cumulusnetworks.com
The registered trademark Linux (R) is used pursuant to a sublicense from LMI,
the exclusive licensee of Linus Torvalds, owner of the mark on a world-wide basis.
Last login: Thu Sep 29 16:56:54 2016
User Accounts
By default, Cumulus Linux has two user accounts: cumulus and root.
The cumulus account:
Uses the default password cumulus. You are required to change the default password when you log into Cumulus Linux for the first time.
Is a user account in the sudo group with sudo privileges.
Can log in to the system through all the usual channels, such as console and SSH.
Along with the cumulus group, has both show and edit rights for NCLU.
The root account:
Has its password disabled by default
Has the standard Linux root user access to everything on the switch
Cannot log in to the switch through SSH, telnet, FTP, and so on, because the password is disabled
You can add additional user accounts as needed. Like the cumulus account, these accounts must use sudo to execute privileged commands; be sure to include them in the sudo group. For example:
You can add and configure user accounts in Cumulus Linux with read-only or edit permissions for NCLU. For more information, see Configure User Accounts.
Enable Remote Access for the root User
The root user does not have a password and cannot log into a switch using SSH. This default account behavior is consistent with Debian. To connect to a switch using the root account, you can do one of the following:
Generate an SSH key
Set a password
Generate an SSH Key for the root Account
In a terminal on your host system (not the switch), see if a key already exists:
root@host:~# ls -al ~/.ssh/
The name of the key is similar to id_dsa.pub, id_rsa.pub, or id_ecdsa.pub.
If a key does not exist, generate a new one by first creating the RSA key pair:
root@host:~# ssh-keygen -t rsa
You are prompted to enter a file in which to save the key (/root/.ssh/id_rsa). Press Enter to use the home directory of the root user or provide a different destination.
You are prompted to enter a passphrase (empty for no passphrase). This is optional but it does provide an extra layer of security.
The public key is now located in /root/.ssh/id_rsa.pub. The private key (identification) is now located in /root/.ssh/id_rsa.
Copy the public key to the switch. SSH to the switch as the cumulus user, then run:
cumulus@switch:~$ sudo mkdir -p /root/.ssh
cumulus@switch:~$ echo <SSH public key string> | sudo tee -a /root/.ssh/authorized_keys
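Running the tee -a command more than once appends the same key again each time. If you script this step, a guarded append avoids duplicate entries. The following is a minimal sketch; the function name add_authorized_key is hypothetical, and the sketch must run as the account that owns the target file (root, in this procedure).

```shell
# Append a public key to an authorized_keys file only if that exact line is
# not already present, and apply owner-only permissions to the file and
# its directory.
# add_authorized_key is a hypothetical helper; both arguments are required.
add_authorized_key() {
    key="$1"     # full public key line, for example the contents of id_rsa.pub
    file="$2"    # for example /root/.ssh/authorized_keys
    dir="$(dirname "$file")"
    mkdir -p "$dir" && chmod 700 "$dir" || return 1
    touch "$file" && chmod 600 "$file" || return 1
    # -F fixed string, -x whole-line match, -q quiet
    grep -qxF -e "$key" "$file" || printf '%s\n' "$key" >> "$file"
}
```

Calling the function twice with the same key leaves a single entry in the file.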
Set the root User Password
Run the following command:
cumulus@switch:~$ sudo passwd root
Change the PermitRootLogin setting in the /etc/ssh/sshd_config file from without-password to yes.
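You can also script the PermitRootLogin change instead of making it in an editor. The following is a minimal sketch that rewrites an existing, uncommented PermitRootLogin line in place. The function name permit_root_login is hypothetical, and the sketch leaves the file unchanged if no such line exists.

```shell
# Set PermitRootLogin to "yes" in an sshd_config-style file by rewriting
# the existing uncommented PermitRootLogin line.
# The result is built in a temporary file and then written back, so the
# ownership and permissions of the original file are preserved.
# permit_root_login is a hypothetical helper name.
permit_root_login() {
    conf="${1:-/etc/ssh/sshd_config}"
    tmp="$(mktemp)" || return 1
    sed 's/^[[:space:]]*PermitRootLogin[[:space:]].*$/PermitRootLogin yes/' "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

Editing /etc/ssh/sshd_config requires root privileges, and the SSH service must reload its configuration before the new setting takes effect.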
By default, Cumulus Linux has two user accounts: root and cumulus. The cumulus account is a normal user and is in the group sudo.
You can add more user accounts as needed. Like the cumulus account, these accounts must use sudo to execute privileged commands.
sudo Basics
sudo allows you to execute a command as superuser or another user as specified by the security policy. See man sudo(8) for details.
The default security policy is sudoers, which is configured using /etc/sudoers. Use /etc/sudoers.d/ to add to the default sudoers policy. See man sudoers(5) for details.
Use visudo only to edit the sudoers file; do not use another editor like vi or emacs. See man visudo(8) for details.
When creating a new file in /etc/sudoers.d, use visudo -f. This option performs sanity checks before writing the file to avoid errors that prevent sudo from working.
Errors in the sudoers file can result in losing the ability to elevate privileges to root. You can fix this issue only by power cycling the switch and booting into single user mode. Before modifying sudoers, enable the root user by setting a password for the root user.
By default, users in the sudo group can use sudo to execute privileged commands. To add users to the sudo group, use the useradd(8) or usermod(8) command. To see which users belong to the sudo group, see /etc/group (man group(5)).
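The group file stores members as a comma-separated list in the fourth field of each entry. The following sketch prints the sudo group members one per line. The function name sudo_members is hypothetical, and the sketch reads only the local file, so it does not show members supplied by other sources such as LDAP.

```shell
# Print the members of the sudo group, one per line, from a group(5) file.
# Field 4 of each entry is a comma-separated member list.
# sudo_members is a hypothetical helper name.
sudo_members() {
    groupfile="${1:-/etc/group}"
    awk -F: '$1 == "sudo" && $4 != "" {
        n = split($4, m, ",")
        for (i = 1; i <= n; i++) print m[i]
    }' "$groupfile"
}
```

With no argument, the function reads /etc/group.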
You can run any command as sudo, including su. A password is required.
The example below shows how to use sudo as a non-privileged user cumulus to bring up an interface:
cumulus@switch:~$ ip link show dev swp1
3: swp1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master br0 state DOWN mode DEFAULT qlen 500
link/ether 44:38:39:00:27:9f brd ff:ff:ff:ff:ff:ff
cumulus@switch:~$ ip link set dev swp1 up
RTNETLINK answers: Operation not permitted
cumulus@switch:~$ sudo ip link set dev swp1 up
Password:
cumulus@switch:~$ ip link show dev swp1
3: swp1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br0 state UP mode DEFAULT qlen 500
link/ether 44:38:39:00:27:9f brd ff:ff:ff:ff:ff:ff
sudoers Examples
The following examples show how to grant as few privileges as necessary to a user or group of users to allow them to perform the required task. Each example uses the system group noc; group names are prefixed with a percent sign (%).
When executed by an unprivileged user, the example commands below must be prefixed with sudo.
Cumulus Linux uses Pluggable Authentication Modules (PAM) and Name Service Switch (NSS) for user authentication. NSS enables PAM to use LDAP to provide user authentication, group mapping, and information for other services on the system.
NSS specifies the order of the information sources that are used to resolve names for each service. Using NSS with authentication and authorization provides the order and location for user lookup and group mapping on the system.
PAM handles the interaction between the user and the system, providing login handling, session setup, authentication of users, and authorization of user actions.
There are three common ways to configure LDAP authentication on Linux: you can use libnss-ldap, libnss-ldapd, or libnss-sss. This chapter describes libnss-ldapd only. From internal testing, this library worked best with Cumulus Linux and is the easiest to configure, automate, and troubleshoot.
Install libnss-ldapd
The libldap-2.4-2 and libldap-common LDAP packages are already installed on the Cumulus Linux image; however you need to install these additional packages to use LDAP authentication:
libnss-ldapd
libpam-ldapd
ldap-utils
To install the additional packages, run the following command:
You can also install these packages even if the switch is not connected to the internet, as they are contained in the cumulus-local-apt-archive repository that is embedded in the Cumulus Linux image.
Follow the interactive prompts to specify the LDAP URI, search base distinguished name (DN), and services that must have LDAP lookups enabled. You need to select at least the passwd, group, and shadow services (press space to select a service). When done, select OK. This creates a very basic LDAP configuration that uses anonymous bind and searches for users under the base DN specified.
After the dialog closes, the install process prints information similar to the following:
/etc/nsswitch.conf: enable LDAP lookups for group
/etc/nsswitch.conf: enable LDAP lookups for passwd
/etc/nsswitch.conf: enable LDAP lookups for shadow
After the installation is complete, the name service LDAP connection daemon (nslcd) runs. This service handles all the LDAP protocol interactions and caches information returned from the LDAP server. ldap is appended in the /etc/nsswitch.conf file as the secondary information source for passwd, group, and shadow. The local files (/etc/passwd, /etc/group, and /etc/shadow) are used first, as specified by the compat source.
Keep compat as the first source in NSS for passwd, group, and shadow. This prevents you from getting locked out of the system.
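You can check the source order after any change to the file. The following is a minimal sketch that assumes the compat source name used in this section; the function name check_nss_compat_first is hypothetical.

```shell
# Succeed only if the passwd, group, and shadow entries of an
# nsswitch.conf-style file are all present and all list compat as their
# first source. Prints each entry that does not.
# check_nss_compat_first is a hypothetical helper name.
check_nss_compat_first() {
    conf="${1:-/etc/nsswitch.conf}"
    awk '
        $1 == "passwd:" || $1 == "group:" || $1 == "shadow:" {
            seen++
            if ($2 != "compat") { print "compat is not first: " $0; bad = 1 }
        }
        END { exit (bad || seen != 3) }
    ' "$conf"
}
```

A non-zero return value indicates that at least one of the three entries is missing or lists another source ahead of compat.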
Entering incorrect information during the installation process might produce configuration errors. You can correct the information after installation by editing certain configuration files.
Edit the /etc/nslcd.conf file to update the LDAP URI and search base DN (see Update the nslcd.conf File, below).
Edit the /etc/nsswitch.conf file to update the service selections.
Alternative Installation Method Using debconf-utils
Instead of running the installer and following the interactive prompts, as described above, you can pre-seed the installer parameters using debconf-utils.
Run apt-get install debconf-utils and create the pre-seeded parameters using debconf-set-selections. Provide the appropriate answers.
Run debconf-show <pkg> to check the settings. Here is an example of how to pre-seed answers to the installer questions using debconf-set-selections:
root# debconf-set-selections <<'zzzEndOfFilezzz'
# LDAP database user. Leave blank will be populated later!
nslcd nslcd/ldap-binddn string
# LDAP user password. Leave blank!
nslcd nslcd/ldap-bindpw password
# LDAP server search base:
nslcd nslcd/ldap-base string ou=support,dc=rtp,dc=example,dc=test
# LDAP server URI. Using ldap over ssl.
nslcd nslcd/ldap-uris string ldaps://myadserver.rtp.example.test
# New to 0.9. restart cron, exim and others libraries without asking
nslcd libraries/restart-without-asking boolean true
# LDAP authentication to use:
# Choices: none, simple, SASL
# Using simple because it's easy to configure. Security comes by using LDAP over SSL
# keep /etc/nslcd.conf 'rw' to root for basic security of bindDN password
nslcd nslcd/ldap-auth-type select simple
# Don't set starttls to true
nslcd nslcd/ldap-starttls boolean false
# Check server's SSL certificate:
# Choices: never, allow, try, demand
nslcd nslcd/ldap-reqcert select never
# Choices: Ccreds credential caching - password saving, Unix authentication, LDAP Authentication , Create home directory on first time login, Ccreds credential caching - password checking
# This is where "mkhomedir" pam config is activated that allows automatic creation of home directory
libpam-runtime libpam-runtime/profiles multiselect ccreds-save, unix, ldap, mkhomedir , ccreds-check
# for internal use; can be preseeded
man-db man-db/auto-update boolean true
# Name services to configure:
# Choices: aliases, ethers, group, hosts, netgroup, networks, passwd, protocols, rpc, services, shadow
libnss-ldapd libnss-ldapd/nsswitch multiselect group, passwd, shadow
libnss-ldapd libnss-ldapd/clean_nsswitch boolean false
## define platform specific libnss-ldapd debconf questions/answers.
## For demo used amd64.
libnss-ldapd:amd64 libnss-ldapd/nsswitch multiselect group, passwd, shadow
libnss-ldapd:amd64 libnss-ldapd/clean_nsswitch boolean false
# libnss-ldapd:powerpc libnss-ldapd/nsswitch multiselect group, passwd, shadow
# libnss-ldapd:powerpc libnss-ldapd/clean_nsswitch boolean false
zzzEndOfFilezzz
Update the nslcd.conf File
After installation, update the main configuration file (/etc/nslcd.conf) to accommodate the expected LDAP server settings.
This section documents some of the more important options that relate to security and how queries are handled. For details on all the available configuration options, read the nslcd.conf man page.
After first editing the /etc/nslcd.conf file and/or enabling LDAP in the /etc/nsswitch.conf file, you must restart netd with the sudo systemctl restart netd command. If you disable LDAP, you need to restart the netd service.
Connection
The LDAP client starts a session by connecting to the LDAP server on TCP and UDP port 389 or on port 636 for LDAPS. Depending on the configuration, this connection might be unauthenticated (anonymous bind); otherwise, the client must provide a bind user and password. The variables used to define the connection to the LDAP server are the URI and bind credentials.
The URI is mandatory and specifies the LDAP server location using the FQDN or IP address. The URI also designates whether to use ldap:// for clear text transport, or ldaps:// for SSL/TLS encrypted transport. You can also specify an alternate port in the URI. In production environments, the LDAPS protocol is recommended so that all communications are secure.
After the connection to the server is complete, the BIND operation authenticates the session. The BIND credentials are optional; if they are not specified, an anonymous bind is assumed, which most production environments do not allow. Configure authenticated (Simple) BIND by specifying the user (binddn) and password (bindpw) in the configuration. Another option is to use SASL (Simple Authentication and Security Layer) BIND, which provides authentication services using other mechanisms, like Kerberos. Contact your LDAP server administrator for this information, as it depends on the configuration of the LDAP server and the credentials that are created for the client device.
# The location at which the LDAP server(s) should be reachable.
uri ldaps://ldap.example.com
# The DN to bind with for normal lookups.
binddn cn=CLswitch,ou=infra,dc=example,dc=com
bindpw CuMuLuS
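When you open firewall rules or test reachability, it helps to know which port a given uri value implies. The following is a minimal sketch that follows the defaults described above (389 for ldap://, 636 for ldaps://, or an explicit port in the URI). The function name ldap_uri_port is hypothetical, and the sketch does not handle IPv6 literal addresses.

```shell
# Print the TCP port implied by an LDAP URI: the explicit :port if present,
# otherwise 389 for ldap:// and 636 for ldaps://.
# Returns 1 for any other scheme.
# ldap_uri_port is a hypothetical helper name; IPv6 literals are not handled.
ldap_uri_port() {
    uri="$1"
    hostport="${uri#*://}"        # strip the scheme
    hostport="${hostport%%/*}"    # strip any path
    case "$hostport" in
        *:*) printf '%s\n' "${hostport##*:}"; return 0 ;;
    esac
    case "$uri" in
        ldaps://*) echo 636 ;;
        ldap://*)  echo 389 ;;
        *)         return 1 ;;
    esac
}
```

For the uri value in the example above, the function prints 636.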
Search Function
When an LDAP client requests information about a resource, it must connect and bind to the server. Then, it performs one or more resource queries depending on the lookup. All search queries sent to the LDAP server are created using the configured search base, filter, and the desired entry (uid=myuser) being searched. If the LDAP directory is large, this search might take a significant amount of time. It is a good idea to define a more specific search base for the common maps (passwd and group).
# The search base that will be used for all queries.
base dc=example,dc=com
# Mapped search bases to speed up common queries.
base passwd ou=people,dc=example,dc=com
base group ou=groups,dc=example,dc=com
Search Filters
It is also common to use search filters to specify criteria used when searching for objects within the directory. This is used to limit the search scope when authenticating users. The default filters applied are:
filter passwd (objectClass=posixAccount)
filter group (objectClass=posixGroup)
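To see roughly what a user lookup looks like, you can combine the map filter with the entry being searched and run the result manually. The following is a minimal sketch; the function name build_user_filter is hypothetical, and the exact query that nslcd sends can differ.

```shell
# Combine a map filter with the entry being looked up, giving a search
# filter of the kind described above (map filter AND uid=<user>).
# build_user_filter is a hypothetical helper name.
build_user_filter() {
    map_filter="$1"     # for example (objectClass=posixAccount)
    user="$2"           # for example myuser
    printf '(&%s(uid=%s))\n' "$map_filter" "$user"
}
```

You can pass the resulting string as the filter argument to the ldapsearch tool from the ldap-utils package, together with the search base, to confirm that the server returns the expected entry.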
Attribute Mapping
The map configuration allows you to override the attributes pushed from LDAP. To override an attribute for a given map, specify the attribute name and the new value. This is useful to ensure that the shell is bash and the home directory is /home/cumulus:
In LDAP, the map refers to one of the supported maps specified in the man page for nslcd.conf (such as passwd or group).
Create Home Directory on Login
If you want to use unique home directories, run the sudo pam-auth-update command and select Create home directory on login in the PAM configuration dialog (press the space bar to select the option). Select OK, then press Enter to save the update and close the dialog.
cumulus@switch:~$ sudo pam-auth-update
The home directory for any user that logs in (using LDAP or not) is created and populated with the standard dotfiles from /etc/skel if it does not already exist.
When nslcd starts, you might see an error message similar to the following (where 5816 is the nslcd PID):
nslcd[5816]: unable to dlopen /usr/lib/x86_64-linux-gnu/sasl2/libsasldb.so: libdb-5.3.so: cannot open shared object file: No such file or directory
You can safely ignore this message. The libdb package and resulting log messages from nslcd do not cause any issues when you use LDAP as a client for login and authentication.
Example Configuration
Here is an example configuration using Cumulus Linux.
# /etc/nslcd.conf
# nslcd configuration file. See nslcd.conf(5)
# for details.
# The user and group nslcd should run as.
uid nslcd
gid nslcd
# The location at which the LDAP server(s) should be reachable.
uri ldaps://myadserver.rtp.example.test
# The search base that will be used for all queries.
base ou=support,dc=rtp,dc=example,dc=test
# The LDAP protocol version to use.
#ldap_version 3
# The DN to bind with for normal lookups.
# debconf-set-selections doesn't seem to set this. so have to manually set this.
binddn CN=cumulus admin,CN=Users,DC=rtp,DC=example,DC=test
bindpw 1Q2w3e4r!
# The DN used for password modifications by root.
#rootpwmoddn cn=admin,dc=example,dc=com
# SSL options
#ssl off (default)
# Not good does not prevent man in the middle attacks
#tls_reqcert demand(default)
tls_cacertfile /etc/ssl/certs/rtp-example-ca.crt
# The search scope.
#scope sub
# Add nested group support
# Supported in nslcd 0.9 and higher.
# default wheezy install of nslcd supports on 0.8. wheezy-backports has 0.9
nss_nested_groups yes
# Mappings for Active Directory
# (replace the SIDs in the objectSid mappings with the value for your domain)
# "dsquery * -filter (samaccountname=testuser1) -attr ObjectSID" where cn == 'testuser1'
pagesize 1000
referrals off
idle_timelimit 1000
# Do not allow uids lower than 100 to login (aka Administrator)
# not needed as pam already has this support
# nss_min_uid 1000
# This filter says to get all users who are part of the cumuluslnxadm group. Supports nested groups.
# Example, mary is part of the snrnetworkadm group which is part of cumuluslnxadm group
# Ref: http://msdn.microsoft.com/en-us/library/aa746475%28VS.85%29.aspx (LDAP_MATCHING_RULE_IN_CHAIN)
filter passwd (&(Objectclass=user)(!(objectClass=computer))(memberOf:1.2.840.113556.1.4.1941:=cn=cumuluslnxadm,ou=groups,ou=support,dc=rtp,dc=example,dc=test))
map passwd uid sAMAccountName
map passwd uidNumber objectSid:S-1-5-21-1391733952-3059161487-1245441232
map passwd gidNumber objectSid:S-1-5-21-1391733952-3059161487-1245441232
map passwd homeDirectory "/home/$sAMAccountName"
map passwd gecos displayName
map passwd loginShell "/bin/bash"
# Filter for any AD group or user in the baseDN. the reason for filtering for the
# user to make sure group listing for user files don't say '<user> <gid>'. instead will say '<user> <user>'
# So for cosmetic reasons..nothing more.
filter group (&(|(objectClass=group)(Objectclass=user))(!(objectClass=computer)))
map group gidNumber objectSid:S-1-5-21-1391733952-3059161487-1245441232
map group cn sAMAccountName
Configure LDAP Authorization
Linux uses the sudo command to allow non-administrator users (such as the default cumulus user account) to perform privileged operations. To control the users authorized to use sudo, the /etc/sudoers file and files located in the /etc/sudoers.d/ directory define a series of rules. Typically, the rules are based on groups, but can also be defined for specific users. You can add sudo rules using the group names from LDAP. For example, if a group of users are associated with the group netadmin, you can add a rule to give those users sudo privileges. Refer to the sudoers manual (man sudoers) for a complete usage description. The following shows an example in the /etc/sudoers file:
# The basic structure of a user specification is "who where = (as_whom) what ".
%sudo ALL=(ALL:ALL) ALL
%netadmin ALL=(ALL:ALL) ALL
Active Directory Configuration
Active Directory (AD) is a fully featured LDAP-based NIS server created by Microsoft. It offers unique features that classic OpenLDAP servers do not have. AD can be more complicated to configure on the client and each version works a little differently with Linux-based LDAP clients. Some more advanced configuration examples, from testing LDAP clients on Cumulus Linux with Active Directory (AD/LDAP), are available in our knowledge base.
LDAP Verification Tools
Typically, password and group information is retrieved from LDAP and cached by the LDAP client daemon. To test the LDAP interaction, you can use these command-line tools to trigger an LDAP query from the device. This helps to create the best filters and verify the information sent back from the LDAP server.
Identify a User with the id Command
The id command performs a username lookup by following the lookup information sources in NSS for the passwd service. This simply returns the user ID, group ID and the group list retrieved from the information source. In the following example, the user cumulus is locally defined in /etc/passwd, and myuser is on LDAP. The NSS configuration has the passwd map configured with the sources compat ldap:
cumulus@switch:~$ id cumulus
uid=1000(cumulus) gid=1000(cumulus) groups=1000(cumulus),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev)
cumulus@switch:~$ id myuser
uid=1230(myuser) gid=3000(Development) groups=3000(Development),500(Employees),27(sudo)
getent
The getent command retrieves all records found with NSS for a given map. It can also retrieve a specific entry under that map. You can perform tests with the passwd, group, shadow, or any other map configured in the /etc/nsswitch.conf file. The output from this command is formatted according to the map requested. For the passwd service, the structure of the output is the same as the entries in /etc/passwd. The group map outputs the same structure as /etc/group.
In this example, looking up a specific user in the passwd map, the user cumulus is locally defined in /etc/passwd, and myuser is only in LDAP.
In the next example, looking up a specific group in the group service, the group cumulus is locally defined in /etc/group, and netadmin is on LDAP.
cumulus@switch:~$ getent group cumulus
cumulus:x:1000:
cumulus@switch:~$ getent group netadmin
netadmin:*:502:larry,moe,curly,shemp
Running the command getent passwd or getent group without a specific request returns all local and LDAP entries for the passwd and group maps.
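Because getent prints group records in /etc/group format, you can script a membership check on its output. The following is a minimal sketch, not part of Cumulus Linux: the group_has_member function name is hypothetical, and the sample record is copied from the netadmin output above.

```shell
# group_has_member LINE USER
# LINE is one record in /etc/group format: name:password:gid:member1,member2,...
# Returns 0 if USER appears in the member list, 1 otherwise.
group_has_member() {
    members=$(printf '%s\n' "$1" | cut -d: -f4)
    case ",$members," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Sample record copied from the getent output above.
line='netadmin:*:502:larry,moe,curly,shemp'

if group_has_member "$line" moe; then
    echo "moe is in netadmin"
fi
```

On a switch, you can pass live data instead of the sample record, for example group_has_member "$(getent group netadmin)" moe.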
LDAP search
The ldapsearch command performs LDAP operations directly on the LDAP server. This does not interact with NSS. This command helps display what the LDAP daemon process is receiving back from the server. The command has many options. The simplest option uses anonymous bind to the host and specifies the search DN and the attribute to look up.
# extended LDIF
#
# LDAPv3
# base <dc=example,dc=com> with scope subtree
# filter: uid=myuser
# requesting: ALL
#
# myuser, people, example.com
dn: uid=myuser,ou=people,dc=example,dc=com
cn: My User
displayName: My User
gecos: myuser
gidNumber: 3000
givenName: My
homeDirectory: /home/myuser
initials: MU
loginShell: /bin/bash
mail: myuser@example.com
objectClass: inetOrgPerson
objectClass: posixAccount
objectClass: shadowAccount
objectClass: top
shadowExpire: -1
shadowFlag: 0
shadowMax: 999999
shadowMin: 8
shadowWarning: 7
sn: User
uid: myuser
uidNumber: 1234
# search result
search: 2
result: 0 Success
# numResponses: 2
# numEntries: 1
NCLU
To use NCLU, a user must be in either the netshow or netedit NCLU group in the LDAP database. You can either:
Add a user or one of their groups to the /etc/netd.conf file manually.
Add a user to the local /etc/group file as a member of the netshow or netedit groups.
In the following example, a user that is not in the netshow or netedit NCLU group in the LDAP database runs the NCLU net show version command, which produces an error:
hsolo@switch:~$ net show version
ERROR: 'getpwuid(): uid not found: 0922'
See /var/log/netd.log for more details
To add a user to the local netshow or netedit NCLU group, either edit the /etc/group file manually or use the sudo adduser USERNAME netshow command, then restart netd. For example, to add the user hsolo to the netshow group:
cumulus@switch:~$ sudo adduser hsolo netshow
Adding user `hsolo' to group `netshow' ...
Adding user hsolo to group netshow
Done.
cumulus@switch:~$ sudo systemctl restart netd
Now, the user can run the NCLU net show commands successfully:
hsolo@switch:~$ net show version
NCLU_VERSION=1.0-cl4u5
DISTRIB_ID="Cumulus Linux"
DISTRIB_RELEASE=4.1.0
DISTRIB_DESCRIPTION="Cumulus Linux 4.1.0"
LDAP Browsers
There are several GUI LDAP clients available that help you work with LDAP servers. These are free tools that show the structure of the LDAP database graphically.
When setting up LDAP authentication for the first time, turn off the nslcd service using the systemctl stop nslcd.service command (or the systemctl stop nslcd@mgmt.service command if you are running the service in a management VRF) and run it in debug mode. Debug mode works whether you are using LDAP over SSL (port 636) or an unencrypted LDAP connection (port 389).
The FQDN of the LDAP server URI does not match the FQDN in the CA-signed server certificate exactly.
nslcd cannot read the SSL certificate and reports a Permission denied error in the debug output during server connection negotiation. Check the permissions on each directory in the path of the root SSL certificate. Ensure that the path is readable by the nslcd user.
NSCD
If the nscd cache daemon is also enabled and you make some changes to the user from LDAP, you can clear the cache using the following commands:
nscd --invalidate=passwd
nscd --invalidate=group
The nscd package works with nslcd to cache name entries returned from the LDAP server. This might cause authentication failures. To work around these issues, disable nscd, restart the nslcd service, then retry authentication:
If you are running the nslcd service in a management VRF, you need to run the systemctl restart nslcd@mgmt.service command instead of the systemctl restart nslcd.service command. For example:
When a local username also exists in the LDAP database, you can update the order of the information sources in /etc/nsswitch.conf to query LDAP before the local user database. This is generally not recommended. For example, the configuration below ensures that LDAP is queried before the local database.
Cumulus Linux implements TACACS+ client AAA (Accounting, Authentication, and Authorization) in a transparent way with minimal configuration. The client implements the TACACS+ protocol as described in this IETF document. There is no need to create accounts or directories on the switch. Accounting records are sent to all configured TACACS+ servers by default. Use of per-command authorization requires additional setup on the switch.
Supported Features
Authentication using PAM; includes login, ssh, sudo and su
TACACS+ privilege 15 users can run any command with sudo using the /etc/sudoers.d/tacplus file that is installed by the libtacplus-map1 package
Up to seven TACACS+ servers
Install the TACACS+ Client Packages
You can install the TACACS+ packages even if the switch is not connected to the internet, as they are contained in the cumulus-local-apt-archive repository that is embedded in the Cumulus Linux image.
To install all required packages, run these commands:
After installing TACACS+, edit the /etc/tacplus_servers file to add at least one server and one shared secret (key). You can specify the server and secret parameters in any order anywhere in the file. Whitespace (spaces or tabs) is not allowed. For example, if your TACACS+ server IP address is 192.168.0.30 and your shared secret is tacacskey, add these parameters to the /etc/tacplus_servers file:
secret=tacacskey
server=192.168.0.30
Cumulus Linux supports a maximum of seven TACACS+ servers. To specify multiple servers, add one per line to the /etc/tacplus_servers file.
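The rules above (at least one server and one secret, no whitespace, at most seven servers) can be checked before you deploy the file. The following is a minimal sketch under those stated rules only; check_tacplus_servers is a hypothetical helper, not a Cumulus Linux utility, and it treats lines that start with # as comments, which is an assumption.

```shell
# check_tacplus_servers FILE
# Prints one line per problem found; prints nothing if FILE passes these checks.
check_tacplus_servers() {
    file=$1
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    servers=$(grep -c '^server=' "$file" || true)
    secrets=$(grep -c '^secret=' "$file" || true)
    [ "$servers" -ge 1 ] || echo "no server= line found"
    [ "$secrets" -ge 1 ] || echo "no secret= line found"
    [ "$servers" -le 7 ] || echo "more than seven server= lines ($servers)"
    # Assumption: lines beginning with # are comments and may contain spaces.
    if grep -v '^#' "$file" | grep '[[:space:]]' >/dev/null; then
        echo "whitespace found in a parameter line"
    fi
}

# Demonstration on a temporary copy of the example configuration above.
tmp=$(mktemp)
printf 'secret=tacacskey\nserver=192.168.0.30\n' > "$tmp"
check_tacplus_servers "$tmp"    # prints nothing for this file
rm -f "$tmp"
```

On a switch, run the function against the real file with sudo privileges, because /etc/tacplus_servers is not world readable.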
Connections are made in the order in which they are listed in this file. In most cases, you do not need to change any other parameters. You can add parameters used by any of the packages to this file, which affects all the TACACS+ client software. For example, the timeout value for an NSS lookup (see description below) is set to 5 seconds by default in the /etc/tacplus_nss.conf file, whereas the timeout value for other packages is 10 seconds and is set in the /etc/tacplus_servers file. The timeout value is per connection to the TACACS+ servers. (If authorization is configured per command, the timeout occurs for each command.) There are several (typically four) connections to the server per login attempt from PAM, as well as two or more through NSS. Therefore, with the default timeout values, a TACACS+ server that is not reachable can delay logins by a minute or more per unreachable server. If you must list unreachable TACACS+ servers, place them at the end of the server list and consider reducing the timeout values.
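To see where the delay comes from, the arithmetic can be worked through with the figures quoted above. This is an illustrative sketch only; login_delay is a hypothetical helper, and the connection counts are the typical values from the paragraph above, not measured values.

```shell
# login_delay PAM_CONNS PAM_TIMEOUT NSS_CONNS NSS_TIMEOUT
# Prints the worst-case number of seconds spent waiting on one unreachable server.
login_delay() {
    echo $(( $1 * $2 + $3 * $4 ))
}

# Four PAM connections at 10 seconds each plus two NSS lookups at 5 seconds each.
login_delay 4 10 2 5    # prints 50
```

With more than two NSS lookups the total passes one minute for each unreachable server, which is why reducing the timeout values helps when such servers must remain in the list.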
When you add or remove TACACS+ servers, you must restart auditd (with the systemctl restart auditd command) or you must send a signal (with killall -HUP audisp-tacplus) before audisp-tacplus rereads the configuration to see the changed server list.
You can also configure the IP address used as the source IP address when communicating with the TACACS+ server. See TACACS Configuration Parameters below for the full list of TACACS+ parameters.
Following is the complete list of the TACACS+ client configuration files, and their use.
Filename
Description
/etc/tacplus_servers
This is the primary file that requires configuration after installation. The file is used by all packages with include=/etc/tacplus_servers parameters in the other configuration files that are installed. Typically, this file contains the shared secrets; make sure that the Linux file mode is 600.
/etc/nsswitch.conf
When the libnss_tacplus package is installed, this file is configured to enable a tacplus lookup via libnss_tacplus. If you replace this file by automation or other means, you need to add tacplus as the first lookup method for the passwd database line.
/etc/tacplus_nss.conf
This file sets the basic parameters for libnss_tacplus. It includes a debug variable for debugging an NSS lookup separately from other client packages.
/usr/share/pam-configs/tacplus
This is the configuration file for pam-auth-update to generate the files in the next row. These configurations are used at login, by su, and by ssh.
/etc/pam.d/common-*
The /etc/pam.d/common-* files are updated for tacplus authentication. The files are updated with pam-auth-update, when libpam-tacplus is installed or removed.
/etc/sudoers.d/tacplus
This file allows TACACS+ privilege level 15 users to run commands with sudo. The file includes an example (commented out) of how to enable privilege level 15 TACACS users to use sudo without having to enter a password and provides an example of how to enable all TACACS users to run specific commands with sudo. Only edit this file with the visudo -f /etc/sudoers.d/tacplus command.
/etc/audisp/plugins.d/audisp-tacplus.conf
This is the audisp plugin configuration file. Typically, no modifications are required.
/etc/audisp/audisp-tac_plus.conf
This is the TACACS+ server configuration file for accounting. Typically, no modifications are required. You can use this configuration file when you only want to debug TACACS+ accounting issues, not all TACACS+ users.
/etc/audit/rules.d/audisp-tacplus.rules
The auditd rules for TACACS+ accounting. The augenrules command uses all rule files to generate the rules file (described below).
/etc/audit/audit.rules
This is the audit rules file generated when auditd is installed.
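The table above asks that /etc/tacplus_servers have file mode 600 because it holds the shared secrets. The following is a minimal sketch of how you might verify this; is_mode_600 is a hypothetical helper, and the demonstration uses a temporary file rather than the real configuration file.

```shell
# is_mode_600 FILE
# Succeeds only if FILE is a regular file whose permission bits are rw-------.
is_mode_600() {
    perms=$(ls -ld "$1" | cut -c1-10)
    [ "$perms" = "-rw-------" ]
}

# Demonstration on a temporary file with deliberately loose permissions.
f=$(mktemp)
chmod 644 "$f"
if is_mode_600 "$f"; then
    echo "mode is 600"
else
    echo "mode is too permissive; fix with: chmod 600 FILE"
fi
rm -f "$f"
```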
You can edit the /etc/pam.d/common-* files manually. However, if you run pam-auth-update again after making the changes, the update fails. Only perform configuration in /usr/share/pam-configs/tacplus, then run pam-auth-update.
TACACS+ Authentication (login)
The initial authentication configuration is done through the PAM modules and an updated version of the libpam-tacplus package. When the package is installed, the PAM configuration is updated in /etc/pam.d with the pam-auth-update command. If you have made changes to your PAM configuration, you need to integrate these changes yourself. If you are also using LDAP with the libpam-ldap package, you might need to edit the PAM configuration to ensure the LDAP and TACACS ordering that you prefer. The libpam-tacplus rules are configured to skip over other rules, and the value in success=2 might require adjustment to skip over LDAP rules.
A user privilege level is determined by the TACACS+ privilege attribute priv_lvl for the user that is returned by the TACACS+ server during the user authorization exchange. The client accepts the attribute in either the mandatory or optional forms and also accepts priv-lvl as the attribute name. The attribute value must be a numeric string in the range 0 to 15, with 15 the most privileged level.
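The constraint on the attribute value (a numeric string in the range 0 to 15) can be expressed as a short check. This sketch only illustrates the rule stated above and is not the client's actual validation code; valid_priv_lvl is a hypothetical name.

```shell
# valid_priv_lvl VALUE
# Succeeds if VALUE consists only of digits and is no greater than 15.
valid_priv_lvl() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
    esac
    [ "$1" -le 15 ]
}

for value in 0 15 16 admin; do
    if valid_priv_lvl "$value"; then
        echo "$value: valid"
    else
        echo "$value: invalid"
    fi
done
```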
By default, TACACS+ users at privilege levels other than 15 are not allowed to run sudo commands and are limited to commands that can be run with standard Linux user permissions.
TACACS+ Client Sequencing
Due to SSH and login processing mechanisms, Cumulus Linux needs to know the following very early in the AAA sequence:
Whether the user is a valid TACACS+ user
The user’s privilege level
The only way to do this for non-local users — that is, users not present in the local password file — is to send a TACACS+ authorization request as the first communication with the TACACS+ server, prior to the authentication and before a password is requested from the user logging in.
Some TACACS+ servers need special configuration to allow authorization requests prior to authentication. Contact your TACACS+ server vendor for the proper configuration if your TACACS+ server does not allow the initial authorization request.
Local Fallback Authentication
You can configure the switch to allow local fallback authentication for a user when the TACACS servers are unreachable, do not include the user for authentication, or have the user in the exclude user list.
To allow local fallback authentication for a user, add a local privileged user account on the switch with the same username as a TACACS user. A local user is always active even when the TACACS service is not running.
To configure local fallback authentication:
Edit the /etc/nsswitch.conf file to remove the keyword tacplus from the line starting with passwd. (You need to add the keyword back in step 3.)
An example of the /etc/nsswitch.conf file with the keyword tacplus removed from the line starting with passwd is shown below.
cumulus@switch:~$ sudo nano /etc/nsswitch.conf
#
# Example configuration of GNU Name Service Switch functionality.
# If you have the `glibc-doc-reference' and `info' packages installed, try:
# `info libc "Name Service Switch"' for information about this file.
passwd: files
group: tacplus files
shadow: files
gshadow: files
...
To enable the local privileged user to run sudo and NCLU commands, run the adduser commands shown below. In the example commands, the TACACS account name is tacadmin.
The first adduser command prompts for information and a password. You can skip most of the requested information by pressing ENTER.
Edit the /etc/nsswitch.conf file to add the keyword tacplus back to the line starting with passwd (the keyword you removed in the first step).
cumulus@switch:~$ sudo nano /etc/nsswitch.conf
#
# Example configuration of GNU Name Service Switch functionality.
# If you have the `glibc-doc-reference' and `info' packages installed, try:
# `info libc "Name Service Switch"' for information about this file.
passwd: tacplus files
group: tacplus files
shadow: files
gshadow: files
...
Restart the netd service with the following command:
cumulus@switch:~$ sudo systemctl restart netd
TACACS+ Accounting
TACACS+ accounting is implemented with the audisp module, with an additional plugin for auditd/audisp. The plugin maps the auid in the accounting record to a TACACS login, based on the auid and sessionid. The audisp module requires libnss_tacplus and uses the libtacplus_map.so library interfaces as part of the modified libpam_tacplus package.
Communication with the TACACS+ servers is done with the libsimple-tacact1 library, through dlopen(). A maximum of 240 bytes of command name and arguments are sent in the accounting record, due to the TACACS+ field length limitation of 255 bytes.
All Linux commands result in an accounting record, including commands run as part of the login process or as sub-processes of other commands. This can sometimes generate a large number of accounting records.
Configure the IP address and encryption key of the server in the /etc/tacplus_servers file. Minimal configuration to auditd and audisp is necessary to enable the audit records necessary for accounting. These records are installed as part of the package.
audisp-tacplus installs the audit rules for command accounting. Modifying the configuration files is not usually necessary. However, when a management VRF is configured, the accounting configuration does need special modification because the auditd service starts prior to networking. It is necessary to add the vrf parameter and to signal the audisp-tacplus process to reread the configuration. The example below shows that the management VRF is named mgmt. You can place the vrf parameter in either the /etc/tacplus_servers file or in the /etc/audisp/audisp-tac_plus.conf file.
vrf=mgmt
After editing the configuration file, send the HUP signal with killall -HUP audisp-tacplus to notify the accounting process to reread the file.
All sudo commands run by TACACS+ users generate accounting records against the original TACACS+ login name.
For more information, refer to the audisp.8 and auditd.8 man pages.
Configure NCLU for TACACS+ Users
When you install or upgrade TACACS+ packages, mapped user accounts are created automatically. All tacacs0 through tacacs15 users are added to the netshow group.
For any TACACS+ users to execute net add, net del, and net commit commands and to restart services with NCLU, you need to add those users to the users_with_edit variable in the /etc/netd.conf file. Add the tacacs15 user and, depending upon your policies, other users (tacacs1 through tacacs14) to this variable.
To give a TACACS+ user access to the show commands, add the tacacs group to the groups_with_show variable.
Do not add the tacacs group to the groups_with_edit variable; this is dangerous and can potentially enable any user to log into the switch as the root user.
To add the users, edit the /etc/netd.conf file:
cumulus@switch:~$ sudo nano /etc/netd.conf
...
# Control which users/groups are allowed to run "add", "del",
# "clear", "abort", and "commit" commands.
users_with_edit = root, cumulus, tacacs15
groups_with_edit = netedit
# Control which users/groups are allowed to run "show" commands
users_with_show = root, cumulus
groups_with_show = netshow, netedit, tacacs
...
After you save and exit the netd.conf file, restart the netd service. Run:
cumulus@switch:~$ sudo systemctl restart netd
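After editing, you might want to confirm which names ended up in each variable, in particular that the tacacs group is absent from groups_with_edit. The following is a minimal sketch that parses the key = value, value format shown in the example above; netd_list_has is a hypothetical helper, and the demonstration uses a temporary copy of the example lines rather than the real /etc/netd.conf file.

```shell
# netd_list_has FILE KEY NAME
# Succeeds if NAME is one of the comma-separated values assigned to KEY in FILE.
netd_list_has() {
    values=$(sed -n "s/^$2[[:space:]]*=[[:space:]]*//p" "$1" | tr -d '[:space:]')
    case ",$values," in
        *",$3,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Demonstration on a temporary copy of the example settings above.
conf=$(mktemp)
cat > "$conf" <<'EOF'
users_with_edit = root, cumulus, tacacs15
groups_with_edit = netedit
users_with_show = root, cumulus
groups_with_show = netshow, netedit, tacacs
EOF

if netd_list_has "$conf" users_with_edit tacacs15; then
    echo "tacacs15 can run edit commands"
fi
if netd_list_has "$conf" groups_with_edit tacacs; then
    echo "WARNING: the tacacs group is in groups_with_edit"
fi
rm -f "$conf"
```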
TACACS+ Per-command Authorization
The tacplus-auth command handles the per-command authorization. To make this an enforced authorization, you must change the TACACS+ login to use a restricted shell, with a very limited executable search path. Otherwise, the user can bypass the authorization. The tacplus-restrict utility simplifies the setup of the restricted environment. The example below initializes the environment for the tacacs0 user account. This is the account used for TACACS+ users at privilege level 0.
If the user/command combination is not authorized by the TACACS+ server, a message similar to the following displays:
tacuser0@switch:~$ net show version
net not authorized by TACACS+ with given arguments, not executing
The following table provides the command options:
Option
Description
-i
Initializes the environment. You only need to issue this option once per username.
-a
You can invoke the utility with the -a option as many times as desired. For each command in the -a list, a symbolic link is created from tacplus-auth to the relative portion of the command name in the local bin subdirectory. You also need to enable these commands on the TACACS+ server (refer to the TACACS+ server documentation). It is common to have the server allow some options to a command, but not others.
-f
Re-initializes the environment. If you need to restart, issue the -f option with -i to force the re-initialization; otherwise, repeated use of -i is ignored. As part of the initialization: - The user’s shell is changed to /bin/rbash. - Any existing dot files are saved. - A limited environment is set up that does not allow general command execution, but instead allows only commands from the user’s local bin subdirectory.
For example, if you want to allow the user to be able to run the net and ip commands (if authorized by the TACACS+ server), use the command:
cumulus@switch:~$ sudo tacplus-restrict -i -u tacacs0 -a ip net
After running this command, examine the tacacs0 directory:
cumulus@switch:~$ sudo ls -lR ~tacacs0
total 12
lrwxrwxrwx 1 root root 22 Nov 21 22:07 ip -> /usr/sbin/tacplus-auth
lrwxrwxrwx 1 root root 22 Nov 21 22:07 net -> /usr/sbin/tacplus-auth
Other than shell built-ins, the only two commands the privilege level 0 TACACS users can run are the ip and net commands.
If you mistakenly add potential commands with the -a option, you can remove them. The example below shows how to remove the net command:
cumulus@switch:~$ sudo rm ~tacacs0/bin/net
You can remove all commands as follows:
cumulus@switch:~$ sudo rm ~tacacs0/bin/*
Use the man command on the switch for more information on tacplus-auth and tacplus-restrict.
cumulus@switch:~$ man tacplus-auth tacplus-restrict
NSS Plugin
When used with pam_tacplus, TACACS+ authenticated users can log in without a local account on the system using the NSS plugin that comes with the tacplus_nss package. The plugin uses the mapped tacplus information if the user is not found in the local password file, provides the getpwnam() and getpwuid() entry points, and uses the TACACS+ authentication functions.
The plugin asks the TACACS+ server if the user is known, and then for relevant attributes to determine the privilege level of the user. When the libnss_tacplus package is installed, nsswitch.conf is modified to set tacplus as the first lookup method for passwd. If the order is changed, the lookup returns the local accounts, such as tacacs0.
If the user is not found, a mapped lookup is performed using the libtacplus.so exported functions. The privilege level is appended to tacacs and the lookup searches for the name in the local password file. For example, privilege level 15 searches for the tacacs15 user. If the user is found, the password structure is filled in with information for the user.
If the user is not found, the privilege level is decremented and checked again until privilege level 0 (user tacacs0) is reached. This allows use of only the two local users tacacs0 and tacacs15, if minimal configuration is desired.
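The decrementing lookup described above can be sketched as a short loop. This is an illustration of the described behavior, not the plugin's code: map_tacacs_user is a hypothetical name, and the list of local accounts is passed as an argument instead of being read from the local password file.

```shell
# map_tacacs_user LEVEL USERS
# LEVEL is the TACACS+ privilege level (0-15); USERS is a space-separated
# list of local account names. Prints the first tacacsN account found while
# counting down from LEVEL to 0, or returns 1 if no such account exists.
map_tacacs_user() {
    level=$1
    while [ "$level" -ge 0 ]; do
        case " $2 " in
            *" tacacs$level "*) echo "tacacs$level"; return 0 ;;
        esac
        level=$((level - 1))
    done
    return 1
}

# Minimal configuration with only the two local users tacacs0 and tacacs15.
map_tacacs_user 15 "tacacs0 tacacs15"   # prints tacacs15
map_tacacs_user 7  "tacacs0 tacacs15"   # prints tacacs0
```

With only tacacs0 and tacacs15 defined, every privilege level below 15 falls through to tacacs0, which matches the minimal configuration mentioned above.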
TACACS Configuration Parameters
The recognized configuration options are the same as the libpam_tacplus command line arguments; however, not all pam_tacplus options are supported. These configuration parameters are documented in the tacplus_servers.5 man page, which is part of the libpam-tacplus package.
The table below describes the configuration options available:
Configuration Option
Description
debug
Outputs debugging information through syslog(3). Note: Debugging output is verbose and includes passwords. Do not leave debugging enabled on a production switch after you have completed troubleshooting.
secret=STRING
The secret key used to encrypt and decrypt packets sent to and received from the server. You can specify the secret key more than once in any order with respect to the server= parameter. When fewer secret= parameters are specified, the last secret given is used for the remaining servers. Only use this parameter in files such as /etc/tacplus_servers that are not world readable.
server=hostname server=ip-address
Adds a TACACS+ server to the servers list. Servers are queried in turn until a match is found, or no servers remain in the list. Can be specified up to 7 times. An IP address can be optionally followed by a port number, preceded by a “:”. The default port is 49. Note: When sending accounting records, the record is sent to all servers in the list if acct_all=1, which is the default.
source_ip=ipv4-address
Sets the IP address used as the source IP address when communicating with the TACACS+ server. You must specify an IPv4 address. IPv6 addresses and hostnames are not supported. The address must be valid for the interface being used.
timeout=seconds
TACACS+ server communication timeout. This parameter defaults to 10 seconds in the /etc/tacplus_servers file, but defaults to 5 seconds in the /etc/tacplus_nss.conf file.
include=/file/name
A supplemental configuration file to avoid duplicating configuration information. You can include up to 8 more configuration files.
min_uid=value
The minimum user ID that the NSS plugin looks up. Setting it to 0 means uid 0 (root) is never looked up, which is desirable for performance reasons. The value should not be greater than the local TACACS+ user IDs (0 through 15), to ensure they can be looked up.
exclude_users=user1,user2,…
A comma-separated list of usernames that are never looked up by the NSS plugin, set in the tacplus_nss.conf file. You cannot use * (asterisk) as a wildcard in the list. Although * is not a legal username, bash might look it up as a username during pathname completion, so it is included in this list as a username string. Note: Do not remove the cumulus user from the exclude_users list; doing so can make it impossible to log in as the cumulus user, which is the primary administrative account in Cumulus Linux. If you do remove the cumulus user, add some other local fallback user that does not rely on TACACS but is a member of the sudo and netedit groups, so that this account can run sudo and NCLU commands.
login=string
TACACS+ authentication service (pap, chap, or login). The default value is pap.
user_homedir=1
This is not enabled by default. When enabled, a separate home directory for each TACACS+ user is created when the TACACS+ user first logs in. By default, the home directory in the mapping accounts in /etc/passwd (/home/tacacs0 … /home/tacacs15) is used. If the home directory does not exist, it is created with the mkhomedir_helper program, in the same way as pam_mkhomedir. This option is not honored for accounts with restricted shells when per-command authorization is enabled.
acct_all=1
Configuration option for audisp_tacplus and pam_tacplus sending accounting records to all supplied servers (1), or the first server to respond (0). The default value is 1.
timeout=seconds
Sets the timeout in seconds for connections to each TACACS+ server. The default is 10 seconds except an NSS lookup uses a 5 second timeout.
vrf=vrf-name
If the management network is in a VRF, set this variable to the VRF name. This is typically mgmt. When this variable is set, the connection to the TACACS+ accounting servers is made through the named VRF.
service
TACACS+ accounting and authorization service. Examples include shell, pap, raccess, ppp, and slip. The default value is shell.
protocol
TACACS+ protocol field. This option is use dependent. PAM uses the SSH protocol.
Remove the TACACS+ Client Packages
To remove all of the TACACS+ client packages, use the following commands:
You can use the getent command to determine if TACACS+ is configured correctly and if the local password is stored in the configuration files. In the example commands below, the cumulus user represents the local user, while cumulusTAC represents the TACACS user.
To look up the username within all NSS methods:
cumulus@switch:~$ sudo getent passwd cumulusTAC
cumulusTAC:x:1016:1001:TACACS+ mapped user at privilege level 15,,,:/home/tacacs15:/bin/bash
To look up the user within the local database only:
To look up the user within the TACACS+ database only:
cumulus@switch:~$ sudo getent -s tacplus passwd cumulusTAC
cumulusTAC:x:1016:1001:TACACS+ mapped user at privilege level 15,,,:/home/tacacs15:/bin/bash
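Because getent prints passwd records in /etc/passwd format, individual fields such as the UID or home directory can be pulled out with cut. The following is a small sketch; passwd_field is a hypothetical helper, and the sample record is copied from the output above.

```shell
# passwd_field LINE N
# Prints field N (1-7) of a record in /etc/passwd format.
passwd_field() {
    printf '%s\n' "$1" | cut -d: -f"$2"
}

# Sample record copied from the getent output above.
line='cumulusTAC:x:1016:1001:TACACS+ mapped user at privilege level 15,,,:/home/tacacs15:/bin/bash'

passwd_field "$line" 3    # prints 1016
passwd_field "$line" 6    # prints /home/tacacs15
```

A home directory of /home/tacacs15 in the record indicates that the lookup resolved through the tacacs15 mapping account.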
If TACACS does not appear to be working correctly, debug the following configuration files by adding the debug=1 parameter to one or more of these files:
/etc/tacplus_servers
/etc/tacplus_nss.conf
You can also add debug=1 to individual pam_tacplus lines in /etc/pam.d/common*.
All log messages are stored in /var/log/syslog.
Incorrect Shared Key
The TACACS client on the switch and the TACACS server should have the same shared secret key. If this key is incorrect, the following message is printed to syslog:
2017-09-05T19:57:00.356520+00:00 leaf01 sshd[3176]: nss_tacplus: TACACS+ server 192.168.0.254:49 read failed with protocol error (incorrect shared secret?) user cumulus
Issues with Per-command Authorization
To debug TACACS user command authorization, have the TACACS+ user enter the following command at a shell prompt, then try the command again:
tacuser0@switch:~$ export TACACSAUTHDEBUG=1
When this debugging is enabled, additional information is shown for the command authorization conversation with the TACACS+ server:
tacuser0@switch:~$ net pending
tacplus-auth: found matching command (/usr/bin/net) request authorization
tacplus-auth: error connecting to 10.0.3.195:49 to request authorization for net: Transport endpoint is not connected
tacplus-auth: cmd not authorized (16)
tacplus-auth: net not authorized from 192.168.3.189:49
net not authorized by TACACS+ with given arguments, not executing
tacuser0@switch:~$ net show version
tacplus-auth: found matching command (/usr/bin/net) request authorization
tacplus-auth: error connecting to 10.0.3.195:49 to request authorization for net: Transport endpoint is not connected
tacplus-auth: 192.168.3.189:49 authorized command net
tacplus-auth: net authorized, executing
DISTRIB_ID="Cumulus Linux"
DISTRIB_RELEASE=4.1.0
DISTRIB_DESCRIPTION="Cumulus Linux 4.1.0"
To disable debugging:
tacuser0@switch:~$ export -n TACACSAUTHDEBUG
Debug Issues with Accounting Records
If you have added or deleted TACACS+ servers from the configuration files, make sure you notify the audisp plugin with this command:
If accounting records are still not being sent, add debug=1 to the /etc/audisp/audisp-tac_plus.conf file, then issue the command above to notify the plugin. Ask the TACACS+ user to run a command and examine the end of /var/log/syslog for messages from the plugin. You can also check the auditing log file /var/log/audit/audit.log to be sure the auditing records are being written. If they are not, restart the audit daemon with:
The following table describes the different pieces of software involved with delivering TACACS.

| Package Name | Description |
| --- | --- |
| audisp-tacplus_1.0.0-1-cl3u3 | This package uses auditing data from auditd to send accounting records to the TACACS+ server and is started as part of auditd. |
| libtac2_1.4.0-cl3u2 | Basic TACACS+ server utility and communications routines. |
| libnss-tacplus_1.0.1-cl3u3 | Provides an interface between libc username lookup, the mapping functions, and the TACACS+ server. |
| tacplus-auth-1.0.0-cl3u1 | This package includes the tacplus-restrict setup utility, which enables you to perform per-command TACACS+ authorization. Per-command authorization is not done by default. |
| libpam-tacplus_1.4.0-1-cl3u2 | A modified version of the standard Debian package. |
| libtacplus-map1_1.0.0-cl3u2 | The mapping functionality between local and TACACS+ users on the server. Sets the immutable sessionid and auditing UID to ensure the original user can be tracked through multiple processes and privilege changes. Sets the auditing loginuid as immutable if supported. Creates and maintains a status database in /run/tacacs_client_map to manage and lookup mappings. |
| libsimple-tacacct1_1.0.0-cl3u2 | Provides an interface for programs to send accounting records to the TACACS+ server. Used by audisp-tacplus. |
| libtac2-bin_1.4.0-cl3u2 | Provides the tacc testing program and TACACS+ man page. |
Considerations
TACACS+ Client Is only Supported through the Management Interface
The TACACS+ client is only supported through the management interface on the switch: eth0, eth1, or the VRF management interface. The TACACS+ client is not supported through bonds, switch virtual interfaces (SVIs), or switch port interfaces (swp).
Multiple TACACS+ Users
If two or more TACACS+ users are logged in simultaneously with the same privilege level, while the accounting records are maintained correctly, a lookup on either name will match both users, while a UID lookup will only return the user that logged in first.
This means that any processes run by either user will be attributed to both, and all files created by either user will be attributed to the first name matched. This is similar to adding two local users to the password file with the same UID and GID, and is an inherent limitation of using the UID for the base user from the password file.
The current algorithm returns the first name matching the UID from the mapping file; this can be the first or the second user that logged in.
To work around this issue, you can use the switch audit log or the TACACS server accounting logs to determine which processes and files are created by each user.
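The first-match behavior of the UID lookup can be illustrated with a toy model. The following Python sketch is not the libtacplus-map implementation; the mapping list, the user names, and the lookup_by_uid function are hypothetical and only show why two sessions that share a UID resolve to a single name.

```python
def lookup_by_uid(mappings, uid):
    """Return the first mapped name whose UID matches, or None if there is none."""
    for name, mapped_uid in mappings:
        if mapped_uid == uid:
            return name
    return None


# Hypothetical mapping entries: two TACACS+ users at the same privilege level
# share the UID of the same local base account.
mappings = [("tacuser1", 1016), ("tacuser2", 1016)]

# Both sessions resolve to whichever name appears first in the mapping data.
print(lookup_by_uid(mappings, 1016))
```

Because the result depends only on the order of the mapping entries, file ownership shown by UID-based tools cannot distinguish the two users; the audit and accounting logs remain the reliable source.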
For commands that do not execute other commands (for example, changes to configurations in an editor, or actions with tools like clagctl and vtysh), no additional accounting is done.
Per-command authorization is implemented at the most basic level (commands are permitted or denied based on the standard Linux user permissions for the local TACACS users and only privilege level 15 users can run sudo commands by default).
The Linux auditd system does not always generate audit events for processes that are terminated with a signal (with the kill system call or internal errors such as SIGSEGV). As a result, processes that exit on a signal that is not caught and handled might not generate a STOP accounting record.
Issues with deluser Command
TACACS+ and other non-local users that run the deluser command with the --remove-home option will see an error about not finding the user in /etc/passwd:
tacuser0@switch:~$ deluser --remove-home USERNAME
userdel: cannot remove entry 'USERNAME' from /etc/passwd
/usr/sbin/deluser: `/usr/sbin/userdel USERNAME' returned error code 1. Exiting
However, the command does remove the home directory. The user can still log in on that account, but will not have a valid home directory. This is a known upstream issue with the deluser command for all non-local users.
Only use the --remove-home option when the user_homedir=1 configuration command is in use.
When Both TACACS+ and RADIUS AAA Clients Are Installed
When you have both the TACACS+ and the RADIUS AAA client installed, RADIUS login is not attempted. As a workaround, do not install both the TACACS+ and the RADIUS AAA client on the same switch.
RADIUS AAA
Various add-on packages enable RADIUS users to log in to Cumulus Linux switches in a transparent way with minimal configuration. There is no need to create accounts or directories on the switch. Authentication is handled with PAM and includes login, ssh, sudo and su.
Install the RADIUS Packages
You can install the RADIUS packages even if the switch is not connected to the internet, as they are contained in the cumulus-local-apt-archive repository that is embedded in the Cumulus Linux image.
After installation is complete, either reboot the switch or run the sudo systemctl restart netd command.
The libpam-radius-auth package supplied with the Cumulus Linux RADIUS client is a newer version than the one in Debian Buster. This package contains support for IPv6, the src_ip option described below, as well as a number of bug fixes and minor features. The package also includes VRF support, provides man pages describing the PAM and RADIUS configuration, and sets the SUDO_PROMPT environment variable to the login name for RADIUS mapping support.
The libnss-mapuser package is specific to Cumulus Linux and supports the getgrent, getgrnam and getgrgid library interfaces. These interfaces add logged in RADIUS users to the group member list for groups that contain the mapped_user (radius_user) if the RADIUS account is unprivileged, and add privileged RADIUS users to the group member list for groups that contain the mapped_priv_user (radius_priv_user) during the group lookups.
During package installation:
The PAM configuration is modified automatically using pam-auth-update (8), and the NSS configuration file /etc/nsswitch.conf is modified to add the mapuser and mapuid plugins. If you remove or purge the packages, these files are modified to remove the configuration for these plugins.
The radius_shell package is added, which installs the /sbin/radius_shell and setcap cap_setuid program used as the login shell for RADIUS accounts. The package adjusts the UID when needed, then runs the bash shell with the same arguments. When installed, the package changes the shell of the RADIUS accounts to /sbin/radius_shell, and to /bin/shell if the package is removed. This package is required for privileged RADIUS users to be enabled. It is not required for regular RADIUS client use.
The radius_user account is added to the netshow group and the radius_priv_user account to the netedit and sudo groups. This change enables all RADIUS logins to run NCLU net show commands and all privileged RADIUS users to also run net add, net del, and net commit commands, and to use sudo.
Configure the RADIUS Client
To configure the RADIUS client, edit the /etc/pam_radius_auth.conf file:
Add the hostname or IP address of at least one RADIUS server (such as a freeradius server on Linux), and the shared secret used to authenticate and encrypt communication with each server.
The hostname of the switch must be resolvable to an IP address, which, in general, is fixed in DNS. If for some reason you cannot find the hostname in DNS, you can add the hostname to the /etc/hosts file manually. However, this can cause problems since the IP address is usually assigned by DHCP, which can change at any time.
Multiple server configuration lines are verified in the order listed. Other than memory, there is no limit to the number of RADIUS servers you can use.
The server port number or name is optional. The system looks up the port in the /etc/services file. However, you can override the ports in the /etc/pam_radius_auth.conf file.
If the server is slow or latencies are high, change the timeout setting. The setting defaults to 3 seconds.
If you want to use a specific interface to reach the RADIUS server, specify the src_ip option. You can specify the hostname of the interface, an IPv4 address, or an IPv6 address. If you specify the src_ip option, you must also specify the timeout option.
Set the vrf-name field. This is typically set to mgmt if you are using a management VRF. You cannot specify more than one VRF.
The configuration file includes the mapped_priv_user field that sets the account used for privileged RADIUS users and the priv-lvl field that sets the minimum value for the privilege level to be considered a privileged login (the default value is 15). If you edit these fields, make sure the values match those set in the /etc/nss_mapuser.conf file.
The following example provides a sample /etc/pam_radius_auth.conf file configuration:
mapped_priv_user radius_priv_user
# server[:port] shared_secret timeout (secs) src_ip
192.168.0.254 secretkey
other-server othersecret 3 192.168.1.10
# when mgmt vrf is in use
vrf-name mgmt
If this is the first time you are configuring the RADIUS client, uncomment the debug line to help with troubleshooting. The debugging messages are written to /var/log/syslog. When the RADIUS client is working correctly, comment out the debug line.
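The layout of the server lines in the sample file above can be checked with a small parser. The following Python sketch assumes the field order given in the comment line of the sample (server[:port], shared secret, timeout, src_ip) and the 3-second default timeout; the parse_radius_servers function and the KEYWORDS set are illustrative, and the set only covers the keywords named in this section.

```python
DEFAULT_TIMEOUT = 3  # seconds; default stated in the text above

# Assumed set of non-server keywords, taken from the fields named in this section.
KEYWORDS = {"mapped_priv_user", "vrf-name", "priv-lvl", "debug"}


def parse_radius_servers(text):
    """Return one dict per server line, in the order the lines are listed."""
    servers = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if fields[0] in KEYWORDS:
            continue  # not a server line
        if len(fields) < 2:
            raise ValueError(f"server line needs a shared secret: {line!r}")
        servers.append({
            "server": fields[0],
            "secret": fields[1],
            "timeout": int(fields[2]) if len(fields) > 2 else DEFAULT_TIMEOUT,
            "src_ip": fields[3] if len(fields) > 3 else None,
        })
    return servers


sample = """\
mapped_priv_user radius_priv_user
# server[:port] shared_secret timeout (secs) src_ip
192.168.0.254 secretkey
other-server othersecret 3 192.168.1.10
# when mgmt vrf is in use
vrf-name mgmt
"""
for server in parse_radius_servers(sample):
    print(server)
```

The list order matters because the servers are tried in the order listed in the file.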
As an optional step, you can set PAM configuration keywords by editing the /usr/share/pam-configs/radius file. After you edit the file, you must run the pam-auth-update --package command. PAM configuration keywords are described in the pam_radius_auth (8) man page.
The privilege level for the user on the switch is determined by the value of the VSA (Vendor Specific Attribute) shell:priv-lvl. If the attribute is not returned, the user is unprivileged. The following shows an example using the freeradius server for a fully-privileged user.
The VSA vendor name (Cisco-AVPair in the example above) can have any content. The RADIUS client only checks for the string shell:priv-lvl.
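The privilege decision described above can be sketched as follows. This Python example is only an illustration of the logic, not the RADIUS client code; it assumes the attribute value carries the level in the form shell:priv-lvl=N, and the is_privileged function is a hypothetical name.

```python
import re

# Assumed value format: "shell:priv-lvl=<number>" somewhere in the attribute value.
PRIV_LVL = re.compile(r"shell:priv-lvl=(\d+)")


def is_privileged(attribute_values, min_priv_lvl=15):
    """Return True if a returned attribute grants at least min_priv_lvl."""
    for value in attribute_values:
        match = PRIV_LVL.search(value)
        if match:
            return int(match.group(1)) >= min_priv_lvl
    # No shell:priv-lvl attribute was returned, so the user is unprivileged.
    return False


print(is_privileged(["shell:priv-lvl=15"]))
print(is_privileged(["shell:priv-lvl=1"]))
print(is_privileged([]))
```

The default of 15 for min_priv_lvl mirrors the default of the priv-lvl field in the configuration file.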
Enable Login without Local Accounts
Because LDAP is not commonly used with switches and adding accounts locally is cumbersome, Cumulus Linux includes a mapping capability with the libnss-mapuser package.
Mapping is done using two NSS (Name Service Switch) plugins, one for account name lookup and one for UID lookup. These plugins are configured automatically in /etc/nsswitch.conf during installation and are removed when the package is removed. See the nss_mapuser (8) man page for the full description of this plugin.
A username is mapped at login to a fixed account specified in the configuration file, with the fields of the fixed account used as a template for the user that is logging in.
For example, if the name being looked up is dave and the fixed account in the configuration file is radius_user, and that entry in /etc/passwd is:
then the matching line returned by running getent passwd dave is:
cumulus@switch:~$ getent passwd dave
dave:x:1017:1002:dave mapped user:/home/dave:/bin/bash
The home directory /home/dave is created during the login process if it does not already exist and is populated with the standard skeleton files by the mkhomedir_helper command.
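The template idea can be sketched in a few lines of Python. This is an illustration of the mapping described above, not the libnss-mapuser code; the template line is hypothetical, and the map_user function is an illustrative name. The sketch replaces the name, comment, and home directory fields and keeps the UID, GID, and shell of the fixed account.

```python
import posixpath


def map_user(login_name, template_line):
    """Build the passwd entry for login_name from the fixed account's entry."""
    _name, password, uid, gid, _gecos, home, shell = template_line.split(":")
    mapped_home = posixpath.join(posixpath.dirname(home), login_name)
    return ":".join(
        [login_name, password, uid, gid, f"{login_name} mapped user", mapped_home, shell]
    )


# Hypothetical /etc/passwd entry for the fixed account.
template = "radius_user:x:1017:1002:radius user:/home/radius_user:/bin/bash"
print(map_user("dave", template))
```

With this hypothetical template, the result has the same shape as the getent passwd dave output shown above.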
The configuration file /etc/nss_mapuser.conf is used to configure the plugins. The file includes the mapped account name, which is radius_user by default. You can change the mapped account name by editing the file. The nss_mapuser (5) man page describes the configuration file.
A flat file mapping is done based on the session number assigned during login, which persists across su and sudo. The mapping is removed at logout.
Local Fallback Authentication
To allow local fallback authentication for a user when none of the RADIUS servers can be reached, you can add a privileged user account as a local account on the switch. The local account must have the same unique identifier as the privileged user and the shell must be the same.
To configure local fallback authentication:
Add a local privileged user account. For example, if the radius_priv_user account in the /etc/passwd file is radius_priv_user:x:1002:1001::/home/radius_priv_user:/sbin/radius_shell, run the following command to add a local privileged user account named johnadmin:
The RADIUS fixed account is not removed from the /etc/passwd or /etc/group file and the home directories are not removed. They remain in case there are modifications to the account or files in the home directories.
To remove the home directories of the RADIUS users, first get the list by running:
cumulus@switch:~$ sudo ls -l /home | grep radius
For all users listed, except the radius_user, run this command to remove the home directories:
where USERNAME is the account name (the home directory relative portion). This command gives the following warning because the user is not listed in the /etc/passwd file.
userdel: cannot remove entry 'USERNAME' from /etc/passwd
/usr/sbin/deluser: `/usr/sbin/userdel USERNAME' returned error code 1. Exiting.
After removing all the RADIUS users, run the command to remove the fixed account. If the account has been changed in the /etc/nss_mapuser.conf file, use that account name instead of radius_user.
If two or more RADIUS users are logged in simultaneously, a UID lookup only returns the user that logged in first. Any processes run by either user get attributed to both, and all files created by either user get attributed to the first name matched. This is similar to adding two local users to the password file with the same UID and GID, and is an inherent limitation of using the UID for the fixed user from the password file. The current algorithm returns the first name matching the UID from the mapping file; this might be the first or second user that logged in.
When you have both the TACACS+ and the RADIUS AAA client installed, RADIUS login is not attempted. As a workaround, do not install both the TACACS+ and the RADIUS AAA client on the same switch.
Netfilter - ACLs
Netfilter is the packet filtering framework in Cumulus Linux as well as most other Linux distributions. There are a number of tools available for configuring ACLs in Cumulus Linux:
iptables, ip6tables, and ebtables are Linux userspace tools used to administer filtering rules for IPv4 packets, IPv6 packets, and Ethernet frames (layer 2 using MAC addresses).
NCLU is a Cumulus Linux-specific userspace tool used to configure custom ACLs.
cl-acltool is a Cumulus Linux-specific userspace tool used to administer filtering rules and configure default ACLs.
NCLU and cl-acltool operate on various configuration files and use iptables, ip6tables, and ebtables to install rules into the kernel. In addition, NCLU and cl-acltool program rules in hardware for interfaces involving switch port interfaces, which iptables, ip6tables and ebtables cannot do on their own.
In many instances, you can use NCLU to configure ACLs; however, in some cases, you must use cl-acltool. In NCLU, you can run the net example acl command to see a basic configuration.
Traffic Rules In Cumulus Linux
Chains
Netfilter describes the mechanism by which packets are classified and controlled in the Linux kernel. Cumulus Linux uses the Netfilter framework to control the flow of traffic to, from, and across the switch. Netfilter does not require a separate software daemon to run; it is part of the Linux kernel itself. Netfilter asserts policies at layers 2, 3 and 4 of the OSI model by inspecting packet and frame headers based on a list of rules. Rules are defined using syntax provided by the iptables, ip6tables and ebtables userspace applications.
The rules created by these programs inspect or operate on packets at several points in the life of the packet through the system. These five points are known as chains and are shown here:
The chains and their uses are:
PREROUTING touches packets before they are routed
INPUT touches packets after they are determined to be destined for the local system but before they are received by the control plane software
FORWARD touches transit traffic as it moves through the box
OUTPUT touches packets that are sourced by the control plane software before they are put on the wire
POSTROUTING touches packets immediately before they are put on the wire but after the routing decision has been made
Tables
When building rules to affect the flow of traffic, the individual chains can be accessed by tables. Linux provides three tables by default:
Filter classifies traffic or filters traffic
NAT applies Network Address Translation rules
Mangle alters packets as they move through the switch
Each table has a set of default chains that can be used to modify or inspect packets at different points of the path through the switch. Chains contain the individual rules to influence traffic. Each table and the default chains it supports are shown below. Tables and chains in green are supported by Cumulus Linux; those in red are not supported (that is, they are not hardware accelerated) at this time.
Rules
Rules are the items that actually classify traffic to be acted upon. Rules are applied to chains, which are attached to tables, similar to the graphic below.
Rules have several different components; the examples below highlight those different components.
Table: The first argument is the table. Notice the second example does not specify a table; the filter table is implied if a table is not specified.
Chain: The second argument is the chain. Each table supports several different chains. See Understanding Tables above.
Matches: The third arguments are called the matches. You can specify multiple matches in a single rule. However, the more matches you use in a rule, the more memory that rule consumes.
Jump: The jump specifies the target of the rule; that is, what action to take if the packet matches the rule. If this option is omitted in a rule, then matching the rule will have no effect on the packet’s fate, but the counters on the rule will be incremented.
Targets: The target can be a user-defined chain (other than the one this rule is in), one of the special built-in targets that decides the fate of the packet immediately (like DROP), or an extended target. See the Supported Rule Types section below for examples of different targets.
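The components listed above can be pulled apart programmatically. The following Python sketch is a simplified illustration, not a full iptables parser: it assumes the rule uses the -t, -A, and -j options (or their long forms) and treats everything after the target as target options. The parse_rule function is an illustrative name, and the second sample rule is a hypothetical combination of options used elsewhere in this guide.

```python
import shlex


def parse_rule(rule):
    """Split an iptables-style rule into table, chain, matches, and target."""
    tokens = shlex.split(rule)
    parsed = {"table": "filter", "chain": None, "matches": [],
              "target": None, "target_options": []}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in ("-t", "--table"):
            parsed["table"] = tokens[i + 1]
            i += 2
        elif token in ("-A", "--append"):
            parsed["chain"] = tokens[i + 1]
            i += 2
        elif token in ("-j", "--jump"):
            parsed["target"] = tokens[i + 1]
            parsed["target_options"] = tokens[i + 2:]
            break
        else:
            parsed["matches"].append(token)  # anything else is part of the matches
            i += 1
    return parsed


# No -t option: the filter table is implied.
print(parse_rule("-A FORWARD -i vlan100 -p icmp -j DROP"))
# Hypothetical rule with an explicit table and an extended target.
print(parse_rule("-t mangle -A FORWARD --in-interface vlan100 -j SETCLASS --class 2"))
```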
How Rules Are Parsed and Applied
All the rules from each chain are read from iptables, ip6tables, and ebtables and entered in order into either the filter table or the mangle table. The rules are read from the kernel in the following order:
IPv6 (ip6tables)
IPv4 (iptables)
ebtables
When rules are combined and put into one table, the order determines the relative priority of the rules; iptables and ip6tables have the highest precedence and ebtables has the lowest.
The Linux packet forwarding construct is an overlay for how the silicon underneath processes packets. Be aware of the following:
The order of operations for how rules are processed is not perfectly maintained when you compare how iptables and the switch silicon process packets. The switch silicon reorders rules when switchd writes to the ASIC, whereas traditional iptables execute the list of rules in order.
All rules, except for POLICE and SETCLASS rules, are terminating; after a rule matches, the action is carried out and no more rules are processed. In the example below, the SETCLASS action, applied with the --in-interface option, creates the internal ASIC classification and continues to process the next rule, which does the rate-limiting for the matched protocol:
When processing traffic, rules affecting the FORWARD chain that specify an ingress interface are performed prior to rules that match on an egress interface. As a workaround, rules that only affect the egress interface can have an ingress interface wildcard (currently, only swp+ and bond+ are supported as wildcard names; see below) that matches any interface applied so that you can maintain order of operations with other input interface rules. For example, with the following rules:
-A FORWARD -i $PORTA -j ACCEPT
-A FORWARD -o $PORTA -j ACCEPT <-- This rule is performed LAST (because of egress interface matching)
-A FORWARD -i $PORTB -j DROP
If you modify the rules like this, they are performed in order:
-A FORWARD -i $PORTA -j ACCEPT
-A FORWARD -i swp+ -o $PORTA -j ACCEPT <-- These rules are performed in order (because of wildcard match on ingress interface)
-A FORWARD -i $PORTB -j DROP
When using rules that do a mangle and a filter lookup for a packet, Cumulus Linux processes them in parallel and combines the action.
If a switch port is assigned to a bond, any egress rules must be assigned to the bond.
When using the OUTPUT chain, rules must be assigned to the source. For example, if a rule is assigned to the switch port in the direction of traffic but the source is a bridge (VLAN), the traffic is not affected by the rule; the rule must be applied to the bridge instead.
If all transit traffic needs to have a rule applied, use the FORWARD chain, not the OUTPUT chain.
ebtables rules are put into either the IPv4 or IPv6 memory space depending on whether the rule utilizes IPv4 or IPv6 to make a decision. Layer 2-only rules that match the MAC address are put into the IPv4 memory space.
On Broadcom switches, the ingress INPUT chain rules match layer 2 and layer 3 multicast packets before multicast packet replication has occurred; therefore, a DROP rule affects all copies.
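The FORWARD chain ordering described in the list above (rules that specify an ingress interface are performed before rules that match only on an egress interface) can be modeled with a stable sort. The following Python sketch is a toy model of that one behavior, not a description of how switchd programs the ASIC; the hardware_order function is an illustrative name.

```python
def hardware_order(rules):
    """Return rules with egress-only rules moved after all other rules.

    sorted() is stable, so rules keep their relative order within each group.
    """
    def is_egress_only(rule):
        tokens = rule.split()
        has_in = "-i" in tokens or "--in-interface" in tokens
        has_out = "-o" in tokens or "--out-interface" in tokens
        return has_out and not has_in

    return sorted(rules, key=is_egress_only)


original = [
    "-A FORWARD -i $PORTA -j ACCEPT",
    "-A FORWARD -o $PORTA -j ACCEPT",
    "-A FORWARD -i $PORTB -j DROP",
]
modified = [
    "-A FORWARD -i $PORTA -j ACCEPT",
    "-A FORWARD -i swp+ -o $PORTA -j ACCEPT",
    "-A FORWARD -i $PORTB -j DROP",
]
print(hardware_order(original))  # the egress-only rule moves to the end
print(hardware_order(modified))  # the order is unchanged
```

Adding the swp+ ingress wildcard turns the egress-only rule into an ingress rule, which is why the modified list keeps its configured order.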
Rule Placement in Memory
INPUT and ingress (FORWARD -i) rules occupy the same memory space. A rule counts as ingress if the -i option is set. If both input and output options (-i and -o) are set, the rule is considered as ingress and occupies that memory space. For example:
However, removing the -o option and interface makes it a valid rule.
Nonatomic Update Mode and Atomic Update Mode
In Cumulus Linux, atomic update mode is enabled by default. However, this mode limits the number of ACL rules that you can configure.
To increase the number of ACL rules that can be configured, configure the switch to operate in nonatomic mode.
How the Rules Get Installed
Instead of reserving 50% of your TCAM space for atomic updates, incremental update uses the available free space to write the new TCAM rules and swap over to the new rules after this is complete. Cumulus Linux then deletes the old rules and frees up the original TCAM space. If there is insufficient free space to complete this task, the original nonatomic update is performed, which interrupts traffic.
Enable Nonatomic Update Mode
You can enable nonatomic updates for switchd, which offer better scaling because all TCAM resources are used to actively impact traffic. With atomic updates, half of the hardware resources are on standby and do not actively impact traffic.
Incremental nonatomic updates are table based, so they do not interrupt network traffic when new rules are installed. The rules are mapped into the following tables and are updated in this order:
mirror (ingress only)
ipv4-mac (can be both ingress and egress)
ipv6 (ingress only)
The incremental nonatomic update operation follows this order:
Updates are performed incrementally, one table at a time without stopping traffic.
Cumulus Linux checks if the rules in a table have changed since the last time they were installed; if a table does not have any changes, it is not reinstalled.
If there are changes in a table, the new rules are populated in new groups or slices in hardware, then that table is switched over to the new groups or slices.
Finally, old resources for that table are freed. This process is repeated for each of the tables listed above.
If sufficient resources do not exist to hold both the new rule set and old rule set, the regular nonatomic mode is attempted. This interrupts network traffic.
If the regular nonatomic update fails, Cumulus Linux reverts back to the previous rules.
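The decision flow in the steps above can be summarized in a small model. The following Python sketch is a simplification under stated assumptions: each rule occupies one hardware entry, a table has a single pool of free entries, and a regular nonatomic update fails only when the new rules do not fit at all. The update_table function and the mode names are hypothetical and do not correspond to switchd internals.

```python
def update_table(installed, new_rules, free_entries):
    """Return (mode, free_entries_after) for one table update.

    Assumes one hardware entry per rule (a simplification).
    """
    if new_rules == installed:
        return "unchanged", free_entries  # table is not reinstalled
    if len(new_rules) <= free_entries:
        # New rules are written into free space, the table switches over,
        # then the old entries are freed. Traffic is not interrupted.
        return "incremental", free_entries + len(installed) - len(new_rules)
    if len(new_rules) <= free_entries + len(installed):
        # Not enough room for both rule sets: regular nonatomic update,
        # which interrupts traffic while the rules are rewritten.
        return "nonatomic", free_entries + len(installed) - len(new_rules)
    # The regular nonatomic update fails; the previous rules stay in place.
    return "reverted", free_entries


print(update_table(["a", "b"], ["a", "b"], 0))
print(update_table(["a", "b"], ["c"], 1))
print(update_table(["a", "b"], ["c", "d", "e"], 1))
print(update_table(["a"], ["c", "d", "e"], 1))
```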
Restarting the switchd service causes all network ports to reset, interrupting network services, in addition to resetting the switch hardware configuration.
During regular non-incremental nonatomic updates, traffic is stopped first, then enabled after the new configuration is written into the hardware completely.
Use iptables, ip6tables, and ebtables Directly
Using iptables, ip6tables, and ebtables directly is not recommended because any rules installed this way are applied only to the Linux kernel and are not synchronized to the switch silicon for hardware acceleration. Running cl-acltool -i (the installation command) resets all rules and deletes anything that is not stored in /etc/cumulus/acl/policy.conf.
For example, running the following command:
cumulus@switch:~$ sudo iptables -A INPUT -p icmp --icmp-type echo-request -j DROP
appears to work, and the rule appears when you run cl-acltool -L:
cumulus@switch:~$ sudo cl-acltool -L ip
-------------------------------
Listing rules of type iptables:
-------------------------------
TABLE filter :
Chain INPUT (policy ACCEPT 72 packets, 5236 bytes)
pkts bytes target prot opt in out source destination
0 0 DROP icmp -- any any anywhere anywhere icmp echo-request
However, the rule is not synced to hardware when applied in this way, and running cl-acltool -i or rebooting the switch removes the rule without replacing it. To ensure all rules that can be in hardware are hardware accelerated, place them in /etc/cumulus/acl/policy.conf and install them by running cl-acltool -i.
Estimate the Number of Rules
To estimate the number of rules you can create from an ACL entry, first determine if that entry is an ingress or an egress. Then, determine if it is an IPv4-mac or IPv6 type rule. This determines the slice to which the rule belongs. Use the following to determine how many entries are used up for each type.
By default, each entry occupies one double wide entry, except if the entry is one of the following:
An entry with multiple comma-separated input interfaces is split into one rule for each input interface (listed after --in-interface below). For example, this entry splits into two rules:
-A FORWARD --in-interface swp1s0,swp1s1 -p icmp -j ACCEPT
An entry with multiple comma-separated output interfaces is split into one rule for each output interface (listed after --out-interface below). This entry splits into two rules:
-A FORWARD --in-interface swp+ --out-interface swp1s0,swp1s1 -p icmp -j ACCEPT
An entry with both input and output comma-separated interfaces is split into one rule for each combination of input and output interface (listed after --in-interface and --out-interface below). This entry splits into four rules:
-A FORWARD --in-interface swp1s0,swp1s1 --out-interface swp1s2,swp1s3 -p icmp -j ACCEPT
An entry with multiple layer 4 port ranges is split into one rule for each range (listed after --dports below). For example, this entry splits into two rules:
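As a rough aid for applying the splitting rules in the list above, the following Python sketch multiplies the number of comma-separated input interfaces, output interfaces, and port items in an entry. It is an estimate under stated assumptions: each comma-separated item after --dports is treated as one range, only the option spellings shown are recognized, and the count_rules function is an illustrative name rather than a Cumulus Linux tool.

```python
import shlex

INTERFACE_OPTIONS = ("-i", "--in-interface", "-o", "--out-interface")


def count_rules(entry):
    """Estimate how many rules one ACL entry splits into."""
    tokens = shlex.split(entry)
    count = 1
    for option, value in zip(tokens, tokens[1:]):
        if option in INTERFACE_OPTIONS or option == "--dports":
            # One rule per comma-separated interface or port item.
            count *= len(value.split(","))
    return count


print(count_rules("-A FORWARD --in-interface swp1s0,swp1s1 -p icmp -j ACCEPT"))
print(count_rules("-A FORWARD --in-interface swp+ --out-interface swp1s0,swp1s1 -p icmp -j ACCEPT"))
print(count_rules(
    "-A FORWARD --in-interface swp1s0,swp1s1 --out-interface swp1s2,swp1s3 -p icmp -j ACCEPT"
))
```

The three sample entries are the ones from the list above and produce counts of 2, 2, and 4, matching the text.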
Cumulus Linux supports matching ACL rules for both ingress and egress interfaces on both VLAN-aware and traditional mode bridges, including bridge SVIs (switch virtual interfaces) for input and output. However, keep the following in mind:
If a traditional mode bridge has a mix of different VLANs, or has both access and trunk members, output interface matching is not supported.
For iptables rules, all IP packets in a bridge are matched, not just routed packets.
You cannot match both input and output interfaces in a rule.
For routed packets, Cumulus Linux cannot match the output bridge for SPAN/ERSPAN.
Matching SVI interfaces in ebtables rules is supported on switches based on Broadcom ASICs. This feature is not currently supported on switches with NVIDIA Spectrum ASICs.
Example rules for a VLAN-aware bridge:
[ebtables]
-A FORWARD -i vlan100 -p IPv4 --ip-protocol icmp -j DROP
-A FORWARD -o vlan100 -p IPv4 --ip-protocol icmp -j ACCEPT
[iptables]
-A FORWARD -i vlan100 -p icmp -j DROP
-A FORWARD --out-interface vlan100 -p icmp -j ACCEPT
-A FORWARD --in-interface vlan100 -j POLICE --set-mode pkt --set-rate 1 --set-burst 1 --set-class 0
Example rules for a traditional mode bridge:
[ebtables]
-A FORWARD -i br0 -p IPv4 --ip-protocol icmp -j DROP
-A FORWARD -o br0 -p IPv4 --ip-protocol icmp -j ACCEPT
[iptables]
-A FORWARD -i br0 -p icmp -j DROP
-A FORWARD --out-interface br0 -p icmp -j ACCEPT
-A FORWARD --in-interface br0 -j POLICE --set-mode pkt --set-rate 1 --set-burst 1 --set-class 0
Match on VLAN IDs on Layer 2 Interfaces
On switches with Spectrum ASICs, you can match on VLAN IDs on layer 2 interfaces for ingress rules.
The following example matches on a VLAN and DSCP class, and sets the internal class of the packet. This can be combined with ingress iptables rules to get extended matching on IP fields.
[ebtables]
-A FORWARD -p 802_1Q --vlan-id 100 -j mark --mark-set 102
[iptables]
-A FORWARD -i swp31 -m mark --mark 102 -m dscp --dscp-class CS1 -j SETCLASS --class 2
Cumulus Linux reserves mark values between 0 and 100; for example, if you use --mark-set 10, you see an error. Use mark values between 101 and 4196.
You cannot mark multiple VLANs with the same value.
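The two constraints above (mark values from 101 to 4196, and one mark value per VLAN) are easy to check before writing rules. The following Python sketch is a hypothetical pre-check, not a Cumulus Linux tool; the validate_marks function and the VLAN-to-mark dictionary are illustrative.

```python
MIN_MARK = 101   # values 0 through 100 are reserved by Cumulus Linux
MAX_MARK = 4196  # upper bound stated in the text above


def validate_marks(vlan_to_mark):
    """Return a list of problems found in a VLAN ID to mark value mapping."""
    errors = []
    seen = {}  # mark value -> first VLAN that used it
    for vlan, mark in vlan_to_mark.items():
        if not MIN_MARK <= mark <= MAX_MARK:
            errors.append(f"VLAN {vlan}: mark {mark} is outside {MIN_MARK}-{MAX_MARK}")
        if mark in seen:
            errors.append(f"VLAN {vlan}: mark {mark} is already used by VLAN {seen[mark]}")
        else:
            seen[mark] = vlan
    return errors


print(validate_marks({100: 102}))
print(validate_marks({100: 10}))
print(validate_marks({100: 102, 200: 102}))
```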
Install and Manage ACL Rules with NCLU
NCLU provides an easy way to create custom ACLs in Cumulus Linux. The rules you create live in the /var/lib/cumulus/nclu/nclu_acl.conf file, which gets converted to a rules file, /etc/cumulus/acl/policy.d/50_nclu_acl.rules. This way, the rules you create with NCLU are independent of the two default files in /etc/cumulus/acl/policy.d/00control_plane.rules and 99control_plane_catch_all.rules, as the content in these files might get updated after you upgrade Cumulus Linux.
Instead of crafting a rule by hand then installing it using cl-acltool, NCLU handles many of the options automatically. For example, consider the following iptables rule:
You create this rule, called EXAMPLE1, using NCLU like this:
cumulus@switch:~$ net add acl ipv4 EXAMPLE1 accept tcp source-ip 10.0.14.2/32 source-port any dest-ip 10.0.15.8/32 dest-port any
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
All options, such as the -j and -p, even FORWARD in the above rule, are added automatically when you apply the rule to the control plane; NCLU figures it all out for you.
You can also set a priority value, which specifies the order in which the rules get executed and the order in which they appear in the rules file. Lower numbers are executed first. To add a new rule in the middle, first run net show config acl, which displays the priority numbers. Otherwise, new rules get appended to the end of the list of rules in the nclu_acl.conf and 50_nclu_acl.rules files.
If you need to hand edit a rule, do not edit the 50_nclu_acl.rules file. Instead, edit the nclu_acl.conf file.
After you add the rule, you need to apply it to an inbound or outbound interface using net add int acl. The inbound interface in our example is swp1:
cumulus@switch:~$ net add int swp1 acl ipv4 EXAMPLE1 inbound
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
After you commit your changes, you can verify the rule you created with NCLU by running net show configuration acl:
cumulus@switch:~$ net show configuration acl
acl ipv4 EXAMPLE1 priority 10 accept tcp source-ip 10.0.14.2/32 source-port any dest-ip 10.0.15.8/32 dest-port any
interface swp1
acl ipv4 EXAMPLE1 inbound
Or you can see all of the rules installed by running cat on the 50_nclu_acl.rules file:
cumulus@switch:~$ cat /etc/cumulus/acl/policy.d/50_nclu_acl.rules
For INPUT and FORWARD rules, apply the rule to a control plane interface using net add control-plane:
cumulus@switch:~$ net add control-plane acl ipv4 EXAMPLE1 inbound
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
The net add control-plane command applies the rule to all data plane ports (swps). To apply the rule to all ports including eth0, run the net add control-plane-all command.
To remove a rule, use net del acl ipv4|ipv6|mac RULENAME:
cumulus@switch:~$ net del acl ipv4 EXAMPLE1
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
This deletes all rules from the 50_nclu_acl.rules file with that name. It also deletes the interfaces referenced in the nclu_acl.conf file.
Install and Manage ACL Rules with cl-acltool
You can manage Cumulus Linux ACLs with cl-acltool. Rules are first written to the iptables chains, as described above, and then synced to hardware via switchd.
Use iptables/ip6tables/ebtables and cl-acltool to manage rules in the default files, 00control_plane.rules and 99control_plane_catch_all.rules; they are not aware of rules created using NCLU.
To examine the current state of chains and list all installed rules, run:
cumulus@switch:~$ sudo cl-acltool -L all
-------------------------------
Listing rules of type iptables:
-------------------------------
TABLE filter :
Chain INPUT (policy ACCEPT 90 packets, 14456 bytes)
pkts bytes target prot opt in out source destination
0 0 DROP all -- swp+ any 240.0.0.0/5 anywhere
0 0 DROP all -- swp+ any loopback/8 anywhere
0 0 DROP all -- swp+ any base-address.mcast.net/8 anywhere
0 0 DROP all -- swp+ any 255.255.255.255 anywhere ...
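To list installed rules using native iptables, ip6tables and ebtables, use the -L option with the respective commands:
cumulus@switch:~$ sudo iptables -L
cumulus@switch:~$ sudo ip6tables -L
cumulus@switch:~$ sudo ebtables -L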
If the install fails, ACL rules in the kernel and hardware are rolled back to the previous state. Errors from programming rules in the kernel or ASIC are reported appropriately.
Install Packet Filtering (ACL) Rules
cl-acltool takes access control list (ACL) rules as input from files. Each ACL policy file groups its rules into iptables, ip6tables and ebtables categories under the tags [iptables], [ip6tables] and [ebtables].
Each rule in an ACL policy must be assigned to one of the rule categories above.
See man cl-acltool(5) for ACL rule details. For iptables rule syntax, see man iptables(8). For ip6tables rule syntax, see man ip6tables(8). For ebtables rule syntax, see man ebtables(8).
See man cl-acltool(5) and man cl-acltool(8) for further details on using cl-acltool. Some examples are listed here and more are listed later in this chapter.
By default:
ACL policy files are located in /etc/cumulus/acl/policy.d/.
All *.rules files in this directory are included in /etc/cumulus/acl/policy.conf.
All files included in this policy.conf file are installed when the switch boots up.
The policy.conf file expects rules files to have a .rules suffix as part of the file name.
Here is an example ACL policy file:
[iptables]
-A INPUT --in-interface swp1 -p tcp --dport 80 -j ACCEPT
-A FORWARD --in-interface swp1 -p tcp --dport 80 -j ACCEPT
[ip6tables]
-A INPUT --in-interface swp1 -p tcp --dport 80 -j ACCEPT
-A FORWARD --in-interface swp1 -p tcp --dport 80 -j ACCEPT
[ebtables]
-A INPUT -p IPv4 -j ACCEPT
-A FORWARD -p IPv4 -j ACCEPT
You can use wildcards or variables to specify chain and interface lists to ease administration of rules.
Currently only swp+ and bond+ are supported as wildcard names. There might be kernel restrictions in supporting more complex wildcards, such as swp1+.
swp+ rules are applied as an aggregate, not per port. If you want to apply per port policing, specify a specific port instead of the wildcard.
You can write ACL rules for the system into multiple files under the default /etc/cumulus/acl/policy.d/ directory. The ordering of rules during installation follows the sort order of the files based on their file names.
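To preview the order in which a set of rules files would be processed, you can sort the file names yourself. The Python sketch below assumes a plain lexicographic sort and uses a hypothetical helper named install_order; the exact collation cl-acltool applies is not stated here, so treat the result as an approximation.

```python
def install_order(filenames):
    """Return .rules file names in plain lexicographic order (an assumed sort)."""
    return sorted(name for name in filenames if name.endswith(".rules"))


files = [
    "99control_plane_catch_all.rules",
    "50_nclu_acl.rules",
    "01sample_datapath.rules",
    "00control_plane.rules",
    "00sample_mgmt.rules",
]
for name in install_order(files):
    print(name)
```

Numeric prefixes such as 00, 01, 50 and 99 make the intended order explicit, which is why the default and NCLU-generated files use them.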
Use multiple files to stack rules. The example below shows two rules files separating rules for management and datapath traffic:
cumulus@switch:~$ ls /etc/cumulus/acl/policy.d/
00sample_mgmt.rules 01sample_datapath.rules
cumulus@switch:~$ cat /etc/cumulus/acl/policy.d/00sample_mgmt.rules
INGRESS_INTF = swp+
INGRESS_CHAIN = INPUT
[iptables]
# protect the switch management
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -s 10.0.14.2 -d 10.0.15.8 -p tcp -j ACCEPT
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -s 10.0.11.2 -d 10.0.12.8 -p tcp -j ACCEPT
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -d 10.0.16.8 -p udp -j DROP
cumulus@switch:~$ cat /etc/cumulus/acl/policy.d/01sample_datapath.rules
INGRESS_INTF = swp+
INGRESS_CHAIN = INPUT, FORWARD
[iptables]
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -s 192.0.2.5 -p icmp -j ACCEPT
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -s 192.0.2.6 -d 192.0.2.4 -j DROP
-A $INGRESS_CHAIN --in-interface $INGRESS_INTF -s 192.0.2.2 -d 192.0.2.8 -j DROP
Apply all rules and policies included in /etc/cumulus/acl/policy.conf:
cumulus@switch:~$ sudo cl-acltool -i
In addition to ensuring that the rules and policies referenced by
/etc/cumulus/acl/policy.conf are installed, this will remove any
currently active rules and policies that are not contained in the
files referenced by /etc/cumulus/acl/policy.conf.
Specify the Policy Files to Install
By default, Cumulus Linux installs any .rules file you configure in /etc/cumulus/acl/policy.d/. To add other policy files to an ACL, you need to include them in /etc/cumulus/acl/policy.conf. For example, for Cumulus Linux to install a rule in a policy file called 01_new.datapathacl, add include /etc/cumulus/acl/policy.d/01_new.datapathacl to policy.conf, as in this example:
cumulus@switch:~$ sudo nano /etc/cumulus/acl/policy.conf
#
# This file is a master file for acl policy file inclusion
#
# Note: This is not a file where you list acl rules.
#
# This file can contain:
# - include lines with acl policy files
# example:
# include <filepath>
#
# see manpage cl-acltool(5) and cl-acltool(8) for how to write policy files
#
include /etc/cumulus/acl/policy.d/01_new.datapathacl
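To check which policy files a given policy.conf names, you can extract its include lines. The Python sketch below is a simplified reading of the file format shown above (comment lines start with #, each include line names one path); the function name is hypothetical, it does not expand wildcards, and the parser inside cl-acltool may behave differently.

```python
def included_policy_files(policy_conf_text):
    """Return the paths named on 'include' lines, skipping comments and blank lines."""
    paths = []
    for line in policy_conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        keyword, _, path = line.partition(" ")
        if keyword == "include" and path.strip():
            paths.append(path.strip())
    return paths


sample = """\
# This file is a master file for acl policy file inclusion
#   include <filepath>
include /etc/cumulus/acl/policy.d/01_new.datapathacl
"""
print(included_policy_files(sample))
```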
Hardware Limitations on Number of Rules
The maximum number of rules that can be handled in hardware is a function of the following factors:
The platform type (switch silicon, like Tomahawk or Spectrum).
The mix of IPv4 and IPv6 rules; Cumulus Linux does not support the maximum number of rules for both IPv4 and IPv6 simultaneously.
The number of default rules provided by Cumulus Linux.
Whether the rules are applied on ingress or egress.
Whether the rules are in atomic or nonatomic mode; nonatomic mode rules are used when nonatomic updates are enabled (see above).
If the maximum number of rules for a particular table is exceeded, cl-acltool -i generates the following error:
error: hw sync failed (sync_acl hardware installation failed) Rolling back .. failed.
In the tables below, the default rules count toward the limits listed. The raw limits below assume only one ingress and one egress table are present.
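Because the default rules count toward the listed limits, the room left for your own rules is the listed limit minus the default rule count. The Python sketch below works through that subtraction with the Tomahawk atomic-mode ingress figures from the table that follows; the function name is hypothetical, and the result is an estimate rather than a guarantee, since the other factors listed above also apply.

```python
def rules_left_for_user(limit_with_defaults, default_rules):
    """Rules still available once the default rules are counted against the limit."""
    if default_rules > limit_with_defaults:
        raise ValueError("default rules exceed the limit")
    return limit_with_defaults - default_rules


# Tomahawk ingress, atomic mode: 256 IPv4 (36 default) and 256 IPv6 (29 default).
print("IPv4:", rules_left_for_user(256, 36))
print("IPv6:", rules_left_for_user(256, 29))
```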
Broadcom Tomahawk Limits
Direction | Atomic Mode IPv4 Rules | Atomic Mode IPv6 Rules | Nonatomic Mode IPv4 Rules | Nonatomic Mode IPv6 Rules
Ingress raw limit | 512 | 512 | 1024 | 1024
Ingress limit with default rules | 256 (36 default) | 256 (29 default) | 768 (36 default) | 768 (29 default)
Egress raw limit | 256 | 0 | 512 | 0
Egress limit with default rules | 256 (29 default) | 0 | 512 (29 default) | 0
Broadcom Trident3 Limits
The Trident3 ASIC is divided into 12 slices, organized into 4 groups for ACLs. Each group contains 3 slices. Each group can support a maximum of 768 rules. You cannot mix IPv4 and IPv6 rules within the same group. IPv4 and MAC rules can be programmed into the same group.
Direction | Atomic Mode IPv4 Rules | Atomic Mode IPv6 Rules | Nonatomic Mode IPv4 Rules | Nonatomic Mode IPv6 Rules
Ingress raw limit | 768 | 768 | 2304 | 2304
Ingress limit with default rules | 768 (44 default) | 768 (41 default) | 2304 (44 default) | 2304 (41 default)
Egress raw limit | 512 | 0 | 512 | 0
Egress limit with default rules | 512 (28 default) | 0 | 512 (28 default) | 0
Due to a hardware limitation on Trident3 switches, certain broadcast packets that are VXLAN decapsulated and sent to the CPU do not hit the normal INPUT chain ACL rules installed with cl-acltool. See default ACL considerations.
Broadcom Trident II+ Limits
Direction | Atomic Mode IPv4 Rules | Atomic Mode IPv6 Rules | Nonatomic Mode IPv4 Rules | Nonatomic Mode IPv6 Rules
Ingress raw limit | 4096 | 4096 | 8192 | 8192
Ingress limit with default rules | 2048 (36 default) | 3072 (29 default) | 6144 (36 default) | 6144 (29 default)
Egress raw limit | 256 | 0 | 512 | 0
Egress limit with default rules | 256 (29 default) | 0 | 512 (29 default) | 0
Broadcom Trident II Limits
Direction | Atomic Mode IPv4 Rules | Atomic Mode IPv6 Rules | Nonatomic Mode IPv4 Rules | Nonatomic Mode IPv6 Rules
Ingress raw limit | 1024 | 1024 | 2048 | 2048
Ingress limit with default rules | 512 (36 default) | 768 (29 default) | 1536 (36 default) | 1536 (29 default)
Egress raw limit | 256 | 0 | 512 | 0
Egress limit with default rules | 256 (29 default) | 0 | 512 (29 default) | 0
Broadcom Helix4 Limits
Direction | Atomic Mode IPv4 Rules | Atomic Mode IPv6 Rules | Nonatomic Mode IPv4 Rules | Nonatomic Mode IPv6 Rules
Ingress raw limit | 1024 | 512 | 2048 | 1024
Ingress limit with default rules | 768 (36 default) | 384 (29 default) | 1792 (36 default) | 896 (29 default)
Egress raw limit | 256 | 0 | 512 | 0
Egress limit with default rules | 256 (29 default) | 0 | 512 (29 default) | 0
NVIDIA Spectrum Limits
The NVIDIA Spectrum ASIC has one common TCAM for both ingress and egress, which can be used for other non-ACL-related resources. However, the number of supported rules varies with the TCAM profile specified for the switch.
Profile | Atomic Mode IPv4 Rules | Atomic Mode IPv6 Rules | Nonatomic Mode IPv4 Rules | Nonatomic Mode IPv6 Rules
default | 500 | 250 | 1000 | 500
ipmc-heavy | 750 | 500 | 1500 | 1000
acl-heavy | 1750 | 1000 | 3500 | 2000
ipmc-max | 1000 | 500 | 2000 | 1000
ip-acl-heavy | 6000 | 0 | 12000 | 0
Even though the table above specifies that zero IPv6 rules are supported with the ip-acl-heavy profile, Cumulus Linux does not prevent you from configuring IPv6 rules. However, there is no guarantee that IPv6 rules work under the ip-acl-heavy profile.
The ip-acl-heavy profile shows an updated number of supported atomic mode and nonatomic mode IPv4 rules. The previously published numbers were 7500 for atomic mode and 15000 for nonatomic mode IPv4 rules.
Supported Rule Types
The iptables/ip6tables/ebtables construct tries to layer the Linux implementation on top of the underlying hardware but they are not always directly compatible. Here are the supported rules for chains in iptables, ip6tables and ebtables.
To learn more about any of the options shown in the tables below, run iptables -h [name of option]. The same help syntax works for options for ip6tables and ebtables.
root@leaf1# ebtables -h tricolorpolice
<...snip...>
tricolorpolice option:
--set-color-mode STRING setting the mode in blind or aware
--set-cir INT setting committed information rate in kbits per second
--set-cbs INT setting committed burst size in kbyte
--set-pir INT setting peak information rate in kbits per second
--set-ebs INT setting excess burst size in kbyte
--set-conform-action-dscp INT setting dscp value if the action is accept for conforming packets
--set-exceed-action-dscp INT setting dscp value if the action is accept for exceeding packets
--set-violate-action STRING setting the action (accept/drop) for violating packets
--set-violate-action-dscp INT setting dscp value if the action is accept for violating packets
Supported chains for the filter table:
INPUT FORWARD OUTPUT
Rules with input/output Ethernet interfaces are ignored Inverse matches
Standard Targets
ACCEPT, DROP
RETURN, QUEUE, STOP, Fall Thru, Jump
Extended Targets
LOG (IPv4/IPv6); UID is not supported for LOG TCP SEQ, TCP options or IP options ULOG SETQOS DSCP Unique to Cumulus Linux: SPAN ERSPAN (IPv4/IPv6) POLICE TRICOLORPOLICE SETCLASS
ebtables Rule Support
Rule Element
Supported
Unsupported
Matches
ether type; input interface/wildcard; output interface/wildcard; Src/Dst MAC; IP: src, dest, tos, proto, sport, dport; IPv6: tclass, icmp6: type, icmp6: code range, src/dst addr, sport, dport; 802.1p (CoS); VLAN
Rules that have no matches and accept all packets in a chain are currently ignored.
Chain default rules (that are ACCEPT) are also ignored.
IPv6 Egress Rules on Broadcom Switches
Cumulus Linux supports IPv6 egress rules in ip6tables on Broadcom switches. Because there are no slices to allocate in the egress TCAM for IPv6, the matches are implemented using a combination of the ingress IPv6 slice and the existing egress IPv4 MAC slice:
Cumulus Linux compares all the match fields in the IPv6 ingress slice, except the --out-interface field, and marks the packet with a classid.
The egress IPv4 MAC slice matches on the classid and the out-interface, and performs the actions.
For example, the -A FORWARD --out-interface vlan100 -p icmp6 -j ACCEPT rule is split into the following:
IPv6 ingress: -A FORWARD -p icmp6 → action mark (for example, classid 4)
IPv4 MAC egress: <match mark 4> and --out-interface vlan100 -j ACCEPT
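The split described above can be mimicked in a few lines of Python. This sketch only rearranges the rule text in the same way as the example. The function name and the choice of classid are hypothetical, it handles only the long --out-interface spelling, and it does not reflect how switchd actually programs the TCAM.

```python
def split_ipv6_egress_rule(rule, classid):
    """Return (ingress_part, egress_part) for an ip6tables rule with --out-interface."""
    tokens = rule.split()
    out_pos = tokens.index("--out-interface")
    jump_pos = tokens.index("-j")
    out_interface = tokens[out_pos + 1]
    action = tokens[jump_pos + 1]
    # Everything except the output interface and the action stays on the ingress side.
    skip = {out_pos, out_pos + 1, jump_pos, jump_pos + 1}
    match_fields = " ".join(t for i, t in enumerate(tokens) if i not in skip)
    ingress = f"{match_fields} -> mark classid {classid}"
    egress = f"<match mark {classid}> and --out-interface {out_interface} -j {action}"
    return ingress, egress


ingress, egress = split_ipv6_egress_rule(
    "-A FORWARD --out-interface vlan100 -p icmp6 -j ACCEPT", classid=4
)
print("IPv6 ingress:", ingress)
print("IPv4 MAC egress:", egress)
```

Note that the ingress half no longer mentions the output interface, so it matches packets bound for every destination.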
IPv6 egress rules in ip6tables are not supported on Hurricane2 switches.
You cannot match both input and output interfaces in the same rule.
The egress TCAM IPv4 MAC slice is shared with other rules, which constrains the scale to a much lower limit.
Considerations
Splitting rules across the ingress TCAM and the egress TCAM causes the ingress IPv6 part of the rule to match packets going to all destinations, which can interfere with the regular expected linear rule match in a sequence. For example:
A higher rule can prevent a lower rule from being matched unexpectedly:
Rule 1: -A FORWARD --out-interface vlan100 -p icmp6 -j ACCEPT
Rule 1 matches all icmp6 packets to all out interfaces in the ingress TCAM.
This prevents rule 2 from getting matched, which is more specific but with a different out interface. Make sure to put more specific matches above more general matches even if the output interfaces are different.
When you have two rules with the same output interface, the lower rule might match unexpectedly depending on the presence of the previous rules.
Rule 1: -A FORWARD --out-interface vlan100 -p icmp6 -j ACCEPT
Rule 2: -A FORWARD --out-interface vlan101 -s 00::01 -j DROP
Rule 3 still matches for an icmp6 packet with source IP 00::01 going out of vlan101. Rule 1 interferes with the normal function of rule 2 and/or rule 3.
When you have two adjacent rules with the same match and different output interfaces, such as:
Rule 1: -A FORWARD --out-interface vlan100 -p icmp6 -j ACCEPT
Rule 2: -A FORWARD --out-interface vlan101 -p icmp6 -j DROP
Rule 2 is never matched on ingress. Both rules share the same mark.
Matching Untagged Packets (Trident3 Switches)
Untagged packets do not have an associated VLAN to match on egress; therefore, the match must be on the underlying layer 2 port. For example, for a bridge configured with pvid 100, member port swp1s0 and swp1s1, and SVI vlan100, the output interface match on vlan100 has to be expanded into each member port. The -A FORWARD -o vlan100 -p icmp6 -j ACCEPT rule must be specified as two rules:
Rule 1: -A FORWARD -o swp1s0 -p icmp6 -j ACCEPT
Rule 2: -A FORWARD -o swp1s1 -p icmp6 -j ACCEPT
Matching on an egress port matches all packets egressing the port, tagged as well as untagged. Therefore, to match only untagged traffic on the port, you must specify additional rules above this rule to prevent tagged packets matching the rule. This is true for bridge member ports as well as regular layer 2 ports. In the example rule above, if vlan101 is also present on the bridge, add a rule above rule 1 and rule 2 to protect vlan101 tagged traffic:
Rule 0: -A FORWARD -o vlan101 -p icmp6 -j ACCEPT
Rule 1: -A FORWARD -o swp1s0 -p icmp6 -j ACCEPT
Rule 2: -A FORWARD -o swp1s1 -p icmp6 -j ACCEPT
For a standalone port or subinterface on swp1s2:
Rule 0: -A FORWARD -o swp1s2.101 -p icmp6 -j ACCEPT
Rule 1: -A FORWARD -o swp1s2 -p icmp6 -j ACCEPT
Common Examples
Control Plane and Data Plane Traffic
You can configure quality of service for traffic on both the control plane and the data plane. By using QoS policers, you can rate limit traffic so incoming packets get dropped if they exceed specified thresholds.
Counters on POLICE ACL rules in iptables do not currently show the packets that are dropped due to those rules.
Use the POLICE target with iptables. POLICE takes these arguments:
--set-class value sets the system internal class of service queue configuration to value.
--set-rate value specifies the maximum rate in kilobytes (KB) or packets.
--set-burst value specifies the number of packets or kilobytes (KB) allowed to arrive sequentially.
--set-mode string sets the mode in KB (kilobytes) or pkt (packets) for rate and burst size.
For example, to rate limit the incoming traffic on swp1 to 400 packets per second with a burst of 100 packets and set the class of the queue for the policed traffic as 0, set this rule in your appropriate .rules file:
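A rule of this kind can be assembled from the POLICE arguments listed above. The Python sketch below builds one such line for the scenario just described. The helper name is hypothetical, and the FORWARD chain is an assumption made for illustration, because the text does not say which chain to use.

```python
def police_rule(chain, in_interface, mode, rate, burst, queue_class):
    """Assemble one POLICE rule line from the arguments listed above."""
    if mode not in ("pkt", "KB"):
        raise ValueError("mode must be 'pkt' or 'KB'")
    return (
        f"-A {chain} --in-interface {in_interface} -j POLICE "
        f"--set-mode {mode} --set-rate {rate} --set-burst {burst} "
        f"--set-class {queue_class}"
    )


# 400 packets per second, burst of 100 packets, class 0, traffic arriving on swp1.
# The FORWARD chain is an assumption made for this illustration.
print("[iptables]")
print(police_rule("FORWARD", "swp1", "pkt", 400, 100, 0))
```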
Set DSCP on Transit Traffic
The examples here use the mangle table to modify the packet as it transits the switch. DSCP is expressed in decimal notation in the examples below.
[iptables]
#Set SSH as high priority traffic.
-t mangle -A FORWARD -p tcp --dport 22 -j DSCP --set-dscp 46
#Set everything coming in SWP1 as AF13
-t mangle -A FORWARD --in-interface swp1 -j DSCP --set-dscp 14
#Set Packets destined for 10.0.100.27 as best effort
-t mangle -A FORWARD -d 10.0.100.27/32 -j DSCP --set-dscp 0
#Example using a range of ports for TCP traffic
-t mangle -A FORWARD -p tcp -s 10.0.0.17/32 --sport 10000:20000 -d 10.0.100.27/32 --dport 10000:20000 -j DSCP --set-dscp 34
Verify DSCP Values on Transit Traffic
The examples here use the DSCP match criteria in combination with other IP, TCP, and interface matches to identify traffic and count the number of packets.
[iptables]
#Match and count the packets that match SSH traffic with DSCP EF
-A FORWARD -p tcp --dport 22 -m dscp --dscp 46 -j ACCEPT
#Match and count the packets coming in SWP1 as AF13
-A FORWARD --in-interface swp1 -m dscp --dscp 14 -j ACCEPT
#Match and count the packets with a destination 10.0.100.27 marked best effort
-A FORWARD -d 10.0.100.27/32 -m dscp --dscp 0 -j ACCEPT
#Match and count the packets in a port range with DSCP AF41
-A FORWARD -p tcp -s 10.0.0.17/32 --sport 10000:20000 -d 10.0.100.27/32 --dport 10000:20000 -m dscp --dscp 34 -j ACCEPT
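The rules above give DSCP values in decimal, while iptables listings show the DSCP match in hexadecimal (for example, 0x2e for 46) and packet tools often report the whole ToS byte. The Python sketch below converts between these forms; the name table covers only the code points used in the examples above, and the function name is illustrative.

```python
# DSCP code points used in the examples above, in decimal.
DSCP_NAMES = {46: "EF", 14: "AF13", 34: "AF41", 0: "best effort"}


def dscp_to_tos(dscp):
    """DSCP occupies the upper six bits of the IPv4 ToS byte, so shift left by two."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be between 0 and 63")
    return dscp << 2


for dscp, name in DSCP_NAMES.items():
    print(f"{name}: dscp={dscp} hex=0x{dscp:02x} tos={dscp_to_tos(dscp)}")
```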
Check the Packet and Byte Counters for ACL Rules
To verify the counters using the above example rules, first send test traffic matching the patterns through the network. The following example generates traffic with mz (or mausezahn), which can be installed on host servers or even on Cumulus Linux switches. After traffic is sent to validate the counters, they are matched on switch1 using cl-acltool.
Policing counters do not increment on switches with the Spectrum ASIC.
# Send 100 TCP packets on host1 with a DSCP value of EF with a destination of host2 TCP port 22:
cumulus@host1$ mz eth1 -A 10.0.0.17 -B 10.0.100.27 -c 100 -v -t tcp "dp=22,dscp=46"
IP: ver=4, len=40, tos=184, id=0, frag=0, ttl=255, proto=6, sum=0, SA=10.0.0.17, DA=10.0.100.27,
payload=[see next layer]
TCP: sp=0, dp=22, S=42, A=42, flags=0, win=10000, len=20, sum=0,
payload=
# Verify the 100 packets are matched on switch1
cumulus@switch1$ sudo cl-acltool -L ip
-------------------------------
Listing rules of type iptables:
-------------------------------
TABLE filter :
Chain INPUT (policy ACCEPT 9314 packets, 753K bytes)
pkts bytes target prot opt in out source destination
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
100 6400 ACCEPT tcp -- any any anywhere anywhere tcp dpt:ssh DSCP match 0x2e
0 0 ACCEPT all -- swp1 any anywhere anywhere DSCP match 0x0e
0 0 ACCEPT all -- any any 10.0.0.17 anywhere DSCP match 0x00
0 0 ACCEPT tcp -- any any 10.0.0.17 10.0.100.27 tcp spts:webmin:20000 dpts:webmin:2002
# Send 100 packets with a small payload on host1 with a DSCP value of AF13 with a destination of host2:
cumulus@host1$ mz eth1 -A 10.0.0.17 -B 10.0.100.27 -c 100 -v -t ip
IP: ver=4, len=20, tos=0, id=0, frag=0, ttl=255, proto=0, sum=0, SA=10.0.0.17, DA=10.0.100.27,
payload=
# Verify the 100 packets are matched on switch1
cumulus@switch1$ sudo cl-acltool -L ip
-------------------------------
Listing rules of type iptables:
-------------------------------
TABLE filter :
Chain INPUT (policy ACCEPT 9314 packets, 753K bytes)
pkts bytes target prot opt in out source destination
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
100 6400 ACCEPT tcp -- any any anywhere anywhere tcp dpt:ssh DSCP match 0x2e
100 7000 ACCEPT all -- swp3 any anywhere anywhere DSCP match 0x0e
100 6400 ACCEPT all -- any any 10.0.0.17 anywhere DSCP match 0x00
0 0 ACCEPT tcp -- any any 10.0.0.17 10.0.100.27 tcp spts:webmin:20000 dpts:webmin:2002
# Send 100 packets on host1 with a destination of host2:
cumulus@host1$ mz eth1 -A 10.0.0.17 -B 10.0.100.27 -c 100 -v -t ip
IP: ver=4, len=20, tos=56, id=0, frag=0, ttl=255, proto=0, sum=0, SA=10.0.0.17, DA=10.0.100.27,
payload=
# Verify the 100 packets are matched on switch1
cumulus@switch1$ sudo cl-acltool -L ip
-------------------------------
Listing rules of type iptables:
-------------------------------
TABLE filter :
Chain INPUT (policy ACCEPT 9314 packets, 753K bytes)
pkts bytes target prot opt in out source destination
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
100 6400 ACCEPT tcp -- any any anywhere anywhere tcp dpt:ssh DSCP match 0x2e
100 7000 ACCEPT all -- swp3 any anywhere anywhere DSCP match 0x0e
0 0 ACCEPT all -- any any 10.0.0.17 anywhere DSCP match 0x00
0 0 ACCEPT tcp -- any any 10.0.0.17 10.0.100.27 tcp spts:webmin:20000 dpts:webmin:2002
Filter Specific TCP Flags
The example below creates rules on the INPUT and FORWARD chains to drop ingress IPv4 and IPv6 TCP packets when the SYN bit is set and the RST, ACK, and FIN bits are cleared. The default for the INPUT and FORWARD chains allows all other packets. The ACL is applied to ports swp20 and swp21. After you configure this ACL, new TCP sessions that originate from ingress ports swp20 and swp21 are not allowed. TCP sessions that originate from any other port are allowed.
INGRESS_INTF = swp20,swp21
[iptables]
-A INPUT,FORWARD --in-interface $INGRESS_INTF -p tcp --syn -j DROP
[ip6tables]
-A INPUT,FORWARD --in-interface $INGRESS_INTF -p tcp --syn -j DROP
The --syn flag in the above rule matches packets with the SYN bit set and the ACK, RST, and FIN bits cleared. It is equivalent to using --tcp-flags SYN,RST,ACK,FIN SYN. For example, you can write the above rule as:
-A INPUT,FORWARD --in-interface $INGRESS_INTF -p tcp --tcp-flags SYN,RST,ACK,FIN SYN -j DROP
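The --tcp-flags option takes two lists: the flags to examine (the mask) and the flags that must be set among them. The Python sketch below models that comparison with the standard TCP flag bit values so you can see which packets the rule above would match; the function names are illustrative and the code is not part of any Cumulus Linux tool.

```python
# Standard TCP header flag bits.
FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def bits(names):
    """Combine flag names into a single bit field."""
    value = 0
    for name in names:
        value |= FLAGS[name]
    return value


def tcp_flags_match(mask, comp, packet_flags):
    """True when, among the flags in mask, exactly those in comp are set."""
    return (bits(packet_flags) & bits(mask)) == bits(comp)


MASK = ["SYN", "RST", "ACK", "FIN"]
COMP = ["SYN"]
print(tcp_flags_match(MASK, COMP, ["SYN"]))         # new connection attempt
print(tcp_flags_match(MASK, COMP, ["SYN", "ACK"]))  # reply to a connection attempt
```

Flags outside the mask, such as PSH or URG, do not affect the result.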
Control Who Can SSH into the Switch
Run the following NCLU commands to control who can SSH into the switch.
In the following example, 10.0.0.11/32 is the interface IP address (or loopback IP address) of the switch and 10.255.4.0/24 can SSH into the switch.
cumulus@switch:~$ net add acl ipv4 test priority 10 accept source-ip 10.255.4.0/24 dest-ip 10.0.0.11/32
cumulus@switch:~$ net add acl ipv4 test priority 20 drop source-ip any dest-ip 10.0.0.11/32
cumulus@switch:~$ net add control-plane acl ipv4 test inbound
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Cumulus Linux does not support the keyword iprouter (typically used for traffic sent to the CPU, where the destination MAC address is that of the router but the destination IP address is not the router).
Example Configuration
The following example demonstrates how several different rules are applied.
Following are the configurations for the two switches used in these examples. The configuration for each switch appears in /etc/network/interfaces on that switch.
Switch 1 Configuration
cumulus@switch1:~$ net show configuration files
...
/etc/network/interfaces
=======================
auto swp1
iface swp1
auto swp2
iface swp2
auto swp3
iface swp3
auto swp4
iface swp4
auto bond2
iface bond2
bond-slaves swp3 swp4
auto br-untagged
iface br-untagged
address 10.0.0.1/24
bridge_ports swp1 bond2
bridge_stp on
auto br-tag100
iface br-tag100
address 10.0.100.1/24
bridge_ports swp2.100 bond2.100
bridge_stp on
...
Switch 2 Configuration
cumulus@switch2:~$ net show configuration files
...
/etc/network/interfaces
=======================
auto swp3
iface swp3
auto swp4
iface swp4
auto br-untagged
iface br-untagged
address 10.0.0.2/24
bridge_ports bond2
bridge_stp on
auto br-tag100
iface br-tag100
address 10.0.100.2/24
bridge_ports bond2.100
bridge_stp on
auto bond2
iface bond2
bond-slaves swp3 swp4
...
Egress Rule
The following rule blocks any TCP traffic with destination port 200 going from host1 or host2 through the switch (corresponding to rule 1 in the diagram above).
[iptables]
-A FORWARD -o bond2 -p tcp --dport 200 -j DROP
Ingress Rule
The following rule blocks any UDP traffic with source port 200 going from host1 through the switch (corresponding to rule 2 in the diagram above).
[iptables]
-A FORWARD -i swp2 -p udp --sport 200 -j DROP
Input Rule
The following rule blocks any UDP traffic with source port 200 and destination port 50 going from host1 to the switch (corresponding to rule 3 in the diagram above).
[iptables]
-A INPUT -i swp1 -p udp --sport 200 --dport 50 -j DROP
Output Rule
The following rule blocks any TCP traffic with source port 123 and destination port 123 going from Switch 1 to host2 (corresponding to rule 4 in the diagram above).
[iptables]
-A OUTPUT -o br-tag100 -p tcp --sport 123 --dport 123 -j DROP
Combined Rules
The following rule blocks any TCP traffic with source port 123 and destination port 123 going from any switch port egress or generated from Switch 1 to host1 or host2 (corresponding to rules 1 and 4 in the diagram above).
[iptables]
-A OUTPUT,FORWARD -o swp+ -p tcp --sport 123 --dport 123 -j DROP
This also becomes two ACLs and is the same as:
[iptables]
-A FORWARD -o swp+ -p tcp --sport 123 --dport 123 -j DROP
-A OUTPUT -o swp+ -p tcp --sport 123 --dport 123 -j DROP
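The expansion of a comma-separated chain list into one rule per chain can be sketched in Python as follows. The helper name is hypothetical and the code only rewrites the rule text; it is meant to show why a single line with two chains consumes two ACL entries, not how cl-acltool is implemented.

```python
def expand_chains(rule):
    """Split '-A CHAIN1,CHAIN2 ...' into one rule per chain."""
    tokens = rule.split()
    if len(tokens) < 2 or tokens[0] != "-A":
        raise ValueError("expected a rule starting with '-A <chain list>'")
    rest = " ".join(tokens[2:])
    return [f"-A {chain} {rest}" for chain in tokens[1].split(",") if chain]


combined = "-A OUTPUT,FORWARD -o swp+ -p tcp --sport 123 --dport 123 -j DROP"
for line in expand_chains(combined):
    print(line)
```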
Layer 2-only Rules/ebtables
The following rule blocks any traffic with source MAC address 00:00:00:00:00:12 and destination MAC address 08:9e:01:ce:e2:04 going from any switch port egress/ingress.
[ebtables]
-A FORWARD -s 00:00:00:00:00:12 -d 08:9e:01:ce:e2:04 -j DROP
Considerations
Not All Rules Supported
Not all iptables, ip6tables, or ebtables rules are supported. Refer to the Supported Rule Types section above for specific rule support.
Input Chain Rules on Broadcom Switches
Broadcom switches evaluate both IPv4 and IPv6 packets against INPUT chain iptables rules. For example, when you install the following rule, the switch drops both IPv6 and IPv4 packets with destination port 22.
[iptables]
-A INPUT -p tcp --dport 22 -j DROP
To work around this issue, use ebtables with IPv4 or IPv6 headers instead of the iptables and ip6tables generic INPUT chain DROP. For example:
[ebtables]
-A INPUT -i swp+ -p IPv4 --ip-protocol tcp --ip-destination-port 22 -j DROP
[ebtables]
-A INPUT -i swp+ -p IPv6 --ip6-protocol tcp --ip6-destination-port 22 -j DROP
ACL Log Policer Limits Traffic
To protect the CPU from overloading, traffic copied to the CPU is limited to 1 pkt/s by an ACL Log Policer.
Bridge Traffic Limitations
Bridge traffic that matches LOG ACTION rules is not logged in syslog; the kernel and hardware identify packets using different information.
Log Actions Cannot Be Forwarded
Logged packets cannot be forwarded. The hardware cannot both forward a packet and send the packet to the control plane (or kernel) for logging. To emphasize this, a log action must also have a drop action.
Broadcom Range Checker Limitations
Broadcom platforms have only 24 range checkers. This is a separate resource from the total number of ACLs allowed. If you are creating a large ACL configuration, use port ranges for large ranges of more than 5 ports.
Inbound LOG Actions Only for Broadcom Switches
On Broadcom-based switches, LOG actions can only be done on inbound interfaces (the ingress direction), not on outbound interfaces (the egress direction).
SPAN Sessions that Reference an Outgoing Interface
On Tomahawk switches, the field processor (FP) polices on a per-pipeline basis instead of globally, as with a Trident II switch. If packets come in to different switch ports that are on different pipelines on the ASIC, they might be rate limited differently.
For example, your switch is set so BFD is rate limited to 2000 packets per second. When the BFD packets are received on port1/pipe1 and port2/pipe2, they are each rate limited at 2000 pps; the switch is rate limiting at 4000 pps overall. Because there are four pipelines on a Tomahawk switch, you might see a fourfold increase of your configured rate limits.
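The effect of per-pipeline policing is simple multiplication, as the Python sketch below shows. It assumes that each pipeline enforces the configured rate independently and that traffic arrives on the stated number of pipelines; the pipeline count of four comes from the text, and the function name is illustrative.

```python
TOMAHAWK_PIPELINES = 4  # from the text above


def aggregate_rate(configured_pps, pipelines_receiving):
    """Worst-case total rate when each pipeline polices independently."""
    if not 1 <= pipelines_receiving <= TOMAHAWK_PIPELINES:
        raise ValueError("pipelines_receiving must be between 1 and 4")
    return configured_pps * pipelines_receiving


print(aggregate_rate(2000, 2))  # BFD arriving on two pipelines
print(aggregate_rate(2000, 4))  # BFD arriving on all four pipelines
```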
Atomic Update Mode Enabled by Default
In Cumulus Linux, atomic update mode is enabled by default. If you have Tomahawk switches and plan to use SPAN and/or mangle rules, you must disable atomic update mode.
To do so, enable nonatomic update mode by setting the value for acl.non_atomic_update_mode to TRUE in /etc/cumulus/switchd.conf, then restart switchd.
acl.non_atomic_update_mode = TRUE
Packets Undercounted during ACL Updates
On Tomahawk switches, when updating egress FP rules, some packets do not get counted. This results in an underreporting of counts during ping-pong or incremental switchover.
Trident II+ Hardware Limitations
On a Trident II+ switch, the TCAM allocation for ACLs is limited to 2048 rules in atomic mode for a default setup instead of 4096, as advertised for ingress rules.
Trident3 Hardware Limitations
TCAM Allocation
On a Trident3 switch, the TCAM allocation for ACLs is limited to 2048 rules in atomic mode for a default setup instead of 4096, as advertised for ingress rules.
Enable Nonatomic Mode
On a Trident3 switch, you must enable nonatomic update mode before you can configure ERSPAN. To do so, set the value for acl.non_atomic_update_mode to TRUE in /etc/cumulus/switchd.conf, then restart switchd.
acl.non_atomic_update_mode = TRUE
Egress ACL Rules
On Trident3 switches, egress ACL rules matching on the output SVI interface match layer 3 routed packets only, not bridged packets. To match layer 2 traffic, use egress bridge member port-based rules.
iptables Interactions with cl-acltool
Because Cumulus Linux is a Linux operating system, the iptables commands can be used directly. However, consider using cl-acltool instead because:
Without using cl-acltool, rules are not installed into hardware.
Running cl-acltool -i (the installation command) resets all rules and deletes anything that is not stored in /etc/cumulus/acl/policy.conf.
For example, running the following command works:
cumulus@switch:~$ sudo iptables -A INPUT -p icmp --icmp-type echo-request -j DROP
And the rules appear when you run cl-acltool -L:
cumulus@switch:~$ sudo cl-acltool -L ip
-------------------------------
Listing rules of type iptables:
-------------------------------
TABLE filter :
Chain INPUT (policy ACCEPT 72 packets, 5236 bytes)
pkts bytes target prot opt in out source destination
0 0 DROP icmp -- any any anywhere anywhere icmp echo-request
However, running cl-acltool -i or rebooting the switch removes them. To ensure all rules that can be in hardware are hardware accelerated, place them in a policy file that /etc/cumulus/acl/policy.conf includes, then run cl-acltool -i.
NVIDIA Spectrum Hardware Limitations
Due to hardware limitations in the Spectrum ASIC, BFD policers are shared between all BFD-related control plane rules. Specifically, the following default rules share the same policer in the 00control_plane.rules file:
To work around this limitation, set the rate and burst of all 6 of these rules to the same values, using the --set-rate and --set-burst options.
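A quick consistency check for this workaround could look like the sketch below. It collects the (rate, burst) pairs from POLICE rules that match the BFD ports and reports whether they all agree. The helper name and the sample rule lines are assumptions for illustration; in particular, the sketch expects literal port numbers, whereas a real rule file may refer to ports through variables.

```python
import re

# UDP ports for BFD control (3784), BFD echo (3785), and multihop control (4784).
BFD_PORTS = {"3784", "3785", "4784"}


def bfd_policer_settings(rule_lines):
    """Return the set of (rate, burst) pairs used by BFD policer rules."""
    settings = set()
    for line in rule_lines:
        port = re.search(r"--dport\s+(\S+)", line)
        rate = re.search(r"--set-rate\s+(\d+)", line)
        burst = re.search(r"--set-burst\s+(\d+)", line)
        if port and port.group(1) in BFD_PORTS and rate and burst:
            settings.add((int(rate.group(1)), int(burst.group(1))))
    return settings


# Hypothetical rule lines, written with literal ports for the example.
rules = [
    "-A INPUT -p udp --dport 3785 -j POLICE --set-mode pkt --set-rate 2000 --set-burst 2000",
    "-A INPUT -p udp --dport 3784 -j POLICE --set-mode pkt --set-rate 2000 --set-burst 2000",
    "-A INPUT -p udp --dport 4784 -j POLICE --set-mode pkt --set-rate 1000 --set-burst 2000",
]
found = bfd_policer_settings(rules)
print("consistent" if len(found) <= 1 else f"mismatched policers: {sorted(found)}")
```

More than one distinct pair means at least one BFD rule still needs its --set-rate or --set-burst value aligned with the others.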
Where to Assign Rules
If a switch port is assigned to a bond, any egress rules must be assigned to the bond.
When using the OUTPUT chain, rules must be assigned to the source. For example, if a rule is assigned to the switch port in the direction of traffic but the source is a bridge (VLAN), the traffic is not affected by the rule; the rule must be applied to the bridge instead.
If all transit traffic needs to have a rule applied, use the FORWARD chain, not the OUTPUT chain.
Generic Error Message Displayed after ACL Rule Installation Failure
After an ACL rule installation failure, a generic error message like the following is displayed:
cumulus@switch:$ sudo cl-acltool -i -p 00control_plane.rules
Using user provided rule file 00control_plane.rules
Reading rule file 00control_plane.rules ...
Processing rules in file 00control_plane.rules ...
error: hw sync failed (sync_acl hardware installation failed)
Installing acl policy... Rolling back ..
failed.
Dell S3048-ON Supports only 24K MAC Addresses
The Dell S3048-ON has a limit of 24576 MAC address entries instead of 32K for other 1G switches.
NVIDIA Spectrum ASICs and INPUT Chain Rules
On switches with NVIDIA Spectrum ASICs, INPUT chain rules are implemented using a trap mechanism. Packets headed to the CPU are assigned trap IDs. The default INPUT chain rules are mapped to these trap IDs. However, if a packet matches multiple traps, they are resolved by an internal priority mechanism that might be different from the rule priorities. Packets might not get policed by the default expected rule, but by another rule instead. For example, ICMP packets headed to the CPU are policed by the LOCAL rule instead of the ICMP rule. Also, multiple rules might share the same trap. In this case the policer that is applied is the largest of the policer values.
To work around this issue, create rules on the INPUT and FORWARD chains (INPUT,FORWARD).
Hardware Policing of Packets in the Input Chain
On certain platforms, there are limitations on hardware policing of packets in the INPUT chain. To work around these limitations, Cumulus Linux supports kernel-based policing of these packets in software using limit/hashlimit matches. Rules with these matches are not offloaded to hardware; they are ignored during hardware installation.
ACLs Do not Match when the Output Port on the ACL is a Subinterface
Packets don’t get matched when a subinterface is configured as the output port. The ACL matches on packets only if the primary port is configured as an output port. If a subinterface is set as an output or egress port, the packets match correctly.
For example:
-A FORWARD --out-interface swp49s1.100 -j ACCEPT
NVIDIA Spectrum Switches and Egress ACL Matching on Bonds
On the NVIDIA Spectrum switch, ACL rules that match on an outbound bond interface are not supported. For example, the following rule is not supported:
[iptables]
-A FORWARD --out-interface <bond_intf> -j DROP
To work around this issue, duplicate the ACL rule on each physical port of the bond. For example:
[iptables]
-A FORWARD --out-interface <bond-member-port-1> -j DROP
-A FORWARD --out-interface <bond-member-port-2> -j DROP
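The duplication can be scripted. The sketch below takes a rule that names a bond as its output interface and returns one copy per member port. The function name, the bond name bond1, and the member ports are placeholders chosen for this example.

```python
def expand_bond_rule(rule, bond, members):
    """Return one copy of `rule` per bond member port.

    Only the interface name that follows --out-interface is replaced,
    and only when it is exactly `bond`.
    """
    tokens = rule.split()
    positions = [
        i + 1
        for i, token in enumerate(tokens[:-1])
        if token == "--out-interface" and tokens[i + 1] == bond
    ]
    if not positions:
        raise ValueError(f"rule does not match on --out-interface {bond}")
    expanded = []
    for member in members:
        copy = list(tokens)
        for pos in positions:
            copy[pos] = member
        expanded.append(" ".join(copy))
    return expanded


for line in expand_bond_rule(
    "-A FORWARD --out-interface bond1 -j DROP", "bond1", ["swp1", "swp2"]
):
    print(line)
```

Comparing whole tokens rather than substrings avoids rewriting a rule for bond10 when the bond being expanded is bond1.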
The Cumulus Linux default ACL configuration is split into three parts: iptables, ip6tables, and ebtables. The sections below describe the default configuration for each part. The complete default configuration, as listed by cl-acltool -L all, is shown first:
Default ACL Configuration
cumulus@switch:~$ sudo cl-acltool -L all
-------------------------------
Listing rules of type iptables:
-------------------------------
TABLE filter :
Chain INPUT (policy ACCEPT 167 packets, 16481 bytes)
pkts bytes target prot opt in out source destination
0 0 DROP all -- swp+ any 240.0.0.0/5 anywhere
0 0 DROP all -- swp+ any loopback/8 anywhere
0 0 DROP all -- swp+ any base-address.mcast.net/8 anywhere
0 0 DROP all -- swp+ any 255.255.255.255 anywhere
0 0 SETCLASS udp -- swp+ any anywhere anywhere udp dpt:3785 SETCLASS class:7
0 0 POLICE udp -- any any anywhere anywhere udp dpt:3785 POLICE mode:pkt rate:2000 burst:2000
0 0 SETCLASS udp -- swp+ any anywhere anywhere udp dpt:3784 SETCLASS class:7
0 0 POLICE udp -- any any anywhere anywhere udp dpt:3784 POLICE mode:pkt rate:2000 burst:2000
0 0 SETCLASS udp -- swp+ any anywhere anywhere udp dpt:4784 SETCLASS class:7
0 0 POLICE udp -- any any anywhere anywhere udp dpt:4784 POLICE mode:pkt rate:2000 burst:2000
0 0 SETCLASS ospf -- swp+ any anywhere anywhere SETCLASS class:7
0 0 POLICE ospf -- any any anywhere anywhere POLICE mode:pkt rate:2000 burst:2000
0 0 SETCLASS tcp -- swp+ any anywhere anywhere tcp dpt:bgp SETCLASS class:7
0 0 POLICE tcp -- any any anywhere anywhere tcp dpt:bgp POLICE mode:pkt rate:2000 burst:2000
0 0 SETCLASS tcp -- swp+ any anywhere anywhere tcp spt:bgp SETCLASS class:7
0 0 POLICE tcp -- any any anywhere anywhere tcp spt:bgp POLICE mode:pkt rate:2000 burst:2000
0 0 SETCLASS tcp -- swp+ any anywhere anywhere tcp dpt:5342 SETCLASS class:7
0 0 POLICE tcp -- any any anywhere anywhere tcp dpt:5342 POLICE mode:pkt rate:2000 burst:2000
0 0 SETCLASS tcp -- swp+ any anywhere anywhere tcp spt:5342 SETCLASS class:7
0 0 POLICE tcp -- any any anywhere anywhere tcp spt:5342 POLICE mode:pkt rate:2000 burst:2000
0 0 SETCLASS icmp -- swp+ any anywhere anywhere SETCLASS class:2
1 84 POLICE icmp -- any any anywhere anywhere POLICE mode:pkt rate:100 burst:40
0 0 SETCLASS udp -- swp+ any anywhere anywhere udp dpts:bootps:bootpc SETCLASS class:2
0 0 POLICE udp -- any any anywhere anywhere udp dpt:bootps POLICE mode:pkt rate:100 burst:100
0 0 POLICE udp -- any any anywhere anywhere udp dpt:bootpc POLICE mode:pkt rate:100 burst:100
0 0 SETCLASS tcp -- swp+ any anywhere anywhere tcp dpts:bootps:bootpc SETCLASS class:2
0 0 POLICE tcp -- any any anywhere anywhere tcp dpt:bootps POLICE mode:pkt rate:100 burst:100
0 0 POLICE tcp -- any any anywhere anywhere tcp dpt:bootpc POLICE mode:pkt rate:100 burst:100
0 0 SETCLASS udp -- swp+ any anywhere anywhere udp dpt:10001 SETCLASS class:3
0 0 POLICE udp -- any any anywhere anywhere udp dpt:10001 POLICE mode:pkt rate:2000 burst:2000
0 0 SETCLASS igmp -- swp+ any anywhere anywhere SETCLASS class:6
1 32 POLICE igmp -- any any anywhere anywhere POLICE mode:pkt rate:300 burst:100
0 0 POLICE all -- swp+ any anywhere anywhere ADDRTYPE match dst-type LOCAL POLICE mode:pkt rate:1000 burst:1000 class:0
0 0 POLICE all -- swp+ any anywhere anywhere ADDRTYPE match dst-type IPROUTER POLICE mode:pkt rate:400 burst:100 class:0
0 0 SETCLASS all -- swp+ any anywhere anywhere SETCLASS class:0
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 DROP all -- swp+ any 240.0.0.0/5 anywhere
0 0 DROP all -- swp+ any loopback/8 anywhere
0 0 DROP all -- swp+ any base-address.mcast.net/8 anywhere
0 0 DROP all -- swp+ any 255.255.255.255 anywhere
Chain OUTPUT (policy ACCEPT 107 packets, 12590 bytes)
pkts bytes target prot opt in out source destination
TABLE mangle :
Chain PREROUTING (policy ACCEPT 172 packets, 17871 bytes)
pkts bytes target prot opt in out source destination
Chain INPUT (policy ACCEPT 172 packets, 17871 bytes)
pkts bytes target prot opt in out source destination
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 111 packets, 18134 bytes)
pkts bytes target prot opt in out source destination
Chain POSTROUTING (policy ACCEPT 111 packets, 18134 bytes)
pkts bytes target prot opt in out source destination
TABLE raw :
Chain PREROUTING (policy ACCEPT 173 packets, 17923 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 112 packets, 18978 bytes)
pkts bytes target prot opt in out source destination
--------------------------------
Listing rules of type ip6tables:
--------------------------------
TABLE filter :
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 DROP all swp+ any ip6-mcastprefix/8 anywhere
0 0 DROP all swp+ any ::/128 anywhere
0 0 DROP all swp+ any ::ffff:0.0.0.0/96 anywhere
0 0 DROP all swp+ any localhost/128 anywhere
0 0 POLICE udp swp+ any anywhere anywhere udp dpt:3785 POLICE mode:pkt rate:2000 burst:2000 class:7
0 0 POLICE udp swp+ any anywhere anywhere udp dpt:3784 POLICE mode:pkt rate:2000 burst:2000 class:7
0 0 POLICE udp swp+ any anywhere anywhere udp dpt:4784 POLICE mode:pkt rate:2000 burst:2000 class:7
0 0 POLICE ospf swp+ any anywhere anywhere POLICE mode:pkt rate:2000 burst:2000 class:7
0 0 POLICE tcp swp+ any anywhere anywhere tcp dpt:bgp POLICE mode:pkt rate:2000 burst:2000 class:7
0 0 POLICE tcp swp+ any anywhere anywhere tcp spt:bgp POLICE mode:pkt rate:2000 burst:2000 class:7
0 0 POLICE ipv6-icmp swp+ any anywhere anywhere ipv6-icmp router-solicitation POLICE mode:pkt rate:100 burst:100 class:2
0 0 POLICE ipv6-icmp swp+ any anywhere anywhere ipv6-icmp router-advertisement POLICE mode:pkt rate:500 burst:500 class:2
0 0 POLICE ipv6-icmp swp+ any anywhere anywhere ipv6-icmp neighbour-solicitation POLICE mode:pkt rate:400 burst:400 class:2
0 0 POLICE ipv6-icmp swp+ any anywhere anywhere ipv6-icmp neighbour-advertisement POLICE mode:pkt rate:400 burst:400 class:2
0 0 POLICE ipv6-icmp swp+ any anywhere anywhere ipv6-icmptype 130 POLICE mode:pkt rate:200 burst:100 class:6
0 0 POLICE ipv6-icmp swp+ any anywhere anywhere ipv6-icmptype 131 POLICE mode:pkt rate:200 burst:100 class:6
0 0 POLICE ipv6-icmp swp+ any anywhere anywhere ipv6-icmptype 132 POLICE mode:pkt rate:200 burst:100 class:6
0 0 POLICE ipv6-icmp swp+ any anywhere anywhere ipv6-icmptype 143 POLICE mode:pkt rate:200 burst:100 class:6
0 0 POLICE ipv6-icmp swp+ any anywhere anywhere POLICE mode:pkt rate:64 burst:40 class:2
0 0 POLICE udp swp+ any anywhere anywhere udp dpts:dhcpv6-client:dhcpv6-server POLICE mode:pkt rate:100 burst:100 class:2
0 0 POLICE tcp swp+ any anywhere anywhere tcp dpts:dhcpv6-client:dhcpv6-server POLICE mode:pkt rate:100 burst:100 class:2
0 0 POLICE all swp+ any anywhere anywhere ADDRTYPE match dst-type LOCAL POLICE mode:pkt rate:1000 burst:1000 class:0
0 0 POLICE all swp+ any anywhere anywhere ADDRTYPE match dst-type IPROUTER POLICE mode:pkt rate:400 burst:100 class:0
0 0 SETCLASS all swp+ any anywhere anywhere SETCLASS class:0
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 DROP all swp+ any ip6-mcastprefix/8 anywhere
0 0 DROP all swp+ any ::/128 anywhere
0 0 DROP all swp+ any ::ffff:0.0.0.0/96 anywhere
0 0 DROP all swp+ any localhost/128 anywhere
Chain OUTPUT (policy ACCEPT 5 packets, 408 bytes)
pkts bytes target prot opt in out source destination
TABLE mangle :
Chain PREROUTING (policy ACCEPT 7 packets, 718 bytes)
pkts bytes target prot opt in out source destination
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
TABLE raw :
Chain PREROUTING (policy ACCEPT 7 packets, 718 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
-------------------------------
Listing rules of type ebtables:
-------------------------------
TABLE filter :
Bridge table: filter
Bridge chain: INPUT, entries: 16, policy: ACCEPT
-d BGA -i swp+ -j setclass --class 7 , pcnt = 0 -- bcnt = 0
-d BGA -j police --set-mode pkt --set-rate 2000 --set-burst 2000 , pcnt = 0 -- bcnt = 0
-d 1:80:c2:0:0:2 -i swp+ -j setclass --class 7 , pcnt = 0 -- bcnt = 0
-d 1:80:c2:0:0:2 -j police --set-mode pkt --set-rate 2000 --set-burst 2000 , pcnt = 0 -- bcnt = 0
-d 1:80:c2:0:0:e -i swp+ -j setclass --class 6 , pcnt = 0 -- bcnt = 0
-d 1:80:c2:0:0:e -j police --set-mode pkt --set-rate 200 --set-burst 200 , pcnt = 0 -- bcnt = 0
-d 1:0:c:cc:cc:cc -i swp+ -j setclass --class 6 , pcnt = 0 -- bcnt = 0
-d 1:0:c:cc:cc:cc -j police --set-mode pkt --set-rate 200 --set-burst 200 , pcnt = 0 -- bcnt = 0
-p ARP -i swp+ -j setclass --class 2 , pcnt = 0 -- bcnt = 0
-p ARP -j police --set-mode pkt --set-rate 400 --set-burst 100 , pcnt = 0 -- bcnt = 0
-d 1:0:c:cc:cc:cd -i swp+ -j setclass --class 7 , pcnt = 0 -- bcnt = 0
-d 1:0:c:cc:cc:cd -j police --set-mode pkt --set-rate 2000 --set-burst 2000 , pcnt = 0 -- bcnt = 0
-p IPv4 -i swp+ -j ACCEPT , pcnt = 0 -- bcnt = 0
-p IPv6 -i swp+ -j ACCEPT , pcnt = 0 -- bcnt = 0
-i swp+ -j setclass --class 0 , pcnt = 0 -- bcnt = 0
-j police --set-mode pkt --set-rate 100 --set-burst 100 , pcnt = 0 -- bcnt = 0
Bridge chain: FORWARD, entries: 0, policy: ACCEPT
Bridge chain: OUTPUT, entries: 0, policy: ACCEPT
iptables
Set class: 7; Police: packet rate 2000, burst 2000; Source IP: Any; Destination IP: Any | Protocol: UDP/BFD Echo, UDP/BFD Control, UDP/BFD Multihop Control, OSPF, TCP/BGP (spt, dpt 179), TCP/MLAG (spt, dpt 5342)
Set class: 6; Police: rate 300, burst 100; Source IP: Any; Destination IP: Any | Protocol: IGMP
Set class: 2; Police: rate 100, burst 40; Source IP: Any; Destination IP: Any | Protocol: ICMP
Set class: 2; Police: rate 100, burst 100; Source IP: Any; Destination IP: Any | Protocol: UDP/bootpc, bootps
Set class: 0; Police: rate 1000, burst 1000; Source IP: Any; Destination IP: Any | ADDRTYPE match dst-type LOCAL. Note: LOCAL is any local address. A received packet whose destination matches a local IP address on the switch goes to the CPU.
Set class: 0; Police: rate 400, burst 100; Source IP: Any; Destination IP: Any | ADDRTYPE match dst-type IPROUTER. Note: IPROUTER is any unresolved address. On a layer 2/layer 3 boundary, a packet received from layer 3 must go to the CPU to ARP for the destination.
Set class: 0 | All
Set class is internal to the switch; it does not set any precedence bits.
ip6tables
Police: packet rate 1000, burst 1000; Source IPv6: Any; Destination IPv6: Any | ADDRTYPE match dst-type LOCAL. Note: LOCAL is any local address. A received packet whose destination matches a local IPv6 address on the switch goes to the CPU.
Set class: 0; Police: packet rate 400, burst 100 | ADDRTYPE match dst-type IPROUTER. Note: IPROUTER is an unresolved address. On a layer 2/layer 3 boundary, a packet received from layer 3 must go to the CPU to ARP for the destination.
Set class: 0 | All
Set class is internal to the switch; it does not set any precedence bits.
ebtables
Action/Value | Protocol/MAC Address
Set class: 7; Police: packet rate 2000, burst rate 2000; Any switch port input interface | BPDU, LACP, Cisco PVST
Set class: 6; Police: packet rate 200, burst rate 200; Any switch port input interface | LLDP, CDP
Set class: 2; Police: packet rate 400, burst rate 100; Any switch port input interface | ARP
Catch all: Allow all traffic; Any switch port input interface | IPv4, IPv6
Catch all (applied at end): Set class: 0; Police: packet rate 100, burst rate 100; Any switch port | All other
Set class is internal to the switch. It does not set any precedence bits.
Considerations
Due to a hardware limitation on Trident3 switches, certain broadcast packets that are VXLAN decapsulated and sent to the CPU do not hit the normal INPUT chain ACL rules installed with cl-acltool.
You can configure policers for broadcast packets in the /etc/cumulus/switchd.conf file. The policers configuration format and default value is shown below:
On Broadcom switches, a MAC address is learned on a bridge regardless of whether or not a received packet is dropped by an ACL. This is due to how the hardware learns MAC addresses and occurs before the ACL lookup. This can be a security or resource problem as the MAC address table has the potential to get filled with bogus MAC addresses; a malfunctioning host, network error, loop, or malicious attack on a shared layer 2 platform can create an outage for other hosts if the same MAC address is learned on another port.
To prevent this from happening, Cumulus Linux filters frames before MAC learning occurs. Because MAC addresses and their port/VLAN associations are known at configuration time, you can create static MAC addresses, then create ingress ACLs to whitelist traffic from these MAC addresses and drop traffic otherwise.
This feature is specific to switches on the Broadcom platform only; on switches with Mellanox Spectrum ASICs, the input port ACL does not have these issues when learning MAC addresses.
Create a configuration similar to the following, where you associate a port and VLAN with a given MAC address, adding each one to the bridge:
cumulus@switch:~$ net add bridge bridge vids 100,200,300
cumulus@switch:~$ net add bridge bridge pvid 1
cumulus@switch:~$ net add bridge bridge ports swp1-3
cumulus@switch:~$ net add bridge pre-up bridge fdb add 00:00:00:00:00:11 dev swp1 master static vlan 100
cumulus@switch:~$ net add bridge pre-up bridge fdb add 00:00:00:00:00:22 dev swp2 master static vlan 200
cumulus@switch:~$ net add bridge pre-up bridge fdb add 00:00:00:00:00:33 dev swp3 master static vlan 300
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
These commands create the following configuration in the /etc/network/interfaces file:
auto swp1
iface swp1
auto swp2
iface swp2
auto swp3
iface swp3
auto bridge
iface bridge
bridge-ports swp1 swp2 swp3
bridge-pvid 1
bridge-vids 100 200 300
bridge-vlan-aware yes
pre-up bridge fdb add 00:00:00:00:00:11 dev swp1 master static vlan 100
pre-up bridge fdb add 00:00:00:00:00:22 dev swp2 master static vlan 200
pre-up bridge fdb add 00:00:00:00:00:33 dev swp3 master static vlan 300
If you need to list many MAC addresses, you can run a script to create the same configuration. For example, create a script called macs.txt and put in the bridge fdb add commands for each MAC address you need to configure:
cumulus@switch:~$ net add bridge bridge vids 100,200,300
cumulus@switch:~$ net add bridge bridge pvid 1
cumulus@switch:~$ net add bridge bridge ports swp1-3
cumulus@switch:~$ net add bridge pre-up /etc/networks/macs.txt
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
These commands create the following configuration in the /etc/network/interfaces file:
auto swp1
iface swp1
auto swp2
iface swp2
auto swp3
iface swp3
auto swp4
iface swp4
auto swp5
iface swp5
auto swp6
iface swp6
auto bridge
iface bridge
bridge-ports swp1 swp2 swp3 swp4 swp5 swp6
bridge-pvid 1
bridge-vids 100 200 300
bridge-vlan-aware yes
pre-up bridge fdb add 00:00:00:00:00:11 dev swp1 master static vlan 100
pre-up bridge fdb add 00:00:00:00:00:22 dev swp2 master static vlan 200
pre-up bridge fdb add 00:00:00:00:00:33 dev swp3 master static vlan 300
pre-up bridge fdb add 00:00:00:00:00:44 dev swp4 master static vlan 400
pre-up bridge fdb add 00:00:00:00:00:55 dev swp5 master static vlan 500
pre-up bridge fdb add 00:00:00:00:00:66 dev swp6 master static vlan 600
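When the list of static MAC addresses is long, the bridge fdb add lines can be generated rather than typed. The following sketch builds one command per (MAC address, port, VLAN) entry in the same form as the pre-up lines shown above. The function name and the idea of generating the script from a Python list are assumptions for illustration.

```python
import re

MAC_RE = re.compile(r"^[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}$")


def fdb_commands(entries):
    """Build `bridge fdb add` commands from (mac, port, vlan) tuples."""
    commands = []
    for mac, port, vlan in entries:
        if not MAC_RE.match(mac):
            raise ValueError(f"invalid MAC address: {mac!r}")
        commands.append(
            f"bridge fdb add {mac} dev {port} master static vlan {vlan}"
        )
    return commands


entries = [
    ("00:00:00:00:00:11", "swp1", 100),
    ("00:00:00:00:00:22", "swp2", 200),
    ("00:00:00:00:00:33", "swp3", 300),
]
print("\n".join(fdb_commands(entries)))
```

The printed lines can be saved as the script that the bridge pre-up statement runs.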
Interactions with EVPN
If you are using EVPN, local static MAC addresses added to the local FDB are exported as static MAC addresses to remote switches. Remote MAC addresses are added as MAC addresses to the remote FDB.
Services and Daemons in Cumulus Linux
Services (also known as daemons) and processes are at the heart of how a Linux system functions. Most of the time, a service takes care of itself; you just enable and start it, then let it run. However, because a Cumulus Linux switch is a Linux system, you can dig deeper if you like. Services can start multiple processes as they run. Services are important to monitor on a Cumulus Linux switch.
You manage services in Cumulus Linux in the following ways:
Identify currently active or stopped services
Identify boot time state of a specific service
Disable or enable a specific service
Identify active listener ports
systemd and the systemctl Command
In general, you manage services using systemd via the systemctl command. You use it with any service on the switch to start, stop, restart, reload, enable, disable, reenable, or get the status of the service.
systemctl has a number of arguments that perform a specific operation on a given service.
status returns the status of the specified service.
start starts the service.
stop stops the service.
restart stops, then starts the service, all the while maintaining state. If there are dependent services or services that mark the restarted service as Required, the other services also restart. For example, running systemctl restart frr.service restarts any of the routing protocol services that are enabled and running, such as bgpd or ospfd.
reload reloads the configuration for the service.
enable enables the service to start when the system boots, but does not start it unless you use the systemctl start SERVICENAME.service command or reboot the switch.
disable disables the service, but does not stop it unless you use the systemctl stop SERVICENAME.service command or reboot the switch. You can start or stop a disabled service.
reenable disables, then enables a service. You might need to do this so that any new Wants or WantedBy lines create the symlinks necessary for ordering. This has no side effects on other services.
There is often little reason to interact with the services directly using these commands. If a critical service crashes or encounters an error, it is automatically restarted by systemd. systemd is effectively the caretaker of services in modern Linux systems and is responsible for starting all the necessary services at boot time.
Ensure a Service Starts after Multiple Restarts
By default, systemd is configured to try to restart a particular service only a certain number of times within a given interval before the service fails to start at all. The settings, StartLimitInterval (which defaults to 10 seconds) and StartLimitBurst (which defaults to 5 attempts), are stored in the service unit file; however, many services override these defaults, sometimes with much longer times. For example, switchd.service sets StartLimitInterval=10m and StartLimitBurst=3; therefore, if you restart switchd more than 3 times in 10 minutes, it does not start.
When the restart fails for this reason, you see a message similar to the following:
Job for switchd.service failed. See 'systemctl status switchd.service' and 'journalctl -xn' for details.
systemctl status switchd.service shows output similar to:
Active: failed (Result: start-limit) since Thu 2016-04-07 21:55:14 UTC; 15s ago
To clear this error, run systemctl reset-failed switchd.service. If you know you are going to restart frequently (multiple times within the StartLimitInterval), you can run the same command before you issue the restart request. This also applies to stop followed by start.
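The interaction between the interval, the burst count, and reset-failed can be pictured with a small model. The sketch below is a simplified fixed-window counter written for illustration; the class name and its methods are assumptions, and it is not systemd's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartLimiter:
    """Toy model of a start limit: at most `burst` starts per interval."""

    interval_s: float
    burst: int
    window_start: Optional[float] = None
    starts_in_window: int = 0

    def try_start(self, now: float) -> bool:
        """Return True if a start at time `now` (seconds) is allowed."""
        if self.window_start is None or now - self.window_start > self.interval_s:
            # No window yet, or the previous one expired: open a new window.
            self.window_start = now
            self.starts_in_window = 1
            return True
        if self.starts_in_window < self.burst:
            self.starts_in_window += 1
            return True
        return False  # start limit hit

    def reset_failed(self) -> None:
        """Model of `systemctl reset-failed`: forget the counted starts."""
        self.window_start = None
        self.starts_in_window = 0


# Values from the switchd example: 3 starts per 10 minutes (600 seconds).
limiter = StartLimiter(interval_s=600, burst=3)
print([limiter.try_start(t) for t in (0, 100, 200, 300)])  # [True, True, True, False]
limiter.reset_failed()
print(limiter.try_start(310))  # True
```

In this model the fourth start inside the 10-minute window is refused, and clearing the counter allows the next start immediately.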
Keep systemd Services from Hanging after Starting
If you start, restart, or reload any systemd service that can be started from another systemd service, you must use the --no-block option with systemctl. Otherwise, that service or even the switch itself might hang after starting or restarting.
Identify Active Listener Ports for IPv4 and IPv6
You can identify the active listener ports under both IPv4 and IPv6 using the netstat command:
To determine which services are currently active or stopped, run the cl-service-summary command:
cumulus@switch:~$ cl-service-summary
Service cron enabled active
Service ssh enabled active
Service syslog enabled active
Service asic-monitor enabled inactive
Service clagd enabled inactive
Service cumulus-poe inactive
Service lldpd enabled active
Service mstpd enabled active
Service neighmgrd enabled active
Service netd enabled active
Service netq-agent enabled active
Service ntp enabled active
Service portwd enabled active
Service ptmd enabled active
Service pwmd enabled active
Service smond enabled active
Service switchd enabled active
Service sysmonitor enabled active
Service rdnbrd disabled inactive
Service frr enabled inactive
...
You can also run the systemctl list-unit-files --type service command to list all services on the switch and see which ones are enabled:
The following table lists the most important services in Cumulus Linux.
Service Name | Description | Affects Forwarding?
switchd | Hardware abstraction daemon. Synchronizes the kernel with the ASIC. | YES
sx_sdk | Interfaces with the Spectrum ASIC. Only on Spectrum switches. | YES
portwd | Port watch daemon. Broadcom switches only. Reads pluggable information over the I2C bus. Identifies and classifies the modules that are inserted into the system. Manages settings related to the module types that are inserted. | YES, eventually, if modules are added or removed
frr | FRRouting. Handles routing protocols. There are separate processes for each routing protocol, such as bgpd and ospfd. |
switchd is the daemon at the heart of Cumulus Linux. It communicates between the switch and Cumulus Linux, and all the applications running on Cumulus Linux.
The switchd configuration is stored in /etc/cumulus/switchd.conf.
The switchd File System
switchd also exports a file system, mounted on /cumulus/switchd, that presents all the switchd configuration options as a series of files arranged in a tree structure. To show the contents, run the tree /cumulus/switchd command. The following example shows output for a switch with one switch port configured:
To configure the switchd parameters, edit the /etc/cumulus/switchd.conf file. An example is provided below.
cumulus@switch:~$ sudo nano /etc/cumulus/switchd.conf
#
# /etc/cumulus/switchd.conf - switchd configuration file
#
# Statistic poll interval (in msec)
#stats.poll_interval = 2000
# Buffer utilization poll interval (in msec), 0 means disable
#buf_util.poll_interval = 0
# Buffer utilization measurement interval (in mins)
#buf_util.measure_interval = 0
# Optimize ACL HW resources for better utilization
#acl.optimize_hw = FALSE
# Enable Flow based mirroring.
#acl.flow_based_mirroring = TRUE
# Enable non atomic acl update
acl.non_atomic_update_mode = FALSE
# Send ARPs for next hops
#arp.next_hops = TRUE
# Kernel routing table ID, range 1 - 2^31, default 254
#route.table = 254
...
When you update the /etc/cumulus/switchd.conf file, you must restart switchd for the changes to take effect. See Restart switchd, below.
Restart switchd
Whenever you modify a switchd hardware configuration file (for example, you update any *.conf file that requires making a change to the switching hardware, like /etc/cumulus/datapath/traffic.conf), you must restart the switchd service for the change to take effect:
cumulus@switch:~$ sudo systemctl restart switchd.service
You do not have to restart the switchd service when you update a network interface configuration (for example, when you edit the /etc/network/interfaces file).
Restarting the switchd service causes all network ports to reset in addition to resetting the switch hardware configuration. NVIDIA recommends that you reboot the switch instead of restarting the switchd service to minimize traffic impact when redundant switches are present with MLAG.
Power over Ethernet - PoE
Cumulus Linux supports Power over Ethernet (PoE) and PoE+, so certain Cumulus Linux switches can supply power from Ethernet switch ports to enabled devices over the Ethernet cables that connect them. PoE is capable of powering devices up to 15W, while PoE+ can power devices up to 30W. Configuration for power negotiation is done over LLDP.
PoE functionality is provided by the cumulus-poe package. When a powered device is connected to the switch via an Ethernet cable:
If the available power is greater than the power required by the connected device, power is supplied to the switch port, and the device powers on
If available power is less than the power required by the connected device and the switch port’s priority is less than the port priority set on all powered ports, power is not supplied to the port
If available power is less than the power required by the connected device and the switch port’s priority is greater than the priority of a currently powered port, power is removed from lower priority port(s) and power is supplied to the port
If the total consumed power exceeds the configured power limit of the power source, low priority ports are turned off. In the case of a tie, the port with the lower port number gets priority
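The admission rules in this list can be sketched as a small decision function. This is a simplified model for illustration only, not the poed implementation: the function name and data layout are hypothetical, power that exactly equals the requirement is treated as sufficient, and the lower-port-number tie-break is applied when choosing which equal-priority port loses power.

```python
PRIORITY_RANK = {"low": 0, "high": 1, "critical": 2}


def admit_port(budget_w, powered, priority, need_w):
    """Decide whether a newly connected device can be powered.

    powered maps port number -> (priority, watts drawn).
    Returns (granted, ports_to_power_off).
    """
    available = budget_w - sum(watts for _prio, watts in powered.values())
    if need_w <= available:
        return True, []
    # Only ports with a strictly lower priority may lose power. Shed the
    # lowest priority first; within one priority, the higher port number
    # goes first, so the lower port number wins a tie.
    candidates = sorted(
        (port for port, (prio, _watts) in powered.items()
         if PRIORITY_RANK[prio] < PRIORITY_RANK[priority]),
        key=lambda port: (PRIORITY_RANK[powered[port][0]], -port),
    )
    shed = []
    for port in candidates:
        shed.append(port)
        available += powered[port][1]
        if need_w <= available:
            return True, shed
    return False, []


powered = {1: ("low", 30.0), 2: ("low", 25.0)}
print(admit_port(60.0, powered, "high", 30.0))  # (True, [2])
print(admit_port(60.0, powered, "low", 30.0))   # (False, [])
```

In the first call, the high-priority device displaces port 2, the higher-numbered of the two low-priority ports. In the second call, no powered port has a lower priority, so the device is denied power.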
Power is available as follows:
PSU 1 | PSU 2 | PoE Power Budget
920W | x | 750W
x | 920W | 750W
920W | 920W | 1650W
The AS4610-54P has an LED on the front panel to indicate PoE status:
Green: The poed daemon is running and no errors are detected
Yellow: One or more errors are detected or the poed daemon is not running
Link state and PoE state are completely independent of each other. When a link is brought down on a particular port using ip link set <port> down, power on that port is not turned off; however, LLDP negotiation is not possible.
Configure PoE
You use the poectl command utility to configure PoE on a switch that supports the feature. You can:
Enable or disable PoE for a given switch port
Set a switch port’s PoE priority to one of three values: low, high or critical
The PoE configuration resides in /etc/cumulus/poe.conf. The file lists all the switch ports, whether PoE is enabled for those ports and the priority for each port.
By default, PoE and PoE+ are enabled on all Ethernet/1G switch ports, and these ports are set with a low priority. Switch ports can have low, high or critical priority.
There is no additional configuration for PoE+.
To change the priority for one or more switch ports, run poectl -p swp# [low|high|critical]. For example:
cumulus@switch:~$ sudo poectl -p swp1-swp5,swp7 high
To disable PoE for one or more ports, run poectl -d [port_numbers]:
cumulus@switch:~$ sudo poectl -d swp1-swp5,swp7
To display PoE information for a set of switch ports, run poectl -i [port_numbers]:
cumulus@switch:~$ sudo poectl -i swp10-swp13
Port Status Allocated Priority PD type PD class Voltage Current Power
----- -------------------- ----------- -------- ----------- -------- ------- ------- ---------
swp10 connected negotiating low IEEE802.3at 4 53.5 V 25 mA 3.9 W
swp11 searching n/a low IEEE802.3at none 0.0 V 0 mA 0.0 W
swp12 connected n/a low IEEE802.3at 2 53.5 V 25 mA 1.4 W
swp13 connected 51.0 W low IEEE802.3at 4 53.6 V 72 mA 3.8 W
The Status can be one of the following:
searching: PoE is enabled but no device has been detected.
disabled: The PoE port has been configured as disabled.
connected: A powered device is connected and receiving power.
power-denied: There is insufficient PoE power available to enable the connected device.
The Allocated column displays how much PoE power has been allocated to the port, which can be one of the following:
n/a: No device is connected or the connected device does not support LLDP negotiation.
negotiating: An LLDP-capable device is connected and is negotiating for PoE power.
XX.X W: An LLDP-capable device has negotiated for XX.X watts of power (for example, 51.0 watts for swp13 above).
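For scripting against this output, one row of the table can be split into fields as in the sketch below. It assumes the column layout shown in the example above, where only the Allocated column can span more than one whitespace-separated token (for example, 51.0 W). The function name is hypothetical, and the poectl -j option (JSON output) is likely a more robust source for scripts; its schema is not shown in this guide, so the sketch parses the text form.

```python
def parse_poectl_row(line: str) -> dict:
    """Split one data row of `poectl -i` output into named fields.

    The last nine tokens are fixed: priority, PD type, PD class, then
    value/unit pairs for voltage, current, and power. Everything between
    the status and the priority belongs to the Allocated column.
    """
    tokens = line.split()
    if len(tokens) < 12:
        raise ValueError(f"unexpected row format: {line!r}")
    return {
        "port": tokens[0],
        "status": tokens[1],
        "allocated": " ".join(tokens[2:-9]),
        "priority": tokens[-9],
        "pd_type": tokens[-8],
        "pd_class": tokens[-7],
        "voltage_v": float(tokens[-6]),
        "current_ma": float(tokens[-4]),
        "power_w": float(tokens[-2]),
    }


row = "swp13 connected 51.0 W low IEEE802.3at 4 53.6 V 72 mA 3.8 W"
info = parse_poectl_row(row)
print(info["port"], info["allocated"], info["power_w"])
```

Parsing from the right-hand side keeps the fields aligned whether the Allocated column holds n/a, negotiating, or a wattage.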
To see all the PoE information for a switch, run poectl -s:
cumulus@switch:~$ poectl -s
System power:
Total: 730.0 W
Used: 11.0 W
Available: 719.0 W
Connected ports:
swp11, swp24, swp27, swp48
The set commands (priority, enable, disable) either succeed silently or display an error message if the command fails.
The poectl command takes the following arguments:
Argument | Description
-h, --help | Show this help message and exit.
-i, --port-info <port-list> | Returns detailed information for the specified ports. You can specify a range of ports. For example: -i swp1-swp5,swp10. Note: On an Edge-Core AS4610-54P switch, the voltage reported by the poectl -i command and measured through a power meter connected to the device varies by 5V. The current and power readings are correct and no difference is seen for them.
-a, --all | Returns PoE status and detailed information for all ports.
-p, --priority <port-list> <priority> | Sets priority for the specified ports: low, high, critical.
-d, --disable-ports <port-list> | Disables PoE operation on the specified ports.
-e, --enable-ports <port-list> | Enables PoE operation on the specified ports.
-s, --system | Returns PoE status for the entire switch.
-r, --reset <port-list> | Performs a hardware reset on the specified ports. Use this if one or more ports are stuck in an error state. This does not reset any configuration settings for the specified ports.
-v, --version | Displays version information.
-j, --json | Displays output in JSON format.
--save | Saves the current configuration. The saved configuration is automatically loaded on system boot.
--load | Loads and applies the saved configuration.
Troubleshooting
You can troubleshoot PoE and PoE+ using the following utilities and files:
poectl -s, as described above.
The Cumulus Linux cl-support script, which includes PoE-related output from poed.conf, syslog, poectl --diag-info and lldpctl.
lldpcli show neighbors ports <swp> protocol lldp hidden details
tcpdump -v -v -i <swp> ether proto 0x88cc
The contents of the PoE/PoE+ /etc/lldpd.d/poed.conf configuration file, as described above.
Verify the Link Is Up
LLDP requires network connectivity, so verify that the link is up.
cumulus@switch:~$ net show interface swp20
Name MAC Speed MTU Mode
-- ------ ----------------- ------- ----- ---------
UP swp20 44:38:39:00:00:04 1G 9216 Access/L2
View LLDP Information Using lldpcli
You can run lldpcli to view the LLDP information that has been received on a switch port. For example:
cumulus@switch:~$ sudo lldpcli show neighbors ports swp20 protocol lldp hidden details
-------------------------------------------------------------------------------
LLDP neighbors:
-------------------------------------------------------------------------------
Interface: swp20, via: LLDP, RID: 2, Time: 0 day, 00:03:34
Chassis:
ChassisID: mac 68:c9:0b:25:54:7c
SysName: ihm-ubuntu
SysDescr: Ubuntu 14.04.2 LTS Linux 3.14.4+ #1 SMP Thu Jun 26 00:54:44 UTC 2014 armv7l
MgmtIP: fe80::6ac9:bff:fe25:547c
Capability: Bridge, off
Capability: Router, off
Capability: Wlan, off
Capability: Station, on
Port:
PortID: mac 68:c9:0b:25:54:7c
PortDescr: eth0
PMD autoneg: supported: yes, enabled: yes
Adv: 10Base-T, HD: yes, FD: yes
Adv: 100Base-TX, HD: yes, FD: yes
MAU oper type: 100BaseTXFD - 2 pair category 5 UTP, full duplex mode
MDI Power: supported: yes, enabled: yes, pair control: no
Device type: PD
Power pairs: spare
Class: class 4
Power type: 2
Power Source: Primary power source
Power Priority: low
PD requested power Value: 51000
PSE allocated power Value: 51000
UnknownTLVs:
TLV: OUI: 00,01,42, SubType: 1, Len: 1 05
TLV: OUI: 00,01,42, SubType: 1, Len: 1 0D
-------------------------------------------------------------------------------
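When troubleshooting a PoE negotiation, the two values of interest in this output are the PD requested power and the PSE allocated power. The following sketch compares them. The helper name lldp_power_check is hypothetical, and the sketch assumes the two field labels appear exactly as in the sample output above.

```shell
# Hypothetical helper: compare the requested and allocated PoE power
# values found in lldpcli "details" output read on stdin.
lldp_power_check() {
    awk '
        /PD requested power Value:/  { req = $NF }
        /PSE allocated power Value:/ { alloc = $NF }
        END {
            if (req == "" || alloc == "") {
                print "no PoE power values found"
                exit 1
            }
            if (req + 0 == alloc + 0)
                print "allocated matches requested: " req
            else
                print "mismatch: requested " req ", allocated " alloc
        }'
}

# Example using the two relevant lines from the sample output above.
lldp_power_check <<'EOF'
PD requested power Value: 51000
PSE allocated power Value: 51000
EOF
```

The helper returns a non-zero exit status only when either value is missing, so a mismatch still has to be detected from the printed text.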
View LLDP Information Using tcpdump
You can use tcpdump to view the LLDP frames being transmitted and received. For example:
cumulus@switch:~$ sudo tcpdump -v -v -i swp20 ether proto 0x88cc
tcpdump: listening on swp20, link-type EN10MB (Ethernet), capture size 262144 bytes
18:41:47.559022 LLDP, length 211
Chassis ID TLV (1), length 7
Subtype MAC address (4): 00:30:ab:f2:d7:a5 (oui Unknown)
0x0000: 0400 30ab f2d7 a5
Port ID TLV (2), length 6
Subtype Interface Name (5): swp20
0x0000: 0573 7770 3230
Time to Live TLV (3), length 2: TTL 120s
0x0000: 0078
System Name TLV (5), length 13: dni-3048up-09
0x0000: 646e 692d 3330 3438 7570 2d30 39
System Description TLV (6), length 68
Cumulus Linux version 3.0.1~1466303042.2265c10 running on dni 3048up
0x0000: 4375 6d75 6c75 7320 4c69 6e75 7820 7665
0x0010: 7273 696f 6e20 332e 302e 317e 3134 3636
0x0020: 3330 3330 3432 2e32 3236 3563 3130 2072
0x0030: 756e 6e69 6e67 206f 6e20 646e 6920 3330
0x0040: 3438 7570
System Capabilities TLV (7), length 4
System Capabilities [Bridge, Router] (0x0014)
Enabled Capabilities [Router] (0x0010)
0x0000: 0014 0010
Management Address TLV (8), length 12
Management Address length 5, AFI IPv4 (1): 10.0.3.190
Interface Index Interface Numbering (2): 2
0x0000: 0501 0a00 03be 0200 0000 0200
Management Address TLV (8), length 24
Management Address length 17, AFI IPv6 (2): fe80::230:abff:fef2:d7a5
Interface Index Interface Numbering (2): 2
0x0000: 1102 fe80 0000 0000 0000 0230 abff fef2
0x0010: d7a5 0200 0000 0200
Port Description TLV (4), length 5: swp20
0x0000: 7377 7032 30
Organization specific TLV (127), length 9: OUI IEEE 802.3 Private (0x00120f)
Link aggregation Subtype (3)
aggregation status [supported], aggregation port ID 0
0x0000: 0012 0f03 0100 0000 00
Organization specific TLV (127), length 9: OUI IEEE 802.3 Private (0x00120f)
MAC/PHY configuration/status Subtype (1)
autonegotiation [supported, enabled] (0x03)
PMD autoneg capability [10BASE-T fdx, 100BASE-TX fdx, 1000BASE-T fdx] (0x2401)
MAU type 100BASEFX fdx (0x0012)
0x0000: 0012 0f01 0324 0100 12
Organization specific TLV (127), length 12: OUI IEEE 802.3 Private (0x00120f)
Power via MDI Subtype (2)
MDI power support [PSE, supported, enabled], power pair spare, power class class4
0x0000: 0012 0f02 0702 0513 01fe 01fe
Organization specific TLV (127), length 5: OUI Unknown (0x000142)
0x0000: 0001 4201 0d
Organization specific TLV (127), length 5: OUI Unknown (0x000142)
0x0000: 0001 4201 01
End TLV (0), length 0
Log poed Events in syslog
The poed service logs an event to syslog when:
A switch provides power to a powered device.
A device that was receiving power is removed.
The power available to the switch changes.
Errors are detected.
Configuring a Global Proxy
You configure global HTTP and HTTPS proxies in the /etc/profile.d/ directory of Cumulus Linux. To do so, set the http_proxy and https_proxy variables, which tell the switch the address of the proxy server to use to fetch URLs on the command line. This is useful for programs such as apt/apt-get, curl and wget, which can all use this proxy.
In a terminal, create a new file in the /etc/profile.d/ directory. In the code example below, the file is called proxy.sh, and is created using the text editor nano.
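As a minimal sketch, a proxy.sh file opened with sudo nano /etc/profile.d/proxy.sh might contain the lines below. The proxy host name and port are placeholders chosen for illustration, not values from this guide; substitute the address of your own proxy server.

```shell
# Sketch of /etc/profile.d/proxy.sh.
# The proxy URL is a placeholder; replace it with your proxy server.
http_proxy="http://proxy.example.com:8080"
https_proxy="http://proxy.example.com:8080"

# Export both variables so that child processes (apt, curl, wget) see them.
export http_proxy https_proxy
```

Files in /etc/profile.d/ are read by login shells, so the variables take effect in new login sessions; in an existing session you can load them with the shell's source command.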
Create a file in the /etc/apt/apt.conf.d directory and add the following lines to the file for acquiring the HTTP and HTTPS proxies; the example below uses http_proxy as the file name:
Cumulus Linux implements an HTTP application programming interface to NCLU. Instead of accessing Cumulus Linux using SSH, you can interact with the switch using an HTTP client, such as cURL, HTTPie or a web browser.
HTTP API Basics
The supporting software for the API is installed with Cumulus Linux.
To use the REST API, you must enable nginx on the switch:
To configure the HTTP API services, edit the /etc/nginx/sites-available/nginx-restapi.conf configuration file, enter the IP address on which the REST API listens, and then run the sudo systemctl restart nginx command.
IP and Port Settings
You can modify the IP:port combinations to which services listen by changing the parameters of the listen directives. By default, nginx-restapi.conf has only one listen parameter.
All URLs must use HTTPS instead of HTTP.
For more information on the listen directive, refer to the NGINX documentation.
Configure Security
Authentication
The default configuration requires all HTTP requests from external sources (not internal switch traffic) to set the HTTP Basic Authentication header.
The user and password must correspond to a user on the host switch.
Transport Layer Security
All traffic must be secured in transport using TLSv1.2 by default. Cumulus Linux contains a self-signed certificate and private key used server-side in this application so that it works out of the box, but NVIDIA recommends you use your own certificates and keys. Certificates must be in the PEM format.
Do not copy the cumulus.pem or cumulus.key files. After installation, edit the ssl_certificate and ssl_certificate_key values in the configuration file for your hardware.
cURL Examples
This section includes several example cURL commands you can use to send HTTP requests to a host. The following settings are used for these examples:
Username: user
Password: pw
IP: 192.168.0.32
Port: 8080
Requests for NCLU require the Content-Type request header to be set to application/json.
The cURL -k flag is necessary when the server uses a self-signed certificate. This is the default configuration (see the Security section). To display the response headers, include the -D flag in the command.
To retrieve a list of all available HTTP endpoints:
cumulus@switch:~$ curl -X GET -k -u user:pw https://192.168.0.32:8080
To run net show counters on the host as a remote procedure call:
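A minimal sketch of such a request, using the example credentials, IP address and port listed above. The /nclu/v1/rpc path and the "cmd" JSON key are assumptions here; confirm them against the endpoint list returned by the GET request above. The function names nclu_payload and nclu_rpc are hypothetical.

```shell
# Build the JSON body for an NCLU request. The "cmd" key is an assumption.
# The command text is inserted as-is, so it must not contain double quotes.
nclu_payload() {
    printf '{"cmd": "%s"}' "$1"
}

# Send an NCLU command (without the leading "net") to the example host.
# The /nclu/v1/rpc path is an assumption; -k accepts the self-signed
# certificate and -u supplies the HTTP Basic Authentication credentials.
nclu_rpc() {
    curl -X POST -k -u user:pw \
        -H "Content-Type: application/json" \
        -d "$(nclu_payload "$1")" \
        https://192.168.0.32:8080/nclu/v1/rpc
}

# Usage:
#   nclu_rpc "show counters"
```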
The /etc/restapi.conf file is not listed in the net show configuration files command output.
Smart System Manager
Use Smart System Manager, also known as ISSU, to upgrade and troubleshoot an active switch with minimal disruption to the network.
Smart System Manager includes the following modes:
Restart
Upgrade
Maintenance
The Smart System Manager is supported on Spectrum 1, 2 and 3 ASICs only.
The Smart System Manager NCLU commands do not require a net commit.
Requirements
The Smart System Manager requires the kexec-tools package, which is installed on the switch when you install a new Cumulus Linux image. However, upgrading the switch with apt-get does not install the kexec-tools package.
To verify that the kexec-tools package is installed on the switch, run the following command:
cumulus@switch:~$ net show package version
To install the kexec-tools package, run the following commands:
You can restart the switch in one of the following modes.
cold completely restarts the system and resets all the hardware devices on the switch (including the switching ASIC).
fast restarts the system more efficiently with minimal impact to traffic by reloading the kernel and software stack without a hard reset of the hardware. During a fast restart, the system is decoupled from the network to the extent possible using existing protocol extensions before recovering to the operational mode of the system. The forwarding entries of the switching ASIC are maintained through the restart process and the data plane is not affected. The data plane is only interrupted when switchd resets and reconfigures the ASIC if the SDK is upgraded. Traffic outage is significantly lower in this mode.
The following command restarts the system in cold mode:
cumulus@switch:~$ net system maintenance restart cold
cumulus@switch:~$ sudo csmgrctl -c
The following command restarts the system in fast mode:
cumulus@switch:~$ net system maintenance restart fast
cumulus@switch:~$ sudo csmgrctl -f
Upgrade Mode
Upgrade mode updates all the components and services on the switch to the latest Cumulus Linux release without traffic loss. After the upgrade is complete, you must restart the switch with either a cold or fast restart.
Upgrade mode includes the following options:
all runs apt-get upgrade to upgrade all the system components to the latest release without affecting traffic flow. You must restart the system after the upgrade completes with one of the restart modes.
dry-run provides information on the components that will be upgraded.
The following command upgrades all the system components:
cumulus@switch:~$ net system maintenance upgrade all
cumulus@switch:~$ sudo csmgrctl -u
The following command provides information on the components that will be upgraded:
cumulus@switch:~$ net system maintenance upgrade dry-run
cumulus@switch:~$ sudo csmgrctl -d
Maintenance Mode
Maintenance mode isolates the system from the rest of the network so that you can perform intrusive troubleshooting tasks and data collection or make system changes, such as breaking out ports or replacing optics or cables, with minimal disruption.
Depending on your configuration and network topology, complete isolation might not be possible.
Enable Maintenance Mode
Run the following command to enable maintenance mode. When maintenance mode is enabled, Smart System Manager performs a graceful BGP shutdown, redirects traffic over the peerlink and brings down the MLAG port link. switchd maintains full capability.
cumulus@switch:~$ net system maintenance mode enable
cumulus@switch:~$ sudo csmgrctl -m1
You can run additional commands to bring all the ports down, then up to restore the port admin state.
cumulus@switch:~$ net system maintenance ports down
cumulus@switch:~$ net system maintenance ports up
Before you disable maintenance mode, be sure to bring the ports back up.
Disable Maintenance Mode
Run the following command to disable maintenance mode and restore normal operation. When maintenance mode is disabled, Smart System Manager performs a soft restart, runs a BGP graceful restart, and brings the MLAG port link back up. switchd maintains full capability.
cumulus@switch:~$ net system maintenance mode disable
cumulus@switch:~$ sudo csmgrctl -m0
Show Maintenance Mode Status
To see if maintenance mode is enabled or disabled, run the NCLU net system maintenance show status command or the Linux sudo csmgrctl -s command. For example:
cumulus@switch:~$ net system maintenance show status
Current System Mode: Maintenance since Tue Jan 5 00:13:37 2021 (Duration: 00:00:31)
Boot Mode: reboot_cold
2 registered modules
frr : Maintenance, down
switchd : Maintenance, down
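An automation script can read the current mode from the first line of this output before it makes intrusive changes. The sketch below assumes the Current System Mode line keeps the layout shown above; the helper name current_system_mode is hypothetical.

```shell
# Hypothetical helper: print the word that follows "Current System Mode:"
# in maintenance status output read on stdin.
current_system_mode() {
    awk '$1 == "Current" && $2 == "System" && $3 == "Mode:" { print $4 }'
}

# Example using the first line of the sample output above.
current_system_mode <<'EOF'
Current System Mode: Maintenance since Tue Jan 5 00:13:37 2021 (Duration: 00:00:31)
EOF
```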
Layer 1 and Switch Ports
This section discusses how to configure network interfaces and DHCP delays and servers. The Prescriptive Topology Manager (PTM) cabling verification tool is also discussed.
Interface Configuration and Management
ifupdown is the network interface manager for Cumulus Linux. Cumulus Linux uses an updated version of this tool, ifupdown2.
By default, ifupdown is quiet. Use the verbose option (-v) to show commands as they are executed when bringing an interface down or up.
Basic Commands
To bring up the physical connection to an interface or apply changes to an existing interface, run the sudo ifup <interface> command. The following example command brings up the physical connection to swp1:
cumulus@switch:~$ sudo ifup swp1
To bring down the physical connection to a single interface, run the sudo ifdown <interface> command. The following example command brings down the physical connection to swp1:
cumulus@switch:~$ sudo ifdown swp1
The ifdown command always deletes logical interfaces after bringing them down. When you bring down the physical connection to an interface, it is brought back up automatically after any future reboots or configuration changes with ifreload -a.
To bring an interface up or down administratively (for example, to bring down a port, bridge, or bond without bringing down its physical connection), use the --admin-state option. Alternatively, you can use NCLU commands.
When you put an interface into an admin down state, the interface remains down after any future reboots or configuration changes with ifreload -a.
To put an interface into an admin down state, run the net add interface <interface> link down command.
cumulus@switch:~$ net add interface swp1 link down
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
These commands create the following configuration in the /etc/network/interfaces file:
auto swp1
iface swp1
link-down yes
To bring the interface back up, run the net del interface <interface> link down command.
cumulus@switch:~$ net del interface swp1 link down
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
To put an interface into an admin down state, run the sudo ifdown <interface> --admin-state command:
cumulus@switch:~$ sudo ifdown swp1 --admin-state
These commands create the following configuration in the /etc/network/interfaces file:
auto swp1
iface swp1
link-down yes
To bring the interface back up, run the sudo ifup <interface> --admin-state command:
cumulus@switch:~$ sudo ifup swp1 --admin-state
To see the link and administrative state, use the ip link show command. In the following example, swp1 is administratively UP and the physical link is UP (LOWER_UP flag).
cumulus@switch:~$ ip link show dev swp1
3: swp1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 500
link/ether 44:38:39:00:03:c1 brd ff:ff:ff:ff:ff:ff
For additional information on interface administrative state and physical state, refer to this knowledge base article.
ifupdown2 Interface Classes
ifupdown2 enables you to group interfaces into separate classes, where a class is a user-defined label that groups interfaces that share a common function (such as uplink, downlink or compute). You specify classes in the /etc/network/interfaces file.
The most common class is auto, which you configure like this:
auto swp1
iface swp1
You can add other classes using the allow prefix. For example, if you have multiple interfaces used for uplinks, you can define a class called uplinks:
auto swp1
allow-uplinks swp1
iface swp1 inet static
address 10.1.1.1/31
auto swp2
allow-uplinks swp2
iface swp2 inet static
address 10.1.1.3/31
This allows you to perform operations on only these interfaces using the --allow=uplinks option. You can still use the -a option because these interfaces are also in the auto class:
cumulus@switch:~$ sudo ifup --allow=uplinks
cumulus@switch:~$ sudo ifreload -a
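To check which interfaces belong to a class before you run a command with --allow, you can scan the interfaces file for the matching class lines. This is a rough sketch rather than an ifupdown2 feature: the helper name ifaces_in_class is hypothetical, and it only reads the text it is given, so it does not follow any additional files that the interfaces file includes.

```shell
# Hypothetical helper: list the interfaces assigned to class $1 in
# interfaces-file text read on stdin. The class "auto" matches "auto"
# lines; any other class matches "allow-<class>" lines.
ifaces_in_class() {
    awk -v cls="$1" '
        BEGIN { key = (cls == "auto") ? "auto" : "allow-" cls }
        $1 == key { for (i = 2; i <= NF; i++) print $i }'
}

# Example with two interfaces in a class named uplinks.
ifaces_in_class uplinks <<'EOF'
auto swp1
allow-uplinks swp1
iface swp1 inet static
address 10.1.1.1/31
auto swp2
allow-uplinks swp2
iface swp2 inet static
address 10.1.1.3/31
EOF
```

On a switch, the input would be the file itself, for example ifaces_in_class uplinks < /etc/network/interfaces.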
If you are using Management VRF, you can use the special interface class called mgmt and put the management interface into that class. The management VRF must have an IPv6 address in addition to an IPv4 address to work correctly.
The mgmt interface class is not supported with NCLU commands.
All ifupdown2 commands (ifup, ifdown, ifquery, ifreload) can take a class. Include the --allow=<class> option when you run the command. For example, to reload the configuration for the management interface described above, run:
cumulus@switch:~$ sudo ifreload --allow=mgmt
Use the -a option to bring up or down all interfaces that are marked with the common auto class in the /etc/network/interfaces file.
To administratively bring up all interfaces marked auto, run:
cumulus@switch:~$ sudo ifup -a
To administratively bring down all interfaces marked auto, run:
cumulus@switch:~$ sudo ifdown -a
To reload all network interfaces marked auto, use the ifreload command. This command is equivalent to running ifdown then ifup; however, ifreload skips unchanged configurations:
cumulus@switch:~$ sudo ifreload -a
Certain syntax checks are done by default. As a precaution, apply configurations only if the syntax check passes. Use the following compound command:
cumulus@switch:~$ sudo bash -c "ifreload -s -a && ifreload -a"
For more information, see the individual man pages for ifup(8), ifdown(8), ifreload(8).
Configure a Loopback Interface
Cumulus Linux has a loopback interface preconfigured in the /etc/network/interfaces file. When the switch boots up, it has a loopback interface called lo, which is up and assigned an IP address of 127.0.0.1.
The loopback interface lo must always be specified in the /etc/network/interfaces file and must always be up.
To see the status of the loopback interface (lo):
Use the net show interface lo command.
cumulus@switch:~$ net show interface lo
Name MAC Speed MTU Mode
-- ------ ----------------- ------- ----- --------
UP lo 00:00:00:00:00:00 N/A 65536 Loopback
Alias
-----
loopback interface
IP Details
------------------------- --------------------
IP: 127.0.0.1/8, ::1/128
IP Neighbor(ARP) Entries: 0
The loopback is up and is assigned an IP address of 127.0.0.1.
To add an IP address to a loopback interface, configure the lo interface:
cumulus@switch:~$ net add loopback lo ip address 10.1.1.1/32
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Use the ip addr show lo command.
cumulus@switch:~$ ip addr show lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
The loopback is up and is assigned an IP address of 127.0.0.1.
To add an IP address to a loopback interface, add it directly under the iface lo inet loopback definition in the /etc/network/interfaces file:
auto lo
iface lo inet loopback
address 10.1.1.1
If an IP address is configured without a mask (as shown above), the IP address becomes a /32. So, in the above case, 10.1.1.1 is actually 10.1.1.1/32.
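The default-mask rule can be written out as a small helper, for example to normalize addresses before comparing them with ip addr show output. This sketch covers only the IPv4 case described above; the function name with_default_mask is hypothetical.

```shell
# Hypothetical helper: print an IPv4 address as it is applied when the
# mask is omitted. An address that already has a mask is left unchanged.
with_default_mask() {
    case "$1" in
        */*) printf '%s\n' "$1" ;;
        *)   printf '%s/32\n' "$1" ;;
    esac
}

with_default_mask 10.1.1.1        # prints 10.1.1.1/32
with_default_mask 172.16.2.1/24   # prints 172.16.2.1/24
```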
Configure Multiple Loopbacks
You can configure multiple loopback addresses by assigning additional IP addresses to the lo interface.
cumulus@switch:~$ net add loopback lo ip address 172.16.2.1/24
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
These commands create the following configuration in the /etc/network/interfaces file:
cumulus@leaf01:~$ cat /etc/network/interfaces
...
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
address 172.16.2.1/24
Add multiple address lines in the /etc/network/interfaces file:
auto lo
iface lo inet loopback
address 10.1.1.1
address 172.16.2.1/24
ifupdown2 Behavior with Child Interfaces
By default, ifupdown2 recognizes and uses any interface present on the system that is listed as a dependent of an interface (for example, a VLAN, bond, or physical interface). You are not required to list interfaces in the interfaces file unless they need a specific configuration for MTU, link speed, and so on. If you need to delete a child interface, delete all references to that interface from the interfaces file.
In the following example, swp1 and swp2 do not need an entry in the interfaces file. The following stanzas defined in /etc/network/interfaces provide the exact same configuration:
With Child Interfaces Defined
auto swp1
iface swp1
auto swp2
iface swp2
auto bridge
iface bridge
bridge-vlan-aware yes
bridge-ports swp1 swp2
bridge-vids 1-100
bridge-pvid 1
bridge-stp on
Without Child Interfaces Defined
auto bridge
iface bridge
bridge-vlan-aware yes
bridge-ports swp1 swp2
bridge-vids 1-100
bridge-pvid 1
bridge-stp on
In the following example, swp1.100 and swp2.100 do not need an entry in the interfaces file. The following stanzas defined in /etc/network/interfaces provide the exact same configuration:
With Child Interfaces Defined
auto swp1.100
iface swp1.100
auto swp2.100
iface swp2.100
auto br-100
iface br-100
address 10.0.12.2/24
address 2001:dad:beef::3/64
bridge-ports swp1.100 swp2.100
bridge-stp on
Without Child Interfaces Defined
auto br-100
iface br-100
address 10.0.12.2/24
address 2001:dad:beef::3/64
bridge-ports swp1.100 swp2.100
bridge-stp on
For more information about bridges in traditional mode and bridges in VLAN-aware mode, read this knowledge base article.
ifupdown2 Interface Dependencies
ifupdown2 understands interface dependency relationships. When you run ifup and ifdown with all interfaces, the commands always run with all interfaces in dependency order. When you run ifup and ifdown with the interface list on the command line, the default behavior is to not run with dependents; however, if there are any built-in dependents, they will be brought up or down.
To run with dependents when you specify the interface list, use the --with-depends option. The --with-depends option walks through all dependents in the dependency tree rooted at the interface you specify.
Consider the following example configuration:
auto bond1
iface bond1
address 100.0.0.2/16
bond-slaves swp29 swp30
auto bond2
iface bond2
address 100.0.0.5/16
bond-slaves swp31 swp32
auto br2001
iface br2001
address 12.0.1.3/24
bridge-ports bond1.2001 bond2.2001
bridge-stp on
The ifup --with-depends br2001 command brings up all dependents of br2001: bond1.2001, bond2.2001, bond1, bond2, swp29, swp30, swp31, swp32.
cumulus@switch:~$ sudo ifup --with-depends br2001
The ifdown --with-depends br2001 command brings down all dependents of br2001: bond1.2001, bond2.2001, bond1, bond2, swp29, swp30, swp31, swp32.
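The walk through the dependency tree can be illustrated with a short sketch. This is not how ifupdown2 is implemented: the dependency table is written by hand from the example configuration above, the function names deps_of and walk_depends are hypothetical, and the depth-first order printed here can differ from the order in which ifupdown2 processes the interfaces.

```shell
# Dependency table written by hand from the example configuration above.
# A VLAN subinterface such as bond1.2001 depends on its parent bond.
deps_of() {
    case "$1" in
        br2001)     echo "bond1.2001 bond2.2001" ;;
        bond1.2001) echo "bond1" ;;
        bond2.2001) echo "bond2" ;;
        bond1)      echo "swp29 swp30" ;;
        bond2)      echo "swp31 swp32" ;;
    esac
}

# Print every dependent below interface $1, depth first.
walk_depends() {
    for dep in $(deps_of "$1"); do
        echo "$dep"
        walk_depends "$dep"
    done
}

walk_depends br2001
```

The walk visits the same eight interfaces listed above: the two VLAN subinterfaces, the two bonds, and the four switch ports.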
ifdown always deletes logical interfaces after bringing them down. Use the --admin-state option if you only want to administratively bring the interface up or down. In the above example, ifdown br2001 deletes br2001.
To guide you through which interfaces will be brought down and up, use the --print-dependency option.
For example, run ifquery --print-dependency=list -a to show the dependency list for all interfaces:
To print the dependency list of a single interface, run the ifquery --print-dependency=list <interface> command. The following example command shows the dependency list for br2001:
To show the dependency information for an interface in dot format, run the ifquery --print-dependency=dot <interface> command. The following example command shows the dependency information for interface br2001 in dot format:
You can use dot to render the graph on an external system where dot is installed.
To print the dependency information of the entire interfaces file, run the following command:
cumulus@switch:~$ sudo ifquery --print-dependency=dot -a >interfaces_all.dot
Subinterfaces
On Linux, an interface is a network device that can be either physical, like a switch port (for example, swp1) or virtual, like a VLAN (for example, vlan100). A VLAN subinterface is a VLAN device on an interface, and the VLAN ID is appended to the parent interface using dot (.) VLAN notation. For example, a VLAN with ID 100 that is a subinterface of swp1 is named swp1.100. The dot VLAN notation for a VLAN device name is a standard way to specify a VLAN device on Linux. Many Linux configuration tools, such as ifupdown2 and its predecessor ifupdown, recognize such a name as a VLAN interface name.
A VLAN subinterface only receives traffic tagged for that VLAN; therefore, swp1.100 only receives packets tagged with VLAN 100 on switch port swp1. Similarly, any packets transmitted from swp1.100 are tagged with VLAN 100.
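Because the VLAN ID is simply appended to the parent name after a dot, scripts can split a subinterface name with ordinary shell parameter expansion. The helper names below are hypothetical; if the name contains no dot, both helpers print the name unchanged.

```shell
# Hypothetical helpers: split a dot-notation VLAN device name.
vlan_parent() { printf '%s\n' "${1%.*}"; }    # text before the last dot
vlan_id()     { printf '%s\n' "${1##*.}"; }   # text after the last dot

vlan_parent swp1.100   # prints swp1
vlan_id swp1.100       # prints 100
```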
In an MLAG configuration, the peer link interface that connects the two switches in the MLAG pair has a VLAN subinterface named 4094 by default if you configured the subinterface with NCLU. The peerlink.4094 subinterface only receives traffic tagged for VLAN 4094.
ifup and Upper (Parent) Interfaces
When you run ifup on a logical interface (like a bridge, bond or VLAN interface), if the ifup results in the creation of the logical interface, it implicitly tries to execute on the interface’s upper (or parent) interfaces as well.
Consider this example configuration:
auto br100
iface br100
bridge-ports bond1.100 bond2.100
auto bond1
iface bond1
bond-slaves swp1 swp2
If you run ifdown bond1, ifdown deletes bond1 and the VLAN interface on bond1 (bond1.100); it also removes bond1 from the bridge br100. Next, when you run ifup bond1, it creates bond1 and the VLAN interface on bond1 (bond1.100); it also executes ifup br100 to add the bond VLAN interface (bond1.100) to the bridge br100.
There can be cases where an upper interface (like br100) is not in the right state, which can result in warnings. The warnings are mostly harmless.
If you want to disable these warnings, you can disable the implicit upper interface handling by setting skip_upperifaces=1 in the /etc/network/ifupdown2/ifupdown2.conf file.
With skip_upperifaces=1, you have to explicitly execute ifup on the upper interfaces. In this case, you will have to run ifup br100 after an ifup bond1 to add bond1 back to bridge br100.
Although specifying a subinterface like swp1.100 and then running ifup swp1.100 results in the automatic creation of the swp1 interface in the kernel, consider also specifying the parent interface swp1. A parent interface is one where any physical layer configuration can reside, such as link-speed 1000 or link-duplex full. If you only create swp1.100 and not swp1, then you cannot run ifup swp1 because you did not specify it.
Configure IP Addresses
To configure IP addresses, run the following commands.
The following commands configure three IP addresses for swp1: two IPv4 addresses, and one IPv6 address.
cumulus@switch:~$ net add interface swp1 ip address 12.0.0.1/30
cumulus@switch:~$ net add interface swp1 ip address 12.0.0.2/30
cumulus@switch:~$ net add interface swp1 ipv6 address 2001:DB8::1/126
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
These commands create the following code snippet in the /etc/network/interfaces file:
auto swp1
iface swp1
address 12.0.0.1/30
address 12.0.0.2/30
address 2001:DB8::1/126
You can specify both IPv4 and IPv6 addresses for the same interface.
For IPv6 addresses, you can create or modify the IP address for an interface using either :: or 0:0:0 notation. Both of the following examples are valid:
cumulus@switch:~$ net add bgp neighbor 2620:149:43:c109:0:0:0:5 remote-as internal
cumulus@switch:~$ net add interface swp1 ipv6 address 2001:DB8::1/126
NCLU adds the address method and address family when needed, specifically when you are creating DHCP or loopback interfaces.
auto lo
iface lo inet loopback
In the /etc/network/interfaces file, list all IP addresses under the iface section. The following example adds the IP addresses 10.0.0.1/30 and 10.0.0.2/30 to swp1.
auto swp1
iface swp1
address 10.0.0.1/30
address 10.0.0.2/30
The address method and address family are not mandatory; they default to inet/inet6 and static. However, you must specify inet/inet6 when you are creating DHCP or loopback interfaces.
auto lo
iface lo inet loopback
You can specify both IPv4 and IPv6 addresses in the same iface stanza:
auto swp1
iface swp1
address 192.0.2.1/30
address 192.0.2.2/30
address 2001:DB8::1/126
A runtime configuration is non-persistent, which means the configuration you create here does not persist after you reboot the switch.
To make non-persistent changes to interfaces at runtime, use ip addr add:
cumulus@switch:~$ sudo ip addr add 192.0.2.1/30 dev swp1
cumulus@switch:~$ sudo ip addr add 2001:DB8::1/126 dev swp1
To remove an address from an interface, use ip addr del:
cumulus@switch:~$ sudo ip addr del 192.0.2.1/30 dev swp1
cumulus@switch:~$ sudo ip addr del 2001:DB8::1/126 dev swp1
For more details on the options available to manage and query interfaces, see man ip.
To show the assigned IP address on an interface, run the ip addr show command. The following example command shows the assigned IP address on swp1.
cumulus@switch:~$ ip addr show dev swp1
3: swp1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 500
link/ether 44:38:39:00:03:c1 brd ff:ff:ff:ff:ff:ff
inet 192.0.2.1/30 scope global swp1
inet 192.0.2.2/30 scope global swp1
inet6 2001:DB8::1/126 scope global tentative
valid_lft forever preferred_lft forever
Specify IP Address Scope
ifupdown2 does not honor the configured IP address scope setting in the /etc/network/interfaces file, treating all addresses as global. It does not report an error. Consider this example configuration:
auto swp2
iface swp2
address 35.21.30.5/30
address 3101:21:20::31/80
scope link
When you run ifreload -a on this configuration, ifupdown2 considers all IP addresses as global.
cumulus@switch:~$ ip addr show swp2
5: swp2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 74:e6:e2:f5:62:82 brd ff:ff:ff:ff:ff:ff
inet 35.21.30.5/30 scope global swp2
valid_lft forever preferred_lft forever
inet6 3101:21:20::31/80 scope global
valid_lft forever preferred_lft forever
inet6 fe80::76e6:e2ff:fef5:6282/64 scope link
valid_lft forever preferred_lft forever
To work around this issue, configure the IP address scope:
Run the following commands:
cumulus@switch:~$ net add interface swp6 post-up ip address add 71.21.21.20/32 dev swp6 scope site
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
These commands create the following code snippet in the /etc/network/interfaces file:
auto swp6
iface swp6
post-up ip address add 71.21.21.20/32 dev swp6 scope site
In the /etc/network/interfaces file, configure the IP address scope using post-up ip address add <address> dev <interface> scope <scope>. For example:
auto swp6
iface swp6
post-up ip address add 71.21.21.20/32 dev swp6 scope site
Then run the ifreload -a command on this configuration.
The following configuration shows the correct scope:
cumulus@switch:~$ ip addr show swp6
9: swp6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 74:e6:e2:f5:62:86 brd ff:ff:ff:ff:ff:ff
inet 71.21.21.20/32 scope site swp6
valid_lft forever preferred_lft forever
inet6 fe80::76e6:e2ff:fef5:6286/64 scope link
valid_lft forever preferred_lft forever
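To confirm the scope from a script rather than by eye, the ip addr show output can be parsed. The following is a minimal Python sketch, not part of Cumulus Linux; the function name address_scopes and the regular expression are illustrative assumptions, and the sample text is the output shown above.

```python
import re

# Matches lines such as "inet 71.21.21.20/32 scope site swp6"
# or "inet6 fe80::76e6:e2ff:fef5:6286/64 scope link".
ADDR_LINE = re.compile(r"^\s*(inet6?)\s+(\S+)\s+.*?\bscope\s+(\S+)")


def address_scopes(ip_addr_output):
    """Return {address: scope} for every inet/inet6 line in the output."""
    scopes = {}
    for line in ip_addr_output.splitlines():
        match = ADDR_LINE.match(line)
        if match:
            scopes[match.group(2)] = match.group(3)
    return scopes


# Sample text copied from the ip addr show swp6 output above.
SAMPLE = """\
9: swp6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 74:e6:e2:f5:62:86 brd ff:ff:ff:ff:ff:ff
    inet 71.21.21.20/32 scope site swp6
       valid_lft forever preferred_lft forever
    inet6 fe80::76e6:e2ff:fef5:6286/64 scope link
       valid_lft forever preferred_lft forever
"""

print(address_scopes(SAMPLE))
```

On a switch, the same function could be fed the text returned by running ip addr show for the interface.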
Purge Existing IP Addresses on an Interface
By default, ifupdown2 purges existing IP addresses on an interface. If you have other processes that manage IP addresses for an interface, you can disable this feature.
To disable IP address purge on an interface, run the following commands:
cumulus@switch:~$ net add interface swp1 address-purge no
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
These commands create the following configuration snippet in the /etc/network/interfaces file:
auto swp1
iface swp1
address-purge no
In the /etc/network/interfaces file, add address-purge no to the interface configuration. The following example command disables IP address purge on swp1.
cumulus@switch:~$ sudo nano /etc/network/interfaces
auto swp1
iface swp1
address-purge no
Purging existing addresses on interfaces with multiple iface stanzas is not supported. Doing so can result in the configuration of multiple addresses for an interface after you change an interface address and reload the configuration with ifreload -a. If this happens, you must shut down and restart the interface with ifdown and ifup, or manually delete superfluous addresses with ip address delete <ip-address>/<mask> dev <interface>. See also the Considerations section below for cautions about using multiple iface stanzas for the same interface.
Specify User Commands
You can specify additional user commands in the /etc/network/interfaces file. Each interface stanza in /etc/network/interfaces can include commands that run at the pre-up, up, post-up, pre-down, down, and post-down stages.
To add a command to an interface stanza, run the following commands:
cumulus@switch:~$ net add interface swp1 post-up /sbin/foo bar
cumulus@switch:~$ net add interface swp1 ip address 12.0.0.1/30
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
These commands create the following configuration in the /etc/network/interfaces file:
auto swp1
iface swp1
address 12.0.0.1/30
post-up /sbin/foo bar
If your post-up command also starts, restarts, or reloads any systemd service, you must use the --no-block option with systemctl. Otherwise, that service or even the switch itself might hang after starting or restarting. For example, to restart the dhcrelay service after bringing up VLAN 100, first run:
cumulus@switch:~$ net add vlan 100 post-up systemctl --no-block restart dhcrelay.service
This command creates the following configuration in the /etc/network/interfaces file:
auto bridge
iface bridge
bridge-vids 100
bridge-vlan-aware yes
To add a command to an interface stanza, add the command in the /etc/network/interfaces file. For example:
cumulus@switch:~$ sudo nano /etc/network/interfaces
auto swp1
iface swp1
address 12.0.0.1/30
up /sbin/foo bar
If your post-up command also starts, restarts, or reloads any systemd service, you must use the --no-block option with systemctl. Otherwise, that service or even the switch itself might hang after starting or restarting. For example, to restart the dhcrelay service after bringing up a VLAN, the /etc/network/interfaces configuration looks like this:
auto bridge.100
iface bridge.100
post-up systemctl --no-block restart dhcrelay.service
You can add any valid command in the sequence to bring an interface up or down; however, limit the scope to network-related commands associated with the particular interface. For example, it does not make sense to install a Debian package on ifup of swp1, even though it is technically possible. See man interfaces for more details.
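The --no-block requirement described above is easy to miss in a large file. The following minimal Python sketch flags hook lines that call systemctl without that option. The function name and the simple word-based matching are assumptions made for illustration; this is not an ifupdown2 feature.

```python
# Hook keywords that can carry a user command in an iface stanza.
HOOKS = ("pre-up", "up", "post-up", "pre-down", "down", "post-down")


def blocking_systemctl_lines(interfaces_text):
    """Return hook lines that call systemctl without --no-block."""
    flagged = []
    for raw in interfaces_text.splitlines():
        words = raw.split()
        if not words or words[0] not in HOOKS:
            continue
        command = words[1:]
        # Accept both "systemctl" and a full path such as /bin/systemctl.
        calls_systemctl = any(
            w == "systemctl" or w.endswith("/systemctl") for w in command
        )
        if calls_systemctl and "--no-block" not in command:
            flagged.append(raw.strip())
    return flagged


EXAMPLE = """\
auto bridge.100
iface bridge.100
    post-up systemctl --no-block restart dhcrelay.service
    post-up systemctl restart dhcrelay.service
"""

for line in blocking_systemctl_lines(EXAMPLE):
    print("missing --no-block:", line)
```

The check is deliberately conservative: it inspects only lines that start with one of the six hook keywords and does not try to follow shell scripts that the hooks invoke.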
Source Interface File Snippets
Sourcing interface files helps organize and manage the interfaces file. For example:
cumulus@switch:~$ sudo cat /etc/network/interfaces
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
auto eth0
iface eth0 inet dhcp
source /etc/network/interfaces.d/bond0
NCLU supports globs to define port lists (a range of ports). You must use commas to separate different ranges of ports in the NCLU command; for example:
cumulus@switch:~$ net add bridge bridge ports swp1-4,6,10-12
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
These commands produce the following snippet in the /etc/network/interfaces file. The file renders the list of ports individually.
...
auto bridge
iface bridge
bridge-ports swp1 swp2 swp3 swp4 swp6 swp10 swp11 swp12
bridge-vlan-aware yes
auto swp1
iface swp1
auto swp2
iface swp2
auto swp3
iface swp3
auto swp4
iface swp4
auto swp6
iface swp6
auto swp10
iface swp10
auto swp11
iface swp11
auto swp12
iface swp12
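To illustrate how a port list such as swp1-4,6,10-12 maps to the individual ports rendered above, here is a minimal Python sketch. It models only the simple numeric form used in this example; the function name expand_ports is hypothetical, and NCLU's own parser might accept forms that this sketch does not.

```python
def expand_ports(spec, prefix="swp"):
    """Expand a port list such as 'swp1-4,6,10-12' into individual names."""
    ports = []
    for part in spec.split(","):
        part = part.strip()
        # The prefix is optional on every comma-separated element.
        if part.startswith(prefix):
            part = part[len(prefix):]
        first, _, last = part.partition("-")
        last = last or first  # a single port is a range of one
        for number in range(int(first), int(last) + 1):
            ports.append(f"{prefix}{number}")
    return ports


print(" ".join(expand_ports("swp1-4,6,10-12")))
```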
Use the glob keyword to specify bridge ports and bond slaves:
auto br0
iface br0
bridge-ports glob swp1-6.100
auto br1
iface br1
bridge-ports glob swp7-9.100 swp11.100 glob swp15-18.100
Mako Templates
ifupdown2 supports Mako-style templates. The Mako template engine is run over the interfaces file before parsing.
While ifupdown2 supports Mako templates, NCLU does not understand them. As a result, NCLU cannot read or write to the /etc/network/interfaces file.
Use templates to declare cookie-cutter bridges or, as in the following example, addresses in the interfaces file:
%for i in [1,12]:
auto swp${i}
iface swp${i}
address 10.20.${i}.3/24
%endfor
In Mako syntax, use square brackets ([1,12]) to specify a list of individual numbers (in this case, 1 and 12). Use range(1,12) to specify a range of interfaces; as in Python, the end value is excluded, so range(1,12) covers 1 through 11.
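Because Mako loop expressions are Python expressions, the difference between a list and a range can be checked in plain Python. The sketch below does not invoke Mako; the helper render_stanzas is a hypothetical stand-in that only approximates the text a %for loop over the given values would emit.

```python
def render_stanzas(indices):
    """Build the stanzas that a '%for i in <indices>:' loop would emit."""
    lines = []
    for i in indices:
        lines += [
            f"auto swp{i}",
            f"iface swp{i}",
            f"    address 10.20.{i}.3/24",
            "",
        ]
    return "\n".join(lines)


print(list([1, 12]))       # a list: exactly the two values given
print(list(range(1, 12)))  # a range: 1 up to, but not including, 12
print(render_stanzas([1, 12]))
```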
You can test your template and confirm it evaluates correctly by running mako-render /etc/network/interfaces.
To comment out content in Mako templates, use double hash marks (##). For example:
## % for i in range(1, 4):
## auto swp${i}
## iface swp${i}
## % endfor
##
Run ifupdown Scripts under /etc/network/ with ifupdown2
Unlike the traditional ifupdown system, ifupdown2 does not run scripts installed in /etc/network/*/ automatically to configure network interfaces.
To enable or disable ifupdown2 scripting, edit the addon_scripts_support line in the /etc/network/ifupdown2/ifupdown2.conf file. 1 enables scripting and 2 disables scripting. The following example enables scripting.
cumulus@switch:~$ sudo nano /etc/network/ifupdown2/ifupdown2.conf
# Support executing of ifupdown style scripts.
# Note that by default python addon modules override scripts with the same name
addon_scripts_support=1
ifupdown2 sets the following environment variables when executing commands:
$IFACE represents the physical name of the interface being processed; for example, br0 or vxlan42. The name is obtained from the /etc/network/interfaces file.
$LOGICAL represents the logical name (configuration name) of the interface being processed.
$METHOD represents the address method; for example, loopback, DHCP, DHCP6, manual, static, and so on.
$ADDRFAM represents the address families associated with the interface, formatted as a comma-separated list; for example, "inet,inet6".
Add Descriptions to Interfaces
You can add descriptions to interfaces configured in the /etc/network/interfaces file by using the alias keyword.
The following commands create an alias for swp1:
cumulus@switch:~$ net add interface swp1 alias hypervisor_port_1
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
These commands create the following code snippet:
auto swp1
iface swp1
alias hypervisor_port_1
In the /etc/network/interfaces file, add a description using the alias keyword:
cumulus@switch:~$ sudo nano /etc/network/interfaces
auto swp1
iface swp1
alias hypervisor_port_1
You can query the interface description.
To show the description (alias) for an interface, run the net show interface <interface> command. The following example command shows the description for swp1:
cumulus@switch:~$ net show interface swp1
Name MAC Speed MTU Mode
-- ---- ----------------- ------- ----- ---------
UP swp1 44:38:39:00:00:04 1G 1500 Access/L2
Alias
-----
hypervisor_port_1
To show the interface description (alias) for all interfaces on the switch, run the net show interface alias command. For example:
cumulus@switch:~$ net show interface alias
State Name Mode Alias
----- ------------- ------------- ------------------
UP bond01 LACP
UP bond02 LACP
UP bridge Bridge/L2
UP eth0 Mgmt
UP lo Loopback loopback interface
UP mgmt Interface/L3
UP peerlink LACP
UP peerlink.4094 SubInt/L3
UP swp1 BondMember hypervisor_port_1
UP swp2 BondMember to Server02
...
To show the interface description for all interfaces on the switch in JSON format, run the net show interface alias json command.
To show the description (alias) for an interface, run the ip link show command. The alias appears on the alias line:
cumulus@switch:~$ ip link show swp1
3: swp1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT qlen 500
link/ether aa:aa:aa:aa:aa:bc brd ff:ff:ff:ff:ff:ff
alias hypervisor_port_1
Avoid using apostrophes or non-ASCII characters in the alias string. Cumulus Linux does not parse these characters.
Considerations
Even though ifupdown2 supports the inclusion of multiple iface stanzas for the same interface, use a single iface stanza for each interface. If you must specify more than one iface stanza (for example, if the configuration for a single interface comes from many places, like a template or a sourced file), make sure the stanzas do not specify the same interface attributes. Otherwise, unexpected behavior can result.
In the following example, swp1 is configured in two places: the /etc/network/interfaces file and the /etc/network/interfaces.d/speed_settings file. ifupdown2 correctly parses this configuration because the same attributes are not specified in multiple iface stanzas.
cumulus@switch:~$ sudo cat /etc/network/interfaces
source /etc/network/interfaces.d/speed_settings
auto swp1
iface swp1
address 10.0.14.2/24
cumulus@switch:~$ cat /etc/network/interfaces.d/speed_settings
auto swp1
iface swp1
link-speed 1000
link-duplex full
You cannot purge existing addresses on interfaces with multiple iface stanzas.
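The rule that multiple iface stanzas for one interface must not repeat an attribute can be checked before you reload. The following Python sketch is a simplified illustration, not the ifupdown2 parser: it treats the first word of each line inside a stanza as the attribute name, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def stanzas(text):
    """Yield (interface, set of attribute names) for each iface stanza."""
    iface, attrs = None, set()
    for raw in text.splitlines():
        words = raw.split()
        if not words or words[0].startswith("#"):
            continue
        keyword = words[0]
        if keyword == "iface":
            if iface is not None:
                yield iface, attrs
            iface, attrs = words[1], set()
        elif keyword in ("auto", "source") or keyword.startswith("allow-"):
            # These keywords end the current stanza.
            if iface is not None:
                yield iface, attrs
            iface, attrs = None, set()
        elif iface is not None:
            attrs.add(keyword)
    if iface is not None:
        yield iface, attrs


def duplicate_attributes(*file_texts):
    """Map interface -> attributes that occur in more than one stanza."""
    counts = defaultdict(Counter)
    for text in file_texts:
        for iface, attrs in stanzas(text):
            counts[iface].update(attrs)
    result = {}
    for iface, counter in counts.items():
        repeated = sorted(a for a, n in counter.items() if n > 1)
        if repeated:
            result[iface] = repeated
    return result


MAIN = """\
source /etc/network/interfaces.d/speed_settings
auto swp1
iface swp1
    address 10.0.14.2/24
"""
SPEED = """\
auto swp1
iface swp1
    link-speed 1000
    link-duplex full
"""
print(duplicate_attributes(MAIN, SPEED))
```

With the two files from the example above, no attribute is repeated, so the result is empty. Adding a link-speed line to the stanza in the main file would make the sketch report link-speed for swp1.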
ifupdown2 and sysctl
For sysctl commands in the pre-up, up, post-up, pre-down, down, and post-down lines that use the $IFACE variable, if the interface name contains a dot (.), ifupdown2 does not change the name to work with sysctl. For example, the interface name bridge.1 is not converted to bridge/1.
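Because ifupdown2 does not perform this conversion, any helper that builds a sysctl key from an interface name has to do it itself. A minimal Python sketch follows; the function name is hypothetical, and the arp_accept setting and the net.<family>.conf.<interface> key layout are used purely as an illustration.

```python
def sysctl_key(iface, setting, family="ipv4"):
    """Build a per-interface sysctl key, replacing dots in the name with slashes."""
    # sysctl uses "." as its separator, so a dot inside the interface
    # name (bridge.1) has to be written as "/" (bridge/1).
    return f"net.{family}.conf.{iface.replace('.', '/')}.{setting}"


print(sysctl_key("bridge.1", "arp_accept"))
print(sysctl_key("swp1", "arp_accept"))
```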
ifupdown2 and the gateway Parameter
The default route created by the gateway parameter in ifupdown2 is not installed in FRRouting and therefore cannot be redistributed into other routing protocols. Define a static default route instead, which is installed in FRR and can be redistributed if needed.
The following shows an example of the /etc/network/interfaces file when you use a static route instead of a gateway parameter:
auto swp2
iface swp2
address 172.16.3.3/24
up ip route add default via 172.16.3.2
Interface Name Limitations
Interface names are limited to 15 characters in length. The first character cannot be a number and the name cannot include a dash (-). In addition, any name that matches the regular expression .{0,13}\-v.* is not supported.
If you encounter issues, remove the interface name from the /etc/network/interfaces file, then restart the networking.service.
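The naming limits above can be expressed as a small check. The following Python sketch is an illustration only; the function name and the wording of the messages are assumptions, and the regular expression is copied from the text above.

```python
import re

# Pattern quoted in the limitation above.
UNSUPPORTED = re.compile(r".{0,13}\-v.*")


def interface_name_problems(name):
    """Return a list of reasons why the name is not supported (empty if none)."""
    problems = []
    if len(name) > 15:
        problems.append("longer than 15 characters")
    if name[:1].isdigit():
        problems.append("first character is a number")
    if "-" in name:
        problems.append("contains a dash")
    if UNSUPPORTED.match(name):
        problems.append("matches unsupported pattern")
    return problems


for candidate in ("swp1", "1swp", "bond-v1", "averyveryverylongname"):
    print(candidate, interface_name_problems(candidate))
```

Any name that matches the regular expression also contains a dash, so the sketch reports both reasons for such a name.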
Most of these settings are configured automatically for you, depending upon your switch ASIC; however, you must always set MTU manually.
For NVIDIA Spectrum ASICs, the firmware configures FEC, link speed, duplex mode and auto-negotiation automatically, following a predefined list of parameter settings until the link comes up. You can disable FEC if necessary, which forces the firmware to not try any FEC options.
For Broadcom-based switches, enable auto-negotiation on each port. When enabled, Cumulus Linux automatically configures the best link parameter settings based on the module type (speed, duplex, auto-negotiation, and FEC, where supported).
This topic describes the auto-negotiation, link speed, duplex mode, MTU, and FEC settings and provides a table showing the default configuration for various port and cable types. Breakout port configuration, logical switch port limitations, and troubleshooting are also covered.
Auto-negotiation
By default on a Broadcom-based switch, auto-negotiation is disabled - except on 10G and 1000BASE-T fixed copper switch ports, where it is required for links to work. For RJ-45 SFP adapters, you need to manually configure the desired link speed and auto-negotiation as described in the default settings table below.
If you disable auto-negotiation later or never enable it, then you have to configure any settings that deviate from the port default - such as duplex mode, FEC, and link speed settings.
Some module types support auto-negotiation while others do not. To enable a simpler configuration, Cumulus Linux allows you to configure auto-negotiation on all port types on Broadcom switches; the port configuration software then configures the underlying hardware according to its capabilities.
If you do decide to disable auto-negotiation, be aware of the following:
You must manually set any non-default link speed, duplex, pause, and FEC.
Disabling auto-negotiation on a 1G optical cable prevents detection of single fiber breaks.
You cannot disable auto-negotiation on 1GT or 10GT fixed copper switch ports.
For 1000BASE-T RJ-45 SFP adapters, auto-negotiation is automatically done on the SFP PHY, so enabling auto-negotiation on the port settings is not required. You must manually configure these ports using the settings below.
Depending upon the connector used for a port, enabling auto-negotiation also enables forward error correction (FEC), if the cable requires it (see the table below). The correct FEC mode is set based on the speed of the cable when auto-negotiation is enabled.
To configure auto-negotiation for a switch:
Run the net add interface <interface> link autoneg command. The following example commands enable auto-negotiation for the swp1 interface:
cumulus@switch:~$ net add interface swp1 link autoneg on
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Edit the /etc/network/interfaces file, then run the ifreload -a command. The following example disables auto-negotiation for the swp1 interface.
cumulus@switch:~$ sudo nano /etc/network/interfaces
auto swp1
iface swp1
link-autoneg off
cumulus@switch:~$ sudo ifreload -a
Runtime Configuration (Advanced)
You can use ethtool to configure auto-negotiation. The following example command enables auto-negotiation for the swp1 interface (use autoneg off to disable it):
cumulus@switch:~$ sudo ethtool -s swp1 speed 10000 duplex full autoneg on
A runtime configuration is non-persistent. The configuration you create does not persist after you reboot the switch.
Any time you enable auto-negotiation, Cumulus Linux restores the default configuration settings specified in the table below.
Port Speed and Duplex Mode
Cumulus Linux supports both half- and full-duplex configurations. Half-duplex is supported only with speeds of less than 1G.
Supported port speeds include 100M, 1G, 10G, 25G, 40G, 50G and 100G. In Cumulus Linux, you set the speed on a Broadcom switch in Mbps, where the setting for 1G is 1000, 40G is 40000, and 100G is 100000.
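The conversion from a speed label to the Mbps value used by link-speed is simple arithmetic. The following Python sketch shows it; the function name is hypothetical.

```python
# Multipliers that turn a label suffix into Mbps.
UNITS = {"M": 1, "G": 1000}


def link_speed_mbps(label):
    """Convert a label such as '40G' or '100M' to a link-speed value in Mbps."""
    value, unit = int(label[:-1]), label[-1].upper()
    return value * UNITS[unit]


for label in ("100M", "1G", "10G", "25G", "40G", "50G", "100G"):
    print(label, link_speed_mbps(label))
```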
You can configure ports to the following speeds (unless there are restrictions in the /etc/cumulus/ports.conf file of a particular platform).
Switch Port Type   Other Configurable Speeds
----------------   -------------------------
1G                 100 Mb
10G                1 Gigabit (1000 Mb)
40G                4x10G (10G lanes) creates four 1-lane ports each running at 10G
100G               50G or 2x50G (25G lanes) - 50G creates one 2-lane port running at 25G and 2x50G creates two 2-lane ports each running at 25G
                   40G (10G lanes) creates one 4-lane port running at 40G
                   4x25G (25G lanes) creates four 1-lane ports each running at 25G
                   4x10G (10G lanes) creates four 1-lane ports each running at 10G
Platform Limitations
On Lenovo NE2572O switches, swp1 through swp8 only support 25G speed.
For 10G and 1G SFPs inserted in a 25G port on a Broadcom switch, you must edit the /etc/cumulus/ports.conf file and configure the four ports in the same core to be 10G. See Considerations below.
A switch with the Maverick ASIC limits multicast traffic by the lowest speed port that has joined a particular group. For example, if you are sending 100G multicast through and subscribe with one 100G and one 25G port, traffic on both egress ports is limited to 25Gbps. If you remove the 25G port from the group, traffic correctly forwards at 100Gbps.
To configure the port speed and duplex mode:
Run the net add interface <interface> link speed command. The following commands configure the port speed for the swp1 interface. The duplex mode setting defaults to full. You only need to specify link duplex if you want to set half-duplex mode.
cumulus@switch:~$ net add interface swp1 link speed 10000
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
The above commands create the following /etc/network/interfaces file code snippet:
auto swp1
iface swp1
link-speed 10000
The following commands configure the port speed and set half-duplex mode for the swp1 interface.
cumulus@switch:~$ net add interface swp1 link speed 100
cumulus@switch:~$ net add interface swp1 link duplex half
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
The above commands create the following /etc/network/interfaces file code snippet:
auto swp1
iface swp1
link-speed 100
link-duplex half
To create a persistent configuration for the port speeds, edit the /etc/network/interfaces file, then run the ifreload -a command.
Add the appropriate lines for each switch port stanza. The following example shows that the port speed for the swp1 interface is set to 10G and the duplex mode is set to full.
If you specify the port speed in the /etc/network/interfaces file, you must also specify the duplex mode setting; otherwise, the interface defaults to half duplex.
cumulus@switch:~$ sudo nano /etc/network/interfaces
auto swp1
iface swp1
address 10.1.1.1/24
link-speed 10000
link-duplex full
cumulus@switch:~$ sudo ifreload -a
Runtime Configuration (Advanced)
You can use ethtool to configure the port speed and duplex mode for your switch ports. You must specify both the port speed and the duplex mode in the ethtool command; auto-negotiation is optional.
The following example command sets the port speed to 10G and duplex mode to full on the swp1 interface:
cumulus@switch:~$ ethtool -s swp1 speed 10000 duplex full
A runtime configuration is non-persistent. The configuration you create does not persist after you reboot the switch.
MTU
Interface MTU applies to traffic traversing the management port, front panel or switch ports, bridge, VLAN subinterfaces, and bonds (both physical and logical interfaces). MTU is the only interface setting that you must set manually.
In Cumulus Linux, ifupdown2 assigns 9216 as the default MTU setting. On an NVIDIA Spectrum switch, the initial MTU value set by the driver is 9238. After you configure the interface, the default MTU setting is 9216.
To change the MTU setting, run the following commands:
Run the net add interface <interface> mtu command. The following example command sets the MTU to 1500 for the swp1 interface.
cumulus@switch:~$ net add interface swp1 mtu 1500
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
These commands create the following code snippet:
auto swp1
iface swp1
mtu 1500
Edit the /etc/network/interfaces file, then run the ifreload -a command. The following example sets the MTU to 1500 for the swp1 interface.
cumulus@switch:~$ sudo nano /etc/network/interfaces
auto swp1
iface swp1
mtu 1500
cumulus@switch:~$ sudo ifreload -a
Runtime Configuration (Advanced)
Run the ip link set command. The following example command sets the swp1 interface MTU to 1500.
cumulus@switch:~$ sudo ip link set dev swp1 mtu 1500
A runtime configuration is non-persistent. The configuration you create does not persist after you reboot the switch.
Some switches might not support the same maximum MTU setting in hardware for both the management interface (eth0) and the data plane ports.
Set a Policy for Global System MTU
To set a global MTU policy, create a policy document (called mtu.json).
The policies and attributes in any file in /etc/network/ifupdown2/policy.d/ override the default policies and attributes in /var/lib/ifupdown2/policy.d/.
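As an illustration of what such a policy document might contain, the following Python sketch generates JSON text for mtu.json. The key layout (an address module with a defaults section holding mtu as a string) is an assumption based on the general ifupdown2 policy file format and is not confirmed by this section; compare it with the default policy files in /var/lib/ifupdown2/policy.d/ before you use it. The function name is hypothetical.

```python
import json


def mtu_policy(mtu=9216):
    """Return JSON text for a global MTU policy (key layout is an assumption)."""
    policy = {"address": {"defaults": {"mtu": str(mtu)}}}
    return json.dumps(policy, indent=2)


# Print the document; it would be saved as
# /etc/network/ifupdown2/policy.d/mtu.json on the switch.
print(mtu_policy(9216))
```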
MTU for a Bridge
The MTU setting is the lowest MTU of any interface that is a member of the bridge (every interface specified in bridge-ports in the bridge configuration of the /etc/network/interfaces file). There is no need to specify an MTU on the bridge. For example, consider a bridge whose member interfaces are bond1 through bond4 and peer5.
For the bridge to have an MTU of 9000, set the MTU for each of the member interfaces (bond1 to bond4, and peer5) to at least 9000.
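The lowest-member rule can be sketched in a few lines of Python. The member MTU values below are made up for illustration, and the function names are hypothetical.

```python
def bridge_mtu(member_mtus):
    """The bridge MTU is the lowest MTU among its member interfaces."""
    return min(member_mtus.values())


def members_below(member_mtus, target):
    """List the members that keep the bridge from reaching the target MTU."""
    return sorted(name for name, mtu in member_mtus.items() if mtu < target)


# Hypothetical values: peer5 is still at 1500.
MEMBERS = {"bond1": 9000, "bond2": 9000, "bond3": 9216, "bond4": 9000, "peer5": 1500}

print(bridge_mtu(MEMBERS))
print(members_below(MEMBERS, 9000))
```

In this made-up case a single member at 1500 holds the whole bridge at 1500, which is why every member needs an MTU of at least the target value.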
When configuring MTU for a bond, set the MTU value directly on the bond interface. The member links (slave interfaces) inherit the configured value, so there is no need to specify an MTU on the slave interfaces.
VLAN interfaces inherit their MTU settings from their physical devices or their lower interface; for example, swp1.100 inherits its MTU setting from swp1. Therefore, specifying an MTU on swp1 ensures that swp1.100 inherits the MTU setting for swp1.
If you are working with VXLANs, the MTU for a virtual network interface (VNI) must be 50 bytes smaller than the MTU of the physical interfaces on the switch, as those 50 bytes are required for various headers and other data. Also, consider setting the MTU much higher than 1500.
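The 50-byte allowance translates into simple arithmetic. The following Python sketch shows it for two physical MTU values; the function and constant names are hypothetical.

```python
VXLAN_OVERHEAD = 50  # bytes reserved for the encapsulation headers


def vni_mtu(physical_mtu):
    """Return the MTU for a VNI given the MTU of the physical interfaces."""
    return physical_mtu - VXLAN_OVERHEAD


for physical in (1500, 9216):
    print(physical, "->", vni_mtu(physical))
```

With a physical MTU of 1500, the VNI is left with 1450 bytes, which is one reason to raise the physical MTU well above 1500.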
The MTU for an SVI interface, such as vlan100, is derived from the bridge. When you use NCLU to change the MTU for an SVI and the MTU setting is higher than it is for the other bridge member interfaces, the MTU for all bridge member interfaces changes to the new setting. If you need to use a mixed MTU configuration for SVIs (some SVIs with a higher MTU and some with a lower MTU), set the MTU for all member interfaces to the maximum value, then set the MTU on the specific SVIs that need to run at a lower MTU.
To show the MTU setting for an interface:
Run the net show interface <interface> command:
cumulus@switch:~$ net show interface swp1
Name MAC Speed MTU Mode
-- ------ ----------------- ------- ----- ---------
UP swp1 44:38:39:00:00:04 1G 9216 Access/L2
Run the ip link show <interface> command:
cumulus@switch:~$ ip link show dev swp1
3: swp1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9216 qdisc pfifo_fast state UP mode DEFAULT qlen 500
link/ether 44:38:39:00:03:c1 brd ff:ff:ff:ff:ff:ff
FEC
Forward Error Correction (FEC) is an encoding and decoding layer that enables the switch to detect and correct bit errors introduced over the cable between two interfaces. The target IEEE bit error rate (BER) on high-speed Ethernet links is 10^-12. Because 25G transmission speeds can introduce a higher than acceptable BER on a link, FEC is often required to correct errors to achieve the target BER at 25G, 4x25G, 100G, and higher link speeds. The type and grade of a cable or module and the medium of transmission determine which FEC setting is needed.
For the link to come up, the two interfaces on each end must use the same FEC setting.
There is a very small latency overhead required for FEC. For most applications, this small amount of latency is preferable to error packet retransmission latency.
There are two FEC types:
Reed Solomon (RS), IEEE 802.3 Clause 108 (CL108) on individual 25G channels and Clause 91 on 100G (4 channels). This is the strongest FEC algorithm, providing the best bit-error correction.
Base-R (BaseR), Fire Code (FC), IEEE 802.3 Clause 74 (CL74). Base-R provides less protection from bit errors than RS FEC but adds less latency.
Cumulus Linux includes additional FEC options:
Auto FEC instructs the hardware to select the best FEC. For copper DAC, FEC can be negotiated with the remote end. However, optical modules do not have auto-negotiation capability; if the device chooses a preferred mode, it might not match the remote end. This is the current default on a Spectrum switch.
No FEC (no error correction is done). This is the current default on a Broadcom switch.
While Auto FEC is the default setting on the Mellanox Spectrum switch, do not explicitly configure the fec auto option on the switch as this leads to a link flap whenever you run net commit or ifreload -a.
The Trident II switch does not support FEC.
The Tomahawk switch does not support RS FEC or auto-negotiation of FEC on 25G lanes that are broken out (Tomahawk pre-dates 802.3by). If you are using a 4x25G breakout DAC or AOC on a Tomahawk switch, you can configure either Base-R FEC or no FEC, and choose cables appropriate for that limitation (CA-25G-S, CA-25G-N or fiber). Tomahawk+, Tomahawk2, Trident3 and Maverick switches do not have this limitation.
For 25G DAC, 4x25G breakout DAC, and 100G DAC cables, the IEEE 802.3by specification defines three classes:
CA-25G-L (Long cable) - Requires RS FEC - Achievable cable length of at least 5m. dB loss less than or equal to 22.48. Expected BER of 10^-5 or better without RS FEC enabled.
CA-25G-S (Short cable) - Requires Base-R FEC - Achievable cable length of at least 3m. dB loss less than or equal to 16.48. Expected BER of 10^-8 or better without Base-R FEC enabled.
CA-25G-N (No FEC) - Does not require FEC - Achievable cable length of at least 3m. dB loss less than or equal to 12.98. Expected BER of 10^-12 or better with no FEC enabled.
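The three classes can be summarized as a lookup table. The following Python sketch uses the loss limits and FEC requirements listed above; the fec values (rs, baser, off) follow the link-fec settings used elsewhere in this guide. The helper class_for_loss is a hypothetical illustration of how a measured loss maps to a class; it is not a replacement for the manufacturer's classification.

```python
# Limits and FEC requirements taken from the class list above.
CABLE_CLASSES = {
    "CA-25G-L": {"fec": "rs", "max_loss_db": 22.48, "min_length_m": 5},
    "CA-25G-S": {"fec": "baser", "max_loss_db": 16.48, "min_length_m": 3},
    "CA-25G-N": {"fec": "off", "max_loss_db": 12.98, "min_length_m": 3},
}


def required_fec(cable_class):
    """Return the FEC setting that the given cable class requires."""
    return CABLE_CLASSES[cable_class]["fec"]


def class_for_loss(loss_db):
    """Return the least demanding class whose loss limit covers loss_db."""
    for name in ("CA-25G-N", "CA-25G-S", "CA-25G-L"):
        if loss_db <= CABLE_CLASSES[name]["max_loss_db"]:
            return name
    return None  # loss exceeds every class limit


for loss in (10.0, 15.0, 20.0):
    cable = class_for_loss(loss)
    print(loss, cable, required_fec(cable))
```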
The IEEE classification is based on various dB loss measurements and minimum achievable cable length. You can build longer and shorter cables if they comply with the dB loss and BER requirements.
If a cable is manufactured to CA-25G-S classification and FEC is not enabled, the BER might be unacceptable in a production network. It is important to set the FEC according to the cable class (or better) to have acceptable bit error rates. See Determine Cable Class of 100G and 25G DACs below.
You can check bit errors using cl-netstat (RX_ERR column) or ethtool -S (HwIfInErrors counter) after a large amount of traffic has passed through the link. A non-zero value indicates bit errors.
Expect error packets to be zero or extremely low compared to good packets. If a cable has an unacceptable rate of errors with FEC enabled, replace the cable.
For 25G, 4x25G Breakout, and 100G Fiber modules and AOCs, there is no classification of 25G cable types for dB loss, BER or length. FEC is recommended but might not be required if the BER is low enough.
Determine Cable Class of 100G and 25G DACs
You can determine the cable class for 100G and 25G DACs from the Extended Specification Compliance Code field (SFP28: 0Ah, byte 35, QSFP28: Page 0, byte 192) in the cable EEPROM programming.
For 100G DACs, most manufacturers use the 0Bh 100GBASE-CR4 or 25GBASE-CR CA-L value (the 100G DAC specification predates the IEEE 802.3by 25G DAC specification). RS FEC is the expected setting for 100G DAC but might not be required with shorter or better cables.
A manufacturer’s EEPROM setting might not match the dB loss on a cable or the actual bit error rates that a particular cable introduces. Use the designation as a guide, but set FEC according to the bit error rate tolerance in the design criteria for the network. For most applications, the highest mutual FEC ability of both end devices is the best choice.
You can determine for which grade the manufacturer has designated the cable as follows.
In each example below, the Compliance field is derived using the method described above and is not visible in the ethtool -m output.
3 meter cable that does not require FEC (CA-N)
Cost: More expensive
Cable size: 26AWG (Note that AWG does not necessarily correspond to overall dB loss or BER performance)
Compliance Code: 25GBASE-CR CA-N
3 meter cable that requires Base-R FEC (CA-S)
Cost: Less expensive
Cable size: 26AWG
Compliance Code: 25GBASE-CR CA-S
When in doubt, consult the manufacturer directly to determine the cable classification.
Spectrum ASIC FEC Behavior
The firmware in a Spectrum ASIC applies FEC configuration to 25G and 100G cables based on the cable type and whether the peer switch also has a Spectrum ASIC.
When the link is between two switches with Spectrum ASICs:
For 25G optical modules, the Spectrum ASIC firmware chooses Base-R/FC-FEC.
For 25G DAC cables with attenuation less than or equal to 16 dB, the firmware chooses Base-R/FC-FEC.
For 25G DAC cables with attenuation higher than 16 dB, the firmware chooses RS-FEC.
For 100G cables/modules, the firmware chooses RS-FEC.
Cable Type                                  FEC Mode
------------------------------------------  -------------
25G optical cables                          Base-R/FC-FEC
25G 1,2 meters: CA-N, loss <13 dB           Base-R/FC-FEC
25G 2.5,3 meters: CA-S, loss <16 dB         Base-R/FC-FEC
25G 2.5,3,4,5 meters: CA-L, loss > 16 dB    RS-FEC
100G DAC or optical                         RS-FEC
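The Spectrum-to-Spectrum selection rules above can be written as a small decision function. The following Python sketch only restates those rules for illustration; it is not the firmware logic, and the function name and the medium labels (optical, dac) are assumptions.

```python
def spectrum_fec(speed_gbps, medium, attenuation_db=None):
    """Return the FEC mode chosen on a link between two Spectrum switches."""
    if speed_gbps == 100:
        return "RS-FEC"  # all 100G cables and modules
    if speed_gbps == 25:
        if medium == "optical":
            return "Base-R/FC-FEC"
        if medium == "dac":
            if attenuation_db is None:
                raise ValueError("attenuation is needed for 25G DAC cables")
            return "Base-R/FC-FEC" if attenuation_db <= 16 else "RS-FEC"
    raise ValueError("combination not covered by the rules above")


print(spectrum_fec(25, "optical"))
print(spectrum_fec(25, "dac", attenuation_db=12.9))
print(spectrum_fec(25, "dac", attenuation_db=22.0))
print(spectrum_fec(100, "dac"))
```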
When linking to a non-Spectrum peer, the firmware lets the peer decide. The Spectrum ASIC supports RS-FEC (for both 100G and 25G), Base-R/FC-FEC (25G only), or no-FEC (for both 100G and 25G).
Cable Type                                  FEC Mode
------------------------------------------  ---------------------------------
25G optical cables                          Let peer decide
25G 1,2 meters: CA-N, loss <13 dB           Let peer decide
25G 2.5,3 meters: CA-S, loss <16 dB         Let peer decide
25G 2.5,3,4,5 meters: CA-L, loss > 16 dB    Let peer decide
100G                                        Let peer decide: RS-FEC or No FEC
How Does Cumulus Linux use FEC?
This depends upon the make of the switch you are using.
A Spectrum switch enables FEC automatically when it powers up; that is, the setting is fec auto. The port firmware tests and determines the correct FEC mode to bring the link up with the neighbor. It is possible to get a link up to a Spectrum switch without enabling FEC on the remote device as the switch eventually finds a working combination to the neighbor without FEC.
On a Broadcom switch, Cumulus Linux does not enable FEC by default; that is, the setting is fec off. Configure FEC explicitly to match the configured FEC on the link neighbor. On 100G DACs, you can configure link-autoneg so that the port attempts to negotiate FEC settings with the remote peer.
The following sections describe how to show the current FEC mode, and to enable and disable FEC.
Show the Current FEC Mode
Cumulus Linux returns different output for the ethtool --show-fec command, depending upon whether you are using a Broadcom or NVIDIA Spectrum switch.
On a Broadcom switch, the --show-fec output tells you exactly what you configured, even if the link is down due to a FEC mismatch with the neighbor.
On a Spectrum switch, the --show-fec output tells you the current active state of FEC only if the link is up; that is, if the FEC mode matches that of the neighbor. If the link is not up, the value displays None, which is not valid.
To show the FEC mode currently enabled on a given switch port, run the ethtool --show-fec <interface> command.
cumulus@switch:~$ sudo ethtool --show-fec swp1
FEC parameters for swp1:
Configured FEC encodings: Auto
Active FEC encoding: Off
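If you collect this output from many ports, it helps to turn it into structured data. The following Python sketch parses the two fields shown in the sample output above. The function name is hypothetical, and the parsing assumes the field labels appear exactly as in that sample.

```python
def parse_show_fec(output):
    """Extract the configured and active FEC values from --show-fec output."""
    result = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Configured FEC encodings":
            result["configured"] = value.split()
        elif key == "Active FEC encoding":
            result["active"] = value
    return result


# Sample text copied from the command output above.
SAMPLE = """\
FEC parameters for swp1:
Configured FEC encodings: Auto
Active FEC encoding: Off
"""

print(parse_show_fec(SAMPLE))
```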
Enable or Disable FEC
To enable Reed Solomon (RS) FEC on a link:
Run the net add interface <interface> link fec rs command. For example:
cumulus@switch:~$ sudo net add interface swp1 link fec rs
cumulus@switch:~$ sudo net pending
cumulus@switch:~$ sudo net commit
Edit the /etc/network/interfaces file, then run the ifreload -a command. The following example enables RS FEC for the swp1 interface (link-fec rs):
cumulus@switch:~$ sudo nano /etc/network/interfaces
auto swp1
iface swp1
link-autoneg off
link-speed 100000
link-fec rs
cumulus@switch:~$ sudo ifreload -a
Runtime Configuration (Advanced)
Run the ethtool --set-fec <interface> encoding RS command. For example:
cumulus@switch:~$ sudo ethtool --set-fec swp1 encoding RS
A runtime configuration is non-persistent. The configuration you create does not persist after you reboot the switch.
To enable Base-R/FireCode FEC on a link:
Run the net add interface <interface> link fec baser command. For example:
cumulus@switch:~$ sudo net add interface swp1 link fec baser
cumulus@switch:~$ sudo net pending
cumulus@switch:~$ sudo net commit
Edit the /etc/network/interfaces file, then run the ifreload -a command. The following example enables Base-R FEC for the swp1 interface (link-fec baser):
cumulus@switch:~$ sudo nano /etc/network/interfaces
auto swp1
iface swp1
link-autoneg off
link-speed 100000
link-fec baser
cumulus@switch:~$ sudo ifreload -a
Runtime Configuration (Advanced)
Run the ethtool --set-fec <interface> encoding baser command. For example:
cumulus@switch:~$ sudo ethtool --set-fec swp1 encoding baser
To disable FEC on a link:
Run the net add interface <interface> link fec off command. For example:
cumulus@switch:~$ sudo net add interface swp1 link fec off
cumulus@switch:~$ sudo net pending
cumulus@switch:~$ sudo net commit
Edit the /etc/network/interfaces file, then run the ifreload -a command. The following example disables FEC for the swp1 interface (link-fec off):
cumulus@switch:~$ sudo nano /etc/network/interfaces
auto swp1
iface swp1
link-fec off
cumulus@switch:~$ sudo ifreload -a
Runtime Configuration (Advanced)
Run the ethtool --set-fec <interface> encoding off command. For example:
cumulus@switch:~$ sudo ethtool --set-fec swp1 encoding off
A runtime configuration is non-persistent. The configuration you create does not persist after you reboot the switch.
Interface Configuration Recommendations for Broadcom Platforms
The recommended configuration for each type of interface is described in the following table. These are the link settings that are applied to the port hardware when auto-negotiation is enabled on a Broadcom-based switch. If further troubleshooting is required to bring a link up, use the table below as a guide to set the link parameters.
Except as noted below, the settings for both sides of the link are expected to be the same.
Spectrum switches automatically configure these settings following a predefined list of parameter settings until the link comes up.
Speed
Auto-negotiation
FEC Setting
Manual Configuration Examples
Notes
100BASE-T (RJ-45 SFP adapter)
Off
N/A
NCLU commands
$ net add interface swp1 link speed 100
$ net add interface swp1 link autoneg off
Configuration in /etc/network/interfaces
auto swp1
iface swp1
link-autoneg off
link-speed 100
The module has two sets of electronics: the port side, which communicates with the switch ASIC, and the RJ-45 adapter side.
Auto-negotiation is always used on the RJ-45 adapter side of the link by the PHY built into the module. This is independent of the switch setting. Set auto-negotiation to off.
Auto-negotiation must be enabled on the server side in this scenario.
100BASE-T on a 1G fixed copper port
On
N/A
NCLU commands
$ net add interface swp1 link speed 100
$ net add interface swp1 link autoneg on
Configuration in /etc/network/interfaces
auto swp1
iface swp1
link-autoneg on
link-speed 100
10M or 100M speeds are possible with auto-negotiation off on both sides.
Testing on an Edgecore AS4610-54P showed the ASIC reporting auto-negotiation as on.
Power over Ethernet might require auto-negotiation to be on.
1000BASE-T (RJ-45 SFP adapter)
Off
N/A
NCLU commands
$ net add interface swp1 link speed 1000
$ net add interface swp1 link autoneg off
Configuration in /etc/network/interfaces
auto swp1
iface swp1
link-autoneg off
link-speed 1000
The module has two sets of electronics: the port side, which communicates with the switch ASIC, and the RJ-45 side.
Auto-negotiation is always used on the RJ-45 side of the link by the PHY built into the module. This is independent of the switch setting. Set auto-negotiation to off.
Auto-negotiation must be enabled on the server side.
1000BASE-T on a 1G fixed copper port
On
N/A
NCLU commands
$ net add interface swp1 link speed 1000
$ net add interface swp1 link autoneg on
Configuration in /etc/network/interfaces
auto swp1
iface swp1
link-autoneg on
link-speed 1000
1000BASE-T on a 10G fixed copper port
On
N/A
NCLU commands
$ net add interface swp1 link speed 1000
$ net add interface swp1 link autoneg on
Configuration in /etc/network/interfaces
auto swp1
iface swp1
link-autoneg on
link-speed 1000
1000BASE-SX 1000BASE-LX (1G Fiber)
Recommended On
N/A
NCLU commands
$ net add interface swp1 link speed 1000
$ net add interface swp1 link autoneg on
Configuration in /etc/network/interfaces
auto swp1
iface swp1
link-autoneg on
link-speed 1000
Without auto-negotiation, the link stays up when there is a single fiber break.
10GBASE-T (RJ-45 SFP adapter)
Off
N/A
NCLU commands
$ net add interface swp1 link speed 10000
$ net add interface swp1 link autoneg off
Configuration in /etc/network/interfaces
auto swp1
iface swp1
link-autoneg off
link-speed 10000
The module has two sets of electronics: the port side, which communicates with the switch ASIC, and the RJ-45 side.
Auto-negotiation is always used on the RJ-45 side of the link by the PHY built into the module. This is independent of the switch setting. Set link-autoneg to off.
Auto-negotiation needs to be enabled on the server side.
10GBASE-T fixed copper port
On
N/A
NCLU commands
$ net add interface swp1 link speed 10000
$ net add interface swp1 link autoneg on
Configuration in /etc/network/interfaces
auto swp1
iface swp1
link-autoneg on
link-speed 10000
10GBASE-CR 10GBASE-LR 10GBASE-SR 10G AOC
Off
N/A
NCLU commands
$ net add interface swp1 link speed 10000
$ net add interface swp1 link autoneg off
Configuration in /etc/network/interfaces
auto swp1
iface swp1
link-autoneg off
link-speed 10000
40GBASE-CR4
Recommended On
Disable
NCLU commands
$ net add interface swp1 link speed 40000
$ net add interface swp1 link autoneg on
Configuration in /etc/network/interfaces
auto swp1
iface swp1
link-autoneg on
link-speed 40000
40G standards mandate auto-negotiation be enabled for DAC connections.
40GBASE-SR4 40GBASE-LR4 40G AOC
Off
Disable
NCLU commands
$ net add interface swp1 link speed 40000
$ net add interface swp1 link autoneg off
Configuration in /etc/network/interfaces
auto swp1
iface swp1
link-autoneg off
link-speed 40000
100GBASE-CR4
On
auto-negotiated
NCLU commands
$ net add interface swp1 link speed 100000
$ net add interface swp1 link autoneg on
Configuration in /etc/network/interfaces
auto swp1
iface swp1
link-autoneg on
link-speed 100000
100GBASE-SR4 100G AOC
Off
RS
NCLU commands
$ net add interface swp1 link speed 100000
$ net add interface swp1 link autoneg off
$ net add interface swp1 link fec rs
Configuration in /etc/network/interfaces
auto swp1
iface swp1
link-autoneg off
link-speed 100000
link-fec rs
100GBASE-LR4
Off
None
NCLU commands
$ net add interface swp1 link speed 100000
$ net add interface swp1 link autoneg off
$ net add interface swp1 link fec off
Configuration in /etc/network/interfaces
auto swp1
iface swp1
link-autoneg off
link-speed 100000
link-fec off
25GBASE-CR
On
auto-negotiated
NCLU commands
$ net add interface swp1 link speed 25000
$ net add interface swp1 link autoneg on
Configuration in /etc/network/interfaces
auto swp1
iface swp1
link-autoneg on
link-speed 25000
Tomahawk predates 802.3by. It does not support RS FEC or auto-negotiation of RS FEC on a 25G port or subport. It does support Base-R FEC.
25GBASE-SR
Off
RS
NCLU commands
$ net add interface swp1 link speed 25000
$ net add interface swp1 link autoneg off
$ net add interface swp1 link fec rs
Configuration in /etc/network/interfaces
auto swp1
iface swp1
link-autoneg off
link-speed 25000
link-fec rs
Tomahawk predates 802.3by and does not support RS FEC on a 25G port or subport; however, it does support Base-R FEC. The configuration for Base-R FEC is as follows:
NCLU commands
$ net add interface swp1 link speed 25000
$ net add interface swp1 link autoneg off
$ net add interface swp1 link fec baser
Configuration in /etc/network/interfaces
auto swp1
iface swp1
link-autoneg off
link-speed 25000
link-fec baser
Configure FEC to the setting that the cable requires.
25GBASE-LR
Off
None
NCLU commands
$ net add interface swp1 link speed 25000
$ net add interface swp1 link autoneg off
$ net add interface swp1 link fec off
Configuration in /etc/network/interfaces
auto swp1
iface swp1
link-autoneg off
link-speed 25000
link-fec off
Default Policies for Interface Settings
Instead of configuring settings for each individual interface, you can specify a policy for all interfaces on a switch or tailor custom settings for each interface. Create a file in /etc/network/ifupdown2/policy.d/ and populate the settings accordingly. The following example shows a file called address.json.
Setting the default MTU also applies to the management interface. Be sure to add the iface_defaults to override the MTU for eth0, to remain at 9216.
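A policy file of the kind described above might look like the following sketch. The structure (a per-module section with defaults and iface_defaults) follows the ifupdown2 policy format, but every value here is an illustrative assumption: the 9000 default MTU, the swp1 link settings, and the eth0 override, which uses the 9216 value mentioned in the note above. The script writes to a scratch directory instead of /etc/network/ifupdown2/policy.d/ so that you can run it safely on any host.

```shell
#!/bin/sh
# Write a sample policy file to a scratch directory, not to the live
# /etc/network/ifupdown2/policy.d/ directory.
policy_dir=$(mktemp -d)

cat > "$policy_dir/address.json" <<'EOF'
{
    "ethtool": {
        "defaults": { "link-duplex": "full" },
        "iface_defaults": {
            "swp1": { "link-autoneg": "on", "link-speed": "1000" }
        }
    },
    "address": {
        "defaults": { "mtu": "9000" },
        "iface_defaults": {
            "eth0": { "mtu": "9216" }
        }
    }
}
EOF

# Show the result; copy it into policy.d only after reviewing the values.
cat "$policy_dir/address.json"
```

The defaults block applies to every interface, and an iface_defaults entry takes precedence for the interface it names, which is how eth0 keeps its own MTU.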
Breakout Ports
Cumulus Linux lets you:
Break out 100G switch ports into 2x50G, 4x25G, or 4x10G with breakout cables.
Break out 40G switch ports into four separate 10G ports (4x10G) for use with breakout cables.
Combine (aggregate or gang) four 10G switch ports into one 40G port for use with a breakout cable (not to be confused with a bond).
For Broadcom switches with ports that support 100G speeds, you cannot have more than 128 logical ports.
On NVIDIA Spectrum switches running in nonatomic ACL mode, if you break out a port, then reload the switchd service, temporary disruption to traffic occurs while the ACLs are reinstalled.
Port ganging is not supported on NVIDIA Spectrum switches.
NVIDIA Spectrum-1 ASICs have a limit of 64 logical ports. 64-port Broadcom switches with the Tomahawk2 ASIC have a limit of 128 total logical ports. If you want to break ports out to 4x25G or 4x10G, you must configure the logical ports as follows:
You can only break out odd-numbered ports into four logical ports.
You must disable the next even-numbered port. For example, if you break out port 11 into four logical ports, you must disable port 12.
These restrictions do not apply to a 2x50G breakout configuration or to the NVIDIA Spectrum SN2100 and SN2010 switches.
NVIDIA Spectrum-2 and Spectrum-3 ASICs have a limit of 128 logical ports. To ensure that the total number of logical interfaces does not exceed the limit, if you split ports into four interfaces on Spectrum-2 and Spectrum-3 switches with 64 interfaces, you must disable the adjacent port. For example, when splitting port 1 into four 25G interfaces, you must disable port 2 in the /etc/cumulus/ports.conf file:
1=4x25G
2=disabled
When you split a port into two interfaces, such as 2x50G, you do not have to disable the adjacent port.
Valid port configuration and breakout guidance for each platform is provided in the /etc/cumulus/ports.conf file.
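The odd-port and disabled-neighbor rules above can be checked mechanically before you restart or reload switchd. The following sketch is not the validator that ships with Cumulus Linux; check_breakouts is a hypothetical helper that assumes the port=mode syntax shown above and a platform where both rules apply (they do not apply to the SN2100 and SN2010, for example).

```shell
#!/bin/sh
# check_breakouts: read ports.conf-style lines on stdin and report any
# four-way breakout that breaks the odd-port / disabled-neighbor rule.
# Comment lines and lines without "=" are skipped.
check_breakouts() {
    awk -F= '
        /^[[:space:]]*#/ || NF < 2 { next }
        {
            gsub(/[[:space:]]/, "", $1)
            gsub(/[[:space:]]/, "", $2)
            mode[$1 + 0] = $2
        }
        END {
            bad = 0
            for (p in mode) {
                if (mode[p] !~ /^4x/) continue
                n = p + 1
                if (p % 2 == 0) {
                    printf "port %d: only odd-numbered ports can be split four ways\n", p
                    bad = 1
                } else if (!(n in mode) || mode[n] != "disabled") {
                    printf "port %d: port %d must be set to disabled\n", p, n
                    bad = 1
                }
            }
            exit bad
        }'
}

# Example: a valid fragment passes silently and returns 0.
printf '%s\n' '1=4x25G' '2=disabled' '3=100G' | check_breakouts && echo "ports.conf OK"
```

On a switch, you could run check_breakouts < /etc/cumulus/ports.conf. A 2x50G entry is ignored because the rules apply only to four-way splits.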
Configure a Breakout Port
To configure a breakout port:
This example command breaks out the 100G port on swp1 into four 25G ports:
cumulus@switch:~$ net add interface swp1 breakout 4x25G
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
To break out swp1 into four 10G ports, run the net add interface swp1 breakout 4x10G command.
On NVIDIA Spectrum switches and 64-port Broadcom switches, you need to disable the next port. The following example command disables swp2.
cumulus@switch:~$ net add interface swp2 breakout disabled
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
These commands break out swp1 into four 25G interfaces in the /etc/cumulus/ports.conf file and create four interfaces in the /etc/network/interfaces file:
cumulus@switch:~$ cat /etc/network/interfaces
...
auto swp1s0
iface swp1s0
auto swp1s1
iface swp1s1
auto swp1s2
iface swp1s2
auto swp1s3
iface swp1s3
...
When you commit your change on a Broadcom switch, switchd restarts to apply the changes. The restart interrupts network services. When you commit your change on an NVIDIA Spectrum switch, switchd reloads and there is no interruption to network services.
Edit the /etc/cumulus/ports.conf file to configure the port breakout. The following example breaks out the 100G port on swp1 into four 25G ports. To break out swp1 into four 10G ports, use 1=4x10G. On NVIDIA Spectrum switches and 64-port Broadcom switches with the Tomahawk2 ASIC, you need to disable the next port. The example also disables swp2.
The /etc/cumulus/ports.conf file varies across different hardware platforms.
Configure the breakout ports in the /etc/network/interfaces file. The following example shows the swp1 breakout ports (swp1s0, swp1s1, swp1s2, and swp1s3).
cumulus@switch:~$ sudo cat /etc/network/interfaces
...
auto swp1s0
iface swp1s0
auto swp1s1
iface swp1s1
auto swp1s2
iface swp1s2
auto swp1s3
iface swp1s3
...
On a Broadcom switch, restart switchd with the sudo systemctl restart switchd.service command.
Restarting the switchd service causes all network ports to reset, interrupting network services, in addition to resetting the switch hardware configuration.
On an NVIDIA Spectrum switch, you can reload switchd with the sudo systemctl reload switchd.service command. The reload does not interrupt network services.
To remove a breakout port:
Run the net del interface <interface> command. For example:
cumulus@switch:~$ net del interface swp1s0
cumulus@switch:~$ net del interface swp1s1
cumulus@switch:~$ net del interface swp1s2
cumulus@switch:~$ net del interface swp1s3
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Manually edit the /etc/cumulus/ports.conf file to configure the interface for the original speed. For example:
Restarting the switchd service causes all network ports to reset, interrupting network services, in addition to resetting the switch hardware configuration.
On a Spectrum switch, you can reload switchd with the sudo systemctl reload switchd.service command. The reload does not interrupt network services.
You can gang (combine) four 10G ports into one 40G port for use with a breakout cable, provided you follow these requirements:
You must gang four 10G ports in sequential order. For example, you cannot gang swp1, swp10, swp20 and swp40 together.
The ports must be in increments of four, with the starting port being swp1 (or swp5, swp9, and so on); you cannot gang swp2, swp3, swp4 and swp5 together.
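The two requirements above reduce to simple arithmetic on the port numbers: the first port must be one more than a multiple of four, and the remaining three must follow it consecutively. The following sketch shows the check; can_gang is a hypothetical helper that takes bare port numbers (without the swp prefix) and is not a Cumulus Linux command.

```shell
#!/bin/sh
# can_gang: succeed if the four port numbers given as arguments meet the
# two ganging requirements (sequential, starting at 1, 5, 9, ...).
can_gang() {
    [ "$#" -eq 4 ] || return 1
    first=$1
    # The group must start at a multiple of four plus one.
    [ $(( (first - 1) % 4 )) -eq 0 ] || return 1
    # The ports must be consecutive.
    expected=$first
    for port in "$@"; do
        [ "$port" -eq "$expected" ] || return 1
        expected=$((expected + 1))
    done
    return 0
}

can_gang 1 2 3 4     && echo "swp1-4: OK"
can_gang 2 3 4 5     || echo "swp2-5: not allowed"
can_gang 1 10 20 40  || echo "swp1, swp10, swp20, swp40: not allowed"
```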
Port ganging is not supported on NVIDIA Spectrum switches.
The /etc/cumulus/ports.conf file varies across different hardware platforms.
To gang swp1 through swp4 into a 40G port, run the following commands:
cumulus@switch:~$ net add int swp1-4 breakout /4
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
These commands create the following configuration snippet in the /etc/cumulus/ports.conf file:
Restarting the switchd service causes all network ports to reset, interrupting network services, in addition to resetting the switch hardware configuration.
Logical Switch Port Limitations
100G and 40G switches can support a certain number of logical ports, depending on the manufacturer; these include:
NVIDIA Spectrum SN2700, SN2700B, SN2410, and SN2410B switches
Switches with Broadcom Tomahawk, Trident II, Trident II+, and Trident3 chipsets
Before you configure any logical/unganged ports on a switch, check the limitations listed in /etc/cumulus/ports.conf; this file is specific to each manufacturer.
The following example shows the logical port limitation provided in the Dell Z9264F-ON ports.conf file. The maximum number of ports for this switch is 128.
# ports.conf --
#
# configure port speed, aggregation, and subdivision.
#
# The Dell Z9264F has:
# 64 QSFP28 ports numbered 1-64
# These ports are configurable as 100G, 50G, 40G, or split into
# 2x50G, 4x25G, or 4x10G ports.
#
# NOTE: You must restart switchd for any changes to take effect.
# Only "odd-numbered " port can be split into 4 interfaces and if an odd-numbered
# port is split in a 4X configuration, the port adjacent to it (even-numbered port)
# has to be set to "disabled " in this file. When splitting a port into two
# interfaces, like 2x50G, it is NOT required that the adjacent port be
# disabled. For example, when splitting port 11 into 4 10G interfaces, port
# 12 must be configured as "disabled" like this:
#
# 11=4x10G
# 12=disabled
# QSFP28 ports
#
# <port label> = [100G|50G|40G|2x50G|4x25G|4x10G|disabled]
NVIDIA Spectrum SN2700 and SN2700B switches have a limit of 64 logical ports in total. However, the logical ports must be configured in a specific way. See the note above.
ports.conf File Validator
Cumulus Linux includes a ports.conf validator that switchd runs automatically before the switch starts up to confirm that the file syntax is correct. You can run the validator manually to verify the syntax of the file whenever you make changes. The validator is useful if you want to copy a new ports.conf file to the switch with automation tools, then validate that it has the correct syntax.
To run the validator manually, run the /usr/cumulus/bin/validate-ports -f <file> command. For example:
To verify SFP settings, run the ethtool -m command. The following example shows the vendor, type and power output for the swp1 interface.
cumulus@switch:~$ sudo ethtool -m swp1 | egrep 'Vendor|type|power\s+:'
Transceiver type : 10G Ethernet: 10G Base-LR
Vendor name : FINISAR CORP.
Vendor OUI : 00:90:65
Vendor PN : FTLX2071D327
Vendor rev : A
Vendor SN : UY30DTX
Laser output power : 0.5230 mW / -2.81 dBm
Receiver signal average optical power : 0.7285 mW / -1.38 dBm
Considerations
Auto-negotiation and FEC on NVIDIA Spectrum Switches
On NVIDIA Spectrum switches, if auto-negotiation is disabled on 100G and 25G interfaces, you must set FEC to OFF, RS, or BaseR to match the neighbor. The FEC default setting of auto does not link up when auto-negotiation is disabled.
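To find interfaces that would hit this condition, you can scan the interface stanzas for link-autoneg off at 25G or 100G with no explicit link-fec line. The following is a rough sketch only: missing_fec is a hypothetical helper, it reads stanza text in the format used by the examples in this guide, and it does not account for settings that come from policy files or from files pulled in with source lines.

```shell
#!/bin/sh
# missing_fec: read /etc/network/interfaces-style text on stdin and print
# each interface that sets link-autoneg off at 25G or 100G without an
# explicit link-fec setting.
missing_fec() {
    awk '
        function report() {
            if (name != "" && autoneg == "off" && fec == "" &&
                (speed == 25000 || speed == 100000))
                print name
        }
        $1 == "iface"        { report(); name = $2; autoneg = speed = fec = ""; next }
        $1 == "link-autoneg" { autoneg = $2 }
        $1 == "link-speed"   { speed = $2 }
        $1 == "link-fec"     { fec = $2 }
        END { report() }'
}

# Example: swp1 is reported, swp2 is not because it sets link-fec rs.
printf '%s\n' \
    'auto swp1' 'iface swp1' 'link-autoneg off' 'link-speed 100000' \
    'auto swp2' 'iface swp2' 'link-autoneg off' 'link-speed 100000' 'link-fec rs' |
    missing_fec
```

On a switch, you could run missing_fec < /etc/network/interfaces and then set link-fec to off, rs, or baser on each reported interface to match its neighbor.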
Port Speed and the ifreload -a Command
When configuring port speed or break outs in the /etc/cumulus/ports.conf file, you need to run the ifreload -a command to reload the configuration after restarting switchd in the following cases:
If you configure, or configure then remove, the port speed in the /etc/cumulus/ports.conf file and you also set or remove the speed on the same physical port or breakouts of that port in the /etc/network/interfaces file since the last time you restarted switchd.
If you break out a switch port or remove a break out port and the port speed is set in both the /etc/cumulus/ports.conf file and the /etc/network/interfaces file.
Port Speed Configuration
If you change the port speed in the /etc/cumulus/ports.conf file but the speed is also configured for that port in the /etc/network/interfaces file, after you edit the /etc/cumulus/ports.conf file and restart switchd, you must also run the ifreload -a command so that the /etc/network/interfaces file is also updated with your change.
10G and 1G SFPs Inserted in a 25G Port
For 10G and 1G SFPs inserted in a 25G port on a Broadcom switch, you must configure the four ports in the same core to be 10G. Each set of four 25G ports is controlled by a single core; therefore, all four ports in a core must run at the same clock speed. The four ports must be in sequential order; for example, swp1, swp2, swp3, and swp4, unless a particular core grouping is specified in the /etc/cumulus/ports.conf file.
Edit the /etc/cumulus/ports.conf file and configure the four ports to be 10G. 1G SFPs are clocked at 10G speeds; therefore, for 1G SFPs, the /etc/cumulus/ports.conf file entry must also specify 10G. Currently you cannot use NCLU commands for this step.
You cannot use ethtool -s speed XX (or ifreload -a after setting the speed in the /etc/network/interfaces file) to change the port speed unless the four ports in a core group are already configured to 10G and switchd has been restarted. If the ports are still in 25G mode, using ethtool or ifreload to change the speed to 10G or 1G returns an error (and a return code of 255).
If you change the speed with ethtool to a setting already in use in the /etc/cumulus/ports.conf file, ethtool (and ifreload -a) do not return an error and no changes are made.
Restarting the switchd service causes all network ports to reset, interrupting network services, in addition to resetting the switch hardware configuration.
If you want to set the speed of any SFPs to 1G, set the port speed to 1000 Mbps using NCLU commands; this is not necessary for 10G SFPs. You don’t need to set the port speed to 1G for all four ports. For example, if you intend only for swp5 and swp6 to use 1G SFPs, do the following:
cumulus@switch:~$ net add interface swp5-swp6 link speed 1000
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
100G switch ASICs do not support 1000Base-X auto-negotiation (Clause 37), which is recommended for 1G fiber optical modules. As a result, single fiber breaks cannot be detected when using 1G optical modules on these switches.
The auto-negotiation setting must be the same on both sides of the connection. If using 1G fiber modules in 25G SFP28 ports, ensure auto-negotiation is disabled on the link partner interface as well.
Delta AGV848v1 Switch and Breakout Ports
Breaking out the 100G ports to 4x10G and 4x25G is not supported on the Delta AGV848v1 switch.
Timeout Error on Quanta LY8 and LY9 Switches
On Quanta T5048-LY8 and T3048-LY9 switches, an Operation timed out error occurs when you remove and reinsert a QSFP module.
You cannot remove the QSFPx2 module while the switch is powered on; it is not hot-swappable. However, if an Operation timed out error occurs, restart switchd to bring the link up. Be aware that this disrupts your network.
The front SFP+ ports (swp33 and swp34) are disabled in Cumulus Linux on the following switches:
Dell Z9100-ON
Penguin Arctica 3200-series switches (the 3200C, 3200XL and 3200XLP)
Supermicro SSE-C3632S
These ports appear as disabled in the /etc/cumulus/ports.conf file.
200G Interfaces on the Dell S5248F Switch
On the Dell S5248F switch, the 2x200G QSFP-DD interfaces labeled 49/50 and 51/52 are not supported natively at 200G speeds. The interfaces are supported with 100G cables; however, you can only use one 100G from each QSFP-DD port. The upper QSFP-DD port is named swp49 and the lower QSFP-DD port is named swp52.
QSFP+ Ports on the Dell S5232F Switch
Cumulus Linux does not support the 2x10G QSFP+ ports on the Dell S5232F switch.
QSFP+ Ports on the Dell S4148T Switch
On the Dell S4148T switch, the two QSFP+ ports are set to disabled by default and the four QSFP28 ports are configured for 100G. The following example shows the default settings in the /etc/cumulus/ports.conf file for this switch:
To enable the two QSFP+ ports, you must configure all four QSFP28 ports for either 40G or 4x10G. You cannot use either of the QSFP+ ports if any of the QSFP28 ports are configured for 100G.
The following example shows the /etc/cumulus/ports.conf file with all four QSFP28 ports configured for 40G and both QSFP+ ports enabled:
To disable the QSFP+ ports, you must set the ports to disabled. Do not comment out the lines as this prevents switchd from restarting.
1000BASE-T SFP Modules Supported Only on Certain 25G Platforms
1000BASE-T SFP modules are supported on only the following 25G platforms:
Cumulus Express CX-5148-S and the Edgecore AS7326-56X, provided the switch has board revision R01D (to determine the revision of the board, look for the output in the label revision field when you run decode-syseeprom)
Dell S5248F-ON
NVIDIA Spectrum SN2410
NVIDIA Spectrum SN2010
1000BASE-T SFP modules are not supported on any 100G or faster platforms.
NVIDIA Spectrum SN2100 Switch and eth0 Link Speed
After rebooting the NVIDIA Spectrum SN2100 switch, eth0 always has a speed of 100Mb/s. If you bring the interface down and then back up again, the interface negotiates 1000Mb/s. This only occurs the first time the interface comes up.
To work around this issue, add the following commands to the /etc/rc.local file to flap the interface automatically when the switch boots:
modprobe -r igb
sleep 20
modprobe igb
Link Speed on the EdgeCore AS7326-56X Switch
On the EdgeCore AS7326-56X switch, all four switch ports in each port group must be set to the same link speed; otherwise, the links do not come up. These ports are set to 25G by default, but can also be set to 10G. The port groups on this switch are as follows, where each row is a port group:
1 2 3 6*
4 5 7* 9
8 10 11* 12
13 14 15 18*
16 17 19* 21
20 22 23* 24
25 26 27 30*
28 29 31* 33
32 34 35* 36
37 38 39 42*
40* 41 43 45
44* 46 47 48
For example, if you configure port 19 for 10G, you must also configure ports 16, 17 and 21 for 10G.
Additionally, you can gang each port group together as a 100G or 40G port. When ganged together, one port (based on the arrangement of the ports) is designated as the gang leader. This port’s number is used to configure the ganged ports and is marked with an asterisk (*) above.
The EdgeCore AS7326-56X is a 48x25G + 8x100G + 2x10G switch. The dedicated 10G ports are not currently supported in Cumulus Linux. However, you can configure all other ports to run at 10G speeds.
Link Speed on the Lenovo NE2572O Switch
The Lenovo NE2572O switch has external retimers on swp1 through swp8. Currently, these ports only support a speed of 25G.
Link Speed and Auto-negotiation on Switches with SOL
The following switches that use Serial over LAN technology (SOL) do not support eth0 speed or auto-negotiation changes:
EdgeCore AS7816-64X
Penguin Arctica 4804ip
Penguin Arctica NX3200c
Penguin Arctica NX4808xxv
Delay in Reporting Interface as Operational Down
When you remove two transceivers simultaneously from a switch, both interfaces show the carrier down status immediately. However, it takes one second for the second interface to show the operational down status. In addition, the services on this interface also take an extra second to come down.
NVIDIA Spectrum-2 and Tomahawk-based Switches Support Different FEC Modes
The NVIDIA Spectrum-2 (25G) switch only supports RS FEC. The Tomahawk-based switch only supports Base-R FEC. These two switches do not share compatible FEC modes and do not interoperate reliably.
Maverick Switches with Modules that Don’t Support Auto-negotiation
On a Maverick switch, if auto-negotiation is configured on a 10G interface and the installed module does not support auto-negotiation (for example, 10G DAC, 10G Optical, 1G RJ45 SFP), the link breaks.
To work around this issue, disable auto-negotiation on interfaces where it is not supported.
Dell Z9264F-ON 10G Interfaces are Unsupported
The Dell Z9264F-ON has 64x100G + 2x 10G SFP+ ports. The 2x 10G SFP+ ports are not supported in Cumulus Linux.
ifplugd is an Ethernet link-state monitoring daemon that executes user-specified scripts to configure an Ethernet device when a cable is plugged in, or automatically unconfigure an Ethernet device when a cable is removed. Follow the steps below to install and configure the ifplugd daemon.
Install ifplugd
You can install this package even if the switch is not connected to the internet, as it is contained in the cumulus-local-apt-archive repository that is embedded in the Cumulus Linux image.
To install ifplugd:
Update the switch before installing the daemon:
cumulus@switch:~$ sudo -E apt-get update
Install the ifplugd package:
cumulus@switch:~$ sudo -E apt-get install ifplugd
Configure ifplugd
After you install ifplugd, you must edit two configuration files:
/etc/default/ifplugd
/etc/ifplugd/action.d/ifupdown
The example configuration below configures ifplugd to bring down all uplinks when the peer bond goes down in an MLAG environment.
Open /etc/default/ifplugd in a text editor and configure the file as appropriate. Add the peerbond name before you save the file.
Open the /etc/ifplugd/action.d/ifupdown file in a text editor. Configure the script, then save the file.
#!/bin/sh
set -e
# ifplugd passes the link event ("up" or "down") as the second argument.
case "$2" in
    up)
        clagrole=$(clagctl | grep "Our Priority" | awk '{print $8}')
        if [ "$clagrole" = "secondary" ]
        then
            # List all the interfaces below to bring up when clag peerbond comes up.
            for interface in swp1 bond1 bond3 bond4
            do
                echo "bringing up : $interface"
                ip link set "$interface" up
            done
        fi
        ;;
    down)
        clagrole=$(clagctl | grep "Our Priority" | awk '{print $8}')
        if [ "$clagrole" = "secondary" ]
        then
            # List all the interfaces below to bring down when clag peerbond goes down.
            for interface in swp1 bond1 bond3 bond4
            do
                echo "bringing down : $interface"
                ip link set "$interface" down
            done
        fi
        ;;
esac
Restart the ifplugd daemon to implement the changes:
The default shell for ifplugd is dash (/bin/sh) instead of bash, as it provides a faster and more nimble shell. However, dash contains fewer features than bash (for example, dash is unable to handle multiple uplinks).
Buffer and Queue Management
Hardware datapath configuration manages packet buffering, queueing and scheduling in hardware.
The /usr/lib/python2.7/dist-packages/cumulus/__chip_config/[bcm|mlx]/datapath.conf file assigns buffer space and egress queues. The default thresholds defined in the datapath.conf file are intended for data center environments, but certain workloads might require additional tuning. Make small, incremental changes and validate each one against your application performance. Be sure to back up the original file before making changes.
Each packet is assigned to an ASIC Class of Service (CoS) value based on the priority value of the packet stored in the 802.1p (Class of Service) or DSCP (Differentiated Services Code Point) header field. The choice to schedule packets based on CoS or DSCP is a configurable option in the /etc/cumulus/datapath/traffic.conf file.
Priority groups include:
Control: Highest priority traffic
Service: Second-highest priority traffic
Bulk: All remaining traffic
The scheduler is configured to use a hybrid scheduling algorithm. It applies strict priority to control traffic queues and a weighted round robin selection from the remaining queues. Unicast packets and multicast packets with the same priority value are assigned to separate queues, which are assigned equal scheduling weights.
You can configure Quality of Service (QoS) for switches on the following platforms only:
Broadcom Tomahawk, Trident II, Trident II+, and Trident3
Mellanox Spectrum, Spectrum-2, and Spectrum-3
Traffic Marking
You can mark traffic for egress packets through iptables or ip6tables rule classifications. To enable these rules, do one of the following:
Mark DSCP values in egress packets.
Mark 802.1p CoS values in egress packets.
To enable traffic marking, use cl-acltool. Add the -p option to specify the location of the policy file. By default, if you do not include the -p option, cl-acltool looks for the policy file in /etc/cumulus/acl/policy.d/.
The iptables-/ip6tables-based marking is supported with the following action extension:
-j SETQOS --set-dscp 10 --set-cos 5
For ebtables, the setqos keyword must be in lowercase, as in:
[ebtables]
-A FORWARD -o swp5 -j setqos --set-cos 5
You can specify one of the following targets for SETQOS/setqos:
Option
Description
--set-cos INT
Sets the datapath resource/queuing class value. Values are defined in IEEE P802.1p.
--set-dscp value
Sets the DSCP field in packet header to a value, which can be either a decimal or hex value.
--set-dscp-class class
Sets the DSCP field in the packet header to the value represented by the DiffServ class value. This class can be EF, BE or any of the CSxx or AFxx classes.
You can specify either --set-dscp or --set-dscp-class, but not both.
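Because --set-dscp takes a number and --set-dscp-class takes a class name, it can help to see how the two relate. The following sketch converts a DiffServ class name to its decimal code point using the standard DiffServ definitions (BE is 0, EF is 46, CSn is 8 x n, and AFxy is 8 x + 2 y). dscp_value is a hypothetical helper for illustration; it is not part of cl-acltool.

```shell
#!/bin/sh
# dscp_value: print the decimal DSCP code point for a DiffServ class name.
# BE = 0, EF = 46, CSn = 8 * n, AFxy = 8 * x + 2 * y.
dscp_value() {
    class=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    case "$class" in
        BE) echo 0 ;;
        EF) echo 46 ;;
        CS[0-7]) echo $(( ${class#CS} * 8 )) ;;
        AF[1-4][1-3])
            digits=${class#AF}
            echo $(( ${digits%?} * 8 + ${digits#?} * 2 )) ;;
        *) echo "unknown DSCP class: $1" >&2; return 1 ;;
    esac
}

dscp_value AF21   # prints 18
dscp_value cs5    # prints 40
```

For example, a rule that uses --set-dscp-class EF marks packets with the same value as a rule that uses --set-dscp 46.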
You can put the rule in either the mangle table or the default filter table; the mangle table and filter table are put into separate TCAM slices in the hardware.
To put the rule in the mangle table, include -t mangle; to put the rule in the filter table, omit -t mangle.
Priority Flow Control
Priority flow control, as defined in the IEEE 802.1Qbb standard, provides a link-level flow control mechanism that can be controlled independently for each Class of Service (CoS) with the intention to ensure no data frames are lost when congestion occurs in a bridged network.
PFC is not supported on switches with the Helix4 ASIC.
PFC is a layer 2 mechanism that prevents congestion by throttling packet transmission. When PFC is enabled for received packets on a set of switch ports, the switch detects congestion in the ingress buffer of the receiving port and signals the upstream switch to stop sending traffic. If the upstream switch has PFC enabled for packet transmission on the designated priorities, it responds to the downstream switch and stops sending those packets for a period of time.
PFC operates between two adjacent neighbor switches; it does not provide end-to-end flow control. However, when an upstream neighbor throttles packet transmission, it could build up packet congestion and propagate PFC frames further upstream: eventually the sending server could receive PFC frames and stop sending traffic for a time.
The PFC mechanism can be enabled for individual switch priorities on all or specific switch ports for received and/or transmitted traffic. The ingress buffer occupancy of the switch port is used to measure congestion. If congestion is present, the switch transmits flow control frames to the upstream switch. Packets with priority values that do not have PFC configured are not counted during congestion detection and they do not get throttled by the upstream switch when it receives flow control frames.
PFC congestion detection is implemented on the switch using xoff and xon threshold values for the specific ingress buffer used by the targeted switch priorities. When a packet enters the buffer and the buffer occupancy is above the xoff threshold, the switch transmits an Ethernet PFC frame to the upstream switch to signal packet transmission must stop. When the buffer occupancy drops below the xon threshold, the switch sends another PFC frame upstream to signal that packet transmission can resume. (PFC frames contain a quanta value to indicate a timeout value for the upstream switch: packet transmission can resume after the timer has expired or when a PFC frame with quanta == 0 is received from the downstream switch.)
After the downstream switch sends a PFC frame upstream, it continues to receive packets until the upstream switch receives and responds to the PFC frame. The downstream ingress buffer must be large enough to store those additional packets after the xoff threshold is reached.
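The xoff/xon behavior described above is a hysteresis loop, which the following Python sketch models in a simplified way. The class name PfcIngressBuffer and the threshold values (10000 and 8000 bytes) are illustrative assumptions; the sketch ignores the quanta timer and tracks only buffer occupancy.

```python
class PfcIngressBuffer:
    """Simplified model of xoff/xon congestion detection on one ingress buffer."""

    def __init__(self, xoff_bytes=10000, xon_bytes=8000):
        self.xoff_bytes = xoff_bytes  # occupancy above this triggers a "stop" PFC frame
        self.xon_bytes = xon_bytes    # occupancy below this triggers a "resume" PFC frame
        self.occupancy = 0
        self.paused = False

    def update(self, delta_bytes):
        """Apply a change in occupancy and return 'XOFF', 'XON', or None."""
        self.occupancy = max(0, self.occupancy + delta_bytes)
        if not self.paused and self.occupancy > self.xoff_bytes:
            self.paused = True
            return "XOFF"
        if self.paused and self.occupancy < self.xon_bytes:
            self.paused = False
            return "XON"
        return None


buf = PfcIngressBuffer()
# Bytes arriving (+) and draining (-); packets keep arriving briefly after XOFF is sent.
signals = [buf.update(delta) for delta in (6000, 5000, 3000, -4000, -3000, -2000)]
print(signals)  # [None, 'XOFF', None, None, 'XON', None]
```

Note how the third event still adds 3000 bytes after XOFF is signaled; this is the in-flight traffic that the buffer must be large enough to absorb.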
Priority flow control is fully supported on both Broadcom (including the Edgecore Minipack-AS8000/Trident3) and Mellanox switches.
PFC is disabled by default in Cumulus Linux. To configure PFC, update and uncomment the settings in the priority flow control section of the /etc/cumulus/datapath/traffic.conf file.
# to configure priority flow control on a group of ports:
# -- assign cos value(s) to the cos list
# -- add or replace port group names in the port group list
# -- for each port group in the list
# -- populate the port set, e.g.
# swp1-swp4,swp8,swp50s0-swp50s3
# -- set a PFC buffer size in bytes for each port in the group
# -- set the xoff byte limit (buffer limit that triggers PFC frames transmit to start)
# -- set the xon byte delta (buffer limit that triggers PFC frames transmit to stop)
# -- enable PFC frame transmit and/or PFC frame receive
# priority flow control
pfc.port_group_list = [pfc_port_group]
pfc.pfc_port_group.cos_list = []
pfc.pfc_port_group.port_set = swp1-swp4,swp6
pfc.pfc_port_group.port_buffer_bytes = 25000
pfc.pfc_port_group.xoff_size = 10000
pfc.pfc_port_group.xon_delta = 2000
pfc.pfc_port_group.tx_enable = true
pfc.pfc_port_group.rx_enable = true
#
# Specify cable length in mts
pfc.pfc_port_group.cable_length = 10
| PFC Setting | Description |
| --- | --- |
| pfc.port_group_list | The name of the port group in brackets. |
| pfc.pfc_port_group.cos_list | The CoS value(s) to which PFC applies on the ports in the port group. |
| pfc.pfc_port_group.port_set | The ports in the port group. |
| pfc.pfc_port_group.port_buffer_bytes | The PFC buffer size. This is the maximum number of bytes allocated for storing bursts of packets, guaranteed at the ingress port. The default is 25000 bytes. This setting is optional. If not provided, the value is derived from the port speed, port MTU, or port cable length. |
| pfc.pfc_port_group.xoff_size | The xoff byte limit. This is a threshold for the PFC buffer; when this limit is reached, an xoff transition is initiated, signaling the upstream port to stop sending traffic, during which time packets continue to arrive due to the latency of the communication. The default is 10000 bytes. This setting is optional. If not provided, the value is derived from the port speed, port MTU, or port cable length. |
| pfc.pfc_port_group.xon_delta | The xon delta limit. This is the number of bytes to subtract from the xoff limit, which results in a second threshold at which the upstream port resumes sending traffic. After the xoff limit is reached and the upstream port stops sending traffic, the buffer begins to drain. When the buffer drains to 8000 bytes (assuming default xoff and xon settings), the port signals that it can start receiving traffic again. The default is 2000 bytes. This setting is optional. If not provided, the value is derived from the port speed, port MTU, or port cable length. |
| pfc.pfc_port_group.tx_enable | Enables the egress port to signal the upstream port to stop sending traffic. The default is true. |
| pfc.pfc_port_group.rx_enable | Enables the egress port to receive notifications and act on them. The default is true. |
| pfc.pfc_port_group.cable_length | The cable length, in meters, for the ports in the port group. |
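To make the relationship between these settings concrete, the following Python sketch parses the key = value lines of a PFC stanza and derives the xon threshold (xoff_size minus xon_delta) and the buffer space left above the xoff limit. The function names parse_settings and pfc_thresholds are hypothetical, and treating port_buffer_bytes minus xoff_size as the room for in-flight packets is an interpretation of the descriptions above, not a documented formula.

```python
PFC_SNIPPET = """\
# priority flow control
pfc.port_group_list = [pfc_port_group]
pfc.pfc_port_group.port_buffer_bytes = 25000
pfc.pfc_port_group.xoff_size = 10000
pfc.pfc_port_group.xon_delta = 2000
"""


def parse_settings(text):
    """Parse 'key = value' lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def pfc_thresholds(settings, group):
    """Derive the byte thresholds for one PFC port group."""
    prefix = f"pfc.{group}."
    buffer_bytes = int(settings[prefix + "port_buffer_bytes"])
    xoff = int(settings[prefix + "xoff_size"])
    xon = xoff - int(settings[prefix + "xon_delta"])
    # Space left for packets that arrive after the xoff frame is sent (assumed meaning).
    return {"xoff": xoff, "xon": xon, "above_xoff": buffer_bytes - xoff}


print(pfc_thresholds(parse_settings(PFC_SNIPPET), "pfc_port_group"))
# {'xoff': 10000, 'xon': 8000, 'above_xoff': 15000}
```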
On Broadcom switches, after you modify the settings in the /etc/cumulus/datapath/traffic.conf file, you must restart switchd for the changes to take effect; run the sudo systemctl restart switchd.service command.
Restarting the switchd service causes all network ports to reset, interrupting network services, in addition to resetting the switch hardware configuration.
On NVIDIA Spectrum switches, changes to the settings in the /etc/cumulus/datapath/traffic.conf file do not require you to restart switchd. However, you must run the echo 1 > /cumulus/switchd/config/traffic/reload command to apply the settings.
Always run the syntax checker before applying the configuration changes.
Port Groups
A port group refers to one or more sequences of contiguous ports. You can define multiple port groups by adding:
A comma-separated list of port group names to the port_group_list.
The port_set, rx_enable, and tx_enable configuration lines for each port group.
You can specify the set of ports in a port group in comma-separated sequences of contiguous ports; you can see which ports are contiguous in the /var/lib/cumulus/porttab file. The syntax supports:
A single port (swp1s0 or swp5).
A sequence of regular swp ports (swp2-swp5).
A sequence within a breakout swp port (swp6s0-swp6s3).
A sequence of regular and breakout ports, provided they are all in a contiguous range. For example:
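As an illustrative sketch of the port set syntax, the following Python function expands a port_set string into individual port names. It handles single ports, ranges of regular swp ports, and ranges within one breakout port. Ranges that mix regular and breakout ports depend on the ordering in the porttab file, so this hypothetical helper (expand_port_set is not a Cumulus Linux tool) rejects them rather than guessing.

```python
import re

_PORT = re.compile(r"^swp(\d+)(?:s(\d+))?$")


def expand_port_set(port_set):
    """Expand a comma-separated port set such as 'swp1-swp4,swp6' into a list of ports."""
    ports = []
    for item in port_set.split(","):
        item = item.strip()
        first, dash, last = item.partition("-")
        start = _PORT.match(first)
        if start is None:
            raise ValueError(f"not a port name: {first!r}")
        if not dash:
            ports.append(first)  # a single port such as swp5 or swp1s0
            continue
        end = _PORT.match(last)
        if end is None:
            raise ValueError(f"not a port name: {last!r}")
        p1, s1 = start.groups()
        p2, s2 = end.groups()
        if s1 is None and s2 is None:
            # Range of regular ports, for example swp2-swp5.
            ports.extend(f"swp{n}" for n in range(int(p1), int(p2) + 1))
        elif s1 is not None and s2 is not None and p1 == p2:
            # Range within one breakout port, for example swp6s0-swp6s3.
            ports.extend(f"swp{p1}s{n}" for n in range(int(s1), int(s2) + 1))
        else:
            raise ValueError(f"range {item!r} needs the porttab ordering to expand")
    return ports


print(expand_port_set("swp1-swp4,swp6"))
# ['swp1', 'swp2', 'swp3', 'swp4', 'swp6']
```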
Link Pause
The PAUSE frame is a flow control mechanism that halts the sender's transmission for a specified period of time, which might be needed if a server or other network node within the data center receives traffic faster than it can handle. In Cumulus Linux, you can configure individual ports to execute link pause by:
Transmitting pause frames when the ingress buffers become congested (TX pause enable).
Responding to received pause frames (RX pause enable).
Link pause is disabled by default. To enable link pause, you must configure settings in the /etc/cumulus/datapath/traffic.conf file.
What's the difference between link pause and priority flow control?
Priority flow control is applied to an individual priority group for a specific ingress port.
Link pause (also known as port pause or global pause) is applied to all the traffic for a specific ingress port.
Here is an example configuration that enables TX pause and RX pause for swp1 through swp4 and swp6:
# to configure pause on a group of ports:
# -- add or replace port group names in the port group list
# -- for each port group in the list
# -- populate the port set, e.g.
# swp1-swp4,swp8,swp50s0-swp50s3
# -- set a pause buffer size in bytes for each port
# -- set the xoff byte limit (buffer limit that triggers pause frames transmit to start)
# -- set the xon byte delta (buffer limit that triggers pause frames transmit to stop)
# -- enable pause frame transmit and/or pause frame receive
# link pause
link_pause.port_group_list = [pause_port_group]
link_pause.pause_port_group.port_set = swp1-swp4,swp6
link_pause.pause_port_group.port_buffer_bytes = 25000
link_pause.pause_port_group.xoff_size = 10000
link_pause.pause_port_group.xon_delta = 2000
link_pause.pause_port_group.rx_enable = true
link_pause.pause_port_group.tx_enable = true
# Specify cable length in mts
link_pause.pause_port_group.cable_length = 10
The link_pause.pause_port_group.port_buffer_bytes, link_pause.pause_port_group.xoff_size, and link_pause.pause_port_group.xon_delta settings are optional. If not provided, the values are derived from the port speed, port MTU, or port cable length.
On Broadcom switches, after you modify the settings in the /etc/cumulus/datapath/traffic.conf file, you must restart switchd for the changes to take effect; run the sudo systemctl restart switchd.service command.
Restarting the switchd service causes all network ports to reset, interrupting network services, in addition to resetting the switch hardware configuration.
On NVIDIA Spectrum switches, changes to the settings in the /etc/cumulus/datapath/traffic.conf file do not require you to restart switchd. However, you must run the echo 1 > /cumulus/switchd/config/traffic/reload command to apply the settings.
Always run the syntax checker before applying the configuration changes.
Cut-through Mode and Store and Forward Switching
Cut-through mode is disabled in Cumulus Linux by default on switches with Broadcom ASICs. On NVIDIA Spectrum switches, you cannot disable cut-through mode.
# Cut-through is disabled by default on all chips with the exception of
# Spectrum. On Spectrum cut-through cannot be disabled.
#cut_through_enable = false
If cut-through mode is enabled and link pause is asserted, Cumulus Linux generates a TOVR and TUFL ERROR; certain error counters increment on a given physical port.
On Broadcom switches, after you modify the settings in the /etc/cumulus/datapath/traffic.conf file, you must restart switchd for the changes to take effect; run the sudo systemctl restart switchd.service command. Always run the syntax checker before applying the configuration changes.
Restarting the switchd service causes all network ports to reset, interrupting network services, in addition to resetting the switch hardware configuration.
On switches using Broadcom Tomahawk, Trident II, Trident II+, and Trident3 ASICs, Cumulus Linux supports store and forward switching but does not support cut-through mode.
On switches with the Mellanox Spectrum ASIC, Cumulus Linux supports cut-through mode but does not support store and forward switching.
Congestion Notification
Explicit Congestion Notification (ECN) is defined by RFC 3168. ECN enables the Cumulus Linux switch to mark a packet to signal impending congestion instead of dropping the packet, which is how TCP typically behaves when ECN is not enabled.
ECN is a layer 3 end-to-end congestion notification mechanism only. Packets can be marked as ECN-capable transport (ECT) by the sending server. If congestion is observed by any switch while the packet is getting forwarded, the ECT-enabled packet can be marked by the switch to indicate the congestion. The end receiver can respond to the ECN-marked packets by signaling the sending server to slow down transmission. The sending server marks a packet ECT by setting the two least significant bits in an IP header DiffServ (ToS) field to 01 or 10. A packet that has the two least significant bits set to 00 indicates a non-ECT-enabled packet.
The ECN mechanism on a switch only marks packets to notify the end receiver. It does not take any other action or change packet handling in any way, nor does it respond to packets that have already been marked ECN by an upstream switch.
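The ECN bit handling described above can be sketched in a few lines of Python. The code points follow RFC 3168 (00 is Not-ECT, 01 and 10 are ECT, 11 is congestion experienced); the function names are hypothetical, and the sketch operates on the ToS byte only, not on real packets.

```python
ECN_NAMES = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}


def ecn_bits(tos):
    """Return the two least significant bits of the DiffServ (ToS) byte."""
    return tos & 0b11


def is_ect(tos):
    """True if the sender marked the packet as ECN-capable transport."""
    return ecn_bits(tos) in (0b01, 0b10)


def mark_congestion(tos):
    """Set the ECN bits to 11 on an ECT packet; leave any other packet unchanged."""
    return tos | 0b11 if is_ect(tos) else tos


tos = (10 << 2) | 0b10                # DSCP 10 with ECT(0)
marked = mark_congestion(tos)
print(ECN_NAMES[ecn_bits(tos)])       # ECT(0)
print(ECN_NAMES[ecn_bits(marked)])    # CE
print(marked >> 2)                    # 10 (the DSCP value is untouched)
```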
On Trident II switches only, if ECN is enabled on a specific queue, the ASIC also enables RED on the same queue. If the packet is ECT marked (the ECN bits are 01 or 10), the ECN mechanism executes as described above. However, if it is entering an ECN-enabled queue but is not ECT marked (the ECN bits are 00), then the RED mechanism uses the same threshold and probability values to decide whether to drop the packet. Packets entering a non-ECN-enabled queue do not get marked or dropped due to ECN or RED in any case.
ECN is implemented on the switch using minimum and maximum threshold values for the egress queue length. When a packet enters the queue and the average queue length is between the minimum and maximum threshold values, a configurable probability value will determine whether the packet is marked. If the average queue length is above the maximum threshold value, the packet is always marked.
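The threshold logic in the preceding paragraph can be sketched as follows. This is a minimal Python model under stated assumptions: the function name should_mark is hypothetical, the probability is a percentage applied uniformly between the two thresholds as the text describes, and the handling of values exactly equal to a threshold is a guess. The ASIC implementation might differ.

```python
import random


def should_mark(avg_queue_bytes, min_threshold_bytes, max_threshold_bytes,
                probability, rng=random.random):
    """Decide whether to ECN-mark a packet entering an egress queue.

    probability is a percentage (0-100); rng returns a float in [0.0, 1.0).
    """
    if avg_queue_bytes > max_threshold_bytes:
        return True                      # above the maximum: always mark
    if avg_queue_bytes < min_threshold_bytes:
        return False                     # below the minimum: never mark
    return rng() * 100 < probability     # in between: mark with the given probability


# Thresholds chosen for illustration only.
print(should_mark(10_000, 40_000, 200_000, probability=100))   # False
print(should_mark(100_000, 40_000, 200_000, probability=100))  # True
print(should_mark(300_000, 40_000, 200_000, probability=0))    # True
```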
The downstream switches with ECN enabled perform the same actions as the traffic is received. If the ECN bits are set, they remain set. The only way to overwrite ECN bits is to set the ECN bits to 11.
ECN is supported on Broadcom Tomahawk, Tomahawk2, Trident II, Trident II+ and Trident3, and Mellanox Spectrum ASICs.
ECN is disabled by default in Cumulus Linux. You can enable ECN for individual switch priorities on specific switch ports in the /etc/cumulus/datapath/traffic.conf file:
Specify the name of the port group in ecn_red.port_group_list in brackets; for example, ecn_red.port_group_list = [ecn_red_port_group].
Assign a CoS value to the port group in ecn_red.ecn_red_port_group.cos_list. If the CoS value of a packet matches the value of this setting, ECN is applied.
Populate the port group with its member ports (ecn_red.ecn_red_port_group.port_set). Congestion is measured on the egress port queue for the ports listed here, using the average queue length: if congestion is present, a packet entering the queue can be marked to indicate that congestion was observed. Marking a packet involves setting the two least significant bits in the IP header DiffServ (ToS) field to 11.
The switch priority value(s) are mapped to specific egress queues for the target switch ports.
The ecn_red.ecn_red_port_group.probability value indicates the probability of a packet being marked if congestion is experienced.
The following configuration example shows ECN configured for ports swp1 through swp4 and swp6:
# Explicit Congestion Notification
# to configure ECN and RED on a group of ports:
# -- add or replace port group names in the port group list
# -- assign cos value(s) to the cos list
# -- for each port group in the list
# -- populate the port set, e.g.
# swp1-swp4,swp8,swp50s0-swp50s3
# -- to enable RED requires the latest traffic.conf
ecn_red.port_group_list = [ecn_red_port_group]
ecn_red.ecn_red_port_group.cos_list = [3]
ecn_red.ecn_red_port_group.port_set = swp1-swp4,swp6
ecn_red.ecn_red_port_group.ecn_enable = true
ecn_red.ecn_red_port_group.red_enable = false
ecn_red.ecn_red_port_group.min_threshold_bytes = 40000
ecn_red.ecn_red_port_group.max_threshold_bytes = 200000
ecn_red.ecn_red_port_group.probability = 100
On Broadcom switches, after you modify the settings in the /etc/cumulus/datapath/traffic.conf file, you must restart switchd for the changes to take effect; run the sudo systemctl restart switchd.service command.
Restarting the switchd service causes all network ports to reset, interrupting network services, in addition to resetting the switch hardware configuration.
On NVIDIA Spectrum switches, changes to the settings in the /etc/cumulus/datapath/traffic.conf file do not require you to restart switchd. However, you must run the echo 1 > /cumulus/switchd/config/traffic/reload command to apply the settings.
Always run the syntax checker before applying the configuration changes.
Scheduling Weights Per Egress Queue
On NVIDIA Spectrum switches, you can set the scheduling weight per egress queue, which determines the amount of bandwidth assigned to the queue. Cumulus Linux supports eight queues per port. You can either use a default profile that each port inherits or create separate profiles that map a different set of ports. Each profile, including the default profile, has weights configured for each egress queue (0-7).
You set the weights per egress queue as a percentage. The total weight percentages for all egress queues cannot be greater than 100. If you do not define a weight for an egress queue, no scheduling is done for packets on this queue if congestion occurs. If you want to configure strict scheduling on an egress queue (always send every single packet in the queue) set the value to 0.
You can configure per queue egress scheduling with NCLU commands or manually by editing the /etc/cumulus/datapath/traffic.conf file.
Cumulus Linux provides a default profile. You can either enable the default profile or configure a non-default profile.
The following example commands enable the default profile:
cumulus@switch:~$ net add qos egress-sched default_profile
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
In the default profile, the egress queue weights are set as follows. You cannot modify these values with NCLU.
The following commands create a non-default profile for port group port_group1 for swp2 and swp3, set the weight to 30 percent on egress queue 2 and strict scheduling on egress queue 3:
cumulus@switch:~$ net add qos egress-sched profile port_set swp2-swp3
cumulus@switch:~$ net add qos egress_sched profile sched_port_group1 queue 2 dwrr bw_percent 30
cumulus@switch:~$ net add qos egress_sched profile sched_port_group1 queue 3 strict
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
The NCLU commands save the configuration in the /etc/cumulus/datapath/traffic.conf file. For example:
To configure a non-default profile with NCLU, you must configure the port set for the profile before you configure the bandwidth percent for the egress queues.
The total bandwidth percent for all egress queues cannot be greater than 100.
If you delete the port set for a non-default profile, the bandwidth percent for all the queues in that profile is deleted.
To configure per queue egress scheduling manually, update and uncomment the settings in the default egress scheduling weight per egress queue section of the /etc/cumulus/datapath/traffic.conf file.
The following example enables the default profile, and sets the weight to 30 percent for egress queue 2 and 10 percent for the remaining egress queues. The settings are applied to all ports.
# default egress scheduling weight per egress queue
# To be applied to all the ports if port_group profile not configured
# If you do not specify any bw_percent of egress_queues, those egress queues
# will assume DWRR weight 0 - no egress scheduling for those queues
# '0' indicates strict priority
default_egress_sched.egr_queue_0.bw_percent = 10
default_egress_sched.egr_queue_1.bw_percent = 10
default_egress_sched.egr_queue_2.bw_percent = 30
default_egress_sched.egr_queue_3.bw_percent = 10
default_egress_sched.egr_queue_4.bw_percent = 10
default_egress_sched.egr_queue_5.bw_percent = 10
default_egress_sched.egr_queue_6.bw_percent = 10
default_egress_sched.egr_queue_7.bw_percent = 10
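Because the total of the weights cannot be greater than 100, it can help to check the values before you apply them. The following Python sketch reads default_egress_sched lines like the ones above and sums the bw_percent values; total_bw_percent is a hypothetical helper and is not the Cumulus Linux syntax checker.

```python
import re

_BW = re.compile(
    r"^default_egress_sched\.egr_queue_([0-7])\.bw_percent\s*=\s*(\d+)$"
)

# Same values as the example: 30 percent for queue 2, 10 percent for the others.
EXAMPLE = "\n".join(
    f"default_egress_sched.egr_queue_{q}.bw_percent = {30 if q == 2 else 10}"
    for q in range(8)
)


def total_bw_percent(conf_text):
    """Return (weights per queue, total); raise if the total exceeds 100."""
    weights = {}
    for line in conf_text.splitlines():
        match = _BW.match(line.strip())  # commented-out lines do not match
        if match:
            weights[int(match.group(1))] = int(match.group(2))
    total = sum(weights.values())
    if total > 100:
        raise ValueError(f"egress queue weights total {total}, which exceeds 100")
    return weights, total


weights, total = total_bw_percent(EXAMPLE)
print(total)  # 100
```

A queue configured with 0 (strict priority) adds nothing to the total, which is consistent with the rule above.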
The following example creates a non-default profile for port group port_group1, sets the weight to 30 percent for egress queue 1 and 2, to 0 for egress queue 6 and 7 (always send every single packet from egress queue 6 and 7 before any other queue), and 10 percent for the remaining egress queues:
On Broadcom switches, after you modify the settings in the /etc/cumulus/datapath/traffic.conf file, you must restart switchd for the changes to take effect; run the sudo systemctl restart switchd.service command.
Restarting the switchd service causes all network ports to reset, interrupting network services, in addition to resetting the switch hardware configuration.
On NVIDIA Spectrum switches, changes to the settings in the /etc/cumulus/datapath/traffic.conf file do not require you to restart switchd. However, you must run the echo 1 > /cumulus/switchd/config/traffic/reload command to apply the settings.
Always run the syntax checker before applying the configuration changes.
Traffic Shaping
Configure traffic shaping to regulate network traffic by using a lower bitrate than the physical interface is capable of. Traffic shaping prevents packets from being dropped or lost due to bandwidth limits or congestion.
To configure traffic shaping, update and uncomment the settings in the Hierarchical traffic shaping section of the /etc/cumulus/datapath/traffic.conf file. You can configure traffic shaping per egress queue or aggregated at the port level.
The egress shaping rate configured in the /etc/cumulus/datapath/traffic.conf is always the layer 1 rate. The calculated shaping rate considers overheads in the Ethernet frame like the interframe gap, preamble, cyclic redundancy check (CRC) and so on. The egress layer 3 throughput measured is always less than the maximum shaper rate configured.
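To see why the measured layer 3 throughput falls below the configured rate, consider the following back-of-the-envelope Python calculation. It assumes untagged Ethernet frames and the standard per-frame overheads (8 bytes of preamble and start-of-frame delimiter, 12 bytes of interframe gap, 14 bytes of Ethernet header, and 4 bytes of CRC). Exactly which overheads the ASIC accounts for is an assumption here, and l3_throughput_kbps is a hypothetical helper.

```python
PREAMBLE_SFD_BYTES = 8     # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12
ETH_HEADER_BYTES = 14      # untagged frame (no 802.1Q tag)
CRC_BYTES = 4


def l3_throughput_kbps(shaper_kbps, ip_packet_bytes):
    """Estimate layer 3 throughput for a layer 1 shaper rate and a fixed IP packet size."""
    wire_bytes = (ip_packet_bytes + ETH_HEADER_BYTES + CRC_BYTES
                  + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES)
    return shaper_kbps * ip_packet_bytes / wire_bytes


# A 100000 kbps shaper carrying 1500-byte IP packets (1538 bytes on the wire each).
print(round(l3_throughput_kbps(100000, 1500), 2))  # 97529.26
```

Smaller packets lose proportionally more to overhead, so the gap between the configured rate and the measured layer 3 throughput widens as the packet size shrinks.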
The following example shows the Hierarchical traffic shaping section of the /etc/cumulus/datapath/traffic.conf file.
...
# Hierarchical traffic shaping
# to configure shaping at 2 levels:
# - per egress queue egr_queue_0 - egr_queue_7
# - port level aggregate
# -- add or replace a port group names in the port group list
# -- for each port group in the list
# -- populate the port set, e.g.
# swp1-swp4,swp8,swp50s0-swp50s3
# -- set min and max rates in kbps for each egr_queue [min, max]
# -- set max rate in kbps at port level
shaping.port_group_list = [shaper_port_group]
shaping.shaper_port_group.port_set = swp1-swp3
shaping.shaper_port_group.egr_queue_0.shaper = [50000, 100000]
shaping.shaper_port_group.egr_queue_1.shaper = [51000, 150000]
shaping.shaper_port_group.egr_queue_2.shaper = [52000, 200000]
shaping.shaper_port_group.egr_queue_3.shaper = [53000, 250000]
shaping.shaper_port_group.egr_queue_4.shaper = [54000, 300000]
shaping.shaper_port_group.egr_queue_5.shaper = [55000, 350000]
shaping.shaper_port_group.egr_queue_6.shaper = [56000, 400000]
shaping.shaper_port_group.egr_queue_7.shaper = [57000, 450000]
# shaping.shaper_port_group.port.shaper = 900000
The settings are described below:
| Traffic Shaping Setting | Description |
| --- | --- |
| shaping.port_group_list | The name of the port group. You must enclose the name in square brackets; for example, shaping.port_group_list = [shaper_port_group1]. |
| shaping.shaper_port_group.port_set | The list of ports in the port group. |
| shaping.shaper_port_group.egr_queue_0.shaper | The minimum and maximum rates in kbps for egress queue 0. You must enclose the values in square brackets. |
| shaping.shaper_port_group.egr_queue_1.shaper | The minimum and maximum rates in kbps for egress queue 1. You must enclose the values in square brackets. |
| shaping.shaper_port_group.egr_queue_2.shaper | The minimum and maximum rates in kbps for egress queue 2. You must enclose the values in square brackets. |
| shaping.shaper_port_group.egr_queue_3.shaper | The minimum and maximum rates in kbps for egress queue 3. You must enclose the values in square brackets. |
| shaping.shaper_port_group.egr_queue_4.shaper | The minimum and maximum rates in kbps for egress queue 4. You must enclose the values in square brackets. |
| shaping.shaper_port_group.egr_queue_5.shaper | The minimum and maximum rates in kbps for egress queue 5. You must enclose the values in square brackets. |
| shaping.shaper_port_group.egr_queue_6.shaper | The minimum and maximum rates in kbps for egress queue 6. You must enclose the values in square brackets. |
| shaping.shaper_port_group.egr_queue_7.shaper | The minimum and maximum rates in kbps for egress queue 7. You must enclose the values in square brackets. |
| shaping.shaper_port_group.port.shaper | The maximum rate in kbps at the port level. At the port level, only the maximum shaper rate is supported. |
| scheduling.algorithm | Cumulus Linux supports the Deficit Weighted Round Robin (DWRR) scheduling algorithm only. |
In Cumulus Linux, the burst size is set to twice the maximum rate internally; the setting is not configurable.
On Broadcom switches, when you modify the configuration in the /etc/cumulus/datapath/traffic.conf file, you must restart switchd for the changes to take effect; run the sudo systemctl restart switchd.service command.
Restarting the switchd service causes all network ports to reset, interrupting network services, in addition to resetting the switch hardware configuration.
On NVIDIA Spectrum switches, changes to the settings in the /etc/cumulus/datapath/traffic.conf file do not require you to restart switchd. However, you must run the echo 1 > /cumulus/switchd/config/traffic/reload command to apply the settings.
Always run the syntax checker before applying the configuration changes.
Interface Buffer Status
On switches with ASICs, you can collect a fine-grained history of queue lengths using histograms maintained by the ASIC; see ASIC Monitoring for details.
Example Configuration File
The following example /etc/cumulus/datapath/traffic.conf datapath configuration file applies to 10G, 40G, and 100G switches on Broadcom Tomahawk, Trident II, Trident II+, or Trident3 and Mellanox Spectrum platforms only.
For the default source packet fields and mapping, each selected packet field must have a block of mapped values. Any packet field value that is not specified in the configuration is assigned to a default internal switch priority. The configuration applies to every forwarding port unless a custom remark configuration is defined for that port (see below).
For the default remark packet fields and mapping, each selected packet field should have a block of mapped values. Any internal switch priority value that is not specified in the configuration is assigned to a default packet field value. The configuration applies to every forwarding port unless a custom remark configuration is defined for that port (see below).
Per-port source packet fields and mapping apply to the designated set of ports.
Per-port remark packet fields and mapping apply to the designated set of ports.
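The commented-out DSCP source mapping in the example file assigns a block of eight consecutive DSCP values to each internal CoS value. As a rough illustration of how such blocks of mapped values work, the following Python sketch generates those lines and looks up the internal CoS for a DSCP value. The function names are hypothetical, and the fallback of 0 for an unmapped value is an assumption, because the default internal switch priority is not stated here.

```python
def default_dscp_blocks():
    """Map each internal CoS (0-7) to a block of eight consecutive DSCP values."""
    return {cos: list(range(8 * cos, 8 * cos + 8)) for cos in range(8)}


def render_source_lines(blocks):
    """Render the mapping in traffic.conf syntax."""
    return [
        f"traffic.cos_{cos}.priority_source.dscp = [{','.join(map(str, values))}]"
        for cos, values in sorted(blocks.items())
    ]


def internal_cos(dscp, blocks, default_cos=0):
    """Return the internal CoS for a DSCP value; default_cos is an assumed fallback."""
    for cos, values in blocks.items():
        if dscp in values:
            return cos
    return default_cos


blocks = default_dscp_blocks()
print(render_source_lines(blocks)[1])
# traffic.cos_1.priority_source.dscp = [8,9,10,11,12,13,14,15]
print(internal_cos(46, blocks))  # 5
```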
cumulus@switch:~$ sudo cat /etc/cumulus/datapath/traffic.conf
#
# /etc/cumulus/datapath/traffic.conf
# Copyright 2014, 2015, 2016, 2017, 2020 Cumulus Networks, Inc. All rights reserved.
#
# packet header field used to determine the packet priority level
# fields include {802.1p, dscp}
traffic.packet_priority_source_set = [802.1p]
# packet priority source values assigned to each internal cos value
# internal cos values {cos_0..cos_7}
# (internal cos 3 has been reserved for CPU-generated traffic)
#
# 802.1p values = {0..7}
traffic.cos_0.priority_source.8021p = [0]
traffic.cos_1.priority_source.8021p = [1]
traffic.cos_2.priority_source.8021p = [2]
traffic.cos_3.priority_source.8021p = []
traffic.cos_4.priority_source.8021p = [3,4]
traffic.cos_5.priority_source.8021p = [5]
traffic.cos_6.priority_source.8021p = [6]
traffic.cos_7.priority_source.8021p = [7]
# dscp values = {0..63}
#traffic.cos_0.priority_source.dscp = [0,1,2,3,4,5,6,7]
#traffic.cos_1.priority_source.dscp = [8,9,10,11,12,13,14,15]
#traffic.cos_2.priority_source.dscp = [16,17,18,19,20,21,22,23]
#traffic.cos_3.priority_source.dscp = [24,25,26,27,28,29,30,31]
#traffic.cos_4.priority_source.dscp = [32,33,34,35,36,37,38,39]
#traffic.cos_5.priority_source.dscp = [40,41,42,43,44,45,46,47]
#traffic.cos_6.priority_source.dscp = [48,49,50,51,52,53,54,55]
#traffic.cos_7.priority_source.dscp = [56,57,58,59,60,61,62,63]
# remark packet priority value
# fields include {802.1p, dscp}
traffic.packet_priority_remark_set = []
# packet priority remark values assigned from each internal cos value
# internal cos values {cos_0..cos_7}
# (internal cos 3 has been reserved for CPU-generated traffic)
#
# 802.1p values = {0..7}
#traffic.cos_0.priority_remark.8021p = [0]
#traffic.cos_1.priority_remark.8021p = [1]
#traffic.cos_2.priority_remark.8021p = [2]
#traffic.cos_3.priority_remark.8021p = [3]
#traffic.cos_4.priority_remark.8021p = [4]
#traffic.cos_5.priority_remark.8021p = [5]
#traffic.cos_6.priority_remark.8021p = [6]
#traffic.cos_7.priority_remark.8021p = [7]
# dscp values = {0..63}
#traffic.cos_0.priority_remark.dscp = [0]
#traffic.cos_1.priority_remark.dscp = [8]
#traffic.cos_2.priority_remark.dscp = [16]
#traffic.cos_3.priority_remark.dscp = [24]
#traffic.cos_4.priority_remark.dscp = [32]
#traffic.cos_5.priority_remark.dscp = [40]
#traffic.cos_6.priority_remark.dscp = [48]
#traffic.cos_7.priority_remark.dscp = [56]
# source.port_group_list = [source_port_group]
# source.source_port_group.packet_priority_source_set = [dscp]
# source.source_port_group.port_set = swp1-swp4,swp6
# source.source_port_group.cos_0.priority_source.dscp = [0,1,2,3,4,5,6,7]
# source.source_port_group.cos_1.priority_source.dscp = [8,9,10,11,12,13,14,15]
# source.source_port_group.cos_2.priority_source.dscp = [16,17,18,19,20,21,22,23]
# source.source_port_group.cos_3.priority_source.dscp = [24,25,26,27,28,29,30,31]
# source.source_port_group.cos_4.priority_source.dscp = [32,33,34,35,36,37,38,39]
# source.source_port_group.cos_5.priority_source.dscp = [40,41,42,43,44,45,46,47]
# source.source_port_group.cos_6.priority_source.dscp = [48,49,50,51,52,53,54,55]
# source.source_port_group.cos_7.priority_source.dscp = [56,57,58,59,60,61,62,63]
# remark.port_group_list = [remark_port_group]
# remark.remark_port_group.packet_priority_remark_set = [dscp]
# remark.remark_port_group.port_set = swp1-swp4,swp6
# remark.remark_port_group.cos_0.priority_remark.dscp = [0]
# remark.remark_port_group.cos_1.priority_remark.dscp = [8]
# remark.remark_port_group.cos_2.priority_remark.dscp = [16]
# remark.remark_port_group.cos_3.priority_remark.dscp = [24]
# remark.remark_port_group.cos_4.priority_remark.dscp = [32]
# remark.remark_port_group.cos_5.priority_remark.dscp = [40]
# remark.remark_port_group.cos_6.priority_remark.dscp = [48]
# remark.remark_port_group.cos_7.priority_remark.dscp = [56]
# priority groups
traffic.priority_group_list = [control, service, bulk]
# internal cos values assigned to each priority group
# each cos value should be assigned exactly once
# internal cos values {0..7}
priority_group.control.cos_list = [7]
priority_group.service.cos_list = [2]
priority_group.bulk.cos_list = [0,1,3,4,5,6]
# Alias Name defined for each priority group
# Valid string between 0-255 chars
# Sample alias support for naming priority groups
#priority_group.control.alias = "Control"
#priority_group.service.alias = "Service"
#priority_group.bulk.alias = "Bulk"
# to configure priority flow control on a group of ports:
# -- assign cos value(s) to the cos list
# -- add or replace a port group names in the port group list
# -- for each port group in the list
# -- populate the port set, e.g.
# swp1-swp4,swp8,swp50s0-swp50s3
# -- set a PFC buffer size in bytes for each port in the group
# -- set the xoff byte limit (buffer limit that triggers PFC frames transmit to start)
# -- set the xon byte delta (buffer limit that triggers PFC frames transmit to stop)
# -- enable PFC frame transmit and/or PFC frame receive
# priority flow control
# pfc.port_group_list = [pfc_port_group]
# pfc.pfc_port_group.cos_list = []
# pfc.pfc_port_group.port_set = swp1-swp4,swp6
# pfc.pfc_port_group.port_buffer_bytes = 25000
# pfc.pfc_port_group.xoff_size = 10000
# pfc.pfc_port_group.xon_delta = 2000
# pfc.pfc_port_group.tx_enable = true
# pfc.pfc_port_group.rx_enable = true
#
# Specify cable length in mts
# pfc.pfc_port_group.cable_length = 10
# to configure pause on a group of ports:
# -- add or replace port group names in the port group list
# -- for each port group in the list
# -- populate the port set, e.g.
# swp1-swp4,swp8,swp50s0-swp50s3
# -- set a pause buffer size in bytes for each port
# -- set the xoff byte limit (buffer limit that triggers pause frames transmit to start)
# -- set the xon byte delta (buffer limit that triggers pause frames transmit to stop)
# -- enable pause frame transmit and/or pause frame receive
# link pause
# link_pause.port_group_list = [pause_port_group]
# link_pause.pause_port_group.port_set = swp1-swp4,swp6
# link_pause.pause_port_group.port_buffer_bytes = 25000
# link_pause.pause_port_group.xoff_size = 10000
# link_pause.pause_port_group.xon_delta = 2000
# link_pause.pause_port_group.rx_enable = true
# link_pause.pause_port_group.tx_enable = true
#
# Specify cable length in mts
# link_pause.pause_port_group.cable_length = 10
# Explicit Congestion Notification
# to configure ECN and RED on a group of ports:
# -- add or replace port group names in the port group list
# -- assign cos value(s) to the cos list
# -- for each port group in the list
# -- populate the port set, e.g.
# swp1-swp4,swp8,swp50s0-swp50s3
# -- to enable RED requires the latest traffic.conf
# ecn_red.port_group_list = [ecn_red_port_group]
# ecn_red.ecn_red_port_group.cos_list = []
# ecn_red.ecn_red_port_group.port_set = swp1-swp4,swp6
# ecn_red.ecn_red_port_group.ecn_enable = true
# ecn_red.ecn_red_port_group.red_enable = false
# ecn_red.ecn_red_port_group.min_threshold_bytes = 40000
# ecn_red.ecn_red_port_group.max_threshold_bytes = 200000
# ecn_red.ecn_red_port_group.probability = 100
# Hierarchical traffic shaping
# to configure shaping at 2 levels:
# - per egress queue egr_queue_0 - egr_queue_7
# - port level aggregate
# -- add or replace a port group names in the port group list
# -- for each port group in the list
# -- populate the port set, e.g.
# swp1-swp4,swp8,swp50s0-swp50s3
# -- set min and max rates in kbps for each egr_queue [min, max]
# -- set max rate in kbps at port level
# shaping.port_group_list = [shaper_port_group]
# shaping.shaper_port_group.port_set = swp1-swp3,swp5,swp7s0-swp7s3
# shaping.shaper_port_group.egr_queue_0.shaper = [50000, 100000]
# shaping.shaper_port_group.egr_queue_1.shaper = [51000, 150000]
# shaping.shaper_port_group.egr_queue_2.shaper = [52000, 200000]
# shaping.shaper_port_group.egr_queue_3.shaper = [53000, 250000]
# shaping.shaper_port_group.egr_queue_4.shaper = [54000, 300000]
# shaping.shaper_port_group.egr_queue_5.shaper = [55000, 350000]
# shaping.shaper_port_group.egr_queue_6.shaper = [56000, 400000]
# shaping.shaper_port_group.egr_queue_7.shaper = [57000, 450000]
# shaping.shaper_port_group.port.shaper = 900000
# scheduling algorithm: algorithm values = {dwrr}
scheduling.algorithm = dwrr
# traffic group scheduling weight
# weight values = {0..127}
# '0' indicates strict priority
priority_group.control.weight = 0
priority_group.service.weight = 32
priority_group.bulk.weight = 16
# default egress scheduling weight per egress queue
# To be applied to all the ports if port_group profile not configured
# If you do not specify any bw_percent of egress_queues, those egress queues
# will assume DWRR weight 0 - no egress scheduling for those queues
# '0' indicates strict priority
#default_egress_sched.egr_queue_0.bw_percent = 12
#default_egress_sched.egr_queue_1.bw_percent = 12
#default_egress_sched.egr_queue_2.bw_percent = 24
#default_egress_sched.egr_queue_3.bw_percent = 12
#default_egress_sched.egr_queue_4.bw_percent = 12
#default_egress_sched.egr_queue_5.bw_percent = 12
#default_egress_sched.egr_queue_6.bw_percent = 12
#default_egress_sched.egr_queue_7.bw_percent = 0
# port_group profile for egress scheduling weight per egress queue
# If you do not specify any bw_percent of egress_queues, those egress queues
# will assume DWRR weight 0 - no egress scheduling for those queues
# '0' indicates strict priority
#egress_sched.port_group_list = [sched_port_group1]
#egress_sched.sched_port_group1.port_set = swp2
#egress_sched.sched_port_group1.egr_queue_0.bw_percent = 10
#egress_sched.sched_port_group1.egr_queue_1.bw_percent = 20
#egress_sched.sched_port_group1.egr_queue_2.bw_percent = 30
#egress_sched.sched_port_group1.egr_queue_3.bw_percent = 10
#egress_sched.sched_port_group1.egr_queue_4.bw_percent = 10
#egress_sched.sched_port_group1.egr_queue_5.bw_percent = 10
#egress_sched.sched_port_group1.egr_queue_6.bw_percent = 10
#egress_sched.sched_port_group1.egr_queue_7.bw_percent = 0
# To turn on/off Denial of service (DOS) prevention checks
dos_enable = false
# Cut-through is disabled by default on all chips with the exception of
# Spectrum. On Spectrum cut-through cannot be disabled.
#cut_through_enable = false
# Enable resilient hashing
#resilient_hash_enable = FALSE
# Resilient hashing flowset entries per ECMP group
# Valid values - 64, 128, 256, 512, 1024
#resilient_hash_entries_ecmp = 128
# Enable symmetric hashing
#symmetric_hash_enable = TRUE
# Set sflow/sample ingress cpu packet rate and burst in packets/sec
# Values: {0..16384}
#sflow.rate = 16384
#sflow.burst = 16384
#Specify the maximum number of paths per route entry.
# Maximum paths supported is 200.
# Default value 0 takes the number of physical ports as the max path size.
#ecmp_max_paths = 0
#Specify the hash seed for Equal cost multipath entries
# and for custom ecmp and lag hash
# Default value : random
# Value Range: {0..4294967295}
#ecmp_hash_seed = 42
# HASH config for ECMP to enable custom fields
# Fields will be applicable for ECMP hash
# calculation
#Note : Currently supported only for MLX platform
# Uncomment to enable custom fields configured below
#hash_config.enable = true
#hash Fields available ( assign true to enable)
#ip protocol
hash_config.ip_prot = true
#source ip
hash_config.sip = true
#destination ip
hash_config.dip = true
#source port
hash_config.sport = true
#destination port
hash_config.dport = true
#ipv6 flow label
hash_config.ip6_label = true
#ingress interface
hash_config.ing_intf = false
#inner fields for IPv4-over-IPv6 and IPv6-over-IPv6
hash_config.inner_ip_prot = false
hash_config.inner_sip = false
hash_config.inner_dip = false
hash_config.inner_sport = false
hash_config.inner_dport = false
hash_config.inner_ip6_label = false
# Hash config end #
#LAG HASH config
#HASH config for LACP to enable custom fields
#Fields will be applicable for LAG hash
#calculation
#Uncomment to enable custom fields configured below
#lag_hash_config.enable = true
lag_hash_config.smac = true
lag_hash_config.dmac = true
lag_hash_config.sip = true
lag_hash_config.dip = true
lag_hash_config.ether_type = true
lag_hash_config.vlan_id = true
lag_hash_config.sport = true
lag_hash_config.dport = true
lag_hash_config.ip_prot = true
# Specify the forwarding table resource allocation profile, applicable
# only on platforms that support universal forwarding resources.
#
# /usr/cumulus/sbin/cl-resource-query reports the allocated table sizes
# based on the profile setting.
#
# Values: one of { *** Common ***
# 'default', 'l2-heavy', 'v4-lpm-heavy', 'v6-lpm-heavy',
# 'ipmc-heavy',
#
# *** Mellanox only platforms ***
# 'l2-heavy-1', 'l2-heavy-2', 'v4-lpm-heavy-1',
# 'rash-v4-lpm-heavy', 'rash-custom-profile1',
# 'rash-custom-profile2', 'lpm-balanced',
#
# *** Broadcom[XGS] only platforms ***
# 'mode-0', 'mode-1', 'mode-2', 'mode-3', 'mode-4',
# 'mode-5', 'mode-6', 'mode-7', 'mode-8'
# }
#
# Default value: 'default'
# Notes: some devices may support more modes, please consult user
# guide for more details
#
forwarding_table.profile = default
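The priority group comments in the file above state that each internal CoS value from 0 to 7 should be assigned exactly once. The following is a minimal sketch of that check, assuming settings of the form priority_group.<name>.cos_list = [...] as shown above. The function name check_cos_assignment is hypothetical and is not part of Cumulus Linux.

```python
import re
from collections import Counter

# Matches lines such as: priority_group.bulk.cos_list = [0,1,3,4,5,6]
COS_LINE = re.compile(r"^priority_group\.(\w+)\.cos_list\s*=\s*\[([^\]]*)\]")


def check_cos_assignment(conf_text):
    """Return (missing, duplicated) internal CoS values for a traffic.conf text."""
    counts = Counter()
    for line in conf_text.splitlines():
        match = COS_LINE.match(line.strip())
        if not match:
            continue  # comments and unrelated settings are skipped
        for value in match.group(2).split(","):
            if value.strip():
                counts[int(value)] += 1
    missing = [cos for cos in range(8) if counts[cos] == 0]
    duplicated = [cos for cos in range(8) if counts[cos] > 1]
    return missing, duplicated


sample = """
priority_group.control.cos_list = [7]
priority_group.service.cos_list = [2]
priority_group.bulk.cos_list = [0,1,3,4,5,6]
"""
print(check_cos_assignment(sample))
```

Two empty lists mean every CoS value is assigned to exactly one priority group. This sketch does not replace the syntax checker that ships with Cumulus Linux.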
On switches with Spectrum ASICs, you must enable packet priority remark on the ingress port. A packet received on a remark-enabled port is remarked according to the priority mapping configured on the egress port. If you configure packet priority remark the same way on every port, the default configuration example above is correct. However, per-port customized configurations require two port groups, one for the ingress ports and one for the egress ports, as below:
Cumulus Linux provides a syntax checker for the /etc/cumulus/datapath/traffic.conf file to check for errors, such as missing parameters, or invalid parameter labels and values.
On Broadcom switches, the syntax checker runs automatically during switchd initialization and reports syntax errors to the /var/log/switchd.log file.
On both Broadcom and NVIDIA switches, you can run the syntax checker manually from the command line by issuing the cl-consistency-check --datapath-syntax-check command. If errors exist, they are written to stderr by default. If you run the command with -q, errors are written to the /var/log/switchd.log file.
The cl-consistency-check --datapath-syntax-check command takes the following options:
Option          Description
-h              Displays this list of command options.
-q              Runs the command in quiet mode. Errors are written to the /var/log/switchd.log file instead of stderr.
-t <file-name>  Runs the syntax check on a non-default traffic.conf file; for example, /mypath/test-traffic.conf.
You can run the syntax checker when switchd is either running or stopped.
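If you want to call the syntax checker from a script, for example before pushing a changed traffic.conf file, the options above can be assembled as in this minimal sketch. The helper names syntax_check_command and run_syntax_check are hypothetical. The sketch returns the exit status and output without interpreting them, because this guide does not document the command's exit codes.

```python
import subprocess


def syntax_check_command(conf_path=None, quiet=False):
    """Build the argument list for the traffic.conf syntax checker."""
    argv = ["cl-consistency-check", "--datapath-syntax-check"]
    if quiet:
        argv.append("-q")  # errors go to /var/log/switchd.log instead of stderr
    if conf_path is not None:
        argv += ["-t", conf_path]  # check a non-default traffic.conf file
    return argv


def run_syntax_check(conf_path=None, quiet=False):
    """Run the checker on a switch; return (exit status, stdout, stderr)."""
    result = subprocess.run(
        syntax_check_command(conf_path, quiet),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode, result.stdout, result.stderr
```

Only run_syntax_check needs to run on the switch; syntax_check_command just builds the argument list.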
Example Commands
The following example command runs the syntax checker on the default /etc/cumulus/datapath/traffic.conf file and shows that no errors are detected:
cumulus@switch:~$ cl-consistency-check --datapath-syntax-check
No errors detected in traffic config file /etc/cumulus/datapath/traffic.conf
The following example command runs the syntax checker on the default /etc/cumulus/datapath/traffic.conf file in quiet mode. If errors exist, they are written to the /var/log/switchd.log file.
cumulus@switch:~$ cl-consistency-check --datapath-syntax-check -q
The following example command runs the syntax checker on the /mypath/test-traffic.conf file and shows that errors are detected:
cumulus@switch:~$ cl-consistency-check --datapath-syntax-check -t /mypath/test-traffic.conf
Traffic source 8021p: missing mapping for priority value '7'
Errors detected while checking traffic config file /mypath/test-traffic.conf
The following example command runs the syntax checker on the /mypath/test-traffic.conf file in quiet mode. If errors exist, they are written to the /var/log/switchd.log file.
cumulus@switch:~$ cl-consistency-check --datapath-syntax-check -t /mypath/test-traffic.conf -q
It is crucial to protect the control plane on the switch to ensure that the proper control plane applications have access to the CPU. Failure to do so increases vulnerability to a Denial of Service (DOS) attack. Cumulus Linux provides control plane protection by default. In addition, you can configure DDOS protection to protect data plane, control plane, and management plane traffic on the switch. You can configure Cumulus Linux to drop packets that match one or more of the following criteria while incurring no performance impact:
Source IP address matches the destination address for IPv4 and IPv6 packets
Source MAC address matches the destination MAC address
Unfragmented or first fragment SYN packets with a source port of 0-1023
TCP packets with control flags = 0 and sequence number = 0
TCP packets with FIN, URG, and PSH bits set and sequence number = 0
TCP packets with both SYN and FIN bits set
TCP source port matches the destination port
UDP source port matches the destination port
First TCP fragment with partial TCP header
TCP header has fragment offset value of 1
ICMPv6 ping packets payload larger than programmed value of ICMP max size
ICMPv4 ping packets payload larger than programmed value of ICMP max size
Fragmented ICMP packet
IPv6 fragment lower than programmed minimum IPv6 packet size
DDOS protection is not supported on Broadcom Hurricane2 and Mellanox Spectrum ASICs.
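To make a few of the drop criteria above concrete, the following sketch applies them to a packet described as a plain Python dictionary. It is only an illustration of the matching logic. The dictionary keys and the function name dos_criteria_matched are hypothetical, the labels are descriptive text rather than datapath.conf setting names, and the switch applies the real checks in hardware.

```python
def _same(a, b):
    """True only when both values are present and equal."""
    return a is not None and a == b


def dos_criteria_matched(pkt):
    """Return descriptions of the listed criteria that a packet summary matches."""
    hits = []
    if _same(pkt.get("src_ip"), pkt.get("dst_ip")):
        hits.append("source IP matches destination IP")
    if _same(pkt.get("src_mac"), pkt.get("dst_mac")):
        hits.append("source MAC matches destination MAC")
    ports_equal = _same(pkt.get("src_port"), pkt.get("dst_port"))
    if pkt.get("proto") == "tcp":
        if {"SYN", "FIN"} <= set(pkt.get("tcp_flags", ())):
            hits.append("TCP SYN and FIN both set")
        if ports_equal:
            hits.append("TCP source port matches destination port")
    if pkt.get("proto") == "udp" and ports_equal:
        hits.append("UDP source port matches destination port")
    return hits


probe = {"proto": "tcp", "src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
         "src_port": 40000, "dst_port": 80, "tcp_flags": {"SYN", "FIN"}}
print(dos_criteria_matched(probe))
```

A packet that returns an empty list matches none of the criteria modeled here; the remaining criteria in the list above are not covered by this sketch.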
Configure DDOS Protection
Open the /etc/cumulus/datapath/traffic.conf file in a text editor.
Enable DOS prevention checks by setting the dos_enable value to true:
# To turn on/off Denial of Service (DOS) prevention checks
dos_enable = true
Open the /usr/lib/python2.7/dist-packages/cumulus/__chip_config/bcm/datapath.conf file in a text editor. Set any of the DOS checks to true. For example:
Configuring any of the following settings affects the BFD echo function. For example, if you enable dos.udp_ports_eq, all the BFD packets are dropped because the BFD protocol uses the same source and destination UDP ports.
Restarting the switchd service causes all network ports to reset, interrupting network services, in addition to resetting the switch hardware configuration.
DHCP
This section describes how to configure:
DHCP relays for IPv4 and IPv6
DHCP servers for IPv4 and IPv6
DHCP snooping for IPv4 and IPv6
DHCP Relays
DHCP is a client-server protocol that automatically provides IP hosts with IP addresses and other related configuration information. A DHCP relay (agent) is a host that forwards DHCP packets between clients and servers that are not on the same physical subnet.
This topic describes how to configure DHCP relays for IPv4 and IPv6 using the following topology:
If you intend to run the dhcrelay service within a VRF, follow these steps.
Basic Configuration
To set up DHCP relay, you need to provide the IP address of the DHCP server and the interfaces participating in DHCP relay (facing the server and facing the client). You can specify as many server IP addresses as can fit in 255 octets.
In the example commands below, the DHCP server IP address is 172.16.1.102, vlan10 is the SVI for VLAN 10 and the uplinks are swp51 and swp52.
cumulus@leaf01:~$ net add dhcp relay interface swp51
cumulus@leaf01:~$ net add dhcp relay interface swp52
cumulus@leaf01:~$ net add dhcp relay interface vlan10
cumulus@leaf01:~$ net add dhcp relay server 172.16.1.102
cumulus@leaf01:~$ net pending
cumulus@leaf01:~$ net commit
NCLU commands are not currently available to configure IPv6 relays. Use Linux commands instead.
Edit the /etc/default/isc-dhcp-relay file to add the IP address of the DHCP server and the interfaces participating in DHCP relay. In the example below, the DHCP server IP address is 172.16.1.102, vlan10 is the SVI for VLAN 10, and the uplinks are swp51 and swp52.
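As a rough illustration of what the edited IPv4 file contains, the sketch below renders the three settings for the example values in the paragraph above. The variable names SERVERS, INTF_CMD, and OPTIONS follow the sample defaults file shown later in this topic. The function name render_relay_defaults is hypothetical, and separating multiple servers with spaces is an assumption.

```python
def render_relay_defaults(servers, interfaces, options=""):
    """Render the SERVERS, INTF_CMD and OPTIONS lines of a relay defaults file."""
    server_list = " ".join(servers)  # assumption: servers are space separated
    intf_cmd = " ".join(f"-i {name}" for name in interfaces)  # one -i per interface
    return "\n".join([
        f'SERVERS="{server_list}"',
        f'INTF_CMD="{intf_cmd}"',
        f'OPTIONS="{options}"',
    ])


print(render_relay_defaults(["172.16.1.102"], ["vlan10", "swp51", "swp52"]))
```

The interface list includes the SVI facing the clients and both uplinks facing the server, as the text requires.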
Edit the /etc/default/isc-dhcp-relay6 file to add the IP address of the DHCP server and the interfaces participating in DHCP relay. In the example below, the DHCP server IP address is 2001:db8:100::2, vlan10 is the SVI for VLAN 10, and the uplinks are swp51 and swp52.
You configure a DHCP relay on a per-VLAN basis, specifying the SVI, not the parent bridge. In the example above, you specify vlan10 as the SVI for VLAN 10 but you do not specify the bridge named bridge.
When you configure DHCP relay with VRR, the DHCP relay client must run on the SVI, not on the -v0 interface.
Optional Configuration
This section describes optional DHCP relay configurations. The steps provided in this section assume that you have already configured basic DHCP relay, as described above.
DHCP Agent Information Option (Option 82)
Cumulus Linux supports DHCP Agent Information Option 82, which allows a DHCP relay to insert circuit or relay specific information into a request that is being forwarded to a DHCP server. The following options are provided:
Circuit ID includes information about the circuit on which the request comes in, such as the SVI or physical port. By default, this is the printable name of the interface on which the client request is received.
Remote ID includes information that identifies the relay agent, such as the MAC address. By default, this is the system MAC address of the device on which DHCP relay is running.
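For readers who want to see what the relay adds to the packet, the sketch below encodes an Option 82 field with the two sub-options described above, using the type-length-value layout from RFC 3046 (sub-option 1 is the Circuit ID, sub-option 2 is the Remote ID). The function name and the sample interface name and MAC address are illustrative assumptions; dhcrelay builds this option itself.

```python
def encode_option82(circuit_id: bytes, remote_id: bytes) -> bytes:
    """Encode a Relay Agent Information option (code 82) with two sub-options."""

    def tlv(code: int, value: bytes) -> bytes:
        if len(value) > 255:
            raise ValueError("value does not fit in a one-byte length field")
        return bytes([code, len(value)]) + value

    # Sub-option 1 carries the Circuit ID, sub-option 2 the Remote ID.
    body = tlv(1, circuit_id) + tlv(2, remote_id)
    return tlv(82, body)


# Illustrative values: the SVI name as Circuit ID and a MAC address as Remote ID.
option = encode_option82(b"vlan10", bytes.fromhex("443839000003"))
print(option.hex())
```

The one-byte length field is why a custom Remote ID string is limited to 255 characters.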
NCLU commands are not currently available for this feature. Use Linux commands.
To configure DHCP Agent Information Option 82:
Edit the /etc/default/isc-dhcp-relay file and add one of the following options:
To inject the ingress SVI interface against which the relayed DHCP discover packet is processed, add -a to the OPTIONS line:
cumulus@leaf01:~$ sudo nano /etc/default/isc-dhcp-relay
...
# Additional options that are passed to the DHCP relay daemon?
OPTIONS="-a"
To inject the physical switch port on which the relayed DHCP discover packet arrives instead of the SVI, add -a --use-pif-circuit-id to the OPTIONS line:
cumulus@leaf01:~$ sudo nano /etc/default/isc-dhcp-relay
...
# Additional options that are passed to the DHCP relay daemon?
OPTIONS="-a --use-pif-circuit-id"
To customize the Remote ID sub-option, add -a -r to the OPTIONS line followed by a custom string (up to 255 characters):
cumulus@leaf01:~$ sudo nano /etc/default/isc-dhcp-relay
...
# Additional options that are passed to the DHCP relay daemon?
OPTIONS="-a -r CUSTOMVALUE"
Restart the dhcrelay service to apply the new configuration:
cumulus@leaf01:~$ sudo systemctl restart dhcrelay.service
When DHCP relay is required in an environment that relies on an anycast gateway (such as EVPN), a unique IP address is necessary on each device for return traffic. By default, in a BGP unnumbered environment with DHCP relay, the source IP address is set to the loopback IP address and the gateway IP address (giaddr) is set to the SVI IP address. However, with anycast traffic, the SVI IP address is not unique to each rack; it is typically shared between racks. Most EVPN ToR deployments have only a single unique IP address, which is the loopback IP address.
RFC 3527 enables the DHCP server to handle these environments by introducing a new parameter called the link selection sub-option, which is built by the DHCP relay agent. The link selection sub-option takes over the usual role of the giaddr: it tells the DHCP server which subnet the DHCP request belongs to. The giaddr is still present, but it now only carries the return IP address that the DHCP server uses; the giaddr becomes the unique loopback IP address.
When enabling RFC 3527 support, you can specify an interface, such as the loopback interface or a switch port interface to be used as the giaddr. The relay picks the first IP address on that interface. If the interface has multiple IP addresses, you can specify a specific IP address for the interface.
RFC 3527 is supported for IPv4 DHCP relays only.
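The split between the two addresses can be sketched as follows. RFC 3527 defines the link selection sub-option as sub-option 5 of the Relay Agent Information option, carrying one 4-byte IPv4 address. The function name relay_addresses and the sample addresses are assumptions for illustration; this is not how dhcrelay is implemented.

```python
import ipaddress


def relay_addresses(loopback_ip, client_subnet_ip):
    """Return (giaddr bytes, Option 82 bytes holding the link selection sub-option)."""
    # The giaddr carries the unique return address, such as the loopback.
    giaddr = ipaddress.IPv4Address(loopback_ip).packed
    # Sub-option 5 (link selection) tells the server which subnet to allocate from.
    link = ipaddress.IPv4Address(client_subnet_ip).packed
    link_selection = bytes([5, len(link)]) + link
    option82 = bytes([82, len(link_selection)]) + link_selection
    return giaddr, option82


giaddr, option82 = relay_addresses("10.10.10.1", "10.1.10.1")
print(giaddr.hex(), option82.hex())
```

Because both fields are IPv4 addresses, the sketch only applies to IPv4 relays.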
To enable RFC 3527 support and control the giaddr:
Run the net add dhcp relay giaddr-interface command with the interface or the interface and IP address you want to use.
This example uses the first IP address on the loopback interface as the giaddr:
cumulus@leaf01:~$ net add dhcp relay giaddr-interface lo
The first IP address on the loopback interface is typically the 127.0.0.1 address. This example uses IP address 10.10.10.1 on the loopback interface as the giaddr:
cumulus@leaf01:~$ net add dhcp relay giaddr-interface lo 10.10.10.1
This example uses the first IP address on swp2 as the giaddr:
cumulus@leaf01:~$ net add dhcp relay giaddr-interface swp2
This example uses IP address 10.0.0.4 on swp2 as the giaddr:
cumulus@leaf01:~$ net add dhcp relay giaddr-interface swp2 10.0.0.4
Edit the /etc/default/isc-dhcp-relay file and provide the -U option with the interface or IP address you want to use as the giaddr.
This example uses the first IP address on the loopback interface as the giaddr:
cumulus@leaf01:~$ sudo nano /etc/default/isc-dhcp-relay
...
# Additional options that are passed to the DHCP relay daemon?
OPTIONS="-U lo"
The first IP address on the loopback interface is typically the 127.0.0.1 address. This example uses IP address 10.10.10.1 on the loopback interface as the giaddr:
cumulus@leaf01:~$ sudo nano /etc/default/isc-dhcp-relay
...
# Additional options that are passed to the DHCP relay daemon?
OPTIONS="-U 10.10.10.1%lo"
This example uses the first IP address on swp2 as the giaddr:
cumulus@leaf01:~$ sudo nano /etc/default/isc-dhcp-relay
...
# Additional options that are passed to the DHCP relay daemon?
OPTIONS="-U swp2"
This example uses IP address 10.0.0.4 on swp2 as the giaddr:
cumulus@leaf01:~$ sudo nano /etc/default/isc-dhcp-relay
...
# Additional options that are passed to the DHCP relay daemon?
OPTIONS="-U 10.0.0.4%swp2"
Restart the dhcrelay service to apply the configuration change:
cumulus@leaf01:~$ sudo systemctl restart dhcrelay.service
Run the cl set service dhcp-relay default giaddress-interface command with the interface/IP address you want to use. The following example uses the first IP address on the loopback interface as the gateway IP address:
cumulus@leaf01:~$ cl set service dhcp-relay default giaddress-interface lo
The first IP address on the loopback interface is typically the 127.0.0.1 address. This example uses IP address 10.10.10.1 on the loopback interface as the giaddr:
cumulus@leaf01:~$ cl set service dhcp-relay default giaddress-interface lo 10.10.10.1
This example uses the first IP address on swp2 as the giaddr:
cumulus@leaf01:~$ cl set service dhcp-relay default giaddress-interface swp2
This example uses IP address 10.0.0.4 on swp2 as the giaddr:
cumulus@leaf01:~$ cl set service dhcp-relay default giaddress-interface swp2 10.0.0.4
When enabling RFC 3527 support, you can specify an interface such as the loopback interface or swp interface for the gateway address. The interface you use must be reachable in the tenant VRF that it is servicing and must be unique to the switch. In EVPN symmetric routing, fabrics running an anycast gateway that use the same SVI IP address on multiple leaf switches need a unique IP address for the VRF interface and must include the layer 3 VNI for this VRF in the DHCP Relay configuration. For example:
Gateway IP Address as Source IP for Relayed DHCP Packets (Advanced)
You can configure the dhcrelay service to forward IPv4 (only) DHCP packets to a DHCP server and ensure that the source IP address of the relayed packet is the same as the gateway IP address.
This option impacts all relayed IPv4 packets globally.
To use the gateway IP address as the source IP address:
Run the net add dhcp relay use-giaddr-as-src command:
cumulus@leaf:~$ net add dhcp relay use-giaddr-as-src
cumulus@leaf:~$ net pending
cumulus@leaf:~$ net commit
Edit the /etc/default/isc-dhcp-relay file to add --giaddr-src to the OPTIONS line. An example is shown below.
Cumulus Linux supports multiple DHCP relay daemons on a switch to enable relaying of packets from different bridges to different upstream interfaces.
To configure multiple DHCP relay daemons on a switch:
In the /etc/default directory, create a configuration file for each DHCP relay daemon. Use the naming scheme isc-dhcp-relay-<dhcp-name> for IPv4 or isc-dhcp-relay6-<dhcp-name> for IPv6. An example configuration file for IPv4 is shown below:
# Defaults for isc-dhcp-relay initscript
# sourced by /etc/init.d/isc-dhcp-relay
# installed at /etc/default/isc-dhcp-relay by the maintainer scripts
#
# This is a POSIX shell fragment
#
# What servers should the DHCP relay forward requests to?
SERVERS="102.0.0.2"
# On what interfaces should the DHCP relay (dhcrelay) serve DHCP requests?
# Always include the interface towards the DHCP server.
# This variable requires a -i for each interface configured above.
# This will be used in the actual dhcrelay command
# For example, "-i eth0 -i eth1"
INTF_CMD="-i swp2s2 -i swp2s3"
# Additional options that are passed to the DHCP relay daemon?
OPTIONS=""
Run the following command to start a dhcrelay instance, where <dhcp-name> is the instance name or number.
To see how DHCP relay is working on your switch, run the journalctl command:
cumulus@leaf01:~$ sudo journalctl -l -n 20 | grep dhcrelay
Dec 05 20:58:55 leaf01 dhcrelay[6152]: sending upstream swp52
Dec 05 20:58:55 leaf01 dhcrelay[6152]: sending upstream swp51
Dec 05 20:58:55 leaf01 dhcrelay[6152]: Relaying Reply to fe80::4638:39ff:fe00:3 port 546 down.
Dec 05 20:58:55 leaf01 dhcrelay[6152]: Relaying Reply to fe80::4638:39ff:fe00:3 port 546 down.
Dec 05 21:03:55 leaf01 dhcrelay[6152]: Relaying Renew from fe80::4638:39ff:fe00:3 port 546 going up.
Dec 05 21:03:55 leaf01 dhcrelay[6152]: sending upstream swp52
Dec 05 21:03:55 leaf01 dhcrelay[6152]: sending upstream swp51
Dec 05 21:03:55 leaf01 dhcrelay[6152]: Relaying Reply to fe80::4638:39ff:fe00:3 port 546 down.
Dec 05 21:03:55 leaf01 dhcrelay[6152]: Relaying Reply to fe80::4638:39ff:fe00:3 port 546 down.
To specify a time period with the journalctl command, use the --since flag:
cumulus@leaf01:~$ sudo journalctl -l --since "2 minutes ago" | grep dhcrelay
Dec 05 21:08:55 leaf01 dhcrelay[6152]: Relaying Renew from fe80::4638:39ff:fe00:3 port 546 going up.
Dec 05 21:08:55 leaf01 dhcrelay[6152]: sending upstream swp52
Dec 05 21:08:55 leaf01 dhcrelay[6152]: sending upstream swp51
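When the log is long, it can help to summarize which uplinks the relay is using. The following sketch counts the "sending upstream" lines per interface in journalctl output like the examples above. The function name upstream_counts is hypothetical, and the pattern assumes the message format shown in these examples.

```python
import re
from collections import Counter

# Matches messages such as: dhcrelay[6152]: sending upstream swp52
UPSTREAM = re.compile(r"dhcrelay\[\d+\]: sending upstream (\S+)")


def upstream_counts(journal_text):
    """Count relayed packets per upstream interface in journalctl output."""
    return Counter(UPSTREAM.findall(journal_text))


log = """\
Dec 05 21:08:55 leaf01 dhcrelay[6152]: Relaying Renew from fe80::4638:39ff:fe00:3 port 546 going up.
Dec 05 21:08:55 leaf01 dhcrelay[6152]: sending upstream swp52
Dec 05 21:08:55 leaf01 dhcrelay[6152]: sending upstream swp51
"""
print(upstream_counts(log))
```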
Configuration Errors
If you configure DHCP relays by editing the /etc/default/isc-dhcp-relay file manually, you might introduce configuration errors that can cause the switch to crash.
For example, if you see an error similar to the following, there might be a space between the DHCP server address and the interface used as the uplink.
Core was generated by /usr/sbin/dhcrelay --nl -d -i vx-40 -i vlan10 10.0.0.4 -U 10.0.1.2 %vlan20.
Program terminated with signal SIGSEGV, Segmentation fault.
To resolve the issue, manually edit the /etc/default/isc-dhcp-relay file to remove the space, then run the systemctl restart dhcrelay.service command to restart the dhcrelay service and apply the configuration change.
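A quick way to catch this mistake before restarting the service is to look for an option token that starts with %, which suggests that a space separated it from the address it belongs to. This is a minimal sketch; the function name find_detached_interface is hypothetical, and the check covers only this one error.

```python
import shlex


def find_detached_interface(options):
    """Return option tokens that start with '%', a sign of a stray space before them."""
    return [token for token in shlex.split(options) if token.startswith("%")]


print(find_detached_interface("-U 10.0.1.2 %vlan20"))
print(find_detached_interface("-U 10.0.1.2%vlan20"))
```

An empty list means no detached %interface token was found in the OPTIONS string.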
Considerations
The dhcrelay command does not bind to an interface if the interface name is longer than 14 characters. This is a known limitation in dhcrelay.
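The following sketch flags interface names that exceed this limit before you add them to a relay configuration. The function name and the sample interface names are hypothetical.

```python
MAX_RELAY_IFNAME = 14  # names longer than this are not bound by dhcrelay


def names_too_long(interfaces):
    """Return the interface names that are longer than 14 characters."""
    return [name for name in interfaces if len(name) > MAX_RELAY_IFNAME]


print(names_too_long(["vlan10", "swp51", "uplink-spine-01"]))
```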
DHCP Servers
A DHCP Server automatically provides and assigns IP addresses and other network parameters to client devices. It relies on the Dynamic Host Configuration Protocol to respond to broadcast requests from clients.
This topic describes how to configure a DHCP server for IPv4 and IPv6 using the following topology.
The DHCP server is a switch running Cumulus Linux; however, the DHCP server can also be located on a dedicated server in your environment.
For information about DHCP relays, refer to DHCP Relays.
Configure the DHCP Server on a Cumulus Linux Switch
To configure the DHCP server on a Cumulus Linux switch, edit the /etc/dhcp/dhcpd.conf or /etc/dhcp/dhcpd6.conf configuration file. Sample configurations are provided.
You must include two pools in the DHCP configuration files:
Pool 1 is the subnet that includes the IP addresses of the interfaces on the DHCP server.
Pool 2 is the subnet that includes the IP addresses being assigned.
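The two pools can be pictured as two subnet declarations in ISC dhcpd syntax: one without a range for the server's own interface subnet, and one with a range for the addresses being assigned. The sketch below renders both for IPv4 and rejects a range that falls outside its subnet. The function name subnet_stanza and all subnets and addresses are assumptions for illustration, not values from this guide.

```python
import ipaddress


def subnet_stanza(cidr, pool_range=None):
    """Render an IPv4 subnet declaration, optionally with an address range."""
    net = ipaddress.IPv4Network(cidr)
    lines = [f"subnet {net.network_address} netmask {net.netmask} {{"]
    if pool_range is not None:
        low, high = (ipaddress.IPv4Address(addr) for addr in pool_range)
        if low not in net or high not in net or low > high:
            raise ValueError("pool range must lie inside the subnet")
        lines.append(f"  range {low} {high};")
    lines.append("}")
    return "\n".join(lines)


# Pool 1: subnet of the DHCP server's own interface (no addresses handed out).
print(subnet_stanza("172.16.1.0/24"))
# Pool 2: subnet from which client addresses are assigned.
print(subnet_stanza("10.1.10.0/24", ("10.1.10.100", "10.1.10.200")))
```

A real configuration normally carries further settings, such as lease times and router options, that this sketch leaves out.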
In a text editor, edit the /etc/dhcp/dhcpd.conf file. Use the following configuration as an example:
You can assign an IP address and other DHCP options based on physical location or port regardless of MAC address to clients that are attached directly to the Cumulus Linux switch through a switch port. This is helpful when swapping out switches and servers; you can avoid the inconvenience of collecting the MAC address and sending it to the network administrator to modify the DHCP server configuration.
Edit the /etc/dhcp/dhcpd.conf file and add the interface name ifname to assign an IP address through DHCP. The following provides an example:
The DHCP server determines if a DHCP request is a relay or a non-relay DHCP request. You can run the following command to see the DHCP request:
cumulus@server02:~$ sudo tail /var/log/syslog | grep dhcpd
2016-12-05T19:03:35.379633+00:00 server02 dhcpd: Relay-forward message from 2001:db8:101::1 port 547, link address 2001:db8:101::1, peer address fe80::4638:39ff:fe00:3
2016-12-05T19:03:35.380081+00:00 server02 dhcpd: Advertise NA: address 2001:db8:1::110 to client with duid 00:01:00:01:1f:d8:75:3a:44:38:39:00:00:03 iaid = 956301315 valid for 600 seconds
2016-12-05T19:03:35.380470+00:00 server02 dhcpd: Sending Relay-reply to 2001:db8:101::1 port 547
DHCP Snooping
DHCP snooping enables Cumulus Linux to act as a middle layer between the DHCP infrastructure and DHCP clients by scanning DHCP control packets and building an IP-MAC database. Cumulus Linux accepts DHCP offers from only trusted interfaces and can rate limit packets.
DHCP option 82 processing is not supported.
Configure DHCP Snooping
To configure DHCP snooping, you need to:
Enable DHCP snooping on a VLAN.
Add a trusted interface. Cumulus Linux allows DHCP offers from only trusted interfaces to prevent malicious DHCP servers from assigning IP addresses inside the network. The interface must be a member of the bridge specified.
Set the rate limit for DHCP requests to avoid DoS attacks. The default value is 100 packets per second.
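The trust rule above can be modeled in a few lines: server messages are accepted only when they arrive on a trusted port, and an accepted ACK adds an entry to the IP-MAC binding table. This is a conceptual sketch only. The class, method, and parameter names are hypothetical, and rate limiting is left out.

```python
class SnoopingVlan:
    """Toy model of DHCP snooping state for one VLAN."""

    def __init__(self, trusted_ports):
        self.trusted_ports = set(trusted_ports)
        self.bindings = {}  # client MAC -> (client IP, client port)

    def server_message(self, ingress_port, msg_type, client_mac, client_ip, client_port):
        """Return True if the server message is forwarded, False if it is dropped."""
        if ingress_port not in self.trusted_ports:
            return False  # offers and ACKs from untrusted ports are dropped
        if msg_type == "ACK":
            self.bindings[client_mac] = (client_ip, client_port)
        return True


vlan100 = SnoopingVlan(trusted_ports=["swp6"])
print(vlan100.server_message("swp5", "OFFER", "00:02:00:00:00:04", "10.0.0.3", "swp5"))
print(vlan100.server_message("swp6", "ACK", "00:02:00:00:00:04", "10.0.0.3", "swp5"))
print(vlan100.bindings)
```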
The following example commands show you how to configure DHCP snooping for IPv4 and IPv6.
cumulus@leaf01:~$ net add bridge br0 dhcp-snoop vlan 100
cumulus@leaf01:~$ net add bridge br0 dhcp-snoop vlan 100 trust swp6
cumulus@leaf01:~$ net add bridge br0 dhcp-snoop vlan 100 rate-limit 50
cumulus@leaf01:~$ net pending
cumulus@leaf01:~$ net commit
cumulus@leaf01:~$ net add bridge br0 dhcp-snoop6 vlan 100
cumulus@leaf01:~$ net add bridge br0 dhcp-snoop6 vlan 100 trust swp6
cumulus@leaf01:~$ net add bridge br0 dhcp-snoop6 vlan 100 rate-limit 50
cumulus@leaf01:~$ net pending
cumulus@leaf01:~$ net commit
The NCLU commands save the configuration in the /etc/dhcpsnoop/dhcp_snoop.json file. For example:
To remove all DHCP snooping configuration, run the net del dhcp-snoop all command. For example:
cumulus@leaf01:~$ net del dhcp-snoop all
cumulus@leaf01:~$ net pending
cumulus@leaf01:~$ net commit
When DHCP snooping detects a violation, the packet is dropped and a message is logged to the /var/log/dhcpsnoop.log file.
Show the DHCP Binding Table
To show the DHCP binding table, run the net show dhcp-snoop table command for IPv4 or the net show dhcp-snoop6 table command for IPv6. The following example command shows the DHCP binding table for IPv4:
cumulus@leaf01:~$ net show dhcp-snoop table
Port VLAN IP MAC Lease State Bridge
---- ---- --------- ----------------- ----- ----- ------
swp5 1002 10.0.0.3 00:02:00:00:00:04 7200 ACK br0
swp5 1000 10.0.1.3 00:02:00:00:00:04 7200 ACK br0
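If you collect this output in a script, the table can be turned into dictionaries keyed by the column headings. The sketch below assumes the whitespace-separated layout shown above, with a dashed line under the header; the function name parse_binding_table is hypothetical.

```python
def parse_binding_table(output):
    """Parse 'net show dhcp-snoop table' style output into a list of dicts."""
    rows = []
    header = None
    for line in output.splitlines():
        fields = line.split()
        if not fields or set(fields[0]) == {"-"}:
            continue  # skip blank lines and the dashed separator
        if header is None:
            header = fields  # first non-empty line holds the column names
            continue
        rows.append(dict(zip(header, fields)))
    return rows


table = """\
Port VLAN IP MAC Lease State Bridge
---- ---- --------- ----------------- ----- ----- ------
swp5 1002 10.0.0.3 00:02:00:00:00:04 7200 ACK br0
swp5 1000 10.0.1.3 00:02:00:00:00:04 7200 ACK br0
"""
for row in parse_binding_table(table):
    print(row["Port"], row["VLAN"], row["IP"])
```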
802.1X Interfaces
The IEEE 802.1X protocol provides a method of authenticating a client (called a supplicant) over wired media. It also provides access for individual MAC addresses on a switch (called the authenticator) after those MAC addresses have been authenticated by an authentication server, typically a RADIUS (Remote Authentication Dial In User Service, defined by RFC 2865) server.
A Cumulus Linux switch acts as an intermediary between the clients connected to the wired ports and the authentication server, which is reachable over the existing network. EAPOL (Extensible Authentication Protocol (EAP) over LAN - EtherType value of 0x888E, defined by RFC 3748) operates on top of the data link layer; the switch uses EAPOL to communicate with supplicants connected to the switch ports.
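As a small illustration of the EtherType mentioned above, the sketch below checks whether an untagged Ethernet frame carries EAPOL by reading the two bytes that follow the destination and source MAC addresses. The function name is hypothetical, VLAN-tagged frames are not handled, and the sample frame is made up for illustration; it uses the 802.1X PAE group address 01:80:C2:00:00:03 as the destination.

```python
import struct

EAPOL_ETHERTYPE = 0x888E


def is_eapol(frame: bytes) -> bool:
    """Return True if an untagged Ethernet frame has the EAPOL EtherType."""
    if len(frame) < 14:
        return False  # too short to hold an Ethernet header
    (ethertype,) = struct.unpack("!H", frame[12:14])
    return ethertype == EAPOL_ETHERTYPE


# Destination MAC, source MAC (zeros), EtherType, then a few payload bytes.
sample_frame = bytes.fromhex("0180c2000003") + bytes(6) + b"\x88\x8e" + b"\x01\x01\x00\x00"
print(is_eapol(sample_frame))
```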
Cumulus Linux implements 802.1X through the Debian hostapd package, which has been modified to provide the PAE (port access entity).
Supported Features
802.1X is supported on Broadcom-based switches (except the Hurricane2 switch). Tomahawk, Tomahawk2, and Trident3 switches must be running in nonatomic mode.
802.1X is supported on physical interfaces only, such as swp1 or swp2s0 (bridged/access only and routed interfaces).
MAB, parking VLAN, and dynamic VLAN all require a bridge access port.
In traditional bridge mode, parking VLANs and dynamic VLANs both require the destination bridge to have a parking VLAN ID or dynamic VLAN ID tagged subinterface.
When you enable or disable 802.1X on ports, hostapd reloads; however, existing authorized sessions do not reset.
Changing the 802.1X interface, MAB, or parking VLAN settings does not reset existing authorized user ports. However, removing all 802.1X interfaces or changing any of the following RADIUS parameters restarts hostapd, which forces existing, authorized users to re-authenticate:
RADIUS server IP address, shared secret, authentication port or accounting port.
Parking VLAN ID.
MAB activation delay.
EAP reauthentication period.
You can configure up to three RADIUS servers (in case of failover). However, do not use a Cumulus Linux switch as the RADIUS server.
802.1X on Cumulus Linux has been tested with only a few supplicants: wpa_supplicant (Debian), Windows 10, and Windows 7.
RADIUS authentication is supported with FreeRADIUS and Cisco ACS.
802.1X supports simple login and password, PEAP/MSCHAPv2 (Win7) and EAP-TLS (Debian).
802.1X supports RFC 5281 for EAP-TTLS, which provides more secure transport layer security.
Mako template-based configurations are not supported.
Cumulus Linux supports Multi Domain Authentication (MDA), where 802.1X is extended to allow authorization of multiple devices (a data and a voice device) on a single port and assign different VLANs to the devices based on authorization.
A maximum of four authorized devices (MAB + EAPOL) per port are supported.
The 802.1X-enabled port must be a trunk port to allow tagged voice traffic from a phone; you cannot enable 802.1X on an access port.
Only one untagged VLAN and one tagged VLAN are supported on 802.1X-enabled ports.
Multiple MAB (non voice) devices on a port are supported for VLAN-aware bridges only. Authorization of multiple MAB devices for different VLANs is not supported.
Cumulus Linux does not support 802.1X with MLAG; the switch cannot synchronize 802.1X authenticated MAC addresses over the peerlink.
Configure the RADIUS Server
Before you can authenticate with 802.1X on your switch, you must configure a RADIUS server somewhere in your network. Popular examples of commercial software with RADIUS capability include Cisco ISE and Aruba ClearPass.
There are also open source versions of software supporting RADIUS such as PacketFence and FreeRADIUS. This section discusses how to add FreeRADIUS to a Debian server on your network.
Do not use a Cumulus Linux switch as the RADIUS server.
To add FreeRADIUS on a Debian server, do the following:
All the 802.1X interfaces share the same RADIUS server settings. Make sure you configure the RADIUS server before you configure the 802.1X interfaces. See Configure the RADIUS Server above.
To configure an 802.1X interface, you need to set the following parameters, then enable 802.1X on the interface:
The RADIUS accounting port, which defaults to 1813.
The RADIUS Server IPv4 or IPv6 address, which has no default, but is required. You can also specify a VRF.
The RADIUS shared secret, which has no default, but is required.
Configure 802.1X Interfaces for a VLAN-aware Bridge
NCLU handles all the 802.1X interface configuration, updating hostapd and other components so you do not have to manually modify configuration files.
Create a simple interface bridge configuration on the switch and add the switch ports that are members of the bridge. You can use glob syntax to add a range of interfaces. The MAB and parking VLAN configurations require interfaces to be bridge access ports. The VLAN-aware bridge must be named bridge and there can be only one VLAN-aware bridge on a switch.
cumulus@switch:~$ net add bridge bridge ports swp1-4
Add the 802.1X RADIUS server IP address and shared secret:
cumulus@switch:~$ net add dot1x radius server-ip 127.0.0.1
cumulus@switch:~$ net add dot1x radius shared-secret mysecret
You can specify a VRF for outgoing RADIUS accounting and authorization packets. The following example specifies a VRF called turtle:
cumulus@switch:~$ net add dot1x radius server-ip 127.0.0.1 vrf turtle
cumulus@switch:~$ net add dot1x radius shared-secret mysecret
Enable 802.1X on the interfaces, then review and commit the new configuration:
cumulus@switch:~$ net add interface swp1-4 dot1x
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
To assign a tagged VLAN for voice devices and assign different VLANs to the devices based on authorization, run these commands:
cumulus@switch:~$ net add interface swp1-4 dot1x voice-enable
cumulus@switch:~$ net add interface swp1-4 dot1x voice-enable vlan 200
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Edit the /etc/network/interfaces file to create a simple interface bridge configuration on the switch and add the switch ports that are members of the bridge. The MAB and parking VLAN configurations require interfaces to be bridge access ports. The VLAN-aware bridge must be named bridge and there can be only one VLAN-aware bridge on a switch. The following example shows that swp1 through swp4 are members of the bridge.
Edit the /etc/hostapd.conf file to configure 802.1X settings. The example below sets:
The IP address of the 802.1X RADIUS server to 127.0.0.1 (auth_server_addr=127.0.0.1). You can specify a VRF for outgoing RADIUS accounting and authorization packets (for example, to specify a VRF called turtle: auth_server_addr=127.0.0.1%turtle).
The shared secret to mysecret (auth_server_shared_secret=mysecret).
802.1X on swp1 thru swp4 (interfaces=swp1,swp2,swp3,swp4).
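The three settings listed above can be sketched in Python as a small helper that renders the corresponding hostapd.conf lines. This is only an illustration of the option syntax described in this section: the function name render_hostapd_radius is hypothetical, and a real /etc/hostapd.conf contains many other options that this sketch does not produce.

```python
def render_hostapd_radius(server_ip, shared_secret, interfaces, vrf=None):
    """Return the hostapd.conf lines for the RADIUS server and 802.1X ports.

    Hypothetical helper: it only formats the three options named in the text.
    """
    # A VRF, when given, is appended to the server address after a percent sign.
    addr = f"{server_ip}%{vrf}" if vrf else server_ip
    return [
        f"auth_server_addr={addr}",
        f"auth_server_shared_secret={shared_secret}",
        "interfaces=" + ",".join(interfaces),
    ]


if __name__ == "__main__":
    ports = [f"swp{n}" for n in range(1, 5)]  # swp1 through swp4
    print("\n".join(render_hostapd_radius("127.0.0.1", "mysecret", ports, vrf="turtle")))
```

Passing vrf=None yields the plain auth_server_addr=127.0.0.1 form.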
Configure 802.1X Interfaces for a Traditional Mode Bridge
NCLU and hostapd might change traditional mode configurations on the bridge-ports line in the /etc/network/interfaces file by adding or deleting special 802.1X traditional mode bridge-ports configuration stanzas in /etc/network/interfaces.d/. The source configuration command in /etc/network/interfaces must include these special configuration filenames. It must include at least source /etc/network/interfaces.d/*.intf so that these files are sourced during an ifreload.
Create uplink ports. The following example uses bonds:
cumulus@switch:~$ net add bond bond1 bond slaves swp5-6
cumulus@switch:~$ net add bond bond2 bond slaves swp7-8
Create a traditional mode bridge configuration on the switch and add the switch ports that are members of the bridge. A traditional bridge cannot be named bridge as that name is reserved for the single VLAN-aware bridge on the switch. You can use glob syntax to add a range of interfaces.
cumulus@switch:~$ net add bridge bridge1 ports swp1-4
Create bridge associations with the parking VLAN ID and the dynamic VLAN IDs. In this example, 600 is used for the parking VLAN ID and 700 is used for the dynamic VLAN ID:
cumulus@switch:~$ net add bridge br-vlan600 ports bond1.600
cumulus@switch:~$ net add bridge br-vlan700 ports bond2.700
Add the 802.1X RADIUS server IP address and shared secret:
cumulus@switch:~$ net add dot1x radius server-ip 127.0.0.1
cumulus@switch:~$ net add dot1x radius shared-secret mysecret
You can specify a VRF for outgoing RADIUS accounting and authorization packets. The following example specifies a VRF called turtle:
cumulus@switch:~$ net add dot1x radius server-ip 127.0.0.1 vrf turtle
cumulus@switch:~$ net add dot1x radius shared-secret mysecret
Enable 802.1X on the interfaces, then review and commit the new configuration:
cumulus@switch:~$ net add interface swp1-2 dot1x
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Edit the /etc/network/interfaces file to create uplink ports and create a traditional mode bridge configuration on the switch.
a. Create uplink ports. The following example uses bonds:
cumulus@switch:~$ sudo nano /etc/network/interfaces
...
auto bond1
iface bond1
    bond-slaves swp5 swp6

auto bond2
iface bond2
    bond-slaves swp7 swp8
...
b. Create a traditional mode bridge configuration on the switch and add the switch ports that are members of the bridge. You must also create bridge associations with the parking VLAN ID and the dynamic VLAN IDs. In this example, 600 is used for the parking VLAN ID and 700 is used for the dynamic VLAN ID.
A traditional bridge cannot be named bridge as that name is reserved for the single VLAN-aware bridge on the switch. You can use glob syntax to add a range of interfaces.
cumulus@switch:~$ sudo nano /etc/network/interfaces
...
auto bridge1
iface bridge1
    bridge-ports swp1-swp4
    bridge-vlan-aware no

auto br-vlan600
iface br-vlan600
    bridge-ports bond1.600
    bridge-vlan-aware no

auto br-vlan700
iface br-vlan700
    bridge-ports bond1.700
    bridge-vlan-aware no
Edit the /etc/hostapd.conf file to configure 802.1X settings. The example below sets:
The IP address of the 802.1X RADIUS server to 127.0.0.1 (auth_server_addr=127.0.0.1). You can specify a VRF for outgoing RADIUS accounting and authorization packets (for example, to specify a VRF called turtle: auth_server_addr=127.0.0.1%turtle).
The shared secret to mysecret (auth_server_shared_secret=mysecret).
802.1X on swp1, swp2, swp3, and swp4 (interfaces=swp1,swp2,swp3,swp4).
You can configure the accounting and authentication ports in Cumulus Linux. The default values are 1813 for the accounting port and 1812 for the authentication port. You can also change the reauthentication period for Extensible Authentication Protocol (EAP). The period defaults to 0 (no re-authentication is performed by the switch).
To use different ports:
The following example commands change:
The authentication port to 2812
The accounting port to 2813
The reauthentication period for EAP to 86400
cumulus@switch:~$ net add dot1x radius authentication-port 2812
cumulus@switch:~$ net add dot1x radius accounting-port 2813
cumulus@switch:~$ net add dot1x eap-reauth-period 86400
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Edit the /etc/hostapd.conf file to change the accounting and authentication ports. The example below sets:
The accounting port to 2813 (acct_server_port=2813)
The authentication port to 2812 (auth_server_port=2812)
The reauthentication period for EAP to 86400 (eap_reauth_period=86400)
MAC authentication bypass (MAB) enables bridge ports to allow devices to bypass authentication based on their MAC address. This is useful for devices that do not support PAE, such as printers or phones.
MAB must be configured on both the RADIUS server and the RADIUS client (the Cumulus Linux switch).
When using a VLAN-aware bridge, the switch port must be part of the bridge named bridge.
To configure MAB:
Enable a bridge port for MAB. The following example commands enable bridge port swp1 for MAB:
cumulus@switch:~$ net add interface swp1 dot1x mab
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Edit the /etc/hostapd.conf file to enable a bridge port for MAB. The following example enables bridge port swp1 for MAB.
If a non-authorized supplicant tries to communicate with the switch, you can route traffic from that device to a different VLAN and associate that VLAN with one of the switch ports to which the supplicant is attached.
For VLAN-aware bridges, the parking VLAN is assigned by manipulating the PVID of the switch port. For traditional mode bridges, Cumulus Linux identifies the bridge associated with the parking VLAN ID and moves the switch port into that bridge. If an appropriate bridge is not found for the move, the port remains in an unauthenticated state where no packets can be received or transmitted.
When using a VLAN-aware bridge, the switch port must be part of the bridge named bridge.
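For traditional mode bridges, the lookup described above can be sketched as follows. This Python model is an assumption-laden illustration, not the switch implementation: the function names and the dictionary layout are hypothetical, and the only behavior taken from this guide is that the port moves to the bridge holding a subinterface tagged with the parking VLAN ID, and that a failed lookup is reported with the UNKNOWN_BR flag.

```python
def find_bridge_for_vlan(bridges, vlan_id):
    """Return the first bridge with a subinterface tagged with vlan_id, else None.

    bridges maps a bridge name to its list of member ports (hypothetical layout).
    """
    wanted = str(vlan_id)
    for name, ports in bridges.items():
        for port in ports:
            # A tagged subinterface is written as <parent>.<vlan>, such as bond1.600.
            if "." in port and port.rsplit(".", 1)[1] == wanted:
                return name
    return None


def park_port(bridges, port, parking_vlan_id):
    """Model moving an unauthorized port into the parking VLAN bridge."""
    target = find_bridge_for_vlan(bridges, parking_vlan_id)
    flags = ["PARKED_VLAN"]
    if target is None:
        # No suitable bridge: the port stays unauthenticated.
        flags.append("UNKNOWN_BR")
    return {"port": port, "bridge": target, "flags": flags}


if __name__ == "__main__":
    bridges = {
        "bridge1": ["swp1", "swp2", "swp3", "swp4"],
        "br-vlan600": ["bond1.600"],
        "br-vlan700": ["bond2.700"],
    }
    print(park_port(bridges, "swp1", 600))
    print(park_port(bridges, "swp1", 777))
```

With these sample bridges, VLAN 600 resolves to br-vlan600, while VLAN 777 has no matching bridge and is flagged UNKNOWN_BR.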
Run the following commands:
cumulus@switch:~$ net add dot1x parking-vlan-id 777
cumulus@switch:~$ net add interface swp1 dot1x parking-vlan
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
If the authentication for swp1 fails, the port is moved to the parking VLAN:
cumulus@switch:~$ net show dot1x interface swp1 details
Interface MAC Address Attribute Value
--------- ----------------- ---------------------------- -----------------
swp1 00:02:00:00:00:08 Status Flags [PARKED_VLAN]
Username vlan60
Authentication Type MD5
VLAN 777
Session Time (seconds) 24772
EAPOL Frames RX 9
EAPOL Frames TX 12
EAPOL Start Frames RX 1
EAPOL Logoff Frames RX 0
EAPOL Response ID Frames RX 4
EAPOL Response Frames RX 8
EAPOL Request ID Frames TX 4
EAPOL Request Frames TX 8
EAPOL Invalid Frames RX 0
EAPOL Length Error Frames Rx 0
EAPOL Frame Version 2
EAPOL Auth Last Frame Source 00:02:00:00:00:08
EAPOL Auth Backend Responses 8
RADIUS Auth Session ID C2FED91A39D8D605
The following output shows a parking VLAN association failure. A VLAN association failure only occurs with traditional mode bridges when there is no traditional bridge available with a parking VLAN ID-tagged subinterface (notice the [UNKNOWN_BR] status in the output):
cumulus@switch:~$ net show dot1x interface swp1 details
Interface MAC Address Attribute Value
--------- ----------------- ---------------------------- -------------------------
swp1 00:02:00:00:00:08 Status Flags [PARKED_VLAN][UNKNOWN_BR]
Username vlan60
Authentication Type MD5
VLAN 777
Session Time (seconds) 24599
EAPOL Frames RX 3
EAPOL Frames TX 3
EAPOL Start Frames RX 1
EAPOL Logoff Frames RX 0
EAPOL Response ID Frames RX 1
EAPOL Response Frames RX 2
EAPOL Request ID Frames TX 1
EAPOL Request Frames TX 2
EAPOL Invalid Frames RX 0
EAPOL Length Error Frames Rx 0
EAPOL Frame Version 2
EAPOL Auth Last Frame Source 00:02:00:00:00:08
EAPOL Auth Backend Responses 2
RADIUS Auth Session ID C2FED91A39D8D605
Edit the /etc/hostapd.conf file to add the parking VLAN ID and port. The following example adds the parking VLAN ID 777 (parking_vlan_id=777) and port swp1 (parking_vlan_interfaces=swp1).
If the authentication for swp1 fails, the port is moved to the parking VLAN.
Configure Dynamic VLAN Assignments
A common requirement for campus networks is to assign dynamic VLANs to specific users in combination with IEEE 802.1X. After authenticating a supplicant, the user is assigned a VLAN based on the RADIUS configuration.
For VLAN-aware bridges, the dynamic VLAN is assigned by manipulating the PVID of the switch port. For traditional mode bridges, Cumulus Linux identifies the bridge associated with the dynamic VLAN ID and moves the switch port into that bridge. If an appropriate bridge is not found for the move, the port remains in an unauthenticated state where no packets can be received or transmitted.
To enable dynamic VLAN assignment globally, where VLAN attributes sent from the RADIUS server are applied to the bridge:
Run the following commands:
cumulus@switch:~$ net add dot1x dynamic-vlan
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
You can specify the require option in the command so that VLAN attributes are required. If VLAN attributes do not exist in the access response packet returned from the RADIUS server, the user is not authorized and has no connectivity. If the RADIUS server returns VLAN attributes but the user has an incorrect password, the user is placed in the parking VLAN (if you have configured parking VLAN).
cumulus@switch:~$ net add dot1x dynamic-vlan require
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
The following example shows a typical RADIUS configuration (shown for FreeRADIUS, not typically configured or run on the Cumulus Linux device) for a user with dynamic VLAN assignment:
# # VLAN 100 Client Configuration for Freeradius RADIUS Server.
# # This is not part of the CL configuration.
vlan100client Cleartext-Password := "client1password"
Service-Type = Framed-User,
Tunnel-Type = VLAN,
Tunnel-Medium-Type = "IEEE-802",
Tunnel-Private-Group-ID = 100
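As a rough model of how the reply attributes above relate to the dynamic-vlan and dynamic-vlan require settings, the following Python sketch derives a VLAN from a dictionary of attributes. The function name and the dictionary representation are hypothetical. The handling of a reply without VLAN attributes when require is not set (the user is authorized with no VLAN change) is an assumption inferred from the text, not a documented behavior.

```python
def vlan_from_access_accept(attributes, require=False):
    """Return (authorized, vlan_id) for the reply attributes of one user.

    attributes is a plain dict of RADIUS reply attributes (hypothetical layout).
    """
    is_vlan_tunnel = (
        attributes.get("Tunnel-Type") == "VLAN"
        and attributes.get("Tunnel-Medium-Type") == "IEEE-802"
        and "Tunnel-Private-Group-ID" in attributes
    )
    if is_vlan_tunnel:
        return True, int(attributes["Tunnel-Private-Group-ID"])
    # No usable VLAN attributes: reject only when they are required.
    # Assumption: without "require" the user keeps the configured VLAN.
    return (not require), None


REPLY = {
    "Service-Type": "Framed-User",
    "Tunnel-Type": "VLAN",
    "Tunnel-Medium-Type": "IEEE-802",
    "Tunnel-Private-Group-ID": 100,
}

if __name__ == "__main__":
    print(vlan_from_access_accept(REPLY))
    print(vlan_from_access_accept({}, require=True))
```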
Verify the configuration (notice the [AUTHORIZED] status in the output):
cumulus@switch:~$ net show dot1x interface swp1 details
Interface MAC Address Attribute Value
--------- ----------------- ---------------------------- --------------------------
swp1 00:02:00:00:00:08 Status Flags [DYNAMIC_VLAN][AUTHORIZED]
Username host1
Authentication Type MD5
VLAN 888
Session Time (seconds) 799
EAPOL Frames RX 3
EAPOL Frames TX 3
EAPOL Start Frames RX 1
EAPOL Logoff Frames RX 0
EAPOL Response ID Frames RX 1
EAPOL Response Frames RX 2
EAPOL Request ID Frames TX 1
EAPOL Request Frames TX 2
EAPOL Invalid Frames RX 0
EAPOL Length Error Frames Rx 0
EAPOL Frame Version 2
EAPOL Auth Last Frame Source 00:02:00:00:00:08
EAPOL Auth Backend Responses 2
RADIUS Auth Session ID 939B1A53B624FC56
cumulus@switch:~$ net show dot1x interface summary
Interface MAC Address Username State Authentication Type MAB VLAN
--------- ----------------- ------------ ------------ ------------------- --- ----
swp1 00:02:00:00:00:08 000200000008 AUTHORIZED unknown NO 888
The following output shows a dynamic VLAN association failure. A VLAN association failure only occurs with traditional mode bridges when there is no traditional bridge available with a dynamic VLAN ID-tagged subinterface (notice the [UNKNOWN_BR] status in the output):
cumulus@switch:~$ net show dot1x interface swp1 details
Interface MAC Address Attribute Value
--------- ----------------- ---------------------------- --------------------------------------
swp1 00:02:00:00:00:08 Status Flags [DYNAMIC_VLAN][AUTHORIZED][UNKNOWN_BR]
Username host2
Authentication Type MD5
VLAN 888
Session Time (seconds) 11
EAPOL Frames RX 3
EAPOL Frames TX 3
EAPOL Start Frames RX 1
EAPOL Logoff Frames RX 0
EAPOL Response ID Frames RX 1
EAPOL Response Frames RX 2
EAPOL Request ID Frames TX 1
EAPOL Request Frames TX 2
EAPOL Invalid Frames RX 0
EAPOL Length Error Frames Rx 0
EAPOL Frame Version 2
EAPOL Auth Last Frame Source 00:02:00:00:00:08
EAPOL Auth Backend Responses 2
RADIUS Auth Session ID BDF731EF2B765B78
Edit the /etc/hostapd.conf file to add the following options:
dynamic_vlan=1 (Specify dynamic_vlan=2 if you want VLAN attributes to be required. If VLAN attributes do not exist in the access response packet returned from the RADIUS server, the user is not authorized and has no connectivity. If the RADIUS server returns VLAN attributes but the user has an incorrect password, the user is placed in the parking VLAN, if you have configured parking VLAN).
radius_das_port=
radius_das_time_window=300
radius_das_require_event_timestamp=1
radius_das_require_message_authenticator=1
Remove the eap_send_identity=0 option. For example:
The following example shows a typical RADIUS configuration (shown for FreeRADIUS, not typically configured or run on the Cumulus Linux device) for a user with dynamic VLAN assignment:
# # VLAN 100 Client Configuration for Freeradius RADIUS Server.
# # This is not part of the CL configuration.
vlan100client Cleartext-Password := "client1password"
Service-Type = Framed-User,
Tunnel-Type = VLAN,
Tunnel-Medium-Type = "IEEE-802",
Tunnel-Private-Group-ID = 100
To disable dynamic VLAN assignment, where VLAN attributes sent from the RADIUS server are ignored and users are authenticated based on existing credentials:
Run the net del dot1x dynamic-vlan command:
cumulus@switch:~$ net del dot1x dynamic-vlan
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Edit the /etc/hostapd.conf file to remove the following options:
dynamic_vlan=1
radius_das_port=
radius_das_time_window=300
radius_das_require_event_timestamp=1
radius_das_require_message_authenticator=1
Add the eap_send_identity=0 option. The following example shows the options in the /etc/hostapd.conf file
Enabling or disabling dynamic VLAN assignment restarts hostapd, which forces existing, authorized users to re-authenticate.
Dynamic ACLs
In high-security campus environments where 802.1X interfaces are in use, you can implement network access control at the user (supplicant) level using dynamic access control lists, or DACLs. A pre-auth ACL permits some traffic to traverse the network before 802.1X authorization takes place, then a dynamic ACL can be applied for that supplicant that is specific to an interface and the MAC address that was authorized (sometimes called a station).
Since DACLs restrict access to network resources at the user level, multiple users on the same VLAN can access different resources based on the policy provided by the RADIUS server. DACLs utilize NAS-Filter-Rule (RADIUS attribute 92), so you can configure them in your RADIUS server configuration and not on each switch.
The DACLs are also dynamically modified to fit the specific authenticating supplicant. For example, specific MAC addresses may be restricted to talk only to certain L3/L4 destinations.
Port security (MAC address restrictions) cannot be used at the same time as DACLs.
Cumulus Linux does not support configuring both Dynamic VLAN and DACLs on a given switch port at the same time.
In the resulting ebtables filter, the source MAC address of the authorized user replaces the from any source IPv4 address of the rule.
Only a single destination port integer is supported; port ranges are not supported.
Any IPv4 protocol is supported either by name or number as supported in the Cumulus Linux ebtables implementation.
How It Works
A supplicant sends packets over a network port. A pre-802.1X authorization ACL executes. You can manually create your own pre-auth ACL filter or just use the Cumulus Linux default (see below). There are no NCLU commands for creating the filter itself.
When dot1x dynamic-acl is enabled on an interface, Cumulus Linux installs the pre-auth ACL defaults for the port (once you execute net commit).
When a supplicant on the port tries to get 802.1X authorized, the RADIUS server may (or may not) send along some NAS-Filter-Rule attributes in the Access-Accept message.
If any filters are sent from the RADIUS server, Cumulus Linux applies them before the default pre-auth ACL.
If no filters are sent, Cumulus Linux leaves the defaults in place, and no special access is granted to the user.
The NAS-Filter-Rule Attribute
The NAS-Filter-Rule attribute is a string of one or more octets that contains filter rules in the IPFilterRule syntax defined by RFC 6733. The IPFilterRule filters must follow this format:
action dir proto from src to dst [options]
Keyword    Definition
---------  ----------
action     permit: Allow packets that match the rule. deny: Drop packets that match the rule.
dir        Direction: in is from the terminal, out is to the terminal. Only the in direction is supported.
proto      An IP protocol specified by number. The ip keyword means any protocol will match. Only IPv4 ACLs are supported.
src / dst  Source and destination IP address/subnet mask, and optional ports.
The syntax for NAS-Filter-Rule attributes configured in the RADIUS server varies widely by RADIUS vendor. However, the resulting format for the rules contained in the Access-Accept must conform to the IPFilterRule syntax defined in RFC 6733, Section 4.3, as mentioned above. When the Cumulus Linux switch gets these rules for a particular user, they are converted to ebtables rules using the actual user MAC address, and are then combined with the default pre-auth ACL rules.
The rules for the appropriate direction are evaluated in order, with the first matched rule terminating the evaluation. Each packet is evaluated once. If no rule matches, the packet is dropped if the last rule was a deny.
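First-match evaluation can be illustrated with a short Python sketch. The rule dictionaries and the function name first_match are hypothetical and exist only for this example. The sketch returns None when nothing matches and leaves the handling of that case to the caller, because it does not try to model the fallback behavior of the switch.

```python
import ipaddress

# Hypothetical pre-parsed rules, in the order they were received.
RULES = [
    {"action": "permit", "proto": "udp", "dst": "any", "port": 67},
    {"action": "permit", "proto": "udp", "dst": "10.0.0.0/9", "port": 53},
    {"action": "permit", "proto": "ip", "dst": "172.16.0.99", "port": None},
    {"action": "deny", "proto": "ip", "dst": "any", "port": None},
]


def first_match(rules, proto, dst_ip, dst_port=None):
    """Return the action of the first rule that matches the packet, or None."""
    dst = ipaddress.ip_address(dst_ip)
    for rule in rules:
        # The ip keyword matches any protocol.
        if rule["proto"] != "ip" and rule["proto"] != proto:
            continue
        if rule["dst"] != "any" and dst not in ipaddress.ip_network(rule["dst"], strict=False):
            continue
        if rule["port"] is not None and rule["port"] != dst_port:
            continue
        return rule["action"]  # evaluation stops at the first match
    return None


if __name__ == "__main__":
    print(first_match(RULES, "udp", "10.1.2.3", 53))
    print(first_match(RULES, "tcp", "192.0.2.1", 80))
```

In this sample list, a DNS query to 10.1.2.3 stops at the second rule, and unrelated TCP traffic falls through to the final deny.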
If these rules are invalid (for example, they contain port ranges or IPv6 addresses), the port does not get authorized and a log message is written to /var/log/syslog.
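Before adding rules to a RADIUS server, it can help to check them against the restrictions listed in this section. The following Python sketch is one possible pre-check, under the assumption that each rule is a single whitespace-separated string. The function name parse_filter_rule is hypothetical, the checks cover only the restrictions stated here (in direction, IPv4 destination, one destination port integer), and it is not the validation that the switch itself performs.

```python
import ipaddress


def parse_filter_rule(rule):
    """Split 'action dir proto from src to dst [port]' into fields.

    Raises ValueError for input that this section describes as unsupported.
    """
    tokens = rule.split()
    if len(tokens) < 7 or tokens[3] != "from" or "to" not in tokens:
        raise ValueError(f"not in 'action dir proto from src to dst' form: {rule!r}")
    action, direction, proto = tokens[0], tokens[1], tokens[2]
    to_idx = tokens.index("to")
    src, dst = tokens[4:to_idx], tokens[to_idx + 1:]
    if action not in ("permit", "deny"):
        raise ValueError(f"unknown action: {action!r}")
    if direction != "in":
        raise ValueError("only the in direction is supported")
    if not src or not dst or len(dst) > 2:
        raise ValueError(f"unexpected source or destination: {rule!r}")
    if dst[0] != "any":
        # ip_network raises ValueError for malformed addresses.
        if ipaddress.ip_network(dst[0], strict=False).version != 4:
            raise ValueError("only IPv4 destinations are supported")
    port = None
    if len(dst) == 2:
        if not (dst[1].isascii() and dst[1].isdigit()):
            raise ValueError("only a single destination port integer is supported")
        port = int(dst[1])
    return {"action": action, "proto": proto, "src": src[0], "dst": dst[0], "port": port}


if __name__ == "__main__":
    print(parse_filter_rule("permit in udp from any to 10.0.0.0/9 53"))
```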
Get Started
To start applying a DACL to a port, configure the RADIUS server and client, then configure the port with the following:
You configure DACLs on the RADIUS server on your network using the methods provided by the RADIUS software, then you enable it for one or more switch ports on a given switch. This section shows the configuration methods for the FreeRADIUS server.
Configure the RADIUS Server
On the RADIUS server, set the password for the RADIUS client (that is, the Cumulus Linux switch) in the /etc/freeradius/3.0/clients.conf file as follows, using the source IP address of the switch:
Add the DACL configuration to the /etc/freeradius/3.0/users file. For example:
leaf01 Cleartext-Password := "CumulusLinux!"
Service-Type = Framed-User,
Tunnel-Type = VLAN,
Tunnel-Medium-Type = "IEEE-802",
Tunnel-Private-Group-ID = 222,
NAS-Filter-Rule = "permit in udp from any to any 67",
NAS-Filter-Rule = "permit in udp from any to 10.0.0.0/9 53",
NAS-Filter-Rule = "permit in udp from any to 10.0.0.0/9 123",
NAS-Filter-Rule = "permit in icmp from any to any",
NAS-Filter-Rule = "permit in ip from any to 172.16.0.99",
NAS-Filter-Rule = "permit in ip from any to 172.16.0.33",
NAS-Filter-Rule = "permit in ip from any to 172.16.0.105",
NAS-Filter-Rule = "permit in ip from any to 172.16.0.224",
NAS-Filter-Rule = "permit in ip from any to 172.16.224.142",
NAS-Filter-Rule = "permit in tcp from any to 172.16.224.0/9 8883",
NAS-Filter-Rule = "deny in ip from any to any"
The switch converts these rules to a temporary ebtables rules file named something like /etc/cumulus/acl/policy.d/150_dot1x_dacl_swp2_000200000002.rules (the filename always begins with 150_; default rules filenames begin with 200_). The file looks similar to the following:
cumulus@switch:~$ cat /etc/cumulus/acl/policy.d/150_dot1x_dacl_swp2_000200000002.rules
######## hostapd generated Dynamic ACL EBTABLES rule file ########
[ebtables]
-A FORWARD -i swp2 -s 00:02:00:00:00:02 -p IPV4 --ip-protocol UDP --ip-dport 67 -j mark --set-mark 2
-A FORWARD -i swp2 -s 00:02:00:00:00:02 -p IPV4 --ip-protocol UDP --ip-dport 67 -j ACCEPT
-A FORWARD -i swp2 -s 00:02:00:00:00:02 -p IPV4 --ip-dst 10.0.0.0/9 --ip-protocol UDP --ip-dport 53 -j mark --set-mark 2
-A FORWARD -i swp2 -s 00:02:00:00:00:02 -p IPV4 --ip-dst 10.0.0.0/9 --ip-protocol UDP --ip-dport 53 -j ACCEPT
-A FORWARD -i swp2 -s 00:02:00:00:00:02 -p IPV4 --ip-dst 10.0.0.0/9 --ip-protocol UDP --ip-dport 123 -j mark --set-mark 2
-A FORWARD -i swp2 -s 00:02:00:00:00:02 -p IPV4 --ip-dst 10.0.0.0/9 --ip-protocol UDP --ip-dport 123 -j ACCEPT
-A FORWARD -i swp2 -s 00:02:00:00:00:02 -p IPV4 --ip-dst 10.0.0.3 --ip-protocol ICMP -j mark --set-mark 2
-A FORWARD -i swp2 -s 00:02:00:00:00:02 -p IPV4 --ip-dst 10.0.0.3 --ip-protocol ICMP -j DROP
-A FORWARD -i swp2 -s 00:02:00:00:00:02 -p IPV4 --ip-dst 172.16.0.99 -j mark --set-mark 2
-A FORWARD -i swp2 -s 00:02:00:00:00:02 -p IPV4 --ip-dst 172.16.0.99 -j ACCEPT
-A FORWARD -i swp2 -s 00:02:00:00:00:02 -p IPV4 --ip-dst 172.16.131.99 -j mark --set-mark 2
-A FORWARD -i swp2 -s 00:02:00:00:00:02 -p IPV4 --ip-dst 172.16.131.99 -j ACCEPT
-A FORWARD -i swp2 -s 00:02:00:00:00:02 -p IPV4 --ip-dst 172.16.0.33 -j mark --set-mark 2
-A FORWARD -i swp2 -s 00:02:00:00:00:02 -p IPV4 --ip-dst 172.16.0.33 -j ACCEPT
-A FORWARD -i swp2 -s 00:02:00:00:00:02 -p IPV4 --ip-dst 172.16.131.105 -j mark --set-mark 2
-A FORWARD -i swp2 -s 00:02:00:00:00:02 -p IPV4 --ip-dst 172.16.131.105 -j ACCEPT
-A FORWARD -i swp2 -s 00:02:00:00:00:02 -p IPV4 --ip-dst 10.72.169.224 -j mark --set-mark 2
-A FORWARD -i swp2 -s 00:02:00:00:00:02 -p IPV4 --ip-dst 10.72.169.224 -j ACCEPT
-A FORWARD -i swp2 -s 00:02:00:00:00:02 -p IPV4 --ip-dst 10.72.168.142 -j mark --set-mark 2
-A FORWARD -i swp2 -s 00:02:00:00:00:02 -p IPV4 --ip-dst 10.72.168.142 -j ACCEPT
-A FORWARD -i swp2 -s 00:02:00:00:00:02 -p IPV4 --ip-dst 10.0.0.0/9 --ip-protocol TCP --ip-dport 8883 -j mark --set-mark 2
-A FORWARD -i swp2 -s 00:02:00:00:00:02 -p IPV4 --ip-dst 10.0.0.0/9 --ip-protocol TCP --ip-dport 8883 -j ACCEPT
-A FORWARD -i swp2 -s 00:02:00:00:00:02 -p IPV4 --ip-dst 10.0.0.0/9 --ip-protocol TCP --ip-dport 32768 -j mark --set-mark 2
-A FORWARD -i swp2 -s 00:02:00:00:00:02 -p IPV4 --ip-dst 10.0.0.0/9 --ip-protocol TCP --ip-dport 32768 -j ACCEPT
-A FORWARD -i swp2 -s 00:02:00:00:00:02 -p IPV4 -j mark --set-mark 2
-A FORWARD -i swp2 -s 00:02:00:00:00:02 -p IPV4 -j DROP
In the above rules file, the --set-mark 2 option ensures that the nearly identical next rule gets installed in the dedicated TCAM slice for 802.1X.
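The pattern in the rules file above (each filter rule becomes a mark line followed by a nearly identical ACCEPT or DROP line) can be reproduced with a short Python sketch. The function names are hypothetical and the output format is inferred only from the sample file shown in this section; hostapd generates the real file, and its exact behavior for other inputs is not covered here.

```python
def dacl_filename(interface, mac):
    """Build a rules filename in the style shown above (150_ prefix, MAC without colons)."""
    return f"150_dot1x_dacl_{interface}_{mac.replace(':', '')}.rules"


def to_ebtables(interface, mac, action, proto, dst, port=None):
    """Return the mark line and the verdict line for one filter rule."""
    # The supplicant MAC address takes the place of the "from any" source.
    match = f"-A FORWARD -i {interface} -s {mac} -p IPV4"
    if dst != "any":
        match += f" --ip-dst {dst}"
    if proto != "ip":  # the ip keyword means any protocol
        match += f" --ip-protocol {proto.upper()}"
    if port is not None:
        match += f" --ip-dport {port}"
    verdict = "ACCEPT" if action == "permit" else "DROP"
    return [f"{match} -j mark --set-mark 2", f"{match} -j {verdict}"]


if __name__ == "__main__":
    mac = "00:02:00:00:00:02"
    print(dacl_filename("swp2", mac))
    for line in to_ebtables("swp2", mac, "permit", "udp", "10.0.0.0/9", 53):
        print(line)
    for line in to_ebtables("swp2", mac, "deny", "ip", "any"):
        print(line)
```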
Configure the RADIUS Client
The Cumulus Linux switch is the RADIUS client.
Configure the Cumulus Linux switch as a RADIUS client using the net add dot1x radius command, and include your RADIUS server’s IP address and secret:
cumulus@leaf01:~$ net add dot1x radius server-ip 10.0.0.1
cumulus@leaf01:~$ net add dot1x radius shared-secret mysecret
Enable one or more switch ports for DACLs by running the net add interface <INTERFACE> dot1x dynamic-acl command. You can also enable MAC authentication bypass by including the mab option at the end of the command.
cumulus@leaf01:~$ net add interface swp1 dot1x dynamic-acl [mab]
cumulus@leaf01:~$ net pending
cumulus@leaf01:~$ net commit
Edit the /etc/hostapd.conf file to configure the RADIUS client and the DACL interface. The example below sets the IP address of the 802.1X RADIUS server to 10.0.0.1 (auth_server_addr=10.0.0.1), the shared secret to mysecret (auth_server_shared_secret=mysecret), 802.1X on swp1 and swp2 (interfaces=swp1,swp2), and swp2 as a DACL interface (dynamic_acl_interfaces=swp2).
A pre-auth ACL is a static ACL that is applied to all 802.1X dynamic ACL-enabled ports by default. It provides some basic services that are available before 802.1X authorization occurs. The default pre-auth ACL in Cumulus Linux allows for DHCP and DNS to operate without authorizing the supplicant.
The default pre-auth ACL file is /etc/cumulus/acl/policy.d/dot1x_preauth_dacl/default_preauth_dacl.rules, which you can modify, or you can create your own. The default pre-auth ACL permits DHCP (using source port 68 and destination port 67) and DNS (using destination port 53) before 802.1X authorization. You configure pre-auth ACLs only with ebtables syntax.
The pre-auth ACL is always applied to dynamic ACL-enabled 802.1X ports, even after authentication has already completed for any clients on a given switch port.
If you don’t use the default pre-auth ACL and don’t create your own, all traffic gets denied.
To create your own pre-auth ACL file, complete the following steps.
Create the pre-auth ACL file as shown in Linux Commands below, then run the net add dot1x default-dacl-preauth-filename <FILE> command.
cumulus@switch:~$ net add dot1x default-dacl-preauth-filename my_preauth_dacl.rules
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Create your own pre-auth ACL file in the /etc/cumulus/acl/policy.d/dot1x_preauth_dacl/ directory. For example, the following file allows for DHCP, DNS and PXE to operate before authorizing the supplicant:
To see which interfaces are enabled for 802.1X, run the net show dot1x status command. The Interfaces line shows all 802.1X-enabled interfaces while the Dynamic ACL Interfaces line shows only those 802.1X interfaces that are enabled for DACLs:
cumulus@switch:~$ net show dot1x status
Hostapd IEEE 802.11 AP and IEEE 802.1X/WPA/WPA2/EAP Authenticator Daemon
Attribute Value
----------------------- ----------------
Current Status active (running)
Reload Status enabled
Interfaces swp1 swp2
MAB Interfaces
Voice Interfaces
Parking VLAN Interfaces
Dynamic ACL Interfaces swp2
Dynamic VLAN Status Disabled
8021x ACL Rules 10 used/256 max
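When monitoring several switches, the ACL rule usage line in this output can be extracted with a script. The Python sketch below assumes the line keeps the exact "N used/M max" wording shown above; the function name acl_rule_usage is hypothetical, and the output format on other releases is not verified here.

```python
import re


def acl_rule_usage(status_output):
    """Return (used, maximum) from the '8021x ACL Rules' line, or None if absent."""
    match = re.search(r"8021x ACL Rules\s+(\d+)\s+used/(\d+)\s+max", status_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    sample = "Dynamic VLAN Status Disabled\n8021x ACL Rules 10 used/256 max\n"
    used, maximum = acl_rule_usage(sample)
    print(f"{used} of {maximum} rules in use")
```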
To see which interfaces have attempted authorization for DACLs, run net show dot1x interface summary:
cumulus@switch:~$ net show dot1x interface summary
Interface MAC Address Username State Authentication Type MAB VLAN DACL Active
--------- ----------------- -------- ---------- ------------------- --- ---- -----------
swp1 00:02:00:00:00:01 host1 AUTHORIZED MD5 NO NO
swp2 00:02:00:00:00:02 host2 AUTHORIZED MD5 NO YES
To determine the name of the DACL rules file for an interface after it has been authorized and has received DACL rules, run net show dot1x interface <INTERFACE> detail. Look for the DACL Filename line. The following example uses swp2:
cumulus@switch:~$ net show dot1x interface swp2 detail
Interface MAC Address Attribute Value
--------- ----------------- ---------------------------- -----------------
swp2 00:02:00:00:00:01 Status Flags [AUTHORIZED]
Username host1
Authentication Type MD5
VLAN
DACL Filename 150_dot1x_dacl_swp2_000200000002.rules
Session Time (seconds) 65
EAPOL Frames RX 3
EAPOL Frames TX 3
EAPOL Start Frames RX 1
EAPOL Logoff Frames RX 0
EAPOL Response ID Frames RX 1
To see which ACLs are applied to a given interface, run net show dot1x interface <INTERFACE> applied-acls, which is similar to the output of cl-acltool -L eb | grep swp1.
Cumulus Linux provides the send-eap-request-id option, which you can use to trigger EAP packets to be sent from the host side of a connection. For example, this option is required in a configuration where a PC connected to a phone attempts to send EAP packets to the switch via the phone but the PC does not receive a response from the switch (the phone might not be ready to forward packets to the switch after a reboot). Because the switch does not receive EAP packets, it attempts to authorize the PC with MAB instead of waiting for the packets. In this case, the PC might be placed into a parking VLAN to isolate it. To remove the PC from the parking VLAN, the switch needs to send an EAP request to the PC to trigger EAP.
To configure the switch to send an EAP request, run these commands:
cumulus@switch:~$ net add dot1x send-eap-request-id
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Only run this command if MAB is configured on an interface.
The PC might attempt 802.1X authorization through the bridged connection in the back of the phone before the phone completes MAB authorization. In this case, 802.1X authorization fails.
The net del dot1x send-eap-request-id command disables this feature.
RADIUS Change of Authorization and Disconnect Requests
Extensions to the RADIUS protocol (RFC 5176) enable the Cumulus Linux switch to act as a Dynamic Authorization Server (DAS). The switch listens for Change of Authorization (CoA) requests from the RADIUS server, which acts as the Dynamic Authorization Client (DAC), and takes action when needed, such as bouncing a port or terminating a user session. The IEEE 802.1X server (hostapd) running on Cumulus Linux has been adapted to handle these additional, unsolicited RADIUS requests.
Configure DAS
To configure DAS, provide the UDP port (3799 is the default port), the IP address, and the secret key for the DAS client.
The following example commands set the UDP port to the default port, the IP address of the DAS client to 10.0.2.228, and the secret key to mysecret123:
cumulus@switch:~$ net add dot1x radius das-port default
cumulus@switch:~$ net add dot1x radius das-client-ip 10.0.2.228 das-client-secret mysecret123
cumulus@switch:~$ net commit
You can specify a VRF so that incoming RADIUS disconnect and CoA commands are received and acknowledged on the correct interface when VRF is configured. The following example specifies VRF turtle:
cumulus@switch:~$ net add dot1x radius das-port default
cumulus@switch:~$ net add dot1x radius das-client-ip 10.0.2.228 vrf turtle das-client-secret mysecret123
cumulus@switch:~$ net commit
You can configure up to four DAS clients to be authorized to send CoA commands. For example:
cumulus@switch:~$ net add dot1x radius das-port default
cumulus@switch:~$ net add dot1x radius das-client-ip 10.20.250.53 das-client-secret mysecret1
cumulus@switch:~$ net add dot1x radius das-client-ip 10.0.1.7 das-client-secret mysecret2
cumulus@switch:~$ net add dot1x radius das-client-ip 10.20.250.99 das-client-secret mysecret3
cumulus@switch:~$ net add dot1x radius das-client-ip 10.10.0.2 das-client-secret mysecret4
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
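As a rough illustration of the four-client limit, the following shell sketch prints the NCLU commands for a list of DAS clients and rejects lists with more than four entries. The das_client_commands function and its "IP secret" input format are hypothetical helpers for this example, not part of Cumulus Linux; the net commands it prints follow the syntax shown above.

```shell
#!/bin/sh
# Sketch: print NCLU commands for a list of DAS clients.
# Input ($1): one "IP secret" pair per line (format assumed for this example).
das_client_commands() {
    count=$(printf '%s\n' "$1" | grep -c .)
    if [ "$count" -gt 4 ]; then
        echo "error: at most four DAS clients are supported, got $count" >&2
        return 1
    fi
    echo "net add dot1x radius das-port default"
    printf '%s\n' "$1" | while read -r ip secret; do
        [ -n "$ip" ] || continue
        echo "net add dot1x radius das-client-ip $ip das-client-secret $secret"
    done
    echo "net commit"
}

clients="10.20.250.53 mysecret1
10.0.1.7 mysecret2"
das_client_commands "$clients"
```

The sketch only prints the commands; review the output before running it on a switch.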
To see DAS configuration information, run the net show configuration dot1x command. For example:
You can disable DAS in Cumulus Linux at any time by running the following commands:
cumulus@switch:~$ net del dot1x radius das-port
cumulus@switch:~$ net del dot1x radius das-client-ip
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Edit the /etc/hostapd.conf file to remove the following options:
radius_das_port
radius_das_client
Restart the hostapd service:
cumulus@switch:~$ sudo systemctl restart hostapd
Terminate a User Session
From the DAC, users can create a disconnect message using the radclient utility (included in the Debian freeradius-utils package) on the RADIUS server or other authorized client. A disconnect message is sent as an unsolicited RADIUS Disconnect-Request packet to the switch to terminate a user session and discard all associated session context. The Disconnect-Request packet is used when the RADIUS server wants to disconnect the user after the session has been accepted by the RADIUS Access-Accept packet.
This is an example of a disconnect message created using the radclient utility:
$ echo "Acct-Session-Id=D91FE8E51802097" > disconnect-packet.txt
$ ## OPTIONAL ## echo "User-Name=somebody" >> disconnect-packet.txt
$ echo "Message-Authenticator=1" >> disconnect-packet.txt
$ echo "Event-Timestamp=1532974019" >> disconnect-packet.txt
# now send the packet with the radclient utility (from freeradius-utils deb package)
$ cat disconnect-packet.txt | radclient -x 10.0.0.1:3799 disconnect myclientsecret
To prevent unauthorized servers from disconnecting users, the Disconnect-Request packet must include certain identification attributes (described below). For a session to be disconnected, all parameters must match their expected values at the switch. If the parameters do not match, the switch discards the Disconnect-Request packet and sends a Disconnect-NAK (negative acknowledgment message).
The Message-Authenticator attribute is required.
If the packet comes from a different source IP address than the one defined by das-client-ip, the session is not disconnected and the hostapd logs the debug message: DAS: Drop message from unknown client.
The Event-Timestamp attribute is required. If Event-Timestamp in the packet is outside the time window, a debug message is shown in the hostapd logs: DAS: Unacceptable Event-Timestamp (1532978602; local time 1532979367) in packet from 10.10.0.21:45263 - drop
If the Acct-Session-Id attribute is omitted, the User-Name attribute is used to find the session. If the User-Name attribute is omitted, the Acct-Session-Id attribute is used. If both the User-Name and the Acct-Session-Id attributes are supplied, they must match the username provided by the supplicant and the Acct-Session-Id of that session. If neither is given or there is no match, a Disconnect-NAK message is returned to the RADIUS server with Error-Cause "Session-Context-Not-Found" and the following debug messages are shown in the log:
RADIUS DAS: Acct-Session-Id match
RADIUS DAS: No matches remaining after User-Name check
hostapd_das_find_global_sta: checking ifname=swp2
RADIUS DAS: No matches remaining after Acct-Session-Id check
RADIUS DAS: No matching session found
DAS: Session not found for request from 10.10.0.1:58385
DAS: Reply to 10.10.0.1:58385
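The attribute requirements above can be sketched as a small shell helper that prints a Disconnect-Request attribute list with the required Message-Authenticator and a current Event-Timestamp. The build_disconnect_request function name is hypothetical, and using the current time assumes that the clocks on the client and the switch are synchronized.

```shell
#!/bin/sh
# Sketch: print the attributes for a Disconnect-Request, one per line.
# $1 = Acct-Session-Id, $2 = optional User-Name. The function name is hypothetical.
build_disconnect_request() {
    session_id=$1
    user_name=${2:-}
    echo "Acct-Session-Id=$session_id"
    if [ -n "$user_name" ]; then
        echo "User-Name=$user_name"
    fi
    echo "Message-Authenticator=1"       # required attribute
    echo "Event-Timestamp=$(date +%s)"   # required attribute; current time in epoch seconds
}

build_disconnect_request D91FE8E51802097 somebody

# To send the request (address and secret taken from the example above; not run here):
#   build_disconnect_request D91FE8E51802097 | radclient -x 10.0.0.1:3799 disconnect myclientsecret
```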
The following is an example of the Disconnect-Request packet received by the switch:
You can create a CoA bounce-host-port message from the RADIUS server using the radclient utility (included in the Debian freeradius-utils package). The bounce port can cause a link flap on an authentication port, which triggers DHCP renegotiation from one or more hosts connected to the port.
The following is an example of a Cisco AVPair CoA bounce-host-port message sent from the radclient utility:
You can send the NAS IPv4 or IPv6 address in access request and accounting packets. You can only configure one NAS IP address on the switch, which is used for all interface authorizations.
To configure the NAS IP address, run the following commands:
The following command example sets the NAS IP address to 10.0.0.1:
cumulus@switch:~$ net add dot1x radius nas-ip-address 10.0.0.1
Edit the /etc/hostapd.conf file and configure the own_ip_addr setting with the NAS IP address:
To delete the NAS IP address, either run the NCLU net del dot1x radius nas-ip-address command or edit the /etc/hostapd.conf file.
Troubleshooting
To check connectivity between two supplicants, ping one host from the other:
root@host1:/home/cumulus# ping 10.0.0.2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.604 ms
64 bytes from 10.0.0.2: icmp_seq=2 ttl=64 time=0.552 ms
^C
--- 10.0.0.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.552/0.578/0
You can run net show dot1x with the following options for more data:
json prints the command output in JSON format.
macs displays MAC address information.
port-details shows counters from the IEEE8021-PAE-MIB for ports.
radius-details shows counters from the RADIUS-CLIENT MIB (RFC 2618) for ports.
status displays the status of the daemon.
To see which MAC addresses RADIUS has authorized:
cumulus@switch:~$ net show dot1x macs
Interface Attribute Value
----------- ------------- -----------------
swp1 MAC Addresses 00:02:00:00:00:01
swp2 No Data
swp3 No Data
swp4 No Data
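When you only need the authorized interfaces, you can filter this output. The following sketch uses awk on sample text that mirrors the output above; the authorized_macs function is a hypothetical helper, and it assumes each authorized MAC address appears on a line of the form "<interface> MAC Addresses <mac>".

```shell
#!/bin/sh
# Sketch: list interface/MAC pairs from "net show dot1x macs" style output.
# Assumes authorized entries look like: "<interface> MAC Addresses <mac>".
authorized_macs() {
    awk '$2 == "MAC" && $3 == "Addresses" { print $1, $4 }'
}

# Sample text modeled on the output above.
sample='Interface    Attribute      Value
-----------  -------------  -----------------
swp1         MAC Addresses  00:02:00:00:00:01
swp2         No Data
swp3         No Data'

printf '%s\n' "$sample" | authorized_macs

# On a switch, the same filter could be applied to the live command (not run here):
#   net show dot1x macs | authorized_macs
```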
You can perform more advanced troubleshooting with the following commands.
To increase the debug level in hostapd, copy over the hostapd service file, then add -d, -dd or -ddd to the ExecStart line in the hostapd.service file:
To check the tc rules in /var/lib/hostapd/acl/tc_swpX.rules, run:
cumulus@switch:~$ sudo tc -s filter show dev swpXX parent 1:
cumulus@switch:~$ sudo tc -s filter show dev swpXX parent ffff:
Prescriptive Topology Manager - PTM
In data center topologies, correct cabling is time-consuming and error-prone. Prescriptive Topology Manager (PTM) is a dynamic cabling verification tool that helps detect and eliminate such errors. It takes a Graphviz-DOT specified network cabling plan (something many operators already generate), stored in a topology.dot file, and couples it with runtime information derived from LLDP to verify that the cabling matches the specification. The check is performed on every link transition on each node in the network.
You can customize the topology.dot file to control ptmd at both the global/network level and the node/port level.
PTM runs as a daemon, named ptmd.
For more information, see man ptmd(8).
Supported Features
Topology verification using LLDP. ptmd creates a client connection to the LLDP daemon, lldpd, and retrieves the neighbor relationship between the nodes/ports in the network and compares them against the prescribed topology specified in the topology.dot file.
Only physical interfaces, such as swp1 or eth0, are currently supported. Cumulus Linux does not support specifying virtual interfaces, such as bonds, or subinterfaces, such as eth0.200, in the topology file.
Integration with FRRouting (PTM to FRRouting notification).
Client management: ptmd creates an abstract named socket /var/run/ptmd.socket on startup. Other applications can connect to this socket to receive notifications and send commands.
Event notifications: see Scripts below.
User configuration via a topology.dot file; see below.
Configure PTM
ptmd verifies the physical network topology against a DOT-specified network graph file, /etc/ptm.d/topology.dot.
At startup, ptmd connects to lldpd, the LLDP daemon, over a Unix socket and retrieves the neighbor name and port information. It then compares the retrieved port information with the configuration information that it read from the topology file. If there is a match, it is a PASS, else it is a FAIL.
PTM performs its LLDP neighbor check using the PortID ifname TLV information.
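The comparison described above can be illustrated with a short shell sketch. It is not how ptmd is implemented; it only mimics the pass/fail decision for one port. The expected_neighbor and check_link functions, the host names, and the sample topology file are all hypothetical, and the sketch assumes one link per line in the form "host":"port" -- "host":"port".

```shell
#!/bin/sh
# Sketch of the pass/fail idea only; ptmd itself works differently internally.

# Print the neighbor ("host:port") that the topology file prescribes for host $2, port $3.
# Assumes one link per line: "hostA":"portA" -- "hostB":"portB";
expected_neighbor() {
    tr -d '";' < "$1" | awk -v self="$2:$3" '
        $2 == "--" && $1 == self { print $3 }
        $2 == "--" && $3 == self { print $1 }
    '
}

# Compare the prescribed neighbor with the one actually seen ($4, "host:port").
check_link() {
    expected=$(expected_neighbor "$1" "$2" "$3")
    if [ -n "$expected" ] && [ "$expected" = "$4" ]; then
        echo pass
    else
        echo fail
    fi
}

# Hypothetical topology file for illustration.
topo=$(mktemp)
cat > "$topo" <<'EOF'
graph G {
    "leaf1":"swp1" -- "spine1":"swp1";
    "leaf1":"swp2" -- "spine2":"swp1";
}
EOF

check_link "$topo" leaf1 swp1 spine1:swp1   # prescribed and actual neighbor agree
check_link "$topo" leaf1 swp2 spine1:swp1   # actual neighbor differs from the plan
rm -f "$topo"
```

The first call prints pass and the second prints fail, because the plan expects spine2:swp1 on leaf1 swp2.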
ptmd Scripts
ptmd executes the scripts /etc/ptm.d/if-topo-pass and /etc/ptm.d/if-topo-fail for each interface that goes through a change: it runs if-topo-pass when an LLDP or BFD check passes and if-topo-fail when the check fails. The scripts receive an argument string that is the result of the ptmctl command, described in the ptmd commands below.
Modify these default scripts as needed.
Configuration Parameters
You can configure ptmd parameters in the topology file. The parameters are classified as host-only, global, per-port/node and templates.
Host-only Parameters
Host-only parameters apply to the entire host on which PTM is running. You can include the hostnametype host-only parameter, which specifies if PTM uses only the host name (hostname) or the fully qualified domain name (fqdn) while looking for the self-node in the graph file. For example, in the graph file below PTM ignores the FQDN and only looks for switch04 because that is the host name of the switch on which it is running:
Always wrap the hostname in double quotes (for example, "www.example.com") to prevent ptmd from failing.
To avoid errors when starting the ptmd process, make sure that /etc/hosts and /etc/hostname both reflect the hostname you are using in the topology.dot file.
Global parameters apply to every port listed in the topology file. There are two global parameters: LLDP and BFD. LLDP is enabled by default; if no keyword is present, default values are used for all ports. However, BFD is disabled if no keyword is present, unless there is a per-port override configured. For example:
Templates provide flexibility in choosing different parameter combinations and applying them to a given port. A template instructs ptmd to reference a named parameter string instead of a default one. There are two parameter strings ptmd supports:
bfdtmpl specifies a custom parameter tuple for BFD.
lldptmpl specifies a custom parameter tuple for LLDP.
match_type, which defaults to the interface name (ifname), but can accept a port description (portdescr) instead if you want lldpd to compare the topology against the port description instead of the interface name. You can set this parameter globally or at the per-port level.
match_hostname, which defaults to the host name (hostname), but enables PTM to match the topology using the fully qualified domain name (fqdn) supplied by LLDP.
The following is an example of a topology with LLDP applied at the port level:
When you specify match_hostname=fqdn, ptmd matches the entire FQDN (cumulus-2.domain.com in the example below). If you do not specify anything for match_hostname, ptmd matches based on the hostname only (cumulus-3 below) and ignores the rest of the domain name:
BFD provides low overhead and rapid detection of failures in the paths between two network devices. It provides a unified mechanism for link detection over all media and protocol layers. Use BFD to detect failures for IPv4 and IPv6 single or multihop paths between any two network devices, including unidirectional path failure detection. For information about configuring BFD using PTM, see BFD.
Check Link State with FRRouting
The FRRouting routing suite enables additional checks to ensure that routing adjacencies are formed only on links that have connectivity matching the specification, as determined by ptmd.
You only need to do this to check link state; you do not need to enable PTM to determine BFD status.
When the global ptm-enable option is enabled, every interface has an implied ptm-enable line in the configuration stanza in the interfaces file.
To enable the global ptm-enable option, run the following FRRouting command:
To disable the checks, delete the ptm-enable parameter from the interface:
cumulus@switch:~$ net del interface swp51 ptm-enable
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
cumulus@switch:~$ sudo vtysh
switch# conf t
switch(config)# interface swp51
switch(config-if)# no ptm-enable
switch(config-if)# end
switch# write memory
switch# exit
cumulus@switch:~$
If you need to reenable PTM for that interface:
cumulus@switch:~$ net add interface swp51 ptm-enable
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
cumulus@switch:~$ sudo vtysh
switch# conf t
switch(config)# interface swp51
switch(config-if)# ptm-enable
switch(config-if)# end
switch# write memory
switch# exit
cumulus@switch:~$
With PTM enabled on an interface, the zebra daemon connects to ptmd over a Unix socket. Any time there is a change of status for an interface, ptmd sends notifications to zebra. Zebra maintains a ptm-status flag per interface and evaluates routing adjacency based on this flag. To check the per-interface ptm-status:
cumulus@switch:~$ net show interface swp1
Interface swp1 is up, line protocol is up
Link ups: 0 last: (never)
Link downs: 0 last: (never)
PTM status: disabled
vrf: Default-IP-Routing-Table
index 3 metric 0 mtu 1550
flags: <UP,BROADCAST,RUNNING,MULTICAST>
HWaddr: c4:54:44:bd:01:41
switch# show interface swp1
Interface swp1 is up, line protocol is up
Link ups: 0 last: (never)
Link downs: 0 last: (never)
PTM status: disabled
vrf: Default-IP-Routing-Table
index 3 metric 0 mtu 1550
flags: <UP,BROADCAST,RUNNING,MULTICAST>
HWaddr: c4:54:44:bd:01:41
...
ptmd Service Commands
PTM sends client notifications in CSV format.
To start or restart the ptmd service, run the following command. The topology.dot file must be present for the service to start.
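cumulus@switch:~$ sudo systemctl restart ptmd.service
To check the current state of the ptmd service:
cumulus@switch:~$ sudo systemctl status ptmd.service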
ptmctl Commands
ptmctl is a client of ptmd that retrieves the operational state of the ports configured on the switch and information about BFD sessions from ptmd. ptmctl parses the CSV notifications sent by ptmd. See man ptmctl for more information.
ptmctl Examples
The examples below contain the following keywords in the output of the cbl status column:
pass: The interface is defined in the topology file, LLDP information is received on the interface, and the LLDP information for the interface matches the information in the topology file.
fail: The interface is defined in the topology file, LLDP information is received on the interface, and the LLDP information for the interface does not match the information in the topology file.
N/A: The interface is defined in the topology file, but no LLDP information is received on the interface. The interface might be down or disconnected, or the neighbor is not sending LLDP packets. The N/A and fail status might indicate a wiring problem to investigate. The N/A status is not shown when you use the -l option with ptmctl; only interfaces that are receiving LLDP information are shown.
For basic output, use ptmctl without any options:
cumulus@switch:~$ sudo ptmctl
-------------------------------------------------------------
port cbl BFD BFD BFD BFD
status status peer local type
-------------------------------------------------------------
swp1 pass pass 11.0.0.2 N/A singlehop
swp2 pass N/A N/A N/A N/A
swp3 pass N/A N/A N/A N/A
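To spot cabling problems quickly, you can filter the basic output for ports whose cbl status is not pass. The sketch below runs awk over sample text shaped like the output above; the ports_not_passing function is a hypothetical helper, the fail and N/A rows are invented for illustration, and the filter assumes port names start with swp.

```shell
#!/bin/sh
# Sketch: print ports whose cbl status (second column) is anything other than "pass".
# Assumes port names start with "swp", which skips the header and separator lines.
ports_not_passing() {
    awk '$1 ~ /^swp[0-9]/ && $2 != "pass" { print $1, $2 }'
}

# Sample text modeled on the basic ptmctl output; the fail and N/A rows are made up.
ports_not_passing <<'EOF'
-------------------------------------------------------------
port  cbl     BFD     BFD       BFD    BFD
      status  status  peer      local  type
-------------------------------------------------------------
swp1  pass    pass    11.0.0.2  N/A    singlehop
swp2  fail    N/A     N/A       N/A    N/A
swp3  N/A     N/A     N/A       N/A    N/A
EOF

# On a switch (not run here):
#   sudo ptmctl | ports_not_passing
```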
For more detailed output, use the -d option:
cumulus@switch:~$ sudo ptmctl -d
--------------------------------------------------------------------------------------
port cbl exp act sysname portID portDescr match last BFD BFD
status nbr nbr on upd Type state
--------------------------------------------------------------------------------------
swp45 pass h1:swp1 h1:swp1 h1 swp1 swp1 IfName 5m: 5s N/A N/A
swp46 fail h2:swp1 h2:swp1 h2 swp1 swp1 IfName 5m: 5s N/A N/A
#continuation of the output
-------------------------------------------------------------------------------------------------
BFD BFD det_mult tx_timeout rx_timeout echo_tx_timeout echo_rx_timeout max_hop_cnt
peer DownDiag
-------------------------------------------------------------------------------------------------
N/A N/A N/A N/A N/A N/A N/A N/A
N/A N/A N/A N/A N/A N/A N/A N/A
To return information on active BFD sessions ptmd is tracking, use the -b option:
cumulus@switch:~$ sudo ptmctl -b
----------------------------------------------------------
port peer state local type diag
----------------------------------------------------------
swp1 11.0.0.2 Up N/A singlehop N/A
N/A 12.12.12.1 Up 12.12.12.4 multihop N/A
To return LLDP information, use the -l option. It returns only the active neighbors currently being tracked by ptmd.
cumulus@switch:~$ sudo ptmctl -l
---------------------------------------------
port sysname portID port match last
descr on upd
---------------------------------------------
swp45 h1 swp1 swp1 IfName 5m:59s
swp46 h2 swp1 swp1 IfName 5m:59s
To return detailed information on active BFD sessions ptmd is tracking, use the -b and -d options (results are for an IPv6-connected peer):
cumulus@switch:~$ sudo ptmctl -b -d
----------------------------------------------------------------------------------------
port peer state local type diag det tx_timeout rx_timeout
mult
----------------------------------------------------------------------------------------
swp1 fe80::202:ff:fe00:1 Up N/A singlehop N/A 3 300 900
swp1 3101:abc:bcad::2 Up N/A singlehop N/A 3 300 900
#continuation of output
---------------------------------------------------------------------
echo echo max rx_ctrl tx_ctrl rx_echo tx_echo
tx_timeout rx_timeout hop_cnt
---------------------------------------------------------------------
0 0 N/A 187172 185986 0 0
0 0 N/A 501 533 0 0
ptmctl Error Outputs
If there are errors in the topology file or there is no session, PTM returns appropriate outputs. Typical error strings are:
Topology file error [/etc/ptm.d/topology.dot] [cannot find node cumulus] -
please check /var/log/ptmd.log for more info
Topology file error [/etc/ptm.d/topology.dot] [cannot open file (errno 2)] -
please check /var/log/ptmd.log for more info
No Hostname/MgmtIP found [Check LLDPD daemon status] -
please check /var/log/ptmd.log for more info
No BFD sessions . Check connections
No LLDP ports detected. Check connections
Unsupported command
For example:
cumulus@switch:~$ sudo ptmctl
-------------------------------------------------------------------------
cmd error
-------------------------------------------------------------------------
get-status Topology file error [/etc/ptm.d/topology.dot]
[cannot open file (errno 2)] - please check /var/log/ptmd.log
for more info
If you encounter errors with the topology.dot file, you can use dot (included in the Graphviz package) to validate the syntax of the topology file.
Open the topology file with Graphviz to ensure that it is readable and that the file format is correct.
If you edit the topology.dot file from a Windows system, be sure to double-check the file formatting; there might be extra characters that keep the graph from working correctly.
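One common source of such extra characters is Windows-style (CRLF) line endings; treating them as the culprit is an assumption, because the text above does not name the characters. The following sketch detects carriage returns in a file and writes a cleaned copy. The has_windows_line_endings function and the temporary file names are hypothetical.

```shell
#!/bin/sh
# Sketch: detect and strip carriage returns (CRLF line endings) in a file.
has_windows_line_endings() {
    # grep -q exits 0 when at least one line contains a carriage return
    grep -q "$(printf '\r')" "$1"
}

# Demonstration on a temporary file with CRLF line endings.
tmp=$(mktemp)
printf 'graph G {\r\n}\r\n' > "$tmp"

if has_windows_line_endings "$tmp"; then
    echo "carriage returns found, writing a cleaned copy"
    tr -d '\r' < "$tmp" > "$tmp.unix"
fi

rm -f "$tmp" "$tmp.unix"
```

After cleaning the file, you can still validate its syntax with dot as described above.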
Basic Topology Example
This is a basic example DOT file and its corresponding topology diagram. Use the same topology.dot file on all switches and do not split the file per device; this allows for easy automation by pushing/pulling the same exact file on each device.
ptmd in Incorrect Failure State while Zebra Interface Is Enabled
When ptmd is incorrectly in a failure state and the Zebra interface is enabled, BGP sessions on the physical interface (PIF) do not establish routes, but the subinterface on top of it does establish routes.
If the subinterface is configured on the physical interface and the physical interface is incorrectly marked as being in a PTM FAIL state, routes on the physical interface are not processed in FRR, but the subinterface is working.
Cannot Use Commas in PortDescr
If an LLDP neighbor advertises a PortDescr that contains commas, ptmctl -d splits the string on the commas and misplaces its components in other columns. Do not use commas in your port descriptions.
Port security is a layer 2 traffic control feature that enables you to manage network access from end-users. Use port security to:
Limit port access to specific MAC addresses so that the port does not forward ingress traffic from source addresses that are not defined.
Limit port access to only the first learned MAC address on the port (sticky MAC) so that the device with that MAC address has full bandwidth. You can provide a timeout so that the MAC address on that port no longer has access after a specified time.
Limit port access to a specific number of MAC addresses.
You can specify what action to take when there is a port security violation (drop packets or put the port into ADMIN down state) and add a timeout for the action to take effect.
Layer 2 interfaces in trunk or access mode are currently supported. However, interfaces in a bond are not supported.
Configure MAC Address Options
To limit port access to a specific MAC address, run the following commands.
The example commands configure swp1 to allow access to MAC address 00:02:00:00:00:05:
cumulus@switch:~$ net add interface swp1 port-security allowed-mac 00:02:00:00:00:05
You can specify only one MAC address with the NCLU command. To specify multiple MAC addresses, set the interface.<port>.port_security.static_mac parameter in the /etc/cumulus/switchd.d/port_security.conf file. See Configure Port Security Manually below.
To enable sticky MAC on a port, where the first learned MAC address on the port is the only MAC address allowed, run the following commands.
You can add a timeout value so that after the time specified, the MAC address ages out and no longer has access to the port. The default aging timeout value is 1800 seconds. You can specify a value between 0 and 3600 seconds.
The example commands enable sticky MAC on interface swp1, set the timeout value to 2000 seconds, and enable aging.
cumulus@switch:~$ net add interface swp1 port-security sticky-mac
cumulus@switch:~$ net add interface swp1 port-security sticky-mac timeout 2000
cumulus@switch:~$ net add interface swp1 port-security sticky-mac aging
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
To limit the number of MAC addresses that are allowed to access a port, run the following commands. You can specify a number between 0 and 512. The default is 32.
The example commands configure swp1 to limit access to 40 MAC addresses:
cumulus@switch:~$ net add interface swp1 port-security mac-limit 40
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Configure Security Violation Actions
You can configure the action you want to take when there is a security violation on a port:
shutdown puts a port into ADMIN down state.
restrict drops packets. When packets are dropped, Cumulus Linux sends a log message.
You can also set a timeout value between 0 and 3600 seconds for the action to take effect. The default is 1800 seconds.
The following example commands put swp1 into ADMIN down state when there is a security violation and set the timeout value to 3600 seconds:
cumulus@switch:~$ net add interface swp1 port-security violation shutdown
cumulus@switch:~$ net add interface swp1 port-security violation timeout 3600
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Enable Port Security Settings
After you configure the port security settings to suit your needs, you can enable security on a port with the following commands:
cumulus@switch:~$ net add interface swp1 port-security
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
To disable port security on a port, run the net del interface <interface> port-security command.
Configure Port Security Manually
You can edit the /etc/cumulus/switchd.d/port_security.conf file manually to configure port security instead of running the NCLU commands shown above. This procedure is useful if you use configuration scripts.
Add the configuration settings you want to use to the /etc/cumulus/switchd.d/port_security.conf file, then restart switchd to apply the changes.
interface.<port>.port_security.enable: 1 enables security on the port. 0 disables security on the port.
interface.<port>.port_security.mac_limit: The maximum number of MAC addresses allowed to access the port. You can specify a number between 0 and 512. The default is 32.
interface.<port>.port_security.static_mac: The specific MAC addresses allowed to access the port. You can specify multiple MAC addresses. Separate each MAC address with a space.
interface.<port>.port_security.sticky_mac: 1 enables sticky MAC, where the first learned MAC address on the port is the only MAC address allowed. 0 disables sticky MAC.
interface.<port>.port_security.sticky_timeout: The time period after which the first learned MAC address ages out and no longer has access to the port. The default aging timeout value is 30 minutes. You can specify a value between 0 and 60 minutes.
interface.<port>.port_security.sticky_aging: 1 enables sticky MAC aging. 0 disables sticky MAC aging.
interface.<port>.port_security.violation_mode: The violation mode: 0 (shutdown) puts a port into ADMIN down state. 1 (restrict) drops packets.
interface.<port>.port_security.violation_timeout: The number of seconds after which the violation mode times out. You can specify a value between 0 and 3600 seconds. The default value is 1800 seconds.
An example /etc/cumulus/switchd.d/port_security.conf configuration file is shown here:
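As a hedged illustration, the following shell sketch prints a configuration fragment for one port using the setting names from the list above. The "key = value" syntax is an assumption, the values are illustrative, and the port_security_conf function is a hypothetical helper; sticky_timeout is left out so that it keeps its default.

```shell
#!/bin/sh
# Sketch: print a port security fragment for one port ($1).
# The "key = value" syntax is assumed; verify it against the file on your switch.
port_security_conf() {
    cat <<EOF
interface.$1.port_security.enable = 1
interface.$1.port_security.mac_limit = 40
interface.$1.port_security.static_mac = 00:02:00:00:00:05 00:02:00:00:00:06
interface.$1.port_security.sticky_mac = 1
interface.$1.port_security.sticky_aging = 1
interface.$1.port_security.violation_mode = 0
interface.$1.port_security.violation_timeout = 3600
EOF
}

port_security_conf swp1
```

The sketch writes to standard output only; it does not modify /etc/cumulus/switchd.d/port_security.conf or restart switchd.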
To show port security settings for all ports:
cumulus@switch:~$ net show port-security
Interface Port security MAC limit Sticky MAC Sticky MAC aging Sticky MAC timeout Violation mode Timeout
--------- ------------- --------- ---------- ---------------- ------------------ -------------- -------
swp1 ENABLED 40 ENABLED ENABLED 2000 Shutdown 3600
swp2 Disabled NA NA NA NA Restrict 1800
swp3 Disabled NA NA NA NA Restrict 1800
swp4 Disabled NA NA NA NA Restrict 1800
swp5 Disabled NA NA NA NA Restrict 1800
swp6 Disabled NA NA NA NA Restrict 1800
...
To show port security settings for a specific port:
cumulus@switch:~$ net show port-security swp1
Interface swp1
Port security Enabled
Mac limit 40
Sticky mac ENABLED
Sticky MAC aging Enabled
Sticky MAC timeout 1440
Violation mode Shutdown
Violation timeout 3600
Mac addresses
00:02:00:00:00:05
00:02:00:00:00:06
The lldpd daemon implements the IEEE 802.1AB (Link Layer Discovery Protocol, or LLDP) standard. LLDP shows you which ports are neighbors of a given port. By default, lldpd runs as a daemon and starts at system boot. lldpd command line arguments are placed in /etc/default/lldpd. All lldpd configuration options are saved in /etc/lldpd.conf or under /etc/lldpd.d/.
For more details on the command line arguments and configuration options, see man lldpd(8).
lldpd supports CDP (Cisco Discovery Protocol, v1 and v2) and logs by default into /var/log/daemon.log with an lldpd prefix.
You can use the lldpcli CLI tool to query the lldpd daemon for neighbors, statistics, and other running configuration information. See man lldpcli(8) for details.
Configure LLDP
You configure lldpd settings in /etc/lldpd.conf or /etc/lldpd.d/.
The last line in the example above shows that LLDP is disabled on eth0. To disable LLDP on a single port, edit the /etc/default/lldpd file. This file specifies the default options to present to the lldpd service when it starts. The following example uses the -I option to disable LLDP on swp43:
cumulus@switch:~$ sudo nano /etc/default/lldpd
# Add "-x" to DAEMON_ARGS to start SNMP subagent
# Enable CDP by default
DAEMON_ARGS="-c -I *,!swp43"
lldpd has two timers defined by the tx-interval setting that affect each switch port:
The first timer catches any port-related changes.
The second is a system-based refresh timer on each port that looks for other changes like hostname. This timer uses the tx-interval value multiplied by 20.
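The relationship between the two timers is simple arithmetic, sketched below. The system_refresh_seconds function is a hypothetical helper, and the tx-interval of 30 seconds is only an illustrative value.

```shell
#!/bin/sh
# Sketch: the system-based refresh timer is the tx-interval multiplied by 20.
system_refresh_seconds() {
    echo $(( $1 * 20 ))
}

tx_interval=30   # seconds; illustrative value
echo "tx-interval: ${tx_interval}s, system refresh timer: $(system_refresh_seconds "$tx_interval")s"
```

With a tx-interval of 30 seconds, the system refresh timer works out to 600 seconds.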
lldpd logs to /var/log/daemon.log with the lldpd prefix:
cumulus@switch:~$ sudo tail -f /var/log/daemon.log | grep lldp
Aug 7 17:26:17 switch lldpd[1712]: unable to get system name
Aug 7 17:26:17 switch lldpd[1712]: unable to get system name
Aug 7 17:26:17 switch lldpcli[1711]: lldpd should resume operations
Aug 7 17:26:32 switch lldpd[1805]: NET-SNMP version 5.4.3 AgentX subagent connected
Example lldpcli Commands
To show all neighbors on all ports and interfaces:
cumulus@switch:~$ sudo lldpcli show neighbors
-------------------------------------------------------------------------------
LLDP neighbors:
-------------------------------------------------------------------------------
Interface: eth0, via: LLDP, RID: 1, Time: 0 day, 17:38:08
Chassis:
ChassisID: mac 08:9e:01:e9:66:5a
SysName: PIONEERMS22
SysDescr: Cumulus Linux version 4.1.0 running on quanta lb9
MgmtIP: 192.168.0.22
Capability: Bridge, on
Capability: Router, on
Port:
PortID: ifname swp47
PortDescr: swp47
-------------------------------------------------------------------------------
Interface: swp1, via: LLDP, RID: 10, Time: 0 day, 17:08:27
Chassis:
ChassisID: mac 00:01:00:00:09:00
SysName: MSP-1
SysDescr: Cumulus Linux version 4.1.0 running on QEMU Standard PC (i440FX + PIIX, 1996)
MgmtIP: 192.0.2.9
MgmtIP: fe80::201:ff:fe00:900
Capability: Bridge, off
Capability: Router, on
Port:
PortID: ifname swp1
PortDescr: swp1
-------------------------------------------------------------------------------
Interface: swp2, via: LLDP, RID: 10, Time: 0 day, 17:08:27
Chassis:
ChassisID: mac 00:01:00:00:09:00
SysName: MSP-1
SysDescr: Cumulus Linux version 4.1.0 running on QEMU Standard PC (i440FX + PIIX, 1996)
MgmtIP: 192.0.2.9
MgmtIP: fe80::201:ff:fe00:900
Capability: Bridge, off
Capability: Router, on
Port:
PortID: ifname swp2
PortDescr: swp2
-------------------------------------------------------------------------------
Interface: swp3, via: LLDP, RID: 11, Time: 0 day, 17:08:27
Chassis:
ChassisID: mac 00:01:00:00:0a:00
SysName: MSP-2
SysDescr: Cumulus Linux version 4.1.0 running on QEMU Standard PC (i440FX + PIIX, 1996)
MgmtIP: 192.0.2.10
MgmtIP: fe80::201:ff:fe00:a00
Capability: Bridge, off
Capability: Router, on
Port:
PortID: ifname swp1
PortDescr: swp1
-------------------------------------------------------------------------------
Interface: swp4, via: LLDP, RID: 11, Time: 0 day, 17:08:27
Chassis:
ChassisID: mac 00:01:00:00:0a:00
SysName: MSP-2
SysDescr: Cumulus Linux version 4.1.0 running on QEMU Standard PC (i440FX + PIIX, 1996)
MgmtIP: 192.0.2.10
MgmtIP: fe80::201:ff:fe00:a00
Capability: Bridge, off
Capability: Router, on
Port:
PortID: ifname swp2
PortDescr: swp2
-------------------------------------------------------------------------------
Interface: swp49s1, via: LLDP, RID: 9, Time: 0 day, 16:55:00
Chassis:
ChassisID: mac 00:01:00:00:0c:00
SysName: TORC-1-2
SysDescr: Cumulus Linux version 4.1.0 running on QEMU Standard PC (i440FX + PIIX, 1996)
MgmtIP: 192.0.2.12
MgmtIP: fe80::201:ff:fe00:c00
Capability: Bridge, on
Capability: Router, on
Port:
PortID: ifname swp6
PortDescr: swp6
-------------------------------------------------------------------------------
Interface: swp49s0, via: LLDP, RID: 9, Time: 0 day, 16:55:00
Chassis:
ChassisID: mac 00:01:00:00:0c:00
SysName: TORC-1-2
SysDescr: Cumulus Linux version 4.1.0 running on QEMU Standard PC (i440FX + PIIX, 1996)
MgmtIP: 192.0.2.12
MgmtIP: fe80::201:ff:fe00:c00
Capability: Bridge, on
Capability: Router, on
Port:
PortID: ifname swp5
PortDescr: swp5
-------------------------------------------------------------------------------
cumulus@switch:~$ sudo lldpcli show statistics summary
---------------------------------------------------------------------
LLDP Global statistics:
---------------------------------------------------------------------
Summary of stats:
Transmitted: 648186
Received: 437557
Discarded: 0
Unrecognized: 0
Ageout: 10
Inserted: 38
Deleted: 10
To show the lldpd running configuration:
cumulus@switch:~$ sudo lldpcli show running-configuration
--------------------------------------------------------------------
Global configuration:
--------------------------------------------------------------------
Configuration:
Transmit delay: 30
Transmit hold: 4
Receive mode: no
Pattern for management addresses: (none)
Interface pattern: (none)
Interface pattern blacklist: (none)
Interface pattern for chassis ID: (none)
Override description with: (none)
Override platform with: Linux
Override system name with: (none)
Advertise version: yes
Update interface descriptions: no
Promiscuous mode on managed interfaces: no
Disable LLDP-MED inventory: yes
LLDP-MED fast start mechanism: yes
LLDP-MED fast start interval: 1
Source MAC for LLDP frames on bond slaves: local
Portid TLV Subtype for lldp frames: ifname
--------------------------------------------------------------------
Runtime Configuration (Advanced)
A runtime configuration does not persist when you reboot the switch; all changes are lost.
To configure active interfaces:
cumulus@switch:~$ sudo lldpcli configure system interface pattern "swp*"
To configure inactive interfaces:
cumulus@switch:~$ sudo lldpcli configure system interface pattern '*,!eth0,swp*'
The active interface list always overrides the inactive interface list.
To reset any interface list to none:
cumulus@switch:~$ sudo lldpcli configure system interface pattern ""
Enable the SNMP Subagent in LLDP
LLDP does not enable the SNMP subagent by default. You need to edit /etc/default/lldpd and enable the -x option.
cumulus@switch:~$ sudo nano /etc/default/lldpd
# Add "-x" to DAEMON_ARGS to start SNMP subagent
# Enable CDP by default
DAEMON_ARGS="-c -x"
Change CDP Settings
Cumulus Linux provides support for CDP so that the switch can advertise information about itself to Cisco routers that do not support LLDP. By default, the Cumulus Linux switch sends CDP packets only if the peer sends CDP packets. You can change this setting by replacing -c in the /etc/default/lldpd file with one of the following options:
Option    Description
-cc       The Cumulus Linux switch sends CDPv1 packets even when there is no detected CDP peer.
-ccc      The Cumulus Linux switch sends CDPv2 packets even when there is no detected CDP peer.
-cccc     The Cumulus Linux switch disables CDPv1 and enables CDPv2.
-ccccc    The Cumulus Linux switch disables CDPv1 and forces CDPv2.
The following example changes the CDP setting to -ccc so that the switch sends CDPv2 packets even when there is no detected CDP peer:
You must restart the lldpd service for the changes to take effect.
cumulus@switch:~$ sudo systemctl restart lldpd
Considerations
Annex E (and hence Annex D) of IEEE 802.1AB (LLDP) is not supported.
If you configure both an eth0 IP address and a loopback IP address on the switch, LLDP advertises the loopback IP address as the management IP address. In this case, the Cumulus Linux switch behaves more like a typical Linux host than a networking appliance.
In Cumulus Linux, a voice VLAN is a VLAN dedicated to voice traffic on a switch port. Voice VLAN is part of a trunk port with two VLANs that comprises either of the following:
Native VLAN, which carries both data and voice traffic.
Voice VLAN, which carries the voice traffic, and a data or native VLAN, which carries the data traffic in a trunk port.
The voice traffic is an 802.1q-tagged packet with a VLAN ID (that might or might not be 0) and an 802.1p (3-bit layer 2 COS) with a specific value (typically 5 is assigned for voice traffic).
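As a rough illustration of the tag described above, the 4-byte 802.1Q tag consists of the 0x8100 tag protocol identifier followed by a 16-bit field that packs the 3-bit 802.1p priority, a 1-bit drop-eligible flag, and the 12-bit VLAN ID. The Python sketch below is not part of Cumulus Linux; the function names and the VLAN ID 200 used in the usage note are hypothetical.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that identifies an 802.1Q tag


def build_vlan_tag(vlan_id, priority=5, drop_eligible=False):
    """Return the 4-byte 802.1Q tag for a VLAN ID and an 802.1p priority."""
    if not 0 <= vlan_id <= 4095:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("802.1p priority must fit in 3 bits")
    # Priority occupies the top 3 bits, the drop-eligible flag the next bit,
    # and the VLAN ID the low 12 bits.
    tci = (priority << 13) | (int(drop_eligible) << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_vlan_tag(tag):
    """Split a 4-byte 802.1Q tag back into its fields."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return {
        "priority": tci >> 13,
        "drop_eligible": bool((tci >> 12) & 1),
        "vlan_id": tci & 0x0FFF,
    }
```

For example, build_vlan_tag(200, priority=5) produces the bytes 81 00 a0 c8, and build_vlan_tag(0, priority=5) produces a priority-tagged frame header with VLAN ID 0.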
To capture LLDP information, check syslog or use tcpdump on an interface.
Considerations
A static voice VLAN configuration overwrites the existing configuration for the switch port.
Removing the bridge-vids or bridge-pvid configuration from a voice VLAN does not remove the VLAN from the bridge.
Configuring voice VLAN with NCLU does not configure lldpd in Cumulus Linux; LLDP-MED does not provide data and voice VLAN information. You can configure LLDP-MED for each interface in a new file in /etc/lldpd.d. In the following example, the file is called /etc/lldpd.d/voice_vlan.conf:
You can also use the lldpcli command to configure an LLDP-MED network policy. However, lldpcli commands do not persist across switch reboots.
Ethernet Bridging - VLANs
Ethernet bridges enable hosts to communicate through layer 2 by connecting all of the physical and logical interfaces in the system into a single layer 2 domain. The bridge is a logical interface with a MAC address and an MTU (maximum transmission unit). The bridge MTU is the minimum MTU among all its members. By default, the bridge’s MAC address is the MAC address of the first port in the bridge-ports list. The bridge can also be assigned an IP address, as discussed below.
Bridge members can be individual physical interfaces, bonds, or logical interfaces that traverse an 802.1Q VLAN trunk.
Use VLAN-aware mode bridges instead of traditional mode bridges. The bridge driver in Cumulus Linux is capable of VLAN filtering, which allows for configurations that are similar to incumbent network devices. For a comparison of traditional and VLAN-aware modes, read this knowledge base article.
Ethernet Bridge Types
The Cumulus Linux bridge driver supports two configuration modes; one that is VLAN-aware and one that follows a more traditional Linux bridge model.
NVIDIA recommends that you use VLAN-aware mode bridges instead of traditional mode bridges. The Cumulus Linux bridge driver is capable of VLAN filtering, which allows for configurations that are similar to incumbent network devices. For a comparison of traditional and VLAN-aware modes, read this knowledge base article.
You can configure both VLAN-aware and traditional mode bridges on the same network in Cumulus Linux; however, you cannot have more than one VLAN-aware bridge on a switch.
The MAC address for a frame is learned when the frame enters the bridge through an interface. The MAC address is recorded in the bridge table and the bridge forwards the frame to its intended destination by looking up the destination MAC address. The MAC entry is then maintained for 1800 seconds (30 minutes). If the frame is seen with the same source MAC address before the MAC entry age is exceeded, the MAC entry age is refreshed; if the MAC entry age is exceeded, the MAC address is deleted from the bridge table.
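The learning and ageing behavior described above can be sketched in a few lines of Python. This is a simplified model, not the bridge driver's implementation: the class and method names are hypothetical, timestamps are plain seconds supplied by the caller, and keying entries by VLAN as well as MAC address is an assumption made for illustration.

```python
AGEING_SECONDS = 1800  # ageing time stated in the text (30 minutes)


class MacTable:
    """Toy model of MAC learning, refresh, and ageing in a bridge table."""

    def __init__(self, ageing=AGEING_SECONDS):
        self.ageing = ageing
        self.entries = {}  # (vlan, mac) -> (port, last_seen)

    def learn(self, vlan, src_mac, port, now):
        # Learning a known source MAC again refreshes its age.
        self.entries[(vlan, src_mac)] = (port, now)

    def expire(self, now):
        # Delete every entry whose age exceeds the ageing time.
        stale = [key for key, (_, seen) in self.entries.items()
                 if now - seen > self.ageing]
        for key in stale:
            del self.entries[key]

    def lookup(self, vlan, dst_mac, now):
        """Return the egress port for a destination MAC, or None if unknown."""
        self.expire(now)
        entry = self.entries.get((vlan, dst_mac))
        return entry[0] if entry else None
```

A lookup that returns None corresponds to an unknown destination; what the bridge does with such a frame is outside the scope of this sketch.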
The following example NCLU command output shows a MAC address table for the bridge.
cumulus@switch:~$ net show bridge macs
VLAN Master Interface MAC TunnelDest State Flags LastSeen
-------- -------- ----------- ----------------- ------------ --------- ------- -----------------
untagged bridge swp1 44:38:39:00:00:03 00:00:15
untagged bridge swp1 44:38:39:00:00:04 permanent 20 days, 01:14:03
The CUE command to show a MAC address table for a bridge is cl show bridge domain <domain-id> mac-table.
bridge fdb Command Output
The Linux bridge fdb command interacts with the forwarding database table (FDB), which the bridge uses to store the MAC addresses it learns and the ports on which it learns those MAC addresses. The bridge fdb show command output contains some specific keywords:
Keyword         Description
self            The FDB entry belongs to the FDB on the device that the entry names in its dev field. For example, this FDB entry belongs to the VXLAN device vx-1000: 00:02:00:00:00:08 dev vx-1000 dst 27.0.0.10 self
master          The FDB entry belongs to the FDB on the device’s master and the FDB entry is pointing to a master’s port. For example, this FDB entry is from the master device named bridge and is pointing to the VXLAN bridge port vx-1001: 02:02:00:00:00:08 dev vx-1001 vlan 1001 master bridge
extern_learn    The FDB entry is managed (or offloaded) by an external control plane, such as the BGP control plane for EVPN.
The following example shows the bridge fdb show command output:
cumulus@switch:~$ bridge fdb show | grep 02:02:00:00:00:08
02:02:00:00:00:08 dev vx-1001 vlan 1001 extern_learn master bridge
02:02:00:00:00:08 dev vx-1001 dst 27.0.0.10 self extern_learn
02:02:00:00:00:08 is the MAC address learned with BGP EVPN.
The first FDB entry points to a Linux bridge entry that points to the VXLAN device vx-1001.
The second FDB entry points to the same entry on the VXLAN device and includes additional remote destination information.
The VXLAN FDB augments the bridge FDB with additional remote destination information.
All FDB entries that point to a VXLAN port appear as two entries. The second entry augments the remote destination information.
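To make the keyword descriptions concrete, the following Python sketch splits bridge fdb show lines like the ones above into fields. It is an illustrative parser only, assuming the line layout shown in this section (a MAC address followed by dev, vlan, dst, and master key/value pairs and bare flag keywords); the function name is hypothetical.

```python
KEYED_FIELDS = ("dev", "vlan", "dst", "master")  # keywords followed by a value


def parse_fdb_line(line):
    """Parse one line of `bridge fdb show` output into a dictionary."""
    tokens = line.split()
    entry = {"mac": tokens[0], "flags": []}
    rest = iter(tokens[1:])
    for token in rest:
        if token in KEYED_FIELDS:
            # The next token is the value for this keyword.
            entry[token] = next(rest, None)
        else:
            # Bare keywords such as self, extern_learn, permanent, static.
            entry["flags"].append(token)
    return entry


lines = [
    "02:02:00:00:00:08 dev vx-1001 vlan 1001 extern_learn master bridge",
    "02:02:00:00:00:08 dev vx-1001 dst 27.0.0.10 self extern_learn",
]
bridge_entry, vxlan_entry = (parse_fdb_line(line) for line in lines)
```

Here bridge_entry carries master set to bridge (the bridge FDB entry), while vxlan_entry carries the self flag and the dst remote destination (the VXLAN device's own FDB entry).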
Considerations
A bridge cannot contain multiple subinterfaces of the same port. Attempting this configuration results in an error.
In environments where both VLAN-aware and traditional bridges are used, if a traditional bridge has a subinterface of a bond that is a normal interface in a VLAN-aware bridge, the bridge is flapped when the traditional bridge’s bond subinterface is brought down.
You cannot enslave a VLAN raw device to a different master interface (you cannot edit the vlan-raw-device setting in the /etc/network/interfaces file). You need to delete the VLAN and recreate it.
Cumulus Linux supports up to 2000 VLANs. This includes the internal interfaces, bridge interfaces, logical interfaces, and so on.
In Cumulus Linux, MAC learning is enabled by default on traditional and VLAN-aware bridge interfaces. Do not disable MAC learning unless you are using EVPN. See Ethernet Virtual Private Network - EVPN.
The VLAN-aware mode in Cumulus Linux implements a configuration model for large-scale layer 2 environments, with one single instance of spanning tree protocol. Each physical bridge member port is configured with the list of allowed VLANs as well as its port VLAN ID, either primary VLAN Identifier (PVID) or native VLAN. MAC address learning, filtering and forwarding are VLAN-aware. This significantly reduces the configuration size, and eliminates the large overhead of managing the port/VLAN instances as subinterfaces, replacing them with lightweight VLAN bitmaps and state updates.
You cannot have more than one VLAN-aware bridge on a switch.
Configure a VLAN-aware Bridge
The example below shows the commands required to create a VLAN-aware bridge configured for STP that contains two switch ports and includes three VLANs: the tagged VLANs 100 and 200 and the untagged (native) VLAN of 1.
cumulus@switch:~$ net add bridge bridge ports swp1-2
cumulus@switch:~$ net add bridge bridge vids 100,200
cumulus@switch:~$ net add bridge bridge pvid 1
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Edit the /etc/network/interfaces file and add the bridge. An example configuration is shown below.
Run the ifreload -a command to load the new configuration:
cumulus@switch:~$ sudo ifreload -a
With CUE, there is a default bridge called br_default, which has no ports assigned to it. The example below configures this default bridge.
cumulus@switch:~$ cl set interface swp1-2 bridge domain br_default
cumulus@switch:~$ cl set bridge domain br_default vlan 100,200
cumulus@switch:~$ cl set bridge domain br_default untagged 1
cumulus@switch:~$ cl config apply
The Primary VLAN Identifier (PVID) of the bridge defaults to 1. You do not have to specify bridge-pvid for a bridge or a port. However, even though specifying it does not change the configuration, it improves readability for other users. The following configurations are identical to each other and the configuration above:
If you specify bridge-vids or bridge-pvid at the bridge level, these configurations are inherited by all ports in the bridge. However, specifying any of these settings for a specific port overrides the setting in the bridge.
Do not try to bridge the management port eth0 with any switch ports (swp0, swp1 and so on). For example, if you create a bridge with eth0 and swp1, it will not work properly and might disrupt access to the management interface.
Reserved VLAN Range
For hardware data plane internal operations, the switching silicon requires VLANs for every physical port, Linux bridge, and layer 3 subinterface. Cumulus Linux reserves a range of VLANs by default; the reserved range is 3600-3999.
You can modify the reserved range if it conflicts with any user-defined VLANs, as long as the new range is a contiguous set of VLANs with IDs anywhere between 2 and 4094, and the minimum size of the range is 150 VLANs.
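Before editing the range, it can help to check a candidate against the constraints listed above. The Python sketch below is a hypothetical helper, not a Cumulus Linux tool; it only encodes the rules from this section (IDs between 2 and 4094, at least 150 VLANs, no overlap with user-defined VLANs).

```python
def validate_reserved_range(start, end, user_vlans=()):
    """Return a list of problems; an empty list means the range is acceptable."""
    problems = []
    if start > end:
        problems.append("start must not exceed end")
    if start < 2 or end > 4094:
        problems.append("range must lie between 2 and 4094")
    if end - start + 1 < 150:
        problems.append("range must contain at least 150 VLANs")
    # A contiguous range conflicts with any user VLAN that falls inside it.
    clashes = sorted(v for v in user_vlans if start <= v <= end)
    if clashes:
        problems.append("conflicts with user-defined VLANs: %s" % clashes)
    return problems
```

For example, validate_reserved_range(3600, 3999) returns an empty list for the default range, while passing user_vlans=[3700] reports a conflict.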
To configure the reserved range:
Edit the /etc/cumulus/switchd.conf file to uncomment the resv_vlan_range line and specify a new range, then restart switchd:
Restarting the switchd service causes all network ports to reset, interrupting network services, in addition to resetting the switch hardware configuration.
VLAN Pruning
By default, the bridge port inherits the bridge VIDs. To configure a port to override the bridge VIDs:
The following example commands configure swp3 to override the bridge VIDs:
cumulus@switch:~$ net add bridge bridge ports swp1-3
cumulus@switch:~$ net add bridge bridge vids 100,200
cumulus@switch:~$ net add bridge bridge pvid 1
cumulus@switch:~$ net add interface swp3 bridge vids 200
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
The above commands create the following code snippets in the /etc/network/interfaces file:
cumulus@switch:~$ cl set interface swp1-3 bridge domain br_default
cumulus@switch:~$ cl set bridge domain br_default vlan 100,200
cumulus@switch:~$ cl set bridge domain br_default untagged 1
cumulus@switch:~$ cl set interface swp3 bridge domain br_default vlan 200
cumulus@switch:~$ cl config apply
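The inheritance rule behind VLAN pruning (a port uses the bridge-level VIDs and PVID unless it sets its own) can be modeled as follows. This is a minimal Python sketch with hypothetical data structures; it mirrors the swp3 example above rather than any Cumulus Linux API.

```python
def effective_port_vlans(bridge, port_overrides):
    """Resolve the PVID and VID list each bridge port ends up with."""
    result = {}
    for port in bridge["ports"]:
        override = port_overrides.get(port, {})
        result[port] = {
            # A port-level setting wins; otherwise inherit from the bridge.
            "pvid": override.get("pvid", bridge.get("pvid", 1)),
            "vids": sorted(override.get("vids", bridge.get("vids", []))),
        }
    return result


bridge = {"ports": ["swp1", "swp2", "swp3"], "vids": [100, 200], "pvid": 1}
overrides = {"swp3": {"vids": [200]}}  # swp3 is pruned to VLAN 200
resolved = effective_port_vlans(bridge, overrides)
```

In this model swp1 and swp2 inherit VLANs 100 and 200 from the bridge, while swp3 carries only VLAN 200; all three keep the inherited PVID of 1.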
Untagged/Access Ports
Access ports ignore all tagged packets. In the configuration below, swp1 and swp2 are configured as access ports, while all untagged traffic goes to VLAN 100:
cumulus@switch:~$ net add bridge bridge ports swp1-2
cumulus@switch:~$ net add bridge bridge vids 100,200
cumulus@switch:~$ net add bridge bridge pvid 1
cumulus@switch:~$ net add interface swp1 bridge access 100
cumulus@switch:~$ net add interface swp2 bridge access 100
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
The above commands create the following code snippets in the /etc/network/interfaces file:
auto bridge
iface bridge
bridge-ports swp1 swp2
bridge-pvid 1
bridge-vids 100 200
bridge-vlan-aware yes
auto swp1
iface swp1
bridge-access 100
auto swp2
iface swp2
bridge-access 100
Edit the /etc/network/interfaces file, then run the ifreload -a command.
cumulus@switch:~$ cl set interface swp1-2 bridge domain br_default
cumulus@switch:~$ cl set bridge domain br_default vlan 100,200
cumulus@switch:~$ cl set bridge domain br_default untagged 1
cumulus@switch:~$ cl set interface swp1 bridge domain br_default access 100
cumulus@switch:~$ cl set interface swp2 bridge domain br_default access 100
cumulus@switch:~$ cl config apply
Drop Untagged Frames
With VLAN-aware bridge mode, you can configure a switch port to drop any untagged frames. To do this, add bridge-allow-untagged no to the switch port (not to the bridge). This leaves the bridge port without a PVID and drops untagged packets.
To configure a switch port to drop untagged frames, run the net add interface swp2 bridge allow-untagged no command. The following example command configures swp2 to drop untagged frames:
cumulus@switch:~$ net add interface swp2 bridge allow-untagged no
When you check VLAN membership for that port, it shows that there is no untagged VLAN.
Edit the /etc/network/interfaces file to add the bridge-allow-untagged no line under the switch port interface stanza, then run the ifreload -a command. The following example configures swp2 to drop untagged frames:
cumulus@switch:~$ sudo nano /etc/network/interfaces
...
auto swp1
iface swp1
auto swp2
iface swp2
bridge-allow-untagged no
auto bridge
iface bridge
bridge-ports swp1 swp2
bridge-pvid 1
bridge-vids 10 100 200
bridge-vlan-aware yes
...
cumulus@switch:~$ sudo ifreload -a
When you check VLAN membership for that port, it shows that there is no untagged VLAN.
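The access port and drop-untagged behaviors described above can be summarized as an ingress classification function. The Python sketch below is a simplified model with hypothetical field names (access, vids, pvid, allow_untagged). Treating the PVID as an allowed tagged VLAN on a trunk is an assumption of this model, not a statement about the bridge driver.

```python
def classify_ingress(port, frame_vlan=None):
    """Return the VLAN a frame is placed in, or None if the frame is dropped.

    frame_vlan is None for an untagged frame, otherwise the 802.1Q VLAN ID.
    """
    access_vlan = port.get("access")
    if access_vlan is not None:
        # Access ports ignore tagged frames; untagged frames join the access VLAN.
        return access_vlan if frame_vlan is None else None

    allow_untagged = port.get("allow_untagged", True)
    pvid = port.get("pvid", 1)
    if frame_vlan is None:
        # With bridge-allow-untagged no, the port has no PVID and drops the frame.
        return pvid if allow_untagged else None

    allowed = set(port.get("vids", ()))
    if allow_untagged:
        allowed.add(pvid)  # assumption: the PVID is also accepted when tagged
    return frame_vlan if frame_vlan in allowed else None


access_port = {"access": 100}
no_untagged_port = {"vids": [10, 100, 200], "allow_untagged": False}
trunk_port = {"vids": [100, 200], "pvid": 1}
```

With these definitions, an untagged frame lands in VLAN 100 on access_port, is dropped on no_untagged_port, and lands in VLAN 1 on trunk_port.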
When configuring the VLAN attributes for the bridge, specify the attributes for each VLAN interface. If you are configuring the switch virtual interface (SVI) for the native VLAN, you must declare the native VLAN and specify its IP address. Specifying the IP address in the bridge stanza itself returns an error.
The following example commands declare native VLAN 100 with IPv4 address 10.1.10.2/24 and IPv6 address 2001:db8::1/32.
cumulus@switch:~$ net add vlan 100 ip address 10.1.10.2/24
cumulus@switch:~$ net add vlan 100 ipv6 address 2001:db8::1/32
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Edit the /etc/network/interfaces file, then run the ifreload -a command. The following example declares native VLAN 100 with IPv4 address 10.1.10.2/24 and IPv6 address 2001:db8::1/32.
cumulus@switch:~$ cl set interface vlan100 ip address 10.1.10.2/24
cumulus@switch:~$ cl set interface vlan100 ip address 2001:db8::1/32
cumulus@switch:~$ cl config apply
In the above configuration, if your switch is configured for multicast routing, you do not need to specify bridge-igmp-querier-src, as there is no need for a static IGMP querier configuration on the switch. Otherwise, the static IGMP querier configuration helps to probe the hosts to refresh their IGMP reports.
When you configure a switch initially, all southbound bridge ports might be down; therefore, by default, the SVI is also down. You can force the SVI to always be up by disabling interface state tracking, which leaves the SVI in the UP state always, even if all member ports are down. Other implementations describe this feature as no autostate. This is beneficial if you want to perform connectivity testing.
To keep the SVI perpetually UP, create a dummy interface, then make the dummy interface a member of the bridge.
Example Configuration
Consider the following configuration, without a dummy interface in the bridge:
With this configuration, when swp3 is down, the SVI is also down:
cumulus@switch:~$ ip link show swp3
5: swp3: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master bridge state DOWN mode DEFAULT group default qlen 1000
link/ether 2c:60:0c:66:b1:7f brd ff:ff:ff:ff:ff:ff
cumulus@switch:~$ ip link show bridge
35: bridge: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
link/ether 2c:60:0c:66:b1:7f brd ff:ff:ff:ff:ff:ff
Now add the dummy interface to your network configuration:
Edit the /etc/network/interfaces file and add the dummy interface stanza before the bridge stanza:
cumulus@switch:~$ sudo nano /etc/network/interfaces
...
auto dummy
iface dummy
link-type dummy
auto bridge
iface bridge
...
Add the dummy interface to the bridge-ports line in the bridge configuration:
Save and exit the file, then reload the configuration:
cumulus@switch:~$ sudo ifreload -a
Now, even when swp3 is down, both the dummy interface and the bridge remain up:
cumulus@switch:~$ ip link show swp3
5: swp3: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master bridge state DOWN mode DEFAULT group default qlen 1000
link/ether 2c:60:0c:66:b1:7f brd ff:ff:ff:ff:ff:ff
cumulus@switch:~$ ip link show dummy
37: dummy: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master bridge state UNKNOWN mode DEFAULT group default
link/ether 66:dc:92:d4:f3:68 brd ff:ff:ff:ff:ff:ff
cumulus@switch:~$ ip link show bridge
35: bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
link/ether 2c:60:0c:66:b1:7f brd ff:ff:ff:ff:ff:ff
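The effect of the dummy interface can be reduced to a one-line rule: in this simplified model, the bridge (and therefore the SVI) has carrier as long as at least one member port is up. The Python sketch below uses a hypothetical function name and plain booleans for port state.

```python
def bridge_has_carrier(member_states):
    """Return True if at least one bridge member port is up."""
    return any(member_states.values())


without_dummy = {"swp3": False}
with_dummy = {"swp3": False, "dummy": True}  # the dummy interface stays up
```

Here bridge_has_carrier(without_dummy) is False, matching the NO-CARRIER output above, and bridge_has_carrier(with_dummy) is True.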
IPv6 Link-local Address Generation
By default, Cumulus Linux automatically generates IPv6 link-local addresses on VLAN interfaces. If you want to use a different mechanism to assign link-local addresses, you can disable this feature. You can disable link-local automatic address generation for both regular IPv6 addresses and address-virtual (macvlan) addresses.
To disable automatic address generation for a regular IPv6 address on a VLAN:
Run the net add vlan <vlan> ipv6-addrgen off command. The following example command disables automatic address generation for a regular IPv6 address on VLAN 100.
cumulus@switch:~$ net add vlan 100 ipv6-addrgen off
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Edit the /etc/network/interfaces file to add the line ipv6-addrgen off to the VLAN stanza, then run the ifreload -a command. The following example disables automatic address generation for a regular IPv6 address on VLAN 100.
cumulus@switch:~$ sudo nano /etc/network/interfaces
...
auto vlan100
iface vlan100
ipv6-addrgen off
vlan-id 100
vlan-raw-device bridge
...
cumulus@switch:~$ sudo ifreload -a
cumulus@switch:~$ cl set
cumulus@switch:~$ cl config apply
To reenable automatic link-local address generation for a VLAN:
Run the net del vlan <vlan> ipv6-addrgen off command. The following example command reenables automatic address generation for a regular IPv6 address on VLAN 100.
cumulus@switch:~$ net del vlan 100 ipv6-addrgen off
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Edit the /etc/network/interfaces file to remove the line ipv6-addrgen off from the VLAN stanza, then run the ifreload -a command.
cumulus@switch:~$ cl set
cumulus@switch:~$ cl config apply
Example Configurations
The following sections provide example VLAN-aware bridge configurations.
Access Ports and Pruned VLANs
The following example configuration contains an access port and switch port that are pruned; they only send and receive traffic tagged to and from a specific set of VLANs declared by the bridge-vids attribute. It also contains other switch ports that send and receive traffic from all the defined VLANs.
...
# ports swp3-swp48 are trunk ports which inherit vlans from the 'bridge'
# ie vlans 310,700,707,712,850,910
#
auto bridge
iface bridge
bridge-ports swp1 swp2 swp3 ... swp51 swp52
bridge-vids 310 700 707 712 850 910
bridge-vlan-aware yes
auto swp1
iface swp1
bridge-access 310
mstpctl-bpduguard yes
mstpctl-portadminedge yes
# The following is a trunk port that is "pruned".
# native vlan is 1, but only .1q tags of 707, 712, 850 are
# sent and received
#
auto swp2
iface swp2
mstpctl-bpduguard yes
mstpctl-portadminedge yes
bridge-vids 707 712 850
# The following port is the trunk uplink and inherits all vlans
# from 'bridge'; bridge assurance is enabled using 'portnetwork' attribute
auto swp49
iface swp49
mstpctl-portnetwork yes
mstpctl-portpathcost 10
# The following port is the trunk uplink and inherits all vlans
# from 'bridge'; bridge assurance is enabled using 'portnetwork' attribute
auto swp50
iface swp50
mstpctl-portnetwork yes
mstpctl-portpathcost 0
...
Large Bond Set Configuration
The configuration below demonstrates a VLAN-aware bridge with a large set of bonds. The bond configurations are generated from a Mako template.
...
#
# vlan-aware bridge with bonds example
#
# uplink1, peerlink and downlink are bond interfaces.
# 'bridge' is a vlan aware bridge with ports uplink1, peerlink
# and downlink (swp2-20).
#
# native vlan is by default 1
#
# 'bridge-vids' attribute is used to declare vlans.
# 'bridge-pvid' attribute is used to specify native vlans if other than 1
# 'bridge-access' attribute is used to declare access port
#
auto lo
iface lo
auto eth0
iface eth0 inet dhcp
# bond interface
auto uplink1
iface uplink1
bond-slaves swp32
bridge-vids 2000-2079
# bond interface
auto peerlink
iface peerlink
bond-slaves swp30 swp31
bridge-vids 2000-2079 4094
# bond interface
auto downlink
iface downlink
bond-slaves swp1
bridge-vids 2000-2079
#
# Declare vlans for all swp ports
# swp2-20 get vlans from 2004 to 2022.
# The below uses mako templates to generate iface sections
# with vlans for swp ports
#
%for port, vlanid in zip(range(2, 21), range(2004, 2023)) :
auto swp${port}
iface swp${port}
bridge-vids ${vlanid}
%endfor
# svi vlan 2000
auto bridge.2000
iface bridge.2000
address 11.100.1.252/24
# l2 attributes for vlan 2000
auto bridge.2000
vlan bridge.2000
bridge-igmp-querier-src 172.16.101.1
#
# vlan-aware bridge
#
auto bridge
iface bridge
bridge-ports uplink1 peerlink downlink swp1 swp2 swp49 swp50
bridge-vlan-aware yes
# svi peerlink vlan
auto peerlink.4094
iface peerlink.4094
address 192.168.10.1/30
broadcast 192.168.10.3
...
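Mako evaluates the expression in the %for line as ordinary Python, where range() excludes its end value. The sketch below, with a hypothetical helper name, shows what such a loop expands to and why covering swp2 through swp20 with VLANs 2004 through 2022 takes range(2, 21) and range(2004, 2023).

```python
def expand_port_stanzas(first_port, last_port, first_vlan):
    """Render one iface stanza per port, pairing ports with consecutive VLANs."""
    ports = range(first_port, last_port + 1)  # + 1 because range() excludes its end
    vlans = range(first_vlan, first_vlan + len(ports))
    lines = []
    for port, vlanid in zip(ports, vlans):
        lines += [
            "auto swp%d" % port,
            "iface swp%d" % port,
            "    bridge-vids %d" % vlanid,
        ]
    return "\n".join(lines)


stanzas = expand_port_stanzas(2, 20, 2004)

# A range whose end value equals the last wanted port stops one port short.
short_pairs = list(zip(range(2, 20), range(2004, 2022)))
```

The last pair in short_pairs is (19, 2021), whereas stanzas ends with the swp20 stanza carrying VLAN 2022.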
VXLANs with VLAN-aware Bridges
Cumulus Linux supports using VXLANs with VLAN-aware bridge configuration. This provides improved scalability, as multiple VXLANs can be added to a single VLAN-aware bridge. A one-to-one association is used between the VXLAN VNI and the VLAN, with the bridge access VLAN definition on the VXLAN and the VLAN membership definition on the local bridge member interfaces.
The configuration example below shows the differences between a VXLAN configured for traditional bridge mode and one configured for VLAN-aware mode. The configurations use head end replication (HER) together with the VLAN-aware bridge to map VLANs to VNIs.
...
auto lo
iface lo inet loopback
address 10.35.0.10/32
auto bridge
iface bridge
bridge-ports uplink
bridge-pvid 1
bridge-vids 1-100
bridge-vlan-aware yes
auto vni-10000
iface vni-10000
alias CUSTOMER X VLAN 10
bridge-access 10
vxlan-id 10000
vxlan-local-tunnelip 10.35.0.10
vxlan-remoteip 10.35.0.34
...
Configure a Static MAC Address Entry
You can add a static MAC address entry to the layer 2 table for an interface within the VLAN-aware bridge by running a command similar to the following:
cumulus@switch:~$ sudo bridge fdb add 12:34:56:12:34:56 dev swp1 vlan 150 master static sticky
cumulus@switch:~$ sudo bridge fdb show
44:38:39:00:00:7c dev swp1 master bridge permanent
12:34:56:12:34:56 dev swp1 vlan 150 sticky master bridge static
44:38:39:00:00:7c dev swp1 self permanent
12:12:12:12:12:12 dev swp1 self permanent
12:34:12:34:12:34 dev swp1 self permanent
12:34:56:12:34:56 dev swp1 self permanent
12:34:12:34:12:34 dev bridge master bridge permanent
44:38:39:00:00:7c dev bridge vlan 500 master bridge permanent
12:12:12:12:12:12 dev bridge master bridge permanent
Considerations
Spanning Tree Protocol (STP)
Because STP is enabled on a per-bridge basis, VLAN-aware mode supports a single instance of STP across all VLANs. A common practice when using a single STP instance for all VLANs is to define every VLAN on every switch in the spanning tree instance.
IGMP snooping and group membership are supported on a per-VLAN basis; however, the IGMP snooping configuration (including enable, disable, and mrouter ports) is defined on a per-bridge port basis.
VLAN Translation
A bridge in VLAN-aware mode cannot have VLAN translation enabled. Only traditional mode bridges can utilize VLAN translation.
Convert Bridges between Supported Modes
You cannot convert traditional mode bridges automatically to and from a VLAN-aware bridge. You must delete the original configuration and bring down all member switch ports before creating a new bridge.
Traditional Bridge Mode
For a traditional Linux bridge, the kernel supports VLANs in the form of VLAN subinterfaces. Enabling bridging on multiple VLANs means configuring a bridge for each VLAN and, for each member port on a bridge, creating one or more VLAN subinterfaces out of that port. This mode poses scalability challenges in terms of configuration size as well as boot time and run time state management, when the number of ports times the number of VLANs becomes large.
Using a VLAN-aware bridge on your switch is recommended. Use traditional mode bridges only if you need to run more than one bridge on the switch or if you need to use PVSTP+.
Configure a Traditional Mode Bridge
The following examples show how to create a simple traditional mode bridge configuration on the switch. The example also shows some optional elements:
You can add an IP address to provide IP access to the bridge interface.
The following example commands configure a traditional mode bridge called my_bridge with IP address 10.10.10.10/24. swp1, swp2, swp3, and swp4 are members of the bridge.
cumulus@switch:~$ net add bridge my_bridge ports swp1-4
cumulus@switch:~$ net add bridge my_bridge ip address 10.10.10.10/24
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Edit the /etc/network/interfaces file, then run the ifreload -a command. The following example configures a traditional mode bridge called my_bridge with IP address 10.10.10.10/24. swp1, swp2, swp3, and swp4 are members of the bridge.
...
auto swp1
iface swp1
auto swp2
iface swp2
auto swp3
iface swp3
auto swp4
iface swp4
auto my_bridge
iface my_bridge
address 10.10.10.10/24
bridge-ports swp1 swp2 swp3 swp4
bridge-vlan-aware no
...
cumulus@switch:~$ sudo ifreload -a
cumulus@switch:~$ cl set
cumulus@switch:~$ cl config apply
The name of the bridge must be:
Compliant with Linux interface naming conventions.
Unique within the switch.
Something other than bridge, as Cumulus Linux reserves that name for a single VLAN-aware bridge.
Do not try to bridge the management port, eth0, with any switch ports (swp0, swp1, and so on). For example, if you create a bridge with eth0 and swp1, it does not work.
Configure Multiple Traditional Mode Bridges
You can configure multiple bridges to logically divide a switch into multiple layer 2 domains. This allows for hosts to communicate with other hosts in the same domain, while separating them from hosts in other domains.
The diagram below shows a multiple bridge configuration, where host-1 and host-2 are connected to bridge-A, while host-3 and host-4 are connected to bridge-B:
host-1 and host-2 can communicate with each other
host-3 and host-4 can communicate with each other
host-1 and host-2 cannot communicate with host-3 and host-4
This example configuration looks like this in the /etc/network/interfaces file:
...
auto bridge-A
iface bridge-A
bridge-ports swp1 swp2
bridge-vlan-aware no
auto bridge-B
iface bridge-B
bridge-ports swp3 swp4
bridge-vlan-aware no
...
Trunks in Traditional Bridge Mode
The standard for trunking is 802.1Q. The 802.1Q specification adds a 4-byte header within the Ethernet frame that identifies the VLAN of which the frame is a member.
802.1Q also identifies an untagged frame as belonging to the native VLAN (most network devices default their native VLAN to 1). The concept of native, non-native, tagged or untagged has generated confusion due to mixed terminology and vendor-specific implementations. In Cumulus Linux:
A trunk port is a switch port configured to send and receive 802.1Q tagged frames.
A switch sending an untagged (bare Ethernet) frame on a trunk port is sending from the native VLAN defined on the trunk port.
A switch sending a tagged frame on a trunk port is sending to the VLAN identified by the 802.1Q tag.
A switch receiving an untagged (bare Ethernet) frame on a trunk port places that frame in the native VLAN defined on the trunk port.
A switch receiving a tagged frame on a trunk port places that frame in the VLAN identified by the 802.1Q tag.
A bridge in traditional mode has no concept of trunks, just tagged or untagged frames. With a trunk of 200 VLANs, there would need to be 199 bridges, each containing a tagged physical interface, and one bridge containing the native untagged VLAN. See the examples below for more information.
The interaction of tagged and untagged frames on the same trunk often leads to undesired and unexpected behavior. A switch that uses VLAN 1 for the native VLAN may send frames to a switch that uses VLAN 2 for the native VLAN, thus merging those two VLANs and their spanning tree state.
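The receive rules above can be sketched in a few lines of Python. This is a simplified model, not how switchd or the ASIC processes frames: it assumes a single 802.1Q tag with the standard TPID 0x8100, and the function name classify_frame and the default native VLAN of 1 are illustrative choices.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def classify_frame(frame: bytes, native_vlan: int = 1) -> int:
    """Return the VLAN an ingress frame is placed in on a trunk port."""
    # Bytes 12-13 follow the destination and source MAC addresses.
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != TPID_8021Q:
        return native_vlan  # untagged frame: native VLAN of the trunk
    (tci,) = struct.unpack_from("!H", frame, 14)
    return tci & 0x0FFF  # the low 12 bits of the TCI carry the VLAN ID


macs = bytes(12)  # placeholder destination and source MAC addresses
tagged = macs + struct.pack("!HH", TPID_8021Q, 100) + struct.pack("!H", 0x0800)
untagged = macs + struct.pack("!H", 0x0800)

print(classify_frame(tagged))    # 100
print(classify_frame(untagged))  # 1
```

If the two ends of a trunk pass different native_vlan values, the same untagged frame lands in different VLANs on each side, which is the merging problem described above.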
Trunk Example
To create the above example:
cumulus@switch:~$ net add bridge br-VLAN100 ports swp1.100,swp2.100
cumulus@switch:~$ net add bridge br-VLAN200 ports swp1.200,swp2.200
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Add the following configuration to the /etc/network/interfaces file:
...
auto br-VLAN100
iface br-VLAN100
bridge-ports swp1.100 swp2.100
auto br-VLAN200
iface br-VLAN200
bridge-ports swp1.200 swp2.200
...
cumulus@switch:~$ cl set
cumulus@switch:~$ cl config apply
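Because a traditional mode bridge is needed for every VLAN on a trunk, the per-VLAN bridge list grows quickly. The following Python sketch derives the bridge names and subinterface members for a set of trunk ports and tagged VLANs. The br-VLAN naming follows the example above; the function trunk_bridges itself is hypothetical.

```python
def trunk_bridges(ports, tagged_vlans):
    """Return {bridge name: member subinterfaces} for each tagged VLAN."""
    return {
        f"br-VLAN{vid}": [f"{port}.{vid}" for port in ports]
        for vid in tagged_vlans
    }


bridges = trunk_bridges(["swp1", "swp2"], [100, 200])
for name, members in bridges.items():
    print(f"auto {name}")
    print(f"iface {name}")
    print(f"    bridge-ports {' '.join(members)}")
    print()
```

Called with 199 tagged VLANs, the same function returns 199 bridges, matching the trunk of 200 VLANs (199 tagged plus one native) described earlier.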
On Broadcom switches, when two VLAN subinterfaces are bridged to each other in a traditional mode bridge, switchd does not assign an internal resource ID to the subinterface, which is expected for each VLAN subinterface. To work around this issue, add a VXLAN on the bridge so that it does not require a real tunnel IP address.
VLAN Tagging
This topic shows two examples of VLAN tagging, one basic and one more advanced. They both demonstrate the streamlined interface configuration from ifupdown2.
VLAN Tagging, a Basic Example
A simple configuration demonstrating VLAN tagging involves two hosts connected to a switch.
host1 connects to swp1 with both untagged frames and with 802.1Q frames tagged for vlan100.
host2 connects to swp2 with 802.1Q frames tagged for vlan120 and vlan130.
To configure the above example, edit the /etc/network/interfaces file and add a configuration like the following:
# Config for host1
auto swp1
iface swp1
auto swp1.100
iface swp1.100
# Config for host2
# swp2 must exist to create the .1Q subinterfaces, but it is not assigned an address
auto swp2
iface swp2
auto swp2.120
iface swp2.120
auto swp2.130
iface swp2.130
VLAN Tagging, an Advanced Example
This example of VLAN tagging is more complex, involving three hosts and two switches, with a number of bridges and a bond connecting them all.
host1 connects to bridge br-untagged with bare Ethernet frames and to bridge br-tag100 with 802.1Q frames tagged for vlan100.
host2 connects to bridge br-tag100 with 802.1Q frames tagged for vlan100 and to bridge br-vlan120 with 802.1Q frames tagged for vlan120.
host3 connects to bridge br-vlan120 with 802.1Q frames tagged for vlan120 and to bridge v130 with 802.1Q frames tagged for vlan130.
bond2 carries tagged and untagged frames in this example.
Although not explicitly designated, the bridge member ports function as 802.1Q access ports and trunk ports. In the example above, comparing Cumulus Linux with a traditional Cisco device:
swp1 is equivalent to a trunk port with untagged and vlan100.
swp2 is equivalent to a trunk port with vlan100 and vlan120.
swp3 is equivalent to a trunk port with vlan120 and vlan130.
bond2 is equivalent to an EtherChannel in trunk mode with untagged, vlan100, vlan120, and vlan130.
Bridges br-untagged, br-tag100, br-vlan120, and v130 are equivalent to SVIs (switched virtual interfaces).
To create the above configuration, edit the /etc/network/interfaces file and add a configuration like the following:
# Config for host1
# swp1 does not need an iface section unless it has a specific setting;
# it is picked up as a dependent of swp1.100.
# swp1 must exist in the system to create the .1Q subinterfaces,
# but it is not assigned an address.
auto swp1.100
iface swp1.100
# Config for host2
# swp2 does not need an iface section unless it has a specific setting;
# it is picked up as a dependent of swp2.100 and swp2.120.
# swp2 must exist in the system to create the .1Q subinterfaces,
# but it is not added to any bridge or assigned an address.
auto swp2.100
iface swp2.100
auto swp2.120
iface swp2.120
# Config for host3
# swp3 does not need an iface section unless it has a specific setting;
# it is picked up as a dependent of swp3.120 and swp3.130.
# swp3 must exist in the system to create the .1Q subinterfaces,
# but it is not added to any bridge or assigned an address.
auto swp3.120
iface swp3.120
auto swp3.130
iface swp3.130
# Configure the bond
auto bond2
iface bond2
bond-slaves glob swp4-7
# configure the bridges
auto br-untagged
iface br-untagged
address 10.0.0.1/24
bridge-ports swp1 bond2
bridge-stp on
auto br-tag100
iface br-tag100
address 10.0.100.1/24
bridge-ports swp1.100 swp2.100 bond2.100
bridge-stp on
auto br-vlan120
iface br-vlan120
address 10.0.120.1/24
bridge-ports swp2.120 swp3.120 bond2.120
bridge-stp on
auto v130
iface v130
address 10.0.130.1/24
bridge-ports swp3.130 bond2.130
bridge-stp on
#
To verify:
cumulus@switch:~$ sudo mstpctl showbridge br-tag100
br-tag100 CIST info
enabled yes
bridge id 8.000.44:38:39:00:32:8B
designated root 8.000.44:38:39:00:32:8B
regional root 8.000.44:38:39:00:32:8B
root port none
path cost 0 internal path cost 0
max age 20 bridge max age 20
forward delay 15 bridge forward delay 15
tx hold count 6 max hops 20
hello time 2 ageing time 300
force protocol version rstp
time since topology change 333040s
topology change count 1
topology change no
topology change port swp2.100
last topology change port None
cumulus@switch:~$ sudo mstpctl showportdetail br-tag100 | grep -B 2 state
br-tag100:bond2.100 CIST info
enabled yes role Designated
port id 8.003 state forwarding
--
br-tag100:swp1.100 CIST info
enabled yes role Designated
port id 8.001 state forwarding
--
br-tag100:swp2.100 CIST info
enabled yes role Designated
port id 8.002 state forwarding
cumulus@switch:~$ cat /proc/net/bonding/bond2
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
Aggregator ID: 3
Number of ports: 4
Actor Key: 33
Partner Key: 33
Partner Mac Address: 44:38:39:00:32:cf
Slave Interface: swp4
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 44:38:39:00:32:8e
Aggregator ID: 3
Slave queue ID: 0
Slave Interface: swp5
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 44:38:39:00:32:8f
Aggregator ID: 3
Slave queue ID: 0
Slave Interface: swp6
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 44:38:39:00:32:90
Aggregator ID: 3
Slave queue ID: 0
Slave Interface: swp7
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 44:38:39:00:32:91
Aggregator ID: 3
Slave queue ID: 0
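When a bond has many members, it can help to reduce this output to a per-slave summary. The following Python sketch parses text in the format shown above into a dictionary per slave. The function name parse_bond_slaves is hypothetical, and the sample is a shortened excerpt in which the swp5 status is changed to down purely for illustration.

```python
def parse_bond_slaves(text):
    """Return {slave name: {field: value}} from /proc/net/bonding/<bond> text."""
    slaves = {}
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key: value" line
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = slaves.setdefault(value, {})
        elif current is not None:
            current[key] = value  # field belongs to the most recent slave
    return slaves


# Shortened excerpt; the "down" status for swp5 is made up for this example.
sample = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up
Slave Interface: swp4
MII Status: up
Speed: 10000 Mbps
Permanent HW addr: 44:38:39:00:32:8e
Aggregator ID: 3
Slave Interface: swp5
MII Status: down
Speed: 10000 Mbps
Permanent HW addr: 44:38:39:00:32:8f
Aggregator ID: 3
"""

slaves = parse_bond_slaves(sample)
down = [name for name, info in slaves.items() if info["MII Status"] != "up"]
print(down)
```

On a switch, the same function could be fed the contents of /proc/net/bonding/bond2 read from the file.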
A single bridge cannot contain multiple subinterfaces of the same port as members. Attempting to apply such a configuration will result in an error:
cumulus@switch:~$ sudo brctl addbr another_bridge
cumulus@switch:~$ sudo brctl addif another_bridge swp9 swp9.100
bridge cannot contain multiple subinterfaces of the same port: swp9, swp9.100
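A configuration generator can catch this condition before the configuration reaches the switch. The sketch below is a hypothetical pre-check in Python; it assumes subinterfaces are named <port>.<vlan>, as in the examples in this section, and it is not the check the kernel itself performs.

```python
from collections import defaultdict


def conflicting_members(members):
    """Return base ports that contribute more than one member to a bridge."""
    by_base = defaultdict(list)
    for member in members:
        base = member.split(".", 1)[0]  # "swp9.100" -> "swp9"
        by_base[base].append(member)
    return {base: names for base, names in by_base.items() if len(names) > 1}


print(conflicting_members(["swp9", "swp9.100"]))
print(conflicting_members(["swp1.100", "swp2.100", "bond2.100"]))
```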
VLAN Translation
By default, Cumulus Linux does not allow VLAN subinterfaces associated with different VLAN IDs to be part of the same bridge. Base interfaces are not explicitly associated with any VLAN IDs and are exempt from this restriction.
In some cases, it may be useful to relax this restriction. For example, two servers might be connected to the switch using VLAN trunks, but the VLAN numbering provisioned on the two servers is not consistent. You can choose to bridge two VLAN subinterfaces with different VLAN IDs from the servers. You do this by enabling the sysctl net.bridge.bridge-allow-multiple-vlans. Packets entering a bridge from a member VLAN subinterface egress another member VLAN subinterface with the VLAN ID translated.
A bridge in VLAN-aware mode cannot have VLAN translation enabled for it; only bridges configured in traditional mode can utilize VLAN translation.
The following example enables the VLAN translation sysctl:
cumulus@switch:~$ sudo sysctl -w net.bridge.bridge-allow-multiple-vlans=1
If the sysctl is enabled and you want to disable it, run the above example, setting the sysctl net.bridge.bridge-allow-multiple-vlans to 0.
After the sysctl is enabled, ports with different VLAN IDs can be added to the same bridge. In the following example, packets entering the bridge br_mix from swp10.100 are bridged to swp11.200 with the VLAN ID translated from 100 to 200:
cumulus@switch:~$ sudo brctl addif br_mix swp10.100 swp11.200
cumulus@switch:~$ sudo brctl show br_mix
bridge name bridge id STP enabled interfaces
br_mix 8000.4438390032bd yes swp10.100
swp11.200
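The membership rule described in this section can be modeled as a small Python check. This is a sketch of the documented behavior only: the function name check_bridge_vlans and the allow_multiple_vlans flag (standing in for the net.bridge.bridge-allow-multiple-vlans sysctl) are illustrative.

```python
def check_bridge_vlans(members, allow_multiple_vlans=False):
    """Return True if the member list respects the VLAN ID restriction."""
    vlan_ids = {
        int(member.split(".", 1)[1])
        for member in members
        if "." in member  # base interfaces carry no VLAN ID and are exempt
    }
    return allow_multiple_vlans or len(vlan_ids) <= 1


print(check_bridge_vlans(["swp10.100", "swp11.200"]))                             # False
print(check_bridge_vlans(["swp10.100", "swp11.200"], allow_multiple_vlans=True))  # True
print(check_bridge_vlans(["swp1", "swp2.100", "bond2.100"]))                      # True
```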
Spanning Tree and Rapid Spanning Tree - STP
Spanning tree protocol (STP) identifies links in the network and shuts down redundant links, preventing possible network loops and broadcast radiation on a bridged network. STP also provides redundant links for automatic failover when an active link fails. STP is enabled by default in Cumulus Linux for both VLAN-aware and traditional bridges.
Cumulus Linux supports RSTP, PVST, and PVRST modes:
Traditional bridges operate in both PVST and PVRST mode. The default is set to PVRST. Each traditional bridge has its own separate STP instance.
Per VLAN Spanning Tree (PVST) creates a spanning tree instance for a bridge. Rapid PVST (PVRST) supports RSTP enhancements for each spanning tree instance. To use PVRST with a traditional bridge, you must create a bridge corresponding to the untagged native VLAN and all the physical switch ports must be part of the same VLAN.
For maximum interoperability, when connected to a switch that has a native VLAN configuration, the native VLAN must be configured to be VLAN 1 only.
STP for a VLAN-aware Bridge
VLAN-aware bridges operate in RSTP mode only. RSTP on VLAN-aware bridges works with other modes in the following ways:
RSTP and STP
If a bridge running RSTP (802.1w) receives a common STP (802.1D) BPDU, it falls back to 802.1D automatically.
RSTP and PVST
The RSTP domain sends BPDUs on the native VLAN, whereas PVST sends BPDUs on a per-VLAN basis. For both protocols to work together, you need to enable the native VLAN on the link between the RSTP and PVST domains; the spanning tree is built according to the native VLAN parameters.
The RSTP protocol does not send or parse BPDUs on other VLANs, but floods BPDUs across the network, enabling the PVST domain to maintain its spanning-tree topology and provide a loop-free network.
To enable proper BPDU exchange across the network, be sure to allow all VLANs participating in the PVST domain on the link between the RSTP and PVST domains.
When using RSTP together with an existing PVST network, you need to define the root bridge on the PVST domain. Either lower the priority on the PVST domain or change the priority of the RSTP switches to a higher number.
When connecting a VLAN-aware bridge to a proprietary PVST+ switch using STP, you must allow VLAN 1 on all 802.1Q trunks that interconnect them, regardless of the configured native VLAN. Only VLAN 1 enables the switches to address the BPDU frames to the IEEE multicast MAC address. The proprietary switch might be configured like this:
switchport trunk allowed vlan 1-100
RSTP and MST
RSTP works with MST seamlessly, creating a single instance of spanning tree that transmits BPDUs on the native VLAN.
RSTP treats the MST domain as one giant switch, whereas MST treats the RSTP domain as a different region. To enable proper communication between the regions, MST creates a Common Spanning Tree (CST) that connects all the boundary switches and forms the overall view of the MST domain. Because changes in the CST need to be reflected in all regions, the RSTP tree is included in the CST to ensure that changes on the RSTP domain are reflected in the CST domain. This does cause topology changes on the RSTP domain to impact the rest of the network but keeps the MST domain informed of every change occurring in the RSTP domain, ensuring a loop-free network.
Configure the root bridge within the MST domain by changing the priority on the relevant MST switch. When MST detects an RSTP link, it falls back into RSTP mode. The MST domain chooses the switch with the lowest cost to the CST root bridge as the CIST root bridge.
RSTP with MLAG
More than one spanning tree instance enables switches to load balance and use different links for different VLANs. With RSTP, there is only one instance of spanning tree. To better utilize the links, you can configure MLAG on the switches connected to the MST or PVST domain and set up these interfaces as an MLAG port. The PVST or MST domain thinks it is connected to a single switch and utilizes all the links connected to it. Load balancing is based on the port channel hashing mechanism instead of different spanning tree instances and uses all the links between the RSTP domain and the PVST or MST domain. For information about configuring MLAG, see Multi-Chassis Link Aggregation - MLAG.
Optional Configuration
There are a number of ways to customize STP in Cumulus Linux. Exercise caution when changing the settings below to prevent malfunctions in STP loop avoidance.
Spanning Tree Priority
If you have a multiple spanning tree instance (MSTI 0, also known as a common spanning tree, or CST), you can set the tree priority for a bridge. The bridge with the lowest priority is elected the root bridge. The priority must be a number between 0 and 61440, and must be a multiple of 4096. The default is 32768.
To set the tree priority, run the following commands:
The following example command sets the tree priority to 8192:
cumulus@switch:~$ net add bridge stp treeprio 8192
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Configure the tree priority (mstpctl-treeprio) under the bridge stanza in the /etc/network/interfaces file, then run the ifreload -a command. The following example sets the tree priority to 8192:
cumulus@switch:~$ sudo nano /etc/network/interfaces
...
auto bridge
iface bridge
# bridge-ports includes all ports related to VxLAN and CLAG.
# does not include the Peerlink.4094 subinterface
bridge-ports bond01 bond02 peerlink vni13 vni24 vxlan4001
bridge-pvid 1
bridge-vids 13 24
bridge-vlan-aware yes
mstpctl-treeprio 8192
...
Cumulus Linux supports MSTI 0 only. It does not support MSTI 1 through 15.
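The priority constraints and the root election rule can be checked with a short Python sketch. The function names are hypothetical. The tie-break on the lowest MAC address is standard spanning tree behavior (the bridge ID combines priority and address) rather than something this section states.

```python
def is_valid_tree_priority(priority: int) -> bool:
    """Priority must be 0-61440 and a multiple of 4096."""
    return 0 <= priority <= 61440 and priority % 4096 == 0


def elect_root(bridges):
    """Pick the root from (priority, mac) pairs.

    The lowest priority wins; the lowest MAC address breaks ties.
    """
    return min(
        bridges,
        key=lambda bridge: (bridge[0], int(bridge[1].replace(":", ""), 16)),
    )


candidates = [
    (32768, "44:38:39:00:32:8b"),
    (8192, "44:38:39:ff:40:94"),
    (32768, "44:38:39:00:32:8a"),
]
print(is_valid_tree_priority(8192))
print(elect_root(candidates))
```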
PortAdminEdge (PortFast Mode)
PortAdminEdge is equivalent to the PortFast feature offered by other vendors. It enables or disables the initial edge state of a port in a bridge.
All ports configured with PortAdminEdge bypass the listening and learning states to move immediately to forwarding.
PortAdminEdge mode might cause loops if it is not used with the BPDU guard feature.
It is common for edge ports to be configured as access ports for a simple end host; however, this is not mandatory. In the data center, edge ports typically connect to servers, which might pass both tagged and untagged traffic.
To configure PortAdminEdge mode:
The following example commands configure PortAdminEdge and BPDU guard for swp5.
cumulus@switch:~$ net add interface swp5 stp bpduguard
cumulus@switch:~$ net add interface swp5 stp portadminedge
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Configure PortAdminEdge and BPDU guard under the switch port interface stanza in the /etc/network/interfaces file, then run the ifreload -a command. The following example configures PortAdminEdge and BPDU guard on swp5.
cumulus@switch:~$ cl set interface swp5 bridge domain br_default stp admin-edge on
cumulus@switch:~$ cl set interface swp5 bridge domain br_default stp bpdu-guard on
cumulus@switch:~$ cl config apply
PortAutoEdge
PortAutoEdge is an enhancement to the standard PortAdminEdge (PortFast) mode, which allows for the automatic detection of edge ports. PortAutoEdge enables and disables the auto transition to and from the edge state of a port in a bridge.
Edge ports and access ports are not the same. Edge ports transition directly to the forwarding state and skip the listening and learning stages. Upstream topology change notifications are not generated when an edge port link changes state. Access ports only forward untagged traffic; however, there is no such restriction on edge ports, which can forward both tagged and untagged traffic.
When a BPDU is received on a port configured with PortAutoEdge, the port ceases to be in the edge port state and transitions into a normal STP port. When BPDUs are no longer received on the interface, the port becomes an edge port, and transitions through the discarding and learning states before resuming forwarding.
PortAutoEdge is enabled by default in Cumulus Linux.
To disable PortAutoEdge for an interface:
The following example commands disable PortAutoEdge on swp1:
cumulus@switch:~$ net add interface swp1 stp portautoedge no
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Edit the switch port interface stanza in the /etc/network/interfaces file to add the mstpctl-portautoedge no line, then run the ifreload -a command. The following example disables PortAutoEdge on swp1:
cumulus@switch:~$ sudo nano /etc/network/interfaces
...
auto swp1
iface swp1
alias to Server01
# Port to Server02
mstpctl-portautoedge no
...
cumulus@switch:~$ sudo ifreload -a
cumulus@switch:~$ cl set interface swp1 bridge domain br_default stp auto-edge off
cumulus@switch:~$ cl config apply
To reenable PortAutoEdge for an interface:
The following example commands reenable PortAutoEdge on swp1:
cumulus@switch:~$ net del interface swp1 stp portautoedge no
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Edit the switch port interface stanza in the /etc/network/interfaces file to remove mstpctl-portautoedge no, then run the ifreload -a command.
cumulus@switch:~$ cl set interface swp1 bridge domain br_default stp auto-edge on
cumulus@switch:~$ cl config apply
BPDU Guard
You can configure BPDU guard to protect the spanning tree topology from unauthorized switches affecting the forwarding path. For example, if you add a new switch to an access port off a leaf switch and this new switch is configured with a low priority, it might become the new root switch and affect the forwarding path for the entire layer 2 topology.
To configure BPDU guard:
The following example commands set BPDU guard for swp5:
cumulus@switch:~$ net add interface swp5 stp bpduguard
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Edit the switch port interface stanza in the /etc/network/interfaces file to add the mstpctl-bpduguard yes line, then run the ifreload -a command. The following example sets BPDU guard for interface swp5:
cumulus@switch:~$ cl set interface swp5 bridge domain br_default stp bpdu-guard on
cumulus@switch:~$ cl config apply
If a BPDU is received on the port, STP brings down the port and logs an error in /var/log/syslog. The following is a sample error:
mstpd: error, MSTP_IN_rx_bpdu: bridge:bond0 Recvd BPDU on BPDU Guard Port - Port Down
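To spot these events across a long log, a script can scan for the message format shown above. The following Python sketch assumes the line layout of the sample error exactly as printed here; the regular expression and the function name bpdu_guard_hits are illustrative, and other mstpd versions might word the message differently.

```python
import re

# Pattern modeled on the sample error line above; treat it as an assumption.
BPDU_GUARD_RE = re.compile(
    r"MSTP_IN_rx_bpdu: (?P<bridge>[^:\s]+):(?P<port>\S+) "
    r"Recvd BPDU on BPDU Guard Port"
)


def bpdu_guard_hits(log_lines):
    """Return (bridge, port) pairs for each BPDU guard violation found."""
    hits = []
    for line in log_lines:
        match = BPDU_GUARD_RE.search(line)
        if match:
            hits.append((match["bridge"], match["port"]))
    return hits


sample = [
    "mstpd: error, MSTP_IN_rx_bpdu: bridge:bond0 "
    "Recvd BPDU on BPDU Guard Port - Port Down",
    "some unrelated log line",
]
print(bpdu_guard_hits(sample))
```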
To determine whether BPDU guard is configured, or if a BPDU has been received:
cumulus@switch:~$ net show bridge spanning-tree | grep bpdu
bpdu guard port yes bpdu guard error yes
cumulus@switch:~$ mstpctl showportdetail bridge bond0
bridge:bond0 CIST info
enabled no role Disabled
port id 8.001 state discarding
external port cost 305 admin external cost 0
internal port cost 305 admin internal cost 0
designated root 8.000.6C:64:1A:00:4F:9C dsgn external cost 0
dsgn regional root 8.000.6C:64:1A:00:4F:9C dsgn internal cost 0
designated bridge 8.000.6C:64:1A:00:4F:9C designated port 8.001
admin edge port no auto edge port yes
oper edge port no topology change ack no
point-to-point yes admin point-to-point auto
restricted role no restricted TCN no
port hello time 10 disputed no
bpdu guard port yes bpdu guard error yes
network port no BA inconsistent no
Num TX BPDU 3 Num TX TCN 2
Num RX BPDU 488 Num RX TCN 2
Num Transition FWD 1 Num Transition BLK 2
bpdufilter port no
clag ISL no clag ISL Oper UP no
clag role unknown clag dual conn mac 0:0:0:0:0:0
clag remote portID F.FFF clag system mac 0:0:0:0:0:0
cumulus@switch:~$ cl show bridge domain br_default stp
The only way to recover a port that has been placed in the disabled state is to manually bring up the port with the sudo ifup <interface> command. See Interface Configuration and Management for more information about ifupdown.
Bringing up the disabled port does not correct the problem if the configuration on the connected end-station has not been resolved.
Bridge Assurance
On a point-to-point link where RSTP is running, if you want to detect unidirectional links and put the port in a discarding state, you can enable bridge assurance on the port by enabling a port type network. The port is then in a bridge assurance inconsistent state until a BPDU is received from the peer. You need to configure the port type network on both ends of the link for bridge assurance to operate properly.
Bridge assurance is disabled by default.
To enable bridge assurance on an interface:
The following example commands enable bridge assurance on swp1:
cumulus@switch:~$ net add interface swp1 stp portnetwork
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Edit the switch port interface stanza in the /etc/network/interfaces file to add the mstpctl-portnetwork yes line, then run the ifreload -a command. The following example enables bridge assurance on swp5:
BPDU Filter
You can enable bpdufilter on a switch port, which filters BPDUs in both directions. This disables STP on the port because no BPDUs transit it.
Using the BPDU filter might cause layer 2 loops. Use this feature deliberately and with extreme caution.
To configure the BPDU filter on an interface:
The following example commands configure the BPDU filter on swp6:
cumulus@switch:~$ net add interface swp6 stp portbpdufilter
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Edit the switch port interface stanza in the /etc/network/interfaces file to add the mstpctl-portbpdufilter yes line, then run the ifreload -a command. The following example configures BPDU filter on swp6:
cumulus@switch:~$ cl set interface swp6 bridge domain br_default stp bpdu-filter on
cumulus@switch:~$ cl config apply
Parameter List
Spanning tree parameters are defined in the IEEE 802.1D and 802.1Q specifications.
The table below describes the STP configuration parameters available in Cumulus Linux. For a comparison of STP parameter configuration between mstpctl and other vendors, read this knowledge base article.
Most of these parameters are blacklisted in the ifupdown_blacklist section of the /etc/netd.conf file. Before you configure these parameters, you must edit the file to remove them from the blacklist.
Parameter | NCLU Command | Description
mstpctl-maxage | net add bridge stp maxage <seconds> | Sets the maximum age of the bridge in seconds. The default is 20. The maximum age must meet the condition 2 * (Bridge Forward Delay - 1 second) >= Bridge Max Age.
mstpctl-ageing | net add bridge stp ageing <seconds> | Sets the Ethernet (MAC) address ageing time for the bridge in seconds when the running version is STP, but not RSTP/MSTP. The default is 1800.
mstpctl-fdelay | net add bridge stp fdelay <seconds> | Sets the bridge forward delay time in seconds. The default value is 15. The bridge forward delay must meet the condition 2 * (Bridge Forward Delay - 1 second) >= Bridge Max Age.
mstpctl-maxhops | net add bridge stp maxhops <max-hops> | Sets the maximum hops for the bridge. The default is 20.
mstpctl-txholdcount | net add bridge stp txholdcount <hold-count> | Sets the bridge transmit hold count. The default value is 6.
mstpctl-forcevers | net add bridge stp forcevers RSTP|STP | Sets the forced STP version of the bridge to either RSTP or STP. The default is RSTP.
mstpctl-treeprio | net add bridge stp treeprio <priority> | Sets the tree priority of the bridge for an MSTI (multiple spanning tree instance). The priority value is a number between 0 and 61440 and must be a multiple of 4096. The bridge with the lowest priority is elected the root bridge. The default is 32768. See Spanning Tree Priority above. Note: Cumulus Linux supports MSTI 0 only. It does not support MSTI 1 through 15.
mstpctl-hello | net add bridge stp hello <seconds> | Sets the bridge hello time in seconds. The default is 2.
mstpctl-portpathcost | net add interface <interface> stp portpathcost <cost> | Sets the port cost of the interface. The default is 0. mstpd supports only long mode; 32 bits for the path cost.
mstpctl-treeportprio | net add interface <interface> stp treeportprio <priority> | Sets the priority of the interface for the MSTI. The priority value is a number between 0 and 240 and must be a multiple of 16. The default is 128. Note: Cumulus Linux supports MSTI 0 only. It does not support MSTI 1 through 15.
mstpctl-portadminedge | net add interface <interface> stp portadminedge | Enables or disables the initial edge state of the interface in the bridge. The default is no. In NCLU, to use a setting other than the default, you must specify this attribute without setting an option. See PortAdminEdge above.
mstpctl-portautoedge | net add interface <interface> stp portautoedge | Enables or disables the auto transition to and from the edge state of the interface in the bridge. PortAutoEdge is enabled by default. See PortAutoEdge above.
mstpctl-portp2p | net add interface <interface> stp portp2p yes|no | Enables or disables the point-to-point detection mode of the interface in the bridge.
mstpctl-portrestrrole | net add interface <interface> stp portrestrrole | Enables or disables the ability of the interface in the bridge to take the restricted role. The default is no. To enable this feature with the NCLU command, you specify this attribute without an option (portrestrrole). To enable this feature by editing the /etc/network/interfaces file, you specify mstpctl-portrestrrole yes.
mstpctl-portrestrtcn | net add interface <interface> stp portrestrtcn | Enables or disables the ability of the interface in the bridge to propagate received topology change notifications. The default is no.
mstpctl-portnetwork | net add interface <interface> stp portnetwork | Enables or disables the bridge assurance capability for a network interface. The default is no. See Bridge Assurance above.
mstpctl-bpduguard | net add interface <interface> stp bpduguard | Enables or disables the BPDU guard configuration of the interface in the bridge. The default is no. See BPDU Guard above.
mstpctl-portbpdufilter | net add interface <interface> stp portbpdufilter | Enables or disables the BPDU filter functionality for an interface in the bridge. The default is no. See BPDU Filter above.
mstpctl-treeportcost | net add interface <interface> stp treeportcost <port-cost> | Sets the spanning tree port cost to a value from 0 to 255. The default is 0.
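The timer condition that appears under mstpctl-maxage and mstpctl-fdelay is easy to get wrong when changing only one of the two values. The following Python sketch checks the condition and derives the smallest forward delay that satisfies it for a given maximum age. Both function names are hypothetical; the formula is the one given in the table.

```python
def timers_consistent(forward_delay: int, max_age: int) -> bool:
    """Check 2 * (Bridge Forward Delay - 1 second) >= Bridge Max Age."""
    return 2 * (forward_delay - 1) >= max_age


def min_forward_delay(max_age: int) -> int:
    """Smallest whole-second forward delay that satisfies the condition."""
    # ceil(max_age / 2) + 1, written with integer arithmetic
    return -(-max_age // 2) + 1


print(timers_consistent(15, 20))  # defaults: 2 * 14 = 28 >= 20
print(timers_consistent(4, 20))   # 2 * 3 = 6 < 20
print(min_forward_delay(20))
```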
Troubleshooting
To check STP status for a bridge:
Run the net show bridge spanning-tree command:
cumulus@switch:~$ net show bridge spanning-tree
Bridge info
enabled yes
bridge id 8.000.44:38:39:FF:40:94
Priority: 32768
Address: 44:38:39:FF:40:94
This bridge is root.
designated root 8.000.44:38:39:FF:40:94
Priority: 32768
Address: 44:38:39:FF:40:94
root port none
path cost 0 internal path cost 0
max age 20 bridge max age 20
forward delay 15 bridge forward delay 15
tx hold count 6 max hops 20
hello time 2 ageing time 300
force protocol version rstp
INTERFACE STATE ROLE EDGE
--------- ----- ---- ----
peerlink forw Desg Yes
vni13 forw Desg Yes
vni24 forw Desg Yes
vxlan4001 forw Desg Yes
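For scripted health checks, the interface table at the end of this output can be turned into structured data. The following Python sketch assumes the four whitespace-separated columns shown above; the function name parse_stp_interfaces is hypothetical, and the column layout on a real switch might differ from this flattened sample.

```python
def parse_stp_interfaces(output: str):
    """Parse the INTERFACE/STATE/ROLE/EDGE table into a list of dicts."""
    rows = []
    in_table = False
    for line in output.splitlines():
        fields = line.split()
        if fields[:4] == ["INTERFACE", "STATE", "ROLE", "EDGE"]:
            in_table = True  # data rows follow the header
            continue
        if not in_table or not fields or set(line.strip()) <= {"-", " "}:
            continue  # skip text before the table, blanks, and the rule line
        if len(fields) == 4:
            name, state, role, edge = fields
            rows.append(
                {"interface": name, "state": state, "role": role, "edge": edge == "Yes"}
            )
    return rows


sample = """\
hello time 2 ageing time 300
force protocol version rstp
INTERFACE STATE ROLE EDGE
--------- ----- ---- ----
peerlink forw Desg Yes
vni13 forw Desg Yes
"""
rows = parse_stp_interfaces(sample)
not_forwarding = [row["interface"] for row in rows if row["state"] != "forw"]
print(rows)
print(not_forwarding)
```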
The mstpctl utility provided by the mstpd service configures STP. The mstpd daemon is an open source project used by Cumulus Linux to implement IEEE 802.1D-2004 and IEEE 802.1Q-2011.
The mstpd daemon starts by default when the switch boots and logs errors to /var/log/syslog.
mstpctl is the preferred utility for interacting with STP on Cumulus Linux. brctl also provides certain tools for configuring STP; however, they are not as complete and output from brctl might be misleading.
To show the bridge state, run the brctl show command:
cumulus@switch:~$ sudo brctl show
bridge name bridge id STP enabled interfaces
bridge 8000.001401010100 yes swp1
swp4
swp5
To show the mstpd bridge port state, run the mstpctl showport bridge command.
Storm Control
Storm control provides protection against excessive inbound BUM (broadcast, unknown unicast, multicast) traffic on layer 2 switch port interfaces, which can cause poor network performance.
Storm control is not supported on a switch with the Tomahawk2 ASIC.
On Broadcom switches, ARP requests over layer 2 VXLAN bypass broadcast storm control; they are forwarded to the CPU and subjected to embedded control plane QoS instead.
Configure Storm Control
To configure storm control for physical ports, edit the /etc/cumulus/switchd.conf file. For example, to enable broadcast storm control for swp1 at 400 packets per second (pps), multicast storm control at 3000 pps, and unknown unicast at 500 pps, edit the /etc/cumulus/switchd.conf file and uncomment the storm_control.broadcast, storm_control.multicast, and storm_control.unknown_unicast lines:
cumulus@switch:~$ sudo nano /etc/cumulus/switchd.conf
...
# Storm Control setting on a port, in pps, 0 means disable
interface.swp1.storm_control.broadcast = 400
interface.swp1.storm_control.multicast = 3000
interface.swp1.storm_control.unknown_unicast = 500
...
When you update the /etc/cumulus/switchd.conf file, you must restart switchd for the changes to take effect.
Restarting the switchd service causes all network ports to reset, interrupting network services, in addition to resetting the switch hardware configuration.
Alternatively, you can run the following commands. The configuration below takes effect immediately, but does not persist if you reboot the switch. For a persistent configuration, edit the /etc/cumulus/switchd.conf file, as described above.
cumulus@switch:~$ sudo sh -c 'echo 400 > /cumulus/switchd/config/interface/swp1/storm_control/broadcast'
cumulus@switch:~$ sudo sh -c 'echo 3000 > /cumulus/switchd/config/interface/swp1/storm_control/multicast'
cumulus@switch:~$ sudo sh -c 'echo 500 > /cumulus/switchd/config/interface/swp1/storm_control/unknown_unicast'
To apply the same settings to a range of interfaces, run the commands in a for loop from the switch CLI, as in the following example:
cumulus@switch:mgmt:~$ for i in {1..5}; do
> sudo sh -c "echo 400 > /cumulus/switchd/config/interface/swp$i/storm_control/broadcast"
> sudo sh -c "echo 3000 > /cumulus/switchd/config/interface/swp$i/storm_control/multicast"
> sudo sh -c "echo 500 > /cumulus/switchd/config/interface/swp$i/storm_control/unknown_unicast"
> done
cumulus@switch:mgmt:~$
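For the persistent form of the same settings, the switchd.conf lines for a range of ports can be generated instead of typed. The following Python sketch only produces text in the interface.<port>.storm_control.<type> format shown earlier; it does not write to /etc/cumulus/switchd.conf, and the function name storm_control_lines is hypothetical.

```python
def storm_control_lines(ports, broadcast, multicast, unknown_unicast):
    """Return switchd.conf storm control lines (rates in pps) for each port."""
    rates = {
        "broadcast": broadcast,
        "multicast": multicast,
        "unknown_unicast": unknown_unicast,
    }
    return [
        f"interface.{port}.storm_control.{kind} = {pps}"
        for port in ports
        for kind, pps in rates.items()
    ]


lines = storm_control_lines([f"swp{i}" for i in range(1, 6)], 400, 3000, 500)
print("\n".join(lines))
```

As noted above, lines added to switchd.conf take effect only after switchd restarts.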
Bonding - Link Aggregation
Linux bonding provides a method for aggregating multiple network interfaces (slaves) into a single logical bonded interface (bond). Link aggregation is useful for linear scaling of bandwidth, load balancing, and failover protection.
Cumulus Linux supports two bonding modes:
IEEE 802.3ad link aggregation mode that allows one or more links to be aggregated together to form a link aggregation group (LAG) so that a media access control (MAC) client can treat the group as if it were a single link. IEEE 802.3ad link aggregation is the default mode.
Balance-xor mode, where the bonding of slave interfaces is static and all slave interfaces are active for load balancing and fault tolerance purposes. This is useful for MLAG deployments.
Cumulus Linux uses version 1 of the LAG control protocol (LACP).
To temporarily bring up a bond even when there is no LACP partner, use LACP Bypass.
Hash Distribution
Egress traffic through a bond is distributed to a slave based on a packet hash calculation, providing load balancing over the slaves; many conversation flows are distributed over all available slaves to load balance the total traffic. Traffic for a single conversation flow always hashes to the same slave.
The hash calculation uses packet header data to choose the slave on which to transmit the packet:
For IP traffic, IP header source and destination fields are used in the calculation.
For IP + TCP/UDP traffic, source and destination ports are included in the hash calculation.
In a failover event, the hash calculation is adjusted to steer traffic over available slaves.
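The behavior described above (one flow always maps to one slave, and flows are redistributed over the remaining slaves after a failure) can be illustrated with a toy model. This sketch is an assumption for illustration only: it uses CRC32 and a modulo, which is not the hash the switch hardware or the bonding driver computes, and the function name pick_slave is hypothetical.

```python
import zlib


def pick_slave(slaves, src_ip, dst_ip, src_port=None, dst_port=None):
    """Toy flow hash: CRC32 over the header fields, modulo the slave count."""
    fields = [src_ip, dst_ip]
    if src_port is not None and dst_port is not None:
        # TCP/UDP traffic also contributes its ports to the hash.
        fields += [str(src_port), str(dst_port)]
    digest = zlib.crc32("|".join(fields).encode())
    return slaves[digest % len(slaves)]


slaves = ["swp1", "swp2", "swp3", "swp4"]
flow = ("10.0.0.1", "10.0.0.2", 49152, 443)

first = pick_slave(slaves, *flow)
# Failover: drop the chosen slave and hash the same flow over the survivors.
survivors = [s for s in slaves if s != first]
print(first, pick_slave(survivors, *flow))
```

Calling pick_slave repeatedly with the same flow returns the same slave, which is the property that keeps packets of one conversation in order.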
LAG Custom Hashing
On Mellanox switches, you can configure which fields are used in the LAG hash calculation. For example, if you do not want to use source or destination port numbers in the hash calculation, you can disable the source port and destination port fields.
You can configure the following fields:
Source MAC
Destination MAC
Source IP
Destination IP
Ether type
VLAN ID
Source port
Destination port
Layer 3 protocol
To configure custom hashing, edit the /etc/cumulus/datapath/traffic.conf file:
To enable custom hashing, uncomment the lag_hash_config.enable = true line.
To enable a field, set the field to true. To disable a field, set the field to false.
Run the echo 1 > /cumulus/switchd/ctrl/hash_config_reload command. This command does not cause any traffic interruptions.
The following shows an example /etc/cumulus/datapath/traffic.conf file:
cumulus@switch:~$ sudo nano /etc/cumulus/datapath/traffic.conf
...
#LAG HASH config
#HASH config for LACP to enable custom fields
#Fields will be applicable for LAG hash
#calculation
#Uncomment to enable custom fields configured below
lag_hash_config.enable = true
lag_hash_config.smac = true
lag_hash_config.dmac = true
lag_hash_config.sip = true
lag_hash_config.dip = true
lag_hash_config.ether_type = true
lag_hash_config.vlan_id = true
lag_hash_config.sport = false
lag_hash_config.dport = false
lag_hash_config.ip_prot = true
...
Symmetric hashing is enabled by default on Mellanox switches. Make sure that the settings for the source IP (lag_hash_config.sip) and destination IP (lag_hash_config.dip) fields match, and that the settings for the source port (lag_hash_config.sport) and destination port (lag_hash_config.dport) fields match; otherwise symmetric hashing is disabled automatically. You can disable symmetric hashing manually in the /etc/cumulus/datapath/traffic.conf file by setting symmetric_hash_enable = FALSE.
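The matching rule for symmetric hashing can be checked before reloading the hash configuration. The sketch below is a hypothetical helper, not a Cumulus Linux tool: it assumes the lag_hash_config.<field> = true|false line format shown in the example file and reports whether the source and destination settings agree.

```python
def parse_lag_hash(text):
    """Collect lag_hash_config.* settings, skipping comments and other lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("lag_hash_config.") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        field = key.strip().split(".", 1)[1]
        settings[field] = value.strip().lower() == "true"
    return settings


def symmetric_hash_possible(settings):
    """True when sip/dip match and sport/dport match, as the text requires."""
    return (settings.get("sip") == settings.get("dip")
            and settings.get("sport") == settings.get("dport"))


sample = """
#Uncomment to enable custom fields configured below
lag_hash_config.enable = true
lag_hash_config.sip = true
lag_hash_config.dip = true
lag_hash_config.sport = false
lag_hash_config.dport = false
"""
print(symmetric_hash_possible(parse_lag_hash(sample)))
```

With the values from the example file, both pairs match, so the check prints True.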
In the example below, the front panel port interfaces swp1 through swp4 are slaves in bond0, while swp5 and swp6 are not part of bond0.
To create and configure a bond:
Run the net add bond command. The example command below creates a bond called bond0 with slaves swp1, swp2, swp3, and swp4:
cumulus@switch:~$ net add bond bond0 bond slaves swp1-4
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Edit the /etc/network/interfaces file to add a stanza for the bond that lists swp1, swp2, swp3, and swp4 as bond-slaves, then run the ifreload -a command.
The following CUE commands create a bond called bond0 with members swp1, swp2, swp3, and swp4:
cumulus@switch:~$ cl set interface bond0 bond member swp1-4
cumulus@switch:~$ cl config apply
The bond is configured by default in IEEE 802.3ad link aggregation mode. To configure the bond in balance-xor mode, see Configuration Parameters below.
If the bond is not going to become part of a bridge, you need to specify an IP address.
The name of the bond must be compliant with Linux interface naming conventions and unique within the switch.
Do not use a dash (-) in the bond name.
Cumulus Linux does not currently support bond members at 200G or greater.
When networking is started on the switch, bond0 is created as MASTER and interfaces swp1 through swp4 come up in SLAVE mode, as seen in the ip link show command:
cumulus@switch:~$ ip link show
...
3: swp1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP mode DEFAULT qlen 500
link/ether 44:38:39:00:03:c1 brd ff:ff:ff:ff:ff:ff
4: swp2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP mode DEFAULT qlen 500
link/ether 44:38:39:00:03:c1 brd ff:ff:ff:ff:ff:ff
5: swp3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP mode DEFAULT qlen 500
link/ether 44:38:39:00:03:c1 brd ff:ff:ff:ff:ff:ff
6: swp4: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP mode DEFAULT qlen 500
link/ether 44:38:39:00:03:c1 brd ff:ff:ff:ff:ff:ff
...
55: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT
link/ether 44:38:39:00:03:c1 brd ff:ff:ff:ff:ff:ff
All slave interfaces within a bond have the same MAC address as the bond. Typically, the first slave you add to the bond donates its MAC address as the bond MAC address, whereas the MAC addresses of the other slaves are set to the bond MAC address. The bond MAC address is the source MAC address for all traffic leaving the bond and provides a single destination MAC address to address traffic to the bond.
Removing a bond slave interface from which a bond derives its MAC address affects traffic when the bond interface flaps to update the MAC address.
Configure Bond Options
The configuration options for a bond are described in the table below. To configure a bond:
Run net add bond <bond-name> bond <option>. The following example sets the bond mode for bond01 to balance-xor:
cumulus@switch:~$ net add bond bond01 bond mode balance-xor
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Edit the /etc/network/interfaces file to add the parameter to the bond stanza, then run the ifreload -a command.
The following CUE example sets the bond mode for bond01 to balance-xor (static):
cumulus@switch:~$ cl set interface bond01 bond mode static
cumulus@switch:~$ cl config apply
Each bond configuration option, except for bond slaves, is set to the recommended value by default in Cumulus Linux. Only configure an option if a different setting is needed. For more information on configuration values, refer to the Related Information section below.
NCLU and Linux Parameter: bond-mode 802.3ad|balance-xor
CUE Attribute: lacp|static
Description: Cumulus Linux supports IEEE 802.3ad link aggregation mode (802.3ad) and balance-xor mode. The default mode is 802.3ad.
Note: When you enable balance-xor mode, the bonding of slave interfaces is static and all slave interfaces are active for load balancing and fault tolerance purposes. Packet transmission on the bond is based on the hash policy specified by xmit-hash-policy.
When using balance-xor mode to dual-connect host-facing bonds in an MLAG environment, you must configure the clag-id parameter on the MLAG bonds and it must be the same on both MLAG switches. Otherwise, the bonds are treated by the MLAG switch pair as single-connected.
Use balance-xor mode only if you cannot use LACP; LACP can detect mismatched link attributes between bond members and can even detect misconnections.
NCLU and Linux Parameter: bond-slaves <interface-list>
CUE Attribute: member
Description: The list of slaves in the bond.
NCLU and Linux Parameter: bond-miimon <value>
CUE Attribute: NEED ATTRIBUTE
Description: Defines how often the link state of each slave is inspected for failures. You can specify a value between 0 and 255. The default value is 100.
NCLU and Linux Parameter: bond-downdelay <milliseconds>
CUE Attribute: down-delay
Description: Specifies the time, in milliseconds (between 0 and 65535), to wait before disabling a slave after a link failure is detected. The default value is 0.
This option is only valid for the miimon link monitor. The downdelay value must be a multiple of the miimon value; if not, it is rounded down to the nearest multiple.
NCLU and Linux Parameter: bond-updelay <milliseconds>
CUE Attribute: up-delay
Description: Specifies the time, in milliseconds (between 0 and 65535), to wait before enabling a slave after a link recovery is detected. The default value is 0.
This option is only valid for the miimon link monitor. The updelay value must be a multiple of the miimon value; if not, it is rounded down to the nearest multiple.
NCLU and Linux Parameter: bond-use-carrier no
CUE Attribute: NEED ATTRIBUTE
Description: Determines the link state.
NCLU and Linux Parameter: bond-lacp-bypass-allow
CUE Attribute: lacp-bypass
Description: Enables LACP bypass.
NCLU and Linux Parameter: bond-lacp-rate slow
CUE Attribute: lacp-rate
Description: Sets the rate to ask the link partner to transmit LACP control packets. slow is the only option.
NCLU and Linux Parameter: bond-min-links
CUE Attribute: NEED ATTRIBUTE
Description: Defines the minimum number of links (between 0 and 255) that must be active before the bond is put into service. The default value is 1.
A value greater than 1 is useful if higher level services need to ensure a minimum aggregate bandwidth level before activating a bond. Keeping bond-min-links set to 1 indicates the bond must have at least one active member. If the number of active members drops below the bond-min-links setting, the bond appears to upper-level protocols as link-down. When the number of active links returns to greater than or equal to bond-min-links, the bond becomes link-up.
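Two of the rules in the table lend themselves to a quick worked example: the rounding of downdelay and updelay to a multiple of miimon, and the link state that bond-min-links produces. The following is a minimal sketch with hypothetical function names; it models the rules as stated in the table and does not call the bonding driver.

```python
def effective_delay(delay_ms, miimon_ms):
    """Round a downdelay or updelay value down to the nearest multiple of miimon."""
    if miimon_ms <= 0:
        raise ValueError("the delay options apply only when miimon is non-zero")
    return (delay_ms // miimon_ms) * miimon_ms


def bond_link_state(active_members, min_links=1):
    """Return the state that upper-level protocols see for the bond."""
    return "up" if active_members >= min_links else "down"


# 250 ms is not a multiple of the default miimon value of 100 ms.
print(effective_delay(250, 100))
# With bond-min-links 2, one active member is not enough to keep the bond up.
print(bond_link_state(1, min_links=2))
print(bond_link_state(2, min_links=2))
```

In this model a downdelay of 250 with miimon 100 takes effect as 200, so choosing a multiple of miimon up front avoids a surprise.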
Show Bond Information
To show information for a bond:
Run the net show interface <bond> command:
cumulus@switch:~$ net show interface bond1
Name MAC Speed MTU Mode
-- ------ ----------------- ------- ----- ------
UP bond1 00:02:00:00:00:12 20G 1500 Bond
Bond Details
--------------- -------------
Bond Mode: Balance-XOR
Load Balancing: Layer3+4
Minimum Links: 1
In CLAG: CLAG Inactive
Port Speed TX RX Err Link Failures
-- ------- ------- ---- ---- ----- ---------------
UP swp3(P) 10G 0 0 0 0
UP swp4(P) 10G 0 0 0 0
LLDP
------- ---- ------------
swp3(P) ==== swp1(p1c1h1)
swp4(P) ==== swp2(p1c1h1)
Routing
-------
Interface bond1 is up, line protocol is up
Link ups: 3 last: 2017/04/26 21:00:38.26
Link downs: 2 last: 2017/04/26 20:59:56.78
PTM status: disabled
vrf: Default-IP-Routing-Table
index 31 metric 0 mtu 1500
flags: <UP,BROADCAST,RUNNING,MULTICAST>
Type: Ethernet
HWaddr: 00:02:00:00:00:12
inet6 fe80::202:ff:fe00:12/64
Interface Type Other
Run the sudo cat /proc/net/bonding/<bond> command:
cumulus@switch:~$ sudo cat /proc/net/bonding/bond01
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: load balancing (xor)
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Slave Interface: swp1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 44:38:39:00:00:03
Slave queue ID: 0
Run the cl show interface <bond> command:
cumulus@switch:~$ cl show interface bond0
The detailed output in /proc/net/bonding/<filename> includes the actor/partner LACP information. This information is not necessary and requires you to use sudo to view the file.
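For scripted checks, the /proc/net/bonding output shown above can be split into bond-level and per-slave fields. This is a minimal sketch under the assumption that every field is a "key: value" line and that each "Slave Interface" line starts a new slave section, as in the sample output; parse_bonding is a hypothetical name.

```python
def parse_bonding(text):
    """Split /proc/net/bonding output into bond-level and per-slave fields."""
    bond, slaves, current = {}, {}, None
    for line in text.splitlines():
        if ":" not in line:
            continue
        # Split on the first colon only, so MAC addresses stay intact.
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Slave Interface":
            current = slaves.setdefault(value, {})
        elif current is None:
            bond[key] = value
        else:
            current[key] = value
    return bond, slaves


sample = """\
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: load balancing (xor)
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Slave Interface: swp1
MII Status: up
Speed: 1000 Mbps
Link Failure Count: 0
Permanent HW addr: 44:38:39:00:00:03
"""
bond, slaves = parse_bonding(sample)
print(bond["Bonding Mode"], slaves["swp1"]["MII Status"])
```

On a switch, the text would come from reading the file with sudo; the sample string here stands in for that output.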
Considerations
An interface cannot belong to multiple bonds.
A bond can have subinterfaces, but subinterfaces cannot have a bond.
A bond cannot enslave VLAN subinterfaces.
Set all slave ports within a bond to the same speed/duplex and make sure they match the link partner’s slave ports.
On a Cumulus RMP switch, if you create a bond with multiple 10G member ports, traffic gets dropped when the bond uses members of the same unit listed in the /var/lib/cumulus/porttab file. For example, traffic gets dropped if both swp49 and swp52 are in the bond because they both are in xe0 (or if both swp50 and swp51 are in the same bond because they are both in xe1).
Single port member bonds, bonds with different units (xe0 or xe1, as above), or layer 3 bonds do not have this issue.
On Cumulus RMP switches, which are built with two Hurricane2 ASICs, you cannot form an LACP bond on links that terminate on different Hurricane2 ASICs.
Multi-Chassis Link Aggregation - MLAG
MLAG or CLAG: The Cumulus Linux implementation of MLAG is referred to by other vendors as CLAG, MC-LAG or VPC. You will even see references to CLAG in Cumulus Linux, including the management daemon, named clagd, and other options in the code, such as clag-id, which exist for historical purposes. The Cumulus Linux implementation is truly a multi-chassis link aggregation protocol, so we call it MLAG.
Multi-Chassis Link Aggregation (MLAG) enables a server or switch with a two-port bond, such as a link aggregation group (LAG), EtherChannel, port group or trunk, to connect those ports to different switches and operate as if they are connected to a single, logical switch. This provides greater redundancy and greater system throughput.
Dual-connected devices can create LACP bonds that contain links to each physical switch; active-active links from the dual-connected devices are supported even though they are connected to two different physical switches.
How Does MLAG Work?
A basic MLAG configuration looks like this:
The two switches, leaf01 and leaf02, known as peer switches, appear as a single device to the bond on server01.
server01 distributes traffic between the two links to leaf01 and leaf02 in the way you configure on the host.
Traffic inbound to server01 can traverse leaf01 or leaf02 and arrive at server01.
More elaborate configurations are also possible. The number of links between the host and the switches can be greater than two and does not have to be symmetrical. Additionally, because the two peer switches appear as a single switch to other bonding devices, you can also connect pairs of MLAG switches to each other in a switch-to-switch MLAG configuration:
leaf01 and leaf02 are also MLAG peer switches and present a two-port bond from a single logical system to spine01 and spine02.
spine01 and spine02 do the same as far as leaf01 and leaf02 are concerned.
LACP and Dual-connected Links
Link Aggregation Control Protocol (LACP), the IEEE standard protocol for managing bonds, is used for verifying dual-connectedness. LACP runs on the dual-connected devices and on each of the MLAG peer switches. On a dual-connected device, the only configuration requirement is to create a bond that is managed by LACP.
On each of the peer switches, you must place the links that are connected to the dual-connected host or switch in the bond. This is true even if the links are a single port on each peer switch, where each port is placed into a bond, as shown below:
All of the dual-connected bonds on the peer switches have their system ID set to the MLAG system ID. Therefore, from the point of view of the hosts, each of the links in its bond is connected to the same system and so the host uses both links.
Each peer switch periodically makes a list of the LACP partner MAC addresses for all of its bonds and sends that list to its peer (using the clagd service). The LACP partner MAC address is the MAC address of the system at the other end of a bond (server01, server02, and server03 in the figure above). When a switch receives this list from its peer, it compares the list to the LACP partner MAC addresses on its own bonds. If any matches are found and the clag-id values for those bonds match, then those bonds are dual-connected. You can find the LACP partner MAC address by running the net show bridge macs command.
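The comparison described above can be modeled in a few lines. This sketch is an illustration of the matching rule only, not the clagd implementation: the data layout (bond name mapped to partner MAC and clag-id), the MAC addresses, and the function name are all assumptions made for the example.

```python
def dual_connected_bonds(local, peer):
    """Each argument maps bond name -> (LACP partner MAC, clag-id).

    Returns the local bonds whose partner MAC and clag-id both match a peer bond.
    """
    # A clag-id of 0 disables MLAG on a bond, so such bonds never match.
    peer_pairs = {(mac.lower(), clag_id) for mac, clag_id in peer.values() if clag_id}
    return sorted(
        name for name, (mac, clag_id) in local.items()
        if clag_id and (mac.lower(), clag_id) in peer_pairs
    )


leaf01 = {"bond1": ("00:03:00:11:11:01", 1), "bond2": ("00:03:00:22:22:01", 2)}
leaf02 = {"bond1": ("00:03:00:11:11:01", 1), "bond2": ("00:03:00:33:33:01", 2)}
print(dual_connected_bonds(leaf01, leaf02))
```

Here only bond1 is reported, because bond2 sees a different partner MAC on each switch even though the clag-id matches.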
Requirements
MLAG has these requirements:
There must be a direct connection between the two peer switches configured with MLAG. This is typically a bond for increased reliability and bandwidth.
There must be only two peer switches in one MLAG configuration, but you can have multiple configurations in a network for switch-to-switch MLAG.
Both switches in the MLAG pair must be identical; they must both be the same model of switch and run the same Cumulus Linux release. See Upgrading Cumulus Linux.
The dual-connected devices (servers or switches) can use LACP (IEEE 802.3ad or 802.1ax) to form the bond. In this case, the peer switches must also use LACP.
Cumulus Linux does not support MLAG with 802.1X; the switch cannot synchronize 802.1X authenticated MAC addresses over the peerlink.
The Edgecore Minipack AS8000 and Cumulus Express CX-11128 switches do not support MLAG.
Basic Configuration
To configure MLAG, you need to create a bond that uses LACP on the dual-connected devices and configure the interfaces (including bonds, VLANs, bridges, and peer links) on each peer switch.
Follow these steps on each peer switch in the MLAG pair:
On the dual-connected device, such as a host or server that sends traffic to and from the switch, create a bond that uses LACP. The method you use varies with the type of device you are configuring.
If you cannot use LACP in your environment, you can configure the bonds in balance-xor mode.
Place every interface that connects to the MLAG pair from a dual-connected device into a bond, even if the bond contains only a single link on a single physical switch.
The following examples place swp1 in bond1 and swp2 in bond2. The examples also add a description for the bonds (an alias), which is optional.
cumulus@leaf01:~$ net add bond bond1 bond slaves swp1
cumulus@leaf01:~$ net add bond bond1 alias bond1 on swp1
cumulus@leaf01:~$ net add bond bond2 bond slaves swp2
cumulus@leaf01:~$ net add bond bond2 alias bond2 on swp2
cumulus@leaf01:~$ net pending
cumulus@leaf01:~$ net commit
Add the following lines to the /etc/network/interfaces file:
cumulus@leaf01:~$ sudo nano /etc/network/interfaces
...
auto bond1
iface bond1
alias bond1 on swp1
bond-slaves swp1
...
auto bond2
iface bond2
alias bond2 on swp2
bond-slaves swp2
...
cumulus@leaf01:~$ cl set interface bond1 bond member swp1
cumulus@leaf01:~$ cl set NEED COMMAND FOR ALIAS
cumulus@leaf01:~$ cl set interface bond2 bond member swp2
cumulus@leaf01:~$ cl set NEED COMMAND FOR ALIAS
cumulus@leaf01:~$ cl config apply
Add a unique MLAG ID (clag-id) to each bond.
You must specify a unique MLAG ID (clag-id) for every dual-connected bond on each peer switch so that switches know which links are dual-connected or are connected to the same host or switch. The value must be between 1 and 65535 and must be the same on both peer switches. A value of 0 disables MLAG on the bond.
The example commands below add an MLAG ID of 1 to bond1 and 2 to bond2:
cumulus@leaf01:~$ net add bond bond1 clag id 1
cumulus@leaf01:~$ net add bond bond2 clag id 2
cumulus@leaf01:~$ net pending
cumulus@leaf01:~$ net commit
In the /etc/network/interfaces file, add the line clag-id 1 to the auto bond1 stanza and clag-id 2 to the auto bond2 stanza:
cumulus@switch:~$ sudo nano /etc/network/interfaces
...
auto bond1
iface bond1
alias bond1 on swp1
bond-slaves swp1
clag-id 1
auto bond2
iface bond2
alias bond2 on swp2
bond-slaves swp2
clag-id 2
...
cumulus@leaf01:~$ cl set interface bond1 bond mlag id 1
cumulus@leaf01:~$ cl set interface bond2 bond mlag id 2
cumulus@leaf01:~$ cl config apply
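Because the MLAG ID must be within range, unique per switch, and identical on both peers, a small consistency check can catch mistakes before you commit. The sketch below is a hypothetical helper that assumes the same bond names are used on both peer switches, which the configuration does not require.

```python
def check_clag_ids(leaf_a, leaf_b):
    """Each argument maps bond name -> clag-id. Returns a list of problems found."""
    problems = []
    for switch, bonds in (("first switch", leaf_a), ("second switch", leaf_b)):
        seen = {}
        for bond, clag_id in bonds.items():
            if not 1 <= clag_id <= 65535:
                problems.append(f"{switch}: {bond} has clag-id {clag_id} outside 1-65535")
            elif clag_id in seen:
                problems.append(f"{switch}: {bond} and {seen[clag_id]} share clag-id {clag_id}")
            else:
                seen[clag_id] = bond
    # Assumption: a bond with the same name on both switches faces the same device.
    for bond in sorted(leaf_a.keys() & leaf_b.keys()):
        if leaf_a[bond] != leaf_b[bond]:
            problems.append(f"{bond}: clag-id differs between the peer switches")
    return problems


print(check_clag_ids({"bond1": 1, "bond2": 2}, {"bond1": 1, "bond2": 2}))
print(check_clag_ids({"bond1": 1, "bond2": 3}, {"bond1": 1, "bond2": 2}))
```

An empty list means the two dictionaries satisfy the rules stated in this step.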
Add the bonds you created above to a bridge. The example commands below add bond1 and bond2 to a VLAN-aware bridge.
On Mellanox switches, you must add all VLANs configured on the MLAG bond to the bridge so that traffic to the downstream device connected in MLAG is redirected successfully over the peerlink in case of an MLAG bond failure.
cumulus@leaf01:~$ net add bridge bridge ports bond1,bond2
cumulus@leaf01:~$ net pending
cumulus@leaf01:~$ net commit
Edit the /etc/network/interfaces file to add the bridge-ports bond1 bond2 line to the auto bridge stanza.
Create the inter-chassis bond and the peer link VLAN (as a VLAN subinterface). You also need to provide the peer link IP address, the MLAG bond interfaces, the MLAG system MAC address, and the backup interface.
By default, the NCLU command configures the inter-chassis bond with the name peerlink and the peer link VLAN with the name peerlink.4094. Use peerlink.4094 to ensure that the VLAN is independent of the bridge and spanning tree forwarding decisions.
The peer link IP address is an unrouteable link-local address that provides layer 3 connectivity between the peer switches.
NVIDIA provides a reserved range of MAC addresses for MLAG (between 44:38:39:ff:00:00 and 44:38:39:ff:ff:ff). Use a MAC address from this range to prevent conflicts with other interfaces in the same bridged network.
Do not use a multicast MAC address.
Do not use the same MAC address for different MLAG pairs; make sure you specify a different MAC address for each MLAG pair in the network.
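The MAC address guidance above can be checked mechanically. The following is a minimal sketch with hypothetical function names; the reserved range is the one quoted above, and the multicast test relies on the standard rule that the least significant bit of the first octet marks a group address.

```python
def mac_to_int(mac):
    """Convert aa:bb:cc:dd:ee:ff notation to an integer."""
    octets = mac.split(":")
    if len(octets) != 6:
        raise ValueError(f"not a MAC address: {mac}")
    return int("".join(octet.zfill(2) for octet in octets), 16)


RESERVED_LOW = mac_to_int("44:38:39:ff:00:00")
RESERVED_HIGH = mac_to_int("44:38:39:ff:ff:ff")


def in_reserved_range(mac):
    """True when the address falls inside the range reserved for MLAG."""
    return RESERVED_LOW <= mac_to_int(mac) <= RESERVED_HIGH


def is_multicast(mac):
    """True when the least significant bit of the first octet is set."""
    return bool(int(mac.split(":")[0], 16) & 1)


candidate = "44:38:39:ff:00:01"
print(in_reserved_range(candidate), is_multicast(candidate))
```

To enforce the one-address-per-pair rule as well, keep the chosen addresses in a set and reject any value that is already present.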
The backup IP address is any layer 3 backup interface for the peer link, which is used in case the peer link goes down. The backup IP address is required and must be different from the peer link IP address. It must be reachable by a route that does not use the peer link. Use the loopback or management IP address of the switch.
Loopback or Management IP Address?
If your MLAG configuration has bridged uplinks (such as a campus network or a large, flat layer 2 network), use the peer switch eth0 address. When the peer link is down, the secondary switch routes towards the eth0 address using the OOB network (provided you have implemented an OOB network).
If your MLAG configuration has routed uplinks (a modern approach to the data center fabric network), use the peer switch loopback address. When the peer link is down, the secondary switch routes towards the loopback address using uplinks (towards the spine layer). If the primary switch is also suffering a more significant problem (for example, switchd is unresponsive or stopped), the secondary switch eventually promotes itself to primary and traffic now flows normally.
When using BGP, to ensure IP connectivity between the loopbacks, the MLAG peer switches must use unique BGP ASNs; if they use the same ASN, you must bypass the BGP loop prevention check on the AS_PATH attribute.
The following examples show commands for both MLAG peers (leaf01 and leaf02).
The NCLU command is a macro command that:
Automatically creates the inter-chassis bond (peerlink) and the peer link VLAN subinterface (peerlink.4094), and adds the peerlink bond to the bridge
Configures the peer link IP address (primary is the link-local address)
Adds the MLAG system MAC address, the MLAG bond interfaces, and the backup IP address you specify
cumulus@leaf01:~$ net add clag peer sys-mac 44:38:39:BE:EF:AA interface swp49-50 primary backup-ip 10.10.10.2
cumulus@leaf01:~$ net pending
cumulus@leaf01:~$ net commit
To configure the backup link to a VRF, include the name of the VRF with the backup-ip parameter. The following example configures the backup link to VRF RED:
cumulus@leaf01:~$ net add clag peer sys-mac 44:38:39:BE:EF:AA interface swp49-50 primary backup-ip 10.10.10.2 vrf RED
cumulus@leaf01:~$ net pending
cumulus@leaf01:~$ net commit
cumulus@leaf02:~$ net add clag peer sys-mac 44:38:39:BE:EF:AA interface swp49-50 primary backup-ip 10.10.10.1
cumulus@leaf02:~$ net pending
cumulus@leaf02:~$ net commit
To configure the backup link to a VRF, include the name of the VRF with the backup-ip parameter. The following example configures the backup link to VRF RED:
cumulus@leaf02:~$ net add clag peer sys-mac 44:38:39:BE:EF:AA interface swp49-50 primary backup-ip 10.10.10.1 vrf RED
cumulus@leaf02:~$ net pending
cumulus@leaf02:~$ net commit
Edit the /etc/network/interfaces file to add the following parameters, then run the sudo ifreload -a command.
The inter-chassis bond (peerlink) with two ports in the bond (swp49 and swp50 in the example command below)
The peerlink bond to the bridge
The peer link VLAN (peerlink.4094) with the backup IP address, the peer link IP address (link-local), and the MLAG system MAC address (from the reserved range of addresses).
To configure the backup link to a VRF, include the name of the VRF with the clagd-backup-ip parameter. The following example configures the backup link to VRF RED:
cumulus@leaf01:~$ sudo nano /etc/network/interfaces
...
auto peerlink.4094
iface peerlink.4094
clagd-backup-ip 10.10.10.2 vrf RED
clagd-peer-ip linklocal
clagd-sys-mac 44:38:39:BE:EF:AA
...
Run the sudo ifreload -a command to apply all the configuration changes:
cumulus@leaf01:~$ sudo ifreload -a
To configure the backup link to a VRF, include the name of the VRF with the clagd-backup-ip parameter. The following example configures the backup link to VRF RED:
cumulus@leaf02:~$ sudo nano /etc/network/interfaces
...
auto peerlink.4094
iface peerlink.4094
clagd-backup-ip 10.10.10.1 vrf RED
clagd-peer-ip linklocal
clagd-sys-mac 44:38:39:BE:EF:AA
...
Run the sudo ifreload -a command to apply all the configuration changes:
cumulus@leaf02:~$ sudo ifreload -a
cumulus@leaf01:~$ cl set interface peerlink bond member swp49-50
cumulus@leaf01:~$ cl set mlag mac-address 44:38:39:BE:EF:AA
cumulus@leaf01:~$ cl set mlag backup 10.10.10.2
cumulus@leaf01:~$ cl set mlag peer-ip linklocal
cumulus@leaf01:~$ cl config apply
To configure the backup link to a VRF, include the name of the VRF with the backup-ip parameter. The following example configures the backup link to VRF RED:
cumulus@leaf01:~$ cl set NEED COMMAND
cumulus@leaf01:~$ cl config apply
cumulus@leaf02:~$ cl set interface swp49-50 type peerlink
cumulus@leaf02:~$ cl set mlag mac-address 44:38:39:BE:EF:AA
cumulus@leaf02:~$ cl set mlag backup 10.10.10.1
cumulus@leaf02:~$ cl set mlag peer-ip linklocal
cumulus@leaf02:~$ cl config apply
To configure the backup link to a VRF, include the name of the VRF with the backup-ip parameter. The following example configures the backup link to VRF RED:
cumulus@leaf02:~$ cl set NEED COMMAND
cumulus@leaf02:~$ cl config apply
Do not add VLAN 4094 to the bridge VLAN list; VLAN 4094 for the peer link subinterface cannot be configured as a bridged VLAN with bridge VIDs under the bridge.
Do not use 169.254.0.1 as the MLAG peer link IP address; Cumulus Linux uses this address exclusively for BGP unnumbered interfaces.
When you configure MLAG manually in the /etc/network/interfaces file, the changes take effect when you bring the peer link interface up with the sudo ifreload -a command. Do not use systemctl restart clagd.service to apply the new configuration.
The MLAG bond does not support layer 3 configuration.
MLAG synchronizes the dynamic state between the two peer switches but it does not synchronize the switch configurations. After modifying the configuration of one peer switch, you must make the same changes to the configuration on the other peer switch. This applies to all configuration changes, including:
Port configuration, such as VLAN membership, MTU and bonding parameters.
Bridge configuration, such as spanning tree parameters or bridge properties.
Static address entries, such as static FDB entries and static IGMP entries.
QoS configuration, such as ACL entries.
Optional Configuration
This section describes optional configuration procedures.
Set Roles and Priority
Each MLAG-enabled switch in the pair has a role. When the peering relationship is established between the two switches, one switch is put into the primary role and the other into the secondary role. When an MLAG-enabled switch is in the secondary role, it does not send STP BPDUs on dual-connected links; it only sends BPDUs on single-connected links. The switch in the primary role sends STP BPDUs on all single- and dual-connected links.
By default, the role is determined by comparing the MAC addresses of the two sides of the peering link; the switch with the lower MAC address assumes the primary role. You can override this by setting the priority option for the peer link:
cumulus@leaf01:~$ net add interface peerlink.4094 clag priority 2048
cumulus@leaf01:~$ net pending
cumulus@leaf01:~$ net commit
Edit the /etc/network/interfaces file and add the clagd-priority option, then run the ifreload -a command.
cumulus@switch:~$ cl set mlag priority 2048
cumulus@switch:~$ cl config apply
The switch with the lower priority value is given the primary role; the default value is 32768 and the range is between 0 and 65535.
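The role selection described above can be summarized as a comparison on priority first and MAC address second. This sketch is a simplified model, not the clagd logic: treating the MAC address as the tie-breaker when priorities are equal is an assumption drawn from the default behavior, and the switch names and addresses are made up.

```python
def elect_primary(switch_a, switch_b):
    """Each switch is a (name, priority, peer link MAC) tuple.

    The lower priority value wins; equal priorities fall back to the lower MAC.
    """
    def key(switch):
        _, priority, mac = switch
        return (priority, int(mac.replace(":", ""), 16))

    return min(switch_a, switch_b, key=key)[0]


# Both switches use the default priority of 32768, so the lower MAC wins.
print(elect_primary(("leaf01", 32768, "44:38:39:00:00:11"),
                    ("leaf02", 32768, "44:38:39:00:00:12")))
# Setting priority 2048 on leaf02 overrides the MAC comparison.
print(elect_primary(("leaf01", 32768, "44:38:39:00:00:11"),
                    ("leaf02", 2048, "44:38:39:00:00:12")))
```

The first call selects leaf01 and the second selects leaf02, matching the override shown in the commands above.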
When the clagd service exits during switch reboot or if you stop the service on the primary switch, the peer switch that is in the secondary role becomes the primary.
However, if the primary switch goes down without stopping the clagd service for any reason, or if the peer link goes down, the secondary switch does not change its role. If the peer switch is determined to not be alive, the switch in the secondary role rolls back the LACP system ID to be the bond interface MAC address instead of the MLAG system MAC address (clagd-sys-mac) and the switch in primary role uses the MLAG system MAC address as the LACP system ID on the bonds.
Set clagctl Timers
The clagd service has a number of timers that you can tune for enhanced performance:
Timer
Description
--reloadTimer <seconds>
The number of seconds to wait for the peer switch to become active. If the peer switch does not become active after the timer expires, the MLAG bonds leave the initialization (protodown) state and become active. This provides clagd with sufficient time to determine whether the peer switch is coming up or if it is permanently unreachable. The default is 300 seconds.
--peerTimeout <seconds>
The number of seconds clagd waits without receiving any messages from the peer switch before it determines that the peer is no longer active. At this point, the switch reverts all configuration changes so that it operates as a standard non-MLAG switch. This includes removing all statically assigned MAC addresses, clearing the egress forwarding mask, and allowing addresses to move from any port to the peer port. After a message is again received from the peer, MLAG operation restarts. If this parameter is not specified, clagd uses ten times the local lacpPoll value.
--initDelay <seconds>
The number of seconds clagd delays bringing up MLAG bonds and anycast IP addresses. The default is 180 seconds. NVIDIA recommends you set this parameter to 300 seconds in a scaled environment. This timer is set to 0 automatically under the following conditions:
When the peer is not alive and the backup link is not active after a reload timeout
When the peer sends a goodbye (through the peerlink or the backup link)
When both MLAG sessions come up at the same time
--sendTimeout <seconds>
The number of seconds clagd waits until the sending socket times out. If it takes longer than the sendTimeout value to send data to the peer, clagd generates an exception. The default is 30 seconds.
--lacpPoll <seconds>
The number of seconds clagd waits before obtaining local LACP information. The default is 2 seconds.
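The defaults in the table interact: when peerTimeout is not set, it follows lacpPoll. The sketch below works through that arithmetic and formats overrides as an option string. The helper names are hypothetical and the defaults are the ones listed in the table.

```python
# Defaults taken from the timer table above (seconds).
DEFAULTS = {"reloadTimer": 300, "initDelay": 180, "sendTimeout": 30, "lacpPoll": 2}


def effective_timers(overrides=None):
    """Merge overrides with the defaults and derive peerTimeout when it is unset."""
    timers = dict(DEFAULTS)
    timers.update(overrides or {})
    # peerTimeout falls back to ten times the local lacpPoll value.
    timers.setdefault("peerTimeout", 10 * timers["lacpPoll"])
    return timers


def clagd_args(overrides):
    """Format overrides as an option string such as '--peerTimeout 900'."""
    return " ".join(f"--{name} {value}" for name, value in overrides.items())


print(effective_timers()["peerTimeout"])
print(effective_timers({"lacpPoll": 5})["peerTimeout"])
print(clagd_args({"peerTimeout": 900}))
```

With the default lacpPoll of 2 seconds, the derived peerTimeout is 20 seconds; raising lacpPoll to 5 raises it to 50.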
To set a timer:
Run the net add interface peerlink.4094 clag args <timer> <value> command. The following example command sets the peer timeout (--peerTimeout) to 900 seconds:
cumulus@leaf01:~$ net add interface peerlink.4094 clag args --peerTimeout 900
cumulus@leaf01:~$ net pending
cumulus@leaf01:~$ net commit
Edit the /etc/network/interfaces file to add the clagd-args <timer> <value> line to the peerlink.4094 stanza, then run the ifreload -a command. For example, clagd-args --peerTimeout 900 sets the peer timeout to 900 seconds.
cumulus@switch:~$ cl set NEED COMMAND
cumulus@switch:~$ cl config apply
Configure MLAG with a Traditional Mode Bridge
To configure MLAG with a traditional mode bridge instead of a VLAN-aware mode bridge, you must configure the peer link and all dual-connected links as untagged (native) ports on a bridge (note the absence of any VLANs in the bridge-ports line and the lack of the bridge-vlan-aware parameter below):
...
auto br0
iface br0
bridge-ports peerlink bond1 bond2
...
The following example shows you how to allow VLAN 10 across the peer link:
...
auto br0.10
iface br0.10
bridge-ports peerlink.10 bond1.10 bond2.10
bridge-stp on
...
In an MLAG and traditional bridge configuration, NVIDIA recommends that you set bridge learning to off on all VLANs over the peerlink except for the layer 3 peerlink subinterface; for example:
...
auto peerlink
iface peerlink
bridge-learning off
auto peerlink.1510
iface peerlink.1510
bridge-learning off
auto peerlink.4094
iface peerlink.4094
...
Configure a Backup UDP Port
By default, Cumulus Linux uses UDP port 5342 with the backup IP address. To change the backup UDP port:
cumulus@leaf01:~$ net add interface peerlink.4094 clag args --backupPort 5400
cumulus@leaf01:~$ net pending
cumulus@leaf01:~$ net commit
Edit the /etc/network/interfaces file to add clagd-args --backupPort <port> to the auto peerlink.4094 stanza; for example, clagd-args --backupPort 5400.
Run the sudo ifreload -a command to apply all the configuration changes:
cumulus@leaf01:~$ sudo ifreload -a
cumulus@switch:~$ cl set NEED COMMAND
cumulus@switch:~$ cl config apply
Best Practices
Follow these best practices when configuring MLAG on your switches.
MTU and MLAG
The MTU in MLAG traffic is determined by the bridge MTU. Bridge MTU is determined by the lowest MTU setting of an interface that is a member of the bridge. If you want to set an MTU other than the default of 9216 bytes, you must configure the MTU on each physical interface and bond interface that is a member of every MLAG bridge in the entire bridged domain.
The following example commands set an MTU of 1500 for each of the bond interfaces (peerlink, uplink, bond1, bond2), which are members of the bridge named bridge:
cumulus@switch:~$ net add bond peerlink mtu 1500
cumulus@switch:~$ net add bond uplink mtu 1500
cumulus@switch:~$ net add bond bond1 mtu 1500
cumulus@switch:~$ net add bond bond2 mtu 1500
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit
Edit the /etc/network/interfaces file, then run the ifreload -a command. For example:
cumulus@switch:~$ sudo nano /etc/network/interfaces
...
auto bridge
iface bridge
bridge-ports peerlink uplink bond1 bond2
auto peerlink
iface peerlink
mtu 1500
auto bond1
iface bond1
mtu 1500
auto bond2
iface bond2
mtu 1500
auto uplink
iface uplink
mtu 1500
...
cumulus@switch:~$ sudo ifreload -a
cumulus@switch:~$ cl set interface peerlink mtu 1500
cumulus@switch:~$ cl set interface uplink mtu 1500
cumulus@switch:~$ cl set interface bond1 mtu 1500
cumulus@switch:~$ cl set interface bond2 mtu 1500
cumulus@switch:~$ cl config apply
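Because the bridge MTU follows the lowest MTU among its member interfaces, it can help to find which member is the limiting one. The sketch below reads the standard Linux sysfs entries (the brif directory of the bridge and the mtu file of each interface). The function name lowest_member_mtu and its optional second argument are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the bridge member with the lowest MTU.
#   $1 = bridge name (for example, bridge)
#   $2 = sysfs net directory; defaults to /sys/class/net
lowest_member_mtu() {
    br=$1
    root=${2:-/sys/class/net}
    lowest=""
    lowest_if=""
    # Every entry under <bridge>/brif is named after a member port.
    for port in "$root/$br/brif"/*; do
        [ -e "$port" ] || continue
        ifname=${port##*/}
        mtu=$(cat "$root/$ifname/mtu" 2>/dev/null) || continue
        if [ -z "$lowest" ] || [ "$mtu" -lt "$lowest" ]; then
            lowest=$mtu
            lowest_if=$ifname
        fi
    done
    # Prints "<interface> <mtu>", or nothing if the bridge has no members.
    if [ -n "$lowest" ]; then
        echo "$lowest_if $lowest"
    fi
}

lowest_member_mtu bridge
```

If one member still reports a different MTU than the others after you apply the configuration, that member is the one to correct.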
STP and MLAG
Always enable STP in your layer 2 network and BPDU Guard on the host-facing bond interfaces.
The STP global configuration must be the same on both peer switches.
The STP configuration for dual-connected ports must be the same on both peer switches.
The STP priority must be the same on both peer switches.
To minimize convergence times when a link transitions to the forwarding state, configure the edge ports (for tagged and untagged frames) with PortAdminEdge and BPDU Guard enabled.
Do not use a multicast MAC address for the LACP ID on systems connected to MLAG bonds; the switch drops STP BPDUs from a multicast MAC address.
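One way to spot STP differences between the peers is to compare the mstpctl attributes in copies of their /etc/network/interfaces files. The sketch below is a rough check under stated assumptions: the function names are hypothetical, both peers are assumed to use the same interface names, and only the files are compared, not the running STP state.

```shell
#!/bin/sh
# Hypothetical helper: list "<interface> <mstpctl attribute> <value>" for
# every mstpctl-* line in an interfaces file, in sorted order.
stp_settings() {
    awk '
        $1 == "iface" { ifname = $2 }
        $1 ~ /^mstpctl-/ { $1 = $1; print ifname, $0 }
    ' "$1" | sort
}

# Succeed only if both files carry identical mstpctl-* settings.
#   $1, $2 = copies of /etc/network/interfaces from each MLAG peer
stp_settings_match() {
    a=$(stp_settings "$1")
    b=$(stp_settings "$2")
    [ "$a" = "$b" ]
}

# Usage with two saved copies:
#   stp_settings_match interfaces.leaf01 interfaces.leaf02 && echo "STP settings match"
```

Sorting the extracted lines means the order of stanzas or attributes in each file does not affect the result.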
Peer Link Sizing
The peer link carries very little traffic when compared to the bandwidth consumed by dataplane traffic. In a typical MLAG configuration, almost every connection between the two switches in the MLAG pair is dual-connected, so the only traffic going across the peer link is traffic from the clagd process and some LLDP or LACP traffic. The traffic received on the peer link is not forwarded out of the dual-connected bonds.
However, there are some instances where a host is connected to only one switch in the MLAG pair; for example:
You have a hardware limitation on the host where there is only one PCIe slot, and therefore one NIC on the system, so the host is only single-connected across that interface.
The host does not support 802.3ad and you cannot create a bond on it.
You are accounting for a link failure, where the host becomes single-connected until the failure is resolved.
Determine how much bandwidth is traveling across the single-connected interfaces and allocate half of that bandwidth to the peer link. On average, one half of the traffic destined to the single-connected host arrives on the switch directly connected to the single-connected host and the other half arrives on the switch that is not directly connected to the single-connected host. When this happens, only the traffic that arrives on the switch that is not directly connected to the single-connected host needs to traverse the peer link.
In addition, you might want to add extra links to the peer link bond to handle link failures in the peer link bond itself.
For example, consider three hosts that are dual-connected to the MLAG pair:
Each host has two 10G links, one to each switch in the MLAG pair.
Each host has 20G of dual-connected bandwidth; all three hosts have a total of 60G of dual-connected bandwidth.
If all three hosts become single-connected, they have 30G of single-connected bandwidth in total, so allocate at least 15G of bandwidth to each peer link bond, which represents half of the single-connected bandwidth.
When planning for link failures for a full rack, you need only allocate enough bandwidth to meet your site strategy for handling failure scenarios. For example, for a full rack with 40 servers and two switches, you might plan for four to six servers to lose connectivity to a single switch and become single-connected before you respond to the event. Therefore, if you have 40 hosts each with 20G of bandwidth dual-connected to the MLAG pair, each host that loses a link retains 10G, so four to six such hosts have 40G to 60G of single-connected bandwidth. You might allocate between 20G and 30G of bandwidth to the peer link, which accounts for half of that single-connected bandwidth.
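The sizing rule above reduces to simple arithmetic. The sketch below is illustrative only: peerlink_gbps is a hypothetical helper, not a Cumulus Linux command. It multiplies the number of hosts you expect to be single-connected by the bandwidth each host retains on its remaining link, then halves the result.

```shell
#!/bin/sh
# Hypothetical helper: estimate peer link bandwidth in Gbps as half of the
# total single-connected bandwidth.
#   $1 = number of hosts expected to be single-connected
#   $2 = bandwidth in Gbps of the one link each such host still has
peerlink_gbps() {
    hosts=$1
    link_gbps=$2
    # Add 1 before halving so an odd total rounds up, not down.
    echo $(( (hosts * link_gbps + 1) / 2 ))
}

# Three hosts, each left with one 10G link
peerlink_gbps 3 10    # prints 15
# Four and six hosts from the full-rack example
peerlink_gbps 4 10    # prints 20
peerlink_gbps 6 10    # prints 30
```

The result is a lower bound for steady-state traffic; it does not include any extra links you add to protect against failures within the peer link bond itself.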
Peer Link Routing
When enabling a routing protocol in an MLAG environment, it is also necessary to manage the uplinks; by default MLAG is not aware of layer 3 uplink interfaces. If there is a peer link failure, MLAG does not remove static routes or bring down a BGP or OSPF adjacency unless you use a separate link state daemon such as ifplugd.
When you use MLAG with VRR, set up a routed adjacency across the peerlink.4094 interface. Without a routed connection across the peer link, when the uplinks on one switch in the MLAG pair fail, egress traffic is not forwarded if the destination is on the switch whose uplinks are down.
To set up the adjacency, configure a BGP or OSPF unnumbered peering, as appropriate for your network.
The MLAG loop avoidance mechanism also drops routed traffic that arrives on an MLAG peerlink interface and routes to a dual-connected VNI.
If you need to route unencapsulated traffic to an MLAG peer switch for VXLAN forwarding to accommodate uplink failures or other design needs, configure a routing adjacency across a separate routed interface that is not the MLAG peerlink.
For BGP, use a configuration like this:
cumulus@switch:~$ net add bgp neighbor peerlink.4094 interface remote-as internal
cumulus@switch:~$ net commit
For OSPF, use a configuration like this:
cumulus@switch:~$ net add interface peerlink.4094 ospf area 0.0.0.1
cumulus@switch:~$ net commit
cumulus@switch:~$ cl set NEED COMMAND
cumulus@switch:~$ cl config apply
If you are using EVPN and MLAG, you need to enable the EVPN address family across the peerlink.4094 interface as well:
cumulus@switch:~$ net add bgp neighbor peerlink.4094 interface remote-as internal
cumulus@switch:~$ net add bgp l2vpn evpn neighbor peerlink.4094 activate
cumulus@switch:~$ net commit
Currently unavailable
If you use NCLU to create an iBGP peering across the peer link, the net add bgp l2vpn evpn neighbor peerlink.4094 activate command creates a new eBGP neighborship when one is already configured for iBGP. The existing iBGP configuration is still valid.
MLAG Routing Support
In addition to the routing adjacency over the peer link, Cumulus Linux supports routing adjacencies from attached network devices to MLAG switches under the following conditions:
The router must physically attach to a single interface of a switch.
The attached router must peer directly to a local address on the physically connected switch.
The router cannot:
Attach to the switch over an MLAG bond interface.
Form routing adjacencies to a virtual address (VRR or VRRP).
Configuration Examples
Basic Example
The example below shows a basic MLAG configuration, where:
leaf01 and leaf02 are MLAG peers
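Three bonds (bond1, bond2, and bond3) are configured for MLAG, each with a single member port
The peer link is a bond with two member ports (swp49 and swp50)
The bridge carries three VLANs (10, 20, and 30), and each bond is an access port in one of them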
cumulus@leaf01:~$ cat /etc/network/interfaces
auto lo
iface lo inet loopback
address 10.10.10.1/32
auto mgmt
iface mgmt
vrf-table auto
address 127.0.0.1/8
address ::1/128
auto eth0
iface eth0 inet dhcp
vrf mgmt
auto bridge
iface bridge
bridge-ports peerlink
bridge-ports bond1 bond2 bond3
bridge-vids 10 20 30
bridge-vlan-aware yes
auto vlan10
iface vlan10
address 10.1.10.2/24
vlan-raw-device bridge
vlan-id 10
auto vlan20
iface vlan20
address 10.1.20.2/24
vlan-raw-device bridge
vlan-id 20
auto vlan30
iface vlan30
address 10.1.30.2/24
vlan-raw-device bridge
vlan-id 30
auto swp51
iface swp51
alias leaf to spine
auto swp49
iface swp49
alias peerlink
auto swp50
iface swp50
alias peerlink
auto peerlink
iface peerlink
bond-slaves swp49 swp50
auto peerlink.4094
iface peerlink.4094
clagd-backup-ip 10.10.10.2
clagd-peer-ip linklocal
clagd-priority 1000
clagd-sys-mac 44:38:39:BE:EF:AA
auto swp1
iface swp1
alias bond member of bond1
mtu 9000
auto bond1
iface bond1
alias bond1 on swp1
mtu 9000
clag-id 1
bridge-access 10
bond-slaves swp1
bond-lacp-bypass-allow yes
mstpctl-bpduguard yes
mstpctl-portadminedge yes
auto swp2
iface swp2
alias bond member of bond2
mtu 9000
auto bond2
iface bond2
alias bond2 on swp2
mtu 9000
clag-id 2
bridge-access 20
bond-slaves swp2
bond-lacp-bypass-allow yes
mstpctl-bpduguard yes
mstpctl-portadminedge yes
auto swp3
iface swp3
alias bond member of bond3
mtu 9000
auto bond3
iface bond3
alias bond3 on swp3
mtu 9000
clag-id 3
bridge-access 30
bond-slaves swp3
bond-lacp-bypass-allow yes
mstpctl-bpduguard yes
mstpctl-portadminedge yes
cumulus@leaf02:~$ cat /etc/network/interfaces
auto lo
iface lo inet loopback
address 10.10.10.2/32
auto mgmt
iface mgmt
vrf-table auto
address 127.0.0.1/8
address ::1/128
auto eth0
iface eth0 inet dhcp
vrf mgmt
auto bridge
iface bridge
bridge-ports peerlink
bridge-ports bond1 bond2 bond3
bridge-vids 10 20 30
bridge-vlan-aware yes
auto vlan10
iface vlan10
address 10.1.10.3/24
vlan-raw-device bridge
vlan-id 10
auto vlan20
iface vlan20
address 10.1.20.3/24
vlan-raw-device bridge
vlan-id 20
auto vlan30
iface vlan30
address 10.1.30.3/24
vlan-raw-device bridge
vlan-id 30
auto swp51
iface swp51
alias leaf to spine
auto swp49
iface swp49
alias peerlink
auto swp50
iface swp50
alias peerlink
auto peerlink
iface peerlink
bond-slaves swp49 swp50
auto peerlink.4094
iface peerlink.4094
clagd-backup-ip 10.10.10.1
clagd-peer-ip linklocal
clagd-priority 32768
clagd-sys-mac 44:38:39:BE:EF:AA
auto swp1
iface swp1
alias bond member of bond1
mtu 9000
auto bond1
iface bond1
alias bond1 on swp1
mtu 9000
clag-id 1
bridge-access 10
bond-slaves swp1
bond-lacp-bypass-allow yes
mstpctl-bpduguard yes
mstpctl-portadminedge yes
auto swp2
iface swp2
alias bond member of bond2
mtu 9000
auto bond2
iface bond2
alias bond2 on swp2
mtu 9000
clag-id 2
bridge-access 20
bond-slaves swp2
bond-lacp-bypass-allow yes
mstpctl-bpduguard yes
mstpctl-portadminedge yes
auto swp3
iface swp3
alias bond member of bond3
mtu 9000
auto bond3
iface bond3
alias bond3 on swp3
mtu 9000
clag-id 3
bridge-access 30
bond-slaves swp3
bond-lacp-bypass-allow yes
mstpctl-bpduguard yes
mstpctl-portadminedge yes
cumulus@spine01:~$ cat /etc/network/interfaces
auto lo
iface lo inet loopback
address 10.10.10.101/32
auto mgmt
iface mgmt
vrf-table auto
address 127.0.0.1/8
address ::1/128
auto eth0
iface eth0 inet dhcp
vrf mgmt
auto swp1
iface swp1
alias leaf to spine
MLAG and BGP Example
The example configuration below shows an MLAG configuration where:
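leaf01 and leaf02 are MLAG peers, and leaf03 and leaf04 are MLAG peers
Three bonds (bond1, bond2, and bond3) are configured for MLAG on each leaf, each with a single member port
The peer link is a bond with two member ports (swp49 and swp50)
The bridge carries three VLANs (10, 20, and 30), and each bond is an access port in one of them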
BGP unnumbered is configured on the leafs and spines with a routed adjacency across the peerlink.4094 interface
/etc/network/interfaces
cumulus@leaf01:~$ cat /etc/network/interfaces
auto lo
iface lo inet loopback
address 10.10.10.1/32
auto mgmt
iface mgmt
vrf-table auto
address 127.0.0.1/8
address ::1/128
auto eth0
iface eth0 inet dhcp
vrf mgmt
auto bridge
iface bridge
bridge-ports peerlink
bridge-ports bond1 bond2 bond3
bridge-vids 10 20 30
bridge-vlan-aware yes
auto vlan10
iface vlan10
address 10.1.10.2/24
vlan-raw-device bridge
vlan-id 10
auto vlan20
iface vlan20
address 10.1.20.2/24
vlan-raw-device bridge
vlan-id 20
auto vlan30
iface vlan30
address 10.1.30.2/24
vlan-raw-device bridge
vlan-id 30
auto swp51
iface swp51
alias leaf to spine
auto swp52
iface swp52
alias leaf to spine
auto swp49
iface swp49
alias peerlink
auto swp50
iface swp50
alias peerlink
auto peerlink
iface peerlink
bond-slaves swp49 swp50
auto peerlink.4094
iface peerlink.4094
clagd-backup-ip 10.10.10.2
clagd-peer-ip linklocal
clagd-priority 1000
clagd-sys-mac 44:38:39:BE:EF:AA
auto swp1
iface swp1
alias bond member of bond1
mtu 9000
auto bond1
iface bond1
alias bond1 on swp1
mtu 9000
clag-id 1
bridge-access 10
bond-slaves swp1
bond-lacp-bypass-allow yes
mstpctl-bpduguard yes
mstpctl-portadminedge yes
auto swp2
iface swp2
alias bond member of bond2
mtu 9000
auto bond2
iface bond2
alias bond2 on swp2
mtu 9000
clag-id 2
bridge-access 20
bond-slaves swp2
bond-lacp-bypass-allow yes
mstpctl-bpduguard yes
mstpctl-portadminedge yes
auto swp3
iface swp3
alias bond member of bond3
mtu 9000
auto bond3
iface bond3
alias bond3 on swp3
mtu 9000
clag-id 3
bridge-access 30
bond-slaves swp3
bond-lacp-bypass-allow yes
mstpctl-bpduguard yes
mstpctl-portadminedge yes
cumulus@leaf02:~$ cat /etc/network/interfaces
auto lo
iface lo inet loopback
address 10.10.10.2/32
auto mgmt
iface mgmt
vrf-table auto
address 127.0.0.1/8
address ::1/128
auto eth0
iface eth0 inet dhcp
vrf mgmt
auto bridge
iface bridge
bridge-ports peerlink
bridge-ports bond1 bond2 bond3
bridge-vids 10 20 30
bridge-vlan-aware yes
auto vlan10
iface vlan10
address 10.1.10.3/24
vlan-raw-device bridge
vlan-id 10
auto vlan20
iface vlan20
address 10.1.20.3/24
vlan-raw-device bridge
vlan-id 20
auto vlan30
iface vlan30
address 10.1.30.3/24
vlan-raw-device bridge
vlan-id 30
auto swp51
iface swp51
alias leaf to spine
auto swp52
iface swp52
alias leaf to spine
auto swp49
iface swp49
alias peerlink
auto swp50
iface swp50
alias peerlink
auto peerlink
iface peerlink
bond-slaves swp49 swp50
auto peerlink.4094
iface peerlink.4094
clagd-backup-ip 10.10.10.1
clagd-peer-ip linklocal
clagd-priority 32768
clagd-sys-mac 44:38:39:BE:EF:AA
auto swp1
iface swp1
alias bond member of bond1
mtu 9000
auto bond1
iface bond1
alias bond1 on swp1
mtu 9000
clag-id 1
bridge-access 10
bond-slaves swp1
bond-lacp-bypass-allow yes
mstpctl-bpduguard yes
mstpctl-portadminedge yes
auto swp2
iface swp2
alias bond member of bond2
mtu 9000
auto bond2
iface bond2
alias bond2 on swp2
mtu 9000
clag-id 2
bridge-access 20
bond-slaves swp2
bond-lacp-bypass-allow yes
mstpctl-bpduguard yes
mstpctl-portadminedge yes
auto swp3
iface swp3
alias bond member of bond3
mtu 9000
auto bond3
iface bond3
alias bond3 on swp3
mtu 9000
clag-id 3
bridge-access 30
bond-slaves swp3
bond-lacp-bypass-allow yes
mstpctl-bpduguard yes
mstpctl-portadminedge yes
cumulus@leaf03:~$ cat /etc/network/interfaces
auto lo
iface lo inet loopback
address 10.10.10.3/32
auto mgmt
iface mgmt
vrf-table auto
address 127.0.0.1/8
address ::1/128
auto eth0
iface eth0 inet dhcp
vrf mgmt
auto bridge
iface bridge
bridge-ports peerlink
bridge-ports bond1 bond2 bond3
bridge-vids 10 20 30
bridge-vlan-aware yes
auto vlan10
iface vlan10
address 10.1.10.2/24
vlan-raw-device bridge
vlan-id 10
auto vlan20
iface vlan20
address 10.1.20.2/24
vlan-raw-device bridge
vlan-id 20
auto vlan30
iface vlan30
address 10.1.30.2/24
vlan-raw-device bridge
vlan-id 30
auto swp51
iface swp51
alias leaf to spine
auto swp52
iface swp52
alias leaf to spine
auto swp49
iface swp49
alias peerlink
auto swp50
iface swp50
alias peerlink
auto peerlink
iface peerlink
bond-slaves swp49 swp50
auto peerlink.4094
iface peerlink.4094
clagd-backup-ip 10.10.10.4
clagd-peer-ip linklocal
clagd-priority 1000
clagd-sys-mac 44:38:39:BE:EF:BB
auto swp1
iface swp1
alias bond member of bond1
mtu 9000
auto bond1
iface bond1
alias bond1 on swp1
mtu 9000
clag-id 1
bridge-access 10
bond-slaves swp1
bond-lacp-bypass-allow yes
mstpctl-bpduguard yes
mstpctl-portadminedge yes
auto swp2
iface swp2
alias bond member of bond2
mtu 9000
auto bond2
iface bond2
alias bond2 on swp2
mtu 9000
clag-id 2
bridge-access 20
bond-slaves swp2
bond-lacp-bypass-allow yes
mstpctl-bpduguard yes
mstpctl-portadminedge yes
auto swp3
iface swp3
alias bond member of bond3
mtu 9000
auto bond3
iface bond3
alias bond3 on swp3
mtu 9000
clag-id 3
bridge-access 30
bond-slaves swp3
bond-lacp-bypass-allow yes
mstpctl-bpduguard yes
mstpctl-portadminedge yes
cumulus@leaf04:~$ cat /etc/network/interfaces
auto lo
iface lo inet loopback
address 10.10.10.4/32
auto mgmt
iface mgmt
vrf-table auto
address 127.0.0.1/8
address ::1/128
auto eth0
iface eth0 inet dhcp
vrf mgmt
auto bridge
iface bridge
bridge-ports peerlink
bridge-ports bond1 bond2 bond3
bridge-vids 10 20 30
bridge-vlan-aware yes
auto vlan10
iface vlan10
address 10.1.10.3/24
vlan-raw-device bridge
vlan-id 10
auto vlan20
iface vlan20
address 10.1.20.3/24
vlan-raw-device bridge
vlan-id 20
auto vlan30
iface vlan30
address 10.1.30.3/24
vlan-raw-device bridge
vlan-id 30
auto swp51
iface swp51
alias leaf to spine
auto swp52
iface swp52
alias leaf to spine
auto swp49
iface swp49
alias peerlink
auto swp50
iface swp50
alias peerlink
auto peerlink
iface peerlink
bond-slaves swp49 swp50
auto peerlink.4094
iface peerlink.4094
clagd-backup-ip 10.10.10.3
clagd-peer-ip linklocal
clagd-priority 32768
clagd-sys-mac 44:38:39:BE:EF:BB
auto swp1
iface swp1
alias bond member of bond1
mtu 9000
auto bond1
iface bond1
alias bond1 on swp1
mtu 9000
clag-id 1
bridge-access 10
bond-slaves swp1
bond-lacp-bypass-allow yes
mstpctl-bpduguard yes
mstpctl-portadminedge yes
auto swp2
iface swp2
alias bond member of bond2
mtu 9000
auto bond2
iface bond2
alias bond2 on swp2
mtu 9000
clag-id 2
bridge-access 20
bond-slaves swp2
bond-lacp-bypass-allow yes
mstpctl-bpduguard yes
mstpctl-portadminedge yes
auto swp3
iface swp3
alias bond member of bond3
mtu 9000
auto bond3
iface bond3
alias bond3 on swp3
mtu 9000
clag-id 3
bridge-access 30
bond-slaves