NVIDIA® Cumulus Linux is the first full-featured, Debian Buster-based Linux operating system for the networking industry.
This user guide provides in-depth documentation on the Cumulus Linux installation process, system configuration and management, network solutions, and monitoring and troubleshooting recommendations. In addition, the quick start guide provides an end-to-end setup process to get you started.
Cumulus Linux 5.6 includes the NVIDIA NetQ agent and CLI. You can use NetQ to monitor and manage your data center network infrastructure and operational health. Refer to the NVIDIA NetQ documentation for details.
For a list of the new features in this release, see What's New. For bug fixes and known issues present in this release, refer to the Cumulus Linux 5.6 Release Notes.
Try It Pre-built Demos
The Cumulus Linux documentation includes pre-built Try It demos for certain Cumulus Linux features. The Try It demos run a simulation in NVIDIA Air, a cloud-hosted platform that works exactly like a real-world production deployment. Use the Try It demos to examine switch configuration for a feature. For more information, see Try It Pre-built Demos.
Open Source Contributions
To implement various Cumulus Linux features, NVIDIA has forked various software projects, like CFEngine Netdev and some Puppet Labs packages. Some of the forked code resides in the NVIDIA Networking GitHub repository and some is available as part of the Cumulus Linux repository as Debian source packages.
NVIDIA has also developed and released new applications as open source. The list of open source projects is on the Cumulus Linux packages page.
Download the User Guide
Use one of the following methods to download the Cumulus Linux user guide and view it offline:
Host the documentation on a local host using hugo.
For a fully functional copy of the user guide, download a zip file of an HTML documentation build for offline use. Download the desired version, extract it locally, then open cumulus-linux-56.html in your web browser.
To view this user guide as a single page to print to a PDF with limited functionality, click here.
Click on the link one time and use the web browser print-to-PDF option to save the PDF locally.
What's New
This document supports the Cumulus Linux 5.6 release, and lists new platforms, features, and enhancements.
Enable and disable external API access commands (Cumulus Linux 5.6 and later enables the NVUE REST API by default; be sure to disable or secure the API if needed)
Changes to the nv show platform command outputs to improve readability:
Shows a concise summary of all sensors
Improved memory format
Modified disk size output from KB to GB
Output fields display n/a for VX models where needed
Removed the applied column from the nv show platform hardware component device command
Removed the applied and pending columns from the nv show platform software installed python3-nvue command
The nv show commands provide a --filter option to filter output data
EVPN multihoming configuration with NVUE no longer supports a 10-byte ESI value starting with a non 00 hex value
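Because this release enables the NVUE REST API by default, you might want to disable the API or narrow where it listens. The following sketch only prints NVUE commands (taken from the New NVUE Commands list in this document) so that you can review them before running them on a switch. The helper name api_commands and the example values 8765 and localhost are assumptions for illustration, not values confirmed by this guide.

```shell
#!/bin/sh
# Print (do not run) NVUE commands that set the REST API state, port,
# and listening address.
# Usage: api_commands <enabled|disabled> [port] [listening-address]
# api_commands is a hypothetical helper, not a Cumulus Linux tool.
api_commands() {
    state=$1
    port=${2:-}
    address=${3:-}

    # Validate everything first so that no partial output is printed.
    case "$state" in
        enabled|disabled) ;;
        *) echo "state must be enabled or disabled" >&2; return 1 ;;
    esac
    if [ -n "$port" ]; then
        case "$port" in
            *[!0-9]*) echo "port must be a number" >&2; return 1 ;;
        esac
        # The command list gives the valid port range as 1-65535.
        if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
            echo "port must be between 1 and 65535" >&2; return 1
        fi
    fi

    echo "nv set system api state $state"
    if [ -n "$port" ]; then echo "nv set system api port $port"; fi
    if [ -n "$address" ]; then echo "nv set system api listening-address $address"; fi
    echo "nv config apply"
}

# Example values (8765, localhost) are assumptions for illustration.
api_commands enabled 8765 localhost
```

To disable the API entirely, the same helper prints only the state command followed by nv config apply when you call it as api_commands disabled.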
New NVUE Commands
For descriptions and examples of all NVUE commands, refer to the NVUE Command Reference for Cumulus Linux.
nv show router pbr nexthop-group
nv show router pbr nexthop-group <nexthop-group-id>
nv show bridge domain <domain-id> port
nv show bridge domain <domain-id> stp counters
nv show bridge domain <domain-id> stp port
nv show bridge domain <domain-id> stp port <interface-id>
nv show bridge domain <domain-id> stp vlan
nv show bridge domain <domain-id> stp vlan <vid>
nv show evpn vni <vni-id> remote-vtep
nv show qos pfc-watchdog
nv show interface <interface-id> ip igmp group
nv show interface <interface-id> ip igmp group <static-group-id>
nv show interface <interface-id> bridge domain <domain-id> stp vlan
nv show interface <interface-id> bridge domain <domain-id> stp vlan <vid>
nv show interface <interface-id> qos pfc-watchdog
nv show interface <interface-id> qos pfc-watchdog status
nv show interface <interface-id> qos pfc-watchdog status <qos-tc-id>
nv show service ptp <instance-id> counters
nv show system api
nv show system api listening-address
nv show system api listening-address <listening-address-id>
nv show system api connections
nv show system global arp
nv show system global arp garbage-collection-threshold
nv show system global nd
nv show system global nd garbage-collection-threshold
nv show system ssh-server
nv show system ssh-server max-unauthenticated
nv show system ssh-server vrf
nv show system ssh-server vrf <vrf-id>
nv show system ssh-server allow-users
nv show system ssh-server allow-users <user-id>
nv show system ssh-server deny-users
nv show system ssh-server deny-users <user-id>
nv show system ssh-server port
nv show system ssh-server port <port-id>
nv show system ssh-server active-sessions
nv set router adaptive-routing link-utilization-threshold (on|off)
nv set router password-obfuscation (enabled|disabled)
nv set bridge domain <domain-id> stp vlan <vid>
nv set bridge domain <domain-id> stp vlan <vid> bridge-priority 4096-61440
nv set bridge domain <domain-id> stp vlan <vid> hello-time 1-10
nv set bridge domain <domain-id> stp vlan <vid> forward-delay 4-30
nv set bridge domain <domain-id> stp vlan <vid> max-age 6-40
nv set bridge domain <domain-id> stp mode (rstp|pvrst)
nv set qos pfc-watchdog polling-interval 100-5000
nv set qos pfc-watchdog robustness 1-1000
nv set interface <interface-id> ip igmp fast-leave
nv set interface <interface-id> ip igmp last-member-query-count
nv set interface <interface-id> router adaptive-routing link-utilization-threshold 1-100
nv set interface <interface-id> bridge domain <domain-id> stp vlan <vid>
nv set interface <interface-id> bridge domain <domain-id> stp vlan <vid> priority 0-240
nv set interface <interface-id> bridge domain <domain-id> stp vlan <vid> path-cost
nv set interface <interface-id> bridge domain <domain-id> stp path-cost 1-200000000
nv set interface <interface-id> qos pfc-watchdog state (enable|disable)
nv set service ptp <instance-id> profile <profile-id> two-step (on|off)
nv set service ptp <instance-id> two-step (on|off)
nv set system api listening-address <listening-address-id>
nv set system api state (enabled|disabled)
nv set system api port 1-65535
nv set system global arp base-reachable-time (30-2147483|auto)
nv set system global nd base-reachable-time (30-2147483|auto)
nv set system ssh-server max-unauthenticated session-count 1-10000
nv set system ssh-server max-unauthenticated throttle-percent 1-100
nv set system ssh-server max-unauthenticated throttle-start 1-10000
nv set system ssh-server vrf <vrf-id>
nv set system ssh-server allow-users <user-id>
nv set system ssh-server deny-users <user-id>
nv set system ssh-server port <port-id>
nv set system ssh-server authentication-retries 3-100
nv set system ssh-server login-timeout 1-600
nv set system ssh-server inactive-timeout <value>
nv set system ssh-server permit-root-login (disabled|prohibit-password|forced-commands-only|enabled)
nv set system ssh-server max-sessions-per-connection 1-100
nv set system ssh-server state (enabled|disabled)
nv unset router adaptive-routing link-utilization-threshold
nv unset router password-obfuscation
nv unset bridge domain <domain-id> stp vlan
nv unset bridge domain <domain-id> stp vlan <vid>
nv unset bridge domain <domain-id> stp vlan <vid> bridge-priority
nv unset bridge domain <domain-id> stp vlan <vid> hello-time
nv unset bridge domain <domain-id> stp vlan <vid> forward-delay
nv unset bridge domain <domain-id> stp vlan <vid> max-age
nv unset bridge domain <domain-id> stp mode
nv unset qos pfc-watchdog
nv unset qos pfc-watchdog polling-interval
nv unset qos pfc-watchdog robustness
nv unset interface <interface-id> ip igmp fast-leave
nv unset interface <interface-id> ip igmp last-member-query-count
nv unset interface <interface-id> bridge domain <domain-id> stp vlan
nv unset interface <interface-id> bridge domain <domain-id> stp vlan <vid>
nv unset interface <interface-id> bridge domain <domain-id> stp vlan <vid> priority
nv unset interface <interface-id> bridge domain <domain-id> stp vlan <vid> path-cost
nv unset interface <interface-id> bridge domain <domain-id> stp path-cost
nv unset interface <interface-id> qos pfc-watchdog
nv unset interface <interface-id> qos pfc-watchdog state
nv unset service ptp <instance-id> profile <profile-id> two-step
nv unset service ptp <instance-id> two-step
nv unset system api
nv unset system api listening-address
nv unset system api listening-address <listening-address-id>
nv unset system api state
nv unset system api port
nv unset system api port 1-65535
nv unset system global arp base-reachable-time
nv unset system global nd base-reachable-time
nv unset system ssh-server
nv unset system ssh-server max-unauthenticated
nv unset system ssh-server max-unauthenticated session-count
nv unset system ssh-server max-unauthenticated throttle-percent
nv unset system ssh-server max-unauthenticated throttle-start
nv unset system ssh-server vrf
nv unset system ssh-server vrf <vrf-id>
nv unset system ssh-server allow-users
nv unset system ssh-server allow-users <user-id>
nv unset system ssh-server deny-users
nv unset system ssh-server deny-users <user-id>
nv unset system ssh-server port
nv unset system ssh-server port <port-id>
nv unset system ssh-server authentication-retries
nv unset system ssh-server login-timeout
nv unset system ssh-server inactive-timeout
nv unset system ssh-server permit-root-login
nv unset system ssh-server max-sessions-per-connection
nv unset system ssh-server state
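Several of the nv set commands listed above accept only a bounded integer, such as the per-VLAN STP timers (hello-time 1-10, forward-delay 4-30, max-age 6-40). As a minimal sketch, the following script checks values against those ranges and prints the matching commands without running them. The function names in_range and stp_vlan_timer_commands and the example values are hypothetical.

```shell
#!/bin/sh
# in_range VALUE MIN MAX: succeed only if VALUE is an integer in [MIN, MAX].
in_range() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Print per-VLAN STP timer commands after checking the ranges given in
# the command list. This is a hypothetical helper that only prints text.
# Usage: stp_vlan_timer_commands <domain> <vid> <hello> <forward> <max-age>
stp_vlan_timer_commands() {
    domain=$1 vid=$2 hello=$3 forward=$4 maxage=$5
    in_range "$hello" 1 10   || { echo "hello-time must be 1-10" >&2; return 1; }
    in_range "$forward" 4 30 || { echo "forward-delay must be 4-30" >&2; return 1; }
    in_range "$maxage" 6 40  || { echo "max-age must be 6-40" >&2; return 1; }
    echo "nv set bridge domain $domain stp vlan $vid hello-time $hello"
    echo "nv set bridge domain $domain stp vlan $vid forward-delay $forward"
    echo "nv set bridge domain $domain stp vlan $vid max-age $maxage"
}

# Example values are assumptions for illustration.
stp_vlan_timer_commands br_default 10 2 15 20
```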
Cumulus Linux 5.6 includes the NVUE object model. After you upgrade to Cumulus Linux 5.6, running NVUE configuration commands might override configuration for features that are now configurable with NVUE and remove configuration you added manually to files or with automation tools like Ansible, Chef, or Puppet. To keep your configuration, you can do one of the following:
Use Linux and FRR (vtysh) commands instead of NVUE for all switch configuration.
Cumulus Linux 3.7, 4.3, and 4.4 continue to support NCLU. For more information, contact your NVIDIA Spectrum platform sales representative.
Quick Start Guide
This quick start guide provides an end-to-end setup process for installing and running Cumulus Linux.
Prerequisites
This guide assumes you have intermediate-level Linux knowledge. You need to be familiar with basic text editing, Unix file permissions, and process monitoring. A variety of text editors are pre-installed, including vi and nano.
You must have access to a Linux or UNIX shell. If you are running Windows, use a Linux environment like Cygwin as your command line tool for interacting with Cumulus Linux.
Get Started
Cumulus Linux is on the switch by default. To upgrade to a different Cumulus Linux release or re-install Cumulus Linux, refer to Installation Management. To show the current Cumulus Linux release on the switch, run the NVUE nv show system command.
When starting Cumulus Linux for the first time, the management port makes a DHCPv4 request. To determine the IP address of the switch, you can cross reference the MAC address of the switch with your DHCP server. The MAC address is typically located on the side of the switch or on the box in which the unit ships.
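If your DHCP server is ISC dhcpd, one way to cross reference the MAC address is to search the server's leases file. The following is a minimal sketch under that assumption. The function name find_ip_by_mac, the default leases path, and the sample MAC and IP addresses are all hypothetical; adjust them for your DHCP server.

```shell
#!/bin/sh
# find_ip_by_mac MAC [LEASES_FILE]
# Print the IP address of the most recent lease recorded for MAC.
find_ip_by_mac() {
    mac=$1
    leases=${2:-/var/lib/dhcp/dhcpd.leases}   # assumed default path
    awk -v target="$mac" '
        $1 == "lease" { ip = $2 }
        $1 == "hardware" && $2 == "ethernet" {
            m = $3
            sub(/;$/, "", m)                   # drop the trailing semicolon
            if (tolower(m) == tolower(target)) found = ip
        }
        END { if (found != "") print found }
    ' "$leases"
}

# Demonstration with a made-up leases file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
lease 192.0.2.50 {
  hardware ethernet 44:38:39:00:00:01;
}
lease 192.0.2.51 {
  hardware ethernet 44:38:39:00:00:02;
}
EOF
find_ip_by_mac 44:38:39:00:00:02 "$sample"
rm -f "$sample"
```

The comparison ignores case, so you can enter the MAC address as printed on the switch label regardless of how the server records it.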
To get started:
Log in to Cumulus Linux on the switch and change the default credentials.
Configure Cumulus Linux. This quick start guide provides instructions on changing the hostname of the switch, setting the date and time, and configuring switch ports and a loopback interface.
You can choose to configure Cumulus Linux either with NVUE commands or Linux commands (with vtysh or by manually editing configuration files). Do not run both NVUE configuration commands (such as nv set, nv unset, nv action, nv config) and Linux commands to configure the switch. NVUE commands replace the configuration in files such as /etc/network/interfaces and /etc/frr/frr.conf, and remove any configuration you add manually or with automation tools like Ansible, Chef, or Puppet.
If you choose to configure Cumulus Linux with NVUE, you can configure features that do not yet support the NVUE Object Model by creating NVUE Snippets.
Login Credentials
The default installation includes two accounts:
The system account (root) has full system privileges. Cumulus Linux locks the root account password by default (which prohibits login).
The user account (cumulus) has sudo privileges. The cumulus account uses the default password cumulus.
When you log in for the first time with the cumulus account, Cumulus Linux prompts you to change the default password. After you provide a new password, the SSH session disconnects and you have to reconnect with the new password.
In this quick start guide, you use the cumulus account to configure Cumulus Linux.
All accounts except root can use remote SSH login; you can use sudo to grant a non-root account root-level access. Commands that change the system configuration require this elevated level of access.
NVIDIA recommends you perform management and configuration over the network, either in band or out of band. A serial console is fully supported.
Typically, switches ship from the manufacturer with a mating DB9 serial cable. Switches with ONIE are always set to a 115200 baud rate.
Wired Ethernet Management
A Cumulus Linux switch always provides at least one dedicated Ethernet management port called eth0. This interface is specifically for out-of-band management use. The management interface uses DHCPv4 for addressing by default.
To set a static IP address:
cumulus@switch:~$ nv set interface eth0 ip address 192.0.2.42/24
cumulus@switch:~$ nv set interface eth0 ip gateway 192.0.2.1
cumulus@switch:~$ nv config apply
The command prompt in the terminal does not reflect the new hostname until you either log out of the switch or start a new shell.
Configure the Time Zone
The default time zone on the switch is UTC (Coordinated Universal Time). Change the time zone on your switch to be the time zone for your location.
To update the time zone:
Run the nv set system timezone <timezone> command. To see all the available time zones, run nv set system timezone and press the Tab key. The following example sets the time zone to US/Eastern:
cumulus@switch:~$ nv set system timezone US/Eastern
cumulus@switch:~$ nv config apply
In a terminal, run the following command:
cumulus@switch:~$ sudo dpkg-reconfigure tzdata
Follow the on-screen menu options to select the geographic area and region.
Programs that are already running (including log files) and logged-in users do not see time zone changes. To set the time zone for all services and daemons, reboot the switch.
Verify the System Time
Verify that the date and time on the switch are correct with the Linux date command:
cumulus@switch:~$ date
Mon 21 Nov 2022 06:30:37 PM UTC
If the date and time are incorrect, the switch does not synchronize with automation tools, such as Puppet, and returns errors after you restart switchd.
To set the software clock according to the configured time zone, run the Linux sudo date -s command; for example:
cumulus@switch:~$ sudo date -s "Tue Jan 26 00:37:13 2021"
NTP starts at boot by default on the switch and the NTP configuration includes default servers. To customize NTP, see NTP.
PTP is off by default on the switch. To configure PTP, see PTP.
Configure Breakout Ports with Splitter Cables
If you are using 4x10G DAC or AOC cables, or you want to break out (split) switch ports, configure the breakout ports; see Switch Port Attributes.
Test Cable Connectivity
By default, Cumulus Linux disables all data plane ports (every Ethernet port except the management interface, eth0). To test cable connectivity, administratively enable physical ports.
To enable a port administratively:
cumulus@switch:~$ nv set interface swp1
cumulus@switch:~$ nv config apply
To enable all physical ports administratively on a switch that has ports numbered from swp1 to swp52:
cumulus@switch:~$ nv set interface swp1-52
cumulus@switch:~$ nv config apply
To view link status, run the nv show interface command.
To enable a port administratively:
cumulus@switch:~$ sudo ip link set swp1 up
To enable all physical ports administratively, run the following bash script:
cumulus@switch:~$ sudo su -
cumulus@switch:~$ for i in /sys/class/net/*; do iface=$(basename "$i"); if [[ $iface == swp* ]]; then ip link set "$iface" up; fi; done
To view link status, run the ip link show command.
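The Linux loop above walks /sys/class/net and brings up every interface whose name starts with swp. The following sketch expresses the same idea as a function that prints the ip link commands instead of running them, so you can review the list first. The function name swp_up_commands and the optional directory argument are assumptions added for safe experimentation.

```shell
#!/bin/sh
# swp_up_commands [SYSFS_DIR]
# Print one "ip link set <port> up" line per swp interface found.
# swp_up_commands is a hypothetical helper; it changes nothing itself.
swp_up_commands() {
    sysfs_root=${1:-/sys/class/net}
    for path in "$sysfs_root"/swp*; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$path" ] || continue
        echo "ip link set ${path##*/} up"
    done
}

swp_up_commands
```

After you review the output, you can run it as root, for example by piping it to sh.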
Configure Layer 2 Ports
Cumulus Linux does not put all ports into a bridge by default. To create a bridge and configure one or more front panel ports as members of the bridge:
The following configuration example places the front panel port swp1 into the default bridge called br_default:
...
auto br_default
iface br_default
bridge-ports swp1
...
To put a range of ports into a bridge, use the glob keyword. For example, to add swp1 through swp10, swp12, and swp14 through swp20 to the bridge called br_default:
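The port list in this example mixes ranges and single ports. The following sketch expands such a list into individual interface names so that you can confirm which ports a range covers before you add them to the bridge. It handles only simple swpN-M ranges, does not touch the switch configuration, and the function name expand_ports is hypothetical.

```shell
#!/bin/sh
# expand_ports SPEC...
# Expand swpN-M ranges into individual port names; print other
# arguments unchanged. Hypothetical helper for illustration only.
expand_ports() {
    for spec in "$@"; do
        case "$spec" in
            swp*-*)
                range=${spec#swp}      # for example "1-10"
                first=${range%-*}
                last=${range#*-}
                i=$first
                while [ "$i" -le "$last" ]; do
                    echo "swp$i"
                    i=$((i + 1))
                done
                ;;
            *) echo "$spec" ;;
        esac
    done
}

expand_ports swp1-10 swp12 swp14-20
```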
You can configure a front panel port or bridge interface as a layer 3 port.
The following configuration example configures the front panel port swp1 as a layer 3 access port:
cumulus@switch:~$ nv set interface swp1 ip address 10.0.0.0/31
cumulus@switch:~$ nv config apply
To add an IP address to a bridge interface, you must put it into a VLAN interface. If you want to use a VLAN other than the native one, set the bridge PVID:
cumulus@switch:~$ nv set interface swp1-2 bridge domain br_default
cumulus@switch:~$ nv set bridge domain br_default vlan 10
cumulus@switch:~$ nv set interface vlan10 ip address 10.1.10.2/24
cumulus@switch:~$ nv set bridge domain br_default untagged 1
cumulus@switch:~$ nv config apply
The following configuration example configures the front panel port swp1 as a layer 3 access port:
auto swp1
iface swp1
address 10.0.0.0/31
To add an IP address to a bridge interface, include the address under the iface stanza in the /etc/network/interfaces file. If you want to use a VLAN other than the native one, set the bridge PVID:
If there are no errors, run the following command:
cumulus@switch:~$ sudo ifup -a
Configure a Loopback Interface
Cumulus Linux has a preconfigured loopback interface. When the switch boots up, the loopback interface, called lo, is up and assigned an IP address of 127.0.0.1.
The loopback interface lo must always exist on the switch and must always be up. To check the status of the loopback interface, run the NVUE nv show interface lo command or the Linux ip addr show lo command.
To add an IP address to a loopback interface, configure the lo interface:
cumulus@switch:~$ nv set interface lo ip address 10.10.10.1/32
cumulus@switch:~$ nv config apply
Add the IP address directly under the iface lo inet loopback definition in the /etc/network/interfaces file:
auto lo
iface lo inet loopback
address 10.10.10.1
If you configure an IP address without a subnet mask, it becomes a /32 IP address. For example, 10.10.10.1 is 10.10.10.1/32.
If you run NVUE commands to configure the switch, run the nv config save command before you reboot. The command saves the applied configuration to the startup configuration so that the changes persist after the reboot.
cumulus@switch:~$ nv config save
Show Platform and System Settings
To show the hostname of the switch, the time zone, and the version of Cumulus Linux running on the switch, run the NVUE nv show system command.
To show switch platform information, such as the ASIC model, CPU, hard disk drive size, RAM size, and port layout, run the NVUE nv show platform hardware command.
Next Steps
You are now ready to configure the switch according to your needs. This guide provides separate sections that describe how to configure system, layer 1, layer 2, layer 3, and network virtualization settings. Each section includes example configurations and pre-built demos.
For a deep dive into the NVUE object model that provides a CLI to simplify configuration, see NVUE.
Installation Management
This section describes how to manage, install, and upgrade Cumulus Linux on your switch.
Managing Cumulus Linux Disk Images
The Cumulus Linux operating system resides on a switch as a disk image. This section discusses how to manage the image.
Reprovisioning the system deletes all system data from the switch.
To stage an ONIE installer from the network (where ONIE automatically locates the installer), run the onie-select -i command. You must reboot the switch to start the install process.
cumulus@switch:~$ sudo onie-select -i
WARNING:
WARNING: Operating System install requested.
WARNING: This will wipe out all system data.
WARNING:
Are you sure (y/N)? y
Enabling install at next reboot...done.
Reboot required to take effect.
To cancel a pending reinstall operation, run the onie-select -c command:
cumulus@switch:~$ sudo onie-select -c
Cancelling pending install at next reboot...done.
To stage an installer located in a specific location, run the onie-install -i <location> command. You can specify a local absolute or relative path, an HTTP or HTTPS server, or an SCP or FTP server. You can also stage a Zero Touch Provisioning (ZTP) script along with the installer.
You typically use the onie-install command with the -a option to activate installation. If you do not specify the -a option, you must reboot the switch to start the installation process.
The following example stages the installer located at http://203.0.113.10/image-installer together with the ZTP script located at http://203.0.113.10/ztp-script and activates installation and ZTP:
You can also specify these options together in the same command. For example:
cumulus@switch:~$ sudo onie-install -i http://203.0.113.10/image-installer -z http://203.0.113.10/ztp-script -a
To see more onie-install options, run man onie-install.
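If you stage images from scripts, it can help to assemble the onie-install command line in one place. The following sketch builds the command from an installer location, an optional ZTP script location, and an optional activate flag, using only the -i, -z, and -a options shown above. It prints the command rather than running it, and the function name onie_install_command is hypothetical.

```shell
#!/bin/sh
# onie_install_command INSTALLER [ZTP_SCRIPT] [yes|no]
# Print an onie-install command line; the third argument controls -a.
# Hypothetical helper: it only prints text and runs nothing.
onie_install_command() {
    installer=$1
    ztp=${2:-}
    activate=${3:-no}

    if [ -z "$installer" ]; then
        echo "installer location required" >&2
        return 1
    fi
    cmd="sudo onie-install -i $installer"
    if [ -n "$ztp" ]; then cmd="$cmd -z $ztp"; fi
    if [ "$activate" = "yes" ]; then cmd="$cmd -a"; fi
    echo "$cmd"
}

onie_install_command http://203.0.113.10/image-installer \
    http://203.0.113.10/ztp-script yes
```

Without the activate flag, the printed command stages the installer only, and you must reboot the switch to start the installation.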
Migrate from Cumulus Linux to ONIE (Uninstall All Images and Remove the Configuration)
To remove all installed images and configurations, and return the switch to its factory defaults, run the onie-select -k command.
The onie-select -k command takes a long time to run as it overwrites the entire NOS section of the flash. Only use this command if you want to erase all NOS data and take the switch out of service.
cumulus@switch:~$ sudo onie-select -k
WARNING:
WARNING: Operating System uninstall requested.
WARNING: This will wipe out all system data.
WARNING:
Are you sure (y/N)? y
Enabling uninstall at next reboot...done.
Reboot required to take effect.
You must reboot the switch to start the uninstallation process.
To cancel a pending uninstall operation, run the onie-select -c command:
cumulus@switch:~$ sudo onie-select -c
Cancelling pending uninstall at next reboot...done.
Boot Into Rescue Mode
If your system becomes unresponsive, you can correct certain issues by booting into ONIE rescue mode, which uses unmounted file systems. You can use various Cumulus Linux utilities to try to resolve a problem.
To reboot the system into ONIE rescue mode, run the onie-select -r command:
cumulus@switch:~$ sudo onie-select -r
WARNING:
WARNING: Rescue boot requested.
WARNING:
Are you sure (y/N)? y
Enabling rescue at next reboot...done.
Reboot required to take effect.
You must reboot the system to boot into rescue mode.
To cancel a pending rescue boot operation, run the onie-select -c command:
cumulus@switch:~$ sudo onie-select -c
Cancelling pending rescue at next reboot...done.
Inspect the Image File
The Cumulus Linux image file is executable. From a running switch, you can display, extract, and verify the contents of the image file.
To display the contents of the Cumulus Linux image file, pass the info option to the image file. For example, to display the contents of an image file called onie-installer located in the /var/lib/cumulus/installer directory:
To extract the contents of the image file, use the extract <path> option. For example, to extract an image file called onie-installer located in the /var/lib/cumulus/installer directory to the mypath directory:
cumulus@switch:~$ sudo /var/lib/cumulus/installer/onie-installer extract mypath
total 181860
-rw-r--r-- 1 4000 4000 308 May 16 19:04 control
drwxr-xr-x 5 4000 4000 4096 Apr 26 21:28 embedded-installer
-rw-r--r-- 1 4000 4000 13273936 May 16 19:04 initrd
-rw-r--r-- 1 4000 4000 4239088 May 16 19:04 kernel
-rw-r--r-- 1 4000 4000 168701528 May 16 19:04 sysroot.tar
To verify the contents of the image file, use the verify option. For example, to verify the contents of an image file called onie-installer located in the /var/lib/cumulus/installer directory:
cumulus@switch:~$ sudo /var/lib/cumulus/installer/onie-installer verify
Verifying image checksum ...OK.
Preparing image archive ... OK.
./cumulus-linux-bcm-amd64.bin.1: 161: ./cumulus-linux-bcm-amd64.bin.1: onie-sysinfo: not found
Verifying image compatibility ...OK.
Verifying system ram ...OK.
The default password for the cumulus user account is cumulus. The first time you log into Cumulus Linux, you must change this default password. Be sure to update any automation scripts before installing a new image. Cumulus Linux provides command line options to change the default password automatically during the installation process. Refer to ONIE Installation Options.
You can install a new Cumulus Linux image using ONIE, an open source project (equivalent to PXE on servers) that enables the installation of network operating systems (NOS) on bare metal switches.
Before you install Cumulus Linux, the switch can be in two different states:
The switch does not contain an image (the switch is only running ONIE).
Cumulus Linux is already on the switch but you want to use ONIE to reinstall Cumulus Linux or upgrade to a newer version.
The sections below describe some of the different ways you can install the Cumulus Linux image. Steps show how to install directly from ONIE (if no image is on the switch) and from Cumulus Linux (if the image is already on the switch). For additional methods to find and install the Cumulus Linux image, see the ONIE Design Specification.
Installing the Cumulus Linux image is destructive; configuration files on the switch are not saved; copy them to a different server before installing.
In the following procedures:
You can name your Cumulus Linux image using any of the ONIE naming schemes mentioned here.
Run the sudo onie-install -h command to show the ONIE installer options.
Install Using a DHCP/Web Server With DHCP Options
To install Cumulus Linux using a DHCP or web server with DHCP options, set up a DHCP/web server on your laptop and connect the eth0 management port of the switch to your laptop. After you connect the cable, the installation proceeds as follows:
The switch boots up and requests an IP address (DHCP request).
The DHCP server acknowledges and responds with DHCP option 114 and the location of the installation image.
ONIE downloads the Cumulus Linux image, installs, and reboots.
You are now running Cumulus Linux.
The most common way is to send DHCP option 114 with the entire URL to the web server (this can be the same system). However, there are other ways you can use DHCP even if you do not have full control over DHCP. See the ONIE user guide for information on partial installer URLs and advanced DHCP options; both articles list more supported DHCP options.
Here is an example DHCP configuration with an ISC DHCP server:
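As a minimal sketch, the following script prints a subnet declaration for ISC dhcpd that sends the installer URL with the default-url option, which is the ISC dhcpd name for DHCP option 114. The function name dhcpd_onie_snippet and the subnet, address range, and URL are hypothetical placeholders; replace them with the values for your management network before you add the output to dhcpd.conf.

```shell
#!/bin/sh
# dhcpd_onie_snippet SUBNET NETMASK RANGE_START RANGE_END INSTALLER_URL
# Print an ISC dhcpd subnet block that advertises the installer URL.
# Hypothetical helper; all example values below are placeholders.
dhcpd_onie_snippet() {
    subnet=$1 netmask=$2 range_start=$3 range_end=$4 url=$5
    cat <<EOF
subnet $subnet netmask $netmask {
  range $range_start $range_end;
  option default-url = "$url";
}
EOF
}

dhcpd_onie_snippet 192.0.2.0 255.255.255.0 192.0.2.20 192.0.2.200 \
    http://192.0.2.10/onie-installer-x86_64
```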
Place the Cumulus Linux image in a directory on the web server.
From the Cumulus Linux command prompt, run the onie-install command, then reboot the switch.
cumulus@switch:~$ sudo onie-install -a -i http://10.0.1.251/path/to/cumulus-install-x86_64.bin
Install Using a Web Server With no DHCP
Follow the steps below if you can log into the switch on a serial console (ONIE), or you can log in on the console or with ssh (Install from Cumulus Linux) but no DHCP server is available.
You need a console connection to access the switch; you cannot perform this procedure remotely.
ONIE is in discovery mode. You must disable discovery mode with the following command:
onie# onie-discovery-stop
On older ONIE versions, if the onie-discovery-stop command is not supported, run:
onie# /etc/init.d/discover.sh stop
Assign a static address to eth0 with the ip addr add command:
ONIE:/ # ip addr add 10.0.1.252/24 dev eth0
Place the Cumulus Linux image in a directory on your web server.
Run the installer manually (because there are no DHCP options):
From the Cumulus Linux command prompt, run the onie-install command, then reboot the switch.
cumulus@switch:~$ sudo onie-install -a -i /path/to/local/file/cumulus-install-x86_64.bin
Install Using a USB Drive
Follow the steps below to install the Cumulus Linux image using a USB drive.
Installing Cumulus Linux using a USB drive works for an occasional single switch but does not scale. Unlike USB installs, DHCP can scale to hundreds of switch installs with zero manual input.
From a computer, prepare your USB drive by formatting it using one of the supported formats: FAT32, vFAT or EXT2.
Optional: Prepare a USB Drive inside Cumulus Linux
a. Insert your USB drive into the USB port on the switch running Cumulus Linux and log in to the switch. Examine output from cat /proc/partitions and sudo fdisk -l [device] to determine the location of your USB drive. For example, sudo fdisk -l /dev/sdb.
These instructions assume your USB drive is the /dev/sdb device, which is typical if you insert the USB drive after the machine is already booted. However, if you insert the USB drive during the boot process, it is possible that your USB drive is the /dev/sda device. Make sure to modify the commands below to use the proper device for your USB drive.
b. Create a new partition table on the USB drive. If the parted utility is not on the system, install it with sudo -E apt-get install parted.
sudo parted /dev/sdb mklabel msdos
c. Create a new partition on the USB drive:
sudo parted /dev/sdb -a optimal mkpart primary 0% 100%
d. Format the partition to your filesystem of choice using one of the examples below:
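The following sketch maps each supported format (FAT32, vFAT, or EXT2) to a standard mkfs command and prints it for review instead of running it. It assumes the new partition is /dev/sdb1, the first partition on the /dev/sdb device used in the steps above; the function name mkfs_command is hypothetical.

```shell
#!/bin/sh
# mkfs_command <ext2|fat32|vfat> [PARTITION]
# Print the mkfs command for one of the supported USB drive formats.
# Hypothetical helper: formatting erases data, so it only prints.
mkfs_command() {
    fs=$1
    partition=${2:-/dev/sdb1}   # assumed partition name
    case "$fs" in
        ext2)  echo "sudo mkfs.ext2 $partition" ;;
        fat32) echo "sudo mkfs.vfat -F 32 $partition" ;;
        vfat)  echo "sudo mkfs.vfat $partition" ;;
        *) echo "unsupported filesystem: $fs" >&2; return 1 ;;
    esac
}

mkfs_command ext2
mkfs_command fat32
mkfs_command vfat
```

Double check the device name with sudo fdisk -l before you run any of the printed commands, because formatting the wrong device destroys its data.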
When using a Mac or Windows computer to rename the installation file, the file extension can still be present. Make sure you remove the file extension so that ONIE can detect the file.
Insert the USB drive into the switch, then prepare the switch for installation:
If the switch is offline, connect to the console and power on the switch.
If the switch is already online in ONIE, use the reboot command.
SSH sessions to the switch get dropped after this step. To complete the remaining instructions, connect to the console of the switch. Cumulus Linux switches display their boot process to the console; you need to monitor the console specifically to complete the next step.
Monitor the console and select the ONIE option from the first GRUB screen shown below.
Cumulus Linux on x86 uses GRUB chainloading to present a second GRUB menu specific to the ONIE partition. No action is necessary in this menu to select the default option ONIE: Install OS.
The switch recognizes the USB drive and mounts it automatically. Cumulus Linux installation begins.
After installation completes, the switch automatically reboots into the newly installed instance of Cumulus Linux.
ONIE Installation Options
You can run several installer command line options from ONIE to perform basic switch configuration automatically after installation completes and Cumulus Linux boots for the first time. These options enable you to:
Set a unique password for the cumulus user
Provide an initial network configuration
Execute a ZTP script to perform necessary configuration
The onie-nos-install command does not allow you to specify command line parameters. You must access the switch from the console and transfer a disk image to the switch. You must then make the disk image executable and install the image directly from the ONIE command line with the options you want to use.
The following example commands transfer a disk image to the switch, make the image executable, and install the image with the --password option to change the default cumulus user password:
You can run more than one option in the same command.
Set the cumulus User Password
The default cumulus user account password is cumulus. When you log into Cumulus Linux for the first time, you must provide a new password for the cumulus account, then log back into the system.
To automate this process, you can specify a new password from the command line of the installer with the --password '<clear text-password>' option. For example, to change the default cumulus user password to MyP4$$word:
To provide a hashed password instead of a clear text password, use the --hashed-password '<hash>' option. An encrypted hash maintains a secure management network.
Generate a sha-512 password hash with the following openssl command. The example command generates a sha-512 password hash for the password MyP4$$word.
If you specify both the --password and --hashed-password options, the --hashed-password option takes precedence and the switch ignores the --password option.
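The hash generation can be sketched as follows. This is a minimal example that assumes OpenSSL 1.1.1 or later, which provides the -6 (SHA-512) option of the openssl passwd command. The variable names are illustrative. The salt is random, so the output differs on every run.

```shell
#!/bin/sh
# Sketch: generate a SHA-512 crypt hash for the --hashed-password option.
# Assumption: OpenSSL 1.1.1 or later ("openssl passwd -6").

# Single quotes stop the shell from expanding $$ into the process ID.
password='MyP4$$word'

hash=$(openssl passwd -6 "$password")

# A SHA-512 crypt hash starts with the "$6$" prefix, followed by the salt.
printf '%s\n' "$hash"
```

When you pass the result to the installer, enclose it in single quotes as well, because the hash itself contains $ characters.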
Provide Initial Network Configuration
To provide initial network configuration automatically when Cumulus Linux boots for the first time after installation, use the --interfaces-file <filename> option. For example, to copy the contents of a file called network.intf into the /etc/network/interfaces file and run the ifreload -a command:
To run a ZTP script that contains commands to execute after Cumulus Linux boots for the first time after installation, use the --ztp <filename> option. For example, to run a ZTP script called initial-conf.ztp:
The ZTP script must contain the CUMULUS-AUTOPROVISIONING string near the beginning of the file and must reside on the ONIE filesystem. Refer to Zero Touch Provisioning - ZTP.
If you use the --ztp option together with any of the other command line options, the ZTP script takes precedence and the switch ignores other command line options.
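As an illustration of the marker requirement, the following sketch writes a placeholder ZTP script and checks that the CUMULUS-AUTOPROVISIONING string appears near the top. The file location, the script body, and the five-line window are assumptions made for this example, not values defined by Cumulus Linux.

```shell
#!/bin/sh
# Sketch: create a minimal ZTP script and confirm the marker is near the top.
# The path and the script body are placeholders for illustration only.
ztp_file="${TMPDIR:-/tmp}/initial-conf.ztp"

cat > "$ztp_file" <<'EOF'
#!/bin/bash
# CUMULUS-AUTOPROVISIONING
# Placeholder body: replace with the commands to run at first boot.
echo "ZTP script ran"
exit 0
EOF

# Look for the marker in the first five lines (an arbitrary window for this sketch).
if head -n 5 "$ztp_file" | grep -q 'CUMULUS-AUTOPROVISIONING'; then
    ztp_marker=present
else
    ztp_marker=missing
fi
echo "marker: $ztp_marker"
```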
Change the Default BIOS Password
To provide a layer of security and to prevent unauthorized access to the switch, NVIDIA recommends you change the default BIOS password. The default BIOS password is admin.
To change the default BIOS password:
During system boot, press Ctrl+B through the serial console while the BIOS version prints.
From the Security menu, select Administrator Password.
Follow the prompts.
Edit the Cumulus Linux Image (Advanced)
The Cumulus Linux disk image file contains a BASH script that includes a set of variables. You can set these variables to be able to install a fully configured system with a single image file.
To edit the image
Example Image File
The Cumulus Linux disk image file is a self-extracting executable. The executable part of the file is a BASH script at the beginning of the file. Towards the beginning of this BASH script are a set of variables with empty strings:
CL_INSTALLER_PASSWORD
Defines the clear text password. This variable is equivalent to the ONIE installer command line option --password.
CL_INSTALLER_HASHED_PASSWORD
Defines the hashed password. This variable is equivalent to the ONIE installer command line option --hashed-password. If you set both the CL_INSTALLER_PASSWORD and CL_INSTALLER_HASHED_PASSWORD variables, CL_INSTALLER_HASHED_PASSWORD takes precedence.
CL_INSTALLER_INTERFACES_FILENAME
Defines the name of the file on the ONIE filesystem you want to use as the /etc/network/interfaces file. This variable is equivalent to the ONIE installer command line option --interfaces-file.
CL_INSTALLER_INTERFACES_CONTENT
Describes the network interfaces available on your system and how to activate them. Setting this variable defines the contents of the /etc/network/interfaces file. There is no equivalent ONIE installer command line option. If you set both the CL_INSTALLER_INTERFACES_FILENAME and CL_INSTALLER_INTERFACES_CONTENT variables, the CL_INSTALLER_INTERFACES_FILENAME takes precedence.
CL_INSTALLER_ZTP_FILENAME
Defines the name of the ZTP file on the ONIE filesystem you want to execute at first boot after installation. This variable is equivalent to the ONIE installer command line option --ztp.
Edit the Image File
Because the Cumulus Linux image file is a binary file, you cannot use standard text editors to edit the file directly. Instead, you must split the file into two parts, edit the first part, then put the two parts back together.
Copy the first 20 lines to an empty file:
head -20 cumulus-linux-4.4.0-mlx-amd64.bin > cumulus-linux-4.4.0-mlx-amd64.bin.1
Remove the first 20 lines of the image, then copy the remaining lines into another empty file:
sed -e '1,20d' cumulus-linux-4.4.0-mlx-amd64.bin > cumulus-linux-4.4.0-mlx-amd64.bin.2
The original file is now split, with the first 20 lines in cumulus-linux-4.4.0-mlx-amd64.bin.1 and the remaining lines in cumulus-linux-4.4.0-mlx-amd64.bin.2.
Use a text editor to change the variables in cumulus-linux-4.4.0-mlx-amd64.bin.1.
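Put the two parts back together:
cat cumulus-linux-4.4.0-mlx-amd64.bin.1 cumulus-linux-4.4.0-mlx-amd64.bin.2 > cumulus-linux-4.4.0-mlx-amd64.bin.final
Calculate the new checksum and update the CL_INSTALLER_PAYLOAD_SHA256 variable:
sed -e '1,/^exit_marker$/d' "cumulus-linux-4.4.0-mlx-amd64.bin.final" | sha256sum | awk '{ print $1 }'
The following example shows a modified image file: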
...
CL_INSTALLER_PAYLOAD_SHA256='d14a028c2a3a2bc9476102bb288234c415a2b01f828ea62ac332e42f'
CL_INSTALLER_PASSWORD='MyP4$$word'
CL_INSTALLER_HASHED_PASSWORD=''
CL_INSTALLER_LICENSE='customer@datacenter.com|4C3YMCACDiK0D/EnrxlXpj71FBBNAg4Yrq+brza4ZtJFCInvalid'
CL_INSTALLER_INTERFACES_FILENAME=''
CL_INSTALLER_INTERFACES_CONTENT='# This file describes the network interfaces available on your system and how to activate them.
source /etc/network/interfaces.d/*.intf
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
auto eth0
iface eth0 inet dhcp
vrf mgmt
auto bridge
iface bridge
bridge-ports swp1 swp2
bridge-pvid 1
bridge-vids 10 11
bridge-vlan-aware yes
auto mgmt
iface mgmt
address 127.0.0.1/8
address ::1/128
vrf-table auto
'
CL_INSTALLER_ZTP_FILENAME=''
...
You can install this edited image file in the usual way, by using the ONIE install waterfall or the onie-nos-install command.
If you install the modified installation image and specify installer command line parameters, the command line parameters take precedence over the variables modified in the image.
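To rehearse the split, edit, reassemble, and checksum procedure from Edit the Image File without touching a real image, you can run it on a dummy file. In the following sketch, dummy.bin is a stand-in built by the script itself (a 20-line text header ending in exit_marker, followed by a payload), and sed stands in for the text editor. Only the head, sed, cat, sha256sum, and awk invocations mirror the procedure.

```shell
#!/bin/sh
# Sketch: rehearse the image editing steps on a dummy file.
# dummy.bin is NOT a Cumulus Linux image; it only mimics the layout.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Build the dummy file: 20 header lines ending in exit_marker, then a payload.
{
    echo '#!/bin/sh'
    echo "CL_INSTALLER_PAYLOAD_SHA256=''"
    echo "CL_INSTALLER_PASSWORD=''"
    i=4
    while [ "$i" -le 19 ]; do echo "# header line $i"; i=$((i + 1)); done
    echo 'exit_marker'
    printf 'payload-bytes\n'
} > dummy.bin

# Split the file after line 20.
head -20 dummy.bin > dummy.bin.1
sed -e '1,20d' dummy.bin > dummy.bin.2

# Edit a variable in the text part (sed replaces the manual edit).
new_line="CL_INSTALLER_PASSWORD='MyP4\$\$word'"
sed -e "s/^CL_INSTALLER_PASSWORD=.*/$new_line/" dummy.bin.1 > dummy.bin.1.new
mv dummy.bin.1.new dummy.bin.1

# Put the two parts back together.
cat dummy.bin.1 dummy.bin.2 > dummy.bin.final

# Checksum everything after the exit_marker line.
payload_sha=$(sed -e '1,/^exit_marker$/d' dummy.bin.final | sha256sum | awk '{ print $1 }')
echo "payload sha256: $payload_sha"

# Record the checksum in the header, then reassemble once more.
sed -e "s/^CL_INSTALLER_PAYLOAD_SHA256=.*/CL_INSTALLER_PAYLOAD_SHA256='$payload_sha'/" \
    dummy.bin.1 > dummy.bin.1.new
mv dummy.bin.1.new dummy.bin.1
cat dummy.bin.1 dummy.bin.2 > dummy.bin.final
```

Because the checksum covers only the data after exit_marker, writing the value into the header does not change it.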
Secure Boot
Secure Boot validates each binary image loaded during system boot with key signatures that correspond to a stored trusted key in firmware.
Secure Boot is available only on the NVIDIA SN3700C-S switch and switches with the Spectrum-4 ASIC.
Secure Boot settings are in the BIOS Security menu. To access BIOS, press Ctrl+B through the serial console during system boot while the BIOS version prints:
To access the BIOS menu, use admin which is the default BIOS password:
NVIDIA recommends changing the default BIOS password; navigate to Security and select Administrator Password.
To validate or change the Secure Boot mode, navigate to Security and select Secure Boot:
In the Secure Boot menu, you can enable and disable Secure Boot mode. To install an unsigned version of Cumulus Linux or access ONIE without a prompt for a username and password, set Secure Boot to disabled:
To access ONIE when Secure Boot is enabled, authentication is necessary. The default username and password are both root:
ONIE: Rescue Mode ...
Platform : x86_64-mlnx_x86-r0
Version : 2021.02-5.3.0006-rc3-115200
Build Date: 2021-05-20T14:27+03:00
Info: Mounting kernel filesystems... done.
Info: Mounting ONIE-BOOT on /mnt/onie-boot ...
[ 17.011057] ext4 filesystem being mounted at /mnt/onie-boot supports timestamps until 2038 (0x7fffffff)
Info: Mounting EFI System on /boot/efi ...
Info: BIOS mode: UEFI
Info: Using eth0 MAC address: b8:ce:f6:3c:62:06
Info: eth0: Checking link... up.
Info: Trying DHCPv4 on interface: eth0
ONIE: Using DHCPv4 addr: eth0: 10.20.84.226 / 255.255.255.0
Starting: klogd... done.
Starting: dropbear ssh daemon... done.
Starting: telnetd... done.
discover: Rescue mode detected. Installer disabled.
Please press Enter to activate this console. To check the install status inspect /var/log/onie.log.
Try this: tail -f /var/log/onie.log
** Rescue Mode Enabled **
login: root
Password: root
ONIE:~ #
To validate the Secure Boot status of a system from Cumulus Linux, run the mokutil --sb-state command.
On a switch with the Spectrum-4 ASIC, if the ASIC firmware fails to boot, you see a message alerting you to contact NVIDIA Customer Support for further options.
The default password for the cumulus user account is cumulus. The first time you log into Cumulus Linux, you must change this default password. Be sure to update any automation scripts before you upgrade. You can use ONIE command line options to change the default password automatically during the Cumulus Linux image installation process. Refer to ONIE Installation Options.
This topic describes how to upgrade Cumulus Linux on your switch.
Consider deploying, provisioning, configuring, and upgrading switches using automation, even with small networks or test labs. During the upgrade process, you can upgrade dozens of devices in a repeatable manner. Using tools like Ansible, Chef, or Puppet for configuration management greatly increases the speed and accuracy of the next major upgrade; these tools also enable you to quickly swap failed switch hardware.
Understanding the location of configuration data is important for successful upgrades, migrations, and backup. As with other Linux distributions, the /etc directory is the primary location for all configuration data in Cumulus Linux. The following list contains the files you need to back up and migrate to a new release. Make sure you examine any changed files. Make the following files and directories part of a backup strategy.
File Name and Location
Description
Cumulus Linux Documentation
Debian Documentation
/etc/frr/
Routing application (responsible for BGP and OSPF)
If you are using the root user account, consider including /root/.
If you have custom user accounts, consider including /home/<username>/.
Run the net show configuration files | grep -B 1 "===" command and back up the files listed in the command output.
File Name and Location
Description
/etc/mlx/
Per-platform hardware configuration directory, created on first boot. Do not copy.
/etc/default/clagd
Created and managed by ifupdown2. Do not copy.
/etc/default/grub
Grub init table. Do not modify manually.
/etc/default/hwclock
Platform hardware-specific file. Created during first boot. Do not copy.
/etc/init
Platform initialization files. Do not copy.
/etc/init.d/
Platform initialization files. Do not copy.
/etc/fstab
Static information on filesystem. Do not copy.
/etc/image-release
System version data. Do not copy.
/etc/os-release
System version data. Do not copy.
/etc/lsb-release
System version data. Do not copy.
/etc/lvm/archive
Filesystem files. Do not copy.
/etc/lvm/backup
Filesystem files. Do not copy.
/etc/modules
Created during first boot. Do not copy.
/etc/modules-load.d/
Created during first boot. Do not copy.
/etc/sensors.d
Platform-specific sensor data. Created during first boot. Do not copy.
/root/.ansible
Ansible tmp files. Do not copy.
/home/cumulus/.ansible
Ansible tmp files. Do not copy.
The following commands verify which files have changed compared to the previous Cumulus Linux install. Be sure to back up any changed files.
Run the sudo dpkg --verify command to show a list of changed files.
Run the egrep -v '^$|^#|=""$' /etc/default/isc-dhcp-* command to see if any of the generated /etc/default/isc-* files have changed.
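A backup of the files you identify can be scripted. The following is a minimal sketch, not an NVIDIA tool: the function name backup_config, the archive name, and the demonstration directory are all illustrative. The function archives only the listed paths that exist, so a missing optional directory does not abort the backup.

```shell
#!/bin/sh
# Sketch: archive selected configuration paths before an upgrade.
# backup_config ROOT ARCHIVE PATH...
#   ROOT    directory the paths are relative to ("/" on a real switch)
#   ARCHIVE name of the compressed tar file to create
backup_config() {
    root=$1
    archive=$2
    shift 2
    existing=""
    for path in "$@"; do
        # Keep only paths present under ROOT so tar does not fail on missing files.
        if [ -e "$root/$path" ]; then
            existing="$existing $path"
        fi
    done
    [ -n "$existing" ] || { echo "nothing to back up" >&2; return 1; }
    # Word splitting of $existing is intentional; this sketch assumes
    # path names without spaces.
    tar -czf "$archive" -C "$root" $existing
}

# Demonstration against a throwaway directory tree rather than a live switch.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/etc/frr"
echo "bgpd=yes" > "$demo_root/etc/frr/daemons"
backup_config "$demo_root" "$demo_root/backup.tar.gz" etc/frr etc/missing
tar -tzf "$demo_root/backup.tar.gz"
```

On a switch, you would pass / as the root and list the paths you decided to keep, for example etc/frr, then copy the archive off the switch. Leave out the files marked Do not copy.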
Back Up and Restore Configuration with NVUE
You can back up and restore the configuration file with NVUE only if you used NVUE commands to configure the switch you want to upgrade.
To back up and restore the configuration file:
Save the configuration to the /etc/nvue.d/startup.yaml file with the nv config save command:
cumulus@switch:~$ nv config save
saved
Copy the /etc/nvue.d/startup.yaml file off the switch to a different location.
After upgrade is complete, restore the configuration. Copy the /etc/nvue.d/startup.yaml file to the switch, then run the nv config apply startup command:
If NVUE introduces new syntax for the feature that a snippet configures, you must remove the snippet before upgrading.
Create a cl-support File
Before and after you upgrade the switch, run the cl-support script to create a cl-support archive file. The file is a compressed archive of useful information for troubleshooting. If you experience any issues during upgrade, you can send this archive file to the Cumulus Linux support team to investigate.
Create the cl-support archive file with the cl-support command:
cumulus@switch:~$ sudo cl-support
Copy the cl-support file off the switch to a different location.
After upgrade is complete, run the cl-support command again to create a new archive file:
cumulus@switch:~$ sudo cl-support
Upgrade Cumulus Linux
You can upgrade Cumulus Linux in one of two ways:
Install a Cumulus Linux image of the new release, using ONIE.
Upgrade only the changed packages using the sudo -E apt-get update and sudo -E apt-get upgrade commands.
Cumulus Linux also provides ISSU to upgrade an active switch with minimal disruption to the network. See In-Service-System-Upgrade-ISSU.
To upgrade to Cumulus Linux 5.6.0 from Cumulus Linux 4.x or 3.x, you must install a disk image of the new release using ONIE. You cannot upgrade packages with the apt-get upgrade command.
Upgrading an MLAG pair requires additional steps. If you are using MLAG to dual connect two Cumulus Linux switches in your environment, follow the steps in Upgrade Switches in an MLAG Pair below to ensure a smooth upgrade.
Install a Cumulus Linux Image or Upgrade Packages?
The decision to upgrade Cumulus Linux by either installing a Cumulus Linux image or upgrading packages depends on your environment and your preferences. Here are some recommendations for each upgrade method.
Install a Cumulus Linux image if you are performing a rolling upgrade in a production environment and if you are using up-to-date and comprehensive automation scripts. This upgrade method enables you to choose the exact release to which you want to upgrade and is the only method available to upgrade your switch to a new release train (for example, from 4.4.3 to 5.6.0).
Be aware of the following when installing the Cumulus Linux image:
Installing a Cumulus Linux image is destructive and does not save any configuration files on the switch. Copy them to a different server before you start the Cumulus Linux image install.
You must move configuration data to the new OS using ZTP or automation while the OS is first booted, or soon afterwards using out-of-band management.
Merge conflicts with configuration file changes in the new release sometimes go undetected.
If configuration files do not restore correctly, you cannot ssh to the switch from in-band management. Use out-of-band connectivity (eth0 or console).
You must reinstall and reconfigure third-party applications after upgrade.
Run package upgrade if you are upgrading from Cumulus Linux 5.0.0 to a later 5.x release, or if you use third-party applications (package upgrade does not replace or remove third-party applications, unlike the Cumulus Linux image install).
Be aware of the following when upgrading packages:
You cannot upgrade the switch to a new release train. For example, you cannot upgrade the switch from 4.x to 5.x.
You can use package upgrade to move a switch at most two releases beyond its installed image; for example, you can package upgrade a switch running the Cumulus Linux 5.4 image to 5.5 or 5.6 (5.4 plus two releases).
The sudo -E apt-get upgrade command might restart or stop services as part of the upgrade process.
The sudo -E apt-get upgrade command might disrupt core services by changing core service dependency packages.
After you upgrade, account UIDs and GIDs created by packages might be different on different switches, depending on the configuration and package installation history.
Cumulus Linux does not support the sudo -E apt-get dist-upgrade command. Be sure to use sudo -E apt-get upgrade when upgrading packages.
You can check the base image with the grep RELEASE /etc/image-release command.
Occasionally, a release contains a base OS upgrade and does not support package upgrade; release notes indicate when a release does not support package upgrade.
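The release train and two-release rules above can be expressed as a small check. This is a sketch under stated assumptions: the function name can_package_upgrade is illustrative, versions are plain MAJOR.MINOR or MAJOR.MINOR.PATCH strings, and 5.7.0 is a hypothetical version used only to show a rejected case. The check does not know about releases whose release notes rule out package upgrade.

```shell
#!/bin/sh
# Sketch: decide whether a package upgrade is possible between two releases.
# can_package_upgrade BASE TARGET
#   BASE   release of the installed base image
#   TARGET release you want to reach
# Returns 0 when TARGET is in the same release train as BASE and at most
# two minor releases ahead; returns 1 otherwise.
can_package_upgrade() {
    base_major=${1%%.*}
    target_major=${2%%.*}
    base_rest=${1#*.}
    target_rest=${2#*.}
    base_minor=${base_rest%%.*}
    target_minor=${target_rest%%.*}

    # A different major number is a new release train: image install only.
    [ "$base_major" -eq "$target_major" ] || return 1

    delta=$((target_minor - base_minor))
    [ "$delta" -ge 0 ] && [ "$delta" -le 2 ]
}

# 5.7.0 is hypothetical and appears only to illustrate the limit.
for target in 5.5.0 5.6.0 5.7.0; do
    if can_package_upgrade 5.4.0 "$target"; then
        echo "5.4.0 -> $target: package upgrade possible"
    else
        echo "5.4.0 -> $target: use an image install"
    fi
done
```

The BASE argument corresponds to the release reported in /etc/image-release, not the release in /etc/os-release.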
Cumulus Linux Image Install (ONIE)
ONIE is an open source project (equivalent to PXE on servers) that enables the installation of network operating systems (NOS) on a bare metal switch.
To upgrade the switch:
Back up the configurations off the switch.
Download the Cumulus Linux image.
Install the Cumulus Linux image with the onie-install -a -i <image-location> command, which boots the switch into ONIE. The following example command installs the image from a web server, then reboots the switch. There are additional ways to install the Cumulus Linux image, such as using FTP, a local file, or a USB drive. For more information, see Installing a New Cumulus Linux Image.
cumulus@switch:~$ sudo onie-install -a -i http://10.0.1.251/cumulus-linux-5.6.0-mlx-amd64.bin && sudo reboot
Restore the configuration files to the new release (NVIDIA does not recommend restoring files with automation).
Verify correct operation with the old configurations on the new release.
Reinstall third party applications and associated configurations.
Package Upgrade
NVUE deprecated the port split command options (2x10G, 2x25G, 2x40G, 2x50G, 2x100G, 2x200G, 4x10G, 4x25G, 4x50G, 4x100G, 8x50G) available in Cumulus Linux 5.3 and earlier. If you use NVUE to configure port breakout speeds in Cumulus 5.3 or earlier, NVUE automatically updates the configuration during upgrade to Cumulus Linux 5.5 and later to use the new format (2x, 4x, 8x).
Cumulus Linux continues to support the old port split format in the /etc/cumulus/ports.conf file; however NVIDIA recommends that you use the new format.
Cumulus Linux completely embraces the Linux and Debian upgrade workflow, where you use an installer to install a base image, then perform any upgrades within that release train with sudo -E apt-get update and sudo -E apt-get upgrade commands. Any packages that have changed after the base install get upgraded in place from the repository. All switch configuration files remain untouched, or in rare cases merged (using the Debian merge function) during the package upgrade.
When you use package upgrade to upgrade your switch, configuration data stays in place during the upgrade. If the new release updates a previously changed configuration file, the upgrade process prompts you to either specify the version you want to use or evaluate the differences.
Disk Space Requirements
Make sure you have enough disk space to perform a package upgrade. Cumulus Linux 5.6.0 requires:
0.6GB of free disk space to upgrade from 5.5
1.5GB of free disk space to upgrade from 5.4
Before you upgrade, run the sudo df -h command to show how much disk space you are currently using on the switch.
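If you want a script to make this check, the following sketch compares the available space on a filesystem with a threshold. The function name has_free_space is illustrative, and the 600 MB and 1500 MB thresholds are approximations of the 0.6GB and 1.5GB figures above.

```shell
#!/bin/sh
# Sketch: check free disk space before a package upgrade.
# has_free_space MOUNTPOINT REQUIRED_MB
# Returns 0 when the filesystem holding MOUNTPOINT has at least REQUIRED_MB free.
has_free_space() {
    # df -P prints one POSIX-format line per filesystem and -k reports
    # 1024-byte blocks; column 4 of the second line is the available space.
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $(( $2 * 1024 )) ]
}

# Thresholds approximate the documented 1.5GB and 0.6GB requirements.
if has_free_space / 1500; then
    echo "enough space for a package upgrade from 5.4 or 5.5"
elif has_free_space / 600; then
    echo "enough space for a package upgrade from 5.5 only"
else
    echo "free up disk space before upgrading"
fi
```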
Upgrade all the packages to the latest distribution.
cumulus@switch:~$ sudo -E apt-get upgrade
If you do not need to reboot the switch after the upgrade completes, the upgrade ends, restarts all upgraded services, and logs messages in the /var/log/syslog file similar to the ones shown below. In the examples below, the process only upgrades the frr package.
Policy: Service frr.service action stop postponed
Policy: Service frr.service action start postponed
Policy: Restarting services: frr.service
Policy: Finished restarting services
Policy: Removed /usr/sbin/policy-rc.d
Policy: Upgrade is finished
If the upgrade process encounters changed configuration files that have new versions in the release to which you are upgrading, you see a message similar to this:
Configuration file '/etc/frr/daemons'
==> Modified (by you or by a script) since installation.
==> Package distributor has shipped an updated version.
What would you like to do about it ? Your options are:
Y or I : install the package maintainer's version
N or O : keep your currently-installed version
D : show the differences between the versions
Z : start a shell to examine the situation
The default action is to keep your current version.
*** daemons (Y/I/N/O/D/Z) [default=N] ?
To see the differences between the currently installed version and the new version, type D.
To keep the currently installed version, type N. The new package version installs with the suffix .dpkg-dist (for example, /etc/frr/daemons.dpkg-dist). When the upgrade completes and before you reboot, merge your changes with the changes from the newly installed file.
To install the new version, type I. Your currently installed version has the suffix .dpkg-old.
Cumulus Linux includes /etc/apt/sources.list in the cumulus-archive-keyring package. During upgrade, you must select if you want the new version from the package or the existing file.
When the upgrade is complete, you can search for the files with the sudo find / -mount -type f -name '*.dpkg-*' command.
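To review what you need to merge, you can compare each kept file with its .dpkg-dist counterpart. The following sketch does this for one directory tree. The function name review_dpkg_dist and the demonstration files are illustrative, and the demonstration runs on a temporary directory that only imitates /etc/frr after you answer N at the prompt.

```shell
#!/bin/sh
# Sketch: show what changed between kept configuration files and the
# versions shipped by the upgraded packages.
# review_dpkg_dist DIR
review_dpkg_dist() {
    find "$1" -type f -name '*.dpkg-dist' | while IFS= read -r dist; do
        # Strip the suffix to get the file that is currently in use.
        current=${dist%.dpkg-dist}
        echo "=== $current"
        # diff exits with 1 when the files differ, which is expected here.
        diff -u "$current" "$dist" || true
    done
}

# Demonstration on a throwaway directory, not on a live switch.
demo=$(mktemp -d)
mkdir -p "$demo/etc/frr"
printf 'bgpd=yes\nospfd=no\n' > "$demo/etc/frr/daemons"
printf 'bgpd=no\nospfd=no\n'  > "$demo/etc/frr/daemons.dpkg-dist"
review_dpkg_dist "$demo"
```

Lines that start with - come from the file you kept; lines that start with + come from the new package version.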
If you see errors for expired GPG keys that prevent you from upgrading packages, follow the steps in Upgrading Expired GPG Keys.
Reboot the switch if the upgrade messages indicate that you need to perform a system restart.
cumulus@switch:~$ sudo -E apt-get upgrade
... upgrade messages here ...
*** Caution: Service restart prior to reboot could cause unpredictable behavior
*** System reboot required ***
cumulus@switch:~$ sudo reboot
Verify correct operation with the old configurations on the new version.
The first time you run the NVUE nv config apply command after upgrading to Cumulus Linux 5.6, NVUE might override certain existing configuration for features that are now configurable with NVUE. Immediately after you reboot the switch to complete the upgrade, NVIDIA recommends you either:
Package upgrade always updates to the latest available release in the Cumulus Linux repository. For example, if you are currently running Cumulus Linux 5.0.0 and run the sudo -E apt-get upgrade command on that switch, the packages upgrade to the versions in the latest 5.x release.
Because Cumulus Linux is a collection of different Debian Linux packages, be aware of the following:
The /etc/os-release and /etc/lsb-release files update to the currently installed Cumulus Linux release when you upgrade the switch using either package upgrade or Cumulus Linux image install. For example, if you run sudo -E apt-get upgrade and the latest Cumulus Linux release on the repository is 5.6.0, these two files display the release as 5.6.0 after the upgrade.
The /etc/image-release file updates only when you run a Cumulus Linux image install. Therefore, if you run a Cumulus Linux image install of Cumulus Linux 5.5.0, followed by a package upgrade to 5.5.1 using sudo -E apt-get upgrade, the /etc/image-release file continues to display Cumulus Linux 5.5.0, which is the originally installed base image.
Upgrade Switches in an MLAG Pair
If you are using MLAG to dual connect two switches in your environment, follow the steps below to upgrade the switches.
You must upgrade both switches in the MLAG pair to the same release of Cumulus Linux.
Cumulus Linux supports different software versions between MLAG peer switches only during the upgrade process. After you upgrade the first MLAG switch in the pair, run the clagctl showtimers command to monitor the init-delay timer. When the timer expires, make the upgraded MLAG switch the primary, then upgrade the peer to the same version of Cumulus Linux.
NVIDIA has not tested running different versions of Cumulus Linux on MLAG peer switches outside of the upgrade time period; you might see unexpected results.
Verify the switch is in the secondary role:
cumulus@switch:~$ nv show mlag
Shut down the core uplink layer 3 interfaces. The following example shuts down swp1:
cumulus@switch:~$ nv set interface swp1 link state down
cumulus@switch:~$ nv config apply
Shut down the peer link:
cumulus@switch:~$ nv set interface peerlink link state down
cumulus@switch:~$ nv config apply
To boot the switch into ONIE, run the onie-install -a -i <image-location> command. The following example command installs the image from a web server. There are additional ways to install the Cumulus Linux image, such as using FTP, a local file, or a USB drive. For more information, see Installing a New Cumulus Linux Image.
cumulus@switch:~$ sudo onie-install -a -i http://10.0.1.251/downloads/cumulus-linux-5.6.0-mlx-amd64.bin
To upgrade the switch with package upgrade instead of booting into ONIE, run the sudo -E apt-get update and sudo -E apt-get upgrade commands; see Package Upgrade.
Save the changes to the NVUE configuration from steps 2-3 and reboot the switch:
cumulus@switch:~$ nv config save
cumulus@switch:~$ nv action reboot system
If you installed a new image on the switch, restore the configuration files to the new release. If you performed an upgrade with apt, bring the uplink and peer link interfaces you shut down in steps 2-3 up:
cumulus@switch:~$ nv set interface swp1 link state up
cumulus@switch:~$ nv set interface peerlink link state up
cumulus@switch:~$ nv config apply
cumulus@switch:~$ nv config save
Verify STP convergence across both switches with the Linux mstpctl showall command. NVUE does not provide an equivalent command.
cumulus@switch:~$ mstpctl showall
Verify core uplinks and peer links are UP:
cumulus@switch:~$ nv show interface
Verify MLAG convergence:
cumulus@switch:~$ nv show mlag
Make this secondary switch the primary:
cumulus@switch:~$ nv set mlag priority 2048
Verify the other switch is now in the secondary role.
Repeat steps 2-9 on the new secondary switch.
Remove the priority 2048 and restore the priority back to 32768 on the current primary switch:
cumulus@switch:~$ nv set mlag priority 32768
Verify the switch is in the secondary role:
cumulus@switch:~$ clagctl status
Shut down the core uplink layer 3 interfaces:
cumulus@switch:~$ sudo ip link set <switch-port> down
Shut down the peer link:
cumulus@switch:~$ sudo ip link set peerlink down
To boot the switch into ONIE, run the onie-install -a -i <image-location> command. The following example command installs the image from a web server. There are additional ways to install the Cumulus Linux image, such as using FTP, a local file, or a USB drive. For more information, see Installing a New Cumulus Linux Image.
cumulus@switch:~$ sudo onie-install -a -i http://10.0.1.251/downloads/cumulus-linux-5.6.0-mlx-amd64.bin
To upgrade the switch with package upgrade instead of booting into ONIE, run the sudo -E apt-get update and sudo -E apt-get upgrade commands; see Package Upgrade.
Reboot the switch:
cumulus@switch:~$ sudo reboot
If you installed a new image on the switch, restore the configuration files to the new release.
Verify STP convergence across both switches:
cumulus@switch:~$ mstpctl showall
Verify that core uplinks and peer links are UP:
cumulus@switch:~$ ip addr show
Verify MLAG convergence:
cumulus@switch:~$ clagctl status
Make this secondary switch the primary:
cumulus@switch:~$ clagctl priority 2048
Verify the other switch is now in the secondary role.
Repeat steps 2-9 on the new secondary switch.
Remove the priority 2048 and restore the priority back to 32768 on the current primary switch:
cumulus@switch:~$ clagctl priority 32768
Roll Back a Cumulus Linux Installation
Even the most well planned and tested upgrades can result in unforeseen problems and sometimes the best solution is to roll back to the previous state. These main strategies require detailed planning and execution:
Flatten and rebuild. If the OS becomes unusable, you can use orchestration tools to reinstall the previous OS release from scratch and then rebuild the configuration automatically.
Restore to a previous state using a backup configuration captured before the upgrade.
The method you employ is specific to your deployment strategy. Providing detailed steps for each scenario is outside the scope of this document.
Third Party Packages
If you install any third party applications on a Cumulus Linux switch, configuration data is typically installed in the /etc directory, but it is not guaranteed. It is your responsibility to understand the behavior and configuration file information of any third party packages installed on the switch.
After you upgrade using a full Cumulus Linux image install, you need to reinstall any third party packages or any Cumulus Linux add-on packages.
To manage additional applications in the form of packages and to install the latest updates, use the Advanced Packaging Tool (apt).
Updating, upgrading, and installing packages with apt causes disruptions to network services:
Upgrading a package can cause services to restart or stop.
Installing a package sometimes disrupts core services by changing core service dependency packages. In some cases, installing new packages also upgrades additional existing packages due to dependencies.
If services stop, you need to reboot the switch to restart the services.
Update the Package Cache
To work correctly, apt relies on a local cache listing of the available packages. You must populate the cache initially, then periodically update it with sudo -E apt-get update:
Use the -E option with sudo whenever you run any apt-get command. This option preserves your environment variables (such as HTTP proxies) before you install new packages or upgrade your distribution.
List Available Packages
After the cache populates, use the apt-cache command to search the cache and find the packages of interest or to get information about an available package.
Here are examples of the search and show sub-commands:
cumulus@switch:~$ apt-cache search tcp
collectd-core - statistics collection and monitoring daemon (core system)
fakeroot - tool for simulating superuser privileges
iperf - Internet Protocol bandwidth measuring tool
iptraf-ng - Next Generation Interactive Colorful IP LAN Monitor
libfakeroot - tool for simulating superuser privileges - shared libraries
libfstrm0 - Frame Streams (fstrm) library
libibverbs1 - Library for direct userspace use of RDMA (InfiniBand/iWARP)
libnginx-mod-stream - Stream module for Nginx
libqt4-network - Qt 4 network module
librtr-dev - Small extensible RPKI-RTR-Client C library - development files
librtr0 - Small extensible RPKI-RTR-Client C library
libwiretap8 - network packet capture library -- shared library
libwrap0 - Wietse Venema's TCP wrappers library
libwrap0-dev - Wietse Venema's TCP wrappers library, development files
netbase - Basic TCP/IP networking system
nmap-common - Architecture independent files for nmap
nuttcp - network performance measurement tool
openssh-client - secure shell (SSH) client, for secure access to remote machines
openssh-server - secure shell (SSH) server, for secure access from remote machines
openssh-sftp-server - secure shell (SSH) sftp server module, for SFTP access from remote machines
python-dpkt - Python 2 packet creation / parsing module for basic TCP/IP protocols
rsyslog - reliable system and kernel logging daemon
socat - multipurpose relay for bidirectional data transfer
tcpdump - command-line network traffic analyzer
cumulus@switch:~$ apt-cache show tcpdump
Package: tcpdump
Version: 4.9.3-1~deb10u1
Installed-Size: 1109
Maintainer: Romain Francoise <rfrancoise@debian.org>
Architecture: amd64
Replaces: apparmor-profiles-extra (<< 1.12~)
Depends: libc6 (>= 2.14), libpcap0.8 (>= 1.5.1), libssl1.1 (>= 1.1.0)
Suggests: apparmor (>= 2.3)
Breaks: apparmor-profiles-extra (<< 1.12~)
Size: 400060
SHA256: 3a63be16f96004bdf8848056f2621fbd863fadc0baf44bdcbc5d75dd98331fd3
SHA1: 2ab9f0d2673f49da466f5164ecec8836350aed42
MD5sum: 603baaf914de63f62a9f8055709257f3
Description: command-line network traffic analyzer
This program allows you to dump the traffic on a network. tcpdump
is able to examine IPv4, ICMPv4, IPv6, ICMPv6, UDP, TCP, SNMP, AFS
BGP, RIP, PIM, DVMRP, IGMP, SMB, OSPF, NFS and many other packet
types.
.
It can be used to print out the headers of packets on a network
interface, filter packets that match a certain expression. You can
use this tool to track down network problems, to detect attacks
or to monitor network activities.
Description-md5: f01841bfda357d116d7ff7b7a47e8782
Homepage: http://www.tcpdump.org/
Multi-Arch: foreign
Section: net
Priority: optional
Filename: pool/upstream/t/tcpdump/tcpdump_4.9.3-1~deb10u1_amd64.deb
The search commands look for the search terms not only in the package name but also in other parts of the package information, so a search can match more packages than you expect.
List Packages Installed on the System
The apt-cache command shows information about all the packages available in the repository. To see which packages are installed on your system and their versions, run either of the following commands.
cumulus@switch:~$ nv show platform software installed
Installed Package description package version
----------------- ------------------- ---------- --------------------
acpi displays information on ACPI devices acpi 1.7-1.1
acpi-support-base scripts for handling base ACPI events such as the power button acpi-support-base 0.142-8
acpid Advanced Configuration and Power Interface event daemon acpid 1:2.0.31-1
...
cumulus@switch:~$ dpkg -l
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-===================-=========================-============-=================================
ii acpi 1.7-1.1 amd64 displays information on ACPI devices
ii acpi-support-base 0.142-8 all scripts for handling base ACPI events such as th
ii acpid 1:2.0.31-1 amd64 Advanced Configuration and Power Interface event
ii adduser 3.118 all add and remove users and groups
ii apt 1.8.2 amd64 commandline package manager
ii arping 2.19-6 amd64 sends IP and/or ARP pings (to the MAC address)
ii arptables 0.0.4+snapshot20181021-4 amd64 ARP table administration
...
Show the Version of a Package
To show the version of a specific package installed on the system:
The following example command shows which version of the vrf package is on the system:
cumulus@switch:~$ nv show platform software installed vrf
running applied pending description
----------- ------------------- ------- ------- -----------
description Linux tools for VRF Description
package vrf Package
version 1.0-cl5.6.0u9 Version
The following example Linux command shows which version of the vrf package is on the system:
cumulus@switch:~$ dpkg -l vrf
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==========-============-============-=================================
ii vrf 1.0-cl5.6.0u9 amd64 Linux tools for VRF
Upgrade Packages
To upgrade all the packages installed on the system to their latest versions, run the following commands:
The system lists the packages for upgrade and prompts you to continue.
The above commands upgrade all installed packages to their latest versions but do not install any new packages.
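The upgrade described in this section maps onto the usual apt-get update and apt-get upgrade sequence. The following is a minimal sketch, assuming the same sudo -E convention used elsewhere in this guide; the function name upgrade_all_packages is hypothetical.

```shell
# Hypothetical helper: refresh the package index, then upgrade every
# installed package. apt-get upgrade does not install new packages.
upgrade_all_packages() {
    sudo -E apt-get update && sudo -E apt-get upgrade
}
```

Call upgrade_all_packages from an interactive shell so that you can answer the confirmation prompt.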
Add New Packages
To add a new package, first ensure the package is not already on the system:
cumulus@switch:~$ dpkg -l | grep <name of package>
If the package is already on the system, you can update the package from the Cumulus Linux repository as part of the package upgrade process, which upgrades all packages on the system. See Upgrade Packages above.
If the package is not already on the system, add it by running sudo -E apt-get install <name of package>. This retrieves the package from the Cumulus Linux repository and installs it on your system together with any other dependent packages. The following example adds the tcpreplay package to the system:
cumulus@switch:~$ sudo -E apt-get update
cumulus@switch:~$ sudo -E apt-get install tcpreplay
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
tcpreplay
0 upgraded, 1 newly installed, 0 to remove and 1 not upgraded.
Need to get 436 kB of archives.
After this operation, 1008 kB of additional disk space will be used
...
You can install several packages at the same time:
In some cases, installing a new package also upgrades additional existing packages due to dependencies. To view these additional packages before you install, run the apt-get install --dry-run command.
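As a sketch of the dry run mentioned above, the following hypothetical helper (preview_install is not a Cumulus Linux command) passes any number of package names to apt-get install --dry-run, so you can review the additional packages before you change the system.

```shell
# Hypothetical helper: show what apt-get would install or upgrade for one
# or more packages, without changing the system.
preview_install() {
    sudo -E apt-get install --dry-run "$@"
}
```

For example, preview_install tcpreplay nuttcp previews the installation of two packages at the same time.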
Add Packages From Another Repository
As shipped, Cumulus Linux searches the Cumulus Linux repository for available packages. You can add additional repositories to search by adding them to the list of sources that apt-get consults. See man sources.list for more information.
NVIDIA adds features or makes bug fixes to certain packages; do not replace these packages with versions from other repositories.
If you want to install packages that are not in the Cumulus Linux repository, the procedure is the same as above, but with one additional step.
NVIDIA does not test and Cumulus Linux Technical Support does not support packages that are not part of the Cumulus Linux repository.
Installing packages outside of the Cumulus Linux repository requires the use of sudo -E apt-get; however, depending on the package, you can use easy_install and other commands.
To install a new package, complete the following steps:
Run the dpkg command to ensure that the package is not already installed on the system:
cumulus@switch:~$ dpkg -l | grep <name of package>
If the package is already on the system, ensure it is the version you need. If it is an older version, update the package from the Cumulus Linux repository:
If the package is not on the system, the package source location might not be in the /etc/apt/sources.list file. Edit the file and add the appropriate source. For example, add the following if you want a package from the Debian repository that is not in the Cumulus Linux repository:
deb http://http.us.debian.org/debian buster main
deb http://security.debian.org/ buster/updates main
Otherwise, /etc/apt/sources.list lists the repository but comments it out. To uncomment the repository, remove the # at the start of the line, then save the file.
Run sudo -E apt-get update, then install the package and upgrade:
Cumulus Linux contains a local archive embedded in the Cumulus Linux image. This archive, cumulus-local-apt-archive, contains the packages you need to install ifplugd, LDAP, RADIUS or TACACS+ without a network connection.
The archive contains the following packages:
audisp-tacplus
ifplugd
libdaemon0
libnss-ldapd
libnss-mapuser
libnss-tacplus
libpam-ldapd
libpam-radius-auth
libpam-tacplus
libtac2
libtacplus-map1
nslcd
Add these packages with apt-get update && apt-get install, as described above.
man pages for apt-get, dpkg, sources.list, apt_preferences
Zero Touch Provisioning - ZTP
Use ZTP to deploy network devices in large-scale environments. On first boot, Cumulus Linux runs ZTP, which executes the provisioning automation that deploys the device for its intended role in the network.
The provisioning framework allows you to execute a one-time, user-provided script. You can develop this script using a variety of automation tools and scripting languages. You can also use it to add the switch to a configuration management (CM) platform such as Puppet, Chef, CFEngine or a custom, proprietary tool.
While developing and testing the provisioning logic, you can use the ztp command in Cumulus Linux to run your provisioning script manually on a device.
ZTP in Cumulus Linux can run automatically in one of the following ways, in this order:
Through a local file
Using a USB drive inserted into the switch (ZTP-USB)
Through DHCP
Use a Local File
ZTP looks for a ZTP script on the local file system only once, when the switch boots. It searches /var/lib/cumulus/ztp for an install script whose name matches an ONIE-style waterfall, looking for the most specific name first and ending at the most generic.
You can also trigger the ZTP process manually by running the ztp --run <URL> command, where the URL is the path to the ZTP script.
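The waterfall order for local scripts can be read from the ztp.service log output later in this section. The following sketch prints candidate file names in that order. The function name ztp_waterfall_names is hypothetical, and splitting the platform string into a vendor (dell) and a model (s6010_s1220) is an assumption based on that log output.

```shell
# Hypothetical helper: print the local ZTP script names from the most
# specific to the most generic.
# Arguments: architecture, vendor, model, revision.
ztp_waterfall_names() {
    arch=$1; vendor=$2; model=$3; revision=$4
    base=/var/lib/cumulus/ztp/cumulus-ztp
    printf '%s\n' \
        "${base}-${arch}-${vendor}_${model}-r${revision}" \
        "${base}-${arch}-${vendor}_${model}" \
        "${base}-${arch}-${vendor}" \
        "${base}-${arch}" \
        "${base}"
}
```

For example, ztp_waterfall_names x86_64 dell s6010_s1220 UNKNOWN prints the five paths that appear in the log output.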
Use a USB Drive
NVIDIA tests this feature only with thumb drives, not with large external USB hard drives.
If the ztp process does not discover a local script, it tries one time to locate an inserted but unmounted USB drive. If it discovers one, it begins the ZTP process.
Cumulus Linux supports the use of a FAT32, FAT16, or VFAT-formatted USB drive as an installation source for ZTP scripts. You must plug in the USB drive before you power up the switch.
At minimum, the script must:
Install the Cumulus Linux operating system.
Copy over a basic configuration to the switch.
Restart the switch or the relevant services to get switchd up and running with that configuration.
Follow these steps to perform ZTP using a USB drive:
Copy the installation image to the USB drive.
The ztp process searches the root filesystem of the newly mounted drive for filenames matching an ONIE-style waterfall (see the patterns and examples above), looking for the most specific name first, and ending at the most generic.
ZTP parses the contents of the script to ensure it contains the CUMULUS-AUTOPROVISIONING flag (see example scripts).
The USB drive mounts to a temporary directory under /tmp (for example, /tmp/tmpigGgjf/). To reference files on the USB drive, use the environment variable ZTP_USB_MOUNTPOINT to refer to the USB root partition.
ZTP Over DHCP
If the ztp process does not discover a local ONIE script or applicable USB drive, it checks DHCP every ten seconds for up to five minutes for the presence of a ZTP URL specified in /var/run/ztp.dhcp. The URL can use HTTP, HTTPS, FTP, or TFTP.
For ZTP using DHCP, provisioning initially takes place over the management network and initiates through a DHCP hook. A DHCP option specifies a configuration script. The ZTP process requests this script from the web server and the script executes locally.
The ZTP process over DHCP follows these steps:
The first time you boot Cumulus Linux, eth0 makes a DHCP request. By default, Cumulus Linux sends DHCP option 60 (the vendor class identifier) with the value cumulus-linux x86_64 to identify itself to the DHCP server.
The DHCP server offers a lease to the switch.
If option 239 is in the response, the ZTP process starts.
The ZTP process requests the contents of the script from the URL, sending additional HTTP headers containing details about the switch.
ZTP parses the contents of the script to ensure it contains the CUMULUS-AUTOPROVISIONING flag (see example scripts).
If provisioning is necessary, the script executes locally on the switch with root privileges.
ZTP examines the return code of the script. If the return code is 0, ZTP marks the provisioning state as complete in the autoprovisioning configuration file.
Trigger ZTP Over DHCP
If you have not yet provisioned the switch, you can trigger the ZTP process over DHCP when eth0 uses DHCP and one of the following events occurs:
The switch boots.
You plug a cable into or unplug a cable from the eth0 port.
You disconnect, then reconnect the switch power cord.
You can also run the ztp --run <URL> command, where the URL is the path to the ZTP script.
Configure the DHCP Server
During the DHCP process over eth0, Cumulus Linux requests DHCP option 239. This option specifies the custom provisioning script.
For example, the /etc/dhcp/dhcpd.conf file for an ISC DHCP server looks like:
Do not use an underscore (_) in the hostname; underscores are not permitted in hostnames.
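To illustrate how option 239 can be declared for an ISC DHCP server, the following sketch prints a dhcpd.conf fragment for a given script URL. The option name cumulus-provision-url is an arbitrary label chosen for this example, the subnet and range use documentation addresses, and the function name ztp_dhcpd_fragment is hypothetical. Adapt all of them to your network.

```shell
# Hypothetical helper: print a dhcpd.conf fragment that declares DHCP
# option 239 as text and hands out the given provisioning script URL.
ztp_dhcpd_fragment() {
    url=$1
    cat <<EOF
option cumulus-provision-url code 239 = text;

subnet 192.0.2.0 netmask 255.255.255.0 {
  range 192.0.2.100 192.0.2.200;
  option cumulus-provision-url "${url}";
}
EOF
}
```

For example, ztp_dhcpd_fragment http://192.0.2.1/demo.sh prints a fragment that you can review before you merge it into /etc/dhcp/dhcpd.conf.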
DHCP on Front Panel Ports
ZTP runs DHCP on all the front panel switch ports and on any active interface. ZTP assesses the list of active ports on every retry cycle. When it receives the DHCP lease and option 239 is present in the response, ZTP starts to execute the script.
Inspect HTTP Headers
ZTP sends the following HTTP headers in the request to the web server that retrieves the provisioning script:
Header                 Value             Example
------                 -----             -------
User-Agent                               CumulusLinux-AutoProvision/0.4
CUMULUS-ARCH           CPU architecture  x86_64
CUMULUS-BUILD                            5.1.0
CUMULUS-MANUFACTURER                     odm
CUMULUS-PRODUCTNAME                      switch_model
CUMULUS-SERIAL                           XYZ123004
CUMULUS-BASE-MAC                         44:38:39:FF:40:94
CUMULUS-MGMT-MAC                         44:38:39:FF:00:00
CUMULUS-VERSION                          5.1.0
CUMULUS-PROV-COUNT                       0
CUMULUS-PROV-MAX                         32
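To check how your web server responds to these headers, you can send a similar request by hand. The following sketch uses curl and sends only a subset of the headers listed above. The function name fetch_like_ztp is hypothetical, and the request only approximates what ZTP sends.

```shell
# Hypothetical helper: fetch a provisioning script while sending a few of
# the ZTP request headers with caller-supplied values.
# Arguments: script URL, serial number, Cumulus Linux version.
fetch_like_ztp() {
    url=$1; serial=$2; version=$3
    curl --silent --show-error \
        --user-agent 'CumulusLinux-AutoProvision/0.4' \
        --header 'CUMULUS-ARCH: x86_64' \
        --header "CUMULUS-SERIAL: ${serial}" \
        --header "CUMULUS-VERSION: ${version}" \
        "$url"
}
```

For example, fetch_like_ztp http://192.0.2.1/demo.sh XYZ123004 5.1.0 prints the script that the server returns for that serial number.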
Write ZTP Scripts
You must include the following line in any of the supported scripts that you expect to run using the autoprovisioning framework.
# CUMULUS-AUTOPROVISIONING
The script must contain the CUMULUS-AUTOPROVISIONING flag. You can include this flag in a comment or remark; you do not need to echo or write the flag to stdout.
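Before you publish a script, you can confirm that the flag is present. The following is a minimal sketch; the function name has_ztp_marker is hypothetical, and the check is a plain text search rather than the parser that ZTP itself uses.

```shell
# Hypothetical helper: succeed (exit status 0) when the file contains the
# CUMULUS-AUTOPROVISIONING flag anywhere, including inside a comment.
has_ztp_marker() {
    grep -q 'CUMULUS-AUTOPROVISIONING' "$1"
}
```

For example, has_ztp_marker ztp.sh || echo "flag missing" reports a script that ZTP would not accept.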
You can write the script in any language that Cumulus Linux supports, such as:
Perl
Python
Ruby
Shell
The script must return an exit code of 0 upon success to mark the process as complete in the autoprovisioning configuration file.
The following script adds the Debian repositories, then copies an interface configuration and a port configuration from a USB drive and reloads the interfaces:
#!/bin/bash
function error() {
echo -e "\e[0;33mERROR: The ZTP script failed while running the command $BASH_COMMAND at line $BASH_LINENO.\e[0m" >&2
exit 1
}
# Log all output from this script
exec >> /var/log/autoprovision 2>&1
date "+%FT%T ztp starting script $0"
trap error ERR
#Add Debian Repositories
echo "deb http://http.us.debian.org/debian buster main" >> /etc/apt/sources.list
echo "deb http://security.debian.org/ buster/updates main" >> /etc/apt/sources.list
#Update Package Cache
apt-get update -y
#Load interface config from usb
cp ${ZTP_USB_MOUNTPOINT}/interfaces /etc/network/interfaces
#Load port config from usb
# (if breakout cables are used for certain interfaces)
cp ${ZTP_USB_MOUNTPOINT}/ports.conf /etc/cumulus/ports.conf
#Reload interfaces to apply loaded config
ifreload -a
# CUMULUS-AUTOPROVISIONING
exit 0
Continue Provisioning
Typically ZTP exits after executing the script locally and does not continue. To continue with provisioning so that you do not have to intervene manually or embed an Ansible callback into the script, you can add the CUMULUS-AUTOPROVISION-CASCADE directive.
Best Practices
ZTP scripts come in different forms and frequently perform the same tasks. As BASH is the most common language for ZTP scripts, use the following BASH snippets to perform common tasks with robust error checking.
Set the Default Cumulus User Password
The default cumulus user account password is cumulus. When you log into Cumulus Linux for the first time, you must provide a new password for the cumulus account, then log back into the system.
Add the following function to your ZTP script to change the default cumulus user account password to a clear-text password. The example changes the password cumulus to MyP4$$word.
function set_password(){
# Unexpire the cumulus account
passwd -x 99999 cumulus
# Set the password
echo 'cumulus:MyP4$$word' | chpasswd
}
set_password
If you have an insecure management network, set the password with an encrypted hash instead of a clear-text password.
First, generate a SHA-512 password hash with the following Python commands. The example commands generate a SHA-512 password hash for the password MyP4$$word.
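One alternative way to produce such a hash from a shell is the openssl passwd command. This sketch assumes that the openssl binary, version 1.1.1 or later, is available on the host where you generate the hash. The function name hash_password is hypothetical.

```shell
# Hypothetical helper: print a SHA-512 crypt hash for the given password.
# The password goes to openssl on standard input so that it does not
# appear in the argument list of the openssl process.
hash_password() {
    printf '%s\n' "$1" | openssl passwd -6 -stdin
}
```

Quote the password with single quotes, for example hash_password 'MyP4$$word', so that the shell does not expand $$. The salt is random, so the resulting hash differs on every run.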
Then, add the following function to the ZTP script to change the default cumulus user account password:
function set_password(){
# Unexpire the cumulus account
passwd -x 99999 cumulus
# Set the password
usermod -p '$6$hs7OPmnrfvLNKfoZ$iB3hy5N6Vv6koqDmxixpTO6lej6VaoKGvs5E8p5zNo4tPec0KKqyQnrFMII3jGxVEYWntG9e7Z7DORdylG5aR/' cumulus
}
set_password
Test DNS Name Resolution
DNS names are frequently used in ZTP scripts. The ping_until_reachable function tests that each DNS name resolves into a reachable IP address. Call this function with each DNS target used in your script before you use the DNS name elsewhere in your script.
The following example defines the ping_until_reachable function. Call it with the DNS name as its only argument.
function ping_until_reachable(){
    last_code=1
    max_tries=30
    tries=0
    while [ "0" != "$last_code" ] && [ "$tries" -lt "$max_tries" ]; do
        tries=$((tries+1))
        echo "$(date) INFO: ( Attempt $tries of $max_tries ) Pinging $1 Target Until Reachable."
        ping -c2 "$1" &> /dev/null
        last_code=$?
        sleep 1
    done
    if [ "$tries" -eq "$max_tries" ] && [ "$last_code" -ne "0" ]; then
        echo "$(date) ERROR: Reached maximum number of attempts to ping the target $1 ."
        exit 1
    fi
}
Check the Cumulus Linux Release
The following script segment checks which Cumulus Linux release is running and upgrades the node if the release is not the target release. If the release is the target release, normal ZTP tasks execute. This script calls the ping_until_reachable function (described above) to make sure the server holding the image and the ZTP script is reachable.
If you apply a management VRF in your script, either apply it last or reboot instead. If you do not apply a management VRF last, you need to prepend any commands that require eth0 to communicate out with /usr/bin/ip vrf exec mgmt; for example, /usr/bin/ip vrf exec mgmt apt-get update -y.
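The release comparison at the core of such a script can be sketched as follows. The sketch assumes that /etc/lsb-release contains a DISTRIB_RELEASE line with the running release. The function names current_release and needs_upgrade are hypothetical, and the image installation step is deliberately left out.

```shell
# Hypothetical helper: print the DISTRIB_RELEASE value from an
# lsb-release style file (default: /etc/lsb-release).
current_release() {
    sed -n 's/^DISTRIB_RELEASE=//p' "${1:-/etc/lsb-release}" | tr -d '"'
}

# Hypothetical helper: succeed (exit status 0) when the running release
# differs from the target release given as the first argument.
needs_upgrade() {
    [ "$(current_release "$2")" != "$1" ]
}
```

In a ZTP script, a call such as needs_upgrade 5.6.0 can guard the upgrade branch, and the normal ZTP tasks run in the else branch.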
Perform Ansible Provisioning Callbacks
After initially configuring a node with ZTP, use Provisioning Callbacks to inform Ansible Tower or AWX that the node is ready for more detailed provisioning. The following example demonstrates how to use a provisioning callback:
Make sure to disable the DHCP hostname override setting in your script.
function set_hostname(){
# Remove DHCP Setting of Hostname
sed s/'SETHOSTNAME="yes"'/'SETHOSTNAME="no"'/g -i /etc/dhcp/dhclient-exit-hooks.d/dhcp-sethostname
hostnamectl set-hostname $1
}
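The provisioning callback described in this section can be sketched with curl. The sketch assumes the /api/v2/job_templates/<id>/callback/ endpoint of Ansible Tower or AWX and a host configuration key defined on the job template; check the API version that your server exposes. The function name ansible_callback and the example values are hypothetical.

```shell
# Hypothetical helper: request a provisioning callback from Ansible Tower
# or AWX. Arguments: base URL, job template ID, host configuration key.
ansible_callback() {
    tower_url=$1; template_id=$2; host_config_key=$3
    curl --silent --show-error --fail \
        --request POST \
        --data "host_config_key=${host_config_key}" \
        "${tower_url}/api/v2/job_templates/${template_id}/callback/"
}
```

For example, ansible_callback https://tower.example.com 42 somekey posts the key for job template 42. The --fail option makes curl return a nonzero exit code on an HTTP error, so the ZTP script can detect a rejected callback.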
Test ZTP Scripts
Use these commands to test and debug your ZTP scripts.
You can use verbose mode to debug your script and see where your script fails. Include the -v option when you run ZTP:
cumulus@switch:~$ sudo ztp -v -r http://192.0.2.1/demo.sh
Attempting to provision via ZTP Manual from http://192.0.2.1/demo.sh
Broadcast message from root@dell-s6010-01 (ttyS0) (Tue May 10 22:44:17 2016):
ZTP: Attempting to provision via ZTP Manual from http://192.0.2.1/demo.sh
ZTP Manual: URL response code 200
ZTP Manual: Found Marker CUMULUS-AUTOPROVISIONING
ZTP Manual: Executing http://192.0.2.1/demo.sh
error: ZTP Manual: Payload returned code 1
error: Script returned failure
To see results of the most recent ZTP execution, you can run the ztp -s command.
cumulus@switch:~$ ztp -s
ZTP INFO:
State enabled
Version 1.0
Result Script Failure
Date Mon 20 May 2019 09:31:27 PM UTC
Method ZTP DHCP
URL http://192.0.2.1/demo.sh
If ZTP runs when the switch boots and not manually, run the systemctl -l status ztp.service command, then the journalctl -l -u ztp.service command to see if any failures occur:
cumulus@switch:~$ sudo systemctl -l status ztp.service
● ztp.service - Cumulus Linux ZTP
Loaded: loaded (/lib/systemd/system/ztp.service; enabled)
Active: failed (Result: exit-code) since Wed 2016-05-11 16:38:45 UTC; 1min 47s ago
Docs: man:ztp(8)
Process: 400 ExecStart=/usr/sbin/ztp -b (code=exited, status=1/FAILURE)
Main PID: 400 (code=exited, status=1/FAILURE)
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP USB: Device not found
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: Looking for ZTP Script provided by DHCP
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: Attempting to provision via ZTP DHCP from http://192.0.2.1/demo.sh
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: URL response code 200
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: Found Marker CUMULUS-AUTOPROVISIONING
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: Executing http://192.0.2.1/demo.sh
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: Payload returned code 1
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: Script returned failure
May 11 16:38:45 dell-s6010-01 systemd[1]: ztp.service: main process exited, code=exited, status=1/FAILURE
May 11 16:38:45 dell-s6010-01 systemd[1]: Unit ztp.service entered failed state.
cumulus@switch:~$
cumulus@switch:~$ sudo journalctl -l -u ztp.service --no-pager
-- Logs begin at Wed 2016-05-11 16:37:42 UTC, end at Wed 2016-05-11 16:40:39 UTC. --
May 11 16:37:45 cumulus ztp[400]: ztp [400]: /var/lib/cumulus/ztp: State Directory does not exist. Creating it...
May 11 16:37:45 cumulus ztp[400]: ztp [400]: /var/run/ztp.lock: Lock File does not exist. Creating it...
May 11 16:37:45 cumulus ztp[400]: ztp [400]: /var/lib/cumulus/ztp/ztp_state.log: State File does not exist. Creating it...
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP LOCAL: Looking for ZTP local Script
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64-dell_s6010_s1220-rUNKNOWN
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64-dell_s6010_s1220
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64-dell
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP USB: Looking for unmounted USB devices
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP USB: Parsing partitions
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP USB: Device not found
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: Looking for ZTP Script provided by DHCP
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: Attempting to provision via ZTP DHCP from http://192.0.2.1/demo.sh
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: URL response code 200
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: Found Marker CUMULUS-AUTOPROVISIONING
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: Executing http://192.0.2.1/demo.sh
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: Payload returned code 1
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: Script returned failure
May 11 16:38:45 dell-s6010-01 systemd[1]: ztp.service: main process exited, code=exited, status=1/FAILURE
May 11 16:38:45 dell-s6010-01 systemd[1]: Unit ztp.service entered failed state.
Instead of running journalctl, you can see the log history by running:
cumulus@switch:~$ cat /var/log/syslog | grep ztp
2016-05-11T16:37:45.132583+00:00 cumulus ztp [400]: /var/lib/cumulus/ztp: State Directory does not exist. Creating it...
2016-05-11T16:37:45.134081+00:00 cumulus ztp [400]: /var/run/ztp.lock: Lock File does not exist. Creating it...
2016-05-11T16:37:45.135360+00:00 cumulus ztp [400]: /var/lib/cumulus/ztp/ztp_state.log: State File does not exist. Creating it...
2016-05-11T16:37:45.185598+00:00 cumulus ztp [400]: ZTP LOCAL: Looking for ZTP local Script
2016-05-11T16:37:45.485084+00:00 cumulus ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64-dell_s6010_s1220-rUNKNOWN
2016-05-11T16:37:45.486394+00:00 cumulus ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64-dell_s6010_s1220
2016-05-11T16:37:45.488385+00:00 cumulus ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64-dell
2016-05-11T16:37:45.489665+00:00 cumulus ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64
2016-05-11T16:37:45.490854+00:00 cumulus ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp
2016-05-11T16:37:45.492296+00:00 cumulus ztp [400]: ZTP USB: Looking for unmounted USB devices
2016-05-11T16:37:45.493525+00:00 cumulus ztp [400]: ZTP USB: Parsing partitions
2016-05-11T16:37:45.636422+00:00 cumulus ztp [400]: ZTP USB: Device not found
2016-05-11T16:38:43.372857+00:00 cumulus ztp [1805]: Found ZTP DHCP Request
2016-05-11T16:38:45.696562+00:00 cumulus ztp [400]: ZTP DHCP: Looking for ZTP Script provided by DHCP
2016-05-11T16:38:45.698598+00:00 cumulus ztp [400]: Attempting to provision via ZTP DHCP from http://192.0.2.1/demo.sh
2016-05-11T16:38:45.816275+00:00 cumulus ztp [400]: ZTP DHCP: URL response code 200
2016-05-11T16:38:45.817446+00:00 cumulus ztp [400]: ZTP DHCP: Found Marker CUMULUS-AUTOPROVISIONING
2016-05-11T16:38:45.818402+00:00 cumulus ztp [400]: ZTP DHCP: Executing http://192.0.2.1/demo.sh
2016-05-11T16:38:45.834240+00:00 cumulus ztp [400]: ZTP DHCP: Payload returned code 1
2016-05-11T16:38:45.835488+00:00 cumulus ztp [400]: Script returned failure
2016-05-11T16:38:45.876334+00:00 cumulus systemd[1]: ztp.service: main process exited, code=exited, status=1/FAILURE
2016-05-11T16:38:45.879410+00:00 cumulus systemd[1]: Unit ztp.service entered failed state.
If you see that the issue is a script failure, you can modify the script and then run ZTP manually using ztp -v -r <URL/path to that script>, as above.
cumulus@switch:~$ sudo ztp -v -r http://192.0.2.1/demo.sh
Attempting to provision via ZTP Manual from http://192.0.2.1/demo.sh
Broadcast message from root@dell-s6010-01 (ttyS0) (Tue May 10 22:44:17 2019):
ZTP: Attempting to provision via ZTP Manual from http://192.0.2.1/demo.sh
ZTP Manual: URL response code 200
ZTP Manual: Found Marker CUMULUS-AUTOPROVISIONING
ZTP Manual: Executing http://192.0.2.1/demo.sh
error: ZTP Manual: Payload returned code 1
error: Script returned failure
cumulus@switch:~$ sudo ztp -s
State enabled
Version 1.0
Result Script Failure
Date Mon 20 May 2019 09:31:27 PM UTC
Method ZTP Manual
URL http://192.0.2.1/demo.sh
Use the following command to check syslog for information about ZTP:
Errors in syslog for ZTP like those shown above often occur if you create or edit the script on a Windows machine. Check to make sure that the \r\n characters are not present in the end-of-line encodings.
Use the cat -v ztp.sh command to view the contents of the script and search for any hidden characters.
root@oob-mgmt-server:/var/www/html# cat -v ./ztp_oob_windows.sh
#!/bin/bash^M
^M
###################^M
# ZTP Script^M
###################^M
^M
/usr/cumulus/bin/cl-license -i http://192.168.0.254/license.txt^M
^M
# Clean method of performing a Reboot^M
nohup bash -c 'sleep 2; shutdown now -r "Rebooting to Complete ZTP"' &^M
^M
exit 0^M
^M
# The line below is required to be a valid ZTP script^M
#CUMULUS-AUTOPROVISIONING^M
root@oob-mgmt-server:/var/www/html#
The ^M characters in the output of your ZTP script, as shown above, indicate the presence of Windows end-of-line encodings that you need to remove.
Use the translate (tr) command on any Linux system to remove the '\r' characters from the file.
root@oob-mgmt-server:/var/www/html# tr -d '\r' < ztp_oob_windows.sh > ztp_oob_unix.sh
root@oob-mgmt-server:/var/www/html# cat -v ./ztp_oob_unix.sh
#!/bin/bash
###################
# ZTP Script
###################
/usr/cumulus/bin/cl-license -i http://192.168.0.254/license.txt
# Clean method of performing a Reboot
nohup bash -c 'sleep 2; shutdown now -r "Rebooting to Complete ZTP"' &
exit 0
# The line below is required to be a valid ZTP script
#CUMULUS-AUTOPROVISIONING
root@oob-mgmt-server:/var/www/html#
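To catch Windows line endings before you publish a script, you can test for carriage return characters directly. The following is a minimal sketch; the function name has_crlf is hypothetical.

```shell
# Hypothetical helper: succeed (exit status 0) when the file contains at
# least one carriage return character, as in Windows line endings.
has_crlf() {
    grep -q "$(printf '\r')" "$1"
}
```

For example, has_crlf ztp_oob_windows.sh && echo "convert line endings first" flags a script that still needs the tr conversion shown above.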
Manually Use the ztp Command
To enable ZTP, use the -e option:
cumulus@switch:~$ sudo ztp -e
When you enable ZTP, it tries to run the next time the switch boots. However, if ZTP already ran on a previous boot up or if there is a manual configuration, ZTP exits without trying to look for a script.
ZTP checks for these manual configurations when the switch boots:
Password changes
Users and groups changes
Packages changes
Interfaces changes
When the switch boots for the first time, ZTP records the state of important files that can update after you configure the switch. After a reboot, ZTP compares the recorded state to the current state of these files. If they do not match, ZTP considers the switch as already provisioned and exits. ZTP only deletes these files after a reset.
To reset ZTP to its original state, use the -R option. This removes the ztp directory and ZTP runs the next time the switch reboots.
cumulus@switch:~$ sudo ztp -R
To disable ZTP, use the -d option:
cumulus@switch:~$ sudo ztp -d
To force provisioning to occur and ignore the status listed in the configuration file, use the -r option:
cumulus@switch:~$ sudo ztp -r cumulus-ztp.sh
To see the current ZTP state, use the -s option:
cumulus@switch:~$ sudo ztp -s
ZTP INFO:
State disabled
Version 1.0
Result success
Date Mon May 20 21:51:04 2019 UTC
Method Switch manually configured
URL None
Considerations
While you are writing a provisioning script, you sometimes need to reboot the switch.
You can use the Cumulus Linux onie-select -i command to reprovision the switch and install a network operating system again using ONIE.
System Configuration
This section describes how to configure the following system settings:
NVUE is an object-oriented, schema-driven model of a complete Cumulus Linux system (hardware and software). It provides a robust API that lets multiple interfaces view (show) and configure (set and unset) any element of a system running the NVUE software.
For a description of the NVUE object model, go to NVUE Object Model.
For an overview of the NVUE CLI commands, go to NVUE CLI.
For information on how to access and use the NVUE API, go to NVUE API.
For information on how to use NVUE snippets, go to NVUE Snippets.
NVUE Object Model
The NVUE object model definition uses the OpenAPI specification (OAS). Similar to YANG (RFC 6020 and RFC 7950), OAS is a data definition, manipulation, and modeling language (DML) that lets you build model-driven interfaces for both humans and machines. Although the computer networking and telecommunications industry commonly uses YANG (standardized by IETF) as a DML, the adoption of OpenAPI is broader, spanning cloud to compute to storage to IoT and even social media. The OpenAPI Initiative (OAI) consortium, a chartered project under the Linux Foundation, leads OpenAPI standardization.
The OAS schema forms the management plane model with which you configure, monitor, and manage the Cumulus Linux switch. The v3.0.2 version of OAS defines the NVUE data model.
Like other systems that use OpenAPI, the NVUE OAS schema defines the endpoints (paths) exposed as RESTful APIs. With these REST APIs, you can perform various create, retrieve, update, delete, and eXecute (CRUDX) operations. The OAS schema also describes the API inputs and outputs (data models).
You can use the NVUE object model in these two ways:
Through the NVUE REST API, where you run the GET, PATCH, DELETE, and other REST APIs on the NVUE object model endpoints to configure, monitor, and manage the switch. Because of the large user community and maturity of OAS, you can use several popular tools and libraries to create client-side bindings to use the NVUE REST API.
Through the NVUE CLI, where you configure, monitor and manage the Cumulus Linux network elements. The CLI commands translate to their equivalent REST APIs, which Cumulus Linux then runs on the NVUE object model.
The CLI and the REST API are equivalent in functionality; you can run all management operations from the REST API or the CLI. The NVUE object model drives both the REST API and the CLI management operations. All operations are consistent; for example, the CLI nv show commands reflect any PATCH operation (create) you run through the REST API.
NVUE follows a declarative model, removing context-specific commands and settings. It is structured as a big tree that represents the entire state of a Cumulus Linux instance. At the base of the tree are high level branches representing objects, such as router and interface. Under each of these branches are further branches. As you navigate through the tree, you gain a more specific context. At the leaves of the tree are actual attributes, represented as key-value pairs. The path through the tree is similar to a filesystem path.
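To make the filesystem analogy concrete, the following sketch joins the branch names of an object into a path. The /nvue_v1 prefix is an assumption about the base path of the REST API; confirm it in the NVUE API documentation. The function name nvue_rest_path is hypothetical.

```shell
# Hypothetical helper: join object tree branches into a REST-style path
# under an assumed /nvue_v1 base path.
nvue_rest_path() {
    path=/nvue_v1
    for part in "$@"; do
        path="${path}/${part}"
    done
    printf '%s\n' "$path"
}
```

For example, nvue_rest_path interface swp1 link prints /nvue_v1/interface/swp1/link, which mirrors the words of the nv show interface swp1 link command.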
Cumulus Linux installs NVUE by default and enables the NVUE service nvued.
NVUE CLI
The NVUE CLI has a flat structure instead of a modal structure. Therefore, you can run all commands from the primary prompt instead of only in a specific mode.
You can choose to configure Cumulus Linux either with NVUE commands or Linux commands (with vtysh or by manually editing configuration files). Do not run both NVUE configuration commands (such as nv set, nv unset, nv action, and nv config) and Linux commands to configure the switch. NVUE commands replace the configuration in files such as /etc/network/interfaces and /etc/frr/frr.conf, and remove any configuration you add manually or with automation tools like Ansible, Chef, or Puppet.
If you choose to configure Cumulus Linux with NVUE, you can configure features that do not yet support the NVUE Object Model by creating snippets. See NVUE Snippets.
Command Syntax
NVUE commands all begin with nv and fall into one of four syntax categories:
Configuration (nv set and nv unset)
Monitoring (nv show)
Configuration management (nv config)
Action commands (nv action)
Command Completion
As you enter commands, you can get help with the valid keywords or options using the tab key. For example, using tab completion with nv set displays the possible options for the command and returns you to the command prompt to complete the command.
cumulus@switch:~$ nv set <<press tab>>
acl evpn mlag qos service vrf
bridge interface nve router system
cumulus@switch:~$ nv set
Command Question Mark
You can type a question mark (?) after a command to display required information quickly and concisely. When you type ?, NVUE specifies the value type, range, and options with a brief description of each; for example:
cumulus@switch:~$ nv set interface swp1 link state ?
[Enter]
down The interface is not ready
up The interface is ready
cumulus@switch:~$ nv set interface swp1 link mtu ?
<arg> (integer:552 - 9216)
cumulus@switch:~$ nv set interface swp1 link speed ?
<arg> (string | enum:10M,100M,1G,10G,25G,40G,50G,100G,200G,400G,800G,auto)
NVUE also indicates if you need to provide specific values for the command.
Command Abbreviation
NVUE supports command abbreviation, where you can type a certain number of characters instead of a whole command to speed up CLI interaction. For example, instead of typing nv show interface, you can type nv sh int.
If the command you type is ambiguous, NVUE shows the reason for the ambiguity so that you can correct the shortcut. For example:
cumulus@switch:~$ nv s i
Ambiguous Command:
set interface
show interface
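The abbreviation behavior above can be approximated with prefix matching. The sketch below is an assumption about the general technique, not NVUE's actual parser; the keyword lists are a subset taken from this section and the expand function is hypothetical.

```python
# Keyword lists are a subset taken from this section, not the full NVUE grammar.
TOP_LEVEL = ["set", "unset", "show", "config", "action"]
OBJECTS = ["acl", "bridge", "evpn", "interface", "mlag", "nve",
           "qos", "router", "service", "system", "vrf"]


def expand(words):
    """Return every full command that the abbreviated words could mean."""
    candidates = [""]
    for word, keywords in zip(words, (TOP_LEVEL, OBJECTS)):
        matches = [k for k in keywords if k.startswith(word)]
        # Combine each partial command with each keyword the prefix matches.
        candidates = [f"{c} {m}".strip() for c in candidates for m in matches]
    return candidates


print(expand(["sh", "int"]))  # one candidate, so the shortcut is unambiguous
print(expand(["s", "i"]))     # two candidates, so the shortcut is ambiguous
```

When the function returns more than one candidate, the shortcut is ambiguous and needs more characters, as in the nv s i example.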
Command Help
As you enter commands, you can get help with command syntax by entering -h or --help at various points within a command entry. For example, to examine the options available for nv set interface, enter nv set interface -h or nv set interface --help.
cumulus@switch:~$ nv set interface -h
usage:
nv [options] set interface <interface-id>
Description:
interface Update all interfaces
Identifiers:
<interface-id> Interface (interface-name)
Output Options:
-o <format>, --output <format>
Supported formats: json, yaml, auto, constable, end-table, commands (default:auto)
--color (on|off|auto)
Toggle coloring of output (default: auto)
--paginate (on|off|auto)
Whether to send output to a pager (default: off)
General Options:
-h, --help Show help.
Command List
You can list all the NVUE commands by running nv list-commands. See List All NVUE Commands below.
Command History
At the command prompt, press the Up Arrow and Down Arrow keys to move back and forth through the list of commands you entered previously. When you find the command you want to use, you can run the command by pressing Enter. You can also modify the command before you run it.
Command Categories
The NVUE CLI has a flat structure; however, the commands are in four functional categories:
Configuration
Monitoring
Configuration Management
Action
Configuration Commands
The NVUE configuration commands modify switch configuration. You can set and unset configuration options.
The nv set and nv unset commands are in the following categories. Each command group includes subcommands. Use command completion (press the tab key) to list the subcommands.
Command Group
Description
nv set acl nv unset acl
Configures ACLs.
nv set bridge nv unset bridge
Configures a bridge domain. This is where you configure bridge attributes, such as the bridge type (VLAN-aware), the STP state and priority, and VLANs.
nv set evpn nv unset evpn
Configures EVPN. This is where you enable and disable the EVPN control plane, and set EVPN route advertise, multihoming, and duplicate address detection options.
nv set interface <interface-id> nv unset interface <interface-id>
Configures the switch interfaces. Use this command to configure bond and bridge interfaces, interface IP addresses and descriptions, VLAN IDs, and links (MTU, FEC, speed, duplex, and so on).
nv set mlag nv unset mlag
Configures MLAG. This is where you configure the backup IP address or interface, MLAG system MAC address, peer IP address, MLAG priority, and the delay before bonds come up.
nv set nve nv unset nve
Configures network virtualization (VXLAN) settings. This is where you configure the UDP port for VXLAN frames, control dynamic MAC learning over VXLAN tunnels, enable and disable ARP and ND suppression, and configure how Cumulus Linux handles BUM traffic in the overlay.
nv set qos nv unset qos
Configures QoS RoCE.
nv set router nv unset router
Configures router policies (prefix list rules and route maps), global BGP options (enable and disable, ASN and router ID, BGP graceful restart and shutdown), global OSPF options (enable and disable, router ID, and OSPF timers), PIM, IGMP, PBR, VRR, and VRRP.
nv set service nv unset service
Configures DHCP relays and servers, NTP, PTP, LLDP, SNMP servers, DNS, and syslog.
nv set system nv unset system
Configures system settings, such as the hostname of the switch, pre-login and post-login messages, reboot options (warm, cold, fast), the time zone, and global system settings, such as the anycast ID, the system MAC address, and the anycast MAC address. This is also where you configure SPAN and ERSPAN sessions and set how configuration apply operations work (which files to ignore and which files to overwrite; see Configure NVUE to Ignore Linux Files).
nv set vrf <vrf-id> nv unset vrf <vrf-id>
Configures VRFs. This is where you configure VRF-level configuration for PTP, BGP, OSPF, and EVPN.
Monitoring Commands
The NVUE monitoring commands show various parts of the network configuration. For example, you can show the complete network configuration or only interface configuration. The monitoring commands are in the following categories. Each command group includes subcommands. Use command completion (press the tab key) to list the subcommands.
Command Group
Description
nv show acl
Shows ACL configuration.
nv show action
Shows information about the action commands that reset counters and remove conflicts.
nv show bridge
Shows bridge domain configuration.
nv show evpn
Shows EVPN configuration.
nv show interface
Shows interface configuration and counters.
nv show mlag
Shows MLAG configuration.
nv show nve
Shows network virtualization configuration, such as VXLAN-specific MLAG configuration and VXLAN flooding.
nv show platform
Shows platform configuration, such as hardware and software components.
nv show qos
Shows QoS RoCE configuration.
nv show router
Shows router configuration, such as router policies, global BGP and OSPF configuration, PBR, PIM, IGMP, VRR, and VRRP configuration.
nv show service
Shows DHCP relays and server, NTP, PTP, LLDP, and syslog configuration.
nv show system
Shows global system settings, such as the reserved routing table range for PBR and the reserved VLAN range for layer 3 VNIs. You can also see system login messages and switch reboot history.
nv show vrf
Shows VRF configuration.
The following example shows the nv show router commands after pressing the tab key, then shows the output of the nv show router bgp command.
cumulus@leaf01:mgmt:~$ nv show router <<tab>>
adaptive-routing igmp ospf pim ptm vrrp
bgp nexthop-group pbr policy vrr
cumulus@leaf01:mgmt:~$ nv show router bgp
operational applied pending description
------------------------------ ----------- ------- ----------- ----------------------------------------------------------------------
enable off on Turn the feature 'on' or 'off'. The default is 'off'.
autonomous-system none ASN for all VRFs, if a single AS is in use. If "none", then ASN mu...
graceful-shutdown off Graceful shutdown enable will initiate the GSHUT community to be an...
policy-update-timer 5 Wait time in seconds before processing updates to policies to ensur...
router-id none BGP router-id for all VRFs, if a common one is used. If "none", th...
wait-for-install off bgp waits for routes to be installed into kernel/asic before advert...
convergence-wait
establish-wait-time 0 Maximum time to wait to establish BGP sessions. Any peers which do...
time 0 Time to wait for peers to send end-of-RIB before router performs pa...
graceful-restart
mode helper-only Role of router during graceful restart. helper-only, router is in h...
path-selection-deferral-time 360 Used by the restarter as an upper-bounds for waiting for peering es...
restart-time 120 Amount of time taken to restart by router. It is advertised to the...
stale-routes-time 360 Specifies an upper-bounds on how long we retain routes from a resta...
cumulus@leaf01:mgmt:~$
If there are no pending or applied configuration changes, the nv show command only shows the running configuration (under operational).
Additional options are available for the nv show commands. For example, you can choose the configuration you want to show (pending, applied, startup, or operational). You can also turn on colored output, and paginate specific output.
Option
Description
--view
Shows these different views: acl-statistics, brief, detail, lldp, mac, mlag-cc, pluggables, qos-profile, and small. This option is available for the nv show interface command only. For example, the nv show interface --view=small command shows a list of the interfaces on the switch and the nv show interface --view=brief command shows information about each interface on the switch, such as the interface type, speed, remote host and port. The nv show interface --view=mac command shows the MAC address of each interface and the nv show interface --view=qos-profile command shows the QoS profile for the interfaces on the switch. Note: The description column only shows in the output when you use the --view=detail option.
--filter
Filters show command output on column data. For example, nv show interface --filter mtu=1500 shows only the interfaces with MTU set to 1500. To filter on multiple column outputs, enclose the entire filter in double quotes; for example, nv show interface --filter "type=bridge&mtu=9216" shows data for bridges with MTU 9216. You can use wildcards; for example, nv show interface swp1 --filter "ip.address=1*" shows all IP addresses that start with 1 for swp1. You can filter on all revisions (operational, applied, and pending); for example, nv show interface --filter "ip.address=1*" --rev=applied shows all IP addresses that start with 1 in the applied revision.
--rev <revision>
Shows a detached pending configuration. See the nv config detach configuration management command below. For example, nv show --rev 1. You can also show only applied or only operational information in the nv show output. For example, to show only the applied settings for swp1 configuration, run the nv show interface swp1 --rev=applied command. To show only the operational settings for swp1 configuration, run the nv show interface swp1 --rev=operational command.
--applied
Shows configuration applied with the nv config apply command. For example, nv show --applied interface bond1.
--operational
Shows the running configuration (the actual system state). For example, nv show --operational interface bond1 shows the running configuration for bond1. The running and applied configuration should be the same. If different, inspect the logs.
--pending
Shows the last applied configuration and any pending set or unset configuration that you have not yet applied. For example, nv show --pending interface bond1.
--startup
Shows configuration saved with the nv config save command. This is the configuration after the switch boots.
--output
Shows command output in table (auto), json, or yaml format. For example: nv show --output auto interface bond1, nv show --output json interface bond1, or nv show --output yaml interface bond1.
--color
Turns colored output on or off. For example, nv show --color on interface bond1.
--paginate
Paginates the output. For example, nv show --paginate on interface bond1.
--help
Shows help for the NVUE commands.
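To make the --filter semantics in the table above more concrete, the following Python sketch filters rows on column=pattern terms joined by &, with * as a wildcard. It is an illustration under stated assumptions: the sample rows, the column names, and the apply_filter helper are hypothetical, and NVUE's own matching rules might differ in detail.

```python
from fnmatch import fnmatchcase

# Hypothetical rows standing in for interface data; not real switch output.
interfaces = [
    {"name": "swp1", "type": "swp", "mtu": "9216"},
    {"name": "br_default", "type": "bridge", "mtu": "9216"},
    {"name": "eth0", "type": "eth", "mtu": "1500"},
]


def apply_filter(rows, expression):
    """Keep rows whose columns match every key=pattern term joined by '&'."""
    terms = [term.split("=", 1) for term in expression.split("&")]
    return [
        row for row in rows
        if all(fnmatchcase(row.get(key, ""), pattern) for key, pattern in terms)
    ]


# Multiple terms must all match, like --filter "type=bridge&mtu=9216".
print([row["name"] for row in apply_filter(interfaces, "type=bridge&mtu=9216")])
# A trailing * matches any value with the given prefix, like "mtu=1*".
print([row["name"] for row in apply_filter(interfaces, "mtu=1*")])
```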
The following example shows pending BGP graceful restart configuration:
cumulus@switch:~$ nv show router bgp graceful-restart --pending
4 description
---------------------------- ----------------- ----------------------------------------------------------------------
mode helper-only Role of router during graceful restart. helper-only, router is in h...
path-selection-deferral-time 360 Used by the restarter as an upper-bounds for waiting for peering es...
restart-time 120 Amount of time taken to restart by router. It is advertised to the...
stale-routes-time 360 Specifies an upper-bounds on how long we retain routes from a resta...
Monitoring Commands and FRR Daemons
If you run an NVUE show command but the corresponding FRR routing daemons are not running on the switch, you see an error message; for example:
If OSPF is not running when you run nv show vrf <vrf-id> ospf commands, NVUE returns Error: The requested item does not exist because the OSPF daemon is not running in FRR.
If PIM and IGMP are not running when you run the nv show interface <interface> ip igmp -o json command, NVUE returns Error: The requested item does not exist because the PIM daemon is not running in FRR.
If PIM is running but IGMP is not running when you run the nv show interface <interface> ip igmp group -o json command, NVUE does not return an error message but shows an empty { } response.
Net Show Commands
In addition to the nv show commands, Cumulus Linux continues to provide a subset of the NCLU net show commands. Use these commands to get additional views of various parts of your network configuration.
cumulus@leaf01:mgmt:~$ net show
bfd : Bidirectional forwarding detection
bgp : Border Gateway Protocol
bridge : a layer2 bridge
clag : Multi-Chassis Link Aggregation
commit : apply the commit buffer to the system
configuration : settings, configuration state, etc
counters : net show counters
debugs : Debugs
dhcp-snoop : DHCP snooping for IPv4
dhcp-snoop6 : DHCP snooping for IPv6
dot1x : Configure, Enable, Delete or Show IEEE 802.1X EAPOL
evpn : Ethernet VPN
hostname : local hostname
igmp : Internet Group Management Protocol
interface : An interface, such as swp1, swp2, etc.
ip : Internet Protocol version 4/6
ipv6 : Internet Protocol version 6
lldp : Link Layer Discovery Protocol
mpls : Multiprotocol Label Switching
mroute : Static unicast routes in MRIB for multicast RPF lookup
msdp : Multicast Source Discovery Protocol
neighbor : A BGP, OSPF, PIM, etc neighbor
ospf : Open Shortest Path First (OSPFv2)
ospf6 : Open Shortest Path First (OSPFv3)
package : A Cumulus Linux package name
pbr : Policy Based Routing
pim : Protocol Independent Multicast
port-mirror : port-mirror
port-security : Port security
ptp : Precision Time Protocol
roce : Enable RoCE on all interfaces, default mode is lossless
rollback : revert to a previous configuration state
route : EVPN route information
route-map : Route-map
snmp-server : Configure the SNMP server
system : System
time : Time
version : Version number
vrf : Virtual routing and forwarding
vrrp : Virtual Router Redundancy Protocol
Configuration Management Commands
The NVUE configuration management commands manage and apply configurations.
Command
Description
nv config apply
Applies the pending configuration to become the applied configuration. You can also use these prompt options:
--y or --assume-yes to automatically reply yes to all prompts.
--assume-no to automatically reply no to all prompts.
Cumulus Linux applies but does not save the configuration; the configuration does not persist after a reboot.
You can also use these apply options:
--confirm applies the configuration change but you must confirm the applied configuration. If you do not confirm within ten minutes, the configuration rolls back automatically. You can change the default time with the apply --confirm <time> command. For example, apply --confirm 60 requires you to confirm within one hour.
--confirm-status shows the amount of time left before the automatic rollback.
To save the pending configuration to the startup configuration automatically when you run nv config apply so that you do not have to run the nv config save command, enable auto save.
nv config detach
Detaches the configuration from the current pending configuration and uses an integer to identify it; for example, 4. To list all the current detached pending configurations, run nv config diff <<press tab>>.
nv config diff <revision> <revision>
Shows differences between configurations, such as the pending configuration and the applied configuration, or the detached configuration and the pending configuration.
nv config history <revision>
Shows the apply history for the revision.
nv config patch <nvue-file>
Updates the pending configuration with the specified YAML configuration file.
nv config replace <nvue-file>
Replaces the pending configuration with the specified YAML configuration file.
nv config save
Overwrites the startup configuration with the applied configuration by writing to the /etc/nvue.d/startup.yaml file. The configuration persists after a reboot.
nv config show
Shows the currently applied configuration in yaml format. This command also shows NVUE version information.
nv config show -o commands
Shows the currently applied configuration commands.
nv config diff -o commands
Shows differences between two configuration revisions as NVUE commands.
You can use the NVUE configuration management commands to back up and restore configuration when you upgrade Cumulus Linux on the switch. Refer to Upgrading Cumulus Linux.
Action Commands
The NVUE action commands clear counters, and provide system reboot and TACACS user disconnect options.
The reboot action reboots the switch in the configured restart mode (fast, cold, or warm). You must specify the no-confirm option with this command.
List All NVUE Commands
To show the full list of NVUE commands, run nv list-commands. For example:
cumulus@switch:~$ nv list-commands
nv show platform
nv show platform hardware
nv show platform hardware component
nv show platform hardware component <component-id>
nv show platform software
nv show platform software installed
nv show platform software installed <installed-id>
nv show platform capabilities
nv show platform environment
...
You can show the list of commands for a command grouping. For example, to show the list of interface commands:
cumulus@switch:~$ nv list-commands interface
nv show interface
nv show interface <interface-id>
nv show interface <interface-id> ip
nv show interface <interface-id> ip address
nv show interface <interface-id> ip address <ip-prefix-id>
nv show interface <interface-id> ip gateway
nv show interface <interface-id> ip gateway <ip-address-id>
...
Use the tab key to get help for the command lists you want to see. For example, to show the list of command options available for swp1, run the nv list-commands interface swp1 command and press the tab key:
cumulus@switch:~$ nv list-commands interface swp1 <<press tab>>
acl counters link ptp storm-control
bond evpn lldp qos synce
bridge ip pluggable router tunnel
To view the NVUE command reference for Cumulus Linux, which describes all the NVUE CLI commands and provides examples, go to the NVUE Command Reference.
NVUE Configuration File
When you save network configuration, NVUE writes the configuration to the /etc/nvue.d/startup.yaml file.
You can edit or replace the contents of the /etc/nvue.d/startup.yaml file. NVUE applies the configuration in the /etc/nvue.d/startup.yaml file during system boot only if the nvue-startup.service is running. If this service is not running, the switch reboots with the same configuration that was running before the reboot.
When you apply a configuration with nv config apply, NVUE also writes to underlying Linux files such as /etc/network/interfaces and /etc/frr/frr.conf. You can view these configuration files; however, do not manually edit them while using NVUE. If you need to configure certain network settings manually or use automation such as Ansible to configure the switch, see Configure NVUE to Ignore Linux Files below.
Configuration Files that NVUE Manages
NVUE manages the following configuration files:
File
Description
/etc/network/interfaces
Configures the network interfaces available on your system.
/etc/frr/frr.conf
Configures FRRouting.
/etc/cumulus/switchd.conf
Configures switchd options.
/etc/cumulus/switchd.d/ptp.conf
Configures PTP timestamping.
/etc/frr/daemons
Configures FRRouting services.
/etc/hosts
Configures the hostname of the switch.
/etc/default/isc-dhcp-relay-default
Configures DHCP relay options.
/etc/dhcp/dhcpd.conf
Configures DHCP server options.
/etc/hostname
Configures the hostname of the switch.
/etc/cumulus/datapath/qos/qos_features.conf
Configures QoS settings, such as traffic marking, shaping and flow control.
/etc/mlx/datapath/qos/qos_infra.conf
Configures QoS platform specific configurations, such as buffer allocations and Alpha values.
/etc/cumulus/switchd.d/qos.conf
Configures QoS settings.
/etc/cumulus/ports.conf
Configures port breakouts.
/etc/ntp.conf
Configures NTP settings.
/etc/ptp4l.conf
Configures PTP settings.
/etc/snmp/snmpd.conf
Configures SNMP settings.
Search for a Specific Configuration
To search for a specific portion of the NVUE configuration, run the nv config find <search string> command. The search shows all items above and below the search string. For example, to search the entire NVUE object model configuration for any mention of ptm:
cumulus@switch:~$ nv config find ptm
Configure NVUE to Ignore Linux Files
You can configure NVUE to ignore certain underlying Linux files when applying configuration changes. For example, if you push certain configuration to the switch using Ansible and Jinja2 file templates or you want to use custom configuration for a particular service such as PTP, you can ensure that NVUE never writes to those configuration files.
The following example configures NVUE to ignore the Linux /etc/ptp4l.conf file when applying configuration changes and saves the configuration so it persists after a reboot.
cumulus@switch:~$ nv set system config apply ignore /etc/ptp4l.conf
cumulus@switch:~$ nv config apply
cumulus@switch:~$ nv config save
Configure Auto Save
By default, when you run the nv config apply command to apply a configuration setting, NVUE applies the pending configuration to become the applied configuration but does not update the startup configuration file (/etc/nvue.d/startup.yaml). To save the applied configuration to the startup configuration so that the changes persist after the reboot, you must run the nv config save command. The auto save option lets you save the pending configuration to the startup configuration automatically when you run nv config apply so that you do not have to run the nv config save command.
To enable auto save:
cumulus@switch:~$ nv set system config auto-save enable on
cumulus@switch:~$ nv config apply
To disable auto save, run the nv set system config auto-save enable off command.
Add Configuration Apply Messages
When you run the nv config apply command, you can add a message that describes the configuration updates you make. You can see the message when you run the nv config history command.
To add a configuration apply message, run the nv config apply -m <message> command. If the message includes more than one word, enclose the message in quotes.
cumulus@switch:~$ nv config apply -m "this is my message"
Reset NVUE Configuration to Default Values
To reset the NVUE configuration on the switch back to the default values, run the following command:
cumulus@switch:~$ nv config apply empty
Example Configuration Commands
This section provides examples of how to configure a Cumulus Linux switch using NVUE commands.
Configure the System Hostname
The example below shows the NVUE commands required to change the hostname for the switch to leaf01:
cumulus@switch:~$ nv set system hostname leaf01
cumulus@switch:~$ nv config apply
Configure the System DNS Server
The example below shows the NVUE commands required to define the DNS server for the switch:
cumulus@switch:~$ nv set service dns mgmt server 192.168.200.1
cumulus@switch:~$ nv config apply
Configure an Interface
The example below shows the NVUE commands required to bring up swp1.
cumulus@switch:~$ nv set interface swp1
cumulus@switch:~$ nv config apply
Configure a Bond
The example below shows the NVUE commands required to configure the front panel port interfaces swp1 through swp4 as members of bond0.
cumulus@switch:~$ nv set interface bond0 bond member swp1-4
cumulus@switch:~$ nv config apply
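The swp1-4 shorthand in the command above stands for swp1, swp2, swp3, and swp4. The Python sketch below shows one way to expand such ranges. It is a hypothetical helper for illustration only; NVUE's own range syntax might accept forms that this sketch does not handle.

```python
import re


def expand_range(spec):
    """Expand 'swp1-4' to ['swp1', ..., 'swp4']; pass other names through."""
    names = []
    for part in spec.split(","):
        # A name prefix, a first number, a hyphen, and a last number.
        match = re.fullmatch(r"(.*?)(\d+)-(\d+)", part)
        if match:
            prefix = match.group(1)
            first, last = int(match.group(2)), int(match.group(3))
            names.extend(f"{prefix}{n}" for n in range(first, last + 1))
        else:
            names.append(part)  # not a range, keep as-is
    return names


print(expand_range("swp1-4"))
print(expand_range("swp49-50"))
print(expand_range("10,20"))
```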
Configure a Bridge
The example below shows the NVUE commands required to create a VLAN-aware bridge that contains two switch ports (swp1 and swp2) and includes three VLANs: tagged VLANs 10 and 20 and an untagged (native) VLAN of 1.
With NVUE, there is a default bridge called br_default, which has no ports assigned to it. The example below configures this default bridge.
cumulus@switch:~$ nv set interface swp1-2 bridge domain br_default
cumulus@switch:~$ nv set bridge domain br_default vlan 10,20
cumulus@switch:~$ nv set bridge domain br_default untagged 1
cumulus@switch:~$ nv config apply
Configure MLAG
The example below shows the NVUE commands required to configure MLAG on leaf01. The commands:
Place swp1 into bond1 and swp2 into bond2.
Configure the MLAG ID to 1 for bond1 and to 2 for bond2.
Add bond1 and bond2 to the default bridge (br_default).
Create the inter-chassis bond (swp49 and swp50) and the peer link (peerlink).
Set the peer link IP address to linklocal, the MLAG system MAC address to 44:38:39:BE:EF:AA, and the backup interface to 10.10.10.2.
cumulus@leaf01:~$ nv set interface bond1 bond member swp1
cumulus@leaf01:~$ nv set interface bond2 bond member swp2
cumulus@leaf01:~$ nv set interface bond1 bond mlag id 1
cumulus@leaf01:~$ nv set interface bond2 bond mlag id 2
cumulus@leaf01:~$ nv set interface bond1-2 bridge domain br_default
cumulus@leaf01:~$ nv set interface peerlink bond member swp49-50
cumulus@leaf01:~$ nv set mlag mac-address 44:38:39:BE:EF:AA
cumulus@leaf01:~$ nv set mlag backup 10.10.10.2
cumulus@leaf01:~$ nv set mlag peer-ip linklocal
cumulus@leaf01:~$ nv config apply
Configure BGP Unnumbered
The example below shows the NVUE commands required to configure BGP unnumbered on leaf01. The commands:
Assign the ASN for this BGP node to 65101.
Set the router ID to 10.10.10.1.
Distribute routing information to the peer on swp51.
Originate the prefix 10.10.10.1/32 from this BGP node.
cumulus@leaf01:~$ nv set router bgp autonomous-system 65101
cumulus@leaf01:~$ nv set router bgp router-id 10.10.10.1
cumulus@leaf01:~$ nv set vrf default router bgp neighbor swp51 remote-as external
cumulus@leaf01:~$ nv set vrf default router bgp address-family ipv4-unicast network 10.10.10.1/32
cumulus@leaf01:~$ nv config apply
Example Monitoring Commands
This section provides monitoring command examples.
Show Installed Software
The following example command lists the software installed on the switch:
cumulus@switch:~$ nv show platform software
Installed Software
=====================
Installed software description package version
--------------------------- --------------------------- -------------------------- -----------------------------
acpi displays information on ACPI acpi 1.7-1.1
devices
acpi-support-base scripts for handling base acpi-support-base 0.142-8
ACPI events such as the
power button
acpid Advanced Configuration and acpid 1:2.0.31-1
Power Interface event daemon
adduser add and remove users and adduser 3.118
groups
apt commandline package manager apt 1.8.2.3
...
Show Interface Configuration
The following example command shows the running, applied, and pending swp1 interface configuration.
cumulus@leaf01:~$ nv show interface swp1
operational applied
------------------------ ----------------- ----------
type swp swp
[acl]
bridge
[domain] br_default br_default
evpn
multihoming
uplink off
ptp
enable off
router
adaptive-routing
enable off
ospf
enable off
ospf6
enable off
pbr
[map]
pim
...
Example Configuration Management Commands
This section provides examples of how to use the configuration management commands to apply, save, and detach configurations.
Apply and Save a Configuration
The following example command configures the front panel port interfaces swp1 through swp4 as members of bond0. The configuration is in a pending state only; NVUE has not applied it and has not yet made any changes to the running configuration.
cumulus@switch:~$ nv set interface bond0 bond member swp1-4
To apply the pending configuration to the running configuration, run the nv config apply command. The configuration does not persist after a reboot.
cumulus@switch:~$ nv config apply
To save the applied configuration to the startup configuration, run the nv config save command. This command overwrites the startup configuration with the applied configuration by writing to the /etc/nvue.d/startup.yaml file. The configuration persists after a reboot.
cumulus@switch:~$ nv config save
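The relationship between the pending, applied, and startup configurations can be summarized with a toy model. The Python sketch below is an assumption-laden illustration, not NVUE's implementation: the Switch class, its method names, and the flat key used for the bond setting are all hypothetical.

```python
import copy


class Switch:
    """Toy model of the pending, applied, and startup configurations."""

    def __init__(self):
        self.pending = {}
        self.applied = {}
        self.startup = {}

    def set(self, key, value):
        # Like nv set: changes only the pending configuration.
        self.pending[key] = value

    def apply(self):
        # Like nv config apply: pending becomes the applied configuration.
        self.applied = copy.deepcopy(self.pending)

    def save(self):
        # Like nv config save: applied overwrites the startup configuration.
        self.startup = copy.deepcopy(self.applied)

    def reboot(self):
        # After a reboot, the switch comes up with the startup configuration.
        self.applied = copy.deepcopy(self.startup)
        self.pending = copy.deepcopy(self.startup)


switch = Switch()
switch.set("interface bond0 bond member", "swp1-4")
switch.apply()
switch.reboot()
print(switch.applied)  # applied but not saved, so the setting is gone

switch.set("interface bond0 bond member", "swp1-4")
switch.apply()
switch.save()
switch.reboot()
print(switch.applied)  # saved before the reboot, so the setting persists
```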
Detach a Pending Configuration
The following example configures the IP address of the loopback interface, then detaches the configuration from the current pending configuration. Cumulus Linux saves the detached configuration to a file with a numerical value to distinguish it from other pending configurations.
cumulus@switch:~$ nv set interface lo ip address 10.10.10.1/32
cumulus@switch:~$ nv config detach
View Differences Between Configurations
To view differences between configurations, run the nv config diff command.
To view differences between two detached pending configurations, run the nv config diff <<press tab>> command to list all the current detached pending configurations, then run the nv config diff command with the pending configurations you want to diff.
The following example replaces the pending configuration with the contents of the YAML configuration file called nv-02/13/2021.yaml located in the /deps directory:
cumulus@switch:~$ nv config replace /deps/nv-02/13/2021.yaml
The following example patches the pending configuration (runs the set or unset commands from the configuration in the nv-02/13/2021.yaml file located in the /deps directory):
cumulus@switch:~$ nv config patch /deps/nv-02/13/2021.yaml
A patch contains a single request to the NVUE service. Ordering of parameters within a patch is not guaranteed; NVUE does not support both unset and set commands for the same object in a single patch.
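The difference between patching and replacing a pending configuration can be sketched in Python. This is a simplified illustration under stated assumptions: it models only set-style changes as a recursive merge, the patch and replace functions and the sample data are hypothetical, and NVUE's handling of unset operations is not represented.

```python
import copy


def patch(pending, changes):
    """Merge changes into a copy of pending, recursing into nested dicts."""
    merged = copy.deepcopy(pending)
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = patch(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def replace(pending, new_config):
    """Discard the pending configuration and use the new one as-is."""
    return copy.deepcopy(new_config)


# Hypothetical pending configuration and incoming file contents.
pending = {
    "system": {"hostname": "leaf01"},
    "interface": {"swp1": {"link": {"mtu": 9216}}},
}
incoming = {"interface": {"swp1": {"link": {"state": "up"}}}}

print(patch(pending, incoming))    # keeps the hostname and MTU, adds the state
print(replace(pending, incoming))  # only the incoming content remains
```

In this model, a patch preserves everything the file does not mention, whereas a replace leaves only what the file contains.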
Date and Time
This section discusses how to:
Set the time zone, and the date and time on the software clock on the switch
NVUE Snippets
NVUE supports both traditional snippets and flexible snippets:
Use traditional snippets to add configuration to the /etc/network/interfaces, /etc/frr/frr.conf, /etc/frr/daemons, /etc/cumulus/switchd.conf, /etc/cumulus/datapath/traffic.conf or /etc/ssh/sshd_config files.
Use flexible snippets to manage any other text file on the system.
A snippet configures a single parameter associated with a specific configuration file.
You can only set or unset a snippet; you cannot modify, partially update, or change a snippet.
Setting the snippet value replaces any existing snippet value.
Cumulus Linux supports only one snippet for a configuration file.
Only certain configuration files support a snippet.
NVUE does not parse or validate the snippet content and does not validate the resulting file after you apply the snippet.
PATCH is only the method of applying snippets and does not refer to any snippet capabilities.
As NVUE supports more features and introduces new syntax, existing traditional and flexible snippets might become invalid. Before you upgrade Cumulus Linux to a new release, review What's New for new NVUE syntax and remove the snippet if NVUE introduces new syntax for the feature that the snippet configures.
Traditional Snippets
Use traditional snippets if you configure Cumulus Linux with NVUE commands, then want to configure a feature that does not yet support the NVUE Object Model. You create a snippet in yaml format, then add the configuration to the file with the nv config patch command.
The nv config patch command requires the fully qualified path name to the snippet .yaml file; for example, you cannot use ./ with the nv config patch command.
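If you drive the CLI from a script, you can resolve the path before you call the command. The following Python sketch only builds the argument list; the helper name patch_command and the file name are hypothetical, and the sketch does not run the command.

```python
import os
from typing import List


def patch_command(snippet_file: str) -> List[str]:
    """Build the nv config patch argument list with a fully qualified file path."""
    # Expand ~ and resolve relative paths such as ./frr_snippet.yaml
    absolute = os.path.abspath(os.path.expanduser(snippet_file))
    return ["nv", "config", "patch", absolute]


if __name__ == "__main__":
    # Hypothetical file name; pass the result to subprocess.run(..., check=True) on the switch
    print(" ".join(patch_command("./frr_snippet.yaml")))
```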
/etc/frr/frr.conf Snippets
Example 1: Top Level Configuration
NVUE does not support configuring BGP to peer across the default route. The following example configures BGP to peer across the default route from the default VRF:
Create a .yaml file with the following traditional snippet:
Run the nv config apply command to apply the configuration:
cumulus@switch:~$ nv config apply
Verify that the configuration exists at the end of the /etc/frr/frr.conf file:
cumulus@switch:~$ sudo cat /etc/frr/frr.conf
...
! end of router ospf block
!---- CUE snippets ----
ip nht resolve-via-default
Example 2: Nested Configuration
NVUE does not support configuring EVPN route targets using auto derived values from RFC 8365. The following example configures BGP to enable RFC 8365 derived route targets:
Create a .yaml file with the following traditional snippet:
The traditional snippets for FRR write content to the /etc/frr/frr.conf file. When you apply the configuration and snippet with the nv config apply command, the FRR service reads the /etc/frr/frr.conf file.
Example 3: EVPN Multihoming FRR Debugging
NVUE does not support configuring FRR debugging for EVPN multihoming. The following example configures FRR debugging:
Create a .yaml file and add the following traditional snippet:
The traditional snippets for FRR write content to the /etc/frr/frr.conf file. When you apply the configuration and snippet with the nv config apply command, the FRR service reads the /etc/frr/frr.conf file.
/etc/network/interfaces Snippets
MLAG Timers Example
NVUE supports configuring only one of the MLAG service timeouts (initDelay). The following example configures the MLAG peer timeout to 400 seconds:
Create a .yaml file and add the following traditional snippet:
NVUE does not support configuring traditional bridges. The following example configures a traditional bridge called br0 with the IP address 11.0.0.10/24. swp1, swp2 are members of the bridge.
Create a .yaml file and add the following traditional snippet:
Run the nv config apply command to apply the configuration:
cumulus@switch:~$ nv config apply
Verify that the configuration exists at the end of the /etc/network/interfaces file:
cumulus@switch:~$ sudo cat /etc/network/interfaces
...
auto br0
iface br0
address 11.0.0.10/24
bridge-ports swp1 swp2
bridge-vlan-aware no
VLAN-aware RSTP Timers Example
NVUE does not support configuring RSTP timers on VLAN-aware bridges. The following example configures non-default RSTP timers for the NVUE default bridge br_default:
Create a .yaml file and add the following traditional snippet:
NVUE does not provide options to configure link flap detection settings. The following example configures the link flap window to 10 seconds and the link flap threshold to 5 flaps:
Create a .yaml file and add the following traditional snippet:
To add Cumulus Linux SNMP agent configuration not yet available with NVUE commands, create an snmpd.conf snippet.
The following example creates a file called snmpd.conf_snippet.yaml, and sets the read only community string and the listening address to run in the mgmt VRF.
SNMP snippets do not take effect unless you first enable SNMP with the NVUE nv set service snmp-server enable on and nv set service snmp-server listening-address commands (or with the equivalent REST API methods).
Create a .yaml file and add the following traditional snippet:
To add SSH service configuration not yet available with NVUE commands, create an sshd_config snippet.
The following example creates a file called sshd_config_snippet.yaml to allow root login and enable X11 forwarding for all users except user anoncvs. The snippet also disables TCP forwarding for the anoncvs user and runs the cvs server command when anoncvs logs in.
Create a .yaml file and add the following traditional snippet:
cumulus@switch:~$ sudo nano sshd_config_snippet.yaml
- set:
    system:
      config:
        snippet:
          sshd_config: |
            PermitRootLogin yes
            X11Forwarding yes
            Match User anoncvs
              X11Forwarding no
              AllowTcpForwarding no
              ForceCommand cvs server
Run the following command to patch the configuration:
Run the nv config apply command to apply the configuration:
cumulus@switch:~$ nv config apply
Verify that the configuration exists at the end of the /etc/ssh/sshd_config file:
cumulus@switch:~$ sudo cat /etc/ssh/sshd_config
...
!---- NVUE snippets ----
PermitRootLogin yes
X11Forwarding yes
Match User anoncvs
X11Forwarding no
AllowTcpForwarding no
ForceCommand cvs server
Flexible Snippets
Flexible snippets are an extension of traditional snippets that let you manage any text file on the system.
You can create new files or modify existing files that NVUE does not manage.
You can add configuration to files that NVUE manages.
The account you use through the CLI or the REST API to configure and manage flexible snippets must be in the sudo group, which includes the NVUE system-admin role, or you must be the root user.
Files NVUE Manages
You can use flexible snippets to add configuration to the following files that NVUE manages:
Filename | Description
/etc/cumulus/csmgrd | Configuration file for csmgrctl commands.
/etc/default/isc-dhcp-relay-<VRF> | Configuration file for DHCP relay. Changes to this file require a dhcrelay@<VRF>.service restart.
/etc/resolv.conf | Configuration file for DNS resolution.
/etc/hosts | Configuration file for the hostname of the switch.
/etc/default/isc-dhcp-server-<VRF> | Configuration file for DHCP servers. Changes to this file require a dhcpd@<VRF>.service restart.
/etc/default/isc-dhcp-server6-<VRF> | Configuration file for DHCP servers for IPv6. Changes to this file require a dhcpd6@<VRF>.service restart.
/etc/dhcp/dhcpd-<VRF>.conf | Configuration file for the dhcpd service. Changes to this file require a dhcpd@<VRF>.service restart.
/etc/dhcp/dhcpd6-<VRF>.conf | Configuration file for the dhcpd service for IPv6. Changes to this file require a dhcpd6@<VRF>.service restart.
/etc/ntp.conf | Configuration file for NTP servers. Changes to this file require an ntp service restart.
/etc/default/isc-dhcp-relay6-<VRF> | Configuration file for DHCP relay for IPv6. Changes to this file require a dhcrelay6@<VRF>.service restart.
/etc/snmp/snmpd.conf | Configuration file for SNMP. Changes to this file require an snmpd restart.
/etc/cumulus/datapath/traffic.conf | Configuration file for forwarding table profiles. Changes to this file require a switchd restart.
/etc/cumulus/switchd.conf | Configuration file for switchd. Changes to this file require a switchd restart.
Flexible snippets do not support:
Binary files.
Symbolic links.
More than 1MB of content.
More than one flexible snippet in the same destination file.
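Some of these limits can be checked locally before you create the snippet. The following Python sketch is a rough pre-flight check, not the validation that NVUE performs: it reads 1MB as 1,048,576 bytes and uses a NUL byte as a simple indicator of binary content, both of which are assumptions. The function name is hypothetical.

```python
import os
from typing import List

# Assumption: the 1MB limit is interpreted as 1 MiB
MAX_CONTENT_BYTES = 1024 * 1024


def flexible_snippet_problems(destination: str, content: str) -> List[str]:
    """Return a list of reasons why the snippet might be rejected (empty if none found)."""
    problems = []
    if os.path.islink(destination):
        problems.append(f"{destination} is a symbolic link")
    if "\x00" in content:
        # Heuristic only: a NUL byte suggests binary rather than text content
        problems.append("content looks like binary data")
    if len(content.encode("utf-8")) > MAX_CONTENT_BYTES:
        problems.append("content is larger than 1 MB")
    return problems


if __name__ == "__main__":
    print(flexible_snippet_problems("/etc/crontab", "@daily /opt/utils/run-backup.sh\n"))
```

The check for more than one flexible snippet per destination file is not covered here because it depends on the configuration already present on the switch.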
Use caution when creating flexible snippets:
If you configure flexible snippets incorrectly, they might impact switch functionality. For example, even though flexible snippet validation allows you to add only textual content, Cumulus Linux does not prevent you from creating a flexible snippet that adds to sensitive text files, such as /boot/grub.cfg and /etc/fstab, or that adds corrupt content. Because the NVUE service (nvued) runs with superuser privileges, such snippets might render the switch unusable or create a potential security vulnerability.
Do not manually update configuration files to which you add flexible snippets.
Any sensitive data in plain text (such as passwords) appears in the NVUE-managed configuration files as plain text.
Create a Flexible Snippet
To create a flexible snippet:
Create a file in yaml format and add each flexible snippet you want to apply in the format shown below. NVUE appends the flexible snippet at the end of an existing file. If the file does not exist, NVUE creates the file, then adds the content.
cumulus@leaf01:mgmt:~$ sudo nano <filename>.yaml
- set:
    system:
      config:
        snippet:
          <snippet-name>:
            file: "<filename>"
            permissions: "<umask-permissions>"
            content: |
              # This is my content
            services:
              <name>:
                service: <service-name>
                action: <action>
You can set the umask permissions only on a new file that you create. Adding the permissions: line is optional. The default umask permissions are 644.
You can add a service with an action, such as start, restart, or stop. Adding the services: lines is optional; however, if you add the services: line, you must specify at least one service.
Run the following command to patch the configuration:
Run the nv config apply command to apply the configuration:
cumulus@switch:~$ nv config apply
Verify the patched configuration.
The nv config patch command requires the fully qualified path name to the snippet .yaml file; for example, you cannot use ./ with the nv config patch command.
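The flexible snippet format described in the steps above can also be assembled programmatically. The following Python sketch builds the same nested structure as a Python object and enforces the rule that a services block needs at least one entry. The function name, the service label cron, and the JSON output are illustrative assumptions; the values come from the crontab example in this section.

```python
import json
from typing import Dict, Optional, Tuple


def flexible_snippet(name: str, file: str, content: str,
                     permissions: Optional[str] = None,
                     services: Optional[Dict[str, Tuple[str, str]]] = None) -> list:
    """Build a flexible snippet patch: services maps a label to (service name, action)."""
    body = {"file": file, "content": content}
    if permissions is not None:
        body["permissions"] = permissions
    if services is not None:
        if not services:
            raise ValueError("a services block must contain at least one service")
        body["services"] = {
            label: {"service": service, "action": action}
            for label, (service, action) in services.items()
        }
    return [{"set": {"system": {"config": {"snippet": {name: body}}}}}]


if __name__ == "__main__":
    patch = flexible_snippet(
        name="crontab-flex-snippet",
        file="/etc/crontab",
        content="@daily /opt/utils/run-backup.sh\n",
        services={"cron": ("cron", "restart")},  # label "cron" is a hypothetical name
    )
    # Printed as JSON for inspection only; the documented file format is YAML
    print(json.dumps(patch, indent=2))
```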
Flexible Snippet Examples
The following example flexible snippet called crontab-flex-snippet appends the single line @daily /opt/utils/run-backup.sh to the existing /etc/crontab file, then restarts the cron service.
The following example flexible snippet called apt-flex-snippet creates a new file /etc/apt/sources.list.d/microsoft-prod.list with 0644 permissions and adds multi-line text:
cumulus@leaf01:mgmt:~$ sudo nano apt-flex-snippet.yaml
- set:
    system:
      config:
        snippet:
          apt-flexible-snippet:
            file: "/etc/apt/sources.list.d/microsoft-prod.list"
            content: |
              # Adding Microsoft SQL Server Sources
              deb [arch=amd64] https://packages.microsoft.com/debian/10/prod buster main
            permissions: "0644"
The following flexible snippet called lldp_config_snipppet disables LLDP on swp1 and swp2 using the configure system interface pattern-blacklist command:
After you patch and apply the configuration above, the snippet creates a new file in the /etc/lldp.d directory, then restarts the lldpd service to stop LLDP transmitting and receiving on swp1 and swp2. Other interfaces continue to participate in LLDP.
If you try to apply a flexible snippet to a file that NVUE does not allow, you see an error message similar to the following:
cumulus@leaf01:mgmt:~$ nv config apply
Invalid config [rev_id: 8]
Flexible snippets are not allowed to be configured on the file '/etc/cumulus/ports.conf'.
Flexible snippets are not allowed to be configured on the file '/etc/cumulus/ports_width.conf'.
If you try to apply a flexible snippet to a file that supports traditional snippets, you see an error message similar to the following:
cumulus@leaf01:mgmt:~$ nv config apply
Invalid config [rev_id: 1]
Flexible snippet cannot be used to modify the file '/etc/ssh/sshd_config'. Traditional snippets (for e.g., 'sshd_config') are supported on this file. Consult NVIDIA NVUE documentation for further information on snippets.
You can also create a flexible snippet with the REST API. See NVUE API.
Remove a Snippet
To remove a traditional or flexible snippet, edit the snippet’s .yaml file to change set to unset, then patch and apply the configuration. Alternatively, you can use the REST API DELETE and PATCH methods.
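The set to unset change described above is a rename of the top-level key in each operation. The following Python sketch shows the transformation on a patch that is already loaded as a Python list; loading and saving the .yaml file is left out, and the function name to_unset is hypothetical.

```python
import copy
from typing import List


def to_unset(patch: List[dict]) -> List[dict]:
    """Return a copy of the patch in which every set operation becomes an unset operation."""
    result = []
    for operation in copy.deepcopy(patch):
        if "set" in operation:
            operation["unset"] = operation.pop("set")
        result.append(operation)
    return result


if __name__ == "__main__":
    patch = [{"set": {"system": {"config": {"snippet": {"sshd_config": "PermitRootLogin yes\n"}}}}}]
    print(to_unset(patch))
```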
Setting the time zone, and the date and time on the software clock requires root privileges; use sudo.
Set the Time Zone
You can use one of these methods to set the time zone on the switch:
Run NVUE commands.
Use the guided wizard.
Edit the /etc/timezone file.
Run the nv set system timezone <timezone> command. To see all the available time zones, run nv set system timezone and press the Tab key. The following example sets the time zone to US/Eastern:
cumulus@switch:~$ nv set system timezone US/Eastern
cumulus@switch:~$ nv config apply
In a terminal, run the following command:
cumulus@switch:~$ sudo dpkg-reconfigure tzdata
Follow the on-screen menu options to select the geographic area and region.
The switch contains a battery-backed hardware clock that maintains the time while the switch powers off and between reboots. When the switch is running, the Cumulus Linux operating system maintains its own software clock.
During boot up, the switch copies the time from the hardware clock to the operating system software clock. The software clock takes care of all the timekeeping. During system shutdown, the switch copies the software clock back to the battery-backed hardware clock.
You can set the date and time on the software clock with the date command. First, determine your current time zone:
cumulus@switch:~$ date +%Z
If you need to reconfigure the current time zone, refer to the instructions above.
To set the software clock according to the configured time zone:
cumulus@switch:~$ sudo date -s "Tue Jan 26 00:37:13 2021"
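If you generate the date string from a script, the format in the command above can be reproduced with strftime. The following Python sketch only formats the string; the function name is hypothetical, and the output assumes the default C locale for day and month names.

```python
from datetime import datetime


def date_argument(moment: datetime) -> str:
    """Format a datetime in the same layout as the date -s example."""
    return moment.strftime("%a %b %d %H:%M:%S %Y")


if __name__ == "__main__":
    print(date_argument(datetime(2021, 1, 26, 0, 37, 13)))  # Tue Jan 26 00:37:13 2021
```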
You can write the current value of the software clock to the hardware clock using the hwclock command:
When you upgrade to Cumulus Linux 5.6 or later, the switch overwrites any manual configuration you performed by editing files in Cumulus Linux 5.5 or earlier, such as configuring the listening address, port, TLS, or certificate.
In addition to the CLI, NVUE supports a REST API. Instead of accessing Cumulus Linux using SSH, you can interact with the switch using an HTTP client, such as cURL or a web browser.
The nvued service provides access to the NVUE REST API. Cumulus Linux exposes the HTTP endpoint internally, which makes the NVUE REST API accessible locally within the Cumulus Linux switch. The NVUE CLI also communicates with the nvued service using internal APIs. To provide external access to the NVUE REST API, Cumulus Linux uses an HTTP reverse proxy server, and supports HTTPS and TLS connections from external REST API clients.
The following illustration shows the NVUE REST API architecture and illustrates how Cumulus Linux forwards the requests internally.
Supported HTTP Methods
The NVUE REST API supports the following methods:
The GET method displays configuration and operational data, and is equivalent to the nv show commands.
The POST method creates and submits operations. You typically use this method for nv action commands and for the nv config command to create revisions.
The PATCH method replaces or unsets a configuration. You use this method for the nv set and nv config apply commands. You can either perform:
A targeted configuration patch to make a configuration change, where you run a specific NVUE REST API targeted at a particular OpenAPI end-point URI. Based on the NVUE schema definition, you need to direct the PATCH REST API request at a particular endpoint (for example, /nvue_v1/vrf/<vrf-id>/router/bgp) and provide the payload that conforms to the schema. With a targeted configuration patch, you can control individual resources.
A root patch, where you run the NVUE PATCH API on the root node of the schema so that a single PATCH operation can change one, some, or the entire configuration in a single payload. The payload of the PATCH method must be aware of the entire NVUE object model schema because you make the configuration changes relative to the root node /nvue_v1. You typically perform a root patch to push all configurations to the switch in bulk; for example, if you use an SDN controller or a network management system to push the entire switch configuration every time you need to make a change, regardless of how small or large. A root patch can also make configuration changes with fewer round trips to the switch.
The input payload in a PATCH request can have either a set or unset json object for the same resource, but not both. The order in which the API executes the set and unset objects is not deterministic and not supported.
The DELETE method deletes a configuration and is equivalent to the nv unset commands.
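The relationship between the CLI commands and the HTTP methods listed above can be summarized as a lookup. The following Python sketch is only a memory aid built from that list; the table and function names are hypothetical, and nv config subcommands other than apply are deliberately left out.

```python
from typing import List, Tuple

# Mapping taken from the method descriptions above; most specific prefixes first
CLI_TO_HTTP: List[Tuple[Tuple[str, ...], str]] = [
    (("nv", "config", "apply"), "PATCH"),
    (("nv", "show"), "GET"),
    (("nv", "action"), "POST"),
    (("nv", "set"), "PATCH"),
    (("nv", "unset"), "DELETE"),
]


def http_method_for(command: str) -> str:
    """Return the HTTP method that corresponds to an NVUE CLI command."""
    tokens = tuple(command.split())
    for prefix, method in CLI_TO_HTTP:
        if tokens[:len(prefix)] == prefix:
            return method
    raise ValueError(f"no HTTP method mapped for: {command}")


if __name__ == "__main__":
    for cli in ("nv show interface swp1", "nv set system hostname switch01", "nv unset interface lo"):
        print(f"{cli} -> {http_method_for(cli)}")
```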
Secure the API
The NVUE REST API supports HTTP basic authentication, and the same underlying authentication methods for username and password that the NVUE CLI supports. User accounts work the same on both the API and the CLI.
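HTTP basic authentication sends the username and password as a base64-encoded Authorization header. The following Python sketch builds that header with the standard library to show what an HTTP client sends; the function name is hypothetical and the credentials are placeholders.

```python
import base64
from typing import Dict


def basic_auth_header(username: str, password: str) -> Dict[str, str]:
    """Build the Authorization header used by HTTP basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


if __name__ == "__main__":
    # Placeholder credentials; use your own account
    print(basic_auth_header("cumulus", "password"))
```

Because base64 is an encoding and not encryption, the credentials are protected only by the HTTPS connection.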
Certificates
Cumulus Linux includes a self-signed certificate and private key to use on the server so that it works out of the box. The switch generates the self-signed certificate and private key when it boots for the first time. The X.509 certificate with the public key is in /etc/ssl/certs/cumulus.pem and the corresponding private key is in /etc/ssl/private/cumulus.key.
NVIDIA recommends you use your own certificates and keys. Certificates must be in PEM format. For the steps to generate self-signed certificates and keys, and to install them on the switch, refer to the Ubuntu Certificates and Security documentation.
To use your own certificate chain:
Import the certificate and private key onto the Cumulus Linux switch using secure channels, such as SCP or SFTP.
Store the certificate and private key on the filesystem in a location of your choice or use the same location; for example, /etc/ssl/certs and /etc/ssl/private.
Update the /etc/nginx/sites-enabled/nvue.conf file to set the ssl_certificate and the ssl_certificate_key values to your keys.
Restart NGINX with the sudo systemctl restart nginx command.
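The update to the ssl_certificate and ssl_certificate_key values in the steps above can be scripted. The following Python sketch rewrites the two directives in a configuration string; the sample text and the certificate paths are hypothetical and do not reproduce the actual contents of the nvue.conf file.

```python
import re


def set_certificate_paths(conf_text: str, certificate: str, private_key: str) -> str:
    """Replace the values of the ssl_certificate and ssl_certificate_key directives."""
    # The whitespace after the directive name keeps ssl_certificate from matching ssl_certificate_key
    conf_text = re.sub(r"^([ \t]*ssl_certificate[ \t]+)[^;]+;",
                       lambda m: f"{m.group(1)}{certificate};",
                       conf_text, flags=re.MULTILINE)
    conf_text = re.sub(r"^([ \t]*ssl_certificate_key[ \t]+)[^;]+;",
                       lambda m: f"{m.group(1)}{private_key};",
                       conf_text, flags=re.MULTILINE)
    return conf_text


if __name__ == "__main__":
    # Hypothetical fragment for illustration only
    sample = (
        "server {\n"
        "    ssl_certificate /etc/ssl/certs/cumulus.pem;\n"
        "    ssl_certificate_key /etc/ssl/private/cumulus.key;\n"
        "}\n"
    )
    print(set_certificate_paths(sample, "/etc/ssl/certs/my-switch.pem", "/etc/ssl/private/my-switch.key"))
```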
API-only User
To create an API-only user without SSH permissions, use Linux group permissions. You can create the API-only user in the ZTP script.
# Create the dedicated automation user
adduser --disabled-password --gecos "Automation User,,,," --shell /usr/bin/nologin automation
# Set the password
echo 'automation:password!' | chpasswd
# Add the user to nvapply group to make NVUE config changes
adduser automation nvapply
This example shows how to create ACLs to allow users from the management subnet and the local switch to communicate with the switch using REST APIs, and restrict all other access.
cumulus@switch:~$ nv set acl API-PROTECT type ipv4
cumulus@switch:~$ nv set acl API-PROTECT rule 10 action permit
cumulus@switch:~$ nv set acl API-PROTECT rule 10 match ip .protocol tcp .dest-port 8765 .source-ip 192.168.200.0/24
cumulus@switch:~$ nv set acl API-PROTECT rule 10 remark "Allow the Management Subnet to talk to API"
cumulus@switch:~$ nv set acl API-PROTECT rule 20 action permit
cumulus@switch:~$ nv set acl API-PROTECT rule 20 match ip .protocol tcp .dest-port 8765 .source-ip 127.0.0.1
cumulus@switch:~$ nv set acl API-PROTECT rule 20 remark "Allow the local switch to talk to the API"
cumulus@switch:~$ nv set acl API-PROTECT rule 30 action deny
cumulus@switch:~$ nv set acl API-PROTECT rule 30 match ip .protocol tcp .dest-port 8765
cumulus@switch:~$ nv set acl API-PROTECT rule 30 remark "Block everyone else from talking to the API"
cumulus@switch:~$ nv set system control-plane acl API-PROTECT inbound
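To reason about which clients the ACL above admits, you can model the three rules in a few lines. The following Python sketch is a simplified model, not the switch implementation: it assumes that rules are evaluated in ascending order and that the first match decides the action. The names RULES and api_access are hypothetical.

```python
import ipaddress

API_PORT = 8765

# (rule number, action, source network) for TCP traffic to the API port
RULES = [
    (10, "permit", "192.168.200.0/24"),  # management subnet
    (20, "permit", "127.0.0.1/32"),      # local switch
    (30, "deny", "0.0.0.0/0"),           # everyone else
]


def api_access(source_ip: str, dest_port: int = API_PORT, protocol: str = "tcp") -> str:
    """Return the action of the first matching rule, or no-match for other traffic."""
    if protocol != "tcp" or dest_port != API_PORT:
        return "no-match"
    address = ipaddress.ip_address(source_ip)
    for _number, action, network in sorted(RULES):
        if address in ipaddress.ip_network(network):
            return action
    return "no-match"


if __name__ == "__main__":
    for client in ("192.168.200.15", "127.0.0.1", "10.1.1.1"):
        print(client, api_access(client))
```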
Supported Objects
The NVUE object model supports most features on the Cumulus Linux switch. The following list shows the supported objects. The NVUE API supports more objects within each of these objects. You can find a full listing of the supported API endpoints here.
High-level Objects | Description
acl | Access control lists.
bridge | Bridge domain configuration.
evpn | EVPN configuration.
interface | Interface configuration.
mlag | MLAG configuration.
nve | Network virtualization configuration, such as VXLAN-specific MLAG configuration and VXLAN flooding.
platform | Platform configuration, such as hardware and software components.
qos | QoS RoCE configuration.
router | Router configuration, such as router policies, global BGP and OSPF configuration, PBR, PIM, IGMP, VRR, and VRRP configuration.
service | DHCP relays and server, NTP, PTP, LLDP, and syslog configuration.
system | Global system settings, such as the reserved routing table range for PBR and the reserved VLAN range for layer 3 VNIs, system login messages and switch reboot history.
vrf | VRF configuration.
Use the API
The NVUE CLI and the REST API are equivalent in functionality; you can run all management operations from the REST API or from the CLI. The NVUE object model drives both the REST API and the CLI management operations. All operations are consistent; for example, the CLI nv show commands reflect any PATCH operation (create and update) you run through the REST API.
NVUE follows a declarative model, removing context-specific commands and settings. The structure of NVUE is like a big tree that represents the entire state of a Cumulus Linux instance. At the base of the tree are high level branches representing objects, such as router and interface. Under each of these branches are more branches. As you navigate through the tree, you gain a more specific context. At the leaves of the tree are actual attributes, represented as key-value pairs. The path through the tree is similar to a filesystem path.
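The tree description above maps naturally onto nested dictionaries. The following Python sketch shows how a filesystem-like path selects a branch, and how a payload for one branch can be wrapped so that it is expressed relative to the root. The function names lookup and wrap_at_path are hypothetical, and the sketch assumes path segments do not themselves contain a slash.

```python
from typing import Any, List


def split_path(path: str) -> List[str]:
    """Split a path such as /interface/lo/ip into its non-empty segments."""
    return [part for part in path.split("/") if part]


def lookup(tree: dict, path: str) -> Any:
    """Walk the tree one segment at a time and return the node at the path."""
    node = tree
    for part in split_path(path):
        node = node[part]
    return node


def wrap_at_path(path: str, payload: Any) -> Any:
    """Nest a payload under the path so that it is relative to the root of the tree."""
    for part in reversed(split_path(path)):
        payload = {part: payload}
    return payload


if __name__ == "__main__":
    root_payload = wrap_at_path("/interface/lo/ip/address", {"99.99.99.99/32": {}})
    print(root_payload)
    print(lookup(root_payload, "/interface/lo/ip"))
```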
Cumulus Linux enables the NVUE REST API by default. To disable the NVUE REST API, run the nv set system api state disabled command.
Set the NVUE REST API port. If you do not set a port, Cumulus Linux uses the default port 8765.
Specify the NVUE REST API listening address; you can specify an IPv4 address, IPv6 address, or localhost. If you do not specify a listening address, NGINX listens on all addresses for the target port.
The following example sets the port to 8888:
cumulus@switch:~$ nv set system api port 8888
cumulus@switch:~$ nv config apply
You can listen on multiple interfaces by specifying different listening addresses:
cumulus@switch:~$ nv set system api listening-address 10.10.10.1
cumulus@switch:~$ nv set system api listening-address 10.10.20.1
cumulus@switch:~$ nv config apply
The following example configures the listening address on eth0, which has IP address 172.0.24.0 and uses the management VRF by default:
cumulus@switch:~$ nv set system api listening-address 172.0.24.0
cumulus@switch:~$ nv config apply
The following example configures VRF BLUE on swp1, which has IP address 10.10.10.1, then sets the API listening address to the IP address for swp1 (configured for VRF BLUE).
cumulus@switch:~$ nv set interface swp1 ip address 10.10.10.1/24
cumulus@switch:~$ nv set interface swp1 ip vrf BLUE
cumulus@switch:~$ nv config apply
cumulus@switch:~$ nv set system api listening-address 10.10.10.1
cumulus@switch:~$ nv config apply
You can listen on multiple interfaces by specifying different listening addresses. The following example sets localhost, interface address 10.10.10.1, and 10.10.20.1 as listen-addresses.
The following examples show the primary API use cases.
View a Configuration
Use the following example to obtain the current applied configuration on the switch. Change the rev argument to view any revision. Possible options for the rev argument include startup, pending, operational, and applied.
cumulus@switch:~$ nv show system
operational applied
-------- ------------------- -------
hostname switch01 cumulus
build Cumulus Linux 5.4.0
uptime 0:12:59
timezone Etc/UTC
cumulus@switch:~$ nv show bridge domain br_default vlan 10
operational applied pending description
--------------- ----------- ------- ------- ------------------------------------------------------
[vni] 10 10 10 L2 VNI
multicast
snooping
querier
source-ip 0.0.0.0 0.0.0.0 0.0.0.0 Source IP to use when sending IGMP/MLD queries.
ptp
enable off off off Turn the feature 'on' or 'off'. The default is 'off'.
#!/usr/bin/env python3

import requests
from requests.auth import HTTPBasicAuth
import json
import time

auth = HTTPBasicAuth(username="cumulus", password="password")
nvue_end_point = "https://127.0.0.1:8765/nvue_v1"
mime_header = {"Content-Type": "application/json"}

DUMMY_SLEEP = 5  # In seconds
POLL_APPLIED = 1  # in seconds
RETRIES = 10

def print_request(r: requests.Request):
    print("=======Request=======")
    print("URL:", r.url)
    print("Headers:", r.headers)
    print("Body:", r.body)

def print_response(r: requests.Response):
    print("=======Response=======")
    print("Headers:", r.headers)
    print("Body:", json.dumps(r.json(), indent=2))

def create_nvue_changest():
    r = requests.post(url=nvue_end_point + "/revision",
                      auth=auth,
                      verify=False)
    print_request(r.request)
    print_response(r)
    response = r.json()
    changeset = response.popitem()[0]
    return changeset

def apply_nvue_changeset(changeset):
    apply_payload = {"state": "apply", "auto-prompt": {"ays": "ays_yes"}}
    url = nvue_end_point + "/revision/" + requests.utils.quote(changeset,
                                                               safe="")
    r = requests.patch(url=url,
                       auth=auth,
                       verify=False,
                       data=json.dumps(apply_payload),
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

def is_config_applied(changeset) -> bool:
    # Check if the configuration was indeed applied
    global RETRIES
    global POLL_APPLIED
    retries = RETRIES
    while retries > 0:
        r = requests.get(url=nvue_end_point + "/revision/" + requests.utils.quote(changeset, safe=""),
                         auth=auth,
                         verify=False)
        response = r.json()
        print(response)
        if response["state"] == "applied":
            return True
        retries -= 1
        time.sleep(POLL_APPLIED)
    return False

def apply_new_config(path,payload):
    # Create a new revision ID
    changeset = create_nvue_changest()
    print("Using NVUE Changeset: '{}'".format(changeset))

    # Delete existing configuration
    query_string = {"rev": changeset}
    r = requests.delete(url=nvue_end_point + path,
                        auth=auth,
                        verify=False,
                        params=query_string,
                        headers=mime_header)
    print_request(r.request)
    print_response(r)

    # Patch the new configuration
    query_string = {"rev": changeset}
    r = requests.patch(url=nvue_end_point + path,
                       auth=auth,
                       verify=False,
                       data=json.dumps(payload),
                       params=query_string,
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

    # Apply the changes to the new revision changeset
    apply_nvue_changeset(changeset)

    # Check if the changeset was applied
    is_config_applied(changeset)

def nvue_get(path):
    r = requests.get(url=nvue_end_point + path,
                     auth=auth,
                     verify=False)
    print_request(r.request)
    print_response(r)

if __name__ == "__main__":
    payload = {
        "99.99.99.99/32": {}
    }
    apply_new_config("/interface/lo/ip/address",payload)
    time.sleep(DUMMY_SLEEP)
    nvue_get("/interface/lo/ip/address")
cumulus@switch:~$ nv show interface lo ip address
-------------
99.99.99.99/32
127.0.0.1/8
::1/128
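The rev argument described in this section travels as a query parameter, in the same way that the scripts pass params={"rev": changeset}. The following Python sketch builds such a URL with the standard library; the function name show_url is hypothetical and the sketch does not send a request.

```python
from urllib.parse import urlencode

NVUE_END_POINT = "https://127.0.0.1:8765/nvue_v1"


def show_url(path: str, rev: str = "applied") -> str:
    """Build a GET URL for a path at a revision such as startup, pending, operational, or applied."""
    query = urlencode({"rev": rev})
    return f"{NVUE_END_POINT}/{path.strip('/')}?{query}"


if __name__ == "__main__":
    print(show_url("/interface/lo/ip/address", rev="applied"))
    print(show_url("/system", rev="startup"))
```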
Troubleshoot Configuration Changes
When a configuration change fails, you see an error in the change request.
Configuration Fails Because of a Dependency
If you stage a configuration but it fails because of a dependency, the failure shows the reason. In the following example, the change fails because the BGP router ID is not set.
cumulus@switch:~$ curl -u 'cumulus:cumulus' --insecure https://127.0.0.1:8765/nvue_v1/revision/6
{
"state": "invalid",
"transition": {
"issue": {
"0": {
"code": "config_invalid",
"data": {
"location": "router.bgp.enable",
"reason": "BGP requires router-id to be set globally or in the VRF.\n"
},
"message": "Config invalid at router.bgp.enable: BGP requires router-id to be set globally or in the VRF.\n",
"severity": "error"
}
},
"progress": "Invalid config"
}
}
To resolve this issue, observe the failures or errors, then inspect the configuration that you are trying to apply. After you resolve the errors, retry the API. If you prefer to overlook the errors and force an apply, add "auto-prompt":{"ays": "ays_yes"} to the configuration apply.
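When you automate changes, you can extract the failure reasons from the revision response instead of reading the JSON by hand. The following Python sketch assumes the response has the same layout as the example above; the function name revision_errors is hypothetical.

```python
from typing import List, Tuple


def revision_errors(response: dict) -> List[Tuple[str, str]]:
    """Return (location, reason) pairs for every issue with severity error."""
    issues = response.get("transition", {}).get("issue", {})
    errors = []
    for _index, issue in sorted(issues.items()):
        if issue.get("severity") == "error":
            data = issue.get("data", {})
            errors.append((data.get("location", ""), data.get("reason", "").strip()))
    return errors


if __name__ == "__main__":
    # Same content as the example response above, written as a Python dictionary
    response = {
        "state": "invalid",
        "transition": {
            "issue": {
                "0": {
                    "code": "config_invalid",
                    "data": {
                        "location": "router.bgp.enable",
                        "reason": "BGP requires router-id to be set globally or in the VRF.\n",
                    },
                    "message": "Config invalid at router.bgp.enable: BGP requires router-id to be set globally or in the VRF.\n",
                    "severity": "error",
                }
            },
            "progress": "Invalid config",
        },
    }
    for location, reason in revision_errors(response):
        print(f"{location}: {reason}")
```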
To save an applied configuration change to the startup configuration file (/etc/nvue.d/startup.yaml) so that the changes persist after a reboot, use a PATCH to the applied revision with the save state.
When you unset a change, you must still use the PATCH action. A null value indicates removal of the entry; for example, to remove vlan100, the data is {"vlan100":null} with the PATCH action.
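In Python, the null value corresponds to None, which json.dumps serializes as null. The following sketch builds the request body only; the function name unset_payload is hypothetical, and the endpoint that receives the PATCH is not shown.

```python
import json


def unset_payload(*keys: str) -> str:
    """Build a PATCH body in which each key maps to null to indicate removal."""
    return json.dumps({key: None for key in keys})


if __name__ == "__main__":
    print(unset_payload("vlan100"))  # {"vlan100": null}
```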
Use the API for Active Monitoring
The example below fetches the counters for interface swp1.
#!/usr/bin/env python3

import requests
from requests.auth import HTTPBasicAuth
import json
import time

auth = HTTPBasicAuth(username="cumulus", password="password")
nvue_end_point = "https://127.0.0.1:8765/nvue_v1"
mime_header = {"Content-Type": "application/json"}

if __name__ == "__main__":
    r = requests.get(url=nvue_end_point + "/interface/swp1/link/stats",
                     auth=auth,
                     verify=False)
    print("=======Interface swp1 Statistics=======")
    print(json.dumps(r.json(), indent=2))
cumulus@switch:~$ nv show interface swp1 link stats
operational applied pending description
------------------- ----------- ------- ------- ----------------------------------------------------------------------
carrier-transitions 6 Number of times the interface state has transitioned between up and...
in-bytes 280.15 MB total number of bytes received on the interface
in-drops 0 number of received packets dropped
in-errors 0 number of received packets with errors
in-pkts 2321659 total number of packets received on the interface
out-bytes 349.10 MB total number of bytes transmitted out of the interface
out-drops 0 The number of outbound packets that were chosen to be discarded eve...
out-errors 0 The number of outbound packets that could not be transmitted becaus...
out-pkts 3536508 total number of packets transmitted out of the interface
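For active monitoring, you typically compare two samples rather than read absolute counters. The following Python sketch computes the difference between two samples. It assumes that the JSON response uses the same counter names as the CLI output above and returns them as integers, which you should verify on your switch; the function name counter_deltas and the sample values are hypothetical.

```python
from typing import Dict, Iterable

# Counter names taken from the CLI output above; byte counters are omitted
PACKET_COUNTERS = ("in-pkts", "out-pkts", "in-drops", "out-drops", "in-errors", "out-errors")


def counter_deltas(previous: Dict[str, int], current: Dict[str, int],
                   keys: Iterable[str] = PACKET_COUNTERS) -> Dict[str, int]:
    """Return the increase of each counter that is present in both samples."""
    return {key: current[key] - previous[key]
            for key in keys
            if key in previous and key in current}


if __name__ == "__main__":
    # Hypothetical samples taken some seconds apart
    first = {"in-pkts": 2321659, "out-pkts": 3536508, "in-drops": 0}
    second = {"in-pkts": 2321700, "out-pkts": 3536600, "in-drops": 2}
    print(counter_deltas(first, second))
```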
Convert CLI Changes to Use the API
You can take a configuration change from the CLI and use the API to configure the same set of changes.
Make your configuration changes on the system with the NVUE CLI.
cumulus@switch:~$ nv set system hostname switch01
cumulus@switch:~$ nv set interface lo ip address 99.99.99.99/32
cumulus@switch:~$ nv set interface eth0 ip address 192.168.200.6/24
cumulus@switch:~$ nv set interface bond0 bond member swp1-4
#!/usr/bin/env python3

import requests
from requests.auth import HTTPBasicAuth
import json
import time

auth = HTTPBasicAuth(username="cumulus", password="password")
nvue_end_point = "https://127.0.0.1:8765/nvue_v1"
mime_header = {"Content-Type": "application/json"}

DUMMY_SLEEP = 5  # In seconds
POLL_APPLIED = 1  # in seconds
RETRIES = 10

def print_request(r: requests.Request):
    print("=======Request=======")
    print("URL:", r.url)
    print("Headers:", r.headers)
    print("Body:", r.body)

def print_response(r: requests.Response):
    print("=======Response=======")
    print("Headers:", r.headers)
    print("Body:", json.dumps(r.json(), indent=2))

def create_nvue_changest():
    r = requests.post(url=nvue_end_point + "/revision",
                      auth=auth,
                      verify=False)
    print_request(r.request)
    print_response(r)
    response = r.json()
    changeset = response.popitem()[0]
    return changeset

def apply_nvue_changeset(changeset):
    # apply_payload = {"state": "apply"}
    apply_payload = {"state": "apply", "auto-prompt": {"ays": "ays_yes"}}
    url = nvue_end_point + "/revision/" + requests.utils.quote(changeset,
                                                               safe="")
    r = requests.patch(url=url,
                       auth=auth,
                       verify=False,
                       data=json.dumps(apply_payload),
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

def is_config_applied(changeset) -> bool:
    # Check if the configuration was indeed applied
    global RETRIES
    global POLL_APPLIED
    retries = RETRIES
    while retries > 0:
        r = requests.get(url=nvue_end_point + "/revision/" + requests.utils.quote(changeset, safe=""),
                         auth=auth,
                         verify=False)
        response = r.json()
        print(response)
        if response["state"] == "applied":
            return True
        retries -= 1
        time.sleep(POLL_APPLIED)
    return False

def apply_new_config(path,payload):
    # Create a new revision ID
    changeset = create_nvue_changest()
    print("Using NVUE Changeset: '{}'".format(changeset))

    # Delete existing configuration
    query_string = {"rev": changeset}
    r = requests.delete(url=nvue_end_point + path,
                        auth=auth,
                        verify=False,
                        params=query_string,
                        headers=mime_header)
    print_request(r.request)
    print_response(r)

    # Patch the new configuration
    query_string = {"rev": changeset}
    r = requests.patch(url=nvue_end_point + path,
                       auth=auth,
                       verify=False,
                       data=json.dumps(payload),
                       params=query_string,
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

    # Apply the changes to the new revision changeset
    apply_nvue_changeset(changeset)

    # Check if the changeset was applied
    is_config_applied(changeset)

def nvue_get(path):
    r = requests.get(url=nvue_end_point + path,
                     auth=auth,
                     verify=False)
    print_request(r.request)
    print_response(r)

if __name__ == "__main__":
    payload = {
        "interface": {
            "bond0": {
                "bond": {
                    "member": {
                        "swp1": {},
                        "swp2": {},
                        "swp3": {},
                        "swp4": {}
                    }
                },
                "type": "bond"
            },
            "lo": {
                "ip": {
                    "address": {
                        "99.99.99.99/32": {}
                    }
                }
            }
        },
        "system": {
            "hostname": "switch01"
        }
    }
    apply_new_config("/",payload)
    time.sleep(DUMMY_SLEEP)
    nvue_get("/interface/bond0")
    nvue_get("/interface/lo")
    nvue_get("/system")
API Examples
The following section provides practical API examples.
Configure the System
To set the system hostname, pre-login or post-login message, and time zone on the switch, send a targeted API request to /nvue_v1/system.
cumulus@switch:~$ curl -u 'cumulus:cumulus' -d '{"system": {"hostname":"switch01","timezone":"America/Los_Angeles","message":{"pre-login":"Welcome to NVIDIA Cumulus Linux","post-login":"You have successfully logged in to switch01"}}}' -k -X PATCH https://127.0.0.1:8765/nvue_v1/?rev=4
#!/usr/bin/env python3
import requests
from requests.auth import HTTPBasicAuth
import json
import time
auth = HTTPBasicAuth(username="cumulus", password="password")
nvue_end_point = "https://127.0.0.1:8765/nvue_v1"
mime_header = {"Content-Type": "application/json"}
DUMMY_SLEEP = 5 # In seconds
POLL_APPLIED = 1 # in seconds
RETRIES = 10
def print_request(r: requests.Request):
print("=======Request=======")
print("URL:", r.url)
print("Headers:", r.headers)
print("Body:", r.body)
def print_response(r: requests.Response):
print("=======Response=======")
print("Headers:", r.headers)
print("Body:", json.dumps(r.json(), indent=2))
def create_nvue_changest():
r = requests.post(url=nvue_end_point + "/revision",
auth=auth,
verify=False)
print_request(r.request)
print_response(r)
response = r.json()
changeset = response.popitem()[0]
return changeset
def apply_nvue_changeset(changeset):
# apply_payload = {"state": "apply"}
apply_payload = {"state": "apply", "auto-prompt": {"ays": "ays_yes"}}
url = nvue_end_point + "/revision/" + requests.utils.quote(changeset,
safe="")
r = requests.patch(url=url,
auth=auth,
verify=False,
data=json.dumps(apply_payload),
headers=mime_header)
print_request(r.request)
print_response(r)
def is_config_applied(changeset) -> bool:
# Check if the configuration was indeed applied
global RETRIES
global POLL_APPLIED
retries = RETRIES
while retries > 0:
r = requests.get(url=nvue_end_point + "/revision/" + requests.utils.quote(changeset, safe=""),
auth=auth,
verify=False)
response = r.json()
print(response)
if response["state"] == "applied":
return True
retries -= 1
time.sleep(POLL_APPLIED)
return False
def apply_new_config(path,payload):
# Create a new revision ID
changeset = create_nvue_changest()
print("Using NVUE Changeset: '{}'".format(changeset))
# Delete existing configuration
query_string = {"rev": changeset}
r = requests.delete(url=nvue_end_point + path,
auth=auth,
verify=False,
params=query_string,
headers=mime_header)
print_request(r.request)
print_response(r)
# Patch the new configuration
query_string = {"rev": changeset}
r = requests.patch(url=nvue_end_point + path,
auth=auth,
verify=False,
data=json.dumps(payload),
params=query_string,
headers=mime_header)
print_request(r.request)
print_response(r)
# Apply the changes to the new revision changeset
apply_nvue_changeset(changeset)
# Check if the changeset was applied
is_config_applied(changeset)
def nvue_get(path):
r = requests.get(url=nvue_end_point + path,
auth=auth,
verify=False)
print_request(r.request)
print_response(r)
if __name__ == "__main__":
payload = {
"system":
{
"hostname":"switch01",
"timezone":"America/Los_Angeles",
"message":
{
"pre-login":"Welcome to NVIDIA Cumulus Linux",
"post-login":"You have successfully logged in to switch01"
}
}
}
apply_new_config("/",payload) # Root patch
time.sleep(DUMMY_SLEEP)
nvue_get("/system")
cumulus@switch:~$ nv set system hostname switch01
cumulus@switch:~$ nv set system timezone America/Los_Angeles
cumulus@switch:~$ nv set system message pre-login "Welcome to NVIDIA Cumulus Linux"
cumulus@switch:~$ nv set system message post-login "You have successfully logged in to switch01"
Configure Services
To set up NTP, DNS, and syslog on the switch, send a targeted API request to /nvue_v1/service.
#!/usr/bin/env python3
import requests
from requests.auth import HTTPBasicAuth
import json
import time
auth = HTTPBasicAuth(username="cumulus", password="password")
nvue_end_point = "https://127.0.0.1:8765/nvue_v1"
mime_header = {"Content-Type": "application/json"}
DUMMY_SLEEP = 5 # In seconds
POLL_APPLIED = 1 # in seconds
RETRIES = 10
def print_request(r: requests.Request):
print("=======Request=======")
print("URL:", r.url)
print("Headers:", r.headers)
print("Body:", r.body)
def print_response(r: requests.Response):
print("=======Response=======")
print("Headers:", r.headers)
print("Body:", json.dumps(r.json(), indent=2))
def create_nvue_changest():
r = requests.post(url=nvue_end_point + "/revision",
auth=auth,
verify=False)
print_request(r.request)
print_response(r)
response = r.json()
changeset = response.popitem()[0]
return changeset
def apply_nvue_changeset(changeset):
# apply_payload = {"state": "apply"}
apply_payload = {"state": "apply", "auto-prompt": {"ays": "ays_yes"}}
url = nvue_end_point + "/revision/" + requests.utils.quote(changeset,
safe="")
r = requests.patch(url=url,
auth=auth,
verify=False,
data=json.dumps(apply_payload),
headers=mime_header)
print_request(r.request)
print_response(r)
def is_config_applied(changeset) -> bool:
# Check if the configuration was indeed applied
global RETRIES
global POLL_APPLIED
retries = RETRIES
while retries > 0:
r = requests.get(url=nvue_end_point + "/revision/" + requests.utils.quote(changeset, safe=""),
auth=auth,
verify=False)
response = r.json()
print(response)
if response["state"] == "applied":
return True
retries -= 1
time.sleep(POLL_APPLIED)
return False
def apply_new_config(path,payload):
# Create a new revision ID
changeset = create_nvue_changest()
print("Using NVUE Changeset: '{}'".format(changeset))
# Delete existing configuration
query_string = {"rev": changeset}
r = requests.delete(url=nvue_end_point + path,
auth=auth,
verify=False,
params=query_string,
headers=mime_header)
print_request(r.request)
print_response(r)
# Patch the new configuration
query_string = {"rev": changeset}
r = requests.patch(url=nvue_end_point + path,
auth=auth,
verify=False,
data=json.dumps(payload),
params=query_string,
headers=mime_header)
print_request(r.request)
print_response(r)
# Apply the changes to the new revision changeset
apply_nvue_changeset(changeset)
# Check if the changeset was applied
is_config_applied(changeset)
def nvue_get(path):
r = requests.get(url=nvue_end_point + path,
auth=auth,
verify=False)
print_request(r.request)
print_response(r)
if __name__ == "__main__":
payload = {
"service":
{
"ntp":
{
"default":
{
"server":
{
"4.cumulusnetworks.pool.ntp.org":
{
"iburst":"on"
}
}
}
},
"dns":
{
"mgmt":
{
"server":
{
"192.168.1.100":{}
}
}
},
"syslog":
{
"mgmt":
{
"server":
{
"192.168.1.120":
{
"port":8000
}
}
}
}
}
}
apply_new_config("/",payload) # Root patch
time.sleep(DUMMY_SLEEP)
nvue_get("/service/ntp")
nvue_get("/service/dns")
nvue_get("/service/syslog")
cumulus@switch:~$ nv set service ntp default server 4.cumulusnetworks.pool.ntp.org iburst on
cumulus@switch:~$ nv set service dns mgmt server 192.168.1.100
cumulus@switch:~$ nv set service syslog mgmt server 192.168.1.120 port 8000
Configure Users
The following example creates a new user, then deletes the user.
#!/usr/bin/env python3
import requests
from requests.auth import HTTPBasicAuth
import json
import time
auth = HTTPBasicAuth(username="cumulus", password="password")
nvue_end_point = "https://127.0.0.1:8765/nvue_v1"
mime_header = {"Content-Type": "application/json"}
DUMMY_SLEEP = 5 # In seconds
POLL_APPLIED = 1 # in seconds
RETRIES = 10
def print_request(r: requests.Request):
print("=======Request=======")
print("URL:", r.url)
print("Headers:", r.headers)
print("Body:", r.body)
def print_response(r: requests.Response):
print("=======Response=======")
print("Headers:", r.headers)
print("Body:", json.dumps(r.json(), indent=2))
def create_nvue_changest():
r = requests.post(url=nvue_end_point + "/revision",
auth=auth,
verify=False)
print_request(r.request)
print_response(r)
response = r.json()
changeset = response.popitem()[0]
return changeset
def apply_nvue_changeset(changeset):
# apply_payload = {"state": "apply"}
apply_payload = {"state": "apply", "auto-prompt": {"ays": "ays_yes"}}
url = nvue_end_point + "/revision/" + requests.utils.quote(changeset,
safe="")
r = requests.patch(url=url,
auth=auth,
verify=False,
data=json.dumps(apply_payload),
headers=mime_header)
print_request(r.request)
print_response(r)
def is_config_applied(changeset) -> bool:
# Check if the configuration was indeed applied
global RETRIES
global POLL_APPLIED
retries = RETRIES
while retries > 0:
r = requests.get(url=nvue_end_point + "/revision/" + requests.utils.quote(changeset, safe=""),
auth=auth,
verify=False)
response = r.json()
print(response)
if response["state"] == "applied":
return True
retries -= 1
time.sleep(POLL_APPLIED)
return False
def apply_new_config(path,payload):
# Create a new revision ID
changeset = create_nvue_changest()
print("Using NVUE Changeset: '{}'".format(changeset))
# Delete existing configuration
query_string = {"rev": changeset}
r = requests.delete(url=nvue_end_point + path,
auth=auth,
verify=False,
params=query_string,
headers=mime_header)
print_request(r.request)
print_response(r)
# Patch the new configuration
query_string = {"rev": changeset}
r = requests.patch(url=nvue_end_point + path,
auth=auth,
verify=False,
data=json.dumps(payload),
params=query_string,
headers=mime_header)
print_request(r.request)
print_response(r)
# Apply the changes to the new revision changeset
apply_nvue_changeset(changeset)
# Check if the changeset was applied
is_config_applied(changeset)
def delete_config(path):
# Create an NVUE changeset
changeset = create_nvue_changest()
print("Using NVUE Changeset: '{}'".format(changeset))
# Equivalent to JSON `null`
payload = None
# Stage the change
query_string = {"rev": changeset}
r = requests.delete(url=nvue_end_point + path,
auth=auth,
verify=False,
data=json.dumps(payload),
params=query_string,
headers=mime_header)
print_request(r.request)
print_response(r)
# Apply the staged changeset
apply_nvue_changeset(changeset)
# Check if the changeset was applied
is_config_applied(changeset)
def nvue_get(path):
r = requests.get(url=nvue_end_point + path,
auth=auth,
verify=False)
print_request(r.request)
print_response(r)
if __name__ == "__main__":
# Need to create a hashed password - The supported password
# hashes are documented here:
# https://docs.nvidia.com/networking-ethernet-software/cumulus-linux-55/System-Configuration/Authentication-Authorization-and-Accounting/User-Accounts/#hashed-passwords # noqa
# Here in this example, we use SHA-512
# Note: the crypt module is deprecated since Python 3.11 and removed in Python 3.13.
import crypt
hashed_password = crypt.crypt("hello$world#2023", salt=crypt.METHOD_SHA512)
payload = {
"system": {
"aaa": {
"user": {
"test1": {
"hashed-password": hashed_password,
"role": "nvue-monitor",
"enable": "on",
"full-name": "Test User",
}
}
}
}
}
apply_new_config("/",payload) # Root patch
time.sleep(DUMMY_SLEEP)
nvue_get("/system/aaa/user")
# Delete an existing user account using the AAA API.
delete_config("/system/aaa/user/test1")
time.sleep(DUMMY_SLEEP)
nvue_get("/system/aaa/user")
This example creates a new user test1.
cumulus@switch:~$ nv set system aaa user test1
cumulus@switch:~$ nv set system aaa user test1 full-name "Test User"
cumulus@switch:~$ nv set system aaa user test1 password "abcd@test"
cumulus@switch:~$ nv set system aaa user test1 role nvue-monitor
cumulus@switch:~$ nv set system aaa user test1 enable on
#!/usr/bin/env python3
import requests
from requests.auth import HTTPBasicAuth
import json
import time
auth = HTTPBasicAuth(username="cumulus", password="password")
nvue_end_point = "https://127.0.0.1:8765/nvue_v1"
mime_header = {"Content-Type": "application/json"}
DUMMY_SLEEP = 5 # In seconds
POLL_APPLIED = 1 # in seconds
RETRIES = 10
def print_request(r: requests.Request):
print("=======Request=======")
print("URL:", r.url)
print("Headers:", r.headers)
print("Body:", r.body)
def print_response(r: requests.Response):
print("=======Response=======")
print("Headers:", r.headers)
print("Body:", json.dumps(r.json(), indent=2))
def create_nvue_changest():
r = requests.post(url=nvue_end_point + "/revision",
auth=auth,
verify=False)
print_request(r.request)
print_response(r)
response = r.json()
changeset = response.popitem()[0]
return changeset
def apply_nvue_changeset(changeset):
# apply_payload = {"state": "apply"}
apply_payload = {"state": "apply", "auto-prompt": {"ays": "ays_yes"}}
url = nvue_end_point + "/revision/" + requests.utils.quote(changeset,
safe="")
r = requests.patch(url=url,
auth=auth,
verify=False,
data=json.dumps(apply_payload),
headers=mime_header)
print_request(r.request)
print_response(r)
def is_config_applied(changeset) -> bool:
# Check if the configuration was indeed applied
global RETRIES
global POLL_APPLIED
retries = RETRIES
while retries > 0:
r = requests.get(url=nvue_end_point + "/revision/" + requests.utils.quote(changeset, safe=""),
auth=auth,
verify=False)
response = r.json()
print(response)
if response["state"] == "applied":
return True
retries -= 1
time.sleep(POLL_APPLIED)
return False
def apply_new_config(path,payload):
# Create a new revision ID
changeset = create_nvue_changest()
print("Using NVUE Changeset: '{}'".format(changeset))
# Delete existing configuration
query_string = {"rev": changeset}
r = requests.delete(url=nvue_end_point + path,
auth=auth,
verify=False,
params=query_string,
headers=mime_header)
print_request(r.request)
print_response(r)
# Patch the new configuration
query_string = {"rev": changeset}
r = requests.patch(url=nvue_end_point + path,
auth=auth,
verify=False,
data=json.dumps(payload),
params=query_string,
headers=mime_header)
print_request(r.request)
print_response(r)
# Apply the changes to the new revision changeset
apply_nvue_changeset(changeset)
# Check if the changeset was applied
is_config_applied(changeset)
def nvue_get(path):
r = requests.get(url=nvue_end_point + path,
auth=auth,
verify=False)
print_request(r.request)
print_response(r)
if __name__ == "__main__":
rt_payload = {
"bgp":
{
"autonomous-system": 65101,
"router-id":"10.10.10.1"
}
}
apply_new_config("/router",rt_payload)
vrf_payload = {
"bgp":
{
"neighbor":
{
"swp51":
{
"remote-as":"external"
}
},
"address-family":
{
"ipv4-unicast":
{
"network":
{
"10.10.10.1/32":{}
}
}
}
}
}
apply_new_config("/vrf/default/router",vrf_payload)
time.sleep(DUMMY_SLEEP)
nvue_get("/router")
nvue_get("/vrf/default/router")
cumulus@switch:~$ nv set router bgp autonomous-system 65101
cumulus@switch:~$ nv set router bgp router-id 10.10.10.1
cumulus@switch:~$ nv set vrf default router bgp neighbor swp51 remote-as external
cumulus@switch:~$ nv set vrf default router bgp address-family ipv4-unicast network 10.10.10.1/32
Action Operations
The NVUE action operations are ephemeral operations that do not modify the state of the configuration; they reset counters for interfaces, BGP, QoS buffers and pools, and remove conflicts from protodown MLAG bonds.
In the following python example, the full_config_example() method sets the system pre-login message, enables BGP globally, and changes a few other configuration settings in a single bulk operation. The API end-point goes to the root node /nvue_v1. The bridge_config_example() method performs a targeted API request to /nvue_v1/bridge/domain/<domain-id> to set the vlan-vni-offset attribute.
To try out the NVUE REST API, use the NVUE API Lab available on NVIDIA Air. The lab provides a basic example to help you get started. You can also try out the other examples in this document.
Unlike the NVUE CLI, the NVUE API does not support configuring a plain text password for a user account; you must configure a hashed password for a user account with the NVUE API.
If you need to make multiple updates on the switch, NVIDIA recommends you use a root patch, which can make configuration changes with fewer round trips to the switch. Running many specific NVUE PATCH APIs to set or unset objects requires many round trips to the switch to set up the HTTP connection, transfer payload and responses, manage network utilization, and so on.
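As a rough illustration of the root patch recommendation, the sketch below folds several targeted payloads into a single root payload that one PATCH to the root node could carry. The helper names build_root_patch and _deep_merge are hypothetical and not part of NVUE. The sketch assumes that each path segment maps directly to a key of the root object, as in the payloads used in the examples in this document.

```python
from copy import deepcopy


def _deep_merge(dst, src):
    """Recursively merge src into dst; values from src win on conflicts."""
    for key, value in src.items():
        if isinstance(value, dict) and isinstance(dst.get(key), dict):
            _deep_merge(dst[key], value)
        else:
            dst[key] = value


def build_root_patch(targeted):
    """Nest each targeted payload under its path and merge into one dict.

    `targeted` maps an API path such as "/vrf/default/router" to the payload
    you would otherwise send to that path (hypothetical helper).
    """
    root = {}
    for path, payload in targeted.items():
        node = root
        # "/vrf/default/router" -> ["vrf", "default", "router"]
        for key in [segment for segment in path.split("/") if segment]:
            node = node.setdefault(key, {})
        _deep_merge(node, deepcopy(payload))
    return root


root_payload = build_root_patch({
    "/router": {"bgp": {"autonomous-system": 65101, "router-id": "10.10.10.1"}},
    "/vrf/default/router": {"bgp": {"neighbor": {"swp51": {"remote-as": "external"}}}},
    "/system": {"hostname": "switch01"},
})
print(root_payload)
```

The resulting dictionary can then be serialized with json.dumps and sent once against the root path, instead of issuing one request per path.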
The ntpd daemon running on the switch implements the NTP protocol. It synchronizes the system time with time servers in the /etc/ntp.conf file. The ntpd daemon starts at boot by default.
If you intend to run this service within a VRF, including the management VRF, follow these steps to configure the service.
Configure NTP Servers
The default NTP configuration includes the following servers, which are in the /etc/ntp.conf file:
server 0.cumulusnetworks.pool.ntp.org iburst
server 1.cumulusnetworks.pool.ntp.org iburst
server 2.cumulusnetworks.pool.ntp.org iburst
server 3.cumulusnetworks.pool.ntp.org iburst
To add the NTP servers you want to use, run the following commands. Include the iburst option to increase the sync speed.
The NVUE command requires a VRF. The following command adds the NTP servers in the default VRF.
cumulus@switch:~$ nv set service ntp default server 4.cumulusnetworks.pool.ntp.org iburst on
cumulus@switch:~$ nv config apply
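For readers who prefer the REST API, the following sketch builds a payload with the same shape as the service payload shown in the API examples, for an arbitrary list of NTP servers. The function name ntp_server_payload is hypothetical; the key layout (service, ntp, VRF name, server) is taken from that example and is otherwise an assumption.

```python
import json


def ntp_server_payload(servers, vrf="default", iburst=True):
    """Build a root-level payload that adds NTP servers in the given VRF.

    Hypothetical helper; mirrors the structure used in the API examples.
    """
    option = {"iburst": "on" if iburst else "off"}
    return {
        "service": {
            "ntp": {
                vrf: {
                    "server": {name: dict(option) for name in servers}
                }
            }
        }
    }


payload = ntp_server_payload(["4.cumulusnetworks.pool.ntp.org"])
print(json.dumps(payload, indent=2))
```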
Edit the /etc/ntp.conf file to add or update NTP server information:
cumulus@switch:~$ sudo nano /etc/ntp.conf
# pool.ntp.org maps to about 1000 low-stratum NTP servers. Your server will
# pick a different set every time it starts up. Please consider joining the
# pool: <http://www.pool.ntp.org/join.html>
server 0.cumulusnetworks.pool.ntp.org iburst
server 1.cumulusnetworks.pool.ntp.org iburst
server 2.cumulusnetworks.pool.ntp.org iburst
server 3.cumulusnetworks.pool.ntp.org iburst
server 4.cumulusnetworks.pool.ntp.org iburst
To set the initial date and time with NTP before starting the ntpd daemon, run the ntpd -q command. Be aware that ntpd -q can hang if the time servers are not reachable.
cumulus@switch:~$ nv show service ntp default server
cumulus@switch:~$ ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
+ec2-34-225-6-20 129.6.15.30 2 u 73 1024 377 70.414 -2.414 4.110
+lax1.m-d.net 132.163.96.1 2 u 69 1024 377 11.676 0.155 2.736
*69.195.159.158 199.102.46.72 2 u 133 1024 377 48.047 -0.457 1.856
-2.time.dbsinet. 198.60.22.240 2 u 1057 1024 377 63.973 2.182 2.692
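To check the peer table from a script, one option is to parse the ntpq -p output. The sketch below is a minimal parser, assuming the standard layout in which the first character of each peer line is the tally code (for example, * marks the peer currently selected for synchronization) and the remaining ten columns are whitespace separated. The function name parse_ntpq_peers is hypothetical.

```python
SAMPLE_OUTPUT = """\
remote refid st t when poll reach delay offset jitter
==============================================================================
+ec2-34-225-6-20 129.6.15.30 2 u 73 1024 377 70.414 -2.414 4.110
+lax1.m-d.net 132.163.96.1 2 u 69 1024 377 11.676 0.155 2.736
*69.195.159.158 199.102.46.72 2 u 133 1024 377 48.047 -0.457 1.856
-2.time.dbsinet. 198.60.22.240 2 u 1057 1024 377 63.973 2.182 2.692
"""


def parse_ntpq_peers(output):
    """Return one dict per peer line of `ntpq -p` output (hypothetical helper)."""
    peers = []
    for line in output.splitlines():
        # Skip blank lines, the header row, and the separator row.
        if not line.strip() or line.startswith("=") or "refid" in line:
            continue
        tally, fields = line[0], line[1:].split()
        if len(fields) != 10:
            continue  # Not a peer line in the expected layout.
        peers.append({
            "tally": tally,
            "remote": fields[0],
            "refid": fields[1],
            "stratum": int(fields[2]),
            "delay_ms": float(fields[7]),
            "offset_ms": float(fields[8]),
            "jitter_ms": float(fields[9]),
        })
    return peers


peers = parse_ntpq_peers(SAMPLE_OUTPUT)
selected = [p for p in peers if p["tally"] == "*"]
print(selected[0]["remote"], selected[0]["offset_ms"])
```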
The following example commands remove some of the default NTP servers:
cumulus@switch:~$ nv unset service ntp default server 0.cumulusnetworks.pool.ntp.org
cumulus@switch:~$ nv unset service ntp default server 1.cumulusnetworks.pool.ntp.org
cumulus@switch:~$ nv unset service ntp default server 2.cumulusnetworks.pool.ntp.org
cumulus@switch:~$ nv unset service ntp default server 3.cumulusnetworks.pool.ntp.org
cumulus@switch:~$ nv config apply
Edit the /etc/ntp.conf file to delete NTP servers.
cumulus@switch:~$ sudo nano /etc/ntp.conf
...
# pool.ntp.org maps to about 1000 low-stratum NTP servers. Your server will
# pick a different set every time it starts up. Please consider joining the
# pool: <http://www.pool.ntp.org/join.html>
server 4.cumulusnetworks.pool.ntp.org iburst
...
Specify the NTP Source Interface
By default, the source interface that NTP uses is eth0. The following example command configures the NTP source interface to be swp10.
cumulus@switch:~$ nv set service ntp default listen swp10
cumulus@switch:~$ nv config apply
Edit the /etc/ntp.conf file and modify the entry under the Specify interfaces comment.
You can use DHCP to specify your NTP servers. Ensure that the DHCP-generated configuration file /run/ntp.conf.dhcp exists. The /etc/dhcp/dhclient-exit-hooks.d/ntp script generates this file, which is a copy of the default /etc/ntp.conf file with a modified server list from the DHCP server. If this file does not exist and you plan on using DHCP in the future, you can copy your current /etc/ntp.conf file to the location of the DHCP file.
To use DHCP to specify your NTP servers, run the sudo -E systemctl edit ntp.service command and add the ExecStart= line:
The sudo -E systemctl edit ntp.service command always updates the base ntp.service even if you use ntp@mgmt.service. The ntp@mgmt.service is re-generated automatically.
To validate your configuration, run these commands:
If the state is not Active, or the alternate configuration file does not appear in the ntp command line, it is likely that you made a configuration mistake. Correct the mistake and rerun the commands above to verify.
Configure NTP with Authorization Keys
For added security, you can configure NTP to use authorization keys.
Configure the NTP Server
Create a .keys file, such as /etc/ntp.keys. Specify a key identifier (a number between 1 and 65535), an encryption method (M for MD5), and the password. The following provides an example:
#
# PLEASE DO NOT USE THE DEFAULT VALUES HERE.
#
#65535 M akey
#1 M pass
1 M CumulusLinux!
In the /etc/ntp.conf file, add a pointer to the /etc/ntp.keys file you created above and specify the key identifier. For example:
Restart NTP with the sudo systemctl restart ntp command.
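The key file format described above (key identifier, method, password) can be generated with a small helper, which is useful when you need the same entry on the server and on every client. This is a sketch only: ntp_key_line is a hypothetical name, and the rejection of whitespace and # in the password is a conservative assumption rather than a documented Cumulus Linux rule.

```python
def ntp_key_line(key_id, password, method="M"):
    """Format one /etc/ntp.keys entry, for example '1 M CumulusLinux!'."""
    if not 1 <= key_id <= 65535:
        raise ValueError("key identifier must be between 1 and 65535")
    # Assumption: avoid characters that would split the field or start a comment.
    if not password or "#" in password or any(ch.isspace() for ch in password):
        raise ValueError("password must be non-empty without whitespace or '#'")
    return f"{key_id} {method} {password}"


print(ntp_key_line(1, "CumulusLinux!"))
```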
Configure the NTP Client
The NTP client is the Cumulus Linux switch.
Create the same .keys file you created on the NTP server (/etc/ntp.keys). For example:
cumulus@switch:~$ sudo nano /etc/ntp.keys
#
# DO NOT USE THE DEFAULT VALUES HERE.
#
#65535 M akey
#1 M pass
1 M CumulusLinux!
Edit the /etc/ntp.conf file to specify the server you want to use, the key identifier, and a pointer to the /etc/ntp.keys file you created in step 1. For example:
cumulus@switch:~$ sudo nano /etc/ntp.conf
...
# You do need to talk to an NTP server or two (or three).
#pool ntp.your-provider.example
# OR
#server ntp.your-provider.example
# pool.ntp.org maps to about 1000 low-stratum NTP servers. Your server will
# pick a different set every time it starts up. Please consider joining the
# pool: <http://www.pool.ntp.org/join.html>
#server 0.cumulusnetworks.pool.ntp.org iburst
#server 1.cumulusnetworks.pool.ntp.org iburst
#server 2.cumulusnetworks.pool.ntp.org iburst
#server 3.cumulusnetworks.pool.ntp.org iburst
server 10.50.23.121 key 1
#keys
keys /etc/ntp.keys
trustedkey 1
controlkey 1
requestkey 1
...
Restart NTP in the active VRF (default or management). For example:
Wait a few minutes, then run the ntpq -c as command to verify the configuration:
cumulus@switch:~$ ntpq -c as
ind assid status conf reach auth condition last_event cnt
===========================================================
1 40828 f014 yes yes ok reject reachable 1
After a successful authorization, you see the following command output:
cumulus@switch:~$ ntpq -c as
ind assid status conf reach auth condition last_event cnt
===========================================================
1 40828 f61a yes yes ok sys.peer sys_peer 1
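The two outputs above differ in the condition column (reject compared with sys.peer). A script can make the same check by parsing the ntpq -c as output, as in the following sketch. The helper names are hypothetical, and the parser assumes every association row has one whitespace-separated value per header column.

```python
def parse_associations(output):
    """Return one dict per association row of `ntpq -c as` output."""
    header, rows = None, []
    for line in output.splitlines():
        fields = line.split()
        if not fields or line.lstrip().startswith("="):
            continue
        if fields[0] == "ind":
            header = fields  # Column names: ind, assid, status, ...
            continue
        if header and len(fields) == len(header):
            rows.append(dict(zip(header, fields)))
    return rows


def is_authenticated_peer(row):
    """True when authentication succeeded and the peer is the system peer."""
    return row["auth"] == "ok" and row["condition"] == "sys.peer"


before = """\
ind assid status conf reach auth condition last_event cnt
===========================================================
1 40828 f014 yes yes ok reject reachable 1
"""
after = before.replace("f014", "f61a").replace("reject reachable", "sys.peer sys_peer")

print([is_authenticated_peer(r) for r in parse_associations(before)])
print([is_authenticated_peer(r) for r in parse_associations(after)])
```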
Considerations
NTP in Cumulus Linux uses the /usr/share/zoneinfo/leap-seconds.list file, which expires periodically and results in generated log messages about the expiration. When the file expires, update it from https://www.ietf.org/timezones/data/leap-seconds.list or upgrade the tzdata package to the newest version.
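To find out ahead of time when the leap second file expires, you can read the expiration line from the file. The sketch below assumes the commonly published leap-seconds.list layout, in which a line beginning with #@ carries the expiration time as seconds since 1 January 1900 (the NTP epoch). The function name leap_file_expiry is hypothetical, and the sample text stands in for the real file.

```python
from datetime import datetime, timedelta, timezone

NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def leap_file_expiry(text):
    """Return the expiration time found on the '#@' line, or None."""
    for line in text.splitlines():
        if line.startswith("#@"):
            seconds = int(line[2:].split()[0])
            return NTP_EPOCH + timedelta(seconds=seconds)
    return None


# Sample content; on a switch you would read
# /usr/share/zoneinfo/leap-seconds.list instead.
sample = "#\tUpdated through IERS Bulletin C\n#@\t3928521600\n2272060800\t10\t# 1 Jan 1972\n"

expiry = leap_file_expiry(sample)
expired = expiry is not None and expiry < datetime.now(timezone.utc)
print(expiry.isoformat(), "expired" if expired else "valid")
```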
Cumulus Linux supports IEEE 1588-2008 Precision Time Protocol (PTPv2), which defines the algorithm and method for synchronizing clocks of various devices across packet-based networks, including Ethernet switches and IP routers.
PTP is capable of sub-microsecond accuracy. The clocks are in a master-slave hierarchy, where the slaves synchronize to their masters, which can be slaves to their own masters. The best master clock (BMC) algorithm, which runs on every clock, creates and updates the hierarchy automatically. The grandmaster clock is the top-level master. To provide a high degree of accuracy, a Global Positioning System (GPS) time source typically synchronizes the grandmaster clock.
In the following example:
Boundary clock 2 receives time from Master 1 (the grandmaster) on a PTP slave port, sets its clock and passes the time down from the PTP master port to Boundary clock 1.
Boundary clock 1 receives the time on a PTP slave port, sets its clock and passes the time down the hierarchy through the PTP master ports to the hosts that receive the time.
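The way the hierarchy above forms can be sketched with a simplified version of the dataset comparison at the heart of the best master clock algorithm. In IEEE 1588 the clocks are compared field by field (priority1, clock class, clock accuracy, variance, priority2, then clock identity as the tie breaker), with lower values winning. The sketch leaves out the stepsRemoved and port-state logic of the full algorithm, and the attribute values are made-up examples, not values taken from a real deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockDataset:
    name: str
    priority1: int
    clock_class: int
    clock_accuracy: int
    variance: int
    priority2: int
    clock_identity: str

    def rank(self):
        # Tuples compare element by element, which matches the
        # field-by-field ordering; lower is better for every field.
        return (self.priority1, self.clock_class, self.clock_accuracy,
                self.variance, self.priority2, self.clock_identity)


def best_master(clocks):
    """Pick the clock that wins the simplified comparison."""
    return min(clocks, key=ClockDataset.rank)


clocks = [
    # Illustrative values: a GPS-synchronized grandmaster and two switches.
    ClockDataset("Master 1", 128, 6, 0x21, 0x4E5D, 128, "001122fffe000001"),
    ClockDataset("Boundary clock 1", 128, 248, 0xFE, 0xFFFF, 128, "001122fffe000002"),
    ClockDataset("Boundary clock 2", 128, 248, 0xFE, 0xFFFF, 128, "001122fffe000003"),
]
print(best_master(clocks).name)
```

With equal priority1 values, the lower clock class of the GPS-synchronized clock decides the outcome, so the two boundary clocks end up below it in the hierarchy.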
Cumulus Linux and PTP
PTP in Cumulus Linux uses the linuxptp package that includes the following programs:
ptp4l provides the PTP protocol and state machines
phc2sys provides PTP Hardware Clock and System Clock synchronization
timemaster provides System Clock and PTP synchronization
Cumulus Linux supports:
PTP boundary clock mode only (the switch provides timing to downstream servers; it is a slave to a higher-level clock and a master to downstream clocks).
UDPv4, UDPv6, and 802.3 encapsulation.
Only a single PTP domain per network.
PTP on layer 3 interfaces, layer 3 bonds, trunk ports, and switch ports belonging to a VLAN.
Multicast, unicast, and mixed message mode.
End-to-End delay mechanism only. Cumulus Linux does not support Peer-to-Peer.
One-step and two-step clock timestamp mode.
Hardware timestamping for PTP packets. This allows PTP to avoid inaccuracies caused by message transfer delays and improves the accuracy of time synchronization.
On NVIDIA switches with Spectrum-2 and later, PTP is not supported on 1G interfaces.
You cannot run both PTP and NTP on the switch.
PTP supports the default VRF only.
Basic Configuration
Basic PTP configuration requires you to:
Enable PTP on the switch.
Configure PTP on at least one interface; this can be a layer 3 routed port, switch port, or trunk port. You do not need to specify which is a master interface and which is a slave interface; the PTP Best Master Clock Algorithm (BMCA) determines the master and slave.
If you configure PTP with Linux commands, you must also enable PTP timestamping; see step 1 of the Linux procedure below. NVUE enables timestamping when you enable PTP on the switch.
The basic configuration shown below uses the default PTP settings:
The clock mode is Boundary. This is the only clock mode that Cumulus Linux supports.
The delay mechanism is End-to-End (E2E), where the slave measures the delay between itself and the master. The master and slave send delay request and delay response messages between each other to measure the delay.
The clock timestamp mode is two-step.
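The End-to-End delay measurement described above reduces to simple arithmetic on four timestamps. The following sketch shows the standard calculation, assuming the path delay is the same in both directions; the function name and the sample timestamps (in nanoseconds) are illustrative.

```python
def e2e_offset_and_delay(t1, t2, t3, t4):
    """Compute (offset_from_master, mean_path_delay) from four timestamps.

    t1: master sends Sync            (master clock)
    t2: slave receives Sync          (slave clock)
    t3: slave sends Delay_Req        (slave clock)
    t4: master receives Delay_Req    (master clock)
    """
    master_to_slave = t2 - t1  # path delay + slave offset
    slave_to_master = t4 - t3  # path delay - slave offset
    mean_path_delay = (master_to_slave + slave_to_master) / 2
    offset_from_master = (master_to_slave - slave_to_master) / 2
    return offset_from_master, mean_path_delay


# Example: the slave clock runs 150 ns ahead and the one-way delay is 500 ns.
offset, delay = e2e_offset_and_delay(t1=1000, t2=1650, t3=2000, t4=2350)
print(offset, delay)
```

The slave then corrects its clock by the computed offset. Any asymmetry between the two directions shows up directly as an error in that offset, which is why the symmetric-path assumption matters.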
To configure other settings, such as the PTP profile, domain, priority, and DSCP, the PTP interface transport mode and timers, and PTP monitoring, see the Optional Configuration sections below.
The NVUE nv set service ptp commands require an instance number (1 in the example command below) for management purposes.
When you enable the PTP service with the nv set service ptp <instance> enable on command, NVUE restarts the switchd service, which causes all network ports to reset in addition to resetting the switch hardware configuration.
cumulus@switch:~$ nv set service ptp 1 enable on
cumulus@switch:~$ nv set interface swp1 ip address 10.0.0.9/32
cumulus@switch:~$ nv set interface swp2 ip address 10.0.0.10/32
cumulus@switch:~$ nv set interface swp1 ptp enable on
cumulus@switch:~$ nv set interface swp2 ptp enable on
cumulus@switch:~$ nv config apply
The configuration writes to the /etc/ptp4l.conf file.
cumulus@switch:~$ nv set service ptp 1 enable on
cumulus@switch:~$ nv set bridge domain br_default
cumulus@switch:~$ nv set bridge domain br_default type vlan-aware
cumulus@switch:~$ nv set bridge domain br_default vlan 10-30
cumulus@switch:~$ nv set bridge domain br_default vlan 10 ptp enable on
cumulus@switch:~$ nv set interface vlan10 type svi
cumulus@switch:~$ nv set interface vlan10 ip address 10.1.10.2/24
cumulus@switch:~$ nv set interface vlan10 ptp enable on
cumulus@switch:~$ nv set interface swp1 bridge domain br_default
cumulus@switch:~$ nv set interface swp1 bridge domain br_default vlan 10
cumulus@switch:~$ nv set interface swp1 ptp enable on
cumulus@switch:~$ nv config apply
You can configure only one address, either IPv4 or IPv6.
For IPv6, set the trunk port transport mode to ipv6.
The configuration writes to the /etc/ptp4l.conf file.
cumulus@switch:~$ nv set service ptp 1 enable on
cumulus@switch:~$ nv set bridge domain br_default
cumulus@switch:~$ nv set bridge domain br_default type vlan-aware
cumulus@switch:~$ nv set bridge domain br_default vlan 10-30
cumulus@switch:~$ nv set bridge domain br_default vlan 10 ptp enable on
cumulus@switch:~$ nv set interface vlan10 type svi
cumulus@switch:~$ nv set interface vlan10 ip address 10.1.10.2/24
cumulus@switch:~$ nv set interface swp2 bridge domain br_default
cumulus@switch:~$ nv set interface swp2 bridge domain br_default access 10
cumulus@switch:~$ nv set interface swp2 ptp enable on
cumulus@switch:~$ nv config apply
You can configure only one address, either IPv4 or IPv6.
For IPv6, set the trunk port transport mode to ipv6.
The configuration writes to the /etc/ptp4l.conf file.
Edit the /etc/cumulus/switchd.d/ptp.conf file to set the ptp.timestamping parameter to TRUE:
Edit the Default interface options section of the /etc/ptp4l.conf file to configure the interfaces on the switch that you want to use for PTP.
cumulus@switch:~$ sudo nano /etc/ptp4l.conf
...
[global]
#
# Default Data Set
#
slaveOnly 0
priority1 128
priority2 128
domainNumber 0
twoStepFlag 1
dscp_event 46
dscp_general 46
network_transport L2
dataset_comparison G.8275.x
G.8275.defaultDS.localPriority 128
ptp_dst_mac 01:80:C2:00:00:0E
#
# Port Data Set
#
logAnnounceInterval -3
logSyncInterval -4
logMinDelayReqInterval -4
announceReceiptTimeout 3
delay_mechanism E2E
offset_from_master_min_threshold -50
offset_from_master_max_threshold 50
mean_path_delay_threshold 200
tsmonitor_num_ts 100
tsmonitor_num_log_sets 3
tsmonitor_num_log_entries 4
tsmonitor_log_wait_seconds 1
#
# Run time options
#
logging_level 6
path_trace_enabled 0
use_syslog 1
verbose 0
summary_interval 0
#
# servo parameters
#
pi_proportional_const 0.000000
pi_integral_const 0.000000
pi_proportional_scale 0.700000
pi_proportional_exponent -0.300000
pi_proportional_norm_max 0.700000
pi_integral_scale 0.300000
pi_integral_exponent 0.400000
pi_integral_norm_max 0.300000
step_threshold 0.000002
first_step_threshold 0.000020
max_frequency 900000000
sanity_freq_limit 0
#
# Default interface options
#
time_stamping software
# Interfaces in which ptp should be enabled
# these interfaces should be routed ports
# if an interface does not have an ip address
# the ptp4l will not work as expected.
[swp1]
udp_ttl 1
masterOnly 0
delay_mechanism E2E
[swp2]
udp_ttl 1
masterOnly 0
delay_mechanism E2E
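The interval settings in the file above (logAnnounceInterval, logSyncInterval, logMinDelayReqInterval) are base-2 logarithms of seconds, so -4 means one message every 1/16 of a second. The sketch below is a minimal reader for this file layout that converts those values. It assumes sections in square brackets followed by whitespace-separated key and value lines; the helper names are hypothetical, and the sample is a shortened copy of the configuration above.

```python
SAMPLE = """\
[global]
# Port Data Set
logAnnounceInterval -3
logSyncInterval -4
logMinDelayReqInterval -4
delay_mechanism E2E

[swp1]
udp_ttl 1
masterOnly 0
delay_mechanism E2E

[swp2]
udp_ttl 1
masterOnly 0
delay_mechanism E2E
"""


def parse_ptp4l_conf(text):
    """Return {section: {key: value}} for a ptp4l.conf style file."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
            continue
        if current is None:
            continue  # Ignore anything before the first section.
        parts = line.split(None, 1)
        current[parts[0]] = parts[1] if len(parts) > 1 else ""
    return sections


def log_interval_seconds(log_value):
    """Convert a log2 interval setting to seconds between messages."""
    return 2.0 ** int(log_value)


conf = parse_ptp4l_conf(SAMPLE)
sync_interval = log_interval_seconds(conf["global"]["logSyncInterval"])
print(f"Sync every {sync_interval} s ({1 / sync_interval:.0f} per second)")
print("PTP interfaces:", [name for name in conf if name != "global"])
```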
For a trunk VLAN, add the VLAN configuration to the switch port stanza: set l2_mode to trunk, vlan_intf to the VLAN interface, and src_ip to the IP address of the VLAN interface:
For a switch port VLAN, add the VLAN configuration to the switch port stanza: set l2_mode to access, vlan_intf to the VLAN interface, and src_ip to the IP address of the VLAN interface:
Cumulus Linux provides several ways to modify the default basic global configuration. You can:
Use profiles.
Modify the parameters directly with NVUE commands.
Modify the Linux /etc/ptp4l.conf file.
When a predefined profile is set, NVUE does not allow you to configure global parameters. Do not edit the Linux /etc/ptp4l.conf file to modify the global parameters when a predefined profile is in use. For information about profiles, see PTP Profiles.
Clock Domains
PTP domains allow different independent timing systems to be present in the same network without confusing each other. A PTP domain is a network or a portion of a network within which all the clocks synchronize. Every PTP message contains a domain number. A PTP instance works in only one domain and ignores messages that contain a different domain number. Cumulus Linux supports only one domain in the system.
A network can contain multiple PTP clock domains. PTP isolates each domain from other domains so that each domain is a different PTP network. You can specify a domain number between 0 and 127.
The following example commands configure domain 3 when a profile is not set:
cumulus@switch:~$ nv set service ptp 1 domain 3
cumulus@switch:~$ nv config apply
Edit the Default Data Set section of the /etc/ptp4l.conf file to change the domainNumber setting, then restart the ptp4l service.
cumulus@switch:~$ sudo nano /etc/ptp4l.conf
[global]
#
# Default Data Set
#
slaveOnly 0
priority1 128
priority2 128
domainNumber 3
...
The Cumulus Linux switch provides the following clock timestamp modes:
One-step, where PTP adds the precise time that the Sync packet egresses the port to the packet. There is no need for a follow up packet.
Two-step, where PTP notes the precise time when the Sync packet egresses the port and sends it in a separate follow up message.
One-step mode significantly reduces the number of PTP messages. Two-step mode is the default configuration.
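As a rough illustration of why one-step mode reduces the message count, the following sketch counts the Sync and Follow-up messages that a master port sends over a period. The function name and the rate of 8 Sync messages per second are illustrative assumptions, not Cumulus Linux values.

```python
def sync_messages_sent(sync_per_second: float, seconds: float, two_step: bool) -> int:
    """Count Sync (and Follow-up) messages that a master port sends.

    Illustrative model only: two-step mode sends one Follow-up per Sync;
    one-step mode carries the timestamp in the Sync message itself.
    """
    syncs = int(sync_per_second * seconds)
    follow_ups = syncs if two_step else 0
    return syncs + follow_ups


two_step_total = sync_messages_sent(8, 60, two_step=True)
one_step_total = sync_messages_sent(8, 60, two_step=False)
print(two_step_total, one_step_total)
```

Under these assumptions, two-step mode sends twice as many Sync-related messages as one-step mode over the same period.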
Cumulus Linux supports one-step mode only on switches with the Spectrum-2 or Spectrum-3 ASIC.
The following example commands configure one-step mode when a profile is not set:
cumulus@switch:~$ nv set service ptp 1 two-step off
cumulus@switch:~$ nv config apply
To revert the clock timestamp mode to the default setting (two-step mode), run the nv set service ptp 1 two-step on command.
To set the clock timestamp mode for a custom profile based on IEEE 1588, ITU 8275-1, or ITU 8275-2, run the nv set service ptp <instance-id> profile <profile-id> two-step command. For example, to set one-step mode for the custom profile called CUSTOM1, run the nv set service ptp 1 profile CUSTOM1 two-step off command.
Edit the Default Data Set section of the /etc/ptp4l.conf file to change the twoStepFlag setting to 0, then restart the ptp4l service.
To revert the clock timestamp mode to the default setting (two-step mode), change the twoStepFlag setting to 1.
PTP Priority
The best master clock (BMC) algorithm selects the PTP master according to the following criteria, in order:
Priority 1
Clock class
Clock accuracy
Clock variance
Priority 2
Port ID
Use the PTP priority to select the best master clock. You can set priority 1 and 2:
Priority 1 overrides the clock class and quality selection criteria to select the best master clock.
Priority 2 identifies primary and backup clocks among identical redundant Grandmasters.
The range for both priority1 and priority2 is 0 to 255. The default priority is 128. For the boundary clock, use a number above 128. A lower value takes precedence.
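The selection order described above can be modeled as a tuple comparison in which earlier criteria win and later criteria only break ties. The following is a simplified sketch of that ordering, not the complete algorithm that PTP implementations run; the class name, field names, and clock values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockInfo:
    """Fields named after the selection criteria listed above (illustrative)."""
    name: str
    priority1: int
    clock_class: int
    clock_accuracy: int
    clock_variance: int
    priority2: int
    port_id: str

    def sort_key(self):
        # Tuples compare element by element, so earlier criteria win
        # and later ones only break ties. Lower values are preferred.
        return (self.priority1, self.clock_class, self.clock_accuracy,
                self.clock_variance, self.priority2, self.port_id)


def best_master(clocks):
    """Return the clock that sorts first under the criteria order."""
    return min(clocks, key=ClockInfo.sort_key)


# Hypothetical clocks: two identical grandmasters that differ only in
# priority2, and a boundary clock with priorities above 128.
grandmaster = ClockInfo("gm", 128, 6, 0x21, 0x4E5D, 128, "000001")
backup_gm = ClockInfo("gm-backup", 128, 6, 0x21, 0x4E5D, 129, "000002")
boundary = ClockInfo("bc", 200, 248, 0xFE, 0xFFFF, 200, "000003")

print(best_master([boundary, backup_gm, grandmaster]).name)
```

In this sketch, priority 2 decides between the two otherwise identical grandmasters, and the boundary clock loses on priority 1 before any other field is considered.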
The following example commands set priority 1 and priority 2 to 200 when a profile is not set:
cumulus@switch:~$ nv set service ptp 1 priority1 200
cumulus@switch:~$ nv set service ptp 1 priority2 200
cumulus@switch:~$ nv config apply
Edit the Default Data Set section of the /etc/ptp4l.conf file to change the priority1 setting, the priority2 setting, or both, then restart the ptp4l service.
cumulus@switch:~$ sudo nano /etc/ptp4l.conf
[global]
#
# Default Data Set
#
slaveOnly 0
priority1 200
priority2 200
domainNumber 3
...
Use the local priority when you create a custom profile based on a Telecom profile (ITU 8275-1 or ITU 8275-2). Modify the local priority in a custom profile to set the local priority of the local clock. You can set a value between 0 and 255. The default priority is 128.
The following example command configures the local priority to 10 for the custom profile called CUSTOM1, based on ITU 8275-2:
cumulus@switch:~$ nv set service ptp 1 profile CUSTOM1 local-priority 10
cumulus@switch:~$ nv config apply
Edit the G.8275.defaultDS.localPriority option in the /etc/ptp4l.conf file. After you save the /etc/ptp4l.conf file, restart the ptp4l service.
Optional global PTP configuration includes configuring the DiffServ code point (DSCP). You can configure the DSCP value for all PTP IPv4 packets originated locally. You can set a value between 0 and 63.
cumulus@switch:~$ nv set service ptp 1 ip-dscp 22
cumulus@switch:~$ nv config apply
Edit the Default Data Set section of the /etc/ptp4l.conf file to change the dscp_event setting for PTP messages that trigger a timestamp read from the clock and the dscp_general setting for PTP messages that carry commands, responses, information, or timestamps.
After you save the /etc/ptp4l.conf file, restart the ptp4l service.
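If you generate NVUE commands from a script, you can check the documented ranges before you apply the configuration. The following sketch is a hypothetical helper (it is not part of NVUE) that validates the domain, priority, and DSCP ranges stated in this section and builds the corresponding nv set commands.

```python
# Ranges as stated in this section; the helper itself is hypothetical.
GLOBAL_RANGES = {
    "domain": (0, 127),
    "priority1": (0, 255),
    "priority2": (0, 255),
    "ip-dscp": (0, 63),
}


def nvue_global_commands(instance: int, **settings: int) -> list[str]:
    """Build `nv set service ptp` commands after checking documented ranges."""
    commands = []
    for key, value in settings.items():
        option = key.replace("_", "-")  # ip_dscp -> ip-dscp
        if option not in GLOBAL_RANGES:
            raise KeyError(f"unknown option: {option}")
        low, high = GLOBAL_RANGES[option]
        if not low <= value <= high:
            raise ValueError(f"{option} must be between {low} and {high}, got {value}")
        commands.append(f"nv set service ptp {instance} {option} {value}")
    commands.append("nv config apply")
    return commands


for line in nvue_global_commands(1, domain=3, priority1=200, ip_dscp=22):
    print(line)
```

The helper only builds command strings; it does not run them or check whether a profile is in use on the switch.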
Cumulus Linux provides several ways to modify the default basic interface configuration. You can:
Use profiles.
Modify the parameters directly with NVUE commands.
Modify the Linux /etc/ptp4l.conf configuration file.
When a profile is in use, avoid configuring the following interface configuration parameters with NVUE or in the Linux configuration file so that the interface retains its profile settings.
Transport Mode
By default, Cumulus Linux encapsulates PTP messages in UDP IPv4 frames. To encapsulate PTP messages on an interface in UDP IPv6 frames:
cumulus@switch:~$ nv set interface swp1 ptp transport ipv6
cumulus@switch:~$ nv config apply
Edit the Default interface options section of the /etc/ptp4l.conf file to change the network_transport setting for the interface, then restart the ptp4l service.
cumulus@switch:~$ sudo nano /etc/ptp4l.conf
...
# Default interface options
#
time_stamping hardware
# Interfaces in which ptp should be enabled
# these interfaces should be routed ports
# if an interface does not have an ip address
# the ptp4l will not work as expected.
[swp1]
udp_ttl 1
masterOnly 0
delay_mechanism E2E
network_transport UDPv6
[swp2]
udp_ttl 1
masterOnly 0
delay_mechanism E2E
network_transport UDPv6
...
Cumulus Linux supports the following PTP message modes:
Multicast, where the ports subscribe to two multicast addresses, one for event messages with timestamps and the other for general messages without timestamps. The Sync message that the master sends is a multicast message; all slave ports receive this message because the slaves need the time from the master. The slave ports in turn generate a Delay Request to the master. This is a multicast message that the intended master for the message and other slave ports receive. Similarly, all slave ports in addition to the intended slave port receive the master’s Delay Response. The slave ports receiving the unintended Delay Requests and Responses need to drop the packets. This can affect network bandwidth if there are hundreds of slave ports.
Mixed, where Sync and Announce messages are multicast messages but Delay Request and Response messages are unicast. This avoids the issue seen in multicast message mode where every slave port sees Delay Requests and Responses from every other slave port.
Unicast, where you configure the port as a unicast client or server. See Unicast Mode.
Multicast mode is the default setting; when you enable PTP on an interface, the message mode is multicast.
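To get a feel for how the number of dropped packets grows in multicast mode, the following sketch counts the Delay Requests and Delay Responses that reach slave ports other than the intended one. It is a simplified model of a single exchange round, and the function name is an assumption for illustration.

```python
def unintended_delay_messages(slave_ports: int, mixed_mode: bool) -> int:
    """Delay Requests/Responses delivered to slave ports that must drop them.

    Simplified model of one exchange round in which every slave port sends
    one Delay Request and receives one Delay Response from the master.
    """
    if mixed_mode:
        # Delay messages are unicast, so only the intended port sees them.
        return 0
    # Multicast: each slave's request and its response also reach
    # the other (slave_ports - 1) slave ports.
    return 2 * slave_ports * (slave_ports - 1)


for n in (2, 10, 200):
    print(n, unintended_delay_messages(n, mixed_mode=False),
          unintended_delay_messages(n, mixed_mode=True))
```

In this model the unintended traffic grows roughly with the square of the number of slave ports in multicast mode and stays at zero in mixed mode.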
To change the message mode to mixed on swp1:
cumulus@switch:~$ nv set interface swp1 ptp mixed-multicast-unicast on
cumulus@switch:~$ nv config apply
To change the message mode back to the default setting of multicast on swp1:
cumulus@switch:~$ nv set interface swp1 ptp mixed-multicast-unicast off
cumulus@switch:~$ nv config apply
Edit the Default interface options section of the /etc/ptp4l.conf file to add the hybrid_e2e 1 line under the interface, then restart the ptp4l service.
cumulus@switch:~$ sudo nano /etc/ptp4l.conf
...
# Default interface options
#
time_stamping hardware
# Interfaces in which ptp should be enabled
# these interfaces should be routed ports
# if an interface does not have an ip address
# the ptp4l will not work as expected.
[swp1]
hybrid_e2e 1
...
To change the message mode back to the default setting of multicast, remove the hybrid_e2e line under the interface, then restart the ptp4l service.
PTP Interface Timers
You can set the following timers for PTP messages.
Timer | Description
announce-interval | The average interval between successive Announce messages. Specify the value as a power of two in seconds.
announce-timeout | The number of announce intervals that have to occur without receiving an Announce message before a timeout occurs. Make sure that this value is longer than the announce-interval in your network.
delay-req-interval | The minimum average time interval allowed between successive Delay Request messages.
sync-interval | The interval between PTP synchronization messages on an interface. Specify the value as a power of two in seconds.
To set the timers with NVUE, run the nv set interface <interface> ptp timers <timer> <value> command.
To set the timers with Linux commands, edit the /etc/ptp4l.conf file and set the timers in the Default interface options section.
The following example sets the announce interval between successive Announce messages on swp1 to -1.
Edit the Default interface options section of the /etc/ptp4l.conf file:
To set the announce interval between successive Announce messages on swp1 to -1, add logAnnounceInterval -1 under the interface stanza.
To set the mean sync-interval for multicast messages on swp1 to -5, add logSyncInterval -5 under the interface stanza.
After you edit the /etc/ptp4l.conf file, restart the ptp4l service.
cumulus@switch:~$ sudo nano /etc/ptp4l.conf
...
# Default interface options
#
time_stamping hardware
# Interfaces in which ptp should be enabled
# these interfaces should be routed ports
# if an interface does not have an ip address
# the ptp4l will not work as expected.
[swp1]
logAnnounceInterval -1
logSyncInterval -5
udp_ttl 20
masterOnly 1
delay_mechanism E2E
...
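Because the interval timers are powers of two in seconds, a negative value means more than one message per second. The following sketch converts the values used in the example above into seconds and message rates; the function names are illustrative.

```python
def log_interval_to_seconds(log_value: int) -> float:
    """Convert a PTP log2 interval value to seconds (2 ** log_value)."""
    return 2.0 ** log_value


def messages_per_second(log_value: int) -> float:
    """Mean message rate implied by a log2 interval value."""
    return 1.0 / log_interval_to_seconds(log_value)


for name, value in (("logAnnounceInterval", -1), ("logSyncInterval", -5)):
    print(f"{name} {value}: every {log_interval_to_seconds(value)} s, "
          f"{messages_per_second(value):g} messages/s")
```

An announce interval of -1 corresponds to one Announce message every 0.5 seconds, and a sync interval of -5 corresponds to 32 Sync messages per second.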
Set the local priority on an interface for a profile that uses ITU 8275-1 or ITU 8275-2. You can set a value between 0 and 255. The default priority is 128.
The following example sets the local priority on swp1 to 10.
By default, PTP ports are in auto mode, where the BMC algorithm determines the state of the port.
You can configure Forced Master mode on a PTP port so that it is always in a master state and the BMC algorithm does not run for this port. This port ignores any Announce messages it receives.
cumulus@switch:~$ nv set interface swp1 ptp forced-master on
cumulus@switch:~$ nv config apply
Edit the Default interface options section of the /etc/ptp4l.conf file to change the masterOnly setting for the interface, then restart the ptp4l service.
cumulus@switch:~$ sudo nano /etc/ptp4l.conf
...
# Default interface options
#
time_stamping hardware
# Interfaces in which ptp should be enabled
# these interfaces should be routed ports
# if an interface does not have an ip address
# the ptp4l will not work as expected.
[swp1]
udp_ttl 1
masterOnly 1
delay_mechanism E2E
...
Edit the Default interface options section of the /etc/ptp4l.conf file to change the udp_ttl setting for the interface, then restart the ptp4l service.
cumulus@switch:~$ sudo nano /etc/ptp4l.conf
...
# Default interface options
#
time_stamping hardware
# Interfaces in which ptp should be enabled
# these interfaces should be routed ports
# if an interface does not have an ip address
# the ptp4l will not work as expected.
[swp1]
udp_ttl 20
masterOnly 1
delay_mechanism E2E
...
Cumulus Linux supports unicast mode so that a unicast client can perform unicast discovery and negotiation with servers. Unlike the default multicast mode, where both the server (master) and client (slave) start sending out announce requests and discover each other, in unicast mode the client starts by sending out requests for unicast transmission. The client sends this request to every server address in its unicast master table. The server responds by accepting or denying the request.
Global Unicast Configuration
Unicast clients need a unicast master table for unicast negotiation; you must configure at least one unicast master table on the switch.
To configure unicast globally:
Set the unicast table ID, which is a unique ID that identifies the unicast master table.
Set the unicast master address. You can set more than one unicast master address, which can be an IPv4, IPv6, or MAC address.
Optional: Set the unicast master query interval, which is the mean interval between requests for Announce messages. Specify this value as a power of two in seconds. You can specify a value between -3 and 4. The default value is 0 (2 to the power of 0, or 1 second).
cumulus@switch:~$ nv set service ptp 1 unicast-master 1 address 10.10.10.1
cumulus@switch:~$ nv set service ptp 1 unicast-master 1 query-interval 4
cumulus@switch:~$ nv set interface swp1 ptp unicast-master-table-id 1
cumulus@switch:~$ nv config apply
Add the following lines at the end of the # Default interface options section of the /etc/ptp4l.conf file:
For interface unicast configuration, in addition to enabling PTP on an interface, you also need to configure the PTP interface to be either a unicast client or a unicast server.
When configuring multiple PTP interfaces on the switch to be unicast clients, you must configure a unicast table ID on every interface set as a unicast client. Each client must have a different table ID.
To configure a PTP interface to be the unicast client:
To show the unicast master table configuration on the switch, run the nv show service ptp <instance-id> unicast-master <table-id> command.
Optional Unicast Interface Configuration
You can set the unicast request duration for unicast clients, which is the service time in seconds requested by the unicast client during unicast negotiation. The default value is 300 seconds.
PTP profiles are a standardized set of configurations and rules intended to meet the requirements of a specific application. Profiles define required, allowed, and restricted PTP options, network restrictions, and performance requirements.
Cumulus Linux supports the following predefined profiles:
Profile | IEEE 1588 | ITU 8275-1 | ITU 8275-2
Application | Enterprise | Mobile Networks | Mobile Networks
Transport | Layer 2 and Layer 3 | Layer 2 | Layer 3
Encapsulation | 802.3, UDPv4, or UDPv6 | 802.3 | UDPv4 or UDPv6
Transmission | Unicast and Multicast | Multicast | Unicast
Supported Clock Types | Boundary Clock | Boundary Clock | Boundary Clock
You cannot modify the predefined profiles. If you want to set a parameter to a different value in a predefined profile, you need to create a custom profile. You can modify a custom profile within the range applicable to the profile type.
You cannot set the current profile to a profile not yet created.
You cannot set global PTP parameters in a profile currently in use.
PTP profiles do not support VLANs or bonds.
If you set a predefined or custom profile, do not change any global PTP settings, such as the DSCP or the clock domain.
For better performance in a high scale network with PTP on multiple interfaces, configure a higher system policer rate with the nv set system control-plane policer lldp-ptp burst <value> and nv set system control-plane policer lldp-ptp rate <value> commands. The switch uses the LLDP policer for PTP protocol packets. The default value for the LLDP policer is 2500. When you use the ITU 8275.1 profile with higher sync rates, use higher policer values.
Set a Predefined Profile
To set a predefined profile:
To set the ITU 8275.1 profile, run the nv set service ptp <instance-id> current-profile default-itu-8275-1 command.
To set the ITU 8275.2 profile, run the nv set service ptp <instance-id> current-profile default-itu-8275-2 command.
The following example sets the profile to ITU 8275.1:
cumulus@switch:~$ nv set service ptp 1 current-profile default-itu-8275-1
cumulus@switch:~$ nv config apply
To set the IEEE 1588 profile:
cumulus@switch:~$ nv set service ptp 1 current-profile default-1588
cumulus@switch:~$ nv config apply
To set the predefined ITU 8275.1 profile, edit the /etc/ptp4l.conf file and set the parameters shown below, then restart the ptp4l service:
Set the profile type on which to base the new profile (itu-g-8275-1, itu-g-8275-2, or ieee-1588).
Update any of the profile settings you want to change (announce-interval, delay-req-interval, priority1, sync-interval, announce-timeout, domain, priority2, transport, delay-mechanism, local-priority).
Set the custom profile to be the current profile.
The following example commands create a custom profile called CUSTOM1 based on the predefined profile ITU 8275-1. The commands set the domain to 28 and the announce-timeout to 3, then set CUSTOM1 to be the current profile:
cumulus@switch:~$ nv set service ptp 1 profile CUSTOM1
cumulus@switch:~$ nv set service ptp 1 profile CUSTOM1 profile-type itu-g-8275-1
cumulus@switch:~$ nv set service ptp 1 profile CUSTOM1 domain 28
cumulus@switch:~$ nv set service ptp 1 profile CUSTOM1 announce-timeout 3
cumulus@switch:~$ nv set service ptp 1 current-profile CUSTOM1
cumulus@switch:~$ nv config apply
The following example /etc/ptp4l.conf file creates a custom profile based on the predefined profile ITU 8275-1 and sets the domain to 28 and the announce-timeout to 3.
cumulus@switch:~$ sudo nano /etc/ptp4l.conf
[global]
#
# Default Data Set
#
slaveOnly 0
priority1 128
priority2 128
domainNumber 28
twoStepFlag 1
dscp_event 46
dscp_general 46
network_transport L2
dataset_comparison G.8275.x
G.8275.defaultDS.localPriority 128
ptp_dst_mac 01:80:C2:00:00:0E
#
# Port Data Set
#
logAnnounceInterval 5
logSyncInterval -4
logMinDelayReqInterval -4
announceReceiptTimeout 3
delay_mechanism E2E
offset_from_master_min_threshold -50
offset_from_master_max_threshold 50
mean_path_delay_threshold 200
tsmonitor_num_ts 100
tsmonitor_num_log_sets 3
tsmonitor_num_log_entries 4
tsmonitor_log_wait_seconds 1
#
# Run time options
#
logging_level 6
path_trace_enabled 0
use_syslog 1
verbose 0
summary_interval 0
#
# servo parameters
#
pi_proportional_const 0.000000
pi_integral_const 0.000000
pi_proportional_scale 0.700000
pi_proportional_exponent -0.300000
pi_proportional_norm_max 0.700000
pi_integral_scale 0.300000
pi_integral_exponent 0.400000
pi_integral_norm_max 0.300000
step_threshold 0.000002
first_step_threshold 0.000020
max_frequency 900000000
sanity_freq_limit 0
#
# Default interface options
#
time_stamping software
# Interfaces in which ptp should be enabled
# these interfaces should be routed ports
# if an interface does not have an ip address
# the ptp4l will not work as expected.
[swp1]
udp_ttl 1
masterOnly 0
delay_mechanism E2E
[swp2]
udp_ttl 1
masterOnly 0
delay_mechanism E2E
To show the current PTP profile setting, run the nv show service ptp <ptp-instance> command:
cumulus@switch:~$ nv show service ptp 1
operational applied description
--------------------------- ----------- ------------------ --------------------------------------------------------------------
enable on on Turn the feature 'on' or 'off'. The default is 'off'.
current-profile default-itu-8275-1 Current PTP profile index
domain 24 0 Domain number of the current syntonization
ip-dscp 46 46 Sets the Diffserv code point for all PTP packets originated locally.
priority1 128 128 Priority1 attribute of the local clock
priority2 128 128 Priority2 attribute of the local clock
...
To show the settings for a profile, run the nv show service ptp <instance> profile <profile-name> command:
The acceptable master table option is a security feature that prevents a rogue player from pretending to be the grandmaster clock to take over the PTP network. To use this feature, you configure the clock IDs of known grandmaster clocks in the acceptable master table and set the acceptable master table option on a PTP port. The BMC algorithm checks if the grandmaster clock received in the Announce message is in this table before proceeding with the master selection. Cumulus Linux disables this option by default on PTP ports.
The following example command adds the grandmaster clock ID 24:8a:07:ff:fe:f4:16:06 to the acceptable master table:
cumulus@switch:~$ nv set service ptp 1 acceptable-master 24:8a:07:ff:fe:f4:16:06
cumulus@switch:~$ nv config apply
You can also configure an alternate priority 1 value for the Grandmaster:
cumulus@switch:~$ nv set service ptp 1 acceptable-master 24:8a:07:ff:fe:f4:16:06 alt-priority 2
To enable the PTP acceptable master table option for swp1:
cumulus@switch:~$ nv set interface swp1 ptp acceptable-master on
cumulus@switch:~$ nv config apply
Edit the Default interface options section of the /etc/ptp4l.conf file to add acceptable_master_clockIdentity 248a07.fffe.f41606.
To enable the PTP acceptable master table option for swp1, add acceptable_master on under [swp1].
...
# Default interface options
#
time_stamping hardware
# Interfaces in which ptp should be enabled
# these interfaces should be routed ports
# if an interface does not have an ip address
# the ptp4l will not work as expected.
[swp1]
udp_ttl 20
masterOnly 1
delay_mechanism E2E
acceptable_master on
...
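The NVUE example in this section writes the clock ID as colon-separated octets (24:8a:07:ff:fe:f4:16:06), while the /etc/ptp4l.conf example writes the same ID in a dotted form (248a07.fffe.f41606). The following sketch converts between the two notations; the function name is an assumption, and the grouping is inferred from the example in this section.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def clock_id_to_ptp4l(clock_id: str) -> str:
    """Convert an 8-octet, colon-separated clock ID to xxxxxx.xxxx.xxxxxx form."""
    octets = clock_id.lower().split(":")
    if len(octets) != 8 or any(len(o) != 2 or not set(o) <= HEX_DIGITS for o in octets):
        raise ValueError(f"expected 8 colon-separated hex octets, got {clock_id!r}")
    # Group the octets as 3 + 2 + 3, matching the example in this section.
    return ".".join(("".join(octets[:3]), "".join(octets[3:5]), "".join(octets[5:])))


print(clock_id_to_ptp4l("24:8a:07:ff:fe:f4:16:06"))
```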
Cumulus Linux provides the following optional PTP monitoring configuration.
Configure Clock TimeStamp and Path Delay Thresholds
Cumulus Linux monitors clock timestamp and path delay against thresholds, and generates counters when PTP reaches the set thresholds. You can see the counters in the NVUE nv show command output and in log messages.
You can configure the following monitor settings:
Command | Description
nv set service ptp <instance> monitor min-offset-threshold | Sets the minimum difference allowed between the master and slave time. You can set a value between -1000000000 and 0 nanoseconds. The default value is -50 nanoseconds.
nv set service ptp <instance> monitor max-offset-threshold | Sets the maximum difference allowed between the master and slave time. You can set a value between 0 and 1000000000 nanoseconds. The default value is 50 nanoseconds.
nv set service ptp <instance> monitor path-delay-threshold | Sets the mean time that PTP packets take to travel between the master and slave. You can set a value between 0 and 1000000000 nanoseconds. The default value is 200 nanoseconds.
nv set service ptp <instance> monitor max-timestamp-entries | Sets the maximum number of timestamp entries allowed. Cumulus Linux updates the timestamps continuously. You can specify a value between 100 and 200. The default value is 100 entries.
The following example sets the minimum offset threshold to -1000, the maximum offset threshold to 1000, and the path delay threshold to 300:
cumulus@switch:~$ nv set service ptp 1 monitor min-offset-threshold -1000
cumulus@switch:~$ nv set service ptp 1 monitor max-offset-threshold 1000
cumulus@switch:~$ nv set service ptp 1 monitor path-delay-threshold 300
cumulus@switch:~$ nv config apply
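The following sketch models how offset and path delay samples compare against the monitor thresholds. It is an illustration only and is not how ptp4l implements its counters: the class name, counter names, and sample values are hypothetical, and it assumes that a violation is a value strictly outside the threshold.

```python
from dataclasses import dataclass


@dataclass
class MonitorThresholds:
    # Defaults are the documented default values, in nanoseconds.
    min_offset: int = -50
    max_offset: int = 50
    path_delay: int = 200


def count_violations(samples, thresholds):
    """Count threshold violations in (offset_ns, path_delay_ns) samples.

    Assumption: a value counts as a violation only when it is strictly
    outside the configured threshold.
    """
    counters = {"min_offset": 0, "max_offset": 0, "path_delay": 0}
    for offset_ns, delay_ns in samples:
        if offset_ns < thresholds.min_offset:
            counters["min_offset"] += 1
        if offset_ns > thresholds.max_offset:
            counters["max_offset"] += 1
        if delay_ns > thresholds.path_delay:
            counters["path_delay"] += 1
    return counters


# Hypothetical samples: (offset from master, mean path delay), in nanoseconds.
samples = [(-1200, 250), (30, 310), (1500, 120), (-40, 290)]
print(count_violations(samples, MonitorThresholds()))
print(count_violations(samples, MonitorThresholds(-1000, 1000, 300)))
```

With the wider thresholds from the example above (-1000, 1000, and 300), fewer of the same path delay samples count as violations than with the defaults.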
You can configure the following monitor settings manually in the /etc/ptp4l.conf file. Be sure to run the sudo systemctl restart ptp4l.service command to apply the settings.
Parameter | Description
offset_from_master_min_threshold | Sets the minimum difference allowed between the master and slave time. You can set a value between -1000000000 and 0 nanoseconds. The default value is -50 nanoseconds.
offset_from_master_max_threshold | Sets the maximum difference allowed between the master and slave time. You can set a value between 0 and 1000000000 nanoseconds. The default value is 50 nanoseconds.
mean_path_delay_threshold | Sets the mean time that PTP packets take to travel between the master and slave. You can set a value between 0 and 1000000000 nanoseconds. The default value is 200 nanoseconds.
The following example sets the minimum offset threshold to -1000, the maximum offset threshold to 1000, and the path delay threshold to 300:
A log set contains the log entries for clock timestamp and path delay violations at different times. You can set the number of entries to log and the interval between successive violation logs.
Command | Description
nv set service ptp 1 monitor max-violation-log-sets | Sets the maximum number of log sets allowed. You can specify a value between 2 and 4. The default value is 3.
nv set service ptp 1 monitor max-violation-log-entries | Sets the maximum number of log entries allowed in a log set. You can specify a value between 4 and 8. The default value is 4.
nv set service ptp 1 monitor violation-log-interval | Sets the number of seconds to wait before logging back-to-back violations. You can specify a value between 0 and 60. The default value is 1.
The following example sets the maximum number of log sets allowed to 4, the maximum number of log entries allowed to 6, and the violation log interval to 10:
cumulus@switch:~$ nv set service ptp 1 monitor max-violation-log-sets 4
cumulus@switch:~$ nv set service ptp 1 monitor max-violation-log-entries 6
cumulus@switch:~$ nv set service ptp 1 monitor violation-log-interval 10
cumulus@switch:~$ nv config apply
You can configure the following monitor settings manually in the /etc/ptp4l.conf file. Be sure to run the sudo systemctl restart ptp4l.service command to apply the settings.
Parameter | Description
tsmonitor_num_log_sets | Sets the maximum number of log sets allowed. You can specify a value between 2 and 4. The default value is 3.
tsmonitor_num_log_entries | Sets the maximum number of log entries allowed in a log set. You can specify a value between 4 and 8. The default value is 4.
tsmonitor_log_wait_seconds | Sets the number of seconds to wait before logging back-to-back violations. You can specify a value between 0 and 60. The default value is 1.
The following example sets the maximum number of log sets allowed to 4, the maximum number of log entries allowed to 6, and the violation log interval to 10:
To delete PTP configuration, delete the PTP master and slave interfaces. The following example commands delete the PTP interfaces swp1, swp2, and swp3.
Edit the /etc/ptp4l.conf file to remove the interfaces from the Default interface options section, then restart the ptp4l service.
cumulus@switch:~$ sudo nano /etc/ptp4l.conf
...
# Default interface options
#
time_stamping hardware
# Interfaces in which ptp should be enabled
# these interfaces should be routed ports
# if an interface does not have an ip address
# the ptp4l will not work as expected.
You can drill down with the following nv show service ptp <instance> commands:
nv show service ptp <instance> acceptable-master shows acceptable master configuration.
nv show service ptp <instance> clock-quality shows the clock quality status.
nv show service ptp <instance> current shows the local states learned during PTP message exchange.
nv show service ptp <instance> domain shows the domain configuration.
nv show service ptp <instance> ip-dscp shows PTP DSCP configuration.
nv show service ptp <instance> monitor shows PTP monitor configuration.
nv show service ptp <instance> profile shows PTP profile configuration.
nv show service ptp <instance> parent shows the local states learned during PTP message exchange.
nv show service ptp <instance> priority1 shows PTP priority1 configuration.
nv show service ptp <instance> priority2 shows PTP priority2 configuration.
nv show service ptp <instance> status shows the status of all PTP interfaces.
nv show service ptp <instance> time-properties shows the clock time attributes.
nv show service ptp <instance> unicast-master shows the unicast master configuration.
Show PTP Interface Configuration
To check configuration for a PTP interface, run the nv show interface <interface> ptp command.
cumulus@switch:~$ nv show interface swp1 ptp
operational applied description
------------------------- ----------- ---------- ----------------------------------------------------------------------
enable on Turn the feature 'on' or 'off'. The default is 'off'.
acceptable-master off Determines if acceptable master check is enabled for this interface.
delay-mechanism end-to-end end-to-end Mode in which PTP message is transmitted.
forced-master off off Configures PTP interfaces to forced master state.
instance 1 PTP instance number.
mixed-multicast-unicast off Enables Multicast for Announce, Sync and Followup and Unicast for D...
transport ipv4 ipv4 Transport method for the PTP messages.
ttl 1 1 Maximum number of hops the PTP messages can make before it gets dro...
unicast-request-duration 300 The service time in seconds to be requested during discovery.
timers
announce-interval 0 0 Mean time interval between successive Announce messages. It's spec...
announce-timeout 3 3 The number of announceIntervals that have to pass without receipt o...
delay-req-interval -3 -3 The minimum permitted mean time interval between successive Delay R...
sync-interval -3 -3 The mean SyncInterval for multicast messages. It's specified as a...
peer-mean-path-delay 0 An estimate of the current one-way propagation delay on the link wh...
port-state master State of the port
protocol-version 2 The PTP version in use on the port
Show PTP Counters
To show all PTP counters, run the nv show service ptp <instance> counters command:
cumulus@switch:~$ nv show service ptp 1 counters
Packet Type Received Transmitted
--------------------- ------------ ------------
Port swp4
Announce 0 10370
Sync 0 20731
Follow-up 0 20731
Delay Request 0 0
Delay Response 0 0
Peer Delay Request 0 0
Peer Delay Response 0 0
Management 0 0
Signaling 0 0
To show PTP counters for an interface, run the nv show interface <interface> counters ptp command.
To clear PTP counters for an interface, run the nv action clear interface <interface> counters ptp command:
To show the status of all PTP interfaces, run the nv show service ptp <instance> status command.
The command output shows the PTP enabled ports, the PTP port mode (unicast or multicast), the state of the port based on BMCA, the unicast state, and identifies the server address to which the client connects.
cumulus@switch:~$ nv show service ptp 1 status
Port Mode State Ustate Server
----- ----- ------- ------------------------------- -------
swp9 Ucast SLAVE Sync and Delay Granted (H_SYDY) 9.9.9.2
swp10 Ucast PASSIVE Initial State (WAIT)
swp11 Ucast PASSIVE Initial State (WAIT)
swp12 Ucast PASSIVE Initial State (WAIT)
Show the List of NVUE PTP Commands
To see a full list of NVUE show commands for PTP, run the nv list-commands service ptp command.
To show a full list of show commands for a PTP interface, run the nv list-commands | grep 'nv show interface <interface-id> ptp' command.
cumulus@switch:~$ nv list-commands service ptp
nv show service ptp
nv show service ptp <instance-id>
nv show service ptp <instance-id> status
nv show service ptp <instance-id> domain
nv show service ptp <instance-id> priority1
nv show service ptp <instance-id> priority2
nv show service ptp <instance-id> ip-dscp
nv show service ptp <instance-id> acceptable-master
...
cumulus@switch:~$ nv list-commands | grep 'nv show interface <interface-id> ptp'
...
nv show interface <interface-id> ptp
nv show interface <interface-id> ptp timers
nv show interface <interface-id> ptp shaper
...
Example Configuration
In the following example, the boundary clock on the switch receives time from Master 1 (the grandmaster) on PTP slave port swp1, sets its clock, and passes the time down through PTP master ports swp2, swp3, and swp4 to the hosts that receive the time.
The following example configuration assumes that you have already configured the layer 3 routed interfaces (swp1, swp2, swp3, and swp4) you want to use for PTP.
cumulus@switch:~$ nv set service ptp 1 enable on
cumulus@switch:~$ nv set service ptp 1 priority2 254
cumulus@switch:~$ nv set service ptp 1 priority1 254
cumulus@switch:~$ nv set service ptp 1 domain 3
cumulus@switch:~$ nv set interface swp1 ptp enable on
cumulus@switch:~$ nv set interface swp2 ptp enable on
cumulus@switch:~$ nv set interface swp3 ptp enable on
cumulus@switch:~$ nv set interface swp4 ptp enable on
cumulus@switch:~$ nv config apply
cumulus@switch:~$ sudo cat /etc/nvue.d/startup.yaml
- set:
interface:
lo:
ip:
address:
10.10.10.1/32: {}
type: loopback
swp1:
ptp:
enable: on
type: swp
swp2:
ptp:
enable: on
type: swp
swp3:
ptp:
enable: on
type: swp
swp4:
ptp:
enable: on
type: swp
service:
ptp:
'1':
domain: 3
enable: on
priority1: 254
priority2: 254
cumulus@switch:~$ sudo cat /etc/ptp4l.conf
...
[global]
#
# Default Data Set
#
slaveOnly 0
priority1 254
priority2 254
domainNumber 3
twoStepFlag 1
dscp_event 46
dscp_general 46
offset_from_master_min_threshold -50
offset_from_master_max_threshold 50
mean_path_delay_threshold 200
tsmonitor_num_ts 100
tsmonitor_num_log_sets 2
tsmonitor_num_log_entries 4
tsmonitor_log_wait_seconds 1
#
# Run time options
#
logging_level 6
path_trace_enabled 0
use_syslog 1
verbose 0
summary_interval 0
#
# servo parameters
#
pi_proportional_const 0.000000
pi_integral_const 0.000000
pi_proportional_scale 0.700000
pi_proportional_exponent -0.300000
pi_proportional_norm_max 0.700000
pi_integral_scale 0.300000
pi_integral_exponent 0.400000
pi_integral_norm_max 0.300000
step_threshold 0.000002
first_step_threshold 0.000020
max_frequency 900000000
sanity_freq_limit 0
#
# Default interface options
#
time_stamping software
# Interfaces in which ptp should be enabled
# these interfaces should be routed ports
# if an interface does not have an ip address
# the ptp4l will not work as expected.
[swp1]
udp_ttl 1
masterOnly 0
delay_mechanism E2E
network_transport UDPv4
[swp2]
udp_ttl 1
masterOnly 0
delay_mechanism E2E
network_transport UDPv4
[swp3]
udp_ttl 1
masterOnly 0
delay_mechanism E2E
network_transport UDPv4
[swp4]
udp_ttl 1
masterOnly 0
delay_mechanism E2E
network_transport UDPv4
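To compare a generated file against the values you set with NVUE, a small parser can help. The following is a minimal sketch, assuming the layout shown above (bracketed section headers, whitespace-separated key and value pairs, and # comments). The parse_ptp4l function and the SAMPLE text are hypothetical and are not part of Cumulus Linux.

```python
def parse_ptp4l(text):
    """Parse ptp4l.conf-style text into {section: {key: value}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and elision markers.
        if not line or line.startswith("#") or line == "...":
            continue
        # A bracketed line such as [global] or [swp1] starts a new section.
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = {}
            continue
        if current is None:
            continue  # ignore settings that appear before any section header
        parts = line.split(None, 1)
        sections[current][parts[0]] = parts[1] if len(parts) > 1 else ""
    return sections


# Shortened sample that mirrors the structure of the file shown above.
SAMPLE = """\
[global]
priority1 254
priority2 254
domainNumber 3
[swp1]
udp_ttl 1
delay_mechanism E2E
"""

conf = parse_ptp4l(SAMPLE)
print(conf["global"]["domainNumber"])
print(sorted(name for name in conf if name != "global"))
```

Every section other than global corresponds to one PTP-enabled interface, so the second print statement lists the ports on which PTP is active.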
Considerations
PTP Traffic Shaping
To improve performance on the NVIDIA Spectrum 1 switch for PTP-enabled ports with speeds lower than 100G, you can enable a pre-defined traffic shaping profile. For example, if you see that the PTP timing offset varies widely and does not stabilize, enable PTP shaping on all PTP-enabled ports to reduce the bandwidth on the ports slightly and improve timing stabilization.
Switches with Spectrum-2 and later do not support PTP shaping.
Bonds do not support PTP shaping.
You cannot configure QoS traffic shaping and PTP traffic shaping on the same ports.
You must configure a strict priority for PTP traffic; for example:
cumulus@switch:~$ nv set qos egress-scheduler default-global traffic-class 0-5,7 mode dwrr
cumulus@switch:~$ nv set qos egress-scheduler default-global traffic-class 0-5,7 bw-percent 12
cumulus@switch:~$ nv set qos egress-scheduler default-global traffic-class 6 mode strict
For each PTP-enabled port on which you want to set traffic shaping, run the nv set interface <interface> ptp shaper enable on command.
cumulus@switch:~$ nv set interface swp1 ptp shaper enable on
cumulus@switch:~$ nv set interface swp2 ptp shaper enable on
cumulus@switch:~$ nv config apply
To see the PTP shaping setting for an interface, run the nv show interface <interface> ptp shaper command:
cumulus@switch:~$ nv show interface swp1 ptp shaper
operational applied
------ ----------- -------
enable on
In the /etc/cumulus/switchd.d/ptp_shaper.conf file, set the following parameters for the interfaces to which you want to apply traffic shaping and enable the traffic shaper. You must reload switchd for the changes to take effect.
PTP frames are affected by STP filtering; events, such as an STP topology change (where ports temporarily go into the blocking state), can cause interruptions to PTP communications.
If you configure PTP on bridge ports, NVIDIA recommends that the bridge ports are spanning tree edge ports or in a bridge domain where spanning tree is disabled.
Authentication, Authorization and Accounting
This section describes how to set up user accounts and SSH for remote access, and configure LDAP authentication, TACACS+, and RADIUS AAA.
SSH for Remote Access
Cumulus Linux uses the OpenSSH package to provide access to the system using the Secure Shell (SSH) protocol.
Configure SSH
You can configure SSH to provide login access to the root user and to specific user accounts, limit SSH to listen on a specific VRF, and configure timeouts and session options.
Root User Settings
By default, the root account cannot use SSH to log in.
You can configure the root account to use SSH to log into the switch with:
A password
A public key or any allowed mechanism that is not a password and not keyboard interactive. This is the default setting.
A set of commands defined in the authorized_keys file.
To allow the root account to SSH into the switch with a password:
cumulus@switch:~$ nv set system ssh-server permit-root-login enabled
cumulus@switch:~$ nv config apply
Run the nv set system ssh-server permit-root-login disabled command to disable SSH login for the root account with a password.
To allow the root account to SSH into the switch and authenticate with a public key or any allowed mechanism that is not a password and not keyboard interactive:
cumulus@switch:~$ nv set system ssh-server permit-root-login prohibit-password
cumulus@switch:~$ nv config apply
To allow the root account to SSH into the switch and only run a set of commands defined in the authorized_keys file:
cumulus@switch:~$ nv set system ssh-server permit-root-login forced-commands-only
cumulus@switch:~$ nv config apply
To allow the root account to SSH into the switch using a password, edit the /etc/ssh/sshd_config file and set the PermitRootLogin option to yes:
Set the PermitRootLogin option to no to disable SSH login with a password.
To allow the root account to SSH into the switch and authenticate with a public key or any allowed mechanism that is not a password and not keyboard interactive:
As a privileged user (such as the cumulus user), either echo the public key contents and redirect the contents to the authorized key file or copy the public key file to the switch, then copy it to the root account (with privilege escalation).
To echo the public key contents and redirect the contents to the authorized key file:
cumulus@switch:~$ echo "<SSH public key contents>" | sudo tee -a /root/.ssh/authorized_keys
cumulus@switch:~$ sudo chmod 0644 /root/.ssh/authorized_keys
To copy the public key file to the switch, then copy it to the root account:
To allow certain users to establish an SSH session:
cumulus@switch:~$ nv set system ssh-server allow-users user1
cumulus@switch:~$ nv config apply
To prevent certain users from establishing an SSH session:
cumulus@switch:~$ nv set system ssh-server deny-users user4
cumulus@switch:~$ nv config apply
To allow certain users to establish an SSH session, edit the /etc/ssh/sshd_config file and add the AllowUsers parameter:
cumulus@switch:~$ sudo cat /etc/ssh/sshd_config
...
...
# Example of overriding settings on a per-user basis
#Match User anoncvs
# X11Forwarding no
# AllowTcpForwarding no
# PermitTTY no
# ForceCommand cvs server
AllowUsers = user1
To prevent certain users from establishing an SSH session, edit the /etc/ssh/sshd_config file and add the DenyUsers parameter:
cumulus@switch:~$ sudo cat /etc/ssh/sshd_config
...
# Example of overriding settings on a per-user basis
#Match User anoncvs
# X11Forwarding no
# AllowTcpForwarding no
# PermitTTY no
# ForceCommand cvs server
AllowUsers = user1
DenyUsers = user4
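As a rough illustration of how the two lists interact, the sketch below models the evaluation order that OpenSSH documents for these directives: sshd checks DenyUsers before AllowUsers, and a non-empty AllowUsers list admits only matching users. The ssh_user_permitted function is a hypothetical helper, not an OpenSSH or NVUE API. It is a simplification that matches user name patterns only and ignores the USER@HOST form and group directives.

```python
from fnmatch import fnmatchcase


def ssh_user_permitted(user, allow_users=(), deny_users=()):
    """Return True if `user` passes the DenyUsers and AllowUsers checks."""
    # DenyUsers is evaluated first: any match rejects the login.
    if any(fnmatchcase(user, pattern) for pattern in deny_users):
        return False
    # If AllowUsers is set, only users that match a pattern can log in.
    if allow_users:
        return any(fnmatchcase(user, pattern) for pattern in allow_users)
    # Neither list restricts this user.
    return True


# Same values as the sshd_config example above.
allow = ["user1"]
deny = ["user4"]
for name in ("user1", "user2", "user4"):
    print(name, ssh_user_permitted(name, allow, deny))
```

With both lists configured as in the example, user2 is also rejected even though it does not appear in DenyUsers, because AllowUsers acts as an allowlist.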
SSH and VRFs
The SSH service runs in the default VRF on the switch but listens on all interfaces in all VRFs. You can limit SSH to listen on specific VRFs.
You cannot run SSH in the default VRF and other VRFs at the same time.
The following example configures SSH to listen only on the management VRF:
cumulus@switch:~$ nv set system ssh-server vrf mgmt
cumulus@switch:~$ nv config apply
The following example configures SSH to listen on the management VRF and VRF RED:
cumulus@switch:~$ nv set system ssh-server vrf mgmt
cumulus@switch:~$ nv set system ssh-server vrf RED
cumulus@switch:~$ nv config apply
Bind the SSH service to the VRF. The following example configures SSH to listen only on the management VRF:
To configure SSH to listen to only one IP address or a subnet in a VRF, you need to bind the service to that VRF (as above), then set the ListenAddress parameter in the /etc/ssh/sshd_config file to the IP address or subnet in that VRF.
You can configure the following SSH timeout and session options:
The number of login attempts allowed before rejecting the SSH session. You can specify a value between 3 and 100. The default value is 3 login attempts.
The number of seconds allowed before login times out. You can specify a value between 1 and 600. The default value is 120 seconds.
The TCP port numbers that listen for incoming SSH sessions. You can specify a value between 1 and 65535.
The number of minutes a session can be inactive before the SSH server terminates the connection. The default value is 0 minutes.
The maximum number of SSH sessions allowed per TCP connection. You can specify a value between 1 and 100. The default value is 10.
Unauthenticated SSH sessions:
The maximum number of unauthenticated SSH sessions allowed. You can specify a value between 1 and 10000. The default value is 100.
The number of unauthenticated SSH sessions allowed before throttling starts. You can specify a value between 1 and 10000. The default value is 10.
The starting percentage of connections to reject above the throttle start count before reaching the session count limit. You can specify a value between 1 and 100. The default value is 30.
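If you generate these settings from automation, it can help to check values against the ranges listed above before you stage them. The following is a minimal sketch. The dictionary keys follow the NVUE option names used in the examples in this section, and the check_ssh_option function is a hypothetical helper, not part of NVUE.

```python
# Allowed ranges as listed above (inclusive lower and upper bounds).
SSH_OPTION_RANGES = {
    "authentication-retries": (3, 100),
    "login-timeout": (1, 600),
    "port": (1, 65535),
    "max-sessions-per-connection": (1, 100),
    "max-unauthenticated session-count": (1, 10000),
    "max-unauthenticated throttle-start": (1, 10000),
    "max-unauthenticated throttle-percent": (1, 100),
}


def check_ssh_option(name, value):
    """Return `value` if it is within the documented range, else raise ValueError."""
    low, high = SSH_OPTION_RANGES[name]
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    return value


print(check_ssh_option("login-timeout", 200))
try:
    check_ssh_option("authentication-retries", 2)
except ValueError as err:
    print(err)
```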
The following example configures the number of login attempts allowed before rejecting the SSH session to 10 and the number of seconds allowed before login times out to 200:
cumulus@switch:~$ nv set system ssh-server authentication-retries 10
cumulus@switch:~$ nv set system ssh-server login-timeout 200
cumulus@switch:~$ nv config apply
Edit the /etc/ssh/sshd_config file and change the MaxAuthTries parameter in the Authentication section to 10 and the LoginGraceTime parameter to 200:
The following example configures the TCP port that listens for incoming SSH sessions to 443:
cumulus@switch:~$ nv set system ssh-server port 443
cumulus@switch:~$ nv config apply
Edit the /etc/ssh/sshd_config file and add the Port parameter:
cumulus@switch:~$ sudo nano /etc/ssh/sshd_config
...
Port 443
#AddressFamily any
#ListenAddress 0.0.0.0
#ListenAddress ::
...
The following example configures the amount of time a session can be inactive before the SSH server terminates the connection to 5 minutes (300 seconds) and the maximum number of SSH sessions allowed per TCP connection to 5:
cumulus@switch:~$ nv set system ssh-server inactive-timeout 5
cumulus@switch:~$ nv set system ssh-server max-sessions-per-connection 5
cumulus@switch:~$ nv config apply
Edit the Authentication section of the /etc/ssh/sshd_config file.
To configure the amount of time (in seconds) a session can be inactive before the SSH server terminates the connection, change the ClientAliveInterval parameter.
To configure the maximum number of SSH sessions allowed per TCP connection, change the MaxSessions parameter.
The following example configures:
The number of unauthenticated SSH sessions allowed before throttling starts to 5.
The starting percentage of connections to reject above the throttle start count before reaching the session count limit to 22.
The maximum number of unauthenticated SSH sessions allowed to 20.
cumulus@switch:~$ nv set system ssh-server max-unauthenticated throttle-start 5
cumulus@switch:~$ nv set system ssh-server max-unauthenticated throttle-percent 22
cumulus@switch:~$ nv set system ssh-server max-unauthenticated session-count 20
cumulus@switch:~$ nv config apply
Edit the /etc/ssh/sshd_config file and change the MaxStartups parameter.
The following example configures:
The number of unauthenticated SSH sessions allowed before throttling starts to 5.
The starting percentage of connections to reject above the throttle start count before reaching the session count limit to 22.
The maximum number of unauthenticated SSH sessions allowed to 20.
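OpenSSH documents the MaxStartups value as three colon-separated fields, start:rate:full. The sketch below assumes that the three settings above map onto those fields in that order (throttle start, throttle percent, session count); this guide does not state the mapping, so treat it as an assumption. The refusal_percent function follows the OpenSSH description of a linear ramp from rate percent at start connections up to 100 percent at full connections. Both function names are hypothetical.

```python
def max_startups_value(throttle_start, throttle_percent, session_count):
    """Build a MaxStartups argument in start:rate:full form (assumed mapping)."""
    return f"{throttle_start}:{throttle_percent}:{session_count}"


def refusal_percent(unauthenticated, start, rate, full):
    """Approximate chance (0 to 100) that sshd refuses a new unauthenticated connection."""
    if unauthenticated < start:
        return 0
    if unauthenticated >= full:
        return 100
    # Linear ramp from `rate` at `start` up to 100 at `full`.
    return rate + (100 - rate) * (unauthenticated - start) / (full - start)


print(max_startups_value(5, 22, 20))
for count in (4, 5, 10, 20):
    print(count, refusal_percent(count, start=5, rate=22, full=20))
```

The ramp explains why the throttle percentage is only a starting point: the closer the number of unauthenticated sessions gets to the session count limit, the more new connections the server rejects.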
This section describes how to generate an SSH key pair on one system and install the key as an authorized key on another system.
Generate an SSH Key Pair
To generate an SSH key pair, run the ssh-keygen command and follow the prompts.
To configure the system without a password, do not enter a passphrase when prompted in the following step.
cumulus@host01:~$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/cumulus/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/cumulus/.ssh/id_rsa.
Your public key has been saved in /home/cumulus/.ssh/id_rsa.pub.
The key fingerprint is:
5a:b4:16:a0:f9:14:6b:51:f6:f6:c0:76:1a:35:2b:bb cumulus@leaf04
The key's randomart image is:
+---[RSA 2048]----+
| +.o o |
| o * o . o |
| o + o O o |
| + . = O |
| . S o . |
| + . |
| . E |
| |
| |
+-----------------+
Install an Authorized SSH Key
To install an authorized SSH key, you take the contents of an SSH public key and add it to the SSH authorized key file (~/.ssh/authorized_keys) of the user.
A public key is a text file with three space separated fields:
<type> <key string> <comment>
| Field | Description |
| ----- | ----------- |
| <type> | The algorithm you want to use to hash the key. The algorithm can be ecdsa-sha2-nistp256, ecdsa-sha2-nistp384, ecdsa-sha2-nistp521, ssh-dss, ssh-ed25519, or ssh-rsa (the default value). |
| <key string> | A base64 format string for the key. |
| <comment> | A single word string. By default, this is the name of the system that generated the key. NVUE uses the <comment> field as the key name. |
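To make the three-field layout concrete, the following minimal sketch splits a public key line into its parts. The parse_public_key function is a hypothetical helper, and the key string in the sample is a placeholder, not a real key.

```python
def parse_public_key(line):
    """Split '<type> <key string> <comment>' into a dictionary."""
    fields = line.strip().split(None, 2)
    if len(fields) != 3:
        raise ValueError("expected '<type> <key string> <comment>'")
    key_type, key_string, comment = fields
    return {"type": key_type, "key": key_string, "comment": comment}


# Placeholder public key line; a real key string is much longer.
parsed = parse_public_key("ssh-ed25519 AAAAexampleonly host01")
print(parsed["type"])
print(parsed["comment"])
```

Because NVUE uses the comment field as the key name, the comment value is the name under which the key appears in the NVUE configuration.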
The procedure to install an authorized SSH key is different based on whether the user is an NVUE managed user or a non-NVUE managed user.
The following example adds an authorized key named prod_key to the user admin2. The content of the public key file is ssh-rsa 1234 prod_key.
cumulus@leaf01:~$ nv set system aaa user admin2 ssh authorized-key prod_key key XABDB3NzaC1yc2EAAAADAQABAAABgQCvjs/RFPhxLQMkckONg+1RE1PTIO2JQhzFN9TRg7ox7o0tfZ+IzSB99lr2dmmVe8FRWgxVjc...
cumulus@leaf01:~$ nv set system aaa user admin2 ssh authorized-key prod_key type ssh-rsa
cumulus@leaf01:~$ nv config apply
The following example adds an authorized key file from the account cumulus on a host to the cumulus account on the switch:
To copy a previously generated public key to the desired location, run the ssh-copy-id command and follow the prompts:
cumulus@host01:~$ ssh-copy-id -i /home/cumulus/.ssh/id_rsa.pub cumulus@leaf02
The authenticity of host 'leaf02 (192.168.0.11)' can't be established.
ECDSA key fingerprint is b1:ce:b7:6a:20:f4:06:3a:09:3c:d9:42:de:99:66:6e.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
cumulus@leaf01's password:
Number of key(s) added: 1
The ssh-copy-id command does not work if the username on the remote switch is different from the username on the local switch. To work around this issue, use the scp command instead:
cumulus@host01:~$ scp .ssh/id_rsa.pub cumulus@leaf02:.ssh/authorized_keys
Enter passphrase for key '/home/cumulus/.ssh/id_rsa':
id_rsa.pub
Connect to the remote switch to confirm that the authentication keys are in place:
cumulus@leaf01:~$ ssh cumulus@leaf02
Welcome to Cumulus VX (TM)
Cumulus VX (TM) is a community supported virtual appliance designed for
experiencing, testing and prototyping the latest technology.
For any questions or technical support, visit our community site at:
http://community.cumulusnetworks.com
The registered trademark Linux (R) is used pursuant to a sublicense from LMI,
the exclusive licensee of Linus Torvalds, owner of the mark on a world-wide basis.
Last login: Thu Sep 29 16:56:54 2016
Troubleshooting
To show all the current SSH server configuration settings, run the NVUE nv show system ssh-server command:
cumulus@switch:~$ nv show system ssh-server
applied
--------------------------- -----------------
authentication-retries 6
inactive-timeout 0
login-timeout 120
max-sessions-per-connection 10
permit-root-login prohibit-password
state enabled
max-unauthenticated
session-count 100
throttle-percent 30
throttle-start 10
To show the current number of active SSH sessions, run the NVUE nv show system ssh-server active-sessions command or the Linux w command:
cumulus@switch:~$ nv show system ssh-server active-sessions
Peer Address:Port Local Address:Port State
------------------- ---------------------- -----
192.168.200.1:46528 192.168.200.11%mgmt:22 ESTAB
cumulus@switch:~$ w
11:10:46 up 19:19, 4 users, load average: 0.08, 0.05, 0.05
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
cumulus ttyS0 - Wed15 19:19m 0.03s 0.02s -bash
cumulus pts/0 192.168.200.1 07:27 3:43m 0.03s 0.03s -bash
cumulus pts/1 192.168.200.1 10:01 1:09m 0.02s 0.02s -bash
cumulus pts/2 192.168.200.1 11:10 1.00s 0.03s 0.00s w
To show which users can establish an SSH session, run the nv show system ssh-server allow-users command. To show which users cannot establish an SSH session, run the nv show system ssh-server deny-users command. You can also show information for a specific user with the nv show system ssh-server allow-users <user> command and the nv show system ssh-server deny-users <user> command.
To show the TCP port numbers that listen for incoming SSH sessions, run the nv show system ssh-server port command. You can also show information for a specific port with the nv show system ssh-server port <port> command.
To show the SSH timer and session information, run the nv show system ssh-server max-unauthenticated command:
cumulus@switch:~$ nv show system ssh-server max-unauthenticated
applied
---------------- -------
session-count 20
throttle-percent 22
throttle-start 5
User Accounts
By default, Cumulus Linux has two user accounts: cumulus and root.
The cumulus account:
Uses the default password cumulus. You must change the default password when you log into Cumulus Linux for the first time.
Is a user account in the sudo group with sudo privileges.
Can log in to the system through all the usual channels, such as console and SSH.
Includes permissions to run NVUE nv show, nv set, nv unset, and nv apply commands.
The root account:
Has its password disabled by default, which prevents you from using SSH, telnet, FTP, and so on, to log in to the switch.
Has the standard Linux root user access to everything on the switch.
Add a New User Account
You can add additional user accounts as needed.
You control local user account access to NVUE commands by changing the group membership (role) for a user. Like the cumulus account, these accounts must be in the sudo group or include the NVUE system-admin role to execute privileged commands.
You can set a plain text password or a hashed password for the local user account. To access the switch without a password, you need to boot into single user mode.
You can provide a full name for the local user account (optional).
Use the following roles to set the permissions for local user accounts.
| Role | Permissions |
| ---- | ----------- |
| system-admin | Allows the user to use sudo to run commands as the privileged user, run nv show commands, run nv set and nv unset commands to stage configuration changes, and run nv apply commands to apply configuration changes. |
| nvue-admin | Allows the user to run nv show commands, run nv set and nv unset commands to stage configuration changes, and run nv apply commands to apply configuration changes. |
| nvue-monitor | Allows the user to run nv show commands only. |
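The role table can also be read as a lookup from role to permitted actions. The sketch below restates it as data; the action labels (sudo, show, set, unset, apply) and the role_allows function are illustrative names chosen for this example, not NVUE identifiers.

```python
# Permissions per role, restated from the table above.
ROLE_PERMISSIONS = {
    "system-admin": {"sudo", "show", "set", "unset", "apply"},
    "nvue-admin": {"show", "set", "unset", "apply"},
    "nvue-monitor": {"show"},
}


def role_allows(role, action):
    """Return True if the given role includes the given action."""
    return action in ROLE_PERMISSIONS[role]


print(role_allows("nvue-admin", "apply"))
print(role_allows("nvue-admin", "sudo"))
print(role_allows("nvue-monitor", "set"))
```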
The following example:
Creates a new user account called admin2 and sets the role to system-admin (permissions for sudo, nv show, nv set and nv unset, and nv apply).
Sets a plain text password. NVUE hashes the plain text password and stores the value as a hashed password. To set a hashed password, see Hashed Passwords, below.
Adds the full name FIRST LAST. If the full name includes more than one name, either separate the names with a hyphen (FIRST-LAST) or enclose the full name in quotes ("FIRST LAST").
cumulus@switch:~$ nv set system aaa user admin2 role system-admin
cumulus@switch:~$ nv set system aaa user admin2 password
Enter new password:
Confirm password:
cumulus@switch:~$ nv set system aaa user admin2 full-name "FIRST LAST"
cumulus@switch:~$ nv config apply
You can also run the nv set system aaa user <user> password <plain-text-password> command to specify the plain text password inline. This command bypasses the Enter new password and Confirm password prompts but displays the plain text password as you type it.
If you are an NVUE-managed user, you can update your own password with the Linux passwd command.
Use the following groups to set permissions for local user accounts. To add users to these groups, use the useradd(8) or usermod(8) commands:
| Group | Permissions |
| ----- | ----------- |
| sudo | Allows the user to use sudo to run commands as the privileged user. |
| nvshow | Allows the user to run nv show commands only. |
| nvset | Allows the user to run nv show commands, and run nv set and nv unset commands to stage configuration changes. |
| nvapply | Allows the user to run nv show commands, run nv set and nv unset commands to stage configuration changes, and run nv apply commands to apply configuration changes. |
The following example:
Creates a new user account called admin2, creates a home directory for the user, and adds the full name First Last.
Securely sets the password for the user with passwd.
Sets the group membership (role) to sudo and nvapply (permissions to use sudo, nv show, nv set, and nv apply).
When you use Linux commands to add a new user, you must create a home directory for the user with the -m option. NVUE commands create a home directory automatically.
Only the following user accounts can create, modify, and delete other system-admin accounts:
NVUE-managed users with the system-admin role.
The root user.
Non NVUE-managed users that are in the sudo group.
Hashed Passwords
Instead of a plain text password, you can provide a hashed password for a local user.
You must specify the hashed password in Linux crypt format; the password must be a minimum of 15 to 20 characters long and must include special characters, digits, lower case alphabetic letters, and more. Typically, the password format is set to $id$salt$hashed, where $id is the hashing algorithm. In GNU or Linux:
$1$ is MD5
$2a$ is Blowfish
$2y$ is Blowfish
$5$ is SHA-256
$6$ is SHA-512
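To check which algorithm a hashed password uses before you configure it, you can inspect the identifier between the first two $ characters. The following is a minimal sketch based on the list above. The crypt_algorithm function is a hypothetical helper, and the sample string is a placeholder, not a real hash.

```python
# Identifier to algorithm mapping, as listed above.
CRYPT_IDS = {
    "1": "MD5",
    "2a": "Blowfish",
    "2y": "Blowfish",
    "5": "SHA-256",
    "6": "SHA-512",
}


def crypt_algorithm(hashed):
    """Return the algorithm name for a string in $id$salt$hashed format."""
    parts = hashed.split("$")
    # A well-formed string starts with "$", so the first element is empty.
    if len(parts) < 4 or parts[0] != "":
        raise ValueError("not in $id$salt$hashed format")
    algorithm = CRYPT_IDS.get(parts[1])
    if algorithm is None:
        raise ValueError(f"unknown hash identifier: {parts[1]!r}")
    return algorithm


print(crypt_algorithm("$6$examplesalt$examplehash"))
```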
To generate a hashed password on the switch, you can either run a python3 command or install and use the mkpasswd utility:
Run the following command on the switch or Linux host. When prompted, enter the plain text password you want to hash:
To generate a SHA-512, SHA-256, or MD5 hashed password, run the following command. When prompted, enter the plain text password you want to hash:
Hashed password strings contain characters, such as $, that have a special meaning in the Linux shell; you must enclose the hashed password in single quotes (').
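If a script builds the command line for you, the Python standard library can apply the single quotes. The following minimal sketch uses shlex.quote on a placeholder hash (not a real password hash) to show the quoting that the shell needs.

```python
import shlex

# Placeholder value in $id$salt$hashed format.
hashed = "$6$examplesalt$examplehash"

# shlex.quote wraps the string in single quotes because it contains
# characters, such as $, that the shell would otherwise expand.
quoted = shlex.quote(hashed)
print(quoted)
```

Without the quotes, the shell treats each $ as the start of a variable reference and passes a truncated string to the command.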
Delete a User Account
To delete a user account:
Run the nv unset system aaa user <user> command. The following example deletes the user account called admin2.
cumulus@switch:~$ nv unset system aaa user admin2
cumulus@switch:~$ nv config apply
Run the sudo userdel <user> command. The following example deletes the user account called admin2.
cumulus@switch:~$ sudo userdel admin2
Show User Accounts
To show the user accounts configured on the system, run the NVUE nv show system aaa command or the Linux sudo cat /etc/passwd command.
cumulus@switch:~$ nv show system aaa
Username Full-name Role enable
---------------- ---------------------------------- ------------ ------
Debian-snmp Unknown system
_apt Unknown system
_lldpd Unknown system
admin2 FIRST LAST system-admin on
...
To show information about a specific user account, run the NVUE nv show system aaa user <user> command:
cumulus@switch:~$ nv show system aaa user admin2
operational applied
--------------- ------------ ------------
full-name FIRST LAST FIRST LAST
hashed-password * *
role system-admin system-admin
enable on on
Enable the root User
The root user does not have a password and cannot log into a switch using SSH. This default account behavior is consistent with Debian.
Enable Console Access
To log into the switch using root from the console, you must set the password for the root account:
cumulus@switch:~$ sudo passwd root
Enter new password:
...
Enable SSH Access
To log into the switch using root with SSH, either:
By default, Cumulus Linux has two user accounts: root and cumulus. The cumulus account is a normal user and is in the group sudo.
You can add more user accounts as needed. Like the cumulus account, these accounts must use sudo to execute privileged commands.
sudo Basics
sudo allows you to execute a command as superuser or another user as specified by the security policy.
The default security policy is sudoers, which you configure in the /etc/sudoers file. Use /etc/sudoers.d/ to add to the default sudoers policy.
Use visudo only to edit the sudoers file; do not use another editor like vi or emacs.
When creating a new file in /etc/sudoers.d, use visudo -f. This option performs sanity checks before writing the file to avoid errors that prevent sudo from working.
Errors in the sudoers file can result in losing the ability to elevate privileges to root. You can fix this issue only by power cycling the switch and booting into single user mode. Before modifying sudoers, enable the root user by setting a password for the root user.
By default, users in the sudo group can use sudo to execute privileged commands. To add users to the sudo group, use the useradd(8) or usermod(8) command. To see which users belong to the sudo group, see /etc/group (man group(5)).
You can run any command as sudo, including su. You must enter a password.
The example below shows how to use sudo as a non-privileged user cumulus to bring up an interface:
cumulus@switch:~$ ip link show dev swp1
3: swp1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master br0 state DOWN mode DEFAULT qlen 500
link/ether 44:38:39:00:27:9f brd ff:ff:ff:ff:ff:ff
cumulus@switch:~$ ip link set dev swp1 up
RTNETLINK answers: Operation not permitted
cumulus@switch:~$ sudo ip link set dev swp1 up
Password:
cumulus@switch:~$ ip link show dev swp1
3: swp1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br0 state UP mode DEFAULT qlen 500
link/ether 44:38:39:00:27:9f brd ff:ff:ff:ff:ff:ff
sudoers Examples
The following examples show how you grant as few privileges as necessary to a user or group of users to allow them to perform the required task. Each example uses the system group noc; groups include the prefix %.
When an unprivileged user runs a command, the command must include the sudo prefix.
Cumulus Linux uses Pluggable Authentication Modules (PAM) and Name Service Switch (NSS) for user authentication. NSS enables PAM to use LDAP to provide user authentication, group mapping, and information for other services on the system.
NSS specifies the order of the information sources that resolve names for each service. Using NSS with authentication and authorization provides the order and location for user lookup and group mapping on the system.
PAM handles the interaction between the user and the system, providing login handling, session setup, authentication of users, and authorization of user actions.
To configure LDAP authentication on Linux, you can use libnss-ldap, libnss-ldapd, or libnss-sss. This chapter describes libnss-ldapd only. From internal testing, this library worked best with Cumulus Linux and is the easiest to configure, automate, and troubleshoot.
Install libnss-ldapd
The libldap-2.4-2 and libldap-common LDAP packages are already installed on the Cumulus Linux image; however, you need to install these additional packages to use LDAP authentication:
libnss-ldapd
libpam-ldapd
ldap-utils
To install the additional packages, run the following command:
You can also install these packages even if the switch does not connect to the internet, as they are in the cumulus-local-apt-archive repository that is embedded in the Cumulus Linux image.
Follow the interactive prompts to specify the LDAP URI, search base distinguished name (DN), and services that must have LDAP lookups enabled. You need to select at least the passwd, group, and shadow services (press space to select a service). When done, select OK. This creates a basic LDAP configuration using anonymous bind and initiates user search under the base DN specified.
After the dialog closes, the install process prints information similar to the following:
/etc/nsswitch.conf: enable LDAP lookups for group
/etc/nsswitch.conf: enable LDAP lookups for passwd
/etc/nsswitch.conf: enable LDAP lookups for shadow
After the installation is complete, the LDAP name service daemon (nslcd) runs. This service handles all the LDAP protocol interactions and caches information that returns from the LDAP server. The installation appends ldap to the passwd, group, and shadow entries in the /etc/nsswitch.conf file, which makes LDAP the secondary information source for those services. Lookups reference the local files (/etc/passwd, /etc/group and /etc/shadow) first, as specified by the compat source.
Keep compat as the first source in NSS for passwd, group, and shadow. This prevents you from getting locked out of the system.
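One way to guard against a lockout is to verify the source order after you edit the file. The following is a minimal sketch that checks whether compat is the first source for the three services. The compat_is_first function and both sample texts are hypothetical; on a switch you would read the text from /etc/nsswitch.conf instead.

```python
def compat_is_first(nsswitch_text, databases=("passwd", "group", "shadow")):
    """Return True if compat is the first source for every listed database."""
    first_source = {}
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        database, sources = line.split(":", 1)
        tokens = sources.split()
        if tokens:
            first_source[database.strip()] = tokens[0]
    return all(first_source.get(db) == "compat" for db in databases)


GOOD = """\
passwd:         compat ldap
group:          compat ldap
shadow:         compat ldap
"""

# Here LDAP comes first for passwd, which risks a lockout if the server is unreachable.
BAD = GOOD.replace("passwd:         compat ldap", "passwd:         ldap compat")

print(compat_is_first(GOOD))
print(compat_is_first(BAD))
```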
Entering incorrect information during the installation process produces configuration errors. You can correct the information after installation by editing certain configuration files.
Edit the /etc/nslcd.conf file to update the LDAP URI and search base DN (see Update the nslcd.conf File, below).
Edit the /etc/nsswitch.conf file to update the service selections.
After editing the files, restart the NVUE and nginx-authenticator services with the sudo systemctl restart nvued.service command and the sudo systemctl restart nginx-authenticator.service command.
Alternative Installation Method Using debconf-utils
Instead of running the installer and following the interactive prompts, as described above, you can pre-seed the installer parameters using debconf-utils.
Run apt-get install debconf-utils and create the pre-seeded parameters using debconf-set-selections. Provide the appropriate answers.
Run debconf-show <pkg> to check the settings. Here is an example of how to pre-seed answers to the installer questions using debconf-set-selections:
root# debconf-set-selections <<'zzzEndOfFilezzz'
# LDAP database user. Leave blank; it will be populated later!
nslcd nslcd/ldap-binddn string
# LDAP user password. Leave blank!
nslcd nslcd/ldap-bindpw password
# LDAP server search base:
nslcd nslcd/ldap-base string ou=support,dc=rtp,dc=example,dc=test
# LDAP server URI. Using ldap over ssl.
nslcd nslcd/ldap-uris string ldaps://myadserver.rtp.example.test
# New to 0.9. restart cron, exim and others libraries without asking
nslcd libraries/restart-without-asking: boolean true
# LDAP authentication to use:
# Choices: none, simple, SASL
# Using simple because it's easy to configure. Security comes by using LDAP over SSL
# keep /etc/nslcd.conf 'rw' to root for basic security of bindDN password
nslcd nslcd/ldap-auth-type select simple
# Don't set starttls to true
nslcd nslcd/ldap-starttls boolean false
# Check server's SSL certificate:
# Choices: never, allow, try, demand
nslcd nslcd/ldap-reqcert select never
# Choices: Ccreds credential caching - password saving, Unix authentication, LDAP Authentication , Create home directory on first time login, Ccreds credential caching - password checking
# This is where "mkhomedir" pam config is activated that allows automatic creation of home directory
libpam-runtime libpam-runtime/profiles multiselect ccreds-save, unix, ldap, mkhomedir , ccreds-check
# for internal use; can be preseeded
man-db man-db/auto-update boolean true
# Name services to configure:
# Choices: aliases, ethers, group, hosts, netgroup, networks, passwd, protocols, rpc, services, shadow
libnss-ldapd libnss-ldapd/nsswitch multiselect group, passwd, shadow
libnss-ldapd libnss-ldapd/clean_nsswitch boolean false
## define platform specific libnss-ldapd debconf questions/answers.
## For demo used amd64.
libnss-ldapd:amd64 libnss-ldapd/nsswitch multiselect group, passwd, shadow
libnss-ldapd:amd64 libnss-ldapd/clean_nsswitch boolean false
# libnss-ldapd:powerpc libnss-ldapd/nsswitch multiselect group, passwd, shadow
# libnss-ldapd:powerpc libnss-ldapd/clean_nsswitch boolean false
zzzEndOfFilezzz
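Each pre-seed line above follows the debconf-set-selections layout of owner, question name, question type, and value. The following Python sketch shows one way to sanity-check a pre-seed file before you feed it to debconf-set-selections. The parse_selections helper and the SAMPLE text are illustrative assumptions, not part of Cumulus Linux or debconf.

```python
from typing import NamedTuple


class Selection(NamedTuple):
    owner: str      # package that owns the question, for example nslcd
    question: str   # question name, for example nslcd/ldap-base
    qtype: str      # question type: string, password, boolean, select, multiselect
    value: str      # answer; empty when the line leaves it blank


def parse_selections(text: str) -> list[Selection]:
    """Parse pre-seed lines of the form '<owner> <question> <type> <value>'."""
    selections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 3)
        if len(parts) < 3:
            raise ValueError(f"malformed selection: {raw!r}")
        value = parts[3] if len(parts) == 4 else ""
        selections.append(Selection(parts[0], parts[1], parts[2], value))
    return selections


# Hypothetical excerpt based on the example above.
SAMPLE = """\
# LDAP server search base:
nslcd nslcd/ldap-base string ou=support,dc=rtp,dc=example,dc=test
nslcd nslcd/ldap-binddn string
libnss-ldapd libnss-ldapd/nsswitch multiselect group, passwd, shadow
"""

for selection in parse_selections(SAMPLE):
    print(selection)
```

The value field keeps everything after the question type, so multiselect answers that contain commas and spaces stay intact.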
Update the nslcd.conf File
After installation, update the main configuration file (/etc/nslcd.conf) to accommodate the expected LDAP server settings.
This section documents some of the more important options that relate to security and queries. For details on all the available configuration options, read the nslcd.conf man page.
After you edit the /etc/nslcd.conf file or enable LDAP in the /etc/nsswitch.conf file, you must restart the NVUE and nginx-authenticator services with the sudo systemctl restart nvued.service and sudo systemctl restart nginx-authenticator.service commands. You must also restart both services if you disable LDAP.
Connection
The LDAP client starts a session by connecting to the LDAP server on TCP and UDP port 389, or on port 636 for LDAPS. Depending on the server configuration, the connection proceeds without authentication (anonymous bind); otherwise, the client must provide a bind user and password. The URI and the bind credentials are the variables that define the connection to the LDAP server.
The URI is mandatory and specifies the LDAP server location using the FQDN or IP address. The URI also designates whether to use ldap:// for clear text transport, or ldaps:// for SSL/TLS encrypted transport. You can also specify an alternate port in the URI. In production environments, use the LDAPS protocol so that all communications are secure.
After the connection to the server is complete, the BIND operation authenticates the session. The BIND credentials are optional; if you do not specify the credentials, the switch assumes an anonymous bind. Configure authenticated (Simple) BIND by specifying the user (binddn) and password (bindpw) in the configuration. Another option is to use SASL (Simple Authentication and Security Layer) BIND, which provides authentication services using other mechanisms, like Kerberos. Contact your LDAP server administrator for this information as it depends on the configuration of the LDAP server and the credentials for the client device.
# The location at which the LDAP server(s) should be reachable.
uri ldaps://ldap.example.com
# The DN to bind with for normal lookups.
binddn cn=CLswitch,ou=infra,dc=example,dc=com
bindpw CuMuLuS
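To check a uri value before you put it in /etc/nslcd.conf, you can split it into scheme, host, and port. The following Python sketch assumes the default ports named above (389 for ldap://, 636 for ldaps://); the describe_uri helper is hypothetical and does not contact the server.

```python
from urllib.parse import urlsplit

# Default ports from this section: 389 for LDAP, 636 for LDAPS.
DEFAULT_PORTS = {"ldap": 389, "ldaps": 636}


def describe_uri(uri: str) -> dict:
    """Return the host, effective port, and transport of an LDAP URI."""
    parts = urlsplit(uri)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported scheme in {uri!r}")
    if not parts.hostname:
        raise ValueError(f"missing host in {uri!r}")
    return {
        "host": parts.hostname,
        # An explicit port in the URI overrides the scheme default.
        "port": parts.port or DEFAULT_PORTS[parts.scheme],
        "ldaps": parts.scheme == "ldaps",
    }


print(describe_uri("ldaps://ldap.example.com"))
print(describe_uri("ldap://192.0.2.10:1389"))
```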
Search Function
When an LDAP client requests information about a resource, it must connect and bind to the server. Then, it performs one or more resource queries depending on the lookup. All search queries to the LDAP server use the configured search base, filter, and the desired entry (uid=myuser). If the LDAP directory is large, this search takes a long time. Define a more specific search base for the common maps (passwd and group).
# The search base that will be used for all queries.
base dc=example,dc=com
# Mapped search bases to speed up common queries.
base passwd ou=people,dc=example,dc=com
base group ou=groups,dc=example,dc=com
Search Filters
To limit the search scope when authenticating users, use search filters to specify criteria when searching for objects within the directory. The default filters applied are:
filter passwd (objectClass=posixAccount)
filter group (objectClass=posixGroup)
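The relationship between the global search base, the per-map search bases, and the filters can be sketched in Python as follows. This is an illustration of the lookup order described above, not the nslcd parser: the search_settings and base_for helpers are hypothetical, and the sketch assumes DNs that contain no spaces.

```python
def search_settings(conf_text: str) -> dict:
    """Collect 'base' and 'filter' options from nslcd.conf-style text."""
    global_base = None
    bases, filters = {}, {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if fields[0] == "base" and len(fields) == 2:
            global_base = fields[1]            # base <DN>
        elif fields[0] == "base" and len(fields) == 3:
            bases[fields[1]] = fields[2]       # base <map> <DN>
        elif fields[0] == "filter" and len(fields) >= 3:
            filters[fields[1]] = line.split(None, 2)[2]  # filter <map> <expr>
    return {"global_base": global_base, "bases": bases, "filters": filters}


def base_for(settings: dict, map_name: str):
    """A map-specific base wins; otherwise fall back to the global base."""
    return settings["bases"].get(map_name, settings["global_base"])


CONF = """\
base dc=example,dc=com
base passwd ou=people,dc=example,dc=com
base group ou=groups,dc=example,dc=com
filter passwd (objectClass=posixAccount)
filter group (objectClass=posixGroup)
"""

settings = search_settings(CONF)
print(base_for(settings, "passwd"), settings["filters"]["passwd"])
print(base_for(settings, "shadow"))
```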
Attribute Mapping
The map configuration allows you to override the attributes pushed from LDAP. To override an attribute for a given map, specify the attribute name and the new value. For example, to ensure that the shell is bash and the home directory is /home/cumulus:
map passwd homeDirectory "/home/cumulus"
map passwd loginShell "/bin/bash"
In LDAP, the map refers to one of the supported maps specified in the man page for nslcd.conf (such as passwd or group).
Create Home Directory on Login
If you want to use unique home directories, run the sudo pam-auth-update command and select Create home directory on login in the PAM configuration dialog (press the space bar to select the option). Select OK, then press Enter to save the update and close the dialog.
cumulus@switch:~$ sudo pam-auth-update
The home directory for any user that logs in (using LDAP or not) populates with the standard dotfiles from /etc/skel.
When nslcd starts, an error message similar to the following (where 5816 is the nslcd PID) sometimes appears:
nslcd[5816]: unable to dlopen /usr/lib/x86_64-linux-gnu/sasl2/libsasldb.so: libdb-5.3.so: cannot open
shared object file: No such file or directory
You can ignore this message. The libdb package and resulting log messages from nslcd do not cause any issues when you use LDAP as a client for login and authentication.
Example Configuration
Here is an example configuration using Cumulus Linux.
# /etc/nslcd.conf
# nslcd configuration file. See nslcd.conf(5)
# for details.
# The user and group nslcd should run as.
uid nslcd
gid nslcd
# The location at which the LDAP server(s) should be reachable.
uri ldaps://myadserver.rtp.example.test
# The search base that will be used for all queries.
base ou=support,dc=rtp,dc=example,dc=test
# The LDAP protocol version to use.
#ldap_version 3
# The DN to bind with for normal lookups.
# debconf-set-selections doesn't seem to set this, so you have to set it manually.
binddn CN=cumulus admin,CN=Users,DC=rtp,DC=example,DC=test
bindpw 1Q2w3e4r!
# The DN used for password modifications by root.
#rootpwmoddn cn=admin,dc=example,dc=com
# SSL options
#ssl off (default)
# Not good does not prevent man in the middle attacks
#tls_reqcert demand(default)
tls_cacertfile /etc/ssl/certs/rtp-example-ca.crt
# The search scope.
#scope sub
# Add nested group support
# Supported in nslcd 0.9 and higher.
# default wheezy install of nslcd supports on 0.8. wheezy-backports has 0.9
nss_nested_groups yes
# Mappings for Active Directory
# (replace the SIDs in the objectSid mappings with the value for your domain)
# "dsquery * -filter (samaccountname=testuser1) -attr ObjectSID" where cn == 'testuser1'
pagesize 1000
referrals off
idle_timelimit 1000
# Do not allow uids lower than 1000 to login (aka Administrator)
# not needed as pam already has this support
# nss_min_uid 1000
# This filter says to get all users who are part of the cumuluslnxadm group. Supports nested groups.
# Example, mary is part of the snrnetworkadm group which is part of cumuluslnxadm group
# Ref: http://msdn.microsoft.com/en-us/library/aa746475%28VS.85%29.aspx (LDAP_MATCHING_RULE_IN_CHAIN)
filter passwd (&(Objectclass=user)(!(objectClass=computer))(memberOf:1.2.840.113556.1.4.1941:=cn=cumuluslnxadm,ou=groups,ou=support,dc=rtp,dc=example,dc=test))
map passwd uid sAMAccountName
map passwd uidNumber objectSid:S-1-5-21-1391733952-3059161487-1245441232
map passwd gidNumber objectSid:S-1-5-21-1391733952-3059161487-1245441232
map passwd homeDirectory "/home/$sAMAccountName"
map passwd gecos displayName
map passwd loginShell "/bin/bash"
# Filter for any AD group or user in the baseDN. the reason for filtering for the
# user to make sure group listing for user files don't say '<user> <gid>'. instead will say '<user> <user>'
# So for cosmetic reasons..nothing more.
filter group (&(|(objectClass=group)(Objectclass=user))(!(objectClass=computer)))
map group gidNumber objectSid:S-1-5-21-1391733952-3059161487-1245441232
map group cn sAMAccountName
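The passwd filter in the example configuration combines three conditions: the entry is a user, it is not a computer, and it belongs (directly or through nested groups) to one group. The following Python sketch builds the same filter for another group DN. The helper names are hypothetical; the OID is the LDAP_MATCHING_RULE_IN_CHAIN value used in the example, and the escaping follows the usual LDAP filter rules for special characters.

```python
# OID of LDAP_MATCHING_RULE_IN_CHAIN, taken from the example configuration.
IN_CHAIN = "1.2.840.113556.1.4.1941"


def escape_filter_value(value: str) -> str:
    """Escape characters that have special meaning inside an LDAP filter."""
    escaped = []
    for ch in value:
        if ch in "\\*()\0":
            escaped.append("\\%02x" % ord(ch))
        else:
            escaped.append(ch)
    return "".join(escaped)


def nested_group_filter(group_dn: str) -> str:
    """Match user objects that are direct or nested members of group_dn."""
    return (
        "(&(objectClass=user)(!(objectClass=computer))"
        f"(memberOf:{IN_CHAIN}:={escape_filter_value(group_dn)}))"
    )


print(nested_group_filter(
    "cn=cumuluslnxadm,ou=groups,ou=support,dc=rtp,dc=example,dc=test"
))
```

You can paste the resulting string after filter passwd in /etc/nslcd.conf.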
Configure LDAP Authorization
Linux uses the sudo command to allow non-administrator users (such as the default cumulus user account) to perform privileged operations. To control the users that can use sudo, define a series of rules in the /etc/sudoers file and files in the /etc/sudoers.d/ directory. The rules apply to groups but you can also define specific users. You can add sudo rules using the group names from LDAP. For example, if a group of users are in the group netadmin, you can add a rule to give those users sudo privileges. Refer to the sudoers manual (man sudoers) for a complete usage description. The following shows an example in the /etc/sudoers file:
# The basic structure of a user specification is "who where = (as_whom) what ".
%sudo ALL=(ALL:ALL) ALL
%netadmin ALL=(ALL:ALL) ALL
Active Directory Configuration
Active Directory (AD) is a fully featured LDAP-based NIS server created by Microsoft. It offers unique features that classic OpenLDAP servers do not have. AD can be more complicated to configure on the client and each version works a little differently with Linux-based LDAP clients. Some more advanced configuration examples, from testing LDAP clients on Cumulus Linux with Active Directory (AD/LDAP), are available in the knowledge base.
LDAP Verification Tools
The LDAP client daemon retrieves and caches password and group information from LDAP. To verify the LDAP interaction, use these command-line tools to trigger an LDAP query from the device.
Identify a User with the id Command
The id command performs a username lookup by following the lookup information sources in NSS for the passwd service. This returns the user ID, group ID and the group list retrieved from the information source. In the following example, the user cumulus is locally defined in /etc/passwd, and myuser is on LDAP. The NSS configuration has the passwd map configured with the sources compat ldap:
cumulus@switch:~$ id cumulus
uid=1000(cumulus) gid=1000(cumulus) groups=1000(cumulus),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev)
cumulus@switch:~$ id myuser
uid=1230(myuser) gid=3000(Development) groups=3000(Development),500(Employees),27(sudo)
getent
The getent command retrieves all records found with NSS for a given map. It can also retrieve a specific entry under that map. You can perform tests with the passwd, group, shadow, or any other map in the /etc/nsswitch.conf file. The output from this command formats according to the map requested. For the passwd service, the structure of the output is the same as the entries in /etc/passwd. The group map outputs the same structure as /etc/group.
In this example, looking up a specific user in the passwd map, the user cumulus is locally defined in /etc/passwd, and myuser is only in LDAP.
In the next example, which looks up a specific group in the group service, the group cumulus is locally defined in /etc/group, and netadmin is on LDAP.
cumulus@switch:~$ getent group cumulus
cumulus:x:1000:
cumulus@switch:~$ getent group netadmin
netadmin:*:502:larry,moe,curly,shemp
Running the command getent passwd or getent group without a specific request returns all local and LDAP entries for the passwd and group maps.
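Because getent prints entries in the /etc/passwd and /etc/group layouts, the output is easy to post-process. The following Python sketch splits one line of each kind into named fields. The helper names and the myuser passwd line are hypothetical; the group lines come from the example above.

```python
def parse_group_entry(line: str) -> dict:
    """Split one 'getent group' line: name:password:gid:member,member,..."""
    name, _password, gid, members = line.rstrip("\n").split(":", 3)
    return {
        "name": name,
        "gid": int(gid),
        "members": [m for m in members.split(",") if m],
    }


def parse_passwd_entry(line: str) -> dict:
    """Split one 'getent passwd' line: name:password:uid:gid:gecos:home:shell."""
    name, _password, uid, gid, gecos, home, shell = line.rstrip("\n").split(":", 6)
    return {"name": name, "uid": int(uid), "gid": int(gid),
            "gecos": gecos, "home": home, "shell": shell}


print(parse_group_entry("netadmin:*:502:larry,moe,curly,shemp"))
print(parse_group_entry("cumulus:x:1000:"))
# Hypothetical passwd line for the LDAP user myuser.
print(parse_passwd_entry("myuser:x:1230:3000:My User:/home/myuser:/bin/bash"))
```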
LDAP search
The ldapsearch command performs LDAP operations directly on the LDAP server. This does not interact with NSS. This command displays the information that the LDAP daemon process receives back from the server. The command has several options. The simplest option uses anonymous bind to the host and specifies the search DN and the attribute to look up.
When setting up LDAP authentication for the first time, turn off the nslcd service using the systemctl stop nslcd.service command (or the systemctl stop nslcd@mgmt.service command if you are running the service in a management VRF) and run the daemon in debug mode (nslcd -d). Debug mode works whether you are using LDAP over SSL (port 636) or an unencrypted LDAP connection (port 389).
Common SSL/TLS problems that debug mode helps you identify include:
The FQDN of the LDAP server URI does not match the FQDN in the CA-signed server certificate.
nslcd cannot read the SSL certificate and reports a Permission denied error in the debug output during server connection negotiation. Check the permissions on each directory in the path of the root SSL certificate and ensure that the nslcd user can read them.
NSCD
If the nscd cache daemon is also enabled and you make some changes to the user from LDAP, you can clear the cache using the following commands:
nscd --invalidate=passwd
nscd --invalidate=group
The nscd package works with nslcd to cache name entries returned from the LDAP server. This sometimes causes authentication failures. To work around these issues, disable nscd, restart the nslcd service, then retry authentication:
If you are running the nslcd service in a management VRF, you need to run the systemctl restart nslcd@mgmt.service command instead of the systemctl restart nslcd.service command. For example:
TACACS+
Cumulus Linux implements TACACS+ client AAA in a transparent way with minimal configuration. The client implements the TACACS+ protocol as described in this IETF document. There is no need to create accounts or directories on the switch. Accounting records go to all configured TACACS+ servers by default. Using per-command authorization requires additional setup on the switch.
TACACS+ in Cumulus Linux:
Uses PAM authentication and includes login, ssh, sudo and su.
Allows users with privilege level 15 to run any command with sudo.
Allows users with privilege level 15 to run NVUE nv set, nv unset, and nv apply commands in addition to nv show commands. TACACS+ users with a lower privilege level can only execute nv show commands.
Supports up to seven TACACS+ servers. Be sure to configure your TACACS+ servers in addition to the TACACS+ client. Refer to your TACACS+ server documentation.
Install the TACACS+ Client Packages
You must install the TACACS+ client packages to use TACACS+. If you do not install the TACACS+ packages, you see the following message when you try to enable TACACS+ with the NVUE nv set system aaa tacacs enable on command:
'tacplus-client' package needs to be installed to enable tacacs
You can install the TACACS+ packages even if the switch is not connected to the internet; the packages are in the cumulus-local-apt-archive repository in the Cumulus Linux image.
To install all required packages, run these commands:
After you install the required TACACS+ packages, configure the following required settings on the switch (the TACACS+ client).
Set the IP address or hostname of at least one TACACS+ server.
Set the secret (key) shared between the TACACS+ server and client.
Set the VRF you want to use to communicate with the TACACS+ server. This is typically the management VRF (mgmt), which is the default VRF on the switch.
If you use NVUE commands to configure TACACS+, you must also set the priority for the authentication order for local and TACACS+ users, and enable TACACS+.
After you configure any TACACS+ settings with NVUE and you run nv config apply, you must restart the NVUE service with the sudo systemctl restart nvued.service command.
NVUE commands require you to specify the priority for each TACACS+ server. You must set a priority even if you only specify one server.
The following example commands set:
The TACACS+ server priority to 5.
The IP address of the server to 192.168.0.30.
The secret to mytacac$key.
If you include special characters in the password (such as $), you must enclose the password in single quotes (').
The VRF to mgmt.
The authentication order so that TACACS+ authentication has priority over local (the lower number has priority).
TACACS+ to enabled.
cumulus@switch:~$ nv set system aaa tacacs server 5 host 192.168.0.30
cumulus@switch:~$ nv set system aaa tacacs server 5 secret 'mytacac$key'
cumulus@switch:~$ nv set system aaa tacacs vrf mgmt
cumulus@switch:~$ nv set system aaa authentication-order 5 tacacs
cumulus@switch:~$ nv set system aaa authentication-order 10 local
cumulus@switch:~$ nv set system aaa tacacs enable on
cumulus@switch:~$ nv config apply
If you want the server to use IPv6, you must add the nv set system aaa tacacs server <priority> prefer-ip-version 6 command:
cumulus@switch:~$ nv set system aaa tacacs server 5 host server5
cumulus@switch:~$ nv set system aaa tacacs server 5 prefer-ip-version 6
...
If you configure more than one TACACS+ server, you need to set the priority for each server. If the switch cannot establish a connection with the server that has the highest priority, it tries to establish a connection with the next highest priority server. The server with the lower number has the higher priority. In the example below, server 192.168.0.30 with a priority value of 5 has a higher priority than server 192.168.1.30, which has a priority value of 10.
cumulus@switch:~$ nv set system aaa tacacs server 5 host 192.168.0.30
cumulus@switch:~$ nv set system aaa tacacs server 5 secret 'mytacac$key'
cumulus@switch:~$ nv set system aaa tacacs server 10 host 192.168.1.30
cumulus@switch:~$ nv set system aaa tacacs server 10 secret 'mytacac$key2'
cumulus@switch:~$ nv config apply
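The failover order described above (lowest priority number first, then the next one if the connection fails) can be sketched in Python as follows. This illustrates the ordering rule only; pick_server and the can_connect callback are hypothetical and do not reflect how the TACACS+ client is implemented.

```python
from typing import Callable, Optional


def pick_server(servers: dict[int, str],
                can_connect: Callable[[str], bool]) -> Optional[str]:
    """Try servers from the lowest priority number to the highest."""
    for priority in sorted(servers):
        host = servers[priority]
        if can_connect(host):
            return host
    return None  # no server is reachable


# Priority number -> server address, as in the example above.
servers = {10: "192.168.1.30", 5: "192.168.0.30"}

# Pretend that only the priority 10 server answers.
print(pick_server(servers, lambda host: host == "192.168.1.30"))
```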
Edit the /etc/tacplus_servers file to add at least one server and one shared secret (key). You can specify the server and secret parameters in any order anywhere in the file. Whitespace (spaces or tabs) is not allowed. For example, if your TACACS+ server IP address is 192.168.0.30 and your shared secret is tacacskey, add these parameters to the /etc/tacplus_servers file:
secret=tacacskey
server=192.168.0.30
Cumulus Linux supports a maximum of seven TACACS+ servers. To specify multiple servers, add one per line to the /etc/tacplus_servers file. The switch tries the servers in the order in which they appear in the file.
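Before you restart any services, you can check a draft of the file against the rules stated above: key=value parameters, no whitespace, and no more than seven servers. The following Python sketch is one way to do that. The check_tacplus_servers helper is hypothetical, and the sketch does not validate parameter names or contact any server.

```python
MAX_SERVERS = 7  # limit stated in this section


def check_tacplus_servers(text: str):
    """Return (servers, settings, errors) for tacplus_servers-style text."""
    servers, settings, errors = [], {}, []
    for lineno, line in enumerate(text.splitlines(), 1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments
        if " " in line or "\t" in line:
            errors.append(f"line {lineno}: whitespace is not allowed")
            continue
        key, sep, value = line.partition("=")
        if not sep or not value:
            errors.append(f"line {lineno}: expected key=value")
        elif key == "server":
            servers.append(value)  # keep file order; it is the connection order
        else:
            settings[key] = value
    if len(servers) > MAX_SERVERS:
        errors.append(
            f"{len(servers)} servers configured; the maximum is {MAX_SERVERS}"
        )
    return servers, settings, errors


print(check_tacplus_servers("secret=tacacskey\nserver=192.168.0.30\n"))
print(check_tacplus_servers("server = 192.168.0.30\n"))
```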
# If the management network is in a vrf, set this variable to the vrf name.
# This would usually be "mgmt"
# When this variable is set, the connection to the TACACS+ accounting servers
# will be made through the named vrf.
vrf=mgmt
Restart auditd:
cumulus@switch:~$ sudo systemctl restart auditd
Optional TACACS+ Configuration
You can configure the following optional TACACS+ settings:
The port to use for communication between the TACACS+ server and client. By default, Cumulus Linux uses IP port 49.
The TACACS timeout value, which is the number of seconds to wait for a response from the TACACS+ server before trying the next TACACS+ server. You can specify a value between 0 and 60. The default is 5 seconds.
The source IP address to use when communicating with the TACACS+ server so that the server can identify the client switch. You must specify an IPv4 address, which must be valid for the interface you use. This source IP address is typically the loopback address on the switch.
The TACACS+ authentication type. You can specify PAP to send clear text between the user and the server, CHAP to establish a PPP connection between the user and the server, or login. The default is PAP.
The users you do not want to send to the TACACS+ server for authentication; for example, local user accounts that exist on the switch, such as the cumulus user.
A separate home directory for each TACACS+ user when the TACACS+ user first logs in. By default, the switch uses the home directory in the mapping accounts in /etc/passwd. If the home directory does not exist, the mkhomedir_helper program creates it. This option does not apply to accounts with restricted shells (users mapped to a TACACS privilege level that has enforced per-command authorization).
The following example commands set the timeout to 10 seconds and the TACACS+ server port to 32:
cumulus@switch:~$ nv set system aaa tacacs timeout 10
cumulus@switch:~$ nv set system aaa tacacs server 5 port 32
cumulus@switch:~$ nv config apply
The following example commands set the source IP address to 10.10.10.1 and the authentication type to CHAP:
cumulus@switch:~$ nv set system aaa tacacs source-ip 10.10.10.1
cumulus@switch:~$ nv set system aaa tacacs authentication mode chap
cumulus@switch:~$ nv config apply
The following example commands exclude the user USER1 from going to the TACACS+ server for authentication and enable Cumulus Linux to create a separate home directory for each TACACS+ user when the TACACS+ user first logs in:
cumulus@switch:~$ nv set system aaa tacacs exclude-user USER1
cumulus@switch:~$ nv set system aaa tacacs authentication per-user-homedir on
cumulus@switch:~$ nv config apply
To set the server port (use the format server:port), source IP address, authentication type, and enable Cumulus Linux to create a separate home directory for each TACACS+ user, edit the /etc/tacplus_servers file, then restart auditd.
To set the timeout and the usernames to exclude from TACACS+ authentication, edit the /etc/tacplus_nss.conf file (you do not need to restart auditd).
The following example sets the server port to 32, the authentication type to CHAP, the source IP address to 10.10.10.1, and enables Cumulus Linux to create a separate home directory for each TACACS+ user when the TACACS+ user first logs in:
cumulus@switch:~$ sudo nano /etc/tacplus_servers
...
secret=mytacac$key
server=192.168.0.30:32
...
# Sets the IPv4 address used as the source IP address when communicating with
# the TACACS+ server. IPv6 addresses are not supported, nor are hostnames.
# The address must work when passed to the bind() system call, that is, it must
# be valid for the interface being used.
source_ip=10.10.10.1
...
# If user_homedir=1, then tacacs users will be set to have a home directory
# based on their login name, rather than the mapped tacacsN home directory.
# mkhomedir_helper is used to create the directory if it does not exist (similar
# to use of pam_mkhomedir.so). This flag is ignored for users with restricted
# shells, e.g., users mapped to a tacacs privilege level that has enforced
# per-command authorization (see the tacplus-restrict man page).
user_homedir=1
...
login=chap
cumulus@switch:~$ sudo systemctl restart auditd
The following example sets the timeout to 10 seconds and excludes the user USER1 from going to the TACACS+ server for authentication:
cumulus@switch:~$ sudo nano /etc/tacplus_nss.conf
...
# The connection timeout for an NSS library should be short, since it is
# invoked for many programs and daemons, and a failure is usually not
# catastrophic. Not set or set to a negative value disables use of poll().
# This follows the include of tacplus_servers, so it can override any
# timeout value set in that file.
# It's important to have this set in this file, even if the same value
# as in tacplus_servers, since tacplus_servers should not be readable
# by users other than root.
timeout=10
...
# This is a comma separated list of usernames that are never sent to
# a tacacs server, they cause an early not found return.
#
# "*" is not a wild card. While it's not a legal username, it turns out
# that during pathname completion, bash can do an NSS lookup on "*"
# To avoid server round trip delays, or worse, unreachable server delays
# on filename completion, we include "*" in the exclusion list.
exclude_users=root,daemon,nobody,cron,radius_user,radius_priv_user,sshd,cumulus,quagga,frr,snmp,www-data,ntp,man,_lldpd,USER1,*
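The exclude_users value is a plain comma-separated list, and the comments above note that "*" is a literal entry, not a wildcard. The following Python sketch shows how such a list decides whether a username goes to the TACACS+ server. The helper names are hypothetical, and the exact, case-sensitive comparison is an assumption.

```python
def excluded_users(conf_text: str) -> set[str]:
    """Collect the names listed in exclude_users= lines."""
    users = set()
    for raw in conf_text.splitlines():
        line = raw.strip()
        if line.startswith("exclude_users="):
            names = line.split("=", 1)[1].split(",")
            users.update(name for name in names if name)
    return users


def goes_to_tacacs(username: str, conf_text: str) -> bool:
    """Assumed rule: only exact matches are excluded; '*' is a literal name."""
    return username not in excluded_users(conf_text)


CONF = "exclude_users=root,daemon,cumulus,USER1,*\n"
for name in ("USER1", "tacuser0", "*"):
    print(name, goes_to_tacacs(name, CONF))
```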
Cumulus Linux supports the following additional Linux parameters in the /etc/tacplus_nss.conf file. Currently, there are no equivalent NVUE commands.
| Linux Parameter | Description |
| --- | --- |
| include | Configures a supplemental configuration file to avoid duplicating configuration information. You can include up to eight additional configuration files. For example: include=/myfile/myname. |
| min_uid | Configures the minimum user ID that the NSS plugin can look up. 0 specifies that the plugin never looks up uid 0 (root). Do not specify a value greater than the local TACACS+ user IDs (0 through 15). |
TACACS+ Accounting
When you install the TACACS+ packages and configure the basic TACACS+ settings (set the server and shared secret), accounting is on and there is no additional configuration required.
TACACS+ accounting uses the audisp module, with an additional plugin for auditd and audisp. The plugin maps the auid in the accounting record to a TACACS login, which it bases on the auid and sessionid. The audisp module requires libnss_tacplus and uses the libtacplus_map.so library interfaces as part of the modified libpam_tacplus package.
Communication with the TACACS+ servers occurs with the libsimple-tacact1 library, through dlopen(). The accounting record carries a maximum of 240 bytes of command name and arguments because the TACACS+ field length is limited to 255 bytes.
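To estimate how much of a long command line survives in an accounting record, you can apply the 240-byte limit yourself. The following Python sketch truncates on a byte boundary without splitting a multi-byte character. How the plugin itself truncates is not documented here, so treat the truncate_command helper and its handling of UTF-8 as assumptions.

```python
MAX_CMD_BYTES = 240  # command name and arguments, per the limit stated above


def truncate_command(command: str, limit: int = MAX_CMD_BYTES) -> bytes:
    """Return at most 'limit' bytes of the UTF-8 encoded command line."""
    data = command.encode("utf-8")
    if len(data) <= limit:
        return data
    # Cut at the limit, then drop a partial multi-byte character if the
    # cut landed in the middle of one.
    return data[:limit].decode("utf-8", errors="ignore").encode("utf-8")


long_command = "ip link show " + "swp1 " * 100
print(len(long_command.encode("utf-8")), len(truncate_command(long_command)))
```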
All sudo commands run by TACACS+ users generate accounting records against the original TACACS+ login name.
All Linux and NVUE commands result in an accounting record, including login commands and sub-processes of other commands. This can generate a lot of accounting records.
By default, Cumulus Linux sends accounting records to all servers. You can change this setting to send accounting records to the server that is first to respond:
cumulus@switch:~$ nv set system aaa tacacs accounting send-records first-response
cumulus@switch:~$ nv config apply
To reset to the default configuration (send accounting records to all servers), run the nv set system aaa tacacs accounting send-records all command.
Edit the /etc/audisp/audisp-tac_plus.conf file and change the acct_all parameter to 0 (acct_all=0).
To reset to the default configuration (send accounting records to all servers), change the value of acct_all to 1 (acct_all=1).
To disable TACACS+ accounting:
cumulus@switch:~$ nv set system aaa tacacs accounting enable off
cumulus@switch:~$ nv config apply
Edit the /etc/audisp/plugins.d/audisp-tacplus.conf file and change the active parameter to no:
cumulus@switch:~$ sudo nano /etc/audisp/plugins.d/audisp-tacplus.conf
...
# default to enabling tacacs accounting; change to no to disable
active = no
Restart auditd:
cumulus@switch:~$ sudo systemctl restart auditd
Local Fallback Authentication
You can configure the switch to allow local fallback authentication for a user when the TACACS servers are unreachable, do not include the user for authentication, or have the user in the exclude user list.
To allow local fallback authentication for a user, add a local privileged user account on the switch with the same username as a TACACS user. A local user is always active even when the TACACS service is not running.
NVUE does not provide commands to configure local fallback authentication.
To configure local fallback authentication:
Edit the /etc/nsswitch.conf file to remove the keyword tacplus from the line starting with passwd. (You need to add the keyword back in step 3.)
The following example shows the /etc/nsswitch.conf file with no tacplus keyword in the line starting with passwd.
cumulus@switch:~$ sudo nano /etc/nsswitch.conf
#
# Example configuration of GNU Name Service Switch functionality.
# If you have the `glibc-doc-reference' and `info' packages installed, try:
# `info libc "Name Service Switch"' for information about this file.
passwd: files
group: tacplus files
shadow: files
gshadow: files
...
To enable the local privileged user to run sudo and NVUE commands, run the adduser commands shown below. In the example commands, the TACACS account name is tacadmin.
The first adduser command prompts for information and a password. You can skip most of the requested information by pressing ENTER.
Edit the /etc/nsswitch.conf file to add the keyword tacplus back to the line starting with passwd (the keyword you removed in the first step).
cumulus@switch:~$ sudo nano /etc/nsswitch.conf
#
# Example configuration of GNU Name Service Switch functionality.
# If you have the `glibc-doc-reference' and `info' packages installed, try:
# `info libc "Name Service Switch"' for information about this file.
passwd: tacplus files
group: tacplus files
shadow: files
gshadow: files
...
Restart the nvued service with the following command:
cumulus@switch:~$ sudo systemctl restart nvued
TACACS+ Per-command Authorization
TACACS+ per-command authorization lets you configure the commands that TACACS+ users at different privilege levels can run.
To reach the TACACS+ server through the default VRF, you must specify the egress interface you use in the default VRF. Either run the NVUE nv set system aaa tacacs vrf <interface> command (for example, nv set system aaa tacacs vrf swp51) or set the vrf=<interface> option in the /etc/tacplus_servers file (for example, vrf=swp51).
The following command allows TACACS+ users at privilege level 0 to run the nv and ip commands (if authorized by the TACACS+ server):
cumulus@switch:~$ nv set system aaa tacacs authorization 0 command ip
cumulus@switch:~$ nv set system aaa tacacs authorization 0 command nv
cumulus@switch:~$ nv config apply
To show the per-command authorization settings, run the nv show system aaa tacacs authorization command:
cumulus@switch:~$ nv show system aaa tacacs authorization
Privilege Level  role          command
---------------  ------------  -------
0                nvue-monitor  ip
                               nv
tacuser0@switch:~$ sudo tacplus-restrict -i -u tacacs0 -a ip nv
The tacplus-auth command handles authorization for each command. To enforce authorization, change the TACACS+ login to use a restricted shell with a very limited executable search path. Otherwise, the user can bypass the authorization. The tacplus-restrict utility simplifies setting up the restricted environment.
The following table provides the tacplus-restrict command options:
| Option | Description |
| --- | --- |
| -i | Initializes the environment. You only need to issue this option one time per username. |
| -a | You can invoke the utility with the -a option as often as you like. For each command in the -a list, the utility creates a symbolic link from tacplus-auth to the relative portion of the command name in the local bin subdirectory. You also need to enable these commands on the TACACS+ server (refer to your TACACS+ server documentation). It is common for the server to allow some options to a command, but not others. |
| -f | Re-initializes the environment. If you need to restart, run the -f option with -i to force re-initialization; otherwise, the utility ignores repeated use of -i. During initialization, the user shell changes to /bin/rbash and the utility saves any existing dot files. |
After running this command, examine the tacacs0 directory:
cumulus@switch:~$ sudo ls -lR ~tacacs0
total 12
lrwxrwxrwx 1 root root 22 Nov 21 22:07 ip -> /usr/sbin/tacplus-auth
lrwxrwxrwx 1 root root 22 Nov 21 22:07 nv -> /usr/sbin/tacplus-auth
Except for shell built-ins, privilege level 0 TACACS users can only run the ip and nv commands.
If you add commands with the -a option by mistake, you can remove them. The example below removes the nv command:
cumulus@switch:~$ sudo rm ~tacacs0/bin/nv
To remove all commands:
cumulus@switch:~$ sudo rm ~tacacs0/bin/*
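Because each authorized command is a symbolic link to /usr/sbin/tacplus-auth, you can audit a restricted user's bin directory by listing those links. The following Python sketch does this for any directory. The authorized_commands helper is hypothetical, and the demonstration runs against a temporary directory instead of a real ~tacacs0/bin.

```python
import os
import tempfile
from pathlib import Path


def authorized_commands(bin_dir, wrapper="/usr/sbin/tacplus-auth"):
    """List the names in bin_dir that are symbolic links to the wrapper."""
    names = []
    for entry in Path(bin_dir).iterdir():
        if entry.is_symlink() and os.readlink(entry) == wrapper:
            names.append(entry.name)
    return sorted(names)


# Demonstration in a scratch directory that mimics the layout shown above.
with tempfile.TemporaryDirectory() as scratch:
    for name in ("ip", "nv"):
        os.symlink("/usr/sbin/tacplus-auth", os.path.join(scratch, name))
    print(authorized_commands(scratch))
```

On a switch, you would point the function at the bin subdirectory of the restricted user's home directory and run it with enough privileges to read that directory.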
Remove the TACACS+ Client Packages
To remove all the TACACS+ client packages, use the following commands:
Run the following commands to show TACACS+ configuration:
To show all TACACS+ configuration (NVUE hides server secret keys), run the nv show system aaa tacacs command.
To show TACACS+ authentication configuration, run the nv show system aaa tacacs authentication command.
To show TACACS+ accounting configuration, run the nv show system aaa tacacs accounting command.
To show TACACS+ server configuration, run the nv show system aaa tacacs server command.
To show TACACS+ server priority configuration, run the nv show system aaa tacacs server <priority-id> command.
To show the list of users excluded from TACACS+ server authentication, run the nv show system aaa tacacs exclude-user command.
The following example command shows all TACACS+ configuration:
cumulus@switch:~$ nv show system aaa tacacs
                    applied
------------------  -------
enable              off
debug-level         0
timeout             5
vrf                 mgmt
accounting
  enable            off
authentication
  mode              pap
  per-user-homedir  off
[server]            5
[server]            10
The following command shows the list of users excluded from TACACS+ server authentication:
cumulus@switch:~$ nv show system aaa tacacs exclude-user
          applied
--------  -------
username  USER1
Basic Server Connectivity or NSS Issues
You can use the getent command to determine if you configured TACACS+ correctly and if the local password is in the configuration files. In the example commands below, the cumulus user represents the local user, while cumulusTAC represents the TACACS user.
To look up the username within all NSS methods:
cumulus@switch:~$ sudo getent passwd cumulusTAC
cumulusTAC:x:1016:1001:TACACS+ mapped user at privilege level 15,,,:/home/tacacs15:/bin/bash
To look up the user within the local database only:
To look up the user within the TACACS+ database only:
cumulus@switch:~$ sudo getent -s tacplus passwd cumulusTAC
cumulusTAC:x:1016:1001:TACACS+ mapped user at privilege level 15,,,:/home/tacacs15:/bin/bash
If TACACS+ is not working correctly, you can use debugging. Add the debug=1 parameter to the /etc/tacplus_servers and /etc/tacplus_nss.conf files; see the Linux Commands under Optional TACACS+ Configuration above. You can also add debug=1 to individual pam_tacplus lines in /etc/pam.d/common*.
All log messages are in /var/log/syslog.
Incorrect Shared Key
The TACACS client on the switch and the TACACS server must have the same shared secret key. If this key is incorrect, the following message prints to syslog:
2017-09-05T19:57:00.356520+00:00 leaf01 sshd[3176]: nss_tacplus: TACACS+ server 192.168.0.254:49 read failed with protocol error (incorrect shared secret?) user cumulus
Debug Issues with Per-command Authorization
To debug TACACS user command authorization, have the TACACS+ user enter the following command at a shell prompt, then try the command again:
tacuser0@switch:~$ export TACACSAUTHDEBUG=1
When you enable debugging, the command authorization conversation with the TACACS+ server shows additional information.
To disable debugging:
tacuser0@switch:~$ export -n TACACSAUTHDEBUG
Debug Issues with Accounting Records
If you add or delete TACACS+ servers from the configuration files, make sure you notify the audisp plugin with this command:
If accounting records do not send, add debug=1 to the /etc/audisp/audisp-tac_plus.conf file, then run the command above to notify the plugin. Ask the TACACS+ user to run a command and examine the end of /var/log/syslog for messages from the plugin. You can also check the auditing log file /var/log/audit/audit.log to be sure the auditing records exist. If the auditing records do not exist, restart the audit daemon with:
Cumulus Linux uses the following packages for TACACS.
| Package | Description |
| --- | --- |
| audisp-tacplus | Uses auditing data from auditd to send accounting records to the TACACS+ server and starts as part of auditd. |
| libtac2 | Provides basic TACACS+ server utility and communication routines. |
| libnss-tacplus | Provides an interface between libc username lookups, the mapping functions, and the TACACS+ server. |
| tacplus-auth | Includes the tacplus-restrict setup utility, which enables you to perform per-command TACACS+ authorization. Per-command authorization is not the default. |
| libpam-tacplus | Provides a modified version of the standard Debian package. |
| libtacplus-map1 | Provides mapping between local and TACACS+ users on the server. The package sets the immutable sessionid and auditing UID to ensure that you can track the original user through multiple processes and privilege changes, sets the auditing loginuid as immutable, and creates and maintains a status database in /run/tacacs_client_map to manage and look up mappings. |
| libsimple-tacacct1 | Provides an interface for programs to send accounting records to the TACACS+ server. audisp-tacplus uses this package. |
| libtac2-bin | Provides the tacc testing program and TACACS+ man page. |
TACACS+ Client Configuration Files
The following table describes the TACACS+ client configuration files that Cumulus Linux uses.
| Filename | Description |
| --- | --- |
| /etc/tacplus_servers | The primary file that requires configuration after installation. All packages with include=/etc/tacplus_servers parameters use this file. Typically, this file contains the shared secrets; make sure that the Linux file mode is 600. |
| /etc/nsswitch.conf | When the libnss_tacplus package installs, this file configures tacplus lookups through libnss_tacplus. If you replace this file by automation, you need to add tacplus as the first lookup method for the passwd database line. |
| /etc/tacplus_nss.conf | Sets the basic parameters for libnss_tacplus. The file includes a debug variable for debugging NSS lookups separately from other client packages. |
| /usr/share/pam-configs/tacplus | The configuration file for pam-auth-update to generate the files in the next row. The file uses these configurations at login, by su, and by ssh. |
| /etc/pam.d/common-* | The /etc/pam.d/common-* files update for tacplus authentication. The files update with pam-auth-update when you install or remove libpam-tacplus. |
| /etc/sudoers.d/tacplus | Allows TACACS+ privilege level 15 users to run commands with sudo. The file includes an example (commented out) of how to enable privilege level 15 TACACS+ users to use sudo without a password and provides an example of how to enable all TACACS+ users to run specific commands with sudo. Only edit this file with the visudo -f /etc/sudoers.d/tacplus command. |
| /etc/audisp/plugins.d/audisp-tacplus.conf | The audisp plugin configuration file. You do not need to modify this file. |
| /etc/audisp/audisp-tac_plus.conf | The TACACS+ server configuration file for accounting. You do not need to modify this file. You can use this configuration file when you only want to debug TACACS+ accounting issues, not all TACACS+ users. |
| /etc/audit/rules.d/audisp-tacplus.rules | The auditd rules for TACACS+ accounting. The augenrules command uses all rule files to generate the rules file. |
| /etc/audit/audit.rules | The audit rules file, generated when you install auditd. |
Considerations
Multiple TACACS+ Users
If two or more TACACS+ users log in simultaneously with the same privilege level, the accounting records are correct, but a lookup on either name matches both users and a UID lookup returns only the user that logs in first.
As a result, any processes that either user runs apply to both and all files either user creates apply to the first name matched. This is similar to adding two local users to the password file with the same UID and GID and is an inherent limitation of using the UID for the base user from the password file.
The current algorithm returns the first name matching the UID from the mapping file, which is either the first or the second user that logs in.
To work around this issue, you can use the switch audit log or the TACACS server accounting logs to determine which processes and files each user creates.
For commands that do not execute other commands (for example, changes to configurations in an editor or actions with tools like clagctl and vtysh), there is no additional accounting.
Per-command authorization is at the most basic level (Cumulus Linux uses standard Linux user permissions for the local TACACS users and only privilege level 15 users can run sudo commands by default).
The Linux auditd system does not always generate audit events for processes when terminated with a signal (with the kill system call or internal errors such as SIGSEGV). As a result, processes that exit on a signal that you do not handle might not generate a STOP accounting record.
Issues with the deluser Command
TACACS+ and other non-local users that run the deluser command with the --remove-home option see the following error:
tacuser0@switch:~$ deluser --remove-home USERNAME
userdel: cannot remove entry 'USERNAME' from /etc/passwd
/usr/sbin/deluser: `/usr/sbin/userdel USERNAME' returned error code 1. Exiting
The command does remove the home directory. The user can still log in on that account but does not have a valid home directory. This is a known upstream issue with the deluser command for all non-local users.
Only use the --remove-home option with the user_homedir=1 configuration command.
Both TACACS+ and RADIUS AAA Clients
When you install both the TACACS+ and the RADIUS AAA client, Cumulus Linux does not attempt RADIUS login. As a workaround, do not install both the TACACS+ and the RADIUS AAA client on the same switch.
TACACS+ and PAM
PAM modules and an updated version of the libpam-tacplus package configure authentication initially. When you install the package, the pam-auth-update command updates the PAM configuration in /etc/pam.d. If you make changes to your PAM configuration, you need to integrate these changes. If you also use LDAP with the libpam-ldap package, you need to edit the PAM configuration with the LDAP and TACACS ordering you prefer. The libpam-tacplus package ignore rules and the values in success=2 require adjustments to ignore LDAP rules.
The TACACS+ privilege attribute priv_lvl determines the privilege level for the user that the TACACS+ server returns during the user authorization exchange. The client accepts the attribute in either the mandatory or optional forms and also accepts priv-lvl as the attribute name. The attribute value must be a numeric string in the range 0 to 15, with 15 the most privileged level.
By default, TACACS+ users at privilege levels other than 15 cannot run sudo commands and can only run commands with standard Linux user permissions.
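The attribute rules above can be illustrated with a short sketch. It is not the client's actual parser; the function names are hypothetical, and the separator convention (= for a mandatory attribute, * for an optional one) is taken from the TACACS+ protocol rather than from this guide.

```python
import re

# Accept either attribute name (priv_lvl or priv-lvl).
# "=" marks a mandatory attribute and "*" an optional one.
_PRIV_LVL = re.compile(r"^priv[-_]lvl[=*]([0-9]+)$")


def parse_priv_lvl(av_pair):
    """Return the privilege level (0-15) from one attribute-value pair.

    Returns None when the pair is not a valid privilege attribute.
    """
    match = _PRIV_LVL.match(av_pair.strip())
    if match is None:
        return None
    level = int(match.group(1))
    return level if 0 <= level <= 15 else None


def can_sudo_by_default(level):
    """Only privilege level 15 users can run sudo by default."""
    return level == 15
```

For example, parse_priv_lvl("priv-lvl*7") yields 7, and can_sudo_by_default(7) is False, so that user runs commands with standard Linux user permissions only.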
You can edit the /etc/pam.d/common-* files manually. However, if you run pam-auth-update again after making the changes, the update fails. Only configure /usr/share/pam-configs/tacplus, then run pam-auth-update.
NSS Plugin
With pam_tacplus, TACACS+ authenticated users can log in without a local account on the system using the NSS plugin that comes with the tacplus_nss package. The plugin uses the mapped tacplus information if the user is not in the local password file, provides the getpwnam() and getpwuid() entry points, and uses the TACACS+ authentication functions.
The plugin asks the TACACS+ server if it knows the user, and then for relevant attributes to determine the privilege level of the user. When you install the libnss_tacplus package, nsswitch.conf changes to set tacplus as the first lookup method for passwd. If you change the order, lookups return the local accounts, such as tacacs0.
If the user is not found, the plugin uses the libtacplus.so exported functions to do a mapped lookup. The plugin appends the privilege level to tacacs and searches for that name in the local password file. For example, privilege level 15 searches for the tacacs15 user. If the lookup finds the name, the plugin adds information for the user in the password structure.
If the lookup does not find the name, the plugin decrements the privilege level and checks again until it reaches privilege level 0 (user tacacs0). This allows you to use only the two local users tacacs0 and tacacs15, for minimal configuration.
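The decrementing lookup can be sketched as follows. This is a simplified model, not the plugin code: mapped_lookup is a hypothetical name, and the local password file is represented as a plain set of account names.

```python
def mapped_lookup(priv_lvl, local_accounts):
    """Return the first local account named tacacsN, trying N from
    priv_lvl down to 0, or None if no mapped account exists."""
    for level in range(priv_lvl, -1, -1):
        name = f"tacacs{level}"
        if name in local_accounts:
            return name
    return None


# Minimal configuration: only tacacs0 and tacacs15 exist locally.
local = {"tacacs0", "tacacs15"}
print(mapped_lookup(15, local))  # tacacs15
print(mapped_lookup(7, local))   # tacacs0
```

With only tacacs0 and tacacs15 defined, every privilege level below 15 falls through to tacacs0, which is why two local accounts are enough.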
TACACS+ Client Sequencing
Cumulus Linux requires the following information at the beginning of the AAA sequence:
Whether the user is a valid TACACS+ user
The user privilege level
For non-local users (users not in the local password file) you need to send a TACACS+ authorization request as the first communication with the TACACS+ server, before authentication and before the user logging in requests a password.
You need to configure certain TACACS+ servers to allow authorization requests before authentication. Contact your TACACS+ server vendor for information.
Multiple Servers with Different User Accounts
If you configure multiple TACACS+ servers that have different user accounts:
TACACS+ authentication allows for fall through; if the first reachable server does not authenticate the user, the client tries the second server, and so on.
TACACS+ authorization does not fall through. If the first reachable server returns an unauthorized result, the command is unauthorized and the client does not try the next server.
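The difference between the two behaviors can be modeled in a few lines. The sketch below is illustrative only; the server dictionaries (reachable, users, allowed) are a hypothetical stand-in for real TACACS+ exchanges.

```python
def authenticate(servers, user):
    """Authentication falls through: try each reachable server in
    order until one of them authenticates the user."""
    for server in servers:
        if not server["reachable"]:
            continue
        if user in server["users"]:
            return True
    return False


def authorize(servers, user, command):
    """Authorization does not fall through: the answer from the first
    reachable server is final."""
    for server in servers:
        if not server["reachable"]:
            continue
        return command in server["allowed"].get(user, set())
    return False
```

If the first reachable server knows only alice and the second knows only bob, bob can authenticate through the second server, but authorize() denies his commands because the first server answers for him.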
RADIUS AAA
Various add-on packages enable RADIUS users to log in to Cumulus Linux switches in a transparent way with minimal configuration. There is no need to create accounts or directories on the switch. Authentication uses PAM and includes login, ssh, sudo and su.
Install the RADIUS Packages
You can install the RADIUS packages even if the switch is not connected to the internet, as they are in the cumulus-local-apt-archive repository, which is embedded in the Cumulus Linux image.
After installation is complete, either reboot the switch or run the sudo systemctl restart nvued command.
The libpam-radius-auth package supplied with the Cumulus Linux RADIUS client is a newer version than the one in Debian Buster. This package contains support for IPv6, the src_ip option described below, as well as bug fixes and minor features. The package also includes VRF support, provides man pages describing the PAM and RADIUS configuration, and sets the SUDO_PROMPT environment variable to the login name for RADIUS mapping support.
The libnss-mapuser package is specific to Cumulus Linux and supports the getgrent, getgrnam and getgrgid library interfaces. These interfaces add logged in RADIUS users to the group member list for groups that contain the mapped_user (radius_user) if the RADIUS account does not have privileges, and add privileged RADIUS users to the group member list for groups that contain the mapped_priv_user (radius_priv_user) during the group lookups.
During package installation:
The PAM configuration updates automatically using pam-auth-update (8), and the NSS configuration file /etc/nsswitch.conf adds the mapuser and mapuid plugins. If you remove or purge the packages, these files remove the configuration for these plugins.
The radius_shell package installs the /sbin/radius_shell and setcap cap_setuid program for the login shell for RADIUS accounts. The package adjusts the UID when needed, then runs the bash shell with the same arguments. When installed, the package changes the shell of the RADIUS accounts to /sbin/radius_shell, and to /bin/shell if you remove the package. You need this package to enable privileged RADIUS users. You do not need this package for regular RADIUS clients.
The nvshow group includes the radius_user account, and the nvset, nvapply, and sudo groups include the radius_priv_user account. This change enables all RADIUS logins to run NVUE nv show commands and all privileged RADIUS users to also run nv set, nv unset, and nv apply commands, and to use sudo.
Configure the RADIUS Client
After editing the /etc/pam_radius_auth.conf file, you must restart the NVUE and nginx-authenticator services with the sudo systemctl restart nvued.service command and the sudo systemctl restart nginx-authenticator.service command.
To configure the RADIUS client, edit the /etc/pam_radius_auth.conf file:
Add the hostname or IP address of at least one RADIUS server (such as a freeradius server on Linux), and the shared secret used to authenticate and encrypt communication with each server.
You must be able to resolve the hostname of the switch to an IP address. If for some reason you cannot find the hostname in DNS, you can add the hostname to the /etc/hosts file manually. However, this can cause problems because DHCP assigns the IP address, which can change at any time.
Multiple server configuration lines are verified in the order listed. Other than memory, there is no limit to the number of RADIUS servers you can use.
The server port number or name is optional. The system looks up the port in the /etc/services file. However, you can override the ports in the /etc/pam_radius_auth.conf file.
If the server is slow or latencies are high, change the timeout setting. The setting defaults to 3 seconds.
If you want to use a specific interface to reach the RADIUS server, specify the src_ip option. You can specify the hostname of the interface, an IPv4 address, or an IPv6 address. If you specify the src_ip option, you must also specify the timeout option.
Set the vrf-name field. This is typically set to mgmt if you are using a management VRF. You cannot specify more than one VRF.
The configuration file includes the mapped_priv_user field that sets the account used for privileged RADIUS users and the priv-lvl field that sets the minimum value for the privilege level to be a privileged login (the default value is 15). If you edit these fields, make sure the values match those set in the /etc/nss_mapuser.conf file.
The following example provides a sample /etc/pam_radius_auth.conf file configuration:
mapped_priv_user radius_priv_user
# server[:port] shared_secret timeout (secs) src_ip
192.168.0.254 secretkey
other-server othersecret 3 192.168.1.10
# when mgmt vrf is in use
vrf-name mgmt
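To make the field layout of the sample file concrete, the following sketch reads it into settings and server entries. It is an assumption-laden illustration, not the pam_radius_auth parser: parse_radius_conf and SETTING_KEYS are hypothetical names, only the keywords mentioned in this section are recognized, and any text after a # is treated as a comment.

```python
DEFAULT_TIMEOUT = 3  # seconds; the documented default
# Keywords named in this section; the real file may accept others.
SETTING_KEYS = {"mapped_priv_user", "priv-lvl", "vrf-name", "debug"}


def parse_radius_conf(text):
    """Split the file into (settings, servers), keeping server order."""
    settings, servers = {}, []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (assumption)
        if not line:
            continue
        fields = line.split()
        if fields[0] in SETTING_KEYS:
            settings[fields[0]] = fields[1] if len(fields) > 1 else True
            continue
        if len(fields) < 2:
            raise ValueError(f"server line needs a shared secret: {line!r}")
        servers.append({
            "server": fields[0],  # server[:port], left unsplit
            "secret": fields[1],
            "timeout": int(fields[2]) if len(fields) > 2 else DEFAULT_TIMEOUT,
            "src_ip": fields[3] if len(fields) > 3 else None,
        })
    return settings, servers


SAMPLE = """\
mapped_priv_user radius_priv_user
# server[:port] shared_secret timeout (secs) src_ip
192.168.0.254 secretkey
other-server othersecret 3 192.168.1.10
# when mgmt vrf is in use
vrf-name mgmt
"""

settings, servers = parse_radius_conf(SAMPLE)
print(settings["vrf-name"], [s["server"] for s in servers])
```

Because src_ip is the fourth positional field, a line that sets src_ip must also carry a timeout value, which matches the requirement stated in the list above.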
If this is the first time you are configuring the RADIUS client, uncomment the debug line for troubleshooting. The debugging messages write to /var/log/syslog. When the RADIUS client is working correctly, comment out the debug line.
As an optional step, you can set PAM configuration keywords by editing the /usr/share/pam-configs/radius file. After you edit the file, you must run the pam-auth-update --package command. The pam_radius_auth (8) man page describes the PAM configuration keywords.
The value of the VSA (Vendor Specific Attribute) shell:priv-lvl determines the privilege level for the user on the switch. If the attribute does not return, the user does not have privileges. The following shows an example using the freeradius server for a fully privileged user.
The VSA vendor name (Cisco-AVPair in the example above) can have any content. The RADIUS client only checks for the string shell:priv-lvl.
Enable Login without Local Accounts
Because LDAP is not commonly used with switches and adding accounts locally is cumbersome, Cumulus Linux includes a mapping capability with the libnss-mapuser package.
Mapping uses two NSS (Name Service Switch) plugins, one for account name, and one for UID lookup. The installation process configures these accounts automatically in the /etc/nsswitch.conf file and removes them when you delete the package. See the nss_mapuser (8) man page for the full description of this plugin.
A username is mapped at login to a fixed account specified in the configuration file, with the fields of the fixed account used as a template for the user that is logging in.
For example, if you look up the name dave and the fixed account in the configuration file is radius_user, and that entry in /etc/passwd is:
then the matching line that returns when you run getent passwd dave is:
cumulus@switch:~$ getent passwd dave
dave:x:1017:1002:dave mapped user:/home/dave:/bin/bash
The login process creates the home directory /home/dave if it does not already exist and populates it with the standard skeleton files using the mkhomedir_helper command.
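The template behavior can be sketched as a small function. This is an illustration, not the libnss-mapuser implementation: map_user is a hypothetical name, and the radius_user entry used below is an assumed example chosen to be consistent with the getent output above.

```python
def map_user(login_name, template_entry):
    """Build a passwd-style entry for login_name from the fixed account.

    The UID, GID, password field, and shell come from the template; the
    name, comment field, and home directory follow the login name.
    """
    fields = template_entry.split(":")
    if len(fields) != 7:
        raise ValueError("expected 7 colon-separated passwd fields")
    _, passwd, uid, gid, _, _, shell = fields
    return ":".join([
        login_name, passwd, uid, gid,
        f"{login_name} mapped user", f"/home/{login_name}", shell,
    ])


# Assumed template entry for the fixed account (illustrative only).
template = "radius_user:x:1017:1002:radius user:/home/radius_user:/bin/bash"
print(map_user("dave", template))
# dave:x:1017:1002:dave mapped user:/home/dave:/bin/bash
```

Every mapped user shares the UID and GID of the fixed account, which is the source of the simultaneous-login limitation described later in this section.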
The configuration file /etc/nss_mapuser.conf configures the plugins. The file includes the mapped account name, which is radius_user by default. You can change the mapped account name by editing the file. The nss_mapuser (5) man page describes the configuration file.
A flat file mapping derives from the session number assigned during login, which persists across su and sudo. Cumulus Linux removes the mapping at logout.
Local Fallback Authentication
If a site wants to allow local fallback authentication for a user when none of the RADIUS servers are reachable, you can add a privileged user account as a local account on the switch. The local account must have the same unique identifier as the privileged user and the shell must be the same.
To configure local fallback authentication:
Add a local privileged user account. For example, if the radius_priv_user account in the /etc/passwd file is radius_priv_user:x:1002:1001::/home/radius_priv_user:/sbin/radius_shell, run the following command to add a local privileged user account named johnadmin:
The RADIUS fixed account is not removed from the /etc/passwd or /etc/group file and the home directories are not removed. They remain in case there are modifications to the account or files in the home directories.
To remove the home directories of the RADIUS users, first get the list by running:
cumulus@switch:~$ sudo ls -l /home | grep radius
For all users listed, except the radius_user, run this command to remove the home directories:
where USERNAME is the account name (the home directory relative portion). This command gives the following warning because the user is not listed in the /etc/passwd file.
userdel: cannot remove entry 'USERNAME' from /etc/passwd
/usr/sbin/deluser: `/usr/sbin/userdel USERNAME' returned error code 1. Exiting.
After you remove all the RADIUS users, run the command to remove the fixed account. If there are changes to the account in the /etc/nss_mapuser.conf file, use that account name instead of radius_user.
If two or more RADIUS users log in simultaneously, a UID lookup only returns the user that logs in first. Any process that either user runs applies to both, and all files that either user creates apply to the first name matched. This process is similar to adding two local users to the password file with the same UID and GID, and is an inherent limitation of using the UID for the fixed user from the password file. The current algorithm returns the first name matching the UID from the mapping file, which is either the first or second user that logs in.
When you install both the TACACS+ and the RADIUS AAA client, Cumulus Linux does not attempt the RADIUS login. As a workaround, do not install both the TACACS+ and the RADIUS AAA client on the same switch.
When the RADIUS server is reachable outside of the management VRF, such as in the default VRF, you might see the following error message when you try to run sudo:
2008-10-31T07:06:36.191359+00:00 SW01 sudo: pam_radius_auth(sudo:auth): Bind for server 10.1.1.25 failed: Cannot assign requested address
2008-10-31T07:06:36.192307+00:00 sw01 sudo: pam_radius_auth(sudo:auth): No valid server found in configuration file /etc/pam_radius_auth.conf
The error occurs because sudo tries to authenticate to the RADIUS server through the management VRF. Before you run sudo, you must set the shell to the correct VRF:
Netfilter is the packet filtering framework in Cumulus Linux and other Linux distributions. You can use several different tools to configure ACLs in Cumulus Linux:
iptables, ip6tables, and ebtables are Linux userspace tools you use to administer filtering rules for IPv4 packets, IPv6 packets, and Ethernet frames (layer 2 using MAC addresses).
cl-acltool is a Cumulus Linux-specific userspace tool you use to administer filtering rules and configure default ACLs. cl-acltool operates on various configuration files and uses iptables, ip6tables, and ebtables to install rules into the kernel. In addition, cl-acltool programs rules in hardware for switch port interfaces, which iptables, ip6tables and ebtables cannot do on their own.
NVUE is a Cumulus Linux-specific userspace tool you can use to configure custom ACLs.
Traffic Rules
Chains
Netfilter describes the way that the Linux kernel classifies and controls packets to, from, and across the switch. Netfilter does not require a separate software daemon to run; it is part of the Linux kernel. Netfilter asserts policies at layers 2, 3, and 4 of the OSI model by inspecting packet and frame headers according to a list of rules. The iptables, ip6tables, and ebtables userspace applications provide syntax you use to define rules.
The rules inspect or operate on packets at several points (chains) in the life of the packet through the system:
PREROUTING touches packets before the switch routes them.
INPUT touches packets after the switch determines that the packets are for the local system but before the control plane software receives them.
FORWARD touches transit traffic as it moves through the switch.
OUTPUT touches packets from the control plane software before they leave the switch.
POSTROUTING touches packets immediately before they leave the switch but after a routing decision.
Tables
When you build rules to affect the flow of traffic, tables can access the individual chains. Linux provides three tables by default:
Filter classifies traffic or filters traffic
NAT applies Network Address Translation rules
Mangle alters packets as they move through the switch
Each table has a set of default chains that modify or inspect packets at different points of the path through the switch. Chains contain the individual rules to influence traffic.
Rules
Rules classify the traffic you want to control. You apply rules to chains, which attach to tables.
Rules have several different components:
Table: The first argument is the table.
Chain: The second argument is the chain. Each table supports several different chains. See Tables above.
Matches: The third argument is the match. You can specify multiple matches in a single rule. However, the more matches you use in a rule, the more memory the rule consumes.
Jump: The jump specifies the target of the rule; what action to take if the packet matches the rule. If you omit this option in a rule, matching the rule has no effect on the packet, but the counters on the rule increment.
Targets: The target is a user-defined chain (other than the one this rule is in), one of the special built-in targets that decides the fate of the packet immediately (like DROP), or an extended target. See Supported Rule Types below for different target examples.
How Rules Parse and Apply
The switch reads all the rules from each chain from iptables, ip6tables, and ebtables and enters them in order into either the filter table or the mangle table. The switch reads the rules from the kernel in the following order:
IPv6 (ip6tables)
IPv4 (iptables)
ebtables
When you combine and put rules into one table, the order determines the relative priority of the rules; iptables and ip6tables have the highest precedence and ebtables has the lowest.
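As a rough model of that read order, the sketch below concatenates per-tool rule lists so that earlier entries have higher priority. The function name and the data layout are hypothetical; the real combination happens inside the switch software.

```python
# Read order described above: ip6tables, then iptables, then ebtables.
READ_ORDER = ("ip6tables", "iptables", "ebtables")


def combine_rules(rules_by_tool):
    """Return (tool, rule) pairs in priority order, preserving the
    order of rules within each tool."""
    combined = []
    for tool in READ_ORDER:
        for rule in rules_by_tool.get(tool, []):
            combined.append((tool, rule))
    return combined


rules = {
    "ebtables": ["-A FORWARD -p 802_1Q --vlan-id 100 -j ACCEPT"],
    "iptables": ["-A FORWARD -i swp1 -j ACCEPT"],
    "ip6tables": ["-A FORWARD -i swp2 -j DROP"],
}
for tool, rule in combine_rules(rules):
    print(tool, rule)
```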
The Linux packet forwarding construct is an overlay for how the silicon underneath processes packets. Be aware of the following:
The switch silicon reorders rules when switchd writes to the ASIC, whereas traditional iptables execute the list of rules in order.
All rules, except for POLICE and SETCLASS rules, are terminating; after a rule matches, the action occurs and no more rules process.
When processing traffic, rules affecting the FORWARD chain that specify an ingress interface process before rules that match on an egress interface. As a workaround, rules that only affect the egress interface can have an ingress interface wildcard (only swp+ and bond+) that matches any interface you apply so that you can maintain order of operations with other input interface rules. For example, with the following rules:
-A FORWARD -i swp1 -j ACCEPT
-A FORWARD -o swp1 -j ACCEPT <-- This rule processes LAST (because of egress interface matching)
-A FORWARD -i swp2 -j DROP
If you modify the rules like this, they process in order:
-A FORWARD -i swp1 -j ACCEPT
-A FORWARD -i swp+ -o swp1 -j ACCEPT <-- These rules process in order (because of the wildcard match on the ingress interface)
-A FORWARD -i swp2 -j DROP
When using rules that do a mangle and a filter lookup for a packet, Cumulus Linux processes them in parallel and combines the action.
If there is no ingress interface or egress interface match, Cumulus Linux installs FORWARD chain rules in ingress by default.
When using the OUTPUT chain, you must assign rules to the source. For example, if you assign a rule to the switch port in the direction of traffic but the source is a bridge (VLAN), the rule does not affect the traffic and you must apply it to the bridge.
If you need to apply a rule to all transit traffic, use the FORWARD chain, not the OUTPUT chain.
The switch puts ebtables rules into either the IPv4 or IPv6 memory space depending on whether the rule uses IPv4 or IPv6 to make a decision. The switch only puts layer 2 rules that match the MAC address into the IPv4 memory space.
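The ingress-before-egress behavior of FORWARD chain rules described in this list can be approximated with a stable sort. This is a simplified model under stated assumptions: processing_order is a hypothetical helper, rules are plain strings without trailing annotations, and the actual ordering in the ASIC depends on switchd.

```python
def processing_order(rules):
    """Order FORWARD rules the way the text describes: rules that name
    an ingress interface (-i), or no interface at all, come first;
    rules that match only on an egress interface (-o) come last.
    sorted() is stable, so the written order is kept within each group."""
    def is_egress_only(rule):
        tokens = rule.split()
        return "-o" in tokens and "-i" not in tokens
    return sorted(rules, key=is_egress_only)


original = [
    "-A FORWARD -i swp1 -j ACCEPT",
    "-A FORWARD -o swp1 -j ACCEPT",
    "-A FORWARD -i swp2 -j DROP",
]
print(processing_order(original))

# Adding an ingress wildcard keeps the written order.
modified = [
    "-A FORWARD -i swp1 -j ACCEPT",
    "-A FORWARD -i swp+ -o swp1 -j ACCEPT",
    "-A FORWARD -i swp2 -j DROP",
]
print(processing_order(modified) == modified)
```

In the first list the egress-only rule moves to the end; in the second list the swp+ wildcard makes every rule an ingress rule, so the order is unchanged.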
Rule Placement in Memory
INPUT and ingress (FORWARD -i) rules occupy the same memory space. A rule counts as ingress if you set the -i option. If you set both input and output options (-i and -o), the switch considers the rule as ingress and the rule occupies that memory space. For example:
If you remove the -o option and the interface, it is a valid rule.
Nonatomic Update Mode and Atomic Update Mode
Cumulus Linux enables atomic update mode by default. However, this mode limits the number of ACL rules that you can configure.
To increase the number of configurable ACL rules, configure the switch to operate in nonatomic mode.
Instead of reserving 50% of your TCAM space for atomic updates, incremental update uses the available free space to write the new TCAM rules and swap over to the new rules after this is complete. Cumulus Linux then deletes the old rules and frees up the original TCAM space. If there is insufficient free space to complete this task, the original nonatomic update runs, which interrupts traffic.
You can enable nonatomic updates for switchd, which offer better scaling because all TCAM resources actively impact traffic. With atomic updates, half of the hardware resources are on standby and do not actively impact traffic.
Incremental nonatomic updates are table based, so they do not interrupt network traffic when you install new rules. The rules map to the following tables and update in this order:
mirror (ingress only)
ipv4-mac (can be both ingress and egress)
ipv6 (ingress only)
The incremental nonatomic update operation follows this order:
Updates are incremental, one table at a time without stopping traffic.
Cumulus Linux checks if the rules in a table are different from installation time; if a table does not have any changes, it does not reinstall the rules.
If there are changes in a table, the new rules populate in new groups or slices in hardware, then that table switches over to the new groups or slices.
Finally, old resources for that table free up. This process repeats for each of the tables listed above.
If there are insufficient resources to hold both the new rule set and old rule set, Cumulus Linux tries the regular nonatomic mode, which interrupts network traffic.
If the regular nonatomic update fails, Cumulus Linux reverts to the previous rules.
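The decision sequence above can be summarized in a small planning function. This is a sketch, not switchd logic: it assumes each table has a fixed capacity counted in rules, that the decision is made per table, and that a regular nonatomic update fails only when the new rules alone do not fit. All names are hypothetical.

```python
TABLE_ORDER = ("mirror", "ipv4-mac", "ipv6")  # update order from the text


def plan_update(old, new, capacity):
    """Return the action taken for each table, in update order."""
    plan = {}
    for table in TABLE_ORDER:
        old_rules = old.get(table, [])
        new_rules = new.get(table, [])
        limit = capacity[table]
        if new_rules == old_rules:
            plan[table] = "unchanged"          # no reinstall
        elif len(old_rules) + len(new_rules) <= limit:
            plan[table] = "incremental"        # traffic keeps flowing
        elif len(new_rules) <= limit:
            plan[table] = "regular-nonatomic"  # traffic is interrupted
        else:
            plan[table] = "revert"             # keep the previous rules
    return plan


old = {"mirror": ["m1"], "ipv4-mac": ["a", "b", "c"], "ipv6": ["x", "y", "z"]}
new = {"mirror": ["m1"], "ipv4-mac": ["a", "b", "d"], "ipv6": ["x", "y", "w"]}
print(plan_update(old, new, {"mirror": 4, "ipv4-mac": 8, "ipv6": 4}))
```

With these assumed capacities, the mirror table is left alone, the ipv4-mac table has room for both rule sets and updates incrementally, and the ipv6 table falls back to the regular nonatomic update.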
To always reload switchd with nonatomic updates:
Edit /etc/cumulus/switchd.conf.
Add the following line to the file:
acl.non_atomic_update_mode = TRUE
Reload switchd with the sudo systemctl reload switchd.service command for the changes to take effect. The reload does not interrupt network services.
During regular non-incremental nonatomic updates, traffic stops, then continues after all the new configuration is in the hardware.
Use iptables, ip6tables, and ebtables Directly
Do not use iptables, ip6tables, or ebtables directly; installed rules only apply to the Linux kernel and Cumulus Linux does not hardware accelerate them. When you run cl-acltool -i, Cumulus Linux resets all rules and deletes anything that is not in /etc/cumulus/acl/policy.conf.
For example, the following rule appears to work:
cumulus@switch:~$ sudo iptables -A INPUT -p icmp --icmp-type echo-request -j DROP
The cl-acltool -L command shows the rule:
cumulus@switch:~$ sudo cl-acltool -L ip
-------------------------------
Listing rules of type iptables:
-------------------------------
TABLE filter :
Chain INPUT (policy ACCEPT 72 packets, 5236 bytes)
pkts bytes target prot opt in out source destination
0 0 DROP icmp -- any any anywhere anywhere icmp echo-request
However, Cumulus Linux does not synchronize the rule to hardware. Running cl-acltool -i or rebooting the switch removes the rule without replacing it. To ensure that Cumulus Linux hardware accelerates all rules that can be in hardware, add them to /etc/cumulus/acl/policy.conf and install them with the cl-acltool -i command.
Estimate the Number of Rules
To estimate the number of rules you can create from an ACL entry, first determine if that entry is an ingress or an egress. Then, determine if it is an IPv4-mac or IPv6 type rule. This determines the slice to which the rule belongs. Use the following to determine how many entries the switch uses for each type.
By default, each entry occupies one double wide entry, except if the entry is one of the following:
An entry with multiple comma-separated input interfaces splits into one rule for each input interface. For example, this entry splits into two rules:
-A FORWARD -i swp1s0,swp1s1 -p icmp -j ACCEPT
An entry with multiple comma-separated output interfaces splits into one rule for each output interface. This entry splits into two rules:
-A FORWARD -i swp+ -o swp1s0,swp1s1 -p icmp -j ACCEPT
An entry with both input and output comma-separated interfaces splits into one rule for each combination of input and output interface. This entry splits into four rules:
-A FORWARD -i swp1s0,swp1s1 -o swp1s2,swp1s3 -p icmp -j ACCEPT
An entry with multiple layer 4 port ranges splits into one rule for each range. For example, this entry splits into two rules:
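The interface-based splitting described in this list can be estimated with a short helper. It is a sketch only: expanded_rule_count is a hypothetical name, it looks at the -i and -o options alone, and it does not model layer 4 port ranges or the width of each hardware entry.

```python
def expanded_rule_count(rule):
    """Estimate how many rules one entry becomes after splitting on
    comma-separated input (-i) and output (-o) interfaces."""
    tokens = rule.split()
    count = 1
    for flag in ("-i", "-o"):
        if flag in tokens:
            idx = tokens.index(flag)
            if idx + 1 < len(tokens):
                # One rule per listed interface; combinations multiply.
                count *= len(tokens[idx + 1].split(","))
    return count


print(expanded_rule_count("-A FORWARD -i swp1s0,swp1s1 -p icmp -j ACCEPT"))                 # 2
print(expanded_rule_count("-A FORWARD -i swp+ -o swp1s0,swp1s1 -p icmp -j ACCEPT"))         # 2
print(expanded_rule_count("-A FORWARD -i swp1s0,swp1s1 -o swp1s2,swp1s3 -p icmp -j ACCEPT"))  # 4
```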
You can match on VLAN IDs on layer 2 interfaces for ingress rules. The following example matches on a VLAN and DSCP class, and sets the internal class of the packet. For extended matching on IP fields, combine this rule with ingress iptables rules.
[ebtables]
-A FORWARD -p 802_1Q --vlan-id 100 -j mark --mark-set 102
[iptables]
-A FORWARD -i swp31 -m mark --mark 102 -m dscp --dscp-class CS1 -j SETCLASS --class 2
Cumulus Linux reserves mark values between 0 and 100; for example, if you use --mark-set 10, you see an error. Use mark values between 101 and 4196.
You cannot mark multiple VLANs with the same value.
If you enable EVPN-MH and configure VLAN match rules in ebtables with a mark target, the ebtables rule might overwrite the mark set by traffic class rules you configure for EVPN-MH on ingress. Egress EVPN-MH traffic class rules that match the ingress traffic class mark might not get hit. To work around this issue, add ebtables rules to ACCEPT the packets already marked by EVPN-MH traffic class rules on ingress.
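The mark constraints above (values 0 through 100 are reserved, the usable range is 101 through 4196, and each VLAN needs its own value) can be checked before you write the ebtables rules. The following Python sketch is illustrative only; the function name check_vlan_marks and the VLAN-to-mark dictionary are hypothetical and are not part of any Cumulus Linux tool.

```python
RESERVED_MAX = 100  # marks 0 through 100 are reserved by Cumulus Linux
MARK_MAX = 4196     # highest usable mark value stated in the text above


def check_vlan_marks(vlan_to_mark):
    """Return a list of problems found in a {vlan_id: mark} plan."""
    problems = []
    first_vlan_for_mark = {}
    for vlan, mark in sorted(vlan_to_mark.items()):
        if not RESERVED_MAX < mark <= MARK_MAX:
            problems.append(f"VLAN {vlan}: mark {mark} is outside 101-4196")
        if mark in first_vlan_for_mark:
            # The same mark cannot be used for more than one VLAN.
            problems.append(
                f"VLAN {vlan}: mark {mark} is already used by VLAN "
                f"{first_vlan_for_mark[mark]}"
            )
        else:
            first_vlan_for_mark[mark] = vlan
    return problems


print(check_vlan_marks({100: 102}))            # valid plan
print(check_vlan_marks({100: 10}))             # reserved value
print(check_vlan_marks({100: 102, 200: 102}))  # duplicate mark
```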
Install and Manage ACL Rules with NVUE
Instead of crafting a rule by hand, then installing it with cl-acltool, you can use NVUE commands. Cumulus Linux converts the commands to the /etc/cumulus/acl/policy.d/50_nvue.rules file. The rules you create with NVUE are independent of the default files /etc/cumulus/acl/policy.d/00control_plane.rules and 99control_plane_catch_all.rules.
Cumulus Linux 5.0 and later uses the -t mangle -A PREROUTING chain for ingress rules and the -t mangle -A POSTROUTING chain for egress rules instead of the -A FORWARD chain used in previous releases.
To create this rule with NVUE, follow the steps below. NVUE adds all options in the rule automatically.
Set the rule type, the matching protocol, source IP address and port, destination IP address and port, and the action. You must provide a name for the rule (EXAMPLE1 in the commands below):
cumulus@switch:~$ nv set acl EXAMPLE1 type ipv4
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip protocol tcp
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip source-ip 10.0.14.2/32
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip source-port ANY
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip dest-ip 10.0.15.8/32
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip dest-port ANY
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 action permit
Apply the rule to an inbound or outbound interface with the nv set interface <interface> acl command.
For rules affecting the -t mangle -A PREROUTING chain (-A FORWARD in previous releases), apply the rule to an inbound or outbound interface. For example:
cumulus@switch:~$ nv set interface swp1 acl EXAMPLE1 inbound
cumulus@switch:~$ nv config apply
To see all installed rules, examine the /etc/cumulus/acl/policy.d/50_nvue.rules file:
cumulus@switch:~$ sudo cat /etc/cumulus/acl/policy.d/50_nvue.rules
[iptables]
## ACL EXAMPLE1 in dir inbound on interface swp1 ##
-t mangle -A PREROUTING -i swp1 -s 10.0.14.2/32 -d 10.0.15.8/32 -p tcp -j ACCEPT
...
To remove this rule, run the nv unset acl <acl-name> and nv unset interface <interface> acl <acl-name> commands. These commands delete the rule from the /etc/cumulus/acl/policy.d/50_nvue.rules file.
To show ACL statistics per interface, such as the total number of bytes that match the ACL rule, run the nv show interface <interface-id> acl <acl-id> statistics or nv show interface <interface-id> acl <acl-id> statistics <rule-id> command.
To see the list of all NVUE ACL commands, run the nv list-commands acl command.
Install and Manage ACL Rules with cl-acltool
You can manage Cumulus Linux ACLs with cl-acltool. Rules write first to the iptables chains, as described above, and then synchronize to hardware through switchd.
To examine the current state of chains and list all installed rules, run:
cumulus@switch:~$ sudo cl-acltool -L all
-------------------------------
Listing rules of type iptables:
-------------------------------
TABLE filter :
Chain INPUT (policy ACCEPT 90 packets, 14456 bytes)
pkts bytes target prot opt in out source destination
0 0 DROP all -- swp+ any 240.0.0.0/5 anywhere
0 0 DROP all -- swp+ any loopback/8 anywhere
0 0 DROP all -- swp+ any base-address.mcast.net/8 anywhere
0 0 DROP all -- swp+ any 255.255.255.255 anywhere ...
To list installed rules using native iptables, ip6tables and ebtables, use the -L option with the respective commands:
If the install fails, ACL rules in the kernel and hardware roll back to the previous state. You also see errors from programming rules in the kernel or ASIC.
Install Packet Filtering (ACL) Rules
cl-acltool takes access control list (ACL) rule input in files. Each ACL policy file includes iptables, ip6tables and ebtables categories under the tags [iptables], [ip6tables] and [ebtables]. You must assign each rule in an ACL policy to one of the rule categories.
See man cl-acltool(5) for ACL rule details. For iptables rule syntax, see man iptables(8). For ip6tables rule syntax, see man ip6tables(8). For ebtables rule syntax, see man ebtables(8).
See man cl-acltool(5) and man cl-acltool(8) for more details on using cl-acltool.
By default:
ACL policy files are in /etc/cumulus/acl/policy.d/.
All *.rules files in the /etc/cumulus/acl/policy.d/ directory are also in /etc/cumulus/acl/policy.conf.
All files in the policy.conf file install when the switch boots up.
The policy.conf file expects rule files to have a .rules suffix as part of the file name.
Here is an example ACL policy file:
[iptables]
-A INPUT -i swp1 -p tcp --dport 80 -j ACCEPT
-A FORWARD -i swp1 -p tcp --dport 80 -j ACCEPT
[ip6tables]
-A INPUT -i swp1 -p tcp --dport 80 -j ACCEPT
-A FORWARD -i swp1 -p tcp --dport 80 -j ACCEPT
[ebtables]
-A INPUT -p IPv4 -j ACCEPT
-A FORWARD -p IPv4 -j ACCEPT
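To make the category layout concrete, the following Python sketch groups the rules of a policy file under the [iptables], [ip6tables], and [ebtables] tags. It is a simplified reader for illustration and is not the parser that cl-acltool uses. The function name parse_policy is hypothetical, and skipping NAME = value lines that appear before the first tag is an assumption about how variable definitions are laid out.

```python
CATEGORIES = ("iptables", "ip6tables", "ebtables")


def parse_policy(text):
    """Group the rules in an ACL policy file by category tag."""
    rules = {name: [] for name in CATEGORIES}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            if current not in rules:
                raise ValueError(f"unknown category: {current}")
            continue
        if current is None:
            if "=" in line and not line.startswith("-"):
                continue  # assumption: a variable definition such as NAME = value
            raise ValueError(f"rule outside a category: {line}")
        rules[current].append(line)
    return rules


EXAMPLE = """
[iptables]
-A INPUT -i swp1 -p tcp --dport 80 -j ACCEPT
-A FORWARD -i swp1 -p tcp --dport 80 -j ACCEPT
[ebtables]
-A INPUT -p IPv4 -j ACCEPT
"""
for category, lines in parse_policy(EXAMPLE).items():
    print(category, len(lines))
```

A rule that appears before any tag raises an error in this sketch, which mirrors the requirement that every rule belongs to one category.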
You can use wildcards or variables to specify chain and interface lists.
You can only use swp+ and bond+ as wildcard names.
swp+ rules apply as an aggregate, not per port. If you want to apply per port policing, specify a specific port instead of the wildcard.
You can write ACL rules for the system into multiple files under the default /etc/cumulus/acl/policy.d/ directory. The ordering of rules during installation follows the sort order of the files according to their file names.
Use multiple files to stack rules. The example below shows two rule files that separate rules for management and datapath traffic:
cumulus@switch:~$ ls /etc/cumulus/acl/policy.d/
00sample_mgmt.rules 01sample_datapath.rules
cumulus@switch:~$ cat /etc/cumulus/acl/policy.d/00sample_mgmt.rules
INGRESS_INTF = swp+
INGRESS_CHAIN = INPUT
[iptables]
# protect the switch management
-A $INGRESS_CHAIN -i $INGRESS_INTF -s 10.0.14.2 -d 10.0.15.8 -p tcp -j ACCEPT
-A $INGRESS_CHAIN -i $INGRESS_INTF -s 10.0.11.2 -d 10.0.12.8 -p tcp -j ACCEPT
-A $INGRESS_CHAIN -i $INGRESS_INTF -d 10.0.16.8 -p udp -j DROP
cumulus@switch:~$ cat /etc/cumulus/acl/policy.d/01sample_datapath.rules
INGRESS_INTF = swp+
INGRESS_CHAIN = INPUT, FORWARD
[iptables]
-A $INGRESS_CHAIN -i $INGRESS_INTF -s 192.0.2.5 -p icmp -j ACCEPT
-A $INGRESS_CHAIN -i $INGRESS_INTF -s 192.0.2.6 -d 192.0.2.4 -j DROP
-A $INGRESS_CHAIN -i $INGRESS_INTF -s 192.0.2.2 -d 192.0.2.8 -j DROP
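Because installation order follows the sort order of the file names, you can preview the order before installing. The Python sketch below assumes a plain character-by-character sort and considers only names that end in .rules; the function name install_order is hypothetical, and the exact collation that cl-acltool applies is not stated in this guide.

```python
def install_order(file_names):
    """Return the .rules files in the order they would sort by name."""
    return sorted(name for name in file_names if name.endswith(".rules"))


files = [
    "99control_plane_catch_all.rules",
    "50_nvue.rules",
    "01sample_datapath.rules",
    "00sample_mgmt.rules",
    "00control_plane.rules",
    "notes.txt",  # ignored: no .rules suffix
]
for name in install_order(files):
    print(name)
```

Numeric prefixes such as 00, 01, 50, and 99 make the intended stacking explicit, which is why the default files and the NVUE-generated file use them.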
Apply all rules and policies included in /etc/cumulus/acl/policy.conf:
cumulus@switch:~$ sudo cl-acltool -i
Specify the Policy Files to Install
By default, Cumulus Linux installs any .rules file you configure in /etc/cumulus/acl/policy.d/. To add other policy files to an ACL, you need to include them in /etc/cumulus/acl/policy.conf. For example, for Cumulus Linux to install a rule in a policy file called 01_new.datapathacl, add include /etc/cumulus/acl/policy.d/01_new.datapathacl to policy.conf:
cumulus@switch:~$ sudo nano /etc/cumulus/acl/policy.conf
#
# This file is a master file for acl policy file inclusion
#
# Note: This is not a file where you list acl rules.
#
# This file can contain:
# - include lines with acl policy files
# example:
# include <filepath>
#
# see manpage cl-acltool(5) and cl-acltool(8) for how to write policy files
#
include /etc/cumulus/acl/policy.d/01_new.datapathacl
Hardware Limitations on Number of Rules
The maximum number of rules that the hardware can process depends on:
The mix of IPv4 and IPv6 rules; Cumulus Linux does not support the maximum number of rules for both IPv4 and IPv6 simultaneously.
The number of default rules that Cumulus Linux provides.
Whether the rules apply on ingress or egress.
Whether the rules are in atomic or nonatomic mode; Cumulus Linux uses nonatomic mode rules when you enable nonatomic updates (see above).
If you exceed the maximum number of rules for a particular table, cl-acltool -i generates the following error:
error: hw sync failed (sync_acl hardware installation failed) Rolling back .. failed.
In the table below, the default rules count toward the limits listed. The raw limits below assume only one ingress and one egress table are present.
The NVIDIA Spectrum ASIC has one common TCAM for both ingress and egress, which you can use for other non-ACL-related resources. However, the number of supported rules varies with the TCAM profile for the switch.
Profile       Atomic Mode IPv4 Rules  Atomic Mode IPv6 Rules  Nonatomic Mode IPv4 Rules  Nonatomic Mode IPv6 Rules
------------  ----------------------  ----------------------  -------------------------  -------------------------
default       500                     250                     1000                       500
ipmc-heavy    750                     500                     1500                       1000
acl-heavy     1750                    1000                    3500                       2000
ipmc-max      1000                    500                     2000                       1000
ip-acl-heavy  6000                    0                       12000                      0
Even though the table above specifies the ip-acl-heavy profile supports no IPv6 rules, Cumulus Linux does not prevent you from configuring IPv6 rules. However, there is no guarantee that IPv6 rules work under the ip-acl-heavy profile.
The ip-acl-heavy profile shows an updated number of supported atomic mode and nonatomic mode IPv4 rules. The previously published numbers were 7500 for atomic mode and 15000 for nonatomic mode IPv4 rules.
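As a quick planning aid, the per-profile limits in the table above can be encoded and compared against a planned rule count. The Python sketch below is a necessary check only: the function name within_limits is hypothetical, the counts you pass in must already include the default rules, and passing the check does not guarantee a successful install because the switch does not support the IPv4 and IPv6 maximums at the same time.

```python
# (IPv4 limit, IPv6 limit) per TCAM profile and update mode, from the table above.
LIMITS = {
    "default":      {"atomic": (500, 250),  "nonatomic": (1000, 500)},
    "ipmc-heavy":   {"atomic": (750, 500),  "nonatomic": (1500, 1000)},
    "acl-heavy":    {"atomic": (1750, 1000), "nonatomic": (3500, 2000)},
    "ipmc-max":     {"atomic": (1000, 500), "nonatomic": (2000, 1000)},
    "ip-acl-heavy": {"atomic": (6000, 0),   "nonatomic": (12000, 0)},
}


def within_limits(profile, mode, ipv4_rules, ipv6_rules):
    """Return True if neither rule count exceeds its individual limit."""
    max_ipv4, max_ipv6 = LIMITS[profile][mode]
    return ipv4_rules <= max_ipv4 and ipv6_rules <= max_ipv6


print(within_limits("default", "atomic", 400, 100))
print(within_limits("ip-acl-heavy", "nonatomic", 12000, 1))
```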
Supported Rule Types
The iptables/ip6tables/ebtables construct tries to layer the Linux implementation on top of the underlying hardware but they are not always directly compatible. Here are the supported rules for chains in iptables, ip6tables and ebtables.
To learn more about any of the options shown in the tables below, run iptables -h [name of option]. The same help syntax works for options for ip6tables and ebtables.
root@leaf1# ebtables -h tricolorpolice
...
tricolorpolice option:
--set-color-mode STRING setting the mode in blind or aware
--set-cir INT setting committed information rate in kbits per second
--set-cbs INT setting committed burst size in kbyte
--set-pir INT setting peak information rate in kbits per second
--set-ebs INT setting excess burst size in kbyte
--set-conform-action-dscp INT setting dscp value if the action is accept for conforming packets
--set-exceed-action-dscp INT setting dscp value if the action is accept for exceeding packets
--set-violate-action STRING setting the action (accept/drop) for violating packets
--set-violate-action-dscp INT setting dscp value if the action is accept for violating packets
Supported chains for the filter table:
INPUT FORWARD OUTPUT
Rules with input/output Ethernet interfaces do not apply; inverse matches
Standard Targets
Supported: ACCEPT, DROP
Unsupported: RETURN, QUEUE, STOP, Fall Thru, Jump
Extended Targets
LOG (IPv4/IPv6); UID is not supported for LOG; TCP SEQ, TCP options or IP options; ULOG; SETQOS; DSCP; unique to Cumulus Linux: SPAN, ERSPAN (IPv4/IPv6), POLICE, TRICOLORPOLICE, SETCLASS
ebtables Rule Support
Rule Element
Supported
Unsupported
Matches
ether type; input interface/wildcard; output interface/wildcard; Src/Dst MAC; IP: src, dest, tos, proto, sport, dport; IPv6: tclass, icmp6: type, icmp6: code range, src/dst addr, sport, dport; 802.1p (CoS); VLAN
Rules that have no matches and accept all packets in a chain are currently ignored.
Chain default rules (that are ACCEPT) are also ignored.
Considerations
Splitting rules across the ingress TCAM and the egress TCAM causes the ingress IPv6 part of the rule to match packets going to all destinations, which can interfere with the regular expected linear rule match in a sequence. For example:
A higher rule can prevent a lower rule from matching:
Rule 1 matches all icmp6 packets to all out interfaces in the ingress TCAM.
This prevents rule 2 from matching, which is more specific but with a different out interface. Make sure to put more specific matches above more general matches even if the output interfaces are different.
When you have two rules with the same output interface, the lower rule might match depending on the presence of the previous rules.
Rule 1: -A FORWARD -o vlan100 -p icmp6 -j ACCEPT
Rule 2: -A FORWARD -o vlan101 -s 00::01 -j DROP
Rule 3: -A FORWARD -o vlan101 -p icmp6 -j ACCEPT
Rule 3 still matches for an icmp6 packet with source IP address 00::01 going out of vlan101. Rule 1 interferes with the normal function of rule 2 and/or rule 3.
When you have two adjacent rules with the same match and different output interfaces, such as:
Rule 1: -A FORWARD -o vlan100 -p icmp6 -j ACCEPT
Rule 2: -A FORWARD -o vlan101 -p icmp6 -j DROP
Rule 2 never matches on ingress. Both rules share the same mark.
Common Examples
Data Plane Policers
You can configure quality of service for traffic on the data plane. By using QoS policers, you can rate limit traffic so incoming packets get dropped if they exceed specified thresholds.
Counters on POLICE ACL rules in iptables do not show dropped packets due to those rules.
The following example rate limits the incoming traffic on swp1 to 400 packets per second with a burst of 200 packets:
cumulus@switch:~$ nv set acl example1 type ipv4
cumulus@switch:~$ nv set acl example1 rule 10 action police
cumulus@switch:~$ nv set acl example1 rule 10 action police mode packet
cumulus@switch:~$ nv set acl example1 rule 10 action police burst 200
cumulus@switch:~$ nv set acl example1 rule 10 action police rate 400
cumulus@switch:~$ nv set interface swp1 acl example1 inbound
cumulus@switch:~$ nv config apply
Use the POLICE target with iptables. POLICE takes these arguments:
--set-rate value specifies the maximum rate in kilobytes (KB) or packets.
--set-burst value specifies the number of packets or kilobytes (KB) allowed to arrive sequentially.
--set-mode string sets the mode in KB (kilobytes) or pkt (packets) for rate and burst size.
For example, to rate limit the incoming traffic on swp1 to 400 packets per second with a burst of 200 packets, set this rule in the appropriate .rules file:
-t mangle -A PREROUTING -i swp1 -j POLICE --set-mode pkt --set-rate 400 --set-burst 200
You can configure quality of service for traffic on the control plane and rate limit traffic so incoming packets drop if they exceed certain thresholds in the following ways:
Run NVUE commands.
Edit the /etc/cumulus/control-plane/policers.conf file.
Cumulus Linux 5.0 and later no longer uses INPUT chain rules to configure control plane policers.
To configure control plane policers:
Set the burst rate for the trap group with the nv set system control-plane policer <trap-group> burst <value> command. The burst rate is the number of packets or kilobytes (KB) allowed to arrive sequentially.
Set the forwarding rate for the trap group with the nv set system control-plane policer <trap-group> rate <value> command. The forwarding rate is the maximum rate in kilobytes (KB) or packets.
The trap group can be: arp, bfd, pim-ospf-rip, bgp, clag, icmp-def, dhcp-ptp, igmp, ssh, icmp6-neigh, icmp6-def-mld, lacp, lldp, rpvst, eapol, ip2me, acl-log, nat, stp, l3-local, span-cpu, catch-all, or NONE.
The following example changes the PIM trap group forwarding rate and burst rate to 400 packets per second, and the IGMP trap group forwarding rate to 400 packets per second and burst rate to 200 packets per second:
cumulus@switch:~$ nv set system control-plane policer pim-ospf-rip rate 400
cumulus@switch:~$ nv set system control-plane policer pim-ospf-rip burst 400
cumulus@switch:~$ nv set system control-plane policer pim-ospf-rip state on
cumulus@switch:~$ nv set system control-plane policer igmp rate 400
cumulus@switch:~$ nv set system control-plane policer igmp burst 200
cumulus@switch:~$ nv config apply
To rate limit traffic using the /etc/cumulus/control-plane/policers.conf file, you:
Enable an individual policer for a trap group (set enable to TRUE).
Set the policer rate in packets per second. The forwarding rate is the maximum rate in kilobytes (KB) or packets.
Set the policer burst rate in packets per second. The burst rate is the number of packets or kilobytes (KB) allowed to arrive sequentially.
After you edit the /etc/cumulus/control-plane/policers.conf file, you must reload the file with the /usr/lib/cumulus/switchdctl --load /etc/cumulus/control-plane/policers.conf command.
When enable is FALSE for a trap group, the trap group and catch-all trap group have a shared policer. When enable is TRUE, Cumulus Linux creates an individual policer for the trap group.
The following example changes the PIM trap group forwarding rate and burst rate to 400 packets per second, and the IGMP trap group forwarding rate to 400 packets per second and burst rate to 200 packets per second:
To show the control plane policer configuration and statistics, run the NVUE nv show system control-plane policer --view=statistics command.
Cumulus Linux provides default control plane policer values. You can adjust these values to accommodate higher scale requirements for specific protocols as needed.
You can configure control plane ACLs to apply a single rule for all packets forwarded to the CPU regardless of the source interface or destination interface on the switch. Control plane ACLs allow you to regulate traffic forwarded to applications on the switch with more granularity than traps and to configure ACLs to block SSH from specific addresses or subnets.
Cumulus Linux applies inbound control plane ACLs in the INPUT chain and outbound control plane ACLs in the OUTPUT chain.
Cumulus Linux does not support a deny all control plane rule. This type of rule blocks traffic for interprocess communication and impacts overall system functionality.
The following example command applies the input control plane ACL called ACL1.
cumulus@switch:~$ nv set system control-plane acl ACL1 inbound
cumulus@switch:~$ nv config apply
The following example command applies the output control plane ACL called ACL2.
cumulus@switch:~$ nv set system control-plane acl ACL2 outbound
cumulus@switch:~$ nv config apply
To show statistics for all control-plane ACLs, run the nv show system control-plane acl command:
cumulus@switch:~$ nv show system control-plane acl
ACL Name Rule ID In Packets In Bytes Out Packets Out Bytes
--------- ------- ---------- -------- ----------- ---------
acl1 1 0 0 0 0
65535 0 0 0 0
acl2 1 0 0 0 0
65535 0 0 0 0
To show statistics for a specific control-plane ACL, run the nv show system control-plane acl <acl_name> statistics command:
cumulus@switch:~$ nv show system control-plane acl ACL1 statistics
Rule In Packet In Byte Out Packet Out Byte Summary
---- --------- ------- ---------- -------- ---------------------------
1 0 0 Bytes 0 0 Bytes match.ip.dest-ip: 9.1.2.3
2 0 0 Bytes 0 0 Bytes match.ip.source-ip: 7.8.2.3
Set DSCP on Transit Traffic
The examples here use the mangle table to modify the packet as it transits the switch. DSCP is in decimal notation in the examples below.
[iptables]
#Set SSH as high priority traffic.
-t mangle -A PREROUTING -i swp+ -p tcp -m multiport --dports 22 -j SETQOS --set-dscp 46
#Set everything coming in swp1 as AF13
-t mangle -A PREROUTING -i swp1 -j SETQOS --set-dscp 14
#Set Packets destined for 10.0.100.27 as best effort
-t mangle -A PREROUTING -i swp+ -d 10.0.100.27/32 -j SETQOS --set-dscp 0
#Example using a range of ports for TCP traffic
-t mangle -A PREROUTING -i swp+ -s 10.0.0.17/32 -d 10.0.100.27/32 -p tcp -m multiport --sports 10000:20000 -m multiport --dports 10000:20000 -j SETQOS --set-dscp 34
Apply the rule:
cumulus@switch:~$ sudo cl-acltool -i
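The decimal values in the rules above follow the standard DSCP code points: 14 is AF13 as the comment states, 46 is Expedited Forwarding (EF), 34 is AF41, and 0 is the default best effort class. The Python sketch below shows the conversion for anyone who thinks in class names rather than decimals; the function name dscp_decimal is hypothetical and not part of Cumulus Linux.

```python
def dscp_decimal(name):
    """Return the decimal DSCP value for a class name such as EF, AF13, or CS1."""
    name = name.upper()
    if name == "EF":
        return 46
    # Class selector CSn is n shifted into the top three bits: 8 * n.
    if name.startswith("CS") and name[2:].isdigit() and 0 <= int(name[2:]) <= 7:
        return 8 * int(name[2:])
    # Assured forwarding AFxy is 8 * class + 2 * drop precedence.
    if (name.startswith("AF") and len(name) == 4 and name[2:].isdigit()
            and 1 <= int(name[2]) <= 4 and 1 <= int(name[3]) <= 3):
        return 8 * int(name[2]) + 2 * int(name[3])
    raise ValueError(f"unknown DSCP class: {name}")


for cls in ("EF", "AF13", "AF41", "CS0", "CS1"):
    print(cls, dscp_decimal(cls))
```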
To set SSH as high priority traffic:
cumulus@switch:~$ nv set acl EXAMPLE1 type ipv4
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip protocol tcp
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip dest-port 22
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 action set dscp 46
cumulus@switch:~$ nv set interface swp1-48 acl EXAMPLE1 inbound
cumulus@switch:~$ nv config apply
To set everything coming in swp1 as AF13:
cumulus@switch:~$ nv set acl EXAMPLE1 type ipv4
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 action set dscp 14
cumulus@switch:~$ nv set interface swp1 acl EXAMPLE1 inbound
cumulus@switch:~$ nv config apply
To set packets destined for 10.0.100.27 as best effort:
cumulus@switch:~$ nv set acl EXAMPLE1 type ipv4
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip dest-ip 10.0.100.27/32
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 action set dscp 0
cumulus@switch:~$ nv set interface swp1-48 acl EXAMPLE1 inbound
cumulus@switch:~$ nv config apply
To use a range of ports for TCP traffic:
cumulus@switch:~$ nv set acl EXAMPLE1 type ipv4
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip protocol tcp
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip source-ip 10.0.0.17/32
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip source-port 10000:20000
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip dest-ip 10.0.100.27/32
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip dest-port 10000:20000
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 action set dscp 34
cumulus@switch:~$ nv set interface swp1-48 acl EXAMPLE1 inbound
cumulus@switch:~$ nv config apply
To specify all ports on the switch in NVUE (swp+ in an iptables rule), you must set the range of interfaces on the switch as in the examples above (nv set interface swp1-48). This command creates as many rules in the /etc/cumulus/acl/policy.d/50_nvue.rules file as the number of interfaces in the range you specify.
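To see why a range such as swp1-48 produces many rules, the following Python sketch expands the range into individual interface names, one per generated rule. The function name expand_interface_range is hypothetical, and it handles only the simple prefix-number-dash-number form used in the examples above; it does not reproduce the full NVUE range syntax.

```python
import re


def expand_interface_range(spec):
    """Expand a simple range such as swp1-48 into individual interface names."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)-(\d+)", spec)
    if not match:
        return [spec]  # not a range: treat it as a single interface
    prefix = match.group(1)
    first, last = int(match.group(2)), int(match.group(3))
    return [f"{prefix}{number}" for number in range(first, last + 1)]


names = expand_interface_range("swp1-48")
print(len(names), names[0], names[-1])
```

With one ACL rule applied to swp1-48, this expansion suggests 48 lines in the 50_nvue.rules file, which is worth keeping in mind when you compare rule counts against hardware limits.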
Filter Specific TCP Flags
The example rule below drops ingress IPv4 TCP packets that have the SYN bit set and the RST, ACK, and FIN bits clear. The rule applies inbound on interface swp1. After configuring this rule, you cannot establish new TCP sessions that originate from ingress port swp1. You can establish TCP sessions that originate from any other port.
-t mangle -A PREROUTING -i swp1 -p tcp --tcp-flags ACK,SYN,FIN,RST SYN -j DROP
Apply the rule:
cumulus@switch:~$ sudo cl-acltool -i
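The --tcp-flags option takes two lists: the flags to examine (the mask) and the flags that must be set among them (the compare list); every other examined flag must be clear. The Python sketch below models that comparison for the rule above. It is a teaching aid, not switch code, and the helper names are hypothetical.

```python
# TCP flag bit positions as they appear in the TCP header.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def to_bits(names):
    """Combine flag names into a single bit field."""
    bits = 0
    for name in names:
        bits |= FLAG_BITS[name]
    return bits


def tcp_flags_match(packet_flags, mask, compare):
    """Return True if the examined flags equal the compare list exactly."""
    return (to_bits(packet_flags) & to_bits(mask)) == to_bits(compare)


MASK = ["ACK", "SYN", "FIN", "RST"]
COMPARE = ["SYN"]
print(tcp_flags_match(["SYN"], MASK, COMPARE))         # initial SYN
print(tcp_flags_match(["SYN", "ACK"], MASK, COMPARE))  # SYN-ACK reply
```

The initial SYN of a new connection matches and is dropped, while a SYN-ACK reply does not match, which is why sessions initiated from other ports can still complete.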
cumulus@switch:~$ nv set acl EXAMPLE1 type ipv4
cumulus@switch:~$ nv set acl EXAMPLE1 rule 20 match ip protocol tcp
cumulus@switch:~$ nv set acl EXAMPLE1 rule 20 match ip tcp flags syn
cumulus@switch:~$ nv set acl EXAMPLE1 rule 20 match ip tcp mask rst
cumulus@switch:~$ nv set acl EXAMPLE1 rule 20 match ip tcp mask syn
cumulus@switch:~$ nv set acl EXAMPLE1 rule 20 match ip tcp mask fin
cumulus@switch:~$ nv set acl EXAMPLE1 rule 20 match ip tcp mask ack
cumulus@switch:~$ nv set acl EXAMPLE1 rule 20 action deny
cumulus@switch:~$ nv set interface swp1 acl EXAMPLE1 inbound
cumulus@switch:~$ nv config apply
Control Who Can SSH into the Switch
Run the following commands to control who can SSH into the switch.
In the following example, 10.10.10.1/32 is the interface IP address (or loopback IP address) of the switch, and hosts in 10.255.4.0/24 can SSH into the switch.
-A INPUT -i swp+ -s 10.255.4.0/24 -d 10.10.10.1/32 -j ACCEPT
-A INPUT -i swp+ -d 10.10.10.1/32 -j DROP
Apply the rule:
cumulus@switch:~$ sudo cl-acltool -i
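The two rules above rely on ordering: the first rule that matches a packet decides its fate, so the ACCEPT for 10.255.4.0/24 must come before the broader DROP. The Python sketch below models that first-match evaluation with the standard ipaddress module. The function name verdict is hypothetical, the default of ACCEPT reflects the chain policy shown in the cl-acltool listings in this guide, and the model ignores the interface match.

```python
import ipaddress

# (source network, destination network, action), in rule order.
RULES = [
    ("10.255.4.0/24", "10.10.10.1/32", "ACCEPT"),
    ("0.0.0.0/0", "10.10.10.1/32", "DROP"),
]


def verdict(src, dst, rules=RULES, default="ACCEPT"):
    """Return the action of the first rule that matches the packet."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    for src_net, dst_net, action in rules:
        if (src_ip in ipaddress.ip_network(src_net)
                and dst_ip in ipaddress.ip_network(dst_net)):
            return action
    return default  # no rule matched: fall back to the chain policy


print(verdict("10.255.4.7", "10.10.10.1"))  # allowed subnet
print(verdict("192.0.2.9", "10.10.10.1"))   # any other source
```

Note that the rules as written match every protocol destined for 10.10.10.1, not only SSH.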
cumulus@switch:~$ nv set acl example2 type ipv4
cumulus@switch:~$ nv set acl example2 rule 10 match ip source-ip 10.255.4.0/24
cumulus@switch:~$ nv set acl example2 rule 10 match ip dest-ip 10.10.10.1/32
cumulus@switch:~$ nv set acl example2 rule 10 action permit
cumulus@switch:~$ nv set acl example2 rule 20 match ip source-ip ANY
cumulus@switch:~$ nv set acl example2 rule 20 match ip dest-ip 10.10.10.1/32
cumulus@switch:~$ nv set acl example2 rule 20 action deny
cumulus@switch:~$ nv set system control-plane acl example2 inbound
cumulus@switch:~$ nv config apply
Match on ECN Bits in the TCP IP Header
ECN allows end-to-end notification of network congestion without dropping packets. You can add ECN rules to match on the ECE, CWR, and ECT flags in the TCP IPv4 header.
By default, ECN rules match a packet with the bit set. You can reverse the match by using an exclamation point (!).
Match on the ECE Bit
After an endpoint receives a packet with the CE bit set by a router, it sets the ECE bit in the returning ACK packet to notify the other endpoint that it needs to slow down.
To match on the ECE bit:
Create a rules file in the /etc/cumulus/acl/policy.d directory and add the following rule under [iptables]:
cumulus@switch:~$ nv set acl example2 type ipv4
cumulus@switch:~$ nv set acl example2 rule 10 match ip protocol tcp
cumulus@switch:~$ nv set acl example2 rule 10 match ip ecn flags tcp-ece
cumulus@switch:~$ nv set acl example2 rule 10 action permit
cumulus@switch:~$ nv set interface swp1 acl example2 inbound
cumulus@switch:~$ nv config apply
Match on the ECT Bit
The ECT codepoints negotiate if the connection is ECN capable by setting one of the two bits to 1. Routers also use the ECT bit to indicate that they are experiencing congestion by setting both the ECT codepoints to 1.
To match on the ECT bit:
Create a rules file in the /etc/cumulus/acl/policy.d directory and add the following rule under [iptables]:
cumulus@switch:~$ nv set acl example2 type ipv4
cumulus@switch:~$ nv set acl example2 rule 10 match ip protocol tcp
cumulus@switch:~$ nv set acl example2 rule 10 match ip ecn ip-ect 1
cumulus@switch:~$ nv set acl example2 rule 10 action permit
cumulus@switch:~$ nv set interface swp1 acl example2 inbound
cumulus@switch:~$ nv config apply
Example Configuration
The following example demonstrates how Cumulus Linux applies several different rules.
Egress Rule
The following rule blocks any TCP traffic with destination port 200 going through leaf01 to server01 (rule 1 in the diagram above).
[iptables]
-t mangle -A POSTROUTING -o swp1 -p tcp -m multiport --dports 200 -j DROP
cumulus@switch:~$ nv set acl EXAMPLE1 type ipv4
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip protocol tcp
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip dest-port 200
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 action deny
cumulus@switch:~$ nv set interface swp1 acl EXAMPLE1 outbound
cumulus@switch:~$ nv config apply
Ingress Rule
The following rule blocks any UDP traffic with source port 200 going from server01 through leaf01 (rule 2 in the diagram above).
[iptables]
-t mangle -A PREROUTING -i swp1 -p udp -m multiport --sports 200 -j DROP
cumulus@switch:~$ nv set acl EXAMPLE1 type ipv4
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip protocol udp
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip source-port 200
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 action deny
cumulus@switch:~$ nv set interface swp1 acl EXAMPLE1 inbound
cumulus@switch:~$ nv config apply
Input Rule
The following rule blocks any UDP traffic with source port 200 and destination port 50 going from server02 to the leaf02 control plane (rule 3 in the diagram above).
[iptables]
-A INPUT -i swp2 -p udp -m multiport --dports 50 -j DROP
cumulus@switch:~$ nv set acl EXAMPLE1 type ipv4
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip protocol udp
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip dest-port 50
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 action deny
cumulus@switch:~$ nv set interface swp2 acl EXAMPLE1 inbound control-plane
cumulus@switch:~$ nv config apply
Output Rule
The following rule blocks any TCP traffic with source port 123 and destination port 123 going from leaf02 to server02 (rule 4 in the diagram above).
[iptables]
-A OUTPUT -o swp2 -p tcp -m multiport --sports 123 -m multiport --dports 123 -j DROP
cumulus@switch:~$ nv set acl EXAMPLE1 type ipv4
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip protocol tcp
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip source-port 123
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip dest-port 123
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 action deny
cumulus@switch:~$ nv set interface swp2 acl EXAMPLE1 outbound control-plane
cumulus@switch:~$ nv config apply
Layer 2 Rules (ebtables)
The following rule blocks any traffic with source MAC address 00:00:00:00:00:12 and destination MAC address 08:9e:01:ce:e2:04 on any switch port, ingress or egress.
[ebtables]
-A FORWARD -s 00:00:00:00:00:12 -d 08:9e:01:ce:e2:04 -j DROP
cumulus@switch:~$ nv set acl EXAMPLE type mac
cumulus@switch:~$ nv set acl EXAMPLE rule 10 match mac source-mac 00:00:00:00:00:12
cumulus@switch:~$ nv set acl EXAMPLE rule 10 match mac dest-mac 08:9e:01:ce:e2:04
cumulus@switch:~$ nv set acl EXAMPLE rule 10 action deny
cumulus@switch:~$ nv set interface swp1-48 acl EXAMPLE inbound
cumulus@switch:~$ nv config apply
Considerations
Not All Rules Supported
Cumulus Linux does not support all iptables, ip6tables, or ebtables rules. Refer to Supported Rule Types for specific rule support.
ACL Log Policer Limits Traffic
To protect the CPU from overloading, Cumulus Linux uses an ACL log policer to limit traffic copied to the CPU to 1 packet per second.
Bridge Traffic Limitations
Bridge traffic that matches LOG ACTION rules does not log to syslog; the kernel and hardware identify packets using different information.
You Cannot Forward Log Actions
You cannot forward logged packets. The hardware cannot both forward a packet and send the packet to the control plane (or kernel) for logging. A log action must also have a drop action.
SPAN Sessions that Reference an Outgoing Interface
Because Cumulus Linux is a Linux operating system, you can use the iptables commands. However, consider using cl-acltool instead for the following reasons:
Without using cl-acltool, rules do not install into hardware.
Running cl-acltool -i (the installation command) resets all rules and deletes anything that is not in the /etc/cumulus/acl/policy.conf file.
For example, running the following command works:
cumulus@switch:~$ sudo iptables -A INPUT -p icmp --icmp-type echo-request -j DROP
The rules appear when you run cl-acltool -L:
cumulus@switch:~$ sudo cl-acltool -L ip
-------------------------------
Listing rules of type iptables:
-------------------------------
TABLE filter :
Chain INPUT (policy ACCEPT 72 packets, 5236 bytes)
pkts bytes target prot opt in out source destination
0 0 DROP icmp -- any any anywhere anywhere icmp echo-request
However, running cl-acltool -i or rebooting the switch removes them. To ensure that Cumulus Linux can hardware accelerate all rules that can be in hardware, place them in the /etc/cumulus/acl/policy.conf file, then run cl-acltool -i.
Where to Assign Rules
If you assign a switch port to a bond, you must assign any egress rules to the bond.
When using the OUTPUT chain, you must assign rules to the source. For example, if you assign a rule to the switch port in the direction of traffic but the source is a bridge (VLAN), the rule does not affect the traffic and you must apply the rule to the bridge.
If you need to apply a rule to all transit traffic, use the FORWARD chain, not the OUTPUT chain.
ACL Rule Installation Failure
After an ACL rule installation failure, you see a generic error message like the following:
cumulus@switch:$ sudo cl-acltool -i -p 00control_plane.rules
Using user provided rule file 00control_plane.rules
Reading rule file 00control_plane.rules ...
Processing rules in file 00control_plane.rules ...
error: hw sync failed (sync_acl hardware installation failed)
Installing acl policy... Rolling back ..
failed.
ACLs Do not Match when the Output Port on the ACL is a Subinterface
The ACL does not match on packets when you configure a subinterface as the output port. The ACL matches on packets only if the primary port is the output port. If a subinterface is an output or egress port, the packets match correctly.
For example:
-A FORWARD -o swp49s1.100 -j ACCEPT
Egress ACL Matching on Bonds
Cumulus Linux does not support ACL rules that match on an outbound bond interface. For example, you cannot create the following rule:
[iptables]
-A FORWARD -o <bond_intf> -j DROP
To work around this issue, duplicate the ACL rule on each physical port of the bond. For example:
[iptables]
-A FORWARD -o <bond-member-port-1> -j DROP
-A FORWARD -o <bond-member-port-2> -j DROP
SSH Traffic to the Management VRF
To allow SSH traffic to the management VRF, use -i mgmt, not -i eth0. For example:
In INPUT chain rules, the -i swp+ match works only if the destination of the packet is towards a layer 3 swp interface; the match does not work if the packet terminates at an SVI interface (for example, vlan10). To allow traffic towards specific SVIs, use rules without any interface match or rules with individual -i <SVI> matches.
Services (also known as daemons) and processes are at the heart of how a Linux system functions. Most of the time, a service takes care of itself; you just enable and start it, then let it run. However, because a Cumulus Linux switch is a Linux system, you can dig deeper if you like. Services can start multiple processes as they run. Services are important to monitor on a Cumulus Linux switch.
You manage services in Cumulus Linux in the following ways:
Identify all active or stopped services
Identify boot time state of a specific service
Disable or enable a specific service
Identify active listener ports
systemd and the systemctl Command
You manage services using systemd with the systemctl command. You run the systemctl command with any service on the switch to start, stop, restart, reload, enable, disable, reenable, or get the status of the service.
systemctl has commands that perform a specific operation on a given service:
status returns the status of the specified service.
start starts the service.
stop stops the service.
restart stops, then starts the service, all the while maintaining state. If there are dependent services or services that mark the restarted service as Required, the other services also restart. For example, running systemctl restart frr.service restarts any of the routing protocol services that you enable and that are running, such as bgpd or ospfd.
reload reloads the configuration for the service.
enable enables the service to start when the system boots, but does not start it unless you use the systemctl start SERVICENAME.service command or reboot the switch.
disable disables the service, but does not stop it unless you use the systemctl stop SERVICENAME.service command or reboot the switch. You can start or stop a disabled service.
reenable disables, then enables a service. Run this command so that any new Wants or WantedBy lines create the symlinks necessary for ordering. This has no side effects on other services.
You do not need to interact with the services directly using these commands. If a critical service crashes or encounters an error, systemd restarts it automatically. systemd is the caretaker of services in modern Linux systems and is responsible for starting all the necessary services at boot time.
Ensure a Service Starts after Multiple Restarts
By default, systemd tries to restart a particular service only a certain number of times within a given interval before the service fails to start. The settings StartLimitInterval (which defaults to 10 seconds) and StartLimitBurst (which defaults to 5 attempts) are in the service script; however, certain services override these defaults, sometimes with much longer times. For example, switchd.service sets StartLimitInterval=10m and StartLimitBurst=3; therefore, if you restart switchd more than three times in ten minutes, it does not start.
When the restart fails for this reason, you see a message similar to the following:
Job for switchd.service failed. See 'systemctl status switchd.service' and 'journalctl -xn' for details.
systemctl status switchd.service shows output similar to:
Active: failed (Result: start-limit) since Thu 2016-04-07 21:55:14 UTC; 15s ago
To clear this error, run systemctl reset-failed switchd.service. If you know you are going to restart frequently (multiple times within the StartLimitInterval), you can run the same command before you issue the restart request. This also applies to stop followed by start.
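To make the start limit behavior easier to reason about, the following Python sketch models it in simplified form: a counter allows a fixed number of starts within an interval, and a reset clears the counter. This is an illustration under stated assumptions, not the systemd implementation. The class name StartLimit is hypothetical; the values (600 seconds and 3 starts) mirror the switchd.service example above.

```python
class StartLimit:
    """Simplified model: at most `burst` starts per `interval` seconds."""

    def __init__(self, interval: float, burst: int) -> None:
        self.interval = interval
        self.burst = burst
        self.window_start = None  # time of the first start in the current window
        self.count = 0

    def try_start(self, now: float) -> bool:
        # Open a new window if none is open or the current one has expired.
        if self.window_start is None or now - self.window_start > self.interval:
            self.window_start = now
            self.count = 0
        if self.count < self.burst:
            self.count += 1
            return True
        return False  # corresponds to "Result: start-limit"

    def reset_failed(self) -> None:
        # Models the effect of `systemctl reset-failed`: the counter starts over.
        self.window_start = None
        self.count = 0


limit = StartLimit(interval=600, burst=3)
results = [limit.try_start(t) for t in (0, 60, 120, 180)]
print(results)  # the fourth start inside the window is refused

limit.reset_failed()
after_reset = limit.try_start(200)
print(after_reset)
```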
Keep systemd Services from Hanging after Starting
If you start, restart, or reload a systemd service that you can start from another systemd service, you must use the --no-block option with systemctl.
Identify Active Listener Ports for IPv4 and IPv6
You can identify the active listener ports under both IPv4 and IPv6 using the netstat command:
To see active or stopped services, run the cl-service-summary command:
cumulus@switch:~$ cl-service-summary
Service cron enabled active
Service ssh enabled active
Service syslog enabled active
Service asic-monitor enabled inactive
Service clagd enabled inactive
Service cumulus-poe inactive
Service lldpd enabled active
Service mstpd enabled active
Service neighmgrd enabled active
Service nvued enabled active
Service netq-agent enabled active
Service ntp enabled active
Service ptmd enabled active
Service pwmd enabled active
Service smond enabled active
Service switchd enabled active
Service sysmonitor enabled active
Service rdnbrd disabled inactive
Service frr enabled inactive
...
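If you want to post-process this output, for example to list services that are enabled at boot but not currently running, a short script can do it. The Python sketch below assumes the line layout shown in the sample output above (the word Service, the service name, an optional boot-time state, then the run state). The function names are hypothetical.

```python
SAMPLE = """\
Service cron enabled active
Service asic-monitor enabled inactive
Service cumulus-poe inactive
Service rdnbrd disabled inactive
Service frr enabled inactive
...
"""


def parse_summary(text):
    """Map each service name to a (boot_state, run_state) tuple."""
    services = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] != "Service":
            continue  # skip "..." and anything that is not a service row
        name, states = fields[1], fields[2:]
        # A row such as "Service cumulus-poe inactive" has no boot-time column.
        boot_state = states[0] if len(states) == 2 else None
        services[name] = (boot_state, states[-1])
    return services


def enabled_but_inactive(services):
    """Return the names of services that are enabled at boot but not running."""
    return sorted(
        name
        for name, (boot, run) in services.items()
        if boot == "enabled" and run == "inactive"
    )


summary = parse_summary(SAMPLE)
print(enabled_but_inactive(summary))
```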
You can also run the systemctl list-unit-files --type service command to list all services on the switch and to see their status:
The switchd service enables the switch to communicate with Cumulus Linux and all the applications running on Cumulus Linux.
Configure switchd Settings
You can control certain options associated with the switchd process. For example, you can set polling intervals, optimize ACL hardware resources for better utilization, configure log message levels, set the internal VLAN range, and configure VXLAN encapsulation and decapsulation.
To configure switchd options, you either run NVUE commands or manually edit the /etc/cumulus/switchd.conf file.
NVUE currently only supports a subset of the switchd configuration available in the /etc/cumulus/switchd.conf file.
You can run NVUE commands to set the following switchd options:
The statistic polling interval for physical interfaces and for logical interfaces.
For physical interfaces, you can specify a value between 1 and 10. The default setting is 2 seconds.
For logical interfaces, you can specify a value between 1 and 30. The default setting is 5 seconds.
A low setting, such as 1, might affect system performance.
The log level to debug the data plane programming related code. You can specify debug, info, notice, warning, or error. The default setting is info. NVIDIA recommends that you do not set the log level to debug in a production environment.
The DSCP action and value for encapsulation. You can set the DSCP action to copy (to copy the value from the IP header of the packet), set (to specify a specific value), or derive (to obtain the value from the switch priority). The default action is derive. Only specify a value if the action is set.
The DSCP action for decapsulation in VXLAN outer headers. You can specify copy (to copy the value from the IP header of the packet), preserve (to keep the inner DSCP value), or derive (to obtain the value from the switch priority). The default action is derive.
The preference between a route and neighbor with the same IP address and mask. You can specify route, neighbour, or route-and-neighbour. The default setting is route.
The ACL mode (atomic or non-atomic). The default setting is atomic.
The reserved VLAN range. The default setting is 3725-3999.
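If you generate NVUE commands from a script, you can check the polling interval values against the documented ranges before you apply them. The Python sketch below is illustrative only; the dictionary and function names are hypothetical, and the ranges come from the list above.

```python
# Allowed ranges in seconds, taken from the list above.
POLLING_RANGES = {
    "physical-interface": (1, 10),
    "logical-interface": (1, 30),
}


def check_polling_interval(kind: str, seconds: int) -> int:
    """Return `seconds` if it is within the documented range, else raise ValueError."""
    low, high = POLLING_RANGES[kind]
    if not low <= seconds <= high:
        raise ValueError(
            f"{kind} polling interval must be between {low} and {high} seconds"
        )
    return seconds


print(check_polling_interval("logical-interface", 6))
```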
Certain switchd settings require a switchd restart or reload. Before applying the settings, NVUE indicates if it requires a switchd restart or reload and prompts you for confirmation.
When the switchd service restarts, in addition to resetting the switch hardware configuration, all network ports reset.
When the switchd service reloads, there is no interruption to network services.
The following command example sets the statistic polling interval for both logical interfaces and physical interfaces to 6 seconds:
cumulus@switch:~$ nv set system counter polling-interval logical-interface 6
cumulus@switch:~$ nv set system counter polling-interval physical-interface 6
cumulus@switch:~$ nv config apply
The following command example sets the log level for debugging the data plane programming related code to warning:
cumulus@switch:~$ nv set system forwarding programming log-level warning
cumulus@switch:~$ nv config apply
The following command example sets the DSCP action for encapsulation in VXLAN outer headers to set and the value to af12:
cumulus@switch:~$ nv set nve vxlan encapsulation dscp action set
cumulus@switch:~$ nv set nve vxlan encapsulation dscp value af12
cumulus@switch:~$ nv config apply
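The following command example sets the DSCP action for decapsulation in VXLAN outer headers to preserve:
cumulus@switch:~$ nv set nve vxlan decapsulation dscp action preserve
cumulus@switch:~$ nv config apply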
The following command example sets the route or neighbour preference to both route and neighbour:
cumulus@switch:~$ nv set system forwarding host-route-preference route-and-neighbour
cumulus@switch:~$ nv config apply
The following command example sets the ACL mode to non-atomic:
cumulus@switch:~$ nv set system acl mode non-atomic
cumulus@switch:~$ nv config apply
The following command example sets the reserved VLAN range between 4064 and 4094:
cumulus@switch:~$ nv set system global reserved vlan internal range 4064-4094
cumulus@switch:~$ nv config apply
To configure the switchd parameters, edit the /etc/cumulus/switchd.conf file. Change the setting and uncomment the line if needed. The switchd.conf file contains comments with a description for each setting.
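The edit described above (change the value and uncomment the line) can also be scripted. The Python sketch below assumes that each setting in the file is a key = value line and that a default is commented out with a leading #; verify this against the file on your switch. The function name set_parameter and the sample text are hypothetical.

```python
import re


def set_parameter(conf_text: str, key: str, value: str) -> str:
    """Set `key = value`, uncommenting an existing line or appending a new one."""
    # Match the key at the start of a line, with or without a leading "#".
    pattern = re.compile(
        rf"^[ \t]*#?[ \t]*{re.escape(key)}[ \t]*=.*$", re.MULTILINE
    )
    new_line = f"{key} = {value}"
    if pattern.search(conf_text):
        # A function replacement avoids backslash interpretation in `value`.
        return pattern.sub(lambda _match: new_line, conf_text, count=1)
    return conf_text.rstrip("\n") + "\n" + new_line + "\n"


# Hypothetical file contents, used only for illustration.
SAMPLE = "# Statistics poll interval (in msec)\n#stats.poll_interval = 2000\n"
updated = set_parameter(SAMPLE, "stats.poll_interval", "5000")
print(updated)
```

The sketch works on text only; writing the result back to the file and then restarting or reloading switchd remain separate, deliberate steps.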
The following example shows the first few lines of the /etc/cumulus/switchd.conf file.
The following table describes the /etc/cumulus/switchd.conf file parameters and indicates if you need to restart switchd with the sudo systemctl restart switchd.service command or reload switchd with the sudo systemctl reload switchd.service command for changes to take effect when you update the setting.
Restarting the switchd service causes all network ports to reset in addition to resetting the switch hardware configuration.
| Parameter | Description | switchd reload or restart |
| --- | --- | --- |
| stats.poll_interval | The statistics polling interval in milliseconds. The default setting is 2000. | restart |
| buf_util.poll_interval | The buffer utilization polling interval in milliseconds. 0 disables buffer utilization polling. The default setting is 0. | restart |
| buf_util.measure_interval | The buffer utilization measurement interval in minutes. The default setting is 0. | restart |
| acl.optimize_hw | Optimizes ACL hardware resources for better utilization. The default setting is FALSE. | restart |
| acl.flow_based_mirroring | Enables flow-based mirroring. The default setting is TRUE. | restart |
| acl.non_atomic_update_mode | Enables non-atomic ACL updates. The default setting is FALSE. | reload |
| arp.next_hops | Sends ARPs for next hops. The default setting is TRUE. | restart |
| route.table | The kernel routing table ID. The range is between 1 and 2^31. The default is 254. | restart |
| route.host_max_percent | The maximum neighbor table occupancy in hardware (a percentage of the hardware table size). The default setting is 100. | restart |
| coalescing.reducer | The coalescing reduction factor for accumulating changes to reduce CPU load. The default setting is 1. | restart |
| coalescing.timeout | The coalescing time limit in seconds. The default setting is 10. | restart |
| ignore_non_swps | Ignore routes that point to non-swp interfaces. The default setting is TRUE. | restart |
| disable_internal_parity_restart | Disables restart after a parity error. The default setting is TRUE. | restart |
| disable_internal_hw_err_restart | Disables restart after an unrecoverable hardware error. The default setting is FALSE. | restart |
| nat.static_enable | Enables static NAT. The default setting is TRUE. | restart |
| nat.dynamic_enable | Enables dynamic NAT. The default setting is TRUE. | restart |
| nat.age_poll_interval | The NAT age polling interval in minutes. The minimum is 1 minute and the maximum is 24 hours. You can configure this setting only when nat.dynamic_enable is set to TRUE. The default setting is 5. | restart |
| nat.table_size | The NAT table size limit in number of entries. You can configure this setting only when nat.dynamic_enable is set to TRUE. The default setting is 1024. | restart |
| nat.config_table_size | The NAT configuration table size limit in number of entries. You can configure this setting only when nat.dynamic_enable is set to TRUE. The default setting is 64. | restart |
| logging | Configures logging in the format BACKEND=LEVEL. Separate multiple BACKEND=LEVEL pairs with a space. The BACKEND value can be stderr, file:filename, syslog, or program:executable. The LEVEL value can be CRIT, ERR, WARN, INFO, or DEBUG. The default value is syslog=INFO. | restart |
| interface.swp1.storm_control.broadcast | Enables broadcast storm control and sets the number of packets per second (pps). The default setting is 400. | reload |
| interface.swp1.storm_control.multicast | Enables multicast storm control and sets the number of packets per second (pps). The default setting is 3000. | reload |
| interface.swp1.storm_control.unknown_unicast | Enables unknown unicast storm control and sets the number of packets per second (pps). The default setting is 2000. | reload |
| stats.vlan.aggregate | Enables hardware statistics for VLANs and specifies the type of statistics needed. You can specify NONE, BRIEF, or DETAIL. The default setting is BRIEF. | restart |
| stats.vxlan.aggregate | Enables hardware statistics for VXLANs and specifies the type of statistics needed. You can specify NONE, BRIEF, or DETAIL. The default setting is DETAIL. | restart |
| stats.vxlan.member | Enables hardware statistics for VXLAN members and specifies the type of statistics needed. You can specify NONE, BRIEF, or DETAIL. The default setting is BRIEF. | restart |
| stats.vlan.show_internal_vlans | Show internal VLANs. The default setting is FALSE. | restart |
| stats.vdev_hw_poll_interval | The polling interval in seconds for virtual device hardware statistics. The default setting is 5. | restart |
| resv_vlan_range | The internal VLAN range. The default setting is 3725-3999. | restart |
| netlink.buf_size | The netlink socket buffer size in MB. The default setting is 136314880 (130 MB expressed in bytes). | restart |
| route.delete_dead_routes | Delete routes on interfaces when the carrier is down. The default setting is TRUE. | restart |
| vxlan.default_ttl | The default TTL to use in VXLAN headers. The default setting is 64. | restart |
| bridge.broadcast_frame_to_cpu | Enables bridge broadcast frames to the CPU even if the SVI is not enabled. The default setting is FALSE. | restart |
| bridge.unreg_mcast_init | Initialize the prune module for IGMP snooping unregistered layer 2 multicast flood control. The default setting is FALSE. | restart |
| bridge.unreg_v4_mcast_prune | Enables unregistered layer 2 multicast prune to mrouter ports (IPv4). The default setting is FALSE (flood unregistered layer 2 multicast traffic). | restart |
| bridge.unreg_v6_mcast_prune | Enables unregistered layer 2 multicast prune to mrouter ports (IPv6). The default setting is FALSE (flood unregistered layer 2 multicast traffic). | restart |
| netlink libnl logger | The default setting is [0-5]. | restart |
| netlink.nl_logger | The default setting is 0. | restart |
| vxlan.def_encap_dscp_action | Sets the default VXLAN router DSCP action during encapsulation. You can specify copy if the inner packet is IP, set to set a specific value, or derive to derive the value from the switch priority. The default setting is derive. | restart |
| vxlan.def_encap_dscp_value | Sets the default VXLAN encapsulation DSCP value if the action is set. | restart |
| vxlan.def_decap_dscp_action | Sets the default VXLAN router DSCP action during decapsulation. You can specify copy if the inner packet is IP, preserve to preserve the inner DSCP value, or derive to derive the value from the switch priority. The default setting is derive. | restart |
| ipmulticast.unknown_ipmc_to_cpu | Enables sending unknown IPMC to the CPU. The default setting is FALSE. | restart |
| vrf_route_leak_enable_dynamic | Enables dynamic VRF route leaking. The default setting is FALSE. | restart |
| sync_queue_depth_val | The event queue depth. The default setting is 50000. | restart |
| route.route_preferred_over_neigh | Sets the preference between a route and neighbor with the same IP address and mask. You can specify TRUE to prefer the route over the neighbor, FALSE to prefer the neighbor over the route, or BOTH to install both the route and neighbor. The default setting is TRUE. | restart |
| evpn.multihoming.enable | Enables EVPN multihoming. The default setting is TRUE. | restart |
| evpn.multihoming.shared_l2_groups | Enables sharing for layer 2 next hop groups. The default setting is FALSE. | restart |
| evpn.multihoming.shared_l3_groups | Enables sharing for layer 3 next hop groups. The default setting is FALSE. | restart |
| evpn.multihoming.fast_local_protect | Enables fast reroute for egress link protection. The default setting is FALSE. | restart |
| evpn.multihoming.bum_sph_filter | Sets split-horizon filtering for EVPN multihoming. You can specify TRUE to filter only BUM traffic from the Ethernet segment (ES) peer or FALSE to filter all traffic from the ES peer. The default setting is TRUE. | restart |
| link_flap_window | The duration in seconds during which a link must flap the number of times set in the link_flap_threshold before Cumulus Linux sets the link to protodown and specifies linkflap as the reason. The default setting is 10. A value of 0 disables link flap protection. | restart |
| link_flap_threshold | The number of times the link must flap within the link flap window before Cumulus Linux sets the link to protodown and specifies linkflap as the reason. The default setting is 5. A value of 0 disables link flap protection. | restart |
| res_usage_warn_threshold | Sets the percentage over which forwarding resources (routes, hosts, MAC addresses) must go before Cumulus Linux generates a warning. You can set a value between 50 and 95. The default setting is 90. | restart |
| res_warn_msg_int | The time interval in seconds between resource warning messages. Warning messages generate only one time in the specified interval per resource type even if the threshold falls below or goes over the value set in res_usage_warn_threshold multiple times during this interval. You can set a value between 60 and 3600. The default setting is 300. | restart |
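When you change several parameters at once, the table above determines whether a reload is enough: a reload suffices only if every changed parameter is marked reload. The Python sketch below encodes that rule for illustration; the names RELOAD_ONLY and required_action are hypothetical, and the set lists only the parameters the table marks as reload.

```python
# Parameters that the table above marks as "reload"; all others need a restart.
RELOAD_ONLY = {
    "acl.non_atomic_update_mode",
    "interface.swp1.storm_control.broadcast",
    "interface.swp1.storm_control.multicast",
    "interface.swp1.storm_control.unknown_unicast",
}


def required_action(changed_parameters):
    """Return "reload", "restart", or None when nothing changed."""
    changed = set(changed_parameters)
    if not changed:
        return None
    # A single restart-only parameter forces a restart for the whole change set.
    return "reload" if changed <= RELOAD_ONLY else "restart"


print(required_action(["acl.non_atomic_update_mode"]))
print(required_action(["acl.non_atomic_update_mode", "stats.poll_interval"]))
```

Because a restart resets all network ports, it can be worth grouping restart-only changes into a single maintenance window.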
Show switchd Settings
You can run the following NVUE commands to show the current switchd configuration settings.
| Command | Description |
| --- | --- |
| nv show system counter polling-interval | Shows the polling interval for physical and logical interface counters in seconds. |
| nv show system forwarding programming | Shows the log level for data plane programming logs. |
| nv show nve vxlan encapsulation dscp | Shows the DSCP action and value (if the action is set) for the outer header in VXLAN encapsulation. |
| nv show nve vxlan decapsulation dscp | Shows the DSCP action for the outer header in VXLAN decapsulation. |
| nv show system acl | Shows the ACL mode (atomic or non-atomic). |
| nv show system global reserved vlan internal | Shows the reserved VLAN range. |
The following example command shows that the polling interval setting for logical interface counters is 6 seconds:
cumulus@switch:~$ nv show system counter polling-interval
applied description
----------------- ------- -----------------------------------------------------
logical-interface 0:00:06 Config polling-interval for logical interface(in sec)
The following example command shows that the log level setting for data plane programming logs is warning:
cumulus@switch:~$ nv show system forwarding programming
applied description
--------- ------- -------------------
log-level warning configure Log-level
The following example command shows that the DSCP action setting for the outer header in VXLAN encapsulation is set and the value is af12:
cumulus@switch:~$ nv show nve vxlan encapsulation dscp
operational applied description
------ ----------- ------- --------------------------------------------------
action set set DSCP encapsulation action
value af12 af12 Configured DSCP value to put in outer Vxlan packet
The following command example shows that ACL mode is atomic:
cumulus@switch:~$ nv show system acl
applied description
---- ------- -----------------------------------------
mode atomic configure Atomic or Non-Atomic ACL update
The following command example shows that the reserved VLAN range is between 4064 and 4094:
cumulus@switch:~$ nv show system global reserved vlan internal
operational applied description
----- ----------- --------- -------------------
range 4064-4094 4064-4094 Reserved Vlan range
In addition to restarting switchd when you change certain /etc/cumulus/switchd.conf file parameters manually, you also need to restart switchd whenever you modify a switchd hardware configuration file (any *.conf file that requires making a change to the switching hardware, such as /etc/cumulus/datapath/traffic.conf). You do not have to restart the switchd service when you update a network interface configuration (for example, when you edit the /etc/network/interfaces file).
Configuring a Global Proxy
You configure global HTTP and HTTPS proxies in the /etc/profile.d/ directory of Cumulus Linux. Set the http_proxy and https_proxy variables to configure the switch with the address of the proxy server you want to use to get URLs on the command line. This is useful for programs such as apt, apt-get, curl and wget, which can all use this proxy.
In a terminal, create a new file in the /etc/profile.d/ directory.
Create a file in the /etc/apt/apt.conf.d directory and add the following lines to the file to get the HTTP and HTTPS proxies. The example below uses http_proxy as the file name:
Use ISSU to upgrade and troubleshoot an active switch with minimal disruption to the network.
ISSU includes the following modes:
Restart
Upgrade
Maintenance mode
Maintenance ports
In earlier Cumulus Linux releases, ISSU was Smart System Manager.
The NVIDIA SN5600 (Spectrum-4) switch does not support ISSU.
Restart Mode
You can configure the switch to restart in one of the following modes.
cold restarts the system and resets all the hardware devices on the switch (including the switching ASIC).
fast restarts the system more efficiently with minimal impact to traffic by reloading the kernel and software stack without a hard reset of the hardware. During a fast restart, the system decouples from the network to the extent possible using existing protocol extensions before recovering to the operational mode of the system. The restart process maintains the forwarding entries of the switching ASIC and the data plane is not affected. Traffic outage is much lower in this mode; there is only a momentary interruption after reboot while the system reinitializes.
warm restarts the system with no interruption to traffic for existing route entries. Warm mode diverts traffic from itself and restarts the system without a hardware reset of the switch ASIC. While this process does not affect the data plane, the control plane is absent during restart and is unable to process routing updates. However, if no alternate paths exist, the switch continues forwarding with the existing entries with no interruptions.
When you restart the switch in warm mode, BGP only performs a graceful restart if the BGP graceful restart option is set to full. To set BGP graceful restart to full, run the nv set router bgp graceful-restart mode full command, then apply the configuration with nv config apply. For more information about BGP graceful restart, refer to Optional BGP Configuration.
Cumulus Linux supports fast mode for all protocols; however, it supports warm mode only for layer 2 forwarding and for layer 3 forwarding with BGP and static routing.
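As a planning aid, you can express the warm mode restriction as a simple check over the layer 3 protocols in use on the switch. The Python sketch below is illustrative only; the function name and the protocol labels are hypothetical and do not come from any Cumulus Linux API.

```python
# Layer 3 routing sources that warm mode supports, per the statement above.
WARM_CAPABLE_L3 = {"bgp", "static"}


def warm_restart_supported(l3_protocols):
    """Return True if every layer 3 protocol in use is warm-mode capable.

    An empty collection models a layer 2 only switch, which warm mode supports.
    """
    return {name.lower() for name in l3_protocols} <= WARM_CAPABLE_L3


print(warm_restart_supported(["BGP", "static"]))
print(warm_restart_supported(["bgp", "ospf"]))
```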
NVIDIA recommends you use NVUE commands to configure restart mode and reboot the system. If you prefer to use csmgrctl commands, you must stop NVUE from managing the /etc/cumulus/csmgrd.conf file before you set restart mode:
Run the following NVUE commands:
cumulus@switch:~$ nv set system config apply ignore /etc/cumulus/csmgrd.conf
cumulus@switch:~$ nv config apply
Edit the /etc/cumulus/csmgrd.conf file and set the csmgrctl_override option to true:
The following commands configure the switch to restart in cold mode. Run either the two NVUE commands or the Linux csmgrctl command:
cumulus@switch:~$ nv set system reboot mode cold
cumulus@switch:~$ nv config apply
cumulus@switch:~$ sudo csmgrctl -c
The following commands configure the switch to restart in fast mode. Run either the two NVUE commands or the Linux csmgrctl command:
cumulus@switch:~$ nv set system reboot mode fast
cumulus@switch:~$ nv config apply
cumulus@switch:~$ sudo csmgrctl -f
The following commands configure the switch to restart in warm mode. Run either the two NVUE commands or the Linux csmgrctl command:
cumulus@switch:~$ nv set system reboot mode warm
cumulus@switch:~$ nv config apply
cumulus@switch:~$ sudo csmgrctl -w
To reboot the switch in the restart mode you configure above with NVUE:
cumulus@switch:~$ nv action reboot system no-confirm
You must specify no-confirm at the end of the command.
To show system reboot information, such as the reboot date and time, reason, and reset mode (fast, cold, warm), run the NVUE nv show system reboot command:
cumulus@switch:~$ nv show system reboot
operational applied pending
--------- -------------------------------- ------- -------
reason
gentime 2023-04-26T15:11:23.140569+00:00
reason Unknown
user system/root
Upgrade Mode
Upgrade mode updates all the components and services on the switch to the latest Cumulus Linux minor release without impacting traffic. After the upgrade is complete, you must restart the switch with a warm, cold, or fast restart.
If the switch is in warm restart mode, restarting the switch after an upgrade does not result in traffic loss (this is a hitless upgrade).
Upgrade mode includes the following options:
all runs apt-get upgrade to upgrade all the system components to the latest release without affecting traffic flow. You must restart the system after the upgrade completes with one of the restart modes.
dry-run provides information on the components you want to upgrade.
The following command upgrades all the system components:
The NVUE command is not supported.
cumulus@switch:~$ sudo csmgrctl -u
The following command provides information on the components you want to upgrade:
The NVUE command is not supported.
cumulus@switch:~$ sudo csmgrctl -d
Maintenance Mode
Maintenance mode globally manages the BGP and MLAG control plane.
When you enable maintenance mode, BGP and MLAG shut down gracefully.
When you disable maintenance mode, BGP and MLAG are enabled based on the individual parameter settings.
To enable maintenance mode:
cumulus@switch:~$ nv action enable system maintenance mode
Action executing ...
System maintenance mode has been enabled successfully
Current System Mode: Maintenance, cold
Maintenance mode since Thu Jun 13 23:59:47 2024 (Duration: 00:00:00)
Ports shutdown for Maintenance
frr : Maintenance, cold, down, up time: 29:06:27
switchd : Maintenance, cold, down, up time: 29:06:31
System Services : Maintenance, cold, down, up time: 29:07:00
Action succeeded
cumulus@switch:~$ sudo csmgrctl -m1
To disable maintenance mode:
cumulus@switch:~$ nv action disable system maintenance mode
Action executing ...
System maintenance mode has been disabled successfully
Current System Mode: cold
frr : cold, up, up time: 12:57:48 (1 restart)
switchd : cold, up, up time: 13:12:13
System Services : cold, up, up time: 13:12:32
Action succeeded
cumulus@switch:~$ sudo csmgrctl -m0
Before you disable maintenance mode, be sure to bring the ports back up.
To show maintenance mode status, run either the NVUE nv show system maintenance command or the Linux sudo csmgrctl -s command:
cumulus@switch:~$ nv show system maintenance
operational
----- -----------
mode enabled
ports disabled
cumulus@switch:~$ sudo csmgrctl -s
Current System Mode: cold
frr : cold, up, up time: 00:14:51 (2 restarts)
clagd : cold, up, up time: 00:14:47
switchd : cold, up, up time: 01:09:48
System Services : cold, up, up time: 01:10:07
Maintenance Ports
Maintenance ports globally disables or enables all configured ports.
When you enable maintenance ports, swp interfaces follow individual admin states.
When you disable maintenance ports, swp interfaces are globally admin down, overriding the admin state in the configuration.
To enable maintenance ports:
cumulus@switch:~$ nv action enable system maintenance ports
Action executing ...
System maintenance ports has been enabled successfully
Current System Mode: cold
frr : cold, up, up time: 28:54:36
switchd : cold, up, up time: 28:54:40
System Services : cold, up, up time: 28:55:09
Action succeeded
cumulus@switch:~$ sudo csmgrctl -p0
To disable maintenance ports:
cumulus@switch:~$ nv action disable system maintenance ports
Action executing ...
System maintenance ports has been disabled successfully
Current System Mode: cold
Ports shutdown for Maintenance
frr : cold, up, up time: 28:55:49
switchd : cold, up, up time: 28:55:53
System Services : cold, up, up time: 28:56:22
Action succeeded
cumulus@switch:~$ sudo csmgrctl -p1
To see the status of maintenance ports, run the NVUE nv show system maintenance command:
cumulus@switch:~$ nv show system maintenance
operational
----- -----------
mode enabled
ports disabled
System Power
In certain situations, you might need to power off the switch instead of rebooting. To power off the switch, you can run the Linux poweroff command.
cumulus@switch:~$ sudo poweroff
When you run the Linux poweroff command on the SN2201, SN2010, SN2100, SN2100B, SN3420, SN3700, SN3700C, SN4410, SN4600C, SN4600, or SN4700 switch, the switch reboots instead of powering off. To power off the switch, run the cl-poweroff command instead. The cl-poweroff command performs a hard abrupt power down instead of a graceful power down.
cumulus@switch:~$ sudo cl-poweroff
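If you automate shutdowns across a mixed fleet, you can select the command by switch model. The Python sketch below encodes the model list from the note above; the function name is hypothetical, and how you obtain the model string is left to your own tooling.

```python
# Models on which `poweroff` reboots the switch, per the note above.
REBOOTS_ON_POWEROFF = {
    "SN2201", "SN2010", "SN2100", "SN2100B", "SN3420", "SN3700",
    "SN3700C", "SN4410", "SN4600C", "SN4600", "SN4700",
}


def poweroff_command(model: str) -> str:
    """Return the command that actually powers off the given switch model."""
    if model.strip().upper() in REBOOTS_ON_POWEROFF:
        return "sudo cl-poweroff"  # hard, abrupt power down
    return "sudo poweroff"


print(poweroff_command("SN3700C"))
```

Keep in mind that cl-poweroff is not graceful, so stop any services that need a clean shutdown first.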
Layer 1 and Switch Ports
This section discusses the following layer 1 and switch port configuration:
To configure and bring an interface up administratively, edit the /etc/network/interfaces file to add the interface stanza, then run the ifreload -a command:
cumulus@switch:~$ sudo nano /etc/network/interfaces
auto lo
iface lo inet loopback
address 10.10.10.1/32
auto mgmt
iface mgmt
address 127.0.0.1/8
address ::1/128
vrf-table auto
auto eth0
iface eth0 inet dhcp
ip-forward off
ip6-forward off
vrf mgmt
auto swp1
iface swp1
...
To bring an interface down administratively after you configure it, add link-down yes to the interface stanza in the /etc/network/interfaces file, then run ifreload -a:
auto swp1
iface swp1
link-down yes
If you configure an interface in the /etc/network/interfaces file, you can bring it down administratively with the ifdown swp1 command, then bring the interface back up with the ifup swp1 command. These changes do not persist after a reboot. After a reboot, the configuration present in /etc/network/interfaces takes effect.
By default, the ifup and ifdown commands are quiet. Use the verbose option (-v) to show commands as they execute when you bring an interface down or up.
To remove an interface from the configuration entirely, remove the interface stanza from the /etc/network/interfaces file, then run the ifreload -a command.
For additional information on interface administrative state and physical state, refer to this knowledge base article.
Loopback Interface
Cumulus Linux has a preconfigured loopback interface. When the switch boots up, the loopback interface called lo is up and assigned an IP address of 127.0.0.1.
The loopback interface lo must always exist on the switch and must always be up.
To configure an IP address for the loopback interface:
cumulus@switch:~$ nv set interface lo ip address 10.10.10.1
cumulus@switch:~$ nv config apply
Edit the /etc/network/interfaces file to add an address line:
auto lo
iface lo inet loopback
address 10.10.10.1
If the IP address has no subnet mask, it automatically becomes a /32 IP address. For example, 10.10.10.1 is 10.10.10.1/32.
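Python's standard ipaddress module applies the same convention, so you can use it to confirm what an address without a mask resolves to. This is a convenience check in Python, not part of Cumulus Linux.

```python
import ipaddress

# An IPv4 interface address given without a prefix length is treated as a /32 host address.
loopback = ipaddress.ip_interface("10.10.10.1")
print(loopback.with_prefixlen)
```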
You can configure multiple IP addresses for the loopback interface.
Subinterfaces
On Linux, an interface is a network device that can be either physical (for example, swp1) or virtual (for example, vlan100). A VLAN subinterface is a VLAN device on an interface, and the VLAN ID appends to the parent interface using dot (.) VLAN notation. For example, a VLAN with ID 100 that is a subinterface of swp1 is swp1.100. The dot VLAN notation for a VLAN device name is a standard way to specify a VLAN device on Linux.
A VLAN subinterface only receives traffic tagged for that VLAN; therefore, swp1.100 only receives packets that have a VLAN 100 tag on switch port swp1. Any packets that transmit from swp1.100 have a VLAN 100 tag.
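The dot notation is easy to split in a script. The Python sketch below separates a subinterface name into its parent interface and VLAN ID and rejects IDs outside the 802.1Q range of 1 to 4094. The function name parse_subinterface is hypothetical.

```python
def parse_subinterface(name: str):
    """Split a name such as "swp1.100" into ("swp1", 100)."""
    parent, dot, vlan = name.rpartition(".")
    if not dot or not parent or not vlan.isdigit():
        raise ValueError(f"{name!r} is not in <parent>.<vlan-id> notation")
    vlan_id = int(vlan)
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID {vlan_id} is outside the range 1-4094")
    return parent, vlan_id


print(parse_subinterface("swp1.100"))
```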
The following example configures a routed subinterface on swp1 in VLAN 100:
cumulus@switch:~$ nv set interface swp1.100 ip address 192.168.100.1/24
cumulus@switch:~$ nv config apply
Edit the /etc/network/interfaces file, then run ifreload -a:
If you are using a VLAN subinterface, do not add that VLAN under the bridge stanza.
You cannot use NVUE commands to create a routed subinterface for VLAN 1.
Interface IP Addresses
You can specify both IPv4 and IPv6 addresses for the same interface.
For IPv6 addresses:
You can create or modify the IP address for an interface using either :: or 0:0:0 notation. For example, both 2620:149:43:c109:0:0:0:5 and 2001:DB8::1/126 are valid.
Cumulus Linux assigns the IPv6 address with all zeroes in the interface identifier (2001:DB8::/126) for each subnet; connected hosts cannot use this address.
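The two points above can be checked with the Python ipaddress module. This is a sketch for illustration: it shows that the :: and 0:0:0 notations describe the same address, and it lists which addresses in the example /126 remain after setting aside the all-zeroes address.

```python
import ipaddress

# The expanded and compressed notations parse to the same address.
long_form = ipaddress.ip_address("2620:149:43:c109:0:0:0:5")
short_form = ipaddress.ip_address("2620:149:43:c109::5")
print(long_form == short_form)  # True

# The all-zeroes interface identifier is the first address of the subnet.
iface = ipaddress.ip_interface("2001:DB8::1/126")
reserved = iface.network.network_address
print(reserved)  # 2001:db8::

# Remaining addresses in the /126 after excluding the reserved one.
usable = [str(addr) for addr in iface.network if addr != reserved]
print(usable)  # ['2001:db8::1', '2001:db8::2', '2001:db8::3']
```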
The following example commands configure three IP addresses for swp1: two IPv4 addresses and one IPv6 address.
cumulus@switch:~$ nv set interface swp1 ip address 10.0.0.1/30
cumulus@switch:~$ nv set interface swp1 ip address 10.0.0.2/30
cumulus@switch:~$ nv set interface swp1 ip address 2001:DB8::1/126
cumulus@switch:~$ nv config apply
In the /etc/network/interfaces file, list all IP addresses under the iface section.
auto swp1
iface swp1
address 10.0.0.1/30
address 10.0.0.2/30
address 2001:DB8::1/126
The address method and address family are not mandatory; they default to inet/inet6 and static. However, you must specify inet/inet6 when you are creating DHCP or loopback interfaces.
auto lo
iface lo inet loopback
To make non-persistent changes to interfaces at runtime, use ip addr add:
cumulus@switch:~$ sudo ip addr add 10.0.0.1/30 dev swp1
cumulus@switch:~$ sudo ip addr add 2001:DB8::1/126 dev swp1
To remove an address from an interface, use ip addr del:
cumulus@switch:~$ sudo ip addr del 10.0.0.1/30 dev swp1
cumulus@switch:~$ sudo ip addr del 2001:DB8::1/126 dev swp1
Interface Descriptions
You can add a description (alias) to an interface.
In the /etc/network/interfaces file, add a description using the alias keyword:
cumulus@switch:~$ sudo nano /etc/network/interfaces
auto swp1
iface swp1
alias hypervisor_port_1
Interface Commands
You can specify user commands for an interface that run at pre-up, up, post-up, pre-down, down, and post-down.
You can add any valid command in the sequence to bring an interface up or down; however, limit the scope to network-related commands associated with the particular interface. For example, it does not make sense to install a Debian package on ifup of swp1, even though it is technically possible. See man interfaces for more details.
The following example adds a command to an interface to enable proxy ARP:
If your post-up command also starts, restarts, or reloads any systemd service, you must use the --no-block option with systemctl. Otherwise, that service or even the switch itself might hang after starting or restarting. For example, to restart the dhcrelay service after bringing up a VLAN, the /etc/network/interfaces configuration looks like this:
auto bridge.100
iface bridge.100
post-up systemctl --no-block restart dhcrelay.service
Source Interface File Snippets
Sourcing interface files helps organize and manage the /etc/network/interfaces file. For example:
cumulus@switch:~$ sudo cat /etc/network/interfaces
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
auto eth0
iface eth0 inet dhcp
source /etc/network/interfaces.d/bond0
Use the glob keyword to specify bridge ports and bond slaves:
auto br0
iface br0
bridge-ports glob swp1-6.100
auto br1
iface br1
bridge-ports glob swp7-9.100 swp11.100 glob swp15-18.100
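To see which ports a glob expression covers, the following Python sketch expands the numeric range form used in the examples above. It is a simplified illustration, not the ifupdown2 implementation; the function names and the regular expression are assumptions, and only the prefix, start-end, suffix pattern is handled.

```python
import re

# Matches names such as swp1-6.100: prefix, range start, range end, suffix.
_RANGE = re.compile(r"^(\D+)(\d+)-(\d+)(.*)$")


def expand_glob(pattern: str):
    """Expand one range pattern; return it unchanged if it is not a range."""
    match = _RANGE.match(pattern)
    if not match:
        return [pattern]
    prefix, start, end, suffix = match.groups()
    return [f"{prefix}{n}{suffix}" for n in range(int(start), int(end) + 1)]


def expand_ports(value: str):
    """Expand a bridge-ports value where 'glob' applies to the next token."""
    ports, expand_next = [], False
    for token in value.split():
        if token == "glob":
            expand_next = True
            continue
        ports.extend(expand_glob(token) if expand_next else [token])
        expand_next = False
    return ports


print(expand_ports("glob swp7-9.100 swp11.100 glob swp15-18.100"))
```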
Fast Linkup
Cumulus Linux supports fast linkup on interfaces on NVIDIA Spectrum1 switches. Fast linkup enables you to bring up ports with cards that require links to come up fast, such as certain 100G optical network interface cards.
You must configure both sides of the connection with the same speed and FEC settings.
cumulus@switch:~$ nv set interface swp1 link fast-linkup on
cumulus@switch:~$ nv config apply
Edit the /etc/cumulus/switchd.conf file and add the interface.<interface>.enable_media_depended_linkup_flow=TRUE and interface.<interface>.enable_port_short_tuning=TRUE settings for the interfaces on which you want to enable fast linkup. The following example enables fast linkup on swp1:
Reload switchd with the sudo systemctl reload switchd.service command.
Link Flap Protection
Cumulus Linux enables link flap detection by default. Link flap detection triggers when there are five link flaps within ten seconds, at which point the interface goes into a protodown state and shows linkflap as the reason. The switchd service also shows a log message similar to the following:
2023-02-10T17:53:21.264621+00:00 cumulus switchd[10109]: sync_port.c:2263 ERR swp2 link flapped more than 3 times in the last 60 seconds, setting protodown
To show interfaces with the protodown flag, run the Linux ip link command:
cumulus@switch:~$ ip link
...
37: swp2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 9178 qdisc pfifo_fast master bond131 state DOWN mode DEFAULT group default qlen 1000
link/ether 1c:34:da:ba:bb:2a brd ff:ff:ff:ff:ff:ff protodown on protodown_reason <linkflap>
...
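On a switch with many ports, it can help to filter the ip link output for the protodown flag. The Python sketch below parses text in the format shown above and returns each protodown interface with its reason. The function name is an assumption, and the swp3 entry in the sample is made up to provide a contrasting case.

```python
import re

SAMPLE = """\
37: swp2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 9178 qdisc pfifo_fast master bond131 state DOWN mode DEFAULT group default qlen 1000
    link/ether 1c:34:da:ba:bb:2a brd ff:ff:ff:ff:ff:ff protodown on protodown_reason <linkflap>
38: swp3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9178 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 1c:34:da:ba:bb:2b brd ff:ff:ff:ff:ff:ff
"""


def protodown_interfaces(ip_link_output: str) -> dict:
    """Map each interface that has 'protodown on' to its protodown reason."""
    result, current = {}, None
    for line in ip_link_output.splitlines():
        # Header lines look like "37: swp2: <FLAGS> ..." (or "name@parent:").
        header = re.match(r"^\d+:\s+([^:@\s]+)[:@]", line)
        if header:
            current = header.group(1)
            continue
        if current and re.search(r"\bprotodown on\b", line):
            reason = re.search(r"protodown_reason <([^>]*)>", line)
            result[current] = reason.group(1) if reason else ""
    return result


print(protodown_interfaces(SAMPLE))  # {'swp2': 'linkflap'}
```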
Clear the Interface Protodown State and Reason
The ifdown and ifup commands do not clear the protodown state. You must clear the protodown state and the reason manually using the sudo ip link set <interface> protodown_reason linkflap off and sudo ip link set <interface> protodown off commands.
cumulus@switch:~$ sudo ip link set swp2 protodown_reason linkflap off
cumulus@switch:~$ sudo ip link set swp2 protodown off
After a few seconds the port state returns to UP. Run the ip link show <interface> command to verify that the interface is no longer in a protodown state and that the reason clears:
cumulus@switch:~$ ip link show swp2
37: swp2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 9178 qdisc pfifo_fast master bond131 state UP mode DEFAULT group default qlen 1000
link/ether 1c:34:da:ba:bb:2a brd ff:ff:ff:ff:ff:ff
Change Link Flap Protection Settings
You can change link flap protection settings in the /etc/cumulus/switchd.conf file:
To change the duration during which a link must flap the number of times set in the link flap threshold before link flap protection triggers, change the link_flap_window setting.
To change the number of times the link must flap within the link flap window before link flap protection triggers, change the link_flap_threshold setting.
To disable link flap protection, set the link_flap_window and link_flap_threshold parameters to 0 (zero).
After you change the link flap settings, you must restart switchd with the sudo systemctl restart switchd.service command.
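The interaction between the window and the threshold can be modeled as a sliding-window counter. The following Python sketch is an illustration under stated assumptions: the class name, the use of seconds as timestamps, and the exact counting rule are not taken from switchd; only the default values (five flaps in ten seconds) and the zero-disables behavior come from the text above.

```python
from collections import deque


class LinkFlapDetector:
    """Sliding-window model of link_flap_threshold and link_flap_window."""

    def __init__(self, threshold: int = 5, window: float = 10.0):
        self.threshold = threshold
        self.window = window
        self._flaps = deque()

    def record_flap(self, timestamp: float) -> bool:
        """Record a flap; return True if protection would trigger."""
        # Setting both values to 0 disables link flap protection.
        if self.threshold == 0 and self.window == 0:
            return False
        self._flaps.append(timestamp)
        # Discard flaps that fall outside the window ending at this flap.
        while timestamp - self._flaps[0] > self.window:
            self._flaps.popleft()
        return len(self._flaps) >= self.threshold


detector = LinkFlapDetector()
print([detector.record_flap(t) for t in (0, 2, 4, 6, 8)])
# [False, False, False, False, True]
```

A wider window or a lower threshold makes the protection more sensitive; flaps spaced further apart than the window never accumulate.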
Mako Templates
ifupdown2 supports Mako-style templates. The Mako template engine processes the interfaces file before parsing.
Use the template to declare cookie-cutter bridges and to declare addresses in the interfaces file:
%for i in [1,12]:
auto swp${i}
iface swp${i}
address 10.20.${i}.3/24
%endfor
In Mako syntax, use square brackets ([1,12]) to specify a list of individual numbers. Use range(1,12) to specify a range of interfaces.
To test your template and confirm it evaluates correctly, run mako-render /etc/network/interfaces.
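Because Mako loop expressions are ordinary Python, you can preview what a list or a range produces before putting it in a template. The sketch below uses plain Python rather than the Mako engine; the render_stanzas helper is a stand-in written for this illustration.

```python
def render_stanzas(indices) -> str:
    """Build the stanzas a loop over 'indices' would produce."""
    lines = []
    for i in indices:
        lines += [f"auto swp{i}", f"iface swp{i}", f"    address 10.20.{i}.3/24"]
    return "\n".join(lines)


# A list names individual values: only swp1 and swp12.
print(render_stanzas([1, 12]))

# A range excludes its end value: range(1, 12) stops at swp11.
print(list(range(1, 12)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# To include swp12 in a range, use 13 as the end value.
print(list(range(1, 13))[-1])  # 12
```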
To comment out content in Mako templates, use double hash marks (##). For example:
## % for i in range(1, 4):
## auto swp${i}
## iface swp${i}
## % endfor
##
Unlike the traditional ifupdown system, ifupdown2 does not run scripts installed in /etc/network/*/ automatically to configure network interfaces.
To enable or disable ifupdown2 scripting, edit the addon_scripts_support line in the /etc/network/ifupdown2/ifupdown2.conf file. 1 enables scripting and 0 disables scripting. For example:
cumulus@switch:~$ sudo nano /etc/network/ifupdown2/ifupdown2.conf
# Support executing of ifupdown style scripts.
# Note that by default python addon modules override scripts with the same name
addon_scripts_support=1
ifupdown2 sets the following environment variables when executing commands:
$IFACE represents the physical name of the interface; for example, br0 or vxlan42. The name comes from the /etc/network/interfaces file.
$LOGICAL represents the logical name (configuration name) of the interface.
$METHOD represents the address method; for example, loopback, DHCP, DHCP6, manual, static, and so on.
$ADDRFAM represents the address families associated with the interface in a comma-separated list; for example, "inet,inet6".
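A script can read these variables from its environment. The following Python sketch shows one way to do so; the function name, the fallback values, and the output format are assumptions for illustration, and only the four variable names come from the list above.

```python
import os


def describe_interface(env=None) -> str:
    """Summarize the interface from the variables ifupdown2 exports."""
    env = os.environ if env is None else env
    iface = env.get("IFACE", "")
    # Assumption: fall back to the physical name if LOGICAL is not set.
    logical = env.get("LOGICAL", iface)
    method = env.get("METHOD", "")
    # ADDRFAM is a comma-separated list such as "inet,inet6".
    families = [f for f in env.get("ADDRFAM", "").split(",") if f]
    return f"{iface} (logical {logical}) method={method} families={'+'.join(families)}"


example = {"IFACE": "br0", "LOGICAL": "br0", "METHOD": "static", "ADDRFAM": "inet,inet6"}
print(describe_interface(example))
# br0 (logical br0) method=static families=inet+inet6
```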
Troubleshooting
To see the link and administrative state of an interface:
cumulus@switch:~$ nv show interface swp1 link state
In the following example, swp1 is administratively UP and the physical link is UP (LOWER_UP).
cumulus@switch:~$ ip link show dev swp1
3: swp1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 500
link/ether 44:38:39:00:03:c1 brd ff:ff:ff:ff:ff:ff
To show the assigned IP address on an interface:
cumulus@switch:~$ nv show interface swp1 ip address
cumulus@switch:~$ ip addr show swp1
3: swp1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 500
link/ether 44:38:39:00:03:c1 brd ff:ff:ff:ff:ff:ff
inet 192.0.2.1/30 scope global swp1
inet 192.0.2.2/30 scope global swp1
inet6 2001:DB8::1/126 scope global tentative
valid_lft forever preferred_lft forever
To show the description (alias) for an interface:
cumulus@switch:~$ nv show interface swp1
cumulus@switch:~$ ip link show swp1
3: swp1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT qlen 500
link/ether aa:aa:aa:aa:aa:bc brd ff:ff:ff:ff:ff:ff
alias hypervisor_port_1
Considerations
Even though ifupdown2 supports the inclusion of multiple iface stanzas for the same interface, use a single iface stanza for each interface. If you must specify more than one iface stanza (for example, if the configuration for a single interface comes from many places, like a template or a sourced file), make sure the stanzas do not specify the same interface attributes. Otherwise, you see unexpected behavior.
In the following example, swp1 is in two files: /etc/network/interfaces and /etc/network/interfaces.d/speed_settings. ifupdown2 parses this configuration because the same attributes are not in multiple iface stanzas.
cumulus@switch:~$ sudo cat /etc/network/interfaces
source /etc/network/interfaces.d/speed_settings
auto swp1
iface swp1
address 10.0.14.2/24
cumulus@switch:~$ cat /etc/network/interfaces.d/speed_settings
auto swp1
iface swp1
link-speed 1000
link-duplex full
ifupdown2 and sysctl
For sysctl commands in the pre-up, up, post-up, pre-down, down, and post-down lines that use the $IFACE variable, if the interface name contains a dot (.), ifupdown2 does not change the name to work with sysctl. For example, the interface name bridge.1 does not convert to bridge/1.
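Because sysctl treats the dot as a separator in key names, an interface name that contains a dot has to be written with a slash in the key. The Python sketch below performs that conversion so you can write the key out explicitly instead of relying on $IFACE. The function name and the proxy_arp parameter are chosen for illustration only.

```python
def sysctl_key(interface: str, parameter: str, family: str = "ipv4") -> str:
    """Build a per-interface sysctl key, replacing dots in the name with slashes."""
    return f"net.{family}.conf.{interface.replace('.', '/')}.{parameter}"


print(sysctl_key("bridge.1", "proxy_arp"))  # net.ipv4.conf.bridge/1.proxy_arp
print(sysctl_key("swp1", "proxy_arp"))      # net.ipv4.conf.swp1.proxy_arp
```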
ifupdown2 and the gateway Parameter
The default route that the gateway parameter creates in ifupdown2 does not install in FRR and therefore does not redistribute into other routing protocols. Define a static default route instead, which installs in FRR and redistributes, if needed.
The following shows an example of the /etc/network/interfaces file when you use a static route instead of a gateway parameter:
auto swp2
iface swp2
address 172.16.3.3/24
up ip route add default via 172.16.3.2
Interface Name Limitations
Interface names can be a maximum of 15 characters. You cannot use a number for the first character and you cannot include a dash (-) in the name. In addition, you cannot use any name that matches the regular expression .{0,13}\-v.*.
If you encounter issues, remove the interface name from the /etc/network/interfaces file, then restart the networking.service.
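The naming rules above can be checked before you edit the interfaces file. The following Python sketch applies each rule in turn and reports every violation it finds. The function name and the message wording are assumptions for this illustration; the rules themselves are the ones listed above.

```python
import re

# The reserved pattern from the naming rules.
_RESERVED = re.compile(r".{0,13}\-v.*")


def interface_name_problems(name: str):
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if len(name) > 15:
        problems.append("longer than 15 characters")
    if name[:1].isdigit():
        problems.append("starts with a number")
    if "-" in name:
        problems.append("contains a dash")
    if _RESERVED.fullmatch(name):
        problems.append("matches the reserved pattern")
    return problems


print(interface_name_problems("swp1"))     # []
print(interface_name_problems("bond-v1"))  # ['contains a dash', 'matches the reserved pattern']
```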
ifupdown2 does not honor the configured IP address scope setting in the /etc/network/interfaces file and treats all addresses as global. It does not report an error. Consider this example configuration:
auto swp2
iface swp2
address 35.21.30.5/30
address 3101:21:20::31/80
scope link
When you run ifreload -a on this configuration, ifupdown2 considers all IP addresses as global.
cumulus@switch:~$ ip addr show swp2
5: swp2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 74:e6:e2:f5:62:82 brd ff:ff:ff:ff:ff:ff
inet 35.21.30.5/30 scope global swp2
valid_lft forever preferred_lft forever
inet6 3101:21:20::31/80 scope global
valid_lft forever preferred_lft forever
inet6 fe80::76e6:e2ff:fef5:6282/64 scope link
valid_lft forever preferred_lft forever
To work around this issue, configure the IP address scope:
The NVUE command is not supported.
In the /etc/network/interfaces file, configure the IP address scope using post-up ip address add <address> dev <interface> scope <scope>. For example:
auto swp6
iface swp6
post-up ip address add 71.21.21.20/32 dev swp6 scope site
Then run the ifreload -a command on this configuration.
The following configuration shows the correct scope:
cumulus@switch:~$ ip addr show swp6
9: swp6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 74:e6:e2:f5:62:86 brd ff:ff:ff:ff:ff:ff
inet 71.21.21.20/32 scope site swp6
valid_lft forever preferred_lft forever
inet6 fe80::76e6:e2ff:fef5:6286/64 scope link
valid_lft forever preferred_lft forever
For NVIDIA Spectrum ASICs, the firmware configures FEC, link speed, duplex mode and auto-negotiation automatically, following a predefined list of parameter settings until the link comes up. You can disable FEC if necessary, which forces the firmware to not try any FEC options.
MTU
Interface MTU applies to traffic traversing the management port, front panel or switch ports, bridge, VLAN subinterfaces, and bonds (both physical and logical interfaces). MTU is the only interface setting that you must set manually.
In Cumulus Linux, ifupdown2 assigns 9216 as the default MTU setting. The initial MTU value set by the driver is 9238. After you configure the interface, the default MTU setting is 9216.
To change the MTU setting, run the following commands. The example command sets the MTU to 1500 for the swp1 interface.
cumulus@switch:~$ nv set interface swp1 link mtu 1500
cumulus@switch:~$ nv config apply
Edit the /etc/network/interfaces file, then run the ifreload -a command.
cumulus@switch:~$ sudo nano /etc/network/interfaces
auto swp1
iface swp1
mtu 1500
cumulus@switch:~$ sudo ifreload -a
Runtime Configuration (Advanced)
Run the ip link set command. The following example command sets the swp1 interface MTU to 1500.
cumulus@switch:~$ sudo ip link set dev swp1 mtu 1500
A runtime configuration is non-persistent; the configuration you create does not persist after you reboot the switch.
Set a Global Policy
To set a global MTU policy, create a policy document (called mtu.json). For example:
The policies and attributes in any file in /etc/network/ifupdown2/policy.d/ override the default policies and attributes in /var/lib/ifupdown2/policy.d/.
Bridge MTU
The MTU setting is the lowest MTU of any interface that is a member of the bridge (every interface specified in bridge-ports in the bridge configuration of the /etc/network/interfaces file). You are not required to specify an MTU on the bridge. Consider this bridge configuration:
For a bridge to have an MTU of 9000, set the MTU for each of the member interfaces (bond1 to bond4, and peer5) to 9000 at minimum.
When configuring MTU for a bond, configure the MTU value directly under the bond interface; the member links or slave interfaces inherit the configured value. If you need a different MTU on the bond, set it on the bond interface, as this ensures the slave interfaces pick it up. You do not have to specify an MTU on the slave interfaces.
VLAN interfaces inherit their MTU settings from their physical devices or their lower interface; for example, swp1.100 inherits its MTU setting from swp1. Therefore, specifying an MTU on swp1 ensures that swp1.100 inherits the MTU setting for swp1.
If you are working with VXLANs, the MTU for a virtual network interface (VNI) must be 50 bytes smaller than the MTU of the physical interfaces on the switch, as various headers and other data require those 50 bytes. Also, consider setting the MTU much higher than 1500.
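The 50-byte figure is simple arithmetic over the encapsulation headers. The sketch below assumes the usual breakdown for an IPv4 underlay without extra tags (14-byte Ethernet, 20-byte IPv4, 8-byte UDP, and 8-byte VXLAN headers); the constant and function names are illustrative.

```python
# Assumed header sizes for VXLAN over an IPv4 underlay, in bytes.
ETHERNET_HEADER = 14
IPV4_HEADER = 20
UDP_HEADER = 8
VXLAN_HEADER = 8

VXLAN_OVERHEAD = ETHERNET_HEADER + IPV4_HEADER + UDP_HEADER + VXLAN_HEADER


def max_vni_mtu(physical_mtu: int) -> int:
    """Largest VNI MTU that still fits inside the physical interface MTU."""
    return physical_mtu - VXLAN_OVERHEAD


print(VXLAN_OVERHEAD)     # 50
print(max_vni_mtu(9216))  # 9166
print(max_vni_mtu(1500))  # 1450
```

With a physical MTU of 1500, hosts sending standard 1500-byte packets across the VNI would exceed the limit, which is why a larger physical MTU is worth considering.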
To show the MTU setting for an interface:
cumulus@switch:~$ nv show interface swp1
...
link
auto-negotiate off on
duplex full full
speed 1G auto
fec auto
mtu 9216 9216
cumulus@switch:~$ ip link show dev swp1
3: swp1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9216 qdisc pfifo_fast state UP mode DEFAULT qlen 500
link/ether 44:38:39:00:03:c1 brd ff:ff:ff:ff:ff:ff
Drop Packets that Exceed the Egress Layer 3 MTU
The switch forwards all packets that are within the MTU value set for the egress layer 3 interface. However, when packets are larger in size than the MTU value, the switch fragments the packets that do not have the DF bit set and drops the packets that do have the DF bit set.
Run the following command to drop all IP packets that are larger in size than the MTU value for the egress layer 3 interface instead of fragmenting packets:
cumulus@switch:~$ nv set system control-plane trap l3-mtu-err state off
cumulus@switch:~$ nv config apply
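The forwarding behavior described above reduces to a small decision function. The Python sketch below models it for illustration; the function name and the drop_oversize flag (standing in for the trap setting shown in the command) are assumptions, not switch software.

```python
def egress_action(packet_len: int, mtu: int, df_bit: bool, drop_oversize: bool = False) -> str:
    """Return 'forward', 'fragment', or 'drop' for a packet on a layer 3 egress interface."""
    if packet_len <= mtu:
        return "forward"
    # Oversized packets: drop when DF is set or when fragmentation is turned off.
    if df_bit or drop_oversize:
        return "drop"
    return "fragment"


print(egress_action(1400, 1500, df_bit=False))                      # forward
print(egress_action(1600, 1500, df_bit=False))                      # fragment
print(egress_action(1600, 1500, df_bit=True))                       # drop
print(egress_action(1600, 1500, df_bit=False, drop_oversize=True))  # drop
```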
FEC is an encoding and decoding layer that enables the switch to detect and correct bit errors introduced over the cable between two interfaces. The target IEEE BER on high speed Ethernet links is 10^-12. Because 25G transmission speeds can introduce a higher than acceptable BER on a link, FEC is often required to correct errors to achieve the target BER at 25G, 4x25G, 100G, and higher link speeds. The type and grade of a cable or module and the medium of transmission determine which FEC setting is necessary.
For the link to come up, the two interfaces on each end must use the same FEC setting.
FEC adds a small latency overhead. For most applications, this small amount of latency is preferable to error packet retransmission latency.
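To put a bit error rate in perspective, you can convert it into the average time between bit errors at a given line rate. The following Python sketch does that arithmetic; the function name is illustrative, and the calculation assumes a fully loaded link with independent errors.

```python
def seconds_between_bit_errors(line_rate_bps: float, ber: float) -> float:
    """Average seconds between bit errors on a fully loaded link."""
    errors_per_second = line_rate_bps * ber
    return 1.0 / errors_per_second


# 25G link at a BER of 1e-12: roughly one bit error every 40 seconds.
print(round(seconds_between_bit_errors(25e9, 1e-12), 3))

# The same link at a BER of 1e-5: hundreds of thousands of errors per second.
print(round(25e9 * 1e-5))
```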
The two FEC types are:
Reed Solomon (RS), IEEE 802.3 Clause 108 (CL108) on individual 25G channels and Clause 91 on 100G (4 channels). This is the strongest FEC algorithm, providing the best bit-error correction.
Base-R (BaseR), Fire Code (FC), IEEE 802.3 Clause 74 (CL74). Base-R provides less protection from bit errors than RS FEC but adds less latency.
Cumulus Linux includes additional FEC options:
Auto FEC instructs the hardware to select the best FEC. For copper DAC, the remote end can negotiate FEC. However, optical modules do not have auto-negotiation capability; if the device chooses a preferred mode, it might not match the remote end. This is the current default on the NVIDIA Spectrum switch.
No FEC (no error correction).
While Auto FEC is the default setting on the NVIDIA Spectrum switch, do not explicitly configure the fec auto option on the switch as this leads to a link flap whenever you run nv config apply or ifreload -a.
For 25G DAC, 4x25G Breakouts DAC and 100G DAC cables, the IEEE 802.3by specification defines three classes:
CA-25G-L (Long cable) - Requires RS FEC - Achievable cable length of at least 5m. dB loss less than or equal to 22.48. Expected BER of 10^-5 or better without RS FEC enabled.
CA-25G-S (Short cable) - Requires Base-R FEC - Achievable cable length of at least 3m. dB loss less than or equal to 16.48. Expected BER of 10^-8 or better without Base-R FEC enabled.
CA-25G-N (No FEC) - Does not require FEC - Achievable cable length of at least 3m. dB loss less than or equal to 12.98. Expected BER of 10^-12 or better with no FEC enabled.
The IEEE classification specifies various dB loss measurements and minimum achievable cable length. You can build longer and shorter cables if they comply with the dB loss and BER requirements.
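The class boundaries can be expressed as a lookup on dB loss. The Python sketch below maps a measured loss to the cable class and the FEC that class requires, using the thresholds listed above. The function and table names are assumptions for illustration.

```python
# (class name, maximum dB loss, required FEC), ordered from lowest loss up.
CABLE_CLASSES = [
    ("CA-25G-N", 12.98, "none"),
    ("CA-25G-S", 16.48, "Base-R"),
    ("CA-25G-L", 22.48, "RS"),
]


def classify_cable(db_loss: float):
    """Return (cable class, required FEC) for a 25G DAC with the given loss."""
    for name, max_loss, fec in CABLE_CLASSES:
        if db_loss <= max_loss:
            return name, fec
    raise ValueError(f"{db_loss} dB exceeds the CA-25G-L limit of 22.48 dB")


print(classify_cable(10.0))  # ('CA-25G-N', 'none')
print(classify_cable(15.0))  # ('CA-25G-S', 'Base-R')
print(classify_cable(20.0))  # ('CA-25G-L', 'RS')
```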
If a cable has a CA-25G-S classification and FEC is not on, the BER might be unacceptable in a production network. It is important to set the FEC according to the cable class (or better) to have acceptable bit error rates. See Determining Cable Class below.
You can check bit errors using cl-netstat (RX_ERR column) or ethtool -S (HwIfInErrors counter) after a large amount of traffic passes through the link. A non-zero value indicates bit errors.
Expect error packets to be zero or extremely low compared to good packets. If a cable has an unacceptable rate of errors with FEC enabled, replace the cable.
For 25G, 4x25G Breakout, and 100G Fiber modules and AOCs, there is no classification of 25G cable types for dB loss, BER or length. FEC is recommended but might not be necessary if the BER is low enough.
Cable Class of 100G and 25G DACs
You can determine the cable class for 100G and 25G DACs from the Extended Specification Compliance Code field (SFP28: 0Ah, byte 35, QSFP28: Page 0, byte 192) in the cable EEPROM programming.
For 100G DACs, most manufacturers use the 0x0Bh 100GBASE-CR4 or 25GBASE-CR CA-L value (the 100G DAC specification predates the IEEE 802.3by 25G DAC specification). Use RS FEC for 100G DAC; shorter or better cables might not need this setting.
A manufacturer’s EEPROM setting might not match the dB loss on a cable or the actual bit error rates that a particular cable introduces. Use the designation as a guide, but set FEC according to the bit error rate tolerance in the design criteria for the network. For most applications, the highest mutual FEC ability of both end devices is the best choice.
You can determine for which grade the manufacturer has designated the cable as follows.
In each example below, the Compliance field comes from the method described above; the ethtool -m output does not show it.
3 meter cable that does not require FEC (CA-N)
Cost: More expensive
Cable size: 26AWG (Note that AWG does not necessarily correspond to overall dB loss or BER performance)
Compliance Code: 25GBASE-CR CA-N
3 meter cable that requires Base-R FEC (CA-S)
Cost: Less expensive
Cable size: 26AWG
Compliance Code: 25GBASE-CR CA-S
When in doubt, consult the manufacturer directly to determine the cable classification.
Spectrum ASIC FEC Behavior
The firmware in a Spectrum ASIC applies FEC configuration to 25G and 100G cables based on the cable type and whether the peer switch also has a Spectrum ASIC.
When the link is between two switches with Spectrum ASICs:
For 25G optical modules, the Spectrum ASIC firmware chooses Base-R/FC-FEC.
For 25G DAC cables with attenuation less than or equal to 16dB, the firmware chooses Base-R/FC-FEC.
For 25G DAC cables with attenuation higher than 16dB, the firmware chooses RS-FEC.
For 100G cables/modules, the firmware chooses RS-FEC.
Cable Type                                 FEC Mode
25G optical cables                         Base-R/FC-FEC
25G 1,2 meters: CA-N, loss <13dB           Base-R/FC-FEC
25G 2.5,3 meters: CA-S, loss <16dB         Base-R/FC-FEC
25G 2.5,3,4,5 meters: CA-L, loss > 16dB    RS-FEC
100G DAC or optical                        RS-FEC
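The selection rules for a link between two Spectrum switches can be summarized as a short function. This Python sketch mirrors the rules stated above for illustration only; it is not the firmware logic, and the function name and the "optical" and "dac" media labels are assumptions.

```python
def spectrum_to_spectrum_fec(speed: str, media: str, attenuation_db=None) -> str:
    """FEC mode chosen on a Spectrum-to-Spectrum link, per the rules above."""
    if speed == "100G":
        return "RS-FEC"
    if speed != "25G":
        raise ValueError(f"no rule for speed {speed!r}")
    if media == "optical":
        return "Base-R/FC-FEC"
    if media == "dac":
        if attenuation_db is None:
            raise ValueError("attenuation is required for 25G DAC cables")
        # 16 dB is the boundary between Base-R/FC-FEC and RS-FEC.
        return "Base-R/FC-FEC" if attenuation_db <= 16 else "RS-FEC"
    raise ValueError(f"unknown media {media!r}")


print(spectrum_to_spectrum_fec("25G", "optical"))    # Base-R/FC-FEC
print(spectrum_to_spectrum_fec("25G", "dac", 12.0))  # Base-R/FC-FEC
print(spectrum_to_spectrum_fec("25G", "dac", 20.0))  # RS-FEC
print(spectrum_to_spectrum_fec("100G", "dac"))       # RS-FEC
```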
When linking to a non-Spectrum peer, the firmware lets the peer decide. The Spectrum ASIC supports RS-FEC (for both 100G and 25G), Base-R/FC-FEC (25G only), or no-FEC (for both 100G and 25G).
Cable Type                                 FEC Mode
25G optical cables                         Let peer decide
25G 1,2 meters: CA-N, loss <13dB           Let peer decide
25G 2.5,3 meters: CA-S, loss <16dB         Let peer decide
25G 2.5,3,4,5 meters: CA-L, loss > 16dB    Let peer decide
100G                                       Let peer decide: RS-FEC or No FEC
How Does Cumulus Linux use FEC?
A Spectrum switch enables FEC automatically when it powers up. The port firmware tests and determines the correct FEC mode to bring the link up with the neighbor. It is possible to get a link up to a switch without enabling FEC on the remote device as the switch eventually finds a working combination to the neighbor without FEC.
The following sections describe how to show the current FEC mode, and how to enable and disable FEC.
Show the Current FEC Mode
To show the FEC mode on a switch port, run the NVUE nv show interface <interface> link command.
cumulus@switch:~$ nv show interface swp1 link
operational applied pending description
---------------- ------------ ------- ------- ----------------------------------------------------------------------
auto-negotiate off on on Link speed and characteristic auto negotiation
breakout 1x 1x sub-divide or disable ports (only valid on plug interfaces)
duplex full full full Link duplex
fec auto auto Link forward error correction mechanism
...
Enable or Disable FEC
To enable Reed Solomon (RS) FEC on a link:
cumulus@switch:~$ nv set interface swp1 link fec rs
cumulus@switch:~$ nv config apply
Edit the /etc/network/interfaces file, then run the ifreload -a command. The following example enables RS FEC for the swp1 interface (link-fec rs):
cumulus@switch:~$ sudo nano /etc/network/interfaces
auto swp1
iface swp1
link-autoneg off
link-speed 100000
link-fec rs
cumulus@switch:~$ sudo ifreload -a
Runtime Configuration (Advanced)
Run the ethtool --set-fec <interface> encoding RS command. For example:
cumulus@switch:~$ sudo ethtool --set-fec swp1 encoding RS
A runtime configuration is non-persistent. The configuration you create does not persist after you reboot the switch.
To enable Base-R/FireCode FEC on a link:
cumulus@switch:~$ nv set interface swp1 link fec baser
cumulus@switch:~$ nv config apply
Edit the /etc/network/interfaces file, then run the ifreload -a command. The following example enables Base-R FEC for the swp1 interface (link-fec baser):
cumulus@switch:~$ sudo nano /etc/network/interfaces
auto swp1
iface swp1
link-autoneg off
link-speed 25000
link-fec baser
cumulus@switch:~$ sudo ifreload -a
Runtime Configuration (Advanced)
Run the ethtool --set-fec <interface> encoding baser command. For example:
cumulus@switch:~$ sudo ethtool --set-fec swp1 encoding baser
A runtime configuration is non-persistent. The configuration you create does not persist after you reboot the switch.
To enable FEC with Auto-negotiation:
You can use FEC with auto-negotiation on DACs only.
cumulus@switch:~$ nv set interface swp1 link auto-negotiate on
cumulus@switch:~$ nv config apply
Edit the /etc/network/interfaces file to set auto-negotiation to on, then run the ifreload -a command:
cumulus@switch:~$ sudo nano /etc/network/interfaces
auto swp1
iface swp1
link-autoneg on
cumulus@switch:~$ sudo ifreload -a
Runtime Configuration (Advanced)
You can use ethtool to enable FEC with auto-negotiation. For example:
cumulus@switch:~$ sudo ethtool -s swp1 speed 10000 duplex full autoneg on
A runtime configuration is non-persistent. The configuration you create does not persist after you reboot the switch.
To show the FEC and auto-negotiation settings for an interface, either run the NVUE nv show interface <interface> link command or the Linux sudo ethtool swp1 | egrep 'FEC|auto' command:
To disable FEC on a link:
cumulus@switch:~$ nv set interface swp1 link fec off
cumulus@switch:~$ nv config apply
To configure FEC to the default value, run the nv unset interface swp1 link fec command.
Edit the /etc/network/interfaces file, then run the ifreload -a command. The following example disables FEC for the swp1 interface (link-fec off):
cumulus@switch:~$ sudo nano /etc/network/interfaces
auto swp1
iface swp1
link-fec off
cumulus@switch:~$ sudo ifreload -a
Runtime Configuration (Advanced)
Run the ethtool --set-fec <interface> encoding off command. For example:
cumulus@switch:~$ sudo ethtool --set-fec swp1 encoding off
A runtime configuration is non-persistent. The configuration you create does not persist after you reboot the switch.
DR1 and DR4 Modules
100GBASE-DR1 modules, such as NVIDIA MMS1V70-CM, include internal RS FEC processing, which the software does not control. When using these optics, you must either set the FEC setting to off or leave it unset for the link to function.
400GBASE-DR4 modules, such as NVIDIA MMS1V00-WM, require RS FEC. The switch automatically enables FEC if it is set to off.
You typically use these optics to interconnect 4x SN2700 uplinks to a single SN4700 breakout downlink. The following configuration shows an explicit FEC example. You can leave the FEC settings unset for autodetection.
SN4700 (400GBASE-DR4 in swp1):
cumulus@SN4700:mgmt:~$ nv set interface swp1 link breakout 4x lanes-per-port 2
cumulus@SN4700:mgmt:~$ nv set interface swp1s0 link fec rs
cumulus@SN4700:mgmt:~$ nv set interface swp1s0 link speed 100G
cumulus@SN4700:mgmt:~$ nv set interface swp1s1 link fec rs
cumulus@SN4700:mgmt:~$ nv set interface swp1s1 link speed 100G
cumulus@SN4700:mgmt:~$ nv set interface swp1s2 link fec rs
cumulus@SN4700:mgmt:~$ nv set interface swp1s2 link speed 100G
cumulus@SN4700:mgmt:~$ nv set interface swp1s3 link fec rs
cumulus@SN4700:mgmt:~$ nv set interface swp1s3 link speed 100G
cumulus@SN4700:mgmt:~$ nv config apply
SN2700 (100GBASE-DR1 in swp11-14):
cumulus@SN2700:mgmt:~$ nv set interface swp11 link fec off
cumulus@SN2700:mgmt:~$ nv set interface swp11 link speed 100G
cumulus@SN2700:mgmt:~$ nv set interface swp12 link fec off
cumulus@SN2700:mgmt:~$ nv set interface swp12 link speed 100G
cumulus@SN2700:mgmt:~$ nv set interface swp13 link fec off
cumulus@SN2700:mgmt:~$ nv set interface swp13 link speed 100G
cumulus@SN2700:mgmt:~$ nv set interface swp14 link fec off
cumulus@SN2700:mgmt:~$ nv set interface swp14 link speed 100G
cumulus@SN2700:mgmt:~$ nv config apply
The FEC operational view of this configuration appears incorrect because FEC is operationally enabled only on the SN4700 400G breakout side. This is because the 100G DR1 module side handles FEC internally, which is not visible to Cumulus Linux.
cumulus@SN2700:mgmt:~$ nv show int swp11 link
operational applied
--------------------- ----------------- -------
auto-negotiate on on
duplex full full
speed 100G auto
fec off off
mtu 9216 9216
fast-linkup off
[breakout]
state up up
...
cumulus@SN4700:mgmt:~$ nv show int swp1s1 link
operational applied
--------------------- ----------------- -------
auto-negotiate on on
duplex full full
speed 100G auto
fec rs off
mtu 9216 9216
fast-linkup off
[breakout]
state up up
...
Default Policies for Interface Settings
Instead of configuring settings for each individual interface, you can specify a policy for all interfaces on a switch or tailor custom settings for each interface. Create a file in /etc/network/ifupdown2/policy.d/ and populate the settings accordingly. The following example shows a file called address.json.
Setting the default MTU also applies to the management interface. Be sure to add the iface_defaults to override the MTU for eth0, to remain at 9216.
Breakout Ports
Cumulus Linux supports the following port breakout options:
18x SFP28 25G and 4x QSFP28 100G interfaces only support NRZ encoding. You can set all speeds down to 1G.
All 4x QSFP28 ports can break out into 4x SFP28 or 2x QSFP28.
18x 1G - 18x SFP28 set to 1G
16x 1G - 4x QSFP28 configured as 4x breakouts and set to 1G
Maximum 1G ports: 34
18x 10G - 18x SFP28 set to 10G
16x 10G - 4x QSFP28 configured as 4x breakouts and set to 10G
Maximum 10G ports: 34
18x 25G - 18x SFP28 (native speed)
16x 25G - 4x QSFP28 breakouts to 4x and set to 25G
Maximum 25G ports: 34
4x 40G - 4x QSFP28 set to 40G
Maximum 40G ports: 4
8x 50G - 4x QSFP28 break out into 2x and set to 50G
Maximum 50G ports: 8
4x 100G - 4x QSFP28 (native speed)
Maximum 100G ports: 4
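Each maximum in the list above is the number of native ports that support the speed plus the number of QSFP28 ports multiplied by the breakout factor. The Python sketch below reproduces that arithmetic; the function and parameter names are assumptions for illustration.

```python
def max_ports(native_ports: int, breakout_ports: int, ways: int) -> int:
    """Ports available at a speed: native ports plus broken-out QSFP28 ports."""
    return native_ports + breakout_ports * ways


# 25G: 18 native SFP28 ports plus 4 QSFP28 ports broken out 4 ways.
print(max_ports(18, 4, 4))  # 34

# 50G: SFP28 ports do not reach 50G, so only 4 QSFP28 ports broken out 2 ways.
print(max_ports(0, 4, 2))   # 8

# 100G: 4 QSFP28 ports at native speed (no breakout).
print(max_ports(0, 4, 1))   # 4
```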
16x QSFP28 100G interfaces only support NRZ encoding. You can set all speeds down to 1G.
All QSFP28 ports can break out into 4x SFP28 or 2x QSFP28.
64x 1G - 16x QSFP28 break out into 4x and set to 1G
Maximum 1G ports: 64
64x 10G - 16x QSFP28 break out into 4x and set to 10G
Maximum 10G ports: 64
64x 25G - 16x QSFP28 break out into 4x and set to 25G
Maximum 25G ports: 64
16x 40G - 16x QSFP28 set to 40G
Maximum 40G ports: 16
32x 50G - 16x QSFP28 break out into 2x and set to 50G
Maximum 50G ports: 32
16x 100G - 16x QSFP28 (native speed)
Maximum 100G ports: 16
48x 1GBase-T ports (RJ45 up to 100m CAT5E/6) and 4x QSFP28 100G interfaces (only support NRZ encoding). You can set all speeds down to 1G.
All 4x QSFP28 ports can break out into 4x SFP28 or 2x QSFP28.
48x 1GBase-T - 48x Base-T set to 1G. You can also set them to 10/100Mb.
16x 1G - 4x QSFP28 configured as 4x breakouts and set to 1G
Maximum 10/100MBase-T ports: 48
Maximum 1GBase-T ports: 48
Maximum 1G ports: 16
16x 10G - 4x QSFP28 configured as 4x breakouts and set to 10G
Maximum 10G ports: 16
16x 25G - 4x QSFP28 breakouts to 4x and set to 25G
Maximum 25G ports: 16
4x 40G - 4x QSFP28 set to 40G
Maximum 40G ports: 4
8x 50G - 4x QSFP28 break out into 2x
Maximum 50G ports: 8
4x 100G - 4x QSFP28 (native speed)
Maximum 100G ports: 4
48x SFP28 25G and 8x QSFP28 100G interfaces only support NRZ encoding. You can set all speeds down to 1G.
The top 4x QSFP28 ports can break out into 4x SFP28. You cannot use the lower 4x QSFP28 disabled ports.
All 8x QSFP28 ports can break out into 2x QSFP28 without disabling ports.
48x 1G - 48x SFP28 set to 1G
16x 1G - 4x QSFP28 break out into 4x and set to 1G
Maximum 1G ports: 64
48x 10G - 48x SFP28 set to 10G
16x 10G - 4x QSFP28 break out into 4x and set to 10G
Maximum 10G ports: 64
48x 25G - 48x SFP28 (native speed)
16x 25G - Top 4x QSFP28 break out into 4x (bottom 4x QSFP28 disabled)
Maximum 25G ports: 64
8x 40G - 8x QSFP28 set to 40G
Maximum 40G ports: 8
16x 50G - 8x QSFP28 break out into 2x
Maximum 50G ports: 16
8x 100G - 8x QSFP28 (native speed)
Maximum 100G ports: 8
32x QSFP28 100G interfaces only support NRZ encoding. You can set all speeds down to 1G.
The top 16x QSFP28 ports can break out into 4x SFP28. You cannot use the lower 16x QSFP28 disabled ports.
All 32x QSFP28 ports can break out into 2x QSFP28 without disabling ports.
64x 1G - Top 16x QSFP28 break out into 4x and set to 1G (bottom 16x QSFP28 disabled)
Maximum 1G ports: 64
64x 10G - Top 16x QSFP28 break out into 4x and set to 10G (bottom 16x QSFP28 disabled)
Maximum 10G ports: 64
64x 25G - Top 16x QSFP28 break out into 4x (bottom 16x QSFP28 disabled)
Maximum 25G ports: 64
32x 40G - 32x QSFP28 set to 40G
Maximum 40G ports: 32
64x 50G - 32x QSFP28 break out into 2x
Maximum 50G ports: 64
32x 100G - 32x QSFP28 (native speed)
Maximum 100G ports: 32
48x SFP28 25G and 12x QSFP28 100G interfaces only support NRZ encoding. You can set all speeds down to 1G.
All 12x QSFP28 ports can break out into 4x SFP28 or 2x QSFP28.
48x 1G - 48x SFP28 set to 1G
48x 1G - 12x QSFP28 break out into 4x and set to 1G
Maximum 1G ports: 96
48x 10G - 48x SFP28 set to 10G
48x 10G - 12x QSFP28 break out into 4x and set to 10G
Maximum 10G ports: 96
48x 25G - 48x SFP28 (native speed)
48x 25G - 12x QSFP28 break out into 4x
Maximum 25G ports: 96
12x 40G - 12x QSFP28 set to 40G
Maximum 40G ports: 12
24x 50G - 12x QSFP28 break out into 2x
Maximum 50G ports: 24
12x 100G - 12x QSFP28 (native speed)
Maximum 100G ports: 12
32x QSFP28 100G interfaces only support NRZ encoding. You can set all speeds down to 1G.
All 32x QSFP28 ports can break out into 4x SFP28 or 2x QSFP28.
128x 1G - 32x QSFP28 break out into 4x and set to 1G
Maximum 1G ports: 128
128x 10G - 32x QSFP28 break out into 4x and set to 10G
Maximum 10G ports: 128
128x 25G - 32x QSFP28 break out into 4x
Maximum 25G ports: 128
32x 40G - 32x QSFP28 set to 40G
Maximum 40G ports: 32
64x 50G - 32x QSFP28 break out into 2x
Maximum 50G ports: 64
32x 100G - 32x QSFP28 (native speed)
Maximum 100G ports: 32
32x QSFP56 200G interfaces support both PAM4 and NRZ encodings. You can set all speeds down to 1G.
For lower speed interface configurations, PAM4 is automatically converted to NRZ encoding.
All 32x QSFP56 ports can break out into 4x SFP56 or 2x QSFP56.
128x 1G - 32x QSFP56 break out into 4x and set to 1G
Maximum 1G ports: 128
128x 10G - 32x QSFP56 break out into 4x and set to 10G
Maximum 10G ports: 128
128x 25G - 32x QSFP56 break out into 4x and set to 25G
Maximum 25G ports: 128
32x 40G - 32x QSFP56 set to 40G
Maximum 40G ports: 32
128x 50G - 32x QSFP56 break out into 4x
Maximum 50G ports: 128
64x 100G - 32x QSFP56 break out into 2x
Maximum 100G ports: 64
32x 200G - 32x QSFP56 (native speed)
Maximum 200G ports: 32
SN4410 24xQSFP28-DD interfaces [ports 1-24] support both PAM4 and NRZ encoding with all speeds from 200G down to 1G.
The 8xQSFP-DD (400GbE) interfaces [ports 25-32] support both PAM4 and NRZ encodings with all speeds from 400G down to 1G.
For lower speeds, PAM4 is automatically converted to NRZ encoding.
You can split ports #1 to #32 into:
2x ports with PAM4 and NRZ encoding with no limitations.
4x ports with PAM4 and NRZ encoding with no limitations.
8x ports with PAM4 and NRZ encoding, but this blocks an adjacent port (the total number of available MAC addresses is 128).
96x 1G - 24xQSFP28-DD break out into 4x and set to 1G
32x 1G - Top 4xQSFP-DD break out into 8x and set to 1G (bottom 4xQSFP-DD blocked*)
Maximum 1G ports: 128
96x 10G - 24xQSFP28-DD break out into 4x and set to 10G
32x 10G - 4 top QSFP-DD break out into 8x and set to 10G (bottom 4xQSFP-DD blocked*)
Maximum 10G ports: 128
*Other QSFP-DD breakout combinations are available up to maximum of 128x ports.
96x 25G - 24xQSFP28-DD break out into 4x
32x 25G - 4 top QSFP-DD break out into 8x and set to 25G (bottom 4xQSFP-DD blocked*)
Maximum 25G ports: 128
*Other QSFP-DD breakout combinations are available up to maximum of 128x ports.
48x 40G - 24xQSFP28-DD break out into 2x and set to 40G
16x 40G - 8xQSFP-DD break out into 2x and set to 40G
Maximum 40G ports: 64
96x 50G - 24xQSFP28-DD/QSFP56 break out into 4x
32x 50G - 8xQSFP-DD break out into 4x
Maximum 50G ports: 128
96x 100G - 24xQSFP28-DD/QSFP56 break out into 4x
32x 100G - 8xQSFP-DD break out into 4x
Maximum 100G ports: 128
48x 200G - 24xQSFP28-DD/QSFP56 break out into 2x
16x 200G - 8xQSFP-DD break out into 2x
Maximum 200G ports: 64
8x 400G - 8xQSFP-DD (native speed)
Maximum 400G ports: 8
64x QSFP28 100G interfaces only support NRZ encoding. You can set all speeds down to 1G.
Only 32x QSFP28 ports can break out into 4x SFP28. You must disable the adjacent QSFP28 port. Only the first and third or second and fourth rows can break out into 4x SFP28.
All 64x QSFP28 ports can break out into 2x QSFP28 without disabling ports.
128x 1G - 32x QSFP28 break out into 4x and set to 1G
Maximum 1G ports: 128
128x 10G - 32x QSFP28 break out into 4x and set to 10G
Maximum 10G ports: 128
128x 25G - 32x QSFP28 break out into 4x
Maximum 25G ports: 128
64x 40G - 64x QSFP28 set to 40G
Maximum 40G ports: 64
128x 50G - 64x QSFP28 break out into 2x
Maximum 50G ports: 128
64x 100G - 64x QSFP28 (native speed)
Maximum 100G ports: 64
SN4600 64xQSFP56 (200GbE) interfaces support both PAM4 and NRZ encodings with all speeds down to 1G.
For lower speeds, PAM4 is automatically converted to NRZ encoding.
Only 32xQSFP56 ports can break out into 4xSFP56 (4x50GbE). In this case, the adjacent QSFP56 port is blocked (only the first and third or second and fourth rows can break out into 4xSFP56).
All 64xQSFP56 ports can break out into 2xQSFP56 (2x100GbE) without blocking ports.
128x 1G - 32xQSFP56 break out into 4x and set to 1G
Maximum 1G ports: 128
128x 10G - 32xQSFP56 break out into 4x and set to 10G
Maximum 10G ports: 128
128x 25G - 32xQSFP56 break out into 4x and set to 25G
Maximum 25G ports: 128
64x 40G - 64xQSFP56 set to 40G
Maximum 40G ports: 64
128x 50G - 32xQSFP56 break out into 4x
Maximum 50G ports: 128
128x 100G - 64xQSFP56 break out into 2x
64x 100G - 64xQSFP28 set to 100G
Maximum 100G ports: 128
64x 200G - 64xQSFP56 (native speed)
Maximum 200G ports: 64
SN4700 32x QSFP-DD 400GbE interfaces support both PAM4 and NRZ encodings. You can set all speeds down to 1G.
For lower speed interface configurations, PAM4 is automatically converted to NRZ encoding.
Only the top 16x QSFP-DD ports can break out into 8x SFP56. You must disable the adjacent QSFP-DD port.
All 32x QSFP-DD ports can break out into 2x QSFP56 at 2x200G or 4x QSFP56 at 4x 100G without disabling ports.
128x 1G - Top 16x QSFP-DD break out into 8x and set to 1G
Maximum 1G ports: 128
128x 10G - 16x QSFP-DD break out into 8x and set to 10G
Maximum 10G ports: 128
*Cumulus Linux supports other QSFP-DD breakout combinations up to maximum of 128x ports.
128x 25G - 16x QSFP-DD break out into 8x and set to 25G
Maximum 25G ports: 128
*Cumulus Linux supports other QSFP-DD breakout combinations up to maximum of 128x ports.
32x 40G - 32x QSFP-DD set to 40G
Maximum 40G ports: 32
128x 50G - 16x QSFP-DD break out into 8x
Maximum 50G ports: 128
*Cumulus Linux supports other QSFP-DD breakout combinations up to maximum of 128x ports.
128x 100G - 32x QSFP-DD break out into 4x
Maximum 100G ports: 128
64x 200G - 32x QSFP-DD break out into 2x
Maximum 200G ports: 64
32x 400G - 32x QSFP-DD (native speed)
Maximum 400G ports: 32
SN5600 64xOSFP (800GbE) interfaces support both PAM4 and NRZ encodings with all speeds down to 10G.
For lower speeds, PAM4 is automatically converted to NRZ encoding.
Bonus port #65 supports 1G, 10G, and 25G but does not support breakouts.
Maximum 1G ports: 1 (bonus port)
256x 10G
Maximum 10G ports: 257 (256 + 1 bonus port)
256x 25G
Maximum 25G ports: 257 (256 + 1 bonus port)
128x 40G
Maximum 40G ports: 128
256x 50G - 32x OSFP break out into 8x - You must disable the adjacent OSFP port.
Maximum 50G ports: 256
256x 100G - 32x OSFP break out into 8x - You must disable the adjacent OSFP port.
Maximum 100G ports: 256
256x 200G - 64x OSFP break out into 4x
Maximum 200G ports: 256
128x 400G - 64x OSFP break out into 2x
Maximum 400G ports: 128
64x 800G
Maximum 800G ports: 64
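The maximums in the lists above follow from simple arithmetic: the native ports that run at a given speed plus the interfaces gained from breakouts. The following Python sketch illustrates that arithmetic only. The function name max_ports is hypothetical and is not part of Cumulus Linux, and the sketch does not model platform rules such as disabled adjacent ports.

```python
def max_ports(native_ports: int, breakout_ports: int, breakout: int) -> int:
    """Return native ports plus the interfaces created by splitting breakout_ports."""
    if breakout not in (1, 2, 4, 8):
        raise ValueError("breakout must be 1, 2, 4, or 8")
    return native_ports + breakout_ports * breakout


# 18x SFP28 plus 4x QSFP28 split into 4x: 18 + 16 interfaces
print(max_ports(18, 4, 4))
# 32x QSFP28 split into 2x at 50G
print(max_ports(0, 32, 2))
```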
You can use a single SFP (10/25/50G) transceiver in a QSFP (100/200/400G) port with a QSFP-to-SFP Adapter (QSA). Set the port speed to the SFP speed with the nv set interface <interface> link speed <speed> command. Do not configure this port as a breakout port.
If you break out a port, then reload the switchd service on a switch running in nonatomic ACL mode, temporary disruption to traffic occurs while the ACLs reinstall.
Cumulus Linux does not support port ganging.
Configure a Breakout Port
You can break out (split) a port using the following options:
1x does not split the port. This is the default port setting.
2x splits the port into two interfaces.
4x splits the port into four interfaces.
8x splits the port into eight interfaces.
If you split a 100G port into four interfaces and auto-negotiation is on (the default setting), Cumulus Linux advertises the speed for each interface up to the maximum speed possible for a 100G port (100/4=25G). You can override this configuration and set specific speeds for the split ports if necessary.
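As a quick illustration of the 100/4=25G calculation, the following Python sketch divides the full port speed by the breakout factor. The function name is hypothetical; the sketch only reproduces the arithmetic and does not query the switch.

```python
def max_split_speed_gbps(port_speed_gbps: float, breakout: int) -> float:
    """Maximum speed advertised per split interface: full port speed / breakout."""
    if breakout not in (1, 2, 4, 8):
        raise ValueError("breakout must be 1, 2, 4, or 8")
    return port_speed_gbps / breakout


# A 100G port split into four interfaces
print(max_split_speed_gbps(100, 4))
```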
Cumulus Linux 5.4 and later uses a new format for port splitting; instead of 1=100G or 1=4x10G, you specify 1=1x or 1=4x. The new format does not support specifying a speed for breakout ports in the /etc/cumulus/ports.conf file. To set a speed, either set the link-speed parameter for each split port in the /etc/network/interfaces file or run the NVUE nv set interface <interface> link speed <speed> command.
The following example breaks out a 100G port on swp1 into four interfaces. Cumulus Linux advertises the speed for each interface up to a maximum of 25G:
cumulus@switch:~$ nv set interface swp1 link breakout 4x
cumulus@switch:~$ nv set interface swp1s0-3 link state up
cumulus@switch:~$ nv config apply
The following example splits the port into four interfaces and forces the link speed to be 10G. Cumulus disables auto-negotiation when you force set the speed.
cumulus@switch:~$ nv set interface swp1 link breakout 4x
cumulus@switch:~$ nv set interface swp1s0-3 link state up
cumulus@switch:~$ nv set interface swp1s0-3 link speed 10G
Certain switches, such as the SN2700, SN4600, and SN4600c, require that you disable the subsequent even-numbered port when you configure a breakout port for 4x or 8x. NVUE automatically disables the subsequent even-numbered port on any switch with this requirement.
To split a port into multiple interfaces, edit the /etc/cumulus/ports.conf file. The following example command breaks out swp1 into four interfaces.
When you configure a breakout port to 4x or 8x on certain switches such as the SN2700, SN4600, and SN4600c, you must set the subsequent even-numbered port to disabled in the /etc/cumulus/ports.conf file. The SN3700, SN3700c, SN2201, SN2010, and SN2100 switches do not have this requirement.
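The following Python sketch shows one way to generate the ports.conf entries described above with an automation tool. It assumes the <port>=<breakout> format mentioned earlier (for example, 1=4x) and assumes that a disabled port is written as <port>=disabled; the function name is hypothetical. Check the ports.conf file on your switch for the exact syntax.

```python
from typing import List


def ports_conf_lines(port: int, breakout: str, disable_next: bool = False) -> List[str]:
    """Build ports.conf lines for a breakout port (assumed syntax, see lead-in)."""
    if breakout not in ("1x", "2x", "4x", "8x"):
        raise ValueError("breakout must be 1x, 2x, 4x, or 8x")
    lines = [f"{port}={breakout}"]
    if disable_next:
        # The text refers to the subsequent even-numbered port,
        # so this sketch expects an odd-numbered breakout port.
        if port % 2 == 0:
            raise ValueError("expected an odd-numbered breakout port")
        lines.append(f"{port + 1}=disabled")
    return lines


# Switch that requires disabling the next port for a 4x breakout
print("\n".join(ports_conf_lines(1, "4x", disable_next=True)))
```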
Reload switchd with the sudo systemctl reload switchd.service command. The reload does not interrupt network services.
To configure specific speeds for the split ports, edit the /etc/network/interfaces file, then run the ifreload -a command. The following example configures the speed for each swp1 breakout port (swp1s0, swp1s1, swp1s2, and swp1s3) to 10G with auto-negotiation off.
cumulus@switch:~$ sudo cat /etc/network/interfaces
...
auto swp1s0
iface swp1s0
    link-speed 10000
    link-duplex full
    link-autoneg off

auto swp1s1
iface swp1s1
    link-speed 10000
    link-duplex full
    link-autoneg off

auto swp1s2
iface swp1s2
    link-speed 10000
    link-duplex full
    link-autoneg off

auto swp1s3
iface swp1s3
    link-speed 10000
    link-duplex full
    link-autoneg off
...
cumulus@switch:~$ sudo ifreload -a
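Because the stanzas for the split ports differ only in the interface name, you can generate them instead of typing them by hand. The following Python sketch renders stanzas like the ones above; the function name and parameters are hypothetical, and the output is text you would review before adding it to the /etc/network/interfaces file.

```python
def split_port_stanzas(port: str, breakout: int, speed_mbps: int) -> str:
    """Render one interfaces stanza per split port (swp1s0, swp1s1, ...)."""
    stanzas = []
    for index in range(breakout):
        name = f"{port}s{index}"
        stanzas.append("\n".join([
            f"auto {name}",
            f"iface {name}",
            f"    link-speed {speed_mbps}",
            "    link-duplex full",
            "    link-autoneg off",
        ]))
    return "\n\n".join(stanzas)


# Four split ports on swp1 forced to 10G
print(split_port_stanzas("swp1", 4, 10000))
```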
The SN4700 and SN4410 switches do not support auto-negotiation on QSFP-DD 400G transceiver modules. You need to force set the speed.
Set the Number of Lanes per Split Port
By default, to calculate the split port width, Cumulus Linux uses the formula split port width = full port width / breakout. For example, a port split into two interfaces (2x breakout) => 8 lanes width / 2x breakout = 4 lanes per split port.
If you need to use a different port width than the default, you can set the number of lanes per port.
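The default formula can be sketched in a few lines of Python. The function name is hypothetical, and the sketch covers only the formula above, not platform-specific defaults.

```python
def lanes_per_split_port(full_port_lanes: int, breakout: int) -> int:
    """Default split port width = full port width / breakout."""
    if breakout <= 0 or full_port_lanes % breakout != 0:
        raise ValueError("full port width must divide evenly by the breakout")
    return full_port_lanes // breakout


# An 8-lane port split into two interfaces
print(lanes_per_split_port(8, 2))
```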
QSFP56-DD transceiver ports split into four interfaces (4x) default to one lane per interface for backwards compatibility. You can change the lane setting to two lanes per interface.
The following example command splits swp1 into two interfaces (2x) and sets the number of lanes per split port to 2.
cumulus@switch:~$ nv set interface swp1 link breakout 2x lanes-per-port 2
cumulus@switch:~$ nv config apply
You must configure the lanes-per-port at the same time as you configure the breakout. If you want to change the number of lanes per port after you configure a breakout, you must first unset the breakout with the nv unset interface <interface> link breakout and nv config apply commands, then reconfigure the breakout and the lanes with the nv set interface <interface> link breakout <breakout> lanes-per-port <lanes> command. For example:
cumulus@switch:~$ nv unset interface swp1 link breakout
cumulus@switch:~$ nv config apply
cumulus@switch:~$ nv set interface swp1 link breakout 2x lanes-per-port 2
cumulus@switch:~$ nv config apply
Edit the /etc/cumulus/ports_width.conf file and add the number of lanes per split port you want to use, then reload switchd:
You must configure the lanes per port in the /etc/cumulus/ports_width.conf before you configure the breakout in the /etc/cumulus/ports.conf file. If the ports.conf file already contains breakout configuration for a port, you must set the breakout back to 1x, then reload switchd. You can then set the desired lanes per port, then reconfigure the breakout.
Remove the breakout interface configuration from the /etc/network/interfaces file, then run the ifreload -a command.
Configure Port Lanes
You can override the default behavior for supported speeds and platforms and specify the number of lanes for a port. For example, for the NVIDIA SN4700 switch, the default port speed is 50G (2 lanes, NRZ signaling mode) and 100G (4 lanes, NRZ signaling mode). You can override this setting to 50G (1 lane, PAM4 signaling mode) and 100G (2 lanes, PAM4 signaling mode).
This setting does not apply when auto-negotiation is on because Cumulus Linux advertises all supported speed options, including PAM4 and NRZ during auto-negotiation.
cumulus@switch:~$ nv set interface swp1 link speed 50G
cumulus@switch:~$ nv set interface swp1 link lanes 1
cumulus@switch:~$ nv config apply
cumulus@switch:~$ nv set interface swp2 link speed 100G
cumulus@switch:~$ nv set interface swp2 link lanes 2
cumulus@switch:~$ nv config apply
Edit the /etc/network/interfaces file, then run the ifreload -a command.
Cumulus Linux includes a ports.conf validator that switchd runs automatically before the switch starts up to confirm that the file syntax is correct. You can run the validator manually to verify the syntax of the file whenever you make changes. The validator is useful if you want to copy a new ports.conf file to the switch with automation tools, then validate that it has the correct syntax.
To run the validator manually, run the /usr/cumulus/bin/validate-ports -f <file> command. For example:
This section shows basic commands for troubleshooting switch ports. For a more comprehensive troubleshooting guide, see Troubleshoot Layer 1.
Interface Settings
To see all settings for an interface, run the nv show interface <interface> command:
cumulus@switch:~$ nv show interface swp1
                         operational       applied
------------------------ ----------------- -------
type swp swp
[acl]
evpn
multihoming
uplink off
ptp
enable off
router
adaptive-routing
enable off
ospf
enable off
ospf6
enable off
pbr
[map]
pim
enable off
synce
enable off
ip
igmp
enable off
ipv4
forward on
ipv6
enable on
forward on
neighbor-discovery
enable on
[dnssl]
home-agent
enable off
[prefix]
[rdnss]
router-advertisement
enable off
vrrp
enable off
vrf default
[gateway]
link
auto-negotiate off on
duplex full full
speed 1G auto
fec auto
mtu 9000 9216
fast-linkup off
[breakout]
state up up
stats
carrier-transitions 4
in-bytes 600 Bytes
in-drops 5
in-errors 0
in-pkts 10
out-bytes 2.11 MB
out-drops 0
out-errors 0
out-pkts 33143
mac 48:b0:2d:39:3f:83
ifindex 3
You can add the --view option to show different views: acl-statistics, brief, detail, lldp, mac, mlag-cc, pluggables, qos-profile, and small. For example, the nv show interface --view=small command lists the interfaces on the switch. The nv show interface --view=brief command shows information about each interface on the switch, such as the interface type, speed, remote host and port. The nv show interface --view=mac command shows the MAC address of each interface.
The description column only shows in the output when you use the --view=detail option.
The following example shows the MAC address of each interface on the switch:
cumulus@switch:~$ nv show interface --view=mac
Interface State Speed MTU MAC Type
---------- ----- ----- ----- ----------------- --------
BLUE up 65575 2a:f9:b5:3c:74:b8 vrf
RED up 65575 8e:91:ed:ed:d5:76 vrf
bond1 up 1G 9000 48:b0:2d:39:3f:83 bond
bond2 up 1G 9000 48:b0:2d:b3:5e:18 bond
bond3 up 1G 9000 48:b0:2d:c2:9d:47 bond
br_default up 9216 44:38:39:22:01:7a bridge
br_l3vni up 9216 44:38:39:22:01:7a bridge
eth0 up 1G 1500 44:38:39:22:01:7a eth
lo up 65536 00:00:00:00:00:00 loopback
mgmt up 65575 8a:58:d0:25:47:7d vrf
swp1 up 1G 9000 48:b0:2d:39:3f:83 swp
swp2 up 1G 9000 48:b0:2d:b3:5e:18 swp
swp3 up 1G 9000 48:b0:2d:c2:9d:47 swp
swp4 down 1500 48:b0:2d:c2:7e:cd swp
swp5 down 1500 48:b0:2d:6e:bc:c1 swp
swp6 down 1500 48:b0:2d:2d:89:16 swp
...
You can filter the nv show interface command output on specific columns. For example, the nv show interface --filter mtu=1500 shows only the interfaces with MTU set to 1500.
To filter on multiple column outputs, join the filters with an ampersand (&) and enclose the expression in quotation marks; for example, nv show interface --filter "type=bridge&mtu=9216" shows data for bridges with MTU 9216.
You can filter on all revisions (operational, applied, and pending); for example, nv show interface --filter mtu=1500 --rev=applied shows only the interfaces with MTU set to 1500 in the applied revision.
The following example shows information for all bridges configured on the switch with MTU 9216:
cumulus@switch:~$ nv show interface --filter "type=bridge&mtu=9216"
Interface State Speed MTU Type Remote Host Remote Port Summary
---------- ----- ----- ---- ------ ----------- ----------- ---------------------------------------
br_default up 9216 bridge IP Address: fe80::4638:39ff:fe22:17a/64
br_l3vni up 9216 bridge IP Address: fe80::4638:39ff:fe22:17a/64
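The filter expression combines column=value conditions with &, and a row must match every condition. The following Python sketch models that behavior on sample rows taken from the output above. It is an illustration of the semantics only, not how NVUE implements filtering, and the function names are hypothetical.

```python
from typing import Dict, List


def parse_filter(expression: str) -> Dict[str, str]:
    """Split 'type=bridge&mtu=9216' into {'type': 'bridge', 'mtu': '9216'}."""
    return dict(item.split("=", 1) for item in expression.split("&"))


def apply_filter(rows: List[Dict[str, str]], expression: str) -> List[Dict[str, str]]:
    """Keep only the rows that match every column=value condition."""
    conditions = parse_filter(expression)
    return [row for row in rows
            if all(row.get(column) == value for column, value in conditions.items())]


rows = [
    {"interface": "br_default", "type": "bridge", "mtu": "9216"},
    {"interface": "swp1", "type": "swp", "mtu": "9000"},
    {"interface": "swp4", "type": "swp", "mtu": "1500"},
]
print([row["interface"] for row in apply_filter(rows, "type=bridge&mtu=9216")])
```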
Statistics
To show interface statistics, run the NVUE nv show interface <interface> counters command or the Linux sudo ethtool -S <interface> command.
To verify SFP settings, run the NVUE nv show interface <interface> pluggable command or the ethtool -m command. The following example shows the vendor, type and power output for swp1.
cumulus@switch:~$ sudo ethtool -m swp1 | egrep 'Vendor|type|power\s+:'
Transceiver type : 10G Ethernet: 10G Base-LR
Vendor name : FINISAR CORP.
Vendor OUI : 00:90:65
Vendor PN : FTLX2071D327
Vendor rev : A
Vendor SN : UY30DTX
Laser output power : 0.5230 mW / -2.81 dBm
Receiver signal average optical power : 0.7285 mW / -1.38 dBm
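If you collect this output with a script, you can split each line on the first " : " separator to get field names and values. The following Python sketch parses text like the output above; the function name is hypothetical and the sample text is copied from this example rather than read from a switch.

```python
from typing import Dict

SAMPLE = """\
Transceiver type : 10G Ethernet: 10G Base-LR
Vendor name : FINISAR CORP.
Vendor PN : FTLX2071D327
Laser output power : 0.5230 mW / -2.81 dBm
"""


def parse_module_info(text: str) -> Dict[str, str]:
    """Map each 'name : value' line of ethtool -m output to a dictionary entry."""
    fields = {}
    for line in text.splitlines():
        # Split on the first " : " only, so values that contain colons stay intact.
        name, separator, value = line.partition(" : ")
        if separator:
            fields[name.strip()] = value.strip()
    return fields


info = parse_module_info(SAMPLE)
print(info["Vendor name"], info["Vendor PN"])
```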
Considerations
Auto-negotiation and FEC
If auto-negotiation is off on 100G and 25G interfaces, you must set FEC to OFF, RS, or BaseR to match the neighbor. The FEC default setting of auto does not link up when auto-negotiation is off.
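You can express this rule as a small pre-deployment check. The following Python sketch flags a configuration that leaves FEC at auto while auto-negotiation is off on a 25G or 100G interface. The function name and the lowercase value spellings are assumptions for illustration; the sketch does not cover any other FEC constraints.

```python
def fec_setting_ok(speed: str, autoneg: bool, fec: str) -> bool:
    """Return False when FEC is not off, rs, or baser with auto-negotiation off at 25G or 100G."""
    if speed in ("25G", "100G") and not autoneg:
        return fec.lower() in ("off", "rs", "baser")
    # The rule above does not constrain other combinations.
    return True


print(fec_setting_ok("100G", autoneg=False, fec="auto"))
print(fec_setting_ok("100G", autoneg=False, fec="rs"))
```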
Auto-negotiation and Link Speed
If auto-negotiation is on and you set the link speed for a port, Cumulus Linux disables auto-negotiation and uses the port speed setting you configure.
Auto-negotiation with the Spectrum-4 Switch
When you connect an NVIDIA Spectrum-4 switch to another NVIDIA Spectrum-4 switch with PAM4 modulation, you must enable auto-negotiation.
1000BASE-T SFP Modules Supported Only on Certain 25G Platforms
The following 25G switches support 1000BASE-T SFP modules:
NVIDIA SN2410
NVIDIA SN2010
100G or faster switches do not support 1000BASE-T SFP modules.
NVIDIA SN2100 Switch and eth0 Link Speed
After rebooting the NVIDIA SN2100 switch, eth0 always has a speed of 100Mbps. If you bring the interface down and then back up again, the interface negotiates 1000Mbps. This only occurs the first time the interface comes up.
To work around this issue, add the following commands to the /etc/rc.local file to flap the interface automatically when the switch boots:
modprobe -r igb
sleep 20
modprobe igb
NVIDIA SN5600 Switch and Force Mode
When you configure force mode on NVIDIA SN5600 switch ports 10 through 50, the Rx precoding setting must be the same on the local and peer ports for optimal signal integrity on the link.
Delay in Reporting Interface as Operational Down
When you remove two transceivers simultaneously from a switch, both interfaces show the carrier down status immediately. However, it takes one second for the second interface to show the operational down status. In addition, the services on this interface also take an extra second to come down.
NVIDIA Spectrum-2 Switches and FEC Mode
The NVIDIA Spectrum-2 (25G) switch only supports RS FEC.
ifplugd is an Ethernet link-state monitoring daemon that executes scripts to configure an Ethernet device when you plug in or remove a cable. Follow the steps below to install and configure the ifplugd daemon.
Install ifplugd
You can install this package even if the switch does not connect to the internet. The package is in the cumulus-local-apt-archive repository on the Cumulus Linux image.
To install ifplugd:
Update the switch before installing the daemon:
cumulus@switch:~$ sudo -E apt-get update
Install the ifplugd package:
cumulus@switch:~$ sudo -E apt-get install ifplugd
Configure ifplugd
After you install ifplugd, you must edit two configuration files:
/etc/default/ifplugd
/etc/ifplugd/action.d/ifupdown
The example configuration below configures ifplugd to bring down all uplinks when the peer bond goes down in an MLAG environment.
Open /etc/default/ifplugd in a text editor and configure the file as appropriate. Add the peerbond name before you save the file.
Open the /etc/ifplugd/action.d/ifupdown file in a text editor. Configure the script, then save the file.
#!/bin/sh
set -e
case "$2" in
up)
    clagrole=$(clagctl | grep "Our Priority" | awk '{print $8}')
    if [ "$clagrole" = "secondary" ]
    then
        # List all the interfaces below to bring up when clag peerbond comes up.
        for interface in swp1 bond1 bond3 bond4
        do
            echo "bringing up : $interface"
            ip link set "$interface" up
        done
    fi
    ;;
down)
    clagrole=$(clagctl | grep "Our Priority" | awk '{print $8}')
    if [ "$clagrole" = "secondary" ]
    then
        # List all the interfaces below to bring down when clag peerbond goes down.
        for interface in swp1 bond1 bond3 bond4
        do
            echo "bringing down : $interface"
            ip link set "$interface" down
        done
    fi
    ;;
esac
Restart the ifplugd daemon to implement the changes:
The default shell for ifplugd is dash (/bin/sh) instead of bash, as it provides a faster and more nimble shell. However, dash contains fewer features than bash (for example, dash is unable to handle multiple uplinks).
Quality of Service
This section refers to frames for all internal QoS functionality. Unless explicitly stated, the actions are independent of layer 2 frames or layer 3 packets.
Cumulus Linux supports several different QoS features and standards including:
Cumulus Linux uses two configuration files for QoS:
/etc/cumulus/datapath/qos/qos_features.conf includes all standard QoS configuration, such as marking, shaping and flow control.
/etc/mlx/datapath/qos/qos_infra.conf includes all platform specific configurations, such as buffer allocations and Alpha values.
Cumulus Linux 5.0 and later does not use the traffic.conf and datapath.conf files but uses the qos_features.conf and qos_infra.conf files instead. Before upgrading Cumulus Linux, review your existing QoS configuration to determine the changes you need to make.
switchd and QoS
When you run Linux commands to configure QoS, you must apply QoS changes to the ASIC by reloading switchd with the sudo systemctl reload switchd.service command.
Unlike the restart command, the reload switchd.service command does not impact traffic forwarding except when the qos_infra.conf file changes, or when the pause frame or priority flow control configuration changes. These changes require modifications to the ASIC buffer and might result in momentary packet loss.
NVUE reloads the switchd service automatically. You do not have to run the reload switchd.service command to apply changes when configuring QoS with NVUE commands.
Classification
When a frame or packet arrives on the switch, Cumulus Linux maps it to an internal COS (switch priority) value. Cumulus Linux never writes this value to the frame or packet; it uses the value only to classify and schedule traffic internally through the switch.
You can define which values are trusted: 802.1p, DSCP, or both.
The following table describes the default classifications for various frame and switch priority configurations:
Setting                VLAN Tagged?  IP or Non-IP  Result
---------------------  ------------  ------------  ------------------------------------------------------------------
PCP (802.1p)           Yes           IP            Accept incoming 802.1p marking.
PCP (802.1p)           Yes           Non-IP        Accept incoming 802.1p marking.
PCP (802.1p)           No            IP            Use the default priority setting.
PCP (802.1p)           No            Non-IP        Use the default priority setting.
DSCP                   Yes           IP            Accept incoming DSCP IP header marking.
DSCP                   Yes           Non-IP        Use the default priority setting.
DSCP                   No            IP            Accept incoming DSCP IP header marking.
DSCP                   No            Non-IP        Use the default priority setting.
PCP (802.1p) and DSCP  Yes           IP            Accept incoming DSCP IP header marking.
PCP (802.1p) and DSCP  Yes           Non-IP        Accept incoming 802.1p marking.
PCP (802.1p) and DSCP  No            IP            Accept incoming DSCP IP header marking.
PCP (802.1p) and DSCP  No            Non-IP        Use the default priority setting.
port                   Either        Either        Ignore any existing markings and use the default priority setting.
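The table reduces to a short decision order: port trust ignores markings, DSCP applies to IP traffic when trusted, 802.1p applies to VLAN tagged frames when trusted, and everything else uses the default priority. The following Python sketch encodes that reading of the table. The function name and the string labels are hypothetical and are not Cumulus Linux settings.

```python
from typing import Set


def priority_source(trust: Set[str], vlan_tagged: bool, is_ip: bool) -> str:
    """Return which marking classifies the frame: 'dscp', 'pcp', or 'default'."""
    if "port" in trust:
        return "default"   # ignore any existing markings
    if "dscp" in trust and is_ip:
        return "dscp"      # DSCP wins for IP traffic, tagged or untagged
    if "pcp" in trust and vlan_tagged:
        return "pcp"       # 802.1p is only present on VLAN tagged frames
    return "default"


print(priority_source({"pcp", "dscp"}, vlan_tagged=True, is_ip=False))
```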
If you use NVUE to configure QoS, you define which values are trusted with the nv set qos mapping <profile> trust l2 command (802.1p) or the nv set qos mapping <profile> trust l3 command (DSCP).
If you use Linux commands to configure QoS, you define which values are trusted in the /etc/cumulus/datapath/qos/qos_features.conf file by configuring the traffic.packet_priority_source_set setting to 802.1p or dscp.
Trust 802.1p Marking
To trust 802.1p marking:
When 802.1p (l2) is trusted, Cumulus Linux classifies these ingress 802.1p values to switch priority values:
Switch Priority  802.1p (PCP)
---------------  ------------
0                0
1                1
2                2
3                3
4                4
5                5
6                6
7                7
The PCP number is the incoming 802.1p marking; for example PCP 0 maps to switch priority 0.
To change the default profile to map PCP 0 to switch priority 4:
If you configure the trust to be l2 but do not specify any PCP to switch priority mappings, Cumulus Linux uses the default values.
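The default mapping is an identity map from PCP to switch priority, and any mapping you configure overrides individual entries. The following Python sketch models that behavior with a hypothetical helper; it does not read or change the switch configuration.

```python
from typing import Dict, Optional


def pcp_to_switch_priority(overrides: Optional[Dict[int, int]] = None) -> Dict[int, int]:
    """Start from the default identity map (PCP n -> switch priority n) and apply overrides."""
    mapping = {pcp: pcp for pcp in range(8)}
    for pcp, switch_priority in (overrides or {}).items():
        if not (0 <= pcp <= 7 and 0 <= switch_priority <= 7):
            raise ValueError("PCP and switch priority values must be between 0 and 7")
        mapping[pcp] = switch_priority
    return mapping


# Map PCP 0 to switch priority 4 and keep the defaults for the rest
print(pcp_to_switch_priority({0: 4}))
```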
To show the ingress 802.1p mapping for the default profile, run the nv show qos mapping default-global pcp command. To show the PCP mapping for a specific switch priority in the default profile, run the nv show qos mapping default-global pcp <value> command. The following example shows that PCP 0 maps to switch priority 4:
You can map multiple ingress DSCP values to the same switch priority value. For example, to change the default profile to map ingress DSCP values 10, 21, and 36 to switch priority 0:
If you configure the trust to be l3 but do not specify any DSCP to switch priority mappings, Cumulus Linux uses the default values.
To show the DSCP mapping in the default profile, run the nv show qos mapping default-global dscp command. To show the DSCP mapping for a specific switch priority in the default profile, run the nv show qos mapping default-global dscp <value> command. The following example shows that DSCP 22 maps to switch priority 4:
The # in the configuration file is a comment. By default, the file comments out the traffic.cos_*.priority_source.dscp lines. You must uncomment them for them to take effect.
The traffic.cos_ number is the switch priority value; for example DSCP values 0 through 7 map to switch priority 0. To map ingress DSCP 22 to switch priority 4, configure the traffic.cos_4.priority_source.dscp setting.
traffic.cos_4.priority_source.dscp = [22]
You can map multiple ingress DSCP values to the same switch priority value. For example, to map ingress DSCP values 10, 21, and 36 to switch priority 0:
traffic.cos_0.priority_source.dscp = [10,21,36]
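If you maintain the mappings in an automation tool, you can render these lines from a dictionary of switch priority to DSCP values. The following Python sketch produces lines in the format shown above and rejects a DSCP value that appears under two switch priorities; the function name is hypothetical.

```python
from typing import Dict, List


def dscp_source_lines(mapping: Dict[int, List[int]]) -> List[str]:
    """Render traffic.cos_<n>.priority_source.dscp lines from {switch priority: [dscp, ...]}."""
    seen = set()
    lines = []
    for switch_priority, dscp_values in sorted(mapping.items()):
        for dscp in dscp_values:
            if not 0 <= dscp <= 63:
                raise ValueError(f"DSCP {dscp} is outside the 6-bit range 0-63")
            if dscp in seen:
                raise ValueError(f"DSCP {dscp} is mapped more than once")
            seen.add(dscp)
        values = ",".join(str(dscp) for dscp in dscp_values)
        lines.append(f"traffic.cos_{switch_priority}.priority_source.dscp = [{values}]")
    return lines


print("\n".join(dscp_source_lines({4: [22], 0: [10, 21, 36]})))
```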
You can also choose not to use a switch priority value. This example does not use switch priority values 3 and 4:
To apply a custom DSCP profile to specific interfaces, see Port Groups.
Trust Port
You can assign all traffic to a switch priority regardless of the ingress marking.
The following commands assign all traffic to switch priority 3 regardless of the ingress marking.
cumulus@switch:~$ nv set qos mapping default-global trust port
cumulus@switch:~$ nv set qos mapping default-global port-default-sp 3
cumulus@switch:~$ nv config apply
To show the switch priority setting in the default profile for all traffic regardless of the ingress marking, run the nv show qos mapping default-global command:
cumulus@switch:~$ nv show qos mapping default-global
operational applied description
--------------- ----------- ------- ----------------------------
port-default-sp 3 3 Port Default Switch Priority
trust port port Port Trust configuration
In the /etc/cumulus/datapath/qos/qos_features.conf file, configure traffic.packet_priority_source_set = [port].
The traffic.port_default_priority setting defines the switch priority that all traffic uses.
To apply a custom profile to specific interfaces, see Port Groups.
Mark and Remark Traffic
You can mark or remark traffic in two ways:
Use ingress COS or DSCP to remark an existing 802.1p COS or DSCP value to a new value.
Use iptables to match packets and set 802.1p COS or DSCP values (policy-based marking).
802.1p or DSCP for Marking
To enable global remarking of 802.1p, DSCP or both 802.1p and DSCP values:
In the /etc/cumulus/datapath/qos/qos_features.conf file, modify the traffic.packet_priority_remark_set value to [802.1p], [dscp] or [802.1p,dscp]. For example, to enable the remarking of only 802.1p values:
traffic.packet_priority_remark_set = [802.1p]
You remark 802.1p or DSCP with the priority_remark.8021p or priority_remark.dscp setting. The switch priority (internal cos_) value determines the egress 802.1p or DSCP remarking. For example, to remark switch priority 0 to egress 802.1p 4:
traffic.cos_0.priority_remark.8021p = [4]
To remark switch priority 0 to egress DSCP 22:
traffic.cos_0.priority_remark.dscp = [22]
The # in the configuration file is a comment. The file comments out the traffic.cos_*.priority_remark.8021p and the traffic.cos_*.priority_remark.dscp lines by default. You must uncomment them to set the configuration.
You can remap multiple switch priority values to the same external 802.1p or DSCP value. For example, to map switch priority 1 and 2 to 802.1p 3:
To apply a custom profile to specific interfaces, see Port Groups.
Policy-based Marking
Cumulus Linux supports ACLs through ebtables, iptables or ip6tables for egress packet marking and remarking.
Cumulus Linux uses ebtables to mark layer 2, 802.1p COS values.
Cumulus Linux uses iptables to match IPv4 traffic and ip6tables to match IPv6 traffic for DSCP marking.
For more information on configuring and applying ACLs, refer to Netfilter - ACLs.
Mark Layer 2 COS
You must use ebtables to match and mark layer 2 bridged traffic. You can match traffic with any supported ebtables rule.
To set the new 802.1p COS value when traffic matches, use -A FORWARD -o <interface> -j setqos --set-cos <value>.
You can only set COS on a per-egress interface basis. Cumulus Linux does not support ebtables based matching on ingress.
The configured action always has the following conditions:
The rule is always part of the FORWARD chain.
The interface (<interface>) is a physical swp port.
The jump action is always setqos (lowercase).
The --set-cos value is a 802.1p COS value between 0 and 7.
For example, to set traffic leaving interface swp5 to 802.1p COS value 4:
-A FORWARD -o swp5 -j setqos --set-cos 4
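You can check the conditions listed above before a rule goes into a policy file. The following Python sketch is illustrative only: build_cos_rule is a hypothetical helper, not part of Cumulus Linux, and it only formats and validates the rule text shown above. Treating breakout ports such as swp1s0 as physical swp ports is an assumption.

```python
import re

def build_cos_rule(interface: str, cos: int) -> str:
    """Return an ebtables rule line that sets the 802.1p COS on egress.

    Hypothetical helper: it formats text only and does not install the rule.
    """
    # The egress interface must be a physical swp port (for example, swp5).
    # Accepting breakout names such as swp1s0 is an assumption.
    if not re.fullmatch(r"swp\d+(s\d+)?", interface):
        raise ValueError(f"not a physical swp port: {interface!r}")
    # 802.1p COS is a 3-bit field, so valid values are 0 through 7.
    if not 0 <= cos <= 7:
        raise ValueError(f"802.1p COS must be between 0 and 7, got {cos}")
    # The rule is always in the FORWARD chain and uses lowercase setqos.
    return f"-A FORWARD -o {interface} -j setqos --set-cos {cos}"

print(build_cos_rule("swp5", 4))
```

Rejecting a bond name or an out-of-range COS value at this stage avoids installing a rule that does not meet the documented conditions.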
Mark Layer 3 DSCP
You must use iptables (for IPv4 traffic) or ip6tables (for IPv6 traffic) to match and mark layer 3 traffic.
You can match traffic with any supported iptables or ip6tables rule.
To set the new COS or DSCP value when traffic matches, use -A FORWARD -o <interface> -j SETQOS [--set-dscp <value> | --set-cos <value> | --set-dscp-class <name>].
The configured action always has the following conditions:
The rule is always configured as part of the FORWARD chain.
The interface (<interface>) is a physical swp port.
The jump action is always SETQOS (uppercase).
You can configure COS markings with --set-cos and a value between 0 and 7 (inclusive).
You can use only one of --set-dscp or --set-dscp-class. --set-dscp supports decimal or hex DSCP values between 0 and 63 (0x3f).
--set-dscp-class supports standard DSCP naming, described in RFC 3260, including ef, be, CS and AF classes.
For example, to set traffic leaving interface swp5 to DSCP value 32:
-A FORWARD -o swp5 -j SETQOS --set-dscp 32
To set traffic leaving interface swp11 to DSCP class value CS6:
-A FORWARD -o swp11 -j SETQOS --set-dscp-class cs6
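The relationship between DSCP class names and numeric values can be sketched as follows. This Python example is illustrative: the function names are hypothetical, the numeric values follow the standard DSCP encodings (CSn is 8n, AFxy is 8x + 2y, EF is 46, BE is 0), and the exact set of class names that the SETQOS target accepts is an assumption.

```python
from typing import Optional

def dscp_class_to_value(name: str) -> int:
    """Translate a DSCP class name (be, ef, csN, afXY) to its decimal value."""
    key = name.lower()
    if key == "be":
        return 0
    if key == "ef":
        return 46
    if len(key) == 3 and key.startswith("cs") and key[2] in "01234567":
        return 8 * int(key[2])                    # CSn uses the top three bits
    if (len(key) == 4 and key.startswith("af")
            and key[2] in "1234" and key[3] in "123"):
        return 8 * int(key[2]) + 2 * int(key[3])  # AFxy = 8x + 2y
    raise ValueError(f"unknown DSCP class: {name!r}")

def build_dscp_rule(interface: str, dscp: Optional[int] = None,
                    dscp_class: Optional[str] = None) -> str:
    """Format a SETQOS rule with exactly one of dscp or dscp_class."""
    if (dscp is None) == (dscp_class is None):
        raise ValueError("specify exactly one of dscp or dscp_class")
    if dscp_class is not None:
        dscp_class_to_value(dscp_class)           # raises on an unknown name
        return (f"-A FORWARD -o {interface} -j SETQOS "
                f"--set-dscp-class {dscp_class.lower()}")
    if not 0 <= dscp <= 63:                       # DSCP is a 6-bit field
        raise ValueError(f"DSCP must be between 0 and 63, got {dscp}")
    return f"-A FORWARD -o {interface} -j SETQOS --set-dscp {dscp}"

print(build_dscp_rule("swp11", dscp_class="cs6"), dscp_class_to_value("cs6"))
```

Under these encodings, class cs6 corresponds to DSCP 48 and class cs4 corresponds to DSCP 32.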
Flow Control
Flow control influences data transmission to manage congestion along a network path.
Cumulus Linux supports the following flow control mechanisms:
Link pause (IEEE 802.3x), which sends specialized Ethernet frames to an adjacent layer 2 switch to stop or pause all traffic on the link during times of congestion.
Priority Flow Control (PFC), which is an upgrade of link pause that IEEE 802.1Qbb defines and that extends the pause frame concept to act on a per switch priority value basis instead of an entire link. A PFC pause frame indicates to the peer which specific switch priority value to pause, while other switch priority values or queues continue transmitting.
You cannot configure link pause and PFC on the same port.
Flow Control Buffers
Before configuring link pause or PFC, configure the buffer pool memory allocated for lossless and lossy flows. The following example sets each to fifty percent:
cumulus@switch:~$ nv set qos traffic-pool default-lossless memory-percent 50
cumulus@switch:~$ nv set qos traffic-pool default-lossy memory-percent 50
cumulus@switch:~$ nv config apply
Cumulus Linux allocates 100% of the buffer memory to the default-lossy traffic pool by default. The total memory allocation across pools must not exceed 100%.
Edit the following lines in the /etc/mlx/datapath/qos/qos_infra.conf file:
Modify the existing ingress_service_pool.0.percent and egress_service_pool.0.percent buffer allocation. Change the existing ingress setting to ingress_service_pool.0.percent = 50. Change the existing egress setting to egress_service_pool.0.percent = 50.
Add the following lines to create a new service_pool, set flow_control to the service pool, and define buffer reservations:
Link Pause
Link pause is an older flow control mechanism that causes all traffic on a link between two switches, or between a host and switch, to stop transmitting during times of congestion. Link pause starts and stops depending on buffer congestion. You configure link pause on a per-direction, per-interface basis. You can receive pause frames to stop the switch from transmitting when requested, send pause frames to request neighboring devices to stop transmitting, or both.
NVIDIA recommends that you use Priority Flow Control (PFC) instead of link pause.
Before configuring link pause, you must first modify the switch buffer allocation. Refer to Flow Control Buffers.
Link pause buffer calculation is a complex topic that IEEE 802.1Q-2012 defines. The calculation attempts to account for the delay between signaling congestion and the neighboring device receiving the signal. It includes the delay that the PHY and MAC layers introduce (interface delay) as well as the distance between end points (cable length).
Incorrect cable length settings can cause wasted buffer space (triggering congestion too early) or packet drops (congestion occurs before flow control activates).
The following example configuration:
Creates a profile (port group) called my_pause_ports.
Enables sending pause frames and disables receiving pause frames.
Sets the cable length to 50 meters.
Sets link pause on swp1 through swp4, and swp6.
Cumulus Linux also includes frame transmission start and stop threshold, and port buffer settings. NVIDIA recommends that you do not change these settings but, instead, let Cumulus Linux configure the settings dynamically. Only change the threshold and buffer settings if you are an advanced user who understands the buffer configuration requirements for lossless traffic to work seamlessly.
cumulus@switch:~$ nv set qos link-pause my_pause_ports tx enable
cumulus@switch:~$ nv set qos link-pause my_pause_ports rx disable
cumulus@switch:~$ nv set qos link-pause my_pause_ports cable-length 50
cumulus@switch:~$ nv set interface swp1-swp4,swp6 qos link-pause profile my_pause_ports
cumulus@switch:~$ nv config apply
To show the link pause settings for a profile, run the nv show qos link-pause <profile> command.
Uncomment and edit the link_pause section of the /etc/cumulus/datapath/qos/qos_features.conf file.
To process pause frames, you must enable link pause on the specific interfaces.
Priority Flow Control (PFC)
Priority flow control extends the capabilities of link pause by pausing the frames for a specific 802.1p value instead of stopping all traffic on a link. If a switch supports PFC and receives a PFC pause frame for a given 802.1p value, the switch stops transmitting frames from that queue, but continues transmitting frames for other queues.
You use PFC with RDMA over Converged Ethernet (RoCE). The RoCE section provides specific information on deploying PFC and ECN in RoCE environments.
Before configuring PFC, first modify the switch buffer allocation according to Flow Control Buffers.
PFC buffer calculation is a complex topic defined in IEEE 802.1Q-2012, which attempts to incorporate the delay between signaling congestion and receiving the signal by the neighboring device. This calculation includes the delay that the PHY and MAC layers (called the interface delay) introduce as well as the distance between end points (cable length). Incorrect cable length settings cause wasted buffer space (triggering congestion too early) or packet drops (congestion occurs before flow control activates).
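To see why cable length matters, consider how much data can still arrive after the switch sends a pause frame. The following Python sketch is a rough estimate under stated assumptions, not the calculation that Cumulus Linux or IEEE 802.1Q-2012 uses: it assumes a propagation delay of about 5 nanoseconds per meter and counts only the round trip on the cable, ignoring interface delay and frames already in transmission. The function name is hypothetical.

```python
# Assumed propagation delay: about 5 ns per meter of copper or fiber.
PROPAGATION_NS_PER_METER = 5.0

def in_flight_bytes(cable_length_m: float, link_speed_gbps: float) -> float:
    """Estimate the bytes that arrive during one cable round trip.

    The pause frame needs one cable delay to reach the peer, and data the
    peer already sent needs another cable delay to arrive, so the estimate
    uses twice the one-way delay. PHY and MAC delays are ignored here.
    """
    round_trip_ns = 2 * cable_length_m * PROPAGATION_NS_PER_METER
    bits = round_trip_ns * link_speed_gbps   # ns * Gbit/s = bits
    return bits / 8

print(in_flight_bytes(50, 100))   # 50 meter cable on a 100G link
```

With these assumptions, a 50 meter cable on a 100 Gbps link leaves 6250 bytes in flight from propagation alone. Overstating the cable length reserves more buffer than needed; understating it leaves too little room and risks drops.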
To apply PFC settings on all ports, modify the default PFC profile (default-global).
The following example modifies the default profile and:
Configures PFC on egress queue 0.
Enables sending pause frames and disables receiving pause frames.
Sets the cable length to 50 meters.
Cumulus Linux also includes frame transmission start and stop threshold, and port buffer settings. NVIDIA recommends that you do not change these settings but, instead, let Cumulus Linux configure the settings dynamically. Only change the threshold and buffer settings if you are an advanced user who understands the buffer configuration requirements for lossless traffic to work seamlessly.
cumulus@switch:~$ nv set qos pfc default-global switch-priority 0
cumulus@switch:~$ nv set qos pfc default-global tx enable
cumulus@switch:~$ nv set qos pfc default-global rx disable
cumulus@switch:~$ nv set qos pfc default-global cable-length 50
cumulus@switch:~$ nv config apply
To show the PFC settings for the default profile, run the nv show qos pfc default-global command:
cumulus@switch:~$ nv show qos pfc default-global
                   operational  applied  description
-----------------  -----------  -------  --------------------------------
cable-length       50           50       Cable Length (in meters)
port-buffer        25000 B      25000 B  Port Buffer (in bytes)
rx                 disable      disable  PFC Rx State
tx                 enable       enable   PFC Tx State
xoff-threshold     10000 B      10000 B  Xoff Threshold (in bytes)
xon-threshold      2000 B       2000 B   Xon Threshold (in bytes)
[switch-priority]  0            0        Collection of switch priorities.
Edit the priority flow control section of the /etc/cumulus/datapath/qos/qos_features.conf file.
To apply a custom profile to specific interfaces, see Port Groups.
PFC Watchdog
PFC watchdog detects and mitigates pause storms on PFC-enabled ports.
In lossless Ethernet, the switch sends PFC PAUSE frames to instruct the link partner to pause sending packets on a traffic class. This back pressure might propagate across the network and, if it persists, can cause the network to stop forwarding traffic. PFC watchdog detects abnormal back pressure caused by receiving an excessive number of pause frames and disables PFC temporarily.
When a lossless queue receives a pause storm from its link partner and the queue is in a paused state for a certain period of time, PFC watchdog mitigates the pause storm. The watchdog stops processing received pause frames on every switch priority corresponding to the traffic class that detects the storm and discards new incoming packets to this egress queue.
The watchdog continues to count pause frames received on the port. If there are no pause frames received in any polling interval period, it restores the PFC configuration on the port and stops dropping packets.
PFC watchdog also detects and mitigates pause storms on link pause-enabled ports. The watchdog configuration for link pause-enabled ports is the same as the configuration for PFC-enabled ports. For a link pause-enabled port, the watchdog stops processing received pause frames on the egress port that detects the storm and discards new incoming packets to all egress queues on the port until congestion diminishes.
PFC watchdog only works for lossless traffic queues.
You can only configure PFC watchdog on a port with PFC (or link pause) configuration.
You can only enable PFC watchdog on a physical interface (swp).
You cannot enable the watchdog on a bond (for example, bond0) but you can enable the watchdog on a port that is a member of a bond (for example, swp1).
To enable PFC watchdog:
Enable PFC watchdog on the interfaces where you enable PFC:
cumulus@switch:~$ nv set interface swp1 qos pfc-watchdog
cumulus@switch:~$ nv set interface swp3 qos pfc-watchdog
cumulus@switch:~$ nv config apply
To disable PFC watchdog, run the nv unset interface <interface> qos pfc-watchdog command or the nv set interface <interface> qos pfc-watchdog state disable command.
Edit the PFC Watchdog Configuration section of the /etc/cumulus/datapath/qos/qos_features.conf file, then reload switchd.
...
# PFC Watchdog Configuration
# Add the port to the port_group_list where you want to enable PFC Watchdog
# It will enable PFC Watchdog on all the traffic-class corresponding to
# the lossless switch-priority configured on the port.
pfc_watchdog.port_group_list = [pfc_wd_port_group]
pfc_watchdog.pfc_wd_port_group.port_set = swp1,swp2
...
cumulus@switch:~$ sudo systemctl reload switchd
You can control the PFC watchdog polling interval and how many polling intervals the PFC watchdog must wait before it mitigates the storm condition. The default polling interval is 100 milliseconds. The default number of polling intervals is 3.
The following example sets the PFC watchdog polling interval to 200 milliseconds and the number of polling intervals to 5:
cumulus@switch:~$ nv set qos pfc-watchdog polling-interval 200
cumulus@switch:~$ nv set qos pfc-watchdog robustness 5
cumulus@switch:~$ nv config apply
Edit the /etc/cumulus/switchd.conf file to set the pfc_wd.poll_interval parameter and the pfc_wd.robustness parameter.
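The interaction between the polling interval and the number of polling intervals can be modeled with a small state machine. The following Python sketch is a simplified model based only on the description in this section; the class and attribute names are hypothetical, and the actual switch implementation may differ.

```python
from dataclasses import dataclass

@dataclass
class PfcWatchdog:
    """Simplified model of the watchdog for one lossless queue."""
    polling_interval_ms: int = 100   # default polling interval
    robustness: int = 3              # default number of polling intervals
    paused_intervals: int = 0        # consecutive intervals with pause frames
    mitigating: bool = False
    deadlock_count: int = 0

    def detection_time_ms(self) -> int:
        """Time a storm must persist before mitigation starts."""
        return self.polling_interval_ms * self.robustness

    def poll(self, pause_frames: int) -> bool:
        """Process one polling interval; return True while mitigating."""
        if pause_frames == 0:
            # A quiet interval clears the streak and restores PFC.
            self.paused_intervals = 0
            self.mitigating = False
        elif not self.mitigating:
            self.paused_intervals += 1
            if self.paused_intervals >= self.robustness:
                self.mitigating = True
                self.deadlock_count += 1
        return self.mitigating

wd = PfcWatchdog(polling_interval_ms=200, robustness=5)
print(wd.detection_time_ms())
```

In this model, a polling interval of 200 milliseconds and 5 polling intervals means that a storm must persist for 1000 milliseconds before mitigation begins, and one polling interval without pause frames ends it.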
To show if PFC watchdog is on and to show the status for each traffic class, run the nv show interface <interface> qos pfc-watchdog command:
cumulus@switch:~$ nv show interface swp1 qos pfc-watchdog
                 operational  applied
---------------  -----------  -------
state            enabled      enabled
PFC WD Status
===========================
traffic-class  status    deadlock-count
-------------  --------  --------------
0              OK        0
1              OK        3
2              DEADLOCK  2
3              OK        0
4              OK        0
5              OK        0
6              OK        0
7              DEADLOCK  3
To show PFC watchdog data for a specific traffic class, run the nv show interface <interface> qos pfc-watchdog status <traffic-class> command.
To clear the PFC watchdog deadlock-count on an interface, run the nv action clear interface <interface> qos pfc-watchdog deadlock-count command.
Congestion Control (ECN)
Explicit Congestion Notification (ECN) is an end-to-end layer 3 congestion control protocol. Defined by RFC 3168, ECN relies on two bits in the IPv4 Type of Service (ToS) field or the IPv6 Traffic Class field to signal congestion conditions. ECN requires one or both server endpoints to support ECN to be effective.
Instead of telling adjacent devices to stop transmitting during times of buffer congestion, ECN sets the ECN bits of the transit IPv4 or IPv6 header to indicate to end hosts that congestion might occur. As a result, the sending hosts reduce their sending rate until the transit switch no longer sets ECN bits.
ECN operates by having a transit switch that marks packets between two end hosts.
The transmitting host indicates it is ECN-capable by setting the ECN bits in the outgoing IP header to 01 or 10.
If the buffer occupancy of a transit switch is greater than the configured minimum threshold of the buffer, the switch remarks the ECN bits to 11, indicating Congestion Encountered (CE).
The receiving host marks any reply packets, like a TCP-ACK, as CE (11).
The original transmitting host reduces its transmission rate.
When the switch buffer congestion falls below the configured minimum threshold of the buffer, the switch stops remarking ECN bits, leaving them at 01 or 10.
A receiving host reflects this new ECN marking in the next reply so that the transmitting host resumes sending at normal speeds.
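The marking step that the transit switch performs can be sketched as a bit operation on the ToS or Traffic Class byte, where the upper six bits carry DSCP and the lower two bits carry ECN. The following Python example is a simplified illustration: the function and constant names are hypothetical, and the sketch marks every ECN-capable packet above the minimum threshold, ignoring the marking probability.

```python
# ECN codepoints in the two low-order bits (RFC 3168).
NOT_ECT = 0b00   # sender is not ECN-capable
ECT_1 = 0b01     # ECN-capable transport
ECT_0 = 0b10     # ECN-capable transport
CE = 0b11        # Congestion Encountered

def mark_congestion(tos: int, queue_bytes: int, min_threshold: int) -> int:
    """Return the ToS or Traffic Class byte after the marking decision."""
    ecn = tos & 0b11
    # Only packets from ECN-capable senders (01 or 10) are remarked.
    if ecn in (ECT_0, ECT_1) and queue_bytes > min_threshold:
        return tos | CE          # sets both ECN bits, DSCP is untouched
    return tos

tos = (46 << 2) | ECT_0          # DSCP 46 with ECN bits 10
marked = mark_congestion(tos, queue_bytes=200_000, min_threshold=150_000)
print(marked >> 2, bin(marked & 0b11))
```

The DSCP value in the upper six bits does not change when the switch sets CE, and packets with ECN bits 00 pass through unmarked.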
The default profile (default-global) enables ECN by default on egress queue 0 for all ports with the following settings:
A minimum buffer threshold of 150000 bytes. Random ECN marking starts when buffer congestion crosses this threshold. The probability determines if ECN marking occurs.
A maximum buffer threshold of 1500000 bytes. Cumulus Linux marks all ECN-capable packets when buffer congestion crosses this threshold.
A probability of 100 percent that Cumulus Linux marks an ECN-capable packet when buffer congestion is between the minimum threshold and the maximum threshold.
Random Early Detection (RED) disabled. ECN prevents packet drops in the network due to congestion by signaling hosts to transmit less. However, if congestion continues after ECN marking, packets drop after the switch buffer is full. By default, Cumulus Linux tail-drops packets when the buffer is full. You can enable RED to drop packets that are in the queue randomly instead of always dropping the last arriving packet. This might improve overall performance of TCP based flows.
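One common way to combine a minimum threshold, a maximum threshold, and a probability is the classic RED curve, where the chance of marking rises linearly between the two thresholds. The following Python sketch shows that curve using the default values above. The linear ramp is an assumption for illustration; this section does not state the exact curve that the switch applies between the thresholds. The function name is hypothetical.

```python
def mark_probability(queue_bytes: int,
                     min_threshold: int = 150_000,
                     max_threshold: int = 1_500_000,
                     probability: int = 100) -> float:
    """Return the chance (0.0 to 1.0) of marking an ECN-capable packet.

    Assumes a linear ramp from 0 at min_threshold up to `probability`
    percent at max_threshold, then marks everything above max_threshold.
    """
    if queue_bytes <= min_threshold:
        return 0.0               # no marking below the minimum threshold
    if queue_bytes >= max_threshold:
        return 1.0               # all ECN-capable packets are marked
    ramp = (queue_bytes - min_threshold) / (max_threshold - min_threshold)
    return ramp * probability / 100

for depth in (100_000, 825_000, 1_500_000):
    print(depth, mark_probability(depth))
```

Lowering the minimum threshold makes marking start earlier, which signals senders sooner at the cost of marking during shorter bursts.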
The following example commands change the default ECN profile that applies to all ports. The commands enable ECN on egress queue 4, 5, and 7, set the minimum buffer threshold to 40000 and the maximum buffer threshold to 200000, and enable RED.
cumulus@switch:~$ nv set qos congestion-control default-global traffic-class 4,5,7 min-threshold 40000
cumulus@switch:~$ nv set qos congestion-control default-global traffic-class 4,5,7 max-threshold 200000
cumulus@switch:~$ nv set qos congestion-control default-global traffic-class 4,5,7 red enable
cumulus@switch:~$ nv config apply
The following example disables ECN bit marking in the default profile for all ports.
To show the ECN settings for the default profile, run the nv show qos congestion-control default-global command:
cumulus@switch:~$ nv show qos congestion-control default-global
    operational  applied  description
--  -----------  -------  -----------
ECN Configurations
=====================
traffic-class  ECN     RED     Min Th   Max Th    Probability
-------------  ------  ------  -------  --------  -----------
4              enable  enable  40000 B  200000 B  100
5              enable  enable  40000 B  200000 B  100
7              enable  enable  40000 B  200000 B  100
To show the ECN settings in the default profile for a specific egress queue, run the nv show qos congestion-control default-global traffic-class <value> command:
cumulus@switch:~$ nv show qos congestion-control default-global traffic-class 4
               operational  applied   description
-------------  -----------  --------  -----------------------------------
ecn            enable       enable    Early Congestion Notification State
max-threshold  200000 B     200000 B  Maximum Threshold (in bytes)
min-threshold  40000 B      40000 B   Minimum Threshold (in bytes)
probability    100          100       Probability
red            enable       enable    Random Early Detection State
Edit the Explicit Congestion Notification section of the /etc/cumulus/datapath/qos/qos_features.conf file.
To disable ECN bit marking, set ecn_enable to false. The following example disables ECN bit marking in the default profile for all ports.
...
default_ecn_red_conf.ecn_enable = false
...
To apply a custom ECN profile to specific interfaces, see Port Groups.
Egress Queues
Cumulus Linux supports eight egress queues to provide different classes of service. By default, switch priority values map directly to the matching egress queue. For example, switch priority value 0 maps to egress queue 0.
You can remap queues by changing the switch priority value to the corresponding queue value. You can map multiple switch priority values to a single egress queue.
You do not have to assign all egress queues.
The following command examples assign switch priority 2 to egress queue 7:
To show the egress queue mapping for a specific switch priority in the default profile, run the nv show qos egress-queue-mapping default-global switch-priority <value> command. The following example command shows that switch priority 2 maps to egress queue 7.
cumulus@switch:~$ nv show qos egress-queue-mapping default-global switch-priority 2
               operational  applied  description
-------------  -----------  -------  -------------
traffic-class  7            7        Traffic Class
You configure egress queues in the qos_infra.conf file.
Egress Scheduler
Cumulus Linux supports 802.1Qaz, Enhanced Transmission Selection, which allows the switch to assign bandwidth to egress queues and then schedule the transmission of traffic from each queue. 802.1Qaz supports Priority Queuing.
Cumulus Linux provides a default egress scheduler that applies to all ports, where the bandwidth allocated to egress queues 0,2,4,6 is 12 percent and the bandwidth allocated to egress queues 1,3,5,7 is 13 percent. You can also apply a custom egress scheduler for specific ports; see Port Groups.
The following example modifies the default profile. The commands change the bandwidth allocation for egress queues 0, 1, 5, and 7 to strict, bandwidth allocation for egress queues 2 and 6 to 30 percent and bandwidth allocation for egress queues 3 and 4 to 20 percent.
The traffic-class value defines the egress queue where you want to assign bandwidth. For example, traffic-class 2 defines the bandwidth allocation for egress queue 2.
For each egress queue, you can define the mode as either dwrr or strict. In dwrr mode, you must define a bandwidth percent value between 1 and 100. If you do not specify a value for an egress queue, Cumulus Linux uses a DWRR value of 0 (no egress scheduling). The combined total of values you assign to bw_percent must be less than or equal to 100.
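The rules for modes and bandwidth percentages can be checked before you apply a scheduler. The following Python sketch is illustrative: the function name and the dictionary layout are hypothetical, and the sample values follow the example described above (queues 0, 1, 5, and 7 strict, queues 2 and 6 at 30 percent, queues 3 and 4 at 20 percent).

```python
def total_dwrr_percent(queues: dict) -> int:
    """Validate a scheduler map and return the total DWRR bandwidth percent.

    `queues` maps an egress queue number (0 to 7) to a (mode, percent)
    tuple, where mode is "dwrr" or "strict". This layout is hypothetical.
    """
    total = 0
    for queue, (mode, percent) in queues.items():
        if not 0 <= queue <= 7:
            raise ValueError(f"egress queue must be 0 to 7, got {queue}")
        if mode == "strict":
            continue                      # strict queues take no DWRR share
        if mode != "dwrr":
            raise ValueError(f"unknown mode {mode!r} on queue {queue}")
        if not 1 <= percent <= 100:
            raise ValueError(f"queue {queue}: percent must be 1 to 100")
        total += percent
    if total > 100:
        raise ValueError(f"DWRR bandwidth adds up to {total}, over 100")
    return total

example = {
    0: ("strict", 0), 1: ("strict", 0), 5: ("strict", 0), 7: ("strict", 0),
    2: ("dwrr", 30), 6: ("dwrr", 30), 3: ("dwrr", 20), 4: ("dwrr", 20),
}
print(total_dwrr_percent(example))
```

The DWRR queues in the example add up to exactly 100 percent, which is the maximum the combined total allows.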
You configure the egress scheduling policy in the egress scheduling section of the /etc/cumulus/datapath/qos/qos_features.conf file.
The egr_queue_ value defines the egress queue where you want to assign bandwidth. For example, egr_queue_0 defines the bandwidth allocation for egress queue 0.
The bw_percent value defines the bandwidth allocation you want to assign to an egress queue. If you do not specify a value for an egress queue, there is no egress scheduling. If you specify a value of 0 for an egress queue, Cumulus Linux assigns strict priority mode to the egress queue and always processes it ahead of other queues. The combined total of values you assign to bw_percent must be less than or equal to 100.
strict mode does not define a maximum bandwidth allocation. This can lead to starvation of other queues.
To apply a custom egress scheduler for specific ports, see Port Groups.
Policing and Shaping
Traffic shaping and policing control the rate at which the switch sends or receives traffic on a network to prevent congestion.
Traffic shaping typically occurs at egress and traffic policing at ingress.
Shaping
Traffic shaping allows a switch to send traffic at an average bitrate lower than the physical interface speed. Traffic shaping prevents a receiving device from dropping bursty traffic if the device is either not capable of that rate of traffic or has a policer that limits what it accepts.
Traffic shaping works by holding packets in the buffer and releasing them at specific time intervals.
Cumulus Linux supports two levels of hierarchical traffic shaping: one at the egress queue level and one at the port level. This allows for minimum and maximum bandwidth guarantees for each egress queue and a defined port traffic shaping rate.
The following example configuration:
Sets the profile name (port group) to use with the traffic shaping settings to shaper1.
Sets the minimum bandwidth for egress queue 2 to 100 kbps. The default minimum bandwidth is 0 kbps.
Sets the maximum bandwidth for egress queue 2 to 500 kbps. The default maximum bandwidth is 2147483647 kbps.
Sets the maximum packet shaper rate for the port group to 200000 kbps. The default maximum packet shaper rate is 2147483647 kbps.
Applies the traffic shaping configuration to swp1, swp2, swp3, and swp5.
When the minimum bandwidth for an egress queue is 0, there is no bandwidth guarantee for this queue.
The maximum bandwidth for an egress queue must not exceed the maximum packet shaper rate for the port group.
The maximum packet shaper rate for the port group must not exceed the physical interface speed.
Cumulus Linux only shapes traffic for the traffic classes in a profile that include shaper configuration.
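The constraints between the queue rates, the port shaper rate, and the interface speed can be expressed as a short check. The following Python sketch is illustrative: the function name is hypothetical, all rates are in kbps, and the check that the minimum does not exceed the maximum is an added assumption that this section does not state explicitly.

```python
def shaper_problems(queue_min_kbps: int, queue_max_kbps: int,
                    port_max_kbps: int, interface_speed_kbps: int) -> list:
    """Return a list of constraint violations (empty if the values are valid)."""
    problems = []
    # Assumption: a minimum guarantee above the maximum makes no sense.
    if queue_min_kbps > queue_max_kbps:
        problems.append("queue minimum exceeds queue maximum")
    # The queue maximum must not exceed the port group shaper rate.
    if queue_max_kbps > port_max_kbps:
        problems.append("queue maximum exceeds port shaper rate")
    # The port group shaper rate must not exceed the interface speed.
    if port_max_kbps > interface_speed_kbps:
        problems.append("port shaper rate exceeds interface speed")
    return problems

# Values from the shaper1 example, on an assumed 1 Gbps (1000000 kbps) port.
print(shaper_problems(100, 500, 200_000, 1_000_000))
```

The shaper1 values pass all three checks on a 1 Gbps port; the same values on a 100 Mbps port fail because the port shaper rate of 200000 kbps exceeds the interface speed.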
Policing
Traffic policing prevents an interface from receiving more traffic than intended. You use policing to enforce a maximum transmission rate on an interface. The switch drops any traffic above the policing level.
Cumulus Linux supports both a single-rate policer and a dual-rate policer (tricolor policer).
You configure traffic policing using ebtables, iptables, or ip6tables rules.
For more information on configuring and applying ACLs, refer to Netfilter - ACLs.
Single-rate Policer
To configure a single-rate policer, use the iptables JUMP action -j POLICE.
Cumulus Linux supports the following iptables flags with a single-rate policer.
iptables Flag
Description
--set-mode [pkt | KB]
Define the policer to count packets or kilobytes.
--set-rate [<kbytes> | <packets>]
The maximum rate of traffic in kilobytes or packets per second.
--set-burst [<kilobytes> | <packets>]
The allowed burst size in kilobytes or packets, according to the mode.
For example, to create a policer to allow 400 packets per second with 100 packet burst: -j POLICE --set-mode pkt --set-rate 400 --set-burst 100
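A single-rate policer with a rate and a burst size is commonly modeled as a token bucket. The following Python sketch simulates that model in packet mode with the values from the example above. It is a conceptual illustration only: the class name is hypothetical, and this section does not describe how the switch hardware implements the POLICE action.

```python
class PacketPolicer:
    """Token bucket that counts packets (like --set-mode pkt)."""

    def __init__(self, rate_pps: float, burst: float):
        self.rate = rate_pps     # tokens (packets) added per second
        self.burst = burst       # bucket depth in packets
        self.tokens = burst      # the bucket starts full
        self.last = 0.0          # time of the previous packet, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a packet arriving at time `now` conforms."""
        elapsed = now - self.last
        self.last = now
        # Refill for the elapsed time, but never above the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False             # over the rate: the packet is dropped

policer = PacketPolicer(rate_pps=400, burst=100)
accepted = sum(policer.allow(0.0) for _ in range(150))
print(accepted)
```

In this model, a burst of 150 back-to-back packets has 100 accepted and 50 dropped, and the bucket refills completely after a quarter of a second at 400 packets per second.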
Dual-rate Policer
To configure a dual-rate policer, use the iptables JUMP action -j TRICOLORPOLICE.
Cumulus Linux supports the following iptables flags with a dual-rate policer.
iptables Flag
Description
--set-color-mode [blind | aware]
The policing mode: single-rate (blind) or dual-rate (aware). The default is aware.
--set-cir <kbps>
The committed information rate (CIR) in kilobits per second.
--set-cbs <kbytes>
The committed burst size (CBS) in kilobytes.
--set-pir <kbps>
The peak information rate (PIR) in kilobits per second.
--set-ebs <kbytes>
The excess burst size (EBS) in kilobytes.
--set-conform-action-dscp <dscp value>
The numerical DSCP value to mark for traffic that conforms to the policer rate.
--set-exceed-action-dscp <dscp value>
The numerical DSCP value to mark for traffic that exceeds the policer rate.
--set-violate-action-dscp <dscp value>
The numerical DSCP value to mark for traffic that violates the policer rate.
--set-violate-action [accept | drop]
Cumulus Linux either accepts and remarks, or drops packets that violate the policer rate.
For example, to configure a dual-rate, three-color policer with a 3 Mbps CIR, 500 KB CBS, 10 Mbps PIR, and 1 MB EBS that drops packets that violate the policer:
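The conform, exceed, and violate outcomes of a dual-rate policer can be illustrated with a two-bucket meter in the style of RFC 2698. The following Python sketch is a conceptual model under stated assumptions, not the switch implementation: the class name is hypothetical, the meter is color-blind, EBS is assumed to act as the depth of the peak-rate bucket, and 1 KB is treated as 1000 bytes.

```python
from collections import Counter

class TwoRateMeter:
    """Color-blind two-rate, three-color meter (RFC 2698 style)."""

    def __init__(self, cir_kbps, cbs_kbytes, pir_kbps, ebs_kbytes):
        self.cir = cir_kbps * 1000 / 8   # committed rate, bytes per second
        self.pir = pir_kbps * 1000 / 8   # peak rate, bytes per second
        self.cbs = cbs_kbytes * 1000     # committed bucket depth, bytes
        self.pbs = ebs_kbytes * 1000     # peak bucket depth, bytes (assumed)
        self.tc = self.cbs               # both buckets start full
        self.tp = self.pbs
        self.last = 0.0

    def color(self, size_bytes: int, now: float) -> str:
        elapsed = now - self.last
        self.last = now
        self.tc = min(self.cbs, self.tc + elapsed * self.cir)
        self.tp = min(self.pbs, self.tp + elapsed * self.pir)
        if self.tp < size_bytes:
            return "red"                 # violates the peak rate
        self.tp -= size_bytes
        if self.tc < size_bytes:
            return "yellow"              # exceeds the committed rate
        self.tc -= size_bytes
        return "green"                   # conforms

meter = TwoRateMeter(cir_kbps=3000, cbs_kbytes=500, pir_kbps=10000, ebs_kbytes=1000)
colors = Counter(meter.color(1000, 0.0) for _ in range(1100))
print(colors)
```

With a violate action of drop, the packets that this model colors red are the ones discarded, while green and yellow packets can be remarked with the conform and exceed DSCP values.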
Port Groups
Cumulus Linux supports profiles (port groups) for all features including ECN and RED. Profiles apply similar QoS configurations to a set of ports.
Configurations with a profile override the global settings for the ingress ports in the port group.
Ports not in a profile use the global settings.
To apply a profile to all ports, use the global profile.
Trust and Marking
You can use port groups to assign different profiles to different ports. A profile is a label for a group of configuration settings.
The following example configures two profiles. customer1 applies to swp1, swp4, and swp6. customer2 applies to swp5 and swp7.
cumulus@switch:~$ nv set qos mapping customer1 trust l3
cumulus@switch:~$ nv set qos mapping customer1 dscp 0-7 switch-priority 0
cumulus@switch:~$ nv set interface swp1,swp4,swp6 qos mapping profile customer1
cumulus@switch:~$ nv set qos mapping customer2 trust l2
cumulus@switch:~$ nv set qos mapping customer2 pcp 1 switch-priority 4
cumulus@switch:~$ nv set interface swp5,swp7 qos mapping profile customer2
cumulus@switch:~$ nv config apply
The following example configures the profile customports, which assigns traffic on swp1, swp2, and swp3 to switch priority 4 regardless of the ingress marking.
cumulus@switch:~$ nv set qos mapping customports trust port
cumulus@switch:~$ nv set qos mapping customports port-default-sp 4
cumulus@switch:~$ nv set interface swp1,swp2,swp3 qos mapping profile customports
cumulus@switch:~$ nv config apply
You define profiles with the source.port_group_list configuration in the qos_features.conf file. A source.port_group_list is one or more names used for a group of settings.
The following example configures two profiles. customer1 applies to swp1 through swp4, and swp6. customer2 applies to swp5 and swp7.
source.port_group_list
The names of the port groups (profiles) you want to use. The following example defines customer1 and customer2: source.port_group_list = [customer1,customer2]
source.customer1.packet_priority_source_set
The ingress marking trust. In the following example, ingress DSCP values are for group customer1: source.customer1.packet_priority_source_set = [dscp]
source.customer1.port_set
The set of ports on which to apply the ingress marking trust policy. In the following example, ports swp1, swp2, swp3, swp4, and swp6 are for customer1: source.customer1.port_set = swp1-swp4,swp6
source.customer1.port_default_priority
The default switch priority marking for unmarked or untrusted traffic. In the following example, Cumulus Linux marks unmarked traffic or layer 2 traffic for customer1 ports with switch priority 0: source.customer1.port_default_priority = 0
source.customer1.cos_0.priority_source
The mapping of ingress DSCP values to a switch priority value for customer1. In the following example, DSCP values 0 through 7 map to switch priority 0: source.customer1.cos_0.priority_source.dscp = [0,1,2,3,4,5,6,7]
source.customer2.packet_priority_source_set
The ingress marking trust for customer2. In the following example, 802.1p is trusted: source.customer2.packet_priority_source_set = [802.1p]
source.customer2.port_set
The set of ports on which to apply the ingress marking trust policy. In the following example, swp5 and swp7 apply for customer2: source.customer2.port_set = swp5,swp7
source.customer2.port_default_priority
The default switch priority marking for unmarked or untrusted traffic. In the following example, Cumulus Linux marks unmarked tagged layer 2 traffic or unmarked VLAN tagged traffic for customer2 ports with switch priority 0: source.customer2.port_default_priority = 0
source.customer2.cos_1.priority_source
The mapping of ingress 802.1p values to a switch priority value for customer2. The following example maps ingress 802.1p value 4 to switch priority 1: source.customer2.cos_1.priority_source.8021p = [4]
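The port_set values above combine single ports and ranges. The following Python sketch expands such a value into individual port names, which can help when you review which interfaces a profile covers. It is illustrative only: the function name is hypothetical, and it handles only the swpN and swpN-swpM forms that appear in this section.

```python
import re

def expand_port_set(port_set: str) -> list:
    """Expand a value such as "swp1-swp4,swp6" into individual port names."""
    ports = []
    for item in port_set.split(","):
        item = item.strip()
        match = re.fullmatch(r"swp(\d+)-swp(\d+)", item)
        if match:
            start, end = int(match.group(1)), int(match.group(2))
            if start > end:
                raise ValueError(f"descending range: {item!r}")
            # Ranges are inclusive at both ends.
            ports.extend(f"swp{n}" for n in range(start, end + 1))
        elif re.fullmatch(r"swp\d+", item):
            ports.append(item)
        else:
            raise ValueError(f"unrecognized port entry: {item!r}")
    return ports

print(expand_port_set("swp1-swp4,swp6"))
print(expand_port_set("swp5,swp7"))
```

Expanding both port sets makes it easy to confirm that customer1 covers swp1, swp2, swp3, swp4, and swp6, and that customer2 covers swp5 and swp7.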
The following example configures the profile customports, which assigns traffic on swp1, swp2, and swp3 to switch priority 4 regardless of the ingress marking.
Remarking
You can use profiles to remark 802.1p or DSCP on egress according to the switch priority (internal COS) value.
To change the marked value on a packet, the switch ASIC reads the enable or disable rewrite flag on the ingress port and refers to the mapping configuration on the egress port to change the marked value. To remark 802.1p or DSCP values, you have to enable the rewrite on the ingress port and configure the mapping on the egress port.
In the following example configuration, only packets that ingress on swp1 and egress on swp2 change the marked value of the packet. Packets that ingress on other ports and egress on swp2 do not change the marked value of the packet. The commands map switch priority 0 and 1 to egress DSCP 37.
cumulus@switch:~$ nv set qos remark remark_port_group1 rewrite l3
cumulus@switch:~$ nv set interface swp1 qos remark profile remark_port_group1
cumulus@switch:~$ nv set qos remark remark_port_group2 switch-priority 0 dscp 37
cumulus@switch:~$ nv set qos remark remark_port_group2 switch-priority 1 dscp 37
cumulus@switch:~$ nv set interface swp2 qos remark profile remark_port_group2
cumulus@switch:~$ nv config apply
You define these profiles with remark.port_group_list in the /etc/cumulus/datapath/qos/qos_features.conf file. The name is a label for configuration settings.
Egress Scheduling
You can use port groups with egress scheduling weights to assign different profiles to different egress ports.
In the following example, the profile list2 applies to swp1, swp3, and swp18. list2 only assigns weights to queues 2, 5, and 6, and schedules the other queues on a best-effort basis when there is no congestion in queues 2, 5, or 6. list1 applies to swp2 and assigns weights to all queues.
You define port groups with egress_sched.port_group_list in the /etc/cumulus/datapath/qos/qos_features.conf file. An egress_sched.port_group_list includes the names for the group settings. The name is a label (profile) for the configuration settings.
egress_sched.port_group_list
The names of the port groups (labels) to use. The following example defines port groups list1 and list2: egress_sched.port_group_list = [list1,list2]
egress_sched.list1.port_set
The interfaces on which you want to apply the port group. egress_sched.list1.port_set = swp2
egress_sched.list1.egr_queue_0.bw_percent
The percentage of bandwidth for egress queue 0. egress_sched.list1.egr_queue_0.bw_percent = 10
egress_sched.list1.egr_queue_1.bw_percent
The percentage of bandwidth for egress queue 1. egress_sched.list1.egr_queue_1.bw_percent = 20
egress_sched.list1.egr_queue_2.bw_percent
The percentage of bandwidth for egress queue 2. egress_sched.list1.egr_queue_2.bw_percent = 30
egress_sched.list1.egr_queue_3.bw_percent
The percentage of bandwidth for egress queue 3. egress_sched.list1.egr_queue_3.bw_percent = 10
egress_sched.list1.egr_queue_4.bw_percent
The percentage of bandwidth for egress queue 4. egress_sched.list1.egr_queue_4.bw_percent = 10
egress_sched.list1.egr_queue_5.bw_percent
The percentage of bandwidth for egress queue 5. egress_sched.list1.egr_queue_5.bw_percent = 10
egress_sched.list1.egr_queue_6.bw_percent
The percentage of bandwidth for egress queue 6. egress_sched.list1.egr_queue_6.bw_percent = 10
egress_sched.list1.egr_queue_7.bw_percent
The percentage of bandwidth for egress queue 7. 0 indicates a strict priority queue: egress_sched.list1.egr_queue_7.bw_percent = 0
egress_sched.list2.port_set
The interfaces on which you want to apply the port group. The following example applies port group list2 to swp1, swp3, and swp18: egress_sched.list2.port_set = [swp1,swp3,swp18]
egress_sched.list2.egr_queue_2.bw_percent
The percentage of bandwidth for egress queue 2. egress_sched.list2.egr_queue_2.bw_percent = 50
egress_sched.list2.egr_queue_5.bw_percent
The percentage of bandwidth for egress queue 5. egress_sched.list2.egr_queue_5.bw_percent = 50
egress_sched.list2.egr_queue_6.bw_percent
The percentage of bandwidth for egress queue 6. 0 indicates a strict priority queue: egress_sched.list2.egr_queue_6.bw_percent = 0
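The settings above are easier to review when you group them by port group. The following Python sketch is an illustration only and is not part of Cumulus Linux. It parses the list1 and list2 lines from this section, totals the weighted queues, and lists the strict priority queues (those with a bw_percent of 0). The function name summarize_egress_sched and the EXAMPLE string are hypothetical.

```python
from collections import defaultdict

# Settings copied from the list1 and list2 example in this section.
EXAMPLE = """
egress_sched.port_group_list = [list1,list2]
egress_sched.list1.port_set = swp2
egress_sched.list1.egr_queue_0.bw_percent = 10
egress_sched.list1.egr_queue_1.bw_percent = 20
egress_sched.list1.egr_queue_2.bw_percent = 30
egress_sched.list1.egr_queue_3.bw_percent = 10
egress_sched.list1.egr_queue_4.bw_percent = 10
egress_sched.list1.egr_queue_5.bw_percent = 10
egress_sched.list1.egr_queue_6.bw_percent = 10
egress_sched.list1.egr_queue_7.bw_percent = 0
egress_sched.list2.port_set = [swp1,swp3,swp18]
egress_sched.list2.egr_queue_2.bw_percent = 50
egress_sched.list2.egr_queue_5.bw_percent = 50
egress_sched.list2.egr_queue_6.bw_percent = 0
"""


def summarize_egress_sched(text):
    """Return {group: {"weights": {queue: percent}, "strict": [queues]}}."""
    groups = defaultdict(lambda: {"weights": {}, "strict": []})
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        parts = key.split(".")
        # Keep only egress_sched.<group>.egr_queue_<n>.bw_percent lines.
        if len(parts) != 4 or parts[0] != "egress_sched" or parts[3] != "bw_percent":
            continue
        if not parts[2].startswith("egr_queue_"):
            continue
        queue = int(parts[2][len("egr_queue_"):])
        percent = int(value)
        if percent == 0:
            # A weight of 0 marks a strict priority queue.
            groups[parts[1]]["strict"].append(queue)
        else:
            groups[parts[1]]["weights"][queue] = percent
    return dict(groups)


summary = summarize_egress_sched(EXAMPLE)
for group, info in sorted(summary.items()):
    total = sum(info["weights"].values())
    print(group, "weighted total:", total, "strict queues:", info["strict"])
```

For the example values, the weighted queues in each port group total 100 percent, with queue 7 (list1) and queue 6 (list2) as strict priority queues.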
PFC
To set priority flow control on a group of ports, you create a profile to define the egress queues that support sending PFC pause frames and define the set of interfaces to which you want to apply PFC pause frame configuration. Cumulus Linux automatically enables PFC frame transmit and PFC frame receive, and derives all other PFC settings, such as the buffer limits that trigger PFC frame transmission to start and stop, the amount of reserved buffer space, and the cable length.
The following example applies a PFC profile called my_pfc_ports for egress queue 3 and 5 on swp1, swp2, swp3, swp4, and swp6.
The following example applies a PFC profile called my_pfc_ports2 for egress queue 0 on swp1. The commands disable PFC frame receive, and set the buffer limit that triggers PFC frame transmission to start (xoff-threshold) to 1500 bytes and to stop (xon-threshold) to 1000 bytes. The commands also set the amount of reserved buffer space to 2000 bytes, and the cable length to 50 meters:
cumulus@switch:~$ nv set qos pfc my_pfc_ports2 switch-priority 0
cumulus@switch:~$ nv set qos pfc my_pfc_ports2 xoff-threshold 1500
cumulus@switch:~$ nv set qos pfc my_pfc_ports2 xon-threshold 1000
cumulus@switch:~$ nv set qos pfc my_pfc_ports2 tx enable
cumulus@switch:~$ nv set qos pfc my_pfc_ports2 rx disable
cumulus@switch:~$ nv set qos pfc my_pfc_ports2 port-buffer 2000
cumulus@switch:~$ nv set qos pfc my_pfc_ports2 cable-length 50
cumulus@switch:~$ nv set interface swp1 qos pfc profile my_pfc_ports2
cumulus@switch:~$ nv config apply
All PFC commands
Command
Description
nv set qos pfc <profile> port-buffer <value>
The amount of reserved buffer space (from the global shared buffer) for the interfaces defined in the port group list. The following example sets the amount of reserved buffer space to 25000 bytes: nv set qos pfc my_pfc_ports port-buffer 25000
nv set qos pfc <profile> xoff-threshold <value>
The amount of reserved buffer that the switch must consume before sending a PFC pause frame out of the set of interfaces in the port group list. The following example sends PFC pause frames after consuming 20000 bytes of reserved buffer: nv set qos pfc my_pfc_ports xoff-threshold 20000
nv set qos pfc <profile> xon-threshold <value>
The number of bytes below the xoff threshold that the buffer consumption must drop below before sending PFC pause frames stops. In the following example, the buffer congestion must reduce by 1000 bytes (to 19000 bytes, given the xoff threshold of 20000 bytes in the previous example) before PFC pause frames stop: nv set qos pfc my_pfc_ports xon-threshold 1000
nv set qos pfc <profile> rx enable nv set qos pfc <profile> rx disable
Enables or disables receiving PFC pause frames. You do not need to define the COS values for rx enable. The switch receives any COS value. The default value is enable. The following example disables receiving PFC pause frames: nv set qos pfc my_pfc_ports rx disable
nv set qos pfc <profile> tx enable nv set qos pfc <profile> tx disable
Enables or disables sending PFC pause frames. The default value is enable. The following example disables sending PFC pause frames: nv set qos pfc my_pfc_ports tx disable
nv set qos pfc <profile> cable-length <value>
The length, in meters, of the cable that attaches to the ports. Cumulus Linux uses this value internally to determine the latency between generating a PFC pause frame and receiving the PFC pause frame. The default is 10 meters. The following example sets the cable length to 5 meters: nv set qos pfc my_pfc_ports cable-length 5
Edit the priority flow control section of the /etc/cumulus/datapath/qos/qos_features.conf file.
The following example applies a PFC profile called my_pfc_ports for egress queue 3 and 5 on swp1, swp2, swp3, swp4, and swp6.
The following example applies a PFC profile called my_pfc_ports2 for egress queue 0 on swp1. The commands also disable PFC frame receive, and set the xoff-size to 1500 bytes, the xon-size to 1000 bytes, the headroom to 2000 bytes, and the cable length to 10 meters:
The amount of reserved buffer space (from the global shared buffer) for the interfaces defined in the port group list. The following example sets the amount of reserved buffer space to 25000 bytes: pfc.my_pfc_ports.port_buffer_bytes = 25000
pfc.my_pfc_ports.xoff_size
The amount of reserved buffer that the switch must consume before sending a PFC pause frame out the set of interfaces in the port group list. The following example sends PFC pause frames after consuming 10000 bytes of reserved buffer: pfc.my_pfc_ports.xoff_size = 10000
pfc.my_pfc_ports.xon_delta
The number of bytes below the xoff threshold that the buffer consumption must drop below before sending PFC pause frames stops. In the following example, the buffer congestion must reduce by 2000 bytes (to 8000 bytes) before PFC pause frames stop: pfc.my_pfc_ports.xon_delta = 2000
pfc.my_pfc_ports.tx_enable
Enables (true) or disables (false) sending PFC pause frames. The default value is true. The following example enables sending PFC pause frames: pfc.my_pfc_ports.tx_enable = true
pfc.my_pfc_ports.rx_enable
Enables (true) or disables (false) receiving PFC pause frames. You do not need to define the COS values for rx_enable. The switch receives any COS value. The default value is true. The following example enables receiving PFC pause frames: pfc.my_pfc_ports.rx_enable = true
pfc.my_pfc_ports.cable_length
The length, in meters, of the cable that attaches to the port in the port group list. Cumulus Linux uses this value internally to determine the latency between generating a PFC pause frame and receiving the PFC pause frame. The default is 10 meters. In this example, the cable is 5 meters: pfc.my_pfc_ports.cable_length = 5
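The relationship between xoff_size and xon_delta can be worked through with a few lines of Python. This is a simplified model based only on the descriptions above, not the switch implementation. The function names pfc_resume_level and pfc_paused are hypothetical, and the range check on xon_delta is an assumption made for the sketch.

```python
def pfc_resume_level(xoff_size, xon_delta):
    """Buffer level (bytes) below which pause frames stop: xoff_size - xon_delta."""
    # Assumption for this sketch: the delta cannot exceed the xoff size.
    if xon_delta < 0 or xon_delta > xoff_size:
        raise ValueError("xon_delta must be between 0 and xoff_size")
    return xoff_size - xon_delta


def pfc_paused(consumed, paused, xoff_size, xon_delta):
    """Return True while the port keeps sending pause frames (simple hysteresis)."""
    if consumed >= xoff_size:
        return True    # consumption reached the xoff size: start pausing
    if consumed < pfc_resume_level(xoff_size, xon_delta):
        return False   # consumption dropped below the resume level: stop pausing
    return paused      # in between: keep the previous state


# Values from the example above: xoff_size = 10000, xon_delta = 2000.
print(pfc_resume_level(10000, 2000))
```

With the example values, the resume level is 10000 - 2000 = 8000 bytes. A port that is already pausing keeps pausing at 9000 bytes and stops only after consumption drops below 8000 bytes.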
ECN
You can create ECN profiles and assign them to different ports.
The following example creates a custom ECN profile called my-red-profile for egress queue (traffic-class) 1 and 2. The commands set the minimum buffer threshold to 40000 bytes, maximum buffer threshold to 200000 bytes, and the probability to 10. The commands also enable RED and apply the ECN profile to swp1 and swp2.
cumulus@switch:~$ nv set qos congestion-control my-red-profile traffic-class 1,2 min-threshold-bytes 40000
cumulus@switch:~$ nv set qos congestion-control my-red-profile traffic-class 1,2 max-threshold-bytes 200000
cumulus@switch:~$ nv set qos congestion-control my-red-profile traffic-class 1,2 probability 10
cumulus@switch:~$ nv set qos congestion-control my-red-profile traffic-class 1,2 red enable
cumulus@switch:~$ nv set interface swp1,swp2 qos congestion-control my-red-profile
cumulus@switch:~$ nv config apply
You can configure different thresholds and probability values for different traffic classes in a custom profile:
cumulus@switch:~$ nv set qos congestion-control my-red-profile traffic-class 1,2 min-threshold-bytes 40000
cumulus@switch:~$ nv set qos congestion-control my-red-profile traffic-class 1,2 max-threshold-bytes 200000
cumulus@switch:~$ nv set qos congestion-control my-red-profile traffic-class 1,2 probability 10
cumulus@switch:~$ nv set qos congestion-control my-red-profile traffic-class 1,2 red enable
cumulus@switch:~$ nv set qos congestion-control my-red-profile traffic-class 4 min-threshold-bytes 30000
cumulus@switch:~$ nv set qos congestion-control my-red-profile traffic-class 4 max-threshold-bytes 150000
cumulus@switch:~$ nv set qos congestion-control my-red-profile traffic-class 4 probability 80
cumulus@switch:~$ nv set interface swp1,swp2 qos congestion-control my-red-profile
cumulus@switch:~$ nv config apply
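To build intuition for how the minimum threshold, maximum threshold, and probability interact, the following Python sketch models textbook RED marking. It is an assumption that marking probability ramps linearly between the two thresholds and that all ECN-capable packets are marked above the maximum threshold; this guide does not state the exact curve the ASIC uses. The function name textbook_red_mark_percent is hypothetical.

```python
def textbook_red_mark_percent(queue_bytes, min_threshold, max_threshold, probability):
    """Marking probability (percent) under a textbook RED model.

    Assumed model, not confirmed for the switch hardware:
      below min_threshold        -> no marking
      between the two thresholds -> linear ramp from 0 up to `probability`
      at or above max_threshold  -> every ECN-capable packet is marked
    """
    if min_threshold >= max_threshold:
        raise ValueError("min_threshold must be lower than max_threshold")
    if queue_bytes < min_threshold:
        return 0.0
    if queue_bytes >= max_threshold:
        return 100.0
    fraction = (queue_bytes - min_threshold) / (max_threshold - min_threshold)
    return fraction * probability


# Traffic classes 1 and 2 from the example: 40000 / 200000 bytes, probability 10.
print(textbook_red_mark_percent(120000, 40000, 200000, 10))
# Traffic class 4 from the example: 30000 / 150000 bytes, probability 80.
print(textbook_red_mark_percent(90000, 30000, 150000, 80))
```

Under this model, a queue halfway between its thresholds is marked at half the configured probability, which shows why traffic class 4 (probability 80) reacts far more aggressively to congestion than traffic classes 1 and 2 (probability 10).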
You can disable ECN bit marking for an ECN profile. The following example disables ECN bit marking in the my-red-profile profile:
Edit the Explicit Congestion Notification section of the /etc/cumulus/datapath/qos/qos_features.conf file.
The following example creates a custom ECN profile called my-red-profile for egress queue 1 and 2, with a minimum buffer threshold of 40000 bytes, maximum buffer threshold of 200000 bytes, and a probability of 10. The commands also enable RED and apply the ECN profile to swp1 and swp2.
You can only have a single lossless pool configured on the switch at a time. Configure the roce-lossless pool when you are using RoCE, otherwise configure the default-lossless pool.
You can configure multiple lossy pools concurrently.
You configure a traffic pool by associating switch priorities and defining the buffer memory percentages allocated to the pools. The following example associates switch priority 2 and allocates a memory percentage of 30 for the mc-lossy pool:
cumulus@switch:~$ nv set qos traffic-pool default-lossy switch-priority 0,1,3,4,5,6,7
cumulus@switch:~$ nv set qos traffic-pool default-lossy memory-percent 70
cumulus@switch:~$ nv set qos traffic-pool mc-lossy switch-priority 2
cumulus@switch:~$ nv set qos traffic-pool mc-lossy memory-percent 30
cumulus@switch:~$ nv config apply
Configure the following settings in the /etc/mlx/datapath/qos/qos_infra.conf file:
For additional default-lossless and RoCE pool examples, see Flow Control Buffers and RoCE. You can view traffic-pool configuration with the nv show qos traffic-pool <pool name> command:
You can use NVUE commands to tune advanced buffer properties in addition to the supported traffic pool configurations. Advanced buffer configuration can override the base traffic-pool profiles configured on the system.
You can only configure advanced buffer settings for the default-global profile.
Buffer Regions
You can adjust advanced buffer settings with the following NVUE command:
nv set qos advance-buffer-config default-global <buffer> <priority-group | property> <value>
You can adjust settings for the following supported buffer regions and properties:
Buffers
Supported Property Values
ingress-lossy-buffer
Cumulus Linux supports the following properties for the bulk, control, and service[1-6] priority groups: name - The priority group alias name. reserved - The reserved buffer allocation in bytes. service-pool - Service pool mapping. shared-alpha - The dynamic shared buffer alpha allocation. shared-bytes - The static shared buffer allocation in bytes. switch-priority - Switch priority values.
egress-lossless-buffer
reserved - The reserved buffer allocation in bytes. service-pool - Service pool mapping. shared-alpha - The dynamic shared buffer alpha allocation. shared-bytes - The static shared buffer allocation in bytes.
ingress-lossless-buffer
service-pool - Service pool mapping. shared-alpha - The dynamic shared buffer alpha allocation. shared-bytes - The static shared buffer allocation in bytes.
egress-lossy-buffer
multicast-port - Multicast port reserved or shared-bytes allocation in bytes. multicast-switch-priority [0-7] - Set the reserved, service-pool, shared-alpha, or shared-bytes properties for each multicast switch priority. traffic-class [0-15] - Set the reserved, service-pool, shared-alpha, or shared-bytes properties for each traffic class.
Configure shared-bytes for buffer regions mapped to static service pools, and shared-alpha for buffer regions mapped to dynamic service pools.
The shared buffer alpha value determines the proportion of available shared memory allocated across buffer regions. Regions with higher alpha values receive a higher proportion of available shared buffer memory. The following example changes the ingress-lossless-buffer shared alpha value to alpha_2 when using RoCE lossless mode:
You can configure ingress and egress service pool profile properties with the following NVUE commands:
nv set qos advance-buffer-config default-global ingress-pool <pool-id> <property> <value>
nv set qos advance-buffer-config default-global egress-pool <pool-id> <property> <value>
You can adjust the following properties for each pool:
Property
Description
infinite
The pool infinite flag.
memory-percent
The pool memory percent allocation.
mode
The pool mode: static or dynamic.
reserved
The reserved buffer allocation in bytes.
shared-alpha
The dynamic shared buffer alpha allocation.
shared-bytes
The static shared buffer allocation in bytes.
A relationship exists between the default traffic pools and the advanced buffer configuration settings.
Use caution when configuring advanced buffer settings. NVUE presents a warning if you attempt to apply incompatible traffic pool and advanced buffer configurations. NVUE performs the following validation checks before applying advanced buffer configurations:
You must map all switch priorities (0-7) to a priority group. You can map more than one switch priority to the same priority group.
The sum of memory-percent values across all ingress pools must be less than or equal to 100 percent.
The sum of memory-percent values across all egress pools must be less than or equal to 100 percent.
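You can rehearse the three validation checks above before you apply a configuration. The following Python sketch is an illustration only and does not reproduce how NVUE implements its validation. The function name validate_advanced_buffers and the dictionary layout are hypothetical; the sample values assume the default-lossy (pool 0, priority group bulk) and mc-lossy (pool 2, priority group service2) pools from the traffic pool example earlier in this section.

```python
def validate_advanced_buffers(priority_groups, ingress_pools, egress_pools):
    """Return a list of error strings; an empty list means all checks pass.

    priority_groups: {group name: iterable of switch priorities}
    ingress_pools / egress_pools: {pool id: memory-percent}
    """
    errors = []

    # Check 1: every switch priority 0-7 maps to some priority group.
    mapped = set()
    for switch_priorities in priority_groups.values():
        mapped.update(switch_priorities)
    missing = sorted(set(range(8)) - mapped)
    if missing:
        errors.append("unmapped switch priorities: {}".format(missing))

    # Checks 2 and 3: memory-percent totals must not exceed 100 per direction.
    for direction, pools in (("ingress", ingress_pools), ("egress", egress_pools)):
        total = sum(pools.values())
        if total > 100:
            errors.append("{} pools total {} percent".format(direction, total))

    return errors


good = validate_advanced_buffers(
    {"bulk": [0, 1, 3, 4, 5, 6, 7], "service2": [2]},
    ingress_pools={0: 70, 2: 30},
    egress_pools={0: 70, 2: 30},
)
bad = validate_advanced_buffers(
    {"bulk": [0, 1]},
    ingress_pools={0: 80, 3: 30},
    egress_pools={0: 80},
)
print(good)
print(bad)
```

The second call fails two checks: switch priorities 2 through 7 have no priority group, and the ingress pools total 110 percent.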
Reference the table below to view the mappings between the default traffic pool and advanced buffer properties:
Default Traffic Pool
Default Traffic Pool Properties
Advanced Buffer Region or Service Pool
Advanced Buffer Properties
default-lossy
memory-percent
ingress-pool 0 egress-pool 0
memory-percent
default-lossy
switch-priority
ingress-lossy-buffer
priority-group bulk switch-priority
default-lossless
memory-percent
ingress-pool 1 egress-pool 1
memory-percent
roce-lossless
memory-percent
ingress-pool 1 egress-pool 1
memory-percent
mc-lossy
memory-percent
ingress-pool 2 egress-pool 2
memory-percent
mc-lossy
switch-priority
ingress-lossy-buffer
priority-group service2 switch-priority
For example, to assign 20 percent of memory to a new static service pool, you must allow 20 percent of memory to be available from the default traffic pools. The following commands reduce the default-lossy traffic pool to 80 percent memory, allowing you to assign the memory to ingress-pool 3:
Cumulus Linux provides a syntax checker for the qos_features.conf and qos_infra.conf files to check for errors, such as missing parameters or invalid parameter labels and values.
The syntax checker runs automatically with every switchd reload.
You can run the syntax checker manually from the command line with the cl-consistency-check --datapath-syntax-check command. If errors exist, they write to stderr by default. If you run the command with -q, errors write to the /var/log/switchd.log file.
The cl-consistency-check --datapath-syntax-check command takes the following options:
Option
Description
-h
Displays this list of command options.
-q
Runs the command in quiet mode. Errors write to the /var/log/switchd.log file instead of stderr.
-qi
Runs the syntax checker against a specified qos_infra.conf file.
-qf
Runs the syntax checker against a specified qos_features.conf file.
By default the syntax checker assumes:
qos_infra.conf is in /etc/mlx/datapath/qos/qos_infra.conf
qos_features.conf is in /etc/cumulus/datapath/qos/qos_features.conf
You can run the syntax checker when switchd is either running or stopped.
Show QoS Counters
NVUE provides the following commands to show QoS statistics for an interface:
NVUE Command
Description
nv show interface <interface> counters qos
Shows all QoS statistics for a specific interface.
nv show interface <interface> counters qos egress-queue-stats
Shows QoS egress queue statistics for a specific interface.
nv show interface <interface> counters qos ingress-buffer-stats
Shows QoS ingress buffer statistics for a specific interface.
nv show interface <interface> counters qos pfc-stats
Shows QoS PFC statistics for a specific interface.
nv show interface <interface> counters qos port-stats
Shows QoS port statistics for a specific interface.
The following example shows all QoS statistics for swp1:
If you configure both breakout ports and QoS settings for breakout interfaces at the same time, errors might occur.
You must apply breakout port configuration before QoS configuration on the breakout ports. If you are using NVUE, configure breakout ports and perform an nv config apply first, then configure QoS settings on the breakout ports followed by another nv config apply. If you are using Linux file configuration, modify ports.conf first, reload switchd, then modify qos_features.conf and reload switchd a second time.
QoS Settings on Bond Member Interfaces
If you use Linux commands to apply QoS settings on bond member interfaces instead of the logical bond interface, the members must share identical QoS configuration. If the configuration is not identical between bond member interfaces, the bond inherits the configuration of the last interface you apply to the bond.
If QoS settings do not match, switchd reload fails; however, switchd restart does not fail.
NVUE rejects QoS configurations on bond member interfaces and shows an error when you try to apply the configurations; you must apply all QoS configuration on logical bond interfaces.
Cut-through Switching
You cannot disable cut-through switching on Spectrum ASICs. Cumulus Linux ignores the cut_through_enable = false setting in the qos_features.conf file.
RDMA over Converged Ethernet - RoCE
RoCE enables you to write to compute or storage elements using RDMA over an Ethernet network instead of using host CPUs. RoCE relies on ECN and PFC to operate. Cumulus Linux supports features that can enable lossless Ethernet for RoCE environments.
While Cumulus Linux can support RoCE environments, the end hosts must support the RoCE protocol.
RoCE helps you obtain a converged network, where all services run over the Ethernet infrastructure, including Infiniband apps.
Default RoCE Mode Configuration
The following table shows the default RoCE configuration for lossy and lossless mode.
Configuration
Lossy Mode
Lossless Mode
Port trust mode
YES
YES
Port switch priority to traffic class mapping
Switch priority 3 to traffic class 3 (RoCE)
Switch priority 6 to traffic class 6 (CNP)
Other switch priority to traffic class 0
YES
YES
Port ETS:
Traffic class 6 (CNP) - Strict
Traffic class 3 (RoCE) - WRR 50%
Traffic class 0 (Other traffic) - WRR 50%
YES
YES
Port ECN absolute threshold is 1501500 bytes for traffic class 3 (RoCE)
YES
YES
LLDP and Application TLV (RoCE) (UDP, Protocol:4791, Priority: 3)
YES
YES
Enable PFC on switch priority 3 (RoCE)
NO
YES
Switch priority 3 allocated to RoCE lossless traffic pool
NO
YES
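The default mapping and scheduling rows in the table above can be summarized in a short lookup. The following Python sketch only restates the table values for illustration; the names ROCE_TRAFFIC_CLASS, ROCE_ETS, and roce_traffic_class are hypothetical and do not correspond to any Cumulus Linux API.

```python
# Switch priority -> traffic class, per the default RoCE table.
# Any switch priority not listed here maps to traffic class 0.
ROCE_TRAFFIC_CLASS = {3: 3,   # RoCE
                      6: 6}   # CNP

# Traffic class -> ETS scheduling, per the default RoCE table.
ROCE_ETS = {6: "strict",      # CNP
            3: "WRR 50%",     # RoCE
            0: "WRR 50%"}     # other traffic


def roce_traffic_class(switch_priority):
    """Return the default RoCE traffic class for a switch priority (0-7)."""
    if not 0 <= switch_priority <= 7:
        raise ValueError("switch priority must be between 0 and 7")
    return ROCE_TRAFFIC_CLASS.get(switch_priority, 0)


for sp in range(8):
    tc = roce_traffic_class(sp)
    print("switch priority", sp, "-> traffic class", tc, "-", ROCE_ETS[tc])
```

These rows apply to both lossy and lossless mode; the two modes differ only in whether PFC is enabled on switch priority 3 and whether that priority uses the RoCE lossless traffic pool.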
Enable RDMA over Converged Ethernet lossless (with PFC and ECN)
RoCE uses the Infiniband (IB) Protocol over converged Ethernet. The IB global route header rides directly on top of the Ethernet header. The lossless Ethernet layer handles congestion hop by hop.
To configure RoCE with PFC and ECN:
cumulus@switch:~$ nv set qos roce
cumulus@switch:~$ nv config apply
NVUE defaults to roce mode lossless. The commands nv set qos roce and nv set qos roce mode lossless are equivalent.
If you enable mode lossy, configuring nv set qos roce without a mode does not change the RoCE mode. To change to lossless, you must configure mode lossless.
Link pause is another way to provide lossless Ethernet; however, PFC is the preferred method. PFC allows more granular control by pausing the traffic flow for a given CoS group instead of the entire link.
Enable RDMA over Converged Ethernet lossy (with ECN)
RoCEv2 requires flow control for lossless Ethernet. RoCEv2 uses the Infiniband (IB) Transport Protocol over UDP. The IB transport protocol includes an end-to-end reliable delivery mechanism and has its own sender notification mechanism.
RoCEv2 congestion management uses RFC 3168 to signal congestion experienced to the receiver. The receiver generates an RoCEv2 congestion notification packet directed to the source of the packet.
To configure RoCE with ECN:
cumulus@switch:~$ nv set qos roce mode lossy
cumulus@switch:~$ nv config apply
Remove RoCE Configuration
To remove RoCE configurations:
cumulus@switch:~$ nv unset qos roce
cumulus@switch:~$ nv config apply
Verify RoCE Configuration
You can verify RoCE configuration with NVUE nv show commands.
To show detailed information about the configured buffers, utilization and DSCP markings, run the nv show qos roce command:
To show detailed RoCE information about a single interface, run the nv show interface <interface> qos roce status command.
cumulus@switch:mgmt:~$ nv show interface swp16 qos roce status
operational applied description
------------------ ------------- ------- ---------------------------------------------------
congestion-control
congestion-mode ecn, absolute Congestion config mode
enabled-tc 0,3 Congestion config enabled Traffic Class
max-threshold 1.43 MB Congestion config max-threshold
min-threshold 153.00 KB Congestion config min-threshold
probability 100
lldp-app-tlv
priority 3
protocol-id 4791
selector UDP
pfc
pfc-priority 3 switch-prio on which PFC is enabled
rx-enabled yes PFC Rx Enabled status
tx-enabled yes PFC Tx Enabled status
trust
trust-mode pcp,dscp Trust Setting on the port for packet classification
mode lossless Roce Mode
RoCE PCP/DSCP->SP mapping configurations
===========================================
pcp dscp switch-prio
---- --- ---- -----------
cnp 6 48 6
roce 3 26 3
RoCE SP->TC mapping and ETS configurations
=============================================
switch-prio traffic-class scheduler-weight
---- ----------- ------------- ----------------
cnp 6 6 strict priority
roce 3 3 dwrr-50%
RoCE Pool Status
===================
name mode pool-id switch-priorities traffic-class size current-usage max-usage
-- --------------------- ------- ------- ----------------- ------------- -------- ------------- ---------
0 lossy-default-ingress DYNAMIC 2 0,1,2,4,5,6,7 - 15.16 MB 0 Bytes 16.00 MB
1 roce-reserved-ingress DYNAMIC 3 3 - 15.16 MB 7.30 MB 7.90 MB
2 lossy-default-egress DYNAMIC 13 - 0,6 15.16 MB 0 Bytes 16.01 MB
3 roce-reserved-egress DYNAMIC 14 - 3 inf 7.29 MB 13.47 MB
To show detailed information about current buffer utilization as well as historic RoCE byte and packet counts, run the nv show interface <interface> qos roce counters command:
cumulus@switch:mgmt:~$ nv show interface swp16 qos roce counters
operational applied description
----------------------------- ------------ ------- ------------------------------------------------------
rx-stats
rx-non-roce-stats
buffer-max-usage 144 Bytes Max Ingress Pool-buffer usage for non-RoCE traffic
buffer-usage 0 Bytes Current Ingress Pool-buffer usage for non-RoCE traffic
no-buffer-discard 55 Rx buffer discards for non-RoCE traffic
non-roce-bytes 56.52 MB non-roce rx bytes
non-roce-packets 462975 non-roce rx packets
pg-max-usage 144 Bytes Max PG-buffer usage for non-RoCE traffic
pg-usage 0 Bytes Current PG-buffer usage for non-RoCE traffic
rx-pfc-stats
pause-duration 0 Rx PFC pause duration for RoCE traffic
pause-packets 0 Rx PFC pause packets for RoCE traffic
rx-roce-stats
buffer-max-usage 0 Bytes Max Ingress Pool-buffer usage for RoCE traffic
buffer-usage 0 Bytes Current Ingress Pool-buffer usage for RoCE traffic
no-buffer-discard 0 Rx buffer discards for RoCE traffic
pg-max-usage 0 Bytes Max PG-buffer usage for RoCE traffic
pg-usage 0 Bytes Current PG-buffer usage for RoCE traffic
roce-bytes 0 Bytes Rx RoCE Bytes
roce-packets 0 Rx RoCE Packets
tx-stats
tx-cnp-stats
buffer-max-usage 16.02 MB Max Egress Pool-buffer usage for CNP traffic
buffer-usage 0 Bytes Current Egress Pool-buffer usage for CNP traffic
cnp-bytes 0 Bytes Tx CNP Packet Bytes
cnp-packets 0 Tx CNP Packets
tc-max-usage 0 Bytes Max TC-buffer usage for CNP traffic
tc-usage 0 Bytes Current TC-buffer usage for CNP traffic
unicast-no-buffer-discard 0 Tx buffer discards for CNP traffic
tx-ecn-stats
ecn-marked-packets 693777677344 Tx ECN marked packets
tx-pfc-stats
pause-duration 0 Tx PFC pause duration for RoCE traffic
pause-packets 0 Tx PFC pause packets for RoCE traffic
tx-roce-stats
buffer-max-usage 13.47 MB Max Egress Pool-buffer usage for RoCE traffic
buffer-usage 7.29 MB Current Egress Pool-buffer usage for RoCE traffic
roce-bytes 92824.38 GB Tx RoCE Packet bytes
roce-packets 803785675319 Tx RoCE Packets
tc-max-usage 16.02 MB Max TC-buffer usage for RoCE traffic
tc-usage 7.29 MB Current TC-buffer usage for RoCE traffic
unicast-no-buffer-discard 663060754115 Tx buffer discards for RoCE traffic
To reset the counters that the nv show interface <interface> qos roce command displays, run the nv action clear interface <interface> qos roce counters command.
Change RoCE Configuration
You can adjust RoCE settings using NVUE after you enable RoCE. To change the memory allocation for RoCE lossless mode to 60 percent:
cumulus@switch:mgmt:~$ nv set qos traffic-pool default-lossy memory-percent 40
cumulus@switch:mgmt:~$ nv set qos traffic-pool roce-lossless memory-percent 60
cumulus@switch:mgmt:~$ nv config apply
To change the memory allocation of the RoCE lossy traffic pool to 60 percent and remap switch priority 4 to RoCE lossy traffic:
cumulus@switch:mgmt:~$ nv set qos traffic-pool default-lossy switch-priority 0-3,5-7
cumulus@switch:mgmt:~$ nv set qos traffic-pool roce-lossy memory-percent 60
cumulus@switch:mgmt:~$ nv set qos traffic-pool default-lossy memory-percent 40
cumulus@switch:mgmt:~$ nv set qos traffic-pool roce-lossy switch-priority 4
cumulus@switch:mgmt:~$ nv set qos egress-queue-mapping default-global switch-priority 4 traffic-class 3
cumulus@switch:mgmt:~$ nv set qos egress-queue-mapping default-global switch-priority 3 traffic-class 0
cumulus@switch:mgmt:~$ nv set qos mapping default-global trust both
cumulus@switch:mgmt:~$ nv set qos mapping default-global dscp 26 switch-priority 4
cumulus@switch:mgmt:~$ nv config apply
To change the RoCE lossless switch priority from switch priority 3 to switch priority 2:
cumulus@switch:mgmt:~$ nv set qos pfc default-global switch-priority 2
cumulus@switch:mgmt:~$ nv set qos egress-queue-mapping default-global switch-priority 2 traffic-class 3
cumulus@switch:mgmt:~$ nv set qos egress-queue-mapping default-global switch-priority 3 traffic-class 0
cumulus@switch:mgmt:~$ nv set qos mapping default-global trust both
cumulus@switch:mgmt:~$ nv set qos mapping default-global dscp 26 switch-priority 2
cumulus@switch:mgmt:~$ nv config apply
DHCP is a client-server protocol that automatically provides IP hosts with IP addresses and other related configuration information. A DHCP relay (agent) is a host that forwards DHCP packets between clients and servers that are not on the same physical subnet.
This topic describes how to configure DHCP relays for IPv4 and IPv6 using the following topology:
Basic Configuration
To set up DHCP relay, you need to provide the IP address of the DHCP server and the interfaces participating in DHCP relay (facing the server and facing the client). In an MLAG configuration, you must also specify the peerlink interface in case the local uplink interfaces fail.
In the example commands below:
The DHCP server IPv4 address is 172.16.1.102
The DHCP server IPv6 address is 2001:db8:100::2
vlan10 is the SVI for VLAN 10 and the uplinks are swp51 and swp52
peerlink.4094 is the MLAG interface
cumulus@leaf01:~$ nv set service dhcp-relay default interface swp51
cumulus@leaf01:~$ nv set service dhcp-relay default interface swp52
cumulus@leaf01:~$ nv set service dhcp-relay default interface vlan10
cumulus@leaf01:~$ nv set service dhcp-relay default interface peerlink.4094
cumulus@leaf01:~$ nv set service dhcp-relay default server 172.16.1.102
cumulus@leaf01:~$ nv config apply
cumulus@leaf01:~$ nv set service dhcp-relay6 default interface upstream swp51 server-address 2001:db8:100::2
cumulus@leaf01:~$ nv set service dhcp-relay6 default interface upstream swp52 server-address 2001:db8:100::2
cumulus@leaf01:~$ nv set service dhcp-relay6 default interface downstream vlan10
cumulus@leaf01:~$ nv set service dhcp-relay6 default interface downstream peerlink.4094
cumulus@leaf01:~$ nv config apply
Edit the /etc/default/isc-dhcp-relay-default file to add the IP address of the DHCP server and the interfaces participating in DHCP relay.
You configure a DHCP relay on a per-VLAN basis, specifying the SVI, not the parent bridge. In the example above, you specify vlan10 as the SVI for VLAN 10 but you do not specify the bridge named bridge.
When you configure DHCP relay with VRR, the DHCP relay client must run on the SVI, not on the -v0 interface.
For every instance of a DHCP relay in a non-default VRF, you need to create a separate default file in the /etc/default directory. See DHCP with VRF.
Optional Configuration
This section describes optional DHCP relay configurations. The steps provided in this section assume that you have already configured basic DHCP relay, as described above.
DHCP Agent Information Option (Option 82)
Cumulus Linux supports DHCP Agent Information Option 82, which allows a DHCP relay to insert circuit or relay specific information into a request that the switch forwards to a DHCP server. You can use the following options:
Circuit ID includes information about the circuit on which the request comes in, such as the SVI or physical port. By default, this is the printable name of the interface that receives the client request.
Remote ID includes information that identifies the relay agent, such as the MAC address. By default, this is the system MAC address of the device on which DHCP relay is running.
To configure DHCP Agent Information Option 82:
The following example enables Option 82 and enables circuit ID:
cumulus@leaf01:~$ nv set service dhcp-relay <vrf-id> agent enable on
cumulus@leaf01:~$ nv set service dhcp-relay <vrf-id> agent use-pif-circuit-id enable on
cumulus@leaf01:~$ nv config apply
The following example enables Option 82 and sets the remote ID to MAC address 44:38:39:BE:EF:AA:
cumulus@leaf01:~$ nv set service dhcp-relay <vrf-id> agent enable on
cumulus@leaf01:~$ nv set service dhcp-relay default agent remote-id 44:38:39:BE:EF:AA
cumulus@leaf01:~$ nv config apply
Edit the /etc/default/isc-dhcp-relay-default file and add one of the following options:
To inject the ingress SVI interface against which DHCP processes the relayed DHCP discover packet, add -a to the OPTIONS line:
cumulus@leaf01:~$ sudo nano /etc/default/isc-dhcp-relay-default
...
# Additional options that are passed to the DHCP relay daemon?
OPTIONS="-a"
To inject the physical switch port on which the relayed DHCP discover packet arrives instead of the SVI, add -a --use-pif-circuit-id to the OPTIONS line:
cumulus@leaf01:~$ sudo nano /etc/default/isc-dhcp-relay-default
...
# Additional options that are passed to the DHCP relay daemon?
OPTIONS="-a --use-pif-circuit-id"
To customize the Remote ID sub-option, add -a -r to the OPTIONS line followed by a custom string (up to 255 characters):
cumulus@leaf01:~$ sudo nano /etc/default/isc-dhcp-relay-default
...
# Additional options that are passed to the DHCP relay daemon?
OPTIONS="-a -r CUSTOMVALUE"
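The three OPTIONS variants above follow one pattern, which the following Python sketch captures. It is a hypothetical helper (relay_options is not a Cumulus Linux tool) that only formats the line and does not edit /etc/default/isc-dhcp-relay-default. Rejecting whitespace and quotes in the custom string is an assumption made so that the value stays a single shell word, and combining the circuit ID and remote ID flags in one call is not an example this guide shows.

```python
def relay_options(agent=False, use_pif_circuit_id=False, remote_id=None):
    """Build the OPTIONS line for the Option 82 variants shown above."""
    flags = []
    # The examples above always pair the sub-option flags with -a.
    if agent or use_pif_circuit_id or remote_id is not None:
        flags.append("-a")
    if use_pif_circuit_id:
        flags.append("--use-pif-circuit-id")
    if remote_id is not None:
        if not remote_id or len(remote_id) > 255:
            raise ValueError("remote ID must be 1 to 255 characters")
        # Assumption for this sketch: keep the value a single unquoted word.
        if '"' in remote_id or any(ch.isspace() for ch in remote_id):
            raise ValueError("remote ID must not contain quotes or whitespace")
        flags.extend(["-r", remote_id])
    return 'OPTIONS="{}"'.format(" ".join(flags))


print(relay_options(agent=True))
print(relay_options(use_pif_circuit_id=True))
print(relay_options(remote_id="CUSTOMVALUE"))
```

The three calls reproduce the OPTIONS lines from the examples in this section.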
Restart the dhcrelay service to apply the new configuration:
When you need DHCP relay in an environment that relies on an anycast gateway (such as EVPN), a unique IP address is necessary on each device for return traffic. By default, in a BGP unnumbered environment with DHCP relay, the source IP address is the loopback IP address and the gateway IP address is the SVI IP address. However, with anycast traffic, the SVI IP address is not unique to each rack; it is typically shared between racks. Most EVPN ToR deployments only use a single unique IP address, which is the loopback IP address.
RFC 3527 enables the DHCP server to react to these environments by introducing a new parameter to the DHCP header called the link selection sub-option, which the DHCP relay agent builds. The link selection sub-option takes on the normal role of the gateway address in relaying to the DHCP server which subnet correlates to the DHCP request. When using this sub-option, the gateway address continues to be present but only relays the return IP address that the DHCP server uses; the gateway address becomes the unique loopback IP address.
When enabling RFC 3527 support, you can specify an interface, such as the loopback interface or a switch port interface to use as the gateway address. The relay picks the first IP address on that interface. If the interface has multiple IP addresses, you can specify a specific IP address for the interface.
RFC 3527 supports IPv4 DHCP relays only.
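To make the split of roles concrete, the following Python sketch encodes a relay agent information option (DHCP option 82, RFC 3046) that carries only the link selection sub-option (sub-option 5, RFC 3527). The function name and addresses are illustrative; this is not how dhcrelay is implemented, only a minimal picture of the bytes involved.

```python
import ipaddress
import struct

RELAY_AGENT_INFORMATION = 82  # DHCP option code (RFC 3046)
LINK_SELECTION = 5            # sub-option code (RFC 3527)


def link_selection_option(subnet_address):
    """Return option 82 containing a single link selection sub-option."""
    # IPv4Address raises ValueError for IPv6 input; RFC 3527 is IPv4 only.
    addr = ipaddress.IPv4Address(subnet_address)
    sub_option = struct.pack("!BB4s", LINK_SELECTION, 4, addr.packed)
    return struct.pack("!BB", RELAY_AGENT_INFORMATION, len(sub_option)) + sub_option


# The shared SVI address identifies the client subnet ...
option = link_selection_option("10.1.10.1")
# ... while giaddr carries the unique loopback address for return traffic.
giaddr = ipaddress.IPv4Address("10.10.10.1").packed

print(option.hex(), giaddr.hex())
```

The server selects the address pool from the sub-option and sends its reply to the giaddr value, which is why the two addresses can differ.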
To enable RFC 3527 support and control the gateway address:
Run the nv set service dhcp-relay default gateway-interface command with the interface or IP address you want to use. The following example uses the first IP address on the loopback interface as the gateway IP address:
cumulus@leaf01:~$ nv set service dhcp-relay default gateway-interface lo
The first IP address on the loopback interface is typically the 127.0.0.1 address. This example uses IP address 10.10.10.1 on the loopback interface as the gateway address:
cumulus@leaf01:~$ nv set service dhcp-relay default gateway-interface lo address 10.10.10.1
This example uses the first IP address on swp2 as the gateway address:
cumulus@leaf01:~$ nv set service dhcp-relay default gateway-interface swp2
This example uses IP address 10.0.0.4 on swp2 as the gateway address:
cumulus@leaf01:~$ nv set service dhcp-relay default gateway-interface swp2 address 10.0.0.4
Edit the /etc/default/isc-dhcp-relay-default file and provide the -U option with the interface or IP address you want to use as the gateway address.
This example uses the first IP address on the loopback interface as the gateway address:
cumulus@leaf01:~$ sudo nano /etc/default/isc-dhcp-relay-default
...
# Additional options that are passed to the DHCP relay daemon?
OPTIONS="-U lo"
The first IP address on the loopback interface is typically the 127.0.0.1 address. This example uses IP address 10.10.10.1 on the loopback interface as the gateway address:
cumulus@leaf01:~$ sudo nano /etc/default/isc-dhcp-relay-default
...
# Additional options that are passed to the DHCP relay daemon?
OPTIONS="-U 10.10.10.1%lo"
This example uses the first IP address on swp2 as the gateway address:
cumulus@leaf01:~$ sudo nano /etc/default/isc-dhcp-relay-default
...
# Additional options that are passed to the DHCP relay daemon?
OPTIONS="-U swp2"
This example uses IP address 10.0.0.4 on swp2 as the gateway address:
cumulus@leaf01:~$ sudo nano /etc/default/isc-dhcp-relay-default
...
# Additional options that are passed to the DHCP relay daemon?
OPTIONS="-U 10.0.0.4%swp2"
Restart the dhcrelay service to apply the configuration change:
cumulus@leaf01:~$ sudo systemctl restart dhcrelay@default.service
DHCP Relay for IPv4 in an EVPN Symmetric Environment with MLAG
In a multi-tenant EVPN symmetric routing environment with MLAG, you must enable RFC 3527 support. You can specify an interface, such as the loopback or VRF interface, for the gateway address. The interface must be reachable in the tenant VRF that you configure for DHCP relay and must have a unique IPv4 address. For EVPN symmetric routing with an anycast gateway that reuses the same SVI IP address on multiple leaf switches, you must assign a unique IP address for the VRF interface and include the layer 3 VNI for this VRF in the DHCP relay configuration.
The following example:
Configures VRF RED with IPv4 address 20.20.20.1/32.
Configures the SVIs vlan10 and vlan20, and the layer 3 VNI VLAN interface for VRF RED vlan4024_l3 to be part of the INTF_CMD list to service DHCP packets.
Sets the DHCP server to 10.1.10.104.
Configures VRF RED to advertise connected routes as type-5 so that the VRF RED loopback IPv4 address is reachable.
cumulus@leaf01:~$ nv set vrf RED loopback ip address 20.20.20.1/32
cumulus@leaf01:~$ nv set service dhcp-relay RED interface vlan10
cumulus@leaf01:~$ nv set service dhcp-relay RED interface vlan20
cumulus@leaf01:~$ nv set service dhcp-relay RED interface vlan4024_l3
cumulus@leaf01:~$ nv set service dhcp-relay RED server 10.1.10.104
cumulus@leaf01:~$ nv set vrf RED router bgp address-family ipv4-unicast redistribute connected enable on
cumulus@leaf01:~$ nv set vrf RED router bgp address-family ipv4-unicast route-export to-evpn enable on
cumulus@leaf01:~$ nv config apply
Edit the /etc/network/interfaces file to configure VRF RED with IPv4 address 20.20.20.1/32:
cumulus@leaf01:mgmt:~$ sudo nano /etc/network/interfaces
...
auto RED
iface RED
address 20.20.20.1/32
vrf-table auto
Configure VRF RED to advertise the connected routes as type-5 so that the loopback IPv4 address is reachable:
DHCP Relay for IPv4 in an EVPN Symmetric Environment without MLAG
In a multi-tenant EVPN symmetric routing environment without MLAG, the VLAN interface (SVI) IPv4 address is typically unique on each leaf switch, which does not require RFC 3527 configuration.
The following example:
Configures the SVIs vlan10 and vlan20, and the layer 3 VNI VLAN interface for VRF RED vlan4024_l3 to be part of the INTF_CMD list to service DHCP packets.
Sets the DHCP server IP address to 10.1.10.104.
cumulus@leaf01:~$ nv set service dhcp-relay RED interface vlan10
cumulus@leaf01:~$ nv set service dhcp-relay RED interface vlan20
cumulus@leaf01:~$ nv set service dhcp-relay RED interface vlan4024_l3
cumulus@leaf01:~$ nv set service dhcp-relay RED server 10.1.10.104
cumulus@leaf01:~$ nv config apply
DHCP Relay for IPv6 in an EVPN Symmetric Environment
For IPv6 DHCP relay in a symmetric routing environment, you must assign a unique IPv6 address to the non-default VRF interfaces that participate in DHCP relay. Cumulus Linux uses this IPv6 address as the source address when sending packets to the DHCP server and the DHCP server replies to this address.
RFC 3527 does not apply to IPv6. IPv6 has the functionality described in RFC 3527 as part of its normal operations.
The following example:
Configures VRF RED with the unique IPv6 address 2001:db8:666::1/128.
Configures VLAN 10 and 20 in VRF RED to service DHCP requests from downstream hosts.
Sets the DHCP server to 2001:db8:199::2.
Configures the layer 3 VNI interface for VRF RED vlan4024_l3 to process DHCP packets from the upstream server.
Configures VRF RED to advertise the connected routes so that the loopback IPv6 address is reachable.
cumulus@leaf01:~$ nv set vrf RED loopback ip address 2001:db8:666::1/128
cumulus@leaf01:~$ nv set service dhcp-relay6 RED interface downstream vlan10
cumulus@leaf01:~$ nv set service dhcp-relay6 RED interface downstream vlan20
cumulus@leaf01:~$ nv set service dhcp-relay6 RED interface upstream RED server-address 2001:db8:199::2
cumulus@leaf01:~$ nv set service dhcp-relay6 RED interface upstream vlan4024_l3
cumulus@leaf01:~$ nv set vrf RED router bgp address-family ipv6-unicast route-export to-evpn enable on
cumulus@leaf01:~$ nv config apply
Edit the /etc/network/interfaces file to configure VRF RED with IPv6 address 2001:db8:666::1/128:
cumulus@leaf01:mgmt:~$ sudo nano /etc/network/interfaces
...
auto RED
iface RED
address 2001:db8:666::1/128
vrf-table auto
Configure VRF RED to advertise the connected routes so that the loopback IPv6 address is reachable:
Gateway IP Address as Source IP for Relayed DHCP Packets (Advanced)
You can configure the dhcrelay service to forward DHCP packets (IPv4 only) to a DHCP server and ensure that the source IP address of the relayed packet is the same as the gateway IP address.
This option impacts all relayed IPv4 packets globally.
To use the gateway IP address as the source IP address:
cumulus@leaf01:~$ nv set service dhcp-relay default source-ip gateway
cumulus@leaf01:~$ nv config apply
Edit the /etc/default/isc-dhcp-relay-default file to add --giaddr-src to the OPTIONS line.
Cumulus Linux supports multiple DHCP relay daemons on a switch to enable relaying of packets from different bridges to different upstream interfaces.
To configure multiple DHCP relay daemons on a switch:
In the /etc/default directory, create a configuration file for each DHCP relay daemon. Use the naming scheme isc-dhcp-relay-<dhcp-name> for IPv4 or isc-dhcp-relay6-<dhcp-name> for IPv6. This is an example configuration file for IPv4:
# Defaults for isc-dhcp-relay initscript
# sourced by /etc/init.d/isc-dhcp-relay
# installed at /etc/default/isc-dhcp-relay by the maintainer scripts
#
# This is a POSIX shell fragment
#
# What servers should the DHCP relay forward requests to?
SERVERS="102.0.0.2"
# On what interfaces should the DHCP relay (dhrelay) serve DHCP requests?
# Always include the interface towards the DHCP server.
# This variable requires a -i for each interface configured above.
# This will be used in the actual dhcrelay command
# For example, "-i eth0 -i eth1"
INTF_CMD="-i swp2s2 -i swp2s3"
# Additional options that are passed to the DHCP relay daemon?
OPTIONS=""
Run the following command to start a dhcrelay instance, where <dhcp-name> is the instance name or number.
To see how DHCP relay is working on your switch, run the journalctl command:
cumulus@leaf01:~$ sudo journalctl -l -n 20 | grep dhcrelay
Dec 05 20:58:55 leaf01 dhcrelay[6152]: sending upstream swp52
Dec 05 20:58:55 leaf01 dhcrelay[6152]: sending upstream swp51
Dec 05 20:58:55 leaf01 dhcrelay[6152]: Relaying Reply to fe80::4638:39ff:fe00:3 port 546 down.
Dec 05 20:58:55 leaf01 dhcrelay[6152]: Relaying Reply to fe80::4638:39ff:fe00:3 port 546 down.
Dec 05 21:03:55 leaf01 dhcrelay[6152]: Relaying Renew from fe80::4638:39ff:fe00:3 port 546 going up.
Dec 05 21:03:55 leaf01 dhcrelay[6152]: sending upstream swp52
Dec 05 21:03:55 leaf01 dhcrelay[6152]: sending upstream swp51
Dec 05 21:03:55 leaf01 dhcrelay[6152]: Relaying Reply to fe80::4638:39ff:fe00:3 port 546 down.
Dec 05 21:03:55 leaf01 dhcrelay[6152]: Relaying Reply to fe80::4638:39ff:fe00:3 port 546 down.
To specify a time period with the journalctl command, use the --since flag:
cumulus@leaf01:~$ sudo journalctl -l --since "2 minutes ago" | grep dhcrelay
Dec 05 21:08:55 leaf01 dhcrelay[6152]: Relaying Renew from fe80::4638:39ff:fe00:3 port 546 going up.
Dec 05 21:08:55 leaf01 dhcrelay[6152]: sending upstream swp52
Dec 05 21:08:55 leaf01 dhcrelay[6152]: sending upstream swp51
Configuration Errors
If you configure DHCP relays by editing the /etc/default/isc-dhcp-relay-default file manually, you can introduce configuration errors that cause the switch to crash.
For example, if you see an error similar to the following, check that there is no space between the DHCP server address and the interface you use as the uplink.
Core was generated by /usr/sbin/dhcrelay --nl -d -i vx-40 -i vlan10 10.0.0.4 -U 10.0.1.2 %vlan20.
Program terminated with signal SIGSEGV, Segmentation fault.
To resolve the issue, manually edit the /etc/default/isc-dhcp-relay-default file to remove the space, then run the systemctl restart dhcrelay@default.service command to restart the dhcrelay service and apply the configuration change.
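A quick pre-check can catch this mistake before you restart the service. The following Python sketch scans an OPTIONS value for whitespace on either side of the % separator in the -U argument. The function name is hypothetical and the check covers only this one error pattern.

```python
import shlex


def check_gateway_option(options):
    """Return a list of problems found in the -U argument of an OPTIONS value."""
    tokens = shlex.split(options)
    problems = []
    for i, token in enumerate(tokens):
        if token != "-U":
            continue
        if i + 1 >= len(tokens):
            problems.append("-U has no value")
            continue
        value = tokens[i + 1]
        following = tokens[i + 2] if i + 2 < len(tokens) else ""
        # "10.0.1.2 %vlan20" or "10.0.1.2% vlan20" splits into two tokens.
        if value.endswith("%") or following.startswith("%"):
            problems.append("whitespace around '%' in the -U value")
    return problems


print(check_gateway_option("-U 10.0.1.2 %vlan20"))  # reports the stray space
print(check_gateway_option("-U 10.0.1.2%vlan20"))   # no problems
```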
Considerations
The dhcrelay command does not bind to an interface if the interface name is longer than 14 characters. This is a known limitation in dhcrelay.
DHCP packets received on bridge ports and sent to the CPU for processing cause the RX_DROP counter to increment on the interface.
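Because of the interface name limit noted above, it can help to check the INTF_CMD value before you start a relay instance. The following Python sketch lists the interface names in an INTF_CMD string that exceed 14 characters. The helper names and the sample interface names are illustrative.

```python
MAX_IFNAME_LEN = 14  # dhcrelay limit described in the considerations above


def interfaces_from_intf_cmd(intf_cmd):
    """Extract the interface names that follow each -i flag."""
    tokens = intf_cmd.split()
    return [tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == "-i"]


def names_too_long(intf_cmd):
    """Return the interface names that dhcrelay would fail to bind to."""
    return [name for name in interfaces_from_intf_cmd(intf_cmd)
            if len(name) > MAX_IFNAME_LEN]


print(names_too_long("-i swp2s2 -i swp2s3"))
print(names_too_long("-i vlan10 -i uplink-tenant-1"))
```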
DHCP Servers
A DHCP server automatically provides and assigns IP addresses and other network parameters to client devices. It relies on DHCP to respond to broadcast requests from clients.
This section shows you how to configure a DHCP server using the following topology, where the DHCP server is a switch running Cumulus Linux.
To configure the DHCP server on a Cumulus Linux switch:
Create a DHCP pool by providing a pool ID. The ID is an IPv4 or IPv6 prefix.
Provide a name for the pool (optional).
Provide the IP address of the DNS Server you want to use in this pool. You can assign multiple DNS servers.
Provide the domain name you want to use for this pool for name resolution (optional).
Define the range of IP addresses available for assignment.
Provide the default gateway IP address (optional).
In addition, you can configure a static IP address for a resource, such as a server or printer:
Create an ID for the static assignment. This is typically the name of the resource.
Provide the static IP address you want to assign to this resource.
Provide the MAC address of the resource to which you want to assign the IP address.
To configure static IP address assignments, you must first configure a pool.
You can set the DNS server IP address and domain name globally or specify different DNS server IP addresses and domain names for different pools. The following example commands configure a DNS server IP address and domain name for a pool.
cumulus@switch:~$ nv set service dhcp-server default pool 10.1.10.0/24 pool-name storage-servers
cumulus@switch:~$ nv set service dhcp-server default pool 10.1.10.0/24 domain-name example.com
cumulus@switch:~$ nv set service dhcp-server default pool 10.1.10.0/24 domain-name-server 192.168.200.53
cumulus@switch:~$ nv set service dhcp-server default pool 10.1.10.0/24 range 10.1.10.100 to 10.1.10.199
cumulus@switch:~$ nv set service dhcp-server default pool 10.1.10.0/24 gateway 10.1.10.1
cumulus@switch:~$ nv set service dhcp-server default static server1
cumulus@switch:~$ nv set service dhcp-server default static server1 ip-address 10.0.0.2
cumulus@switch:~$ nv set service dhcp-server default static server1 mac-address 44:38:39:00:01:7e
cumulus@switch:~$ nv config apply
To allocate DHCP addresses from the configured pool, you must configure an interface with an IP address from the pool subnet. For example:
cumulus@switch:~$ nv set interface vlan10 ip address 10.1.10.1/24
cumulus@switch:~$ nv config apply
To set the DNS server IP address and domain name globally, use the nv set service dhcp-server <vrf> domain-name-server <address> and nv set service dhcp-server <vrf> domain-name <domain> commands.
cumulus@switch:~$ nv set service dhcp-server6 default pool 2001:db8::/64
cumulus@switch:~$ nv set service dhcp-server6 default pool 2001:db8::/64 pool-name storage-servers
cumulus@switch:~$ nv set service dhcp-server6 default pool 2001:db8::/64 domain-name-server 2001:db8:100::64
cumulus@switch:~$ nv set service dhcp-server6 default pool 2001:db8::/64 domain-name example.com
cumulus@switch:~$ nv set service dhcp-server6 default pool 2001:db8::/64 range 2001:db8::100 to 2001:db8::199
cumulus@switch:~$ nv set service dhcp-server6 default static server1
cumulus@switch:~$ nv set service dhcp-server6 default static server1 ip-address 2001:db8::100
cumulus@switch:~$ nv set service dhcp-server6 default static server1 mac-address 44:38:39:00:01:7e
cumulus@switch:~$ nv config apply
To allocate DHCP addresses from the configured pool, you must configure an interface with an IP address from the pool subnet. For example:
cumulus@switch:~$ nv set interface vlan10 ip address 2001:db8::10/64
cumulus@switch:~$ nv config apply
To set the DNS server IP address and domain name globally, use the nv set service dhcp-server6 <vrf> domain-name-server <address> and nv set service dhcp-server6 <vrf> domain-name <domain> commands.
In a text editor, edit the /etc/dhcp/dhcpd.conf file. Use the following configuration as an example:
To set the DNS server IP address and domain name globally, add the DNS server IP address and domain name before the pool information in the /etc/dhcp/dhcpd.conf file. For example:
To set the DNS server IP address and domain name globally, add the DNS server IP address and domain name before the pool information in the /etc/dhcp/dhcpd6.conf file. For example:
You can set the network address lease time assigned to DHCP clients. You can specify a value between 180 and 31536000 seconds (one year). The default lease time is 3600 seconds.
cumulus@switch:~$ nv set service dhcp-server default pool 10.1.10.0/24 lease-time 200000
cumulus@switch:~$ nv config apply
cumulus@switch:~$ nv set service dhcp-server6 default pool 2001:db8::/64 lease-time 200000
cumulus@switch:~$ nv config apply
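Lease times are expressed in seconds, which is easy to get wrong for longer periods. The following Python sketch converts days, hours, and minutes to seconds and enforces the documented range. The function name is hypothetical.

```python
MIN_LEASE = 180          # seconds
MAX_LEASE = 31536000     # seconds (365 days)
DEFAULT_LEASE = 3600     # seconds


def lease_seconds(days=0, hours=0, minutes=0, seconds=0):
    """Convert a duration to a lease-time value, or return the default if none is given."""
    total = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    if total == 0:
        return DEFAULT_LEASE
    if not MIN_LEASE <= total <= MAX_LEASE:
        raise ValueError("lease time must be between 180 and 31536000 seconds")
    return total


# The lease-time value 200000 used in the example commands is a little over two days.
print(lease_seconds(days=2, hours=7, minutes=33, seconds=20))
```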
Edit the /etc/dhcp/dhcpd.conf file to set the lease time (in seconds):
Configure the DHCP server to ping the address you want to assign to a client before issuing the IP address. If there is no response, DHCP delivers the IP address; otherwise, it attempts the next available address in the range.
cumulus@switch:~$ nv set service dhcp-server default pool 10.1.10.0/24 ping-check on
cumulus@switch:~$ nv config apply
cumulus@switch:~$ nv set service dhcp-server6 default pool 2001:db8::/64 ping-check on
cumulus@switch:~$ nv config apply
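The selection logic behind the ping check can be sketched as follows. This is a simplified Python model of the behavior described above, not the DHCP server implementation; responds_to_ping is a hypothetical callable that you would supply.

```python
import ipaddress


def next_free_address(range_start, range_end, responds_to_ping):
    """Return the first address in the range that does not answer a ping, or None."""
    current = ipaddress.ip_address(range_start)
    end = ipaddress.ip_address(range_end)
    while current <= end:
        if not responds_to_ping(str(current)):
            return str(current)  # no response: safe to offer this address
        current += 1             # address is in use: try the next one
    return None                  # every address in the range responded


# Simulate two addresses that are already in use on the network.
in_use = {"10.1.10.100", "10.1.10.101"}
print(next_free_address("10.1.10.100", "10.1.10.199", lambda ip: ip in in_use))
```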
Edit the /etc/dhcp/dhcpd.conf file to add ping-check true;:
You can assign an IP address and other DHCP options based on physical location or port regardless of MAC address to clients that attach directly to the Cumulus Linux switch through a switch port. This is helpful when swapping out switches and servers; you can avoid the inconvenience of collecting the MAC address and sending it to the network administrator to modify the DHCP server configuration.
Cumulus Linux does not provide NVUE commands for this setting.
Edit the /etc/dhcp/dhcpd.conf file to add the interface and IP address:
To show the current DHCP server settings, run the nv show service dhcp-server command:
cumulus@leaf01:mgmt:~$ nv show service dhcp-server
Summary
--------- ------------------
+ default interface: swp1
default pool: 10.1.10.0/24
default static: server1
The DHCP server determines if a DHCP request is a relay or a non-relay DHCP request. Run the following command to see the DHCP request:
cumulus@server02:~$ sudo tail /var/log/syslog | grep dhcpd
2016-12-05T19:03:35.379633+00:00 server02 dhcpd: Relay-forward message from 2001:db8:101::1 port 547, link address 2001:db8:101::1, peer address fe80::4638:39ff:fe00:3
2016-12-05T19:03:35.380081+00:00 server02 dhcpd: Advertise NA: address 2001:db8::110 to client with duid 00:01:00:01:1f:d8:75:3a:44:38:39:00:00:03 iaid = 956301315 valid for 600 seconds
2016-12-05T19:03:35.380470+00:00 server02 dhcpd: Sending Relay-reply to 2001:db8:101::1 port 547
Considerations
DHCP packets received on bridge ports and sent to the CPU for processing cause the RX_DROP counter to increment on the interface.
DHCP Snooping
DHCP snooping enables Cumulus Linux to act as a middle layer between the DHCP infrastructure and DHCP clients by scanning DHCP control packets and building an IP-MAC database. Cumulus Linux accepts DHCP offers from only trusted interfaces and can rate limit packets.
DHCP option 82 processing is not supported.
Configure DHCP Snooping
To configure DHCP snooping, you need to:
Enable DHCP snooping on a VLAN.
Add a trusted interface. Cumulus Linux allows DHCP offers from only trusted interfaces to prevent malicious DHCP servers from assigning IP addresses inside the network. The interface must be a member of the bridge specified.
Set the rate limit for DHCP requests to avoid DoS attacks. The default value is 100 packets per second.
The following example shows you how to configure DHCP snooping for IPv4 and IPv6.
NVUE does not provide commands to configure DHCP Snooping.
Create the /etc/dhcpsnoop/dhcp_snoop.json file and add DHCP snooping configuration under the bridge.
The following example enables DHCP snooping for IPv4 on VLAN 10, sets the rate limit to 50 and the trusted interface to swp3. swp3 is a member of the bridge br_default:
The following example enables DHCP snooping for IPv6 on VLAN 10, sets the rate limit to 50 and the trusted interface to swp6. swp6 is a member of the bridge br_default:
When DHCP snooping detects a violation, the packet is dropped and a message is logged to the /var/log/dhcpsnoop.log file.
Show the DHCP Binding Table
To show the DHCP binding table, run the net show dhcp-snoop table command for IPv4 or the net show dhcp-snoop6 table command for IPv6. The following example command shows the DHCP binding table for IPv4:
cumulus@leaf01:~$ net show dhcp-snoop table
Port VLAN IP MAC Lease State Bridge
---- ---- --------- ----------------- ----- ----- ------
swp5 1002 10.0.0.3 00:02:00:00:00:04 7200 ACK br0
swp5 1000 10.0.1.3 00:02:00:00:00:04 7200 ACK br0
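If you want to process the binding table in a script, the output can be split into records. The following Python sketch parses text in the layout shown above into a list of dictionaries. It assumes whitespace-separated columns with no embedded spaces; the function name is hypothetical.

```python
SAMPLE = """\
Port VLAN IP MAC Lease State Bridge
---- ---- --------- ----------------- ----- ----- ------
swp5 1002 10.0.0.3 00:02:00:00:00:04 7200 ACK br0
swp5 1000 10.0.1.3 00:02:00:00:00:04 7200 ACK br0
"""


def parse_binding_table(text):
    """Parse binding table output into one dictionary per row."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = [name.lower() for name in lines[0].split()]
    rows = []
    for line in lines[1:]:
        if set(line.strip()) <= {"-", " "}:
            continue  # skip the separator line under the header
        rows.append(dict(zip(header, line.split())))
    return rows


for row in parse_binding_table(SAMPLE):
    print(row["port"], row["vlan"], row["ip"], row["mac"])
```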
Prescriptive Topology Manager - PTM
In data center topologies, correct cabling is time-consuming and error-prone. PTM is a dynamic cabling verification tool that can detect and eliminate errors. PTM takes a network cabling plan specified in Graphviz DOT format in a topology.dot file and couples it with runtime information from LLDP to verify that the cabling matches the specification. The check occurs on every link transition on each node in the network.
You can customize the topology.dot file to control ptmd at both the global/network level and the node/port level.
PTM runs as a daemon, named ptmd.
Supported Features
Topology verification using LLDP. ptmd creates a client connection to the LLDP daemon, lldpd, and retrieves the neighbor relationship between the nodes/ports in the network and compares them against the prescribed topology specified in the topology.dot file.
PTM only supports physical interfaces, such as swp1 or eth0. You cannot specify virtual interfaces, such as bonds or subinterfaces in the topology file.
Client management: ptmd creates an abstract named socket /var/run/ptmd.socket on startup. Other applications can connect to this socket to receive notifications and send commands.
Event notifications: see Scripts below.
User configuration through a topology.dot file; see below.
Configure PTM
ptmd verifies the physical network topology against a DOT-specified network graph file, /etc/ptm.d/topology.dot.
At startup, ptmd connects to lldpd (the LLDP daemon) over a Unix socket and retrieves the neighbor name and port information. It then compares the retrieved port information with the configuration information that it reads from the topology file. If there is a match, the result is PASS; otherwise, the result is FAIL.
PTM performs its LLDP neighbor check using the PortID ifname TLV information.
ptmd Scripts
ptmd executes the scripts at /etc/ptm.d/if-topo-pass and /etc/ptm.d/if-topo-fail for each interface that goes through a change; it runs if-topo-pass when an LLDP or BFD check passes and if-topo-fail when the check fails. The scripts receive an argument string that is the result of the ptmctl command; see ptmd commands below.
You can modify these default scripts.
Configuration Parameters
You can configure ptmd parameters in the topology file. The parameters are host-only, global, per-port/node and templates.
Host-only Parameters
Host-only parameters apply to the entire host on which PTM is running. You can include the hostnametype host-only parameter, which specifies if PTM uses only the hostname (hostname) or the fully qualified domain name (fqdn) while looking for the self-node in the graph file. For example, in the graph file below PTM ignores the FQDN and only looks for switch04 because that is the hostname of the switch on which it is running:
Always wrap the hostname in double quotes (for example, "www.example.com") to prevent ptmd from failing.
To avoid errors when starting the ptmd process, make sure that /etc/hosts and /etc/hostname both reflect the hostname you are using in the topology.dot file.
Global parameters apply to every port in the topology file. There are two global parameters: LLDP and BFD. LLDP is on by default; if no keyword is present, PTM uses the default values for all ports. However, BFD is off if no keyword is present unless a per-port override exists. For example:
Templates provide flexibility in choosing different parameter combinations and applying them to a given port. A template instructs ptmd to reference a named parameter string instead of a default one. There are two parameter strings ptmd supports:
bfdtmpl specifies a custom parameter tuple for BFD.
lldptmpl specifies a custom parameter tuple for LLDP.
match_type, which defaults to the interface name (ifname), but can accept a port description (portdescr) instead if you want lldpd to compare the topology against the port description instead of the interface name. You can set this parameter globally or at the per-port level.
match_hostname, which defaults to the hostname (hostname), but enables PTM to match the topology using the fully qualified domain name (fqdn) supplied by LLDP.
The following is an example of a topology with LLDP at the port level:
When you specify match_hostname=fqdn, ptmd matches the entire FQDN (cumulus-2.domain.com in the example below). If you do not specify anything for match_hostname, ptmd matches based on the hostname only (cumulus-3 below) and ignores the rest of the domain name:
BFD provides low overhead and rapid detection of failures in the paths between two network devices. It provides a unified mechanism for link detection over all media and protocol layers. Use BFD to detect failures for IPv4 and IPv6 single or multihop paths between any two network devices, including unidirectional path failure detection. For information about configuring BFD using PTM, see BFD.
Check Link State
You can enable PTM to perform additional checks to ensure that routing adjacencies form only on links that have connectivity and that conform to the specification that ptmd defines.
You only need to enable PTM to check link state. You do not need to enable PTM to determine BFD status.
cumulus@switch:~$ nv set router ptm enable
cumulus@switch:~$ nv config apply
To disable link state checking, set the no ptm-enable parameter:
cumulus@switch:~$ sudo vtysh
...
switch# configure terminal
switch(config)# no ptm-enable
switch(config)# end
switch# write memory
switch# exit
cumulus@switch:~$
To check PTM status on an interface, run the net show interface <interface> command or the vtysh show interface <interface> command.
cumulus@switch:~$ net show interface swp4
Name MAC Speed MTU Mode
----- ---- ----------------- ----- ---- -------------
ADMDN swp4 48:b0:2d:59:0a:de N/A 1500 NotConfigured
Routing
-------
Interface swp4 is up, line protocol is up
Link ups: 0 last: (never)
Link downs: 0 last: (never)
PTM status: disabled
vrf: default
index 3 metric 0 mtu 1550 speed 4294967295
flags: <UP,BROADCAST,RUNNING,MULTICAST>
Type: Ethernet
HWaddr: c4:54:44:bd:01:41
...
ptmd Service Commands
PTM sends client notifications in CSV format.
To start or restart the ptmd service, run the following command. The topology.dot file must be present for the service to start.
cumulus@switch:~$ sudo systemctl restart ptmd.service
ptmctl Commands
ptmctl is a client of ptmd that retrieves the operational state of the ports configured on the switch and information about BFD sessions from ptmd. ptmctl parses the CSV notifications sent by ptmd. See man ptmctl for more information.
ptmctl Examples
The examples below contain the following keywords in the output of the cbl status column:
cbl status Keyword: Definition
pass: The topology file defines the interface, the interface receives LLDP information, and the LLDP information for the interface matches the information in the topology file.
fail: The topology file defines the interface, the interface receives LLDP information, and the LLDP information for the interface does not match the information in the topology file.
N/A: The topology file defines the interface but the interface does not receive LLDP information. The interface might be down or disconnected, or the neighbor is not sending LLDP packets. The N/A and fail status might indicate a wiring problem to investigate. The N/A status does not show when you use the -l option with ptmctl; the output shows only interfaces that are receiving LLDP information.
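The three keywords follow a simple decision rule, sketched here in Python. This is an illustrative model of the definitions above, not ptmd code; the host:port strings mirror the exp nbr and act nbr columns that ptmctl -d prints.

```python
def cable_status(expected_neighbor, actual_neighbor):
    """Classify a port defined in the topology file.

    expected_neighbor: "host:port" from the topology file.
    actual_neighbor:   "host:port" learned from LLDP, or None if nothing was received.
    """
    if actual_neighbor is None:
        return "N/A"   # defined in the topology file, but no LLDP information
    if actual_neighbor == expected_neighbor:
        return "pass"  # LLDP information matches the topology file
    return "fail"      # LLDP information received, but it does not match


print(cable_status("h1:swp1", "h1:swp1"))
print(cable_status("h2:swp1", "h3:swp1"))
print(cable_status("h2:swp1", None))
```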
For basic output, use ptmctl without any options:
cumulus@switch:~$ sudo ptmctl
-------------------------------------------------------------
port cbl BFD BFD BFD BFD
status status peer local type
-------------------------------------------------------------
swp1 pass pass 11.0.0.2 N/A singlehop
swp2 pass N/A N/A N/A N/A
swp3 pass N/A N/A N/A N/A
For more detailed output, use the -d option:
cumulus@switch:~$ sudo ptmctl -d
--------------------------------------------------------------------------------------
port cbl exp act sysname portID portDescr match last BFD BFD
status nbr nbr on upd Type state
--------------------------------------------------------------------------------------
swp45 pass h1:swp1 h1:swp1 h1 swp1 swp1 IfName 5m: 5s N/A N/A
swp46 fail h2:swp1 h2:swp1 h2 swp1 swp1 IfName 5m: 5s N/A N/A
#continuation of the output
-------------------------------------------------------------------------------------------------
BFD BFD det_mult tx_timeout rx_timeout echo_tx_timeout echo_rx_timeout max_hop_cnt
peer DownDiag
-------------------------------------------------------------------------------------------------
N/A N/A N/A N/A N/A N/A N/A N/A
N/A N/A N/A N/A N/A N/A N/A N/A
To return information on active BFD sessions ptmd is tracking, use the -b option:
cumulus@switch:~$ sudo ptmctl -b
----------------------------------------------------------
port peer state local type diag
----------------------------------------------------------
swp1 11.0.0.2 Up N/A singlehop N/A
N/A 12.12.12.1 Up 12.12.12.4 multihop N/A
To return LLDP information, use the -l option. The output returns only the active neighbors that ptmd is tracking.
cumulus@switch:~$ sudo ptmctl -l
---------------------------------------------
port sysname portID port match last
descr on upd
---------------------------------------------
swp45 h1 swp1 swp1 IfName 5m:59s
swp46 h2 swp1 swp1 IfName 5m:59s
To return detailed information on active BFD sessions ptmd is tracking, use the -b and -d option (results are for an IPv6-connected peer):
cumulus@switch:~$ sudo ptmctl -b -d
----------------------------------------------------------------------------------------
port peer state local type diag det tx_timeout rx_timeout
mult
----------------------------------------------------------------------------------------
swp1 fe80::202:ff:fe00:1 Up N/A singlehop N/A 3 300 900
swp1 3101:abc:bcad::2 Up N/A singlehop N/A 3 300 900
#continuation of output
---------------------------------------------------------------------
echo echo max rx_ctrl tx_ctrl rx_echo tx_echo
tx_timeout rx_timeout hop_cnt
---------------------------------------------------------------------
0 0 N/A 187172 185986 0 0
0 0 N/A 501 533 0 0
ptmctl Error Outputs
If there are errors in the topology file or there is no session, PTM returns appropriate outputs. Typical error strings are:
Topology file error [/etc/ptm.d/topology.dot] [cannot find node cumulus] -
please check /var/log/ptmd.log for more info
Topology file error [/etc/ptm.d/topology.dot] [cannot open file (errno 2)] -
please check /var/log/ptmd.log for more info
No Hostname/MgmtIP found [Check LLDPD daemon status] -
please check /var/log/ptmd.log for more info
No BFD sessions . Check connections
No LLDP ports detected. Check connections
Unsupported command
For example:
cumulus@switch:~$ sudo ptmctl
-------------------------------------------------------------------------
cmd error
-------------------------------------------------------------------------
get-status Topology file error [/etc/ptm.d/topology.dot]
[cannot open file (errno 2)] - please check /var/log/ptmd.log
for more info
If you encounter errors with the topology.dot file, you can use dot (included in the Graphviz package) to validate the syntax of the topology file.
Open the topology file with Graphviz to ensure that it is readable and that the file format is correct.
If you edit the topology.dot file on a Windows system, double-check the file formatting; there might be extra characters that keep the graph from working correctly.
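This guide does not say which extra characters a Windows editor adds. As an assumption, the usual suspects are carriage returns from Windows line endings and a UTF-8 byte order mark at the start of the file. The following Python sketch checks the raw bytes of a file for both; the function name is hypothetical.

```python
def find_stray_characters(data):
    """Return a list of Windows-style artifacts found in the raw bytes of a file."""
    problems = []
    if data.startswith(b"\xef\xbb\xbf"):
        problems.append("UTF-8 byte order mark at start of file")
    if b"\r" in data:
        problems.append("carriage return characters (Windows line endings)")
    return problems


# In practice, read the bytes with open("/etc/ptm.d/topology.dot", "rb").read().
print(find_stray_characters(b"graph G {\r\n}\r\n"))
print(find_stray_characters(b"graph G {\n}\n"))
```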
Basic Topology Example
The following example shows a basic DOT file and its corresponding topology diagram. Use the same topology.dot file on all switches instead of splitting the file for each device; using the exact same file on each device makes automation easier.
When ptmd is in an incorrect failure state and you enable the Zebra interface, PIF BGP sessions do not establish the route but the subinterface does establish routes.
If the subinterface is on the physical interface and PTM marks the physical interface in a PTM FAIL state, FRR does not process routes on the physical interface, but the subinterface is working.
Commas in Port Descriptions
If an LLDP neighbor advertises a PortDescr that contains commas, ptmctl -d splits the string on the commas and misplaces its components in other columns. Do not use commas in your port descriptions.
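To catch this before it affects ptmctl -d output, you can scan interface descriptions for commas. The sketch below assumes the descriptions are set with alias lines in an ifupdown2-style interfaces file. That assumption and the helper name find_comma_aliases are illustrative and are not something this guide specifies.

```shell
# Hypothetical helper: print (with line numbers) every "alias" line in the
# file $1 whose description contains a comma. Exits non-zero if none is found.
find_comma_aliases() {
    grep -nE '^[[:space:]]*alias[[:space:]].*,' "$1"
}

# Example use (commented out; the path is an assumption about where
# descriptions are configured):
# find_comma_aliases /etc/network/interfaces
```

Run the check on the neighbor that advertises the description, because the PortDescr value comes from the remote end of the link.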
Port security is a layer 2 traffic control feature that lets you manage network access from end users. Use port security to:
Limit port access to specific MAC addresses so that the port does not forward ingress traffic from source addresses that are not defined.
Limit port access to only the first learned MAC address on the port (sticky MAC) so that the device with that MAC address has full bandwidth. You can provide a timeout so that the MAC address on that port no longer has access after a specified time.
Limit port access to a specific number of MAC addresses.
You can specify what action to take when there is a port security violation (drop packets or put the port into ADMIN down state) and add a timeout for the action to take effect.
Port security supports layer 2 interfaces in trunk or access mode. It does not support interfaces in a bond.
NVUE commands are not available for port security configuration.
Configure Port Security
To configure port security, add the configuration settings you want to use to the /etc/cumulus/switchd.d/port_security.conf file, then restart switchd to apply the changes.
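A configuration for one port might look like the sketch below, which uses the setting names from the table that follows. The port name swp1, the MAC addresses, and the chosen values are illustrative. The key = value layout is an assumption, and port_security_sample is a hypothetical helper name.

```shell
# Hypothetical helper: print sample port security settings for the port
# named in $1. Values are examples only; the "key = value" layout is assumed.
port_security_sample() {
    port="$1"
    cat <<EOF
interface.${port}.port_security.enable = 1
interface.${port}.port_security.mac_limit = 32
interface.${port}.port_security.static_mac = 00:02:00:00:00:05 00:02:00:00:00:06
interface.${port}.port_security.sticky_mac = 1
interface.${port}.port_security.sticky_timeout = 30
interface.${port}.port_security.sticky_aging = 1
interface.${port}.port_security.violation_mode = 1
EOF
}

# Example use on a switch (commented out because it changes the running system):
# port_security_sample swp1 | sudo tee /etc/cumulus/switchd.d/port_security.conf
# sudo systemctl restart switchd.service
```

Printing the settings to standard output first lets you review them before writing the file and restarting switchd.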
Setting
Description
interface.<port>.port_security.enable
1 enables security on the port. 0 disables security on the port.
interface.<port>.port_security.mac_limit
The maximum number of MAC addresses allowed to access the port. You can specify a number between 0 and 512. The default is 32.
interface.<port>.port_security.static_mac
The specific MAC addresses allowed to access the port. You can specify multiple MAC addresses. Separate each MAC address with a space.
interface.<port>.port_security.sticky_mac
1 enables sticky MAC, where the first learned MAC address on the port is the only MAC address allowed. 0 disables sticky MAC.
interface.<port>.port_security.sticky_timeout
The time period after which the first learned MAC address ages out and no longer has access to the port. The default aging timeout value is 30 minutes. You can specify a value between 0 and 60 minutes.
interface.<port>.port_security.sticky_aging
1 enables sticky MAC aging. 0 disables sticky MAC aging.
interface.<port>.port_security.violation_mode
The violation mode: 0 (shutdown) puts a port into ADMIN down state. 1 (restrict) drops packets.