NVIDIA Cumulus Linux

Cumulus Linux 5.6 User Guide

NVIDIA® Cumulus Linux is the first full-featured Debian Buster-based, Linux operating system for the networking industry.

This user guide provides in-depth documentation on the Cumulus Linux installation process, system configuration and management, network solutions, and monitoring and troubleshooting recommendations. In addition, the quick start guide provides an end-to-end setup process to get you started.

Cumulus Linux 5.6 includes the NVIDIA NetQ agent and CLI. You can use NetQ to monitor and manage your data center network infrastructure and operational health. Refer to the NVIDIA NetQ documentation for details.

For a list of the new features in this release, see What's New. For bug fixes and known issues present in this release, refer to the Cumulus Linux 5.6 Release Notes.

Try It Pre-built Demos

The Cumulus Linux documentation includes pre-built Try It demos for certain Cumulus Linux features. The Try It demos run a simulation in NVIDIA Air; a cloud hosted platform that works exactly like a real world production deployment. Use the Try It demos to examine switch configuration for a feature. For more information, see Try It Pre-built Demos.

Open Source Contributions

To implement various Cumulus Linux features, NVIDIA has forked various software projects, like CFEngine Netdev and some Puppet Labs packages. Some of the forked code resides in the NVIDIA Networking GitHub repository and some is available as part of the Cumulus Linux repository as Debian source packages.

NVIDIA has also developed and released new applications as open source. The list of open source projects is on the Cumulus Linux packages page.

Download the User Guide

Use one of the following methods to download the Cumulus Linux user guide and view it offline:

What's New

This document supports the Cumulus Linux 5.6 release, and lists new platforms, features, and enhancements.

What’s New in Cumulus Linux 5.6.0

Cumulus Linux 5.6.0 supports new platforms, contains several new features and improvements, and provides bug fixes.

Platforms

New Features and Enhancements

New NVUE Commands

For descriptions and examples of all NVUE commands, refer to the NVUE Command Reference for Cumulus Linux.

nv show router pbr nexthop-group
nv show router pbr nexthop-group <nexthop-group-id>
nv show bridge domain <domain-id> port
nv show bridge domain <domain-id> stp counters
nv show bridge domain <domain-id> stp port
nv show bridge domain <domain-id> stp port <interface-id>
nv show bridge domain <domain-id> stp vlan
nv show bridge domain <domain-id> stp vlan <vid>
nv show evpn vni <vni-id> remote-vtep
nv show qos pfc-watchdog
nv show interface <interface-id> ip igmp group
nv show interface <interface-id> ip igmp group <static-group-id>
nv show interface <interface-id> bridge domain <domain-id> stp vlan
nv show interface <interface-id> bridge domain <domain-id> stp vlan <vid>
nv show interface <interface-id> qos pfc-watchdog
nv show interface <interface-id> qos pfc-watchdog status
nv show interface <interface-id> qos pfc-watchdog status <qos-tc-id>
nv show service ptp <instance-id> counters
nv show system api
nv show system api listening-address
nv show system api listening-address <listening-address-id>
nv show system api connections
nv show system global arp
nv show system global arp garbage-collection-threshold
nv show system global nd
nv show system global nd garbage-collection-threshold
nv show system ssh-server
nv show system ssh-server max-unauthenticated
nv show system ssh-server vrf
nv show system ssh-server vrf <vrf-id>
nv show system ssh-server allow-users
nv show system ssh-server allow-users <user-id>
nv show system ssh-server deny-users
nv show system ssh-server deny-users <user-id>
nv show system ssh-server port
nv show system ssh-server port <port-id>
nv show system ssh-server active-sessions
nv set router adaptive-routing link-utilization-threshold (on|off)
nv set router password-obfuscation (enabled|disabled)
nv set bridge domain <domain-id> stp vlan <vid>
nv set bridge domain <domain-id> stp vlan <vid> bridge-priority 4096-61440
nv set bridge domain <domain-id> stp vlan <vid> hello-time 1-10
nv set bridge domain <domain-id> stp vlan <vid> forward-delay 4-30
nv set bridge domain <domain-id> stp vlan <vid> max-age 6-40
nv set bridge domain <domain-id> stp mode (rstp|pvrst)
nv set qos pfc-watchdog polling-interval 100-5000
nv set qos pfc-watchdog robustness 1-1000
nv set interface <interface-id> ip igmp fast-leave
nv set interface <interface-id> ip igmp last-member-query-count
nv set interface <interface-id> router adaptive-routing link-utilization-threshold 1-100
nv set interface <interface-id> bridge domain <domain-id> stp vlan <vid>
nv set interface <interface-id> bridge domain <domain-id> stp vlan <vid> priority 0-240
nv set interface <interface-id> bridge domain <domain-id> stp vlan <vid> path-cost
nv set interface <interface-id> bridge domain <domain-id> stp path-cost 1-200000000 
nv set interface <interface-id> qos pfc-watchdog state (enable|disable)
nv set service ptp <instance-id> profile <profile-id> two-step (on|off)
nv set service ptp <instance-id> two-step (on|off)
nv set system api listening-address <listening-address-id>
nv set system api state (enabled|disabled)
nv set system api port 1-65535
nv set system global arp base-reachable-time (30-2147483|auto)
nv set system global nd base-reachable-time (30-2147483|auto)
nv set system ssh-server max-unauthenticated session-count 1-10000
nv set system ssh-server max-unauthenticated throttle-percent 1-100
nv set system ssh-server max-unauthenticated throttle-start 1-10000
nv set system ssh-server vrf <vrf-id>
nv set system ssh-server allow-users <user-id>
nv set system ssh-server deny-users <user-id>
nv set system ssh-server port <port-id>
nv set system ssh-server authentication-retries 3-100
nv set system ssh-server login-timeout 1-600
nv set system ssh-server inactive-timeout <value>
nv set system ssh-server permit-root-login (disabled|prohibit-password|forced-commands-only|enabled)
nv set system ssh-server max-sessions-per-connection 1-100
nv set system ssh-server state (enabled|disabled)
nv unset router adaptive-routing link-utilization-threshold
nv unset router password-obfuscation
nv unset bridge domain <domain-id> stp vlan
nv unset bridge domain <domain-id> stp vlan <vid>
nv unset bridge domain <domain-id> stp vlan <vid> bridge-priority
nv unset bridge domain <domain-id> stp vlan <vid> hello-time
nv unset bridge domain <domain-id> stp vlan <vid> forward-delay
nv unset bridge domain <domain-id> stp vlan <vid> max-age
nv unset bridge domain <domain-id> stp mode
nv unset qos pfc-watchdog
nv unset qos pfc-watchdog polling-interval
nv unset qos pfc-watchdog robustness
nv unset interface <interface-id> ip igmp fast-leave
nv unset interface <interface-id> ip igmp last-member-query-count
nv unset interface <interface-id> bridge domain <domain-id> stp vlan
nv unset interface <interface-id> bridge domain <domain-id> stp vlan <vid>
nv unset interface <interface-id> bridge domain <domain-id> stp vlan <vid> priority
nv unset interface <interface-id> bridge domain <domain-id> stp vlan <vid> path-cost
nv unset interface <interface-id> bridge domain <domain-id> stp path-cost
nv unset interface <interface-id> qos pfc-watchdog
nv unset interface <interface-id> qos pfc-watchdog state
nv unset service ptp <instance-id> profile <profile-id> two-step
nv unset service ptp <instance-id> two-step
nv unset system api
nv unset system api listening-address
nv unset system api listening-address <listening-address-id>
nv unset system api state
nv unset system api port
nv unset system api port 1-65535
nv unset system global arp base-reachable-time
nv unset system global nd base-reachable-time
nv unset system ssh-server
nv unset system ssh-server max-unauthenticated
nv unset system ssh-server max-unauthenticated session-count
nv unset system ssh-server max-unauthenticated throttle-percent
nv unset system ssh-server max-unauthenticated throttle-start
nv unset system ssh-server vrf
nv unset system ssh-server vrf <vrf-id>
nv unset system ssh-server allow-users
nv unset system ssh-server allow-users <user-id>
nv unset system ssh-server deny-users
nv unset system ssh-server deny-users <user-id>
nv unset system ssh-server port
nv unset system ssh-server port <port-id>
nv unset system ssh-server authentication-retries
nv unset system ssh-server login-timeout
nv unset system ssh-server inactive-timeout
nv unset system ssh-server permit-root-login
nv unset system ssh-server max-sessions-per-connection
nv unset system ssh-server state
nv action clear router policy prefix-list
nv action clear router policy prefix-list <prefix-list-id>
nv action clear router policy prefix-list <prefix-list-id> rule <rule-id> match <match-id>
nv action clear router bgp
nv action clear router bgp in prefix-filter
nv action clear router bgp out
nv action clear router bgp soft
nv action clear router bgp soft in
nv action clear router bgp soft out
nv action clear router igmp interfaces
nv action clear evpn vni
nv action clear evpn vni <vni-id>
nv action clear evpn vni <vni-id> mac <mac-address-id>
nv action clear evpn vni <vni-id> host <ip-address-id>
nv action clear interface <interface-id> qos pfc-watchdog deadlock-count
nv action clear vrf <vrf-id> router bgp in prefix-filter
nv action clear vrf <vrf-id> router bgp out
nv action clear vrf <vrf-id> router bgp soft
nv action clear vrf <vrf-id> router bgp soft in
nv action clear vrf <vrf-id> router bgp soft out
nv action clear vrf <vrf-id> router pim interfaces
nv action clear vrf <vrf-id> router pim interface-traffic

Cumulus Linux 5.6 includes the NVUE object model. After you upgrade to Cumulus Linux 5.6, running NVUE configuration commands might override configuration for features that are now configurable with NVUE and removes configuration you added manually to files or with automation tools like Ansible, Chef, or Puppet. To keep your configuration, you can do one of the following:

Cumulus Linux 3.7, 4.3, and 4.4 continue to support NCLU. For more information, contact your NVIDIA Spectrum platform sales representative.

Quick Start Guide

This quick start guide provides an end-to-end setup process for installing and running Cumulus Linux.

Prerequisites

This guide assumes you have intermediate-level Linux knowledge. You need to be familiar with basic text editing, Unix file permissions, and process monitoring. A variety of text editors are pre-installed, including vi and nano.

You must have access to a Linux or UNIX shell. If you are running Windows, use a Linux environment like Cygwin as your command line tool for interacting with Cumulus Linux.

Get Started

Cumulus Linux is on the switch by default. To upgrade to a different Cumulus Linux release or re-install Cumulus Linux, refer to Installation Management. To show the current Cumulus Linux release on the switch, run the NVUE nv show system command.

When starting Cumulus Linux for the first time, the management port makes a DHCPv4 request. To determine the IP address of the switch, you can cross reference the MAC address of the switch with your DHCP server. The MAC address is typically located on the side of the switch or on the box in which the unit ships.

To get started:

You can choose to configure Cumulus Linux either with NVUE commands or Linux commands (with vtysh or by manually editing configuration files). Do not run both NVUE configuration commands (such as nv set, nv unset, nv action, nv config) and Linux commands to configure the switch. NVUE commands replace the configuration in files such as /etc/network/interfaces and /etc/frr/frr.conf, and remove any configuration you add manually or with automation tools like Ansible, Chef, or Puppet.

If you choose to configure Cumulus Linux with NVUE, you can configure features that do not yet support the NVUE Object Model by creating NVUE Snippets.

Login Credentials

The default installation includes two accounts:

ONIE includes options that allow you to change the default password for the cumulus account automatically when you install a new Cumulus Linux image. Refer to ONIE Installation Options. You can also change the default password using a ZTP script.

In this quick start guide, you use the cumulus account to configure Cumulus Linux.

All accounts except root can use remote SSH login; you can use sudo to grant a non-root account root-level access. Commands that change the system configuration require this elevated level of access.

For more information about sudo, see Using sudo to Delegate Privileges.

Serial Console Management

NVIDIA recommends you perform management and configuration over the network, either in band or out of band. A serial console is fully supported.

Typically, switches ship from the manufacturer with a mating DB9 serial cable. Switches with ONIE are always set to a 115200 baud rate.

Wired Ethernet Management

A Cumulus Linux switch always provides at least one dedicated Ethernet management port called eth0. This interface is specifically for out-of-band management use. The management interface uses DHCPv4 for addressing by default.

To set a static IP address:

cumulus@switch:~$ nv set interface eth0 ip address 192.0.2.42/24
cumulus@switch:~$ nv set interface eth0 ip gateway 192.0.2.1
cumulus@switch:~$ nv config apply

Edit the /etc/network/interfaces file:

cumulus@switch:~$ sudo nano /etc/network/interfaces
# Management interface
auto eth0
iface eth0
    address 192.0.2.42/24
    gateway 192.0.2.1

Configure the Hostname

The hostname identifies the switch; make sure you configure the hostname to be unique and descriptive.

Do not use an underscore (_), apostrophe ('), or non-ASCII characters in the hostname.

To change the hostname:

cumulus@switch:~$ nv set system hostname leaf01
cumulus@switch:~$ nv config apply
  1. Change the hostname with the hostnamectl command; for example:

    cumulus@switch:~$ sudo hostnamectl set-hostname leaf01
    
  2. In the /etc/hosts file, replace the host for IP address 127.0.1.1 with the new hostname:

    cumulus@switch:~$ sudo nano /etc/hosts
    ...
    127.0.1.1       leaf01
    

The command prompt in the terminal does not reflect the new hostname until you either log out of the switch or start a new shell.

Configure the Time Zone

The default time zone on the switch is UTC (Coordinated Universal Time). Change the time zone on your switch to be the time zone for your location.

To update the time zone:

Run the nv set system timezone <timezone> command. To see all the available time zones, run nv set system timezone and press the Tab key. The following example sets the time zone to US/Eastern:

cumulus@switch:~$ nv set system timezone US/Eastern
cumulus@switch:~$ nv config apply
  1. In a terminal, run the following command:

    cumulus@switch:~$ sudo dpkg-reconfigure tzdata
    
  2. Follow the on screen menu options to select the geographic area and region.

Programs that are already running (including log files) and logged in users, do not see time zone changes. To set the time zone for all services and daemons, reboot the switch.

Verify the System Time

Verify that the date and time on the switch are correct with the Linux date command:

cumulus@switch:~$ date
Mon 21 Nov 2022 06:30:37 PM UTC

If the date and time are incorrect, the switch does not synchronize with automation tools, such as Puppet, and returns errors after you restart switchd.

To set the software clock according to the configured time zone, run the Linux sudo date -s command; for example:

cumulus@switch:~$ sudo date -s "Tue Jan 26 00:37:13 2021"

For more information about setting the system time, see Setting the Date and Time.

NTP and PTP

Configure Breakout Ports with Splitter Cables

If you are using 4x10G DAC or AOC cables, or you want to break out (split) switch ports, configure the breakout ports; see Switch Port Attributes.

Test Cable Connectivity

By default, Cumulus Linux disables all data plane ports (every Ethernet port except the management interface, eth0). To test cable connectivity, administratively enable physical ports.

To enable a port administratively:

cumulus@switch:~$ nv set interface swp1
cumulus@switch:~$ nv config apply

To enable all physical ports administratively on a switch that has ports numbered from swp1 to swp52:

cumulus@switch:~$ nv set interface swp1-52
cumulus@switch:~$ nv config apply

To view link status, run the nv show interface command.

To enable a port administratively:

cumulus@switch:~$ sudo ip link set swp1 up

To enable all physical ports administratively, run the following bash script:

cumulus@switch:~$ sudo su -
cumulus@switch:~$ for i in /sys/class/net/*; do iface=`basename $i`; if [[ $iface == swp* ]]; then ip link set $iface up fi done

To view link status, run the ip link show command.

Configure Layer 2 Ports

Cumulus Linux does not put all ports into a bridge by default. To create a bridge and configure one or more front panel ports as members of the bridge:

The following configuration example places the front panel port swp1 into the default bridge called br_default.

cumulus@switch:~$ nv set interface swp1 bridge domain br_default
cumulus@switch:~$ nv config apply

You can add a range of ports in one command. For example, to add swp1 through swp3, swp10, and swp14 through swp20 to the bridge:

cumulus@switch:~$ nv set interface swp1-3,swp6,swp14-20 bridge domain br_default
cumulus@switch:~$ nv config apply

The following configuration example places the front panel port swp1 into the default bridge called br_default:

...
auto br_default
iface br_default
    bridge-ports swp1
...

To put a range of ports into a bridge, use the glob keyword. For example, to add swp1 through swp10, swp12, and swp14 through swp20 to the bridge called br_default:

...
auto br_default
iface br_default
    bridge-ports glob swp1-10 swp12 glob swp14-20
...

To apply the configuration, check for typos:

cumulus@switch:~$ sudo ifquery -a

If there are no errors, run the following command:

cumulus@switch:~$ sudo ifup -a

For more information about Ethernet bridges, see Ethernet Bridging - VLANs.

Configure Layer 3 Ports

You can configure a front panel port or bridge interface as a layer 3 port.

The following configuration example configures the front panel port swp1 as a layer 3 access port:

cumulus@switch:~$ nv set interface swp1 ip address 10.0.0.0/31
cumulus@switch:~$ nv config apply

To add an IP address to a bridge interface, you must put it into a VLAN interface. If you want to use a VLAN other than the native one, set the bridge PVID:

cumulus@switch:~$ nv set interface swp1-2 bridge domain br_default
cumulus@switch:~$ nv set bridge domain br_default vlan 10
cumulus@switch:~$ nv set interface vlan10 ip address 10.1.10.2/24
cumulus@switch:~$ nv set bridge domain br_default untagged 1
cumulus@switch:~$ nv config apply

The following configuration example configures the front panel port swp1 as a layer 3 access port:

auto swp1
iface swp1
  address 10.0.0.0/31

To add an IP address to a bridge interface, include the address under the iface stanza in the /etc/network/interfaces file. If you want to use a VLAN other than the native one, set the bridge PVID:

auto br_default
iface br_default
    address 10.1.10.2/24
    bridge-ports swp1 swp2
    bridge-pvid 1

To apply the configuration, check for typos:

cumulus@switch:~$ sudo ifquery -a

If there are no errors, run the following command:

cumulus@switch:~$ sudo ifup -a

Configure a Loopback Interface

Cumulus Linux has a preconfigured loopback interface. When the switch boots up, the loopback interface, called lo, is up and assigned an IP address of 127.0.0.1.

The loopback interface lo must always exist on the switch and must always be up. To check the status of the loopback interface, run the NVUE nv show interface lo command or the Linux ip addr show lo command.

To add an IP address to a loopback interface, configure the lo interface:

cumulus@switch:~$ nv set interface lo ip address 10.10.10.1/32
cumulus@switch:~$ nv config apply

Add the IP address directly under the iface lo inet loopback definition in the /etc network/interfaces file:

auto lo
iface lo inet loopback
    address 10.10.10.1

If you configure an IP address without a subnet mask, it becomes a /32 IP address. For example, 10.10.10.1 is 10.10.10.1/32.

You can add multiple loopback addresses. For more information, see Interface Configuration and Management.

If you run NVUE commands to configure the switch, run the nv config save command before you reboot. The command saves the applied configuration to the startup configuration so that the changes persist after the reboot.

cumulus@switch:~$ nv config save

Show Platform and System Settings

Next Steps

You are now ready to configure the switch according to your needs. This guide provides separate sections that describe how to configure system, layer 1, layer 2, layer 3, and network virtualization settings. Each section includes example configurations and pre-built demos.

For a deep dive into the NVUE object model that provides a CLI to simplify configuration, see NVUE.

Installation Management

This section describes how to manage, install, and upgrade Cumulus Linux on your switch.

Managing Cumulus Linux Disk Images

The Cumulus Linux operating system resides on a switch as a disk image. This section discusses how to manage the image.

To install a new Cumulus Linux image, refer to Installing a New Cumulus Linux Image. To upgrade Cumulus Linux, refer to Upgrading Cumulus Linux.

Reprovision the System (Restart the Installer)

Reprovisioning the system deletes all system data from the switch.

To stage an ONIE installer from the network (where ONIE automatically locates the installer), run the onie-select -i command. You must reboot the switch to start the install process.

cumulus@switch:~$ sudo onie-select -i
WARNING:
WARNING: Operating System install requested.
WARNING: This will wipe out all system data.
WARNING:
Are you sure (y/N)? y
Enabling install at next reboot...done.
Reboot required to take effect.

To cancel a pending reinstall operation, run the onie-select -c command:

cumulus@switch:~$ sudo onie-select -c
Cancelling pending install at next reboot...done.

To stage an installer located in a specific location, run the onie-install -i <location> command. You can specify a local, absolute or relative path, an HTTP or HTTPS server, SCP or FTP server. You can also stage a Zero Touch Provisioning (ZTP) script along with the installer. You typically use the onie-install command with the -a option to activate installation. If you do not specify the -a option, you must reboot the switch to start the installation process.

The following example stages the installer located at http://203.0.113.10/image-installer together with the ZTP script located at http://203.0.113.10/ztp-script and activates installation and ZTP:

cumulus@switch:~$ sudo onie-install -i http://203.0.113.10/image-installer
cumulus@switch:~$ sudo onie-install -z http://203.0.113.10/ztp-script
cumulus@switch:~$ sudo onie-install -a

You can also specify these options together in the same command. For example:

cumulus@switch:~$ sudo onie-install -i http://203.0.113.10/image-installer -z http://203.0.113.10/ztp-script -a

To see more onie-install options, run man onie-install.

Migrate from Cumulus Linux to ONIE (Uninstall All Images and Remove the Configuration)

To remove all installed images and configurations, and return the switch to its factory defaults, run the onie-select -k command.

The onie-select -k command takes a long time to run as it overwrites the entire NOS section of the flash. Only use this command if you want to erase all NOS data and take the switch out of service.

cumulus@switch:~$ sudo onie-select -k
WARNING:
WARNING: Operating System uninstall requested.
WARNING: This will wipe out all system data.
WARNING:
Are you sure (y/N)? y
Enabling uninstall at next reboot...done.
Reboot required to take effect.

You must reboot the switch to start the uninstallation process.

To cancel a pending uninstall operation, run the onie-select -c command:

cumulus@switch:~$ sudo onie-select -c
Cancelling pending uninstall at next reboot...done.

Boot Into Rescue Mode

If your system becomes unresponsive, you can correct certain issues by booting into ONIE rescue mode, which uses unmounted file systems. You can use various Cumulus Linux utilities to try and resolve a problem.

To reboot the system into ONIE rescue mode, run the onie-select -r command:

cumulus@switch:~$ sudo onie-select -r
WARNING:
WARNING: Rescue boot requested.
WARNING:
Are you sure (y/N)? y
Enabling rescue at next reboot...done.
Reboot required to take effect.

You must reboot the system to boot into rescue mode.

To cancel a pending rescue boot operation, run the onie-select -c command:

cumulus@switch:~$ sudo onie-select -c
Cancelling pending rescue at next reboot...done.

Inspect the Image File

The Cumulus Linux image file is executable. From a running switch, you can display, extract, and verify the contents of the image file.

To display the contents of the Cumulus Linux image file, pass the info option to the image file. For example, to display the contents of an image file called onie-installer located in the /var/lib/cumulus/installer directory:

cumulus@switch:~$ sudo /var/lib/cumulus/installer/onie-installer info
Verifying image checksum ...OK.
Preparing image archive ... OK.
Control File Contents
=====================
Description: Cumulus Linux 4.1.0
Release: 4.1.0
Architecture: amd64
Switch-Architecture: bcm-amd64
Build-Id: dirtyz224615f
Build-Date: 2019-05-17T16:34:22+00:00
Build-User: clbuilder
Homepage: http://www.cumulusnetworks.com/
Min-Disk-Size: 1073741824
Min-Ram-Size: 536870912
mkimage-version: 0.11.111_gbcf0

To extract the contents of the image file, use with the extract <path> option. For example, to extract an image file called onie-installer located in the /var/lib/cumulus/installer directory to the mypath directory:

cumulus@switch:~$ sudo /var/lib/cumulus/installer/onie-installer extract mypath
total 181860
-rw-r--r-- 1 4000 4000       308 May 16 19:04 control
drwxr-xr-x 5 4000 4000      4096 Apr 26 21:28 embedded-installer
-rw-r--r-- 1 4000 4000  13273936 May 16 19:04 initrd
-rw-r--r-- 1 4000 4000   4239088 May 16 19:04 kernel
-rw-r--r-- 1 4000 4000 168701528 May 16 19:04 sysroot.tar

To verify the contents of the image file, use with the verify option. For example, to verify the contents of an image file called onie-installer located in the /var/lib/cumulus/installer directory:

cumulus@switch:~$ sudo /var/lib/cumulus/installer/onie-installer verify
Verifying image checksum ...OK.
Preparing image archive ... OK.
./cumulus-linux-bcm-amd64.bin.1: 161: ./cumulus-linux-bcm-amd64.bin.1: onie-sysinfo: not found
Verifying image compatibility ...OK.
Verifying system ram ...OK.
Open Network Install Environment (ONIE) Home Page

Installing a New Cumulus Linux Image

The default password for the cumulus user account is cumulus. The first time you log into Cumulus Linux, you must change this default password. Be sure to update any automation scripts before installing a new image. Cumulus Linux provides command line options to change the default password automatically during the installation process. Refer to ONIE Installation Options.

You can install a new Cumulus Linux image using ONIE, an open source project (equivalent to PXE on servers) that enables the installation of network operating systems (NOS) on bare metal switches.

Before you install Cumulus Linux, the switch can be in two different states:

The sections below describe some of the different ways you can install the Cumulus Linux image. Steps show how to install directly from ONIE (if no image is on the switch) and from Cumulus Linux (if the image is already on the switch). For additional methods to find and install the Cumulus Linux image, see the ONIE Design Specification.

You can download a Cumulus Linux image from the NVIDIA Enterprise support portal.

Installing the Cumulus Linux image is destructive; configuration files on the switch are not saved; copy them to a different server before installing.

In the following procedures:

Install Using a DHCP/Web Server With DHCP Options

To install Cumulus Linux using a DHCP or web server with DHCP options, set up a DHCP/web server on your laptop and connect the eth0 management port of the switch to your laptop. After you connect the cable, the installation proceeds as follows:

  1. The switch boots up and requests an IP address (DHCP request).

  2. The DHCP server acknowledges and responds with DHCP option 114 and the location of the installation image.

  3. ONIE downloads the Cumulus Linux image, installs, and reboots.

    You are now running Cumulus Linux.

The most common way is to send DHCP option 114 with the entire URL to the web server (this can be the same system). However, there are other ways you can use DHCP even if you do not have full control over DHCP. See the ONIE user guide for information on partial installer URLs and advanced DHCP options; both articles list more supported DHCP options.

Here is an example DHCP configuration with an ISC DHCP server:

subnet 172.0.24.0 netmask 255.255.255.0 {
  range 172.0.24.20 172.0.24.200;
  option default-url = "http://172.0.24.14/onie-installer-x86_64";
}

Here is an example DHCP configuration with dnsmasq (static address assignment):

dhcp-host=sw4,192.168.100.14,6c:64:1a:00:03:ba,set:sw4
dhcp-option=tag:sw4,114,"http://roz.rtplab.test/onie-installer-x86_64"

If you do not have a web server, you can use this free Apache example.

Install Using a DHCP/Web Server without DHCP Options

Follow the steps below if you can log into the switch on a serial console (ONIE), or log in on the console or with ssh (Install from Cumulus Linux).

  1. Place the Cumulus Linux image in a directory on the web server.

  2. Run the onie-nos-install command:

    ONIE:/ #onie-nos-install http://10.0.1.251/path/to/cumulus-install-x86_64.bin
    
  1. Place the Cumulus Linux image in a directory on the web server.

  2. From the Cumulus Linux command prompt, run the onie-install command, then reboot the switch.

    cumulus@switch:~$ sudo onie-install -a -i http://10.0.1.251/path/to/cumulus-install-x86_64.bin
    

Install Using a Web Server With no DHCP

Follow the steps below if you can log into the switch on a serial console (ONIE), or you can log in on the console or with ssh (Install from Cumulus Linux) but no DHCP server is available.

You need a console connection to access the switch; you cannot perform this procedure remotely.

  1. ONIE is in discovery mode. You must disable discovery mode with the following command:

    onie# onie-discovery-stop
    

    On older ONIE versions, if the onie-discovery-stop command is not supported, run:

    onie# /etc/init.d/discover.sh stop
    
  2. Assign a static address to eth0 with the ip addr add command:

    ONIE:/ #ip addr add 10.0.1.252/24 dev eth0
    
  3. Place the Cumulus Linux image in a directory on your web server.

  4. Run the installer manually (because there are no DHCP options):

    ONIE:/ #onie-nos-install http://10.0.1.251/path/to/cumulus-install-x86_64.bin
    
  1. Place the Cumulus Linux image in a directory on your web server.

  2. From the Cumulus Linux command prompt, run the onie-install command, then reboot the switch.

    cumulus@switch:~$ sudo onie-install -a -i http://10.0.1.251/path/to/cumulus-install-x86_64.bin
    

Install Using FTP Without a Web Server

Follow the steps below if your laptop is on the same network as the switch eth0 interface but no DHCP server is available.

  1. Set up DHCP or static addressing for eth0. The following example assigns a static address to eth0:

    ONIE:/ #ip addr add 10.0.1.252/24 dev eth0
    
  2. If you are using static addressing, disable ONIE discovery mode:

    onie# onie-discovery-stop
    

    On older ONIE versions, if the onie-discovery-stop command is not supported, run:

    onie# /etc/init.d/discover.sh stop
    
  3. Place the Cumulus Linux image into a TFTP or FTP directory.

  4. If you are not using DHCP options, run one of the following commands (tftp for TFTP or ftp for FTP):

    ONIE# onie-nos-install ftp://local-ftp-server/cumulus-install-x86_64.bin
    
    ONIE# onie-nos-install tftp://local-tftp-server/cumulus-install-[PLATFORM].bin
    
  1. Place the Cumulus Linux image into an FTP directory (TFTP is not supported in Cumulus Linux).

  2. From the Cumulus Linux command prompt, run the following command, then reboot the switch.

    cumulus@switch:~$ sudo onie-install -a -i ftp://local-ftp-server/cumulus-install-x86_64.bin
    

Install Using a Local File

Follow the steps below to install the Cumulus Linux image referencing a local file.

  1. Set up DHCP or static addressing for eth0. The following example assigns a static address to eth0:

    ONIE:/ #ip addr add 10.0.1.252/24 dev eth0
    
  2. If you are using static addressing, disable ONIE discovery mode.

    onie# onie-discovery-stop
    

    On older ONIE versions, if the onie-discovery-stop command is not supported, run:

    onie# /etc/init.d/discover.sh stop
    
  3. Use scp to copy the Cumulus Linux image to the switch.

  4. Run the installer manually from ONIE:

    ONIE:/ #onie-nos-install /path/to/local/file/cumulus-install-x86_64.bin
    
  1. Copy the Cumulus Linux image to the switch.

  2. From the Cumulus Linux command prompt, run the onie-install command, then reboot the switch.

    cumulus@switch:~$ sudo onie-install -a -i /path/to/local/file/cumulus-install-x86_64.bin
    

Install Using a USB Drive

Follow the steps below to install the Cumulus Linux image using a USB drive.

Installing Cumulus Linux using a USB drive is fine for a single switch here and there but is not scalable. DHCP can scale to hundreds of switch installs with zero manual input unlike USB installs.

Prepare for USB Installation

  1. From the NVIDIA Enterprise support portal, download the appropriate Cumulus Linux image for your platform.

  2. From a computer, prepare your USB drive by formatting it using one of the supported formats: FAT32, vFAT or EXT2.

    Optional: Prepare a USB Drive inside Cumulus Linux

    a. Insert your USB drive into the USB port on the switch running Cumulus Linux and log in to the switch. Examine output from cat /proc/partitions and sudo fdisk -l [device] to determine the location of your USB drive. For example, sudo fdisk -l /dev/sdb.

    These instructions assume your USB drive is the /dev/sdb device, which is typical if you insert the USB drive after the machine is already booted. However, if you insert the USB drive during the boot process, it is possible that your USB drive is the /dev/sda device. Make sure to modify the commands below to use the proper device for your USB drive.

    b. Create a new partition table on the USB drive. If the parted utility is not on the system, install it with sudo -E apt-get install parted.

    sudo parted /dev/sdb mklabel msdos
    

    c. Create a new partition on the USB drive:

    sudo parted /dev/sdb -a optimal mkpart primary 0% 100%
    

    d. Format the partition to your filesystem of choice using one of the examples below:

    sudo mkfs.ext2 /dev/sdb1
    sudo mkfs.msdos -F 32 /dev/sdb1
    sudo mkfs.vfat /dev/sdb1
    

    To use mkfs.msdos or mkfs.vfat, you need to install the dosfstools package from the Debian software repositories, as they are not included by default.

    e. To continue installing Cumulus Linux, mount the USB drive to move files:

    sudo mkdir /mnt/usb
    sudo mount /dev/sdb1 /mnt/usb
    
  3. Copy the Cumulus Linux image to the USB drive, then rename the image file to onie-installer-x86_64.

    You can also use any of the ONIE naming schemes mentioned here.

    When using a MAC or Windows computer to rename the installation file, the file extension can still be present. Make sure you remove the file extension so that ONIE can detect the file.

  4. Insert the USB drive into the switch, then prepare the switch for installation:

    • If the switch is offline, connect to the console and power on the switch.
    • If the switch is already online in ONIE, use the reboot command.

    SSH sessions to the switch get dropped after this step. To complete the remaining instructions, connect to the console of the switch. Cumulus Linux switches display their boot process to the console; you need to monitor the console specifically to complete the next step.

  5. Monitor the console and select the ONIE option from the first GRUB screen shown below.

  6. Cumulus Linux on x86 uses GRUB chainloading to present a second GRUB menu specific to the ONIE partition. No action is necessary in this menu to select the default option ONIE: Install OS.

  7. The switch recognizes the USB drive and mounts it automatically. Cumulus Linux installation begins.

  8. After installation completes, the switch automatically reboots into the newly installed instance of Cumulus Linux.

ONIE Installation Options

You can run several installer command line options from ONIE to perform basic switch configuration automatically after installation completes and Cumulus Linux boots for the first time. These options enable you to:

The onie-nos-install command does not allow you to specify command line parameters. You must access the switch from the console and transfer a disk image to the switch. You must then make the disk image executable and install the image directly from the ONIE command line with the options you want to use.

The following example commands transfer a disk image to the switch, make the image executable, and install the image with the --password option to change the default cumulus user password:

ONIE:/ # wget http://myserver.datacenter.com/cumulus-linux-4.4.0-mlx-amd64.bin
ONIE:/ # chmod 755 cumulus-linux-4.4.0-mlx-amd64.bin
ONIE:/ # ./cumulus-linux-4.4.0-mlx-amd64.bin --password 'MyP4$$word'

You can run more than one option in the same command.

Set the cumulus User Password

The default cumulus user account password is cumulus. When you log into Cumulus Linux for the first time, you must provide a new password for the cumulus account, then log back into the system.

To automate this process, you can specify a new password from the command line of the installer with the --password '<clear text-password>' option. For example, to change the default cumulus user password to MyP4$$word:

ONIE:/ # ./cumulus-linux-4.4.0-mlx-amd64.bin --password 'MyP4$$word'

To provide a hashed password instead of a clear text password, use the --hashed-password '<hash>' option. An encrypted hash maintains a secure management network.

  1. Generate a sha-512 password hash with the following openssl command. The example command generates a sha-512 password hash for the password MyP4$$word.

    user@host:~$ openssl passwd -6 'MyP4$$word'
    6$LXOrvmOkqidBGqu7$dy0dpYYllekNKOY/9LLrobWA4iGwL4zHsgG97qFQWAMZ3ZzMeyz11JcqtgwKDEgYR6RtjfDtdPCeuj8eNzLnS.
    
  2. Specify the new password from the command line of the installer with the --hashed-password '<hash>' command:

    ONIE:/ # ./cumulus-linux-4.4.0-mlx-amd64.bin  --hashed-password '6$LXOrvmOkqidBGqu7$dy0dpYYllekNKOY/9LLrobWA4iGwL4zHsgG97qFQWAMZ3ZzMeyz11JcqtgwKDEgYR6RtjfDtdPCeuj8eNzLnS.'
    

If you specify both the --password and --hashed-password options, the --hashed-password option takes precedence and the switch ignores the --password option.

Provide Initial Network Configuration

To provide initial network configuration automatically when Cumulus Linux boots for the first time after installation, use the --interfaces-file <filename> option. For example, to copy the contents of a file called network.intf into the /etc/network/interfaces file and run the ifreload -a command:

ONIE:/ # ./cumulus-linux-4.4.0-mlx-amd64.bin  --interfaces-file network.intf

Execute a ZTP Script

To run a ZTP script that contains commands to execute after Cumulus Linux boots for the first time after installation, use the --ztp <filename> option. For example, to run a ZTP script called initial-conf.ztp:

ONIE:/ # ./cumulus-linux-4.4.0-mlx-amd64.bin --ztp initial-conf.ztp

The ZTP script must contain the CUMULUS-AUTOPROVISIONING string near the beginning of the file and must reside on the ONIE filesystem. Refer to Zero Touch Provisioning - ZTP.

If you use the --ztp option together with any of the other command line options, the ZTP script takes precedence and the switch ignores other command line options.

Change the Default BIOS Password

To provide a layer of security and to prevent unauthorized access to the switch, NVIDIA recommends you change the default BIOS password. The default BIOS password is admin.

To change the default BIOS password:

  1. During system boot, press Ctrl+B through the serial console while the BIOS version prints.

  2. From the Security menu, select Administrator Password.

  1. Follow the prompts.

Edit the Cumulus Linux Image (Advanced)

The Cumulus Linux disk image file contains a BASH script that includes a set of variables. You can set these variables to be able to install a fully configured system with a single image file.

To edit the image

Example Image File

The Cumulus Linux disk image file is a self-extracting executable. The executable part of the file is a BASH script at the beginning of the file. Towards the beginning of this BASH script are a set of variables with empty strings:

...
CL_INSTALLER_PASSWORD=''
CL_INSTALLER_HASHED_PASSWORD=''
CL_INSTALLER_LICENSE=''
CL_INSTALLER_INTERFACES_FILENAME=''
CL_INSTALLER_INTERFACES_CONTENT=''
CL_INSTALLER_ZTP_FILENAME=''
CL_INSTALLER_QUIET=""
CL_INSTALLER_FORCEINST=""
CL_INSTALLER_INTERACTIVE=""
CL_INSTALLER_EXTRACTDIR=""
CL_INSTALLER_PAYLOAD_SHA256="72a8c3da28cda7a610e272b67fa1b3a54c50248bf6abf720f73ff3d10e79ae76"

You can set these variables:

VariableDescription
CL_INSTALLER_PASSWORDDefines the clear text password.
This variable is equivalent to the ONIE installer command line option --password.
CL_INSTALLER_HASHED_PASSWORDDefines the hashed password.
This variable is equivalent to the ONIE installer command line option --hashed-password.
If you set both the CL_INSTALLER_PASSWORD and CL_INSTALLER_HASHED_PASSWORD variable, the CL_INSTALLER_HASHED_PASSWORD takes precedence.
CL_INSTALLER_INTERFACES_FILENAMEDefines the name of the file on the ONIE filesystem you want to use as the /etc/network/interfaces file.
This variable is equivalent to the ONIE installer command line option --interfaces-file.
CL_INSTALLER_INTERFACES_CONTENTDescribes the network interfaces available on your system and how to activate them. Setting this variable defines the contents of the /etc/network/interfaces file.
There is no equivalent ONIE installer command line option.
If you set both the CL_INSTALLER_INTERFACES_FILENAME and CL_INSTALLER_INTERFACES_CONTENT variables, the CL_INSTALLER_INTERFACES_FILENAME takes precedence.
CL_INSTALLER_ZTP_FILENAMEDefines the name of the ZTP file on the ONIE filesystem you want to execute at first boot after installation.
This variable is equivalent to the ONIE installer command line option --ztp

Edit the Image File

Because the Cumulus Linux image file is a binary file, you cannot use standard text editors to edit the file directly. Instead, you must split the file into two parts, edit the first part, then put the two parts back together.

  1. Copy the first 20 lines to an empty file:
head -20 cumulus-linux-4.4.0-mlx-amd64.bin > cumulus-linux-4.4.0-mlx-amd64.bin.1
  1. Remove the first 20 lines of the image, then copy the remaining lines into another empty file:
sed -e '1,20d' cumulus-linux-4.4.0-mlx-amd64.bin > cumulus-linux-4.4.0-mlx-amd64.bin.2

The original file is now split, with the first 20 lines in cumulus-linux-4.4.0-mlx-amd64.bin.1 and the remaining lines in cumulus-linux-4.4.0-mlx-amd64.bin.2.

  1. Use a text editor to change the variables in cumulus-linux-4.4.0-mlx-amd64.bin.1.

  2. Put the two pieces back together using cat:

cat cumulus-linux-4.4.0-mlx-amd64.bin.1 cumulus-linux-4.4.0-mlx-amd64.bin.2 > cumulus-linux-4.4.0-mlx-amd64.bin.final
  1. Calculate the new checksum and update the CL_INSTALLER_PAYLOAD_SHA256 variable.
    sed -e '1,/^exit_marker$/d' "cumulus-linux-4.4.0-mlx-amd64.bin.final" | sha256sum | awk '{ print $1 }'

This following example shows a modified image file:

...
CL_INSTALLER_PAYLOAD_SHA256='d14a028c2a3a2bc9476102bb288234c415a2b01f828ea62ac332e42f'
CL_INSTALLER_PASSWORD='MyP4$$word'
CL_INSTALLER_HASHED_PASSWORD=''
CL_INSTALLER_LICENSE='customer@datacenter.com|4C3YMCACDiK0D/EnrxlXpj71FBBNAg4Yrq+brza4ZtJFCInvalid'
CL_INSTALLER_INTERFACES_FILENAME=''
CL_INSTALLER_INTERFACES_CONTENT='# This file describes the network interfaces available on your system and how to activate them.

source /etc/network/interfaces.d/*.intf

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet dhcp
	vrf mgmt

auto bridge
iface bridge
    bridge-ports swp1 swp2
    bridge-pvid 1
    bridge-vids 10 11
    bridge-vlan-aware yes

auto mgmt
iface mgmt
	address 127.0.0.1/8
	address ::1/128
	vrf-table auto
'
CL_INSTALLER_ZTP_FILENAME=''
...

You can install this edited image file in the usual way, by using the ONIE install waterfall or the onie-nos-install command.

If you install the modified installation image and specify installer command line parameters, the command line parameters take precedence over the variables modified in the image.

Secure Boot

Secure Boot validates each binary image loaded during system boot with key signatures that correspond to a stored trusted key in firmware.

Secure Boot is only on the NVIDIA SN3700C-S switch and switches with the Spectrum-4 ASIC.

Secure Boot settings are in the BIOS Security menu. To access BIOS, press Ctrl+B through the serial console during system boot while the BIOS version prints:


To access the BIOS menu, use admin which is the default BIOS password:


NVIDIA recommends changing the default BIOS password; navigate to Security and select Administrator Password.

To validate or change the Secure Boot mode, navigate to Security and select Secure Boot:


In the Secure Boot menu, you can enable and disable Secure Boot mode. To install an unsigned version of Cumulus Linux or access ONIE without a prompt for a username and password, set Secure Boot to disabled:


To access ONIE when Secure Boot is enabled, authentication is necessary. The default username and password are both root:

​ONIE: Rescue Mode ...
Platform  : x86_64-mlnx_x86-r0
Version   : 2021.02-5.3.0006-rc3-115200
Build Date: 2021-05-20T14:27+03:00
Info: Mounting kernel filesystems... done.

Info: Mounting ONIE-BOOT on /mnt/onie-boot ...
[   17.011057] ext4 filesystem being mounted at /mnt/onie-boot supports timestamps until 2038 (0x7fffffff)
Info: Mounting EFI System on /boot/efi ...
Info: BIOS mode: UEFI
Info: Using eth0 MAC address: b8:ce:f6:3c:62:06
Info: eth0:  Checking link... up.
Info: Trying DHCPv4 on interface: eth0
ONIE: Using DHCPv4 addr: eth0: 10.20.84.226 / 255.255.255.0
Starting: klogd... done.
Starting: dropbear ssh daemon... done.
Starting: telnetd... done.
discover: Rescue mode detected.  Installer disabled.

Please press Enter to activate this console. To check the install status inspect /var/log/onie.log.
Try this:  tail -f /var/log/onie.log

** Rescue Mode Enabled **
login: root
Password: root
ONIE:~ #

To validate the Secure Boot status of a system from Cumulus Linux, run the mokutil --sb-state command.

cumulus@leaf01:mgmt:~$ mokutil --sb-state
SecureBoot enabled

On a switch with the Spectrum-4 ASIC, if the ASIC firmware fails to boot, you see a message alerting you to contact NVIDIA Customer Support for further options.

Upgrading Cumulus Linux

The default password for the cumulus user account is cumulus. The first time you log into Cumulus Linux, you must change this default password. Be sure to update any automation scripts before you upgrade. You can use ONIE command line options to change the default password automatically during the Cumulus Linux image installation process. Refer to ONIE Installation Options.

This topic describes how to upgrade Cumulus Linux on your switch.

Consider deploying, provisioning, configuring, and upgrading switches using automation, even with small networks or test labs. During the upgrade process, you can upgrade dozens of devices in a repeatable manner. Using tools like Ansible, Chef, or Puppet for configuration management greatly increases the speed and accuracy of the next major upgrade; these tools also enable you to quickly swap failed switch hardware.

Before You Upgrade

Be sure to read the knowledge base article Upgrades: Network Device and Linux Host Worldview Comparison, which provides a detailed comparison between the network device and Linux host worldview of upgrade and installation.

Back up Configuration Files

Understanding the location of configuration data is important for successful upgrades, migrations, and backup. As with other Linux distributions, the /etc directory is the primary location for all configuration data in Cumulus Linux. The following list contains the files you need to back up and migrate to a new release. Make sure you examine any changed files. Make the following files and directories part of a backup strategy.

File Name and LocationDescriptionCumulus Linux DocumentationDebian Documentation
/etc/frr/Routing application (responsible for BGP and OSPF)FRRoutingN/A
/etc/hostnameConfiguration file for the hostname of the switchQuick Start Guidehttps://wiki.debian.org/HowTo/ChangeHostname
/etc/network/Network configuration files, most notably /etc/network/interfaces and /etc/network/interfaces.d/Switch Port AttributesN/A
/etc/resolv.confDNS resolutionNot unique to Cumulus Linux: wiki.debian.org/NetworkConfigurationhttps://www.debian.org/doc/manuals/debian-reference/ch05.en.html
/etc/hostsConfiguration file for the hostname of the switchQuick Start Guidehttps://wiki.debian.org/HowTo/ChangeHostname
/etc/cumulus/acl/*Netfilter configurationNetfilter - ACLsN/A
/etc/cumulus/control-plane/policers.confConfiguration for control plane policersNetfilter - ACLsN/A
/etc/cumulus/datapath/qos/qos_features.confQoS configuration

Note: In Cumulus Linux 5.0 and later, default ECN configuration parameters start with default_ecn_red_conf instead of default_ecn_conf.
Quality of ServiceN/A
/etc/mlx/datapath/qos/qos_infra.confQoS configurationQuality of ServiceN/A
/etc/mlx/datapath/tcam_profile.confConfiguration for the forwarding table profilesSupported Route Table EntriesN/A
/etc/cumulus/datapath/traffic.confConfiguration for the forwarding table profilesSupported Route Table EntriesN/A
/etc/cumulus/ports.confBreakout cable configuration fileSwitch Port AttributesN/A; read the guide on breakout cables
/etc/cumulus/switchd.confswitchd configurationConfiguring switchdN/A; read the guide on switchd configuration
File Name and LocationDescriptionCumulus Linux DocumentationDebian Documentation
/etc/motdMessage of the dayNot unique to Cumulus Linuxwiki.debian.org/motd
/etc/passwdUser account informationNot unique to Cumulus Linuxhttps://www.debian.org/doc/manuals/debian-reference/ch04.en.html
/etc/shadowSecure user account informationNot unique to Cumulus Linuxhttps://www.debian.org/doc/manuals/debian-reference/ch04.en.html
/etc/groupDefines user groups on the switchNot unique to Cumulus Linuxhttps://www.debian.org/doc/manuals/debian-reference/ch04.en.html
/etc/init/lldpd.confLink Layer Discover Protocol (LLDP) daemon configurationLink Layer Discovery Protocolhttps://packages.debian.org/buster/lldpd
/etc/lldpd.d/Configuration directory for lldpdLink Layer Discovery Protocolhttps://packages.debian.org/buster/lldpd
/etc/nsswitch.confName Service Switch (NSS) configuration fileTACACSN/A
/etc/ssh/SSH configuration filesSSH for Remote Accesshttps://wiki.debian.org/SSH
/etc/sudoers, /etc/sudoers.dBest practice is to place changes in /etc/sudoers.d/ instead of /etc/sudoers; changes in the /etc/sudoers.d/ directory are not lost during upgradeUsing sudo to Delegate Privileges

  • If you are using the root user account, consider including /root/.
  • If you have custom user accounts, consider including /home/<username>/.
  • Run the net show configuration files | grep -B 1 "===" command and back up the files listed in the command output.

File Name and LocationDescription
/etc/mlx/Per-platform hardware configuration directory, created on first boot. Do not copy.
/etc/default/clagdCreated and managed by ifupdown2. Do not copy.
/etc/default/grubGrub init table. Do not modify manually.
/etc/default/hwclockPlatform hardware-specific file. Created during first boot. Do not copy.
/etc/initPlatform initialization files. Do not copy.
/etc/init.d/Platform initialization files. Do not copy.
/etc/fstabStatic information on filesystem. Do not copy.
/etc/image-releaseSystem version data. Do not copy.
/etc/os-releaseSystem version data. Do not copy.
/etc/lsb-releaseSystem version data. Do not copy.
/etc/lvm/archiveFilesystem files. Do not copy.
/etc/lvm/backupFilesystem files. Do not copy.
/etc/modulesCreated during first boot. Do not copy.
/etc/modules-load.d/Created during first boot. Do not copy.
/etc/sensors.dPlatform-specific sensor data. Created during first boot. Do not copy.
/root/.ansibleAnsible tmp files. Do not copy.
/home/cumulus/.ansibleAnsible tmp files. Do not copy.

The following commands verify which files have changed compared to the previous Cumulus Linux install. Be sure to back up any changed files.

Back Up and Restore Configuration with NVUE

You can back up and restore the configuration file with NVUE only if you used NVUE commands to configure the switch you want to upgrade.

To back up and restore the configuration file:

  1. Save the configuration to the /etc/nvue.d/startup.yaml file with the nv config save command:

    cumulus@switch:~$ nv config save
    saved
    
  2. Copy the /etc/nvue.d/startup.yaml file off the switch to a different location.

  3. After upgrade is complete, restore the configuration. Copy the /etc/nvue.d/startup.yaml file to the switch, then run the nv config apply startup command:

    cumulus@switch:~$ nv config apply startup
    applied
    

For information about the NVUE object model and commands, see NVIDIA User Experience - NVUE.

As NVUE supports more features and introduces new syntax, snippets and flexible snippets become invalid.

Before you upgrade Cumulus Linux to a new release, make sure to:

  • Review the What's New for new NVUE syntax.
  • If NVUE introduces new syntax for the feature that a snippet configures, you must remove the snippet before upgrading.

Create a cl-support File

Before and after you upgrade the switch, run the cl-support script to create a cl-support archive file. The file is a compressed archive of useful information for troubleshooting. If you experience any issues during upgrade, you can send this archive file to the Cumulus Linux support team to investigate.

  1. Create the cl-support archive file with the cl-support command:
cumulus@switch:~$ sudo cl-support
  1. Copy the cl-support file off the switch to a different location.

  2. After upgrade is complete, run the cl-support command again to create a new archive file:

cumulus@switch:~$ sudo cl-support

Upgrade Cumulus Linux

You can upgrade Cumulus Linux in one of two ways:

Cumulus Linux also provides ISSU to upgrade an active switch with minimal disruption to the network. See In-Service-System-Upgrade-ISSU.

  • To upgrade to Cumulus Linux 5.6.0 from Cumulus Linux 4.x or 3.x, you must install a disk image of the new release using ONIE. You cannot upgrade packages with the apt-get upgrade command.
  • Upgrading an MLAG pair requires additional steps. If you are using MLAG to dual connect two Cumulus Linux switches in your environment, follow the steps in Upgrade Switches in an MLAG Pair below to ensure a smooth upgrade.

Install a Cumulus Linux Image or Upgrade Packages?

The decision to upgrade Cumulus Linux by either installing a Cumulus Linux image or upgrading packages depends on your environment and your preferences. Here are some recommendations for each upgrade method.

Install a Cumulus Linux image if you are performing a rolling upgrade in a production environment and if you are using up-to-date and comprehensive automation scripts. This upgrade method enables you to choose the exact release to which you want to upgrade and is the only method available to upgrade your switch to a new release train (for example, from 4.4.3 to 5.6.0).

Be aware of the following when installing the Cumulus Linux image:

Run package upgrade if you are upgrading from Cumulus Linux 5.0.0 to a later 5.x release, or if you use third-party applications (package upgrade does not replace or remove third-party applications, unlike the Cumulus Linux image install).

Be aware of the following when upgrading packages:

Cumulus Linux Image Install (ONIE)

ONIE is an open source project (equivalent to PXE on servers) that enables the installation of network operating systems (NOS) on a bare metal switch.

To upgrade the switch:

  1. Back up the configurations off the switch.

  2. Download the Cumulus Linux image.

  3. Install the Cumulus Linux image with the onie-install -a -i <image-location> command, which boots the switch into ONIE. The following example command installs the image from a web server, then reboots the switch. There are additional ways to install the Cumulus Linux image, such as using FTP, a local file, or a USB drive. For more information, see Installing a New Cumulus Linux Image.

    cumulus@switch:~$ sudo onie-install -a -i http://10.0.1.251/cumulus-linux-5.6.0-mlx-amd64.bin && sudo reboot
    
  4. Restore the configuration files to the new release (NVIDIA does not recommend restoring files with automation).

  5. Verify correct operation with the old configurations on the new release.

  6. Reinstall third party applications and associated configurations.

Package Upgrade

  • NVUE deprecated the port split command options (2x10G, 2x25G, 2x40G, 2x50G, 2x100G, 2x200G, 4x10G, 4x25G, 4x50G, 4x100G, 8x50G) available in Cumulus Linux 5.3 and earlier. If you use NVUE to configure port breakout speeds in Cumulus 5.3 or earlier, NVUE automatically updates the configuration during upgrade to Cumulus Linux 5.5 and later to use the new format (2x, 4x, 8x).
  • Cumulus Linux continues to support the old port split format in the /etc/cumulus/ports.conf file; however NVIDIA recommends that you use the new format.

Cumulus Linux completely embraces the Linux and Debian upgrade workflow, where you use an installer to install a base image, then perform any upgrades within that release train with sudo -E apt-get update and sudo -E apt-get upgrade commands. Any packages that have changed after the base install get upgraded in place from the repository. All switch configuration files remain untouched, or in rare cases merged (using the Debian merge function) during the package upgrade.

When you use package upgrade to upgrade your switch, configuration data stays in place during the upgrade. If the new release updates a previously changed configuration file, the upgrade process prompts you to either specify the version you want to use or evaluate the differences.

Disk Space Requirements

Make sure you have enough disk space to perform a package upgrade. Cumulus Linux 5.6.0 requires:

Before you upgrade, run the sudo df -h command to show how much disk space you are currently using on the switch.

cumulus@switch:~$ sudo df -h
Filesystem      Size   Used   Avail   Use%    Mounted on
udev            7.7G      0    7.7G     0%    /dev
tmpfs           1.6G    18M    1.6G     2%    /run
/dev/sda4        28G   7.9G     18G    31%    /
tmpfs           7.7G   277M    7.4G     4%    /dev/shm
tmpfs           5.0M      0    5.0M     0%    /run/lock
tmpfs           7.7G      0    7.7G     0%    /sys/fs/cgroup
tmpfs           7.7G    16K    7.7G     1%    /tmp
overlay          28G   7.9G     18G    31%   

Upgrade the Switch

To upgrade the switch using package upgrade:

  1. Back up the configurations from the switch.

  2. Fetch the latest update metadata from the repository.

    cumulus@switch:~$ sudo -E apt-get update
    
  3. Review potential upgrade issues (in some cases, upgrading new packages might also upgrade additional existing packages due to dependencies).

    cumulus@switch:~$ sudo -E apt-get upgrade --dry-run
    
  4. Upgrade all the packages to the latest distribution.

    cumulus@switch:~$ sudo -E apt-get upgrade
    

    If you do not need to reboot the switch after the upgrade completes, the upgrade ends, restarts all upgraded services, and logs messages in the /var/log/syslog file similar to the ones shown below. In the examples below, the process only upgrades the frr package.

    Policy: Service frr.service action stop postponed
    Policy: Service frr.service action start postponed
    Policy: Restarting services: frr.service
    Policy: Finished restarting services
    Policy: Removed /usr/sbin/policy-rc.d
    Policy: Upgrade is finished
    

    If the upgrade process encounters changed configuration files that have new versions in the release to which you are upgrading, you see a message similar to this:

    Configuration file '/etc/frr/daemons'
    ==> Modified (by you or by a script) since installation.
    ==> Package distributor has shipped an updated version.
    What would you like to do about it ? Your options are:
    Y or I : install the package maintainer's version
    N or O : keep your currently-installed version
    D : show the differences between the versions
    Z : start a shell to examine the situation
    The default action is to keep your current version.
    *** daemons (Y/I/N/O/D/Z) [default=N] ?
    
    • To see the differences between the currently installed version and the new version, type D.
    • To keep the currently installed version, type N. The new package version installs with the suffix .dpkg-dist (for example, /etc/frr/daemons.dpkg-dist). When the upgrade completes and before you reboot, merge your changes with the changes from the newly installed file.
    • To install the new version, type I. Your currently installed version has the suffix .dpkg-old.
    • Cumulus Linux includes /etc/apt/sources.list in the cumulus-archive-keyring package. During upgrade, you must select if you want the new version from the package or the existing file.

    When the upgrade is complete, you can search for the files with the sudo find / -mount -type f -name '*.dpkg-*' command.

    If you see errors for expired GPG keys that prevent you from upgrading packages, follow the steps in Upgrading Expired GPG Keys.

  5. Reboot the switch if the upgrade messages indicate that you need to perform a system restart.

    cumulus@switch:~$ sudo -E apt-get upgrade
    ... upgrade messages here ...
    
    *** Caution: Service restart prior to reboot could cause unpredictable behavior
    *** System reboot required ***
    cumulus@switch:~$ sudo reboot
    
  6. Verify correct operation with the old configurations on the new version.

The first time you run the NVUE nv config apply command after upgrading to Cumulus Linux 5.6, NVUE might override certain existing configuration for features that are now configurable with NVUE. Immediately after you reboot the switch to complete the upgrade, NVIDIA recommends you either:

  • Run NVUE commands to configure these features.
  • Configure NVUE to ignore changes to the relevant configuration files for these features; refer to Configure NVUE to Ignore Linux Files.

Upgrade Notes

Package upgrade always updates to the latest available release in the Cumulus Linux repository. For example, if you are currently running Cumulus Linux 5.0.0 and run the sudo -E apt-get upgrade command on that switch, the packages upgrade to the latest releases in the latest 5.x release.

Because Cumulus Linux is a collection of different Debian Linux packages, be aware of the following:

Upgrade Switches in an MLAG Pair

If you are using MLAG to dual connect two switches in your environment, follow the steps below to upgrade the switches.

You must upgrade both switches in the MLAG pair to the same release of Cumulus Linux.

Only during the upgrade process does Cumulus Linux supports different software versions between MLAG peer switches. After you upgrade the first MLAG switch in the pair, run the clagctl showtimers command to monitor the init-delay timer. When the timer expires, make the upgraded MLAG switch the primary, then upgrade the peer to the same version of Cumulus Linux.

NVIDIA has not tested running different versions of Cumulus Linux on MLAG peer switches outside of the upgrade time period; you might see unexpected results.

  1. Verify the switch is in the secondary role:

    cumulus@switch:~$ nv show mlag
    
  2. Shut down the core uplink layer 3 interfaces. The following example shuts down swp1:

    cumulus@switch:~$ nv set interface swp1 link state down
    cumulus@switch:~$ nv config apply
    
  3. Shut down the peer link:

    cumulus@switch:~$ nv set interface peerlink link state down
    cumulus@switch:~$ nv config apply
    
  4. To boot the switch into ONIE, run the onie-install -a -i <image-location> command. The following example command installs the image from a web server. There are additional ways to install the Cumulus Linux image, such as using FTP, a local file, or a USB drive. For more information, see Installing a New Cumulus Linux Image.

    cumulus@switch:~$ sudo onie-install -a -i http://10.0.1.251/downloads/cumulus-linux-5.6.0-mlx-amd64.bin
    

    To upgrade the switch with package upgrade instead of booting into ONIE, run the sudo -E apt-get update and sudo -E apt-get upgrade commands; see Package Upgrade.

  5. Save the changes to the NVUE configuration from steps 2-3 and reboot the switch:

    cumulus@switch:~$ nv config save
    cumulus@switch:~$ nv action reboot system
    
  6. If you installed a new image on the switch, restore the configuration files to the new release. If you performed an upgrade with apt, bring the uplink and peer link interfaces you shut down in steps 2-3 up:

    cumulus@switch:~$ nv set interface swp1 link state up
    cumulus@switch:~$ nv set interface peerlink link state down
    cumulus@switch:~$ nv config apply
    cumulus@switch:~$ nv config save
    
  7. Verify STP convergence across both switches with the Linux mstpctl showall command. NVUE does not provide an equivalent command.

    cumulus@switch:~$ mstpctl showall
    
  8. Verify core uplinks and peer links are UP:

    cumulus@switch:~$ nv show interface
    
  9. Verify MLAG convergence:

    cumulus@switch:~$ nv show mlag
    
  10. Make this secondary switch the primary:

    cumulus@switch:~$ nv set mlag priority 2084
    
  11. Verify the other switch is now in the secondary role.

  12. Repeat steps 2-9 on the new secondary switch.

  13. Remove the priority 2048 and restore the priority back to 32768 on the current primary switch:

    cumulus@switch:~$ nv set mlag priority 32768
    
  1. Verify the switch is in the secondary role:

    cumulus@switch:~$ clagctl status
    
  2. Shut down the core uplink layer 3 interfaces:

    cumulus@switch:~$ sudo ip link set <switch-port> down
    
  3. Shut down the peer link:

    cumulus@switch:~$ sudo ip link set peerlink down
    
  4. To boot the switch into ONIE, run the onie-install -a -i <image-location> command. The following example command installs the image from a web server. There are additional ways to install the Cumulus Linux image, such as using FTP, a local file, or a USB drive. For more information, see Installing a New Cumulus Linux Image.

    cumulus@switch:~$ sudo onie-install -a -i http://10.0.1.251/downloads/cumulus-linux-5.6.0-mlx-amd64.bin
    

    To upgrade the switch with package upgrade instead of booting into ONIE, run the sudo -E apt-get update and sudo -E apt-get upgrade commands; see Package Upgrade.

  5. Reboot the switch:

    cumulus@switch:~$ sudo reboot
    
  6. If you installed a new image on the switch, restore the configuration files to the new release.

  7. Verify STP convergence across both switches:

    cumulus@switch:~$ mstpctl showall
    
  8. Verify that core uplinks and peer links are UP:

    cumulus@switch:~$ ip addr show
    
  9. Verify MLAG convergence:

    cumulus@switch:~$ clagctl status
    
  10. Make this secondary switch the primary:

    cumulus@switch:~$ clagctl priority 2048
    
  11. Verify the other switch is now in the secondary role.

  12. Repeat steps 2-9 on the new secondary switch.

  13. Remove the priority 2048 and restore the priority back to 32768 on the current primary switch:

    cumulus@switch:~$ clagctl priority 32768
    

Roll Back a Cumulus Linux Installation

Even the most well planned and tested upgrades can result in unforeseen problems and sometimes the best solution is to roll back to the previous state. These main strategies require detailed planning and execution:

The method you employ is specific to your deployment strategy. Providing detailed steps for each scenario is outside the scope of this document.

Third Party Packages

If you install any third party applications on a Cumulus Linux switch, configuration data is typically installed in the /etc directory, but it is not guaranteed. It is your responsibility to understand the behavior and configuration file information of any third party packages installed on the switch.

After you upgrade using a full Cumulus Linux image install, you need to reinstall any third party packages or any Cumulus Linux add-on packages.

Adding and Updating Packages

To manage additional applications in the form of packages and to install the latest updates, use the Advanced Packaging Tool (apt).

Updating, upgrading, and installing packages with apt causes disruptions to network services:

  • Upgrading a package can cause services to restart or stop.
  • Installing a package sometimes disrupts core services by changing core service dependency packages. In some cases, installing new packages also upgrades additional existing packages due to dependencies.
  • If services stop, you need to reboot the switch to restart the services.

Update the Package Cache

To work correctly, apt relies on a local cache listing of the available packages. You must populate the cache initially, then periodically update it with sudo -E apt-get update:

cumulus@switch:~$ sudo -E apt-get update
Ign:1 copy:/var/lib/cumulus/cumulus-local-apt-archive cumulus-local-apt-archive InRelease
Get:2 copy:/var/lib/cumulus/cumulus-local-apt-archive cumulus-local-apt-archive Release [1,115 B]
Ign:3 copy:/var/lib/cumulus/cumulus-local-apt-archive cumulus-local-apt-archive Release.gpg
Get:4 http://security.debian.org buster/updates InRelease [65.4 kB]                 
Hit:5 http://deb.debian.org/debian buster InRelease                                 
Get:6 http://deb.debian.org/debian buster-updates InRelease [51.9 kB]
Get:7 http://deb.debian.org/debian buster-backports InRelease [46.7 kB]
Get:8 http://deb.debian.org/debian buster-updates/main Sources.diff/Index [8,608 B] 
Get:9 http://deb.debian.org/debian buster-updates/main amd64 Packages.diff/Index [8,608 B]
Get:10 http://deb.debian.org/debian buster-updates/main Sources 2021-09-28-1420.03.pdiff [185 B]
Get:10 http://deb.debian.org/debian buster-updates/main Sources 2021-09-28-1420.03.pdiff [185 B]
Get:11 http://deb.debian.org/debian buster-updates/main amd64 Packages 2021-09-28-1420.03.pdiff [184 B]               
Get:11 http://deb.debian.org/debian buster-updates/main amd64 Packages 2021-09-28-1420.03.pdiff [184 B]               
Get:12 http://deb.debian.org/debian buster-backports/main Sources.diff/Index [27.8 kB]                     
Get:13 http://deb.debian.org/debian buster-backports/main amd64 Packages.diff/Index [27.8 kB]                         
Hit:14 http://apps3.cumulusnetworks.com/repos/deb CumulusLinux-4 InRelease                                            
Get:15 http://security.debian.org buster/updates/main Sources [200 kB]                             
Get:16 http://security.debian.org buster/updates/main amd64 Packages [305 kB]              
Hit:17 http://apt.cumulusnetworks.com/repo CumulusLinux-4-latest InRelease                       
Get:18 http://deb.debian.org/debian buster-backports/main Sources 2021-10-02-0801.17.pdiff [681 B]
Get:19 http://deb.debian.org/debian buster-backports/main Sources 2021-10-02-1405.24.pdiff [31 B]
Get:19 http://deb.debian.org/debian buster-backports/main Sources 2021-10-02-1405.24.pdiff [31 B]
Get:20 http://deb.debian.org/debian buster-backports/main amd64 Packages 2021-10-02-1405.24.pdiff [178 B]
Get:20 http://deb.debian.org/debian buster-backports/main amd64 Packages 2021-10-02-1405.24.pdiff [178 B]
Fetched 744 kB in 1s (982 kB/s)
Reading package lists... Done

Use the -E option with sudo whenever you run any apt-get command. This option preserves your environment variables (such as HTTP proxies) before you install new packages or upgrade your distribution.

List Available Packages

After the cache populates, use the apt-cache command to search the cache and find the packages of interest or to get information about an available package.

Here are examples of the search and show sub-commands:

cumulus@switch:~$ apt-cache search tcp
collectd-core - statistics collection and monitoring daemon (core system)
fakeroot - tool for simulating superuser privileges
iperf - Internet Protocol bandwidth measuring tool
iptraf-ng - Next Generation Interactive Colorful IP LAN Monitor
libfakeroot - tool for simulating superuser privileges - shared libraries
libfstrm0 - Frame Streams (fstrm) library
libibverbs1 - Library for direct userspace use of RDMA (InfiniBand/iWARP)
libnginx-mod-stream - Stream module for Nginx
libqt4-network - Qt 4 network module
librtr-dev - Small extensible RPKI-RTR-Client C library - development files
librtr0 - Small extensible RPKI-RTR-Client C library
libwiretap8 - network packet capture library -- shared library
libwrap0 - Wietse Venema's TCP wrappers library
libwrap0-dev - Wietse Venema's TCP wrappers library, development files
netbase - Basic TCP/IP networking system
nmap-common - Architecture independent files for nmap
nuttcp - network performance measurement tool
openssh-client - secure shell (SSH) client, for secure access to remote machines
openssh-server - secure shell (SSH) server, for secure access from remote machines
openssh-sftp-server - secure shell (SSH) sftp server module, for SFTP access from remote machines
python-dpkt - Python 2 packet creation / parsing module for basic TCP/IP protocols
rsyslog - reliable system and kernel logging daemon
socat - multipurpose relay for bidirectional data transfer
tcpdump - command-line network traffic analyzer
cumulus@switch:~$ apt-cache show tcpdump
Package: tcpdump
Version: 4.9.3-1~deb10u1
Installed-Size: 1109
Maintainer: Romain Francoise <rfrancoise@debian.org>
Architecture: amd64
Replaces: apparmor-profiles-extra (<< 1.12~)
Depends: libc6 (>= 2.14), libpcap0.8 (>= 1.5.1), libssl1.1 (>= 1.1.0)
Suggests: apparmor (>= 2.3)
Breaks: apparmor-profiles-extra (<< 1.12~)
Size: 400060
SHA256: 3a63be16f96004bdf8848056f2621fbd863fadc0baf44bdcbc5d75dd98331fd3
SHA1: 2ab9f0d2673f49da466f5164ecec8836350aed42
MD5sum: 603baaf914de63f62a9f8055709257f3
Description: command-line network traffic analyzer
 This program allows you to dump the traffic on a network. tcpdump
 is able to examine IPv4, ICMPv4, IPv6, ICMPv6, UDP, TCP, SNMP, AFS
 BGP, RIP, PIM, DVMRP, IGMP, SMB, OSPF, NFS and many other packet
 types.
 .
 It can be used to print out the headers of packets on a network
 interface, filter packets that match a certain expression. You can
 use this tool to track down network problems, to detect attacks
 or to monitor network activities.
Description-md5: f01841bfda357d116d7ff7b7a47e8782
Homepage: http://www.tcpdump.org/
Multi-Arch: foreign
Section: net
Priority: optional
Filename: pool/upstream/t/tcpdump/tcpdump_4.9.3-1~deb10u1_amd64.deb

The search commands look for the search terms not only in the package name but in other parts of the package information; the search matches on more packages than you expect.

List Packages Installed on the System

The apt-cache command shows information about all the packages available in the repository. To see which packages are actually installed on your system with the version, run the following command.

cumulus@switch:~$ nv show platform software installed
Installed Package   description                                                      package                  version
-----------------   -------------------                                              ----------               --------------------
acpi                displays information on ACPI devices                             acpi                     1.7-1.1
acpi-support-base   scripts for handling base ACPI events such as the power button   acpi-support-base        0.142-8
acpid               Advanced Configuration and Power Interface event daemon          acpid                    1:2.0.31-1
...
cumulus@switch:~$ dpkg -l
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                Version                   Architecture Description
+++-===================-=========================-============-=================================
ii  acpi                1.7-1.1                   amd64        displays information on ACPI devices
ii  acpi-support-base   0.142-8                   all          scripts for handling base ACPI events such as th
ii  acpid               1:2.0.31-1                amd64        Advanced Configuration and Power Interface event
ii  adduser             3.118                     all          add and remove users and groups
ii  apt                 1.8.2                     amd64        commandline package manager
ii  arping              2.19-6                    amd64        sends IP and/or ARP pings (to the MAC address)
ii  arptables           0.0.4+snapshot20181021-4  amd64        ARP table administration
...

Show the Version of a Package

To show the version of a specific package installed on the system:

The following example command shows which version of the vrf package is on the system:

cumulus@switch:~$ nv show platform software installed vrf
             running              applied  pending  description
-----------  -------------------  -------  -------  -----------
description  Linux tools for VRF                    Description
package      vrf                                    Package
version      1.0-cl5.6.0u9                         Version

The following example command shows which version of the vrf package is on the system:

cumulus@switch:~$ dpkg -l vrf
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name       Version      Architecture Description
+++-==========-============-============-=================================
ii  vrf        1.0-cl5.6.0u9    amd64        Linux tools for VRF

Upgrade Packages

To upgrade all the packages installed on the system to their latest versions, run the following commands:

cumulus@switch:~$ sudo -E apt-get update
cumulus@switch:~$ sudo -E apt-get upgrade

The system lists the packages for upgrade and prompts you to continue.

The above commands upgrade all installed versions with their latest versions but do not install any new packages.

Add New Packages

To add a new package, first ensure the package is not already on the system:

cumulus@switch:~$ dpkg -l | grep <name of package>
cumulus@switch:~$ sudo -E apt-get update
cumulus@switch:~$ sudo -E apt-get install tcpreplay
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
tcpreplay
0 upgraded, 1 newly installed, 0 to remove and 1 not upgraded.
Need to get 436 kB of archives.
After this operation, 1008 kB of additional disk space will be used
...

You can install several packages at the same time:

cumulus@switch:~$ sudo -E apt-get install <package1> <package2> <package3>

In some cases, installing a new package also upgrades additional existing packages due to dependencies. To view these additional packages before you install, run the apt-get install --dry-run command.

Add Packages From Another Repository

As shipped, Cumulus Linux searches the Cumulus Linux repository for available packages. You can add additional repositories to search by adding them to the list of sources that apt-get consults. See man sources.list for more information.

NVIDIA adds features or makes bug fixes to certain packages; do not replace these packages with versions from other repositories.

If you want to install packages that are not in the Cumulus Linux repository, the procedure is the same as above, but with one additional step.

NVIDIA does not test and Cumulus Linux Technical Support does not support packages that are not part of the Cumulus Linux repository.

Installing packages outside of the Cumulus Linux repository requires the use of sudo -E apt-get; however, depending on the package, you can use easy-install and other commands.

To install a new package, complete the following steps:

  1. Run the dpkg command to ensure that the package is not already installed on the system:

    cumulus@switch:~$ dpkg -l | grep <name of package>
    
  2. If the package is already on the system, ensure it is the version you need. If it is an older version, update the package from the Cumulus Linux repository:

    cumulus@switch:~$ sudo -E apt-get update
    cumulus@switch:~$ sudo -E apt-get install <name of package>
    cumulus@switch:~$ sudo -E apt-get upgrade
    
  3. If the package is not on the system, the package source location is not in the /etc/apt/sources.list file. Edit and add the appropriate source to the file. For example, add the following if you want a package from the Debian repository that is not in the Cumulus Linux repository:

    deb http://http.us.debian.org/debian buster main
    deb http://security.debian.org/ buster/updates main
    

    Otherwise, /etc/apt/sources.list lists the repository but comments it out. To uncomment the repository, remove the # at the start of the line, then save the file.

  4. Run sudo -E apt-get update, then install the package and upgrade:

    cumulus@switch:~$ sudo -E apt-get update
    cumulus@switch:~$ sudo -E apt-get install <name of package>
    cumulus@switch:~$ sudo -E apt-get upgrade
    

Add Packages from the Cumulus Linux Local Archive

Cumulus Linux contains a local archive embedded in the Cumulus Linux image. This archive, cumulus-local-apt-archive, contains the packages you need to install ifplugd, LDAP, RADIUS or TACACS+ without a network connection.

The archive contains the following packages:

Add these packages with apt-get update && apt-get install, as described above.

Zero Touch Provisioning - ZTP

Use ZTP to deploy network devices in large-scale environments. On first boot, Cumulus Linux runs ZTP, which executes the provisioning automation that deploys the device for its intended role in the network.

The provisioning framework allows you to execute a one-time, user-provided script. You can develop this script using a variety of automation tools and scripting languages. You can also use it to add the switch to a configuration management (CM) platform such as Puppet, Chef, CFEngine or a custom, proprietary tool.

While developing and testing the provisioning logic, you can use the ztp command in Cumulus Linux to run your provisioning script manually on a device.

ZTP in Cumulus Linux can run automatically in one of the following ways, in this order:

  1. Through a local file
  2. Using a USB drive inserted into the switch (ZTP-USB)
  3. Through DHCP

Use a Local File

ZTP only looks one time for a ZTP script on the local file system when the switch boots. ZTP searches for an install script that matches an ONIE-style waterfall in /var/lib/cumulus/ztp, looking for the most specific name first, and ending at the most generic:

You can also trigger the ZTP process manually by running the ztp --run <URL> command, where the URL is the path to the ZTP script.

Use a USB Drive

NVIDIA tests this feature only with thumb drives, not an external large USB hard drive.

If the ztp process does not discover a local script, it tries one time to locate an inserted but unmounted USB drive. If it discovers one, it begins the ZTP process. Cumulus Linux supports the use of a FAT32, FAT16, or VFAT-formatted USB drive as an installation source for ZTP scripts. You must plug in the USB drive before you power up the switch.

At minimum, the script must:

Follow these steps to perform ZTP using a USB drive:

  1. Copy the installation image to the USB drive.
  2. The ztp process searches the root filesystem of the newly mounted drive for filenames matching an ONIE-style waterfall (see the patterns and examples above), looking for the most specific name first, and ending at the most generic.
  3. ZTP parses the contents of the script to ensure it contains the CUMULUS-AUTOPROVISIONING flag (see example scripts).

The USB drive mounts to a temporary directory under /tmp (for example, /tmp/tmpigGgjf/). To reference files on the USB drive, use the environment variable ZTP_USB_MOUNTPOINT to refer to the USB root partition.

ZTP Over DHCP

If the ztp process does not discover a local ONIE script or applicable USB drive, it checks DHCP every ten seconds for up to five minutes for the presence of a ZTP URL specified in /var/run/ztp.dhcp. The URL can be any of HTTP, HTTPS, FTP, or TFTP.

For ZTP using DHCP, provisioning initially takes place over the management network and initiates through a DHCP hook. A DHCP option specifies a configuration script. The ZTP process requests this script from the Web server and the script executes locally.

The ZTP process over DHCP follows these steps:

  1. The first time you boot Cumulus Linux, eth0 makes a DHCP request. By default, Cumulus Linux sends DHCP option 60 (the vendor class identifier) with the value cumulus-linux x86_64 to identify itself to the DHCP server.
  2. The DHCP server offers a lease to the switch.
  3. If option 239 is in the response, the ZTP process starts.
  4. The ZTP process requests the contents of the script from the URL, sending additional HTTP headers containing details about the switch.
  5. ZTP parses the contents of the script to ensure it contains the CUMULUS-AUTOPROVISIONING flag (see example scripts).
  6. If provisioning is necessary, the script executes locally on the switch with root privileges.
  7. ZTP examines the return code of the script. If the return code is 0, ZTP marks the provisioning state as complete in the autoprovisioning configuration file.

Trigger ZTP Over DHCP

If you have not yet provisioned the switch, you can trigger the ZTP process over DHCP when eth0 uses DHCP and one of the following events occur:

You can also run the ztp --run <URL> command, where the URL is the path to the ZTP script.

Configure the DHCP Server

During the DHCP process over eth0, Cumulus Linux requests DHCP option 239. This option specifies the custom provisioning script.

For example, the /etc/dhcp/dhcpd.conf file for an ISC DHCP server looks like:

option cumulus-provision-url code 239 = text;

  subnet 192.0.2.0 netmask 255.255.255.0 {
  range 192.0.2.100 192.168.0.200;
  option cumulus-provision-url "http://192.0.2.1/demo.sh";
}

In addition, you can specify the hostname of the switch with the host-name option:

subnet 192.168.0.0 netmask 255.255.255.0 {
  range 192.168.0.100 192.168.0.200;
  option cumulus-provision-url "http://192.0.2.1/demo.sh";
  host dc1-tor-sw1 { hardware ethernet 44:38:39:00:1a:6b; fixed-address 192.168.0.101; option host-name "dc1-tor-sw1"; }
}

Do not use an underscore (_) in the hostname; underscores are not permitted in hostnames.

DHCP on Front Panel Ports

ZTP runs DHCP on all the front panel switch ports and on any active interface. ZTP assesses the list of active ports on every retry cycle. When it receives the DHCP lease and option 239 is present in the response, ZTP starts to execute the script.

Inspect HTTP Headers

The following HTTP headers in the request to the web server retrieve the provisioning script:

Header                        Value                 Example
------                        -----                 -------
User-Agent                                          CumulusLinux-AutoProvision/0.4
CUMULUS-ARCH                  CPU architecture      x86_64
CUMULUS-BUILD                                       5.1.0
CUMULUS-MANUFACTURER                                odm
CUMULUS-PRODUCTNAME                                 switch_model
CUMULUS-SERIAL                                      XYZ123004
CUMULUS-BASE-MAC                                    44:38:39:FF:40:94
CUMULUS-MGMT-MAC                                    44:38:39:FF:00:00
CUMULUS-VERSION                                     5.1.0
CUMULUS-PROV-COUNT                                  0
CUMULUS-PROV-MAX                                    32

Write ZTP Scripts

You must include the following line in any of the supported scripts that you expect to run using the autoprovisioning framework.

# CUMULUS-AUTOPROVISIONING

The script must contain the CUMULUS-AUTOPROVISIONING flag. You can include this flag in a comment or remark; you do not need to echo or write the flag to stdout.

You can write the script in any language that Cumulus Linux supports, such as:

The script must return an exit code of 0 upon success to mark the process as complete in the autoprovisioning configuration file.

The following script installs Cumulus Linux from a USB drive and applies a configuration:

#!/bin/bash
function error() {
  echo -e "\e[0;33mERROR: The ZTP script failed while running the command $BASH_COMMAND at line $BASH_LINENO.\e[0m" >&2
  exit 1
}

# Log all output from this script
exec >> /var/log/autoprovision 2>&1
date "+%FT%T ztp starting script $0"

trap error ERR

#Add Debian Repositories
echo "deb http://http.us.debian.org/debian buster main" >> /etc/apt/sources.list
echo "deb http://security.debian.org/ buster/updates main" >> /etc/apt/sources.list

#Update Package Cache
apt-get update -y

#Load interface config from usb
cp ${ZTP_USB_MOUNTPOINT}/interfaces /etc/network/interfaces

#Load port config from usb
#   (if breakout cables are used for certain interfaces)
cp ${ZTP_USB_MOUNTPOINT}/ports.conf /etc/cumulus/ports.conf

#Reload interfaces to apply loaded config
ifreload -a

# CUMULUS-AUTOPROVISIONING
exit 0

Continue Provisioning

Typically ZTP exits after executing the script locally and does not continue. To continue with provisioning so that you do not have to intervene manually or embed an Ansible callback into the script, you can add the CUMULUS-AUTOPROVISION-CASCADE directive.

Best Practices

ZTP scripts come in different forms and frequently perform the same tasks. As BASH is the most common language for ZTP scripts, use the following BASH snippets to perform common tasks with robust error checking.

Set the Default Cumulus User Password

The default cumulus user account password is cumulus. When you log into Cumulus Linux for the first time, you must provide a new password for the cumulus account, then log back into the system.

Add the following function to your ZTP script to change the default cumulus user account password to a clear-text password. The example changes the password cumulus to MyP4$$word.

function set_password(){
     # Unexpire the cumulus account
     passwd -x 99999 cumulus
     # Set the password
     echo 'cumulus:MyP4$$word' | chpasswd
}
set_password

If you have an insecure management network, set the password with an encrypted hash instead of a clear-text password.

Test DNS Name Resolution

DNS names are frequently used in ZTP scripts. The ping_until_reachable function tests that each DNS name resolves into a reachable IP address. Call this function with each DNS target used in your script before you use the DNS name elsewhere in your script.

The following example shows how to call the ping_until_reachable function in the context of a larger task.

function ping_until_reachable(){
    last_code=1
    max_tries=30
    tries=0
    while [ "0" != "$last_code" ] && [ "$tries" -lt "$max_tries" ]; do
        tries=$((tries+1))
        echo "$(date) INFO: ( Attempt $tries of $max_tries ) Pinging $1 Target Until Reachable."
        ping $1 -c2 &> /dev/null
        last_code=$?
            sleep 1
    done
    if [ "$tries" -eq "$max_tries" ] && [ "$last_code" -ne "0" ]; then
        echo "$(date) ERROR: Reached maximum number of attempts to ping the target $1 ."
        exit 1
    fi
}

Check the Cumulus Linux Release

The following script segment demonstrates how to check which Cumulus Linux release is running and upgrades the node if the release is not the target release. If the release is the target release, normal ZTP tasks execute. This script calls the ping_until_reachable script (described above) to make sure the server holding the image server and the ZTP script is reachable.

function init_ztp(){
    #do normal ZTP tasks
}

CUMULUS_TARGET_RELEASE=5.1.0
CUMULUS_CURRENT_RELEASE=$(cat /etc/lsb-release  | grep RELEASE | cut -d "=" -f2)
IMAGE_SERVER_HOSTNAME=webserver.example.com
IMAGE_SERVER= "http:// "$IMAGE_SERVER_HOSTNAME "/ "$CUMULUS_TARGET_RELEASE ".bin "
ZTP_URL= "http:// "$IMAGE_SERVER_HOSTNAME "/ztp.sh "

if [ "$CUMULUS_TARGET_RELEASE" != "$CUMULUS_CURRENT_RELEASE" ]; then
    ping_until_reachable $IMAGE_SERVER_HOSTNAME
    /usr/cumulus/bin/onie-install -fa -i $IMAGE_SERVER -z $ZTP_URL && reboot
else
    init_ztp && reboot
fi
exit 0

Apply Management VRF Configuration

If you apply a management VRF in your script, either apply it last or reboot instead. If you do not apply a management VRF last, you need to prepend any commands that require eth0 to communicate out with /usr/bin/ip vrf exec mgmt; for example, /usr/bin/ip vrf exec mgmt apt-get update -y.

Perform Ansible Provisioning Callbacks

After initially configuring a node with ZTP, use Provisioning Callbacks to inform Ansible Tower or AWX that the node is ready for more detailed provisioning. The following example demonstrates how to use a provisioning callback:

/usr/bin/curl -H "Content-Type:application/json" -k -X POST --data '{"host_config_key":"'somekey'"}' -u username:password http://ansible.example.com/api/v2/job_templates/1111/callback/

Disable the DHCP Hostname Override Setting

Make sure to disable the DHCP hostname override setting in your script.

function set_hostname(){
    # Remove DHCP Setting of Hostname
    sed s/'SETHOSTNAME="yes"'/'SETHOSTNAME="no"'/g -i /etc/dhcp/dhclient-exit-hooks.d/dhcp-sethostname
    hostnamectl set-hostname $1
}

Test ZTP Scripts

Use these commands to test and debug your ZTP scripts.

You can use verbose mode to debug your script and see where your script fails. Include the -v option when you run ZTP:

cumulus@switch:~$ sudo ztp -v -r http://192.0.2.1/demo.sh
Attempting to provision via ZTP Manual from http://192.0.2.1/demo.sh

Broadcast message from root@dell-s6010-01 (ttyS0) (Tue May 10 22:44:17 2016):  

ZTP: Attempting to provision via ZTP Manual from http://192.0.2.1/demo.sh
ZTP Manual: URL response code 200
ZTP Manual: Found Marker CUMULUS-AUTOPROVISIONING
ZTP Manual: Executing http://192.0.2.1/demo.sh
error: ZTP Manual: Payload returned code 1
error: Script returned failure

To see results of the most recent ZTP execution, you can run the ztp -s command.

cumulus@switch:~$ ztp -s
ZTP INFO:

State              enabled
Version            1.0
Result             Script Failure
Date               Mon 20 May 2019 09:31:27 PM UTC
Method             ZTP DHCP
URL                http://192.0.2.1/demo.sh

If ZTP runs when the switch boots and not manually, you can run the systemctl -l status ztp.service then journalctl -l -u ztp.service to see if any failures occur:

cumulus@switch:~$ sudo systemctl -l status ztp.service
● ztp.service - Cumulus Linux ZTP
    Loaded: loaded (/lib/systemd/system/ztp.service; enabled)
    Active: failed (Result: exit-code) since Wed 2016-05-11 16:38:45 UTC; 1min 47s ago
        Docs: man:ztp(8)
    Process: 400 ExecStart=/usr/sbin/ztp -b (code=exited, status=1/FAILURE)
    Main PID: 400 (code=exited, status=1/FAILURE)

May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP USB: Device not found
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: Looking for ZTP Script provided by DHCP
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: Attempting to provision via ZTP DHCP from http://192.0.2.1/demo.sh
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: URL response code 200
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: Found Marker CUMULUS-AUTOPROVISIONING
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: Executing http://192.0.2.1/demo.sh
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: Payload returned code 1
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: Script returned failure
May 11 16:38:45 dell-s6010-01 systemd[1]: ztp.service: main process exited, code=exited, status=1/FAILURE
May 11 16:38:45 dell-s6010-01 systemd[1]: Unit ztp.service entered failed state.
cumulus@switch:~$
cumulus@switch:~$ sudo journalctl -l -u ztp.service --no-pager
-- Logs begin at Wed 2016-05-11 16:37:42 UTC, end at Wed 2016-05-11 16:40:39 UTC. --
May 11 16:37:45 cumulus ztp[400]: ztp [400]: /var/lib/cumulus/ztp: Sate Directory does not exist. Creating it...
May 11 16:37:45 cumulus ztp[400]: ztp [400]: /var/run/ztp.lock: Lock File does not exist. Creating it...
May 11 16:37:45 cumulus ztp[400]: ztp [400]: /var/lib/cumulus/ztp/ztp_state.log: State File does not exist. Creating it...
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP LOCAL: Looking for ZTP local Script
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64-dell_s6010_s1220-rUNKNOWN
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64-dell_s6010_s1220
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64-dell
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP USB: Looking for unmounted USB devices
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP USB: Parsing partitions
May 11 16:37:45 cumulus ztp[400]: ztp [400]: ZTP USB: Device not found
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: Looking for ZTP Script provided by DHCP
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: Attempting to provision via ZTP DHCP from http://192.0.2.1/demo.sh
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: URL response code 200
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: Found Marker CUMULUS-AUTOPROVISIONING
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: Executing http://192.0.2.1/demo.sh
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: ZTP DHCP: Payload returned code 1
May 11 16:38:45 dell-s6010-01 ztp[400]: ztp [400]: Script returned failure
May 11 16:38:45 dell-s6010-01 systemd[1]: ztp.service: main process exited, code=exited, status=1/FAILURE
May 11 16:38:45 dell-s6010-01 systemd[1]: Unit ztp.service entered failed state.

Instead of running journalctl, you can see the log history by running:

cumulus@switch:~$ cat /var/log/syslog | grep ztp
2016-05-11T16:37:45.132583+00:00 cumulus ztp [400]: /var/lib/cumulus/ztp: State Directory does not exist. Creating it...
2016-05-11T16:37:45.134081+00:00 cumulus ztp [400]: /var/run/ztp.lock: Lock File does not exist. Creating it...
2016-05-11T16:37:45.135360+00:00 cumulus ztp [400]: /var/lib/cumulus/ztp/ztp_state.log: State File does not exist. Creating it...
2016-05-11T16:37:45.185598+00:00 cumulus ztp [400]: ZTP LOCAL: Looking for ZTP local Script
2016-05-11T16:37:45.485084+00:00 cumulus ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64-dell_s6010_s1220-rUNKNOWN
2016-05-11T16:37:45.486394+00:00 cumulus ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64-dell_s6010_s1220
2016-05-11T16:37:45.488385+00:00 cumulus ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64-dell
2016-05-11T16:37:45.489665+00:00 cumulus ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp-x86_64
2016-05-11T16:37:45.490854+00:00 cumulus ztp [400]: ZTP LOCAL: Waterfall search for /var/lib/cumulus/ztp/cumulus-ztp
2016-05-11T16:37:45.492296+00:00 cumulus ztp [400]: ZTP USB: Looking for unmounted USB devices
2016-05-11T16:37:45.493525+00:00 cumulus ztp [400]: ZTP USB: Parsing partitions
2016-05-11T16:37:45.636422+00:00 cumulus ztp [400]: ZTP USB: Device not found
2016-05-11T16:38:43.372857+00:00 cumulus ztp [1805]: Found ZTP DHCP Request
2016-05-11T16:38:45.696562+00:00 cumulus ztp [400]: ZTP DHCP: Looking for ZTP Script provided by DHCP
2016-05-11T16:38:45.698598+00:00 cumulus ztp [400]: Attempting to provision via ZTP DHCP from http://192.0.2.1/demo.sh
2016-05-11T16:38:45.816275+00:00 cumulus ztp [400]: ZTP DHCP: URL response code 200
2016-05-11T16:38:45.817446+00:00 cumulus ztp [400]: ZTP DHCP: Found Marker CUMULUS-AUTOPROVISIONING
2016-05-11T16:38:45.818402+00:00 cumulus ztp [400]: ZTP DHCP: Executing http://192.0.2.1/demo.sh
2016-05-11T16:38:45.834240+00:00 cumulus ztp [400]: ZTP DHCP: Payload returned code 1
2016-05-11T16:38:45.835488+00:00 cumulus ztp [400]: Script returned failure
2016-05-11T16:38:45.876334+00:00 cumulus systemd[1]: ztp.service: main process exited, code=exited, status=1/FAILURE
2016-05-11T16:38:45.879410+00:00 cumulus systemd[1]: Unit ztp.service entered failed state.

If you see that the issue is a script failure, you can modify the script and then run ZTP manually using ztp -v -r <URL/path to that script>, as above.

cumulus@switch:~$ sudo ztp -v -r http://192.0.2.1/demo.sh
Attempting to provision via ZTP Manual from http://192.0.2.1/demo.sh

Broadcast message from root@dell-s6010-01 (ttyS0) (Tue May 10 22:44:17 2019):  

ZTP: Attempting to provision via ZTP Manual from http://192.0.2.1/demo.sh
ZTP Manual: URL response code 200
ZTP Manual: Found Marker CUMULUS-AUTOPROVISIONING
ZTP Manual: Executing http://192.0.2.1/demo.sh
error: ZTP Manual: Payload returned code 1
error: Script returned failure
cumulus@switch:~$ sudo ztp -s
State         enabled
Version       1.0
Result        Script Failure
Date          Mon 20 May 2019 09:31:27 PM UTC
Method        ZTP Manual
URL           http://192.0.2.1/demo.sh

Use the following command to check syslog for information about ZTP:

cumulus@switch:~$ sudo grep -i ztp /var/log/syslog

Common ZTP Script Errors

Could not find referenced script/interpreter in downloaded payload

cumulus@leaf01:~$ sudo cat /var/log/syslog | grep ztp
2018-04-24T15:06:08.887041+00:00 leaf01 ztp [13404]: Attempting to provision via ZTP Manual from http://192.168.0.254/ztp_oob_windows.sh
2018-04-24T15:06:09.106633+00:00 leaf01 ztp [13404]: ZTP Manual: URL response code 200
2018-04-24T15:06:09.107327+00:00 leaf01 ztp [13404]: ZTP Manual: Found Marker CUMULUS-AUTOPROVISIONING
2018-04-24T15:06:09.107635+00:00 leaf01 ztp [13404]: ZTP Manual: Executing http://192.168.0.254/ztp_oob_windows.sh
2018-04-24T15:06:09.132651+00:00 leaf01 ztp [13404]: ZTP Manual: Could not find referenced script/interpreter in downloaded payload.
2018-04-24T15:06:14.135521+00:00 leaf01 ztp [13404]: ZTP Manual: Retrying
2018-04-24T15:06:14.138915+00:00 leaf01 ztp [13404]: ZTP Manual: URL response code 200
2018-04-24T15:06:14.139162+00:00 leaf01 ztp [13404]: ZTP Manual: Found Marker CUMULUS-AUTOPROVISIONING
2018-04-24T15:06:14.139448+00:00 leaf01 ztp [13404]: ZTP Manual: Executing http://192.168.0.254/ztp_oob_windows.sh
2018-04-24T15:06:14.143261+00:00 leaf01 ztp [13404]: ZTP Manual: Could not find referenced script/interpreter in downloaded payload.
2018-04-24T15:06:24.147580+00:00 leaf01 ztp [13404]: ZTP Manual: Retrying
2018-04-24T15:06:24.150945+00:00 leaf01 ztp [13404]: ZTP Manual: URL response code 200
2018-04-24T15:06:24.151177+00:00 leaf01 ztp [13404]: ZTP Manual: Found Marker CUMULUS-AUTOPROVISIONING
2018-04-24T15:06:24.151374+00:00 leaf01 ztp [13404]: ZTP Manual: Executing http://192.168.0.254/ztp_oob_windows.sh
2018-04-24T15:06:24.155026+00:00 leaf01 ztp [13404]: ZTP Manual: Could not find referenced script/interpreter in downloaded payload.
2018-04-24T15:06:39.164957+00:00 leaf01 ztp [13404]: ZTP Manual: Retrying
2018-04-24T15:06:39.165425+00:00 leaf01 ztp [13404]: Script returned failure
2018-04-24T15:06:39.175959+00:00 leaf01 ztp [13404]: ZTP script failed. Exiting...

Errors in syslog for ZTP like those shown above often occur if you create or edit the script on a Windows machine. Check to make sure that the \r\n characters are not present in the end-of-line encodings.

Use the cat -v ztp.sh command to view the contents of the script and search for any hidden characters.

root@oob-mgmt-server:/var/www/html# cat -v ./ztp_oob_windows.sh 
#!/bin/bash^M
^M
###################^M
#   ZTP Script^M
###################^M
^M
/usr/cumulus/bin/cl-license -i http://192.168.0.254/license.txt^M
^M
# Clean method of performing a Reboot^M
nohup bash -c 'sleep 2; shutdown now -r "Rebooting to Complete ZTP"' &^M
^M
exit 0^M
^M
# The line below is required to be a valid ZTP script^M
#CUMULUS-AUTOPROVISIONING^M
root@oob-mgmt-server:/var/www/html#

The ^M characters in the output of your ZTP script, as shown above, indicate the presence of Windows end-of-line encodings that you need to remove.

Use the translate (tr) command on any Linux system to remove the '\r' characters from the file.

root@oob-mgmt-server:/var/www/html# tr -d '\r' < ztp_oob_windows.sh > ztp_oob_unix.sh
root@oob-mgmt-server:/var/www/html# cat -v ./ztp_oob_unix.sh 
#!/bin/bash
###################
#   ZTP Script
###################
/usr/cumulus/bin/cl-license -i http://192.168.0.254/license.txt
# Clean method of performing a Reboot
nohup bash -c 'sleep 2; shutdown now -r "Rebooting to Complete ZTP"' &
exit 0
# The line below is required to be a valid ZTP script
#CUMULUS-AUTOPROVISIONING
root@oob-mgmt-server:/var/www/html#

Manually Use the ztp Command

To enable ZTP, use the -e option:

cumulus@switch:~$ sudo ztp -e

When you enable ZTP, it tries to run the next time the switch boots. However, if ZTP already ran on a previous boot up or if there is a manual configuration, ZTP exits without trying to look for a script.

ZTP checks for these manual configurations when the switch boots:

  • Password changes
  • Users and groups changes
  • Packages changes
  • Interfaces changes

When the switch boots for the first time, ZTP records the state of important files that can update after you configure the switch. After a reboot, ZTP compares the recorded state to the current state of these files. If they do not match, ZTP considers the switch as already provisioned and exits. ZTP only deletes these files after a reset.

To reset ZTP to its original state, use the -R option. This removes the ztp directory and ZTP runs the next time the switch reboots.

cumulus@switch:~$ sudo ztp -R

To disable ZTP, use the -d option:

cumulus@switch:~$ sudo ztp -d

To force provisioning to occur and ignore the status listed in the configuration file, use the -r option:

cumulus@switch:~$ sudo ztp -r cumulus-ztp.sh

To see the current ZTP state, use the -s option:

cumulus@switch:~$ sudo ztp -s
ZTP INFO:
State          disabled
Version        1.0
Result         success
Date           Mon May 20 21:51:04 2019 UTC
Method         Switch manually configured  
URL            None

Considerations

System Configuration

This section describes how to configure the following system settings:

NVIDIA User Experience - NVUE

NVUE is an object-oriented, schema driven model of a complete Cumulus Linux system (hardware and software) providing a robust API that allows for multiple interfaces to both view (show) and configure (set and unset) any element within a system running the NVUE software.

NVUE Object Model

The NVUE object model definition uses the OpenAPI specification (OAS). Similar to YANG (RFC 6020 and RFC 7950), OAS is a data definition, manipulation, and modeling language (DML) that lets you build model-driven interfaces for both humans and machines. Although the computer networking and telecommunications industry commonly uses YANG (standardized by IETF) as a DML, the adoption of OpenAPI is broader, spanning cloud to compute to storage to IoT and even social media. The OpenAPI Initiative (OAI) consortium leads OpenAPI standardization, a chartered project under the Linux Foundation.

The OAS schema forms the management plane model with which you configure, monitor, and manage the Cumulus Linux switch. The v3.0.2 version of OAS defines the NVUE data model.

Like other systems that use OpenAPI, the NVUE OAS schema defines the endpoints (paths) exposed as RESTful APIs. With these REST APIs, you can perform various create, retrieve, update, delete, and eXecute (CRUDX) operations. The OAS schema also describes the API inputs and outputs (data models).

You can use the NVUE object model in these two ways:

The CLI and the REST API are equivalent in functionality; you can run all management operations from the REST API or the CLI. The NVUE object model drives both the REST API and the CLI management operations. All operations are consistent; for example, the CLI nv show commands reflect any PATCH operation (create) you run through the REST API.

NVUE follows a declarative model, removing context-specific commands and settings. It is structured as a big tree that represents the entire state of a Cumulus Linux instance. At the base of the tree are high level branches representing objects, such as router and interface. Under each of these branches are further branches. As you navigate through the tree, you gain a more specific context. At the leaves of the tree are actual attributes, represented as key-value pairs. The path through the tree is similar to a filesystem path.

Cumulus Linux installs NVUE by default and enables the NVUE service nvued.

NVUE CLI

The NVUE CLI has a flat structure instead of a modal structure. Therefore, you can run all commands from the primary prompt instead of only in a specific mode.

You can choose to configure Cumulus Linux either with NVUE commands or Linux commands (with vtysh or by manually editing configuration files). Do not run both NVUE configuration commands (such as nv set, nv unset, nv action, and nv config) and Linux commands to configure the switch. NVUE commands replace the configuration in files such as /etc/network/interfaces and /etc/frr/frr.conf, and remove any configuration you add manually or with automation tools like Ansible, Chef, or Puppet.

If you choose to configure Cumulus Linux with NVUE, you can configure features that do not yet support the NVUE Object Model by creating snippets. See NVUE Snippets.

Command Syntax

NVUE commands all begin with nv and fall into one of three syntax categories:

Command Completion

As you enter commands, you can get help with the valid keywords or options using the tab key. For example, using tab completion with nv set displays the possible options for the command and returns you to the command prompt to complete the command.

cumulus@switch:~$ nv set <<press tab>>
acl        evpn       mlag       qos        service    vrf        
bridge     interface  nve        router     system

cumulus@switch:~$ nv set

Command Question Mark

You can type a question mark (?) after a command to display required information quickly and concisely. When you type ?, NVUE specifies the value type, range, and options with a brief description of each; for example:

cumulus@switch:~$ nv set interface swp1 link state ?
    [Enter]               
    down                   The interface is not ready
    up                     The interface is ready
cumulus@switch:~$ nv set interface swp1 link mtu ?
    <arg>                  (integer:552 - 9216)
cumulus@switch:~$ nv set interface swp1 link speed ?
    <arg>                  (string | enum:10M,100M,1G,10G,25G,40G,50G,100G,200G,40
                           0G,800G,auto)

NVUE also indicates if you need to provide specific values for the command:

cumulus@switch:~$ nv set interface swp1 bridge domain ?
    <domain-id>            Domain (bridge-name)

Command Abbreviation

NVUE supports command abbreviation, where you can type a certain number of characters instead of a whole command to speed up CLI interaction. For example, instead of typing nv show interface, you can type nv sh int.

If the command you type is ambiguous, NVUE shows the reason for the ambiguity so that you can correct the shortcut. For example:

cumulus@switch:~$ nv s i 
Ambiguous Command: 
   set interface 
   show interface 

Command Help

As you enter commands, you can get help with command syntax by entering -h or --help at various points within a command entry. For example, to examine the options available for nv set interface, enter nv set interface -h or nv set interface --help.

cumulus@switch:~$ nv set interface -h
usage: 
  nv [options] set interface <interface-id>

Description:
  interface             Update all interfaces

Identifiers:
  <interface-id>        Interface (interface-name)

Output Options:
  -o <format>, --output <format>
                        Supported formats: json, yaml, auto, constable, end-table, commands (default:auto)
  --color (on|off|auto)
                        Toggle coloring of output (default: auto)
  --paginate (on|off|auto)
                        Whether to send output to a pager (default: off)

General Options:
  -h, --help            Show help.

Command List

You can list all the NVUE commands by running nv list-commands. See List All NVUE Commands below.

Command History

At the command prompt, press the Up Arrow and Down Arrow keys to move back and forth through the list of commands you entered previously. When you find the command you want to use, you can run the command by pressing Enter. You can also modify the command before you run it.

Command Categories

The NVUE CLI has a flat structure; however, the commands are in three functional categories:

Configuration Commands

The NVUE configuration commands modify switch configuration. You can set and unset configuration options.

The nv set and nv unset commands are in the following categories. Each command group includes subcommands. Use command completion (press the tab key) to list the subcommands.

Command Group
Description
nv set acl
nv unset acl
Configures ACLs.
nv set bridge
nv unset bridge
Configures a bridge domain. This is where you configure bridge attributes, such as the bridge type (VLAN-aware), the STP state and priority, and VLANs.
nv set evpn
nv unset evpn
Configures EVPN. This is where you enable and disable the EVPN control plane, and set EVPN route advertise, multihoming, and duplicate address detection options.
nv set interface <interface-id>
nv unset interface <interface-id>
Configures the switch interfaces. Use this command to configure bond and bridge interfaces, interface IP addresses and descriptions, VLAN IDs, and links (MTU, FEC, speed, duplex, and so on).
nv set mlag
nv unset mlag
Configures MLAG. This is where you configure the backup IP address or interface, MLAG system MAC address, peer IP address, MLAG priority, and the delay before bonds come up.
nv set nve
nv unset nve
Configures network virtualization (VXLAN) settings. This is where you configure the UDP port for VXLAN frames, control dynamic MAC learning over VXLAN tunnels, enable and disable ARP and ND suppression, and configure how Cumulus Linux handles BUM traffic in the overlay.
nv set qos
nv unset qos
Configures QoS RoCE.
nv set router
nv unset router
Configures router policies (prefix list rules and route maps), sets global BGP options (enable and disable, ASN and router ID, BGP graceful restart and shutdown), global OSPF options (enable and disable, router ID, and OSPF timers) PIM, IGMP, PBR, VRR, and VRRP.
nv set service
nv unset service
Configures DHCP relays and servers, NTP, PTP, LLDP, SNMP servers, DNS, and syslog.
nv set system
nv unset system
Configures system settings, such as the hostname of the switch, pre and post login messages, reboot options (warm, cold, fast), the time zone and global system settings, such as the anycast ID, the system MAC address, and the anycast MAC address. This is also where you configure SPAN and ERSPAN sessions and set how configuration apply operations work (which files to ignore and which files to overwrite; see Configure NVUE to Ignore Linux Files).
nv set vrf <vrf-id>
nv unset vrf <vrf-id>
Configures VRFs. This is where you configure VRF-level configuration for PTP, BGP, OSPF, and EVPN.

Monitoring Commands

The NVUE monitoring commands show various parts of the network configuration. For example, you can show the complete network configuration or only interface configuration. The monitoring commands are in the following categories. Each command group includes subcommands. Use command completion (press the tab key) to list the subcommands.

Command Group
Description
nv show aclShows ACL configuration.
nv show actionShows information about the action commands that reset counters and remove conflicts.
nv show bridgeShows bridge domain configuration.
nv show evpnShows EVPN configuration.
nv show interfaceShows interface configuration and counters.
nv show mlagShows MLAG configuration.
nv show nveShows network virtualization configuration, such as VXLAN-specfic MLAG configuration and VXLAN flooding.
nv show platformShows platform configuration, such as hardware and software components.
nv show qosShows QoS RoCE configuration.
nv show routerShows router configuration, such as router policies, global BGP and OSPF configuration, PBR, PIM, IGMP, VRR, and VRRP configuration.
nv show serviceShows DHCP relays and server, NTP, PTP, LLDP, and syslog configuration.
nv show systemShows global system settings, such as the reserved routing table range for PBR and the reserved VLAN range for layer 3 VNIs. You can also see system login messages and switch reboot history.
nv show vrfShows VRF configuration.

The following example shows the nv show router commands after pressing the tab key, then shows the output of the nv show router bgp command.

cumulus@leaf01:mgmt:~$ nv show router <<tab>>
adaptive-routing  igmp              ospf              pim               ptm               vrrp              
bgp               nexthop-group     pbr               policy            vrr               

cumulus@leaf01:mgmt:~$ nv show router bgp
                                operational  applied  pending      description
------------------------------  -----------  -------  -----------  ----------------------------------------------------------------------
enable                                       off      on           Turn the feature 'on' or 'off'.  The default is 'off'.
autonomous-system                                     none         ASN for all VRFs, if a single AS is in use.  If "none", then ASN mu...
graceful-shutdown                                     off          Graceful shutdown enable will initiate the GSHUT community to be an...
policy-update-timer                                   5            Wait time in seconds before processing updates to policies to ensur...
router-id                                             none         BGP router-id for all VRFs, if a common one is used.  If "none", th...
wait-for-install                                      off          bgp waits for routes to be installed into kernel/asic before advert...
convergence-wait
  establish-wait-time                                 0            Maximum time to wait to establish BGP sessions. Any peers which do...
  time                                                0            Time to wait for peers to send end-of-RIB before router performs pa...
graceful-restart
  mode                                                helper-only  Role of router during graceful restart. helper-only, router is in h...
  path-selection-deferral-time                        360          Used by the restarter as an upper-bounds for waiting for peering es...
  restart-time                                        120          Amount of time taken to restart by router. It is advertised to the...
  stale-routes-time                                   360          Specifies an upper-bounds on how long we retain routes from a resta...
cumulus@leaf01:mgmt:~$ 

If there are no pending or applied configuration changes, the nv show command only shows the running configuration (under operational).

Additional options are available for the nv show commands. For example, you can choose the configuration you want to show (pending, applied, startup, or operational). You can also turn on colored output, and paginate specific output.

Option
Description
--viewShows these different views: acl-statistics, brief, detail, lldp, mac, mlag-cc, pluggables, qos-profile, and small. This option is available for the nv show interface command only.For example, the nv show interface --view=small command shows a list of the interfaces on the switch and the nv show interface --view=brief command shows information about each interface on the switch, such as the interface type, speed, remote host and port.The nv show interface --view=mac command shows the MAC address of each interface and the nv show interface --view=qos-profile command shows the QoS profile for the interfaces on the switch.Note: The description column only shows in the output when you use the --view=detail option.
--filterFilters show command output on column data. For example, the nv show interface --filter mtu=1500 shows only the interfaces with MTU set to 1500.To filter on multiple column outputs, enclose the entire filter in double quotes; for example, nv show interface --filter "type=bridge&mtu=9216" shows data for bridges with MTU 9216.You can use wildcards; for example, nv show interface swp1 --filter "ip.address=1*" shows all IP addresses that start with 1 for swp1.You can filter on all revisions (operational, applied, and pending); for example, nv show interface --filter "ip.address=1*" --rev=applied shows all IP addresses that start with 1 for swp1 in the applied revision.
--rev <revision>Shows a detached pending configuration. See the nv config detach configuration management command below. For example, nv show --rev 1. You can also show only applied or only operational information in the nv show output. For example, to show only the applied settings for swp1 configuration, run the nv show interface swp1 --rev=applied command. To show only the operational settings for swp1 configuration, run the nv show interface swp1 --rev=operational command.
--appliedShows configuration applied with the nv config apply command. For example, nv show --applied interface bond1.
--operationalShows the running configuration (the actual system state). For example, nv show --operational interface bond1 shows the running configuration for bond1. The running and applied configuration should be the same. If different, inspect the logs.
--pendingShows the last applied configuration and any pending set or unset configuration that you have not yet applied. For example, nv show --pending interface bond1.
--startupShows configuration saved with the nv config save command. This is the configuration after the switch boots.
--outputShows command output in table (auto), json, or yaml format. For example:
nv show --output auto interface bond1
nv show --output json interface bond1
nv show --output yaml interface bond1
--colorTurns colored output on or off. For example, nv show --color on interface bond1
--paginatePaginates the output. For example, nv show --paginate on interface bond1.
--helpShows help for the NVUE commands.

The following example shows pending BGP graceful restart configuration:

cumulus@switch:~$ nv show router bgp graceful-restart --pending
                              4                  description
----------------------------  -----------------  ----------------------------------------------------------------------
mode                          helper-only        Role of router during graceful restart. helper-only, router is in h...
path-selection-deferral-time  360                Used by the restarter as an upper-bounds for waiting for peeringes...
restart-time                  120                Amount of time taken to restart by router. It is advertised to the...
stale-routes-time             360                Specifies an upper-bounds on how long we retain routes from a resta...

Monitoring Commands and FRR Daemons

If you run an NVUE show command but the corresponding FRR routing daemons are not running on the switch, you see an error message; for example:

Net Show commands

In addition to the nv show commands, Cumulus Linux continues to provide a subset of the NCLU net show commands. Use these commands to get additional views of various parts of your network configuration.

cumulus@leaf01:mgmt:~$ net show 
    bfd            :  Bidirectional forwarding detection
    bgp            :  Border Gateway Protocol
    bridge         :  a layer2 bridge
    clag           :  Multi-Chassis Link Aggregation
    commit         :  apply the commit buffer to the system
    configuration  :  settings, configuration state, etc
    counters       :  net show counters
    debugs         :  Debugs
    dhcp-snoop     :  DHCP snooping for IPv4
    dhcp-snoop6    :  DHCP snooping for IPv6
    dot1x          :  Configure, Enable, Delete or Show IEEE 802.1X EAPOL
    evpn           :  Ethernet VPN
    hostname       :  local hostname
    igmp           :  Internet Group Management Protocol
    interface      :  An interface, such as swp1, swp2, etc.
    ip             :  Internet Protocol version 4/6
    ipv6           :  Internet Protocol version 6
    lldp           :  Link Layer Discovery Protocol
    mpls           :  Multiprotocol Label Switching
    mroute         :  Static unicast routes in MRIB for multicast RPF lookup
    msdp           :  Multicast Source Discovery Protocol
    neighbor       :  A BGP, OSPF, PIM, etc neighbor
    ospf           :  Open Shortest Path First (OSPFv2)
    ospf6          :  Open Shortest Path First (OSPFv3)
    package        :  A Cumulus Linux package name
    pbr            :  Policy Based Routing
    pim            :  Protocol Independent Multicast
    port-mirror    :  port-mirror
    port-security  :  Port security
    ptp            :  Precision Time Protocol
    roce           :  Enable RoCE on all interfaces, default mode is lossless
    rollback       :  revert to a previous configuration state
    route          :  EVPN route information
    route-map      :  Route-map
    snmp-server    :  Configure the SNMP server
    system         :  System
    time           :  Time
    version        :  Version number
    vrf            :  Virtual routing and forwarding
    vrrp           :  Virtual Router Redundancy Protocol

Configuration Management Commands

The NVUE configuration management commands manage and apply configurations.

Command
Description
nv config applyApplies the pending configuration to become the applied configuration.
You can also use these prompt options:
  • --y or --assume-yes to automatically reply yes to all prompts.
  • --assume-no to automatically reply no to all prompts.

Cumulus Linux applies but does not save the configuration; the configuration does not persist after a reboot.

You can also use these apply options:
--confirm applies the configuration change but you must confirm the applied configuration. If you do not confirm within ten minutes, the configuration rolls back automatically. You can change the default time with the apply --confirm <time> command. For example, apply --confirm 60 requires you to confirm within one hour.
--confirm-status shows the amount of time left before the automatic rollback.To save the pending configuration to the startup configuration automatically when you run nv config apply so that you do not have to run the nv config save command, enable auto save.
nv config detachDetaches the configuration from the current pending configuration and uses an integer to identify it; for example, 4. To list all the current detached pending configurations, run nv config diff <<press tab>.
nv config diff <revision> <revision>Shows differences between configurations, such as the pending configuration and the applied configuration, or the detached configuration and the pending configuration.
nv config history <revision>Shows the apply history for the revision.
nv config patch <nvue-file>Updates the pending configuration with the specified YAML configuration file.
nv config replace <nvue-file>Replaces the pending configuration with the specified YAML configuration file.
nv config saveOverwrites the startup configuration with the applied configuration by writing to the /etc/nvue.d/startup.yaml file. The configuration persists after a reboot.
nv config showShows the currently applied configuration in yaml format. This command also shows NVUE version information.
nv config show -o commandsShows the currently applied configuration commands.
nv config diff -o commandsShows differences between two configuration revisions.

You can use the NVUE configuration management commands to back up and restore configuration when you upgrade Cumulus Linux on the switch. Refer to Upgrading Cumulus Linux.

Action Commands

The NVUE action commands clear counters, and provide system reboot and TACACS user disconnect options.

CommandDescription
nv action clearProvides commands to clear interface counters, Qos buffers, BGP routes, OSPF interface counters, matches against a route map, and remove conflicts from protodown MLAG bonds.
nv action disconnect system aaa userDisconnects a TACACs user.
nv action reboot systemReboots the switch in the configured restart mode (fast, cold, or warm). You must specify the no-confirm option with this command.

List All NVUE Commands

To show the full list of NVUE commands, run nv list-commands. For example:

cumulus@switch:~$ nv list-commands
nv show platform
nv show platform hardware
nv show platform hardware component
nv show platform hardware component <component-id>
nv show platform software
nv show platform software installed
nv show platform software installed <installed-id>
nv show platform capabilities
nv show platform environment
...

You can show the list of commands for a command grouping. For example, to show the list of interface commands:

cumulus@switch:~$ nv list-commands interface
nv show interface
nv show interface <interface-id>
nv show interface <interface-id> ip
nv show interface <interface-id> ip address
nv show interface <interface-id> ip address <ip-prefix-id>
nv show interface <interface-id> ip gateway
nv show interface <interface-id> ip gateway <ip-address-id>
...

Use the tab key to get help for the command lists you want to see. For example, to show the list of command options available for swp1, run the nv list-commands interface swp1 command and press the tab key:

cumulus@switch:~$ nv list-commands interface swp1 <<press tab>>
acl            counters       link           ptp            storm-control  
bond           evpn           lldp           qos            synce          
bridge         ip             pluggable      router         tunnel

To view the NVUE command reference for Cumulus Linux, which describes all the NVUE CLI commands and provides examples, go to the NVUE Command Reference.

NVUE Configuration File

When you save network configuration, NVUE writes the configuration to the /etc/nvue.d/startup.yaml file.

You can edit or replace the contents of the /etc/nvue.d/startup.yaml file. NVUE applies the configuration in the /etc/nvue.d/startup.yaml file during system boot only if the nvue-startup.service is running. If this service is not running, the switch reboots with the same configuration that is running before the reboot.

To start nvue-startup.service:

cumulus@switch:~$ sudo systemctl enable nvue-startup.service
cumulus@switch:~$ sudo systemctl start nvue-startup.service

When you apply a configuration with nv config apply, NVUE also writes to underlying Linux files such as /etc/network/interfaces and /etc/frr/frr.conf. You can view these configuration files; however, do not manually edit them while using NVUE. If you need to configure certain network settings manually or use automation such as Ansible to configure the switch, see Configure NVUE to Ignore Linux Files below.

Configuration Files that NVUE Manages

NVUE manages the following configuration files:

FileDescription
/etc/network/interfacesConfigures the network interfaces available on your system.
/etc/frr/frr.confConfigures FRRouting.
/etc/cumulus/switchd.confConfigures switchd options.
/etc/cumulus/switchd.d/ptp.confConfigures PTP timestamping.
/etc/frr/daemonsConfigures FRRouting services.
/etc/hostsConfigures the hostname of the switch.
/etc/default/isc-dhcp-relay-defaultConfigures DHCP relay options.
/etc/dhcp/dhcpd.confConfigures DHCP server options.
/etc/hostnameConfigures the hostname of the switch.
/etc/cumulus/datapath/qos/qos_features.confConfigures QoS settings, such as traffic marking, shaping and flow control.
/etc/mlx/datapath/qos/qos_infra.confConfigures QoS platform specific configurations, such as buffer allocations and Alpha values.
/etc/cumulus/switchd.d/qos.confConfigures QoS settings.
/etc/cumulus/ports.confConfigures port breakouts.
/etc/ntp.confConfigures NTP settings.
/etc/ptp4l.confConfigures PTP settings.
/etc/snmp/snmpd.confConfigures SNMP settings.

Search for a Specific Configuration

To search for a specific portion of the NVUE configuration, run the nv config find <search string> command. The search shows all items above and below the search string. For example, to search the entire NVUE object model configuration for any mention of ptm:

cumulus@switch:~$ nv config find ptm
- set:
    router:
      ptm:
        enable: off

Configure NVUE to Ignore Linux Files

You can configure NVUE to ignore certain underlying Linux files when applying configuration changes. For example, if you push certain configuration to the switch using Ansible and Jinja2 file templates or you want to use custom configuration for a particular service such as PTP, you can ensure that NVUE never writes to those configuration files.

The following example configures NVUE to ignore the Linux /etc/ptp4l.conf file when applying configuration changes and saves the configuration so it persists after a reboot.

cumulus@switch:~$ nv set system config apply ignore /etc/ptp4l.conf
cumulus@switch:~$ nv config apply
cumulus@switch:~$ nv config save

Configure Auto Save

By default, when you run the nv config apply command to apply a configuration setting, NVUE applies the pending configuration to become the applied configuration but does not update the startup configuration file (/etc/nvue.d/startup.yaml). To save the applied configuration to the startup configuration so that the changes persist after the reboot, you must run the nv config save command. The auto save option lets you save the pending configuration to the startup configuration automatically when you run nv config apply so that you do not have to run the nv config save command.

To enable auto save:

cumulus@switch:~$ nv set system config auto-save enable on
cumulus@switch:~$ nv config apply

To disable auto save, run the nv set system config auto-save enable off command.

Add Configuration Apply Messages

When you run the nv config apply command, you can add a message that describes the configuration updates you make. You can see the message when you run the nv config history command.

To add a configuration apply message, run the nv config apply -m <message> command. If the message includes more than one word, enclose the message in quotes.

cumulus@switch:~$ nv config apply -m "this is my message"

Reset NVUE Configuration to Default Values

To reset the NVUE configuration on the switch back to the default values, run the following command:

cumulus@switch:~$ nv config apply empty

Example Configuration Commands

This section provides examples of how to configure a Cumulus Linux switch using NVUE commands.

Configure the System Hostname

The example below shows the NVUE commands required to change the hostname for the switch to leaf01:

cumulus@switch:~$ nv set system hostname leaf01
cumulus@switch:~$ nv config apply

Configure the System DNS Server

The example below shows the NVUE commands required to define the DNS server for the switch:

cumulus@switch:~$ nv set service dns mgmt server 192.168.200.1
cumulus@switch:~$ nv config apply

Configure an Interface

The example below shows the NVUE commands required to bring up swp1.

cumulus@switch:~$ nv set interface swp1
cumulus@switch:~$ nv config apply 

Configure a Bond

The example below shows the NVUE commands required to configure the front panel port interfaces swp1 thru swp4 to be slaves in bond0.

cumulus@switch:~$ nv set interface bond0 bond member swp1-4
cumulus@switch:~$ nv config apply

Configure a Bridge

The example below shows the NVUE commands required to create a VLAN-aware bridge that contains two switch ports (swp1 and swp2) and includes 3 VLANs; tagged VLANs 10 and 20 and an untagged (native) VLAN of 1.

With NVUE, there is a default bridge called br_default, which has no ports assigned to it. The example below configures this default bridge.

cumulus@switch:~$ nv set interface swp1-2 bridge domain br_default
cumulus@switch:~$ nv set bridge domain br_default vlan 10,20
cumulus@switch:~$ nv set bridge domain br_default untagged 1
cumulus@switch:~$ nv config apply

Configure MLAG

The example below shows the NVUE commands required to configure MLAG on leaf01. The commands:

cumulus@leaf01:~$ nv set interface bond1 bond member swp1
cumulus@leaf01:~$ nv set interface bond2 bond member swp2
cumulus@leaf01:~$ nv set interface bond1 bond mlag id 1
cumulus@leaf01:~$ nv set interface bond2 bond mlag id 2
cumulus@switch:~$ nv set interface bond1-2 bridge domain br_default 
cumulus@leaf01:~$ nv set interface peerlink bond member swp49-50
cumulus@leaf01:~$ nv set mlag mac-address 44:38:39:BE:EF:AA
cumulus@leaf01:~$ nv set mlag backup 10.10.10.2
cumulus@leaf01:~$ nv set mlag peer-ip linklocal
cumulus@leaf01:~$ nv config apply

Configure BGP Unnumbered

The example below shows the NVUE commands required to configure BGP unnumbered on leaf01. The commands:

cumulus@leaf01:~$ nv set router bgp autonomous-system 65101
cumulus@leaf01:~$ nv set router bgp router-id 10.10.10.1
cumulus@leaf01:~$ nv set vrf default router bgp neighbor swp51 remote-as external
cumulus@leaf01:~$ nv set vrf default router bgp address-family ipv4-unicast network 10.10.10.1/32
cumulus@leaf01:~$ nv config apply

Example Monitoring Commands

This section provides monitoring command examples.

Show Installed Software

The following example command lists the software installed on the switch:

cumulus@switch:~$ nv show platform software
Installed Software
=====================
Installed software            description                     package                      version                       
---------------------------   ---------------------------     --------------------------   -----------------------------
acpi                          displays information on ACPI    acpi                         1.7-1.1                       
                              devices                                                                                  
acpi-support-base             scripts for handling base       acpi-support-base            0.142-8                       
                              ACPI events such as the                                                                  
                              power button                                                                             
acpid                         Advanced Configuration and      acpid                        1:2.0.31-1                    
                              Power Interface event daemon                                                             
adduser                       add and remove users and        adduser                      3.118                         
                              groups                                                                                   
apt                           commandline package manager     apt                          1.8.2.3             
...

Show Interface Configuration

The following example command shows the running, applied, and pending swp1 interface configuration.

cumulus@leaf01:~$ nv show interface swp1
                          operational        applied   
------------------------  -----------------  ----------
type                      swp                swp       
[acl]                                                  
bridge                                                 
  [domain]                br_default         br_default
evpn                                                   
  multihoming                                          
    uplink                                   off       
ptp                                                    
  enable                                     off       
router                                                 
  adaptive-routing                                     
    enable                                   off       
  ospf                                                 
    enable                                   off       
  ospf6                                                
    enable                                   off       
  pbr                                                  
    [map]                                              
  pim            
...

Example Configuration Management Commands

This section provides examples of how to use the configuration management commands to apply, save, and detach configurations.

Apply and Save a Configuration

The following example command configures the front panel port interfaces swp1 thru swp4 to be slaves in bond0. The configuration is only in a pending configuration state. The configuration is not applied. NVUE has not yet made any changes to the running configuration.

cumulus@switch:~$ nv set interface bond0 bond member swp1-4

To apply the pending configuration to the running configuration, run the nv config apply command. The configuration does not persist after a reboot.

cumulus@switch:~$ nv config apply

To save the applied configuration to the startup configuration, run the nv config save command. This command overwrites the startup configuration with the applied configuration by writing to the /etc/nvue.d/startup.yaml file. The configuration persists after a reboot.

cumulus@switch:~$ nv config save

Detach a Pending Configuration

The following example configures the IP address of the loopback interface, then detaches the configuration from the current pending configuration. Cumulus Linux saves the detached configuration to a file with a numerical value to distinguish it from other pending configurations.

cumulus@switch:~$ nv set interface lo ip address 10.10.10.1/32
cumulus@switch:~$ nv config detach

View Differences Between Configurations

To view differences between configurations, run the nv config diff command.

To view differences between two detached pending configurations, run the nv config diff «tab» command to list all the current detached pending configurations, then run the nv config diff command with the pending configurations you want to diff.

cumulus@switch:~$ nv config diff <<press tab>>
1        2        3        4        5        6        applied  empty    startup
cumulus@switch:~$ nv config diff 2 3
- unset:
    system:
      wjh:
        channel:
          forwarding:
            trigger:
              l2:

To view differences between the applied configuration and the startup configuration:

cumulus@switch:~$ nv config diff applied startup
- unset:
    interface:
    system:
      wjh:

Replace and Patch a Pending Configuration

The following example replaces the pending configuration with the contents of the YAML configuration file called nv-02/13/2021.yaml located in the /deps directory:

cumulus@switch:~$ nv config replace /deps/nv-02/13/2021.yaml

The following example patches the pending configuration (runs the set or unset commands from the configuration in the nv-02/13/2021.yaml file located in the /deps directory):

cumulus@switch:~$ nv config patch /deps/nv-02/13/2021.yaml

A patch contains a single request to the NVUE service. Ordering of parameters within a patch is not guaranteed; NVUE does not support both unset and set commands for the same object in a single patch.

Date and Time

This section discusses how to:

NVUE Snippets

NVUE supports both traditional snippets and flexible snippets:

  • A snippet configures a single parameter associated with a specific configuration file.
  • You can only set or unset a snippet; you cannot modify, partially update, or change a snippet.
  • Setting the snippet value replaces any existing snippet value.
  • Cumulus Linux supports only one snippet for a configuration file.
  • Only certain configuration files support a snippet.
  • NVUE does not parse or validate the snippet content and does not validate the resulting file after you apply the snippet.
  • PATCH is only the method of applying snippets and does not refer to any snippet capabilities.
  • As NVUE supports more features and introduces new syntax, snippets and flexible snippets become invalid. Before you upgrade Cumulus Linux to a new release, review the What's New for new NVUE syntax and remove the snippet if NVUE introduces new syntax for the feature that the snippet configures.

Traditional Snippets

Use traditional snippets if you configure Cumulus Linux with NVUE commands, then want to configure a feature that does not yet support the NVUE Object Model. You create a snippet in yaml format, then add the configuration to the file with the nv config patch command.

The nv config patch command requires you to use the fully qualified path name to the snippet .yaml file; for example you cannot use ./ with the nv config patch command.

/etc/frr/frr.conf Snippets

Example 1: Top Level Configuration

NVUE does not support configuring BGP to peer across the default route. The following example configures BGP to peer across the default route from the default VRF:

  1. Create a .yaml file with the following traditional snippet:

    cumulus@switch:~$ sudo nano bgp_snippet.yaml
    - set:
        system:
          config:
            snippet:
              frr.conf: |
                ip nht resolve-via-default
    
  2. Run the following command to patch the configuration:

    cumulus@switch:~$ nv config patch bgp_snippet.yaml
    
  3. Run the nv config apply command to apply the configuration:

    cumulus@switch:~$ nv config apply
    
  4. Verify that the configuration exists at the end of the /etc/frr/frr.conf file:

    cumulus@switch:~$ sudo cat /etc/frr/frr.conf
    ...
    ! end of router ospf block
    !---- CUE snippets ----
    ip nht resolve-via-default
    

Example 2: Nested Configuration

NVUE does not support configuring EVPN route targets using auto derived values from RFC 8365. The following example configures BGP to enable RFC 8365 derived router targets:

  1. Create a .yaml file with the following traditional snippet:

    cumulus@switch:~$ sudo nano bgp_snippet.yaml
    - set:
        system:
          config:
            snippet:
              frr.conf: |
                router bgp 65517 vrf default
                  address-family l2vpn evpn
                    autort rfc8365-compatible
    

Make sure to use spaces not tabs; the parser expects spaces in yaml format.

  1. Run the following command to patch the configuration:

    cumulus@switch:~$ nv config patch bgp_snippet.yaml
    
  2. Run the nv config apply command to apply the configuration:

    cumulus@switch:~$ nv config apply
    
  3. Verify that the configuration exists at the end of the /etc/frr/frr.conf file:

    cumulus@switch:~$ sudo cat /etc/frr/frr.conf
    ...
    ! end of router bgp 65517 vrf default
    !---- CUE snippets ----
    router bgp 65517 vrf default
    address-family l2vpn evpn
    autort rfc8365-compatible
    

The traditional snippets for FRR write content to the /etc/frr/frr.conf file. When you apply the configuration and snippet with the nv config apply command, the FRR service goes through and reads in the /etc/frr/frr.conf file.

Example 3: EVPN Multihoming FRR Debugging

NVUE does not support configuring FRR debugging for EVPN multihoming. The following example configures FRR debugging:

  1. Create a .yaml file and add the following traditional snippet:

    cumulus@switch:~$ sudo nano mh_debug_snippet.yaml
    - set:
        system:
          config:
            snippet:
              frr.conf: |
                debug bgp evpn mh es
                debug bgp evpn mh route
                debug bgp zebra
                debug zebra evpn mh es
                debug zebra evpn mh mac
                debug zebra evpn mh neigh
                debug zebra evpn mh nh
                debug zebra vxlan
    
  2. Run the following command to patch the configuration:

    cumulus@switch:~$ nv config patch mh_debug_snippet.yaml
    
  3. Run the nv config apply command to apply the configuration:

    cumulus@switch:~$ nv config apply
    
  4. Verify that the configuration exists in the /etc/frr/frr.conf file:

    cumulus@switch:~$ sudo cat /etc/frr/frr.conf
    ...
    !---- NVUE snippets ----
    debug bgp evpn mh es
    debug bgp evpn mh route
    debug bgp zebra
    debug zebra evpn mh es
    debug zebra evpn mh mac
    debug zebra evpn mh neigh
    debug zebra evpn mh nh
    debug zebra vxlan
    

The traditional snippets for FRR write content to the /etc/frr/frr.conf file. When you apply the configuration and snippet with the nv config apply command, the FRR service goes through and reads in the /etc/frr/frr.conf file.

/etc/network/interfaces Snippets

MLAG Timers Example

NVUE supports configuring only one of the MLAG service timeouts (initDelay). The following example configures the MLAG peer timeout to 400 seconds:

  1. Create a .yaml file and add the following traditional snippet:

    cumulus@switch:~$ sudo nano mlag_snippet.yaml
    - set:
        system:
          config:
            snippet:
              ifupdown2_eni:
                peerlink.4094: |
                  clagd-args --peerTimeout 400
    
  2. Run the following command to patch the configuration:

    cumulus@switch:~$ nv config patch mlag_snippet.yaml
    
  3. Run the nv config apply command to apply the configuration:

    cumulus@switch:~$ nv config apply
    
  4. Verify that the configuration exists in the peerlink.4094 stanza of the /etc/network/interfaces file:

    cumulus@switch:~$ sudo cat /etc/network/interfaces
    ...
    auto peerlink.4094
    iface peerlink.4094
     clagd-args --peerTimeout 400
     clagd-peer-ip linklocal
     clagd-backup-ip 10.10.10.2
     clagd-sys-mac 44:38:39:BE:EF:AA
     clagd-args --initDelay 180
    ...
    

Traditional Bridge Example

NVUE does not support configuring traditional bridges. The following example configures a traditional bridge called br0 with the IP address 11.0.0.10/24. swp1, swp2 are members of the bridge.

  1. Create a .yaml file and add the following traditional snippet:

    cumulus@switch:~$ sudo nano bridge_snippet.yaml
    - set:
        system:
         config:
           snippet:
             ifupdown2_eni:
               eni_stanzas: |
                 auto br0
                 iface br0
                   address 11.0.0.10/24
                   bridge-ports swp1 swp2
                   bridge-vlan-aware no
    
  2. Run the following command to patch the configuration:

    cumulus@switch:~$ nv config patch bridge_snippet.yaml
    
  3. Run the nv config apply command to apply the configuration:

    cumulus@switch:~$ nv config apply
    
  4. Verify that the configuration exists at the end of the /etc/network/interfaces file:

    cumulus@switch:~$ sudo cat /etc/network/interfaces
    ...
    auto br0
    iface br0
      address 11.0.0.10/24
      bridge-ports swp1 swp2
      bridge-vlan-aware no
    

VLAN-aware RSTP Timers Example

NVUE does not support configuring RSTP timers on VLAN-aware bridges. The following example configures non-default RSTP timers for the NVUE default bridge br_default:

  1. Create a .yaml file and add the following traditional snippet:

    cumulus@switch:~$ sudo nano vlan-aware_bridge_snippet.yaml
    - set:
        system:
          config:
            snippet:
              ifupdown2_eni:
                br_default: |
                  mstpctl-maxage 10
                  mstpctl-hello 1
                  mstpctl-fdelay 8
    
  2. Run the following command to patch the configuration:

    cumulus@switch:~$ nv config patch vlan-aware_bridge_snippet.yaml
    
  3. Run the nv config apply command to apply the configuration:

    cumulus@switch:~$ nv config apply
    
  4. Verify that the configuration exists at the end of the /etc/network/interfaces file:

    cumulus@switch:~$ sudo cat /etc/network/interfaces
    ...
    auto br_default
    iface br_default
        mstpctl-maxage 10
        mstpctl-hello 1
        mstpctl-fdelay 8
    ...
    

/etc/cumulus/switchd.conf Snippets

NVUE does not provide options to configure link flap detection settings. The following example configures the link flap window to 10 seconds and the link flap threshold to 5 seconds:

  1. Create a .yaml file and add the following traditional snippet:

    cumulus@switch:~$ sudo nano switchd_snippet.yaml
    - set:
        system:
          config:
            snippet:
              switchd.conf: |
                link_flap_window = 10
                link_flap_threshold = 5
    
  2. Run the following command to patch the configuration:

    cumulus@switch:~$ nv config patch switchd_snippet.yaml
    
  3. Run the nv config apply command to apply the configuration:

    cumulus@switch:~$ nv config apply
    
  4. Verify that the configuration exists at the end of the /etc/cumulus/switchd.conf file:

    cumulus@switch:~$ sudo cat /etc/cumulus/switchd.conf
    !---- NVUE snippets ----
    link_flap_window = 10
    link_flap_threshold = 5
    

/etc/cumulus/datapath/traffic.conf Snippets

To add data path configuration for the Cumulus Linux switchd module that NVUE does not yet support, create a traffic.conf snippet.

The following example creates a file called traffic_conf_snippet.yaml and enables the resilient hash setting.

  1. Create a .yaml file and add the following traditional snippet:

    cumulus@switch:~$ sudo nano traffic_conf_snippet.yaml
    - set:
        system:
          config:
            snippet:
              traffic.conf: |
                resilient_hash_enable = TRUE
    
  2. Run the following command to patch the configuration:

    cumulus@switch:~$ nv config patch traffic_conf_snippet.yaml
    
  3. Run the nv config apply command to apply the configuration:

    cumulus@switch:~$ nv config apply
    
  4. Verify that the configuration exists at the end of the /etc/cumulus/datapath/traffic.conf file:

    cumulus@switch:~$ sudo cat /etc/cumulus/datapath/traffic.conf
    ...
    !---- NVUE snippets ----
    resilient_hash_enable = TRUE
    

/etc/snmp/snmpd.conf Snippets

To add Cumulus Linux SNMP agent configuration not yet available with NVUE commands, create an snmpd.conf snippet.

The following example creates a file called snmpd.conf_snippet.yaml, and sets the read only community string and the listening address to run in the mgmt VRF.

SNMP snippets do not take effect unless you first enable SNMP with the NVUE nv set service snmp-server enable on and nv set service snmp-server listening-address commands (or with the equivalent REST API methods).

  1. Create a .yaml file and add the following traditional snippet:

    cumulus@switch:~$ sudo nano snmpd.conf_snippet.yaml
    - set:
        system:
          config:
            snippet:
              snmpd.conf: |
                rocommunity cumuluspassword default
                agentaddress udp:@mgmt:161
    
  2. Run the following command to patch the configuration:

    cumulus@switch:~$ nv config patch snmpd.conf_snippet.yaml
    
  3. Run the nv config apply command to apply the configuration:

    cumulus@switch:~$ nv config apply
    
  4. Verify that the configuration exists at the end of the /etc/snmp/snmpd.conf file:

    cumulus@switch:~$ sudo cat /etc/snmp/snmpd.conf
    ...
    !---- NVUE SNMP Server Snippets ----
    rocommunity cumuluspassword default
    agentaddress udp:@mgmt:161
    

/etc/ssh/sshd_config Snippets

To add SSH service configuration not yet available with NVUE commands, create an sshd_config snippet.

The following example creates a file called sshd_config_snippet.yaml to allow root login and enable X11 forwarding for all users except user anoncvs. The snippet also disables TCP forwarding for the anoncvs user and runs the cvs server command when anoncvs logs in.

  1. Create a .yaml file and add the following traditional snippet:

    cumulus@switch:~$ sudo nano sshd_config_snippet.yaml
    - set:
        system:
          config:
            snippet:
              sshd_config: |
                PermitRootLogin yes
                X11Forwarding yes
                Match User anoncvs
                   X11Forwarding no
                   AllowTcpForwarding no
                   ForceCommand cvs server
    
  2. Run the following command to patch the configuration:

    cumulus@switch:~$ nv config patch sshd_config_snippet.yaml
    
  3. Run the nv config apply command to apply the configuration:

    cumulus@switch:~$ nv config apply
    
  4. Verify that the configuration exists at the end of the /etc/ssh/sshd_config file:

    cumulus@switch:~$ sudo cat /etc/ssh/sshd_config
    ...
    !---- NVUE snippets ----
    PermitRootLogin yes
    X11Forwarding yes
    Match User anoncvs
       X11Forwarding no
       AllowTcpForwarding no
       ForceCommand cvs server
    

Flexible Snippets

Flexible snippets are an extension of traditional snippets that let you manage any text file on the system.

The account you use through the CLI or the REST API to configure and manage flexible snippets must be in the sudo group, which includes the NVUE system-admin role, or you must be the root user.

Files NVUE Manages

You can use flexible snippets to add configuration to the following files that NVUE manages:

Filename
Description
/etc/cumulus/csmgrdConfiguration file for csmgrctl commands.
/etc/default/isc-dhcp-relay-<VRF>Configuration file for DHCP relay. Changes to this file require a dhcrelay@<VRF>.service restart.
/etc/resolv.confConfiguration file for DNS resolution.
/etc/hostsConfiguration file for the hostname of the switch.
/etc/default/isc-dhcp-server-<VRF>Configuration file for DHCP servers. Changes to this file require a dhcpd@<VRF>.service restart.
/etc/default/isc-dhcp-server6-<VRF>Configuration file for DHCP servers for IPv6. Changes to this file require a dhcpd6@<VRF>.service restart
/etc/dhcp/dhcpd-<VRF>.confConfiguration file for the dhcpd service. Changes to this file require a dhcpd@<VRF>.service restart
/etc/dhcp/dhcpd6-<VRF>.confConfiguration file for the dhcpd service for IPv6. Changes to this file require a dhcpd6@<VRF>.service restart
/etc/ntp.confConfiguration file for NTP servers. Changes to this file require an ntp service restart.
/etc/default/isc-dhcp-relay6-<VRF>Configuration file for DHCP relay for IPv6. Changes to this file require a dhcrelay6@<VRF>.service restart.
/etc/snmp/snmpd.confConfiguration file for SNMP. Changes to this file require an snmpd restart.
/etc/cumulus/datapath/traffic.confConfiguration file for forwarding table profiles. Changes to this file require a switchd restart.
/etc/cumulus/switchd.confConfiguration file for switchd. Changes to this file require a switchd restart.

Flexible snippets do not support:

Use caution when creating flexible snippets:

  • If you configure flexible snippets incorrectly, they might impact switch functionality. For example, even though flexible snippet validation allows you to only add textual content, Cumulus Linux does not prevent you from creating a flexible snippet that adds to sensitive text files, such as /boot/grub.cfg and /etc/fstab or add corrupt contents. Such snippets might render the switch unusable or create a potential security vulnerability (the NVUE service (nvued) runs with superuser privileges).
  • Do not manually update configuration files to which you add flexible snippets.
  • Any sensitive data in plain text (such as passwords) appears in the NVUE-managed configuration files as plain text.

Create a Flexible Snippet

To create a flexible snippet:

  1. Create a file in yaml format and add each flexible snippet you want to apply in the format shown below. NVUE appends the flexible snippet at the end of an existing file. If the file does not exist, NVUE creates the file, then adds the content.

    cumulus@leaf01:mgmt:~$ sudo nano <filename>.yaml>
    - set:
        system:
         config:
           snippet:
             <snippet-name>:
               file: "<filename>"
               permissions: "<umask-permissions>"
               content: |
                 # This is my content
               services:
                  <name>:
                    service: <service-name>
                    action: <action>
    
    • You can only set the umast permissions to a new file that you create. Adding the permissions: line is optional. The default umask persmissions are 644.
    • You can add a service with an action, such as start, restart, or stop. Adding the services: lines is optional; however, if you add the service: line, you must specify at least one service.
  2. Run the following command to patch the configuration:

    cumulus@switch:~$ nv config patch <filename>.yaml>
    
  3. Run the nv config apply command to apply the configuration:

    cumulus@switch:~$ nv config apply
    
  4. Verify the patched configuration.

The nv config patch command requires you to use the fully qualified path name to the snippet .yaml file; for example you cannot use ./ with the nv config patch command.

Flexible Snippet Examples

The following example flexible snippet called crontab-flex-snippet appends the single line @daily /opt/utils/run-backup.sh to the existing /etc/crontab file, then restarts the cron service.

cumulus@leaf01:mgmt:~$ sudo nano crontab-flex-snippet.yaml
- set:
    system:
      config:
        snippet:
          crontab-flex-snippet:
            file: "/etc/crontab"
            content: |
              @daily /opt/utils/run-backup.sh
            services:
              schedule:
                service: cron
                action: restart

The following example flexible snippet called apt-flex-snippet creates a new file /etc/apt/sources.list.d/microsoft-prod.list with 0644 permissions and adds multi-line text:

cumulus@leaf01:mgmt:~$ sudo nano apt-flex-snippet.yaml
- set:
    system:
      config:
        snippet:
          apt-flexible-snippet:
            file: "/etc/apt/sources.list.d/microsoft-prod.list"
            content: |
              # Adding Microsoft SQL Server Sources
              deb [arch=amd64] https://packages.microsoft.com/debian/10/prod buster main
            permissions: "0644"

The following flexible snippet called lldp_config_snipppet disables LLDP on swp1 and swp2 using the configure system interface pattern-blacklist command:

cumulus@leaf01:mgmt:~$ sudo nano lldp_config_snipppet.yaml
- set:
    system:
      config:
        snippet:
          lldp-interfaces-config:
            file: "/etc/lldpd.d/lldp-interfaces.conf"
            content: |
              configure system interface pattern-blacklist swp1,swp2
              services:
                lldp:
                  service: lldpd
                  action: restart

The following flexible snippet disables LLDP on swp1 and swp2 using the system interface pattern keyword:

cumulus@leaf01:mgmt:~$ sudo nano lldp_config_snipppet.yaml
- set:
    system:
      config:
        snippet:
          lldp-interfaces-config:
            file: "/etc/lldpd.d/lldp-interfaces.conf"
            content: |
              configure system interface pattern eth*,swp*,!swp1,!swp2
            services:
              lldp:
                service: lldpd
                action: restart

After you patch and apply the configuration above, the snippet creates a new file in the /etc/lldp.d directory, then restarts the lldpd service to stop LLDP transmitting and receiving on swp1 and swp2. Other interfaces continue to participate in LLDP.

If you try to apply a flexible snippet to a file that NVUE does not allow, you see an error message similar to the following:

cumulus@leaf01:mgmt:~$ nv config apply
Invalid config [rev_id: 8]
  Flexible snippets are not allowed to be configured on the file '/etc/cumulus/ports.conf’.
  Flexible snippets are not allowed to be configured on the file '/etc/cumulus/ports_width.conf’.

If you try to apply a flexible snippet to a file that supports traditional snippets, you see an error message similar to the following:

cumulus@leaf01:mgmt:~$ nv config apply
Invalid config [rev_id: 1]
  Flexible snippet cannot be used to modify the file '/etc/ssh/sshd_config'. Traditional snippets (for e.g., 'sshd_config') are supported on this file. Consult NVIDIA NVUE documentation for further information on snippets.

You can also create a flexible snippet with the REST API. See NVUE API.

Remove a Snippet

To remove a traditional or flexible snippet, edit the snippet’s .yaml file to change set to unset, then patch and apply the configuration. Alternatively, you can use the REST API DELETE and PATCH methods.

The following example removes the MLAG timer traditional snippet created above to configure the MLAG peer timeout:

  1. Edit the mlag_snippet.yaml file to change set to unset:

    cumulus@switch:~$ sudo nano mlag_snippet.yaml
    - unset:
        system:
          config:
            snippet:
              ifupdown2_eni:
    
  2. Run the following command to patch the configuration:

    cumulus@switch:~$ nv config patch mlag_snippet.yaml
    
  3. Run the nv config apply command to apply the configuration:

    cumulus@switch:~$ nv config apply
    
  4. Verify that the peer timeout parameter no longer exists in the peerlink.4094 stanza of the /etc/network/interfaces file:

    cumulus@switch:~$ sudo cat /etc/network/interfaces
    ...
    auto peerlink.4094
    iface peerlink.4094
     clagd-peer-ip linklocal
     clagd-backup-ip 10.10.10.2
     clagd-sys-mac 44:38:39:BE:EF:AA
     clagd-args --initDelay 180
    ...
    

Setting the Date and Time

This section discusses how to set the time zone, and how to set the date and time on the software clock on the switch. To configure NTP, see Network Time Protocol - NTP. To configure PTP, see Precision Time Protocol - PTP.

Setting the time zone, and the date and time on the software clock requires root privileges; use sudo.

Set the Time Zone

You can use one of these methods to set the time zone on the switch:

Run the nv set system timezone <timezone> command. To see all the available time zones, run nv set system timezone and press the Tab key. The following example sets the time zone to US/Eastern:

cumulus@switch:~$ nv set system timezone US/Eastern
cumulus@switch:~$ nv config apply
  1. In a terminal, run the following command:

    cumulus@switch:~$ sudo dpkg-reconfigure tzdata
    
  2. Follow the on screen menu options to select the geographic area and region.

For more information, see the Debian System Administrator's Manual - Time.

  1. Edit the /etc/timezone file to add your desired time zone. You can see a list of valid time zones here.

    cumulus@switch:~$ sudo vi /etc/timezone
    US/Eastern
    
  2. Apply the new time zone:

    cumulus@switch:~$ sudo dpkg-reconfigure --frontend noninteractive tzdata
    
  3. Change /etc/localtime to reflect your current time zone:

    sudo ln -sf /usr/share/zoneinfo/US/Eastern /etc/localtime
    

Set the Date and Time

The switch contains a battery backed hardware clock that maintains the time while the switch powers off and between reboots. When the switch is running, the Cumulus Linux operating system maintains its own software clock.

During boot up, the switch copies the time from the hardware clock to the operating system software clock. The software clock takes care of all the timekeeping. During system shutdown, the switch copies the software clock back to the battery backed hardware clock.

You can set the date and time on the software clock with the date command. First, determine your current time zone:

cumulus@switch:~$ date +%Z

If you need to reconfigure the current time zone, refer to the instructions above.

To set the software clock according to the configured time zone:

cumulus@switch:~$ sudo date -s "Tue Jan 26 00:37:13 2021"

You can write the current value of the software clock to the hardware clock using the hwclock command:

cumulus@switch:~$ sudo hwclock -w

See man hwclock(8) for more information.

NVUE API

When you upgrade to Cumulus Linux 5.6 or later, the switch overwrites any manual configuration you performed by editing files in Cumulus Linux 5.5 or earlier, such as configuring the listening address, port, TLS, or certificate.

In addition to the CLI, NVUE supports a REST API. Instead of accessing Cumulus Linux using SSH, you can interact with the switch using an HTTP client, such as cURL or a web browser.

The nvued service provides access to the NVUE REST API. Cumulus Linux exposes the HTTP endpoint internally, which makes the NVUE REST API accessible locally within the Cumulus Linux switch. The NVUE CLI also communicates with the nvued service using internal APIs. To provide external access to the NVUE REST API, Cumulus Linux uses an HTTP reverse proxy server, and supports HTTPS and TLS connections from external REST API clients.

The following illustration shows the NVUE REST API architecture and illustrates how Cumulus Linux forwards the requests internally.

Supported HTTP Methods

The NVUE REST API supports the following methods:

Secure the API

The NVUE REST API supports HTTP basic authentication, and the same underlying authentication methods for username and password that the NVUE CLI supports. User accounts work the same on both the API and the CLI.

Certificates

Cumulus Linux includes a self-signed certificate and private key to use on the server so that it works out of the box. The switch generates the self-signed certificate and private key when it boots for the first time. The X.509 certificate with the public key is in /etc/ssl/certs/cumulus.pem and the corresponding private key is in /etc/ssl/private/cumulus.key.

NVIDIA recommends you use your own certificates and keys. Certificates must be in PEM format. For the steps to generate self-signed certificates and keys, and to install them on the switch, refer to the Ubuntu Certificates and Security documentation.

To use your own certificate chain:

  1. Import the certificate and private key onto the Cumulus Linux switch using secure channels, such as SCP or SFTP.
  2. Store the certificate and private key on the filesystem in a location of you choice or use the same location; for example, /etc/ssl/certs and /etc/ssl/private.
  3. Update the /etc/nginx/sites-enabled/nvue.conf file to set the ssl_certificate and the ssl_certificate_key values to your keys.
  4. Restart NGINX with the sudo systemctl restart nginx command.

API-only User

To create an API-only user without SSH permissions, use Linux group permissions. You can create the API-only user in the ZTP script.

# Create the dedicated automation user 
adduser --disabled-password --gecos "Automation User,,,," --shell /usr/bin/nologin automation

# Set the password
echo 'automation:password!' | chpasswd

# Add the user to nvapply group to make NVUE config changes
adduser automation nvapply

Control Plane ACLs

You can secure the API by configuring:

This example shows how to create ACLs to allow users from the management subnet and the local switch to communicate with the switch using REST APIs, and restrict all other access.

cumulus@switch:~$ nv set acl API-PROTECT type ipv4 
cumulus@switch:~$ nv set acl API-PROTECT rule 10 action permit
cumulus@switch:~$ nv set acl API-PROTECT rule 10 match ip .protocol tcp .dest-port 8765 .source-ip 192.168.200.0/24
cumulus@switch:~$ nv set acl API-PROTECT rule 10 remark "Allow the Management Subnet to talk to API"

cumulus@switch:~$ nv set acl API-PROTECT rule 20 action permit
cumulus@switch:~$ nv set acl API-PROTECT rule 20 match ip .protocol tcp .dest-port 8765 .source-ip 127.0.0.1
cumulus@switch:~$ nv set acl API-PROTECT rule 20 remark "Allow the local switch to talk to the API"

cumulus@switch:~$ nv set acl API-PROTECT rule 30 action deny
cumulus@switch:~$ nv set acl API-PROTECT rule 30 match ip .protocol tcp .dest-port 8765
cumulus@switch:~$ nv set acl API-PROTECT rule 30 remark "Block everyone else from talking to the API"

cumulus@switch:~$ nv set system control-plane acl API-PROTECT inbound

Supported Objects

The NVUE object model supports most features on the Cumulus Linux switch. The following list shows the supported objects. The NVUE API supports more objects within each of these objects. You can find a full listing of the supported API endpoints here.

High-level ObjectsDescription
aclAccess control lists.
bridgeBridge domain configuration.
evpnEVPN configuration.
interfaceInterface configuration.
mlagMLAG configuration.
nveNetwork virtualization configuration, such as VXLAN-specfic MLAG configuration and VXLAN flooding.
platformPlatform configuration, such as hardware and software components.
qosQoS RoCE configuration.
routerRouter configuration, such as router policies, global BGP and OSPF configuration, PBR, PIM, IGMP, VRR, and VRRP configuration.
serviceDHCP relays and server, NTP, PTP, LLDP, and syslog configuration.
systemGlobal system settings, such as the reserved routing table range for PBR and the reserved VLAN range for layer 3 VNIs, system login messages and switch reboot history.
vrfVRF configuration.

Use the API

The NVUE CLI and the REST API are equivalent in functionality; you can run all management operations from the REST API or from the CLI. The NVUE object model drives both the REST API and the CLI management operations. All operations are consistent; for example, the CLI nv show commands reflect any PATCH operation (create and update) you run through the REST API.

NVUE follows a declarative model, removing context-specific commands and settings. The structure of NVUE is like a big tree that represents the entire state of a Cumulus Linux instance. At the base of the tree are high level branches representing objects, such as router and interface. Under each of these branches are more branches. As you navigate through the tree, you gain a more specific context. At the leaves of the tree are actual attributes, represented as key-value pairs. The path through the tree is similar to a filesystem path.

Cumulus Linux enables the NVUE REST API by default. To disable the NVUE REST API, run the nv set system api state disabled command.

To use the NVUE REST API in Cumulus Linux 5.6, you must change the password for the cumulus user; otherwise you see 403 responses when you run commands.

API Port and Listening Address

This section shows how to:

The following example sets the port to 8888:

cumulus@switch:~$ nv set system api port 8888
cumulus@switch:~$ nv config apply

You can listen on multiple interfaces by specifying different listening addresses:

cumulus@switch:~$ nv set system api listening-address 10.10.10.1
cumulus@switch:~$ nv set system api listening-address 10.10.20.1
cumulus@switch:~$ nv config apply

The following example configures the listening address on eth0, which has IP address 172.0.24.0 and uses the management VRF by default:

cumulus@switch:~$ nv set system api listening-address 172.0.24.0
cumulus@switch:~$ nv config apply

The following example configures VRF BLUE on swp1, which has IP address 10.10.20.1, then sets the API listening address to the IP address for swp1 (configured for VRF BLUE).

cumulus@switch:~$ nv set interface swp1 ip address 10.10.10.1/24
cumulus@switch:~$ nv set interface swp1 ip vrf BLUE
cumulus@switch:~$ nv config apply

cumulus@switch:~$ nv set system api listening-address 10.10.10.1
cumulus@switch:~$ nv config apply

The following example sets the port to 8888:

cumulus@switch:~$ curl -u 'cumulus:cumulus' -k --request PATCH https://localhost:8765/nvue_v1/system/api?rev=2 -H 'Content-Type:application/json' -d '{"port": 8888 }'

You can listen on multiple interfaces by specifying different listening addresses. The following example sets localhost, interface address 10.10.10.1, and 10.10.20.1 as listen-addresses.

cumulus@switch:~$ curl -u 'cumulus:cumulus' -k --request PATCH https://localhost:8765/nvue_v1/system/api/listening-address?rev=2 -H 'Content-Type:application/json' -d '{ "localhost": {}, "10.10.10.1": {}, "10.10.20.1": {}}'

The following example configures the listening address on eth0, which has IP address 172.0.24.0 and uses the management VRF by default:

cumulus@switch:~$ curl -u 'cumulus:cumulus' -k --request PATCH https://localhost:8765/nvue_v1/system/api/listening-address?rev=2 -H 'Content-Type:application/json' -d '{"172.0.24.0": {}}'

Show NVUE REST API Information

To show REST API port configuration, state (enabled or disabled), and connection information:

Run the nv show system api command:

cumulus@switch:~$ nv show system api
                  operational     applied
--------------    -----------     -------
port                 8888         8888     
state                enabled      enabled  
[listening-address]  localhost    localhost
connections
  accepted        31
  active          1
  handled         33
  reading         0
  requests        28
  waiting         0
  writing         1

To show connection information only, run the nv show system api connections command:

cumulus@switch:~$ nv show system api connections
          operational  applied
--------  -----------  -------
accepted  31                  
active    1                   
handled   33                  
reading   0                   
requests  28                   
waiting   0                   
writing   1     

To show the configured listening address, run the nv show system api listening-address command:

cumulus@switch:~$ nv show system api listening-address

---------
localhost
cumulus@switch:~$ curl -u 'cumulus:cumulus' -k --request GET https://localhost:8765/nvue_v1/system/api?rev=2 -H "accept: application/json"
{
  "state": "enabled",
  "port": 8765,
  "listening-address": {
    "10.10.10.1": {},
    "10.10.20.1": {}
  },
  "connections": {
    "active": 1,
    "accepted": 31,
    "handled": 0,
    "requests": 28,
    "reading": 0,
    "writing": 1,
    "waiting": 0
  }
}

To show the configured listening address:

cumulus@switch:~$ curl -u 'cumulus:cumulus' -k --request GET https://localhost:8765/nvue_v1/system/api/listening-address?rev=2 -H "accept: application/json"
{
  "10.10.10.1": {},
  "10.10.20.1": {}
}

Run cURL Commands

You can run the cURL commands from the command line. Use the username and password for the switch. For example:

cumulus@switch:~$ curl  -u 'cumulus:cumulus' --insecure https://127.0.0.1:8765/nvue_v1/interface
{
  "eth0": {
    "ip": {
      "address": {
        "192.168.200.12/24": {}
      }
    },
    "link": {
      "mtu": 1500,
      "state": {
        "up": {}
      },
      "stats": {
        "carrier-transitions": 2,
        "in-bytes": 184151,
        "in-drops": 0,
        "in-errors": 0,
        "in-pkts": 2371,
        "out-bytes": 117506,
        "out-drops": 0,
        "out-errors": 0,
        "out-pkts": 762
      }
...

API Use Cases

The following examples show the primary API uses cases.

View a Configuration

Use the following example to obtain the current applied configuration on the switch. Change the rev argument to view any revision. Possible options for the rev argument include startup, pending, operational, and applied.

cumulus@switch:~$ curl -k -u cumulus:cumulus -X GET "https://127.0.0.1:8765/nvue_v1/?rev=applied&filled=false"
"acl": {}, 
  "bridge": { 
    "domain": { 
      "br_default": { 
        "encap": "802.1Q", 
        "mac-address": "auto", 
        "multicast": { 
          "snooping": { 
            "enable": "off" 
          } 
        }, 
        "stp": { 
          "priority": 32768, 
          "state": { 
            "up": {} 
          } 
        }, 
        "type": "vlan-aware", 
        "untagged": 1, 
        "vlan": { 
          "10": { 
            "multicast": { 
...  
#!/usr/bin/env python3

import requests
from requests.auth import HTTPBasicAuth
import json
import time

auth = HTTPBasicAuth(username="cumulus", password="password")
nvue_end_point = "https://127.0.0.1:8765/nvue_v1"
mime_header = {"Content-Type": "application/json"}

if __name__ == "__main__":
    r = requests.get(url=nvue_end_point + "/?rev=applied&filled=false",
                     auth=auth,
                     verify=False)
    print("=======Current Applied Revision=======")
    print(json.dumps(r.json(), indent=2))
cumulus@switch:~$ nv config show
- set: 
    bridge: 
      domain: 
        br_default: 
          type: vlan-aware 
          vlan: 
            '10': 
              vni: 
                '10': {} 
            '20': 
              vni: 
                '20': {} 
            '30': 
              vni: 
                '30': {} 
    evpn: 
      enable: on 
    mlag: 
      backup: 
        10.10.10.2: {} 
      enable: on 
      init-delay: 10 
      mac-address: 44:38:39:BE:EF:AA 
... 

Replace an Entire Configuration

To replace an entire configuration:

  1. Create a new revision ID with a POST:

    cumulus@switch:~$ curl -u 'cumulus:cumulus' --insecure -X POST https://127.0.0.1:8765/nvue_v1/revision
    {
     "1": {
       "state": "pending",
       "transition": {
         "issue": {},
         "progress": ""
       }
     }
    }
    
  2. Record the revision ID. In the above example, the revision ID is "1".

  3. Do a root patch to delete the whole configuration.

    cumulus@switch:~$ curl -u 'cumulus:cumulus' -d '{}' -H 'Content-Type: application/json' -k -X DELETE https://127.0.0.1:8765/nvue_v1/?rev=1
    {}
    
  4. Do a root patch to update the switch with the new configuration.

    cumulus@switch:~$ curl -u 'cumulus:cumulus' -d '{
       "system": {
         "hostname": "switch01"
       },
       "bridge": {
         "domain": {
           "br_default": {
             "type": "vlan-aware",
             "vlan": {
               "10": {
                 "vni": {
                   "10": {}
                   }
                 },
               "20": {
                 "vni": {
                   "20": {}
                 }
               },
               "30": {
                 "vni": {
                   "30": {}
                 }
               }
             }
           }
         }
       },
       "interface": {
         "eth0": {
           "ip": {
             "address": {
               "192.168.200.6/24": {}
             },
             "vrf": "mgmt"
           },
           "type": "eth"
         },
         "lo": {
           "ip": {
             "address": {
               "10.10.10.1/32": {}
             }
           },
           "type": "loopback"
         },
         "swp51": {
           "link": {
             "state": {
               "up": {}
             }
           },
           "type": "swp"
         },
         "swp52": {
           "link": {
             "state": {
               "up": {}
             }
           },
           "type": "swp"
         },
         "swp53": {
           "link": {
             "state": {
               "up": {}
             }
           },
           "type": "swp"
         },
         "swp54": {
           "link": {
             "state": {
               "up": {}
             }
           },
           "type": "swp"
         }
       },
       "mlag": {
         "backup": {
           "10.10.10.2": {}
         },
         "enable": "on",
         "init-delay": 10,
         "mac-address": "44:38:39:BE:EF:AA",
         "peer-ip": "linklocal",
         "priority": 1000
       }
       "router": {
         "bgp": {
           "enable": "on"
         },
         "vrr": {
           "enable": "on"
         }
       },
       "service": {},
       "vrf": {
         "mgmt": {
           "router": {
             "static": {
               "0.0.0.0/0": {
                 "address-family": "ipv4-unicast",
                 "via": {
                   "192.168.200.1": {
                     "type": "ipv4-address"
                   }
                 }
               }
             }
           }
         }
       }
     }' -H 'Content-Type: application/json' -k -X PATCH https://127.0.0.1:8765/nvue_v1/?rev=1
    {}
    
  5. Apply the changes with a PATCH to the revision changeset.

    cumulus@switch:~$ curl -u 'cumulus:cumulus' -H 'Content-Type:application/json' -d '{"state": "apply", "auto-prompt": {"ays": "ays_yes"}}' -k -X PATCH https://127.0.0.1:8765/nvue_v1/revision/1
    {
      "state": "apply",
      "transition": {
        "issue": {},
        "progress": ""
      }
    }
    
    cumulus@switch:~$ nv config apply
    
  6. Review the status of the apply and the configuration:

    cumulus@switch:~$ curl -u 'cumulus:cumulus' -k -X GET https://127.0.0.1:8765/nvue_v1/revision/1
    {
      "state": "applied",
      "transition": {
        "issue": {},
        "progress": ""
      }
    }
    
    cumulus@switch:~$ curl -u 'cumulus:cumulus' --insecure https://127.0.0.1:8765/nvue_v1/system
    {
     "build": "Cumulus Linux 5.4.0",
     "hostname": "switch01",
     "timezone": "Etc/UTC",
     "uptime": 763
    }
    cumulus@switch:~$ curl -u 'cumulus:cumulus' --insecure https://127.0.0.1:8765/nvue_v1/bridge/domain/br_default/vlan/10
    {
     "multicast": {
       "snooping": {
         "querier": {
           "source-ip": "0.0.0.0"
         }
       }
     },
     "ptp": {
       "enable": "off"
     },
     "vni": {
       "10": {
         "flooding": {
           "enable": "auto"
         },
         "mac-learning": "off"
       }
     }
    
    #!/usr/bin/env python3
    
    import requests
    from requests.auth import HTTPBasicAuth
    import json
    import time
    
    auth = HTTPBasicAuth(username="cumulus", password="password")
    nvue_end_point = "https://127.0.0.1:8765/nvue_v1"
    mime_header = {"Content-Type": "application/json"}
    
    DUMMY_SLEEP = 5  # In seconds
    POLL_APPLIED = 1  # in seconds
    RETRIES = 10
    
    def print_request(r: requests.Request):
        print("=======Request=======")
        print("URL:", r.url)
        print("Headers:", r.headers)
        print("Body:", r.body)
    
    def print_response(r: requests.Response):
        print("=======Response=======")
        print("Headers:", r.headers)
        print("Body:", json.dumps(r.json(), indent=2))
    
    def create_nvue_changest():
        r = requests.post(url=nvue_end_point + "/revision",
                          auth=auth,
                          verify=False)
        print_request(r.request)
        print_response(r)
        response = r.json()
        changeset = response.popitem()[0]
        return changeset
    
    def apply_nvue_changeset(changeset):
        apply_payload = {"state": "apply", "auto-prompt": {"ays": "ays_yes"}}
        url = nvue_end_point + "/revision/" + requests.utils.quote(changeset,
                                                                   safe="")
        r = requests.patch(url=url,
                           auth=auth,
                           verify=False,
                           data=json.dumps(apply_payload),
                           headers=mime_header)
        print_request(r.request)
        print_response(r)
    
    def is_config_applied(changeset) -> bool:
        # Check if the configuration was indeed applied
        global RETRIES
        global POLL_APPLIED
        retries = RETRIES
        while retries > 0:
            r = requests.get(url=nvue_end_point + "/revision/" + requests.utils.quote(changeset, safe=""),
                             auth=auth,
                             verify=False)
            response = r.json()
            print(response)
            if response["state"] == "applied":
                return True
            retries -= 1
            time.sleep(POLL_APPLIED)
    
        return False
    
    def apply_new_config(path,payload):
        # Create a new revision ID
        changeset = create_nvue_changest()
        print("Using NVUE Changeset: '{}'".format(changeset))
    
        # Delete existing configuration
        query_string = {"rev": changeset}
        r = requests.delete(url=nvue_end_point + path,
                           auth=auth,
                           verify=False,
                           params=query_string,
                           headers=mime_header)
        print_request(r.request)
        print_response(r)
    
        # Patch the new configuration
        
        query_string = {"rev": changeset}
        r = requests.patch(url=nvue_end_point + path,
                           auth=auth,
                           verify=False,
                           data=json.dumps(payload),
                           params=query_string,
                           headers=mime_header)
        print_request(r.request)
        print_response(r)
    
        # Apply the changes to the new revision changeset
        apply_nvue_changeset(changeset)
    
        # Check if the changeset was applied
        is_config_applied(changeset)
    
    def nvue_get(path):
        r = requests.get(url=nvue_end_point + path,
                         auth=auth,
                         verify=False)
        print_request(r.request)
        print_response(r)
    
    if __name__ == "__main__":
        payload = {
          "system": {
            "hostname": "switch01"
          },
          "bridge": {
            "domain": {
              "br_default": {
                "type": "vlan-aware",
                "vlan": {
                  "10": {
                    "vni": {
                      "10": {}
                      }
                    },
                  "20": {
                    "vni": {
                      "20": {}
                    }
                  },
                  "30": {
                    "vni": {
                      "30": {}
                    }
                  }
                }
              }
            }
          },
          "interface": {
            "eth0": {
              "ip": {
                "address": {
                  "192.168.200.6/24": {}
                },
                "vrf": "mgmt"
              },
              "type": "eth"
            },
            "lo": {
              "ip": {
                "address": {
                  "10.10.10.1/32": {}
                }
              },
              "type": "loopback"
            },
            "swp51": {
              "link": {
                "state": {
                  "up": {}
                }
              },
              "type": "swp"
            },
            "swp52": {
              "link": {
                "state": {
                  "up": {}
                }
              },
              "type": "swp"
            },
            "swp53": {
              "link": {
                "state": {
                  "up": {}
                }
              },
              "type": "swp"
            },
            "swp54": {
              "link": {
                "state": {
                  "up": {}
                }
              },
              "type": "swp"
            }
          },
          "mlag": {
            "backup": {
              "10.10.10.2": {}
            },
            "enable": "on",
            "init-delay": 10,
            "mac-address": "44:38:39:BE:EF:AA",
            "peer-ip": "linklocal",
            "priority": 1000
          }
          "router": {
            "bgp": {
              "enable": "on"
            },
            "vrr": {
              "enable": "on"
            }
          },
          "service": {},
          "vrf": {
            "mgmt": {
              "router": {
                "static": {
                  "0.0.0.0/0": {
                    "address-family": "ipv4-unicast",
                    "via": {
                      "192.168.200.1": {
                        "type": "ipv4-address"
                      }
                    }
                  }
                }
              }
            }
          }
        }
        apply_new_config("/",payload)
        time.sleep(DUMMY_SLEEP)
        print("=====Verifying some of the configurations=====")
        nvue_get("/system")
        nvue_get("/bridge/domain/br_default/vlan/10")
    
    cumulus@switch:~$ nv show system
                operational          applied
    --------  -------------------  -------
    hostname  switch01             cumulus
    build     Cumulus Linux 5.4.0
    uptime    0:12:59
    timezone  Etc/UTC
    
    cumulus@switch:~$ nv show bridge domain br_default vlan 10
    
                     operational  applied  pending  description
    ---------------  -----------  -------  -------  ------------------------------------------------------
    [vni]            10           10       10       L2 VNI
    multicast
      snooping
        querier
          source-ip  0.0.0.0      0.0.0.0  0.0.0.0  Source IP to use when sending IGMP/MLD queries.
    ptp
      enable         off          off      off      Turn the feature 'on' or 'off'.  The default is 'off'.
    

Make a Configuration Change

To make a configuration change:

  1. Create a new revision ID with a POST:

    cumulus@switch:~$ curl -u 'cumulus:cumulus' --insecure -X POST https://127.0.0.1:8765/nvue_v1/revision
    {
       "2": {
       "state": "pending",
       "transition": {
         "issue": {},
         "progress": ""
       }
     }
    }
    
  2. Record the revision ID. In the above example, the revision ID is "2".

  3. Make the change with a PATCH and link it to the revision ID:

    cumulus@switch:~$ curl -u 'cumulus:cumulus' -d '{"99.99.99.99/32": {}}' -H 'Content-Type: application/json' -k -X PATCH https://127.0.0.1:876nvue_v1/interface/lo/ip/address?rev=2
    {
      "99.99.99.99/32": {}
    }
    
    cumulus@switch:~$ nv set interface lo ip address 99.99.99.99/32
    
  4. Apply the changes with a PATCH to the revision changeset:

    cumulus@switch:~$ curl -u 'cumulus:cumulus' -H 'Content-Type:application/json' -k -X PATCH https://127.0.0.1:8765/nvue_v1/revision/2
    {
      "state": "apply",
      "transition": {
        "issue": {},
        "progress": ""
      }
    }
    
    cumulus@switch:~$ nv config apply
    
  5. Review the status of the apply and the configuration:

    cumulus@switch:~$ curl -u 'cumulus:cumulus' -k -X GET https://127.0.0.1:8765/nvue_v1/revision/2
    {
      "state": "applied",
      "transition": {
        "issue": {},
        "progress": ""
      }
    }
    
    cumulus@switch:~$ curl -u 'cumulus:cumulus' --insecure https://127.0.0.1:8765/nvue_v1/interface/lo/ip/address
    {
      "127.0.0.1/8": {},
      "99.99.99.99/32": {},
      "::1/128": {}
    }
    
    #!/usr/bin/env python3
    
    import requests
    from requests.auth import HTTPBasicAuth
    import json
    import time
    
    auth = HTTPBasicAuth(username="cumulus", password="password")
    nvue_end_point = "https://127.0.0.1:8765/nvue_v1"
    mime_header = {"Content-Type": "application/json"}
    
    DUMMY_SLEEP = 5  # In seconds
    POLL_APPLIED = 1  # in seconds
    RETRIES = 10
    
    def print_request(r: requests.Request):
        print("=======Request=======")
        print("URL:", r.url)
        print("Headers:", r.headers)
        print("Body:", r.body)
    
    def print_response(r: requests.Response):
        print("=======Response=======")
        print("Headers:", r.headers)
        print("Body:", json.dumps(r.json(), indent=2))
    
    def create_nvue_changest():
        r = requests.post(url=nvue_end_point + "/revision",
                          auth=auth,
                          verify=False)
        print_request(r.request)
        print_response(r)
        response = r.json()
        changeset = response.popitem()[0]
        return changeset
    
    def apply_nvue_changeset(changeset):
        apply_payload = {"state": "apply", "auto-prompt": {"ays": "ays_yes"}}
        url = nvue_end_point + "/revision/" + requests.utils.quote(changeset,
                                                                   safe="")
        r = requests.patch(url=url,
                           auth=auth,
                           verify=False,
                           data=json.dumps(apply_payload),
                           headers=mime_header)
        print_request(r.request)
        print_response(r)
    
    def is_config_applied(changeset) -> bool:
        # Check if the configuration was indeed applied
        global RETRIES
        global POLL_APPLIED
        retries = RETRIES
        while retries > 0:
            r = requests.get(url=nvue_end_point + "/revision/" + requests.utils.quote(changeset, safe=""),
                             auth=auth,
                             verify=False)
            response = r.json()
            print(response)
            if response["state"] == "applied":
                return True
            retries -= 1
            time.sleep(POLL_APPLIED)
    
        return False
    
    def apply_new_config(path,payload):
        # Create a new revision ID
        changeset = create_nvue_changest()
        print("Using NVUE Changeset: '{}'".format(changeset))
    
        # Delete existing configuration
        query_string = {"rev": changeset}
        r = requests.delete(url=nvue_end_point + path,
                           auth=auth,
                           verify=False,
                           params=query_string,
                           headers=mime_header)
        print_request(r.request)
        print_response(r)
    
        # Patch the new configuration
        
        query_string = {"rev": changeset}
        r = requests.patch(url=nvue_end_point + path,
                           auth=auth,
                           verify=False,
                           data=json.dumps(payload),
                           params=query_string,
                           headers=mime_header)
        print_request(r.request)
        print_response(r)
    
        # Apply the changes to the new revision changeset
        apply_nvue_changeset(changeset)
    
        # Check if the changeset was applied
        is_config_applied(changeset)
    
    def nvue_get(path):
        r = requests.get(url=nvue_end_point + path,
                         auth=auth,
                         verify=False)
        print_request(r.request)
        print_response(r)
    
    if __name__ == "__main__":
        payload = {
            "99.99.99.99/32": {}
        }
        apply_new_config("/interface/lo/ip/address",payload)
        time.sleep(DUMMY_SLEEP)
        nvue_get("/interface/lo/ip/address")
    
    cumulus@switch:~$ nv show interface lo ip address
       
    -------------
    99.99.99.99/32
    127.0.0.1/8
    ::1/128
    

Troubleshoot Configuration Changes

When a configuration change fails, you see an error in the change request.

Configuration Fails Because of a Dependency

If you stage a configuration but it fails because of a dependency, the failure shows the reason. In the following example, the change fails because the BGP router ID is not set.

cumulus@switch:~$ curl -u 'cumulus:cumulus' --insecure https://127.0.0.1:8765/nvue_v1/revision/6
{
  "state": "invalid",
  "transition": {
    "issue": {
      "0": {
        "code": "config_invalid",
        "data": {
          "location": "router.bgp.enable",
          "reason": "BGP requires router-id to be set globally or in the VRF.\n"
        },
        "message": "Config invalid at router.bgp.enable: BGP requires router-id to be set globally or in the VRF.\n",
        "severity": "error"
      }
    },
    "progress": "Invalid config"
  }
}

The staged configuration is missing router-id.

cumulus@switch:~$ curl -u 'cumulus:cumulus' --insecure https://127.0.0.1:8765/nvue_v1/vrf/default/router/bgp?rev=6
{
  "autonomous-system": 65999,
  "enable": "on"
}

Configuration Apply Fails with Warnings

In some cases, such as the first push with NVUE or if you change a file manually instead of using NVUE, you see a warning prompt and the apply fails.

cumulus@switch:~$ curl -u 'cumulus:cumulus' --insecure -X GET https://127.0.0.1:8765/nvue_v1/revision/6
{
  "6": {
    "state": "ays_fail",
    "transition": {
      "issue": {
        "0": {
          "code": "client_timeout",
          "data": {},
          "message": "Timeout while waiting for client response",
          "severity": "error"
        }
      },
      "progress": "Aborted apply after warnings"
    }
  }

To resolve this issue, observe the failures or errors, then inspect the configuration that you are trying to apply. After you resolve the errors, retry the API. If you prefer to overlook the errors and force an apply, add "auto-prompt":{"ays": "ays_yes"} to the configuration apply.

cumulus@switch:~$ curl -u 'cumulus:cumulus' -d '{"state":"apply","auto-prompt":{"ays": "ays_yes"}}' -H 'Content-Type:application/json' --insecure -X PATCH https://127.0.0.1:8765/nvue_v1/revision/6

Save a Configuration

To save an applied configuration change to the startup configuration file (/etc/nvue.d/startup.yaml) so that the changes persist after a reboot, use a PATCH to the applied revision with the save state.

cumulus@switch:~$ curl -u 'cumulus:cumulus' -k -X PATCH -d '{"state": "save", "auto-prompt": {"ays": "ays_yes"}}' -H 'Content-Type: application/json'  https://127.0.0.1:8765/nvue_v1/revision/applied 
{ 
  "state": "save",
  "transition": {
    "issue": {},
    "progress": ""
  }
}
#!/usr/bin/env python3

import requests
from requests.auth import HTTPBasicAuth
import json
import time

auth = HTTPBasicAuth(username="cumulus", password="password")
nvue_end_point = "https://127.0.0.1:8765/nvue_v1"
mime_header = {"Content-Type": "application/json"}

DUMMY_SLEEP = 5  # In seconds
POLL_APPLIED = 1  # in seconds
RETRIES = 10

def print_request(r: requests.Request):
    print("=======Request=======")
    print("URL:", r.url)
    print("Headers:", r.headers)
    print("Body:", r.body)

def print_response(r: requests.Response):
    print("=======Response=======")
    print("Headers:", r.headers)
    print("Body:", json.dumps(r.json(), indent=2))

def save_nvue_changeset():
    apply_payload = {"state": "save", "auto-prompt": {"ays": "ays_yes"}}
    url = nvue_end_point + "/revision/applied"
    r = requests.patch(url=url,
                       auth=auth,
                       verify=False,
                       data=json.dumps(apply_payload),
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

if __name__ == "__main__":
    save_nvue_changeset()
cumulus@switch:~$ nv config save
saved

Unset a Configuration Change

To unset a configuration change, use the null value to the key. For example, to delete vlan100 from a switch, use the following syntax:

cumulus@switch:~$ curl -u 'cumulus:cumulus' -d '{"vlan100":null}' -H 'Content-Type: application/json' --insecure -X PATCH https://127.0.0.1:8765/nvue_v1/interface/rev=4

When you unset a change, you must still use the PATCH action. The value indicates removal of the entry. The data is {"vlan100":null} with the PATCH action.

Use the API for Active Monitoring

The example below fetches the counters for interface swp1.

cumulus@switch:~$ curl -u 'cumulus:cumulus' -k -X GET https://127.0.0.1:8765/nvue_v1/interface/swp1/link/stats
{
  "carrier-transitions": 6,
  "in-bytes": 293771538,
  "in-drops": 0,
  "in-errors": 0,
  "in-pkts": 2321737,
  "out-bytes": 366068936,
  "out-drops": 0,
  "out-errors": 0,
  "out-pkts": 3536629
}
#!/usr/bin/env python3

import requests
from requests.auth import HTTPBasicAuth
import json
import time

auth = HTTPBasicAuth(username="cumulus", password="password")
nvue_end_point = "https://127.0.0.1:8765/nvue_v1"
mime_header = {"Content-Type": "application/json"}

if __name__ == "__main__":
    r = requests.get(url=nvue_end_point + "/interface/swp1/link/stats",
                     auth=auth,
                     verify=False)
    print("=======Interface swp1 Statistics=======")
    print(json.dumps(r.json(), indent=2))
cumulus@switch:~$ nv show interface swp1 link stats
                     operational  applied  pending  description
-------------------  -----------  -------  -------  ----------------------------------------------------------------------
carrier-transitions  6                              Number of times the interface state has transitioned between up and...
in-bytes             280.15 MB                      total number of bytes received on the interface
in-drops             0                              number of received packets dropped
in-errors            0                              number of received packets with errors
in-pkts              2321659                        total number of packets received on the interface
out-bytes            349.10 MB                      total number of bytes transmitted out of the interface
out-drops            0                              The number of outbound packets that were chosen to be discarded eve...
out-errors           0                              The number of outbound packets that could not be transmitted becaus...
out-pkts             3536508                        total number of packets transmitted out of the interface

Convert CLI Changes to Use the API

You can take a configuration change from the CLI and use the API to configure the same set of changes.

  1. Make your configuration changes on the system with the NVUE CLI.

    cumulus@switch:~$ nv set system hostname switch01
    cumulus@switch:~$ nv set interface lo ip address 99.99.99.99/32
    cumulus@switch:~$ nv set interface eth0 ip address 192.168.200.6/24
    cumulus@switch:~$ nv set interface bond0 bond member swp1-4
    
  2. View the changes as a JSON blob.

    cumulus@switch:~$ nv config diff -o json
    [
      {
        "set": {
          "interface": {
            "bond0": {
              "bond": {
                "member": {
                  "swp1": {},
                  "swp2": {},
                  "swp3": {},
                  "swp4": {}
                }
              },
              "type": "bond"
            },
            "lo": {
              "ip": {
                "address": {
                  "99.99.99.99/32": {}
                }
              }
            }
          },
          "system": {
            "hostname": "switch01"
          }
        }
      }
    ]
    
  3. Staple the JSON blob to a root patch request as the payload.

    cumulus@switch:~$ curl -u 'cumulus:cumulus' -d '{
          "interface": {
            "bond0": {
              "bond": {
                "member": {
                  "swp1": {},
                  "swp2": {},
                  "swp3": {},
                  "swp4": {}
                }
              },
              "type": "bond"
            },
            "lo": {
              "ip": {
                "address": {
                  "99.99.99.99/32": {}
                }
              }
            }
          },
          "system": {
            "hostname": "switch01"
          }
        }' -k -X PATCH https://127.0.0.1:8765/nvue_v1/?rev=3
    
    {
      "bridge": {
        "domain": {
          "br_default": {
            "type": "vlan-aware",
            "vlan": {
              "10": {
                "vni": {
                  "10": {}
                }
              },
              "20": {
                "vni": {
                  "20": {}
                }
              },
              "30": {
                "vni": {
                  "30": {}
                }
              }
            }
          }
        }
      },
      "evpn": {
        "enable": "on"
      },
      "interface": {
        "bond1": {
          "bond": {
            "lacp-bypass": "on",
            "member": {
              "swp1": {}
            },
    ...
    
  4. Apply the changes with a PATCH to the revision changeset.

    cumulus@switch:~$ curl -u 'cumulus:cumulus' -H 'Content-Type:application/json' -k -d '{"state": "apply", "auto-prompt": {"ays": "ays_yes"}}' -X PATCH https://127.0.0.1:8765/nvue_v1/revision/3
    {
      "state": "apply",
      "transition": {
        "issue": {},
        "progress": ""
      }
    }
    
  5. Review the status of the apply and the configuration:

    cumulus@switch:~$ curl -u 'cumulus:cumulus' -k -X GET https://127.0.0.1:8765/nvue_v1/revision/3
    {
      "state": "applied",
      "transition": {
        "issue": {},
        "progress": ""
      }
    }
    
    #!/usr/bin/env python3
    
    import requests
    from requests.auth import HTTPBasicAuth
    import json
    import time
    
    auth = HTTPBasicAuth(username="cumulus", password="password")
    nvue_end_point = "https://127.0.0.1:8765/nvue_v1"
    mime_header = {"Content-Type": "application/json"}
    
    DUMMY_SLEEP = 5  # In seconds
    POLL_APPLIED = 1  # in seconds
    RETRIES = 10
    
    def print_request(r: requests.Request):
        print("=======Request=======")
        print("URL:", r.url)
        print("Headers:", r.headers)
        print("Body:", r.body)
    
    def print_response(r: requests.Response):
        print("=======Response=======")
        print("Headers:", r.headers)
        print("Body:", json.dumps(r.json(), indent=2))
    
    def create_nvue_changest():
        r = requests.post(url=nvue_end_point + "/revision",
                          auth=auth,
                          verify=False)
        print_request(r.request)
        print_response(r)
        response = r.json()
        changeset = response.popitem()[0]
        return changeset
    
    def apply_nvue_changeset(changeset):
        # apply_payload = {"state": "apply"}
        apply_payload = {"state": "apply", "auto-prompt": {"ays": "ays_yes"}}
        url = nvue_end_point + "/revision/" + requests.utils.quote(changeset,
                                                                   safe="")
        r = requests.patch(url=url,
                           auth=auth,
                           verify=False,
                           data=json.dumps(apply_payload),
                           headers=mime_header)
        print_request(r.request)
        print_response(r)
    
    def is_config_applied(changeset) -> bool:
        # Check if the configuration was indeed applied
        global RETRIES
        global POLL_APPLIED
        retries = RETRIES
        while retries > 0:
            r = requests.get(url=nvue_end_point + "/revision/" + requests.utils.quote(changeset, safe=""),
                             auth=auth,
                             verify=False)
            response = r.json()
            print(response)
    
            if response["state"] == "applied":
                return True
            retries -= 1
            time.sleep(POLL_APPLIED)
    
        return False
    
    def apply_new_config(path,payload):
        # Create a new revision ID
        changeset = create_nvue_changest()
        print("Using NVUE Changeset: '{}'".format(changeset))
    
        # Delete existing configuration
        query_string = {"rev": changeset}
        r = requests.delete(url=nvue_end_point + path,
                           auth=auth,
                           verify=False,
                           params=query_string,
                           headers=mime_header)
        print_request(r.request)
        print_response(r)
    
        # Patch the new configuration
        
        query_string = {"rev": changeset}
        r = requests.patch(url=nvue_end_point + path,
                           auth=auth,
                           verify=False,
                           data=json.dumps(payload),
                           params=query_string,
                           headers=mime_header)
        print_request(r.request)
        print_response(r)
    
        # Apply the changes to the new revision changeset
        apply_nvue_changeset(changeset)
    
        # Check if the changeset was applied
        is_config_applied(changeset)
    
    def nvue_get(path):
        r = requests.get(url=nvue_end_point + path,
                         auth=auth,
                         verify=False)
        print_request(r.request)
        print_response(r)
    
    if __name__ == "__main__":
        payload = {
          "interface": {
            "bond0": {
              "bond": {
                "member": {
                  "swp1": {},
                  "swp2": {},
                  "swp3": {},
                  "swp4": {}
                }
              },
              "type": "bond"
            },
            "lo": {
              "ip": {
                "address": {
                  "99.99.99.99/32": {}
                }
              }
            }
          },
          "system": {
            "hostname": "switch01"
          }
        }
        apply_new_config("/",payload)
        time.sleep(DUMMY_SLEEP)
        nvue_get("/interface/bond0")
        nvue_get("/interface/lo")
        nvue_get("/system")
    
    

API Examples

The following section provides practical API examples.

Configure the System

To set the system hostname, pre-login or post-login message, and time zone on the switch, send a targeted API request to /nvue_v1/system.

cumulus@switch:~$ curl -u 'cumulus:cumulus' -d '{"system": {"hostname":"switch01","timezone":"America/Los_Angeles","message":{"pre-login":"Welcome to NVIDIA Cumulus Linux","post-login":"You have successfully logged in to switch01"}}}' -k -X PATCH https://127.0.0.1:8765/nvue_v1/?rev=4
#!/usr/bin/env python3

import requests
from requests.auth import HTTPBasicAuth
import json
import time

auth = HTTPBasicAuth(username="cumulus", password="password")
nvue_end_point = "https://127.0.0.1:8765/nvue_v1"
mime_header = {"Content-Type": "application/json"}

DUMMY_SLEEP = 5  # In seconds
POLL_APPLIED = 1  # in seconds
RETRIES = 10

def print_request(r: requests.Request):
    print("=======Request=======")
    print("URL:", r.url)
    print("Headers:", r.headers)
    print("Body:", r.body)

def print_response(r: requests.Response):
    print("=======Response=======")
    print("Headers:", r.headers)
    print("Body:", json.dumps(r.json(), indent=2))

def create_nvue_changest():
    r = requests.post(url=nvue_end_point + "/revision",
                      auth=auth,
                      verify=False)
    print_request(r.request)
    print_response(r)
    response = r.json()
    changeset = response.popitem()[0]
    return changeset

def apply_nvue_changeset(changeset):
    # apply_payload = {"state": "apply"}
    apply_payload = {"state": "apply", "auto-prompt": {"ays": "ays_yes"}}
    url = nvue_end_point + "/revision/" + requests.utils.quote(changeset,
                                                               safe="")
    r = requests.patch(url=url,
                       auth=auth,
                       verify=False,
                       data=json.dumps(apply_payload),
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

def is_config_applied(changeset) -> bool:
    # Check if the configuration was indeed applied
    global RETRIES
    global POLL_APPLIED
    retries = RETRIES
    while retries > 0:
        r = requests.get(url=nvue_end_point + "/revision/" + requests.utils.quote(changeset, safe=""),
                         auth=auth,
                         verify=False)
        response = r.json()
        print(response)

        if response["state"] == "applied":
            return True
        retries -= 1
        time.sleep(POLL_APPLIED)

    return False

def apply_new_config(path,payload):
    # Create a new revision ID
    changeset = create_nvue_changest()
    print("Using NVUE Changeset: '{}'".format(changeset))

    # Delete existing configuration
    query_string = {"rev": changeset}
    r = requests.delete(url=nvue_end_point + path,
                       auth=auth,
                       verify=False,
                       params=query_string,
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

    # Patch the new configuration
    
    query_string = {"rev": changeset}
    r = requests.patch(url=nvue_end_point + path,
                       auth=auth,
                       verify=False,
                       data=json.dumps(payload),
                       params=query_string,
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

    # Apply the changes to the new revision changeset
    apply_nvue_changeset(changeset)

    # Check if the changeset was applied
    is_config_applied(changeset)

def nvue_get(path):
    r = requests.get(url=nvue_end_point + path,
                     auth=auth,
                     verify=False)
    print_request(r.request)
    print_response(r)

if __name__ == "__main__":
    payload = {
      "system": 
      {
        "hostname":"switch01",
        "timezone":"America/Los_Angeles",
        "message":
        {
          "pre-login":"Welcome to NVIDIA Cumulus Linux",
          "post-login:"You have successfully logged in to switch01"
        }
      }
    }
    apply_new_config("/",payload) # Root patch
    time.sleep(DUMMY_SLEEP)
    nvue_get("/system")
cumulus@switch:~$ nv set system hostname switch01
cumulus@switch:~$ nv set system timezone America/Los_Angeles
cumulus@switch:~$ nv set system message pre-login "Welcome to NVIDIA Cumulus Linux"
cumulus@switch:~$ nv set system message post-login "You have successfully logged into switch01"

Configure Services

To set up NTP, DNS, and SNMP on the switch, send a targeted API request to /nvue_v1/service.

cumulus@switch:~$ curl -u 'cumulus:cumulus' -d '{"service": { "ntp": {"default":{"server":{"4.cumulusnetworks.pool.ntp.org":{"iburst":"on"}}}}, "dns": {"mgmt":{"server":{"192.168.1.100":{}}}}, "syslog": {"mgmt":{"server":{"192.168.1.120":{"port":8000}}}}}}' -k -X PATCH https://127.0.0.1:8765/nvue_v1/?rev=5
#!/usr/bin/env python3

import requests
from requests.auth import HTTPBasicAuth
import json
import time

auth = HTTPBasicAuth(username="cumulus", password="password")
nvue_end_point = "https://127.0.0.1:8765/nvue_v1"
mime_header = {"Content-Type": "application/json"}

DUMMY_SLEEP = 5  # In seconds
POLL_APPLIED = 1  # in seconds
RETRIES = 10

def print_request(r: requests.Request):
    print("=======Request=======")
    print("URL:", r.url)
    print("Headers:", r.headers)
    print("Body:", r.body)

def print_response(r: requests.Response):
    print("=======Response=======")
    print("Headers:", r.headers)
    print("Body:", json.dumps(r.json(), indent=2))

def create_nvue_changest():
    r = requests.post(url=nvue_end_point + "/revision",
                      auth=auth,
                      verify=False)
    print_request(r.request)
    print_response(r)
    response = r.json()
    changeset = response.popitem()[0]
    return changeset

def apply_nvue_changeset(changeset):
    # apply_payload = {"state": "apply"}
    apply_payload = {"state": "apply", "auto-prompt": {"ays": "ays_yes"}}
    url = nvue_end_point + "/revision/" + requests.utils.quote(changeset,
                                                               safe="")
    r = requests.patch(url=url,
                       auth=auth,
                       verify=False,
                       data=json.dumps(apply_payload),
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

def is_config_applied(changeset) -> bool:
    # Check if the configuration was indeed applied
    global RETRIES
    global POLL_APPLIED
    retries = RETRIES
    while retries > 0:
        r = requests.get(url=nvue_end_point + "/revision/" + requests.utils.quote(changeset, safe=""),
                         auth=auth,
                         verify=False)
        response = r.json()
        print(response)

        if response["state"] == "applied":
            return True
        retries -= 1
        time.sleep(POLL_APPLIED)

    return False

def apply_new_config(path,payload):
    # Create a new revision ID
    changeset = create_nvue_changest()
    print("Using NVUE Changeset: '{}'".format(changeset))

    # Delete existing configuration
    query_string = {"rev": changeset}
    r = requests.delete(url=nvue_end_point + path,
                       auth=auth,
                       verify=False,
                       params=query_string,
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

    # Patch the new configuration
    
    query_string = {"rev": changeset}
    r = requests.patch(url=nvue_end_point + path,
                       auth=auth,
                       verify=False,
                       data=json.dumps(payload),
                       params=query_string,
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

    # Apply the changes to the new revision changeset
    apply_nvue_changeset(changeset)

    # Check if the changeset was applied
    is_config_applied(changeset)

def nvue_get(path):
    r = requests.get(url=nvue_end_point + path,
                     auth=auth,
                     verify=False)
    print_request(r.request)
    print_response(r)

if __name__ == "__main__":
    payload = {
      "service":
      {
        "ntp":
        {
          "default":
          {
            "server:
            {
              "4.cumulusnetworks.pool.ntp.org":
              {
                "iburst":"on"
              }
            }
          }
        },
        "dns":
        {
          "mgmt":
          {
            "server:
            {
              "192.168.1.100":{}
            }
          }
        },
        "syslog":
        {
          "mgmt":
          {
            "server:
            {
              "192.168.1.120":
              {
                "port":8000
              }
            }
          }
        }
      }
    }
    apply_new_config("/",payload) # Root patch
    time.sleep(DUMMY_SLEEP)
    nvue_get("/service/ntp")
    nvue_get("/service/dns")
    nvue_get("/service/syslog")
cumulus@switch:~$ nv set service ntp default server 4.cumulusnetworks.pool.ntp.org iburst on
cumulus@switch:~$ nv set service dns mgmt server 192.168.1.100 
cumulus@switch:~$ nv set service syslog mgmt server 192.168.1.120 port 8000

Configure Users

The following example creates a new user, then deletes the user.

This example creates a new user called test1.

cumulus@switch:~$ curl -u 'cumulus:cumulus' -d '{"system": {"aaa": {"user": {"test1": {"hashed-password":"72b28582708d749c6c82f3b3f226041f1bd37090281641eaeba8d44bd915d0042d609a92759d9f6fb96475cb0601cf428cd22613df8a53a09461e0b426cf0a35","role": "nvue-monitor","enable": "on","full-name": "Test User"}}}}}' -k -X PATCH https://127.0.0.1:8765/nvue_v1/?rev=5

This example deletes the test1 user.

cumulus@switch:~$ curl -u 'cumulus:cumulus' -k -X DELETE https://127.0.0.1:8765/nvue_v1/system/aaa/user/test1?rev=6
#!/usr/bin/env python3

import requests
from requests.auth import HTTPBasicAuth
import json
import time

auth = HTTPBasicAuth(username="cumulus", password="password")
nvue_end_point = "https://127.0.0.1:8765/nvue_v1"
mime_header = {"Content-Type": "application/json"}

DUMMY_SLEEP = 5  # In seconds
POLL_APPLIED = 1  # in seconds
RETRIES = 10

def print_request(r: requests.Request):
    print("=======Request=======")
    print("URL:", r.url)
    print("Headers:", r.headers)
    print("Body:", r.body)

def print_response(r: requests.Response):
    print("=======Response=======")
    print("Headers:", r.headers)
    print("Body:", json.dumps(r.json(), indent=2))

def create_nvue_changest():
    r = requests.post(url=nvue_end_point + "/revision",
                      auth=auth,
                      verify=False)
    print_request(r.request)
    print_response(r)
    response = r.json()
    changeset = response.popitem()[0]
    return changeset

def apply_nvue_changeset(changeset):
    # apply_payload = {"state": "apply"}
    apply_payload = {"state": "apply", "auto-prompt": {"ays": "ays_yes"}}
    url = nvue_end_point + "/revision/" + requests.utils.quote(changeset,
                                                               safe="")
    r = requests.patch(url=url,
                       auth=auth,
                       verify=False,
                       data=json.dumps(apply_payload),
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

def is_config_applied(changeset) -> bool:
    # Check if the configuration was indeed applied
    global RETRIES
    global POLL_APPLIED
    retries = RETRIES
    while retries > 0:
        r = requests.get(url=nvue_end_point + "/revision/" + requests.utils.quote(changeset, safe=""),
                         auth=auth,
                         verify=False)
        response = r.json()
        print(response)

        if response["state"] == "applied":
            return True
        retries -= 1
        time.sleep(POLL_APPLIED)

    return False

def apply_new_config(path,payload):
    # Create a new revision ID
    changeset = create_nvue_changest()
    print("Using NVUE Changeset: '{}'".format(changeset))

    # Delete existing configuration
    query_string = {"rev": changeset}
    r = requests.delete(url=nvue_end_point + path,
                       auth=auth,
                       verify=False,
                       params=query_string,
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

    # Patch the new configuration
    
    query_string = {"rev": changeset}
    r = requests.patch(url=nvue_end_point + path,
                       auth=auth,
                       verify=False,
                       data=json.dumps(payload),
                       params=query_string,
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

    # Apply the changes to the new revision changeset
    apply_nvue_changeset(changeset)

    # Check if the changeset was applied
    is_config_applied(changeset)

def delete_config(path):
    # Create an NVUE changeset
    changeset = create_nvue_changest()
    print("Using NVUE Changeset: '{}'".format(changeset))

    # Equivalent to JSON `null`
    payload = None

    # Stage the change
    query_string = {"rev": changeset}
    r = requests.delete(url=nvue_end_point + path,
                        auth=auth,
                        verify=False,
                        data=json.dumps(payload),
                        params=query_string,
                        headers=mime_header)
    print_request(r.request)
    print_response(r)

    # Apply the staged changeset
    apply_nvue_changeset(changeset)

    # Check if the changeset was applied
    is_config_applied(changeset)

def nvue_get(path):
    r = requests.get(url=nvue_end_point + path,
                     auth=auth,
                     verify=False)
    print_request(r.request)
    print_response(r)

if __name__ == "__main__":

    # Need to create a hashed password - The supported password
    # hashes are documented here:
    # https://docs.nvidia.com/networking-ethernet-software/cumulus-linux-55/System-Configuration/Authentication-Authorization-and-Accounting/User-Accounts/#hashed-passwords  # noqa
    # Here in this example, we use SHA-512
    import crypt
    hashed_password = crypt.crypt("hello$world#2023", salt=crypt.METHOD_SHA512)
    payload = {
        "system": {
            "aaa": {
                "user": {
                    "test1": {
                        "hashed-password": hashed_password,
                        "role": "nvue-monitor",
                        "enable": "on",
                        "full-name": "Test User",
                    }
                }
            }
        }
    }
    apply_new_config("/",payload) # Root patch
    time.sleep(DUMMY_SLEEP)
    nvue_get("/system/user/aaa")

    """Delete an existing user account using the AAA API."""
    delete_config("/system/aaa/user/test1")
    time.sleep(DUMMY_SLEEP)
    nvue_get("/system/user/aaa")

This example creates a new user test1.

cumulus@switch:~$ nv set system aaa user test1
cumulus@switch:~$ nv set system aaa user test1 full-name "Test User" 
cumulus@switch:~$ nv set system aaa user test1 password "abcd@test"
cumulus@switch:~$ nv set system aaa user test1 role nvue-monitor
cumulus@switch:~$ nv set system aaa user test1 enable on

This example deletes the user test1.

cumulus@switch:~$ nv unset system aaa user test1

Configure an Interface

The following example configures an interface.

cumulus@switch:~$ curl -u 'cumulus:cumulus' -k -d '{"swp1": {"link":{"state":{"up": {}}}}}' -H 'Content-Type: application/json' -k -X PATCH https://127.0.0.1:8765/nvue_v1/interface?rev=21
#!/usr/bin/env python3

import requests
from requests.auth import HTTPBasicAuth
import json
import time

auth = HTTPBasicAuth(username="cumulus", password="password")
nvue_end_point = "https://127.0.0.1:8765/nvue_v1"
mime_header = {"Content-Type": "application/json"}

DUMMY_SLEEP = 5  # In seconds
POLL_APPLIED = 1  # in seconds
RETRIES = 10

def print_request(r: requests.Request):
    print("=======Request=======")
    print("URL:", r.url)
    print("Headers:", r.headers)
    print("Body:", r.body)

def print_response(r: requests.Response):
    print("=======Response=======")
    print("Headers:", r.headers)
    print("Body:", json.dumps(r.json(), indent=2))

def create_nvue_changest():
    r = requests.post(url=nvue_end_point + "/revision",
                      auth=auth,
                      verify=False)
    print_request(r.request)
    print_response(r)
    response = r.json()
    changeset = response.popitem()[0]
    return changeset

def apply_nvue_changeset(changeset):
    # apply_payload = {"state": "apply"}
    apply_payload = {"state": "apply", "auto-prompt": {"ays": "ays_yes"}}
    url = nvue_end_point + "/revision/" + requests.utils.quote(changeset,
                                                               safe="")
    r = requests.patch(url=url,
                       auth=auth,
                       verify=False,
                       data=json.dumps(apply_payload),
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

def is_config_applied(changeset) -> bool:
    # Check if the configuration was indeed applied
    global RETRIES
    global POLL_APPLIED
    retries = RETRIES
    while retries > 0:
        r = requests.get(url=nvue_end_point + "/revision/" + requests.utils.quote(changeset, safe=""),
                         auth=auth,
                         verify=False)
        response = r.json()
        print(response)

        if response["state"] == "applied":
            return True
        retries -= 1
        time.sleep(POLL_APPLIED)

    return False

def apply_new_config(path,payload):
    # Create a new revision ID
    changeset = create_nvue_changest()
    print("Using NVUE Changeset: '{}'".format(changeset))

    # Delete existing configuration
    query_string = {"rev": changeset}
    r = requests.delete(url=nvue_end_point + path,
                       auth=auth,
                       verify=False,
                       params=query_string,
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

    # Patch the new configuration
    
    query_string = {"rev": changeset}
    r = requests.patch(url=nvue_end_point + path,
                       auth=auth,
                       verify=False,
                       data=json.dumps(payload),
                       params=query_string,
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

    # Apply the changes to the new revision changeset
    apply_nvue_changeset(changeset)

    # Check if the changeset was applied
    is_config_applied(changeset)

def nvue_get(path):
    r = requests.get(url=nvue_end_point + path,
                     auth=auth,
                     verify=False)
    print_request(r.request)
    print_response(r)

if __name__ == "__main__":
    payload = {
      "swp1":
      {
        "type":"swp",
        "link":
        {
          "state":"up"
          }
        }
      }
    apply_new_config("/interface",payload)
    time.sleep(DUMMY_SLEEP)
    nvue_get("/interface/swp1")
cumulus@switch:~$ nv set interface swp1

Configure a Bond

The following example configures a bond.

cumulus@switch:~$ curl -u 'cumulus:cumulus' -d '{"bond0": {"type":"bond","bond":{"member":{"swp1":{},"swp2":{},"swp3":{},"swp4":{}}}}}' -H 'Content-Type: application/json' -k -X PATCH https://127.0.0.1:8765/nvue_v1/interface?rev=7
{
  "bond0": {
    "bond": {
      "member": {
        "swp1": {},
        "swp2": {},
        "swp3": {},
        "swp4": {}
      }
    },
    "type": "bond"
  },
  "bond1": {
    "bond": {
      "lacp-bypass": "on",
      "member": {
        "swp1": {}
      },
      "mlag": {
        "enable": "on",
        "id": 1
      },
      "mode": "lacp"
    },
    "bridge": {
      "domain": {
        "br_default": {
          "access": 10,
          "stp": {
            "admin-edge": "on",
            "auto-edge": "on",
            "bpdu-guard": "on"
          }
        }
      }
    },
    "link": {
      "mtu": 9000
    },
    "type": "bond"
  },
  "eth0": {
    "ip": {
      "address": {
        "192.168.200.6/24": {}
      },
      "vrf": "mgmt"
    },
    "type": "eth"
  },
  "lo": {
    "ip": {
      "address": {
        "10.10.10.1/32": {}
      }
    },
    "type": "loopback"
  }
}
#!/usr/bin/env python3

import requests
from requests.auth import HTTPBasicAuth
import json
import time

auth = HTTPBasicAuth(username="cumulus", password="password")
nvue_end_point = "https://127.0.0.1:8765/nvue_v1"
mime_header = {"Content-Type": "application/json"}

DUMMY_SLEEP = 5  # In seconds
POLL_APPLIED = 1  # in seconds
RETRIES = 10

def print_request(r: requests.Request):
    print("=======Request=======")
    print("URL:", r.url)
    print("Headers:", r.headers)
    print("Body:", r.body)

def print_response(r: requests.Response):
    print("=======Response=======")
    print("Headers:", r.headers)
    print("Body:", json.dumps(r.json(), indent=2))

def create_nvue_changest():
    r = requests.post(url=nvue_end_point + "/revision",
                      auth=auth,
                      verify=False)
    print_request(r.request)
    print_response(r)
    response = r.json()
    changeset = response.popitem()[0]
    return changeset

def apply_nvue_changeset(changeset):
    # apply_payload = {"state": "apply"}
    apply_payload = {"state": "apply", "auto-prompt": {"ays": "ays_yes"}}
    url = nvue_end_point + "/revision/" + requests.utils.quote(changeset,
                                                               safe="")
    r = requests.patch(url=url,
                       auth=auth,
                       verify=False,
                       data=json.dumps(apply_payload),
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

def is_config_applied(changeset) -> bool:
    # Check if the configuration was indeed applied
    global RETRIES
    global POLL_APPLIED
    retries = RETRIES
    while retries > 0:
        r = requests.get(url=nvue_end_point + "/revision/" + requests.utils.quote(changeset, safe=""),
                         auth=auth,
                         verify=False)
        response = r.json()
        print(response)

        if response["state"] == "applied":
            return True
        retries -= 1
        time.sleep(POLL_APPLIED)

    return False

def apply_new_config(path,payload):
    # Create a new revision ID
    changeset = create_nvue_changest()
    print("Using NVUE Changeset: '{}'".format(changeset))

    # Delete existing configuration
    query_string = {"rev": changeset}
    r = requests.delete(url=nvue_end_point + path,
                       auth=auth,
                       verify=False,
                       params=query_string,
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

    # Patch the new configuration
    
    query_string = {"rev": changeset}
    r = requests.patch(url=nvue_end_point + path,
                       auth=auth,
                       verify=False,
                       data=json.dumps(payload),
                       params=query_string,
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

    # Apply the changes to the new revision changeset
    apply_nvue_changeset(changeset)

    # Check if the changeset was applied
    is_config_applied(changeset)

def nvue_get(path):
    r = requests.get(url=nvue_end_point + path,
                     auth=auth,
                     verify=False)
    print_request(r.request)
    print_response(r)

if __name__ == "__main__":
    payload = {
      "bond0":
      {
        "type":"bond",
        "bond":
        {
          "member":
          {
            "swp1":{},
            "swp2":{},
            "swp3":{},
            "swp4":{}
          }
        }
      }
    }
    apply_new_config("/interface",payload)
    time.sleep(DUMMY_SLEEP)
    nvue_get("/interface/bond0")
cumulus@switch:~$ nv set interface bond0 bond member swp1-4

Configure a Bridge

The following example configures a bridge.

cumulus@switch:~$ curl -u 'cumulus:cumulus' -d '{"swp1": {"bridge":{"domain":{"br_default":{}}}},"swp2": {"bridge":{"domain":{"br_default":{}}}}}' -H 'Content-Type: application/json' -k -X PATCH https://127.0.0.1:8765/nvue_v1/interface?rev=21
cumulus@switch:~$ curl -u 'cumulus:cumulus' -d '{"untagged":1,"vlan":{"10":{},"20":{}}}' -H 'Content-Type: application/json' -k -X PATCH https://127.0.0.1:8765/nvue_v1/bridge/domain/br_default?rev=8
#!/usr/bin/env python3

import requests
from requests.auth import HTTPBasicAuth
import json
import time

auth = HTTPBasicAuth(username="cumulus", password="password")
nvue_end_point = "https://127.0.0.1:8765/nvue_v1"
mime_header = {"Content-Type": "application/json"}

DUMMY_SLEEP = 5  # In seconds
POLL_APPLIED = 1  # in seconds
RETRIES = 10

def print_request(r: requests.Request):
    print("=======Request=======")
    print("URL:", r.url)
    print("Headers:", r.headers)
    print("Body:", r.body)

def print_response(r: requests.Response):
    print("=======Response=======")
    print("Headers:", r.headers)
    print("Body:", json.dumps(r.json(), indent=2))

def create_nvue_changest():
    r = requests.post(url=nvue_end_point + "/revision",
                      auth=auth,
                      verify=False)
    print_request(r.request)
    print_response(r)
    response = r.json()
    changeset = response.popitem()[0]
    return changeset

def apply_nvue_changeset(changeset):
    # apply_payload = {"state": "apply"}
    apply_payload = {"state": "apply", "auto-prompt": {"ays": "ays_yes"}}
    url = nvue_end_point + "/revision/" + requests.utils.quote(changeset,
                                                               safe="")
    r = requests.patch(url=url,
                       auth=auth,
                       verify=False,
                       data=json.dumps(apply_payload),
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

def is_config_applied(changeset) -> bool:
    # Check if the configuration was indeed applied
    global RETRIES
    global POLL_APPLIED
    retries = RETRIES
    while retries > 0:
        r = requests.get(url=nvue_end_point + "/revision/" + requests.utils.quote(changeset, safe=""),
                         auth=auth,
                         verify=False)
        response = r.json()
        print(response)

        if response["state"] == "applied":
            return True
        retries -= 1
        time.sleep(POLL_APPLIED)

    return False

def apply_new_config(path,payload):
    # Create a new revision ID
    changeset = create_nvue_changest()
    print("Using NVUE Changeset: '{}'".format(changeset))

    # Delete existing configuration
    query_string = {"rev": changeset}
    r = requests.delete(url=nvue_end_point + path,
                       auth=auth,
                       verify=False,
                       params=query_string,
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

    # Patch the new configuration
    
    query_string = {"rev": changeset}
    r = requests.patch(url=nvue_end_point + path,
                       auth=auth,
                       verify=False,
                       data=json.dumps(payload),
                       params=query_string,
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

    # Apply the changes to the new revision changeset
    apply_nvue_changeset(changeset)

    # Check if the changeset was applied
    is_config_applied(changeset)

def nvue_get(path):
    r = requests.get(url=nvue_end_point + path,
                     auth=auth,
                     verify=False)
    print_request(r.request)
    print_response(r)

if __name__ == "__main__":
    int_payload = {
      "swp1":
      {
        "bridge":
        {
          "domain":
          {
            "br_default":{}
          }
        },
        "swp2": 
        {
          "bridge":
          {
            "domain":
            {
              "br_default":{}
            }
          }
        }
      }
    }
    apply_new_config("/interface",int_payload)
    br_payload = {
      "untagged":1,
      "vlan":
      {
        "10":{},
        "20":{}
      }
    }
    apply_new_config("/bridge/domain/br_default",br_payload)
    time.sleep(DUMMY_SLEEP)
    nvue_get("/interface/swp1")
    nvue_get("/bridge/domain/br_default")
cumulus@switch:~$ nv set interface swp1-2 bridge domain br_default
cumulus@switch:~$ nv set bridge domain br_default vlan 10,20
cumulus@switch:~$ nv set bridge domain br_default untagged 1

Configure BGP

The following example configures BGP.

cumulus@switch:~$ curl -u 'cumulus:cumulus' -d '{"bgp": {"autonomous-system": 65101,"router-id":"10.10.10.1"}}' -H 'Content-Type: application/json' -k -X PATCH https://127.0.0.1:8765/nvue_v1/router?rev=9
cumulus@switch:~$ curl -u 'cumulus:cumulus' -d '{"bgp":{"neighbor":{"swp51":{"remote-as":"external"}},"address-family":{"ipv4-unicast":{"network":{"10.10.10.1/32":{}}}}}}' -H 'Content-Type: application/json' -k -X PATCH https://127.0.0.1:8765/nvue_v1/vrf/default/router?rev=9
#!/usr/bin/env python3

import requests
from requests.auth import HTTPBasicAuth
import json
import time

auth = HTTPBasicAuth(username="cumulus", password="password")
nvue_end_point = "https://127.0.0.1:8765/nvue_v1"
mime_header = {"Content-Type": "application/json"}

DUMMY_SLEEP = 5  # In seconds
POLL_APPLIED = 1  # in seconds
RETRIES = 10

def print_request(r: requests.Request):
    print("=======Request=======")
    print("URL:", r.url)
    print("Headers:", r.headers)
    print("Body:", r.body)

def print_response(r: requests.Response):
    print("=======Response=======")
    print("Headers:", r.headers)
    print("Body:", json.dumps(r.json(), indent=2))

def create_nvue_changest():
    r = requests.post(url=nvue_end_point + "/revision",
                      auth=auth,
                      verify=False)
    print_request(r.request)
    print_response(r)
    response = r.json()
    changeset = response.popitem()[0]
    return changeset

def apply_nvue_changeset(changeset):
    # apply_payload = {"state": "apply"}
    apply_payload = {"state": "apply", "auto-prompt": {"ays": "ays_yes"}}
    url = nvue_end_point + "/revision/" + requests.utils.quote(changeset,
                                                               safe="")
    r = requests.patch(url=url,
                       auth=auth,
                       verify=False,
                       data=json.dumps(apply_payload),
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

def is_config_applied(changeset) -> bool:
    # Check if the configuration was indeed applied
    global RETRIES
    global POLL_APPLIED
    retries = RETRIES
    while retries > 0:
        r = requests.get(url=nvue_end_point + "/revision/" + requests.utils.quote(changeset, safe=""),
                         auth=auth,
                         verify=False)
        response = r.json()
        print(response)

        if response["state"] == "applied":
            return True
        retries -= 1
        time.sleep(POLL_APPLIED)

    return False

def apply_new_config(path,payload):
    # Create a new revision ID
    changeset = create_nvue_changest()
    print("Using NVUE Changeset: '{}'".format(changeset))

    # Delete existing configuration
    query_string = {"rev": changeset}
    r = requests.delete(url=nvue_end_point + path,
                       auth=auth,
                       verify=False,
                       params=query_string,
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

    # Patch the new configuration
    
    query_string = {"rev": changeset}
    r = requests.patch(url=nvue_end_point + path,
                       auth=auth,
                       verify=False,
                       data=json.dumps(payload),
                       params=query_string,
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

    # Apply the changes to the new revision changeset
    apply_nvue_changeset(changeset)

    # Check if the changeset was applied
    is_config_applied(changeset)

def nvue_get(path):
    r = requests.get(url=nvue_end_point + path,
                     auth=auth,
                     verify=False)
    print_request(r.request)
    print_response(r)

if __name__ == "__main__":
    rt_payload = {
      "bgp":
      {
        "autonomous-system": 65101,
        "router-id":"10.10.10.1"
      }
    }
    apply_new_config("/router",rt_payload)
    vrf_payload = {
      "bgp":
      {
        "neighbor":
        {
          "swp51":
          {
            "remote-as":"external"
          }
        },
        "address-family":
        {
          "ipv4-unicast":
          {
            "network":
            {
              "10.10.10.1/32":{}
            }
          }
        }
      }
    }
    apply_new_config("/vrf/default/router",vrf_payload)
    time.sleep(DUMMY_SLEEP)
    nvue_get("/router")
    nvue_get("/vrf/default/router")
cumulus@switch:~$ nv set router bgp autonomous-system 65101
cumulus@switch:~$ nv set router bgp router-id 10.10.10.1
cumulus@switch:~$ nv set vrf default router bgp neighbor swp51 remote-as external
cumulus@switch:~$ nv set vrf default router bgp address-family ipv4-unicast network 10.10.10.1/32

Action Operations

The NVUE action operations are ephemeral operations that do not modify the state of the configuration; they reset counters for interfaces, BGP, QoS buffers and pools, and remove conflicts from protodown MLAG bonds.

To clear counters on swp1:

cumulus@switch:~$ curl -u 'cumulus:cumulus' -H 'Content-Type:application/json' -d '{"@clear": {"state": "start", "parameters": {}}}' -k -X POST https://127.0.0.1:8765/nvue_v1/interface/swp1/counters
1
cumulus@switch:~$ curl -u 'cumulus:cumulus' -X GET https://127.0.0.1:8765/nvue_v1/action/1 -k
{"detail":"swp1 counters cleared.","http_status":200,"issue":[],"state":"action_success","status":"swp1 counters cleared.","timeout":60,"type":""}

To clear QoS buffers on swp1:

cumulus@switch:~$ curl -u 'cumulus:cumulus' -H 'Content-Type:application/json' -d '{"@clear": {"state": "start", "parameters": {}}}' -k -X POST https://127.0.0.1:8765/nvue_v1/interface/swp1/qos/buffer
2
cumulus@switch:~$ curl -u 'cumulus:cumulus'  -X GET https://127.0.0.1:8765/nvue_v1/action/2 -k
{"detail":"QoS buffers cleared on swp1.","http_status":200,"issue":[],"state":"action_success","status":"QoS buffers cleared on swp1.","timeout":60,"type":""}
#!/usr/bin/env python3

import requests
from requests.auth import HTTPBasicAuth
import json
import time

auth = HTTPBasicAuth(username="cumulus", password="password")
nvue_end_point = "https://127.0.0.1:8765/nvue_v1"
mime_header = {"Content-Type": "application/json"}

DUMMY_SLEEP = 5  # In seconds
POLL_APPLIED = 1  # in seconds
RETRIES = 10

def print_request(r: requests.Request):
    print("=======Request=======")
    print("URL:", r.url)
    print("Headers:", r.headers)
    print("Body:", r.body)

def print_response(r: requests.Response):
    print("=======Response=======")
    print("Headers:", r.headers)
    print("Body:", json.dumps(r.json(), indent=2))

def nvue_action():
    r = requests.post(url=nvue_end_point + path,
                      auth=auth,
                      verify=False,
                      data=json.dumps(apply_payload),
                      headers=mime_header)
    print_request(r.request)
    print_response(r)
    return response

def nvue_get(path):
    r = requests.get(url=nvue_end_point + path,
                     auth=auth,
                     verify=False)
    print_request(r.request)
    print_response(r)

if __name__ == "__main__":
    payload = {
      "@clear": 
      {
        "state": "start", 
        "parameters": {}
      }
    }
    action_id=nvue_action("/interface/swp1/qos/counter",payload)
    time.sleep(DUMMY_SLEEP)
    nvue_get(f"/action/{action_id}")
   
cumulus@switch:~$ nv action clear interface swp1 qos counter

Example Python Script

In the following python example, the full_config_example() method sets the system pre-login message, enables BGP globally, and changes a few other configuration settings in a single bulk operation. The API end-point goes to the root node /nvue_v1. The bridge_config_example() method performs a targeted API request to /nvue_v1/bridge/domain/<domain-id> to set the vlan-vni-offset attribute.

Example Python Script
#!/usr/bin/env python3

import requests
from requests.auth import HTTPBasicAuth
import json
import time

auth = HTTPBasicAuth(username="vagrant", password="vagrant")
nvue_end_point = "https://127.0.0.1:8765/nvue_v1"
mime_header = {"Content-Type": "application/json"}

DUMMY_SLEEP = 5  # In seconds
POLL_APPLIED = 1  # in seconds
RETRIES = 10

def print_request(r: requests.Request):
    print("=======Request=======")
    print("URL:", r.url)
    print("Headers:", r.headers)
    print("Body:", r.body)

def print_response(r: requests.Response):
    print("=======Response=======")
    print("Headers:", r.headers)
    print("Body:", json.dumps(r.json(), indent=2))

def sanity():
    # Basic retrieval to check connectivity
    r = requests.get(url=nvue_end_point + "/system",
                     auth=auth,
                     verify=False)
    print_request(r.request)
    print_response(r)

def create_nvue_changest():
    r = requests.post(url=nvue_end_point + "/revision",
                      auth=auth,
                      verify=False)
    print_request(r.request)
    print_response(r)
    response = r.json()
    changeset = response.popitem()[0]
    return changeset

def apply_nvue_changeset(changeset):
    # apply_payload = {"state": "apply"}
    apply_payload = {"state": "apply", "auto-prompt": {"ays": "ays_yes"}}
    url = nvue_end_point + "/revision/" + requests.utils.quote(changeset,
                                                               safe="")
    r = requests.patch(url=url,
                       auth=auth,
                       verify=False,
                       data=json.dumps(apply_payload),
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

def full_config_example():
    # Create an NVUE changeset
    changeset = create_nvue_changest()
    print("Using NVUE Changeset: '{}'".format(changeset))

    # https://www.asciiart.eu/comics/batman
    pre_login_message = u"""
                   ,.ood888888888888boo.,
              .od888P^""            ""^Y888bo.
          .od8P''   ..oood88888888booo.    ``Y8bo.
       .odP'"  .ood8888888888888888888888boo.  "`Ybo.
     .d8'   od8'd888888888f`8888't888888888b`8bo   `Yb.
    d8'  od8^   8888888888[  `'  ]8888888888   ^8bo  `8b
  .8P  d88'     8888888888P      Y8888888888     `88b  Y8.
 d8' .d8'       `Y88888888'      `88888888P'       `8b. `8b
.8P .88P            """"            """"            Y88. Y8.
88  888                                              888  88
88  888                                              888  88
88  888.        ..                        ..        .888  88
`8b `88b,     d8888b.od8bo.      .od8bo.d8888b     ,d88' d8'
 Y8. `Y88.    8888888888888b    d8888888888888    .88P' .8P
  `8b  Y88b.  `88888888888888  88888888888888'  .d88P  d8'
    Y8.  ^Y88bod8888888888888..8888888888888bod88P^  .8P
     `Y8.   ^Y888888888888888LS888888888888888P^   .8P'
       `^Yb.,  `^^Y8888888888888888888888P^^'  ,.dP^'
          `^Y8b..   ``^^^Y88888888P^^^'    ..d8P^'
              `^Y888bo.,            ,.od888P^'
                   "`^^Y888888888888P^^'"
"""

    # https://www.asciiart.eu/comics/superman
    post_login_message = u'''
        _____________________________________________
      //:::::::::::::::::::::::::::::::::::::::::::::\\
    //:::_______:::::::::________::::::::::_____:::::::\\
  //:::_/   _-"":::_--"""        """--_::::\_  ):::::::::\\
 //:::/    /:::::_"                    "-_:::\/:::::|^\:::\\
//:::/   /~::::::I__                      \:::::::::|  \:::\\
\\:::\   (::::::::::""""---___________     "--------"  /::://
 \\:::\  |::::::::::::::::::::::::::::""""==____      /::://
  \\:::"\/::::::::::::::::::::::::::::::::::::::\   /~::://
    \\:::::::::::::::::::::::::::::::::::::::::::)/~::://
      \\::::\""""""------_____::::::::::::::::::::::://
        \\:::"\               """""-----_____:::::://
          \\:::"\    __----__                )::://
            \\:::"\/~::::::::~\_         __/~:://
              \\::::::::::::::::""----""":::://
                \\::::::::::::::::::::::::://
                  \\:::\^""--._.--""^/::://
                    \\::"\         /":://
                      \\::"\     /":://
                        \\::"\_/":://
                          \\::::://
                            \\_//
                              "
'''

    # Prepare payload which configures a few
    # different switch configurations
    payload = {
        "interface":{
            "eth0":{
                "description": "management port"
            }
        },
        "router":{
            "bgp":{
                "enable":"on"
            }
        },
        "system":{
            "message":{
                "pre-login": pre_login_message,
                "post-login": post_login_message
            },
            "timezone": "Europe/Paris",
            "config": {
               "snippet": {
                   "test-flexible-snippet": {
                       "file": "/tmp/blah",
                       "content": "NVIDIA rocks"
                   },
                   "frr.conf": "hello world"
               }
            }
        },
        "service": {
            "ntp": {
                "mgmt": {
                    "listen": "eth0"
                }
            }
        }
    }
    # Stage the change
    query_string = {"rev": changeset}
    r = requests.patch(url=nvue_end_point + "/",  # Root patch
                       auth=auth,
                       verify=False,
                       data=json.dumps(payload),
                       params=query_string,
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

    # Apply the staged changeset
    apply_nvue_changeset(changeset)

    # Check if the changeset was applied
    is_config_applied(changeset)

def bridge_config_example(domain_id):
    # Create an NVUE changeset
    changeset = create_nvue_changest()
    print("Using NVUE Changeset: '{}'".format(changeset))
    payload = {
        "vlan-vni-offset": 1000
    }

    # Stage the change
    query_string = {"rev": changeset}
    r = requests.patch(url=nvue_end_point + f"/bridge/domain/{domain_id}",
                       auth=auth,
                       verify=False,
                       data=json.dumps(payload),
                       params=query_string,
                       headers=mime_header)
    print_request(r.request)
    print_response(r)

    # Apply the staged changeset
    apply_nvue_changeset(changeset)

    # Check if the changeset was applied
    is_config_applied(changeset)

def message_get():
    # Get the system pre-login/post-login
    # message that was configured.
    r = requests.get(url=nvue_end_point + "/system/message",
                     auth=auth,
                     verify=False)
    print_request(r.request)
    print_response(r)

def is_config_applied(changeset) -> bool:
    # Check if the configuration was indeed applied
    global RETRIES
    global POLL_APPLIED
    retries = RETRIES
    while retries > 0:
        r = requests.get(url=nvue_end_point + "/revision/" + requests.utils.quote(changeset, safe=""),
                         auth=auth,
                         verify=False)
        response = r.json()
        print(response)

        if response["state"] == "applied":
            return True
        retries -= 1
        time.sleep(POLL_APPLIED)

    return False

if __name__ == "__main__":
    sanity()
    time.sleep(DUMMY_SLEEP)
    full_config_example()
    time.sleep(DUMMY_SLEEP)
    bridge_config_example("br_default")
    time.sleep(DUMMY_SLEEP)
    message_get()

Try the API

To try out the NVUE REST API, use the NVUE API Lab available on NVIDIA Air. The lab provides a basic example to help you get started. You can also try out the other examples in this document.

Resources

For information about using the NVUE REST API, refer to the NVUE API Swagger documentation. The full object model download is available here.

Considerations

Network Time Protocol - NTP

The ntpd daemon running on the switch implements the NTP protocol. It synchronizes the system time with time servers in the /etc/ntp.conf file. The ntpd daemon starts at boot by default.

If you intend to run this service within a VRF, including the management VRF, follow these steps to configure the service.

Configure NTP Servers

The default NTP configuration includes the following servers, which are in the /etc/ntp.conf file:

To add the NTP servers you want to use, run the following commands. Include the iburst option to increase the sync speed.

The NVUE command requires a VRF. The following command adds the NTP servers in the default VRF.

cumulus@switch:~$ nv set service ntp default server 4.cumulusnetworks.pool.ntp.org iburst on
cumulus@switch:~$ nv config apply

Edit the /etc/ntp.conf file to add or update NTP server information:

cumulus@switch:~$ sudo nano /etc/ntp.conf
# pool.ntp.org maps to about 1000 low-stratum NTP servers.  Your server will
# pick a different set every time it starts up.  Please consider joining the
# pool: <http://www.pool.ntp.org/join.html>
server 0.cumulusnetworks.pool.ntp.org iburst
server 1.cumulusnetworks.pool.ntp.org iburst
server 2.cumulusnetworks.pool.ntp.org iburst
server 3.cumulusnetworks.pool.ntp.org iburst
server 4.cumulusnetworks.pool.ntp.org iburst

To set the initial date and time with NTP before starting the ntpd daemon, run the ntpd -q command. Be aware that ntpd -q can hang if the time servers are not reachable.

To verify that ntpd is running on the system:

cumulus@switch:~$ ps -ef | grep ntp
ntp       4074     1  0 Jun20 ?        00:00:33 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 101:102

To check the NTP peer status:

cumulus@switch:~$ nv show service ntp default server
cumulus@switch:~$ ntpq -p
      remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+ec2-34-225-6-20 129.6.15.30      2 u   73 1024  377   70.414   -2.414   4.110
+lax1.m-d.net    132.163.96.1     2 u   69 1024  377   11.676    0.155   2.736
*69.195.159.158  199.102.46.72    2 u  133 1024  377   48.047   -0.457   1.856
-2.time.dbsinet. 198.60.22.240    2 u 1057 1024  377   63.973    2.182   2.692

The following example commands remove some of the default NTP servers:

cumulus@switch:~$ nv unset service ntp default server 0.cumulusnetworks.pool.ntp.org
cumulus@switch:~$ nv unset service ntp default server 1.cumulusnetworks.pool.ntp.org
cumulus@switch:~$ nv unset service ntp default server 2.cumulusnetworks.pool.ntp.org
cumulus@switch:~$ nv unset service ntp default server 3.cumulusnetworks.pool.ntp.org
cumulus@switch:~$ nv config apply

Edit the /etc/ntp.conf file to delete NTP servers.

cumulus@switch:~$ sudo nano /etc/ntp.conf
...
# pool.ntp.org maps to about 1000 low-stratum NTP servers.  Your server will
# pick a different set every time it starts up.  Please consider joining the
# pool: <http://www.pool.ntp.org/join.html>
server 4.cumulusnetworks.pool.ntp.org iburst
...

Specify the NTP Source Interface

By default, the source interface that NTP uses is eth0. The following example command configures the NTP source interface to be swp10.

cumulus@switch:~$ nv set service ntp default listen swp10
cumulus@switch:~$ nv config apply

Edit the /etc/ntp.conf file and modify the entry under the Specify interfaces comment.

cumulus@switch:~$ sudo nano /etc/ntp.conf
...
# Specify interfaces
interface listen swp10
...

Use NTP in a DHCP Environment

You can use DHCP to specify your NTP servers. Ensure that the DHCP-generated configuration file /run/ntp.conf.dhcp exists. The /etc/dhcp/dhclient-exit-hooks.d/ntp script generates this file, which is a copy of the default /etc/ntp.conf file with a modified server list from the DHCP server. If this file does not exist and you plan on using DHCP in the future, you can copy your current /etc/ntp.conf file to the location of the DHCP file.

To use DHCP to specify your NTP servers, run the sudo -E systemctl edit ntp.service command and add the ExecStart= line:

cumulus@switch:~$ sudo -E systemctl edit ntp.service
[Service]
ExecStart=
ExecStart=/usr/sbin/ntpd -n -u ntp:ntp -g -c /run/ntp.conf.dhcp

The sudo -E systemctl edit ntp.service command always updates the base ntp.service even if you use ntp@mgmt.service. The ntp@mgmt.service is re-generated automatically.

To validate that your configuration, run these commands:

cumulus@switch:~$ sudo systemctl restart ntp
cumulus@switch:~$ sudo systemctl status -n0 ntp.service

If the state is not Active, or the alternate configuration file does not appear in the ntp command line, it is likely that you made a configuration mistake. Correct the mistake and rerun the commands above to verify.

Configure NTP with Authorization Keys

For added security, you can configure NTP to use authorization keys.

Configure the NTP Server

  1. Create a .keys file, such as /etc/ntp.keys. Specify a key identifier (a number between 1 and 65535), an encryption method (M for MD5), and the password. The following provides an example:

    #
    # PLEASE DO NOT USE THE DEFAULT VALUES HERE.
    #
    #65535  M  akey
    #1      M  pass
    
    1  M  CumulusLinux!
    
  2. In the /etc/ntp.conf file, add a pointer to the /etc/ntp.keys file you created above and specify the key identifier. For example:

    keys /etc/ntp/ntp.keys
    trustedkey 1
    controlkey 1
    requestkey 1
    
  3. Restart NTP with the sudo systemctl restart ntp command.

Configure the NTP Client

The NTP client is the Cumulus Linux switch.

  1. Create the same .keys file you created on the NTP server (/etc/ntp.keys). For example:

    cumulus@switch:~$  sudo nano /etc/ntp.keys
    #
    # DO NOT USE THE DEFAULT VALUES HERE.
    #
    #65535  M  akey
    #1      M  pass
    
    1  M  CumulusLinux!
    
  2. Edit the /etc/ntp.conf file to specify the server you want to use, the key identifier, and a pointer to the /etc/ntp.keys file you created in step 1. For example:

    cumulus@switch:~$ sudo nano /etc/ntp.conf
    ...
    # You do need to talk to an NTP server or two (or three).
    #pool ntp.your-provider.example
    # OR
    #server ntp.your-provider.example
    
    # pool.ntp.org maps to about 1000 low-stratum NTP servers.  Your server will
    # pick a different set every time it starts up.  Please consider joining the
    # pool: <http://www.pool.ntp.org/join.html>
    #server 0.cumulusnetworks.pool.ntp.org iburst
    #server 1.cumulusnetworks.pool.ntp.org iburst
    #server 2.cumulusnetworks.pool.ntp.org iburst
    #server 3.cumulusnetworks.pool.ntp.org iburst
    server 10.50.23.121 key 1
    
    #keys
    keys /etc/ntp.keys
    trustedkey 1
    controlkey 1
    requestkey 1
    ...
    
  3. Restart NTP in the active VRF (default or management). For example:

    cumulus@switch:~$ systemctl restart ntp@mgmt.service
    
  4. Wait a few minutes, then run the ntpq -c as command to verify the configuration:

    cumulus@switch:~$ ntpq -c as
    
    ind assid status  conf reach auth condition  last_event cnt
    ===========================================================
      1 40828  f014   yes   yes   ok     reject   reachable  1
    

    After a successful authorization, you see the following command output:

    cumulus@switch:~$ ntpq -c as
    
    ind assid status  conf reach auth condition  last_event cnt
    ===========================================================
      1 40828  f61a   yes   yes   ok   sys.peer    sys_peer  1
    

Considerations

NTP in Cumulus Linux uses the /usr/share/zoneinfo/leap-seconds.list file, which expires periodically and results in generated log messages about the expiration. When the file expires, update it from https://www.ietf.org/timezones/data/leap-seconds.list or upgrade the tzdata package to the newest version.

Precision Time Protocol - PTP

Cumulus Linux supports IEEE 1588-2008 Precision Timing Protocol (PTPv2), which defines the algorithm and method for synchronizing clocks of various devices across packet-based networks, including Ethernet switches and IP routers.

PTP is capable of sub-microsecond accuracy. The clocks are in a master-slave hierarchy, where the slaves synchronize to their masters, which can be slaves to their own masters. The best master clock (BMC) algorithm, which runs on every clock, creates and updates the hierarchy automatically. The grandmaster clock is the top-level master. To provide a high-degree of accuracy, a Global Positioning System (GPS) time source typically synchronizes the grandmaster clock.

In the following example:

Cumulus Linux and PTP

PTP in Cumulus Linux uses the linuxptp package that includes the following programs:

Cumulus Linux supports:

  • On NVIDIA switches with Spectrum-2 and later, PTP is not supported on 1G interfaces.
  • You cannot run both PTP and NTP on the switch.
  • PTP supports the default VRF only.

Basic Configuration

Basic PTP configuration requires you:

If you configure PTP with Linux commands, you must also enable PTP timestamping; see step 1 of the Linux procedure below. NVUE enables timestamping when you enable PTP on the switch.

The basic configuration shown below uses the default PTP settings:

To configure other settings, such as the PTP profile, domain, priority, and DSCP, the PTP interface transport mode and timers, and PTP monitoring, see the Optional Configuration sections below.

The NVUE nv set service ptp commands require an instance number (1 in the example command below) for management purposes.

When you enable the PTP service with the nv set service ptp <instance> enable on command, NVUE restarts the switchd service, which causes all network ports to reset in addition to resetting the switch hardware configuration.

cumulus@switch:~$ nv set service ptp 1 enable on
cumulus@switch:~$ nv set interface swp1 ip address 10.0.0.9/32
cumulus@switch:~$ nv set interface swp2 ip address 10.0.0.10/32
cumulus@switch:~$ nv set interface swp1 ptp enable on
cumulus@switch:~$ nv set interface swp2 ptp enable on
cumulus@switch:~$ nv config apply

The configuration writes to the /etc/ptp4l.conf file.

cumulus@switch:~$ nv set service ptp 1 enable on
cumulus@switch:~$ nv set bridge domain br_default
cumulus@switch:~$ nv set bridge domain br_default type vlan-aware
cumulus@switch:~$ nv set bridge domain br_default vlan 10-30
cumulus@switch:~$ nv set bridge domain br_default vlan 10 ptp enable on
cumulus@switch:~$ nv set interface vlan10 type svi
cumulus@switch:~$ nv set interface vlan10 ip address 10.1.10.2/24
cumulus@switch:~$ nv set interface vlan10 ptp enable on
cumulus@switch:~$ nv set interface swp1 bridge domain br_default
cumulus@switch:~$ nv set interface swp1 bridge domain br_default vlan 10
cumulus@switch:~$ nv set interface swp1 ptp enable on
cumulus@switch:~$ nv config apply

  • You can configure only one address; either IPv4 or IPv6.
  • For IPv6, set the trunk port transport mode to ipv6.

The configuration writes to the /etc/ptp4l.conf file.

cumulus@switch:~$ nv set service ptp 1 enable on
cumulus@switch:~$ nv set bridge domain br_default
cumulus@switch:~$ nv set bridge domain br_default type vlan-aware
cumulus@switch:~$ nv set bridge domain br_default vlan 10-30
cumulus@switch:~$ nv set bridge domain br_default vlan 10 ptp enable on
cumulus@switch:~$ nv set interface vlan10 type svi
cumulus@switch:~$ nv set interface vlan10 ip address 10.1.10.2/24
cumulus@switch:~$ nv set interface swp2 bridge domain br_default
cumulus@switch:~$ nv set interface swp2 bridge domain br_default access 10
cumulus@switch:~$ nv set interface swp2 ptp enable on
cumulus@switch:~$ nv config apply

  • You can configure only one address; either IPv4 or IPv6.
  • For IPv6, set the trunk port transport mode to ipv6.

The configuration writes to the /etc/ptp4l.conf file.

  1. Edit the /etc/cumulus/switchd.d/ptp.conf file to set the ptp.timestamping parameter to TRUE:

    cumulus@switch:~$ sudo nano /etc/cumulus/switchd.d/ptp.conf
    ...
    ptp.timestamping     TRUE
    ...
    
  2. Restart the switchd service:

    cumulus@switch:~$ sudo systemctl restart switchd.service
    

Restarting the switchd service causes all network ports to reset in addition to resetting the switch hardware configuration.

  1. Enable and start the ptp4l and phc2sys services:

    cumulus@switch:~$ sudo systemctl enable ptp4l.service phc2sys.service
    cumulus@switch:~$ sudo systemctl start ptp4l.service phc2sys.service
    
  2. Edit the Default interface options section of the /etc/ptp4l.conf file to configure the interfaces on the switch that you want to use for PTP.

    cumulus@switch:~$ sudo nano /etc/ptp4l.conf
    ...
    [global]
    #
    # Default Data Set
    #
    slaveOnly               0
    priority1               128
    priority2               128
    domainNumber            0
       
    twoStepFlag             1
    dscp_event              46
    dscp_general            46
    network_transport              L2
    dataset_comparison             G.8275.x
    G.8275.defaultDS.localPriority 128
    ptp_dst_mac                    01:80:C2:00:00:0E
    
    #
    # Port Data Set
    #
    logAnnounceInterval            -3
    logSyncInterval                -4
    logMinDelayReqInterval         -4
    announceReceiptTimeout         3
    delay_mechanism                E2E
    
    offset_from_master_min_threshold   -50
    offset_from_master_max_threshold   50
    mean_path_delay_threshold          200
    tsmonitor_num_ts                   100
    tsmonitor_num_log_sets             3
    tsmonitor_num_log_entries          4
    tsmonitor_log_wait_seconds         1
    
    #
    # Run time options
    #
    logging_level           6
    path_trace_enabled      0
    use_syslog              1
    verbose                 0
    summary_interval        0
       
    #
    # servo parameters
    #
    pi_proportional_const          0.000000
    pi_integral_const              0.000000
    pi_proportional_scale          0.700000
    pi_proportional_exponent       -0.300000
    pi_proportional_norm_max       0.700000
    pi_integral_scale              0.300000
    pi_integral_exponent           0.400000
    pi_integral_norm_max           0.300000
    step_threshold                 0.000002
    first_step_threshold           0.000020
    max_frequency                  900000000
    sanity_freq_limit              0
       
    #
    # Default interface options
    #
    time_stamping                  software
       
    # Interfaces in which ptp should be enabled
    # these interfaces should be routed ports
    # if an interface does not have an ip address
    # the ptp4l will not work as expected.
       
    [swp1]
    udp_ttl                 1
    masterOnly              0
    delay_mechanism         E2E
       
    [swp2]
    udp_ttl                 1
    masterOnly              0
    delay_mechanism         E2E
    

    For a trunk VLAN, add the VLAN configuration to the switch port stanza: set l2_mode to trunk, vlan_intf to the VLAN interface, and src_ip to the IP address of the VLAN interface:

    [swp1]
    l2_mode                 trunk
    vlan_intf               vlan10
    src_ip                  10.1.10.2
    udp_ttl                 1
    masterOnly              0
    delay_mechanism         E2E
    network_transport       UDPv4
    

    For a switch port VLAN, add the VLAN configuration to the switch port stanza: set l2_mode to access, vlan_intf to the VLAN interface, and src_ip to the IP address of the VLAN interface:

    [swp2]
    l2_mode                 access
    vlan_intf               vlan10
    src_ip                  10.1.10.2
    udp_ttl                 1
    masterOnly              0
    delay_mechanism         E2E
    network_transport       UDPv4
    
  3. Restart the ptp4l service:

    cumulus@switch:~$ sudo systemctl restart ptp4l.service
    

Global Configuration

Cumulus Linux provides several ways to modify the default basic global configuration. You can:

When a predefined profile is set, NVUE does not allow you to configure global parameters. Do not edit the Linux /etc/ptp4l.conf file to modify the global parameters when a predefined profile is in use. For information about profiles, see PTP Profiles.

Clock Domains

PTP domains allow different independent timing systems to be present in the same network without confusing each other. A PTP domain is a network or a portion of a network within which all the clocks synchronize. Every PTP message contains a domain number. A PTP instance works in only one domain and ignores messages that contain a different domain number. Cumulus Linux supports only one domain in the system.

You can specify multiple PTP clock domains. PTP isolates each domain from other domains so that each domain is a different PTP network. You can specify a number between 0 and 127.

The following example commands configure domain 3 when a profile is not set:

cumulus@switch:~$ nv set service ptp 1 domain 3
cumulus@switch:~$ nv config apply

Edit the Default Data Set section of the /etc/ptp4l.conf file to change the domainNumber setting, then restart the ptp4l service.

cumulus@switch:~$ sudo nano /etc/ptp4l.conf
[global]
#
# Default Data Set
#
slaveOnly               0
priority1               128
priority2               128
domainNumber            3
...
cumulus@switch:~$ sudo systemctl restart ptp4l.service

Clock Timestamp Mode

The Cumulus Linux switch provides the following clock timestamp modes:

One-step mode significantly reduces the number of PTP messages. Two-step mode is the default configuration.

Cumulus Linux supports one-step mode on switches with the Spectrum 2 and Spectrum-3 ASIC only.

The following example commands configure one-step mode when a profile is not set:

cumulus@switch:~$ nv set service ptp 1 two-step off
cumulus@switch:~$ nv config apply

To revert the clock timestamp mode to the default setting (two-step mode), run the nv set service ptp 1 two-step on command.

To set the clock timestamp mode for a custom profile based on IEEE1588, ITU 8275-1 or ITU 8275-2, run the nv set service ptp <instance-id> profile <profile-id> two-step command. For example, to set one-step mode for the custom profile called CUSTOM1, run the nv set service ptp 1 profile CUSTOM1 two-step off command.

Edit the Default Data Set section of the /etc/ptp4l.conf file to change the twoStepFlag setting to 0, then restart the ptp4l service.

cumulus@switch:~$ sudo nano /etc/ptp4l.conf
[global]
#
# Default Data Set
#
slaveOnly               0
priority1               254
priority2               254
domainNumber            3

twoStepFlag             0
dscp_event              43
dscp_general            43
...
cumulus@switch:~$ sudo systemctl restart ptp4l.service

To revert the clock timestamp mode to the default setting (two-step mode), change the twoStepFlag setting to 1.

PTP Priority

The BMC selects the PTP master according to the criteria in the following order:

  1. Priority 1
  2. Clock class
  3. Clock accuracy
  4. Clock variance
  5. Priority 2
  6. Port ID

Use the PTP priority to select the best master clock. You can set priority 1 and 2:

The range for both priority1 and priority2 is between 0 and 255. The default priority is 128. For the boundary clock, use a number above 128. The lower priority applies first.

The following example commands set priority 1 and priority 2 to 200 when a profile is not set:

cumulus@switch:~$ nv set service ptp 1 priority1 200
cumulus@switch:~$ nv set service ptp 1 priority2 200
cumulus@switch:~$ nv config apply

Edit the Default Data Set section of the /etc/ptp4l.conf file to change the priority1 and, or priority2 setting, then restart the ptp4l service.

cumulus@switch:~$ sudo nano /etc/ptp4l.conf
[global]
#
# Default Data Set
#
slaveOnly               0
priority1               200
priority2               200
domainNumber            3
...
cumulus@switch:~$ sudo systemctl restart ptp4l.service

Local Priority

Use the local priority when you create a custom profile based on a Telecom profile (ITU 8275-1 or ITU 8275-2). Modify the local priority in a custom profile to set the local priority of the local clock. You can set a value between 0 and 255. The default priority is 128.

The following example command configures the local priority to 10 for the custom profile called CUSTOM1, based on ITU 8275-2:

cumulus@switch:~$ nv set service ptp 1 profile CUSTOM1 local-priority 10
cumulus@switch:~$ nv config apply

Edit the G.8275.defaultDS.localPriority option in the /etc/ptp4l.conf file. After you save the /etc/ptp4l.conf file, restart the ptp4l service.

cumulus@switch:~$ sudo nano /etc/ptp4l.conf
[global]
#
# Default Data Set
#
slaveOnly                      0
priority1                      128
priority2                      128
domainNumber                   28

twoStepFlag                    1
dscp_event                     46
dscp_general                   46
network_transport              L2
dataset_comparison             G.8275.x
G.8275.defaultDS.localPriority 10
ptp_dst_mac                    01:80:C2:00:00:0E
...
cumulus@switch:~$ sudo systemctl restart ptp4l.service

Optional Global Configuration

Optional global PTP configuration includes configuring the DiffServ code point (DSCP). You can configure the DSCP value for all PTP IPv4 packets originated locally. You can set a value between 0 and 63.

cumulus@switch:~$ nv set service ptp 1 ip-dscp 22
cumulus@switch:~$ nv config apply

Edit the Default Data Set section of the /etc/ptp4l.conf file to change the dscp_event setting for PTP messages that trigger a timestamp read from the clock and the dscp_general setting for PTP messages that carry commands, responses, information, or timestamps.

After you save the /etc/ptp4l.conf file, restart the ptp4l service.

cumulus@switch:~$ sudo nano /etc/ptp4l.conf
[global]
#
# Default Data Set
#
slaveOnly               0
priority1               200
priority2               200
domainNumber            3

twoStepFlag             1
dscp_event              22
dscp_general            22
...
cumulus@switch:~$ sudo systemctl restart ptp4l.service

PTP Interface Configuration

Cumulus Linux provides several ways to modify the default basic interface configuration. You can:

When a profile is in use, avoid configuring the following interface configuration parameters with NVUE or in the Linux configuration file so that the interface retains its profile settings.

Transport Mode

By default, Cumulus Linux encapsulates PTP messages in UDP IPV4 frames. To encapsulate PTP messages on an interface in UDP IPV6 frames:

cumulus@switch:~$ nv set interface swp1 ptp transport ipv6
cumulus@switch:~$ nv config apply

Edit the Default interface options section of the /etc/ptp4l.conf file to change the network_transport setting for the interface, then restart the ptp4l service.

cumulus@switch:~$ sudo nano /etc/ptp4l.conf
...
# Default interface options
#
time_stamping           hardware

# Interfaces in which ptp should be enabled
# these interfaces should be routed ports
# if an interface does not have an ip address
# the ptp4l will not work as expected.

[swp1]
udp_ttl                 1
masterOnly              0
delay_mechanism         E2E
network_transport       UDPv6

[swp2]
udp_ttl                 1
masterOnly              0
delay_mechanism         E2E
network_transport       UDPv6
...
cumulus@switch:~$ sudo systemctl restart ptp4l.service

Message Mode

Cumulus Linux supports the following PTP message modes:

Multicast mode is the default setting; when you enable PTP on an interface, the message mode is multicast.

To change the message mode to mixed on swp1:

cumulus@switch:~$ nv set interface swp1 ptp mixed-multicast-unicast on
cumulus@switch:~$ nv config apply

To change the message mode back to the default setting of multicast on swp1:

cumulus@switch:~$ nv set interface swp1 ptp mixed-multicast-unicast off
cumulus@switch:~$ nv config apply

Edit the Default interface options section of the /etc/ptp4l.conf file to add the hybrid_e2e 1 line under the interface, then restart the ptp4l service.

cumulus@switch:~$ sudo nano /etc/ptp4l.conf
...
# Default interface options
#
time_stamping           hardware

# Interfaces in which ptp should be enabled
# these interfaces should be routed ports
# if an interface does not have an ip address
# the ptp4l will not work as expected.

[swp1]
hybrid_e2e              1
...
cumulus@switch:~$ sudo systemctl restart ptp4l.service

To change the message mode back to the default setting of multicast, remove the hybrid_e2e line under the interface, then restart the ptp4l service.

PTP Interface Timers

You can set the following timers for PTP messages.

TimerDescription
announce-intervalThe average interval between successive Announce messages. Specify the value as a power of two in seconds.
announce-timeoutThe number of announce intervals that have to occur without receiving an Announce message before a timeout occurs. Make sure that this value is longer than the announce-interval in your network.
delay-req-intervalThe minimum average time interval allowed between successive Delay Required messages.
sync-intervalThe interval between PTP synchronization messages on an interface. Specify the value as a power of two in seconds.

The following example sets the announce interval between successive Announce messages on swp1 to -1.

cumulus@switch:~$ nv set interface swp1 ptp timers announce-interval -1
cumulus@switch:~$ nv config apply

The following example sets the mean sync-interval for multicast messages on swp1 to -5.

cumulus@switch:~$ nv set interface swp1 ptp timers sync-interval -5
cumulus@switch:~$ nv config apply

Edit the Default interface options section of the /etc/ptp4l.conf file:

  • To set the announce interval between successive Announce messages on swp1 to -1, add logAnnounceInterval -1 under the interface stanza.
  • To set the mean sync-interval for multicast messages on swp1 to -5, add logSyncInterval -5 under the interface stanza.

After you edit the /etc/ptp4l.conf file, restart the ptp4l service.

cumulus@switch:~$ sudo nano /etc/ptp4l.conf
...
# Default interface options
#
time_stamping           hardware

# Interfaces in which ptp should be enabled
# these interfaces should be routed ports
# if an interface does not have an ip address
# the ptp4l will not work as expected.

[swp1]
logAnnounceInterval     -1
logSyncInterval         -5
udp_ttl                 20
masterOnly              1
delay_mechanism         E2E
...
cumulus@switch:~$ sudo systemctl restart ptp4l.service

Local Priority

Set the local priority on an interface for a profile that uses ITU 8275-1 or ITU 8275-2. You can set a value between 0 and 255. The default priority is 128.

The following example sets the local priority on swp1 to 10.

cumulus@switch:~$ nv set interface swp1 ptp local-priority 10
cumulus@switch:~$ nv config apply

Add the G.8275.portDS.localPriority option to the interface section of the /etc/ptp4l.conf file, then restart the ptp4l service.

cumulus@switch:~$ sudo nano /etc/ptp4l.conf
...
[swp1]
udp_ttl                      1
hybrid_e2e                   1
masterOnly                   0
delay_mechanism              E2E
network_transport            UDPv6
G.8275.portDS.localPriority  10
...
cumulus@switch:~$ sudo systemctl restart ptp4l.service

Optional PTP Interface Configuration

Forced Master Mode

By default, PTP ports are in auto mode, where the BMC algorithm determines the state of the port.

You can configure Forced Master mode on a PTP port so that it is always in a master state and the BMC algorithm does not run for this port. This port ignores any Announce messages it receives.

cumulus@switch:~$ nv set interface swp1 ptp forced-master on
cumulus@switch:~$ nv config apply

Edit the Default interface options section of the /etc/ptp4l.conf file to change the masterOnly setting for the interface, then restart the ptp4l service.

cumulus@switch:~$ sudo nano /etc/ptp4l.conf
...
# Default interface options
#
time_stamping           hardware

# Interfaces in which ptp should be enabled
# these interfaces should be routed ports
# if an interface does not have an ip address
# the ptp4l will not work as expected.

[swp1]
udp_ttl                 1
masterOnly              1
delay_mechanism         E2E
...
cumulus@switch:~$ sudo systemctl restart ptp4l.service

TTL for a PTP Message

To restrict the number of hops a PTP message can travel, set the TTL on the PTP interface. You can set a value between 1 and 255.

cumulus@switch:~$ nv set interface swp1 ptp ttl 20
cumulus@switch:~$ nv config apply

Edit the Default interface options section of the /etc/ptp4l.conf file to change the udp_ttl setting for the interface, then restart the ptp4l service.

cumulus@switch:~$ sudo nano /etc/ptp4l.conf
...
# Default interface options
#
time_stamping           hardware

# Interfaces in which ptp should be enabled
# these interfaces should be routed ports
# if an interface does not have an ip address
# the ptp4l will not work as expected.

[swp1]
udp_ttl                 20
masterOnly              1
delay_mechanism         E2E
...
cumulus@switch:~$ sudo systemctl restart ptp4l.service

Unicast Mode

Cumulus Linux supports unicast mode so that a unicast client can perform Unicast Discover and Negotiation with servers. Unlike the default multicast mode, where both the server(master) and client(slave) start sending out announce requests and discover each other, in unicast mode, the client starts by sending out requests for unicast transmission. The client sends this to every server address in its Unicast Master Table. The server responds with an accept or deny to the request.

Global Unicast Configuration

Unicast clients need a unicast master table for unicast negotiation; you must configure at least one unicast master table on the switch.

To configure unicast globally:

Interface Unicast Configuration

For interface unicast configuration, in addition to enabling PTP on an interface, you also need to configure the PTP interface to be either a unicast client or a unicast server.

When configuring multiple PTP interfaces on the switch to be unicast clients, you must configure a unicast table ID on every interface set as a unicast client. Each client must have a different table ID.

To configure a PTP interface to be the unicast client:

cumulus@switch:~$ nv set interface swp1 ptp unicast-service-mode client
cumulus@switch:~$ nv config apply
  1. Add the following lines at the end of the interface section of the /etc/ptp4l.conf file:

    [unicast_master_table]
    table_id               3
    logQueryInterval       0
    UDPv4                  100.100.100.1
    
    [swp1]
    table_id                1
    ...
    
  2. Restart the ptp4l service.

    cumulus@switch:~$ sudo systemctl restart ptp4l.service
    

To configure a PTP interface to be the unicast server:

cumulus@switch:~$ nv set interface swp1 ptp unicast-service-mode server
cumulus@switch:~$ nv config apply
  1. Add the following lines at the end of the interface section of the /etc/ptp4l.conf file:

    [swp1]
    ...
    unicast_listen      1
    ...
    
  2. Restart the ptp4l service.

    cumulus@switch:~$ sudo systemctl restart ptp4l.service
    

To configure a unicast table ID:

cumulus@switch:~$ nv set interface swp1 ptp unicast-master-table-id 1
cumulus@switch:~$ nv config apply
  1. Add the table ID at the end of the interface section of the /etc/ptp4l.conf file:

    [swp1]
    ...
    table_id   1
    
    
  2. Restart the ptp4l service.

    cumulus@switch:~$ sudo systemctl restart ptp4l.service
    

To show the unicast master table configuration on the switch, run the nv show service ptp <instance-id> unicast-master <table-id> command.

Optional Unicast Interface Configuration

You can set the unicast request duration for unicast clients, which is the service time in seconds requested by the unicast client during unicast negotiation. The default value is 300 seconds.

cumulus@switch:~$ nv set interface swp1 ptp unicast-request-duration 20
cumulus@switch:~$ nv config apply
  1. Add the unicast_request_duration parameter at the end of the interface section of the /etc/ptp4l.conf file:

    [swp1]
    ...
    table_id   1
    unicast_request_duration 20
    
  2. Restart the ptp4l service.

    cumulus@switch:~$ sudo systemctl restart ptp4l.service
    

PTP Profiles

PTP profiles are a standardized set of configurations and rules intended to meet the requirements of a specific application. Profiles define required, allowed, and restricted PTP options, network restrictions, and performance requirements.

Cumulus Linux supports the following predefined profiles:

IEEE 1588ITU 8275-1ITU 8275-2
ApplicationEnterpriseMobile NetworksMobile Networks
TransportLayer 2 and Layer 3Layer 2Layer 3
Encapsulation802.3, UDPv4, or UDPv6802.3UDPv4 or UDPv6
TransmissionUnicast and MulticastMulticastUnicast
Supported Clock TypesBoundary ClockBoundary ClockBoundary Clock

  • You cannot modify the predefined profiles. If you want to set a parameter to a different value in a predefined profile, you need to create a custom profile. You can modify a custom profile within the range applicable to the profile type.
  • You cannot set the current profile to a profile not yet created.
  • You cannot set global PTP parameters in a profile currently in use.
  • PTP profiles do not support VLANs or bonds.
  • If you set a predefined or custom profile, do not change any global PTP settings, such as the DSCP or the clock domain.
  • For better performance in a high scale network with PTP on multiple interfaces, configure a higher system policer rate with the nv set system control-plane policer lldp-ptp burst <value> and nv set system control-plane policer lldp-ptp rate <value> commands. The switch uses the LLDP policer for PTP protocol packets. The default value for the LLDP policer is 2500. When you use the ITU 8275.1 profile with higher sync rates, use higher policer values.

Set a Predefined Profile

To set a predefined profile:

  • To set the ITU 8275.1 profile, run the nv set service ptp <instance-id> current-profile default-itu-8275-1 command.
  • To set the ITU 8275.2 profile, run the nv set service ptp <instance-id> current-profile default-itu-8275-2 command.

The following example sets the profile to ITU 8275.1

cumulus@switch:~$ nv set service ptp 1 current-profile default-itu-8275-1
cumulus@switch:~$ nv config apply

To set the IEEE 1588 profile:

cumulus@switch:~$ nv set service ptp 1 current-profile default-1588
cumulus@switch:~$ nv config apply

To set the predefined ITU 8275.1 profile, edit the /etc/ptp4l.conf file and set the parameters shown below, then restart the ptp4l service:

cumulus@switch:~$ sudo nano /etc/ptp4l.conf
...
...
[global]
#
# Default Data Set
#
slaveOnly                      0
priority1                      128
priority2                      128
domainNumber                   24

twoStepFlag                    1
dscp_event                     46
dscp_general                   46
dataset_comparison             G.8275.x
G.8275.defaultDS.localPriority 128
ptp_dst_mac                    01:80:C2:00:00:0E
...
cumulus@switch:~$ sudo systemctl restart ptp4l.service

To set the predefined ITU 8275.2 profile, edit the /etc/ptp4l.conf file and set the parameters shown below, then restart the ptp4l service:

cumulus@switch:~$ sudo nano /etc/ptp4l.conf
...
...
[global]
#
# Default Data Set
#
slaveOnly                      0
priority1                      128
priority2                      128
domainNumber                   24

twoStepFlag                    1 
dscp_event                     46
dscp_general                   46
network_transport              UDPv4
dataset_comparison             G.8275.x
G.8275.defaultDS.localPriority 128
hybrid_e2e                     1
inhibit_multicast_service      1
unicast_listen                 1
unicast_req_duration           60
...
cumulus@switch:~$ sudo systemctl restart ptp4l.service

To use the predefined IEEE 1588 profile, edit the /etc/ptp4l.conf file and set the parameters shown below, then restart the ptp4l service:

cumulus@switch:~$ sudo nano /etc/ptp4l.conf
[global]
#
# Default Data Set
#
slaveOnly                      0
priority1                      128
priority2                      128
domainNumber                   0

twoStepFlag                    1
dscp_event                     46
dscp_general                   46
network_transport              UDPv4
dataset_comparison             ieee1588
...
cumulus@switch:~$ sudo systemctl restart ptp4l.service

Create a Custom Profile

To create a custom profile:

  • Create a profile name.
  • Set the profile type on which to base the new profile (itu-g-8275-1 itu-g-8275-2, or ieee-1588).
  • Update any of the profile settings you want to change (announce-interval, delay-req-interval, priority1, sync-interval, announce-timeout, domain, priority2, transport, delay-mechanism, local-priority).
  • Set the custom profile to be the current profile.

The following example commands create a custom profile called CUSTOM1 based on the predefined profile ITU 8275-1. The commands set the domain to 28 and the announce-timeout to 3, then set CUSTOM1 to be the current profile:

cumulus@switch:~$  nv set service ptp 1 profile CUSTOM1 
cumulus@switch:~$  nv set service ptp 1 profile CUSTOM1 profile-type itu-g-8275-1  
cumulus@switch:~$  nv set service ptp 1 profile CUSTOM1 domain 28
cumulus@switch:~$  nv set service ptp 1 profile CUSTOM1 announce-timeout 3
cumulus@switch:~$  nv set service ptp 1 current-profile CUSTOM1
cumulus@switch:~$  nv config apply

The following example /etc/ptp4l.conf file creates a custom profile based on the predefined profile ITU 8275-1 and sets the domain to 28 and the announce-timeout to 3.

cumulus@switch:~$ sudo nano /etc/ptp4l.conf
[global]
#
# Default Data Set
#
slaveOnly                      0
priority1                      128
priority2                      128
domainNumber                   28

twoStepFlag                    1
dscp_event                     46
dscp_general                   46
network_transport              L2
dataset_comparison             G.8275.x
G.8275.defaultDS.localPriority 128
ptp_dst_mac                    01:80:C2:00:00:0E

#
# Port Data Set
#
logAnnounceInterval            5
logSyncInterval                -4
logMinDelayReqInterval         -4
announceReceiptTimeout         3
delay_mechanism                E2E

offset_from_master_min_threshold   -50
offset_from_master_max_threshold   50
mean_path_delay_threshold          200
tsmonitor_num_ts                   100
tsmonitor_num_log_sets             3
tsmonitor_num_log_entries          4
tsmonitor_log_wait_seconds         1

#
# Run time options
#
logging_level                  6
path_trace_enabled             0
use_syslog                     1
verbose                        0
summary_interval               0

#
# servo parameters
#
pi_proportional_const          0.000000
pi_integral_const              0.000000
pi_proportional_scale          0.700000
pi_proportional_exponent       -0.300000
pi_proportional_norm_max       0.700000
pi_integral_scale              0.300000
pi_integral_exponent           0.400000
pi_integral_norm_max           0.300000
step_threshold                 0.000002
first_step_threshold           0.000020
max_frequency                  900000000
sanity_freq_limit              0

#
# Default interface options
#
time_stamping                  software


# Interfaces in which ptp should be enabled
# these interfaces should be routed ports
# if an interface does not have an ip address
# the ptp4l will not work as expected.

[swp1]
udp_ttl                 1
masterOnly              0
delay_mechanism         E2E

[swp2]
udp_ttl                 1
masterOnly              0
delay_mechanism         E2E
cumulus@switch:~$ sudo systemctl restart ptp4l.service

To show the current PTP profile setting, run the nv show service ptp <ptp-instance> command:

cumulus@switch:~$ nv show service ptp 1
                             operational  applied             description
---------------------------  -----------  ------------------  --------------------------------------------------------------------
enable                       on           on                  Turn the feature 'on' or 'off'.  The default is 'off'.
current-profile                           default-itu-8275-1  Current PTP profile index
domain                       24           0                   Domain number of the current syntonization
ip-dscp                      46           46                  Sets the Diffserv code point for all PTP packets originated locally.
priority1                    128          128                 Priority1 attribute of the local clock
priority2                    128          128                 Priority2 attribute of the local clock
...

To show the settings for a profile, run the nv show service ptp <instance> profile <profile-name> command:

cumulus@switch:~$ nv show service ptp 1 profile CUSTOM1
                             operational  applied           
---------------------------  -----------  ------------------
enable                                    on                
current-profile                           default-itu-8275-1
domain                                    0                 
ip-dscp                                   46                
logging-level                             info              
priority1                                 128               
priority2                                 128               
[acceptable-master]    
monitor                                                     
  max-offset-threshold                    50                
  max-timestamp-entries                   100               
  max-violation-log-entries               4                 
  max-violation-log-sets                  3                 
  min-offset-threshold                    -50               
  path-delay-threshold                    200               
  violation-log-interval                  1                 

Optional Acceptable Master Table

The acceptable master table option is a security feature that prevents a rogue player from pretending to be the grandmaster clock to take over the PTP network. To use this feature, you configure the clock IDs of known grandmaster clocks in the acceptable master table and set the acceptable master table option on a PTP port. The BMC algorithm checks if the grandmaster clock received in the Announce message is in this table before proceeding with the master selection. Cumulus Linux disables this option by default on PTP ports.

The following example command adds the grandmaster clock ID 24:8a:07:ff:fe:f4:16:06 to the acceptable master table and enables the PTP acceptable master table option for swp1:

cumulus@switch:~$ nv set service ptp 1 acceptable-master 24:8a:07:ff:fe:f4:16:06
cumulus@switch:~$ nv config apply

You can also configure an alternate priority 1 value for the Grandmaster:

cumulus@switch:~$ nv set service ptp 1 acceptable-master 24:8a:07:ff:fe:f4:16:06 alt-priority 2

To enable the PTP acceptable master table option for swp1:

cumulus@switch:~$ nv set interface swp1 ptp acceptable-master on
cumulus@switch:~$ nv config apply

Edit the Default interface options section of the /etc/ptp4l.conf file to add acceptable_master_clockIdentity 248a07.fffe.f41606.

cumulus@switch:~$ sudo nano /etc/ptp4l.conf
...
#
# Default interface options
#
time_stamping           hardware


[acceptable_master_table]
maxTableSize 16
acceptable_master_clockIdentity 248a07.fffe.f41606
...

You can also configure an alternate priority 1 value for the Grandmaster.

cumulus@switch:~$ sudo nano /etc/ptp4l.conf
...
#
# Default interface options
#
time_stamping           hardware


[acceptable_master_table]
maxTableSize 16
acceptable_master_clockIdentity 248a07.fffe.f41606 2

To enable the PTP acceptable master table option for swp1, add acceptable_master on under [swp1].

...
# Default interface options
#
time_stamping           hardware

# Interfaces in which ptp should be enabled
# these interfaces should be routed ports
# if an interface does not have an ip address
# the ptp4l will not work as expected.

[swp1]
udp_ttl                 20
masterOnly              1
delay_mechanism         E2E
acceptable_master       on
...

Restart the ptp4l service:

cumulus@switch:~$ sudo systemctl restart ptp4l.service

Optional Monitor Configuration

Cumulus Linux provides the following optional PTP monitoring configuration.

Configure Clock TimeStamp and Path Delay Thresholds

Cumulus Linux monitors clock timestamp and path delay against thresholds, and generates counters when PTP reaches the set thresholds. You can see the counters in the NVUE nv show command output and in log messages.

You can configure the following monitor settings:

CommandDescription
nv set service ptp <instance> monitor min-offset-thresholdSets the minimum difference allowed between the master and slave time. You can set a value between -1000000000 and 0 nanoseconds. The default value is -50 nanoseconds.
nv set service ptp <instance> monitor max-offset-thresholdSets the maximum difference allowed between the master and slave time. You can set a value between 0 and 1000000000 nanoseconds. The default value is 50 nanoseconds.
nv set service ptp <instance> monitor path-delay-thresholdSets the mean time that PTP packets take to travel between the master and slave. You can set a value between 0 and 1000000000 nanoseconds. The default value is 200 nanoseconds.
nv set service ptp <instance> monitor max-timestamp-entriesSets the maximum number of timestamp entries allowed. Cumulus Linux updates the timestamps continuously. You can specify a value between 100 and 200. The default value is 100 entries.

The following example sets the minimum offset threshold to -1000, the maximum offset threshold to 1000, and the path delay threshold to 300:

cumulus@switch:~$ nv set service ptp 1 monitor min-offset-threshold -1000
cumulus@switch:~$ nv set service ptp 1 monitor max-offset-threshold 1000
cumulus@switch:~$ nv set service ptp 1 monitor path-delay-threshold 300
cumulus@switch:~$ nv config apply

You can configure the following monitor settings manually in the /etc/ptp4l.conf file. Be sure to run the sudo systemctl restart ptp4l.service to apply the settings.

ParameterDescription
offset_from_master_min_thresholdSets the minimum difference allowed between the master and slave time. You can set a value between -1000000000 and 0 nanoseconds. The default value is -50 nanoseconds.
offset_from_master_max_thresholdSets the maximum difference allowed between the master and slave time. You can set a value between 0 and 1000000000 nanoseconds. The default value is 50 nanoseconds.
mean_path_delay_thresholdSets the mean time that PTP packets take to travel between the master and slave. You can set a value between 0 and 1000000000 nanoseconds. The default value is 200 nanoseconds.

The following example sets the minimum offset threshold to -1000, the maximum offset threshold to 1000, and the path delay threshold to 300:

cumulus@switch:~$ sudo nano /etc/ptp4l.conf
...
[global]
#
# Default Data Set
#
slaveOnly               0
priority1               128
priority2               128
domainNumber            0

twoStepFlag             1
dscp_event              46
dscp_general            46

offset_from_master_min_threshold   -1000
offset_from_master_max_threshold   1000
mean_path_delay_threshold          300
...

Configure PTP Logging

A log set contains the log entries for clock timestamp and path delay violations at different times. You can set the number of entries to log and the interval between successive violation logs.

CommandDescription
nv set service ptp 1 monitor max-violation-log-setsSets the maximum number of log sets allowed. You can specify a value between 2 and 4. The default value is 3.
nv set service ptp 1 monitor max-violation-log-entriesSets the maximum number of log entries allowed in a log set. You can specify a value between 4 and 8. The default value is 4.
nv set service ptp 1 monitor violation-log-intervalSets the number of seconds to wait before logging back-to-back violations. You can specify a value between 0 and 60. The default value is 1.

The following example sets the maximum number of log sets allowed to 4, the maximum number of log entries allowed to 6, and the violation log interval to 10:

cumulus@switch:~$ nv set service ptp 1 monitor max-violation-log-sets 4
cumulus@switch:~$ nv set service ptp 1 monitor max-violation-log-entries 6
cumulus@switch:~$ nv set service ptp 1 monitor violation-log-interval 10
cumulus@switch:~$ nv config apply

You can configure the following monitor settings manually in the /etc/ptp4l.conf file. Be sure to run the sudo systemctl restart ptp4l.service to apply the settings.

ParameterDescription
tsmonitor_num_log_setsSets the maximum number of log sets allowed. You can specify a value between 2 and 4. The default value is 3.
tsmonitor_num_log_entriesSets the maximum number of log entries allowed in a log set. You can specify a value between 4 and 8. The default value is 4.
tsmonitor_log_wait_secondsSets the number of seconds to wait before logging back-to-back violations. You can specify a value between 0 and 60. The default value is 1.

The following example sets the maximum number of log sets allowed to 4, the maximum number of log entries allowed to 6, and the violation log interval to 10:

cumulus@switch:~$ sudo nano /etc/ptp4l.conf
...
[global]
#
# Default Data Set
#
slaveOnly               0
priority1               128
priority2               128
domainNumber            0

twoStepFlag             1
dscp_event              46
dscp_general            46

offset_from_master_min_threshold   -50
offset_from_master_max_threshold   50
mean_path_delay_threshold          300
tsmonitor_num_ts                   100
tsmonitor_num_log_sets             4
tsmonitor_num_log_entries          6
tsmonitor_log_wait_seconds         10
...

Show PTP Logs

PTP monitoring provides commands to show counters for violations as well as the timestamp log entries for a violation.

CommandDescription
nv show service ptp <instance> monitor timestamp-logShows the last 25 PTP timestamps.
nv show service ptp <instance> monitor violationsShows the threshold violation count and the last time a violation of a specific type occurred.
nv show service ptp 1 monitor violations log acceptable-masterShows logs with violations that occur when a PTP server not in the Acceptable Master table sends an Announce request.
nv show service ptp 1 monitor violations log forced-masterShows logs with violations that occur when a forced master port gets a higher clock.
nv show service ptp 1 monitor violations log max-offsetShows logs with violations that occur when the timestamp offset is higher than the max offset threshold.
nv show service ptp 1 monitor violations log min-OffsetShows logs with violations that occur when the timestamp offset is lower than the minimum offset threshold.
nv show service ptp 1 monitor violations log path-delayShows logs with violations that occur when the mean path delay is higher than the path delay threshold.

The following example shows the threshold violation count and the last time a minimum offset threshold violation occurred:

cumulus@switch:~$ nv show service ptp 1 monitor violations
                  operational                  applied
----------------  ---------------------------  -------
last-max-offset
last-min-offset   2023-04-24T15:22:01.312295Z
last-path-delay
max-offset-count  0
min-offset-count  2
path-delay-count  0

Clear PTP Violation Logs

cumulus@leaf01:mgmt:~$ nv action clear service ptp 1 monitor violations log path-delay
Action succeeded

Delete PTP Configuration

To delete PTP configuration, delete the PTP master and slave interfaces. The following example commands delete the PTP interfaces swp1, swp2, and swp3.

cumulus@switch:~$ nv unset interface swp1 ptp
cumulus@switch:~$ nv unset interface swp2 ptp
cumulus@switch:~$ nv unset interface swp3 ptp
cumulus@switch:~$ nv config apply

Edit the /etc/ptp4l.conf file to remove the interfaces from the Default interface options section, then restart the ptp4l service.

cumulus@switch:~$ sudo nano /etc/ptp4l.conf
...
# Default interface options
#
time_stamping           hardware

# Interfaces in which ptp should be enabled
# these interfaces should be routed ports
# if an interface does not have an ip address
# the ptp4l will not work as expected.
cumulus@switch:~$ sudo systemctl restart ptp4l.service

To disable PTP on the switch and stop the ptp4l and phc2sys processes:

cumulus@switch:~$ nv set service ptp 1 enable off
cumulus@switch:~$ nv config apply
cumulus@switch:~$ sudo systemctl stop ptp4l.service phc2sys.service
cumulus@switch:~$ sudo systemctl disable ptp4l.service phc2sys.service

Troubleshooting

Show PTP Configuration

To show a summary of the PTP configuration on the switch, run the nv show service ptp <instance> command:

cumulus@switch:~$ nv show service ptp 1
                             operational  applied
---------------------------  -----------  ------------------
enable                       on           on
current-profile                            default-itu-8275-2
domain                                    0
ip-dscp                                   46
logging-level                             info
priority1                                 128
priority2                                 128
[acceptable-master]
monitor
  max-offset-threshold                     50
  max-timestamp-entries                   100
  max-violation-log-entries               4
  max-violation-log-sets                  2
  min-offset-threshold                     -50
  path-delay-threshold                    200
  violation-log-interval                  1
[profile]                                  abc
[profile]                                  default-1588
[profile]                                  default-itu-8275-1
[profile]                                  default-itu-8275-2
[unicast-master]                          1
[unicast-master]                          2
[unicast-master]                          3
[unicast-master]                          4
...

You can drill down with the following nv show service ptp <instance> commands:

Show PTP Interface Configuration

To check configuration for a PTP interface, run the nv show interface <interface> ptp command.

cumulus@switch:~$ nv show interface swp1 ptp
                           operational  applied     description
-------------------------  -----------  ----------  ----------------------------------------------------------------------
enable                                  on          Turn the feature 'on' or 'off'.  The default is 'off'.
acceptable-master                       off         Determines if acceptable master check is enabled for this interface.
delay-mechanism            end-to-end   end-to-end  Mode in which PTP message is transmitted.
forced-master              off          off         Configures PTP interfaces to forced master state.
instance                                1           PTP instance number.
mixed-multicast-unicast                 off         Enables Multicast for Announce, Sync and Followup and Unicast for D...
transport                  ipv4         ipv4        Transport method for the PTP messages.
ttl                        1            1           Maximum number of hops the PTP messages can make before it gets dro...
unicast-request-duration                300         The service time in seconds to be requested during discovery.
timers
  announce-interval        0            0           Mean time interval between successive Announce messages.  It's spec...
  announce-timeout         3            3           The number of announceIntervals that have to pass without receipt o...
  delay-req-interval       -3           -3          The minimum permitted mean time interval between successive Delay R...
  sync-interval            -3           -3          The mean SyncInterval for multicast messages.  It's specified as a...
peer-mean-path-delay       0                        An estimate of the current one-way propagation delay on the link wh...
port-state                 master                   State of the port
protocol-version           2                        The PTP version in use on the port

Show PTP Counters

To show all PTP counters, run the nv show service ptp <instance> counters command:

cumulus@switch:~$ nv show service ptp 1 counters
Packet Type              Received       Transmitted    
---------------------    ------------   ------------   
Port swp4
  Announce                 0              10370            
  Sync                     0              20731             
  Follow-up                0              20731            
  Delay Request            0              0              
  Delay Response           0              0              
  Peer Delay Request       0              0              
  Peer Delay Response      0              0              
  Management               0              0              
  Signaling                0              0

To show PTP counters for an interface, run the nv show interface <interface> counters ptp command.

To clear PTP counters for an interface, run the nv action clear interface <interface> counters ptp command:

cumulus@switch:~$ nv action clear interface swp1 counters ptp
Action succeeded

Show the Status of All PTP Interfaces

To show the status of all PTP interfaces, run the nv show service ptp <instance> status command. The command output shows the PTP enabled ports, the PTP port mode (unicast or multicast), the state of the port based on BMCA, the unicast state, and identifies the server address to which the client connects.

cumulus@switch:~$ nv show service ptp 1 status
Port   Mode   State    Ustate                           Server
-----  -----  -------  -------------------------------  -------
swp9   Ucast  SLAVE    Sync and Delay Granted (H_SYDY)  9.9.9.2
swp10  Ucast  PASSIVE  Initial State (WAIT)
swp11  Ucast  PASSIVE  Initial State (WAIT)
swp12  Ucast  PASSIVE  Initial State (WAIT)

Show the List of NVUE PTP Commands

cumulus@switch:~$ nv list-commands service ptp
nv show service ptp
nv show service ptp <instance-id>
nv show service ptp <instance-id> status
nv show service ptp <instance-id> domain
nv show service ptp <instance-id> priority1
nv show service ptp <instance-id> priority2
nv show service ptp <instance-id> ip-dscp
nv show service ptp <instance-id> acceptable-master
...
cumulus@switch:~$ nv list-commands | grep 'nv show interface <interface-id> ptp'
...
nv show interface <interface-id> ptp
nv show interface <interface-id> ptp timers
nv show interface <interface-id> ptp shaper
...

Example Configuration

In the following example, the boundary clock on the switch receives time from Master 1 (the grandmaster) on PTP slave port swp1, sets its clock and passes the time down through PTP master ports swp2, swp3, and swp4 to the hosts that receive the time.

The following example configuration assumes that you have already configured the layer 3 routed interfaces (swp1, swp2, swp3, and swp4) you want to use for PTP.

cumulus@switch:~$ nv set service ptp 1 enable on
cumulus@switch:~$ nv set service ptp 1 priority2 254
cumulus@switch:~$ nv set service ptp 1 priority1 254
cumulus@switch:~$ nv set service ptp 1 domain 3
cumulus@switch:~$ nv set interface swp1 ptp enable on
cumulus@switch:~$ nv set interface swp2 ptp enable on
cumulus@switch:~$ nv set interface swp3 ptp enable on
cumulus@switch:~$ nv set interface swp4 ptp enable on
cumulus@switch:~$ nv config apply
cumulus@switch:~$ sudo cat /etc/nvue.d/startup.yaml
- set:
    interface:
      lo:
        ip:
          address:
            10.10.10.1/32: {}
        type: loopback
      swp1:
        ptp:
          enable: on
        type: swp
      swp2:
        ptp:
          enable: on
        type: swp
      swp3:
        ptp:
          enable: on
        type: swp
      swp4:
        ptp:
          enable: on
        type: swp
    service:
      ptp:
        '1':
          domain: 3
          enable: on
          priority1: 254
          priority2: 254
cumulus@switch:~$ sudo cat /etc/ptp4l.conf
...
[global]
#
# Default Data Set
#
slaveOnly                      0
priority1                      254
priority2                      254
domainNumber                   3

twoStepFlag                    1
dscp_event                     46
dscp_general                   46

offset_from_master_min_threshold   -50
offset_from_master_max_threshold   50
mean_path_delay_threshold          200
tsmonitor_num_ts                   100
tsmonitor_num_log_sets             2
tsmonitor_num_log_entries          4
tsmonitor_log_wait_seconds         1

#
# Run time options
#
logging_level                  6
path_trace_enabled             0
use_syslog                     1
verbose                        0
summary_interval               0

#
# servo parameters
#
pi_proportional_const          0.000000
pi_integral_const              0.000000
pi_proportional_scale          0.700000
pi_proportional_exponent       -0.300000
pi_proportional_norm_max       0.700000
pi_integral_scale              0.300000
pi_integral_exponent           0.400000
pi_integral_norm_max           0.300000
step_threshold                 0.000002
first_step_threshold           0.000020
max_frequency                  900000000
sanity_freq_limit              0

#
# Default interface options
#
time_stamping                  software


# Interfaces in which ptp should be enabled
# these interfaces should be routed ports
# if an interface does not have an ip address
# the ptp4l will not work as expected.

[swp1]
udp_ttl                      1
masterOnly                   0
delay_mechanism              E2E
network_transport            UDPv4

[swp2]
udp_ttl                      1
masterOnly                   0
delay_mechanism              E2E
network_transport            UDPv4

[swp3]
udp_ttl                      1
masterOnly                   0
delay_mechanism              E2E
network_transport            UDPv4

[swp4]
udp_ttl                      1
masterOnly                   0
delay_mechanism              E2E
network_transport            UDPv4

Considerations

PTP Traffic Shaping

To improve performance on the NVIDIA Spectrum 1 switch for PTP-enabled ports with speeds lower than 100G, you can enable a pre-defined traffic shaping profile. For example, if you see that the PTP timing offset varies widely and does not stabilize, enable PTP shaping on all PTP enabled ports to reduce the bandwidth on the ports slightly and improve timing stabilization.

  • Switches with Spectrum-2 and later do not support PTP shaping.

  • Bonds do not support PTP shaping.

  • You cannot configure QoS traffic shaping and PTP traffic shaping on the same ports.

  • You must configure a strict priority for PTP traffic; for example:

    cumulus@switch:~$ nv set qos egress-scheduler default-global traffic-class 0-5,7 mode dwrr
    cumulus@switch:~$ nv set qos egress-scheduler default-global traffic-class 0-5,7 bw-percent 12
    cumulus@switch:~$ nv set qos egress-scheduler default-global traffic-class 6 mode strict
    

For each PTP-enabled port on which you want to set traffic shaping, run the nv set interface <interface> ptp shaper enable on command.

cumulus@switch:~$ nv set interface swp1 ptp shaper enable on
cumulus@switch:~$ nv set interface swp2 ptp shaper enable on
cumulus@switch:~$ nv config apply

To see the PTP shaping setting for an interface, run the nv show interface <interface> ptp shaper command:

cumulus@switch:~$ nv show interface swp1 ptp shaper
        operational  applied  
------  -----------  -------  
enable               on   

In the /etc/cumulus/switchd.d/ptp_shaper.conf file, set the following parameters for the interfaces to which you want to apply traffic shaping and enable the traffic shaper. You must reload switchd for the changes to take effect.

cumulus@switch:~$ sudo nano /etc/cumulus/switchd.d/ptp_shaper.conf
## Per-port configuration for PTP shaper
ptp_shaper.port_group_list = [enable-group]
ptp_shaper.enable-group.port_set = swp1,swp2
ptp_shaper.enable-group.ptp_shaper_enable = true
cumulus@switch:~$ sudo systemctl reload switchd.service

Spanning Tree and PTP

PTP frames are affected by STP filtering; events, such as an STP topology change (where ports temporarily go into the blocking state), can cause interruptions to PTP communications.

If you configure PTP on bridge ports, NVIDIA recommends that the bridge ports are spanning tree edge ports or in a bridge domain where spanning tree is disabled.

Authentication Authorization and Accounting

This section describes how to set up user accounts and ssh for remote access, and configure LDAP authentication, TACACS+, and RADIUS AAA.

SSH for Remote Access

Cumulus Linux uses the OpenSSH package to provide access to the system using the Secure Shell (SSH) protocol.

Configure SSH

You can configure SSH to provide login access to the root user and to specific user accounts, limit SSH to listen on a specific VRF, and configure timeouts and session options.

Root User Settings

By default, the root account cannot use SSH to log in.

You can configure the root account to use SSH to log into the switch with:

To allow the root account to SSH into the switch with a password:

cumulus@switch:~$ nv set system ssh-server permit-root-login enabled
cumulus@switch:~$ nv config apply

Run the nv set system ssh-server permit-root-login disabled command to disable SSH login for the root account with a password.

To allow the root account to SSH into the switch and authenticate with a public key or any allowed mechanism that is not a password and not keyboard interactive:

cumulus@switch:~$ nv set system ssh-server permit-root-login prohibit-password
cumulus@switch:~$ nv config apply

To allow the root account to SSH into the switch and only run a set of commands defined in the authorized_keys file:

cumulus@switch:~$ nv set system ssh-server permit-root-login forced-commands-only
cumulus@switch:~$ nv config apply

To allow the root account to SSH into the switch using a password, edit the /etc/ssh/sshd_config file and set the PermitRootLogin option to yes:

cumulus@switch:~$ sudo cat /etc/ssh/sshd_config
...
# Authentication:
LoginGraceTime 2m
PermitRootLogin yes
...

Set the PermitRootLogin command to no to disable SSH login with a password.

To allow the root account to SSH into the switch and authenticate with a public key or any allowed mechanism that is not a password and not keyboard interactive:

  1. Create an .ssh directory for the root user.

    cumulus@switch:~$ sudo mkdir -p /root/.ssh
    cumulus@switch:~$ sudo chmod 0700 /root/.ssh 
    
  2. As a privileged user (such as the cumulus user), either echo the public key contents and redirect the contents to the authorized key file or copy the public key file to the switch, then copy it to the root account (with privilege escalation).

    To echo the public key contents and redirect the contents to the authorized key file:

    cumulus@switch:~$ echo "<SSH public key contents>" | sudo tee -a /root/.ssh/authorized_keys 
    cumulus@switch:~$ sudo chmod 0644 /root/.ssh/authorized_keys 
    

    To copy the public key file to the switch, then copy it to the root account:

    cumulus@switch:~$ sudo cp <SSH public key file> /root/.ssh/authorized_keys 
    cumulus@switch:~$ sudo chmod 0644 /root/.ssh/authorized_keys
    

Allow and Deny Users

To allow certain users to establish an SSH session:

cumulus@switch:~$ nv set system ssh-server allow-users user1
cumulus@switch:~$ nv config apply

To deny certain users to establish an SSH session:

cumulus@switch:~$ nv set system ssh-server deny-users user4
cumulus@switch:~$ nv config apply

To allow certain users to establish an SSH session, edit the /etc/ssh/sshd_config file and add the AllowUsers parameter:

cumulus@switch:~$ sudo cat /etc/ssh/sshd_config
...
...
# Example of overriding settings on a per-user basis
#Match User anoncvs
# X11Forwarding no
# AllowTcpForwarding no
# PermitTTY no
# ForceCommand cvs server
AllowUsers = user1

To deny certain users to establish an SSH session, edit the /etc/ssh/sshd_config file and add the DenyUsers parameter:

cumulus@switch:~$ sudo cat /etc/ssh/sshd_config
...
# Example of overriding settings on a per-user basis
#Match User anoncvs
# X11Forwarding no
# AllowTcpForwarding no
# PermitTTY no
# ForceCommand cvs server
AllowUsers = user1
DenyUsers  = user4

SSH and VRFs

The SSH service runs in the default VRF on the switch but listens on all interfaces in all VRFs. You can limit SSH to listen on specific VRFs.

You cannot run SSH in the default VRF and other VRFs at the same time.

The following example configures SSH to listen only on the management VRF:

cumulus@switch:~$ nv set system ssh-server vrf mgmt
cumulus@switch:~$ nv config apply

The following example configures SSH to listen on the management VRF and VRF RED:

cumulus@switch:~$ nv set system ssh-server vrf mgmt
cumulus@switch:~$ nv set system ssh-server vrf RED
cumulus@switch:~$ nv config apply

Bind the SSH service to the VRF. The following example configures SSH to listen only on the management VRF:

cumulus@switch:~$ sudo systemctl stop ssh.service
cumulus@switch:~$ sudo systemctl disable ssh.service
cumulus@switch:~$ sudo systemctl start ssh@mgmt.service
cumulus@switch:~$ sudo systemctl enable ssh@mgmt.service

The following example configures SSH to listen on the management VRF and VRF RED:

cumulus@switch:~$ sudo systemctl stop ssh.service
cumulus@switch:~$ sudo systemctl disable ssh.service
cumulus@switch:~$ sudo systemctl start ssh@mgmt.service
cumulus@switch:~$ sudo systemctl enable ssh@mgmt.service
cumulus@switch:~$ sudo systemctl start ssh@RED.service
cumulus@switch:~$ sudo systemctl enable ssh@RED.service

To configure SSH to listen to only one IP address or a subnet in a VRF, you need to bind the service to that VRF (as above), then set the ListenAddress parameter in the /etc/ssh/sshd_config file to the IP address or subnet in that VRF.

cumulus@switch:~$ sudo cat /etc/ssh/sshd_config
...

#Port 22
#AddressFamily any
ListenAddress 10.10.10.6
#ListenAddress ::

Enable and Disable the SSH Server

Cumulus Linux enables the SSH server by default. To disable the SSH server:

cumulus@switch:~$ nv set system ssh-server state disabled
cumulus@switch:~$ nv config apply

Run the nv set system ssh-server state enabled command to renable the SSH server.

cumulus@switch:~$ sudo systemctl stop ssh.service
cumulus@switch:~$ sudo systemctl disable ssh.service

To renable the SSH server:

cumulus@switch:~$ sudo systemctl start ssh.service
cumulus@switch:~$ sudo systemctl enable ssh.service

Configure Timeouts and Sessions

You can configure the following SSH timeout and session options:

The following example configures the number of login attempts allowed before rejecting the SSH session to 10 and the number of seconds allowed before login times out to 200:

cumulus@switch:~$ nv set system ssh-server authentication-retries 10
cumulus@switch:~$ nv set system ssh-server login-timeout 200
cumulus@switch:~$ nv config apply

Edit the /etc/ssh/sshd_config file and change the MaxAuthTries parameter in the Authentication section to 10 and the LoginGraceTime parameter to 200:

cumulus@switch:~$ sudo nano /etc/ssh/sshd_config
...
# Authentication:

LoginGraceTime 200s
PermitRootLogin prohibit-password
#StrictModes yes
MaxAuthTries 10
MaxSessions 10

The following example configures the TCP port that listens for incoming SSH sessions to 443:

cumulus@switch:~$ nv set system ssh-server port 443
cumulus@switch:~$ nv config apply

Edit the /etc/ssh/sshd_config file and add the Port parameter:

cumulus@switch:~$ sudo nano /etc/ssh/sshd_config
...
Port 443
#AddressFamily any
#ListenAddress 0.0.0.0
#ListenAddress ::
...

The following example configures the amount of time a session can be inactive before the SSH server terminates the connection to 5 minutes (300 seconds) and the maximum number of SSH sessions allowed per TCP connection to 5:

cumulus@switch:~$ nv set system ssh-server inactive-timeout 5
cumulus@switch:~$ nv set system ssh-server max-sessions-per-connection 5
cumulus@switch:~$ nv config apply

Edit Authentication section of the /etc/ssh/sshd_config file.

  • To configure the amount of time (in seconds) a session can be inactive before the SSH server terminates the connection, change the ClientAliveInterval parameter.
  • To configure the maximum number of SSH sessions allowed per TCP connection, change the MaxSessions parameter.
cumulus@switch:~$ sudo nano /etc/ssh/sshd_config
...
# Authentication:

LoginGraceTime 120s
PermitRootLogin prohibit-password
#StrictModes yes
MaxAuthTries 10
MaxSessions 5
...
#AllowAgentForwarding yes
#AllowTcpForwarding yes
#GatewayPorts no
X11Forwarding yes
#X11DisplayOffset 10
#X11UseLocalhost yes
#PermitTTY yes
PrintMotd no
#PrintLastLog yes
#TCPKeepAlive yes
#PermitUserEnvironment no
#Compression delayed
ClientAliveInterval 300
...

The following example configures:

cumulus@switch:~$ nv set system ssh-server max-unauthenticated throttle-start 5
cumulus@switch:~$ nv set system ssh-server max-unauthenticated throttle-percent 22
cumulus@switch:~$ nv set system ssh-server max-unauthenticated session-count 20
cumulus@switch:~$ nv config apply

Edit the /etc/ssh/sshd_config file and change the MaxStartups parameter.

The following example configures:

  • The number of unauthenticated SSH sessions allowed before throttling starts to 5.
  • The starting percentage of connections to reject above the throttle start count before reaching the session count limit to 22.
  • The maximum number of unauthenticated SSH sessions allowed to 20.
cumulus@switch:~$ sudo nano /etc/ssh/sshd_config
...
MaxStartups 5:22:20
...

Generate and Install an SSH Key Pair

This section describes how to generate an SSH key pair on one system and install the key as an authorized key on another system.

Generate an SSH Key Pair

To generate an SSH key pair, run the ssh-keygen command and follow the prompts.

To configure the system without a password, do not enter a passphrase when prompted in the following step.

cumulus@host01:~$ ssh-keygen 
Generating public/private rsa key pair. 
Enter file in which to save the key (/home/cumulus/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/cumulus/.ssh/id_rsa. 
Your public key has been saved in /home/cumulus/.ssh/id_rsa.pub. 
The key fingerprint is: 
5a:b4:16:a0:f9:14:6b:51:f6:f6:c0:76:1a:35:2b:bb cumulus@leaf04 
The key's randomart image is: 
+---[RSA 2048]----+ 
|      +.o   o    | 
|     o * o . o   | 
|    o + o O o    | 
|     + . = O     | 
|      . S o .    | 
|       +   .     | 
|      .   E      | 
|                 | 
|                 | 
+-----------------+ 

Install an Authorized SSH Key

To install an authorized SSH key, you take the contents of an SSH public key and add it to the SSH authorized key file (~/.ssh/authorized_keys) of the user.

A public key is a text file with three space separated fields:

<type> <key string> <comment>
FieldDescription
<type> The algorithm you want to use to hash the key. The algorithm can be ecdsa-sha2-nistp256, ecdsa-sha2-nistp384, ecdsa-sha2-nistp521, ssh-dss, ssh-ed25519, or ssh-rsa (the default value).
<key string>A base64 format string for the key.
<comment>A single word string. By default, this is the name of the system that generated the key. NVUE uses the <comment> field as the key name.

The procedure to install an authorized SSH key is different based on whether the user is an NVUE managed user or a non-NVUE managed user.

The following example adds an authorized key named prod_key to the user admin2. The content of the public key file is ssh-rsa 1234 prod_key.

cumulus@leaf01:~$ nv set system aaa user admin2 ssh authorized-key prod_key key XABDB3NzaC1yc2EAAAADAQABAAABgQCvjs/RFPhxLQMkckONg+1RE1PTIO2JQhzFN9TRg7ox7o0tfZ+IzSB99lr2dmmVe8FRWgxVjc...
cumulus@leaf01:~$ nv set system aaa user admin2 ssh authorized-key prod_key type ssh-rsa
cumulus@leaf01:~$ nv config apply

The following example adds an authorized key file from the account cumulus on a host to the cumulus account on the switch:

  1. To copy a previously generated public key to the desired location, run the ssh-copy-id command and follow the prompts:

    cumulus@host01:~$ ssh-copy-id -i /home/cumulus/.ssh/id_rsa.pub cumulus@leaf02
    The authenticity of host 'leaf02 (192.168.0.11)' can't be established.
    ECDSA key fingerprint is b1:ce:b7:6a:20:f4:06:3a:09:3c:d9:42:de:99:66:6e.
    Are you sure you want to continue connecting (yes/no)? yes
    /usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
    /usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
    cumulus@leaf01's password:
    Number of key(s) added: 1
    

    The ssh-copy-id command does not work if the username on the remote switch is different from the username on the local switch. To work around this issue, use the scp command instead:

    cumulus@host01:~$ scp .ssh/id_rsa.pub cumulus@leaf02:.ssh/authorized_keys
    Enter passphrase for key '/home/cumulus/.ssh/id_rsa':
    id_rsa.pub
    
  2. Connect to the remote switch to confirm that the authentication keys are in place:

    cumulus@leaf01:~$ ssh cumulus@leaf02
    Welcome to Cumulus VX (TM) 
    Cumulus VX (TM) is a community supported virtual appliance designed for
    experiencing, testing and prototyping the latest technology.
    For any questions or technical support, visit our community site at:
    http://community.cumulusnetworks.com 
    The registered trademark Linux (R) is used pursuant to a sublicense from LMI,
    the exclusive licensee of Linus Torvalds, owner of the mark on a world-wide basis. 
    Last login: Thu Sep 29 16:56:54 2016
    

Troubleshooting

To show all the current SSH server configuration settings, run the NVUE nv show system ssh-server command:

cumulus@switch:~$ nv show system ssh-server
                             applied          
---------------------------  -----------------
authentication-retries       6                
inactive-timeout             0                
login-timeout                120              
max-sessions-per-connection  10               
permit-root-login            prohibit-password
state                        enabled          
max-unauthenticated                           
  session-count              100              
  throttle-percent           30               
  throttle-start             10

To show the current number of active SSH sessions, run the NVUE nv show system ssh-server active-sessions command or the Linux w command:

cumulus@switch:~$ nv show system ssh-server active-sessions
Peer Address:Port    Local Address:Port      State
-------------------  ----------------------  -----
192.168.200.1:46528  192.168.200.11%mgmt:22  ESTAB
cumulus@switch:~$ w
 11:10:46 up 19:19,  4 users,  load average: 0.08, 0.05, 0.05
USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
cumulus  ttyS0    -                Wed15   19:19m  0.03s  0.02s -bash
cumulus  pts/0    192.168.200.1    07:27    3:43m  0.03s  0.03s -bash
cumulus  pts/1    192.168.200.1    10:01    1:09m  0.02s  0.02s -bash
cumulus  pts/2    192.168.200.1    11:10    1.00s  0.03s  0.00s w

To show which users can establish an SSH session, run the nv show system ssh-server allow-users command. To show which users cannot establish an SSH session, run the nv show system ssh-server deny-users command. You can also show information for a specific user with the nv show system ssh-server allow-users <user> command and the nv show system ssh-server deny-users <user> command.

To show the TCP port numbers that listen for incoming SSH sessions, run the nv show system ssh-server port command. You can also show information for a specific port with the nv show system ssh-server port <port> command.

To show the SSH timer and session information, run the nv show system ssh-server max-unauthenticated command:

cumulus@switch:~$ nv show system ssh-server max-unauthenticated
                  applied
----------------  -------
session-count     20     
throttle-percent  22     
throttle-start    5

User Accounts

By default, Cumulus Linux has two user accounts: cumulus and root.

The cumulus account:

The root account:

Add a New User Account

You can add additional user accounts as needed.

Use the following roles to set the permissions for local user accounts.

Role
Permissions
system-adminAllows the user to use sudo to run commands as the privileged user, run nv show commands, run nv set and nv unset commands to stage configuration changes, and run nv apply commands to apply configuration changes.
nvue-adminAllows the user to run nv show commands, run nv set and nv unset commands to stage configuration changes, and run nv apply commands to apply configuration changes.
nvue-monitorAllows the user to run nv show commands only.

The following example:

  • Creates a new user account called admin2 and sets the role to system-admin (permissions for sudo, nv show, nv set and nvunset, and nv apply).
  • Sets a plain text password. NVUE hashes the plain text password and stores the value as a hashed password. To set a hashed password, see Hashed Passwords, below.
  • Adds the full name FIRST LAST. If the full name includes more than one name, either separate the names with a hyphen (FIRST-LAST) or enclose the full name in quotes ("FIRST LAST").
cumulus@switch:~$ nv set system aaa user admin2 role system-admin
cumulus@switch:~$ nv set system aaa user admin2 password
Enter new password:
Confirm password:
cumulus@switch:~$ nv set system aaa user admin2 full-name "FIRST LAST"
cumulus@switch:~$ nv config apply

You can also run the nv set system aaa user <user> password <plain-text-password> command to specify the plain text password inline. This command bypasses the Enter new password and Confirm password prompts but displays the plain text password as you type it.

If you are an NVUE-managed user, you can update your own password with the Linux passwd command.

Use the following groups to set permissions for local user accounts. To add users to these groups, use the useradd(8) or usermod(8) commands:

GroupPermissions
sudoAllows the user to use sudo to run commands as the privileged user.
nvshowAllows the user to run nv show commands only.
nvsetAllows the user to run nv show commands, and run nv set and nv unset commands to stage configuration changes.
nvapplyAllows the user to run nv show commands, run nv set and nv unset commands to stage configuration changes, and run nv apply commands to apply configuration changes.

The following example:

  • Creates a new user account called admin2, creates a home directory for the user, and adds the full name First Last.
  • Securely sets the password for the user with passwd.
  • Sets the group membership (role) to sudo and nvapply (permissions to use sudo, nv show, nv set, and nv apply).
cumulus@switch:~$ sudo useradd admin2 -m -c "First Last"
cumulus@switch:~$ sudo passwd admin2
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
cumulus@switch:~$ sudo adduser admin2 sudo
cumulus@switch:~$ sudo adduser admin2 nvapply

When you use Linux commands to add a new user, you must create a home directory for the user with the -m option. NVUE commands create a home directory automatically.

Only the following user accounts can create, modify, and delete other system-admin accounts:

  • NVUE-managed users with the system-admin role.
  • The root user.
  • Non NVUE-managed users that are in the sudo group.

Hashed Passwords

Instead of a plain text password, you can provide a hashed password for a local user.

You must specify the hashed password in Linux crypt format; the password must be a minimum of 15 to 20 characters long and must include special characters, digits, lower case alphabetic letters, and more. Typically, the password format is set to $id$salt$hashed, where $id is the hashing algorithm. In GNU or Linux:

To generate a hashed password on the switch, you can either run a python3 command or install and use the mkpasswd utility:

Run the following command on the switch or Linux host. When prompted, enter the plain text password you want to hash:

cumulus@switch:~$ python3 -c "import crypt; import getpass; print(crypt.crypt(getpass.getpass(), salt=crypt.METHOD_SHA512))"                    
Password:                                                                                                                                                                 
$6$MIDE.sdxwxuAMGHd$XFXSpHV4NRJymUpeCKz.SYEMUfGGEtLbcqK0fBw3d96ZzegP3sw6ppl5Atx9xLS3UHLLTWS/BOwjkeBJJaRx10
  1. Install the mkpasswd utility on the switch or Linux host:
cumulus@switch:~$ sudo -E apt-get update
cumulus@switch:~$ sudo -E apt-get install whois
  1. To generate a hashed password for SHA-512, SHA256, or MD5 encryption, run the following command. When prompted, enter the plain text password you want to hash:

    SHA-512 encryption:

    cumulus@switch:~$ mkpasswd -m SHA-512
    Password:
    $6$bQcjKuWgKC0vdwT5$.ZlRgmS44geDH/HsCIttldsaxJ7Y/NidicXwR0FarwXq74uA/yJHxQXGHZwNviY/cG412i7Grzl6Wk8mStJwD0
    

    SHA256 encryption:

    cumulus@switch:~$ mkpasswd -m SHA-256
    Password:
    $5$SJsPU8bjl2F$.fzRpTGxwGw82RDdFPwhIermSSh6g2ZCYzPeNpeDrgC
    

    MD5 encryption:

    cumulus@switch:~$ mkpasswd -m MD5
    Password:
    $1$/ETjhZMJ$P73qhBZEYP20mKnRkhBol0
    

To set the hashed password for the local user:

Run the nv set system aaa user <username> hashed-password <password> command:

cumulus@switch:~$ nv set system aaa user admin2 hashed-password '$1$/ETjhZMJ$P73qhBZEYP20mKnRkhBol0'
cumulus@switch:~$ nv config apply
cumulus@switch:~$ sudo useradd admin2 -c "First Last" -p '$1$/ETjhZMJ$P73qhBZEYP20mKnRkhBol0'

Hashed password strings contain characters, such as $, that have a special meaning in the Linux shell; you must enclose the hashed password in single quotes (').

Delete a User Account

To delete a user account:

Run the nv unset system aaa user <user> command. The following example deletes the user account called admin2.

cumulus@switch:~$ nv unset system aaa user admin2
cumulus@switch:~$ nv config apply

Run the sudo userdel <user> command. The following example deletes the user account called admin2.

cumulus@switch:~$ sudo userdel admin2

Show User Accounts

To show the user accounts configured on the system, run the NVUE nv show system aaa command or the linux sudo cat /etc/passwd command.

cumulus@switch:~$ nv show system aaa
Username          Full-name                           Role          enable
----------------  ----------------------------------  ------------  ------
Debian-snmp                                           Unknown       system
_apt                                                  Unknown       system
_lldpd                                                Unknown       system
admin2            FIRST LAST                          system-admin  on    
...

To show information about a specific user account, run the NVUE nv show system aaa user <user> command:

cumulus@switch:~$ nv show system aaa user admin2
                 operational   applied     
---------------  ------------  ------------
full-name        FIRST LAST    FIRST LAST  
hashed-password  *             *           
role             system-admin  system-admin
enable           on            on  

Enable the root User

The root user does not have a password and cannot log into a switch using SSH. This default account behavior is consistent with Debian.

Enable Console Access

To log into the switch using root from the console, you must set the password for the root account:

cumulus@switch:~$ sudo passwd root
Enter new password:
...

Enable SSH Access

To log into the switch using root with SSH, either:

Using sudo to Delegate Privileges

By default, Cumulus Linux has two user accounts: root and cumulus. The cumulus account is a normal user and is in the group sudo.

You can add more user accounts as needed. Like the cumulus account, these accounts must use sudo to execute privileged commands.

sudo Basics

sudo allows you to execute a command as superuser or another user as specified by the security policy.

The default security policy is sudoers, which you configure in the /etc/sudoers file. Use /etc/sudoers.d/ to add to the default sudoers policy.

Use visudo only to edit the sudoers file; do not use another editor like vi or emacs.

When creating a new file in /etc/sudoers.d, use visudo -f. This option performs sanity checks before writing the file to avoid errors that prevent sudo from working.

Errors in the sudoers file can result in losing the ability to elevate privileges to root. You can fix this issue only by power cycling the switch and booting into single user mode. Before modifying sudoers, enable the root user by setting a password for the root user.

By default, users in the sudo group can use sudo to execute privileged commands. To add users to the sudo group, use the useradd(8) or usermod(8) command. To see which users belong to the sudo group, see /etc/group (man group(5)).

You can run any command as sudo, including su. You must enter a password.

The example below shows how to use sudo as a non-privileged user cumulus to bring up an interface:

cumulus@switch:~$ ip link show dev swp1
3: swp1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master br0 state DOWN mode DEFAULT qlen 500
link/ether 44:38:39:00:27:9f brd ff:ff:ff:ff:ff:ff

cumulus@switch:~$ ip link set dev swp1 up
RTNETLINK answers: Operation not permitted

cumulus@switch:~$ sudo ip link set dev swp1 up
Password:

umulus@switch:~$ ip link show dev swp1
3: swp1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br0 state UP mode DEFAULT qlen 500
link/ether 44:38:39:00:27:9f brd ff:ff:ff:ff:ff:ff

sudoers Examples

The following examples show how you grant as few privileges as necessary to a user or group of users to allow them to perform the required task. Each example uses the system group noc; groups include the prefix %.

When an unprivileged user runs a command, the command must include the sudo prefix.

CategoryPrivilegeExample Commandsudoers Entry
MonitoringSwitch port informationethtool -m swp1%noc ALL=(ALL) NOPASSWD:/sbin/ethtool
MonitoringSystem diagnosticscl-support%noc ALL=(ALL) NOPASSWD:/usr/cumulus/bin/cl-support
MonitoringRouting diagnosticscl-resource-query%noc ALL=(ALL) NOPASSWD:/usr/cumulus/bin/cl-resource-query
Image managementInstall imagesonie-select http://lab/install.bin%noc ALL=(ALL) NOPASSWD:/usr/cumulus/bin/onie-select
Package managementAny apt-get commandapt-get update or apt-get install%noc ALL=(ALL) NOPASSWD:/usr/bin/apt-get
Package managementJust apt-get updateapt-get update%noc ALL=(ALL) NOPASSWD:/usr/bin/apt-get update
Package managementInstall packagesapt-get install vim%noc ALL=(ALL) NOPASSWD:/usr/bin/apt-get install *
Package managementUpgradingapt-get upgrade%noc ALL=(ALL) NOPASSWD:/usr/bin/apt-get upgrade
NetfilterInstall ACL policiescl-acltool -i%noc ALL=(ALL) NOPASSWD:/usr/cumulus/bin/cl-acltool
NetfilterList iptables rulesiptables -L%noc ALL=(ALL) NOPASSWD:/sbin/iptables
Layer 1 and 2Any LLDP commandlldpcli show neighbors / configure%noc ALL=(ALL) NOPASSWD:/usr/sbin/lldpcli
Layer 1 and 2Just show neighborslldpcli show neighbors%noc ALL=(ALL) NOPASSWD:/usr/sbin/lldpcli show neighbors*
InterfacesModify any interfaceip link set dev swp1 {up|down}%noc ALL=(ALL) NOPASSWD:/sbin/ip link set *
InterfacesUp any interfaceifup swp1%noc ALL=(ALL) NOPASSWD:/sbin/ifup
InterfacesDown any interfaceifdown swp1%noc ALL=(ALL) NOPASSWD:/sbin/ifdown
InterfacesUp/down only swp2ifup swp2 / ifdown swp2%noc ALL=(ALL) NOPASSWD:/sbin/ifup swp2,/sbin/ifdown swp2
InterfacesAny IP address changeip addr {add|del} 192.0.2.1/30 dev swp1%noc ALL=(ALL) NOPASSWD:/sbin/ip addr *
InterfacesOnly set IP addressip addr add 192.0.2.1/30 dev swp1%noc ALL=(ALL) NOPASSWD:/sbin/ip addr add *
Ethernet bridgingAny bridge commandbrctl addbr br0 / brctl delif br0 swp1%noc ALL=(ALL) NOPASSWD:/sbin/brctl
Ethernet bridgingAdd bridges and interfacesbrctl addbr br0 / brctl addif br0 swp1%noc ALL=(ALL) NOPASSWD:/sbin/brctl addbr *,/sbin/brctl addif *
Spanning treeSet STP propertiesmstpctl setmaxage br2 20%noc ALL=(ALL) NOPASSWD:/sbin/mstpctl
TroubleshootingRestart switchdsystemctl restart switchd.service%noc ALL=(ALL) NOPASSWD:/usr/sbin/service switchd *
TroubleshootingRestart any servicesystemctl cron switchd.service%noc ALL=(ALL) NOPASSWD:/usr/sbin/service
TroubleshootingPacket capturetcpdump%noc ALL=(ALL) NOPASSWD:/usr/sbin/tcpdump
Layer 3Add static routesip route add 10.2.0.0/16 via 10.0.0.1%noc ALL=(ALL) NOPASSWD:/bin/ip route add *
Layer 3Delete static routesip route del 10.2.0.0/16 via 10.0.0.1%noc ALL=(ALL) NOPASSWD:/bin/ip route del *
Layer 3Any static route changeip route *%noc ALL=(ALL) NOPASSWD:/bin/ip route *
Layer 3Any iproute commandip *%noc ALL=(ALL) NOPASSWD:/bin/ip
Layer 3Non-modal OSPFcl-ospf area 0.0.0.1 range 10.0.0.0/24%noc ALL=(ALL) NOPASSWD:/usr/bin/cl-ospf

LDAP Authentication and Authorization

Cumulus Linux uses Pluggable Authentication Modules (PAM) and Name Service Switch (NSS) for user authentication. NSS enables PAM to use LDAP to provide user authentication, group mapping, and information for other services on the system.

To configure LDAP authentication on Linux, you can use libnss-ldap, libnss-ldapd, or libnss-sss. This chapter describes libnss-ldapd only. From internal testing, this library worked best with Cumulus Linux and is the easiest to configure, automate, and troubleshoot.

Install libnss-ldapd

The libldap-2.4-2 and libldap-common LDAP packages are already installed on the Cumulus Linux image; however you need to install these additional packages to use LDAP authentication:

To install the additional packages, run the following command:

cumulus@switch:~$ sudo apt-get install libnss-ldapd libpam-ldapd ldap-utils nslcd

You can also install these packages even if the switch does not connect to the internet, as they are in the cumulus-local-apt-archive repository that is embedded in the Cumulus Linux image.

Follow the interactive prompts to specify the LDAP URI, search base distinguished name (DN), and services that must have LDAP lookups enabled. You need to select at least the passwd, group, and shadow services (press space to select a service). When done, select OK. This creates a basic LDAP configuration using anonymous bind and initiates user search under the base DN specified.

After the dialog closes, the install process prints information similar to the following:

/etc/nsswitch.conf: enable LDAP lookups for group
/etc/nsswitch.conf: enable LDAP lookups for passwd
/etc/nsswitch.conf: enable LDAP lookups for shadow

After the installation is complete, the name service caching daemon (nslcd) runs. This service handles all the LDAP protocol interactions and caches information that returns from the LDAP server. nslcd appends ldap to the /etc/nsswitch.conf file, as well as the secondary information source for passwd, group, and shadow. nslcd references the local files (/etc/passwd, /etc/groups and /etc/shadow) first, as specified by the compat source.

passwd: compat ldap
group: compat ldap
shadow: compat ldap

Keep compat as the first source in NSS for passwd, group, and shadow. This prevents you from getting locked out of the system.

Entering incorrect information during the installation process produces configuration errors. You can correct the information after installation by editing certain configuration files.

After editing the files, restart the NVUE and nginx-authenticator services with the sudo systemctl restart nvued.service command and the sudo systemctl restart nginx-authenticator.service command.

Alternative Installation Method Using debconf-utils

Instead of running the installer and following the interactive prompts, as described above, you can pre-seed the installer parameters using debconf-utils.

  1. Run apt-get install debconf-utils and create the pre-seeded parameters using debconf-set-selections. Provide the appropriate answers.

  2. Run debconf-show <pkg> to check the settings. Here is an example of how to pre-seed answers to the installer questions using debconf-set-selections:

    root# debconf-set-selections <<'zzzEndOfFilezzz'
    
    # LDAP database user. Leave blank will be populated later!
    
    nslcd nslcd/ldap-binddn  string
    
    # LDAP user password. Leave blank!
    nslcd nslcd/ldap-bindpw   password
    
    # LDAP server search base:
    nslcd nslcd/ldap-base string  ou=support,dc=rtp,dc=example,dc=test
    
    # LDAP server URI. Using ldap over ssl.
    nslcd nslcd/ldap-uris string  ldaps://myadserver.rtp.example.test
    
    # New to 0.9. restart cron, exim and others libraries without asking
    nslcd libraries/restart-without-asking: boolean true
    
    # LDAP authentication to use:
    # Choices: none, simple, SASL
    # Using simple because its easy to configure. Security comes by using LDAP over SSL
    # keep /etc/nslcd.conf 'rw' to root for basic security of bindDN password
    nslcd nslcd/ldap-auth-type    select  simple
    
    # Don't set starttls to true
    nslcd nslcd/ldap-starttls     boolean false
    
    # Check server's SSL certificate:
    # Choices: never, allow, try, demand
    nslcd nslcd/ldap-reqcert      select  never
    
    # Choices: Ccreds credential caching - password saving, Unix authentication, LDAP Authentication , Create home directory on first time login, Ccreds credential caching - password checking
    # This is where "mkhomedir" pam config is activated that allows automatic creation of home directory
    libpam-runtime        libpam-runtime/profiles multiselect     ccreds-save, unix, ldap, mkhomedir , ccreds-check
    
    # for internal use; can be preseeded
    man-db        man-db/auto-update      boolean true
    
    # Name services to configure:
    # Choices: aliases, ethers, group, hosts, netgroup, networks, passwd, protocols, rpc, services,  shadow
    libnss-ldapd  libnss-ldapd/nsswitch   multiselect     group, passwd, shadow
    libnss-ldapd  libnss-ldapd/clean_nsswitch     boolean false
    
    ## define platform specific libnss-ldapd debconf questions/answers. 
    ## For demo used amd64.
    libnss-ldapd:amd64    libnss-ldapd/nsswitch   multiselect     group, passwd, shadow
    libnss-ldapd:amd64    libnss-ldapd/clean_nsswitch     boolean false
    # libnss-ldapd:powerpc   libnss-ldapd/nsswitch   multiselect     group, passwd, shadow
    # libnss-ldapd:powerpc    libnss-ldapd/clean_nsswitch     boolean false
    

Update the nslcd.conf File

After installation, update the main configuration file (/etc/nslcd.conf) to accommodate the expected LDAP server settings.

This section documents some of the more important options that relate to security and queries. For details on all the available configuration options, read the nslcd.conf man page.

After editing the /etc/nslcd.conf file or enabling LDAP in the /etc/nsswitch.conf file, you must restart the NVUE and nginx-authenticator services with the sudo systemctl restart nvued.service command and the sudo systemctl restart nginx-authenticator.service command. If you disable LDAP, you must also restart these two services.

Connection

The LDAP client starts a session by connecting to the LDAP server on TCP and UDP port 389 or on port 636 for LDAPS. Depending on the configuration, this connection establishes without authentication (anonymous bind); otherwise, the client must provide a bind user and password. The variables you use to define the connection to the LDAP server are the URI and bind credentials.

The URI is mandatory and specifies the LDAP server location using the FQDN or IP address. The URI also designates whether to use ldap:// for clear text transport, or ldaps:// for SSL/TLS encrypted transport. You can also specify an alternate port in the URI. In production environments, use the LDAPS protocol so that all communications are secure.

After the connection to the server is complete, the BIND operation authenticates the session. The BIND credentials are optional; if you do not specify the credentials, the switch assumes an anonymous bind. Configure authenticated (Simple) BIND by specifying the user (binddn) and password (bindpw) in the configuration. Another option is to use SASL (Simple Authentication and Security Layer) BIND, which provides authentication services using other mechanisms, like Kerberos. Contact your LDAP server administrator for this information as it depends on the configuration of the LDAP server and the credentials for the client device.

# The location at which the LDAP server(s) should be reachable.
uri ldaps://ldap.example.com
# The DN to bind with for normal lookups.
binddn cn=CLswitch,ou=infra,dc=example,dc=com
bindpw CuMuLuS

Search Function

When an LDAP client requests information about a resource, it must connect and bind to the server. Then, it performs one or more resource queries depending on the lookup. All search queries to the LDAP server use the configured search base, filter, and the desired entry (uid=myuser). If the LDAP directory is large, this search takes a long time. Define a more specific search base for the common maps (passwd and group).

# The search base that will be used for all queries.
base dc=example,dc=com
# Mapped search bases to speed up common queries.
base passwd ou=people,dc=example,dc=com
base group ou=groups,dc=example,dc=com

Search Filters

To limit the search scope when authenticating users, use search filters to specify criteria when searching for objects within the directory. The default filters applied are:

filter passwd (objectClass=posixAccount)
filter group (objectClass=posixGroup)

Attribute Mapping

The map configuration allows you to override the attributes pushed from LDAP. To override an attribute for a given map, specify the attribute name and the new value. This is useful to ensure that the shell is bash and the home directory is /home/cumulus:

map    passwd homeDirectory "/home/cumulus"
map    passwd shell "/bin/bash"

In LDAP, the map refers to one of the supported maps specified in the manpage for nslcd.conf (such as passwd or group).

Create Home Directory on Login

If you want to use unique home directories, run the sudo pam-auth-update command and select Create home directory on login in the PAM configuration dialog (press the space bar to select the option). Select OK, then press Enter to save the update and close the dialog.

cumulus@switch:~$ sudo pam-auth-update

The home directory for any user that logs in (using LDAP or not) populates with the standard dotfiles from /etc/skel.

When nslcd starts, an error message similar to the following (where 5816 is the nslcd PID) sometimes appears:

nslcd[5816]: unable to dlopen /usr/lib/x86_64-linux-gnu/sasl2/libsasldb.so: libdb-5.3.so: cannot open
shared object file: No such file or directory

You can ignore this message. The libdb package and resulting log messages from nslcd do not cause any issues when you use LDAP as a client for login and authentication.

Example Configuration

Here is an example configuration using Cumulus Linux.

# /etc/nslcd.conf
# nslcd configuration file. See nslcd.conf(5)
# for details.

# The user and group nslcd should run as.
uid nslcd
gid nslcd

# The location at which the LDAP server(s) should be reachable.
uri ldaps://myadserver.rtp.example.test

# The search base that will be used for all queries.
base ou=support,dc=rtp,dc=example,dc=test

# The LDAP protocol version to use.
#ldap_version 3

# The DN to bind with for normal lookups.
# defconf-set-selections doesn't seem to set this. so have to manually set this.
binddn CN=cumulus admin,CN=Users,DC=rtp,DC=example,DC=test
bindpw 1Q2w3e4r!

# The DN used for password modifications by root.
#rootpwmoddn cn=admin,dc=example,dc=com

# SSL options
#ssl off (default)
# Not good does not prevent man in the middle attacks
#tls_reqcert demand(default)
tls_cacertfile /etc/ssl/certs/rtp-example-ca.crt

# The search scope.
#scope sub

# Add nested group support
# Supported in nslcd 0.9 and higher.
# default wheezy install of nslcd supports on 0.8. wheezy-backports has 0.9
nss_nested_groups yes

# Mappings for Active Directory
# (replace the SIDs in the objectSid mappings with the value for your domain)
# "dsquery * -filter (samaccountname=testuser1) -attr ObjectSID" where cn == 'testuser1'
pagesize 1000
referrals off
idle_timelimit 1000

# Do not allow uids lower than 100 to login (aka Administrator)
# not needed as pam already has this support
# nss_min_uid 1000

# This filter says to get all users who are part of the cumuluslnxadm group. Supports nested groups.
# Example, mary is part of the snrnetworkadm group which is part of cumuluslnxadm group
# Ref: http://msdn.microsoft.com/en-us/library/aa746475%28VS.85%29.aspx (LDAP_MATCHING_RULE_IN_CHAIN)
filter passwd (&(Objectclass=user)(!(objectClass=computer))(memberOf:1.2.840.113556.1.4.1941:=cn=cumuluslnxadm,ou=groups,ou=support,dc=rtp,dc=example,dc=test))
map    passwd uid           sAMAccountName
map    passwd uidNumber     objectSid:S-1-5-21-1391733952-3059161487-1245441232
map    passwd gidNumber     objectSid:S-1-5-21-1391733952-3059161487-1245441232
map    passwd homeDirectory "/home/$sAMAccountName"
map    passwd gecos         displayName
map    passwd loginShell    "/bin/bash"

# Filter for any AD group or user in the baseDN. the reason for filtering for the
# user to make sure group listing for user files don't say '<user> <gid>'. instead will say '<user> <user>'
# So for cosmetic reasons..nothing more.
filter group (&(|(objectClass=group)(Objectclass=user))(!(objectClass=computer)))
map    group gidNumber     objectSid:S-1-5-21-1391733952-3059161487-1245441232
map    group cn            sAMAccountName

Configure LDAP Authorization

Linux uses the sudo command to allow non-administrator users (such as the default cumulus user account) to perform privileged operations. To control the users that can use sudo, define a series of rules in the /etc/sudoers file and files in the /etc/sudoers.d/ directory. The rules apply to groups but you can also define specific users. You can add sudo rules using the group names from LDAP. For example, if a group of users are in the group netadmin, you can add a rule to give those users sudo privileges. Refer to the sudoers manual (man sudoers) for a complete usage description. The following shows an example in the /etc/sudoers file:

# The basic structure of a user specification is "who where = (as_whom) what ".
%sudo ALL=(ALL:ALL) ALL
%netadmin ALL=(ALL:ALL) ALL

Active Directory Configuration

Active Directory (AD) is a fully featured LDAP-based NIS server create by Microsoft. It offers unique features that classic OpenLDAP servers do not have. AD can be more complicated to configure on the client and each version works a little differently with Linux-based LDAP clients. Some more advanced configuration examples, from testing LDAP clients on Cumulus Linux with Active Directory (AD/LDAP), are available in the knowledge base.

LDAP Verification Tools

The LDAP client daemon retrieves and caches password and group information from LDAP. To verify the LDAP interaction, use these command-line tools to trigger an LDAP query from the device.

Identify a User with the id Command

The id command performs a username lookup by following the lookup information sources in NSS for the passwd service. This returns the user ID, group ID and the group list retrieved from the information source. In the following example, the user cumulus is locally defined in /etc/passwd, and myuser is on LDAP. The NSS configuration has the passwd map configured with the sources compat ldap:

cumulus@switch:~$ id cumulus
uid=1000(cumulus) gid=1000(cumulus) groups=1000(cumulus),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev)
cumulus@switch:~$ id myuser
uid=1230(myuser) gid=3000(Development) groups=3000(Development),500(Employees),27(sudo)

getent

The getent command retrieves all records found with NSS for a given map. It can also retrieve a specific entry under that map. You can perform tests with the passwd, group, shadow, or any other map in the /etc/nsswitch.conf file. The output from this command formats according to the map requested. For the passwd service, the structure of the output is the same as the entries in /etc/passwd. The group map outputs the same structure as /etc/group.

In this example, looking up a specific user in the passwd map, the user cumulus is locally defined in /etc/passwd, and myuser is only in LDAP.

cumulus@switch:~$ getent passwd cumulus
cumulus:x:1000:1000::/home/cumulus:/bin/bash
cumulus@switch:~$ getent passwd myuser
myuser:x:1230:3000:My Test User:/home/myuser:/bin/bash

In the next example, looking up a specific group in the group service, the group cumulus is locally defined in /etc/groups, and netadmin is on LDAP.

cumulus@switch:~$ getent group cumulus
cumulus:x:1000:
cumulus@switch:~$ getent group netadmin
netadmin:*:502:larry,moe,curly,shemp

Running the command getent passwd or getent group without a specific request returns all local and LDAP entries for the passwd and group maps.

The ldapsearch command performs LDAP operations directly on the LDAP server. This does not interact with NSS. This command displays the information that the LDAP daemon process receives back from the server. The command has several options. The simplest option uses anonymous bind to the host and specifies the search DN and the attribute to look up.

cumulus@switch:~$ ldapsearch -H ldap://ldap.example.com -b dc=example,dc=com -x uid=myuser
Click to expand the command output
# extended LDIF
#
# LDAPv3
# base <dc=example,dc=com> with scope subtree
# filter: uid=myuser
# requesting: ALL
#
# myuser, people, example.com
dn: uid=myuser,ou=people,dc=example,dc=com
cn: My User
displayName: My User
gecos: myuser
gidNumber: 3000
givenName: My
homeDirectory: /home/myuser
initials: MU
loginShell: /bin/bash
mail: myuser@example.com
objectClass: inetOrgPerson
objectClass: posixAccount
objectClass: shadowAccount
objectClass: top
shadowExpire: -1
shadowFlag: 0
shadowMax: 999999
shadowMin: 8
shadowWarning: 7
sn: User
uid: myuser
uidNumber: 1234

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1

LDAP Browsers

The GUI LDAP clients are free tools that show the structure of the LDAP database graphically.

Troubleshooting

nslcd Debug Mode

When setting up LDAP authentication for the first time, turn off the nslcd service using the systemctl stop nslcd.service command (or the systemctl stop nslcd@mgmt.service if you are running the service in a management VRF) and run it in debug mode. Debug mode works whether you are using LDAP over SSL (port 636) or an unencrypted LDAP connection (port 389).

cumulus@switch:~$ sudo systemctl stop nslcd.service
cumulus@switch:~$ sudo nslcd -d

After you enable debug mode, run the following command to test LDAP queries:

cumulus@switch:~$ getent passwd

If you configure LDAP correctly, the following messages appear after you run the getent command:

nslcd: DEBUG: accept() failed (ignored): Resource temporarily unavailable
nslcd: [8e1f29] DEBUG: connection from pid=11766 uid=0 gid=0
nslcd: [8e1f29] <passwd(all)> DEBUG: myldap_search(base="dc=example,dc=com", filter="(objectClass=posixAccount)")
nslcd: [8e1f29] <passwd(all)> DEBUG: ldap_result(): uid=myuser,ou=people,dc=example,dc=com
nslcd: [8e1f29] <passwd(all)> DEBUG: ldap_result(): ... 152 more results
nslcd: [8e1f29] <passwd(all)> DEBUG: ldap_result(): end of results (162 total)

In the example output above, <passwd(all)> shows a query of the entire directory structure.

You can query a specific user with the following command:

cumulus@switch:~$ getent passwd myuser

You can replace myuser with any username on the switch. The following debug output indicates that user myuser exists:

nslcd: DEBUG: add_uri(ldap://10.50.21.101)
nslcd: version 0.8.10 starting
nslcd: DEBUG: unlink() of /var/run/nslcd/socket failed (ignored): No such file or directory
nslcd: DEBUG: setgroups(0,NULL) done
nslcd: DEBUG: setgid(110) done
nslcd: DEBUG: setuid(107) done
nslcd: accepting connections
nslcd: DEBUG: accept() failed (ignored): Resource temporarily unavailable
nslcd: [8b4567] DEBUG: connection from pid=11369 uid=0 gid=0
nslcd: [8b4567] <passwd="myuser"> DEBUG: myldap_search(base="dc=cumulusnetworks,dc=com", filter="(&(objectClass=posixAccount)(uid=myuser))")
nslcd: [8b4567] <passwd="myuser"> DEBUG: ldap_initialize(ldap://<ip_address>)
nslcd: [8b4567] <passwd="myuser"> DEBUG: ldap_set_rebind_proc()
nslcd: [8b4567] <passwd="myuser"> DEBUG: ldap_set_option(LDAP_OPT_PROTOCOL_VERSION,3)
nslcd: [8b4567] <passwd="myuser"> DEBUG: ldap_set_option(LDAP_OPT_DEREF,0)
nslcd: [8b4567] <passwd="myuser"> DEBUG: ldap_set_option(LDAP_OPT_TIMELIMIT,0)
nslcd: [8b4567] <passwd="myuser"> DEBUG: ldap_set_option(LDAP_OPT_TIMEOUT,0)
nslcd: [8b4567] <passwd="myuser"> DEBUG: ldap_set_option(LDAP_OPT_NETWORK_TIMEOUT,0)
nslcd: [8b4567] <passwd="myuser"> DEBUG: ldap_set_option(LDAP_OPT_REFERRALS,LDAP_OPT_ON)
nslcd: [8b4567] <passwd="myuser"> DEBUG: ldap_set_option(LDAP_OPT_RESTART,LDAP_OPT_ON)
nslcd: [8b4567] <passwd="myuser"> DEBUG: ldap_simple_bind_s(NULL,NULL) (uri="ldap://<ip_address>")
nslcd: [8b4567] <passwd="myuser"> DEBUG: ldap_result(): end of results (0 total)

Common Problems

SSL/TLS

NSCD

If you are running the nslcd service in a management VRF, you need to run the systemctl restart nslcd@mgmt.service command instead of the systemctl restart nslcd.service command. For example:

cumulus@switch:~$ sudo nscd -K
cumulus@switch:~$ sudo systemctl restart nslcd@mgmt.service

LDAP

TACACS

Cumulus Linux implements TACACS+ client AAA in a transparent way with minimal configuration. The client implements the TACACS+ protocol as described in this IETF document. There is no need to create accounts or directories on the switch. Accounting records go to all configured TACACS+ servers by default. Using per-command authorization requires additional setup on the switch.

TACACS+ in Cumulus Linux:

Install the TACACS+ Client Packages

You must install the TACACS+ client packages to use TACACS+. If you do not install the TACACS+ packages, you see the following message when you try to enable TACACS+ with the NVUE nv set system aaa tacacs enable on command:

'tacplus-client' package needs to be installed to enable tacacs

You can install the TACACS+ packages even if the switch is not connected to the internet; the packages are in the cumulus-local-apt-archive repository in the Cumulus Linux image.

To install all required packages, run these commands:

cumulus@switch:~$ sudo -E apt-get update
cumulus@switch:~$ sudo -E apt-get install tacplus-client

Required TACACS+ Client Configuration

After you install the required TACACS+ packages, configure the following required settings on the switch (the TACACS+ client).

If you use NVUE commands to configure TACACS+, you must also set the priority for the authentication order for local and TACACS+ users, and enable TACACS+.

After you configure any TACACS+ settings with NVUE and you run nv config apply, you must restart the NVUE service with the sudo systemctl restart nvued.service command.

NVUE commands require you to specify the priority for each TACACS+ server. You must set a priority even if you only specify one server.

The following example commands set:

  • The TACACS+ server priority to 5.
  • The IP address of the server to 192.168.0.30.
  • The secret to mytacac$key.

If you include special characters in the password (such as $), you must enclose the password in single quotes (').

  • The VRF to mgmt.
  • The authentication order so that TACACS+ authentication has priority over local (the lower number has priority).
  • TACACS+ to enabled.
cumulus@switch:~$ nv set system aaa tacacs server 5 host 192.168.0.30
cumulus@switch:~$ nv set system aaa tacacs server 5 secret 'mytacac$key'
cumulus@switch:~$ nv set system aaa tacacs vrf mgmt 
cumulus@switch:~$ nv set system aaa authentication-order 5 tacacs
cumulus@switch:~$ nv set system aaa authentication-order 10 local
cumulus@switch:~$ nv set system aaa tacacs enable on
cumulus@switch:~$ nv config apply

If you want the server to use IPv6, you must add the nv set system aaa tacacs server <priority> prefer-ip-version 6 command:

cumulus@switch:~$ nv set system aaa tacacs server 5 host server5
cumulus@switch:~$ nv set system aaa tacacs server 5 prefer-ip-version 6
...

If you configure more than one TACACS+ server, you need to set the priority for each server. If the switch cannot establish a connection with the server that has the highest priority, it tries to establish a connection with the next highest priority server. The server with the lower number has the higher prioritity. In the example below, server 192.168.0.30 with a priority value of 5 has a higher priority than server 192.168.1.30, which has a priority value of 10.

cumulus@switch:~$ nv set system aaa tacacs server 5 host 192.168.0.30
cumulus@switch:~$ nv set system aaa tacacs server 5 secret 'mytacac$key' 
cumulus@switch:~$ nv set system aaa tacacs server 10 host 192.168.1.30
cumulus@switch:~$ nv set system aaa tacacs server 10 secret 'mytacac$key2'
cumulus@switch:~$ nv config apply
  1. Edit the /etc/tacplus_servers file to add at least one server and one shared secret (key). You can specify the server and secret parameters in any order anywhere in the file. Whitespace (spaces or tabs) are not allowed. For example, if your TACACS+ server IP address is 192.168.0.30 and your shared secret is tacacskey, add these parameters to the /etc/tacplus_servers file:

    cumulus@switch:~$ sudo nano /etc/tacplus_servers
    secret=mytacac$key
    server=192.168.0.30
    

    Cumulus Linux supports a maximum of seven TACACS+ servers. To specify multiple servers, add one per line to the /etc/tacplus_servers file. Connections establish in the order in the file.

    cumulus@switch:~$ sudo nano /etc/tacplus_servers
    secret=mytacac$key
    server=192.168.0.30
    secret=mytacac$key2
    server=192.168.1.30
    

    If you want the server to use IPv6, you must add the prefer_ip_version=6 parameter in the /etc/tacplus_servers file:

    cumulus@switch:~$ sudo nano /etc/tacplus_servers
    secret=mytacac$key
    server=server5
    prefer_ip_version=ipv6 
    secret=mytacac$key2
    server=server6
    prefer_ip_version=ipv6 
    
  2. Uncomment the vrf=mgmt line:

    # If the management network is in a vrf, set this variable to the vrf name.
    # This would usually be "mgmt"
    # When this variable is set, the connection to the TACACS+ accounting servers
    # will be made through the named vrf.
    vrf=mgmt
    
  3. Restart auditd:

    cumulus@switch:~$ sudo systemctl restart auditd
    

Optional TACACS+ Configuration

You can configure the following optional TACACS+ settings:

The following example commands set the timeout to 10 seconds and the TACACS+ server port to 32:

cumulus@switch:~$ nv set system aaa tacacs timeout 10
cumulus@switch:~$ nv set system aaa tacacs server 5 port 32
cumulus@switch:~$ nv config apply

The following example commands set the source IP address to 10.10.10.1 and the authentication type to CHAP:

cumulus@switch:~$ nv set system aaa tacacs source-ip 10.10.10.1
cumulus@switch:~$ nv set system aaa tacacs authentication mode chap
cumulus@switch:~$ nv config apply

The following example commands exclude the user USER1 from going to the TACACS+ server for authentication and enables Cumulus Linux to create a separate home directory for each TACACS+ user when the TACACS+ user first logs in:

cumulus@switch:~$ nv set system aaa tacacs exclude-user USER1
cumulus@switch:~$ nv set system aaa tacacs authentication per-user-homedir on
cumulus@switch:~$ nv config apply
  • To set the server port (use the format server:port), source IP address, authentication type, and enable Cumulus Linux to create a separate home directory for each TACACS+ user, edit the /etc/tacplus_servers file, then restart auditd.
  • To set the timeout and the usernames to exclude from TACACS+ authentication, edit the /etc/tacplus_nss.conf file (you do not need to restart auditd).

The following example sets the server port to 32, the authentication type to CHAP, the source IP address to 10.10.10.1, and enables Cumulus Linux to create a separate home directory for each TACACS+ user when the TACACS+ user first logs in:

cumulus@switch:~$ sudo nano /etc/tacplus_servers
...
secret=mytacac$key
server=192.168.0.30:32
...
# Sets the IPv4 address used as the source IP address when communicating with
# the TACACS+ server.  IPv6 addresses are not supported, nor are hostnames.
# The address must work when passsed to the bind() system call, that is, it must
# be valid for the interface being used.
source_ip=10.10.10.1
...
# If user_homedir=1, then tacacs users will be set to have a home directory
# based on their login name, rather than the mapped tacacsN home directory.
# mkhomedir_helper is used to create the directory if it does not exist (similar
# to use of pam_mkhomedir.so). This flag is ignored for users with restricted
# shells, e.g., users mapped to a tacacs privilege level that has enforced
# per-command authorization (see the tacplus-restrict man page).
user_homedir=1
...
login=chap
cumulus@switch:~$ sudo systemctl restart auditd

The following example sets the timeout to 10 seconds and excludes the user USER1 from going to the TACACS+ server for authentication:

cumulus@switch:~$ sudo nano /etc/tacplus_nss.conf
...
# The connection timeout for an NSS library should be short, since it is
# invoked for many programs and daemons, and a failure is usually not
# catastrophic.  Not set or set to a negative value disables use of poll().
# This follows the include of tacplus_servers, so it can override any
# timeout value set in that file.
# It's important to have this set in this file, even if the same value
# as in tacplus_servers, since tacplus_servers should not be readable
# by users other than root.
timeout=10
...
# This is a comma separated list of usernames that are never sent to
# a tacacs server, they cause an early not found return.
#
# "*" is not a wild card.  While it's not a legal username, it turns out
# that during pathname completion, bash can do an NSS lookup on "*"
# To avoid server round trip delays, or worse, unreachable server delays
# on filename completion, we include "*" in the exclusion list.
exclude_users=root,daemon,nobody,cron,radius_user,radius_priv_user,sshd,cumulus,quagga,frr,snmp,www-data,ntp,man,_lldpd,USER1,*

Cumulus Linux supports the following additional Linux parameters in the etc/tacplus_nss.conf file. Currently, there are no equivalent NUVE commands.

Linux ParameterDescription
includeConfigures a supplemental configuration file to avoid duplicating configuration information. You can include up to eight additional configuration files. For example: include=/myfile/myname.
min_uidConfigures the minimum user ID that the NSS plugin can look up. 0 specifies that the plugin never looks up uid 0 (root). Do not specify a value greater than the local TACACS+ user IDs (0 through 15).

TACACS+ Accounting

When you install the TACACS+ packages and configure the basic TACACS+ settings (set the server and shared secret), accounting is on and there is no additional configuration required.

TACACS+ accounting uses the audisp module, with an additional plugin for auditd and audisp. The plugin maps the auid in the accounting record to a TACACS login, which it bases on the auid and sessionid. The audisp module requires libnss_tacplus and uses the libtacplus_map.so library interfaces as part of the modified libpam_tacplus package.

Communication with the TACACS+ servers occurs with the libsimple-tacact1 library, through dlopen(). A maximum of 240 bytes of command name and arguments send in the accounting record, due to the TACACS+ field length limitation of 255 bytes.

  • All sudo commands run by TACACS+ users generate accounting records against the original TACACS+ login name.
  • All Linux and NVUE commands result in an accounting record, including login commands and sub-processes of other commands. This can generate a lot of accounting records.

By default, Cumulus Linux sends accounting records to all servers. You can change this setting to send accounting records to the server that is first to respond:

cumulus@switch:~$ nv set system aaa tacacs accounting send-records first-response
cumulus@switch:~$ nv config apply

To reset to the default configuration (send accounting records to all servers), run the nv set system aaa tacacs accounting send-records all command.

  1. Edit the /etc/audisp/audisp-tac_plus.conf file and change the acct_all parameter to 0:

    cumulus@switch:~$ sudo nano /etc/audisp/audisp-tac_plus.conf
    ...
    acct_all=0
    
  2. Restart auditd:

    cumulus@switch:~$ sudo systemctl restart auditd
    

To reset to the default configuration (send accounting records to all servers), change the value of acct_all to 1 (acct_all=1).

To disable TACACS+ accounting:

cumulus@switch:~$ nv set system aaa tacacs accounting enable off
cumulus@switch:~$ nv config apply
  1. Edit the /etc/audisp/plugins.d/audisp-tacplus.conf file and change the active parameter to no:

    cumulus@switch:~$ sudo nano /etc/audisp/plugins.d/audisp-tacplus.conf
    ...
    # default to enabling tacacs accounting; change to no to disable
    active = no
    
  2. Restart auditd:

    cumulus@switch:~$ sudo systemctl restart auditd
    

Local Fallback Authentication

You can configure the switch to allow local fallback authentication for a user when the TACACS servers are unreachable, do not include the user for authentication, or have the user in the exclude user list.

To allow local fallback authentication for a user, add a local privileged user account on the switch with the same username as a TACACS user. A local user is always active even when the TACACS service is not running.

NVUE does not provide commands to configure local fallback authentication.

To configure local fallback authentication:

  1. Edit the /etc/nsswitch.conf file to remove the keyword tacplus from the line starting with passwd. (You need to add the keyword back in step 3.)

    The following example shows the /etc/nsswitch.conf file with no tacplus keyword in the line starting with passwd.

    cumulus@switch:~$ sudo nano /etc/nsswitch.conf
    #
    # Example configuration of GNU Name Service Switch functionality.
    # If you have the `glibc-doc-reference' and `info' packages installed, try:
    # `info libc "Name Service Switch"' for information about this file.
    
    passwd:         files
    group:          tacplus files
    shadow:         files
    gshadow:        files
    ...
    
  2. To enable the local privileged user to run sudo and NVUE commands, run the adduser commands shown below. In the example commands, the TACACS account name is tacadmin.

    The first adduser command prompts for information and a password. You can skip most of the requested information by pressing ENTER.

    cumulus@switch:~$ sudo adduser --ingroup tacacs tacadmin
    cumulus@switch:~$ sudo adduser tacadmin nvset
    cumulus@switch:~$ sudo adduser tacadmin nvapply
    cumulus@switch:~$ sudo adduser tacadmin sudo
    
  3. Edit the /etc/nsswitch.conf file to add the keyword tacplus back to the line starting with passwd (the keyword you removed in the first step).

     cumulus@switch:~$ sudo nano /etc/nsswitch.conf
     #
     # Example configuration of GNU Name Service Switch functionality.
     # If you have the `glibc-doc-reference' and `info' packages installed, try:
     # `info libc "Name Service Switch"' for information about this file.
     passwd:         tacplus files
     group:          tacplus files
     shadow:         files
     gshadow:        files
     ...
    
  4. Restart the nvued service with the following command:

    cumulus@switch:~$ sudo systemctl restart nvued
    

TACACS+ Per-command Authorization

TACACS+ per-command authorization lets you configure the commands that TACACS+ users at different privilege levels can run.

To reach the TACACS+ server through the default VRF, you must specify the egress interface you use in the default VRF. Either run the NVUE nv set system aaa tacacs vrf <interface> command (for example, nv set system aaa tacacs vrf swp51) or set the vrf=<interface> option in the /etc/tacplus_servers file (for example, vrf=swp51).

The following command allows TACACS+ users at privilege level 0 to run the nv and ip commands (if authorized by the TACACS+ server):

cumulus@switch:~$ nv set system aaa tacacs authorization 0 command ip 
cumulus@switch:~$ nv set system aaa tacacs authorization 0 command nv
cumulus@switch:~$ nv config apply

To show the per-command authorization settings, run the nv show system aaa tacacs authorization command:

cumulus@switch:~$ nv show system aaa tacacs authorization
Privilege Level  role          command
---------------  ------------  -------
0                nvue-monitor  ip     
                               nv  
tacuser0@switch:~$ sudo tacplus-restrict -i -u tacacs0 -a ip nv

The tacplus-auth command handles authorization for each command. To make this an enforced authorization, change the TACACS+ log in to use a restricted shell, with a very limited executable search path. Otherwise, the user can bypass the authorization. The tacplus-restrict utility simplifies setting up the restricted environment.

The following table provides the tacplus-restrict command options:

OptionDescription
-iInitializes the environment. You only need to issue this option one time per username.
-aYou can invoke the utility with the -a option as often as you like. For each command in the -a list, the utility creates a symbolic link from tacplus-auth to the relative portion of the command name in the local bin subdirectory. You also need to enable these commands on the TACACS+ server (refer to your TACACS+ server documentation). It is common for the server to allow some options to a command, but not others.
-fRe-initializes the environment. If you need to restart, run the -f option with -i to force re-initialization; otherwise, the utility ignores repeated use of -i.
During initialization:
- The user shell changes to /bin/rbash.
- The utility saves any existing dot files.

After running this command, examine the tacacs0 directory::

cumulus@switch:~$ sudo ls -lR ~tacacs0
total 12
lrwxrwxrwx 1 root root 22 Nov 21 22:07 ip -> /usr/sbin/tacplus-auth
lrwxrwxrwx 1 root root 22 Nov 21 22:07 nv -> /usr/sbin/tacplus-auth

Except for shell built-ins, privilege level 0 TACACS users can only run the ip and nv commands.

If you add commands with the -a option by mistake, you can remove them. The example below removes the nv command:

cumulus@switch:~$ sudo rm ~tacacs0/bin/nv

To remove all commands:

cumulus@switch:~$ sudo rm ~tacacs0/bin/*

Remove the TACACS+ Client Packages

To remove all the TACACS+ client packages, use the following commands:

cumulus@switch:~$ sudo -E apt-get remove tacplus-client
cumulus@switch:~$ sudo -E apt-get autoremove

To remove the TACACS+ client configuration files as well as the packages (recommended), use this command:

cumulus@switch:~$ sudo -E apt-get autoremove --purge

Troubleshooting

Show TACACS+ Configuration

Run the following commands to show TACACS+ configuration:

The following example command shows all TACACS+ configuration:

cumulus@switch:~$ nv show system aaa tacacs
                    applied
------------------  -------
enable              off    
debug-level         0      
timeout             5      
vrf                 mgmt   
accounting                 
  enable            off    
authentication             
  mode              pap    
  per-user-homedir  off    
[server]            5      
[server]            10 

The following command shows the list of users excluded from TACACS+ server authentication:

cumulus@switch:~$ nv show system aaa tacacs exclude-user
          applied
--------  -------
username  USER1  

Basic Server Connectivity or NSS Issues

You can use the getent command to determine if you configured TACACS+ correctly and if the local password is in the configuration files. In the example commands below, the cumulus user represents the local user, while cumulusTAC represents the TACACS user.

To look up the username within all NSS methods:

cumulus@switch:~$ sudo getent passwd cumulusTAC
cumulusTAC:x:1016:1001:TACACS+ mapped user at privilege level 15,,,:/home/tacacs15:/bin/bash

To look up the user within the local database only:

cumulus@switch:~$ sudo getent -s compat passwd cumulus
cumulus:x:1000:1000:cumulus,,,:/home/cumulus:/bin/bash

To look up the user within the TACACS+ database only:

cumulus@switch:~$ sudo getent -s tacplus passwd cumulusTAC
cumulusTAC:x:1016:1001:TACACS+ mapped user at privilege level 15,,,:/home/tacacs15:/bin/bash

If TACACS+ is not working correctly, you can use debugging. Add the debug=1 parameter to the /etc/tacplus_servers and /etc/tacplus_nss.conf files; see the Linux Commands under Optional TACACS+ Configuration above. You can also add debug=1 to individual pam_tacplus lines in /etc/pam.d/common*.

All log messages are in /var/log/syslog.

Incorrect Shared Key

The TACACS client on the switch and the TACACS server must have the same shared secret key. If this key is incorrect, the following message prints to syslog:

2017-09-05T19:57:00.356520+00:00 leaf01 sshd[3176]: nss_tacplus: TACACS+ server 192.168.0.254:49 read failed with protocol error (incorrect shared secret?) user cumulus

Debug Issues with Per-command Authorization

To debug TACACS user command authorization, have the TACACS+ user enter the following command at a shell prompt, then try the command again:

tacuser0@switch:~$ export TACACSAUTHDEBUG=1

When you enable debugging, the command authorization conversation with the TACACS+ server shows additional information.

To disable debugging:

tacuser0@switch:~$ export -n TACACSAUTHDEBUG

Debug Issues with Accounting Records

If you add or delete TACACS+ servers from the configuration files, make sure you notify the audisp plugin with this command:

cumulus@switch:~$ sudo killall -HUP audisp-tacplus

If accounting records do not send, add debug=1 to the /etc/audisp/audisp-tac_plus.conf file, then run the command above to notify the plugin. Ask the TACACS+ user to run a command and examine the end of /var/log/syslog for messages from the plugin. You can also check the auditing log file /var/log/audit audit.log to be sure the auditing records exist. If the auditing records do not exist, restart the audit daemon with:

cumulus@switch:~$ sudo systemctl restart auditd.service

TACACS+ Package Descriptions

Cumulus Linux uses the following packages for TACACS.

Package
Description
audisp-tacplusUses auditing data from auditd to send accounting records to the TACACS+ server and starts as part of auditd.
libtac2Provides basic TACACS+ server utility and communication routines.
libnss-tacplusProvides an interface between libc username lookups, the mapping functions, and the TACACS+ server.
tacplus-authIncludes the tacplus-restrict setup utility, which enables you to perform per-command TACACS+ authorization. Per-command authorization is not the default.
libpam-tacplusProvides a modified version of the standard Debian package.
libtacplus-map1Provides mapping between local and TACACS+ users on the server. The package:- Sets the immutable sessionid and auditing UID to ensure that you can track the original user through multiple processes and privilege changes.- Sets the auditing loginuid as immutable.- Creates and maintains a status database in /run/tacacs_client_map to manage and lookup mappings.
libsimple-tacacct1Provides an interface for programs to send accounting records to the TACACS+ server. audisp-tacplus uses this package.
libtac2-binProvides the tacc testing program and TACACS+ man page.

TACACS+ Client Configuration Files

The following table describes the TACACS+ client configuration files that Cumulus Linux uses.

Filename
Description
/etc/tacplus_serversThe primary file that requires configuration after installation. All packages with include=/etc/tacplus_servers parameters use this file. Typically, this file contains the shared secrets; make sure that the Linux file mode is 600.
/etc/nsswitch.confWhen the libnss_tacplus package installs, this file configures tacplus lookups through libnss_tacplus. If you replace this file by automation, you need to add tacplus as the first lookup method for the passwd database line.
/etc/tacplus_nss.confSets the basic parameters for libnss_tacplus. The file includes a debug variable for debugging NSS lookups separately from other client packages.
/usr/share/pam-configs/tacplusThe configuration file for pam-auth-update to generate the files in the next row. The file uses these configurations at login, by su, and by ssh.
/etc/pam.d/common-*The /etc/pam.d/common-* files update for tacplus authentication. The files update with pam-auth-update when you install or remove libpam-tacplus.
/etc/sudoers.d/tacplusAllows TACACS+ privilege level 15 users to run commands with sudo. The file includes an example (commented out) of how to enable privilege level 15 TACACS users to use sudo without a password and provides an example of how to enable all TACACS users to run specific commands with sudo. Only edit this file with the visudo -f /etc/sudoers.d/tacplus command.
/etc/audisp/plugins.d/audisp-tacplus.confThe audisp plugin configuration file. You do not need to modify this file.
/etc/audisp/audisp-tac_plus.confThe TACACS+ server configuration file for accounting. You do not need to modify this file. You can use this configuration file when you only want to debug TACACS+ accounting issues, not all TACACS+ users.
/etc/audit/rules.d/audisp-tacplus.rulesThe auditd rules for TACACS+ accounting. The augenrules command uses all rule files to generate the rules file.
/etc/audit/audit.rulesThe audit rules file that generate when you install auditd.

Considerations

Multiple TACACS+ Users

If two or more TACACS+ users log in simultaneously with the same privilege level, while the accounting records are correct, a lookup on either name matches both users, while a UID lookup only returns the user that logs in first.

As a result, any processes that either user runs apply to both and all files either user creates apply to the first name matched. This is similar to adding two local users to the password file with the same UID and GID and is an inherent limitation of using the UID for the base user from the password file.

The current algorithm returns the first name matching the UID from the mapping file; either the first or the second user that logs in.

To work around this issue, you can use the switch audit log or the TACACS server accounting logs to determine which processes and files each user creates.

The Linux auditd system does not always generate audit events for processes when terminated with a signal (with the kill system call or internal errors such as SIGSEGV). As a result, processes that exit on a signal that you do not handle, generate a STOP accounting record.

Issues with the deluser Command

TACACS+ and other non-local users that run the deluser command with the --remove-home option see the following error:

tacuser0@switch: deluser --remove-home USERNAME
userdel: cannot remove entry 'USERNAME' from /etc/passwd
/usr/sbin/deluser: `/usr/sbin/userdel USERNAME' returned error code 1. Exiting

The command does remove the home directory. The user can still log in on that account but does not have a valid home directory. This is a known upstream issue with the deluser command for all non-local users.

Only use the --remove-home option with the user_homedir=1 configuration command.

Both TACACS+ and RADIUS AAA Clients

When you install both the TACACS+ and the RADIUS AAA client, Cumulus Linux does not attempt RADIUS login. As a workaround, do not install both the TACACS+ and the RADIUS AAA client on the same switch.

TACACS+ and PAM

PAM modules and an updated version of the libpam-tacplus package configure authentication initially. When you install the package, the pam-auth-update command updates the PAM configuration in /etc/pam.d. If you make changes to your PAM configuration, you need to integrate these changes. If you also use LDAP with the libpam-ldap package, you need to edit the PAM configuration with the LDAP and TACACS ordering you prefer. The libpam-tacplus package ignore rules and the values in success=2 require adjustments to ignore LDAP rules.

The TACACS+ privilege attribute priv_lvl determines the privilege level for the user that the TACACS+ server returns during the user authorization exchange. The client accepts the attribute in either the mandatory or optional forms and also accepts priv-lvl as the attribute name. The attribute value must be a numeric string in the range 0 to 15, with 15 the most privileged level.

By default, TACACS+ users at privilege levels other than 15 cannot run sudo commands and can only run commands with standard Linux user permissions.

You can edit the /etc/pam.d/common-* files manually. However, if you run pam-auth-update again after making the changes, the update fails. Only configure /usr/share/pam-configs/tacplus, then run pam-auth-update.

NSS Plugin

With pam_tacplus, TACACS+ authenticated users can log in without a local account on the system using the NSS plugin that comes with the tacplus_nss package. The plugin uses the mapped tacplus information if the user is not in the local password file, provides the getpwnam() and getpwuid()entry points, and uses the TACACS+ authentication functions.

The plugin asks the TACACS+ server if it knows the user, and then for relevant attributes to determine the privilege level of the user. When you install the libnss_tacplus package, nsswitch.conf changes to set tacplus as the first lookup method for passwd. If you change the order, lookups return the local accounts, such as tacacs0

If TACACS+ server does not find the user, it uses the libtacplus.so exported functions to do a mapped lookup. The privilege level appends to tacacs and the lookup searches for the name in the local password file. For example, privilege level 15 searches for the tacacs15 user. If the TACACS+ server finds the user, it adds information for the user in the password structure.

If the TACACS+ server does not find the user, it decrements the privilege level and checks again until it reaches privilege level 0 (user tacacs0). This allows you to use only the two local users tacacs0 and tacacs15, for minimal configuration.

TACACS+ Client Sequencing

Cumulus Linux requires the following information at the beginning of the AAA sequence:

For non-local users (users not in the local password file) you need to send a TACACS+ authorization request as the first communication with the TACACS+ server, before authentication and before the user logging in requests a password.

You need to configure certain TACACS+ servers to allow authorization requests before authentication. Contact your TACACS+ server vendor for information.

Multiple Servers with Different User Accounts

If you configure multiple TACACS+ servers that have different user accounts:

RADIUS AAA

Various add-on packages enable RADIUS users to log in to Cumulus Linux switches in a transparent way with minimal configuration. There is no need to create accounts or directories on the switch. Authentication uses PAM and includes login, ssh, sudo and su.

Install the RADIUS Packages

You can install the RADIUS packages even if the switch is not connected to the internet, as they are in the cumulus-local-apt-archive repository, which is embedded in the Cumulus Linux image.

To install the RADIUS packages:

cumulus@switch:~$ sudo apt-get update
cumulus@switch:~$ sudo apt-get install libnss-mapuser libpam-radius-auth

After installation is complete, either reboot the switch or run the sudo systemctl restart nvued command.

The libpam-radius-auth package supplied with the Cumulus Linux RADIUS client is a newer version than the one in Debian Buster. This package contains support for IPv6, the src_ip option described below, as well as bug fixes and minor features. The package also includes VRF support, provides man pages describing the PAM and RADIUS configuration, and sets the SUDO_PROMPT environment variable to the login name for RADIUS mapping support.

The libnss-mapuser package is specific to Cumulus Linux and supports the getgrent, getgrnam and getgrgid library interfaces. These interfaces add logged in RADIUS users to the group member list for groups that contain the mapped_user (radius_user) if the RADIUS account does not have privileges, and add privileged RADIUS users to the group member list for groups that contain the mapped_priv_user (radius_priv_user) during the group lookups.

During package installation:

Configure the RADIUS Client

After editing the /etc/pam_radius_auth.conf file, you must restart the NVUE and nginx-authenticator services with the sudo systemctl restart nvued.service command and the sudo systemctl restart nginx-authenticator.service command.

To configure the RADIUS client, edit the /etc/pam_radius_auth.conf file:

  1. Add the hostname or IP address of at least one RADIUS server (such as a freeradius server on Linux), and the shared secret used to authenticate and encrypt communication with each server.

    You must be able to resolve the hostname of the switch to an IP address. If for some reason you cannot find the hostname in DNS, you can add the hostname to the /etc/hosts file manually. However, this can cause problems because DHCP assigns the IP address, which can change at any time.

    Multiple server configuration lines are verified in the order listed. Other than memory, there is no limit to the number of RADIUS servers you can use.

    The server port number or name is optional. The system looks up the port in the /etc/services file. However, you can override the ports in the /etc/pam_radius_auth.conf file.

  2. If the server is slow or latencies are high, change the timeout setting. The setting defaults to 3 seconds.

  3. If you want to use a specific interface to reach the RADIUS server, specify the src_ip option. You can specify the hostname of the interface, an IPv4, or an IPv6 address. If you specify the src_ip option, you must also specify the timeout option.

  4. Set the vrf-name field. This is typically set to mgmt if you are using a management VRF. You cannot specify more than one VRF.

The configuration file includes the mapped_priv_user field that sets the account used for privileged RADIUS users and the priv-lvl field that sets the minimum value for the privilege level to be a privileged login (the default value is 15). If you edit these fields, make sure the values match those set in the /etc/nss_mapuser.conf file.

The following example provides a sample /etc/pam_radius_auth.conf file configuration:

mapped_priv_user   radius_priv_user
# server[:port]    shared_secret   timeout (secs)  src_ip
192.168.0.254      secretkey
other-server       othersecret     3               192.168.1.10
# when mgmt vrf is in use
vrf-name mgmt

If this is the first time you are configuring the RADIUS client, uncomment the debug line for troubleshooting. The debugging messages write to /var/log/syslog. When the RADIUS client is working correctly, comment out the debug line.

As an optional step, you can set PAM configuration keywords by editing the /usr/share/pam-configs/radius file. After you edit the file, you must run the pam-auth-update --package command. The pam_radius_auth (8) man page describes the PAM configuration keywords.

The value of the VSA (Vendor Specific Attribute) shell:priv-lvl determines the privilege level for the user on the switch. If the attribute does not return, the user does not have privileges. The following shows an example using the freeradius server for a fully privileged user.

Service-Type = Administrative-User,
Cisco-AVPair = "shell:roles=network-administrator",
Cisco-AVPair += "shell:priv-lvl=15"

The VSA vendor name (Cisco-AVPair in the example above) can have any content. The RADIUS client only checks for the string shell:priv-lvl.

Enable Login without Local Accounts

LDAP is not commonly used with switches and adding accounts locally is cumbersome, Cumulus Linux includes a mapping capability with the libnss-mapuser package.

Mapping uses two NSS (Name Service Switch) plugins, one for account name, and one for UID lookup. The installation process configures these accounts automatically in the /etc/nsswitch.conf file and removes them when you delete the package. See the nss_mapuser (8) man page for the full description of this plugin.

A username is mapped at login to a fixed account specified in the configuration file, with the fields of the fixed account used as a template for the user that is logging in.

For example, if you look up the name dave and the fixed account in the configuration file is radius\_user, and that entry in /etc/passwd is:

radius_user:x:1017:1002:radius user:/home/radius_user:/bin/bash

then the matching line that returns when you run getent passwd dave is:

cumulus@switch:~$ getent passwd dave
dave:x:1017:1002:dave mapped user:/home/dave:/bin/bash

The login process creates the home directory /home/dave if it does not already exist and populates it with the standard skeleton files by the mkhomedir_helper command.

The configuration file /etc/nss_mapuser.conf configures the plugins. The file includes the mapped account name, which is radius_user by default. You can change the mapped account name by editing the file. The nss_mapuser (5) man page describes the configuration file.

A flat file mapping derives from the session number assigned during login, which persists across su and sudo. Cumulus Linux removes the mapping at logout.

Local Fallback Authentication

If a site wants to allow local fallback authentication for a user when none of the RADIUS servers are reachable, you can add a privileged user account as a local account on the switch. The local account must have the same unique identifier as the privileged user and the shell must be the same.

To configure local fallback authentication:

  1. Add a local privileged user account. For example, if the radius_priv_user account in the /etc/passwd file is radius_priv_user:x:1002:1001::/home/radius_priv_user:/sbin/radius_shell, run the following command to add a local privileged user account named johnadmin:

    cumulus@switch:~$ sudo useradd -u 1002 -g 1001 -o -s /sbin/radius_shell johnadmin
    
  2. To enable the local privileged user to run sudo and NVUE commands, run the following commands:

    cumulus@switch:~$ sudo adduser johnadmin nvset
    cumulus@switch:~$ sudo adduser johnadmin nvapply
    cumulus@switch:~$ sudo adduser johnadmin sudo
    cumulus@switch:~$ sudo systemctl restart nvued
    
  3. Edit the /etc/passwd file to move the local user line before to the radius_priv_user line:

    cumulus@switch:~$ sudo vi /etc/passwd
    ...
    johnadmin:x:1002:1001::/home/johnadmin:/sbin/radius_shell
    radius_priv_user:x:1002:1001::/home/radius_priv_user:/sbin/radius_shell
    
  4. To set the local password for the local user, run the following command:

    cumulus@switch:~$ sudo passwd johnadmin
    

Verify RADIUS Client Configuration

To verify that you configured the RADIUS client correctly, log in as a non-privileged user and run a nv set interface command.

In this example, the ops user is not a privileged RADIUS user so the ops user cannot add an interface.

ops@leaf01:~$ nv set interface swp1
ERROR: User ops does not have permission to make networking changes.

In this example, the admin user is a privileged RADIUS user (with privilege level 15) so is able to add interface swp1.

admin@leaf01:~$ nv set interface swp1
admin@leaf01:~$ nv apply

Remove RADIUS Client Packages

Remove the RADIUS packages with the following command:

cumulus@switch:~$ sudo apt-get remove libnss-mapuser libpam-radius-auth

When you remove the packages, Cumulus Linux deletes the plugins from the /etc/nsswitch.conf file and from the PAM files.

To remove all configuration files for these packages, run:

cumulus@switch:~$ sudo apt-get purge libnss-mapuser libpam-radius-auth

The RADIUS fixed account is not removed from the /etc/passwd or /etc/group file and the home directories are not removed. They remain in case there are modifications to the account or files in the home directories.

To remove the home directories of the RADIUS users, first get the list by running:

cumulus@switch:~$ sudo ls -l /home | grep radius

For all users listed, except the radius_user, run this command to remove the home directories:

cumulus@switch:~$ sudo deluser --remove-home USERNAME

where USERNAME is the account name (the home directory relative portion). This command gives the following warning because the user is not listed in the /etc/passwd file.

userdel: cannot remove entry 'USERNAME' from /etc/passwd
/usr/sbin/deluser: `/usr/sbin/userdel USERNAME' returned error code 1. Exiting.

After you remove all the RADIUS users, run the command to remove the fixed account. If there are changes to the account in the /etc/nss_mapuser.conf file, use that account name instead of radius_user.

cumulus@switch:~$ sudo deluser --remove-home radius_user
cumulus@switch:~$ sudo deluser --remove-home radius_priv_user
cumulus@switch:~$ sudo delgroup radius_users

Considerations

Netfilter - ACLs

Netfilter is the packet filtering framework in Cumulus Linux and other Linux distributions. You can use several different tools to configure ACLs in Cumulus Linux:

Traffic Rules

Chains

Netfilter describes the way that the Linux kernel classifies and controls packets to, from, and across the switch. Netfilter does not require a separate software daemon to run; it is part of the Linux kernel. Netfilter asserts policies at layer 2, 3 and 4 of the OSI model by inspecting packet and frame headers according to a list of rules. The iptables, ip6tables, and ebtables userspace applications provide syntax you use to define rules.

The rules inspect or operate on packets at several points (chains) in the life of the packet through the system:

Tables

When you build rules to affect the flow of traffic, tables can access the individual chains. Linux provides three tables by default:

Each table has a set of default chains that modify or inspect packets at different points of the path through the switch. Chains contain the individual rules to influence traffic.

Rules

Rules classify the traffic you want to control. You apply rules to chains, which attach to tables.

Rules have several different components:

How Rules Parse and Apply

The switch reads all the rules from each chain from iptables, ip6tables, and ebtables and enters them in order into either the filter table or the mangle table. The switch reads the rules from the kernel in the following order:

When you combine and put rules into one table, the order determines the relative priority of the rules; iptables and ip6tables have the highest precedence and ebtables has the lowest.

The Linux packet forwarding construct is an overlay for how the silicon underneath processes packets. Be aware of the following:

Rule Placement in Memory

INPUT and ingress (FORWARD -i) rules occupy the same memory space. A rule counts as ingress if you set the -i option. If you set both input and output options (-i and -o), the switch considers the rule as ingress and occupies that memory space. For example:

-A FORWARD -i swp1 -o swp2 -s 10.0.14.2 -d 10.0.15.8 -p tcp -j ACCEPT

If you set an output flag with the INPUT chain, you see an error. For example:

-A FORWARD,INPUT -i swp1 -o swp2 -s 10.0.14.2 -d 10.0.15.8 -p tcp -j ACCEPT
error: line 2 : output interface specified with INPUT chain error processing rule '-A FORWARD,INPUT -i swp1 -o swp2 -s 10.0.14.2 -d 10.0.15.8 -p tcp -j ACCEPT'

If you remove the -o option and the interface, it is a valid rule.

Nonatomic Update Mode and Atomic Update Mode

Cumulus Linux enables atomic update mode by default. However, this mode limits the number of ACL rules that you can configure.

To increase the number of configurable ACL rules, configure the switch to operate in nonatomic mode.

Instead of reserving 50% of your TCAM space for atomic updates, incremental update uses the available free space to write the new TCAM rules and swap over to the new rules after this is complete. Cumulus Linux then deletes the old rules and frees up the original TCAM space. If there is insufficient free space to complete this task, the original nonatomic update runs, which interrupts traffic.

You can enable nonatomic updates for switchd, which offer better scaling because all TCAM resources actively impact traffic. With atomic updates, half of the hardware resources are on standby and do not actively impact traffic.

Incremental nonatomic updates are table based, so they do not interrupt network traffic when you install new rules. The rules map to the following tables and update in this order:

The incremental nonatomic update operation follows this order:

  1. Updates are incremental, one table at a time without stopping traffic.
  2. Cumulus Linux checks if the rules in a table are different from installation time; if a table does not have any changes, it does not reinstall the rules.
  3. If there are changes in a table, the new rules populate in new groups or slices in hardware, then that table switches over to the new groups or slices.
  4. Finally, old resources for that table free up. This process repeats for each of the tables listed above.
  5. If there are insufficient resources to hold both the new rule set and old rule set, Cumulus Linux tries the regular nonatomic mode, which interrupts network traffic.
  6. If the regular nonatomic update fails, Cumulus Linux reverts back to the previous rules.

To always reload switchd with nonatomic updates:

  1. Edit /etc/cumulus/switchd.conf.

  2. Add the following line to the file:

    acl.non_atomic_update_mode = TRUE
    
  3. Reload switchd with the sudo systemctl reload switchd.service command for the changes to take effect. The reload does not interrupt network services.

During regular non-incremental nonatomic updates, traffic stops, then continues after all the new configuration is in the hardware.

Use iptables, ip6tables, and ebtables Directly

Do not use iptables, ip6tables, ebtables directly; installed rules only apply to the Linux kernel and Cumulus Linux does not hardware accelerate. When you run cl-acltool -i, Cumulus Linux resets all rules and deletes anything that is not in /etc/cumulus/acl/policy.conf.

For example, the following rule appears to work:

cumulus@switch:~$ sudo iptables -A INPUT -p icmp --icmp-type echo-request -j DROP

The cl-acltool -L command shows the rule:

cumulus@switch:~$ sudo cl-acltool -L ip
-------------------------------
Listing rules of type iptables:
-------------------------------

TABLE filter :
Chain INPUT (policy ACCEPT 72 packets, 5236 bytes)
pkts bytes target prot opt in out source destination
0 0 DROP icmp -- any any anywhere anywhere icmp echo-request

However, Cumulus Linux does not synchronize the rule to hardware. Running cl-acltool -i or reboot removes the rule without replacing it. To ensure that Cumulus Linux hardware accelerates all rules that can be in hardware, add them to /etc/cumulus/acl/policy.conf and install them with the cl-acltool -i command.

Estimate the Number of Rules

To estimate the number of rules you can create from an ACL entry, first determine if that entry is an ingress or an egress. Then, determine if it is an IPv4-mac or IPv6 type rule. This determines the slice to which the rule belongs. Use the following to determine how many entries the switch uses for each type.

By default, each entry occupies one double wide entry, except if the entry is one of the following:

Match on VLAN IDs on Layer 2 Interfaces

You can match on VLAN IDs on layer 2 interfaces for ingress rules. The following example matches on a VLAN and DSCP class, and sets the internal class of the packet. For extended matching on IP fields, combine this rule with ingress iptable rules.

[ebtables]
-A FORWARD -p 802_1Q --vlan-id 100 -j mark --mark-set 102

[iptables]
-A FORWARD -i swp31 -m mark --mark 102 -m dscp --dscp-class CS1 -j SETCLASS --class 2

  • Cumulus Linux reserves mark values between 0 and 100; for example, if you use --mark-set 10, you see an error. Use mark values between 101 and 4196.
  • You cannot mark multiple VLANs with the same value.
  • If you enable EVPN-MH and configure VLAN match rules in ebtables with a {{mark}} target, the ebtables rule might overwrite the {{mark}} set by traffic class rules you configure for EVPN-MH on ingress. Egress EVPN MH traffic class rules that match the ingress traffic class {{mark}} might not get hit. To work around this issue, add ebtable rules to {{ACCEPT}} the packets already marked by EVPN-MH traffic class rules on ingress.

Install and Manage ACL Rules with NVUE

Instead of crafting a rule by hand, then installing it with cl-acltool, you can use NVUE commands. Cumulus Linux converts the commands to the /etc/cumulus/acl/policy.d/50_nvue.rules file. The rules you create with NVUE are independent of the default files /etc/cumulus/acl/policy.d/00control_plane.rules and 99control_plane_catch_all.rules.

Cumulus Linux 5.0 and later uses the -t mangle -A PREROUTING chain for ingress rules and the -t mangle -A POSTROUTING chain for egress rules instead of the - A FORWARD chain used in previous releases.

Consider the following iptables rule:

-t mangle -A PREROUTING -i swp1 -s 10.0.14.2/32 -d 10.0.15.8/32 -p tcp -j ACCEPT

To create this rule with NVUE, follow the steps below. NVUE adds all options in the rule automatically.

  1. Set the rule type, the matching protocol, source IP address and port, destination IP address and port, and the action. You must provide a name for the rule (EXAMPLE1 in the commands below):

    cumulus@switch:~$ nv set acl EXAMPLE1 type ipv4
    cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip protocol tcp
    cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip source-ip 10.0.14.2/32
    cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip source-port ANY
    cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip dest-ip 10.0.15.8/32
    cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip dest-port ANY
    cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 action permit
    
  2. Apply the rule to an inbound or outbound interface with the nv set interface <interface> acl command.

    • For rules affecting the -t mangle -A PREROUTING chain (-A FORWARD in previous releases), apply the rule to an inbound or outbound interface: For example:
    cumulus@switch:~$ nv set interface swp1 acl EXAMPLE1 inbound
    cumulus@switch:~$ nv config apply
    
    • For rules affecting the INPUT or OUPUT chain (-A INPUT or -A OUTPUT), apply the rule to a control plane interface. For example:
    cumulus@switch:~$ nv set interface swp1 acl EXAMPLE1 inbound control-plane
    cumulus@switch:~$ nv config apply
    

To see all installed rules, examine the /etc/cumulus/acl/policy.d/50_nvue.rules file:

cumulus@switch:~$ sudo cat /etc/cumulus/acl/policy.d/50_nvue.rules
[iptables]

## ACL EXAMPLE1 in dir inbound on interface swp1 ##
-t mangle -A PREROUTING -i swp1 -s 10.0.14.2/32 -d 10.0.15.8/32 -p tcp -j ACCEPT
...

To remove this rule, run the nv unset acl <acl-name> and nv unset interface <interface> acl <acl-name> commands. These commands delete the rule from the /etc/cumulus/acl/policy.d/50_nvue.rules file.

cumulus@switch:~$ nv unset acl EXAMPLE1
cumulus@switch:~$ nv unset interface swp1 acl EXAMPLE1
cumulus@switch:~$ nv config apply

To show ACL statistics per interface, such as the total number of bytes that match the ACL rule, run the nv show interface <interface-id> acl <acl-id> statistics or nv show interface <interface-id> acl <acl-id> statistics <rule-id> command.

To see the list of all NVUE ACL commands, run the nv list-commands acl command.

Install and Manage ACL Rules with cl-acltool

You can manage Cumulus Linux ACLs with cl-acltool. Rules write first to the iptables chains, as described above, and then synchronize to hardware through switchd.

To examine the current state of chains and list all installed rules, run:

cumulus@switch:~$ sudo cl-acltool -L all
 -------------------------------
Listing rules of type iptables:
-------------------------------

TABLE filter :
Chain INPUT (policy ACCEPT 90 packets, 14456 bytes)
pkts bytes target prot opt in out source destination
0 0 DROP all -- swp+ any 240.0.0.0/5 anywhere
0 0 DROP all -- swp+ any loopback/8 anywhere
0 0 DROP all -- swp+ any base-address.mcast.net/8 anywhere
0 0 DROP all -- swp+ any 255.255.255.255 anywhere ...

To list installed rules using native iptables, ip6tables and ebtables, use the -L option with the respective commands:

cumulus@switch:~$ sudo iptables -L
cumulus@switch:~$ sudo ip6tables -L
cumulus@switch:~$ sudo ebtables -L

To remove all installed rules, run:

cumulus@switch:~$ sudo cl-acltool -F all

To remove only the IPv4 iptables rules, run:

cumulus@switch:~$ sudo cl-acltool -F ip

If the install fails, ACL rules in the kernel and hardware roll back to the previous state. You also see errors from programming rules in the kernel or ASIC.

Install Packet Filtering (ACL) Rules

cl-acltool takes access control list (ACL) rule input in files. Each ACL policy file includes iptables, ip6tables and ebtables categories under the tags [iptables], [ip6tables] and [ebtables]. You must assign each rule in an ACL policy to one of the rule categories.

See man cl-acltool(5) for ACL rule details. For iptables rule syntax, see man iptables(8). For ip6tables rule syntax, see man ip6tables(8). For ebtables rule syntax, see man ebtables(8).

See man cl-acltool(5) and man cl-acltool(8) for more details on using cl-acltool.

By default:

  • ACL policy files are in /etc/cumulus/acl/policy.d/.
  • All *.rules files in /etc/cumulus/acl/policy.d/ directory are also in /etc/cumulus/acl/policy.conf.
  • All files in the policy.conf file install when the switch boots up.
  • The policy.conf file expects rule files to have a .rules suffix as part of the file name.

Here is an example ACL policy file:

[iptables]
-A INPUT -i swp1 -p tcp --dport 80 -j ACCEPT
-A FORWARD -i swp1 -p tcp --dport 80 -j ACCEPT

[ip6tables]
-A INPUT -i swp1 -p tcp --dport 80 -j ACCEPT
-A FORWARD -i swp1 -p tcp --dport 80 -j ACCEPT

[ebtables]
-A INPUT -p IPv4 -j ACCEPT
-A FORWARD -p IPv4 -j ACCEPT

You can use wildcards or variables to specify chain and interface lists.

You can only use swp+ and bond+ as wildcard names.

swp+ rules apply as an aggregate, not per port. If you want to apply per port policing, specify a specific port instead of the wildcard.

INGRESS = swp+
INPUT_PORT_CHAIN = INPUT,FORWARD

[iptables]
-A $INPUT_PORT_CHAIN -i $INGRESS -p tcp --dport 80 -j ACCEPT

[ip6tables]
-A $INPUT_PORT_CHAIN -i $INGRESS -p tcp --dport 80 -j ACCEPT

[ebtables]
-A INPUT -p IPv4 -j ACCEPT

You can write ACL rules for the system into multiple files under the default /etc/cumulus/acl/policy.d/ directory. The ordering of rules during installation follows the sort order of the files according to their file names.

Use multiple files to stack rules. The example below shows two rule files that separate rules for management and datapath traffic:

cumulus@switch:~$ ls /etc/cumulus/acl/policy.d/
00sample_mgmt.rules 01sample_datapath.rules
cumulus@switch:~$ cat /etc/cumulus/acl/policy.d/00sample_mgmt.rules

INGRESS_INTF = swp+
INGRESS_CHAIN = INPUT

[iptables]
# protect the switch management
-A $INGRESS_CHAIN -i $INGRESS_INTF -s 10.0.14.2 -d 10.0.15.8 -p tcp -j ACCEPT
-A $INGRESS_CHAIN -i $INGRESS_INTF -s 10.0.11.2 -d 10.0.12.8 -p tcp -j ACCEPT
-A $INGRESS_CHAIN -i $INGRESS_INTF -d 10.0.16.8 -p udp -j DROP

cumulus@switch:~$ cat /etc/cumulus/acl/policy.d/01sample_datapath.rules
INGRESS_INTF = swp+
INGRESS_CHAIN = INPUT, FORWARD

[iptables]
-A $INGRESS_CHAIN -i $INGRESS_INTF -s 192.0.2.5 -p icmp -j ACCEPT
-A $INGRESS_CHAIN -i $INGRESS_INTF -s 192.0.2.6 -d 192.0.2.4 -j DROP
-A $INGRESS_CHAIN -i $INGRESS_INTF -s 192.0.2.2 -d 192.0.2.8 -j DROP

Install all ACL policies under a directory:

cumulus@switch:~$ sudo cl-acltool -i -P ./rules
Reading files under rules
Reading rule file ./rules/01_http_rules.txt ...
Processing rules in file ./rules/01_http_rules.txt ...
Installing acl policy ...
Done.

Apply all rules and policies included in /etc/cumulus/acl/policy.conf:

cumulus@switch:~$ sudo cl-acltool -i

Specify the Policy Files to Install

By default, Cumulus Linux installs any .rules file you configure in /etc/cumulus/acl/policy.d/. To add other policy files to an ACL, you need to include them in /etc/cumulus/acl/policy.conf. For example, for Cumulus Linux to install a rule in a policy file called 01_new.datapathacl, add include /etc/cumulus/acl/policy.d/01_new.rules to policy.conf:

cumulus@switch:~$ sudo nano /etc/cumulus/acl/policy.conf

#
# This file is a master file for acl policy file inclusion
#
# Note: This is not a file where you list acl rules.
#
# This file can contain:
# - include lines with acl policy files
#   example:
#     include <filepath>
#
# see manpage cl-acltool(5) and cl-acltool(8) for how to write policy files 
#

include /etc/cumulus/acl/policy.d/01_new.datapathacl

Hardware Limitations on Number of Rules

The maximum number of rules that the hardware process depends on:

If you exceed the maximum number of rules for a particular table, cl-acltool -i generates the following error:

error: hw sync failed (sync_acl hardware installation failed) Rolling back .. failed.

In the table below, the default rules count toward the limits listed. The raw limits below assume only one ingress and one egress table are present.

The NVIDIA Spectrum ASIC has one common TCAM for both ingress and egress, which you can use for other non-ACL-related resources. However, the number of supported rules varies with the TCAM profile for the switch.

ProfileAtomic Mode IPv4 RulesAtomic Mode IPv6 RulesNonatomic Mode IPv4 RulesNonatomic Mode IPv6 Rules
default5002501000500
ipmc-heavy75050015001000
acl-heavy1750100035002000
ipmc-max100050020001000
ip-acl-heavy60000120000

  • Even though the table above specifies the ip-acl-heavy profile supports no IPv6 rules, Cumulus Linux does not prevent you from configuring IPv6 rules. However, there is no guarantee that IPv6 rules work under the ip-acl-heavy profile.
  • The ip-acl-heavy profile shows an updated number of supported atomic mode and nonatomic mode IPv4 rules. The previously published numbers were 7500 for atomic mode and 15000 for nonatomic mode IPv4 rules.

Supported Rule Types

The iptables/ip6tables/ebtables construct tries to layer the Linux implementation on top of the underlying hardware but they are not always directly compatible. Here are the supported rules for chains in iptables, ip6tables and ebtables.

To learn more about any of the options shown in the tables below, run iptables -h [name of option]. The same help syntax works for options for ip6tables and ebtables.

root@leaf1# ebtables -h tricolorpolice
...
tricolorpolice option:
--set-color-mode STRING setting the mode in blind or aware
--set-cir INT setting committed information rate in kbits per second
--set-cbs INT setting committed burst size in kbyte
--set-pir INT setting peak information rate in kbits per second
--set-ebs INT setting excess burst size in kbyte
--set-conform-action-dscp INT setting dscp value if the action is accept for conforming packets
--set-exceed-action-dscp INT setting dscp value if the action is accept for exceeding packets
--set-violate-action STRING setting the action (accept/drop) for violating packets
--set-violate-action-dscp INT setting dscp value if the action is accept for violating packets
Supported chains for the filter table:
INPUT FORWARD OUTPUT

iptables and ip6tables Rule Support

Rule ElementSupportedUnsupported
MatchesSrc/Dst, IP protocol
In/out interface
IPv4: ecn, icmp, frag, ttl,
IPv6: icmp6, hl,
IP common: tcp (with flags), udp, multiport, DSCP, addrtype
Rules with input/output Ethernet interfaces do not apply
Inverse matches
Standard TargetsACCEPT, DROPRETURN, QUEUE, STOP, Fall Thru, Jump
Extended TargetsLOG (IPv4/IPv6); UID is not supported for LOG
TCP SEQ, TCP options or IP options
ULOG
SETQOS
DSCP
Unique to Cumulus Linux:
SPAN
ERSPAN (IPv4/IPv6)
POLICE
TRICOLORPOLICE
SETCLASS

ebtables Rule Support

Rule ElementSupportedUnsupported
Matchesether type
input interface/wildcard
output interface/wildcard
Src/Dst MAC
IP: src, dest, tos, proto, sport, dport
IPv6: tclass, icmp6: type, icmp6: code range, src/dst addr, sport, dport
802.1p (CoS)
VLAN
Inverse matches
Proto length
Standard TargetsACCEPT, DROPRETURN, CONTINUE, Jump, Fall Thru
Extended TargetsULOG
LOG
Unique to Cumulus Linux:
SPAN
ERSPAN
POLICE
TRICOLORPOLICE
SETCLASS

Other Unsupported Rules

Considerations

Splitting rules across the ingress TCAM and the egress TCAM causes the ingress IPv6 part of the rule to match packets going to all destinations, which can interfere with the regular expected linear rule match in a sequence. For example:

A higher rule can prevent a lower rule from matching:

Rule 1: -A FORWARD -o vlan100 -p icmp6 -j ACCEPT

Rule 2: -A FORWARD -o vlan101 -p icmp6 -s 01::02 -j ACCEPT

Rule 1 matches all icmp6 packets from to all out interfaces in the ingress TCAM.

This prevents rule 2 from matching, which is more specific but with a different out interface. Make sure to put more specific matches above more general matches even if the output interfaces are different.

When you have two rules with the same output interface, the lower rule might match depending on the presence of the previous rules.

Rule 1: -A FORWARD -o vlan100 -p icmp6 -j ACCEPT

Rule 2: -A FORWARD -o vlan101 -s 00::01 -j DROP

Rule 3: -A FORWARD -o vlan101 -p icmp6 -j ACCEPT

Rule 3 still matches for an icmp6 packet with sip 00:01 going out of vlan101. Rule 1 interferes with the normal function of rule 2 and/or rule 3.

When you have two adjacent rules with the same match and different output interfaces, such as:

Rule 1: -A FORWARD -o vlan100 -p icmp6 -j ACCEPT

Rule 2: -A FORWARD -o vlan101 -p icmp6 -j DROP

Rule 2 never matches on ingress. Both rules share the same mark.

Common Examples

Data Plane Policers

You can configure quality of service for traffic on the data plane. By using QoS policers, you can rate limit traffic so incoming packets get dropped if they exceed specified thresholds.

Counters on POLICE ACL rules in iptables do not show dropped packets due to those rules.

The following example rate limits the incoming traffic on swp1 to 400 packets per second with a burst of 200 packets per second:

cumulus@switch:~$ nv set acl example1 type ipv4
cumulus@switch:~$ nv set acl example1 rule 10 action police
cumulus@switch:~$ nv set acl example1 rule 10 action police mode packet
cumulus@switch:~$ nv set acl example1 rule 10 action police burst 200
cumulus@switch:~$ nv set acl example1 rule 10 action police rate 400
cumulus@switch:~$ nv set interface swp1 acl example1 inbound
cumulus@switch:~$ nv config apply

Use the POLICE target with iptables. POLICE takes these arguments:

  • --set-rate value specifies the maximum rate in kilobytes (KB) or packets.
  • --set-burst value specifies the number of packets or kilobytes (KB) allowed to arrive sequentially.
  • --set-mode string sets the mode in KB (kilobytes) or pkt (packets) for rate and burst size.

For example, to rate limit the incoming traffic on swp1 to 400 packets per second with a burst of 200 packets per second and set this rule in your appropriate .rules file:

-t mangle -A PREROUTING -i swp1  -j POLICE --set-mode pkt --set-rate 400 --set-burst 200

Control Plane Policers

You can configure quality of service for traffic on the control plane and rate limit traffic so incoming packets drop if they exceed certain thresholds in the following ways:

Cumulus Linux 5.0 and later no longer uses INPUT chain rules to configure control plane policers.

To configure control plane policers:

  • Set the burst rate for the trap group with the nv set system control-plane policer <trap-group> burst <value> command. The burst rate is the number of packets or kilobytes (KB) allowed to arrive sequentially.
  • Set the forwarding rate for the trap group with the nv set system control-plane policer <trap-group> rate <value> command. The forwarding rate is the maximum rate in kilobytes (KB) or packets.

The trap group can be: arp, bfd, pim-ospf-rip, bgp, clag, icmp-def, dhcp-ptp, igmp, ssh, icmp6-neigh, icmp6-def-mld, lacp, lldp, rpvst, eapol, ip2me, acl-log, nat, stp, l3-local, span-cpu, catch-all, or NONE.

The following example changes the PIM trap group forwarding rate and burst rate to 400 packets per second, and the IGMP trap group forwarding rate to 400 packets per second and burst rate to 200 packets per second:

cumulus@switch:~$ nv set system control-plane policer pim-ospf-rip rate 400
cumulus@switch:~$ nv set system control-plane policer pim-ospf-rip burst 400
cumulus@switch:~$ nv set system control-plane policer pim-ospf-rip state on
cumulus@switch:~$ nv set system control-plane policer igmp rate 400
cumulus@switch:~$ nv set system control-plane policer igmp burst 200
cumulus@switch:~$ nv config apply

To rate limit traffic using the /etc/cumulus/control-plane/policers.conf file, you:

  • Enable an individual policer for a trap group (set enable to TRUE).
  • Set the policer rate in packets per second. The forwarding rate is the maximum rate in kilobytes (KB) or packets.
  • Set the policer burst rate in packets per second. The burst rate is the number of packets or kilobytes (KB) allowed to arrive sequentially.

After you edit the /etc/cumulus/control-plane/policers.conf file, you must reload the file with the /usr/lib/cumulus/switchdctl --load /etc/cumulus/control-plane/policers.conf command.

When enable is FALSE for a trap group, the trap group and catch-all trap group have a shared policer. When enable is TRUE, Cumulus Linux creates an individual policer for the trap group.

The following example changes the PIM trap group forwarding rate and burst rate to 400 packets per second, and the IGMP trap group forwarding rate to 400 packets per second and burst rate to 200 packets per second:

cumulus@switch:~$ sudo nano /etc/cumulus/control-plane/policers.conf
...
copp.pim_ospf_rip.enable = TRUE
copp.pim_ospf_rip.rate = 400
copp.pim_ospf_rip.burst = 400
...
copp.igmp.enable = TRUE
copp.igmp.rate = 400
copp.igmp.burst = 200
...
cumulus@switch:~$ /usr/lib/cumulus/switchdctl --load /etc/cumulus/control-plane/policers.conf

To show the control plane police configuration and statistics, run the NVUE nv show system control-plane policer --view=statistics command.

Cumulus Linux provides default control plane policer values. You can adjust these values to accommodate higher scale requirements for specific protocols as needed.

Policers Default Values
cumulus@leaf01:mgmt:~$ sudo cat /etc/cumulus/control-plane/policers.conf
copp.arp.enable = TRUE
copp.arp.rate = 800
copp.arp.burst = 800

copp.bfd.enable = TRUE
copp.bfd.rate = 2000
copp.bfd.burst = 2000

copp.pim_ospf_rip.enable = TRUE
copp.pim_ospf_rip.rate = 2000
copp.pim_ospf_rip.burst = 2000

copp.bgp.enable = TRUE
copp.bgp.rate = 2000
copp.bgp.burst = 2000

copp.clag.enable = TRUE
copp.clag.rate = 2000
copp.clag.burst = 2000

copp.icmp_def.enable = TRUE
copp.icmp_def.rate = 100
copp.icmp_def.burst = 40

copp.dhcp_ptp.enable = TRUE
copp.dhcp_ptp.rate = 2000
copp.dhcp_ptp.burst = 2000

copp.igmp.enable = TRUE
copp.igmp.rate = 1000
copp.igmp.burst = 1000

copp.ssh.enable = TRUE
copp.ssh.rate = 1000
copp.ssh.burst = 1000

copp.icmp6_neigh.enable = TRUE
copp.icmp6_neigh.rate = 500
copp.icmp6_neigh.burst = 500

copp.icmp6_def_mld.enable = TRUE
copp.icmp6_def_mld.rate = 300
copp.icmp6_def_mld.burst = 100

copp.lacp.enable = TRUE
copp.lacp.rate = 2000
copp.lacp.burst = 2000

copp.lldp.enable = TRUE
copp.lldp.rate = 200
copp.lldp.burst = 200

copp.rpvst.enable = TRUE
copp.rpvst.rate = 2000
copp.rpvst.burst = 2000

copp.eapol.enable = TRUE
copp.eapol.rate = 2000
copp.eapol.burst = 2000

copp.ip2me.enable = TRUE
copp.ip2me.rate = 1000
copp.ip2me.burst = 1000

copp.acl_log.enable = TRUE
copp.acl_log.rate = 100
copp.acl_log.burst = 100

copp.nat.enable = TRUE
copp.nat.rate = 200
copp.nat.burst = 200

copp.stp.enable = TRUE
copp.stp.rate = 2000
copp.stp.burst = 2000

copp.l3_local.enable = TRUE
copp.l3_local.rate = 400
copp.l3_local.burst = 100

copp.span_cpu.enable = TRUE
copp.span_cpu.rate = 100
copp.span_cpu.burst = 100

copp.catch_all.enable = TRUE
copp.catch_all.rate = 100
copp.catch_all.burst = 100

Control Plane ACLs

You can configure control plane ACLs to apply a single rule for all packets forwarded to the CPU regardless of the source interface or destination interface on the switch. Control plane ACLs allow you to regulate traffic forwarded to applications on the switch with more granularity than traps and to configure ACLs to block SSH from specific addresses or subnets.

Cumulus Linux applies inbound control plane ACLs in the INPUT chain and outbound control plane ACLs in the OUTPUT chain.

Cumulus Linux does not support a deny all control plane rule. This type of rule blocks traffic for interprocess communication and impacts overall system functionality.

The following example command applies the input control plane ACL called ACL1.

cumulus@switch:~$ nv set system control-plane acl ACL1 inbound
cumulus@switch:~$ nv config apply

The following example command applies the output control plane ACL called ACL2.

cumulus@switch:~$ nv set system control-plane acl ACL2 outbound
cumulus@switch:~$ nv config apply

To show statistics for all control-plane ACLs, run the nv show system control-plane acl command:

cumulus@switch:~$ nv show system control-plane acl
ACL Name   Rule ID  In Packets  In Bytes  Out Packets  Out Bytes
---------  -------  ----------  --------  -----------  ---------
acl1       1        0           0         0            0
           65535    0           0         0            0
acl2       1        0           0         0            0
           65535    0           0         0            0 

To show statistics for a specific control-plane ACL, run the nv show system control-plane acl <acl_name> statistics command:

cumulus@switch:~$ nv show system control-plane acl ACL1 statistics
Rule  In Packet  In Byte  Out Packet  Out Byte  Summary 

----  ---------  -------  ----------  --------  --------------------------- 

1     0          0 Bytes  0           0 Bytes   match.ip.dest-ip:   9.1.2.3 

2     0          0 Bytes  0           0 Bytes   match.ip.source-ip: 7.8.2.3 

Set DSCP on Transit Traffic

The examples here use the mangle table to modify the packet as it transits the switch. DSCP is in decimal notation in the examples below.

[iptables]

#Set SSH as high priority traffic.
-t mangle -A PREROUTING -i swp+ -p tcp -m multiport --dports 22 -j SETQOS --set-dscp 46

#Set everything coming in swp1 as AF13
-t mangle -A PREROUTING -i swp1  -j SETQOS --set-dscp 14

#Set Packets destined for 10.0.100.27 as best effort
-t mangle -A PREROUTING -i swp+ -d 10.0.100.27/32 -j SETQOS --set-dscp 0

#Example using a range of ports for TCP traffic
-t mangle -A PREROUTING -i swp+ -s 10.0.0.17/32 -d 10.0.100.27/32 -p tcp -m multiport --sports 10000:20000 -m multiport --dports 10000:20000 -j SETQOS --set-dscp 34

Apply the rule:

cumulus@switch:~$ sudo cl-acltool -i

To set SSH as high priority traffic:

cumulus@switch:~$ nv set acl EXAMPLE1 type ipv4
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip protocol tcp
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip dest-port 22
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 action set dscp 46
cumulus@switch:~$ nv set interface swp1-48 acl EXAMPLE1 inbound
cumulus@switch:~$ nv config apply

To set everything coming in swp1 as AF13:

cumulus@switch:~$ nv set acl EXAMPLE1 type ipv4
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 action set dscp 14
cumulus@switch:~$ nv set interface swp1 acl EXAMPLE1 inbound
cumulus@switch:~$ nv config apply

To set Packets destined for 10.0.100.27 as best effort:

cumulus@switch:~$ nv set acl EXAMPLE1 type ipv4
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip dest-ip 10.0.100.27/32
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 action set dscp 0
cumulus@switch:~$ nv set interface swp1-48 acl EXAMPLE1 inbound
cumulus@switch:~$ nv config apply

To use a range of ports for TCP traffic:

cumulus@switch:~$ nv set acl EXAMPLE1 type ipv4
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip protocol tcp
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip source-ip 10.0.0.17/32
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip source-port 10000:20000
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip dest-ip 10.0.100.27/32
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip dest-port 10000:20000
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 action set dscp 34
cumulus@switch:~$ nv set interface swp1-48 acl EXAMPLE1 inbound
cumulus@switch:~$ nv config apply

To specify all ports on the switch in NVUE (swp+ in an iptables rule), you must set the range of interfaces on the switch as in the examples above (nv set interface swp1-48). This command creates as many rules in the /etc/cumulus/acl/policy.d/50_nvue.rules file as the number of interfaces in the range you specify.

Filter Specific TCP Flags

The example rule below drops ingress IPv4 TCP packets when you set the SYN bit and reset the RST, ACK, and FIN bits. The rule applies inbound on interface swp1. After configuring this rule, you cannot establish new TCP sessions that originate from ingress port swp1. You can establish TCP sessions that originate from any other port.

-t mangle -A PREROUTING -i swp1 -p tcp --tcp-flags  ACK,SYN,FIN,RST SYN -j DROP

Apply the rule:

cumulus@switch:~$ sudo cl-acltool -i
cumulus@switch:~$ nv set acl EXAMPLE1 type ipv4
cumulus@switch:~$ nv set acl EXAMPLE1 rule 20 match ip protocol tcp
cumulus@switch:~$ nv set acl EXAMPLE1 rule 20 match ip tcp flags syn
cumulus@switch:~$ nv set acl EXAMPLE1 rule 20 match ip tcp mask rst
cumulus@switch:~$ nv set acl EXAMPLE1 rule 20 match ip tcp mask syn
cumulus@switch:~$ nv set acl EXAMPLE1 rule 20 match ip tcp mask fin
cumulus@switch:~$ nv set acl EXAMPLE1 rule 20 match ip tcp mask ack
cumulus@switch:~$ nv set acl EXAMPLE1 rule 20 action deny 
cumulus@switch:~$ nv set interface swp1 acl EXAMPLE1 inbound
cumulus@switch:~$ nv config apply

Control Who Can SSH into the Switch

Run the following commands to control who can SSH into the switch. In the following example, 10.10.10.1/32 is the interface IP address (or loopback IP address) of the switch and 10.255.4.0/24 can SSH into the switch.

-A INPUT -i swp+ -s 10.255.4.0/24 -d 10.10.10.1/32 -j ACCEPT
-A INPUT -i swp+ -d 10.10.10.1/32 -j DROP

Apply the rule:

cumulus@switch:~$ sudo cl-acltool -i
cumulus@switch:~$ nv set acl example2 type ipv4
cumulus@switch:~$ nv set acl example2 rule 10 match ip source-ip 10.255.4.0/24 
cumulus@switch:~$ nv set acl example2 rule 10 match ip dest-ip 10.10.10.1/32
cumulus@switch:~$ nv set acl example2 rule 10 action permit
cumulus@switch:~$ nv set acl example2 rule 20 match ip source-ip ANY 
cumulus@switch:~$ nv set acl example2 rule 20 match ip dest-ip 10.10.10.1/32
cumulus@switch:~$ nv set acl example2 rule 20 action deny
cumulus@switch:~$ nv set system control-plane acl example2 inbound
cumulus@switch:~$ nv config apply

Match on ECN Bits in the TCP IP Header

ECN allows end-to-end notification of network congestion without dropping packets. You can add ECN rules to match on the ECE, CWR, and ECT flags in the TCP IPv4 header.

By default, ECN rules match a packet with the bit set. You can reverse the match by using an explanation point (!).

Match on the ECE Bit

After an endpoint receives a packet with the CE bit set by a router, it sets the ECE bit in the returning ACK packet to notify the other endpoint that it needs to slow down.

To match on the ECE bit:

Create a rules file in the /etc/cumulus/acl/policy.d directory and add the following rule under [iptables]:

cumulus@switch:~$ sudo nano /etc/cumulus/acl/policy.d/30-tcp-flags.rules
[iptables]
-t mangle -A PREROUTING -i swp1 -p tcp -m ecn  --ecn-tcp-ece  -j ACCEPT

Apply the rule:

cumulus@switch:~$ sudo cl-acltool -i
cumulus@switch:~$ nv set acl example2 type ipv4
cumulus@switch:~$ nv set acl example2 rule 10 match ip protocol tcp
cumulus@switch:~$ nv set acl example2 rule 10 match ip ecn flags tcp-ece
cumulus@switch:~$ nv set acl example2 rule 10 action permit
cumulus@switch:~$ nv set interface swp1 acl example2 inbound
cumulus@switch:~$ nv config apply

Match on the CWR Bit

The CWR bit notifies the other endpoint of the connection that it received and reacted to an ECE.

To match on the CWR bit:

Create a rules file in the /etc/cumulus/acl/policy.d directory and add the following rule under [iptables]:

cumulus@switch:~$ sudo nano /etc/cumulus/acl/policy.d/30-tcp-flags.rules
[iptables]
-t mangle -A PREROUTING -i swp1 -p tcp -m ecn  --ecn-tcp-cwr  -j ACCEPT

Apply the rule:

cumulus@switch:~$ sudo cl-acltool -i
cumulus@switch:~$ nv set acl example2 type ipv4
cumulus@switch:~$ nv set acl example2 rule 10 match ip protocol tcp
cumulus@switch:~$ nv set acl example2 rule 10 match ip ecn flags tcp-cwr
cumulus@switch:~$ nv set acl example2 rule 10 action permit
cumulus@switch:~$ nv set interface swp1 acl example2 inbound
cumulus@switch:~$ nv config apply

Match on the ECT Bit

The ECT codepoints negotiate if the connection is ECN capable by setting one of the two bits to 1. Routers also use the ECT bit to indicate that they are experiencing congestion by setting both the ECT codepoints to 1.

To match on the ECT bit:

Create a rules file in the /etc/cumulus/acl/policy.d directory and add the following rule under [iptables]:

cumulus@switch:~$ sudo nano /etc/cumulus/acl/policy.d/30-tcp-flags.rules
[iptables]
-t mangle -A PREROUTING -i swp1 -p tcp -m ecn  --ecn-ip-ect 1 -j ACCEPT

Apply the rule:

cumulus@switch:~$ sudo cl-acltool -i
cumulus@switch:~$ nv set acl example2 type ipv4
cumulus@switch:~$ nv set acl example2 rule 10 match ip protocol tcp
cumulus@switch:~$ nv set acl example2 rule 10 match ip ecn ip-ect 1
cumulus@switch:~$ nv set acl example2 rule 10 action permit
cumulus@switch:~$ nv set interface swp1 acl example2 inbound
cumulus@switch:~$ nv config apply

Example Configuration

The following example demonstrates how Cumulus Linux applies several different rules.

Egress Rule

The following rule blocks any TCP traffic with destination port 200 going through leaf01 to server01 (rule 1 in the diagram above).

[iptables]
-t mangle -A POSTROUTING -o swp1 -p tcp -m multiport --dports 200 -j DROP
cumulus@switch:~$ nv set acl EXAMPLE1 type ipv4
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip protocol tcp
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip dest-port 200
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 action deny
cumulus@switch:~$ nv set interface swp1 acl EXAMPLE1 outbound
cumulus@switch:~$ nv config apply

Ingress Rule

The following rule blocks any UDP traffic with source port 200 going from server01 through leaf01 (rule 2 in the diagram above).

[iptables] 
-t mangle -A PREROUTING -i swp1 -p udp -m multiport --sports 200 -j DROP
cumulus@switch:~$ nv set acl EXAMPLE1 type ipv4
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip protocol udp
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip source-port 200
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 action deny
cumulus@switch:~$ nv set interface swp1 acl EXAMPLE1 inbound
cumulus@switch:~$ nv config apply

Input Rule

The following rule blocks any UDP traffic with source port 200 and destination port 50 going from server02 to the leaf02 control plane (rule 3 in the diagram above).

[iptables] 
-A INPUT -i swp2 -p udp -m multiport --dports 50 -j DROP
cumulus@switch:~$ nv set acl EXAMPLE1 type ipv4
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip protocol udp
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip dest-port 50
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 action deny
cumulus@switch:~$ nv set interface swp2 acl EXAMPLE1 inbound control-plane
cumulus@switch:~$ nv config apply

Output Rule

The following rule blocks any TCP traffic with source port 123 and destination port 123 going from leaf02 to server02 (rule 4 in the diagram above).

[iptables] 
-A OUTPUT -o swp2 -p tcp -m multiport --sports 123 -m multiport --dports 123 -j DROP
cumulus@switch:~$ nv set acl EXAMPLE1 type ipv4
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip protocol tcp
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip source-port 123
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 match ip dest-port 123
cumulus@switch:~$ nv set acl EXAMPLE1 rule 10 action deny
cumulus@switch:~$ nv set interface swp2 acl EXAMPLE1 outbound control-plane
cumulus@switch:~$ nv config apply

Layer 2 Rules (ebtables)

The following rule blocks any traffic with source MAC address 00:00:00:00:00:12 and destination MAC address 08:9e:01:ce:e2:04 going from any switch port egress or ingress.

[ebtables]
-A FORWARD -s 00:00:00:00:00:12 -d 08:9e:01:ce:e2:04 -j DROP
cumulus@switch:~$ nv set acl EXAMPLE type mac
cumulus@switch:~$ nv set acl EXAMPLE rule 10 match mac source-mac 00:00:00:00:00:12
cumulus@switch:~$ nv set acl EXAMPLE rule 10 match mac dest-mac 08:9e:01:ce:e2:04
cumulus@switch:~$ nv set acl EXAMPLE rule 10 action deny
cumulus@switch:~$ nv set interface swp1-48 acl EXAMPLE inbound
cumulus@switch:~$ nv config apply

Considerations

Not All Rules Supported

Cumulus Linux does not support all iptables, ip6tables, or ebtables rules. Refer to Supported Rules for specific rule support.

ACL Log Policer Limits Traffic

To protect the CPU from overloading, Cumulus Linux limits traffic copied to the CPU to 1 packet per second by an ACL Log Policer.

Bridge Traffic Limitations

Bridge traffic that matches LOG ACTION rules do not log to syslog; the kernel and hardware identify packets using different information.

You Cannot Forward Log Actions

You cannot forward logged packets. The hardware cannot both forward a packet and send the packet to the control plane (or kernel) for logging. A log action must also have a drop action.

SPAN Sessions that Reference an Outgoing Interface

SPAN sessions that reference an outgoing interface create mirrored packets based on the ingress interface before the routing/switching decision. See SPAN Sessions that Reference an Outgoing Interface and Use the CPU Port as the SPAN Destination in the Network Troubleshooting section.

iptables Interactions with cl-acltool

Because Cumulus Linux is a Linux operating system, you can use the iptables commands. However, consider using cl-acltool instead for the following reasons:

For example, running the following command works:

cumulus@switch:~$ sudo iptables -A INPUT -p icmp --icmp-type echo-request -j DROP

The rules appear when you run cl-acltool -L:

cumulus@switch:~$ sudo cl-acltool -L ip
-------------------------------
Listing rules of type iptables:
-------------------------------
TABLE filter :
Chain INPUT (policy ACCEPT 72 packets, 5236 bytes)
pkts bytes target  prot opt in   out   source    destination

0     0 DROP    icmp --  any  any   anywhere  anywhere      icmp echo-request

However, running cl-acltool -i or reboot removes them. To ensure that Cumulus Linux can hardware accelerate all rules that can be in hardware, place them in the /etc/cumulus/acl/policy.conf file, then run cl-acltool -i.

Where to Assign Rules

ACL Rule Installation Failure

After an ACL rule installation failure, you see a generic error message like the following:

cumulus@switch:$ sudo cl-acltool -i -p 00control_plane.rules
Using user provided rule file 00control_plane.rules
Reading rule file 00control_plane.rules ...
Processing rules in file 00control_plane.rules ...
error: hw sync failed (sync_acl hardware installation failed)
Installing acl policy... Rolling back ..
failed.

ACLs Do not Match when the Output Port on the ACL is a Subinterface

The ACL does not match on packets when you configure a subinterface as the output port. The ACL matches on packets only if the primary port is as an output port. If a subinterface is an output or egress port, the packets match correctly.

For example:

-A FORWARD -o swp49s1.100 -j ACCEPT

Egress ACL Matching on Bonds

Cumulus Linux does not support ACL rules that match on an outbound bond interface. For example, you cannot create the following rule:

[iptables]
-A FORWARD -o <bond_intf> -j DROP

To work around this issue, duplicate the ACL rule on each physical port of the bond. For example:

[iptables]
-A FORWARD -o <bond-member-port-1> -j DROP
-A FORWARD -o <bond-member-port-2> -j DROP

SSH Traffic to the Management VRF

To allow SSH traffic to the management VRF, use -i mgmt, not -i eth0. For example:

-A INPUT -i mgmt -s 10.0.14.2/32 -p tcp --dport ssh -j ACCEPT

INPUT Chain Rules and swp+

In INPUT chain rules, the -i swp+ match works only if the destination of the packet is towards a layer 3 swp interface; the match does not work if the packet terminates at an SVI interface (for example, vlan10). To allow traffic towards specific SVIs, use rules without any interface match or rules with individual -i <SVI> matches.

Services and Daemons in Cumulus Linux

Services (also known as daemons) and processes are at the heart of how a Linux system functions. Most of the time, a service takes care of itself; you just enable and start it, then let it run. However, because a Cumulus Linux switch is a Linux system, you can dig deeper if you like. Services can start multiple processes as they run. Services are important to monitor on a Cumulus Linux switch.

You manage services in Cumulus Linux in the following ways:

systemd and the systemctl Command

You manage services using systemd with the systemctl command. You run the systemctl command with any service on the switch to start, stop, restart, reload, enable, disable, reenable, or get the status of the service.

cumulus@switch:~$ sudo systemctl start | stop | restart | status | reload | enable | disable | reenable SERVICENAME.service

For example to restart networking, run the command:

cumulus@switch:~$ sudo systemctl restart networking.service

Add the service name after the systemctl argument.

To show all running services, use the systemctl status command. For example:

cumulus@switch:~$ sudo systemctl status
● switch
    State: running
      Jobs: 0 queued
    Failed: 0 units
    Since: Thu 2019-01-10 00:19:34 UTC; 23h ago
    CGroup: /
            ├─init.scope
            │ └─1 /sbin/init
            └─system.slice
              ├─haveged.service
              │ └─234 /usr/sbin/haveged --Foreground --verbose=1 -w 1024
              ├─sysmonitor.service
              │ ├─  658 /bin/bash /usr/lib/cumulus/sysmonitor
              │ └─26543 sleep 60
              ├─systemd-udevd.service
              │ └─218 /lib/systemd/systemd-udevd
              ├─system-ntp.slice
              │ └─ntp@mgmt.service
              │   └─vrf
              │     └─mgmt
              │       └─12108 /usr/sbin/ntpd -n -u ntp:ntp -g
              ├─cron.service
              │ └─274 /usr/sbin/cron -f -L 38
              ├─system-serial\x2dgetty.slice
              │ └─serial-getty@ttyS0.service
              │   └─745 /sbin/agetty -o -p -- \u --keep-baud 115200,38400,9600 ttyS0 vt220
              ├─nginx.service
              │ ├─332 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
              │ └─333 nginx: worker process
              ├─auditd.service
              │ └─235 /sbin/auditd
              ├─rasdaemon.service
              │ └─275 /usr/sbin/rasdaemon -f -r
              ├─clagd.service
              │ └─11443 /usr/bin/python /usr/sbin/clagd --daemon 169.254.1.2 peerlink.4094 44:39:39:ff:40:9
              --priority 100 --vxlanAnycas
              ├─switchd.service
              │ └─430 /usr/sbin/switchd -vx
              ...

systemctl Commands

systemctl has commands that perform a specific operation on a given service:

You do not need to interact with the services directly using these commands. If a critical service crashes or encounters an error, systemd restarts it automatically. systemd is the caretaker of services in modern Linux systems and responsible for starting all the necessary services at boot time.

Ensure a Service Starts after Multiple Restarts

By default, systemd tries to restart a particular service only a certain number of times within a given interval before the service fails to start. The settings StartLimitInterval (which defaults to 10 seconds) and StartBurstLimit (which defaults to 5 attempts) are in the service script; however, certain services override these defaults, sometimes with much longer times. For example, switchd.service sets StartLimitInterval=10m and StartBurstLimit=3; therefore, if you restart switchd more than three times in ten minutes, it does not start.

When the restart fails for this reason, you see a message similar to the following:

Job for switchd.service failed. See 'systemctl status switchd.service' and 'journalctl -xn' for details.

systemctl status switchd.service shows output similar to:

Active: failed (Result: start-limit) since Thu 2016-04-07 21:55:14 UTC; 15s ago

To clear this error, run systemctl reset-failed switchd.service. If you know you are going to restart frequently (multiple times within the StartLimitInterval), you can run the same command before you issue the restart request. This also applies to stop followed by start.

Keep systemd Services from Hanging after Starting

If you start, restart, or reload a systemd service that you can start from another systemd service, you must use the --no-block option with systemctl.

Identify Active Listener Ports for IPv4 and IPv6

You can identify the active listener ports under both IPv4 and IPv6 using the netstat command:

cumulus@switch:~$ netstat -nlp --inet --inet6
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:53              0.0.0.0:*               LISTEN      444/dnsmasq
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      874/sshd
tcp6       0      0 :::53                   :::*                    LISTEN      444/dnsmasq
tcp6       0      0 :::22                   :::*                    LISTEN      874/sshd
udp        0      0 0.0.0.0:28450           0.0.0.0:*                           839/dhclient
udp        0      0 0.0.0.0:53              0.0.0.0:*                           444/dnsmasq
udp        0      0 0.0.0.0:68              0.0.0.0:*                           839/dhclient
udp        0      0 192.168.0.42:123        0.0.0.0:*                           907/ntpd
udp        0      0 127.0.0.1:123           0.0.0.0:*                           907/ntpd
udp        0      0 0.0.0.0:123             0.0.0.0:*                           907/ntpd
udp        0      0 0.0.0.0:4784            0.0.0.0:*                           909/ptmd
udp        0      0 0.0.0.0:3784            0.0.0.0:*                           909/ptmd
udp        0      0 0.0.0.0:3785            0.0.0.0:*                           909/ptmd
udp6       0      0 :::58352                :::*                                839/dhclient
udp6       0      0 :::53                   :::*                                444/dnsmasq
udp6       0      0 fe80::a200:ff:fe00::123 :::*                                907/ntpd
udp6       0      0 ::1:123                 :::*                                907/ntpd
udp6       0      0 :::123                  :::*                                907/ntpd
udp6       0      0 :::4784                 :::*                                909/ptmd
udp6       0      0 :::3784                 :::*                                909/ptmd

Identify Active or Stopped Services

To see active or stopped services, run the cl-service-summary command:

cumulus@switch:~$ cl-service-summary
Service cron               enabled    active
Service ssh                enabled    active
Service syslog             enabled    active
Service asic-monitor       enabled    inactive
Service clagd              enabled    inactive
Service cumulus-poe                   inactive
Service lldpd              enabled    active
Service mstpd              enabled    active
Service neighmgrd          enabled    active
Service nvued              enabled    active
Service netq-agent         enabled    active
Service ntp                enabled    active
Service ptmd               enabled    active
Service pwmd               enabled    active
Service smond              enabled    active
Service switchd            enabled    active
Service sysmonitor         enabled    active
Service rdnbrd             disabled   inactive
Service frr                enabled    inactive
...

You can also run the systemctl list-unit-files --type service command to list all services on the switch and to see their status:

cumulus@switch:~$ systemctl list-unit-files --type service
UNIT FILE                              STATE
aclinit.service                        enabled
acltool.service                        enabled
acpid.service                          disabled
asic-monitor.service                   enabled
auditd.service                         enabled
autovt@.service                        disabled
bmcd.service                           disabled
bootlog.service                        enabled
bootlogd.service                       masked  
bootlogs.service                       masked  
bootmisc.service                       masked  
checkfs.service                        masked  
checkroot-bootclean.service            masked  
checkroot.service                      masked
clagd.service                          enabled
console-getty.service                  disabled
console-shell.service                  disabled
container-getty@.service               static  
cron.service                           enabled
cryptdisks-early.service               masked  
cryptdisks.service                     masked  
cumulus-aclcheck.service               static  
cumulus-core.service                   static  
cumulus-fastfailover.service           enabled
cumulus-firstboot.service              disabled
cumulus-platform.service               enabled  
...

Identify Essential Services

To identify which services must run when the switch boots:

cumulus@switch:~$ systemctl list-dependencies --before basic.target

To identify which services you need for networking:

cumulus@switch:~$ systemctl list-dependencies --after network.target
   ├─switchd.service
   ├─wd_keepalive.service
   └─network-pre.target

To identify the services needed for a multi-user environment, run:

cumulus@switch:~$ systemctl list-dependencies --before multi-user.target

 ●  ├─bootlog.service
   ├─systemd-readahead-done.service
   ├─systemd-readahead-done.timer
   ├─systemd-update-utmp-runlevel.service
   └─graphical.target
   └─systemd-update-utmp-runlevel.service

Important Services

The following table lists the most important services in Cumulus Linux.

Service NameDescriptionAffects Forwarding?
switchdHardware abstraction daemon. Synchronizes the kernel with the ASIC.YES
sx_sdkInterfaces with the Spectrum ASIC. Only on Spectrum switches.YES
frrFRR. Handles routing protocols. There are separate processes for each routing protocol, such as bgpd and ospfd.YES if routing
clagdCumulus link aggregation daemon. Handles MLAG.YES if using MLAG
neighmgrdKeeps neighbor entries refreshed, snoops on ARP and ND packets if ARP suppression is on, and refreshes VRR MAC addresses.YES
mstpdSpanning tree protocol daemon.YES if using layer 2
ptmdPrescriptive Topology Manager. Verifies cabling based on LLDP output. Also sets up BFD sessions.YES if using BFD
nvuedHandles the NVUE object model.NO
rsyslogHandles logging of syslog messages.NO
ntpNetwork time protocol.NO
ledmgrdLED manager. Reads the state of system LEDs.NO
sysmonitorWatches and logs critical system load (free memory, disk, CPU).NO
lldpdHandles Tx/Rx of LLDP information.NO
smondReads platform sensors and fan information from pwmd.NO
pwmdReads and sets fan speeds.NO

Configuring switchd

The switchd service enables the switch to communicate with Cumulus Linux and all the applications running on Cumulus Linux.

Configure switchd Settings

You can control certain options associated with the switchd process. For example, you can set polling intervals, optimize ACL hardware resources for better utilization, configure log message levels, set the internal VLAN range, and configure VXLAN encapsulation and decapsulation.

To configure switchd options, you either run NVUE commands or manually edit the /etc/cumulus/switchd.conf file.

NVUE currently only supports a subset of the switchd configuration available in the /etc/cumulus/switchd.conf file.

You can run NVUE commands to set the following switchd options:

  • The statistic polling interval for physical interfaces and for logical interfaces.
    • For physical interfaces, you can specify a value between 1 and 10. The default setting is 2 seconds
    • For logical interfaces, you can specify a value between 1 and 30. The default setting is 5 seconds.

A low setting, such as 1, might affect system performance.

  • The log level to debug the data plane programming related code. You can specify debug, info, notice, warning, or error. The default setting is info. NVIDIA recommends that you do not set the log level to debug in a production environment.
  • The DSCP action and value for encapsulation. You can set the DSCP action to copy (to copy the value from the IP header of the packet), set (to specify a specific value), or derive (to obtain the value from the switch priority). The default action is derive. Only specify a value if the action is set.
  • The DSCP action for decapsulation in VXLAN outer headers. You can specify copy (to copy the value from the IP header of the packet), preserve (to keep the inner DSCP value), or derive (to obtain the value from the switch priority). The default action is derive.
  • The preference between a route and neighbor with the same IP address and mask. You can specify route, neighbor, or route-and-neighbour. The default setting is route.
  • The ACL mode (atomic or non-atomic). The default setting is atomic.
  • The reserved VLAN range. The default setting is 3725-3999.

Certain switchd settings require a switchd restart or reload. Before applying the settings, NVUE indicates if it requires a switchd restart or reload and prompts you for confirmation.

  • When the switchd service restarts, in addition to resetting the switch hardware configuration, all network ports reset.
  • When the switchd service reloads, there is no interruption to network services.

The following command example sets both the statistic polling interval for logical interfaces and physical interfaces to 6 seconds:

cumulus@switch:~$ nv set system counter polling-interval logical-interface 6
cumulus@switch:~$ nv set system counter polling-interval physical-interface 6
cumulus@switch:~$ nv config apply

The following command example sets the log level for debugging the data plane programming related code to warning:

cumulus@switch:~$ nv set system forwarding programming log-level warning
cumulus@switch:~$ nv config apply

The following command example sets the DSCP action for encapsulation in VXLAN outer headers to set and the value to af12:

cumulus@switch:~$ nv set nve vxlan encapsulation dscp action set
cumulus@switch:~$ nv set nve vxlan encapsulation dscp value af12
cumulus@switch:~$ nv config apply

The following command example sets the DSCP action for decapsulation in VXLAN outer headers to preserve:

cumulus@switch:~$ nv set nve vxlan decapsulation dscp action preserve
cumulus@switch:~$ nv config apply

The following command example sets the route or neighbour preference to both route and neighbour:

cumulus@switch:~$ nv set system forwarding host-route-preference route-and-neighbour
cumulus@switch:~$ nv config apply

The following command example sets the ACL mode to non-atomic:

cumulus@switch:~$ nv set system acl mode non-atomic 
cumulus@switch:~$ nv config apply

The following command example sets the reserved VLAN range between 4064 and 4094:

cumulus@switch:~$ nv set system global reserved vlan internal range 4064-4094
cumulus@switch:~$ nv config apply

To configure the switchd parameters, edit the /etc/cumulus/switchd.conf file. Change the setting and uncomment the line if needed. The switchd.conf file contains comments with a description for each setting.

The following example shows the first few lines of the /etc/cumulus/switchd.conf file.

cumulus@switch:~$ sudo nano /etc/cumulus/switchd.conf
#
# /etc/cumulus/switchd.conf - switchd configuration file
#
# Statistic poll interval (in msec)
#stats.poll_interval = 2000

# Buffer utilization poll interval (in msec), 0 means disable
#buf_util.poll_interval = 0

# Buffer utilization measurement interval (in mins)
#buf_util.measure_interval = 0

# Optimize ACL HW resources for better utilization
#acl.optimize_hw = FALSE

# Enable Flow based mirroring.
#acl.flow_based_mirroring = TRUE
...

The following table describes the /etc/cumulus/switchd.conf file parameters and indicates if you need to restart switchd with the sudo systemctl restart switchd.service command or reload switchd with the sudo systemctl reload switchd.service command for changes to take effect when you update the setting.

Restarting the switchd service causes all network ports to reset in addition to resetting the switch hardware configuration.

ParameterDescriptionswitchd reload or restart
stats.poll_intervalThe statistics polling interval in milliseconds.The default setting is 2000.restart
buf_util.poll_intervalThe buffer utilization polling interval in milliseconds. 0 disables buffer utilization polling.The default setting is 0.restart
buf_util.measure_intervalThe buffer utilization measurement interval in minutes.The default setting is 0.restart
acl.optimize_hwOptimizes ACL hardware resources for better utilization.The default setting is FALSE.restart
acl.flow_based_mirroringEnables flow-based mirroring.The default setting is TRUE.restart
acl.non_atomic_update_modeEnables non atomic ACL updatesThe default setting is FALSE.reload
arp.next_hopsSends ARPs for next hops.The default setting is TRUE.restart
route.tableThe kernel routing table ID. The range is between 1 and 2^31.The default is 254.restart
route.host_max_percentThe maximum neighbor table occupancy in hardware (a percentage of the hardware table size).The default setting is 100.restart
coalescing.reducerThe coalescing reduction factor for accumulating changes to reduce CPU load.The default setting is 1.restart
coalescing.timeoutThe coalescing time limit in seconds.The default setting is 10.restart
ignore_non_swpsIgnore routes that point to non-swp interfaces.The default setting is TRUE.restart
disable_internal_parity_restartDisables restart after a parity error.The default setting is TRUE.restart
disable_internal_hw_err_restartDisables restart after an unrecoverable hardware error.The default setting is FALSE.restart
nat.static_enableEnables static NAT.
The default setting is TRUE.
restart
nat.dynamic_enableEnables dynamic NAT.
The default setting is TRUE.
restart
nat.age_poll_intervalThe NAT age polling interval in minutes. The minimum is 1 minute and the maximum is 24 hours. You can configure this setting only when nat.dynamic_enable is set to TRUE.
The default setting is 5.
restart
nat.table_sizeThe NAT table size limit in number of entries. You can configure this setting only when nat.dynamic_enable is set to TRUE.
The default setting is 1024.
restart
nat.config_table_sizeThe NAT configuration table size limit in number of entries. You can configure this setting only when nat.dynamic_enable is set to TRUE.
The default setting is 64.
restart
loggingConfigures logging in the format BACKEND=LEVEL. Separate multiple BACKEND=LEVEL pairs with a space. The BACKEND value can be stderr, file:filename, syslog, program:executable. The LEVEL value can be CRIT, ERR, WARN, INFO, DEBUG.The default value is syslog=INFOrestart
interface.swp1.storm_control.broadcastEnables broadcast storm control and sets the number of packets per second (pps).The default setting is 400.reload
interface.swp1.storm_control.multicastEnables multicast storm control and sets the number of packets per second (pps).The default setting is 3000.reload
interface.swp1.storm_control.unknown_unicastEnables unicast storm control and sets the number of packets per second (pps).The default setting is 2000.reload
stats.vlan.aggregateEnables hardware statistics for VLANs and specifies the type of statistics needed. You can specify NONE, BRIEF, or DETAIL.The default setting is BRIEF.restart
stats.vxlan.aggregateEnables hardware statistics for VXLANs and specifies the type of statistics needed. You can specify NONE, BRIEF, or DETAIL.The default setting is DETAIL.restart
stats.vxlan.memberEnables hardware statistics for VXLAN members and specifies the type of statistics needed. You can specify NONE, BRIEF, or DETAIL.The default setting is BRIEF.restart
stats.vlan.show_internal_vlansShow internal VLANs.The default setting is FALSE.restart
stats.vdev_hw_poll_intervalThe polling interval in seconds for virtual device hardware statisitcs.The default setting is 5.restart
resv_vlan_rangeThe internal VLAN range.The default setting is 3725-3999.restart
netlink.buf_sizeThe netlink socket buffer size in MB.The default setting is 136314880.restart
route.delete_dead_routesDelete routes on interfaces when the carrier is down.The default setting is TRUE.restart
vxlan.default_ttlThe default TTL to use in VXLAN headers.The default setting is 64.restart
bridge.broadcast_frame_to_cpuEnables bridge broadcast frames to the CPU even if the SVI is not enabled.The default setting is FALSE.restart
bridge.unreg_mcast_initInitialize the prune module for IGMP snooping unregistered layer 2 multicast flood control.The default setting is FALSE.restart
bridge.unreg_v4_mcast_pruneEnables unregistered layer 2 multicast prune to mrouter ports (IPv4).The default setting is FALSE (flood unregistered layer 2 multicast traffic).restart
bridge.unreg_v6_mcast_pruneEnables unregistered layer 2 multicast prune to mrouter ports (IPv6).The default setting is FALSE (flood unregistered layer 2 multicast traffic).restart
netlink libnl loggerThe default setting is [0-5].restart
netlink.nl_loggerThe default setting is 0.restart
vxlan.def_encap_dscp_actionSets the default VXLAN router DSCP action during encapsulation. You can specify copy if the inner packet is IP, set to set a specific value, or derive to derive the value from the switch priority.The default setting is derive.restart
vxlan.def_encap_dscp_valueSets the default VXLAN encapsulation DSCP value if the action is set.restart
vxlan.def_decap_dscp_actionSets the default VXLAN router DSCP action during decapsulation. You can specify copy if the inner packet is IP, preserve to preserve the inner DSCP value, or derive to derive the value from the switch priority.The default setting is derive.restart
ipmulticast.unknown_ipmc_to_cpuEnables sending unknown IPMC to the CPU.The default setting is FALSE.restart
vrf_route_leak_enable_dynamicEnables dynamic VRF route leaking.The default setting is FALSE.restart
sync_queue_depth_valThe event queue depth.The default setting is 50000.restart
route.route_preferred_over_neighSets the preference between a route and neighbor with the same IP address and mask. You can specify TRUE to prefer the route over the neighbor, FALSE to prefer the neighbor over the route, or BOTH to install both the route and neighbor.The default setting is TRUE.restart
evpn.multihoming.enableEnables EVPN multihoming.The default setting is TRUE.restart
evpn.multihoming.shared_l2_groupsEnables sharing for layer 2 next hop groups.The default setting is FALSE.restart
evpn.multihoming.shared_l3_groupsEnables sharing for layer 3 next hop groups.The default setting is FALSE.restart
evpn.multihoming.fast_local_protectEnables fast reroute for egress link protection. The default setting is FALSE.restart
evpn.multihoming.bum_sph_filterSets split-horizon filtering for EVPN multihoming. You can specify TRUE to filter only BUM traffic from the Ethernet segment (ES) peer or FALSE to filter all traffic from the ES peer.The default setting is TRUE.restart
link_flap_windowThe duration in seconds during which a link must flap the number of times set in the link_flap_threshold before Cumulus Linux sets the link to protodown and specifies linkflap as the reason.The default setting is 10. A value of 0 disables link flap protection.restart
link_flap_thresholdThe number of times the link must flap within the link flap window before Cumulus Linux sets the link to protodown and specifies linkflap as the reason.The default setting is 5. A value of 0 disables link flap protection.restart
res_usage_warn_thresholdSets the percentage over which forwarding resources (routes, hosts, MAC addresses) must go before Cumulus Linux generates a warning. You can set a value between 50 and 95.The default setting is 90.restart
res_warn_msg_intThe time interval in seconds between resource warning messages. Warning messages generate only one time in the specified interval per resource type even if the threshold falls below or goes over the value set in res_usage_warn_threshold multiple times during this interval. You can set a value between 60 and 3600.The default setting is 300.restart

Show switchd Settings

You can run the following NVUE commands to show the current switchd configuration settings.

Command
Description
nv show system counter polling-intervalShows the polling interval for physical and logical interface counters in seconds.
nv show system forwarding programmingShows the log level for data plane programming logs.
nv show nve vxlan encapsulation dscpShows the DSCP action and value (if the action is set) for the outer header in VXLAN encapsulation.
nv show nve vxlan decapsulation dscpShows the DSCP action for the outer header in VXLAN decapsulation.
nv show system aclShows the ACL mode (atomic or non-atomic).
nv show system global reserved vlan internalShows the reserved VLAN range.

The following example command shows that the polling interval setting for logical interface counters is 6 seconds:

cumulus@switch:~$ nv show system counter polling-interval
                   applied  description
-----------------  -------  -----------------------------------------------------
logical-interface  0:00:06  Config polling-interval for logical interface(in sec)

The following example command shows that the log level setting for data plane programming logs is warning:

cumulus@switch:~$ nv show system forwarding programming
           applied  description
---------  -------  -------------------
log-level  warning  configure Log-level

The following example command shows that the DSCP action setting for the outer header in VXLAN encapsulation is set and the value is af12.

cumulus@switch:~$ nv show nve vxlan encapsulation dscp
        operational  applied  description
------  -----------  -------  --------------------------------------------------
action  set          set      DSCP encapsulation action
value   af12         af12     Configured DSCP value to put in outer Vxlan packet

The following command example shows that ACL mode is atomic:

cumulus@switch:~$ nv show system acl
      applied  description
----  -------  -----------------------------------------
mode  atomic   configure Atomic or Non-Atomic ACL update

The following command example shows that the reserved VLAN range is between 4064 and 4094:

cumulus@switch:~$ nv show system global reserved vlan internal
       operational  applied    description
-----  -----------  ---------  -------------------
range  4064-4094    4064-4094  Reserved Vlan range

In addition to restarting switchd when you change certain /etc/cumulus/switchd.conf file parameters manually, you also need to restart switchd whenever you modify a switchd hardware configuration file (any *.conf file that requires making a change to the switching hardware, such as /etc/cumulus/datapath/traffic.conf). You do not have to restart the switchd service when you update a network interface configuration (for example, when you edit the /etc/network/interfaces file).

Configuring a Global Proxy

You configure global HTTP and HTTPS proxies in the /etc/profile.d/ directory of Cumulus Linux. Set the http_proxy and https_proxy variables to configure the switch with the address of the proxy server you want to use to get URLs on the command line. This is useful for programs such as apt, apt-get, curl and wget, which can all use this proxy.

  1. In a terminal, create a new file in the /etc/profile.d/ directory.

    cumulus@switch:~$ sudo nano /etc/profile.d/proxy.sh
    
  2. Add a line to the file to configure either an HTTP or an HTTPS proxy, or both:

    • HTTP proxy:

      http_proxy=http://myproxy.domain.com:8080
      export http_proxy
      
    • HTTPS proxy:

      https_proxy=https://myproxy.domain.com:8080
      export https_proxy
      
  3. Create a file in the /etc/apt/apt.conf.d directory and add the following lines to the file to get the HTTP and HTTPS proxies. The example below uses http_proxy as the file name:

    cumulus@switch:~$ sudo nano /etc/apt/apt.conf.d/http_proxy
    Acquire::http::Proxy "http://myproxy.domain.com:8080";
    Acquire::https::Proxy "https://myproxy.domain.com:8080";
    
  4. Add the proxy addresses to the /etc/wgetrc file, then uncomment the http_proxy and https_proxy lines, if necessary:

    cumulus@switch:~$ sudo nano /etc/wgetrc
    ...
    https_proxy = https://myproxy.domain.com:8080
    http_proxy = http://myproxy.domain.com:8080
    ...
    
  5. To execute the /etc/profile.d/proxy.sh file in the current environment, run the source command:

    cumulus@switch:~$ source /etc/profile.d/proxy.sh
    

Use the echo command to confirm the configuration:

Set up an apt package cache

In Service System Upgrade - ISSU

Use ISSU to upgrade and troubleshoot an active switch with minimal disruption to the network.

ISSU includes the following modes:

  • In earlier Cumulus Linux releases, ISSU was Smart System Manager.
  • The NVIDIA SN5600 (Spectrum-4) switch does not support ISSU.

Restart Mode

You can configure the switch to restart in one of the following modes.

Cumulus Linux supports fast mode for all protocols; however only supports warm mode for layer 2 forwarding, and layer 3 forwarding with BGP and static routing.

NVIDIA recommends you use NVUE commands to configure restart mode and reboot the system. If you prefer to use csmgrctl commands, you must stop NVUE from managing the /etc/cumulus/csmgrd.conf file before you set restart mode:

  1. Run the following NVUE commands:

    cumulus@switch:~$ nv set system config apply ignore /etc/cumulus/csmgrd.conf
    cumulus@switch:~$ nv config apply
    
  2. Edit the /etc/cumulus/csmgrd.conf file and set the csmgrctl_override option to true:

    cumulus@switch:~$ sudo nano /etc/cumulus/csmgrd.conf
    csmgrctl_override=true
    ...
    
  3. Save the configuration:

    cumulus@switch:~$ nv config save
    

The following command configures the switch to restart in cold mode:

cumulus@switch:~$ nv set system reboot mode cold
cumulus@switch:~$ nv config apply
cumulus@switch:~$ sudo csmgrctl -c

The following command configures the switch to restart in fast mode:

cumulus@switch:~$ nv set system reboot mode fast
cumulus@switch:~$ nv config apply
cumulus@switch:~$ sudo csmgrctl -f

The following command configures the switch to restart in warm mode.

cumulus@switch:~$ nv set system reboot mode warm
cumulus@switch:~$ nv config apply
cumulus@switch:~$ sudo csmgrctl -w

To reboot the switch in the restart mode you configure above with NVUE:

cumulus@switch:~$ nv action reboot system no-confirm

You must specify no-confirm at the end of the command.

To show system reboot information, such as the reboot date and time, reason, and reset mode (fast, cold, warm), run the NVUE nv show system reboot command:

cumulus@switch:~$ nv show system reboot
           operational                       applied  pending
---------  --------------------------------  -------  -------
reason                                                       
  gentime  2023-04-26T15:11:23.140569+00:00                  
  reason   Unknown                                           
  user     system/root

Upgrade Mode

Upgrade mode updates all the components and services on the switch to the latest Cumulus Linux minor release without impacting traffic. After upgrade is complete, you must restart the switch with either a warm, cold, or fast restart.

If the switch is in warm restart mode, restarting the switch after an upgrade does not result in traffic loss (this is a hitless upgrade).

Upgrade mode includes the following options:

The following command upgrades all the system components:

The NVUE command is not supported.
cumulus@switch:~$ sudo csmgrctl -u

The following command provides information on the components you want to upgrade:

The NVUE command is not supported.
cumulus@switch:~$ sudo csmgrctl -d

Maintenance Mode

Maintenance mode globally manages the BGP and MLAG control plane.

To enable maintenance mode:

cumulus@switch:~$ nv action enable system maintenance mode
Action executing ...
System maintenance mode has been enabled successfully
 Current System Mode: Maintenance, cold  
 Maintenance mode since Thu Jun 13 23:59:47 2024 (Duration: 00:00:00)
 Ports shutdown for Maintenance
 frr             : Maintenance, cold, down, up time: 29:06:27
 switchd         : Maintenance, cold, down, up time: 29:06:31
 System Services : Maintenance, cold, down, up time: 29:07:00

Action succeeded
cumulus@switch:~$ sudo csmgrctl -m1

To disable maintenance mode:

cumulus@switch:~$ nv action disable system maintenance mode
Action executing ...
System maintenance mode has been disabled successfully
 Current System Mode: cold  
 frr             : cold, up, up time: 12:57:48 (1 restart)
 switchd         : cold, up, up time: 13:12:13
 System Services : cold, up, up time: 13:12:32
Action succeeded
cumulus@switch:~$ sudo csmgrctl -m0

Before you disable maintenance mode, be sure to bring the ports back up.

To show maintenance mode status either run the NVUE nv show system maintenance command or the Linux sudo csmgrctl -s command:

cumulus@switch:~$ nv show system maintenance 
       operational
-----  -----------
mode   enabled   
ports  disabled 
cumulus@switch:~$ sudo csmgrctl -s
Current System Mode: cold  
 frr             : cold, up, up time: 00:14:51 (2 restarts)
 clagd           : cold, up, up time: 00:14:47
 switchd         : cold, up, up time: 01:09:48
 System Services : cold, up, up time: 01:10:07

Maintenance Ports

Maintenance ports globally disables or enables all configured ports.

To enable maintenance ports:

cumulus@switch:~$ nv action enable system maintenance ports
Action executing ...
System maintenance ports has been enabled successfully
 Current System Mode: cold  
 frr             : cold, up, up time: 28:54:36
 switchd         : cold, up, up time: 28:54:40
 System Services : cold, up, up time: 28:55:09

Action succeeded
cumulus@switch:~$ sudo csmgrctl -p0

To disable maintenance ports:

cumulus@switch:~$ nv action disable system maintenance ports
Action executing ...
System maintenance ports has been disabled successfully
 Current System Mode: cold  
 Ports shutdown for Maintenance
 frr             : cold, up, up time: 28:55:49
 switchd         : cold, up, up time: 28:55:53
 System Services : cold, up, up time: 28:56:22

Action succeeded
cumulus@switch:~$ sudo csmgrctl -p1

To see the status of maintenance ports, run the NVUE nv show system maintenance command:

cumulus@switch:~$ nv show system maintenance 
       operational
-----  -----------
mode   enabled   
ports  disabled 

System Power

In certain situations, you might need to power off the switch instead of rebooting. To power off the switch, you can run the Linux poweroff command.

cumulus@switch:~$ sudo poweroff

When you run the Linux poweroff command on the SN2201, SN2010, SN2100, SN2100B, SN3420, SN3700, SN3700C, SN4410, SN4600C, SN4600, or SN4700 switch, the switch reboots instead of powering off. To power off the switch, run the cl-poweroff command instead. The cl-poweroff command performs a hard abrupt power down instead of a graceful power down.

cumulus@switch:~$ sudo cl-poweroff

Layer 1 and Switch Ports

This section discusses the following layer 1 and switch port configuration:

Interface Configuration and Management

Cumulus Linux uses ifupdown2 to manage network interfaces, which is a new implementation of the Debian network interface manager ifupdown.

Bring an Interface Up or Down

An interface status can be in an:

To configure and bring an interface up administratively, use the nv set interface command:

cumulus@switch:~$ nv set interface swp1
cumulus@switch:~$ nv config apply

After you bring up an interface, you can bring it down administratively by changing the link state to down:

cumulus@switch:~$ nv set interface swp1 link state down
cumulus@switch:~$ nv config apply

To bring the interface back up, change the link state back to up:

cumulus@switch:~$ nv set interface swp1 link state up
cumulus@switch:~$ nv config apply

To remove an interface from the configuration entirely, use the nv unset interface command:

cumulus@switch:~$ nv unset interface swp1
cumulus@switch:~$ nv config apply

To configure and bring an interface up administratively, edit the /etc/network/interfaces file to add the interface stanza, then run the ifreload -a command:

cumulus@switch:~$ sudo nano /etc/network/interfaces
auto lo
iface lo inet loopback
    address 10.10.10.1/32
auto mgmt
iface mgmt
    address 127.0.0.1/8
    address ::1/128
    vrf-table auto
auto eth0
iface eth0 inet dhcp
    ip-forward off
    ip6-forward off
    vrf mgmt
auto swp1
iface swp1
...

To bring an interface down administratively after you configure it, add link-down yes to the interface stanza in the /etc/network/interfaces file, then run ifreload -a:

auto swp1
iface swp1
 link-down yes

If you configure an interface in the /etc/network/interfaces file, you can bring it down administratively with the ifdown swp1 command, then bring the interface back up with the ifup swp1 command. These changes do not persist after a reboot. After a reboot, the configuration present in /etc/network/interfaces takes effect.

-By default, the ifupdown and ifup command is quiet. Use the verbose option (-v) to show commands as they execute when you bring an interface down or up.

To remove an interface from the configuration entirely, remove the interface stanza from the /etc/network/interfaces file, then run the ifreload -a command.

For additional information on interface administrative state and physical state, refer to this knowledge base article.

Loopback Interface

Cumulus Linux has a preconfigured loopback interface. When the switch boots up, the loopback interface called lo is up and assigned an IP address of 127.0.0.1.

The loopback interface lo must always exist on the switch and must always be up.

To configure an IP address for the loopback interface:

cumulus@switch:~$ nv set interface lo ip address 10.10.10.1
cumulus@switch:~$ nv config apply

Edit the /etc/network/interfaces file to add an address line:

auto lo
iface lo inet loopback
    address 10.10.10.1

  • If the IP address has no subnet mask, it automatically becomes a /32 IP address. For example, 10.10.10.1 is 10.10.10.1/32.
  • You can configure multiple IP addresses for the loopback interface.

Subinterfaces

On Linux, an interface is a network device that can be either physical, (for example, swp1) or virtual (for example, vlan100). A VLAN subinterface is a VLAN device on an interface, and the VLAN ID appends to the parent interface using dot (.) VLAN notation. For example, a VLAN with ID 100 that is a subinterface of swp1 is swp1.100. The dot VLAN notation for a VLAN device name is a standard way to specify a VLAN device on Linux.

A VLAN subinterface only receives traffic tagged for that VLAN; therefore, swp1.100 only receives packets that have a VLAN 100 tag on switch port swp1. Any packets that transmit from swp1.100 have a VLAN 100 tag.

The following example configures a routed subinterface on swp1 in VLAN 100:

cumulus@switch:~$ nv set interface swp1.100 ip address 192.168.100.1/24
cumulus@switch:~$ nv config apply

Edit the /etc/network/interfaces file, then run ifreload -a:

cumulus@switch:~$ sudo nano /etc/network/interfaces
...
auto swp1.100
iface swp1.100
 address 192.168.100.1/24
cumulus@switch:~$ sudo ifreload -a

  • If you are using a VLAN subinterface, do not add that VLAN under the bridge stanza.
  • You cannot use NVUE commands to create a routed subinterface for VLAN 1.

Interface IP Addresses

You can specify both IPv4 and IPv6 addresses for the same interface.

For IPv6 addresses:

The following example commands configure three IP addresses for swp1; two IPv4 addresses and one IPv6 address.

cumulus@switch:~$ nv set interface swp1 ip address 10.0.0.1/30
cumulus@switch:~$ nv set interface swp1 ip address 10.0.0.2/30
cumulus@switch:~$ nv set interface swp1 ip address 2001:DB8::1/126
cumulus@switch:~$ nv config apply

In the /etc/network/interfaces file, list all IP addresses under the iface section.

auto swp1
iface swp1
    address 10.0.0.1/30
    address 10.0.0.2/30
    address 2001:DB8::1/126

The address method and address family are not mandatory; they default to inet/inet6 and static. However, you must specify inet/inet6 when you are creating DHCP or loopback interfaces.

auto lo
iface lo inet loopback

To make non-persistent changes to interfaces at runtime, use ip addr add:

cumulus@switch:~$ sudo ip addr add 10.0.0.1/30 dev swp1
cumulus@switch:~$ sudo ip addr add 2001:DB8::1/126 dev swp1

To remove an addresses from an interface, use ip addr del:

cumulus@switch:~$ sudo ip addr del 10.0.0.1/30 dev swp1
cumulus@switch:~$ sudo ip addr del 2001:DB8::1/126 dev swp1

Interface Descriptions

You can add a description (alias) to an interface.

Interface descriptions also appear in the SNMP OID IF-MIB::ifAlias

  • Interface descriptions can have a maximum of 256 characters.
  • Avoid using apostrophes or non-ASCII characters. Cumulus Linux does not parse these characters.

The following example commands create the description hypervisor_port_1 for swp1:

cumulus@switch:~$ nv set interface swp1 description hypervisor_port_1
cumulus@switch:~$ nv config apply

In the /etc/network/interfaces file, add a description using the alias keyword:

cumulus@switch:~# sudo nano /etc/network/interfaces

auto swp1
iface swp1
    alias swp1 hypervisor_port_1

Interface Commands

You can specify user commands for an interface that run at pre-up, up, post-up, pre-down, down, and post-down.

You can add any valid command in the sequence to bring an interface up or down; however, limit the scope to network-related commands associated with the particular interface. For example, it does not make sense to install a Debian package on ifup of swp1, even though it is technically possible. See man interfaces for more details.

The following examples adds a command to an interface to enable proxy ARP:

The NVUE command is not supported.
cumulus@switch:~# sudo nano /etc/network/interfaces
auto swp1
iface swp1
    address 12.0.0.1/30
    post-up echo 1 > /proc/sys/net/ipv4/conf/swp1/proxy_arp

If your post-up command also starts, restarts, or reloads any systemd service, you must use the --no-block option with systemctl. Otherwise, that service or even the switch itself might hang after starting or restarting. For example, to restart the dhcrelay service after bringing up a VLAN, the /etc network/interfaces configuration looks like this:

auto bridge.100
iface bridge.100
    post-up systemctl --no-block restart dhcrelay.service

Source Interface File Snippets

Sourcing interface files helps organize and manage the /etc/network/interfaces file. For example:

cumulus@switch:~$ sudo cat /etc/network/interfaces
# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet dhcp

source /etc/network/interfaces.d/bond0

The contents of the sourced file used above are:

cumulus@switch:~$ sudo cat /etc/network/interfaces.d/bond0
auto bond0
iface bond0
    address 14.0.0.9/30
    address 2001:ded:beef:2::1/64
    bond-slaves swp25 swp26

Port Ranges

To specify port ranges in commands:

Use commas to separate different port ranges (for example, swp1-46,10-12):

cumulus@switch:~$ nv set interface swp1-4,6,10-12 bridge domain br_default
cumulus@switch:~$ nv config apply

Use the glob keyword to specify bridge ports and bond slaves:

auto br0
iface br0
    bridge-ports glob swp1-6.100

auto br1
iface br1
    bridge-ports glob swp7-9.100  swp11.100 glob swp15-18.100

Fast Linkup

Cumulus Linux supports fast linkup on interfaces on NVIDIA Spectrum1 switches. Fast linkup enables you to bring up ports with cards that require links to come up fast, such as certain 100G optical network interface cards.

You must configure both sides of the connection with the same speed and FEC settings.

cumulus@switch:~$ nv set interface swp1 link fast-linkup on
cumulus@switch:~$ nv config apply

Edit the /etc/cumulus/switchd.conf file and add the interface.<interface>.enable_media_depended_linkup_flow=TRUE and interface.<interface>.enable_port_short_tuning=TRUE settings for the interfaces on which you want to enable fast linkup. The following example enables fast linkup on swp1:

cumulus@switch:~$ sudo nano /etc/cumulus/switchd.conf
...
interface.swp1.enable_media_depended_linkup_flow=TRUE
interface.swp1.enable_short_tuning=TRUE

Reload switchd with the sudo systemctl reload switchd.service command.

Cumulus Linux enables link flap detection by default. Link flap detection triggers when there are five link flaps within ten seconds, at which point the interface goes into a protodown state and shows linkflap as the reason. The switchd service also shows a log message similar to the following:

2023-02-10T17:53:21.264621+00:00 cumulus switchd[10109]: sync_port.c:2263 ERR swp2 link flapped more than 3 times in the last 60 seconds, setting protodown

To show interfaces with the protodown flag, run the Linux ip link command:

cumulus@switch:~$ ip link
...
37: swp2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 9178 qdisc pfifo_fast master bond131 state DOWN mode DEFAULT group default qlen 1000
  link/ether 1c:34:da:ba:bb:2a brd ff:ff:ff:ff:ff:ff protodown on protodown_reason <linkflap>
...

Clear the Interface Protodown State and Reason

The ifdown and ifup commands do not clear the protodown state. You must clear the protodown state and the reason manually using the sudo ip link set <interface> protodown_reason linkflap off and sudo ip link set <interface> protodown off commands.

cumulus@switch:~$ sudo ip link set swp2 protodown_reason linkflap off
cumulus@switch:~$ sudo ip link set swp2 protodown off

After a few seconds the port state returns to UP. Run the ip link show <interface> command to verify that the interface is no longer in a protodown state and that the reason clears:

cumulus@switch:~$ ip link show swp2
37: swp2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 9178 qdisc pfifo_fast master bond131 state UP mode DEFAULT group default qlen 1000
  link/ether 1c:34:da:ba:bb:2a brd ff:ff:ff:ff:ff:ff

You can change link flap protection settings in the /etc/cumulus/switchd.conf file:

After you change the link flap settings, you must restart switchd with the sudo systemctl restart switchd.service command.

Mako Templates

ifupdown2 supports Mako-style templates. The Mako template engine processes the interfaces file before parsing.

Use the template to declare cookie-cutter bridges and to declare addresses in the interfaces file:

%for i in [1,12]:
auto swp${i}
iface swp${i}
    address 10.20.${i}.3/24

  • In Mako syntax, use square brackets ([1,12]) to specify a list of individual numbers. Use range(1,12) to specify a range of interfaces.
  • To test your template and confirm it evaluates correctly, run mako-render /etc/network/interfaces.

To comment out content in Mako templates, use double hash marks (##). For example:

## % for i in range(1, 4):
## auto swp${i}
## iface swp${i}
## % endfor
##

For more Mako template examples, refer to this knowledge base article.

ifupdown Scripts

Unlike the traditional ifupdown system, ifupdown2 does not run scripts installed in /etc/network/*/ automatically to configure network interfaces.

To enable or disable ifupdown2 scripting, edit the addon_scripts_support line in the /etc/network/ifupdown2/ifupdown2.conf file. 1 enables scripting and 2 disables scripting. For example:

cumulus@switch:~$ sudo nano /etc/network/ifupdown2/ifupdown2.conf
# Support executing of ifupdown style scripts.
# Note that by default python addon modules override scripts with the same name
addon_scripts_support=1

ifupdown2 sets the following environment variables when executing commands:

Troubleshooting

To see the link and administrative state of an interface:

cumulus@switch:~$ nv show interface swp1 link state

In the following example, swp1 is administratively UP and the physical link is UP (LOWER_UP).

cumulus@switch:~$ ip link show dev swp1
3: swp1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 500
    link/ether 44:38:39:00:03:c1 brd ff:ff:ff:ff:ff:ff

To show the assigned IP address on an interface:

cumulus@switch:~$ nv show interface swp1 ip address
cumulus@switch:~$ ip addr show swp1
3: swp1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 500
    link/ether 44:38:39:00:03:c1 brd ff:ff:ff:ff:ff:ff
    inet 192.0.2.1/30 scope global swp1
    inet 192.0.2.2/30 scope global swp1
    inet6 2001:DB8::1/126 scope global tentative
        valid_lft forever preferred_lft forever

To show the description (alias) for an interface:

cumulus@switch$ nv show interface swp1
cumulus@switch$ ip link show swp1
3: swp1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT qlen 500
    link/ether aa:aa:aa:aa:aa:bc brd ff:ff:ff:ff:ff:ff
    alias hypervisor_port_1

Considerations

Even though ifupdown2 supports the inclusion of multiple iface stanzas for the same interface, use a single iface stanza for each interface. If you must specify more than one iface stanza; for example, if the configuration for a single interface comes from many places, like a template or a sourced file, make sure the stanzas do not specify the same interface attributes. Otherwise, you see unexpected behavior.

In the following example, swp1 is in two files: /etc/network/interfaces and /etc/network/interfaces.d/speed_settings. ifupdown2 parses this configuration because the same attributes are not in multiple iface stanzas.

cumulus@switch:~$ sudo cat /etc/network/interfaces

source /etc/network/interfaces.d/speed_settings

auto swp1
iface swp1
  address 10.0.14.2/24

cumulus@switch:~$ cat /etc/network/interfaces.d/speed_settings

auto swp1
iface swp1
  link-speed 1000
  link-duplex full

ifupdown2 and sysctl

For sysctl commands in the pre-up, up, post-up, pre-down, down, and post-down lines that use the $IFACE variable, if the interface name contains a dot (.), ifupdown2 does not change the name to work with sysctl. For example, the interface name bridge.1 does not convert to bridge/1.

ifupdown2 and the gateway Parameter

The default route that the gateway parameter creates in ifupdown2 does not install in FRR, therefore does not redistribute into other routing protocols. Define a static default route instead, which installs in FRR and redistributes, if needed.

The following shows an example of the /etc/network/interfaces file when you use a static route instead of a gateway parameter:

auto swp2
iface swp2
address 172.16.3.3/24
up ip route add default via 172.16.3.2

Interface Name Limitations

Interface names can be a maximum of 15 characters. You cannot use a number for the first character and you cannot include a dash (-) in the name. In addition, you cannot use any name that matches with the regular expression .{0,13}\-v.*.

If you encounter issues, remove the interface name from the /etc/network/interfaces file, then restart the networking.service.

cumulus@switch:~$ sudo nano /etc/network/interfaces
cumulus@switch:~$ sudo systemctl restart networking.service

IP Address Scope

ifupdown2 does not honor the configured IP address scope setting in the /etc/network/interfaces file and treats all addresses as global. It does not report an error. Consider this example configuration:

auto swp2
iface swp2
    address 35.21.30.5/30
    address 3101:21:20::31/80
    scope link

When you run ifreload -a on this configuration, ifupdown2 considers all IP addresses as global.

cumulus@switch:~$ ip addr show swp2
5: swp2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 74:e6:e2:f5:62:82 brd ff:ff:ff:ff:ff:ff
inet 35.21.30.5/30 scope global swp2
valid_lft forever preferred_lft forever
inet6 3101:21:20::31/80 scope global
valid_lft forever preferred_lft forever
inet6 fe80::76e6:e2ff:fef5:6282/64 scope link
valid_lft forever preferred_lft forever

To work around this issue, configure the IP address scope:

The NVUE command is not supported.

In the /etc/network/interfaces file, configure the IP address scope using post-up ip address add <address> dev <interface> scope <scope>. For example:

auto swp6
iface swp6
    post-up ip address add 71.21.21.20/32 dev swp6 scope site

Then run the ifreload -a command on this configuration.

The following configuration shows the correct scope:

cumulus@switch:~$ ip addr show swp6
9: swp6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 74:e6:e2:f5:62:86 brd ff:ff:ff:ff:ff:ff
inet 71.21.21.20/32 scope site swp6
valid_lft forever preferred_lft forever
inet6 fe80::76e6:e2ff:fef5:6286/64 scope link
valid_lft forever preferred_lft forever

Switch Port Attributes

Cumulus Linux exposes network interfaces for several types of physical and logical devices:

Each physical network interface (port) has several settings:

For NVIDIA Spectrum ASICs, the firmware configures FEC, link speed, duplex mode and auto-negotiation automatically, following a predefined list of parameter settings until the link comes up. You can disable FEC if necessary, which forces the firmware to not try any FEC options.

MTU

Interface MTU applies to traffic traversing the management port, front panel or switch ports, bridge, VLAN subinterfaces, and bonds (both physical and logical interfaces). MTU is the only interface setting that you must set manually.

In Cumulus Linux, ifupdown2 assigns 9216 as the default MTU setting. The initial MTU value set by the driver is 9238. After you configure the interface, the default MTU setting is 9216.

To change the MTU setting, run the following commands. The example command sets the MTU to 1500 for the swp1 interface.

cumulus@switch:~$ nv set interface swp1 link mtu 1500
cumulus@switch:~$ nv config apply

Edit the /etc/network/interfaces file, then run the ifreload -a command.

cumulus@switch:~$ sudo nano /etc/network/interfaces

auto swp1
iface swp1
    mtu 1500
cumulus@switch:~$ sudo ifreload -a

Runtime Configuration (Advanced)

Run the ip link set command. The following example command sets the swp1 interface MTU to 1500.

cumulus@switch:~$ sudo ip link set dev swp1 mtu 1500

A runtime configuration is non-persistent; the configuration you create does not persist after you reboot the switch.

Set a Global Policy

To set a global MTU policy, create a policy document (called mtu.json). For example:

cumulus@switch:~$ sudo cat /etc/network/ifupdown2/policy.d/mtu.json
{
  "address": {"defaults": { "mtu": "9216" }
            }
}

The policies and attributes in any file in /etc/network/ifupdown2/policy.d/ override the default policies and attributes in /var/lib/ifupdown2/policy.d/.

Bridge MTU

The MTU setting is the lowest MTU of any interface that is a member of the bridge (every interface specified in bridge-ports in the bridge configuration of the /etc/network/interfaces file). You are not required to specify an MTU on the bridge. Consider this bridge configuration:

auto bridge
iface bridge
    bridge-ports bond1 bond2 bond3 bond4 peer5
    bridge-vids 100-110
    bridge-vlan-aware yes

For a bridge to have an MTU of 9000, set the MTU for each of the member interfaces (bond1 to bond 4, and peer5) to 9000 at minimum.

When configuring MTU for a bond, configure the MTU value directly under the bond interface; the member links or slave interfaces inherit the configured value. If you need a different MTU on the bond, set it on the bond interface, as this ensures the slave interfaces pick it up. You do not have to specify an MTU on the slave interfaces.

VLAN interfaces inherit their MTU settings from their physical devices or their lower interface; for example, swp1.100 inherits its MTU setting from swp1. Therefore, specifying an MTU on swp1 ensures that swp1.100 inherits the MTU setting for swp1.

If you are working with VXLANs, the MTU for a virtual network interface (VNI must be 50 bytes smaller than the MTU of the physical interfaces on the switch, as various headers and other data require those 50 bytes. Also, consider setting the MTU much higher than 1500.

To show the MTU setting for an interface:

cumulus@switch:~$ nv show interface swp1
...
link                                                   
  auto-negotiate           off                on       
  duplex                   full               full     
  speed                    1G                 auto     
  fec                                         auto     
  mtu                      9216               9216
cumulus@switch:~$ ip link show dev swp1
3: swp1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9216 qdisc pfifo_fast state UP mode DEFAULT qlen 500
   link/ether 44:38:39:00:03:c1 brd ff:ff:ff:ff:ff:ff

Drop Packets that Exceed the Egress Layer 3 MTU

The switch forwards all packets that are within the MTU value set for the egress layer 3 interface. However, when packets are larger in size than the MTU value, the switch fragments the packets that do not have the DF bit set and drops the packets that do have the DF bit set.

Run the following command to drop all IP packets that are larger in size than the MTU value for the egress layer 3 interface instead of fragmenting packets:

cumulus@switch:~$ nv set system control-plane trap l3-mtu-err state off
cumulus@switch:~$ nv config apply
cumulus@switch:~$ echo "0 >" /cumulus/switchd/config/trap/l3-mtu-err/enable

FEC

FEC is an encoding and decoding layer that enables the switch to detect and correct bit errors introduced over the cable between two interfaces. The target IEEE BER on high speed Ethernet links is 10-12. Because 25G transmission speeds can introduce a higher than acceptable BER on a link, FEC is often required to correct errors to achieve the target BER at 25G, 4x25G, 100G, and higher link speeds. The type and grade of a cable or module and the medium of transmission determine which FEC setting is necessary.

For the link to come up, the two interfaces on each end must use the same FEC setting.

FEC requires small latency overhead. For most applications, this small amount of latency is preferable to error packet retransmission latency.

The two FEC types are:

Cumulus Linux includes additional FEC options:

While Auto FEC is the default setting on the NVIDIA Spectrum switch, do not explicitly configure the fec auto option on the switch as this leads to a link flap whenever you run net commit or ifreload -a.

For 25G DAC, 4x25G Breakouts DAC and 100G DAC cables, the IEEE 802.3by specification creates 3 classes:

The IEEE classification specifies various dB loss measurements and minimum achievable cable length. You can build longer and shorter cables if they comply to the dB loss and BER requirements.

If a cable has a CA-25G-S classification and FEC is not on, the BER might be unacceptable in a production network. It is important to set the FEC according to the cable class (or better) to have acceptable bit error rates. See Determining Cable Class below.

You can check bit errors using cl-netstat (RX_ERR column) or ethtool -S (HwIfInErrors counter) after a large amount of traffic passes through the link. A non-zero value indicates bit errors. Expect error packets to be zero or extremely low compared to good packets. If a cable has an unacceptable rate of errors with FEC enabled, replace the cable.

For 25G, 4x25G Breakout, and 100G Fiber modules and AOCs, there is no classification of 25G cable types for dB loss, BER or length. Use FEC if the BER is low enough.

Cable Class of 100G and 25G DACs

You can determine the cable class for 100G and 25G DACs from the Extended Specification Compliance Code field (SFP28: 0Ah, byte 35, QSFP28: Page 0, byte 192) in the cable EEPROM programming.

For 100G DACs, most manufacturers use the 0x0Bh 100GBASE-CR4 or 25GBASE-CR CA-L value (the 100G DAC specification predates the IEEE 802.3by 25G DAC specification). Use RS FEC for 100G DAC; shorter or better cables might not need this setting.

A manufacturer’s EEPROM setting might not match the dB loss on a cable or the actual bit error rates that a particular cable introduces. Use the designation as a guide, but set FEC according to the bit error rate tolerance in the design criteria for the network. For most applications, the highest mutual FEC ability of both end devices is the best choice.

You can determine for which grade the manufacturer has designated the cable as follows.

For the SFP28 DAC, run the following command:

cumulus@switch:~$ sudo ethtool -m swp1 hex on | grep 0020 | awk '{ print $6}'
0c

The values at location 0x0024 are:

For the QSFP28 DAC, run the following command:

cumulus@switch:~$ sudo ethtool -m swp1s0 hex on | grep 00c0 | awk '{print $2}'
0b

The values at 0x00c0 are:

In each example below, the Compliance field comes from the method described above; the ethool -m output does not show it.

3meter cable that does not require FEC
(CA-N)
Cost: More expensive
Cable size: 26AWG (Note that AWG does not necessarily correspond to overall dB loss or BER performance)
Compliance Code: 25GBASE-CR CA-N

3meter cable that requires Base-R FEC
(CA-S)
Cost: Less expensive
Cable size: 26AWG
Compliance Code: 25GBASE-CR CA-S

When in doubt, consult the manufacturer directly to determine the cable classification.

Spectrum ASIC FEC Behavior

The firmware in a Spectrum ASIC applies FEC configuration to 25G and 100G cables based on the cable type and whether the peer switch also has a Spectrum ASIC.

When the link is between two switches with Spectrum ASICs:

Cable Type
FEC Mode
25G optical cablesBase-R/FC-FEC
25G 1,2 meters: CA-N, loss <13dbBase-R/FC-FEC
25G 2.5,3 meters: CA-S, loss <16dbBase-R/FC-FEC
25G 2.5,3,4,5 meters: CA-L, loss > 16dbRS-FEC
100G DAC or opticalRS-FEC

When linking to a non-Spectrum peer, the firmware lets the peer decide. The Spectrum ASIC supports RS-FEC (for both 100G and 25G), Base-R/FC-FEC (25G only), or no-FEC (for both 100G and 25G).

Cable Type
FEC Mode
25G optical cablesLet peer decide
25G 1,2 meters: CA-N, loss <13dbLet peer decide
25G 2.5,3 meters: CA-S, loss <16dbLet peer decide
25G 2.5,3,4,5 meters: CA-L, loss > 16dbLet peer decide
100GLet peer decide: RS-FEC or No FEC

How Does Cumulus Linux use FEC?

A Spectrum switch enables FEC automatically when it powers up. The port firmware tests and determines the correct FEC mode to bring the link up with the neighbor. It is possible to get a link up to a switch without enabling FEC on the remote device as the switch eventually finds a working combination to the neighbor without FEC.

The following sections describe how to show the current FEC mode, and how to enable and disable FEC.

Show the Current FEC Mode

To show the FEC mode on a switch port, run the NVUE nv show interface <interface> link command.

cumulus@switch:~$ nv show interface swp1 link
                  operational   applied  pending  description
----------------  ------------  -------  -------  ----------------------------------------------------------------------
auto-negotiate    off           on       on       Link speed and characteristic auto negotiation
breakout                        1x       1x       sub-divide or disable ports (only valid on plug interfaces)
duplex            full          full     full     Link duplex
fec                             auto     auto     Link forward error correction mechanism
...

Enable or Disable FEC

To enable Reed Solomon (RS) FEC on a link:

cumulus@switch:~$ nv set interface swp1 link fec rs
cumulus@switch:~$ nv config apply

Edit the /etc/network/interfaces file, then run the ifreload -a command. The following example enables RS FEC for the swp1 interface (link-fec rs):

cumulus@switch:~$ sudo nano /etc/network/interfaces

auto swp1
iface swp1
    link-autoneg off
    link-speed 100000
    link-fec rs
cumulus@switch:~$ sudo ifreload -a

Runtime Configuration (Advanced)

Run the ethtool --set-fec <interface> encoding RS command. For example:

cumulus@switch:~$ sudo ethtool --set-fec swp1 encoding RS

A runtime configuration is non-persistent. The configuration you create does not persist after you reboot the switch.

To enable Base-R/FireCode FEC on a link:

cumulus@switch:~$ nv set interface swp1 link fec baser
cumulus@switch:~$ nv config apply

Edit the /etc/network/interfaces file, then run the ifreload -a command. The following example enables Base-R FEC for the swp1 interface (link-fec baser):

cumulus@switch:~$ sudo nano /etc/network/interfaces

auto swp1
iface swp1
    link-autoneg off
    link-speed 100000
    link-fec baser
cumulus@switch:~$ sudo ifreload -a

Runtime Configuration (Advanced)

Run the ethtool --set-fec <interface> encoding baser command. For example:

cumulus@switch:~$ sudo ethtool --set-fec swp1 encoding BaseR

A runtime configuration is non-persistent. The configuration you create does not persist after you reboot the switch.

To enable FEC with Auto-negotiation:

You can use FEC with auto-negotiation on DACs only.

cumulus@switch:~$ nv set interface swp1 link auto-negotiate on
cumulus@switch:~$ nv config apply

Edit the /etc/network/interfaces file to set auto-negotiation to on, then run the ifreload -a command:

cumulus@switch:~$ sudo nano /etc/network/interfaces

auto swp1
iface swp1
link-autoneg on
cumulus@switch:~$ sudo ifreload -a

Runtime Configuration (Advanced)

You can use ethtool to enable FEC with auto-negotiation. For example:

ethtool -s swp1 speed 10000 duplex full autoneg on

A runtime configuration is non-persistent. The configuration you create does not persist after you reboot the switch.

To show the FEC and auto-negotiation settings for an interface, either run the NVUE nv show interface <interface> link command or the Linux sudo ethtool swp1 | egrep 'FEC|auto' command:

cumulus@switch:~$ sudo ethtool swp1 | egrep 'FEC|auto'
Supports auto-negotiation: Yes
Supported FEC modes: RS
Advertised auto-negotiation: Yes
Advertised FEC modes: RS
Link partner advertised auto-negotiation: Yes
Link partner advertised FEC modes: Not reported

To disable FEC on a link:

cumulus@switch:~$ nv set interface swp1 link fec off
cumulus@switch:~$ nv config apply

To configure FEC to the default value, run the nv unset interface swp1 link fec command.

Edit the /etc/network/interfaces file, then run the ifreload -a command. The following example disables Base-R FEC for the swp1 interface (link-fec baser):

cumulus@switch:~$ sudo nano /etc/network/interfaces

auto swp1
iface swp1
link-fec off
cumulus@switch:~$ sudo ifreload -a

Runtime Configuration (Advanced)

Run the ethtool --set-fec <interface> encoding off command. For example:

cumulus@switch:~$ sudo ethtool --set-fec swp1 encoding off

A runtime configuration is non-persistent. The configuration you create does not persist after you reboot the switch.

DR1 and DR4 Modules

100GBASE-DR1 modules, such as NVIDIA MMS1V70-CM, include internal RS FEC processing, which the software does not control. When using these optics, you must either set the FEC setting to off or leave it unset for the link to function.

400GBASE-DR4 modules, such as NVIDIA MMS1V00-WM, require RS FEC. The switch automatically enables FEC if it is set to off.

You typically use these optics to interconnect 4x SN2700 uplinks to a single SN4700 breakout downlink. The following configuration shows an explicit FEC example. You can leave the FEC settings unset for autodetection.

SN4700 (400GBASE-DR4 in swp1):

cumulus@SN4700:mgmt:~$ nv set interface swp1 link breakout 4x lanes-per-port 2
cumulus@SN4700:mgmt:~$ nv set interface swp1s0 link fec rs
cumulus@SN4700:mgmt:~$ nv set interface swp1s0 link speed 100G
cumulus@SN4700:mgmt:~$ nv set interface swp1s1 link fec rs
cumulus@SN4700:mgmt:~$ nv set interface swp1s1 link speed 100G
cumulus@SN4700:mgmt:~$ nv set interface swp1s2 link fec rs
cumulus@SN4700:mgmt:~$ nv set interface swp1s2 link speed 100G
cumulus@SN4700:mgmt:~$ nv set interface swp1s3 link fec rs
cumulus@SN4700:mgmt:~$ nv set interface swp1s3 link speed 100G
cumulus@SN4700:mgmt:~$ nv config apply

SN2700 (100GBASE-DR1 in swp11-14):

cumulus@SN2700:mgmt:~$ nv set interface swp11 link fec off
cumulus@SN2700:mgmt:~$ nv set interface swp11 link speed 100G
cumulus@SN2700:mgmt:~$ nv set interface swp12 link fec off
cumulus@SN2700:mgmt:~$ nv set interface swp12 link speed 100G
cumulus@SN2700:mgmt:~$ nv set interface swp13 link fec off
cumulus@SN2700:mgmt:~$ nv set interface swp13 link speed 100G
cumulus@SN2700:mgmt:~$ nv set interface swp14 link fec off
cumulus@SN2700:mgmt:~$ nv set interface swp14 link speed 100G
cumulus@SN4700:mgmt:~$ nv config apply

The FEC operational view of this configuration appears incorrect because FEC is operationally enabled only on the SN4700 400G breakout side. This is because the 100G DR1 module side handles FEC internally, which is not visible to Cumulus Linux.

cumulus@SN2700:mgmt:~$ nv show int swp11 link
                       operational        applied
---------------------  -----------------  -------
auto-negotiate         on                 on     
duplex                 full               full   
speed                  100G               auto   
fec                    off                off   
mtu                    9216               9216   
fast-linkup            off                       
[breakout]                                       
state                  up                 up     
...
cumulus@SN4700:mgmt:~$ nv show int swp1s1 link
                       operational        applied
---------------------  -----------------  -------
auto-negotiate         on                 on     
duplex                 full               full   
speed                  100G               auto   
fec                    rs                 off    
mtu                    9216               9216   
fast-linkup            off                       
[breakout]                                       
state                  up                 up     
...

Default Policies for Interface Settings

Instead of configuring settings for each individual interface, you can specify a policy for all interfaces on a switch or tailor custom settings for each interface. Create a file in /etc/network/ifupdown2/policy.d/ and populate the settings accordingly. The following example shows a file called address.json.

cumulus@switch:~$ cat /etc/network/ifupdown2/policy.d/address.json
{
    "ethtool": {
        "defaults": {
            "link-duplex": "full"
        },
        "iface_defaults": {
            "swp1": {
                "link-autoneg": "on",
                "link-speed": "1000"
          },
            "swp16": {
                "link-autoneg": "off",
                "link-speed": "10000"
            },
            "swp50": {
                "link-autoneg": "off",
                "link-speed": "100000",
                "link-fec": "rs"
            }
        }
    },
    "address": {
        "defaults": { "mtu": "9000" },
        "iface_defaults": {
            "eth0": {"mtu": "1500"}
        }
    }
}

Setting the default MTU also applies to the management interface. Be sure to add the iface_defaults to override the MTU for eth0, to remain at 9216.

Breakout Ports

Cumulus Linux supports the following ports breakout options:

18x SFP28 25G and 4x QSFP28 100G interfaces only support NRZ encoding. You can set all speeds down to 1G.

All 4x QSFP28 ports can break out into 4x SFP28 or 2x QSFP28.

  • 18x 1G - 18x SFP28 set to 1G
  • 16x 1G - 4x QSFP28 configured as 4x breakouts and set to 1G

Max 1G ports: 34

  • 18x 10G - 18x SFP28 set to 10G
  • 16x 10G - 4x QSFP28 configured as 4x breakouts and set to 10G

Maximum 10G ports: 34

  • 18x 25G - 18x SFP28 (native speed)
  • 16x 25G - 4x QSFP28 breakouts to 4x and set to 25G

Maximum 25G ports: 34

4x 40G - 4x QSFP28 set to 40G

Maximum 40G ports: 4

8x 50G - 4x QSFP28 break out into 2x and set to 50G

Maximum 50G ports: 8

4x 100G - 4x QSFP28 (native speed)

Maximum 100G ports: 4

16x QSFP28 100G interfaces only support NRZ encoding. You can set all speeds down to 1G.

All QSFP28 ports can break out into 4x SFP28 or 2x QSFP28.

64x 1G - 16x QSFP28 break out into 4x and set to 1G

Max 1G ports: 64

64x 10G - 16x QSFP28 break out into 4x and set to 10G

Maximum 10G ports: 64

64x 25G - 16x QSFP28 break out into 4x and set to 25G

Maximum 25G ports: 64

16x 40G - 4x QSFP28 set to 40G

Maximum 40G ports: 16

32x 50G - 16x QSFP28 break out into 2x and set to 50G

Maximum 50G ports: 32

16x 100G - 16x QSFP28 (native speed)

Maximum 100G ports: 16

48x 1GBase-T ports (RJ45 up to 100m CAT5E/6) and 4x QSFP28 100G interfaces (only support NRZ encoding). You can set all speeds down to 1G.

All 4x QSFP28 ports can break out into 4x SFP28 or 2x QSFP28.

  • 48x 1GBase-T - 48x Base-T set to 1G. You can set them to also to 10/100Mb.
  • 16x 1G - 4x QSFP28 configured as 4x breakouts and set to 1G

Maximum 10/100MBase-T ports: 48 Maximum 1GBase-T ports: 48 Maximum 1G ports: 16

  • 16x 10G - 4x QSFP28 configured as 4x breakouts and set to 10G

Maximum 10G ports: 16

  • 16x 25G - 4x QSFP28 breakouts to 4x and set to 25G

Maximum 25G ports: 16

4x 40G - 4x QSFP28 set to 40G

Maximum 40G ports: 4

8x 50G - 4x QSFP28 break out into 2x

Maximum 50G ports: 8

4x 100G - 4x QSFP28 (native speed)

Maximum 100G ports: 4

48x SFP28 25G and 8x QSFP28 100G interfaces only support NRZ encoding. You can set all speeds down to 1G.

The top 4x QSFP28 ports can break out into 4x SFP28. You cannot use the lower 4x QSFP28 disabled ports.

All 8x QSFP28 ports can break out into 2x QSFP28 without disabling ports.

  • 48x 1G - 48x SFP28 set to 10G
  • 16x 1G - 4x QSFP28 break out into 4x and set to 1G

Max 1G ports: 64

  • 48x 10G - 48x SFP28 set to 10G
  • 16x 10G - 4x QSFP28 break out into 4x and set to 10G

Maximum 10G ports: 64

  • 48x 25G - 48x SFP28 (native speed)
  • 16x 25G - Top 4x QSFP28 break out into 4x (bottom 4x QSFP28 disabled)

Maximum 25G ports: 64

8x 40G - 8x QSFP28 set to 40G

Maximum 40G ports: 8

16x 50G - 8x QSFP28 break out into 2x

Maximum 50G ports: 16

8x 100G - 8x QSFP28 (native speed)

Maximum 100G ports: 8

32x QSFP28 100G interfaces only support NRZ encoding. You can set all speeds down to 1G.

The top 16x QSFP28 ports can break out into 4x SFP28. You cannot use the lower 4x QSFP28 disabled ports.

All 32x QSFP28 ports can break out into 2x QSFP28 without disabling ports.

64x 1G - Top 16x QSFP28 break out into 4x and set to 1G (bottom 16XQSFP28 disabled)

Max 1G ports: 64

64x 10G - Top 16x QSFP28 break out into 4x and set to 10G (bottom 16x QSFP28 disabled)

Maximum 10G ports: 64

64x 25G - Top 16x QSFP28 break out into 4x (bottom 16x QSFP28 disabled)

Maximum 25G ports: 64

32x 40G - 32x QSFP28 set to 40G

Maximum 40G ports: 32

64x 50G - 64x QSFP28 break out into 2x

Maximum 50G ports: 64

32x 100G - 32x QSFP28 (native speed)

Maximum 100G ports: 32

48x SFP28 25G and 12x QSFP28 100G interfaces only support NRZ encoding. You can set all speeds down to 1G.

All 12x QSFP28 ports can break out into 4x SFP28 or 2x QSFP28.

  • 48x 1G - 48XSFP28 set to 1G
  • 48x 1G - 12XQSFP28 break out into 4x and set to 1G

Max 1G ports: 96

  • 48x 10G - 48x SFP28 set to 10G
  • 48x 10G - 12x QSFP28 break out into 4x and set to 10G

Maximum 10G ports: 96

  • 48x 25G - 48x SFP28 (native speed)
  • 48x 25G - 12x QSFP28 break out into 4x

Maximum 25G ports: 96

12x 40G - 12x QSFP28 set to 40G

Maximum 40G ports: 12

24x 50G - 12x QSFP28 break out into 2x

Maximum 50G ports: 24

12x 100G - 12x QSFP28 (native speed)

Maximum 100G ports: 12

32x QSFP28 100G interfaces only support NRZ encoding. You can set all speeds down to 1G.

All 32x QSFP28 ports can break out into 4x SFP28 or 2x QSFP28.

128x1G - 32XQSFP28 break out into 4x and set to 1G

Max 1G ports: 128

128x 10G - 32x QSFP28 break out into 4x and set to 10G

Maximum 10G ports: 128

128x25G - 32x QSFP28 break out into 4x

Maximum 25G ports: 128

32x 40G - 32x QSFP28 set to 40G

Maximum 40G ports: 32

64x 50G - 32x QSFP28 break out into 2x

Maximum 50G ports: 64

32x 100G - 32x QSFP28 (native speed)

Maximum 100G ports: 32

32x QSFP56 200G interfaces support both PAM4 and NRZ encodings. You can set all speeds down to 1G.

For lower speed interface configurations, PAM4 is automatically converted to NRZ encoding.

All 32x QSFP56 ports can break out into 4xSFP56 or 2x QSFP56.

128x 1G - 32XQSFP56 break out into 4x and set to 1G

Max 1G ports: 128

128x 10G - 32x QSFP56 break out into 4x and set to 10G

Maximum 10G ports: 128

128x 25G - 32x QSFP56 break out into 4x and set to 25G

Maximum 25G ports: 128

32x 40G - 32x QSFP56 set to 40G

Maximum 40G ports: 32

128x 50G - 32x QSFP56 break out into 4x

Maximum 50G ports: 128

64x100G - 32x QSFP56 break out into 2x

Maximum 100G ports: 64

32x 200G - 32x QSFP56 (native speed)

Maximum 200G ports: 32

SN4410 24xQSFP28-DD interfaces [ports 1-24] support both PAM4 and NRZ encoding with all speeds from 200G down to 1G.

The 8xQSFP-DD (400GbE) interfaces [ports 25-32] support both PAM4 and NRZ encodings with all speeds from 400G down to 1G.

For lower speeds, PAM4 is automatically converted to NRZ encoding.

You can split ports #1 to #32 into:

  • 2x ports with PAM 4 and NRZ encoding with no limitations.
  • 4x ports with PAM 4 and NRZ encoding with no limitations.
  • 8x ports with PAM 4 and NRZ encoding but this forces blocking of an adjacent port (the total available number of MAC addresses is 128)
  • 96x 1G - 24XQSFP28-DD break out into 4x and set to 1G
  • 32x 1G - Top 4XQSFP-DD break out into 8x and set to 1G (bottom 4XQSFP-DD blocked*)

Max 1G ports: 128

  • 96x 10G - 24xQSFP28-DD break out into 4x and set to 10G
  • 32x 10G - 4 top QSFP-DD break out into 8x and set to 10G (bottom 4xQSFP-DD blocked*)

Maximum 10G ports: 128

*Other QSFP-DD breakout combinations are available up to maximum of 128x ports.

  • 96x 25G - 24xQSFP28-DD break out into 4x
  • 32x 25G - 4 top QSFP-DD break out into 8x and set to 25G (bottom 4xQSFP-DD blocked*)

Maximum 25G ports: 128

*Other QSFP-DD breakout combinations are available up to maximum of 128x ports.

  • 48x 40G - 24xQSFP28-DD breakout into 2x and set to 40G
  • 16x 40G – 8xQSFP-DD breakout into 2x and set to 40G

Maximum 40G ports: 64

  • 96x 50G - 24xQSFP28-DD/QSFP56 break out into 4x
  • 32x 50G - 8xQSFP-DD break out into 4x

Maximum 50G ports: 128

  • 96x 100G - 24xQSFP28-DD/QSFP56 break out into 4x
  • 32x 100G - 8xQSFP-DD break out into 4x

Maximum 100G ports: 128

  • 48x 200G - 24xQSFP28-DD/QSFP56 break out into 2x
  • 16x 200G - 8xQSFP-DD break out into 2x

Maximum 200G ports: 64

8x400G - 8xQSFP-DD (native speed)

Maximum 400G ports: 8

64x QSFP28 100G interfaces only support NRZ encoding. You can set all speeds down to 1G.

Only 32x QSFP28 ports can break out into 4x SFP28. You must disable the adjacent QSFP28 port. Only the first and third or second and forth rows can break out into 4xSFP28.

All 64x QSFP28 ports can break out into 2x QSFP28 without disabling ports.

128x 1G - 32XQSFP28 break out into 4x and set to 1G

Max 1G ports: 128

128x 10G - 32x QSFP28 break out into 4x and set to 10G

Maximum 10G ports: 128

128x 25G - 32x QSFP28 break out into 4x

Maximum 25G ports: 128

64x 40G - 64x QSFP28 set to 40G

Maximum 40G ports: 64

128x 50G - 64x QSFP28 break out into 2x

Maximum 50G ports: 128

64x 100G - 64x QSFP28 (native speed)

Maximum 100G ports: 64

SN4600 64xQSFP56 (200GbE) interfaces support both PAM4 and NRZ encodings with all speeds down to 1G.

For lower speeds, PAM4 is automatically converted to NRZ encoding.

Only 32xQSFP56 ports can break out into 4xSFP56 (4x50GbE). But, in this case, the adjacent QSFP56 port are blocked (only the first and third or second and fourth rows can break out into 4xSFP56).

All 64xQSFP56 ports can break out into 2xQSFP56 (2x100GbE) without blocking ports.

128x 1G - 32XQSFP56 break out into 4x and set to 1G

Max 1G ports: 128

128x10G - 64xQSFP56 break out into 4x and set to 10G

Maximum 10G ports: 128

128x25G - 64xQSFP56 break out into 4x and set to 25G

Maximum 25G ports: 128

64x40G - 64xQSFP56 set to 40G

Maximum 40G ports: 64

128x50G - 32xQSFP56 break out into 4x

Maximum 50G ports: 128

  • 128x 100G - 64xQSFP56 break out into 2x
  • 64x 100G - 64xQSFP28 set to 100G

Maximum 100G ports: 128

64x200G - 64xQSFP56 (native speed)

Maximum 200G ports: 64

SN4700 32x QSFP-DD 400GbE interfaces support both PAM4 and NRZ encodings. You can set all speeds down to 1G.

For lower speed interface configurations, PAM4 is automatically converted to NRZ encoding.

Only the top 16x QSFP-DD ports can break out into 8x SFP56. You must disable the adjacent QSFP-DD port.

All 32x QSFP-DD ports can break out into 2x QSFP56 at 2x200G or 4x QSFP56 at 4x 100G without disabling ports.

128x 1G - Top 16XQSFP-DD break out into 8x and set to 1G

Maximum 1G ports: 128

128x 10G - 16x QSFP-DD break out into 8x and set to 10G

Maximum 10G ports: 128

*Cumulus Linux supports other QSFP-DD breakout combinations up to maximum of 128x ports.

128x 25G - 16x QSFP-DD break out into 8x and set to 25G

Maximum 25G ports: 128

*Cumulus Linux supports other QSFP-DD breakout combinations up to maximum of 128x ports.

32x 40G - 32x QSFP-DD set to 40G

Maximum 40G ports: 32

128x 50G - 16x QSFP-DD break out into 8x

Maximum 50G ports: 128

*Cumulus Linux supports other QSFP-DD breakout combinations up to maximum of 128x ports.

128x 100G - 32x QSFP-DD break out into 4x

Maximum 100G ports: 128

64x 200G - 64x QSFP-DD break out into 2x

Maximum 200G ports: 64

32x 400G - 32x QSFP-DD (native speed)

Maximum 400G ports: 32

SN5600 64xOSFP (800GbE) interfaces support both PAM4 and NRZ encodings with all speeds down to 10G.

For lower speeds, PAM4 is automatically converted to NRZ encoding.

Bonus port #65 supports 1G, 10G, and 25G but does not support breakouts.

Maximum 1G ports: 1 (bonus port)

256x 10G

Maximum 10G ports: 257 (256 + 1 bonus port)

256x 25G

Maximum 25G ports: 257 (256 + 1 bonus port)

128x 40G

Maximum 40G ports: 128

256x 50G - 32x OSFP break out into 8x - You must disable the adjacent OSFP port.

Maximum 50G ports: 256

256x 100G - 32x OSFP break out into 8x - You must disable the adjacent OSFP port.

Maximum 100G ports: 256

256x 200G - 64x OSFP break out into 4x

Maximum 200G ports: 256

128x 400G - 64x OSFP break out into 2x

Maximum 400G ports: 128

64x 800G

Maximum 800G ports: 64

  • You can use a single SFP (10/25/50G) transceiver in a QSFP (100/200/400G) port with QSFP-to-SFP Adapter (QSA). Set the port speed to the SFP speed with the nv set interface <interface> link speed <speed> command. Do not configure this port as a breakout port.
  • If you break out a port, then reload the switchd service on a switch running in nonatomic ACL mode, temporary disruption to traffic occurs while the ACLs reinstall.
  • Cumulus Linux does not support port ganging.

Configure a Breakout Port

You can break out (split) a port using the following options:

If you split a 100G port into four interfaces and auto-negotiation is on (the default setting), Cumulus Linux advertises the speed for each interface up to the maximum speed possible for a 100G port (100/4=25G). You can overide this configuration and set specific speeds for the split ports if necessary.

  • Cumulus Linux 5.4 and later uses a new format for port splitting; instead of 1=100G or 1=4x10G, you specify 1=1x or 1=4x. The new format does not support specifying a speed for breakout ports in the /etc/cumulus/ports.conf file. To set a speed, either set the link-speed parameter for each split port in the /etc/network/interfaces file or run the NVUE nv set interface <interface> link speed <speed> command.

The following example breaks out a 100G port on swp1 into four interfaces. Cumulus Linux advertises the speed for each interface up to a maximum of 25G:

cumulus@switch:~$ nv set interface swp1 link breakout 4x
cumulus@switch:~$ nv set interface swp1s0-3 link state up
cumulus@switch:~$ nv config apply

The following example splits the port into four interfaces and forces the link speed to be 10G. Cumulus disables auto-negotiation when you force set the speed.

cumulus@switch:~$ nv set interface swp1 link breakout 4x
cumulus@switch:~$ nv set interface swp1s0-3 link state up
cumulus@switch:~$ nv set interface swp1s0-3 link speed 10G

Certain switches, such as the SN2700, SN4600, and SN4600c, require that you disable the subsequent even-numbered port when you configure a breakout port for 4x or 8x. NVUE automatically disables the subsequent even-numbered port on any switch with this requirement.

  1. To split a port into multiple interfaces, edit the /etc/cumulus/ports.conf file. The following example command breaks out swp1 into four interfaces.

    cumulus@switch:~$ sudo cat /etc/cumulus/ports.conf
    ...
    1=4x 
    2=disabled 
    3=1x 
    4=1x 
    ...
    

When you configure a breakout port to 4x or 8x on certain switches such as the SN2700, SN4600, and SN4600c, you must set the subsequent even-numbered port to disabled in the /etc/cumulus/ports.conf file. The SN3700, SN3700c, SN2201, SN2010, and SN2100 switch does not have this requirement.

  1. Reload switchd with the sudo systemctl reload switchd.service command. The reload does not interrupt network services.

    cumulus@switch:~$ sudo systemctl reload switchd.service
    
  2. To configure specific speeds for the split ports, edit the /etc/network/interfaces file, then run the ifreload -a command. The following example configures the speed for each swp1 breakout port (swp1s0, swp1s1, swp1s2, and swp1s3) to 10G with auto-negotiation off.

cumulus@switch:~$ sudo cat /etc/network/interfaces
...
auto swp1s0
iface swp1s0
    link-speed 10000 
    link-duplex full 
    link-autoneg off
auto swp1s1
iface swp1s1
    link-speed 10000 
    link-duplex full 
    link-autoneg off
auto swp1s2
iface swp1s2
    link-speed 10000 
    link-duplex full 
    link-autoneg off
auto swp1s3
iface swp1s3
    link-speed 10000 
    link-duplex full 
    link-autoneg off
...
cumulus@switch:~$ sudo ifreload -a

The SN4700 and SN4410 switch does not support auto-negotiation on QSFP-DD 400G transceiver modules. You need to force set the speed.

Set the Number of Lanes per Split Port

By default, to calculate the split port width, Cumulus Linux uses the formula split port width = full port width / breakout. For example, a port split into two interfaces (2x breakout) => 8 lanes width / 2x breakout = 4 lanes per split port.

If you need to use a different port width than the default, you can set the number of lanes per port.

QSFP56-DD transceiver ports split into four interfaces (4x) default to one lane per interface for backwards compatibility. You can change the lane setting to two lanes per interface.

The following example command splits swp1 into two interfaces (2x) and sets the number of lanes per split port to 2.

cumulus@switch:~$ nv set interface swp1 link breakout 2x lanes-per-port 2
cumulus@switch:~$ nv config apply

You must configure the lanes-per-port at the same time as you configure the breakout. If you want to change the number of lanes per port after you configure a breakout, you must first unset the breakout with the nv unset interface <port> breakout and nv config apply commands, then reconfigure the breakout and the lanes with the nv set interface <interface> link breakout <breakout> lanes-per-port <lanes> command. For example:

cumulus@switch:~$ nv unset interface swp1 link breakout
cumulus@switch:~$ nv config apply
cumulus@switch:~$ nv set interface swp1 link breakout 2x lanes-per-port 2
cumulus@switch:~$ nv config apply

Edit the /etc/cumulus/ports_width.conf file and add the numer of lanes per split port you want to use, then reload switchd:

You must configure the lanes per port in the /etc/cumulus/ports_width.conf before you configure the breakout in the /etc/cumulus/ports.conf file. If the ports.conf file already contains breakout configuration for a port, you must set the breakout back to 1x, then reload switchd. You can then set the desired lanes per port, then reconfigure the breakout.

cumulus@switch:~$ sudo nano /etc/cumulus/ports_width.conf
...
1=2
2=default
3=default
4=default
5=default
6=default
7=default
8=default
...
cumulus@switch:~$ sudo systemctl reload switchd.service

Remove a Breakout Port

To remove a breakout port:

  1. Run the nv unset interface <interface> command. For example:

    cumulus@switch:~$ nv unset interface swp1s0
    cumulus@switch:~$ nv unset interface swp1s1
    cumulus@switch:~$ nv unset interface swp1s2
    cumulus@switch:~$ nv unset interface swp1s3
    cumulus@switch:~$ nv config apply
    
  2. Run the nv unset interface <interface> link breakout command to configure the interface for the original speed. For example:

    cumulus@switch:~$ nv unset interface swp1 link breakout
    cumulus@switch:~$ nv config apply
    
  1. Edit the /etc/cumulus/ports.conf file to configure the interface for the original speed.

    cumulus@switch:~$ sudo nano /etc/cumulus/ports.conf
    ...
    1=1x 
    2=1x 
    3=1x 
    4=1x 
    ...
    
  2. Reload switchd. The reload does not interrupt network services.

    cumulus@switch:~$ sudo systemctl reload switchd.service
    
  3. Remove the breakout interface configuration from the /etc/network/interfaces file, then run the ifreload -a command.

Configure Port Lanes

You can override the default behavior for supported speeds and platforms and specify the number of lanes for a port. For example, for the NVIDIA SN4700 switch, the default port speed is 50G (2 lanes, NRZ signaling mode) and 100G (4 lanes, NRZ signaling mode). You can override this setting to 50G (1 lane, PAM4 signaling mode) and 100G (2 lanes, PAM4 signaling mode).

This setting does not apply when auto-negotiation is on because Cumulus Linux advertises all supported speed options, including PAM4 and NRZ during auto-negotiation.

cumulus@switch:~$ nv set interface swp1 link speed 50G
cumulus@switch:~$ nv set interface swp1 link lanes 1
cumulus@switch:~$ nv config apply 
cumulus@switch:~$ nv set interface swp2 link speed 100G
cumulus@switch:~$ nv set interface swp2 link lanes 2
cumulus@switch:~$ nv config apply
  1. Edit the /etc/network/interfaces file, then run the ifreload -a command.

    cumulus@switch:~$ sudo nano /etc/network/interfaces
    ...
    auto swp1
    iface swp1
        link-lanes 1
        link-speed 50000
    auto swp2
    iface swp2
        link-lanes 2
        link-speed 100000
    
  2. Run the ifreload -a command:

    cumulus@switch:~$ sudo ifreload -a
    

ports.conf File Validator

Cumulus Linux includes a ports.conf validator that switchd runs automatically before the switch starts up to confirm that the file syntax is correct. You can run the validator manually to verify the syntax of the file whenever you make changes. The validator is useful if you want to copy a new ports.conf file to the switch with automation tools, then validate that it has the correct syntax.

To run the validator manually, run the /usr/cumulus/bin/validate-ports -f <file> command. For example:

cumulus@switch:~$ /usr/cumulus/bin/validate-ports -f /etc/cumulus/ports.conf

Troubleshooting

This section shows basic commands for troubleshooting switch ports. For a more comprehensive troubleshooting guide, see Troubleshoot Layer 1.

Interface Settings

To see all settings for an interface, run the nv show interface <interface> command:

cumulus@switch:~$ nv show interface swp1                          operational        applied
------------------------  -----------------  -------
type                      swp                swp    
[acl]                                               
evpn                                                
  multihoming                                       
    uplink                                   off    
ptp                                                 
  enable                                     off    
router                                              
  adaptive-routing                                  
    enable                                   off    
  ospf                                              
    enable                                   off    
  ospf6                                             
    enable                                   off    
  pbr                                               
    [map]                                           
  pim                                               
    enable                                   off    
synce                                               
  enable                                     off    
ip                                                  
  igmp                                              
    enable                                   off    
  ipv4                                              
    forward                                  on     
  ipv6                                              
    enable                                   on     
    forward                                  on     
  neighbor-discovery                                
    enable                                   on     
    [dnssl]                                         
    home-agent                                      
      enable                                 off    
    [prefix]                                        
    [rdnss]                                         
    router-advertisement                            
      enable                                 off    
  vrrp                                              
    enable                                   off    
  vrf                                        default
  [gateway]                                         
link                                                
  auto-negotiate          off                on     
  duplex                  full               full   
  speed                   1G                 auto   
  fec                                        auto   
  mtu                     9000               9216   
  fast-linkup             off                       
  [breakout]                                        
  state                   up                 up     
  stats                                             
    carrier-transitions   4                         
    in-bytes              600 Bytes                 
    in-drops              5                         
    in-errors             0                         
    in-pkts               10                        
    out-bytes             2.11 MB                   
    out-drops             0                         
    out-errors            0                         
    out-pkts              33143                     
  mac                     48:b0:2d:39:3f:83         
ifindex                   3

You can add the --view option to show different views: acl-statistics, brief, detail, lldp, mac, mlag-cc, pluggables, qos-profile, and small. For example, the nv show interface --view=small command lists the interfaces on the switch. The nv show interface --view=brief command shows information about each interface on the switch, such as the interface type, speed, remote host and port. The nv show interface --view=mac command shows the MAC address of each interface.

The description column only shows in the output when you use the --view=detail option.

The following example shows the MAC address of each interface on the switch:

cumulus@switch:~$ nv show interface --view=mac
Interface   State  Speed  MTU    MAC                Type    
----------  -----  -----  -----  -----------------  --------
BLUE        up            65575  2a:f9:b5:3c:74:b8  vrf     
RED         up            65575  8e:91:ed:ed:d5:76  vrf     
bond1       up     1G     9000   48:b0:2d:39:3f:83  bond    
bond2       up     1G     9000   48:b0:2d:b3:5e:18  bond    
bond3       up     1G     9000   48:b0:2d:c2:9d:47  bond    
br_default  up            9216   44:38:39:22:01:7a  bridge  
br_l3vni    up            9216   44:38:39:22:01:7a  bridge  
eth0        up     1G     1500   44:38:39:22:01:7a  eth     
lo          up            65536  00:00:00:00:00:00  loopback
mgmt        up            65575  8a:58:d0:25:47:7d  vrf     
swp1        up     1G     9000   48:b0:2d:39:3f:83  swp     
swp2        up     1G     9000   48:b0:2d:b3:5e:18  swp     
swp3        up     1G     9000   48:b0:2d:c2:9d:47  swp     
swp4        down          1500   48:b0:2d:c2:7e:cd  swp     
swp5        down          1500   48:b0:2d:6e:bc:c1  swp     
swp6        down          1500   48:b0:2d:2d:89:16  swp 
...

You can filter the nv show interface command output on specific columns. For example, the nv show interface --filter mtu=1500 shows only the interfaces with MTU set to 1500.

To filter on multiple column outputs, enclose the filter types in parentheses; for example, nv show interface --filter "type=bridge&mtu=9216" shows data for bridges with MTU 9216.

You can filter on all revisions (operational, applied, and pending); for example, nv show interface --filter mtu=1500 --rev=applied shows only the interfaces with MTU set to 1500 in the applied revision.

The following example shows information for all bridges configured on the switch with MTU 9216:

cumulus@switch:~$ nv show interface --filter "type=bridge&mtu=9216"
Interface   State  Speed  MTU   Type    Remote Host  Remote Port  Summary                                
----------  -----  -----  ----  ------  -----------  -----------  ---------------------------------------
br_default  up            9216  bridge                            IP Address: fe80::4638:39ff:fe22:17a/64
br_l3vni    up            9216  bridge                            IP Address: fe80::4638:39ff:fe22:17a/64

Statistics

To show interface statistics, run the NVUE nv show interface <interface> counters command or the Linux sudo ethtool -S <interface> command.

cumulus@switch:~$ nv show interface swp1 counters
                    operational  applied
-------------------  -----------  -------
carrier-transitions  4                   
in-bytes             3.37 MB             
in-drops             0                   
in-errors            0                   
in-pkts              29025               
out-bytes            4.28 MB             
out-drops            0                   
out-errors           0                   
out-pkts             43945  
...

For more information about showing and clearing interface counters, refer to Monitoring Interfaces and Transceivers with NVUE.

SFP Port Information

To verify SFP settings, run the NVUE nv show interface <interface> pluggable command or the ethtool -m command. The following example shows the vendor, type and power output for swp1.

cumulus@switch:~$ sudo ethtool -m swp1 | egrep 'Vendor|type|power\s+:'
        Transceiver type                          : 10G Ethernet: 10G Base-LR
        Vendor name                               : FINISAR CORP.
        Vendor OUI                                : 00:90:65
        Vendor PN                                 : FTLX2071D327
        Vendor rev                                : A
        Vendor SN                                 : UY30DTX
        Laser output power                        : 0.5230 mW / -2.81 dBm
        Receiver signal average optical power     : 0.7285 mW / -1.38 dBm

Considerations

Auto-negotiation and FEC

If auto-negotiation is off on 100G and 25G interfaces, you must set FEC to OFF, RS, or BaseR to match the neighbor. The FEC default setting of auto does not link up when auto-negotiation is off.

If auto-negotiation is on and you set the link speed for a port, Cumulus Linux disables auto-negotiation and uses the port speed setting you configure.

Auto-negotiation with the Spectrum-4 Switch

When you connect an NVIDIA Spectrum-4 switch to another NVIDIA Spectrum-4 switch with PAM4 modulation, you must enable auto-negotiation.

1000BASE-T SFP Modules Supported Only on Certain 25G Platforms

The following 25G switches support 1000BASE-T SFP modules:

100G or faster switches do not support 1000BASE-T SFP modules.

After rebooting the NVIDIA SN2100 switch, eth0 always has a speed of 100MB per second. If you bring the interface down and then back up again, the interface negotiates 1000MB. This only occurs the first time the interface comes up.

To work around this issue, add the following commands to the /etc/rc.local file to flap the interface automatically when the switch boots:

modprobe -r igb
sleep 20
modprobe igb

NVIDIA SN5600 Switch and Force Mode

When you configure force mode on NVIDIA SN5600 switch ports 10 through 50, the Rx precoding setting must be the same between local and peer ports to get the optimal Signal-Integrity of the link.

Delay in Reporting Interface as Operational Down

When you remove two transceivers simultaneously from a switch, both interfaces show the carrier down status immediately. However, it takes one second for the second interface to show the operational down status. In addition, the services on this interface also take an extra second to come down.

NVIDIA Spectrum-2 Switches and FEC Mode

The NVIDIA Spectrum-2 (25G) switch only supports RS FEC.

ifplugd

ifplugd is an Ethernet link-state monitoring daemon that executes scripts to configure an Ethernet device when you plug in or remove a cable. Follow the steps below to install and configure the ifplugd daemon.

Install ifplugd

You can install this package even if the switch does not connect to the internet. The package is in the cumulus-local-apt-archive repository on the Cumulus Linux image.

To install ifplugd:

  1. Update the switch before installing the daemon:

    cumulus@switch:~$ sudo -E apt-get update
    
  2. Install the ifplugd package:

    cumulus@switch:~$ sudo -E apt-get install ifplugd
    

Configure ifplugd

After you install ifplugd, you must edit two configuration files:

The example configuration below configures ifplugd to bring down all uplinks when the peer bond goes down in an MLAG environment.

  1. Open /etc/default/ifplugd in a text editor and configure the file as appropriate. Add the peerbond name before you save the file.

    INTERFACES="peerbond"
    HOTPLUG_INTERFACES=""
    ARGS="-q -f -u0 -d1 -w -I"
    SUSPEND_ACTION="stop"
    
  2. Open the /etc/ifplugd/action.d/ifupdown file in a text editor. Configure the script, then save the file.

    #!/bin/sh
    set -e
    case "$2" in
    up)
            clagrole=$(clagctl | grep "Our Priority" | awk '{print $8}')
            if [ "$clagrole" = "secondary" ]
            then
                #List all the interfaces below to bring up when clag peerbond comes up.
                for interface in swp1 bond1 bond3 bond4
                do
                    echo "bringing up : $interface"  
                    ip link set $interface up
                done
            fi
        ;;
    down)
            clagrole=$(clagctl | grep "Our Priority" | awk '{print $8}')
            if [ "$clagrole" = "secondary" ]
            then
                #List all the interfaces below to bring down when clag peerbond goes down.
                for interface in swp1 bond1 bond3 bond4
                do
                    echo "bringing down : $interface"
                    ip link set $interface down
                done
            fi
        ;;
    esac
    
  3. Restart the ifplugd daemon to implement the changes:

    cumulus@switch:~$ sudo systemctl restart ifplugd.service
    

Considerations

The default shell for ifplugd is dash (/bin/sh) instead of bash, as it provides a faster and more nimble shell. However, dash contains fewer features than bash (for example, dash is unable to handle multiple uplinks).

Quality of Service

This section refers to frames for all internal QoS functionality. Unless explicitly stated, the actions are independent of layer 2 frames or layer 3 packets.

Cumulus Linux supports several different QoS features and standards including:

Cumulus Linux uses two configuration files for QoS:

Cumulus Linux 5.0 and later does not use the traffic.conf and datapath.conf files but uses the qos_features.conf and qos_infra.conf files instead. Before upgrading Cumulus Linux, review your existing QoS configuration to determine the changes you need to make.

switchd and QoS

When you run Linux commands to configure QoS, you must apply QoS changes to the ASIC with the following command:

cumulus@switch:~$ sudo systemctl reload switchd.service

Unlike the restart command, the reload switchd.service command does not impact traffic forwarding except when the qos_infra.conf file changes, or when the switch pauses frames or controls priority flow, which require modifications to the ASIC buffer and might result in momentary packet loss.

NVUE reloads the switchd service automatically. You do not have to run the reload switchd.service command to apply changes when configuring QoS with NVUE commands.

Classification

When a frame or packet arrives on the switch, Cumulus Linux maps it to an internal COS (switch priority) value. This value never writes to the frame or packet but classifies and schedules traffic internally through the switch.

You can define which values are trusted: 802.1p, DSCP, or both.

The following table describes the default classifications for various frame and switch priority configurations:

SettingVLAN Tagged?IP or Non-IPResult
PCP (802.1p)YesIPAccept incoming 802.1p marking.
PCP (802.1p)YesNon-IPAccept incoming 802.1p marking.
PCP (802.1p)NoIPUse the default priority setting.
PCP (802.1p)NoNon-IPUse the default priority setting.
DSCPYesIPAccept incoming DSCP IP header marking.
DSCPYesNon-IPUse the default priority setting.
DSCPNoIPAccept incoming DSCP IP header marking.
DSCPNoNon-IPUse the default priority setting.
PCP (802.1p) and DSCPYesIPAccept incoming DSCP IP header marking.
PCP (802.1p) and DSCPYesNon-IPAccept incoming 802.1p marking.
PCP (802.1p) and DSCPNoIPAccept incoming DSCP IP header marking.
PCP (802.1p) and DSCPNoNon-IPUse the default priority setting.
portEitherEitherIgnore any existing markings and use the default priority setting.

Trust 802.1p Marking

To trust 802.1p marking:

When 802.1p (l2) is trusted, Cumulus Linux classifies these ingress 802.1p values to switch priority values:

Switch Priority802.1p (PCP)
00
11
22
33
44
55
66
77

The PCP number is the incoming 802.1p marking; for example PCP 0 maps to switch priority 0.

To change the default profile to map PCP 0 to switch priority 4:

cumulus@switch:~$ nv set qos mapping default-global trust l2
cumulus@switch:~$ nv set qos mapping default-global pcp 0 switch-priority 4 
cumulus@switch:~$ nv config apply

You can map multiple PCP values to the same switch priority value. For example, to map PCP values 2, 3, and 4 to switch priority 0:

cumulus@switch:~$ nv set qos mapping default-global trust l2 
cumulus@switch:~$ nv set qos mapping default-global pcp 2,3,4 switch-priority 0
cumulus@switch:~$ nv config apply

If you configure the trust to be l2 but do not specify any PCP to switch priority mappings, Cumulus Linux uses the default values.

To show the ingress 802.1p mapping for the default profile, run the nv show qos mapping default-global pcp command. To show the PCP mapping for a specific switch priority in the default profile, run the nv show qos mapping default-global pcp <value> command. The following example shows that PCP 0 maps to switch priority 4:

cumulus@switch:~$ nv show qos mapping default-global pcp 0
                 operational  applied  description
---------------  -----------  -------  ------------------------
switch-priority  4            4        Internal Switch Priority

In the /etc/cumulus/datapath/qos/qos_features.conf file, set traffic.packet_priority_source_set = [802.1p].

When 802.1p marking is trusted, the following lines classify ingress 802.1p values to switch priority (internal COS) values:

traffic.cos_0.priority_source.8021p = [0]
traffic.cos_1.priority_source.8021p = [1]
traffic.cos_2.priority_source.8021p = [2]
traffic.cos_3.priority_source.8021p = [3]
traffic.cos_4.priority_source.8021p = [4]
traffic.cos_5.priority_source.8021p = [5]
traffic.cos_6.priority_source.8021p = [6]
traffic.cos_7.priority_source.8021p = [7]

The traffic.cos_ number is the switch priority value; for example 802.1p 0 maps to switch priority 0.

To map 802.1p 4 to switch priority 0, configure the traffic.cos_0.priority_source.8021p setting to 4.

traffic.cos_0.priority_source.8021p = [4]

You can map multiple values to the same switch priority value. For example, to map 802.1p values 0, 1, and 2 to switch priority 0:

traffic.cos_0.priority_source.8021p = [0, 1, 2]

You can also choose not to use a switch priority value. This example does not use switch priority values 3 and 4.

traffic.cos_0.priority_source.8021p = [0]
traffic.cos_1.priority_source.8021p = [1]
traffic.cos_2.priority_source.8021p = [2,3,4]
traffic.cos_3.priority_source.8021p = []
traffic.cos_4.priority_source.8021p = []
traffic.cos_5.priority_source.8021p = [5]
traffic.cos_6.priority_source.8021p = [6]
traffic.cos_7.priority_source.8021p = [7]

To apply a custom profile to specific interfaces, see Port Groups.

Trust DSCP

To trust ingress DSCP markings:

If DSCP (l3) is trusted, Cumulus Linux classifies these ingress DSCP values to switch priority values:

Switch PriorityIngress DSCP
0[0,1,2,3,4,5,6,7]
1[8,9,10,11,12,13,14,15]
2[16,17,18,19,20,21,22,23]
3[24,25,26,27,28,29,30,31]
4[32,33,34,35,36,37,38,39]
5[40,41,42,43,44,45,46,47]
6[48,49,50,51,52,53,54,55]
7[56,57,58,59,60,61,62,63]

The DSCP number is the ingress DSCP value; for example DSCP 0 through 7 maps to switch priority 0.

To change the default profile to map ingress DSCP 22 to switch priority 4:

cumulus@switch:~$ nv set qos mapping default-global trust l3 
cumulus@switch:~$ nv set qos mapping default-global dscp 22 switch-priority 4 
cumulus@switch:~$ nv config apply

You can map multiple ingress DSCP values to the same switch priority value. For example, to change the default profile to map ingress DSCP values 10, 21, and 36 to switch priority 0:

cumulus@switch:~$ nv set qos mapping default-global trust l3 
cumulus@switch:~$ nv set qos mapping default-global dscp 10,21,36 switch-priority 0
cumulus@switch:~$ nv config apply

If you configure the trust to be l3 but do not specify any DSCP to switch priority mappings, Cumulus Linux uses the default values.

To show the DSCP mapping in the default profile, run the nv show qos mapping default-global dscp command. To show the DSCP mapping for a specific switch priority in the default profile, run the nv show qos mapping default-global dscp <value> command. The following example shows that DSCP 22 maps to switch priority 4:

cumulus@switch:~$ nv show qos mapping default-global dscp 22
                 operational  applied  description
---------------  -----------  -------  ------------------------
switch-priority  4            4        Internal Switch Priority

In the /etc/cumulus/datapath/qos/qos_features.conf file, configure traffic.packet_priority_source_set = [dscp].

If DSCP is trusted, the following lines classify ingress DSCP values to switch priority (internal COS) values:

traffic.cos_0.priority_source.dscp = [0,1,2,3,4,5,6,7]
traffic.cos_1.priority_source.dscp = [8,9,10,11,12,13,14,15]
traffic.cos_2.priority_source.dscp = [16,17,18,19,20,21,22,23]
traffic.cos_3.priority_source.dscp = [24,25,26,27,28,29,30,31]
traffic.cos_4.priority_source.dscp = [32,33,34,35,36,37,38,39]
traffic.cos_5.priority_source.dscp = [40,41,42,43,44,45,46,47]
traffic.cos_6.priority_source.dscp = [48,49,50,51,52,53,54,55]
traffic.cos_7.priority_source.dscp = [56,57,58,59,60,61,62,63]

The # in the configuration file is a comment. By default, the file comments out the traffic.cos_*.priority_source.dscp lines.
You must uncomment them for them to take effect.

The traffic.cos_ number is the switch priority value; for example DSCP values 0 through 7 map to switch priority 0. To map ingress DSCP 22 to switch priority 4, configure the traffic.cos_4.priority_source.dscp setting.

traffic.cos_4.priority_source.dscp = [22]

You can map multiple ingress DSCP values to the same switch priority value. For example, to map ingress DSCP values 10, 21, and 36 to switch priority 0:

traffic.cos_0.priority_source.dscp = [10,21,36]

You can also choose not to use an switch priority value. This example does not use switch priority values 3 and 4:

traffic.cos_0.priority_source.dscp = [0,1,2,3,4,5,6,7]
traffic.cos_1.priority_source.dscp = [8,9,10,11,12,13,14,15]
traffic.cos_2.priority_source.dscp = [16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]
traffic.cos_3.priority_source.dscp = []
traffic.cos_4.priority_source.dscp = []
traffic.cos_5.priority_source.dscp = [40,41,42,43,44,45,46,47,32,33,34,35,36,37,38,39]
traffic.cos_6.priority_source.dscp = [48,49,50,51,52,53,54,55]
traffic.cos_7.priority_source.dscp = [56,57,58,59,60,61,62,63]

To apply a custom DSCP profile to specific interfaces, see Port Groups.

Trust Port

You can assign all traffic to a switch priority regardless of the ingress marking.

The following commands assign all traffic to switch priority 3 regardless of the ingress marking.

cumulus@switch:~$ nv set qos mapping default-global trust port 
cumulus@switch:~$ nv set qos mapping default-global port-default-sp 3 
cumulus@switch:~$ nv config apply

To show the switch priority setting in the default profile for all traffic regardless of the ingress marking, run the nv show qos mapping default-global command:

cumulus@switch:~$ nv show qos mapping default-global
                 operational  applied  description
---------------  -----------  -------  ----------------------------
port-default-sp  3            3        Port Default Switch Priority
trust            port         port     Port Trust configuration

In the /etc/cumulus/datapath/qos/qos_features.conf file, configure traffic.packet_priority_source_set = [port].

The traffic.port_default_priority setting defines the switch priority that all traffic uses.

To apply a custom profile to specific interfaces, see Port Groups.

Mark and Remark Traffic

You can mark or remark traffic in two ways:

802.1p or DSCP for Marking

To enable global remarking of 802.1p, DSCP or both 802.1p and DSCP values:

To remark switch priority 0 to egress 802.1p 4

cumulus@switch:~$ nv set qos remark default-global rewrite l2
cumulus@switch:~$ nv set qos remark default-global switch-priority 0 pcp 4
cumulus@switch:~$ nv config apply

To remark switch priority 0 to egress DSCP 22:

cumulus@switch:~$ nv set qos remark default-global rewrite l3
cumulus@switch:~$ nv set qos remark default-global switch-priority 0 dscp 22
cumulus@switch:~$ nv config apply

You can remap multiple switch priority values to the same external 802.1p or DSCP value. For example, to map switch priority 1 and 2 to 802.1p 3:

cumulus@switch:~$ nv set qos remark default-global rewrite l2
cumulus@switch:~$ nv set qos remark default-global switch-priority 1 pcp 3
cumulus@switch:~$ nv set qos remark default-global switch-priority 2 pcp 3
cumulus@switch:~$ nv config apply

To map switch priority 1 and 2 to DSCP 40:

cumulus@switch:~$ nv set qos remark default-global rewrite l3
cumulus@switch:~$ nv set qos remark default-global switch-priority 1 dscp 40
cumulus@switch:~$ nv set qos remark default-global switch-priority 2 dscp 40
cumulus@switch:~$ nv config apply

In the /etc/cumulus/datapath/qos/qos_features.conf file, modify the traffic.packet_priority_remark_set value to [802.1p], [dscp] or [802.1p,dscp]. For example, to enable the remarking of only 802.1p values:

traffic.packet_priority_remark_set = [802.1p]

You remark 802.1p or DSCP with the priority_remark.8021p or priority_remark.dscp setting. The switch priority (internal cos_) value determines the egress 802.1p or DSCP remarking. For example, to remark switch priority 0 to egress 802.1p 4:

traffic.cos_0.priority_remark.8021p = [4]

To remark switch priority 0 to egress DSCP 22:

traffic.cos_0.priority_remark.dscp = [22]

The # in the configuration file is a comment. The file comments out the traffic.cos_*.priority_remark.8021p and the traffic.cos_*.priority_remark.dscp lines by default. You must uncomment them to set the configuration.

You can remap multiple switch priority values to the same external 802.1p or DSCP value. For example, to map switch priority 1 and 2 to 802.1p 3:

traffic.cos_1.priority_remark.8021p = [3]
traffic.cos_2.priority_remark.8021p = [3]

To map switch priority 1 and 2 to DSCP 40:

traffic.cos_1.priority_remark.dscp = [40]
traffic.cos_2.priority_remark.dscp = [40]

To apply a custom profile to specific interfaces, see Port Groups.

Policy-based Marking

Cumulus Linux supports ACLs through ebtables, iptables or ip6tables for egress packet marking and remarking.

Cumulus Linux uses ebtables to mark layer 2, 802.1p COS values. Cumulus Linux uses iptables to match IPv4 traffic and ip6tables to match IPv6 traffic for DSCP marking.

For more information on configuring and applying ACLs, refer to Netfilter - ACLs.

Mark Layer 2 COS

You must use ebtables to match and mark layer 2 bridged traffic. You can match traffic with any supported ebtables rule.

To set the new 802.1p COS value when traffic matches, use -A FORWARD -o <interface> -j setqos --set-cos <value>.

You can only set COS on a per-egress interface basis. Cumulus Linux does not support ebtables based matching on ingress.

The configured action always has the following conditions:

For example, to set traffic leaving interface swp5 to 802.1p COS value 4:

-A FORWARD -o swp5 -j setqos --set-cos 4

Mark Layer 3 DSCP

You must use iptables (for IPv4 traffic) or ip6tables (for IPv6 traffic) to match and mark layer 3 traffic.

You can match traffic with any supported iptable or ip6tables rule. To set the new COS or DSCP value when traffic matches, use -A FORWARD -o <interface> -j SETQOS [--set-dscp <value> | --set-cos <value> | --set-dscp-class <name>].

The configured action always has the following conditions:

You can configure COS markings with --set-cos and a value between 0 and 7 (inclusive).

You can use only one of --set-dscp or --set-dscp-class.
--set-dscp supports decimal or hex DSCP values between 0 and 77. --set-dscp-class supports standard DSCP naming, described in RFC3260, including ef, be, CS and AF classes.

You can specify either --set-dscp or --set-dscp-class, but not both.

For example, to set traffic leaving interface swp5 to DSCP value 32:

-A FORWARD -o swp5 -j SETQOS --set-dscp 32

To set traffic leaving interface swp11 to DSCP class value CS6:

-A FORWARD -o swp11 -j SETQOS --set-dscp-class cs6

Flow Control

Flow control influences data transmission to manage congestion along a network path.

Cumulus Linux supports the following flow control mechanisms:

You can not configure link pause and PFC on the same port.

Flow Control Buffers

Before configuring link pause or PFC, configure the buffer pool memory allocated for lossless and lossy flows. The following example sets each to fifty percent:

cumulus@switch:~$ nv set qos traffic-pool default-lossless memory-percent 50
cumulus@switch:~$ nv set qos traffic-pool default-lossy memory-percent 50
cumulus@switch:~$ nv config apply

Cumulus Linux allocates 100% of the buffer memory to the default-lossy traffic pool by default. The total memory allocation across pools must not exceed 100%.

Edit the following lines in the /etc/mlx/datapath/qos/qos_infra.conf file:

  1. Modify the existing ingress_service_pool.0.percent and egress_service_pool.0.percent buffer allocation. Change the existing ingress setting to ingress_service_pool.0.percent = 50. Change the existing egress setting to egress_service_pool.0.percent = 50.

  2. Add the following lines to create a new service_pool, set flow_control to the service pool, and define buffer reservations:

ingress_service_pool.1.percent = 50.0
ingress_service_pool.1.mode = 1
egress_service_pool.1.percent = 50.0
egress_service_pool.1.mode = 1
egress_service_pool.1.infinite_flag = TRUE
#
flow_control.ingress_service_pool = 1
flow_control.egress_service_pool = 1
#
port.service_pool.1.ingress_buffer.reserved = 0
port.service_pool.1.ingress_buffer.dynamic_quota = ALPHA_1
port.service_pool.1.egress_buffer.uc.reserved = 0
port.service_pool.1.egress_buffer.uc.dynamic_quota = ALPHA_INFINITY
#
flow_control.ingress_buffer.dynamic_quota = ALPHA_1
flow_control.egress_buffer.reserved = 0
flow_control.egress_buffer.dynamic_quota = ALPHA_INFINITY

Link pause is an older flow control mechanism that causes all traffic on a link between two switches, or between a host and switch, to stop transmitting during times of congestion. Link pause starts and stops depending on buffer congestion. You configure link pause on a per-direction, per-interface basis. You can receive pause frames to stop the switch from transmitting when requested, send pause frames to request neighboring devices to stop transmitting, or both.

  • NVIDIA recommends that you use Priority Flow Control (PFC) instead of link pause.
  • Before configuring link pause, you must first modify the switch buffer allocation. Refer to Flow Control Buffers.

Link pause buffer calculation is a complex topic that IEEE 802.1Q-2012 defines. This attempts to incorporate the delay between signaling congestion and the reception of the signal by the neighboring device. This calculation includes the delay that the PHY and MAC layers (interface delay) introduce as well as the distance between end points (cable length).

Incorrect cable length settings can cause wasted buffer space (triggering congestion too early) or packet drops (congestion occurs before flow control activates).

The following example configuration:

Cumulus Linux also includes frame transmission start and stop threshold, and port buffer settings. NVIDIA recommends that you do not change these settings but, instead, let Cumulus Linux configure the settings dynamically. Only change the threshold and buffer settings if you are an advanced user who understands the buffer configuration requirements for lossless traffic to work seamlessly.

cumulus@switch:~$ nv set qos link-pause my_pause_ports tx enable
cumulus@switch:~$ nv set qos link-pause my_pause_ports rx disable
cumulus@switch:~$ nv set qos link-pause my_pause_ports cable-length 50
cumulus@switch:~$ nv set interface swp1-swp4,swp6 qos link-pause profile my_pause_ports
cumulus@switch:~$ nv config apply

To show the link pause settings for a profile, run the nv show qos link-pause <profile> command

Uncomment and edit the link_pause section of the /etc/cumulus/datapath/qos/qos_features.conf file.

link_pause.port_group_list = [my_pause_ports]
link_pause.my_pause_ports.port_set = swp1-swp4,swp6
link_pause.my_pause_ports.port_buffer_bytes = 25000
link_pause.my_pause_ports.xoff_size = 10000
link_pause.my_pause_ports.xon_delta = 2000
link_pause.my_pause_ports.rx_enable = false
link_pause.my_pause_ports.tx_enable = true
link_pause.my_pause_ports.cable_length = 10

To process pause frames, you must enable link pause on the specific interfaces.

Priority Flow Control (PFC)

Priority flow control extends the capabilities of link pause by the frames for a specific 802.1p value instead of stopping all traffic on a link. If a switch supports PFC and receives a PFC pause frame for a given 802.1p value, the switch stops transmitting frames from that queue, but continues transmitting frames for other queues.

You use PFC with RDMA over Converged Ethernet - RoCE. The RoCE section provides information to specifically deploy PFC and ECN for RoCE environments.

Before configuring PFC, first modify the switch buffer allocation according to Flow Control Buffers.

PFC buffer calculation is a complex topic defined in IEEE 802.1Q-2012, which attempts to incorporate the delay between signaling congestion and receiving the signal by the neighboring device. This calculation includes the delay that the PHY and MAC layers (called the interface delay) introduce as well as the distance between end points (cable length).
Incorrect cable length settings cause wasted buffer space (triggering congestion too early) or packet drops (congestion occurs before flow control activates).

To apply PFC settings on all ports, modify the default PFC profile (default-global).

The following example modifies the default profile and configures:

Cumulus Linux also includes frame transmission start and stop threshold, and port buffer settings. NVIDIA recommends that you do not change these settings but, instead, let Cumulus Linux configure the settings dynamically. Only change the threshold and buffer settings if you are an advanced user who understands the buffer configuration requirements for lossless traffic to work seamlessly.

cumulus@switch:~$ nv set qos pfc default-global switch-priority 0 
cumulus@switch:~$ nv set qos pfc default-global tx enable 
cumulus@switch:~$ nv set qos pfc default-global rx disable 
cumulus@switch:~$ nv set qos pfc default-global cable-length 50
cumulus@switch:~$ nv config apply

To show the PFC settings for the default profile, run the nv show qos pfc default-global command:

cumulus@switch:~$ nv show qos pfc default-global
                   operational  applied  description
-----------------  -----------  -------  --------------------------------
cable-length       50           50       Cable Length (in meters)
port-buffer        25000 B      25000 B  Port Buffer (in bytes)
rx                 disable      disable  PFC Rx State
tx                 enable       enable   PFC Tx State
xoff-threshold     10000 B      10000 B  Xoff Threshold (in bytes)
xon-threshold      2000 B       2000 B   Xon Threshold (in bytes)
[switch-priority]  0            0        Collection of switch priorities.

Edit the priority flow control section of the /etc/cumulus/datapath/qos/qos_features.conf file.

pfc.port_group_list = [default-global]
pfc.default-global.port_set = allports
pfc.default-global.cos_list = [0]
pfc.default-global.port_buffer_bytes = 25000
pfc.default-global.xoff_size = 10000
pfc.default-global.xon_delta = 2000
pfc.default-global.tx_enable = true
pfc.default-global.rx_enable = false
pfc.default-global.cable_length = 50

To apply a custom profile to specific interfaces, see Port Groups.

PFC Watchdog

PFC watchdog detects and mitigates pause storms on PFC-enabled ports.

In lossless Ethernet, the switch sends PFC PAUSE frames to instruct the link partner to pause sending packets on a traffic class. This back pressure might propagate across the network and, if it persists, can cause the network to stop forwarding traffic. PFC watchdog detects abnormal back pressure caused by receiving an excessive number of pause frames and disables PFC temporarily.

When a lossless queue receives a pause storm from its link partner and the queue is in a paused state for a certain period of time, PFC watchdog mitigates the pause storm. The watchdog stops processing received pause frames on every switch priority corresponding to the traffic class that detects the storm and discards new incoming packets to this egress queue.

The watchdog continues to count pause frames received on the port. If there are no pause frames received in any polling interval period, it restores the PFC configuration on the port and stops dropping packets.

PFC watchdog also detects and mitigates pause storms on link pause-enabled ports. The watchdog configuration for link pause-enabled ports is the same as the configuration for PFC-enabled ports. For a link pause-enabled port, the watchdog stops processing received pause frames on the egress port that detects the storm and discards new incoming packets to all egress queues on the port until congestion diminishes.

  • PFC watchdog only works for lossless traffic queues.
  • You can only configure PFC watchdog on a port with PFC (or link pause) configuration.
  • You can only enable PFC watchdog on a physical interface (swp).
  • You cannot enable the watchdog on a bond (for example, bond0) but you can enable the watchdog on a port that is a member of a bond (for example, swp1).

To enable PFC watchdog:

Enable PFC watchdog on the interfaces where you enable PFC:

cumulus@switch:~$ nv set interface swp1 qos pfc-watchdog
cumulus@switch:~$ nv set interface swp3 qos pfc-watchdog
cumulus@switch:~$ nv config apply

To disable PFC watchdog, run the nv unset interface <interface> qos pfc-watchdog command or the nv set interface <interface> qos pfc-watchdog state disable command.

Edit the PFC Watchdog Configuration section of the /etc/cumulus/datapath/qos/qos_features.conf file, then reload switchd.

...
# PFC Watchdog Configuration
# Add the port to the port_group_list where you want to enable PFC Watchdog
# It will enable PFC Watchdog on all the traffic-class corresponding to
# the lossless switch-priority configured on the port.
pfc_watchdog.port_group_list = [pfc_wd_port_group]
pfc_watchdog.pfc_wd_port_group.port_set = swp1,swp2
...
cumulus@switch:~$ sudo systemctl reload switchd

You can control the PFC watchdog polling interval and how many polling intervals the PFC watchdog must wait before it mitigates the storm condition. The default polling interval is 100 milliseconds. The default number of polling intervals is 3.

The following example sets the PFC watchdog polling interval to 200 milliseconds and the number of polling intervals to 5:

cumulus@switch:~$ nv set qos pfc-watchdog polling-interval 200
cumulus@switch:~$ nv set qos pfc-watchdog robustness 5
cumulus@switch:~$ nv config apply

Edit the /etc/cumulus/switchd.conf file to set the pfc_wd.poll_interval parameter and the pfc_wd.robustness parameter.

...
# PFC Watchdog poll interval (in msec)
#pfc_wd.poll_interval = 200

# PFC Watchdog robustness (# of iterations)
#pfc_wd.robustness = 5
...

Run the following commands to apply the configuration:

cumulus@switch:~$ echo 5 > /cumulus/switchd/config/pfc_wd/robustness
cumulus@switch:~$ echo 200 > /cumulus/switchd/config/pfc_wd/poll_interval

To show if PFC watchdog is on and to show the status for each traffic class, run the nv show interface <interface> qos pfc-watchdog command:

cumulus@switch:~$ nv show interface swp1 qos pfc-watchdog
                 operational  applied 
---------------  -----------  ------- 
state            enabled      enabled 

PFC WD Status 
=========================== 
    traffic-class  status    deadlock-count 
    -------------  --------  -------------- 

    0              OK        0 
    1              OK        3 
    2              DEADLOCK  2  
    3              OK        0 
    4              OK        0 
    5              OK        0 
    6              OK        0 
    7              DEADLOCK  3 

To show PFC watchdog data for a specific traffic class, run the nv show interface <interface> qos pfc-watchdog status <traffic-class> command.

To clear the PFC watchdog deadlock-count on an interface, run the nv action clear interface <interface> qos pfc-watchdog deadlock-count command.

Congestion Control (ECN)

Explicit Congestion Notification (ECN) is an end-to-end layer 3 congestion control protocol. Defined by RFC 3168, ECN relies on bits in the IPv4 header Traffic Class to signal congestion conditions. ECN requires one or both server endpoints to support ECN to be effective.

Instead of telling adjacent devices to stop transmitting during times of buffer congestion, ECN sets the ECN bits of the transit IPv4 or IPv6 header to indicate to end hosts that congestion might occur. As a result, the sending hosts reduce their sending rate until the transit switch no longer sets ECN bits.

You use ECN with RDMA over Converged Ethernet - RoCE. The RoCE section describes how to deploy PFC and ECN for RoCE environments.

ECN operates by having a transit switch that marks packets between two end hosts.

  1. The transmitting host indicates it is ECN-capable by setting the ECN bits in the outgoing IP header to 01 or 10
  2. If the buffer of a transit switch is greater than the configured minimum threshold of the buffer, the switch remarks the ECN bits to 11 indicating Congestion Encountered or CE.
  3. The receiving host marks any reply packets, like a TCP-ACK, as CE (11).
  4. The original transmitting host reduces its transmission rate.
  5. When the switch buffer congestion falls below the configured minimum threshold of the buffer, the switch stops remarking ECN bits, setting them back to 01 or 10.
  6. A receiving host reflects this new ECN marking in the next reply so that the transmitting host resumes sending at normal speeds.

The default profile (default-global) enables ECN by default on egress queue 0 for all ports with the following settings:

The following example commands change the default ECN profile that applies to all ports. The commands enable ECN on egress queue 4, 5, and 7, set the minimum buffer threshold to 40000 and the maximum buffer threshold to 200000, and enable RED.

cumulus@switch:~$ nv set qos congestion-control default-global traffic-class 4,5,7 min-threshold 40000
cumulus@switch:~$ nv set qos congestion-control default-global traffic-class 4,5,7 max-threshold 200000 
cumulus@switch:~$ nv set qos congestion-control default-global traffic-class 4,5,7 red enable
cumulus@switch:~$ nv config apply

The following example disables ECN bit marking in the default profile for all ports.

cumulus@switch:~$ nv set qos congestion-control default-global traffic-class 0 ecn disable
cumulus@switch:~$ nv config apply

To show the ECN settings for the default profile, run the nv show qos congestion-control default-global command:

cumulus@switch:~$ nv show qos congestion-control default-global
    operational  applied  description
--  -----------  -------  -----------

ECN Configurations
=====================
    traffic-class  ECN     RED     Min Th   Max Th    Probability
    -------------  ------  ------  -------  --------  -----------
    4              enable  enable  40000 B  200000 B  100
    5              enable  enable  40000 B  200000 B  100
    7              enable  enable  40000 B  200000 B  100

To show the ECN settings in the default profile for a specific egress queue, run the nv show qos congestion-control default-global traffic-class <value> command:

cumulus@switch:~$ nv show qos congestion-control default-global traffic-class 4 
               operational  applied   description
-------------  -----------  --------  -----------------------------------
ecn            enable       enable    Early Congestion Notification State
max-threshold  200000 B     200000 B  Maximum Threshold (in bytes)
min-threshold  40000 B      40000 B   Minimum Threshold (in bytes)
probability    100          100       Probability
red            enable       enable    Random Early Detection State

Edit the Explicit Congestion Notification section of the /etc/cumulus/datapath/qos/qos_features.conf file.

default_ecn_red_conf.egress_queue_list = [4,5,7]
default_ecn_red_conf.ecn_enable = true
default_ecn_red_conf.red_enable = true
default_ecn_red_conf.min_threshold_bytes = 40000
default_ecn_red_conf.max_threshold_bytes = 200000
default_ecn_red_conf.probability = 100

To disable ECN bit marking, set ecn_enable to false. The following example disables ECN bit marking in the default profile for all ports.

...
default_ecn_red_conf.ecn_enable = false 
...

To apply a custom ECN profile to specific interfaces, see Port Groups.

Egress Queues

Cumulus Linux supports eight egress queues to provide different classes of service. By default switch priority values map directly to the matching egress queue. For example, switch priority value 0 maps to egress queue 0.

You can remap queues by changing the switch priority value to the corresponding queue value. You can map multiple switch priority values to a single egress queue.

You do not have to assign all egress queues.

The following command examples assign switch priority 2 to egress queue 7:

cumulus@switch:~$ nv set qos egress-queue-mapping default-global switch-priority 2 traffic-class 7
cumulus@switch:~$ nv config apply

NVUE only supports the default-global profile.

To show the egress queue mapping configuration for the default profile, run the nv show qos egress-queue-mapping default-global command:

cumulus@switch:~$ nv show qos egress-queue-mapping default-global
    operational  applied  description
--  -----------  -------  -----------

SP->TC mapping configuration
===============================
    switch-priority  traffic-class
    ---------------  -------------
    0                0
    1                1
    2                7
    3                3
    4                4
    5                5
    6                6
    7                7

To show the egress queue mapping for a specific switch priority in the default profile, run the nv show qos egress-queue-mapping default-global switch-priority <value> command. The following example command shows that switch priority 2 maps to egress queue 7.

cumulus@switch:~$ nv show qos egress-queue-mapping default-global switch-priority 2
               operational  applied  description
-------------  -----------  -------  -------------
traffic-class  7            7        Traffic Class

You configure egress queues in the qos_infra.conf file.

cos_egr_queue.cos_0.uc  = 0
cos_egr_queue.cos_1.uc  = 1
cos_egr_queue.cos_2.uc  = 7
cos_egr_queue.cos_3.uc  = 3
cos_egr_queue.cos_4.uc  = 4
cos_egr_queue.cos_5.uc  = 5
cos_egr_queue.cos_6.uc  = 6
cos_egr_queue.cos_7.uc  = 7

Egress Scheduler

Cumulus Linux supports 802.1Qaz, Enhanced Transmission Selection, which allows the switch to assign bandwidth to egress queues and then schedule the transmission of traffic from each queue. 802.1Qaz supports Priority Queuing.

Cumulus Linux provides a default egress scheduler that applies to all ports, where the bandwidth allocated to egress queues 0,2,4,6 is 12 percent and the bandwidth allocated to egress queues 1,3,5,7 is 13 percent. You can also apply a custom egress scheduler for specific ports; see Port Groups.

The following example modifies the default profile. The commands change the bandwidth allocation for egress queues 0, 1, 5, and 7 to strict, bandwidth allocation for egress queues 2 and 6 to 30 percent and bandwidth allocation for egress queues 3 and 4 to 20 percent.

  • The traffic-class value defines the egress queue where you want to assign bandwidth. For example, traffic-class 2 defines the bandwidth allocation for egress queue 2.
  • For each egress queue, you can either define the mode as dwrr or strict. In dwrr mode, you must define a bandwidth percent value between 1 and 100. If you do not specify a value for an egress queue, Cumulus Linux uses a DWRR value of 0 (no egress scheduling). The combined total of values you assign to bw_percent must be less than or equal to 100.
cumulus@switch:~$ nv set qos egress-scheduler default-global traffic-class 2,6 mode dwrr 
cumulus@switch:~$ nv set qos egress-scheduler default-global traffic-class 2,6 bw-percent 30 
cumulus@switch:~$ nv set qos egress-scheduler default-global traffic-class 3,4 mode dwrr
cumulus@switch:~$ nv set qos egress-scheduler default-global traffic-class 3,4 bw-percent 20 
cumulus@switch:~$ nv set qos egress-scheduler default-global traffic-class 0,1,5,7 mode strict
cumulus@switch:~$ nv config apply

To show the egress scheduling policy for the default profile, run the nv show qos egress-scheduler default-global command:

cumulus@switch:~$ nv show qos egress-scheduler default-global
    operational  applied  description
--  -----------  -------  -----------

TC->DWRR weight configuration
================================
    traffic-class  mode    bw-percent
    -------------  ------  ----------
    0              strict
    1              strict
    2              dwrr    30
    3              dwrr    20
    4              dwrr    20
    5              strict
    6              dwrr    30
    7              strict

You configure the egress scheduling policy in the egress scheduling section of the /etc/cumulus/datapath/qos/qos_features.conf file.

  • The egr_queue_ value defines the egress queue where you want to assign bandwidth. For example, egr_queue_0 defines the bandwidth allocation for egress queue 0.
  • The bw_percent value defines the bandwidth allocation you want to assign to an egress queue. If you do not specify a value for an egress queue, there is no egress scheduling. If you specify a value of 0 for an egress queue, Cumulus Linux assigns strict priority mode to the egress queue and always processes it ahead of other queues. The combined total of values you assign to bw_percent must be less than or equal to 100.
default_egress_sched.egr_queue_0.bw_percent = 0
default_egress_sched.egr_queue_1.bw_percent = 0
default_egress_sched.egr_queue_2.bw_percent = 30
default_egress_sched.egr_queue_3.bw_percent = 20
default_egress_sched.egr_queue_4.bw_percent = 20
default_egress_sched.egr_queue_5.bw_percent = 0
default_egress_sched.egr_queue_6.bw_percent = 30
default_egress_sched.egr_queue_7.bw_percent = 0

strict mode does not define a maximum bandwidth allocation. This can lead to starvation of other queues.

To apply a custom egress scheduler for specific ports, see Port Groups.

Policing and Shaping

Traffic shaping and policing control the rate at which the switch sends or receives traffic on a network to prevent congestion.

Traffic shaping typically occurs at egress and traffic policing at ingress.

Shaping

Traffic shaping allows a switch to send traffic at an average bitrate lower than the physical interface. Traffic shaping prevents a receiving device from dropping bursty traffic if the device is either not capable of that rate of traffic or has a policer that limits what it accepts.

Traffic shaping works by holding packets in the buffer and releasing them at specific time intervals.

Cumulus Linux supports two levels of hierarchical traffic shaping: one at the egress queue level and one at the port level. This allows for minimum and maximum bandwidth guarantees for each egress queue and a defined port traffic shaping rate.

The following example configuration:

  • When the minimum bandwidth for an egress queue is 0, there is no bandwidth guarantee for this queue.
  • The maximum bandwidth for an egress queue must not exceed the maximum packet shaper rate for the port group.
  • The maximum packet shaper rate for the port group must not exceed the physical interface speed.
  • Cumulus Linux only shapes traffic for the traffic classes in a profile that include shaper configuration.

cumulus@switch:~$ nv set qos egress-shaper shaper1 traffic-class 2 min-rate 100
cumulus@switch:~$ nv set qos egress-shaper shaper1 traffic-class 2 max-rate 500
cumulus@switch:~$ nv set qos egress-shaper shaper1 port-max-rate 200000
cumulus@switch:~$ nv set interface swp1-swp3,swp5 qos egress-shaper profile shaper1
cumulus@switch:~$ nv config apply

Edit the shaping section of the qos_features.conf file.

Cumulus Linux bases the egr_queue value on the configured egress queue.

shaping.port_group_list = [shaper1]
shaping.shaper1.port_set = swp1-swp3,swp5
shaping.shaper1.egr_queue_0.shaper = [50000, 100000]
shaping.shaper1.port.shaper = 900000

Policing

Traffic policing prevents an interface from receiving more traffic than intended. You use policing to enforce a maximum transmission rate on an interface. The switch drops any traffic above the policing level.

Cumulus Linux supports both a single-rate policer and a dual-rate policer (tricolor policer).

You configure traffic policing using ebtables, iptables, or ip6table rules.

For more information on configuring and applying ACLs, refer to Netfilter - ACLs.

Single-rate Policer

To configure a single-rate policer, use iptables JUMP action -j POLICE.

Cumulus Linux supports the following iptable flags with a single-rate policer.

iptables FlagDescription
--set-mode [pkt | KB]Define the policer to count packets or kilobytes.
--set-rate [<kbytes> | <packets>]The maximum rate of traffic in kilobytes or packets per second.
--set-burst <kilobytes>The allowed burst size in kilobytes.

For example, to create a policer to allow 400 packets per second with 100 packet burst:
-j POLICE --set-mode pkt --set-rate 400 --set-burst 100

Dual-rate Policer

To configure a dual-rate policer, use the iptables JUMP action -j TRICOLORPOLICE.

Cumulus Linux supports the following iptable flags with a dual-rate policer.

iptables FlagDescription
--set-color-mode [blind | aware]The policing mode: single-rate (blind) or dual-rate (aware). The default is aware.
--set-cir <kbps>The committed information rate (CIR) in kilobits per second.
--set-cbs <kbytes>The committed burst size (CBS) in kilobytes.
--set-pir <kbps>The peak information rate (PIR) in kilobits per second.
--set-ebs <kbytes>The excess burst size (EBS) in kilobytes.
--set-conform-action-dscp <dscp value>The numerical DSCP value to mark for traffic that conforms to the policer rate.
--set-exceed-action-dscp <dscp value>The numerical DSCP value to mark for traffic that exceeds the policer rate.
--set-violate-action-dscp <dscp value>The numerical DSCP value to mark for traffic that violates the policer rate.
--set-violate-action [accept | drop]Cumulus Linux either accepts and remarks, or drops packets that violate the policer rate.

For example, to configure a dual-rate, three-color policer, with a 3 Mbps CIR, 500 KB CBS, 10 Mbps PIR, and 1 MB EBS and drops packets that violate the policer:

-j TRICOLORPOLICE --set-color-mode blind --set-cir 3000 --set-cbs 500 --set-pir 10000 --set-ebs 1000 --set-violate-action drop

Port Groups

Cumulus Linux supports profiles (port groups) for all features including ECN and RED. Profiles apply similar QoS configurations to a set of ports.

  • Configurations with a profile override the global settings for the ingress ports in the port group.
  • Ports not in a profile use the global settings.
  • To apply a profile to all ports, use the global profile.

Trust and Marking

You can use port groups to assign different profiles to different ports. A profile is a label for a group of configuration settings.

The following example configures two profiles. customer1 applies to swp1, swp4, and swp6. customer2 applies to swp5 and swp7.

cumulus@switch:~$ nv set qos mapping customer1 trust l3 
cumulus@switch:~$ nv set qos mapping customer1 dscp 0 switch-priority 1-7
cumulus@switch:~$ nv set interface swp1,swp4,swp6 qos mapping profile customer1
cumulus@switch:~$ nv set qos mapping customer2 trust l2
cumulus@switch:~$ nv set qos mapping customer2 pcp 1 switch-priority 4 
cumulus@switch:~$ nv set interface swp5,swp7 qos mapping profile customer2
cumulus@switch:~$ nv config apply

The following example configures the profile customports, which assigns traffic on swp1, swp2, and swp3 to switch priority 4 regardless of the ingress marking.

cumulus@switch:~$ nv set qos mapping customports trust port 
cumulus@switch:~$ nv set qos mapping customports port-default-sp 4
cumulus@switch:~$ nv set interface swp1,swp2,swp3 qos mapping profile customports
cumulus@switch:~$ nv config apply

You define profiles with the source.port_group_list configuration in the qos_features.conf file. A source.port_group_list is one or more names used for a group of settings.

The following example configures two profiles. customer1 applies to swp1, swp4, and swp6. customer2 applies to swp5 and swp7.

source.port_group_list = [customer1,customer2]
source.customer1.packet_priority_source_set = [dscp]
source.customer1.port_set = swp1-swp4,swp6
source.customer1.port_default_priority = 0
source.customer1.cos_0.priority_source.dscp = [0-7]
source.customer2.packet_priority_source_set = [802.1p]
source.customer2.port_set = swp5,swp7
source.customer2.port_default_priority = 0
source.customer2.cos_1.priority_source.8021p = [4]
ConfigurationDescription
source.port_group_listThe names of the port groups (profiles) you want to use.
The following example defines customer1 and customer2:
source.port_group_list = [customer1,customer2]
source.customer1.packet_priority_source_setThe ingress marking trust.
In the following example, ingress DSCP values are for group customer1:
source.customer1.packet_priority_source_set = [dscp]
source.customer1.port_setThe set of ports on which to apply the ingress marking trust policy.
In the following example, ports swp1, swp2, swp3, swp4, and swp6 are for customer1:
source.customer1.port_set = swp1-swp4,swp6
source.customer1.port_default_priorityThe default switch priority marking for unmarked or untrusted traffic.
In the following example, Cumulus Linux marks unmarked traffic or layer 2 traffic for customer1 ports with switch priority 0:
source.customer1.port_default_priority = 0
source.customer1.cos_0.priority_sourceThe ingress DSCP values to a switch priority value mapping for customer1.
In the following example, the set of DSCP values from 0 through 7 map to switch priority 0:
source.customer1.cos_0.priority_source.dscp = [0,1,2,3,4,5,6,7]
source.customer2.packet_priority_source_setThe ingress marking trust for customer2.
In the following example, 802.1p is trusted:
source.packet_priority_source_set = [802.1p]
source.customer2.port_setThe set of ports on which to apply the ingress marking trust policy.
In the following example, swp5 and swp7 apply for customer2:
source.customer2.port_set = swp5,swp7
source.customer2.port_default_priorityThe default switch priority marking for unmarked or untrusted traffic.
In the following example, Cumulus Linux marks unmarked tagged layer 2 traffic or unmarked VLAN tagged traffic for customer1 ports with switch priority 0:
source.customer2.port_default_priority = 0
source.customer2.cos_0.priority_sourceThe switch priority value to an ingress 802.1p value mapping for customer2.
The following example maps ingress 802.1p value 4 to switch priority 1:
source.customer2.cos_1.priority_source.8021p = [4]

The following example configures the profile customports, which assigns traffic on swp1, swp2, and swp3 to switch priority 4 regardless of the ingress marking.

source.port_group_list = [customports]
source.customports.packet_priority_source_set = [port]
source.customports.port_default_priority = 4
source.customports.port_set = swp1,swp2,swp3

Remarking

You can use profiles to remark 802.1p or DSCP on egress according to the switch priority (internal COS) value.

To change the marked value on a packet, the switch ASIC reads the enable or disable rewrite flag on the ingress port and refers to the mapping configuration on the egress port to change the marked value. To remark 802.1p or DSCP values, you have to enable the rewrite on the ingress port and configure the mapping on the egress port.

In the following example configuration, only packets that ingress on swp1 and egress on swp2 change the marked value of the packet. Packets that ingress on other ports and egress on swp2 do not change the marked value of the packet. The commands map switch priority 0 and 1 to egress DSCP 37.

cumulus@switch:~$ nv set qos remark remark_port_group1 rewrite l3
cumulus@switch:~$ nv set interface swp1 qos remark profile remark_port_group1
cumulus@switch:~$ nv set qos remark remark_port_group2 switch-priority 0 dscp 37
cumulus@switch:~$ nv set qos remark remark_port_group2 switch-priority 1 dscp 37
cumulus@switch:~$ nv set interface swp2 qos remark profile remark_port_group2
cumulus@switch:~$ nv config apply

You define these profiles with remark.port_group_list in the /etc/cumulus/datapath/qos/qos_features.conf file. The name is a label for configuration settings.

remark.port_group_list = [remark_port_group1,remark_port_group2]
remark.remark_port_group1.packet_priority_remark_set = [dscp]
remark.remark_port_group1.port_set = swp1
remark.remark_port_group2.packet_priority_remark_set = []
remark.remark_port_group2.port_set = swp2
remark.remark_port_group2.cos_0.priority_remark.dscp = [37]
remark.remark_port_group2.cos_1.priority_remark.dscp = [37]

Egress Scheduling

You can use port groups with egress scheduling weights to assign different profiles to different egress ports.

In the following example, the profile list2 applies to swp1, swp3, and swp18. list2 only assigns weights to queues 2, 5, and 6, and schedules the other queues on a best-effort basis when there is no congestion in queues 2, 5, or 6. list1 applies to swp2 and assigns weights to all queues.

cumulus@switch:~$ nv set qos egress-scheduler list2 traffic-class 2,5,6 mode dwrr 
cumulus@switch:~$ nv set qos egress-scheduler list2 traffic-class 2,5 bw-percent 50 
cumulus@switch:~$ nv set qos egress-scheduler list2 traffic-class 6 mode strict
cumulus@switch:~$ nv set interface swp1,swp3,swp18 qos egress-scheduler profile list2
cumulus@switch:~$ nv set qos egress-scheduler list1 traffic-class 0,3,4,5,6 mode dwrr 
cumulus@switch:~$ nv set qos egress-scheduler list1 traffic-class 0,3,4,5,6 bw-percent 10 
cumulus@switch:~$ nv set qos egress-scheduler list1 traffic-class 1 mode dwrr
cumulus@switch:~$ nv set qos egress-scheduler list1 traffic-class 1 bw-percent 20 
cumulus@switch:~$ nv set qos egress-scheduler list1 traffic-class 2 mode dwrr
cumulus@switch:~$ nv set qos egress-scheduler list1 traffic-class 2 bw-percent 30 
cumulus@switch:~$ nv set qos egress-scheduler list1 traffic-class 7 mode strict
cumulus@switch:~$ nv set interface swp2 qos egress-scheduler profile list1
cumulus@switch:~$ nv config apply

You define port groups with egress_sched.port_group_list in the /etc/cumulus/datapath/qos/qos_features.conf file. An egress_sched.port_group_list includes the names for the group settings. The name is a label (profile) for the configuration settings.

egress_sched.port_group_list = [list1,list2]
egress_sched.list1.port_set = swp2
egress_sched.list1.egr_queue_0.bw_percent = 10
egress_sched.list1.egr_queue_1.bw_percent = 20
egress_sched.list1.egr_queue_2.bw_percent = 30
egress_sched.list1.egr_queue_3.bw_percent = 10
egress_sched.list1.egr_queue_4.bw_percent = 10
egress_sched.list1.egr_queue_5.bw_percent = 10
egress_sched.list1.egr_queue_6.bw_percent = 10
egress_sched.list1.egr_queue_7.bw_percent = 0
#
egress_sched.list2.port_set = [swp1,swp3,swp18]
egress_sched.list2.egr_queue_2.bw_percent = 50
egress_sched.list2.egr_queue_5.bw_percent = 50
egress_sched.list2.egr_queue_6.bw_percent = 0
ConfigurationDescription
egress_sched.port_group_listThe names of the port groups (labels) to use.
The following example defines port groups list1 snd list2:
egress_sched.port_group_list = [list1,list2]
egress_sched.list1.port_setThe interfaces on which you want to apply the port group.
egress_sched.list1.port_set = swp2
egress_sched.list1.egr_queue_0.bw_percentThe percentage of bandwidth for egress queue 0.
egress_sched.list1.egr_queue_0.bw_percent = 10
egress_sched.list1.egr_queue_1.bw_percentThe percentage of bandwidth for egress queue 1.
egress_sched.list1.egr_queue_1.bw_percent = 20
egress_sched.list1.egr_queue_2.bw_percentThe percentage of bandwidth for egress queue 2.
egress_sched.list1.egr_queue_2.bw_percent = 30
egress_sched.list1.egr_queue_3.bw_percentThe percentage of bandwidth for egress queue 3.
egress_sched.list1.egr_queue_3.bw_percent = 10
egress_sched.list1.egr_queue_4.bw_percentThe percentage of bandwidth for egress queue 4.
egress_sched.list1.egr_queue_4.bw_percent = 10
egress_sched.list1.egr_queue_5.bw_percentThe percentage of bandwidth for egress queue 5.

egress_sched.list1.egr_queue_5.bw_percent = 10
egress_sched.list1.egr_queue_6.bw_percentThe percentage of bandwidth for egress queue 6.
egress_sched.list1.egr_queue_6.bw_percent = 10
egress_sched.list1.egr_queue_7.bw_percentThe percentage of bandwidth for egress queue 7.
0 indicates a strict priority queue:
egress_sched.list1.egr_queue_7.bw_percent = 0
egress_sched.list2.port_setThe interfaces you want to apply to the port group.
The following example applies swp1, swp3 and swp18 to port group list2:
egress_sched.list2.port_set = [swp1,swp3,swp18]
egress_sched.list2.egr_queue_2.bw_percentThe percentage of bandwidth for egress queue 2.
egress_sched.list2.egr_queue_2.bw_percent = 50
egress_sched.list2.egr_queue_5.bw_percentThe percentage of bandwidth for egress queue 5.
egress_sched.list2.egr_queue_5.bw_percent = 50
egress_sched.list2.egr_queue_6.bw_percentThe percentage of bandwidth for egress queue 6.
0 indicates a strict priority queue:
egress_sched.list2.egr_queue_6.bw_percent = 0

PFC

To set priority flow control on a group of ports, you create a profile to define the egress queues that support sending PFC pause frames and define the set of interfaces to which you want to apply PFC pause frame configuration. Cumulus Linux automatically enables PFC frame transmit and PFC frame receive, and derives all other PFC settings, such as the buffer limits that trigger PFC frames transmit to start and stop, the amount of reserved buffer space, and the cable length.

The following example applies a PFC profile called my_pfc_ports for egress queue 3 and 5 on swp1, swp2, swp3, swp4, and swp6.

cumulus@switch:~$ nv set qos pfc my_pfc_ports switch-priority 3,5
cumulus@switch:~$ nv set interface swp1-4,swp6 qos pfc profile my_pfc_ports
cumulus@switch:~$ nv config apply

The following example applies a PFC profile called my_pfc_ports2 for egress queue 0 on swp1. The commands disable PFC frame receive, and set the buffer limit that triggers PFC frame transmission to stop to 1500 bytes and to start to 1000 bytes. The commands also set the amount of reserved buffer space to 2000 bytes, and the cable length to 50 meters:

cumulus@switch:~$ nv set qos pfc my_pfc_ports2 switch-priority 0 
cumulus@switch:~$ nv set qos pfc my_pfc_ports2 xoff-threshold 1500 
cumulus@switch:~$ nv set qos pfc my_pfc_ports2 xon-threshold 1000 
cumulus@switch:~$ nv set qos pfc my_pfc_ports2 tx enable 
cumulus@switch:~$ nv set qos pfc my_pfc_ports2 rx disable 
cumulus@switch:~$ nv set qos pfc my_pfc_ports2 port-buffer 2000 
cumulus@switch:~$ nv set qos pfc my_pfc_ports2 cable-length 50
cumulus@switch:~$ nv set interface swp1 qos pfc profile my_pfc_ports2
cumulus@switch:~$ nv config apply
All PFC commands
Command
Description
nv set qos pfc <profile> port-buffer <value>The amount of reserved buffer space (from the global shared buffer) for the interfaces defined in the port group list .
The following example sets the amount of reserved buffer space to 25000 bytes:
nv set qos pfc my_pfc_ports port-buffer 25000
nv set qos pfc <profile> xoff-threshold <value>The amount of reserved buffer that the switch must consume before sending a PFC pause frame out of the set of interfaces in the port group list.
The following example sends PFC pause frames after consuming 20000 bytes of reserved buffer:
nv set qos pfc my_pfc_ports xoff-threshold 20000
nv set qos pfc <profile> xon-threshold <value>The number of bytes below the xoff threshold that the buffer consumption must drop below before sending PFC pause frames stops.
In the following example, the buffer congestion must reduce by 1000 bytes (to 8000 bytes) before PFC pause frames stop:
nv set qos pfc my_pfc_ports xon-threshold 1000
nv set qos pfc <profile> rx enable
nv set qos pfc <profile> rx disable
Enables or disables sending PFC pause frames. The default value is enable.
The following example disables sending PFC pause frames:
nv set qos pfc my_pfc_ports rx disable
nv set qos pfc <profile> tx enable
nv set qos pfc <profile> tx disable
Enables or disables receiving PFC pause frames. You do not need to define the COS values for rx enable. The switch receives any COS value. The default value is enable.
The following example disables receiving PFC pause frames:
nv set qos pfc my_pfc_ports tx disable
nv set qos pfc <profile> cable-length <value>The length, in meters, of the cable that attaches to the ports. Cumulus Linux uses this value internally to determine the latency between generating a PFC pause frame and receiving the PFC pause frame. The default is 10 meters.
The following example sets the cable length to 5 meters:
nv set qos pfc my_pfc_ports cable-length 5

Edit the priority flow control section of the /etc/cumulus/datapath/qos/qos_features.conf file.

The following example applies a PFC profile called my_pfc_ports for egress queue 3 and 5 on swp1, swp2, swp3, swp4, and swp6.

pfc.port_group_list = [my_pfc_ports2]
pfc.my_pfc_ports2.cos_list = [0]
pfc.my_pfc_ports2.port_set = swp1

The following example applies a PFC profile called my_pfc_ports2 for egress queue 0 on swp1. The commands also disable PFC frame receive, and set the xoff-size to 1500 bytes, the xon-size to 1000 bytes, the headroom to 2000 bytes, and the cable length to 10 meters:

pfc.port_group_list = [my_pfc_ports2]
pfc.my_pfc_ports2.cos_list = [0]
pfc.my_pfc_ports2.port_set = swp1
pfc.my_pfc_ports2.port_buffer_bytes = 2000
pfc.my_pfc_ports2.xoff_size = 1500
pfc.my_pfc_ports2.xon_delta = 1000
pfc.my_pfc_ports2.tx_enable = true
pfc.my_pfc_ports2.rx_enable = false
pfc.my_pfc_ports2.cable_length = 10
All PFC configuration options
ConfigurationDescription
pfc.my_pfc_ports.port_buffer_bytesThe amount of reserved buffer space (from the global shared buffer) for the interfaces defined in the port group list.
The following example sets the amount of reserved buffer space to 25000 bytes:
pfc.my_pfc_ports.port_buffer_bytes = 25000
pfc.my_pfc_ports.xoff_sizeThe amount of reserved buffer that the switch must consume before sending a PFC pause frame out the set of interfaces in the port group list.
The following example sends PFC pause frames after consuming 10000 bytes of reserved buffer:
pfc.my_pfc_ports.xoff_size = 10000
pfc.my_pfc_ports.xon_deltaThe number of bytes below the xoff threshold that the buffer consumption must drop below before sending PFC pause frames stops.
The following example the buffer congestion must reduce by 2000 bytes (to 8000 bytes) before PFC pause frames stop:
pfc.my_pfc_ports.xon_delta = 2000
pfc.my_pfc_ports.rx_enableEnables (true) or disables (false) sending PFC pause frames. The default value is true.
The following example enables sending PFC pause frames:
pfc.my_pfc_ports.tx_enable = true
pfc.my_pfc_ports.tx_enableEnables (true) or disables (false) receiving PFC pause frames. You do not need to define the COS values for rx_enable. The switch receives any COS value. The default value is true.
The following example enables receiving PFC pause frames:
pfc.my_pfc_ports.rx_enable = true
pfc.my_pfc_ports.cable_lengthThe length, in meters, of the cable that attaches to the port in the port group list. Cumulus Linux uses this value internally to determine the latency between generating a PFC pause frame and receiving the PFC pause frame. The default is 10 meters
In this example, the cable is 5 meters:
pfc.my_pfc_ports.cable_length = 5

ECN

You can create ECN profiles and assign them to different ports.

The following example creates a custom ECN profile called my-red-profile for egress queue (traffic-class) 1 and 2. The commands set the minimum buffer threshold to 40000 bytes, maximum buffer threshold to 200000 bytes, and the probability to 10. The commands also enable RED and apply the ECN profile to swp1 and swp2.

cumulus@switch:~$ nv set qos congestion-control my-red-profile traffic-class 1,2 min-threshold-bytes 40000 
cumulus@switch:~$ nv set qos congestion-control my-red-profile traffic-class 1,2 max-threshold-bytes 200000 
cumulus@switch:~$ nv set qos congestion-control my-red-profile traffic-class 1,2 probability 10
cumulus@switch:~$ nv set qos congestion-control my-red-profile traffic-class 1,2 red enable
cumulus@switch:~$ nv set interface swp1,swp2 qos congestion-control my-red-profile
cumulus@switch:~$ nv config apply

You can configure different thresholds and probability values for different traffic classes in a custom profile:

cumulus@switch:~$ nv set qos congestion-control my-red-profile traffic-class 1,2 min-threshold-bytes 40000 
cumulus@switch:~$ nv set qos congestion-control my-red-profile traffic-class 1,2 max-threshold-bytes 200000 
cumulus@switch:~$ nv set qos congestion-control my-red-profile traffic-class 1,2 probability 10
cumulus@switch:~$ nv set qos congestion-control my-red-profile traffic-class 1,2 red enable
cumulus@switch:~$ nv set qos congestion-control my-red-profile traffic-class 4 min-threshold-bytes 30000 
cumulus@switch:~$ nv set qos congestion-control my-red-profile traffic-class 4 max-threshold-bytes 150000 
cumulus@switch:~$ nv set qos congestion-control my-red-profile traffic-class 4 probability 80
cumulus@switch:~$ nv set interface swp1,swp2 qos congestion-control my-red-profile
cumulus@switch:~$ nv config apply

You can disable ECN bit marking for an ECN profile. The following example disables ECN bit marking in the my-red-profile profile:

cumulus@switch:~$ nv set qos congestion-control my-red-profile traffic-class 1 ecn disable
cumulus@switch:~$ nv config apply

Edit the Explicit Congestion Notification section of the /etc/cumulus/datapath/qos/qos_features.conf file.

The following example creates a custom ECN profile called my-red-profile for egress queue 1 and 2, with a minimum buffer threshold of 40000 bytes, maximum buffer threshold of 200000 bytes, and a probability of 10. The commands also enable RED and apply the ECN profile to swp1 and swp2.

ecn_red.port_group_list = [my-red-profile] 
my-red-profile.egress_queue_list = [1,2]
my-red-profile.port_set = swp1,swp2
my-red-profile.ecn_enable = true
my-red-profile.red_enable = true
my-red-profile.min_threshold_bytes = 40000
my-red-profile.max_threshold_bytes = 200000
my-red-profile.probability = 10

To disable ECN bit marking, set ecn_enable to false. The following example disables ECN bit marking in the my-red-profile.

...
my-red-profile.ecn_enable = false 
...

Traffic Pools

Cumulus Linux supports adjusting the following traffic pools:

Traffic PoolDescription
default-lossyThe default traffic pool for all switch priorities.
default-losslessThe traffic pool for lossless traffic when you enable flow control.
mc-lossyThe traffic pool for multicast traffic.
roce-lossyThe traffic pool for RoCE lossy mode.
roce-losslessThe traffic pool for RoCE lossless mode.

  • You can only have a single lossless pool configured on the switch at a time. Configure the roce-lossless pool when you are using RoCE, otherwise configure the default-lossless pool.

  • You can configure multiple lossy pools concurrently.

You configure a traffic pool by associating switch priorities and defining the buffer memory percentages allocated to the pools. The following example associates switch priority 2 and allocates a memory percentage of 30 for the mc-lossy pool:

cumulus@switch:~$ nv set qos traffic-pool default-lossy switch-priority 0,1,3,4,5,6,7
cumulus@switch:~$ nv set qos traffic-pool default-lossy memory-percent 70
cumulus@switch:~$ nv set qos traffic-pool mc-lossy switch-priority 2
cumulus@switch:~$ nv set qos traffic-pool mc-lossy memory-percent 30
cumulus@switch:~$ nv config apply

Configure the following settings in the /etc/mlx/datapath/qos/qos_infra.conf file:

traffic.priority_group_list = [service2,bulk]

priority_group.service2.cos_list = [2]
priority_group.bulk.cos_list = [0,1,3,4,5,6,7]

priority_group.service2.id = 2

priority_group.service2.service_pool = 2

ingress_service_pool.2.percent = 30
ingress_service_pool.0.percent = 70

port.service_pool.2.ingress_buffer.reserved = 10240

ingress_service_pool.2.mode = 1

port.service_pool.2.ingress_buffer.dynamic_quota = ALPHA_8

priority_group.service2.ingress_buffer.dynamic_quota = ALPHA_8

egress_buffer.egr_queue_2.uc.service_pool = 2

egress_service_pool.2.percent = 30
egress_service_pool.0.percent = 70

port.service_pool.2.egress_buffer.uc.reserved = 0

egress_buffer.cos_2.mc.service_pool = 2

egress_buffer.egr_queue_2.uc.reserved = 1024

port.egress_buffer.mc.reserved = 10240
port.egress_buffer.mc.shared_size = 2097152
egress_service_pool.2.mode = 1

port.service_pool.2.egress_buffer.uc.dynamic_quota = ALPHA_8

egress_buffer.egr_queue_2.uc.dynamic_quota = ALPHA_8

egress_buffer.cos_2.mc.dynamic_quota = ALPHA_8

For additional default-lossless and RoCE pool examples, see Flow Control Buffers and RoCE. You can view traffic-pool configuration with the nv show qos traffic-pool <pool name> command:

cumulus@switch:~$  nv show qos traffic-pool default-lossy
                   applied
-----------------  -------
memory-percent     80     
[switch-priority]  0      
[switch-priority]  1      
[switch-priority]  2      
[switch-priority]  3      
[switch-priority]  4      
[switch-priority]  5      
[switch-priority]  6      
[switch-priority]  7      

Advanced Buffer Tuning

You can use NVUE commands to tune advanced buffer properties in addition to the supported traffic pool configurations. Advanced buffer configuration can override the base traffic-pool profiles configured on the system.

You can only configure advanced buffer settings for the default-global profile.

Buffer Regions

You can adjust advanced buffer settings with the following NVUE command:

You can adjust settings for the following supported buffer regions and properties:

BuffersSupported Property Values
ingress-lossy-buffer
    Cumulus Linux supports the following properties for the bulk, control, and service[1-6] priority groups:
    name - The priority group alias name.
    reserved - The reserved buffer allocation in bytes.
    service-pool - Service pool mapping.
    shared-alpha - The dynamic shared buffer alpha allocation.
    shared-bytes - The static shared buffer allocation in bytes.
    switch-priority - Switch priority values.
egress-lossless-buffer
    reserved - The reserved buffer allocation in bytes.
    service-pool - Service pool mapping.
    shared-alpha - The dynamic shared buffer alpha allocation.
    shared-bytes - The static shared buffer allocation in bytes.
ingress-lossless-buffer
    service-pool - Service pool mapping.
    shared-alpha - The dynamic shared buffer alpha allocation.
    shared-bytes - The static shared buffer allocation in bytes.
egress-lossy-buffer
    multicast-port - Multicast port reserved or shared-bytes allocation in bytes.
    multicast-switch-priority [0-7] - Set the reserved, service-pool,shared-alpha, or shared-bytes properties for each multicast switch priority.
    traffic-class [0-15] - Set the reserved, service-pool,shared-alpha, or shared-bytes properties for each traffic class.

Configure shared-bytes for buffer regions mapped to static service pools, and shared-alpha for buffer regions mapped to dynamic service pools.

The shared buffer alpha value determines the proportion of available shared memory allocated across buffer regions. Regions with higher alpha values receive a higher proportion of available shared buffer memory. The following example changes the ingress-lossless-buffer shared alpha value to alpha_2 when using RoCE lossless mode:

cumulus@switch:~$ nv set qos advance-buffer-config default-global ingress-lossless-buffer shared-alpha alpha_2
cumulus@switch:~$ nv config apply

Service Pools

You can configure ingress and egress service pool profile properties with the following NVUE commands:

You can adjust the following properties for each pool:

PropertyDescription
infiniteThe pool infinite flag.
memory-percentThe pool memory percent allocation.
modeThe pool mode: static or dynamic.
reservedThe reserved buffer allocation in bytes.
shared-alphaThe dynamic shared buffer alpha allocation.
shared-bytesThe static shared buffer allocation in bytes.

A relationship exists between the default traffic pools and the advanced buffer configuration settings.

Use caution when configuring advanced buffer settings. NVUE presents a warning if you attempt to apply incompatible traffic pool and advanced buffer configurations. NVUE performs the following validation checks before applying advanced buffer configurations:

  • You must map all switch priorities (0-7) to a priority group. You can map more than one switch priority to the same priority group.
  • The sum of memory-percent values across all ingress pools must be less than or equal to 100 percent.
  • The sum of memory-percent values across all egress pools must be less than or equal to 100 percent.

Reference the table below to view the mappings between the default traffic pool and advanced buffer properties:

Default Traffic PoolDefault Traffic Pool PropertiesAdvanced Buffer Region or Service PoolAdvanced Buffer Properties
default-lossymemory-percentingress-pool 0
egress-pool 0
memory-percent
default-lossyswitch-priorityingress-lossy-bufferpriority-group bulk switch-priority
default-losslessmemory-percentingress-pool 1
egress-pool 1
memory-percent
roce-losslessmemory-percentingress-pool 1
egress-pool 1
memory-percent
mc-lossymemory-percentingress-pool 2
egress-pool 2
memory-percent
mc-lossyswitch-priorityingress-lossy-bufferpriority-group service2 switch-priority

For example, to assign 20 percent of memory to a new static service pool, you must allow 20 percent of memory to be available from the default traffic pools. The following commands reduce the default-lossy traffic pool to 80 percent memory, allowing you to assign the memory to ingress-pool 3:

cumulus@switch:~$ nv set qos traffic-pool default-lossy memory-percent 80
cumulus@switch:~$ nv set qos advance-buffer-config default-global ingress-pool 3 memory-percent 20
cumulus@switch:~$ nv config apply

You can view advanced buffer configuration with the nv show qos advance-buffer-config default-global <buffer/pool name> command:

cumulus@switch:~$ nv show qos advance-buffer-config default-global ingress-pool
Pool-Id  infinite  memory-percent  mode     reserved  shared-alpha  shared-bytes
-------  --------  --------------  -------  --------  ------------  ------------
0                  80              dynamic                                      
3                  20    

Syntax Checker

Cumulus Linux provides a syntax checker for the qos_features.conf and qos_infra.conf files to check for errors, such missing parameters or invalid parameter labels and values.

The syntax checker runs automatically with every switchd reload.

You can run the syntax checker manually from the command line with the cl-consistency-check --datapath-syntax-check command. If errors exist, they write to stderr by default. If you run the command with -q, errors write to the /var/log/switchd.log file.

The cl-consistency-check --datapath-syntax-check command takes the following options:

Option
Description
-hDisplays this list of command options.
-qRuns the command in quiet mode. Errors write to the /var/log/switchd.log file instead of stderr.
-qiRuns the syntax checker against a specified qos_infra.conf file.
-qfRuns the syntax checker against a specified qos_features.conf file.

By default the syntax checker assumes:

You can run the syntax checker when switchd is either running or stopped.

Show Qos Counters

NVUE provides the following commands to show QoS statistics for an interface:

NVUE Command
Description
nv show interface <interface> counters qosShows all QoS statistics for a specific interface.
nv show interface <interface> counters qos egress-queue-statsShows QoS egress queue statistics for a specific interface.
nv show interface <interface> counters qos ingress-buffer-statsShows QoS ingress buffer statistics for a specific interface.
nv show interface <interface> counters qos pfc-statsShows QoS PFC statistics for a specific interface.
nv show interface <interface> counters qos port-statsShows QoS port statistics for a specific interface.

The following example shows all QoS statistics for swp1:

cumulus@switch:~$ nv show interface swp1 counters qos
Ingress Buffer Statistics
============================
    priority-group  rx-frames  rx-buffer-discards  rx-shared-buffer-discards
    --------------  ---------  ------------------  -------------------------
    0               0          0 Bytes             0 Bytes                  
    1               0          0 Bytes             0 Bytes                  
    2               0          0 Bytes             0 Bytes                  
    3               0          0 Bytes             0 Bytes                  
    4               0          0 Bytes             0 Bytes                  
    5               0          0 Bytes             0 Bytes                  
    6               0          0 Bytes             0 Bytes                  
    7               0          0 Bytes             0 Bytes                  

Egress Queue Statistics
==========================
    traffic-class  tx-frames  tx-bytes  tx-uc-buffer-discards  wred-discards
    -------------  ---------  --------  ---------------------  -------------
    0              0          0 Bytes   0 Bytes                0            
    1              0          0 Bytes   0 Bytes                0            
    2              0          0 Bytes   0 Bytes                0            
    3              0          0 Bytes   0 Bytes                0            
    4              0          0 Bytes   0 Bytes                0            
    5              0          0 Bytes   0 Bytes                0            
    6              0          0 Bytes   0 Bytes                0            
    7              0          0 Bytes   0 Bytes                0            

PFC Statistics
=================
    switch-priority  rx-pause-frames  rx-pause-duration  tx-pause-frames  tx-pause-duration
    ---------------  ---------------  -----------------  ---------------  -----------------
    0                0                0                  0                0                
    1                0                0                  0                0                
    2                0                0                  0                0                
    3                0                0                  0                0                
    4                0                0                  0                0                
    5                0                0                  0                0                
    6                0                0                  0                0                
    7                0                0                  0                0                

Qos Port Statistics
======================
    Counter             Receive  Transmit
    ------------------  -------  --------
    ecn-marked-packets  n/a      0       
    mc-buffer-discards  n/a      0       
    pause-frames        0        0
... 

Clear QoS Buffers

cumulus@switch:~$ nv action clear qos buffer pool
all qos pool buffers cleared.
Action succeeded

Default Configuration Files

qos_features.conf
# /etc/cumulus/datapath/qos/qos_features.conf
#
# Copyright © 2021 NVIDIA CORPORATION & AFFILIATES. ALL RIGHTS RESERVED.
#
# This software product is a proprietary product of Nvidia Corporation and its affiliates
# (the "Company") and all right, title, and interest in and to the software
# product, including all associated intellectual property rights, are and
# shall remain exclusively with the Company.
#
# This software product is governed by the End User License Agreement
# provided with the software product. 

# packet header field used to determine the packet priority level
# fields include {802.1p, dscp, port}
traffic.packet_priority_source_set = [802.1p]
traffic.port_default_priority      = 0

# packet priority source values assigned to each internal cos value
# internal cos values {cos_0..cos_7}
# (internal cos 3 has been reserved for CPU-generated traffic)
# 802.1p values = {0..7}
traffic.cos_0.priority_source.8021p = [0]
traffic.cos_1.priority_source.8021p = [1]
traffic.cos_2.priority_source.8021p = [2]
traffic.cos_3.priority_source.8021p = [3]
traffic.cos_4.priority_source.8021p = [4]
traffic.cos_5.priority_source.8021p = [5]
traffic.cos_6.priority_source.8021p = [6]
traffic.cos_7.priority_source.8021p = [7]

# dscp values = {0..63}
#traffic.cos_0.priority_source.dscp = [0,1,2,3,4,5,6,7]
#traffic.cos_1.priority_source.dscp = [8,9,10,11,12,13,14,15]
#traffic.cos_2.priority_source.dscp = [16,17,18,19,20,21,22,23]
#traffic.cos_3.priority_source.dscp = [24,25,26,27,28,29,30,31]
#traffic.cos_4.priority_source.dscp = [32,33,34,35,36,37,38,39]
#traffic.cos_5.priority_source.dscp = [40,41,42,43,44,45,46,47]
#traffic.cos_6.priority_source.dscp = [48,49,50,51,52,53,54,55]
#traffic.cos_7.priority_source.dscp = [56,57,58,59,60,61,62,63]
# remark packet priority value
# fields include {802.1p, dscp}
traffic.packet_priority_remark_set = []

# packet priority remark values assigned from each internal cos value
# internal cos values {cos_0..cos_7}
# (internal cos 3 has been reserved for CPU-generated traffic)
# 802.1p values = {0..7}
#traffic.cos_0.priority_remark.8021p = [0]
#traffic.cos_1.priority_remark.8021p = [1]
#traffic.cos_2.priority_remark.8021p = [2]
#traffic.cos_3.priority_remark.8021p = [3]
#traffic.cos_4.priority_remark.8021p = [4]
#traffic.cos_5.priority_remark.8021p = [5]
#traffic.cos_6.priority_remark.8021p = [6]
#traffic.cos_7.priority_remark.8021p = [7]

# dscp values = {0..63}
#traffic.cos_0.priority_remark.dscp = [0]
#traffic.cos_1.priority_remark.dscp = [8]
#traffic.cos_2.priority_remark.dscp = [16]
#traffic.cos_3.priority_remark.dscp = [24]
#traffic.cos_4.priority_remark.dscp = [32]
#traffic.cos_5.priority_remark.dscp = [40]
#traffic.cos_6.priority_remark.dscp = [48]
#traffic.cos_7.priority_remark.dscp = [56]

# source.port_group_list = [source_port_group]
# source.source_port_group.packet_priority_source_set = [dscp]
# source.source_port_group.port_set = swp1-swp4,swp6
# source.source_port_group.port_default_priority = 0
# source.source_port_group.cos_0.priority_source.dscp = [0,1,2,3,4,5,6,7]
# source.source_port_group.cos_1.priority_source.dscp = [8,9,10,11,12,13,14,15]
# source.source_port_group.cos_2.priority_source.dscp = [16,17,18,19,20,21,22,23]
# source.source_port_group.cos_3.priority_source.dscp = [24,25,26,27,28,29,30,31]
# source.source_port_group.cos_4.priority_source.dscp = [32,33,34,35,36,37,38,39]
# source.source_port_group.cos_5.priority_source.dscp = [40,41,42,43,44,45,46,47]
# source.source_port_group.cos_6.priority_source.dscp = [48,49,50,51,52,53,54,55]
# source.source_port_group.cos_7.priority_source.dscp = [56,57,58,59,60,61,62,63]

# remark.port_group_list = [remark_port_group]
# remark.remark_port_group.packet_priority_remark_set = [dscp]
# remark.remark_port_group.port_set = swp1-swp4,swp6
# remark.remark_port_group.cos_0.priority_remark.dscp = [0]
# remark.remark_port_group.cos_1.priority_remark.dscp = [8]
# remark.remark_port_group.cos_2.priority_remark.dscp = [16]
# remark.remark_port_group.cos_3.priority_remark.dscp = [24]
# remark.remark_port_group.cos_4.priority_remark.dscp = [32]
# remark.remark_port_group.cos_5.priority_remark.dscp = [40]
# remark.remark_port_group.cos_6.priority_remark.dscp = [48]
# remark.remark_port_group.cos_7.priority_remark.dscp = [56]

# to configure priority flow control on a group of ports:
# -- assign cos value(s) to the cos list
# -- add or replace a port group names in the port group list
# -- for each port group in the list
#    -- populate the port set, e.g.
#       swp1-swp4,swp8,swp50s0-swp50s3
#    -- set a PFC buffer size in bytes for each port in the group
#    -- set the xoff byte limit (buffer limit that triggers PFC frames transmit to start)
#    -- set the xon byte delta (buffer limit that triggers PFC frames transmit to stop)
#    -- enable PFC frame transmit and/or PFC frame receive

# priority flow control
#pfc.port_group_list = [pfc_port_group]
#pfc.pfc_port_group.cos_list = []
#pfc.pfc_port_group.port_set = swp1-swp4,swp6
#pfc.pfc_port_group.port_buffer_bytes = 25000
#pfc.pfc_port_group.xoff_size = 10000
#pfc.pfc_port_group.xon_delta = 2000
#pfc.pfc_port_group.tx_enable = true
#pfc.pfc_port_group.rx_enable = true
#Specify cable length in mts
#pfc.pfc_port_group.cable_length = 10

# to configure pause on a group of ports:
# -- add or replace port group names in the port group list
# -- for each port group in the list
#    -- populate the port set, e.g.
#       swp1-swp4,swp8,swp50s0-swp50s3
#    -- set a pause buffer size in bytes for each port
#    -- set the xoff byte limit (buffer limit that triggers pause frames transmit to start)
#    -- set the xon byte delta (buffer limit that triggers pause frames transmit to stop)
#    -- enable pause frame transmit and/or pause frame receive

# link pause
# link_pause.port_group_list = [pause_port_group]
# link_pause.pause_port_group.port_set = swp1-swp4,swp6
# link_pause.pause_port_group.port_buffer_bytes = 25000
# link_pause.pause_port_group.xoff_size = 10000
# link_pause.pause_port_group.xon_delta = 2000
# link_pause.pause_port_group.rx_enable = true
# link_pause.pause_port_group.tx_enable = true
# Specify cable length in mts
# link_pause.pause_port_group.cable_length = 10

# Explicit Congestion Notification
# to configure ECN and RED on a group of ports:
# -- add or replace port group names in the port group list
# -- assign cos value(s) to the cos list
# -- for each port group in the list
#    -- populate the port set, e.g.
#       swp1-swp4,swp8,swp50s0-swp50s3
# -- to enable RED requires the latest traffic.conf
#Default ECN configuration on TC0
default_ecn_red_conf.egress_queue_list = [0]
default_ecn_red_conf.ecn_enable = true
default_ecn_red_conf.red_enable = false
default_ecn_red_conf.min_threshold_bytes = 150000
default_ecn_red_conf.max_threshold_bytes = 1500000
default_ecn_red_conf.probability = 100

#ecn_red.port_group_list = [ecn_red_port_group]
#ecn_red.ecn_red_port_group.egress_queue_list = [1]
#ecn_red.ecn_red_port_group.port_set = allports
#ecn_red.ecn_red_port_group.ecn_enable = true
#ecn_red.ecn_red_port_group.red_enable = false
#ecn_red.ecn_red_port_group.min_threshold_bytes = 40000
#ecn_red.ecn_red_port_group.max_threshold_bytes = 200000
#ecn_red.ecn_red_port_group.probability = 100

# Hierarchical traffic shaping
# to configure shaping at 2 levels:
#     - per egress queue egr_queue_0 - egr_queue_7
#     - port level aggregate
# -- add or replace a port group names in the port group list
# -- for each port group in the list
#    -- populate the port set, e.g.
#       swp1-swp4,swp8,swp50s0-swp50s3
#    -- set min and max rates in kbps for each egr_queue [min, max]
#    -- set max rate in kbps at port level
# shaping.port_group_list = [shaper_port_group]
# shaping.shaper_port_group.port_set = swp1-swp3,swp5,swp7s0-swp7s3
# shaping.shaper_port_group.egr_queue_0.shaper = [50000, 100000]
# shaping.shaper_port_group.egr_queue_1.shaper = [51000, 150000]
# shaping.shaper_port_group.egr_queue_2.shaper = [52000, 200000]
# shaping.shaper_port_group.egr_queue_3.shaper = [53000, 250000]
# shaping.shaper_port_group.egr_queue_4.shaper = [54000, 300000]
# shaping.shaper_port_group.egr_queue_5.shaper = [55000, 350000]
# shaping.shaper_port_group.egr_queue_6.shaper = [56000, 400000]
# shaping.shaper_port_group.egr_queue_7.shaper = [57000, 450000]
# shaping.shaper_port_group.port.shaper = 900000

# default egress scheduling weight per egress queue
# To be applied to all the ports if port_group profile not configured
# If you do not specify any bw_percent of egress_queues, those egress queues
# will assume DWRR weight 0 - no egress scheduling for those queues
# '0' indicates strict priority

default_egress_sched.egr_queue_0.bw_percent = 12
default_egress_sched.egr_queue_1.bw_percent = 13
default_egress_sched.egr_queue_2.bw_percent = 12
default_egress_sched.egr_queue_3.bw_percent = 13
default_egress_sched.egr_queue_4.bw_percent = 12
default_egress_sched.egr_queue_5.bw_percent = 13
default_egress_sched.egr_queue_6.bw_percent = 12
default_egress_sched.egr_queue_7.bw_percent = 13

# port_group profile for egress scheduling weight per egress queue
# If you do not specify any bw_percent of egress_queues, those egress queues
# will assume DWRR weight 0 - no egress scheduling for those queues
# '0' indicates strict priority
#egress_sched.port_group_list = [sched_port_group1]
#egress_sched.sched_port_group1.port_set = swp2
#egress_sched.sched_port_group1.egr_queue_0.bw_percent = 10
#egress_sched.sched_port_group1.egr_queue_1.bw_percent = 20
#egress_sched.sched_port_group1.egr_queue_2.bw_percent = 30
#egress_sched.sched_port_group1.egr_queue_3.bw_percent = 10
#egress_sched.sched_port_group1.egr_queue_4.bw_percent = 10
#egress_sched.sched_port_group1.egr_queue_5.bw_percent = 10
#egress_sched.sched_port_group1.egr_queue_6.bw_percent = 10
#egress_sched.sched_port_group1.egr_queue_7.bw_percent = 0

# Cut-through is disabled by default on all chips with the exception of
# Spectrum.  On Spectrum cut-through cannot be disabled.
#cut_through_enable = false
qos_infra.conf
#
# Default qos-infra configuration for Mellanox Spectrum chip
#
# Copyright © 2021 NVIDIA CORPORATION & AFFILIATES. ALL RIGHTS RESERVED.
#
# This software product is a proprietary product of Nvidia Corporation and its affiliates
# (the "Company") and all right, title, and interest in and to the software
# product, including all associated intellectual property rights, are and
# shall remain exclusively with the Company.
#
# This software product is governed by the End User License Agreement
# provided with the software product. 

# scheduling algorithm: algorithm values = {dwrr}
scheduling.algorithm = dwrr

# priority groups
# supported group names are control, bulk, service1-6
traffic.priority_group_list = [bulk]

# internal cos values assigned to each priority group
# each cos value should be assigned exactly once
# internal cos values {0..7}
priority_group.bulk.cos_list = [0,1,2,3,4,5,6,7]

# Alias Name defined for each priority group
# Valid string between 0-255 chars
# Sample alias support for naming priority groups
#priority_group.bulk.alias = "Bulk"

# priority group ID assigned to each priority group
#priority_group.control.id = 7
#priority_group.service2.id = 2
priority_group.bulk.id = 0

# all priority groups share a service pool on Spectrum
# service pools assigned to each priority group
priority_group.bulk.service_pool = 0

# service pool assigned for lossless PGs
#flow_control.ingress_service_pool = 0

# --- ingress buffer space allocations ---
# total buffer
#  - ingress minimum buffer allocations
#  - ingress service pool buffer allocations
#  - priority group ingress headroom allocations
#  - ingress global headroom allocations
#  = total ingress shared buffer size

# ingress service pool buffer allocation: percent of total buffer
# If a service pool has no priority groups, the buffer is added
# to the shared buffer space.
ingress_service_pool.0.percent = 100

# Ingress buffer port.pool buffer : size in bytes
#port.service_pool.0.ingress_buffer.reserved = 10240
#port.service_pool.0.ingress_buffer.shared_size = 9000
#port.management.ingress_buffer.reserved = 0


# priority group minimum buffer allocation: size in bytes
# priority group shared buffer allocation: shared buffer size in bytes
# if a priority group has no packet priority values assigned to it, the buffers will not be allocated

#priority_group.bulk.ingress_buffer.reserved           = 0
#priority_group.bulk.ingress_buffer.shared_size        = 15

# ---- ingress dynamic buffering settings
# To enable ingress static pool, set the mode to 0
ingress_service_pool.0.mode = 1

# The ALPHA defines the max% of buffers (quota) available on a
# per ingress port OR ipool, Ingress PG, Egress TC, Egress port OR epool.
# ALPHA value equates to the following buffer limit calculated as:
# alpha%(alpha+1) = Max Buffer percentage

# https://community.mellanox.com/s/article/understanding-the-alpha-parameter-in-the-buffer-configuration-of-mellanox-spectrum-switches
# Each shared buffer pool can use a maximum of [total_buffer * (alpha / (alpha+1))]
# Configure quota values mapped to the following alpha values:
# Configuration value = alpha level:
# Both ALPHA_*(string representation) as well as integer values (old representation) will be supported for alpha
# 0/ALPHA_0  = alpha 0
# 1/ALPHA_1_128  = alpha 1/128
# 2/ALPHA_1_64  = alpha 1/64
# 3/ALPHA_1_32  = alpha 1/32
# 4/ALPHA_1_16  = alpha 1/16
# 5/ALPHA_1_8  = alpha 1/8
# 6/ALPHA_1_4  = alpha 1/4
# 7/ALPHA_1_2  = alpha 1/2
# 8/ALPHA_1  = alpha  1
# 9/ALPHA_2  = alpha  2
# 10/ALPHA_4 = alpha  4
# 11/ALPHA_8 = alpha  8
# 12/ALPHA_16 = alpha 16
# 13/ALPHA_32 = alpha 32
# 14/ALPHA_64 = alpha 64
# 15/ALPHA_INFINITY = alpha Infinity

# Ingress buffer per-port dynamic buffering alpha (Default: ALPHA_8)
#port.service_pool.0.ingress_buffer.dynamic_quota = ALPHA_8
#port.management.ingress_buffer.dynamic_quota = ALPHA_8


# Ingress buffer dynamic buffering alpha for lossless PGs (if any; Default: ALPHA_1)
#flow_control.ingress_buffer.dynamic_quota = ALPHA_1

# Ingress buffer per-PG dynamic buffering alpha (Default: ALPHA_8)
#priority_group.bulk.ingress_buffer.dynamic_quota = ALPHA_8

# --- egress buffer space allocations ---
# total egress buffer
#  - minimum buffer allocations
#  = total service pool buffer size
# service pool assigned for lossless PGs
#flow_control.egress_service_pool = 0

# service pool assigned for egress queues
egress_buffer.egr_queue_0.uc.service_pool = 0
egress_buffer.egr_queue_1.uc.service_pool = 0
egress_buffer.egr_queue_2.uc.service_pool = 0
egress_buffer.egr_queue_3.uc.service_pool = 0
egress_buffer.egr_queue_4.uc.service_pool = 0
egress_buffer.egr_queue_5.uc.service_pool = 0
egress_buffer.egr_queue_6.uc.service_pool = 0
egress_buffer.egr_queue_7.uc.service_pool = 0

# Service pool buffer allocation: percent of total
# buffer size.
egress_service_pool.0.percent = 100

# Egress buffer port.pool buffer : size in bytes
#port.service_pool.0.egress_buffer.uc.reserved = 10240
#port.service_pool.0.egress_buffer.uc.shared_size = 9000
#port.management.egress_buffer.reserved = 0

# Front panel port egress buffer limits enforced for each
# priority group.
# Unlimited egress buffers not supported on Spectrum.
#priority_group.bulk.unlimited_egress_buffer     = false

# if a priority group has no cos values assigned to it, the buffers will not be allocated

# Service pool mapping for MC.SP region
egress_buffer.cos_0.mc.service_pool = 0
egress_buffer.cos_1.mc.service_pool = 0
egress_buffer.cos_2.mc.service_pool = 0
egress_buffer.cos_3.mc.service_pool = 0
egress_buffer.cos_4.mc.service_pool = 0
egress_buffer.cos_5.mc.service_pool = 0
egress_buffer.cos_6.mc.service_pool = 0
egress_buffer.cos_7.mc.service_pool = 0
# Reserved and static shared buffer allocation for MC.SP region: size in bytes
#egress_buffer.cos_0.mc.reserved = 10240
#egress_buffer.cos_1.mc.reserved = 10240
#egress_buffer.cos_2.mc.reserved = 10240
#egress_buffer.cos_3.mc.reserved = 10240
#egress_buffer.cos_4.mc.reserved = 10240
#egress_buffer.cos_5.mc.reserved = 10240
#egress_buffer.cos_6.mc.reserved = 10240
#egress_buffer.cos_7.mc.reserved = 10240
#egress_buffer.cos_0.mc.shared_size = 40
#egress_buffer.cos_1.mc.shared_size = 40
#egress_buffer.cos_2.mc.shared_size =  5
#egress_buffer.cos_3.mc.shared_size = 40
#egress_buffer.cos_4.mc.shared_size = 40
#egress_buffer.cos_5.mc.shared_size = 40
#egress_buffer.cos_6.mc.shared_size = 40
#egress_buffer.cos_7.mc.shared_size = 30

# Shared buffer allocation for ePort.TC region : size in bytes.
#egress_buffer.egr_queue_0.uc.shared_size   = 40
#egress_buffer.egr_queue_1.uc.shared_size   = 40
#egress_buffer.egr_queue_2.uc.shared_size   =  5
#egress_buffer.egr_queue_3.uc.shared_size   = 40
#egress_buffer.egr_queue_4.uc.shared_size   = 40
#egress_buffer.egr_queue_5.uc.shared_size   = 40
#egress_buffer.egr_queue_6.uc.shared_size   = 40
#egress_buffer.egr_queue_7.uc.shared_size   = 30

# Minimum buffer allocation for ePort.TC region: size in bytes
#egress_buffer.egr_queue_0.uc.reserved = 1024
#egress_buffer.egr_queue_1.uc.reserved = 1024
#egress_buffer.egr_queue_2.uc.reserved = 1024
#egress_buffer.egr_queue_3.uc.reserved = 1024
#egress_buffer.egr_queue_4.uc.reserved = 1024
#egress_buffer.egr_queue_5.uc.reserved = 1024
#egress_buffer.egr_queue_6.uc.reserved = 1024
#egress_buffer.egr_queue_7.uc.reserved = 1024

# Reserved Egress buffer for TCs mapped to lossless SPs
#flow_control.egress_buffer.reserved = 0

# Egress buffer ePort.MC buffer : size in bytes
# the per-port limit on multicast packets (applies to all switch priorities)
#port.egress_buffer.mc.reserved = 10240
#port.egress_buffer.mc.shared_size = 92160

# To enable egress static pool, set the mode to 0
egress_service_pool.0.mode = 1

# Egress dynamic buffer pool configuration
# Replace the shared_size parameter with the dynamic_quota=n/ALPHA_x,
# where ‘n’ should be the configuration value for alpha.
# 		‘ALPHA_x’ should be string representation for alpha.
# Pls note : Same alpha configuration values can be used as mentioned in Ingress Dynamic Buffering section above
# Egress buffer per-port dynamic buffering quota (alpha ; Default: ALPHA_16)
#port.service_pool.0.egress_buffer.uc.dynamic_quota = ALPHA_16
#port.management.egress_buffer.dynamic_quota = ALPHA_8


# Egress buffer per-egress-queue dynamic buffering quota (alpha) for lossless egress queues (Default: ALPHA_INFINITY)
#flow_control.egress_buffer.dynamic_quota = ALPHA_1

# Egress buffer per-egress-queue dynamic buffering quota (alpha) for unicast (Default: ALPHA_8)
#egress_buffer.egr_queue_0.uc.dynamic_quota = ALPHA_8
#egress_buffer.egr_queue_1.uc.dynamic_quota = ALPHA_8
#egress_buffer.egr_queue_2.uc.dynamic_quota = ALPHA_8
#egress_buffer.egr_queue_3.uc.dynamic_quota = ALPHA_8
#egress_buffer.egr_queue_4.uc.dynamic_quota = ALPHA_8
#egress_buffer.egr_queue_5.uc.dynamic_quota = ALPHA_8
#egress_buffer.egr_queue_6.uc.dynamic_quota = ALPHA_8
#egress_buffer.egr_queue_7.uc.dynamic_quota = ALPHA_8

# Egress buffer per-egress-queue dynamic buffering quota (alpha) for multicast (Default: ALPHA_INFINITY)
#egress_buffer.egr_queue_0.mc.dynamic_quota    = ALPHA_2
#egress_buffer.egr_queue_1.mc.dynamic_quota = ALPHA_4
#egress_buffer.egr_queue_2.mc.dynamic_quota = ALPHA_1
#egress_buffer.egr_queue_3.mc.dynamic_quota = ALPHA_1_2
#egress_buffer.egr_queue_4.mc.dynamic_quota = ALPHA_1_4
#egress_buffer.egr_queue_5.mc.dynamic_quota = ALPHA_1_8
#egress_buffer.egr_queue_6.mc.dynamic_quota = ALPHA_1_16
#egress_buffer.egr_queue_7.mc.dynamic_quota = ALPHA_INFINITY

# These parameters can be assigned to the virtual Multicast port as well (Default: ALPHA_1_4)
#egress_buffer.cos_0.mc.dynamic_quota = ALPHA_1_4
#egress_buffer.cos_1.mc.dynamic_quota = ALPHA_1_4
#egress_buffer.cos_2.mc.dynamic_quota = ALPHA_1_4
#egress_buffer.cos_3.mc.dynamic_quota = ALPHA_1_4
#egress_buffer.cos_4.mc.dynamic_quota = ALPHA_1_4
#egress_buffer.cos_5.mc.dynamic_quota = ALPHA_1_4
#egress_buffer.cos_6.mc.dynamic_quota = ALPHA_1_4
#egress_buffer.cos_7.mc.dynamic_quota = ALPHA_1_4

# internal cos values mapped to egress queues
# multicast queue: same as unicast queue
cos_egr_queue.cos_0.uc  = 0
cos_egr_queue.cos_0.cpu = 0

cos_egr_queue.cos_1.uc  = 1
cos_egr_queue.cos_1.cpu = 1

cos_egr_queue.cos_2.uc  = 2
cos_egr_queue.cos_2.cpu = 2

cos_egr_queue.cos_3.uc  = 3
cos_egr_queue.cos_3.cpu = 3

cos_egr_queue.cos_4.uc  = 4
cos_egr_queue.cos_4.cpu = 4

cos_egr_queue.cos_5.uc  = 5
cos_egr_queue.cos_5.cpu = 5

cos_egr_queue.cos_6.uc  = 6
cos_egr_queue.cos_6.cpu = 6

cos_egr_queue.cos_7.uc  = 7
cos_egr_queue.cos_7.cpu = 7

Caveats

Configure QoS and Breakout Ports Simultaneously

If you configure btoh breakout ports and QoS settings for breakout interfaces at the same time, errors might occur.

You must apply breakout port configuration before QoS configuration on the breakout ports. If you are using NVUE, configure breakout ports and perform an nv config apply first, then configure QoS settings on the breakout ports followed by another nv config apply. If you are using linux file configuration, modify ports.conf first, reload switchd, then modify qos_features.conf and reload switchd a second time.

QoS Settings on Bond Member Interfaces

If you use Linux commands to apply QoS settings on bond member interfaces instead of the logical bond interface, the members must share identical QoS configuration. If the configuration is not identical between bond interfaces, the bond inherits the _last_ interface you apply to the bond.

If QoS settings do not match, switchd reload fails; however, switchd restart does not fail.

NVUE rejects QoS configurations on bond member interfaces and shows an error when you try to apply the configurations; you must apply all QoS configuration on logical bond interfaces.

Cut-through Switching

You cannot disable cut-through switching on Spectrum ASICs. Cumulus Linux ignores the cut_through_enable = false setting in the qos_features.conf file.

RDMA over Converged Ethernet - RoCE

RoCE enables you to write to compute or storage elements using RDMA over an Ethernet network instead of using host CPUs. RoCE relies on ECN and PFC to operate. Cumulus Linux supports features that can enable lossless Ethernet for RoCE environments.

While Cumulus Linux can support RoCE environments, the end hosts must support the RoCE protocol.

RoCE helps you obtain a converged network, where all services run over the Ethernet infrastructure, including Infiniband apps.

Default RoCE Mode Configuration

The following table shows the default RoCE configuration for lossy and lossless mode.

ConfigurationLossy ModeLossless Mode
Port trust modeYESYES
Port switch priority to traffic class mapping
  • Switch priority 3 to traffic class 3 (RoCE)
  • Switch priority 6 to traffic class 6 (CNP)
  • Other switch priority to traffic class 0
YESYES
Port ETS:
  • Traffic class 6 (CNP) - Strict
  • Traffic class 3 (RoCE) - WRR 50%
  • Traffic class 0 (Other traffic) - WRR 50%
YESYES
Port ECN absolute threshold is 1501500 bytes for traffic class 3 (RoCE)YESYES
LLDP and Application TLV (RoCE)
(UDP, Protocol:4791, Priority: 3)
YESYES
Enable PFC on switch priority 3 (RoCE)NOYES
Switch priority 3 allocated to RoCE lossless traffic poolNOYES

Enable RDMA over Converged Ethernet lossless (with PFC and ECN)

RoCE uses the Infiniband (IB) Protocol over converged Ethernet. The IB global route header rides directly on top of the Ethernet header. The lossless Ethernet layer handles congestion hop by hop.

To configure RoCE with PFC and ECN:

cumulus@switch:~$ nv set qos roce
cumulus@switch:~$ nv config apply

NVUE defaults to roce mode lossless. The command nv set qos roce and nv set qos roce mode lossless are equivalent.

If you enable mode lossy, configuring nv set qos roce without a mode does not change the RoCE mode. To change to lossless, you must configure mode lossless.

Link pause is another way to provide lossless ethernet; however, PFC is the preferred method. PFC allows more granular control by pausing the traffic flow for a given CoS group instead of the entire link.

Enable RDMA over Converged Ethernet lossy (with ECN)

RoCEv2 requires flow control for lossless Ethernet. RoCEv2 uses the Infiniband (IB) Transport Protocol over UDP. The IB transport protocol includes an end-to-end reliable delivery mechanism and has its own sender notification mechanism.

RoCEv2 congestion management uses RFC 3168 to signal congestion experienced to the receiver. The receiver generates an RoCEv2 congestion notification packet directed to the source of the packet.

To configure RoCE with ECN:

cumulus@switch:~$ nv set qos roce mode lossy
cumulus@switch:~$ nv config apply

Remove RoCE Configuration

To remove RoCE configurations:

cumulus@switch:~$ nv unset qos roce
cumulus@switch:~$ nv config apply

Verify RoCE Configuration

You can verify RoCE configuration with NVUE nv show commands.

To show detailed information about the configured buffers, utilization and DSCP markings, run the nv show qos roce command:

cumulus@switch:mgmt:~$ nv show qos roce
                   operational  applied 
------------------  -----------  --------
enable                           on      
mode                lossless     lossless
congestion-control                       
  congestion-mode   ECN                  
  enabled-tc        0,3                  
  max-threshold     1.43 MB              
  min-threshold     146.48 KB            
  probability       100                  
lldp-app-tlv                             
  priority          3                    
  protocol-id       4791                 
  selector          UDP                  
pfc                                      
  pfc-priority      3                    
  rx-enabled        enabled              
  tx-enabled        enabled              
trust                                    
  trust-mode        pcp,dscp             

RoCE PCP/DSCP->SP mapping configurations
===========================================
       pcp  dscp                     switch-prio
    -  ---  -----------------------  -----------
    0  0    0,1,2,3,4,5,6,7          0          
    1  1    8,9,10,11,12,13,14,15    1          
    2  2    16,17,18,19,20,21,22,23  2          
    3  3    24,25,26,27,28,29,30,31  3          
    4  4    32,33,34,35,36,37,38,39  4          
    5  5    40,41,42,43,44,45,46,47  5          
    6  6    48,49,50,51,52,53,54,55  6          
    7  7    56,57,58,59,60,61,62,63  7          

RoCE SP->TC mapping and ETS configurations
=============================================
       switch-prio  traffic-class  scheduler-weight
    -  -----------  -------------  ----------------
    0  0            0              DWRR-50%        
    1  1            0              DWRR-50%        
    2  2            0              DWRR-50%        
    3  3            3              DWRR-50%        
    4  4            0              DWRR-50%        
    5  5            0              DWRR-50%        
    6  6            6              strict-priority 
    7  7            0              DWRR-50%        

RoCE pool config
===================
       name                   mode     size  switch-priorities  traffic-class
    -  ---------------------  -------  ----  -----------------  -------------
    0  lossy-default-ingress  Dynamic  50%   0,1,2,4,5,6,7      -            
    1  roce-reserved-ingress  Dynamic  50%   3                  -            
    2  lossy-default-egress   Dynamic  50%   -                  0,6          
    3  roce-reserved-egress   Dynamic  inf   -                  3            

Exception List
=================
No Data

To show detailed RoCE information about a single interface, run the nv show interface <interface> qos roce status command.

cumulus@switch:mgmt:~$ nv show interface swp16 qos roce status
                    operational    applied  description
------------------  -------------  -------  ---------------------------------------------------
congestion-control
  congestion-mode   ecn, absolute           Congestion config mode
  enabled-tc        0,3                     Congestion config enabled Traffic Class
  max-threshold     1.43 MB                 Congestion config max-threshold
  min-threshold     153.00 KB               Congestion config min-threshold
  probability       100                  
lldp-app-tlv                             
  priority          3                    
  protocol-id       4791                 
  selector          UDP
pfc
  pfc-priority      3                       switch-prio on which PFC is enabled
  rx-enabled        yes                     PFC Rx Enabled status
  tx-enabled        yes                     PFC Tx Enabled status
trust
  trust-mode        pcp,dscp                Trust Setting on the port for packet classification
mode                lossless                Roce Mode
 
 
RoCE PCP/DSCP->SP mapping configurations
===========================================
          pcp  dscp  switch-prio
    ----  ---  ----  -----------
    cnp   6    48    6
    roce  3    26    3
 
 
RoCE SP->TC mapping and ETS configurations
=============================================
          switch-prio  traffic-class  scheduler-weight
    ----  -----------  -------------  ----------------
    cnp   6            6              strict priority
    roce  3            3              dwrr-50%
 
 
RoCE Pool Status
===================
        name                   mode     pool-id  switch-priorities  traffic-class  size      current-usage  max-usage
    --  ---------------------  -------  -------  -----------------  -------------  --------  -------------  ---------
    0   lossy-default-ingress  DYNAMIC  2        0,1,2,4,5,6,7      -              15.16 MB  0 Bytes        16.00 MB
    1   roce-reserved-ingress  DYNAMIC  3        3                  -              15.16 MB  7.30 MB        7.90 MB
    2   lossy-default-egress   DYNAMIC  13       -                  0,6            15.16 MB  0 Bytes        16.01 MB
    3   roce-reserved-egress   DYNAMIC  14       -                  3              inf       7.29 MB        13.47 MB

To show detailed information about current buffer utilization as well as historic RoCE byte and packet counts, run the nv show interface <interface> qos roce counters command:

cumulus@switch:mgmt:~$ nv show interface swp16 qos roce counters
                               operational   applied  description
-----------------------------  ------------  -------  ------------------------------------------------------
rx-stats
  rx-non-roce-stats
    buffer-max-usage           144 Bytes              Max Ingress Pool-buffer usage for non-RoCE traffic
    buffer-usage               0 Bytes                Current Ingress Pool-buffer usage for non-RoCE traffic
    no-buffer-discard          55                     Rx buffer discards for non-RoCE traffic
    non-roce-bytes             56.52 MB               non-roce rx bytes
    non-roce-packets           462975                 non-roce rx packets
    pg-max-usage               144 Bytes              Max PG-buffer usage for non-RoCE traffic
    pg-usage                   0 Bytes                Current PG-buffer usage for non-RoCE traffic
  rx-pfc-stats
    pause-duration             0                      Rx PFC pause duration for RoCE traffic
    pause-packets              0                      Rx PFC pause packets for RoCE traffic
  rx-roce-stats
    buffer-max-usage           0 Bytes                Max Ingress Pool-buffer usage for RoCE traffic
    buffer-usage               0 Bytes                Current Ingress Pool-buffer usage for RoCE traffic
    no-buffer-discard          0                      Rx buffer discards for RoCE traffic
    pg-max-usage               0 Bytes                Max PG-buffer usage for RoCE traffic
    pg-usage                   0 Bytes                Current PG-buffer usage for RoCE traffic
    roce-bytes                 0 Bytes                Rx RoCE Bytes
    roce-packets               0                      Rx RoCE Packets
tx-stats
  tx-cnp-stats
    buffer-max-usage           16.02 MB               Max Egress Pool-buffer usage for CNP traffic
    buffer-usage               0 Bytes                Current Egress Pool-buffer usage for CNP traffic
    cnp-bytes                  0 Bytes                Tx CNP Packet Bytes
    cnp-packets                0                      Tx CNP Packets
    tc-max-usage               0 Bytes                Max TC-buffer usage for CNP traffic
    tc-usage                   0 Bytes                Current TC-buffer usage for CNP traffic
    unicast-no-buffer-discard  0                      Tx buffer discards for CNP traffic
  tx-ecn-stats
    ecn-marked-packets         693777677344           Tx ECN marked packets
  tx-pfc-stats
    pause-duration             0                      Tx PFC pause duration for RoCE traffic
    pause-packets              0                      Tx PFC pause packets for RoCE traffic
  tx-roce-stats
    buffer-max-usage           13.47 MB               Max Egress Pool-buffer usage for RoCE traffic
    buffer-usage               7.29 MB                Current Egress Pool-buffer usage for RoCE traffic
    roce-bytes                 92824.38 GB            Tx RoCE Packet bytes
    roce-packets               803785675319           Tx RoCE Packets
    tc-max-usage               16.02 MB               Max TC-buffer usage for RoCE traffic
    tc-usage                   7.29 MB                Current TC-buffer usage for RoCE traffic
    unicast-no-buffer-discard  663060754115           Tx buffer discards for RoCE traffic

To reset the counters that the nv show interface <interface> qos roce command displays, run the nv action clear interface <interface> qos roce counters command.

Change RoCE Configuration

You can adjust RoCE settings using NVUE after you enable RoCE. To change the memory allocation for RoCE lossless mode to 60 percent:

cumulus@switch:mgmt:~$ nv set qos traffic-pool default-lossy memory-percent 40
cumulus@switch:mgmt:~$ nv set qos traffic-pool roce-lossless memory-percent 60
cumulus@switch:mgmt:~$ nv config apply

To change the memory allocation of the RoCE lossy traffic pool to 60 percent and remap switch priority 4 to RoCE lossy traffic:

cumulus@switch:mgmt:~$ nv set qos traffic-pool default-lossy switch-priority 0-3,5-7
cumulus@switch:mgmt:~$ nv set qos traffic-pool roce-lossy memory-percent 60
cumulus@switch:mgmt:~$ nv set qos traffic-pool default-lossy memory-percent 40
cumulus@switch:mgmt:~$ nv set qos traffic-pool roce-lossy switch-priority 4
cumulus@switch:mgmt:~$ nv set qos egress-queue-mapping default-global switch-priority 4 traffic-class 3
cumulus@switch:mgmt:~$ nv set qos egress-queue-mapping default-global switch-priority 3 traffic-class 0
cumulus@switch:mgmt:~$ nv set qos mapping default-global trust both
cumulus@switch:mgmt:~$ nv set qos mapping default-global dscp 26 switch-priority 4
cumulus@switch:mgmt:~$ nv config apply

To change the RoCE lossless switch priority from switch priority 3 to switch priority 2:

cumulus@switch:mgmt:~$ nv set qos pfc default-global switch-priority 2
cumulus@switch:mgmt:~$ nv set qos egress-queue-mapping default-global switch-priority 2 traffic-class 3
cumulus@switch:mgmt:~$ nv set qos egress-queue-mapping default-global switch-priority 3 traffic-class 0
cumulus@switch:mgmt:~$ nv set qos mapping default-global trust both
cumulus@switch:mgmt:~$ nv set qos mapping default-global dscp 26 switch-priority 2

DHCP

This section describes how to configure:

DHCP Relays

DHCP is a client server protocol that automatically provides IP hosts with IP addresses and other related configuration information. A DHCP relay (agent) is a host that forwards DHCP packets between clients and servers that are not on the same physical subnet.

This topic describes how to configure DHCP relays for IPv4 and IPv6 using the following topology:

Basic Configuration

To set up DHCP relay, you need to provide the IP address of the DHCP server and the interfaces participating in DHCP relay (facing the server and facing the client). In an MLAG configuration, you must also specify the peerlink interface in case the local uplink interfaces fail.

In the example commands below:

cumulus@leaf01:~$ nv set service dhcp-relay default interface swp51
cumulus@leaf01:~$ nv set service dhcp-relay default interface swp52
cumulus@leaf01:~$ nv set service dhcp-relay default interface vlan10
cumulus@leaf01:~$ nv set service dhcp-relay default interface peerlink.4094
cumulus@leaf01:~$ nv set service dhcp-relay default server 172.16.1.102
cumulus@leaf01:~$ nv config apply
cumulus@leaf01:~$ nv set service dhcp-relay6 default interface upstream swp51 server-address 2001:db8:100::2
cumulus@leaf01:~$ nv set service dhcp-relay6 default interface upstream swp52 server-address 2001:db8:100::2
cumulus@leaf01:~$ nv set service dhcp-relay6 default interface downstream vlan10
cumulus@leaf01:~$ nv set service dhcp-relay6 default interface downstream peerlink.4094
cumulus@leaf01:~$ nv config apply
  1. Edit the /etc/default/isc-dhcp-relay-default file to add the IP address of the DHCP server and the interfaces participating in DHCP relay.

    cumulus@leaf01:~$ sudo nano /etc/default/isc-dhcp-relay-default
    SERVERS="172.16.1.102"
    INTF_CMD="-i vlan10 -i swp51 -i swp52 -i peerlink.4094"
    OPTIONS=""
    
  2. Enable, then restart the dhcrelay service so that the configuration persists between reboots:

    cumulus@leaf01:~$ sudo systemctl enable dhcrelay@default.service
    cumulus@leaf01:~$ sudo systemctl restart dhcrelay@default.service
    
  1. Edit the /etc/default/isc-dhcp-relay6-default file to add the IP address of the DHCP server and the interfaces participating in DHCP relay.

    cumulus@leaf01:$ sudo nano /etc/default/isc-dhcp-relay6-default
    SERVERS=" -u 2001:db8:100::2%swp51 -u 2001:db8:100::2%swp52"
    INTF_CMD="-l vlan10 -l peerlink.4094"
    
  2. Enable, then restart the dhcrelay6 service so that the configuration persists between reboots:

    cumulus@switch:~$ sudo systemctl enable dhcrelay6@default.service
    cumulus@switch:~$ sudo systemctl restart dhcrelay6@default.service
    

  • You configure a DHCP relay on a per-VLAN basis, specifying the SVI, not the parent bridge. In the example above, you specify vlan10 as the SVI for VLAN 10 but you do not specify the bridge named bridge.
  • When you configure DHCP relay with VRR, the DHCP relay client must run on the SVI; not on the -v0 interface.
  • For every instance of a DHCP relay in a non-default VRF, you need to create a separate default file in the /etc/default directory. See DHCP with VRF.

Optional Configuration

This section describes optional DHCP relay configurations. The steps provided in this section assume that you have already configured basic DHCP relay, as described above.

DHCP Agent Information Option (Option 82)

Cumulus Linux supports DHCP Agent Information Option 82, which allows a DHCP relay to insert circuit or relay specific information into a request that the switch forwards to a DHCP server. You can use the following options:

To configure DHCP Agent Information Option 82:

The following example enables Option 82 and enables circuit ID:

cumulus@leaf01:~$ nv set service dhcp-relay <vrf-id> agent enable on
cumulus@leaf01:~$ nv set service dhcp-relay <vrf-id> agent use-pif-circuit-id enable on
cumulus@leaf01:~$ nv config apply

The following example enables Option 82 and sets the remote ID to MAC address 44:38:39:BE:EF:AA:

cumulus@leaf01:~$ nv set service dhcp-relay <vrf-id> agent enable on
cumulus@leaf01:~$ nv set service dhcp-relay default agent remote-id 44:38:39:BE:EF:AA
cumulus@leaf01:~$ nv config apply
  1. Edit the /etc/default/isc-dhcp-relay-default file and add one of the following options:

    To inject the ingress SVI interface against which DHCP processes the relayed DHCP discover packet, add -a to the OPTIONS line:

    cumulus@leaf01:~$ sudo nano /etc/default/isc-dhcp-relay-default
    ...
    # Additional options that are passed to the DHCP relay daemon?
    OPTIONS="-a"
    

    To inject the physical switch port on which the relayed DHCP discover packet arrives instead of the SVI, add -a --use-pif-circuit-id to the OPTIONS line:

    cumulus@leaf01:~$ sudo nano /etc/default/isc-dhcp-relay-default
    ...
    # Additional options that are passed to the DHCP relay daemon?
    OPTIONS="-a --use-pif-circuit-id"
    

    To customize the Remote ID sub-option, add -a -r to the OPTIONS line followed by a custom string (up to 255 characters):

    cumulus@leaf01:~$ sudo nano /etc/default/isc-dhcp-relay-default
    ...
    # Additional options that are passed to the DHCP relay daemon?
    OPTIONS="-a -r CUSTOMVALUE"
    
  2. Restart the dhcrelay service to apply the new configuration:

    cumulus@leaf01:~$ sudo systemctl restart dhcrelay@default.service
    

Control the Gateway IP Address with RFC 3527

When you need DHCP relay in an environment that relies on an anycast gateway (such as EVPN), a unique IP address is necessary on each device for return traffic. By default, in a BGP unnumbered environment with DHCP relay, the source IP address is the loopback IP address and the gateway IP address is the SVI IP address. However with anycast traffic, the SVI IP address is not unique to each rack; it is typically shared between racks. Most EVPN ToR deployments only use a single unique IP address, which is the loopback IP address.

RFC 3527 enables the DHCP server to react to these environments by introducing a new parameter to the DHCP header called the link selection sub-option, which the DHCP relay agent builds. The link selection sub-option takes on the normal role of the gateway address in relaying to the DHCP server which subnet correlates to the DHCP request. When using this sub-option, the gateway address continues to be present but only relays the return IP address that the DHCP server uses; the gateway address becomes the unique loopback IP address.

When enabling RFC 3527 support, you can specify an interface, such as the loopback interface or a switch port interface to use as the gateway address. The relay picks the first IP address on that interface. If the interface has multiple IP addresses, you can specify a specific IP address for the interface.

RFC 3527 supports IPv4 DHCP relays only.

To enable RFC 3527 support and control the gateway address:

Run the nv set service dhcp-relay default gateway-interface command with the interface or IP address you want to use. The following example uses the first IP address on the loopback interface as the gateway IP address:

cumulus@leaf01:~$ nv set service dhcp-relay default gateway-interface lo

The first IP address on the loopback interface is typically the 127.0.0.1 address. This example uses IP address 10.10.10.1 on the loopback interface as the gateway address:

cumulus@leaf01:~$ nv set service dhcp-relay default gateway-interface lo address 10.10.10.1

This example uses the first IP address on swp2 as the gateway address:

cumulus@leaf01:~$ nv set service dhcp-relay default gateway-interface swp2

This example uses IP address 10.0.0.4 on swp2 as the gateway address:

cumulus@leaf01:~$ nv set service dhcp-relay default gateway-interface swp2 address 10.0.0.4
  1. Edit the /etc/default/isc-dhcp-relay-default file and provide the -U option with the interface or IP address you want to use as the gateway address.

    This example uses the first IP address on the loopback interface as the gateway address:

    cumulus@leaf01:~$ sudo nano /etc/default/isc-dhcp-relay-default
    ...
    # Additional options that are passed to the DHCP relay daemon?
    OPTIONS="-U lo"
    

    The first IP address on the loopback interface is typically the 127.0.0.1 address. This example uses IP address 10.10.10.1 on the loopback interface as the gateway address:

    cumulus@leaf01:~$ sudo nano /etc/default/isc-dhcp-relay-default
    ...
    # Additional options that are passed to the DHCP relay daemon?
    OPTIONS="-U 10.10.10.1%lo"
    

    This example uses the first IP address on swp2 as the gateway address:

    cumulus@leaf01:~$ sudo nano /etc/default/isc-dhcp-relay-default
    ...
    # Additional options that are passed to the DHCP relay daemon?
    OPTIONS="-U swp2"
    

    This example uses IP address 10.0.0.4 on swp2 as the gateway address:

    cumulus@leaf01:~$ sudo nano /etc/default/isc-dhcp-relay-default
    ...
    # Additional options that are passed to the DHCP relay daemon?
    OPTIONS="-U 10.0.0.4%swp2"
    
  2. Restart the dhcrelay service to apply the configuration change:

    cumulus@leaf01:~$ sudo systemctl restart dhcrelay@default.service
    

DHCP Relay for IPv4 in an EVPN Symmetric Environment with MLAG

In a multi-tenant EVPN symmetric routing environment with MLAG, you must enable RFC 3527 support. You can specify an interface, such as the loopback or VRF interface for the gateway address. The interface must be reachable in the tenant VRF that you configure for DHCP relay and must have a unique IPv4 address. For EVPN symmetric routing with an anycast gateway that reuses the same SVI IP address on multiple leaf switches, you must assign a unique IP address for the VRF interface and include the layer 3 VNI for this VRF in the DHCP relay configuration.

The following example:

cumulus@leaf01:~$ nv set vrf RED loopback ip address 20.20.20.1/32
cumulus@leaf01:~$ nv set service dhcp-relay RED interface vlan10
cumulus@leaf01:~$ nv set service dhcp-relay RED interface vlan20
cumulus@leaf01:~$ nv set service dhcp-relay RED interface vlan4024_l3
cumulus@leaf01:~$ nv set service dhcp-relay RED server 10.1.10.104
cumulus@leaf01:~$ nv set vrf RED router bgp address-family ipv4-unicast redistribute connected enable on
cumulus@leaf01:~$ nv set vrf RED router bgp address-family ipv4-unicast route-export to-evpn enable on
cumulus@leaf01:~$ nv config apply
  1. Edit the /etc/network/interfaces file to configure VRF RED with IPv4 address 20.20.20.1/32

    cumulus@leaf01:mgmt:~$ sudo nano /etc/network/interfaces
    ...
    auto RED
    iface RED
            address 20.20.20.1/32
            vrf-table auto
    
  2. Configure VRF RED to advertise the connected routes as type-5 so that the loopback IPv4 address is reachable:

    cumulus@leaf01:mgmt:~$ sudo vtysh 
    ...
    leaf01# configure terminal
    leaf01(config)# router bgp 65101 vrf RED
    leaf01(config-router)# address-family l2vpn evpn
    leaf01(config-router-af)# advertise ipv4 unicast 
    leaf01(config-router-af)# end
    leaf01# write memory
    

    The /etc/frr/frr.conf file now contains the following entries:

    ...
    router bgp 65101 vrf RED
     bgp router-id 10.10.10.1
    ..
     !
     address-family ipv4 unicast
      redistribute connected
      maximum-paths 64
      maximum-paths ibgp 64
     exit-address-family
     !
     address-family l2vpn evpn
      advertise ipv4 unicast
     exit-address-family
    exit
    
  3. Edit the /etc/default/isc-dhcp-relay-RED file.

    cumulus@leaf01:mgmt:~$ sudo nano /etc/default/isc-dhcp-relay-RED
    SERVERS="10.1.10.104"
    INTF_CMD=" -i vlan10 -i vlan20 -i vlan4024_l3" 
    OPTIONS="-U RED"
    
  4. Start and enable the DHCP service so that it starts automatically the next time the switch boots:

    sudo systemctl start dhcrelay@RED.service
    sudo systemctl enable dhcrelay@RED.service
    

DHCP Relay for IPv4 in an EVPN Symmetric Environment without MLAG

In a multi-tenant EVPN symmetric routing environment without MLAG, the VLAN interface (SVI) IPv4 address is typically unique on each leaf switch, which does not require RFC 3527 configuration.

The following example:

cumulus@leaf01:~$ nv set service dhcp-relay RED interface vlan10
cumulus@leaf01:~$ nv set service dhcp-relay RED interface vlan20
cumulus@leaf01:~$ nv set service dhcp-relay RED interface vlan4024_l3
cumulus@leaf01:~$ nv set service dhcp-relay RED server 10.1.10.104
cumulus@leaf01:~$ nv config apply
  1. Edit the /etc/default/isc-dhcp-relay-RED file.

    cumulus@leaf01:mgmt:~$ sudo nano /etc/default/isc-dhcp-relay-RED
    SERVERS="10.1.10.104"
    INTF_CMD=" -i vlan10 -i vlan20 -i vlan4024_l3" 
    OPTIONS=""
    
  2. Start the DHCP service and enable it to start automatically when the switch boots:

    sudo systemctl start dhcrelay@RED.service
    sudo systemctl enable dhcrelay@RED.service
    

DHCP Relay for IPv6 in an EVPN Symmetric Environment

For IPv6 DHCP relay in a symmetric routing environment, you must assign a unique IPv6 address to the non-default VRF interfaces that participate in DHCP relay. Cumulus Linux uses this IPv6 address as the source address when sending packets to the DHCP server and the DHCP server replies to this address.

RFC 3527 does not apply to IPv6. IPv6 has the functionality described in RFC 3527 as part of its normal operations.

The following example:

cumulus@leaf01:~$ nv set vrf RED loopback ip address 2001:db8:666::1/128
cumulus@leaf01:~$ nv set service dhcp-relay6 RED interface downstream vlan10
cumulus@leaf01:~$ nv set service dhcp-relay6 RED interface downstream vlan20
cumulus@leaf01:~$ nv set service dhcp-relay6 RED interface upstream RED server-address 2001:db8:199::2
cumulus@leaf01:~$ nv set service dhcp-relay6 RED interface upstream vlan4024_l3
cumulus@leaf01:~$ nv set vrf RED router bgp address-family ipv6-unicast route-export to-evpn enable on
cumulus@leaf01:~$ nv config apply
  1. Edit the /etc/network/interfaces file to configure VRF RED with IPv6 address 2001:db8:666::1/128:

    cumulus@leaf01:mgmt:~$ sudo nano /etc/network/interfaces
    ...
    auto RED
    iface RED
            address 2001:db8:666::1/128
            vrf-table auto
    
  2. Configure VRF RED to advertise the connected routes so that the loopback IPv6 address is reachable:

    cumulus@leaf01:mgmt:~$ sudo vtysh 
    ...
    leaf01# configure terminal
    leaf01(config)# router bgp 65101 vrf RED
    leaf01(config-router)# address-family l2vpn evpn
    leaf01(config-router-af)# advertise ipv6 unicast 
    leaf01(config-router-af)# end
    leaf01# write memory
    

    The /etc/frr/frr.conf file now contains the following entries:

    ...
    router bgp 65101 vrf RED
     bgp router-id 10.10.10.1
    ..
     !
     address-family ipv6 unicast
      redistribute connected
      maximum-paths 64
      maximum-paths ibgp 64
     exit-address-family
     !
     address-family l2vpn evpn
      advertise ipv6 unicast
     exit-address-family
    exit
    
  3. Edit the /etc/default/isc-dhcp-relay6-RED file.

    • Set the -l option to the VLANs that receive DHCP requests from hosts.
    • Set the <ip-address-dhcp-server>%<interface-facing-dhcp-server> option to associate the DHCP Server with VRF RED.
    • Set the -u option to indicate where the switch receives replies from the DHCP server (SVI vlan4024_l3).
    cumulus@leaf01:mgmt:~$ sudo nano /etc/default/isc-dhcp-relay6-RED
    INTF_CMD="-l vlan10 -l vlan20"
    SERVERS="-u 2001:db8:199::2%RED -u vlan4024_l3"
    
  4. Start and enable the DHCP service so that it starts automatically the next time the switch boots:

    sudo systemctl start dhcrelay6@RED.service
    sudo systemctl enable dhcrelay6@RED.service
    

Gateway IP Address as Source IP for Relayed DHCP Packets (Advanced)

You can configure the dhcrelay service to forward IPv4 (only) DHCP packets to a DHCP server and ensure that the source IP address of the relayed packet is the same as the gateway IP address.

This option impacts all relayed IPv4 packets globally.

To use the gateway IP address as the source IP address:

cumulus@leaf01:~$ nv set service dhcp-relay default source-ip gateway
cumulus@leaf01:~$ nv config apply
  1. Edit the /etc/default/isc-dhcp-relay-default file to add --giaddr-src to the OPTIONS line.

    cumulus@leaf01:~$ sudo nano /etc/default/isc-dhcp-relay-default
    SERVERS="172.16.1.102"
    INTF_CMD="-i vlan10 -i swp51 -i swp52 -U swp2"
    OPTIONS="--giaddr-src"
    
  2. Restart the dhcrelay service to apply the configuration change:

    cumulus@leaf01:~$ sudo systemctl restart dhcrelay@default.service
    

Configure Multiple DHCP Relays

Cumulus Linux supports multiple DHCP relay daemons on a switch to enable relaying of packets from different bridges to different upstream interfaces.

To configure multiple DHCP relay daemons on a switch:

  1. In the /etc/default directory, create a configuration file for each DHCP relay daemon. Use the naming scheme isc-dhcp-relay-<dhcp-name> for IPv4 or isc-dhcp-relay6-<dhcp-name> for IPv6. This is an example configuration file for IPv4:

    # Defaults for isc-dhcp-relay initscript
    # sourced by /etc/init.d/isc-dhcp-relay
    # installed at /etc/default/isc-dhcp-relay by the maintainer scripts
    
    #
    # This is a POSIX shell fragment
    #
    
    # What servers should the DHCP relay forward requests to?
    SERVERS="102.0.0.2"
    # On what interfaces should the DHCP relay (dhrelay) serve DHCP requests?
    # Always include the interface towards the DHCP server.
    # This variable requires a -i for each interface configured above.
    # This will be used in the actual dhcrelay command
    # For example, "-i eth0 -i eth1"
    INTF_CMD="-i swp2s2 -i swp2s3"
    
    # Additional options that are passed to the DHCP relay daemon?
    OPTIONS=""
    
  2. Run the following command to start a dhcrelay instance, where <dhcp-name> is the instance name or number.

    cumulus@leaf01:~$ sudo systemctl start dhcrelay@<dhcp-name>
    

Troubleshooting

This section provides troubleshooting tips.

Show DHCP Relay Status

To show the DHCP relay status:

Run the nv show service dhcp-relay command for IPv4 or the nv show service dhcp-relay6 command for IPv6:

cumulus@leaf01:~$ nv show service dhcp-relay
           source-ip  Summary
---------  ---------  -----------------------
+ default  auto       gateway-interface: lo
  default             interface:        swp51
  default             interface:        swp52
  default             interface:        vlan10
  default             server:    172.16.1.102

Run the Linux systemctl status dhcrelay@default.service command for IPv4 or the systemctl status dhcrelay6@default.service command for IPv6:

cumulus@leaf01:~$ sudo systemctl status dhcrelay@default.service
● dhcrelay@default.service - DHCPv4 Relay Agent Daemon default in vrf default
   Loaded: loaded (/lib/systemd/system/dhcrelay@.service; enabled; vendor preset: enabled)
  Drop-In: /run/systemd/generator/dhcrelay@.service.d
           └─vrf.conf
   Active: active (running) since Tue 2023-04-18 18:23:55 UTC; 9min ago
     Docs: man:dhcrelay(8)
 Main PID: 30904 (dhcrelay)
    Tasks: 1 (limit: 2056)
   Memory: 2.3M
   CGroup: /system.slice/system-dhcrelay.slice/dhcrelay@default.service
           └─vrf
             └─30904 /usr/sbin/dhcrelay --nl -d -i swp51 -i swp52 -i vlan10 -i peerlink.4094 172.16.1.102

Check systemd

If you are experiencing issues with DHCP relay, check if there is a problem with systemd:

To see how DHCP relay is working on your switch, run the journalctl command:

cumulus@leaf01:~$ sudo journalctl -l -n 20 | grep dhcrelay
Dec 05 20:58:55 leaf01 dhcrelay[6152]: sending upstream swp52
Dec 05 20:58:55 leaf01 dhcrelay[6152]: sending upstream swp51
Dec 05 20:58:55 leaf01 dhcrelay[6152]: Relaying Reply to fe80::4638:39ff:fe00:3 port 546 down.
Dec 05 20:58:55 leaf01 dhcrelay[6152]: Relaying Reply to fe80::4638:39ff:fe00:3 port 546 down.
Dec 05 21:03:55 leaf01 dhcrelay[6152]: Relaying Renew from fe80::4638:39ff:fe00:3 port 546 going up.
Dec 05 21:03:55 leaf01 dhcrelay[6152]: sending upstream swp52
Dec 05 21:03:55 leaf01 dhcrelay[6152]: sending upstream swp51
Dec 05 21:03:55 leaf01 dhcrelay[6152]: Relaying Reply to fe80::4638:39ff:fe00:3 port 546 down.
Dec 05 21:03:55 leaf01 dhcrelay[6152]: Relaying Reply to fe80::4638:39ff:fe00:3 port 546 down.

To specify a time period with the journalctl command, use the --since flag:

cumulus@leaf01:~$ sudo journalctl -l --since "2 minutes ago" | grep dhcrelay
Dec 05 21:08:55 leaf01 dhcrelay[6152]: Relaying Renew from fe80::4638:39ff:fe00:3 port 546 going up.
Dec 05 21:08:55 leaf01 dhcrelay[6152]: sending upstream swp52
Dec 05 21:08:55 leaf01 dhcrelay[6152]: sending upstream swp51

Configuration Errors

If you configure DHCP relays by editing the /etc/default/isc-dhcp-relay-default file manually, you can introduce configuration errors that cause the switch to crash.

For example, if you see an error similar to the following, check that there is no space between the DHCP server address and the interface you use as the uplink.

Core was generated by /usr/sbin/dhcrelay --nl -d -i vx-40 -i vlan10 10.0.0.4 -U 10.0.1.2  %vlan20.
Program terminated with signal SIGSEGV, Segmentation fault.

To resolve the issue, manually edit the /etc/default/isc-dhcp-relay-default file to remove the space, then run the systemctl restart dhcrelay@default.service command to restart the dhcrelay service and apply the configuration change.

Considerations

DHCP Servers

A DHCP server automatically provides and assigns IP addresses and other network parameters to client devices. It relies on DHCP to respond to broadcast requests from clients.

If you intend to run the dhcpd service within a VRF, including the management VRF, follow these steps.

Basic Configuration

This section shows you how to configure a DHCP server using the following topology, where the DHCP server is a switch running Cumulus Linux.

To configure the DHCP server on a Cumulus Linux switch:

In addition, you can configure a static IP address for a resource, such as a server or printer:

  • To configure static IP address assignments, you must first configure a pool.
  • You can set the DNS server IP address and domain name globally or specify different DNS server IP addresses and domain names for different pools. The following example commands configure a DNS server IP address and domain name for a pool.

cumulus@switch:~$ nv set service dhcp-server default pool 10.1.10.0/24 pool-name storage-servers
cumulus@switch:~$ nv set service dhcp-server default pool 10.1.10.0/24 domain-name example.com
cumulus@switch:~$ nv set service dhcp-server default pool 10.1.10.0/24 domain-name-server 192.168.200.53
cumulus@switch:~$ nv set service dhcp-server default pool 10.1.10.0/24 range 10.1.10.100 to 10.1.10.199
cumulus@switch:~$ nv set service dhcp-server default pool 10.1.10.0/24 gateway 10.1.10.1
cumulus@switch:~$ nv set service dhcp-server default static server1
cumulus@switch:~$ nv set service dhcp-server default static server1 ip-address 10.0.0.2
cumulus@switch:~$ nv set service dhcp-server default static server1 mac-address 44:38:39:00:01:7e
cumulus@switch:~$ nv config apply

To allocate DHCP addresses from the configured pool, you must configure an interface with an IP address from the pool subnet. For example:

cumulus@switch:~$ nv set interface vlan10 ip address 10.1.10.1/24
cumulus@switch:~$ nv config apply

To set the DNS server IP address and domain name globally, use the nv set service dhcp-server <vrf> domain-name-server <address> and nv set service dhcp-server <vrf> domain-name <domain> commands.

cumulus@switch:~$ nv set service dhcp-server6 default pool 2001:db8::/64 
cumulus@switch:~$ nv set service dhcp-server6 default pool 2001:db8::/64 pool-name storage-servers
cumulus@switch:~$ nv set service dhcp-server6 default pool 2001:db8::/64 domain-name-server 2001:db8:100::64
cumulus@switch:~$ nv set service dhcp-server6 default pool 2001:db8::/64 domain-name example.com
cumulus@switch:~$ nv set service dhcp-server6 default pool 2001:db8::/64 range 2001:db8::100 to 2001:db8::199 
cumulus@switch:~$ nv set service dhcp-server6 default static server1
cumulus@switch:~$ nv set service dhcp-server6 default static server1 ip-address 2001:db8::100
cumulus@switch:~$ nv set service dhcp-server6 default static server1 mac-address 44:38:39:00:01:7e
cumulus@switch:~$ nv config apply

To allocate DHCP addresses from the configured pool, you must configure an interface with an IP address from the pool subnet. For example:

cumulus@switch:~$ nv set interface vlan10 ip address 2001:db8::10/64
cumulus@switch:~$ nv config apply

To set the DNS server IP address and domain name globally, use the nv set service dhcp-server6 <vrf> domain-name-server <address> and nv set service dhcp-server6 <vrf> domain-name <domain> commands.

  1. In a text editor, edit the /etc/dhcp/dhcpd.conf file. Use following configuration as an example:

    cumulus@switch:~$ sudo nano /etc/dhcp/dhcpd.conf
    authoritative;
    subnet 10.1.10.0 netmask 255.255.255.0 {
       option domain-name-servers 192.168.200.53;
       option domain-name example.com;
       default-lease-time 3600;
       max-lease-time 3600;
       default-url ;
    pool {
           range 10.1.10.100 10.1.10.199;
           }
    }
    #Statics
    group {
       host server1 {
          hardware ethernet 44:38:39:00:01:7e;
          fixed-address 10.0.0.2;
       }
    }
    

To set the DNS server IP address and domain name globally, add the DNS server IP address and domain name before the pool information in the /etc/dhcp/dhcpd.conf file. For example:

cumulus@switch:~$ sudo nano /etc/dhcp/dhcpd.conf
authoritative;
option domain-name servers;
option domain-name-servers 192.168.200.51;
subnet 10.1.10.0 netmask 255.255.255.0 {
   default-lease-time 3600;
   max-lease-time 3600;
...
  1. Edit the /etc/default/isc-dhcp-server configuration file so that the DHCP server starts when the system boots. Here is an example configuration:

    cumulus@switch:~$ sudo nano /etc/default/isc-dhcp-server
    DHCPD_CONF="-cf /etc/dhcp/dhcpd.conf"
    

    INTERFACES="swp1"

  2. Enable and start the dhcpd service:

    cumulus@switch:~$ sudo systemctl enable dhcpd.service
    cumulus@switch:~$ sudo systemctl start dhcpd.service
    
  1. In a text editor, edit the /etc/dhcp/dhcpd6.conf file. Use following configuration as an example:

    cumulus@switch:~$ sudo nano /etc/dhcp/dhcpd6.conf
    authoritative;
    subnet6 2001:db8::/64 {
       option domain-name-servers 2001:db8:100::64;
       option domain-name example.com;
       default-lease-time 3600;
       max-lease-time 3600;
       default-url ;
       pool {
           range6 2001:db8::100 2001:db8::199;
       }
    }
    #Statics
    group {
       host server1 {
           hardware ethernet 44:38:39:00:01:7e;
           fixed-address6 2001:db8::100;
       }
    }
    

To set the DNS server IP address and domain name globally, add the DNS server IP address and domain name before the pool information in the /etc/dhcp/dhcpd6.conf file. For example:

cumulus@switch:~$ sudo nano /etc/dhcp/dhcpd6.conf
authoritative;
option domain-name servers;
option domain-name-servers 2001:db8:100::64;
subnet6 2001:db8::/64 {
   default-lease-time 3600;
   max-lease-time 3600;
...
  1. Edit the /etc/default/isc-dhcp-server6 file so that the DHCP server launches when the system boots. Here is an example configuration:

    cumulus@switch:~$ sudo nano /etc/default/isc-dhcp-server6
    DHCPD_CONF="-cf /etc/dhcp/dhcpd6.conf"
    

    INTERFACES="swp1"

  2. Enable and start the dhcpd6 service:

    cumulus@switch:~$ sudo systemctl enable dhcpd6.service
    cumulus@switch:~$ sudo systemctl start dhcpd6.service
    

Optional Configuration

Lease Time

You can set the network address lease time assigned to DHCP clients. You can specify a number between 180 and 31536000. The default lease time is 3600 seconds.

cumulus@switch:~$ nv set service dhcp-server default pool 10.1.10.0/24 lease-time 200000
cumulus@switch:~$ nv config apply
cumulus@switch:~$ nv set service dhcp-server6 default pool 2001:db8::/64 lease-time 200000
cumulus@switch:~$ nv config apply
  1. Edit the /etc/dhcp/dhcpd.conf file to set the lease time (in seconds):

    cumulus@switch:~$ sudo nano /etc/dhcp/dhcpd.conf
    authoritative;
    subnet 10.1.10.0 netmask 255.255.255.0 {
       option domain-name-servers 192.168.200.53;
       option domain-name example.com;
       default-lease-time 200000;
       max-lease-time 200000;
       default-url ;
    pool {
           range 10.1.10.100 10.1.10.199;
           }
    }
    
  2. Restart the dhcpd service:

    cumulus@switch:~$ sudo systemctl restart dhcpd.service
    
  1. Edit the /etc/dhcp/dhcpd6.conf file to set the lease time (in seconds):

    cumulus@switch:~$ sudo nano /etc/dhcp/dhcpd6.conf
    authoritative;
    subnet6 2001:db8::/64 {
       option domain-name-servers 2001:db8:100::64;
       option domain-name example.com;
       default-lease-time 200000;
       max-lease-time 200000;
       default-url ;
       pool {
           range6 2001:db8::100 2001:db8::199;
       }
    }
    
  2. Restart the dhcpd6 service:

    cumulus@switch:~$ sudo systemctl restart dhcpd6.service
    

Ping Check

Configure the DHCP server to ping the address you want to assign to a client before issuing the IP address. If there is no response, DHCP delivers the IP address; otherwise, it attempts the next available address in the range.

cumulus@switch:~$ nv set service dhcp-server default pool 10.1.10.0/24 ping-check on
cumulus@switch:~$ nv config apply
cumulus@switch:~$ nv set service dhcp-server6 default pool 2001:db8::/64 ping-check on
cumulus@switch:~$ nv config apply
  1. Edit the /etc/dhcp/dhcpd.conf file to add ping-check true;:

    cumulus@switch:~$ sudo nano /etc/dhcp/dhcpd.conf
    authoritative;
    subnet 10.1.10.0 netmask 255.255.255.0 {
       option domain-name-servers 192.168.200.53;
       option domain-name example.com;
       default-lease-time 200000;
       max-lease-time 200000;
       ping-check true;
       default-url ;
    pool {
           range 10.1.10.100 10.1.10.199;
           }
    }
    
  2. Restart the dhcpd service:

    cumulus@switch:~$ sudo systemctl restart dhcpd.service
    
  1. Edit the /etc/dhcp/dhcpd6.conf file to add ping-check true;:

    cumulus@switch:~$ sudo nano /etc/dhcp/dhcpd6.conf
    authoritative;
    subnet6 2001:db8::/64 {
       option domain-name-servers 2001:db8:100::64;
       option domain-name example.com;
       default-lease-time 200000;
       max-lease-time 200000;
       ping-check true;
       default-url ;
       pool {
           range6 2001:db8::100 2001:db8::199;
       }
    }
    
  2. Restart the dhcpd6 service:

    cumulus@switch:~$ sudo systemctl restart dhcpd6.service
    

Assign Port-based IP Addresses

You can assign an IP address and other DHCP options based on physical location or port regardless of MAC address to clients that attach directly to the Cumulus Linux switch through a switch port. This is helpful when swapping out switches and servers; you can avoid the inconvenience of collecting the MAC address and sending it to the network administrator to modify the DHCP server configuration.

Cumulus Linux does not provide NVUE commands for this setting.
Cumulus Linux does not provide NVUE commands for this setting.
  1. Edit the /etc/dhcp/dhcpd.conf file to add the interface and IP address:

    cumulus@switch:~$ sudo nano /etc/dhcp/dhcpd.conf
    ...
    authoritative;
    option cumulus-provision-url code 239 = text;
    

    subnet 10.0.0.0 netmask 255.255.255.0 { option domain-name-servers 192.168.200.53; option domain-name "example.com"; default-lease-time 3600; max-lease-time 3600; ping-check off;

    pool {
        range 10.0.0.2 10.0.0.254;
    }
    

    } group { host myhost { ifname "swp1" ; fixed-address 10.0.0.2 ; } } …

  2. Restart the dhcpd service:

    cumulus@switch:~$ sudo systemctl restart dhcpd.service
    
  1. Edit the /etc/dhcp/dhcpd6.conf file to add the interface and IP address:

    cumulus@switch:~$ sudo nano /etc/dhcp/dhcpd6.conf
    ...
    host myhost {
        ifname "swp1" ;
        fixed-address 2001:db8::100 ;
    }
    ...
    
  2. Restart the dhcpd6 service:

    cumulus@switch:~$ sudo systemctl restart dhcpd6.service
    

Troubleshooting

To show the current DHCP server settings, run the nv show service dhcp-server command:

cumulus@leaf01:mgmt:~$ nv show service dhcp-server
           Summary
---------  ------------------
+ default  interface:   "swp1
  default  pool: 10.1.10.0/24
  default  static:    server1

The DHCP server determines if a DHCP request is a relay or a non-relay DHCP request. Run the following command to see the DHCP request:

cumulus@server02:~$ sudo tail /var/log/syslog | grep dhcpd
2016-12-05T19:03:35.379633+00:00 server02 dhcpd: Relay-forward message from 2001:db8:101::1 port 547, link address 2001:db8:101::1, peer address fe80::4638:39ff:fe00:3
2016-12-05T19:03:35.380081+00:00 server02 dhcpd: Advertise NA: address 2001:db8::110 to client with duid 00:01:00:01:1f:d8:75:3a:44:38:39:00:00:03 iaid = 956301315 valid for 600 seconds
2016-12-05T19:03:35.380470+00:00 server02 dhcpd: Sending Relay-reply to 2001:db8:101::1 port 547

Considerations

DHCP packets received on bridge ports and sent to the CPU for processing cause the RX_DROP counter to increment on the interface.

DHCP Snooping

DHCP snooping enables Cumulus Linux to act as a middle layer between the DHCP infrastructure and DHCP clients by scanning DHCP control packets and building an IP-MAC database. Cumulus Linux accepts DHCP offers from only trusted interfaces and can rate limit packets.

DHCP option 82 processing is not supported.

Configure DHCP Snooping

To configure DHCP snooping, you need to:

The following example shows you how to configure DHCP snooping for IPv4 and IPv6.

NVUE does not provide commands to configure DHCP Snooping.

Create the /etc/dhcpsnoop/dhcp_snoop.json file and add DHCP snooping configuration under the bridge.

The following example enables DHCP snooping for IPv4 on VLAN 10, sets the rate limit to 50 and the trusted interface to swp3. swp3 is a member of the bridge br_default:

cumulus@leaf01:~$ sudo nano /etc/dhcpsnoop/dhcp_snoop.json
{
  "bridge": [
    {
      "bridge_id": "br_default",
      "vlan": [
        {
          "vlan_id": 10,
          "snooping": 1,
          "rate_limit": 50,
          "ip_version": 4,
          "trusted_interface": [
            "swp3"
          ],
        }
      ]
    }
  ]
}

The following example enables DHCP snooping for IPv6 on VLAN 10, sets the rate limit to 50 and the trusted interface to swp6. swp6 is a member of the bridge br_default:

cumulus@leaf01:~$ sudo nano /etc/dhcpsnoop/dhcp_snoop.json
{
  "bridge": [
    {
      "bridge_id": "br_default",
      "vlan": [
        {
          "vlan_id": 10,
          "snooping": 1,
          "rate_limit": 50,
          "ip_version": 6,
          "trusted_interface": [
            "swp6"
          ],
        }
      ]
    }
  ]
}

When DHCP snooping detects a violation, the packet is dropped and a message is logged to the /var/log/dhcpsnoop.log file.

Show the DHCP Binding Table

To show the DHCP binding table, run the net show dhcp-snoop table command for IPv4 or the net show dhcp-snoop6 table command for IPv6. The following example command shows the DHCP binding table for IPv4:

cumulus@leaf01:~$ net show dhcp-snoop table
Port VLAN IP        MAC               Lease State Bridge
---- ---- --------- ----------------- ----- ----- ------

swp5 1002 10.0.0.3  00:02:00:00:00:04 7200  ACK   br0

swp5 1000 10.0.1.3  00:02:00:00:00:04 7200  ACK   br0

Prescriptive Topology Manager - PTM

In data center topologies, right cabling is time consuming and error prone. PTM is a dynamic cabling verification tool that can detect and eliminate errors. PTM uses a Graphviz-DOT specified network cabling plan in a topology.dot file and couples it with runtime information from LLDP to verify that the cabling matches the specification. The check occurs on every link transition on each node in the network.

You can customize the topology.dot file to control ptmd at both the global/network level and the node/port level.

PTM runs as a daemon, named ptmd.

Supported Features

Configure PTM

ptmd verifies the physical network topology against a DOT-specified network graph file, /etc/ptm.d/topology.dot.

PTM supports undirected graphs.

At startup, ptmd connects to lldpd (the LLDP daemon) over a Unix socket and retrieves the neighbor name and port information. It then compares the retrieved port information with the configuration information that it reads from the topology file. If there is a match, it is a PASS, otherwise it is a FAIL.

PTM performs its LLDP neighbor check using the PortID ifname TLV information.

ptmd Scripts

ptmd executes scripts at /etc/ptm.d/if-topo-pass and /etc/ptm.d/if-topo-failfor each interface that goes through a change and runs if-topo-pass when an LLDP or BFD check passes or if-topo-fails when the check fails. The scripts receive an argument string that is the result of the ptmctl command; see ptmd commands below.

You can modify these default scripts.

Configuration Parameters

You can configure ptmd parameters in the topology file. The parameters are host-only, global, per-port/node and templates.

Host-only Parameters

Host-only parameters apply to the entire host on which PTM is running. You can include the hostnametype host-only parameter, which specifies if PTM uses only the hostname (hostname) or the fully qualified domain name (fqdn) while looking for the self-node in the graph file. For example, in the graph file below PTM ignores the FQDN and only looks for switch04 because that is the hostname of the switch on which it is running:

  • Always wrap the hostname in double quotes; for example, "www.example.com" to prevent ptmd from failing.
  • To avoid errors when starting the ptmd process, make sure that /etc/hosts and /etc/hostname both reflect the hostname you are using in the topology.dot file.

graph G {
          hostnametype="hostname"
          BFD="upMinTx=150,requiredMinRx=250"
          "cumulus":"swp44" -- "switch04.cumulusnetworks.com":"swp20"
          "cumulus":"swp46" -- "switch04.cumulusnetworks.com":"swp22"
}

In this next example, PTM compares using the FQDN and looks for switch05.cumulusnetworks.com, which is the FQDN of the switch ion which it is running:

graph G {
          hostnametype="fqdn"
          "cumulus":"swp44" -- "switch05.cumulusnetworks.com":"swp20"
          "cumulus":"swp46" -- "switch05.cumulusnetworks.com":"swp22"
}

Global Parameters

Global parameters apply to every port in the topology file. There are two global parameters: LLDP and BFD. LLDP is on by default; if no keyword is present, PTM uses the default values for all ports. However, BFD is off if no keyword is present unless a per-port override exists. For example:

graph G {
          LLDP=""
          BFD="upMinTx=150,requiredMinRx=250,afi=both"
          "cumulus":"swp44" -- "qct-ly2-04":"swp20"
          "cumulus":"swp46" -- "qct-ly2-04":"swp22"
}

Per-port Parameters

Per-port parameters provide finer-grained control at the port level. These parameters override any global or compiled defaults. For example:

graph G {
          LLDP=""
          BFD="upMinTx=300,requiredMinRx=100"
          "cumulus":"swp44" -- "qct-ly2-04":"swp20" [BFD="upMinTx=150,requiredMinRx=250,afi=both"]
          "cumulus":"swp46" -- "qct-ly2-04":"swp22"
}

Templates

Templates provide flexibility in choosing different parameter combinations and applying them to a given port. A template instructs ptmd to reference a named parameter string instead of a default one. There are two parameter strings ptmd supports:

For example:

graph G {
          LLDP=""
          BFD="upMinTx=300,requiredMinRx=100"
          BFD1="upMinTx=200,requiredMinRx=200"
          BFD2="upMinTx=100,requiredMinRx=300"
          LLDP1="match_type=ifname"
          LLDP2="match_type=portdescr"
          "cumulus":"swp44" -- "qct-ly2-04":"swp20" [BFD="bfdtmpl=BFD1", LLDP="lldptmpl=LLDP1"]
          "cumulus":"swp46" -- "qct-ly2-04":"swp22" [BFD="bfdtmpl=BFD2", LLDP="lldptmpl=LLDP2"]
          "cumulus":"swp46" -- "qct-ly2-04":"swp22"
}

In this template, LLDP1 and LLDP2 are templates for LLDP parameters. BFD1 and BFD2 are templates for BFD parameters.

Supported BFD and LLDP Parameters

ptmd supports the following BFD parameters:

The following is an example of a topology with BFD at the port level:

graph G {
          "cumulus-1":"swp44" -- "cumulus-2":"swp20" [BFD="upMinTx=300,requiredMinRx=100,afi=v6"]
          "cumulus-1":"swp46" -- "cumulus-2":"swp22" [BFD="detectMult=4"]
}

ptmd supports the following LLDP parameters:

The following is an example of a topology with LLDP at the port level:

graph G {
          "cumulus-1":"swp44" -- "cumulus-2":"swp20" [LLDP="match_hostname=fqdn"]
          "cumulus-1":"swp46" -- "cumulus-2":"swp22" [LLDP="match_type=portdescr"]
}

When you specify match_hostname=fqdn, ptmd matches the entire FQDN, (cumulus-2.domain.com in the example below). If you do not specify anything for match_hostname, ptmd matches based on hostname only, (cumulus-3 below), and ignores the rest of the URL:

graph G {
          "cumulus-1":"swp44" -- "cumulus-2.domain.com":"swp20" [LLDP="match_hostname=fqdn"]
          "cumulus-1":"swp46" -- "cumulus-3":"swp22" [LLDP="match_type=portdescr"]
}

BFD

BFD provides low overhead and rapid detection of failures in the paths between two network devices. It provides a unified mechanism for link detection over all media and protocol layers. Use BFD to detect failures for IPv4 and IPv6 single or multihop paths between any two network devices, including unidirectional path failure detection. For information about configuring BFD using PTM, see BFD.

You can enable PTM to perfom additional checks to ensure that routing adjacencies form only on links that have connectivity and that conform to the specification that ptmd defines.

You only need to enable PTM to check link state. You do not need to enable PTM to determine BFD status.

cumulus@switch:~$ nv set router ptm enable
cumulus@switch:~$ nv config apply

To disable the check link state:

cumulus@switch:~$ nv unset router ptm enable
cumulus@switch:~$ nv config apply
cumulus@switch:~$ sudo vtysh
...
switch# configure terminal
switch(config)# ptm-enable
switch(config)# end
switch# write memory
switch# exit
cumulus@switch:~$

To disable the check link state, set the no ptm-enable parameter:

cumulus@switch:~$ sudo vtysh
...
switch# configure terminal
switch(config)# no ptm-enable
switch(config)# end
switch# write memory
switch# exit
cumulus@switch:~$

To check PTM status on an interface, run the net show interface <interface> command or the vtysh show interface <interface> command.

cumulus@switch:~$ net show interface swp4
       Name  MAC                Speed  MTU   Mode
-----  ----  -----------------  -----  ----  -------------
ADMDN  swp4  48:b0:2d:59:0a:de  N/A    1500  NotConfigured

Routing
-------
Interface swp4 is up, line protocol is up
  Link ups:       0    last: (never)
  Link downs:     0    last: (never)
  PTM status: disabled
  vrf: default
  index 3 metric 0 mtu 1550 speed 4294967295
  flags: <UP,BROADCAST,RUNNING,MULTICAST>
  Type: Ethernet
  HWaddr: c4:54:44:bd:01:41
...

ptmd Service Commands

PTM sends client notifications in CSV format.

To start or restart the ptmd service, run the following command. The topology.dot file must be present for the service to start.

cumulus@switch:~$ sudo systemctl start|restart|force-reload ptmd.service

To instruct ptmd to read the topology.dot file again to apply the new configuration to the running state without restarting:

cumulus@switch:~$ sudo systemctl reload ptmd.service

To stop the ptmd service:

cumulus@switch:~$ sudo systemctl stop ptmd.service

To retrieve the current running state of ptmd:

cumulus@switch:~$ sudo systemctl status ptmd.service

ptmctl Commands

ptmctl is a client of ptmd that retrieves the operational state of the ports configured on the switch and information about BFD sessions from ptmd. ptmctl parses the CSV notifications sent by ptmd. See man ptmctl for more information.

ptmctl Examples

The examples below contain the following keywords in the output of the cbl status column:

cbl status KeywordDefinition
passThe topology file defines the interface, the interface receives LLDP information, and the LLDP information for the interface matches the information in the topology file.
failThe topology file defines the interface, the interface receives LLDP information, and the LLDP information for the interface does not match the information in the topology file.
N/AThe topology file defines the interface but the interface does not receive LLDP information. The interface might be down or disconnected, or the neighbor is not sending LLDP packets.
The N/A and fail status might indicate a wiring problem to investigate.
The N/A status does not show when you use the -l option with ptmctl; The output shows only interfaces that are receiving LLDP information.

For basic output, use ptmctl without any options:

cumulus@switch:~$ sudo ptmctl

-------------------------------------------------------------
port  cbl     BFD     BFD                  BFD    BFD
      status  status  peer                 local  type
-------------------------------------------------------------
swp1  pass    pass    11.0.0.2             N/A    singlehop
swp2  pass    N/A     N/A                  N/A    N/A
swp3  pass    N/A     N/A                  N/A    N/A

For more detailed output, use the -d option:

cumulus@switch:~$ sudo ptmctl -d

--------------------------------------------------------------------------------------
port  cbl    exp     act      sysname  portID  portDescr  match  last    BFD   BFD
      status nbr     nbr                                  on     upd     Type  state
--------------------------------------------------------------------------------------
swp45 pass   h1:swp1 h1:swp1  h1       swp1    swp1       IfName 5m: 5s  N/A   N/A
swp46 fail   h2:swp1 h2:swp1  h2       swp1    swp1       IfName 5m: 5s  N/A   N/A

#continuation of the output
-------------------------------------------------------------------------------------------------
BFD   BFD       det_mult  tx_timeout  rx_timeout  echo_tx_timeout  echo_rx_timeout  max_hop_cnt
peer  DownDiag
-------------------------------------------------------------------------------------------------
N/A   N/A       N/A       N/A         N/A         N/A              N/A              N/A
N/A   N/A       N/A       N/A         N/A         N/A              N/A              N/A

To return information on active BFD sessions ptmd is tracking, use the -b option:

cumulus@switch:~$ sudo ptmctl -b

----------------------------------------------------------
port  peer        state  local         type       diag

----------------------------------------------------------
swp1  11.0.0.2    Up     N/A           singlehop  N/A
N/A   12.12.12.1  Up     12.12.12.4    multihop   N/A

To return LLDP information, use the -l option. The output returns only the active neighbors that ptmd is tracking.

cumulus@switch:~$ sudo ptmctl -l

---------------------------------------------
port  sysname  portID  port   match  last
                       descr  on     upd
---------------------------------------------
swp45 h1       swp1    swp1   IfName 5m:59s
swp46 h2       swp1    swp1   IfName 5m:59s

To return detailed information on active BFD sessions ptmd is tracking, use the -b and -d option (results are for an IPv6-connected peer):

cumulus@switch:~$ sudo ptmctl -b -d

----------------------------------------------------------------------------------------
port  peer                 state  local  type       diag  det   tx_timeout  rx_timeout
                                                          mult
----------------------------------------------------------------------------------------
swp1  fe80::202:ff:fe00:1  Up     N/A    singlehop  N/A   3     300         900
swp1  3101:abc:bcad::2     Up     N/A    singlehop  N/A   3     300         900

#continuation of output
---------------------------------------------------------------------
echo        echo        max      rx_ctrl  tx_ctrl  rx_echo  tx_echo
tx_timeout  rx_timeout  hop_cnt
---------------------------------------------------------------------
0           0           N/A      187172   185986   0        0
0           0           N/A      501      533      0        0

ptmctl Error Outputs

If there are errors in the topology file or there is no session, PTM returns appropriate outputs. Typical error strings are:

Topology file error [/etc/ptm.d/topology.dot] [cannot find node cumulus] -
please check /var/log/ptmd.log for more info

Topology file error [/etc/ptm.d/topology.dot] [cannot open file (errno 2)] -
please check /var/log/ptmd.log for more info

No Hostname/MgmtIP found [Check LLDPD daemon status] -
please check /var/log/ptmd.log for more info

No BFD sessions . Check connections

No LLDP ports detected. Check connections

Unsupported command

For example:

cumulus@switch:~$ sudo ptmctl
-------------------------------------------------------------------------
cmd         error
-------------------------------------------------------------------------
get-status  Topology file error [/etc/ptm.d/topology.dot]
            [cannot open file (errno 2)] - please check /var/log/ptmd.log
            for more info

If you encounter errors with the topology.dot file, you can use dot (included in the Graphviz package) to validate the syntax of the topology file.

Open the topology file with Graphviz to ensure that it is readable and that the file format is correct.

If you edit topology.dot file from a Windows system, be sure to double check the file formatting; there might be extra characters that keep the graph from working correctly.

Basic Topology Example

The following example shows a basic example DOT file and its corresponding topology diagram. Use the same topology.dot file on all switches and do not split the file for each device to allow for easy automation by using the same exact file on each device.

graph G {
    "spine1":"swp1" -- "leaf1":"swp1";
    "spine1":"swp2" -- "leaf2":"swp1";
    "spine2":"swp1" -- "leaf1":"swp2";
    "spine2":"swp2" -- "leaf2":"swp2";
    "leaf1":"swp3" -- "leaf2":"swp3";
    "leaf1":"swp4" -- "leaf2":"swp4";
    "leaf1":"swp5s0" -- "server1":"eth1";
    "leaf2":"swp5s0" -- "server2":"eth1";
}

Considerations

ptmd Is in an Incorrect Failure State

When ptmd is in an incorrect failure state and you enable the Zebra interface, PIF BGP sessions do not establish the route but the subinterface does establish routes.

If the subinterface is on the physical interface and PTM marks the physical interface in a PTM FAIL state, FRR does not process routes on the physical interface, but the subinterface is working.

Commas in Port Descriptions

If an LLDP neighbor advertises a PortDescr that contains commas, ptmctl -d splits the string on the commas and misplaces its components in other columns. Do not use commas in your port descriptions.

Port Security

Port security is a layer 2 traffic control feature that enables you to manage network access from end-users. Use port security to:

You can specify what action to take when there is a port security violation (drop packets or put the port into ADMIN down state) and add a timeout for the action to take effect.

  • Layer 2 interfaces in trunk or access mode are currently supported. However, interfaces in a bond are not supported.
  • NVUE commands are not available for port security configuration.

Configure Port Security

To configure port security, add the configuration settings you want to use to the /etc/cumulus/switchd.d/port_security.conf file, then restart switchd to apply the changes.

Setting
Description
interface.<port>.port_security.enable1 enables security on the port. 0 disables security on the port.
interface.<port>.port_security.mac_limitThe maximum number of MAC addresses allowed to access the port. You can specify a number between 0 and 512. The default is 32.
interface.<port>.port_security.static_macThe specific MAC addresses allowed to access the port. You can specify multiple MAC addresses. Separate each MAC address with a space.
interface.<port>.port_security.sticky_mac1 enables sticky MAC, where the first learned MAC address on the port is the only MAC address allowed. 0 disables sticky MAC.
interface.<port>.port_security.sticky_timeoutThe time period after which the first learned MAC address ages out and no longer has access to the port. The default aging timeout value is 30 minutes. You can specify a value between 0 and 60 minutes.
interface.<port>.port_security.sticky_aging1 enables sticky MAC aging. 0 disables sticky MAC aging.
interface.<port>.port_security.violation_modeThe violation mode: 0 (shutdown) puts a port into ADMIN down state. 1 (restrict) drops packets.
interface.<