
Cumulus Linux 5.1 Release Notes


5.1.0 Release Notes

Open Issues in 5.1.0

Issue ID | Description | Affects | Fixed
When you try to upgrade a switch from Cumulus Linux 5.5 or earlier to 5.8.0 with package upgrade, you see errors for expired GPG keys that prevent you from upgrading. To work around this issue, install the new keys with the following commands, then upgrade the switch.
cumulus@switch:~$ wget https://download.nvidia.com/cumulus/apt.cumulusnetworks.com/repo/pool/cumulus/c/cumulus-archive-keyring/cumulus-archive-keyring_4-cl5.6.0u5_all.deb
cumulus@switch:~$ sudo apt install ./cumulus-archive-keyring_4-cl5.6.0u5_all.deb
cumulus@switch:~$ sudo apt update
cumulus@switch:~$ sudo apt upgrade
4.0.0-4.4.5, 5.0.0-5.8.0
When monitoring system statistics and network traffic with sFlow, an aggressive link flap might produce a memory leak in the sFlow service hsflowd.
5.1.0-
DHCP lease information is not collected in the cl-support file.
4.3.0-
If BGP remote-as is set to an integer and you try to configure the local-as for a BGP instance, you see the following error:
% AS specified for local as is the same as the remote as and this is not allowed
This configuration is not allowed; it is considered to be eBGP and local preference is not advertised.
5.0.0-, 5.6.0-5.8.0
NVUE and the ip link show command report traditional bridge VLAN subinterface counts incorrectly. The ingress (Rx) count increments correctly but the egress (Tx) count does not increment. This issue occurs because the hardware does not support transmit counters for a VLAN subinterface; therefore, no statistics from the hardware are updated. Statistics for software forwarded packets show correctly.
5.0.0-
When you configure a route distinguisher (RD) or a route target (RT) manually for layer 2 VNIs, type-1 routes are not properly updated, type-1 EVI routes with the old RD are not properly withdrawn, and type-1 ES routes do not have the corresponding layer 2 VNI route target updated.
5.0.0-
CVE-2023-38408: The PKCS#11 feature in ssh-agent in OpenSSH before 9.3p2 has an insufficiently trustworthy search path, leading to remote code execution if an agent is forwarded to an attacker-controlled system. (Code in /usr/lib is not necessarily safe for loading into ssh-agent.) NOTE: this issue exists because of an incomplete fix for CVE-2016-10009
Mitigation: Do not use ssh-agent forwarding (the man page for ssh_config says that “agent forwarding should be enabled with caution”), or start the ssh-agent program with the -P option to allow only specific PKCS#11 libraries (or none with -P '').
For Cumulus Linux 4.3.2, the /usr/bin/ssh-agent program has all permissions turned off (chmod 0) to prevent its execution if a vulnerable version is detected.
4.0.0-4.3.1, 5.0.0-
Collecting a cl-support file in a high VNI and interface environment can result in an out-of-memory (OOM) event on the switch. An OOM event can cause critical services to restart and might impact traffic.
5.1.0-
When BGP receives an EVPN type-5 route with a gateway IP overlay attribute, the gateway IP overlay attribute in the attr memory (which is already inserted in the attribute hash) might change. As a result, the modified attr memory might match with another attr in the attribute hash, which produces duplicate entries in the hash table. As a result, BGP might crash when deleting one of the duplicate attr structures.
5.0.0-
When zebra receives route updates that include both a route with a recursive next hop and the route used to resolve that next hop, zebra might mark the route with the recursive next hop as inactive. To work around this issue, reprocess the route updates by running the appropriate clear command for the protocol in use. For example, for BGP, clear inbound routes from the relevant neighbor using the nv action clear vrf router bgp neighbor address-family in command.
4.2.1-
The SNMP MIB definition file /usr/share/snmp/mibs/Cumulus-BGPVRF-MIB.txt does not define the INDEX of the bgpPeerEntry correctly. This issue does not impact SNMP functionality for this MIB.
4.3.1-
FRR does not apply Type-0 ESI configuration for EVPN multihoming bonds consistently after an FRR service reload. This issue occurs because the system MAC address value (es-sys-mac) is only compatible with a 3-byte Ethernet segment ID (es-id) for Type-3 ESIs, but still renders even when the Ethernet segment ID is 10 bytes for Type-0 ESIs. To work around this issue, configure EVPN multihoming bonds with a Type-3 ESI (es-sys-mac plus a 3-byte es-id).
5.0.0-
When you remove the restriction from a TACACS+ mapped user to remove per command authorization, the tacplus-restrict -R command does not restore ownership of restored files correctly. As a result, some commands might fail due to permission errors in the files or directories under the home directory. To work around this issue, run the sudo chown command to correct the ownership of the affected files and directories.
5.0.0-
On the Spectrum-2 and Spectrum-3 switch, multiple interfaces (in the same PLL quarter) might flap intermittently at the same time.
4.2.1-
sudo for TACACS+ users with privilege level 15 does not work when reaching the TACACS+ server through the default VRF. To work around this issue, specify the interface name that the default VRF uses in the vrf= setting of the /etc/tacplus_servers file or run the NVUE nv set system aaa tacacs vrf command. If you don’t run either command, a TACACS+ user with privilege level 15 can run vrf task exec default sudo … to execute the sudo command.
5.0.0-5.8.0
The ADVA 5401 SFP module with hardware revision 5.01 does not come up at layer 1 when you use 10G QSA adaptors. To work around this issue, use 25G QSA adaptors.
4.4.0-4.4.5, 5.0.0-5.8.0
During upgrade, when one MLAG node is upgraded and the other MLAG node is not yet upgraded, permanent neighbors cannot synchronize between MLAG nodes. The clagctl dumppermanentneighs command only shows local neighbors.
4.2.1-4.3.1, 4.4.0-, 5.5.0-5.8.0
To reach the TACACS+ server through the default VRF, you must specify the egress interface you use in the default VRF. Either run the NVUE nv set system aaa tacacs vrf command (for example, nv set system aaa tacacs vrf swp51) or set the vrf= option in the /etc/tacplus_servers file (for example, vrf=swp51). A similar issue might prevent TACACS+ users with privilege level 15 from using sudo if the TACACS+ server is reachable only on the default VRF. If this occurs, and you do not run the above configuration workaround, the TACACS+ user with privilege level 15 can use vrf task exec default sudo … to execute the sudo command using the TACACS+ server on the default VRF.
5.0.0-
For layer 3 interfaces configured on the switch, certain triggers, such as port flaps and subinterface flaps, or when configuring the ports to and from layer 2 and layer 3, cause the dummy internal VLAN to not free up, which can result in exhaustion of the dummy internal VLANs designated for the layer 3 interfaces. When this occurs, you see the following switchd log messages:
ERR dummy internal vlans exhausted
ERR cannot allocate vlan for sub-interface
TACACS+ packages in the local apt repository might be out of date; as a result, the upgrade does not install tacacs0 through tacacs15 users in the correct NVUE groups. When you run NVUE commands as a TACACS+ user, the commands fail and you see the error You do not have permission to execute that command
To obtain the correct packages, install the tacplus-client package and its dependencies from apt.cumulusnetworks.com.
Currently, the default core dump size limit on Cumulus Linux is 256M but the SDK generates core dumps around 800M. To avoid incomplete core files, you can increase the core dump size limit.
4.2.1-4.3.1, 4.4.0-, 5.4.0-5.8.0
NVUE deprecated the port split command options (2x10G, 2x25G, 2x40G, 2x50G, 2x100G, 2x200G, 4x10G, 4x25G, 4x50G, 4x100G, 8x50G) with no backwards compatibility.
5.0.0-
In an MLAG configuration, when a link failure occurs on the peerlink or the peerlink shuts down, the switch in the secondary role attracts traffic to its local VTEP as it advertises the local VTEP IP address momentarily just before the VXLAN device is protodown. This traffic is dropped for a brief moment (between 5 and 10 seconds) because the MLAG bonds on the secondary switch are already protodown.
5.1.0-
When using TACACS+, a TACACS+ server name that returns more than one IP address, such as an IPv6 and IPv4 address, is counted many times against the limit of seven TACACS+ servers, which might cause some of the later listed servers to be ignored as over the limit. To work around this issue, you can set the prefer_ip_version configuration option (the default value is 4) to choose between an IPv4 or IPv6 address if both are present.
3.7.0-
If you use su to change to a user specified through TACACS+, the user becomes the local tacacs0 through tacacs15 user instead of the named user to run sudo commands. As a result, the named user password might not match the local tacacs0 through tacacs15 user password.
3.7.0-3.7.16, 4.0.0-4.4.5, 5.0.0-5.8.0
Some EVPN multihoming show commands might cause BGP to crash if you use the json flag and attempt to reference the default VRF by name. For example, show bgp l2vpn evpn es-vrf json.
5.0.0-
After restarting switchd on the NVIDIA SN2100 switch, the fan speeds are at 100 percent. To work around this issue, restart the hw-management service.
4.4.5-
When you try to configure VRF route leaking between many VRFs using multiple NCLU commands before running the net commit command, the commit fails. To work around this issue, configure VRF leaking one command at a time and run net commit after each command.
4.4.4-
When daylight saving time changes the time, the MLAG initDelay timer resets and all MLAG bonds go down.
4.4.4-
Certain routes on tenant VRFs have missing next hop entries because the router MAC address is missing in the bridge forwarding database table that corresponds to the remote VTEP. As a result, traffic forwarding is affected for these routes.
4.3.0-
When you run the NVUE nv set bridge domain br_default multicast snooping enable off command to disable multicast snooping, the bridge still shows that multicast snooping is enabled.
5.0.1-
Multicast PTP over UDP traffic does not forward to data ports when the PTP service is disabled. To work around this issue, change the ptp.timestamping setting to FALSE in the /etc/cumulus/switchd.conf file, then restart switchd.
5.0.1-
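The ptp.timestamping workaround above comes down to changing one setting in a configuration file. The following is a minimal sketch only: it assumes the file stores settings as uncommented `key = value` lines, and the helper name set_conf_value is hypothetical (it is not a Cumulus Linux tool). Back up the file before you edit it.

```shell
# Hypothetical helper: set "key = value" in a simple key/value file.
# Assumption: settings are stored as uncommented "key = value" lines.
# Existing uncommented lines for the key are rewritten; if the key is
# not present, a new line is appended at the end of the file.
set_conf_value() {
  file=$1
  key=$2
  value=$3
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    {
      line = $0
      sub(/^[[:space:]]+/, "", line)          # drop leading whitespace
      split(line, parts, /[[:space:]]*=/)     # parts[1] is the key name
      if (parts[1] == k) { print k " = " v; found = 1; next }
      print
    }
    END { if (!found) print k " = " v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage for the workaround above (run as root, then restart switchd):
#   set_conf_value /etc/cumulus/switchd.conf ptp.timestamping FALSE
```

Writing through a temporary file and copying the result back with cat keeps the ownership and permissions of the original file.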
RADIUS authenticated users with read-only access to NCLU commands (users in the users_with_show list) can run edit commands if a username for a non-local account is on the users_with_edit line of the /etc/netd.conf file. To work around this issue, make sure that all usernames on the users_with_edit line of the /etc/netd.conf file are configured local users for the system (real Linux users).
3.7.0-
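To audit the users_with_edit line described above, you can compare each name on that line with the local account database. The following is a sketch under stated assumptions: the line is written as `users_with_edit = name1, name2`, local accounts are the ones listed in /etc/passwd, and the function name nonlocal_edit_users is hypothetical.

```shell
# Hypothetical helper: print each name on the users_with_edit line that
# has no entry in the local passwd file.
# Assumption: the line looks like "users_with_edit = root, cumulus".
nonlocal_edit_users() {
  conf=$1
  passwd=${2:-/etc/passwd}
  sed -n 's/^[[:space:]]*users_with_edit[[:space:]]*=[[:space:]]*//p' "$conf" |
    tr ',' '\n' |
    while read -r user; do
      # read strips the surrounding whitespace; skip empty fields
      [ -n "$user" ] || continue
      cut -d: -f1 "$passwd" | grep -qxF -- "$user" || echo "$user"
    done
}

# Usage:
#   nonlocal_edit_users /etc/netd.conf
```

Any name that the function prints is a candidate to remove from the users_with_edit line.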
On the NVIDIA Spectrum-2 switch, when receiving multicast traffic on a PIM enabled VLAN, the multicast traffic is forwarded correctly to the associated VLAN, however WJH shows traffic loss with the error:

Packet size is larger than router interface MTU – Validate the router interface MTU configuration
The NVIDIA SN4600 switch might experience SDK errors caused by the garbage collection process.
5.1.0-
At high scale with 79 VRFs and 10 VLANs per VRF (a total of 790 VLANs), clagd loses backup connection during a switchd restart. To work around this issue, reduce the scale to 40 VRFs with no more than 400 VLANs in the configuration, and use a common MAC address.
5.1.0-
Locally generated multicast traffic, including IGMPv2 GSQs, does not transmit to local clients when using PIM.
5.0.1-
After rebooting the switch, the IPv6 link local address for an SVI that belongs to a non-default VRF is missing, and doesn’t show on the switch. To resolve this issue, run the ifreload -a command.
5.0.0-
When the switch receives an LLDP frame from a Cisco router right after a ptmd restart, the ptmd service crashes.
4.3.0-4.3.1, 4.4.0-, 5.3.0-5.8.0
At high scale with 160 VRFs and 10 VLANs per VRF (a total of 1600 VLANs), you see traffic loss during primary switch reboot. To work around this issue, reduce the scale to 40 VRFs with no more than 400 VLANs in the configuration, and use a common MAC address.
5.1.0-
The EVPN Multihoming ESI configuration command nv set interface evpn multihoming segment identifier does not work.
5.1.0-
The cl-support generation script causes TC filter collection to run as a background process for each interface, which can lead to memory exhaustion on a high scale configuration and on a switch with a small memory footprint.
5.1.0-
The NVUE nv set bridge domain br_default stp priority command does not change the STP priority.
5.1.0-
In rare cases, changing configuration on an existing bond, VLAN, or VXLAN interface can result in the MTU of that interface being reset to 0. To work around this issue, run ifreload -a a second time to set the MTU back to the configured or default value.
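Before you run ifreload -a a second time, you can check whether any interface is in this state. The sketch below reads the MTU that the Linux kernel exposes for each interface under /sys/class/net; the function name zero_mtu_interfaces is hypothetical, and the optional argument exists only so that you can point the function at a test directory.

```shell
# Hypothetical helper: list interfaces whose MTU reads as 0.
# The kernel exposes the MTU of each interface in /sys/class/net/<name>/mtu.
zero_mtu_interfaces() {
  root=${1:-/sys/class/net}
  for f in "$root"/*/mtu; do
    [ -r "$f" ] || continue            # skip when the glob matches nothing
    if [ "$(cat "$f")" = "0" ]; then
      basename "$(dirname "$f")"       # print the interface name
    fi
  done
}

# Usage:
#   zero_mtu_interfaces
```

If the function prints any interface names, run ifreload -a again as described above.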
FRR does not install EVPN type-2 routes correctly after the specific operation that deletes and adds all non-uplink ports. The routes show as rejected in the zebra RIB. To work around this problem, restart FRR with the sudo systemctl restart frr command.
5.1.0-
The NVUE command to disable EVPN duplicate address detection does not work. To work around this issue, use an NVUE snippet.
When you try to query REDECN counters with the mlxcmd utility on a bond member port with the following commands, syslog reports an error:
sudo /usr/lib/cumulus/mlxcmd roce counters --port
sudo /usr/lib/cumulus/mlxcmd qos counters --clear --port
Cumulus Linux incorrectly programs overlay routes in the hardware as LOCAL routes instead of pointing to the remote VTEP even though the kernel has the correct route entry and next hop. To recover from this state, restart the switchd service with the systemctl restart switchd.service command.
During a host failure, where a link remains up but LACP stops being sent, the EVPN multihoming ES bond goes into bypass mode active without a link state change.
4.4.2-
When a ZTP script executes a switchd restart, the switchd service might fail with the following log message:
switchd[11549]: hal.c:1378 CRIT No backends found
To work around this issue, avoid restarting the switchd service in the ZTP script; reboot the switch instead.
NVUE configuration commands produce errors when included as part of a ZTP script that executes automatically during the switch boot process. This occurs because the $HOME variable is not set during ZTP. This does not occur if you trigger ZTP manually from the CLI with the sudo ztp -r http://x.x.x.x/cumulus-ztp command. To work around this issue, define the $HOME variable within the ZTP script with export HOME=/root.
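The export HOME=/root workaround can also be written as a guard, so that the same ZTP script behaves identically whether ZTP runs at boot or you trigger it manually. This is a sketch of a fragment to place before any NVUE commands in the script; the function name ensure_home is hypothetical.

```shell
# Set HOME only when the environment (for example, ZTP during boot)
# leaves it unset or empty; keep an existing value otherwise.
ensure_home() {
  if [ -z "${HOME:-}" ]; then
    HOME=/root
  fi
  export HOME
}

ensure_home
```

When you run ZTP manually from the CLI, HOME is already set and the guard leaves it unchanged.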
FRR does not establish BGP peering with neighbors configured with a router ID that overlaps with IP addresses in the class D or E address spaces.
The BGP4-MIB.txt file is missing from Net-SNMP agent.
5.0.0-
A slow memory leak (~5KB over a 24-hour period at a 60 second polling interval) might occur in SNMP when you walk the following system MIB objects: Entity MIB, Entity Sensor MIB, rip2, interface/interfaces, ifMIB, IP, hostResource.
If there is extensive and continuous next-hop group (NHG) churn when routes keep moving from one NHG to another NHG repeatedly, switchd increases in memory allocation until memory is exhausted. Other processes might be affected as they try to acquire memory which is unavailable.
5.0.1-
When Cumulus Linux updates the ECMP container with a new next hop list, it allocates the flow counters for the new next hop list without deallocating the counters bound to the old next hop list. This results in resource exhaustion and you see the following error messages in the /var/log/switchd.log file:
hal_mlx_stat.c:3215 ERR Failed to allocate counter(s) for ecmp [71025:0] status: Internal Error
hal_mlx_stat.c:3196 ERR Counter set for ecmp [71025:0] idx 0 failed: Internal Error
hal_mlx_sdk_nexthop_wrap.c:1076 ERR Counter 0 alloc for ecmp next hop failed: Internal Error
hal_mlx_sdk_counter_wrap.c:54 ERR Counter alloc failed: No More Resources
This issue does not have any functional impact to forwarding. Even without the flow counters attached to the ECMP group, packet forwarding works without any issues.
To avoid allocating next hop counters for any new ECMP next hop list update, set mlx.stats.ecmp.enable to FALSE in the /etc/mlx/datapath/stats.conf file, then restart switchd with the sudo systemctl reload switchd command.
The switch duplicates DHCP packets that pass through the VTEP.
4.3.0-
When the next hop interface for EVPN type 5 routes flaps, FRR might uninstall the routes and Route install failed appears in /var/log/frr/frr.log. To work around this problem, restart FRR with the sudo systemctl restart frr command.
4.4.0-
When a layer 3 neighbor entry resolves to a bridge FDB entry that does not exist in the kernel, switchd might contribute to high CPU load while it continues to try to sync and resolve the neighbor entry. This results in many sync_l3_nexthop messages printed to /var/log/switchd.log.
5.0.1-
When you upgrade from Cumulus Linux 5.0.1 to Cumulus Linux 5.1.0, the upgrade adds KexAlgorithms and MACs configuration to the /etc/ssh/sshd_config file without prompting for confirmation. This might cause the /etc/ssh/sshd_config file to be incorrect if there is a Match section; KexAlgorithms and MACs must come before Match. To work around this issue, move the lines that start with KexAlgorithms and MACs before Match or remove them, then restart the SSH service with the sudo systemctl restart ssh command. If you have already specified KexAlgorithms or MACs, you can remove the newly added lines after upgrade.
5.0.1-
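To find out whether an upgraded switch has this problem, you can check the order of the lines in the sshd_config file. The following is a minimal sketch, not an official tool: the function name check_sshd_order is hypothetical, and the check only looks at line order (it does not validate the rest of the file).

```shell
# Hypothetical helper: report KexAlgorithms and MACs lines that appear
# after the first Match line. Returns 0 when the order is fine and 1
# when at least one line is misplaced.
check_sshd_order() {
  awk '
    # sshd_config keywords are case-insensitive, so compare in lower case.
    { kw = tolower($1) }
    kw == "match" { seen_match = 1; next }
    seen_match && (kw == "kexalgorithms" || kw == "macs") {
      printf "%s:%d: %s\n", FILENAME, NR, $0
      misplaced = 1
    }
    END { exit (misplaced ? 1 : 0) }
  ' "$1"
}

# Usage:
#   check_sshd_order /etc/ssh/sshd_config
```

Each reported line is one to move above the first Match line or to remove, as described in the workaround.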
The tacplus package does not create the correct tacacs0-15 users in the right groups. NVUE commands are rejected with the error: “You do not have permission to execute that command.” To work around this issue, add tacacs15 to the nvapply group. Also, add tacacs0 through 14 to the nvshow group:
sudo usermod -a -G nvapply tacacs15
sudo usermod -a -G nvshow tacacs0
...
sudo usermod -a -G nvshow tacacs14
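Instead of typing the command for each of the tacacs0 through tacacs14 users, you can generate the full list with a loop. The sketch below only prints the commands (a dry run) so that you can review them first; the function name tacacs_group_cmds is hypothetical.

```shell
# Print the usermod commands from the workaround above without running
# them: tacacs15 joins nvapply, tacacs0 through tacacs14 join nvshow.
tacacs_group_cmds() {
  echo "sudo usermod -a -G nvapply tacacs15"
  i=0
  while [ "$i" -le 14 ]; do
    echo "sudo usermod -a -G nvshow tacacs$i"
    i=$((i + 1))
  done
}

tacacs_group_cmds
```

After you review the output, you can run the commands by piping them to a shell, for example with tacacs_group_cmds | sh.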
After you configure the NVIDIA SN2010 series switch for the first time with NVUE, you see the fan speed at 100 percent. To work around this issue, run the sudo systemctl restart hw-management.service command to restart the hardware management service.
During EVPN multihoming bond failover, ARP and ND redirection fails if you configure layer 2 VNIs and ES bonds before you configure the loopback IP address of the switch. To work around this issue, configure the loopback IP address, then restart FRR with the systemctl restart frr command.
4.3.0-
When you configure an interface in FRR to send IPv6 RAs before you configure the interface in the /etc/network/interfaces file, the switch does not process IPv6 RAs. To work around this issue, remove the interface configuration in FRR and reapply it.
3.7.15-4.3.0, 4.4.0-, 5.2.0-5.8.0
If the switch receives an EVPN route with multiple RTs that match the import policy for a local VNI, the bgpd service crashes.
5.0.0-
In an MLAG topology, if you admin down a single connected interface, any dynamic MAC addresses on the peer link are flushed, then added back momentarily, which creates a disruption in traffic.
3.7.15-
When you edit the /usr/share/openvswitch/scripts/ovs-ctl-vtep file to change the ovs-vtepd configuration between vlan-aware and vlan-unaware mode, ovs-vtepd crashes when you restart the service. To recover, restart the networking service with the sudo systemctl restart networking command.
4.3.0-
In the Cumulus-BGPVRF-MIB, the bgpPeerFsmEstablishedTime OID does not correctly report the time since a BGP session goes down.
4.4.4-
When you configure two VNIs in the same VLAN, ifupdown2 shows a vlan added to two or more VXLANS warning, which is only issued after the VNI is already added to the bridge. This leaves the new VNI in the PVID even if there is already an existing VNI configured in that PVID.
5.1.0-5.8.0
When you configure a VRF static route using the legacy command syntax in FRR (for example: ip route vrf vrf-red), then make subsequent VRF or route configuration changes, FRR might crash. To avoid this problem, use the current method for configuring VRF routes within the VRF stanza:
vrf vrf-red
ip route vrf vrf-red
end vrf
In the Cumulus-BGPVRF-MIB, the bgpPeerFsmEstablishedTransitions OID always reports a value of
After you disable traffic shaping in the /etc/cumulus/datapath/qos/qos_features.conf file, the default QOS traffic shaping configuration does not restore. To work around this issue, restart switchd.
4.4.3, 5.0.0-
Under a high load, you might see ingress drop counters increase. The drops are classified as HwIfInDiscards in ethtool and shown as ingress_general in hardware.
4.3.0-4.4.5, 5.0.0-5.8.0
syslog writes phcsync phc_ctl set clock time messages continuously every minute even when supervisord is not running, which prevents critical information from being logged.
On the NVIDIA SN4800 switch, the LED on the line cards does not match the CLI command output.
On the NVIDIA Spectrum 1 switch, when a port goes down, it might not come back up. To work around this issue, disable, then enable the port.
5.0.0-, 5.2.0-5.8.0
When you run the NVUE command to change the minimum interval between received BFD control packets or the minimum interval for sending BFD control packets, the configuration apply fails.

cumulus@switch:~$ nv set vrf default router bgp neighbor bfd min-rx-interval 400
cumulus@switch:~$ nv config apply
2022-05-04T21:36:10.800975+00:00 switch frrinit.sh16431: Stopped watchfrr.
When you configure multiple multicast RPs with groups matched by prefix lists, Cumulus Linux selects only one of the RPs and this selection is incorrect.
5.0.1-
When a MAC address is moved to a new VTEP in an EVPN MAC mobility scenario using traditional bridges, there might be up to 30 seconds of convergence delay.
5.0.1-
You cannot apply NVUE configurations when TACACS is enabled for user authentication. To work around this issue, add the nvue account to the exclude_users line in /etc/tacplus_nss.conf:
On rare occasions, after you reboot or restart switchd on a Spectrum 1 switch, any 25G connections with Direct Attach Copper (DAC) cables that connect from the switch to a non-NVIDIA device might flap continuously. To work around this issue, bring the affected link administratively down for a few seconds on the non-NVIDIA device, then bring the link back up.
4.4.4-4.4.5, 5.1.0-5.8.0
When you run the systemctl reload switchd command, there is momentary traffic loss after a port configured with lossless buffers goes down. This is only temporary and the traffic stabilizes after the initial drops.
5.1.0-
In an EVPN-MH configuration, the switch fails to redirect tagged frames with the CoS bits set.
4.4.0-4.4.3, 5.0.0-
When the CPU load is high during a warm boot, bonds with a slow LACP rate fail to forward layer 2 traffic for up to 60 seconds (depending on the duration of the CPU load) and static bonds fail to forward layer 2 traffic for up to 5 seconds.
5.1.0-5.8.0
When you add an interface to a layer 3 bond, traffic does not forward and you see errors similar to the following:
2022-05-02T13:14:40.118597+00:00 cumulus sx_sdk: ROUTER: Failed to delete router interface(27) ref count isn’t 0, err= Resource is in use
4.4.2-4.4.3, 5.0.1-
When you configure VRF leaking from the default VRF to a non-default VRF, SSH sessions originating from the switch CLI in the default VRF do not connect to devices in the non-default VRF.
5.0.1-
In an OSPF configuration, after you change the IPv6 subnet mask, the old address remains in the RIB as a connected OSPF route.
To resolve this issue, restart FRR with the sudo systemctl restart frr command.
After you run Linux commands to enable a custom ECMP or LAG hash parameter, if you set the hash_config.enable or lag_hash_config.enable parameter to false, the custom parameters do not restore their default values. To work around this issue, change the custom ECMP or LAG hash parameters to their default values manually.
5.1.0-
When you run NVUE commands as part of ZTP scripts, the commands fail with errors that indicate a missing $HOME environment variable. This issue is fixed in releases where the ZTP module initializes the $HOME environment variable before launching the ZTP scripts. If you are running an older release, add a section to the ZTP script before any NVUE commands that defines the HOME environment variable. Populate the variable with the default root user home directory (/root), then export it so that it is available globally for NVUE to use:
HOME=/root
export HOME
Spectrum-2 and Spectrum-3 switches do not support 1G speed with Cumulus Linux.
5.1.0-
The cl-resource-query command output shows ECMP nextHop Table exhaustion (above 100 percent utilization) and the switchd.log file contains ECMP resource errors with routes and next hops failing to install.
4.2.1-
If GTP Hashing is set to true, after more than two warm boots, switchd fails and a cl-support file is generated.
5.1.0-
In the non-default VRF, BFD goes down after port flap.
5.0.1-
NVUE configuration and show commands are not available for GTP hashing. To configure GTP hashing, modify the parameters in the /etc/cumulus/datapath/traffic.conf file.
When you add or remove PortAutoEdge on a bond with the NVUE nv set interface bridge domain br_default stp auto-edge command, the command fails with the following error, and subsequent attempts to enable or disable PortAutoEdge on any interface also fail:
cumulus@switch:~$ nv set interface swp1 bridge domain br_default stp auto-edge off
cumulus@switch:~$ nv config apply
Unable to reload-or-restart services (switchd,ifreload-nvue.service):
[sudo] password for nvue: Job for ifreload-nvue.service failed because the control process exited with error code
Failure during apply. Ignore? [y/N]
When you configure EVPN multihoming with NVUE on a switch with the Spectrum-a1 ASIC, you must configure the following snippet to enable EVPN multihoming in hardware. This is not required for Spectrum-2 or Spectrum-3 switches
- set:
file: "/etc/cumulus/switchd.conf"
content: |
permissions: "0644"
service: switchd
action: restart
Apply the snippet with the nv config patch <snippet.yaml> command, then run the nv config apply -y command.
The NVUE nv show interface link state command shows an empty table instead of showing the port link state.
5.0.0-
In an MLAG EVPN deployment when either of the MLAG peers reboots, FRR incorrectly programs the local host entries in the ARP table as remote. To work around this issue, either restart FRR or use BGP policies to mark and drop routes within an MLAG pair. Both MLAG peers must have an outbound policy that adds a community representing the unique MLAG pair to Type-2 EVPN routes and an inbound policy to match and drop that community.
4.4.4-
When you run NVUE commands to unset one or more options associated with a field, the command fails with an error. For example:
cumulus@switch:~$ nv unset system forwarding ecmp-hash source-port
usage: nv unset system forwarding ecmp-hash [options]
nv unset system forwarding ecmp-hash: error: unrecognized arguments: source-port
When ARP suppression is off, Cumulus Linux sends GARPs from neighmgrd for remote neighbors over VXLAN.
3.7.15-4.3.0, 4.4.0-4.4.3, 5.0.0-, 4.4.4-4.4.5, 5.2.0-5.8.0
In certain cases, when you power cycle the switch, the NVUE configuration might become corrupted, which prevents NVUE from running. You see a critical error in the log file similar to:
CRITICAL: cue_versions_v1.repo: The NVUE internal data store is corrupted or has been initialized incorrectly. The is an unrecoverable error
To work around this issue, remove the /var/lib/nvue/config and /var/lib/nvue/meta directories, then restart the nvued service with the sudo systemctl start nvued command. If possible, NVUE recovers user configuration and saves it in the /etc/nvue.d directory. The recovered configuration will be saved as YAML files, which are named as nvue-recovery-.yaml. You can reapply the recovered configuration with the nv config patch nvue-recovery-.yaml followed by nv config apply commands.
The NVUE nv show interface command shows the operational state of the tunnel as down even though the tunnel is up, and encapsulation and decapsulation occurs correctly.
5.1.0-
On the NVIDIA SN3420 switch, the smonctl command output shows the maximum PSU temperature higher than the critical temperature.
4.4.2-4.4.3, 5.0.0-
On the NVIDIA SN2010 and SN2100 switch, smond indicates that the FAN status is BAD and syslog is flooded with Path /run/hw-management/thermal/fan1_status does not exist errors. When you run the smonctl -v command, the TEMP on switch looks OK:
cumulus@switch:~$ smonctl -v
Fan1(Fan 1): BAD fan:6931 RPM (max = 25000 RPM, min = 4500 RPM, limit_variance = 15%)
Fan2(Fan 2): BAD fan:6619 RPM (max = 25000 RPM, min = 4500 RPM, limit_variance = 15%)
Fan3(Fan 3): BAD fan:6931 RPM (max = 25000 RPM, min = 4500 RPM, limit_variance = 15%)
Fan4(Fan 4): BAD fan:6720 RPM (max = 25000 RPM, min = 4500 RPM, limit_variance = 15%)
With RADIUS enabled for user shell authentication, there might be a delay in local user authentication for non cumulus user accounts.
5.0.0-5.8.0
When a VNI flaps, an incorrect list of layer 2 VNIs are associated with a layer 3 VNI. The NCLU net show evpn vni detail command output shows duplicate layer 2 VNIs under a layer 3 VNI.
3.7.15, 4.4.2-4.4.5, 5.0.0-
The net show time ntp servers command does not show any output with the management VRF.
3.7.15-3.7.16, 4.1.1-4.4.5, 5.0.0-5.8.0
The NVUE command nv show service ntp mgmt server does not show any configured servers.
5.0.0-
When you run the ethtool -m or the l1-show command, the 400G interface optical values do not show.
4.4.0-4.4.5, 5.0.0-5.8.0
CVE-2021-39925: Buffer overflow in the Bluetooth SDP dissector in Wireshark 3.4.0 to 3.4.9 and 3.2.0 to 3.2.17 allows denial of service via packet injection or crafted capture file.
Vulnerable: <= 2.6.20-0+deb10u1
Fixed: 2.6.20-0+deb10u2
4.0.0-4.4.1, 5.0.0-
CVE-2021-42771: relative path traversal in Babel, a set of tools for internationalising Python applications, could result in the execution of arbitrary code
Vulnerable: 2.6.0+dfsg.1-1
Fixed: 2.6.0+dfsg.1-1+deb10u1
4.0.0-4.4.1, 5.0.0-
If you enable or disable the advertise primary IP address setting when originating EVPN default type-5 routes, the default route or prefix originated from one of the MLAG peers sends a null layer 3 VNI, which prevents the remote VTEP from installing the default route.
5.0.0-
The validate-ports -d command does not return the correct speeds for ports. Use the speeds specified in the /etc/cumulus/ports.conf file.
5.0.0-
When connecting the NVIDIA SN4600 switch to another NVIDIA Spectrum switch, you must use auto-negotiation mode (not force mode); otherwise the switch might use the wrong Tx configuration.
5.0.0-5.8.0
When you use NCLU to remove the configuration for a peer that is a member of a group but also has other peer-specific configuration, you must remove the peer-specific configuration before you delete the peer in a separate NCLU commit.
5.0.0-5.8.0
The switch duplicates DHCP packets that pass through the VTEP.
4.3.0, 4.4.0-4.4.5, 5.0.0-
Cumulus Linux does not support a bond with more than 64 ports. Any configuration with more than 64 ports in a bond changes all ports to down when you apply the configuration.
5.0.0-5.8.0
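Because applying an oversized bond brings all of its ports down, it can help to check member counts before you apply a configuration. The following is a minimal sketch under stated assumptions: the input is a hypothetical dictionary that maps each bond name to its member ports, and the helper name oversized_bonds is illustrative. Only the limit of 64 comes from this note.

```python
MAX_BOND_MEMBERS = 64  # limit stated in this release note


def oversized_bonds(bonds):
    """Return the sorted names of bonds with more than MAX_BOND_MEMBERS ports.

    bonds is a hypothetical mapping of bond name -> list of member port names.
    """
    return sorted(
        name
        for name, members in bonds.items()
        if len(set(members)) > MAX_BOND_MEMBERS
    )


example = {
    "bond1": ["swp%d" % i for i in range(1, 65)],  # 64 members: within the limit
    "bond2": ["swp%d" % i for i in range(1, 66)],  # 65 members: over the limit
}
print(oversized_bonds(example))
```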
FRR restarts even when the NVUE configuration overwrite mode is set.
5.0.0-
When you configure PIM, you can either configure RP mappings for different multicast groups or use a prefix list to specify the RP to group mapping. You cannot use the two methods together.
5.0.0-
When you use MD5 passwords and you configure a non-default VRF before the default VRF in the /etc/frr/frr.conf file, numbered BGP sessions do not establish.
3.7.15-
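The ordering condition in this issue can be checked by scanning the router bgp stanzas in the file. The following is a minimal sketch, assuming the usual FRR stanza form router bgp ASN with an optional vrf NAME suffix; the helper name vrf_before_default and the sample addresses, AS numbers, and password are hypothetical. The sketch checks stanza order only and does not verify whether MD5 passwords are in use.

```python
import re

# Matches "router bgp <asn>" with an optional "vrf <name>" suffix.
ROUTER_BGP = re.compile(r"^router bgp \S+(?: vrf (?P<vrf>\S+))?$")


def vrf_before_default(frr_conf_text):
    """Return True if a non-default VRF BGP stanza precedes the default one."""
    seen_vrf = False
    for line in frr_conf_text.splitlines():
        match = ROUTER_BGP.match(line.strip())
        if match is None:
            continue
        vrf = match.group("vrf")
        if vrf is None or vrf == "default":
            # Reached the default instance; report whether a VRF came first.
            return seen_vrf
        seen_vrf = True
    return False


# Hypothetical configuration text used only for illustration.
sample_conf = """router bgp 65101 vrf RED
 neighbor 10.0.0.2 remote-as 65102
 neighbor 10.0.0.2 password example
!
router bgp 65101
 neighbor 10.0.1.2 remote-as 65102
 neighbor 10.0.1.2 password example
"""
print(vrf_before_default(sample_conf))
```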
When you change the VRRP advertisement interval on the master, the master advertisement interval field in the show vrrp command output does not show the updated value.
4.4.0-4.4.5, 5.0.0-5.8.0
ACL [No More Resources] messages keep appearing and you can’t reinstall the ACL.
4.3.0-
When configured with NVUE, SVIs do not inherit the pinned MAC address of the bridge.
4.3.0, 5.0.0-
The NVUE nv show vrf default router bgp peer command produces a 404 not found error.
4.4.0-4.4.5, 5.0.0-5.8.0
When you enable a service in the management VRF, systemctl issues a warning similar to the following:
Warning: The unit file, source configuration file or drop-ins of ntp@mgmt.service changed on disk. Run ‘systemctl daemon-reload’ to reload unit
You can safely ignore this warning.
4.0.0-4.4.5, 5.0.0-5.8.0

Fixed Issues in 5.1.0

Issue ID | Description | Affects
If two FDB entries are added in hardware with a single API call (at the same time), when one entry already exists in hardware and the additional entry has a tunnel type, the resulting FDB entry might be configured improperly in hardware. This can cause corruption of the packets that match the FDB entry.
4.4.0-4.4.2, 5.0.0-5.0.1
The net show interface detail command output shows Type=Unknown for the specified interface.
4.4.3-5.0.1
Communication between single-connected MLAG hosts on different switches fails because packets received by single-connected MLAG hosts are not forwarded over the peer link. To work around this issue, when adding a switch to an MLAG pair, enable all the interfaces.
5.0.0-5.0.1
When you run the NVUE nv show interface command, a watchdog timeout might occur and the nvued service fails.
5.0.1
If you update the MAC address of an SVI using ifreload and hwaddress, the kernel maintains a stale permanent fdb entry for the old MAC address.
3.7.15, 4.3.0, 4.4.0-4.4.3, 5.0.0-5.0.1
On Spectrum-2 switches, when a packet has a CRC and the ports are in cut-through mode, the switch might stop forwarding traffic.
4.4.2-4.4.3, 5.0.0-5.0.1
When you upgrade Cumulus Linux from 4.0 or later to Cumulus Linux 5.1.0 with package upgrade (apt-get upgrade), the upgrade fails with the following error and the NVUE service does not start:
Setting up python3-nvue ( ..
Adding user nvue to group netshow
/usr/sbin/policy-rc.d returned 101, not running 'restart nvued.service'
/usr/sbin/policy-rc.d returned 101, not running 'restart nvue-startup.service'
/usr/sbin/policy-rc.d returned 101, not running 'try-restart ifreload-nvue.service'
To enable the newly installed bash completion for CUE in this shell, execute..
source /etc/bash_completion
Created symlink /etc/systemd/system/multi-user.target.wants/nvued.service → /lib/systemd/system/nvued.service
Created symlink /etc/systemd/system/multi-user.target.wants/nvue-startup.service → /lib/systemd/system/nvue-startup.service
Job for nvue-startup.service failed because the control process exited with error code
See "systemctl status nvue-startup.service" and "journalctl -xe" for details
dpkg: error processing package python3-nvue (--configure):
installed python3-nvue package post-installation script subprocess returned error exit status 1
To work around this issue, reboot the system.
When you configure ACLs on the switch, you might see a switchd segmentation fault.
5.0.1
In BGP unnumbered, when you try to remove an interface from the underlay default VRF with the NVUE nv unset vrf default router bgp neighbor command, the command fails to apply.
4.4.2-5.0.1
When you change the time with NTP or manually, the clagd service stops.
4.3.0
Docker creates a bridge called docker0 and this causes compatibility issues with WJH, which runs in a Docker container.
After you remove the port from the EVPN-MH bond, the port stays in the PRTDN state with the protodown flag ON.
4.4.3, 5.0.0-5.0.1
PBR rules that you apply to interfaces in the default VRF install in the kernel with the action lookup local. As a result, packets that match this rule only perform a route lookup in the local table (which contains special routes for local IP addresses and broadcast addresses) but not in the main table (which contains unicast routes). As a result, policy routing might be applied to traffic incorrectly.
4.4.2-5.0.1
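One way to spot the symptom is to look for policy rules, other than the kernel's built-in priority 0 rule, whose action is lookup local. The following is a minimal sketch that parses text in the style of ip rule show output; the helper name rules_using_local_table and the sample rule at priority 170 are hypothetical, and the exact text of the rule that the switch installs is an assumption.

```python
def rules_using_local_table(ip_rule_output):
    """Return (priority, rule_text) pairs for non-zero-priority 'lookup local' rules."""
    flagged = []
    for line in ip_rule_output.splitlines():
        prio, sep, rest = line.partition(":")
        if not sep or not prio.strip().isdigit():
            continue  # not a "<priority>: <rule>" line
        tokens = rest.split()
        if int(prio) != 0 and "lookup" in tokens:
            idx = tokens.index("lookup")
            # The token after "lookup" names the routing table.
            if tokens[idx + 1:idx + 2] == ["local"]:
                flagged.append((int(prio), rest.strip()))
    return flagged


# Sample text for illustration; the rule at priority 170 is hypothetical.
sample_rules = (
    "0:\tfrom all lookup local\n"
    "170:\tfrom 10.1.4.0/24 iif swp1 lookup local\n"
    "32766:\tfrom all lookup main\n"
    "32767:\tfrom all lookup default\n"
)
print(rules_using_local_table(sample_rules))
```

The priority 0 rule is skipped because it is the kernel's default entry for the local table.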
When you run the /usr/share/snmp/resq_pp.py script used by SNMP, you see the following log message in syslog regardless of the forwarding table profile set in the /etc/cumulus/datapath/traffic.conf file.
4.4.0-4.4.3, 5.0.0-5.0.1
After you convert a port from a layer 2 bond member to a layer 3 port, the switch drops transmitted untagged packets as egress VLAN membership discards.
To work around this issue, restart switchd with the sudo systemctl restart switchd.service command.
4.4.2-4.4.3, 5.0.0-5.0.1
When you set vlan-bridge-binding on for a VLAN interface, the VLAN interface status does not change to down even when all bridge member ports are down.
4.4.3-5.0.1
After you configure a new VLAN on a bond, traffic might stop forwarding on the bond interface. This issue occurs only when you specify bridge-vids on the bond. This issue does not occur when you configure VLANs only on the bridge interface and let the bond get the bridge-vids applied from the bridge.
4.4.2-4.4.3
The sudo smonctl command output shows an error for the ASIC temperature sensor (temp6).
5.0.0-5.0.1
Updating an existing tunnel configuration with NVUE or directly in the /etc/network/interfaces file causes traffic loss because the original tunnel is destroyed and then recreated (with a new ifindex).
The new behavior applies only the configuration delta and avoids disrupting traffic where possible. Note that a tunnel mode change can’t be applied without causing traffic loss.
If you remove NGINX from the switch, then run apt autoremove, switchd does not reload. This occurs because removing NGINX also removes the libyaml-0-2 and python-yaml packages, which are required for the switchd consistency check.
5.0.0-5.0.1
sFlow fails to send flow samples.
5.0.0-5.0.1
When you run ifquery as non-root, EVPN multihoming bond configuration fails.
To work around this issue, always use sudo when running ifupdown2 commands (ifup, ifreload, ifdown, and ifquery).
When you configure QoS remarking on a bond, the port stops forwarding traffic.
After you delete the last vxlan-remoteip configuration line from the /etc/network/interfaces file and run the ifreload -a command, the corresponding BUM flood entry is not removed.
3.7.15-5.0.1
When you poll TCP-MIB objects, the snmpd process slowly leaks memory. To work around this issue, restart the snmpd service to free memory with the systemctl restart snmpd command.
4.3.0
When you use NVUE to configure an ACL rule with a set cos action, the nv config apply command fails with the following error message:
cumulus@switch:~$ nv config apply
Failed to prepare to apply
Unrecoverable internal error
5.0.1
After you install the RADIUS libnss-mapuser package, the nvued service fails to start.
5.0.0-5.0.1
If switchd requires more time to update port or bond configuration after the port or bond flaps, the systemd watchdog times out. As a result, systemd might assume that switchd is unresponsive and restart it.
4.2.1-4.4.2
Cumulus Linux lets you add more than one VXLAN interface to the same VLAN on the same bridge. This is an invalid configuration because certain Cumulus Linux components, such as switchd, expect a single VNI for a given bridge or VLAN.
3.7.15, 4.2.1-4.3.0, 4.4.2-5.0.1
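Because the software accepts this invalid configuration, a pre-check can flag it before you apply changes. The following is a minimal sketch under stated assumptions: the input is a hypothetical list of (VXLAN interface, bridge, VLAN) tuples that you collect from your own configuration, and the helper name duplicate_vlan_vnis and the interface names are illustrative.

```python
from collections import defaultdict


def duplicate_vlan_vnis(vxlan_ports):
    """Group VXLAN interfaces by (bridge, vlan) and return the groups with more than one.

    vxlan_ports is a hypothetical iterable of (vxlan_interface, bridge, vlan) tuples.
    """
    by_vlan = defaultdict(list)
    for ifname, bridge, vlan in vxlan_ports:
        by_vlan[(bridge, vlan)].append(ifname)
    return {key: names for key, names in by_vlan.items() if len(names) > 1}


# Illustrative data: two VXLAN interfaces share VLAN 10 on the same bridge.
example_ports = [
    ("vni10", "bridge", 10),
    ("vni20", "bridge", 20),
    ("vni10b", "bridge", 10),
]
print(duplicate_vlan_vnis(example_ports))
```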
The overlay ASN is removed after a route flap.
4.4.0-5.0.1
If you reboot the switch when using WJH, you need to start the what-just-happened service even if the service is enabled.
5.0.1
If you use NVUE to configure selective route leaking to exclude certain prefixes, the route map fails to apply when you run the nv config apply command.
5.0.0-5.0.1
You cannot run NVUE commands to configure route leaking. To work around this issue, create a snippet in yaml format and add the configuration to the /etc/frr/frr.conf file.
4.4.0-5.0.1
NVUE flexible snippets create invalid YAML files.
5.0.0-5.0.1
ECMP error messages, similar to the following, show in log files:
Dec 15 10:01:35 leaf01 switchd3431: hal_mlx_sdk_nexthop_wrap.c:361 ERR ECMP: cmd CREATE failed: No More Resources, nexthops 1
Dec 15 10:01:35 leaf01 switchd3431: hal_mlx_sdk_nexthop_wrap.c:621 ERR ECMP: failed to CREATE static ecmp in hw
Dec 15 10:01:35 leaf01 switchd3431: hal_mlx_sdk_nexthop_wrap.c:656 ERR ECMP: cmd CREATE failed: No More Resources, nexthops 1
Dec 15 10:01:35 leaf01 switchd3431: hal_mlx_ecmp.c:1540 ERR ECMP: failed to allocate hw ecmp status No More Resources
Dec 15 10:01:35 leaf01 switchd3431: hal_mlx_ecmp.c:1561 ERR ECMP: error allocating static ecmp
Dec 15 10:01:35 leaf01 switchd3431: hal_mlx_ecmp.c:2207 ERR ECMP: failed to find ecmp container
SNMP reports the same ifType of ethernetCsmacd(6) for loopback interfaces.
3.7.15-4.4.2, 5.0.0-5.0.1
The nv show interfaces command returns a 500 error and syslog shows a Python error. This issue is triggered by third-party (non-Cumulus Linux) devices that omit LLDP fields.
To work around this issue, disable LLDP on a single interface.
NVUE commands, including the nv config apply command, might fail with the following error because the /etc/resolv.conf file is missing:
Failed to prepare to apply
Unrecoverable internal error
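Since the failure is tied to a missing /etc/resolv.conf file, a quick existence check can rule this cause in or out before you investigate further. The following is a minimal sketch; the helper name resolv_conf_present is illustrative and is not part of NVUE.

```python
import os


def resolv_conf_present(path="/etc/resolv.conf"):
    """Return True if path exists as a regular file (symlinks are followed)."""
    return os.path.isfile(path)


if resolv_conf_present():
    print("/etc/resolv.conf is present")
else:
    print("/etc/resolv.conf is missing; NVUE commands might fail")
```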
CVE-2020-35498: A vulnerability was found in openvswitch. A limitation in the implementation of userspace packet parsing can allow a malicious user to send a specially crafted packet causing the resulting megaflow in the kernel to be too wide, potentially causing a denial of service. The highest threat from this vulnerability is to system availability
Vulnerable: <= 2.8.90-1-cl4u5
Fixed: 2.8.90-1-cl4u6, 2.8.90-1-cl4.4.0u1, 2.8.90-1-cl5.0.0u8
NVUE commands fail to configure port mirroring.
5.0.0-5.0.1
When you change the port breakout configuration, you must restart switchd to clean up any previously-associated port states and reinitialize the ports. Reloading switchd does not work.
5.0.0-5.0.1
In a scaled EVPN-MLAG configuration (observed with 400 or more VNIs and 20K or more MAC addresses – the actual scale might vary), when the peer link flaps causing all VNIs to come up at the same time, there might be high CPU utilization on the system for several minutes and the FRR service might restart. After FRR restarts or the CPU utilization settles down, the system functions normally.
4.2.1-4.3.0, 4.4.0-5.0.1
Incomplete or unnecessary configuration in FRR results in FRR restarting instead of rejecting the configuration with an error.
5.0.0-5.0.1
If two FDB entries are added in hardware with a single API call (at the same time), when one entry already exists in hardware and the additional entry has a tunnel type, the resulting FDB entry might be configured improperly in hardware. This can cause corruption of the packets that match the FDB entry.
4.4.0-5.0.1
In a static VXLAN configuration with a traditional or single VXLAN device, enabling bridge learning on the VNI leads to an incorrect warning and the setting is removed in the next commit. The warning is similar to the following:
warning: vni10: possible mis-configuration detected: l2-vni configured with bridge-learning ON while EVPN is also configured - these two parameters conflict with each other
Traffic failover in a multicast topology with redundancy has the mroute stuck in a prune state and PIM join messages continue to send. To work around this issue, run the vtysh clear ip mroute command.
3.7.15-4.3.0, 5.0.0-5.0.1
An unexpected software system shutdown can occur due to a thermal zones issue in the hw-management package. The following message might appear in /var/log/syslog before the shutdown:
thermal thermal_zoneX: critical temperature reached (33 C), shutting down
In an EVPN configuration, an FRR restart on a border leaf VRRP master causes a stale route for the VRRP VIP on some remote VTEPs to point to the VRRP backup after convergence.
3.7.12-3.7.15, 4.3.0, 4.4.2-5.0.1
With the ip-acl-heavy TCAM profile, the following message might appear after you install an ACL with NCLU or cl-acltool and the ACL might not work correctly:
hal_flx_acl_util.c:378 ERR hal_flx_acl_resource_release resource region 0 size 7387 create failed: No More Resources
To work around this issue, change the TCAM profile to acl-heavy or ip-acl-heavy with ACL non-atomic mode.
When you use the NVUE command nv set interface lo router ospf area to configure OSPF on a loopback interface, the configuration fails to apply.
To work around this issue, configure the loopback interface in the desired OSPF area with the nv set vrf default router ospf area 0 network command and reference the assigned prefix of the loopback interface. For example:
cumulus@leaf01:~$ nv set vrf default router ospf area 0 network