Cumulus Linux 5.4 Release Notes

Download 5.4 Release Notes xls    Download all 5.4 release notes as .xls

5.4.0 Release Notes

Open Issues in 5.4.0

Issue IDDescriptionAffectsFixed
If you run the NVUE nv set service snmp-server readonly-community command to set an SNMP V2 trap community string that includes fewer than eight characters, the configuration fails. The SNMP V2 trap community string must include eight or more characters.5.4.0
If you restore an NVUE startup.yaml file after upgrade that includes breakout ports with QoS configured on the breakout interfaces or you run the nv config patch command to update a configuration with a yaml file that includes breakout ports with QoS configured on the breakout interfaces, the NVUE configuration fails to apply and subsequent attempts to run nv config apply fail with the following message:
cumulus@switch:mgmt:/var/log# nv config applyInvalid config [rev_id: 11]
qos config is not supported on following invalid interface: swp1s0. Supported on swp and bond interface types
To work around this issue, run nv unset on the configured QoS settings, then apply the breakout port configuration before you configure QoS or remove the QoS configuration from the yaml file and patch it separately after applying the breakout configuration.
The NVUE PTP shaping commands are available in the NVUE command list; however, these commands are disabled and do not configure PTP shaping. PTP shaping is not supported in Cumulus Linux
When you expand the root schema in the NVUE Swagger API, you see many errors in the Error panel at the top of the page.5.3.1-5.4.0
Cumulus Linux 5.4 package upgrade (apt-upgrade) does not support warm restart to complete the upgrade; performing an unsupported upgrade can result in unexpected or undesirable behavior, such as a traffic outage.5.4.0
Configuring TACACS NSS debugging can cause NVUE commands to break for all users who log in while they are enabled. To work around this issue, if your terminal session is unaffected and the debugs are configured with NVUE, unset the debugs or set the debug level to
If you use TACACS+ authentication, modifying the TACACS+ configuration with NVUE might result in a timeout error when you run the nv config apply command. To work around the issue, restart the nvued service with the sudo systemctl restart nvued.service command, then apply the configuration again.5.4.0
If you uninstall dynamic NAT rules and switchd restarts before all the dynamic NAT flows age out and are deleted, you might see errors in switchd.log due to dynamic flow deletion errors. These errors do not affect new dynamic NAT flows from new NAT rules.5.4.0
On the Spectrum-2 and Spectrum-3 switch with ports operating at 1G speed, there is loss of frames that have an odd or random frame size. In the frame size range of 75 to 1000 bytes, there is frame loss of less than approximately one percent for all odd or random frame sizes in the range. In the frame size range greater than 1000 bytes, there is no loss observed.5.4.0
The switch does not learn MAC addresses from DHCP packets. When a DHCP enabled host is plugged in for the first time, it tries to obtain an IP address through DHCP. The switch does not learn the MAC address of the host when it receives these DHCP packets; therefore, the host MAC address is not updated in the local forwarding database and it does not get advertised across EVPN. The switch learns the MAC address when it receives other packets, such as ARP or ND from the host. To work around this issue, either configure a temporary IP address on the host to initiate ARP/ND or enable IPv6, which sends ND after link local address creation.5.2.0-5.4.0
When connecting a NVIDIA-to-NVIDIA in PAM4, auto-negotiation should always be enabled.5.4.0
The NVUE nv show interface qos command takes a significant time to show output or times out. To work around this issue, use specific QoS commands. For example, to show congestion control information, run the nv show interface qos congestion-control command.5.4.0
If you use the NVUE REST API to configure a local user with a hashed password, the user cannot log in and the /etc/nvue.d/startup.yaml file shows the password as plain text.5.4.0
When you run the NVUE nv show interface command, you see an error similar to the following:
Error: GET /nvue_v1/interface/swp45?rev=operational responded with 500 INTERNAL SERVER ERROR
Using the NVUE REST API with a TACACS+ user account does not work due to authentication failures. To work around this problem, replace /etc/pam.d/nvueapi with the following content:
@include common-auth@include common-account@include common-session-noninteractive
If you run the NVUE nv set interface description command without providing a description, the nv config apply command fails with an error:
root@cumulus:mgmt:~# nv set interface swp5 description ““root@cumulus:mgmt:~# nv config apply
Unable to restart services (ifreload-nvue.service):
Job for ifreload-nvue.service failed because the control process exited with error code
During restart of ifreload-nvue.service:error: /etc/network/interfaces: line28: iface swp5: invalid syntax ‘alias’
»> Full logs available in: /var/log/ifupdown2/network_config_ifupdown2_104_Jan-20-2023_15:00:10.761330 ««br />Failed to start ifreload wrapper service (for NVUE compatibility)
In a fairly high scale BGP EVPN route environment, running the NVUE nv show vrf default router bgp address-family l2vpn-evpn loc-rib command to obtain data leads to high resource usage. In some cases, there is an out of memory error, which leads to multiple daemon crashes.5.4.0
When using TACACS+, if the /etc/nsswitch.conf file specifies passwd: files tacplus (files is listed before tacplus), the user name mapping might be incorrect; for example, the user name shown in the default prompt might be incorrect. When you use NVUE, this occurs when the priority for the authentication order of local is higher than tacacs.3.7.0-3.7.16, 4.0.0-4.4.5, 5.0.0-5.4.0
When using TACACS+, if the /etc/nsswitch.conf file specifies passwd: files tacplus (files is listed before tacplus), a user that is present in both the local /etc/passwd file and the TACACS+ server cannot log into the switch.5.4.0
Ethtool HwIfInDot3FrameErrors (Rx FCS Errors) might lead to an incorrect very large HwIfInErrors count. To work around this issue, stop the source of the FCS errors, then reset the interface counters:1. Run the sudo mst status command to find the device
2. Run the sudo mlxlink -d -p <port_number> -pc command to reset the interface counters. For example: sudo mlxlink -d /dev/mst/mt53104_pciconf0 -p 39 -pc.
Using su to change to a user specified through TACACS+ results in becoming the local tacacs0 thru tacacs15 user instead of the named user to run sudo commands. When sudo asks for the password of the named user, it is unlikely to match that of the local tacacs0 thru tacacs15 user.3.7.0-3.7.16, 4.0.0-4.4.5, 5.0.0-5.4.0
If you have a large number of MAC addresses, they do not age out at the MAC ageing timeout value configured on the switch. It might take up to 30 seconds more for the MAC addresses to age out and be deleted from the hardware. To work around this issue, wait for the ageing timeout value plus 30 seconds to allow for the MAC addresses to age out and be deleted from the hardware.5.4.0
On the Spectrum 1 switch if PTP is configured on a bond, the ptp4l service fails to start and you see a message similar to the following:
systemd[1]: ptp4l.service: Failed with result ‘exit-code’
systemd[1]: Failed to start Precision Time Protocol (PTP) service
If you run NVUE commands to break out a port into four interfaces, NVUE disables the subsequent port automatically. However, if you run NVUE commands to break out a port into eight interfaces, NVUE does not disable the subsequent port automatically; you have to run the NVUE command to disable the subsequent port.5.4.0
When you restart the LLDP service, you see a broken pipe error and a log message in the lldpd.service logs. This error does not affect LLDP functionality.5.4.0
If you use NVUE to configure multiple SNMP listener addresses at the same time, the SNMP service fails to start. To work around this issue, configure multiple SNMP listener addresses one at a time.5.3.0-5.4.0
When you apply switch configuration for the first time on a freshly booted switch, you might see the error message Failed to start Hostname Service when you run the nv config apply command after setting the hostname with nv set system hostname. To work around this issue, run the nv config apply command a second time.5.3.0-5.4.0
With double tagged QinQ interfaces, if the bridge corresponding to the QinQ interface flaps, you might see invalid learning notifications and errors from similar to the following:
Can’t set non-static MAC address for non-vPort 0x0001006B when VID is VFID. 
The NVUE nv unset interface link lanes command does not restore the port lane setting to default value. To work around this issue, run the nv set interface link lanes command.5.4.0
The l1-show eth0 command does not show port information and is not supported in this release.5.3.0-5.4.0
Cumulus Linux 5.2.0 and 5.2.1 VX images might include an incorrect entry at the end of /etc/apt/sources.list, which produces warnings when you run apt update. Remove this entry to avoid these warnings.5.2.0-5.4.0
On rare occasions, when you query the system hostname through the hostnamctl application, you see a timeout. NVUE uses the hostnamctl application to determine the system hostname, which can result in an nv config apply command failure.5.2.0-5.4.0
When you connect the NVIDIA SN4600C switch to a Spectrum 1 or Spectrum-3 switch with a 40GbE passive copper cable (Part Number: MC2210126-005) on edge ports 1-4 and 61-64, there is an Effective BER of 1E-12 in PHY.5.2.0-5.4.0
You cannot use NVUE to configure an SNMP view to include a subtree beginning with a period. For example:
$ nv set service snmp-server viewname cumulusOnly included . GET /nvue_v1/service/snmp-server/viewname/cumulusOnly/included?pointers=%5B%22%2Fparameters%22%2C+%22%2Fpatch%2FrequestBody%2Fcontent%2Fapplication~1json%2Fschema%22%2C+%22%2Fpatch%2Fparameters%22%2C+%22%2Fpatch%2Fresponses%2F200%2Flinks%22%5D responded with 404 NOT FOUND
To work around this problem, reference the OID without the preceding period ( . ) in the command.
If you disable the NVUE service, the /etc/cumulus/datapath/nvue_traffic.conf file does not delete automatically, which prevents ECMP and LAG hash settings in the /etc/cumulus/datapath/traffic.conf file from taking effect. To work around this issue, delete the nvue_traffic.conf file with the sudo rm /etc/cumulus/datapath/nvue_traffic.conf command.5.2.0-5.4.0
On the NVIDIA Spectum-1 switch, the nv show system forwarding command shows GTP hashing output, which is not supported on this switch.5.2.0-5.4.0
The BGP4-MIB.txt file is missing from Net-SNMP in Cumulus Linux 5.0.0 through 5.2.1 and
PAM4 split cables (such as 2x100G, 4x100G, and 4x50G) do not work with a forced speed setting (when auto-negotiation is off) as the default speed enabled is for NRZ mode (such as 100G_4X). To work around this issue, set the appropriate lanes for forced speed (with auto-negotation off) with the ethtool -s swpX speed <port_speed> autoneg off lanes <no_of_lanes> command. For example:
ethtool -s swp1 speed 100000 autoneg off lanes 2
On the NVIDIA SN4700 switch, inserting and removing the PSU might cause loss of frames.5.2.0-5.4.0
When you configure two VNIs in the same VLAN, ifupdown2 shows a vlan added to two or more VXLANS warning, which is only issued after the VNI is already added to the bridge. This leaves the new VNI in the PVID even if there is already an existing VNI configured in that PVID.5.1.0-5.4.0
On the NVIDIA SN4700 switch, inserting and removing the PSU might cause loss of frames.5.2.0-5.4.0
QOS traffic shaping doesn’t restore the default configuration after you disable traffic shaping in the /etc/cumulus/datapath/qos/qos_features.conf file. To work around this issue, restart switchd.4.4.3, 5.0.0-
Under a high load, you might see ingress drop counters increase. The drops are classified as HwIfInDiscards in ethtool and shown as ingress_general in hardware.4.3.0-4.4.5, 5.0.0-5.4.0
On rare occasions, after you reboot or restart switchd on a Spectrum 1 switch, any 25G connections with Direct Attach Copper (DAC) cables that connect from the switch to a non-NVIDIA device might flap continuously. To work around this issue, bring the affected link administratively down for a few seconds on the non-NVIDIA device, then bring the link back up.4.4.4-4.4.5, 5.1.0-5.4.0
When the CPU load is high during a warm boot, bonds with a slow LACP rate fail to forward layer 2 traffic for up to 60 seconds (depending on the duration of the CPU load) and static bonds fail to forward layer 2 traffic for up to 5 seconds.5.1.0-5.4.0
After you run Linux commands to enable a custom ECMP or LAG hash parameter, if you set the hash_config.enable or lag_hash_config.enable parameter to false, the custom parameters do not restore their default values. To work around this issue, change the custom ECMP or LAG hash parameters to their default values manually.5.1.0-5.4.0
The cl-resource-query command output shows ECMP nextHop Table exhaustion (above 100 percent utilization) and the switchd.log file contains ECMP resource errors with routes and next hops failing to install.4.2.1-4.4.5, 5.0.0-5.4.0
When the CPU load is high during a warm boot, bonds with a slow LACP rate fail to forward layer 2 traffic for up to 60 seconds (depending on the duration of the CPU load) and static bonds fail to forward layer 2 traffic for up to 5 seconds.5.1.0-5.4.0
If GTP Hashing is set to true, after more than two warm boots, switchd fails and a cl-support file is generated.5.1.0-5.4.0
In a MLAG EVPN deployment when either of the clag peer undergoes reboot then the local host entries in ARP table are incorrectly programmed as remote by FRR.4.4.4-4.4.5, 5.1.0-5.4.0
With RADIUS enabled for user shell authentication, there might be a delay in local user authentication for non cumulus user accounts.5.0.0-5.4.0
When a VNI flaps, an incorrect list of layer 2 VNIs are associated with a layer 3 VNI. The NCLU net show evpn vni detail command output shows duplicate layer 2 VNIs under a layer 3 VNI.3.7.15, 4.4.2-4.4.5, 5.0.0-
The net show time ntp servers command does not show any output with management VRF.3.7.15-3.7.16, 4.1.1-4.4.5, 5.0.0-5.4.0
When you run the ethtool -m or the l1-show command, the 400G interface optical values do not show.4.4.0-4.4.5, 5.0.0-5.4.0
CVE-2021-39925: Buffer overflow in the Bluetooth SDP dissector in Wireshark 3.4.0 to 3.4.9 and 3.2.0 to 3.2.17 allows denial of service via packet injection or crafted capture file
Vulnerable: <= 2.6.20-0+deb10u1Fixed: 2.6.20-0+deb10u2
4.0.0-4.4.1, 5.0.0-
CVE-2021-42771: relative path traversal in Babel, a set of tools for internationalising Python applications, could result in the execution of arbitrary code
Vulnerable: 2.6.0+dfsg.1-1Fixed: 2.6.0+dfsg.1-1+deb10u1
4.0.0-4.4.1, 5.0.0-
When connecting the SN4600V to an SNxxxx system, it must run in auto-negotiation mode (not force mode) otherwise the wrong Tx configuration might be chosen.5.0.0-5.4.0
In a static VXLAN configuration with a traditional VXLAN device, enabling bridge learning on the VNI leads to an incorrect warning and the setting is removed in the next commit. The warning is similar to the following:
warning: vni10: possible mis-configuration detected: l2-vni configured with bridge-learning ON while EVPN is also configured - these two parameters conflict with each other
Configuring a router with the REST API through the switch front panel ports (swps) is supported in the default VRF only
To work around this issue, use the localHost IP address or the MGMT IP address to configure router using the Rest API.
When you use NCLU to remove the configuration for a peer that is a member of a group but also has other peer-specific configuration, you must remove the peer-specific configuration before you delete the peer in a separate NCLU commit.5.0.0-5.4.0
Cumuls Linux does not support a bond with more than 64 ports. Any configuration with more than 64 ports in a bond changes all ports to down when you apply the configuration.5.0.0-5.4.0
When you change the VRRP advertisement interval on the master, the master advertisement interval field in the show vrrp command output does not show the updated value.4.4.0-4.4.5, 5.0.0-5.4.0
SVIs do not inherit the pinned MAC address of the bridge.4.3.0, 5.0.0-
A default route learned from DHCP on eth0 in the management VRF might install in the default VRF if eth0 is disconnected and the original next hop is reachable in the default VRF. To work around this issue, delete the DHCP lease file for eth0 with the sudo rm /var/lib/dhcp/dhclient.eth0.leases command.4.3.0, 5.0.0-
The NVUE nv show vrf default router bgp peer command produces a 404 not found error.4.4.0-4.4.5, 5.0.0-5.4.0

Fixed Issues in 5.4.0

Issue IDDescriptionAffects
In rare circumstances, attempting to install a Cumulus Linux 5.3 image can fail during installation. The device stops at the (initramfs) prompt. To resume installation, enter the exit command at the (initramfs) prompt.5.3.0-5.3.1
Currently, the default core dump size limit on Cumulus Linux is 256M but the SDK generates core dumps around 800M. To avoid incomplete core files, you can increase the core dump size limit.4.2.1-5.3.1
Switch fans run at very high speed but the temperature is normal.5.2.0-5.3.1
When the switch boots up, you might see logs similar to the following in the nvued log files because switchd is not up and running. This does not impact switch functionality
2023-01-29T06:05:18.683152+00:00 cumulus nvued:  INFO: Apply Issues: (b'),(update-ports returned with error (code 254): ports validation node file is not accessibleswitchd validate_node is absent),(ports configuration(ports.conf/ports_width.conf) is invalid),(')
The ethtool -m command does not show Digital Optical Monitoring (DOM) for SFP transceivers. To work around this issue, run the l1-show or mlxlink command instead.5.2.0-5.3.1
When a switch is operating as a PTP Grand Master, the phc2sys service might exit shortly after starting as the initial offset to correct is the delta from epoch, which is too large to correct.
When using TACACS+, a TACACS+ server name that returns more than one IP address, such as an IPv6 and IPv4 address, is counted many times against the limit of seven TACACS+ servers, which might cause some of the later listed servers to be ignored as over the limit. To work around this issue, you can set the prefer_ip_version configuration option (the default value is 4) to choose between an IPv4 or IPv6 address if both are present.3.7.0-5.3.1
The SNMP monitor might fail to send the expected traps.5.3.0-5.3.1
The traffic control rules that the EVPN multihoming configuration adds to an interface are deleted when the hsflowd service restarts. The hsflowd service deletes the EVPN multihoming traffic control filters after you stop hsflowd, then adds back the match-all filters with the psample action; however, hsflowd does not add back the EVPN multihoming traffic control rules.4.4.0-5.3.1
DHCP packets do not forward over VXLAN interfaces in multicast replication environments. This issue does not affect VXLAN environments using head end replication (HER).5.2.0-5.3.1
The memory consumption in ptmd can grow when the socket being used for a BFD session needs to be recreated. This is often seen when the route being used to forward the BFD packets is removed; for example, if the connected route is removed when an interface goes down, over which a single hop BFD session is formed.5.3.0-5.3.1
Some EVPN multihoming show commands might cause BGP to crash if you use the json flag and attempt to reference the default VRF by name. For example, show bgp l2vpn evpn es-vrf json.5.0.0-5.3.1
When upgrading from Cumulus Linux 5.0.0 thru 5.2.1 to Cumulus Linux 5.3.0 or 5.3.1, the babeltrace and python3-babeltrace packages are not added automatically even though they are in the default image in Cumulus Linux 5.3.0 and later. You may need these packages to decode LTTNG traces with /usr/lib/frr/ If you need to use this script, run the sudo apt update && sudo apt install babeltrace python3-babeltrace command to install the packages.5.3.0-5.3.1
NVUE gracefully detects and handles upgrades that include valid flexible snippets. For any invalid (incompatible) flexible snippets, you must delete the snippets before you apt upgrade Cumulus Linux; otherwise, the NVUE nv config apply command and the equivalent REST API, do not run.5.3.0-5.3.1
When you add the /etc/frr/frr.conf file to the ignore list for NVUE, any configuration change causes FRR to restart because a check is done to see if any running configuration has changed since the previously applied configuration in the vtysh shell.5.3.0-5.3.1
NVUE requires the SNMPv2 community string to be a minimum of eight characters.5.3.0-5.3.1
When the switch needs to forward a frame that has a source MAC address of 00:00:00:00:00:00, the dmesg log might report the message bridge: RTM_NEWNEIGH with invalid ether address in a loop every 30 seconds. The log message is harmless and frames with that MAC forward correctly.5.3.0-5.3.1
The memory consumption in ptmd can grow when the socket being used for a BFD session needs to be recreated. This is often seen when the route being used to forward BFD packets is removed; for example, if the connected route is removed when an interface goes down, over which a single hop BFD session is formed.5.2.0-5.3.1
After you restart the FRR service, show commands incorrectly reflect the VLAN associated with layer 3 VNIs as 0:
# net show evpn vni 123VNI: 123Type: L3Tenant VRF: BLUEVlan: 0
On Spectrum 1 switches when configuring ACLs in non-atomic mode, if there are too many IPv6 matches due to rules with both input-interface and output-interface matches on SVIs, the ACL install fails and switchd crashes.5.2.0-5.3.1
Due to a race at the initial configuration, the SDK RDQ test may test RDQ configured for WJH and fail the test resulting in a fatal health event.5.2.0-5.3.1
When an FRR routing service (such as bgpd) becomes unresponsive, watchfrr might fail to stop and restart service. To work around this issue, restart FRR with the systemctl restart frr command.4.4.0-5.3.1
The Linux utility that sends ARP packets is constrained to 512 interfaces on the system. In large scale deployments, the warm boot process fails repeatedly as it sends gratuitous ARP requests for each local address. This issue does not impact the functionality and can be ignored.5.2.0-5.3.1
ACL configurations fail when the TCAM memory is exhausted because the CTCAM profile is configured with duplicate entries.5.2.0-5.3.1
When you delete a route under the following conditions, switchd might crash:- The minimum number of routes is set to a non-zero value
- KVD utilization is higher than sixty percent
- The number of routes currently configured is less than the minimum reserved value, and multiple KVD linear resources have just been freed and are waiting in the Garbage Collector queue.
When you configure or unconfigure a BGP peer and interface towards a host, memory corruption can cause BGP to crash.5.0.1-5.3.1
When using TACACS+, if you configure per-command authorization with the tacplus-restrict command, NVUE configuration commands fail for any user with a privilege level lower than 15. This occurs because NVUE is not able to create a .local user directory.5.2.0-5.3.1
The NVUE nv show system forwarding –output json command does not provide any output. To work around this issue, run the nv show system forwarding command.5.2.0-5.3.1
You can not apply NVUE configurations when TACACS is enabled for user authentication. To work around this issue, add the nvue account to the exclude_users line in /etc/tacplus_nss.conf:
The NVUE nv show interface link state command shows an empty table instead of showing the port link state.5.0.0-5.3.1
The NVUE nv show interface command shows the operational state of the tunnel as down even though the tunnel is up, and encapsulation and decapsulation occurs correctly.5.1.0-5.3.1
FRR restarts even when the NVUE configuration overwrite mode is set.5.0.0-5.3.1