NVIDIA ConnectX-5 Adapter Cards Firmware Release Notes v16.35.3006 LTS

Known Issues

Ethernet Rate Limit per VF in RoCE Mode Limitations

Device               LAG Mode                   w/o QoS    Full QoS
Dual Port Device     w/o LAG (TOTAL_VFS>32)     127        127
Dual Port Device     With LAG (TOTAL_VFS<32)    64         64
Single Port Device   w/o LAG                    127        127

Ethernet Rate Limit per VF in InfiniBand Mode Limitations

Device               LAG Mode    w/o QoS    Full QoS
Dual Port Device     w/o LAG     127        127
Single Port Device   w/o LAG     127        127

Known Issues

Internal Ref.

Issue

3209624

Description: To configure Adaptive Routing in RoCE through ROCE_ACCL access register or through cmdif mlxconfig, ROCE_ADAPTIVE_ROUTING_EN nvconfig parameter must be set.

Workaround: N/A

Keywords: Adaptive Routing in RoCE

Discovered in Version: 16.35.1012

3200779

Description: Changing dynamic PCIe link width is not supported.

Workaround: N/A

Keywords: PCIe

Discovered in Version: 16.34.1002

2864238

Description: VPD cannot be accessed after firmware upgrade or reset when the following sequence is performed:

  1. Upgrade to a new firmware and perform a cold reboot

  2. Downgrade to an old firmware

  3. Run fwreset

  4. Upgrade to a new firmware

  5. Run fwreset

Workaround: Run the upgrade or reset sequence as follows:

  1. Upgrade to a new firmware and perform a cold reboot

  2. Downgrade to an old firmware

  3. Run fwreset

  4. Upgrade to a new firmware

  5. Perform a cold reboot
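The failing sequence and the workaround sequence are identical except for the final step. A minimal Python sketch makes that difference explicit; the step strings are only descriptive labels taken from the two lists above, not commands that get executed.

```python
# Step labels copied from the two sequences above. They are plain
# descriptive strings, not tool invocations.
FAILING_SEQUENCE = [
    "upgrade to a new firmware and cold reboot",
    "downgrade to an old firmware",
    "fwreset",
    "upgrade to a new firmware",
    "fwreset",
]
WORKAROUND_SEQUENCE = [
    "upgrade to a new firmware and cold reboot",
    "downgrade to an old firmware",
    "fwreset",
    "upgrade to a new firmware",
    "cold reboot",
]


def differing_steps(a, b):
    """Return (step_number, step_in_a, step_in_b) wherever the sequences differ."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


print(differing_steps(FAILING_SEQUENCE, WORKAROUND_SEQUENCE))
# → [(5, 'fwreset', 'cold reboot')]
```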

Keywords: VPD

Discovered in Version: 16.32.1010

2850374

Description: When using the Fast Linkup flow, approximately once in 40 iterations the linkup may take up to ~800 sec.

Workaround: N/A

Keywords: Fast linkup flow

Discovered in Version: 16.32.1010

2616755

Description: Forward action for IPoIB is not supported on RX RDMA Flow Table.

Workaround: N/A

Keywords: Steering, IPoIB

Discovered in Version: 16.32.1010

2622688

Description: Software steering on multi-port devices requires performing the configuration on top of the multi-port function and not the affiliated single-port function.

Workaround: N/A

Keywords: Software steering, multi-port devices

Discovered in Version: 16.29.2002

2378593

Description: Sub 1sec firmware update (fast reset flow) is not supported when updating from previous releases to the current one. Doing so may cause network disconnection events.

Workaround: Use full reset flow for firmware upgrade/downgrade.

Keywords: Sub 1sec firmware update

Discovered in Version: 16.29.1016

2213356

Description: The following are the Steering Dump limitations:

  • Requires passing the version (FW/Stelib/MFT) and device type to stelib

  • Re-format is not supported

  • Advanced multi-port feature is not supported – LAG/ROCE_AFFILIATION/MPFS_LB/ESW_LB (only traffic vhca <-> wire)

  • Packet types supported:

    • Layer 2 Eth

    • Layer 3 IPv4/IPv6/Grh

    • Layer 4 TCP/UDP/Bth/GreV0/GreV1

    • Tunneling VXLAN/Geneve/GREv0/Mpls

  • FlexParser protocols are not supported (e.g., AliVxlan/VxlanGpe, etc.).

  • Compiles only on x86

Workaround: N/A

Keywords: Steering Dump

Discovered in Version: 16.29.1016

2365322

Description: When configuring the adapter card's Level Scheduling, a QoS tree leaf (QUEUE_GROUP) configured with default rate_limit and default bw_share may not obey the QoS restrictions imposed by any of the leaf’s ancestors.

Workaround: To prevent such a case, configure at least one of the following QoS attributes of a leaf: max_average_bw or bw_share
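As a minimal sketch of the workaround, the check below flags a leaf that leaves both attributes at their defaults. The `QueueGroupLeaf` class is a hypothetical model introduced only for illustration (it is not a driver or firmware structure), and `None` is assumed to stand for "left at default".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueueGroupLeaf:
    """Hypothetical model of a QoS tree leaf; None means 'left at default'."""
    max_average_bw: Optional[int] = None
    bw_share: Optional[int] = None


def follows_workaround(leaf: QueueGroupLeaf) -> bool:
    """True if at least one of max_average_bw / bw_share is explicitly set."""
    return leaf.max_average_bw is not None or leaf.bw_share is not None


print(follows_workaround(QueueGroupLeaf()))             # False: both at default
print(follows_workaround(QueueGroupLeaf(bw_share=10)))  # True
```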

Keywords: QoS

Discovered in Version: 16.29.1016

2109187

Description: CRC errors are observed when connecting between FPGA and ConnectX-5 using 3rd party cables.

Workaround: N/A

Keywords: CRC

Discovered in Version: 16.27.2008

2064538

Description: When working with an NVME offload QP that is created with an unaligned page size (page_offset != 0), the QP moves to an error state on the first posted WQE.

Workaround: Create an NVME offload QP with an aligned page size (page_offset = 0).

Keywords: NVMF offload, unaligned page size

Discovered in Version: 16.27.2008

2080512

Description: Running VF lag with TTL WA (ESWITCH_IPV4_TTL_MODIFY_ENABLE = 1) may cause performance degradation.

Workaround: To bypass this issue, configure the following using mlxconfig:

  • ESWITCH_HAIRPIN_DESCRIPTORS[0..7]=11

  • ESWITCH_HAIRPIN_TOT_BUFFER_SIZE[0..7]=17
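The two settings above could be applied in a single mlxconfig invocation. The Python sketch below only builds the argument list and prints it; it does not run anything. It assumes the usual `mlxconfig -d <device> set NAME=VALUE` form, and the device string is a placeholder that must be replaced with the actual device.

```python
from typing import List

# Parameter names and values taken from the workaround above.
HAIRPIN_SETTINGS = {
    "ESWITCH_HAIRPIN_DESCRIPTORS[0..7]": 11,
    "ESWITCH_HAIRPIN_TOT_BUFFER_SIZE[0..7]": 17,
}


def build_mlxconfig_command(device: str) -> List[str]:
    """Build (but do not execute) an 'mlxconfig -d <device> set ...' argument list."""
    assignments = [f"{name}={value}" for name, value in HAIRPIN_SETTINGS.items()]
    return ["mlxconfig", "-d", device, "set", *assignments]


# "<device>" is a placeholder assumption, not a real device name.
print(" ".join(build_mlxconfig_command("<device>")))
```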

Keywords: mlxconfig, VF Lag

Discovered in Version: 16.27.1016

2071210

Description: mlxconfig query for the BOOT_INTERRUPT_DIS TLV shows a wrong value in the "current value" field.

Workaround: Use "next boot" indication to see the right value.

Keywords: mlxconfig

Discovered in Version: 16.27.1016

1930619

Description: PF_BAR2 and ATS cannot be enabled together: when PF_BAR2 is enabled, ATS cannot be enabled.

Workaround: N/A

Keywords: ATS, SF, BAR2, Multi GVMI

Discovered in Version: 16.26.1040

-

Description: In rare cases, following a server powerup, a fatal error (device's health compromised) message might appear with ext_synd 0x8d1d. The error will be accompanied by a failure to use mlxconfig and in some cases flash burning tools.

Workaround: N/A

Keywords: mlxconfig, flash tool, ext_synd 0x8d1d

Discovered in Version: 16.26.1040

1836465

Description: When using the hairpin feature together with VLAN strip or the “modify esw vport context” command, packets can have an incorrect VLAN header. That is, VLAN push/pop may not work properly when using vport context VLAN.

The features that may be affected by this and not work properly are:

  • Host chaining

  • Mirroring in FDB

  • TTL modify in FDB

  • VGT+

Workaround: N/A

Keywords: E-switch vport context, VLAN

Discovered in Version: 16.26.1040

1842278

Description: DC LAG can function only in case there is a single PF per port without any active VFs.

Workaround: N/A

Keywords: DC LAG

Discovered in Version: 16.26.1040

1796628

Description: Due to performance considerations, unicast loopback traffic will go through the NIC SX tables, and multicast loopback traffic will skip the NIC SX tables.

Workaround: N/A

Keywords: Performance, unicast loopback traffic, multicast loopback traffic

Discovered in Version: 16.26.1040

1797493

Description: Firmware asserts may occur when setting the PF_BAR2_SIZE value higher than the maximum supported size.

Workaround: Configure within limits (NIC PF_BAR2_SIZE <= 4).

Keywords: Multi-GVMI, Sub-Function, SFs, BAR2

Discovered in Version: 16.26.1040

1768814/1772474

Description: Due to hardware limitation, REG_C cannot be passed over loopback when the FDB action is forwarded to multiple destinations.

Workaround: N/A

Keywords: Connection-Tracking

Discovered in Version: 16.25.1020

1770736

Description: When a PF or ECPF with many VFs (SR-IOV), and/or SFs (Multi-GVMI) triggers an FLR, PCIe completion timeout might occur.

Workaround: Increase the PCIe completion timeout.

Keywords: Multi-GVMI, SR-IOV, Sub-Function, Virtual Function, PF FLR

Discovered in Version: 16.25.1020

1716334

Description: When mlxconfig.PF_BAR2_EN is enabled, configuring more than 255 PCI functions will raise an assert.

Workaround: When working with BAR2, configure SR-IOV to align to the 255 PCI functions limitation.

mlxconfig.NUM_OF_VFS controls the number of configured SR-IOV VFs. e.g.:

  • Smart NICs: 2 External Host PFs, 2 ARM ECPFs, 125 VFs per PF.

  • Non-smart NICs: 2 External Host PFs, 126 VFs per PF
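The two example configurations can be checked against the 255-function limit with simple arithmetic. This sketch assumes that only PFs, ECPFs and SR-IOV VFs count toward the limit and that VFs are configured on the external host PFs only; the function name is hypothetical.

```python
MAX_PCI_FUNCTIONS = 255  # limit stated above when PF_BAR2_EN is enabled


def total_pci_functions(host_pfs: int, ecpfs: int, vfs_per_pf: int) -> int:
    """Count PFs, ECPFs and VFs, assuming VFs exist on host PFs only."""
    return host_pfs + ecpfs + host_pfs * vfs_per_pf


smart = total_pci_functions(host_pfs=2, ecpfs=2, vfs_per_pf=125)
non_smart = total_pci_functions(host_pfs=2, ecpfs=0, vfs_per_pf=126)

print(smart, smart <= MAX_PCI_FUNCTIONS)          # 254 True
print(non_smart, non_smart <= MAX_PCI_FUNCTIONS)  # 254 True
```

Under the same counting assumption, one more VF per PF in either example gives 256 functions, which exceeds the limit.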

Keywords: Multi-GVMI, PF_BAR2_EN, Sub-Functions, SR-IOV, VFs

Discovered in Version: 16.25.1020

1699214

Description: NODNIC VF is partially tested. It is fully tested only in ConnectX-5 adapter cards.

Workaround: N/A

Keywords: NODNIC VF

Discovered in Version: 16.25.1020

1749691

Description: On rare occasions, when using Socket-Direct devices, inband burning through the external port might fail.

Workaround: N/A

Keywords: Socket-Direct, inband burning

Discovered in Version: 16.25.1020

1689186

Description: Changing priority to TC map during traffic might cause packet drops.

Workaround: N/A

Keywords: QoS

Discovered in Version: 16.25.1020

1604699

Description: Ethernet RFC 2819 counter ether_stats_oversize_pkts and Ethernet IEEE 802.3 counter a_frame_too_long_errors share the same resource. Clearing each of them will affect the other.

Workaround: N/A

Keywords: Counters

Discovered in Version: 16.25.1020

1558250

Description: eSwitch owner may receive NIC_VPORT_CONTEXT events from vPorts that are not necessarily armed using the nic vport context arm_change_event bit.

Workaround: N/A

Keywords: Port event, NODNIC

-

Description: In Ethernet mode, at 10/40GbE speeds, only NO-FEC in Force mode is supported. Other user configurations are overridden.

Workaround: N/A

Keywords: Ethernet, 10GbE, 40GbE, RS-FEC

Discovered in Version: 16.25.1020

1574876

Description: DC RoCE LAG is functional only if the router posts VRRP address as the source MAC.

Workaround: N/A

Keywords: DC RoCE LAG

Discovered in Version: 16.25.1020

1498399

Description: If the XRC switches between SRQ/RMPs while there is an outstanding ODP on the responder XRC QP, a CQE with an error might be generated (that is not a PFAULT abort).

Workaround: N/A

Keywords: XRC SRQ/RMP ODP

Discovered in Version: 16.25.1020

1546492

Description: Executing the update_lid command while the IB port sniffer utility is active can stop the utility.

Workaround: N/A

Keywords: IB Sniffer

Discovered in Version: 16.24.1000

1537898

Description: Initializing a function while the IB port sniffer utility is active can stop the utility.

Workaround: N/A

Keywords: IB Sniffer

Discovered in Version: 16.24.1000

1523577

Description: When modifying the TTL in the NIC RX, the CQE checksum is not recalculated automatically. The limitation is indicated by the ttl_checksum_correction bit. If the ttl_checksum_correction=0, the capability is not functioning properly.

Workaround: N/A

Keywords: multi_prio_sq, VF

Discovered in Version: 16.24.1000

1414290

Description: When getting an inline scatter CQE on IB striding RQ, the stride index in the CQE will be zero.

Workaround: N/A

Keywords: Scatter CQE

Discovered in Version: 16.24.1000

1475490

Description: Reboot is not supported on any host during the PLDM firmware burning process.

Workaround: N/A

Keywords: PLDM

Discovered in Version: 16.23.1020

1332714/1345824

Description: The maximum “read” size of MTRC_STDB is limited to 272 Bytes.

Workaround: Set the MTRC_STDB.read_size to the maximum value of 0x110=272 Bytes
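A reader that needs more than 272 bytes has to split the request into several reads. The sketch below only plans the (offset, size) pairs; it does not access the register. The function name and the offset handling are assumptions made for illustration, and any alignment requirement of the register is not modeled.

```python
MAX_READ_SIZE = 0x110  # 272 bytes, the maximum read size stated above


def plan_reads(total_bytes: int, start_offset: int = 0):
    """Split a read of total_bytes into (offset, size) chunks of at most MAX_READ_SIZE."""
    chunks = []
    offset = start_offset
    remaining = total_bytes
    while remaining > 0:
        size = min(MAX_READ_SIZE, remaining)
        chunks.append((offset, size))
        offset += size
        remaining -= size
    return chunks


print(plan_reads(600))
# → [(0, 272), (272, 272), (544, 56)]
```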

Keywords: Access register, MTRC_STDB, tracer to dmesg, fwtrace to dmesg

Discovered in Version: 16.23.1020

1408994

Description: FTE with both forward (FWD) and encapsulation (ENCAP) actions is not supported in the SX NIC Flow Table.

Workaround: N/A

Keywords: SX NIC Flow Table

Discovered in Version: 16.23.1020

1350794

Description: Encapsulation / Decapsulation support in steering has the following limitations:

  • Encapsulation / Decapsulation can be open on the FDB only if all VFs are non active.

  • Encapsulation / Decapsulation supports single mode only: FDB / NIC. Opening tables of both types is not supported.

  • Encapsulation / Decapsulation per device support:

    encap-decap.PNG

Workaround: N/A

Keywords: Steering Encapsulation / Decapsulation

Discovered in Version: 16.23.1020

1027553

Description: While using e-switch vport sVLAN stripping, the RX steering values on the sVLAN might not be accurate.

Workaround: N/A

Keywords: e-sw vport sVLAN stripping, RX steering

Discovered in Version: 16.24.1000

1799917

Description: Untagged CVLAN packets in the Steering Flow Tables do not match the SVLAN tagged packets.

Workaround: N/A

Keywords: Steering Flow Tables, CVLAN/SVLAN packets

Discovered in Version: 16.23.1020

1504073

Description: When using ConnectX-5 with LRO over PPC systems, there might be backpressure to the NIC due to delayed PCI write operations. In this case, bandwidth might drop from line-rate to ~35Gb/s. Packet loss or pause frames might also be observed.

Workaround: Look for an indication of PCI back pressure (the “outbound_pci_stalled_wr” counter advancing in ethtool). Disabling LRO helps reduce the back pressure and its effects.
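One way to check whether the counter is advancing is to compare two snapshots of the interface statistics. The sketch below parses text in the `name: value` layout printed by `ethtool -S <interface>`; the sample strings are made-up illustrative data, not real measurements, and the helper names are hypothetical.

```python
import re

COUNTER = "outbound_pci_stalled_wr"


def parse_counter(ethtool_output: str, name: str = COUNTER) -> int:
    """Extract one counter value from 'ethtool -S' style text."""
    pattern = rf"^\s*{re.escape(name)}:\s*(\d+)\s*$"
    match = re.search(pattern, ethtool_output, re.MULTILINE)
    if match is None:
        raise KeyError(name)
    return int(match.group(1))


def is_advancing(before: str, after: str) -> bool:
    """True if the counter grew between the two snapshots."""
    return parse_counter(after) > parse_counter(before)


# Illustrative snapshots only; real output contains many more counters.
sample_before = "NIC statistics:\n     rx_packets: 100\n     outbound_pci_stalled_wr: 5\n"
sample_after = "NIC statistics:\n     rx_packets: 180\n     outbound_pci_stalled_wr: 9\n"
print(is_advancing(sample_before, sample_after))  # True
```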

Keywords: Flow Control, LRO

Discovered in Version: 16.23.1020

1178792

Description: Host Chaining Limitations:

  • Single MAC address per port is supported

  • Both ports should be configured to Ethernet when host chaining is enabled

  • The following capabilities cannot function when host chaining is enabled:

    • SR-IOV

    • DSCP

    • NODNIC

    • Load balancing

    • LAG

    • Dual Port RoCE (multi port vHCA)

Workaround: N/A

Keywords: Host Chaining

Discovered in Version: 16.22.1002

1277762

Description: An Ethernet multicast loopback packet is not counted (even if it is not a local loopback packet) when running the nic_receive_steering_discard command.

Workaround: N/A

Keywords: Ethernet multicast loopback packet

Discovered in Version: 16.22.1002

1190753

Description: When a dual-port VHCA sends a RoCE packet on its non-native port, and the packet arrives at its affiliated vport FDB, a mismatch might happen on the rules that match the packet source vport.

Workaround: N/A

Keywords: RoCE, vport FDB

Discovered in Version: 16.22.1002

1306342

Description: Signature-accessing WQEs sent locally to the NVMeF target QPs that encounter signature errors will not send a SIGERR CQE.

Workaround: N/A

Keywords: Signature-accessing WQEs, NVMeF target

Discovered in Version: 16.22.1002

1059975

Description: NVMeF limitations:

  • Transaction size - up to 128KB per IO (non-inline)

  • Support up to 16K connections

  • Support single namespace per drive

  • Staging buffer size must be at least 16MB in order to allow SRQ size of 64 entries
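The limits listed above can be expressed as a simple pre-flight check. This is a sketch under stated assumptions: "16K" is read as 16 * 1024, the staging buffer rule is read as applying when the SRQ size is 64 entries or more, and the function and parameter names are hypothetical rather than part of any NVMeF API.

```python
KB = 1024
MB = 1024 * KB

MAX_IO_SIZE = 128 * KB                 # per IO (non-inline)
MAX_CONNECTIONS = 16 * 1024            # assumption: "16K" means 16 * 1024
MAX_NAMESPACES_PER_DRIVE = 1
MIN_STAGING_BUFFER_FOR_SRQ_64 = 16 * MB


def nvmef_violations(io_size, connections, namespaces_per_drive,
                     staging_buffer_size, srq_size):
    """Return a list of human-readable violations of the limits above."""
    problems = []
    if io_size > MAX_IO_SIZE:
        problems.append("IO size exceeds 128KB")
    if connections > MAX_CONNECTIONS:
        problems.append("more than 16K connections")
    if namespaces_per_drive > MAX_NAMESPACES_PER_DRIVE:
        problems.append("more than one namespace per drive")
    # Assumed reading: an SRQ of 64 entries (or more) needs >= 16MB staging buffer.
    if srq_size >= 64 and staging_buffer_size < MIN_STAGING_BUFFER_FOR_SRQ_64:
        problems.append("staging buffer below 16MB for SRQ size 64")
    return problems


print(nvmef_violations(64 * KB, 1000, 1, 16 * MB, 64))   # []
print(nvmef_violations(256 * KB, 1000, 2, 8 * MB, 64))
```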

Workaround: N/A

Keywords: NVMeF

Discovered in Version: 16.22.1010

1168594

Description: RoCE Dual Port Mode (a.k.a. Multi-Port vHCA: MPV) is not supported in Multi-Host setups.

Workaround: N/A

Keywords: Multi-Port vHCA, Multi-Host

Discovered in Version: 16.21.1000

1072337

Description: If a packet is modified in e-sw flow steering, the SX sniffer Flow Table (of the VF) will see the sniffed packet after the modification.

Workaround: N/A

Keywords: SX sniffer Flow Table

Discovered in Version: 16.21.1000

1171013

Description: Signature Handover Operations are not supported when FPP (Function-Per-Port) mode is disabled.

Workaround: N/A

Keywords: Signature Handover Operations, FPP

Discovered in Version: 16.21.1000

© Copyright 2023, NVIDIA. Last updated on Sep 5, 2023.