
HBN Service Release Notes

The following subsections provide information on HBN service new features, interoperability, known issues, and bug fixes.

HBN 2.4.2 offers the following new features and updates:

  • Bug fixes

HBN 2.4.2 does not include any user-affecting changes when upgrading from the previous HBN version.

Supported BlueField Networking Platforms

HBN 2.4.2 has been validated on the following NVIDIA® BlueField® networking platforms:

  • BlueField-2 DPUs:

    • BlueField-2 P-Series DPU 25GbE Dual-Port SFP56; PCIe Gen4 x8; Crypto Enabled; 16GB on-board DDR; 1GbE OOB management; HHHL

    • BlueField-2 P-Series DPU 25GbE Dual-Port SFP56; integrated BMC; PCIe Gen4 x8; Secure Boot Enabled; Crypto Enabled; 16GB on-board DDR; 1GbE OOB management; FHHL

    • BlueField-2 P-Series DPU 25GbE Dual-Port SFP56; integrated BMC; PCIe Gen4 x8; Secure Boot Enabled; Crypto Enabled; 32GB on-board DDR; 1GbE OOB management; FHHL

    • BlueField-2 P-Series DPU 100GbE Dual-Port QSFP56; integrated BMC; PCIe Gen4 x16; Secure Boot Enabled; Crypto Enabled; 32GB on-board DDR; 1GbE OOB management; FHHL

  • BlueField-3 DPUs:

    • BlueField-3 B3210 P-Series FHHL DPU; 100GbE (default mode)/HDR100 IB; Dual-port QSFP112; PCIe Gen5.0 x16 with x16 PCIe extension option; 16 Arm cores; 32GB on-board DDR; integrated BMC; Crypto Enabled

    • BlueField-3 B3220 P-Series FHHL DPU; 200GbE (default mode)/NDR200 IB; Dual-port QSFP112; PCIe Gen5.0 x16 with x16 PCIe extension option; 16 Arm cores; 32GB on-board DDR; integrated BMC; Crypto Enabled

    • BlueField-3 B3240 P-Series Dual-slot FHHL DPU; 400GbE/NDR IB (default mode); Dual-port QSFP112; PCIe Gen5.0 x16 with x16 PCIe extension option; 16 Arm cores; 32GB on-board DDR; integrated BMC; Crypto Enabled

  • BlueField-3 SuperNICs:

    • BlueField-3 B3210L E-Series FHHL SuperNIC, 100GbE (default mode)/HDR100 IB, Dual-port QSFP112, PCIe Gen4.0 x16, 8 Arm cores, 16GB on-board DDR, integrated BMC, Crypto Enabled

    • BlueField-3 B3220L E-Series FHHL SuperNIC, 200GbE (default mode)/NDR200 IB, Dual-port QSFP112, PCIe Gen5.0 x16, 8 Arm cores, 16GB on-board DDR, integrated BMC, Crypto Enabled

    • BlueField-3 B3140L E-Series FHHL SuperNIC, 400GbE/NDR IB (default mode), Single-port QSFP112, PCIe Gen5.0 x16, 8 Arm cores, 16GB on-board DDR, integrated BMC, Crypto Enabled

    • BlueField-3 B3140H E-Series HHHL SuperNIC, 400GbE (default mode)/NDR IB, Single-port QSFP112, PCIe Gen5.0 x16, 8 Arm cores, 16GB on-board DDR, integrated BMC, Crypto Enabled

Note

BlueField platforms with 8GB on-board DDR memory are currently not supported with HBN.


Supported BlueField OS

HBN 2.4.2 supports DOCA 2.9.2 (BSP 4.9.2) on Ubuntu 22.04 OS.
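On a provisioned DPU, the installed release string can be verified (a sketch assuming a standard BFB installation, which records the release in /etc/mlnx-release):

# Print the installed BlueField OS/DOCA release string
cat /etc/mlnx-release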

Verified Scalability Limits

HBN 2.4.2 has been tested to sustain the following maximum scalability limits:

| Limit | BlueField-2 | BlueField-3 | Comments |
|---|---|---|---|
| VTEP peers (BlueFields per control plane) in the fabric | 8k ¹ | 8k ¹ | Number of BlueFields (VTEPs) within a single overlay fabric (reachable in the underlay) |
| L2 VNIs/overlay networks per BlueField | 20 | 20 | Total number of L2 VNIs in the fabric for the L2 VXLAN use case, assuming every interface is associated with its own VLAN + L2 VNI |
| L3 VNIs/overlay networks per BlueField | 20 for up to 4k VTEPs; 10 for up to 8k VTEPs | 20 for up to 4k VTEPs; 10 for up to 8k VTEPs | Total number of L3 VNIs in the fabric for the L3 VXLAN use case, assuming every interface is associated with its own VLAN + L2 VNI + L3 VNI + VRF |
| BlueFields per single L2 VNI network | 8k | 8k | Total number of DPUs configured with the same L2 VNI (3 real DPUs, 2000 emulated VTEPs) |
| BlueFields per single L3 VNI network | 8k | 8k | Total number of DPUs configured with the same L3 VNI (3 real DPUs, 2000 emulated VTEPs) |
| Maximum number of local MAC/ARP entries per BlueField | 20 | 20 | Max total number of MAC/ARP entries learned from the host on the DPU |
| Maximum number of local BGP routes per BlueField | 200 | 200 | Max total number of BGP routes advertised by the host to the BlueField (BGP peering with the host): 100 IPv4 + 100 IPv6 |
| Maximum number of remote L3 LPM routes (underlay) | 8k | 8k | IPv4 or IPv6 underlay LPM routes per BlueField (default + host routes + LPM) |
| Maximum number of EVPN type-2 entries | 16k | 16k | Remote overlay MAC/IP entries for compute peers stored on a single BlueField (L2 EVPN use case) |
| Maximum number of EVPN type-5 entries | 32k | 80k | Remote overlay L3 LPM entries for compute peers stored on a single BlueField (L3 EVPN use case) |
| Maximum number of next-hops in an ECMP next-hop group | 16 | 16 | Max number of next-hops in an ECMP next-hop group (for overlay ECMP) |
| Maximum number of PFs on the host side | 2 | 2 | Total number of PFs visible to the host |
| Maximum number of VFs on the host side | 16 | 16 | Total number of VFs created on the host |
| Maximum number of SFs on the BlueField side | 2 | 2 | Total number of SF devices created on BlueField Arm |

¹ Tested with 4 VNIs.

The following table lists the known issues and limitations for this release of HBN.

Reference

Description

4263035

Description: In L3 EVPN scenarios with 16k overlay and 4k underlay routes, OVS may get stuck or abnormally terminate.

Workaround: Set the OVS DOCA congestion threshold:

ovs-vsctl set Open_vSwitch . other_config:doca-congestion-threshold=60
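To confirm the setting was stored, the other_config column can be read back (standard ovs-vsctl usage):

# Read back other_config to verify the threshold was stored
ovs-vsctl get Open_vSwitch . other_config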

Keywords: L3 EVPN, 16k overlay, 4k underlay, OVS

Reported in HBN version: 2.4.2

4333972

Description: After a sudden power-off of the host, the NVUE git repository may enter a bad state from which it does not recover on its own. This does not happen on every such event.

Workaround: On the DPU (a consolidated sketch follows these steps):

  1. Change to the NVUE data directory: cd /var/lib/hbn/var/lib/nvue/

  2. Remove all files from the directory.

  3. Enter the HBN container.

  4. Restart nvued: supervisorctl restart nvued

  5. Run: nv config apply empty -y

  6. Copy the startup.yaml (your old configuration) to /var/lib/hbn/etc/nvue.d.

  7. Run: nv config apply startup -y
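A minimal consolidated sketch of this sequence, assuming crictl-based access to the HBN container (the container lookup by name and the startup.yaml source path are illustrative):

# On the DPU: clear the corrupted NVUE datastore (steps 1-2)
cd /var/lib/hbn/var/lib/nvue/ && rm -rf ./*

# Steps 3-5: run the recovery commands inside the HBN container
HBN=$(crictl ps --name hbn -q)
crictl exec "$HBN" supervisorctl restart nvued
crictl exec "$HBN" nv config apply empty -y

# Step 6, back on the DPU: restore the old configuration (source path is illustrative)
cp /path/to/startup.yaml /var/lib/hbn/etc/nvue.d/

# Step 7: reapply the startup configuration inside the container
crictl exec "$HBN" nv config apply startup -y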

Keywords: NVUE git, HBN

Reported in HBN version: 2.4.2

4193046

Description: When LLDP is enabled on BlueField, it may not work on uplink ports when HBN service is running. This might happen if LLDP is running without any interface filter configuration.

Workaround: Configure LLDP to run only on the interfaces where it is required, using a configuration file for the lldpd daemon, /etc/lldpd.d/ports.conf. The interfaces can be specified using a regular-expression pattern if needed. For example:

  • To run LLDP only on the uplinks (p0 and p1), the configuration can be done as follows:

    $ cat /etc/lldpd.d/ports.conf
    configure system interface pattern p[01]

  • To run LLDP on the uplinks plus some host-facing PFs or VFs, the configuration can be done as follows:

    $ cat /etc/lldpd.d/ports.conf
    configure system interface pattern p[0-1],pf[0-1]hpf,pf[0-1]vf[0-12]

If this configuration file is changed while the LLDP service is running, it must be restarted using systemctl restart lldpd.
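For example, writing the uplinks-only filter and restarting in one step (a sketch using the pattern shown above):

# Restrict lldpd to the uplinks p0/p1, then restart so the filter takes effect
printf 'configure system interface pattern p[01]\n' > /etc/lldpd.d/ports.conf
systemctl restart lldpd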

Keywords: LLDP

Reported in HBN version: 2.4.1

4200335

Description: Sometimes DNS resolution may fail if /etc/resolv.conf is not updated with the proper name server, leading to loss of OOB connectivity.

Workaround: N/A

Keywords: DNS; OOB connectivity

Reported in HBN version: 2.4.1

4197067

Description: The management VRF does not have an IPv6 address configured, resulting in the absence of a default IPv6 route in the management VRF. Consequently, IPv6 connectivity on the management port is unavailable, and only IPv4 connectivity is supported.

Workaround: N/A

Keywords: IPv6 OOB connectivity

Reported in HBN version: 2.4.1

4011688

Description: The following critical error message is generated during HBN POD reboot. It can be safely ignored.


Error message: "CRIT Server 'unix_http_server' running without any HTTP authentication checking"

Workaround: N/A

Keywords: Log

Reported in HBN version: 2.4.0

4098158

Description: When using default BGP timers, an OVS restart may lead to extended traffic loss due to BGP peering reset.

Workaround: N/A

Keywords: BGP; OVS

Reported in HBN version: 2.4.0

4155959

Description: With uplinks in the br-sfc bridge, IPv6 traffic in the uplink-to-uplink direction causes an OVS crash, resulting in complete traffic loss.

Workaround: Restart the SFC service:


systemctl restart sfc

Keywords: OVS restart; traffic drop

Reported in HBN version: 2.4.0

3743942

Description: The HBN container may hang in init-sfs during container restart when the HBN YAML file (i.e., /etc/kubelet.d/doca_hbn.yaml) is modified while the container is running.

Workaround: If the container hangs in init-sfs for more than 1 minute, reload the DPU.

Keywords: Hang; container

Reported in HBN version: 2.3.0

3961387

Description: Changing the port number for the NVUE REST API using the nv CLI/API is not supported. The following command should not be used to change the port number:


nv set system api port <port-no>

Workaround: On HBN, NVUE is accessible through port 8765 (the default port number).
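For reference, a sketch of querying the API on the default port (the /nvue_v1 base path and the credentials are assumptions based on common NVUE REST usage):

# Query the NVUE REST API on the default port 8765 (-k accepts a self-signed certificate)
curl -k -u 'username:password' https://localhost:8765/nvue_v1/system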

Keywords: NVUE API; port number

Reported in HBN version: 2.3.0

3967748

Description: The command nv show system api connections does not return any data.

Workaround: N/A

Keywords: REST API; nginx

Reported in HBN version: 2.3.0

3769309

Description: A ping or other IP connectivity from a locally connected host in vrf-X to an interface IP address on the DPU/HBN itself in vrf-Y will not work, even if VRF route-leaking is enabled between these two VRFs.

Workaround: N/A

Keyword: IP

Reported in HBN version: 2.2.0

3835295

Description: Traffic entering HBN service on a host PF/VF main-interface and exiting on a sub-interface of the same PF/VF (and vice versa) is not hardware offloaded. Similarly, traffic entering HBN service on one sub-interface and exiting on another sub-interface of the same host PF/VF is also not hardware offloaded.

Workaround: N/A

Keyword: Hardware offload; interfaces

Reported in HBN version: 2.2.0

3772552

Description: The DHCP relay gateway-interface IP address does not automatically pick up the IP address assigned to the associated VRF.

Workaround: The gateway-interface IP address must be explicitly configured.

Keyword: DHCP relay gateway; IP

Reported in HBN version: 2.2.0

3891542

Description: If an NVUE-based routing policy (route map) configuration is used to associate route-target extended communities with an EVPN route, only one route target can be specified.

Workaround: N/A

Keyword: NVUE; route target

Reported in HBN version: 2.2.0

3757686

Description: When the HBN container is coming up and applying a large configuration through the NVUE-startup service which includes entities used by DHCP relay (e.g., interfaces, SVIs and VRFs), the DHCP relay service may go into FATAL state. It can be observed using the following command:


supervisorctl status | grep isc-dhcp-relay
isc-dhcp-relay-vrf11    RUNNING   pid 2069, uptime 0:11:31
isc-dhcp-relay-vrf12    RUNNING   pid 2071, uptime 0:11:31
isc-dhcp-relay-vrf13    FATAL     Exited too quickly (process log may have details)
isc-dhcp-relay-vrf14    FATAL     Exited too quickly (process log may have details)

Workaround: Restart the DHCP relay service which is in FATAL state using the command:


supervisorctl restart <relay-service-name>

Keyword: DHCP relay; fatal; container; restart

Reported in HBN version: 2.1.0

3605486

Description: When the DPU boots up after issuing a "reboot" command from the DPU itself, some host-side interfaces may remain down.

Workaround:

  1. Restart openibd:


    systemctl restart openibd

  2. Recreate SR-IOV interfaces if they are needed.

  3. Replay interface config. For example:

    • If using ifupdown2:


      ifreload -a 

    • If using Netplan:


      netplan apply

Keyword: Reboot

Reported in HBN version: 1.5.0

3547103

Description: IPv6 stateless ACLs are not supported.

Workaround: N/A

Keyword: IPv6 ACL

Reported in HBN version: 1.5.0

3339304

Description: Statistics for hardware-offloaded traffic are not reflected on SFs inside an HBN container.

Workaround: Look up the stats using ip -s link show on PFs outside of the HBN container. PFs would show Tx/Rx stats for traffic that is hardware-accelerated in the HBN container.
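For example (p0 stands in for whichever PF carries the traffic):

# On the DPU, outside the HBN container: PF counters include hardware-offloaded traffic
ip -s link show dev p0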

Keyword: Statistics; container

Reported in HBN version: 1.4.0

3352003

Description: NVUE show, config, and apply commands malfunction if the nvued and nvued-startup services are not in the RUNNING and EXITED states, respectively.
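The expected states can be checked with supervisorctl (as used elsewhere in these notes):

# Inside the HBN container: nvued should be RUNNING, nvued-startup should be EXITED
supervisorctl status nvued nvued-startup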

Workaround: N/A

Keyword: NVUE commands

Reported in HBN version: 1.3.0

3184745

Description: The command nv show interface <intf> acl does not show correct information if there are multiple ACLs bound to the interface.

Workaround: Use the command nv show interface <intf> to view the ACLs bound to an interface.

Keyword: ACLs

Reported in HBN version: 1.2.0

3158934

Description: Deleting an NVUE user by removing their password file and restarting the decrypt-user-add service on the HBN container does not work.

Workaround: Either respawn the container after deleting the file or delete the password file corresponding to the user by running userdel -r username.

Keyword: User deletion

Reported in HBN version: 1.2.0

3185003

Description: When a packet is encapsulated with a VXLAN header, the extra bytes may cause the packet to exceed the MTU of the link. Typically, the packet would be fragmented, but instead it is silently dropped and no fragmentation happens.

Workaround: Make sure that the MTU on the uplink port is always 50 bytes more than on the host ports so that even after adding VXLAN headers, ingress packets do not exceed the MTU.
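For example, a sketch with illustrative MTU values (interface names follow the uplink and host-port naming used elsewhere in these notes):

# Host-facing port at MTU 9000; uplink 50 bytes larger to absorb the VXLAN overhead
ip link set dev pf0hpf mtu 9000
ip link set dev p0 mtu 9050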

Keyword: MTU; VXLAN

Reported in HBN version: 1.2.0

3184905

Description: On VXLAN encapsulation, the DF flag is not propagated to the outer header. Such a packet may be truncated when forwarded in the kernel, and it may be dropped when hardware offloaded.

Workaround: Make sure that the MTU on the uplink port is always 50 bytes more than on the host ports so that even after adding VXLAN headers, ingress packets do not exceed the MTU.

Keyword: VXLAN

Reported in HBN version: 1.2.0

3188688

Description: When stopping the container using the command crictl stop, an error may be reported because the command uses a timeout of 0, which is not enough to stop all the processes in the HBN container.

Workaround: Pass a timeout value when stopping the HBN container by running:


crictl stop --timeout 60 <hbn-container>

Keyword: Timeout

Reported in HBN version: 1.2.0

3129749

Description: The same ACL rule cannot be applied in both the inbound and outbound direction on a port.

Workaround: N/A

Keyword: ACLs

Reported in HBN version: 1.2.0

3126560

Description: The system's time zone cannot be modified using NVUE in the HBN container.

Workaround: The time zone can be changed manually by symlinking /etc/localtime to a binary time zone file under the /usr/share/zoneinfo directory. For example:


sudo ln -sf /usr/share/zoneinfo/GMT /etc/localtime

Keyword: Time zone; NVUE

Reported in HBN version: 1.2.0

3118204

Description: Auto-BGP functionality (where the ASN does not need to be configured but is dynamically inferred by the system based on the system's role as a leaf or spine device) is not supported on HBN.

Workaround: If BGP is configured and used on HBN, the BGP ASN must be manually configured.
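For instance, a sketch of setting the ASN manually (ASN 65101 is illustrative; the command follows standard NVUE BGP syntax):

# Manually configure the BGP ASN, then apply
nv set router bgp autonomous-system 65101
nv config apply -y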

Keyword: BGP

Reported in HBN version: 1.2.0

3233088

Description: Since checksum calculation is offloaded to the hardware (not done by the kernel), it is expected to see an incorrect checksum in the tcpdump for locally generated, outgoing packets. BGP keepalives and updates are some of the packets that show such incorrect checksum in tcpdump.

Workaround: N/A

Keyword: BGP

Reported in HBN version: 1.2.0

2821785

Description: MAC addresses are not learned in the hardware but only in software. This may affect performance in pure L2 unicast traffic.

Workaround: N/A

Keyword: MAC; L2

Reported in HBN version: 1.3.0

3017202

Description: Due to disabled backend foundation units, some NVUE commands return 500 INTERNAL SERVER ERROR or 404 NOT FOUND. These commands are related to features or subsystems which are not supported on HBN.

Workaround: N/A

Keyword: Unsupported NVUE commands

Reported in HBN version: 1.3.0

2828838

Description: NetworkManager and other services not directly related to HBN may display the following message in syslog:


"netlink: read: too many netlink events. Need to resynchronize platform cache"

The message has no functional impact and may be ignored.

Workaround: N/A

Keyword: Error

Reported in HBN version: 1.3.0

The following table lists the known issues that have been fixed in this release of HBN.

Reference

Description

4299341

Description: After booting, the DPU might lose DNS connectivity (/etc/resolv.conf being empty) and/or the hostname might be set to localhost, resulting in loss of out-of-band management access.

Fixed in HBN version: 2.4.2

4303575

Description: If the BGP instance is deleted, the FRR service needs to be restarted. The restart mechanism was not compatible with HBN, so the configuration was not applied until the user restarted FRR manually.

Fixed in HBN version: 2.4.2

4309839

Description: If the NVUE git datastore gets corrupted and the backup.tar is also corrupted, NVUE is not able to recover or start.

Fixed in HBN version: 2.4.2

© Copyright 2025, NVIDIA. Last updated on Mar 24, 2025.