NVIDIA BlueField Platform Software Troubleshooting Guide

BlueField Out-of-band Management

The BlueField OOB interface is a gigabit Ethernet interface that provides TCP/IP network connectivity to the Arm cores. This interface is named oob_net0 and is intended for management traffic (e.g., file transfer protocols, SSH, etc.). The Linux driver controlling this interface is named mlxbf_gige.ko and is loaded automatically at boot. The interface can be configured and monitored using standard Linux tools (e.g., ifconfig, ethtool, etc.).

Command                   Description
ifconfig oob_net0         Display industry standard statistics on oob_net0 interface
ethtool oob_net0          Display Ethernet-specific configuration of oob_net0 interface
ethtool -r oob_net0       Restart auto-negotiation on oob_net0 interface
ethtool -S oob_net0       Display vendor-specific statistics on oob_net0 interface
ethtool -a oob_net0       Display pause frame configuration of oob_net0 interface
ethtool -I -a oob_net0    Display pause frame counters of oob_net0 interface

Industry Standard Counters

The commands ifconfig oob_net0 and ip -s link show oob_net0 both display the standard counters, such as transmitted and received packets, errors, and drops.

# ifconfig oob_net0
oob_net0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.15.8.54  netmask 255.255.255.0  broadcast 10.15.8.255
        inet6 fe80::293e:3f2b:443d:cc0c  prefixlen 64  scopeid 0x20<link>
        ether b8:3f:d2:e1:c7:20  txqueuelen 1000  (Ethernet)
        RX packets 1679160  bytes 144372970 (137.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 183989  bytes 17345583 (16.5 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

# ip -s link show oob_net0
3: oob_net0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether b8:3f:d2:e1:c7:20 brd ff:ff:ff:ff:ff:ff
    RX:  bytes packets errors dropped  missed   mcast
     144396715 1679434      0       0       0       0
    TX:  bytes packets errors dropped carrier collsns
      17348389  184020      0       0       0       0
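When these counters are checked repeatedly, it can help to reduce the ip -s link output to just the error and drop columns. The following is a minimal sketch, not part of the BlueField software: link_error_summary is a hypothetical helper name, and it assumes the iproute2 layout shown above, where the errors and dropped values are the third and fourth columns of the line following each RX: or TX: header.

```shell
#!/bin/sh
# Hypothetical helper (not an NVIDIA tool): summarize the errors and
# dropped columns from `ip -s link show <iface>` output read on stdin.
link_error_summary() {
    awk '
        # Remember which direction the next line of numbers belongs to.
        /^[[:space:]]*RX:/ { dir = "rx"; next }
        /^[[:space:]]*TX:/ { dir = "tx"; next }
        # The line after each header holds: bytes packets errors dropped ...
        dir != "" {
            printf "%s_errors=%s %s_dropped=%s\n", dir, $3, dir, $4
            dir = ""
        }
    '
}

# Usage on the DPU:
#   ip -s link show oob_net0 | link_error_summary
```

The helper reads from stdin so that it can also be run against output saved earlier for comparison.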


Vendor-Specific Counters

The command ethtool -S oob_net0 displays vendor-specific counters that are not part of the standard statistics.

# ethtool -S oob_net0
NIC statistics:
     hw_access_errors: 0
     tx_invalid_checksums: 0
     tx_small_frames: 0
     tx_index_errors: 0
     sw_config_errors: 0
     sw_access_errors: 0
     rx_truncate_errors: 0
     rx_mac_errors: 0
     rx_din_dropped_pkts: 0
     tx_fifo_full: 0
     rx_filter_passed_pkts: 1679734
     rx_filter_discard_pkts: 80759
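A single reading of these counters says little about whether a problem is ongoing. What matters is whether a counter keeps growing. As an illustration, the sketch below compares two saved ethtool -S snapshots and prints every counter that increased. The helper name counters_increased and the snapshot file paths are assumptions made for this example. It reports all growing counters, including normal traffic counters such as rx_filter_passed_pkts, so the error counters still need to be picked out by eye.

```shell
#!/bin/sh
# Hypothetical helper (not an NVIDIA tool): print counters whose value
# grew between two saved `ethtool -S` snapshots.
#   $1 = earlier snapshot file, $2 = later snapshot file
counters_increased() {
    awk -F': *' '
        # First file: remember each "name: value" pair.
        NR == FNR {
            if (NF == 2) { gsub(/^[[:space:]]+/, "", $1); before[$1] = $2 }
            next
        }
        # Second file: report counters that are larger than before.
        NF == 2 {
            gsub(/^[[:space:]]+/, "", $1)
            if (($1 in before) && ($2 + 0 > before[$1] + 0))
                printf "%s +%.0f\n", $1, $2 - before[$1]
        }
    ' "$1" "$2"
}

# Usage on the DPU (file names are arbitrary):
#   ethtool -S oob_net0 > /tmp/oob_before.txt
#   sleep 10
#   ethtool -S oob_net0 > /tmp/oob_after.txt
#   counters_increased /tmp/oob_before.txt /tmp/oob_after.txt
```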


Interface oob_net0 is Not Present

  • If the output of ifconfig -a does not show the oob_net0 interface, check if the proper kernel modules are configured in the kernel and dynamically loaded

  • Check that the mlxbf_gige module is configured properly and loaded:

    • If the mlxbf_gige driver is configured as a loadable kernel module (CONFIG_MLXBF_GIGE=m) then the output of lsmod should show mlxbf_gige

    • If the mlxbf_gige driver is configured as a built-in module (CONFIG_MLXBF_GIGE=y) then the driver is present and there is no need to check output of lsmod

  • Check that the appropriate PHY device driver is configured properly and loaded:

    • BlueField-2 uses Micrel 9031 PHY

      • If the micrel driver is configured as a loadable kernel module (CONFIG_MICREL=m) then the output of lsmod should show micrel

      • If the micrel driver is configured as a built-in module (CONFIG_MICREL=y) then the driver is present and there is no need to check output of lsmod

    • BlueField-3 uses Vitesse 8221 PHY

      • If the vitesse driver is configured as a loadable kernel module (CONFIG_VITESSE_PHY=m) then the output of lsmod should show vitesse

      • If the vitesse driver is configured as a built-in module (CONFIG_VITESSE_PHY=y) then the driver is present and there is no need to check output of lsmod

  • Ensure the startup script for the DPU OS is creating and bringing up the oob_net0 interface
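The module checks above could be scripted along the following lines. This is a minimal sketch rather than NVIDIA tooling: kconfig_state is a hypothetical helper name, and the kernel config location used in the usage comments (/boot/config-$(uname -r)) is an assumption that holds on many distributions but may differ on a given DPU OS image.

```shell
#!/bin/sh
# Hypothetical helper (not an NVIDIA tool): classify a kernel config
# option as "module", "builtin", or "disabled".
#   stdin = kernel config text, $1 = option name (e.g. CONFIG_MLXBF_GIGE)
kconfig_state() {
    awk -F= -v opt="$1" '
        $1 == opt {
            if ($2 == "m")      state = "module"
            else if ($2 == "y") state = "builtin"
        }
        # Options that are absent or "is not set" count as disabled.
        END { print (state == "" ? "disabled" : state) }
    '
}

# Usage on the DPU (config path is an assumption, see above):
#   for opt in CONFIG_MLXBF_GIGE CONFIG_MICREL CONFIG_VITESSE_PHY; do
#       echo "$opt: $(kconfig_state "$opt" < "/boot/config-$(uname -r)")"
#   done
#   # Only options reported as "module" are expected to appear in lsmod:
#   lsmod | grep -E 'mlxbf_gige|micrel|vitesse'
```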

Interface oob_net0 is Present, but Link is Not Up

  • If the output of ifconfig oob_net0 does not show RUNNING state, the interface is down

  • Check that the proper kernel modules are loaded (see section "Interface oob_net0 is Not Present")

  • Ensure that the proper PHY device driver has attached by executing the command dmesg | grep -i phy from the DPU Linux console:

    • BlueField-2 platforms use a Micrel PHY, so the following output should appear in the dmesg:

      Micrel KSZ9031 Gigabit PHY MLNXBF17:00:03: attached PHY driver [Micrel KSZ9031 Gigabit PHY] (mii_bus:phy_addr=MLNXBF17:00:03, irq=67)

    • BlueField-3 platforms use a Vitesse PHY, so the following output should appear in the dmesg:

      Vitesse VSC8221 MLNXBF17:00:03: attached PHY driver (mii_bus:phy_addr=MLNXBF17:00:03, irq=55)

  • Execute the command dmesg | grep -i gige from DPU Linux console and verify there are no errors

  • Execute the command ethtool oob_net0 and verify that the output reports Link detected: yes

  • If the link is not detected, issue the command ethtool -r oob_net0 to restart auto-negotiation, which may bring the link up
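The last two checks can be combined so that auto-negotiation is restarted only when the link is actually reported down. A small sketch, assuming the usual "Link detected: yes" line in ethtool output; link_detected is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper (not an NVIDIA tool): succeed (exit status 0) when
# `ethtool <iface>` output on stdin reports that the link is up.
link_detected() {
    grep -q 'Link detected: yes'
}

# Usage on the DPU: restart auto-negotiation only if the link is down.
#   if ethtool oob_net0 | link_detected; then
#       echo "oob_net0 link is up"
#   else
#       ethtool -r oob_net0
#   fi
```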

Interface oob_net0 is Present and the Link is Up, but Interface has No IP Address

  1. If the interface is configured for DHCP, verify configuration of the DHCP server.

  2. If DHCP configuration looks appropriate, execute ethtool -S oob_net0 to check for any packet errors.

    1. If there are packet errors and the counts are increasing, execute ethtool -r oob_net0 to reset the PHY and restart auto-negotiation, which may resolve the problem. Then issue the command cat /proc/interrupts | grep gige_rx several times to check for packet receive interrupts. If the interrupt count does not increase between readings, the interface is not receiving packets.

    2. If there are no packet errors and the interface appears to be receiving packets (the interrupt count is increasing), execute dhclient -r oob_net0 followed by dhclient oob_net0 to release the current DHCP lease and request a new one.
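Checking whether the receive interrupt count is increasing means reading /proc/interrupts at least twice and comparing the totals. The sketch below shows one way to do that. The helper name rx_irq_total is hypothetical, and the code assumes the standard /proc/interrupts layout, in which the per-CPU counts follow the IRQ number and precede the interrupt controller name.

```shell
#!/bin/sh
# Hypothetical helper (not an NVIDIA tool): sum the per-CPU counts of all
# lines matching "gige_rx" in /proc/interrupts text read on stdin.
rx_irq_total() {
    awk '
        /gige_rx/ {
            # $1 is the IRQ number; the numeric fields after it are
            # per-CPU counts. Stop at the first non-numeric field.
            for (i = 2; i <= NF; i++) {
                if ($i !~ /^[0-9]+$/) break
                total += $i
            }
        }
        END { printf "%.0f\n", total }
    '
}

# Usage on the DPU: sample twice and compare.
#   before=$(rx_irq_total < /proc/interrupts)
#   sleep 5
#   after=$(rx_irq_total < /proc/interrupts)
#   if [ "$after" -gt "$before" ]; then
#       # Packets are arriving, so retry DHCP as described above.
#       dhclient -r oob_net0 && dhclient oob_net0
#   else
#       echo "oob_net0 is not receiving packets"
#   fi
```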

Interface oob_net0 has an IP Address but External Host Cannot Ping this IP Address

  • Try to ping the IP address of the DPU BMC

  • If this ping is not successful, check cabling to the BlueField platform RJ-45 port

Interface oob_net0 is Using Incorrect MAC Address

  • During the manufacturing process, a MAC address is allocated for oob_net0 interface

  • This MAC address is shown as OOB: <value> on a board-level label of the BlueField platform

  • This same MAC address should be visible in two places:

    • Output of the bfcfg -d command, which displays manufacturing data including the MAC address. This value must match the one shown on the board-level label

    • Output of the ifconfig oob_net0 command, which displays interface statistics and the MAC address

      Info

      Unless a new MAC address has been assigned manually, the value shown by ifconfig oob_net0 should match the value shown by bfcfg -d.
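The comparison can be done by hand, but a case-insensitive check avoids false mismatches between the label and the tool output. The following sketch uses two hypothetical helpers, ifconfig_mac and mac_equal. The expected MAC is passed in by the operator (read from the board label or from bfcfg -d), because this example makes no assumption about the exact format of the bfcfg -d output.

```shell
#!/bin/sh
# Hypothetical helpers (not NVIDIA tools).

# Print the MAC address from `ifconfig <iface>` output on stdin
# (the value after "ether"), converted to lowercase.
ifconfig_mac() {
    awk '$1 == "ether" { print tolower($2); exit }'
}

# Succeed (exit status 0) when two MAC strings match, ignoring case.
mac_equal() {
    [ "$(printf '%s' "$1" | tr 'A-F' 'a-f')" = "$(printf '%s' "$2" | tr 'A-F' 'a-f')" ]
}

# Usage on the DPU (replace the second argument with the OOB value from
# the board label or from bfcfg -d):
#   current=$(ifconfig oob_net0 | ifconfig_mac)
#   if mac_equal "$current" "B8:3F:D2:E1:C7:20"; then
#       echo "oob_net0 MAC matches"
#   else
#       echo "oob_net0 MAC mismatch: $current"
#   fi
```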

© Copyright 2024, NVIDIA. Last updated on Nov 12, 2024.