NVIDIA BlueField DPU BSP v4.5.0

Connectivity Troubleshooting

The UART cable in the Accessories Kit (OPN: MBF20-DKIT) can be used to connect to the DPU console and identify the stage at which BlueField is hanging.

Follow this procedure:

  1. Connect the UART cable to a USB port and confirm that it appears in the list of USB devices.

    sudo lsusb
    Bus 002 Device 003: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC

    Warning

    For more information on the UART connectivity, please refer to the DPU's hardware user guide under Supported Interfaces > Interfaces Detailed Description > NC-SI Management Interface.

    Note

    It is good practice to connect the other end of the NC-SI cable to a different host than the one on which the BlueField DPU is installed.

  2. Install the minicom application.

    • For CentOS/RHEL:

      sudo yum install minicom -y

    • For Ubuntu/Debian:

      sudo apt-get install minicom

  3. Open the minicom application.

    sudo minicom -s -c on

  4. Go to "Serial port setup".

  5. Press F to set "Hardware Flow Control" to No.

  6. Press A, change the serial device to /dev/ttyUSB0, and press Enter.

  7. Press ESC.

  8. Select "Save setup as dfl".

  9. Exit minicom by pressing Ctrl+A, then Z to open the command summary, then X to exit.

    +-----------------------------------------------------------------------+
    | A - Serial Device         : /dev/ttyUSB0                              |
    |                                                                       |
    | C - Callin Program        :                                           |
    | D - Callout Program       :                                           |
    | E - Bps/Par/Bits          : 115200 8N1                                |
    | F - Hardware Flow Control : No                                        |
    | G - Software Flow Control : No                                        |
    |                                                                       |
    |    Change which setting?                                              |
    +-----------------------------------------------------------------------+
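Once the defaults are saved, the console can also be opened without going through the setup menu. The following is a minimal sketch, not an official procedure. It assumes the adapter reports USB ID 0403:6001 (as in the lsusb output above) and enumerates as /dev/ttyUSB0. The helper names `has_ftdi_adapter` and `open_console` are hypothetical.

```shell
#!/bin/sh
# Sketch: confirm the FTDI UART adapter is present, then open the console.
# Assumptions: USB ID 0403:6001 and /dev/ttyUSB0, taken from the example above.

# has_ftdi_adapter: reads `lsusb` output on stdin; succeeds if the FTDI ID appears.
has_ftdi_adapter() {
    grep -q 'ID 0403:6001'
}

# open_console [DEVICE]: opens minicom on DEVICE at 115200 baud.
# Flow control still comes from the saved minicom defaults.
open_console() {
    dev="${1:-/dev/ttyUSB0}"
    if [ ! -c "$dev" ]; then
        echo "serial device $dev not found" >&2
        return 1
    fi
    sudo minicom -D "$dev" -b 115200
}

# Usage (on the machine the cable is plugged into):
#   lsusb | has_ftdi_adapter && open_console /dev/ttyUSB0
```

If the adapter enumerates under a different name (for example, when several USB serial devices are attached), pass that device path to `open_console` instead.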

When the driver fails to load on the host server, dmesg shows output similar to the following:

[275604.216789] mlx5_core 0000:af:00.1: 63.008 Gb/s available PCIe bandwidth, limited by 8 GT/s x8 link at 0000:ae:00.0 (capable of 126.024 Gb/s with 16 GT/s x8 link)
[275624.187596] mlx5_core 0000:af:00.1: wait_fw_init:316:(pid 943): Waiting for FW initialization, timeout abort in 100s
[275644.152994] mlx5_core 0000:af:00.1: wait_fw_init:316:(pid 943): Waiting for FW initialization, timeout abort in 79s
[275664.118404] mlx5_core 0000:af:00.1: wait_fw_init:316:(pid 943): Waiting for FW initialization, timeout abort in 59s
[275684.083806] mlx5_core 0000:af:00.1: wait_fw_init:316:(pid 943): Waiting for FW initialization, timeout abort in 39s
[275704.049211] mlx5_core 0000:af:00.1: wait_fw_init:316:(pid 943): Waiting for FW initialization, timeout abort in 19s
[275723.954752] mlx5_core 0000:af:00.1: mlx5_function_setup:1237:(pid 943): Firmware over 120000 MS in pre-initializing state, aborting
[275723.968261] mlx5_core 0000:af:00.1: init_one:1813:(pid 943): mlx5_load_one failed with error code -16
[275723.978578] mlx5_core: probe of 0000:af:00.1 failed with error -16
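To spot this condition quickly on a busy host, the kernel log can be filtered for the final probe failure line. This is a minimal sketch that assumes the message format matches the example above. The function name `fw_init_failures` is hypothetical.

```shell
#!/bin/sh
# fw_init_failures: reads kernel log text on stdin and prints the PCI address
# of each device for which the mlx5_core probe failed (one address per line).
fw_init_failures() {
    sed -n 's/.*mlx5_core: probe of \([0-9a-fA-F:.]*\) failed with error.*/\1/p' | sort -u
}

# Usage (on the host server):
#   sudo dmesg | fw_init_failures
```

An empty result only means that no probe failure line in this format was found. It does not prove that the driver loaded successfully.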

The driver on the host server is dependent on the Arm side. If the driver on Arm is up, then the driver on the host server will also be up.

Please verify that:

  • The driver is loaded on the BlueField (Arm)

  • The Arm has booted into the OS

  • The Arm is not in the UEFI Boot Menu

  • The Arm is not hung
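The first check can be scripted from a shell on the Arm; the remaining three are easiest to confirm by watching the UART console. The sketch below assumes the Arm-side driver module is named mlx5_core, the same name that appears in the host log above. The function name `driver_loaded` is hypothetical.

```shell
#!/bin/sh
# driver_loaded: reads `lsmod` output on stdin; succeeds only if a module
# named exactly mlx5_core is listed (mlx5_ib alone does not count).
driver_loaded() {
    awk '$1 == "mlx5_core" { found = 1 } END { exit !found }'
}

# Usage (on the BlueField Arm):
#   lsmod | driver_loaded && echo "mlx5_core is loaded"
```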

Then:

  1. Perform a graceful shutdown and a power cycle on the host server.

  2. If the problem persists, reset nvconfig (sudo mlxconfig -d /dev/mst/<device> -y reset), perform a graceful shutdown, then power cycle the host.

    Warning

    If your DPU is VPI capable, be aware that this reset sets the link type on the network ports back to IB. To change the link type of the network ports to Ethernet, run:

    sudo mlxconfig -d <device> s LINK_TYPE_P1=2 LINK_TYPE_P2=2

    Then perform a graceful shutdown and a system power cycle.

  3. If the problem still persists, install the latest BFB image and then restart the driver on the host server. Refer to "Upgrading NVIDIA BlueField DPU Software" for more information.

Verify that the bridge is configured properly on the Arm side.

The following is an example of the default configuration:

$ sudo ovs-vsctl show
f6740bfb-0312-4cd8-88c0-a9680430924f
    Bridge ovsbr1
        Port pf0sf0
            Interface pf0sf0
        Port p0
            Interface p0
        Port pf0hpf
            Interface pf0hpf
        Port ovsbr1
            Interface ovsbr1
                type: internal
    Bridge ovsbr2
        Port p1
            Interface p1
        Port pf1sf0
            Interface pf1sf0
        Port pf1hpf
            Interface pf1hpf
        Port ovsbr2
            Interface ovsbr2
                type: internal
    ovs_version: "2.14.1"
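A quick way to compare a bridge against this default layout is to list its ports and report any that are missing. This is a sketch only. It assumes the bridge and port names from the example above (ovsbr1 with p0, pf0hpf, pf0sf0; ovsbr2 with p1, pf1hpf, pf1sf0), which may differ on a customized system. The function name `missing_ports` is hypothetical.

```shell
#!/bin/sh
# missing_ports PORT...: reads `ovs-vsctl list-ports BRIDGE` output on stdin
# and prints each expected PORT that is not attached to the bridge.
missing_ports() {
    actual=$(cat)
    for port in "$@"; do
        printf '%s\n' "$actual" | grep -qxF "$port" || printf '%s\n' "$port"
    done
}

# Usage (on the BlueField Arm):
#   sudo ovs-vsctl list-ports ovsbr1 | missing_ports p0 pf0hpf pf0sf0
#   sudo ovs-vsctl list-ports ovsbr2 | missing_ports p1 pf1hpf pf1sf0
```

No output means every expected port is attached to the bridge.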

If no bridge configuration exists, please refer to "Virtual Switch on BlueField DPU".

Please check that the cables are connected properly into the network ports of the DPU and the peer device.
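A software-side hint that a cable is not seated is a missing carrier on the uplink. The sketch below reads the standard Linux `carrier` attribute under /sys/class/net. It assumes the uplink interfaces on the Arm are named p0 and p1, as in the bridge example. The function name `ports_without_link` is hypothetical.

```shell
#!/bin/sh
# ports_without_link NET_DIR IFACE...: prints each interface whose carrier
# attribute is not 1. An unreadable or missing attribute counts as no link.
ports_without_link() {
    net_dir="$1"
    shift
    for ifc in "$@"; do
        carrier=$(cat "$net_dir/$ifc/carrier" 2>/dev/null || echo 0)
        [ "$carrier" = "1" ] || printf '%s\n' "$ifc"
    done
}

# Usage (on the BlueField Arm):
#   ports_without_link /sys/class/net p0 p1
```

An interface that is administratively down is also reported, so bring the interface up before concluding that the cable is at fault.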

© Copyright 2023, NVIDIA. Last updated on Jan 29, 2024.