Connectivity Troubleshooting

Connection (ssh, screen console) to the BlueField is lost

The UART cable in the Accessories Kit (OPN: MBF20-DKIT) can be used to connect to the DPU console and identify the stage at which BlueField is hanging.

Follow this procedure:

  1. Connect the UART cable to a USB socket, and find it in your USB devices.

    sudo lsusb
    Bus 002 Device 003: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC

    Warning

    For more information on the UART connectivity, please refer to the DPU's hardware user guide under Supported Interfaces > Interfaces Detailed Description > NC-SI Management Interface.

    Note

    It is good practice to connect the other end of the NC-SI cable to a different host than the one on which the BlueField DPU is installed.

  2. Install the minicom application.

    • For CentOS/RHEL:

      sudo yum install minicom -y

    • For Ubuntu/Debian:

      sudo apt-get install minicom

  3. Open the minicom application.

    sudo minicom -s -c on

  4. Go to "Serial port setup".

  5. Enter "F" to change "Hardware Flow Control" to "No".

  6. Enter "A", change the serial device to /dev/ttyUSB0, and press Enter.

  7. Press ESC.

  8. Select "Save setup as dfl".

  9. Exit minicom by selecting "Exit from Minicom" in the configuration menu, or by pressing Ctrl + a and then x from a running session.

    +-----------------------------------------------------------------------+
    | A -    Serial Device      : /dev/ttyUSB0                              |
    |                                                                       |
    | C -   Callin Program      :                                           |
    | D -  Callout Program      :                                           |
    | E -    Bps/Par/Bits       : 115200 8N1                                |
    | F - Hardware Flow Control : No                                        |
    | G - Software Flow Control : No                                        |
    |                                                                       |
    |    Change which setting?                                              |
    +-----------------------------------------------------------------------+
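The console steps above can also be approximated from the command line. The following is a minimal sketch, assuming the adapter reports USB ID 0403:6001 as in the lsusb output from step 1 and enumerates as /dev/ttyUSB0; the function names `has_ftdi_uart` and `console_cmd` are hypothetical helpers, not part of any NVIDIA tool.

```shell
#!/bin/sh
# Hypothetical helper: succeed if lsusb-style text on stdin lists the FTDI
# UART adapter (USB ID 0403:6001, as in the lsusb output from step 1).
has_ftdi_uart() {
    grep -q 'ID 0403:6001'
}

# Hypothetical helper: print a minicom command line for a serial device.
# -D selects the device and -b the baud rate (115200, matching the
# "Bps/Par/Bits" line of the setup screen). It only prints the command;
# it does not start minicom.
console_cmd() {
    dev="${1:-/dev/ttyUSB0}"   # assumption: the adapter is the first USB serial device
    printf 'sudo minicom -D %s -b 115200\n' "$dev"
}
```

For example, `lsusb | has_ftdi_uart && console_cmd /dev/ttyUSB0` prints the command to run once the adapter is detected. Flow control is not set on this command line, so it still comes from the saved "dfl" setup.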

Driver not loading in host server

What this looks like in dmesg:

[275604.216789] mlx5_core 0000:af:00.1: 63.008 Gb/s available PCIe bandwidth, limited by 8 GT/s x8 link at 0000:ae:00.0 (capable of 126.024 Gb/s with 16 GT/s x8 link)
[275624.187596] mlx5_core 0000:af:00.1: wait_fw_init:316:(pid 943): Waiting for FW initialization, timeout abort in 100s
[275644.152994] mlx5_core 0000:af:00.1: wait_fw_init:316:(pid 943): Waiting for FW initialization, timeout abort in 79s
[275664.118404] mlx5_core 0000:af:00.1: wait_fw_init:316:(pid 943): Waiting for FW initialization, timeout abort in 59s
[275684.083806] mlx5_core 0000:af:00.1: wait_fw_init:316:(pid 943): Waiting for FW initialization, timeout abort in 39s
[275704.049211] mlx5_core 0000:af:00.1: wait_fw_init:316:(pid 943): Waiting for FW initialization, timeout abort in 19s
[275723.954752] mlx5_core 0000:af:00.1: mlx5_function_setup:1237:(pid 943): Firmware over 120000 MS in pre-initializing state, aborting
[275723.968261] mlx5_core 0000:af:00.1: init_one:1813:(pid 943): mlx5_load_one failed with error code -16
[275723.978578] mlx5_core: probe of 0000:af:00.1 failed with error -16
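To spot this failure without reading the whole kernel log, the final "probe ... failed" line can be filtered out. This is a minimal sketch that assumes the message format shown in the log above; `failed_mlx5_probes` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print the PCI
# address of each mlx5_core function whose probe failed, one per line.
# Assumes the "mlx5_core: probe of <address> failed with error <n>" format
# shown in the log above.
failed_mlx5_probes() {
    sed -n 's/.*mlx5_core: probe of \([0-9a-f:.]*\) failed with error.*/\1/p' | sort -u
}
```

For example, `sudo dmesg | failed_mlx5_probes` lists the affected PCI functions, and prints nothing if no probe failure was logged.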

The driver on the host server depends on the Arm side. If the driver on the Arm is up, the driver on the host server will also come up.

Please verify that:

  • The driver is loaded in the BlueField (Arm)

  • The Arm is booted into OS

  • The Arm is not in UEFI Boot Menu

  • The Arm is not hung
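The first item in this list can be checked from the Arm console. This is a minimal sketch, assuming the driver module is named mlx5_core as in the host log above; `mlx5_loaded` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if lsmod-style text on stdin contains a
# module named exactly mlx5_core (the first column of lsmod output).
mlx5_loaded() {
    awk '$1 == "mlx5_core" { found = 1 } END { exit !found }'
}
```

On the Arm, `lsmod | mlx5_loaded && echo "driver loaded"` reports whether the module is present.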

Then:

  1. Power cycle the host server.

  2. If the problem persists, please reset nvconfig (sudo mlxconfig -d /dev/mst/<device> -y reset), and then power cycle the host.

    Warning

    If your DPU is VPI capable, please be aware that this configuration will reset the link type on the network ports to IB. To change the network port's link type to Ethernet, run:

    sudo mlxconfig -d <device> s LINK_TYPE_P1=2 LINK_TYPE_P2=2

  3. If the problem still persists, install the latest BFB image and then restart the driver on the host server. Refer to "Upgrading NVIDIA BlueField DPU Software" for more information.
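Because the reset in step 2 can change the link type on VPI-capable DPUs, it may help to record the current values before resetting. The sketch below is only illustrative: it assumes that `mlxconfig -d <device> q` prints lines such as `LINK_TYPE_P1    ETH(2)`, which may differ between tool versions, and `saved_link_types` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read mlxconfig query output on stdin and print the
# link-type settings as NAME=VALUE pairs, one per line.
# Assumption: each setting appears as "LINK_TYPE_P<n>   <NAME>(<number>)".
saved_link_types() {
    sed -n 's/^[[:space:]]*\(LINK_TYPE_P[0-9][0-9]*\)[[:space:]][[:space:]]*[A-Za-z_]*(\([0-9][0-9]*\)).*$/\1=\2/p'
}
```

For example, `sudo mlxconfig -d /dev/mst/<device> q | saved_link_types` would print pairs such as `LINK_TYPE_P1=2`, which can be passed back to the `mlxconfig ... s` command after the reset.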

No connectivity between the network interfaces of the source host and the destination device

Verify that the bridge is configured properly on the Arm side.

The following is an example for default configuration:

$ sudo ovs-vsctl show                                                    
f6740bfb-0312-4cd8-88c0-a9680430924f
    Bridge ovsbr1                   
        Port pf0sf0                 
            Interface pf0sf0        
        Port p0                     
            Interface p0            
        Port pf0hpf                 
            Interface pf0hpf        
        Port ovsbr1                 
            Interface ovsbr1        
                type: internal      
    Bridge ovsbr2                   
        Port p1                     
            Interface p1            
        Port pf1sf0                 
            Interface pf1sf0        
        Port pf1hpf                 
            Interface pf1hpf        
        Port ovsbr2                 
            Interface ovsbr2       
                type: internal      
    ovs_version: "2.14.1"

If no bridge configuration exists, please refer to "Virtual Switch on BlueField DPU".
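The comparison against the default configuration can be scripted. This is a minimal sketch, assuming the bridge and port names from the example above (ovsbr1 with p0, pf0hpf, and pf0sf0); `missing_ports` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read port names on stdin (one per line, as printed by
# `ovs-vsctl list-ports <bridge>`) and print each expected port, given as an
# argument, that is not in the list.
missing_ports() {
    ports=$(cat)
    for want in "$@"; do
        # -x matches the whole line and -F treats the name as a fixed string
        printf '%s\n' "$ports" | grep -qxF -- "$want" || printf '%s\n' "$want"
    done
}
```

For example, `sudo ovs-vsctl list-ports ovsbr1 | missing_ports p0 pf0hpf pf0sf0` prints nothing when all three ports are attached to the bridge.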

Uplink in Arm down while uplink in host server up

Please check that the cables are connected properly into the network ports of the DPU and the peer device.
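After reseating the cables, the link state can be read on the Arm side from sysfs. This is a minimal sketch, assuming the uplink interfaces are named p0 and p1 as in the bridge example above; `link_state` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the operational state of a network interface
# ("up", "down", ...) as reported by the kernel in sysfs, or "missing" if
# the interface does not exist. The optional second argument overrides the
# sysfs root.
link_state() {
    root="${2:-/sys/class/net}"
    cat "$root/$1/operstate" 2>/dev/null || echo "missing"
}
```

For example, `link_state p0` prints the state of the first uplink. A "down" state on both the Arm and the peer device points to a cabling or peer-port problem.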
