InfiniBand Fabric Diagnostic Utilities

The diagnostic utilities described in this chapter provide means for debugging the connectivity and status of InfiniBand (IB) devices in a fabric.

This section first describes the configuration, interface, and addressing that are common to all the tools in the package. It then describes the tools themselves in detail, including their operation, synopsis, options, error codes, and examples.

Topology File (Optional)

An InfiniBand fabric is composed of switches and channel adapter (HCA/TCA) devices. To identify devices in a fabric (or even in one switch system), each device is given a GUID (a MAC equivalent). Since a GUID is a non-user-friendly string of characters, it is better to alias it to a meaningful, user-given name. For this objective, the IB Diagnostic Tools can be provided with a “topology file”, which is an optional configuration file specifying the IB fabric topology in user-given names.

For diagnostic tools to fully support the topology file, the user may need to provide the local system name (if the local hostname is not used in the topology file).

To specify a topology file to a diagnostic tool, use one of the following two options:

  1. On the command line, specify the file name using the option ‘-t <topology file name>’

  2. Define the environment variable IBDIAG_TOPO_FILE

To specify the local system name to a diagnostic tool, use one of the following two options:

  1. On the command line, specify the system name using the option ‘-s <local system name>’

  2. Define the environment variable IBDIAG_SYS_NAME

IB Interface Definition

The diagnostic tools installed on a machine connect to the IB fabric by means of an HCA port through which they send MADs. To specify this port to an IB diagnostic tool, use one of the following options:

  1. On the command line, specify the port number using the option ‘-p <local port number>’ (see below)

  2. Define the environment variable IBDIAG_PORT_NUM

If more than one HCA device is installed on the local machine, the device’s index must be specified to the tool as well. To do so, use one of the following options:

  1. On the command line, specify the index of the local device using the following option:
    ‘-i <index of local device>’

  2. Define the environment variable IBDIAG_DEV_IDX
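
The two ways of passing the topology and interface settings described above can be sketched in Python. This is only an illustration, assuming nothing beyond the option letters and environment variable names listed in this chapter. The helper names `build_args` and `build_env`, the file name `fabric.topo`, and the sample values are hypothetical, and the sketch does not invoke any tool.

```python
import os

# Option letters and environment variables as listed in this chapter.
CLI_OPTIONS = {
    "topo_file": "-t",
    "sys_name": "-s",
    "port_num": "-p",
    "dev_idx": "-i",
}
ENV_VARS = {
    "topo_file": "IBDIAG_TOPO_FILE",
    "sys_name": "IBDIAG_SYS_NAME",
    "port_num": "IBDIAG_PORT_NUM",
    "dev_idx": "IBDIAG_DEV_IDX",
}


def build_args(tool, **settings):
    """Return an argv list that passes each given setting on the command line."""
    argv = [tool]
    for key, flag in CLI_OPTIONS.items():
        if settings.get(key) is not None:
            argv += [flag, str(settings[key])]
    return argv


def build_env(base_env=None, **settings):
    """Return a copy of the environment with the matching variables defined."""
    env = dict(os.environ if base_env is None else base_env)
    for key, var in ENV_VARS.items():
        if settings.get(key) is not None:
            env[var] = str(settings[key])
    return env
```

For example, `build_args("ibdiagnet", topo_file="fabric.topo", port_num=1)` returns `["ibdiagnet", "-t", "fabric.topo", "-p", "1"]`, and `build_env({}, sys_name="node01", dev_idx=2)` returns a dictionary that defines `IBDIAG_SYS_NAME` and `IBDIAG_DEV_IDX`. Either result could then be handed to a process launcher such as `subprocess.run`.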

Addressing

Warning

This section applies to the ibdiagpath tool only. A tool command may require defining the destination device or port to which it applies.

The following addressing modes can be used to define the IB ports:

  • Using a Directed Route to the destination: (Tool option ‘-d’)
    This option defines a directed route of output port numbers from the local port to the destination.

  • Using port LIDs: (Tool option ‘-l’):
    In this mode, the source and destination ports are defined by means of their LIDs. If the fabric is configured to allow multiple LIDs per port, then using any of them is valid for defining a port.

  • Using port names defined in the topology file: (Tool option ‘-n’)
    This option refers to the source and destination ports by the names defined in the topology file. (Therefore, this option is relevant only if a topology file is specified to the tool.) In this mode, the tool uses the names to extract the port LIDs from the matched topology, then the tool operates as in the ‘-l’ option.
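
To make the addressing modes more concrete, the following Python sketch models a toy fabric. It shows how a directed route is just a sequence of output port numbers followed hop by hop, and how name-based addressing reduces to LID-based addressing once names are looked up. The device names, port numbers, LIDs, and function names are all hypothetical, and the sketch does not reflect how the tool is actually implemented.

```python
# Hypothetical three-hop fabric: (device, output port) -> connected device.
LINKS = {
    ("local-hca", 1): "switch-a",
    ("switch-a", 4): "switch-b",
    ("switch-b", 2): "remote-hca",
}

# Hypothetical port names and LIDs, standing in for a matched topology file.
NAME_TO_LID = {
    "local-hca/P1": 3,
    "remote-hca/P1": 7,
}


def follow_directed_route(start, out_ports, links=LINKS):
    """Walk the route hop by hop and return the list of devices visited."""
    path = [start]
    for port in out_ports:
        hop = (path[-1], port)
        if hop not in links:
            raise ValueError(f"no link on port {port} of {path[-1]}")
        path.append(links[hop])
    return path


def names_to_lids(src_name, dst_name, table=NAME_TO_LID):
    """Resolve source and destination port names to their LIDs."""
    return table[src_name], table[dst_name]
```

Here `follow_directed_route("local-hca", [1, 4, 2])` returns `["local-hca", "switch-a", "switch-b", "remote-hca"]`, and `names_to_lids("local-hca/P1", "remote-hca/P1")` returns `(3, 7)`, the pair that the LID-based mode would then operate on.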

Warning

For further information on the following tools, please refer to each tool's man page.

Diagnostic Utilities

Utility

Description

ibdiagnet

Scans the fabric using directed route packets and extracts all the available information regarding its connectivity and devices. It is supported only on Windows Server 2012 and above, or Windows Client 8.1 and above.

ibportstate

Enables querying the logical (link) and physical port states of an InfiniBand port. It also allows adjusting the link speed that is enabled on any InfiniBand port.

If the queried port is a switch port, then ibportstate can be used to:

• Disable, enable or reset the port

• Validate the port’s link width and speed against the peer port

ibroute

Uses SMPs to display the forwarding tables for unicast (LinearForwardingTable or LFT) or multicast (MulticastForwardingTable or MFT) for the specified switch LID and the optional lid (mlid) range. The default range is all valid entries in the range of 1 to FDBTop.

ibdump

Dumps InfiniBand, Ethernet, and RoCE (all versions) traffic that flows to and from the ports of Mellanox ConnectX®-3/ConnectX®-3 Pro NICs. It provides functionality similar to that of the tcpdump tool on a 'standard' Ethernet port. The ibdump tool generates a packet dump file in .pcap format. This file can be loaded by the Wireshark tool (www.wireshark.org) for graphical traffic analysis.

This provides the ability to analyze network behavior and performance, and to debug applications that send or receive RDMA network traffic. Run "ibdump -h" to display a help message that details the tool’s options.

smpquery

Provides a basic subset of standard SMP queries to query Subnet management attributes such as node info, node description, switch info, and port info.

perfquery

Queries InfiniBand ports’ performance and error counters. Optionally, it displays aggregated counters for all ports of a node. It can also reset counters after reading them or simply reset them.

ibping

Uses vendor MADs to validate connectivity between IB nodes. On exit, (IP) ping-like output is shown. ibping runs as client/server; the default is to run as a client. Note also that in addition to ibping, a default server is implemented within the kernel.

ibnetdiscover

Performs IB subnet discovery and outputs a readable topology file. GUIDs, node types, and port numbers are displayed as well as port LIDs and NodeDescriptions. All nodes (and links) are displayed (full topology). Optionally, this utility can be used to list the current connected nodes by node-type. The output is printed to standard output unless a topology file is specified.

ibtracert

Uses SMPs to trace the path from a source GID/LID to a destination GID/LID. Each hop along the path is displayed until the destination is reached or a hop does not respond. By using the -m option, multicast path tracing can be performed between source and destination nodes.

sminfo

Optionally sets and displays the output of a sminfo query in a readable format. The target SM is the one listed in the local port info, or the SM specified by the optional SM lid or by the SM direct routed path.

ibclearerrors

Clears the PMA error counters in PortCounters by either walking the InfiniBand subnet topology or using an already saved topology file.

ibstat

Displays basic information obtained from the local IB driver. Output includes LID, SMLID, port state, link width active, and port physical state.

vstat

Displays information on the HCA attributes.

osmtest

Validates InfiniBand subnet manager and administration (SM/SA). Default is to run all flows with the exception of the QoS flow. osmtest provides a test suite for opensm.

ibaddr

Displays the lid (and range) as well as the GID address of the port specified (by DR path, lid, or GUID) or the local port by default.

ibcacheedit

Allows users to edit an ibnetdiscover cache created through the --cache option in ibnetdiscover(8).

iblinkinfo

Reports link info for each port in an IB fabric, node by node. Optionally, iblinkinfo can do partial scans and limit its output to parts of a fabric.

ibqueryerrors

Reports the port error counters which exceed a threshold for each port in the fabric. The default threshold is zero (0). Error fields can also be suppressed entirely.

In addition to reporting errors on every port, ibqueryerrors can report the port transmit and receive data as well as report full link information to the remote port if available.

ibsysstat

Uses vendor MADs to validate connectivity between InfiniBand nodes and obtain other information about the InfiniBand node. ibsysstat is run as client/server. Default is to run as client.

saquery

Issues the selected SA query. Node records are queried by default.

smpdump

Gets SM attributes from a specified SMA. The result is dumped in hex by default.
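
The threshold and suppression behavior described for ibqueryerrors above can be pictured as a simple filter over per-port counters. The Python sketch below is a conceptual illustration only: the function name, port names, and sample values are hypothetical, the data is hard-coded rather than read from a fabric, and the sketch does not represent the utility's actual implementation or output.

```python
def exceeding_counters(ports, threshold=0, suppress=()):
    """Return {port: {counter: value}} for counters strictly above threshold."""
    report = {}
    for port, counters in ports.items():
        flagged = {
            name: value
            for name, value in counters.items()
            if name not in suppress and value > threshold
        }
        if flagged:
            report[port] = flagged
    return report


# Hypothetical sample data; a real run would read these from the fabric.
SAMPLE = {
    "switch-a/P4": {"SymbolErrorCounter": 12, "LinkDownedCounter": 0},
    "switch-b/P2": {"SymbolErrorCounter": 0, "PortRcvErrors": 3},
    "remote-hca/P1": {"SymbolErrorCounter": 0, "LinkDownedCounter": 0},
}
```

With the default threshold of zero, `exceeding_counters(SAMPLE)` flags `switch-a/P4` and `switch-b/P2` and omits the clean port. Raising the threshold to 5 leaves only `switch-a/P4`, and passing `suppress=("SymbolErrorCounter",)` leaves only `switch-b/P2`.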

© Copyright 2023, NVIDIA. Last updated on Oct 26, 2023.