NVIDIA UFM Cable Validation Tool v1.4.0

Circuits Overview

The Circuits View provides detailed insights into the connection between two endpoints, referred to as A and Z.


Each circuit includes the following information:

1. Status

The status reflects the overall health of the circuit and can have one of the following values:

  • Fail: Indicates that there are issues with one or both endpoints.

  • Pass: Indicates that there are no issues with either endpoint.

  • Incomplete: Indicates no issues with the endpoints, but the port status is not yet ready.

2. Endpoint Information

Detailed information about each endpoint includes:

  • Location: Specifies the endpoint's location, Rack, and Unit.

  • Node and Port Details: Provides information about the nodes and their associated ports.

  • Transceiver Information: Includes details such as:

    • Transceiver part number (hidden by default)

    • Transceiver serial number (hidden by default)

    • Firmware version (hidden by default)

    • Connection status: Displays false if one of the endpoints has a "No Transceiver" issue.

  • Port Status: Indicates the current state of the port:

    • Down: If any of the following conditions are present:

      • "No Transceiver"

      • "Link Down, No Signal"

      • "ErrDisable - Flap"

      • "Admin Down"

      • "ErrDisable - Rx"

      • "Negotiation Failure"

    • Up: When none of the above conditions apply.

  • Signal Status: Shows transmission and reception power levels (Tx/Rx power lanes). Supported only for Admin users with InfiniBand and Ethernet fabrics.

  • BER Counters: Provides bit error rate (BER) statistics, such as raw BER and effective BER. Supported only for Admin users with InfiniBand and Ethernet fabrics.

  • Traffic Counters: Includes traffic statistics, such as errors, drops, and byte counts for incoming and outgoing traffic.

  • Recommended Action: Suggests actions to resolve any identified issues.

  • Report Status: Indicates the timeliness of the report, with the following values:

    • No Report: The report has not been updated yet.

    • Stale Report: The report has not been updated within the last 15 minutes (the associated endpoint will be grayed out).

    • Latest Report: The report was updated within the last 15 minutes.
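The Port Status, Connection status, and Report Status rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described here, not the tool's implementation: the function names, the use of plain issue strings, and the treatment of a missing timestamp as "No Report" are assumptions, and a report that is exactly 15 minutes old is treated as "Latest Report".

```python
from datetime import datetime, timedelta
from typing import Collection, Optional

# Issue strings that put a port in the "Down" state, as listed above.
DOWN_ISSUES = frozenset({
    "No Transceiver",
    "Link Down, No Signal",
    "ErrDisable - Flap",
    "Admin Down",
    "ErrDisable - Rx",
    "Negotiation Failure",
})

# Reports older than this are considered stale.
STALE_AFTER = timedelta(minutes=15)


def port_status(issues: Collection[str]) -> str:
    """Return "Down" if any listed condition is present, otherwise "Up"."""
    return "Down" if DOWN_ISSUES.intersection(issues) else "Up"


def connection_status(issues_a: Collection[str], issues_z: Collection[str]) -> bool:
    """Return False if either endpoint reports a "No Transceiver" issue."""
    return "No Transceiver" not in issues_a and "No Transceiver" not in issues_z


def report_status(last_update: Optional[datetime], now: datetime) -> str:
    """Classify a report by age (assumption: None means no report yet)."""
    if last_update is None:
        return "No Report"
    if now - last_update <= STALE_AFTER:
        return "Latest Report"
    return "Stale Report"
```

For example, port_status(["Admin Down"]) yields "Down", while an endpoint with no listed issues yields "Up".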

By default, only circuits with a Fail status (unhealthy circuits) are displayed. However, users can view all circuits by selecting the All option.
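As a rough illustration of how the circuit status and the default Fail-only filter fit together, the sketch below derives a status from per-endpoint issue lists and then filters a list of circuits. The Circuit record, its field names, and the ports_ready flag are hypothetical; the tool's actual data model is not described in this section.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Circuit:
    # Hypothetical record; field names are illustrative only.
    name: str
    issues_a: List[str] = field(default_factory=list)
    issues_z: List[str] = field(default_factory=list)
    ports_ready: bool = True  # assumption: False while port status is not yet ready


def circuit_status(circuit: Circuit) -> str:
    """Fail on any endpoint issue, Incomplete if ports are not ready, else Pass."""
    if circuit.issues_a or circuit.issues_z:
        return "Fail"
    if not circuit.ports_ready:
        return "Incomplete"
    return "Pass"


def visible_circuits(circuits: Iterable[Circuit], show_all: bool = False) -> List[Circuit]:
    """Mimic the default view: only Fail circuits unless All is selected."""
    circuits = list(circuits)
    if show_all:
        return circuits
    return [c for c in circuits if circuit_status(c) == "Fail"]
```

Calling visible_circuits(circuits) corresponds to the default view, and visible_circuits(circuits, show_all=True) corresponds to selecting the All option.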


To inspect a peer-port cable issue, the user can right-click the relevant circuit and select Go To Circuit, which shows the combined circuit details and remediation actions.


Rx/Tx Power Lane

Rx/Tx power lanes are supported only with InfiniBand and Ethernet fabrics.

The power values for each lane in a port help verify signal quality and maintain link performance.

Rx/Tx Power Values for Each Lane

The Rx and Tx power values are essential metrics that provide insight into the performance of each lane in a port. These values are especially significant in high-speed data transmission environments, where maintaining the correct power levels is critical for data integrity and system reliability.

NDR Switches

For NDR switches, the power values for each lane in the port are provided, with each cage containing two ports. The relevant lanes for each port can be identified using the 'Module_Lanes_Used' counter, which determines which lanes are active and relevant for data transmission.

The 'Module_Lanes_Used' counter is a binary indicator that specifies the active lanes in a particular module. For instance:

  • If the value is '1_1_1_1_0_0_0_0', lanes 4-7 should be considered.

  • If the value is '0_0_0_0_1_1_1_1', lanes 0-3 should be taken into account.

The Rx and Tx power values for these lanes are denoted as rx_power_lane_n and tx_power_lane_n, where 'n' can range from 0 to 7, depending on the active lanes.
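The mask decoding above can be sketched as follows. This assumes the leftmost position of the 'Module_Lanes_Used' string corresponds to lane 7 and the rightmost to lane 0, which is the ordering consistent with both examples given here; the function names are illustrative and not part of the tool.

```python
from typing import Iterable, List, Tuple


def active_lanes(module_lanes_used: str) -> List[int]:
    """Decode a value such as '1_1_1_1_0_0_0_0' into a sorted list of lane numbers."""
    bits = module_lanes_used.split("_")
    if len(bits) != 8 or any(b not in ("0", "1") for b in bits):
        raise ValueError(f"unexpected Module_Lanes_Used value: {module_lanes_used!r}")
    # Assumption: leftmost bit is lane 7, rightmost bit is lane 0.
    return sorted(7 - i for i, b in enumerate(bits) if b == "1")


def power_counter_names(lanes: Iterable[int]) -> List[Tuple[str, str]]:
    """Return (rx, tx) counter names for the given lanes."""
    return [(f"rx_power_lane_{n}", f"tx_power_lane_{n}") for n in lanes]
```

With this ordering, active_lanes("1_1_1_1_0_0_0_0") gives lanes 4 to 7 and active_lanes("0_0_0_0_1_1_1_1") gives lanes 0 to 3, matching the two cases above.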

InfiniBand Technology

For InfiniBand, the handling of Rx and Tx power values differs slightly. InfiniBand typically provides power values only for lanes 0-3, so the 'Module_Lanes_Used' mask value is disregarded in this context.

Testing and Verification

Step 1: Identifying Active Lanes

Using the 'Module_Lanes_Used' counter, identify the active lanes for each port. For NDR switches, determine whether lanes 0-3 or 4-7 are active based on the counter's value.

Step 2: Recording Power Values

Record the Rx and Tx power values for the active lanes. Ensure that the values are accurately captured and correspond to the designated lanes.

Step 3: Cross-Verification

Cross-verify the recorded power values with the expected values for the active lanes. This step is crucial for identifying any discrepancies or anomalies in the power readings.

Step 4: Ignoring Irrelevant Lanes

For InfiniBand, ignore the 'Module_Lanes_Used' mask value and focus solely on lanes 0-3. Verify that the power values for these lanes are accurate and within the acceptable range.
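Steps 2 through 4 amount to reading the per-lane counters and comparing them against an acceptable range. The sketch below shows one way to do that check. The acceptable range is not stated in this document, so low and high are caller-supplied placeholders, and the counters dictionary is a hypothetical stand-in for however the values are collected.

```python
from typing import Iterable, List, Mapping, Optional, Tuple


def out_of_range_lanes(
    counters: Mapping[str, float],
    lanes: Iterable[int],
    low: float,
    high: float,
) -> List[Tuple[str, Optional[float]]]:
    """Return (counter_name, value) pairs that are missing or outside [low, high]."""
    problems: List[Tuple[str, Optional[float]]] = []
    for n in lanes:
        for direction in ("rx", "tx"):
            name = f"{direction}_power_lane_{n}"
            value = counters.get(name)
            # A missing reading is reported with value None.
            if value is None or not (low <= value <= high):
                problems.append((name, value))
    return problems
```

For InfiniBand, the lanes argument would simply be range(4), since the mask is ignored and only lanes 0-3 are checked.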

This page is only supported for Admin users with InfiniBand and Ethernet fabric


Carrier transitions on switches are monitored every 10 seconds. If the counter increments by more than 1 between checks, a link flap alarm is raised and the circuit is treated as a flapping circuit.
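The detection rule can be illustrated with a short sketch that compares consecutive samples of the carrier-transitions counter. It assumes the samples are taken at the 10-second interval and that the counter does not reset between samples; the function name is illustrative.

```python
from typing import List, Sequence


def flap_alarms(samples: Sequence[int]) -> List[int]:
    """Return the indices of samples whose counter rose by more than 1
    relative to the previous sample."""
    return [
        i
        for i in range(1, len(samples))
        if samples[i] - samples[i - 1] > 1
    ]
```

For the sample series [10, 10, 11, 14, 14], only the jump from 11 to 14 exceeds 1, so a single alarm is reported at index 3.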

The system provides a detailed view of current and historical flapping events for both endpoints. The historical data spans multiple time intervals, including the last 30 seconds, 1 minute, 5 minutes, 1 hour, 12 hours, and 24 hours. This information is displayed in a table.

The table includes the following columns for both endpoints:

  • Data Hall and SU

  • Location: Specifies the endpoint's location, Rack, and Unit.

  • Node and Port Details: Provides information about the nodes and their associated ports.

  • Status: The flapping status, which can have one of the following values:

    1. Ok - no flapping events since agent started

    2. Flapping - agent detected a flapping event in the last 1 minute

    3. Flapped - agent detected a flapping event at some point. The Flapping event counter >= 1.

  • Total Flapping Count: The total number of transitions that have occurred since the bringup agent was started on the switch.

  • Flap 30 sec: how many flaps happened in the last 30 seconds.

  • Flap 1 min: how many flaps happened in the last 1 minute.

  • Flap 5 min: how many flaps happened in the last 5 minutes.

  • Flap 1 hour: how many flaps happened in the last 1 hour.

  • Flap 12 hour: how many flaps happened in the last 12 hours.

  • Flap 24 hour: how many flaps happened in the last 24 hours.
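The status and the per-window columns above can be derived from a list of flap timestamps, as in the sketch below. This is a simplification: it assumes each timestamp represents one counted flap since the agent started, and it uses the total number of timestamps as the total count. The function and dictionary names are illustrative.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Column labels and their look-back windows, as listed above.
WINDOWS = {
    "Flap 30 sec": timedelta(seconds=30),
    "Flap 1 min": timedelta(minutes=1),
    "Flap 5 min": timedelta(minutes=5),
    "Flap 1 hour": timedelta(hours=1),
    "Flap 12 hour": timedelta(hours=12),
    "Flap 24 hour": timedelta(hours=24),
}


def flap_summary(flap_times: List[datetime], now: datetime) -> Dict[str, object]:
    """Build one endpoint's row: status, total count, and per-window counts."""
    counts = {
        label: sum(1 for t in flap_times if timedelta(0) <= now - t <= span)
        for label, span in WINDOWS.items()
    }
    if not flap_times:
        status = "Ok"        # no flapping events since the agent started
    elif counts["Flap 1 min"] > 0:
        status = "Flapping"  # a flap was seen in the last minute
    else:
        status = "Flapped"   # flapped at some point, but not in the last minute
    return {"Status": status, "Total Flapping Count": len(flap_times), **counts}
```

An endpoint whose most recent flap was two hours ago would therefore show Flapped, with zero in the 30-second through 1-hour columns and a nonzero value in the 12-hour and 24-hour columns.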


Flapping History

The circuit maintains a record of the number of flaps that occurred, along with the corresponding timestamps, over a 24-hour period. This record is referred to as the flapping history. Users can view this history by selecting any row in the flapping circuits table.


The history is displayed as a bar chart, where green bars represent the A endpoint and blue bars represent the Z endpoint.

If flaps are detected simultaneously for both the A and Z endpoints, the Z value is stacked on top of the A value in the bar chart. For example, in the first bar of the chart, the A endpoint has a value of 4, and the Z endpoint has a value of 3, making the total height of the bar 7.
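The stacking arithmetic can be written out explicitly. The sketch below computes, for each bar, where the A and Z segments start and end. It only illustrates the layout described above; the function name and the returned structure are assumptions and do not reflect how the tool draws the chart.

```python
from typing import Dict, List, Sequence


def stacked_bars(a_counts: Sequence[int], z_counts: Sequence[int]) -> List[Dict[str, object]]:
    """For each time bucket, return the (bottom, top) span of the A and Z
    segments and the total bar height, with Z stacked on top of A."""
    if len(a_counts) != len(z_counts):
        raise ValueError("A and Z series must cover the same time buckets")
    bars = []
    for a, z in zip(a_counts, z_counts):
        bars.append({"a": (0, a), "z": (a, a + z), "total": a + z})
    return bars
```

With A = 4 and Z = 3, the A segment spans 0 to 4, the Z segment spans 4 to 7, and the total height is 7, as in the example above.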

Additionally, users can toggle the visibility of individual bars by interacting with the chart legend located on the right-hand side of the chart.

© Copyright 2025, NVIDIA. Last updated on Mar 26, 2025.