Introduction
The Cable Validation Tool (CVT) is designed to ensure the accuracy and quality of network cluster wiring. Its primary purpose is to validate the connectivity of physical links within the cluster and to verify high-quality communication between network components, thereby maintaining the integrity of these connections.
The Collector, also referred to as the "bring-up server," serves as the central component of the system and operates as a Docker container. It can be deployed on any machine connected to the management network of the switches. This deployment enables seamless communication with the switches. For large-scale systems, the Collector relies on dedicated agents installed on each switch. These agents are responsible for verifying the connections between the switches.

Collector Responsibilities
The Collector performs the following critical tasks:
Deployment and Execution: It is installed and executed on a server with network access to all nodes requiring validation.
Topology Validation: Reads the topology files (P2P, topo, or dot), which serve as the authoritative source for validating the physical link connections in the fabric topology.
Agent Management:
Deploys agents on all nodes.
Monitors agents' health.
Supports external (unmanaged) agent deployments, in which the Collector only monitors the agents' health.
Data Collection and Processing: Collects and processes agents' reports from all validated nodes.
User Interface and Reporting:
Displays validation results through a web page, along with recommended remediation steps.
Provides data visualization options, including aggregation, sorting, and filtering, and supports downloading reports in CSV format. REST APIs are also available for integration with other systems.
Agent Responsibilities
Agents are installed on all switches and servers within the cluster. Their key functions include:
Real-Time Monitoring:
Agents monitor node and link statuses every 10 seconds. When an agent detects a change in link state, it sends an updated status report to the Collector.
If no changes are detected, agents send a periodic status report every 10 minutes.
Amberfile collection is triggered by state changes and can take 40-50 seconds.
Event-Driven Reporting:
Upon receiving a "start_validation" message from the Collector, agents initiate status reporting.
Reports are triggered in two scenarios:
When a link status change is detected (ad-hoc report).
Every 10 minutes as part of routine reporting.
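The reporting rules above can be sketched as a small decision helper. This is only an illustration under stated assumptions, not the agent's actual implementation: the class `ReportScheduler`, its method names, and the choice to treat the first poll after "start_validation" as a change are all hypothetical. Only the 10-second polling interval, the 10-minute periodic interval, and the "start_validation" trigger come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

POLL_INTERVAL_S = 10            # link states are sampled every 10 seconds
PERIODIC_INTERVAL_S = 10 * 60   # routine report every 10 minutes


@dataclass
class ReportScheduler:
    """Hypothetical helper: decides at each poll whether to send a report."""
    last_states: Dict[str, str] = field(default_factory=dict)
    last_report_at: Optional[float] = None
    started: bool = False

    def start_validation(self) -> None:
        # Reporting begins only after the Collector's "start_validation" message.
        self.started = True

    def poll(self, now: float, states: Dict[str, str]) -> Optional[str]:
        """Return "ad-hoc", "periodic", or None for this polling cycle."""
        if not self.started:
            return None
        # Assumption: the first poll after start differs from the empty
        # baseline, so it is reported as a change.
        changed = states != self.last_states
        self.last_states = dict(states)
        if changed:
            self.last_report_at = now
            return "ad-hoc"
        if now - self.last_report_at >= PERIODIC_INTERVAL_S:
            self.last_report_at = now
            return "periodic"
        return None


if __name__ == "__main__":
    scheduler = ReportScheduler()
    scheduler.start_validation()
    for t, state in [(10, "up"), (20, "up"), (30, "down"), (630, "down")]:
        print(t, scheduler.poll(t, {"swp1": state}))
```

In this sketch, an ad-hoc report also resets the 10-minute timer; whether the real agent does the same is not stated in the text.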
Further details on the Collector and Cable Agents, including their operational workflows, will be discussed in the subsequent chapters.
Supported Fabric Types
The UFM Cable Validation Tool is compatible with three types of fabric:
InfiniBand
NVOS
Ethernet
CVT supports detection of the following symptoms and issues:
Validate Physical Connections (Cable, End points) / Miswiring
Wrong-neighbor – the cable connects to a different device than the Topo file (P2P, topo or dot) dictates
Wrong-port – the cable connects to the expected device but on the wrong port
Unknown-neighbor – the cable connects to a device not mentioned in the Topo file (P2P, topo or dot), or LLDP is not enabled or is failing
Extra Cable – a cable is connected but is not part of the loaded Topo file (P2P, topo or dot)
No Transceiver – a transceiver is not present in the port
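The miswiring categories above amount to comparing the observed neighbor of each port against the neighbor the topology file expects. The following is a minimal sketch of that comparison, not CVT's actual logic: the function `classify_link`, the `(device, port)` tuple representation, the `"OK"` result, and the order in which the checks are applied are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, str]  # (device name, port name); hypothetical representation


def classify_link(
    local: Endpoint,
    observed_peer: Optional[Endpoint],
    topology: Dict[Endpoint, Endpoint],
    transceiver_present: bool = True,
) -> str:
    """Map one port's observed neighbor to a miswiring category."""
    if not transceiver_present:
        return "No Transceiver"
    if observed_peer is None:
        # No neighbor information, e.g. LLDP is not enabled or is failing.
        return "Unknown-neighbor"
    expected_peer = topology.get(local)
    if expected_peer is None:
        # A cable is connected, but this port is not in the loaded topology.
        return "Extra Cable"
    if observed_peer == expected_peer:
        return "OK"
    # Every device named anywhere in the topology, on either end of a link.
    known_devices = {dev for dev, _ in topology} | {dev for dev, _ in topology.values()}
    if observed_peer[0] not in known_devices:
        return "Unknown-neighbor"
    if observed_peer[0] == expected_peer[0]:
        return "Wrong-port"
    return "Wrong-neighbor"


if __name__ == "__main__":
    topo = {("sw1", "p1"): ("sw2", "p1"), ("sw1", "p2"): ("sw3", "p1")}
    print(classify_link(("sw1", "p1"), ("sw2", "p4"), topo))
    print(classify_link(("sw1", "p1"), ("sw3", "p1"), topo))
```

The example calls classify a cable landing on the right switch but the wrong port, and a cable landing on a different switch that is still part of the topology.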
Validate Layer1 Link Integrity (Bit Error Rate, Lane powers, Temperature)
Flapping link – the link state has transitioned up->down->up on its own or due to external actions.
Link Down, No Signal – fiber is not connected or broken
ErrDisable-Rx – interface-down events caused by a server NIC firmware bug (RX Disable)
Err-Disable-Flap – port disabled by the Link Protection feature due to excessive flaps (5 flaps/10 sec.)
Anomalous-Port – out-of-range parameters such as transceiver lane signal strength or transceiver temperature
Underperforming (BER) – bit error rate outside the following limits:
Effective BER errors should be 0 during the first 125 minutes after the link comes up
Raw BER should be ≤ 1e-6
Effective BER should be ≤ 1.5E-254 for measurements of up to 6 hours and ≤ 1E-15 for measurements of 6 hours or longer
Triage Non-cable issues (Provisioning, CVT issues) requiring Escalations
AdminDown – Spectrum switch port is administratively shut down
Negotiation Fail – interface issue due to a speed or duplex mismatch between devices
No-report – agent communication is working, but no report has been received
Unreachable-device – agent is not installed, not running, or not reachable (e.g., port 8251 is not open in the switch configuration)
Syndrome | Description | Non-admin users | Supported fabric |
Negotiation Fail | Speed or FEC negotiation failure, or a configuration issue. | Yes | ib, eth, nvos |
AdminDown | Link is administratively disabled. | Yes | ib, eth, nvos |
Wrong-neighbor | Port is connected to the wrong peer node. | Yes | ib, eth, nvos |
Wrong-port | Port is connected to the wrong port on the correct peer node. | Yes | ib, eth, nvos |
Extra-cable | Port is connected, but the neighbor is not in the P2P topology. | Yes | ib, eth, nvos |
Flapping-link | On switches, carrier transitions are monitored every 10 seconds. If the count increments by more than 2 within a 125-second interval, a link flap alarm is raised. | Yes | ib, eth, nvos |
Underperforming-link | High BER counters. | Yes | ib, eth |
Anomalous-port (Signal, Temperature) | Some counters are out of range. | Yes | ib, eth, nvos |
Unreachable-device | Device cannot be pinged, and/or the agent is not deployed or agent communication is failing. | No | ib, eth, nvos |
No Transceiver | Port is down and no transceiver is plugged in. | Yes | ib, eth, nvos |
Unknown-neighbor | Port is up, but no peer information was found. One known instance is when the far end is not reachable. | No | ib, eth, nvos |
Link Down, No Signal | Port is down while transceivers are plugged in. | Yes | ib, eth, nvos |
ErrDisable – Flap, Proto Down | Cumulus switch port locally disabled by Link Protection (≥5 flaps/10s), a defensive mechanism that is enabled by default. | Yes | ib, eth, nvos |
ErrDisable-Rx | Interface-down events caused by a server NIC firmware bug (RX Disable). | Yes | ib, eth, nvos |
No-report | Node is reachable, but no agent report has been received yet. | No | ib, eth, nvos |
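The Flapping-link rule in the table (carrier transitions sampled every 10 seconds, alarm when the counter grows by more than 2 within 125 seconds) can be illustrated with a sliding window over counter samples. This is a hedged sketch only: the class `FlapDetector` and its method are hypothetical, and counter resets or wraparound are not handled.

```python
from collections import deque
from typing import Deque, Tuple

WINDOW_S = 125       # observation window from the table
MAX_INCREMENT = 2    # alarm when the counter grows by more than this


class FlapDetector:
    """Hypothetical sliding-window check over carrier-transition samples."""

    def __init__(self) -> None:
        self._samples: Deque[Tuple[float, int]] = deque()

    def add_sample(self, now: float, carrier_transitions: int) -> bool:
        """Record one sample; return True if a link flap alarm should be raised."""
        self._samples.append((now, carrier_transitions))
        # Discard samples that fall outside the 125-second window.
        while now - self._samples[0][0] > WINDOW_S:
            self._samples.popleft()
        oldest_count = self._samples[0][1]
        return carrier_transitions - oldest_count > MAX_INCREMENT


if __name__ == "__main__":
    detector = FlapDetector()
    for t, count in [(0, 100), (10, 102), (20, 103)]:
        print(t, detector.add_sample(t, count))
```

Comparing against the oldest sample still inside the window keeps the check cheap, since at a 10-second sampling rate the window never holds more than about a dozen samples.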