Date: 2026-02-20
Introduction
The Cable Validation Tool (CVT) is designed to ensure the accuracy and quality of network cluster wiring. Its primary purpose is to validate the connectivity of physical links within the cluster and to verify high-quality communication between the network components, thereby maintaining the integrity of these connections.
The Collector, also referred to as the "bring-up server," serves as the central component of the system and operates as a Docker container. It can be deployed on any machine connected to the management network of the switches. This deployment enables seamless communication with the switches. For large-scale systems, the Collector relies on dedicated agents installed on each switch. These agents are responsible for verifying the connections between the switches.
Collector Responsibilities
The Collector performs the following critical tasks:
Deployment and Execution: It is installed and executed on a server with network access to all nodes requiring validation.
Topology Validation: Reads the Topology Files (P2P, topo or dot), which serve as the authoritative source for validating the physical link connections in the fabric topology.
Agent Management:
Deploys agents on all nodes and uninstalls them
Monitors agents' health
Supports external (unmanaged) agent deployments, with the Collector only monitoring their health
Data Collection and Processing: Collects and processes agents' reports from all validated nodes
User Interface and Reporting:
Displays validation results through a web page, along with recommended remediation steps.
Provides data visualization options, including aggregation, sorting, and filtering, and supports downloading reports in CSV format. REST APIs are also available for integration with other systems.
Agent Responsibilities
Agents are deployed on all switches and servers within the cluster. Their key functions include:
Real-Time Monitoring:
Monitor node and link statuses every 10 seconds. Agents detect changes in link states and, when a change occurs, send an updated status report to the Collector.
If no changes are detected, agents send a periodic status report every 10 minutes.
Amber file collection is triggered by state changes and can take 40-50 seconds.
Event-Driven Reporting:
Upon receiving a "start_validation" message from the Collector, agents initiate status reporting.
Reports are triggered in two scenarios:
When a link status change is detected (ad-hoc report).
Every 10 minutes as part of routine reporting.
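The reporting rules above can be illustrated with a minimal sketch. It is not the agent's actual code: the ReportScheduler class, its on_poll method, and the choice to count the 10-minute interval from the most recent report of either kind are assumptions made for illustration. Only the 10-second polling interval and the 10-minute periodic interval come from the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional

POLL_INTERVAL_S = 10            # link states are polled every 10 seconds (from the text)
PERIODIC_INTERVAL_S = 10 * 60   # routine report every 10 minutes (from the text)


@dataclass
class ReportScheduler:
    """Hypothetical helper that decides when an agent should send a report."""
    last_states: Optional[Dict[str, str]] = None
    last_report_at: Optional[float] = None

    def on_poll(self, now: float, states: Dict[str, str]) -> Optional[str]:
        """Return "ad-hoc", "periodic", or None for one polling cycle.

        `now` is a timestamp in seconds; `states` maps port name to link state.
        """
        if self.last_states is None or states != self.last_states:
            # First poll or a link state change: report immediately.
            kind = "ad-hoc"
        elif now - self.last_report_at >= PERIODIC_INTERVAL_S:
            # Nothing changed, but the periodic interval has elapsed.
            kind = "periodic"
        else:
            kind = None
        self.last_states = dict(states)
        if kind is not None:
            self.last_report_at = now
        return kind
```

A caller would invoke on_poll once per polling cycle (every POLL_INTERVAL_S seconds) and send a status report to the Collector whenever the return value is not None.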
Further details on the Collector and Cable Agents, including their operational workflows, will be discussed in the subsequent chapters.
CVT detects the following symptoms and issues:
Validate Physical Connections (Cable, End points) / Miswiring
Wrong-neighbor - the cable connects to a different device than Topo file (P2P, topo or dot) dictates
Wrong-port - the cable connects to the expected device but on the wrong port
Unknown-neighbor - the cable connects to a device not mentioned in the Topo file (P2P, topo or dot) or LLDP is not enabled/failing
Extra Cable - a cable was found to be connected but not part of the loaded Topo file (P2P, topo or dot)
Media Unplugged - no cable is plugged into the port or, if an optical cable is used, no transceiver is present in the port.
NIC name-mismatch - The interface name of hosts in the ptp file does not match any actual interface names on the host
RX/TX power mismatch - The power on the transmitter side and the receiver side does not match. An alert is raised when the power difference is greater than 3 dB.
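As an illustration of how the miswiring symptoms relate to each other, the sketch below compares the expected peer from a topology with the peer actually observed on a connected port. The classify_link function, the (device, port) tuple representation, and the order in which the checks are applied are assumptions for this example; they do not describe CVT's internal logic or its topology file formats.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, str]  # (device name, port name)


def classify_link(local: Endpoint,
                  observed_peer: Optional[Endpoint],
                  topology: Dict[Endpoint, Endpoint]) -> str:
    """Classify a connected port against the expected topology.

    `topology` maps each local endpoint to its expected peer endpoint.
    `observed_peer` is None when no neighbor information is available
    (for example, LLDP is disabled or failing).
    """
    # Every device named anywhere in the topology counts as "known".
    known_devices = {dev for dev, _ in topology} | {dev for dev, _ in topology.values()}

    if observed_peer is None or observed_peer[0] not in known_devices:
        return "unknown-neighbor"
    expected = topology.get(local)
    if expected is None:
        # A cable is connected, but this port has no link in the topology.
        return "extra-cable"
    if observed_peer[0] != expected[0]:
        return "wrong-neighbor"
    if observed_peer[1] != expected[1]:
        return "wrong-port"
    return "ok"


# Hypothetical two-link topology used only to exercise the function.
topology = {
    ("leaf1", "swp1"): ("spine1", "swp1"),
    ("leaf1", "swp2"): ("spine2", "swp1"),
}
print(classify_link(("leaf1", "swp1"), ("spine2", "swp1"), topology))  # wrong-neighbor
print(classify_link(("leaf1", "swp1"), ("spine1", "swp2"), topology))  # wrong-port
```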
Validate Layer1 Link Integrity (Bit Error Rate, Lane powers, Temperature)
Flapping link - the link state has transitioned up->down->up on its own or due to external actions.
Link Down, No Signal - fiber is not connected or broken
ErrDisable-Rx - interface-down events caused by a server NIC firmware bug (RX Disable)
Err-Disable-Flap - port disabled by the Link Protection feature because of excessive flapping (5 flaps in 10 seconds)
Anomalous-Port - out-of-range parameters such as transceiver lane signal strength or transceiver temperature
Underperforming - the Bit Error Rate (BER) does not meet the following thresholds:
The effective BER error count should be 0 during the first 125 minutes after the link comes up
Raw BER should be ≤ 1E-6
Effective BER should be ≤ 1.5E-254 for measurements of up to 6 hours and ≤ 1E-15 for measurements of 6 hours or longer
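The BER thresholds can be expressed as a small check. This is a sketch under stated assumptions: the function name and parameters are hypothetical, and because the text assigns the 6-hour boundary to both ranges, this example treats a measurement of exactly 6 hours as a long measurement.

```python
RAW_BER_MAX = 1e-6               # raw BER limit (from the text)
EFF_BER_MAX_SHORT = 1.5e-254     # effective BER limit for shorter measurements (from the text)
EFF_BER_MAX_LONG = 1e-15         # effective BER limit for longer measurements (from the text)
LONG_MEASUREMENT_HOURS = 6.0     # boundary between the two limits (from the text)


def ber_within_limits(raw_ber: float, effective_ber: float, measured_hours: float) -> bool:
    """Return True when both raw and effective BER meet the thresholds.

    Assumption: a measurement of exactly 6 hours uses the long-measurement limit.
    """
    if measured_hours >= LONG_MEASUREMENT_HOURS:
        eff_limit = EFF_BER_MAX_LONG
    else:
        eff_limit = EFF_BER_MAX_SHORT
    return raw_ber <= RAW_BER_MAX and effective_ber <= eff_limit


print(ber_within_limits(1e-8, 1e-16, measured_hours=1))  # False: too high for a short measurement
print(ber_within_limits(1e-8, 1e-16, measured_hours=8))  # True
```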
Triage Non-cable issues (Provisioning, CVT issues) requiring Escalations
AdminDown - Spectrum switch port is administratively shut down
Negotiation Fail - interface issue caused by a speed and duplex mismatch between devices
No-report - agent communication is working but no report has been received
Unreachable-device - agent is not installed, not running, or not reachable (e.g. port 8251 is not open in the switch configuration)
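When triaging an unreachable device, a quick first step is to confirm that the agent port accepts connections from the Collector host. The sketch below is a generic check written for this purpose, not part of CVT; the function name is hypothetical, and it assumes the agent listens on TCP, which the text does not state.

```python
import socket

AGENT_PORT = 8251  # port number mentioned in the text; TCP transport is an assumption


def agent_port_open(host: str, port: int = AGENT_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

A False result only shows that the port is not reachable from where the check runs; it does not distinguish between an agent that is not installed, one that is not running, and a port blocked by configuration.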
| Syndrome | Description | Non-admin users | Supported Fabric |
| --- | --- | --- | --- |
| Negotiation Fail | Negotiation of speed, FEC or config issue | YES | IB, ETH, XDR, NVLink |
| AdminDown | Link is disabled administratively | YES | IB, ETH, XDR, NVLink |
| Wrong-neighbor | Port is connected to the wrong peer node | YES | IB, ETH, XDR |
| Wrong-port | Port is connected to the wrong port in the correct peer node | YES | IB, ETH, XDR |
| Extra-cable | Port is connected but neighbor is not in the P2P topology | YES | IB, ETH, XDR |
| Flapping-link | On switches: carrier transitions are monitored every 10 seconds. If the counter increments by more than 2 in a 125-second interval, a link flap alarm is raised | YES | IB, ETH, XDR, NVLink |
| Underperforming-link | High BER counters | YES | IB, ETH, XDR, NVLink |
| Anomalous-port (Signal, Temperature) | Some counters are not in range | YES | IB, ETH, XDR, NVLink |
| Unreachable-device | Cannot ping it, and/or agent is not deployed or agent communication is failing | NO | IB, ETH, XDR, NVLink |
| Media Unplugged | Port is down and no cable is plugged in, or no transceiver is plugged in if it is an optical cable | YES | IB, ETH, XDR |
| NIC Name-mismatch | Interface name of hosts in the ptp file does not match any actual interface names on the host | YES | ETH |
| Unknown-neighbor | Port is up, however no peer info found; one known instance is when the far end is not reachable | NO | IB, ETH, XDR |
| Link Down, No signal | Port is down, while transceivers are plugged in | YES | IB, ETH, XDR, NVLink |
| ErrDisable-Flap, Proto Down | Cumulus switch port locally disabled by Link Protection (≥5 flaps/10s), a defensive mechanism enabled by default | YES | IB, ETH, XDR, NVLink |
| ErrDisable-Rx | Interface-down events caused by a server NIC firmware bug (RX Disable) | YES | IB, ETH, XDR, NVLink |
| RX/TX power mismatch | The difference in power between the transmitter and receiver side is greater than 3 dB | YES | IB, ETH, XDR, NVLink |
| No-report | Node is reachable but no agent report has been received yet | NO | IB, ETH, XDR, NVLink |
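The Flapping-link rule (carrier transitions sampled every 10 seconds, alarm when the counter grows by more than 2 within 125 seconds) can be sketched with a sliding window. The FlapDetector class and its method names are hypothetical, and comparing the newest sample against the oldest sample still inside the window is one possible reading of the rule, not a description of the agent's implementation.

```python
from collections import deque
from typing import Deque, Tuple


class FlapDetector:
    """Hypothetical sliding-window detector for carrier-transition increments."""

    def __init__(self, window_s: float = 125.0, max_increase: int = 2) -> None:
        self.window_s = window_s          # observation interval (from the text)
        self.max_increase = max_increase  # alarm when the increase exceeds this (from the text)
        self.samples: Deque[Tuple[float, int]] = deque()

    def add_sample(self, now: float, carrier_transitions: int) -> bool:
        """Record one counter sample and return True if a flap alarm should be raised."""
        self.samples.append((now, carrier_transitions))
        # Drop samples that have fallen out of the observation window.
        while now - self.samples[0][0] > self.window_s:
            self.samples.popleft()
        oldest_count = self.samples[0][1]
        return carrier_transitions - oldest_count > self.max_increase


detector = FlapDetector()
for t, count in [(0, 5), (10, 5), (20, 7), (30, 8)]:
    print(t, detector.add_sample(t, count))  # the alarm is raised only at t=30
```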
| Feature | Description | Supported Fabric |
| --- | --- | --- |
| Circuit View | Shows the links being monitored | IB, ETH, XDR, NVLink |
| Port Status | Shows whether the port is up or down | IB, ETH, XDR, NVLink |
| Link Syndrome | Shows the cable/port issues for the link | IB, ETH, XDR, NVLink |
| BER Stats | Shows the effective BER, raw BER, and grading based on BER | IB, ETH, XDR, NVLink |
| Report Anomalies | Shows the anomalies detected | IB, ETH, XDR, NVLink |
| Flapping Status | Shows the advanced flapping stats | IB, ETH, XDR, NVLink |
| Rack View | Shows the switches and hosts on the rack | IB, ETH, XDR, NVLink |
| System Admin | Lets you start the bring-up service, load the topology, set credentials, and start validation. Shows an overview of the validation session and the reporting status of agents. Lets you manage GUI users and displays Collector resource utilization. | IB, ETH, XDR, NVLink |
| Golden BER Test | Creates a test to monitor Bit Error Rates (BER) and analyze the interface counters | IB, ETH |
| Amber Collection Test | Lets you collect amber files on demand | IB, ETH, XDR, NVLink |
| Advanced Flapping Test | Creates a flapping test to analyze the metrics that could lead to a flapping event | IB, ETH, XDR, NVLink |