Cable Validation
The NVIDIA Cable Validation tool is a platform for connectivity validation.
Cable validation is the process of checking the actual cable deployment against the expected topology (from planning).
This process can run in parallel with, or between, the deployment of portions of the cluster, to confirm that the equipment deployed so far is connected as expected.
The tool does not depend on a working Subnet Manager (SM) or on any working InfiniBand (IB) communication; instead, it uses the management interfaces of the connected network devices.
This allows the cluster to be brought up incrementally and the deployment to be validated in smaller pieces, rather than all at once.
NOTE: the tool depends on connectivity between the host it runs on (the UFM host) and the management network of the network devices, i.e., the network devices must be reachable from the host through the management network.
NOTE: validation works with managed switches only.
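The reachability requirement in the first note can be checked from the UFM host before any agents are deployed. The following is a minimal sketch, not part of the tool: it assumes a hypothetical text file with one switch management address per line, and it uses the Linux `ping` options `-c` (probe count) and `-W` (reply timeout in seconds). The function name `check_reachable` is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: report which switch management addresses answer a ping from this host.
# Assumption: the list file (hypothetical) holds one IP or hostname per line.

check_reachable() {
    local list_file="$1"
    local addr unreachable=0
    while IFS= read -r addr || [ -n "$addr" ]; do
        # Skip blank lines and comment lines.
        case "$addr" in ''|'#'*) continue ;; esac
        # -c 1: send one probe; -W 2: wait up to 2 seconds for the reply.
        if ping -c 1 -W 2 "$addr" </dev/null >/dev/null 2>&1; then
            printf '%-4s %s\n' OK "$addr"
        else
            printf '%-4s %s\n' FAIL "$addr"
            unreachable=$((unreachable + 1))
        fi
    done < "$list_file"
    # Succeed only if every listed address replied.
    [ "$unreachable" -eq 0 ]
}

# Example (switch_ips.txt is a placeholder file name):
#   check_reachable switch_ips.txt
```

This only covers the host-to-switch direction. The opposite direction (a switch pinging the UFM host) still has to be checked from the switch itself.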
Main flow of how the tool works
load the expected/planned topology into the tool
install and run agents on all reachable managed switches (through the management interface/network)
start the validation, which triggers each agent to perform an IB neighbor search, compare the result to the expected topology, and report it to the main tool
aggregate the reports from all switches and display the results/issues
repeat the validation and aggregation steps every ~10 minutes (the default; the interval can change in different situations)
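The comparison step in the flow above can be illustrated with a small sketch. This is not the tool's implementation: it assumes, purely for illustration, that the expected and the discovered links are each stored in a text file with one link per line (a hypothetical format), and it uses the standard `sort` and `comm` utilities to list links that are missing or unexpected. The function name `compare_links` is made up for this example.

```shell
#!/usr/bin/env bash
# Illustrative only: diff an expected link list against a discovered one.
# Assumption: each file has one link per line in a consistent text form,
# e.g. "sw1/1 sw2/1" (a hypothetical format, not the tool's own).

compare_links() {
    local expected="$1" actual="$2"
    local exp_sorted act_sorted
    exp_sorted=$(mktemp) || return 2
    act_sorted=$(mktemp) || return 2
    # comm requires sorted input; -u also drops duplicate lines.
    LC_ALL=C sort -u "$expected" > "$exp_sorted"
    LC_ALL=C sort -u "$actual" > "$act_sorted"
    # Lines only in the expected file: planned links that were not found.
    LC_ALL=C comm -23 "$exp_sorted" "$act_sorted" | sed 's/^/MISSING    /'
    # Lines only in the discovered file: links that are not in the plan.
    LC_ALL=C comm -13 "$exp_sorted" "$act_sorted" | sed 's/^/UNEXPECTED /'
    rm -f "$exp_sorted" "$act_sorted"
}

# Example (both file names are placeholders):
#   compare_links expected_links.txt discovered_links.txt
```

An empty output means the two lists match, which corresponds to a deployment with no reported issues.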
load and activate the cable validation tool as a UFM plugin
load the planned topology into the tool
deploy agents on all reachable switches
start the validation
check the results
more detailed information can be found HERE
load and activate the cable validation tool as a UFM plugin
in the UFM GUI, click 'Settings' in the left-side main menu
open the 'Plugin Management' tab
click the green 'Upload new plugin's image' button
if loading from a local file, browse to the image file location
if pulling from an online repository, you can use this one: https://hub.docker.com/r/mellanox/ufm-plugin-cablevalidation/tags
the progress of the image pull/upload is displayed under 'Jobs' (in the left-side menu)
when the job is completed, the plugin appears in the 'Plugin Management' tab, back in the 'Settings' window
right-click 'cablevalidation' and select 'Add'
after activation, you will be asked to refresh the UFM GUI; 'Cable Validation' then becomes visible in the left-side menu (with no data in its window yet)
verify that the tool is using a reachable management interface
in a terminal, connect to the UFM host (the primary/master)
check UFM's default management address
~# hostname -i
if the address in the output is reachable (pingable) from the switches, this stage is done. otherwise, continue with the following steps.
find a management interface (using the 'ifconfig' command) whose address is reachable from a switch (ping the interface address from the switch)
enter the cable validation container
~# docker ps
CONTAINER ID   IMAGE                                        COMMAND                  CREATED         STATUS         PORTS     NAMES
d6170edaf7ff   mellanox/ufm-plugin-cablevalidation:latest   "/usr/bin/supervisor…"   4 minutes ago   Up 4 minutes             ufm-plugin-cablevalidation
d5919cfda713   mellanox/ufm-enterprise:latest               "/bin/bash /usr/sbin…"   6 hours ago     Up 6 hours               ufm
~# docker exec -it ufm-plugin-cablevalidation bash
/#
edit the config file at: /config/config.cfg
set the following variable to the name of the chosen management interface, then save the file
AGENTS_IFC_NAME=<interface-name>
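The configuration edit above can also be scripted. The following is a minimal sketch, assuming the file uses plain `KEY=value` lines as the example suggests (the full layout of the file is not confirmed here). The helper name `set_cfg_value` is hypothetical. It replaces an existing `AGENTS_IFC_NAME` line, or appends one if none exists.

```shell
#!/usr/bin/env bash
# Sketch: set KEY=value in a config file, replacing an existing line or appending.
# Assumption: the file holds plain KEY=value lines, as in the example above.
# Note: the value must not contain '|' or '&', which are special to sed here.

set_cfg_value() {
    local file="$1" key="$2" value="$3"
    local tmp
    if grep -q "^${key}=" "$file"; then
        # Rewrite through a temporary file so the original keeps its permissions.
        tmp=$(mktemp) || return 1
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp" && cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        # No such key yet: append it as a new line.
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example (run inside the container; eth1 is a placeholder interface name):
#   set_cfg_value /config/config.cfg AGENTS_IFC_NAME eth1
```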
load the planned topology
in a terminal, connect to the UFM host (the primary/master)
list the containers and verify that the cable validation container is running
~# docker ps
CONTAINER ID   IMAGE                                        COMMAND                  CREATED         STATUS         PORTS     NAMES
d6170edaf7ff   mellanox/ufm-plugin-cablevalidation:latest   "/usr/bin/supervisor…"   4 minutes ago   Up 4 minutes             ufm-plugin-cablevalidation
d5919cfda713   mellanox/ufm-enterprise:latest               "/bin/bash /usr/sbin…"   6 hours ago     Up 6 hours               ufm
upload the topo file into the cable validation container
~# docker cp output.topo ufm-plugin-cablevalidation:/tmp/output.topo
Successfully copied 2.56kB to ufm-plugin-cablevalidation:/tmp/output.topo
enter the cable validation container terminal
~# docker exec -it ufm-plugin-cablevalidation bash
enter the tool's CLI shell
/# bringupcli
to exit the bringup CLI shell, type 'exit' or press CTRL+D
FYI: use 'help' to list the available commands, or 'help <command>' for a description of a specific command. auto-completion is also available for these commands
Cable Bringup: help

Documented commands (type help <topic>):
========================================
add_certificate      load               remove_single_agent  start_validation
amber_show_latest    load_clusters      set_default_creds    stop_validation
check_switch_status  load_ip            set_node_creds       version
deploy_all_agents    load_ptp           show_clusters
deploy_single_agent  load_topo          show_switch_history
exit                 remove_all_agents  show_switches

Cable Bringup:
Cable Bringup: help load_topo
load_topo filename dns=true/false [cluster=<cluster name>]
default dns=true
If no dns server to resolve hostnames in topo file, you should set dns=false
and provide IP addresses file. when true, no need to provide IP addresses.
if cluster name is provided it will be set to the provided value, else it will be set to 'default'.
Cable Bringup:
load the topo using the 'load_topo' command
Cable Bringup: load_topo /tmp/output.topo
Load topology from file: /tmp/output.topo
Loaded 3 switches, 12 links.
Loaded IP addresses of 3 switches!
Cable Bringup:
deploy agents on all reachable switches
deploy agents on all managed switches, then wait until all agents are installed and started
Cable Bringup: deploy_all_agents
start the validation
start the validation process
Cable Bringup: start_validation
check results
the results can be seen in the terminal or in the Cable Validation window in the UFM GUI
the output lists the syndromes (issues) found in the actual connections, compared to the expected connections from the topo file.
the expectation is that no issues are reported for the currently deployed connections