NVIDIA UFM Cable Validation Tool v1.8.1

Deploying the Module

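| Fabric Size       | CPU Requirements*      | Memory Requirements | Disk Space (Minimum) | Disk Space (Recommended) |
|-------------------|------------------------|---------------------|----------------------|--------------------------|
| Up to 1000 nodes  | 4-core server          | 4 GB                | 20 GB                | 50 GB                    |
| 1000-5000 nodes   | 8-core server          | 16 GB               | 40 GB                | 120 GB                   |
| 5000-10000 nodes  | 16-core server         | 32 GB               | 80 GB                | 160 GB                   |
| Above 10000 nodes | Contact NVIDIA Support |                     |                      |                          |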
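
As a rough illustration, the sizing figures above can be expressed as a lookup. The sketch below is not part of CVT: the function name cvt_sizing is hypothetical, and placing a fabric of exactly 1000 or 5000 nodes in the smaller tier is an assumption, since the documented ranges overlap at those values.

```shell
# Print the documented collector sizing for a given fabric size (node count).
# Assumption: boundary values (1000, 5000, 10000) fall into the smaller tier.
cvt_sizing() {
  nodes=$1
  if [ "$nodes" -le 1000 ]; then
    echo "4 cores, 4 GB RAM, 20 GB disk minimum (50 GB recommended)"
  elif [ "$nodes" -le 5000 ]; then
    echo "8 cores, 16 GB RAM, 40 GB disk minimum (120 GB recommended)"
  elif [ "$nodes" -le 10000 ]; then
    echo "16 cores, 32 GB RAM, 80 GB disk minimum (160 GB recommended)"
  else
    echo "Contact NVIDIA Support"
  fi
}

cvt_sizing 3200
```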

Supported architectures for collector installation: x86_64 and ARM64 (aarch64)

The following operating systems were tested with Docker Container:

Supported OS (type and version):

  • RHEL 8

  • RHEL 9

  • RHEL 10

  • Ubuntu 20.04

  • Ubuntu 22.04

  • Ubuntu 24.04

  • Debian 10

  • Oracle Enterprise Linux 8.10

The Cable Validation tool can be deployed in two ways: as a standalone Docker container on a host, or as a UFM Enterprise plugin.

Deploy the cables_bringup container on a host as described below:

  1. docker load -i <image_path>/cables_bringup_<version>.tar.gz

  2. docker run --name cables_bringup -itd --network=host cables_bringup

  3. docker exec -it cables_bringup /bin/bash

Setting Docker Environment

CVT settings can be customized through environment variables in three ways, listed here from lowest to highest priority:

  1. Set the values in the CVT environment variable configuration file [Lowest Priority]

  2. Set environment variables when starting the Docker container

  3. Export environment variables manually inside the container [Highest Priority]
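
The priority order can be modeled with a small helper. This is only a sketch of the documented ordering, not CVT's own code; the function name resolve_setting and the convention that an empty string means "not set at this level" are assumptions made for the illustration.

```shell
# resolve_setting FILE_VALUE DOCKER_ENV_VALUE EXPORTED_VALUE
# Prints the value that wins under the documented priority:
#   exported inside the container > docker run --env > cvt_env.conf
# An empty argument means "not set at this level" (assumption of this sketch).
resolve_setting() {
  file_value=$1
  docker_value=$2
  exported_value=$3
  if [ -n "$exported_value" ]; then
    echo "$exported_value"
  elif [ -n "$docker_value" ]; then
    echo "$docker_value"
  else
    echo "$file_value"
  fi
}

# CVT_MAX_WORKERS is 30 in the file and 50 via docker run --env; nothing exported.
resolve_setting 30 50 ""    # prints 50
```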

Environment Variable Configuration File

To enhance flexibility and usability, CVT supports environment variable management through a dedicated file. All variables listed below can be set in the configuration file, which includes default values for easy customization.

If an environment variable is defined in both the Docker environment and the configuration file, the Docker environment value takes precedence.

Step 1: Configuration File

The cvt_env.conf file is installed with CVT and comes preloaded with default values.

You can modify this file to match your environment requirements.

Step 2: Updating Variables

To update an environment variable:

  1. Edit the <cable_bringup_root>/config/cvt_env.conf file.

  2. Save your changes.

  3. Restart the CVT collector for the changes to take effect.

    This can be done in the following ways:

    1. supervisorctl restart cvt-service
    2. supervisorctl stop cvt-service bringupcli -k

Note: A Docker container restart is not required; only the CVT collector needs to be restarted.
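
The edit step can also be scripted. The helper below is a minimal sketch, assuming the file keeps the "KEY = value" layout of the default cvt_env.conf; set_cvt_var is a hypothetical name, and the restart command is left as a commented example so that the snippet only touches the file it is given.

```shell
# set_cvt_var CONF_FILE KEY VALUE
# Rewrites the "KEY = ..." assignment in place; fails if KEY is not present.
set_cvt_var() {
  conf_file=$1
  key=$2
  value=$3
  [ -f "$conf_file" ] || { echo "no such file: $conf_file" >&2; return 1; }
  tmp_file=$(mktemp) || return 1
  if awk -v k="$key" -v v="$value" '
       # Replace only a real assignment line, never a commented-out one.
       $0 ~ ("^[ \t]*" k "[ \t]*=") { print k " = " v; found = 1; next }
       { print }
       END { if (!found) exit 1 }
     ' "$conf_file" > "$tmp_file"; then
    cat "$tmp_file" > "$conf_file"   # overwrite contents, keep file permissions
    rm -f "$tmp_file"
  else
    rm -f "$tmp_file"
    echo "variable $key not found in $conf_file" >&2
    return 1
  fi
}

# Example, run inside the container (path placeholder as in the steps above):
# set_cvt_var "<cable_bringup_root>/config/cvt_env.conf" AGENT_COLLECT_INTERVAL 300 \
#   && supervisorctl restart cvt-service
```

Values that contain backslashes would need extra care, because awk interprets escape sequences in -v assignments.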

Sample cvt_env.conf file


[Version]
# DO NOT EDIT THIS SECTION
# Developer note: when adding/removing/changing a variable, you must increment the version number.
# Version of the cvt_env.conf file
# This version is used to check if the cvt_env.conf file is compatible with the current version of the CVT
# If the version is not compatible, the original cvt_env.conf file will be saved as cvt_env.conf.save
# and the new cvt_env.conf file will be created with the current version
# The new cvt_env.conf file will be used to start the CVT
CVT_ENV_VERSION = 1.0.7
#
### Variable names are case-sensitive, and should be unique among sections.

[network]
### Network Configuration
# IP addresses used by the agents:
# if no Environment Variable is set, the IP address of the default interface will be used.
# if AGENTS_COLLECTOR_NAT_IP is set
# - the agents (switch and host) will use this IP address
# otherwise
# - if DEFAULT_AGENTS_INTERFACE_NAME is set, switch and host agents will use the IP address of the interface
#   specified by DEFAULT_AGENTS_INTERFACE_NAME
# - if HOST_SPECIFIC_INTERFACE_NAME is set, host agents will use the IP address of the interface
#   specified by HOST_SPECIFIC_INTERFACE_NAME
# Collector External (NAT) IP address and port; define if there is a NAT between the collector and the agents
# this IP address and port are used by the agents to communicate the collector
# fetch images and send data/reports to the collector
# Leave empty if there is no NAT between the collector and the agents
# the NAT port is used for port-forwarding from K8s or other intermediate networks to the collector
# if the port is empty, CVT will use the https port of the collector (APACHE_HTTPS_PORT) or default 443.
AGENTS_COLLECTOR_NAT_IP =
AGENTS_COLLECTOR_NAT_PORT =

# Interface name of the collector over which all the agents (switch and host) will communicate
# The IP address of this interface will be used by all agents (switch and host) to communicate to
# the collector to fetch images and send data/reports
DEFAULT_AGENTS_INTERFACE_NAME =

# Use the following variable if you want to use a different interface of the collector for the host agents
# Define if the interface to connect with hosts is different from the one used for switch agents
# specified by DEFAULT_AGENTS_INTERFACE_NAME
# The IP address of this interface will be used by the host agents to communicate to the
# collector to fetch images and send data/reports
# Leave empty if you want to use the same interface as DEFAULT_AGENTS_INTERFACE_NAME
HOST_SPECIFIC_INTERFACE_NAME =

[agent]
### Agent Configuration (settings used by the agents themselves)
# set `true` if the switch hostname contains a dot (other than the domain part)
CV_DOT_IN_HOSTNAME =

# Time after which a full report is forced to be published.
# Value to be provided in minutes. Default is 720 minutes (12 hours).
# Interval less than 10 mins is not supported.
FULL_REPORT_PUBLISH_INTERVAL_MINUTES = 720

# Set to `true` to publish amber data on each agent iteration, regardless of changes.
# Default is `false` - amber is only published when there are changes or during forced full reports.
AMBER_PUBLISH_EACH_ITERATION = false

# Agent data collection interval in seconds. Default is 600 seconds (10 minutes).
# This controls how often the agent collects and processes port/link data.
AGENT_COLLECT_INTERVAL = 600

# Optional: add arbitrary extra fields/counters to be included in advanced stats.
# These fields are looked up first in the AMBER CSV record (by column name), and if not found,
# they are looked up as AdvancedStats attributes (e.g. transceiver_reinsert_cnt).
# NOTE: all fields must be numeric.
# Format: comma-separated list (whitespace is ignored). Example:
# CUSTOM_AMBER_FIELDS = transceiver_reinsert_cnt, transceiver_swap_cnt, Advanced_Status_Opcode
CUSTOM_AMBER_FIELDS = Advanced_Status_Opcode

# Enable/Disable nvlink addition and validation for gb200/300 nodes.
# Default is `true`. Set to `false` to disable nvlink validation.
NVLINK_VALIDATION = true

CVT_SKIP_ACP_REPORTS = false

# Interval to check ports in seconds (default 10 seconds)
# This controls how often the agent checks the ports for changes.
CHECK_PORTS_INTERVAL = 10

# Maximum interval between authentication failures in seconds (default 10 minutes)
# interval starts from 10 seconds and doubles every failure, till it reaches the maximum interval
MAX_AUTH_FAILURE_INTERVAL = 600

[agent deployment]
### Container Runtime Configuration for Agent Deployment
# These settings control how agents are deployed on target hosts.
# Supports Docker (default), containerd (via nerdctl), and Kubernetes.
#
# Container runtime to use for agent deployment.
# Leave empty for auto-detection (recommended).
# Supported values: docker, containerd, k8s
# - docker: Uses Docker daemon (default if available)
# - containerd: Uses nerdctl CLI with containerd
# - k8s: Deploys as Kubernetes Pod or DaemonSet
# NOTE: Switch deployments (SONiC, Cumulus) always use Docker regardless of this setting.
CVT_CONTAINER_RUNTIME =

# containerd namespace for nerdctl (only used when CVT_CONTAINER_RUNTIME = containerd)
# Default: default
CVT_CONTAINERD_NAMESPACE = default

# Kubernetes namespace for agent deployment (only used when CVT_CONTAINER_RUNTIME = k8s)
# The namespace will be created if it doesn't exist.
# Default: cvt-agent
CVT_K8S_NAMESPACE = cvt-agent

# Kubernetes Pod/DaemonSet name (only used when CVT_CONTAINER_RUNTIME = k8s)
# Default: cables-agent
CVT_K8S_APP_NAME = cables-agent

[collector]
### Collector Configuration (settings used by the collector to manage agents)
# Max time to wait for an agent to become inactive in minutes
MAX_INACTIVE_INTERVAL = 15

# Interval to check for new switches in minutes
CHECK_NEW_SWITCHES_INTERVAL = 15

# Time to wait for an agent to become active after start validation (minutes)
START_VALIDATION_TIMEOUT = 5

# Time to wait for an agent to become inactive in minutes
WAIT_TIME_INACTIVE_AGENTS = 1

### Worker Concurrency Settings
# Max number of workers to run in parallel for general operations (validation, connectivity, DNS)
CVT_MAX_WORKERS = 30

# Max number of workers for agent deployment (limited due to 384MB image transfers)
CVT_DEPLOYMENT_MAX_WORKERS = 30

### Timeout Settings
# Quick timeout for unreachable devices (seconds) - reduces wait time for failed connections
CVT_QUICK_TIMEOUT = 3

# Agent communication timeout (seconds) - timeout for individual HTTP requests to agents
AGENT_COMM_TIMEOUT = 30

### Batch Processing Settings
# Batching threshold - only use batching for deployments larger than this (reduces overhead)
CVT_BATCHING_THRESHOLD = 5000

# Batch size when batching is used (devices per batch)
CVT_BATCH_SIZE = 1000

### DNS Resolution Settings
# DNS resolver options for fast timeouts to avoid long waits on unresolvable hostnames
# This configures the system resolver behavior when load_topo performs parallel DNS resolution
# Format: timeout:X attempts:Y single-request
# - timeout:X : seconds to wait per DNS query (default: 1)
# - attempts:Y : number of retry attempts (default: 1, no retries)
# - single-request : send A and AAAA queries separately (improves performance)
CVT_DNS_RES_OPTIONS = timeout:1 attempts:1 single-request

[ssh]
### SSH Configuration
# SSH private key file path for passwordless authentication to HOST devices only.
# NOTE: SSH keys are NOT used for switch devices (switches use password authentication).
# Used by both SSH commands and SFTP file transfers during agent deployment to hosts.
# IMPORTANT: Path must be accessible inside the collector container, not the host system.
# If using Docker volumes, ensure the key file is mounted into the container.
# - Leave empty to use standard SSH key discovery (recommended)
# - When empty, SSH will automatically try default container locations like:
# - ~/.ssh/id_rsa, ~/.ssh/id_dsa, ~/.ssh/id_ecdsa, ~/.ssh/id_ed25519
# Set to specific path only if you need to use a non-standard key location
# Examples:
# 1. CV_SSH_KEY_FILE = /opt/collector/keys/host_key (container path)
# 2. CV_SSH_KEY_FILE = /home/collector/.ssh/custom_key (container path)
CV_SSH_KEY_FILE =

# SSH connection timeout in seconds
# Applied to both SSH command execution and SFTP file transfers
# Increase for slow networks, decrease for faster failure detection
SSH_CONN_TIMEOUT = 20

# Enable automatic SSH key discovery from SSH agent and default locations
# NOTE: Only applies to HOST devices, not switches
# When enabled, the system will try to use keys from (inside container):
# - SSH agent (if running and accessible in container)
# - Default container locations (~/.ssh/id_rsa, ~/.ssh/id_dsa, etc.)
# Only used for host devices when no password is provided
SSH_LOOK_FOR_KEYS = true

[application]
# Topology loading at startup - specify what to load:
# none - do not load any topology file (default)
# last - load the last loaded topology from history
# <path> - load a specific topology file (supports .topo, .dot, .xlsx, .json)
# Examples:
# 1. STARTUP_TOPOLOGY = none
# 2. STARTUP_TOPOLOGY = last
# 3. STARTUP_TOPOLOGY = ./topologies/production.topo
# 4. STARTUP_TOPOLOGY = /absolute/path/to/topology.xlsx
STARTUP_TOPOLOGY = none

# Credentials file to load at startup
# If not set, no credentials will be loaded
# 1. STARTUP_CREDENTIALS_FILE = /opt/collector/credentials.ini
# 2. STARTUP_CREDENTIALS_FILE = /absolute/path/to/credentials.json
# 3. STARTUP_CREDENTIALS_FILE = none
# If STARTUP_CREDENTIALS_FILE is set to none, no credentials will be loaded
STARTUP_CREDENTIALS_FILE = none

# automatically start validation if a topology file is loaded
AUTO_START_VALIDATION = false

[apache]
### Apache Web Server Configuration
# Apache performance profile based on server resources and cluster scale
# This controls Apache's concurrency, connection handling, and optimization settings
# Options:
# small - 2-16 cores / 8-64GB RAM / < 1,000 agents (default)
# medium - 32-128 cores / 128-512GB RAM / 1,000-10,000 agents
# large - 200-300 cores / 1TB+ RAM / 10,000-20,000 agents
# xlarge - 400+ cores / 1.5TB+ RAM / 20,000+ agents
# auto - Auto-detect based on available CPU cores
CVT_APACHE_PROFILE = small
#
### Apache Logging Configuration
# Log level for Apache error logs. Higher levels reduce log volume.
# Options (from most to least verbose):
# debug, info, notice, warn (default), error, crit, alert, emerg
# For high-scale production deployments, consider using 'error' or 'crit'
CVT_APACHE_LOG_LEVEL = warn

# Enable or disable Apache access logs (request logging)
# Set to 'false' to disable access logs and reduce disk I/O on high-traffic deployments
# Options: true (default), false
CVT_APACHE_ACCESS_LOG = true

[data management]
# set `true` if you want to poll for stats from CVT collector
ENABLE_STATS_POLLING = false
#
### Data Archiving Configuration
# Cron schedule for data directory archiving (compresses old date directories)
# Format: minute hour day month weekday
# Default: 5 0 * * * (daily at 00:05 UTC)
# Adjust to run during low-usage periods in your timezone
# Examples:
# 0 6 * * * = 06:00 UTC (1am CT / 2am ET)
# 0 2 * * * = 02:00 UTC (9pm ET previous day)
# 30 7 * * * = 07:30 UTC (2:30am CT)
CVT_DATA_ARCHIVE_SCHEDULE = 5 0 * * *

# Maximum number of plain (uncompressed) data directories to keep
# Older directories are compressed to .tgz archives
CVT_MAX_PLAIN_DATA_DIRS = 7

# Maximum total data entries (plain + archived) to keep
# Oldest archives are deleted when this limit is exceeded
CVT_MAX_DATA_DIRS = 30
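
The [network] comments in the sample file describe which setting determines the collector address handed to an agent. The sketch below encodes one reading of those comments: the NAT IP wins for all agents, then the host-specific interface (host agents only), then the default agents interface, and finally the collector's default interface. The function name collector_address_source is hypothetical, and CVT's actual selection code may differ.

```shell
# collector_address_source AGENT_TYPE   (AGENT_TYPE is "switch" or "host")
# Prints the name of the setting that decides which collector IP the agent uses.
# Reads the three [network] variables from the current shell; empty means unset.
collector_address_source() {
  agent_type=$1
  if [ -n "${AGENTS_COLLECTOR_NAT_IP:-}" ]; then
    echo "AGENTS_COLLECTOR_NAT_IP"
  elif [ "$agent_type" = "host" ] && [ -n "${HOST_SPECIFIC_INTERFACE_NAME:-}" ]; then
    echo "HOST_SPECIFIC_INTERFACE_NAME"
  elif [ -n "${DEFAULT_AGENTS_INTERFACE_NAME:-}" ]; then
    echo "DEFAULT_AGENTS_INTERFACE_NAME"
  else
    echo "default interface"
  fi
}

# Illustrative values only: no NAT, eno3 for switches, eno4 for hosts.
AGENTS_COLLECTOR_NAT_IP=
DEFAULT_AGENTS_INTERFACE_NAME=eno3
HOST_SPECIFIC_INTERFACE_NAME=eno4
collector_address_source switch   # prints DEFAULT_AGENTS_INTERFACE_NAME
collector_address_source host     # prints HOST_SPECIFIC_INTERFACE_NAME
```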

More details on the configuration parameters can be found in CVT Configuration.
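
As a worked example of one parameter, the MAX_AUTH_FAILURE_INTERVAL comment in the sample file says the retry interval starts at 10 seconds and doubles after every failure until it reaches the maximum. The sketch below prints that sequence; the function name auth_retry_delays is hypothetical, and this only illustrates the arithmetic, not the agent's code.

```shell
# auth_retry_delays MAX_INTERVAL COUNT
# Prints COUNT retry delays in seconds: start at 10, double after each
# failure, and cap at MAX_INTERVAL.
auth_retry_delays() {
  max_interval=$1
  count=$2
  delay=10
  i=0
  while [ "$i" -lt "$count" ]; do
    echo "$delay"
    delay=$((delay * 2))
    if [ "$delay" -gt "$max_interval" ]; then
      delay=$max_interval
    fi
    i=$((i + 1))
  done
}

# With the default of 600 seconds the cap is reached on the seventh attempt.
auth_retry_delays 600 8 | paste -sd ' ' -   # 10 20 40 80 160 320 600 600
```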

Setting Environment Variables with Docker Run

Specifying the Network Interface

If the host has multiple network interfaces and the switches are connected through an interface other than the default management interface, specify that interface with the AGENTS_IFC_NAME environment variable. For example, assuming the interface name is eno3:


docker run --name cables_bringup -itd --network=host --env AGENTS_IFC_NAME=eno3 cables_bringup


Adding Hostnames

If the switches are not registered in the DNS server, add their hostnames with the --add-host option when running the container. For example (assuming the switch name is switch-3245fa and its IP is 192.168.1.1):


docker run --name cables_bringup -itd --network=host --add-host=switch-3245fa:192.168.1.1 cables_bringup
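
With many switches, the --add-host flags can be generated from a list. The helper below is a sketch only: add_host_flags and the input file format (one "hostname ip" pair per line) are assumptions made for this example, not something CVT provides.

```shell
# add_host_flags FILE
# FILE holds one "<hostname> <ip>" pair per line; blank lines and lines
# starting with # are skipped. Prints one --add-host flag per entry.
add_host_flags() {
  while read -r name ip rest || [ -n "$name" ]; do
    case $name in
      ''|'#'*) continue ;;
    esac
    printf '%s=%s:%s\n' --add-host "$name" "$ip"
  done < "$1"
}

# Usage with a hypothetical switches.txt containing, for example:
#   switch-3245fa 192.168.1.1
#   switch-77c0de 192.168.1.2
# The unquoted substitution is intentional so each flag becomes its own argument:
# docker run --name cables_bringup -itd --network=host $(add_host_flags switches.txt) cables_bringup
```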


Using Volumes

Volumes can be used for data persistence or easier file transfer to the cables_bringup container. The volume must be mapped to /cable_bringup_root in the container for data persistence. This volume can also be used for loading topology files. Example:


docker run --name cables_bringup -itd --network=host -v /opt/bringup_data:/cable_bringup_root cables_bringup


Overriding Apache Configuration

If the host is running another Apache instance that uses the default HTTPS port 443, assign a different port to the bringup server. The chosen port must be free. To do so, set the APACHE_HTTPS_PORT environment variable. For example:


docker run --name cables_bringup -itd --network=host --env APACHE_HTTPS_PORT=9443 cables_bringup
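
The options in this section can be combined in a single docker run command. The sketch below only prints the command so it can be reviewed before running; cvt_run_command and the CVT_IFACE, CVT_HTTPS_PORT, and CVT_DATA_DIR helper variables are hypothetical names used for this illustration.

```shell
# Print (do not run) a docker run command that combines the options above.
# CVT_IFACE, CVT_HTTPS_PORT and CVT_DATA_DIR are helper variables of this
# sketch only; unset or empty ones are skipped.
cvt_run_command() {
  cmd="docker run --name cables_bringup -itd --network=host"
  if [ -n "${CVT_IFACE:-}" ]; then
    cmd="$cmd --env AGENTS_IFC_NAME=$CVT_IFACE"
  fi
  if [ -n "${CVT_HTTPS_PORT:-}" ]; then
    cmd="$cmd --env APACHE_HTTPS_PORT=$CVT_HTTPS_PORT"
  fi
  if [ -n "${CVT_DATA_DIR:-}" ]; then
    cmd="$cmd -v $CVT_DATA_DIR:/cable_bringup_root"
  fi
  echo "$cmd cables_bringup"
}

CVT_IFACE=eno3
CVT_HTTPS_PORT=9443
CVT_DATA_DIR=/opt/bringup_data
cvt_run_command
```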

Warning: Running Cable Validation as a plugin is not supported on UFM Gen2.0.

Deploy the module as a UFM Enterprise plugin as follows:

  1. docker load -i /<image_path>/ufm-plugin-cablevalidation-<version>.tar.gz

  2. ./manage_ufm_plugins.sh add -p cablevalidation -t <version>

  3. ./manage_ufm_plugins.sh start -p cablevalidation

  4. docker exec -it ufm-plugin-cablevalidation bash

Copy Files to the Plugin

Users have two methods for copying files, such as topology files, to the Cable Validation plugin:

  1. Copy the files to the plugin's data volume, /opt/ufm/ufm_plugins_data/cablevalidation, which is mapped to /data/ inside the plugin container.

  2. Use docker cp to copy the needed files to the container.
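
The two methods correspond to commands like the ones printed by the sketch below. The function name plugin_copy_command and the file ./fabric.topo are hypothetical, and using /data/ as the docker cp destination is an assumption chosen so that both methods place the file in the same location.

```shell
# plugin_copy_command METHOD FILE
# Prints the command for one of the two methods above:
#   volume - copy into the host directory that is mapped to /data/ in the plugin
#   docker - copy straight into the running container with docker cp
plugin_copy_command() {
  method=$1
  src=$2
  case $method in
    volume) echo "cp $src /opt/ufm/ufm_plugins_data/cablevalidation/" ;;
    docker) echo "docker cp $src ufm-plugin-cablevalidation:/data/" ;;
    *) echo "unknown method: $method" >&2; return 1 ;;
  esac
}

plugin_copy_command volume ./fabric.topo
plugin_copy_command docker ./fabric.topo
```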

Overriding the Apache Configuration

When using Cable Validation as a plugin, the default port 443 is already in use by UFM Enterprise. Therefore, port 8633 is used for HTTPS by default. A different port can be chosen for the bring-up server, provided that it is free.

To change the port, update the APACHE_HTTPS_PORT variable in the plugin's config.cfg file as follows:

  1. Execute /opt/ufm/scripts/manage_ufm_plugins.sh add -p cablevalidation to add the Cable Validation plugin.

  2. Stop the plugin using /opt/ufm/scripts/manage_ufm_plugins.sh stop -p cablevalidation

  3. Use vim /opt/ufm/files/conf/plugins/cablevalidation/config.cfg to modify the 'APACHE_HTTPS_PORT' variable.

  4. Update and save the file.

  5. Start the plugin again with /opt/ufm/scripts/manage_ufm_plugins.sh start -p cablevalidation.

With these changes, the new configuration takes effect and Apache runs on the updated port.

© Copyright 2026, NVIDIA. Last updated on Feb 20, 2026