NVIDIA UFM Cable Validation Tool v1.7.1

SSH Configuration and Usage in Agent Deployment/Uninstall

The Cable Validation Tool (CVT) uses SSH for deploying and managing agents on Linux-based devices (hosts and switches). This document provides verification and QA teams with comprehensive guidance on SSH configuration, testing procedures, troubleshooting, and validation criteria for agent deployment and uninstall operations.

System Components

The CVT system uses multiple SSH components to handle different types of device connections:

  1. SSH Connection Management

    • Base SSH client for establishing secure connections

    • Linux-specific SSH client for command execution on hosts and Linux switches

    • SFTP client for secure file transfers during deployment

    • Specialized client for MLNX-OS switch communication (uses the JSON API rather than SSH)

  2. Agent Deployment System

    • Linux agent deployment handler for hosts and Linux switches

    • MLNX-OS agent deployment handler for Mellanox switches

Device Support Matrix

Device Type   OS Type                 SSH Usage            Authentication
-----------   ---------------------   ------------------   --------------------
Host          Linux                   SSH + SFTP           Password or SSH keys
Switch        Linux (Cumulus, NVOS)   SSH + SFTP           Password (required)
Switch        MLNX-OS                 JSON API (not SSH)   Password only

Important Notes:

  • SSH is NOT used for MLNX-OS switches; they use the JSON API over HTTP/HTTPS

  • For all switches (including Linux switches), password authentication is required because the agent uses these credentials to retrieve port information from the switch

  • Supported Linux switch operating systems: Cumulus Linux, NVOS (for NVLink and XDR switches)

  • SONiC is not currently supported
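
The matrix and notes above can be read as a small lookup table. The following is a minimal Python sketch of that reading, not CVT's own code: the names `ConnectionPlan`, `SUPPORT_MATRIX`, and `connection_plan` are hypothetical and exist only for illustration.

```python
from typing import NamedTuple


class ConnectionPlan(NamedTuple):
    transport: str           # how the device is reached
    password_required: bool  # True when a password is mandatory
    ssh_keys_allowed: bool   # True when SSH key authentication is an option


# Keys are (device type, OS family); values mirror the support matrix.
# Cumulus Linux and NVOS both fall under the "linux" switch family here.
SUPPORT_MATRIX = {
    ("host", "linux"): ConnectionPlan("ssh+sftp", False, True),
    ("switch", "linux"): ConnectionPlan("ssh+sftp", True, False),
    ("switch", "mlnx-os"): ConnectionPlan("json-api", True, False),
}


def connection_plan(device_type: str, os_family: str) -> ConnectionPlan:
    """Return the plan for a device, or raise for unsupported combinations."""
    key = (device_type.lower(), os_family.lower())
    if key not in SUPPORT_MATRIX:
        raise ValueError(f"unsupported device: {device_type} / {os_family}")
    return SUPPORT_MATRIX[key]
```

An unsupported combination such as a SONiC switch raises `ValueError` rather than falling back to a default transport.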

Environment Variables

Configure SSH behavior using these environment variables in /etc/cablevalidation/cvt_env.conf:

[ssh]

# SSH private key file path for HOST devices only
# NOTE: SSH keys are NOT used for switch devices (switches require password authentication)
# Switch passwords are mandatory as agents use them to communicate with switches for port information
# Path must be accessible inside the collector container
CV_SSH_KEY_FILE=

# SSH connection timeout in seconds (default: 20)
# Applied to both SSH commands and SFTP transfers
SSH_CONN_TIMEOUT=20

# Enable automatic SSH key discovery (default: true)
# Only applies to HOST devices when no password is provided
# Searches: SSH agent, ~/.ssh/id_rsa, ~/.ssh/id_dsa, ~/.ssh/id_ecdsa, ~/.ssh/id_ed25519
SSH_LOOK_FOR_KEYS=true

Key Configuration Details

  1. CV_SSH_KEY_FILE

    • Only used for HOST devices (switches always require passwords)

    • Must be a container-accessible path

    • Leave empty for automatic key discovery

    • Switch devices cannot use SSH keys due to agent communication requirements

  2. SSH_CONN_TIMEOUT

    • Connection timeout in seconds

    • Applies to both SSH commands and SFTP transfers

    • Increase for slow networks, decrease for faster failure detection

  3. SSH_LOOK_FOR_KEYS

    • Only affects HOST devices (not applicable to switches)

    • When enabled, searches standard SSH key locations

    • Only used when no password is provided for hosts
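
To check what a given cvt_env.conf would resolve to, the [ssh] section can be parsed with Python's standard `configparser`. This is a sketch under assumptions: the function name `load_ssh_settings` is hypothetical, the accepted boolean spellings are a guess, and CVT's own parsing of the file may differ. The defaults are the ones stated in the configuration comments.

```python
import configparser

# Defaults as documented in the [ssh] section comments.
DEFAULTS = {
    "CV_SSH_KEY_FILE": "",
    "SSH_CONN_TIMEOUT": "20",
    "SSH_LOOK_FOR_KEYS": "true",
}


def load_ssh_settings(conf_text: str) -> dict:
    """Parse the [ssh] section and return typed values with defaults applied."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive (upper-case)
    parser.read_string(conf_text)

    def value(name: str) -> str:
        # A missing section, missing option, or empty value falls back to the default.
        raw = parser.get("ssh", name, fallback="").strip()
        return raw or DEFAULTS[name]

    return {
        "key_file": value("CV_SSH_KEY_FILE") or None,  # None means "not set"
        "timeout": int(value("SSH_CONN_TIMEOUT")),
        # Assumption: these spellings count as "enabled".
        "look_for_keys": value("SSH_LOOK_FOR_KEYS").lower() in {"1", "true", "yes", "on"},
    }
```

For example, `load_ssh_settings(open("/etc/cablevalidation/cvt_env.conf").read())` would return a dictionary with the keys `key_file`, `timeout`, and `look_for_keys`.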

Authentication Methods

1. Password Authentication

  • Supported: All Linux devices (hosts and switches)

  • Configuration: Set credentials using CVT credential management

  • Usage: MANDATORY for all switches (the agent needs the password to retrieve port information from the switch); optional for hosts, which can use SSH keys instead

2. SSH Key Authentication

  • Supported: HOST devices only (NOT supported for switches)

  • Configuration: Set CV_SSH_KEY_FILE or enable SSH_LOOK_FOR_KEYS

  • Usage: Available for hosts only

  • Switch Limitation: Switches cannot use SSH keys because deployed agents need password credentials to communicate with the switch OS for retrieving port information

Authentication Priority (for hosts only)

  1. SSH key authentication (if key available and no password set)

  2. Password authentication (if password provided)

  3. Automatic key discovery (if SSH_LOOK_FOR_KEYS=true and no password)
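
The priority list above can be expressed as a short decision function. This is an illustrative sketch only: `select_host_auth` and its return labels are hypothetical names, and the real selection happens inside CVT's SSH client.

```python
from typing import Optional


def select_host_auth(password: Optional[str],
                     key_file: Optional[str],
                     look_for_keys: bool) -> str:
    """Pick an authentication method for a HOST, following the listed priority."""
    if key_file and not password:
        return "key_file"       # 1. explicit key available and no password set
    if password:
        return "password"       # 2. password provided
    if look_for_keys:
        return "key_discovery"  # 3. SSH agent and default ~/.ssh key files
    raise ValueError("no usable authentication method for host")
```

With a password set, the function returns "password" even when a key file is configured, which matches the "no password set" condition attached to steps 1 and 3.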

Switch Authentication Requirements

  • All switch types require password authentication

  • SSH keys are not supported for switches

  • Passwords are used by agents for ongoing switch communication

  • Supported switch OS types: Cumulus Linux, NVOS (NVLink/XDR switches)

  • Not currently supported: SONiC

Linux Devices Deployment Flow

  1. Preparation Phase

    • System generates deployment script from template

    • Script customized with environment-specific values (image URLs, checksums, configuration)

    • Temporary deployment file created for transfer

  2. File Transfer Phase (SFTP)

    • Secure connection established to target device

    • Deployment script uploaded to /tmp directory on target device

    • Connection closed after successful transfer

  3. Execution Phase (SSH)

    • SSH connection established for command execution

    • Deployment script executed with elevated privileges

    • Cleanup commands remove temporary files

    • Connection closed after completion

  4. Validation Phase

    • Deployment results logged and validated

    • Temporary files removed from both systems

    • Success/failure status reported
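
The four phases above can be sketched in Python as follows. Several details are assumptions made for illustration: the `Transport` interface stands in for the SFTP and SSH clients, the template uses `string.Template` placeholders, and the script is run with `sudo bash`. None of these are confirmed details of CVT's implementation, and the separate SFTP and SSH connections described above are hidden behind the one interface.

```python
import os
import string
import tempfile
from typing import Protocol


class Transport(Protocol):
    """Hypothetical stand-in for the SFTP upload and SSH command clients."""

    def upload(self, local_path: str, remote_path: str) -> None: ...

    def run(self, command: str) -> int: ...


def deploy(transport: Transport, template: str, values: dict) -> bool:
    # 1. Preparation: render the script template into a temporary file.
    script = string.Template(template).substitute(values)
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as tmp:
        tmp.write(script)
    remote = "/tmp/" + os.path.basename(tmp.name)
    try:
        # 2. File transfer: upload the script to /tmp on the target device.
        transport.upload(tmp.name, remote)
        # 3. Execution: run with elevated privileges, then remove the remote copy.
        status = transport.run(f"sudo bash {remote}")
        transport.run(f"rm -f {remote}")
        # 4. Validation: report success or failure from the exit status.
        return status == 0
    finally:
        os.remove(tmp.name)  # remove the local temporary file in all cases
```

Passing the transport in as a parameter keeps the phase ordering testable without a real device, since a fake object that records calls can take its place.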

Deployment Script Features

The install_agent.sh script handles:

  • Architecture detection (x86_64, aarch64)

  • Docker prerequisite checks

  • Image download with checksum verification

  • Container deployment with appropriate parameters

  • GPU support detection (for specific hardware)

  • LLDP socket mounting (for ethernet monitoring)

  • Comprehensive logging to /var/log/cvt_deployment.log

Linux Devices Uninstall Flow

  1. Preparation

    • Generate uninstall script from template (uninstall_agent.sh)

    • Create temporary uninstall file

  2. File Transfer and Execution

    • Same SFTP upload process as deployment

    • Execute uninstall script with sudo privileges

  3. Uninstall Operations

    • Container shutdown and removal

    • Docker image cleanup

    • System resource cleanup

Setting Credentials

The CVT system supports multiple levels of credential configuration:

Default Credentials

  • Configure default username/password for all switches (password required)

  • Configure default username/password for all hosts (password can be empty if using SSH keys)

  • Applied when no specific credentials are found

Node-Specific Credentials

  • Set unique credentials for individual devices

  • Override default credentials for specific IP addresses

  • For hosts: password can be empty when using SSH key authentication

  • For switches: password is always required

  • Highest priority in credential resolution

Credential Profiles

  • Group devices with common credentials

  • Assign profile names to device groups

  • Manage credentials for multiple devices centrally

  • Same password rules apply: switches require passwords, hosts can use empty passwords with SSH keys

Credential Priority

  1. Node-specific credentials

  2. Credential profile credentials (if assigned)

  3. Default credentials for device type
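
The resolution order above, together with the rule that switches always need a password, can be sketched as a single lookup. The names `Credentials` and `resolve_credentials`, and the dictionary-based storage, are hypothetical; CVT's credential management stores and encrypts credentials in its own way.

```python
from typing import Dict, NamedTuple


class Credentials(NamedTuple):
    username: str
    password: str = ""  # may stay empty for hosts that use SSH keys


def resolve_credentials(
    ip: str,
    device_type: str,                    # "host" or "switch"
    node_creds: Dict[str, Credentials],  # per-IP overrides
    profile_of: Dict[str, str],          # IP -> assigned profile name
    profiles: Dict[str, Credentials],    # profile name -> credentials
    defaults: Dict[str, Credentials],    # device type -> default credentials
) -> Credentials:
    if ip in node_creds:                     # 1. node-specific credentials
        creds = node_creds[ip]
    elif profile_of.get(ip) in profiles:     # 2. credential profile, if assigned
        creds = profiles[profile_of[ip]]
    else:                                    # 3. defaults for the device type
        creds = defaults[device_type]
    if device_type == "switch" and not creds.password:
        raise ValueError(f"{ip}: switches require a password")
    return creds
```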

Common SSH Issues

  1. Authentication Failures

    SSH Authentication failure: please check device credentials

    • Verify credentials in CVT credential management

    • Check if SSH keys are properly configured for hosts

    • Ensure SSH service is running on target device

  2. Connection Timeouts

    Failed to execute commands on node: <IP>: Connection timeout

    • Increase SSH_CONN_TIMEOUT value

    • Check network connectivity to target device

    • Verify firewall rules allow SSH (port 22)

  3. Permission Denied

    Failed to execute commands on node: <IP>: Permission denied

    • Verify sudo access for the user account

    • Check if password is required for sudo

    • Ensure user has Docker access permissions

  4. File Transfer Failures

    Failed to upload deployment file

    • Check SFTP connectivity

    • Verify write permissions to /tmp directory

    • Ensure sufficient disk space on target device
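
When triaging many failures at once, the four messages above can be matched against collector log lines to produce the corresponding checklist. This is a small sketch: the `HINTS` table and `hints_for` function are hypothetical helpers, and the matching is plain substring search on the messages quoted in this section.

```python
from typing import List, Tuple

# (substring of the reported error, suggested checks from this section)
HINTS: List[Tuple[str, List[str]]] = [
    ("SSH Authentication failure", [
        "Verify credentials in CVT credential management",
        "Check SSH key configuration (hosts only)",
        "Ensure the SSH service is running on the target",
    ]),
    ("Connection timeout", [
        "Increase SSH_CONN_TIMEOUT",
        "Check network connectivity and firewall rules for port 22",
    ]),
    ("Permission denied", [
        "Verify sudo access and whether sudo requires a password",
        "Ensure the user has Docker access permissions",
    ]),
    ("Failed to upload deployment file", [
        "Check SFTP connectivity",
        "Verify write permissions and free space in /tmp",
    ]),
]


def hints_for(log_line: str) -> List[str]:
    """Return the checks for the first known error found in a log line."""
    for marker, hints in HINTS:
        if marker in log_line:
            return hints
    return []
```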

SSH Key Issues

  1. Key File Not Found

    • Verify CV_SSH_KEY_FILE path is accessible in container

    • Check file permissions (should be 600 or 400)

    • Ensure key file is mounted into container if using Docker volumes

  2. Key Format Issues

    • Ensure key is in OpenSSH format (not PuTTY or other formats)

    • Verify key format compatibility with paramiko SSH library

    • Check for proper key file structure and encoding

  3. Key Permission Problems

    • Verify SSH key file permissions are restrictive (600 or 400)

    • Ensure correct ownership of key files

    • Check that key files are readable by the CVT process
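
The checks in this section can be run against a key file before pointing CV_SSH_KEY_FILE at it. The sketch below assumes a POSIX file system and uses a hypothetical helper name, `check_key_file`. The format test is only a heuristic: it looks for the "-----BEGIN ... PRIVATE KEY-----" header that OpenSSH and PEM private keys start with, and does not prove that paramiko can load the key.

```python
import os
import stat
from typing import List


def check_key_file(path: str) -> List[str]:
    """Return a list of problems found with an SSH private key file."""
    if not os.path.isfile(path):
        return [f"{path}: key file not found"]
    problems = []
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode not in (0o600, 0o400):
        problems.append(f"{path}: mode is {mode:03o}, expected 600 or 400")
    try:
        with open(path, "r", errors="replace") as handle:
            first_line = handle.readline().strip()
    except OSError:
        problems.append(f"{path}: not readable by the current user")
        return problems
    # Heuristic: OpenSSH/PEM private keys begin with this kind of header line.
    if not (first_line.startswith("-----BEGIN ") and first_line.endswith("PRIVATE KEY-----")):
        problems.append(f"{path}: does not look like an OpenSSH/PEM private key")
    return problems
```

Run it inside the collector container, against the container-side path, so that the mount, ownership, and permissions are checked as the CVT process sees them.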

Deployment Script Issues

  1. Docker Not Available

    docker is not installed, it is required for running the agent

    • Install Docker on target device

    • Ensure Docker service is running

    • Add user to docker group if needed

  2. Image Download Failures

    Failed to fetch the image from server

    • Check network connectivity to image server

    • Verify image URL is accessible

    • Check firewall rules for HTTP/HTTPS traffic

  3. Checksum Verification Failures

    Checksum verification failed!

    • The image may have been corrupted during download

    • Network issues may have interrupted the transfer

    • The script automatically retries the download
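
The download, verify, and retry behavior described for checksum failures can be sketched as below. The actual logic lives in the install_agent.sh shell script, so this Python version only illustrates the idea. The hash algorithm (SHA-256), the retry count, and the name `fetch_verified` are all assumptions; the source does not state them.

```python
import hashlib
from typing import Callable


def fetch_verified(download: Callable[[], bytes],
                   expected_sha256: str,
                   attempts: int = 3) -> bytes:
    """Download until the SHA-256 digest matches, up to `attempts` tries."""
    for _ in range(attempts):
        data = download()
        if hashlib.sha256(data).hexdigest() == expected_sha256.lower():
            return data  # checksum matches, the image is usable
    # Every attempt produced a mismatching digest.
    raise RuntimeError("Checksum verification failed!")
```

The `download` callable is injected so the retry logic can be exercised without a network, for instance with a function that returns corrupted bytes on the first call.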

Debugging Steps

  1. Enable Debug Logging

    • Check deployment logs: /var/log/cvt_deployment.log on target device

    • Review CVT collector logs for SSH connection details

  2. Manual SSH Testing

    • Test SSH connectivity with specified timeout values

    • Verify SFTP connectivity for file transfer operations

    • Test with specific SSH keys when configured

    • Validate authentication methods work as expected

  3. Network Connectivity Testing

    • Verify basic network connectivity to target devices

    • Test SSH port accessibility (default port 22)

    • Check for firewall or network restrictions

    • Validate network latency and timeout settings
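
A quick way to separate network problems from authentication problems is to test whether the SSH port accepts TCP connections within the configured timeout. The sketch below uses only the Python standard library; `ssh_port_reachable` is a hypothetical helper, and a successful TCP connection does not by itself prove that an SSH server is answering on that port.

```python
import socket


def ssh_port_reachable(host: str, port: int = 22, timeout: float = 20.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Using the same value as SSH_CONN_TIMEOUT for `timeout` shows whether the device answers within the window that CVT itself would allow.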

Security

  1. SSH Key Management

    • Use dedicated SSH keys for CVT operations

    • Rotate keys regularly

    • Restrict key access with proper file permissions

    • Consider using SSH agent forwarding in containers

  2. Credential Security

    • Use strong passwords

    • Implement credential rotation policies

    • Use credential profiles for device groups

    • Store credentials securely (CVT encrypts stored credentials)

  3. Network Security

    • Use SSH key authentication for hosts when possible (switches require passwords)

    • Implement network segmentation

    • Configure firewall rules appropriately

    • Consider using SSH jump hosts for isolated networks

Performance

  1. Connection Management

    • Adjust SSH_CONN_TIMEOUT based on network conditions

    • Use parallel deployment for multiple devices

    • Monitor deployment worker limits

  2. Resource Management

    • Ensure sufficient bandwidth for image transfers

    • Monitor disk space on target devices

    • Clean up temporary files after deployment
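
Parallel deployment with a bounded number of workers can be sketched with a thread pool. This illustrates the general technique rather than CVT's deployment worker mechanism: `deploy_many`, the `deploy_one` callable, and the default of four workers are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def deploy_many(devices: Iterable[str],
                deploy_one: Callable[[str], bool],
                max_workers: int = 4) -> Dict[str, bool]:
    """Deploy to several devices concurrently and collect per-device results."""
    results: Dict[str, bool] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(deploy_one, device): device for device in devices}
        for future in as_completed(futures):
            device = futures[future]
            try:
                results[device] = bool(future.result())
            except Exception:
                # One failing device must not abort the remaining deployments.
                results[device] = False
    return results
```

Capping `max_workers` limits how many image downloads run at once, which ties in with the bandwidth point under Resource Management.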

Operational

  1. Monitoring

    • Monitor deployment success rates

    • Track authentication failures

    • Review deployment logs regularly

  2. Documentation

    • Maintain inventory of SSH keys and their usage

    • Document credential profiles and their assignments

    • Keep network topology documentation updated

When running CVT in containers:

  1. SSH Key Access

    • Mount SSH keys into container using volume mapping

    • Configure CV_SSH_KEY_FILE to point to container-accessible path

    • Verify key file permissions and ownership within container

  2. Network Access

    • Ensure container can reach target devices

    • Configure network settings for direct device access

    • Verify container networking doesn't block SSH connections

  3. SSH Agent

    • Forward SSH agent for key-based authentication

    • Configure agent socket mounting for container access

    • Verify agent accessibility within container environment

© Copyright 2025, NVIDIA. Last updated on Nov 12, 2025