High Availability (HA) Mode Support for CVT Plugin
The CVT (Cable Validation Tool) plugin supports High Availability (HA) mode to ensure continuous operation and service resilience in production environments. In an HA deployment, CVT operates in an active-passive configuration where the collector runs on only one node at a time. When a failover event occurs, the CVT collector on the active node (node1) is stopped, and the CVT collector on the standby node (node2) starts automatically. This architecture ensures that cable validation monitoring continues with minimal interruption, which is critical for maintaining network health and detecting issues in real time.
The HA implementation for CVT follows a shared-storage model where both nodes have access to common configuration and data files. This shared storage ensures that when the standby node takes over, it has immediate access to the same topology data, validation history, and configuration that was being used by the primary node. The shared resources include:
Configuration files - All CVT configuration settings, including cvt_env.conf
Topology data - Current and historical topology files
Database and persistent storage - All collected metrics and historical information
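As a minimal sketch of how an operator might confirm that a node can actually reach the shared resources listed above, the following Python helper reports whether each path exists and is readable and writable by the current process. The function name and the example paths are hypothetical; the actual locations of the CVT configuration, topology, and database files depend on the deployment and are not specified here.

```python
import os
from typing import Dict, Iterable


def check_shared_paths(paths: Iterable[str]) -> Dict[str, bool]:
    """Map each path to True if it exists and this process can read and write it."""
    return {
        path: os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)
        for path in paths
    }


if __name__ == "__main__":
    # Hypothetical locations, for illustration only; substitute the real
    # shared-storage paths used by your deployment.
    example_paths = ["/shared/cvt/conf/cvt_env.conf", "/shared/cvt/topologies"]
    for path, ok in check_shared_paths(example_paths).items():
        print(f"{path}: {'accessible' if ok else 'NOT accessible'}")
```

Running the same check on both nodes before enabling HA is one way to catch a mount that is present on the primary node but missing on the standby.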
Note on Validation State: The runtime validation state (agent status, validation results, and operational data) is maintained in memory only and is not persisted to shared storage. When a failover occurs, the standby node will reload the topology from shared storage and restart validation. The validation state will be rebuilt as agents are redeployed and begin reporting data. This means there will be a brief period during failover where the validation state is being re-established, but the topology configuration and historical data remain intact.
When a failover occurs—whether due to planned maintenance, system failures, or network issues—the standby node can quickly resume operations by loading the same topology and restarting validation, minimizing the gap in validation coverage.
To enable seamless automatic failover in HA mode, the CVT plugin requires specific configuration settings that allow the standby collector to automatically resume validation operations immediately upon startup. Without proper configuration, manual intervention would be required after each failover to reload the topology and restart validation, which would defeat the purpose of having an HA setup. The automated recovery mechanism ensures that network monitoring remains consistent and continuous across failover events.
For proper HA failover support, users must configure two critical parameters in the cvt_env.conf file under the [application] section:
Setting | Description |
STARTUP_TOPOLOGY=last | This setting instructs the collector to automatically load the last successfully loaded topology from the history when starting up. In a failover scenario, this ensures that the standby collector immediately restores the exact topology state that was active on the primary collector before the failover. Since both nodes share the same data files, the "last" topology reference points to the same topology file that was in use on the primary node. Without this setting, the collector would start with no topology loaded (the default, none), and the topology would have to be reloaded manually after each failover. |
AUTO_START_VALIDATION=true | This setting enables automatic validation startup once a topology is successfully loaded. When combined with STARTUP_TOPOLOGY=last, the collector loads the last topology and starts validating it without operator action. This is essential for maintaining continuous validation during failover events, as it eliminates the manual steps that would otherwise be needed to resume monitoring. The validation process will automatically deploy agents to network devices and resume cable validation on the same topology that was active on the primary node. |
To configure CVT for HA mode with automatic failover support, update the [application] section in cvt_env.conf:
[application]
# Topology loading at startup - specify what to load:
# none - do not load any topology file (default)
# last - load the last loaded topology from history
# <path> - load a specific topology file (supports .topo, .dot, .xlsx, .json)
# For HA mode: set to 'last' to enable automatic topology restore on failover
STARTUP_TOPOLOGY=last
# Automatically start validation if a topology file is loaded
# For HA mode: set to 'true' to enable automatic validation resume on failover
AUTO_START_VALIDATION=true
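The configuration above can be checked before a failover test with a short script. The following is a minimal sketch, not part of CVT itself: it uses Python's standard configparser module to confirm that the [application] section contains both settings with the documented values. The function name check_ha_settings is hypothetical, and the sketch assumes the file is plain INI syntax and compares values exactly as documented (lowercase last and true).

```python
import configparser
from typing import List

# The two settings this document requires for automatic failover.
REQUIRED = {"STARTUP_TOPOLOGY": "last", "AUTO_START_VALIDATION": "true"}


def check_ha_settings(conf_text: str) -> List[str]:
    """Return a list of problems; an empty list means both HA settings are set."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written in the file
    parser.read_string(conf_text)

    if not parser.has_section("application"):
        return ["missing [application] section"]

    problems = []
    for key, expected in REQUIRED.items():
        actual = parser.get("application", key, fallback=None)
        if actual is None:
            problems.append(f"{key} is not set")
        elif actual.strip() != expected:
            problems.append(f"{key}={actual.strip()!r}, expected {expected!r}")
    return problems


if __name__ == "__main__":
    sample = """
[application]
# For HA mode: set to 'last'
STARTUP_TOPOLOGY=last
AUTO_START_VALIDATION=true
"""
    print(check_ha_settings(sample) or "HA settings look correct")
```

To check a real file, read its contents and pass them to the function; the path to cvt_env.conf depends on the installation.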
With the above configuration, the failover sequence operates as follows:
Primary node (node1) failure - The CVT collector on node1 stops due to hardware failure, network issue, or planned maintenance
HA cluster failover - The cluster management system (e.g., Pacemaker, Kubernetes, or other HA solution) detects the failure and initiates failover to node2
Standby node (node2) startup - The CVT collector starts on node2
Automatic topology restore - CVT reads the STARTUP_TOPOLOGY=last setting and automatically loads the last topology from the shared storage
Automatic validation start - CVT reads the AUTO_START_VALIDATION=true setting and immediately begins validation operations
Validation state rebuild - Agents are redeployed to network devices and begin reporting data. The runtime validation state (agent status, current validation results) is rebuilt in memory as agents come online and start collecting cable data
Service resumed - Cable validation monitoring continues with minimal interruption
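The collector-side part of this sequence can be modeled as a small decision function. The sketch below is illustrative only and does not reproduce CVT's internal code: StartupSettings, plan_startup, and the step descriptions are hypothetical names chosen to show how the two settings interact when the standby collector starts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StartupSettings:
    startup_topology: str        # "none", "last", or a path to a topology file
    auto_start_validation: bool  # value of AUTO_START_VALIDATION


def plan_startup(settings: StartupSettings, topology_history: List[str]) -> List[str]:
    """Return the ordered steps a collector would take at startup in this model."""
    topology: Optional[str]
    if settings.startup_topology == "none":
        topology = None
    elif settings.startup_topology == "last":
        # "last" refers to the most recent entry in the shared topology history.
        topology = topology_history[-1] if topology_history else None
    else:
        topology = settings.startup_topology

    if topology is None:
        return ["wait for an operator to load a topology"]

    steps = [f"load topology {topology}"]
    if settings.auto_start_validation:
        steps.append("deploy agents and start validation")
        steps.append("rebuild in-memory validation state as agents report")
    else:
        steps.append("wait for an operator to start validation")
    return steps


if __name__ == "__main__":
    ha = StartupSettings(startup_topology="last", auto_start_validation=True)
    for step in plan_startup(ha, ["fabric_v1.topo", "fabric_v2.topo"]):
        print(step)
```

In this model, only the combination of last and true reaches validation without an operator step, which matches the reason both settings are required for HA.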
The entire process is fully automated, requiring no manual intervention to restore service after a failover event. While the topology configuration and historical data are immediately available from shared storage, the runtime validation state will be progressively rebuilt as agents reconnect and report their status.
Shared Storage: Ensure that the shared storage is reliable and accessible from both nodes with low latency
Network Configuration: Both nodes should have similar network configurations to ensure agents can communicate with the collector regardless of which node is active
Cluster Management: Use a proper HA cluster management solution to handle failover detection and node transitions
Testing: Regularly test failover scenarios to ensure the configuration is working as expected
Monitoring: Implement monitoring to detect failover events and verify that validation resumes successfully on the standby node
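For the testing and monitoring practices above, a failover test usually needs to wait until validation is running again on the standby node. The following is a generic polling sketch under stated assumptions: is_running is a hypothetical callable supplied by the operator (for example, a wrapper around whatever status check the deployment exposes), and no specific CVT API is assumed.

```python
import time
from typing import Callable


def wait_for_validation(
    is_running: Callable[[], bool],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_running() until it returns True or timeout_s elapses.

    Returns True if validation was observed running, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if is_running():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Stand-in probe for illustration: reports "running" on the third check.
    checks = {"count": 0}

    def fake_probe() -> bool:
        checks["count"] += 1
        return checks["count"] >= 3

    resumed = wait_for_validation(fake_probe, timeout_s=10, interval_s=0.1)
    print("validation resumed" if resumed else "validation did not resume in time")
```

The sleep and clock parameters exist only so the helper can be tested without real delays; in normal use the defaults are sufficient.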
Together, the STARTUP_TOPOLOGY=last and AUTO_START_VALIDATION=true configuration settings create a robust failover mechanism where the standby collector can automatically take over validation operations in an active-passive HA deployment. This configuration ensures minimal disruption to network monitoring and allows HA deployments to provide the high reliability that production environments demand, without requiring manual intervention during failover events.