NVIDIA NVOS User Manual for InfiniBand Switches v25.02.5002

Cluster Manager

NMX Manager is a component of the NMX solution that collects and processes data center telemetry, monitors the system, and provides insights and predictions on system operability and health. It aggregates streamed telemetry from the NMX Telemetry subsystem, filters and sorts it, runs local predictions, and streams the collected data into the NMX Oasis data lake for further analysis. NMX Manager can also control network behavior and configuration settings by sending control messages to the NMX Controller.

How to Use Cluster Manager (Non-Secure)

The following NVOS configuration enables Cluster Manager in a non-secure environment.

admin@nvos:~$ nv set cluster state enabled
admin@nvos:~$ nv config apply
admin@nvos:~$ nv action update cluster apps nmx-controller manager encryption disabled
admin@nvos:~$ nv action update cluster apps nmx-controller manager enabled

Cluster Manager Security

NMX Controller and NMX Telemetry support secured client connections.

Both use gRPC for client communication, which can run over TLS or mTLS as configured via the NVOS CLI. Below is a simple flow for using mTLS with Cluster Manager, along with the list of cluster commands attached to this manual.

[Figure: mTLS flow with Cluster Manager]

Note

The same CA could be used on both sides, or each side could choose a different CA.

Flow Description

In NVOS, the Cluster applications NMX-C and NMX-T function as gRPC servers, while the Cluster Manager device operates as the gRPC client.

Configuration through the NVOS CLI stores the client CA certificate and the server certificate on the NVOS side and binds them to the apps, which enables TLS/mTLS on top of gRPC.
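
For lab testing, the two files that the import commands in this section expect (a PKCS#12 server bundle and a CA certificate) can be produced with OpenSSL. The following is a minimal sketch, not an NVIDIA-provided procedure: the function name `make_test_certs`, the `P12_PASS` variable, the subject names, and the file names are all illustrative assumptions, and the manual does not state which key types or certificate extensions NVOS requires. It uses a single CA for both sides, which the note above permits.

```shell
# Sketch: create a throwaway CA, a server certificate signed by it, and a
# PKCS#12 bundle. All names, subjects, and the passphrase are placeholders.
make_test_certs() {
    run="${1:-}"                    # pass "echo" to print commands instead of running them
    pass="${P12_PASS:-12345678}"    # hypothetical variable; defaults to the manual's example value

    # Self-signed CA (writes ca.key and ca.crt)
    $run openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
        -subj "/CN=Example-CA" -keyout ca.key -out ca.crt || return 1
    # Server private key and certificate signing request
    $run openssl req -newkey rsa:2048 -nodes \
        -subj "/CN=nvos-switch" -keyout server.key -out server.csr || return 1
    # Sign the server certificate with the CA
    $run openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key \
        -CAcreateserial -days 365 -out server.crt || return 1
    # Bundle key and certificate into cert.p12, protected by the passphrase
    $run openssl pkcs12 -export -inkey server.key -in server.crt \
        -out cert.p12 -passout "pass:$pass" || return 1
}
```

Calling `make_test_certs echo` prints the four OpenSSL commands for review; calling `make_test_certs` with no argument runs them in the current directory. The sketch omits subject alternative names, which many TLS clients require, so treat it as a starting point only.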

Configuration for Enabling mTLS with Cluster Manager

To configure mutual TLS (mTLS) with Cluster Manager, ensure that the necessary certificates and configurations are set up across both the control plane and data plane components. Below is an example of how to configure mTLS in your Cluster Manager environment:

admin@nvos:~$ nv set cluster state enabled
admin@nvos:~$ nv config apply
admin@nvos:~$ nv action import system security certificate cert-name passphrase 12345678 uri-bundle scp://your_username:your_password@1.2.3.4/path-to-cert/cert.p12  # save the server certificate for TLS/mTLS
admin@nvos:~$ nv action import system security ca-certificate cacert-name uri scp://your_username:your_password@1.2.3.4/path-to-cacert/ca.crt  # save the client CA certificate for mTLS
admin@nvos:~$ nv action update cluster apps nmx-controller manager enabled
admin@nvos:~$ nv action update cluster apps nmx-controller manager certificate cert-name  # bind the imported certificate to NMX
admin@nvos:~$ nv action update cluster apps nmx-controller manager ca-certificate cacert-name  # bind the imported CA certificate to NMX
admin@nvos:~$ nv action update cluster apps nmx-controller manager encryption mtls


Configuration for Enabling TLS with Cluster Manager

To enable TLS with Cluster Manager, you must configure the appropriate certificates and security settings for encrypted communication between services in the cluster. Below is an example configuration for setting up TLS in your Cluster Manager environment:

admin@nvos:~$ nv set cluster state enabled 
admin@nvos:~$ nv config apply 
admin@nvos:~$ nv action import system security certificate cert-name passphrase 12345678 uri-bundle scp://your_username:your_password@1.2.3.4/path-to-cert/cert.p12 
admin@nvos:~$ nv action update cluster apps nmx-controller manager enabled 
admin@nvos:~$ nv action update cluster apps nmx-controller manager certificate cert-name 
admin@nvos:~$ nv action update cluster apps nmx-controller manager encryption tls
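
The non-secure, mTLS, and TLS examples above differ only in their `manager` action commands. The sketch below prints that per-mode sequence so the differences are easy to compare. It is an illustration, not part of NVOS: the function name `nmx_manager_commands` is hypothetical, the defaults `nmx-controller`, `cert-name`, and `cacert-name` are the placeholders used in this section, and the command order simply mirrors the examples above. It does not cover `nv set cluster state enabled`, `nv config apply`, or the certificate imports, which must already be done.

```shell
# Sketch: print (do not run) the manager action commands for one encryption mode.
nmx_manager_commands() {
    mode="${1:-}"                   # disabled | tls | mtls
    app="${2:-nmx-controller}"      # app name as used in the examples
    cert="${3:-cert-name}"          # placeholder certificate name
    cacert="${4:-cacert-name}"      # placeholder CA certificate name
    base="nv action update cluster apps $app manager"

    case "$mode" in
        disabled)
            printf '%s\n' "$base encryption disabled" "$base enabled"
            ;;
        tls)
            printf '%s\n' "$base enabled" "$base certificate $cert" \
                "$base encryption tls"
            ;;
        mtls)
            printf '%s\n' "$base enabled" "$base certificate $cert" \
                "$base ca-certificate $cacert" "$base encryption mtls"
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
}
```

For example, `nmx_manager_commands mtls` prints four lines, ending with the `encryption mtls` command; the TLS variant drops the `ca-certificate` binding because no client CA is needed.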


Configuration for Enabling RBAC with Cluster RBAC Commands

To enable role-based access control (RBAC) with the Cluster, first configure the certificates and security settings required for encrypted communication between services in the cluster, then import the RBAC file and bind it to the app. Below is an example configuration for setting up RBAC in a Cluster environment:

admin@nvos:~$ nv set cluster state enabled
admin@nvos:~$ nv config apply
admin@nvos:~$ nv action import system security certificate cert-name passphrase 12345678 uri-bundle scp://your_username:your_password@1.2.3.4/path-to-cert/cert.p12  # save the server certificate for TLS/mTLS
admin@nvos:~$ nv action import system security ca-certificate cacert-name uri scp://your_username:your_password@1.2.3.4/path-to-cacert/ca.crt  # save the client CA certificate for mTLS
admin@nvos:~$ nv action update cluster apps nmx-controller manager enabled
admin@nvos:~$ nv action update cluster apps nmx-controller manager certificate cert-name  # bind the imported certificate to NMX
admin@nvos:~$ nv action update cluster apps nmx-controller manager ca-certificate cacert-name  # bind the imported CA certificate to NMX
admin@nvos:~$ nv action update cluster apps nmx-controller manager encryption mtls

admin@nvos:~$ nv action import cluster rbac file rbac-id scp://your_username:your_password@1.2.3.4/path-to-cert/rbac_spiffe.yaml
admin@nvos:~$ nv action update cluster apps nmx-controller rbac file rbac-id  # bind the RBAC file to NMX
admin@nvos:~$ nv action update cluster apps nmx-controller rbac mode spiffe  # set the RBAC mode for NMX
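
This manual does not show the contents of `rbac_spiffe.yaml` or describe how the `spiffe` mode matches clients. Assuming it relies on the SPIFFE ID carried in the client certificate's URI subject alternative name (as the SPIFFE X.509-SVID standard defines), a quick format check on that ID before issuing client certificates can catch simple mistakes. The helper below is a hypothetical sketch with a deliberately rough pattern, not a full implementation of the SPIFFE ID specification.

```shell
# Sketch: rough format check for a SPIFFE ID of the form
#   spiffe://<trust-domain>[/<path-segment>...]
# Trust domain: lowercase letters, digits, dots, dashes, underscores.
# Path segments: letters, digits, dots, dashes, underscores; no trailing slash.
is_spiffe_id() {
    printf '%s\n' "$1" |
        LC_ALL=C grep -Eq '^spiffe://[a-z0-9._-]+(/[A-Za-z0-9._-]+)*$'
}

# With OpenSSL 1.1.1 or later, the URI SAN of a certificate can be displayed with:
#   openssl x509 -in client.crt -noout -ext subjectAltName
```

For instance, `is_spiffe_id "spiffe://example.org/cluster-manager"` succeeds, while an `https://` URI or an ID with a trailing slash fails. The trust domain and path here are made-up examples.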


Cluster Manager Commands

Note

Cluster Manager action commands require a delay between successive executions. During this period, gRPC traffic is paused. The command nv action update cluster apps log-level is an exception to this rule.

After importing a certificate (whether entity certificate or CA certificate), you need to update the service with the command: nv action update cluster apps <app> manager certificate <cert-name>.

Updating the certificate will cause an NMX configuration change, resulting in a brief connection interruption.
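
When scripting several action commands, the required delay between them can be handled by a small wrapper. The sketch below reads one command per line from standard input and pauses between commands. The function name `run_with_delay` and the `NV_ACTION_DELAY` and `DRY_RUN` variables are hypothetical, and the 10-second default is an arbitrary assumption, since this manual does not state how long the delay must be.

```shell
# Sketch: run commands from stdin one at a time, sleeping between them.
run_with_delay() {
    delay="${NV_ACTION_DELAY:-10}"      # seconds; assumed value, not from the manual
    first=1
    while IFS= read -r cmd; do
        [ -n "$cmd" ] || continue       # skip blank lines
        [ "$first" -eq 1 ] || sleep "$delay"
        first=0
        echo "+ $cmd"                   # show what is about to run
        if [ "${DRY_RUN:-0}" != "1" ]; then
            # Detach stdin so the command cannot consume the remaining lines.
            sh -c "$cmd" </dev/null || return 1
        fi
    done
}
```

A possible usage, with `DRY_RUN=1` to preview without executing:

```shell
printf '%s\n' \
    "nv action update cluster apps nmx-controller manager certificate cert-name" \
    "nv action update cluster apps nmx-controller manager encryption tls" |
    DRY_RUN=1 run_with_delay
```

The wrapper stops at the first failing command so that later steps do not run against a partially applied configuration.
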
© Copyright 2025, NVIDIA. Last updated on Sep 1, 2025.