Install and Configure NetQ#

NetQ NvLink (previously NMX-M) provides a single interface for management and telemetry collection of NVLink switches. NetQ is deployed on Kubernetes, along with the other components that make up Mission Control.

NetQ Kubernetes Setup#

NetQ Permanent License Generation and Application Guide#

When NetQ is installed, the system receives an evaluation license that is valid for 60 days. After the evaluation license expires, REST API access is blocked until a new license is applied.

Generating a License File#

Before you generate the license file, you need to do the following:

Prepare a list of servers with the MAC address of each server on which you plan to install the NetQ software.
Have access to the NVIDIA Licensing Portal (NLP) with valid credentials.
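
One way to collect the MAC addresses for that list is to read them from /sys/class/net on each server. The following is a minimal sketch, assuming a Linux host with sysfs. The helper names is_mac and list_macs are hypothetical and are not part of NetQ or the licensing portal.

```shell
# Hypothetical helpers for collecting MAC addresses on a Linux server.

# is_mac VALUE: succeed if VALUE looks like a six-octet, colon-separated MAC address.
is_mac() {
  printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}

# list_macs: print "<interface> <mac>" for every non-loopback interface
# whose hardware address is a six-octet MAC (longer addresses are skipped).
list_macs() {
  local path name mac
  for path in /sys/class/net/*; do
    name=${path##*/}
    [ "$name" = "lo" ] && continue
    [ -r "$path/address" ] || continue
    mac=$(cat "$path/address")
    if is_mac "$mac"; then
      echo "$name $mac"
    fi
  done
  return 0
}
```

Running list_macs on each server prints one line per interface, which can be copied into the list prepared for the portal.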

To generate the license file, follow the steps below:

Access the NVIDIA Licensing Portal
- Go to the NVIDIA Licensing Portal (NLP).
- Log in using your credentials.
Navigate to Network Entitlements
- Click on the Network Entitlements tab.
- You’ll see a list of all your software product serial licenses, license information, and status.
Select and Activate License
- Select the license you want to activate.
- Click on the “Actions” button.
Configure MAC Addresses
- In the MAC Address field, enter the MAC address of the delegated license-registered host.
- If applicable, in the HA MAC Address field, enter your High Availability (HA) server MAC address.
- Note: If you have more than one NIC installed on a UFM Server, use any of the MAC addresses.
Generate and Download License
- Click on Generate License File to create the license key file for the software.
- Click on Download License File and save it on your local computer.

Important Notes about License Regeneration#

When you regenerate a license, you need to keep the following in mind:

If you replace your NIC or server, repeat the process of generating the license to set new MAC addresses.
You can only regenerate a license two times.
To regenerate the license after that, contact NVIDIA Sales Administration at enterprisesupport@nvidia.com.

Download the NetQ install package#

The NetQ install package can be downloaded from the NVIDIA Licensing Portal (NLP).

Downloading NetQ#

To download the package, follow the steps below:

Go to the NVIDIA Licensing Portal (NLP) and log in using your credentials.
Click on Software Downloads, filter the product family to NetQ, find the relevant version, and download the Appliance platform package.
Click on Download.
Save the file on your local drive.
Click Close.
Copy the .tar.gz file to the BCM head node:

rsync -azP <path-to-tar.gz-file> root@bcm11-head-01:/root

To download the Debian packages, use the following link: https://download.nvidia.com/cumulus/apps3.cumulusnetworks.com/repos/deb/pool/netq-5.0/

Find and download the relevant apps and agents packages for Ubuntu 24 (ub24) and the relevant CPU architecture (arm or amd).

Installing NetQ#

This section describes how to install NetQ on a BCM-managed Kubernetes cluster.

Prerequisites#

Kubernetes 1.33 is installed
Nginx ingress is installed
Three nodes that each meet the minimum hardware requirements (4 TB of disk space, 512 GB of free RAM, and 48 CPU cores)
An IP address that is reserved (to be used as Virtual IP)

Note that NetQ requires 512 GB of free memory, not 512 GB of installed memory. On a cluster whose nodes have exactly 512 GB of RAM installed, the free-memory check cannot pass, so you may pass the --skip-netq-prerequisites-checks flag to the installation command. Because the flag skips the prerequisite checks, validate that the system has the required CPU core count before doing so.
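
Before deciding whether to skip the checks, it can help to look at free memory and core count on each node by hand. The following is a rough sketch, assuming a Linux node with /proc/meminfo and nproc available. The function names are hypothetical, and the NetQ installer's own check may measure free memory differently.

```shell
# Hypothetical pre-check helpers; the thresholds come from the prerequisites above.
REQUIRED_FREE_MEM_GB=512
REQUIRED_CORES=48

# meets_minimum ACTUAL REQUIRED: succeed if ACTUAL >= REQUIRED (integers).
meets_minimum() {
  [ "$1" -ge "$2" ]
}

# available_mem_gb: MemAvailable from /proc/meminfo, converted from kB to
# whole GiB (binary units, so slightly stricter than decimal GB).
available_mem_gb() {
  awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# check_node: print one line per unmet requirement; print nothing if both are met.
check_node() {
  local mem cores
  mem=$(available_mem_gb)
  cores=$(nproc)
  meets_minimum "$mem" "$REQUIRED_FREE_MEM_GB" ||
    echo "free memory: ${mem} GB (need ${REQUIRED_FREE_MEM_GB})"
  meets_minimum "$cores" "$REQUIRED_CORES" ||
    echo "CPU cores: ${cores} (need ${REQUIRED_CORES})"
  return 0
}
```

Calling check_node on a node prints nothing when both thresholds are met.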

Installation#

Start the installation wizard using the cm-mission-control-setup command.

Select the NetQ installation here:

Choose the related Kubernetes cluster:

Choose the node category for the nodes where NetQ will be installed (usually the category for k8s-admin control plane nodes):

If you leave the category empty, you can select three individual nodes instead:

Provide the NetQ overlay name and priority (in most cases ‘default’ can be used):

Provide a virtual IP for the cluster. This must be an unused IP address from the same subnet as the default interface of your master and worker nodes:

Provide the paths to the NetQ tarball and the Debian packages:

Select the NetQ deployment mode. Choose NVL Mode:

Set the Kong (NMX API) username and password:

Choose the storage path for Longhorn (replicated storage system). This must be a path with at least 4 TB of available space. On the k8s-admin node, run df -H to check storage availability; /local/longhorn may be a suitable option.

Save and deploy:

Then allow the installation to run to completion.
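
The df -H check for the Longhorn storage path can also be scripted. The following is a minimal sketch, assuming GNU coreutils df on the k8s-admin node and an existing path. The helper names are hypothetical, and the 4 TB threshold is interpreted in decimal units to match df -H.

```shell
# Hypothetical helpers for checking the Longhorn storage path.
MIN_LONGHORN_BYTES=4000000000000   # 4 TB in decimal units, as reported by df -H

# has_min_space AVAIL_BYTES: succeed if AVAIL_BYTES is at least 4 TB.
has_min_space() {
  [ "$1" -ge "$MIN_LONGHORN_BYTES" ]
}

# avail_bytes PATH: available bytes on the filesystem that contains PATH (GNU df).
avail_bytes() {
  df -B1 --output=avail "$1" | tail -n 1 | tr -d ' '
}

# check_longhorn_path PATH: report whether PATH has enough available space.
check_longhorn_path() {
  if has_min_space "$(avail_bytes "$1")"; then
    echo "$1: at least 4 TB available"
  else
    echo "$1: less than 4 TB available"
  fi
}
```

For example, check_longhorn_path /local/longhorn reports whether the path suggested above would qualify.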

Configure Longhorn to not be a default storageclass#

After NetQ is installed, Longhorn is set as a default storageclass. Longhorn should be used only for the NetQ components and not as the default storage for other workloads.

To configure Longhorn to not be a default storageclass, use the following steps:

Get the current storageclasses:

kubectl get storageclass
NAME                      PROVISIONER                                      RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-path (default)      cluster.local/local-path-provisioner             Delete          WaitForFirstConsumer   true                   25h
longhorn (default)        driver.longhorn.io                               Delete          Immediate              true                   24h
longhorn-no-replication   driver.longhorn.io                               Delete          Immediate              true                   24h
longhorn-static           driver.longhorn.io                               Delete          Immediate              true                   24h
shoreline-local-path-sc   cluster.local/shoreline-local-path-provisioner   Delete          WaitForFirstConsumer   true                   18h

Run the following command to patch the Longhorn storageclass, then list the storageclasses again to confirm that longhorn is no longer marked as default:

kubectl patch storageclass longhorn -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'


kubectl get storageclass
NAME                      PROVISIONER                                      RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-path (default)      cluster.local/local-path-provisioner             Delete          WaitForFirstConsumer   true                   26h
longhorn                  driver.longhorn.io                               Delete          Immediate              true                   25h
longhorn-no-replication   driver.longhorn.io                               Delete          Immediate              true                   25h
longhorn-static           driver.longhorn.io                               Delete          Immediate              true                   25h
shoreline-local-path-sc   cluster.local/shoreline-local-path-provisioner   Delete          WaitForFirstConsumer   true                   19h
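
To confirm the result without reading the table by eye, the output can be filtered for classes marked (default). The following is a small sketch that parses the tabular output shown above. The function name default_storageclasses is hypothetical.

```shell
# default_storageclasses: read `kubectl get storageclass` output on stdin and
# print the name of every storageclass marked "(default)".
default_storageclasses() {
  awk 'NR > 1 && $2 == "(default)" { print $1 }'
}
```

Running kubectl get storageclass | default_storageclasses after the patch should list local-path but not longhorn.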

Post-Installation Validation#

Run the command:

kubectl get pods -A

Ensure all pods are in a Running or Completed state.
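
On a cluster with many pods, it may be easier to list only the pods that are not healthy. The following is a minimal sketch, assuming the default column layout of kubectl get pods -A (NAMESPACE, NAME, READY, STATUS). The function name unhealthy_pods is hypothetical.

```shell
# unhealthy_pods: read `kubectl get pods -A --no-headers` output on stdin and
# print "<namespace>/<name> <status>" for each pod that is neither Running
# nor Completed.
unhealthy_pods() {
  awk 'NF >= 4 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}
```

Running kubectl get pods -A --no-headers | unhealthy_pods prints nothing when every pod is in one of the two expected states.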

Then open the following URL in a browser:

https://<Virtual IP>:30443/nmx/swag/index.html

Use the rw-user or ro-user credentials and the password set during the installation.

Uninstall NetQ#

Run cm-mission-control-setup and select the “NVIDIA Mission Control NetQ uninstallation” option (this option appears if NetQ is installed).

The wizard prompts for confirmation.

Note

Files in the Longhorn directory must be deleted manually after the uninstallation on all nodes where NetQ ran.

NetQ NvLink Configuration#

This section describes how to configure NetQ NvLink with NMX-C and NMX-T.

Installing Certificates and Configuring NMX-C and NMX-T on NVLink Switch#

First, validate which NVLink switches have been selected as leaders using cmsh.

To list active leaders (denoted as “Active”), use the following command:

cmsh -c "device; nvfabricinfo"

Example output:

Domain   Active          Switches
-------- --------------- -------------------------------
A05      a05-p1-nvsw-01  a05-p1-nvsw-01..a05-p1-nvsw-09
A06      a06-p1-nvsw-01  a06-p1-nvsw-01..a06-p1-nvsw-09
A07      a07-p1-nvsw-01  a07-p1-nvsw-01..a07-p1-nvsw-09
B05      b05-p1-nvsw-01  b05-p1-nvsw-01..b05-p1-nvsw-09
B06      b06-p1-nvsw-01  b06-p1-nvsw-01..b06-p1-nvsw-09
B07      b07-p1-nvsw-01  b07-p1-nvsw-01..b07-p1-nvsw-09
B08      b08-p1-nvsw-01  b08-p1-nvsw-01..b08-p1-nvsw-09
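
Because the remaining steps are repeated for each leader, it can be convenient to extract the leader hostnames from this output. The following is a minimal sketch, assuming the three-column layout shown in the example (domain, active leader, switch range). The function name active_leaders is hypothetical.

```shell
# active_leaders: read `cmsh -c "device; nvfabricinfo"` output on stdin and
# print the active leader hostname for each domain, skipping the header and
# separator rows.
active_leaders() {
  awk '$1 != "Domain" && $1 !~ /^-+$/ && NF >= 2 { print $2 }'
}
```

For example, cmsh -c "device; nvfabricinfo" | active_leaders prints one leader hostname per line.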

A script is provided as part of the installation package that generates certificates for mTLS authentication between the NVLink switch leader and the NetQ NvLink deployment.

Run the script located at /opt/netq-admin/nvl/scripts/create-certificate.sh as the root user and provide a certificate name (typically the switch’s hostname or IP address). In the example below, the name example is passed as an argument:

nvidia@ubuntu:~$ su
root@ubuntu:/home/nvidia# /opt/netq-admin/nvl/scripts/create-certificate.sh example
certificate.cert-manager.io/example-certificate created
Certificate is ready after 10 seconds.
Extracting secret data to local files…
Files created:
-rw-r--r-- 1 root root 1094 May 29 11:57 example-ca.crt
-rw-r--r-- 1 root root 1424 May 29 11:57 example-tls.crt
-rw-r--r-- 1 root root 3243 May 29 11:57 example-tls.key
-rw------- 1 root root 3907 May 29 11:57 example-tls.p12
Done.

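To copy the generated certificates to the NVLink switch leader, use the following command. The file names here assume that the certificate was generated with the name a06-p1-nvsw-01 rather than example: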

scp a06-p1-nvsw-01-ca.crt a06-p1-nvsw-01-tls.p12 admin@a06-p1-nvsw-01:/home/admin

Then, SSH to the NVLink switch leader and enable the NMX-Controller (NMX-C) and NMX-Telemetry (NMX-T) apps:

ssh admin@a06-p1-nvsw-01 "nv action update cluster apps nmx-controller manager enabled; nv action update cluster apps nmx-telemetry manager enabled"

Example output:

NVOS switch
admin@a06-p1-nvsw-01's password:
Action executing ...
Cluster Manager Port updated successfully
Action succeeded
Action executing ...
Cluster Manager Port updated successfully
Action succeeded

To import the previously generated certificates on the NVLink switch leader, use the following command:

ssh admin@a06-p1-nvsw-01 "nv action import system security certificate nmxm-cert uri-bundle file:///home/admin/a06-p1-nvsw-01-tls.p12; nv action import system security ca-certificate manager-ca-cert uri file:///home/admin/a06-p1-nvsw-01-ca.crt"

Example output:

NVOS switch
admin@a06-p1-nvsw-01's password:
Action executing ...
Succeeded in importing X.509 entity certificate `nmxm-cert`.
NOTE: Certificate `nmxm-cert` is self-signed.
Action succeeded
Action executing ...
Succeeded in importing X.509 CA certificate `manager-ca-cert`.
Action succeeded

To assign the imported certificates to NMX-C and NMX-T and enable mTLS, use the following command:

ssh admin@a06-p1-nvsw-01 "nv action update cluster apps nmx-controller manager ca-certificate manager-ca-cert; nv action update cluster apps nmx-telemetry manager ca-certificate manager-ca-cert; nv action update cluster apps nmx-controller manager certificate nmxm-cert; nv action update cluster apps nmx-telemetry manager certificate nmxm-cert; nv action update cluster apps nmx-controller manager encryption mtls; nv action update cluster apps nmx-telemetry manager encryption mtls"

Example output:

NVOS switch
admin@a06-p1-nvsw-01's password:
Action executing ...
Cluster Manager CA Cert updated successfully
Action succeeded
Action executing ...
Cluster Manager CA Cert updated successfully
Action succeeded
Action executing ...
Cluster Manager Cert updated successfully
Action succeeded
Action executing ...
Cluster Manager Cert updated successfully
Action succeeded
Action executing ...
Cluster Manager Encryption updated successfully
Action succeeded
Action executing ...
Cluster Manager Encryption updated successfully
Action succeeded
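
The mTLS command above applies the same three settings to both apps, so the command string can be generated rather than typed. The following is a small sketch that prints the same six nv commands in the same order. The function names are hypothetical, and the certificate names manager-ca-cert and nmxm-cert are the ones used in the import step.

```shell
# mtls_commands: print the six nv commands used above, one per line:
# CA certificate, entity certificate, then encryption mode, each applied to
# nmx-controller and nmx-telemetry.
mtls_commands() {
  local setting app
  for setting in "ca-certificate manager-ca-cert" "certificate nmxm-cert" "encryption mtls"; do
    for app in nmx-controller nmx-telemetry; do
      echo "nv action update cluster apps $app manager $setting"
    done
  done
}

# mtls_command_line: join the commands with ";" so they can be passed to ssh
# as a single remote command string.
mtls_command_line() {
  mtls_commands | paste -sd ';' -
}
```

With these helpers, the step could be run against another leader as ssh admin@<leader> "$(mtls_command_line)".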

To restart NMX-C and NMX-T services, use the following command:

ssh admin@a06-p1-nvsw-01 "nv action stop cluster apps nmx-telemetry; nv action start cluster apps nmx-telemetry; nv action stop cluster apps nmx-controller; nv action start cluster apps nmx-controller"

Example output:

NVOS switch
admin@a06-p1-nvsw-01's password:
Action executing ...
Running app stop command: nmx-telemetry
Action executing ...
App has been successfully stopped
Action succeeded
Action executing ...
Running app start command: nmx-telemetry
Action executing ...
App has been successfully started
Action succeeded
Action executing ...
Running app stop command: nmx-controller
Action executing ...
App has been successfully stopped
Action succeeded
Action executing ...
Running app start command: nmx-controller
Action executing ...
App has been successfully started
Action succeeded

NetQ’s primary method of interaction is through its REST API. Use curl for the next steps to add the NVLink switch services.

With the Kubernetes cluster installed by BCM, traffic is directed to the head nodes by default. Nginx running on the head nodes proxies requests to Kubernetes nodes running ingress-nginx. For this reason, the examples use curl to https://master/nmx.

In the POST payload, include the IP address of the leader switch you’re configuring. You can find this address using cmsh. For example, to find the IP address of the leader switch for rack A06:

cmsh -c "device; use a06-p1-nvsw-01; get ip"

Example output:

7.241.3.31

To add the NVLink switch leader’s NMX-C to NetQ, make a POST request to the /v1/services endpoint:

curl -sk -X POST 'https://master/nmx/v1/services' -u rw-user:rw-password --header 'Content-Type: application/json' \
--data '{
"Name": "a06-p1-nvsw-01",
"Description": "a06-p1-nvsw-01",
"ServiceType": "CONTROLLER",
"ServiceConnectionInformation": {
  "Address": "7.241.3.31",
  "PortNumber": 9370
}
}'

Example response:

{
  "Address": "7.241.3.31",
  "Description": "a06-p1-nvsw-01",
  "ID": "68703777cf6f5852a7316906",
  "Name": "a06-p1-nvsw-01",
  "PortNumber": 9370,
  "ServiceType": "CONTROLLER",
  "Status": "IN_PROGRESS",
  "StatusInfo": "",
  "Version": ""
}

Verify that this was successful by making a GET request to the /v1/services endpoint, using the ID from the prior response:

curl -sk -X GET 'https://master/nmx/v1/services/68703777cf6f5852a7316906' -u rw-user:rw-password --header 'Content-Type: application/json' | jq .

Example response:

{
  "Address": "7.241.3.31",
  "ApplicationUUID": "c9bd7a13-ccb2-4a90-95ff-9dcf5e9038bc",
  "ClusterDomainUUID": "c2b42a4c-e407-4f98-af6a-8c96823a807e",
  "Description": "a06-p1-nvsw-01",
  "ID": "68703777cf6f5852a7316906",
  "Name": "a06-p1-nvsw-01",
  "PortNumber": 9370,
  "RegisteredAt": "2025-07-10T21:58:15.908Z",
  "ServiceType": "CONTROLLER",
  "Status": "UP",
  "StatusInfo": "",
  "UpSince": "2025-07-10T21:58:15.908Z",
  "Version": "1.2.0_2025-06-07_10-33"
}
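
A newly registered service first reports IN_PROGRESS and later UP, as the two example responses show. Rather than repeating the GET request by hand, the status can be polled. The following is a minimal sketch. The function names, retry count, and delay are hypothetical, and service_status reuses the URL and credentials from the examples above.

```shell
# wait_until_up MAX_TRIES DELAY_SECONDS CMD [ARGS...]: run CMD repeatedly until
# it prints "UP". Succeeds on UP, fails after MAX_TRIES attempts.
wait_until_up() {
  local tries="$1" delay="$2" status
  shift 2
  while [ "$tries" -gt 0 ]; do
    status=$("$@")
    if [ "$status" = "UP" ]; then
      return 0
    fi
    tries=$((tries - 1))
    sleep "$delay"
  done
  return 1
}

# service_status ID: print the Status field of a registered service.
service_status() {
  curl -sk "https://master/nmx/v1/services/$1" -u rw-user:rw-password | jq -r '.Status'
}
```

For example, wait_until_up 30 10 service_status 68703777cf6f5852a7316906 checks every 10 seconds for up to 30 attempts.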

To add the NVLink switch leader’s NMX-T to NetQ, make a POST request to the /v1/services endpoint:

curl -sk -X POST 'https://master/nmx/v1/services' -u rw-user:rw-password --header 'Content-Type: application/json' \
--data '{
"Name": "a06-p1-nvsw-01",
"Description": "a06-p1-nvsw-01",
"ServiceType": "TELEMETRY",
"ServiceConnectionInformation": {
  "Address": "7.241.3.31",
  "PortNumber": 9351
}
}'

Example response:

{
  "Address": "7.241.3.31",
  "Description": "a06-p1-nvsw-01",
  "ID": "6870386b8c7b451eeafddfda",
  "Name": "a06-p1-nvsw-01",
  "PortNumber": 9351,
  "ServiceType": "TELEMETRY",
  "Status": "IN_PROGRESS",
  "StatusInfo": "",
  "Version": ""
}

Verify that this was successful by making a GET request to the /v1/services endpoint, using the ID from the prior response:

curl -sk -X GET 'https://master/nmx/v1/services/6870386b8c7b451eeafddfda' -u rw-user:rw-password --header 'Content-Type: application/json' | jq .

Example response:

{
  "Address": "7.241.3.31",
  "ApplicationUUID": "6c164aa5-2aa7-4789-9587-31b79dc62897",
  "ClusterDomainUUID": "c2b42a4c-e407-4f98-af6a-8c96823a807e",
  "Description": "a06-p1-nvsw-01",
  "ID": "6870386b8c7b451eeafddfda",
  "Name": "a06-p1-nvsw-01",
  "PortNumber": 9351,
  "RegisteredAt": "2025-07-10T22:02:19.865Z",
  "ServiceType": "TELEMETRY",
  "Status": "UP",
  "StatusInfo": "",
  "UpSince": "2025-07-10T22:02:19.865Z",
  "Version": "1.1.3"
}
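
The two POST requests above differ only in ServiceType and PortNumber, so the payload can be built by a small helper when registering several leaders. The following is a sketch under stated assumptions: the function names are hypothetical, and the values are inserted without JSON escaping, which is adequate for hostnames and IP addresses but not for arbitrary text.

```shell
# service_payload NAME TYPE ADDRESS PORT: print the JSON body used in the POST
# requests above. NAME is used for both Name and Description.
service_payload() {
  printf '{"Name":"%s","Description":"%s","ServiceType":"%s","ServiceConnectionInformation":{"Address":"%s","PortNumber":%d}}\n' \
    "$1" "$1" "$2" "$3" "$4"
}

# register_service NAME TYPE ADDRESS PORT: send the payload to /v1/services,
# using the same URL and credentials as the examples above.
register_service() {
  curl -sk -X POST 'https://master/nmx/v1/services' -u rw-user:rw-password \
    --header 'Content-Type: application/json' \
    --data "$(service_payload "$@")"
}
```

For example, register_service a06-p1-nvsw-01 CONTROLLER 7.241.3.31 9370 followed by register_service a06-p1-nvsw-01 TELEMETRY 7.241.3.31 9351 sends the same two requests shown above.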

NetQ provides a Prometheus exporter interface for collected metrics. Use this to validate that NetQ is working with the newly added NVLink switch leader:

curl -sk "https://master/nmx/v1/metrics?id=$(curl -sk -X GET 'https://master/nmx/v1/services/6870386b8c7b451eeafddfda' -u rw-user:rw-password --header 'Content-Type: application/json' | jq -r '.ClusterDomainUUID')" \
-u rw-user:rw-password \
| head -n 20

Example output:

diag_supply_voltage{domain_id="c2b42a4c-e407-4f98-af6a-8c96823a807e",node_guid="0x330aa4e54b8d4c2d",Port="11"} 0 1752185097534
diag_supply_voltage{Port="11",domain_id="c2b42a4c-e407-4f98-af6a-8c96823a807e",node_guid="0x2c5eab0300ca6700"} 0 1752185097534
diag_supply_voltage{domain_id="c2b42a4c-e407-4f98-af6a-8c96823a807e",node_guid="0x9f6028016bbe9123",Port="11"} 0 1752185097534
diag_supply_voltage{node_guid="0x2c5eab0300ca6720",Port="29",domain_id="c2b42a4c-e407-4f98-af6a-8c96823a807e"} 0 1752185097534
diag_supply_voltage{Port="5",domain_id="c2b42a4c-e407-4f98-af6a-8c96823a807e",node_guid="0x2bad4538ad47b824"} 0 1752185097534
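
The metrics output can be long, so a quick summary of which metric names are present may be more useful than the first 20 lines. The following is a minimal sketch that counts samples per metric name, assuming the Prometheus text format shown above. The function name metric_counts is hypothetical.

```shell
# metric_counts: read Prometheus text-format samples on stdin and print
# "<metric_name> <sample_count>" per metric, sorted by name. Comment lines
# (starting with "#") and empty lines are ignored.
metric_counts() {
  awk -F'[{ ]' '!/^#/ && NF { count[$1]++ } END { for (m in count) print m, count[m] }' | sort
}
```

To use it, replace the final head -n 20 in the command above with metric_counts.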

Apply a Permanent NetQ NvLink License#

To apply a permanent NetQ NvLink license, follow these steps:

Download the License#

Log in to the NVIDIA Licensing Portal.
From the menu, select Entitlements > Networking.
Locate the license file. From the Actions column, select the three-dot menu to download the license.

Apply a New License#

Create the /opt/netq-admin/nvl/licenses directory.
Upload the new license file to /opt/netq-admin/nvl/licenses.
Run the following script on one of the NetQ nodes:

/opt/netq-admin/nvl/scripts/license-config.sh

When prompted, select the first option: Apply new license (replace existing).
Select the license file and confirm that the new license details are correct.

The script applies the new license and replaces the previous one in a single operation.
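
The first two steps (creating the directory and uploading the file) can be sketched as a small helper run on the NetQ node. The function name stage_license is hypothetical. The license script itself is interactive and is still run by hand afterwards.

```shell
# Directory that the license script reads from, as described above.
LICENSE_DIR=/opt/netq-admin/nvl/licenses

# stage_license FILE [DIR]: create the license directory if needed and copy
# FILE into it. DIR defaults to LICENSE_DIR.
stage_license() {
  local src="$1" dir="${2:-$LICENSE_DIR}"
  if [ ! -f "$src" ]; then
    echo "license file not found: $src" >&2
    return 1
  fi
  mkdir -p "$dir" && cp "$src" "$dir/"
}
```

After stage_license <path-to-license-file> succeeds, run /opt/netq-admin/nvl/scripts/license-config.sh and choose the first option as described above.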

View License Details#

To view license details, including license type, issue date, and expiration date:

Run the following script:

/opt/netq-admin/nvl/scripts/license-config.sh

When prompted, select the second option: Get active license information.

Receive License Alerts#

If you have configured a webhook receiver, you will receive a notification when your license is about to expire or has already expired. These notifications are sent every 24 hours until the license status is updated.