Before You Install

This overview describes the NetQ deployment and installation options to help you choose the right one for your environment.

Installation Overview

Consider the following deployment options and requirements before you install the NetQ system.

Single Server
• On-premises only
• Network size: small (1-node: supports up to 40 switches*)
• KVM or VMware hypervisor
• No high-availability option
• System requirements: 16 virtual CPUs, 64GB RAM, 500GB SSD disk
• Not supported: NVLink monitoring

Cluster
• On-premises only
• Network size: medium (3-node: supports up to 100 switches*)
• KVM or VMware hypervisor
• High availability
• System requirements (per node): 16 virtual CPUs, 64GB RAM, 500GB SSD disk
• Not supported: NVLink monitoring

Scale Cluster
• On-premises only
• Network size: large
• KVM or VMware hypervisor
• High availability
• System requirements (per node): 48 virtual CPUs, 512GB RAM, 3.2TB SSD disk
• Not supported: network snapshots, trace requests, flow analysis, MAC commentary, duplicate IP address validations
• Limited support: link health view (beta)

*When switches are configured with both OpenTelemetry (OTLP) and the NetQ agent, switch support per deployment model is reduced by half.
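As a quick illustration of the footnote above, the effective switch limit can be computed from the table values. This is a plain sketch for planning purposes; the function and dictionary names are illustrative, not part of any NetQ tooling.

```python
# Maximum supported switches per deployment model (from the table above).
BASE_LIMITS = {"single-server": 40, "cluster": 100}

def effective_limit(model: str, dual_telemetry: bool) -> int:
    """Return the supported switch count for a deployment model.

    The limit is halved when switches send both OTLP and NetQ
    agent data, per the footnote above.
    """
    limit = BASE_LIMITS[model]
    return limit // 2 if dual_telemetry else limit

print(effective_limit("cluster", dual_telemetry=True))  # 50
```

For example, a 3-node cluster that would otherwise support 100 switches supports 50 when every switch reports through both telemetry paths.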

Server Arrangement

Single server: A standalone server is easier to set up, configure, and manage, but it limits your ability to scale your network monitoring and provides no redundancy in case of a hardware failure.

Cluster: The cluster deployment comprises three servers: one master node and two worker nodes. NetQ supports high availability through a virtual IP address; even if the master node fails, NetQ services remain operational.

Scale cluster: The scale cluster deployment is intended for large network environments and lets you expand NetQ monitoring capacity by adding nodes as your network grows. NVIDIA typically recommends this deployment for environments with 100 or more switches. It is the only deployment model that supports monitoring NVIDIA NVLink, NVIDIA Spectrum-X Ethernet, and mixed Ethernet and NVLink networks.

The following table shows high-level device support at each cluster size for Ethernet-only, NVLink-only, and combined deployments. This deployment model is currently in beta for clusters larger than 5 nodes. See Verified Limits for detailed testing information.

Nodes | Exclusively Ethernet     | Exclusively NVLink | Ethernet and NVLink combined
3     | 500 switches, 2K hosts   | 128 NVL            | 250 switches, 1K hosts, 64 NVL
4     | 750 switches, 3K hosts   | 160 NVL            | 375 switches, 1.5K hosts, 96 NVL
5     | 1000 switches, 4K hosts  | 192 NVL            | 500 switches, 2K hosts, 128 NVL
6     | 1250 switches, 5K hosts  | 224 NVL            | 625 switches, 2.5K hosts, 160 NVL
7     | 1500 switches, 6K hosts  | 256 NVL            | 750 switches, 3K hosts, 192 NVL
8     | 1750 switches, 7K hosts  | 288 NVL            | 875 switches, 3.5K hosts, 224 NVL
9     | 2000 switches, 8K hosts  | 320 NVL            | 1K switches, 4K hosts, 256 NVL
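The limits in the table grow linearly with node count, which is useful when estimating capacity for an intermediate cluster size. The sketch below is an observation about the published numbers, not an official NetQ sizing formula; always confirm against the table and Verified Limits.

```python
def ethernet_only(nodes: int) -> tuple[int, int]:
    """(switches, hosts) for an Ethernet-only scale cluster of 3-9 nodes."""
    return 250 * (nodes - 1), 1000 * (nodes - 1)

def nvlink_only(nodes: int) -> int:
    """NVL device count for an NVLink-only scale cluster of 3-9 nodes."""
    return 32 * (nodes + 1)

def combined(nodes: int) -> tuple[int, int, int]:
    """(switches, hosts, NVL) for a combined Ethernet + NVLink deployment.

    Ethernet capacity is half of the Ethernet-only figures; the NVL
    count adds 32 devices for each node beyond the first.
    """
    switches, hosts = ethernet_only(nodes)
    return switches // 2, hosts // 2, 32 * (nodes - 1)

print(ethernet_only(9))  # (2000, 8000)
print(combined(3))       # (250, 1000, 64)
```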

In both cluster deployments, the majority of nodes must be operational for NetQ to function. For example, a three-node cluster can tolerate a one-node failure, but not a two-node failure. Similarly, a five-node cluster can tolerate a two-node failure, but not a three-node failure. If the majority of failed nodes are Kubernetes control plane nodes, NetQ will no longer function. For more information, refer to the etcd documentation.
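The majority rule above is the standard etcd quorum calculation and can be expressed as a one-line formula (a plain illustration, not a NetQ tool):

```python
def tolerated_node_failures(cluster_size: int) -> int:
    """Node failures a cluster survives while a majority stays up
    (the etcd quorum rule: quorum = floor(n/2) + 1)."""
    return (cluster_size - 1) // 2

print(tolerated_node_failures(3))  # 1
print(tolerated_node_failures(5))  # 2
```

Note that adding a fourth node to a three-node cluster does not increase fault tolerance: both sizes survive only a single node failure.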

Verified Limits

The following values have been explicitly tested and validated, but they might not reflect the maximum theoretical system limits for NetQ.

6-node scale cluster: Ethernet + NVLink
• Verified features: Ethernet agent features (WJH, RoCE, histograms, adaptive routing, interfaces, inventory, BGP sessions, validations); switch OTLP data collection; DPU OTLP data collection; NVLink data collection (topology, partitions, metrics)
• Verified scale limit: 675 Ethernet switches (32K GPUs); 8K DPUs (OTLP data); NVLink: 450 GB with 72x1 configuration
• Data rate: NetQ agent ~7 Mbps; OTLP switch 445 MB/s (3.56 Gbps); OTLP host 1,000,000 samples/s at 10-second interval; NVLink ~32,000 messages/s (2,628 ports); counters: 112 per GB/s
• Hardware requirements: 6 nodes, each with 48 vCPUs, 512 GB RAM, 3 TB SSD/NVMe

6-node scale cluster: Ethernet + NVLink
• Verified features: Ethernet agent features (WJH, RoCE, histograms, adaptive routing, interfaces, inventory, BGP sessions, validations); switch OTLP data collection; DPU OTLP data collection
• Verified scale limit: 1,300 Ethernet switches (55K GPUs); 14K DPUs (OTLP data)
• Data rate: NetQ agent ~7 Mbps; OTLP switch 445 MB/s (3.56 Gbps); OTLP host 1,718,750 samples/s at 10-second interval
• Hardware requirements: 6 nodes, each with 48 vCPUs, 512 GB RAM, 3 TB SSD/NVMe

5-node scale cluster: Ethernet + NVLink (Ethernet agent only)
• Verified features: Ethernet agent features (WJH, RoCE, histograms, adaptive routing, interfaces, inventory, BGP sessions, validations)
• Verified scale limit: 1,300 Ethernet switches (55K GPUs)
• Data rate: NetQ agent ~14 Mbps
• Hardware requirements: 5 nodes, each with 48 vCPUs, 512 GB RAM, 3 TB SSD/NVMe

3-node scale cluster: Ethernet + NVLink
• Verified features: Ethernet agent features (WJH, RoCE, histograms, adaptive routing, interfaces, inventory, BGP sessions, validations); switch OTLP data collection; DPU OTLP data collection; NVLink data collection (topology, partitions, metrics)
• Verified scale limit: 250 Ethernet switches (8K GPUs); 1K DPUs (OTLP data); NVLink: 100 GB with 72x1 configuration
• Data rate: NetQ agent 2.5 Mbps; OTLP switch 165 MB/s (1.32 Gbps); OTLP host 250,000 samples/s at 10-second interval; NVLink ~9,200 messages/s (2,628 ports); counters: 112 per GB/s
• Hardware requirements: 3 nodes, each with 48 vCPUs, 512 GB RAM, 3 TB SSD/NVMe

3-node scale cluster: Ethernet-only
• Verified features: Ethernet agent features (WJH, RoCE, histograms, adaptive routing, interfaces, inventory, BGP sessions, validations); Ethernet OTLP data collection
• Verified scale limit: 500 Ethernet switches (16K GPUs); 2K DPUs (OTLP data)
• Data rate: NetQ agent 5 Mbps; OTLP switch 330 MB/s (2.64 Gbps); OTLP host 500,000 samples/s at 10-second interval
• Hardware requirements: 3 nodes, each with 48 vCPUs, 512 GB RAM, 3 TB SSD/NVMe

3-node scale cluster: NVLink-only
• Verified features: NVLink data collection (topology, partitions, metrics)
• Verified scale limit: NVLink: 110 GB with 72x1 configuration; partitions: 1,600
• Data rate: NVLink ~10,000 messages/s (2,628 ports); counters: 112 per GB/s
• Hardware requirements: 3 nodes, each with 48 vCPUs, 512 GB RAM, 3 TB SSD/NVMe

5-node scale cluster: Ethernet-only
• Verified features: Ethernet agent features (WJH, RoCE, histograms, adaptive routing, interfaces, inventory, BGP sessions, validations); Ethernet OTLP data collection
• Verified scale limit: 1,000 Ethernet switches (32K GPUs); 4K DPUs (OTLP data)
• Data rate: NetQ agent 10 Mbps; OTLP switch 660 MB/s (5.28 Gbps); OTLP host 1,000,000 samples/s at 10-second interval
• Hardware requirements: 5 nodes, each with 48 vCPUs, 512 GB RAM, 3 TB SSD/NVMe

3-node cluster (non-scale): Ethernet-only
• Verified features: Ethernet agent features (WJH, RoCE, histograms, adaptive routing, interfaces, inventory, BGP sessions, validations); Ethernet OTLP data collection
• Verified scale limit: 50 Ethernet switches (1.6K GPUs)
• Data rate: NetQ agent 500 Kbps; OTLP switch 33 MB/s (264 Mbps); OTLP host 50,000 samples/s at 10-second interval
• Hardware requirements: 3 nodes, each with 16 vCPUs, 64 GB RAM, 500 GB SSD/NVMe

Large networks can generate a large amount of data. For large networks, NVIDIA does not recommend using the NetQ CLI; additionally, tabular data in the UI is limited to 10,000 rows. If you need to review a large amount of data, NVIDIA recommends exporting the tabular data as a CSV or JSON file and analyzing it in a spreadsheet program.
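If you prefer scripting over a spreadsheet, an exported CSV can be summarized with a few lines of standard-library Python. The column names and values below are hypothetical placeholders; real NetQ exports vary by which view you download.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded NetQ CSV export.
# In practice you would open the exported file instead of this buffer.
sample = io.StringIO(
    "hostname,state\n"
    "leaf01,up\n"
    "leaf02,down\n"
    "spine01,up\n"
)

# Count devices per state, like a spreadsheet pivot table would.
counts: dict[str, int] = {}
for row in csv.DictReader(sample):
    counts[row["state"]] = counts.get(row["state"], 0) + 1

print(counts)  # {'up': 2, 'down': 1}
```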

Base Command Manager

NetQ is also available through NVIDIA’s cluster management software, Base Command Manager. Refer to the Base Command Manager administrator and containerization manuals for instructions on how to launch and configure NetQ using Base Command Manager.

Next Steps

After you’ve decided on your deployment type, you’re ready to install NetQ.