NVIDIA NVLink SGXLS10 Switch Systems User Manual

Software Management

The NVLink switch systems come with an embedded management CPU card that runs the NVOS management software. The NVOS documentation will be published at https://docs.nvidia.com/networking/category/nvos upon the official release.

Fabric Manager Overview

NVIDIA Fabric Manager (FM) configures the NVSwitch memory fabrics to form a single memory fabric among all participating GPUs, and monitors the NVLinks that support the fabric. The Fabric Manager has the following main responsibilities:

1. Coordinate with the NVSwitch driver to initialize and train NVSwitch-to-NVSwitch NVLink interconnects.
2. Coordinate with the GPU driver to initialize and train NVSwitch-to-GPU NVLink interconnects.
3. Configure routing among NVSwitch ports.
4. Monitor the fabric for NVLink and NVSwitch errors.
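
The four responsibilities listed above can be pictured with a small toy model. This is only a conceptual sketch, not the Fabric Manager API: the names ToyFabric, Link, LinkState, bring_up, and monitor are hypothetical, device names such as "sw0" and "gpu0" are made up, and link training is reduced to flipping a state flag. It assumes that switch-to-switch links are trained before switch-to-GPU links, following the order of the list.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class LinkState(Enum):
    DOWN = "down"
    TRAINED = "trained"
    ERROR = "error"


@dataclass
class Link:
    # Hypothetical model of one NVLink; endpoints are plain device names.
    a: str
    b: str
    state: LinkState = LinkState.DOWN


def is_switch(name: str) -> bool:
    # Assumption for this sketch: switch names start with "sw".
    return name.startswith("sw")


@dataclass
class ToyFabric:
    links: List[Link]
    routes: Dict[str, List[str]] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def _train(self, step: str, selects: Callable[[Link], bool]) -> None:
        # "Training" here just marks the selected links as usable.
        for link in self.links:
            if selects(link):
                link.state = LinkState.TRAINED
        self.log.append(step)

    def bring_up(self) -> None:
        # 1. Train switch-to-switch links.
        self._train("train switch-switch",
                    lambda l: is_switch(l.a) and is_switch(l.b))
        # 2. Train switch-to-GPU links (exactly one endpoint is a switch).
        self._train("train switch-gpu",
                    lambda l: is_switch(l.a) != is_switch(l.b))
        # 3. Configure routing: each switch records the peers it can reach
        #    over trained links.
        for link in self.links:
            if link.state is LinkState.TRAINED:
                for src, dst in ((link.a, link.b), (link.b, link.a)):
                    if is_switch(src):
                        self.routes.setdefault(src, []).append(dst)
        self.log.append("configure routing")

    def monitor(self) -> List[Link]:
        # 4. Report every link currently in an error state.
        self.log.append("monitor")
        return [l for l in self.links if l.state is LinkState.ERROR]


fabric = ToyFabric(links=[Link("sw0", "sw1"),
                          Link("sw0", "gpu0"),
                          Link("sw1", "gpu1")])
fabric.bring_up()
faulty = fabric.monitor()
```

In this model, bring_up() leaves every link trained and gives each switch a list of reachable peers, while monitor() returns an empty list until some link is put into the ERROR state. The real Fabric Manager performs these steps through the NVSwitch and GPU drivers, as described above.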

Please refer to the Fabric Manager for NVIDIA NVSwitch Systems User Guide for an overview of the Fabric Manager features. That guide is intended for system administrators and individual users of NVSwitch-based server systems.

NetQ Overview

NetQ is a highly scalable, modern network operations tool set that provides visibility into, and troubleshooting of, overlay and underlay networks in real time. NetQ delivers actionable insights and operational intelligence about the data center's health, from the container, virtual machine, or host all the way to the switch and port. NetQ correlates configuration and operational status, and instantly identifies and tracks state changes, while simplifying management for the entire Linux-based data center. With NetQ, network operations change from a manual, reactive, node-by-node approach to an automated, informed, and agile one.

NetQ performs three primary functions:

1. Data collection: real-time and historical telemetry and network state information
2. Data analytics: deep processing of the data
3. Data visualization: rich graphical user interface (GUI) for actionable insights

NetQ is available as an on-site or an in-cloud deployment.

NetQ delivers significant operational improvements to network management and maintenance processes. It reduces the complexity of the data center network by providing real-time visibility into hardware and software status, and it eliminates the guesswork in investigating issues by analyzing and presenting detailed, focused data.

For further information, please refer to the NVIDIA NetQ 4.3 User Guide.

© Copyright 2023, NVIDIA. Last updated on Dec 13, 2023.