Definitions/Abbreviation
Definitions / Abbreviation | Description |
NMX | Product suite for HPC cluster network monitoring and management. It comprises NMX Telemetry, NMX Manager, NMX Controller and NMX Oasis. |
NMX Telemetry (NMX-T) | NMX subsystem: an integrated solution built from multiple services, responsible for the collection, aggregation and transmission of telemetry data from the various devices, applications and platforms that make up the data center. |
NMX Manager Connector | Part of NMX Telemetry, responsible for authentication, connection handling to the Southbound Gateway, and telemetry streaming. |
NMX Controller (NMX-C) | NMX subsystem: a control plane entity responsible for the configuration, monitoring and control of the various systems, mainly network devices, that make up the data center. |
NMX Manager (NMX-M) | NMX subsystem that collects telemetry data from NMX Telemetry and can aggregate it, analyze it, and run ML models for inference and pattern detection. NMX Manager can control the behavior of the HPC cluster by changing the configuration of network or compute entities through NMX Controller. |
NMX Manager Gateway | Part of NMX Manager, responsible for accepting connections from NMX Manager Connectors in multiple locations and for handling authorization and telemetry streams. |
NMX Inference Engine | An AI pipeline that receives streaming telemetry from a Kafka topic, runs AI models and returns predictions / actions. |
NMX Controller Engine | Part of NMX Manager. Receives abstract actions and translates them into domain-specific actions, mainly targeted at NMX Controller, to update network configurations. |
NMX Oasis Connector | Part of the NMX Manager SaaS. It connects to the NMX Oasis data lake, complying with the data lake's communication security requirements, and streams telemetry and log data from NMX Manager to the data lake. |
NMX Oasis | NMX subsystem: a data lake solution that lives in one or more clouds. Its components are API gateways, ETL processes, compute clusters, analysis models and informative dashboards. |
Telemetry Agent | An entity that provides telemetry data to a collector using pull or push methods. An agent is located on a target device, such as a network device, a host or a sensor controller. The agent can provide telemetry over various protocols, such as gNMI, HTTP REST and IB MAD. Agents can also support traps as a means of event notification. |
Telemetry Collector | An entity that connects to, or accepts connections from, an agent to collect telemetry data. A collector uses a single protocol to connect to an agent, so collecting telemetry from agents that use different protocols requires multiple collectors. Collected data can be metrics, events and logs. |
Telemetry Aggregator | An entity that collects telemetry data from various collectors or other aggregators, runs data filtering, transformation and aggregation, and provides a means of storing the data (temporarily or long term). Because an aggregator can collect from other aggregators, aggregators can form a tree structure in which one aggregator connects to multiple aggregators to collect and transform data from various sources. |
Circuit Management | The process of connecting a client to a server, monitoring the operational state of the connection, detecting a failed connection and managing connection re-establishment. |
NVOS | NVIDIA Networking OS, formerly known as MLNX-OS. NVOS is used as the switch OS for L1 NVSwitch Trays and L2 NVSwitches. |
gRPC | An open-source Remote Procedure Call (RPC) framework, originally developed by Google, that can run in any environment. |
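
The Circuit Management entry above describes four steps: connect, monitor, detect failure, and re-establish. The following is a minimal sketch of how those steps could fit together on the client side. It is an illustration only and not the NMX implementation: the `Circuit` and `CircuitState` names, the `connect` and `is_alive` callbacks, and the exponential backoff policy are all assumptions introduced here.

```python
import time
from enum import Enum
from typing import Callable


class CircuitState(Enum):
    DOWN = "down"
    UP = "up"


class Circuit:
    """Hypothetical client-side circuit: connect, monitor, detect failure, reconnect."""

    def __init__(
        self,
        connect: Callable[[], None],      # assumed to raise ConnectionError on failure
        is_alive: Callable[[], bool],     # assumed health probe for the live connection
        max_attempts: int = 5,
        base_delay: float = 0.5,
        max_delay: float = 30.0,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._connect = connect
        self._is_alive = is_alive
        self._max_attempts = max_attempts
        self._base_delay = base_delay
        self._max_delay = max_delay
        self._sleep = sleep
        self.state = CircuitState.DOWN

    def establish(self) -> bool:
        """Try to connect, backing off exponentially between failed attempts."""
        for attempt in range(self._max_attempts):
            try:
                self._connect()
            except ConnectionError:
                self.state = CircuitState.DOWN
                # Wait before the next attempt, but not after the final one.
                if attempt < self._max_attempts - 1:
                    delay = min(self._base_delay * 2 ** attempt, self._max_delay)
                    self._sleep(delay)
                continue
            self.state = CircuitState.UP
            return True
        return False

    def check(self) -> bool:
        """Run one monitoring step; re-establish the circuit if it has failed."""
        if self.state is CircuitState.UP and self._is_alive():
            return True
        self.state = CircuitState.DOWN
        return self.establish()
```

A caller would typically invoke `establish()` once at startup and then call `check()` periodically. Passing `sleep` as a parameter is a design choice made for this sketch so that the backoff delays can be observed or skipped in tests.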