NVIDIA NVOS User Manual for InfiniBand Switches v25.02.7002

gNMI Streaming

The gRPC Network Management Interface (gNMI) can collect system resource, interface, and counter information from NVOS and export it to your gNMI client.

You can view and set the gNMI server state in NVOS with the following NVUE CLI commands:

Show command:

nvos@switch:~$ nv show system gnmi-server
             operational    applied
-----------  -------------  -----------
state        enabled        enabled
certificate  self-signed    self-signed
is-running   yes
version      4.13.0-3000-2

Set command:

nvos@switch:~$ nv set system gnmi-server state <enabled | disabled>

Unset command:

nvos@switch:~$ nv unset system gnmi-server state

The gNMI server is enabled by default. The unset command restores the state to enabled.

NVOS supports the following gNMI subscription modes:

  • STREAM Mode: In this mode, the server streams updates to the client for as long as the subscription is open, either when a value changes or at a set interval, depending on the stream mode. This mode is suitable for scenarios where you need ongoing, near-real-time updates.

  • ONCE Mode: This mode retrieves the data once and then terminates the subscription. It's ideal for scenarios where a single snapshot of the data is needed without ongoing updates.

  • POLL Mode: In this mode, the subscription stays open and the server sends the current data each time the client sends a poll request. This lets the client decide when to fetch data, providing a balance between real-time and scheduled updates.

Supported stream modes:

  • ON_CHANGE: Updates are sent only when the value of the data item changes.

  • SAMPLE: The client receives samples of telemetry data at a specified interval. This mode suits scenarios where continuous streaming is not necessary but periodic updates are required for monitoring and analytics.

Key Parameters for STREAM SAMPLE Mode:

  • sample_interval (mandatory): Defines the interval at which samples are sent to the client. This parameter controls the frequency of data transmission.

  • suppress_redundant (optional, default false): Determines whether redundant data updates, which have not changed since the last sample, should be suppressed. This helps in reducing unnecessary data transmission and optimizing network usage.

  • heartbeat_interval (optional, default disabled): Specifies the interval for sending heartbeat messages to indicate that the connection is still active. Heartbeats help in monitoring the health of the connection and detecting failures.
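
The interaction of these three parameters can be sketched with a small simulation. This is a minimal sketch that follows the gNMI specification's description of SAMPLE subscriptions (with suppress_redundant set, an unchanged value is skipped unless a heartbeat interval has elapsed since the last update); it is not taken from NVOS, and the exact NVOS behavior may differ. The function name, the integer-second time steps, and the `counter` data source are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def simulate_sample_stream(
    read_value: Callable[[int], int],
    duration_s: int,
    sample_interval_s: int,
    suppress_redundant: bool = False,
    heartbeat_interval_s: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Return the (time, value) updates a SAMPLE subscription would emit."""
    updates: List[Tuple[int, int]] = []
    last_sent_value: Optional[int] = None
    last_sent_time: Optional[int] = None

    # One sampling opportunity per sample interval.
    for now in range(0, duration_s + 1, sample_interval_s):
        value = read_value(now)
        changed = last_sent_value is None or value != last_sent_value
        heartbeat_due = (
            heartbeat_interval_s is not None
            and last_sent_time is not None
            and now - last_sent_time >= heartbeat_interval_s
        )
        # Without suppression every sample is sent; with it, only changes
        # and heartbeat-driven resends go out.
        if not suppress_redundant or changed or heartbeat_due:
            updates.append((now, value))
            last_sent_value, last_sent_time = value, now
    return updates


def counter(t: int) -> int:
    """Hypothetical counter that changes once, at t = 90 s."""
    return 0 if t < 90 else 5


plain = simulate_sample_stream(counter, 240, 30)
suppressed = simulate_sample_stream(counter, 240, 30, suppress_redundant=True)
with_heartbeat = simulate_sample_stream(
    counter, 240, 30, suppress_redundant=True, heartbeat_interval_s=120
)

print(len(plain), suppressed, with_heartbeat)
```

In this model, suppression reduces nine samples to two updates, and the heartbeat adds one more update for the unchanged value 120 seconds after the last one was sent.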

Models Overview

The NVOS gNMI Model is based on OpenConfig YANG models, extended with NVIDIA-specific augments where required.

It provides a consistent, vendor-neutral telemetry structure while allowing NVIDIA to expose additional InfiniBand, platform, and diagnostic data.

The gNMI YANG models consist of:

  • Standard OpenConfig models (baseline support)

  • NVIDIA Models (NVOS-specific enrichment)

  • Legacy NVIDIA models retained for backward compatibility

OpenConfig Supported Models

Model

Supported Data

openconfig-interfaces

Base interface configuration, state, and counters: Name, Description, AdminStatus, OperStatus, Enabled, IfIndex, LoopbackMode, and base interface counters (InPkts, OutPkts, InOctets, OutOctets, InUnicastPkts, OutUnicastPkts, InMulticastPkts, OutMulticastPkts, InBroadcastPkts, OutBroadcastPkts, InDiscards, OutDiscards, InErrors, OutErrors), plus InfiniBand-specific interface state (IBSpeed, Speed, IBSubnet, LogicalPortState, PhysicalPortState, MaintenanceState, MTU, MaxSupportedMTUs, SupportedIBSpeeds, SupportedWidths, VLCapabilities, OperationalVL) and InfiniBand port counters (SymbolErrorCounter, XmitWait, RcvErrors, RcvRemotePhyErrors, RcvSwitchRelayErrors, LocalLinkIntegrityErrors, ExcessiveBufferOverrun, LinkErrorRecovery, LinkDowned, QP1Dropped, VL15Dropped and related IB statistics).

openconfig-system

System identity, software, and resource usage: Hostname, BootTime, SoftwareVersion, Location, Contact, RoutingMAC, CPU utilization (aggregate Total/Average), and system memory usage (Physical, Used).

openconfig-platform

Chassis, ASIC, PSU, fan, storage, and other hardware inventory: Component Name, Type, Description, ModelName, PartNo, SerialNo, FirmwareVersion, OperStatus, Temperature, plus component-specific data for fans (Speed, Status), PSUs (Enabled, InputVoltage, InputCurrent, OutputVoltage, OutputCurrent, OutputPower, Status), ASICs (Name, Temperature), chassis/switch (SerialNo, ModelName, PartNo, OperStatus), storage (TotalSize), and platform health (Health Status, LastUnhealthy, UnhealthyCount).

openconfig-platform-transceiver

Optical transceiver module and channel monitoring: module presence and identity (Present, FormFactor, VendorPart, SerialNo), electrical and thermal telemetry (SupplyVoltage, LaserTemperature, module temperature thresholds – Lower, Upper), per-channel optical DOM data (InputPower, OutputPower, LaserBiasCurrent), and per-channel / host-lane status flags (RxCDRLoL, RxLOS, TxCDRLoL, TxLOS, TxFault, TxAdEqFault) and module temperature / voltage alarm flags.

openconfig-platform-healthz

Component health status and history: Status, LastUnhealthy, UnhealthyCount.


NVIDIA Models

These models extend OpenConfig to expose NVIDIA-specific telemetry that is not covered by the base OpenConfig schemas.

Model

Supported Data

nvidia-interfaces-infiniband

InfiniBand-specific interface configuration and state: IBSpeed, Speed, IBSubnet, LogicalPortState, PhysicalPortState, MaintenanceState, MTU, MaxSupportedMTUs, SupportedIBSpeeds, SupportedWidths, VLCapabilities, OperationalVL, SpeedNegotiate and related InfiniBand admin/oper fields.

nvidia-interfaces-infiniband-errors-ext

InfiniBand-specific error and status counters: ExcessiveBufferOverrun, LinkErrorRecovery, LinkDowned, LocalLinkIntegrityErrors, RcvErrors, RcvRemotePhyErrors, RcvSwitchRelayErrors, QP1Dropped, VL15Dropped and similar InfiniBand-specific port error counters.

nvidia-system-augments

NVIDIA-specific system metadata: system Location and Contact, plus other NVIDIA system-level extensions modeled as augments to the openconfig-system tree (superseding the legacy platform-general location/contact fields).

nvidia-system-events

Structured system event reporting: EventId, TypeId, Text, Resource, Severity, TimeCreated.

nvidia-if-phy-augments

Enhanced physical-layer diagnostics and BER/FEC telemetry: general PHY and BER state (TimeSinceLastClear, EffectiveErrors, ReceivedBits, SymbolErrors, RawBER, EffectiveBER, SymbolBER, ProfileFECInUse, ZeroHist), per-lane BER and error counters (per-channel RawBER and RawErrors), RS histogram bins (RSCorrectedError counters), link-down statistics (TotalEvents, IntentionalEvents, UnintentionalEvents) and reasons (Local/Remote reason code and status), recovery statistics (LastLogicRecoveryAttempts, LastSerdesEqRecoveryAttempts, TimeBetweenLastTwoRecoveries, TimeInLastLogicRecoveryEvent, TimeInLastSerdesEqRecoveryEvent, TimeSinceLastRecovery, TotalSuccessfulRecoveryEvents), PLR metrics (PLR_BW_LossPercent, PLR_CodesLoss, PLR_RcvCodes, PLR_RcvCodeErr, PLR_RcvUncorrectableCode, PLR_SyncEvents, PLR_XmitCodes, PLR_XmitRetryCodes, PLR_XmitRetryEvents, PLR_XmitRetryEventsWithinTsecMax), and InfiniBand port error and port statistic counters (PortBufferOverrunErrors, PortDLIDMappingErrors, PortInactiveDiscards, PortLocalPhysicalErrors, PortLoopingErrors, PortMalformedPacketErrors, PortNeighborMTUDiscards, PortVLMappingErrors, PortRcvData, PortRcvPkts, PortUnicastRcvPkts, PortUnicastXmitPkts, PortMulticastRcvPkts, PortMulticastXmitPkts, PortXmitData, PortXmitPkts, RQGeneralError, SyncHeaderErrorCounter).

nvidia-platform-integrated-circuit-augments

ASIC power telemetry over standard integrated-circuit model: LongTermAvgPower, ShortTermAvgPower (average power values per monitoring interval on ASIC integrated-circuit power).

nvidia-platform-storage-augments

Switch-local storage utilization: TotalSize for the logical switch storage device.

nvidia-platform-transceiver-augments

Transceiver firmware and alarm model: DataPathFirmwareFault, ModuleFirmwareFault, ModuleErrorType and generic alarm state (AlarmStatus, AlarmSeverity, AlarmThreshold) for module temperature and supply voltage, and for channel InputPower, OutputPower and LaserBiasCurrent (replacing legacy module/channel-specific alarm flags).


Legacy NVIDIA Models

NVOS exposes a set of legacy NVIDIA YANG models for backward compatibility.

These models exist only to support deprecated gNMI xpaths. All of their data is available through the models above, and they are planned for removal in a future NVOS release.

Model

Supported Data (legacy)

nvidia-platform-general-ext

Legacy platform-wide system and resource information: Contact, Location, NOSVersion, PlatformName, MemoryTotalSize, MemoryUsed, DiskTotalSize, DiskUsed, AmbientTemperature and LeakSensor Id/State.

nvidia-platform-general-ext-versions

Legacy system component firmware inventory: FWVersionBIOS, FWVersionBMC, FWVersionFPGA, FWVersionEROT and FWVersionCPLD / FWVersionSMA entries (per-id version and id).

nvidia-platform-asic

Legacy ASIC-specific telemetry model: ASICName, ASICTemp, LongTermAvgPower, ShortTermAvgPower.

nvidia-if-phy-diag

Legacy PHY diagnostic model: CableProtoCapExt, CoreToPhyLinkProtoEnabled, CoreToPhyLinkWidthEnabled, ETH-AN/IB-PHY/PD/PHY-HST/PHY-Manager FSM and link mode fields, LoopbackMode, FECModeRequest, ProfileFECInUse, EffectiveBER, RawBER, SymbolBER, EffectiveErrors, PhyReceivedBits, SymbolErrors, RS histogram bins (RS_Num_Corr_Err_Bin0–Bin15), PLR_* metrics, InfiniBand port-errors and port-statistics counters, link-down and recovery metrics (LinkDown, IntentionalLinkDownEvents, UnintentionalLinkDownEvents, LinkDownReasonCode/Status Local/Remote, TimeSinceLastClear, TimeBetweenLastTwoRecoveries, TimeInLastLogic/SerdesEqRecoveryEvent, TimeSinceLastRecovery, TotalSuccessfulRecoveryEvents, ZeroHist), and related PHY diagnostic leaves.

nvidia-platform-transceiver-diag

Legacy transceiver diagnostics model: ModuleOperStatus, DataPathFirmwareFault, ModuleFirmwareFault, ModuleErrorType, module TemperatureHigh/Low Alarm and Warning flags, VccHigh/Low Alarm and Warning flags, and channel-level flags for TxAdEqFault, TxFault, TxCDRLoL, TxLOS, RxCDRLoL, RxLOS.


YANG Model Availability

The YANG models above are available on the NVIDIA Enterprise Support Portal → Downloads → Switches and Gateways → Switch Software → QM-3 NVOS InfiniBand → More files.

NVOS YANG Package Structure

The NVOS YANG package is provided as a tar archive with the following structure:

models/
  ietf                       IETF standard base YANG models
  openconfig                 OpenConfig models with NVIDIA Model augments
  nvos                       NVOS-specific OpenConfig augments kept for legacy backward compatibility
  not-supported              Deviation modules that mark non-supported leaves and nodes in the models above
gnmi-supported-paths.html    Reference list of all gNMI-supported paths in this release


The gNMI service enforces limitations on the number of active and incoming gRPC connections to ensure system stability and optimal resource usage.

  • Maximum Established Connections:

    The gNMI server supports a maximum of 10 concurrently established gRPC connections at any given time. Once this limit is reached, new connection attempts will be rejected until at least one of the existing connections is terminated.

  • Source IP–Based Rate Limiting:

    The gNMI server allows up to 10 concurrent TCP connections from the same source IP address. If additional connection requests arrive from that IP address while the limit is reached, they are dropped automatically. New connections are accepted only once the number of active TCP sessions from that IP address drops below the limit.

  • To enhance the security of gNMI communications, it is strongly recommended to implement mutual TLS (mTLS) authentication together with SPIFFE (Secure Production Identity Framework For Everyone):

    • Mutual TLS (mTLS): Ensures that both client and server authenticate each other using trusted X.509 certificates, thereby preventing unauthorized access and man‑in‑the‑middle attacks.

    • SPIFFE Integration: Leverages SPIFFE IDs to provide consistent, identity-based authentication and authorization across services. This minimizes dependence on static credentials and simplifies certificate management.
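
The two connection limits listed above can be modeled as a simple admission check, which is useful when planning how many collectors may connect at once. The sketch below is only an illustration: the `ConnectionLimiter` class is hypothetical, it treats one TCP connection as one gRPC connection, and it says nothing about how the NVOS server implements these limits internally.

```python
from collections import Counter


class ConnectionLimiter:
    """Toy model of a global limit plus a per-source-IP limit."""

    def __init__(self, max_total: int = 10, max_per_ip: int = 10) -> None:
        self.max_total = max_total
        self.max_per_ip = max_per_ip
        self.per_ip: Counter = Counter()

    @property
    def total(self) -> int:
        return sum(self.per_ip.values())

    def try_accept(self, source_ip: str) -> bool:
        """Admit a connection only if both limits still have room."""
        if self.per_ip[source_ip] >= self.max_per_ip:
            return False  # per-source-IP limit reached
        if self.total >= self.max_total:
            return False  # global limit reached
        self.per_ip[source_ip] += 1
        return True

    def close(self, source_ip: str) -> None:
        """Release one connection slot held by source_ip, if any."""
        if self.per_ip[source_ip] > 0:
            self.per_ip[source_ip] -= 1


limiter = ConnectionLimiter()
first_ten = [limiter.try_accept("10.0.0.1") for _ in range(10)]
eleventh_same_ip = limiter.try_accept("10.0.0.1")
other_ip_while_full = limiter.try_accept("10.0.0.2")
limiter.close("10.0.0.1")
other_ip_after_close = limiter.try_accept("10.0.0.2")

print(all(first_ten), eleventh_same_ip, other_ip_while_full, other_ip_after_close)
```

In this model, a single busy collector host can use up every slot, so a second host is admitted only after one of the existing connections closes.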

A gNMI client on a host can request capabilities and data from the switch. The examples below use the gNMIc client.

The following example shows a gNMIc STREAM SAMPLE mode request for the data of a specific interface, with a sample interval of 30 seconds, the suppress-redundant flag enabled, and a heartbeat interval of 120 seconds:

gnmic -a "IP" --port 9339 --skip-verify subscribe --prefix "interfaces"  --path "/interface[name=sw1p1]"  --target nvos -u admin -p ***** --mode stream --stream-mode sample --sample-interval 30s --suppress-redundant --heartbeat-interval 120s
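
When such subscriptions are launched from a script, the command line can be assembled programmatically. The following is a minimal sketch that uses only the gnmic flags shown in this section; the `build_subscribe_args` helper and the `PASSWORD` placeholder are hypothetical. It only builds and prints the argument list, which could then be passed to `subprocess.run`.

```python
import shlex
from typing import List, Optional


def build_subscribe_args(
    address: str,
    prefix: str,
    path: str,
    username: str,
    password: str,
    mode: str = "stream",
    stream_mode: Optional[str] = None,
    sample_interval: Optional[str] = None,
    suppress_redundant: bool = False,
    heartbeat_interval: Optional[str] = None,
    port: int = 9339,
    target: str = "nvos",
) -> List[str]:
    """Build a gnmic subscribe argument list from the flags used above."""
    args = [
        "gnmic", "-a", address, "--port", str(port), "--skip-verify",
        "subscribe", "--prefix", prefix, "--path", path,
        "--target", target, "-u", username, "-p", password,
        "--mode", mode,
    ]
    if mode == "stream" and stream_mode:
        args += ["--stream-mode", stream_mode]
        if stream_mode == "sample":
            # sample_interval is listed as mandatory for STREAM SAMPLE mode.
            if not sample_interval:
                raise ValueError("sample_interval is mandatory for STREAM SAMPLE mode")
            args += ["--sample-interval", sample_interval]
            if suppress_redundant:
                args.append("--suppress-redundant")
            if heartbeat_interval:
                args += ["--heartbeat-interval", heartbeat_interval]
    return args


args = build_subscribe_args(
    "IP", "interfaces", "/interface[name=sw1p1]", "admin", "PASSWORD",
    stream_mode="sample", sample_interval="30s",
    suppress_redundant=True, heartbeat_interval="120s",
)
print(shlex.join(args))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with the square brackets in gNMI paths.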

The following example shows a gNMIc STREAM ON_CHANGE mode request for system events, with the updates-only flag enabled:

gnmic -a "IP" --port 9339 --skip-verify subscribe --prefix "/system-events"  --path "" --target nvos -u admin -p ***** --mode stream --stream-mode on-change --updates-only

The following example shows a gNMIc ONCE mode request and server response for IB interface MTU (-d for debug mode):

gnmic -a "IP" --port 9339 --skip-verify subscribe --prefix "interfaces"  --path "/interface[name=sw1p1]/infiniband/state/mtu" -d --target nvos -u admin -p ***** --mode once
{
  "source": "IP",
  "subscription-name": "default-1709707931",
  "timestamp": 1709707925858795109,
  "time": "2024-03-06T08:52:05.858795109+02:00",
  "prefix": "interfaces/interface[name=sw1p1]",
  "target": "nvos",
  "updates": [
    {
      "Path": "infiniband/state/mtu",
      "values": {
        "infiniband/state/mtu": 256
      }
    }
  ]
}
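
A script that consumes this JSON output usually needs the full path of each value, which means joining the message prefix with the relative path of each update. The sketch below does this for the response shown above; the `flatten_updates` helper is hypothetical, and the code assumes each message is a single JSON object with the same keys as in this example.

```python
import json
from typing import Any, Dict

# The response from the ONCE example above, copied verbatim.
RESPONSE = """
{
  "source": "IP",
  "subscription-name": "default-1709707931",
  "timestamp": 1709707925858795109,
  "time": "2024-03-06T08:52:05.858795109+02:00",
  "prefix": "interfaces/interface[name=sw1p1]",
  "target": "nvos",
  "updates": [
    {
      "Path": "infiniband/state/mtu",
      "values": {
        "infiniband/state/mtu": 256
      }
    }
  ]
}
"""


def flatten_updates(message: Dict[str, Any]) -> Dict[str, Any]:
    """Map each full path (prefix + relative path) to its value."""
    prefix = message.get("prefix", "").rstrip("/")
    flat: Dict[str, Any] = {}
    for update in message.get("updates", []):
        for rel_path, value in update.get("values", {}).items():
            rel_path = rel_path.lstrip("/")
            full_path = f"{prefix}/{rel_path}" if prefix else rel_path
            flat[full_path] = value
    return flat


result = flatten_updates(json.loads(RESPONSE))
print(result)
```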

The following example shows a gNMIc ONCE request for all supported paths:

gnmic -a "IP" --port 9339 --skip-verify subscribe --prefix "/"  --path "" --target nvos -u admin -p ***** --mode once

The following example shows a gNMIc POLL mode request and server response for FAN1/1 speed:

gnmic -a "IP" --port 9339 --skip-verify subscribe --prefix "components" --path "component[name=FAN1/1]/fan/state/speed" --target nvos -u admin -p ***** --format flat --mode poll
components/component[name=FAN1/1]/fan/state/speed: 33
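
Flat output is convenient to parse line by line, but keys such as FAN1/1 contain a slash, so a path cannot be split on "/" alone. The following minimal sketch separates path from value and splits the path into elements while leaving bracketed keys intact. Both helper names are hypothetical, and the code assumes each flat line has the "path: value" shape shown above.

```python
from typing import List, Tuple


def parse_flat_line(line: str) -> Tuple[str, str]:
    """Split one flat-format line into (path, value) at the first ': '."""
    path, sep, value = line.partition(": ")
    if not sep:
        raise ValueError(f"not a 'path: value' line: {line!r}")
    return path.strip(), value.strip()


def split_path(path: str) -> List[str]:
    """Split a gNMI path on '/', ignoring slashes inside [key=value]."""
    elems: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in path:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "/" and depth == 0:
            # Element boundary: flush what has been collected so far.
            if current:
                elems.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        elems.append("".join(current))
    return elems


path, value = parse_flat_line("components/component[name=FAN1/1]/fan/state/speed: 33")
elems = split_path(path)
print(elems, value)
```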

The following example shows a gNMIc STREAM mode request and server response for the "text" leaf of a specific system event, with PROTO encoding:

gnmic -a "IP" --port 9339 --skip-verify subscribe --prefix "system-events" --path "system-event[event-id=38]/state/text" --target nvos -u admin -p ***** --encoding proto --format prototext --mode stream
sync_response: true
update: {
  timestamp: 1719295967820127958
  prefix: {
    elem: {
      name: "system-events"
    }
    elem: {
      name: "system-event"
      key: {
        key: "event-id"
        value: "38"
      }
    }
    target: "nvos"
  }
  update: {
    path: {
      elem: {
        name: "state"
      }
      elem: {
        name: "text"
      }
    }
    val: {
      string_val: "Interface admin state is up"
    }
  }
}

A list of supported events can be found in the Event Management page.

The following example shows a grpcurl command that describes the server using the gRPC reflection service:

docker run fullstorydev/grpcurl -H username:admin -H password:***** -insecure "IP":9339 describe
gnmi.gNMI is a service:
service gNMI {
  rpc Capabilities ( .gnmi.CapabilityRequest ) returns ( .gnmi.CapabilityResponse );
  rpc Get ( .gnmi.GetRequest ) returns ( .gnmi.GetResponse );
  rpc Set ( .gnmi.SetRequest ) returns ( .gnmi.SetResponse );
  rpc Subscribe ( stream .gnmi.SubscribeRequest ) returns ( stream .gnmi.SubscribeResponse );
}
grpc.reflection.v1.ServerReflection is a service:
service ServerReflection {
  rpc ServerReflectionInfo ( stream .grpc.reflection.v1.ServerReflectionRequest ) returns ( stream .grpc.reflection.v1.ServerReflectionResponse );
}
grpc.reflection.v1alpha.ServerReflection is a service:
service ServerReflection {
  rpc ServerReflectionInfo ( stream .grpc.reflection.v1alpha.ServerReflectionRequest ) returns ( stream .grpc.reflection.v1alpha.ServerReflectionResponse );
}

The following example shows a gNMIc ONCE mode request for all supported paths, with the output in flat format:

gnmic -a "IP" --port 9339 --skip-verify subscribe --prefix "/"  --path ""  --target nvos -u admin -p ***** --mode once --format flat

The following example shows a gNMIc Capabilities request, which retrieves the set of capabilities supported by the server:

gnmic -a "IP" --port 9339 --skip-verify capabilities -u admin -p *****

© Copyright 2026, NVIDIA. Last updated on Feb 23, 2026