gNMI Streaming

You can use gRPC Network Management Interface (gNMI) to collect system resource, interface, and counter information from Cumulus Linux and export it to your own gNMI client.

Configure the gNMI Agent

The netq-agent package includes the gNMI agent, which it disables by default. To enable the gNMI agent:

 cumulus@switch:~$ sudo systemctl enable netq-agent.service
 cumulus@switch:~$ sudo systemctl start netq-agent.service
 cumulus@switch:~$ netq config add agent gnmi-enable true

The gNMI agent listens over port 9339. You can change the default port in case you use that port in another application. The /etc/netq/netq.yml file stores the configuration.

Use the following commands to adjust the settings:

  1. Disable the gNMI agent:

    cumulus@switch:~$ netq config add agent gnmi-enable false
    
  2. Change the default port over which the gNMI agent listens:

    cumulus@switch:~$ netq config add agent gnmi-port <gnmi_port>
    
  3. Restart the NetQ Agent to incorporate the configuration changes:

    cumulus@switch:~$ netq config restart agent
    

The gNMI agent relies on the data it collects from the NVUE service. For complete data collection with gNMI, you must enable the NVUE service. To check the status of the nvued service, run the sudo systemctl status nvued.service command:

cumulus@switch:mgmt:~$ sudo systemctl status nvued.service
● nvued.service - NVIDIA User Experience Daemon
   Loaded: loaded (/lib/systemd/system/nvued.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2023-03-09 20:00:17 UTC; 6 days ago

If necessary, enable and start the service:

cumulus@switch:mgmt:~$ sudo systemctl enable nvued.service
cumulus@switch:mgmt:~$ sudo systemctl start nvued.service

Use the gNMI Agent Only

NVIDIA recommends that you collect data with both the gNMI and NetQ agents. However, if you do not want to collect data with both agents or you are not streaming data to NetQ, you can disable the NetQ agent. Cumulus Linux then sents data only to the gNMI agent.

To disable the NetQ agent:

cumulus@switch:~$ netq config add agent opta-enable false

You cannot disable both the NetQ and gNMI agent. If you enable both agents on Cumulus Linux and a NetQ server is unreachable, the switch does not send the data to gNMI from the following models:

  • openconfig-interfaces
  • openconfig-if-ethernet
  • openconfig-if-ethernet-ext
  • openconfig-system
  • nvidia-if-ethernet-ext

WJH, openconfig-platform, and openconfig-lldp data continue streaming to gNMI in this state. If you are only using gNMI and a NetQ telemetry server does not exist, disable the NetQ agent by setting opta-enable to false.

Supported Subscription Modes

Cumulus Linux supports the following gNMI subscription modes:

  • POLL mode
  • ONCE mode
  • STREAM mode, supported for ON_CHANGE subscriptions only

Supported Models

Cumulus Linux supports the following OpenConfig models:

ModelSupported Data
openconfig-interfacesName, Operstatus, AdminStatus, IfIndex, MTU, LoopbackMode, Enabled, Counters (InPkts, OutPkts, InOctets, InUnicastPkts, InDiscards, InMulticastPkts, InBroadcastPkts, InErrors, OutOctets, OutUnicastPkts, OutMulticastPkts, OutBroadcastPkts, OutDiscards, OutErrors)
openconfig-if-ethernetAutoNegotiate, PortSpeed, MacAddress, NegotiatedPortSpeed, Counters (InJabberFrames, InOversizeFrames,​ InUndersizeFrames)
openconfig-if-ethernet-extFrame size counters (InFrames_64Octets, InFrames_65_127Octets, InFrames_128_255Octets, InFrames_256_511Octets, InFrames_512_1023Octets, InFrames_1024_1518Octets)
openconfig-systemMemory, CPU
openconfig-platformPlatform data (Name, Description, Version)
openconfig-lldpLLDP data (PortIdType, PortDescription, LastUpdate, SystemName, SystemDescription, ChassisId, Ttl, Age, ManagementAddress, ManagementAddressType, Capability)
ModelSupported Data
nvidia-if-wjh-drop-aggregateAggregated WJH drops, including layer 1, layer 2, router, ACL, tunnel, and buffer drops
nvidia-if-ethernet-extExtended Ethernet counters (AlignmentError, InAclDrops, InBufferDrops, InDot3FrameErrors, InDot3LengthErrors, InL3Drops, InPfc0Packets, InPfc1Packets, InPfc2Packets, InPfc3Packets, InPfc4Packets, InPfc5Packets, InPfc6Packets, InPfc7Packets, OutNonQDrops, OutPfc0Packets, OutPfc1Packets, OutPfc2Packets, OutPfc3Packets, OutPfc4Packets, OutPfc5Packets, OutPfc6Packets, OutPfc7Packets, OutQ0WredDrops, OutQ1WredDrops, OutQ2WredDrops, OutQ3WredDrops, OutQ4WredDrops, OutQ5WredDrops, OutQ6WredDrops, OutQ7WredDrops, OutQDrops, OutQLength, OutWredDrops, SymbolErrors, OutTxFifoFull)

The client can use the following YANG models as a reference:

nvidia-if-ethernet-ext
nvidia-if-wjh-drop-aggregate

Collect WJH Data with gNMI

You can export What Just Happened (WJH) data from the NetQ agent to your own gNMI client. Refer to the nvidia-if-wjh-drop-aggregate reference YANG model, above.

Supported Features

The gNMI Agent supports Capabilities and STREAM subscribe requests for WJH events.

WJH Drop Reasons

The data that NetQ sends to the gNMI agent is in the form of WJH drop reasons. The SDK generates the drop reasons and Cumulus Linux stores them in the /usr/etc/wjh_lib_conf.xml file. Use this file as a guide to filter for specific reason types (L1, ACL, and so on), reason IDs, or event severeties.

Layer 1 Drop Reasons

Reason IDReasonDescription
10021Port admin downValidate port configuration
10022Auto-negotiation failureSet port speed manually, disable auto-negotiation
10023Logical mismatch with peer linkCheck cable or transceiver
10024Link training failureCheck cable or transceiver
10025Peer is sending remote faultsReplace cable or transceiver
10026Bad signal integrityReplace cable or transceiver
10027Cable or transceiver is not supportedUse supported cable or transceiver
10028Cable or transceiver is unpluggedPlug cable or transceiver
10029Calibration failureCheck cable or transceiver
10030Cable or transceiver bad statusCheck cable or transceiver
10031Other reasonOther L1 drop reason

Layer 2 Drop Reasons

Reason IDReasonSeverityDescription
201MLAG port isolationNoticeExpected behavior
202Destination MAC is reserved (DMAC=01-80-C2-00-00-0x)ErrorBad packet received from the peer
203VLAN tagging mismatchErrorValidate the VLAN tag configuration on both ends of the link
204Ingress VLAN filteringErrorValidate the VLAN membership configuration on both ends of the link
205Ingress spanning tree filterNoticeExpected behavior
206Unicast MAC table action discardErrorValidate MAC table for this destination MAC
207Multicast egress port list is emptyWarningValidate why IGMP join or multicast router port does not exist
208Port loopback filterErrorValidate MAC table for this destination MAC
209Source MAC is multicastErrorBad packet received from peer
210Source MAC equals destination MACErrorBad packet received from peer

Router Drop Reasons

Reason IDReasonSeverityDescription
301Non-routable packetNoticeExpected behavior
302Blackhole routeWarningValidate routing table for this destination IP
303Unresolved neighbor or next hopWarningValidate ARP table for the neighbor or next hop
304Blackhole ARP or neighborWarningValidate ARP table for the next hop
305IPv6 destination in multicast scope FFx0:/16NoticeExpected behavior - packet is not routable
306IPv6 destination in multicast scope FFx1:/16NoticeExpected behavior - packet is not routable
307Non-IP packetNoticeDestination MAC is the router, packet is not routable
308Unicast destination IP but multicast destination MACErrorBad packet received from the peer
309Destination IP is loopback addressErrorBad packet received from the peer
310Source IP is multicastErrorBad packet received from the peer
311Source IP is in class EErrorBad packet received from the peer
312Source IP is loopback addressErrorBad packet received from the peer
313Source IP is unspecifiedErrorBad packet received from the peer
314Checksum or IPver or IPv4 IHL too shortErrorBad cable or bad packet received from the peer
315Multicast MAC mismatchErrorBad packet received from the peer
316Source IP equals destination IPErrorBad packet received from the peer
317IPv4 source IP is limited broadcastErrorBad packet received from the peer
318IPv4 destination IP is local network (destination=0.0.0.0/8)ErrorBad packet received from the peer
320Ingress router interface is disabledWarningValidate your configuration
321Egress router interface is disabledWarningValidate your configuration
323IPv4 routing table (LPM) unicast missWarningValidate routing table for this destination IP
324IPv6 routing table (LPM) unicast missWarningValidate routing table for this destination IP
325Router interface loopbackWarningValidate the interface configuration
326Packet size is larger than router interface MTUWarningValidate the router interface MTU configuration
327TTL value is too smallWarningActual path is longer than the TTL

Tunnel Drop Reasons

Reason IDReasonSeverityDescription
402Overlay switch - Source MAC is multicastErrorThe peer sent a bad packet
403Overlay switch - Source MAC equals destination MACErrorThe peer sent a bad packet
404Decapsulation errorErrorThe peer sent a bad packet

ACL Drop Reasons

Reason IDReasonSeverityDescription
601Ingress port ACLNoticeValidate Access Control List configuration
602Ingress router ACLNoticeValidate Access Control List
603Egress router ACLNoticeValidate Access Control List
604Egress port ACLNoticeValidate Access Control List

Buffer Drop Reasons

Reason IDReasonSeverityDescription
503Tail dropWarningMonitor network congestion
504WREDWarningMonitor network congestion
505Port TC congestion threshold crossedNoticeMonitor network congestion
506Packet latency threshold crossedNoticeMonitor network congestion

gNMI Client Requests

You can use your gNMI client on a host to request capabilities and data to which the Agent subscribes. The examples below use the gNMIc client..

The following example shows a gNMIc STREAM request for WJH data:

gnmic -a 10.209.37.121:9339 -u cumulus -p ****** --skip-verify subscribe --path "wjh/aggregate/l2/reasons/reason[id=209][severity=error]/state/drop" --mode stream --prefix "/interfaces/interface[name=swp8]/" --target netq

{
  "source": "10.209.37.121:9339",
  "subscription-name": "default-1677695197",
  "timestamp": 1677695102858146800,
  "time": "2023-03-01T18:25:02.8581468Z",
  "prefix": "interfaces/interface[name=swp8]/wjh/aggregate/l2/reasons/reason[severity=error][id=209]",
  "target": "netq",
  "updates": [
    {
      "Path": "state/drop",
      "values": {
        "state/drop": "[{\"AggCount\":283,\"Dip\":\"0.0.0.0\",\"Dmac\":\"1c:34:da:17:93:7c\",\"Dport\":0,\"DropType\":\"L2\",\"EgressPort\":\"\",\"EndTimestamp\":1677695102,\"FirstTimestamp\":1677695072,\"Hostname\":\"neo-switch01\",\"IngressLag\":\"\",\"IngressPort\":\"swp8\",\"Proto\":0,\"Reason\":\"Source MAC is multicast\",\"ReasonId\":209,\"Severity\":\"Error\",\"Sip\":\"0.0.0.0\",\"Smac\":\"01:00:5e:00:00:01\",\"Sport\":0}]"
      }
    }
  ]
}
{
  "source": "10.209.37.121:9339",
  "subscription-name": "default-1677695197",
  "timestamp": 1677695132988218890,
  "time": "2023-03-01T18:25:32.98821889Z",
  "prefix": "interfaces/interface[name=swp8]/wjh/aggregate/l2/reasons/reason[severity=error][id=209]",
  "target": "netq",
  "updates": [
    {
      "Path": "state/drop",
      "values": {
        "state/drop": "[{\"AggCount\":287,\"Dip\":\"0.0.0.0\",\"Dmac\":\"1c:34:da:17:93:7c\",\"Dport\":0,\"DropType\":\"L2\",\"EgressPort\":\"\",\"EndTimestamp\":1677695132,\"FirstTimestamp\":1677695102,\"Hostname\":\"neo-switch01\",\"IngressLag\":\"\",\"IngressPort\":\"swp8\",\"Proto\":0,\"Reason\":\"Source MAC is multicast\",\"ReasonId\":209,\"Severity\":\"Error\",\"Sip\":\"0.0.0.0\",\"Smac\":\"01:00:5e:00:00:01\",\"Sport\":0}]"
      }
    }
  ]
}

The following example shows a gNMIc ONCE mode request for interface port speed:

gnmic -a 10.209.37.121:9339 -u cumulus -p ****** --skip-verify subscribe --path "ethernet/state/port-speed" --mode once --prefix "/interfaces/interface[name=swp1]/" --target netq
{
  "source": "10.209.37.123:9339",
  "subscription-name": "default-1677695151",
  "timestamp": 1677256036962254134,
  "time": "2023-02-24T16:27:16.962254134Z",
  "target": "netq",
  "updates": [
    {
      "Path": "interfaces/interface[name=swp1]/ethernet/state/port-speed",
      "values": {
        "interfaces/interface/ethernet/state/port-speed": "SPEED_1GB"
      }
    }
  ]
}

The following example shows a gNMIc POLL mode request for interface status:

gnmic -a 10.209.37.121:9339 -u cumulus -p ****** --skip-verify subscribe --path "state/oper-status" --mode poll --prefix "/interfaces/interface[name=swp1]/" --target netq
{
  "timestamp": 1677644403153198642,
  "time": "2023-03-01T04:20:03.153198642Z",
  "prefix": "interfaces/interface[name=swp1]",
  "target": "netq",
  "updates": [
    {
      "Path": "state/oper-status",
      "values": {
        "state/oper-status": "UP"
      }
    }
  ]
}
received sync response 'true' from '10.209.37.123:9339'
{
  "timestamp": 1677644403153198642,
  "time": "2023-03-01T04:20:03.153198642Z",
  "prefix": "interfaces/interface[name=swp1]",
  "target": "netq",
  "updates": [
    {
      "Path": "state/oper-status",
      "values": {
        "state/oper-status": "UP"
      }
    }
  ]
}