gNMI Streaming
You can use gRPC Network Management Interface (gNMI) to collect system resource, interface, and counter information from Cumulus Linux and export it to your own gNMI client.
Configure the gNMI Agent
The netq-agent package includes the gNMI agent, which is disabled by default. To enable the gNMI agent:
cumulus@switch:~$ sudo systemctl enable netq-agent.service
cumulus@switch:~$ sudo systemctl start netq-agent.service
cumulus@switch:~$ netq config add agent gnmi-enable true
The gNMI agent listens on port 9339 by default. You can change the port if another application already uses it. The configuration is stored in the /etc/netq/netq.yml file.
Use the following commands to adjust the settings:
Disable the gNMI agent:
cumulus@switch:~$ netq config add agent gnmi-enable false
Change the default port over which the gNMI agent listens:
cumulus@switch:~$ netq config add agent gnmi-port <gnmi_port>
Restart the NetQ Agent to incorporate the configuration changes:
cumulus@switch:~$ netq config restart agent
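After the restart, you can confirm that the gNMI agent is listening on the expected port. This is a minimal check, assuming a standard Linux environment and the default port 9339:
cumulus@switch:~$ sudo ss -tlnp | grep 9339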
The gNMI agent relies on data it collects from the NVUE service. For complete data collection with gNMI, you must enable the NVUE service. To check the status of the nvued service, run the sudo systemctl status nvued.service command:
cumulus@switch:mgmt:~$ sudo systemctl status nvued.service
● nvued.service - NVIDIA User Experience Daemon
Loaded: loaded (/lib/systemd/system/nvued.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2023-03-09 20:00:17 UTC; 6 days ago
If necessary, enable and start the service:
cumulus@switch:mgmt:~$ sudo systemctl enable nvued.service
cumulus@switch:mgmt:~$ sudo systemctl start nvued.service
Use the gNMI Agent Only
NVIDIA recommends that you collect data with both the gNMI and NetQ agents. However, if you do not want to collect data with both agents or you are not streaming data to NetQ, you can disable the NetQ agent. Cumulus Linux then sends data only to the gNMI agent.
To disable the NetQ agent:
cumulus@switch:~$ netq config add agent opta-enable false
You cannot disable both the NetQ agent and the gNMI agent. If you enable both agents on Cumulus Linux and the NetQ server is unreachable, the switch stops sending data to gNMI from the following models:
openconfig-interfaces
openconfig-if-ethernet
openconfig-if-ethernet-ext
openconfig-system
nvidia-if-ethernet-ext
WJH, openconfig-platform, and openconfig-lldp data continue streaming to gNMI in this state. If you use only gNMI and do not have a NetQ telemetry server, disable the NetQ agent by setting opta-enable to false.
Supported Subscription Modes
Cumulus Linux supports the following gNMI subscription modes:
POLL mode
ONCE mode
STREAM mode, supported for ON_CHANGE subscriptions only (see the example after this list)
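For example, you can request a STREAM subscription with ON_CHANGE updates using gNMIc. This is an illustrative sketch only; the address, credentials, and path reuse the client request examples later in this topic:
gnmic -a 10.209.37.121:9339 -u cumulus -p ****** --skip-verify subscribe --path "state/oper-status" --mode stream --stream-mode on-change --prefix "/interfaces/interface[name=swp1]/" --target netq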
Supported Models
Cumulus Linux supports the following OpenConfig models:
Model | Supported Data |
---|---|
openconfig-interfaces | Name, Operstatus, AdminStatus, IfIndex, MTU, LoopbackMode, Enabled, Counters (InPkts, OutPkts, InOctets, InUnicastPkts, InDiscards, InMulticastPkts, InBroadcastPkts, InErrors, OutOctets, OutUnicastPkts, OutMulticastPkts, OutBroadcastPkts, OutDiscards, OutErrors) |
openconfig-if-ethernet | AutoNegotiate, PortSpeed, MacAddress, NegotiatedPortSpeed, Counters (InJabberFrames, InOversizeFrames, InUndersizeFrames) |
openconfig-if-ethernet-ext | Frame size counters (InFrames_64Octets, InFrames_65_127Octets, InFrames_128_255Octets, InFrames_256_511Octets, InFrames_512_1023Octets, InFrames_1024_1518Octets) |
openconfig-system | Memory, CPU |
openconfig-platform | Platform data (Name, Description, Version) |
openconfig-lldp | LLDP data (PortIdType, PortDescription, LastUpdate, SystemName, SystemDescription, ChassisId, Ttl, Age, ManagementAddress, ManagementAddressType, Capability) |
Cumulus Linux also supports the following NVIDIA models:
Model | Supported Data |
---|---|
nvidia-if-wjh-drop-aggregate | Aggregated WJH drops, including L1, L2, router, ACL, tunnel, and buffer drops |
nvidia-if-ethernet-ext | Extended Ethernet counters (AlignmentError, InAclDrops, InBufferDrops, InDot3FrameErrors, InDot3LengthErrors, InL3Drops, InPfc0Packets, InPfc1Packets, InPfc2Packets, InPfc3Packets, InPfc4Packets, InPfc5Packets, InPfc6Packets, InPfc7Packets, OutNonQDrops, OutPfc0Packets, OutPfc1Packets, OutPfc2Packets, OutPfc3Packets, OutPfc4Packets, OutPfc5Packets, OutPfc6Packets, OutPfc7Packets, OutQ0WredDrops, OutQ1WredDrops, OutQ2WredDrops, OutQ3WredDrops, OutQ4WredDrops, OutQ5WredDrops, OutQ6WredDrops, OutQ7WredDrops, OutQDrops, OutQLength, OutWredDrops, SymbolErrors, OutTxFifoFull) |
The client can use these OpenConfig and NVIDIA YANG models as a reference.
Collect WJH Data with gNMI
You can export What Just Happened (WJH) data from the NetQ agent to your own gNMI client. Refer to the nvidia-if-wjh-drop-aggregate
reference YANG model, above.
Supported Features
The gNMI Agent supports Capabilities
and STREAM
subscribe requests for WJH events.
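For reference, a Capabilities request with gNMIc looks like the following. This sketch reuses the address and credentials from the client request examples later in this topic:
gnmic -a 10.209.37.121:9339 -u cumulus -p ****** --skip-verify capabilities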
WJH Drop Reasons
The data that NetQ sends to the gNMI agent is in the form of WJH drop reasons. The SDK generates the drop reasons and Cumulus Linux stores them in the /usr/etc/wjh_lib_conf.xml
file. Use this file as a guide to filter for specific reason types (L1, ACL, and so on), reason IDs, or event severities.
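For example, you can search the file for a drop reason to see how it is defined. This is a generic sketch; the exact XML layout depends on the installed SDK version:
cumulus@switch:~$ sudo grep -i -A 3 "multicast" /usr/etc/wjh_lib_conf.xml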
L1 Drop Reasons
Reason ID | Reason | Description |
---|---|---|
10021 | Port admin down | Validate port configuration |
10022 | Auto-negotiation failure | Set port speed manually, disable auto-negotiation |
10023 | Logical mismatch with peer link | Check cable or transceiver |
10024 | Link training failure | Check cable or transceiver |
10025 | Peer is sending remote faults | Replace cable or transceiver |
10026 | Bad signal integrity | Replace cable or transceiver |
10027 | Cable or transceiver is not supported | Use supported cable or transceiver |
10028 | Cable or transceiver is unplugged | Plug cable or transceiver |
10029 | Calibration failure | Check cable or transceiver |
10030 | Cable or transceiver bad status | Check cable or transceiver |
10031 | Other reason | Other L1 drop reason |
L2 Drop Reasons
Reason ID | Reason | Severity | Description |
---|---|---|---|
201 | MLAG port isolation | Notice | Expected behavior |
202 | Destination MAC is reserved (DMAC=01-80-C2-00-00-0x) | Error | Bad packet received from the peer |
203 | VLAN tagging mismatch | Error | Validate the VLAN tag configuration on both ends of the link |
204 | Ingress VLAN filtering | Error | Validate the VLAN membership configuration on both ends of the link |
205 | Ingress spanning tree filter | Notice | Expected behavior |
206 | Unicast MAC table action discard | Error | Validate MAC table for this destination MAC |
207 | Multicast egress port list is empty | Warning | Validate why IGMP join or multicast router port does not exist |
208 | Port loopback filter | Error | Validate MAC table for this destination MAC |
209 | Source MAC is multicast | Error | Bad packet received from peer |
210 | Source MAC equals destination MAC | Error | Bad packet received from peer |
Router Drop Reasons
Reason ID | Reason | Severity | Description |
---|---|---|---|
301 | Non-routable packet | Notice | Expected behavior |
302 | Blackhole route | Warning | Validate routing table for this destination IP |
303 | Unresolved neighbor or next hop | Warning | Validate ARP table for the neighbor or next hop |
304 | Blackhole ARP or neighbor | Warning | Validate ARP table for the next hop |
305 | IPv6 destination in multicast scope FFx0:/16 | Notice | Expected behavior - packet is not routable |
306 | IPv6 destination in multicast scope FFx1:/16 | Notice | Expected behavior - packet is not routable |
307 | Non-IP packet | Notice | Destination MAC is the router, packet is not routable |
308 | Unicast destination IP but multicast destination MAC | Error | Bad packet received from the peer |
309 | Destination IP is loopback address | Error | Bad packet received from the peer |
310 | Source IP is multicast | Error | Bad packet received from the peer |
311 | Source IP is in class E | Error | Bad packet received from the peer |
312 | Source IP is loopback address | Error | Bad packet received from the peer |
313 | Source IP is unspecified | Error | Bad packet received from the peer |
314 | Checksum or IPver or IPv4 IHL too short | Error | Bad cable or bad packet received from the peer |
315 | Multicast MAC mismatch | Error | Bad packet received from the peer |
316 | Source IP equals destination IP | Error | Bad packet received from the peer |
317 | IPv4 source IP is limited broadcast | Error | Bad packet received from the peer |
318 | IPv4 destination IP is local network (destination=0.0.0.0/8) | Error | Bad packet received from the peer |
320 | Ingress router interface is disabled | Warning | Validate your configuration |
321 | Egress router interface is disabled | Warning | Validate your configuration |
323 | IPv4 routing table (LPM) unicast miss | Warning | Validate routing table for this destination IP |
324 | IPv6 routing table (LPM) unicast miss | Warning | Validate routing table for this destination IP |
325 | Router interface loopback | Warning | Validate the interface configuration |
326 | Packet size is larger than router interface MTU | Warning | Validate the router interface MTU configuration |
327 | TTL value is too small | Warning | Actual path is longer than the TTL |
Tunnel Drop Reasons
Reason ID | Reason | Severity | Description |
---|---|---|---|
402 | Overlay switch - Source MAC is multicast | Error | The peer sent a bad packet |
403 | Overlay switch - Source MAC equals destination MAC | Error | The peer sent a bad packet |
404 | Decapsulation error | Error | The peer sent a bad packet |
ACL Drop Reasons
Reason ID | Reason | Severity | Description |
---|---|---|---|
601 | Ingress port ACL | Notice | Validate Access Control List configuration |
602 | Ingress router ACL | Notice | Validate Access Control List |
603 | Egress router ACL | Notice | Validate Access Control List |
604 | Egress port ACL | Notice | Validate Access Control List |
Buffer Drop Reasons
Reason ID | Reason | Severity | Description |
---|---|---|---|
503 | Tail drop | Warning | Monitor network congestion |
504 | WRED | Warning | Monitor network congestion |
505 | Port TC congestion threshold crossed | Notice | Monitor network congestion |
506 | Packet latency threshold crossed | Notice | Monitor network congestion |
gNMI Client Requests
You can use a gNMI client on a host to request capabilities and the data to which the agent subscribes. The examples below use the gNMIc client.
The following example shows a gNMIc STREAM
request for WJH data:
gnmic -a 10.209.37.121:9339 -u cumulus -p ****** --skip-verify subscribe --path "wjh/aggregate/l2/reasons/reason[id=209][severity=error]/state/drop" --mode stream --prefix "/interfaces/interface[name=swp8]/" --target netq
{
"source": "10.209.37.121:9339",
"subscription-name": "default-1677695197",
"timestamp": 1677695102858146800,
"time": "2023-03-01T18:25:02.8581468Z",
"prefix": "interfaces/interface[name=swp8]/wjh/aggregate/l2/reasons/reason[severity=error][id=209]",
"target": "netq",
"updates": [
{
"Path": "state/drop",
"values": {
"state/drop": "[{\"AggCount\":283,\"Dip\":\"0.0.0.0\",\"Dmac\":\"1c:34:da:17:93:7c\",\"Dport\":0,\"DropType\":\"L2\",\"EgressPort\":\"\",\"EndTimestamp\":1677695102,\"FirstTimestamp\":1677695072,\"Hostname\":\"neo-switch01\",\"IngressLag\":\"\",\"IngressPort\":\"swp8\",\"Proto\":0,\"Reason\":\"Source MAC is multicast\",\"ReasonId\":209,\"Severity\":\"Error\",\"Sip\":\"0.0.0.0\",\"Smac\":\"01:00:5e:00:00:01\",\"Sport\":0}]"
}
}
]
}
{
"source": "10.209.37.121:9339",
"subscription-name": "default-1677695197",
"timestamp": 1677695132988218890,
"time": "2023-03-01T18:25:32.98821889Z",
"prefix": "interfaces/interface[name=swp8]/wjh/aggregate/l2/reasons/reason[severity=error][id=209]",
"target": "netq",
"updates": [
{
"Path": "state/drop",
"values": {
"state/drop": "[{\"AggCount\":287,\"Dip\":\"0.0.0.0\",\"Dmac\":\"1c:34:da:17:93:7c\",\"Dport\":0,\"DropType\":\"L2\",\"EgressPort\":\"\",\"EndTimestamp\":1677695132,\"FirstTimestamp\":1677695102,\"Hostname\":\"neo-switch01\",\"IngressLag\":\"\",\"IngressPort\":\"swp8\",\"Proto\":0,\"Reason\":\"Source MAC is multicast\",\"ReasonId\":209,\"Severity\":\"Error\",\"Sip\":\"0.0.0.0\",\"Smac\":\"01:00:5e:00:00:01\",\"Sport\":0}]"
}
}
]
}
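Because the state/drop value is itself a JSON-encoded string, you may want to decode it on the client. The following pipeline is a hypothetical sketch that assumes jq is installed on the host and reuses the subscription above:
gnmic -a 10.209.37.121:9339 -u cumulus -p ****** --skip-verify subscribe --path "wjh/aggregate/l2/reasons/reason[id=209][severity=error]/state/drop" --mode stream --prefix "/interfaces/interface[name=swp8]/" --target netq | jq -r '.updates[]?.values["state/drop"] | fromjson | .[] | {Reason, Severity, AggCount, IngressPort}'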
The following example shows a gNMIc ONCE
mode request for interface port speed:
gnmic -a 10.209.37.121:9339 -u cumulus -p ****** --skip-verify subscribe --path "ethernet/state/port-speed" --mode once --prefix "/interfaces/interface[name=swp1]/" --target netq
{
"source": "10.209.37.123:9339",
"subscription-name": "default-1677695151",
"timestamp": 1677256036962254134,
"time": "2023-02-24T16:27:16.962254134Z",
"target": "netq",
"updates": [
{
"Path": "interfaces/interface[name=swp1]/ethernet/state/port-speed",
"values": {
"interfaces/interface/ethernet/state/port-speed": "SPEED_1GB"
}
}
]
}
The following example shows a gNMIc POLL
mode request for interface status:
gnmic -a 10.209.37.121:9339 -u cumulus -p ****** --skip-verify subscribe --path "state/oper-status" --mode poll --prefix "/interfaces/interface[name=swp1]/" --target netq
{
"timestamp": 1677644403153198642,
"time": "2023-03-01T04:20:03.153198642Z",
"prefix": "interfaces/interface[name=swp1]",
"target": "netq",
"updates": [
{
"Path": "state/oper-status",
"values": {
"state/oper-status": "UP"
}
}
]
}
received sync response 'true' from '10.209.37.123:9339'
{
"timestamp": 1677644403153198642,
"time": "2023-03-01T04:20:03.153198642Z",
"prefix": "interfaces/interface[name=swp1]",
"target": "netq",
"updates": [
{
"Path": "state/oper-status",
"values": {
"state/oper-status": "UP"
}
}
]
}