NVIDIA WinOF-2 Documentation v3.0

Adapter Cards Counters

Adapter card counters provide information on the performance of the operating system, applications, services, or drivers. They can be used for system debugging, for identifying system bottlenecks, and for fine-tuning system and application performance. The operating system, network, and devices provide counter data that an application can consume to give users a graphical view of the system's performance.

WinOF-2 counters use the standard Windows CounterSet API, which includes:

  • Network Interface

  • RDMA activity

  • SMB Direct Connection

The Mellanox WinOF-2 Port Traffic counter set consists of counters that measure the rates at which bytes and packets are sent and received over a port network connection. It also includes counters that monitor connection errors.

Mellanox WinOF-2 Port Traffic

Description

Bytes/Packets IN

Bytes Received

Shows the number of bytes received by a network adapter. The counted bytes include framing characters.

KBytes Received/Sec

Shows the rate at which kilobytes are received by a network adapter. The counted kilobytes include framing characters.

Packets Received

Shows the number of packets received by a network interface.

Packets Received/Sec

Shows the rate at which packets are received by a network interface.

Packets Received Frame too long Error

The number of received packets on a physical port dropped due to a large MTU size.

Packets Received Unsupported opcode Error

The number of MAC control packets received on a physical port with unsupported opcode.

Packets Received Frame undersize Error

The number of received packets on a physical port dropped due to the length of the packet being shorter than 64 bytes.

Packets Received Fragments Error

The number of received packets on a physical port dropped due to the length of the packet being shorter than 64 bytes and having an FCS error.

Packets Received jabbers Error

The number of received packets on a physical port dropped due to the length of the packet being longer than 64 bytes and having an FCS error.

Bytes/Packets OUT

Bytes Sent

Shows the number of bytes sent by a network adapter. The counted bytes include framing characters.

KBytes Sent/Sec

Shows the rate at which kilobytes are sent by a network adapter. The counted kilobytes include framing characters.

Packets Sent

Shows the number of packets sent by a network interface.

Packets Sent/Sec

Shows the rate at which packets are sent by a network interface.

Bytes Total

Shows the total number of bytes handled by a network adapter. The counted bytes include framing characters.

KBytes Total/Sec

Shows the total rate of kilobytes that are sent and received by a network adapter. The counted kilobytes include framing characters.

Packets Total

Shows the total number of packets handled by a network interface.

Packets Total/Sec

Shows the rate at which packets are sent and received by a network interface.

Control Packets

The total number of successfully received control frames.

Note: This counter is relevant only for ETH ports

ERRORS, DISCARDED

Packets Received Frame too long Error

The number of received packets on a physical port dropped due to a large MTU size.

Note: This counter is relevant only for ETH ports

Packets Received Unsupported opcode Error

The number of MAC control packets received on a physical port with unsupported opcode.

Note: This counter is relevant only for ETH ports

Packets Received Frame undersize Error

The number of received packets on a physical port dropped due to the length of the packet being shorter than 64 bytes.

Note: This counter is relevant only for ETH ports

Packets Received Fragments Error

The number of received packets on a physical port dropped due to the length of the packet being shorter than 64 bytes and having an FCS error.

Note: This counter is relevant only for ETH ports

Packets Received jabbers Error

The number of received packets on a physical port dropped due to the length of the packet being longer than 64 bytes and having an FCS error.

Note: This counter is relevant only for ETH ports

Packets Outbound Errors

Shows the number of outbound packets that could not be transmitted because of errors found in the physical layer.

Packets Outbound Discarded

Shows the number of outbound packets to be discarded in the physical layer, even though no errors had been detected to prevent transmission. One possible reason for discarding packets could be to free up buffer space.

Packets Received Errors

Shows the number of inbound packets that contained errors in the physical layer, preventing them from being deliverable.

Packets Received Frame Length Error

Shows the number of inbound packets that had a frame length error. Packets received with a frame length error are a subset of packets received errors.

Note: This counter is relevant only for ETH ports

Packets Received Symbol Error

Shows the number of inbound packets that contained a symbol error or an invalid block. Packets received with a symbol error are a subset of packets received errors.

Packets Received Bad CRC Error

Shows the number of inbound packets that contained a bad CRC error. Packets received with a bad CRC error are a subset of packets received errors.

Packets Received Discarded

No Receive WQEs - Packets discarded due to no receive descriptors posted by driver or software.

RSC Aborts

Number of RSC abort events. That is, the number of exceptions other than the IP datagram length being exceeded. This includes the cases where a packet is not coalesced because of insufficient hardware resources.

Note: This counter is relevant only for ETH ports

RSC Coalesced Events

Number of RSC Coalesced events. That is, the total number of packets that were formed from coalescing packets.

Note: This counter is relevant only for ETH ports

RSC Coalesced Octets

Number of RSC Coalesced bytes.

Note: This counter is relevant only for ETH ports

RSC Coalesced Packets

Number of RSC Coalesced Packets.

Note: This counter is relevant only for ETH ports

RSC Average Packet Size

RSC Average Packet Size is the average size in bytes of received packets across all TCP connections.

Note: This counter is relevant only for ETH ports
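The table above pairs cumulative counters (for example, Bytes Received) with rate counters (for example, Packets Received/Sec). As a minimal sketch of how the two relate, the following Python snippet derives average per-second rates from two readings of the cumulative counters. The `PortSample` container and its field names are hypothetical and are not part of the WinOF-2 driver or the Windows counter API; how the samples are collected is left out on purpose.

```python
from dataclasses import dataclass


@dataclass
class PortSample:
    """One reading of cumulative port counters (hypothetical container)."""
    timestamp_s: float             # when the sample was taken, in seconds
    bytes_received: int            # value of "Bytes Received"
    packets_received: int          # value of "Packets Received"
    packets_received_errors: int   # value of "Packets Received Errors"


def per_second_rates(prev: PortSample, curr: PortSample) -> dict:
    """Return average per-second rates between two samples."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be ordered by increasing timestamp")
    return {
        "bytes_per_sec": (curr.bytes_received - prev.bytes_received) / elapsed,
        "packets_per_sec": (curr.packets_received - prev.packets_received) / elapsed,
        "errors_per_sec": (
            curr.packets_received_errors - prev.packets_received_errors
        ) / elapsed,
    }
```

A steadily non-zero `errors_per_sec` is usually more informative than a large absolute error count, since the cumulative value also reflects errors from long ago.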

Warning

Mellanox WinOF-2 VF Port Traffic counters exist per VF and are created according to the adapter's configuration. These counters are created when the VFs are configured, even if the VFs are not up.

The Mellanox WinOF-2 VF Port Traffic counter set consists of counters that measure the rates at which bytes and packets are sent and received over a virtual port network connection that is bound to a virtual PCI function. It also includes counters that monitor connection errors.

This set is available only on hypervisors and not on virtual network adapters.

Warning

This counter set is relevant only for ETH ports.

Mellanox WinOF-2 VF Port Traffic

Description

Bytes/Packets IN

Bytes Received/Sec

Shows the rate at which bytes are received over each network VPort. The counted bytes include framing characters.

Bytes Received Unicast/Sec

Shows the rate at which subnet-unicast bytes are delivered to a higher-layer protocol.

Bytes Received Broadcast/Sec

Shows the rate at which subnet-broadcast bytes are delivered to a higher-layer protocol.

Bytes Received Multicast/Sec

Shows the rate at which subnet-multicast bytes are delivered to a higher-layer protocol.

Packets Received Unicast/Sec

Shows the rate at which subnet-unicast packets are delivered to a higher-layer protocol.

Packets Received Broadcast/Sec

Shows the rate at which subnet-broadcast packets are delivered to a higher-layer protocol.

Packets Received Multicast/Sec

Shows the rate at which subnet-multicast packets are delivered to a higher-layer protocol.

Bytes/Packets OUT

Bytes Sent/Sec

Shows the rate at which bytes are sent over each network VPort. The counted bytes include framing characters.

Bytes Sent Unicast/Sec

Shows the rate at which bytes are requested to be transmitted to subnet-unicast addresses by higher-level protocols. The rate includes the bytes that were discarded or not sent.

Bytes Sent Broadcast/Sec

Shows the rate at which bytes are requested to be transmitted to subnet-broadcast addresses by higher-level protocols. The rate includes the bytes that were discarded or not sent.

Bytes Sent Multicast/Sec

Shows the rate at which bytes are requested to be transmitted to subnet-multicast addresses by higher-level protocols. The rate includes the bytes that were discarded or not sent.

Packets Sent Unicast/Sec

Shows the rate at which packets are requested to be transmitted to subnet-unicast addresses by higher-level protocols. The rate includes the packets that were discarded or not sent.

Packets Sent Broadcast/Sec

Shows the rate at which packets are requested to be transmitted to subnet-broadcast addresses by higher-level protocols. The rate includes the packets that were discarded or not sent.

Packets Sent Multicast/Sec

Shows the rate at which packets are requested to be transmitted to subnet-multicast addresses by higher-level protocols. The rate includes the packets that were discarded or not sent.

ERRORS, DISCARDED

Packets Outbound Discarded

Shows the number of outbound packets to be discarded even though no errors had been detected to prevent transmission. One possible reason for discarding a packet could be to free up buffer space.

Packets Outbound Errors

Shows the number of outbound packets that could not be transmitted because of errors.

Packets Received Discarded

Shows the number of inbound packets that were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space.

Packets Received Errors

Shows the number of inbound packets that contained errors preventing them from being deliverable to a higher-layer protocol.

Mac Anti-Spoofing Packets Discarded

Shows the number of packets discarded due to illegal MAC address usage.

Mac Anti-Spoofing Bytes Discarded

Shows the number of bytes discarded due to illegal MAC address usage.

Vlan Anti-Spoofing Packets Discarded

Shows the number of packets discarded due to illegal VLAN usage.

Vlan Anti-Spoofing Bytes Discarded

Shows the number of bytes discarded due to illegal VLAN usage.

Allowed EthType Anti-Spoofing Packets Discarded

Shows the number of packets discarded due to use of a disallowed EtherType.

Allowed EthType Anti-Spoofing Bytes Discarded

Shows the number of bytes discarded due to use of a disallowed EtherType.

RDMA Bytes/Packets IN

Rdma Packets Received Unicast/Sec

Shows the rate at which subnet-unicast RDMA packets are delivered to a higher-layer protocol.

Rdma Packets Received Multicast/Sec

Shows the rate at which subnet-multicast RDMA packets are delivered to a higher-layer protocol.

Rdma Bytes Received Unicast/Sec

Shows the rate at which subnet-unicast RDMA bytes are delivered to a higher-layer protocol.

Rdma Bytes Received Multicast/Sec

Shows the rate at which subnet-multicast RDMA bytes are delivered to a higher-layer protocol.

RDMA Bytes/Packets OUT

Rdma Packets Sent Unicast/Sec

Shows the rate at which subnet-unicast RDMA packets are sent by a higher-layer protocol.

Rdma Packets Sent Multicast/Sec

Shows the rate at which subnet-multicast RDMA packets are sent by a higher-layer protocol.

Rdma Bytes Sent Unicast/Sec

Shows the rate at which subnet-unicast RDMA bytes are sent by a higher-layer protocol.

Rdma Bytes Sent Multicast/Sec

Shows the rate at which subnet-multicast RDMA bytes are sent by a higher-layer protocol.

The Mellanox WinOF-2 Port QoS counter set consists of flow statistics per (VLAN) priority. Each QoS policy is associated with a priority. The counters present the traffic and pause statistics of each priority.

Warning

This counter set is relevant only for ETH ports.

Mellanox WinOF-2 QoS

Description

Bytes/Packets IN

Bytes Received

The number of bytes received that are covered by this priority. The counted bytes include framing characters (modulo 2^64).

KBytes Received/Sec

The number of kilobytes received per second that are covered by this priority. The counted kilobytes include framing characters.

Packets Received

The number of packets received that are covered by this priority (modulo 2^64).

Packets Received/Sec

The number of packets received per second that are covered by this priority.

Packets Received Discarded

The number of inbound packets covered by this priority that were discarded in the physical layer even though no errors were detected. A possible reason for discarding packets could be to free up buffer space.

Bytes/Packets OUT

Bytes Sent

The number of bytes sent that are covered by this priority. The counted bytes include framing characters (modulo 2^64).

KBytes Sent/Sec

The number of kilobytes sent per second that are covered by this priority. The counted kilobytes include framing characters.

Packets Sent

The number of packets sent that are covered by this priority (modulo 2^64).

Packets Sent/Sec

The number of packets sent per second that are covered by this priority.

Bytes and Packets Total

Bytes Total

The total number of bytes that are covered by this priority. The counted bytes include framing characters (modulo 2^64).

KBytes Total/Sec

The total number of kilobytes per second that are covered by this priority. The counted kilobytes include framing characters.

Packets Total

The total number of packets that are covered by this priority (modulo 2^64).

Packets Total/Sec

The total number of packets per second that are covered by this priority.

PAUSE INDICATION

Sent Pause Duration

The total time in microseconds that the peer port has been requested to pause.

Sent Pause Frames

The number of pause packets transmitted on priority p on a physical port. If this counter is increasing, it implies that the adapter is congested and cannot absorb the traffic coming from the network.

Received Pause Frames

The number of pause packets received with priority p on a physical port. If this counter is increasing, it implies that the network is congested and cannot absorb the traffic coming from the adapter.

Received Pause Duration

The total time in microseconds that the transmission of packets to the peer port has been paused.
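Several QoS counters above are reported modulo 2^64, and the pause durations are reported in microseconds. The following Python sketch shows one way to compute a wrap-safe difference between two readings and to turn a pause-duration difference into a fraction of the sampling interval. The function names are illustrative only, and the snippet assumes that the counter wrapped at most once between the two readings.

```python
COUNTER_MODULUS = 2 ** 64  # counters in the table are documented as modulo 2^64


def wrapped_delta(prev: int, curr: int, modulus: int = COUNTER_MODULUS) -> int:
    """Difference between two readings of a counter that wraps at `modulus`.

    Assumes the counter wrapped at most once between the readings.
    """
    return (curr - prev) % modulus


def pause_fraction(prev_pause_us: int, curr_pause_us: int, interval_s: float) -> float:
    """Fraction of the sampling interval spent paused.

    Inputs are two readings of a pause duration counter (microseconds)
    and the elapsed time between the readings (seconds).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    paused_us = curr_pause_us - prev_pause_us
    return paused_us / (interval_s * 1_000_000)
```

For example, a pause-duration increase of 500,000 microseconds over a 2-second interval corresponds to a pause fraction of 0.25.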

The RDMA Activity counter set consists of NDK performance counters. These performance counters allow you to track Network Direct Kernel (RDMA) activity, including traffic rates, errors, and control plane activity.

RDMA Activity

Description

RDMA Accepted Connections

The number of inbound RDMA connections established.

RDMA Active Connections

The number of active RDMA connections.

RDMA Completion Queue Errors

This counter is not supported and is always set to zero.

RDMA Failed Connection Attempts

The number of inbound and outbound RDMA connection attempts that failed.

RDMA Inbound Bytes/sec

The number of bytes for all incoming RDMA traffic. This includes additional layer two protocol overhead.

RDMA Inbound Frames/sec

The number, in frames, of layer two frames that carry incoming RDMA traffic.

RDMA Initiated Connections

The number of outbound connections established.

RDMA Outbound Bytes/sec

The number of bytes for all outgoing RDMA traffic. This includes additional layer two protocol overhead.

RDMA Outbound Frames/sec

The number, in frames, of layer two frames that carry outgoing RDMA traffic.

The Mellanox WinOF-2 Congestion Control counter set consists of counters that measure the DCQCN statistics over the network adapter.

Warning

This counter set is relevant only for ETH ports.

Mellanox WinOF-2 Congestion Control

Description

Notification Point

Notification Point - CNPs Sent Successfully

Number of congestion notification packets (CNPs) successfully sent by the notification point.

Notification Point - RoCEv2 DCQCN Marked Packets

Number of RoCEv2 packets that were marked as congestion encountered.

Reaction Point

Reaction Point - Current Number of Flows

Current number of Rate Limited Flows due to RoCEv2 Congestion Control.

Reaction Point - Ignored CNP Packets

Number of ignored congestion notification packets (CNPs).

Reaction Point - Successfully Handled CNP Packets

Number of congestion notification packets (CNPs) received and handled successfully.

The Mellanox WinOF-2 Diagnostics counter set consists of the following counters:

Mellanox WinOF-2 Diagnostics

Description

Reset Requests

Number of resets requested by NDIS.

Link State Change Events

Number of link status updates received from the hardware.

Link State Change Down Events

Number of events received from the hardware, where the link state was changed to down.

Minor Stall Watermark Reached

Number of times the device detected a stalled state for a period longer than device_stall_minor_watermark.

Note: This counter is relevant only for ETH ports

Critical Stall Watermark Reached

Number of times the port detected a stalled state for a period longer than device_stall_critical_watermark.

Note: This counter is relevant only for ETH ports

Head of Queue timeout Packet discarded

Number of packets discarded by the transmitter due to Head-Of-Queue Lifetime Limit timeout.

Note: This counter is relevant only for ETH ports

Stalled State Packet discarded

Number of packets discarded by the transmitter due to TC in Stalled state.

Note: This counter is relevant only for ETH ports

Requester CQEs flushed with error

Number of requester CQEs flushed with error following a queue transition to the error state.

Send queues priority

The total number of QP/SQ priority/SL update events.

Async EQ Overrun

The number of times an EQ mapped to Async events queue encountered overrun queue.

Completion EQ Overrun

The number of times an EQ mapped to Completion events queue encountered overrun queue.

Current Queues Under Processor Handle

The current number of queues that are handled by the processor due to an Async error (e.g. retry exceeded) or due to a CMD error (e.g. 2eer_qp cmd).

Total Queues Under Processor Handle

The total number of queues that are handled by the processor due to an Async error (e.g. retry exceeded) or due to a CMD error (e.g. 2eer_qp cmd).

Queued Send Packets

Number of send packets pending transmission due to hardware queues overflow.

Send Completions in Passive/Sec

Number of send completion events handled in passive mode per second.

Receive Completions in Passive/Sec

Number of receive completion events handled in passive mode per second.

Packets Received dropped due to Steering

Number of packets that completed the NIC Receive FlowTable steering and were discarded due to lack of match rule in Flow Table.

Copied Send Packets

Number of send packets that were copied in slow path.

Correct Checksum Packets In Slow Path

Number of receive packets that required the driver to perform the checksum calculation and resulted in success.

Bad Checksum Packets In Slow Path

Number of receive packets that required the driver to perform checksum calculation and resulted in failure.

Undetermined Checksum Packets In Slow Path

Number of receive packets with undetermined checksum result.

Watch Dog Expired/Sec

Number of watch dogs expired per second.

Requester time out received

Number of timeouts received when the local machine generates outbound traffic.

Requester out of order sequence NAK

Number of Out of Sequence NAK received when the local machine generates outbound traffic, i.e. the number of times the local machine received NAKs indicating OOS on the receiving side.

Requester RNR NAK

Number of RNR (Receiver Not Ready) NAKs received when the local machine generates outbound traffic.

Responder RNR NAK

Number of RNR (Receiver Not Ready) NAKs sent when the local machine receives inbound traffic.

Responder out of order sequence received

Number of Out of Sequence packets received when the local machine receives inbound traffic, i.e. the number of times the local machine received messages that are not consecutive.

Responder duplicate request received

Number of duplicate requests received when the local machine receives inbound traffic.

Requester RNR NAK retries exceeded errors

Number of RNR (Receiver Not Ready) NAKs retries exceeded errors when the local machine generates outbound traffic.

Responder Local Length Errors

Number of times the responder detected local length errors

Requester Local Length Errors

Number of times the requester detected local length errors

Responder Local QP Operation Errors

Number of times the responder detected local QP operation errors

Local Operation Errors (a.k.a. Requester Local QP Operation Errors)

Number of times the requester detected local QP operation errors

Responder Local Protection Errors

Number of times the responder detected memory protection error in its local memory subsystem

Requester Local Protection Errors

Number of times the requester detected a memory protection error in its local memory subsystem

Responder CQEs with Error

Number of times the responder flow reported a completion with error

Requester CQEs with Error

Number of times the requester flow reported a completion with error

Responder CQEs Flushed with Error

Number of times the responder flow completed a work request as flushed with error

Requester CQEs Flushed with Error

Number of times the requester completed a work request as flushed with error

Requester Memory Window Binding Errors

Number of times the requester detected memory window binding error

Requester Bad Response

Number of times an unexpected transport layer opcode was returned by the responder

Requester Remote Invalid Request Errors

Number of times the requester detected remote invalid request error

Responder Remote Invalid Request Errors

Number of times the responder detected remote invalid request error

Requester Remote Access Errors

Number of times the requester detected remote access error

Responder Remote Access Errors

Number of times the responder detected remote access error

Requester Remote Operation Errors

Number of times the requester detected remote operation error

Requester Retry Exceeded Errors

Number of times the requester detected transport retries exceed error

CQ Overflow

Counts the QPs attached to a CQ with overflow condition

Received RDMA Write requests

Number of RDMA write requests received

Received RDMA Read requests

Number of RDMA read requests received

Implied NAK Sequence Errors

Number of times the Requester detected an ACK with a PSN larger than the expected PSN for an RDMA READ or ATOMIC response. The QP retry limit was not exceeded

Dropless Mode Entries

The number of times dropless mode was entered.

Dropless Mode Exits

The number of times dropless mode was exited.

Transmission Engine Hang Events

The number of sx execution engine hang events.

MTT Entries Used For QP

Number of Memory Translation Table (MTT) entries used for QPs.

MTT Entries Used For CQ

Number of Memory Translation Table (MTT) entries used for CQs.

MTT Entries Used For EQ

Number of Memory Translation Table (MTT) entries used for EQs.

MTT Entries Used For MR

Number of Memory Translation Table (MTT) entries used for MRs.

CPU MEM-Pages (4K) Mapped By TPT For QP

Total number of CPU memory pages (4K) mapped by TPT for QPs.

CPU MEM-Pages (4K) Mapped By TPT For CQ

Total number of CPU memory pages (4K) mapped by TPT for CQs.

CPU MEM-Pages (4K) Mapped By TPT For EQ

Total number of CPU memory pages (4K) mapped by TPT for EQs.

CPU MEM-Pages (4K) Mapped By TPT For MR

Total number of CPU memory pages (4K) mapped by TPT for MRs.

Quota Exceeded Command

Number of commands issued by the VF and failed due to quota being exceeded.

Send Queue Priority Update Flow

The total number of QP/SQ priority/SL update events.

CQ Overrun

Number of times a CQ entered an error state due to overflow. Overflow occurs when the device tries to post a CQE into a full CQ buffer.

The Mellanox WinOF-2 Diagnostics Ext 1 counter set consists of the following counters:

Mellanox WinOF-2 Diagnostics Ext 1

Description

RoCE Adaptive Retransmission

The number of adaptive retransmissions for RoCE traffic.

RoCE adaptive retransmission timeouts

The number of times RoCE traffic reached timeout due to adaptive retransmission.

RoCE Slow Restart

The number of times RoCE slow restart option was used.

RoCE Slow Restart CNPs

The number of times RoCE slow restart generated CNP packets.

RoCE Slow Restart Transmission

The number of times RoCE slow restart changed its state to slow restart.

Checksum calculated by SW/Packet

The number of times SW has calculated the checksum.

CQ Overrun

Number of times a CQ entered an error state due to overflow. Overflow occurs when the device tries to post a CQE into a full CQ buffer.

The Mellanox WinOF-2 SW Backchannel Diagnostics counter set consists of the following counters:

Mellanox WinOF-2 SW Backchannel Diagnostics

Description

Supported Capabilities Bitmask

Bitmask of capabilities supported by VF

Currently Active Capabilities Bitmask

Bitmask of capabilities currently activated for VF

Read Config Block OIDs/Sec

The number of OID_SRIOV_READ_VF_CONFIG_BLOCK received per second

Write Config Block OIDs/Sec

The number of OID_SRIOV_WRITE_VF_CONFIG_BLOCK received per second

Illegal Or Unsupported Read Config Block OIDs

The number of OID_SRIOV_READ_VF_CONFIG_BLOCK detected as illegal or unsupported

Illegal Or Unsupported Write Config Block OIDs

The number of OID_SRIOV_WRITE_VF_CONFIG_BLOCK detected as illegal or unsupported

Read Config Block OIDs Failed To Apply

The number of OID_SRIOV_READ_VF_CONFIG_BLOCK returned with fail status

Note: This does not necessarily indicate an error.

Write Config Block OIDs Failed To Apply

The number of OID_SRIOV_WRITE_VF_CONFIG_BLOCK returned with fail status.

Note: This does not necessarily indicate an error.
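The Supported Capabilities Bitmask and Currently Active Capabilities Bitmask counters can be compared to see which capabilities a VF supports but has not activated. The sketch below does this with plain bit arithmetic. The meaning of each bit position is not described in this section, so the function reports only bit positions and makes no assumption about what they stand for; the function name is illustrative.

```python
from typing import List


def inactive_capability_bits(supported: int, active: int) -> List[int]:
    """Return bit positions set in `supported` but clear in `active`."""
    remaining = supported & ~active  # keep only supported-but-inactive bits
    positions = []
    bit = 0
    while remaining:
        if remaining & 1:
            positions.append(bit)
        remaining >>= 1
        bit += 1
    return positions
```

With `supported = 0b1011` and `active = 0b0010`, the function returns `[0, 3]`.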

Warning

Mellanox WinOF-2 Device Diagnostic counters are global for the device used. Therefore, all the adapter cards associated with the device will have the same counters' values.

The Mellanox WinOF-2 Device Diagnostic counter set consists of the following counters:

Mellanox WinOF-2 Device Diagnostics

Description

L0 MTT miss

The number of accesses to L0 MTT that missed

L0 MTT miss/Sec

The rate of accesses to L0 MTT that missed

L0 MTT hit

The number of accesses to L0 MTT that hit

L0 MTT hit/Sec

The rate of accesses to L0 MTT that hit

L1 MTT miss

The number of accesses to L1 MTT that missed

L1 MTT miss/Sec

The rate of accesses to L1 MTT that missed

L1 MTT hit

The number of accesses to L1 MTT that hit

L1 MTT hit/Sec

The rate of accesses to L1 MTT that hit

L0 MPT miss

The number of accesses to L0 MKey that missed

L0 MPT miss/Sec

The rate of accesses to L0 MKey that missed

L0 MPT hit

The number of accesses to L0 MKey that hit

L0 MPT hit/Sec

The rate of accesses to L0 MKey that hit

L1 MPT miss

The number of accesses to L1 MKey that missed

L1 MPT miss/Sec

The rate of accesses to L1 MKey that missed

L1 MPT hit

The number of accesses to L1 MKey that hit

L1 MPT hit/Sec

The rate of accesses to L1 MKey that hit

RXS no slow path credits

No room in RXS for slow path packets

RXS no fast path credits

No room in RXS for fast path packets

RXT no slow path credits

No room in RXT for slow path packets

RXT no fast path credits

No room in RXT for fast path packets

Slow path packets slice load

Number of slow path packets loaded to HCA as slices from the network

Fast path packets slice load

Number of fast path packets loaded to HCA as slices from the network

Steering pipe 0 processing time

Number of clocks that steering pipe 0 worked

Steering pipe 1 processing time

Number of clocks that steering pipe 1 worked

WQE address translation back-pressure

No credits between RXW and TPT

Receive WQE cache miss

Number of packets that missed in the RWqe buffer L0 cache

Receive WQE cache hit

Number of packets that hit in the RWqe buffer L0 cache

Slow packets miss in LDB L1 cache

Number of slow packets that missed in the LDB L1 cache

Slow packets hit in LDB L1 cache

Number of slow packets that hit in the LDB L1 cache

Fast packets miss in LDB L1 cache

Number of fast packets that missed in the LDB L1 cache

Fast packets hit in LDB L1 cache

Number of fast packets that hit in the LDB L1 cache

Packets miss in LDB L2 cache

Number of packets that missed in the LDB L2 cache

Packets hit in LDB L2 cache

Number of packets that hit in the LDB L2 cache

Slow packets miss in REQSL L1

Number of slow packets that missed in the REQSL L1 fast cache

Slow packets hit in REQSL L1

Number of slow packets that hit in the REQSL L1 fast cache

Fast packets miss in REQSL L1

Number of fast packets that missed in the REQSL L1 fast cache

Fast packets hit in REQSL L1

Number of fast packets that hit in the REQSL L1 fast cache

Packets miss in REQSL L2

Number of packets that missed in the REQSL L2 fast cache

Packets hit in REQSL L2

Number of packets that hit in the REQSL L2 fast cache

No PXT credits time

Number of clocks in which there were no PXT credits

EQ slices busy time

Number of clocks where all EQ slices were busy

CQ slices busy time

Number of clocks where all CQ slices were busy

MSIX slices busy time

Number of clocks where all MSIX slices were busy

QP done due to VL limited

Number of QP done scheduling due to VL limited (e.g. lack of VL credits)

QP done due to desched

Number of QP done scheduling due to de-scheduling (Tx full burst size)

QP done due to work done

Number of QP done scheduling due to work done (Tx all QP data)

QP done due to limited

Number of QP done scheduling due to limited rate (e.g. max read)

QP done due to E2E credits

Number of QP done scheduling due to e2e credits (other peer credits)

Packets sent by SXW to SXP

Number of packets that were authorized to send by SXW (to SXP)

Steering hit

Number of steering lookups that were hit

Steering miss

Number of steering lookups that missed

Steering processing time

Number of clocks that steering pipe worked

No send credits for scheduling time

The number of clocks in which there were no credits for scheduling (Tx)

No slow path send credits for scheduling time

The number of clocks in which there were no credits for scheduling (Tx) for slow path

TPT indirect memory key access

The number of indirect mkey accesses

Internal RQ out of buffer

Number of times the device that owned the queue had insufficient number of buffers allocated

Nic temperature in Celsius degrees unit

The temperature of the NIC in degrees Celsius
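Many of the counters above come in hit/miss pairs (L0/L1 MTT, L0/L1 MPT, Receive WQE cache, LDB, REQSL, Steering). A hit ratio is often easier to interpret than the raw values. The following Python sketch computes it from one pair of counter readings; the function name is illustrative, and no threshold for a "good" ratio is implied by this section.

```python
from typing import Optional


def hit_ratio(hits: int, misses: int) -> Optional[float]:
    """Return hits / (hits + misses), or None when there were no accesses."""
    if hits < 0 or misses < 0:
        raise ValueError("counter values must be non-negative")
    total = hits + misses
    if total == 0:
        return None  # ratio is undefined without any accesses
    return hits / total
```

For example, 3 hits and 1 miss give a ratio of 0.75. Because these counters are global for the device, the ratio describes the device as a whole rather than a single adapter.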

The Mellanox WinOF-2 PCI Device Diagnostic counter set consists of the following counters:

Mellanox WinOF-2 PCI Device Diagnostic

Description

PCI back-pressure cycles

The number of clocks where BP was received from the PCI, while trying to send a packet to the host.

PCI back-pressure cycles/Sec

The rate of clocks where BP was received from the PCI, while trying to send a packet to the host.

PCI write back-pressure cycles

The number of clocks where there was lack of posted outbound credits from the PCI, while trying to send a packet to the host.

PCI write back-pressure cycles/Sec

The rate of clocks where there was lack of posted outbound credits from the PCI, while trying to send a packet to the host.

PCI read back-pressure cycles

The number of clocks where there was lack of non-posted outbound credits from the PCI, while trying to send a packet to the host.

PCI read back-pressure cycles/Sec

The rate of clocks where there was lack of non-posted outbound credits from the PCI, while trying to send a packet to the host.

PCI read stuck no receive buffer

The number of clocks where there was lack in global byte credits for non-posted outbound from the PCI, while trying to send a packet to the host.

Available PCI BW/Sec

The number of 128-byte units, per second, that are made available by the host.

Used PCI BW/Sec

The number of 128-byte units, per second, that were received from the host.

Available PCI BW

[Deprecated] The number of 128-byte units that are made available by the host.

Used PCI BW

[Deprecated] The number of 128-byte units that were received from the host.

RX PCI errors

The number of physical layer PCIe signal integrity errors. The number of transitions to recovery due to Framing errors and CRC (dlp and tlp). If the counter is advancing, try to change the PCIe slot in use.

Note: Only a continuous increment of the counter value is considered an error.

TX PCI errors

The number of physical layer PCIe signal integrity errors. The number of transitions to recovery initiated by the other side (moving to Recovery due to getting TS/EIEOS). If the counter is advancing, try to change the PCIe slot in use.

Note: Transitions to recovery can happen during initial machine boot. The counter should not increment after boot.

Note: Only a continuous increment of the counter value is considered an error.

TX PCI non-fatal errors

The number of PCIe transport layer non-fatal error messages sent. If the counter is advancing, try to change the PCIe slot in use.

TX PCI fatal errors

The number of PCIe transport layer fatal error messages sent. If the counter is advancing, try to change the PCIe slot in use.

PCI link width

The current width of the PCIe link. To get the overall PCIe bandwidth, multiply the PCI link width by the PCI link speed.

PCI link speed

The current speed of the PCIe link. To get the overall PCIe bandwidth, multiply the PCI link speed by the PCI link width.

RX Packet Drops PCIe Buffers

Number of packets dropped by Weighted Random Early Detection (WRED) function.

RX Packet Marked PCIe Buffers

Number of packets marked as ECN.
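The bandwidth-related counters in the table above lend themselves to a few small calculations. The sketch below is illustrative only: the function names are hypothetical, the units of the PCI link speed counter are not stated in this document (so the product is returned in whatever units the speed uses), and treating used/available as a utilization ratio is an assumption.

```python
from typing import Optional

PCI_BW_UNIT_BYTES = 128  # the PCI BW counters count 128-byte units


def units_to_bytes(units: int) -> int:
    """Convert a PCI BW counter value (128-byte units) to bytes."""
    return units * PCI_BW_UNIT_BYTES


def pci_utilization(used_per_sec: int, available_per_sec: int) -> Optional[float]:
    """Assumed utilization: Used PCI BW/Sec divided by Available PCI BW/Sec."""
    if available_per_sec == 0:
        return None  # nothing available, so the ratio is undefined
    return used_per_sec / available_per_sec


def overall_link_bandwidth(link_width: int, link_speed: float) -> float:
    """Overall PCIe bandwidth as described above: link width times link speed."""
    return link_width * link_speed


# Hypothetical sample values.
print(units_to_bytes(1_000_000))      # bytes represented by 1,000,000 units
print(pci_utilization(50, 200))       # fraction of available bandwidth used
print(overall_link_bandwidth(16, 8))  # width 16, per-lane speed 8
```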

Warning

Mellanox WinOF-2 VF Diagnostics counters exist for each VF and are created according to the adapter's configuration. These counters are created when the VFs are configured, even if the VFs are not up.

Mellanox WinOF-2 VF Diagnostics counters set consists of VF diagnostic and debug counters. This set is available only on the hypervisors and not on the virtual network adapters:

Mellanox WinOF-2 VF Diagnostics

Description

Async EQ Overrun

The number of times an EQ mapped to the Async events queue encountered a queue overrun.

Completion EQ Overrun

The number of times an EQ mapped to the Completion events queue encountered a queue overrun.

Current Queues Under Processor Handle

The current number of queues that are handled by the processor due to an Async error (e.g. retry exceeded) or due to a CMD error (e.g. 2err_qp cmd).

Total Queues Under Processor Handle

The total number of queues that are handled by the processor due to an Async error (e.g. retry exceeded) or due to a CMD error (e.g. 2err_qp cmd).

Packets Received dropped due to Steering

Number of packets that completed the NIC Receive FlowTable steering and were discarded because no rule in the Flow Table matched.

Packets Received dropped due to VPort Down

Number of packets that were steered to a VPort, and discarded because the VPort was not in a state to receive packets

Packets Transmitted dropped due to VPort Down

Number of packets that were transmitted by a vNIC, and discarded because the VPort was not in a state to transmit packets.

Invalid Commands

Number of commands issued by the VF that failed.

Quota Exceeded Command

Number of commands issued by the VF that failed due to quota exceeded.

Send Queue Priority Update Flow

The total number of QP/SQ priority/SL update events.

Packets Received WQE too small

The number of packets that reached the Ethernet RQ but cannot fit into the WQE due to their large size

CQ Overrun

Number of times CQs entered an error state due to overflow

Packets Received dropped due to lack of receive WQEs

Number of packets dropped due to lack of receive WQEs for internal device RQs

Warning

Mellanox WinOF-2 VF Internal Traffic Counters are relevant for Physical Functions ONLY.

Mellanox WinOF-2 VF Internal Traffic Counters set consists of counters that measure the rates at which bytes and packets are sent and received over each core of a virtual port that is bound to a virtual PCI function.

This set is available only on hypervisors, and each virtual network adapter should be allowed to update its counters by using the mlx5cmd tool.

Warning

The virtual network adapter driver should support internal traffic counter set exposure in order to make the set available on the hypervisor.

Warning

These counters are relevant only for ETH ports.

Mellanox WinOF-2 VF Internal Traffic

Description

Receive Packets

The number of packets received by this virtual adapter on a specific core.

Receive Octets

The number of bytes received by this virtual adapter on a specific core. The counted bytes do not include framing characters. The value is reported modulo 2^64.

Transmit Packets

The number of packets sent by this virtual adapter on a specific core.

Transmit Octets

The number of bytes sent by this virtual adapter on a specific core. The counted bytes do not include framing characters. The value is reported modulo 2^64.
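Because the octet counters are reported modulo 2^64, a consumer that computes the difference between two samples should account for wrap-around. A minimal sketch follows, assuming at most one wrap occurred between the two samples; the function name is hypothetical.

```python
COUNTER_MODULUS = 2 ** 64  # octet counters are reported modulo 2^64


def counter_delta(previous: int, current: int) -> int:
    """Bytes counted between two samples, tolerating a single wrap-around."""
    return (current - previous) % COUNTER_MODULUS


# The second call simulates a counter that wrapped between the two samples.
print(counter_delta(100, 250))
print(counter_delta(2 ** 64 - 5, 10))
```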

Controlling VF Internal Traffic

VF Internal Traffic Counters can be controlled using the mlx5cmd.exe tool. The tool enables the user to make the virtual network adapter's per-core traffic counters available or unavailable to performance monitoring consumers.

Usage:

mlx5cmd.exe -VfStats -name <adapter> -vf <virtual function ID> [-register -rate <in 100 mSec.> | -unregister]

Detailed usage:

mlx5cmd.exe -VfStats -hh
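For scripted use, the command line from the usage string above can be assembled programmatically. This is a sketch only: the helper name is hypothetical, the adapter name and rate value are placeholders, and requiring -rate whenever -register is given is an assumption read off the usage string. Consult mlx5cmd.exe -VfStats -hh for the authoritative option semantics.

```python
from typing import List, Optional


def build_vfstats_command(adapter: str, vf_id: int, register: bool,
                          rate: Optional[int] = None) -> List[str]:
    """Build the argument list for 'mlx5cmd.exe -VfStats' (not executed here)."""
    args = ["mlx5cmd.exe", "-VfStats", "-name", adapter, "-vf", str(vf_id)]
    if register:
        if rate is None:
            # Assumption: the usage string pairs -register with -rate.
            raise ValueError("-register requires a -rate value (in 100 mSec units)")
        args += ["-register", "-rate", str(rate)]
    else:
        args.append("-unregister")
    return args


# Placeholder adapter name and VF ID; pass the list to subprocess.run() to execute.
print(build_vfstats_command("Ethernet 4", 0, register=True, rate=10))
print(build_vfstats_command("Ethernet 4", 0, register=False))
```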

Warning

This counter set is relevant only for ETH ports.

Warning

Mellanox WinOF-2 Rss counters may affect performance while they are active.

Mellanox WinOF-2 Rss Counters set provides monitoring for hardware RSS behavior. These counters are cumulative and count packets per type (IPv4 or IPv6 only, IPv4/6 TCP or UDP), separately for tunneled and non-tunneled traffic, and separately for the cases where the hardware RSS is functional or dysfunctional.

The counters are activated upon first addition into perfmon, and are stopped upon removal.

Setting the "RssCountersActivatedAtStartup" registry key to 1 in the NIC properties causes the Rss counters to collect data from the startup of the device.

All Rss counters are provided under the counter set “Mellanox Adapter Rss Counters”.

Each Ethernet adapter provides multiple instances:

  • Instance per vPort per CPU in HwRSS mode is formatted: <NetworkAdapter> + vPort_<id> CPU_<cpu>

  • Instance per network adapter per CPU in native Rss per CPU is formatted: <NetworkAdapter> CPU_<cpu>
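A monitoring script may need to split these instance names back into their parts. The sketch below is an assumption-laden illustration: it accepts the two formats listed above, tolerates the "+" separator being either literal or absent (the document does not say which), and uses hypothetical adapter names.

```python
import re
from typing import NamedTuple, Optional


class RssInstance(NamedTuple):
    adapter: str
    vport: Optional[int]  # None for native RSS instances
    cpu: int


# Optional "+ vPort_<id>" part, followed by a mandatory "CPU_<cpu>" suffix.
_INSTANCE_RE = re.compile(
    r"^(?P<adapter>.+?)(?:\s*\+?\s*vPort_(?P<vport>\d+))?\s+CPU_(?P<cpu>\d+)$"
)


def parse_rss_instance(name: str) -> Optional[RssInstance]:
    """Parse an Rss counter instance name; return None if it does not match."""
    match = _INSTANCE_RE.match(name)
    if match is None:
        return None
    vport = match.group("vport")
    return RssInstance(
        adapter=match.group("adapter"),
        vport=int(vport) if vport is not None else None,
        cpu=int(match.group("cpu")),
    )


# Hypothetical instance names.
print(parse_rss_instance("Ethernet 2 + vPort_3 CPU_7"))
print(parse_rss_instance("Ethernet 2 CPU_0"))
```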

Mellanox WinOF-2 Rss

Description

Number of interrupts

Number of interrupts generated to process RX completions.

Rss IPv4 Only

Shows the number of received packets that have RSS hash calculated on IPv4 header only

Rss IPv4/TCP

Shows the number of received packets that have RSS hash calculated on IPv4 and TCP headers

Rss IPv4/UDP

Shows the number of received packets that have RSS hash calculated on IPv4 and UDP headers

Rss IPv6 only

Shows the number of received packets that have RSS hash calculated on IPv6 header only

Rss IPv6/TCP

Shows the number of received packets that have RSS hash calculated on IPv6 and TCP headers

Rss IPv6/UDP

Shows the number of received packets that have RSS hash calculated on IPv6 and UDP headers

Encapsulated Rss IPv4 Only

Shows the number of received encapsulated packets that have RSS hash calculated on IPv4 header only

Encapsulated Rss IPv4/TCP

Shows the number of received encapsulated packets that have RSS hash calculated on IPv4 and TCP headers

Encapsulated Rss IPv4/UDP

Shows the number of received encapsulated packets that have RSS hash calculated on IPv4 and UDP headers

Encapsulated Rss IPv6 Only

Shows the number of received encapsulated packets that have RSS hash calculated on IPv6 header only

Encapsulated Rss IPv6/TCP

Shows the number of received encapsulated packets that have RSS hash calculated on IPv6 and TCP headers

Encapsulated Rss IPv6/UDP

Shows the number of received encapsulated packets that have RSS hash calculated on IPv6 and UDP headers

NonRss IPv4 Only

Shows the number of IPv4 packets that have no RSS hash calculated by the hardware

NonRss IPv4/TCP

Shows the number of IPv4 TCP packets that have no RSS hash calculated by the hardware

NonRss IPv4/UDP

Shows the number of IPv4 UDP packets that have no RSS hash calculated by the hardware

NonRss IPv6 Only

Shows the number of IPv6 packets that have no RSS hash calculated by the hardware

NonRss IPv6/TCP

Shows the number of IPv6 TCP packets that have no RSS hash calculated by the hardware

NonRss IPv6/UDP

Shows the number of IPv6 UDP packets that have no RSS hash calculated by the hardware

Encapsulated NonRss IPv4 Only

Shows the number of encapsulated IPv4 packets that have no RSS hash calculated by the hardware

Encapsulated NonRss IPv4/TCP

Shows the number of encapsulated IPv4 TCP packets that have no RSS hash calculated by the hardware

Encapsulated NonRss IPv4/UDP

Shows the number of encapsulated IPv4 UDP packets that have no RSS hash calculated by the hardware

Encapsulated NonRss IPv6 Only

Shows the number of encapsulated IPv6 packets that have no RSS hash calculated by the hardware

Encapsulated NonRss IPv6/TCP

Shows the number of encapsulated IPv6 TCP packets that have no RSS hash calculated by the hardware

Encapsulated NonRss IPv6/UDP

Shows the number of encapsulated IPv6 UDP packets that have no RSS hash calculated by the hardware

Rss Misc

Shows the number of received packets that have RSS hash calculated with unknown RSS hash type

Encapsulated Rss Misc

Shows the number of received encapsulated packets that have RSS hash calculated with unknown RSS hash type

NonRss Misc

Shows the number of packets that have no RSS hash calculated by the hardware for no apparent reason

Encapsulated NonRss Misc

Shows the number of encapsulated packets that have no RSS hash calculated by the hardware for no apparent reason
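One way to summarize the Rss table above is the fraction of received packets for which the hardware calculated no RSS hash. The sketch below assumes that each received packet is counted in exactly one Rss or NonRss counter of an instance, which this document does not state explicitly; the function name and the sample values are hypothetical.

```python
from typing import Dict, Optional


def non_rss_fraction(counters: Dict[str, int]) -> Optional[float]:
    """Fraction of packets counted by NonRss counters out of all Rss/NonRss packets."""
    hashed = 0
    unhashed = 0
    for name, value in counters.items():
        if "NonRss" in name:    # "NonRss ..." and "Encapsulated NonRss ..."
            unhashed += value
        elif "Rss" in name:     # "Rss ..." and "Encapsulated Rss ..."
            hashed += value
        # Other counters, such as "Number of interrupts", are ignored.
    total = hashed + unhashed
    if total == 0:
        return None
    return unhashed / total


# Hypothetical sample of one counter instance.
sample = {
    "Number of interrupts": 123,
    "Rss IPv4/TCP": 900,
    "Encapsulated Rss IPv6/UDP": 50,
    "NonRss IPv4 Only": 40,
    "Encapsulated NonRss Misc": 10,
}
print(non_rss_fraction(sample))
```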

Mellanox WinOF-2 Receive Datapath counters set provides counters per receive queue. These counters are available in Native, VMQ and SR-IOV modes, and provide visibility into the driver when running traffic. Each Ethernet adapter provides multiple instances. An instance per vPort per queue number is formatted as one of the following, depending on the mode set (Native or VMQ/SR-IOV):

  • <NetworkAdapter> + RqNum_<num>

  • <NetworkAdapter> + vPort_<id> + RqNum_<num>

Warning

This counter set is relevant only for ETH ports.

Mellanox WinOF-2 Receive Datapath

Description

Cpu Number

The CPU where the driver processes the queue completions.

Drops due to invalid packet size

Advances when a packet is received with a size that is larger than the maximum MTU size allowed, which is the maximum size the HW supports. The value can be checked in the MtuSize field of the NDIS miniport adapter general attributes struct.

Number of receive buffers posted

When this counter is not advancing, the SW/HW might be stuck: either the SW is not processing the receive requests or the HW is not using the posted receives. To check the state of the WQ/CQ, check the error events log messages.

Average packet count per indicate

The average number of handled send packets per indicate call to NDIS. The average is the number of packets completed divided by the number of indicates to NDIS.

Packets in low resource mode

Counts packets handled in low resource mode. This mode applies when low resources are forced (the ForceLowResourcesIndication registry key is set to 1; the default is 0) or when the number of outstanding posted receives is lower than the configured minimum number of RFDs (the NicMinRfds registry key).

Packets processed in interrupt mode

The number of packets indicated to NDIS during interrupt. The counter progresses as the argument “NumberOfNetBufferLists” in the function "NdisMIndicateReceiveNetBufferLists" progresses when it is called during interrupt handling.

Packets processed in polling mode

The number of packets indicated to NDIS while in polling mode.

Consumed max receives

Number of times the driver processed a number of packets higher than the maximum calls to NDIS Indicate (the value of the MaxCallsToNdisIndicate registry key). When this counter progresses, the driver stops processing any more packets.

Note: The counters “Packets processed in polling mode” and “Packets processed in interrupt mode” also progress accordingly.

Number of traffic profile transitions

Number of times the core’s Receive Queue changed its traffic profile (Latency/Throughput).

DpcWatchDog (SingleDpc) Starvation

The number of times the driver had watchdog starvation during DPC and re-submitted a DPC. When this counter progresses, DPC does not process any packets, meaning counters 6-10 will not progress.

DpcWatchDog (TotalDpc) Starvation

The number of times the driver had watchdog starvation during DPC and moved to. When this counter progresses, DPC does not process any packets, meaning counters 6-10 will not progress.

Drops due to completion queue errors

The number of receive drops due to CQE errors.

Interrupts on incorrect cpu

The number of received interrupts on a wrong CPU. In this case, the driver re-submits a DPC on the correct CPU.

Number of interrupts

Number of Receive Datapath interrupts.

Strided Wqes

The number of WQEs whose strides were consumed by the HW. This counter should progress only if the StridingRQ feature is enabled (check the StridingRqEnabled registry key).

Counters: For every N > 0 packets received, the packets counter should be incremented by N. The WQE counter can be incremented by a value in the range [ceil(N / number of strides in a WQE), N].

Ecn Marked Packets (Ipv4)

The number of times the driver marked an IPv4 packet with ECN.

Ecn Marked Packets (Ipv6)

The number of times the driver marked an IPv6 packet with ECN.

Packets processed in NDIS poll mode

When the feature is enabled, the “Packets processed in interrupt mode” and “Packets processed in polling mode” counters are not incremented.
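The Strided Wqes entry above bounds how far the WQE counter may advance for N received packets. The following sketch computes that range, reading the lower end as the ceiling of N divided by the number of strides in a WQE; the function name and the sample stride count are hypothetical.

```python
from typing import Tuple


def strided_wqe_increment_bounds(packets: int, strides_per_wqe: int) -> Tuple[int, int]:
    """Return (min, max) expected increment of the WQE counter for N packets."""
    if packets <= 0 or strides_per_wqe <= 0:
        raise ValueError("packets and strides_per_wqe must be positive")
    # Fewest WQEs: every stride is used, one packet per stride (integer ceiling).
    lowest = (packets + strides_per_wqe - 1) // strides_per_wqe
    # Most WQEs: every packet consumes a WQE of its own.
    highest = packets
    return lowest, highest


# Hypothetical values: 100 packets, 8 strides per WQE.
print(strided_wqe_increment_bounds(100, 8))
```

An observed WQE counter delta outside this range for a given packet delta would suggest that the two counters were not sampled consistently.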

Mellanox WinOF-2 Transmit Datapath counters set provides counters per transmit queue. These counters are available in Native, VMQ and SR-IOV modes, and provide visibility into the driver when running traffic. Each Ethernet adapter provides multiple instances. An instance per vPort per queue number is formatted as one of the following, depending on the mode (Native or VMQ/SR-IOV):

  • <NetworkAdapter> + SqNum_<num>

  • <NetworkAdapter> + vPort_<id> + SqNum_<num>

Warning

This counter set is relevant only for ETH ports.

Mellanox WinOF-2 Transmit Datapath

Description

Cpu Number

The CPU where the driver processes the queue completions.

Transmit ring is full

Counts the number of times the transmit ring was full during sends.

Transmit copy packets

Counts the number of times a packet had to be copied during sends. This can happen when a packet has a size larger than the HW supports.

Number of packets posted

The number of send requests that have been forwarded to the HW (packets that are pending are not counted).

Number of packets completed

Counts the number of processed and completed sends. When it progresses, the resources allocated to the sent packets are freed.

OS call to build SGL failed

The LSO header size cannot be received if SKB allocation fails or the packet has an invalid size.

Drops due to invalid packet size

Number of packets with an invalid size. The “OS call to build SGL failed” counter should also progress in this case.

Number of packets posted in bypass mode

Number of packets detected by driver as forwarded.

Average packet count per indicate

The average number of handled send packets per indicate call to NDIS. The average is the number of packets completed divided by the number of indicates to NDIS.

Interrupts on incorrect cpu

The number of times the TX received a completion on the wrong CPU. In such a case, the driver re-submits a DPC on the correct CPU.

CQ Overrun

Number of times a CQ entered an error state due to overflow. Overflow occurs when the device tries to post a CQE into a full CQ buffer.
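The “Number of packets posted” and “Number of packets completed” counters above can be compared to estimate how many sends are still outstanding in the HW. The sketch below assumes that both counters start from the same point and are sampled together, which this document does not guarantee; all names and values are hypothetical.

```python
from typing import NamedTuple


class TxSample(NamedTuple):
    posted: int     # "Number of packets posted"
    completed: int  # "Number of packets completed"


def outstanding_sends(sample: TxSample) -> int:
    """Sends forwarded to the HW that have not been reported as completed."""
    return sample.posted - sample.completed


def completions_stalled(previous: TxSample, current: TxSample) -> bool:
    """True if sends are outstanding but no completion arrived between samples."""
    return outstanding_sends(current) > 0 and current.completed == previous.completed


# Hypothetical samples taken a few seconds apart.
earlier = TxSample(posted=1000, completed=990)
later = TxSample(posted=1020, completed=990)
print(outstanding_sends(later))
print(completions_stalled(earlier, later))
```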

Mellanox WinOF-2 Port Diagnostics counters set contains physical layer statistical counters. This set exists for every adapter in the PF; it is not supported in the VF.

Mellanox WinOF-2 Port Diagnostics

Description

RX Error Lane0 phy

The number of error bits on lane 0

RX Error Lane0 phy/Sec

The rate of change of the lane 0 counter

RX Error Lane1 phy

The number of error bits on lane 1

RX Error Lane1 phy/Sec

The rate of change of the lane 1 counter

RX Error Lane2 phy

The number of error bits on lane 2

RX Error Lane2 phy/Sec

The rate of change of the lane 2 counter

RX Error Lane3 phy

The number of error bits on lane 3

RX Error Lane3 phy/Sec

The rate of change of the lane 3 counter

RX Kbits phy

The total amount of traffic that could have been received on the port

RX Kbits phy/Sec

The rate of change of the above counter

RX PCS Corrected Bits phy

The number of corrected bits on this port according to the active FEC (RS/FC).

If this counter is increasing, it implies that the link between the NIC and the network is suffering from high BER

RX PCS Corrected Bits phy/Sec

The rate of change of the above counter

RX PCS Symbol Error phy

The number of symbol errors that were not corrected by the FEC correction algorithm, or that were received while the FEC algorithm was not active on this interface

RX PCS Symbol Error phy/Sec

The rate of change of the above counter
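The /Sec counters in the table above report the rate of change of their base counters. When only the base counter is sampled, an equivalent rate can be derived from two readings. The sketch below illustrates this; the function names, sample values and sampling interval are hypothetical, and counter wrap-around is not handled.

```python
from typing import Sequence


def rate_per_second(previous: int, current: int, elapsed_seconds: float) -> float:
    """Average rate of change of a counter between two samples."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (current - previous) / elapsed_seconds


def is_continuously_increasing(samples: Sequence[int]) -> bool:
    """True if every sample is strictly greater than the one before it."""
    return len(samples) > 1 and all(b > a for a, b in zip(samples, samples[1:]))


# Hypothetical readings of an error counter taken 2 seconds apart.
print(rate_per_second(1000, 1600, 2.0))
print(is_continuously_increasing([1000, 1600, 2300]))
```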

© Copyright 2023, NVIDIA. Last updated on May 23, 2023.