NVIDIA® WinOF VPI Documentation v5.50.54000

Device Proprietary Counters

Device proprietary counters are per device, not per port.

These counters are intended for advanced debugging of performance issues and may be used by Mellanox support to identify the root cause in such cases. A nonzero value does not necessarily indicate a problem, but the counters often provide useful additional information when debugging performance issues.

Name

Description

PCI Back-pressure/sec

Device core clocks without PCIe read/write credits.

The value grows as the host’s ability to receive data from the NIC decreases.

Possible causes: the memory accessed is not cached or aligned properly, or CPU frequency is low or throttled by power management.

No-WQE drops/sec

Number of times per second that a receive queue from the device to the host has no software buffers (WQEs, Work Queue Entries) allocated for the adapter's incoming traffic. This counter indicates that the NIC hardware could not post received data to the host because no software-allocated buffers were available.

Possible causes: Slow or overloaded CPU cores.

Possible fixes: Increase the number of receive buffers in the driver's advanced properties tab.

This counter is summed in OID_GEN_STATISTICS.ifInDiscards and is not counted in Packets Received.

Scatter Back-pressure/sec

Device core clocks where Scatter delays Rx packet processing. Supported only on ConnectX-3 Pro.

WQE fetch/Atomic Back-pressure/sec

Device core clocks where a Work-Queue-Element fetch or an Atomic operation delays Rx packet processing. Supported only on ConnectX-3 Pro.

Steering/QPC Back-pressure/sec

Device core clocks where packet steering or queue-context handling delays Rx packet processing. Supported only on ConnectX-3 Pro.

SQ Miss/sec

Transmit-queue/Requester-QP context cache miss.

RQ Miss/sec

Receive-queue/Responder-QP context cache miss.

CQ Miss/sec

Completion-Queue (CQ) context cache miss.

EQ Miss/sec

Event-Queue (EQ) context cache miss.

MTT Miss/sec

Address translation page table (MTT) cache miss.

MPT Miss/sec

Address translation region table (MPT) cache miss.

External Blueflame hit/sec

Latency critical work-queue-element (BlueFlame) read from NIC buffer.

External Blueflame replace/sec

Latency critical work-queue-element (BlueFlame) swap out from NIC buffer.

External Doorbell push/sec

Number of doorbells received.

External Doorbell drop/sec

Number of doorbells dropped.
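The back-pressure counters above are expressed in device core clocks per second, so they are easier to compare once converted to a fraction of the core clock. The following is a minimal sketch of that arithmetic, not a vendor tool. The core clock frequency is not stated in this document, so `ASSUMED_CORE_CLOCK_HZ` is a hypothetical placeholder, and the helper names `backpressure_fraction` and `review_sample` are illustrative only. The hints simply restate the possible causes and fixes listed in the table.

```python
# Hypothetical value: the device core clock frequency is not given in this
# document. Replace it with the figure for your adapter.
ASSUMED_CORE_CLOCK_HZ = 400_000_000


def backpressure_fraction(clocks_per_sec, core_clock_hz=ASSUMED_CORE_CLOCK_HZ):
    """Return the fraction (0.0 to 1.0) of core clocks spent in back-pressure."""
    if core_clock_hz <= 0:
        raise ValueError("core_clock_hz must be positive")
    return min(clocks_per_sec / core_clock_hz, 1.0)


def review_sample(sample, core_clock_hz=ASSUMED_CORE_CLOCK_HZ):
    """Return a list of hints for one sample of per-second counter values.

    `sample` maps counter names from the table above to their values.
    A nonzero counter is not necessarily a problem; the hints only repeat
    the possible causes and fixes documented for each counter.
    """
    hints = []

    pci = sample.get("PCI Back-pressure/sec", 0)
    if pci > 0:
        pct = 100 * backpressure_fraction(pci, core_clock_hz)
        hints.append(
            f"PCI back-pressure: about {pct:.1f}% of core clocks had no PCIe "
            "credits; check memory caching/alignment and CPU frequency throttling."
        )

    drops = sample.get("No-WQE drops/sec", 0)
    if drops > 0:
        hints.append(
            f"No-WQE drops: {drops}/sec; consider increasing the number of "
            "receive buffers in the driver's advanced properties."
        )

    return hints


if __name__ == "__main__":
    example = {"PCI Back-pressure/sec": 100_000_000, "No-WQE drops/sec": 12}
    for hint in review_sample(example):
        print(hint)
```

Because the percentage depends entirely on the assumed clock frequency, treat it as a relative indicator between samples on the same adapter rather than an absolute measurement.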

This set of counters contains the device’s low-level counters, which are used for debugging and behavior analysis.

Mellanox WinOF Bus Counters

Description

PCI Back-pressure/sec

Device core clocks without PCIe read/write credits.

No-WQE Drops/sec

The number of packets dropped because no receive buffers were available in the host.

Scatter Back-pressure/sec

Device core clocks where Scatter delays Rx packet processing. Supported only on ConnectX-3 Pro.

WQE fetch/Atomic Back-pressure/sec

Device core clocks where a Work-Queue-Element fetch or an Atomic operation delays Rx packet processing. Supported only on ConnectX-3 Pro.

Steering/QPC Back-pressure/sec

Device core clocks where packet steering or queue-context handling delays Rx packet processing. Supported only on ConnectX-3 Pro.

Receive WQE cache hit/sec

The number of receive WQE cache lookups that resulted in a hit.

Receive WQE cache lookup/sec

The number of receive WQE cache lookups.

SQ Miss/sec

Transmit-queue/Requester-QP context cache miss.

RQ Miss/sec

Receive-queue/Responder-QP context cache miss.

CQ Miss/sec

Completion-Queue (CQ) context cache miss.

EQ Miss/sec

Event-Queue (EQ) context cache miss.

MTT Miss/sec

Address translation page table (MTT) cache miss.

MPT Miss/sec

Address translation region table (MPT) cache miss.

External Blueflame hit/sec

Latency critical work-queue-element (BlueFlame) read from NIC buffer.

External Blueflame Replace/sec

Latency critical work-queue-element (BlueFlame) swap out from NIC buffer.

External Doorbell Push/sec

Amount of doorbells received.

External Doorbell Drop/sec

Amount of doorbells dropped.

Internal Processor0 Maximum Latency

The longest internal processor[0] process cycle, in microseconds.

Internal Processor1 Maximum Latency

The longest internal processor[1] process cycle, in microseconds.

Internal Processor2 Maximum Latency

The longest internal processor[2] process cycle, in microseconds.

Internal Processor3 Maximum Latency

The longest internal processor[3] process cycle, in microseconds.

Internal processor executed commands

The number of commands executed by the internal processor in response to driver requests via the HCR command interface.

Last Retransmitted QP

The last QP that performed retransmission - RC QP only.

Current QPS in error state

The number of QPs in an error state due to an asynchronous error (e.g. retry exceeded) or due to a CMD with errors (e.g. the 2ERR_QP command).

QP priority update flow events

The number of QP priority/SL update events.

Transmission engine hang events

The number of SX execution engine hang events.

Current QPS in limited state

The number of QPs that are in a limited state.

Total QPS in limited state

The total number of QPs that were in a limited state.

Maximum QPS in limited state

The maximum number of QPs that were in a limited state at the same time.

MPT entries used for QP

The number of Memory Protection Table (MPT) entries used for QPs.

MPT entries used for CQ

The number of Memory Protection Table (MPT) entries used for CQs.

MPT entries used for EQ

The number of Memory Protection Table (MPT) entries used for EQs.

MPT entries used for MR

The number of Memory Protection Table (MPT) entries used for MRs.

MTT entries used for QP

The number of Memory Translation Table (MTT) entries used for QPs.

MTT entries used for CQ

The number of Memory Translation Table (MTT) entries used for CQs.

MTT entries used for EQ

The number of Memory Translation Table (MTT) entries used for EQs.

MTT entries used for MR

The number of Memory Translation Table (MTT) entries used for MRs.

CPU MEM-pages (4K) mapped by TPT for QP

The total number of CPU memory pages (4K) mapped by TPT for QPs.

CPU MEM-pages (4K) mapped by TPT for CQ

The total number of CPU memory pages (4K) mapped by TPT for CQs.

CPU MEM-pages (4K) mapped by TPT for EQ

The total number of CPU memory pages (4K) mapped by TPT for EQs.

CPU MEM-pages (4K) mapped by TPT for MR

The total number of CPU memory pages (4K) mapped by TPT for MRs.

Arrived RDMA CNPs

The total number of received CNP packets for both ports.

Packets discarded due to invalid QP

The number of packets discarded due to an invalid QP.
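Some counters in this table are most useful in combination. The sketch below shows two such derived values, assuming the counter values have already been collected into a dictionary keyed by the counter names above: the receive WQE cache hit ratio (hits divided by lookups) and the total host memory mapped by TPT, converted from 4K pages to bytes. The function names and the sample dictionary are illustrative assumptions and are not part of the driver.

```python
PAGE_SIZE_BYTES = 4 * 1024  # the counter names state that pages are 4K


def wqe_cache_hit_ratio(sample):
    """Return receive WQE cache hits / lookups, or None if there were no lookups."""
    lookups = sample.get("Receive WQE cache lookup/sec", 0)
    hits = sample.get("Receive WQE cache hit/sec", 0)
    if lookups <= 0:
        return None
    return hits / lookups


def tpt_mapped_bytes(sample):
    """Sum the four 'CPU MEM-pages (4K) mapped by TPT for ...' counters, in bytes."""
    total_pages = sum(
        sample.get(f"CPU MEM-pages (4K) mapped by TPT for {kind}", 0)
        for kind in ("QP", "CQ", "EQ", "MR")
    )
    return total_pages * PAGE_SIZE_BYTES


if __name__ == "__main__":
    # Made-up values for illustration only.
    example = {
        "Receive WQE cache hit/sec": 900,
        "Receive WQE cache lookup/sec": 1000,
        "CPU MEM-pages (4K) mapped by TPT for QP": 2,
        "CPU MEM-pages (4K) mapped by TPT for MR": 6,
    }
    print("hit ratio:", wqe_cache_hit_ratio(example))
    print("TPT-mapped bytes:", tpt_mapped_bytes(example))
```

Returning `None` when there are no lookups avoids a division by zero and distinguishes "no traffic" from a genuine 0% hit ratio.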

© Copyright 2023, NVIDIA. Last updated on Oct 26, 2023.