Date: 2025-08-25

MlxNdPerf is a new tool that replaces all the Network Direct test applications shipped with older drivers (e.g., nd_write_bw, nd_read_bw, nd_send_bw, nd_*_lat). The tool determines the maximum achievable performance under various parameters, as well as the RDMA Read/Write/Send performance currently available between two endpoints.

The tool accepts the following command-line parameters:

The Client or Server role determines whether this side acts as the RDMA requestor or the responder (Client → Requestor, Server → Responder).

Usage MlxNdPerf -Server\-Client

Determines the RDMA operation to perform. Only one operation can be specified per run.

Usage MlxNdPerf -Read\-Write\-Send

Determines the multi-QP RDMA operation, which sends data over several QPs.

The number of QPs is defined by the "-NumOfQps" parameter.

Usage MlxNdPerf -ReadEx\-WriteEx\-SendEx

The following limitations apply to multi-QP tests:

The "-NumOfThreads" parameter is not supported; multi-QP operations always run with a single thread.

The "-SgeNumber" parameter is not supported; a single SGE is always used.

The "-NumOfQps" parameter must be specified on both sides, with the same value.

The "-UseEvents" parameter must be either used on both sides or omitted on both sides.

The "-Resilient" parameter is not supported with multi-QP tests.
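As an illustration only, the multi-QP restrictions above can be written as a pre-flight check on the two command lines. The sketch below is a hypothetical Python helper and is not part of MlxNdPerf. It assumes that parameter names are matched case-insensitively, since this section spells them both "-NumOfQps" and "-NumOfQPs". It also assumes that a bare switch such as "-UseEvents" takes no value.

```python
# Hypothetical pre-flight check for MlxNdPerf multi-QP runs (not part of the tool).
# Assumption: flag names are case-insensitive; bare switches take no value.

MULTI_QP_OPS = ("-ReadEx", "-WriteEx", "-SendEx")
UNSUPPORTED_WITH_MULTI_QP = ("-NumOfThreads", "-SgeNumber", "-Resilient")


def option_value(args, flag):
    """Return the value after `flag`, True for a bare switch, or None if absent."""
    lowered = [a.lower() for a in args]
    if flag.lower() not in lowered:
        return None
    i = lowered.index(flag.lower())
    if i + 1 < len(args) and not args[i + 1].startswith("-"):
        return args[i + 1]
    return True


def check_multi_qp_pair(server_args, client_args):
    """Return a list of rule violations; an empty list means the pair looks valid."""
    problems = []
    for side, args in (("server", server_args), ("client", client_args)):
        if not any(option_value(args, op) is not None for op in MULTI_QP_OPS):
            problems.append(f"{side}: no multi-QP operation (-ReadEx/-WriteEx/-SendEx)")
        for flag in UNSUPPORTED_WITH_MULTI_QP:
            if option_value(args, flag) is not None:
                problems.append(f"{side}: {flag} is not supported with multi-QP tests")
    server_qps = option_value(server_args, "-NumOfQps")
    client_qps = option_value(client_args, "-NumOfQps")
    if server_qps is None or client_qps is None:
        problems.append("-NumOfQps must be specified on both sides")
    elif server_qps != client_qps:
        problems.append("-NumOfQps must have the same value on both sides")
    # -UseEvents must be present on both sides or on neither.
    if (option_value(server_args, "-UseEvents") is None) != (
        option_value(client_args, "-UseEvents") is None
    ):
        problems.append("-UseEvents must be used on both sides or on neither")
    return problems
```

The helper reports every violation rather than stopping at the first one, so both command lines can be fixed in a single pass.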

Determines the source IP (local IP) and the destination IP (remote IP).

Usage MlxNdPerf -SrcIP <local IP> -DestIP <remote IP>

Determines the number of threads to run, with a single QP per thread.

Usage MlxNdPerf -NumOfThreads

Determines the number of QPs to be used with multi-QP operations.

Usage MlxNdPerf -NumOfQps

Determines the port number used.

Usage MlxNdPerf -PortNumber

Determines the number of scatter/gather entries (SGEs) per posted Send/Write/Read.

Usage MlxNdPerf -SgeNumber

Determines the number of bytes transmitted by a single posted Send/Write/Read.

Usage MlxNdPerf -BufferSize

Determines the number of entries in the QP and the CQ.

Usage MlxNdPerf -QueueDepth

Determines the number of iterations of posted Send/Write/Read operations. This parameter is ignored in Duration mode.

Usage MlxNdPerf -Iterations

Enables Duration mode and sets how long the test runs, in seconds.

Usage MlxNdPerf -Duration
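The relationship between -BufferSize, -Iterations and -Duration can be sketched as simple arithmetic. The helpers below are hypothetical. They assume that each thread posts -Iterations operations of -BufferSize bytes each. They are meant for sanity-checking a run, not for reproducing the figures the tool itself reports.

```python
# Hypothetical back-of-the-envelope helpers (not part of MlxNdPerf).
# Assumption: each thread posts `iterations` operations of `buffer_size` bytes.


def expected_payload_bytes(buffer_size, iterations, num_threads=1, duration=None):
    """Total payload for an iteration-mode run, or None in Duration mode.

    In Duration mode -Iterations is ignored, so the byte count is not known
    in advance and None is returned.
    """
    if duration is not None:
        return None
    return buffer_size * iterations * num_threads


def gbits_per_second(total_bytes, elapsed_seconds):
    """Average rate in gigabits per second (1 Gb = 10**9 bits)."""
    return total_bytes * 8 / elapsed_seconds / 1e9
```

For example, a 65536-byte buffer with 100000 iterations on 2 threads gives 13,107,200,000 bytes. If that transfer took 2 seconds, the average rate is about 52.4 Gb/s.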

Uses event notification mode for the CQ instead of polling it.

Usage MlxNdPerf -UseEvents

Registers for the adapter's status-change callbacks and listens for any adapter status changes. In this mode, the application does not exit until the test completes successfully.

Note: This mode is not available on the Server side in Send mode.

Usage MlxNdPerf -Resilient
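The restrictions on -Resilient are spread across this section. It is unavailable on the Server side in Send mode (above), with multi-QP tests, and with latency tests. A hypothetical Python helper that collects these rules in one place might look like the sketch below. The function name and arguments are illustrative only.

```python
# Hypothetical helper (not part of MlxNdPerf): may -Resilient be added to this run?


def resilient_allowed(role, operation, latency=False):
    """Return True if -Resilient may be combined with the given role/operation.

    role: "Server" or "Client"; operation: "Read", "Write", "Send", "ReadEx", ...
    A leading "-" on either argument is accepted and ignored.
    """
    op = operation.lstrip("-").lower()
    if op.endswith("ex"):
        return False  # -Resilient is not supported with multi-QP tests
    if latency:
        return False  # -Resilient is not supported with latency tests
    if role.lstrip("-").lower() == "server" and op == "send":
        return False  # not available on the Server side in Send mode
    return True
```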

Latency is measured using the "-Latency" parameter, which must be combined with one of the operations: Write, Read, or Send.

Usage MlxNdPerf -Read\-Write\-Send -Latency

The following limitations apply to latency tests:

The "-Latency" and "-BufferSize" parameters must be specified on both sides.

The "-Resilient", "-NumOfThreads", "-UseEvents" and "-QueueDepth" parameters are not supported with latency tests.
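The latency-test rules can be checked before launching either side. The following is a hypothetical Python sketch, not part of MlxNdPerf. Its simple parser assumes that every parameter starts with "-", that names are case-insensitive, and that a parameter followed by another "-" token is a bare switch.

```python
# Hypothetical pre-flight check for MlxNdPerf latency runs (not part of the tool).


def parse_flags(args):
    """Map lower-cased flag -> value, or True for a bare switch."""
    flags = {}
    i = 0
    while i < len(args):
        key = args[i].lower()
        if i + 1 < len(args) and not args[i + 1].startswith("-"):
            flags[key] = args[i + 1]
            i += 2
        else:
            flags[key] = True
            i += 1
    return flags


def check_latency_pair(server_args, client_args):
    """Return a list of rule violations; an empty list means the pair looks valid."""
    problems = []
    server, client = parse_flags(server_args), parse_flags(client_args)
    for side, flags in (("server", server), ("client", client)):
        if "-latency" not in flags:
            problems.append(f"{side}: -Latency is missing")
        ops = [op for op in ("-read", "-write", "-send") if op in flags]
        if len(ops) != 1:
            problems.append(f"{side}: -Latency needs exactly one of -Read/-Write/-Send")
        for flag in ("-Resilient", "-NumOfThreads", "-UseEvents", "-QueueDepth"):
            if flag.lower() in flags:
                problems.append(f"{side}: {flag} is not supported with latency tests")
    if "-buffersize" not in server or "-buffersize" not in client:
        problems.append("-BufferSize must be specified on both sides")
    return problems
```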

By default, the tool prints intermediate results once per second. This interval can be changed with the "-ReportPeriod" parameter; a value of N=0 disables intermediate prints.

Usage MlxNdPerf -ReportPeriod <N>

Enables printing of additional information.

Usage MlxNdPerf -Verbose 1

Example 1: Measure the bandwidth of an IB Read operation with traffic from 2 threads, running for 30 seconds.

Server side: MlxNdPerf.exe -Server -Read -SrcIp 11.137.58.1 -DestIp 11.137.58.1 -NumOfThreads 2 -Duration 30

Client side: MlxNdPerf.exe -Client -Read -SrcIp 11.137.57.1 -DestIp 11.137.58.1 -NumOfThreads 2 -Duration 30

Example 2: Measure the latency of an IB Write operation using IPv6 addresses (if Estat is present).

Server side: MlxNdPerf.exe -Server -Write -SrcIp fe80::ee0d:9aff:fe42:e8e8 -DestIp fe80::ee0d:9aff:fe42:e8e8 -Latency -BufferSize 8

Client side: MlxNdPerf.exe -Client -Write -SrcIp fe80::ee0d:9aff:fe42:e8e4 -DestIp fe80::ee0d:9aff:fe42:e8e8 -Latency -BufferSize 8
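The two examples above can also be produced programmatically, for instance to script repeated runs. The following hypothetical Python helper only builds the argument list, using the parameter spellings from the examples. It does not launch the tool. IP literals are validated with the standard ipaddress module, which accepts both the IPv4 and the IPv6 forms used above.

```python
import ipaddress


def build_mlxndperf_args(role, operation, src_ip, dest_ip, **params):
    """Build an MlxNdPerf.exe argument list (hypothetical helper).

    role: "Server" or "Client"; operation: "Read", "Write", "Send", "ReadEx", ...
    params: other parameters from this section, e.g. NumOfThreads=2, Duration=30,
    BufferSize=8, Latency=True (True is emitted as a bare switch).
    """
    if role not in ("Server", "Client"):
        raise ValueError("role must be 'Server' or 'Client'")
    # Raises ValueError for anything that is not a valid IPv4/IPv6 literal.
    ipaddress.ip_address(src_ip)
    ipaddress.ip_address(dest_ip)
    args = ["MlxNdPerf.exe", f"-{role}", f"-{operation}", "-SrcIp", src_ip, "-DestIp", dest_ip]
    for name, value in params.items():  # keyword order is preserved
        if value is True:
            args.append(f"-{name}")
        elif value is not False and value is not None:
            args.extend([f"-{name}", str(value)])
    return args


# Client side of Example 1:
client = build_mlxndperf_args("Client", "Read", "11.137.57.1", "11.137.58.1",
                              NumOfThreads=2, Duration=30)
```

On a host where MlxNdPerf.exe is on the PATH, the resulting list could be passed to subprocess.run(args, check=True).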

Note: Ngauge is supported on the following cards:

ConnectX-7 and newer

BlueField-3 and newer

The Ngauge tool can now be executed directly from a virtual machine with a Virtual Function (VF) that is configured with the appropriate trust capabilities.

To run Ngauge directly from a VM, the host must be configured as follows:

The DiagTelemetryMode registry key must be set to 1 (DOCA only) for all adapters associated with the NIC in use.

The VFTrustCaps registry key must be configured to enable DIAG_DATA_TRUST.

For detailed instructions on configuring these registry keys, refer to the “Configuring the Driver Registry Keys” section of the User Manual.

For more information about the Ngauge tool, please refer to the DOCA User Manual.

Ngauge is currently supported only on Linux.

When using the Ngauge tool, the following counter sets cannot be queried through Perfmon:

Mellanox WinOF-2 Device Diagnostics

Mellanox WinOF-2 PCI Device Diagnostics

Mellanox WinOF-2 Icmc Diag Counters Ext1

Perfmon should not be used to query any counters from the above sets while Ngauge is running.

Tests can use GPU memory instead of host memory. This requires CUDA version 12.8 or later. To test with GPU memory, specify the CUDA device with the "-CudaDeviceID" parameter.

Usage MlxNdPerf -CudaDeviceID
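Checking the "CUDA 12.8 or later" requirement is a version comparison. This is easy to get wrong if versions are treated as decimal numbers: 12.10 would then compare as 12.1, which is below 12.8. The following is a minimal Python sketch that compares (major, minor) tuples instead. It assumes the caller already has the installed CUDA version as a string such as "12.8". How that string is obtained is outside the scope of this example.

```python
# Hypothetical helper: does a CUDA version string meet the 12.8 minimum?


def cuda_version_ok(version, minimum=(12, 8)):
    """Compare (major, minor) as integers, so "12.10" correctly exceeds "12.8"."""
    parts = tuple(int(p) for p in version.strip().split(".")[:2])
    return parts >= minimum
```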