Fabric Performance Utilities
The performance utilities described in this chapter are intended to be used for performance micro-benchmarking. They support both InfiniBand and RoCE.
For further information on any of the following tools, run it with the --help command line parameter to display its help text.
The performance utilities described in the table below will be deprecated as of the next release.
Utility | Description |
nd_write_bw | This test measures the performance of RDMA-Write requests on Microsoft Windows operating systems. nd_write_bw is oriented toward maximum RDMA-Write throughput and runs over Microsoft's NetworkDirect standard. The level of customization available to the user is relatively high: the user may choose a custom message size, a custom number of iterations, or, alternatively, a custom test duration. nd_write_bw supports all message sizes from 1B to 4MB (powers of 2), message inlining, and CQ moderation. |
nd_write_lat | This test measures the performance of RDMA-Write requests on Microsoft Windows operating systems. nd_write_lat is oriented toward minimum RDMA-Write latency and runs over Microsoft's NetworkDirect standard. The level of customization available to the user is relatively high: the user may choose a custom message size, a custom number of iterations, or, alternatively, a custom test duration. nd_write_lat supports all message sizes from 1B to 4MB (powers of 2), message inlining, and CQ moderation. |
nd_read_bw | This test measures the performance of RDMA-Read requests on Microsoft Windows operating systems. nd_read_bw is oriented toward maximum RDMA-Read throughput and runs over Microsoft's NetworkDirect standard. The level of customization available to the user is relatively high: the user may choose a custom message size, a custom number of iterations, or, alternatively, a custom test duration. nd_read_bw supports all message sizes from 1B to 4MB (powers of 2), message inlining, and CQ moderation. |
nd_read_lat | This test measures the performance of RDMA-Read requests on Microsoft Windows operating systems. nd_read_lat is oriented toward minimum RDMA-Read latency and runs over Microsoft's NetworkDirect standard. The level of customization available to the user is relatively high: the user may choose a custom message size, a custom number of iterations, or, alternatively, a custom test duration. nd_read_lat supports all message sizes from 1B to 4MB (powers of 2), message inlining, and CQ moderation. |
nd_send_bw | This test measures the performance of Send requests on Microsoft Windows operating systems. nd_send_bw is oriented toward maximum Send throughput and runs over Microsoft's NetworkDirect standard. The level of customization available to the user is relatively high: the user may choose a custom message size, a custom number of iterations, or, alternatively, a custom test duration. nd_send_bw supports all message sizes from 1B to 4MB (powers of 2), message inlining, and CQ moderation. |
nd_send_lat | This test measures the performance of Send requests on Microsoft Windows operating systems. nd_send_lat is oriented toward minimum Send latency and runs over Microsoft's NetworkDirect standard. The level of customization available to the user is relatively high: the user may choose a custom message size, a custom number of iterations, or, alternatively, a custom test duration. nd_send_lat supports all message sizes from 1B to 4MB (powers of 2), message inlining, and CQ moderation. |
MlxNdPerf is a new tool that replaces all of the older NetworkDirect applications from previous drivers (e.g., nd_write_bw, nd_read_bw, nd_send_bw, nd_*_lat). The tool is used to determine the maximum achievable performance with various parameters, and the RDMA Read/Write/Send performance currently available between two endpoints.
The following are the command-line parameters used by the tool to perform the various operations:
Client or Server Role
The role of Client or Server determines whether this side is the RDMA requestor or responder (Client → Requestor, Server → Responder).
Usage | MlxNdPerf.exe -Server\-Client |
RDMA Operation
Determines the RDMA operation to be performed; only a single operation may be selected at a time.
Usage | MlxNdPerf -Read\-Write\-Send |
Source/Destination IP
Determines the source IP (the local IP) and the destination IP (the remote IP).
Usage | MlxNdPerf -SrcIP <source IP> -DestIP <destination IP> |
Number of Threads
Determines the number of threads to run, with a single QP per thread.
Usage | MlxNdPerf -NumOfThreads <number> |
Port Number
Determines the port number used.
Usage | MlxNdPerf -PortNumber <port> |
Number of Scatter
Determines the number of scatter/gather entries per posted Send/Write/Read.
Usage | MlxNdPerf -SgeNumber <number> |
Buffer Size
Determines the number of bytes transmitted by a single posted Send/Write/Read.
Usage | MlxNdPerf -BufferSize <bytes> |
Queue Depth
Determines the number of entries in the QP and the CQ.
Usage | MlxNdPerf -QueueDepth <number> |
Number of Iterations
Determines the number of iterations of posted Send/Write/Read operations. Ignored when running in Duration mode.
Usage | MlxNdPerf -Iterations <number> |
Duration Mode
Determines how long the test runs, in seconds.
Usage | MlxNdPerf -Duration <seconds> |
Event Notification Mode
Uses event notification mode for the CQ instead of polling it.
Usage | MlxNdPerf -UseEvents |
Resiliency
Registers for the adapter's status-change callbacks and listens for any adapter status change. In this mode the application does not exit until the test has completed successfully.
Note: This mode is not available for the Server side when in Send mode.
Usage | MlxNdPerf -Resilient |
Verbose
Enables printing of additional information.
Usage | MlxNdPerf -Verbose |
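The following is a hypothetical sketch (not taken from the original document) showing how the port, SGE, buffer size, queue depth, and iteration parameters described above can be combined in a single iteration-based run; all addresses and values are placeholders:
Server side: MlxNdPerf.exe -Server -Write -SrcIp 11.137.58.1 -DestIp 11.137.58.1 -PortNumber 6830 -SgeNumber 1 -BufferSize 65536 -QueueDepth 128 -Iterations 100000
Client side: MlxNdPerf.exe -Client -Write -SrcIp 11.137.57.1 -DestIp 11.137.58.1 -PortNumber 6830 -SgeNumber 1 -BufferSize 65536 -QueueDepth 128 -Iterations 100000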
Example 1: Measure the bandwidth of an IB Read operation with traffic from 2 threads, running for 30 seconds
Server side: MlxNdPerf.exe -Server -Read -SrcIp 11.137.58.1 -DestIp 11.137.58.1 -NumOfThreads 2 -Duration 30
Client side: MlxNdPerf.exe -Client -Read -SrcIp 11.137.57.1 -DestIp 11.137.58.1 -NumOfThreads 2 -Duration 30
Example 2: Measure the latency of an IB Write operation with IPv6 addresses (if EStat is present)
Server side: MlxNdPerf.exe -Server -Write -SrcIp fe80::ee0d:9aff:fe42:e8e8 -DestIp fe80::ee0d:9aff:fe42:e8e8 -EstatLatency -BufferSize 8
Client side: MlxNdPerf.exe -Client -Write -SrcIp fe80::ee0d:9aff:fe42:e8e4 -DestIp fe80::ee0d:9aff:fe42:e8e8 -EstatLatency -BufferSize 8
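Example 3 (a hypothetical sketch, not taken from the original document): Measure Send bandwidth in duration mode using CQ event notification, with resiliency enabled on the client. The addresses reuse the placeholders from Example 1, and -Resilient is omitted on the server side because it is not available there in Send mode.
Server side: MlxNdPerf.exe -Server -Send -SrcIp 11.137.58.1 -DestIp 11.137.58.1 -UseEvents -Duration 60
Client side: MlxNdPerf.exe -Client -Send -SrcIp 11.137.57.1 -DestIp 11.137.58.1 -UseEvents -Resilient -Duration 60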
nd_rping Test
The purpose of this test is to check interoperability between Linux and Windows via an RDMA ping. The Windows nd_rping tool was ported from Linux's RDMACM example, rping.c.
Windows
To use the built-in nd_rping.exe tool, go to: C:\Program Files\Mellanox\MLNX_WinOF2\Performance Tools
To build nd_rping.exe from scratch, use the SDK example: choose the machine's OS in the solution's Configuration Manager, and build nd_rping.exe.
Linux
Installing MLNX_OFED on a Linux server also provides the "rping" application.
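As an illustration, an interoperability check could look as follows; the command-line options are an assumption based on nd_rping being a port of the Linux rping.c example (-s for server, -c for client, -a for the address, -v for verbose, -C for the ping count), and the address is a placeholder:
Linux side (server): rping -s -a 11.137.58.1 -v -C 10
Windows side (client): nd_rping.exe -c -a 11.137.58.1 -v -C 10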