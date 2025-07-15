DOCA Perftest supports the following usage modes to accommodate both basic and advanced benchmarking needs:

Simple benchmarks – Use command-line arguments directly on the client and server nodes to perform quick bandwidth or latency tests. This mode is ideal for testing individual parameters or small-scale deployments.

Complex scenarios – Use a JSON input file to define multi-node, multi-benchmark configurations. This mode enables synchronized, distributed benchmarking across multiple systems with centralized result aggregation.

To perform a basic benchmark test, run doca_perftest on both the server and client nodes using CLI parameters.

Example:

Server (responder):

doca_perftest -co 1,2 -d mlx5_0 -ct UC -v send -m bw -s 1024 -D 10

Client (requestor):

doca_perftest -co 1,2 -d mlx5_0 -ct UC -v send -m bw -s 1024 -D 10 -sn remote-server-name

Parameter breakdown:

-co 1,2 – Run synchronously on CPU cores 1 and 2

-d mlx5_0 – Use the local IB device mlx5_0

-ct UC – Use the Unreliable Connection (UC) transport type

-v send – Use the send RDMA verb

-m bw – Measure bandwidth (BW)

-s 1024 – Message size is 1024 bytes

-D 10 – Set test duration to 10 seconds

-sn remote-server-name – (Client only) Target server hostname
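In the example, the server and client use identical parameters, and only the client adds -sn. When scripting runs, generating both command lines from one set of values keeps the two sides from drifting apart. The Python sketch below does this. The helper name build_perftest_argv and its keyword arguments are hypothetical. It only assembles argument lists from the flags shown above and does not run doca_perftest.

```python
from typing import List, Optional


def build_perftest_argv(
    cores: str,
    device: str,
    connection_type: str,
    verb: str,
    metric: str,
    msg_size: int,
    duration_s: int,
    server_name: Optional[str] = None,
) -> List[str]:
    """Assemble a doca_perftest argument list (hypothetical helper).

    Pass server_name only for the client (requestor); omit it for the
    server (responder), mirroring the -sn flag in the example above.
    """
    argv = [
        "doca_perftest",
        "-co", cores,            # CPU cores, e.g. "1,2"
        "-d", device,            # local IB device, e.g. "mlx5_0"
        "-ct", connection_type,  # "RC" or "UC"
        "-v", verb,              # "read", "write", "send", or "writeImm"
        "-m", metric,            # "bw" or "lat"
        "-s", str(msg_size),     # message size in bytes
        "-D", str(duration_s),   # test duration in seconds
    ]
    if server_name is not None:
        argv += ["-sn", server_name]  # client side only
    return argv


# Shared parameters taken from the example above.
common = dict(cores="1,2", device="mlx5_0", connection_type="UC",
              verb="send", metric="bw", msg_size=1024, duration_s=10)

server_argv = build_perftest_argv(**common)
client_argv = build_perftest_argv(**common, server_name="remote-server-name")
print(" ".join(server_argv))
print(" ".join(client_argv))
```

You can pass each list directly to subprocess.run on the corresponding node. Using a list rather than a shell string keeps each value, such as the core list 1,2, a single argument.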

The following options are available via -h / --help:

doca_perftest -h

-in / --input_file – JSON input file for complex scenarios. If specified, overrides all other options (except help/debug).

-co / --cores – MANDATORY. Comma-separated list or range of CPU cores.

-ct / --connection_type – Connection type: RC or UC.

-v / --verb – RDMA verb: read, write, send, or writeImm. Default: write

-m / --metric – Metric type: bw (bandwidth) or lat (latency). Default: bw

-s / --msg_size – Message size in bytes. Default: 64k

-l / --inline_size – Inline size, up to 1024 bytes. Default: 0 or message size (if < 220)

-tx / --tx_depth – Send queue depth. Default: 128

--poll_batch_size – Maximum number of work completions (WCs, or cookies) per poll. Default: 16

-mod – Cookie moderation value (up to 8k messages). Default: 1

-os / --old_post_send – Use the legacy post-send mode instead of IB_WR_API. Default: IB_WR_API

-er / --enhanced_reorder – Enhanced reorder, which allows responders (Rx) to receive packets with all opcode types out of order (OOO): auto or disable. Default: auto

-or / --out_reads – Number of outstanding reads. Default: 1

-qp – Number of QPs. Default: 1

--poll_stat – Print the average number of cookies polled per process. Default: Disabled

--qp_histogram – Print a workload fairness histogram for QPs. Default: Disabled

-i / --iterations – Number of iterations per QP. Mutually exclusive with -D. Default: 5000

-D / --duration – Traffic duration in seconds. Mutually exclusive with -i. Default: Iteration-based

-o / --output – Print only the specified output: BW, Lat, or MR. Suppresses all other outputs. Default: No filtering

-ip – Server IPv4 address. Default: Hardcoded

-sn / --server_name – Server hostname (client side only). Mutually exclusive with -ip.

-sp / --server_port – Server port. Default: 18555

-d / --device_name – IB device name. Default: First available device

-w / --warmup – Warmup time in seconds. Default: 2

--disable_pcir – Disable PCI relaxed ordering. Default: Enabled (if possible)

--save_raw_data – Latency tests only. Save raw latency data to a JSON file; the path is optional. Default: Disabled

-j / --json – Print the configuration and output in JSON format to a file; the path is optional. Default: Disabled

-u / --user – Executor name (used in JSON output).

-sd / --session_desc – Session description (used in JSON output).

--cuda <cuda device id> – Use CUDA memory (GPUDirect RDMA). Default: Host memory
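Several of the options above constrain each other. -co is mandatory, -i and -D are mutually exclusive, -sn and -ip are mutually exclusive, -sn is client-only, and inline size is capped at 1024 bytes. A wrapper script can catch these mistakes before launching a run.

The Python sketch below checks those documented rules against a plain dictionary of option values. Several details are illustrative choices, not part of doca_perftest: the dictionary layout, the function names check_perftest_options and parse_cores, and the assumption that a core range is written as start-end (for example 0-3).

```python
from typing import Dict, List

# Allowed values as listed in the doca_perftest option list.
CONNECTION_TYPES = {"RC", "UC"}
VERBS = {"read", "write", "send", "writeImm"}
METRICS = {"bw", "lat"}
MAX_INLINE_SIZE = 1024  # bytes


def parse_cores(spec: str) -> List[int]:
    """Expand a core list such as "1,2" or "0-3,8" into integers.

    ASSUMPTION: ranges are written as start-end; check doca_perftest -h
    for the exact syntax your version accepts.
    """
    cores: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid core range: {part!r}")
            cores.extend(range(start, end + 1))
        else:
            cores.append(int(part))
    return cores


def check_perftest_options(opts: Dict[str, str], is_client: bool) -> List[str]:
    """Return a list of problems found in opts (empty if none).

    opts maps short flag names (without the leading dash) to values,
    e.g. {"co": "1,2", "v": "send"}. This layout is hypothetical.
    """
    errors: List[str] = []
    if "co" not in opts:
        errors.append("-co is mandatory")
    else:
        try:
            parse_cores(opts["co"])
        except ValueError:
            errors.append(f"cannot parse core list {opts['co']!r}")
    if "ct" in opts and opts["ct"] not in CONNECTION_TYPES:
        errors.append("-ct must be RC or UC")
    if "v" in opts and opts["v"] not in VERBS:
        errors.append("-v must be read, write, send, or writeImm")
    if "m" in opts and opts["m"] not in METRICS:
        errors.append("-m must be bw or lat")
    if "l" in opts and (not opts["l"].isdigit()
                        or int(opts["l"]) > MAX_INLINE_SIZE):
        errors.append("-l must be an integer from 0 to 1024")
    if "i" in opts and "D" in opts:
        errors.append("-i and -D are mutually exclusive")
    if "sn" in opts and "ip" in opts:
        errors.append("-sn and -ip are mutually exclusive")
    if "sn" in opts and not is_client:
        errors.append("-sn is a client-side option")
    return errors


# Usage: a server-side configuration that sets both -i and -D.
problems = check_perftest_options({"co": "1,2", "i": "5000", "D": "10"},
                                  is_client=False)
print(problems)  # → ['-i and -D are mutually exclusive']
```

Returning a list of messages, rather than raising on the first problem, lets a script report every conflicting flag at once.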

For advanced benchmarking scenarios involving multiple hosts, users can define tests using a structured JSON input file. Example:

doca_perftest -in path_to_scenario_file.json

Capabilities:

Automatically deploys and coordinates execution across all defined hosts

Synchronized test initiation across all nodes

Collects and aggregates results on the invoking node

Use cases:

Cluster-wide RDMA performance testing

Multi-benchmark test suites

Automation and repeatability of complex test setups
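Because -in overrides all other options except help and debug, a wrapper that launches a scenario only needs to pass the file path. The Python sketch below assumes doca_perftest is on PATH. It first confirms that the scenario file exists and parses as JSON, so a syntax error is caught before the run is launched. It does not check the scenario schema, which is not described here. The function names scenario_argv and run_scenario are hypothetical.

```python
import json
import subprocess
from pathlib import Path
from typing import List


def scenario_argv(path: str) -> List[str]:
    """Validate a scenario file and return the doca_perftest command.

    Only JSON syntax is checked; the scenario schema itself is not
    validated. -in overrides all other options, so no further flags
    are added.
    """
    scenario = Path(path)
    if not scenario.is_file():
        raise FileNotFoundError(f"scenario file not found: {scenario}")
    with scenario.open(encoding="utf-8") as f:
        json.load(f)  # raises json.JSONDecodeError on malformed input
    return ["doca_perftest", "-in", str(scenario)]


def run_scenario(path: str) -> int:
    """Launch the scenario from the invoking node (hypothetical helper).

    ASSUMPTION: doca_perftest is installed and on PATH. Results are
    collected on this node, as described above.
    """
    result = subprocess.run(scenario_argv(path), check=False)
    return result.returncode
```

Calling run_scenario("path_to_scenario_file.json") on the invoking node runs the same command as the JSON example earlier in this section, after the syntax check passes.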

