DOCA Perftest
This guide describes DOCA Perftest, an RDMA benchmarking tool designed for compute clusters that enables fine-tuned evaluation of bandwidth, message rate, and latency across various RDMA operations and complex multi-node scenarios.
NVIDIA® doca-perftest is an RDMA benchmarking utility designed to evaluate performance across a wide range of compute and networking environments—from simple client-server tests to complex, distributed cluster scenarios.
It provides fine-grained benchmarking of bandwidth, message rate, and latency, while supporting diverse RDMA operations and configurations.
Key features:
Comprehensive RDMA Benchmarks – Supports bandwidth, message rate, and latency testing.
Unified RDMA Testing Tool – A single executable for all RDMA verbs, with rich configuration options and CUDA/GPUDirect RDMA integration.
Cluster-Wide Benchmarking – Run distributed tests across multiple nodes, initiated from a single host, with aggregated performance results.
Flexible Scenario Definition – Define complex multi-node, multi-test configurations via a JSON input file.
Command-Line Simplicity – Quickly run local or point-to-point benchmarks directly from the CLI.
Synchronized Execution – Ensures all benchmarks begin and end simultaneously for consistent results.
The doca-perftest utility simplifies evaluation and comparison of RDMA performance across applications and environments.
Unlike legacy RDMA benchmarking tools (e.g., ib_write_bw, ib_send_lat), doca-perftest is a native implementation designed for modern data centers. It is not a wrapper: it is a standalone tool that replaces both the legacy utilities and the custom orchestration scripts often required to run them at scale.
Architectural differences:
Feature | Legacy Perftest | DOCA Perftest |
Scope | Point-to-Point (P2P) only | Single-node to Cluster-wide |
Orchestration | Manual or third-party wrappers | Built-in (Single-host initiation) |
Concurrency | Single-process per execution | Native multi-process/multi-core |
Synchronization | Loose (Serial start) | Hardware-aligned (Synchronized start/stop) |
Result Handling | Per-process manual extraction | Automatic cluster-wide aggregation |
Benefits of migration to doca-perftest:
Orchestration – Standard RDMA benchmarks require complex external scripts (Ansible, Bash, Python) to manage remote process launching, NUMA pinning, GPU selection, and result parsing. doca-perftest handles these natively via the CLI or JSON scenario files.
Synchronization – In large-scale clusters, measuring fabric congestion or incast/outcast scenarios requires all nodes to hit the network simultaneously. doca-perftest utilizes a centralized sync engine to ensure all processes begin and end traffic in a coordinated window, providing accuracy that is impossible to achieve with asynchronous legacy wrappers.
Scalability – While legacy tools require running multiple instances to saturate high-speed links (e.g., 200G/400G+), doca-perftest scales linearly across cores within a single execution using the -N or -C flags.
Unified reporting – Rather than collecting individual output files from dozens of servers, doca-perftest provides a unified report. This includes the full scenario definition, all raw results, and calculated aggregations per device, per node, and per test.
For simple benchmarks, doca-perftest can be run directly from the command line.
When invoked on the client, the utility automatically launches the corresponding server process (requires passwordless SSH) and selects optimal CPU cores on both systems based on NUMA affinity.
Example command:
# Run on client
doca_perftest -d mlx5_0 -n <server-host-name>
This is equivalent to running:
# On server
doca_perftest -d mlx5_0 -N 1 -c RC -v write -m bw -s 65536 -D 10
# On client
doca_perftest -d mlx5_0 -N 1 -c RC -v write -m bw -s 65536 -D 10 -n <server-host-name>
Parameter breakdown:
Parameter | Description |
-d mlx5_0 | Uses the device mlx5_0. |
-N 1 | Runs one process, automatically selecting an optimal core. (Use -C to pin specific cores.) |
-c RC | Uses a Reliable Connection (RC) transport. |
-v write | Selects the Write verb for transmission. |
-m bw | Measures bandwidth. |
-s 65536 | Sets message size to 65,536 bytes. |
-D 10 | Runs for 10 seconds. |
-n <server-host-name> | (Client only) Specifies the remote target host. |
For a full list of CLI arguments, run doca_perftest -h or man doca_perftest.
If passwordless SSH is not configured, you must manually run doca-perftest on both client and server, ensuring parameters match.
For large-scale or multi-benchmark configurations, doca-perftest accepts a JSON input file defining all participating nodes, benchmarks, and parameters.
Example invocation:
doca_perftest -f path_to_scenario_file.json
JSON mode advantages:
Can be initiated from any node in the cluster (even non-participating ones).
Synchronizes benchmark start and stop across all nodes.
Aggregates all metrics on the initiating host.
Supports predefined traffic patterns such as ALL_TO_ALL, MANY_TO_ONE, ONE_TO_MANY, and BISECTION.
Fully compatible with all CLI parameters; JSON parameters inherit the same defaults.
Example JSON configuration files are provided under: /usr/share/doc/doca-perftest/examples/. It is recommended to start by copying and modifying an existing example file.
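As a sketch of what such a scenario file might look like, the following writes a minimal hypothetical configuration and checks that it parses as JSON. The field names (testNodes, trafficPattern, trafficDirection) follow the examples later in this guide, but the authoritative schema is defined by the shipped example files:

```shell
# Minimal hypothetical scenario file; start from the examples under
# /usr/share/doc/doca-perftest/examples/ for the real schema.
cat > /tmp/scenario.json <<'EOF'
{
  "testNodes": [
    {"hostname": "node01", "deviceName": "mlx5_0"},
    {"hostname": "node02", "deviceName": "mlx5_0"}
  ],
  "trafficPattern": "ONE_TO_ONE",
  "trafficDirection": "UNIDIR"
}
EOF
# Sanity-check that the file is well-formed JSON before passing it to -f
python3 -m json.tool /tmp/scenario.json > /dev/null && echo "valid JSON"
```

A file like this would then be passed via `doca_perftest -f /tmp/scenario.json`.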
Bandwidth
Bandwidth tests measure the aggregate data transfer rate and message-handling efficiency across all participating processes.
Metrics collected:
Message Rate (Mpps): Number of Completion Queue Entries (CQEs) processed per second.
Bandwidth (Gb/s): Total throughput (bandwidth = message_rate × message_size).
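To make the formula concrete, here is a small sketch (with assumed numbers) converting a message rate in Mpps and a message size in bytes into Gb/s:

```shell
# Assumed sample values, not measured results
msg_rate_mpps=6      # million messages per second
msg_size=4096        # bytes per message
# bandwidth[Gb/s] = rate[msg/s] * size[bytes] * 8 bits / 1e9
bw=$(awk -v r="$msg_rate_mpps" -v s="$msg_size" \
  'BEGIN { printf "%.1f", r * 1e6 * s * 8 / 1e9 }')
echo "$bw Gb/s"   # prints "196.6 Gb/s"
```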
Measurement notes:
Results are aggregated across all active test processes.
Concurrency is controlled via -co (CLI) or the cores field (JSON).
Test duration is averaged across processes for consistent sampling.
Interpretation tips:
Observation | Possible Cause |
High message rate, low bandwidth | Small message sizes |
High bandwidth, moderate message rate | Larger messages or fewer CQEs |
These results help optimize network saturation, queue depth, and core allocation strategies.
Latency
Latency tests measure the delay between message transmission and acknowledgment. The measured direction depends on the RDMA verb used.
RDMA verb modes:
Verb | Measurement Type |
Send/Receive | One-way latency (Client → Server) |
Write | Round-trip latency (Client → Server → Client) |
Metrics collected:
Minimum latency – Fastest observed transaction
Maximum latency – Longest observed transaction
Mean latency – Average across all iterations
Median latency – Midpoint value (less influenced by outliers)
Standard deviation – Variability indicator
99% tail latency – 99% of messages completed within this time
99.9% tail latency – Outlier detection for extreme cases
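As an illustration of how a tail percentile is derived, the sketch below applies the nearest-rank method to made-up microsecond samples; this is not doca-perftest's internal algorithm, just the standard definition:

```shell
# Ten fabricated latency samples (us); the single outlier (9.8) dominates p99
p99=$(printf '%s\n' 1.2 1.3 1.3 1.4 1.5 1.6 1.7 1.9 2.0 9.8 | sort -n | awk '
  { v[NR] = $1 }
  END {
    # nearest-rank percentile: ceil(N * 0.99)
    idx = int(NR * 0.99); if (NR * 0.99 > idx) idx++
    print v[idx]
  }')
echo "p99 = $p99 us"   # prints "p99 = 9.8 us"
```

This is why mean and median can look healthy while the 99%/99.9% tails expose jitter.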
Measurement notes:
Latency measured using tight RDMA verb loops.
Timing collected on the sender side for accuracy.
Aggregated across processes for final reporting.
Interpretation tips:
Pattern | Insight |
Low mean/median, high max/tail | Indicates jitter or queue buildup |
Low standard deviation | Indicates stable and predictable performance |
High 99%/99.9% tail | Indicates possible SLA breaches in real-time workloads |
doca-perftest provides improved write latency accuracy over legacy perftest tools.
Differences in latency measurement methodologies exist; compare tools carefully when validating results.
This section highlights some of the most commonly used parameters and use-cases.
Unidirectional vs Bidirectional Traffic
doca-perftest supports two traffic-flow modes that fundamentally change how data moves between nodes and how resources are allocated.
Unidirectional Traffic (Default)
In unidirectional mode, traffic flows in one direction only.
The client (requestor) initiates operations, and the server (responder) receives them.
This is the default mode and provides clear, predictable performance metrics.
Bidirectional Traffic
In bidirectional mode, traffic flows in both directions simultaneously. Each side acts as both requestor and responder, creating full-duplex communication.
Bidirectional tests use two traffic runners (requestor and responder) that share resources, so aggregate bandwidth may differ from twice the unidirectional result.
Run bidirectional traffic from the command line:
# Enable bidirectional traffic
doca_perftest -d mlx5_0 -n <server-name> -b
For JSON mode, use the "trafficDirection" field and set it to "BIDIR" or "UNIDIR".
Traffic Patterns
Traffic patterns provide built-in shortcuts for complex multi-node communication scenarios.
While these configurations were always possible through detailed JSON definitions, traffic patterns dramatically simplify setup for common topologies.
Example JSONs using traffic patterns are available under /usr/share/doc/doca-perftest/examples.
Available patterns:
ONE_TO_ONE, ONE_TO_MANY, MANY_TO_ONE, ALL_TO_ALL, BISECTION
Multicast is not supported. Each connection is point-to-point, synchronized to start simultaneously.
They collapse complex multi-node wiring into a few lines of JSON. Instead of manually listing dozens of connections, you specify a regex-like host list and a pattern (e.g., ALL_TO_ALL) and doca-perftest generates and synchronizes all connections for you.
One-to-One (O2O)
Simple point-to-point between two nodes; useful for baseline performance testing.
"testNodes": [ {"hostname": "node01", "deviceName": "mlx5_0"},
{"hostname": "node02", "deviceName": "mlx5_0"} ],
"trafficPattern": "ONE_TO_ONE"
One-to-Many (O2M)
Single sender to multiple receivers; the first node sends to all others.
"testNodes": [ {"hostname": "sender", "deviceName": "mlx5_0"},
{"hostname": "receiver[1-10]", "deviceName": "mlx5_0"} ],
"trafficPattern": "ONE_TO_MANY"
This creates 10 connections: sender→receiver1, sender→receiver2, ..., sender→receiver10.
Many-to-One (M2O)
Multiple senders to one receiver; all nodes send to the first node.
"testNodes": [ {"hostname": "aggregator", "deviceName": "mlx5_0"},
{"hostname": "client[01-20]", "deviceName": "mlx5_0"} ],
"trafficPattern": "MANY_TO_ONE"
This creates 20 connections: client1→aggregator, client2→aggregator, ..., client20→aggregator.
All-to-All (A2A)
Full-mesh connectivity; every node connects to every other node.
"testNodes": [ {"hostname": "compute[01-16]", "deviceName": "mlx5_0"} ],
"trafficPattern": "ALL_TO_ALL",
"trafficDirection": "UNIDIR"
This creates 240 connections (16×15) for unidirectional, or 120 bidirectional pairs.
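The connection counts above follow directly from the node count; a quick sketch:

```shell
# Connection count for ALL_TO_ALL with N nodes
N=16
uni=$(( N * (N - 1) ))        # unidirectional connections: 16 x 15
bidir_pairs=$(( uni / 2 ))    # each bidirectional pair covers two directions
echo "$uni unidirectional, $bidir_pairs bidirectional pairs"
```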
Bisection (B)
Divides nodes into two equal halves; the first half connects to the second half. Requires an even number of nodes.
"testNodes": [ {"hostname": "rack1-[01-10]", "deviceName": "mlx5_0"},
{"hostname": "rack2-[01-10]", "deviceName": "mlx5_0"} ],
"trafficPattern": "BISECTION"
This creates 10 connections: rack1-01↔rack2-01, rack1-02↔rack2-02, ..., rack1-10↔rack2-10.
Per-iteration-sync Flow (Lock-step Benchmarking)
Designed to mimic AI workloads, this flow ensures data transfer occurs in distinct, synchronized steps. By forcing every process to wait for all peers to complete an iteration before proceeding, it enables granular data validation and allows for QP parameter modification between steps.
Configuration constraints:
Must be triggered via JSON; CLI execution is not supported.
Requires the ALL_TO_ALL pattern with BIDIR traffic.
Must be defined by a specific number of iterations (time-based duration is not supported).
Logic and Implementation
The flow utilizes a bidirectional ALL_TO_ALL pattern. Each iteration consists of four distinct phases:
Data phase:
Every process sends a data message to all peers.
The total msgSize is split across available QPs. Each QP writes to a specific offset to utilize the full buffer.
Sync phase:
Once data transfer completes, each process sends a Sync Message to all peers.
The Sync Message is a zero-length RDMA Write with Immediate Data.
Barrier phase:
A process completes the iteration only after it has received confirmation for its own Sync Send and received Sync Messages from all peers.
Post-iteration (management) phase:
Occurs after synchronization but before the next iteration begins.
Performs non-timed management tasks, such as modifying QPs, checking data validation results, or updating pointers.
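The four phases can be sketched as a loop; the functions below are placeholders for the behavior described above, not doca-perftest APIs:

```shell
# Placeholder phase functions (no-ops standing in for real RDMA operations)
send_data()      { :; }  # Data phase: RDMA writes to all peers, split across QPs
send_sync()      { :; }  # Sync phase: zero-length RDMA Write with Immediate
wait_all_peers() { :; }  # Barrier phase: wait for own sync CQE + all peer syncs
manage()         { :; }  # Post-iteration: modify QPs, validate data, move pointers

for iter in 1 2 3; do
  send_data
  send_sync
  wait_all_peers
  manage
  echo "iteration $iter complete"
done
```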
Configuration
This flow is supported only in JSON mode (CLI is not supported). Add the following fields to your configuration file:
Field | Value | Notes |
iterationSyncType | "write_imm" | Activates the flow logic. |
trafficPattern | "ALL_TO_ALL" | Required. Must be used even for 1:1 node connections. |
trafficDirection | "BIDIR" | Required. The flow requires bidirectional exchange. |
iterations | (Integer) | Required. Defines the run duration. Time-based "Duration" mode is not supported. |
 | | Determines the verb used in the Data Phase. |
 | | Calculates both Bandwidth and Latency. |
Data Validation Integration
When dataValidation is set to true, the flow performs a bit-exact verification of all received data at the end of every iteration.
Highly effective for catching transient data corruption in complex A2A patterns.
Validation occurs during the "post-iteration" management phase, outside of the timed performance interval.
Limitations
Due to the heavy synchronization barrier, the measured "streaming" bandwidth will be lower than a standard continuous A2A test.
A single scenario file cannot mix synchronization types. All tests must be either "Iteration-Sync" ("iterationSyncType": "write_imm") or standard ("iterationSyncType": "none").
Hostname and Device Name Ranged Selection
To streamline configuration for multi-node and multi-device scenarios, doca-perftest supports bracket-based range expansion in JSON mode. This allows you to define large-scale clusters concisely.
Supported Syntax
Feature | Syntax Example | Expansion Result |
Numeric Range | host[1-3] | host1, host2, host3 |
Comma List | host[1,3] | host1, host3 |
Zero Padding | host[01-03] | host01, host02, host03 |
Expansion Logic
When ranges are defined for both hostnames and device names, the tool generates all possible combinations (Cartesian product).
For example:
Input: hostname=host[1-2], devicename=mlx5_[0-1]
Result (4 combinations):
host1↔mlx5_0, host1↔mlx5_1, host2↔mlx5_0, host2↔mlx5_1
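Bash brace expansion behaves analogously and can be used to preview the Cartesian product (note the syntax difference: doca-perftest JSON uses [1-2], while bash uses {1..2}):

```shell
# Enumerate every host/device combination, mirroring the tool's expansion logic
pairs=$(for host in host{1..2}; do
  for dev in mlx5_{0..1}; do
    echo "$host <-> $dev"
  done
done)
echo "$pairs"
```

This prints the four combinations, one per line, in the same order as the example above.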
Multiprocess (Cores)
doca-perftest can run synchronized multi-process tests, ensuring traffic starts simultaneously across all cores.
By default, it runs a single process on one automatically selected core.
Process and core selection:
Option | Description |
-N <count> | Number of processes; cores auto-selected. |
-C <cores> | Explicitly specify core IDs or ranges. |
Examples:
# Run on 3 synchronized processes (cores auto-selected)
doca_perftest -d mlx5_0 -n <server> -N 3
# Run on specific cores
doca_perftest -d mlx5_0 -n <server> -C 5
doca_perftest -d mlx5_0 -n <server> -C 5,7
doca_perftest -d mlx5_0 -n <server> -C 5-9
Working with GPUs – Device Selection
doca-perftest can automatically select the most suitable GPU for each network device based on PCIe topology proximity. The ranking follows NVIDIA's nvidia-smi topo hierarchy: NV > PIX > PXB > PHB > NODE > SYS.
This ensures that the GPU closest to the NIC is chosen, minimizing latency and maximizing throughput.
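The selection logic can be sketched as picking the candidate with the best (lowest) topology rank. The GPU/link list below is made up for illustration; on a real system it would come from parsing `nvidia-smi topo -m`:

```shell
# Rank topology classes per the nvidia-smi hierarchy: NV > PIX > PXB > PHB > NODE > SYS
rank_of() {
  case "$1" in
    NV*)  echo 0 ;;
    PIX)  echo 1 ;;
    PXB)  echo 2 ;;
    PHB)  echo 3 ;;
    NODE) echo 4 ;;
    SYS)  echo 5 ;;
  esac
}

best=""; best_rank=99
# Fabricated candidate list: GPU and its link class to the NIC
while read -r gpu link; do
  r=$(rank_of "$link")
  if [ "$r" -lt "$best_rank" ]; then best=$gpu; best_rank=$r; fi
done <<'EOF'
GPU0 PHB
GPU1 PIX
GPU2 SYS
EOF
echo "selected: $best"   # prints "selected: GPU1" (closest to the NIC)
```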
Although auto-selection is the default behavior, users can still manually specify a GPU device using the -G argument in CLI mode, or the "cuda_dev" field in JSON mode.
# Manually choose a specific GPU
doca_perftest -d mlx5_0 -n server-name -G 0
# Automatically select both GPU and memory type (recommended)
doca_perftest -d mlx5_0 -n server-name -M cuda
# Deprecated syntax (still supported, equivalent to cuda_auto_detect)
doca_perftest -d mlx5_0 -n server-name --cuda 0
Working with GPUs – Memory Types
RDMA operations can leverage GPU memory directly, bypassing CPU involvement for maximum throughput and minimal latency.
doca-perftest supports several CUDA memory modes optimized for different hardware and driver configurations.
Auto-Detection Mode (cuda_auto_detect)
Automatically selects the best available CUDA memory type in this order:
Data Direct
DMA-BUF
Peermem
This is the recommended mode for most users.
Automatically selects the optimal CUDA memory strategy:
# Auto-detect best GPU memory type (recommended)
doca_perftest -d mlx5_0 -n server-name -M cuda -G 0
# With custom CUDA library path
doca_perftest -d mlx5_0 -n server-name -M cuda -G 0 --cuda_lib_path /usr/local/cuda-12/lib64
# Deprecated but equivalent syntax
doca_perftest -d mlx5_0 -n server-name --cuda 0
Fallback behavior: With -M cuda_auto_detect, doca_perftest automatically tries cuda_data_direct → cuda_dmabuf → cuda_peermem in this order.
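The fallback order can be sketched as a simple first-match loop; the probe function here is a stand-in (pretending only peermem is available), since the real capability detection happens inside doca_perftest:

```shell
# Hypothetical capability probe: succeeds only for cuda_peermem on this "system"
probe() { [ "$1" = "cuda_peermem" ]; }

selected=""
# Try the modes in the documented preference order, taking the first that works
for mode in cuda_data_direct cuda_dmabuf cuda_peermem; do
  if probe "$mode"; then
    selected=$mode
    break
  fi
done
echo "selected: $selected"   # prints "selected: cuda_peermem"
```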
Standard CUDA Memory (cuda_peermem)
Traditional CUDA peer-memory allocation.
Supported on all CUDA-capable systems, though with slightly higher overhead compared to newer methods.
# Explicitly force peermem (bypasses auto-detect)
doca_perftest -d mlx5_0 -n server-name -M cuda_peermem -G 0
# Auto-detect fallback order (when using -M cuda_auto_detect):
# 1) cuda_data_direct (fastest, requires HW/driver support)
# 2) cuda_dmabuf
# 3) cuda_peermem (universal fallback)
DMA-BUF Memory (cuda_dmabuf)
Uses the Linux DMA-BUF framework for zero-copy GPU–NIC transfers. Requires CUDA 11.7+ and kernel support.
doca_perftest -d mlx5_0 -n server-name -M cuda_dmabuf -G 0
Data Direct Memory (cuda_data_direct)
Most efficient GPU memory access method using direct PCIe mappings. Requires specific hardware and driver support; provides the lowest latency and highest throughput.
doca_perftest -d mlx5_0 -n server-name -M cuda_data_direct -G 0
Memory Types
Beyond GPU memory types, doca-perftest supports several memory allocation strategies for RDMA operations.
Host Memory (host)
Default mode using standard system RAM.
# Default host memory usage
doca_perftest -d mlx5_0 -n <server-name>
# Explicitly specify host memory
doca_perftest -d mlx5_0 -n <server-name> -M host
Null Memory Region (nullmr)
Does not allocate real memory; useful for ultra-low-latency synthetic tests.
# Null memory region for bandwidth testing
doca_perftest -d mlx5_0 -n <server-name> -M nullmr
Device Memory (device)
Allocates memory directly on the adapter hardware (limited by on-board capacity).
# On-adapter device memory
doca_perftest -d mlx5_0 -n <server-name> -M device
RDMA Drivers
Two RDMA driver backends are supported:
The available drivers depend on your installed packages and hardware.
Driver | Prerequisites | Usage |
IBV (libibverbs) | The standard RDMA Verbs delivered as part of DOCA-OFED (and standard inbox drivers). Recommended for general compatibility across all IB/RoCE adapters. | -r ibv |
DV (doca_verbs) | The specialized DOCA RDMA Verbs backend. This provides a high-performance alternative to standard verbs and is optimized for the DOCA SDK ecosystem. | -r dv |
Auto-Launching Remote Server
doca-perftest can automatically launch the remote server via SSH (CLI-only).
Requires passwordless SSH and identical versions on both sides.
# Auto-launch server (default)
doca_perftest -d mlx5_0 -n server-name
# Disable auto-launch
doca_perftest -d mlx5_0 -n server-name --launch_server disable
Server override examples:
# Server uses different device than client
doca_perftest -d mlx5_0 -n server-name --server_device mlx5_1
# Server uses different memory type
doca_perftest -d mlx5_0 -n server-name -M host --server_mem_type cuda_auto_detect
# Server runs on specific cores
doca_perftest -d mlx5_0 -n server-name -C 0-3 --server_cores 4-7
# Alternate server executable path
doca_perftest -d mlx5_0 -n server-name --server_exe /tmp/other_doca_perftest_version
# Use a different SSH username (passwordless SSH must be configured for that user)
doca_perftest -d mlx5_0 -n server-name --server_username testuser
QP Histogram
The QP histogram provides visibility into how work is distributed across multiple queue pairs during a test. This is useful for identifying load balancing issues, scheduling inefficiencies, or hardware limitations when using multiple QPs.
Enabling QP histogram:
# Enable QP histogram with multiple queue pairs
doca_perftest -d mlx5_0 -n server-name -q 8 -H
Example output:
--------------------- QP WORK DISTRIBUTION ---------------------
Qp num 0: ████████████████████████ 45.23 Gbit/sec | Relative deviation: -2.1%
Qp num 1: █████████████████████████ 46.89 Gbit/sec | Relative deviation: 1.5%
Qp num 2: ████████████████████████ 45.67 Gbit/sec | Relative deviation: -1.2%
Qp num 3: █████████████████████████████ 48.21 Gbit/sec | Relative deviation: 4.3%
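One plausible way to read the "relative deviation" column is deviation from the per-QP mean, sketched below with the bandwidth values from the sample output. Note that the tool's actual baseline may differ: the sample output reports 4.3% for QP 3, while deviation from the mean of these four values gives 3.7%.

```shell
# Compute each QP's deviation from the per-QP mean bandwidth
out=$(awk 'BEGIN {
  split("45.23 46.89 45.67 48.21", bw, " ")   # Gbit/sec per QP, from the sample
  for (i = 1; i <= 4; i++) sum += bw[i]
  mean = sum / 4
  printf "mean=%.2f Gbit/sec, QP3 deviation=%.1f%%", mean, (bw[4] - mean) / mean * 100
}')
echo "$out"
```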
Start Packet Sequence Number
The start PSN controls the initial Packet Sequence Number for each Queue Pair (QP) at connection initialization. If unspecified, a random value is generated.
This feature is essential for debugging sequence-sensitive behavior, ensuring reproducibility, and interoperability testing.
Interface | Configuration | Requirement |
CLI | | The number of values must exactly match the number of QPs. |
JSON | | Keys must be contiguous. |
Data Validation
Data validation verifies the integrity of RDMA traffic during bandwidth tests. When enabled, the requestor generates a deterministic payload for each message, and the responder compares the received data against the expected pattern.
To enable validation, set the dataValidation field to true in your test configuration.
No other specific JSON changes are required, provided the test meets the constraints listed below.
Validation introduces CPU and memory overhead, reducing measured bandwidth. iteration-sync mode, however, performs validation during the inter-iteration gap, preserving performance accuracy.
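Conceptually, a deterministic payload lets the responder regenerate the expected data and compare it against what arrived. The sketch below fakes this with a repeatable byte sequence and checksums; doca-perftest's actual pattern generator and comparison logic are internal:

```shell
# Stand-in for a deterministic payload generator: same input, same bytes every time
gen() { seq 0 255 | head -c 1024; }

# "Requestor" computes the checksum of what it sends...
tx=$(gen | sha256sum | cut -d' ' -f1)
# ...and the "responder" recomputes it over what it received
rx=$(gen | sha256sum | cut -d' ' -f1)

[ "$tx" = "$rx" ] && echo "validation passed"
```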
Prerequisites and Constraints
Test type: Supported only for bandwidth tests (latency testing is not supported).
Supported modes: Standard send verb tests, and tests running in iteration-sync mode.
Buffer configuration: rxDepth must be greater than or equal to txDepth.
Warmup: Warmup time must be explicitly disabled.
Enhanced Reliability (ER): If "ER Auto Mode" is set, it is automatically disabled when validation is active.
Output and Reporting
When validation is enabled, the JSON output includes a validationResults section.
Key metric: invalidDataSampleCount (the total number of messages that failed validation).
Logging: Individual failure logs are capped at the first 5,000 invalid samples. Additional failures are counted in the metric but not logged individually.
If validation is disabled, this section is omitted entirely.
Enhanced Connection Establishment
ECE is an optional RDMA setup phase that aligns connection capabilities between the client and server before traffic begins.
When enabled, doca-perftest exchanges ECE parameters for each connection, leveraging the hardware-firmware negotiation that occurs during the Queue Pair (QP) transition from RESET to INIT.
High‑Level Flow
The ECE process ensures both sides agree on supported features before establishing the connection.
The client queries its local ECE capabilities and sends them to the server via the control channel.
The server applies the client's proposal, transitions its QP to INIT, and queries the device for the final accepted ECE configuration.
The server sends the finalized ECE configuration back to the client.
The client applies the finalized configuration, transitions its QP to INIT, and validates the negotiated result.
Standard QP data exchange and RTR/RTS transitions proceed as usual.
ECE Configuration
Interface | Instruction |
CLI | Add the ECE flag to the command line. |
JSON | Set the ECE field to true in the configuration file. |
Limitations and Constraints
Driver Support | Currently supported only with the libibverbs driver (-r ibv) |
Connection Type | Supported only on RC QPs |
QP Hints
DOCA RDMA Verbs supports attaching opaque Congestion Control (CC) hints to Queue Pairs (QPs) for use by the Programmable Congestion Control (PCC) algorithm.
doca-perftest allows users to provide a binary hints file along with specific metadata (file size, vendor ID, and format ID). These parameters are passed directly to the PCC via the DOCA RDMA Verbs driver.
This feature is available only when using the DOCA RDMA Verbs driver (-r dv).
Configuring QP Hints
You can configure QP hints via CLI or JSON.
CLI – Pass a comma-separated list containing the file path and metadata using the --cc_group_hints flag.
JSON input – Add the ccGroupHints object to your test configuration:
"ccGroupHints": {"filePath":"/path/to/hints.bin","fileSize":1024,"vendorId":1,"formatId":1}
TPH
PCIe optimization providing hints to CPUs for cache management and reduced memory-access latency.
Requires ConnectX-6 or later hardware and a TPH-enabled kernel.
Parameters:
Option | Meaning |
--ph | Processing hint: 0 = Bidirectional (default), 1 = Requester, 2 = Completer, 3 = High-priority completer |
--tph_core_id | Target CPU core for TPH handling |
--tph_mem | Memory type (e.g., pm) |
Examples:
# Invalid: Core ID without memory type
doca_perftest -d mlx5_0 -n server-name --tph_core_id 0 # ERROR
# Invalid: Memory type without core ID
doca_perftest -d mlx5_0 -n server-name --tph_mem pm # ERROR
# Valid: Both or neither
doca_perftest -d mlx5_0 -n server-name --ph 1 # OK (hints only)
doca_perftest -d mlx5_0 -n server-name --ph 1 --tph_core_id 0 --tph_mem pm # OK (full config)
doca-perftest is capable of generating traffic from either the x86 host or the BlueField Arm cores, determined entirely by the input JSON configuration.
MPI Network Configuration
When launching doca-perftest from the server (regardless of whether the traffic originates from the x86 host or the BlueField), it is recommended to explicitly specify the MPI TCP network interface.
Add the subnet that connects the management server and the BlueField devices to the mpiTcpNetworkInterfaces field in your JSON input (e.g., "mpiTcpNetworkInterfaces": "10.7.8.0/24").
Traffic Originating from x86 Host (Server)
In this mode, traffic is generated by the x86 server. The RDMA device on the host (e.g., mlx5_0) performs DMA operations directly to/from host DRAM via PCIe.
Data path:
Path: NIC ↔ PCIe ↔ Host Memory
Bottlenecks: Performance is influenced by PCIe bandwidth and host CPU behavior, in addition to the network link and NIC capabilities.
JSON configuration:
hostName: Set to the x86 server hostname.
deviceName: Set to the RDMA device on the server (e.g., mlx5_0).
Traffic Originating from BlueField (Arm Cores)
In this mode, traffic is generated by the BlueField Arm cores, even if the test is launched from the x86 server. The RDMA device on the BlueField (e.g., p0, p1, mlx5_2) performs DMA operations to/from the BlueField's on-board DDR.
Data path:
Path: NIC ↔ DPU DDR (No PCIe hop)
Bottlenecks: Performance is typically limited by the network link, NIC, and DPU DDR bandwidth. The PCIe bus is not involved in the data path.
JSON Configuration:
hostName: Set to the BlueField hostname.
deviceName: Set to the RDMA device on the BlueField (e.g., p0, mlx5_2).
Note: Device naming conventions may vary depending on the BlueField operating mode.
doca-perftest integrates seamlessly with SLURM job schedulers, leveraging MPI for multi-node orchestration within SLURM allocations.
The following is a basic usage example with salloc:
1. Allocate nodes via SLURM (e.g., salloc -N 8).
2. Update the JSON to include the allocated nodes. Simple bisection example (four nodes per half):
"testNodes": [ {"hostname":"rack1-[01-04]","deviceName":"mlx5_0"}, {"hostname":"rack2-[05-08]","deviceName":"mlx5_0"} ],
"trafficPattern": "BISECTION"
3. Run doca-perftest with the updated JSON:
doca_perftest -f <updated-json>